Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Python: iterate (and read) all files in a directory (folder)

    12th August 2007

    To iterate through all the files within the specified directory (folder), with the ability to use wildcards (*, ?, and [ ]-style ranges), use the following code snippet:

    import os
    import glob

    path = 'sequences/'
    for infile in glob.glob( os.path.join(path, '*.fasta') ):
        print("current file is: " + infile)

    If you do not need wildcards, then there is a simpler way to list all items in a directory:

    import os

    path = 'sequences/'
    listing = os.listdir(path)
    for infile in listing:
        print("current file is: " + infile)

    print was promoted from a statement to a function in Python 3 (use print(infile) instead of print infile).

    Use ‘os.path.join()’ to keep the script cross-platform-portable: different operating systems use different path separators, and hard-coding a separator would stop the script from running under another OS.
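    For illustration (the directory and file names are just placeholders):

```python
import os

# os.path.join() inserts the separator appropriate for the current OS:
# 'sequences/seq1.fasta' on Linux/macOS, 'sequences\seq1.fasta' on Windows
print(os.path.join('sequences', 'seq1.fasta'))
```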

    The Python docs also mention iglob(), which returns an iterator: on directories with very many files it saves memory by yielding a single result per iteration, instead of building the whole list of files as glob() does.
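    A sketch of the iglob() variant (same placeholder path as the snippets above):

```python
import glob
import os

path = 'sequences/'
# iglob() yields one matching filename at a time instead of building a list
for infile in glob.iglob(os.path.join(path, '*.fasta')):
    print("current file is: " + infile)
```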


    38 Responses to “Python: iterate (and read) all files in a directory (folder)”

    1. Dt Says:

      works just fine for me; the only important change to the code that I had to make was turning print into a function, because I’m using python 3.0. I also set it to read files with *all* extensions.

      import os, glob
      path = 'insert your own path you lazy bastards '
      for infile in glob.glob( os.path.join(path, '*.*') ):
             print("current file is: " + infile)
      
    2. Bogdan Says:

      Dt, thanks, I’ve updated the code.

    3. Mike Says:
      import os, glob
      def list_files(path):  # renamed from 'dir', which shadows a built-in
          for infile in glob.glob( os.path.join(path, '*') ):
              print "current file is: " + infile
      list_files(raw_input("Enter the path: "))
      
    4. Ferralll Says:

      Thank you very much…
      This was exactly what I was looking for!

    5. Richard Says:

      marvellous

    6. Kris Says:
      import os, glob
      path = './'
      for infile in glob.glob( os.path.join(path, '*.*') ):
          print("current file is: " + infile)
      

      #lists all files in directory script is in

    7. Dan Says:

      Is there a way to change this script so that it also runs through sub-directories under the given path name?

    8. Bogdan Says:

      Make that code into a function – e.g. scan_dirs(path) – and add a single line of code to it (pseudocode below):

      if os.path.isdir(infile): scan_dirs(infile)

      This will do exactly what you want.
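
      Alternatively, the standard library’s os.walk() handles the recursion for you; a minimal sketch (the starting path is a placeholder):

```python
import os

# os.walk() visits the given directory and every sub-directory beneath it
for dirpath, dirnames, filenames in os.walk('XML/'):
    for name in filenames:
        print("processing file: " + os.path.join(dirpath, name))
```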

    9. Dan Says:

      Bogdan,

      Thanks for the help. I’m still not getting the code to look at the directories within the path. Here’s my code, it still only looks at the files under the initial path.

      
      def scandirs(path):
          for currentFile in glob.glob( os.path.join(path, '*.*') ):
              if os.path.isdir(currentFile):
                  scandirs(currentFile)
              print "processing file: " + currentFile
      scandirs('XML/')
      
    10. Bogdan Says:

      Dan,

      script below seems to work perfectly for me:

      
      import os, glob
      def scandirs(path):
          for currentFile in glob.glob( os.path.join(path, '*') ):
              if os.path.isdir(currentFile):
                  print 'got a directory: ' + currentFile
                  scandirs(currentFile)
              print "processing file: " + currentFile
      scandirs('Desktop')
      

      Basically, I’ve changed the ‘*.*’ wildcard to just ‘*’.

    11. Dan Says:

      Ahh… My *.* as opposed to a * had it so it wasn’t looking at folders, thus the problem. Thanks again!

    12. Bill Tate Says:

      Is there a way to also do this in Windows? What I need to do is
      process every *.txt file in a directory, one at a time, inside
      a Python script.

    13. Rommel Says:

      Thanks. This snippet helped a bunch.

    14. Stefan Says:

      Is there a possibility to list the files in order, by name?
      For example:
      /path/file01.txt
      /path/file02.txt
      …………..

      If I use the code you presented here I get a scrambled order

    15. Stefan Says:

      I found it:

      dirList=os.listdir(path)
      dirList.sort()
      for fname in dirList:
          print(fname)
      
    16. ablaze Says:

      Hi…
      I am working in ubuntu. I have a bunch of commands (say 10 commands like cmd1, cmd2, cmd3…………..cmd10)
      I want to write a python script that can achieve the following:

      It should traverse through the directory structure and apply a command at particular directory path.
      The location and the commands are already known to me.

      /local/mnt/myspace/sample1$ cmd1
      /local/mnt/myspace/sample2$ cmd2
      /local/mnt/myspace$ cmd3
      /local/mnt$cmd4
      /local/mnt/myspace/sample9$ cmd 8
      /local/mnt/myspace/sample3$ dmd10

      can someone please provide the script, as I am not even a beginner in python.

    17. toto Says:

      thank you very much for your explanation. I had a problem when trying to list files and directories in Python, and you solved it :)

    18. born Says:

      hi …
      I have been messing around with a python program to browse through images in a directory and display them in a canvas. Can anybody help??

    19. vaishu Says:

      Is there a way to open and read many PDB files (e.g. 1ASD.pdb, 2sew.pdb, 5res.pdb) from a folder (e.g. protein) on a drive (e.g. E:/) automatically, without entering each PDB file name? Because there are up to 14,000 PDB files.

    20. Adam Says:

      First off, this is great! Can’t begin to tell you how helpful it is. One question: Is there a way to have it loop through only visible files? For example, in every folder, Mac OSX creates a .DS_Store file. When I iterate through, it picks up this file, which gets included in any subsequent arrays, lists, etc.

      Thanks

    21. Bogdan Says:

      @Vaishu: just use the script with a proper mask, like *.pdb. Maybe also make it recursive (see comment 10), if you have PDB files in sub-directories.

      @Adam: just use the proper filename mask. For example, *.* should not include any files which start with a dot (like .DS_Store). Another way is to check the filename in Python, e.g.

      if filename == '.DS_Store':
          continue  # skip the file
      
    22. Adam Says:

      Thanks @Bogdan. That definitely helps, but there’s no way to systematically look for only visible files?

    23. priya Says:

      Hiiii
      pls help me to do this simple program:
      Drive:F:/
      folder:X
      files: x1.txt, x2.txt, x3.txt, x4.txt, x5.txt (5 separate files)

      I have to read all these files quickly, so i had generated a list as list.txt=['F:/X/x1.txt','F:/X/x2.txt',F:/X/x3.txt',F:/X/x4.txt',F:/X/x5.txt']

      now i have to read list.txt file and i want to generate listres.txt file by ‘w’
      where
      listres.txt=['F:/X/x1res.txt','F:/X/x2res.txt',F:/X/x3res.txt',F:/X/x4res.txt',F:/X/x5res.txt']

      i expect to write the result of x1.txt into x1res.txt alone (and x2.txt into x2res.txt), but unfortunately it is writing the combined result of all x1+x2+x3+x4+x5 files into x1res, and the same into x2res. How to separate it?

    24. Bogdan Says:

      @Adam, I’m not currently aware of such a method. If it exists, then it should be either somewhere in os.path, or in collections. Please report back if you find it :)

      @Priya, you should probably use http://stackoverflow.com/ or http://codereview.stackexchange.com/ to post your code and have volunteers help you with it.

    25. vaishu Says:

      Hello sir,
      Thank you very much to Bogdan and Adam.

    26. vaishu Says:

      Hello,
      What python code can be used for arranging floating points in ascending order?
      For ex..,
      In Drive=C:/ Folder=r Textfile=seq.txt
      Contents of seq.txt=
      9.45
      6.346
      2.5632
      8.1452
      My aim is i want result as
      2.5632
      6.346
      8.1452
      9.45
      what python code should be used for such type of process

    27. vaishu Says:

      sorry,
      contents of seq.txt is
      9.45
      9SEQ
      6.346
      4CGF
      2.5632
      3RES
      8.1452
      2HAB
      and i want results of only floating points,
      i.e)
      2.5632
      6.346
      8.1452
      9.45

    28. Paal Says:

      Hey, I’m having a different problem.
      I have two or more folders with lots of files in them. The folders contain some files that are exactly the same, but the files have different names. I want to use a python script that matches files in these two folders by size, because when the size of the files is the same, I think there is a high enough possibility that the files are the same.
      The best script would merge these two folders together and delete duplicates, based on name or size, or name and size.

      Anyone who knows how to write one of these scripts?

      That would be really helpful!

      sorry bout my bad english.

    29. Bogdan Says:

      @Vaishu, you could wrap the conversion to float() into a try..except block, and thus separate purely-numeric values from alphanumeric.
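
      A minimal sketch of that idea, recreating the sample seq.txt from your comments (the filename and its contents are just the example data):

```python
# recreate the example seq.txt from the question above (placeholder data)
with open('seq.txt', 'w') as f:
    f.write('9.45\n9SEQ\n6.346\n4CGF\n2.5632\n3RES\n8.1452\n2HAB\n')

# keep only the lines that convert cleanly to float, then sort them
values = []
with open('seq.txt') as f:
    for line in f:
        try:
            values.append(float(line.strip()))
        except ValueError:
            pass  # skip alphanumeric entries such as '9SEQ'

for v in sorted(values):
    print(v)  # prints 2.5632, 6.346, 8.1452, 9.45 on separate lines
```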

      @Paal, you could use a program like ‘fdupes’, which does exactly what you want – de-duplicates the contents of two arbitrary directories.

    30. Paal Says:

      @Bogdan, yeah, but I’m on a windows platform at work. Know of any similar program for windows? Or a python script?

    31. Bogdan Says:

      I guess fdupes could be compiled/run in cygwin.

      There are tons of similar programs for windows, and some are even good enough, but I’m not up-to-date on that software – so cannot advise.

      You could write the required python script by, e.g., first creating two dicts of {filename: filesize}, and then comparing them to find identical filesizes (and yield the two filenames). This is a suboptimal approach, but that probably doesn’t matter for low numbers of files; for higher numbers, you would want a different approach. (One more suboptimal idea, but slightly better, would be to populate the 1st dict with {size:name}, and then iterate over the files in the 2nd dir, checking for “size in dict”.)
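
      A sketch of that last idea; the function and directory names are placeholders, and only sizes are compared, not contents:

```python
import os

def same_size_pairs(dir_a, dir_b):
    """Yield (name_in_a, name_in_b) pairs of files with identical sizes."""
    sizes = {}
    for name in os.listdir(dir_a):
        full = os.path.join(dir_a, name)
        if os.path.isfile(full):
            # note: if dir_a holds two files of equal size, only one is kept
            sizes[os.path.getsize(full)] = name
    for name in os.listdir(dir_b):
        full = os.path.join(dir_b, name)
        if os.path.isfile(full) and os.path.getsize(full) in sizes:
            yield sizes[os.path.getsize(full)], name
```

      Equal size is of course only a hint; hashing the candidate pairs (e.g. with hashlib) would confirm real duplicates before deleting anything.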

    32. VS Says:

      Hey, i want to change the file path dynamically. E.g.: “/cygdrive/d/Python_Study/Snehal/xyz/1.xml”
      here xyz may be anything, like default, config etc. I want to read this path depending on the xyz value. How can I do that???

    33. Pim Says:

      @adam

      This should work:

      if filename[0] == '.': # if the first character is a dot
          continue           # skip the file
      
    34. Sneha tayade Says:

      Hi

      Can anybody help me?
      I have some images in one folder and I want to open all the images one by one,
      need to do some processing and save them. I tried this code, but got an error: ‘No such file or directory’.
      On the other hand, when I print the list of images, it works nicely.

      path = '/home/Folder/S/'
      listing = os.listdir(path)
      for infile in listing:
          im = Image.open(infile)
          im.save("out.jpg","JPEG")
      
    35. Casa Says:

      Hi Guys,

      I have a problem similar to the situation currently being discussed here. I have some CAD files in different folders arranged in a sequence of years, like:
      C:\CADfile\1990_dwg
      C:\CADfile\1991_dwg
      C:\CADfile\1992_dwg
      C:\CADfile\1993_dwg
      C:\CADfile\1994_dwg
      C:\CADfile\1995_dwg
      etc. upto the year 2012

      The case is this: I want my python script to iterate through these folders, create a geodatabase under each year folder, and then populate the geodatabase with the feature classes stored in the folder.
      I have been able to produce a script that can create the geodatabase for a single file and populate it with feature datasets, but the problem is I cannot get the script to go through all the folders and do the same thing.
      Please, I will be glad to get help on this.
      Here is my script so far

      #Import system modules
      import arcpy
      import glob
      import os
      # Set workspace and variables
      for year in range(1990,2009): # 1990-2009
          inFolder =  r"c:\data\cadfiles\{0}_dwg".format(year) # 1990_dwg
          gdbName = "d{0}.gdb".format (1990,2009) # d1990.gdb
          arcpy.env.workspace = gdbName
      # Create a FileGDB for the fds
      arcpy.CreateFileGDB_management("C:/data", "d{0}.gdb".format(year))
      reference_scale = "1500"
      for file in glob.glob(r"N:\{0}_dwg"):
          outDS = arcpy.ValidateTableName(os.path.splitext("d" + os.path.basename(file))[0])
          arcpy.CADToGeodatabase_conversion(file, gdbName, outDS, reference_scale)
    36. renu Says:

      Hi, I am very new to programming, especially python. I have been given a task: i have to define a function which looks like my_func(input_db, output_directory).
      it has to be robust and:
      Check if there are any matching NS**** or T**** files existing.
      Only for those files the function has to be executed!
      All results should be stored inside a user-definable directory. The function should print out how many data sets were found and how many data sets were processed.
      can anyone help please?

    37. Diego Says:

      Thank you!

    38. Yidnekachew kibru Says:
      
      # open all the documents
      # docnames.txt holds the paths for the documents, one per line
      infile = open("D:/laboratory/python/docnames.txt", encoding='utf-8', mode='r')
      docnum = 20
      for i in range(docnum):  # range(1, docnum) would read one file too few
          line = infile.readline()
          name = line.rstrip()
          infile1 = open(name, encoding='utf-8', mode='r')
          print(infile1.readlines())
      
