Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Python: iterate (and read) all files in a directory (folder)

    12th August 2007

    To iterate through all the files within the specified directory (folder), with the ability to use wildcards (*, ?, and [ ]-style ranges), use the following code snippet:

    import os
    import glob

    path = 'sequences/'
    for infile in glob.glob( os.path.join(path, '*.fasta') ):
        print("current file is: " + infile)

    If you do not need wildcards, then there is a simpler way to list all items in a directory:

    import os

    path = 'sequences/'
    listing = os.listdir(path)
    for infile in listing:
        print("current file is: " + infile)

    print was promoted from a statement to a function in Python 3 (use print(infile) instead of print infile).

    Use ‘os.path.join()’ to keep the script cross-platform-portable: different operating systems use different path separators, and hard-coding a separator would stop the script from running under another OS.
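    For illustration (the directory and file names are just placeholders):

```python
import os

# os.path.join() inserts the separator appropriate for the current OS:
# 'sequences/seq1.fasta' on Linux/macOS, 'sequences\seq1.fasta' on Windows
print(os.path.join('sequences', 'seq1.fasta'))
```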

    The Python docs also mention iglob(), which returns an iterator: on directories with very many files it saves memory by yielding a single result per iteration, instead of building the whole list of files as glob() does.
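    A sketch of the iglob() variant (same placeholder path as the snippets above):

```python
import glob
import os

path = 'sequences/'
# iglob() yields one matching filename at a time instead of building a list
for infile in glob.iglob(os.path.join(path, '*.fasta')):
    print("current file is: " + infile)
```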


    38 Responses to “Python: iterate (and read) all files in a directory (folder)”

    1. Dt Says:

      works just fine for me; the only important change to the code that I had to make was turning print into a function, because I’m using python 3.0. I also set it to read files with *all* extensions.

      import os, glob
      path = 'insert your own path you lazy bastards '
      for infile in glob.glob( os.path.join(path, '*.*') ):
             print("current file is: " + infile)
      
    2. Bogdan Says:

      Dt, thanks, I’ve updated the code.

    3. Mike Says:
      import os, glob
      def list_files(path):  # renamed from 'dir', which shadows a built-in
          for infile in glob.glob( os.path.join(path, '*') ):
              print "current file is: " + infile
      list_files(raw_input("Enter the path: "))
      
    4. Ferralll Says:

      Thank you very much…
      This was exactly what I was looking for!

    5. Richard Says:

      marvellous

    6. Kris Says:
      import os, glob
      path = './'
      for infile in glob.glob( os.path.join(path, '*.*') ):
          print("current file is: " + infile)
      

      #lists all files in directory script is in

    7. Dan Says:

      Is there a way to change this script so that it also runs through sub-directories under the given path name?

    8. Bogdan Says:

      Make that code into a function – e.g. scan_dirs(path) – and add a single line of code to it (pseudocode below):

      if os.path.isdir(infile): scan_dirs(infile)

      This will do exactly what you want.
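
      Alternatively, the standard library’s os.walk() handles the recursion for you; a minimal sketch (the starting path is a placeholder):

```python
import os

# os.walk() visits the given directory and every sub-directory beneath it
for dirpath, dirnames, filenames in os.walk('XML/'):
    for name in filenames:
        print("processing file: " + os.path.join(dirpath, name))
```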

    9. Dan Says:

      Bogdan,

      Thanks for the help. I’m still not getting the code to look at the directories within the path. Here’s my code, it still only looks at the files under the initial path.

      
      def scandirs(path):
          for currentFile in glob.glob( os.path.join(path, '*.*') ):
              if os.path.isdir(currentFile):
                  scandirs(currentFile)
              print "processing file: " + currentFile
      scandirs('XML/')
      
    10. Bogdan Says:

      Dan,

      script below seems to work perfectly for me:

      
      import os, glob
      def scandirs(path):
          for currentFile in glob.glob( os.path.join(path, '*') ):
              if os.path.isdir(currentFile):
                  print 'got a directory: ' + currentFile
                  scandirs(currentFile)
              print "processing file: " + currentFile
      scandirs('Desktop')
      

      Basically, I’ve changed the ‘*.*’ wildcard to just ‘*’.

    11. Dan Says:

      Ahh… My *.* as opposed to a * had it so it wasn’t looking at folders, thus the problem. Thanks again!

    12. Bill Tate Says:

      Is there a way to also do this in Windows? What I need to do is
      process every *.txt file in a directory, one at a time, inside
      a Python script.

    13. Rommel Says:

      Thanks. This snippet helped a bunch.

    14. Stefan Says:

      Is there a possibility to list the files in order, by name?
      For example:
      /path/file01.txt
      /path/file02.txt
      …………..

      If I use the code you presented here I get a scrambled order

    15. Stefan Says:

      I found it:

      dirList=os.listdir(path)
      dirList.sort()
      for fname in dirList:
          print(fname)
      
    16. ablaze Says:

      Hi…
      I am working in ubuntu. I have a bunch of commands (say 10 commands like cmd1, cmd2, cmd3…………..cmd10)
      I want to write a python script that can achieve the following:

      It should traverse through the directory structure and apply a command at particular directory path.
      The location and the commands are already known to me.

      /local/mnt/myspace/sample1$ cmd1
      /local/mnt/myspace/sample2$ cmd2
      /local/mnt/myspace$ cmd3
      /local/mnt$cmd4
      /local/mnt/myspace/sample9$ cmd 8
      /local/mnt/myspace/sample3$ dmd10

      can someone please provide the script, as I am not even a beginner in python.

    17. toto Says:

      thank you very much for your explanation. I had a problem when trying to list files and directories in Python, and you solved it :)

    18. born Says:

      hi …
      I have been messing around with a python program to browse through images in a directory and display them in a canvas. Can anybody help??

    19. vaishu Says:

      Is there a way to open and read many PDB files (e.g. 1ASD.pdb, 2sew.pdb, 5res.pdb) from a folder (e.g. protein) on a drive (e.g. E:/) automatically, without entering each PDB file name? Because there are up to 14,000 PDB files.

    20. Adam Says:

      First off, this is great! Can’t begin to tell you how helpful it is. One question: Is there a way to have it loop through only visible files? For example, in every folder, Mac OSX creates a .DS_Store file. When I iterate through, it picks up this file, which gets included in any subsequent arrays, lists, etc.

      Thanks

    21. Bogdan Says:

      @Vaishu: just use the script with a proper mask, like *.pdb. Maybe also make it recursive (see comment 10), if you have PDB files in sub-directories.

      @Adam: just use the proper filename mask. For example, *.* should not include any files which start with a dot (like .DS_Store). Another way is to check the filename in Python, e.g.

      if filename == '.DS_Store':
          continue  # skip the file
      
    22. Adam Says:

      Thanks @Bogdan. That definitely helps, but there’s no way to systematically look for only visible files?

    23. priya Says:

      Hiiii
      pls help me to do this simple program:
      Drive:F:/
      folder:X
      files: x1.txt, x2.txt, x3.txt, x4.txt, x5.txt (5 separate files)

      I have to read all these files quickly, so i had generated a list as list.txt=['F:/X/x1.txt','F:/X/x2.txt',F:/X/x3.txt',F:/X/x4.txt',F:/X/x5.txt']

      now i have to read list.txt file and i want to generate listres.txt file by ‘w’
      where
      listres.txt=['F:/X/x1res.txt','F:/X/x2res.txt',F:/X/x3res.txt',F:/X/x4res.txt',F:/X/x5res.txt']

      i expect to write the result of x1.txt into x1res.txt alone (and x2.txt into x2res.txt), but unfortunately it is writing the combined result of all x1+x2+x3+x4+x5 files into x1res, and the same into x2res. How to separate it?

    24. Bogdan Says:

      @Adam, I’m not currently aware of such a method. If it exists, then it should be either somewhere in os.path, or in collections. Please report back if you find it :)

      @Priya, you should probably use http://stackoverflow.com/ or http://codereview.stackexchange.com/ to post your code and have volunteers help you with it.

    25. vaishu Says:

      Hello sir,
      Thank you very much to Bogdan and Adam.

    26. vaishu Says:

      Hello,
      What python code can be used for arranging floating points in ascending order?
      For ex..,
      In Drive=C:/ Folder=r Textfile=seq.txt
      Contents of seq.txt=
      9.45
      6.346
      2.5632
      8.1452
      My aim is i want result as
      2.5632
      6.346
      8.1452
      9.45
      what python code should be used for such type of process

    27. vaishu Says:

      sorry,
      contents of seq.txt is
      9.45
      9SEQ
      6.346
      4CGF
      2.5632
      3RES
      8.1452
      2HAB
      and i want results of only floating points,
      i.e)
      2.5632
      6.346
      8.1452
      9.45

    28. Paal Says:

      Hey, I’m having a different problem.
      I have two or more folders with lots of files in them. The folders contain some files that are exactly the same, but the files have different names. I want to use a python script that matches files in these two folders by size, because when the size of the files is the same, I think there is a high enough possibility that the files are the same.
      The best script would merge these two folders together and delete duplicates, based on name or size, or name and size.

      Anyone who knows how to write one of these scripts?

      That would be really helpful!

      sorry bout my bad english.

    29. Bogdan Says:

      @Vaishu, you could wrap the conversion to float() into a try..except block, and thus separate purely-numeric values from alphanumeric.
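
      A minimal sketch of that idea, recreating the sample seq.txt from your comments (the filename and its contents are just the example data):

```python
# recreate the example seq.txt from the question above (placeholder data)
with open('seq.txt', 'w') as f:
    f.write('9.45\n9SEQ\n6.346\n4CGF\n2.5632\n3RES\n8.1452\n2HAB\n')

# keep only the lines that convert cleanly to float, then sort them
values = []
with open('seq.txt') as f:
    for line in f:
        try:
            values.append(float(line.strip()))
        except ValueError:
            pass  # skip alphanumeric entries such as '9SEQ'

for v in sorted(values):
    print(v)  # prints 2.5632, 6.346, 8.1452, 9.45 on separate lines
```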

      @Paal, you could use a program like ‘fdupes’, which does exactly what you want – de-duplicates the contents of two arbitrary directories.

    30. Paal Says:

      @Bogdan, yeah, but I’m on a windows platform at work. Know of any similar program for windows? Or a python script?

    31. Bogdan Says:

      I guess fdupes could be compiled/run in cygwin.

      There are tons of similar programs for windows, and some are even good enough, but I’m not up-to-date on that software – so cannot advise.

      You could write the required python script by, e.g., first creating two dicts of {filename: filesize}, and then comparing them to find identical filesizes (and yield the two filenames). This is a suboptimal approach, but that probably doesn’t matter for low numbers of files; for higher numbers, you would want a different approach. (One more suboptimal idea, but slightly better, would be to populate the 1st dict with {size:name}, and then iterate over the files in the 2nd dir, checking for “size in dict”.)
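
      A sketch of that last idea; the function and directory names are placeholders, and only sizes are compared, not contents:

```python
import os

def same_size_pairs(dir_a, dir_b):
    """Yield (name_in_a, name_in_b) pairs of files with identical sizes."""
    sizes = {}
    for name in os.listdir(dir_a):
        full = os.path.join(dir_a, name)
        if os.path.isfile(full):
            # note: if dir_a holds two files of equal size, only one is kept
            sizes[os.path.getsize(full)] = name
    for name in os.listdir(dir_b):
        full = os.path.join(dir_b, name)
        if os.path.isfile(full) and os.path.getsize(full) in sizes:
            yield sizes[os.path.getsize(full)], name
```

      Equal size is of course only a hint; hashing the candidate pairs (e.g. with hashlib) would confirm real duplicates before deleting anything.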

    32. VS Says:

      Hey, i want to change the file path dynamically. E.g.: “/cygdrive/d/Python_Study/Snehal/xyz/1.xml”
      here xyz may be anything, like default, config etc. I want to read this path depending on the xyz value. How can I do that???

    33. Pim Says:

      @adam

      This should work:

      if filename[0] == '.': # if the first character is a dot
          continue           # skip the file
      
    34. Sneha tayade Says:

      Hi

      Can anybody help me?
      I have some images in one folder and I want to open all the images one by one,
      need to do some processing and save them. I tried this code, but got an error: ‘No such file or directory’.
      On the other hand, when I print the list of images, it works nicely.

      path = '/home/Folder/S/'
      listing = os.listdir(path)
      for infile in listing:
          im = Image.open(infile)
          im.save("out.jpg","JPEG")
      
    35. Casa Says:

      Hi Guys,

      I have a problem similar to the situation currently being discussed here. I have some CAD files in different folders arranged in a sequence of years, like:
      C:\CADfile\1990_dwg
      C:\CADfile\1991_dwg
      C:\CADfile\1992_dwg
      C:\CADfile\1993_dwg
      C:\CADfile\1994_dwg
      C:\CADfile\1995_dwg
      etc. upto the year 2012

      The case is this: I want my python script to iterate through these folders, create a geodatabase under each year folder, and then populate the geodatabase with the feature classes stored in the folder.
      I have been able to produce a script that can create the geodatabase for a single file and populate it with feature datasets, but the problem is I cannot get the script to go through all the folders and do the same thing.
      Please, I will be glad to get help on this.
      Here is my script so far

      #Import system modules
      import arcpy
      import glob
      import os
      # Set workspace and variables
      for year in range(1990,2009): # 1990-2009
          inFolder =  r"c:\data\cadfiles\{0}_dwg".format(year) # 1990_dwg
          gdbName = "d{0}.gdb".format (1990,2009) # d1990.gdb
          arcpy.env.workspace = gdbName
      # Create a FileGDB for the fds
      arcpy.CreateFileGDB_management("C:/data", "d{0}.gdb".format(year))
      reference_scale = "1500"
      for file in glob.glob(r"N:\{0}_dwg"):
          outDS = arcpy.ValidateTableName(os.path.splitext("d" + os.path.basename(file))[0])
          arcpy.CADToGeodatabase_conversion(file, gdbName, outDS, reference_scale)
    36. renu Says:

      Hi, I am very new to programming, especially python. I have been given a task: i have to define a function which looks like my_func(input_db, output_directory).
      it has to be robust and:
      Check if there are any matching NS**** or T**** files existing.
      Only for those files the function has to be executed!
      All results should be stored inside a user-definable directory. The function should print out how many data sets were found and how many data sets were processed.
      can anyone help please?

    37. Diego Says:

      Thank you!

    38. Yidnekachew kibru Says:
      
      # open all the documents
      # docnames.txt holds the paths for the documents, one per line
      infile = open("D:/laboratory/python/docnames.txt", encoding='utf-8', mode='r')
      docnum = 20
      for i in range(docnum):  # range(1, docnum) would read one file too few
          line = infile.readline()
          name = line.rstrip()
          infile1 = open(name, encoding='utf-8', mode='r')
          print(infile1.readlines())
      
