Simple substring counting script in Python
21st June 2006
Approximately a month ago I started using Python as my main shell-scripting language. By then, I was already aware of several benefits you get when you use Python for scripting:
- source-level cross-platform scripting: your script will run anywhere Python compiles; in other words, anywhere there is a C compiler (needed to build Python itself)
- high-level language: you can iterate all the lines in a text file with as little as one ‘for’-statement, for example (see the actual example below)
- simple/minimalist syntax: no curly braces around blocks of statements, no semicolons after each and every line of code, etc. At a glance, Python looks much more understandable than, for example, Perl.
- the power of C in an interpreted language
- it is interpreted! This makes debugging easy: modify, execute, see the trouble, with no compile/link stages
- and, despite being interpreted, it is fast!
For a comparison (in speed, memory use, program size) with other programming languages, please see the “Computer Language Shootout Benchmarks”. Here I provide links only to the comparison of Python with Perl and the comparison of Python with PHP (which can also be used as a shell-scripting language, albeit after some tinkering with its settings).
Below is an example of a 2-minute script in Python that counts the number of occurrences of a given string in a file.
    """Read FILE and count number of occurrences of SUBSTR."""

    version = 0.01

    import sys


    def main():
        from optparse import OptionParser
        opts = OptionParser(usage="%prog [options] FILE SUBSTR",
                            version="%prog " + str(version),
                            description="Read FILE and count number of occurrences of SUBSTR.")
        opts.set_defaults(verbose=False, flush=False)
        opts.add_option("-v", "--verbose", action="store_true", dest="verbose",
                        help="Print every line containing substr [default: %default]")
        opts.add_option("-f", "--flush", action="store_true", dest="flush",
                        help="When verbose, flush every line [default: %default]")
        (options, args) = opts.parse_args()
        if len(args) != 2:
            print("Two arguments required for correct processing")
            opts.print_help()
            sys.exit(2)
        infile = args[0]
        substr = args[1]
        lines_count = 0
        substr_count = 0
        lines_substr_count = 0
        # Without --flush, verbose output is collected and printed at the end.
        if options.verbose and not options.flush:
            msg = ""
        f = open(infile, 'r')
        for line in f:
            lines_count += 1
            found = line.count(substr)
            substr_count += found
            if found > 0:
                lines_substr_count += 1
                if options.verbose and not options.flush:
                    msg += str(found) + ": " + line
                elif options.verbose and options.flush:
                    # Strip the newline: print adds its own.
                    print((str(found) + ": " + line).replace("\n", ""))
        f.close()
        if options.verbose and not options.flush:
            print(msg)
        print("Lines read from file: " + str(lines_count))
        print("Lines with substring found: " + str(lines_substr_count))
        print("Total substrings detected: " + str(substr_count))
        return


    if __name__ == "__main__":
        main()
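The counting logic of the script can also be pulled out into a small function, which makes it easy to try without a file or command-line options. The sketch below is only an illustration and not part of the original script: the function name `count_substr` and the sample data are made up for the example. Note that `str.count` counts non-overlapping occurrences, so "aaaa" contains "aa" twice, not three times.

```python
def count_substr(lines, substr):
    """Return (lines_read, lines_with_match, total_matches).

    `lines` is any iterable of strings, e.g. an open file or a list.
    The name and signature are hypothetical, chosen for this sketch.
    """
    lines_count = 0
    lines_substr_count = 0
    substr_count = 0
    for line in lines:
        lines_count += 1
        found = line.count(substr)  # non-overlapping occurrences only
        substr_count += found
        if found > 0:
            lines_substr_count += 1
    return lines_count, lines_substr_count, substr_count


if __name__ == "__main__":
    sample = ["spam and spam\n", "eggs\n", "aaaa\n"]
    print(count_substr(sample, "spam"))  # (3, 1, 2)
    print(count_substr(sample, "aa"))    # (3, 1, 2)
```

Because a file object is itself an iterable of lines, the same function should work when handed an open file instead of a list.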
October 12th, 2009 at 6:56
That’s not simple
October 12th, 2009 at 11:38
Thanks for the contribution!
* please note that the script listed is 3+ years old; something has definitely changed in Python since then
* your options parser is definitely less flexible (and less verbose to the end-user)
* your main loop is indeed several lines shorter, but mostly thanks to replacing the explicit open() and file.close() with the with statement (imported from __future__), as well as not using the '--verbose' option processing to dump extra data to the terminal
Overall, I find your submission definitely useful, but not actually as short and simple as you implied with “That’s not simple :)”
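For readers following the thread, a shorter main loop of the kind mentioned in this reply can be built on the with statement. The sketch below is an assumption about what such a loop might look like, not the commenter's actual code; `count_in_file` is a hypothetical name. In Python 2.5 the statement has to be enabled with `from __future__ import with_statement`; later versions have it built in.

```python
import os
import tempfile


def count_in_file(path, substr):
    """Return the total number of occurrences of substr in the file at path.

    Hypothetical helper for this sketch; it skips the verbose output
    and the per-line statistics of the full script.
    """
    # The with statement closes the file on exit, even if an exception is raised.
    with open(path, "r") as f:
        return sum(line.count(substr) for line in f)


if __name__ == "__main__":
    # Write a small throwaway file so the example is self-contained.
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as out:
            out.write("one two one\nthree\none\n")
        print(count_in_file(path, "one"))  # 3
    finally:
        os.remove(path)
```

The trade-off is the one described above: the loop gets shorter, but the options handling and the verbose per-line output are gone.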
January 7th, 2010 at 17:26
January 7th, 2010 at 18:25
Nice, thanks.