Simple substring counting script in Python
21st June 2006
About a month ago I decided to make Python my main shell-scripting language. By then I was already aware of several benefits of using Python for scripting:
- source-level cross-platform scripting: your script will run anywhere Python can be built, which in practice means anywhere there is a C compiler (needed to build Python itself)
- high-level language: for example, you can iterate over all the lines in a text file with a single 'for' statement (see the actual example below)
- simple/minimalist syntax: no curly braces around blocks of statements, no semicolons after each and every line of code, etc. At a glance, Python looks much more understandable than, for example, Perl.
- the power of C in a language-interpreting system
- it is interpreted! This makes debugging easy: modify, execute, see the problem, with no compile/link stages
- and, despite being interpreted, it is fast!
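To illustrate the "one 'for' statement" point from the list above, here is a minimal sketch. The file name example.txt and its contents are made up for the illustration, and the syntax is that of current Python 3 rather than the interpreter available when this post was written.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained
# (the name and contents are hypothetical).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# A file object is iterable: a single 'for' visits every line in turn.
line_count = 0
with open(path) as f:
    for line in f:
        line_count += 1

print(line_count)
```

No explicit readline() calls or end-of-file checks are needed; the loop simply ends when the file does.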
For a comparison (in speed, memory use, and program size) with other programming languages, please see the "Computer Language Shootout Benchmarks". Here I link only to the comparison of Python with Perl and the comparison of Python with PHP (which can also be used as a shell-scripting language, albeit after some tinkering with its settings).
Below is an example of a two-minute script in Python that counts the number of occurrences of a given string in a file.
PYTHON:
"""Read FILE and count number of occurrences of SUBSTR."""

version = 0.01

import sys


def main():
    from optparse import OptionParser

    # Command-line interface: two positional arguments plus two flags.
    opts = OptionParser(usage="%prog [options] FILE SUBSTR",
                        version="%prog " + str(version),
                        description="Read FILE and count number of occurrences of SUBSTR.")
    opts.set_defaults(verbose=False, flush=False)
    opts.add_option("-v", "--verbose", action="store_true", dest="verbose",
                    help="Print every line containing substr [default: %default]")
    opts.add_option("-f", "--flush", action="store_true", dest="flush",
                    help="When verbose, flush every line [default: %default]")
    (options, args) = opts.parse_args()

    if len(args) != 2:
        print "Two arguments required for correct processing"
        opts.print_help()
        sys.exit(2)

    infile = args[0]
    substr = args[1]
    lines_count = 0
    substr_count = 0
    lines_substr_count = 0
    # Without --flush, matching lines are collected and printed at the end.
    if options.verbose and not options.flush:
        msg = ""

    f = open(infile, 'r')
    for line in f:
        lines_count += 1
        found = line.count(substr)
        substr_count += found
        if found > 0:
            lines_substr_count += 1
            if options.verbose and not options.flush:
                msg += str(found) + ": " + line
            elif options.verbose and options.flush:
                # Strip the newline: print adds its own.
                print (str(found) + ": " + line).replace("\n", "")
    f.close()

    if options.verbose and not options.flush:
        print msg
    print "Lines read from file: ", str(lines_count)
    print "Lines with substring found: ", str(lines_substr_count)
    print "Total substrings detected: ", str(substr_count)

    return


if __name__ == "__main__": main()
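The counting logic of the script can also be pulled out of the option handling into a small function. What follows is only a sketch in current Python 3 syntax, not part of the original script; the function name count_substring and the sample lines are invented for the illustration.

```python
def count_substring(lines, substr):
    """Return (lines_read, lines_with_match, total_matches) for an iterable of lines."""
    lines_read = 0
    lines_with_match = 0
    total_matches = 0
    for line in lines:
        lines_read += 1
        found = line.count(substr)  # str.count counts non-overlapping occurrences
        total_matches += found
        if found > 0:
            lines_with_match += 1
    return lines_read, lines_with_match, total_matches


# Hypothetical sample data; an open file object could be passed instead of a list.
sample = ["spam and eggs\n", "no match here\n", "spam spam spam\n"]
result = count_substring(sample, "spam")
print(result)
```

Because str.count only counts non-overlapping matches, searching for "aa" in "aaaa" yields 2, not 3. The script above behaves the same way.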
October 12th, 2009 at 6:56
That's not simple
October 12th, 2009 at 11:38
Thanks for the contribution!
* please note that the script listed is 3+ years old; something has definitely changed in Python since then
* your options parser is definitely less flexible (and less verbose to the end-user)
* your main loop is indeed several lines shorter, but mostly thanks to dropping the explicit open() and file.close() calls in favour of 'with' (imported from __future__), and to not handling the '--verbose' option, which dumps extra data to the terminal
Overall, I find your submission definitely useful, but not actually as short and simple as you implied with "That's not simple".
January 7th, 2010 at 17:26
January 7th, 2010 at 18:25
Nice, thanks.