Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    Simple substring counting script in Python

    21st June 2006

    About a month ago I set out to use Python as my main shell-scripting language. By then, I was already aware of several benefits you get when you use Python for scripting:

    • source-level cross-platform scripting: your script will run anywhere Python compiles; in other words, anywhere there is a C compiler (needed to build Python itself)
    • high-level language: for example, you can iterate over all the lines in a text file with a single ‘for’ statement (see the script below)
    • simple/minimalist syntax: no curly braces around blocks of statements, no semicolons after each and every line of code, etc. At a glance, Python looks much more readable than, for example, Perl.
    • the power of C in an interpreted language
    • it is interpreted! This makes debugging easy: modify, execute, see the problem, with no compile/link stages
    • and, despite being interpreted, it is fast!
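
    As a minimal sketch of the "one ‘for’ statement" point, the snippet below counts the lines of a text file. It is written for a current Python 3; the function name count_lines and the throwaway demo file are assumptions made up for this illustration, not part of the script further down.

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in the text file at `path`."""
    total = 0
    with open(path, "r") as f:
        for line in f:  # a file object yields one line per iteration
            total += 1
    return total


# Demo on a temporary file; its contents are made up for illustration.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

demo_count = count_lines(demo_path)
print(demo_count)
os.remove(demo_path)
```

    The file is read lazily, one line at a time, so this works the same way for files that are too large to fit in memory.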

    For a comparison (in speed, memory use, and program size) with other programming languages, please see the “Computer Language Shootout Benchmarks”. Here I link only to the comparison of Python with Perl and the comparison of Python with PHP (which can also be used as a shell-scripting language, albeit after some tinkering with its settings).

    Below is an example of a 2-minute Python script that counts the number of occurrences of a string in a file.

    """Read FILE and count number of occurrences of SUBSTR."""
    version = 0.01

    import sys

    def main():
      from optparse import OptionParser
      opts = OptionParser(usage="%prog [options] FILE SUBSTR",
        version="%prog " + str(version),
        description="Read FILE and count number of occurrences of SUBSTR.")
      opts.set_defaults(verbose=False, flush=False)
      opts.add_option("-v", "--verbose", action="store_true", dest="verbose", help="Print every line containing substr [default: %default]")
      opts.add_option("-f", "--flush", action="store_true", dest="flush", help="When verbose, flush every line [default: %default]")
      (options, args) = opts.parse_args()

      # Exactly two positional arguments are expected: FILE and SUBSTR.
      if len(args) != 2:
        print("Two arguments required for correct processing")
        opts.print_help()
        sys.exit(2)

      infile = args[0]
      substr = args[1]
      lines_count = 0
      substr_count = 0
      lines_substr_count = 0
      # Without --flush, matching lines are collected and printed at the end.
      if options.verbose and not options.flush:
        msg = ""

      f = open(infile, 'r')
      for line in f:
        lines_count += 1
        # str.count() returns the number of non-overlapping occurrences.
        found = line.count(substr)
        substr_count += found
        if found > 0:
          lines_substr_count += 1
          if options.verbose and not options.flush:
            msg += str(found) + ": " + line
          elif options.verbose and options.flush:
            # Drop the line's own newline; print() adds one.
            print((str(found) + ": " + line).replace("\n", ""))

      f.close()

      if options.verbose and not options.flush:
        print(msg)
      # Single-argument print(...) runs under both Python 2 and Python 3.
      print("Lines read from file: " + str(lines_count))
      print("Lines with substring found: " + str(lines_substr_count))
      print("Total substrings detected: " + str(substr_count))

      return

    if __name__ == "__main__":
      main()
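
    The counting loop of the script above can also be pulled out into a small function, which makes it easy to try on in-memory data. This is only a sketch: the name count_substring and the sample lines are made up for this illustration. It also shows one property of str.count() worth knowing: it counts non-overlapping occurrences.

```python
def count_substring(lines, substr):
    """Return (lines_read, lines_with_substr, total_occurrences)."""
    lines_read = 0
    lines_with_substr = 0
    total = 0
    for line in lines:
        lines_read += 1
        found = line.count(substr)  # non-overlapping occurrences only
        total += found
        if found > 0:
            lines_with_substr += 1
    return lines_read, lines_with_substr, total


# Made-up sample input; any iterable of strings works, including an open file.
sample = ["banana\n", "apple\n", "ananas\n"]
print(count_substring(sample, "an"))      # (3, 2, 4)

# "aa" fits into "aaa" at two overlapping positions, but is counted once.
print(count_substring(["aaa\n"], "aa"))   # (1, 1, 1)
```

    Because the function accepts any iterable of lines, an open file object can be passed to it directly, just as the script iterates over f.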

    Posted in Programming, Python