Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    My favourite command line for mirroring with wget

    9th January 2011

    wget --continue --mirror --page-requisites --adjust-extension --convert-links --backup-converted --limit-rate=500k --wait=2 URL

    (short form: wget -cmpEkK --limit-rate=500k -w 2 URL)

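    • --continue does no harm on the initial run; when the command is run again, it tells wget to resume partially downloaded files from where the previous run was interrupted
    • --mirror turns on the options suited to mirroring: recursion with unlimited depth plus time-stamping (currently equivalent to -r -N -l inf --no-remove-listing; see the wget manual for details)
    • --page-requisites tells wget to also fetch the files needed to display each page properly (images, stylesheets and the like), which might otherwise be skipped
    • --adjust-extension (--html-extension in older wget versions) appends the .html extension to files that are served as text/html or an equivalent type but whose names lack it
    • --convert-links rewrites the links in downloaded documents so that they work for convenient local viewing
    • --backup-converted keeps the original of every converted file with a .orig suffix, which helps to avoid re-downloading some files, especially when --convert-links is specified
    • --limit-rate=500k and --wait=2 prevent overloading the target website by capping the download speed at 500 KB/s and pausing 2 seconds between retrievals (being nice to someone offering information you need is a must)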
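
For repeated use, the command above could be wrapped in a small shell function. This is only a sketch: the function name mirror_site, its optional rate and wait arguments, and the DRY_RUN variable are illustrative assumptions and not part of wget. The wget options themselves are exactly the ones listed above.

```shell
# Hypothetical helper: mirror_site URL [RATE] [WAIT]
# RATE defaults to 500k and WAIT to 2 seconds, as in the command above.
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
    url=${1:-}
    rate=${2:-500k}   # passed to --limit-rate
    pause=${3:-2}     # passed to --wait (seconds between retrievals)

    if [ -z "$url" ]; then
        echo "usage: mirror_site URL [RATE] [WAIT]" >&2
        return 2
    fi

    # Build the full command line as the function's positional parameters.
    set -- wget --continue --mirror --page-requisites --adjust-extension \
        --convert-links --backup-converted \
        --limit-rate="$rate" --wait="$pause" "$url"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Running it with DRY_RUN=1 first (for example, DRY_RUN=1 mirror_site https://example.org/ 200k 5) shows the exact command line before any traffic is sent to the site.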