Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    My smartd.conf, explained

    28th February 2015

    After dealing with an "offline uncorrectable sector" warning email, I took a closer look at my /etc/smartd.conf, and now it looks like this:

    DEFAULT -d sat -H -f -p -t -W 0,40,45 -n standby -S on -m example@example.com
    # Attributes 1, 230, and 231 are very important (-r 1! -r 230! -R 230! -r 231! -R 231!), but likely covered by -t.
    /dev/sda -s (S/../../6/01|L/../(01|02|03|04|05|06|07)/7/00) -C 0 -I 189 -I 194
    # -a implies -f and -p (through -t)
    DEFAULT -d sat -a -I 194 -W 0,40,45 -n standby -o on -S on -m example@example.com
    /dev/sdb -s (S/../../6/02|L/../(01|02|03|04|05|06|07)/7/02)
    # This drive does not decrement Offline_Uncorrectable (198) after re-allocation,
    # so only monitoring for increase, not for non-zero value.
    /dev/sdc -s (S/../../6/03|L/../(01|02|03|04|05|06|07)/7/04) -U 198+
    # This drive has 40 “normally”.
    /dev/sdd -s (S/../../6/04|L/../(01|02|03|04|05|06|07)/7/06) -W 0,42,45

    Note: explanations below are intentionally simplified; please consult man smartd.conf for more precise, complete, and up-to-date information.

    Ok, so what do these settings mean, and how is this different from default settings?

    By default, smartd assumes a DEVICESCAN directive, which auto-detects all HDDs, and enables reasonable default monitoring of SMART attributes.
    However, there are several benefits to individually specifying your disks:

    • less verbose smartd startup log messages (no messages about auto-detection and missing attributes)
    • ability to run scheduled offline, short, and long SMART self-tests
    • ability to monitor or exclude attributes individually for each drive (including temperature)

    The reasonable default mentioned above is the -a option, which is equivalent to the following individual options:

    -H -f -t -l error -l selftest -C 197 -U 198

    This is important to know, because my only SSD has no attribute 197, no self-test log, no error log, and no automatic offline testing.
    But I still want to start with quasi-default settings, and that is why the first configuration line includes all the options from -a, except those that my SSD does not support:

    DEFAULT -d sat -H -f -p -t -W 0,40,45 -n standby -S on -m example@example.com

    Here and in the -a options above,

    • -H: monitor overall health status (passed/failed)
    • -d sat: device type is SAT (SCSI-to-ATA Translation), i.e. a SATA drive that is accessed through the SCSI layer
    • -f: check if any of the Usage attributes (those not marked as Pre-fail) are below the manufacturer-set thresholds
    • -p: report changes in Pre-fail attributes (implied by -t below, so can be omitted)
    • -t: same as -p (above) with -u (report changes in Usage attributes)
    • -W 0,40,45: do not report temperature changes (the leading 0); log a message if the drive's temperature reaches 40 degrees Celsius; log a critical message (and send a warning email, see -m below) if it reaches 45
    • -n standby: do not wake up (spin up) the HDD if it is in sleep or standby mode (in which platters do not spin)
    • -S on: enable attributes auto-saving
    • -m example@example.com: address (or several comma-separated addresses) to receive warnings from smartd
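    As a rough illustration of the two -W limits described above, here is a minimal Python sketch. The function name temperature_level is made up for this example, and the "reached or exceeded" comparison and "0 disables a limit" behaviour reflect my reading of man smartd.conf rather than smartd's actual code.

```python
def temperature_level(temp_c: int, info: int, crit: int) -> str:
    """Classify a temperature the way -W DIFF,INFO,CRIT is described.

    A limit of 0 is treated as "disabled" (assumption based on the man page).
    """
    if crit and temp_c >= crit:
        return "critical"  # logged as critical, warning email if -m is set
    if info and temp_c >= info:
        return "info"      # informational log message only
    return "ok"


# A drive idling at 40 degrees with the common -W 0,40,45 vs. -W 0,42,45
print(temperature_level(40, info=40, crit=45))  # info
print(temperature_level(40, info=42, crit=45))  # ok
```

    This is also why a drive that sits at 40 degrees "normally" gets its own, slightly higher informational limit in my configuration.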

    The DEFAULT directive is for convenience: options set by this directive apply to all the individual disk configuration lines below, until a different DEFAULT line is encountered.
    Here, I used it to split all the /dev/sda options into 2 logical groups: supported defaults, and drive-specific configuration.
    The SSD's configuration is

    /dev/sda -s (S/../../6/01|L/../(01|02|03|04|05|06|07)/7/00) -C 0 -I 189 -I 194

    Here,

    • -C 0: explicitly disable attribute 197 monitoring (which is not present in this SSD)
    • -I 189 and -I 194: ignore attributes 189 and 194 when tracking attribute changes (they both show temperature in this SSD, so they change all the time)
    • -s (S/../../6/01|L/../(01|02|03|04|05|06|07)/7/00): schedule for short and long tests
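    To make the DEFAULT mechanism more concrete, here is a minimal Python sketch of how DEFAULT lines combine with the per-drive lines. It is not how smartd parses its configuration; effective_options is a made-up helper that simply prepends the most recent DEFAULT options to each device line, and the CONFIG text is a shortened version of my file (schedules omitted).

```python
def effective_options(config_text: str) -> dict[str, str]:
    """Return {device: options} with the preceding DEFAULT options prepended."""
    defaults = ""
    result = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, options = line.partition(" ")
        if name == "DEFAULT":
            defaults = options.strip()       # replaces the previous DEFAULT
        else:
            result[name] = f"{defaults} {options.strip()}".strip()
    return result


CONFIG = """\
DEFAULT -d sat -H -f -p -t -W 0,40,45 -n standby -S on -m example@example.com
/dev/sda -C 0 -I 189 -I 194
# -a implies -f and -p (through -t)
DEFAULT -d sat -a -I 194 -W 0,40,45 -n standby -o on -S on -m example@example.com
/dev/sdb
/dev/sdd -W 0,42,45
"""

for device, options in effective_options(CONFIG).items():
    print(device, options)
```

    Note that /dev/sdd ends up with two -W options; my configuration relies on the later, drive-specific one taking precedence.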

    As for the tests…
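    I want short self-tests in the early hours of every Saturday, between 1 AM and 5 AM (start times shifted by 1 hour for every disk).
    I want long self-tests on the 1st Sunday of every month, between midnight and 8 AM (start times shifted by 2 hours for every disk).
    This is exactly what is encoded in my configuration for all drives: smartd matches the regular expression given to -s against a string of the form T/MM/DD/d/HH, where T is the test type (S for short, L for long), MM is the month, DD is the day of the month, d is the day of the week (1 is Monday, 7 is Sunday), and HH is the hour. A Sunday (d = 7) that falls on days 01 to 07 is necessarily the first Sunday of the month. For the complete description of the format, read the relevant section of man smartd.conf.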

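    The remaining 3 drives are all HDDs, so I defined a different common DEFAULT for them: it uses the full -a set, and adds -o on to enable automatic offline testing.
    The only new per-drive option that we see is -U 198+, which instructs smartd to only report increases of the Offline_Uncorrectable (198) attribute, instead of reporting any non-zero value.
    This is necessary because my /dev/sdc does not decrement this attribute after sector re-allocation, so a plain -U 198 would keep warning forever.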
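    The difference between -U 198 and -U 198+ boils down to the following comparison. This is a simplified Python sketch of the behaviour described in man smartd.conf, not smartd's code; the function name and parameters are made up for the example.

```python
def offline_uncorrectable_warning(previous_raw: int, current_raw: int,
                                  increase_only: bool) -> bool:
    """Decide whether a check cycle would produce a warning for attribute 198."""
    if increase_only:                      # -U 198+
        return current_raw > previous_raw  # warn only when the count grows
    return current_raw != 0                # -U 198: warn on any non-zero count


# A drive stuck at 8 after re-allocation, checked twice in a row
print(offline_uncorrectable_warning(8, 8, increase_only=False))  # True
print(offline_uncorrectable_warning(8, 8, increase_only=True))   # False
```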

    I hope you found this post helpful.
