Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

  • Exits

  • Categories

  • Archives

  • Tags list

    • Blog sponsors

    • Exits

    • Visitors’ track

      Locations of visitors to this page

    R script to filter probesets with log-expression values below the lowest spike-in

    27th January 2010

    Sometimes there is a need to remove all the probesets, which have expression values below the minimal spike-in intensity on the Affymetrix microarray. The reasoning behind this procedure is simple: minimal-expression spike-ins represent the bottom margin of microarray sensitivity, and anything below that margin cannot be reliably quantified - which also means that both fold-change and p-value of expression variance will be unreliable for these probesets.

    Here's a simple R script to do just that. It is abundantly commented, and also contains an optional (commented out) fragment which allows the removal of more low-variance, low-intensity probesets.


    Hint: click the "plain text" box header to be able to right-click the code, "Select All", and then "Copy".

    CODE:
    1. filter_below_spikes = function(eset) {
    2.     # Finds max(lowest) AFFX/spike-in intensity, and removes rows consisting entirely of values below max(lowest).
    3.     # @param eset
    4.     # ExpressionSet
    5.     # @returns
    6.     # exprs(ExpressionSet), filtered
    7.  
    8.     # Without Biobase exprs() will not work.
    9.     require(Biobase)
    10.     expr = exprs(eset)
    11.  
    12.     # 'expr' sample:
    13.     #                  1
    14.     # 1367452_at 10.880208
    15.     # 1367453_at 10.554647
    16.  
    17.     cat(nrow(expr), "rows before filtering.\n")
    18.  
    19.     # Make a vector of spike row names.
    20.     spikes = grep("AFFX", rownames(expr), value = TRUE)
    21.     cat("Expression matrix has", length(spikes), "spike-in rows.\n")
    22.     cat("Summary of spike-in values distribution follows:\n")
    23.     print(summary(expr[spikes, ]))
    24.  
    25.     # Find max(lowest) spike-in values.
    26.     minval_max = max(as.double(substr(grep("Min", summary(expr[spikes, ]), value = TRUE), 10, 14)))
    27.     cat("max(minimal spike-in log-intensity values) =", minval_max, "\n")
    28.  
    29.     # Remove spike-ins from expr.
    30.     expr = expr[grep("AFFX", rownames(expr), value = TRUE, invert = TRUE), ]
    31.  
    32.     nospike_rows = nrow(expr)
    33.     cat(nospike_rows, "rows remaining after the removal of", length(spikes), "spike-in probesets.\n")
    34.  
    35.     # Optional: calculate max(SD) of all removed rows.
    36.     #bad_sds_max = max(apply(expr[!apply((expr> minval_max), 1, any),], 1, sd))
    37.  
    38.     # Now remove all rows, where each value is <= minval_max.
    39.     expr = expr[!apply((expr <= minval_max), 1, all), ]
    40.     cat(nrow(expr), "rows remaining after filtering out", nospike_rows - nrow(expr), "probesets with all values below", minval_max, "\n")
    41.     #cat(bad_sds_max, "is max(SD) of all", nospike_rows - nrow(expr), "filtered probesets with all values below", minval_max, "\n")
    42.  
    43.     # Optional: Remove *some* of the rows, which have at least one value below minval_max, and row_SD <= bad_sds_max.
    44.     #pre_final_rows = nrow(expr)
    45.     #expr = expr[(apply(expr, 1, sd)> bad_sds_max) | (apply(expr, 1, min)> minval_max), ]
    46.     #cat(pre_final_rows-nrow(expr), "rows with SD <=", bad_sds_max, "and min(row) <=", minval_max, "were removed.\n")
    47.     #cat(nrow(expr), "final rows returned.\n")
    48.  
    49.     return(expr)
    50. }

    Sample use:

    > source("script.R")
    > expr.filtered = filter_below_spikes(eset)

    Comments and suggestions are welcome.

    • Delicious
    • Google Bookmarks
    • Yahoo Bookmarks
    • Windows Live Favorites
    • Technorati Favorites
    • Digg
    • Slashdot
    • StumbleUpon
    • Read It Later
    • Twitter
    • Share/Bookmark

    Leave a Reply

    XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>