
R-Server Analyses




The following analyses are available:
  • Agilent QC
  • Affymetrix QC
  • Two Group Analysis
  • ANOVA Analysis
  • Exon Two Group Analysis
  • Exon ANOVA Analysis

R Analysis Options can be passed in the name string of the analysis.

Advanced Options


R-Server analyses allow the specification of advanced options in the name of the analysis. The syntax is
<analysis name> [<option name>=<option value> ....]
Spaces in an option must be escaped by a backslash.
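
A minimal sketch of how such a name string could be split into the analysis name and its options. The function name parseAnalysisName is hypothetical, and the sketch assumes that every token containing "=" is an option and that all other tokens form the analysis name; the parser actually used by the R-Server is not shown on this page.

```r
# Hypothetical helper, not part of the R-Server code base
parseAnalysisName <- function(nameString) {
  # Split on runs of spaces that are NOT preceded by a backslash
  tokens <- strsplit(nameString, "(?<!\\\\) +", perl = TRUE)[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Assumption: every token containing "=" is an option
  isOption <- grepl("=", tokens, fixed = TRUE)
  optTokens <- tokens[isOption]
  eqPos <- regexpr("=", optTokens, fixed = TRUE)
  optNames <- substr(optTokens, 1, eqPos - 1)
  optValues <- substr(optTokens, eqPos + 1, nchar(optTokens))
  # Turn the escaped "\ " back into a plain space
  optValues <- gsub("\\ ", " ", optValues, fixed = TRUE)
  list(
    analysis = paste(tokens[!isOption], collapse = " "),
    options = setNames(as.list(optValues), optNames)
  )
}

parsed <- parseAnalysisName("Two Group Analysis normMethod=vsn cssFile=/tmp/my\\ style.css")
```

All option values are returned as character strings here; converting them to logical or numeric values would be a separate step.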


File locations

dataRoot the root directory for the data files
config$dataRoot = "/srv/www/htdocs"

annoDir the directory where all chip annotation files are located
config$annoDir = "/srv/GT/reference/microarray/annotations"

htmlDir the output html directory
config$htmlDir = "."

cssFile the CSS file to be included in the header of the report
config$cssFile = "/usr/local/ngseq/bfab_scripts/bfabStyle.css"

aptCommand the command to launch the Affymetrix Power Tools (APT)
config$aptCommand = "/usr/local/ngseq/bfab_scripts/affymetrix/apt/Linux/bin/apt-probeset-summarize"

cdfRepository the directory with all the CDF files. The suffix .CDF of the files must be uppercase
config$cdfRepository = "/srv/GT/reference/microarray/cdfRepository"

gseaDatabaseDir the directory with the geneset databases for the GSEA analysis
config$gseaDatabaseDir = "/srv/GT/reference/microarray/annotations/GSEA/GeneSetDatabases"

jarDir the directory with the jar files for metacore connection
config$jarDir = "/usr/local/ngseq/bfab_scripts"

logFile the file where the starting and the finishing of jobs is reported
config$logFile = "/srv/GT/reference/microarray/jobLog.txt"

exonmapConfDir location info for exonmap databases
config$exonmapConfDir = "/srv/GT/reference/microarray/.exonmap"

xmapcoreConfDir location of xmapcore databases
config$xmapcoreConfDir = "/srv/GT/reference/microarray/.xmapcore"

Probe Detection Thresholds

useSigThresh boolean flag telling whether a probe must have a signal above a threshold in order to be considered present
config$useSigThresh = TRUE

sigThresh the signal threshold above which a probe is considered present; only used if useSigThresh is TRUE. The Agilent QC method overrides this threshold and sets it to 200.
config$sigThresh = 25

useDetectionPThresh boolean flag telling whether the detection p-value should be considered. Only used if the data actually has detection p-values, which is currently only true for MAS5-processed Affy data.
config$useDetectionPThresh = FALSE

detectionPThresh threshold for the detection p-value, see useDetectionPThresh
config$detectionPThresh = 0.05
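
The following sketch illustrates how the four threshold options could interact when deciding whether a probe is present. The helper isPresent is hypothetical, and combining both criteria with a logical AND is an assumption; this page does not say how the R-Server combines them.

```r
# Hypothetical helper: present call per probe (row) and sample (column)
isPresent <- function(signal, detectionP = NULL, config) {
  present <- matrix(TRUE, nrow = nrow(signal), ncol = ncol(signal),
                    dimnames = dimnames(signal))
  if (isTRUE(config$useSigThresh)) {
    # the signal must exceed the threshold
    present <- present & (signal > config$sigThresh)
  }
  if (isTRUE(config$useDetectionPThresh) && !is.null(detectionP)) {
    # assumption: the detection p-value must additionally be below its threshold
    present <- present & (detectionP < config$detectionPThresh)
  }
  present
}

config <- list(useSigThresh = TRUE, sigThresh = 25,
               useDetectionPThresh = FALSE, detectionPThresh = 0.05)
signal <- matrix(c(10, 30, 100, 20), nrow = 2)
presentCalls <- isPresent(signal, config = config)
```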

Input and Preprocessing Options

normMethod Normalization method to be used. Valid options are: "none", "quantile", "logMean", "vsn". Different analyses will use different default normalizations. The normalization used is reported in the HTML-Report.
config$normMethod = "quantile"

doBatchNorm whether batch normalization using the pamr package should be applied. Only valid for experimental designs where the batch factor is balanced with respect to the other experimental factors. As a side effect this normalizes each probe to mean 1, which means that signal thresholding cannot be used in combination with this flag. Requires a column called "batch" in the sample annotation that indicates the batch for each sample.
config$doBatchNorm = FALSE
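
To illustrate what the "quantile" setting of normMethod refers to, here is a textbook quantile normalization in base R. It is a sketch only: the function name quantileNormalize is hypothetical, and the R-Server may well use a different implementation and handle ties or missing values differently.

```r
# Hypothetical helper: force all samples (columns) onto the same distribution
quantileNormalize <- function(x) {
  # rank of each value within its sample
  ranks <- apply(x, 2, rank, ties.method = "first")
  # reference distribution: mean of the sorted values across samples
  sortedMeans <- rowMeans(apply(x, 2, sort))
  # replace each value by the reference value of the same rank
  normalized <- apply(ranks, 2, function(r) sortedMeans[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

x <- matrix(c(5, 2, 3, 4, 1, 6), nrow = 3)
xNorm <- quantileNormalize(x)
```

After normalization every column contains the same set of values, only in a different order.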

Input and Preprocessing Options: Agilent

AgilentSignalColumn the column to read from Agilent TXT files. For two-channel data the corresponding control signal column is also read. Common choices are "gMedianSignal", "gProcessedSignal", "LogRatio". For "LogRatio" the control column will be "gProcessedSignal".
config$AgilentSignalColumn = "gMedianSignal"

AgilentFlagColumn the column in the Agilent File that provides the present flags
config$AgilentFlagColumn = "gIsWellAboveBG"

loadAgilentFeatureData for Agilent files the loaded signal always holds one value per probe. Replicate spots/features are averaged. If this is true, the original feature values are also kept and stored in the rawData as featureSignal.
config$loadAgilentFeatureData = FALSE

useBarcodeAsReplicateId this will load the slide barcode and fill the replicate slot of the sample annotation with the barcode. Using this flag is deprecated.
config$useBarcodeAsReplicateId = FALSE

Input and Preprocessing Options: Affymetrix

AffyPreprocessing preprocessing method for Affy data, supported values are "rma" and "mas5"
config$AffyPreprocessing = "rma"

runMas5 flag telling whether MAS5 processing should additionally be run
config$runMas5 = FALSE

useFirstExonOnly flag whether for exon arrays only probe sets in the 5'-exon of genes should be used. Only for exploratory analysis checking for degradation
config$useFirstExonOnly = FALSE

useLastExonOnly flag whether for exon arrays only probe sets in the 3'-exon of genes should be used. Only for exploratory analysis checking for degradation.
config$useLastExonOnly = FALSE

useExonicOnly whether only "exon"-targeting probe sets, as defined by the xmap annotation database, should be used. Only relevant when using exon arrays. Speeds up processing and reduces false positives when only looking for expression in well-defined genes.
config$useExonicOnly = FALSE

removePoorAffyProbes whether probes that have a signal above affyProbeSignalFilterThresh in fewer than minAffyPresentProbeValues samples should be removed. Particularly useful for tiling arrays, where many probes may have bad hybridization properties.
config$removePoorAffyProbes = FALSE

affyProbeSignalFilterThresh threshold for removal of "non-working" probes on Affy chips
config$affyProbeSignalFilterThresh = 32

minAffyPresentProbeValues number of samples in which a value above affyProbeSignalFilterThresh is required in order to keep a probe
config$minAffyPresentProbeValues = 3

minAffyProbeCount min number of probes to keep in a probe set, even if some of them are "non-working"
config$minAffyProbeCount = 3
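
The interplay of the three probe filter options can be sketched as follows. The helper filterPoorProbes is hypothetical. The rule for choosing which "non-working" probes are kept to reach minAffyProbeCount (those with the most samples above the threshold) is an assumption, since this page does not specify it.

```r
# Hypothetical helper: returns TRUE for every probe (row of signal) that is kept
filterPoorProbes <- function(signal, probeSet, config) {
  # number of samples in which each probe exceeds the signal threshold
  nAbove <- rowSums(signal > config$affyProbeSignalFilterThresh)
  keep <- nAbove >= config$minAffyPresentProbeValues
  for (ps in unique(probeSet)) {
    idx <- which(probeSet == ps)
    # a probe set never shrinks below minAffyProbeCount (or its original size)
    nMissing <- min(config$minAffyProbeCount, length(idx)) - sum(keep[idx])
    if (nMissing > 0) {
      dropped <- idx[!keep[idx]]
      # assumption: rescue the dropped probes with the most samples above threshold
      rescue <- dropped[order(nAbove[dropped], decreasing = TRUE)][seq_len(nMissing)]
      keep[rescue] <- TRUE
    }
  }
  keep
}

config <- list(affyProbeSignalFilterThresh = 32,
               minAffyPresentProbeValues = 3,
               minAffyProbeCount = 3)
signal <- rbind(c(100, 100, 100), c(100, 100, 100), c(100, 100, 10),
                c(10, 10, 10), c(10, 10, 10))
keep <- filterPoorProbes(signal, probeSet = c("a", "a", "a", "a", "b"), config)
```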

exonLevelCdf set of cdfs that define exon-level probe sets and for which exon-level analyses can be applied
config$exonLevelCdf = c("ratexonpm", "mouseexonpm", "exon.pm", "raex10stv1", "huex10stv2", "moex10stv1")

xmapCdf set of cdfs that are available in the xmap database
config$xmapCdf = c("ratexonpm", "mouseexonpm", "exon.pm")

exonProbeLevel which probes from Affy exon chips to use: "core", "extended", or "full"; this is ignored when using xmap CDFs
config$exonProbeLevel = "core"

Plotting Options

writeScatterPlots flag telling whether scatter plots should be drawn
config$writeScatterPlots = TRUE

logColorRange for log-ratio heatmap plots, the color range will be -logColorRange to +logColorRange, values outside this interval are clamped
config$logColorRange  = 4
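
Clamping to the color range can be expressed in one line of base R. The function name clampLogRatios is hypothetical and only illustrates the behaviour described for logColorRange.

```r
# Hypothetical helper: limit log ratios to [-logColorRange, +logColorRange]
clampLogRatios <- function(logRatio, logColorRange = 4) {
  pmin(pmax(logRatio, -logColorRange), logColorRange)
}

clamped <- clampLogRatios(c(-6, -1, 0, 3, 7))
```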

topGeneSize the number of probes with the highest variance in the data set that are used for the top-gene QC correlation plots and the sample clustering
config$topGeneSize = 100

maxGenesForClustering maximum number of genes to use for clustering; if there are more genes, only the most varying ones are used
config$maxGenesForClustering = 2000

minGenesForClustering minimum number of genes needed for a clustering
config$minGenesForClustering = 30
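
A sketch of how the two clustering limits could be applied. The helper selectGenesForClustering is hypothetical, and ranking genes by the standard deviation of their log2 values is an assumption about what "most varying" means here.

```r
# Hypothetical helper: returns row indices to cluster, or NULL if too few genes
selectGenesForClustering <- function(log2Signal, config) {
  nGenes <- nrow(log2Signal)
  if (nGenes < config$minGenesForClustering) {
    return(NULL)  # not enough genes for a clustering
  }
  if (nGenes <= config$maxGenesForClustering) {
    return(seq_len(nGenes))  # all genes fit within the limit
  }
  # assumption: "most varying" is measured by the per-gene standard deviation
  geneSd <- apply(log2Signal, 1, sd)
  order(geneSd, decreasing = TRUE)[seq_len(config$maxGenesForClustering)]
}

log2Signal <- rbind(c(1, 1, 1), c(1, 2, 3), c(0, 5, 10), c(2, 2, 2.5), c(0, 3, 6))
selected <- selectGenesForClustering(
  log2Signal, list(minGenesForClustering = 2, maxGenesForClustering = 3))
```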

showGeneClusterLabels should gene labels be drawn on the clustering heatmap
config$showGeneClusterLabels = FALSE

plotDegradation should degradation plots be generated; only for Affymetrix exon data; plots values from the first exon versus values from the last exon
config$plotDegradation = FALSE

highVarThreshold for the heatmap showing the most varying genes; only genes where the standard deviation of the log2 values exceeds the threshold are used
config$highVarThreshold = 0.5

showTailEffects show the tail effects for miRNA arrays
config$showTailEffects = FALSE

Differential Expression Options

minimalLog2Effect threshold for prefiltering probes on variance before running hypothesis tests; see the hypothesis test section for an explanation
config$minimalLog2Effect = 0.3

pValueHighlightThresh only probes with a p-value below this threshold are highlighted in the plots
config$pValueHighlightThresh = 0.01

log2RatioHighlightThresh only probes with an absolute log2 ratio above the threshold are highlighted
config$log2RatioHighlightThresh = 0.5
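
The two highlight thresholds can be read as a simple filter on the test results. The sketch below assumes that a probe must pass both criteria to be highlighted, which this page does not state explicitly; the function name isHighlighted is hypothetical.

```r
# Hypothetical helper: TRUE for probes that would be highlighted in the plots
isHighlighted <- function(pValue, log2Ratio, config) {
  # assumption: both criteria must hold at the same time
  pValue < config$pValueHighlightThresh &
    abs(log2Ratio) > config$log2RatioHighlightThresh
}

config <- list(pValueHighlightThresh = 0.01, log2RatioHighlightThresh = 0.5)
highlighted <- isHighlighted(c(0.001, 0.001, 0.5), c(1, 0.1, -2), config)
```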

testMethod the test method to use in a two-group test; possible values: "t-test", "Wilcox", "limma"
config$testMethod = "t-test"

tukeyThresh for probes with ANOVA p-values below this threshold we compute the Tukey post-hoc tests
config$tukeyThresh = 0

Gene Set Analysis Options

runGO whether GO analysis should be run or not; will only be run if a probe to gene mapping is available
config$runGO = FALSE

runMetaCore whether MetaCore's pathway analysis should be run; will only be run if a probe to gene mapping is available
config$runMetaCore = FALSE

pValThreshGO only probes with a differential expression p-value below this threshold will be used as input for overrepresentation analysis
config$pValThreshGO = 1e-2

log2RatioThreshGO only probes with a log2 expression change above this threshold will be used as input for overrepresentation analysis
config$log2RatioThreshGO = 0

pValThreshFisher only GO categories with a Bonferroni-Holm corrected p-value below this threshold will be shown
config$pValThreshFisher = 1e-4

pValThreshFisherKegg only KEGG pathways with a Bonferroni-Holm corrected p-value below this threshold will be shown
config$pValThreshFisherKegg = 1e-2

minCountFisher only GO categories that have at least this many genes are searched for overrepresented genes
config$minCountFisher = 3
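
The overrepresentation options above can be illustrated with a one-sided Fisher test per gene set followed by a Holm correction, using only fisher.test and p.adjust from base R. This is a sketch under assumptions: the function goOverrepresentation and its arguments are hypothetical, selectedGenes is assumed to be already filtered by pValThreshGO and log2RatioThreshGO, and the R-Server's actual GO code may differ.

```r
# Hypothetical helper: Holm-adjusted p-values of the overrepresented gene sets
goOverrepresentation <- function(selectedGenes, allGenes, geneSets, config) {
  # only test gene sets with at least minCountFisher genes on the array
  setSizes <- vapply(geneSets, function(gs) length(intersect(gs, allGenes)),
                     integer(1))
  geneSets <- geneSets[setSizes >= config$minCountFisher]
  isSelected <- factor(allGenes %in% selectedGenes, levels = c(TRUE, FALSE))
  pValues <- vapply(geneSets, function(gs) {
    inSet <- factor(allGenes %in% gs, levels = c(TRUE, FALSE))
    # 2x2 table: gene set membership versus selection
    fisher.test(table(inSet, isSelected), alternative = "greater")$p.value
  }, numeric(1))
  adjusted <- p.adjust(pValues, method = "holm")
  adjusted[adjusted < config$pValThreshFisher]
}

allGenes <- paste0("g", 1:100)
geneSets <- list(setA = paste0("g", 1:10),
                 setB = paste0("g", 50:60),
                 setC = c("g1", "g2"))
config <- list(minCountFisher = 3, pValThreshFisher = 1e-4)
significant <- goOverrepresentation(paste0("g", 1:10), allGenes, geneSets, config)
```

In this toy example setC is skipped because it has fewer than minCountFisher genes, and only setA, which coincides with the selected genes, passes the threshold.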

runGSEA whether the Gene Set Enrichment Analysis (GSEA) should be run; is very slow
config$runGSEA = FALSE

pValThreshGsea only gene sets with a p-value below this threshold will be reported
config$pValThreshGsea = 1e-4

maxNumberGroupsDisplayed the maximum number of GO groups to show in the HTML tables
config$maxNumberGroupsDisplayed = 40

Output Options

writeAllProbes whether all probes should be written in the result of a test; if false
config$writeAllProbes = TRUE

doZip whether text files should be zipped
config$doZip = TRUE

writeAffyTxt whether the Affy Txt files produced by APT should be written
config$writeAffyTxt = FALSE

Annotation Options

probeAnnotationFromBioC the names of the probe annotation fields from bioconductor packages that should be used; and how they should be renamed
config$probeAnnotationFromBioC = c("Gene Symbol"="SYMBOL", "Gene Description"="GENENAME", "Entrez Gene ID"="ENTREZID")

geneColumnSet the annotation columns that can be used to map probes to genes
config$geneColumnSet = c("Entrez Gene ID", "Gene Symbol", "Ensembl Gene ID", "Gene Symbol [Agilent]")

useAnnotationFromFile whether annotations should be loaded from existing annotation files only, without trying to use Bioconductor packages or BioMart
config$useAnnotationFromFile = TRUE

Other Options

printChips print some header information from the loaded Agilent files on stdout
config$printChips = TRUE

saveImage save an image at the end of the analysis?
config$saveImage = FALSE

saveRawData save an .RData file holding the rawData Object right after import
config$saveRawData = FALSE

subset for testing purposes, use only a subset of probes/reads to speed up the processing
config$subset = FALSE



NGS Options

readUnmapped whether unmapped reads should be loaded from BAM files; specific methods will have their own defaults
config$readUnmapped = TRUE

multiMatch whether multi-matching reads should be loaded; currently the only supported value is "all". Planned for the future: "unique", "random" (randomly take one of the hits), N (use all reads with at most N matches)
config$multiMatch = "all"



Choosing CDF Files for Affymetrix Data

Affymetrix data can be analysed using different groupings of probes into probe sets. The grouping is defined by CDF files. In addition to the standard Affymetrix CDF files, we also support files from the Brainarray group:
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp

3'-IVT Arrays

By default, the Affymetrix CDF is used. You can override this with the CDF option field in the analysis settings. If your CDF file is not available, please contact your FGCZ Bioinformatics contact to have it installed.

Exon arrays

For exon arrays, the data analysis can be done at the "exon" or at the "transcript" level. The available CDFs are:

Species  Exon level (exonmap)  Transcript level (Brainarray)
Human    exon.pm               HuEx10stv2_Hs_ENSG
Mouse    mouseexonpm           MoEx10stv1_Mm_ENSG
Rat      ratexonpm             RaEx10stv1_Rn_ENSG


The default choice of the CDF file depends on the analysis:
  • Affymetrix QC: the Brainarray version is used, because we are only looking at overall quality and sample similarity at the gene expression level
  • Two Group Analysis, ANOVA Analysis: the Brainarray version is used
  • Exon Two Group Analysis, Exon ANOVA Analysis: the exonmap version is used
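
The defaults for exon arrays can be summarised as a small lookup. The table contents are taken from this page; the names exonCdfTable and defaultExonCdf are hypothetical, and recognising exon-level analyses by the prefix "Exon " is an assumption made for this sketch.

```r
# CDF names per species as listed above; the object name is hypothetical
exonCdfTable <- data.frame(
  species = c("Human", "Mouse", "Rat"),
  exonmap = c("exon.pm", "mouseexonpm", "ratexonpm"),
  brainarray = c("HuEx10stv2_Hs_ENSG", "MoEx10stv1_Mm_ENSG", "RaEx10stv1_Rn_ENSG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: default exon-array CDF for an analysis and a species
defaultExonCdf <- function(analysis, species) {
  # assumption: exon-level analyses are the ones whose name starts with "Exon "
  level <- if (grepl("^Exon ", analysis)) "exonmap" else "brainarray"
  row <- match(species, exonCdfTable$species)
  if (is.na(row)) {
    stop("no exon-array CDF listed for species: ", species)
  }
  exonCdfTable[row, level]
}

cdfName <- defaultExonCdf("Exon ANOVA Analysis", "Mouse")
```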

List of manually installed CDF environments


Species      Environments
Arabidopsis  ATH1121501_At_TAIRG, ATH1121501_At_TAIRT, atsschiptilingprobes, atsschipallprobes, atsschipathprobes, Atdschip_expr




Manually generated annotation files

  • rice.txt: for the Affymetrix Rice chip. Holds annotations extracted from the Affymetrix annotation file. The column "Entrez Gene ID" does not hold the Entrez Gene ID but a mixture of GenBank and TIGR IDs extracted from the "Target Description" column. This was done in order to have a usable gene column for GO analysis.

Preprocessing


Affymetrix


  • 3'-IVT arrays, if a CDF file is available: B-Fabric uses the Affymetrix Power Tools implementation to run the RMA and MAS5 algorithms
  • Exon arrays: B-Fabric uses Bioconductor's RMA implementation


KEGG Pathway analysis


We use the mapping of full species names to the three-character KEGG organism acronyms:
ftp://ftp.genome.jp/pub/kegg/genes/taxonomy

Arrays such as the Arabidopsis ATH1 array are not supported because we don't have NCBI gene ID annotations for them.


Created by akal. Last Modification: Tuesday August 30, 2011 09:21:27 CEST by hubert.