Two Group Analysis (Microarray)

Preprocessing

Normalization

For the differential expression computation, B-Fabric by default applies the quantile normalization implemented in the Bioconductor package preprocessCore to make the samples comparable.
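
As a rough illustration of the idea behind quantile normalization, the following Python sketch replaces each value by the mean of the values that share its rank across all samples. It is not the preprocessCore implementation: the function name `quantile_normalize` and the sample values are made up for this example, and ties are broken by position, which may differ from how preprocessCore treats tied values.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of equal length).

    Simplified sketch: tied values are ranked by their position.
    """
    n_probes = len(samples[0])
    # Sort each sample and average the values found at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples)
        for r in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered from the smallest to the largest signal.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples with three probes each.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every sample contains the same set of values, only assigned to different probes, which is what makes the samples comparable.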

Probe Selection

For completeness, the result table contains p-values for all probes. However, for probes whose signal was not above background, the p-values are not meaningful.

A probe is considered Present (above background) for the comparison if it is considered as Present in at least 50% of the samples of one condition. Whether a probe satisfies this criterion is reported in the result table in the column isPresent.
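
The criterion can be sketched in Python as follows. This is only an illustration: the function name, the condition labels and the per-sample Present calls are hypothetical, and how the per-sample calls themselves are obtained is not covered here.

```python
def is_present_for_comparison(present_calls, conditions, min_fraction=0.5):
    """Return True if the probe is Present in at least `min_fraction`
    of the samples of at least one condition."""
    by_condition = {}
    for call, cond in zip(present_calls, conditions):
        by_condition.setdefault(cond, []).append(call)
    return any(
        sum(calls) / len(calls) >= min_fraction
        for calls in by_condition.values()
    )


# Hypothetical design: three treated and three control samples.
conditions = ["treated"] * 3 + ["control"] * 3

# Present in 2 of 3 treated samples: passes the 50% criterion.
probe_a = is_present_for_comparison(
    [True, True, False, False, False, False], conditions)

# Present in only 1 of 3 treated samples and no control sample: fails.
probe_b = is_present_for_comparison(
    [True, False, False, False, False, False], conditions)

print(probe_a, probe_b)  # True False
```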

The FDR is only computed for those probes that are present! For the remaining probes the FDR is set to NA.

It is recommended to follow up only those genes whose probes have a valid FDR.

Following up probes whose FDR is set to NA carries an increased risk of pursuing false positive changes. However, using probes with small changes may be warranted if the effect on gene expression is expected to be small (e.g. because of cell mixtures or low dosages).

Hypothesis Test Methods

The method that is used to compute the significance of the differential expression is indicated by the Method parameter in the resulting html report.

Available methods are:

limma: Uses the limma package in Bioconductor
paired limma: Uses the limma package in Bioconductor with a paired model
t-test: Student's t-test
paired t-test: Paired Student's t-test
Wilcox: Wilcoxon's rank-sum test

The methods above compute the p-values. The false discovery rate (FDR) reported in the result table is then computed from these p-values with the Benjamini-Hochberg algorithm.
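
To make the FDR column easier to interpret, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up adjustment, restricted to present probes as described in the Probe Selection section. It is not the code B-Fabric runs; the function name `bh_fdr` and the example p-values are made up, and `None` stands in for NA.

```python
def bh_fdr(p_values, is_present):
    """Benjamini-Hochberg adjusted p-values for present probes only.

    Probes that are not present get None (standing in for NA).
    """
    idx = [i for i, present in enumerate(is_present) if present]
    m = len(idx)
    fdr = [None] * len(p_values)
    ordered = sorted(idx, key=lambda i: p_values[i])  # ascending p-value
    # Walk from the largest to the smallest p-value and keep a running
    # minimum, so the adjusted values never decrease with the rank.
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = ordered[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        fdr[i] = running_min
    return fdr


# Four hypothetical probes; the last one is not present.
fdr = bh_fdr([0.01, 0.04, 0.03, 0.5], [True, True, True, False])
print(fdr)
```

In this example the absent probe does not count towards the number of tests, so the three present probes are adjusted as a set of three.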

Result File Format

The result is generated as a tab-separated text file that can be loaded into Excel. Make sure that the annotation columns are imported in "text" format; otherwise Excel may convert some gene symbols into dates or round integer Gene IDs and chromosomal coordinates.
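
If the file is read programmatically instead, the same precaution applies: keep annotation columns as text and convert only the numeric columns explicitly. The Python sketch below uses the standard `csv` module, which leaves every field as a string. The file content is a made-up stand-in with only a few of the columns, and the spellings `TRUE`/`FALSE` and `NA` are assumptions about how the values appear in the file.

```python
import csv
import io

# A tiny stand-in for a result file; the real file has more columns.
tsv_text = (
    "Probe Identifier\tGene Symbol\tisPresent\tpValue\tfdr\n"
    "probe_1\tMARCH1\tTRUE\t0.001\t0.02\n"
    "probe_2\tSEPT2\tFALSE\t0.3\tNA\n"
)


def to_float_or_none(text):
    """Convert a numeric field, mapping NA-like markers to None."""
    return None if text in ("NA", "N/A", "") else float(text)


rows = []
for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
    # Annotation fields such as the gene symbol stay untouched strings.
    row["pValue"] = to_float_or_none(row["pValue"])
    row["fdr"] = to_float_or_none(row["fdr"])
    row["isPresent"] = row["isPresent"] == "TRUE"
    rows.append(row)

print(rows[0]["Gene Symbol"], rows[1]["fdr"])  # MARCH1 None
```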

The columns of the file are from left to right:

Column Names	Column Values
Probe Identifier	ID of the microarray probe
Optional columns like: Gene Symbol, ...	Annotation
IsControl	whether the probe is designed to be a control for hybridization, labeling or gridding
log2 Signal	the average log2 hybridization signals of the two conditions compared
isPresent	whether the probe signal was above background in at least one of the conditions
log2 Ratio	log2 of the expression ratio of the two conditions
ratio	ratio of the average expression in the two conditions
pValue	significance value computed by the hypothesis test
fdr	false discovery rate (FDR) associated with the set of probes with this or higher significance. The FDR is "N/A" for probes where isPresent is FALSE and for probes that showed only a small amount of variation and fold-change in the entire data set
Avg of ....	average log2 expression values of the two conditions compared
column names that are sample names	the columns hold the normalized log2 expression of the probe in that sample
(Optional) Cluster	If clustering was performed the color of the cluster the probe is part of. If a probe has an empty string here, the probe was not used for clustering

How to select candidate genes from the result table

The probes/genes in the result table are sorted according to p-value. If you want to select "promising" candidate genes, you should follow these rules:

do not use probes where the "IsControl" column shows TRUE, these are probes that served as hybridization or other controls
do not use probes where the "isPresent" column shows FALSE; these are probes whose signal did not rise well above the background signal in any of the conditions. See the Probe Selection section for information on how the present status was computed
do filter genes based on p-value, a commonly accepted choice is p=0.05. Do also record the FDR that you obtain for this choice. This is the maximal value in the FDR column, after filtering away everything with p>0.05; Reviewers will probably ask for this FDR value.
do filter genes based on log-ratio, a commonly accepted choice is a log2 ratio greater than 1 or below -1.
to be more stringent you may also want to remove low expressors. This would be in addition to the isPresent filtering. Depending on the platform, "low expressed" genes are those with log2 expression values of roughly 2 to 5.
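
The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the rows are hypothetical and already parsed into dictionaries keyed by the column names of the result table, with `None` standing in for a missing FDR; the function name and its parameters are made up for this example.

```python
def select_candidates(rows, p_cutoff=0.05, min_abs_log_ratio=1.0,
                      min_log2_signal=None):
    """Apply the selection rules; return (candidates, reported_fdr)."""
    # Rules 1-3: drop controls, drop absent probes, filter on p-value.
    passed = [
        r for r in rows
        if not r["IsControl"] and r["isPresent"] and r["pValue"] <= p_cutoff
    ]
    # The FDR to report is the largest FDR left after the p-value filter.
    reported_fdr = max(
        (r["fdr"] for r in passed if r["fdr"] is not None), default=None)
    # Rule 4: require a log2 ratio above the cutoff in absolute value.
    candidates = [
        r for r in passed if abs(r["log2 Ratio"]) > min_abs_log_ratio
    ]
    # Optional rule 5: remove low expressors.
    if min_log2_signal is not None:
        candidates = [
            r for r in candidates if r["log2 Signal"] >= min_log2_signal
        ]
    return candidates, reported_fdr


def make_row(probe, control, present, p, fdr, log_ratio, signal):
    return {"Probe Identifier": probe, "IsControl": control,
            "isPresent": present, "pValue": p, "fdr": fdr,
            "log2 Ratio": log_ratio, "log2 Signal": signal}


rows = [
    make_row("p1", False, True, 0.001, 0.01, 2.1, 9.0),
    make_row("p2", False, True, 0.03, 0.12, -1.4, 7.5),
    make_row("p3", False, True, 0.04, 0.15, 0.3, 8.0),   # change too small
    make_row("p4", False, False, 0.002, None, 3.0, 3.0),  # not present
    make_row("p5", True, True, 0.0001, 0.005, 4.0, 12.0),  # control probe
    make_row("p6", False, True, 0.2, 0.4, 1.8, 6.0),      # p-value too large
]

candidates, reported_fdr = select_candidates(rows)
print([r["Probe Identifier"] for r in candidates], reported_fdr)
```

Here the FDR is taken before the log-ratio filter, following the rule that it is the maximal FDR remaining after the p-value cutoff.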