NGS Two Group Analysis

Computes differential expression for transcripts or genes (see options below). The term feature refers to either of them.

Assumes expression estimates for transcripts as input.

Normalization

The tests operate on the raw counts and apply their own normalization/scaling scheme.

Feature Selection

For completeness, the result table contains p-values for all features.

A feature is considered as expressed or Present (above background) for the comparison if it is considered as Present in at least 50% of the samples of one condition. Whether a feature satisfies this criterion is reported in the result table in the column isPresent.

The FDR is only computed for those features that are present! For the remaining features the FDR is set to NA.

It is recommended to further follow up only those features that are present!

Relying on features that are not Present implies an increased risk of following up false positive changes.
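
The Present criterion described above can be illustrated with a short Python sketch. It is not the tool's implementation: the function name is hypothetical, and it assumes that a sample counts as Present when its expression value is at or above the sigThresh option (default 25).

```python
def is_present(values_by_condition, sig_thresh=25.0, min_fraction=0.5):
    """Return True if the feature is Present in at least `min_fraction`
    of the samples of at least one condition.

    values_by_condition maps a condition name to the list of expression
    values of one feature in the samples of that condition.
    Assumption: a sample is Present if its value is >= sig_thresh.
    """
    for values in values_by_condition.values():
        if not values:
            continue  # a condition without samples cannot satisfy the rule
        n_present = sum(1 for v in values if v >= sig_thresh)
        if n_present / len(values) >= min_fraction:
            return True
    return False


# Present in 2 of 4 samples of condition A, so the feature is Present.
flag = is_present({"A": [30, 40, 5, 0], "B": [0, 0, 0, 0]})
```

Note that one condition is enough: a feature that is switched off completely in the other condition is still flagged as Present.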

Hypothesis Test Methods

The method that is used to compute the significance of the differential expression is indicated by the Method parameter in the resulting html report.

Available methods are:

glm: Uses the glm method implemented in the edgeR Bioconductor package (runs paired test if Pairing is specified)
edger: Uses the exactTest method implemented in the edgeR Bioconductor package (runs paired test if Pairing is specified)
deseq: Uses the nbinomTest method implemented in the DESeq Bioconductor package
sam: Uses the SAMseq method implemented in the samr R package

The above methods compute the p-values and we use Benjamini-Hochberg's algorithm to compute the false discovery rate (FDR) in the result table.
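
As an illustration, the following Python sketch applies the Benjamini-Hochberg adjustment only to the features flagged as present and leaves the others without an FDR. It is a minimal sketch under stated assumptions, not the code used by the analysis: the function name is hypothetical and missing FDR values are represented by None.

```python
def bh_fdr_present_only(p_values, present):
    """Benjamini-Hochberg FDR computed over the present features only.

    p_values: list of p-values, one per feature.
    present:  list of booleans (the isPresent flag), same length.
    Returns a list with the FDR for present features and None otherwise.
    """
    kept = [i for i, keep in enumerate(present) if keep]
    m = len(kept)  # number of tests that enter the correction
    fdr = [None] * len(p_values)
    # Feature indices ordered by ascending p-value.
    order = sorted(kept, key=lambda i: p_values[i])
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # are monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        fdr[i] = running_min
    return fdr


fdr = bh_fdr_present_only([0.01, 0.04, 0.03, 0.5], [True, True, False, True])
```

Because the third feature is not present, it does not count towards the number of tests, and the correction for the remaining features is less severe than it would be over all four.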

Options

Supported options are:

testMethod=<glm|edger|deseq|samr> default: glm
featureLevel=<gene|transcript> default: gene — whether to compute differential expression for genes or transcripts (gene_id or transcript_id in the gtf file). If this is set to gene, the expression values of the transcripts belonging to a gene will be summed up.
geneColumn=<column name> default: gene_id — the column in the count file that holds the gene id; this column is used to merge transcripts into genes
useSigThresh=<true|false> default: true — whether features with an expression below a threshold should be considered absent
sigThresh=<N> default: 25 — the expression below which a feature is considered absent. Make sure to adapt this to your type of expression value
columnName=<name of the column in the input file that holds the expression counts>. If unset, any of the following will be tried: multiMatchCounts, transcriptExpressionPosteriorEstimate, transcriptCountPosteriorEstimate, transcriptCount
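
The summing of transcript values for featureLevel=gene can be sketched in Python as follows. This is only an illustration: the function name and the row representation (one dict per transcript) are assumptions, and the default column names are taken from the options above.

```python
from collections import defaultdict


def sum_transcripts_to_genes(rows, gene_column="gene_id",
                             count_column="transcriptCount"):
    """Sum the expression values of all transcripts that share a gene id.

    rows: iterable of dicts, one per transcript, as read from a count file.
    Returns a dict mapping gene id to the summed expression value.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[gene_column]] += float(row[count_column])
    return dict(totals)


gene_counts = sum_transcripts_to_genes([
    {"transcript_id": "t1", "gene_id": "g1", "transcriptCount": "10"},
    {"transcript_id": "t2", "gene_id": "g1", "transcriptCount": "5.5"},
    {"transcript_id": "t3", "gene_id": "g2", "transcriptCount": "3"},
])
```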

Q: What are the allowed options for "Transcript Class"? Is it a mandatory parameter? (MO, 29.08.2013)

For further options determining appearance of plots etc. see Analysis Parameters.

Result

The result is generated as a tab-separated text file that can be loaded into Excel. You must make sure that annotation columns are loaded in "text" format; otherwise Excel may convert some gene symbols into dates or round integer gene IDs or chromosomal coordinates!
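
Outside of Excel, the same pitfall can be avoided by reading every column as text and converting only the known numeric columns. The Python sketch below uses the standard csv module; the function name is hypothetical, the numeric column names are taken from the table in this section, and treating "NA" and "N/A" as missing values is an assumption.

```python
import csv
import io

# Columns that hold numbers; everything else stays text.
NUMERIC_COLUMNS = ("log2 Ratio", "ratio", "pValue", "fdr")


def read_result_table(handle):
    """Read a tab-separated result table, keeping annotation columns as text."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for name in NUMERIC_COLUMNS:
            text = row.get(name)
            if text is None:
                continue  # column not present in this file
            # Assumption: missing values are written as "", "NA" or "N/A".
            row[name] = None if text in ("", "NA", "N/A") else float(text)
        rows.append(row)
    return rows


example = io.StringIO("ID\tgene_name\tpValue\tfdr\nt1\tSEPT2\t0.01\tNA\n")
table = read_result_table(example)
```

The gene symbol SEPT2 stays a string here, whereas a spreadsheet import with automatic type detection may turn it into a date.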

Column Names	Column Values
ID	feature identifier
Optional columns like: gene_id, ...	Annotation
isPresent	whether the feature was expressed in at least one of the conditions
log2 Ratio	log2 of the expression ratio of the two conditions
ratio	ratio of the average expression in the two conditions
pValue	significance value computed by the hypothesis test; significance values are computed for each feature even if it is not classified as expressed (see isPresent); significance values should be treated with care if the expression of the feature was extremely low
fdr	false discovery rate (FDR) associated with the set of features with this or higher significance. The FDR is "N/A" for features where isPresent is FALSE because the FDR is only evaluated if the feature was actually present
column names that are sample names	the columns hold the normalized expression values of the genomic features in that sample
(Optional) Cluster	If clustering was performed, the color of the cluster the feature is part of. If a feature has an empty string here, it was not used for clustering