B-Fabric Application: Map Reads


Map Reads calls an R method that in turn calls the actual mapper. Currently implemented methods are:

Important notes:
  • When mapping paired-end data, provide only the R1 reads! If you have R2 reads selected, click the Back button and deselect them.
  • Stranded RNA-seq: In the "Options" text field you have to provide strandMode=sense or strandMode=antisense; this sets the mapper-specific strand options.
    • The strandMode is translated to the library-type of the tuxedo suite apps as follows:
      • strandMode=both == --library-type fr-unstranded
      • strandMode=sense == --library-type fr-firststrand
      • strandMode=antisense == --library-type fr-secondstrand
  • Transcriptclass field: If you specify "foo", the file "foo.gtf" in your build directory is used to define genes and the corresponding exon junctions. The default entry is "genes", but you can place other gene definition gtf files in your build folder.
When setting Application options, do not open other pages in different tabs. If you do, your job will not be submitted.
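
The strandMode translation listed in the notes above can be written as a small lookup table. This is only a sketch: the function name is hypothetical, and the choice of "both" as the default when no strandMode is given is an assumption, not something B-Fabric documents here.

```python
# Lookup table mirroring the strandMode translation listed in the notes.
STRAND_MODE_TO_LIBRARY_TYPE = {
    "both": "fr-unstranded",
    "sense": "fr-firststrand",
    "antisense": "fr-secondstrand",
}


def library_type_args(strand_mode="both"):
    """Return the tuxedo-suite arguments for a strandMode value.

    The default of "both" is an assumption made for this sketch.
    """
    try:
        library_type = STRAND_MODE_TO_LIBRARY_TYPE[strand_mode]
    except KeyError:
        raise ValueError(f"unknown strandMode: {strand_mode!r}")
    return ["--library-type", library_type]


print(library_type_args("sense"))
```

An unknown value raises an error instead of silently falling back to an unstranded run.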

B-Fabric provides the following options:
  • Name: a suffix that will be appended to the file name
  • Method: one of the methods above
  • Options:
    • strandMode=<both|sense|antisense>
    • Additionally, if the references are not in the reference directory, or if a custom reference should be used, the references can be set explicitly by specifying:
      • ref=<full path to reference genome index to be used>
      • gtf=<full path to gtf file to be used>
    • Everything else in the Options field is passed as is to the mapper
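
To illustrate how such an Options string might be split into the reserved settings and the arguments passed through to the mapper, here is a minimal sketch. The function name, the whitespace-based tokenization, and the example path are assumptions; the actual R method may parse the field differently.

```python
import shlex

# Keys that the Options field treats specially, according to the list above.
RESERVED_KEYS = ("strandMode", "ref", "gtf")


def split_options(option_text):
    """Separate reserved key=value settings from mapper pass-through arguments."""
    settings = {}
    passthrough = []
    for token in shlex.split(option_text):
        key, sep, value = token.partition("=")
        if sep and key in RESERVED_KEYS:
            settings[key] = value
        else:
            # Anything that is not a reserved key is handed to the mapper unchanged.
            passthrough.append(token)
    return settings, passthrough


settings, mapper_args = split_options(
    "strandMode=sense ref=/path/to/index --very-sensitive-local -N 0"
)
print(settings)
print(mapper_args)
```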

Details


B-Fabric starts the mapper in a subdirectory of /scratch/bfabric on the executing node.

The methods do the following steps:
  1. Copy the reads from gstore into the local directory
    1. gunzip (if files are .gz files)
    2. do filtering/trimming with prinseq-lite
  2. Build index if needed
  3. Run the mapper with as many threads as the job has slots (if the mapper supports this)
  4. If needed, convert the alignments to an indexed, sorted BAM file
  5. Generate a text file .mapstat that holds some statistics of the mapping
  6. Edit the header of the BAM file and add the number of input reads as a comment, like @CO INPUTREADCOUNT:<N>
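
The header edit in the last step can be pictured on the plain-text SAM header. The following is a sketch only: the function names are hypothetical, it works on header text rather than on a BAM file, and it assumes the comment is stored as a standard tab-separated @CO line.

```python
def add_input_read_count(header_text, n_reads):
    """Append an @CO comment line with the input read count to a SAM header."""
    lines = header_text.splitlines()
    # Assumption: tag and comment are separated by a tab, as in other SAM header lines.
    lines.append(f"@CO\tINPUTREADCOUNT:{n_reads}")
    return "\n".join(lines) + "\n"


def read_input_read_count(header_text):
    """Return the stored input read count, or None if the comment is absent."""
    prefix = "@CO\tINPUTREADCOUNT:"
    for line in header_text.splitlines():
        if line.startswith(prefix):
            return int(line[len(prefix):])
    return None


header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
new_header = add_input_read_count(header, 12345)
print(read_input_read_count(new_header))
```

Storing the count in the header lets downstream tools compute the mapping rate from the BAM file alone.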

B-Fabric then places the .bam, .bai, and .mapstat files back in gstore.

Bowtie 2

The sensitive option is --very-sensitive-local, which is the same as:
-D 20 -R 3 -N 0 -L 20 -i S,1,0.50
These settings mean:
  • -D 20: number of consecutive failed seed extension attempts that are accepted before the mapper moves on
  • -R 3: number of reseeding attempts, in case the current seeds yield too many hits
  • -N 0: number of mismatches in the seed region
  • -L 20: seed length
  • -i S,1,0.50: interval between seed starts will be 1 + 0.5 * sqrt(read length)
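
As a worked example of the interval formula, the sketch below evaluates 1 + 0.5 * sqrt(read length) for a given read length. The function name is hypothetical, and the result is clamped to at least 1, which matches how the Bowtie 2 manual describes results below 1.

```python
import math


def seed_interval(read_length, constant=1.0, coefficient=0.5):
    """Evaluate the -i S,<constant>,<coefficient> interval for a read length."""
    interval = constant + coefficient * math.sqrt(read_length)
    # Results below 1 are rounded up to 1.
    return max(1.0, interval)


print(seed_interval(100))  # 6.0
```

For 100 bp reads the seeds therefore start about every 6 bases; longer reads get wider intervals.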

TopHat

Options for sensitive mapping:
--initial-read-mismatches 3 --segment-mismatches 3 --closure-search --coverage-search --microexon-search