RGASP Round 3: RNAseq Read Alignment Assessment

One of the lessons learned from rounds 1 and 2 of the project was that the initial read alignment step has a major influence on the quality of the resulting gene predictions. Therefore, a third round of RGASP is planned to focus primarily on read mapping to the genome. It will run at the end of 2010, with a workshop in Barcelona in early 2011.

The project is run as part of the "Sequence Mapping and Assembly Assessment Project (SMAAP)", a collaborative effort to compare and evaluate methods and strategies for de novo genome assembly (dnGASP) and RNASeq read alignment (RGASP3) using data from second generation sequencing platforms.

Both projects will culminate in a workshop held in Barcelona on April 4-7, 2011. This meeting will be organized in partnership with the International Center for Scientific Debate (CIDC), an initiative fostered by Biocat together with 'la Caixa' Foundation Welfare Projects.

RGASP3 is organised by Paul Bertone (EBI) with input from the Wellcome Trust Sanger Institute and the CRG. [Contact]

Goal of RGASP 3

The principal aim of the RGASP3 project is to allow a fair evaluation of the different analysis methods used within the community to generate high-quality RNASeq read alignments that can be used for efficient transcriptome characterization (transcript discovery and quantitation).

Time line

  • Submission cut-off date: March 1, 2011
  • Workshop in Barcelona: April 4-7, 2011 (dnGASP: April 5-6, RGASP3: April 6-7)

Source input data

Login information for the private ftp site hosting the data will be provided upon registration. The data consists of:

  1. mouse whole brain RNASeq data (David Adams lab, WTSI/UK)

    paired-end Illumina 76bp reads, insert sizes 175-225 bp

  2. K562 cell line (human chronic myelogenous leukemia) RNASeq data (Tom Gingeras lab, CSHL/USA)

    a) whole cell, b) nuclear fraction, and c) cytosolic fraction

    paired-end Illumina 76bp reads

  3. Simulated human RNA-seq data (Gregory Grant, University of Pennsylvania)

    paired-end Illumina 76bp reads, mean insert size 225 bp. Quality scores are not simulated.

This data is not to be used for publications without written permission of the providers.

Guidelines for RGASP3 participation

General procedure

  1. Register your group
  2. Fetch the input data mentioned above
  3. Create your alignments following the rules below
  4. Submit your data to the "submission/your_id" directory at the above ftp server before the cut-off date. "your_id" is the personal subdirectory generated for you after registration.
  5. Supply a description of your methods with the submission.
  6. The workshop will give us the opportunity to discuss the methods and results, and potentially also allow an ad hoc re-analysis of a small new test data set.

Input data

  • Software should only use the reference genomes applicable to each dataset (human: GRCh37/hg19, mouse: NCBI m37) and RNA-seq data supplied on the FTP site as input.
  • Please ignore haplotype data and use only the reference genome.
  • Ideally you should use the files from each organism or cell line to produce read alignments and mapping statistics.

Data for submission

  • Programs should output complete mapping of sequencing reads to the reference genome in BAM format.
  • Putative splice junction mappings may optionally be included as a separate submission. Submissions may be based on single-end data, paired-end data, or both.
  • We encourage participants to submit predictions for all datasets.
  • Along with program output, participants should include notes listing the methods and input data used.
  • The parameters used for each run must be included with submissions.
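
Since alignments are expected in BAM format, a quick sanity check before upload may be useful. The following is a minimal sketch, not an official RGASP validator: it relies only on the fact that a BAM file is BGZF-compressed (and therefore readable as gzip) and that its decompressed content begins with the four magic bytes "BAM\1". The function name looks_like_bam and the file name in the usage note are hypothetical, and passing this check does not guarantee that the rest of the file is well formed.

```python
import gzip

# First four bytes of a decompressed BAM file, as defined by the SAM/BAM format.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if the file decompresses as gzip/BGZF and starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Missing file, not gzip-compressed, or truncated.
        return False
```

For example, looks_like_bam("my_alignments.bam") returns False for a plain-text SAM file that was renamed to .bam by mistake.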

Evaluation

  • Evaluation will be genome-wide, using the GENCODE annotation.
  • Submitted alignments will be processed, in a uniform manner, by RNA-seq analysis tools to assess the impact of mapping performance on detected genes and transcripts.
  • Evaluation metrics will be standard gene prediction assessment metrics: sensitivity, specificity and correlation coefficient at the nucleotide, exon, transcript and gene level.
  • There will be an evaluation at the level of exon splice boundaries. This will assess the ability of each program to place short reads across splice junctions.
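
The nucleotide-level metrics named above can be illustrated with a small sketch. It assumes the conventions commonly used in gene prediction assessment, where sensitivity is TP/(TP+FN), "specificity" is TP/(TP+FP), and the correlation coefficient is computed from the four nucleotide counts; whether the RGASP evaluation uses exactly these definitions is an assumption. The function name nucleotide_metrics and the toy coordinates are hypothetical.

```python
import math


def nucleotide_metrics(annotated, predicted, genome_length):
    """Compare two sets of genomic positions and return (sensitivity, specificity, cc).

    annotated / predicted: sets of positions covered by the annotation and by the
    prediction. genome_length: total number of positions considered.
    """
    tp = len(annotated & predicted)   # covered by both
    fp = len(predicted - annotated)   # predicted only
    fn = len(annotated - predicted)   # annotated only
    tn = genome_length - tp - fp - fn  # covered by neither

    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    # "Specificity" in the gene prediction sense: fraction of predicted bases that are annotated.
    specificity = tp / (tp + fp) if tp + fp else nan
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else nan
    return sensitivity, specificity, cc


# Toy example: a 100-base annotated exon and a prediction shifted by 50 bases.
sn, sp, cc = nucleotide_metrics(set(range(100, 200)), set(range(150, 250)), 1000)
```

The same counting scheme extends to the exon, transcript and gene levels by comparing sets of features instead of sets of positions.
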

The evaluation and analysis team consists of:

  • Paer Engstroem, Tamara Steijger, Paul Bertone (EBI)
  • Botond Sipos, Nick Goldman (EBI)
  • Greg Grant (Penn)

Conditions:

  • You agree that the submitted predictions will be evaluated by the RGASP team qualitatively and quantitatively.
  • You agree that the RGASP team may publish the results of these evaluations and your prediction sets both in a journal and on the web.
  • You agree to share the details of your method used with all participants after the submission.
  • You certify that your alignments will be generated using only the data provided and the methods you describe.
  • Submissions may be updated before the deadline, but after this date, submissions may not be updated for any reason.
  • Submissions not provided in a validated format and on the requested genome assembly cannot be evaluated.
 
RGASP3 workshop sponsors


 

This site is hosted by the Wellcome Trust Sanger Institute.