RGASP Round 3: RNA-seq Read Alignment Assessment

One of the lessons learned from rounds 1 & 2 of the project was that the initial read alignment step has a major influence on the quality of the resulting gene predictions. Therefore, a third round of RGASP was conducted to focus primarily on read mapping to the genome.

The project was related to the "Sequence Mapping and Assembly Assessment Project (SMAAP)", a collaborative effort to compare and evaluate methods and strategies for de novo genome assembly (dnGASP) and RNA-seq read alignment (RGASP3) using data from second-generation sequencing platforms.

RGASP3 is organised by Paul Bertone (EBI) with input from the Wellcome Trust Sanger Institute and the CRG.

Goal of RGASP 3

The principal aim of the RGASP3 project is to allow an unbiased evaluation of the analysis methods used within the community to generate high-quality RNA-seq read alignments for efficient transcriptome characterization (transcript discovery and quantitation). A total of 26 spliced alignment protocols based on 11 programs and pipelines were evaluated on alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery, and suitability of alignments for transcript reconstruction. These results will be published in a forthcoming paper and additional data will be posted here.

Source input data

  1. Mouse whole brain RNA-seq data (David Adams lab, WTSI/UK)

    Paired-end Illumina 76 bp reads, insert sizes 175-225 bp

  2. K562 cell line (human chronic myelogenous leukemia) RNA-seq data (Tom Gingeras lab, CSHL/USA)

    a) whole cell, b) nuclear fraction, and c) cytosolic fraction

    Paired-end strand-specific Illumina 76 bp reads

  3. Simulated human RNA-seq data (Gregory Grant, University of Pennsylvania)

    Paired-end Illumina 76 bp reads, mean insert size 225 bp

Guidelines for RGASP3 participation

General procedure

  1. Register your group
  2. Fetch the input data mentioned above
  3. Create your alignments following the rules below
  4. Submit your data to the "submission/your_id" directory on the FTP server before the cut-off date. "your_id" is the personal subdirectory created for you after registration.
  5. Supply a description of your methods with the submission.
  6. The workshop will give us the opportunity to discuss the methods and results, and potentially to run an ad hoc re-analysis of a small new test data set.

Input data

  • Software should only use the reference genomes applicable to each dataset (human: GRCh37/hg19, mouse: NCBI m37) and RNA-seq data supplied on the FTP site as input.
  • Please ignore haplotype data and use only the reference genome.
  • Ideally you should use the files from each organism or cell line to produce read alignments and mapping statistics.

Data for submission

  • Programs should output a complete mapping of the sequencing reads to the reference genome in BAM format.
  • Putative splice junction mappings may optionally be included as a separate submission. Submissions may be based on single-end data, paired-end data, or both.
  • We encourage participants to submit predictions for all datasets.
  • Along with program output, participants should include notes listing the methods and input data used.
  • The parameters used for each run must be included with submissions.
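
Putative splice junctions of the kind mentioned above can be read directly from spliced alignments. The following is a minimal sketch, not the RGASP procedure: it assumes each alignment is described by its 1-based leftmost reference position and a CIGAR string as defined in the SAM specification, where the `N` operation marks a skipped reference region (an intron). The function name `junctions_from_cigar` is hypothetical.

```python
import re

# CIGAR operations that advance the position on the reference
# (M, D, N, = and X in the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(start, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive.

    `start` is the 1-based leftmost mapped reference position (SAM POS).
    Each N operation in `cigar` yields one putative intron.
    """
    junctions = []
    pos = start
    for length, op in CIGAR_TOKEN.findall(cigar):
        length = int(length)
        if op == "N":
            # The skipped region starts at the current reference position.
            junctions.append((pos, pos + length - 1))
        if op in REF_CONSUMING:
            pos += length
    return junctions


# A 76 bp read split as 30 + 46 bases across a 500 bp intron.
print(junctions_from_cigar(1000, "30M500N46M"))
```

Soft clips and insertions do not advance the reference position, so they leave the reported intron coordinates unchanged.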

Evaluation

  • Evaluation will be genome-wide using the GENCODE annotation.
  • Submitted alignments will be processed, in a uniform manner, by RNA-seq analysis tools to assess the impact of mapping performance on detected genes and transcripts.
  • Evaluation metrics will be standard gene prediction assessment metrics: sensitivity, specificity and correlation coefficient at the nucleotide, exon, transcript and gene level.
  • There will be an evaluation at the level of exon splice boundaries. This will assess the ability of each program to place short reads across splice junctions.
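
As an illustration of the metrics named above, here is a minimal nucleotide-level sketch. It assumes the convention common in gene prediction assessment, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP), which other fields call precision. Whether RGASP uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import math


def nucleotide_counts(annotated, predicted, genome_length):
    """Count TP, FP, TN, FN over genome positions given as sets of ints."""
    tp = len(annotated & predicted)
    fp = len(predicted - annotated)
    fn = len(annotated - predicted)
    tn = genome_length - tp - fp - fn
    return tp, fp, tn, fn


def accuracy_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, correlation coefficient).

    Assumed convention: specificity = TP / (TP + FP).
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sensitivity, specificity, cc


# Toy example: a 1000 bp "genome" with half-overlapping 100 bp regions.
annotated = set(range(100, 200))
predicted = set(range(150, 250))
counts = nucleotide_counts(annotated, predicted, 1000)
print(counts, accuracy_metrics(*counts))
```

The same three formulas apply at the exon, transcript and gene levels once a rule for what counts as a correct prediction at that level has been chosen.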

The evaluation and analysis team consists of:
  • Paer Engstroem, Tamara Steijger, Paul Bertone (EBI)
  • Botond Sipos, Nick Goldman (EBI)
  • Greg Grant (Penn)

Conditions:

  • You agree that the submitted predictions will be evaluated by the RGASP team qualitatively and quantitatively.
  • You agree that the RGASP team may publish the results of these evaluations and your prediction sets both in a journal and on the web.
  • You agree to share the details of your method used with all participants after the submission.
  • You certify that your alignments will be generated using only the data provided and the methods you describe.
  • Submissions may be updated before the deadline, but after this date, submissions may not be updated for any reason.
  • Submissions not provided in a validated format and on the requested genome assembly cannot be evaluated.
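
Before uploading, participants may want a quick local check that a file is at least recognisable as BAM. The sketch below is not the validation used by the RGASP team, which is not described here. It relies only on two facts about the format: BAM files are BGZF-compressed, which standard gzip readers can decompress, and the decompressed stream begins with the four bytes `BAM\x01`. The function name `looks_like_bam` is hypothetical.

```python
import gzip

# First four bytes of a decompressed BAM stream.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if `path` decompresses as gzip and starts with the BAM magic.

    This is only a sanity check on the container; it does not validate
    the header, the reference names, or any alignment record.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip-compressed, unreadable, or truncated.
        return False
```

A file that passes this check can still be malformed or aligned to the wrong assembly, so it complements rather than replaces a full format validator.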
 