RNA-seq Genome Annotation Assessment Project (1/2)

Data Sets

Usage restrictions: the data should not be used in publications without written permission; see https://www.genome.gov/ENCODE/#3.

Round 1

A readme file is available on our FTP site.

  1. Illumina fastq files:
    Insert sizes: 200 nucleotides for paired reads, 100 nucleotides for single reads.
    The 151-nt sequences consist of two 75-nt reads (2x75); the last nucleotide is discarded.
    • Human polyA+ total RNA, single reads, K562
    • Human polyA+ total RNA, single reads, GM12878
    • Human polyA+ total RNA, paired reads, K562
    • Human polyA+ total RNA, paired reads, GM12878
    • Human polyA+ cytosolic RNA, single reads, stranded, K562

    FTP download

  2. SOLiD cfasta files:
    • Human cytosolic long polyA+, K562
    • Human cytosolic long polyA+, GM12878

    FTP download

  3. Helicos fasta file:
    • Human cytosolic long polyA+, K562

    FTP download

  4. modENCODE Drosophila data:
    • fastq files from cell lines S2-DRSC, CME_W1_CI, Kc167, ML-DmBG3-c2

    FTP download

  5. modENCODE C.elegans data:
    • fastq files from 6 different stages

    FTP download
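The 151-nt layout described for the Round 1 Illumina paired reads can be split back into two mates with a short script. The Python sketch below assumes standard 4-line FASTQ records and that mate 1 occupies bases 1-75 and mate 2 bases 76-150, stored as sequenced. Neither the mate order nor the strand of mate 2 is stated here, so check the readme on the FTP site before relying on it. The function names are illustrative and not part of any project tooling.

```python
from typing import Iterable, Iterator, Tuple

READ_LEN = 75  # each mate is 75 nt; the 151st base is the discarded one

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of a file
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header.strip()}") from None
        yield header.rstrip("\n"), seq, qual


def split_concatenated(seq: str, qual: str, read_len: int = READ_LEN):
    """Split one (2 * read_len + 1)-nt record into two mates, dropping the last base.

    Assumption: mate 1 is the first read_len bases and mate 2 the next
    read_len bases; no reverse-complementing is applied here.
    """
    expected = 2 * read_len + 1
    if len(seq) != expected or len(qual) != expected:
        raise ValueError(f"expected {expected} nt, got {len(seq)}")
    mate1 = (seq[:read_len], qual[:read_len])
    mate2 = (seq[read_len:2 * read_len], qual[read_len:2 * read_len])
    return mate1, mate2
```

Because `read_fastq` accepts any iterable of lines, an open file handle can be passed to it directly, and each record's sequence and quality then go to `split_concatenated` to produce the two mates.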

Round 2

A readme file and the data are available on our FTP site.

  1. experiment: Homo sapiens polyA+ total RNA, paired reads, HepG2
    • lab: Wold lab, Caltech
    • format: fastq, tar archive with bzipped files
    • other details: 75mer sequences, the last base has been removed
    • _1 & _2 are the corresponding pairs
    • includes spike-in sequences for quantification
    • quality scores use Sanger (Phred+33) rather than Illumina encoding
    • fragment length is 200 bp with a standard deviation of 34 bp
  2. experiment: Caenorhabditis elegans polyA+ total RNA, paired reads, L3 phase
    • lab: Sternberg lab/Wold lab, Caltech
    • format: fastq, tar archive with bzipped files
    • other details: 75mer sequences, the last base has been removed
    • _1 & _2 are the corresponding pairs
    • includes spike-in sequences for quantification
    • quality scores use Sanger (Phred+33) rather than Illumina encoding
    • fragment length is 165 bp with a standard deviation of 28 bp
  3. experiment: Drosophila melanogaster polyA+ total RNA, paired reads, L3 stage larvae
    • lab: Celniker lab, Lawrence Berkeley National Laboratory
    • format: fastq, tar archive with gzipped files
    • other details: 76mer sequences
    • _1 & _2 are the corresponding pairs
    • produced on an Illumina Genome Analyzer II
    • fragment length is 250-300bp
    • low quality reads have been filtered out
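For the Round 2 archives, here is a minimal Python sketch of three routine steps: opening a bzip2- or gzip-compressed FASTQ file, walking the _1 and _2 files in lockstep, and decoding Sanger (Phred+33) quality strings. It also converts the stated fragment lengths into the mate inner distance that some paired-end aligners ask for, assuming 75 nt per mate for the HepG2 and C. elegans data. The read-name convention (identical names with an optional trailing /1 or /2) and all function names are assumptions, and the quality encoding of the Drosophila set is not stated above.

```python
import bz2
import gzip
from itertools import zip_longest
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)
SANGER_OFFSET = 33  # Phred+33, as stated for the HepG2 and C. elegans data


def open_text(path: str) -> TextIO:
    """Open a plain, .bz2 or .gz FASTQ file in text mode."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def phred_scores(qual: str, offset: int = SANGER_OFFSET) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - offset for c in qual]


def mate_name(header: str) -> str:
    """Read name without '@', trailing comment, or an assumed /1 or /2 suffix."""
    parts = header.lstrip("@").split()
    name = parts[0] if parts else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def iter_pairs(recs_1: Iterable[Record],
               recs_2: Iterable[Record]) -> Iterator[Tuple[Record, Record]]:
    """Yield (mate1, mate2) from the _1 and _2 files, checking they stay in sync."""
    for r1, r2 in zip_longest(recs_1, recs_2):
        if r1 is None or r2 is None:
            raise ValueError("_1 and _2 files have different record counts")
        if mate_name(r1[0]) != mate_name(r2[0]):
            raise ValueError(f"mates out of sync: {r1[0]} vs {r2[0]}")
        yield r1, r2


def mate_inner_distance(fragment_len: int, read_len: int) -> int:
    """Unsequenced gap between mates: fragment length minus both read lengths."""
    return fragment_len - 2 * read_len


# Values from the list above, assuming 75 nt per mate:
# HepG2:      mate_inner_distance(200, 75) == 50, standard deviation 34 bp
# C. elegans: mate_inner_distance(165, 75) == 15, standard deviation 28 bp
```

Since the read length is fixed, subtracting it does not change the spread, so the standard deviation of the inner distance equals that of the fragment length.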

Spike-in Data for Quantification

To provide more precise controls for (human) RNA-Seq quantification, we will test control sequences of defined concentrations in the NanoString experiments for the Wold lab datasets (fastq files 1-4).

A FASTA file containing the spike-in sequences is available (download).

Please make sure you submit your quantification for these as well!
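The metric for spike-in quantification is not fixed in this section. As one common choice, and purely as an assumption, the Python sketch below reads the spike-in FASTA to get each sequence's length and converts per-spike-in read counts into RPKM (reads per kilobase per million mapped reads). The read counts and the total number of mapped reads are assumed to come from your own alignment; the function names are hypothetical.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    return lengths


def rpkm(counts: Dict[str, int], lengths: Dict[str, int],
         total_mapped: int) -> Dict[str, float]:
    """RPKM = reads * 1e9 / (sequence length in nt * total mapped reads).

    Spike-ins with no observed reads get 0.0; zero-length records are skipped.
    """
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {
        name: counts.get(name, 0) * 1e9 / (length * total_mapped)
        for name, length in lengths.items()
        if length > 0
    }
```

An open handle on the downloaded spike-in FASTA can be passed straight to `fasta_lengths`; comparing the resulting values against the defined input concentrations gives a quick check of quantification linearity.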