GENCODE

Release 19 (GRCh37.p13)


GTF / GFF3 files

Content Regions Description Download
Comprehensive gene annotation CHR
  • It contains the comprehensive gene annotation on the reference chromosomes only
  • This is the main annotation file for most users
GTF   GFF3
Comprehensive gene annotation ALL
  • It contains the comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes)
  • This is a superset of the main annotation file
GTF
Long non-coding RNA gene annotation CHR
  • It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes
  • This is a subset of the main annotation file
GTF
PolyA feature annotation CHR
  • It contains the polyA features (polyA_signal, polyA_site, pseudo_polyA) manually annotated by HAVANA on the reference chromosomes
  • This dataset does not form part of the main annotation file
GTF
Consensus pseudogenes predicted by the Yale and UCSC pipelines CHR
  • 2-way consensus (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines, but not by HAVANA, on the reference chromosomes
  • This dataset does not form part of the main annotation file
GTF
Predicted tRNA genes CHR
  • tRNA genes predicted by Ensembl on the reference chromosomes using tRNAscan-SE
  • This dataset does not form part of the main annotation file
GTF
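
As a rough illustration of how the GTF files above might be read, the following is a minimal sketch that assumes the standard GTF layout of nine tab-separated columns, with the last column holding `key "value";` attribute pairs. The function name `parse_gtf_line` and the sample record (including its identifiers) are made up for illustration and are not taken from the release.

```python
import re

# One GTF attribute: key "quoted value"; or key bare_value;
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^";\s]+))\s*;')


def parse_gtf_line(line):
    """Parse one GTF feature line into a dict (hypothetical helper).

    Returns None for blank lines and '#' comment/header lines.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    for match in _ATTR_RE.finditer(attr_text):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # A key may occur more than once on a line, so keep a list per key.
        attributes.setdefault(key, []).append(value)
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Made-up record for illustration only; not a real annotation entry.
sample = "\t".join([
    "chr1", "HAVANA", "transcript", "100", "500", ".", "+", ".",
    'gene_id "GENE0001.1"; transcript_id "TX0001.1"; gene_type "lincRNA"; '
    'level 2; tag "basic"; tag "example";',
])
record = parse_gtf_line(sample)
print(record["seqname"], record["start"], record["end"])
print(record["attributes"]["gene_type"], record["attributes"]["tag"])
```

Keeping a list per attribute key is a deliberate choice here: a plain dict would silently drop all but the last value when a key repeats on one line.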

Fasta files

Content Regions Description Download
Protein-coding transcript sequences CHR
  • Nucleotide sequences of coding transcripts on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Protein-coding transcript translation sequences CHR
  • Amino acid sequences of coding transcript translations on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Long non-coding RNA transcript sequences CHR
  • Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes
Fasta
Genome sequence (GRCh37.p13) ALL
  • Nucleotide sequence of the GRCh37.p13 genome assembly for all regions, including reference chromosomes, scaffolds, assembly patches and haplotypes
  • The sequence region names are the same as in the GTF/GFF3 files
Fasta
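
Since the sequence region names in the genome Fasta file are stated to match those in the GTF/GFF3 files, a reader may want to collect them from the Fasta headers. The sketch below is one possible way to do that with a plain FASTA reader. It assumes the region name is the first whitespace-delimited token of each header line; the helper names `read_fasta` and `region_name` and the sample sequences are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def region_name(header):
    """Return the first whitespace-delimited token of a header (assumed region name)."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


# Made-up sequences for illustration only.
example_lines = [">chr1 example", "ACGT", "NNAC", ">chrM", "GATTACA"]
for header, sequence in read_fasta(example_lines):
    print(region_name(header), len(sequence))
```

For a compressed download, the same generator could be fed from `gzip.open(path, "rt")`, where `path` is whatever local file name the reader has chosen.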

Metadata files

Content Regions Description Download
Annotation remarks ALL
  • Remarks made during the manual annotation of the transcript
Metadata
Exon annotation evidence ALL
  • Piece of evidence used in the annotation of an exon (usually peptides, mRNAs, ESTs)
Metadata
Gene source ALL
  • Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model or imported in the case of small RNA and mitochondrial genes)
Metadata
Gene symbol ALL
  • HGNC approved gene symbol (from Ensembl xref pipeline)
Metadata
PDB id ALL
  • PDB entries associated with the transcript (from Ensembl xref pipeline)
Metadata
PolyA features ALL
  • Manually annotated polyA features overlapping the transcript 3'-end
Metadata
PubMed id ALL
  • PubMed IDs of publications associated with the transcript (from the HGNC website)
Metadata
RefSeq ALL
  • RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline)
Metadata
Selenocysteine ALL
  • Amino acid position of a selenocysteine residue in the transcript
Metadata
SwissProt ALL
  • UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline)
Metadata
Transcript source ALL
  • Source of the transcript annotation
Metadata
Transcript annotation evidence ALL
  • Piece of evidence used in the annotation of the transcript
Metadata
TrEMBL ALL
  • UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline)
Metadata
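
The metadata files above each attach extra information to transcripts. A minimal sketch of loading one of them into a lookup table follows. It assumes each file is tab-separated with the transcript identifier in the first column and the associated values in the remaining columns; that layout, the helper name `load_metadata` and the sample rows are assumptions for illustration and should be checked against the actual files.

```python
from collections import defaultdict


def load_metadata(lines):
    """Map transcript id -> list of value tuples (hypothetical helper).

    Assumes tab-separated rows with the transcript id in the first column.
    """
    table = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        transcript_id, values = fields[0], tuple(fields[1:])
        # One transcript may have several rows, e.g. several cross-references.
        table[transcript_id].append(values)
    return dict(table)


# Made-up rows for illustration only.
rows = ["TX0001.1\tSYMBOL_A\n", "TX0001.1\tSYMBOL_B\n", "TX0002.1\tSYMBOL_C\n"]
symbols = load_metadata(rows)
print(symbols["TX0001.1"])
```

Storing a list of rows per transcript avoids losing information when a transcript has more than one associated entry in the same file.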

    Statistics of this Gencode version are found here.

    Format description of the GTFs is found here.

    More information about the GRCh37.p13 patches/scaffolds/haplotypes can be found here.

 
This site is hosted by the Wellcome Trust Sanger Institute.