GENCODE

Release 7 (GRCh37)


GTF files

Content Regions Description Download
Comprehensive gene annotation CHR
  • It contains the comprehensive gene annotation on the reference chromosomes only
  • This is the main annotation file for most users
GTF
Comprehensive gene annotation CHR
  • It contains the comprehensive gene annotation on the reference chromosomes only
  • Updated main file: replaced ncRNA_host biotypes with ncRNA_host attributes
GTF
Long non-coding RNA gene annotation CHR
  • It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes
  • This is a subset of the main annotation file
GTF
PolyA feature annotation CHR
  • It contains the polyA features (polyA_signal, polyA_site, pseudo_polyA) manually annotated by HAVANA on the reference chromosomes
  • This dataset does not form part of the main annotation file
GTF
Consensus pseudogenes predicted by the Yale and UCSC pipelines CHR
  • 2-way consensus (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines, but not by HAVANA, on the reference chromosomes
  • This dataset does not form part of the main annotation file
GTF
Predicted tRNA genes CHR
  • tRNA genes predicted by Ensembl on the reference chromosomes using tRNAscan-SE
  • This dataset does not form part of the main annotation file
GTF
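
The GTF files listed above can be read with a few lines of Python. The following is a minimal sketch, not GENCODE's own tooling: it assumes the standard nine tab-separated GTF columns and a final attribute column of `key value;` pairs. The names `GtfRecord`, `parse_attributes`, `parse_gtf_line` and `iter_gtf` are hypothetical helpers introduced here for illustration.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class GtfRecord(NamedTuple):
    """One GTF line, split into its nine columns (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: Dict[str, List[str]]


def parse_attributes(field: str) -> Dict[str, List[str]]:
    """Parse the attribute column into a dict of lists.

    Values are kept in lists because a key may appear more than once.
    Assumption: attribute values do not themselves contain ';'.
    """
    attrs: Dict[str, List[str]] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs.setdefault(key, []).append(value.strip().strip('"'))
    return attrs


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one data line into a GtfRecord; raise if it lacks nine columns."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    return GtfRecord(
        seqname=cols[0],
        source=cols[1],
        feature=cols[2],
        start=int(cols[3]),
        end=int(cols[4]),
        score=cols[5],
        strand=cols[6],
        frame=cols[7],
        attributes=parse_attributes(cols[8]),
    )


def iter_gtf(lines: Iterable[str]) -> Iterator[GtfRecord]:
    """Yield records from an iterable of lines, skipping comments and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_gtf_line(line)
```

With an open file handle `fh` for one of the downloads, something like `[r for r in iter_gtf(fh) if r.feature == "polyA_site"]` would select a single feature type from the polyA file. The exact attribute keys present in each file are described in the GTF format description and are not assumed here.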

Fasta files

Content Regions Description Download
Protein-coding transcript sequences CHR
  • Nucleotide sequences of coding transcripts on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Protein-coding transcript translation sequences CHR
  • Amino acid sequences of coding transcript translations on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
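
The two Fasta files can be read with a small generator. This is a sketch under the assumption that they follow the usual FASTA convention (a `>` header line followed by one or more sequence lines). The function name `read_fasta` is hypothetical, and the header is returned as-is because its internal layout for this release is not described on this page.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is returned without the leading '>' and is not interpreted.
    Sequence lines belonging to one record are concatenated.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

For example, `{h: len(s) for h, s in read_fasta(fh)}` gives the length of every transcript or translation in an open file `fh`.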

Metadata files

Content Regions Description Download
Exon annotation evidence CHR
  • Piece of evidence used in the annotation of an exon (usually peptides, mRNAs, ESTs)
Metadata
Gene source CHR
  • Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model, or imported in the case of small RNA and mitochondrial genes)
Metadata
Gene symbol CHR
  • HGNC approved gene symbol (from Ensembl xref pipeline)
Metadata
PDB id CHR
  • PDB entries associated with the transcript (from Ensembl xref pipeline)
Metadata
PubMed id CHR
  • PubMed IDs of publications associated with the transcript (from HGNC website)
Metadata
RefSeq CHR
  • RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline)
Metadata
SwissProt CHR
  • UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline)
Metadata
Transcript source CHR
  • Source of the transcript annotation
Metadata
Transcript annotation evidence CHR
  • Piece of evidence used in the annotation of the transcript
Metadata
TrEMBL CHR
  • UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline)
Metadata
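
The metadata files can be loaded into a lookup table keyed by identifier. The layout used here is an assumption and is not stated on this page: each line is taken to be tab-separated, with a gene, transcript or exon identifier in the first column and the associated value(s) in the remaining columns. The function name `load_metadata` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def load_metadata(lines: Iterable[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """Group metadata rows by the identifier in their first column.

    Assumption: rows are tab-separated and the first column is the identifier.
    An identifier may occur on several lines, so each key maps to a list of
    value tuples in input order.
    """
    table: Dict[str, List[Tuple[str, ...]]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        table.setdefault(cols[0], []).append(tuple(cols[1:]))
    return table
```

Two such tables, for example gene symbol and RefSeq, can then be combined per transcript with ordinary dictionary lookups such as `symbols.get(tid, [])`, where `symbols` and `tid` are placeholder names.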

    Statistics of this GENCODE version are found here.

    Format description of the GTFs is found here.

 
This site is hosted by the Wellcome Trust Sanger Institute.