GENCODE

Release 27 (mapped to GRCh37)


GTF / GFF3 files

Content Regions Description Download
Comprehensive gene annotation CHR
  • It contains the comprehensive gene annotation originally created on the GRCh38 reference chromosomes, mapped to the GRCh37 primary assembly with gencode-backmap
  • This is the main annotation file for most users
  • Note that automated annotation ('ENSEMBL') was not mapped to GRCh37 in this release. The corresponding annotation was obtained from GENCODE 19
  • Also note that some manually annotated ('HAVANA') genes did not map properly to GRCh37. Their annotation was copied from GENCODE 19 where available; otherwise they are absent from this file. The unmapped gene annotation can be found here (gtf, gff3)
GTF  GFF3
Basic gene annotation CHR
  • It contains the basic gene annotation on the reference chromosomes only
  • This is a subset of the corresponding comprehensive annotation that keeps, for every gene, only the transcripts tagged as 'basic'
GTF   GFF3
Long non-coding RNA gene annotation CHR
  • It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes
  • This is a subset of the main annotation file
GTF   GFF3
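
The relationship between the comprehensive and basic files can be illustrated with a short Python sketch that collects the transcripts tagged as 'basic' from GTF lines. It assumes the usual nine tab-separated GTF columns, with attributes written as key "value"; pairs in the last column and the tag stored as tag "basic";. The function names and the sample records are hypothetical and are not taken from the release files.

```python
import re

# Matches `key "value";` or `key value;` pairs in GTF column 9.
ATTR_RE = re.compile(r'(\S+) (?:"([^"]*)"|([^";]+));')


def parse_attributes(column9):
    """Return a dict mapping each attribute key to the list of its values."""
    attrs = {}
    for match in ATTR_RE.finditer(column9):
        key = match.group(1)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Keys such as `tag` can occur several times on one line.
        attrs.setdefault(key, []).append(value)
    return attrs


def basic_transcript_ids(lines):
    """Yield the transcript_id of every 'transcript' feature tagged 'basic'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = parse_attributes(fields[8])
        if "basic" in attrs.get("tag", []):
            yield attrs["transcript_id"][0]


# Made-up records for illustration only (assumed layout, hypothetical ids).
sample = [
    "##description: example\n",
    'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\t'
    'gene_id "G1.1"; transcript_id "T1.1"; level 2; tag "basic";\n',
    'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\t'
    'gene_id "G1.1"; transcript_id "T2.1"; level 2;\n',
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\t'
    'gene_id "G1.1"; transcript_id "T1.1"; tag "basic";\n',
]
basic_ids = list(basic_transcript_ids(sample))
```

For a downloaded file, the same generator can be fed an open file handle (for example from gzip.open(path, "rt") if the file is compressed) instead of the sample list.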

Fasta files

Content Regions Description Download
Transcript sequences CHR
  • Nucleotide sequences of all transcripts on the reference chromosomes
Fasta
Protein-coding transcript sequences CHR
  • Nucleotide sequences of coding transcripts on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Protein-coding transcript translation sequences CHR
  • Amino acid sequences of coding transcript translations on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Long non-coding RNA transcript sequences CHR
  • Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes
Fasta
Genome sequence, primary assembly (GRCh37) PRI
  • Nucleotide sequence of the GRCh37 primary genome assembly (chromosomes and scaffolds)
  • The sequence region names are the same as in the GTF/GFF3 files
Fasta
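
The transcript biotype list given for the protein-coding files uses wildcard patterns (IG_*_gene, TR_*_gene). As a minimal sketch, assuming the asterisk stands for any run of characters, a biotype string can be checked against that list with shell-style matching in Python. The names CODING_BIOTYPE_PATTERNS and is_coding_biotype are hypothetical helpers, and the biotype strings passed in are only examples.

```python
from fnmatch import fnmatchcase

# Patterns copied from the biotype list above; `*` is treated as a wildcard
# (an assumption about how the list is meant to be read).
CODING_BIOTYPE_PATTERNS = (
    "protein_coding",
    "nonsense_mediated_decay",
    "non_stop_decay",
    "IG_*_gene",
    "TR_*_gene",
    "polymorphic_pseudogene",
)


def is_coding_biotype(biotype):
    """Return True if the biotype matches any of the listed patterns."""
    return any(fnmatchcase(biotype, pattern) for pattern in CODING_BIOTYPE_PATTERNS)
```

fnmatchcase is used rather than fnmatch so that the comparison stays case-sensitive on every platform.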

Metadata files

Content Regions Description Download
Annotation remarks CHR
  • Remarks made during the manual annotation of the transcript
Metadata
Entrez gene ids CHR
  • Entrez gene ids associated with GENCODE transcripts (from Ensembl xref pipeline)
Metadata
Gene source CHR
  • Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model, or imported in the case of small RNA and mitochondrial genes)
Metadata
Gene symbol CHR
  • HGNC approved gene symbol (from Ensembl xref pipeline)
Metadata
PDB id CHR
  • PDB entries associated with the transcript (from Ensembl xref pipeline)
Metadata
PubMed id CHR
  • PubMed ids of publications associated with the transcript (from HGNC website)
Metadata
RefSeq CHR
  • RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline)
Metadata
SwissProt CHR
  • UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline)
Metadata
Transcript source CHR
  • Source of the transcript annotation
Metadata
Transcript annotation evidence CHR
  • Piece of evidence used in the annotation of the transcript
Metadata
TrEMBL CHR
  • UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline)
Metadata
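
Since each metadata file associates an identifier with a value, the files can be loaded into a lookup table. The following Python sketch assumes a tab-separated layout with the identifier in the first column and the associated value in the second, and allows several rows per identifier. That layout is an assumption to verify against the downloaded file, and load_metadata is a hypothetical helper name.

```python
from collections import defaultdict


def load_metadata(lines):
    """Map each first-column id to the list of its second-column values.

    Assumes tab-separated rows; rows with fewer than two columns are skipped.
    """
    mapping = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        mapping[fields[0]].append(fields[1])
    return dict(mapping)


# Made-up rows for illustration only (hypothetical ids and values).
example = load_metadata(["T1.1\tA\n", "T1.1\tB\n", "T2.1\tC\n", "\n"])
```

Keeping a list per identifier avoids silently dropping values when one transcript has more than one associated entry.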

    The format of the GTF files is described here. A description of the mapping attributes in the GTF/GFF3 files can be found here.

    More information about the GRCh37 primary assembly can be found here.

 
This site is hosted by the Wellcome Trust Sanger Institute.