Human

Release 44 (GRCh37)

GTF / GFF3 files

Content Regions Description Download
Comprehensive gene annotation CHR
  • It contains the comprehensive gene annotation originally created on the GRCh38 reference chromosomes, mapped to the GRCh37 primary assembly with gencode-backmap
  • Note that automated annotation ('ENSEMBL') was not mapped to GRCh37 in this release. The corresponding annotation was obtained from GENCODE 19
  • Also note that some manually annotated ('HAVANA') genes did not map properly to GRCh37. Their annotation was copied from GENCODE 19 where available; otherwise they are absent entirely. A summary of the gene annotation mapping is also available
GTF GFF3
Basic gene annotation CHR
  • It contains the basic gene annotation on the reference chromosomes only
  • This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene
  • This is the main annotation file for most users
GTF GFF3
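
The 'basic' subset described above can in principle be recovered from the comprehensive GTF by looking at the tag attribute. The following is a minimal sketch, assuming the usual nine-column, tab-separated GTF layout with `key "value";` pairs in the last column; the helper names and the example records are made up for illustration and are not taken from the release files.

```python
import re
from collections import defaultdict

# Matches one `key value;` pair; values may be quoted ("basic") or bare (2).
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^";]+?))\s*;')


def parse_attributes(field):
    """Parse GTF column 9 into {key: [values]}; keys such as `tag` can repeat."""
    attrs = defaultdict(list)
    for match in _ATTR_RE.finditer(field):
        key, quoted, bare = match.groups()
        attrs[key].append(quoted if quoted is not None else bare)
    return dict(attrs)


def basic_transcript_ids(lines):
    """Yield transcript_id for every `transcript` feature tagged 'basic'."""
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "transcript":
            continue
        attrs = parse_attributes(cols[8])
        if "basic" in attrs.get("tag", []):
            yield attrs["transcript_id"][0]


# Made-up records for illustration only (hypothetical IDs and coordinates).
example = [
    "##description: illustrative header\n",
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; level 2; tag "basic"; tag "CCDS";\n',
    'chr1\tHAVANA\ttranscript\t100\t700\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T2"; level 2;\n',
    'chr1\tHAVANA\texon\t100\t300\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; tag "basic";\n',
]
print(list(basic_transcript_ids(example)))
```

For real files the same generator could be fed from `gzip.open(path, "rt")`. Most users will not need this, since the basic annotation is already provided as its own download.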
Long non-coding RNA gene annotation CHR
  • It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes
GTF GFF3

Fasta files

Content Regions Description Download
Transcript sequences CHR
  • Nucleotide sequences of all transcripts on the reference chromosomes
Fasta
Protein-coding transcript sequences CHR
  • Nucleotide sequences of coding transcripts on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene, protein_coding_LoF
Fasta
Protein-coding transcript translation sequences CHR
  • Amino acid sequences of coding transcript translations on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene, protein_coding_LoF
Fasta
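
The biotype list above contains wildcard entries (IG_*_gene, TR_*_gene). As a rough sketch, assuming `*` is meant as a shell-style wildcard, membership in that list could be tested as follows. The function and constant names are hypothetical.

```python
from fnmatch import fnmatchcase

# Biotype patterns copied from the list above; `*` is treated as a
# shell-style wildcard (an assumption about the intended meaning).
CODING_BIOTYPE_PATTERNS = (
    "protein_coding",
    "nonsense_mediated_decay",
    "non_stop_decay",
    "IG_*_gene",
    "TR_*_gene",
    "polymorphic_pseudogene",
    "protein_coding_LoF",
)


def is_coding_biotype(biotype):
    """Return True if `biotype` matches any pattern, case-sensitively."""
    return any(fnmatchcase(biotype, pattern) for pattern in CODING_BIOTYPE_PATTERNS)


print(is_coding_biotype("IG_V_gene"))        # True
print(is_coding_biotype("IG_V_pseudogene"))  # False
print(is_coding_biotype("lncRNA"))           # False
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.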
Long non-coding RNA transcript sequences CHR
  • Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes
Fasta
Genome sequence, primary assembly (GRCh37) PRI
  • Nucleotide sequence of the GRCh37 primary genome assembly (chromosomes and scaffolds)
  • The sequence region names are the same as in the GTF/GFF3 files
Fasta
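
Because the sequence region names are stated to match those in the GTF/GFF3 files, a quick consistency check before running downstream tools can catch a mismatched pair of downloads. This is a minimal sketch, assuming the region name is the first whitespace-delimited token of each FASTA header and the first tab-separated column of each GTF line; all function names and the example data are hypothetical.

```python
def fasta_region_names(lines):
    """Collect the first whitespace-delimited token of each FASTA header."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_region_names(lines):
    """Collect column 1 of every non-comment, non-blank GTF line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def regions_missing_from_fasta(gtf_lines, fasta_lines):
    """Region names used in the annotation but absent from the genome."""
    return gtf_region_names(gtf_lines) - fasta_region_names(fasta_lines)


# Made-up data for illustration only.
fasta = [">chr1 1\n", "ACGT\n", ">chrM MT\n", "ACGT\n"]
gtf = [
    "##format: gtf\n",
    "chr1\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id \"G1\";\n",
    "chr2\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id \"G2\";\n",
]
print(regions_missing_from_fasta(gtf, fasta))  # {'chr2'}
```

An empty result means every annotated region has a sequence; the reverse is not required, since the primary assembly also includes scaffolds that may carry no annotation.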

Metadata files

Content Regions Description Download
Annotation remarks CHR
  • Remarks made during the manual annotation of the transcript
Metadata
Entrez gene ids CHR
  • Entrez Gene IDs associated with GENCODE transcripts (from Ensembl xref pipeline)
Metadata
Gene source CHR
  • Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model, or imported in the case of small RNA and mitochondrial genes)
Metadata
Gene symbol CHR
  • HGNC-approved gene symbol (from Ensembl xref pipeline)
Metadata
PDB id CHR
  • PDB entries associated with the transcript (from Ensembl xref pipeline)
Metadata
PubMed id CHR
  • PubMed IDs of publications associated with the transcript (from HGNC website)
Metadata
RefSeq CHR
  • RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline)
Metadata
SwissProt CHR
  • UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline)
Metadata
Transcript source CHR
  • Source of the transcript annotation
Metadata
Transcript annotation evidence CHR
  • Evidence used in the annotation of the transcript
Metadata
TrEMBL CHR
  • UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline)
Metadata
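
The metadata files above all relate transcripts to external identifiers or remarks. The sketch below shows one way such a file might be loaded into a lookup table. It assumes, without confirmation from this page, that each file is tab-separated, has no header row, and carries the transcript ID in its first column. The function name and the example rows are hypothetical.

```python
import csv
from collections import defaultdict


def load_metadata(lines):
    """Map transcript ID -> list of tuples holding the remaining columns.

    Assumes tab-separated rows with the transcript ID in column 1
    (an assumption, not something stated on this page).
    """
    table = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        table[row[0]].append(tuple(row[1:]))
    return dict(table)


# Made-up rows for illustration only (hypothetical IDs).
rows = [
    "T1\tXREF_A\n",
    "T1\tXREF_B\n",
    "T2\tXREF_C\n",
]
metadata = load_metadata(rows)
print(metadata["T1"])  # [('XREF_A',), ('XREF_B',)]
```

Values are kept as lists because a transcript can have several associated entries, for example more than one PubMed ID.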