GENCODE

Release 26 (GRCh38.p10)

A GRCh37 version of this release is also available.

GTF / GFF3 files

Content Regions Description Download
Comprehensive gene annotation CHR
  • It contains the comprehensive gene annotation on the reference chromosomes only
  • This is the main annotation file for most users
GTF   GFF3
Comprehensive gene annotation ALL
  • It contains the comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes)
  • This is a superset of the main annotation file
GTF   GFF3
Comprehensive gene annotation PRI
  • It contains the comprehensive gene annotation on the primary assembly (chromosomes and scaffolds) sequence regions
  • This is a superset of the main annotation file
GTF   GFF3
Basic gene annotation CHR
  • It contains the basic gene annotation on the reference chromosomes only
  • This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene
GTF   GFF3
Basic gene annotation ALL
  • It contains the basic gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes)
  • This is a subset of the corresponding comprehensive annotation, including only those transcripts tagged as 'basic' in every gene
GTF   GFF3
Long non-coding RNA gene annotation CHR
  • It contains the comprehensive gene annotation of lncRNA genes on the reference chromosomes
  • This is a subset of the main annotation file
GTF   GFF3
PolyA feature annotation CHR
  • It contains the polyA features (polyA_signal, polyA_site, pseudo_polyA) manually annotated by HAVANA on the reference chromosomes
  • This dataset does not form part of the main annotation file
GTF   GFF3
Consensus pseudogenes predicted by the Yale and UCSC pipelines CHR
  • 2-way consensus (retrotransposed) pseudogenes predicted by the Yale and UCSC pipelines, but not by HAVANA, on the reference chromosomes
  • This dataset does not form part of the main annotation file
GTF   GFF3
Predicted tRNA genes CHR
  • tRNA genes predicted by Ensembl on the reference chromosomes using tRNAscan-SE
  • This dataset does not form part of the main annotation file
GTF   GFF3
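
The "basic" files above are described as keeping only the transcripts tagged as 'basic'. The sketch below shows one way such a filter could be applied to GTF lines. It is an illustration under stated assumptions, not the procedure used to build the release files: it assumes the ninth GTF column stores attributes as key "value"; pairs, that the tag is written as tag "basic";, and that features below transcript level carry the same tag. The function names and the example records (G1, T1, T2) are hypothetical.

```python
import re

# Assumption: quoted attributes look like `key "value";`. Unquoted numeric
# attributes (for example `level 2;`) are skipped by this pattern.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Map each attribute key to the list of its values (keys may repeat)."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, []).append(value)
    return attrs


def keep_basic(lines):
    """Yield comment lines, gene lines, and features tagged 'basic'."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        tags = parse_attributes(cols[8]).get("tag", [])
        if cols[2] == "gene" or "basic" in tags:
            yield line


# Hypothetical records, for illustration only.
example = [
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; level 2; tag "basic";\n',
    'chr1\tHAVANA\ttranscript\t200\t800\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T2"; level 2;\n',
]
kept = list(keep_basic(example))
```

Here kept holds the gene line and the T1 transcript line; the untagged T2 transcript is dropped.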

Fasta files

Content Regions Description Download
Transcript sequences CHR
  • Nucleotide sequences of all transcripts on the reference chromosomes
Fasta
Protein-coding transcript sequences CHR
  • Nucleotide sequences of coding transcripts on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Protein-coding transcript translation sequences CHR
  • Amino acid sequences of coding transcript translations on the reference chromosomes
  • Transcript biotypes: protein_coding, nonsense_mediated_decay, non_stop_decay, IG_*_gene, TR_*_gene, polymorphic_pseudogene
Fasta
Long non-coding RNA transcript sequences CHR
  • Nucleotide sequences of long non-coding RNA transcripts on the reference chromosomes
Fasta
Genome sequence (GRCh38.p10) ALL
  • Nucleotide sequence of the GRCh38.p10 genome assembly version on all regions, including reference chromosomes, scaffolds, assembly patches and haplotypes
  • The sequence region names are the same as in the GTF/GFF3 files
Fasta
Genome sequence, primary assembly (GRCh38) PRI
  • Nucleotide sequence of the GRCh38 primary genome assembly (chromosomes and scaffolds)
  • The sequence region names are the same as in the GTF/GFF3 files
Fasta
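
The transcript biotype lists above include the wildcard entries IG_*_gene and TR_*_gene. As a small sketch, the function below tests whether a biotype string falls in that list, reading the asterisk as a shell-style wildcard. That reading is an assumption, and the names CODING_BIOTYPE_PATTERNS and is_coding_biotype are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the biotype list above; `*` is assumed to be a
# shell-style wildcard that matches any run of characters.
CODING_BIOTYPE_PATTERNS = [
    "protein_coding",
    "nonsense_mediated_decay",
    "non_stop_decay",
    "IG_*_gene",
    "TR_*_gene",
    "polymorphic_pseudogene",
]


def is_coding_biotype(biotype):
    """Return True if the biotype matches any listed pattern (case-sensitive)."""
    return any(fnmatchcase(biotype, pattern) for pattern in CODING_BIOTYPE_PATTERNS)
```

With these patterns, is_coding_biotype("IG_V_gene") is true, while is_coding_biotype("IG_V_pseudogene") is false because the name does not end in "_gene".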

Metadata files

Content Regions Description Download
Annotation remarks ALL
  • Remarks made during the manual annotation of the transcript
Metadata
Entrez gene ids ALL
  • Entrez gene IDs associated with GENCODE transcripts (from Ensembl xref pipeline)
Metadata
Exon annotation evidence ALL
  • Evidence used in the annotation of an exon (usually peptides, mRNAs, ESTs)
Metadata
Gene source ALL
  • Source of the gene annotation (Ensembl, Havana, Ensembl-Havana merged model, or imported in the case of small RNA and mitochondrial genes)
Metadata
Gene symbol ALL
  • HGNC approved gene symbol (from Ensembl xref pipeline)
Metadata
PDB id ALL
  • PDB entries associated with the transcript (from Ensembl xref pipeline)
Metadata
PolyA features ALL
  • Manually annotated polyA features overlapping the transcript 3'-end
Metadata
PubMed id ALL
  • PubMed IDs of publications associated with the transcript (from HGNC website)
Metadata
RefSeq ALL
  • RefSeq RNA and/or protein associated with the transcript (from Ensembl xref pipeline)
Metadata
Selenocysteine ALL
  • Amino acid position of a selenocysteine residue in the transcript
Metadata
SwissProt ALL
  • UniProtKB/SwissProt entry associated with the transcript (from Ensembl xref pipeline)
Metadata
Transcript source ALL
  • Source of the transcript annotation
Metadata
Transcript annotation evidence ALL
  • Evidence used in the annotation of the transcript
Metadata
TrEMBL ALL
  • UniProtKB/TrEMBL entry associated with the transcript (from Ensembl xref pipeline)
Metadata
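
Each metadata file above associates extra information with transcripts. The sketch below shows one way to load such a file into a lookup table. The layout is an assumption not stated on this page: tab-separated text with the transcript identifier in the first column, where a transcript may appear on several rows. The function name load_metadata and the identifiers TX1 and TX2 are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_metadata(handle):
    """Group the remaining columns of each row by the first-column identifier.

    Assumption: the input is tab-separated with a transcript identifier
    in column one; quoting is disabled so quote characters stay literal.
    """
    table = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]].append(tuple(row[1:]))
    return dict(table)


# Hypothetical content, for illustration only.
demo = io.StringIO("TX1\tvalueA\nTX1\tvalueB\nTX2\tvalueC\n")
records = load_metadata(demo)
```

For a compressed download, the same function could be given a text handle from gzip.open(path, "rt"), where path is whichever metadata file was downloaded.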

    Statistics for this GENCODE version are found here.

    Format description of the GTFs is found here.

    More information about the GRCh38.p10 patches/scaffolds/haplotypes can be found here.

 
This site is hosted by the Wellcome Trust Sanger Institute.