GENCODE

Gene/Transcript Biotypes in GENCODE & Ensembl

Please also compare to the VEGA descriptions.

Further details about the annotation of non-coding RNAs are listed on this Ensembl page.

GENCODE GTF format description.
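As a rough sketch of how the biotypes listed below are usually read from an annotation file, the following Python snippet pulls the `gene_type` and `transcript_type` keys out of the attribute column of one GTF line. The helper name `parse_gtf_attributes` and the example line (identifiers, coordinates) are made up for illustration and are not taken from the GENCODE documentation.

```python
import re

# Matches `key "value";` or `key value;` pairs in the GTF attribute column.
_ATTR_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^";\s]+))\s*;')


def parse_gtf_attributes(line):
    """Return the attribute column of one GTF line as a dict.

    Hypothetical helper: repeated keys (e.g. `tag`) keep only their
    last value in this simplified sketch.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    attrs = {}
    for match in _ATTR_RE.finditer(fields[8]):
        key, quoted, bare = match.group(1), match.group(2), match.group(3)
        attrs[key] = quoted if quoted is not None else bare
    return attrs


# Made-up example line, for illustration only.
example = "\t".join([
    "chr1", "HAVANA", "transcript", "1000", "5000", ".", "+", ".",
    'gene_id "GENE_A"; transcript_id "TX_A1"; gene_type "protein_coding"; '
    'transcript_type "nonsense_mediated_decay"; level 2;',
])
attrs = parse_gtf_attributes(example)
print(attrs["gene_type"], attrs["transcript_type"])
```

A gene and its transcripts can carry different biotypes, which is why both keys are read separately here.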


IG_C_gene
IG_D_gene
IG_J_gene
IG_LV_gene
IG_V_gene
TR_C_gene
TR_J_gene
TR_V_gene
TR_D_gene
Immunoglobulin (Ig) variable chain and T-cell receptor (TcR) genes imported or annotated according to the IMGT.
IG_pseudogene
IG_C_pseudogene
IG_J_pseudogene
IG_V_pseudogene
TR_V_pseudogene
TR_J_pseudogene
Inactivated immunoglobulin or T-cell receptor gene.
Mt_rRNA
Mt_tRNA
miRNA
misc_RNA
rRNA
scRNA
snRNA
snoRNA
ribozyme
sRNA
scaRNA
Non-coding RNA predicted using sequences from Rfam and miRBase.
Mt_tRNA_pseudogene
tRNA_pseudogene
snoRNA_pseudogene
snRNA_pseudogene
scRNA_pseudogene
rRNA_pseudogene
misc_RNA_pseudogene
miRNA_pseudogene
Non-coding RNA predicted to be a pseudogene by the Ensembl pipeline.
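The four groups above can be captured in a small lookup table, which is handy when collapsing many biotypes into a few categories. This is a minimal sketch: the biotype names come from the lists above, while the group labels and the `group_of` helper are illustrative assumptions rather than official GENCODE terms.

```python
# Group labels are hypothetical; biotype names are taken from the lists above.
BIOTYPE_GROUPS = {
    "IG/TR gene": [
        "IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_LV_gene", "IG_V_gene",
        "TR_C_gene", "TR_J_gene", "TR_V_gene", "TR_D_gene",
    ],
    "IG/TR pseudogene": [
        "IG_pseudogene", "IG_C_pseudogene", "IG_J_pseudogene",
        "IG_V_pseudogene", "TR_V_pseudogene", "TR_J_pseudogene",
    ],
    "ncRNA (Rfam/miRBase)": [
        "Mt_rRNA", "Mt_tRNA", "miRNA", "misc_RNA", "rRNA", "scRNA",
        "snRNA", "snoRNA", "ribozyme", "sRNA", "scaRNA",
    ],
    "ncRNA pseudogene": [
        "Mt_tRNA_pseudogene", "tRNA_pseudogene", "snoRNA_pseudogene",
        "snRNA_pseudogene", "scRNA_pseudogene", "rRNA_pseudogene",
        "misc_RNA_pseudogene", "miRNA_pseudogene",
    ],
}

# Invert the table so a biotype can be looked up directly.
BIOTYPE_TO_GROUP = {
    biotype: group
    for group, biotypes in BIOTYPE_GROUPS.items()
    for biotype in biotypes
}


def group_of(biotype):
    """Return the group label for a biotype, or "other" if it is not listed."""
    return BIOTYPE_TO_GROUP.get(biotype, "other")


print(group_of("IG_V_gene"), "|", group_of("scaRNA"), "|", group_of("lincRNA"))
```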
TEC
To be Experimentally Confirmed. This is used for non-spliced EST clusters that have polyA features. This category has been specifically created for the ENCODE project to highlight regions that could indicate the presence of protein-coding genes that require experimental validation, either by 5' RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively encoded peptide with specific antibodies.
nonsense_mediated_decay
If the coding sequence (following the appropriate reference) of a transcript finishes more than 50 bp from a downstream splice site, then it is tagged as NMD. If the variant does not cover the full reference coding sequence, then it is annotated as NMD if NMD is unavoidable, i.e. no matter what the exon structure of the missing portion is, the transcript will be subject to NMD.
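The first sentence of this rule can be sketched as a simple distance check in transcript coordinates. The function name, the coordinate convention (1-based position of the last CDS base, stop codon included) and the example exon lengths are assumptions made for illustration. The second case, where a partial transcript is tagged only when NMD is unavoidable, is not modelled here.

```python
def is_nmd_candidate(exon_lengths, cds_end, threshold=50):
    """Apply the 50 bp rule in transcript coordinates (illustrative sketch).

    exon_lengths: exon lengths in 5'-to-3' transcript order.
    cds_end: 1-based transcript position of the last CDS base
             (assumed here to include the stop codon).
    Returns True if a splice junction lies more than `threshold`
    bases downstream of the end of the CDS.
    """
    junction = 0
    # Every exon except the last one ends at a splice junction.
    for length in exon_lengths[:-1]:
        junction += length  # transcript position of the exon's last base
        if junction - cds_end > threshold:
            return True
    return False


# Made-up transcript with junctions at positions 200 and 350.
exons = [200, 150, 300]
print(is_nmd_candidate(exons, cds_end=280))  # junction at 350 is 70 bases downstream
print(is_nmd_candidate(exons, cds_end=320))  # junction at 350 is only 30 bases downstream
print(is_nmd_candidate(exons, cds_end=500))  # CDS ends in the last exon
```

The threshold is kept as a parameter so the 50 bp value stated in the definition stays visible and adjustable.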
non_stop_decay
Transcript that has polyA features (including signal) without a prior stop codon in the CDS, i.e. a non-genomic polyA tail attached directly to the CDS without a 3' UTR. These transcripts are subject to degradation.
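A minimal sketch of this check, assuming the CDS is given as a DNA string in frame from its first base and that polyA evidence is supplied as a boolean. The function names are hypothetical, and the stop codons are those of the standard genetic code, so the sketch does not apply to mitochondrial transcripts.

```python
# Stop codons of the standard genetic code (assumption: nuclear transcripts).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_in_frame_stop(cds):
    """Return True if any complete in-frame codon of `cds` is a stop codon."""
    cds = cds.upper()
    return any(cds[i:i + 3] in STOP_CODONS for i in range(0, len(cds) - 2, 3))


def looks_like_non_stop_decay(cds, has_polya_feature):
    """PolyA features present but no in-frame stop codon in the CDS."""
    return has_polya_feature and not has_in_frame_stop(cds)


print(looks_like_non_stop_decay("ATGAAAGGG", has_polya_feature=True))
print(looks_like_non_stop_decay("ATGAAATAA", has_polya_feature=True))
```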
retained_intron
Alternatively spliced transcript believed to contain intronic sequence relative to other, coding, variants.
protein_coding
Contains an open reading frame (ORF).
processed_transcript
Doesn't contain an ORF.
non_coding
Transcript which is known from the literature to not be protein coding.
ambiguous_orf
Transcript believed to be protein coding, but with more than one possible open reading frame.
sense_intronic
Long non-coding transcript in introns of a coding gene that does not overlap any exons.
sense_overlapping
Long non-coding transcript that contains a coding gene in its intron on the same strand.
antisense
Has transcripts that overlap the genomic span (i.e. exons or introns) of a protein-coding locus on the opposite strand.
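The positional definitions of sense_intronic, sense_overlapping and antisense can be sketched as interval tests between a long non-coding locus and a protein-coding locus. This is an illustrative simplification: the `Locus` class, the `classify` function, the "no_overlap" and "other" labels and the coordinates are all assumptions, each locus is reduced to a single exon structure, and sense_intronic is assumed to mean the same strand as the coding gene.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass
class Locus:
    strand: str            # "+" or "-"
    exons: List[Interval]  # sorted by start, non-overlapping

    @property
    def span(self) -> Interval:
        return (self.exons[0][0], self.exons[-1][1])

    @property
    def introns(self) -> List[Interval]:
        # Gaps between consecutive exons.
        return [(left[1] + 1, right[0] - 1)
                for left, right in zip(self.exons, self.exons[1:])]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def classify(lnc: Locus, coding: Locus) -> str:
    """Classify a non-coding locus relative to one protein-coding locus."""
    if not overlaps(lnc.span, coding.span):
        return "no_overlap"
    if lnc.strand != coding.strand:
        return "antisense"          # overlaps the span on the opposite strand
    if any(contains(intron, lnc.span) for intron in coding.introns):
        return "sense_intronic"     # lies within an intron, touches no exon
    if any(contains(intron, coding.span) for intron in lnc.introns):
        return "sense_overlapping"  # coding gene sits inside an lncRNA intron
    return "other"


# Made-up coordinates for illustration.
coding = Locus("+", [(100, 200), (500, 600)])
print(classify(Locus("+", [(250, 300), (350, 400)]), coding))
print(classify(Locus("-", [(150, 250)]), coding))
print(classify(Locus("+", [(10, 50), (700, 800)]), coding))
```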
known_ncrna
 
pseudogene
Have homology to proteins but generally suffer from a disrupted coding sequence and an active homologous gene can be found at another locus. Sometimes these entries have an intact coding sequence or an open but truncated ORF, in which case there is other evidence used (for example genomic polyA stretches at the 3' end) to classify them as a pseudogene. Can be further classified as one of the following.
processed_pseudogene
Pseudogene that lacks introns and is thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome.
polymorphic_pseudogene
Pseudogene owing to a SNP/DIP but in other individuals/haplotypes/strains the gene is translated.
retrotransposed
Pseudogene owing to a reverse transcribed and re-inserted sequence.
transcribed_processed_pseudogene
transcribed_unprocessed_pseudogene
transcribed_unitary_pseudogene
Pseudogene where protein homology or genomic structure indicates a pseudogene, but the presence of locus-specific transcripts indicates expression.
translated_unprocessed_pseudogene
Pseudogene that has mass spec data suggesting that it is also translated.
unitary_pseudogene
A species-specific unprocessed pseudogene without a parent gene, as it has an active orthologue in another species.
unprocessed_pseudogene
Pseudogene that can contain introns, since it is produced by gene duplication.
artifact
Used to tag mistakes in the public databases (Ensembl/SwissProt/TrEMBL).
lincRNA
Long, intervening noncoding (linc) RNA that can be found in evolutionarily conserved, intergenic regions.
macro_lncRNA
Unspliced lncRNA that is several kb in size.
LRG_gene
Gene in a "Locus Reference Genomic" region known to have disease-related sequence variations.
3prime_overlapping_ncRNA
Transcript where ditag and/or published experimental data strongly support the existence of short non-coding transcripts transcribed from the 3' UTR.
disrupted_domain
Otherwise viable coding region omitted from this alternatively spliced transcript because the splice variation affects a region coding for a protein domain.
vaultRNA
Short non-coding RNA gene that forms part of the vault ribonucleoprotein complex.
bidirectional_promoter_lncRNA
A non-coding locus that originates from within the promoter region of a protein-coding gene, with transcription proceeding in the opposite direction on the other strand.
 
This site is hosted by the Wellcome Trust Sanger Institute.