GENCODE

Statistics about the current GENCODE freeze (version 21)

Statistics for earlier GENCODE freezes are archived; versions 10 and 7 are reproduced below.

* The statistics are derived from the GTF file that contains only the annotation of the main chromosomes.

For details on how these statistics are calculated, see the README_stats.txt file.
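As a rough illustration of how per-biotype gene and transcript counts can be tallied from a GTF file, here is a minimal sketch. It assumes the GENCODE GTF layout (feature type in column 3, `gene_type` and `transcript_type` keys in the attribute column); the helper names and the sample records are made up for the example. It does not claim to reproduce the exact procedure described in README_stats.txt.

```python
import gzip
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_biotypes(lines):
    """Tally 'gene' and 'transcript' records per biotype from GTF lines."""
    genes, transcripts = Counter(), Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # skip malformed lines
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if fields[2] == "gene":
            genes[attrs.get("gene_type", "unknown")] += 1
        elif fields[2] == "transcript":
            transcripts[attrs.get("transcript_type", "unknown")] += 1
    return genes, transcripts


def count_biotypes_in_file(path):
    """Hypothetical convenience wrapper for plain or gzip-compressed GTF files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_biotypes(handle)


# Made-up records, for illustration only (not real annotation).
SAMPLE_LINES = [
    "##made-up header line\n",
    'chrT\tSRC\tgene\t1\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chrT\tSRC\ttranscript\t1\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; '
    'gene_type "protein_coding"; transcript_type "protein_coding";\n',
    'chrT\tSRC\ttranscript\t1\t700\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; '
    'gene_type "protein_coding"; transcript_type "retained_intron";\n',
    'chrT\tSRC\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_type "lincRNA";\n',
    'chrT\tSRC\ttranscript\t2000\t2500\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; '
    'gene_type "lincRNA"; transcript_type "lincRNA";\n',
]

genes, transcripts = count_biotypes(SAMPLE_LINES)
for biotype in sorted(set(genes) | set(transcripts)):
    print(biotype, genes[biotype], transcripts[biotype])
```

In the sample, `retained_intron` is counted for a transcript but for no gene, which is consistent with transcript-level biotypes showing 0 in the genes column of the tables on this page.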

Version 21 (June 2014 freeze, GRCh38) - Ensembl 77

General stats

Total No of Genes: 60155
Protein-coding genes: 19881
Long non-coding RNA genes: 15877
Small non-coding RNA genes: 9534
Pseudogenes: 14467
   - processed pseudogenes: 10753
   - unprocessed pseudogenes: 3230
   - unitary pseudogenes: 170
   - polymorphic pseudogenes: 59
   - pseudogenes: 29
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 395
   - pseudogenes: 226

Total No of Transcripts: 196327
Protein-coding transcripts: 79377
   - full length protein-coding: 54420
   - partial length protein-coding: 24957
Nonsense mediated decay transcripts: 13222
Long non-coding RNA loci transcripts: 26414

Total No of distinct translations: 59512
Genes that have more than one distinct translation: 13526


Further details on this version's gene and transcript types

biotype genes transcripts
3prime_overlapping_ncrna 27 31
all IG_genes 208 242
all other pseudogenes 14468 14507
all RNA pseudogenes 0 0
all RNA_genes 13363 18630
antisense 5542 10397
IG_C_gene 14 30
IG_C_pseudogene 9 9
IG_D_gene 37 37
IG_J_gene 18 18
IG_J_pseudogene 3 3
IG_V_gene 139 157
IG_V_pseudogene 180 180
known_ncrna 2 2
lincRNA 7666 12919
miRNA 3837 3837
misc_RNA 2234 2248
Mt_rRNA 2 2
Mt_tRNA 22 22
non_coding 1 1
non_stop_decay 0 74
nonsense_mediated_decay 0 13222
polymorphic_pseudogene 59 73
processed_pseudogene 10312 10315
processed_transcript 468 26942
protein_coding 19881 79377
pseudogene 29 48
retained_intron 0 26412
rRNA 549 549
sense_intronic 915 975
sense_overlapping 198 324
snoRNA 978 978
snRNA 1912 1912
TEC 1058 1148
TR_C_gene 5 19
TR_D_gene 3 3
TR_J_gene 73 73
TR_J_pseudogene 4 4
TR_V_gene 106 111
TR_V_pseudogene 30 30
transcribed_processed_pseudogene 441 441
transcribed_unitary_pseudogene 1 1
transcribed_unprocessed_pseudogene 658 659
translated_processed_pseudogene 1 1
translated_unprocessed_pseudogene 1 1
unitary_pseudogene 169 169
unprocessed_pseudogene 2571 2573
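The version 21 summary figures can be cross-checked against the gene column of the biotype table above. The sketch below is only a consistency check: the groupings of biotypes into summary categories are inferred from the arithmetic (they happen to add up), not taken from README_stats.txt, so treat them as assumptions.

```python
# Gene counts copied from the version 21 biotype table above.
GENES_V21 = {
    "processed_pseudogene": 10312, "transcribed_processed_pseudogene": 441,
    "unprocessed_pseudogene": 2571, "transcribed_unprocessed_pseudogene": 658,
    "translated_unprocessed_pseudogene": 1,
    "unitary_pseudogene": 169, "transcribed_unitary_pseudogene": 1,
    "IG_C_gene": 14, "IG_D_gene": 37, "IG_J_gene": 18, "IG_V_gene": 139,
    "TR_C_gene": 5, "TR_D_gene": 3, "TR_J_gene": 73, "TR_V_gene": 106,
    "IG_C_pseudogene": 9, "IG_J_pseudogene": 3, "IG_V_pseudogene": 180,
    "TR_J_pseudogene": 4, "TR_V_pseudogene": 30,
    "miRNA": 3837, "misc_RNA": 2234, "Mt_rRNA": 2, "Mt_tRNA": 22,
    "rRNA": 549, "snoRNA": 978, "snRNA": 1912,
    "3prime_overlapping_ncrna": 27, "antisense": 5542, "lincRNA": 7666,
    "processed_transcript": 468, "sense_intronic": 915,
    "sense_overlapping": 198, "non_coding": 1, "known_ncrna": 2, "TEC": 1058,
}

# Assumed groupings (inferred, not documented): summary label ->
# (figure from "General stats", biotypes that appear to be summed).
GROUPS = {
    "processed pseudogenes": (10753, [
        "processed_pseudogene", "transcribed_processed_pseudogene"]),
    "unprocessed pseudogenes": (3230, [
        "unprocessed_pseudogene", "transcribed_unprocessed_pseudogene",
        "translated_unprocessed_pseudogene"]),
    "unitary pseudogenes": (170, [
        "unitary_pseudogene", "transcribed_unitary_pseudogene"]),
    "IG/TR protein coding segments": (395, [
        "IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_V_gene",
        "TR_C_gene", "TR_D_gene", "TR_J_gene", "TR_V_gene"]),
    "IG/TR pseudogenes": (226, [
        "IG_C_pseudogene", "IG_J_pseudogene", "IG_V_pseudogene",
        "TR_J_pseudogene", "TR_V_pseudogene"]),
    "small non-coding RNA genes": (9534, [
        "miRNA", "misc_RNA", "Mt_rRNA", "Mt_tRNA", "rRNA", "snoRNA", "snRNA"]),
    "long non-coding RNA genes": (15877, [
        "3prime_overlapping_ncrna", "antisense", "lincRNA",
        "processed_transcript", "sense_intronic", "sense_overlapping",
        "non_coding", "known_ncrna", "TEC"]),
}


def check_groups(groups, genes):
    """Return {label: (reported, summed)} for every assumed grouping."""
    return {
        label: (reported, sum(genes[b] for b in biotypes))
        for label, (reported, biotypes) in groups.items()
    }


results = check_groups(GROUPS, GENES_V21)
for label, (reported, summed) in results.items():
    status = "ok" if reported == summed else "MISMATCH"
    print(f"{label}: reported {reported}, summed {summed} ({status})")
```

The same dictionary-based check could be repeated for the older freezes, although their biotype tables use a different set of categories.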

Version 10 (July 2011 freeze, GRCh37) - Ensembl 65

Statistics about the Reference Gene Set for the ENCODE analysis (version 10)

General stats

Total No of Genes: 52376
Protein-coding genes: 20007
Long non-coding RNA genes: 10840
Small non-coding RNA genes: 8801
Pseudogenes: 12358
   - processed pseudogenes: 8908
   - unprocessed pseudogenes: 2266
   - unitary pseudogenes: 151
   - polymorphic pseudogenes: 27
   - pseudogenes: 814
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 370
   - pseudogenes: 192

Total No of Transcripts: 172975
Protein-coding transcripts: 78832
   - full length protein-coding: 59895
   - partial length protein-coding: 18937
Nonsense mediated decay transcripts: 9619
Long non-coding RNA loci transcripts: 17547

Total No of distinct translations: 61675
Genes that have more than one distinct translation: 13569


Further details on this version's gene and transcript types

biotype genes transcripts
3prime_overlapping_ncrna 12 12
all IG_genes 191 194
all other pseudogenes 12358 13416
all RNA pseudogenes 1838 1838
all RNA_genes 10691 11949
ambiguous_orf 20 62
antisense 3526 5446
disrupted_domain 0 1
IG_C_gene 14 16
IG_C_pseudogene 7 7
IG_D_gene 27 27
IG_J_gene 18 18
IG_J_pseudogene 3 3
IG_V_gene 132 133
IG_V_pseudogene 151 151
lincRNA 5484 6742
miRNA 1756 1756
miRNA_pseudogene 15 15
misc_RNA 1187 1187
misc_RNA_pseudogene 3 3
Mt_rRNA 2 2
Mt_tRNA 22 22
Mt_tRNA_pseudogene 580 580
non_coding 104 217
non_stop_decay 0 8
nonsense_mediated_decay 0 9619
polymorphic_pseudogene 27 42
processed_pseudogene 0 8985
processed_transcript * 1271 29900
protein_coding 20007 78832
pseudogene 12139 912
retained_intron 10 19015
retrotransposed 0 211
rRNA 531 531
rRNA_pseudogene 179 179
scRNA_pseudogene 787 787
sense_intronic 395 433
sense_overlapping 18 47
snoRNA 1521 1521
snoRNA_pseudogene 73 73
snRNA 1944 1944
snRNA_pseudogene 73 73
TEC 0 51
TR_C_gene 5 5
TR_D_gene 3 3
TR_J_gene 74 74
TR_J_pseudogene 4 4
TR_V_gene 97 97
TR_V_pseudogene 27 27
transcribed_processed_pseudogene 0 209
transcribed_unprocessed_pseudogene 0 471
tRNA_pseudogene 128 128
unitary_pseudogene 0 167
unprocessed_pseudogene 0 2227

* Stats are according to the gencode.v10.annotation_updated_ncrna_host.gtf file.

Version 7 (December 2010 freeze, GRCh37) - Ensembl 62

Statistics about the Reference Gene Set for the ENCODE analysis (version 7)

General stats

Total No of Genes: 51082
Protein-coding genes: 20687
Long non-coding RNA genes: 9640
Small non-coding RNA genes: 8801
Pseudogenes: 11580
   - processed pseudogenes: 8298
   - unprocessed pseudogenes: 2117
   - unitary pseudogenes: 138
   - polymorphic pseudogenes: 19
   - pseudogenes: 826
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 374
   - pseudogenes: 182

Total No of Transcripts: 161375
Protein-coding transcripts: 76052
   - full length protein-coding: 59634
   - partial length protein-coding: 16418
Nonsense mediated decay transcripts: 8356
Long non-coding RNA loci transcripts: 15512

Total No of distinct translations: 60495
Genes that have more than one distinct translation: 13346


Further details on this version's gene and transcript types

biotype genes transcripts
all IG_genes 292 295
all other pseudogenes 11580 12460
all RNA pseudogenes 1838 1838
all RNA_genes 6446 5979
ambiguous_orf 0 54
antisense 0 133
IG_C_gene 16 18
IG_C_pseudogene 7 7
IG_D_gene 30 30
IG_J_gene 83 83
IG_J_pseudogene 3 3
IG_V_gene 163 164
IG_V_pseudogene 151 151
lincRNA 1239 772
miRNA 1756 1756
miRNA_pseudogene 15 15
misc_RNA 1187 1187
misc_RNA_pseudogene 3 3
Mt_rRNA 2 2
Mt_tRNA 22 22
Mt_tRNA_pseudogene 580 580
non_coding 0 326
nonsense_mediated_decay 0 8356
polymorphic_pseudogene 19 29
processed_pseudogene 0 8381
processed_transcript * 8401 37659
protein_coding 20687 76052
pseudogene 11379 920
retained_intron 0 16350
retrotransposed 0 215
rRNA 531 531
rRNA_pseudogene 179 179
scRNA_pseudogene 787 787
snoRNA 1521 1521
snoRNA_pseudogene 73 73
snRNA 1944 1944
snRNA_pseudogene 73 73
TEC 0 35
TR_C_gene 3 3
TR_J_gene 13 13
TR_V_gene 66 66
TR_V_pseudogene 21 21
transcribed_processed_pseudogene 0 160
transcribed_unprocessed_pseudogene 0 307
tRNA_pseudogene 128 128
unitary_pseudogene 0 144
unprocessed_pseudogene 0 2122

* Stats are according to the gencode.v7.annotation_updated_ncrna_host.gtf file.
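The headline figures of the three freezes listed on this page can be compared directly. The short sketch below copies them from the "General stats" sections and computes the change between freezes; the dictionary keys and helper names are illustrative only. It also checks that full length plus partial length protein-coding transcripts add up to the reported protein-coding transcript total in each version.

```python
# Headline figures copied from the "General stats" sections on this page.
HEADLINE = {
    "v7": {"genes": 51082, "protein_coding_genes": 20687, "lncrna_genes": 9640,
           "transcripts": 161375, "pc_transcripts": 76052,
           "pc_full": 59634, "pc_partial": 16418},
    "v10": {"genes": 52376, "protein_coding_genes": 20007, "lncrna_genes": 10840,
            "transcripts": 172975, "pc_transcripts": 78832,
            "pc_full": 59895, "pc_partial": 18937},
    "v21": {"genes": 60155, "protein_coding_genes": 19881, "lncrna_genes": 15877,
            "transcripts": 196327, "pc_transcripts": 79377,
            "pc_full": 54420, "pc_partial": 24957},
}


def change(metric, old, new):
    """Absolute change in a metric between two freezes."""
    return HEADLINE[new][metric] - HEADLINE[old][metric]


def percent_change(metric, old, new):
    """Relative change in percent, with the older freeze as the baseline."""
    return 100.0 * change(metric, old, new) / HEADLINE[old][metric]


def full_plus_partial_matches(version):
    """True if full + partial length equals the protein-coding transcript total."""
    stats = HEADLINE[version]
    return stats["pc_full"] + stats["pc_partial"] == stats["pc_transcripts"]


for metric in ("genes", "protein_coding_genes", "lncrna_genes", "transcripts"):
    delta = change(metric, "v7", "v21")
    pct = percent_change(metric, "v7", "v21")
    print(f"{metric}: {delta:+d} ({pct:+.1f}%) from v7 to v21")

for version in HEADLINE:
    print(version, "full + partial consistent:", full_plus_partial_matches(version))
```

Note that versions 7 and 10 are based on GRCh37 and version 21 on GRCh38, so differences reflect both annotation updates and the change of assembly.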

 
This site is hosted by the Wellcome Trust Sanger Institute.