GENCODE

Statistics about the Human GENCODE Reference Release Set

* The statistics are derived from the GTF files that contain annotation for the main chromosomes only.

For details about the calculation of these statistics please see the README_stats.txt file.
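
As a rough illustration of how per-biotype counts like the ones on this page can be obtained from a GTF file, the Python sketch below tallies `gene` and `transcript` records by their `gene_type` and `transcript_type` attributes. It is not the procedure described in README_stats.txt. The function name `count_biotypes`, the `MAIN_CHROMS` set (which assumes `chr`-prefixed sequence names), and the sample lines are all assumptions made for this example.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column,
# e.g. gene_type "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Assumption: main chromosomes are named chr1..chr22, chrX, chrY, chrM.
MAIN_CHROMS = {f"chr{n}" for n in range(1, 23)} | {"chrX", "chrY", "chrM"}


def count_biotypes(gtf_lines, main_only=True):
    """Return (gene_counts, transcript_counts) keyed by biotype."""
    genes, transcripts = Counter(), Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        if main_only and fields[0] not in MAIN_CHROMS:
            continue  # ignore scaffolds, patches, and so on
        attrs = dict(ATTR_RE.findall(fields[8]))
        if fields[2] == "gene":
            genes[attrs.get("gene_type", "unknown")] += 1
        elif fields[2] == "transcript":
            transcripts[attrs.get("transcript_type", "unknown")] += 1
    return genes, transcripts


# Invented records for illustration only; these are not real annotation.
SAMPLE_GTF = [
    "##description: invented example\n",
    'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t100\t800\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_type "retained_intron";\n',
    'chr2\tENSEMBL\tgene\t50\t120\t.\t-\t.\tgene_id "G2"; gene_type "miRNA";\n',
    'some_scaffold\tENSEMBL\tgene\t5\t60\t.\t+\t.\tgene_id "G3"; gene_type "miRNA";\n',
]

gene_counts, transcript_counts = count_biotypes(SAMPLE_GTF)
print(dict(gene_counts))
print(dict(transcript_counts))
```

Counting genes and transcripts separately is what allows a biotype such as `retained_intron` to show zero genes but many transcripts in the tables below: it is assigned to transcripts, not to genes.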



Version 27 (January 2017 freeze, GRCh38) - Ensembl 90

General stats

Total No. of Genes: 58288
Protein-coding genes: 19836
Long non-coding RNA genes: 15778
Small non-coding RNA genes: 7569
Pseudogenes: 14694
   - processed pseudogenes: 10704
   - unprocessed pseudogenes: 3469
   - unitary pseudogenes: 206
   - polymorphic pseudogenes: 63
   - pseudogenes: 18
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 410
   - pseudogenes: 234

Total No. of Transcripts: 200401
Protein-coding transcripts: 80930
   - full length protein-coding: 55406
   - partial length protein-coding: 25524
Nonsense mediated decay transcripts: 14208
Long non-coding RNA loci transcripts: 27908

Total No. of distinct translations: 60297
Genes that have more than one distinct translation: 13580


Further details on this version's gene and transcript types

biotype genes transcripts
3prime_overlapping_ncRNA 31 35
antisense_RNA 5521 11050
bidirectional_promoter_lncRNA 19 40
IG_C_gene 14 23
IG_C_pseudogene 9 9
IG_D_gene 37 37
IG_J_gene 18 18
IG_J_pseudogene 3 3
IG_pseudogene 1 1
IG_V_gene 144 144
IG_V_pseudogene 188 188
lincRNA 7499 13348
macro_lncRNA 1 1
miRNA 1881 1881
misc_RNA 2213 2227
Mt_rRNA 2 2
Mt_tRNA 22 22
non_coding 3 3
non_stop_decay 0 84
nonsense_mediated_decay 0 14208
polymorphic_pseudogene 63 89
processed_pseudogene 10240 10243
processed_transcript 544 28230
protein_coding 19836 80930
pseudogene 18 37
retained_intron 0 27239
ribozyme 8 8
rRNA 544 544
scaRNA 49 49
scRNA 1 1
sense_intronic 905 963
sense_overlapping 189 339
snoRNA 943 955
snRNA 1900 1900
sRNA 5 5
TEC 1066 1165
TR_C_gene 6 6
TR_D_gene 4 4
TR_J_gene 79 79
TR_J_pseudogene 4 4
TR_V_gene 108 108
TR_V_pseudogene 30 30
transcribed_processed_pseudogene 462 462
transcribed_unitary_pseudogene 111 113
transcribed_unprocessed_pseudogene 830 836
translated_processed_pseudogene 2 2
unitary_pseudogene 95 95
unprocessed_pseudogene 2639 2640
vaultRNA 1 1

Version 21 (June 2014 freeze, GRCh38) - Ensembl 77, 78

General stats

Total No. of Genes: 60155
Protein-coding genes: 19881
Long non-coding RNA genes: 15877
Small non-coding RNA genes: 9534
Pseudogenes: 14467
   - processed pseudogenes: 10753
   - unprocessed pseudogenes: 3230
   - unitary pseudogenes: 170
   - polymorphic pseudogenes: 59
   - pseudogenes: 29
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 395
   - pseudogenes: 226

Total No. of Transcripts: 196327
Protein-coding transcripts: 79377
   - full length protein-coding: 54420
   - partial length protein-coding: 24957
Nonsense mediated decay transcripts: 13222
Long non-coding RNA loci transcripts: 26414

Total No. of distinct translations: 59512
Genes that have more than one distinct translation: 13526


Further details on this version's gene and transcript types

biotype genes transcripts
3prime_overlapping_ncrna 27 31
all IG_genes 208 242
all other pseudogenes 14468 14507
all RNA pseudogenes 0 0
all RNA_genes 13363 18630
antisense 5542 10397
IG_C_gene 14 30
IG_C_pseudogene 9 9
IG_D_gene 37 37
IG_J_gene 18 18
IG_J_pseudogene 3 3
IG_V_gene 139 157
IG_V_pseudogene 180 180
known_ncrna 2 2
lincRNA 7666 12919
miRNA 3837 3837
misc_RNA 2234 2248
Mt_rRNA 2 2
Mt_tRNA 22 22
non_coding 1 1
non_stop_decay 0 74
nonsense_mediated_decay 0 13222
polymorphic_pseudogene 59 73
processed_pseudogene 10312 10315
processed_transcript 468 26942
protein_coding 19881 79377
pseudogene 29 48
retained_intron 0 26412
rRNA 549 549
sense_intronic 915 975
sense_overlapping 198 324
snoRNA 978 978
snRNA 1912 1912
TEC 1058 1148
TR_C_gene 5 19
TR_D_gene 3 3
TR_J_gene 73 73
TR_J_pseudogene 4 4
TR_V_gene 106 111
TR_V_pseudogene 30 30
transcribed_processed_pseudogene 441 441
transcribed_unitary_pseudogene 1 1
transcribed_unprocessed_pseudogene 658 659
translated_processed_pseudogene 1 1
translated_unprocessed_pseudogene 1 1
unitary_pseudogene 169 169
unprocessed_pseudogene 2571 2573

Version 19 (July 2013 freeze, GRCh37) - Ensembl 74, 75

General stats

Total No. of Genes: 57820
Protein-coding genes: 20345
Long non-coding RNA genes: 13870
Small non-coding RNA genes: 9013
Pseudogenes: 14206
   - processed pseudogenes: 10532
   - unprocessed pseudogenes: 2942
   - unitary pseudogenes: 161
   - polymorphic pseudogenes: 45
   - pseudogenes: 296
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 386
   - pseudogenes: 230

Total No. of Transcripts: 196520
Protein-coding transcripts: 81814
   - full length protein-coding: 57005
   - partial length protein-coding: 24809
Nonsense mediated decay transcripts: 13052
Long non-coding RNA loci transcripts: 23898

Total No. of distinct translations: 61559
Genes that have more than one distinct translation: 13600


Further details on this version's gene and transcript types

biotype genes transcripts
3prime_overlapping_ncrna 21 25
all IG_genes 207 217
all other pseudogenes 14206 15343
all RNA pseudogenes 0 0
all RNA_genes 13072 17837
antisense 5276 9710
IG_C_gene 14 18
IG_C_pseudogene 9 10
IG_D_gene 37 37
IG_J_gene 18 18
IG_J_pseudogene 3 3
IG_V_gene 138 144
IG_V_pseudogene 187 196
lincRNA 7114 11780
miRNA 3055 3116
misc_RNA 2034 2050
Mt_rRNA 2 2
Mt_tRNA 22 22
non_stop_decay 0 58
nonsense_mediated_decay 0 13052
polymorphic_pseudogene 45 59
processed_pseudogene 0 10623
processed_transcript 515 28082
protein_coding 20345 81814
pseudogene 13931 387
retained_intron 0 25955
rRNA 527 531
sense_intronic 742 802
sense_overlapping 202 330
snoRNA 1457 1529
snRNA 1916 1923
TR_C_gene 5 5
TR_D_gene 3 3
TR_J_gene 74 74
TR_J_pseudogene 4 4
TR_V_gene 97 97
TR_V_pseudogene 27 27
transcribed_processed_pseudogene 0 442
transcribed_unprocessed_pseudogene 0 860
translated_processed_pseudogene 0 1
unitary_pseudogene 0 182
unprocessed_pseudogene 0 2549

Version 10 (July 2011 freeze, GRCh37) - Ensembl 65

General stats

Total No. of Genes: 52376
Protein-coding genes: 20007
Long non-coding RNA genes: 10840
Small non-coding RNA genes: 8801
Pseudogenes: 12358
   - processed pseudogenes: 8908
   - unprocessed pseudogenes: 2266
   - unitary pseudogenes: 151
   - polymorphic pseudogenes: 27
   - pseudogenes: 814
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 370
   - pseudogenes: 192

Total No. of Transcripts: 172975
Protein-coding transcripts: 78832
   - full length protein-coding: 59895
   - partial length protein-coding: 18937
Nonsense mediated decay transcripts: 9619
Long non-coding RNA loci transcripts: 17547

Total No. of distinct translations: 61675
Genes that have more than one distinct translation: 13569


Further details on this version's gene and transcript types

biotype genes transcripts
3prime_overlapping_ncrna 12 12
all IG_genes 191 194
all other pseudogenes 12358 13416
all RNA pseudogenes 1838 1838
all RNA_genes 10691 11949
ambiguous_orf 20 62
antisense 3526 5446
disrupted_domain 0 1
IG_C_gene 14 16
IG_C_pseudogene 7 7
IG_D_gene 27 27
IG_J_gene 18 18
IG_J_pseudogene 3 3
IG_V_gene 132 133
IG_V_pseudogene 151 151
lincRNA 5484 6742
miRNA 1756 1756
miRNA_pseudogene 15 15
misc_RNA 1187 1187
misc_RNA_pseudogene 3 3
Mt_rRNA 2 2
Mt_tRNA 22 22
Mt_tRNA_pseudogene 580 580
non_coding 104 217
non_stop_decay 0 8
nonsense_mediated_decay 0 9619
polymorphic_pseudogene 27 42
processed_pseudogene 0 8985
processed_transcript * 1271 29900
protein_coding 20007 78832
pseudogene 12139 912
retained_intron 10 19015
retrotransposed 0 211
rRNA 531 531
rRNA_pseudogene 179 179
scRNA_pseudogene 787 787
sense_intronic 395 433
sense_overlapping 18 47
snoRNA 1521 1521
snoRNA_pseudogene 73 73
snRNA 1944 1944
snRNA_pseudogene 73 73
TEC 0 51
TR_C_gene 5 5
TR_D_gene 3 3
TR_J_gene 74 74
TR_J_pseudogene 4 4
TR_V_gene 97 97
TR_V_pseudogene 27 27
transcribed_processed_pseudogene 0 209
transcribed_unprocessed_pseudogene 0 471
tRNA_pseudogene 128 128
unitary_pseudogene 0 167
unprocessed_pseudogene 0 2227

* stats are according to gencode.v10.annotation_updated_ncrna_host.gtf file
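
The biotype tables on this page are whitespace-separated, with the last two fields holding the gene and transcript counts. The Python sketch below is one possible way to read such a table back into numbers and total its columns. The function names are hypothetical. Skipping rows that start with "all " assumes those rows are aggregates of other rows, which the page does not state explicitly. The trailing " *" footnote marker is dropped from biotype names.

```python
def parse_biotype_table(text):
    """Map biotype name -> (gene_count, transcript_count)."""
    rows = {}
    for line in text.splitlines():
        # Split from the right so multi-word names such as
        # "all other pseudogenes" stay in one piece.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # header line, blank line, or footnote
        name = parts[0].rstrip("* ")  # drop a trailing footnote marker
        rows[name] = (int(parts[1]), int(parts[2]))
    return rows


def column_totals(rows, skip_aggregates=True):
    """Sum the gene and transcript columns over all rows."""
    genes = transcripts = 0
    for name, (n_genes, n_transcripts) in rows.items():
        # Assumption: rows named "all ..." summarise other rows.
        if skip_aggregates and name.startswith("all "):
            continue
        genes += n_genes
        transcripts += n_transcripts
    return genes, transcripts


# A few rows copied from the Version 10 table above.
SAMPLE_TABLE = """biotype genes transcripts
3prime_overlapping_ncrna 12 12
all IG_genes 191 194
IG_C_gene 14 16
processed_transcript * 1271 29900
"""

rows = parse_biotype_table(SAMPLE_TABLE)
print(column_totals(rows))
```

Applied to the full Version 27 table, which has no "all ..." rows, the genes column should add up to the reported total of 58288.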

Version 7 (December 2010 freeze, GRCh37) - Ensembl 62

General stats

Total No. of Genes: 51082
Protein-coding genes: 20687
Long non-coding RNA genes: 9640
Small non-coding RNA genes: 8801
Pseudogenes: 11580
   - processed pseudogenes: 8298
   - unprocessed pseudogenes: 2117
   - unitary pseudogenes: 138
   - polymorphic pseudogenes: 19
   - pseudogenes: 826
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 374
   - pseudogenes: 182

Total No. of Transcripts: 161375
Protein-coding transcripts: 76052
   - full length protein-coding: 59634
   - partial length protein-coding: 16418
Nonsense mediated decay transcripts: 8356
Long non-coding RNA loci transcripts: 15512

Total No. of distinct translations: 60495
Genes that have more than one distinct translation: 13346


Further details on this version's gene and transcript types

biotype genes transcripts
all IG_genes 292 295
all other pseudogenes 11580 12460
all RNA pseudogenes 1838 1838
all RNA_genes 6446 5979
ambiguous_orf 0 54
antisense 0 133
IG_C_gene 16 18
IG_C_pseudogene 7 7
IG_D_gene 30 30
IG_J_gene 83 83
IG_J_pseudogene 3 3
IG_V_gene 163 164
IG_V_pseudogene 151 151
lincRNA 1239 772
miRNA 1756 1756
miRNA_pseudogene 15 15
misc_RNA 1187 1187
misc_RNA_pseudogene 3 3
Mt_rRNA 2 2
Mt_tRNA 22 22
Mt_tRNA_pseudogene 580 580
non_coding 0 326
nonsense_mediated_decay 0 8356
polymorphic_pseudogene 19 29
processed_pseudogene 0 8381
processed_transcript * 8401 37659
protein_coding 20687 76052
pseudogene 11379 920
retained_intron 0 16350
retrotransposed 0 215
rRNA 531 531
rRNA_pseudogene 179 179
scRNA_pseudogene 787 787
snoRNA 1521 1521
snoRNA_pseudogene 73 73
snRNA 1944 1944
snRNA_pseudogene 73 73
TEC 0 35
TR_C_gene 3 3
TR_J_gene 13 13
TR_V_gene 66 66
TR_V_pseudogene 21 21
transcribed_processed_pseudogene 0 160
transcribed_unprocessed_pseudogene 0 307
tRNA_pseudogene 128 128
unitary_pseudogene 0 144
unprocessed_pseudogene 0 2122

* stats are according to gencode.v7.annotation_updated_ncrna_host.gtf file

Version 3c (July 2009 freeze, GRCh37) - Ensembl 56

General stats

Total No. of Genes: 47553
Protein-coding genes: 22550
Long non-coding RNA genes: 6496
Small non-coding RNA genes: 9243
Pseudogenes: 8894
   - processed pseudogenes: 6232
   - unprocessed pseudogenes: 1147
   - unitary pseudogenes: 100
   - polymorphic pseudogenes: 0
   - pseudogenes: 1415
Immunoglobulin/T-cell receptor gene segments
   - protein coding segments: 370
   - pseudogenes: 0

Total No. of Transcripts: 132067
Protein-coding transcripts: 68880
   - full length protein-coding: 67766
   - partial length protein-coding: 1114
Nonsense mediated decay transcripts: 4703
Long non-coding RNA loci transcripts: 10475

Total No. of distinct translations: 56217
Genes that have more than one distinct translation: 12491
 
This site is hosted by the Wellcome Trust Sanger Institute.