Human

GRCh37-mapped Release history

GENCODE supports genomics projects that are still attached to GRCh37/hg19 by providing updated human gene annotation on this genome assembly version.

The following GENCODE releases were built on GRCh38, but GRCh37-mapped versions are also available from the links below.


Freeze date	GENCODE release	Reference release?	Release date	Genome assembly version	UCSC version	Notes
08.2024	48	Y	05.2025	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set
04.2024	47	N	10.2024	mapped to GRCh37	47lift37	re-merge with new Havana annotation, updated Ensembl gene set
09.2023	46	N	05.2024	mapped to GRCh37	46lift37	re-merge with new Havana annotation, updated Ensembl gene set
03.2023	45	N	01.2024	mapped to GRCh37	45lift37	re-merge with new Havana annotation, updated Ensembl gene set
12.2022	44	N	07.2023	mapped to GRCh37	44lift37	re-merge with new Havana annotation, updated Ensembl gene set
08.2022	43	N	02.2023	mapped to GRCh37	43lift37	re-merge with new Havana annotation, updated Ensembl gene set
04.2022	42	N	10.2022	mapped to GRCh37	42lift37	re-merge with new Havana annotation, updated Ensembl gene set
01.2022	41	N	07.2022	mapped to GRCh37	41lift37	re-merge with new Havana annotation, updated Ensembl gene set
08.2021	40	N	04.2022	mapped to GRCh37	40lift37	re-merge with new Havana annotation, updated Ensembl gene set
05.2021	39	N	12.2021	mapped to GRCh37	39lift37	re-merge with new Havana annotation, updated Ensembl gene set
12.2020	38	N	05.2021	mapped to GRCh37	38lift37	re-merge with new Havana annotation, updated Ensembl gene set
08.2020	37	N	02.2021	mapped to GRCh37	37lift37	re-merge with new Havana annotation, updated Ensembl gene set
05.2020	36	N	10.2020	mapped to GRCh37	36lift37	re-merge with new Havana annotation, updated Ensembl gene set
03.2020	35	N	08.2020	mapped to GRCh37	35lift37	re-merge with new Havana annotation, updated Ensembl gene set
11.2019	34	N	04.2020	mapped to GRCh37	34lift37	re-merge with new Havana annotation, updated Ensembl gene set
08.2019	33	N	01.2020	mapped to GRCh37	33lift37	re-merge with new Havana annotation, updated Ensembl gene set
05.2019	32	N	09.2019	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set
02.2019	31	N	06.2019	mapped to GRCh37	31lift37	re-merge with new Havana annotation, updated Ensembl gene set
11.2018	30	N	04.2019	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set
05.2018	29	N	10.2018	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set
11.2017	28	N	04.2018	mapped to GRCh37	28lift37	re-merge with new Havana annotation, updated Ensembl gene set
01.2017	27	N	08.2017	mapped to GRCh37	27lift37	re-merge with new Havana annotation, updated Ensembl gene set
10.2016	26	N	03.2017	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set
03.2016	25	N	07.2016	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set
08.2015	24	N	12.2015	mapped to GRCh37	24lift37	re-merge with new Havana annotation, updated Ensembl gene set
03.2015	23	N	07.2015	mapped to GRCh37	-	re-merge with new Havana annotation, updated Ensembl gene set

Mapping algorithm

The gene annotation originally created on the GRCh38 reference chromosomes was mapped to GRCh37 using gencode-backmap, following the instructions provided on its website.

The program takes as input the current ("source") GENCODE GFF3 or GTF file, cross-assembly genomic alignments (UCSC hg38-to-hg19 liftover), and the GENCODE 19 ("target") annotation files. The mapping algorithm, as described in its documentation, is as follows.

Mapping is done on a per-gene basis using the following steps:

1. Project transcripts of the gene through the alignments, keeping exons chained.
   - If there are multiple mappings, first look for the ones that overlap the previous version of the transcript, if it exists. Otherwise, if there is a previous version of the gene, select mappings overlapping that gene. Otherwise, to filter out paralog mappings, pick the mapping whose span is most similar to that of the source.
   - Project features of the transcript, such as CDS and start codons, through the transcript alignment between the genomes. This ensures that features stay in the same location within the transcript.
2. Check all transcripts of the gene for consistency. Reject source gene mappings with transcripts on different chromosomes or strands, or where the genomic length of the gene has changed by more than 50%.
3. If a version of the gene exists in the target and the mapped gene doesn't overlap the target gene, the mapping is also rejected.
   - If a gene did not map or was rejected, and a version of the gene with the same biotype exists in the target annotations, use the existing gene.
4. Optionally, small automatic-only genes, or all automatic genes, are not mapped; the target annotation is passed through instead. This avoids complex mappings of small RNAs imported from other databases (e.g. miRNAs).
5. Target genes that have no corresponding mappings and that overlap patched regions, or regions with GRC incident reports in the target genome, may optionally be passed through. This addresses a fair number of problem cases, which were common on GRCh37 chrX.
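
The consistency check and the span-based choice between multiple mappings can be illustrated with a minimal Python sketch. It is not gencode-backmap's implementation: the `MappedTranscript` class, the function names, the 0-based half-open coordinates, and the `"ok"` label are assumptions made for this example, and only the rules stated above (same chromosome and strand, gene length change of at most 50%, most similar span) are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedTranscript:
    """Hypothetical record for one transcript after projection to the target."""
    chrom: str
    strand: str
    start: int  # 0-based, half-open (an assumption of this sketch)
    end: int


def pick_by_span(source_length, candidates):
    """Among several candidate mappings, pick the one whose span is closest
    in length to the source (the last-resort paralog filter)."""
    return min(candidates, key=lambda t: abs((t.end - t.start) - source_length))


def check_gene_mapping(mapped, source_gene_length, max_change=0.5):
    """Return a remap_status-like label for the mapped transcripts of one gene.

    "ok" is a placeholder of this sketch, not a real remap_status value.
    """
    if not mapped:
        return "deleted"
    # All transcripts must land on the same chromosome and strand.
    if len({(t.chrom, t.strand) for t in mapped}) > 1:
        return "gene_conflict"
    # Gene span on the target is the union of its transcript spans.
    start = min(t.start for t in mapped)
    end = max(t.end for t in mapped)
    change = abs((end - start) - source_gene_length) / source_gene_length
    if change > max_change:
        return "gene_size_change"
    return "ok"
```

A gene that fails either check would then fall through to the later steps, where an existing target gene with the same biotype may be used instead.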

Pairing of source and target genes is somewhat complex because some gene identifiers are not stable between assemblies. If a matching base gene ID (the ID without its version suffix) is not found, an attempt is made to match the genes by their symbolic name.
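
As a rough illustration of this two-stage pairing, the sketch below first matches on the base gene ID and then falls back to the gene symbol. The function names and the `(gene_id, gene_name)` tuple layout are assumptions made for this example; the actual tool may handle ambiguous symbols and other corner cases differently.

```python
def base_id(gene_id):
    """Strip the version suffix, e.g. 'ENSG00000223972.5' -> 'ENSG00000223972'."""
    return gene_id.split(".", 1)[0]


def pair_genes(source_genes, target_genes):
    """Pair source genes with target genes.

    Both arguments are lists of (gene_id, gene_name) tuples.
    Returns a dict mapping each source gene_id to a target gene_id, or to
    None when neither the base ID nor the symbol matches.
    """
    by_base = {base_id(gid): gid for gid, _ in target_genes}
    by_name = {name: gid for gid, name in target_genes}
    pairs = {}
    for gid, name in source_genes:
        # Prefer the base-ID match; fall back to the symbolic name.
        pairs[gid] = by_base.get(base_id(gid)) or by_name.get(name)
    return pairs
```

For example, a source gene whose ID changed between assemblies but whose symbol did not would still be paired with its target counterpart through the `by_name` lookup.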

Mapping categories

Information on each gene mapping is stored as attributes in the GFF3/GTF files. The attributes and their values are:

attribute name	attribute value
remap_status	Attribute that indicates the status of the mapping. Possible values are:
- full_contig: Gene or transcript completely mapped to the target genome with all features intact.
- full_fragment: Gene or transcript completely mapped to the target genome with insertions in some features. These are usually small insertions.
- partial: Gene or transcript partially mapped to the target genome.
- deleted: Gene or transcript did not map to the target genome.
- no_seq_map: The source sequence is not in the assembly alignments. This will occur with alt loci genes if the alignments only contain the primary assembly.
- gene_conflict: Transcripts in the gene mapped to multiple locations.
- gene_size_change: Transcripts caused gene length to change by more than 50%. This is to detect mapping to processed pseudogenes and mapping across tandem gene duplications.
- automatic_small_ncrna_gene: Gene is from a small, automatic (ENSEMBL source) non-coding RNA. Taken from the target annotation.
- automatic_gene: Gene is from an automatic process (ENSEMBL source). Taken from the target annotation.
- pseudogene: Pseudogene annotations (excluding polymorphic).
remap_original_id	Original ID attribute of the feature. If a feature is split when mapped, new IDs are created, otherwise the original ID is used.
remap_original_location	Location of the feature in the source genome.
remap_num_mappings	Number of mappings of the feature; only one of them was used.
remap_target_status	Attribute that compares the mapping to the existing target annotations. Possible values are:
- new: Gene or transcript was not in the target annotations.
- lost: Gene or transcript exists in both the source and target genomes, but the source was not mapped.
- overlap: Gene or transcript overlaps the previous version of the annotation on the target genome.
- nonOverlap: Gene or transcript exists in the target, but the source mapping is to a different location. These are often mappings to gene family members or pseudogenes.
remap_substituted_missing_target	Target annotation from which this gene annotation was taken, if the source gene couldn't be mapped or the mapping was ignored (e.g. ENSEMBL source). The usual value is "V19" (GENCODE 19).
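
To show how these attributes might be read back from a GRCh37-mapped GTF file, here is a minimal Python sketch that parses the ninth (attribute) column and tallies `remap_status` over gene records. The function names and the `"(none)"` placeholder are assumptions made for this example; the parser only covers the common `key "value";` and `key value;` forms.

```python
import re
from collections import Counter

# Matches `key "quoted value";` or `key bare_value;` pairs in a GTF attribute column.
_ATTR = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^\s;]+))\s*;')


def parse_gtf_attributes(column9):
    """Return a dict of attribute name -> list of values (keys such as `tag` can repeat)."""
    attrs = {}
    for m in _ATTR.finditer(column9):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs.setdefault(m.group(1), []).append(value)
    return attrs


def count_remap_status(lines):
    """Count remap_status values over the `gene` records of a GTF line iterable."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = parse_gtf_attributes(fields[8])
        # "(none)" is a placeholder of this sketch for records without the attribute.
        counts[attrs.get("remap_status", ["(none)"])[0]] += 1
    return counts
```

Passing an open file handle for a decompressed GTF to `count_remap_status` would give a quick overview of how many genes fall into each mapping category.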
