Human

GRCh37-mapped Release history

GENCODE supports genomics projects that are still attached to GRCh37/hg19 by providing updated human gene annotation on this genome assembly version.

The following GENCODE releases were built on GRCh38, but GRCh37-mapped versions are also available from the links below.


| Freeze date | GENCODE release | Reference release? | Release date | Genome assembly version | Ensembl release | UCSC version | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 03.2023 | 45 | Y | 01.2024 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |
| 12.2022 | 44 | N | 07.2023 | mapped to GRCh37 | | 44lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 08.2022 | 43 | N | 02.2023 | mapped to GRCh37 | | 43lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 04.2022 | 42 | N | 10.2022 | mapped to GRCh37 | | 42lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 01.2022 | 41 | N | 07.2022 | mapped to GRCh37 | | 41lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 08.2021 | 40 | N | 04.2022 | mapped to GRCh37 | | 40lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 05.2021 | 39 | N | 12.2021 | mapped to GRCh37 | | 39lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 12.2020 | 38 | N | 05.2021 | mapped to GRCh37 | | 38lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 08.2020 | 37 | N | 02.2021 | mapped to GRCh37 | | 37lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 05.2020 | 36 | N | 10.2020 | mapped to GRCh37 | | 36lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 03.2020 | 35 | N | 08.2020 | mapped to GRCh37 | | 35lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 11.2019 | 34 | N | 04.2020 | mapped to GRCh37 | | 34lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 08.2019 | 33 | N | 01.2020 | mapped to GRCh37 | | 33lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 05.2019 | 32 | N | 09.2019 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |
| 02.2019 | 31 | N | 06.2019 | mapped to GRCh37 | | 31lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 11.2018 | 30 | N | 04.2019 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |
| 05.2018 | 29 | N | 10.2018 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |
| 11.2017 | 28 | N | 04.2018 | mapped to GRCh37 | | 28lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 01.2017 | 27 | N | 08.2017 | mapped to GRCh37 | | 27lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 10.2016 | 26 | N | 03.2017 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |
| 03.2016 | 25 | N | 07.2016 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |
| 08.2015 | 24 | N | 12.2015 | mapped to GRCh37 | | 24lift37 | re-merge with new Havana annotation, updated Ensembl gene set |
| 03.2015 | 23 | N | 07.2015 | mapped to GRCh37 | | - | re-merge with new Havana annotation, updated Ensembl gene set |

Mapping algorithm

The gene annotation originally created on the GRCh38 reference chromosomes was mapped to GRCh37 using gencode-backmap, following the instructions provided on its website.

The program takes the current ("source") GENCODE GFF3 or GTF, cross-assembly (UCSC hg38-to-hg19 liftover) genomic alignments, and the GENCODE 19 ("target") annotation files. The mapping algorithm described in the documentation is as follows.

Mapping is done on a per-gene basis using the following steps:

  • Project transcripts of the gene through the alignments, keeping exons chained.
    • If there are multiple mappings, first look for the ones that overlap the previous version of the transcript, if it exists. Otherwise, if there is a previous version of the gene, select the mappings that overlap the gene. Otherwise, to filter out paralog mappings, pick the mapping whose span is most similar to that of the source.
    • Project features of the transcript, such as CDS and start codons, to the transcript alignment between the genomes. This ensures that features stay in the same location within the transcript.
  • Check all transcripts of the gene for consistency. Reject source gene mappings with transcripts on different chromosomes or strands, or where the genomic length of the gene has changed more than 50%.
  • If a version of the gene exists in the target and the mapped gene doesn't overlap the target gene, it is also rejected.
    • If a gene did not map or was rejected and a version of the gene with the same biotype exists in the target annotations, use the existing gene.
  • Small automatic non-coding RNA genes, or all automatic genes, can optionally be left unmapped, with the target annotation passed through instead. This avoids complex mappings of small RNAs imported from other databases (e.g. miRNAs).
  • Target genes that have no corresponding mapping and that overlap patched regions or regions with GRC incident reports in the target genome may optionally be passed through. This resolves a number of problem cases, which were especially common on GRCh37 chrX.
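The gene-level consistency check in the steps above can be illustrated with a minimal Python sketch. It is not gencode-backmap's own code: the names `MappedTranscript` and `check_gene_mapping`, the `"ok"` label, the choice of which status label each rejection gets, and measuring the length change relative to the source gene length are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MappedTranscript:
    """Hypothetical record for one transcript after projection to the target."""
    chrom: str
    strand: str
    start: int  # 0-based, half-open coordinates (an assumption of this sketch)
    end: int


def check_gene_mapping(source_gene_length: int,
                       transcripts: Sequence[MappedTranscript],
                       max_change: float = 0.5) -> str:
    """Return a status label for a mapped gene.

    The labels reuse remap_status names where they seem to fit; "ok" is a
    placeholder invented for this sketch.
    """
    if not transcripts:
        return "deleted"
    # Reject genes whose transcripts landed on different chromosomes or strands.
    if len({(t.chrom, t.strand) for t in transcripts}) > 1:
        return "gene_conflict"
    # Reject genes whose genomic length changed by more than 50%
    # (computed here relative to the source length, which is an assumption).
    mapped_length = max(t.end for t in transcripts) - min(t.start for t in transcripts)
    change = abs(mapped_length - source_gene_length) / source_gene_length
    if change > max_change:
        return "gene_size_change"
    return "ok"
```

A gene whose transcripts map to both chr1 and chr2 would be labelled `gene_conflict` here, and a 1,000 bp gene that maps to a 400 bp span would be labelled `gene_size_change`.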

Pairing of source and target genes is somewhat complex because some gene identifiers are not stable between assemblies. If a matching base gene ID (the ID without its version suffix) is not found, an attempt is made to match the genes using the symbolic name.
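The pairing fallback described here can be sketched as follows. This is an illustrative approximation rather than the tool's implementation: the helper names `base_id` and `pair_gene`, the dictionary-based gene records, and the rule that the first target wins when several share a name are assumptions, and the identifiers used in the usage note are made up.

```python
from typing import Iterable, Mapping, Optional


def base_id(gene_id: str) -> str:
    """Drop everything from the first '.' onward, i.e. the version suffix."""
    return gene_id.split(".", 1)[0]


def pair_gene(source: Mapping[str, str],
              targets: Iterable[Mapping[str, str]]) -> Optional[Mapping[str, str]]:
    """Find the target gene that corresponds to a source gene.

    First try the base gene ID; if that fails, fall back to the gene name.
    Returns None when neither lookup succeeds.
    """
    targets = list(targets)
    by_id = {}
    by_name = {}
    for gene in targets:
        # setdefault keeps the first gene seen for each key (a simplification).
        by_id.setdefault(base_id(gene["gene_id"]), gene)
        by_name.setdefault(gene["gene_name"], gene)
    match = by_id.get(base_id(source["gene_id"]))
    if match is None:
        match = by_name.get(source["gene_name"])
    return match
```

For example, a source gene `{"gene_id": "ENSG00000000001.7", "gene_name": "ABC1"}` pairs with a target whose `gene_id` is `"ENSG00000000001.3"` by base ID, while a source gene whose ID changed between assemblies can still pair with a target that carries the same `gene_name`.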

Mapping categories

Information on each gene mapping is stored as attributes in the GFF3/GTF files. The attributes and their values are:

attribute name attribute value
remap_status Attribute that indicates the status of the mapping. Possible values are:
  • full_contig: Gene or transcript completely mapped to the target genome with all features intact.
  • full_fragment: Gene or transcript completely mapped to the target genome with insertions in some features. These are usually small insertions.
  • partial: Gene or transcript partially mapped to the target genome.
  • deleted: Gene or transcript did not map to the target genome.
  • no_seq_map: The source sequence is not in the assembly alignments. This will occur with alt loci genes if the alignments only contain the primary assembly.
  • gene_conflict: Transcripts in the gene mapped to multiple locations.
  • gene_size_change: Transcripts caused gene length to change by more than 50%. This is to detect mapping to processed pseudogenes and mapping across tandem gene duplications.
  • automatic_small_ncrna_gene: Gene is from a small, automatic (ENSEMBL source) non-coding RNA. Taken from the target annotation.
  • automatic_gene: Gene is from an automatic process (ENSEMBL source). Taken from the target annotation.
  • pseudogene: Pseudogene annotations (excluding polymorphic).
remap_original_id Original ID attribute of the feature. If a feature is split when mapped, new IDs are created, otherwise the original ID is used.
remap_original_location Location of the feature in the source genome.
remap_num_mappings Number of mappings found for the feature; only one of them was used.
remap_target_status Attribute that compares the mapping to the existing target annotations. Possible values are:
  • new: Gene or transcript was not in target annotations.
  • lost: Gene or transcript exists in both the source and the target genome, but the source was not mapped.
  • overlap: Gene or transcript overlaps previous version of annotation on target genome.
  • nonOverlap: Gene or transcript exists in the target, but the source maps to a different location. This is often a mapping to another gene family member or to a pseudogene.
remap_substituted_missing_target Target annotation from which this gene annotation was taken, if the source gene couldn't be mapped or the mapping was ignored (e.g. ENSEMBL source). The usual value is "V19" (GENCODE 19).
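As a rough illustration of how these attributes might be read back out of a mapped GTF file, the sketch below tallies `remap_status` values per gene. It assumes the standard nine-column, tab-separated GTF layout with `key "value";` pairs in the last column; the function names `parse_gtf_attributes` and `count_remap_status` are hypothetical, and the regular expression is a simplification that does not handle semicolons inside quoted values.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches `key "quoted value";` or `key bare_value;` pairs.
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;')


def parse_gtf_attributes(field: str) -> Dict[str, List[str]]:
    """Parse the GTF attribute column into {key: [values]}.

    Values are kept in lists because GTF allows repeated keys.
    """
    attrs: Dict[str, List[str]] = {}
    for m in _ATTR_RE.finditer(field):
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs.setdefault(m.group(1), []).append(value)
    return attrs


def count_remap_status(lines: Iterable[str], feature: str = "gene") -> Counter:
    """Count remap_status values over records of the given feature type."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        for status in parse_gtf_attributes(cols[8]).get("remap_status", []):
            counts[status] += 1
    return counts
```

Passing an open file handle for a decompressed GTF to `count_remap_status` gives a quick summary of how many genes fall into each mapping category; passing `feature="transcript"` does the same at the transcript level.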