GRCh37-mapped Release history
GENCODE supports genomics projects that still depend on GRCh37/hg19 by providing updated human gene annotation on this genome assembly version.
The following GENCODE releases were built on GRCh38, but GRCh37-mapped versions are also available from the links below.
|Freeze date||GENCODE release||Reference release?||Release date||Genome assembly version||UCSC version||Notes|
|05.2018||29||Y||10.2018||mapped to GRCh37||-||re-merge with new Havana annotation, updated Ensembl gene set|
|11.2017||28||N||04.2018||mapped to GRCh37||-||re-merge with new Havana annotation, updated Ensembl gene set|
|01.2017||27||N||08.2017||mapped to GRCh37||27lift37||re-merge with new Havana annotation, updated Ensembl gene set|
|10.2016||26||N||03.2017||mapped to GRCh37||-||re-merge with new Havana annotation, updated Ensembl gene set|
|03.2016||25||N||07.2016||mapped to GRCh37||-||re-merge with new Havana annotation, updated Ensembl gene set|
|08.2015||24||N||12.2015||mapped to GRCh37||24lift37||re-merge with new Havana annotation, updated Ensembl gene set|
|03.2015||23||N||07.2015||mapped to GRCh37||-||re-merge with new Havana annotation, updated Ensembl gene set|
The gene annotation originally created on the GRCh38 reference chromosomes was mapped to GRCh37 using gencode-backmap, following the instructions provided on its website.
The program takes as input the current ("source") GENCODE GFF3 or GTF file, cross-assembly genomic alignments (the UCSC hg38-to-hg19 liftover), and the GENCODE 19 ("target") annotation files. The mapping algorithm, as described in the documentation, is as follows.
Mapping is done on a per-gene basis using the following steps:
- Project transcripts of the gene through the alignments, keeping exons chained.
- If there are multiple mappings, first look for ones that overlap the previous version of the transcript, if it exists. Otherwise, if there is a previous version of the gene, select mappings overlapping the gene. Otherwise, to filter out paralog mappings, pick the mapping whose span is most similar to that of the source.
- Project features of the transcript, such as CDS and start codons, through the alignment of the transcript between the two genomes. This keeps each feature at the same position within the transcript.
- Check all transcripts of the gene for consistency. Reject source gene mappings with transcripts on different chromosomes or strands, or where the genomic length of the gene has changed by more than 50%.
- If a version of the gene exists in the target and the mapped gene doesn't overlap the target gene, it is also rejected.
- If a gene did not map or was rejected and a version of the gene with the same biotype exists in the target annotations, use the existing gene.
- Optionally, small automatic-only genes, or all automatic genes, are not mapped, and the target annotation is passed through instead. This avoids complex mappings of small RNAs imported from other databases (e.g. miRNAs).
- Target genes with no corresponding mapping that overlap patched regions, or regions with GRC incident reports, in the target genome may optionally be passed through. This resolves a fair number of problem cases, which were common on GRCh37 chrX.
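The first step, projecting a transcript through the alignments while keeping its exons chained, can be illustrated with a minimal Python sketch. It assumes the liftover alignment has already been reduced to a list of ungapped blocks on one chromosome and strand, sorted by source position, with 0-based half-open coordinates. The `Block` type and the function names are hypothetical and are not part of gencode-backmap; strand handling and chain-file parsing are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


@dataclass(frozen=True)
class Block:
    """One ungapped alignment block: source [src_start, src_start + size)
    aligns to target [tgt_start, tgt_start + size)."""
    src_start: int
    tgt_start: int
    size: int


def project_interval(start: int, end: int, blocks: List[Block]) -> List[Interval]:
    """Project a source interval through the blocks.

    Bases that fall into alignment gaps are dropped, so a single exon
    can come out as several target pieces."""
    pieces: List[Interval] = []
    for b in blocks:
        lo = max(start, b.src_start)
        hi = min(end, b.src_start + b.size)
        if lo < hi:
            offset = b.tgt_start - b.src_start
            pieces.append((lo + offset, hi + offset))
    return pieces


def project_transcript(exons: List[Interval],
                       blocks: List[Block]) -> Optional[List[List[Interval]]]:
    """Project every exon of a transcript; return None if the chain breaks."""
    projected: List[List[Interval]] = []
    for s, e in exons:
        pieces = project_interval(s, e, blocks)
        if not pieces:
            return None  # an exon maps nowhere
        projected.append(pieces)
    # Exons must stay in the same order in the target to remain chained.
    flat = [p for pieces in projected for p in pieces]
    if any(a[1] > b[0] for a, b in zip(flat, flat[1:])):
        return None
    return projected
```

With blocks `[Block(100, 1100, 50), Block(160, 1200, 100)]`, the source bases 150 to 160 fall into a gap, so an exon spanning them is split into two target pieces.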
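The preference order used when a transcript has several mappings (overlap with the previous transcript version, then with the previous gene version, then most similar span) could be sketched as follows. This is an illustration under stated assumptions, not gencode-backmap's code: "most similar span" is taken to mean the smallest difference in genomic length, and when several mappings overlap the anchor, the same length criterion is used as a tie-break. `Span` and `select_mapping` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    chrom: str
    strand: str
    start: int  # 0-based, half-open
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start

    def overlaps(self, other: "Span") -> bool:
        return (self.chrom == other.chrom and self.strand == other.strand
                and self.start < other.end and other.start < self.end)


def select_mapping(candidates: List[Span], source: Span,
                   prev_transcript: Optional[Span] = None,
                   prev_gene: Optional[Span] = None) -> Optional[Span]:
    """Pick one mapping of a transcript among several candidates."""
    if not candidates:
        return None
    pool = candidates
    # Prefer mappings that overlap the previous transcript, then the previous gene.
    for anchor in (prev_transcript, prev_gene):
        if anchor is not None:
            hits = [c for c in candidates if c.overlaps(anchor)]
            if hits:
                pool = hits
                break
    # Assumed paralog filter: keep the span whose length is closest to the source.
    return min(pool, key=lambda c: abs(c.length - source.length))
```

Without a previous version the length criterion alone decides; with one, any overlapping mapping wins even if its length differs more.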
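The gene-level consistency checks and the fallback to an existing target gene can likewise be sketched in Python. The 50% rule is interpreted here as an absolute change in genomic span of more than half the source gene length, which is an assumption; the outcome labels returned by `resolve_gene` are hypothetical and do not correspond to gencode-backmap attribute values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mapped:
    chrom: str
    strand: str
    start: int  # 0-based, half-open
    end: int


def gene_mapping_ok(source_len: int, transcripts: List[Mapped],
                    target_gene: Optional[Mapped] = None,
                    max_change: float = 0.5) -> bool:
    """Return True if a gene's mapped transcripts pass the consistency checks."""
    if not transcripts:
        return False
    # All transcripts must land on one chromosome and one strand.
    if len({(t.chrom, t.strand) for t in transcripts}) != 1:
        return False
    start = min(t.start for t in transcripts)
    end = max(t.end for t in transcripts)
    # Reject if the genomic length changed by more than max_change (50%).
    if abs((end - start) - source_len) > max_change * source_len:
        return False
    # If the gene already exists in the target, the mapping must overlap it.
    if target_gene is not None:
        same_chrom = transcripts[0].chrom == target_gene.chrom
        if not (same_chrom and start < target_gene.end and target_gene.start < end):
            return False
    return True


def resolve_gene(mapping_ok: bool, source_biotype: str,
                 target_biotype: Optional[str]) -> str:
    """Decide what to emit for a gene (hypothetical labels)."""
    if mapping_ok:
        return "use_mapping"
    if target_biotype is not None and target_biotype == source_biotype:
        return "use_target_gene"  # fall back to the existing GRCh37 gene
    return "unresolved"
```

Passing `target_gene=None` models a gene that has no previous version in the target annotation, so only the chromosome, strand and length checks apply.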
Pairing source and target genes is somewhat complex because some gene identifiers are unstable between assemblies. If no target gene with a matching base gene ID (the ID without its version suffix) is found, an attempt is made to match the genes by their symbolic name.
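A minimal sketch of this two-stage pairing, first by base gene ID and then by gene name, is shown below. It assumes that a name shared by several target genes is too ambiguous to use and excludes it from the name index; that rule and all function names are hypothetical, and the example IDs are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def base_id(gene_id: str) -> str:
    """Drop the version suffix, e.g. 'ENSG00000000001.5' -> 'ENSG00000000001'."""
    return gene_id.partition(".")[0]


def build_indexes(target_genes: Iterable[Tuple[str, str]]
                  ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Index (gene_id, gene_name) pairs of the target annotation."""
    genes = list(target_genes)
    by_id = {base_id(gid): gid for gid, _ in genes}
    counts = Counter(name for _, name in genes)
    # Assumption: only unique names are safe to match on.
    by_name = {name: gid for gid, name in genes if counts[name] == 1}
    return by_id, by_name


def pair_target_gene(source_id: str, source_name: str,
                     by_id: Dict[str, str],
                     by_name: Dict[str, str]) -> Optional[str]:
    """Return the paired target gene ID, or None if no pairing is found."""
    match = by_id.get(base_id(source_id))
    if match is not None:
        return match
    return by_name.get(source_name)  # fall back to the symbolic name
```

A source gene whose ID changed between assemblies but whose name is unique in the target is still paired through the name index.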
Information on each gene mapping is stored as attributes in the GFF3/GTF files. The attributes and their values are:
|attribute name||attribute value|
Attribute that indicates the status of the mapping. Possible values are:
|remap_original_id||Original ID attribute of the feature. If a feature is split when mapped, new IDs are created; otherwise, the original ID is used.|
|remap_original_location||Location of the feature in the source genome.|
|remap_num_mappings||Number of mappings found for the feature; only one of them is used.|
Attribute that compares the mapping to the existing target annotations. Possible values are:
|remap_substituted_missing_target||Target annotation from which this gene annotation was taken, if the source gene could not be mapped or its mapping was ignored (e.g. genes with source ENSEMBL). The usual value is "V19" (GENCODE 19).|
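To inspect these attributes in a mapped GTF file, one option is a small parser for the ninth (attribute) column. The sketch below is a hedged illustration, not part of gencode-backmap: it only extracts the `remap_*` keys listed in the table above, accepts both quoted and unquoted values, and assumes attribute values never contain `;`. The sample record used in the usage example is hypothetical, including its location string.

```python
from typing import Dict, List, Tuple

# Only the attribute names documented in the table above.
REMAP_KEYS = ("remap_original_id", "remap_original_location",
              "remap_num_mappings", "remap_substituted_missing_target")


def parse_gtf_attributes(attr: str) -> List[Tuple[str, str]]:
    """Split a GTF column-9 string into (key, value) pairs.

    Assumes values contain no ';'; surrounding double quotes are removed."""
    pairs: List[Tuple[str, str]] = []
    for field in attr.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        pairs.append((key, value.strip().strip('"')))
    return pairs


def remap_info(gtf_line: str) -> Dict[str, str]:
    """Return the remap_* attributes of a single GTF record."""
    cols = gtf_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GTF record must have 9 tab-separated columns")
    return {k: v for k, v in parse_gtf_attributes(cols[8]) if k in REMAP_KEYS}
```

For example, counting records where `"remap_substituted_missing_target" in remap_info(line)` gives the number of features taken from the GENCODE 19 annotation rather than mapped from GRCh38.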