COVID-19 Gene Annotation

GENCODE are updating the annotation of human protein-coding genes linked to SARS-CoV-2 infection and COVID-19 disease.

In response to the SARS-CoV-2 pandemic, GENCODE are applying our annotation resources to human genes with potential links to viral infection and COVID-19 disease. This gene list is found here. We are looking to see if the existing annotation for these genes can be improved. Firstly, we are adding transcript models to these genes where possible, based on the identification of additional alternatively spliced isoforms. Secondly, we are making changes to existing transcript models where appropriate; in particular, we are examining whether our ‘partial’ models (those that are incomplete at their 5’ and/or 3’ ends) can be extended to full length. For both steps we are making particular use of ‘Capture-Long-read-Seq’ transcriptomics libraries produced for GENCODE at the Centre for Genomic Regulation, Barcelona.
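As a rough illustration of how ‘partial’ models can be picked out of an annotation file, the sketch below scans GTF transcript lines for the tags that GENCODE GTF files use to flag unconfirmed ends (mRNA_start_NF, mRNA_end_NF, cds_start_NF, cds_end_NF). It assumes the file being read carries these tags in the standard GENCODE attribute layout; the function name partial_transcripts is ours and is not part of any GENCODE tool.

```python
import re

# Tags used in GENCODE GTF files for transcripts whose mRNA or CDS
# start/end could not be confirmed ("NF" = not found).
PARTIAL_TAGS = {"mRNA_start_NF", "mRNA_end_NF", "cds_start_NF", "cds_end_NF"}

TAG_RE = re.compile(r'tag "([^"]+)"')
ID_RE = re.compile(r'\btranscript_id "([^"]+)"')


def partial_transcripts(gtf_lines):
    """Return {transcript_id: sorted incompleteness tags} for partial models.

    Only lines whose feature type (column 3) is "transcript" are inspected;
    header lines and malformed lines are skipped.
    """
    result = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attributes = fields[8]
        match = ID_RE.search(attributes)
        if match is None:
            continue
        tags = PARTIAL_TAGS.intersection(TAG_RE.findall(attributes))
        if tags:
            result[match.group(1)] = sorted(tags)
    return result
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example partial_transcripts(open("covid19_genes.gtf")), where the file name is hypothetical.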

We emphasise that this work is being carried out according to our existing GENCODE annotation guidelines, and we are working to interpret the expression and function of each gene in a general scientific context. We do not offer any guidance as to which specific annotation changes may be of direct relevance to SARS-CoV-2 infection or COVID-19 disease.

The new annotations will appear in an Ensembl / GENCODE release in due course. However, to expedite public access, all annotation changes will be made freely available within 24 hours via our new ‘COVID-19 genes’ Track Hub, which can be accessed in both the Ensembl and UCSC genome browsers. In the Ensembl browser, the hub has been added to the Track Hub Registry (accessed via the ‘Custom tracks’ section), and can be connected to by searching for ‘COVID-19’. Alternatively, you can manually connect to the hub as a custom track using its URL. You should use this same URL for connection in the UCSC browser, adding it to ‘My Hubs’ within the ‘Track Hubs’ section. The Track Hub data set contains annotation for only those genes that are included in our COVID-19 gene list. In the Track data view, transcript models that are unchanged with respect to Ensembl release 100 are coloured blue, whereas new models and pre-existing models that have been modified are shown in orange. We also offer BED and GTF files for these annotations.
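For readers working with the BED files rather than a browser, the blue/orange distinction can in principle be recovered from the file itself. The sketch below groups transcript names by the itemRgb field (column 9 of the BED format), so the colour classes can be separated without knowing the exact RGB values in advance. It assumes the BED files include at least nine tab-separated columns and that the colouring is stored in itemRgb; neither is confirmed here, and the function name group_by_item_rgb is our own.

```python
from collections import defaultdict


def group_by_item_rgb(bed_lines):
    """Group BED record names (column 4) by their itemRgb value (column 9).

    Blank lines, comments, and "track"/"browser" header lines are skipped,
    as are records with fewer than nine columns (no colour information).
    """
    groups = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        groups[fields[8]].append(fields[3])
    return dict(groups)
```

Each key of the returned dictionary is one colour as written in the file, and its value lists the models drawn in that colour. Which key corresponds to unchanged models and which to new or modified ones has to be checked against the track display.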

Our gene list has been compiled from recent publications and in collaboration with other projects. The bulk of the genes were taken from recently published drug repurposing studies by Zhou et al. and Gordon et al. The former study collates host proteins associated with other related coronaviruses based on various types of experimental evidence, while the latter examines human proteins found to physically associate with SARS-CoV-2 viral proteins in the cell. We have also incorporated additional genes featured in lists curated by UniProt and the Human Cell Atlas project, and added several more interferon-stimulated genes with known antiviral activity (see Schoggins and Rice). We emphasise that our overall list is not a set of genes with confirmed scientific or medical importance to SARS-CoV-2 infection or COVID-19 disease, but rather a broadly curated list of genes that may be relevant.

We expect that further genes will be added to this list as research in SARS-CoV-2 / COVID-19 continues, and we will update this webpage as appropriate. Please contact us if you believe additional genes should be added to the list, or if you would like further information on any annotation changes.

You may also wish to read our blog post on this work hosted at the Ensembl website.