In recent years, Ribosome Profiling (Ribo-seq) has been used to detect thousands of non-canonical – i.e. unannotated – translated open reading frames (ORFs) in the human genome. GENCODE are working on a long-term community-driven project to incorporate these features into reference gene annotation. This pioneering work is being done in collaboration with the UniProtKB / Swiss-Prot, HUPO-PP and HGNC annotation projects, alongside a variety of experimental and analytical research groups from across the globe.
The first stage of this work involved building a consensus set of Ribo-seq ORFs that were identified by seven recent experimental publications and mapped to GENCODE version 35 annotations. A manuscript detailing this work is available here. A supplementary file containing these 7,264 Ribo-seq ORFs, plus other data, is attached to the publication. A bigBed file is available here, and the URL https://ftp.ebi.ac.uk/pub/databases/gencode/riboseq_orfs/data/Ribo-seq_ORFs.bb can be used to create a custom track on the Ensembl or UCSC genome browsers.
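As a rough illustration of the custom-track route, the sketch below builds a UCSC-style track line for the bigBed URL above, plus a link that loads it through the UCSC `hgct_customText` parameter. The track name, description and visibility are placeholder values chosen for this example, and the `hg38` assembly is assumed on the basis that GENCODE version 35 is annotated on GRCh38. The helper function names are hypothetical and are not part of any GENCODE tooling.

```python
from urllib.parse import quote

# bigBed URL given in the text above.
BIGBED_URL = (
    "https://ftp.ebi.ac.uk/pub/databases/gencode/"
    "riboseq_orfs/data/Ribo-seq_ORFs.bb"
)


def make_track_line(big_data_url, name="Ribo-seq ORFs",
                    description="GENCODE Ribo-seq ORFs", visibility="pack"):
    """Return a UCSC custom-track definition line for a remote bigBed file.

    name, description and visibility are illustrative defaults (assumptions).
    """
    return (
        f'track type=bigBed name="{name}" description="{description}" '
        f"visibility={visibility} bigDataUrl={big_data_url}"
    )


def make_ucsc_link(track_line, db="hg38"):
    """Return a UCSC Genome Browser URL that loads the track line.

    db="hg38" is an assumption (GENCODE v35 uses the GRCh38 assembly).
    """
    # Percent-encode the whole track line so it is safe inside a query string.
    encoded = quote(track_line, safe="")
    return (
        "https://genome.ucsc.edu/cgi-bin/hgTracks"
        f"?db={db}&hgct_customText={encoded}"
    )


if __name__ == "__main__":
    line = make_track_line(BIGBED_URL)
    print(line)                  # paste into the UCSC "Add Custom Tracks" box
    print(make_ucsc_link(line))  # or open this link directly
```

The track line can be pasted into the browser's custom-track form as it stands; the link form is only a convenience for sharing a preconfigured view. Ensembl accepts the bigBed URL itself through its own custom-track interface, so no track line is needed there.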
We strongly recommend reading the manuscript before exploring these annotations. The biological interpretation of Ribo-seq ORFs is not straightforward, and at present GENCODE are deliberately not providing additional insights into their prospective functionality. In particular, we caution that Ribo-seq ORFs should not be regarded en masse as ‘missing’ protein annotations. It remains to be established which Ribo-seq ORFs are translated into stable proteins, and it is known that translation can instead impart function through gene regulation. It is also plausible that certain Ribo-seq ORFs do not make important contributions to cellular physiology. We hope that these preliminary annotations will help researchers interested in addressing such questions, and we anticipate that the answers will lead to further improvement and refinement of our catalog.