In recent years, Ribosome Profiling (Ribo-seq) has been used to detect thousands of non-canonical – i.e. unannotated – translated open reading frames (ORFs) in the human genome. GENCODE have now embarked on a long-term community-driven project to incorporate these features into reference gene annotation. This pioneering work is being done in collaboration with the UniProtKB / Swiss-Prot and HGNC annotation projects, alongside a variety of experimental and analytical research groups from across the globe.
The first stages of this work involve building a consensus set of Ribo-seq ORFs, identified by seven recent experimental publications and mapped to GENCODE version 35 annotations. A pre-print manuscript describing these efforts is available here. We are currently preparing to release a GENCODE-style annotation file for these Ribo-seq ORFs; in the meantime, a spreadsheet containing this first catalog can be found attached to the pre-print.
We strongly recommend reading the manuscript before exploring these annotations. The biological interpretation of Ribo-seq ORFs is not straightforward, and GENCODE are not currently providing additional insights into their prospective functionality. In particular, we caution that Ribo-seq ORFs should not be regarded as ‘missing’ protein annotations. It remains to be established which Ribo-seq ORFs are translated into stable proteins, and it is known that translation can instead impart function through gene regulation. It is also plausible that certain Ribo-seq ORFs do not make important contributions to cellular physiology. We hope that these preliminary annotations will help researchers interested in addressing such questions.