Ribo-seq ORFs

Ribosome profiling (Ribo-seq) has revealed thousands of translated open reading frames (ORFs) in the human genome that are not currently represented as protein-coding CDSs in reference gene annotation. GENCODE is working with UniProtKB, HUPO-HPP, PeptideAtlas, HGNC and experimental and analytical research groups through the TransCODE community to evaluate how these features should be represented in reference annotation.

This page provides access to GENCODE-supported Ribo-seq ORF catalogs, browser tracks and sequence files. All coordinates are on GRCh38.

Available datasets

Phase 1 is the initial consensus set of 7,264 Ribo-seq ORFs identified from seven experimental publications and mapped to GENCODE v35. The corresponding manuscript is available here.

Phase 2 is an updated catalog based on GENCODE v45. It was published in Nucleic Acids Research in 2026 (DOI: 10.1093/nar/gkag234), following an initial release on bioRxiv in July 2025. Phase 2 includes a broader Comprehensive set and a higher-confidence Primary set selected using Ribo-seq translation-signature scores. Dataset files are available via the GENCODE FTP site here. A UCSC Genome Browser track hub is available here, or can be loaded directly in the UCSC Genome Browser.

A parallel line of work evaluates which Phase 1 Ribo-seq ORFs are supported by peptide evidence. This work has been published in Nature as “Expanding the human proteome with microproteins and peptideins” (DOI: 10.1038/s41586-026-10459-x). Three of its supplementary tables are relevant here. Supplementary Table S3 lists Ribo-seq ncORFs annotated in the Human non-HLA PeptideAtlas 2023-06 build. Supplementary Table S6 lists peptides mapped to Ribo-seq ncORFs from the Human HLA PeptideAtlas 2023-11 build. Supplementary Table S12 lists peptideins, that is, Ribo-seq ORFs appraised by TransCODE and PeptideAtlas as having sufficient evidence for annotation as peptide products of yet-to-be-determined functional significance.

Phase 1 files have also been standardised into BED12 and sequence formats. The Phase 1 BED12 files encode peptide support tiers in the itemRgb field; see the Phase 1 README for the colour scheme. A standalone UCSC Genome Browser track hub for the peptide-support track is available here.
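As a rough illustration of how the itemRgb field might be inspected, the sketch below parses BED12 lines and tallies features per itemRgb value. The field names follow the standard BED12 column order. The example records and their colours are made up for demonstration, and the mapping from colour to peptide support tier is deliberately not reproduced here because it is defined only in the Phase 1 README.

```python
from collections import Counter

# Standard BED12 column names, in order.
BED12_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount",
    "blockSizes", "blockStarts",
]
INT_FIELDS = ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount")


def parse_bed12(line):
    """Split one tab-separated BED12 line into a dict of named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected 12 columns, found {len(cols)}")
    record = dict(zip(BED12_FIELDS, cols[:12]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    return record


def count_by_item_rgb(lines):
    """Count features per itemRgb string, skipping blank and header lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        counts[parse_bed12(line)["itemRgb"]] += 1
    return counts


# Made-up records for demonstration only; these are not real ORFs,
# and the colours are placeholders, not the README colour scheme.
EXAMPLE_LINES = [
    "track name=example",
    "\t".join(["chr1", "1000", "1230", "example_orf_1", "0", "+",
               "1000", "1230", "255,0,0", "2", "30,60,", "0,170,"]),
    "\t".join(["chr1", "5000", "5090", "example_orf_2", "0", "-",
               "5000", "5090", "255,0,0", "1", "90,", "0,"]),
    "\t".join(["chr2", "200", "260", "example_orf_3", "0", "+",
               "200", "260", "0,0,0", "1", "60,", "0,"]),
]

print(count_by_item_rgb(EXAMPLE_LINES))
```

To run this on a downloaded release file, pass an open file handle instead of the example list, for instance `count_by_item_rgb(open("Ribo-seq_ORFs.all.bed"))`, and then look up each colour in the README.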

Standardised release files

The table below provides access to standardised BED12 and sequence files for the Phase 1 and Phase 2 ORF sets. Phase 1 ORFs are annotated against GENCODE v35; Phase 2 ORFs against GENCODE v45. Phase 2 amino acid sequences (.faa) are translated computationally from the spliced nucleotide sequence.

Dataset | Features | GENCODE version | BED12 | Nucleotide (.fna) | Amino acid (.faa) | bigBed
Phase 1: all | 7,264 | v35 | Ribo-seq_ORFs.all.bed | Ribo-seq_ORFs.all.fna | Ribo-seq_ORFs.all.faa | Ribo-seq_ORFs_peptide_support.bb
Phase 1: peptideins | 121 | v35 | Ribo-seq_ORFs.peptideins.bed | Ribo-seq_ORFs.peptideins.fna | Ribo-seq_ORFs.peptideins.faa | Included in the Phase 1 "all" bigBed
Phase 2: primary | 10,127 | v45 | Ribo-seq_ORFs.primary.bed | Ribo-seq_ORFs.primary.fna | Ribo-seq_ORFs.primary.faa | Ribo-seq_ORFs.primary.bb
Phase 2: comprehensive | 28,359 | v45 | Ribo-seq_ORFs.comprehensive.bed | Ribo-seq_ORFs.comprehensive.fna | Ribo-seq_ORFs.comprehensive.faa | Ribo-seq_ORFs.comprehensive.bb
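For readers who want to see what computational translation of a spliced nucleotide sequence involves, here is a minimal sketch. It converts BED12 block fields into genomic exon intervals and translates an already spliced, strand-corrected sequence with the standard genetic code. It is not the pipeline used to produce the release files. In particular, it assumes translation stops at the first in-frame stop codon and that the first codon is translated literally, so a non-AUG start is not converted to methionine; how the release files handle these cases is not stated on this page. The function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def bed12_blocks(chrom_start, block_sizes, block_starts):
    """Return 0-based, half-open genomic (start, end) intervals for each block.

    block_sizes and block_starts are the comma-separated BED12 strings;
    block starts are offsets relative to chrom_start.
    """
    sizes = [int(x) for x in block_sizes.rstrip(",").split(",")]
    starts = [int(x) for x in block_starts.rstrip(",").split(",")]
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]


def translate(spliced_nt):
    """Translate a spliced, sense-strand sequence up to the first stop codon.

    Codons containing ambiguous bases are reported as 'X'. Trailing bases
    that do not form a complete codon are ignored.
    """
    nt = spliced_nt.upper().replace("U", "T")
    protein = []
    for i in range(0, len(nt) - len(nt) % 3, 3):
        amino_acid = CODON_TABLE.get(nt[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up inputs for demonstration only.
print(bed12_blocks(1000, "30,60,", "0,170,"))
print(translate("ATGAAATTTTAG"))
```

For a minus-strand ORF, the exon sequences would need to be concatenated and reverse-complemented before calling `translate`; the sketch leaves that step out and assumes the input is already in the sense orientation.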

Important interpretation notes

We strongly recommend reading the associated manuscripts before interpreting these annotations. Ribo-seq provides evidence of translation, but the biological interpretation of individual Ribo-seq ORFs is not always straightforward.

GENCODE has so far annotated only a modest number of these ORFs as protein-coding genes. Evidence that an ORF is translated, or even that a corresponding peptide can be detected, is not necessarily sufficient on its own to support protein-coding gene annotation. This is particularly relevant for ORFs supported by immunopeptidomics evidence, where detected peptides may derive from translation products that are not fully stable proteins.

Translated ORFs may contribute to biology in several ways: by encoding stable proteins or microproteins, by producing peptides detectable by proteomics or immunopeptidomics, by regulating translation of another coding sequence, or by having little measurable cellular consequence. These catalogs are intended to support research into these possibilities and will continue to be refined as additional evidence becomes available.