Molecular Medicine Israel

Microproteins encoded by noncanonical ORFs are a major source of tumor-specific antigens in a liver cancer patient meta-cohort

Abstract

The expression of tumor-specific antigens during cancer progression can trigger an immune response against the tumor. Here, we investigate whether microproteins encoded by noncanonical open reading frames (ncORFs) are a relevant source of tumor-specific antigens. We analyze RNA sequencing data from 117 hepatocellular carcinoma (HCC) tumors and matched healthy tissue together with ribosome profiling and immunopeptidomics data. Combining human leukocyte antigen–epitope binding predictions and experimental validation, we conclude that around 40% of the tumor-specific antigens in HCC are likely to be derived from ncORFs, including two peptides that can trigger an immune response in humanized mice. We identify a subset of 33 tumor-specific long noncoding RNAs expressing novel cancer antigens shared by more than 10% of the HCC samples analyzed, which, when combined, cover a large proportion of the patients. The results of the study open avenues for extending the range of anticancer vaccines.

INTRODUCTION

Immunotherapy approaches against cancer, including immune checkpoint inhibitors (ICIs) and vaccines, rely on the ability of the immune system to recognize “nonself” antigens bound to human leukocyte antigen (HLA) molecules. Such neoepitopes can originate not only from nonsynonymous mutations in the cancer genome that result in mutated peptides but also from aberrant gene expression in tumors. The first class of antigens is especially relevant in cancers associated with a large number of mutations, such as melanoma, lung cancer, or bladder cancer (1). Expectedly, tumor mutational burden and the number of mutated peptides with predicted affinity to HLA molecules are positively correlated with the response to ICIs (2, 3).

The second class of antigens might be particularly relevant to develop therapeutic strategies for tumors that mutate less frequently, such as hepatocellular carcinoma (HCC), which represents ~90% of cases of liver cancer. Known cancer-specific antigens include the so-called cancer/testis antigens (CTAs) as well as peptides derived from reactivated human endogenous retroviruses (HERVs) (4). These antigens can be found in different cancer types, and they can be shared by several patients. Some of them, such as MAGE1A and NY-ESO, have been the basis of several cancer vaccines (5). Current limitations are the relatively low number of suitable targets with high tumor specificity and their sparse expression in cancer patient samples.

A promising approach to expand the current range of cancer-specific antigens that can be targeted by immunotherapy is to consider the translation products of noncanonical open reading frames (ncORFs). These ORFs are located in sequences that are not annotated as protein coding. One well-studied example is the MELOE-1 and MELOE-2 peptides encoded by the long noncoding transcript meloe (6, 7). This transcript is overexpressed in melanomas, and the encoded peptides generate a reactive T cell response (8). In the past few years, thousands of long noncoding RNAs (lncRNAs) containing ncORFs that are translated into microproteins have been described (9, 10). In addition, mass spectrometry (MS) immunopeptidomics data from cancer cell lines and tumors indicate that ncORFs can generate peptides that are presented by HLA molecules (11–17). It has been reported that ncORF products can represent up to 15% of the HLA-I–bound peptides in certain tumor types (18), a sizable fraction that remains largely uncharacterized. In addition, they appear to give rise to HLA-I–bound peptides more frequently than standard proteins (19).

To be able to avoid immune self-tolerance, the ncORFs need to be expressed in a tumor-specific manner. However, due to the lack of studies comparing tumor and healthy tissues from the same set of patients, it is unclear how many of the previously reported ncORF-derived antigens are actually restricted to tumors. Thus, it is not known if peptides derived from ncORFs could be relevant as therapeutic targets. To address these questions, we have focused on tumor and matched healthy tissue sequencing data from a larger number of patients with HCC. Treatment of HCC in advanced stages remains a challenge (20). Because this is a type of cancer with relatively few mutations, antigens derived from tumor-specific transcripts could play a major role in driving immunogenicity. We present data supporting that ncORFs are a relevant source of tumor-specific antigens in HCC. The findings could have important implications for the development of cancer vaccines of wide applicability.

RESULTS

The integration of different tumor/normal matched datasets results in a large meta-cohort for the discovery of tumor-specific transcripts

We identified four HCC patient cohorts with transcriptomics data for both tumor and adjacent normal tissue (Fig. 1A, HCC1 to HCC3 and TCGA, and table S1) (21–24). Together, this represented a meta-cohort of 117 patients. We also identified ribosome profiling (Ribo-Seq) sequencing data from an additional set of 10 HCC tumors (HCC4) (17). We used several previously described HCC biomarkers to validate these datasets: two genes that tend to be overexpressed in HCC—TERT (24) and THBS4 (25)—and one that is usually underexpressed—MT1M (26). Consistent with these previous findings, we found that TERT and THBS4 had significantly higher expression levels in tumor than in normal matched samples in all cohorts and that MT1M showed the opposite tendency (Fig. 1B).

After validating the datasets with the above biomarkers, we designed a pipeline that combined different computational and experimental methods to unravel the impact of ncORFs in the generation of tumor-specific antigens in the set of 117 patients (Fig. 1C). The first steps were centered on the quantification of gene expression, the discovery of novel transcripts, and the identification of tumor-specific transcripts from tumor/normal matched RNA sequencing (RNA-Seq) data. We also predicted ncORF translation by the analysis of Ribo-Seq data and putative HLA-I–binding peptides using patient-specific HLA information (table S2). To validate the predictions, we performed in vitro HLA-peptide binding assays of a subset of the candidates as well as immunogenicity experiments in mice expressing the human HLA molecule (Fig. 1C). The analysis provided information about the quantitative relevance of different types of tumor antigens in HCC. It was also informative on the distribution of these antigens in the patient population. We identified a set of highly tumor-specific lncRNAs containing ncORFs with translation and immunopeptidomics evidence. The results are described in the next sections.

Thousands of noncoding transcripts are expressed in HCC tumors

We used the RNA-Seq data from the four HCC cohorts to quantify the expression level of protein-coding genes and lncRNAs as well as to perform genome-guided de novo transcript assembly and identify transcripts not annotated in Ensembl (novel transcripts). LncRNAs and novel transcripts showed overall lower expression values than protein-coding genes (Fig. 1D); only those expressed above a given cutoff [fragments per kilobase million (FPKM) > 1 or FPKM > 2 depending on the dataset] were selected for further analyses (fig. S1 and tables S3 to S5). As expected, lncRNAs and novel transcripts tended to have a lower number of introns than protein-coding genes (Fig. 1E and fig. S2). We also noted that most novel transcripts, even if not annotated in Ensembl, matched entries in miTranscriptome, a gene database that contains an extended set of cancer transcripts (fig. S3) (27). Each tumor sample expressed around 10,000 to 12,000 protein-coding genes together with 2000 to 4000 noncoding transcripts (lncRNAs and novel transcripts) (Fig. 1F and fig. S4). We found that, in general, the expression of lncRNAs and novel transcripts was more patient-specific than the expression of protein-coding genes (fig. S5).
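The expression cutoff described above can be pictured as a simple per-sample filter. The following is a minimal sketch, not the authors' pipeline: the transcript identifiers, the FPKM values, and the function name filter_expressed are hypothetical, and the cutoff is a parameter because the text states that it was FPKM > 1 or FPKM > 2 depending on the dataset.

```python
def filter_expressed(fpkm_by_transcript, cutoff=1.0):
    """Keep transcripts whose FPKM is strictly above the cutoff."""
    return {name: fpkm for name, fpkm in fpkm_by_transcript.items() if fpkm > cutoff}


# Hypothetical FPKM values for a single tumor sample (illustrative only).
sample = {"lncRNA_A": 0.4, "lncRNA_B": 1.7, "novel_C": 2.5, "gene_D": 35.0}

# The same sample filtered with the two cutoffs mentioned in the text.
expressed_cutoff1 = filter_expressed(sample, cutoff=1.0)
expressed_cutoff2 = filter_expressed(sample, cutoff=2.0)

print(sorted(expressed_cutoff1))
print(sorted(expressed_cutoff2))
```

Because the comparison is strict, a transcript with an FPKM exactly equal to the cutoff is excluded, which matches the "FPKM > 1" wording.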

Tumor lncRNAs are pervasively translated

Recent studies have shown that many lncRNAs contain ORFs that are translated into small proteins or microproteins (9, 10, 28). Here, we used Ribo-Seq data from HCC (cohort HCC4; Fig. 1A) to predict the level of translation of the previously identified tumor lncRNAs and novel transcripts (cohorts HCC1 to HCC3 and TCGA). To obtain reliable estimates, we focused on those transcripts that were widely expressed in HCC4 and at least one other cohort (see Materials and Methods). In addition to ATG, we also considered near-cognate codons (ACG, CTG, GTG, and TTG) as putative start sites, as these codons have been shown to frequently initiate translation of ncORFs (13, 19). Translation was predicted using RibORF (v1.0) (Fig. 2A and fig. S6; see Materials and Methods) (29). We identified 251 unique translated lncRNAs, including 124 transcripts that were common to all cohorts (Fig. 2B). A large fraction of the latter transcripts (86 of 124) had also been predicted to be translated in a study that analyzed different cancer cell lines and tumors (table S6) (13), which reinforced our results. Because the latter study did not include HCC data, this also implies that many of these lncRNAs are expressed in different cancer types.
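To illustrate what it means to consider near-cognate start codons, the sketch below enumerates candidate ORFs on the forward strand of a transcript sequence. It is only an illustration of candidate enumeration under simplifying assumptions and is not RibORF's implementation: it does not use Ribo-Seq reads or score translation, and the function name find_candidate_orfs, the min_codons parameter, and the toy sequence are hypothetical.

```python
START_CODONS = {"ATG", "ACG", "CTG", "GTG", "TTG"}  # ATG plus the near-cognate codons named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_orfs(seq, min_codons=2):
    """Return (start, end, start_codon) tuples for forward-strand candidate ORFs.

    Every start codon is extended in frame to the first stop codon, so
    several candidates can share the same stop. `end` is exclusive and
    includes the stop codon; `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        codon = seq[start:start + 3]
        if codon not in START_CODONS:
            continue
        # Walk codon by codon in the same frame until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, codon))
                break
    return orfs


# Toy sequence (hypothetical): an ATG-initiated ORF with a nested TTG start.
toy = "ATGAAATTGCCCTAA"
orfs_all = find_candidate_orfs(toy, min_codons=2)
orfs_long = find_candidate_orfs(toy, min_codons=3)
print(orfs_all)
print(orfs_long)
```

In an actual analysis, candidates such as these would still need ribosome profiling support before being called translated.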

The number of ncORFs for which translation was detected was 909, with 524 being common to all cohorts (Fig. 2C). Most of the transcripts contained multiple translated ncORFs (Fig. 2D). As expected, the resulting proteins tended to be smaller than canonical proteins (Fig. 2E). Translation predictions comprised ORFs initiated not only at ATG but also at alternative sites, especially CTG (Fig. 2F).

We used the Ribo-Seq data to compute a translation index for lncRNAs and novel transcripts, which we defined as the fraction of ncORF sequence predicted to be translated. In the case of lncRNAs, the translation index was 0.116. It was calculated taking into account the total percentage of translated ORFs (8.3%; table S7) as well as the fact that translated ORFs tended to be somewhat longer than nontranslated ORFs (fig. S7). The same estimation for novel transcripts resulted in a much smaller translation index (0.0053), indicating that the latter transcripts are rarely translated.
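One way to read this definition is as a length-weighted fraction: the summed length of translated ORFs divided by the summed length of all ORFs. The sketch below assumes that reading, which is an interpretation of the sentence above and not a formula taken from the Materials and Methods. The function name translation_index and the toy ORF lengths are hypothetical.

```python
def translation_index(orfs):
    """Fraction of total ORF sequence that belongs to translated ORFs.

    `orfs` is a list of (length_nt, is_translated) pairs.
    """
    total = sum(length for length, _ in orfs)
    translated = sum(length for length, is_translated in orfs if is_translated)
    return translated / total if total else 0.0


# Toy data (hypothetical): one long translated ORF and three short untranslated ones.
toy_orfs = [(300, True), (100, False), (100, False), (100, False)]

index = translation_index(toy_orfs)

# The same value, decomposed as (fraction of ORFs translated) x (length ratio).
n_translated = sum(1 for _, t in toy_orfs if t)
fraction_translated = n_translated / len(toy_orfs)
mean_len_translated = sum(l for l, t in toy_orfs if t) / n_translated
mean_len_all = sum(l for l, _ in toy_orfs) / len(toy_orfs)
decomposed = fraction_translated * mean_len_translated / mean_len_all

print(index, decomposed)
```

Under this reading, an index larger than the plain percentage of translated ORFs is what one would expect when translated ORFs are longer on average than nontranslated ones.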

Figure 2 (G and H) shows examples of putatively translated ORFs in ZNF674-AS1 and LINC01419, respectively. ZNF674-AS1 is transcribed in the antisense direction to the protein-coding gene ZNF674 through the use of a bidirectional promoter, and its low expression in tumors is associated with poor prognosis (30). LINC01419 is an lncRNA that is transcribed and translated in tumor samples but not in the healthy controls.

Tumor-specific transcripts are enriched in lncRNAs and novel transcripts

Microproteins generated from ncORFs in tumor-specific lncRNAs are a potential source of cancer antigens with immunotherapy applications, as described for some of the canonical CTAs (31). To determine how many of the ncORFs expressed in tumors were tumor-specific, we discarded cases that were expressed in matched healthy liver samples, in Genotype-Tissue Expression (GTEx) gene expression tables for nonreproductive organs, or in a collection of de novo assembled transcriptomes from diverse healthy organs (see Materials and Methods). Expression in testis was not considered an impediment, as this is an immune-privileged tissue that can also express antigens of interest for anticancer vaccination. Notably, we found that, among tumor-specific transcripts, lncRNAs and novel transcripts were more numerous than protein-coding genes (Fig. 3A, fig. S8, and tables S8 and S9). This was in sharp contrast with the observations for overall tumor expression, which was dominated by protein-coding transcripts (Fig. 1F)….
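The exclusion logic for tumor-specific transcripts can be sketched as a set difference. This is a minimal illustration that assumes each resource has already been reduced to a set of expressed transcript identifiers. The function name tumor_specific and all identifiers are hypothetical, and the thresholds that define "expressed" in each resource are in the Materials and Methods and are not modeled here.

```python
def tumor_specific(tumor_expressed, matched_normal, gtex_nonreproductive, healthy_assemblies):
    """Return tumor-expressed transcripts absent from all healthy references.

    Testis expression is deliberately not part of the exclusion sets,
    mirroring the text: only nonreproductive GTEx organs are considered.
    """
    excluded = set(matched_normal) | set(gtex_nonreproductive) | set(healthy_assemblies)
    return set(tumor_expressed) - excluded


# Hypothetical transcript identifiers (illustrative only).
tumor = {"lnc1", "lnc2", "novel3", "gene4", "lnc5"}
normal_liver = {"gene4"}
gtex_nonrepro = {"lnc2"}
assemblies = {"novel3"}

candidates = tumor_specific(tumor, normal_liver, gtex_nonrepro, assemblies)
print(sorted(candidates))
```

A transcript such as lnc5 that is also detected in testis would still be retained by this filter, since testis is not among the exclusion sets.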
