November 12, 2019

Widespread misinterpretation of gene expression data

Reproducibility is a major challenge in experimental biology, and the increasing complexity of data generated by genomic-scale techniques greatly amplifies this concern. RNA-seq, one of the most widely used methods in modern molecular biology, measures the expression level of all the genes in a given sample simultaneously, in a single assay. New research publishing November 12 in the open-access journal PLOS Biology by Shir Mandelboum, Zohar Manber, Orna Elroy-Stein, and Ran Elkon from Tel Aviv University identifies a frequent technical bias in data generated by RNA-seq technology, which recurrently leads to false results.

Analysing dozens of publicly available RNA-seq datasets, which profiled the cellular responses to numerous different stresses, Mandelboum and colleagues noticed that sets of particularly short or long genes repeatedly showed changes in expression level (as shown by the apparent number of RNA transcripts from a given gene).

Puzzled by this recurring pattern, the authors asked whether it reflects a universal biological response common to many different triggers or instead stems from an experimental artefact. To tackle this question, they compared replicate samples from the same biological condition. Because replicates share the same condition, differences in gene expression between them can reflect technical effects that are unrelated to the experiment's biological factor of interest. Unexpectedly, the same pattern of particularly short or long genes showing changes in expression level was observed in these comparisons between replicates, indicating that the pattern results from a technical bias that appears to be coupled with gene length.
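The replicate comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the pseudocount, and the choice of a Spearman rank correlation are assumptions made for illustration. The idea is to compute, for each gene, the log ratio of its counts in two replicates of the same condition, and then ask whether that ratio tracks gene length. A correlation far from zero between replicates points to a technical, length-coupled effect.

```python
import math
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # one input is constant, so no trend can be measured
    return cov / (sx * sy)


def length_bias_score(lengths, rep_a, rep_b, pseudocount=1.0):
    """Correlate gene length with log2(rep_b / rep_a) for two replicates.

    The pseudocount (an assumption of this sketch) avoids log of zero.
    """
    log_fc = [
        math.log2((b + pseudocount) / (a + pseudocount))
        for a, b in zip(rep_a, rep_b)
    ]
    return spearman(lengths, log_fc)


# Made-up toy data: replicate B over-represents longer genes.
gene_lengths = [500, 1000, 2000, 4000, 8000, 16000]
replicate_a = [100, 100, 100, 100, 100, 100]
replicate_b = [80, 90, 100, 110, 125, 140]
print(length_bias_score(gene_lengths, replicate_a, replicate_b))
```

Because the score is rank-based, multiplying one replicate by a constant (for example, a difference in sequencing depth) changes it very little, so the sketch skips library-size normalization. A real analysis would use many thousands of genes and a proper normalization step.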

A main goal of RNA-seq experiments is to characterize biological processes that are activated or repressed in response to the conditions of interest. Notably, specific biological processes are executed by products of particularly short and long genes. For example, many of the short genes encode proteins that constitute the ribosome, the cell's protein-making machinery. Conversely, many of the long genes encode proteins that constitute the extracellular matrix (ECM), the network of macromolecules that provides cells with external structural support.

Mandelboum and colleagues were able to show how, in many RNA-seq datasets, the length bias they detected, combined with some flaws in the statistical analysis, can lead to the false identification of specific biological functions (including ribosome and ECM-related functions) as cellular responses to the conditions tested. Importantly, the study also shows how this bias can be removed from the data, thus filtering out false calls while preserving the biologically genuine ones.
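To make the idea of removing a length-coupled trend concrete, here is a deliberately simplified sketch. It is not the procedure used in the study, which is described in the paper itself. It is a stand-in technique, length-bin median centering: genes are sorted by length, split into equal-sized bins, and each bin's median log fold change is subtracted from its members. The function name and the `n_bins` parameter are hypothetical.

```python
from statistics import median


def remove_length_trend(lengths, log_fc, n_bins=10):
    """Subtract each length bin's median log fold change from its genes.

    Simplified stand-in for a length-bias correction, not the published
    method. Genes are binned by length rank into roughly equal groups.
    """
    n = len(lengths)
    if n == 0:
        return []
    n_bins = max(1, min(n_bins, n))  # never more bins than genes
    order = sorted(range(n), key=lambda i: lengths[i])
    corrected = [0.0] * n
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        centre = median(log_fc[i] for i in members)
        for i in members:
            corrected[i] = log_fc[i] - centre
    return corrected


# Made-up toy data: log fold change drifts upward with gene length.
gene_lengths = [100, 200, 300, 400, 500, 600, 700, 800]
observed = [-0.4, -0.2, -0.1, 0.1, 0.2, 0.4, 0.5, 0.7]
print(remove_length_trend(gene_lengths, observed, n_bins=4))
```

After centering, a gene stands out only if it departs from other genes of similar length, which is the intuition behind filtering out length-driven false calls. The trade-off is that a genuine biological response shared by most genes of a given length would also be subtracted, so a crude correction like this one should be checked against replicate comparisons before it is trusted.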

Recent years have witnessed a growing alarm about false results in biological research, sometimes referred to as the reproducibility crisis. This study emphasizes the importance of proper statistical handling of data to lessen the number of misleading findings.

More information: Mandelboum S, Manber Z, Elroy-Stein O, Elkon R (2019) Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias. PLoS Biol 17(11): e3000481. doi.org/10.1371/journal.pbio.3000481

Journal information: PLoS Biology

Provided by Public Library of Science

Citation: Widespread misinterpretation of gene expression data (2019, November 12) retrieved 20 April 2024 from https://phys.org/news/2019-11-widespread-misinterpretation-gene.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
