September 3, 2014

A new tool to correct DNA sequencing errors using consensus and context

by Paul Greenfield, Microsoft

The rapid development of next-generation DNA sequencing has revolutionized biological and ecological research in the last few years. The cost of DNA sequencing has fallen dramatically, and sequencing machines are becoming a standard piece of lab equipment. Low-cost sequencing is enabling researchers to uncover the gene differences that make some people more susceptible to diseases; to explore the genetic makeup microbial communities from the human gut or the bottom of the ocean; and to rapidly identify the organism responsible for a life-threatening infection.

But while the costs of sequencing have plummeted, the accuracy of the data produced has improved only slowly: about 1 percent of the bases generated are still called incorrectly. The bioinformatics community has responded to this problem by building specialized error correction tools that use the inherent redundancy in sequence data to find and repair miscalls and other sequencing errors. Tests have shown that incorporating the best of these error-correction tools into standard bioinformatics analytical pipelines can result in much better quality genomes and more accurately called gene variants.

However, accurately correcting errors turns out to be a difficult problem, largely because of the repetitive and ambiguous nature of genomes. It is easy to correct simple substitution errors, such as when 50 sequence reads say that a given base is an A, and only the read being corrected says it's a G. Such simple errors are well handled by downstream tools such as assemblers and aligners. The challenge is making the right correction when there are multiple plausible corrections—such as when 50 reads say A, 49 say G, and the read being corrected says T—as happens whenever reads fall across the end of a repeated region within a genome. Just to make things more challenging, this correction has to be done without any knowledge of the genomes being sequenced, and the only clues about which corrections are '"right" comes from the sequence data itself.

My colleagues and I at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) have just released a new error correction tool we've developed for use by the research community. We call it "Blue." Blue is a high-performance C# application that runs natively on Windows systems, and under Mono on Linux and OS X. As we reported in a paper published in Bioinformatics, test results show that Blue is significantly faster than other available tools—especially on Windows—and is also more accurate as it recursively evaluates possible alternative corrections in the context of the read being corrected.

Another uncommon feature of Blue is that it can correct all three types of possible errors (substitutions, deletions, and insertions), making it suitable for use of data produced by the Roche 454 and Life Technologies Ion Torrent systems. Blue also allows for the correction of one set of reads with a consensus derived from another set of reads, and this capability has been used to correct small numbers of long (and expensive) Roche 454 reads with a consensus derived from a large file of cheaper (but shorter) Illumina reads. This "cross-correction" method has been used very effectively to improve the quality of several reference assemblies, ranging in size from bacteria to moths and grasses.

More information: Paul Greenfield, Konsta Duesing, Alexie Papanicolaou, and Denis C. Bauer. "Blue: correcting sequencing errors using consensus and context." Bioinformatics first published online June 11, 2014 DOI: 10.1093/bioinformatics/btu368

Blue and its associated tools can be downloaded from CSIRO Bioinformatics: www.bioinformatics.csiro.au/blue/

Journal information: Bioinformatics

Provided by Microsoft

Citation: A new tool to correct DNA sequencing errors using consensus and context (2014, September 3) retrieved 20 April 2024 from https://phys.org/news/2014-09-tool-dna-sequencing-errors-consensus.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

New software automates and improves phylogenomics from next-generation sequencing data

0 shares

Feedback to editors

A new tool to correct DNA sequencing errors using consensus and context

Lemur's lament: When one vulnerable species stalks another

Study uncovers neural mechanisms underlying foraging behavior in freely moving animals

Scientists assess paths toward maintaining BC caribou until habitat recovers

European XFEL elicits secrets from an important nanogel

Chemists introduce new copper-catalyzed C-H activation strategy

Scientists discover new way to extract cosmological information from galaxy surveys

Compact quantum light processing: New findings lead to advances in optical quantum computing

Some plant-based steaks and cold cuts are lacking in protein, researchers find

Merging nuclear physics experiments and astronomical observations to advance equation-of-state research

Which countries are more at risk in the global supply chain?

Relevant PhysicsForums posts

If theres a 15% probability each month of getting a woman pregnant...

Can four legged animals drink from beneath their feet?

Mold in Plastic Water Bottles? What does it eat?

Dolphins don't breathe through their esophagus

Is this egg-laying or something else?

Color Recognition: What we see vs animals with a larger color range

New software automates and improves phylogenomics from next-generation sequencing data

Researchers develop tool to evaluate genome sequencing method

New genetic analysis identifies ancestry, reduces false positives in pinpointing disease

An error-eliminating fix overcomes big problem in '3rd-gen' genome sequencing

Going deep to improve maize transcriptome

New approach to 'spell checking' gene sequences

Researchers train a bank of AI models to identify memory formation signals in the brain

Neuronal gateway to essential molecules in learning and memory discovered on atomic scale

Computer model suggests frozen cells could be used to save northern white rhino from extinction

Plant sensors could act as an early warning system for farmers

A nematode gel to protect crops in Africa and Asia

Making crops colorful for easier weeding by robots

Medical Xpress

Tech Xplore

Science X

A new tool to correct DNA sequencing errors using consensus and context

Lemur's lament: When one vulnerable species stalks another

Study uncovers neural mechanisms underlying foraging behavior in freely moving animals

Scientists assess paths toward maintaining BC caribou until habitat recovers

European XFEL elicits secrets from an important nanogel

Chemists introduce new copper-catalyzed C-H activation strategy

Scientists discover new way to extract cosmological information from galaxy surveys

Compact quantum light processing: New findings lead to advances in optical quantum computing

Some plant-based steaks and cold cuts are lacking in protein, researchers find

Merging nuclear physics experiments and astronomical observations to advance equation-of-state research

Which countries are more at risk in the global supply chain?

Relevant PhysicsForums posts

Related Stories

New software automates and improves phylogenomics from next-generation sequencing data

Researchers develop tool to evaluate genome sequencing method

New genetic analysis identifies ancestry, reduces false positives in pinpointing disease

An error-eliminating fix overcomes big problem in '3rd-gen' genome sequencing

Going deep to improve maize transcriptome

New approach to 'spell checking' gene sequences

Recommended for you

Researchers train a bank of AI models to identify memory formation signals in the brain

Neuronal gateway to essential molecules in learning and memory discovered on atomic scale

Computer model suggests frozen cells could be used to save northern white rhino from extinction

Plant sensors could act as an early warning system for farmers

A nematode gel to protect crops in Africa and Asia

Making crops colorful for easier weeding by robots

Newsletter sign up

Donate and enjoy an ad-free experience