November 22, 2019 feature

A deep learning-based model DeepSpCas9 to predict SpCas9 activity

by Thamarasee Jeewandara, Phys.org

In a new report in Science Advances, Hui Kwon Kim and interdisciplinary researchers at the departments of Pharmacology, Electrical and Computer Engineering, Medical Sciences, Nanomedicine and Bioinformatics in the Republic of Korea evaluated the activity of SpCas9, the RNA-guided Cas9 endonuclease from the bacterium Streptococcus pyogenes (an enzyme that cuts DNA and is used for genome editing). They used a high-throughput approach with 12,832 target sequences, based on a human cell library, to build a deep learning model that predicts SpCas9 activity.

The data came from oligonucleotides (short synthetic DNA strands), each containing a target sequence paired with the corresponding guide sequence that encodes a single-guide RNA (sgRNA). The sgRNA directs the Cas9 protein to bind and cleave a specific DNA sequence for genome editing. The team applied deep learning-based training to the large dataset of SpCas9-induced indel (insertion or deletion) frequencies to develop an SpCas9 activity-prediction model named DeepSpCas9, which is now available online. When the team tested the software against independently generated datasets, the results showed high generalization performance, i.e., the model adapted well to new, previously unseen data.

The CRISPR-Cas system, an adaptive immune system of prokaryotes, functions as a genome editing tool with translational research potential in a variety of species and cell types, including human cells, which makes accurate prediction of SpCas9 activity important. Researchers had previously developed several computational models to predict SpCas9 activity, based either on datasets of phenotypic changes in gene-edited cells or on medium-sized datasets from plasmid-based library-on-library approaches (plasmids are vehicles that transfer genes between bacteria and other cells). However, the generalization performance of these models was limited because the quality and size of the datasets were not ideal. For instance, approaches that relied on indels (insertions and deletions) to create functional knockouts (inactivated genes) resulted in false negatives, and the available datasets of SpCas9-induced indel frequencies were only medium-sized.

Kim et al. had previously reported a deep learning-based computational model named DeepCpf1 that predicts the activity of a different endonuclease (AsCpf1 from Acidaminococcus species) with high generalization performance. For this, they used lentiviral libraries of guide RNA-encoding and target sequence pairs to generate a large training dataset for DeepCpf1. While similar library-based methods had been used to develop computational models that predict indel frequencies generated by the Cas9 enzyme, a large dataset of Cas9-induced indel frequencies had yet to be generated.

Cas9 activity-predicting computational models with high generalization performance were therefore still needed. In this work, Kim et al. modified the high-throughput method they had previously developed for DeepCpf1 to measure SpCas9-induced indel frequencies at tens of thousands of target sequences, and used the data to develop DeepSpCas9. DeepSpCas9, which is also offered as a web tool, is a deep learning-based model that predicts SpCas9 activity with high generalization performance.

Kim et al. first prepared a lentiviral library of 15,656 guide RNA (gRNA)-encoding and target sequence pairs for high-throughput assessment of SpCas9 activity (lentiviruses are a subfamily of complex retroviruses that can incorporate foreign DNA). The research team amplified the pool of oligonucleotides containing the guide and target sequence pairs using the polymerase chain reaction (PCR) and cloned them into a lentiviral plasmid (a transgene delivery system used to transfer genetic material into cells) using the Gibson DNA assembly technique.

In a two-step approach, the researchers cut the plasmids and inserted the sgRNA scaffold sequence at the cut site to generate plasmid libraries. To form a cell library, the scientists then treated human embryonic kidney cells (HEK 293T) with lentivirus generated from the plasmid library. Each cell now contained a synthetic target sequence in its genome and expressed the corresponding sgRNA. The scientists next treated the cell library with SpCas9-encoding lentivirus to cause sgRNA-directed cleavage and indel formation at the target sequences, with frequencies that depended on the sgRNA activity. To measure the indel frequencies, the scientists PCR-amplified the target sequences and subjected them to deep sequencing. Based on these high-throughput experiments, Kim et al. generated two datasets for training and testing the DeepSpCas9 model.
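As a rough illustration of the measurement step, the indel frequency at one target can be treated as the percentage of sequencing reads that carry an insertion or deletion. The sketch below assumes the reads have already been classified; the function name, the target names and the read counts are hypothetical, and the article does not say how reads were classified or whether any background correction was applied.

```python
def indel_frequency(indel_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that carry an insertion or deletion.

    Simplified assumption: every read is already labeled as indel or
    non-indel. No background correction is applied here.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must lie between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


# Hypothetical read counts per target: (reads with indels, total reads).
counts = {
    "target_A": (450, 1000),
    "target_B": (30, 1200),
    "target_C": (0, 800),
}

# One indel frequency (in percent) per target sequence.
frequencies = {
    name: indel_frequency(indels, total)
    for name, (indels, total) in counts.items()
}

for name, freq in frequencies.items():
    print(f"{name}: {freq:.1f}% indels")
```

A table of such per-target percentages is the kind of quantity a model like DeepSpCas9 is trained to predict from sequence alone.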

The scientists examined SpCas9 activities at 124 endogenous target sites with different degrees of chromatin accessibility (how physically accessible the DNA is within the local chromatin structure) to test whether the indel frequencies at the integrated synthetic target sequences correlated with those at the corresponding endogenous sites. They observed a strong correlation between indel frequencies at the integrated target sites and at the endogenous locations within the HEK cells.

The research team next used an end-to-end deep learning framework to train DeepSpCas9 on the large dataset and predict SpCas9 activity. For the base model architecture, they used a convolutional neural network (CNN, a neural network that scans its input with small learned filters). For the input, they used a 30-nucleotide sequence, which they converted into a 4 × 30 binary matrix using one-hot encoding (each position is represented by four binary values, with a single 1 marking which of the four nucleotides is present). For model selection and to assess generalization performance during training, the team conducted 10-fold cross-validation using Spearman correlation coefficients between experimentally measured and predicted SpCas9 activity levels.
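One-hot encoding of a nucleotide sequence can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the row order (A, C, G, T) and the example sequence are assumptions made here.

```python
NUCLEOTIDES = "ACGT"  # assumed row order; the article does not specify one


def one_hot_encode(seq: str, length: int = 30) -> list[list[int]]:
    """Encode a nucleotide sequence as a 4 x length binary matrix.

    Row i corresponds to NUCLEOTIDES[i]; column j corresponds to position j.
    Each column contains exactly one 1.
    """
    seq = seq.upper()
    if len(seq) != length:
        raise ValueError(f"expected {length} nucleotides, got {len(seq)}")
    matrix = [[0] * length for _ in NUCLEOTIDES]
    for pos, base in enumerate(seq):
        row = NUCLEOTIDES.find(base)
        if row == -1:
            raise ValueError(f"unexpected character {base!r} at position {pos}")
        matrix[row][pos] = 1
    return matrix


# Made-up 30-nucleotide sequence, for illustration only.
example_seq = "ACGT" * 7 + "AC"
encoded = one_hot_encode(example_seq)

# Print one row per nucleotide so the binary pattern is visible.
for base, row in zip(NUCLEOTIDES, encoded):
    print(base, "".join(str(bit) for bit in row))
```

A matrix of this shape is what the convolutional layers would scan, with each filter looking at a few adjacent positions at a time.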

When they increased the size of the training dataset for cross-validation, the average Spearman correlation coefficients between the experimental indel frequencies and the scores predicted by the DeepSpCas9 model rose steadily, up to 0.77. The Spearman correlations of the DeepSpCas9 model were significantly higher than those of conventional machine learning algorithms previously used for SpCas9 activity prediction, such as support vector machines (SVM), AdaBoost (adaptive boosting), random forest and gradient-boosted regression trees. Overall, DeepSpCas9 exhibited the best performance among all models.
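The Spearman correlation coefficient used for these comparisons measures how well the ranking of predicted scores matches the ranking of measured indel frequencies. It can be computed as the Pearson correlation of the ranks. The sketch below uses only the Python standard library; the function names and the example values are hypothetical and are not data from the study.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured indel frequencies (%) and model scores.
measured = [5.0, 12.5, 40.0, 61.0, 78.0]
predicted = [0.10, 0.35, 0.30, 0.70, 0.90]
print(f"Spearman correlation: {spearman(measured, predicted):.2f}")
```

Because only the ranks matter, a model is not penalized for predicting scores on a different scale from the measured frequencies, as long as it orders the targets correctly.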

In previous work, Kim et al. had used chromatin accessibility information to improve the prediction of AsCpf1 enzyme activities at endogenous target sites. They sought to determine whether the same information would also improve SpCas9 activity predictions. The results implied that fine-tuning with chromatin accessibility information barely improved the accuracy with which DeepSpCas9 predicted indel frequencies at endogenous sites, unlike in their previous efforts with AsCpf1. SpCas9 activity therefore appeared to be only slightly affected by chromatin accessibility, in strong contrast to what the team had observed for AsCpf1 with DeepCpf1.

To assess the generalization performance of DeepSpCas9, the research team tested the model on sufficiently large published datasets from diverse research studies. They compared the results with those of other SpCas9 activity-prediction programs such as DeepCRISPR. The results suggested that DeepSpCas9 had the highest generalization performance among nine published models used to predict SpCas9 activity. In this way, Hui Kwon Kim and the research team extensively validated the ability of DeepSpCas9 to accurately predict SpCas9 activity. The web tool is now available online, alongside supplementary code that lets research scientists incorporate DeepSpCas9 into existing models. Based on the high generalization performance of DeepSpCas9, the research team expects it to facilitate more accurate SpCas9-based genome editing.

More information: Hui Kwon Kim et al. SpCas9 activity prediction by DeepSpCas9, a deep learning–based model with high generalization performance, Science Advances (2019). DOI: 10.1126/sciadv.aax9249

Hui Kwon Kim et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity, Nature Biotechnology (2018). DOI: 10.1038/nbt.4061

Hui K Kim et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity, Nature Methods (2016). DOI: 10.1038/nmeth.4104

Journal information: Science Advances, Nature Biotechnology, Nature Methods

Citation: A deep learning-based model DeepSpCas9 to predict SpCas9 activity (2019, November 22) retrieved 26 April 2024 from https://phys.org/news/2019-11-deep-learning-based-deepspcas9-spcas9.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
