Artificial Intelligence
How DeepMind's AI Cracked a 50-Year Science Problem Revealed
Google's protein-predicting AI is open-sourced and may disrupt life sciences.
Posted July 19, 2021 Reviewed by Kaja Perina
DeepMind, a Google-owned artificial intelligence (AI) company based in the United Kingdom, made scientific history when it announced last November that it had a solution to a 50-year-old grand challenge in biology—protein folding. This AI machine learning breakthrough may help accelerate the discovery of new medications and novel treatments for diseases. On July 15, 2021 DeepMind revealed details on how its AI works in a new peer-reviewed paper published in Nature, and made its revolutionary AlphaFold version 2.0 model available as open-source on GitHub.
The three-dimensional (3D) shape and function of proteins are determined by the sequence of its amino acids. AlphaFold predicts three-dimensional (3D) models of protein structures. Open-source software makes code open and available to download, which over time may also help to improve the software as well.
“Predicting the 3-D structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the ‘protein folding problem’, has been an important open research problem for more than 50 years,” wrote Demis Hassabis, John Jumper, Richard Evans, and their DeepMind research colleagues who authored the new paper.
According to the DeepMind researchers, the structures of roughly 100,000 unique proteins have already been determined by scientists, however this “represents a small fraction of the billions of known protein sequences.” There are approximately 200 million known proteins in existence, and roughly 30 million new ones are discovered annually.
“Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure,” wrote the researchers. “Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics.”
In molecular biology, proteins are large, chemically complex large biomolecules molecules called macromolecules that consist of hundreds or even thousands of amino acids connected in long chains linked through covalent peptide bonds. Amino acids, the building blocks of proteins, are organic molecules that combine to form proteins. There are around 20 types of amino acids in proteins, and each has different chemical traits.
“Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even where no similar structure is known,” reported the DeepMind researchers. “We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experiment in a majority of cases and greatly outperforming other methods.”
AlphaFold uses input features extracted from templates, amino-acid sequence, and multiple sequence alignments (MSAs) for inference and produces features outputs that include per-residue confidence scores, atom coordinates, and the distogram, which is a histogram of distances between pairs of pixels in an image. A histogram is a graphical display of numerical data that represents a frequency distribution with rectangles or bars where the widths represent class intervals and the areas are proportional to the corresponding frequencies. TensorFlow was used to develop the artificial neural networks and Python was used for the data analysis. Markov Chain Monte Carlo based Bayesian analysis was also used.
Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function,” the researchers wrote.
DeepMind’s groundbreaking work and the open-sourcing of AlphaFold version 2.0 may help accelerate innovation through scientific collaboration in life sciences, biotechnology, data science, molecular biology, pharmaceutical, and more industries to discover novel treatments and medications to extend human longevity in the future.
Copyright © 2021 Cami Rosso All rights reserved.