Materials science and chemistry can work hand in hand with machine learning to develop novel materials for tomorrow's energy storage needs!

Machine Learning and the Future of Sustainability Science

Guest Post by Sabari Kumar, 2023-2024 Sustainability Leadership Fellow, and Ph.D. Student in the Department of Chemistry at Colorado State University

It seems like you can’t go more than a few minutes these days without hearing about machine learning (ML); recent technical advances in the field have made it more accessible than ever, allowing lay people to generate realistic text, create compelling artwork, and make lifelike videos. The use of generative tools like ChatGPT has become commonplace, and it can be difficult to differentiate the outputs of these models from something that a human created. In addition to their more mainstream uses, ML tools have prompted a quiet revolution in the sciences, ushering in advances in materials science and green chemistry.

For context, the perennial problem in designing novel chemical materials, like new biofuels or energy storage materials for batteries, is one of scale. There are billions and billions of chemical compounds, and selecting compounds to use for a particular application is frequently a labor- and time-intensive proposition. This selection process traditionally involves lots of experimentation by trained chemists in a laboratory. To take an example from pharmaceutical development, experimental screening to find a potential drug candidate that lowers blood pressure and is safe for humans can take over a decade [1]. With advances in computing power over the last few decades, computer simulation of chemical compounds has emerged as a viable alternative to physical experimentation. This shift has eliminated chemical waste on a large scale and saved huge amounts of time [2].

However, running these computer simulations is no small feat: they require huge amounts of computing power and time. Consider simulating how our cells produce a typical protein molecule, such as rhodopsin, the protein that allows us to see. In a living cell, this process takes around 20 seconds on average. Anton3 is a built-from-scratch supercomputer designed to run these calculations as fast as possible, and it represents the current state of the art in chemical simulation. Even so, simulating the full process of producing rhodopsin on Anton3 would take over 500 years and use ¼ of the energy Fort Collins uses in one year!
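To put those two numbers side by side, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted above (20 seconds of real time, at least 500 years of simulation); the function names and the choice of a 365.25-day year are my own assumptions for this sketch, and the results are rough estimates rather than published benchmarks.

```python
# Back-of-the-envelope comparison of real time vs. simulation time.
# Inputs come from the text; the year length is an assumption for this estimate.
DAYS_PER_YEAR = 365.25
SECONDS_PER_YEAR = DAYS_PER_YEAR * 24 * 3600

biological_time_s = 20.0        # time for a cell to produce the protein
simulation_time_years = 500.0   # lower bound ("over 500 years")


def slowdown_factor(real_seconds: float, sim_years: float) -> float:
    """How many seconds of computing are needed per second of real time."""
    return sim_years * SECONDS_PER_YEAR / real_seconds


def simulated_microseconds_per_day(real_seconds: float, sim_years: float) -> float:
    """How much real-world time the simulation covers per day of computing."""
    return real_seconds * 1e6 / (sim_years * DAYS_PER_YEAR)


if __name__ == "__main__":
    factor = slowdown_factor(biological_time_s, simulation_time_years)
    per_day = simulated_microseconds_per_day(biological_time_s, simulation_time_years)
    print(f"Simulation runs about {factor:.1e} times slower than real time")
    print(f"That is roughly {per_day:.0f} microseconds of chemistry per day of computing")
```

In other words, the simulation would run hundreds of millions of times slower than the real thing, covering only about a hundred microseconds of chemistry for each full day of computing.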

Nowadays, these expensive simulations and exhaustive experimental screening campaigns can be replaced by a quick, relatively cheap ML prediction. To illustrate this, let’s consider designing a novel biofuel blend to replace diesel. Since we want our new biofuel to be a drop-in replacement for diesel, it needs to behave like diesel: for example, it should have similar reactivity and viscosity. At the same time, we’d like it to produce less soot and thus less air pollution. It also needs to be easily synthesized from commonly available biomass, ideally waste from activities like paper manufacturing or corn oil production.

One way that we can do this is by individually optimizing each of the properties we care about. For each desired property, we find the set of molecules with acceptable values for that property, and then keep only the molecules that appear in every one of these sets. Work done by the Kim group at Colorado State University has employed this tactic with great success: we’ve developed machine learning models that can accurately predict several different fuel properties directly from chemical structures [3] [4]. This allows fuel designers to construct fuel blends with very specific properties, tailoring fuels to specific engine designs to maximize power output and minimize emissions.
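For readers who like to see ideas in code, the screening strategy described above can be sketched in a few lines of Python. This is a minimal illustration and not the Kim group's actual pipeline: the candidate names, property values, and acceptance thresholds below are all invented placeholders, standing in for predictions that a trained ML model would supply.

```python
from typing import Callable, Dict, Set

# Placeholder "predicted" properties for hypothetical candidates.
# Every number here is invented for illustration; none is a real
# measurement or a real model output.
PREDICTED: Dict[str, Dict[str, float]] = {
    "candidate_A": {"cetane_number": 52.0, "viscosity": 2.8, "soot_index": 30.0},
    "candidate_B": {"cetane_number": 38.0, "viscosity": 3.1, "soot_index": 25.0},
    "candidate_C": {"cetane_number": 55.0, "viscosity": 6.5, "soot_index": 20.0},
    "candidate_D": {"cetane_number": 48.0, "viscosity": 3.5, "soot_index": 80.0},
    "candidate_E": {"cetane_number": 50.0, "viscosity": 2.2, "soot_index": 35.0},
}

# Hypothetical acceptance rules, one per property (assumed thresholds).
CRITERIA: Dict[str, Callable[[float], bool]] = {
    "cetane_number": lambda v: v >= 45.0,      # reactive enough
    "viscosity": lambda v: 2.0 <= v <= 4.5,    # flows like diesel
    "soot_index": lambda v: v <= 40.0,         # burns cleaner
}


def passing(predicted: Dict[str, Dict[str, float]],
            prop: str,
            criterion: Callable[[float], bool]) -> Set[str]:
    """Return the candidates whose value for one property is acceptable."""
    return {name for name, props in predicted.items() if criterion(props[prop])}


def screen(predicted: Dict[str, Dict[str, float]],
           criteria: Dict[str, Callable[[float], bool]]) -> Set[str]:
    """Keep only the candidates that pass every per-property filter."""
    survivors = set(predicted)
    for prop, criterion in criteria.items():
        survivors &= passing(predicted, prop, criterion)  # set intersection
    return survivors


if __name__ == "__main__":
    print(sorted(screen(PREDICTED, CRITERIA)))
```

Each filter is cheap to evaluate once the predictions exist, so the same few lines work whether the candidate pool holds five molecules or many thousands. The hard part, and the part the ML models handle, is producing trustworthy predicted values in the first place.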

This work isn’t limited to just sustainable biofuels: ML models have been developed to design new sustainable battery electrolytes that increase energy storage capacity without needing to use environmentally fraught metals like cobalt, lithium, or nickel [5]; to generate new inorganic materials for catalysis and energy storage [6]; and to predict the structure of proteins [7] and drug molecules [8]. These models have greatly accelerated the pace of scientific progress in developing clean energy technologies and reducing chemical waste from costly experimentation. With the rapid pace of development in ML technologies, the future for sustainable science has never looked brighter!

1. Hughes, James P., et al. “Principles of early drug discovery.” British Journal of Pharmacology 162.6 (2011): 1239-1249.

2. Coley, Connor W. “Defining and exploring chemical spaces.” Trends in Chemistry 3.2 (2021): 133-145.

3. Kim, Yeonjoon, et al. “Designing high-performance fuels through graph neural networks for predicting cetane number of multicomponent surrogate mixtures.” SAE Technical Paper No. 2023-32-0052 (2023).

4. Kim, Yeonjoon, et al. “Design Green Chemicals by Predicting Vaporization Properties Using Explainable Graph Attention Networks.” (2023).

5. SV, Shree Sowndarya, et al. “Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries.” Nature Machine Intelligence 4.8 (2022): 720-730.

6. Merchant, Amil, et al. “Scaling deep learning for materials discovery.” Nature (2023): 1-6.

7. Lin, Zeming, et al. “Evolutionary-scale prediction of atomic-level protein structure with a language model.” Science 379.6637 (2023): 1123-1130.

8. Abdin, Osama, and Philip M. Kim. “PepFlow: direct conformational sampling from peptide energy landscapes through hypernetwork-conditioned diffusion.” bioRxiv (2023): 2023-06.
