Physics, Accuracy and Machine Learning: Towards the next-generation of Molecular Potentials

PHYMOL: Scientific goals

The fundamental interactions between (neutral) molecules are relatively weak, but they determine many of the complex phenomena in solids, liquids, and gases. These intermolecular interactions are of paramount importance at interfaces, in molecular crystals, in cells, and even in interstellar gas clouds. They are not easy to compute from first principles because the small size of these energies places extreme demands on the theoretical and numerical methods used. They are also often quite difficult to model accurately (i.e. to construct an analytic, easily computable representation) due to the subtle effects of anisotropy (atoms in a molecule are not spheres), many-body non-additivity (the whole is not the sum of its parts), and quantum tunneling (charge-transfer or delocalization). Simple models ignore or average out many of these subtleties. While these computationally simple models allow the study of large systems at long time-scales, they do so at the cost of accuracy and predictive power.

This is best exemplified by the blind tests of organic molecular crystal structure prediction conducted by the Cambridge Crystallographic Data Centre, which have conclusively demonstrated that the most reliable predictions of the structures and free-energy rankings of molecular crystals are obtained by combining advanced theory-based models with ab initio methods such as density functional theory, and that these combined approaches vastly outperform empirical models. There are tangible consequences of the increase in predictive power that arises from paying attention to the physical details of the phenomena we aim to model: in the case of molecular crystals, this leads to a better understanding of the stability of the crystalline material and its polymorphs. Such understanding is key to avoiding the humanitarian and financial disasters that have arisen when one drug form transforms into another in an unexpected way, as happened with the antiretroviral medication Ritonavir.

Another application is transit transmission spectroscopy, where one observes absorption by an exoplanetary atmosphere as the planet transits between us and its sun. Here, collision-induced absorption (one of the highly sensitive ways in which we will assess reference data in PHYMOL) gives information on the atmospheric pressure. Of particular interest is the measurement of collision-induced absorption by O2, as evidence of an O2-rich atmosphere could be explained by photosynthesis, and therefore life, on the exoplanet.

Central to these applications is the potential energy surface (PES), which needs to be constructed as a mathematical model capable of accurately describing the intermolecular interactions, the interactions within the molecular complexes, and the couplings between them. Additionally, the PES must be computationally cheap to evaluate so that we can simulate large systems over long time-scales. This is where we see the strong interlinking of physical models and machine learning: by combining the best of these two (in a sense, by combining human learning with machine learning) we will see the biggest advances in intermolecular model building for targeted applications.

The main scientific goals of the PHYMOL network are contained in three scientific work packages:

WP1 concerns the development and assessment of the fundamental theoretical models for intermolecular interactions, and serves as a means of providing skills to create and implement new ab initio models, as well as to critically evaluate the models using spectroscopic data. WP1 involves three doctoral positions: DC1 will be involved with developing a new model for the induction energy, DC2 will study the CO-CO* excimer, and DC3 will study collision-induced absorption spectra of a number of complexes of importance to atmospheric models.

WP2 is focused on taking data from the best methods currently available and, with machine learning and the best physical understanding we possess, creating models that can be used in simulations. WP2 involves four doctoral positions: DC4 will work on reparameterization of semiempirical models, DC5 will develop models that incorporate the MBD dispersion model, DC6 will develop methods to treat polarization and charge-delocalization on a physically equivalent footing, and DC11 will develop techniques to map force-field parameters onto properties of the electronic density.

WP3 applies the best available models to difficult problems faced by industry. Here, our DCs will be faced with the challenges of understanding the complex nature of real-world problems, and with the issues that arise when the methods and models are used in computer simulations. DC7 will investigate how intermolecular interactions shape the energy landscape of molecular crystals, DC8 will be involved with state-of-the-art modelling of metal atomic quantum clusters, DC9 will develop implicit machine-learning solvent models for molecules in confined spaces, and DC10 will develop open-source big-data analysis tools for industry.


For more information, click on the figure.

Key Facts

  • Scientific beneficiaries: 10
  • Partners: 6
  • Number of countries involved: 10
  • Budget: approx. €2.6M from Horizon Europe and €300K from UKRI.
  • Number of funded doctoral candidates: 11
  • Coordinated by Prof Piotr Zuchowski from Uniwersytet Mikołaja Kopernika, Toruń
  • PHYMOL is a doctoral network funded mainly under the Horizon Europe scheme, and also by UK Research and Innovation (UKRI)



Coordinator:

Piotr Zuchowski

Project Manager:

Agata Wiśniewska


PHYMOL© 2023, All rights reserved