Bringing the power of equivariant NN potential through the interface of MACE to MLatom@XACS

Equivariant potentials are the (relatively) new kids on the block, showing promising accuracy in published benchmarks. One of them is MACE, which we have now added to the zoo of machine learning potentials available through interfaces in MLatom. See the figure above for an overview of the MLPs supported by MLatom (in bold) and other representatives (modified from our MLP benchmark paper). We have just released MLatom 3.1.0 with MACE support and show here how to use it.

Installation

pip install mlatom                               # install MLatom (3.1.0 or later for MACE)
git clone https://github.com/ACEsuit/mace.git    # download the MACE source code
pip install ./mace                               # install MACE from the cloned repository
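
To check that both packages are visible to Python, here is a minimal sanity check, assuming they were installed into the same environment:

# both imports should succeed if the installation worked
import mlatom as ml
import mace

print('MLatom and MACE imported successfully')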

Data preparation

Below we provide a 1000-point training dataset randomly sampled from the MD17 dataset for the ethanol molecule (xyz.dat, en.dat, and grad.dat, which store the geometries, potential energies, and energy gradients, respectively), along with test data of another 1000 points (file names beginning with “test_”).

Note that the energies are in Hartree, and distances are in Ångstrom.
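
Before training, it can be worth a quick sanity check that the files are consistent. Here is a minimal sketch in plain Python, assuming en.dat holds one energy per line:

# count the energies in en.dat; should match the number of geometries
with open('en.dat') as f:
    n_energies = sum(1 for line in f if line.strip())
print(n_energies)  # expected: 1000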

Training, testing, and using MACE can be done through input files, the command line, and the Python API. Below we show how.

Training and testing with input file and command line

createMLmodel            # task to create MLmodel
XYZfile=xyz.dat          # file with geometries
Yfile=en.dat             # file with energies
YgradXYZfile=grad.dat    # file with energy gradients
MLmodelType=MACE         # specify the model type to be MACE
mace.max_num_epochs=100  # only train for 100 epochs (optional)
MLmodelOut=mace.pt       # give your trained model a name

You can save the above input in a file train.inp and then run it with MLatom in your terminal:

> mlatom train.inp

Alternatively, you can pass all options on the command line:

> mlatom createMLmodel XYZfile=xyz.dat Yfile=en.dat YgradXYZfile=grad.dat MLmodelType=MACE mace.max_num_epochs=100 MLmodelOut=mace.pt

You can also submit a job to the XACS cloud computing service or use its online terminal. It’s free, but training on CPUs only can be very slow. To speed up the test, you can comment out or delete the line YgradXYZfile=grad.dat, which trains the model on energies only and is faster.

The web interface of the XACS cloud computing job submitter.

After the training of 100 epochs is finished (it may take a while, especially if you don’t use a GPU), you will see the analysis of the training performance generated by MACE and MLatom. My result looks like this:

2024-01-05 17:17:31.318 INFO: 
+-------------+--------------+------------------+-------------------+
| config_type | RMSE E / meV | RMSE F / meV / A | relative F RMSE % |
+-------------+--------------+------------------+-------------------+
|    train    |     14.3     |       24.0       |        2.45       |
|    valid    |     14.1     |       26.0       |        2.65       |
+-------------+--------------+------------------+-------------------+

The validation RMSE is 14.1 meV (or 0.33 kcal/mol), which is quite impressive for just 1000 training points.
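
For reference, the meV-to-kcal/mol conversion behind that number (1 eV ≈ 23.0605 kcal/mol) is a one-liner:

# 14.1 meV in kcal/mol
print(14.1e-3 * 23.0605)  # ≈ 0.33 kcal/mol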

Then you can test the trained model on the test files with the following inputs (first use the model to make predictions, then analyze them against the reference data):

useMLmodel
XYZfile=test_xyz.dat
YgradXYZestFile=test_gradest.dat
Yestfile=test_enest.dat
MLmodelType=MACE
MLmodelIn=mace.pt

analyze
Yfile=test_en.dat 
YgradXYZfile=test_grad.dat 
Yestfile=test_enest.dat 
YgradXYZestFile=test_gradest.dat

The analysis results look like this (note that the original units are Hartree and Hartree/Ångstrom):

Analysis for values
 Statistical analysis for 1000 entries in the set
   MAE =           0.0006553622464
   MSE =          -0.0006529191680
   RMSE =           0.0007100342323
   mean(Y) =        -154.8910225874238
   mean(Yest) =        -154.8916755065918
   correlation coefficient =           0.9992099019391
   linear regression of {y, y_est} by f(a,b) = a + b * y
     R^2 =           0.9984203065680
...
 Analysis for gradients in XYZ coordinates
 Statistical analysis for 1000 entries in the set
   MAE =           0.0008618973153
   MSE =          -0.0000057122824
   RMSE =           0.0012088419764
   mean(Y) =           0.0000057123026
   mean(Yest) =           0.0000000000202
   correlation coefficient =           0.9996190787940
   linear regression of {y, y_est} by f(a,b) = a + b * y
     R^2 =           0.9992383026890
...

These RMSEs correspond to around 0.45 kcal/mol for energies and 0.76 kcal/mol/Å for gradients.
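
If you want to verify the conversion yourself, MLatom ships the constant used below in the Python section:

import mlatom as ml

# convert the RMSEs above from Hartree (energies) and Hartree/Angstrom
# (gradients) to kcal/mol and kcal/mol/Angstrom
print(0.0007100342323 * ml.constants.Hartree2kcalpermol)  # ≈ 0.45 kcal/mol
print(0.0012088419764 * ml.constants.Hartree2kcalpermol)  # ≈ 0.76 kcal/mol/Å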

Training and using in Python

MLatom can be used in your Python scripts too; the same example is available as a Google Colab notebook. Here is the breakdown of the commands if you do not have access to Google Colab. First, let’s import MLatom:

import mlatom as ml

which offers great flexibility. You can check the MLatom documentation for details.

Doing the training in Python is also simple.

First, load the data into a molecular database:

molDB = ml.data.molecular_database.from_xyz_file(filename = 'xyz.dat')
molDB.add_scalar_properties_from_file('en.dat', 'energy') 
molDB.add_xyz_vectorial_properties_from_file('grad.dat', 'energy_gradients')

Then define a MACE model and train with the database:

model = ml.models.mace(model_file='mace.pt', hyperparameters={'max_num_epochs': 100})
model.train(molDB, property_to_learn='energy', xyz_derivative_property_to_learn='energy_gradients')
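
After training finishes, the model is stored in mace.pt (the model_file given above), so it can be reloaded later in the same way as in the input-file workflow.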

Making predictions with the model:

test_molDB = ml.data.molecular_database.from_xyz_file(filename = 'test_xyz.dat')
test_molDB.add_scalar_properties_from_file('test_en.dat', 'energy') 
test_molDB.add_xyz_vectorial_properties_from_file('test_grad.dat', 'energy_gradients')

model.predict(molecular_database=test_molDB, property_to_predict='mace_energy', xyz_derivative_property_to_predict='mace_gradients')
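
As a quick sketch of what you can do with the predictions (the property names mace_energy and mace_gradients were chosen above), here are the per-structure energy errors in kcal/mol:

import numpy as np

# per-structure energy errors in kcal/mol
errors = (np.array(test_molDB.get_properties('mace_energy'))
          - np.array(test_molDB.get_properties('energy'))) * ml.constants.Hartree2kcalpermol
print(f'max |error| = {np.abs(errors).max():.2f} kcal/mol')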

Then you can do whatever analysis you like, e.g., calculate the RMSE:

ml.stats.rmse(test_molDB.get_properties('energy'), test_molDB.get_properties('mace_energy'))*ml.constants.Hartree2kcalpermol

ml.stats.rmse(test_molDB.get_xyz_vectorial_properties('energy_gradients').flatten(), test_molDB.get_xyz_vectorial_properties('mace_gradients').flatten())*ml.constants.Hartree2kcalpermol
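
These numbers should be close to the ~0.45 kcal/mol and ~0.76 kcal/mol/Å RMSEs obtained above with the input-file workflow.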

Using the model

After the model is trained, it can be used with MLatom for applications such as geometry optimizations or MD; check out MLatom’s manual for details. Here is a brief example of how the input file for a geometry optimization would look:

geomopt                      # Request geometry optimization
MLmodelType=MACE             # use ML model of the MACE type
MLmodelIn=mace.pt            # the model to be used
XYZfile=ethanol_init.xyz     # The file with the initial guess
optXYZ=eq_MACE.xyz           # optimized geometry output

In Python, geometry optimization is also quite simple:

import mlatom as ml

# load initial geometry
mol = ml.data.molecule.from_xyz_file('ethanol_init.xyz')
print(mol.get_xyz_string())

# load the model
model = ml.models.mace(model_file='mace.pt')

# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())
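
If you want to keep the optimized structure, you can write out the same XYZ string, mirroring optXYZ=eq_MACE.xyz in the input-file version:

# save the optimized geometry to an XYZ file
with open('eq_MACE.xyz', 'w') as f:
    f.write(mol.get_xyz_string())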

Summary

We are glad to introduce the MACE interface in MLatom and to share this tutorial on how to use it. The model shows great performance even with a relatively small training set. We hope you find it helpful.

Finally, atomistic machine learning is growing fast, and as an integrative platform, MLatom will keep evolving with state-of-the-art methods to offer the best experience to the community.
