Bringing the power of equivariant NN potentials through the interface of MACE to MLatom@XACS
Equivariant potentials are the (relatively) new kids on the block, showing promisingly high accuracy in published benchmarks. One of them is MACE, which we have now added to the zoo of machine learning potentials available through the interfaces in MLatom. See the figure above for an overview of the MLPs supported by MLatom (in bold) and other representatives (modified from our MLP benchmark paper). We have just released MLatom 3.1.0 with MACE and show how to use it here.
Installation
pip install mlatom
git clone https://github.com/ACEsuit/mace.git
pip install ./mace
Data preparation
Below we provide a 1000-point dataset randomly sampled from the MD17 dataset for the ethanol molecule as the training data (xyz.dat, en.dat, grad.dat, which store the geometries, potential energies, and energy gradients, respectively), along with test data of another 1000 points (file names begin with “test_”).
Note that the energies are in Hartree, and distances are in Ångstrom.
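The xyz.dat file stores the geometries as concatenated blocks in standard XYZ format: an atom count, a comment line, then one line per atom with the element symbol and Cartesian coordinates in Ångstrom. MLatom reads these files for you, but as an illustration of the expected layout, a minimal parser (a hypothetical helper, not part of MLatom) could look like:

```python
# Minimal sketch (not part of MLatom): reading a multi-geometry XYZ file
# in the standard format used by xyz.dat.
def read_xyz_blocks(text):
    """Parse concatenated XYZ blocks into (symbols, coordinates) pairs."""
    lines = text.splitlines()
    geoms, i = [], 0
    while i < len(lines):
        natoms = int(lines[i])                      # line 1: number of atoms
        symbols, coords = [], []
        for line in lines[i + 2 : i + 2 + natoms]:  # skip the comment line
            parts = line.split()
            symbols.append(parts[0])
            coords.append([float(x) for x in parts[1:4]])  # Angstrom
        geoms.append((symbols, coords))
        i += natoms + 2
    return geoms

# toy example with a single 3-atom block (made-up coordinates)
example = """3
toy fragment
O 0.000 0.000 0.000
H 0.960 0.000 0.000
H -0.240 0.930 0.000
"""
geoms = read_xyz_blocks(example)
print(len(geoms), geoms[0][0])  # 1 ['O', 'H', 'H']
```

The en.dat file simply lists one energy (in Hartree) per line, in the same order as the geometries.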
Training, testing and using MACE can be done through input files, command line, and Python API. Below we show how.
Training and testing with input file and command line
createMLmodel # task to create MLmodel
XYZfile=xyz.dat # file with geometries
Yfile=en.dat # file with energies
YgradXYZfile=grad.dat # file with energy gradients
MLmodelType=MACE # specify the model type to be MACE
mace.max_num_epochs=100 # only train for 100 epochs (optional)
MLmodelOut=mace.pt # give your trained model a name
You can save the above input in a file train.inp and then run it with MLatom in your terminal as:
> mlatom train.inp
Alternatively, you can run all options in the command line:
> mlatom createMLmodel XYZfile=xyz.dat Yfile=en.dat YgradXYZfile=grad.dat MLmodelType=MACE mace.max_num_epochs=100 MLmodelOut=mace.pt
You can also submit a job to our XACS cloud computing service or use its online terminal. It’s free, but training on CPUs only can be very slow. To speed up the test, you can comment out or delete the line YgradXYZfile=grad.dat, which trains on energies only and is faster.
After the training for 100 epochs is finished (it may take a while, especially if you don’t use a GPU), you will see the analysis of the training performance generated by MACE and MLatom. My result looks like:
2024-01-05 17:17:31.318 INFO:
+-------------+--------------+------------------+-------------------+
| config_type | RMSE E / meV | RMSE F / meV / A | relative F RMSE % |
+-------------+--------------+------------------+-------------------+
| train | 14.3 | 24.0 | 2.45 |
| valid | 14.1 | 26.0 | 2.65 |
+-------------+--------------+------------------+-------------------+
The validation energy RMSE is 14.1 meV (or 0.33 kcal/mol), which is quite impressive for just 1000 training points.
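The kcal/mol value quoted above can be checked with the standard conversion factor 1 eV ≈ 23.061 kcal/mol (a textbook constant, assumed here rather than taken from MLatom):

```python
EV2KCALPERMOL = 23.0609  # standard conversion factor: 1 eV in kcal/mol
rmse_valid_ev = 14.1e-3  # validation energy RMSE reported by MACE, in eV
print(round(rmse_valid_ev * EV2KCALPERMOL, 2))  # 0.33
```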
Then you can test the trained model on the test files with the following input:
useMLmodel
XYZfile=test_xyz.dat
YgradXYZestFile=test_gradest.dat
Yestfile=test_enest.dat
MLmodelType=MACE
MLmodelIn=mace.pt
analyze
Yfile=test_en.dat
YgradXYZfile=test_grad.dat
Yestfile=test_enest.dat
YgradXYZestFile=test_gradest.dat
The analysis results look like this (note that the original units are Hartree and Hartree/Ångstrom):
Analysis for values
Statistical analysis for 1000 entries in the set
MAE = 0.0006553622464
MSE = -0.0006529191680
RMSE = 0.0007100342323
mean(Y) = -154.8910225874238
mean(Yest) = -154.8916755065918
correlation coefficient = 0.9992099019391
linear regression of {y, y_est} by f(a,b) = a + b * y
R^2 = 0.9984203065680
...
Analysis for gradients in XYZ coordinates
Statistical analysis for 1000 entries in the set
MAE = 0.0008618973153
MSE = -0.0000057122824
RMSE = 0.0012088419764
mean(Y) = 0.0000057123026
mean(Yest) = 0.0000000000202
correlation coefficient = 0.9996190787940
linear regression of {y, y_est} by f(a,b) = a + b * y
R^2 = 0.9992383026890
...
That is around 0.45 kcal/mol RMSE for energies and 0.76 kcal/mol/Å for gradients.
Training and using in Python
MLatom can be used in your Python scripts too. Below it is embedded in a Google Colab notebook.
Here is the breakdown of the commands if you do not have access to Google Colab. First, let’s import MLatom:
import mlatom as ml
which offers great flexibility. You can check the documentation here.
Doing the training in Python is also simple.
First, load the data into a molecular database:
molDB = ml.data.molecular_database.from_xyz_file(filename = 'xyz.dat')
molDB.add_scalar_properties_from_file('en.dat', 'energy')
molDB.add_xyz_vectorial_properties_from_file('grad.dat', 'energy_gradients')
Then define a MACE model and train with the database:
model = ml.models.mace(model_file='mace.pt', hyperparameters={'max_num_epochs': 100})
model.train(molDB, property_to_learn='energy', xyz_derivative_property_to_learn='energy_gradients')
Making predictions with the model:
test_molDB = ml.data.molecular_database.from_xyz_file(filename = 'test_xyz.dat')
test_molDB.add_scalar_properties_from_file('test_en.dat', 'energy')
test_molDB.add_xyz_vectorial_properties_from_file('test_grad.dat', 'energy_gradients')
model.predict(molecular_database=test_molDB, property_to_predict='mace_energy', xyz_derivative_property_to_predict='mace_gradients')
Then you can do whatever analysis you like, e.g., calculate the RMSE:
ml.stats.rmse(test_molDB.get_properties('energy'), test_molDB.get_properties('mace_energy'))*ml.constants.Hartree2kcalpermol
ml.stats.rmse(test_molDB.get_xyz_vectorial_properties('energy_gradients').flatten(), test_molDB.get_xyz_vectorial_properties('mace_gradients').flatten())*ml.constants.Hartree2kcalpermol
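For reference, ml.stats.rmse computes the usual root-mean-square error. A self-contained equivalent (a sketch based on the standard definition, not MLatom’s actual implementation) is:

```python
import math

def rmse(y_ref, y_est):
    """Root-mean-square error between reference and estimated values."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(y_ref, y_est)) / len(y_ref))

# toy checks with made-up numbers
print(rmse([0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))   # 0.0 (perfect prediction)
print(round(rmse([0.0, 0.0], [3.0, -3.0]), 3))  # 3.0
```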
Using the model
After the model is trained, it can be used with MLatom for applications such as geometry optimizations or MD; check out MLatom’s manual for details. Here is a brief example of how the input file for geometry optimization would look:
geomopt # Request geometry optimization
MLmodelType=MACE # use ML model of the MACE type
MLmodelIn=mace.pt # the model to be used
XYZfile=ethanol_init.xyz # The file with the initial guess
optXYZ=eq_MACE.xyz # optimized geometry output
In Python, geometry optimization is also quite simple:
import mlatom as ml
# load initial geometry
mol = ml.data.molecule.from_xyz_file('ethanol_init.xyz')
print(mol.get_xyz_string())
# load the model
model = ml.models.mace(model_file='mace.pt')
# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())
Summary
We are glad to introduce the MACE interface in MLatom and to share this tutorial on how to use it. The model shows great performance even with a relatively small training set. We hope it will be helpful to you.
Finally, atomistic machine learning is growing fast, and as an integrative platform, MLatom will keep evolving with state-of-the-art methods to offer the best experience to the community.