Calculations of enthalpies of formation

This tutorial describes how to calculate enthalpies of formation with ANI-1ccx and AIQM1 approaches as described in:

Peikun Zheng, Wudi Yang, Wei Wu, Olexandr Isayev, Pavlo O. Dral, Toward Chemical Accuracy in Predicting Enthalpies of Formation with General-Purpose Data-Driven Methods, J. Phys. Chem. Lett. 2022, 13, 3479–3491. DOI: 10.1021/acs.jpclett.2c00734.

Please cite this paper in the publications using the above calculations.

Installation

Required installations and other types of calculations with AIQM1 are given in a dedicated tutorial. Below minimum installation requirements are given for ANI-1ccx.

MLatom

You need MLatom version 2.2 or newer.

It is recommended to install the latest MLatom release as described in download page. See manual pages on this website for license and further instructions.

TorchANI

TorchANI program is required to provide the neural network (NN) part of AIQM1.

TorchANI is an open-source package. The latest version when writing this tutorial is v2.2.

Installation

1. install Numpy and nightly version of PyTorch (if you do not have them already):

pip install numpy
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu100/torch_nightly.html

2. install TorchANI:

pip install torchani

Visit https://aiqm.github.io/torchani/ for more info. The latest version of TorchANI used when writing this tutorial is v2.2, you can install this version by pip install torchani==2.2 if there are any problems when running with the newest version of TorchANI. The CUDA extension for AEV calculation is not supported for the NN part of AIQM1 and ANI-1ccx now.

Other packages

To perform geometry optimizations or calculations of enthalpies of formation, MLatom relies on either Gaussian or ASE package (any of them can be used; the first one is commercial, the second one is open-source).

Gaussian

Our implementation work with both Gaussian 09 and Gaussian 16. It is a commercial program, which can be obtained and installed separately.

To use Gaussian interface, make sure that your environmental variable GAUSS_EXEDIR points to the right place.

ASE

The ASE (Atomic Simulation Environment) are Python modules, which can be installed as described on ASE website.

Usage

To perform calculations, you can provide input files with appropriate options to MLatom (command line options also supported) and then run MLatom calculations as usual (see manual and tutorial pages on this website). Below we provide examples of some typical uses. In all cases, we use mlatom as alias to $pathToMLatom/MLatom.py (it is useful to setup such an alias in your shell).

The calculations of enthalpies of formation consist of two steps: geometry optimization and frequency calculations.

Geometry optimization

Geometry optimization of closed-shell molecules in electronic ground state is as simple as running single point calculations and 4-line MLatom input file, e.g., opt.inp, looks like this:

ANI-1ccx
xyzfile=opt.xyz
optxyz=final_geo.xyz
geomopt
optprog=gaussian # or optprog=ase if you choose ASE

This input requires opt.xyz file with initial XYZ geometries of molecules to be optimized (you can provide many molecules as usual for MLatom), e.g., for hydrogen and methane opt.xyz file can look like (geometries in Å):

2
                                                        
H         0.0000000000        0.0000000000        0.0000000000
H         0.7414000000        0.0000000000        0.0000000000
5
C         0.0000000000        0.0000000000        0.0000000000
H         1.0870000000        0.0000000000        0.0000000000
H        -0.3623333220       -1.0248334322       -0.0000000000
H        -0.3623333220        0.5124167161       -0.8875317869
H        -0.3623333220        0.5124167161        0.8875317869

After you prepared your input files opt.inp and opt.xyz, you can run MLatom as usual:

mlatom opt.inp

After the calculations finish, the optimized geometries are saved in finla_geo.xyz.

If you use Gaussian, the Gaussian output files of geometry optimizations are saved in mol_1.log, mol_2.log, … files for each molecule, where you can find the optimized geometries and visualize them by GausianView program as in any other typical Gaussian job.

If you use ASE, you can directly obtain optimized XYZ geometries in a single file final_geo.xyz with MLatom format, which for our example looks like (geometries in Å):

2
                                                     
H        0.15255733       0.00000000       0.00000000
H        0.58884267       0.00000000       0.00000000
5
C        0.00000000       0.00000000       0.00000000
H        1.08733372      -0.00000000       0.00000000
H       -0.36244456      -1.02514806       0.00000000
H       -0.36244456       0.51257403      -0.88780426
H       -0.36244456       0.51257403       0.88780426

Frequency calculations

Thermochemical properties of closed-shell molecules in the electronic ground state can be calculated at ANI-1ccx or AIQM1 level by adding an argument freq to the MLatom input file, e.g., freq.inp, and they should be run on geometries optimized with the corresponding model. An example of MLatom input file using the ANI-1ccx-optimized geometries:

ANI-1ccx
xyzfile=optgeoms.xyz
freq
optprog=ASE # or optprog=gaussian if you choose Gaussian
ase.linear=1,0
ase.symmetrynumber=2,12

When ASE is used for the calculation of thermochemical properties, you should specify ase.linear and ase.symmetrynumber this two keywrods. ase.linear is 0 for nonlinear molecule, 1 for linear molecule, and ase.symmetrynumber is the rotational symmetry number for each molecule (see Table 10.1 and Appendix B of C. Cramer “Essentials of Computational Chemistry”, 2nd Ed.). For example, for hydrogen and methane this two molecules, you should set ase.linear=1,0 and ase.symmetrynumber=2,12.

File with preoptimized geometries optgeoms.xyz for our example are (geometries in Å):

2
hydrogen                                                     
H        0.15255733       0.00000000       0.00000000
H        0.58884267       0.00000000       0.00000000
5
methane
C        0.00000000       0.00000000       0.00000000
H        1.08733372      -0.00000000       0.00000000
H       -0.36244456      -1.02514806       0.00000000
H       -0.36244456       0.51257403      -0.88780426
H       -0.36244456       0.51257403       0.88780426

After you prepared your input files freq.inp and optgeoms.xyz, you can run MLatom as usual:

mlatom freq.inp > freq.out

After the calculations finish, MLatom output freq.out will contain the summary with atomization enthalpy at 0 K, ZPVE-exclusive atomization energy at 0 K, and heat of formation at 298.15 K for each molecule. If you use ASE, MLatom output will contain the same lines as above, but also include additional data such as entropy and the Gibbs free energy:

 ......
 Zero-point vibrational energy           :         4.07528 kcal/mol
 Atomization enthalpy at 0 K             :       126.42974 kcal/mol
 ZPE exclusive atomization energy at 0 K :       130.50502 kcal/mol
 Heat of formation at 298.15 K           :       -23.11424 kcal/mol
 * Warning * Heat of formation have high uncertainty!
......
Zero-point vibrational energy           :        27.87144 kcal/mol
Atomization enthalpy at 0 K             :       391.92513 kcal/mol
ZPE exclusive atomization energy at 0 K :       419.79657 kcal/mol
Heat of formation at 298.15 K           :       -17.63420 kcal/mol

If you use Gaussian, the Gaussian output files of frequency calculations are saved in mol_1.log, mol_2.log, … files for each molecule; these files contain ZPVE energy and lots of thermochemical data such as entropy and the Gibbs free energy.