# Frequencies and thermochemistry

In this tutorial we show how to calculate frequencies of molecules with MLatom. We will basically continue from geometry optimization tutorial, as it is a common practice to first optimize geometry and then calculate its frequency, i.e., to check whether it is a true minimum or transition state (number of imaginary frequencies) and obtain ZPVE and other thermochemical corrections.

Here is a video of frequency and thermochemistry calculations using XACS cloud and Python API:

We start with the same example as that in the beginning of geometry optimization tutorial. Geometry optimization is typically needed before calculating frequencies and the same method or model should be used for both.

After getting the optimized geometry (`opt.xyz`

) as described in the tutorial on geometry optimization, you can do frequency calculations with input file/command line options `freq.inp`

shown below (see the separate section on calculating frequencies via Python API):

```
freq # 1. requests frequency calculation
ANI-1ccx # 2. pre-trained model, the same as that used in geometry optimization
xyzfile=opt.xyz # 3. file with optimized geometry
```

Similar to geometry optimization, the user needs to provide the model or method, which can be QM, ML or their combinations. Analytical Hessian is used when it is available, otherwise semianalytical (calculated with analytical gradients) or fully numerical Hessian (calculated with only energies) can be used.

`xyzfile=[file name]`

tells MLatom in what file it can find the geometry. The user can directly use the optimized geometry output by MLatom (file defined by `optxyz`

keyword), or provide it in XYZ format, e.g., for this example (in Angstrom, e.g., `opt.xyz`

) which would look like:

```
9
C -1.2121068029534 -0.2252488537660 0.0000164117920
H -1.2701429633537 -0.8599818439347 -0.8855315546559
H -1.2701264459840 -0.8599834499240 0.8855643352209
H -2.0710636368970 0.4504289041619 0.0000252076803
C 0.0804149514179 0.5556635798575 0.0000041512523
H 0.1395635927681 1.1970970820212 -0.8856250576131
H 0.1395848675981 1.1970854779804 0.8856411245403
O 1.1428329343502 -0.3971737931938 -0.0000106635278
H 1.9796722202817 0.0702558186994 -0.0001121252171
```

After you created the input file, you can run MLatom, e.g., on XACS cloud, as:

```
mlatom freq.inp &> freq.out
```

The program output will be saved in file `freq.out`

.

Alternatively, you can run the same simulation in the command line with the same options:

```
mlatom freq ANI-1ccx xyzfile=opt.xyz
```

You should be able to find in the MLatom output the vibration analysis and thermochemistry results. The vibration analysis prints out frequency, reduced mass and force constant of each normal mode. The thermochemistry part prints out zero-point energy, enthalpy, Gibbs free energy, heat of formation and other properties. For our example, the part of the output would look like:

```
==============================================================================
Vibration analysis for molecule 1
==============================================================================
Multiplicity: 1
Rotational symmetry number: 1
This is a nonlinear molecule
Mode Frequencies Reduced masses Force Constants
(cm^-1) (AMU) (mDyne/A)
1 261.0407 1.0929 0.0439
2 286.9474 1.1295 0.0548
3 428.0328 2.7260 0.2943
4 841.1644 1.0947 0.4564
5 928.2688 2.3346 1.1853
6 1082.6338 3.1779 2.1946
7 1138.6399 1.9583 1.4959
8 1212.5257 1.5518 1.3442
9 1287.5002 1.1134 1.0874
10 1292.8263 1.0682 1.0520
11 1426.7497 1.2307 1.4760
12 1457.4408 1.3203 1.6524
13 1496.9454 1.1496 1.5178
14 1501.2654 1.0458 1.3887
15 1524.8936 1.0529 1.4426
16 3063.7060 1.0530 5.8232
17 3076.9020 1.1120 6.2026
18 3093.2039 1.0342 5.8299
19 3201.4189 1.1000 6.6424
20 3213.8757 1.1004 6.6964
21 3741.4822 1.0681 8.8098
==============================================================================
Thermochemistry for molecule 1
==============================================================================
Standard deviation of NNs : 0.00063190 Hartree 0.39652 kcal/mol
Energy : -154.89195912 Hartree
ZPE-exclusive internal energy at 0 K: -154.89196 Hartree
Zero-point vibrational energy : 0.08101 Hartree
Internal energy at 0 K: -154.81095 Hartree
Enthalpy at 298 K: -154.80576 Hartree
Gibbs free energy at 298 K: -154.83631 Hartree
Atomization enthalpy at 0 K: 1.21200 Hartree 760.54458 kcal/mol
ZPE-exclusive atomization energy at 0 K: 1.29301 Hartree 811.37653 kcal/mol
Heat of formation at 298 K: -0.09030 Hartree -56.66187 kcal/mol
```

## Choosing the method or model

The user can benefit from MLatom’s many choices of the methods or models. Many methods such as ANI-1ccx or AIQM1 are recognized automatically by MLatom as shown above. We also show examples of using QM method (B3LYP/6-31G*) and user-trained ML model (ANI model).

The general way to define the QM method is to use `method`

and the corresponding QM program providing its implementations (MLatom uses interfaces to such programs). For example, the input file of using B3LYP/6-31G*:

```
freq # 1. requests frequency calculation
method=B3LYP/6-31G* # 2. requests running DFT calculation with B3LYP
qmprog=PySCF # 3. requests using PySCF for B3LYP calculations; qmprog=Gaussian can also be used
xyzfile=opt.xyz # 4. file with optimized geometry
```

The general way to request the frequency calculations with the user-trained ML model is to specify the ML model type with `MLmodelType`

and `MLmodelIn`

keyword. For example, the input file of using ANI model:

```
freq # 1. requests frequency calculation
MLmodelType=ANI # 2. request optimization with the ANI-type of machine learning potential
MLmodelIn=ani.npz # 3. the file with the trained model should be provided, here it is ani.npz file
xyzfile=opt.xyz # 4. file with optimized geometry
```

The advanced user-defined hybrid models such as delta-learning models can be used for frequency calculation only via Python API as demonstrated separately.

Note

In principle, our default recomendation is to use AIQM1 as it is usually the most accurate method for affordable cost. However, it is currently only applicable to the compounds with CHNO elements. Another limitation is that to unlock its full potential, we recommend to install the MNDO program for its QM part. The XACS cloud uses Sparrow instead, which currently only provides numerical gradients and Hessians severly limiting its applicability for frequency calculations. ANI-1ccx is hence a good alternative on the XACS cloud if it can be applied (it has additional limitations to neutral closed-shell compounds in ground state).

Also, currently, among ML models, Hessians are currently available only for ANI-type models.

## Calculations of many molecules

MLatom can also perform frequency calculations of many molecules like geometry optimization. MLatom will output the vibration analysis and thermochemical properties of each molecule.
To do these calculations, you simply need to prepare the XYZ file with the geometries of multiple molecules (or get this file from MLatom’s previous geometry optimization calculations), e.g., for the hydrogen and methane molecules your `opt.xyz`

file may look like:

```
2
1 0.0000000000 0.0000000000 0.0000000000
1 0.7414000000 0.0000000000 0.0000000000
5
C 0.0000000000 0.0000000000 0.0000000000
H 1.0870000000 0.0000000000 0.0000000000
H -0.3623333220 -1.0248334322 -0.0000000000
H -0.3623333220 0.5124167161 -0.8875317869
H -0.3623333220 0.5124167161 0.8875317869
```

## Charge and multiplicity

A quick reminder on how to define the charges and multiplicities of a single or multiple molecules. The input might looks something like:

```
freq # 1. requests frequency calculation
AIQM1 # 2. AI-enhanced quantum mechanical method 1
xyzfile=opt.xyz # 3. file with optimized geometries
charges=0,1 # 4. charge of the first molecule is 0, of the second is 1.
multiplicities=1,2 # 5. multiplicity of the first molecule is 1, second - 2
```

Note

It does not make sense to define charges and multiplicities for models like ANI-1ccx which were trained on the neutral closed-shell species.

## Choosing the program

MLatom provides several ways (programs) to calculate frequencies and thermochemical properties, similarly to geometry optimizations, the user can choose the program with the keyword `freqprog`

. Three options are available:
`freqprog=Gaussian`

, `freqprog=PySCF`

and `freqprog=ASE`

for using the respective programs.

These are actually different tasks and while they are split in Python API (see below), in input file, when `freq`

is requested, both tasks are performed.

`freqprog=Gaussian`

will perform both tasks with Gaussian and will generate input and output files which the user can modify and inspect. Gaussian’s interface is the most time-tested and hence, robust, but not available on the XACS cloud and needs to be installed by the user locally.

`freqprog=PySCF`

will use PySCF to generate thermochemical properties which is quite robust and also available on the XACS cloud since PySCF is an open-source Python package.

If `freqprog=ASE`

is requested, the ASE is used to calculate thermochemical properties following the frequencies calculations performed either with the improved code based on TorchANI (modified to correctly capture the negative frequency in transition states). Frequencies given by modified TorchANI’s code was also tested by us and available on the XACS cloud.

If no `freqprog`

is provided, by default, MLatom will check whether Gaussian is available and use it, then PySCF and ASE & modified TorchANI combination.

Whatever program you chhose, it is worth checking the instructions below for pitfalls and tips.

### ASE

For MLatom version older than 3.1.1, the user should tell ASE whether molecules are linear by using `ase.linear`

keyword. If the molecule is linear, you need to set this value as 1, otherwise it is by default 0. If your input XYZ file has more than 1 molecule, `ase.linear=1`

makes MLatom assume that they are all linear molecules. You can specify the linearity by, e.g., `ase.linear=1,0`

. For version newer than 3.1.1 (including), MLatom can automatically detect the linearity of molecules so you don’t have to specify explicitly. But you can still do that which will override MLatom’s judgement.

It is also important to specify the symmetry of your molecules by using keyword `ase.symmetrynumber`

. The default one in MLatom corresponds to \(C_1\) point group. You can get the symmetry number by referring to the table below, taken from C. Cramer “Essentials of Computational Chemistry”, 2nd Ed.

Point Group |
Symmetry number |
---|---|

\(C_1\) |
\(1\) |

\(C_i\) |
\(1\) |

\(C_s\) |
\(1\) |

\(C_{{\infty}v}\) |
\(1\) |

\(D_{{\infty}h}\) |
\(2\) |

\(S_n,n=2,4,6,...\) |
\(n/2\) |

\(C_n,n=2,3,4,...\) |
\(n\) |

\(C_{nh},n=2,3,4,...\) |
\(n\) |

\(C_{nv},n=2,3,4,...\) |
\(n\) |

\(D_n,n=2,3,4,...\) |
\(2n\) |

\(D_{nh},n=2,3,4,...\) |
\(2n\) |

\(D_{nd},n=2,3,4,...\) |
\(2n\) |

\(T\) |
\(12\) |

\(T_d\) |
\(12\) |

\(O_h\) |
\(24\) |

\(I_h\) |
\(60\) |

For example, if your input XYZ file has hydrogen (belongs to \(D_{{\infty}h}\) group) and methane (belongs to \(T_d\) group), the input file should look like this:

```
freq
ANI-1ccx
xyzfile=opt.xyz
freqprog=ase
ase.linear=1,0
ase.symmetrynumber=2,12
```

### Gaussian

When Gaussian is used, you will see the `gaussian.log`

file which is similar to the common Gaussian output file with the difference, that it will use the MLatom’s model to predict Hessian to perform the frequency calculations. Then you can use your favorite tools to analyze this output as usual for Gaussian, which is a big advantage for the Gaussian users. Here is the input file of using Gaussian to calculate frequencies.

```
freq
ANI-1ccx
xyzfile=opt.xyz
freqprog=Gaussian
```

For above example, you can check the MLatom’s output file:

```
==============================================================================
Vibration analysis for molecule 1
==============================================================================
Multiplicity: 1
This is a nonlinear molecule
Mode Frequencies Reduced masses Force Constants
(cm^-1) (AMU) (mDyne/A)
1 264.1253 1.0872 0.0447
2 288.7816 1.1305 0.0555
3 429.1248 2.7295 0.2961
4 840.9571 1.0959 0.4566
5 929.0118 2.3254 1.1825
6 1083.2419 3.1763 2.1960
7 1138.7381 1.9666 1.5025
8 1211.7614 1.5471 1.3384
9 1291.9096 1.0694 1.0516
10 1299.4679 1.1131 1.1074
11 1434.7836 1.2319 1.4941
12 1464.0813 1.3257 1.6743
13 1497.4787 1.1427 1.5097
14 1502.1301 1.0457 1.3902
15 1525.4126 1.0539 1.4449
16 3045.3746 1.0531 5.7544
17 3061.9747 1.1130 6.1482
18 3092.9313 1.0343 5.8295
19 3197.6062 1.0999 6.6261
20 3213.0380 1.1000 6.6905
21 3741.0501 1.0681 8.8073
==============================================================================
Thermochemistry for molecule 1
==============================================================================
Standard deviation of NNs : 0.00062817 Hartree 0.39419 kcal/mol
Energy : -154.89196013 Hartree
ZPE-exclusive internal energy at 0 K: -154.89196 Hartree
Zero-point vibrational energy : 0.08100 Hartree
Internal energy at 0 K: -154.81096 Hartree
Enthalpy at 298 K: -154.80578 Hartree
Gibbs free energy at 298 K: -154.83631 Hartree
Atomization enthalpy at 0 K: 1.21202 Hartree 760.55134 kcal/mol
ZPE-exclusive atomization energy at 0 K: 1.29301 Hartree 811.37710 kcal/mol
Heat of formation at 298 K: -0.09032 Hartree -56.67409 kcal/mol
```

The examples of the full `MLatom`

and `Gaussian`

output files can be downloaded too.

An additional important advantage is that the advanced users can modify the `gaussian.com`

file which contains the input required for geometry optimization with MLatom’s models.

After the `gaussian.com`

file is modified, you can run with it the Gaussian job as usual, e.g., in your system like `g16 gaussian.com`

in command line.

### PySCF

Frequency calculation with PySCF works similar with that in Gaussian. We slightly modified source code to make it adaptive to MLatom. By using input as below,

```
freq
ANI-1ccx
xyzfile=opt.xyz
freqprog=PySCF
```

you will get the MLatom’s output file:

```
==============================================================================
Vibration analysis for molecule 1
==============================================================================
Multiplicity: 1
This is a nonlinear molecule
Mode Frequencies Reduced masses Force Constants
(cm^-1) (AMU) (mDyne/A)
1 263.8939 1.0878 0.0446
2 288.6910 1.1303 0.0555
3 429.0255 2.7312 0.2962
4 840.8114 1.0961 0.4566
5 928.7823 2.3273 1.1829
6 1082.9072 3.1819 2.1984
7 1138.4541 1.9665 1.5017
8 1211.4647 1.5472 1.3379
9 1291.7699 1.0696 1.0516
10 1299.3247 1.1131 1.1072
11 1434.5579 1.2319 1.4937
12 1463.8316 1.3262 1.6743
13 1497.2788 1.1422 1.5087
14 1501.9697 1.0458 1.3901
15 1525.2540 1.0541 1.4448
16 3045.0585 1.0532 5.7540
17 3061.5892 1.1131 6.1473
18 3092.6304 1.0344 5.8292
19 3197.1933 1.1000 6.6252
20 3212.6500 1.1001 6.6896
21 3740.7180 1.0683 8.8072
==============================================================================
Thermochemistry for molecule 1
==============================================================================
Standard deviation of NNs : 0.00062817 Hartree 0.39419 kcal/mol
Energy : -154.89196013 Hartree
ZPE-exclusive internal energy at 0 K: -154.89196 Hartree
Zero-point vibrational energy : 0.08098 Hartree
Internal energy at 0 K: -154.81098 Hartree
Enthalpy at 298 K: -154.80579 Hartree
Gibbs free energy at 298 K: -154.83632 Hartree
Atomization enthalpy at 0 K: 1.21203 Hartree 760.55897 kcal/mol
ZPE-exclusive atomization energy at 0 K: 1.29301 Hartree 811.37718 kcal/mol
Heat of formation at 298 K: -0.09033 Hartree -56.68121 kcal/mol
```

## Calculations through Python API

The Python API provides more flexibility to use more complicated models and build custom workflows.

Below is a workflow of optimizing geometry and calculating frequency. You might already see some of the codes in geometry optimization tutorial.

```
import mlatom as ml
# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('init.xyz')
# Choose one of the predifined (automatically recognized) methods
mymodel = ml.models.methods(method='ANI-1ccx')
# Optimize the geometry with the choosen optimizer:
geomopt = ml.optimize_geometry(model=mymodel, initial_molecule=initmol, program='ASE', maximum_number_of_steps=10000)
# Get the optimized molecule
final_mol = geomopt.optimized_molecule
# Do vibration analysis and thermochemistry calculation
freq = ml.thermochemistry(model=mymodel, molecule=final_mol, program='ASE')
# or vibration analysis only
# freq = ml.freq(model=mymodel, molecule=final_mol, program='')
# Save the molecule with vibration analysis and thermochemistry results
final_mol.dump(filename='final_mol.json',format='json')
# Check vibration analysis
print("Mode Frequencies Reduced masses Force Constants")
print(" (cm^-1) (AMU) (mDyne/A)")
for ii in range(len(final_mol.frequencies)):
print("%d %13.4f %13.4f %13.4f"%(ii,final_mol.frequencies[ii],final_mol.reduced_masses[ii],final_mol.force_constants[ii]))
# Check thermochemistry results
print(f"Zero-point vibrational energy: {final_mol.ZPE} Hartree")
print(f"Enthalpy at 298 K: {final_mol.H} Hartree")
print(f"Gibbs Free energy at 298 K: {final_mol.G} Hartree")
print(f"Heat of formation at 298 K: {final_mol.DeltaHf298} Hartree")
```

You can compare your results with the output shown above.

Note

Note that `ml.thermochemistry`

contains both vibration analysis and thermochemistry calculation. If you just want vibrational analysis, you can use `ml.freq`

instead. All the results are saved as the arguments of `final_mol`

.

There will be several properties related to thermochemistry listed below that are attributed to the `molecule`

object (in Hartree for energies and Hartree/K for entropy) after calculation:

```
{...
'ZPE': 0.080996, # Zero-point energy
'DeltaE2U': 0.085241, # Difference between internal energy and total energy (ZPVE-exclusive)
'DeltaE2H': 0.086185, # Difference between enthalpy and total energy (ZPVE-exclusive)
'DeltaE2G': 0.055654, # Difference between Gibbs free energy and total energy (ZPVE-exclusive)
'U0': -154.810964, # Internal energy at 0 K
'H0': -154.810964, # Enthalpy at 0 K
'U:' -154.80672, # Internal energy at 298 K
'H': -154.805775, # Enthalpy at 298 K
'G': -154.836306, # Gibbs free energy at 0 K
'S': 0.00010240004758876359, # entropy
'atomization_energy_0K': 1.2120157100000029,
'ZPE_exclusive_atomization_energy_0K': 1.293011710000003,
'DeltaHf298': -0.0903159176864495 # Heat of formation at 298 K
}
```

You can retrieve the value of the property by directly calling `molecule.property`

(e.g. `molecule.G`

will return you the Gibbs free energy of the molecule).

### Calculations with custom hybrid models

Finally, a more advanced example of using a custom delta-learning model, trained on the differences between full configuration interaction and Hartree–Fock energies:

XYZ coordinates, full CI energies and Hartree–Fock energies of the training points can be downloaded here (`xyz_h2_451.dat`

, `E_FCI_451.dat`

, `E_HF_451.dat`

). You can try to train the delta-learning model by yourself. Here, we provide a pre-trained model using ANI `delta_FCI-HF_h2_ani.npz`

.

```
import mlatom as ml
# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('h2_init.xyz')
# Let's build our own delta-model
# First, define the baseline QM method, e.g.,
baseline = ml.models.model_tree_node(name='baseline', operator='predict', model=ml.models.methods(method='HF/STO-3G', program='pyscf'))
# Second, define the correcting ML model, in this case the ANI model trained on the differences between full CI and HF
delta_correction = ml.models.model_tree_node(name='delta_correction', operator='predict', model=ml.models.ani(model_file='delta_FCI-HF_h2_ani.npz'))
# Third, build the delta-model which would sum up the predictions by the baseline (HF) with the correction from ML
delta_model = ml.models.model_tree_node(name='delta_model', children=[baseline, delta_correction], operator='sum')
# Optimize the geometry with the delta-model as usual:
geomopt = ml.optimize_geometry(model=delta_model, initial_molecule=initmol, program='ASE')
# Get the final geometry approximately at the full CI level
final_mol = geomopt.optimized_molecule
print('Optimized coordinates:')
print(final_mol.get_xyz_string())
final_mol.write_file_with_xyz_coordinates(filename='final.xyz')
# Let's check how many full CI calculations, our delta-model saved us
print('Number of optimization steps:', len(geomopt.optimization_trajectory.steps))
# Calculate frequency
freq = ml.freq(model=delta_model, molecule=final_mol, program='')
# Check vibration analysis
print("Mode Frequencies Reduced masses Force Constants")
print(" (cm^-1) (AMU) (mDyne/A)")
for ii in range(len(final_mol.frequencies)):
print("%d %13.4f %13.4f %13.4f"%(ii,final_mol.frequencies[ii],final_mol.reduced_masses[ii],final_mol.force_constants[ii]))
```

Given an initial geometry `h2_init.xyz`

, you will get the following output when using the codes above.

```
Optimized coordinates:
2
H 0.1272159900000 0.0000000000000 0.0000000000000
H 0.8727840100000 0.0000000000000 0.0000000000000
Number of optimization steps: 7
Mode Frequencies Reduced masses Force Constants
(cm^-1) (AMU) (mDyne/A)
0 4327.2885 1.0078 11.1190
```

## Thermochemistry in Diels–Alder reaction

By using these thermochemical properties, you can easily evaluate relative energies for various aspects of chemical behavior,
such as the change of energy profile during reaction process. We take the Diels–Alder reaction of cyclopentadiene and maleimide, an example in the MLatom 3 paper. You can download the geometry file needed here: `re.xyz`

.

There are 3 structures in `re.xyz`

: the first two strucutres are reactants and the third one is the product (see figure below):

Thus, the basic idea here is that we will calculate thermochemical properties for each molecule and then calculate the energy changes from product to reactants. We will illustrate that it is very convenient to do with the Python API (the same can be done manually through input file/command line calculations by performing thermochemical calculations for each molecule and get the changes in each type of energy).

```
import mlatom as ml
# read molecules from .xyz file
reDB = ml.data.molecular_database().read_from_xyz_file('re.xyz')
optreDB = ml.data.molecular_database()
# define method to calculate thermochemical properties
model = ml.models.methods(method='AIQM1', qm_program='mndo')
for mol in reDB:
# geometry optimization
geomopt = ml.optimize_geometry(model=model, initial_molecule=mol, program='gaussian')
optmol = geomopt.optimized_molecule
# frequency calculation
ml.thermochemistry(model=model, molecule=optmol)
optreDB.append(optmol)
# get relative energies
deltaE = optreDB[2].energy - optreDB[1].energy - optreDB[0].energy
deltaG = optreDB[2].G - optreDB[1].G - optreDB[0].G
deltaH = optreDB[2].H - optreDB[1].H - optreDB[0].H
print(deltaE)
print(deltaG)
print(deltaH)
```

Here is the summary of the output containing changes in ZPVE-exclusive energy, Gibbs free energy, and enthalpy.

Method |
\(\Delta E_r\) |
\(\Delta G_r\) |
\(\Delta H_r\) |
---|---|---|---|

reference |
-34.20 |
- |
- |

AIQM1 |
-34.13 |
-15.98 |
-30.36 |

This shows that if you can use AIQM1, you should use this method to obtain accurate results fast; it is often much more preferable than performing analogous, slower and less accurate, DFT calculations.

## Any questions or suggestions?

If you have further questions, criticism, and suggestions, we would be happy to receive them in English or Chinese via email, Slack (preferred), or WeChat (please send an email to request to add you to the XACS user support group).