Tutorial for chapter on “Learning from multiple quantum chemical methods”
On this page you can find instructions on how to follow the examples given in the upcoming book chapter “Learning from multiple quantum chemical methods: Δ-learning, transfer learning, co-kriging, and beyond” (to be published in 2022).
Please see the MLatom manual and other pages on this website for more details about how to use this program package. The examples require MLatom 1.1 or newer.
Training KRR with the linear kernel
R_20.dat file with 20 points corresponding to internuclear distances in the H2 molecule in Å:
E_FCI_20.dat file with full CI energies (calculated with the aug-cc-pV6Z basis set, in Hartree) for the above 20 points:
You can use this input file for MLatom to train the KRR model with the linear kernel without regularization (lambda parameter set to 0 by default):
The predicted values will be saved to the Eest.dat file. As you can check, the values are all over the place, indicating that the fitting essentially failed.
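One way to see why the unregularized fit fails: with a linear kernel k(r, r') = r·r' and a single input feature, the kernel matrix is the outer product of the distance vector with itself. It therefore has rank 1, so the 20×20 system Kα = y is singular when λ = 0. The NumPy sketch below illustrates this. It assumes this form of the linear kernel (check the manual for MLatom's exact definition) and uses an evenly spaced placeholder grid, not the actual contents of R_20.dat.

```python
import numpy as np

# Placeholder distances in Å (assumption: NOT the actual contents of R_20.dat).
r = np.linspace(0.5, 5.0, 20)

# Linear kernel for a single feature: K[i, j] = r[i] * r[j].
K = np.outer(r, r)

# Every row of K is a multiple of r, so the matrix has rank 1.
rank = np.linalg.matrix_rank(K)
singular_values = np.linalg.svd(K, compute_uv=False)

print("rank of K:", rank)
print("largest singular values:", singular_values[:3])
```

Only one singular value is non-negligible, so without a positive λ on the diagonal the regression coefficients are not uniquely determined.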
You can modify the above input file to use regularization, e.g., by setting the hyperparameter λ=10⁻⁶:
Now the estimated values saved to the file Eest_lambda.dat are the same as those predicted by linear regression. You can print out the coefficients alpha from the saved ML model by using the command:
mlatom useMLmodel MLmodelIn=linear_lambda.unf YestFile=Eest_lambda-use.dat XfileIn=R_20.dat debug
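The agreement with linear regression can also be checked outside MLatom. The sketch below solves (K + λI)α = y with NumPy for λ = 10⁻⁶ and compares the predictions with an ordinary least-squares line through the origin. It assumes a linear kernel without a bias term. The distances and energies are toy placeholders (an evenly spaced grid and a Morse-like curve with made-up parameters), not the contents of R_20.dat and E_FCI_20.dat.

```python
import numpy as np

# Toy placeholders (assumptions), not the tutorial's data files.
r = np.linspace(0.5, 5.0, 20)  # distances, Å
# Morse-like toy curve in Hartree with made-up parameters.
y = -1.0 + 0.17 * ((1.0 - np.exp(-2.0 * (r - 0.74))) ** 2 - 1.0)

lam = 1.0e-6          # regularization hyperparameter λ
K = np.outer(r, r)    # linear kernel matrix, K[i, j] = r[i] * r[j]

# KRR regression coefficients: solve (K + λ I) α = y.
alpha = np.linalg.solve(K + lam * np.eye(len(r)), y)
y_krr = K @ alpha     # KRR predictions at the training points

# Least-squares line through the origin: y ≈ w * r.
w = (r @ y) / (r @ r)
y_ols = w * r

print("slope implied by KRR:", r @ alpha)
print("least-squares slope: ", w)
print("max |KRR - OLS|:     ", np.max(np.abs(y_krr - y_ols)))
```

In this sketch the KRR prediction is (r·α) times the distance, so the quantity r·α plays the role of the regression slope.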
Training KRR with the Gaussian kernel
Download the full data set with 451 points along the H2 dissociation curve. File with internuclear distances in the H2 molecule in Å:
File with the reference energies (full CI, calculated with the aug-cc-pV6Z basis set, in Hartree):
Now you can download the indices of the points in the data set to be used as the training and test sets. You can check that all the training points are the same as above.
To train and test the KRR model with the Gaussian kernel using the hyperparameters σ=1 and λ=3.5⋅10⁻¹³, you can use the following input file:
You can check that the estimated values saved in the output file are very close to the reference values for both training and test points.
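The same kind of check can be sketched in NumPy. The example below trains KRR with a Gaussian kernel, taken here as exp(−(r − r')²/(2σ²)), which is an assumption about the kernel's form, using σ = 1 and λ = 3.5⋅10⁻¹³. Everything else is a stand-in for the tutorial's files: the 451-point grid from 0.5 to 5.0 Å, the Morse-like toy energies, and the evenly spread 20-point training split are all hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between 1D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

# Toy placeholders (assumptions), not the tutorial's data files.
r_all = np.linspace(0.5, 5.0, 451)  # distances, Å
y_all = -1.0 + 0.17 * ((1.0 - np.exp(-2.0 * (r_all - 0.74))) ** 2 - 1.0)

# Hypothetical split: 20 evenly spread training points, the rest for testing.
train_idx = np.round(np.linspace(0, 450, 20)).astype(int)
test_idx = np.setdiff1d(np.arange(451), train_idx)

sigma, lam = 1.0, 3.5e-13  # hyperparameters quoted in the text

# Train: solve (K + λ I) α = y on the training points.
K = gaussian_kernel(r_all[train_idx], r_all[train_idx], sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(train_idx)), y_all[train_idx])

# Predict for all 451 points and compare with the reference curve.
y_est = gaussian_kernel(r_all, r_all[train_idx], sigma) @ alpha
abs_err = np.abs(y_est - y_all)
print("max abs error, training points:", abs_err[train_idx].max())
print("max abs error, test points:    ", abs_err[test_idx].max())
```

With such a small λ the linear system is badly conditioned, so the printed errors from this toy setup should not be read as the accuracy MLatom reaches on the real data.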