Tutorial for Kernel Methods chapter

On this page you can find instructions on how to follow the examples given in the upcoming book chapter on kernel methods (to be published in 2022).

Please see the MLatom manual and other pages on this website for more details on how to use this program package. The examples require MLatom 1.1 or newer.

Training KRR with the linear kernel

Download the R_20.dat file with 20 points corresponding to internuclear distances in the H2 molecule in Å:

Download the E_FCI_20.dat file with full CI energies (calculated with the aug-cc-pV6Z basis set, in Hartree) for the above 20 points:

You can use this input file for MLatom to train the KRR model with the linear kernel without regularization (lambda parameter set to 0 by default):

createMLmodel
kernel=linear
XfileIn=R_20.dat
Yfile=E_FCI_20.dat
YestFile=Eest.dat
MLmodelOut=linear.unf

The predicted values will be saved to the file Eest.dat. As you can check, the values are all over the place, which indicates that the fitting essentially failed.
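One way to see why the unregularized fit can fail is the small NumPy sketch below. It is not MLatom code: it assumes that the linear kernel for a single input value is k(x, x') = x·x', and it uses 20 made-up, evenly spaced distances instead of the values in R_20.dat. Under these assumptions the 20×20 kernel matrix is an outer product of the distance vector with itself, so it has rank 1 and the linear equations for the coefficients alpha have no unique solution when lambda is 0.

```python
import numpy as np

# Hypothetical stand-in for the 20 distances in R_20.dat (in Angstrom);
# these are NOT the tutorial values.
r = np.linspace(0.5, 5.0, 20)

# Assumed linear kernel for scalar inputs: k(x, x') = x * x'.
K = np.outer(r, r)

# Every row of K is a multiple of r, so K is rank-deficient
# although it is a 20 x 20 matrix.
rank_unregularized = np.linalg.matrix_rank(K)

# Adding lambda to the diagonal lifts the zero eigenvalues to lambda.
lam = 1e-6
rank_regularized = np.linalg.matrix_rank(K + lam * np.eye(len(r)))

print("rank without regularization:", rank_unregularized)
print("rank with lambda = 1e-6:", rank_regularized)
```

In this sketch the regularized matrix has full rank, which is why a small nonzero lambda is enough to make the problem well defined.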

You can modify the above input file to use regularization, e.g., by setting the hyperparameter λ=10^-6:

createMLmodel
kernel=linear
XfileIn=R_20.dat
Yfile=E_FCI_20.dat
YestFile=Eest_lambda.dat
MLmodelOut=linear_lambda.unf
lambda=0.000001

Now the estimated values saved to the file Eest_lambda.dat are the same as those predicted by linear regression. You can print out the regression coefficients alpha from the saved ML model by using the command:

mlatom useMLmodel MLmodelIn=linear_lambda.unf YestFile=Eest_lambda-use.dat XfileIn=R_20.dat debug
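The equivalence with linear regression can be illustrated with a short NumPy sketch. It is not MLatom's implementation: it assumes the textbook KRR equations (K + λI)·alpha = y, the kernel k(x, x') = x·x' and no intercept term, and it uses made-up distances and energies rather than the tutorial files. Under these assumptions the KRR prediction is a straight line through the origin whose slope is the sum of alpha_i·x_i.

```python
import numpy as np

# Hypothetical stand-ins for R_20.dat and E_FCI_20.dat (NOT the tutorial data).
r = np.linspace(0.5, 5.0, 20)
e = -1.0 - 0.17 * np.exp(-(r - 0.74) ** 2)

lam = 1e-6  # same value as lambda=0.000001 in the input file

# KRR with the assumed linear kernel k(x, x') = x * x'.
K = np.outer(r, r)
alpha = np.linalg.solve(K + lam * np.eye(len(r)), e)
e_krr = K @ alpha

# Ordinary least-squares line through the origin, e = w * r.
w = (r @ e) / (r @ r)
e_lin = w * r

# The KRR slope is recovered from the coefficients as sum_i alpha_i * r_i.
w_krr = alpha @ r
max_diff = np.max(np.abs(e_krr - e_lin))

print("least-squares slope:", w)
print("slope from alpha:   ", w_krr)
print("largest difference between the two predictions:", max_diff)
```

The individual coefficients alpha can be large in magnitude for such a small lambda; only their weighted sum enters the prediction in this sketch.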

Training KRR with the Gaussian kernel

Download the full data set with 451 points along the H2 dissociation curve. File with internuclear distances in the H2 molecule in Å (R_451.dat in the input file below):

File with the reference energies (E_FCI_451.dat in the input file below; full CI, calculated with the aug-cc-pV6Z basis set, in Hartree):

Now you can download the files with the indices of the points in the data set to be used as the training and test sets (itrain.dat and itest.dat in the input file below). You can check that the training points are the same 20 points as above.
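The check that the training points coincide with the earlier 20 points can be scripted. The helper below is a hypothetical sketch, not part of MLatom: the function name is made up, and it assumes that the indices in the index file count from 1 (set one_based=False if they count from 0). It is shown on a tiny made-up example.

```python
import numpy as np

def training_points_match(r_full, i_train, r_train, one_based=True):
    """Return True if the points selected from r_full equal r_train (any order)."""
    idx = np.asarray(i_train, dtype=int)
    if one_based:
        idx = idx - 1  # assumption: indices in the file start from 1
    selected = np.sort(np.asarray(r_full, dtype=float)[idx])
    reference = np.sort(np.asarray(r_train, dtype=float))
    return selected.shape == reference.shape and bool(np.allclose(selected, reference))

# Tiny made-up example (not the tutorial data).
r_full = np.array([0.5, 0.6, 0.7, 0.8, 0.9])
i_train = np.array([1, 3, 5])  # 1-based indices
r_train = np.array([0.5, 0.7, 0.9])

same = training_points_match(r_full, i_train, r_train)
print("training points match:", same)
```

With the tutorial files, and assuming each file holds one number per line, the three arrays could be loaded with np.loadtxt("R_451.dat"), np.loadtxt("itrain.dat", dtype=int) and np.loadtxt("R_20.dat").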

To train and test the KRR model with the Gaussian kernel using the hyperparameters σ=1 and λ=3.5⋅10^-13, you can use the following input file:

estAccMLmodel
XfileIn=R_451.dat
Yfile=E_FCI_451.dat
sampling=user-defined
Ntrain=20 iTrainIn=itrain.dat
iTestIn=itest.dat
YestFile=Eest_gaussian.dat
sigma=1
lambda=3.5e-13

You can check that the estimated values saved in the file Eest_gaussian.dat are very close to the reference values for both training and test points.
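The same train-and-test procedure can be sketched in NumPy to see what the calculation involves. This is an illustration under stated assumptions, not MLatom's implementation: the Gaussian kernel is taken as exp(-(x - x')²/(2σ²)), the data are a Morse-like curve with made-up parameters instead of the full CI energies, and the training set is 20 evenly spread points rather than the indices in itrain.dat.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Assumed Gaussian kernel exp(-(a - b)^2 / (2 sigma^2)) for scalar inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

# Hypothetical stand-in for the 451-point data set: a Morse-like curve with
# made-up parameters, NOT the full CI energies of the tutorial.
r = np.linspace(0.5, 5.0, 451)
e = 0.17 * (1.0 - np.exp(-1.9 * (r - 0.74))) ** 2 - 1.17

# Hypothetical split: 20 evenly spread training points, the rest for testing.
i_train = np.round(np.linspace(0, len(r) - 1, 20)).astype(int)
i_test = np.setdiff1d(np.arange(len(r)), i_train)

# Hyperparameters as in the input file above.
sigma, lam = 1.0, 3.5e-13

# Train: solve (K + lambda * I) alpha = y on the training points.
K = gaussian_kernel(r[i_train], r[i_train], sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(i_train)), e[i_train])

# Predict for all points and compare with the reference values.
e_est = gaussian_kernel(r, r[i_train], sigma) @ alpha
mae_train = np.mean(np.abs(e_est[i_train] - e[i_train]))
mae_test = np.mean(np.abs(e_est[i_test] - e[i_test]))

print("MAE on training points:", mae_train)
print("MAE on test points:    ", mae_test)
```

For a smooth curve like this stand-in, both mean absolute errors are expected to be small compared with the depth of the well, which mirrors the behaviour described for Eest_gaussian.dat.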