Case study 1: ML-NEA spectrum for a single molecule

On this page you can find instructions on how to follow case study 1 from the book chapter “Learning excited-state properties”:

Julia Westermayr*, Pavlo O. Dral, Philipp Marquetand. Learning excited-state properties. DOI: 10.1016/B978-0-323-90049-2.00004-4.

In Quantum Chemistry in the Age of Machine Learning, Pavlo O. Dral, Ed. Elsevier: 2023. DOI: 10.1016/B978-0-323-90049-2.00014-7.

Preparation

Download the package with the tutorial:

This package contains MLatom 2.0.5 and directories with input files and data for examples 1–3. The examples can only be run on a Linux machine, and you need Python 3.7+ installed.

You also need:

  • install Newton-X (NX), version 2.2
  • define the environment variable NX with export NX=/path/to/Newton-X (in a bash shell)
  • install matplotlib with the command python3 -m pip install matplotlib
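
Before running the examples, it may be worth quickly checking this setup. The commands below are just a suggested sanity check (they are not part of the tutorial package):

echo $NX
python3 --version
python3 -c "import matplotlib; print(matplotlib.__version__)"

The first command should print the path to your Newton-X installation, the second should report Python 3.7 or newer, and the third should print the installed matplotlib version without errors.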

Unpack the data by running the command:

unzip tutorial_MLinQC22-ML-ES-props.zip

Go to the directory tutorial_MLinQC22-ML-ES-props:

cd tutorial_MLinQC22-ML-ES-props
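
According to the package description above, a quick listing of this directory should show the MLatom_v2-0-5 directory together with the example1, example2, and example3 sub-directories:

ls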

Example 1

Go to the directory with example 1:

cd example1

Run MLatom:

python3 ../MLatom_v2-0-5/MLatom.py ML-NEA.inp &> ML-NEA.out
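
Since all output is redirected to ML-NEA.out, you can optionally follow the progress of the run in a second terminal, e.g.:

tail -f ML-NEA.out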

Depending on your machine, this may take from a couple of minutes to several dozen minutes. The output ML-NEA.out should contain the following lines:

==========================================================================================
run ML-NEA iteratively for spectrum generation ( ML_train_iter ) started at Wed Dec  1 12:00:19 2021 CST
ML-NEA iteration 1: train_number = 50; RMSE_geom = 0.06717941145022376; rRMSE = 1.0

ML-NEA iteration 2: train_number = 100; RMSE_geom = 0.09043318436728051; rRMSE = 0.25713761026721255

ML-NEA iteration 3: train_number = 150; RMSE_geom = 0.06411060145373663; rRMSE = 0.410580813729204

ML-NEA iteration 4: train_number = 200; RMSE_geom = 0.0695737045717655; rRMSE = 0.07852252732055763

ML-NEA iteration ended after 4 iteration!
run ML-NEA iteratively for spectrum generation ( ML_train_iter ) finished at Wed Dec  1 12:08:01 2021 CST |||| total spent 462.02 sec
==========================================================================================
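
If you would like to check how the error changes with the training-set size, you can extract the per-iteration values from ML-NEA.out with a small script. The sketch below simply parses the "ML-NEA iteration" lines in the format shown above (the exact numbers will differ from run to run):

import re

# Collect train_number, RMSE_geom and rRMSE from each "ML-NEA iteration" line
pattern = re.compile(
    r"train_number = (\d+); RMSE_geom = ([\d.eE+-]+); rRMSE = ([\d.eE+-]+)")

with open("ML-NEA.out") as f:
    for line in f:
        match = pattern.search(line)
        if match:
            n_train, rmse, rrmse = match.groups()
            print(f"N_train = {n_train:>5}   RMSE_geom = {float(rmse):.4f}   rRMSE = {float(rrmse):.4f}")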

After the calculations have finished, the spectra are plotted to the plot.png file in the cross-section sub-directory. You can open and check it with your favorite image viewer, e.g.:

gwenview cross-section/plot.png

It should look like:

The final result: ‘ref’ is the experimental spectrum; QC-NEA is the spectrum calculated with the quantum chemical approach on 200 points in the ensemble; ML-NEA is the machine learning spectrum generated with 200 points in the training set and 50k points in the ensemble; QC-SPC is the spectrum generated with single-point convolution.
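
If you prefer to inspect or re-plot the underlying data yourself, the cross-section sub-directory also contains the plain-text cross-section files from which plot.png is generated. Below is a minimal matplotlib sketch, assuming a simple two-column file (energy or wavelength in the first column, cross section in the second) and a hypothetical file name cross-section_ml.dat; adjust both to whatever files your run actually produced:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file name -- check the cross-section directory for the actual files
data = np.loadtxt("cross-section/cross-section_ml.dat")

plt.plot(data[:, 0], data[:, 1], label="ML-NEA")
plt.xlabel("wavelength")       # or energy, depending on the file contents
plt.ylabel("cross section")
plt.legend()
plt.savefig("my_plot.png", dpi=300)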

Example 2

Go to the directory with example 2:

cd example2

Run MLatom to calculate the spectrum with 2k training points and compare the resulting spectrum to the ML-NEA spectrum generated in example 1 (i.e. with 200 training points):

python3 ../MLatom_v2-0-5/MLatom.py ML-NEA.inp &> ML-NEA.out

Depending on your machine, this may take from a couple of minutes to several dozen minutes. The output ML-NEA.out should contain the following lines:

==========================================================================================
run ML-NEA iteratively for spectrum generation ( ML_train_iter ) started at Wed Dec  1 12:00:19 2021 CST
==========================================================================================
use all QC points to run ML-NEA calculations ( ML_train_all ) started at Wed Dec  1 12:20:25 2021 CST

RMSE_geom value for 2000 point: 0.05609438652907553
use all QC points to run ML-NEA calculations ( ML_train_all ) finished at Wed Dec  1 12:44:40 2021 CST |||| total spent 1454.98 sec
==========================================================================================

After the calculations have finished, the spectra are plotted to the plot.png file in the cross-section sub-directory. You can open and check it with your favorite image viewer, e.g.:

gwenview cross-section/plot.png

It should look like:

The final result: ‘ref’ is the ML-NEA spectrum from example 1 (generated with 200 training points); ML-NEA is the machine learning spectrum generated with 2k points in the training set and 50k points in the ensemble.
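
If you want to put a number on this comparison, you can load the two ML-NEA cross-section files and compute, for example, the mean absolute difference between them. The sketch below uses the same hypothetical two-column format and file name as in example 1 and assumes the example 1 results are still present at the relative path shown (adjust paths and names to your setup):

import numpy as np

# Hypothetical file names -- adjust to the actual files in your cross-section directories
ref = np.loadtxt("../example1/cross-section/cross-section_ml.dat")   # 200 training points
new = np.loadtxt("cross-section/cross-section_ml.dat")               # 2k training points

# Interpolate the new spectrum onto the reference grid before comparing
order = np.argsort(new[:, 0])
new_on_ref = np.interp(ref[:, 0], new[order, 0], new[order, 1])
print("mean absolute difference:", np.mean(np.abs(ref[:, 1] - new_on_ref)))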

Example 3

Go to the directory with example 3:

cd example3

Run MLatom to calculate the spectrum with 100k points in the ensemble and compare the resulting spectrum to the ML-NEA spectrum generated in example 1 (i.e. with 50k points in the ensemble):

python3 ../MLatom_v2-0-5/MLatom.py ML-NEA.inp &> ML-NEA.out

Depending on your machine, this may take from a couple of minutes to several dozen minutes. The output ML-NEA.out should contain the following lines:

==========================================================================================
use all QC points to run ML-NEA calculations ( ML_train_all ) started at Wed Dec  1 13:25:24 2021 CST

RMSE_geom value for 200 point: 0.0695737045717655
use all QC points to run ML-NEA calculations ( ML_train_all ) finished at Wed Dec  1 13:33:41 2021 CST |||| total spent 496.45 sec
==========================================================================================

After the calculations have finished, the spectra are plotted to the plot.png file in the cross-section sub-directory. You can open and check it with your favorite image viewer, e.g.:

gwenview cross-section/plot.png

It should look like:

The final result: ‘ref’ is the ML-NEA spectrum from example 1 (generated with 50k points in the ensemble); ML-NEA is the machine learning spectrum generated with 100k points in the ensemble.
