Quick tour with QM9

This notebook showcases a simple example of training a neural network potential on the QM9 dataset with PiNN.

[1]:
import os, warnings
import tensorflow as tf
from glob import glob
from ase.collections import g2
from pinn.io import load_qm9, sparse_batch
from pinn.models import potential_model
from pinn.calculator import PiNN_calc
# CPU is used for documentation generation, feel free to use your GPU!
os.environ['CUDA_VISIBLE_DEVICES'] = ''
# We heavily use indexed slices to do sparse summations,
# which makes TensorFlow warn about converting sparse IndexedSlices;
# we believe it is safe to ignore this warning.
index_warning = 'Converting sparse IndexedSlices'
warnings.filterwarnings('ignore', index_warning)
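The comment above refers to sparse summations. As a rough illustration in plain Python (not PiNN's TensorFlow code), a batch of molecules can be stored as one flat list of atoms plus an index that records which structure each atom belongs to. Per-structure energies are then obtained by summing the per-atom contributions segment by segment. The names `segment_sum`, `atom_energies` and `struct_index` are hypothetical.

```python
from typing import List


def segment_sum(values: List[float], segment_ids: List[int], num_segments: int) -> List[float]:
    """Sum `values` into `num_segments` bins according to `segment_ids`.

    Same idea as tf.math.unsorted_segment_sum: each value is added
    to the bin named by its segment id.
    """
    totals = [0.0] * num_segments
    for value, seg in zip(values, segment_ids):
        totals[seg] += value
    return totals


# Hypothetical batch of two molecules flattened into one list of atoms:
# molecule 0 has 3 atoms, molecule 1 has 2 atoms.
atom_energies = [-1.0, -0.5, -0.5, -2.0, -0.25]
struct_index = [0, 0, 0, 1, 1]

energies = segment_sum(atom_energies, struct_index, num_segments=2)
print(energies)  # [-2.0, -2.25]
```

Storing batches this way avoids padding molecules of different sizes to a common length.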

Getting the dataset

PiNN adapts TensorFlow’s dataset API to handle different datasets.

For this and the following notebooks, the QM9 dataset (https://doi.org/10.6084/m9.figshare.978904) is used.
To follow along, download the dataset and change the path in the glob pattern below accordingly.
The dataset is automatically split into subsets according to the ratios given in the split argument.
Note that to use the dataset with the estimator, the input must be a function that returns a dataset rather than a dataset object, because the estimator calls that function itself to build the input pipeline.
[2]:
filelist = glob('/home/yunqi/datasets/QM9/dsgdb9nsd/*.xyz')
dataset = lambda: load_qm9(filelist, split={'train':8, 'test':2})
train = lambda: dataset()['train'].repeat().shuffle(1000).apply(sparse_batch(100))
test = lambda: dataset()['test'].repeat().apply(sparse_batch(100))
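load_qm9 performs the split internally. The sketch below only illustrates what a ratio such as {'train': 8, 'test': 2} means, using a hypothetical `split_by_ratio` helper on a shuffled file list. It is not PiNN's implementation; the exact shuffling and rounding used by load_qm9 may differ.

```python
import random
from typing import Dict, List, Sequence


def split_by_ratio(items: Sequence[str], ratio: Dict[str, int], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle `items` and cut them into subsets proportional to `ratio` (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratio.values())
    names = list(ratio)
    subsets, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # the last subset takes the remainder
        else:
            end = start + len(shuffled) * ratio[name] // total
        subsets[name] = shuffled[start:end]
        start = end
    return subsets


# 100 made-up file names in the QM9 naming style.
files = [f"dsgdb9nsd_{i:06d}.xyz" for i in range(1, 101)]
parts = split_by_ratio(files, {'train': 8, 'test': 2})
print({name: len(subset) for name, subset in parts.items()})  # {'train': 80, 'test': 20}
```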

Defining the model

In PiNN, a model is defined at two levels: the model and the network.

  • A model (model_fn) defines the target, loss and training details of the network.
  • A network defines the structure of the neural network.

In this example, we use the potential model and the PiNet network.
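To make the split between model and network concrete, here is a toy sketch in plain Python. The hypothetical `toy_network` plays the role of the network: it maps atoms to per-atom outputs. The hypothetical `toy_model` plays the role of the model: it turns those outputs into the training target (a total energy) and a loss. PiNN's real functions operate on TensorFlow tensors and have different signatures; the per-element numbers are made up.

```python
from typing import Callable, Dict, List


def toy_network(elements: List[int], params: Dict[str, float]) -> List[float]:
    """Network level: one output per atom (here simply a per-element constant)."""
    return [params[f"e_{z}"] for z in elements]


def toy_model(network: Callable, elements: List[int], params: Dict[str, float],
              e_ref: float) -> Dict[str, float]:
    """Model level: define the target (total energy) and the loss."""
    atomic = network(elements, params)
    e_pred = sum(atomic)
    loss = (e_pred - e_ref) ** 2  # squared error on the total energy
    return {'e_pred': e_pred, 'loss': loss}


# Made-up per-element parameters for H (1) and C (6), applied to C2H4.
params = {'e_1': -0.5, 'e_6': -38.0}
out = toy_model(toy_network, [6, 6, 1, 1, 1, 1], params, e_ref=-78.5)
print(out)  # {'e_pred': -78.0, 'loss': 0.25}
```

Because the model only sees the network through its outputs, either level can be swapped independently.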

The configuration of a model is stored in a nested dictionary like:

{'model_dir': 'PiNet_QM9',
 'network': 'pinet',
 'network_params': {
     'depth': 4,
     'rc':4.0,
     'atom_types':[1,6,7,8,9]},
 'model_params':{
     'learning_rate': 3e-4,
     ...}}

The available options for each network and model are listed in the PiNN documentation.

The easiest way to specify the network function is by its name (here 'pinet').
You can also provide a custom function instead.
[3]:
params = {'model_dir': '/tmp/PiNet_QM9',
          'network': 'pinet',
          'network_params': {},
          'model_params': {'learning_rate': 1e-3}}
model = potential_model(params)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/PiNet_QM9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f8114523710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
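The 'network' entry in params accepts either a name such as 'pinet' or a custom function. A minimal sketch of how such a lookup can work is shown below; the registry `NETWORKS`, `resolve_network` and the stand-in functions are hypothetical and do not reproduce PiNN's internals.

```python
from typing import Callable, Dict, Union


def pinet_stub(tensors, **kwargs):
    """Stand-in for a built-in network (hypothetical)."""
    return 'pinet output'


def my_network(tensors, **kwargs):
    """Stand-in for a user-defined network (hypothetical)."""
    return 'custom output'


# Hypothetical registry mapping network names to functions.
NETWORKS: Dict[str, Callable] = {'pinet': pinet_stub}


def resolve_network(spec: Union[str, Callable]) -> Callable:
    """Return a network function given either a registered name or a callable."""
    if callable(spec):
        return spec
    try:
        return NETWORKS[spec]
    except KeyError:
        raise ValueError(f"Unknown network name: {spec!r}") from None


params = {'network': 'pinet', 'network_params': {}}
net = resolve_network(params['network'])
print(net(None, **params['network_params']))  # pinet output
print(resolve_network(my_network)(None))      # custom output
```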

Configuring the training process

The model returned by potential_model is a tf.estimator.Estimator, so training and evaluation can be controlled with the standard TrainSpec and EvalSpec objects:

[4]:
train_spec = tf.estimator.TrainSpec(input_fn=train, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=test, steps=100)
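A back-of-the-envelope check of what these settings mean, assuming the full QM9 set of 133,885 molecules and an exact 8:2 split (the exact subset sizes depend on how load_qm9 rounds). Since both input functions call .repeat(), the datasets never run out and the step counts alone decide how much data is used.

```python
# Assumptions: full QM9 (133,885 molecules), 8:2 split, batch size 100 as in sparse_batch(100).
n_total = 133_885
n_train = n_total * 8 // 10   # training molecules
n_test = n_total - n_train    # test molecules
batch_size = 100

train_seen = 1000 * batch_size  # max_steps=1000 -> structures processed during training
eval_seen = 100 * batch_size    # steps=100 -> structures used in each evaluation

print(f"training epochs: {train_seen / n_train:.2f}")              # training epochs: 0.93
print(f"fraction of test set evaluated: {eval_seen / n_test:.2f}")  # fraction of test set evaluated: 0.37
```

In other words, this short run covers slightly less than one pass over the training set, and each evaluation uses roughly a third of the test set.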

Train and evaluate

[5]:
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.

INFO:tensorflow:Calling model_fn.

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/yunqi/work/pinn_proj/code/PiNN_dev/pinn/networks/pinet.py:76: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
Total number of trainable variables: 12064
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/PiNet_QM9/model.ckpt.
INFO:tensorflow:loss = 182843.95, step = 1
INFO:tensorflow:global_step/sec: 14.1376
INFO:tensorflow:loss = 12441.237, step = 101 (7.075 sec)
INFO:tensorflow:global_step/sec: 13.5323
INFO:tensorflow:loss = 4586.3096, step = 201 (7.391 sec)
INFO:tensorflow:global_step/sec: 13.306
INFO:tensorflow:loss = 1428.9677, step = 301 (7.514 sec)
INFO:tensorflow:global_step/sec: 13.8819
INFO:tensorflow:loss = 1650.88, step = 401 (7.206 sec)
INFO:tensorflow:global_step/sec: 12.0886
INFO:tensorflow:loss = 932.35297, step = 501 (8.272 sec)
INFO:tensorflow:global_step/sec: 11.0241
INFO:tensorflow:loss = 546.3759, step = 601 (9.070 sec)
INFO:tensorflow:global_step/sec: 10.7535
INFO:tensorflow:loss = 1051.3408, step = 701 (9.300 sec)
INFO:tensorflow:global_step/sec: 10.7193
INFO:tensorflow:loss = 1045.8083, step = 801 (9.328 sec)
INFO:tensorflow:global_step/sec: 10.9043
INFO:tensorflow:loss = 542.9288, step = 901 (9.171 sec)
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/PiNet_QM9/model.ckpt.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/metrics_impl.py:363: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-08-12T21:50:11Z
INFO:tensorflow:Graph was finalized.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/PiNet_QM9/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [80/100]
INFO:tensorflow:Evaluation [90/100]
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Finished evaluation at 2019-08-12-21:50:19
INFO:tensorflow:Saving dict for global step 1000: METRICS/E_LOSS = 605.77673, METRICS/E_MAE = 14.159206, METRICS/E_RMSE = 24.612532, METRICS/TOT_LOSS = 605.77673, global_step = 1000, loss = 605.77673
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: /tmp/PiNet_QM9/model.ckpt-1000
INFO:tensorflow:Loss for final step: 240.72758.
[5]:
({'METRICS/E_LOSS': 605.77673,
  'METRICS/E_MAE': 14.159206,
  'METRICS/E_RMSE': 24.612532,
  'METRICS/TOT_LOSS': 605.77673,
  'loss': 605.77673,
  'global_step': 1000},
 [])
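The reported metrics are internally consistent: METRICS/E_LOSS equals METRICS/E_RMSE squared, which is consistent with the energy loss being a mean squared error. The sketch below shows how MAE, MSE and RMSE relate for a list of prediction errors (the errors are made up) and checks the logged values.

```python
import math
from typing import Dict, List


def error_metrics(errors: List[float]) -> Dict[str, float]:
    """Mean absolute error, mean squared error and root mean squared error."""
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {'MAE': sum(abs(e) for e in errors) / n,
            'MSE': mse,
            'RMSE': math.sqrt(mse)}


# Made-up prediction errors, for illustration only.
print(error_metrics([3.0, -4.0]))  # MAE 3.5, MSE 12.5, RMSE = sqrt(12.5)

# Logged evaluation values: E_RMSE squared reproduces E_LOSS.
print(math.isclose(24.612532 ** 2, 605.77673, rel_tol=1e-6))  # True
```

Since RMSE is never smaller than MAE, the gap between E_RMSE (24.6) and E_MAE (14.2) hints that a fraction of the test structures have much larger errors than the rest.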

Using the model

The trained model can be used as an ASE calculator.

[6]:
calc = PiNN_calc(model)
atoms = g2['C2H4']
atoms.calc = calc
atoms.get_forces(), atoms.get_potential_energy()
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/PiNet_QM9/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[6]:
(array([[-0.0000000e+00, -6.8545341e-07, -1.7648684e+01],
        [-0.0000000e+00, -9.5367432e-07,  1.7648701e+01],
        [-0.0000000e+00,  3.2562576e+01,  2.0237427e+01],
        [-0.0000000e+00, -3.2562576e+01,  2.0237431e+01],
        [-0.0000000e+00,  3.2562592e+01, -2.0237436e+01],
        [-0.0000000e+00, -3.2562592e+01, -2.0237436e+01]], dtype=float32),
 -88.99898529052734)
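A simple sanity check on the returned forces: the forces are the negative gradient of the energy, and for an isolated molecule a translation-invariant potential should give forces that sum to (nearly) zero. Summing the rows printed above confirms this within float32 round-off. The values are copied from the output; the tolerance of 1e-4 is an assumption.

```python
# Forces copied from the get_forces() output above.
forces = [
    [-0.0000000e+00, -6.8545341e-07, -1.7648684e+01],
    [-0.0000000e+00, -9.5367432e-07,  1.7648701e+01],
    [-0.0000000e+00,  3.2562576e+01,  2.0237427e+01],
    [-0.0000000e+00, -3.2562576e+01,  2.0237431e+01],
    [-0.0000000e+00,  3.2562592e+01, -2.0237436e+01],
    [-0.0000000e+00, -3.2562592e+01, -2.0237436e+01],
]

# Net force along x, y and z.
net_force = [sum(row[k] for row in forces) for k in range(3)]
print(net_force)
assert all(abs(f) < 1e-4 for f in net_force), "net force should be close to zero"
```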

Conclusion

You have trained your first PiNN model, though its accuracy is far from satisfying (test RMSE of about 24.6 Hartree!). Training is also slow, because it is limited by reading and pre-processing the data on the fly.

We will show in the following notebooks that:

  • Proper scaling of the energy will improve the accuracy of the model.
  • The training speed can be enhanced by caching and pre-processing the data.
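One common way to rescale molecular energies, shown here only as an illustration and not necessarily what the following notebooks do, is to fit a reference energy per element by least squares on atom counts and let the network learn only the remaining residual. The sketch uses synthetic, exactly additive energies; `fit_element_offsets` is a hypothetical helper and the offsets are made up.

```python
from typing import Dict, List


def fit_element_offsets(compositions: List[Dict[int, int]], energies: List[float]) -> Dict[int, float]:
    """Least-squares fit of E ~ sum_z n_z * e_z by solving the normal equations."""
    elements = sorted({z for comp in compositions for z in comp})
    rows = [[comp.get(z, 0) for z in elements] for comp in compositions]
    m = len(elements)
    # Normal equations: (A^T A) x = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    atb = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [ata[i] + [atb[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return dict(zip(elements, x))


true_offsets = {1: -0.5, 6: -37.8, 8: -75.0}  # made-up values (Hartree)
molecules = [{6: 1, 1: 4}, {6: 2, 1: 4}, {1: 2, 8: 1}, {6: 1, 8: 2}]  # CH4, C2H4, H2O, CO2
energies = [sum(n * true_offsets[z] for z, n in mol.items()) for mol in molecules]

offsets = fit_element_offsets(molecules, energies)
residuals = [e - sum(n * offsets[z] for z, n in mol.items()) for mol, e in zip(molecules, energies)]
print(offsets)
print(residuals)  # close to zero here because the synthetic energies are exactly additive
```

With real data the residuals are not zero, but they span a much smaller range than the raw total energies, which is the kind of target a network learns more easily.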