Quick tour with QM9
This notebook showcases a simple example of training a neural network potential on the QM9 dataset with PiNN.
[1]:
import os, warnings
import tensorflow as tf
from glob import glob
from ase.collections import g2
from pinn.io import load_qm9, sparse_batch
from pinn.models import potential_model
from pinn.calculator import PiNN_calc
# CPU is used for documentation generation, feel free to use your GPU!
os.environ['CUDA_VISIBLE_DEVICES'] = ''
# PiNN relies heavily on indexed slices for sparse summations,
# which makes TensorFlow emit a warning.
# We believe it is safe to ignore this warning.
index_warning = 'Converting sparse IndexedSlices'
warnings.filterwarnings('ignore', index_warning)
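To illustrate what a sparse summation looks like, here is a minimal pure-Python sketch. It is not PiNN's implementation: the atoms of several molecules are stored in one flat list, and an index list records which molecule each atom belongs to. The names `segment_sum`, `atom_energies` and `mol_index` are hypothetical.

```python
def segment_sum(values, index, n_segments):
    """Sum `values` into `n_segments` bins according to `index`."""
    totals = [0.0] * n_segments
    for value, i in zip(values, index):
        totals[i] += value
    return totals

# Hypothetical per-atom contributions for two molecules (3 atoms + 2 atoms).
atom_energies = [0.5, 0.25, 0.25, 1.0, 2.0]
mol_index = [0, 0, 0, 1, 1]

mol_energies = segment_sum(atom_energies, mol_index, n_segments=2)
print(mol_energies)  # [1.0, 3.0]
```

Only the entries named in the index are touched, which is why this pattern suits batches of molecules with different numbers of atoms.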
Getting the dataset
PiNN adapts TensorFlow’s dataset API to handle different datasets.
[2]:
filelist = glob('/home/yunqi/datasets/QM9/dsgdb9nsd/*.xyz')
dataset = lambda: load_qm9(filelist, split={'train':8, 'test':2})
train = lambda: dataset()['train'].repeat().shuffle(1000).apply(sparse_batch(100))
test = lambda: dataset()['test'].repeat().apply(sparse_batch(100))
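The `split={'train':8, 'test':2}` argument can be read as an 8:2 ratio between the two subsets. The sketch below shows one way such weights could partition a file list. It is an illustration under that assumption, not PiNN's actual splitting code; `split_by_ratio` is a hypothetical helper and the file names are placeholders.

```python
import random

def split_by_ratio(items, split, seed=0):
    """Shuffle `items` and partition them according to relative weights."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(split.values())
    subsets, start, cumulative = {}, 0, 0
    for name, weight in split.items():
        cumulative += weight
        end = round(len(items) * cumulative / total)
        subsets[name] = items[start:end]
        start = end
    return subsets

files = ['mol_{:03d}.xyz'.format(i) for i in range(10)]  # placeholder names
subsets = split_by_ratio(files, {'train': 8, 'test': 2})
print(len(subsets['train']), len(subsets['test']))  # 8 2
```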
Defining the model
In PiNN, models are defined at two levels: models and networks.
- A model (model_fn) defines the target, loss and training details of the network.
- A network defines the structure of the neural network.
In this example, we will use the potential model, and the PiNet network.
The configuration of a model is stored in a nested dictionary like:
{'model_dir': 'PiNet_QM9',
'network': 'pinet',
'network_params': {
'depth': 4,
'rc':4.0,
'atom_types':[1,6,7,8,9]},
'model_params':{
'learning_rate': 3e-4,
...}}
Available options of the network and model can be found in the documentation.
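Because the configuration is a plain nested dictionary, partial settings can be layered on top of a base configuration. The following sketch uses a hypothetical `merge_params` helper (not part of PiNN) together with the example values shown above; the elided `...` entries are simply left out.

```python
import copy

def merge_params(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

base = {'model_dir': 'PiNet_QM9',
        'network': 'pinet',
        'network_params': {'depth': 4, 'rc': 4.0, 'atom_types': [1, 6, 7, 8, 9]},
        'model_params': {'learning_rate': 3e-4}}

merged = merge_params(base, {'model_dir': '/tmp/PiNet_QM9',
                             'model_params': {'learning_rate': 1e-3}})
print(merged['model_params'])             # {'learning_rate': 0.001}
print(merged['network_params']['depth'])  # 4
```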
[3]:
params = {'model_dir': '/tmp/PiNet_QM9',
'network': 'pinet',
'network_params': {},
'model_params': {'learning_rate': 1e-3}}
model = potential_model(params)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/PiNet_QM9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f8114523710>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Configuring the training process
The model defined above is a tf.estimator.Estimator object, so training can be controlled with the standard TensorFlow Estimator API.
[4]:
train_spec = tf.estimator.TrainSpec(input_fn=train, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=test, steps=100)
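For a sense of scale, the step counts above can be converted into sample counts. The sketch assumes the full QM9 set of 133,885 molecules (the size of the local `filelist` is not shown in this notebook), together with the 8:2 split and the batch size of 100 used earlier.

```python
n_molecules = 133885   # assumed: size of the full QM9 dataset
batch_size = 100       # from sparse_batch(100)

# 8:2 split from split={'train':8, 'test':2}, using integer arithmetic.
train_size = n_molecules * 8 // (8 + 2)

samples_seen = 1000 * batch_size   # max_steps=1000
eval_samples = 100 * batch_size    # steps=100

print(train_size)                 # 107108
print(samples_seen / train_size)  # roughly 0.93, i.e. less than one epoch
print(eval_samples)               # 10000
```

Under these assumptions, the training run in this notebook does not even complete one pass over the training subset.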
Train and evaluate
[5]:
tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
INFO:tensorflow:Calling model_fn.
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From /home/yunqi/work/pinn_proj/code/PiNN_dev/pinn/networks/pinet.py:76: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
Total number of trainable variables: 12064
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/PiNet_QM9/model.ckpt.
INFO:tensorflow:loss = 182843.95, step = 1
INFO:tensorflow:global_step/sec: 14.1376
INFO:tensorflow:loss = 12441.237, step = 101 (7.075 sec)
INFO:tensorflow:global_step/sec: 13.5323
INFO:tensorflow:loss = 4586.3096, step = 201 (7.391 sec)
INFO:tensorflow:global_step/sec: 13.306
INFO:tensorflow:loss = 1428.9677, step = 301 (7.514 sec)
INFO:tensorflow:global_step/sec: 13.8819
INFO:tensorflow:loss = 1650.88, step = 401 (7.206 sec)
INFO:tensorflow:global_step/sec: 12.0886
INFO:tensorflow:loss = 932.35297, step = 501 (8.272 sec)
INFO:tensorflow:global_step/sec: 11.0241
INFO:tensorflow:loss = 546.3759, step = 601 (9.070 sec)
INFO:tensorflow:global_step/sec: 10.7535
INFO:tensorflow:loss = 1051.3408, step = 701 (9.300 sec)
INFO:tensorflow:global_step/sec: 10.7193
INFO:tensorflow:loss = 1045.8083, step = 801 (9.328 sec)
INFO:tensorflow:global_step/sec: 10.9043
INFO:tensorflow:loss = 542.9288, step = 901 (9.171 sec)
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/PiNet_QM9/model.ckpt.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/metrics_impl.py:363: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-08-12T21:50:11Z
INFO:tensorflow:Graph was finalized.
WARNING:tensorflow:From /home/yunqi/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /tmp/PiNet_QM9/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [80/100]
INFO:tensorflow:Evaluation [90/100]
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Finished evaluation at 2019-08-12-21:50:19
INFO:tensorflow:Saving dict for global step 1000: METRICS/E_LOSS = 605.77673, METRICS/E_MAE = 14.159206, METRICS/E_RMSE = 24.612532, METRICS/TOT_LOSS = 605.77673, global_step = 1000, loss = 605.77673
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 1000: /tmp/PiNet_QM9/model.ckpt-1000
INFO:tensorflow:Loss for final step: 240.72758.
[5]:
({'METRICS/E_LOSS': 605.77673,
'METRICS/E_MAE': 14.159206,
'METRICS/E_RMSE': 24.612532,
'METRICS/TOT_LOSS': 605.77673,
'loss': 605.77673,
'global_step': 1000},
[])
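The reported numbers are consistent with `E_LOSS` being the mean squared error: the square of `E_RMSE` (24.612532) reproduces `E_LOSS` (605.77673). A small sketch of how the three metrics relate, using made-up energies and a hypothetical `energy_metrics` helper:

```python
import math

def energy_metrics(predicted, reference):
    """Return MAE, RMSE and MSE of `predicted` against `reference`."""
    errors = [p - r for p, r in zip(predicted, reference)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {'E_MAE': mae, 'E_RMSE': math.sqrt(mse), 'E_LOSS': mse}

# Made-up energies, for illustration only.
metrics = energy_metrics([1.0, 2.0, 5.0], [0.0, 2.0, 3.0])
print(metrics['E_MAE'])  # 1.0

# Check against the values reported by the evaluation above.
print(math.isclose(24.612532 ** 2, 605.77673, rel_tol=1e-6))  # True
```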
Using the model
The trained model can be used as an ASE calculator.
[6]:
calc = PiNN_calc(model)
atoms = g2['C2H4']
atoms.set_calculator(calc)
atoms.get_forces(), atoms.get_potential_energy()
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/PiNet_QM9/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[6]:
(array([[-0.0000000e+00, -6.8545341e-07, -1.7648684e+01],
[-0.0000000e+00, -9.5367432e-07, 1.7648701e+01],
[-0.0000000e+00, 3.2562576e+01, 2.0237427e+01],
[-0.0000000e+00, -3.2562576e+01, 2.0237431e+01],
[-0.0000000e+00, 3.2562592e+01, -2.0237436e+01],
[-0.0000000e+00, -3.2562592e+01, -2.0237436e+01]], dtype=float32),
-88.99898529052734)
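One quick sanity check on such output: for a potential that depends only on the relative positions of the atoms, the forces on an isolated molecule should sum to approximately zero. The sketch below copies the force array printed above and sums each Cartesian component; the tolerance of 1e-4 is an arbitrary choice for float32 output.

```python
# Forces copied from the calculator output above (one row per atom).
forces = [
    [-0.0000000e+00, -6.8545341e-07, -1.7648684e+01],
    [-0.0000000e+00, -9.5367432e-07,  1.7648701e+01],
    [-0.0000000e+00,  3.2562576e+01,  2.0237427e+01],
    [-0.0000000e+00, -3.2562576e+01,  2.0237431e+01],
    [-0.0000000e+00,  3.2562592e+01, -2.0237436e+01],
    [-0.0000000e+00, -3.2562592e+01, -2.0237436e+01],
]

# Sum each column (x, y, z) over all atoms.
net_force = [sum(column) for column in zip(*forces)]
print(all(abs(component) < 1e-4 for component in net_force))  # True
```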
Conclusion
You have trained your first PiNN model, though the accuracy is not yet satisfactory (RMSE ≈ 24.6 Hartree!). Training is also slow because it is limited by the I/O and pre-processing of the data.
We will show in the following notebooks that:
- Proper scaling of the energy will improve the accuracy of the model.
- The training speed can be enhanced by caching and pre-processing the data.