Use PiNN with Ray Tune
Tune is a library for hyperparameter tuning at any scale.
PiNN provides some helpers for tuning hyperparameters with the Tune library.
A working example
Suppose we would like to tune the learning rate, the number of Pi blocks and the cutoff radius of a PiNet potential. We first need to declare the space we would like to search, with Tune.
from ray import tune
config = {
    'lr': tune.grid_search([3e-4, 1e-4, 3e-5]),
    'rc': tune.grid_search([3.5, 4.5, 5.0, 5.5]),
    'depth': tune.grid_search([4, 5, 6])}
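To get a feeling for the size of this search space, the grid can be enumerated in plain Python. This is only an illustrative sketch: it copies the values from the config above into ordinary lists and does not use Tune itself; the names `grid` and `trials` are ours.

```python
from itertools import product

# Plain-Python mirror of the search space above (values copied from config).
grid = {
    'lr': [3e-4, 1e-4, 3e-5],
    'rc': [3.5, 4.5, 5.0, 5.5],
    'depth': [4, 5, 6],
}

# One dict per grid point, in the same shape a single trial receives.
trials = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(trials))  # 3 * 4 * 3 = 36 combinations
print(trials[0])    # first grid point
```

A full grid grows multiplicatively with every added hyperparameter, so it is worth checking this number before launching a search.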
Then we need to define a train_fn. Given a config, it should return
four objects: a model (as an estimator), the train_spec, the
eval_spec, and a reporter function which reports the metrics to Tune.
The reporter takes the evaluation output as input and returns a
dictionary of metrics. The naming does not matter, but note that Tune
expects a reward_attr which increases as the model improves in order
to discriminate models, so you should report quantities like accuracy
instead of error.
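As a minimal sketch of such a reporter, the function below turns an error into metrics that increase as the model improves. The key 'METRICS/E_MAE' is the one used in this example; the small epsilon guarding against division by zero and the extra 'neg_mae' metric are our own assumptions, not something Tune or PiNN requires.

```python
def reporter(eval_out):
    """Map evaluation output to metrics that grow as the model improves."""
    mae = float(eval_out['METRICS/E_MAE'])
    eps = 1e-12  # assumed guard against a vanishing error
    return {
        'accuracy': 1.0 / max(mae, eps),  # inverse error, as in this example
        'neg_mae': -mae,                  # alternative: negated error
    }


# Usage with a made-up evaluation result
print(reporter({'METRICS/E_MAE': 0.5}))
```

Either metric can serve as the reward_attr, since both increase when the error decreases.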
def train_fn(c, ckpt=None):
    import tensorflow as tf
    from pinn.io import load_tfrecord
    from pinn.models import potential_model
    # parameter builder
    params = {
        # Tune will take charge of the running path,
        # so model_dir can be a relative directory
        'model_dir': './model_dir/',
        'network': 'pinet',
        'network_params': {
            'depth': c['depth'],
            # pass the cutoff radius from the search space to the network
            'rc': c['rc'],
            'atom_types': [1, 6, 7, 8, 9]},
        'model_params': {
            'learning_rate': c['lr'],
            'en_scale': 1}}
    # disable checkpoint cleaning, and specify report frequency
    # note we report to Tune whenever a checkpoint is saved
    run_config = tf.estimator.RunConfig(
        save_checkpoints_secs=300,
        keep_checkpoint_max=None,
        keep_checkpoint_every_n_hours=None)
    # Returning things
    model = potential_model(params, config=run_config, warm_start_from=ckpt)
    # input functions are named train_input/test_input to avoid shadowing train_fn
    train_input = lambda: load_tfrecord('/path/to/train.yml')
    test_input = lambda: load_tfrecord('/path/to/test.yml')
    # max_steps should be an integer
    train_spec = tf.estimator.TrainSpec(input_fn=train_input, max_steps=int(1e6))
    eval_spec = tf.estimator.EvalSpec(input_fn=test_input)
    reporter = lambda eval_out: {'accuracy': 1./eval_out['METRICS/E_MAE']}
    return model, train_spec, eval_spec, reporter
Now you can use the TuneTrainable function to feed your train_fn
to Tune as a trainable.
import os
import ray
from pinn.utils import TuneTrainable
# Define the trainable from train_fn
trainable = TuneTrainable(train_fn)
# Start ray cluster, use all the GPUs on this machine
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5'
ray.shutdown()
ray.init()
# Searching strategy of Ray Tune
search_alg = tune.suggest.BasicVariantGenerator(shuffle=True)
scheduler = tune.schedulers.HyperBandScheduler(
    time_attr='time_total_s', reward_attr='accuracy', max_t=3600)
# Run jobs with Ray Tune
tune.run(trainable, name='test_tuning', config=config,
         resources_per_trial={'cpu': 5, 'gpu': 1},
         search_alg=search_alg, scheduler=scheduler,
         local_dir="/tmp/test_tuning",
         num_samples=2, verbose=1)
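Before launching, it can help to estimate how many trials this call creates and how many can run at once. The sketch below is plain arithmetic, not a Tune API. It assumes that the grid is repeated num_samples times (which is how we understand Tune's grid search), and the machine with 32 CPU cores is a hypothetical example; only the six GPUs come from CUDA_VISIBLE_DEVICES above.

```python
import math

grid_sizes = {'lr': 3, 'rc': 4, 'depth': 3}  # lengths of the grid_search lists
num_samples = 2                              # as passed to tune.run
resources_per_trial = {'cpu': 5, 'gpu': 1}   # as passed to tune.run

# Hypothetical machine: 6 visible GPUs and an assumed 32 CPU cores.
available = {'cpu': 32, 'gpu': 6}

# Assumption: each grid point is run num_samples times.
total_trials = math.prod(grid_sizes.values()) * num_samples

# A trial only starts when all of its resources are free,
# so the scarcest resource sets the concurrency.
max_concurrent = min(available[k] // v for k, v in resources_per_trial.items())

print(total_trials, max_concurrent)
```

With fewer CPU cores than five per GPU, the CPUs rather than the GPUs would limit how many trials run in parallel.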
Visualizing the result
We have a notebook example for visualizing tuning results from Tune.
How this works
The TuneTrainable function is a general wrapper that lets
estimators work with Ray Tune. It runs training and evaluation, but stops
the estimator whenever a checkpoint is saved and reports the metrics
to Tune.
This is a bit more elaborate than adding a reporter hook to the estimator, and slightly slower since training must restart from a checkpoint each time. The gain is that Tune can stop and restore training runs whenever it wants, which is required by some of Tune’s schedulers.
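The idea can be illustrated with a toy model of the loop. This is not the actual TuneTrainable implementation: ToyEstimator and CheckpointedTrainable are hypothetical stand-ins, the "checkpoint" is just a step counter, and the error is faked so that the example runs without TensorFlow or Ray.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator restored from a checkpoint."""

    def __init__(self, global_step=0):
        self.global_step = global_step

    def train_until_checkpoint(self, steps_per_ckpt):
        # A real estimator would train until a checkpoint is written, then stop.
        self.global_step += steps_per_ckpt
        return self.global_step  # the step count plays the role of a checkpoint

    def evaluate(self):
        # Fake error that shrinks as training progresses.
        return {'METRICS/E_MAE': 1.0 / (1 + self.global_step)}


class CheckpointedTrainable:
    """Each step(): restart from the last checkpoint, train, evaluate, report."""

    def __init__(self, reporter, steps_per_ckpt=100):
        self.reporter = reporter
        self.steps_per_ckpt = steps_per_ckpt
        self.ckpt = 0

    def step(self):
        est = ToyEstimator(global_step=self.ckpt)
        self.ckpt = est.train_until_checkpoint(self.steps_per_ckpt)
        return self.reporter(est.evaluate())

    def save(self):
        return self.ckpt

    def restore(self, ckpt):
        self.ckpt = ckpt


reporter = lambda out: {'accuracy': 1.0 / out['METRICS/E_MAE']}
trial = CheckpointedTrainable(reporter)

first = trial.step()     # train to the first checkpoint and report
saved = trial.save()     # a scheduler may pause the trial here
second = trial.step()    # continue to the second checkpoint
trial.restore(saved)     # later, resume from the saved state
resumed = trial.step()   # repeats the second interval
```

Because every step begins from a saved state, pausing and resuming a trial gives the same result as running it without interruption, which is what schedulers that stop and restore trials rely on.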