trphysx.embedding.training

trphysx.embedding.training.enn_args

class trphysx.embedding.training.enn_args.EmbeddingParser

Bases: argparse.ArgumentParser

Arguments for training embedding models

mkdirs(*directories) → None

Makes the given directories if they do not exist

Parameters:directories (str...) – a sequence of directories to create
Raises:OSError – if directory cannot be created
parse(args: List[T] = None, dirs: bool = True) → None

Parse program arguments

Parameters:
  • args (List, optional) – Explicit list of arguments. Defaults to None.
  • dirs (bool, optional) – Make experiment directories. Defaults to True.
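
A minimal usage sketch. The parser defines the library's training flags (not listed here); the directory paths passed to mkdirs below are illustrative only:

    from trphysx.embedding.training.enn_args import EmbeddingParser

    parser = EmbeddingParser()
    # Parse command-line arguments from sys.argv; dirs=True also creates the
    # experiment directories as a side effect.
    parser.parse(dirs=True)
    # mkdirs can also be called directly with any number of paths.
    parser.mkdirs("./outputs/checkpoints", "./outputs/viz")  # illustrative paths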

trphysx.embedding.training.enn_data_handler

class trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler

Bases: object

Base class for embedding data handlers. Data handlers are used to create the training and testing datasets.

mu = None
std = None
norm_params

Get normalization parameters

Raises:ValueError – If normalization parameters have not been initialized
Returns:mean and standard deviation
Return type:(Tuple)
createTrainingLoader(*args, **kwargs)
createTestingLoader(*args, **kwargs)
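
Concrete handlers override both loader methods and populate mu and std so that norm_params can be queried afterwards. A minimal sketch of a custom handler, assuming a plain tensor dataset (the MyDataHandler class and its arguments are illustrative, not part of the library):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from trphysx.embedding.training.enn_data_handler import EmbeddingDataHandler

    class MyDataHandler(EmbeddingDataHandler):
        """Hypothetical handler for a custom dataset."""

        def createTrainingLoader(self, data: torch.Tensor, batch_size: int = 32) -> DataLoader:
            # Store normalization statistics so norm_params is available later.
            self.mu = data.mean(dim=0)
            self.std = data.std(dim=0)
            return DataLoader(TensorDataset(data), batch_size=batch_size, shuffle=True)

        def createTestingLoader(self, data: torch.Tensor, batch_size: int = 32) -> DataLoader:
            return DataLoader(TensorDataset(data), batch_size=batch_size, shuffle=False)
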
class trphysx.embedding.training.enn_data_handler.LorenzDataHandler

Bases: trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler

Built-in embedding data handler for the Lorenz system

class LorenzDataset(examples: List[T])

Bases: torch.utils.data.dataset.Dataset

Dataset for training the Lorenz embedding model.

Parameters:examples (List) – list of training/testing examples
class LorenzDataCollator

Bases: object

Data collator for the Lorenz embedding problem

createTrainingLoader(file_path: str, block_size: int, stride: int = 1, ndata: int = -1, batch_size: int = 32, shuffle: bool = True) → torch.utils.data.dataloader.DataLoader

Creates a training data loader for the Lorenz system. For a single training simulation, the total time-series is sub-chunked into smaller blocks for training.

Parameters:
  • file_path (str) – Path to HDF5 file with training data
  • block_size (int) – The length of time-series blocks
  • stride (int) – Stride of each time-series block
  • ndata (int, optional) – Number of training time-series. If negative, all of the provided data will be used. Defaults to -1.
  • batch_size (int, optional) – Training batch size. Defaults to 32.
  • shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to True.
Returns: Training loader
Return type: (DataLoader)

createTestingLoader(file_path: str, block_size: int, ndata: int = -1, batch_size: int = 32, shuffle: bool = False) → torch.utils.data.dataloader.DataLoader

Creates a testing/validation data loader for the Lorenz system. For a data case with time-steps [0, T], this method extracts a smaller time-series [0, S], with S < T, to be used for testing.

Parameters:
  • file_path (str) – Path to HDF5 file with testing data
  • block_size (int) – The length of testing time-series
  • ndata (int, optional) – Number of testing time-series. If negative, all of the provided data will be used. Defaults to -1.
  • batch_size (int, optional) – Testing batch size. Defaults to 32.
  • shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to False.
Returns: Testing/validation data loader
Return type: (DataLoader)
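
A usage sketch for the Lorenz handler, building both loaders (the HDF5 file paths are placeholders):

    from trphysx.embedding.training.enn_data_handler import LorenzDataHandler

    handler = LorenzDataHandler()
    training_loader = handler.createTrainingLoader(
        file_path="./data/lorenz_training.hdf5",  # placeholder path
        block_size=16,
        stride=32,
        ndata=-1,        # use all provided time-series
        batch_size=64,
        shuffle=True,
    )
    testing_loader = handler.createTestingLoader(
        file_path="./data/lorenz_testing.hdf5",   # placeholder path
        block_size=256,
        ndata=8,
        batch_size=8,
    )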

class trphysx.embedding.training.enn_data_handler.CylinderDataHandler

Bases: trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler

Built-in embedding data handler for the flow around a cylinder system

class CylinderDataset(examples: List[T], visc: List[T])

Bases: torch.utils.data.dataset.Dataset

Dataset for training the flow around a cylinder embedding model

Parameters:
  • examples (List) – list of training/testing example flow fields
  • visc (List) – list of training/testing example viscosities
class CylinderDataCollator

Bases: object

Data collator for the flow around a cylinder embedding problem

createTrainingLoader(file_path: str, block_size: int, stride: int = 1, ndata: int = -1, batch_size: int = 32, shuffle: bool = True) → torch.utils.data.dataloader.DataLoader

Creates a training data loader for the flow around a cylinder system. For a single training simulation, the total time-series is sub-chunked into smaller blocks for training.

Parameters:
  • file_path (str) – Path to HDF5 file with training data
  • block_size (int) – The length of time-series blocks
  • stride (int) – Stride of each time-series block
  • ndata (int, optional) – Number of training time-series. If negative, all of the provided data will be used. Defaults to -1.
  • batch_size (int, optional) – Training batch size. Defaults to 32.
  • shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to True.
Returns: Training loader
Return type: (DataLoader)

createTestingLoader(file_path: str, block_size: int, ndata: int = -1, batch_size: int = 32, shuffle: bool = False) → torch.utils.data.dataloader.DataLoader

Creates a testing/validation data loader for the flow around a cylinder system. For a data case with time-steps [0, T], this method extracts a smaller time-series [0, S], with S < T, to be used for testing.

Parameters:
  • file_path (str) – Path to HDF5 file with testing data
  • block_size (int) – The length of testing time-series
  • ndata (int, optional) – Number of testing time-series. If negative, all of the provided data will be used. Defaults to -1.
  • batch_size (int, optional) – Testing batch size. Defaults to 32.
  • shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to False.
Returns: Testing/validation data loader
Return type: (DataLoader)
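
The cylinder handler follows the same pattern. A sketch that also reads back the normalization statistics, assuming the handler populates mu and std while building the training loader (the file path is a placeholder):

    from trphysx.embedding.training.enn_data_handler import CylinderDataHandler

    handler = CylinderDataHandler()
    training_loader = handler.createTrainingLoader(
        file_path="./data/cylinder_training.hdf5",  # placeholder path
        block_size=4,
        stride=16,
        batch_size=32,
    )
    mu, std = handler.norm_params  # raises ValueError if the statistics are not yet initialized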

class trphysx.embedding.training.enn_data_handler.GrayScottDataHandler

Bases: trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler

Built-in embedding data handler for the Gray-Scott system

class GrayScottDataset(h5_file: str, keys: List[T], indices: List[T], block_size: int = 1)

Bases: torch.utils.data.dataset.Dataset

Dataset for the Gray-Scott system. Dynamically loads data from file for each mini-batch, since the entire dataset is too large to load into memory at once. This dataset supports the loading of sub-chunked time-series.

Parameters:
  • h5_file (str) – Path to hdf5 file with raw data
  • keys (List) – List of keys corresponding to each example
  • indices (List) – List of start indices for each time-series block
  • block_size (int, optional) – Length of the time-series block for each example. Defaults to 1.
class GrayScottDataCollator

Bases: object

Data collator for the Gray-Scott embedding problem

createTrainingLoader(file_path: str, block_size: int, stride: int = 1, ndata: int = -1, batch_size: int = 32, shuffle: bool = True, mpi_rank: int = -1, mpi_size: int = 1) → torch.utils.data.dataloader.DataLoader

Creates a training data loader for the Gray-Scott system. For a single training simulation, the total time-series is sub-chunked into smaller blocks for training. This particular dataloader supports splitting the dataset between GPU processes for parallel training if needed.

Parameters:
  • file_path (str) – Path to HDF5 file with training data
  • block_size (int) – The length of time-series blocks
  • stride (int) – Stride of each time-series block
  • ndata (int, optional) – Number of training time-series. If negative, all of the provided data will be used. Defaults to -1.
  • batch_size (int, optional) – Training batch size. Defaults to 32.
  • shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to True.
  • mpi_rank (int, optional) – Rank of current MPI process. Defaults to -1.
  • mpi_size (int, optional) – Number of training processes. Set to 1 for serial training. Defaults to 1.
Returns: Training loader
Return type: (DataLoader)

createTestingLoader(file_path: str, block_size: int, ndata: int = -1, batch_size: int = 32, shuffle: bool = False) → torch.utils.data.dataloader.DataLoader

Creates a testing/validation data loader for the Gray-Scott system. For a data case with time-steps [0, T], this method extracts a smaller time-series [0, S], with S < T, to be used for testing.

Parameters:
  • file_path (str) – Path to HDF5 file with testing data
  • block_size (int) – The length of testing time-series
  • ndata (int, optional) – Number of testing time-series. If negative, all of the provided data will be used. Defaults to -1.
  • batch_size (int, optional) – Testing batch size. Defaults to 32.
  • shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to False.
Returns: Testing/validation data loader
Return type: (DataLoader)
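
For the Gray-Scott system the training loader can shard the dataset across parallel training processes through mpi_rank and mpi_size. A sketch for rank 0 of a two-process run (paths and process indices are illustrative):

    from trphysx.embedding.training.enn_data_handler import GrayScottDataHandler

    handler = GrayScottDataHandler()
    training_loader = handler.createTrainingLoader(
        file_path="./data/grayscott_training.hdf5",  # placeholder path
        block_size=4,
        stride=8,
        batch_size=16,
        mpi_rank=0,   # rank of this process; use -1 for serial training
        mpi_size=2,   # total number of training processes
    )
    testing_loader = handler.createTestingLoader(
        file_path="./data/grayscott_testing.hdf5",   # placeholder path
        block_size=64,
        batch_size=4,
    )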

class trphysx.embedding.training.enn_data_handler.AutoDataHandler

Bases: object

Helper class for initializing different built-in data handlers for embedding training

classmethod load_data_handler(model_name: str, **kwargs) → trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler

Gets built-in data handler. Currently supports: “lorenz”, “cylinder”, “grayscott”

Parameters:model_name (str) – Model name
Raises:KeyError – If model_name is not a supported model type
Returns:Embedding data handler
Return type:(EmbeddingDataHandler)
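
A sketch of selecting a built-in handler by name:

    from trphysx.embedding.training.enn_data_handler import AutoDataHandler

    # Returns a LorenzDataHandler; "cylinder" and "grayscott" are the other
    # supported names, and an unsupported name raises KeyError.
    handler = AutoDataHandler.load_data_handler("lorenz")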

trphysx.embedding.training.enn_trainer

trphysx.embedding.training.enn_trainer.set_seed(seed: int) → None

Set random seed

Parameters:seed (int) – random seed
class trphysx.embedding.training.enn_trainer.EmbeddingTrainer(model: trphysx.embedding.embedding_model.EmbeddingTrainingHead, args: argparse.ArgumentParser, optimizers: Tuple[torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler], viz: trphysx.viz.viz_model.Viz = None)

Bases: object

Trainer for Koopman embedding model

Parameters:
  • model (EmbeddingTrainingHead) – Embedding training model
  • args (TrainingArguments) – Training arguments
  • optimizers (Tuple[Optimizer, Scheduler]) – Tuple of PyTorch optimizer and learning-rate scheduler.
  • viz (Viz, optional) – Visualization class. Defaults to None.
train(training_loader: torch.utils.data.dataloader.DataLoader, eval_dataloader: torch.utils.data.dataloader.DataLoader) → None

Training loop for the embedding model

Parameters:
  • training_loader (DataLoader) – Training dataloader
  • eval_dataloader (DataLoader) – Evaluation dataloader
evaluate(eval_dataloader: torch.utils.data.dataloader.DataLoader, epoch: int = 0) → Dict[str, float]

Run evaluation, plot prediction and return metrics.

Parameters:
  • eval_dataloader (DataLoader) – Evaluation dataloader
  • epoch (int, optional) – Current epoch, used for naming figures. Defaults to 0.
Returns: Dictionary of prediction metrics
Return type: Dict[str, float]
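
An end-to-end wiring sketch of the components documented above. The build_embedding_model helper is hypothetical (construction of the embedding model is library-specific), the file paths are placeholders, and it is assumed here that EmbeddingParser.parse returns the parsed training arguments:

    import torch
    from trphysx.embedding.training.enn_args import EmbeddingParser
    from trphysx.embedding.training.enn_data_handler import AutoDataHandler
    from trphysx.embedding.training.enn_trainer import EmbeddingTrainer, set_seed

    set_seed(12345)  # make the run reproducible

    parser = EmbeddingParser()
    args = parser.parse(dirs=True)  # assumption: parse returns the parsed training arguments

    handler = AutoDataHandler.load_data_handler("lorenz")
    training_loader = handler.createTrainingLoader(
        file_path="./data/lorenz_training.hdf5", block_size=16, batch_size=64)  # placeholder path
    eval_loader = handler.createTestingLoader(
        file_path="./data/lorenz_testing.hdf5", block_size=256, batch_size=8)   # placeholder path

    model = build_embedding_model(args)  # hypothetical helper returning an EmbeddingTrainingHead

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995)

    trainer = EmbeddingTrainer(model, args, (optimizer, scheduler))
    trainer.train(training_loader, eval_loader)
    metrics = trainer.evaluate(eval_loader, epoch=0)  # Dict[str, float] of prediction metrics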