trphysx.embedding.training¶
trphysx.embedding.training.enn_args¶
-
class
trphysx.embedding.training.enn_args.
EmbeddingParser
¶ Bases:
argparse.ArgumentParser
Arguments for training embedding models
-
mkdirs
(*directories) → None¶ Creates each of the given directories if it does not already exist
Parameters: directories (str...) – a sequence of directories to create
Raises: OSError – if a directory cannot be created
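The create-if-missing behaviour described above can be sketched as follows. This is a stand-alone illustration, not the library's implementation: it assumes `os.makedirs` with `exist_ok=True` gives the intended semantics, and the free function `mkdirs` and the temporary paths are hypothetical.

```python
import os
import tempfile


def mkdirs(*directories: str) -> None:
    """Create each directory if it does not already exist.

    Hypothetical stand-in for EmbeddingParser.mkdirs; os.makedirs raises
    OSError (or a subclass) when a directory cannot be created.
    """
    for directory in directories:
        os.makedirs(directory, exist_ok=True)


# Usage: create two nested experiment folders inside a temporary directory.
root = tempfile.mkdtemp()
ckpt_dir = os.path.join(root, "run0", "checkpoints")
pred_dir = os.path.join(root, "run0", "predictions")
mkdirs(ckpt_dir, pred_dir)
mkdirs(ckpt_dir)  # calling again on an existing directory does nothing
```

Passing several paths in one call mirrors the variadic `*directories` signature.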
-
parse
(args: List[T] = None, dirs: bool = True) → None¶ Parse program arguments
Parameters: - args (List, optional) – Explicit list of arguments. Defaults to None.
- dirs (bool, optional) – Make experiment directories. Defaults to True.
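A minimal sketch of how a parser subclass with such a `parse` method might look. The class name `SketchParser` and the flags `--exp_dir` and `--epochs` are hypothetical and are not taken from the library. The documented return annotation is `None`; returning the namespace here is an assumption made for illustration.

```python
import argparse
import os
from typing import List, Optional


class SketchParser(argparse.ArgumentParser):
    """Hypothetical parser; both flags below are illustrative only."""

    def __init__(self) -> None:
        super().__init__(description="Sketch of an embedding training parser")
        self.add_argument("--exp_dir", type=str, default="./outputs")
        self.add_argument("--epochs", type=int, default=10)

    def parse(self, args: Optional[List[str]] = None, dirs: bool = True):
        # With args=None, argparse falls back to sys.argv[1:].
        parsed = self.parse_args(args)
        if dirs:
            # Create the experiment directory only when requested.
            os.makedirs(parsed.exp_dir, exist_ok=True)
        return parsed


# Usage: pass an explicit argument list and skip directory creation.
parsed = SketchParser().parse(["--epochs", "3"], dirs=False)
```

Passing an explicit list is convenient in notebooks and tests, where `sys.argv` does not hold the training arguments.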
-
trphysx.embedding.training.enn_data_handler¶
-
class
trphysx.embedding.training.enn_data_handler.
EmbeddingDataHandler
¶ Bases:
object
Base class for embedding data handlers. Data handlers are used to create the training and testing datasets.
-
mu
= None¶
-
std
= None¶
-
norm_params
¶ Get normalization parameters
Raises: ValueError – If normalization parameters have not been initialized
Returns: mean and standard deviation
Return type: (Tuple)
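The guarded access described above can be illustrated with a small property. This is a sketch under assumptions: `SketchHandler` is hypothetical, and plain floats stand in for whatever types the library stores in `mu` and `std`.

```python
from typing import Optional, Tuple


class SketchHandler:
    """Hypothetical handler mirroring the mu / std / norm_params attributes."""

    mu: Optional[float] = None
    std: Optional[float] = None

    @property
    def norm_params(self) -> Tuple[float, float]:
        # Refuse to return statistics that were never computed.
        if self.mu is None or self.std is None:
            raise ValueError("Normalization parameters have not been initialized")
        return self.mu, self.std


handler = SketchHandler()
try:
    handler.norm_params
    raised = False
except ValueError:
    raised = True

# Once the statistics are set, the property returns them as a tuple.
handler.mu, handler.std = 0.5, 2.0
params = handler.norm_params
```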
-
createTrainingLoader
(*args, **kwargs)¶
-
createTestingLoader
(*args, **kwargs)¶
-
-
class
trphysx.embedding.training.enn_data_handler.
LorenzDataHandler
¶ Bases:
trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler
Built-in embedding data handler for the Lorenz system
-
class
LorenzDataset
(examples: List[T])¶ Bases:
torch.utils.data.dataset.Dataset
Dataset for training Lorenz embedding model.
Parameters: examples (List) – list of training/testing examples
-
class
LorenzDataCollator
¶ Bases:
object
Data collator for the Lorenz embedding problem
-
createTrainingLoader
(file_path: str, block_size: int, stride: int = 1, ndata: int = -1, batch_size: int = 32, shuffle: bool = True) → torch.utils.data.dataloader.DataLoader¶ Creating training data loader for Lorenz system. For a single training simulation, the total time-series is sub-chunked into smaller blocks for training.
Parameters: - file_path (str) – Path to HDF5 file with training data
- block_size (int) – The length of time-series blocks
- stride (int) – Stride of each time-series block
- ndata (int, optional) – Number of training time-series. If negative, all of the provided data will be used. Defaults to -1.
- batch_size (int, optional) – Training batch size. Defaults to 32.
- shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to True.
Returns: Training loader
Return type: (DataLoader)
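The sub-chunking controlled by `block_size` and `stride` can be sketched in plain Python. This illustrates the general technique rather than the library's code; the helper `chunk_series` is hypothetical, and dropping trailing steps that do not fill a whole block is an assumption of this sketch.

```python
from typing import List, Sequence


def chunk_series(series: Sequence[float], block_size: int, stride: int = 1) -> List[List[float]]:
    """Split one time-series into blocks of length block_size.

    Consecutive blocks start `stride` steps apart; trailing steps that do
    not fill a whole block are dropped (an assumption of this sketch).
    """
    blocks = []
    for start in range(0, len(series) - block_size + 1, stride):
        blocks.append(list(series[start:start + block_size]))
    return blocks


series = list(range(10))  # time-steps 0..9
blocks = chunk_series(series, block_size=4, stride=3)
# blocks == [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

A stride smaller than `block_size` yields overlapping blocks and therefore more training examples from the same simulation.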
-
createTestingLoader
(file_path: str, block_size: int, ndata: int = -1, batch_size: int = 32, shuffle: bool = False) → torch.utils.data.dataloader.DataLoader¶ Creating testing/validation data loader for Lorenz system. For a data case with time-steps [0, T], this method extracts a shorter time-series [0, S], with S < T, to be used for testing.
Parameters: - file_path (str) – Path to HDF5 file with testing data
- block_size (int) – The length of testing time-series
- ndata (int, optional) – Number of testing time-series. If negative, all of the provided data will be used. Defaults to -1.
- batch_size (int, optional) – Testing batch size. Defaults to 32.
- shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to False.
Returns: Testing/validation data loader
Return type: (DataLoader)
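The truncation to [0, S] and the `ndata` convention can be sketched as below. The helper `select_test_series` is hypothetical. Taking the first `block_size` steps of each case follows the description above, while keeping the first `ndata` cases (rather than some other subset) is an assumption.

```python
from typing import List, Sequence


def select_test_series(all_series: Sequence[Sequence[float]], block_size: int, ndata: int = -1) -> List[List[float]]:
    """Keep the first block_size steps of each test case.

    A negative ndata keeps every case; otherwise only the first ndata
    cases are used (an assumption of this sketch).
    """
    if ndata >= 0:
        all_series = all_series[:ndata]
    return [list(series[:block_size]) for series in all_series]


cases = [list(range(8)), list(range(10, 18)), list(range(20, 28))]
subset = select_test_series(cases, block_size=5, ndata=2)
everything = select_test_series(cases, block_size=5)
```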
-
class
trphysx.embedding.training.enn_data_handler.
CylinderDataHandler
¶ Bases:
trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler
Built-in embedding data handler for the flow around a cylinder system
-
class
CylinderDataset
(examples: List[T], visc: List[T])¶ Bases:
torch.utils.data.dataset.Dataset
Dataset for training flow around a cylinder embedding model
Parameters: - examples (List) – list of training/testing example flow fields
- visc (List) – list of training/testing example viscosities
-
class
CylinderDataCollator
¶ Bases:
object
Data collator for flow around a cylinder embedding problem
-
createTrainingLoader
(file_path: str, block_size: int, stride: int = 1, ndata: int = -1, batch_size: int = 32, shuffle: bool = True) → torch.utils.data.dataloader.DataLoader¶ Creating training data loader for the flow around a cylinder system. For a single training simulation, the total time-series is sub-chunked into smaller blocks for training.
Parameters: - file_path (str) – Path to HDF5 file with training data
- block_size (int) – The length of time-series blocks
- stride (int) – Stride of each time-series block
- ndata (int, optional) – Number of training time-series. If negative, all of the provided data will be used. Defaults to -1.
- batch_size (int, optional) – Training batch size. Defaults to 32.
- shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to True.
Returns: Training loader
Return type: (DataLoader)
-
createTestingLoader
(file_path: str, block_size: int, ndata: int = -1, batch_size: int = 32, shuffle: bool = False) → torch.utils.data.dataloader.DataLoader¶ Creating testing/validation data loader for the flow around a cylinder system. For a data case with time-steps [0, T], this method extracts a shorter time-series [0, S], with S < T, to be used for testing.
Parameters: - file_path (str) – Path to HDF5 file with testing data
- block_size (int) – The length of testing time-series
- ndata (int, optional) – Number of testing time-series. If negative, all of the provided data will be used. Defaults to -1.
- batch_size (int, optional) – Testing batch size. Defaults to 32.
- shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to False.
Returns: Testing/validation data loader
Return type: (DataLoader)
-
class
trphysx.embedding.training.enn_data_handler.
GrayScottDataHandler
¶ Bases:
trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler
Built-in embedding data handler for the Gray-Scott system
-
class
GrayScottDataset
(h5_file: str, keys: List[T], indices: List[T], block_size: int = 1)¶ Bases:
torch.utils.data.dataset.Dataset
Dataset for the Gray-Scott system. Data is loaded from file dynamically for each mini-batch, since the entire data-set is too large to load at once. This data-set supports loading sub-chunked time-series.
Parameters: - h5_file (str) – Path to hdf5 file with raw data
- keys (List) – List of keys corresponding to each example
- indices (List) – List of start indices for each time-series block
- block_size (int, optional) – Time-series block size for each example. Defaults to 1.
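The lazy, per-example loading described above follows the usual map-style dataset pattern. The sketch below is not the library's class: `LazyBlockDataset` is hypothetical, and a plain dictionary stands in for an opened HDF5 file so that the example runs without extra dependencies.

```python
from typing import List, Mapping, Sequence


class LazyBlockDataset:
    """Map-style dataset that slices one time-series block per request.

    `store` stands in for an opened HDF5 file (a hypothetical mapping from
    key to time-series); no block is sliced until __getitem__ is called.
    """

    def __init__(self, store: Mapping[str, Sequence[float]], keys: List[str],
                 indices: List[int], block_size: int = 1) -> None:
        self.store = store
        self.keys = keys
        self.indices = indices
        self.block_size = block_size

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, i: int) -> List[float]:
        # Look up which example and which start index belong to item i.
        start = self.indices[i]
        return list(self.store[self.keys[i]][start:start + self.block_size])


store = {"sim0": list(range(100)), "sim1": list(range(100, 200))}
dataset = LazyBlockDataset(store, keys=["sim0", "sim0", "sim1"], indices=[0, 4, 2], block_size=3)
second_block = dataset[1]
third_block = dataset[2]
```

Keeping only keys and start indices in memory makes the dataset object small regardless of how large the underlying file is.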
-
class
GrayScottDataCollator
¶ Bases:
object
Data collator for the Gray-Scott embedding problem
-
createTrainingLoader
(file_path: str, block_size: int, stride: int = 1, ndata: int = -1, batch_size: int = 32, shuffle: bool = True, mpi_rank: int = -1, mpi_size: int = 1) → torch.utils.data.dataloader.DataLoader¶ Creating training data loader for the Gray-Scott system. For a single training simulation, the total time-series is sub-chunked into smaller blocks for training. This particular dataloader supports splitting the dataset between GPU processes for parallel training if needed.
Parameters: - file_path (str) – Path to HDF5 file with training data
- block_size (int) – The length of time-series blocks
- stride (int) – Stride of each time-series block
- ndata (int, optional) – Number of training time-series. If negative, all of the provided data will be used. Defaults to -1.
- batch_size (int, optional) – Training batch size. Defaults to 32.
- shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to True.
- mpi_rank (int, optional) – Rank of current MPI process. Defaults to -1.
- mpi_size (int, optional) – Number of training processes. Set to 1 for serial training. Defaults to 1.
Returns: Training loader
Return type: (DataLoader)
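One way to split examples between processes using `mpi_rank` and `mpi_size` is a round-robin shard, sketched below. The source does not state how the library partitions the data, so the strategy and the helper `shard_for_rank` are assumptions for illustration only.

```python
from typing import List, Sequence


def shard_for_rank(items: Sequence[int], mpi_rank: int = -1, mpi_size: int = 1) -> List[int]:
    """Return the subset of items handled by one process.

    Serial runs (mpi_size of 1 or a negative rank) keep everything;
    otherwise every mpi_size-th item starting at mpi_rank is kept.
    """
    if mpi_size <= 1 or mpi_rank < 0:
        return list(items)
    return list(items[mpi_rank::mpi_size])


block_ids = list(range(10))
serial = shard_for_rank(block_ids)
rank1 = shard_for_rank(block_ids, mpi_rank=1, mpi_size=4)
all_ranks = [shard_for_rank(block_ids, mpi_rank=r, mpi_size=4) for r in range(4)]
```

With this scheme, every item is assigned to exactly one rank and shard sizes differ by at most one.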
-
createTestingLoader
(file_path: str, block_size: int, ndata: int = -1, batch_size: int = 32, shuffle: bool = False) → torch.utils.data.dataloader.DataLoader¶ Creating testing/validation data loader for the Gray-Scott system. For a data case with time-steps [0, T], this method extracts a shorter time-series [0, S], with S < T, to be used for testing.
Parameters: - file_path (str) – Path to HDF5 file with testing data
- block_size (int) – The length of testing time-series
- ndata (int, optional) – Number of testing time-series. If negative, all of the provided data will be used. Defaults to -1.
- batch_size (int, optional) – Testing batch size. Defaults to 32.
- shuffle (bool, optional) – Turn on mini-batch shuffling in dataloader. Defaults to False.
Returns: Testing/validation data loader
Return type: (DataLoader)
-
class
trphysx.embedding.training.enn_data_handler.
AutoDataHandler
¶ Bases:
object
Helper class for initializing the different built-in data handlers for embedding training
-
classmethod
load_data_handler
(model_name: str, **kwargs) → trphysx.embedding.training.enn_data_handler.EmbeddingDataHandler¶ Gets built-in data handler. Currently supports: “lorenz”, “cylinder”, “grayscott”
Parameters: model_name (str) – Model name
Raises: KeyError – If model_name is not a supported model type
Returns: Embedding data handler
Return type: (EmbeddingDataHandler)
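A name-to-class lookup of this kind is commonly built on a dictionary registry. The sketch below shows that pattern under assumptions: the three placeholder classes and `HANDLER_REGISTRY` are hypothetical stand-ins, and only the three supported names and the `KeyError` behaviour come from the description above.

```python
from typing import Dict, Type


class SketchBaseHandler:
    """Hypothetical base class standing in for EmbeddingDataHandler."""


class SketchLorenzHandler(SketchBaseHandler):
    pass


class SketchCylinderHandler(SketchBaseHandler):
    pass


class SketchGrayScottHandler(SketchBaseHandler):
    pass


# Registry mapping each supported model name to its handler class.
HANDLER_REGISTRY: Dict[str, Type[SketchBaseHandler]] = {
    "lorenz": SketchLorenzHandler,
    "cylinder": SketchCylinderHandler,
    "grayscott": SketchGrayScottHandler,
}


def load_data_handler(model_name: str, **kwargs) -> SketchBaseHandler:
    """Instantiate the handler registered under model_name."""
    if model_name not in HANDLER_REGISTRY:
        raise KeyError(f"Unsupported model name: {model_name}")
    return HANDLER_REGISTRY[model_name](**kwargs)


handler = load_data_handler("lorenz")
try:
    load_data_handler("unknown")
    unknown_raised = False
except KeyError:
    unknown_raised = True
```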
trphysx.embedding.training.enn_trainer¶
-
trphysx.embedding.training.enn_trainer.
set_seed
(seed: int) → None¶ Set random seed
Parameters: seed (int) – random seed
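A seeding helper of this kind typically seeds every random number generator in use. The sketch below is an assumption about what such a function might cover, not the library's implementation. It always seeds Python's `random` module, and seeds NumPy and PyTorch only if they are installed.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the available random number generators for reproducibility."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch


# Re-seeding with the same value reproduces the same random draws.
set_seed(12345)
first = [random.random() for _ in range(3)]
set_seed(12345)
second = [random.random() for _ in range(3)]
```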
-
class
trphysx.embedding.training.enn_trainer.
EmbeddingTrainer
(model: trphysx.embedding.embedding_model.EmbeddingTrainingHead, args: argparse.ArgumentParser, optimizers: Tuple[torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler], viz: trphysx.viz.viz_model.Viz = None)¶ Bases:
object
Trainer for Koopman embedding model
Parameters: - model (EmbeddingTrainingHead) – Embedding training model
- args (TrainingArguments) – Training arguments
- optimizers (Tuple[Optimizer, Scheduler]) – Tuple of PyTorch optimizer and lr scheduler.
- viz (Viz, optional) – Visualization class. Defaults to None.
-
train
(training_loader: torch.utils.data.dataloader.DataLoader, eval_dataloader: torch.utils.data.dataloader.DataLoader) → None¶ Training loop for the embedding model
Parameters: - training_loader (DataLoader) – Training dataloader
- eval_dataloader (DataLoader) – Evaluation dataloader
-
evaluate
(eval_dataloader: torch.utils.data.dataloader.DataLoader, epoch: int = 0) → Dict[str, float]¶ Run evaluation, plot prediction and return metrics.
Parameters: - eval_dataloader (DataLoader) – Evaluation dataloader
- epoch (int, optional) – Current epoch, used for naming figures. Defaults to 0.
Returns: Dictionary of prediction metrics
Return type: Dict[str, float]
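The shape of the returned metrics can be illustrated with a plain-Python evaluation loop. This is a sketch under assumptions: the function below is a hypothetical stand-in that takes pre-computed (prediction, target) batches, the metric keys `epoch` and `mse` are illustrative, and plotting is omitted.

```python
from typing import Dict, Iterable, List, Tuple


def evaluate(batches: Iterable[Tuple[List[float], List[float]]], epoch: int = 0) -> Dict[str, float]:
    """Average the squared error over all batches and report it as a metric."""
    total_sq_err = 0.0
    count = 0
    for predictions, targets in batches:
        for p, t in zip(predictions, targets):
            total_sq_err += (p - t) ** 2
            count += 1
    # Guard against an empty loader to avoid dividing by zero.
    return {"epoch": float(epoch), "mse": total_sq_err / max(count, 1)}


batches = [([1.0, 2.0], [1.0, 4.0]), ([0.0], [1.0])]
metrics = evaluate(batches, epoch=2)
```

Returning a flat dictionary of floats keeps the result easy to log or to compare across epochs.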