trphysx.data_utils

trphysx.data_utils.data_utils

class trphysx.data_utils.data_utils.DataCollator

Bases: object

Data collator used for training datasets. Combines examples in a minibatch into one tensor.

Parameters:
  • examples (List[Dict[str, Tensor]]) – List of training examples. An example should be a dictionary of tensors from the dataset.
Returns:Minibatch dictionary of combined example data tensors
Return type:Dict[str, Tensor]
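
The collation step described above can be illustrated with a minimal sketch. It is not the library's implementation: nested Python lists stand in for tensors so the example runs without PyTorch, and the function name `collate_examples` and the keys `inputs_embeds` and `labels_embeds` are hypothetical. With real tensors, the per-key gathering would typically be done with `torch.stack`.

```python
from typing import Dict, List

# Assumption: nested lists stand in for tensors so the sketch has no dependencies.
Example = Dict[str, List[float]]


def collate_examples(examples: List[Example]) -> Dict[str, List[List[float]]]:
    """Combine per-example dictionaries into one minibatch dictionary.

    Every key of the first example becomes a key of the batch, and the
    values are gathered along a new leading (batch) dimension.
    """
    if not examples:
        raise ValueError("Cannot collate an empty list of examples")
    batch = {}
    for key in examples[0]:
        batch[key] = [example[key] for example in examples]
    return batch


# Hypothetical keys; the real dataset may name its entries differently.
examples = [
    {"inputs_embeds": [0.1, 0.2], "labels_embeds": [0.2, 0.3]},
    {"inputs_embeds": [0.4, 0.5], "labels_embeds": [0.5, 0.6]},
]
batch = collate_examples(examples)
print(len(batch["inputs_embeds"]))  # number of examples in the minibatch
```

Each value in `batch` has the batch dimension first, which is the layout a training loop usually expects.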

trphysx.data_utils.dataset_auto

class trphysx.data_utils.dataset_auto.AutoDataset

Bases: object

Helper class for creating training data-sets for different numerical examples

Raises:EnvironmentError – If direct initialization of this class is attempted.
classmethod create_dataset(dataset_name: str, *args, **kwargs) → trphysx.data_utils.dataset_phys.PhysicalDataset

Creates a data-set for testing or validation. Currently supports: “lorenz”, “cylinder”, “grayscott”.

Parameters:dataset_name (str) – Keyword/name of the data-set needed
Raises:KeyError – If dataset_name is not a supported data-set name
Returns:Initialized data-set
Return type:(PhysicalDataset)
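
The behaviour documented for AutoDataset (no direct initialization, a keyword lookup, and a KeyError for unknown names) matches a common factory pattern. The sketch below shows that pattern in isolation. `ToyAutoDataset` and `ToyDataset` are hypothetical stand-ins, not the library's classes, and the registry contents are assumed from the supported names listed above.

```python
class ToyDataset:
    """Hypothetical stand-in for a PhysicalDataset subclass."""

    def __init__(self, name: str, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs


class ToyAutoDataset:
    """Factory that is only used through its classmethod."""

    # Registry keyed by the data-set names listed in the documentation.
    # A real registry would map each name to a different class.
    _registry = {
        "lorenz": ToyDataset,
        "cylinder": ToyDataset,
        "grayscott": ToyDataset,
    }

    def __init__(self):
        # Direct initialization is rejected, mirroring the documented behaviour.
        raise EnvironmentError(
            "Use ToyAutoDataset.create_dataset(dataset_name, ...) instead"
        )

    @classmethod
    def create_dataset(cls, dataset_name: str, *args, **kwargs) -> ToyDataset:
        if dataset_name not in cls._registry:
            raise KeyError(f"Unsupported data-set name: {dataset_name!r}")
        # Remaining arguments are forwarded unchanged to the dataset class.
        return cls._registry[dataset_name](dataset_name, *args, **kwargs)


dataset = ToyAutoDataset.create_dataset("lorenz", block_size=64)
print(dataset.name, dataset.kwargs)
```

Forwarding `*args` and `**kwargs` keeps the factory independent of each dataset's constructor signature.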

trphysx.data_utils.dataset_cylinder

class trphysx.data_utils.dataset_cylinder.CylinderDataset(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)

Bases: trphysx.data_utils.dataset_phys.PhysicalDataset

Dataset for 2D flow around a cylinder numerical example

embed_data(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel) → None

Embeds cylinder flow data into a 1D vector representation for the transformer.

Parameters:
  • h5_file (h5py.File) – HDF5 file object of raw data
  • embedder (EmbeddingModel) – Embedding neural network

trphysx.data_utils.dataset_grayscott

class trphysx.data_utils.dataset_grayscott.GrayscottDataset(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)

Bases: trphysx.data_utils.dataset_phys.PhysicalDataset

Dataset class for the Gray-Scott numerical example.

embed_data(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel)

Embeds Gray-Scott data into a 1D vector representation for the transformer.

TODO: Clean up and remove custom positions

Parameters:
  • h5_file (h5py.File) – HDF5 file object of raw data
  • embedder (EmbeddingModel) – Embedding neural network
class trphysx.data_utils.dataset_grayscott.GrayscottPredictDataset(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, neval: int = 16, overwrite_cache: bool = False, cache_path: str = None)

Bases: trphysx.data_utils.dataset_grayscott.GrayscottDataset

Prediction data-set for the Gray-Scott numerical example. Used during testing/validation, since this data-set stores the embedding model and target states.

TODO: Remove this and have an overloaded trainer class for gray-scott

Parameters:
  • embedder (trphysx.embedding.embedding_model.EmbeddingModel) – Embedding neural network
  • file_path (str) – Path to hdf5 raw data file
  • block_size (int) – Length of time-series blocks for training
  • stride (int, optional) – Stride interval to sample blocks from the raw time-series. Defaults to 1.
  • neval (int, optional) – Number of time-series from the HDF5 file to use for testing. Defaults to 16.
  • overwrite_cache (bool, optional) – Overwrite cache file if it exists, i.e. embed the raw data from file. Defaults to False.
  • cache_path (str, optional) – Path to save the cached embeddings at. Defaults to None.
recover(x0, mb_size: int = 96)

Recovers the physical state variables from an embedded vector

Parameters:
  • x0 (torch.Tensor) – [B, config.n_embd] Time-series of embedded vectors
  • mb_size (int, optional) – Mini-batch size for recovering the state variables
Returns:

[B, 2, H, W, D] physical state variable tensor

Return type:

(torch.Tensor)
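
The mb_size argument suggests that the embedded vectors are decoded in chunks rather than all at once, which bounds peak memory. The sketch below shows that chunking under stated assumptions: nested lists stand in for tensors, and `recover_in_minibatches` and `toy_decode` are hypothetical names, with `toy_decode` replacing the embedding model's recovery network.

```python
from typing import Callable, List, Sequence

Rows = Sequence[List[float]]


def recover_in_minibatches(
    embedded: Rows,
    decode: Callable[[Rows], List[List[float]]],
    mb_size: int = 96,
) -> List[List[float]]:
    """Apply ``decode`` to ``embedded`` in chunks of at most ``mb_size`` rows."""
    if mb_size < 1:
        raise ValueError("mb_size must be a positive integer")
    recovered: List[List[float]] = []
    for start in range(0, len(embedded), mb_size):
        chunk = embedded[start:start + mb_size]
        recovered.extend(decode(chunk))  # concatenate along the batch axis
    return recovered


def toy_decode(chunk: Rows) -> List[List[float]]:
    # Hypothetical decoder: doubles every entry of every embedded vector.
    return [[2.0 * value for value in row] for row in chunk]


embedded = [[float(i), float(i) + 0.5] for i in range(5)]
states = recover_in_minibatches(embedded, toy_decode, mb_size=2)
print(len(states))  # one recovered state per embedded vector
```

Because the decoder acts on each row independently, the chunked result equals decoding the whole batch in a single call.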

trphysx.data_utils.dataset_lorenz

class trphysx.data_utils.dataset_lorenz.LorenzDataset(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)

Bases: trphysx.data_utils.dataset_phys.PhysicalDataset

Dataset for the Lorenz numerical example

embed_data(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel) → None

Embeds Lorenz data into a 1D vector representation for the transformer.

Parameters:
  • h5_file (h5py.File) – HDF5 file object of raw data
  • embedder (EmbeddingModel) – Embedding neural network

trphysx.data_utils.dataset_phys

class trphysx.data_utils.dataset_phys.PhysicalDataset(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)

Bases: torch.utils.data.dataset.Dataset

Parent class for training and evaluation datasets for physical transformers. The caching of the dataset is based on the Hugging Face implementation.

Parameters:
  • embedder (EmbeddingModel) – Embedding neural network
  • file_path (str) – Path to hdf5 raw data file
  • block_size (int) – Length of time-series blocks for training
  • stride (int, optional) – Stride interval to sample blocks from the raw time-series. Defaults to 1.
  • ndata (int, optional) – Number of time-series from the HDF5 file to block. Will use all if negative. Defaults to -1.
  • eval (bool, optional) – Whether this is an evaluation data-set, which provides target states. Defaults to False.
  • overwrite_cache (bool, optional) – Overwrite cache file if it exists, i.e. embed the raw data from file. Defaults to False.
  • cache_path (str, optional) – Path to save the cached embeddings at. Defaults to None.
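
One plausible reading of block_size, stride and ndata is a sliding window over each time-series. The sketch below illustrates that reading with plain lists. The function name `block_dataset` is hypothetical, and whether the library drops a trailing partial window is an assumption.

```python
from typing import List, Sequence


def block_dataset(
    all_series: Sequence[Sequence[float]],
    block_size: int,
    stride: int = 1,
    ndata: int = -1,
) -> List[List[float]]:
    """Cut each time-series into full windows of length ``block_size``."""
    if block_size < 1 or stride < 1:
        raise ValueError("block_size and stride must be positive integers")
    # A negative ndata means "use every time-series".
    selected = all_series if ndata < 0 else all_series[:ndata]
    blocks: List[List[float]] = []
    for series in selected:
        # Assumption: a trailing window shorter than block_size is dropped.
        for start in range(0, len(series) - block_size + 1, stride):
            blocks.append(list(series[start:start + block_size]))
    return blocks


series_a = list(range(10))
series_b = list(range(100, 110))
blocks = block_dataset([series_a, series_b], block_size=4, stride=3, ndata=1)
print(blocks)
```

With these inputs only `series_a` is used, and windows start at indices 0, 3 and 6. A smaller stride yields more, heavily overlapping blocks from the same data.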
read_cache(cached_features_file: str) → None

Default method to read cache file into object.

Parameters:cached_features_file (str) – Cache file path
write_cache(cached_features_file: str) → None

Default method to write cache file.

Parameters:cached_features_file (str) – Cache file path
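
The interplay between read_cache, write_cache and overwrite_cache can be sketched as follows. This is an illustration only: the use of `pickle`, the class name `CachedFeatures`, and the squaring "embedding" are assumptions, since the documentation says only that caching follows the Hugging Face implementation.

```python
import os
import pickle
import tempfile


class CachedFeatures:
    """Hypothetical container that caches an expensive embedding step."""

    def __init__(self, raw, cache_file: str, overwrite_cache: bool = False):
        self.embed_calls = 0
        if os.path.exists(cache_file) and not overwrite_cache:
            # Reuse the embeddings computed by an earlier run.
            self.read_cache(cache_file)
        else:
            self.examples = self.embed(raw)
            self.write_cache(cache_file)

    def embed(self, raw):
        # Stand-in for the embedding network: squares each value.
        self.embed_calls += 1
        return [value * value for value in raw]

    def read_cache(self, cached_features_file: str) -> None:
        with open(cached_features_file, "rb") as handle:
            self.examples = pickle.load(handle)

    def write_cache(self, cached_features_file: str) -> None:
        with open(cached_features_file, "wb") as handle:
            pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "cached_features.pkl")
    first = CachedFeatures([1, 2, 3], cache_file)   # embeds, then writes the cache
    second = CachedFeatures([1, 2, 3], cache_file)  # reads the cache instead
    third = CachedFeatures([1, 2, 3], cache_file, overwrite_cache=True)  # embeds again
```

Only the first and third constructions run the embedding step; the second loads the stored result.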
embed_data(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel)

Embeds raw physical data into a 1D vector representation for the transformer. This is problem specific and thus must be overridden.

Parameters:
  • h5_file (h5py.File) – HDF5 file object to read raw data from
  • embedder (EmbeddingModel) – Embedding neural network
Raises:

NotImplementedError – If function has not been overridden by a child dataset class.
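
The override requirement can be sketched with a small base and child class. The names `BaseDataset`, `ToyLorenzDataset` and `toy_embedder` are hypothetical, and a plain list of states stands in for the HDF5 file object.

```python
class BaseDataset:
    """Parent class whose embedding step is problem specific."""

    def embed_data(self, raw, embedder):
        raise NotImplementedError(
            "embed_data must be overridden by a child dataset class"
        )


class ToyLorenzDataset(BaseDataset):
    """Child class that supplies its own embedding step."""

    def embed_data(self, raw, embedder):
        # Embed each raw state and store the results on the dataset.
        self.examples = [embedder(state) for state in raw]


def toy_embedder(state):
    # Hypothetical embedding: collapse a state to a one-element vector.
    return [sum(state)]


child = ToyLorenzDataset()
child.embed_data([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], toy_embedder)
print(child.examples)
```

Calling `embed_data` on `BaseDataset` itself raises NotImplementedError, which is how a missing override surfaces at run time.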