trphysx.data_utils¶
trphysx.data_utils.data_utils¶
-
class
trphysx.data_utils.data_utils.
DataCollator
¶ Bases:
object
Data collator used for training datasets. Combines examples in a minibatch into one tensor.
Parameters: examples (List[Dict[str, Tensor]]) – List of training examples. An example should be a dictionary of tensors from the dataset.
Returns: Minibatch dictionary of combined example data tensors
Return type: (Dict[str, Tensor])
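The collation step described above can be illustrated with a minimal sketch. It is not the library's implementation: plain Python lists stand in for tensors so that the example runs without PyTorch, the function name `collate` is hypothetical, and the key names `inputs_embeds` and `labels_embeds` are assumptions chosen for illustration. With real tensors, the per-key stacking would typically be done with `torch.stack`.

```python
from typing import Dict, List

Row = List[float]  # stand-in for a 1D tensor (assumption for this sketch)


def collate(examples: List[Dict[str, Row]]) -> Dict[str, List[Row]]:
    """Combine a list of example dictionaries into one batch dictionary."""
    if not examples:
        raise ValueError("cannot collate an empty list of examples")
    batch = {}
    for key in examples[0].keys():
        # Gather the entry for this key from every example; the new
        # outer list plays the role of the batch dimension.
        batch[key] = [example[key] for example in examples]
    return batch


# Hypothetical key names, used only for illustration.
examples = [
    {"inputs_embeds": [0.1, 0.2], "labels_embeds": [0.2, 0.3]},
    {"inputs_embeds": [0.4, 0.5], "labels_embeds": [0.5, 0.6]},
]
batch = collate(examples)
```

Each key in `batch` now holds one entry per example, which mirrors the "dictionary of combined example data tensors" in the return description.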
trphysx.data_utils.dataset_auto¶
-
class
trphysx.data_utils.dataset_auto.
AutoDataset
¶ Bases:
object
Helper class for creating training data-sets for different numerical examples
Raises: EnvironmentError – If direct initialization of this class is attempted.
-
classmethod
create_dataset
(dataset_name: str, *args, **kwargs) → trphysx.data_utils.dataset_phys.PhysicalDataset¶ Creates a data-set for testing or validation. Currently supports: “lorenz”, “cylinder”, “grayscott”.
Parameters: dataset_name (str) – Keyword/name of the data-set needed
Raises: KeyError – If dataset_name is not a supported model type
Returns: Initialized data-set
Return type: (PhysicalDataset)
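The factory behaviour documented above (no direct initialization, keyword lookup, KeyError for unknown names) can be sketched as follows. This is an illustrative pattern only, not the trphysx source: the classes `SketchAutoDataset`, `LorenzSketch`, `CylinderSketch` and `GrayscottSketch` are hypothetical stand-ins, and the registry dictionary is an assumption about how such a lookup might be organized.

```python
class _SketchBase:
    """Hypothetical stand-in for a dataset class; it only stores its arguments."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


class LorenzSketch(_SketchBase):
    pass


class CylinderSketch(_SketchBase):
    pass


class GrayscottSketch(_SketchBase):
    pass


class SketchAutoDataset:
    # Assumed registry mapping supported keywords to dataset classes.
    _REGISTRY = {
        "lorenz": LorenzSketch,
        "cylinder": CylinderSketch,
        "grayscott": GrayscottSketch,
    }

    def __init__(self):
        # Direct construction is refused; callers must use the class method.
        raise EnvironmentError(
            "SketchAutoDataset must be used through create_dataset(...)"
        )

    @classmethod
    def create_dataset(cls, dataset_name, *args, **kwargs):
        if dataset_name not in cls._REGISTRY:
            raise KeyError(f"unsupported data-set name: {dataset_name!r}")
        # Remaining arguments are forwarded to the selected dataset class.
        return cls._REGISTRY[dataset_name](*args, **kwargs)


dataset = SketchAutoDataset.create_dataset("lorenz", block_size=64)
```

In this sketch the positional and keyword arguments after the name are passed through unchanged, which matches the `*args, **kwargs` signature shown above.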
trphysx.data_utils.dataset_cylinder¶
-
class
trphysx.data_utils.dataset_cylinder.
CylinderDataset
(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)¶ Bases:
trphysx.data_utils.dataset_phys.PhysicalDataset
Dataset for 2D flow around a cylinder numerical example
-
embed_data
(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel) → None¶ Embeds cylinder flow data into a 1D vector representation for the transformer.
Parameters: - h5_file (h5py.File) – HDF5 file object of raw data
- embedder (EmbeddingModel) – Embedding neural network
-
trphysx.data_utils.dataset_grayscott¶
-
class
trphysx.data_utils.dataset_grayscott.
GrayscottDataset
(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)¶ Bases:
trphysx.data_utils.dataset_phys.PhysicalDataset
Dataset class for the Gray-Scott numerical example.
-
embed_data
(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel)¶ Embeds Gray-Scott data into a 1D vector representation for the transformer.
TODO: Clean up and remove custom positions
Parameters: - h5_file (h5py.File) – HDF5 file object of raw data
- embedder (EmbeddingModel) – Embedding neural network
-
-
class
trphysx.data_utils.dataset_grayscott.
GrayscottPredictDataset
(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, neval: int = 16, overwrite_cache: bool = False, cache_path: str = None)¶ Bases:
trphysx.data_utils.dataset_grayscott.GrayscottDataset
Prediction data-set for the Gray-Scott numerical example. Used during testing/validation since this data-set will store the embedding model and target states.
TODO: Remove this and have an overloaded trainer class for gray-scott
Parameters: - embedder (trphysx.embedding.embedding_model.EmbeddingModel) – Embedding neural network
- file_path (str) – Path to HDF5 raw data file
- block_size (int) – Length of time-series blocks for training
- stride (int, optional) – Stride interval to sample blocks from the raw time-series. Defaults to 1.
- neval (int, optional) – Number of time-series from the HDF5 file to use for testing. Defaults to 16.
- overwrite_cache (bool, optional) – Overwrite cache file if it exists, i.e. embed the raw data from file. Defaults to False.
- cache_path (str, optional) – Path to save the cached embeddings at. Defaults to None.
-
recover
(x0, mb_size: int = 96)¶ Recovers the physical state variables from an embedded vector
Parameters: - x0 (torch.Tensor) – [B, config.n_embd] Time-series of embedded vectors
- mb_size (int, optional) – Mini-batch size for recovering the state variables
Returns: [B, 2, H, W, D] physical state variable tensor
Return type: (torch.Tensor)
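The role of mb_size in recover can be illustrated with a small sketch: the embedded vectors are processed in chunks of at most mb_size and the recovered states are concatenated in order. This is an assumption about the general pattern, not the library's code. The names `recover_in_minibatches` and `fake_recover` are hypothetical, and nested lists stand in for tensors so the example runs without PyTorch.

```python
from typing import Callable, List

Vector = List[float]  # stand-in for one embedded vector


def recover_in_minibatches(
    embedded: List[Vector],
    recover_fn: Callable[[List[Vector]], List[Vector]],
    mb_size: int = 96,
) -> List[Vector]:
    """Apply recover_fn to chunks of at most mb_size vectors and join the results."""
    if mb_size < 1:
        raise ValueError("mb_size must be a positive integer")
    states = []
    for start in range(0, len(embedded), mb_size):
        chunk = embedded[start:start + mb_size]
        states.extend(recover_fn(chunk))
    return states


chunk_sizes = []


def fake_recover(chunk):
    # Hypothetical recovery step: records the chunk size, doubles each value.
    chunk_sizes.append(len(chunk))
    return [[2.0 * value for value in vector] for vector in chunk]


embedded = [[float(i)] for i in range(10)]
states = recover_in_minibatches(embedded, fake_recover, mb_size=4)
```

Processing in chunks bounds the peak memory used by the recovery step, which is the usual motivation for a mini-batch size argument when the recovered states are much larger than the embedded vectors.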
trphysx.data_utils.dataset_lorenz¶
-
class
trphysx.data_utils.dataset_lorenz.
LorenzDataset
(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)¶ Bases:
trphysx.data_utils.dataset_phys.PhysicalDataset
Dataset for the Lorenz numerical example
-
embed_data
(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel) → None¶ Embeds Lorenz data into a 1D vector representation for the transformer.
Parameters: - h5_file (h5py.File) – HDF5 file object of raw data
- embedder (EmbeddingModel) – Embedding neural network
-
trphysx.data_utils.dataset_phys¶
-
class
trphysx.data_utils.dataset_phys.
PhysicalDataset
(embedder: trphysx.embedding.embedding_model.EmbeddingModel, file_path: str, block_size: int, stride: int = 1, ndata: int = -1, eval: bool = False, overwrite_cache: bool = False, cache_path: str = None, **kwargs)¶ Bases:
torch.utils.data.dataset.Dataset
Parent class for training and evaluation datasets for physical transformers. The caching of the dataset is based on the Hugging Face implementation.
Parameters: - embedder (EmbeddingModel) – Embedding neural network
- file_path (str) – Path to HDF5 raw data file
- block_size (int) – Length of time-series blocks for training
- stride (int, optional) – Stride interval to sample blocks from the raw time-series. Defaults to 1.
- ndata (int, optional) – Number of time-series from the HDF5 file to block. Will use all if negative. Defaults to -1.
- eval (bool, optional) – If this is an evaluation data-set, which will provide target states. Defaults to False.
- overwrite_cache (bool, optional) – Overwrite cache file if it exists, i.e. embed the raw data from file. Defaults to False.
- cache_path (str, optional) – Path to save the cached embeddings at. Defaults to None.
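How block_size, stride and ndata interact can be sketched with plain lists. This is one plausible reading of the parameter descriptions above, not the library's implementation: the helper names `sample_blocks` and `block_dataset` are hypothetical, and the assumption is that blocks are contiguous windows whose start indices advance by stride.

```python
from typing import List, Sequence


def sample_blocks(series: Sequence, block_size: int, stride: int = 1) -> List[list]:
    """Cut one time-series into windows of length block_size, stepping by stride."""
    last_start = len(series) - block_size
    return [
        list(series[start:start + block_size])
        for start in range(0, last_start + 1, stride)
    ]


def block_dataset(
    all_series: Sequence[Sequence],
    block_size: int,
    stride: int = 1,
    ndata: int = -1,
) -> List[list]:
    """Block the first ndata time-series, or all of them if ndata is negative."""
    selected = all_series if ndata < 0 else all_series[:ndata]
    blocks = []
    for series in selected:
        blocks.extend(sample_blocks(series, block_size, stride))
    return blocks


series = list(range(6))  # one toy time-series with six steps
blocks = sample_blocks(series, block_size=4, stride=2)
all_blocks = block_dataset([series, series], block_size=4, stride=2)
one_series_blocks = block_dataset([series, series], block_size=4, stride=2, ndata=1)
```

In this sketch a smaller stride yields more, heavily overlapping blocks, while a stride equal to block_size yields non-overlapping ones. A series shorter than block_size contributes no blocks.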
-
read_cache
(cached_features_file: str) → None¶ Default method to read cache file into object.
Parameters: cached_features_file (str) – Cache file path
-
write_cache
(cached_features_file: str) → None¶ Default method to write cache file.
Parameters: cached_features_file (str) – Cache file path
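The interplay of read_cache, write_cache and overwrite_cache can be sketched as a load-or-embed routine. This is a hedged illustration rather than the library's code: the function `load_or_embed`, the use of `pickle` as the cache format, and the file name are all assumptions made for this example.

```python
import os
import pickle
import tempfile


def load_or_embed(cache_file, embed_fn, overwrite_cache=False):
    """Return cached examples if available, otherwise embed and cache them."""
    if os.path.exists(cache_file) and not overwrite_cache:
        # Cache hit: skip the (potentially expensive) embedding step.
        with open(cache_file, "rb") as handle:
            return pickle.load(handle)
    examples = embed_fn()
    with open(cache_file, "wb") as handle:
        pickle.dump(examples, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return examples


calls = []


def fake_embed():
    # Hypothetical embedding step; records each call so cache hits are visible.
    calls.append(1)
    return [[0.0, 1.0], [1.0, 2.0]]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "cached_sketch.pkl")
    first = load_or_embed(cache_file, fake_embed)    # embeds and writes cache
    second = load_or_embed(cache_file, fake_embed)   # reads cache
    third = load_or_embed(cache_file, fake_embed, overwrite_cache=True)  # re-embeds
```

Only the first and third calls run the embedding step; the second is served from the cache file.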
-
embed_data
(h5_file: h5py._hl.files.File, embedder: trphysx.embedding.embedding_model.EmbeddingModel)¶ Embeds raw physical data into a 1D vector representation for the transformer. This is problem specific and thus must be overridden.
Parameters: - h5_file (h5py.File) – HDF5 file object to read raw data from
- embedder (EmbeddingModel) – Embedding neural network
Raises: NotImplementedError – If function has not been overridden by a child dataset class.
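The override requirement can be sketched with a small base/child pair. The class names and the toy embedding below are hypothetical, and plain lists replace the HDF5 file and embedding network so the example runs on its own.

```python
class BaseSketchDataset:
    """Hypothetical parent class whose embed_data must be overridden."""

    def embed_data(self, raw_data, embedder):
        raise NotImplementedError("embed_data must be overridden by a child class")


class ScaledSketchDataset(BaseSketchDataset):
    """Hypothetical child class providing a problem-specific embedding."""

    def embed_data(self, raw_data, embedder):
        # Apply the embedder to every state of every time-series.
        self.examples = [[embedder(state) for state in series] for series in raw_data]


def toy_embedder(state):
    # Stand-in for an embedding network: maps a scalar state to a 1D vector.
    return [state, 2.0 * state]


child = ScaledSketchDataset()
child.embed_data([[1.0, 2.0]], toy_embedder)

try:
    BaseSketchDataset().embed_data([[1.0]], toy_embedder)
    base_raised = False
except NotImplementedError:
    base_raised = True
```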