trphysx.transformer

trphysx.transformer.attention

class trphysx.transformer.attention.MaskedAttention(nx: int, n_ctx: int, config: trphysx.config.configuration_phys.PhysConfig, scale: bool = False, mask: str = 'tril')

Bases: torch.nn.modules.module.Module

Masked self-attention module based on the Hugging Face implementation: https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py

Parameters:
  • nx (int) – Dimensionality of feature vector
  • n_ctx (int) – Context length of the attention (TODO: Not needed with config object?)
  • config (PhysConfig) – Transformer config object
  • scale (bool, optional) – Scale the attention scores. Defaults to False.
  • mask (str, optional) – Attention mask type. Defaults to ‘tril’.
Raises:

ValueError – Invalid mask type

merge_heads(x: torch.Tensor) → torch.Tensor

Merge attention heads

Parameters:x (Tensor) – [batch, head, seq_length, head_features] Input tensor
Returns:[batch, seq_length, head * head_features] Concatenated output tensor
Return type:Tensor
split_heads(x, k: bool = False) → torch.Tensor

Splits key, query or value tensor into separate heads. The dimensionality of the output depends on whether the tensor is a key.

Parameters:
  • x (Tensor) – [batch, seq_length, nx] Input tensor
  • k (bool, optional) – Whether the input tensor is a key tensor. Defaults to False.
Returns:

[batch, head, seq_length, head_features] Split features for query and value tensors; [batch, head, head_features, seq_length] split features for key tensors

Return type:

Tensor

forward(x: torch.Tensor, layer_past: List[torch.Tensor] = None, attention_mask: torch.Tensor = None, head_mask: torch.Tensor = None, use_cache: bool = False, output_attentions: bool = False) → List[torch.Tensor]

Masked attention forward pass

Parameters:
  • x (Tensor) – [batch, seq_length, nx] Input feature.
  • layer_past (Tensor, optional) – [2, batch, n_head, seq_length, nx] Precomputed self-attention vectors. Defaults to None.
  • attention_mask (Tensor, optional) – Optional defined attention mask. Applied before soft mask. Defaults to None.
  • head_mask (Tensor, optional) – Optional attention value mask. Applied after softmax. Defaults to None.
  • use_cache (bool, optional) – Return calculated key values for faster generation. Defaults to False.
  • output_attentions (bool, optional) – Return attention matrix. Defaults to False.
Returns:

Output consisting of output feature, key values (if requested), attention tensor (if requested)

Return type:

List[Tensor]
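Conceptually, the 'tril' masked scaled dot-product attention performed by this module follows the computation sketched below in plain PyTorch. This is an illustrative sketch, not the library code; the projection w_qkv and all tensor names are assumptions for the example.

    import math
    import torch

    def masked_attention_sketch(x, w_qkv, n_head, scale=True):
        # x: [batch, seq_length, nx]; w_qkv projects features to query, key and value
        B, T, nx = x.shape
        q, k, v = w_qkv(x).split(nx, dim=2)
        # split_heads: [batch, seq_length, nx] -> [batch, head, seq_length, head_features]
        def split(t):
            return t.view(B, T, n_head, nx // n_head).permute(0, 2, 1, 3)
        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-1, -2)                    # [batch, head, T, T]
        if scale:
            scores = scores / math.sqrt(v.size(-1))         # scaled attention scores
        tril = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~tril, float("-inf"))   # causal ('tril') mask
        attn = torch.softmax(scores, dim=-1)
        out = attn @ v                                      # [batch, head, T, head_features]
        # merge_heads: back to [batch, seq_length, head * head_features]
        return out.permute(0, 2, 1, 3).contiguous().view(B, T, nx)

    w_qkv = torch.nn.Linear(32, 96)                         # hypothetical QKV projection
    y = masked_attention_sketch(torch.randn(2, 10, 32), w_qkv, n_head=4)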

trphysx.transformer.generate_utils

class trphysx.transformer.generate_utils.GenerationMixin

Bases: object

Class containing generative functions for transformers

prepare_inputs_for_generation(inputs_embeds: torch.Tensor, position_ids: torch.Tensor = None, prop_embeds: torch.Tensor = None, **kwargs) → Dict[str, torch.Tensor]

Prepares input features for prediction

Parameters:
  • inputs_features (Dict[str, Tensor]) – Input feature tensors that are being generated.
Returns:

Dictionary of model inputs

Return type:

Dict[str, Tensor]

generate(inputs_embeds: torch.Tensor, position_ids: torch.Tensor = None, prop_embeds: torch.Tensor = None, max_length: int = None, attention_mask: torch.LongTensor = None, use_cache: bool = False, **model_specific_kwargs) → Tuple[torch.Tensor]

Generates a predicted sequence of features.

Parameters:
  • inputs_embeds (Tensor) – [batch, seq, n_embed] Input feature tensor
  • position_ids (Tensor, optional) – [seq, n_embed] Position tensor. Defaults to None.
  • prop_embeds (Tensor, optional) – [batch, seq, n_embed] Property tensor. Defaults to None.
  • max_length (int, optional) – Length of time series to predict. Defaults to None.
  • attention_mask (LongTensor, optional) – Manual attention mask. Defaults to None.
  • use_cache (bool, optional) – Cache past transformer states for faster generation. Defaults to False.
Returns:

[batch, max_length, n_embed] Predicted feature tensor, additional optional transformer outputs.

Return type:

Tuple[Tensor]
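A minimal usage sketch of generate for an autoregressive rollout, based on the signature documented above. The config object and the seed embedding tensor are placeholders.

    import torch

    # config = ...  # an initialized PhysConfig (placeholder)
    model = PhysformerGPT2(config)
    model.eval()

    # [batch, seq, n_embed] seed features, e.g. the embedded initial state (n_embed = 32 here)
    inputs_embeds = torch.randn(4, 1, 32)

    with torch.no_grad():
        outputs = model.generate(inputs_embeds=inputs_embeds, max_length=64, use_cache=True)
    pred_embeds = outputs[0]   # [batch, max_length, n_embed] predicted feature sequence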

trphysx.transformer.phys_transformer_base

class trphysx.transformer.phys_transformer_base.PhysformerBase(config, *inputs, **kwargs)

Bases: torch.nn.modules.module.Module

Parent class for physical transformers

model_name = 'transformer_model'
forward()

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

generate()
get_input_embeddings()
set_input_embeddings(new_embeddings)
tie_weights()

Tie the weights between the input embeddings and the output embeddings. If the torchscript flag is set in the configuration, parameter sharing is not supported, so the weights are cloned instead.

save_model(save_directory: str, epoch: int = 0) → None

Saves transformer model to the specified directory.

Parameters:
  • save_directory (str) – Folder to save file at
  • epoch (int, optional) – Epoch number to name model file. Defaults to 0.
Raises:

AssertionError – If provided directory is not valid.

load_model(file_or_path_directory: str, epoch: int = 0) → None

Load a transformer model from the specified file or path

Parameters:
  • file_or_path_directory (str) – File or folder path to load state dictionary from.
  • epoch (int, optional) – Epoch of current model for file name, used if folder path is provided. Defaults to 0.
Raises:

FileNotFoundError – If provided file or directory could not be found.
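Usage sketch of the checkpoint helpers with the signatures documented above; the model instance, path and epoch number are illustrative.

    # Save a checkpoint for the current epoch into a folder
    model.save_model("./checkpoints", epoch=100)

    # Later, restore the weights from the same folder (epoch selects the file name)
    model.load_model("./checkpoints", epoch=100)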

trphysx.transformer.phys_transformer_gpt2

class trphysx.transformer.phys_transformer_gpt2.MLP(n_state: int, config: trphysx.config.configuration_phys.PhysConfig)

Bases: torch.nn.modules.module.Module

Simple fully connected neural network layer. Includes an activation function and dropout.

Parameters:
  • n_state (int) – dimensionality of input features
  • config (PhysConfig) – Phys-transformer config object
forward(x: torch.Tensor) → torch.Tensor

Forward pass

Parameters:x (Tensor) – [B, T, n_state] input features
Returns:Output features
Return type:Tensor
class trphysx.transformer.phys_transformer_gpt2.Block(n_ctx: int, config: trphysx.config.configuration_phys.PhysConfig, scale: bool = False)

Bases: torch.nn.modules.module.Module

Transformer decoder block consisting of layer norm, masked self-attention, layer norm and fully connected layer.

Parameters:
  • n_ctx (int) – Context length of the block
  • config (PhysConfig) – Phys-transformer config object
  • scale (bool, optional) – Scaled self-attention calculation. Defaults to False.
forward(x: torch.Tensor, layer_past: List[torch.Tensor] = None, attention_mask: torch.LongTensor = None, head_mask: torch.LongTensor = None, use_cache: bool = False, output_attentions: bool = False) → List[torch.Tensor]

Forward pass

Parameters:
  • x (Tensor) – [B, T, n_state] input features
  • layer_past (List[Tensor], optional) – Past self-attention calculation. Defaults to None.
  • attention_mask (LongTensor, optional) – Attention mask. Defaults to None.
  • head_mask (LongTensor, optional) – Attention value mask. Defaults to None.
  • use_cache (bool, optional) – Store attention state (key values). Defaults to False.
  • output_attentions (bool, optional) – Return attention values. Defaults to False.
Returns:

List of output tensors

Return type:

List[Tensor]
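The block follows the GPT-2-style pre-LayerNorm residual layout. A conceptual sketch of the forward composition (not the library code):

    # Pre-LayerNorm residual layout typical of GPT-2 decoder blocks (conceptual sketch)
    def block_forward_sketch(x, ln_1, attn, ln_2, mlp):
        x = x + attn(ln_1(x))   # masked self-attention on the normalized input, plus residual
        x = x + mlp(ln_2(x))    # position-wise fully connected layer, plus residual
        return x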

class trphysx.transformer.phys_transformer_gpt2.PhysformerGPT2(config: trphysx.config.configuration_phys.PhysConfig, model_name: str = None)

Bases: trphysx.transformer.generate_utils.GenerationMixin, trphysx.transformer.phys_transformer_base.PhysformerBase

Transformer decoder model for modeling physics

Parameters:
  • config (PhysConfig) – Phys-transformer config object
  • model_name (str, optional) – Model name. Defaults to None.
forward(inputs_embeds: torch.Tensor, position_ids: torch.Tensor = None, prop_embeds: torch.Tensor = None, past: List[List[torch.Tensor]] = None, attention_mask: torch.LongTensor = None, head_mask: torch.LongTensor = None, use_cache: bool = True, output_attentions: bool = False) → List[torch.Tensor]

Forward pass

Note: Attention masks are not properly implemented presently and will likely not work.

Parameters:
  • inputs_embeds (Tensor) – [B, T, n_embed] Input features
  • position_ids (Tensor, optional) – [T, n_embed] Manually specify position ids. Defaults to None.
  • prop_embeds (Tensor, optional) – [B, T, n_embed] Optional property feature. Defaults to None.
  • past (List[List[Tensor]], optional) – Transformer past state. Defaults to None.
  • attention_mask (LongTensor, optional) – [B, T] Sequence attention mask. Defaults to None.
  • head_mask (LongTensor, optional) – Attention value mask. Defaults to None.
  • use_cache (bool, optional) – Return attention states (keys). Defaults to True.
  • output_attentions (bool, optional) – Return attention scores. Defaults to False.
Returns:

Output features, attention state (if requested), hidden states of all layers (if requested), attention tensor (if requested)

Return type:

List[Tensor]
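A single teacher-forced forward pass using the documented signature; the config object, shapes and embedding dimension are illustrative.

    import torch

    model = PhysformerGPT2(config)           # config: an initialized PhysConfig (placeholder)
    inputs_embeds = torch.randn(8, 16, 32)   # [B, T, n_embed] embedded physical states

    outputs = model(inputs_embeds=inputs_embeds, use_cache=False)
    hidden = outputs[0]                      # [B, T, n_embed] output features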

trphysx.transformer.phys_transformer_helpers

class trphysx.transformer.phys_transformer_helpers.PhysformerTrain(config: trphysx.config.configuration_phys.PhysConfig, transformer_model: trphysx.transformer.phys_transformer_base.PhysformerBase = None)

Bases: trphysx.transformer.phys_transformer_base.PhysformerBase

Model head for training the physics transformer base.

Parameters:
  • config (PhysConfig) – Phys-transformer config object
  • transformer_model (PhysformerBase) – Initialized transformer model
forward(inputs_embeds: torch.Tensor, labels_embeds: torch.Tensor, **kwargs) → Tuple[Union[float, torch.Tensor]]

Forward method for this head. Calculates the MSE between the predicted time-series and the target embeddings. This head allows for easy distribution across multiple GPUs and CPUs.

Parameters:
  • inputs_embeds (Tensor) – [B, T, n_embed] Input features
  • labels_embeds (Tensor) – [B, T, n_embed] Target output features
  • **kwargs (optional) – Additional transformer forward pass arguments
Returns:

mse loss, last hidden state, (present attention state), (all hidden_states), (attention scores)

Return type:

Tuple[Union[float, Tensor]]

evaluate(inputs_embeds: torch.Tensor, labels_embeds: torch.Tensor, **kwargs) → Tuple[Union[float, torch.Tensor]]

Generates a time-series prediction using the transformer and calculates the MSE error.

Parameters:
  • inputs_embeds (Tensor) – [B, 1, n_embed] Starting input feature(s)
  • labels_embeds (Tensor) – [B, T, n_embed] Target output features
  • **kwargs (optional) – Additional transformer forward pass arguments
Returns:

mse loss, last hidden state, (present attention state), (all hidden_states), (attention scores)

Return type:

Tuple[Union[float, Tensor]]

generate(*args, **kwargs)

The generate call is just the forward call of the underlying transformer.

save_model(*args, **kwargs)

Saves physformer model

load_model(*args, **kwargs)

Load a physformer model
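A minimal training-loop sketch using this head, based on the constructor and forward signatures documented above. The config object, data loader and optimizer settings are placeholders.

    import torch

    transformer = PhysformerGPT2(config)           # base transformer (config is a placeholder)
    model = PhysformerTrain(config, transformer)   # training head computing the MSE loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for inputs_embeds, labels_embeds in dataloader:    # [B, T, n_embed] embedded sequences
        optimizer.zero_grad()
        outputs = model(inputs_embeds=inputs_embeds, labels_embeds=labels_embeds)
        loss = outputs[0]                              # first element of the tuple is the MSE loss
        loss.backward()
        optimizer.step()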

trphysx.transformer.utils

class trphysx.transformer.utils.Conv1D(nf: int, nx: int)

Bases: torch.nn.modules.module.Module

1D-convolutional layer (equivalent to a fully connected layer) as defined by Radford et al. for OpenAI GPT (and also used in GPT-2). Works like a linear layer, but the weights are transposed.

Parameters:
  • nf (int) – The number of output features.
  • nx (int) – The number of input features.
forward(x: torch.Tensor) → torch.Tensor

Forward pass

Parameters:x (Tensor) – […, nx] input features
Returns:[…, nf] output features
Return type:Tensor
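Conceptually, Conv1D behaves like a linear layer whose weight is stored transposed, as in the original GPT code. An equivalence sketch in plain PyTorch (not the library implementation):

    import torch

    nx, nf = 32, 64
    weight = torch.randn(nx, nf)    # stored as [nx, nf], i.e. transposed w.r.t. nn.Linear
    bias = torch.zeros(nf)
    x = torch.randn(8, 16, nx)      # [..., nx] input features

    # Conv1D-style affine map: flatten leading dims, multiply, restore shape
    y = torch.addmm(bias, x.view(-1, nx), weight).view(*x.shape[:-1], nf)   # [..., nf]

    # Same result with nn.Linear holding the transposed weight
    linear = torch.nn.Linear(nx, nf)
    with torch.no_grad():
        linear.weight.copy_(weight.t())
        linear.bias.copy_(bias)
    assert torch.allclose(y, linear(x), atol=1e-5)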
trphysx.transformer.utils.gelu_new(x: torch.Tensor) → torch.Tensor

Implementation of the GELU activation function currently used in the Google BERT repo (identical to OpenAI GPT).

trphysx.transformer.utils.gelu_fast(x)

Faster approximate form of the GELU activation function.

trphysx.transformer.utils.mish(x: torch.Tensor) → torch.Tensor

Mish activation function

trphysx.transformer.utils.linear_act(x: torch.Tensor) → torch.Tensor

Linear activation function.
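For reference, these activations are commonly defined as follows (standard forms from the GPT/BERT code bases; the exact trphysx implementations may differ slightly):

    import math
    import torch
    import torch.nn.functional as F

    def gelu_new_ref(x):
        # tanh approximation of GELU used in the Google BERT / OpenAI GPT code
        return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

    def gelu_fast_ref(x):
        # faster approximate form of GELU
        return 0.5 * x * (1.0 + torch.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x)))

    def mish_ref(x):
        # Mish: x * tanh(softplus(x))
        return x * torch.tanh(F.softplus(x))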

trphysx.transformer.utils.get_activation(activation_string: str) → Callable

Gets an activation function by name.

Parameters:activation_string (str) – Name of the activation function
Raises:KeyError – Not a valid activation function
Returns:Activation function
Return type:Callable
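Usage sketch; the name 'gelu_new' is assumed to be registered since the corresponding function is listed above.

    import torch

    act = get_activation("gelu_new")   # look up activation function by name
    y = act(torch.randn(4, 8))         # apply elementwise to a feature tensor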