trphysx.transformer

trphysx.transformer.attention

class trphysx.transformer.attention.MaskedAttention(nx: int, n_ctx: int, config: trphysx.config.configuration_phys.PhysConfig, scale: bool = False, mask: str = 'tril')

Bases: torch.nn.modules.module.Module

Masked self-attention module based on the Hugging Face implementation: https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py

Parameters:
  • nx (int) – Dimensionality of feature vector
  • n_ctx (int) – Context length of the attention (TODO: Not needed with config object?)
  • config (PhysConfig) – Transformer config object
  • scale (bool, optional) – Scale the attention scores. Defaults to False.
  • mask (str, optional) – Attention mask type. Defaults to ‘tril’.
Raises:

ValueError – Invalid mask type

merge_heads(x: torch.Tensor) → torch.Tensor

Merge attention heads

Parameters:x (Tensor) – [batch, head, seq_length, head_features] Input tensor
Returns:[batch, seq_length, head * head_features] Concatenated output tensor
Return type:Tensor
split_heads(x, k: bool = False) → torch.Tensor

Splits key, query or value tensor into separate heads. The dimensionality of the output depends on whether the tensor is a key.

Parameters:
  • x (Tensor) – [batch, seq_length, nx] Input tensor
  • k (bool, optional) – Whether the input tensor is a key tensor. Defaults to False.
Returns:

[batch, head, seq_length, head_features] Split features for query and value tensors; [batch, head, head_features, seq_length] split features for key tensors

Return type:

Tensor

forward(x: torch.Tensor, layer_past: List[torch.Tensor] = None, attention_mask: torch.Tensor = None, head_mask: torch.Tensor = None, use_cache: bool = False, output_attentions: bool = False) → List[torch.Tensor]

Masked attention forward pass

Parameters:
  • x (Tensor) – [batch, seq_length, nx] Input feature.
  • layer_past (Tensor, optional) – [2, batch, n_head, seq_length, nx] Precomputed self-attention vectors. Defaults to None.
  • attention_mask (Tensor, optional) – Optional defined attention mask. Applied before soft mask. Defaults to None.
  • head_mask (Tensor, optional) – Optional attention value mask. Applied after softmax. Defaults to None.
  • use_cache (bool, optional) – Return calculated key values for faster generation. Defaults to False.
  • output_attentions (bool, optional) – Return attention matrix. Defaults to False.
Returns:

Output consisting of output feature, key values (if requested), attention tensor (if requested)

Return type:

List[Tensor]
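Conceptually, the 'tril' masked scaled dot-product attention performed by this module follows the computation sketched below in plain PyTorch. This is an illustrative sketch, not the library code; the projection w_qkv and all tensor names are assumptions for the example.

    import math
    import torch

    def masked_attention_sketch(x, w_qkv, n_head, scale=True):
        # x: [batch, seq_length, nx]; w_qkv projects features to query, key and value
        B, T, nx = x.shape
        q, k, v = w_qkv(x).split(nx, dim=2)
        # split_heads: [batch, seq_length, nx] -> [batch, head, seq_length, head_features]
        def split(t):
            return t.view(B, T, n_head, nx // n_head).permute(0, 2, 1, 3)
        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-1, -2)                    # [batch, head, T, T]
        if scale:
            scores = scores / math.sqrt(v.size(-1))         # scaled attention scores
        tril = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~tril, float("-inf"))   # causal ('tril') mask
        attn = torch.softmax(scores, dim=-1)
        out = attn @ v                                      # [batch, head, T, head_features]
        # merge_heads: back to [batch, seq_length, head * head_features]
        return out.permute(0, 2, 1, 3).contiguous().view(B, T, nx)

    w_qkv = torch.nn.Linear(32, 96)                         # hypothetical QKV projection
    y = masked_attention_sketch(torch.randn(2, 10, 32), w_qkv, n_head=4)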

trphysx.transformer.generate_utils

class trphysx.transformer.generate_utils.GenerationMixin

Bases: object

Class containing generative functions for transformers

prepare_inputs_for_generation(inputs_embeds: torch.Tensor, position_ids: torch.Tensor = None, prop_embeds: torch.Tensor = None, **kwargs) → Dict[str, torch.Tensor]

Prepares input features for prediction

Parameters:
  • inputs_features (Dict[str, Tensor]) – Input feature tensors that are being generated.
Returns:

Dictionary of model inputs

Return type:

Dict[str, Tensor]

generate(inputs_embeds: torch.Tensor, position_ids: torch.Tensor = None, prop_embeds: torch.Tensor = None, max_length: int = None, attention_mask: torch.LongTensor = None, use_cache: bool = False, **model_specific_kwargs) → Tuple[torch.Tensor]

Generates a predicted sequence of features.

Parameters:
  • inputs_embeds (Tensor) – [batch, seq, n_embed] Input feature tensor
  • position_ids (Tensor, optional) – [seq, n_embed] Position tensor. Defaults to None.
  • prop_embeds (Tensor, optional) – [batch, seq, n_embed] Property tensor. Defaults to None.
  • max_length (int, optional) – Length of time series to predict. Defaults to None.
  • attention_mask (LongTensor, optional) – Manual attention mask. Defaults to None.
  • use_cache (bool, optional) – Cache past transformer states for faster generation. Defaults to False.
Returns:

[batch, max_length, n_embed] Predicted feature tensor, additional optional transformer outputs.

Return type:

Tuple[Tensor]
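A minimal usage sketch of generate for an autoregressive rollout, based on the signature documented above. The config object and the seed embedding tensor are placeholders.

    import torch

    # config = ...  # an initialized PhysConfig (placeholder)
    model = PhysformerGPT2(config)
    model.eval()

    # [batch, seq, n_embed] seed features, e.g. the embedded initial state (n_embed = 32 here)
    inputs_embeds = torch.randn(4, 1, 32)

    with torch.no_grad():
        outputs = model.generate(inputs_embeds=inputs_embeds, max_length=64, use_cache=True)
    pred_embeds = outputs[0]   # [batch, max_length, n_embed] predicted feature sequence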

trphysx.transformer.phys_transformer_base

class trphysx.transformer.phys_transformer_base.PhysformerBase(config, *inputs, **kwargs)

Bases: torch.nn.modules.module.Module

Parent class for physical transformers

model_name = 'transformer_model'
forward()

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

generate()
get_input_embeddings()
set_input_embeddings(new_embeddings)
tie_weights()

Tie the weights between the input embeddings and the output embeddings. If the torchscript flag is set in the configuration, parameter sharing is not supported, so the weights are cloned instead.

save_model(save_directory: str, epoch: int = 0) → None

Saves transformer model to the specified directory.

Parameters:
  • save_directory (str) – Folder to save file at
  • epoch (int, optional) – Epoch number to name model file. Defaults to 0.
Raises:

AssertionError – If provided directory is not valid.

load_model(file_or_path_directory: str, epoch: int = 0) → None

Load a transformer model from the specified file or path

Parameters:
  • file_or_path_directory (str) – File or folder path to load state dictionary from.
  • epoch (int, optional) – Epoch of current model for file name, used if folder path is provided. Defaults to 0.
Raises:

FileNotFoundError – If provided file or directory could not be found.
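Usage sketch of the checkpoint helpers with the signatures documented above; the model instance, path and epoch number are illustrative.

    # Save a checkpoint for the current epoch into a folder
    model.save_model("./checkpoints", epoch=100)

    # Later, restore the weights from the same folder (epoch selects the file name)
    model.load_model("./checkpoints", epoch=100)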

trphysx.transformer.phys_transformer_gpt2

class trphysx.transformer.phys_transformer_gpt2.MLP(n_state: int, config: trphysx.config.configuration_phys.PhysConfig)

Bases: torch.nn.modules.module.Module

Simple fully connected neural network layer. Includes an activation function and dropout.

Parameters:
  • n_state (int) – dimensionality of input features
  • config (PhysConfig) – Phys-transformer config object
forward(x: torch.Tensor) → torch.Tensor

Forward pass

Parameters:x (Tensor) – [B, T, n_state] input features
Returns:Output features
Return type:Tensor
class trphysx.transformer.phys_transformer_gpt2.Block(n_ctx: int, config: trphysx.config.configuration_phys.PhysConfig, scale: bool = False)

Bases: torch.nn.modules.module.Module

Transformer decoder block consisting of layer norm, masked self-attention, layer norm and fully connected layer.

Parameters:
  • n_ctx (int) – Context length of the block
  • config (PhysConfig) – Phys-transformer config object
  • scale (bool, optional) – Scaled self-attention calculation. Defaults to False.
forward(x: torch.Tensor, layer_past: List[torch.Tensor] = None, attention_mask: torch.LongTensor = None, head_mask: torch.LongTensor = None, use_cache: bool = False, output_attentions: bool = False) → List[torch.Tensor]

Forward pass

Parameters:
  • x (Tensor) – [B, T, n_state] input features
  • layer_past (List[Tensor], optional) – Past self-attention calculation. Defaults to None.
  • attention_mask (LongTensor, optional) – Attention mask. Defaults to None.
  • head_mask (LongTensor, optional) – Attention value mask. Defaults to None.
  • use_cache (bool, optional) – Store attention state (key values). Defaults to False.
  • output_attentions (bool, optional) – Return attention values. Defaults to False.
Returns:

List of output tensors

Return type:

List[Tensor]
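The block follows the GPT-2-style pre-LayerNorm residual layout. A conceptual sketch of the forward composition (not the library code):

    # Pre-LayerNorm residual layout typical of GPT-2 decoder blocks (conceptual sketch)
    def block_forward_sketch(x, ln_1, attn, ln_2, mlp):
        x = x + attn(ln_1(x))   # masked self-attention on the normalized input, plus residual
        x = x + mlp(ln_2(x))    # position-wise fully connected layer, plus residual
        return x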

class trphysx.transformer.phys_transformer_gpt2.PhysformerGPT2(config: trphysx.config.configuration_phys.PhysConfig, model_name: str = None)

Bases: trphysx.transformer.generate_utils.GenerationMixin, trphysx.transformer.phys_transformer_base.PhysformerBase

Transformer decoder model for modeling physics

Parameters:
  • config (PhysConfig) – Phys-transformer config object
  • model_name (str, optional) – Model name. Defaults to None.
forward(inputs_embeds: torch.Tensor, position_ids: torch.Tensor = None, prop_embeds: torch.Tensor = None, past: List[List[torch.Tensor]] = None, attention_mask: torch.LongTensor = None, head_mask: torch.LongTensor = None, use_cache: bool = True, output_attentions: bool = False) → List[torch.Tensor]

Forward pass

Note: Attention masks are not properly implemented presently and will likely not work.

Parameters:
  • inputs_embeds (Tensor) – [B, T, n_embed] Input features
  • position_ids (Tensor, optional) – [T, n_embed] Manually specify position ids. Defaults to None.
  • prop_embeds (Tensor, optional) – [B, T, n_embed] Optional property feature. Defaults to None.
  • past (List[List[Tensor]], optional) – Transformer past state. Defaults to None.
  • attention_mask (LongTensor, optional) – [B, T] Sequence attention mask. Defaults to None.
  • head_mask (LongTensor, optional) – Attention value mask. Defaults to None.
  • use_cache (bool, optional) – Return attention states (keys). Defaults to True.
  • output_attentions (bool, optional) – Return attention scores. Defaults to False.
Returns:

Output features, attention state (if requested), hidden states of all layers (if requested), attention tensor (if requested)

Return type:

List[Tensor]
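A single teacher-forced forward pass using the documented signature; the config object, shapes and embedding dimension are illustrative.

    import torch

    model = PhysformerGPT2(config)           # config: an initialized PhysConfig (placeholder)
    inputs_embeds = torch.randn(8, 16, 32)   # [B, T, n_embed] embedded physical states

    outputs = model(inputs_embeds=inputs_embeds, use_cache=False)
    hidden = outputs[0]                      # [B, T, n_embed] output features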

trphysx.transformer.phys_transformer_helpers

class trphysx.transformer.phys_transformer_helpers.PhysformerTrain(config: trphysx.config.configuration_phys.PhysConfig, transformer_model: trphysx.transformer.phys_transformer_base.PhysformerBase = None)

Bases: trphysx.transformer.phys_transformer_base.PhysformerBase

Model head for training the physics transformer base.

Parameters:
  • config (PhysConfig) – Phys-transformer config object
  • transformer_model (PhysformerBase) – Initialized transformer model
forward(inputs_embeds: torch.Tensor, labels_embeds: torch.Tensor, **kwargs) → Tuple[Union[float, torch.Tensor]]

Forward method for this head. Calculates the MSE between the predicted time-series and the target embeddings. This head allows for easy distribution across multiple GPUs and CPUs.

Parameters:
  • inputs_embeds (Tensor) – [B, T, n_embed] Input features
  • labels_embeds (Tensor) – [B, T, n_embed] Target output features
  • **kwargs (optional) – Additional transformer forward pass arguments
Returns:

mse loss, last hidden state, (present attention state), (all hidden_states), (attention scores)

Return type:

Tuple[Union[float, Tensor]]

evaluate(inputs_embeds: torch.Tensor, labels_embeds: torch.Tensor, **kwargs) → Tuple[Union[float, torch.Tensor]]

Generates a time-series prediction using the transformer and calculates the MSE error.

Parameters:
  • inputs_embeds (Tensor) – [B, 1, n_embed] Starting input feature(s)
  • labels_embeds (Tensor) – [B, T, n_embed] Target output features
  • **kwargs (optional) – Additional transformer forward pass arguments
Returns:

mse loss, last hidden state, (present attention state), (all hidden_states), (attention scores)

Return type:

Tuple[Union[float, Tensor]]

generate(*args, **kwargs)

The generate call is just the forward call of the underlying transformer.

save_model(*args, **kwargs)

Saves physformer model

load_model(*args, **kwargs)

Load a physformer model
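A minimal training-loop sketch using this head, based on the constructor and forward signatures documented above. The config object, data loader and optimizer settings are placeholders.

    import torch

    transformer = PhysformerGPT2(config)           # base transformer (config is a placeholder)
    model = PhysformerTrain(config, transformer)   # training head computing the MSE loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for inputs_embeds, labels_embeds in dataloader:    # [B, T, n_embed] embedded sequences
        optimizer.zero_grad()
        outputs = model(inputs_embeds=inputs_embeds, labels_embeds=labels_embeds)
        loss = outputs[0]                              # first element of the tuple is the MSE loss
        loss.backward()
        optimizer.step()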

trphysx.transformer.utils

class trphysx.transformer.utils.Conv1D(nf: int, nx: int)

Bases: torch.nn.modules.module.Module

1D-convolutional layer (equivalent to a fully connected layer) as defined by Radford et al. for OpenAI GPT (and also used in GPT-2). Works like a linear layer, but the weights are transposed.

Parameters:
  • nf (int) – The number of output features.
  • nx (int) – The number of input features.
forward(x: torch.Tensor) → torch.Tensor

Forward pass

Parameters:x (Tensor) – […, nx] input features
Returns:[…, nf] output features
Return type:Tensor
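Conceptually, Conv1D behaves like a linear layer whose weight is stored transposed, as in the original GPT code. An equivalence sketch in plain PyTorch (not the library implementation):

    import torch

    nx, nf = 32, 64
    weight = torch.randn(nx, nf)    # stored as [nx, nf], i.e. transposed w.r.t. nn.Linear
    bias = torch.zeros(nf)
    x = torch.randn(8, 16, nx)      # [..., nx] input features

    # Conv1D-style affine map: flatten leading dims, multiply, restore shape
    y = torch.addmm(bias, x.view(-1, nx), weight).view(*x.shape[:-1], nf)   # [..., nf]

    # Same result with nn.Linear holding the transposed weight
    linear = torch.nn.Linear(nx, nf)
    with torch.no_grad():
        linear.weight.copy_(weight.t())
        linear.bias.copy_(bias)
    assert torch.allclose(y, linear(x), atol=1e-5)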
trphysx.transformer.utils.gelu_new(x: torch.Tensor) → torch.Tensor

Implementation of the GELU activation function currently used in the Google BERT repo (identical to OpenAI GPT).

trphysx.transformer.utils.gelu_fast(x)

Faster approximate form of the GELU activation function.

trphysx.transformer.utils.mish(x: torch.Tensor) → torch.Tensor

Mish activation function

trphysx.transformer.utils.linear_act(x: torch.Tensor) → torch.Tensor

Linear activation function.
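For reference, these activations are commonly defined as follows (standard forms from the GPT/BERT code bases; the exact trphysx implementations may differ slightly):

    import math
    import torch
    import torch.nn.functional as F

    def gelu_new_ref(x):
        # tanh approximation of GELU used in the Google BERT / OpenAI GPT code
        return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

    def gelu_fast_ref(x):
        # faster approximate form of GELU
        return 0.5 * x * (1.0 + torch.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x)))

    def mish_ref(x):
        # Mish: x * tanh(softplus(x))
        return x * torch.tanh(F.softplus(x))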

trphysx.transformer.utils.get_activation(activation_string: str) → Callable

Gets an activation function by name.

Parameters:activation_string (str) – Name of the activation function
Raises:KeyError – Not a valid activation function
Returns:Activation function
Return type:Callable
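Usage sketch; the name 'gelu_new' is assumed to be registered since the corresponding function is listed above.

    import torch

    act = get_activation("gelu_new")   # look up activation function by name
    y = act(torch.randn(4, 8))         # apply elementwise to a feature tensor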