venumML.deep_learning.transformer.transformer

class Embeddings:

Handles embedding lookup for input tokens, returning embeddings with a specified dimension.

Attributes
  • embedding_matrix (np.ndarray): Custom embedding matrix, where each row corresponds to the embedding vector for a token.
  • d_model (int): Dimensionality of each embedding vector.
Embeddings(custom_embeddings)

Initialises the Embeddings class with a custom embedding matrix.

Parameters
  • custom_embeddings (np.ndarray): Pre-trained embedding matrix, shape (vocab_size, d_model).
def forward(self, x, batch_size, max_seq_len):

Computes embeddings for a batch of input token sequences.

Parameters
  • x (np.ndarray): Array of token indices with shape (batch_size, seq_length).
  • batch_size (int): The number of sequences in the batch.
  • max_seq_len (int): Maximum sequence length.
Returns
  • np.ndarray: Array of embeddings with shape (batch_size, seq_length, d_model).
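The lookup this class performs can be sketched in plaintext numpy as below. The class name and the simplified `forward` signature here are illustrative, not the library's API; the library version additionally takes `batch_size` and `max_seq_len`.

```python
import numpy as np

class PlainEmbeddings:
    """Minimal plaintext analogue of the Embeddings lookup."""

    def __init__(self, custom_embeddings):
        # Each row of the matrix is the embedding vector for one token id.
        self.embedding_matrix = np.asarray(custom_embeddings)
        self.d_model = self.embedding_matrix.shape[1]

    def forward(self, x):
        # Integer (fancy) indexing maps each token id to its embedding row,
        # turning (batch_size, seq_length) indices into an array of shape
        # (batch_size, seq_length, d_model).
        return self.embedding_matrix[x]
```

For example, a vocabulary of 6 tokens with d_model = 2 maps a (2, 2) batch of token ids to a (2, 2, 2) array of embeddings.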
def positional_encoding(max_seq_len, d_model):

Generates positional encoding for a sequence of given length and embedding dimension.

Parameters
  • max_seq_len (int): Maximum sequence length for which positional encoding is generated.
  • d_model (int): Dimensionality of each embedding vector.
Returns
  • np.ndarray: Array of shape (max_seq_len, d_model) with positional encodings for each position.
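A common way to produce such a matrix is the sinusoidal scheme, sketched below; the library's implementation may differ in detail, so treat this as an illustration of the output shape and structure rather than the exact values it returns.

```python
import numpy as np

def sinusoidal_positional_encoding(max_seq_len, d_model):
    # Standard sinusoidal encoding:
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    position = np.arange(max_seq_len)[:, None]          # (max_seq_len, 1)
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_seq_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)           # even dimensions
    pe[:, 1::2] = np.cos(position * div_term)           # odd dimensions
    return pe
```

The result is added to the token embeddings so that positions are distinguishable to the otherwise order-agnostic attention layers.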
def scaled_dot_product_attention(Q, K, V, ctx):

Computes scaled dot-product attention with encrypted attention weights.

Parameters
  • Q (np.ndarray): Query matrix of shape (batch_size, num_heads, seq_length, d_k).
  • K (np.ndarray): Key matrix of shape (batch_size, num_heads, seq_length, d_k).
  • V (np.ndarray): Value matrix of shape (batch_size, num_heads, seq_length, d_v).
  • ctx (EncryptionContext): The encryption context used to encrypt the attention scores.
Returns
  • output (np.ndarray): Output after applying attention weights to the values, shape (batch_size, num_heads, seq_length, d_v).
  • attention_weights (np.ndarray): Encrypted attention weights applied to each value.
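A plaintext analogue of this computation is sketched below. It uses a standard softmax over the scaled scores; the library's version additionally encrypts the attention weights via `ctx`, a step omitted here.

```python
import numpy as np

def plain_scaled_dot_product_attention(Q, K, V):
    # scores = Q K^T / sqrt(d_k), softmax over the last axis,
    # then a weighted sum of the values.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

The matmul broadcasts over the leading (batch_size, num_heads) axes, so the same code handles a whole batch of heads at once.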
class MultiHeadAttention:

Implements the multi-head attention mechanism, with separate per-head projections and a final output projection.

Attributes
  • num_heads (int): Number of attention heads.
  • W_Qs (list): List of query weight matrices, one per head.
  • b_Qs (list): List of query bias vectors, one per head.
  • W_Ks (list): List of key weight matrices, one per head.
  • b_Ks (list): List of key bias vectors, one per head.
  • W_Vs (list): List of value weight matrices, one per head.
  • b_Vs (list): List of value bias vectors, one per head.
  • W_O (np.ndarray): Output weight matrix applied after concatenating head outputs.
  • b_O (np.ndarray): Output bias vector applied after concatenating head outputs.
MultiHeadAttention(num_heads)

Initialises the MultiHeadAttention class with the specified number of heads.

Parameters
  • num_heads (int): Number of attention heads.
def set_head_weights(self, head_index, head_weights):

Sets the weights and biases for a specific attention head.

Parameters
  • head_index (int): Index of the attention head.
  • head_weights (dict): Dictionary containing weights and biases for the specified head.
def set_output_weights(self, W_O, b_O):

Sets the weights and biases for the output layer.

Parameters
  • W_O (np.ndarray): Output weight matrix.
  • b_O (np.ndarray): Output bias vector.
def multi_head_attention(self, Q, K, V, ctx, d_model):

Computes multi-head attention for the given query, key, and value matrices.

Parameters
  • Q (np.ndarray): Query matrix of shape (batch_size, seq_length, d_model).
  • K (np.ndarray): Key matrix of shape (batch_size, seq_length, d_model).
  • V (np.ndarray): Value matrix of shape (batch_size, seq_length, d_model).
  • ctx (EncryptionContext): The encryption context used to encrypt attention scores.
  • d_model (int): Dimensionality of the model.
Returns
  • np.ndarray: The output of multi-head attention, shape (batch_size, seq_length, d_model).
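The per-head weight lists above suggest the data flow sketched below: project Q, K, and V for each head, run attention per head, concatenate the head outputs along the feature axis, and apply the output projection. This is a plaintext sketch with softmax attention standing in for the library's encrypted attention weights, and it assumes num_heads × d_k = d_model so the concatenation recovers the model dimension.

```python
import numpy as np

def plain_multi_head_attention(Q, K, V, W_Qs, b_Qs, W_Ks, b_Ks,
                               W_Vs, b_Vs, W_O, b_O):
    def attention(q, k, v):
        # Plaintext scaled dot-product attention for one head.
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

    heads = []
    for W_Q, b_Q, W_K, b_K, W_V, b_V in zip(W_Qs, b_Qs, W_Ks,
                                            b_Ks, W_Vs, b_Vs):
        # Each head gets its own learned projections of Q, K, and V.
        heads.append(attention(Q @ W_Q + b_Q, K @ W_K + b_K, V @ W_V + b_V))

    # Concatenate head outputs back to d_model, then project.
    concat = np.concatenate(heads, axis=-1)    # (batch, seq, d_model)
    return concat @ W_O + b_O
```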
class PositionwiseFeedForwardNetwork:

Implements a position-wise feed-forward network with two linear transformations and an activation function.

Attributes
  • d_model (int): Dimensionality of the input.
  • d_ff (int): Dimensionality of the hidden layer.
  • W_1 (np.ndarray): Weight matrix for the first linear layer.
  • b_1 (np.ndarray): Bias vector for the first linear layer.
  • W_2 (np.ndarray): Weight matrix for the second linear layer.
  • b_2 (np.ndarray): Bias vector for the second linear layer.
PositionwiseFeedForwardNetwork(d_model, d_ff)

Initialises the PositionwiseFeedForwardNetwork.

Parameters
  • d_model (int): Dimensionality of the input.
  • d_ff (int): Dimensionality of the hidden layer.
def set_weights(self, W_1, b_1, W_2, b_2):

Sets the weights and biases for the feed-forward network.

Parameters
  • W_1 (np.ndarray): Weight matrix for the first linear layer.
  • b_1 (np.ndarray): Bias vector for the first linear layer.
  • W_2 (np.ndarray): Weight matrix for the second linear layer.
  • b_2 (np.ndarray): Bias vector for the second linear layer.
def forward(self, x, ctx):

Forward pass through the feed-forward network.

Parameters
  • x (np.ndarray): Input array with shape (batch_size, seq_length, d_model).
  • ctx (EncryptionContext): Encryption context used for any approximations in activation.
Returns
  • np.ndarray: Output of the feed-forward network.
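The two linear transformations can be sketched in plaintext as below, with ReLU standing in for whatever encryption-friendly activation approximation the library applies via `ctx`; the sketch is illustrative, not the library's implementation.

```python
import numpy as np

def plain_ffn(x, W_1, b_1, W_2, b_2):
    # FFN(x) = act(x W_1 + b_1) W_2 + b_2, applied independently at each
    # position: (…, d_model) -> (…, d_ff) -> (…, d_model).
    hidden = np.maximum(0.0, x @ W_1 + b_1)   # ReLU as a stand-in activation
    return hidden @ W_2 + b_2
```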
def output_linear_layer(x, W, b):

Implements the output linear layer for a Transformer model.

Parameters
  • x (np.ndarray): Input array representing the output of the last Transformer layer.
  • W (np.ndarray): Weight matrix for the linear transformation.
  • b (np.ndarray): Bias vector for the linear transformation.
Returns
  • np.ndarray: Output logits over the vocabulary for each input position.
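This layer is a single affine transform from the model dimension to the vocabulary size, which a plaintext sketch makes explicit:

```python
import numpy as np

def plain_output_linear_layer(x, W, b):
    # Project final hidden states (…, d_model) to vocabulary
    # logits (…, vocab_size).
    return x @ W + b
```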
class TransformerModule:

A simplified Transformer module implementing embedding, positional encoding, multi-head attention, and a feed-forward network.

Attributes
  • max_seq_len (int): Maximum sequence length.
  • num_heads (int): Number of attention heads.
  • d_model (int): Dimensionality of embeddings.
  • MHA (MultiHeadAttention): Multi-head attention layer.
  • positional_encoding (np.ndarray): Positional encoding matrix.
  • p_ffn (PositionwiseFeedForwardNetwork): Feed-forward network.
  • output_w (np.ndarray): Weight matrix for the output layer.
  • output_b (np.ndarray): Bias vector for the output layer.
TransformerModule(encrypted_state_dict, max_seq_len, d_model, num_heads, d_ff, vocab_size)

Initialises the TransformerModule with necessary layers and weights.

Parameters
  • encrypted_state_dict (dict): Dictionary containing pre-trained weights in encrypted form.
  • max_seq_len (int): Maximum length of input sequences.
  • d_model (int): Dimensionality of embeddings.
  • num_heads (int): Number of attention heads.
  • d_ff (int): Dimensionality of the feed-forward network's hidden layer.
  • vocab_size (int): Size of the vocabulary.
def forward(self, embeddings, ctx, batch_size):

Processes the input embeddings through the Transformer.

Parameters
  • embeddings (np.ndarray): Input embeddings with shape (batch_size, seq_length, d_model).
  • ctx (EncryptionContext): Encryption context used for any approximations in activation.
  • batch_size (int): Number of input sequences in the batch.
Returns
  • np.ndarray: Output after positional encoding, multi-head attention, and the feed-forward network have been applied to the input embeddings.
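The documented flow through the module can be sketched end to end in plaintext. The helper is parameterised over the attention and feed-forward steps so the data flow stands out; it omits the encryption context, and assumes the simplified module chains the layers directly (residual connections and layer normalisation are not mentioned above, so none are added here).

```python
import numpy as np

def plain_transformer_forward(embeddings, pos_enc, attn_fn, ffn_fn,
                              W_out, b_out):
    # 1. Add positional encoding for the first seq_length positions.
    x = embeddings + pos_enc[: embeddings.shape[1]]
    # 2. Self-attention: queries, keys, and values are all x.
    x = attn_fn(x, x, x)
    # 3. Position-wise feed-forward network.
    x = ffn_fn(x)
    # 4. Final linear projection to vocabulary logits.
    return x @ W_out + b_out
```

With identity stand-ins for the attention and feed-forward steps, a (batch_size, seq_length, d_model) input produces a (batch_size, seq_length, vocab_size) output.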