venumML.deep_learning.transformer.transformer

class Embeddings:

Handles embedding lookup for input tokens, returning embeddings with a specified dimension.

Attributes
  • embedding_matrix (np.ndarray): Custom embedding matrix, where each row corresponds to the embedding vector for a token.
  • d_model (int): Dimensionality of each embedding vector.
Embeddings(custom_embeddings)

Initialises the Embeddings class with a custom embedding matrix.

Parameters
  • custom_embeddings (np.ndarray): Pre-trained embedding matrix, shape (vocab_size, d_model).
def forward(self, x, batch_size, max_seq_len):

Computes embeddings for a batch of input token sequences.

Parameters
  • x (np.ndarray): Array of token indices with shape (batch_size, seq_length).
  • batch_size (int): The number of sequences in the batch.
  • max_seq_len (int): Maximum sequence length.
Returns
  • np.ndarray: Array of embeddings with shape (batch_size, seq_length, d_model).
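The lookup this class performs can be sketched in plaintext numpy as below. The class name and the simplified `forward` signature here are illustrative, not the library's API; the library version additionally takes `batch_size` and `max_seq_len`.

```python
import numpy as np

class PlainEmbeddings:
    """Minimal plaintext analogue of the Embeddings lookup."""

    def __init__(self, custom_embeddings):
        # Each row of the matrix is the embedding vector for one token id.
        self.embedding_matrix = np.asarray(custom_embeddings)
        self.d_model = self.embedding_matrix.shape[1]

    def forward(self, x):
        # Integer (fancy) indexing maps each token id to its embedding row,
        # turning (batch_size, seq_length) indices into an array of shape
        # (batch_size, seq_length, d_model).
        return self.embedding_matrix[x]
```

For example, a vocabulary of 6 tokens with d_model = 2 maps a (2, 2) batch of token ids to a (2, 2, 2) array of embeddings.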
def positional_encoding(max_seq_len, d_model):

Generates positional encoding for a sequence of given length and embedding dimension.

Parameters
  • max_seq_len (int): Maximum sequence length for which positional encoding is generated.
  • d_model (int): Dimensionality of each embedding vector.
Returns
  • np.ndarray: Array of shape (max_seq_len, d_model) with positional encodings for each position.
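A common way to produce such a matrix is the sinusoidal scheme, sketched below; the library's implementation may differ in detail, so treat this as an illustration of the output shape and structure rather than the exact values it returns.

```python
import numpy as np

def sinusoidal_positional_encoding(max_seq_len, d_model):
    # Standard sinusoidal encoding:
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    position = np.arange(max_seq_len)[:, None]          # (max_seq_len, 1)
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_seq_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)           # even dimensions
    pe[:, 1::2] = np.cos(position * div_term)           # odd dimensions
    return pe
```

The result is added to the token embeddings so that positions are distinguishable to the otherwise order-agnostic attention layers.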
def scaled_dot_product_attention(Q, K, V, ctx):

Computes scaled dot-product attention with encrypted attention weights.

Parameters
  • Q (np.ndarray): Query matrix of shape (batch_size, num_heads, seq_length, d_k).
  • K (np.ndarray): Key matrix of shape (batch_size, num_heads, seq_length, d_k).
  • V (np.ndarray): Value matrix of shape (batch_size, num_heads, seq_length, d_v).
  • ctx (EncryptionContext): The encryption context used to encrypt the attention scores.
Returns
  • output (np.ndarray): Output after applying attention weights to the values, shape (batch_size, num_heads, seq_length, d_v).
  • attention_weights (np.ndarray): Encrypted attention weights applied to each value.
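A plaintext analogue of this computation is sketched below. It uses a standard softmax over the scaled scores; the library's version additionally encrypts the attention weights via `ctx`, a step omitted here.

```python
import numpy as np

def plain_scaled_dot_product_attention(Q, K, V):
    # scores = Q K^T / sqrt(d_k), softmax over the last axis,
    # then a weighted sum of the values.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

The matmul broadcasts over the leading (batch_size, num_heads) axes, so the same code handles a whole batch of heads at once.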
class MultiHeadAttention:

Implements the multi-head attention mechanism, with separate per-head projections and a final output projection.

Attributes
  • num_heads (int): Number of attention heads.
  • W_Qs (list): List of query weight matrices, one per head.
  • b_Qs (list): List of query bias vectors, one per head.
  • W_Ks (list): List of key weight matrices, one per head.
  • b_Ks (list): List of key bias vectors, one per head.
  • W_Vs (list): List of value weight matrices, one per head.
  • b_Vs (list): List of value bias vectors, one per head.
  • W_O (np.ndarray): Output weight matrix applied after concatenating head outputs.
  • b_O (np.ndarray): Output bias vector applied after concatenating head outputs.
MultiHeadAttention(num_heads)

Initialises the MultiHeadAttention class with the specified number of heads.

Parameters
  • num_heads (int): Number of attention heads.
def set_head_weights(self, head_index, head_weights):

Sets the weights and biases for a specific attention head.

Parameters
  • head_index (int): Index of the attention head.
  • head_weights (dict): Dictionary containing weights and biases for the specified head.
def set_output_weights(self, W_O, b_O):

Sets the weights and biases for the output layer.

Parameters
  • W_O (np.ndarray): Output weight matrix.
  • b_O (np.ndarray): Output bias vector.
def multi_head_attention(self, Q, K, V, ctx, d_model):

Computes multi-head attention for the given query, key, and value matrices.

Parameters
  • Q (np.ndarray): Query matrix of shape (batch_size, seq_length, d_model).
  • K (np.ndarray): Key matrix of shape (batch_size, seq_length, d_model).
  • V (np.ndarray): Value matrix of shape (batch_size, seq_length, d_model).
  • ctx (EncryptionContext): The encryption context used to encrypt attention scores.
  • d_model (int): Dimensionality of the model.
Returns
  • np.ndarray: The output of multi-head attention, shape (batch_size, seq_length, d_model).
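The per-head weight lists above suggest the data flow sketched below: project Q, K, and V for each head, run attention per head, concatenate the head outputs along the feature axis, and apply the output projection. This is a plaintext sketch with softmax attention standing in for the library's encrypted attention weights, and it assumes num_heads × d_k = d_model so the concatenation recovers the model dimension.

```python
import numpy as np

def plain_multi_head_attention(Q, K, V, W_Qs, b_Qs, W_Ks, b_Ks,
                               W_Vs, b_Vs, W_O, b_O):
    def attention(q, k, v):
        # Plaintext scaled dot-product attention for one head.
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

    heads = []
    for W_Q, b_Q, W_K, b_K, W_V, b_V in zip(W_Qs, b_Qs, W_Ks,
                                            b_Ks, W_Vs, b_Vs):
        # Each head gets its own learned projections of Q, K, and V.
        heads.append(attention(Q @ W_Q + b_Q, K @ W_K + b_K, V @ W_V + b_V))

    # Concatenate head outputs back to d_model, then project.
    concat = np.concatenate(heads, axis=-1)    # (batch, seq, d_model)
    return concat @ W_O + b_O
```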
class PositionwiseFeedForwardNetwork:

Implements a position-wise feed-forward network with two linear transformations and an activation function.

Attributes
  • d_model (int): Dimensionality of the input.
  • d_ff (int): Dimensionality of the hidden layer.
  • W_1 (np.ndarray): Weight matrix for the first linear layer.
  • b_1 (np.ndarray): Bias vector for the first linear layer.
  • W_2 (np.ndarray): Weight matrix for the second linear layer.
  • b_2 (np.ndarray): Bias vector for the second linear layer.
PositionwiseFeedForwardNetwork(d_model, d_ff)

Initialises the PositionwiseFeedForwardNetwork.

Parameters
  • d_model (int): Dimensionality of the input.
  • d_ff (int): Dimensionality of the hidden layer.
def set_weights(self, W_1, b_1, W_2, b_2):

Sets the weights and biases for the feed-forward network.

Parameters
  • W_1 (np.ndarray): Weight matrix for the first linear layer.
  • b_1 (np.ndarray): Bias vector for the first linear layer.
  • W_2 (np.ndarray): Weight matrix for the second linear layer.
  • b_2 (np.ndarray): Bias vector for the second linear layer.
def forward(self, x, ctx):

Forward pass through the feed-forward network.

Parameters
  • x (np.ndarray): Input array with shape (batch_size, seq_length, d_model).
  • ctx (EncryptionContext): Encryption context used for any approximations in activation.
Returns
  • np.ndarray: Output of the feed-forward network.
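The two linear transformations can be sketched in plaintext as below, with ReLU standing in for whatever encryption-friendly activation approximation the library applies via `ctx`; the sketch is illustrative, not the library's implementation.

```python
import numpy as np

def plain_ffn(x, W_1, b_1, W_2, b_2):
    # FFN(x) = act(x W_1 + b_1) W_2 + b_2, applied independently at each
    # position: (…, d_model) -> (…, d_ff) -> (…, d_model).
    hidden = np.maximum(0.0, x @ W_1 + b_1)   # ReLU as a stand-in activation
    return hidden @ W_2 + b_2
```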
def output_linear_layer(x, W, b):

Implements the output linear layer for a Transformer model.

Parameters
  • x (np.ndarray): Input array representing the output of the last Transformer layer.
  • W (np.ndarray): Weight matrix for the linear transformation.
  • b (np.ndarray): Bias vector for the linear transformation.
Returns
  • np.ndarray: Output logits over the vocabulary for each input position.
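This layer is a single affine transform from the model dimension to the vocabulary size, which a plaintext sketch makes explicit:

```python
import numpy as np

def plain_output_linear_layer(x, W, b):
    # Project final hidden states (…, d_model) to vocabulary
    # logits (…, vocab_size).
    return x @ W + b
```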
class TransformerModule:

A simplified Transformer module implementing embedding, positional encoding, multi-head attention, and a feed-forward network.

Attributes
  • max_seq_len (int): Maximum sequence length.
  • num_heads (int): Number of attention heads.
  • d_model (int): Dimensionality of embeddings.
  • MHA (MultiHeadAttention): Multi-head attention layer.
  • positional_encoding (np.ndarray): Positional encoding matrix.
  • p_ffn (PositionwiseFeedForwardNetwork): Feed-forward network.
  • output_w (np.ndarray): Weight matrix for the output layer.
  • output_b (np.ndarray): Bias vector for the output layer.
TransformerModule(encrypted_state_dict, max_seq_len, d_model, num_heads, d_ff, vocab_size)

Initialises the TransformerModule with necessary layers and weights.

Parameters
  • encrypted_state_dict (dict): Dictionary containing pre-trained weights in encrypted form.
  • max_seq_len (int): Maximum length of input sequences.
  • d_model (int): Dimensionality of embeddings.
  • num_heads (int): Number of attention heads.
  • d_ff (int): Dimensionality of the feed-forward network's hidden layer.
  • vocab_size (int): Size of the vocabulary.
def forward(self, embeddings, ctx, batch_size):

Processes the input embeddings through the Transformer.

Parameters
  • embeddings (np.ndarray): Input embeddings with shape (batch_size, seq_length, d_model).
  • ctx (EncryptionContext): Encryption context used for any approximations in activation.
  • batch_size (int): Number of input sequences in the batch.
Returns
  • np.ndarray: Output after positional encoding, multi-head attention, and the feed-forward network have been applied to the input embeddings.
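The documented flow through the module can be sketched end to end in plaintext. The helper is parameterised over the attention and feed-forward steps so the data flow stands out; it omits the encryption context, and assumes the simplified module chains the layers directly (residual connections and layer normalisation are not mentioned above, so none are added here).

```python
import numpy as np

def plain_transformer_forward(embeddings, pos_enc, attn_fn, ffn_fn,
                              W_out, b_out):
    # 1. Add positional encoding for the first seq_length positions.
    x = embeddings + pos_enc[: embeddings.shape[1]]
    # 2. Self-attention: queries, keys, and values are all x.
    x = attn_fn(x, x, x)
    # 3. Position-wise feed-forward network.
    x = ffn_fn(x)
    # 4. Final linear projection to vocabulary logits.
    return x @ W_out + b_out
```

With identity stand-ins for the attention and feed-forward steps, a (batch_size, seq_length, d_model) input produces a (batch_size, seq_length, vocab_size) output.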