venumML.deep_learning.transformer.transformer
Handles embedding lookup for input tokens, returning embeddings with a specified dimension.
Attributes
- embedding_matrix (np.ndarray): Custom embedding matrix, where each row corresponds to the embedding vector for a token.
- d_model (int): Dimensionality of each embedding vector.
Initialises the Embeddings class with a custom embedding matrix.
Parameters
- custom_embeddings (np.ndarray): Pre-trained embedding matrix, shape (vocab_size, d_model).
Computes embeddings for a batch of input token sequences.
Parameters
- x (np.ndarray): Array of token indices with shape (batch_size, seq_length).
- batch_size (int): The number of sequences in the batch.
- max_seq_len (int): Maximum sequence length.
Returns
- np.ndarray: Array of embeddings with shape (batch_size, seq_length, d_model).
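In plaintext terms, this lookup is a row selection from the embedding matrix. A minimal sketch (the function name `embed` is illustrative, not the library's API):

```python
import numpy as np

def embed(x, embedding_matrix):
    """Look up the embedding vector for each token index.

    x: integer array of shape (batch_size, seq_length)
    embedding_matrix: array of shape (vocab_size, d_model)
    returns: array of shape (batch_size, seq_length, d_model)
    """
    # NumPy integer indexing maps each index to its row in one step
    return embedding_matrix[x]

# Example: vocabulary of 4 tokens, d_model = 2
emb = np.arange(8, dtype=float).reshape(4, 2)
tokens = np.array([[0, 3], [2, 1]])
out = embed(tokens, emb)  # shape (2, 2, 2)
```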
Generates positional encoding for a sequence of given length and embedding dimension.
Parameters
- max_seq_len (int): Maximum sequence length for which positional encoding is generated.
- d_model (int): Dimensionality of each embedding vector.
Returns
- np.ndarray: Array of shape (max_seq_len, d_model) with positional encodings for each position.
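This signature matches the standard sinusoidal positional-encoding scheme; assuming that is what the module computes (an assumption, not confirmed by the docstring), a sketch for even d_model:

```python
import numpy as np

def positional_encoding(max_seq_len, d_model):
    """Sinusoidal positional encodings, shape (max_seq_len, d_model).

    Even dimensions get sin, odd dimensions get cos, with frequencies
    decaying geometrically with the dimension index.
    """
    pos = np.arange(max_seq_len)[:, None]      # (max_seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(10, 8)  # (10, 8)
```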
Computes scaled dot-product attention with encrypted attention weights.
Parameters
- Q (np.ndarray): Query matrix of shape (batch_size, num_heads, seq_length, d_k).
- K (np.ndarray): Key matrix of shape (batch_size, num_heads, seq_length, d_k).
- V (np.ndarray): Value matrix of shape (batch_size, num_heads, seq_length, d_v).
- ctx (EncryptionContext): The encryption context used to encrypt the attention scores.
Returns
- output (np.ndarray): Output after applying attention weights to the values, shape (batch_size, num_heads, seq_length, d_v).
- attention_weights (np.ndarray): Encrypted attention weights applied to each value.
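The encrypted version computes the attention scores under the given EncryptionContext, which typically means the softmax is replaced by a polynomial approximation; the plaintext logic it mirrors is the usual softmax(QKᵀ/√d_k)·V:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Plaintext reference: softmax(Q K^T / sqrt(d_k)) V.

    Q, K: (..., seq_length, d_k); V: (..., seq_length, d_v).
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq, seq)
    # numerically stable softmax over the last axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 2, 5, 4))   # (batch, heads, seq, d_k)
K = rng.normal(size=(1, 2, 5, 4))
V = rng.normal(size=(1, 2, 5, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of the weights sums to 1, so the output is a convex combination of the value vectors.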
Implements a multi-head attention mechanism with separate attention heads and an output projection.
Attributes
- num_heads (int): Number of attention heads.
- W_Qs (list): List of query weight matrices, one per head.
- b_Qs (list): List of query bias vectors, one per head.
- W_Ks (list): List of key weight matrices, one per head.
- b_Ks (list): List of key bias vectors, one per head.
- W_Vs (list): List of value weight matrices, one per head.
- b_Vs (list): List of value bias vectors, one per head.
- W_O (np.ndarray): Output weight matrix applied after concatenating head outputs.
- b_O (np.ndarray): Output bias vector applied after concatenating head outputs.
Initialises the MultiHeadAttention class with the specified number of heads.
Parameters
- num_heads (int): Number of attention heads.
Sets the weights and biases for a specific attention head.
Parameters
- head_index (int): Index of the attention head.
- head_weights (dict): Dictionary containing weights and biases for the specified head.
Sets the weights and biases for the output layer.
Parameters
- W_O (np.ndarray): Output weight matrix.
- b_O (np.ndarray): Output bias vector.
Computes multi-head attention for the given query, key, and value matrices.
Parameters
- Q (np.ndarray): Query matrix of shape (batch_size, seq_length, d_model).
- K (np.ndarray): Key matrix of shape (batch_size, seq_length, d_model).
- V (np.ndarray): Value matrix of shape (batch_size, seq_length, d_model).
- ctx (EncryptionContext): The encryption context used to encrypt attention scores.
- d_model (int): Dimensionality of the model.
Returns
- np.ndarray: The output of multi-head attention, shape (batch_size, seq_length, d_model).
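The per-head weight lists above (W_Qs, W_Ks, W_Vs and their biases) suggest the following plaintext structure: project Q, K, V once per head, attend, concatenate the head outputs, then apply W_O and b_O. A sketch under that assumption (not the library's encrypted implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(Q, K, V, W_Qs, b_Qs, W_Ks, b_Ks, W_Vs, b_Vs, W_O, b_O):
    """Project Q/K/V per head, attend, concatenate heads, project with W_O."""
    d_k = W_Qs[0].shape[1]
    heads = []
    for W_Q, b_Q, W_K, b_K, W_V, b_V in zip(W_Qs, b_Qs, W_Ks, b_Ks, W_Vs, b_Vs):
        q, k, v = Q @ W_Q + b_Q, K @ W_K + b_K, V @ W_V + b_V
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)
        heads.append(softmax(scores) @ v)
    return np.concatenate(heads, axis=-1) @ W_O + b_O

rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
d_k = d_model // num_heads
x = rng.normal(size=(1, 5, d_model))          # self-attention: Q = K = V = x
W_Qs = [rng.normal(size=(d_model, d_k)) for _ in range(num_heads)]
W_Ks = [rng.normal(size=(d_model, d_k)) for _ in range(num_heads)]
W_Vs = [rng.normal(size=(d_model, d_k)) for _ in range(num_heads)]
b_Qs = b_Ks = b_Vs = [np.zeros(d_k)] * num_heads
W_O, b_O = rng.normal(size=(d_model, d_model)), np.zeros(d_model)
out = multi_head_attention(x, x, x, W_Qs, b_Qs, W_Ks, b_Ks, W_Vs, b_Vs, W_O, b_O)
```

With num_heads heads of width d_k = d_model / num_heads, the concatenated output has width d_model again, so W_O is square.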
Implements a position-wise feed-forward network with two linear transformations and an activation function.
Attributes
- d_model (int): Dimensionality of the input.
- d_ff (int): Dimensionality of the hidden layer.
- W_1 (np.ndarray): Weight matrix for the first linear layer.
- b_1 (np.ndarray): Bias vector for the first linear layer.
- W_2 (np.ndarray): Weight matrix for the second linear layer.
- b_2 (np.ndarray): Bias vector for the second linear layer.
Initialises the PositionwiseFeedForwardNetwork.
Parameters
- d_model (int): Dimensionality of the input.
- d_ff (int): Dimensionality of the hidden layer.
Sets the weights and biases for the feed-forward network.
Parameters
- W_1 (np.ndarray): Weight matrix for the first linear layer.
- b_1 (np.ndarray): Bias vector for the first linear layer.
- W_2 (np.ndarray): Weight matrix for the second linear layer.
- b_2 (np.ndarray): Bias vector for the second linear layer.
Performs a forward pass through the feed-forward network.
Parameters
- x (np.ndarray): Input array with shape (batch_size, seq_length, d_model).
- ctx (EncryptionContext): Encryption context used for any approximations in activation.
Returns
- np.ndarray: Output of the feed-forward network.
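The two linear layers apply position-wise, i.e. independently to each sequence position. A plaintext sketch; the ReLU here stands in for whatever polynomial approximation the encrypted activation uses (an assumption):

```python
import numpy as np

def feed_forward(x, W_1, b_1, W_2, b_2):
    """Position-wise FFN: activation(x W_1 + b_1) W_2 + b_2."""
    h = np.maximum(0.0, x @ W_1 + b_1)   # ReLU assumed for illustration
    return h @ W_2 + b_2

rng = np.random.default_rng(0)
d_model, d_ff = 4, 16
x = rng.normal(size=(2, 3, d_model))
W_1, b_1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W_2, b_2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
y = feed_forward(x, W_1, b_1, W_2, b_2)  # shape preserved: (2, 3, 4)
```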
Implements the output linear layer for a Transformer model.
Parameters
- x (np.ndarray): Input array representing the output of the last Transformer layer.
- W (np.ndarray): Weight matrix for the linear transformation.
- b (np.ndarray): Bias vector for the linear transformation.
Returns
- np.ndarray: Output logits, with one score per vocabulary token for each input position.
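This is a plain affine map from d_model to vocab_size, applied to every position:

```python
import numpy as np

def output_layer(x, W, b):
    """Project the final hidden states onto the vocabulary: x W + b."""
    return x @ W + b

rng = np.random.default_rng(0)
d_model, vocab_size = 4, 10
x = rng.normal(size=(2, 3, d_model))         # last Transformer layer output
W = rng.normal(size=(d_model, vocab_size))
b = np.zeros(vocab_size)
logits = output_layer(x, W, b)               # (2, 3, 10)
```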
A simplified Transformer module implementing embedding, multi-head attention, positional encoding, and a feed-forward network.
Attributes
- max_seq_len (int): Maximum sequence length.
- num_heads (int): Number of attention heads.
- d_model (int): Dimensionality of embeddings.
- MHA (MultiHeadAttention): Multi-head attention layer.
- positional_encoding (np.ndarray): Positional encoding matrix.
- p_ffn (PositionwiseFeedForwardNetwork): Feed-forward network.
- output_w (np.ndarray): Weight matrix for the output layer.
- output_b (np.ndarray): Bias vector for the output layer.
Initialises the TransformerModule with necessary layers and weights.
Parameters
- encrypted_state_dict (dict): Dictionary containing pre-trained weights in encrypted form.
- max_seq_len (int): Maximum length of input sequences.
- d_model (int): Dimensionality of embeddings.
- num_heads (int): Number of attention heads.
- d_ff (int): Dimensionality of the feed-forward network's hidden layer.
- vocab_size (int): Size of the vocabulary.
Processes the input embeddings through the Transformer.
Parameters
- embeddings (np.ndarray): Input embeddings with shape (batch_size, seq_length, d_model).
- ctx (EncryptionContext): Encryption context used for any approximations in activation.
- batch_size (int): Number of input sequences in the batch.
Returns
- np.ndarray: Output after positional encoding, multi-head attention, the feed-forward network, and the output linear layer.
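Replacing the encrypted operations with their plaintext counterparts (single head, sinusoidal positional encoding, ReLU activation — all assumptions for illustration, and omitting any residual connections or normalisation the docstrings do not mention), the data flow through the module reads roughly as:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, seq, d_model, d_ff, vocab = 2, 4, 8, 16, 12

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# toy single-head weights (the real module uses num_heads separate heads)
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
W_1, W_2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
W_out = rng.normal(size=(d_model, vocab))

# sinusoidal positional encoding for seq positions
pos = np.arange(seq)[:, None]
i = np.arange(0, d_model, 2)[None, :]
angle = pos / 10000.0 ** (i / d_model)
pe = np.zeros((seq, d_model))
pe[:, 0::2], pe[:, 1::2] = np.sin(angle), np.cos(angle)

emb = rng.normal(size=(batch, seq, d_model))   # input embeddings
x = emb + pe                                   # add positional encoding
q, k, v = x @ W_q, x @ W_k, x @ W_v
att = softmax(q @ k.swapaxes(-2, -1) / np.sqrt(d_model)) @ v
x = att @ W_o                                  # attention output projection
x = np.maximum(0.0, x @ W_1) @ W_2             # feed-forward (ReLU assumed)
logits = x @ W_out                             # (batch, seq, vocab)
```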