deepword.models package¶
Submodules¶
deepword.models.dqn_modeling module¶
-
class
deepword.models.dqn_modeling.BaseDQN(hp, src_embeddings=None, is_infer=False)¶ Bases:
object
-
classmethod
get_eval_model(hp, device_placement)¶
-
get_q_actions()¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(q_actions)¶
-
classmethod
init_glove(glove_path)¶
-
classmethod
-
class
deepword.models.dqn_modeling.CnnDQN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.dqn_modeling.BaseDQN
-
get_q_actions()¶
-
get_train_op(q_actions)¶
-
-
class
deepword.models.dqn_modeling.LstmDQN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.dqn_modeling.BaseDQN
-
get_q_actions()¶
-
get_train_op(q_actions)¶
-
deepword.models.drrn_modeling module¶
-
class
deepword.models.drrn_modeling.BertDRRN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.drrn_modeling.CnnDRRN
-
__init__(hp, src_embeddings=None, is_infer=False)¶
inputs:
src: source sentences to encode
src_len: length of source sentences
action_idx: the action chosen to run
expected_q: E(q) computed from the iterative equation of DQN
actions: all possible actions
actions_len: length of actions
actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state
- Parameters
hp –
is_infer –
-
get_q_actions()¶
-
-
class
deepword.models.drrn_modeling.CnnDRRN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.dqn_modeling.BaseDQN
-
__init__(hp, src_embeddings=None, is_infer=False)¶
inputs:
src: source sentences to encode
src_len: length of source sentences
action_idx: the action chosen to run
expected_q: E(q) computed from the iterative equation of DQN
actions: all possible actions
actions_len: length of actions
- Parameters
hp –
is_infer –
-
classmethod
get_eval_model(hp, device_placement)¶
-
classmethod
get_eval_student_model(hp, device_placement)¶
-
get_q_actions()¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(q_actions)¶
-
classmethod
get_train_student_model(hp, device_placement)¶
-
-
class
deepword.models.drrn_modeling.TransformerDRRN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.drrn_modeling.CnnDRRN
-
__init__(hp, src_embeddings=None, is_infer=False)¶
inputs:
src: source sentences to encode
src_len: length of source sentences
action_idx: the action chosen to run
expected_q: E(q) computed from the iterative equation of DQN
actions: all possible actions
actions_len: length of actions
actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state
- Parameters
hp –
src_embeddings –
is_infer –
-
get_q_actions()¶
-
deepword.models.dsqn_modeling module¶
-
class
deepword.models.dsqn_modeling.CnnDSQN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.dqn_modeling.BaseDQN
DSQN that uses CNN as the trajectory encoder
-
classmethod
get_eval_model(hp, device_placement)¶
-
get_h_state(src)¶
-
get_merged_train_op(loss, snn_loss)¶
-
get_q_actions()¶
-
get_snn_train_op(semantic_same)¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(q_actions)¶
-
classmethod
get_train_student_model(hp, device_placement)¶
-
is_semantic_same()¶
-
classmethod
-
class
deepword.models.dsqn_modeling.CnnZorkDSQN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.dsqn_modeling.CnnDSQN
DSQN for Zork
-
classmethod
get_eval_model(hp, device_placement)¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(q_actions)¶
-
classmethod
-
class
deepword.models.dsqn_modeling.TransformerDSQN(hp, src_embeddings=None, is_infer=False)¶ Bases:
deepword.models.dsqn_modeling.CnnDSQN
DSQN that uses transformer as the trajectory encoder
-
get_h_state(src)¶
-
deepword.models.gen_modeling module¶
-
class
deepword.models.gen_modeling.TransformerGenDQN(hp, is_infer=False)¶ Bases:
deepword.models.dqn_modeling.BaseDQN
-
decode()¶
-
classmethod
get_eval_model(hp, device_placement)¶
-
get_q_actions()¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(q_actions)¶
-
classmethod
get_train_student_model(hp, device_placement)¶
-
-
class
deepword.models.gen_modeling.TransformerPGN(hp, is_infer=False)¶ Bases:
deepword.models.gen_modeling.TransformerGenDQN
TransformerPGN is similar to TransformerGenDQN; the only difference is that the former uses cross-entropy loss, while the latter uses MSE. Thus, TransformerPGN cannot be trained within the DQN framework. It can only be trained with supervised learning, e.g. imitation learning.
-
get_train_op(q_actions)¶ b_weight can be either:
per instance, i.e. [batch_size, 1]
per token, i.e. [batch_size, n_tokens]
-
deepword.models.models module¶
-
class
deepword.models.models.DQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor])¶ Bases:
object
-
class
deepword.models.models.DRRNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], actions_: tensorflow.python.ops.array_ops.placeholder, actions_len_: tensorflow.python.ops.array_ops.placeholder, actions_repeats_: tensorflow.python.ops.array_ops.placeholder)¶
-
class
deepword.models.models.DSQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], actions_: tensorflow.python.ops.array_ops.placeholder, actions_len_: tensorflow.python.ops.array_ops.placeholder, actions_repeats_: tensorflow.python.ops.array_ops.placeholder, snn_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], weighted_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], semantic_same: tensorflow.python.framework.ops.Tensor, snn_src_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src_len_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_len_: Optional[tensorflow.python.ops.array_ops.placeholder], labels_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_loss: Optional[tensorflow.python.framework.ops.Tensor], weighted_loss: Optional[tensorflow.python.framework.ops.Tensor], merged_train_op: Optional[tensorflow.python.framework.ops.Operation], snn_train_op: Optional[tensorflow.python.framework.ops.Operation], h_states_diff: Optional[tensorflow.python.framework.ops.Tensor])¶
-
class
deepword.models.models.DSQNZorkModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], snn_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], weighted_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], semantic_same: tensorflow.python.framework.ops.Tensor, snn_src_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src_len_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_len_: Optional[tensorflow.python.ops.array_ops.placeholder], labels_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_loss: Optional[tensorflow.python.framework.ops.Tensor], weighted_loss: Optional[tensorflow.python.framework.ops.Tensor], merged_train_op: Optional[tensorflow.python.framework.ops.Operation], snn_train_op: Optional[tensorflow.python.framework.ops.Operation], h_states_diff: Optional[tensorflow.python.framework.ops.Tensor])¶
-
class
deepword.models.models.GenDQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], decoded_idx_infer: tensorflow.python.framework.ops.Tensor, action_idx_out_: tensorflow.python.ops.array_ops.placeholder, action_len_: tensorflow.python.ops.array_ops.placeholder, temperature_: tensorflow.python.ops.array_ops.placeholder, p_gen: tensorflow.python.framework.ops.Tensor, p_gen_infer: tensorflow.python.framework.ops.Tensor, beam_size_: tensorflow.python.ops.array_ops.placeholder, use_greedy_: tensorflow.python.ops.array_ops.placeholder, col_eos_idx: tensorflow.python.framework.ops.Tensor, decoded_logits_infer: tensorflow.python.framework.ops.Tensor)¶
-
class
deepword.models.models.NLUModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], classification_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], seg_tj_action_: tensorflow.python.ops.array_ops.placeholder, swag_labels_: Optional[tensorflow.python.ops.array_ops.placeholder], classification_loss: Optional[tensorflow.python.framework.ops.Tensor], classification_train_op: Optional[tensorflow.python.framework.ops.Operation])¶
-
class
deepword.models.models.SNNModel(graph: tensorflow.python.framework.ops.Graph, target_src_: tensorflow.python.ops.array_ops.placeholder, same_src_: tensorflow.python.ops.array_ops.placeholder, diff_src_: tensorflow.python.ops.array_ops.placeholder, semantic_same: tensorflow.python.framework.ops.Operation, train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], train_summary_op: Optional[tensorflow.python.framework.ops.Operation])¶ Bases:
object
deepword.models.nlu_modeling module¶
-
class
deepword.models.nlu_modeling.AlbertNLU(hp, is_infer=False)¶ Bases:
deepword.models.nlu_modeling.BertNLU
-
__init__(hp, is_infer=False)¶
inputs:
src: source sentences to encode, with paddings, [CLS], and [SEP] prepared
src_len: length of source sentences
action_idx: the action chosen to run
expected_q: E(q) computed from the iterative equation of DQN
actions: all possible actions
actions_len: length of actions
actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state
- Parameters
hp –
is_infer –
-
get_q_actions()¶
-
-
class
deepword.models.nlu_modeling.BertNLU(hp, is_infer=False)¶ Bases:
deepword.models.dqn_modeling.BaseDQN
-
__init__(hp, is_infer=False)¶
inputs:
src: source sentences to encode, with paddings, [CLS], and [SEP] prepared
src_len: length of source sentences
action_idx: the action chosen to run
expected_q: E(q) computed from the iterative equation of DQN
actions: all possible actions
actions_len: length of actions
actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state
- Parameters
hp –
is_infer –
-
get_classification_train_op(q_actions)¶ q_actions has shape [batch_size, 1] in this case. To compute the classification error, batch_size must equal src batch size * num classes, which means that the number of classes must be the same for every src.
-
classmethod
get_eval_model(hp, device_placement)¶
-
classmethod
get_eval_student_model(hp, device_placement)¶
-
get_q_actions()¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(q_actions)¶
-
classmethod
get_train_student_model(hp, device_placement)¶
-
-
deepword.models.nlu_modeling.create_eval_bert_nlu_model(model_creator, hp, device_placement)¶
-
deepword.models.nlu_modeling.create_train_bert_nlu_model(model_creator, hp, device_placement)¶
deepword.models.snn_modeling module¶
-
class
deepword.models.snn_modeling.BertSNN(hp, is_infer=False)¶ Bases:
object
Use SNN to encode sentences for additive features representation learning
-
add_cls_token(src)¶
-
classmethod
get_eval_model(hp, device_placement)¶
-
classmethod
get_eval_student_model(hp, device_placement)¶
-
get_h_state(raw_src)¶
-
classmethod
get_train_model(hp, device_placement)¶
-
get_train_op(semantic_same)¶
-
classmethod
get_train_student_model(hp, device_placement)¶
-
is_semantic_same()¶
-
deepword.models.transformer module¶
Copied from https://www.tensorflow.org/beta/tutorials/text/transformer
The decode function was added by Xusen Yin.
-
class
deepword.models.transformer.Decoder(num_layers, d_model, num_heads, dff, tgt_vocab_size, dropout_rate=0.1, with_pointer=False)¶ Bases:
tensorflow.python.keras.engine.base_layer.Layer
-
call(x, enc_x, enc_output, training, look_ahead_mask, padding_mask, copy_mask=None)¶ decode with pointer
- Parameters
x – decoder input
enc_x – encoder input
enc_output – encoder encoded result
training – is training or inference
look_ahead_mask – combined look ahead mask with padding mask
padding_mask – padding mask for source sentence
copy_mask – dense vector size |V| to mark all tokens that skip copying with 1; otherwise, 0.
- Returns
total logits, probability of generation, gen logits, copy logits
-
-
class
deepword.models.transformer.DecoderLayer(d_model, num_heads, dff, rate=0.1)¶ Bases:
tensorflow.python.keras.engine.base_layer.Layer
-
call(x, enc_output, training, look_ahead_mask, padding_mask)¶ This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
-
-
class
deepword.models.transformer.Encoder(num_layers, d_model, num_heads, dff, input_vocab_size, dropout_rate=0.1)¶ Bases:
tensorflow.python.keras.engine.base_layer.Layer
-
call(x, training=None, mask=None, x_seg=None)¶ This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
-
-
class
deepword.models.transformer.EncoderLayer(d_model, num_heads, dff, rate=0.1)¶ Bases:
tensorflow.python.keras.engine.base_layer.Layer
-
call(x, training, mask)¶ This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
-
-
class
deepword.models.transformer.MultiHeadAttention(d_model, num_heads)¶ Bases:
tensorflow.python.keras.engine.base_layer.Layer
-
call(v, k, q, mask)¶ This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
-
split_heads(x, batch_size)¶ Split the last dimension into (num_heads, depth). Transpose the result such that the shape is
(batch_size, num_heads, seq_len, depth)
-
-
class
deepword.models.transformer.Transformer(num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, dropout_rate=0.1, with_pointer=True)¶ Bases:
tensorflow.python.keras.engine.training.Model
-
call(inp, tar, training, copy_mask=None)¶ Calls the model on new inputs.
In this case call just reapplies all ops in the graph to the new inputs (e.g. build a new computational graph from the provided inputs).
- Parameters
inputs – A tensor or list of tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.
mask – A mask or list of masks. A mask can be either a tensor or None (no mask).
- Returns
A tensor if there is a single output, or a list of tensors if there are more than one outputs.
-
decode(enc_x, training, max_tar_len, sos_id, eos_id, padding_id, use_greedy=True, beam_size=1, temperature=1.0, copy_mask=None)¶
-
-
deepword.models.transformer.categorical_with_replacement(logits, k: int)¶
-
deepword.models.transformer.categorical_without_replacement(logits, k: int)¶ Courtesy of https://github.com/tensorflow/tensorflow/issues/9260#issuecomment-437875125
Also cited here:
@misc{vieira2014gumbel,
  title = {Gumbel-max trick and weighted reservoir sampling},
  author = {Tim Vieira},
  url = {http://timvieira.github.io/blog/post/2014/08/01/gumbel-max-trick-and-weighted-reservoir-sampling/},
  year = {2014}
}
Notice that the logits represent unnormalized log probabilities. In the citation above, there is no need to normalize them before adding the Gumbel random variable, which surprised me, since I thought it should be logits - tf.reduce_logsumexp(logits) + z.
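The note above can be checked with a small standalone sketch of the Gumbel top-k trick. It is written in plain Python rather than TensorFlow, and the name `gumbel_top_k` and the `rng` argument are illustrative assumptions, not part of deepword. Subtracting the log-normalizer shifts every logit by the same constant, so the ranking of the perturbed values, and therefore the sample, should not change.

```python
import math
import random


def gumbel_top_k(logits, k, rng):
    """Sample k distinct indices by adding Gumbel noise and taking the top k."""
    keys = []
    for i, logit in enumerate(logits):
        # Clamp away from 0 so that log(u) is always defined.
        u = max(rng.random(), 1e-300)
        gumbel = -math.log(-math.log(u))
        keys.append((logit + gumbel, i))
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


logits = [2.0, 1.0, 0.0, -1.0]
log_z = math.log(sum(math.exp(l) for l in logits))
normalized = [l - log_z for l in logits]

# Same seed, hence the same noise: only the constant shift differs.
raw_sample = gumbel_top_k(logits, 2, random.Random(0))
norm_sample = gumbel_top_k(normalized, 2, random.Random(0))
print(raw_sample, norm_sample)
```

Both calls receive identical noise, so they differ only by the constant `log_z` added to every key.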
-
deepword.models.transformer.create_decode_masks(tar)¶ Create masking for decoding
This masking combines the look ahead mask and target sentence padding mask.
We create look ahead mask for each sentence;
We combine the sentence padding mask with the look ahead mask, e.g. when the look ahead mask says “0” for a token while the sentence padding mask says “1” for the same token because the token is padding, then the final mask for this token is “1”.
- Parameters
tar – target sentence, shape: (batch_size, seq_len_k)
- Returns
a combined mask of look ahead mask and padding mask, shape (batch_size, 1, seq_len_k, seq_len_k)
Examples
>>> tar_src = [[1,2,3,4,0,0], [1,3,0,0,0,0]]
>>> create_decode_masks(tar_src)
array([[[[0., 1., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 0., 1., 1., 1.],
         [0., 0., 0., 0., 1., 1.],
         [0., 0., 0., 0., 1., 1.],
         [0., 0., 0., 0., 1., 1.]]],
       [[[0., 1., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.]]]], dtype=float32)
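The combination rule described for create_decode_masks amounts to an element-wise maximum of the two masks. The following is a minimal sketch in plain Python lists, assuming padding id 0; `decode_masks_sketch` is an illustrative name and not the library function.

```python
def decode_masks_sketch(tar, pad_id=0):
    """Return masks of shape (batch_size, 1, seq_len, seq_len); 1.0 = masked."""
    masks = []
    for row in tar:
        n = len(row)
        combined = [
            # Masked if the key position j is in the future (j > i)
            # or if the token at position j is padding.
            [1.0 if (j > i or row[j] == pad_id) else 0.0 for j in range(n)]
            for i in range(n)
        ]
        masks.append([combined])
    return masks


masks = decode_masks_sketch([[1, 2, 3, 4, 0, 0], [1, 3, 0, 0, 0, 0]])
for line in masks[0][0]:
    print(line)
```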
-
deepword.models.transformer.create_look_ahead_mask(size: int)¶ create look ahead mask for decoding
At every decoding step i, only t_0, …, t_i can be accessed by the model, while t_{i+1}, …, t_n should be masked out.
- Parameters
size – decoding output size
- Returns
look ahead mask, True means masked out.
Examples
>>> create_look_ahead_mask(3)
array([[0., 1., 1.],
       [0., 0., 1.],
       [0., 0., 0.]], dtype=float32)
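As a rough illustration of the rule above (step i may see t_0, …, t_i only), the mask is the strict upper triangle of a square matrix. This sketch uses nested lists instead of tensors, and `look_ahead_mask_sketch` is an illustrative name.

```python
def look_ahead_mask_sketch(size):
    """Return a size x size mask; 1.0 marks a position that is masked out."""
    # Row i is decoding step i; column j is masked when j > i (a future token).
    return [[1.0 if j > i else 0.0 for j in range(size)] for i in range(size)]


mask3 = look_ahead_mask_sketch(3)
for row in mask3:
    print(row)
```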
-
deepword.models.transformer.create_padding_mask(seq)¶ Padding value should be 0. This mask contains one dimension for num_heads, i.e. (batch_size, <broadcast to num_heads>, <broadcast to seq_len_q>, seq_len_k)
- Parameters
seq – (batch_size, seq_len_k)
- Returns
padding mask in which padding positions are set to True and all others to False; shape: (batch_size, 1, 1, seq_len_k)
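A minimal sketch of the shape convention, assuming padding id 0 and using nested lists in place of tensors; `padding_mask_sketch` is an illustrative name. The two singleton axes are the ones that broadcast over num_heads and seq_len_q.

```python
def padding_mask_sketch(seq, pad_id=0):
    """Return a mask of shape (batch_size, 1, 1, seq_len_k); 1.0 = padding."""
    return [[[[1.0 if tok == pad_id else 0.0 for tok in row]]] for row in seq]


pad_mask = padding_mask_sketch([[7, 6, 0], [3, 0, 0]])
print(pad_mask)
```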
-
deepword.models.transformer.decode_next_step(decoder, time, enc_x, enc_output, training, dec_padding_mask, copy_mask, batch_size, tgt_vocab_size, eos_id, padding_id, beam_size, use_greedy, temperature, inc_tar, inc_continue, inc_valid_len, inc_p_gen, inc_sum_logits)¶ Decode one step with beam search. Given inc_tar as the current decoded target sequence (batch_size * beam_size), first decode one step with the decoder to get decoded_logits, then mask the decoded_logits:
if decoding continues (i.e. EOS has not been reached) and the current time reaches max_tar_len, then only EOS can be chosen;
if decoding does not continue, only PAD can be chosen;
by default, decoded_logits is not masked.
After getting predicted_id, by either sampling or the greedy method, we compute 1) beam_id and 2) token_id from predicted_id. beam_id indicates which beam to choose; token_id indicates which token to choose under that beam.
For the loop variables inc_tar, inc_continue, inc_logits, inc_valid_len, and inc_p_gen, we first select rows according to beam_id, then append the token_id-related info to the end. E.g. given beam_size = 2 and batch_size = 2, we have inc_tar:
[[[1, 2, 3],
  [2, 3, 4]],  # --> this beam row will be deleted
 [[9, 8, 7],
  [8, 7, 6]]]
If beam_id = [[0, 0], [0, 1]], then we choose [1, 2, 3] twice, [9, 8, 7] once, and [8, 7, 6] once, making inc_tar:
[[[1, 2, 3],
  [1, 2, 3]],
 [[9, 8, 7],
  [8, 7, 6]]]
Then append the new token_id to the end.
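The row selection in this example can be sketched with plain lists. `select_and_extend` and the token_id values below are hypothetical, chosen only to illustrate the gather-then-append step; the actual function operates on tensors inside a decoding loop.

```python
def select_and_extend(inc_tar, beam_id, token_id):
    """For each batch element, gather beams by beam_id and append token_id."""
    new_tar = []
    for beams, chosen, tokens in zip(inc_tar, beam_id, token_id):
        new_tar.append([beams[b] + [t] for b, t in zip(chosen, tokens)])
    return new_tar


inc_tar = [[[1, 2, 3], [2, 3, 4]],
           [[9, 8, 7], [8, 7, 6]]]
beam_id = [[0, 0], [0, 1]]
token_id = [[4, 5], [6, 5]]  # hypothetical next tokens

new_inc_tar = select_and_extend(inc_tar, beam_id, token_id)
print(new_inc_tar)
```

The beam [2, 3, 4] is never selected, so it does not appear in the result, as in the example above.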
-
deepword.models.transformer.get_sparse_idx_for_copy(src, target_seq_len: int)¶ Create sparse index from source sentence for copying into decoder using the tf.scatter_nd method.
Considering the following source sentence: “a, b, a, c”; turn it into indices: [0, 1, 0, 2], and they have attention weights attn = [a0, a1, a2, a3].
Now we want to decode a sentence with 3 tokens, for each generated token, we want to collect attention weights from the source sentence, and mix with the logits to generate the current token.
I.e. for decoded sentence position i, we have logits(i) = [0.1, 0.2, 0.3, 0.5] for all possible tokens a, b, c, d. Then we want to sum the attention weights of two-0s, one-1, and one-2 into the logits(i) according to a generation weight p(i), i.e. total logits = logits(i) * p(i) + [a0 + a2, a1, a3, 0] * (1 - p(i)).
The goal is to create a dense vector of vocabulary size, and copy attention weights from source sentence to the dense vector.
We create an inverse index to do so. For target token i, we need to collect [(0, a0), (1, a1), (0, a2), (2, a3)] to construct the vector.
- Parameters
src – source sentence
target_seq_len – target sequence len
- Returns
sparse index to construct attention weight matrix for a batch
Examples
>>> get_sparse_idx_for_copy(src=[[0, 1, 0, 2]], target_seq_len=3)
array([[[[0, 0], [0, 1], [0, 0], [0, 2]],
        [[1, 0], [1, 1], [1, 0], [1, 2]],
        [[2, 0], [2, 1], [2, 0], [2, 2]]]], dtype=int32)
shape: (1, 3, 4, 2)  # batch_size, target sentence len, source sentence len, 2D matrix indices
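To make the index layout concrete, here is a plain-Python sketch that builds the same index and then sums attention weights into a dense vocabulary-sized vector, mimicking how a scatter with duplicate indices accumulates values. The names `sparse_idx_sketch` and `scatter_add` and the attention values are illustrative assumptions, not the library implementation.

```python
def sparse_idx_sketch(src, target_seq_len):
    """idx[b][t][s] == [t, src[b][s]]: (target position, source token id)."""
    return [[[[t, tok] for tok in row] for t in range(target_seq_len)]
            for row in src]


def scatter_add(idx, attn, target_seq_len, vocab_size):
    """Sum attention weights into a dense (target_seq_len, vocab_size) grid."""
    dense = [[0.0] * vocab_size for _ in range(target_seq_len)]
    for idx_row, attn_row in zip(idx, attn):
        for (t, tok), weight in zip(idx_row, attn_row):
            dense[t][tok] += weight  # duplicate token ids accumulate
    return dense


idx = sparse_idx_sketch([[0, 1, 0, 2]], target_seq_len=3)
# Hypothetical attention weights [a0, a1, a2, a3], repeated for each target step.
attn = [[0.125, 0.25, 0.5, 0.125]] * 3
copy_dist = scatter_add(idx[0], attn, target_seq_len=3, vocab_size=4)
print(copy_dist[0])  # [a0 + a2, a1, a3, 0]
```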
-
deepword.models.transformer.nucleus_renormalization(logits, p=0.95)¶ Refer to [Holtzman et al., 2020] for nucleus sampling
- Parameters
logits – last-dimension logits of vocabulary V; 2D array, [batch, V] or [batch*beam, V]
p – the cumulative probability bound, default 0.95;
- Returns
normalized nucleus logits
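A rough single-row sketch of nucleus (top-p) renormalization follows. It assumes the nucleus is the smallest set of highest-probability tokens whose cumulative probability reaches p, and that tokens outside it receive -inf; the actual function works on 2D batches and may differ in details. `nucleus_logits_sketch` is an illustrative name.

```python
import math


def nucleus_logits_sketch(logits, p=0.95):
    """Keep the smallest top-probability set with mass >= p; mask the rest."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= p:
            break

    # Renormalize inside the nucleus and return log-probabilities.
    return [math.log(probs[i] / mass) if i in keep else -math.inf
            for i in range(len(logits))]


row = [math.log(0.5), math.log(0.3), math.log(0.15), math.log(0.05)]
nucleus = nucleus_logits_sketch(row, p=0.9)
print(nucleus)
```

With p = 0.9, the first three tokens (cumulative probability of about 0.95) form the nucleus and the last token is masked out.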
-
deepword.models.transformer.point_wise_feed_forward_network(d_model, dff)¶ Two dense layers, one with activation, the second without activation.
- Parameters
d_model – model size
dff – intermediate size
- Returns
FFN(x)
-
deepword.models.transformer.scaled_dot_product_attention(q, k, v, mask)¶ Calculate the attention weights. q, k, v must have matching leading dimensions. k, v must have matching penultimate dimension, i.e.: seq_len_k = seq_len_v. The mask has different shapes depending on its type(padding or look ahead) but it must be broadcastable for addition.
- Parameters
q – query shape == (…, seq_len_q, depth)
k – key shape == (…, seq_len_k, depth)
v – value shape == (…, seq_len_v, depth_v)
mask – Float tensor with shape broadcastable to (…, seq_len_q, seq_len_k). Defaults to None.
Notice that mask must have the same dimensions as q, k, v.
e.g. if q, k, v are (batch_size, num_heads, seq_len, depth), then the mask should also be (batch_size, num_heads, seq_len, depth). However, if q, k, v are (batch_size, seq_len, depth), then the mask should not contain num_heads either.
- Returns
output (a.k.a. context vectors), scaled_attention_logits
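For orientation, here is a single-head, unbatched sketch of scaled dot-product attention in plain Python. It assumes the convention used by the mask helpers in this module (1 means masked out) and applies the mask by adding a large negative number to the scaled logits before the softmax. Unlike the function documented above, this sketch returns the attention weights as its second value; `attention_sketch` is an illustrative name.

```python
import math


def attention_sketch(q, k, v, mask=None):
    """q: (seq_len_q, depth), k: (seq_len_k, depth), v: (seq_len_k, depth_v)."""
    depth = len(k[0])
    outputs, all_weights = [], []
    for i, q_row in enumerate(q):
        # Scaled dot product between one query and every key.
        logits = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(depth)
                  for k_row in k]
        if mask is not None:
            logits = [l + m * -1e9 for l, m in zip(logits, mask[i])]
        # Numerically stable softmax over the key axis.
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v_row[d] for w, v_row in zip(weights, v))
                        for d in range(len(v[0]))])
        all_weights.append(weights)
    return outputs, all_weights


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
context, weights = attention_sketch(q, k, v, mask=[[0.0, 1.0]])
print(context, weights)
```

Because the second key is masked, all attention weight falls on the first key and the context vector equals the first value row.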
-
deepword.models.transformer.sequential_decoding(decoder, copy_mask, enc_x, enc_output, training, max_tar_len, sos_id, eos_id, padding_id, use_greedy=True, beam_size=1, temperature=1.0)¶
-
deepword.models.transformer.token_logit_masking(token_id: int, vocab_size: int)¶ Generate logits that force the choice of token_id. E.g. with vocab_size = 10 and token_id = 0, we have [ 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf]. When this mask is added to normal logits, only token_id = 0 can be chosen.
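The effect of adding such a mask can be seen in a short plain-Python sketch; `token_logit_mask_sketch` and the example logits are illustrative assumptions.

```python
import math


def token_logit_mask_sketch(token_id, vocab_size):
    """Return 0.0 at token_id and -inf everywhere else."""
    return [0.0 if i == token_id else -math.inf for i in range(vocab_size)]


logits = [1.5, 3.0, -0.5, 2.0]
mask = token_logit_mask_sketch(0, len(logits))
masked = [l + m for l, m in zip(logits, mask)]

# The argmax is forced to token 0 even though token 1 had the largest logit.
best = max(range(len(masked)), key=masked.__getitem__)
print(masked, best)
```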
deepword.models.utils module¶
-
deepword.models.utils.encoder_cnn(src, src_embeddings, pos_embeddings, filter_sizes, num_filters, embedding_size, is_infer=False, num_channels=2, activation='tanh')¶ encode state with CNN, refer to Convolutional Neural Networks for Sentence Classification
- Parameters
src – placeholder, (tf.int32, [batch_size, seq_len])
src_embeddings – (tf.float32, [vocab_size, embedding_size])
pos_embeddings – (tf.float32, [max_position_size, embedding_size])
filter_sizes – list of ints, e.g. [3, 4, 5]
num_filters – number of filters of each filter_size
embedding_size – embedding size
is_infer – training or inference
num_channels – 1 or 2.
activation – tanh (default) or relu
- Returns
a vector as the inner state
-
deepword.models.utils.encoder_cnn_base(input_tensor, filter_sizes, num_filters, num_channels, embedding_size, is_infer=False, activation='tanh')¶ We pad input_tensor at the head of each string to generate equal-size output. E.g.
go north forest path this is a path …
given conv-filter size 3, it will be padded at the head with two tokens:
<S> <S> go north forest path this is a path …
OR
[PAD] [PAD] go north forest path this is a path …
The choice of padding value doesn’t matter as long as it is a special token and is identical for each model.
We use constant value 0 here, so make sure index-0 is a special token that can be used for padding in your vocabulary.
- Parameters
input_tensor – (tf.float32, [batch_size, seq_len, embedding_size, num_channels])
filter_sizes – list of ints, e.g. [3, 4, 5]
num_filters – number of filters for each filter size
num_channels – 1 or 2, depending on the input tensor
embedding_size – word embedding size
is_infer – training or infer
activation – choose from “tanh” or “relu”. Notice that if you choose relu, make sure to add an extra dense layer; otherwise the output contains only non-negative values.
- Returns
a vector as the inner state
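The head-padding arithmetic can be illustrated without TensorFlow: padding filter_size - 1 positions at the head means a filter of that size produces exactly one output per original token. The helper names and token ids below are illustrative assumptions.

```python
def pad_head(token_ids, filter_size, pad_id=0):
    """Prepend filter_size - 1 padding ids, as described above."""
    return [pad_id] * (filter_size - 1) + list(token_ids)


def sliding_windows(token_ids, filter_size):
    """All contiguous windows a 1D filter of this size would see."""
    return [token_ids[i:i + filter_size]
            for i in range(len(token_ids) - filter_size + 1)]


tokens = [5, 6, 7, 8]  # hypothetical ids for "go north forest path"
padded = pad_head(tokens, filter_size=3)
windows = sliding_windows(padded, filter_size=3)
print(padded)
print(windows)
```

Each window ends at one of the original tokens, so the number of windows equals the original sequence length for any filter size.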
-
deepword.models.utils.encoder_lstm(src, src_len, src_embeddings, num_units, num_layers)¶ encode state with LSTM
- Parameters
src – placeholder, (tf.int32, [None, None])
src_len – placeholder, (tf.float32, [None])
src_embeddings – (tf.float32, [vocab_size, embedding_size])
num_units – number of LSTM units
num_layers – number of LSTM layers
- Returns
inner states (c, h)
-
deepword.models.utils.l2_loss_1d_action(q_actions, action_idx, expected_q, b_weight)¶ l2 loss for 1D action space. Only the q values in q_actions selected by action_idx are compared against expected_q.
e.g. “go east” would be one whole action. action_idx should have the same dimension as expected_q.
- Parameters
q_actions – q-values
action_idx – placeholder, the action chosen for the state, in a format of (tf.int32, [None])
expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None])
b_weight – weights for each data point
- Returns
l2 loss and l1 loss
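A minimal sketch of the selection step, in plain Python. The reduction used here (a weighted mean of squared errors for the l2 loss, and per-example absolute errors for the l1 loss) is an assumption made for illustration; the library may reduce differently. `l2_loss_1d_action_sketch` is an illustrative name.

```python
def l2_loss_1d_action_sketch(q_actions, action_idx, expected_q, b_weight):
    """Compare only the q value of the chosen action against expected_q."""
    # Pick q_actions[b][action_idx[b]] for every example b in the batch.
    predicted_q = [row[a] for row, a in zip(q_actions, action_idx)]
    abs_loss = [abs(e - p) for e, p in zip(expected_q, predicted_q)]
    # Assumed reduction: weighted mean of the squared errors.
    loss = sum(w * d * d for w, d in zip(b_weight, abs_loss)) / len(abs_loss)
    return loss, abs_loss


loss, abs_loss = l2_loss_1d_action_sketch(
    q_actions=[[1.0, 2.0], [0.5, 4.0]],
    action_idx=[1, 0],
    expected_q=[3.0, 0.5],
    b_weight=[1.0, 1.0],
)
print(loss, abs_loss)
```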
-
deepword.models.utils.l2_loss_1d_action_v2(q_actions, action_idx, expected_q, n_actions, b_weight)¶ l2 loss for 1D action space. e.g. “go east” would be one whole action.
- Parameters
q_actions – Q-vector of a state for all actions
action_idx – placeholder, the action chosen for the state, in a format of (tf.int32, [None])
expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None])
n_actions – number of total actions
b_weight – weights for each data point
- Returns
l2 loss and l1 loss
-
deepword.models.utils.l2_loss_2d_action(q_actions, action_idx, expected_q, vocab_size, action_len, max_action_len, b_weight)¶ l2 loss for 2D action space. e.g. “go east” is an action composed of “go” and “east”.
- Parameters
q_actions – Q-matrix of a state for all action-components, e.g. tokens
action_idx – placeholder, the action-components chosen for the state, in a format of (tf.int32, [None, None])
expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None])
vocab_size – number of action-components
action_len – length of each action in a format of (tf.int32, [None])
max_action_len – maximum length of action
b_weight – weights for each data point
- Returns
l2 loss and l1 loss
-
deepword.models.utils.positional_encoding(position, d_model)¶ Create position embeddings with sin/cos; no training is needed.
- Parameters
position – maximum position size
d_model – embedding size
- Returns
position embeddings in shape (1, position, d_model)
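The sketch below follows the standard sinusoidal formulation, with sine on even dimensions and cosine on odd dimensions. Whether the library interleaves or concatenates the two halves is an assumption here, and `positional_encoding_sketch` is an illustrative name.

```python
import math


def positional_encoding_sketch(position, d_model):
    """Return nested lists of shape (1, position, d_model)."""
    table = []
    for pos in range(position):
        row = []
        for i in range(d_model):
            # Dimensions 2j and 2j + 1 share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return [table]


pe = positional_encoding_sketch(position=5, d_model=4)
print(pe[0][0])
```

Position 0 has sin(0) = 0 on the even dimensions and cos(0) = 1 on the odd ones.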
