deepword.models package

Submodules

deepword.models.dqn_modeling module

class deepword.models.dqn_modeling.BaseDQN(hp, src_embeddings=None, is_infer=False)

Bases: object

classmethod get_eval_model(hp, device_placement)
get_q_actions()
classmethod get_train_model(hp, device_placement)
get_train_op(q_actions)
classmethod init_glove(glove_path)
class deepword.models.dqn_modeling.CnnDQN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.dqn_modeling.BaseDQN

get_q_actions()
get_train_op(q_actions)
class deepword.models.dqn_modeling.LstmDQN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.dqn_modeling.BaseDQN

get_q_actions()
get_train_op(q_actions)

deepword.models.drrn_modeling module

class deepword.models.drrn_modeling.BertDRRN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.drrn_modeling.CnnDRRN

__init__(hp, src_embeddings=None, is_infer=False)
inputs:

  src: source sentences to encode
  src_len: length of source sentences
  action_idx: the action chosen to run
  expected_q: E(q) computed from the iterative equation of DQN
  actions: all possible actions
  actions_len: length of actions
  actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state

Parameters
  • hp

  • is_infer

get_q_actions()
class deepword.models.drrn_modeling.CnnDRRN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.dqn_modeling.BaseDQN

__init__(hp, src_embeddings=None, is_infer=False)
inputs:

  src: source sentences to encode
  src_len: length of source sentences
  action_idx: the action chosen to run
  expected_q: E(q) computed from the iterative equation of DQN
  actions: all possible actions
  actions_len: length of actions

Parameters
  • hp

  • is_infer

classmethod get_eval_model(hp, device_placement)
classmethod get_eval_student_model(hp, device_placement)
get_q_actions()
classmethod get_train_model(hp, device_placement)
get_train_op(q_actions)
classmethod get_train_student_model(hp, device_placement)
class deepword.models.drrn_modeling.TransformerDRRN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.drrn_modeling.CnnDRRN

__init__(hp, src_embeddings=None, is_infer=False)
inputs:

  src: source sentences to encode
  src_len: length of source sentences
  action_idx: the action chosen to run
  expected_q: E(q) computed from the iterative equation of DQN
  actions: all possible actions
  actions_len: length of actions
  actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state

Parameters
  • hp

  • src_embeddings

  • is_infer

get_q_actions()

deepword.models.dsqn_modeling module

class deepword.models.dsqn_modeling.CnnDSQN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.dqn_modeling.BaseDQN

DSQN that uses CNN as the trajectory encoder

classmethod get_eval_model(hp, device_placement)
get_h_state(src)
get_merged_train_op(loss, snn_loss)
get_q_actions()
get_snn_train_op(semantic_same)
classmethod get_train_model(hp, device_placement)
get_train_op(q_actions)
classmethod get_train_student_model(hp, device_placement)
is_semantic_same()
class deepword.models.dsqn_modeling.CnnZorkDSQN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.dsqn_modeling.CnnDSQN

DSQN for Zork

classmethod get_eval_model(hp, device_placement)
classmethod get_train_model(hp, device_placement)
get_train_op(q_actions)
class deepword.models.dsqn_modeling.TransformerDSQN(hp, src_embeddings=None, is_infer=False)

Bases: deepword.models.dsqn_modeling.CnnDSQN

DSQN that uses transformer as the trajectory encoder

get_h_state(src)

deepword.models.gen_modeling module

class deepword.models.gen_modeling.TransformerGenDQN(hp, is_infer=False)

Bases: deepword.models.dqn_modeling.BaseDQN

decode()
classmethod get_eval_model(hp, device_placement)
get_q_actions()
classmethod get_train_model(hp, device_placement)
get_train_op(q_actions)
classmethod get_train_student_model(hp, device_placement)
class deepword.models.gen_modeling.TransformerPGN(hp, is_infer=False)

Bases: deepword.models.gen_modeling.TransformerGenDQN

TransformerPGN is similar to TransformerGenDQN; the only difference is that the former uses a cross-entropy loss while the latter uses MSE. Thus, TransformerPGN cannot be trained within the DQN framework. It can only be trained with supervised learning, e.g. imitation learning.

get_train_op(q_actions)
b_weight could be
  1. per instance, i.e. [batch_size, 1]

  2. per token, i.e. [batch_size, n_tokens]

deepword.models.models module

class deepword.models.models.DQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor])

Bases: object

class deepword.models.models.DRRNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], actions_: tensorflow.python.ops.array_ops.placeholder, actions_len_: tensorflow.python.ops.array_ops.placeholder, actions_repeats_: tensorflow.python.ops.array_ops.placeholder)

Bases: deepword.models.models.DQNModel

class deepword.models.models.DSQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], actions_: tensorflow.python.ops.array_ops.placeholder, actions_len_: tensorflow.python.ops.array_ops.placeholder, actions_repeats_: tensorflow.python.ops.array_ops.placeholder, snn_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], weighted_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], semantic_same: tensorflow.python.framework.ops.Tensor, snn_src_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src_len_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_len_: Optional[tensorflow.python.ops.array_ops.placeholder], labels_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_loss: Optional[tensorflow.python.framework.ops.Tensor], weighted_loss: Optional[tensorflow.python.framework.ops.Tensor], merged_train_op: Optional[tensorflow.python.framework.ops.Operation], snn_train_op: Optional[tensorflow.python.framework.ops.Operation], h_states_diff: Optional[tensorflow.python.framework.ops.Tensor])

Bases: deepword.models.models.DRRNModel

class deepword.models.models.DSQNZorkModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], snn_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], weighted_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], semantic_same: tensorflow.python.framework.ops.Tensor, snn_src_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src_len_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_len_: Optional[tensorflow.python.ops.array_ops.placeholder], labels_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_loss: Optional[tensorflow.python.framework.ops.Tensor], weighted_loss: Optional[tensorflow.python.framework.ops.Tensor], merged_train_op: Optional[tensorflow.python.framework.ops.Operation], snn_train_op: Optional[tensorflow.python.framework.ops.Operation], h_states_diff: Optional[tensorflow.python.framework.ops.Tensor])

Bases: deepword.models.models.DQNModel

class deepword.models.models.GenDQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], decoded_idx_infer: tensorflow.python.framework.ops.Tensor, action_idx_out_: tensorflow.python.ops.array_ops.placeholder, action_len_: tensorflow.python.ops.array_ops.placeholder, temperature_: tensorflow.python.ops.array_ops.placeholder, p_gen: tensorflow.python.framework.ops.Tensor, p_gen_infer: tensorflow.python.framework.ops.Tensor, beam_size_: tensorflow.python.ops.array_ops.placeholder, use_greedy_: tensorflow.python.ops.array_ops.placeholder, col_eos_idx: tensorflow.python.framework.ops.Tensor, decoded_logits_infer: tensorflow.python.framework.ops.Tensor)

Bases: deepword.models.models.DQNModel

class deepword.models.models.NLUModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], classification_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], seg_tj_action_: tensorflow.python.ops.array_ops.placeholder, swag_labels_: Optional[tensorflow.python.ops.array_ops.placeholder], classification_loss: Optional[tensorflow.python.framework.ops.Tensor], classification_train_op: Optional[tensorflow.python.framework.ops.Operation])

Bases: deepword.models.models.DQNModel

class deepword.models.models.SNNModel(graph: tensorflow.python.framework.ops.Graph, target_src_: tensorflow.python.ops.array_ops.placeholder, same_src_: tensorflow.python.ops.array_ops.placeholder, diff_src_: tensorflow.python.ops.array_ops.placeholder, semantic_same: tensorflow.python.framework.ops.Operation, train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], train_summary_op: Optional[tensorflow.python.framework.ops.Operation])

Bases: object

deepword.models.nlu_modeling module

class deepword.models.nlu_modeling.AlbertNLU(hp, is_infer=False)

Bases: deepword.models.nlu_modeling.BertNLU

__init__(hp, is_infer=False)
inputs:
  src: source sentences to encode, with paddings, [CLS], and [SEP] already prepared
  src_len: length of source sentences
  action_idx: the action chosen to run
  expected_q: E(q) computed from the iterative equation of DQN
  actions: all possible actions
  actions_len: length of actions
  actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state

Parameters
  • hp

  • is_infer

get_q_actions()
class deepword.models.nlu_modeling.BertNLU(hp, is_infer=False)

Bases: deepword.models.dqn_modeling.BaseDQN

__init__(hp, is_infer=False)
inputs:
  src: source sentences to encode, with paddings, [CLS], and [SEP] already prepared
  src_len: length of source sentences
  action_idx: the action chosen to run
  expected_q: E(q) computed from the iterative equation of DQN
  actions: all possible actions
  actions_len: length of actions
  actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state

Parameters
  • hp

  • is_infer

get_classification_train_op(q_actions)

q_actions has shape [batch_size, 1] in this case. To compute the classification error, batch_size must equal src batch size * num classes, which means that the number of classes must be the same for every src.

classmethod get_eval_model(hp, device_placement)
classmethod get_eval_student_model(hp, device_placement)
get_q_actions()
classmethod get_train_model(hp, device_placement)
get_train_op(q_actions)
classmethod get_train_student_model(hp, device_placement)
deepword.models.nlu_modeling.create_eval_bert_nlu_model(model_creator, hp, device_placement)
deepword.models.nlu_modeling.create_train_bert_nlu_model(model_creator, hp, device_placement)

deepword.models.snn_modeling module

class deepword.models.snn_modeling.BertSNN(hp, is_infer=False)

Bases: object

Use SNN to encode sentences for additive features representation learning

add_cls_token(src)
classmethod get_eval_model(hp, device_placement)
classmethod get_eval_student_model(hp, device_placement)
get_h_state(raw_src)
classmethod get_train_model(hp, device_placement)
get_train_op(semantic_same)
classmethod get_train_student_model(hp, device_placement)
is_semantic_same()

deepword.models.transformer module

Copied from https://www.tensorflow.org/beta/tutorials/text/transformer

The decode function was added by Xusen Yin.

class deepword.models.transformer.Decoder(num_layers, d_model, num_heads, dff, tgt_vocab_size, dropout_rate=0.1, with_pointer=False)

Bases: tensorflow.python.keras.engine.base_layer.Layer

call(x, enc_x, enc_output, training, look_ahead_mask, padding_mask, copy_mask=None)

decode with pointer

Parameters
  • x – decoder input

  • enc_x – encoder input

  • enc_output – encoder encoded result

  • training – is training or inference

  • look_ahead_mask – combined look ahead mask with padding mask

  • padding_mask – padding mask for source sentence

  • copy_mask – dense vector of size |V| that marks every token that skips copying with 1 and every other token with 0.

Returns

total logits, probability of generation, gen logits, copy logits

class deepword.models.transformer.DecoderLayer(d_model, num_heads, dff, rate=0.1)

Bases: tensorflow.python.keras.engine.base_layer.Layer

call(x, enc_output, training, look_ahead_mask, padding_mask)

This is where the layer’s logic lives.

Parameters
  • inputs – Input tensor, or list/tuple of input tensors.

  • **kwargs – Additional keyword arguments.

Returns

A tensor or list/tuple of tensors.

class deepword.models.transformer.Encoder(num_layers, d_model, num_heads, dff, input_vocab_size, dropout_rate=0.1)

Bases: tensorflow.python.keras.engine.base_layer.Layer

call(x, training=None, mask=None, x_seg=None)

This is where the layer’s logic lives.

Parameters
  • inputs – Input tensor, or list/tuple of input tensors.

  • **kwargs – Additional keyword arguments.

Returns

A tensor or list/tuple of tensors.

class deepword.models.transformer.EncoderLayer(d_model, num_heads, dff, rate=0.1)

Bases: tensorflow.python.keras.engine.base_layer.Layer

call(x, training, mask)

This is where the layer’s logic lives.

Parameters
  • inputs – Input tensor, or list/tuple of input tensors.

  • **kwargs – Additional keyword arguments.

Returns

A tensor or list/tuple of tensors.

class deepword.models.transformer.MultiHeadAttention(d_model, num_heads)

Bases: tensorflow.python.keras.engine.base_layer.Layer

call(v, k, q, mask)

This is where the layer’s logic lives.

Parameters
  • inputs – Input tensor, or list/tuple of input tensors.

  • **kwargs – Additional keyword arguments.

Returns

A tensor or list/tuple of tensors.

split_heads(x, batch_size)

Split the last dimension into (num_heads, depth). Transpose the result such that the shape is (batch_size, num_heads, seq_len, depth).

class deepword.models.transformer.Transformer(num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, dropout_rate=0.1, with_pointer=True)

Bases: tensorflow.python.keras.engine.training.Model

call(inp, tar, training, copy_mask=None)

Calls the model on new inputs.

In this case call just reapplies all ops in the graph to the new inputs (e.g. build a new computational graph from the provided inputs).

Parameters
  • inputs – A tensor or list of tensors.

  • training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode.

  • mask – A mask or list of masks. A mask can be either a tensor or None (no mask).

Returns

A tensor if there is a single output, or a list of tensors if there are more than one outputs.

decode(enc_x, training, max_tar_len, sos_id, eos_id, padding_id, use_greedy=True, beam_size=1, temperature=1.0, copy_mask=None)
deepword.models.transformer.categorical_with_replacement(logits, k: int)
deepword.models.transformer.categorical_without_replacement(logits, k: int)

Courtesy of https://github.com/tensorflow/tensorflow/issues/9260#issuecomment-437875125

See also:

@misc{vieira2014gumbel,
    title = {Gumbel-max trick and weighted reservoir sampling},
    author = {Tim Vieira},
    url = {http://timvieira.github.io/blog/post/2014/08/01/gumbel-max-trick-and-weighted-reservoir-sampling/},
    year = {2014}
}

Notice that the logits represent unnormalized log probabilities. According to the citation above, there is no need to normalize them before adding the Gumbel random variate, which may be surprising, since one might expect logits - tf.reduce_logsumexp(logits) + z. The normalizer is the same constant for every entry of a row, so subtracting it does not change which k entries are the largest.
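The Gumbel-max idea described above can be illustrated outside TensorFlow. The following is a minimal NumPy sketch, not the library's implementation: the function name `sample_without_replacement` and the `rng` argument are assumptions made for this example. It adds Gumbel noise to the unnormalized logits and keeps the indices of the k largest perturbed values.

```python
import numpy as np

def sample_without_replacement(logits, k, rng):
    """Draw k distinct indices per row with the Gumbel top-k trick.

    logits: unnormalized log probabilities, shape [batch, vocab].
    rng: a numpy.random.Generator (hypothetical argument for this sketch).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # One independent Gumbel(0, 1) variate per logit.
    z = rng.gumbel(size=logits.shape)
    # Indices of the k largest perturbed logits, largest first.
    return np.argsort(-(logits + z), axis=-1)[..., :k]

logits = np.log([[0.7, 0.2, 0.1]])
idx = sample_without_replacement(logits, 2, np.random.default_rng(0))
# Shifting every logit by a constant leaves the selection unchanged,
# which is why no explicit normalization is needed.
idx_shifted = sample_without_replacement(logits + 5.0, 2, np.random.default_rng(0))
```

With the same random seed, `idx` and `idx_shifted` contain the same indices, since the shift is applied equally to all perturbed values.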

deepword.models.transformer.create_decode_masks(tar)

Create masking for decoding

This masking combines the look ahead mask and target sentence padding mask.

  1. We create look ahead mask for each sentence;

  2. We combine the sentence padding mask with the look ahead mask, e.g. when the look ahead mask says “0” for a token while the sentence padding mask says “1” for the same token because the token is a padding, the final mask for this token is “1”.

Parameters

tar – target sentence, shape: (batch_size, seq_len_k)

Returns

a combined mask of look ahead mask and padding mask, shape (batch_size, 1, seq_len_k, seq_len_k)

Examples

>>> tar_src = [[1,2,3,4,0,0], [1,3,0,0,0,0]]
>>> create_decode_masks(tar_src)
array([[[[0., 1., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 0., 1., 1., 1.],
         [0., 0., 0., 0., 1., 1.],
         [0., 0., 0., 0., 1., 1.],
         [0., 0., 0., 0., 1., 1.]]],
       [[[0., 1., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.]]]], dtype=float32)
deepword.models.transformer.create_look_ahead_mask(size: int)

create look ahead mask for decoding

At every decoding step i, only t_0, …, t_i can be accessed by the model, while t_{i+1}, …, t_n should be masked out.

Parameters

size – decoding output size

Returns

look ahead mask, where 1 (True) means masked out.

Examples

>>> create_look_ahead_mask(3)
array([[0., 1., 1.],
       [0., 0., 1.],
       [0., 0., 0.]], dtype=float32)
deepword.models.transformer.create_padding_mask(seq)

Padding value should be 0. This mask contains one dimension for num_heads, i.e. (batch_size, <broadcast to num_heads>, <broadcast to seq_len_q>, seq_len_k)

Parameters

seq – (batch_size, seq_len_k)

Returns

padding mask in which padding positions are set to True and all others are False; shape: (batch_size, 1, 1, seq_len_k)
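The three masking helpers documented above can be mimicked with NumPy to check the documented shapes and examples. This is a sketch under the assumption that the combined mask is the element-wise maximum of the look ahead mask and the padding mask; the functions reuse the documented names for readability but are not the TensorFlow implementations.

```python
import numpy as np

def create_padding_mask(seq):
    # 1.0 where the token id is 0 (padding); shape (batch_size, 1, 1, seq_len_k).
    seq = np.asarray(seq)
    return (seq == 0).astype(np.float32)[:, None, None, :]

def create_look_ahead_mask(size):
    # Strict upper triangle is 1.0: position i may not see positions > i.
    return np.triu(np.ones((size, size), dtype=np.float32), k=1)

def create_decode_masks(tar):
    # Assumption: a position is masked if either mask marks it.
    tar = np.asarray(tar)
    look_ahead = create_look_ahead_mask(tar.shape[1])   # (seq_len, seq_len)
    padding = create_padding_mask(tar)                  # (batch, 1, 1, seq_len)
    return np.maximum(look_ahead, padding)              # (batch, 1, seq_len, seq_len)

masks = create_decode_masks([[1, 2, 3, 4, 0, 0], [1, 3, 0, 0, 0, 0]])
```

Here `masks` has shape (2, 1, 6, 6), and its rows agree with the example listed for create_decode_masks.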

deepword.models.transformer.decode_next_step(decoder, time, enc_x, enc_output, training, dec_padding_mask, copy_mask, batch_size, tgt_vocab_size, eos_id, padding_id, beam_size, use_greedy, temperature, inc_tar, inc_continue, inc_valid_len, inc_p_gen, inc_sum_logits)

Decode one step with beam search. Given inc_tar as the current decoded target sequence (batch_size * beam_size), first decode one step with the decoder to get decoded_logits, then mask the decoded_logits:

  1. if decoding continues (i.e. EOS has not been reached) and the current time reaches max_tar_len, only EOS may be chosen;

  2. if decoding does not continue, only PAD may be chosen;

  3. by default, decoded_logits is not masked.

After obtaining predicted_id, either by sampling or by the greedy method, we compute 1) beam_id and 2) token_id from predicted_id. beam_id indicates which beam to choose; token_id indicates which token to choose under that beam.

For the loop variables inc_tar, inc_continue, inc_logits, inc_valid_len, and inc_p_gen, we first select rows according to beam_id, then append the token_id related info to the end. E.g. given beam_size = 2 and batch_size = 2, we have inc_tar:

[[[1, 2, 3],
  [2, 3, 4]],  # --> this beam row will be deleted
 [[9, 8, 7],
  [8, 7, 6]]]

If beam_id = [[0, 0], [0, 1]], then we choose [1, 2, 3] twice, [9, 8, 7] once, and [8, 7, 6] once, so inc_tar becomes:

[[[1, 2, 3],
  [1, 2, 3]],
 [[9, 8, 7],
  [8, 7, 6]]]

Then the new token_id is appended to the end.
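The beam selection step described above can be sketched in NumPy. This example assumes that predicted_id indexes a flattened (beam_size * vocab_size) axis, so that integer division and modulo recover beam_id and token_id; that layout and the helper name `select_beams` are assumptions for illustration, not taken from the source.

```python
import numpy as np

def select_beams(inc_tar, predicted_id, vocab_size):
    """Reorder beams and append the newly chosen tokens.

    inc_tar: [batch_size, beam_size, time] decoded tokens so far.
    predicted_id: [batch_size, beam_size] indices into a flattened
        (beam_size * vocab_size) axis (assumed layout).
    """
    inc_tar = np.asarray(inc_tar)
    predicted_id = np.asarray(predicted_id)
    beam_id = predicted_id // vocab_size    # which beam each choice extends
    token_id = predicted_id % vocab_size    # which token is appended
    # Select whole rows of inc_tar according to beam_id.
    selected = np.take_along_axis(inc_tar, beam_id[:, :, None], axis=1)
    # Append the new token to the end of each selected row.
    return np.concatenate([selected, token_id[:, :, None]], axis=-1)

inc_tar = [[[1, 2, 3], [2, 3, 4]],
           [[9, 8, 7], [8, 7, 6]]]
# With vocab_size = 10 these ids give beam_id = [[0, 0], [0, 1]].
new_tar = select_beams(inc_tar, [[5, 7], [3, 14]], vocab_size=10)
```

In this run the row [2, 3, 4] is dropped and [1, 2, 3] is extended twice, matching the worked example in the description.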

deepword.models.transformer.get_sparse_idx_for_copy(src, target_seq_len: int)

Create sparse index from source sentence for copying into decoder using the tf.scatter_nd method.

Considering the following source sentence: “a, b, a, c”; turn it into indices: [0, 1, 0, 2], and they have attention weights attn = [a0, a1, a2, a3].

Now we want to decode a sentence with 3 tokens, for each generated token, we want to collect attention weights from the source sentence, and mix with the logits to generate the current token.

I.e. for decoded sentence position i, we have logits(i) = [0.1, 0.2, 0.3, 0.5] for all possible tokens a, b, c, d. Then we want to sum the attention weights of two-0s, one-1, and one-2 into the logits(i) according to a generation weight p(i), i.e. total logits = logits(i) * p(i) + [a0 + a2, a1, a3, 0] * (1 - p(i)).

The goal is to create a dense vector of vocabulary size, and copy attention weights from source sentence to the dense vector.

We create an inverse index to do so. For target token i, we need to collect [(0, a0), (1, a1), (0, a2), (2, a3)] to construct the vector.

Parameters
  • src – source sentence

  • target_seq_len – target sequence len

Returns

sparse index to construct attention weight matrix for a batch

Examples

>>> get_sparse_idx_for_copy(src=[[0, 1, 0, 2]], target_seq_len=3)
array([[[[0, 0],
         [0, 1],
         [0, 0],
         [0, 2]],
        [[1, 0],
         [1, 1],
         [1, 0],
         [1, 2]],
        [[2, 0],
         [2, 1],
         [2, 0],
         [2, 2]]]], dtype=int32)
shape: (1, 3, 4, 2)  # batch_size, target sentence len, source sentence len, 2D matrix indices
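To make the copy mechanism described above concrete, here is a NumPy sketch that builds the same index layout as the example and scatters attention weights into a dense vocabulary-sized vector. It uses `np.add.at` as a stand-in for tf.scatter_nd; the helper names `sparse_idx_for_copy` and `copy_distribution` and the attention values are hypothetical.

```python
import numpy as np

def sparse_idx_for_copy(src, target_seq_len):
    # Pair every target position with every source token id.
    src = np.asarray(src, dtype=np.int32)                      # [batch, src_len]
    batch, src_len = src.shape
    shape = (batch, target_seq_len, src_len)
    pos = np.broadcast_to(
        np.arange(target_seq_len, dtype=np.int32)[None, :, None], shape)
    tok = np.broadcast_to(src[:, None, :], shape)
    return np.stack([pos, tok], axis=-1)                       # [batch, tgt, src, 2]

def copy_distribution(idx, attn, vocab_size):
    # Sum the attention weights of repeated source tokens into one slot.
    batch, tgt_len = idx.shape[:2]
    out = np.zeros((batch, tgt_len, vocab_size))
    for b in range(batch):
        np.add.at(out[b], (idx[b, ..., 0], idx[b, ..., 1]), attn[b])
    return out

idx = sparse_idx_for_copy([[0, 1, 0, 2]], target_seq_len=3)
# Hypothetical attention weights [a0, a1, a2, a3], repeated for each target position.
attn = np.tile([0.1, 0.2, 0.3, 0.4], (1, 3, 1))
dense = copy_distribution(idx, attn, vocab_size=4)
```

Each row of `dense[0]` has the form [a0 + a2, a1, a3, 0], which is the vector that gets mixed with the generation logits through the weight 1 - p(i).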
deepword.models.transformer.nucleus_renormalization(logits, p=0.95)

Refer to [Holtzman et al., 2020] for nucleus sampling

Parameters
  • logits – last-dimension logits of vocabulary V; 2D array, [batch, V] or [batch*beam, V]

  • p – the cumulative probability bound, default 0.95;

Returns

normalized nucleus logits
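As an illustration of nucleus (top-p) renormalization, the sketch below keeps the smallest set of highest-probability tokens whose cumulative probability reaches p, masks the rest with -inf, and renormalizes. It is written in NumPy under the assumption that the returned values are log probabilities over the nucleus; the function name `nucleus_logits` is hypothetical and the actual function may differ in detail.

```python
import numpy as np

def nucleus_logits(logits, p=0.95):
    logits = np.asarray(logits, dtype=np.float64)          # [batch, V]
    # Softmax over the vocabulary.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Sort tokens by decreasing probability.
    order = np.argsort(-probs, axis=-1)
    sorted_probs = np.take_along_axis(probs, order, axis=-1)
    cum = np.cumsum(sorted_probs, axis=-1)
    # Keep a token while the mass accumulated before it is below p;
    # the most probable token is therefore always kept.
    keep_sorted = (cum - sorted_probs) < p
    keep = np.zeros_like(keep_sorted)
    np.put_along_axis(keep, order, keep_sorted, axis=-1)
    masked = np.where(keep, logits, -np.inf)
    # Renormalize with a log-sum-exp over the kept tokens.
    m = masked.max(axis=-1, keepdims=True)
    lse = m + np.log(np.exp(masked - m).sum(axis=-1, keepdims=True))
    return masked - lse

out = nucleus_logits(np.log([[0.5, 0.3, 0.15, 0.05]]), p=0.9)
```

In this run the least likely token falls outside the nucleus and receives -inf, while the remaining three are rescaled so that their probabilities sum to one.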

deepword.models.transformer.point_wise_feed_forward_network(d_model, dff)

Two dense layers, one with activation, the second without activation.

Parameters
  • d_model – model size

  • dff – intermediate size

Returns

FFN(x)

deepword.models.transformer.scaled_dot_product_attention(q, k, v, mask)

Calculate the attention weights. q, k, v must have matching leading dimensions. k, v must have matching penultimate dimension, i.e.: seq_len_k = seq_len_v. The mask has different shapes depending on its type (padding or look ahead), but it must be broadcastable for addition.

Parameters
  • q – query shape == (…, seq_len_q, depth)

  • k – key shape == (…, seq_len_k, depth)

  • v – value shape == (…, seq_len_v, depth_v)

  • mask – Float tensor with shape broadcastable to (…, seq_len_q, seq_len_k). Defaults to None.

Notice that mask must have the same number of dimensions as q, k, v.

E.g. if q, k, v are (batch_size, num_heads, seq_len, depth), then the mask should also have four dimensions and be broadcastable to (batch_size, num_heads, seq_len_q, seq_len_k). However, if q, k, v are (batch_size, seq_len, depth), then the mask should also not contain num_heads.

Returns

output (a.k.a. context vectors), scaled_attention_logits
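The shape requirements above are easier to see in a small NumPy sketch of scaled dot-product attention. It follows the common formulation in which masked positions receive a large negative value before the softmax; the constant -1e9 is an assumption of this example rather than something stated in the source.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    depth = q.shape[-1]
    # (..., seq_len_q, seq_len_k)
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(depth)
    if mask is not None:
        # mask is broadcast against the logits; 1 marks a masked position.
        logits = logits + mask * -1e9
    # Softmax over the key axis.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, logits

q = np.zeros((1, 1, 2, 4))                  # (batch, heads, seq_len_q, depth)
k = np.ones((1, 1, 3, 4))                   # (batch, heads, seq_len_k, depth)
v = np.array([[[[1.0], [3.0], [100.0]]]])   # (batch, heads, seq_len_v, depth_v)
mask = np.array([[[[0.0, 0.0, 1.0]]]])      # padding mask: last key is masked
output, logits = scaled_dot_product_attention(q, k, v, mask)
```

Because the third key is masked, `output` averages only the first two values, and the (1, 1, 1, 3) mask broadcasts over both query positions.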

deepword.models.transformer.sequential_decoding(decoder, copy_mask, enc_x, enc_output, training, max_tar_len, sos_id, eos_id, padding_id, use_greedy=True, beam_size=1, temperature=1.0)
deepword.models.transformer.token_logit_masking(token_id: int, vocab_size: int)

Generate logits to choose the token_id. E.g. with vocab_size = 10 and token_id = 0, we have [ 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf]. After adding this mask to the normal logits, only token_id = 0 can be chosen.
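A minimal NumPy sketch of this masking idea follows. It only illustrates the description above and is not the TensorFlow implementation.

```python
import numpy as np

def token_logit_masking(token_id, vocab_size):
    # 0 for the allowed token, -inf everywhere else.
    mask = np.full(vocab_size, -np.inf)
    mask[token_id] = 0.0
    return mask

logits = np.array([0.3, 2.0, -1.0, 0.5])
masked = logits + token_logit_masking(token_id=2, vocab_size=4)
chosen = int(np.argmax(masked))
```

Even though token 1 has the largest raw logit, `chosen` is 2, because every other entry becomes -inf after the mask is added.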

deepword.models.utils module

deepword.models.utils.encoder_cnn(src, src_embeddings, pos_embeddings, filter_sizes, num_filters, embedding_size, is_infer=False, num_channels=2, activation='tanh')

encode state with CNN, refer to Convolutional Neural Networks for Sentence Classification

Parameters
  • src – placeholder, (tf.int32, [batch_size, seq_len])

  • src_embeddings – (tf.float32, [vocab_size, embedding_size])

  • pos_embeddings – (tf.float32, [max_position_size, embedding_size])

  • filter_sizes – list of ints, e.g. [3, 4, 5]

  • num_filters – number of filters of each filter_size

  • embedding_size – embedding size

  • is_infer – training or inference

  • num_channels – 1 or 2.

  • activation – tanh (default) or relu

Returns

a vector as the inner state

deepword.models.utils.encoder_cnn_base(input_tensor, filter_sizes, num_filters, num_channels, embedding_size, is_infer=False, activation='tanh')

We pad input_tensor at the head of each string to generate equal-size output. E.g. the input

go north forest path this is a path …

given conv-filter size 3, will be padded at the head with two tokens:

<S> <S> go north forest path this is a path …

or

[PAD] [PAD] go north forest path this is a path …

The choice of padding value does not matter as long as it is a special token and is identical across models.

We use the constant value 0 here, so make sure index 0 in your vocabulary is a special token that can be used for padding.

Parameters
  • input_tensor – (tf.float32, [batch_size, seq_len, embedding_size, num_channels])

  • filter_sizes – list of ints, e.g. [3, 4, 5]

  • num_filters – number of filters for each filter size

  • num_channels – 1 or 2, depending on the input tensor

  • embedding_size – word embedding size

  • is_infer – training or infer

  • activation – choose from “tanh” or “relu”. Note that if you choose relu, make sure to add an extra dense layer; otherwise all output values are non-negative.

Returns

a vector as the inner state
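The head-padding rule described above can be checked with a small NumPy sketch. For simplicity it pads token ids rather than the embedded input_tensor, and the helper names `head_pad` and `num_windows` are hypothetical. The point is that padding filter_size - 1 positions at the head makes the number of convolution windows equal to the original sequence length.

```python
import numpy as np

def head_pad(token_ids, filter_size, pad_id=0):
    # Prepend filter_size - 1 padding ids to every sequence in the batch.
    token_ids = np.asarray(token_ids)
    return np.pad(token_ids, ((0, 0), (filter_size - 1, 0)),
                  constant_values=pad_id)

def num_windows(seq_len, filter_size):
    # Number of positions a filter can take without further padding.
    return seq_len - filter_size + 1

tokens = np.array([[5, 8, 2, 7]])            # seq_len = 4
padded = head_pad(tokens, filter_size=3)     # two pads at the head
windows = num_windows(padded.shape[1], 3)    # equals the original seq_len
```

The same check holds for other filter sizes, which is why filters of size 3, 4, and 5 can produce outputs of equal length.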

deepword.models.utils.encoder_lstm(src, src_len, src_embeddings, num_units, num_layers)

encode state with LSTM

Parameters
  • src – placeholder, (tf.int32, [None, None])

  • src_len – placeholder, (tf.float32, [None])

  • src_embeddings – (tf.float32, [vocab_size, embedding_size])

  • num_units – number of LSTM units

  • num_layers – number of LSTM layers

Returns

inner states (c, h)

deepword.models.utils.l2_loss_1d_action(q_actions, action_idx, expected_q, b_weight)
l2 loss for 1D action space. Only the q-values in q_actions selected by action_idx are compared against expected_q.

E.g. “go east” would be one whole action. action_idx should have the same dimension as expected_q.

Parameters
  • q_actions – q-values

  • action_idx – placeholder, the action chosen for the state, in a format of (tf.int32, [None])

  • expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None])

  • b_weight – weights for each data point

Returns

l2 loss and l1 loss
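The selection of q-values by action_idx can be sketched in NumPy. The exact reduction is not documented here, so this example assumes the l2 loss is the mean of the b_weight-weighted squared errors and the l1 loss is the per-sample absolute error; both choices are assumptions for illustration.

```python
import numpy as np

def l2_loss_1d_action(q_actions, action_idx, expected_q, b_weight):
    q_actions = np.asarray(q_actions, dtype=np.float64)    # [batch, n_actions]
    rows = np.arange(q_actions.shape[0])
    # Pick the q-value of the action taken in each sample.
    predicted_q = q_actions[rows, np.asarray(action_idx)]
    abs_loss = np.abs(np.asarray(expected_q) - predicted_q)
    # Assumed reduction: weighted mean of squared errors.
    loss = np.mean(np.asarray(b_weight) * abs_loss ** 2)
    return loss, abs_loss

loss, abs_loss = l2_loss_1d_action(
    q_actions=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    action_idx=[2, 0],
    expected_q=[2.5, 4.0],
    b_weight=[1.0, 1.0],
)
```

Only the entries q_actions[0, 2] and q_actions[1, 0] contribute to the loss; the other q-values are ignored.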

deepword.models.utils.l2_loss_1d_action_v2(q_actions, action_idx, expected_q, n_actions, b_weight)

l2 loss for 1D action space. e.g. “go east” would be one whole action.

Parameters
  • q_actions – Q-vector of a state for all actions

  • action_idx – placeholder, the action chosen for the state, in a format of (tf.int32, [None])

  • expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None])

  • n_actions – number of total actions

  • b_weight – weights for each data point

Returns

l2 loss and l1 loss

deepword.models.utils.l2_loss_2d_action(q_actions, action_idx, expected_q, vocab_size, action_len, max_action_len, b_weight)

l2 loss for 2D action space. e.g. “go east” is an action composed of “go” and “east”.

Parameters
  • q_actions – Q-matrix of a state for all action-components, e.g. tokens

  • action_idx – placeholder, the action-components chosen for the state, in a format of (tf.int32, [None, None])

  • expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None])

  • vocab_size – number of action-components

  • action_len – length of each action in a format of (tf.int32, [None])

  • max_action_len – maximum length of action

  • b_weight – weights for each data point

Returns

l2 loss and l1 loss

deepword.models.utils.positional_encoding(position, d_model)

Create position embeddings with sin/cos; no training is needed.

Parameters
  • position – maximum position size

  • d_model – embedding size

Returns

position embeddings in shape (1, position, d_model)
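For reference, sinusoidal position embeddings with the documented output shape can be computed as in the NumPy sketch below. It assumes the usual formulation with sine on even embedding indices and cosine on odd ones; the actual function may be implemented differently.

```python
import numpy as np

def positional_encoding(position, d_model):
    pos = np.arange(position)[:, None]       # (position, 1)
    i = np.arange(d_model)[None, :]          # (1, d_model)
    # Each pair of dimensions (2j, 2j + 1) shares one frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / float(d_model))
    angles = pos * angle_rates               # (position, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])   # even indices
    angles[:, 1::2] = np.cos(angles[:, 1::2])   # odd indices
    return angles[None, ...].astype(np.float32)  # (1, position, d_model)

pe = positional_encoding(position=50, d_model=16)
```

The leading axis of size 1 lets the embeddings broadcast over the batch dimension when they are added to token embeddings.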

Module contents