deepword.models package¶
Submodules¶
deepword.models.dqn_modeling module¶
- 
class deepword.models.dqn_modeling.BaseDQN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - object- 
classmethod get_eval_model(hp, device_placement)¶
 - 
get_q_actions()¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(q_actions)¶
 - 
classmethod init_glove(glove_path)¶
 
- 
classmethod 
- 
class deepword.models.dqn_modeling.CnnDQN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.dqn_modeling.BaseDQN- 
get_q_actions()¶
 - 
get_train_op(q_actions)¶
 
- 
- 
class deepword.models.dqn_modeling.LstmDQN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.dqn_modeling.BaseDQN- 
get_q_actions()¶
 - 
get_train_op(q_actions)¶
 
- 
deepword.models.drrn_modeling module¶
- 
class deepword.models.drrn_modeling.BertDRRN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.drrn_modeling.CnnDRRN- 
__init__(hp, src_embeddings=None, is_infer=False)¶
- inputs:
- src: source sentences to encode
- src_len: length of source sentences
- action_idx: the action chosen to run
- expected_q: E(q) computed from the iterative equation of DQN
- actions: all possible actions
- actions_len: length of actions
- actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state.
 - Parameters
- hp – 
- is_infer – 
 
 
 - 
get_q_actions()¶
 
- 
- 
class deepword.models.drrn_modeling.CnnDRRN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.dqn_modeling.BaseDQN- 
__init__(hp, src_embeddings=None, is_infer=False)¶
- inputs:
- src: source sentences to encode
- src_len: length of source sentences
- action_idx: the action chosen to run
- expected_q: E(q) computed from the iterative equation of DQN
- actions: all possible actions
- actions_len: length of actions
 - Parameters
- hp – 
- is_infer – 
 
 
 - 
classmethod get_eval_model(hp, device_placement)¶
 - 
classmethod get_eval_student_model(hp, device_placement)¶
 - 
get_q_actions()¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(q_actions)¶
 - 
classmethod get_train_student_model(hp, device_placement)¶
 
- 
- 
class deepword.models.drrn_modeling.TransformerDRRN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.drrn_modeling.CnnDRRN- 
__init__(hp, src_embeddings=None, is_infer=False)¶
- inputs:
- src: source sentences to encode
- src_len: length of source sentences
- action_idx: the action chosen to run
- expected_q: E(q) computed from the iterative equation of DQN
- actions: all possible actions
- actions_len: length of actions
- actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state.
 - Parameters
- hp – 
- src_embeddings – 
- is_infer – 
 
 
 - 
get_q_actions()¶
 
- 
deepword.models.dsqn_modeling module¶
- 
class deepword.models.dsqn_modeling.CnnDSQN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.dqn_modeling.BaseDQN- DSQN that uses CNN as the trajectory encoder - 
classmethod get_eval_model(hp, device_placement)¶
 - 
get_h_state(src)¶
 - 
get_merged_train_op(loss, snn_loss)¶
 - 
get_q_actions()¶
 - 
get_snn_train_op(semantic_same)¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(q_actions)¶
 - 
classmethod get_train_student_model(hp, device_placement)¶
 - 
is_semantic_same()¶
 
- 
classmethod 
- 
class deepword.models.dsqn_modeling.CnnZorkDSQN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.dsqn_modeling.CnnDSQN- DSQN for Zork - 
classmethod get_eval_model(hp, device_placement)¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(q_actions)¶
 
- 
classmethod 
- 
class deepword.models.dsqn_modeling.TransformerDSQN(hp, src_embeddings=None, is_infer=False)¶
- Bases: - deepword.models.dsqn_modeling.CnnDSQN- DSQN that uses transformer as the trajectory encoder - 
get_h_state(src)¶
 
- 
deepword.models.gen_modeling module¶
- 
class deepword.models.gen_modeling.TransformerGenDQN(hp, is_infer=False)¶
- Bases: - deepword.models.dqn_modeling.BaseDQN- 
decode()¶
 - 
classmethod get_eval_model(hp, device_placement)¶
 - 
get_q_actions()¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(q_actions)¶
 - 
classmethod get_train_student_model(hp, device_placement)¶
 
- 
- 
class deepword.models.gen_modeling.TransformerPGN(hp, is_infer=False)¶
- Bases: - deepword.models.gen_modeling.TransformerGenDQN- TransformerPGN is similar to TransformerGenDQN; the only difference is that the former uses a cross-entropy loss while the latter uses MSE. Thus, TransformerPGN cannot be trained within the DQN framework; it can only be trained with supervised learning, e.g. imitation learning. - 
get_train_op(q_actions)¶
- b_weight could be
- per instance, i.e. [batch_size, 1] 
- per token, i.e. [batch_size, n_tokens] 
 
 
 
- 
deepword.models.models module¶
- 
class deepword.models.models.DQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor])¶
- Bases: - object
- 
class deepword.models.models.DRRNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], actions_: tensorflow.python.ops.array_ops.placeholder, actions_len_: tensorflow.python.ops.array_ops.placeholder, actions_repeats_: tensorflow.python.ops.array_ops.placeholder)¶
- 
class deepword.models.models.DSQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], actions_: tensorflow.python.ops.array_ops.placeholder, actions_len_: tensorflow.python.ops.array_ops.placeholder, actions_repeats_: tensorflow.python.ops.array_ops.placeholder, snn_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], weighted_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], semantic_same: tensorflow.python.framework.ops.Tensor, snn_src_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src_len_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_len_: Optional[tensorflow.python.ops.array_ops.placeholder], labels_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_loss: Optional[tensorflow.python.framework.ops.Tensor], weighted_loss: Optional[tensorflow.python.framework.ops.Tensor], merged_train_op: Optional[tensorflow.python.framework.ops.Operation], snn_train_op: Optional[tensorflow.python.framework.ops.Operation], h_states_diff: Optional[tensorflow.python.framework.ops.Tensor])¶
- 
class deepword.models.models.DSQNZorkModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], snn_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], weighted_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], semantic_same: tensorflow.python.framework.ops.Tensor, snn_src_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src_len_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_src2_len_: Optional[tensorflow.python.ops.array_ops.placeholder], labels_: Optional[tensorflow.python.ops.array_ops.placeholder], snn_loss: Optional[tensorflow.python.framework.ops.Tensor], weighted_loss: Optional[tensorflow.python.framework.ops.Tensor], merged_train_op: Optional[tensorflow.python.framework.ops.Operation], snn_train_op: Optional[tensorflow.python.framework.ops.Operation], h_states_diff: Optional[tensorflow.python.framework.ops.Tensor])¶
- 
class deepword.models.models.GenDQNModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], decoded_idx_infer: tensorflow.python.framework.ops.Tensor, action_idx_out_: tensorflow.python.ops.array_ops.placeholder, action_len_: tensorflow.python.ops.array_ops.placeholder, temperature_: tensorflow.python.ops.array_ops.placeholder, p_gen: tensorflow.python.framework.ops.Tensor, p_gen_infer: tensorflow.python.framework.ops.Tensor, beam_size_: tensorflow.python.ops.array_ops.placeholder, use_greedy_: tensorflow.python.ops.array_ops.placeholder, col_eos_idx: tensorflow.python.framework.ops.Tensor, decoded_logits_infer: tensorflow.python.framework.ops.Tensor)¶
- 
class deepword.models.models.NLUModel(graph: tensorflow.python.framework.ops.Graph, q_actions: tensorflow.python.framework.ops.Tensor, src_: tensorflow.python.ops.array_ops.placeholder, src_len_: tensorflow.python.ops.array_ops.placeholder, action_idx_: Optional[tensorflow.python.ops.array_ops.placeholder], train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], expected_q_: Optional[tensorflow.python.ops.array_ops.placeholder], b_weight_: Optional[tensorflow.python.ops.array_ops.placeholder], train_summary_op: Optional[tensorflow.python.framework.ops.Operation], classification_train_summary_op: Optional[tensorflow.python.framework.ops.Operation], abs_loss: Optional[tensorflow.python.framework.ops.Tensor], src_seg_: Optional[tensorflow.python.ops.array_ops.placeholder], h_state: Optional[tensorflow.python.framework.ops.Tensor], seg_tj_action_: tensorflow.python.ops.array_ops.placeholder, swag_labels_: Optional[tensorflow.python.ops.array_ops.placeholder], classification_loss: Optional[tensorflow.python.framework.ops.Tensor], classification_train_op: Optional[tensorflow.python.framework.ops.Operation])¶
- 
class deepword.models.models.SNNModel(graph: tensorflow.python.framework.ops.Graph, target_src_: tensorflow.python.ops.array_ops.placeholder, same_src_: tensorflow.python.ops.array_ops.placeholder, diff_src_: tensorflow.python.ops.array_ops.placeholder, semantic_same: tensorflow.python.framework.ops.Operation, train_op: Optional[tensorflow.python.framework.ops.Operation], loss: Optional[tensorflow.python.framework.ops.Tensor], train_summary_op: Optional[tensorflow.python.framework.ops.Operation])¶
- Bases: - object
deepword.models.nlu_modeling module¶
- 
class deepword.models.nlu_modeling.AlbertNLU(hp, is_infer=False)¶
- Bases: - deepword.models.nlu_modeling.BertNLU- 
__init__(hp, is_infer=False)¶
- inputs:
- src: source sentences to encode, with paddings, [CLS], and [SEP] already prepared
- src_len: length of source sentences
- action_idx: the action chosen to run
- expected_q: E(q) computed from the iterative equation of DQN
- actions: all possible actions
- actions_len: length of actions
- actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state.
 - Parameters
- hp – 
- is_infer – 
 
 
 - 
get_q_actions()¶
 
- 
- 
class deepword.models.nlu_modeling.BertNLU(hp, is_infer=False)¶
- Bases: - deepword.models.dqn_modeling.BaseDQN- 
__init__(hp, is_infer=False)¶
- inputs:
- src: source sentences to encode, with paddings, [CLS], and [SEP] already prepared
- src_len: length of source sentences
- action_idx: the action chosen to run
- expected_q: E(q) computed from the iterative equation of DQN
- actions: all possible actions
- actions_len: length of actions
- actions_mask: a 0-1 vector of size |actions|, using 0 to eliminate some actions for a certain state.
 - Parameters
- hp – 
- is_infer – 
 
 
 - 
get_classification_train_op(q_actions)¶
- q_actions has shape [batch_size, 1] in this case. To compute the classification error, batch_size must equal (src batch size * num classes), which means that every src must have the same number of classes.
 - 
classmethod get_eval_model(hp, device_placement)¶
 - 
classmethod get_eval_student_model(hp, device_placement)¶
 - 
get_q_actions()¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(q_actions)¶
 - 
classmethod get_train_student_model(hp, device_placement)¶
 
- 
- 
deepword.models.nlu_modeling.create_eval_bert_nlu_model(model_creator, hp, device_placement)¶
- 
deepword.models.nlu_modeling.create_train_bert_nlu_model(model_creator, hp, device_placement)¶
deepword.models.snn_modeling module¶
- 
class deepword.models.snn_modeling.BertSNN(hp, is_infer=False)¶
- Bases: - object- Use SNN to encode sentences for additive features representation learning - 
add_cls_token(src)¶
 - 
classmethod get_eval_model(hp, device_placement)¶
 - 
classmethod get_eval_student_model(hp, device_placement)¶
 - 
get_h_state(raw_src)¶
 - 
classmethod get_train_model(hp, device_placement)¶
 - 
get_train_op(semantic_same)¶
 - 
classmethod get_train_student_model(hp, device_placement)¶
 - 
is_semantic_same()¶
 
- 
deepword.models.transformer module¶
Copied from https://www.tensorflow.org/beta/tutorials/text/transformer (decode function added by Xusen Yin).
- 
class deepword.models.transformer.Decoder(num_layers, d_model, num_heads, dff, tgt_vocab_size, dropout_rate=0.1, with_pointer=False)¶
- Bases: - tensorflow.python.keras.engine.base_layer.Layer- 
call(x, enc_x, enc_output, training, look_ahead_mask, padding_mask, copy_mask=None)¶
- decode with pointer - Parameters
- x – decoder input 
- enc_x – encoder input 
- enc_output – encoder encoded result 
- training – is training or inference 
- look_ahead_mask – combined look ahead mask with padding mask 
- padding_mask – padding mask for source sentence 
- copy_mask – dense vector size |V| to mark all tokens that skip copying with 1; otherwise, 0. 
 
- Returns
- total logits, probability of generation, gen logits, copy logits 
 
 
- 
- 
class deepword.models.transformer.DecoderLayer(d_model, num_heads, dff, rate=0.1)¶
- Bases: - tensorflow.python.keras.engine.base_layer.Layer- 
call(x, enc_output, training, look_ahead_mask, padding_mask)¶
- This is where the layer’s logic lives. - Parameters
- inputs – Input tensor, or list/tuple of input tensors. 
- **kwargs – Additional keyword arguments. 
 
- Returns
- A tensor or list/tuple of tensors. 
 
 
- 
- 
class deepword.models.transformer.Encoder(num_layers, d_model, num_heads, dff, input_vocab_size, dropout_rate=0.1)¶
- Bases: - tensorflow.python.keras.engine.base_layer.Layer- 
call(x, training=None, mask=None, x_seg=None)¶
- This is where the layer’s logic lives. - Parameters
- inputs – Input tensor, or list/tuple of input tensors. 
- **kwargs – Additional keyword arguments. 
 
- Returns
- A tensor or list/tuple of tensors. 
 
 
- 
- 
class deepword.models.transformer.EncoderLayer(d_model, num_heads, dff, rate=0.1)¶
- Bases: - tensorflow.python.keras.engine.base_layer.Layer- 
call(x, training, mask)¶
- This is where the layer’s logic lives. - Parameters
- inputs – Input tensor, or list/tuple of input tensors. 
- **kwargs – Additional keyword arguments. 
 
- Returns
- A tensor or list/tuple of tensors. 
 
 
- 
- 
class deepword.models.transformer.MultiHeadAttention(d_model, num_heads)¶
- Bases: - tensorflow.python.keras.engine.base_layer.Layer- 
call(v, k, q, mask)¶
- This is where the layer’s logic lives. - Parameters
- inputs – Input tensor, or list/tuple of input tensors. 
- **kwargs – Additional keyword arguments. 
 
- Returns
- A tensor or list/tuple of tensors. 
 
 - 
split_heads(x, batch_size)¶
- Split the last dimension into (num_heads, depth). Transpose the result such that the shape is (batch_size, num_heads, seq_len, depth)
 
- 
- 
class deepword.models.transformer.Transformer(num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, dropout_rate=0.1, with_pointer=True)¶
- Bases: - tensorflow.python.keras.engine.training.Model- 
call(inp, tar, training, copy_mask=None)¶
- Calls the model on new inputs. - In this case call just reapplies all ops in the graph to the new inputs (e.g. build a new computational graph from the provided inputs). - Parameters
- inputs – A tensor or list of tensors. 
- training – Boolean or boolean scalar tensor, indicating whether to run the Network in training mode or inference mode. 
- mask – A mask or list of masks. A mask can be either a tensor or None (no mask). 
 
- Returns
- A tensor if there is a single output, or a list of tensors if there are more than one outputs. 
 
 - 
decode(enc_x, training, max_tar_len, sos_id, eos_id, padding_id, use_greedy=True, beam_size=1, temperature=1.0, copy_mask=None)¶
 
- 
- 
deepword.models.transformer.categorical_with_replacement(logits, k: int)¶
- 
deepword.models.transformer.categorical_without_replacement(logits, k: int)¶
- Courtesy of https://github.com/tensorflow/tensorflow/issues/9260#issuecomment-437875125
 - Also cited here:
 @misc{vieira2014gumbel,
   title = {Gumbel-max trick and weighted reservoir sampling},
   author = {Tim Vieira},
   url = {http://timvieira.github.io/blog/post/2014/08/01/gumbel-max-trick-and-weighted-reservoir-sampling/},
   year = {2014}
 }
 - Notice that the logits represent unnormalized log probabilities. According to the citation above, there is no need to normalize them before adding the Gumbel random variates, which surprised me, since I thought it should be logits - tf.reduce_logsumexp(logits) + z.
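The Gumbel top-k idea referred to above can be sketched in plain Python. This is a minimal illustration, not the module's TensorFlow implementation: the helper name `gumbel_top_k` and the clamping constant are assumptions made for the example. It also shows why normalization is unnecessary: subtracting the log-sum-exp shifts every perturbed logit by the same constant, so the ranking of the top k entries does not change.

```python
import math
import random


def gumbel_top_k(logits, k, rng=random):
    """Draw k distinct indices using the Gumbel-max trick (hypothetical helper)."""
    keys = []
    for i, logit in enumerate(logits):
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        keys.append((logit + gumbel, i))
    # The k largest perturbed logits form a sample without replacement.
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


# A logit of -inf has zero probability, so index 2 can never be drawn.
picked = gumbel_top_k([0.0, 1.0, -math.inf, 2.0], k=3, rng=random.Random(0))
```

Here `picked` always contains the indices 0, 1, and 3 in some order, because the only other candidate has probability zero.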
- 
deepword.models.transformer.create_decode_masks(tar)¶
- Create masking for decoding - This masking combines the look ahead mask and target sentence padding mask.
- We create a look ahead mask for each sentence;
- We combine the sentence padding mask with the look ahead mask, e.g. when the look ahead mask says “0” for a token while the sentence padding mask says “1” for the same token because the token is padding, the final mask for this token is “1”.
 - Parameters
- tar – target sentence, shape: (batch_size, seq_len_k)
- Returns
- a combined mask of look ahead mask and padding mask, shape (batch_size, 1, seq_len_k, seq_len_k)
 - Examples
>>> tar_src = [[1,2,3,4,0,0], [1,3,0,0,0,0]]
>>> create_decode_masks(tar_src)
array([[[[0., 1., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 0., 1., 1., 1.],
         [0., 0., 0., 0., 1., 1.],
         [0., 0., 0., 0., 1., 1.],
         [0., 0., 0., 0., 1., 1.]]],
       [[[0., 1., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.],
         [0., 0., 1., 1., 1., 1.]]]], dtype=float32)
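The way the two masks combine can be reproduced without TensorFlow. The sketch below uses nested lists and hypothetical helper names (`padding_mask`, `look_ahead_mask`, `decode_masks`); it assumes, as the example above suggests, that the combined mask is the element-wise maximum of the look ahead mask and the padding mask.

```python
def padding_mask(seq, pad_id=0):
    # 1.0 marks a padding token, 0.0 a real token.
    return [[1.0 if tok == pad_id else 0.0 for tok in row] for row in seq]


def look_ahead_mask(size):
    # Position i may only attend to positions 0..i; later ones get 1.0.
    return [[1.0 if j > i else 0.0 for j in range(size)] for i in range(size)]


def decode_masks(tar):
    # Result has shape (batch_size, 1, seq_len, seq_len) as nested lists.
    masks = []
    for pad_row in padding_mask(tar):
        n = len(pad_row)
        look = look_ahead_mask(n)
        combined = [[max(look[i][j], pad_row[j]) for j in range(n)]
                    for i in range(n)]
        masks.append([combined])
    return masks


masks = decode_masks([[1, 2, 3, 4, 0, 0], [1, 3, 0, 0, 0, 0]])
```

For this input, `masks` holds the same values as the array in the example above.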
- 
deepword.models.transformer.create_look_ahead_mask(size: int)¶
- create look ahead mask for decoding - At every decoding step i, only t_0, …, t_i can be accessed by the model, while t_{i+1}, …, t_n should be masked out. - Parameters
- size – decoding output size 
- Returns
- look ahead mask; 1 means masked out.
 - Examples
>>> create_look_ahead_mask(3)
array([[0., 1., 1.],
       [0., 0., 1.],
       [0., 0., 0.]], dtype=float32)
- 
deepword.models.transformer.create_padding_mask(seq)¶
- Padding value should be 0. This mask contains one dimension for num_heads, i.e. (batch_size, <broadcast to num_heads>, <broadcast to seq_len_q>, seq_len_k) - Parameters
- seq – (batch_size, seq_len_k) 
- Returns
- padding mask: padding positions are True, all others are False; shape: (batch_size, 1, 1, seq_len_k)
 
- 
deepword.models.transformer.decode_next_step(decoder, time, enc_x, enc_output, training, dec_padding_mask, copy_mask, batch_size, tgt_vocab_size, eos_id, padding_id, beam_size, use_greedy, temperature, inc_tar, inc_continue, inc_valid_len, inc_p_gen, inc_sum_logits)¶
- Decode one step with beam search. Given inc_tar as the current decoded target sequence (batch_size * beam_size), first decode one step with the decoder to get decoded_logits, then mask the decoded_logits:
- if decoding continues (i.e. EOS has not been reached) and the current time reaches max_tar_len, only EOS can be chosen;
- if decoding does not continue, only PAD can be chosen;
- by default, the decoded_logits are not masked.
 - After getting predicted_id, either by sampling or greedily, we compute 1) beam_id and 2) token_id from predicted_id. beam_id indicates which beam to choose; token_id indicates which token to choose under that beam.
 - For the loop variables inc_tar, inc_continue, inc_logits, inc_valid_len, and inc_p_gen, we first select rows according to beam_id, then append the token_id related info to the end. E.g. given beam_size = 2 and batch_size = 2, we have inc_tar:
 [[[1, 2, 3],
   [2, 3, 4]],  # --> this beam row will be deleted
  [[9, 8, 7],
   [8, 7, 6]]]
 - If beam_id = [[0, 0], [0, 1]], we choose [1, 2, 3] twice, [9, 8, 7] once, and [8, 7, 6] once, so inc_tar becomes:
 [[[1, 2, 3],
   [1, 2, 3]],
  [[9, 8, 7],
   [8, 7, 6]]]
 - Then the new token_id is appended to the end.
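The row selection described above can be illustrated with nested lists. This is only a sketch: the helper name `select_beams` is hypothetical, and it assumes that predicted_id indexes a flattened (beam_size * vocab_size) axis, so that beam_id and token_id are the quotient and remainder of dividing by the vocabulary size. The source does not state how the two ids are derived.

```python
def select_beams(inc_tar, predicted_id, vocab_size):
    """Gather beam rows by beam_id, then append token_id to each row."""
    new_tar = []
    for beams, preds in zip(inc_tar, predicted_id):
        rows = []
        for p in preds:
            # Assumed decomposition of a flattened (beam, token) index.
            beam_id, token_id = divmod(p, vocab_size)
            rows.append(beams[beam_id] + [token_id])
        new_tar.append(rows)
    return new_tar


inc_tar = [[[1, 2, 3], [2, 3, 4]],
           [[9, 8, 7], [8, 7, 6]]]
# With vocab_size = 10 these ids give beam_id = [[0, 0], [0, 1]].
new_tar = select_beams(inc_tar, [[4, 7], [5, 13]], vocab_size=10)
```

With these inputs the beam row [2, 3, 4] is dropped, [1, 2, 3] is kept twice, and each surviving row gains one new token.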
- 
deepword.models.transformer.get_sparse_idx_for_copy(src, target_seq_len: int)¶
- Create sparse index from source sentence for copying into decoder using the tf.scatter_nd method.
 - Consider the following source sentence: “a, b, a, c”. Turn it into indices: [0, 1, 0, 2], with attention weights attn = [a0, a1, a2, a3].
 - Now we want to decode a sentence with 3 tokens. For each generated token, we want to collect attention weights from the source sentence and mix them with the logits to generate the current token.
 - I.e. for decoded sentence position i, we have logits(i) = [0.1, 0.2, 0.3, 0.5] for all possible tokens a, b, c, d. Then we want to sum the attention weights of the two 0s, one 1, and one 2 into logits(i) according to a generation weight p(i), i.e. total logits = logits(i) * p(i) + [a0 + a2, a1, a3, 0] * (1 - p(i)).
 - The goal is to create a dense vector of vocabulary size and copy attention weights from the source sentence into it.
 - We create an inverse index to do so. For target token i, we need to collect [(0, a0), (1, a1), (0, a2), (2, a3)] to construct the vector.
 - Parameters
- src – source sentence 
- target_seq_len – target sequence len 
 
- Returns
- sparse index to construct attention weight matrix for a batch 
 - Examples
>>> get_sparse_idx_for_copy(src=[[0, 1, 0, 2]], target_seq_len=3)
array([[[[0, 0], [0, 1], [0, 0], [0, 2]],
        [[1, 0], [1, 1], [1, 0], [1, 2]],
        [[2, 0], [2, 1], [2, 0], [2, 2]]]], dtype=int32)
shape: (1, 3, 4, 2)  # batch_size, target sentence len, source sentence len, 2D matrix indices
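A pure-Python sketch may help show how such an index is used. The helpers `sparse_idx_for_copy` and `scatter_add` are hypothetical stand-ins; `scatter_add` imitates the accumulation of duplicate indices that tf.scatter_nd is documented to perform, and the attention weights are made-up numbers.

```python
def sparse_idx_for_copy(src, target_seq_len):
    # For every target position t, pair t with each source token id.
    return [[[[t, tok] for tok in sent] for t in range(target_seq_len)]
            for sent in src]


def scatter_add(indices, updates, shape):
    # Dense (rows, cols) matrix; duplicate indices are summed.
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (r, c), u in zip(indices, updates):
        dense[r][c] += u
    return dense


idx = sparse_idx_for_copy([[0, 1, 0, 2]], target_seq_len=3)
attn = [0.25, 0.125, 0.5, 0.125]  # illustrative weights a0..a3
# Copy distribution for target position 0 over a vocabulary of size 4.
dense = scatter_add(idx[0][0], attn, shape=(3, 4))
```

Row 0 of `dense` is [a0 + a2, a1, a3, 0], which is the vector that gets mixed with logits(i) in the description above.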
- 
deepword.models.transformer.nucleus_renormalization(logits, p=0.95)¶
- Refer to [Holtzman et al., 2020] for nucleus sampling - Parameters
- logits – last-dimension logits of vocabulary V; 2D array, [batch, V] or [batch*beam, V] 
- p – the cumulative probability bound, default 0.95; 
 
- Returns
- normalized nucleus logits 
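As a rough illustration of nucleus (top-p) filtering on a single row of logits, the sketch below keeps the smallest set of highest-probability tokens whose cumulative probability reaches p and sets all other logits to -inf. The helper name `nucleus_logits` and the exact boundary handling are assumptions; the library function may treat ties or the cut-off differently.

```python
import math


def nucleus_logits(logits, p=0.95):
    # Softmax with the usual max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Visit tokens from most to least probable until mass p is covered.
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    # Tokens outside the nucleus can no longer be sampled.
    return [l if i in keep else -math.inf for i, l in enumerate(logits)]


logits = [math.log(0.5), math.log(0.3), math.log(0.15), math.log(0.05)]
filtered = nucleus_logits(logits, p=0.9)
```

A softmax over `filtered` then renormalizes the probability mass over the three kept tokens.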
 
- 
deepword.models.transformer.point_wise_feed_forward_network(d_model, dff)¶
- Two dense layers, one with activation, the second without activation. - Parameters
- d_model – model size 
- dff – intermediate size 
 
- Returns
- FFN(x) 
 
- 
deepword.models.transformer.scaled_dot_product_attention(q, k, v, mask)¶
- Calculate the attention weights. q, k, v must have matching leading dimensions. k, v must have matching penultimate dimension, i.e.: seq_len_k = seq_len_v. The mask has different shapes depending on its type(padding or look ahead) but it must be broadcastable for addition. - Parameters
- q – query shape == (…, seq_len_q, depth) 
- k – key shape == (…, seq_len_k, depth) 
- v – value shape == (…, seq_len_v, depth_v) 
- mask – Float tensor with shape broadcastable to (…, seq_len_q, seq_len_k). Defaults to None. 
 
 - Notice that mask must have the same number of dimensions as q, k, v.
- e.g. if q, k, v are (batch_size, num_heads, seq_len, depth), then the mask should be broadcastable to (batch_size, num_heads, seq_len_q, seq_len_k). However, if q, k, v are (batch_size, seq_len, depth), then the mask should not contain num_heads either. 
 - Returns
- output (a.k.a. context vectors), scaled_attention_logits 
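For a single attention head without a batch dimension, the computation can be sketched in plain Python. The name `attention` is hypothetical, and adding `mask * -1e9` to the scaled logits is an assumption taken from the TensorFlow tutorial this module says it was copied from.

```python
import math


def attention(q, k, v, mask=None):
    """q: (seq_len_q, depth), k: (seq_len_k, depth), v: (seq_len_k, depth_v)."""
    depth = len(k[0])
    output, all_logits = [], []
    for i, q_row in enumerate(q):
        # Scaled dot product between one query and every key.
        logits = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(depth)
                  for k_row in k]
        if mask is not None:
            # Masked positions (1) receive a very large negative logit.
            logits = [l + m * -1e9 for l, m in zip(logits, mask[i])]
        mx = max(logits)
        exps = [math.exp(l - mx) for l in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Context vector: attention-weighted sum of the value rows.
        output.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                       for c in range(len(v[0]))])
        all_logits.append(logits)
    return output, all_logits


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
out, _ = attention(q, k, v, mask=[[0.0, 1.0]])
```

Because the second key is masked out, the query attends only to the first value row.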
 
- 
deepword.models.transformer.sequential_decoding(decoder, copy_mask, enc_x, enc_output, training, max_tar_len, sos_id, eos_id, padding_id, use_greedy=True, beam_size=1, temperature=1.0)¶
- 
deepword.models.transformer.token_logit_masking(token_id: int, vocab_size: int)¶
- Generate logits to choose the token_id. e.g. with vocab_size = 10, token_id = 0, we have [ 0., -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf, -inf]. Adding this mask to the normal logits leaves token_id = 0 as the only token that can be chosen.
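The effect of such a mask is easy to check in plain Python. The helper name `token_logit_mask` and the logits below are illustrative assumptions, not the module's TensorFlow code.

```python
import math


def token_logit_mask(token_id, vocab_size):
    # 0 for the allowed token, -inf for every other token.
    return [0.0 if i == token_id else -math.inf for i in range(vocab_size)]


mask = token_logit_mask(token_id=0, vocab_size=4)
logits = [1.5, 0.2, 3.0, -1.0]
masked = [l + m for l, m in zip(logits, mask)]
# Even though index 2 had the largest raw logit, only index 0 survives.
chosen = max(range(len(masked)), key=lambda i: masked[i])
```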
deepword.models.utils module¶
- 
deepword.models.utils.encoder_cnn(src, src_embeddings, pos_embeddings, filter_sizes, num_filters, embedding_size, is_infer=False, num_channels=2, activation='tanh')¶
- encode state with CNN, refer to Convolutional Neural Networks for Sentence Classification - Parameters
- src – placeholder, (tf.int32, [batch_size, seq_len]) 
- src_embeddings – (tf.float32, [vocab_size, embedding_size]) 
- pos_embeddings – (tf.float32, [max_position_size, embedding_size]) 
- filter_sizes – list of ints, e.g. [3, 4, 5] 
- num_filters – number of filters of each filter_size 
- embedding_size – embedding size 
- is_infer – training or inference 
- num_channels – 1 or 2. 
- activation – tanh (default) or relu 
 
- Returns
- a vector as the inner state 
 
- 
deepword.models.utils.encoder_cnn_base(input_tensor, filter_sizes, num_filters, num_channels, embedding_size, is_infer=False, activation='tanh')¶
- We pad input_tensor at the head of each string to generate equal-size output. E.g. given
 go north forest path this is a path …
 and conv-filter size 3, it will be padded at the head with two tokens:
 <S> <S> go north forest path this is a path …
 OR
 [PAD] [PAD] go north forest path this is a path …
 - The type of padding value doesn’t matter as long as it is a special token and is identical for each model.
 - We use the constant value 0 here, so make sure index 0 in your vocabulary is a special token that can be used for padding. - Parameters
- input_tensor – (tf.float32, [batch_size, seq_len, embedding_size, num_channels]) 
- filter_sizes – list of ints, e.g. [3, 4, 5] 
- num_filters – number of filters for each filter size 
- num_channels – 1 or 2, depending on the input tensor 
- embedding_size – word embedding size 
- is_infer – training or infer 
- activation – choose from “tanh” or “relu”. Notice that if you choose relu, make sure to add an extra dense layer; otherwise the output is all non-negative values. 
 
- Returns
- a vector as the inner state 
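The head padding described above can be checked on token ids. In this sketch (hypothetical helpers `head_pad` and `windows`), padding with filter_size - 1 copies of index 0 makes the number of convolution windows equal to the original sequence length.

```python
def head_pad(token_ids, filter_size, pad_id=0):
    # Prepend filter_size - 1 padding tokens.
    return [pad_id] * (filter_size - 1) + list(token_ids)


def windows(seq, filter_size):
    # All contiguous windows a filter of this size would slide over.
    return [seq[i:i + filter_size] for i in range(len(seq) - filter_size + 1)]


tokens = [5, 6, 7, 8]
padded = head_pad(tokens, filter_size=3)
wins = windows(padded, filter_size=3)
```

Each original token is the last element of exactly one window, so the output length matches the input length for every filter size.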
 
- 
deepword.models.utils.encoder_lstm(src, src_len, src_embeddings, num_units, num_layers)¶
- encode state with LSTM - Parameters
- src – placeholder, (tf.int32, [None, None]) 
- src_len – placeholder, (tf.float32, [None]) 
- src_embeddings – (tf.float32, [vocab_size, embedding_size]) 
- num_units – number of LSTM units 
- num_layers – number of LSTM layers 
 
- Returns
- inner states (c, h) 
 
- 
deepword.models.utils.l2_loss_1d_action(q_actions, action_idx, expected_q, b_weight)¶
- l2 loss for 1D action space. Only the q values in q_actions selected by action_idx are compared against expected_q.
 - e.g. “go east” would be one whole action. action_idx should have the same dimension as expected_q - Parameters
- q_actions – q-values 
- action_idx – placeholder, the action chosen for the state, in a format of (tf.int32, [None]) 
- expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None]) 
- b_weight – weights for each data point 
 
- Returns
- l2 loss and l1 loss 
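A minimal plain-Python sketch of this kind of loss follows. The function name and the reduction (a weighted mean of squared errors, plus per-instance absolute errors) are assumptions; the source only says that an l2 loss and an l1 loss are returned.

```python
def l2_loss_1d_action_sketch(q_actions, action_idx, expected_q, b_weight):
    # Pick, for each row, the q-value of the action that was actually taken.
    predicted_q = [row[a] for row, a in zip(q_actions, action_idx)]
    # Per-instance absolute error (the "l1" part).
    abs_loss = [abs(e - p) for e, p in zip(expected_q, predicted_q)]
    # Weighted mean of squared errors (the "l2" part); assumed reduction.
    loss = sum(w * d * d for w, d in zip(b_weight, abs_loss)) / len(abs_loss)
    return loss, abs_loss


loss, abs_loss = l2_loss_1d_action_sketch(
    q_actions=[[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]],
    action_idx=[2, 0],
    expected_q=[2.0, 1.5],
    b_weight=[1.0, 1.0],
)
```

Only the entries picked by action_idx (3.0 and 0.5 here) contribute; the other q values are ignored.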
 
- 
deepword.models.utils.l2_loss_1d_action_v2(q_actions, action_idx, expected_q, n_actions, b_weight)¶
- l2 loss for 1D action space. e.g. “go east” would be one whole action. - Parameters
- q_actions – Q-vector of a state for all actions 
- action_idx – placeholder, the action chosen for the state, in a format of (tf.int32, [None]) 
- expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None]) 
- n_actions – number of total actions 
- b_weight – weights for each data point 
 
- Returns
- l2 loss and l1 loss 
 
- 
deepword.models.utils.l2_loss_2d_action(q_actions, action_idx, expected_q, vocab_size, action_len, max_action_len, b_weight)¶
- l2 loss for 2D action space. e.g. “go east” is an action composed of “go” and “east”. - Parameters
- q_actions – Q-matrix of a state for all action-components, e.g. tokens 
- action_idx – placeholder, the action-components chosen for the state, in a format of (tf.int32, [None, None]) 
- expected_q – placeholder, the expected reward gained from the step, in a format of (tf.float32, [None]) 
- vocab_size – number of action-components 
- action_len – length of each action in a format of (tf.int32, [None]) 
- max_action_len – maximum length of action 
- b_weight – weights for each data point 
 
- Returns
- l2 loss and l1 loss 
 
- 
deepword.models.utils.positional_encoding(position, d_model)¶
- Create position embeddings with sin/cos; no training needed - Parameters
- position – maximum position size 
- d_model – embedding size 
 
- Returns
- position embeddings in shape (1, position, d_model) 
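The sin/cos construction can be sketched in plain Python. This assumes the standard formulation from the TensorFlow Transformer tutorial (sine on even embedding indices, cosine on odd ones, with a base of 10000); the source does not spell out the formula, so the details here are an assumption.

```python
import math


def positional_encoding_sketch(position, d_model):
    table = []
    for pos in range(position):
        row = []
        for i in range(d_model):
            # Index pairs (2j, 2j + 1) share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    # Leading axis of size 1 so the table broadcasts over a batch.
    return [table]


pe = positional_encoding_sketch(position=5, d_model=4)
```

The result has shape (1, 5, 4), and position 0 encodes as alternating 0 and 1 because sin(0) = 0 and cos(0) = 1.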
 
