deepword package¶
Subpackages¶
- deepword.agents package
- deepword.models package
  - Submodules
    - deepword.models.dqn_modeling module
    - deepword.models.drrn_modeling module
    - deepword.models.dsqn_modeling module
    - deepword.models.gen_modeling module
    - deepword.models.models module
    - deepword.models.nlu_modeling module
    - deepword.models.snn_modeling module
    - deepword.models.transformer module
    - deepword.models.utils module
  - Module contents
- deepword.students package
- deepword.tests package
- deepword.tools package
  - Submodules
    - deepword.tools.build_test_set module
    - deepword.tools.clean_hs2tj module
    - deepword.tools.collect_game_elements module
    - deepword.tools.compare_eval_results module
    - deepword.tools.diff_train_w_test module
    - deepword.tools.play_game module
    - deepword.tools.read_eval_failed_reason module
    - deepword.tools.read_eval_results module
    - deepword.tools.replay_from_log module
  - Module contents
Submodules¶
deepword.action module¶
class deepword.action.ActionCollector(tokenizer: deepword.tokenizers.Tokenizer, n_tokens: int, unk_val_id: int, padding_val_id: int)¶
- Bases: deepword.log.Logging
- Collect actions for different games.
__init__(tokenizer: deepword.tokenizers.Tokenizer, n_tokens: int, unk_val_id: int, padding_val_id: int) → None¶
- Parameters
- tokenizer – see deepword.tokenizers
- n_tokens – max allowed number of tokens for all actions
- unk_val_id – ID of the unknown token
- padding_val_id – ID of the padding token

property action2idx¶
- Map from current actions to token IDs.

property action_len¶
- Current action lengths.

property action_matrix¶
- Current action matrix.

property actions¶
- Current actions as strings.

add_new_episode(gid: str) → None¶
- Add a new episode with a game ID.
- Parameters
- gid – game ID, a string that separates different games.

extend(actions: List[str]) → numpy.ndarray¶
- Extend actions into the ActionCollector.
- Parameters
- actions – a list of actions for the current episode of game-playing.

get_action_len(gid: Optional[str] = None) → numpy.ndarray¶
- Get action lengths for a game.
- Parameters
- gid – game ID; None falls back to the current episode's game.
- Returns
- an array of ints, each element being the length of the corresponding action

get_action_matrix(gid: Optional[str] = None) → numpy.ndarray¶
- Get the action matrix for a game; see the sketch at the end of this class.
- Parameters
- gid – the game ID; None falls back to the currently active game.
- Returns
- an array of actions, where each action is a vector of token IDs, padded at the end to a common length

get_actions(gid: Optional[str] = None) → List[str]¶
- Get all actions for a game.
- Parameters
- gid – game ID; None falls back to the current episode's game.
- Returns
- a list of actions as strings

get_game_ids() → List[str]¶
- Get all game IDs in this ActionCollector. 
 - 
load_actions(path: str) → None¶
- Load all actions into this ActionCollector.
- Parameters
- path – a path to an npz file.
 
 - 
save_actions(path: str) → None¶
- Save all actions to a path as an npz file.
- Parameters
- path – an npz path to save to.
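
A minimal, self-contained sketch of the action matrix described above, with a toy whitespace tokenizer standing in for deepword.tokenizers.Tokenizer (the vocabulary and helper below are illustrative, not deepword's API):

    import numpy as np

    # Toy vocabulary; a real collector uses a Tokenizer from deepword.tokenizers.
    vocab = {"[PAD]": 0, "[UNK]": 1, "go": 2, "east": 3, "open": 4, "door": 5}

    def to_ids(action, n_tokens, unk_val_id=1, padding_val_id=0):
        ids = [vocab.get(t, unk_val_id) for t in action.split()][:n_tokens]
        return ids + [padding_val_id] * (n_tokens - len(ids))

    actions = ["go east", "open door", "look"]
    action_matrix = np.asarray([to_ids(a, n_tokens=4) for a in actions])
    action_len = np.asarray([min(len(a.split()), 4) for a in actions])
    print(action_matrix)  # one row of token IDs per action, padded to n_tokens
    print(action_len)     # [2 2 1]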
 
 
deepword.dependency_parser module¶
- 
class deepword.dependency_parser.DependencyParserReorder(padding_val: str, stride_len: int)¶
- Bases: deepword.log.Logging
- Use a dependency parser to reorder master sentences. Make sure to start a Stanford CoreNLP server first; refer to https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
- The DP reorder class is used with CNN layers for trajectory encoding; refer to https://arxiv.org/abs/1905.02265 for details.
__init__(padding_val: str, stride_len: int) → None¶
- Parameters
- padding_val – padding token, e.g. ‘[PAD]’ or ‘O’ 
- stride_len – CNN stride len 
 
 
 - 
reorder(master: str) → str¶
- Use dependency parser to reorder a paragraph. 
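
A minimal usage sketch, assuming the package is installed and a Stanford CoreNLP server is already running (the input text is illustrative):

    from deepword.dependency_parser import DependencyParserReorder

    # Requires a running Stanford CoreNLP server; see the link above.
    dp = DependencyParserReorder(padding_val="[PAD]", stride_len=3)
    reordered = dp.reorder("You are in the kitchen. There is a door to the east.")
    print(reordered)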
 
deepword.eval_games module¶
- 
class deepword.eval_games.EvalResult(score, positive_score, negative_score, max_score, steps, won, action_list)¶
- 
class deepword.eval_games.FullDirEvalPlayer¶
- Bases: deepword.log.Logging
classmethod start(hp, model_dir, game_files, n_gpus, range_min=None, range_max=None)¶
 
class deepword.eval_games.LoopDogEvalPlayer¶
- Bases: deepword.log.Logging
start(hp, model_dir, game_files, n_gpus)¶
 
- 
- 
class deepword.eval_games.MultiGPUsEvalPlayer(hp, model_dir, game_files, n_gpus, load_best=True)¶
- Bases: deepword.log.Logging
- Eval player that runs on multiple GPUs.
evaluate(restore_from: str, debug: bool = False) → None¶
- Evaluate an agent.
- Parameters
- restore_from – path to restore weights from
- debug – if True, multi-threading is disabled
 
 
 - 
has_better_model(total_scores: float, total_steps: float) → bool¶
- Whether the current model is better than the current best.
- Parameters
- total_scores – total scores earned
- total_steps – total steps used
 
 
 - 
save_best_model(loaded_ckpt_step: int) → None¶
- Copy the current model to the best-model dir.
- Parameters
- loaded_ckpt_step – which model (checkpoint step) to copy
 
 - 
classmethod split_game_files(game_files: List[str], k: int, rnd_seed: int = 42) → List[List[str]]¶
- Split game files into k portions for multi-GPU playing; a sketch follows this class.
- Parameters
- game_files – a list of games for playing
- k – number of splits
- rnd_seed – random seed
 
- Returns
- a list of list of game files 
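
A minimal sketch of such a split (the round-robin dealing after a seeded shuffle is an assumption, not necessarily deepword's exact scheme):

    import random

    def split_game_files(game_files, k, rnd_seed=42):
        # Shuffle deterministically, then deal files round-robin into k buckets.
        files = sorted(game_files)
        random.Random(rnd_seed).shuffle(files)
        return [files[i::k] for i in range(k)]

    print(split_game_files(["g1.ulx", "g2.ulx", "g3.ulx", "g4.ulx"], k=2))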
 
 
- 
- 
class deepword.eval_games.NewModelHandler(hp, model_dir, game_files, n_gpus)¶
- Bases: watchdog.events.FileSystemEventHandler
is_ckpt_file(src_path)¶
 - 
on_created(event)¶
- Called when a file or directory is created.
- Parameters
- event (DirCreatedEvent or FileCreatedEvent) – Event representing file/directory creation.
 
 - 
on_modified(event)¶
- Called when a file or directory is modified.
- Parameters
- event (DirModifiedEvent or FileModifiedEvent) – Event representing file/directory modification.
 
 - 
run_eval_player(restore_from=None, load_best=False)¶
 
- 
- 
class deepword.eval_games.WatchDogEvalPlayer¶
- Bases: deepword.log.Logging
start(hp, model_dir, game_files, n_gpus)¶
 
- 
- 
deepword.eval_games.agent_collect_data(agent, game_files, max_episode_steps, epoch_size, epoch_limit)¶
- 
deepword.eval_games.agg_eval_results(eval_results: Dict[str, List[deepword.eval_games.EvalResult]], max_steps_per_episode: int = 100) → Tuple[Dict[str, deepword.eval_games.EvalResult], float, float, float, float, float, float]¶
- Aggregate evaluation results. We run N test games, each with M episodes, and each episode has a maximum of K steps. A simplified sketch follows.
- Parameters
- eval_results – evaluation results of text-based games, in the format dict(game_name, [eval_result1, …, eval_resultM]); the number of eval_results is the same for all games. Each eval_result contains score, positive_score, negative_score, max_score, steps, won (bool), and used_action_list.
- max_steps_per_episode – i.e. K, default = 100
- Returns
- agg_per_game – dict(game_name, (sum scores, sum max scores, sum steps, # won))
- sample_mean – total earned scores / total maximum scores
- confidence_interval – confidence interval of sample_mean over the M episodes
- steps – total used steps / total maximum steps
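
A simplified, self-contained sketch of this aggregation (the normal-approximation confidence interval and the reduced return tuple are illustrative; the real function returns more statistics):

    from collections import namedtuple
    import numpy as np

    EvalResult = namedtuple(
        "EvalResult",
        "score positive_score negative_score max_score steps won action_list")

    def agg_eval_results(eval_results, max_steps_per_episode=100):
        agg_per_game = {}
        ratios = []  # per-episode earned/max score ratios
        total_scores = total_max = total_steps = 0.0
        for game, results in eval_results.items():
            scores = sum(r.score for r in results)
            max_scores = sum(r.max_score for r in results)
            steps = sum(r.steps for r in results)
            n_won = sum(1 for r in results if r.won)
            agg_per_game[game] = (scores, max_scores, steps, n_won)
            ratios.extend(r.score / r.max_score for r in results)
            total_scores += scores
            total_max += max_scores
            total_steps += steps
        sample_mean = total_scores / total_max
        # half-width of a normal-approximation CI over per-episode ratios
        ci = 1.96 * np.std(ratios) / np.sqrt(len(ratios))
        n_episodes = sum(len(v) for v in eval_results.values())
        steps_ratio = total_steps / (max_steps_per_episode * n_episodes)
        return agg_per_game, sample_mean, ci, steps_ratio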
 
- 
deepword.eval_games.eval_agent(hp: tensorflow.contrib.training.python.training.hparam.HParams, model_dir: str, load_best: bool, restore_from: str, game_files: List[str], gpu_device: Optional[str] = None) → Tuple[Dict[str, List[deepword.eval_games.EvalResult]], int]¶
- Evaluate an agent with given games. For each game, we run nb_episodes episodes, with max_episode_steps per episode.
- Note that evaluation differs from training: in training, we register all given games with the TextWorld structure and play them in random order; for evaluation, we register one game at a time and play it for nb_episodes.
- Parameters
- hp – hyperparameter to create the agent 
- model_dir – model dir of the agent 
- load_best – bool; load from best_weights if True, otherwise from last_weights
- restore_from – string, load from a specific model, e.g. {model_dir}/last_weights/after_epoch-0 
- game_files – game files for evaluation 
- gpu_device – which GPU device to load, in a format of “/device:GPU:i” 
 
- Returns
- eval_results, loaded_ckpt_step 
 
- 
deepword.eval_games.scores_of_tiers(agg_per_game: Dict[str, deepword.eval_games.EvalResult]) → Dict[str, float]¶
- Compute scores per tier given aggregated scores per game.
- Parameters
- agg_per_game – aggregated results per game
- Returns
- a map of tier-name -> score, from tier1 to tier6
 
deepword.floor_plan module¶
- 
class deepword.floor_plan.FloorPlanCollector¶
- Bases: deepword.log.Logging
- Collect floor plans from games. E.g. if going east from the kitchen leads to the bedroom, then we know kitchen – east –> bedroom and bedroom – west –> kitchen.
 - 
add_new_episode(eid)¶
 - 
extend(fps)¶
 - 
get_map(room)¶
 - 
init()¶
 - 
load_fps(path)¶
 - 
route_to_kitchen(room)¶
 - 
classmethod route_to_room(ss, tt, fp, visited)¶
- Find the fastest route from a given room to a target room using DFS; see the sketch at the end of this class.
- Parameters
- ss – start room
- tt – target room
- fp – floor plan
- visited – initialized by []
- Returns
- directions, rooms
 - 
save_fps(path)¶
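
A self-contained sketch of the DFS route search described in route_to_room, over a toy floor plan (the nested-dict layout of fp is an assumption):

    def route_to_room(ss, tt, fp, visited):
        """DFS from room ss to room tt over floor plan fp.

        fp maps a room to {direction: neighboring room}.
        Returns (directions, rooms), or None if tt is unreachable.
        """
        if ss == tt:
            return [], []
        visited.append(ss)
        for direction, room in fp.get(ss, {}).items():
            if room in visited:
                continue
            sub = route_to_room(room, tt, fp, visited)
            if sub is not None:
                return [direction] + sub[0], [room] + sub[1]
        return None

    fp = {"kitchen": {"east": "bedroom"},
          "bedroom": {"west": "kitchen", "north": "bathroom"}}
    print(route_to_room("kitchen", "bathroom", fp, []))
    # (['east', 'north'], ['bedroom', 'bathroom'])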
 
deepword.hparams module¶
- 
class deepword.hparams.Conventions(logo_file, bert_ckpt_dir, bert_vocab_file, nltk_vocab_file, glove_vocab_file, glove_emb_file, legacy_zork_vocab_file, albert_ckpt_dir, albert_vocab_file, albert_spm_path, bert_cls_token, bert_unk_token, bert_padding_token, bert_sep_token, bert_mask_token, bert_sos_token, bert_eos_token, albert_cls_token, albert_unk_token, albert_padding_token, albert_sep_token, albert_mask_token, nltk_unk_token, nltk_padding_token, nltk_sos_token, nltk_eos_token)¶
- Bases: deepword.hparams.Conventions
- 
deepword.hparams.copy_hparams(hp: tensorflow.contrib.training.python.training.hparam.HParams) → tensorflow.contrib.training.python.training.hparam.HParams¶
- Deepcopy for hp 
- 
deepword.hparams.get_model_hparams(model_creator: str) → tensorflow.contrib.training.python.training.hparam.HParams¶
- 
deepword.hparams.has_valid_val(dict_args: Optional[Dict[str, Any]], key: str) → bool¶
- if dict_args exists 
- if key in dict_args 
- if dict_args[key] is not None 
 
- 
deepword.hparams.load_hparams(fn_model_config: Optional[str] = None, cmd_args: Optional[Dict[str, Any]] = None, fn_pre_config: Optional[str] = None) → tensorflow.contrib.training.python.training.hparam.HParams¶
- Load hyper-parameters with the following priority: priority(file_args) > priority(cmd_args), except for args in allowed_to_change; priority(cmd_args) > priority(pre_config); priority(pre_config) > priority(default). A sketch of this layering follows.
- Parameters
- fn_model_config – hyperparameter config file in model_dir 
- cmd_args – command line arguments 
- fn_pre_config – pre config file for model 
 
- Returns
- hp 
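
A minimal sketch of this layered override logic using plain dicts (names are illustrative; the real function operates on TF HParams objects):

    def load_layered(default, pre_config=None, cmd_args=None,
                     file_args=None, allowed_to_change=()):
        hp = dict(default)            # lowest priority
        hp.update(pre_config or {})   # pre_config > default
        hp.update(cmd_args or {})     # cmd_args > pre_config
        # file_args > cmd_args, except keys the cmd line is allowed to change
        hp.update({k: v for k, v in (file_args or {}).items()
                   if k not in allowed_to_change})
        return hp

    print(load_layered({"lr": 1e-3, "dim": 64},
                       cmd_args={"lr": 1e-4, "dim": 32},
                       file_args={"dim": 128},
                       allowed_to_change=("lr",)))
    # {'lr': 0.0001, 'dim': 128}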
 
- 
deepword.hparams.output_hparams(hp: tensorflow.contrib.training.python.training.hparam.HParams) → str¶
- Pretty-print hp as a table-style string
- 
deepword.hparams.save_hparams(hp: tensorflow.contrib.training.python.training.hparam.HParams, file_path: str) → None¶
- Save hyperparameters to a json file 
- 
deepword.hparams.update_hparams_from_dict(hp: tensorflow.contrib.training.python.training.hparam.HParams, cmd_args: Dict[str, Any], allowed_to_change: Optional[Iterable[str]] = None) → tensorflow.contrib.training.python.training.hparam.HParams¶
- Update hp from a dict.
- Parameters
- hp – hyperparameters
- cmd_args – command line arguments
- allowed_to_change – keys that are allowed to be updated
- Returns
- an hp
 
- 
deepword.hparams.update_hparams_from_file(hp: tensorflow.contrib.training.python.training.hparam.HParams, file_args: str) → tensorflow.contrib.training.python.training.hparam.HParams¶
- update hp from a json file 
- 
deepword.hparams.update_hparams_from_hparams(hp: tensorflow.contrib.training.python.training.hparam.HParams, hp2: tensorflow.contrib.training.python.training.hparam.HParams) → tensorflow.contrib.training.python.training.hparam.HParams¶
- Update hp from hp2; hp should not share any keys with hp2.

deepword.log module¶
- 
class deepword.log.Logging(name: Optional[str] = None)¶
- Bases: object
- Logging utils for classes; a minimal sketch appears after the method list.
__init__(name: Optional[str] = None)¶
- Parameters
- name – name for logging, default module_name.class_name 
 
 - 
debug(msg, *args, **kwargs)¶
 - 
error(msg, *args, **kwargs)¶
 - 
info(msg, *args, **kwargs)¶
 - 
warning(msg, *args, **kwargs)¶
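
A minimal sketch of such a logging mixin (assumed behavior, inferred from the documented default name):

    import logging

    class Logging(object):
        """Mixin giving a class its own named logger."""

        def __init__(self, name=None):
            if name is None:  # default: module_name.class_name
                name = "{}.{}".format(self.__class__.__module__,
                                      self.__class__.__name__)
            self._logger = logging.getLogger(name)

        def debug(self, msg, *args, **kwargs):
            self._logger.debug(msg, *args, **kwargs)

        def info(self, msg, *args, **kwargs):
            self._logger.info(msg, *args, **kwargs)

        def warning(self, msg, *args, **kwargs):
            self._logger.warning(msg, *args, **kwargs)

        def error(self, msg, *args, **kwargs):
            self._logger.error(msg, *args, **kwargs)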
 
deepword.main module¶
- 
deepword.main.eval_one_ckpt(hp, model_dir, data_path, learner_clazz, device, ckpt_path)¶
- 
deepword.main.get_parser() → argparse.ArgumentParser¶
- Get arg parser for different modules 
- 
deepword.main.hp_parser() → argparse.ArgumentParser¶
- Arg parser for hyper-parameters 
- 
deepword.main.main(args)¶
- 
deepword.main.process_eval_dqn(args)¶
- Evaluate dqn models 
- 
deepword.main.process_eval_student(args)¶
- Evaluate student models 
- 
deepword.main.process_gen_data(args)¶
- Generate training data from a teacher model 
- 
deepword.main.process_hp(args) → tensorflow.contrib.training.python.training.hparam.HParams¶
- Load hyperparameters from three locations: 1. the config file in model_dir; 2. pre-config files; 3. cmd line args.
- Parameters
- args – cmd line args 
- Returns
- hyperparameters 
 
- 
deepword.main.process_snn_input(args)¶
- Generate SNN input
- 
deepword.main.process_train_dqn(args)¶
- Train DQN models 
- 
deepword.main.process_train_student(args)¶
- Train student models 
- 
deepword.main.run_agent(agent: deepword.agents.base_agent.BaseAgent, game_env: gym.core.Env, nb_games: int, nb_epochs: int) → None¶
- Run a training agent on given games.
- Parameters
- agent – an agent extending the base agent; see deepword.agents.base_agent.BaseAgent
- game_env – game Env, from gym
- nb_games – number of games
- nb_epochs – number of epochs for training
 
 
- 
deepword.main.run_agent_v2(agent: deepword.agents.base_agent.BaseAgent, game_env: gym.core.Env, nb_games: int, nb_epochs: int) → None¶
- Run a training agent on given games. Proactively request "look" and "inventory" results from games to substitute for the description and inventory parts of infos. This is useful when games don't provide description and inventory, e.g. Z-machine games.
- NB: this incurs extra steps during game playing; remember to use 3 times the previous step quota. E.g. if you previously used 100 max steps, you now need 100 * 3 max steps.
- 
deepword.main.train(hp: tensorflow.contrib.training.python.training.hparam.HParams, model_dir: str, game_dir: str, f_games: Optional[str] = None, func_run_agent: Callable[[deepword.agents.base_agent.BaseAgent, gym.core.Env, int, int], None] = <function run_agent>) → None¶
- Train an agent.
- Parameters
- hp – hyper-parameters; see deepword.hparams
- model_dir – model dir
- game_dir – game dir with ulx games
- f_games – game names to select from game_dir
- func_run_agent – how to run the agent and games; see deepword.main.run_agent()
 
 
- 
deepword.main.train_v2(hp, model_dir, game_dir, f_games=None)¶
- Train DQN agents by proactively requesting description and inventory; max steps per episode will be enlarged 3 times.
deepword.stats module¶
- 
class deepword.stats.UCBComputer(d_states: int, d_actions: int)¶
- Bases: object
- Compute Upper Confidence Bound actions during game playing, at inference time only, when hidden states are fixed.
- We use the Abbasi-Yadkori, Pal, and Szepesvari (APS) bound for LinUCB.
- Cite: Improved Algorithms for Linear Stochastic Bandits (Abbasi-Yadkori, Pal, and Szepesvari, 2011)
- See also: Learn What Not to Learn (Tom Zahavy et al., 2019)
__init__(d_states: int, d_actions: int)¶
- V: covariance matrix for each action a
- lam: lambda, controlling parameter size in ridge regression
- r: R-sub-Gaussian
- s: bound for |theta_a|_2
- delta: with probability 1 - delta, the bound holds
- Parameters
- d_states – dimension of hidden states 
- d_actions – number of actions 
 
 
 - 
aps_bound(q_actions: numpy.ndarray, h_state: numpy.ndarray) → float¶
- Compute the APS bound.
- Parameters
- q_actions – Q-vector of actions 
- h_state – hidden state of a game state 
 
 - Returns
- upper confidence bound of q_actions. 
 
 - 
collect_sample(action_idx: int, h_state: numpy.ndarray) → None¶
- Collect state-action pairs.
- Parameters
- action_idx – action index 
- h_state – hidden state vector 
 
 
 - 
reset() → None¶
- Reset to accept new episodes 
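
A sketch of a LinUCB-style bound in this family (the constants follow the self-normalized bound of Abbasi-Yadkori et al. with unit feature norm; deepword's exact computation, defaults, and return shape may differ):

    import numpy as np

    class LinUCBSketch(object):
        def __init__(self, d_states, d_actions,
                     lam=1.0, r=1.0, s=1.0, delta=0.1):
            self.lam, self.r, self.s, self.delta = lam, r, s, delta
            self.d = d_states
            # one ridge-regression covariance matrix V per action
            self.V = [lam * np.eye(d_states) for _ in range(d_actions)]
            self.t = 0

        def collect_sample(self, action_idx, h_state):
            self.V[action_idx] += np.outer(h_state, h_state)
            self.t += 1

        def aps_bound(self, q_actions, h_state):
            # beta_t = r * sqrt(d * log((1 + t/(lam*d)) / delta)) + sqrt(lam) * s
            beta = (self.r * np.sqrt(self.d * np.log(
                (1 + self.t / (self.lam * self.d)) / self.delta))
                + np.sqrt(self.lam) * self.s)
            widths = np.array([np.sqrt(h_state @ np.linalg.inv(V) @ h_state)
                               for V in self.V])
            # returns per-action upper bounds here; the documented API reduces
            # this to a single float
            return np.asarray(q_actions, dtype=float) + beta * widths

        def reset(self):
            self.V = [self.lam * np.eye(self.d) for _ in range(len(self.V))]
            self.t = 0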
 
- 
deepword.stats.mean_confidence_interval(data: numpy.ndarray, confidence: float = 0.95) → Tuple[float, float]¶
- Given 1D np array data, compute the mean and confidence interval at the given confidence level.
- Parameters
- data – 1D np array 
- confidence – confidence level 
 
- Returns
- mean and confidence interval 
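
A standard implementation of this statistic using a Student-t interval (a sketch; deepword's exact estimator is not shown here):

    import numpy as np
    from scipy import stats

    def mean_confidence_interval(data, confidence=0.95):
        data = np.asarray(data, dtype=np.float64)
        n = len(data)
        mean = np.mean(data)
        sem = stats.sem(data)  # standard error of the mean
        # half-width of the two-sided t-interval with n - 1 degrees of freedom
        h = sem * stats.t.ppf((1 + confidence) / 2.0, n - 1)
        return mean, h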
 
deepword.sum_tree module¶
This SumTree code is a modified version of Morvan Zhou's: https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/contents/5.2_Prioritized_Replay_DQN/RL_brain.py
- 
class deepword.sum_tree.SumTree(*args, **kwds)¶
- Bases: typing.Generic
- The SumTree is a binary tree whose leaf nodes contain the real data; a compact sketch follows the method list.
add(priority: float, data: E)¶
- Add an experience into the tree with a priority.
- Parameters
- priority – priority of sampling 
- data – experience of the replay 
 
- Returns
- old data at the same position, 0 if unset 
 
 - 
get_leaf(v: float) → Tuple[int, float, E]¶
- Get a leaf_index w.r.t. a priority value; the selected leaf_index must have the smallest priority among all leaves with priority values larger than v.
- Parameters
- v – a priority value 
- Returns
- leaf index, priority, and experience associated with the leaf index 
 
 - 
property total_priority¶
- The total priority is the value on the root node. 
 - 
update(tree_index: int, priority: float) → None¶
- Update the leaf priority score and propagate the change through the tree.
- Parameters
- tree_index – tree index of the current data_pointer 
- priority – priority sampling value 
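
A compact, self-contained sketch of such a SumTree (array-based, with leaves stored after the internal nodes; deepword's version is additionally generic over the stored element type):

    import numpy as np

    class SumTree:
        def __init__(self, capacity):
            self.capacity = capacity
            self.tree = np.zeros(2 * capacity - 1)  # internal nodes + leaves
            self.data = [None] * capacity
            self.ptr = 0  # next leaf slot to write

        def add(self, priority, data):
            tree_index = self.ptr + self.capacity - 1
            old = self.data[self.ptr]
            self.data[self.ptr] = data
            self.update(tree_index, priority)
            self.ptr = (self.ptr + 1) % self.capacity  # overwrite the oldest
            return old

        def update(self, tree_index, priority):
            change = priority - self.tree[tree_index]
            self.tree[tree_index] = priority
            while tree_index != 0:  # propagate the change up to the root
                tree_index = (tree_index - 1) // 2
                self.tree[tree_index] += change

        def get_leaf(self, v):
            i = 0
            while 2 * i + 1 < len(self.tree):  # descend until a leaf
                left = 2 * i + 1
                if v <= self.tree[left]:
                    i = left
                else:
                    v -= self.tree[left]
                    i = left + 1
            return i, self.tree[i], self.data[i - self.capacity + 1]

        @property
        def total_priority(self):
            return self.tree[0]  # the root holds the sum of all priorities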
 
 
 
deepword.tokenizers module¶
- 
class deepword.tokenizers.AlbertTokenizer(vocab_file, do_lower_case, spm_model_file)¶
- Bases: deepword.tokenizers.BertTokenizer
- The tokenizer from ALBERT.
de_tokenize(ids)¶
- Turn a list of ids into a string.
- Parameters
- ids – ids of tokens 
- Returns
- a string 
 
 
- 
- 
class deepword.tokenizers.BertTokenizer(vocab_file, do_lower_case)¶
- Bases: deepword.tokenizers.Tokenizer
- The tokenizer from BERT.
convert_ids_to_tokens(ids)¶
- Convert ids to tokens.
- Parameters
- ids – a list of ids 
- Returns
- a list of tokens 
 
 - 
convert_tokens_to_ids(tokens)¶
- Convert tokens into ids.
- Parameters
- tokens – a list of tokens 
- Returns
- a list of ids 
 
 - 
de_tokenize(ids)¶
- Turn a list of ids into a string.
- Parameters
- ids – ids of tokens 
- Returns
- a string 
 
 - 
property inv_vocab¶
- Inverse of the vocabulary.
- Returns
- map from positions (ids) to tokens 
 
 - 
tokenize(text)¶
- Tokenize a text into a list of tokens.
- Parameters
- text – a string to tokenize 
- Returns
- a list of tokens 
 
 - 
property vocab¶
- Get the vocabulary.
- Returns
- map from tokens to positions (ids) 
 
 
- 
- 
class deepword.tokenizers.LegacyZorkTokenizer(vocab_file)¶
- Bases: deepword.tokenizers.NLTKTokenizer
- The NLTK tokenizer, but keeps only alphabetic strings by removing all tokens with non-alphabetic characters.
tokenize(text)¶
- Tokenize a text into a list of tokens.
- Parameters
- text – a string to tokenize 
- Returns
- a list of tokens 
 
 
- 
- 
class deepword.tokenizers.NLTKTokenizer(vocab_file, do_lower_case)¶
- Bases: deepword.tokenizers.Tokenizer
- Wrapper of the tokenizer from the NLTK package.
convert_ids_to_tokens(ids)¶
- Convert ids to tokens.
- Parameters
- ids – a list of ids 
- Returns
- a list of tokens 
 
 - 
convert_tokens_to_ids(tokens)¶
- Convert tokens into ids.
- Parameters
- tokens – a list of tokens 
- Returns
- a list of ids 
 
 - 
de_tokenize(ids: List[int]) → str¶
- Turn a list of ids into a string.
- Parameters
- ids – ids of tokens 
- Returns
- a string 
 
 - 
property inv_vocab¶
- Inverse of the vocabulary.
- Returns
- map from positions (ids) to tokens 
 
 - 
tokenize(text)¶
- Tokenize a text into a list of tokens.
- Parameters
- text – a string to tokenize 
- Returns
- a list of tokens 
 
 - 
property vocab¶
- Get the vocabulary.
- Returns
- map from tokens to positions (ids) 
 
 
- 
- 
class deepword.tokenizers.Tokenizer¶
- Bases: object
- A wrapper of a tokenizer; a toy implementation of this interface follows the method list.
convert_ids_to_tokens(ids: List[int]) → List[str]¶
- Convert ids to tokens.
- Parameters
- ids – a list of ids 
- Returns
- a list of tokens 
 
 - 
convert_tokens_to_ids(tokens: List[str]) → List[int]¶
- Convert tokens into ids.
- Parameters
- tokens – a list of tokens 
- Returns
- a list of ids 
 
 - 
de_tokenize(ids: List[int]) → str¶
- Turn a list of ids into a string.
- Parameters
- ids – ids of tokens 
- Returns
- a string 
 
 - 
property inv_vocab¶
- Inverse of the vocabulary.
- Returns
- map from positions (ids) to tokens 
 
 - 
tokenize(text: str) → List[str]¶
- Tokenize a text into a list of tokens.
- Parameters
- text – a string to tokenize 
- Returns
- a list of tokens 
 
 - 
property vocab¶
- Get the vocabulary.
- Returns
- map from tokens to positions (ids) 
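
A toy implementation of this interface, for illustration only (whitespace tokenization over a fixed vocabulary; the real subclasses wrap BERT, ALBERT, or NLTK):

    class WhitespaceTokenizer(object):
        """Toy Tokenizer: whitespace splitting over a fixed vocabulary."""

        def __init__(self, tokens):
            self._vocab = {t: i for i, t in enumerate(tokens)}
            self._inv_vocab = {i: t for t, i in self._vocab.items()}

        @property
        def vocab(self):
            return self._vocab

        @property
        def inv_vocab(self):
            return self._inv_vocab

        def tokenize(self, text):
            return text.lower().split()

        def convert_tokens_to_ids(self, tokens):
            return [self._vocab[t] for t in tokens]

        def convert_ids_to_tokens(self, ids):
            return [self._inv_vocab[i] for i in ids]

        def de_tokenize(self, ids):
            return " ".join(self.convert_ids_to_tokens(ids))

    tok = WhitespaceTokenizer(["go", "east", "open", "door"])
    ids = tok.convert_tokens_to_ids(tok.tokenize("go east"))
    print(ids, tok.de_tokenize(ids))  # [0, 1] go east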
 
 
- 
- 
deepword.tokenizers.get_albert_tokenizer(hp: tensorflow.contrib.training.python.training.hparam.HParams) → Tuple[tensorflow.contrib.training.python.training.hparam.HParams, deepword.tokenizers.Tokenizer]¶
- 
deepword.tokenizers.get_bert_tokenizer(hp: tensorflow.contrib.training.python.training.hparam.HParams) → Tuple[tensorflow.contrib.training.python.training.hparam.HParams, deepword.tokenizers.Tokenizer]¶
- 
deepword.tokenizers.get_nltk_tokenizer(hp: tensorflow.contrib.training.python.training.hparam.HParams, vocab_file: str = '/Users/xusenyin/git-store/deepword/python/deepword/../../resources/vocab.txt') → Tuple[tensorflow.contrib.training.python.training.hparam.HParams, deepword.tokenizers.Tokenizer]¶
- 
deepword.tokenizers.get_zork_tokenizer(hp: tensorflow.contrib.training.python.training.hparam.HParams, vocab_file: str = '/Users/xusenyin/local/opt/legacy-zork-vocab.txt') → Tuple[tensorflow.contrib.training.python.training.hparam.HParams, deepword.tokenizers.Tokenizer]¶
- 
deepword.tokenizers.init_tokens(hp: tensorflow.contrib.training.python.training.hparam.HParams) → Tuple[tensorflow.contrib.training.python.training.hparam.HParams, deepword.tokenizers.Tokenizer]¶
- Initialize a tokenizer given hyperparameters.
- Parameters
- hp – hyperparameters, see deepword.hparams
- Returns
- updated hp, tokenizer 
 
deepword.trajectory module¶
- 
class deepword.trajectory.Trajectory(*args, **kwds)¶
- Bases: typing.Generic, deepword.log.Logging
- BaseTrajectory only takes care of interacting with the Agent to collect game scripts. Fetching data from a Trajectory to feed into an Encoder should be implemented in extended classes.
__init__(num_turns: int, size_per_turn: int = 1)¶
- Take the ActionMaster (AM) as an example: given Trajectory(AM1, AM2, AM3, AM4, AM5) with last_sid pointing to AM5, num_turns = 1 means we choose [AM5].
- size_per_turn only controls how we separate pre- and post-trajectory; with the default size_per_turn = 1, AM4 is the pre-trajectory of AM5.
- Sometimes we need to change it, e.g. with legacy data where we store the trajectory as Trajectory(M1, A1, M2, A2, M3, A3, M4) and the last sid points to M4; then the pre-trajectory of [A3, M4] is [A2, M3], which is why size_per_turn should be set to 2. See the sketch after this parameter list.
- Parameters
- num_turns – how many turns to choose other than current turn 
- size_per_turn – how many cells count as one turn 
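
A tiny list-slicing sketch of these semantics (the indexing formula is illustrative, not deepword's implementation):

    # Trajectory cells and the state id (sid) of the last cell.
    tj = ["M1", "A1", "M2", "A2", "M3", "A3", "M4"]
    last_sid = len(tj) - 1

    def state(tj, sid, num_turns, size_per_turn):
        # a state spans (num_turns + 1) turns ending at sid, where one turn
        # spans size_per_turn cells; here num_turns counts the extra turns
        # before the current one
        start = sid + 1 - (num_turns + 1) * size_per_turn
        return tj[max(0, start):sid + 1]

    # size_per_turn=2: the last state is [A3, M4]; its pre-state is [A2, M3]
    print(state(tj, last_sid, num_turns=0, size_per_turn=2))      # ['A3', 'M4']
    print(state(tj, last_sid - 2, num_turns=0, size_per_turn=2))  # ['A2', 'M3']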
 
 
 - 
add_new_tj(tid: Optional[int] = None) → int¶
- Add a new trajectory.
- Parameters
- tid – trajectory id, None falls back to auto-generated id. 
- Returns
- a tid 
 
 - 
append(content: T) → None¶
- Uses a generic type for content: the trajectory class doesn't care what is stored and doesn't process it.
- Parameters
- content – something to add in the current trajectory 
 
 - 
fetch_batch_pre_states(b_tid: List[int], b_sid: List[int]) → List[List[T]]¶
- Fetch a batch of pre-states given trajectory ids and state ids; the position of pre-states depends on size_per_turn.
- Parameters
- b_tid – a batch of trajectory ids 
- b_sid – a batch of state ids 
 
- Returns
- a list of lists of contents 
 
 - 
fetch_batch_states(b_tid: List[int], b_sid: List[int]) → List[List[T]]¶
- Fetch a batch of states given trajectory ids and state ids.
- Parameters
- b_tid – a batch of trajectory ids 
- b_sid – a batch of state ids 
 
- Returns
- a list of lists of contents 
 
 - 
fetch_last_state() → List[T]¶
- Fetch the last state from the current trajectory.
- Returns
- a list of contents 
 
 - 
fetch_state_by_idx(tid: int, sid: int) → List[T]¶
- Fetch a state given trajectory id and state id.
- Returns
- a list of contents 
 
 - 
get_current_tid() → int¶
- Get current trajectory id 
 - 
get_last_sid() → int¶
- A state is defined as a series of interactions between a game and an agent ended with the game’s last response. e.g. “G0, A1, G2, A3, G4” is a state ended with the game’s last response named “G4”. 
 - 
load_tjs(path: str) → None¶
- Load trajectories from an npz file
 - 
request_delete_keys(ks: List[int]) → Dict[int, List[T]]¶
- Request to delete all trajectories with keys in ks.
- Parameters
- ks – a list of keys of trajectories to be deleted 
 
 - 
save_tjs(path: str) → None¶
- Save all trajectories in an npz file.
- All trajectory ids, all trajectories, the current trajectory id, and the current trajectory will be saved.
 
deepword.tree_memory module¶
This SumTree code is a modified version; the original code is from: https://github.com/jaara/AI-blog/blob/master/Seaquest-DDQN-PER.py
- 
class deepword.tree_memory.TreeMemory(*args, **kwds)¶
- Bases: deepword.log.Logging, typing.Generic
- TreeMemory stores and samples experiences for replay.
append(experience: E) → E¶
- New experiences get the max priority over all leaves, to make sure they are sampled.
- Parameters
- experience – a new experience 
- Returns
- previous experience in the same position 
 
 - 
batch_update(tree_idx: numpy.ndarray, abs_errors: numpy.ndarray) → None¶
- Update the priorities on the tree.
- Parameters
- tree_idx – an array of index (int) 
- abs_errors – an array of abs errors (float) 
 
 
 - 
load_memo(path: str) → None¶
- Loading a memo only applies to a new memo without any appending; loading will change the previous tree structure.
- Parameters
- path – an npz file to load
 
 - 
sample_batch(n: int) → Tuple[numpy.ndarray, List[E], numpy.ndarray]¶
- Sample a batch of experiences according to priority values; a sketch follows this class.
- First, to sample a batch of size n, the range [0, total_priority] is divided into n equal sub-ranges.
- Then a value is uniformly sampled from each sub-range.
- We search the SumTree and retrieve the experience whose priority score corresponds to each sampled value.
- Then we calculate importance sampling (IS) weights for each element in the batch.
 - Parameters
- n – batch size 
- Returns
- tree index, experiences, IS weights 
 
 - 
save_memo(path: str) → None¶
- Save the memory to an npz file.
- Parameters
- path – path to an npz file
 
 - 
uniform_sample_batch(n: int) → numpy.ndarray¶
- Randomly sample a batch of experiences.
- Parameters
- n – batch size 
- Returns
- a batch of experiences 
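
A sketch of the stratified prioritized sampling with IS weights described in sample_batch, built on a SumTree like the one sketched under deepword.sum_tree (beta, the IS exponent, and the use of tree.capacity are illustrative):

    import numpy as np

    def sample_batch(tree, n, beta=0.4):
        """Stratified sampling from a SumTree, with importance-sampling weights."""
        segment = tree.total_priority / n  # split [0, total_priority] into n ranges
        idxes, batch, priorities = [], [], []
        for i in range(n):
            v = np.random.uniform(segment * i, segment * (i + 1))
            leaf_idx, priority, data = tree.get_leaf(v)
            idxes.append(leaf_idx)
            priorities.append(priority)
            batch.append(data)
        probs = np.asarray(priorities) / tree.total_priority
        # IS weights correct the bias of prioritized sampling; normalized by max.
        weights = (tree.capacity * probs) ** (-beta)
        weights /= weights.max()
        return np.asarray(idxes), batch, weights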
 
 
deepword.utils module¶
- 
deepword.utils.agent_name2clazz(name: str)¶
- Find the class given the agent name in this package.
- Parameters
- name – agent name from deepword.agents
- Returns
- the class w.r.t. the agent name 
 
- 
deepword.utils.bytes2idx(byte_mask: List[bytes], size: int) → numpy.ndarray¶
- Load a list of bytes that choose 1 for selected actions.
- Parameters
- byte_mask – a list of bytes 
- size – the size of total actions 
 
- Returns
- an np array of indices 
 
- 
deepword.utils.core_name2clazz(name: str)¶
- Find the class given the core name in this package.
- Parameters
- name – core name from deepword.agents.cores
- Returns
- the class w.r.t. the core name 
 
- 
deepword.utils.ctime() → int¶
- Current time in milliseconds
- 
deepword.utils.eprint(*args, **kwargs)¶
- print to stderr 
- 
deepword.utils.flatmap(f, items)¶
- flatmap for python 
- 
deepword.utils.flatten(items)¶
- flatten a list of lists to a list 
- 
deepword.utils.get_hash(txt: str) → str¶
- get hex hash value for a string 
- 
deepword.utils.get_token2idx(tokens: List[str]) → Dict[str, int]¶
- From a list of tokens to a dict of token to position 
- 
deepword.utils.learner_name2clazz(name: str)¶
- Find the class given the learner name in this package.
- Parameters
- name – learner name from deepword.students
- Returns
- the class w.r.t. the learner name 
 
- 
deepword.utils.load_actions(action_file: str) → List[str]¶
- Load unique actions from an action file 
- 
deepword.utils.load_and_split(game_path: str, f_games: str) → Tuple[List[str], List[str]]¶
- Load games and split into train and dev sets.
- Parameters
- game_path – game dir
- f_games – a file with a list of games, one game name per line, without the ulx suffix
 
- Returns
- train_games, dev_games 
 
- 
deepword.utils.load_game_files(game_path: str, f_games: Optional[str] = None) → List[str]¶
- Load a dir of games, or a single game. If game_path is a file, return a list containing that file; if game_path is a dir, return a list of files in the dir suffixed with .ulx; if f_games is set, load the files in game_path whose names are listed in f_games.
 - Parameters
- game_path – a dir, or a single file 
- f_games – a file of game names, without suffix, default suffix .ulx 
 
- Returns
- a list of game files 
 
- 
deepword.utils.load_uniq_lines(fname: str) → List[str]¶
- Load unique lines from a file, line order preserved 
- 
deepword.utils.load_vocab(vocab_file: str) → List[str]¶
- Load unique words from a vocabulary 
- 
deepword.utils.model_name2clazz(name: str)¶
- Find the class given the model name in this package.
- Parameters
- name – model name from deepword.models
- Returns
- the class w.r.t. the model name 
 
- 
deepword.utils.report_status(lst_of_status: List[Tuple[str, Any]]) → str¶
- Pretty-print a series of k-v pairs.
- Parameters
- lst_of_status – A list of k-v pairs 
- Returns
- a string to print 
 
- 
deepword.utils.setup_eval_log(log_filename: str)¶
- Set up the log for evaluation.
- Parameters
- log_filename – the path to log file 
 
- 
deepword.utils.setup_logging(default_path: str = 'logging.yaml', default_level: int = 20, env_key: str = 'LOG_CFG', local_log_filename: Optional[str] = None) → None¶
- Set up logging for a Python project.
- Load the YAML config file from default_path, or from the path given by the environment variable env_key; fall back to the default config if the file does not exist.
- If local_log_filename is set, add a local rotating log file.
- 
deepword.utils.setup_train_log(model_dir: str)¶
- Set up the log for training by putting a game_script.log in model_dir.
- 
deepword.utils.softmax(x: numpy.ndarray) → numpy.ndarray¶
- Numerically stable softmax; see the sketch below
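
The standard max-subtraction trick (a sketch):

    import numpy as np

    def softmax(x):
        # subtracting the max makes exp() overflow-safe; the result is unchanged
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    print(softmax(np.array([1000.0, 1001.0])))  # [0.26894142 0.73105858]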
- 
deepword.utils.split_train_dev(game_files: List[str], train_ratio: float = 0.9, rnd_seed: int = 42) → Tuple[List[str], List[str]]¶
- Split train/dev sets from the given game files: sort, shuffle with Random(rnd_seed), then split; see the sketch below.
- Parameters
- game_files – game files 
- train_ratio – the percentage of training files 
- rnd_seed – for randomly shuffle files, default = 42 
 
- Returns
- train_games, dev_games 
 - Exception:
- empty game_files 
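
A minimal sketch of the sort-shuffle-split procedure (the rounding of the split point is an assumption):

    import random

    def split_train_dev(game_files, train_ratio=0.9, rnd_seed=42):
        if not game_files:
            raise ValueError("empty game_files")
        files = sorted(game_files)                 # sort
        random.Random(rnd_seed).shuffle(files)     # deterministic shuffle
        n_train = int(len(files) * train_ratio)    # split
        return files[:n_train], files[n_train:]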
 
- 
deepword.utils.uniq(lst)¶
- order-preserving unique 
