Detail

Config
class caver.config.Config
    Basic config. All model configs should inherit from this class.

    - batch_size = 256: batch size
    - checkpoint_dir = 'checkpoints': checkpoint directory
    - dropout = 0.15: dropout rate
    - embedding_dim = 256: embedding dimension
    - epoch = 10: number of training epochs
    - input_data_dir = 'dataset': input data directory
    - lr = 0.0001: learning rate
    - master_device = 0: GPU device number
    - multi_gpu = False: whether to use multiple GPUs
    - output_data_dir = 'processed_data': directory for processed data
    - recall_k = 5: k for recall@k
    - train_filename = 'nlpcc_train.tsv': training file name
    - valid_filename = 'nlpcc_valid.tsv': validation file name
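Since all model configs inherit from Config, a new config only needs to override the defaults it changes. The sketch below illustrates the pattern with a stand-in definition of Config (copied from the attribute list above) so it runs without caver installed; the real class lives in caver.config, and MyModelConfig is a hypothetical example name.

```python
# Stand-in for caver.config.Config: a plain class whose class attributes
# hold the default hyperparameters listed above.
class Config:
    batch_size = 256
    checkpoint_dir = 'checkpoints'
    dropout = 0.15
    embedding_dim = 256
    epoch = 10
    input_data_dir = 'dataset'
    lr = 0.0001
    master_device = 0
    multi_gpu = False
    output_data_dir = 'processed_data'
    recall_k = 5
    train_filename = 'nlpcc_train.tsv'
    valid_filename = 'nlpcc_valid.tsv'

# A model config (hypothetical) overrides only what differs from the defaults;
# everything else is inherited unchanged.
class MyModelConfig(Config):
    batch_size = 128
    dropout = 0.3

cfg = MyModelConfig()
print(cfg.batch_size, cfg.lr)  # 128 0.0001
```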
class caver.config.ConfigCNN
    CNN model config.

    - filter_num = 6: number of filters
    - filter_sizes = [2, 3, 4]: list of filter sizes
    - model = 'CNN': model name
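In the usual TextCNN layout these two fields determine the pooled feature width: one convolution per window size in filter_sizes, each producing filter_num feature maps, concatenated after max-pooling over time. This is a hedged sketch of that arithmetic; caver's exact layer construction may differ.

```python
# ConfigCNN defaults from the listing above.
filter_num = 6            # feature maps produced per filter size
filter_sizes = [2, 3, 4]  # convolution window widths, in tokens

# One conv per window width; after max-pooling over time, the concatenated
# feature vector fed to the classifier has this many dimensions.
total_features = filter_num * len(filter_sizes)
print(total_features)  # 18
```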
Data

Utils
class caver.utils.MiniBatchWrapper(dl, x_var, y_vars)
    Wrap a plain torchtext iterator so that each batch yields the input together with multiple y labels.
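A minimal illustrative stand-in for what such a wrapper does: iterate the underlying batch iterator, pull the x field named by x_var off each batch, and gather the label fields named in y_vars. This is a sketch, not caver's implementation (which operates on torchtext batches and tensors); the SimpleNamespace batches below are purely for demonstration.

```python
from types import SimpleNamespace

class MiniBatchWrapper:
    """Yield (x, [y1, y2, ...]) pairs from an iterator of attribute-style batches."""

    def __init__(self, dl, x_var, y_vars):
        self.dl = dl          # underlying batch iterator
        self.x_var = x_var    # name of the input field on each batch
        self.y_vars = y_vars  # names of the label fields on each batch

    def __iter__(self):
        for batch in self.dl:
            x = getattr(batch, self.x_var)
            ys = [getattr(batch, v) for v in self.y_vars]
            yield x, ys

    def __len__(self):
        return len(self.dl)

# Demonstration with fake attribute-style batches.
batches = [SimpleNamespace(text=[1, 2], label_a=0, label_b=1)]
wrapped = MiniBatchWrapper(batches, 'text', ['label_a', 'label_b'])
for x, ys in wrapped:
    print(x, ys)  # [1, 2] [0, 1]
```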
caver.utils.load_embedding(embedding_file, dim, vocab_size, index2word)
    Load a pre-trained embedding file.

    The first line of the file gives the number of words and the vector
    dimension. Each subsequent line is a word followed by its vector
    components, all separated by spaces:

        1024 64    # 1024 words, 64-dimensional vectors
        a 0.223 0.566 ...
        b 0.754 0.231 ...
        ...
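The file format described above can be parsed in a few lines. This is a hedged sketch of reading that format only; the real load_embedding additionally builds an embedding matrix of shape (vocab_size, dim) ordered by index2word, which is omitted here. parse_embedding is a hypothetical helper name.

```python
import io

def parse_embedding(fileobj):
    """Read the header line, then one 'word v1 ... vdim' line per word."""
    header = fileobj.readline().split()
    num_words, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in fileobj:
        parts = line.split()
        word, vec = parts[0], [float(x) for x in parts[1:]]
        assert len(vec) == dim  # every row must match the declared dimension
        vectors[word] = vec
    return num_words, dim, vectors

# Tiny in-memory example in the documented format: 2 words, 3-d vectors.
sample = io.StringIO("2 3\na 0.1 0.2 0.3\nb 0.4 0.5 0.6\n")
num_words, dim, vectors = parse_embedding(sample)
print(num_words, dim, vectors['a'])  # 2 3 [0.1, 0.2, 0.3]
```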