Detail

Config

class caver.config.Config[source]

Base config. All model configs should inherit from this.

batch_size = 256

batch size

checkpoint_dir = 'checkpoints'

checkpoint directory

dropout = 0.15

dropout rate

embedding_dim = 256

embedding dimension

epoch = 10

number of training epochs

input_data_dir = 'dataset'

input data directory

lr = 0.0001

learning rate

master_device = 0

GPU device number

multi_gpu = False

whether to use multiple GPUs

output_data_dir = 'processed_data'

directory for saving processed data

recall_k = 5

k for the recall@k metric

train_filename = 'nlpcc_train.tsv'

train filename

valid_filename = 'nlpcc_valid.tsv'

validation filename
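
For a new model, the intended pattern is to subclass Config and override attributes as needed. A minimal sketch (ConfigMyModel and its values are hypothetical):

from caver.config import Config

class ConfigMyModel(Config):
    # hypothetical model config; overrides a few inherited defaults
    model = 'MyModel'
    batch_size = 128
    lr = 0.001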

class caver.config.ConfigCNN[source]

CNN model config.

filter_num = 6

number of filters

filter_sizes = [2, 3, 4]

list of filter sizes

model = 'CNN'

model name

class caver.config.ConfigLSTM[source]

LSTM model config.

bidirectional = False

whether to use a bidirectional LSTM

hidden_dim = 128

hidden dimension size

layer_num = 1

number of LSTM layers

model = 'LSTM'

model name

class caver.config.ConfigfastText[source]

fastText model config.

model = 'fastText'

model name
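
The bundled configs can be instantiated directly and tweaked per attribute. A brief sketch using ConfigCNN (the override values are arbitrary):

from caver.config import ConfigCNN

config = ConfigCNN()              # model='CNN', filter_num=6, filter_sizes=[2, 3, 4]
config.filter_sizes = [3, 4, 5]   # attributes can be overridden per run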

Data

class caver.data.TextData(path='', **kwargs)[source]

extract()[source]

Extract word and label frequencies from the data file.

load_index()[source]

Load index information from JSON file.

prepare()[source]

Generate index-encoded data from the data file.
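
A rough sketch of the full TextData pipeline; the path is illustrative, and the call order (extract() before prepare()) is an assumption inferred from the method descriptions above:

from caver.data import TextData

data = TextData(path='dataset/nlpcc_train.tsv')  # illustrative path
data.extract()     # build word-freq and label-freq tables
data.prepare()     # rewrite the data as index sequences
# later (e.g. at inference time), reload the saved index mapping:
data.load_index()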

class caver.data.Segment(model='jieba', userdict=None, model_path=None)[source]
Parameters
  • model (str) – model type, one of ['jieba', 'pyltp']

  • userdict (str) – user dictionary file used to initialize the segmentation model

  • model_path (str) – path to the segmentation model (required when using pyltp)

cut(text)[source]

Cut a sentence into a list of words.
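
For example (a minimal sketch; the sample sentence is arbitrary, and with model='pyltp' a model_path would also be required, per the parameter list above):

from caver.data import Segment

seg = Segment(model='jieba')     # jieba needs no external model file
words = seg.cut('今天天气不错')   # returns a list of word tokens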

Utils

class caver.utils.MiniBatchWrapper(dl, x_var, y_vars)[source]

Wrap a simple torchtext iterator to yield multiple y labels.
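
A usage sketch, assuming train_iter is a torchtext iterator built elsewhere; the field names 'text' and 'label_*' are hypothetical:

from caver.utils import MiniBatchWrapper

train_dl = MiniBatchWrapper(train_iter, x_var='text',
                            y_vars=['label_0', 'label_1', 'label_2'])
for x, y in train_dl:
    ...  # presumably yields (x, y) pairs, with y combining the label fields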

caver.utils.init_weight(layer)[source]

Initialize layer weights and biases.
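
For instance (a minimal sketch; init_weight is shown on a single layer, and applying it model-wide assumes it tolerates submodules without weights):

import torch.nn as nn
from caver.utils import init_weight

layer = nn.Linear(256, 128)
init_weight(layer)        # initializes layer.weight and layer.bias in place

# common pattern for a whole model, if init_weight guards weight-less layers:
# model.apply(init_weight)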

caver.utils.load_embedding(embedding_file, dim, vocab_size, index2word)[source]
Parameters
  • embedding_file (str) – path to the embedding file

  • dim (int) – dimension of the vectors

  • vocab_size (int) – size of the vocabulary

  • index2word (dict) – mapping from index to word

Load pre-trained embedding file.

The first line of the file should give the number of words and the dimension of the vectors. Each following line is a word and its vector values, separated by spaces.

1024 64    # 1024 words, 64-dimensional vectors
a 0.223 0.566 ......
b 0.754 0.231 ......
......
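
A call sketch (the file path is illustrative; that the function returns a vocab_size x dim matrix is our assumption, not stated by the signature):

from caver.utils import load_embedding

index2word = {0: '<pad>', 1: '<unk>', 2: 'a', 3: 'b'}  # toy index => word map
vectors = load_embedding('embeddings/toy.64d.txt',     # illustrative path
                         dim=64,
                         vocab_size=len(index2word),
                         index2word=index2word)
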
caver.utils.set_config(config, args_dict)[source]

Update config attributes with the key-value pairs in args_dict.

Keys not in config will be ignored.

caver.utils.update_config(config, **kwargs)[source]

Update config attributes with the key-value pairs in kwargs.

Keys not in config will be ignored.
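
A minimal sketch of both helpers; update_config takes keyword arguments, while set_config takes a plain dict (e.g. vars(args) from argparse):

from caver.config import ConfigLSTM
from caver.utils import set_config, update_config

config = ConfigLSTM()
update_config(config, lr=0.001, hidden_dim=256)     # known keys are updated
set_config(config, {'lr': 0.0005, 'unknown': 1})    # 'unknown' is ignored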