Component: DocClassificationDataHandler

class DocClassificationDataHandler.Config

Bases: DataHandler.Config

Configuration class for DocClassificationDataHandler.


columns_to_read: List[str] – Names of the columns to read from the data files.


max_seq_len: int – Maximum sequence length for the input. Input longer than this length is truncated.
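To illustrate how these two attributes could interact, here is a minimal sketch in plain Python. It is not the library's implementation: `read_rows` is a hypothetical helper, tokenization is a simple whitespace split, and treating a non-positive `max_seq_len` (such as the default `-1`) as "no truncation" is an assumption.

```python
import csv
import io
from typing import Dict, Iterator, List


def read_rows(
    tsv_text: str,
    columns_to_read: List[str],
    max_seq_len: int = -1,
) -> Iterator[Dict[str, object]]:
    """Hypothetical helper: map TSV fields to column names and trim the text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for raw in reader:
        # Pair each field with its configured column name, in order.
        row: Dict[str, object] = dict(zip(columns_to_read, raw))
        # Whitespace tokenization is an assumption made for this sketch.
        tokens = str(row.get("text", "")).split()
        # Assumption: a non-positive max_seq_len disables truncation.
        if max_seq_len > 0:
            tokens = tokens[:max_seq_len]
        row["text"] = tokens
        yield row


sample = "pos\tgreat movie overall\t\n"
columns = ["doc_label", "text", "dict_feat"]
rows = list(read_rows(sample, columns, max_seq_len=2))
```

With `max_seq_len=2`, only the first two tokens of the `text` column are kept; with the default of `-1`, the sketch leaves the text untouched.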

All Attributes (including base classes)

columns_to_read: List[str] = ['doc_label', 'text', 'dict_feat']
shuffle: bool = True
sort_within_batch: bool = True
train_path: str = 'train.tsv'
eval_path: str = 'eval.tsv'
test_path: str = 'test.tsv'
train_batch_size: int = 128
eval_batch_size: int = 128
test_batch_size: int = 128
max_seq_len: int = -1

Default JSON

    {
        "columns_to_read": [
            "doc_label",
            "text",
            "dict_feat"
        ],
        "shuffle": true,
        "sort_within_batch": true,
        "train_path": "train.tsv",
        "eval_path": "eval.tsv",
        "test_path": "test.tsv",
        "train_batch_size": 128,
        "eval_batch_size": 128,
        "test_batch_size": 128,
        "max_seq_len": -1
    }
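A configuration file usually needs to override only a few of these defaults. The sketch below shows one way to merge a partial JSON override onto the default values listed above. `DEFAULTS` and `build_config` are hypothetical names used for illustration only; they are not part of the library, which performs its own config parsing.

```python
import json
from typing import Any, Dict

# Defaults copied from the attribute list above.
DEFAULTS: Dict[str, Any] = {
    "columns_to_read": ["doc_label", "text", "dict_feat"],
    "shuffle": True,
    "sort_within_batch": True,
    "train_path": "train.tsv",
    "eval_path": "eval.tsv",
    "test_path": "test.tsv",
    "train_batch_size": 128,
    "eval_batch_size": 128,
    "test_batch_size": 128,
    "max_seq_len": -1,
}


def build_config(overrides_json: str) -> Dict[str, Any]:
    """Hypothetical helper: apply a partial JSON override to the defaults."""
    overrides = json.loads(overrides_json)
    # Reject misspelled or unsupported keys instead of silently ignoring them.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


config = build_config('{"max_seq_len": 256, "train_batch_size": 32}')
```

Keys absent from the override, such as `eval_batch_size`, keep their default values.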