pytext.utils package


pytext.utils.cuda_utils module

pytext.utils.cuda_utils.Variable(data, *args, **kwargs)[source]

pytext.utils.data_utils module

class pytext.utils.data_utils.Slot(label: str, start: int, end: int)[source]

Bases: object

token_label(use_bio_labels, token_start, token_end)[source]
token_overlap(token_start, token_end)[source]
pytext.utils.data_utils.align_slot_labels(token_ranges: List[Tuple[int, int]], slots_field: str, use_bio_labels: bool = False)[source]
pytext.utils.data_utils.no_tokenize(s: Any) → Any[source]
pytext.utils.data_utils.parse_json_array(json_text: str) → List[str][source]
pytext.utils.data_utils.parse_slot_string(slots_field: str) → List[pytext.utils.data_utils.Slot][source]
pytext.utils.data_utils.parse_token(utterance: str, token_range: List[int]) → List[Tuple[str, Tuple[int, int]]][source]
pytext.utils.data_utils.simple_tokenize(s: str) → List[str][source]

pytext.utils.dist_utils module

pytext.utils.dist_utils.dist_init(distributed_rank: int, world_size: int, init_method: str, backend: str = 'nccl')[source]

pytext.utils.documentation_helper module

pytext.utils.documentation_helper.eprint(*args, **kwargs)[source]

Return the set of PyText classes matching that name. Handles fully-qualified class_name including module.


Find all the field names for a given class and their default value.


Return a dict of config help for this object, where:

  • key: config name
  • value: a tuple (default, type, options)

The elements of the value tuple are:

  • default: default value for this key if not specified
  • type: type for this config value, as a string
  • options: possible values for this config, only if type = Union

If the type is “Union”, options lists the possible class names, and the default is one of those class names.
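
To make the shape of that dict concrete, here is a minimal sketch. The field names, class names, and the describe helper are hypothetical and chosen only for illustration; they are not taken from PyText.

```python
from typing import Any, Dict, List, Optional, Tuple

# (default, type, options); options is only set when type is "Union".
ConfigHelp = Dict[str, Tuple[Any, str, Optional[List[str]]]]

# Hypothetical field and class names, chosen only to show the shape.
config_help: ConfigHelp = {
    "dropout": (0.4, "float", None),
    "encoder": ("EncoderA", "Union", ["EncoderA", "EncoderB"]),
}


def describe(help_dict: ConfigHelp) -> List[str]:
    """Render one readable line per config field (hypothetical helper)."""
    lines = []
    for name, (default, type_name, options) in sorted(help_dict.items()):
        line = f"{name}: {type_name} = {default!r}"
        if type_name == "Union" and options:
            line += " (one of: " + ", ".join(options) + ")"
        lines.append(line)
    return lines


summary = describe(config_help)
```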


Pretty-print the fields of one object.

pytext.utils.documentation_helper.replace_components(root, component, base_class)[source]

Recursively look at all fields in config to find where component would fit. This is used to change configs so that they don’t use default values. Return the chain of field names, from child to parent.

pytext.utils.embeddings_utils module

class pytext.utils.embeddings_utils.PretrainedEmbedding(embeddings_path: str = None, lowercase_tokens: bool = True)[source]

Bases: object

Utility class for loading/caching/initializing word embeddings

cache_pretrained_embeddings(cache_path: str) → None[source]

Cache the processed embedding vectors and vocab to a file for faster loading

initialize_embeddings_weights(vocab_to_idx: Dict[str, int], unk: str, embed_dim: int, init_strategy: pytext.config.field_config.EmbedInitStrategy) → torch.Tensor[source]

Initialize embeddings weights of shape (len(vocab_to_idx), embed_dim) from the pretrained embeddings vectors. Words that are not in the pretrained embeddings list will be initialized according to init_strategy.

  • vocab_to_idx – a dict that maps words to indices that the model expects
  • unk – unknown token
  • embed_dim – the embeddings dimension
  • init_strategy – method of initializing new tokens

Returns: a float tensor of dimension (vocab_size, embed_dim)
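
The idea can be sketched without any tensor library. The following is an illustration under stated assumptions, not PyText's implementation: plain lists stand in for the tensor, the strategy values "zero" and "random" are hypothetical stand-ins for EmbedInitStrategy, the unk argument is left out, and indices are assumed to run from 0 to len(vocab_to_idx) - 1.

```python
import random
from typing import Dict, List, Sequence


def build_embedding_weights(
    vocab_to_idx: Dict[str, int],
    pretrained: Dict[str, Sequence[float]],
    embed_dim: int,
    init_strategy: str = "zero",  # hypothetical values: "zero" or "random"
    seed: int = 0,
) -> List[List[float]]:
    """Copy pretrained rows where available; initialize the rest."""
    rng = random.Random(seed)
    # Start from an all-zero matrix of shape (len(vocab_to_idx), embed_dim).
    weights = [[0.0] * embed_dim for _ in range(len(vocab_to_idx))]
    for word, idx in vocab_to_idx.items():
        if word in pretrained:
            weights[idx] = list(pretrained[word])
        elif init_strategy == "random":
            weights[idx] = [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
        # With "zero", the row simply keeps its initial zeros.
    return weights


vocab_to_idx = {"<unk>": 0, "hello": 1, "world": 2}
pretrained = {"hello": [0.1, 0.2], "world": [0.3, 0.4]}
weights = build_embedding_weights(vocab_to_idx, pretrained, embed_dim=2)
```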

load_cached_embeddings(cache_path: str) → None[source]

Load cached embeddings from file

load_pretrained_embeddings(raw_embeddings_path: str, append: bool = False, dialect: str = None, lowercase_tokens: bool = True) → None[source]

Load raw embedding vectors from a file in the following format:

    num_words dim
    [word_i] [v0, v1, v2, …., v_dim]
    [word_2] [v0, v1, v2, …., v_dim]
    ….

Optionally appends _dialect to every token in the vocabulary (for XLU embeddings).
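
A reader for a file of this shape might look like the sketch below. It is not PyText's loader: the function name read_embeddings is hypothetical, values are assumed to be separated by single spaces, duplicate words keep their first vector, and dialect handling is omitted.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_embeddings(
    handle: TextIO, lowercase_tokens: bool = True
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Parse a 'num_words dim' header followed by 'word v0 v1 ...' rows."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the declared word count
    vocab_to_idx: Dict[str, int] = {}
    vectors: List[List[float]] = []
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        word = parts[0].lower() if lowercase_tokens else parts[0]
        if word in vocab_to_idx:
            continue  # keep the first vector seen for a word
        vocab_to_idx[word] = len(vectors)
        vectors.append([float(v) for v in parts[1:]])
    return vocab_to_idx, vectors


# "Hello" and "hello" collapse into one entry when lowercasing is on.
sample = io.StringIO("3 2\nHello 0.1 0.2\nworld 0.3 0.4\nhello 0.5 0.6\n")
vocab, vecs = read_embeddings(sample)
```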

pytext.utils.embeddings_utils.append_dialect(word: str, dialect: str) → str[source]

pytext.utils.loss_utils module

class pytext.utils.loss_utils.LagrangeMultiplier[source]

Bases: torch.autograd.function.Function

static backward(ctx, grad_output)[source]

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many arguments as forward() returned outputs, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, input)[source]

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store tensors that can be then retrieved during the backward pass.
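
The forward/backward contract described above can be illustrated with a small custom Function. This is a generic sketch that requires PyTorch; the class NegateGradient is hypothetical and is not presented as the body of LagrangeMultiplier.

```python
import torch


class NegateGradient(torch.autograd.Function):
    """Hypothetical example: identity in forward, sign-flipped gradient in backward."""

    @staticmethod
    def forward(ctx, input):
        # ctx could stash tensors for backward; nothing needs saving here.
        return input.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # forward() took one input, so backward() returns one gradient.
        if not ctx.needs_input_grad[0]:
            return None
        return grad_output.neg()


x = torch.tensor([1.0, 2.0], requires_grad=True)
y = NegateGradient.apply(x).sum()
y.backward()  # x.grad now holds the negated gradient of sum()
```

Custom Functions are invoked through apply() rather than by calling forward() directly, so that autograd records the operation.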

pytext.utils.loss_utils.build_class_priors(labels, class_priors=None, weights=None, positive_pseudocount=1.0, negative_pseudocount=1.0)[source]

Build class priors, if necessary. For each class, the class priors are estimated as (P + sum_i w_i y_i) / (P + N + sum_i w_i), where y_i is the ith label, w_i is the ith weight, P is a pseudo-count of positive labels, and N is a pseudo-count of negative labels.

  • labels – A Tensor with shape [batch_size, num_classes]. Entries should be in [0, 1].
  • class_priors – None, or a floating point Tensor of shape [C] containing the prior probability of each class (i.e. the fraction of the training data consisting of positive examples). If None, the class priors are computed from targets with a moving average.
  • weights – Tensor of shape broadcastable to labels, [N, 1] or [N, C], where N = batch_size, C = num_classes
  • positive_pseudocount – Number of positive labels used to initialize the class priors.
  • negative_pseudocount – Number of negative labels used to initialize the class priors.

Returns: A Tensor of shape [num_classes] consisting of the weighted class priors, after updating with moving average ops if created.
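
The estimate (P + sum_i w_i y_i) / (P + N + sum_i w_i) can be worked through with plain Python lists standing in for tensors. This is a sketch of the formula only: the function name is hypothetical, weights are assumed to be one scalar per example (the [N, 1] case), and no moving average is applied.

```python
from typing import List, Optional, Sequence


def estimate_class_priors(
    labels: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    positive_pseudocount: float = 1.0,
    negative_pseudocount: float = 1.0,
) -> List[float]:
    """Per-class prior: (P + sum_i w_i * y_i) / (P + N + sum_i w_i)."""
    batch_size = len(labels)
    num_classes = len(labels[0])
    if weights is None:
        weights = [1.0] * batch_size  # treat every example equally
    total_weight = sum(weights)
    priors = []
    for c in range(num_classes):
        weighted_positives = sum(w * row[c] for w, row in zip(weights, labels))
        priors.append(
            (positive_pseudocount + weighted_positives)
            / (positive_pseudocount + negative_pseudocount + total_weight)
        )
    return priors


# Three examples, two classes: class 0 is positive twice, class 1 once.
labels = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
priors = estimate_class_priors(labels)
```

With the default pseudo-counts of 1.0, class 0 works out to (1 + 2) / (1 + 1 + 3) and class 1 to (1 + 1) / (1 + 1 + 3).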


pytext.utils.loss_utils.false_postives_upper_bound(labels, logits, weights)[source]

false_positives_upper_bound defined in paper: “Scalable Learning of Non-Decomposable Objectives”

  • labels – A Tensor of shape broadcastable to logits.
  • logits – A Tensor of shape [N, C] or [N, C, K]. If the third dimension is present, the upper bound is computed on each slice [:, :, k] independently.
  • weights – Per-example loss coefficients, with shape broadcast-compatible with that of labels, i.e. [N, 1] or [N, C].

Returns: A Tensor of shape [C] or [C, K].

pytext.utils.loss_utils.range_to_anchors_and_delta(precision_range, num_anchors)[source]

Calculates anchor points from precision range.

  • precision_range – an interval (a, b), where 0.0 <= a <= b <= 1.0
  • num_anchors – int, number of equally spaced anchor points.

Returns:

  • precision_values – A Tensor of [num_anchors] equally spaced values in the interval precision_range.
  • delta – The spacing between the values in precision_values.

Raises:

  • ValueError – If precision_range is invalid.
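
One way to compute such anchors is sketched below with plain floats. It assumes the anchors are interior points that exclude both endpoints of the interval, which this page does not state; the function name is hypothetical.

```python
from typing import List, Tuple


def anchors_and_delta(
    precision_range: Tuple[float, float], num_anchors: int
) -> Tuple[List[float], float]:
    """Return num_anchors equally spaced interior points and their spacing."""
    low, high = precision_range
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(
            f"precision_range must satisfy 0 <= a <= b <= 1, got {precision_range}"
        )
    # Assumption: endpoints are excluded, so there are num_anchors + 1 gaps.
    delta = (high - low) / (num_anchors + 1)
    values = [low + delta * (i + 1) for i in range(num_anchors)]
    return values, delta


values, delta = anchors_and_delta((0.0, 1.0), 3)
```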

pytext.utils.loss_utils.true_positives_lower_bound(labels, logits, weights)[source]

true_positives_lower_bound defined in paper: “Scalable Learning of Non-Decomposable Objectives”

  • labels – A Tensor of shape broadcastable to logits.
  • logits – A Tensor of shape [N, C] or [N, C, K]. If the third dimension is present, the lower bound is computed on each slice [:, :, k] independently.
  • weights – Per-example loss coefficients, with shape [N, 1] or [N, C]

Returns: A Tensor of shape [C] or [C, K].
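
For intuition, both bounds can be computed by hand for a single class column. The sketch assumes a hinge surrogate, with the true-positive bound taken as sum_i w_i (y_i - hinge on positives) and the false-positive bound as sum_i w_i (hinge on negatives). This is one reading of the paper's construction, not code taken from PyText, and the function names are hypothetical.

```python
from typing import Sequence


def tp_lower_bound(
    labels: Sequence[float], logits: Sequence[float], weights: Sequence[float]
) -> float:
    """Sum of w * (y - y * max(0, 1 - z)) over examples."""
    return sum(
        w * (y - y * max(0.0, 1.0 - z)) for y, z, w in zip(labels, logits, weights)
    )


def fp_upper_bound(
    labels: Sequence[float], logits: Sequence[float], weights: Sequence[float]
) -> float:
    """Sum of w * (1 - y) * max(0, 1 + z) over examples."""
    return sum(
        w * (1.0 - y) * max(0.0, 1.0 + z) for y, z, w in zip(labels, logits, weights)
    )


labels = [1.0, 1.0, 0.0, 0.0]
logits = [2.0, 0.5, -2.0, 0.5]
weights = [1.0, 1.0, 1.0, 1.0]

tp_bound = tp_lower_bound(labels, logits, weights)
fp_bound = fp_upper_bound(labels, logits, weights)

# Counts at a decision threshold of 0, for comparison with the bounds.
true_positives = sum(1 for y, z in zip(labels, logits) if y == 1.0 and z > 0)
false_positives = sum(1 for y, z in zip(labels, logits) if y == 0.0 and z > 0)
```

In this example the lower bound sits below the actual true-positive count and the upper bound sits above the actual false-positive count, which is the property the surrogate is meant to provide.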

pytext.utils.loss_utils.weighted_hinge_loss(labels, logits, positive_weights=1.0, negative_weights=1.0)[source]
  • labels – one-hot representation Tensor of shape broadcastable to logits
  • logits – A Tensor of shape [N, C] or [N, C, K]
  • positive_weights – Scalar or Tensor
  • negative_weights – same shape as positive_weights

Returns: A 3D Tensor of shape [N, C, K], where K is the length of positive_weights, or a 2D Tensor of shape [N, C].
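
An elementwise view of a weighted hinge loss is sketched below, using scalar weights and nested lists in place of tensors. The formula, positive_weight * y * max(0, 1 - z) + negative_weight * (1 - y) * max(0, 1 + z), is the usual hinge formulation and is assumed here rather than quoted from PyText; the helper name is hypothetical.

```python
def weighted_hinge(
    label: float,
    logit: float,
    positive_weight: float = 1.0,
    negative_weight: float = 1.0,
) -> float:
    """Hinge loss for one (label, logit) pair with separate class weights."""
    positives_term = positive_weight * label * max(0.0, 1.0 - logit)
    negatives_term = negative_weight * (1.0 - label) * max(0.0, 1.0 + logit)
    return positives_term + negatives_term


# Shape [N, C] = [2, 2]; the result keeps the same shape.
labels = [[1.0, 0.0], [0.0, 1.0]]
logits = [[0.5, -2.0], [0.25, 3.0]]
loss = [
    [weighted_hinge(y, z, positive_weight=2.0) for y, z in zip(label_row, logit_row)]
    for label_row, logit_row in zip(labels, logits)
]
```

Setting positive_weight above 1.0 penalizes low-scoring positives more heavily than high-scoring negatives.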

pytext.utils.model_utils module

pytext.utils.model_utils.to_onehot(feat: pytext.utils.cuda_utils.Variable, size: int) → pytext.utils.cuda_utils.Variable[source]

Transform features into one-hot vectors
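
As a plain-Python illustration of the transformation, with nested lists standing in for tensors: each index becomes a row of length size with a single 1. The function name is hypothetical and this is not the PyText implementation.

```python
from typing import List, Sequence


def to_onehot_rows(indices: Sequence[int], size: int) -> List[List[int]]:
    """Map each index to a one-hot row of the given size."""
    rows = []
    for index in indices:
        if not 0 <= index < size:
            raise ValueError(f"index {index} is outside [0, {size})")
        row = [0] * size
        row[index] = 1
        rows.append(row)
    return rows


onehot = to_onehot_rows([2, 0, 1], size=3)
```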

pytext.utils.onnx_utils module

pytext.utils.onnx_utils.add_feats_numericalize_ops(c2_prepared, vocab_map, input_names)[source]
pytext.utils.onnx_utils.create_vocab_index(vocab_list, net, net_workspace, index_name)[source]
pytext.utils.onnx_utils.create_vocab_indices_map(c2_prepared, init_net, vocab_map)[source]
pytext.utils.onnx_utils.export_nets_to_predictor_file(c2_prepared, input_names, output_names, predictor_path, extra_params=None)[source]
pytext.utils.onnx_utils.get_numericalize_net(c2_prepared, vocab_map, input_names)[source]
pytext.utils.onnx_utils.pytorch_to_caffe2(model, export_input, external_input_names, output_names, export_path, export_onnx_path=None)[source]

pytext.utils.python_utils module


pytext.utils.test_utils module

class pytext.utils.test_utils.ResultRow(name, metrics_dict)[source]

Bases: object

class pytext.utils.test_utils.ResultTable(metrics, class_names, labels, preds)[source]

Bases: object

pytext.utils.test_utils.merge_token_labels_by_bio(token_ranges, labels)[source]
pytext.utils.test_utils.merge_token_labels_by_label(token_ranges, labels)[source]
pytext.utils.test_utils.merge_token_labels_to_slot(token_ranges, labels, use_bio_label=True)[source]

Module contents