pytext.metrics package

Submodules

pytext.metrics.intent_slot_metrics module

class pytext.metrics.intent_slot_metrics.AllMetrics[source]

Bases: tuple

Aggregated class for intent-slot related metrics.

top_intent_accuracy

Accuracy of the top-level intent.

frame_accuracy

Frame accuracy.

frame_accuracies_by_depth

Frame accuracies bucketized by depth of the gold tree.

bracket_metrics

Bracket metrics for intents and slots. For details, see the function compute_intent_slot_metrics().

tree_metrics

Tree metrics for intents and slots. For details, see the function compute_intent_slot_metrics().

loss

Cross entropy loss.

bracket_metrics

Alias for field number 3

frame_accuracies_by_depth

Alias for field number 2

frame_accuracy

Alias for field number 1

loss

Alias for field number 5

print_metrics() → None[source]
top_intent_accuracy

Alias for field number 0

tree_metrics

Alias for field number 4

pytext.metrics.intent_slot_metrics.FrameAccuraciesByDepth

alias of typing.Dict

class pytext.metrics.intent_slot_metrics.FrameAccuracy[source]

Bases: tuple

Frame accuracy for a collection of intent frame predictions.

Frame accuracy means the entire tree structure of the predicted frame matches that of the gold frame.

frame_accuracy

Alias for field number 1

num_samples

Alias for field number 0

class pytext.metrics.intent_slot_metrics.FramePredictionPair[source]

Bases: tuple

Pair of predicted and gold intent frames.

expected_frame

Alias for field number 1

predicted_frame

Alias for field number 0

class pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]

Bases: tuple

Aggregated class for intent and slot confusions.

intent_confusions

Confusion counts for intents.

slot_confusions

Confusion counts for slots.

intent_confusions

Alias for field number 0

slot_confusions

Alias for field number 1

class pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]

Bases: tuple

Precision/recall/F1 metrics for intents and slots.

intent_metrics

Precision/recall/F1 metrics for intents.

slot_metrics

Precision/recall/F1 metrics for slots.

overall_metrics

Combined precision/recall/F1 metrics for all nodes (merging intents and slots).

intent_metrics

Alias for field number 0

overall_metrics

Alias for field number 2

print_metrics() → None[source]
slot_metrics

Alias for field number 1

class pytext.metrics.intent_slot_metrics.IntentsAndSlots[source]

Bases: tuple

Collection of intents and slots in an intent frame.

intents

Alias for field number 0

slots

Alias for field number 1

class pytext.metrics.intent_slot_metrics.Node(label: str, span: pytext.metrics.intent_slot_metrics.Span, children: Set[Node] = None)[source]

Bases: object

Node in an intent-slot tree, representing either an intent or a slot.

label

Label of the node.

span

Span of the node.

children

Children of the node.

children
get_depth() → int[source]
label
span
class pytext.metrics.intent_slot_metrics.NodesPredictionPair[source]

Bases: tuple

Pair of predicted and expected sets of nodes.

expected_nodes

Alias for field number 1

predicted_nodes

Alias for field number 0

class pytext.metrics.intent_slot_metrics.Span[source]

Bases: tuple

Span of a node in a text.

start

Start position of the node.

end

End position of the node (exclusive).

end

Alias for field number 1

start

Alias for field number 0

pytext.metrics.intent_slot_metrics.compare_frames(predicted_frame: pytext.metrics.intent_slot_metrics.Node, expected_frame: pytext.metrics.intent_slot_metrics.Node, tree_based: bool, intent_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None, slot_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None) → pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]

Compares two intent frames and returns TP, FP, FN counts for intents and slots. Optionally collects the per label TP, FP, FN counts.

Parameters:
  • predicted_frame – Predicted intent frame.
  • expected_frame – Gold intent frame.
  • tree_based – Whether to get the tree-based confusions (if True) or bracket-based confusions (if False). For details, see the function compute_intent_slot_metrics().
  • intent_per_label_confusions – If provided, update the per label confusions for intents as well. Defaults to None.
  • slot_per_label_confusions – If provided, update the per label confusions for slots as well. Defaults to None.
Returns:

IntentSlotConfusions, containing confusion counts for intents and slots.

pytext.metrics.intent_slot_metrics.compute_all_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], top_intent_accuracy: bool = True, frame_accuracy: bool = True, frame_accuracies_by_depth: bool = True, bracket_metrics: bool = True, tree_metrics: bool = True, overall_metrics: bool = False) → pytext.metrics.intent_slot_metrics.AllMetrics[source]

Given a list of predicted and gold intent frames, computes intent-slot related metrics.

Parameters:
  • frame_pairs – List of predicted and gold intent frames.
  • top_intent_accuracy – Whether to compute top intent accuracy or not. Defaults to True.
  • frame_accuracy – Whether to compute frame accuracy or not. Defaults to True.
  • frame_accuracies_by_depth – Whether to compute frame accuracies by depth or not. Defaults to True.
  • bracket_metrics – Whether to compute bracket metrics or not. Defaults to True.
  • tree_metrics – Whether to compute tree metrics or not. Defaults to True.
  • overall_metrics – If bracket_metrics or tree_metrics is true, decides whether to compute overall (merging intents and slots) metrics for them. Defaults to False.
Returns:

AllMetrics which contains intent-slot related metrics.

pytext.metrics.intent_slot_metrics.compute_frame_accuracies_by_depth(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy][source]

Given a list of predicted and gold intent frames, splits the predictions into buckets according to the depth of the gold trees, and computes frame accuracy for each bucket.

Parameters: frame_pairs – List of predicted and gold intent frames.
Returns: FrameAccuraciesByDepth, a map from depths to their corresponding frame accuracies.
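As an illustration, the bucketing can be sketched in plain Python. Here each frame pair is a hypothetical (predicted, gold, gold_depth) triple standing in for a FramePredictionPair plus Node.get_depth(); this is a simplified stand-in, not the pytext implementation.

```python
from collections import defaultdict

def frame_accuracies_by_depth(frame_pairs):
    """Bucket predictions by the depth of the gold tree, then compute
    frame accuracy within each bucket."""
    buckets = defaultdict(list)
    for predicted, gold, gold_depth in frame_pairs:
        buckets[gold_depth].append(predicted == gold)
    # Each value mirrors FrameAccuracy: (num_samples, frame_accuracy)
    return {
        depth: (len(matches), sum(matches) / len(matches))
        for depth, matches in buckets.items()
    }
```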
pytext.metrics.intent_slot_metrics.compute_frame_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]

Computes frame accuracy given a list of predicted and gold intent frames.

Parameters: frame_pairs – List of predicted and gold intent frames.
Returns: Frame accuracy. For a prediction, frame accuracy is achieved if the entire tree structure of the predicted frame matches that of the gold frame.
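A minimal sketch of the computation, assuming frames are represented as plain hashable values (e.g. nested tuples) so that equality stands in for an exact tree match:

```python
def compute_frame_accuracy(frame_pairs):
    """Fraction of pairs whose predicted frame exactly equals the gold
    frame. Hashable-value equality stands in for an exact tree match."""
    if not frame_pairs:
        return 0.0
    return sum(pred == gold for pred, gold in frame_pairs) / len(frame_pairs)
```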
pytext.metrics.intent_slot_metrics.compute_intent_slot_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], tree_based: bool, overall_metrics: bool = True) → pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]

Given a list of predicted and gold intent frames, computes precision, recall and F1 metrics for intents and slots, either in tree-based or bracket-based manner.

The following assumptions are made about intent frames: 1. The root node is an intent. 2. Children of intents are always slots, and children of slots are always intents.

For tree-based metrics, a node (an intent or slot) in the predicted frame is considered a true positive only if the subtree rooted at this node has an exact copy in the gold frame, otherwise it is considered a false positive. A false negative is a node in the gold frame that does not have an exact subtree match in the predicted frame.

For bracket-based metrics, a node in the predicted frame is considered a true positive if there is a node in the gold frame having the same label and span (but not necessarily the same children). The definitions of false positives and false negatives are similar to the above.

Parameters:
  • frame_pairs – List of predicted and gold intent frames.
  • tree_based – Whether to compute tree-based metrics (if True) or bracket-based metrics (if False).
  • overall_metrics – Whether to compute overall (merging intents and slots) metrics or not. Defaults to True.
Returns:

IntentSlotMetrics, containing precision/recall/F1 metrics for intents and slots.
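The tree-based versus bracket-based distinction can be sketched with sets of node keys. Here each node is a hypothetical (label, span, subtree_key) triple; this is a simplified stand-in for compare_frames(), not the actual pytext logic.

```python
def confusion_counts(predicted_nodes, gold_nodes, tree_based):
    """TP/FP/FN between two frames, each given as a set of
    (label, span, subtree_key) triples. Tree-based matching keys on
    the whole triple (exact subtree); bracket-based keeps only
    (label, span), ignoring children."""
    if not tree_based:
        predicted_nodes = {(label, span) for label, span, _ in predicted_nodes}
        gold_nodes = {(label, span) for label, span, _ in gold_nodes}
    tp = len(predicted_nodes & gold_nodes)
    return tp, len(predicted_nodes) - tp, len(gold_nodes) - tp
```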

pytext.metrics.intent_slot_metrics.compute_prf1_metrics(nodes_pairs: Sequence[pytext.metrics.intent_slot_metrics.NodesPredictionPair]) → Tuple[pytext.metrics.AllConfusions, pytext.metrics.PRF1Metrics][source]

Computes precision/recall/F1 metrics given a list of predicted and expected sets of nodes.

Parameters: nodes_pairs – List of predicted and expected node sets.
Returns: A tuple, of which the first member contains the confusion information, and the second member contains the computed precision/recall/F1 metrics.
pytext.metrics.intent_slot_metrics.compute_top_intent_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]

Computes accuracy of the top-level intent.

Parameters: frame_pairs – List of predicted and gold intent frames.
Returns: Prediction accuracy of the top-level intent.

pytext.metrics.language_model_metrics module

class pytext.metrics.language_model_metrics.LanguageModelMetric[source]

Bases: tuple

Class for language model metrics.

perplexity_per_word

Average perplexity per word of the dataset.

perplexity_per_word

Alias for field number 0

print_metrics()[source]
pytext.metrics.language_model_metrics.compute_language_model_metric(loss_per_word: float) → pytext.metrics.language_model_metrics.LanguageModelMetric[source]
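Perplexity per word is conventionally the exponential of the average cross-entropy loss per word; a minimal sketch, assuming the loss is in nats:

```python
import math

def compute_language_model_metric(loss_per_word):
    """Perplexity per word: the exponential of the average
    cross-entropy loss per word (loss assumed to be in nats)."""
    return math.exp(loss_per_word)
```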

Module contents

class pytext.metrics.AllConfusions[source]

Bases: object

Aggregated class for per label confusions.

per_label_confusions

Per label confusion information.

confusions

Overall TP, FP and FN counts across the labels in per_label_confusions.

compute_metrics() → pytext.metrics.PRF1Metrics[source]
confusions
per_label_confusions
class pytext.metrics.ClassificationMetrics[source]

Bases: tuple

Metric class for various classification metrics.

accuracy

Overall accuracy of predictions.

macro_prf1_metrics

Macro precision/recall/F1 scores.

per_label_soft_scores

Per label soft metrics.

mcc

Matthews correlation coefficient.

roc_auc

Area under the Receiver Operating Characteristic curve.

accuracy

Alias for field number 0

macro_prf1_metrics

Alias for field number 1

mcc

Alias for field number 3

per_label_soft_scores

Alias for field number 2

print_metrics() → None[source]
roc_auc

Alias for field number 4

class pytext.metrics.Confusions(TP: int = 0, FP: int = 0, FN: int = 0)[source]

Bases: object

Confusion information for a collection of predictions.

TP

Number of true positives.

FP

Number of false positives.

FN

Number of false negatives.

FN
FP
TP
compute_metrics() → pytext.metrics.PRF1Scores[source]
class pytext.metrics.LabelPrediction[source]

Bases: tuple

Label predictions of an example.

label_scores

Confidence scores that each label receives.

predicted_label

Index of the predicted label. This is usually the label with the highest confidence score in label_scores.

expected_label

Index of the true label.

expected_label

Alias for field number 2

label_scores

Alias for field number 0

predicted_label

Alias for field number 1

class pytext.metrics.MacroPRF1Metrics[source]

Bases: tuple

Aggregated metric class for macro precision/recall/F1 scores.

per_label_scores

Mapping from label string to the corresponding precision/recall/F1 scores.

macro_scores

Macro precision/recall/F1 scores across the labels in per_label_scores.

macro_scores

Alias for field number 1

per_label_scores

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.MacroPRF1Scores[source]

Bases: tuple

Macro precision/recall/F1 scores (averages across each label).

num_labels

Number of distinct labels.

precision

Equally weighted average of precisions for each label.

recall

Equally weighted average of recalls for each label.

f1

Equally weighted average of F1 scores for each label.

f1

Alias for field number 3

num_labels

Alias for field number 0

precision

Alias for field number 1

recall

Alias for field number 2

class pytext.metrics.PRF1Metrics[source]

Bases: tuple

Metric class for all types of precision/recall/F1 scores.

per_label_scores

Map from label string to the corresponding precision/recall/F1 scores.

macro_scores

Macro precision/recall/F1 scores across the labels in per_label_scores.

micro_scores

Micro (regular) precision/recall/F1 scores for the same collection of predictions.

macro_scores

Alias for field number 1

micro_scores

Alias for field number 2

per_label_scores

Alias for field number 0

print_metrics() → None[source]
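The macro versus micro distinction can be sketched over a hypothetical map from label to (TP, FP, FN) counts: macro averaging weights each label's scores equally, while micro averaging pools the raw counts before computing scores.

```python
def macro_and_micro_scores(label_confusions):
    """Contrast macro and micro averaging over per-label
    (TP, FP, FN) counts."""
    def prf1(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
        return precision, recall, f1

    per_label = {label: prf1(*c) for label, c in label_confusions.items()}
    n = len(per_label)
    # Macro: equally weighted average of each label's scores
    macro = tuple(sum(s[i] for s in per_label.values()) / n for i in range(3))
    # Micro: sum the raw counts across labels, then compute scores once
    micro = prf1(*(sum(c[i] for c in label_confusions.values()) for i in range(3)))
    return macro, micro
```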
class pytext.metrics.PRF1Scores[source]

Bases: tuple

Precision/recall/F1 scores for a collection of predictions.

true_positives

Number of true positives.

false_positives

Number of false positives.

false_negatives

Number of false negatives.

precision

TP / (TP + FP).

recall

TP / (TP + FN).

f1

2 * TP / (2 * TP + FP + FN).

f1

Alias for field number 5

false_negatives

Alias for field number 2

false_positives

Alias for field number 1

precision

Alias for field number 3

recall

Alias for field number 4

true_positives

Alias for field number 0
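The precision/recall/F1 formulas above can be sketched directly, guarding each against a zero denominator (how pytext handles that edge case is not specified here; returning 0.0 is an assumption of this sketch):

```python
def compute_prf1(tp, fp, fn):
    """Precision, recall and F1 from raw confusion counts, with each
    score defined as 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return precision, recall, f1
```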

class pytext.metrics.PerLabelConfusions[source]

Bases: object

Per label confusion information.

label_confusions_map

Map from label string to the corresponding confusion counts.

compute_metrics() → pytext.metrics.MacroPRF1Metrics[source]
label_confusions_map
update(label: str, item: str, count: int) → None[source]

Increases one of the TP, FP or FN counts for a label by a certain amount.

Parameters:
  • label – Label to be modified.
  • item – Type of count to be modified, should be one of “TP”, “FP” or “FN”.
  • count – Amount to be added to the count.
Returns:

None
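A minimal stand-in sketch of this update pattern, keeping per-label TP/FP/FN counters in a plain dict rather than pytext's Confusions objects:

```python
from collections import defaultdict

class PerLabelConfusions:
    """Minimal stand-in: a map from label string to TP/FP/FN
    counters, incremented one count type at a time."""

    def __init__(self):
        self.label_confusions_map = defaultdict(
            lambda: {"TP": 0, "FP": 0, "FN": 0}
        )

    def update(self, label, item, count):
        assert item in ("TP", "FP", "FN"), item
        self.label_confusions_map[label][item] += count
```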

pytext.metrics.RECALL_AT_PRECISION_THREHOLDS = [0.2, 0.4, 0.6, 0.8, 0.9]

Basic metric classes and functions for single-label prediction problems.

class pytext.metrics.SoftClassificationMetrics[source]

Bases: tuple

Classification scores that are independent of thresholds.

average_precision

Alias for field number 0

recall_at_precision

Alias for field number 1

pytext.metrics.average_precision_score(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray) → float[source]

Computes average precision, which summarizes the precision-recall curve as the precisions achieved at each threshold weighted by the increase in recall since the previous threshold.

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
Returns:

Average precision score.
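The weighting scheme can be sketched in pure Python. This simplified version takes only the sorted correctness array and assumes every confidence score is distinct (unlike the full function, which also takes the score array to handle ties):

```python
def average_precision(y_true_sorted):
    """Average precision over predictions already sorted by decreasing
    confidence: precision at each correct prediction, weighted by the
    (uniform) recall increase that prediction contributes."""
    num_positive = sum(y_true_sorted)
    if num_positive == 0:
        return 0.0
    ap, hits = 0.0, 0
    for rank, correct in enumerate(y_true_sorted, start=1):
        if correct:
            hits += 1
            ap += hits / rank  # precision at this rank
    return ap / num_positive
```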

pytext.metrics.compute_classification_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of label predictions.

Parameters:
  • predictions – Label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
Returns:

ClassificationMetrics which contains various classification metrics.

pytext.metrics.compute_matthews_correlation_coefficients(TP: int, FP: int, FN: int, TN: int) → float[source]

Computes Matthews correlation coefficient, a way to summarize all four counts (TP, FP, FN, TN) in the confusion matrix of binary classification.

Parameters:
  • TP – Number of true positives.
  • FP – Number of false positives.
  • FN – Number of false negatives.
  • TN – Number of true negatives.
Returns:

Matthews correlation coefficient, which is (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).
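The standard MCC formula can be sketched as follows; treating a zero denominator as 0.0 is an assumption of this sketch, not necessarily pytext's behavior:

```python
import math

def matthews_correlation_coefficient(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    defined here as 0.0 when any marginal count is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```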

pytext.metrics.compute_prf1(tp: int, fp: int, fn: int) → Tuple[float, float, float][source]
pytext.metrics.compute_roc_auc(predictions: Sequence[pytext.metrics.LabelPrediction]) → Optional[float][source]

Computes area under the Receiver Operating Characteristic curve, for binary classification. Implementation based off of (and explained at) https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en.

pytext.metrics.compute_soft_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes soft classification metrics (for now, average precision) given a list of label predictions.

Parameters:
  • predictions – Label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.recall_at_precision(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Dict[float, float][source]

Computes recall at various precision levels.

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
  • thresholds – Sequence of floats indicating the requested precision thresholds
Returns:

Dictionary of maximum recall at requested precision thresholds.
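A pure-Python sketch of the scan, assuming distinct confidence scores so that each rank in the sorted array is a valid decision threshold:

```python
def recall_at_precision(y_true_sorted, thresholds):
    """Maximum recall reachable at each requested precision threshold,
    scanning predictions in decreasing-confidence order."""
    num_positive = sum(y_true_sorted)
    best = {t: 0.0 for t in thresholds}
    if num_positive == 0:
        return best
    hits = 0
    for rank, correct in enumerate(y_true_sorted, start=1):
        hits += correct
        precision, recall = hits / rank, hits / num_positive
        for t in thresholds:
            if precision >= t:
                best[t] = max(best[t], recall)
    return best
```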

pytext.metrics.safe_division(n: Union[int, float], d: int) → float[source]
pytext.metrics.sort_by_score(y_true_list: Sequence[bool], y_score_list: Sequence[float])[source]