_model_utils

Classes

StatementClassifierEUlaw

Functions

load_data(file)

Opens data from a file and returns it as a pandas DataFrame.

preprocess_function(image)

For LIME: the input data was divided by 256 for the model (binary MNIST), whereas LIME needs RGB values.

fill_segmentation(values, segmentation)

For KernelSHAP: fill each pixel with SHAP values.

load_model(file)

load_labels(file)

load_training_data(file)

load_sunshine(file)

Tabular sunshine example.

load_penguins(penguins)

Prepares the data for the penguin model example, as in the notebook.

features_eulaw(texts[, model_tag])

Create features for a list of texts.

classify_texts_eulaw(texts, model_path[, return_proba])

Classifies every text in a list of texts using the xgboost model stored in model_path.

Module Contents

_model_utils.load_data(file)[source]

Opens data from a file and returns it as a pandas DataFrame.

_model_utils.preprocess_function(image)[source]

For LIME: the input data was divided by 256 for the model (binary MNIST), whereas LIME needs RGB values.
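A minimal sketch of the scaling described here, assuming LIME passes pixel values in the 0 to 255 range and the model expects them divided by 256. The name `scale_for_model` and the batch shape are illustrative and not taken from the module source.

```python
import numpy as np


def scale_for_model(image):
    """Scale 0-255 pixel values to the range the model was trained on."""
    # Assumption: the model saw inputs divided by 256 during training.
    return (np.asarray(image) / 256).astype(np.float32)


# Hypothetical batch of two 28x28 RGB images, as LIME might pass them.
batch = np.full((2, 28, 28, 3), 128, dtype=np.uint8)
scaled = scale_for_model(batch)
print(scaled.dtype, scaled.shape)
```

Casting to `float32` is a choice made for this sketch; the dtype the real model expects may differ.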

_model_utils.fill_segmentation(values, segmentation)[source]

For KernelSHAP: fill each pixel with SHAP values.
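The idea can be sketched as follows, assuming that segments are labelled 0 to n-1 and that `values[i]` is the SHAP value of segment `i`. The helper name `fill_segments` and the toy arrays are illustrative; the module's own implementation may differ.

```python
import numpy as np


def fill_segments(values, segmentation):
    """Return an array shaped like `segmentation` holding each segment's value."""
    out = np.zeros(segmentation.shape, dtype=float)
    for segment_id, value in enumerate(values):
        # Every pixel that belongs to this segment receives the segment's value.
        out[segmentation == segment_id] = value
    return out


# Toy 2x3 image split into three segments (labels 0, 1 and 2).
segmentation = np.array([[0, 0, 1],
                         [2, 2, 1]])
shap_values = [0.5, -0.25, 0.1]
heatmap = fill_segments(shap_values, segmentation)
print(heatmap)
```

The resulting array has the same shape as the segmentation map, so it can be overlaid on the original image as a heatmap.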

_model_utils.load_model(file)[source]
_model_utils.load_labels(file)[source]
_model_utils.load_training_data(file)[source]
_model_utils.load_sunshine(file)[source]

Tabular sunshine example.

Loads the CSV file into a pandas DataFrame and splits the data into a train and a test set.
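A rough sketch of such a load-and-split step, using only pandas. The function name `load_and_split`, the 80/20 split, the fixed seed, and the column names in the toy CSV are all assumptions made for illustration; the real function may split differently and may also separate features from labels.

```python
import io

import pandas as pd


def load_and_split(csv_source, test_fraction=0.2, seed=0):
    """Read a CSV into a DataFrame and split its rows into train and test sets."""
    data = pd.read_csv(csv_source)
    # Draw the test rows at random; every remaining row goes to the train set.
    test = data.sample(frac=test_fraction, random_state=seed)
    train = data.drop(test.index)
    return train, test


# Made-up data standing in for the sunshine CSV file.
rows = "\n".join(f"{10 + i},{50 + i},{i % 8}" for i in range(10))
csv_text = "temperature,humidity,sunshine\n" + rows
train, test = load_and_split(io.StringIO(csv_text))
print(len(train), len(test))
```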

_model_utils.load_penguins(penguins)[source]

Prepares the data for the penguin model example, as in the notebook.

_model_utils.features_eulaw(texts: list[str], model_tag='law-ai/InLegalBERT')[source]

Create features for a list of texts.

_model_utils.classify_texts_eulaw(texts: list[str], model_path, return_proba: bool = False)[source]

Classifies every text in a list of texts using the xgboost model stored in model_path.

The xgboost model is loaded and used to classify the texts. Before classification, each text is processed by a large language model, which performs the feature extraction. The classifications of the xgboost model are returned. For training the xgboost model, see train_legalbert_xgboost.py.

Parameters:
  • texts – A list of strings, each of which is to be classified.

  • model_path – The path to a stored xgboost model.

  • return_proba – If True, return the probabilities of the model.

Return type:

List of classifications, one for every text in the list
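The two-stage flow described above (feature extraction followed by classification) can be sketched with stand-in components. Nothing here calls the real language model or xgboost: `toy_features` and `toy_classifier` are hypothetical placeholders, and how `return_proba` shapes the output is an assumption made for this sketch.

```python
import numpy as np


def classify_texts(texts, extract_features, classifier, return_proba=False):
    """Extract features for every text, then classify the feature vectors."""
    features = extract_features(texts)  # shape (n_texts, n_features)
    proba = classifier(features)        # shape (n_texts, n_classes)
    if return_proba:
        return proba
    # One class index per text: the class with the highest probability.
    return proba.argmax(axis=1).tolist()


def toy_features(texts):
    # Stand-in for the language-model feature extractor: the word count.
    return np.array([[len(text.split())] for text in texts], dtype=float)


def toy_classifier(features):
    # Stand-in for the stored xgboost model: class 1 if more than 3 words.
    p1 = (features[:, 0] > 3).astype(float)
    return np.column_stack([1 - p1, p1])


texts = ["Short text.", "This sentence has more than three words."]
labels = classify_texts(texts, toy_features, toy_classifier)
proba = classify_texts(texts, toy_features, toy_classifier, return_proba=True)
print(labels)
```

Passing the feature extractor and the classifier in as arguments keeps the sketch runnable without model files; the documented function instead takes `model_path` and loads the model itself.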

class _model_utils.StatementClassifierEUlaw(model_path)[source]
tokenizer[source]
model_path[source]
__call__(sentences)[source]