_model_utils
============

.. py:module:: _model_utils


Classes
-------

.. autoapisummary::

   _model_utils.StatementClassifierEUlaw


Functions
---------

.. autoapisummary::

   _model_utils.load_data
   _model_utils.preprocess_function
   _model_utils.fill_segmentation
   _model_utils.load_model
   _model_utils.load_labels
   _model_utils.load_training_data
   _model_utils.load_sunshine
   _model_utils.load_penguins
   _model_utils.features_eulaw
   _model_utils.classify_texts_eulaw


Module Contents
---------------

.. py:function:: load_data(file)

   Open data from a file and return it as a pandas DataFrame.


.. py:function:: preprocess_function(image)

   For LIME: the input data was divided by 256 for the model (binary MNIST), whereas LIME needs RGB values.
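
The rescaling described for ``preprocess_function`` can be illustrated with a minimal sketch. It assumes the function simply divides an RGB-range image by 256 and casts the result to ``float32``; the name ``preprocess_sketch`` and the cast are assumptions for illustration and are not taken from the module source.

```python
import numpy as np


def preprocess_sketch(image):
    """Hypothetical stand-in: scale RGB-range values (0-255) down by 256."""
    # Assumption: the model expects float32 input; the source does not state the dtype.
    return (np.asarray(image) / 256).astype(np.float32)


# A 2x2 image with three identical channels, in the 0-255 range LIME works with.
rgb_image = np.full((2, 2, 3), 128, dtype=np.uint8)
scaled = preprocess_sketch(rgb_image)
```

A function of this shape would typically be handed to the explainer so that the perturbed RGB images are rescaled before they reach the model.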


.. py:function:: fill_segmentation(values, segmentation)

   For KernelSHAP: fill each pixel with SHAP values.
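
Filling pixels with SHAP values can be sketched as follows. This sketch assumes that ``segmentation`` is an integer array labelling each pixel with a segment id from ``0`` to ``n - 1`` and that ``values[i]`` is the SHAP value of segment ``i``; the name ``fill_segmentation_sketch`` is hypothetical and the actual implementation may differ.

```python
import numpy as np


def fill_segmentation_sketch(values, segmentation):
    """Hypothetical stand-in: paint each segment with its attribution value."""
    out = np.zeros(segmentation.shape, dtype=float)
    for segment_id, value in enumerate(values):
        # Every pixel labelled segment_id receives that segment's value.
        out[segmentation == segment_id] = value
    return out


# Two segments: label 0 on the left column, label 1 on the right column.
segmentation = np.array([[0, 1],
                         [0, 1]])
heatmap = fill_segmentation_sketch([0.25, -0.5], segmentation)
```

The result has the same shape as the segmentation map, so it can be plotted directly as a heatmap over the original image.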


.. py:function:: load_model(file)

.. py:function:: load_labels(file)

.. py:function:: load_training_data(file)

.. py:function:: load_sunshine(file)

   Tabular sunshine example.

   Load the CSV file into a pandas DataFrame and split the data into a train and a test set.
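
Loading a CSV file and splitting it can be sketched with pandas alone. The function name ``load_and_split_sketch``, the ``test_fraction`` and ``seed`` parameters, and the column names are all made up for illustration; how ``load_sunshine`` actually splits the data is not documented here.

```python
import io

import pandas as pd


def load_and_split_sketch(file, test_fraction=0.2, seed=0):
    """Hypothetical stand-in: read a CSV and return (train, test) DataFrames."""
    data = pd.read_csv(file)
    # Sample the test rows at random, then keep the remaining rows for training.
    test = data.sample(frac=test_fraction, random_state=seed)
    train = data.drop(test.index)
    return train, test


# Made-up data in place of the real sunshine file.
csv_text = "sunshine_hours,temperature\n" + "\n".join(
    f"{i},{10 + i}" for i in range(10)
)
train, test = load_and_split_sketch(io.StringIO(csv_text))
```

Fixing the random seed keeps the split reproducible between runs, which is usually desirable in a tutorial setting.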


.. py:function:: load_penguins(penguins)

   Prepare the data for the penguin model example as per the notebook.


.. py:function:: features_eulaw(texts: list[str], model_tag='law-ai/InLegalBERT')

   Create features for a list of texts.


.. py:function:: classify_texts_eulaw(texts: list[str], model_path, return_proba: bool = False)

   Classify every text in a list of texts using the xgboost model stored in ``model_path``.

   The texts are first processed by a large language model, which extracts features for every text.
   The xgboost model is then loaded and used to classify the texts based on those features, and its
   classifications are returned.
   For training the xgboost model, see ``train_legalbert_xgboost.py``.

   :param texts: A list of strings, each of which is to be classified.
   :param model_path: The path to a stored xgboost model.
   :param return_proba: Whether to return the probabilities of the model.

   :rtype: List of classifications, one for every text in the list
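
The two-stage flow described for ``classify_texts_eulaw`` (feature extraction followed by classification) can be sketched with toy stand-ins. Nothing below uses the real language model or xgboost: ``embed_texts_sketch``, ``ToyClassifier`` and ``classify_texts_sketch`` are hypothetical names, and the sketch takes a classifier object where the documented function takes a ``model_path``.

```python
import numpy as np


def embed_texts_sketch(texts):
    """Hypothetical stand-in for the language-model feature extractor."""
    # Two toy features per text: character count and word count.
    return np.array([[len(t), len(t.split())] for t in texts], dtype=float)


class ToyClassifier:
    """Hypothetical stand-in for the stored xgboost model."""

    def predict_proba(self, features):
        # Toy rule: longer texts get a higher probability for class 1.
        p1 = 1.0 / (1.0 + np.exp(-(features[:, 0] - 20.0) / 5.0))
        return np.column_stack([1.0 - p1, p1])

    def predict(self, features):
        return self.predict_proba(features).argmax(axis=1)


def classify_texts_sketch(texts, classifier, return_proba=False):
    """Extract features for every text, then classify them."""
    features = embed_texts_sketch(texts)
    if return_proba:
        return classifier.predict_proba(features)
    return classifier.predict(features)


texts = ["Short.", "A considerably longer statement about a regulation."]
labels = classify_texts_sketch(texts, ToyClassifier())
probabilities = classify_texts_sketch(texts, ToyClassifier(), return_proba=True)
```

In this sketch, ``return_proba`` switches between one label per text and one row of class probabilities per text.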


.. py:class:: StatementClassifierEUlaw(model_path)

   .. py:attribute:: tokenizer


   .. py:attribute:: model_path


   .. py:method:: __call__(sentences)