
Model interpretation using KernelSHAP for a weather prediction regressor

This notebook demonstrates how to use DIANNA with the KernelSHAP explainer on a regressor that predicts the next day's sunshine hours. The model works on a tabular weather dataset with data from several locations in Europe.

Colab setup

[1]:
running_in_colab = 'google.colab' in str(get_ipython())
if running_in_colab:
    # install dianna
    !python3 -m pip install dianna[notebooks]

0 - Libraries

[2]:
import dianna
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from dianna.utils.onnx_runner import SimpleModelRunner
from dianna.utils.downloader import download

from numba.core.errors import NumbaDeprecationWarning
import warnings
# silence the Numba deprecation warnings in shap
warnings.simplefilter('ignore', category=NumbaDeprecationWarning)

1 - Loading the data

Load weather prediction dataset.

[3]:
data = pd.read_csv(download('weather_prediction_dataset_light.csv', 'data'))

Prepare the data

The target is the next day's sunshine hours, taken from the BASEL_sunshine column shifted by one row. The last data point has no next-day target, so it is removed from the features. The tabular regression model does not use time information, so the DATE and MONTH columns are dropped as well.

[4]:
X_data = data.drop(columns=['DATE', 'MONTH'])[:-1]
y_data = data.loc[1:]["BASEL_sunshine"]
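The pairing of each day's features with the following day's target can be illustrated on a toy example. This is a minimal sketch in plain Python rather than pandas; the records and their values are hypothetical and only the column names DATE, MONTH, and BASEL_sunshine come from the notebook.

```python
# Toy daily records with hypothetical values; each dict is one day.
days = [
    {"DATE": 20000101, "MONTH": 1, "BASEL_sunshine": 0.5},
    {"DATE": 20000102, "MONTH": 1, "BASEL_sunshine": 3.0},
    {"DATE": 20000103, "MONTH": 1, "BASEL_sunshine": 6.5},
    {"DATE": 20000104, "MONTH": 1, "BASEL_sunshine": 2.0},
]

dropped = {"DATE", "MONTH"}
# Features: every day except the last, without the time columns.
X_rows = [{k: v for k, v in day.items() if k not in dropped} for day in days[:-1]]
# Targets: sunshine hours from the second day onward.
y_rows = [day["BASEL_sunshine"] for day in days[1:]]

# Each feature row is paired with the sunshine hours of the following day.
for features, target in zip(X_rows, y_rows):
    print(features, "->", target)
```

Both lists have one element fewer than the number of days, which is why the last row is dropped from the features.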

Split the data into training (70%), validation (15%), and test (15%) sets.

[5]:
X_train, X_holdout, y_train, y_holdout = train_test_split(X_data, y_data, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=0.5, random_state=0)
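The two consecutive splits give roughly a 70/15/15 partition: 30% is held out first, then the held-out part is halved. The sketch below works through the arithmetic. The helper `split_sizes` and the sample count are hypothetical, and rounding the held-out count up is an assumption about how scikit-learn handles a fractional `test_size`.

```python
from math import ceil


def split_sizes(n_samples, test_size):
    """Return (n_kept, n_held_out) for a fractional test_size.

    Assumption: the held-out count is rounded up.
    """
    n_held_out = ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out


n_samples = 1000  # hypothetical number of rows
n_train, n_holdout = split_sizes(n_samples, 0.3)  # first split
n_val, n_test = split_sizes(n_holdout, 0.5)       # second split on the hold-out part
print(n_train, n_val, n_test)
```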

Get an instance to explain.

[6]:
# get an instance from test data
data_instance = X_test.iloc[10].to_numpy()

2 - Loading ONNX model

DIANNA supports ONNX models. Here we demonstrate the KernelSHAP explainer for tabular data with a pre-trained ONNX model, an MLP regressor for the weather dataset. The model is trained following this notebook.

[7]:
# load onnx model and check the prediction with it
model_path = download('sunshine_hours_regression_model.onnx', 'model')
loaded_model = SimpleModelRunner(model_path)
predictions = loaded_model(data_instance.reshape(1,-1).astype(np.float32))
predictions
[7]:
array([[3.0719438]], dtype=float32)

A runner function is created to prepare data for the ONNX inference session.

[8]:
import onnxruntime as ort

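def run_model(data):
    """Run the ONNX model on a batch of inputs and return its predictions."""
    # open an inference session and look up the input and output tensor names
    sess = ort.InferenceSession(model_path)
    input_name = sess.get_inputs()[0].name
    output_name = sess.get_outputs()[0].name

    # cast to float32 before inference, as in the prediction check above
    onnx_input = {input_name: data.astype(np.float32)}
    pred_onnx = sess.run([output_name], onnx_input)[0]

    return pred_onnx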

3 - Applying KernelSHAP with DIANNA

The simplest way to run DIANNA on tabular data is with dianna.explain_tabular.

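DIANNA requires input in NumPy format, which is why the instance to explain was converted to a NumPy array above.

The training data is also required: KernelSHAP uses it as background data to generate perturbations of the instance. Here it is summarized with k-means into 5 clusters (training_data_kmeans = 5).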

[9]:
explanation = dianna.explain_tabular(run_model, input_tabular=data_instance, method='kernelshap',
                                     mode='regression', training_data=X_train,
                                     training_data_kmeans=5, feature_names=X_test.columns)
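KernelSHAP estimates Shapley values, which it approximates by sampling feature coalitions and replacing the absent features with background values. To make the quantity concrete, the sketch below computes exact Shapley values by brute force for a tiny model. It is not how DIANNA or shap implement the method; `toy_model`, the instance, and the single background point are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, background):
    """Exact Shapley values of `predict` at `instance`.

    Features outside a coalition are replaced by their `background` value.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        # features in the coalition keep the instance value, the rest use the background
        x = [instance[i] if i in coalition else background[i] for i in features]
        return predict(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_model(x):
    # hypothetical regressor: two linear terms plus one interaction
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [3.0, 1.0, 4.0]    # hypothetical instance to explain
background = [1.0, 2.0, 0.0]  # hypothetical background point

phi = shapley_values(toy_model, instance, background)
print(phi)
# the attributions add up to the difference between the two predictions
print(sum(phi), toy_model(instance) - toy_model(background))
```

The brute-force loop evaluates every coalition, so its cost doubles with each added feature. This is why a sampling-based approximation and a small background summary are used in practice.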

4 - Visualization

The output can be visualized with DIANNA's built-in visualization function. It shows the 10 features that contribute most to the prediction, together with their importance.

[10]:
from dianna.visualization import plot_tabular

fig, _ = plot_tabular(explanation, X_test.columns, num_features=10)
[Figure: output of plot_tabular]
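Selecting the most important features can be sketched as ranking the attributions by absolute value and keeping the first few. This assumes that the explanation is a flat sequence with one attribution per feature and that ranking is by magnitude. The helper `top_features`, the attribution values, and all feature names except BASEL_sunshine are hypothetical.

```python
def top_features(attributions, names, k=10):
    """Return the k (name, attribution) pairs with the largest absolute attribution."""
    ranked = sorted(zip(names, attributions), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


# hypothetical attribution values and feature names, for illustration only
names = ["BASEL_cloud_cover", "BASEL_humidity", "BASEL_pressure", "BASEL_sunshine"]
attributions = [-1.2, 0.3, 0.05, 0.9]

for name, value in top_features(attributions, names, k=3):
    print(f"{name}: {value:+.2f}")
```

Ranking by magnitude keeps strongly negative contributions in view alongside positive ones; the sign shows whether a feature pushed the prediction down or up.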