Deep Learning Model Interpretability

February 21st, 2018 02:02

By Chitora Shindo, internship student at Alpaca.

Deep learning has become known in recent years for its impressive performance on a wide range of machine learning tasks, with the best results obtained by deep and complex models. This complexity makes the models very difficult to interpret, so they are often treated as black boxes. A lack of interpretability can lead to a lack of trust, especially in critical fields such as medicine or finance, so the interpretability of deep learning models has received increased attention in recent years [3].

To probe a predictor, its predictions can be decomposed at various stages. One way to interpret a model is to understand how it interacts with its features, which is the approach I chose.
In this work, I explore two methods for this task, Sensitivity Analysis [1] and Layerwise Relevance Propagation [2], in order to make Alpaca’s deep learning models more understandable.


For its predictions, Alpaca uses 1-dimensional signals such as the evolution of a currency’s price through time. Inputs are extracted from these signals over a sliding time window. This gives us a dataset that can be used by a convolutional model for training and prediction.
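As a minimal sketch of the sliding-window extraction described above, the helper below cuts a 1-dimensional signal into overlapping windows. The function name, the window length and the sample values are illustrative assumptions, not Alpaca's actual pipeline.

```python
def sliding_windows(signal, window):
    """Return every run of `window` consecutive samples from a 1-D signal."""
    return [signal[i:i + window] for i in range(len(signal) - window + 1)]


# Hypothetical price series, used only to show the shape of the result.
prices = [1.0, 1.2, 1.1, 1.4, 1.3]
windows = sliding_windows(prices, window=3)
```

Each window then becomes one input example for the convolutional model.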

To simplify the experiment and make the results easier to check, I produced a simple dataset based on a sinusoidal signal instead of using price data, which has a low signal-to-noise ratio.

The goal for the network is to predict the future direction of the signal as a simple binary label: 0 if the signal increases, and 1 if it decreases.
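The labelling rule can be sketched as below. The comparison point (the last sample of the window) and the one-step horizon are assumptions made for illustration, since the exact prediction horizon is not stated here.

```python
import math


def direction_label(signal, end, horizon=1):
    """Return 0 if the signal is higher `horizon` steps after index `end`, else 1."""
    return 0 if signal[end + horizon] > signal[end] else 1


# Hypothetical sinusoid sampled every 0.1 rad.
signal = [math.sin(0.1 * t) for t in range(100)]
rising = direction_label(signal, end=0)     # near 0 rad the sine is rising
falling = direction_label(signal, end=20)   # near 2 rad the sine is falling
```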

To make the dataset interesting for our purpose, the sinusoid is distributed among several noisy channels. Two variations of this were used.

In the first dataset, the sinusoidal signal is split evenly and then distributed randomly among three channels.

Fig 1: A sinusoid randomly split between 3 noisy channels.
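One possible way to build such a dataset is sketched below. It assumes that the sinusoid is cut into equal-length segments, that each segment is placed on one randomly chosen channel, and that every channel carries Gaussian noise. The period, segment length and noise level are hypothetical values, not the ones used in the experiment.

```python
import math
import random


def split_sinusoid(length=120, n_channels=3, segment=10, noise=0.1, seed=0):
    """Spread a sinusoid over noisy channels, one random channel per segment."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * t / 40) for t in range(length)]
    # Every channel starts as pure Gaussian noise.
    channels = [[rng.gauss(0.0, noise) for _ in range(length)]
                for _ in range(n_channels)]
    owners = []
    for start in range(0, length, segment):
        owner = rng.randrange(n_channels)  # channel that carries this segment
        owners.append(owner)
        for t in range(start, min(start + segment, length)):
            channels[owner][t] += clean[t]
    return clean, channels, owners


clean, channels, owners = split_sinusoid()
```

With `noise=0.0`, summing the channels at each time step gives back the original sinusoid, which is a quick way to check the split.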

I expect this dataset to be easy to interpret as the relation between the inputs and the target is very straightforward, and indeed the accuracy obtained after training is about 0.993.

The second dataset is a copy of the first dataset plus two random dummy channels that do not contain the sinusoidal signal.

Fig 2: A sinusoid randomly split between 3 noisy channels and 2 dummy channels.
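Adding the dummy channels can be sketched as follows, assuming that they contain only Gaussian noise at the same level as the other channels. The helper name and the noise level are illustrative.

```python
import random


def add_dummy_channels(channels, n_dummy=2, noise=0.1, seed=1):
    """Append `n_dummy` pure-noise channels to a list of equal-length channels."""
    rng = random.Random(seed)
    length = len(channels[0])
    dummies = [[rng.gauss(0.0, noise) for _ in range(length)]
               for _ in range(n_dummy)]
    # Copy the original channels so the input list is left untouched.
    return [list(ch) for ch in channels] + dummies


# Hypothetical three-channel input with three time steps.
signal_channels = [[0.0, 0.5, 1.0], [0.1, 0.0, -0.1], [0.0, 0.0, 0.2]]
with_dummies = add_dummy_channels(signal_channels)
```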

This dataset is used to see whether an interpretation method can help us filter out channels that do not contain useful information.

The accuracy obtained after training is about 0.985.

Sensitivity analysis

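First, I did some gradient-based sensitivity analysis of the features. Sensitivity analysis [1] works by taking the partial derivative of the prediction with respect to each feature to find that feature's influence on the prediction. The method is model-agnostic, which means that it relies only on the inputs and the resulting prediction, not on the internal structure of the model. It can be defined by:

$$R_i = \left\| \frac{\partial}{\partial x_i} f(x) \right\|$$

where x_i is each input feature, f(x) is the model’s function that generates the prediction, and R_i is the resulting sensitivity score of that feature.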

The idea is to look at how changes on each feature affect the overall prediction.
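To make the idea concrete, here is a small sketch that approximates the sensitivity of each feature with central finite differences. This is a stand-in for illustration only: the analysis above is gradient-based, and `toy_model` is a hypothetical logistic unit, not one of Alpaca's networks.

```python
import math


def sensitivity(f, x, eps=1e-5):
    """Approximate |df/dx_i| for every input feature by central differences."""
    scores = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        scores.append(abs(f(up) - f(down)) / (2 * eps))
    return scores


def toy_model(x):
    """Hypothetical stand-in for a trained model; it ignores x[2]."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1] + 0.0 * x[2])))


scores = sensitivity(toy_model, [0.1, 0.2, 0.3])
```

The feature with the largest weight gets the largest score, and the ignored feature gets a score of zero, which mirrors what we hope to see for the dummy channels.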

For our dataset, we will look at the influence of each input channel and each time step.
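Given a map of sensitivity scores indexed by channel and time step, the two views can be obtained by summing along one axis or the other. The sketch below assumes a plain sum of absolute values; the aggregation actually used for the figures may differ, and the score values are made up.

```python
def impact_by_channel(scores):
    """Sum absolute scores over time for each channel."""
    return [sum(abs(v) for v in row) for row in scores]


def impact_by_time(scores):
    """Sum absolute scores over channels for each time step."""
    return [sum(abs(row[t]) for row in scores) for t in range(len(scores[0]))]


# Hypothetical scores laid out as scores[channel][time_step].
scores = [[0.5, -0.25, 0.25],
          [0.0, 0.125, 0.0]]
per_channel = impact_by_channel(scores)
per_time = impact_by_time(scores)
```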

Although this method does not give the most detailed view of the inner workings of the network, it seems effective at giving a general picture of how the model interacts with its features. This can be used to determine which features are important.

In our second toy dataset, the impact of the dummy channels tends to be insignificant compared to that of the channels carrying the sinusoidal signal, which means that changes on those channels have little effect on the prediction. This is the expected outcome and confirms that the network can focus on the right channels.

We can also study how time information impacts the prediction. It seems that the model is able to recognize the sinusoidal signal among the channels, but it is hard to pinpoint the contribution in time.

Fig 4: input to the network

Fig 5: Impact by channel and summed over all channels.

channel 0 2.617
channel 1 2.159
channel 2 2.501
channel 3 0.342
channel 4 0.357

Fig 6: summed results across channels

Layerwise Relevance Propagation

Layerwise Relevance Propagation [2] (LRP) is another way to analyze the connection between the model and its features. While sensitivity analysis only relates variations of the output (prediction) to variations of the input (features), LRP runs a backward pass from the output in order to interpret, and visualize as a heatmap, how the model uses each feature for inference.

Fig 7: Backward pass of the LRP method, see [4] for an interactive demo by the authors.

This is a model-specific method, which is much more complex to implement as we need to take care of the backward pass for every type of layer (convolution, pooling, …) present in our models. It also requires updating the method whenever a new type of layer is incorporated into our models.

I based my implementation on the alpha-beta LRP (according to Eq. (60) in [2]), since it appears to be the most robust version with regard to handling negative values. Most publicly available implementations are built for non-negative features only, as they focus on image recognition models based on ReLU activations, but we needed to handle negative values.
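The alpha-beta rule can be sketched for a single dense layer as below. This is not the implementation used for the results in this post: it ignores biases, covers only one layer type, and assumes the convention alpha + beta = 1 with a negative beta. The function name and the example numbers are hypothetical.

```python
def lrp_alpha_beta(x, weights, relevance_out, alpha=2.0, beta=-1.0):
    """Redistribute output relevance to the inputs of one bias-free dense layer.

    weights[i][j] connects input i to output j; alpha + beta should equal 1.
    """
    n_in, n_out = len(x), len(relevance_out)
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        z = [x[i] * weights[i][j] for i in range(n_in)]  # contributions z_ij
        z_pos = sum(v for v in z if v > 0)
        z_neg = sum(v for v in z if v < 0)
        for i in range(n_in):
            share = 0.0
            if z[i] > 0:
                share = alpha * z[i] / z_pos   # positive part, weighted by alpha
            elif z[i] < 0:
                share = beta * z[i] / z_neg    # negative part, weighted by beta
            relevance_in[i] += relevance_out[j] * share
    return relevance_in


# Hypothetical layer with a negative input value.
x = [1.0, -2.0, 0.5]
weights = [[1.0, -1.0], [0.5, 1.0], [2.0, 2.0]]
relevance_out = [1.0, 0.5]
relevance_in = lrp_alpha_beta(x, weights, relevance_out)
```

Because the rule looks at the sign of each contribution rather than the sign of the input, negative inputs are handled naturally. In this example every output neuron receives both positive and negative contributions, so the total relevance is conserved from one layer to the next.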

We can see in fig. 9 that the trained model seems to focus on the highest and lowest points of the sinusoidal signal to make its prediction. It also seems to be able to use the appropriate channels.

We can infer from this visualization which parts of the dataset are important for the model and even how the model interacts with the features.

Using the insights from this visualization, I did an ablation study by keeping only the values of the dataset where the sinusoidal signal is near its highest or lowest points (fig 10).

The model’s accuracy stays almost the same (around 0.9915), which confirms that the model is mostly making its prediction using only those extreme values.
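The masking behind this kind of ablation can be sketched as follows. The threshold of 0.9, the zero fill value and the use of the clean sinusoid as the reference are all assumptions made for illustration.

```python
import math


def keep_extremes(clean, channels, threshold=0.9, fill=0.0):
    """Keep channel values only where |clean signal| >= threshold."""
    keep = [abs(v) >= threshold for v in clean]
    return [[v if k else fill for v, k in zip(ch, keep)] for ch in channels]


# Hypothetical data: one period of a sinusoid and a constant second channel.
clean = [math.sin(2 * math.pi * t / 40) for t in range(40)]
channels = [clean[:], [0.5] * 40]
reduced = keep_extremes(clean, channels)
```

Only the time steps around the peak and the trough survive; everything else is replaced by the fill value in every channel.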

Fig 8: Input channels

Fig 9: LRP on channels

Fig 10: Reduced inputs for all channels after feedback from LRP

We can also use this method by summing the relevance within each channel to see which channels seem to be the most or least important, as we did with the sensitivity analysis.

channel 0 0.216
channel 1 0.221
channel 2 0.206

Even if the differences between the impact values are smaller, it seems that LRP is also able to distinguish the channels carrying the signal from the ones without it.


Both methods help to interpret a model by studying how it interacts with its features. Sensitivity analysis can be summarized as answering the question “How do the features influence the outcome?”, and layerwise relevance propagation as answering “How does the model use its features?”.

While SA was easier to implement, it is less expressive, especially regarding temporal information.

On the other hand, LRP was more complex to implement and may need more maintenance, but its interpretations are more detailed and allow us to pinpoint particular feature contributions.


[1] Baehrens, David, et al. “How to Explain Individual Classification Decisions.” arXiv:0912.1128, 6 Dec. 2009.

[2] Bach, Sebastian, et al. “On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.” PLOS ONE, vol. 10, no. 7, 2015, doi:10.1371/journal.pone.0130140.

[3] Samek, Wojciech, et al. “Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models.” arXiv:1708.08296, 28 Aug. 2017.