Predicting Mixed Targets with Neural Networks and Keras

Train a neural network to predict two different targets simultaneously.

Casey Whorton
Towards Data Science


Photo by Sankhadeep Barman on Unsplash

Using a network of nodes, you can train models that take into account multiple targets, and even targets of different types. I thought this was so great the first time I tried it on an actual project, and it opened up my perception of what neural networks can do. In this article I’ll talk about how you can train a neural network to predict two different targets simultaneously. If you are a data scientist or machine learning engineer, you should be looking at neural networks: they offer capability and flexibility that other out-of-the-box ML algorithms do not.

Training and Predicting on Multiple Targets

If the same features are predictive of multiple targets, you can use them to train a single network on all of those targets at once. Keep in mind that some features in your dataset might be better suited to one of your targets than the other. You could also set up your model architecture to accommodate two different training datasets if you wanted to, but here we are concatenating the two feature sets together. I did this so we can see the effect of changing loss weights on model performance with a single training dataset.

Below is the code to define the network using the Keras model API. Notice that there are two output layers and two outputs in the model: one for regression and one for classification. In this problem, we want to predict both of these targets simultaneously.

Image by author
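The model definition pictured above can be sketched roughly as follows. The layer widths and feature count here are illustrative assumptions, not the exact values from the project; the important part is the single input branching into two named output layers:

```python
# Sketch of a two-output Keras model: one regression head, one
# binary-classification head, sharing the same hidden layers.
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # assumed width of the concatenated feature set

inputs = keras.Input(shape=(n_features,), name="features")
x = layers.Dense(32, activation="relu")(inputs)
x = layers.Dense(16, activation="relu")(x)

# One output layer per target: linear for regression, sigmoid for
# binary classification.
regression_output = layers.Dense(
    1, activation="linear", name="regression_output")(x)
classification_output = layers.Dense(
    1, activation="sigmoid", name="classification_output")(x)

model = keras.Model(
    inputs=inputs,
    outputs=[regression_output, classification_output])
model.summary()
```

Naming the output layers matters: those names are how losses, loss weights, and targets get matched up at compile and fit time.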

See the code gist in the Performance Using Different Loss Weights section for the syntax used in model training.

Model Architecture

For this example, the network itself is only two hidden layers deep, with dense connections. Normally, a single output layer is used, and a single activation type is selected based on the target type. Here, we have both a regression target and a classification target, so two output layers are used:

Image by author: Model architecture of a neural network with 2 output layers, one for a regression target and one for a classification target

The regression_output layer is a single node with a linear activation, and the classification_output layer is a single node with a sigmoid activation. This ensures that the predictions coming from the model match the types of the targets they are intended to predict.

Network Predictions

On the left, the test dataset predictions from the regression_output are shown against the actual test dataset values. On the right, the predicted probabilities from the classification_output are shown against the actual classes in the test dataset.

Image by author: Depiction of predictions generated by model on two targets simultaneously

Both can be obtained from the predict method of the trained model. Refer to the Keras documentation, or check out the link at the bottom of the article to see the full notebook on GitHub. In short, the predict method returns one set of predictions per output; in this case, there will be two lists of predictions.
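A minimal, self-contained illustration of that return shape (rebuilding a tiny stand-in for the model, since the exact one above lives in the notebook):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Minimal two-output model for illustration only.
inputs = keras.Input(shape=(4,))
x = layers.Dense(8, activation="relu")(inputs)
reg = layers.Dense(1, name="regression_output")(x)
clf = layers.Dense(1, activation="sigmoid", name="classification_output")(x)
model = keras.Model(inputs, [reg, clf])

X_test = np.random.rand(5, 4)
# predict returns one array per output, in the order the outputs
# were declared: regression values first, then probabilities.
reg_preds, clf_probs = model.predict(X_test, verbose=0)
```

Because the classification head uses a sigmoid, `clf_probs` always lands in [0, 1]; thresholding it (commonly at 0.5) converts probabilities into class labels.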

Performance Using Different Loss Weights

In addition to training a model to predict multiple targets, we can choose which target we want the model to learn more from. What I mean by this is that we can assign weights to the targets’ losses to specify which one is more important (if that is the case).

From the Keras documentation on this parameter:

loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.

The key takeaway here is that we can increase one output layer’s contribution to the model’s loss by increasing its weight in the loss_weights parameter. In the code snippet below, I’m choosing different loss_weights values in each training loop. We’ll see how changing this parameter affects performance.

Image by author
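The compile-and-fit pattern from the snippet above can be sketched like this. The data, layer sizes, and epoch count are synthetic placeholders; the piece to notice is the loss_weights dictionary, keyed by output-layer name, which here weights the classification loss 100 times more heavily than the regression loss:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in two-output model (illustrative sizes).
inputs = keras.Input(shape=(4,))
x = layers.Dense(8, activation="relu")(inputs)
reg = layers.Dense(1, name="regression_output")(x)
clf = layers.Dense(1, activation="sigmoid", name="classification_output")(x)
model = keras.Model(inputs, [reg, clf])

model.compile(
    optimizer="adam",
    # One loss per output, matched by the output layer's name.
    loss={"regression_output": "mse",
          "classification_output": "binary_crossentropy"},
    # Corresponds to the [1, 100] setting discussed below: the
    # classification loss dominates the total loss being minimized.
    loss_weights={"regression_output": 1.0,
                  "classification_output": 100.0},
)

# Synthetic training data, one target array per output.
X = np.random.rand(64, 4)
y_reg = np.random.rand(64, 1)
y_clf = np.random.randint(0, 2, size=(64, 1))

history = model.fit(
    X,
    {"regression_output": y_reg, "classification_output": y_clf},
    epochs=2, verbose=0)
```

To compare settings the way the experiment does, you would rebuild and recompile the model in a loop, swapping in a different loss_weights value each time.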

Precision & Recall Performance

Image by author

From left to right, the weights associated with the loss of each target are changed. For example, the first value of [1, 100] weights the classification loss metric (essentially log-loss, though labeled binary cross-entropy) 100 times more heavily than the regression loss metric (mean squared error). The intent is for the model to be penalized more for errors on the classification target and, hopefully, to return a better predictive model for that target.

Notice how the performance on the test dataset changes after altering the loss weights to favor classification over regression, and vice versa.

R-squared Performance

Image by author

Similarly, for regression performance, there is a general trend of improving performance as the regression loss is weighted more heavily than the classification loss. The regression target was already well defined by the feature set (essentially a functional relationship), so the performance is near perfect, but the idea is still the same.
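For completeness, the two performance measures used above can be computed from the model's two prediction arrays with scikit-learn. The arrays here are tiny synthetic stand-ins for the test-set predictions, just to show the mechanics:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, r2_score

# Synthetic stand-ins for test-set targets and predictions.
y_reg_true = np.array([1.0, 2.0, 3.0, 4.0])
y_reg_pred = np.array([1.1, 1.9, 3.2, 3.8])   # regression_output

y_clf_true = np.array([0, 1, 1, 0])
clf_probs = np.array([0.2, 0.8, 0.6, 0.4])    # classification_output
y_clf_pred = (clf_probs >= 0.5).astype(int)   # threshold probabilities

r2 = r2_score(y_reg_true, y_reg_pred)
precision = precision_score(y_clf_true, y_clf_pred)
recall = recall_score(y_clf_true, y_clf_pred)
```

Note that precision and recall are computed on thresholded class labels, not on the raw sigmoid probabilities, while R-squared is computed directly on the regression values.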

Conclusion

Using a network of nodes, you can train models that take into account multiple targets, and even targets of different types. It’s also possible to weight how much each target’s loss contributes to the overall model loss, which is a way of stating which target the model should learn from more. As always with neural networks, you can do much more when defining the model architecture: while we use a single input layer and multiple targets in this example, it is also possible to define a model with separate input layers as well as separate output layers. If you are a data scientist or machine learning engineer, you should be looking at neural networks, as they offer capability and flexibility that other ML algorithms do not have out of the box.
