5 Effective Strategies for Preserving Your Machine Learning Models
Chapter 1: Importance of Saving Machine Learning Models
Preserving your trained machine learning models is a crucial aspect of the machine learning process, as it allows for their reuse in future applications. For example, it is often necessary to evaluate multiple models to identify the best one for production deployment. By saving the models after training, this evaluation process becomes more manageable.
Conversely, retraining a model each time it is needed can be a significant drain on productivity, especially if the training process is lengthy.
In this article, we will explore five effective methods to save your trained models.
Section 1.1: Method #1 - Pickle
Pickle is Python's standard-library module for serializing objects. It lets you save a trained machine learning model to a file and later deserialize it in another script to make predictions. Because unpickling can execute arbitrary code, load only pickle files that come from a source you trust.
Here’s a utility function I created to save the model pipeline, and I will illustrate its use in the training pipeline:
# Code snippet showing utility function for saving model
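A minimal sketch of what a save utility along these lines might look like. It is not the original snippet: the function signature, the file name `fraud_detection_pipe.pkl`, and the dictionary standing in for a trained pipeline are all assumptions made so the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def save_pipeline(pipeline, path):
    """Serialize any picklable object (for example a fitted pipeline) to disk."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(pipeline, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


# Hypothetical stand-in for a trained pipeline; a real one would be a fitted estimator.
toy_pipeline = {"weights": [0.4, -1.2], "bias": 0.1}

# Write into a temporary directory so the example leaves no files behind in the project.
model_dir = Path(tempfile.mkdtemp())
saved_path = save_pipeline(toy_pipeline, model_dir / "fraud_detection_pipe.pkl")
print(saved_path.exists())
```

Opening the file in binary mode (`"wb"`) matters here, since pickle writes bytes rather than text.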
In our training pipeline script, we invoke the save_pipeline utility function.
# Code snippet showing training pipeline
To load the pipeline, I developed a corresponding function called load_pipeline.
# Code snippet for loading the model
In our prediction script, we retrieve the pipeline into a variable named _fraud_detection_pipe, allowing us to utilize it for predictions. Here’s the rest of that script:
# Code snippet for prediction
(Note: The trained model is loaded in line 14.)
Section 1.2: Method #2 - Joblib
Joblib serves as a robust alternative to Pickle for saving and loading models. It is widely used across the scientific Python ecosystem (scikit-learn depends on it) and handles objects containing large NumPy arrays efficiently. For more insights on Joblib's advantages, refer to this StackOverflow discussion.
“Joblib is a set of tools to provide lightweight pipelining in Python, including transparent disk-caching and easy parallel computing.”
— Source: Joblib Documentation
To utilize Joblib for saving our model, we only need to adjust our save_pipeline() function:
# Code snippet showing joblib implementation
You’ll notice we switched to using Joblib instead of Pickle, and on line 16, we serialize our model pipeline using Joblib.
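For readers without access to the original snippet, here is a hedged sketch of the same idea using `joblib.dump` and `joblib.load`. The helper names mirror the ones used earlier in this article, while the file name and the dictionary standing in for a trained pipeline are assumptions.

```python
import tempfile
from pathlib import Path

import joblib


def save_pipeline(pipeline, path):
    """Persist an object with joblib.dump and return the target path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(pipeline, str(path))
    return path


def load_pipeline(path):
    """Restore an object written by joblib.dump (trusted files only)."""
    return joblib.load(str(path))


# Hypothetical stand-in for a trained pipeline.
toy_pipeline = {"weights": [0.4, -1.2], "bias": 0.1}

joblib_path = save_pipeline(
    toy_pipeline, Path(tempfile.mkdtemp()) / "fraud_detection_pipe.joblib"
)
restored_pipeline = load_pipeline(joblib_path)
print(restored_pipeline == toy_pipeline)
```

Joblib files carry the same trust caveat as pickle files, so load them only from sources you control.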
(Note: Interested readers can view the complete code on my GitHub.)
Section 1.3: Method #3 - JSON
Another approach to saving your model is through JSON. Unlike Joblib and Pickle, the JSON method does not save the trained model directly but rather stores all necessary parameters to reconstruct the model. This is beneficial when you need full control over the saving and restoration process.
# Code snippet demonstrating JSON saving method
You can invoke the save_json() method on your MyLogisticRegression instance to store the parameters.
# Code snippet showing how to call save_json
This data can be utilized in another script to recreate the previous model.
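The JSON approach can be illustrated with a small pure-Python stand-in. The class below is not the original `MyLogisticRegression`: its attributes, the `load_json()` class method, and the file name are assumptions chosen to show the save-parameters-then-rebuild pattern with nothing beyond the standard library.

```python
import json
import math
import tempfile
from pathlib import Path


class MyLogisticRegression:
    """Tiny stand-in model that stores coefficients and an intercept."""

    def __init__(self, coef=None, intercept=0.0):
        self.coef = list(coef) if coef is not None else []
        self.intercept = float(intercept)

    def predict_proba(self, x):
        """Return the positive-class probability for one feature vector."""
        z = self.intercept + sum(w * v for w, v in zip(self.coef, x))
        return 1.0 / (1.0 + math.exp(-z))

    def save_json(self, path):
        """Write only the parameters needed to rebuild the model."""
        params = {"coef": self.coef, "intercept": self.intercept}
        Path(path).write_text(json.dumps(params, indent=2))

    @classmethod
    def load_json(cls, path):
        """Rebuild a model from parameters written by save_json (hypothetical helper)."""
        params = json.loads(Path(path).read_text())
        return cls(coef=params["coef"], intercept=params["intercept"])


model = MyLogisticRegression(coef=[0.8, -0.5], intercept=0.25)
json_path = Path(tempfile.mkdtemp()) / "my_logreg.json"
model.save_json(json_path)

# In another script, only the JSON file and the class definition are needed.
restored = MyLogisticRegression.load_json(json_path)
sample = [1.0, 2.0]
print(restored.predict_proba(sample) == model.predict_proba(sample))
```

The resulting file is human-readable and contains no executable code, which is the main appeal of this method. The cost is that you must decide yourself which attributes to store.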
Section 1.4: Method #4 - PMML
Predictive Model Markup Language (PMML) is another format used for saving machine learning models. It offers greater robustness than Pickle, as a PMML model is independent of the class from which it was created.
# Code snippet for loading PMML model
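To make the independence claim concrete, the sketch below reads a tiny hand-written PMML regression document with the standard library alone. It is an illustration only: the document, the field names `x1`, `x2` and `y`, and both helper functions are assumptions, and a real project would normally rely on a dedicated PMML exporter and scoring library rather than parsing the XML by hand.

```python
import xml.etree.ElementTree as ET

PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

# Hand-written toy document describing y = 0.5 + 2.0 * x1 - 1.0 * x2.
TOY_PMML = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-1.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_linear_pmml(pmml_text):
    """Extract the intercept and coefficients from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", PMML_NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", PMML_NS)
    }
    return intercept, coefficients


def predict(intercept, coefficients, row):
    """Evaluate the linear model for one row given as a name-to-value dict."""
    return intercept + sum(c * row[name] for name, c in coefficients.items())


intercept, coefficients = load_linear_pmml(TOY_PMML)
prediction = predict(intercept, coefficients, {"x1": 1.0, "x2": 3.0})
print(prediction)
```

Nothing in the file refers to a Python class, so any tool that understands the PMML schema can score the model.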
Section 1.5: Method #5 - TensorFlow Keras
TensorFlow Keras provides the capability to save models to either SavedModel or HDF5 files. Let’s create a simple model and save it using TensorFlow Keras, starting with generating some training data.
# Code snippet for generating training data
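The original data-generation snippet is not available, so here is one possible stand-in: a noisy linear relationship built with NumPy. The slope, intercept, noise level, sample count, and seed are all arbitrary assumptions.

```python
import numpy as np

# Seeded generator so the toy data is reproducible.
rng = np.random.default_rng(seed=42)

# 200 samples with a single feature, following y = 3x + 2 plus a little noise.
x_train = rng.uniform(-1.0, 1.0, size=(200, 1)).astype("float32")
noise = rng.normal(0.0, 0.05, size=(200, 1)).astype("float32")
y_train = 3.0 * x_train + 2.0 + noise

print(x_train.shape, y_train.shape)
```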
Next, we can construct a sequential model:
# Code snippet for building sequential model
In the code above, we use the save() method on our sequential model instance, specifying the directory path for saving.
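To retrieve the model, we call the load_model() function from tf.keras.models, passing it the path that was used when saving. Note that load_model() is a module-level function, not a method on the model object.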
# Code snippet for loading model
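Putting the pieces together, a save and load round trip might look like the sketch below. It rests on several assumptions: the toy data, the layer sizes, and the file name are invented, and the model is written to a single file with the `.keras` extension, which recent Keras releases expect, rather than to a SavedModel directory.

```python
import tempfile
from pathlib import Path

import numpy as np
import tensorflow as tf

# Toy regression data (an assumption; the article's own data is not shown).
x_train = np.linspace(-1.0, 1.0, 64, dtype="float32").reshape(-1, 1)
y_train = 3.0 * x_train + 2.0

# A small sequential model with an explicit input shape.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=2, verbose=0)

# Save architecture, weights, and optimizer state to one file.
model_path = str(Path(tempfile.mkdtemp()) / "toy_model.keras")
model.save(model_path)

# load_model is a function in tf.keras.models, not a method on the model.
restored = tf.keras.models.load_model(model_path)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(
    model.predict(x_train, verbose=0),
    restored.predict(x_train, verbose=0),
    atol=1e-6,
)
print(same_predictions)
```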
Wrap-Up
In this article, we've examined five distinct methods for preserving your machine learning models. Additionally, it's essential to document the Python version and library versions utilized during model creation. This information will facilitate recreating the environment necessary for future model reproduction.
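One lightweight way to capture that information is to write it out next to the saved model. The sketch below uses only the standard library. The function name `describe_environment` and the package list are assumptions, so adjust the list to whatever your model actually depends on.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect the Python version and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "packages": versions,
    }


# Hypothetical dependency list for illustration.
env_info = describe_environment(["scikit-learn", "joblib", "tensorflow"])
print(json.dumps(env_info, indent=2))
```

Storing this JSON alongside the model file makes it easier to rebuild a matching environment later.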
Thank you for reading!