The Keras/Tensorflow Python API makes saving, loading, and running inference with trained models simple and straightforward. Performing the same operations in C++, however, is somewhat more involved. This article describes how to load a SavedModel
with C++ for inference.
Using Python and Tensorflow 2.X, it has become super simple to save and load a model:
# Saving a Tensorflow model
model = ... # Training
model.save('path/to/model')
from tensorflow import keras
# loading a Tensorflow model
model = keras.models.load_model('path/to/model')
# perform inference
outputs = model(input)
Tensorflow Subclassed Model with multiple inputs and outputs
Even when the model architecture has multiple inputs and outputs, the code remains very Pythonic.
# Tensorflow subclassed model with several inputs
class MyModel(tf.keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()

    def call(self, inputs, training=False, mask=None):
        input1, input2 = inputs[0], inputs[1]
        ...
        return bbox_proposals, probabilities
We can simply call the model object directly to perform the inference. This will invoke MyModel’s call(inputs, training=False, mask=None) function.
model = keras.models.load_model('path/to/model')
# perform inference
bbox_proposals, probabilities = model([input1, input2])
Using the Tensorflow C++ API
However, using the Tensorflow C++ API to perform inference with a Keras SavedModel
is not very well documented. The following snippets provide a short walk-through of loading the Tensorflow model and running inference with C++.
As described in the official guide, it’s recommended to use tensorflow::SavedModelBundle
which contains the MetaGraphDef and the Tensorflow session.
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"

tensorflow::SessionOptions session_options_;
tensorflow::RunOptions run_options_;
tensorflow::SavedModelBundle model_;

auto status = tensorflow::LoadSavedModel(session_options_,
                                         run_options_,
                                         path_to_model_,
                                         {tensorflow::kSavedModelTagServe},
                                         &model_);
if (!status.ok()) {
  std::cerr << "Failed to load model: " << status;
  return;
}
So far, so good. We successfully loaded the model. Now, comes the tricky part.
The model that we consider is still the subclassed model defined above, with two inputs and two outputs. In order to feed the model correctly during inference, we need to know the names of the input and output nodes of the computational graph. Usually, a node name in Tensorflow can be defined with the parameter name="node_name"
. But in our case, neither the inputs nor the outputs were tagged like that, and the documentation does not provide any information on how to solve that issue.
Hence, we can inspect the signature map of the loaded model.
auto sig_map = model_.GetSignatures();
auto model_def = sig_map.at("serving_default");

printf("Model Signature\n");
for (auto const& p : sig_map) {
  printf("key: %s\n", p.first.c_str());
}

printf("Model Input Nodes\n");
for (auto const& p : model_def.inputs()) {
  printf("key: %s value: %s\n", p.first.c_str(), p.second.name().c_str());
}

printf("Model Output Nodes\n");
for (auto const& p : model_def.outputs()) {
  printf("key: %s value: %s\n", p.first.c_str(), p.second.name().c_str());
}
This will print something like:
Model Signature
key: __saved_model_init_op
key: serving_default
Model Input Nodes
key: input_1 value: serving_default_input_1:0
key: input_2 value: serving_default_input_2:0
Model Output Nodes
key: output_1 value: StatefulPartitionedCall:0
key: output_2 value: StatefulPartitionedCall:1
Great, our input nodes are called serving_default_input_1:0 and serving_default_input_2:0, and the output nodes are StatefulPartitionedCall:0 and StatefulPartitionedCall:1.
We can use that information to look up the node names needed for the inference session:
const std::string input_name_1 = model_def.inputs().at("input_1").name();
const std::string input_name_2 = model_def.inputs().at("input_2").name();
const std::string output_name_bbox_proposals = model_def.outputs().at("output_1").name();
const std::string output_name_probabilities = model_def.outputs().at("output_2").name();
And then finally run the model within the session.
tensorflow::Tensor inputTensor_1;  // fill with the data for input_1
tensorflow::Tensor inputTensor_2;  // fill with the data for input_2
std::vector<tensorflow::Tensor> bbox_output;

tensorflow::Status status = model_.session->Run(
    {{input_name_1, inputTensor_1}, {input_name_2, inputTensor_2}},
    {output_name_bbox_proposals},  // add output_name_probabilities here to also fetch the second output
    {}, &bbox_output);
if (!status.ok()) {
  std::cerr << "Inference failed: " << status;
  return;
}
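The snippets above leave open how an input tensor is actually created and how the fetched result is read back. Here is a minimal sketch, assuming a single float input of shape [1, 224, 224, 3]; the real shapes and dtypes of course depend on your model:
#include <iostream>

#include "tensorflow/core/framework/tensor.h"

// Hypothetical input tensor: one float image of shape [1, 224, 224, 3]
tensorflow::Tensor inputTensor_1(tensorflow::DT_FLOAT,
                                 tensorflow::TensorShape({1, 224, 224, 3}));
auto input_data = inputTensor_1.flat<float>();
for (int i = 0; i < input_data.size(); ++i) {
  input_data(i) = 0.0f;  // copy your preprocessed data here instead
}

// After a successful Session::Run, read values from the fetched tensor
const tensorflow::Tensor& proposals = bbox_output[0];
auto proposal_data = proposals.flat<float>();
std::cout << "first proposal value: " << proposal_data(0) << std::endl;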
Note that on a GPU device, similar to the Python API, we can enable the allow_growth option for proper GPU memory allocation. The session options need to be configured before the model is loaded.
session_options_.config.mutable_gpu_options()->set_allow_growth(true);
Troubleshooting
In case you encounter the following error at runtime,
undefined symbol: _ZNK6google8protobuf8internal12MapFieldBase24SyncMapWithRepeatedFieldEv
you need to make sure that you link the C++ code against Google’s Protobuf library. These libraries can be installed with:
sudo apt install -y libprotobuf-dev
sudo apt install -y protobuf-compiler
Wrap-up
- We can load a Keras SavedModel with tensorflow::LoadSavedModel in C++
- The names of the input and output nodes of a subclassed Tensorflow model are not always uniquely defined
- We can look up the node names by examining the signature map of the model
- The inference is then performed by calling the model’s session, as shown in the full sketch below
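To put everything in one place, here is a minimal, self-contained sketch that combines the steps above for the two-input/two-output model. The model path, input shapes, and dtypes are placeholders and need to be adapted to your own SavedModel:
#include <iostream>
#include <string>
#include <vector>

#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
  const std::string path_to_model = "path/to/model";  // placeholder path

  tensorflow::SessionOptions session_options;
  session_options.config.mutable_gpu_options()->set_allow_growth(true);
  tensorflow::RunOptions run_options;
  tensorflow::SavedModelBundle model;

  auto status = tensorflow::LoadSavedModel(session_options, run_options,
                                           path_to_model,
                                           {tensorflow::kSavedModelTagServe},
                                           &model);
  if (!status.ok()) {
    std::cerr << "Failed to load model: " << status << std::endl;
    return 1;
  }

  // Look up the node names from the "serving_default" signature
  const auto& sig_map = model.GetSignatures();
  const auto& model_def = sig_map.at("serving_default");
  const std::string input_name_1 = model_def.inputs().at("input_1").name();
  const std::string input_name_2 = model_def.inputs().at("input_2").name();
  const std::string output_name_1 = model_def.outputs().at("output_1").name();
  const std::string output_name_2 = model_def.outputs().at("output_2").name();

  // Dummy input tensors; the shapes are placeholders
  tensorflow::Tensor input_1(tensorflow::DT_FLOAT,
                             tensorflow::TensorShape({1, 224, 224, 3}));
  tensorflow::Tensor input_2(tensorflow::DT_FLOAT,
                             tensorflow::TensorShape({1, 4}));
  input_1.flat<float>().setZero();
  input_2.flat<float>().setZero();

  std::vector<tensorflow::Tensor> outputs;
  status = model.session->Run({{input_name_1, input_1}, {input_name_2, input_2}},
                              {output_name_1, output_name_2}, {}, &outputs);
  if (!status.ok()) {
    std::cerr << "Inference failed: " << status << std::endl;
    return 1;
  }

  std::cout << "bbox proposals: " << outputs[0].DebugString() << std::endl;
  std::cout << "probabilities: " << outputs[1].DebugString() << std::endl;
  return 0;
}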