I am working on a Jetson AGX Xavier with JetPack 4.5, CUDA 10.2, OpenCV 4.5.1 and TensorFlow C++ 2.3.1 built from source with CUDA support. I used Bazel 3.1.0 to build TensorFlow.
The purpose is to speed up frame processing, so I decided to implement an inference engine in C++. The problem is that it is actually slower than the Python one.
The model (saved as a SavedModel protobuf: saved_model.pb) is loaded using:
auto status = tensorflow::LoadSavedModel(session_options, run_options, path_to_pb_file, {"serve"}, &bundle);
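For context, the surrounding setup is minimal. A simplified sketch of it (illustrative, not my exact code; the allow_growth flag is just an example setting, not necessarily related to the slowdown):

#include <iostream>
#include <string>
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"

bool load_model(const std::string& export_dir, tensorflow::SavedModelBundle* bundle) {
    tensorflow::SessionOptions session_options;
    // Optional: grow GPU memory on demand instead of pre-allocating it all
    // (example setting only).
    session_options.config.mutable_gpu_options()->set_allow_growth(true);
    tensorflow::RunOptions run_options;

    // export_dir is the SavedModel directory (the one containing saved_model.pb).
    auto status = tensorflow::LoadSavedModel(session_options, run_options, export_dir,
                                             {tensorflow::kSavedModelTagServe}, bundle);
    if (!status.ok()) {
        std::cerr << "Failed to load SavedModel: " << status.ToString() << std::endl;
        return false;
    }
    return true;
}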
The input tensor's shape is [15, 224, 224, 3]; it is filled by copying data from a cv::Mat into the tensor's data pointer (I can post the code if needed).
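Roughly, the copy is done like this (illustrative sketch, not my exact code; the 1/255 scaling and the frame preparation are placeholders, the real preprocessing must match the Python pipeline):

#include <cstring>
#include <vector>
#include <opencv2/core.hpp>
#include "tensorflow/core/framework/tensor.h"

// frames: 15 cv::Mat images, already resized to 224x224, 3-channel 8-bit (CV_8UC3)
tensorflow::Tensor make_input_tensor(const std::vector<cv::Mat>& frames) {
    tensorflow::Tensor input(tensorflow::DT_FLOAT,
                             tensorflow::TensorShape({15, 224, 224, 3}));
    float* dst = input.flat<float>().data();
    const size_t frame_size = 224 * 224 * 3;
    for (int i = 0; i < 15; ++i) {
        cv::Mat f;
        // Placeholder scaling; use whatever preprocessing the model was trained with.
        frames[i].convertTo(f, CV_32FC3, 1.0 / 255.0);
        std::memcpy(dst + i * frame_size, f.ptr<float>(0), frame_size * sizeof(float));
    }
    return input;
}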
Then inference is done by running the session:
const string input_node = "serving_default_mobilenetv2_0_50_224_input:0";
std::vector<string> output_nodes = {"PartitionedCall:0"};
std::vector<Tensor> predictions;
std::vector<std::pair<string, Tensor>> inputs_data = {{input_node, image_output[0]}};

// Inference (status check added for completeness)
tensorflow::Status run_status =
    this->bundle.GetSession()->Run(inputs_data, output_nodes, {}, &predictions);
if (!run_status.ok()) {
    std::cerr << run_status.ToString() << std::endl;
}
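For reference, a minimal sketch of how the per-inference time can be measured (illustrative only, not my exact benchmark code; the warm-up run is there because the first Run() is typically much slower while CUDA/cuDNN kernels are initialized):

#include <chrono>
#include <string>
#include <utility>
#include <vector>
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/core/framework/tensor.h"

// Average the Run() time over several iterations after a warm-up call.
double time_inference(tensorflow::SavedModelBundle& bundle,
                      const std::vector<std::pair<std::string, tensorflow::Tensor>>& inputs_data,
                      const std::vector<std::string>& output_nodes,
                      int iters = 20) {
    std::vector<tensorflow::Tensor> predictions;
    auto* session = bundle.GetSession();

    // Warm-up: the very first Run() includes one-off initialization work.
    session->Run(inputs_data, output_nodes, {}, &predictions);

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        predictions.clear();
        session->Run(inputs_data, output_nodes, {}, &predictions);
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}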
Inference takes about 30 ms in C++ versus less than 10 ms in Python. I read online that this could be because the model is loaded as a frozen graph, but I don't think that is the case here: TensorFlow's verbose output says the model has been successfully restored, and I assume a frozen graph cannot be "restored"?
It is also worth noting that Python takes more than 1 minute to load the model versus less than 5 seconds in C++, so I suspect I am not loading the model in the way that gives the fastest inference.
My questions are: how can I optimize the inference in C++? Should I load my model differently?
Hoping all the needed information is here; don't hesitate to ask for more. Thanks for your time, BC.