ONNX Runtime

This article uses ResNet34 as an example to show how to convert a .pth model exported from PyTorch to ONNX and run inference with ONNX Runtime.

Converting .pth to ONNX

Use the PyTorch API to build the network structure, load the .pth weights, and export an ONNX model, as sketched below. Note that the number of classes passed to the model must be changed to match your task.
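
A minimal export sketch follows. The weight/output file names, the input/output tensor names, and the opset version are assumptions for illustration; the class count of 139 matches the C++ code later in this post.

```python
import torch
import torchvision

num_classes = 139  # must match the class count the model was trained with

# Build the ResNet34 skeleton and swap the final layer for our class count
model = torchvision.models.resnet34()
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Load the trained weights (path is hypothetical)
state_dict = torch.load("model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Export with a dummy input of the expected shape (1, 3, 224, 224)
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=12,
)
```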

ONNX Runtime Inference (Python)

Here inference is run on a randomly generated input, so you need to know the shape of the input tensor. You also need the model's input name; if you are unsure of either, inspect the model with the Netron visualization tool.
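
A minimal sketch of this step, assuming the model file and the input name "input" from the export sketch above:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Random input with the shape the model expects: (N, C, H, W)
x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# First argument None means "return all outputs"
outputs = session.run(None, {"input": x})
print(outputs[0].shape)  # e.g. (1, 139)
```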

ONNX Runtime Inference (C++)

In Visual Studio, use NuGet to search for and install Microsoft.ML.OnnxRuntime.Gpu 1.12.0 and an OpenCV 4.2 package.

The official documentation lists the correspondence between ONNX Runtime versions and CUDA versions: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html

Note: Because of CUDA Minor Version Compatibility, ONNX Runtime built with CUDA 11.4 should be compatible with any CUDA 11.x version.
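
Before diving into the C++ code, it can help to confirm that a CUDA-capable ONNX Runtime is actually available. A quick check, assuming the onnxruntime-gpu Python package is installed:

```python
import onnxruntime as ort

# Should print "GPU" and a provider list containing "CUDAExecutionProvider"
print(ort.get_device())
print(ort.get_available_providers())
```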

Sample code sources: https://github.com/cassiebreviu/cpp-onnxruntime-resnet-console-app/tree/main/OnnxRuntimeResNet and https://www.youtube.com/watch?v=imjqRdsm2Qw

```cpp
#include <onnxruntime_cxx_api.h>
#include <algorithm>
#include <array>
#include <chrono>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include "Helpers.cpp"

int main()
{
    Ort::Env env;
    Ort::RunOptions runOptions;
    Ort::Session session(nullptr);

    constexpr int64_t numChannels = 3;
    constexpr int64_t width = 224;
    constexpr int64_t height = 224;
    constexpr int64_t numClasses = 139;
    constexpr int64_t numInputElements = numChannels * height * width;

    const std::string imageFile = "D:\\inspur\\backup\\newsanzhou\\flame\\model_c\\Project1\\1.jpg";
    const std::string labelFile = "D:\\inspur\\backup\\newsanzhou\\flame\\model_c\\Project1\\imagenet_classes.txt";
    auto modelPath = L"D:\\inspur\\backup\\newsanzhou\\flame\\model_c\\Project1\\model.onnx";

    // load labels
    std::vector<std::string> labels = loadLabels(labelFile);
    if (labels.empty()) {
        std::cout << "Failed to load labels: " << labelFile << std::endl;
        return 1;
    }

    // load image
    const std::vector<float> imageVec = loadImage(imageFile);
    if (imageVec.empty()) {
        std::cout << "Failed to load image: " << imageFile << std::endl;
        return 1;
    }

    if (imageVec.size() != numInputElements) {
        std::cout << "Invalid image format. Must be 224x224 RGB image." << std::endl;
        return 1;
    }

    // use the CUDA execution provider
    Ort::SessionOptions ort_session_options;

    OrtCUDAProviderOptions options;
    options.device_id = 0;
    //options.arena_extend_strategy = 0;
    //options.gpu_mem_limit = 2 * 1024 * 1024 * 1024;
    //options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchExhaustive;
    //options.do_copy_in_default_stream = 1;

    OrtSessionOptionsAppendExecutionProvider_CUDA(ort_session_options, options.device_id);

    // create session
    session = Ort::Session(env, modelPath, ort_session_options);

    // CPU-only alternative:
    //session = Ort::Session(env, modelPath, Ort::SessionOptions{ nullptr });

    // define shapes
    const std::array<int64_t, 4> inputShape = { 1, numChannels, height, width };
    const std::array<int64_t, 2> outputShape = { 1, numClasses };

    // define arrays backing the input/output tensors
    std::array<float, numInputElements> input;
    std::array<float, numClasses> results;

    // define tensors over the arrays above (no extra copy at Run time)
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
    auto inputTensor = Ort::Value::CreateTensor<float>(memory_info, input.data(), input.size(), inputShape.data(), inputShape.size());
    auto outputTensor = Ort::Value::CreateTensor<float>(memory_info, results.data(), results.size(), outputShape.data(), outputShape.size());

    // copy image data to input array
    std::copy(imageVec.begin(), imageVec.end(), input.begin());

    // query input/output names from the model
    Ort::AllocatorWithDefaultOptions ort_alloc;
    Ort::AllocatedStringPtr inputName = session.GetInputNameAllocated(0, ort_alloc);
    Ort::AllocatedStringPtr outputName = session.GetOutputNameAllocated(0, ort_alloc);
    const std::array<const char*, 1> inputNames = { inputName.get() };
    const std::array<const char*, 1> outputNames = { outputName.get() };
    inputName.release();
    outputName.release();

    // start timing
    auto start = std::chrono::high_resolution_clock::now();

    // run inference
    try {
        session.Run(runOptions, inputNames.data(), &inputTensor, 1, outputNames.data(), &outputTensor, 1);
    }
    catch (const Ort::Exception& e) {
        std::cout << e.what() << std::endl;
        return 1;
    }

    // stop timing and report elapsed seconds
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Inference time: " << duration.count() << " s" << std::endl;

    // sort results by score, descending
    std::vector<std::pair<size_t, float>> indexValuePairs;
    for (size_t i = 0; i < results.size(); ++i) {
        indexValuePairs.emplace_back(i, results[i]);
    }
    std::sort(indexValuePairs.begin(), indexValuePairs.end(), [](const auto& lhs, const auto& rhs) { return lhs.second > rhs.second; });

    // show top-5 predictions
    for (size_t i = 0; i < 5; ++i) {
        const auto& result = indexValuePairs[i];
        std::cout << i + 1 << ": " << labels[result.first] << " " << result.second << std::endl;
    }
}
```

Note: modelPath must be prefixed with L, because on Windows Ort::Session expects a wide-character (wchar_t) path.

Addendum: in actual testing, single-image inference ran at about 20 ms on CPU but about 1 s with CUDA. A search on GitHub shows that ONNX Runtime's first inference on CUDA is known to be slow and that later runs speed up markedly; here the second inference took 2 ms. (https://github.com/microsoft/onnxruntime/issues/11581)
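
A common way to keep this one-time cost out of your measurements is to run one untimed warm-up inference first. A minimal sketch of the idea in Python (model path and input name are assumptions, as above):

```python
import time

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
x = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Warm-up: the first CUDA run pays one-time initialization costs
session.run(None, {"input": x})

# Timed run: this reflects steady-state latency
start = time.perf_counter()
session.run(None, {"input": x})
print(f"inference took {time.perf_counter() - start:.4f} s")
```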
