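This article uses ResNet34 as an example to show how to convert a pth model saved from PyTorch to ONNX and run inference on it with ONNX Runtime.
Converting pth to ONNX
Use the PyTorch API to build the network structure, load the pth weights, and export an ONNX model. Note that the number of classes passed to the model must be changed to match your own model.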
<pre><code>import os

import torch

from model import resnet34

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# create the model; the number of classes is passed at instantiation
model = resnet34(num_classes=139).to(device)

# load model weights
weights_path = "./resNet34.pth"
assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
model.load_state_dict(torch.load(weights_path, map_location=device))

# switch to inference mode
model.eval()

# prepare an example input on the same device as the model
example_input = torch.randn(1, 3, 224, 224).to(device)  # assumes a 224x224 input

# export to ONNX format
output_path = "model.onnx"
torch.onnx.export(model, example_input, output_path, opset_version=13)
</code></pre>
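ONNX Runtime inference (Python)
Inference here runs on a randomly generated tensor, so the shape of the input tensor must be known. The name of the model's input must be known as well; if you are unsure of it, the Netron visualization tool can be used to look it up.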
<pre><code>import onnx
import onnxruntime
import torch

# load the exported ONNX model
onnx_model = onnx.load("model.onnx")

# create an ONNX Runtime session
session = onnxruntime.InferenceSession(onnx_model.SerializeToString())

# prepare a random example input and convert it to a numpy array
# (ONNX Runtime takes numpy arrays, so no CUDA tensor is needed here)
example_input = torch.randn(1, 3, 224, 224)
input_data = example_input.numpy()

# run inference with ONNX Runtime; "input.1" is the model's input name
outputs = session.run(None, {"input.1": input_data})

# get the output
output_data = outputs[0]

# convert the output to a PyTorch tensor
output_tensor = torch.from_numpy(output_data)

# post-process the result
output_tensor = torch.squeeze(output_tensor)  # drop dimensions of size 1
print(output_tensor)

_, predicted_class = torch.max(output_tensor, dim=0)  # max value and its index
print("Predicted class:", predicted_class.item())
</code></pre>
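If Netron is not at hand, the input name can also be read from the session object. The sketch below assumes an object that behaves like `onnxruntime.InferenceSession`, whose `get_inputs()` and `get_outputs()` return entries with `name`, `shape` and `type` attributes. The helper names `describe_io` and `first_input_name` are hypothetical and only serve the illustration.

```python
def describe_io(session):
    """Return (inputs, outputs) as lists of (name, shape, type) tuples.

    `session` is expected to behave like onnxruntime.InferenceSession,
    i.e. to expose get_inputs() and get_outputs().
    """
    inputs = [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.shape, arg.type) for arg in session.get_outputs()]
    return inputs, outputs


def first_input_name(session):
    """Name of the model's first input, usable as the key of the feed dict."""
    return session.get_inputs()[0].name
```

With these helpers, the inference call could be written as `session.run(None, {first_input_name(session): input_data})` instead of hard-coding "input.1". The name can also be fixed at export time by passing `input_names` to `torch.onnx.export`.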
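ONNX Runtime inference (C++)
In Visual Studio, use NuGet to search for and install Microsoft.ML.OnnxRuntime.Gpu 1.12.0 and the OpenCV 4.2 library.
The official documentation lists which CUDA versions each ONNX Runtime release is built against: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html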
Note: Because of CUDA Minor Version Compatibility, Onnx Runtime built with CUDA 11.4 should be compatible with any CUDA 11.x version.
Sample code source: https://github.com/cassiebreviu/cpp-onnxruntime-resnet-console-app/tree/main/OnnxRuntimeResNet; https://www.youtube.com/watch?v=imjqRdsm2Qw
c++
<pre><code>#include <onnxruntime_cxx_api.h>

#include <algorithm>
#include <array>
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

#include "Helpers.cpp"  // provides loadLabels() and loadImage()

int main()
{
    Ort::Env env;
    Ort::RunOptions runOptions;
    Ort::Session session(nullptr);

    constexpr int64_t numChannels = 3;
    constexpr int64_t width = 224;
    constexpr int64_t height = 224;
    constexpr int64_t numClasses = 139;
    constexpr int64_t numInputElements = numChannels * height * width;

    const std::string imageFile = "D:\\inspur\\backup\\newsanzhou\\flame\\model_c\\Project1\\1.jpg";
    const std::string labelFile = "D:\\inspur\\backup\\newsanzhou\\flame\\model_c\\Project1\\imagenet_classes.txt";
    auto modelPath = L"D:\\inspur\\backup\\newsanzhou\\flame\\model_c\\Project1\\model.onnx";

    // load labels
    std::vector<std::string> labels = loadLabels(labelFile);
    if (labels.empty()) {
        std::cout << "Failed to load labels: " << labelFile << std::endl;
        return 1;
    }

    // load image
    const std::vector<float> imageVec = loadImage(imageFile);
    if (imageVec.empty()) {
        std::cout << "Failed to load image: " << imageFile << std::endl;
        return 1;
    }
    if (imageVec.size() != numInputElements) {
        std::cout << "Invalid image format. Must be 224x224 RGB image." << std::endl;
        return 1;
    }

    // Use CUDA GPU
    Ort::SessionOptions ort_session_options;
    OrtCUDAProviderOptions options;
    options.device_id = 0;
    //options.arena_extend_strategy = 0;
    //options.gpu_mem_limit = 2 * 1024 * 1024 * 1024;
    //options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchExhaustive;
    //options.do_copy_in_default_stream = 1;
    OrtSessionOptionsAppendExecutionProvider_CUDA(ort_session_options, options.device_id);

    // create session
    session = Ort::Session(env, modelPath, ort_session_options);

    // Use CPU
    //session = Ort::Session(env, modelPath, Ort::SessionOptions{ nullptr });

    // define shape
    const std::array<int64_t, 4> inputShape = { 1, numChannels, height, width };
    const std::array<int64_t, 2> outputShape = { 1, numClasses };

    // define array
    std::array<float, numInputElements> input;
    std::array<float, numClasses> results;

    // define Tensor
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
    auto inputTensor = Ort::Value::CreateTensor<float>(memory_info, input.data(), input.size(), inputShape.data(), inputShape.size());
    auto outputTensor = Ort::Value::CreateTensor<float>(memory_info, results.data(), results.size(), outputShape.data(), outputShape.size());

    // copy image data to input array
    std::copy(imageVec.begin(), imageVec.end(), input.begin());

    // define names; inputName and outputName own the strings, so they must
    // stay alive until session.Run has returned (no release() needed)
    Ort::AllocatorWithDefaultOptions ort_alloc;
    Ort::AllocatedStringPtr inputName = session.GetInputNameAllocated(0, ort_alloc);
    Ort::AllocatedStringPtr outputName = session.GetOutputNameAllocated(0, ort_alloc);
    const std::array<const char*, 1> inputNames = { inputName.get() };
    const std::array<const char*, 1> outputNames = { outputName.get() };

    // record the start time
    auto start = std::chrono::high_resolution_clock::now();

    // run inference
    try {
        session.Run(runOptions, inputNames.data(), &inputTensor, 1, outputNames.data(), &outputTensor, 1);
    }
    catch (Ort::Exception& e) {
        std::cout << e.what() << std::endl;
        return 1;
    }

    // record the end time
    auto end = std::chrono::high_resolution_clock::now();
    // compute the elapsed time
    std::chrono::duration<double> duration = end - start;
    // print the execution time in seconds
    std::cout << "Execution time: " << duration.count() << " s" << std::endl;

    // sort results
    std::vector<std::pair<size_t, float>> indexValuePairs;
    for (size_t i = 0; i < results.size(); ++i) {
        indexValuePairs.emplace_back(i, results[i]);
    }
    std::sort(indexValuePairs.begin(), indexValuePairs.end(), [](const auto& lhs, const auto& rhs) { return lhs.second > rhs.second; });

    // show Top5
    for (size_t i = 0; i < 5; ++i) {
        const auto& result = indexValuePairs[i];
        std::cout << i + 1 << ": " << labels[result.first] << " " << result.second << std::endl;
    }
}
</code></pre>
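Note: the modelPath literal must be prefixed with L, because on Windows ONNX Runtime expects the model path as a wide-character string (const wchar_t*).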
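The Top5 loop in the C++ program prints the raw network outputs. Assuming the model ends in a plain linear layer without softmax (common for ResNet classifiers, but not confirmed by the source), the scores can be turned into probabilities before ranking. The following Python sketch shows the same ranking logic; `softmax` and `top_k` are hypothetical helper names, and `k` is clamped so that a model with fewer than five classes does not index out of range.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat sequence of raw scores."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=5):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(scores)
    k = min(k, len(probs))  # guard against models with fewer than k classes
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]
```

In the Python script, `top_k(output_tensor.tolist(), labels)` could be used once `labels` has been read from the label file.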
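Addendum: in actual tests, single-image inference took 20 ms on the CPU but 1 s with CUDA. According to a GitHub issue, the first inference that ONNX Runtime runs through CUDA is slow and later runs are markedly faster; the second inference took 2 ms. (https://github.com/microsoft/onnxruntime/issues/11581)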
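Since the first CUDA run is dominated by one-time setup, a fair measurement times it separately from the later runs. A minimal sketch of such a benchmark, assuming `run_once` is any zero-argument callable that performs one inference; `benchmark`, `extra_warmup` and `iters` are hypothetical names chosen for this example.

```python
import time


def benchmark(run_once, extra_warmup=0, iters=10):
    """Time the first call separately from the steady-state calls.

    Returns (first_call_seconds, mean_steady_seconds). The first call acts
    as a warm-up; `extra_warmup` further untimed calls may follow it.
    """
    start = time.perf_counter()
    run_once()
    first = time.perf_counter() - start

    for _ in range(extra_warmup):
        run_once()  # untimed warm-up calls

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    steady = (time.perf_counter() - start) / iters
    return first, steady
```

With the Python session shown earlier, `run_once` could be `lambda: session.run(None, {"input.1": input_data})`.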