Sending oclMat to function creates huge difference in runtime

I wrote a function (masking) with 3 inputs: 
 
 inputOCL - an oclMat 
 comparisonValue - a double value 
 method - an int variable determining the comparison method 
 
For my example I chose method = 1, which stands for CMP_GT and tests inputOCL > comparisonValue element-wise.
The purpose of the function is to zero out all elements of inputOCL that do not satisfy the given comparison.
Here is the function masking:
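#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ocl/ocl.hpp>

void masking(cv::ocl::oclMat inputOCL, double comparisonValue, int method) {
    // NOTE: method can be set to 0-->5, corresponding to (==, >, >=, <, <=, !=)

    // Matrix filled with the comparison value, same size and type as the input
    cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
    valueOCL.setTo(cv::Scalar(comparisonValue));
    // Element-wise comparison: 255 where the test passes, 0 elsewhere
    cv::ocl::oclMat logicalOCL;
    cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
    // Convert the mask to the input type, apply it, then undo the factor of 255
    logicalOCL.convertTo(logicalOCL, inputOCL.type());
    cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
    cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);
}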
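To make the intended result of masking concrete, here is a minimal CPU-only sketch in plain C++ with no OpenCV dependency. The names CompareMethod, passes, and maskingReference are hypothetical and only serve the illustration; the numeric codes follow OpenCV's CMP_EQ=0 through CMP_NE=5. It is a reference for what the output should look like, not a replacement for the OpenCL version, whose multiply-by-255 and divide-by-255 steps may not give bit-identical values.

```cpp
#include <cstddef>
#include <vector>

// Comparison codes in the order OpenCV defines them:
// CMP_EQ=0, CMP_GT=1, CMP_GE=2, CMP_LT=3, CMP_LE=4, CMP_NE=5.
enum CompareMethod { kEq = 0, kGt = 1, kGe = 2, kLt = 3, kLe = 4, kNe = 5 };

// Hypothetical helper: true when x satisfies the comparison against value.
inline bool passes(double x, double value, int method) {
    switch (method) {
        case kEq: return x == value;
        case kGt: return x > value;
        case kGe: return x >= value;
        case kLt: return x < value;
        case kLe: return x <= value;
        case kNe: return x != value;
        default:  return false;  // unknown code: treat as "does not pass"
    }
}

// Hypothetical CPU reference: zero every element that fails the comparison,
// leaving the elements that pass untouched.
inline void maskingReference(std::vector<double>& data, double value, int method) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (!passes(data[i], value, method)) {
            data[i] = 0.0;
        }
    }
}
```

Calling maskingReference on {1.0, 2.0, 3.0} with value 1.5 and method kGt leaves 2.0 and 3.0 in place and sets the first element to 0.0.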
 
When I time the following code, I find a very large difference in runtime between calling the function and running the same computation directly:
int main(int argc, char** argv) {
    double value1 = 1.23456789012345;
    double value2 = 1.23456789012344;

    // initialize matrix
    cv::Mat I(5000, 5000, CV_64F, cv::Scalar(value1));
    // copy input to GPU
    cv::ocl::oclMat inputOCL(I);
    int method = 1;
    static double start_TIMER;

    // computation done in function
    start_TIMER = cv::getTickCount();
    masking(inputOCL, value2, method);
    std::cout << "\nFunction runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";

    // direct computation
    start_TIMER = cv::getTickCount();
    cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
    valueOCL.setTo(cv::Scalar(value2));
    cv::ocl::oclMat logicalOCL;
    cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
    logicalOCL.convertTo(logicalOCL, inputOCL.type());
    cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
    cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);
    std::cout << "\nDirect runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";
}
 
The runtimes can be seen in this screenshot: 
 
Why is there such a large difference in runtimes?
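One thing worth ruling out when two timed sections run back to back is a one-time setup cost landing in whichever section happens to run first. The sketch below is a generic measurement pattern, not a confirmed explanation of the numbers in this question: it runs the workload once untimed and then averages several timed runs. It is plain C++ using std::chrono, and the name timeAfterWarmup and its parameters are hypothetical.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once without timing it, then return the
// mean wall-clock time in seconds over `repeats` timed runs.
inline double timeAfterWarmup(const std::function<void()>& work, int repeats) {
    work();  // warm-up run: any one-time setup cost is paid here, outside the timing

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    return elapsed.count() / repeats;
}
```

Here `work` could wrap either the call to masking or the inlined sequence of calls, so both variants are measured under the same conditions. If the two averages come out close, the original gap was probably an ordering effect rather than a cost of the function call itself.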

Source: https://stackoverflow.com/questions/30319096

Updated: 2024-04-22 10:04

Best answer

The simplest example of this is the ArrayBuffer example in the SDK (examples/api/var_array_buffer). 
The memory for the ArrayBuffer is owned by the pp::VarArrayBuffer, so as long as you have a reference to that (and you haven't called pp::VarArrayBuffer::Unmap) you don't have to make a copy of the memory. 
pp::Var variables are automatically reference counted, so you don't need to explicitly call AddRef.
