首页 \ 问答 \ 将oclMat发送到函数会在运行时产生巨大差异(sending oclMat to function creates huge difference in runtime)

将oclMat发送到函数会在运行时产生巨大差异(sending oclMat to function creates huge difference in runtime)

我用3个输入写了一个函数(masking):

  1. inputOCL - 一个oclMat
  2. comparisonValue - 双精度值
  3. method - 确定比较方法的int变量

对于我的例子,我选择了method = 1,它代表CMP_GT,测试inputOCL> comparisonValue是否元素。

该函数的目的是将inputOCL中不符合给定copma的所有元素清零。

这是功能屏蔽:

void masking(cv::ocl::oclMat inputOCL, double comparisonValue, int method){
// NOTE: method can be set to 1-->5 corresponding to (==, >, >=, <, <=, !=)

cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
valueOCL.setTo(cv::Scalar(comparisonValue));
cv::ocl::oclMat logicalOCL;
cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
logicalOCL.convertTo(logicalOCL, inputOCL.type());
cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);  
cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL); }

在对函数进行计时时,我发现运行函数或运行以下代码时直接运行计算时运行时间差异很大:

int main(int argc, char** argv){

double value1 = 1.23456789012345;
double value2 = 1.23456789012344;

// initialize matrix
cv::Mat I(5000, 5000, CV_64F, cv::Scalar(value1));
// copy input to GPU
cv::ocl::oclMat inputOCL(I);
int method = 1;
static double start_TIMER;

// computation done in function
start_TIMER = cv::getTickCount();
masking(inputOCL, value2, method);
std::cout << "\nFunction runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";

// direct computation
start_TIMER = cv::getTickCount();
cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
valueOCL.setTo(cv::Scalar(value2));
cv::ocl::oclMat logicalOCL;
cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
logicalOCL.convertTo(logicalOCL, inputOCL.type());
cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);
std::cout << "\nDirect runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";
}

运行时可以在此屏幕截图中看到:

在此处输入图像描述

为什么运行时有这么大的差异?


I wrote a function (masking) with 3 inputs:

  1. inputOCL - an oclMat
  2. comparisonValue - a double value
  3. method - an int variable determining the comparison method

For my example I chose method=1, which stands for CMP_GT, testing if inputOCL>comparisonValue element-wise.

The purpose of the function is to zero out all the elements in inputOCL that don't comply with the given copmarison.

Here is the function masking:

void masking(cv::ocl::oclMat inputOCL, double comparisonValue, int method){
// NOTE: method can be set to 1-->5 corresponding to (==, >, >=, <, <=, !=)

cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
valueOCL.setTo(cv::Scalar(comparisonValue));
cv::ocl::oclMat logicalOCL;
cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
logicalOCL.convertTo(logicalOCL, inputOCL.type());
cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);  
cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL); }

When timing the function I find a very large difference in runtime between running the function or running the computation directly when running the following code:

int main(int argc, char** argv){

double value1 = 1.23456789012345;
double value2 = 1.23456789012344;

// initialize matrix
cv::Mat I(5000, 5000, CV_64F, cv::Scalar(value1));
// copy input to GPU
cv::ocl::oclMat inputOCL(I);
int method = 1;
static double start_TIMER;

// computation done in function
start_TIMER = cv::getTickCount();
masking(inputOCL, value2, method);
std::cout << "\nFunction runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";

// direct computation
start_TIMER = cv::getTickCount();
cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
valueOCL.setTo(cv::Scalar(value2));
cv::ocl::oclMat logicalOCL;
cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
logicalOCL.convertTo(logicalOCL, inputOCL.type());
cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);
std::cout << "\nDirect runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";
}

The runtimes can be seen in this screenshot:

enter image description here

Why is there such a large difference in runtimes?


原文:https://stackoverflow.com/questions/30319096
更新时间:2024-04-22 10:04

最满意答案

最简单的例子是SDK中的ArrayBuffer示例(examples / api / var_array_buffer)。

ArrayBuffer的内存由pp :: VarArrayBuffer拥有,因此只要你有一个引用(并且你没有调用pp :: VarArrayBuffer :: Unmap ),你就不必复制记忆。

pp :: Var变量会自动引用计数,因此您无需显式调用AddRef


The simplest example of this is the ArrayBuffer example in the SDK (examples/api/var_array_buffer).

The memory for the ArrayBuffer is owned by the pp::VarArrayBuffer, so as long as you have a reference to that (and you haven't called pp::VarArrayBuffer::Unmap) you don't have to make a copy of the memory.

pp::Var variables are automatically reference counted, so you don't need to explicitly call AddRef.

相关问答

更多

相关文章

更多

最新问答

更多
  • 带有简单redis应用程序的Node.js抛出“未处理的错误”(Node.js with simple redis application throwing 'unhandled error')
  • 高考完可以去做些什么?注意什么?
  • Allauth不会保存其他字段(Allauth will not save additional fields)
  • Flask中的自定义中止映射/异常(Custom abort mapping/exceptions in Flask)
  • sed没有按预期工作,从字符串中间删除特殊字符(sed not working as expected, removing special character from middle of string)
  • 怎么在《我的世界》游戏里面编程
  • .NET可移植可执行文件VS .NET程序集(.NET Portable Executable File VS .NET Assembly)
  • 搜索字符串从视图中键入两个字段的“名字”和“姓氏”组合(Search Strings Typed from View for Two Fields 'First Name' and 'Last Name' Combined)
  • 我可以通过配置切换.Net缓存提供程序(Can I switch out .Net cache provider through configuration)
  • 在鼠标悬停或调整浏览器大小之前,内容不会加载(Content Does Not Load Until Mouse Hover or Resizing Browser)
  • 未捕获的TypeError:auth.get不是函数(Uncaught TypeError: auth.get is not a function)
  • 如何使用变量值创建参数类(How to create a parameter class with variant value)
  • 在std :: deque上并行化std :: replace(Parallelizing std::replace on std::deque)
  • 单元测试返回Connection对象的方法(Unit Test for a method that returns a Connection object)
  • rails:上传图片时ios中的服务器内部错误(rails: server internal error in ios while uploading image)
  • 如何在Android中构建应用程序警报[关闭](How build an application Alarm in Android [closed])
  • 以编程方式连接到Windows Mobile上的蓝牙耳机(Programmatically connect to bluetooth headsets on Windows Mobile)
  • 在两个不同的SharedPreference中编写并获得相同的结果(Writing in two different SharedPreference and getting the same result)
  • CSS修复容器和溢出元素(CSS Fix container and overflow elements)
  • 在'x','y','z'迭代上追加数组(Append array on 'x', 'y', 'z' iteration)
  • 我在哪里可以看到使用c ++源代码的UML方案示例[关闭](Where I can see examples of UML schemes with c++ source [closed])
  • SQL多个连接在与where子句相同的表上(SQL Multiple Joins on same table with where clause)
  • 位字段并集的大小,其成员数多于其大小(Size of bit-field union which has more members than its size)
  • 我安装了熊猫,但它不起作用(I installed pandas but it is not working)
  • Composer - 更改它在env中使用的PHP版本(Composer - Changing the version of PHP it uses in the env)
  • 使用JavaFX和Event获取鼠标位置(Getting a mouse position with JavaFX and Event)
  • 函数调用可以重新排序(Can function calls be reordered)
  • 关于“一对多”关系的NoSQL数据建模(NoSQL Data Modeling about “one to many” relationships)
  • 如何解释SBT错误消息(How to interpret SBT error messages)
  • 调试模式下的Sqlite编译器错误“初始化程序不是常量”(Sqlite compiler errors in Debug mode “initializer is not a constant”)