GPGPU - CUDA context lifetime


In my application, I have a part of the code that works as follows:

main.cpp

int main() {
    // first dimension: small (1-10)
    // second dimension: (100 - 1500)
    // third dimension: (10000 - 1000000)
    vector<vector<vector<double>>> someInfo;

    Object someObject(...); // host class

    for (int i = 0; i < n; i++)
        someObject.functionA(&(someInfo[i]));
}

object.cpp

void Object::functionB(vector<vector<double>> *someInfo) {
#define GPU 1
#if GPU == 1
    // GPU computing
    computeOnGPU(someInfo, aConstValue, aSecondConstValue);
#else
    // CPU computing
#endif
}

object.cu

extern "c" void computeongpu(vector<vector<double>> *someinfo, int aconstvalue, int asecondconstvalue) {    //copy values constant memory     //allocate memory on gpu            //copy data gpu global memory     //launch kernel     //copy data cpu     //free memory } 

So, as (I hope) you can see in the code, the function that prepares the GPU is called many times, depending on the value of the first dimension.

All the values sent to constant memory remain the same, and the sizes of the buffers allocated in global memory are the same too (only the data is changing).

With this as the actual workflow in my code, I'm not getting a speedup when using the GPU. I mean, the kernel does execute faster, but the memory transfers became the problem (as reported by nvprof).

So I'm wondering where in my app the CUDA context starts and finishes, to see if there is a way to do the copies to constant memory and the memory allocations only once.

Normally, a CUDA context begins with the first CUDA call in your application, and ends when the application terminates.
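If you want the context to exist at a point you control (rather than whenever the first real CUDA call happens), a common idiom is to issue a harmless runtime call at startup. The snippet below is a sketch of that idiom, not code from the question:

#include <cuda_runtime.h>

int main() {
    cudaSetDevice(0); // select the device; the first runtime call creates the context
    cudaFree(0);      // freeing a null pointer is a no-op, often used to force lazy initialization now

    // ... rest of the application; the context persists until exit ...
    return 0;
}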

You should be able to do what you have in mind: perform the allocations once (at the beginning of the app) and the corresponding free operations once (at the end of the app), and populate __constant__ memory once, before it is used the first time.

It's not necessary to allocate and free the data structures in GPU memory repetitively, if they are not changing in size.
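As a concrete illustration, here is a minimal sketch of that structure. It is an assumption on my part, not the poster's code: the nested vectors are flattened into one contiguous double buffer, and the names (gpuInit, gpuCleanup, d_data, maxElems) are hypothetical.

#include <cuda_runtime.h>

__constant__ int c_aConstValue;       // populated once, before the first kernel launch
__constant__ int c_aSecondConstValue;

static double *d_data = nullptr;      // device buffer, allocated once for the whole run
static size_t  d_capacity = 0;

// Call once at the beginning of the app: one allocation, one fill of constant memory.
extern "C" void gpuInit(size_t maxElems, int aConstValue, int aSecondConstValue)
{
    cudaMalloc(&d_data, maxElems * sizeof(double));
    d_capacity = maxElems;
    cudaMemcpyToSymbol(c_aConstValue, &aConstValue, sizeof(int));
    cudaMemcpyToSymbol(c_aSecondConstValue, &aSecondConstValue, sizeof(int));
}

__global__ void kernel(double *data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= c_aConstValue; // placeholder computation
}

// Called many times; only the changing data crosses the bus each iteration.
extern "C" void computeOnGPU(const double *hostIn, double *hostOut, size_t n)
{
    if (n > d_capacity) return; // buffer was sized once; caller must stay within it
    cudaMemcpy(d_data, hostIn, n * sizeof(double), cudaMemcpyHostToDevice);
    kernel<<<(unsigned)((n + 255) / 256), 256>>>(d_data, n);
    cudaMemcpy(hostOut, d_data, n * sizeof(double), cudaMemcpyDeviceToHost);
}

// Call once at the end of the app.
extern "C" void gpuCleanup(void)
{
    cudaFree(d_data);
}

With this split, the per-iteration cost is only the two cudaMemcpy calls and the kernel launch; whether that beats the CPU still depends on how large the transfers are relative to the computation.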

