GPGPU - CUDA context lifetime
In my application, I have a part of the code that works as follows:
main.cpp
int main()
{
    // First dimension is small (1-10)
    // Second dimension (100 - 1500)
    // Third dimension (10000 - 1000000)
    vector<vector<vector<double>>> someinfo;

    object someobject(...); // Host class

    for (int i = 0; i < n; i++)
        someobject.functiona(&(someinfo[i]));
}
object.cpp
void someobject::functionb(vector<vector<double>> *someinfo)
{
#define gpu 1
#if gpu == 1
    // GPU computing
    computeongpu(someinfo, aconstvalue, asecondconstvalue);
#else
    // CPU computing
#endif
}
object.cu
extern "c" void computeongpu(vector<vector<double>> *someinfo, int aconstvalue, int asecondconstvalue) { //copy values constant memory //allocate memory on gpu //copy data gpu global memory //launch kernel //copy data cpu //free memory }
As you can (I hope) see in the code, the function that prepares the GPU is called many times, depending on the value of the first dimension.
All the values sent to constant memory remain the same, and the sizes of the buffers allocated in global memory are the same (only the data changes).
With this as the actual workflow in my code, I'm not getting any speedup when using the GPU. I mean, the kernel does execute faster, but the memory transfers became the problem (as reported by nvprof).
So I'm wondering where in my app the CUDA context starts and finishes, to see if there is a way to do the copies to constant memory and the memory allocations only once.
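(For context: the kernel-versus-transfer breakdown mentioned above is what the profiler's summary mode reports, e.g. running something like nvprof ./myapp and looking at the [CUDA memcpy HtoD] / [CUDA memcpy DtoH] rows; myapp is a placeholder name here.)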
Normally, a CUDA context begins at the first CUDA call in your application and ends when the application terminates.

You should be able to do what you have in mind: perform the allocations once (at the beginning of the app) and the corresponding free operations once (at the end of the app), and populate __constant__ memory once, before it is used the first time.

It's not necessary to allocate and free the data structures in GPU memory repetitively if they are not changing in size.
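As a sketch of that restructuring (the gpuinit/gpucleanup split, the symbol names, and the flattened buffer are assumptions carried over from the illustrative version of computeongpu above, not a prescribed API):

#include <cuda_runtime.h>
#include <vector>
using std::vector;

__constant__ int c_first;          // assumed constant-memory symbols
__constant__ int c_second;

static double *d_data = nullptr;   // device buffer, reused across calls
static size_t d_capacity = 0;      // capacity in doubles

__global__ void scalekernel(double *data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= c_first;        // placeholder computation
}

// Call once at the beginning of the app:
// allocate the device buffer and populate __constant__ memory.
extern "C" void gpuinit(size_t maxdoubles, int aconstvalue, int asecondconstvalue)
{
    cudaMalloc(&d_data, maxdoubles * sizeof(double));
    d_capacity = maxdoubles;
    cudaMemcpyToSymbol(c_first, &aconstvalue, sizeof(int));
    cudaMemcpyToSymbol(c_second, &asecondconstvalue, sizeof(int));
}

// Called many times: only the transfers and the kernel launch remain per call.
void computeongpu(vector<double> &flat)
{
    size_t bytes = flat.size() * sizeof(double);   // flat.size() <= d_capacity
    cudaMemcpy(d_data, flat.data(), bytes, cudaMemcpyHostToDevice);
    scalekernel<<<(unsigned)((flat.size() + 255) / 256), 256>>>(d_data, flat.size());
    cudaMemcpy(flat.data(), d_data, bytes, cudaMemcpyDeviceToHost);
}

// Call once at the end of the app.
extern "C" void gpucleanup()
{
    cudaFree(d_data);
    d_data = nullptr;
}

With this split, the allocation, free, and constant-memory costs are paid once per run instead of once per call; only the per-call data transfers remain.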