Some CUDA questions and answers

From BOINC Wiki

While reporting a large number of erroring SETI tasks to the Nvidia developer, I also asked him some other useful questions.

Is there an easy way for people to check for memory leaks when using the SETI CUDA application? The application primarily uses video memory for its data. There is no straightforward method to check for leaks, but it can be done by hand-instrumenting the source code, for example with a global list of allocations. I'm not sure how VRAM is freed if the application hangs or hits an exception. That is something we have to look into.
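The "global list" idea mentioned above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not the SETI application's code: in the real application the wrappers would sit around the CUDA allocation and free calls in the C source, and every name here (`AllocationTracker`, `alloc`, `free`, `leaks`, the buffer tags) is hypothetical.

```python
import itertools


class AllocationTracker:
    """Records every allocation so that anything never freed can be listed."""

    def __init__(self):
        self._live = {}                    # handle -> (tag, size_bytes)
        self._handles = itertools.count(1)

    def alloc(self, size_bytes, tag):
        # Stand-in for a wrapped device allocation (assumption: the real
        # wrapper would call the CUDA allocator and record the pointer).
        handle = next(self._handles)
        self._live[handle] = (tag, size_bytes)
        return handle

    def free(self, handle):
        # Stand-in for a wrapped device free; also catches double frees.
        if handle not in self._live:
            raise ValueError(f"unknown or already freed handle: {handle}")
        del self._live[handle]

    def leaks(self):
        # Everything still in the registry was allocated but never freed.
        return list(self._live.values())

    def leaked_bytes(self):
        return sum(size for _tag, size in self._live.values())


# The single module-level instance plays the role of the "global list".
TRACKER = AllocationTracker()
```

Calling `TRACKER.leaks()` just before the application exits would list whatever is still outstanding, which is the manual check the answer describes.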

Weird one maybe, but what's the preferred screen resolution to use with the CUDA application? It doesn't really matter as long as there is enough video memory available. Even on a 256MB GPU card, a 1920x1200x32bpp desktop would consume only about 10MB of VRAM, and any extra off-screen GDI memory is likely to be no more than another 10MB. That still leaves SETI@home CUDA enough to do its thing, provided no other application is running and consuming more VRAM. Just realize that on a single primary GPU, CUDA and GDI (the desktop) have to share the same device, so running Photoshop or playing a YouTube video will rob CUDA of some crunching performance and, vice versa, running SETI@home will rob performance from your desktop application. It's best to just run with a blank screen saver (or the BOINC screen saver).

What about the colour depth? Would that make a difference? 8bit, 16bit, (24bit), 32bit? If available VRAM is a problem on a GPU configured with 256MB that is also being used as the primary display, changing to a lower bpp value (16 or 8) will help reduce GDI's use of the VRAM heap so that the CUDA application can run.
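The resolution and colour depth figures in the two answers above can be checked with simple arithmetic. The sketch below computes the size of a single full-screen surface only. It assumes one buffer and no padding, so actual driver usage (extra buffers, off-screen GDI surfaces) will be higher. The function name `framebuffer_bytes` is made up for this example.

```python
MIB = 1024 * 1024


def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for one unpadded full-screen surface."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    # Compare the depths mentioned in the question at 1920x1200.
    for bpp in (32, 16, 8):
        size = framebuffer_bytes(1920, 1200, bpp)
        print(f"{bpp:2d} bpp: {size} bytes ({size / MIB:.1f} MiB)")
```

At 1920x1200 this gives 9,216,000 bytes (about 8.8 MiB) for 32bpp, which matches the "about 10MB" quoted above, and half or a quarter of that for 16bpp and 8bpp.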

Can the GPU be benchmarked and if yes, which application would be preferred? Ideally, another CUDA application or a test that uses the cuFFT library would be the best way to test GPU compute, rather than a third-party graphics application or game. There are plenty of sample applications in the publicly available CUDA SDK that may suffice as a way to benchmark any given GPU.

Does the SETI application put the GPU under continuous load, or does it do it in bursts? The CUDA kernels sent to the GPU put it under a heavy, continuous load. The work does come in bursts, but the period between bursts is probably insignificant. We scale the work to the capabilities of the GPU, which means we try to keep it saturated with computing tasks.

SETI-specific: Is it known for certain that Very Low Angle Range (VLAR) or Very High Angle Range (VHAR) tasks will always error? The problem stems from the CUDA pulse detection code path on the GPU taking far too long to complete on some VLARs. This can cause the GPU to time out as far as the OS and driver are concerned. The driver instability may be a result of those long execution times. We are investigating the problem and will try to fix it as soon as possible.

Can the GPU be throttled in another way? (BOINC uses a pause system to throttle CPU calculations, if set by the user. It then pauses all of BOINC for the duration of a second or more. I'm wondering what effect that has on the GPU's lifespan.) In other words, can the GPU be set to use only half its capacity (comparable to limiting CPU cycles to 50%) or not? I don't know of any supervisory way to limit a CUDA application to a percentage of the GPU's throughput. You could always throttle the CPU thread that feeds the GPU to effectively limit the GPU.

Ah, but the problem here is that it uses so little CPU already. What does it take, 3 to 4% of the CPU? And then it only uses the CPU when data is transferred from the GPU's memory to disk and from the disk to the GPU's memory. The rest is done solely by the GPU. The CPU usage is fairly small in SETI, but that's only because we sleep in the driver waiting for the GPU to complete its current task. Programmatically pausing the CPU thread that feeds the CUDA kernel functions will effectively reduce the GPU usage rate, because you're starving the GPU of data to crunch. The downside is that it will slow the application down.
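The feeder-thread throttling described above can be sketched as a simple duty cycle: time each blocking GPU call, then idle long enough that the busy share matches the target. This is a rough illustration under stated assumptions, not something BOINC or the SETI application is known to implement. `run_on_gpu` is a hypothetical blocking call that launches a kernel and waits for it, and `idle_time_for` and `feed_throttled` are names made up for this example.

```python
import time


def idle_time_for(busy_seconds, gpu_fraction):
    """Idle time so that busy / (busy + idle) equals gpu_fraction."""
    if not 0.0 < gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in (0, 1]")
    return busy_seconds * (1.0 - gpu_fraction) / gpu_fraction


def feed_throttled(work_items, run_on_gpu, gpu_fraction, sleep=time.sleep):
    """Feed items one at a time, pausing after each to hit the target share."""
    for item in work_items:
        start = time.perf_counter()
        run_on_gpu(item)  # hypothetical: launch the kernel and wait for it
        busy = time.perf_counter() - start
        # Starve the GPU for a while so it is busy only part of the time.
        sleep(idle_time_for(busy, gpu_fraction))
```

With `gpu_fraction=0.5` the thread idles for as long as each call took, so the GPU is busy roughly half the time and the work takes about twice as long, which is the downside noted above.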

Note, however, that the values set by the driver in combination with the VBIOS should already monitor temperatures and regulate the fan and GPU clocks accordingly. This may not work on a deliberately overclocked GPU.

Original writer: Jorden
Original FAQ: 478
Date: 07-01-2009