Learning to Engineer: Ed: OpenCL vs CUDA, Mid-2012 Edition

It has been a year to the day since I wrote about choosing between CUDA and OpenCL for GPU-accelerated applications. A lot has changed in the GPU compute industry since then, so I thought the topic was worth a revisit. This time, I'll make it more of a shootout and split up the discussion into some points worth considering in this debate, namely scalability, friendliness, and compatibility.

Scalability

If you need the power of a GPU cluster for either your research or improving your company's bottom line, then NVIDIA will try to be your new best friend. Over the past year, many of the features they have introduced to CUDA (but not to OpenCL) have been aimed at HPC applications. GPUDirect 2.0, introduced with CUDA 4 on Fermi, allows peer-to-peer communication between GPUs, and the next version will allow the GPU to talk with third-party devices via RDMA. OpenCL cannot get these features until it supports a unified address space. NVIDIA is also adding features like Hyper-Q, which increases GPU utilization by letting multiple processes (for example, MPI ranks in a cluster job) submit work to the same GPU at once. It's not clear if this feature will be available in OpenCL.

I am curious what AMD's plan of attack is to further OpenCL in this space. They've recently released a remarkable compute GPU, but haven't yet released a professional version with ECC support and perhaps stronger FP64 support (although the FP64 rate it already offers, 1/4 of its FP32 rate, is nice). How they plan to share Graphics Core Next with industry and research labs remains to be seen. Perhaps AFDS '12 in a couple of weeks will provide more insight.

Winner: CUDA

Friendliness

Developer friendliness matters a lot in something as tricky as writing parallel code for a coprocessor. Last year I noted that the OpenCL APIs looked as arcane as the forgotten CUDA Driver API. Thankfully, progress is being made. OpenCL 1.1 introduced C++ wrappers to help reduce code complexity and narrow the gap. AMD's extensions to OpenCL 1.2 allow C++ features like templates in kernel code too, which means that Thrust could now, in theory, be ported to OpenCL. If Thrust and other well-known GPU libraries can cross the divide, then OpenCL can make quite a comeback. The technology is there, and I'd like to think the mindshare is there too.
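To see why templates in kernel code matter for a library like Thrust, here is a minimal sketch of the pattern in plain host C++. It is neither OpenCL kernel code nor Thrust's actual API; the names `transform_each` and `Square` are made up for illustration. The point is only that one generic algorithm gets written once and the compiler instantiates it for each element type and operation, instead of someone hand-writing a kernel per type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor: squares a value of any arithmetic type T.
template <typename T>
struct Square {
    T operator()(T v) const { return v * v; }
};

// Hypothetical generic algorithm, loosely in the spirit of a
// Thrust-style transform: applies op to every element of `in`
// and stores the result in the matching slot of `out`.
// This runs serially on the CPU; a GPU library would launch a
// kernel instantiated from the same kind of template.
template <typename T, typename UnaryOp>
void transform_each(const std::vector<T>& in, std::vector<T>& out, UnaryOp op) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = op(in[i]);
    }
}
```

Calling `transform_each(values, results, Square<int>())` and `transform_each(values, results, Square<double>())` reuses the same source for both types, which is exactly what a C-only kernel language cannot express without macros or code generation.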

Winner: CUDA (for now)

Compatibility

This point used to be pretty straightforward. CUDA only ran on NVIDIA GPUs, and OpenCL had the promise to run everywhere. Then NVIDIA ported the CUDA compiler toolchain to LLVM, which would allow CUDA to be compiled through any of the many backends LLVM supports, such as x86 and ARM (or even AMD IL?). This work has yet to be open-sourced to the masses, though. Meanwhile, OpenCL can run not only on the latest NVIDIA and AMD GPUs, but also on the new Intel Ivy Bridge integrated graphics platform (IGP). These IGPs, along with AMD's accelerated processing units (APUs), are where the real volume is. It is not an exaggeration to say that nearly every new PC will have one of these OpenCL-capable IGPs or their successors in the near future. For a consumer application developer, this alone could clinch the debate for OpenCL.

Winner: OpenCL

But what about the alternatives?

Perhaps the most interesting part about this debate in 2012 is that CUDA and OpenCL are not the only two entrants. For instance, Microsoft announced at AFDS '11 a framework called C++ AMP for parallel programming. C++ AMP (my recap from last year) lets lightly extended C++ code run on all of the heterogeneous hardware your PC has available, and lets that code be written and debugged in the comfort of Visual Studio. Meanwhile, NVIDIA has been working with Cray and PGI to develop OpenACC, a platform that builds on the OpenMP paradigm: compiler directives parallelize your existing code instead of forcing a costly rewrite. These platforms are still young and don't necessarily take the place of CUDA or OpenCL, so we'll see where developers and the market take them. At this nascent stage of GPU computing, I think more choice is a good thing. Let's see how progress in the rest of 2012 and 2013 takes the industry even further.
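For a feel of what the directive approach looks like, here is a small sketch of a SAXPY loop annotated with an OpenACC pragma. The function name `saxpy` and the choice of data clauses are my own for illustration, and I am assuming an OpenACC-capable compiler if you want the loop offloaded. A compiler without OpenACC support simply ignores the pragma and runs the loop serially, which is the "no costly rewrite" argument in a nutshell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// SAXPY: y[i] = a * x[i] + y[i] for every element.
// The pragma is the only OpenACC-specific line. With an OpenACC
// compiler it asks for the loop to be parallelized on the accelerator;
// without one it is ignored and the loop runs serially on the CPU,
// producing the same result.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // copyin: x is only read on the device; copy: y is read and written back.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Compare this with CUDA or OpenCL, where the same loop would need a separate kernel, explicit buffer allocation, and explicit transfers. The tradeoff is that you hand more of the tuning decisions to the compiler.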

Thursday, May 31, 2012
