Quick summary:
Blender crashes 100% of the time when using CUDA 5 on an rMBP, with any .blend file (including the default cube).
The GPU Compute device is not recognized at all with CUDA 4.2; it is recognized only with CUDA 5.0.x in Blender 2.65.
Reports "CUDA error: Launch failed in cuCtxSynchronize()" when viewport shading is set to "Rendered".
Crashes on F12 render.
Detail:
OS X 10.8.2
Retina MacBook Pro (2.7GHz Core i7)
16GB RAM
Blender 2.65
CUDA Driver 5.0.37
CUDA 5.0 Tools and SDK installed
CUDA deviceQuery return:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 650M"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 1024 MBytes (1073414144 bytes)
( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 900 MHz (0.90 GHz)
Memory Clock rate: 2508 Mhz
Memory Bus Width: 128-bit
... (and so on)
************
Blender file: simple cube (will happen with ANY .blend file)
************
If viewport shading is set to "Rendered" (with the GPU Compute device selected), I get:
CUDA error: Launch failed in cuCtxSynchronize()
If I press F12 to render, I get a crash:
Process: blender [839]
Path: /Applications/Blender/blender.app/Contents/MacOS/blender
Identifier: org.blenderfoundation.blender
Version: 2.65 (2.65, 2012-Dec-10, Blender Foundation)
Code Type: X86-64 (Native)
Parent Process: launchd [159]
User ID: 501
Date/Time: 2012-12-19 11:02:57.730 -0800
OS Version: Mac OS X 10.8.2 (12C3006)
Report Version: 10
Interval Since Last Report: 537515 sec
Crashes Since Last Report: 46
Per-App Interval Since Last Report: 293780 sec
Per-App Crashes Since Last Report: 4
Anonymous UUID: 686F38A4-15BC-F990-B28E-3BB77EC37403
Crashed Thread: 24
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: EXC_I386_GPFLT
...
Thread 24 Crashed:
0 libcuda_304.10.20.dylib 0x00000001151e9197 cuGraphicsGLRegisterImage + 1534103
1 libcuda_304.10.20.dylib 0x0000000115076ec0 cuGraphicsGLRegisterImage + 17856
2 libcuda_304.10.20.dylib 0x0000000115062b9a cuFuncSetCacheConfig + 106
3 org.blenderfoundation.blender 0x0000000100ae8e7f ccl::CUDADevice::path_trace(ccl::RenderTile&, int) + 1439
4 org.blenderfoundation.blender 0x0000000100ae9e49 ccl::CUDADevice::thread_run(ccl::DeviceTask*) + 233
5 org.blenderfoundation.blender 0x0000000100000800 0x100000000 + 2048
Event Timeline
Only CUDA toolkit 4.2 is supported; 5.0 will indeed not work. So the problem that needs to be solved is why it doesn't work with 4.2. (The CUDA driver can be any version.)
http://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.65/Cycles#CUDA
The CUDA compiler is expected to be installed in:
/usr/local/cuda/bin/nvcc
I've not seen the installer put it elsewhere by default, but maybe that changed? If it is installed elsewhere, you can set the environment variable CUDA_BIN_PATH e.g. like this:
export CUDA_BIN_PATH=/usr/local/cuda-4.2/bin
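To double-check where Blender will look, a quick sanity check in the terminal should show the 4.2 compiler (just a suggestion, not something Blender itself runs):
echo $CUDA_BIN_PATH
ls -l $CUDA_BIN_PATH/nvcc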
Brecht,
Thanks so very much for your quick reply (and for all your work on Blender!). I really appreciate it.
Can you briefly walk me through what is needed for Blender to recognize that CUDA is properly installed? In brief: despite pointing Blender at 4.2 (correctly, I believe), it doesn't think CUDA is available. Recall that it was detecting 5.0 properly, but for some reason not 4.2. When I run 2.65 (or 2.64), the CUDA option is not available in the System tab.
I now have /usr/local/cuda pointing to the 4.2 installation. This means that /usr/local/cuda/bin/nvcc should now work, as should all the .dylib files in /usr/local/cuda/lib. The driver is installed and working (though it is the 5.0 driver; you said this didn't matter, and the 4.2 driver makes no difference anyway, so I left the 5.0 driver in place).
My assumption is that you need just the compiler, nvcc, and perhaps the dylib files. Is this correct?
I'm a new Blender user, having only used it for the past six weeks or so, and I only tried to get GPU rendering working recently. As such, my CUDA installation is very new, and NVIDIA appears to have changed how things install with CUDA 5.0+. Most people probably have an earlier 4.2 install and aren't seeing issues.
I'd be quite happy to write up any findings and help new OS X users get going with GPU acceleration in Blender once I get this figured out!
Thanks again for your help.
I also forgot to note that I did try the environment variable, just to make sure Blender was looking in /usr/local/cuda/bin for nvcc, but it made no difference.
Hi Jon,
I would like to remind you of the bug reporting guidelines... for build issues we have much better support channels available... check "Get Involved" on blender.org?
We have a great OS X platform maintainer (Jens Verwiebe), who can also be of assistance. Getting good CUDA and OpenCL support on OS X is definitely a current topic.
I leave it to Brecht to give a final word on this report :)
There might be a bug here, or at least something we should add to our documentation.
I don't understand yet what's going wrong here, so to be clear, running this command in the terminal works and gives version 4.2?
/usr/local/cuda/bin/nvcc --version
Assuming that works, is your GPU listed and selected under User Preferences > System?
Assuming that's there, is there an option to select GPU as device in the Render properties?
Assuming that's there, what is the error message?
Ton, thanks for the pointer, and perhaps I should have started there. I reported the bug because it crashed with CUDA 5.0 and I hadn't seen the note, somewhat buried in the release notes, that OS X has to use CUDA 4.2. Elsewhere the docs suggest that 5 should work, though perhaps not as fast.
Still, I think there's a bug, even with the 4.2 support.
I am a software developer, so I definitely understand spurious bugs! (If you knew what I did for work, you'd probably find it humorous that I'm using CUDA at all... I'd really prefer to be helping with the OpenCL support, but unfortunately my company's open source policies currently prevent me from contributing code. I look forward to the day when I can.)
Brecht: I appreciate your indulgence here. This does appear to be a bug, or at least something that requires manual configuration (and hence documentation).
jonthomason@jont-rmbp ~> /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Sat_Apr__7_14:56:41_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221
That is the output from nvcc at the expected path. It looks perfectly proper to me.
HOWEVER, my GPU is NOT listed under User Preferences > System.
Please note that when /usr/local/cuda/bin points to the nvcc from CUDA 5.0, it *is* listed (but crashes during render, as you probably expect).
What I'm trying to figure out is what else might be misconfigured and/or what any detection bug might be in Blender.
The NVIDIA GT 650M is a CUDA-capable part on OS X and reports CUDA capability 3.0.
If you think this is not a bug, I will pursue any possible configuration problems via the other support channels. The last thing I want to do is waste your time, keeping you from your good work on the project.
I also hope to contribute to the OpenCL side of things someday. Perhaps needless to say, OpenCL would be a lot more stable and easier to configure on OS X, but as you say elsewhere, the LLVM compiler used there is not nearly as mature.
Thanks for testing. I currently don't understand how this can be failing. There are two conditions for the device to show in the user preferences: one is that the CUDA driver has to report the device as being available, and the other is the existence of /usr/local/cuda/bin/nvcc. Blender doesn't try to run nvcc or do any other test with it; the file just has to exist there.
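In shell terms, the nvcc half of that check amounts to nothing more than a file-existence test, roughly like this (a sketch of the logic, not the actual code):
test -e /usr/local/cuda/bin/nvcc && echo "nvcc present" || echo "nvcc missing"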
That this check somehow fails is very strange if CUDA toolkit 5.0 and 4.2 are basically just installed in the same directory and only one of them works. As far as I know the CUDA toolkit has no influence on the CUDA driver and vice versa, i.e. you could install the toolkit when there is no GPU or an AMD GPU, and you could use a precompiled binary even if there is no toolkit.
I've seen the CUDA driver fail to detect the GPU after putting a MacBook to sleep; restarting fixes that here. Another thing that could happen is that the system is running on the low-power integrated GPU; setting graphics to High Performance in the Energy Saver settings can fix that. But neither of these is related to the CUDA toolkit.
Thanks Brecht.
With newer installs of the CUDA Toolkit on OS X, NVIDIA puts the toolkits in the following trees:
/Developer/NVIDIA/CUDA-4.2/
/Developer/NVIDIA/CUDA-5.0/
/usr/local/cuda is just a folder of symlinks into either of the two folders in /Developer.
1. Interestingly, if I move the symlink to CUDA-5.0, it detects CUDA but crashes.
2. If I move the symlink to CUDA-4.2, it doesn't detect.
3. If I leave the symlink pointing to CUDA-5.0 but ONLY change the /usr/local/cuda/bin symlink to 4.2... it detects... AND WORKS. To be super specific, let me show you the configuration that works:
bash-3.2$ cd /usr/local/cuda
bash-3.2$ ls -l
total 88
lrwxr-xr-x 1 root wheel 30 Dec 20 12:15 bin -> /Developer/NVIDIA/CUDA-4.2/bin
lrwxr-xr-x 1 root wheel 30 Dec 19 09:56 doc -> /Developer/NVIDIA/CUDA-5.0/doc
lrwxr-xr-x 1 root wheel 33 Dec 19 09:56 extras -> /Developer/NVIDIA/CUDA-5.0/extras
lrwxr-xr-x 1 root wheel 34 Dec 19 09:56 include -> /Developer/NVIDIA/CUDA-5.0/include
drwxr-xr-x 14 root wheel 476 Dec 19 10:03 lib
lrwxr-xr-x 1 root wheel 36 Dec 19 09:56 libnsight -> /Developer/NVIDIA/CUDA-5.0/libnsight
lrwxr-xr-x 1 root wheel 34 Dec 19 09:56 libnvvp -> /Developer/NVIDIA/CUDA-5.0/libnvvp
lrwxr-xr-x 1 root wheel 31 Dec 19 09:56 nvvm -> /Developer/NVIDIA/CUDA-5.0/nvvm
lrwxr-xr-x 1 root wheel 33 Dec 19 09:56 open64 -> /Developer/NVIDIA/CUDA-5.0/open64
lrwxr-xr-x 1 root wheel 34 Dec 19 09:56 samples -> /Developer/NVIDIA/CUDA-5.0/samples
lrwxr-xr-x 1 root wheel 30 Dec 19 09:56 src -> /Developer/NVIDIA/CUDA-5.0/src
lrwxr-xr-x 1 root wheel 32 Dec 19 09:56 tools -> /Developer/NVIDIA/CUDA-5.0/tools
Note how only the bin symlink is changed to the 4.2 folder.
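(For reference, repointing just that one symlink can be done with something like the command below; the -h flag tells BSD ln on OS X to replace the symlink itself rather than following it:)
sudo ln -sfh /Developer/NVIDIA/CUDA-4.2/bin /usr/local/cuda/bin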
When it does work, though, it's super slow: 50% slower than the CPU on this hardware (10 s on CPU, 15 s on GPU, at 64 samples).
So something other than nvcc is required for detection, and whatever it is, I only got it to work from the CUDA 5.0 installation. I could investigate further, but it might be pointless if the GPU is that much slower. The GPU is around 5x faster than the CPU on my Core i7 PC with an NVIDIA GTX 670 card; it seems odd that it would be 50% slower with the notebook GT 650M, as that's still a pretty fast GPU.
To repeat the key point: something in CUDA detection requires something outside the /usr/local/cuda/bin folder. I haven't figured out what it is yet, and whatever it is, only the 5.0 version works. I believe other people will run into this too if they install the newest CUDA toolkit from NVIDIA. Note that I couldn't get it to work even with the archived 4.2 toolkit installer.
Thanks for your attention to this. Obviously you can close the bug if you like, but I do think it bears investigating why something other than nvcc matters here.
Regarding performance: when comparing things like cores × MHz or texture fill rate, the GTX 670 has about 4x more than the GT 650M. Of course these measures are not an exact indication of performance, but they show that there is a big gap and that a 5x difference isn't unexpected.
I'll look closer into the paths issue later, but indeed, notebook GPUs are often slower than the CPU.
Thanks Brecht.
And yes, I understand the performance difference between this GPU and my desktop GPU pretty well, but I still thought the GT 650M should be ~2x as fast as the CPU, so I'm quite surprised to see it 50% slower. Performance should depend mostly on the number of GPU ALUs times clock speed. The GTX 670 has 1344 shader ALUs at 915 MHz; the GT 650M has 384 shader cores at 900 MHz. That is a factor of about 3.5x (as you said). Since the GTX 670 is around 4-5x faster than the fast Core i7 in my desktop machine, the GT 650M, despite being ~3.5x slower than the GTX 670, should still be faster than the slower Core i7 in my notebook, somewhere around 2x its speed. Yet it's 50% slower instead. This is a bigger difference than I can easily explain away.
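(Spelling out that factor: 1344 × 915 MHz versus 384 × 900 MHz gives 1,229,760 / 345,600 ≈ 3.56, i.e. the ~3.5x above.)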
But I'll stop wasting your time on this. I do appreciate your help.
I did some testing now, and it seems that installing CUDA toolkit 4.2 can remove a file from the CUDA driver (/usr/local/cuda/lib/libcuda.dylib). Reinstalling the driver fixed that for me; I've added a note about it to the release notes. Further, I've enabled CUDA binaries for the OS X 10.6 buildbot builds, which means you don't even need the toolkit installed when using those:
http://builder.blender.org/download/
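To check whether that file got removed on your system, something like this should suffice (my suggestion; if the file is missing, reinstalling the CUDA driver restores it):
ls -l /usr/local/cuda/lib/libcuda.dylib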
Also, for raytracing, speed is not easily comparable in terms of MHz; it's often memory bound. But note that the 670 actually has a processor clock of 980 MHz; for the 650M I can't find that number, as it's not even listed here. The graphics clock is irrelevant for Cycles as far as I know.
http://www.geforce.com/hardware/desktop-gpus/geforce-gt-650m/specifications