Installing Python 2.7 + Theano 0.8 + CUDA 7.5 on Mac OS X 10.11

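Earlier posts covered installing Python 2.7 + Theano 0.7 + CUDA 7.5 on Win10 and Ubuntu 14.04. This post covers the last remaining system, the Mac. The process went fairly smoothly overall; a few small tricks were needed along the way, and each one was resolved.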

This post assumes you have already installed Xcode and the CUDA 7.5 SDK (the former from the App Store, the latter from the dmg on Nvidia's website).

1. Download and install pip

Download pip-8.1.2.tar.gz, unpack it, and run the following inside the unpacked directory

sudo chmod a+x setup.py
sudo ./setup.py install

2. Install the latest versions of the common libraries

sudo pip install numpy scipy matplotlib scikit-learn scikit-image --upgrade
sudo pip install theano  --upgrade
sudo easy_install --upgrade six
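
Before moving on, it can help to confirm that the libraries really import from the Python that will run Theano. The following is a minimal sketch, not part of the original procedure; the helper name installed_versions is hypothetical, and it assumes each package exposes a __version__ attribute (it reports "unknown" otherwise).

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is missing from this interpreter's site-packages.
            versions[name] = None
            continue
        # Not every package defines __version__, so fall back to a placeholder.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Note that import names differ from pip names for two of the packages: scikit-learn imports as sklearn and scikit-image as skimage. Calling installed_versions(["numpy", "scipy", "matplotlib", "sklearn", "skimage", "theano", "six"]) and printing the result shows at a glance which installs need another look.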

3. Install pycuda

Edit /etc/profile and add the following

# cuda
export CUDA_INC_DIR=/usr/local/cuda/include
export PATH=/usr/local/cuda/bin:$PATH
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
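
Once /etc/profile has been sourced, the three variables can be checked from Python before starting the pycuda build. This is a rough sketch under the assumption that the paths are exactly the ones exported above; check_cuda_env is a hypothetical helper, and the path_exists parameter exists only so the check can be tried without a real CUDA install.

```python
import os


def check_cuda_env(env, path_exists=os.path.exists):
    """Return a list of problems found in the CUDA-related variables of env."""
    problems = []

    # CUDA_INC_DIR must be set and must point at an existing directory.
    inc_dir = env.get("CUDA_INC_DIR")
    if not inc_dir:
        problems.append("CUDA_INC_DIR is not set")
    elif not path_exists(inc_dir):
        problems.append("CUDA_INC_DIR points to a missing directory: " + inc_dir)

    # PATH and DYLD_LIBRARY_PATH must each contain the CUDA entry added in /etc/profile.
    for var, wanted in (("PATH", "/usr/local/cuda/bin"),
                        ("DYLD_LIBRARY_PATH", "/usr/local/cuda/lib")):
        entries = env.get(var, "").split(os.pathsep)
        if wanted not in entries:
            problems.append("%s does not contain %s" % (var, wanted))
    return problems
```

Running check_cuda_env(os.environ) in a fresh terminal should return an empty list; anything else points at the variable that did not take effect.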

Download pycuda-2016.1.2.tar.gz, unpack it, and run

sudo ln -s /usr/local/cuda/lib /usr/local/cuda/lib64
sudo ln -s /Developer/NVIDIA/CUDA-7.5/lib /Developer/NVIDIA/CUDA-7.5/lib64 
source /etc/profile
./configure.py
sudo chmod a+x setup.py
sudo ./setup.py install

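The build fails at a very long C++ command (the final link step). After some analysis, this command needs the following changes:

  • Add these arguments right after the c++ command:
-F/Library/Frameworks -Xlinker /usr/local/cuda/lib/libcuda.dylib
  • Add this argument in front of "-lcuda":
-L/usr/local/cuda/lib 
  • Remove "-arch i386".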
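
The three changes above can also be applied mechanically instead of by hand. The sketch below is only an illustration: patch_link_command is a hypothetical helper, and it assumes the failing command has already been split into an argument list (for example with shlex.split on the line copied from the build log) whose first element is the compiler.

```python
def patch_link_command(args):
    """Return a copy of the link command with the three fixes applied."""
    # Fix 1: extra arguments go directly after the compiler name (args[0]).
    extra_head = ["-F/Library/Frameworks",
                  "-Xlinker", "/usr/local/cuda/lib/libcuda.dylib"]
    patched = [args[0]] + extra_head

    i = 1
    while i < len(args):
        arg = args[i]
        # Fix 3: drop "-arch i386" (the flag and its value), keep other -arch flags.
        if arg == "-arch" and i + 1 < len(args) and args[i + 1] == "i386":
            i += 2
            continue
        # Fix 2: add the CUDA library directory in front of every "-lcuda".
        if arg == "-lcuda":
            patched.append("-L/usr/local/cuda/lib")
        patched.append(arg)
        i += 1
    return patched
```

Joining the returned list with spaces gives a command of the same shape as the one run manually in this post.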

Finally, run the corrected command by hand

sudo c++ -F/Library/Frameworks -Xlinker /usr/local/cuda/lib/libcuda.dylib -bundle -undefined dynamic_lookup  -arch x86_64 -Wl,-F. build/temp.macosx-10.11-x86_64-2.7/src/cpp/cuda.o build/temp.macosx-10.11-x86_64-2.7/src/cpp/bitlog.o build/temp.macosx-10.11-x86_64-2.7/src/wrapper/wrap_cudadrv.o build/temp.macosx-10.11-x86_64-2.7/src/wrapper/mempool.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/arg_to_python_base.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/builtin_converters.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/from_python.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/registry.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/type_id.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/class.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/enum.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/function.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/function_doc_signature.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/inheritance.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/iterator.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/life_support.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/pickle_support.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/stl_iterator.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/dict.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/errors.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/exec.o \
build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/import.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/list.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/long.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/module.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/numeric.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object_operators.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object_protocol.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/slice.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/str.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/tuple.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/wrapper.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/smart_ptr/src/sp_collector.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/smart_ptr/src/sp_debug_hooks.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/system/src/error_code.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/thread/src/pthread/once.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/thread/src/pthread/thread.o build/temp.macosx-10.11-x86_64-2.7/src/wrapper/wrap_curand.o -L/Developer/NVIDIA/CUDA-7.5/lib -L/Developer/NVIDIA/CUDA-7.5/lib64 -L/Developer/NVIDIA/CUDA-7.5/lib/stubs -L/Developer/NVIDIA/CUDA-7.5/lib64/stubs -L/Developer/NVIDIA/CUDA-7.5/lib -L/Developer/NVIDIA/CUDA-7.5/lib64 -L/Developer/NVIDIA/CUDA-7.5/lib/stubs -L/Developer/NVIDIA/CUDA-7.5/lib64/stubs -L/usr/local/cuda/lib -lcuda -lcurand -o build/lib.macosx-10.11-x86_64-2.7/pycuda/_driver.so -Xlinker -rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib -Xlinker -rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib64 -Xlinker -rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib/stubs -Xlinker \
-rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib64/stubs

Once this succeeds, run the install again

sudo ./setup.py install

The install now succeeds. Next, set up the Theano configuration file

vim ~/.theanorc

and save the following content

[global]
floatX = float32
device = gpu
[lib]
cnmem = 0.70

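Note: the cnmem parameter is the fraction of GPU memory reserved by CNMeM, the fast CUDA memory allocator; it is a float between 0 and 1. Setting it too high produces a "CNMEM OutOfMemory" error at run time, and in my setup CUDA acceleration was unavailable when it was left unset. The right value is best found by testing. cnmem is a new feature in Theano 0.8.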
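
Since the right cnmem value has to be found by trial and error, it can be convenient to regenerate the file for each candidate value. The following sketch is an assumption-laden convenience, not something Theano provides: render_theanorc is a hypothetical helper that produces the same layout as the file above and rejects values outside the 0 to 1 range described in this post.

```python
def render_theanorc(cnmem, device="gpu", floatX="float32"):
    """Build the text of a .theanorc with the given cnmem fraction."""
    # This post treats cnmem as a fraction of GPU memory, so enforce (0, 1].
    if not 0.0 < cnmem <= 1.0:
        raise ValueError("cnmem must be in (0, 1], got %r" % (cnmem,))
    lines = [
        "[global]",
        "floatX = %s" % floatX,
        "device = %s" % device,
        "[lib]",
        "cnmem = %.2f" % cnmem,
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of render_theanorc(0.70) to ~/.theanorc reproduces the configuration shown above; lowering the value step by step is one way to get past a "CNMEM OutOfMemory" error.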

4. Test

Run the following test code

# coding=utf8
# settings

useGPU = True
import os
# Use an environment variable to make theano run on the GPU or the CPU
if useGPU:
    os.environ["THEANO_FLAGS"] = "device=gpu"
else:
    os.environ["THEANO_FLAGS"] = "device=cpu"
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())


t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
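# "node" avoids shadowing the shared variable x (Python 2 list comprehensions leak their loop variable)
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):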
    print('Used the cpu')
else:
    print('Used the gpu')

The output is

/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/tomheaven/PycharmProjects/testTheano/testTheano.py
Using gpu device 0: GeForce GTX 870M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN not available)
[GpuElemwise{exp,no_inplace}(), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.232286 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu

Process finished with exit code 0

This run time is somewhat shorter than on both Win10 and Ubuntu 14.04, presumably thanks to improvements in Theano 0.8.