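Earlier posts covered installing Python 2.7 + Theano 0.7 + CUDA 7.5 on Win10 and Ubuntu 14.04. This post handles the last remaining system, the Mac. The process went fairly smoothly overall; a few small tricks were needed along the way, and each one is resolved below.
This post assumes that Xcode and the CUDA 7.5 SDK are already installed (the former from the App Store, the latter from the dmg downloaded from the Nvidia website).
1. Download and install pip
Download pip-8.1.2.tar.gz, extract it, and run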
sudo chmod a+x setup.py
sudo ./setup.py install
2. Install the latest versions of the common libraries
sudo pip install numpy scipy matplotlib scikit-learn scikit-image --upgrade
sudo pip install theano --upgrade
sudo easy_install --upgrade six
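To confirm that step 2 took effect, a quick check along the following lines can report which packages import and at what version. This is only a sketch: the helper name report_versions is hypothetical, and sklearn and skimage are the import names that scikit-learn and scikit-image provide.

```python
import importlib

# Import names for the packages installed in step 2
# (scikit-learn imports as sklearn, scikit-image as skimage).
PACKAGES = ("numpy", "scipy", "matplotlib", "sklearn", "skimage", "theano", "six")


def report_versions(names=PACKAGES):
    """Return {name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or any failure raised while importing
            versions[name] = None
            continue
        # Not every module exposes __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(report_versions().items()):
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED is worth reinstalling before moving on to pycuda.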
3. Install pycuda
Edit /etc/profile and add the following
# cuda
export CUDA_INC_DIR=/usr/local/cuda/include
export PATH=/usr/local/cuda/bin:$PATH
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
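After running source /etc/profile, it can be worth checking that the three variables are visible to Python. The sketch below assumes the exact paths used in this post; check_cuda_env is a hypothetical helper name, and it takes the environment as an argument so it can also be tried on a plain dictionary.

```python
import os

# Paths taken from the /etc/profile lines above.
CUDA_INC = "/usr/local/cuda/include"
CUDA_BIN = "/usr/local/cuda/bin"
CUDA_LIB = "/usr/local/cuda/lib"


def check_cuda_env(env):
    """Return a list of problems found in the mapping env (empty if none)."""
    problems = []
    if env.get("CUDA_INC_DIR") != CUDA_INC:
        problems.append("CUDA_INC_DIR is not %s" % CUDA_INC)
    if CUDA_BIN not in env.get("PATH", "").split(os.pathsep):
        problems.append("%s is missing from PATH" % CUDA_BIN)
    if CUDA_LIB not in env.get("DYLD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append("%s is missing from DYLD_LIBRARY_PATH" % CUDA_LIB)
    return problems


if __name__ == "__main__":
    # Check the environment of the current process.
    for problem in check_cuda_env(os.environ) or ["environment looks fine"]:
        print(problem)
```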
Download pycuda-2016.1.2.tar.gz, extract it, and run
sudo ln -s /usr/local/cuda/lib /usr/local/cuda/lib64
sudo ln -s /Developer/NVIDIA/CUDA-7.5/lib /Developer/NVIDIA/CUDA-7.5/lib64
source /etc/profile
./configure.py
sudo chmod a+x setup.py
sudo ./setup.py install
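The build fails when it reaches a very long C++ command (the final step that produces _driver.so). After some analysis, this command needs the following changes:
- Add these arguments right after c++:
-F/Library/Frameworks -Xlinker /usr/local/cuda/lib/libcuda.dylib
- Add this argument in front of "-lcuda":
-L/usr/local/cuda/lib
- Remove "-arch i386".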
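Since the failing command is too long to edit comfortably by hand, the three changes above can be sketched as a small script. This is an illustration under assumptions, not part of pycuda: patch_link_command is a hypothetical helper, and it expects the failing command copied from the build output as a single string.

```python
import shlex

# Arguments inserted right after the compiler name.
EXTRA_HEAD = ["-F/Library/Frameworks", "-Xlinker", "/usr/local/cuda/lib/libcuda.dylib"]
# Library search path that must appear before -lcuda.
CUDA_LIB_FLAG = "-L/usr/local/cuda/lib"


def patch_link_command(command):
    """Apply the three edits and return the patched argument list."""
    tokens = shlex.split(command)
    if not tokens:
        return []
    patched = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Drop the two-token pair "-arch i386".
        if tok == "-arch" and i + 1 < len(tokens) and tokens[i + 1] == "i386":
            i += 2
            continue
        # Put -L/usr/local/cuda/lib directly in front of -lcuda.
        if tok == "-lcuda" and (not patched or patched[-1] != CUDA_LIB_FLAG):
            patched.append(CUDA_LIB_FLAG)
        patched.append(tok)
        i += 1
    # Insert the framework and libcuda arguments after the compiler name.
    return [patched[0]] + EXTRA_HEAD + patched[1:]


if __name__ == "__main__":
    # Shortened, made-up command used only to demonstrate the edits.
    demo = "c++ -bundle -arch i386 -arch x86_64 a.o -lcuda -lcurand -o out.so"
    print(" ".join(patch_link_command(demo)))
```

Joining the returned list with spaces gives a command line that can be pasted back into the terminal, prefixed with sudo as in this post.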
Finally, run the patched command manually (it is one command and must be entered as a single line)
sudo c++ -F/Library/Frameworks -Xlinker /usr/local/cuda/lib/libcuda.dylib -bundle -undefined dynamic_lookup -arch x86_64 -Wl,-F. build/temp.macosx-10.11-x86_64-2.7/src/cpp/cuda.o build/temp.macosx-10.11-x86_64-2.7/src/cpp/bitlog.o build/temp.macosx-10.11-x86_64-2.7/src/wrapper/wrap_cudadrv.o build/temp.macosx-10.11-x86_64-2.7/src/wrapper/mempool.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/arg_to_python_base.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/builtin_converters.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/from_python.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/registry.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter/type_id.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/class.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/enum.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/function.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/function_doc_signature.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/inheritance.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/iterator.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/life_support.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/pickle_support.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object/stl_iterator.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/dict.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/errors.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/exec.o 
build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/import.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/list.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/long.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/module.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/numeric.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object_operators.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object_protocol.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/slice.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/str.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/tuple.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/wrapper.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/smart_ptr/src/sp_collector.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/smart_ptr/src/sp_debug_hooks.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/system/src/error_code.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/thread/src/pthread/once.o build/temp.macosx-10.11-x86_64-2.7/bpl-subset/bpl_subset/libs/thread/src/pthread/thread.o build/temp.macosx-10.11-x86_64-2.7/src/wrapper/wrap_curand.o -L/Developer/NVIDIA/CUDA-7.5/lib -L/Developer/NVIDIA/CUDA-7.5/lib64 -L/Developer/NVIDIA/CUDA-7.5/lib/stubs -L/Developer/NVIDIA/CUDA-7.5/lib64/stubs -L/Developer/NVIDIA/CUDA-7.5/lib -L/Developer/NVIDIA/CUDA-7.5/lib64 -L/Developer/NVIDIA/CUDA-7.5/lib/stubs -L/Developer/NVIDIA/CUDA-7.5/lib64/stubs -L/usr/local/cuda/lib -lcuda -lcurand -o build/lib.macosx-10.11-x86_64-2.7/pycuda/_driver.so -Xlinker -rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib -Xlinker -rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib64 -Xlinker -rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib/stubs -Xlinker 
-rpath -Xlinker /Developer/NVIDIA/CUDA-7.5/lib64/stubs
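Once it builds successfully, run
sudo ./setup.py install
again, and the installation succeeds. Next, set up the Theano configuration file
vim ~/.theanorc
and save the following content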
[global]
floatX = float32
device = gpu
[lib]
cnmem = 0.70
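Note that cnmem is the fraction of GPU memory reserved by CNMeM, the fast CUDA memory allocator; it is a floating-point number between 0 and 1. Setting it too high causes a "CNMEM OutOfMemory" error at run time, and in my setup leaving it unset made CUDA acceleration unavailable. The right value can be found by testing. cnmem is a new feature in Theano 0.8.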
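Because a bad cnmem value only shows up as an error at run time, a small check of the file can catch typos earlier. The sketch below reads the value with the standard-library config parser; read_cnmem and is_fraction are hypothetical helper names, and the check covers only the 0~1 range described above.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7


def read_cnmem(path):
    """Return lib.cnmem from a .theanorc-style file, or None if it is absent."""
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if not parser.has_option("lib", "cnmem"):
        return None
    return parser.getfloat("lib", "cnmem")


def is_fraction(value):
    """True when value lies in the 0~1 range described above (0 excluded)."""
    return value is not None and 0.0 < value <= 1.0


if __name__ == "__main__":
    value = read_cnmem(os.path.expanduser("~/.theanorc"))
    print("cnmem = %r, usable as a fraction: %s" % (value, is_fraction(value)))
```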
4. Test
Run the following test code
# coding=utf8
# settings
useGPU = True
import os
# Use an environment variable to make Theano run on the GPU or the CPU.
# It must be set before theano is imported.
if useGPU:
    os.environ["THEANO_FLAGS"] = "device=gpu"
else:
    os.environ["THEANO_FLAGS"] = "device=cpu"
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
# Time iters evaluations of exp(x)
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# A plain (CPU) Elemwise op in the compiled graph means the GPU was not used
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
The output is
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/tomheaven/PycharmProjects/testTheano/testTheano.py
Using gpu device 0: GeForce GTX 870M (CNMeM is enabled with initial size: 70.0% of memory, cuDNN not available)
[GpuElemwise{exp,no_inplace}(), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.232286 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
1.62323296]
Used the gpu
Process finished with exit code 0
This run time is somewhat shorter than on Win10 and Ubuntu 14.04, presumably thanks to improvements in Theano 0.8.
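When comparing runs across the three systems, it can be easier to look at the average time per call than at the loop total. The sketch below extracts it from the "Looping ..." line printed by the test script; seconds_per_call is a hypothetical helper, and the regular expression assumes exactly the message format used above.

```python
import re

# Matches the timing line printed by the test script.
LOOP_LINE = re.compile(r"Looping (\d+) times took ([0-9.]+) seconds")


def seconds_per_call(log_line):
    """Return the average time of one f() call from a 'Looping ...' line."""
    match = LOOP_LINE.search(log_line)
    if match is None:
        raise ValueError("not a timing line: %r" % log_line)
    iters = int(match.group(1))
    total = float(match.group(2))
    return total / iters


if __name__ == "__main__":
    line = "Looping 1000 times took 0.232286 seconds"
    print("%.1f microseconds per call" % (seconds_per_call(line) * 1e6))
```

For the run shown above this works out to roughly 232 microseconds per call.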