Getting started with theano


Remarks

Theano is a Python library for defining and evaluating symbolic expressions over tensor variables. It has various applications, but the most popular one is deep learning.

Installation or Setup

Detailed instructions on setting up or installing theano.

Installing Theano and configuring the GPU on Ubuntu 14.04

You can use the following instructions to install Theano and configure the GPU (this assumes a freshly installed Ubuntu 14.04):

# Install Theano
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano

# Install Nvidia drivers, CUDA and CUDA toolkit, following some instructions from http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb # Got the link at https://developer.nvidia.com/cuda-downloads
sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda

sudo reboot
 

At this point, running nvidia-smi should work, but running nvcc will not.

# Execute in console, or (add in ~/.bash_profile then run "source ~/.bash_profile"):
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
 

At that point, both nvidia-smi and nvcc should work.

To test whether Theano is able to use the GPU:

Copy-paste the following into gpu_test.py:

# Start gpu_test.py
# From http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
# End gpu_test.py
 

and run it:

THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32' python gpu_test.py
 

which should return:

f@f-Aurora-R4:~$ THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32' python gpu_test.py
Using gpu device 0: GeForce GTX 690
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.658292 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
 

To find out your CUDA version:

nvcc -V
 

Example:

username@server:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
 

Adding cuDNN

To add cuDNN (instructions from http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html):

  1. Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download (requires registration, which is free)
  2. tar -xvf cudnn-7.0-linux-x64-v3.0-prod.tgz
  3. Do one of the following

Option 1: copy the *.h files to CUDA_ROOT/include and the *.so* files to CUDA_ROOT/lib64 (by default, CUDA_ROOT is /usr/local/cuda).

sudo cp cuda/lib64/* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
 

Option 2:

export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH
 

By default, Theano will detect whether it can use cuDNN. If it can, it will use it. If it cannot, Theano optimizations will not introduce cuDNN ops, so Theano will still work as long as the user did not introduce them manually.

To get an error when Theano cannot use cuDNN, use this Theano flag: optimizer_including=cudnn.

Example:

THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,optimizer_including=cudnn' python gpu_test.py
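If you just want to check from Python whether Theano can see cuDNN, without forcing an error, the old theano.sandbox.cuda.dnn module exposes a dnn_available() helper. A sketch, assuming a CUDA-enabled Theano build of this era:

```
# prints True if Theano can use cuDNN, False otherwise
python -c "from theano.sandbox.cuda import dnn; print(dnn.dnn_available())"
```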
 

To find out your cuDNN version:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
 

Adding CNMeM

The CNMeM library is a "Simple library to help the Deep Learning frameworks manage CUDA memory."

# Build CNMeM without the unit tests
git clone https://github.com/NVIDIA/cnmem.git cnmem
cd cnmem
mkdir build
cd build
sudo apt-get install -y cmake
cmake ..
make

# Copy files to proper location
sudo cp ../include/cnmem.h /usr/local/cuda/include
sudo cp *.so /usr/local/cuda/lib64/
cd ../..
 

To use it with Theano, you need to add the lib.cnmem flag. Example:

THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=0.8,optimizer_including=cudnn' python gpu_test.py
 

The first output of the script should then be:

Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5005)
 

lib.cnmem=0.8 means that it can use up to 80% of the GPU's memory.
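Instead of passing lib.cnmem on every command line, Theano flags can also be set persistently in Theano's configuration file, ~/.theanorc. A minimal sketch (the device and floatX entries mirror the THEANO_FLAGS used above):

```
[lib]
cnmem = 0.8

[global]
device = gpu
floatX = float32
```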

CNMeM has been reported to give some interesting speed improvements, and it is supported by Theano, Torch, and Caffe.

Theano - source 1

The speedup depends on many factors, like the shapes and the model itself. The speedup ranges from 0 to 2x faster.

Theano - source 2

If you don't change the Theano flag allow_gc, you can expect a 20% speedup on the GPU. In some cases (small models), we saw a 50% speedup.
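The allow_gc flag mentioned above is set like any other Theano flag; for example, to disable Theano's garbage collection of intermediate results in combination with CNMeM (a sketch of the command line, reusing the test script from above):

```
THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,lib.cnmem=0.8,allow_gc=False' python gpu_test.py
```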


Common issues:

Running Theano on multiple CPU cores

You can run Theano on multiple CPU cores with the OMP_NUM_THREADS=[number_of_cpu_cores] flag.

Example:

OMP_NUM_THREADS=4 python gpu_test.py 
 

The script theano/misc/check_blas.py outputs information about which BLAS is being used:

cd [theano_git_directory]
OMP_NUM_THREADS=4 python theano/misc/check_blas.py
 

Your first theano program

In this example, we will compile functions that compute the sum and the difference of two real numbers.

from __future__ import print_function
import theano
import theano.tensor as T

#define two symbolic scalars
s_x = T.fscalar()
s_y = T.fscalar()

#compute something
s_sum = s_x + s_y
s_diff = s_x - s_y

#compile functions that add and subtract two numbers
#theano will invoke the system compiler here
fn_add = theano.function(inputs=[s_x, s_y], outputs=s_sum)
fn_diff = theano.function(inputs=[s_x, s_y], outputs=s_diff)

#call the compiled functions
print(fn_add(2., 2.)) #4.
print(fn_diff(2., 2.)) #0.