
Deep Learning Software

Please refer to our List of Modules page for a daily-updated list of the respective software versions that are currently installed.

Caffe

Caffe is available in our EasyBuild module environment under the module name "Caffe".

TensorFlow

TensorFlow is available in our EasyBuild module environment under the module name "tensorflow". There is also a build with the "-avx2fma" suffix; however, according to our internal tests, adding AVX2/FMA did not improve performance, so we recommend using the default version instead.

Note that up to version 1.2.1, TensorFlow was installed using the binary distribution packages from Google. Since those are built against GLIBC 2.16 and the operating system on Taurus only provides GLIBC 2.12, they cannot be used out of the box. You have to use the supplied wrapper script "python-glibc2.17" as the interpreter for your scripts in order to make them work. Starting with version 1.3.0, we provide custom builds that make this workaround obsolete.
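If you are unsure whether your interpreter is affected, you can check the GLIBC version it was linked against with Python's standard library. This is a small sketch; the 2.16 threshold matches the binary builds described above:

```python
import platform

# Report the glibc version of the running interpreter. TensorFlow <= 1.2.1
# binary packages need GLIBC >= 2.16, hence the python-glibc2.17 wrapper.
libc, version = platform.libc_ver()
needs_wrapper = False
if libc == "glibc":
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    needs_wrapper = (major, minor) < (2, 16)
print(libc, version, "wrapper needed:", needs_wrapper)
```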

Keras

Keras is available in our EasyBuild module environment under the module name "Keras".

It can use either TensorFlow or Theano as its backend. If you wish to use the TensorFlow backend, please note the special circumstances described in the section above (and do not forget to load the corresponding tensorflow module). Theano should be loaded automatically as a dependency. You can select your desired backend by editing the configuration file ~/.keras/keras.json in your home directory and specifying either:
   "backend": "tensorflow",

or:
   "backend": "theano",

The file is created automatically when running Keras for the first time.
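Instead of editing the file by hand, the backend entry can also be switched with a few lines of standard-library Python. This is a sketch; the default values mirror those Keras writes on first run:

```python
import json
import os
import tempfile

def set_keras_backend(backend, config_path=None):
    """Set the 'backend' key in a keras.json file, creating it if missing."""
    if config_path is None:
        config_path = os.path.expanduser("~/.keras/keras.json")
    if os.path.exists(config_path):
        with open(config_path) as f:
            cfg = json.load(f)
    else:
        # Minimal defaults, modeled on the file Keras generates itself.
        cfg = {"floatx": "float32", "epsilon": 1e-07}
    cfg["backend"] = backend
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=4)
    return cfg

# Demonstrate on a throwaway file rather than touching ~/.keras/keras.json:
demo_path = os.path.join(tempfile.mkdtemp(), "keras.json")
print(set_keras_backend("theano", config_path=demo_path)["backend"])  # theano
```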

Multi-CPU Theano

If you wish to use CPU-parallelism with Theano, you have to supply it with a multi-threaded BLAS library. On Taurus, it is recommended to use the Intel MKL for that. For our module, it is already loaded as a dependency, so you just have to add the following to your ~/.theanorc:
[blas]
ldflags = '-L${MKLROOT}/lib/intel64 -lmkl_avx2 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl'

#NOTE: you might have to replace "-lmkl_avx2" with "-lmkl_def" if you want to run on non-Haswell nodes

Then increase your --cpus-per-task SLURM parameter according to the number of threads you wish to use.
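To keep the BLAS thread count in sync with that allocation, the job script or the Python program itself can derive it from SLURM's environment. A sketch, using the standard SLURM and MKL/OpenMP variable names; note that these variables must be set before the BLAS library is initialized, i.e. before importing theano:

```python
import os

# Match the MKL/OpenMP thread count to the CPUs granted by SLURM.
# SLURM_CPUS_PER_TASK is set inside a job when --cpus-per-task is given;
# outside of a job we fall back to a single thread.
cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = str(cpus)
print("using", cpus, "BLAS threads")
```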

Multi-GPU Theano

For multi-GPU support you have to use the new libgpuarray backend for Theano (device=gpu selects the old backend, device=cuda the new one; see also [1]). It is supported starting from Theano 0.9.0, which is now available as a module on Taurus.

Example:
THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python -c 'import theano'

Note: This does not work in an interactive bash session started with "srun --pty ... bash -l", because sourcing /etc/profile (triggered by the "-l" parameter to bash) overwrites some environment variables, which leads to a PMI error.

[1] https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray)
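The contexts flag maps freely chosen context names to physical devices; its format is name->device pairs separated by ";". A quick standard-library illustration of that mapping syntax:

```python
# Parse the THEANO_FLAGS "contexts" specification into a dict.
flags = "contexts=dev0->cuda0;dev1->cuda1"
spec = flags.split("=", 1)[1]
contexts = dict(pair.split("->") for pair in spec.split(";"))
print(contexts)  # {'dev0': 'cuda0', 'dev1': 'cuda1'}
```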

Test Case 1: Keras with Theano/Tensorflow on MNIST data

Go to a directory on Taurus, clone Keras to get the examples, and change into the examples directory:
git clone https://github.com/fchollet/keras.git
cd keras/examples/

Configuration used for our test case (if you do not specify a Keras backend, tensorflow is used by default):
$ cat ~/.keras/keras.json
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "image_dim_ordering": "tf",
    "backend": "theano"
}
$ cat ~/.theanorc 
[global]
floatX = float32
device = cuda
[lib]
cnmem = 1
[nvcc]
fastmath = True

Job file (submit the job with sbatch, check its status with squeue -u <username>):
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --mem=10000
#SBATCH -p gpu2 # K80 GPUs on Haswell node
#SBATCH --time=01:00:00

## with Theano (using configs from above)
module purge # purge if you already have modules loaded
module load eb
module load Keras

srun python mnist_cnn.py


## with Tensorflow
module purge
module load eb
module load Keras
module load tensorflow

export KERAS_BACKEND=tensorflow # configure Keras to use tensorflow

srun python-glibc2.17 mnist_cnn.py # requires glibc2.17 otherwise native TensorFlow
# runtime fails to load, see Tensorflow section above

Output with Theano backend:
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 45s - loss: 0.3244 - acc: 0.9008 - val_loss: 0.0748 - val_acc: 0.9768
Epoch 2/12
60000/60000 [==============================] - 44s - loss: 0.1116 - acc: 0.9662 - val_loss: 0.0543 - val_acc: 0.9827
Epoch 3/12
60000/60000 [==============================] - 44s - loss: 0.0860 - acc: 0.9744 - val_loss: 0.0516 - val_acc: 0.9840
Epoch 4/12
60000/60000 [==============================] - 44s - loss: 0.0714 - acc: 0.9788 - val_loss: 0.0404 - val_acc: 0.9865
Epoch 5/12
60000/60000 [==============================] - 44s - loss: 0.0634 - acc: 0.9808 - val_loss: 0.0366 - val_acc: 0.9877
Epoch 6/12
60000/60000 [==============================] - 44s - loss: 0.0569 - acc: 0.9825 - val_loss: 0.0337 - val_acc: 0.9885
Epoch 7/12
60000/60000 [==============================] - 44s - loss: 0.0519 - acc: 0.9844 - val_loss: 0.0324 - val_acc: 0.9893
Epoch 8/12
60000/60000 [==============================] - 44s - loss: 0.0472 - acc: 0.9860 - val_loss: 0.0316 - val_acc: 0.9895
Epoch 9/12
60000/60000 [==============================] - 44s - loss: 0.0445 - acc: 0.9865 - val_loss: 0.0331 - val_acc: 0.9889
Epoch 10/12
60000/60000 [==============================] - 44s - loss: 0.0406 - acc: 0.9885 - val_loss: 0.0319 - val_acc: 0.9895
Epoch 11/12
60000/60000 [==============================] - 44s - loss: 0.0384 - acc: 0.9889 - val_loss: 0.0299 - val_acc: 0.9901
Epoch 12/12
60000/60000 [==============================] - 44s - loss: 0.0366 - acc: 0.9888 - val_loss: 0.0298 - val_acc: 0.9906
Using Theano backend.
Using cuDNN version 5105 on context None
Mapped name None to device cuda: Tesla K80 (0000:05:00.0)

Test loss: 0.0298093453895
Test accuracy: 0.9906

Output with Tensorflow backend:
[...]
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:05:00.0)
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 10s - loss: 0.3288 - acc: 0.8980 - val_loss: 0.0787 - val_acc: 0.9758
Epoch 2/12
60000/60000 [==============================] - 9s - loss: 0.1164 - acc: 0.9658 - val_loss: 0.0534 - val_acc: 0.9826
Epoch 3/12
60000/60000 [==============================] - 9s - loss: 0.0885 - acc: 0.9738 - val_loss: 0.0465 - val_acc: 0.9837
Epoch 4/12
60000/60000 [==============================] - 9s - loss: 0.0737 - acc: 0.9783 - val_loss: 0.0403 - val_acc: 0.9868
Epoch 5/12
60000/60000 [==============================] - 9s - loss: 0.0656 - acc: 0.9807 - val_loss: 0.0361 - val_acc: 0.9876
Epoch 6/12
60000/60000 [==============================] - 9s - loss: 0.0581 - acc: 0.9828 - val_loss: 0.0361 - val_acc: 0.9884
Epoch 7/12
60000/60000 [==============================] - 9s - loss: 0.0522 - acc: 0.9843 - val_loss: 0.0324 - val_acc: 0.9886
Epoch 8/12
60000/60000 [==============================] - 9s - loss: 0.0479 - acc: 0.9851 - val_loss: 0.0304 - val_acc: 0.9893
Epoch 9/12
60000/60000 [==============================] - 9s - loss: 0.0450 - acc: 0.9868 - val_loss: 0.0291 - val_acc: 0.9902
Epoch 10/12
60000/60000 [==============================] - 9s - loss: 0.0420 - acc: 0.9873 - val_loss: 0.0308 - val_acc: 0.9899
Epoch 11/12
60000/60000 [==============================] - 9s - loss: 0.0404 - acc: 0.9882 - val_loss: 0.0282 - val_acc: 0.9906
Epoch 12/12
60000/60000 [==============================] - 11s - loss: 0.0382 - acc: 0.9885 - val_loss: 0.0292 - val_acc: 0.9910
Test loss: 0.0292089643462
Test accuracy: 0.991
Using TensorFlow backend.

Test Case 2: Multi-GPU usage (Theano)

Job file
#!/bin/bash
#SBATCH --gres=gpu:2  # using 2 GPUs
#SBATCH --mem=5000
#SBATCH -p gpu2
#SBATCH --time=01:00:00

module purge
module load eb
module load Theano

THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python -c 'import theano'

Output (with Theano backend):
Using cuDNN version 5105 on context dev0
Mapped name dev0 to device cuda0: Tesla K80 (0000:04:00.0)
Using cuDNN version 5105 on context dev1
Mapped name dev1 to device cuda1: Tesla K80 (0000:05:00.0)