deepdish

Serving Up Chicago-Style Deep Learning.


Local Torch installation

This post describes how to do a local Torch7 installation while ignoring a potentially conflicting global installation in /usr/local/share.

A local Torch7 installation is easily done using torch/distro. However, when running install.sh, I ran into the following error:

/usr/larsson/torch/install/bin/luajit: /tmp/luarocks_cutorch-scm-1-5301/cutorch/TensorMath.lua:184: attempt to call method 'registerDefaultArgument' (a nil value)
stack traceback:
        /tmp/luarocks_cutorch-scm-1-5301/cutorch/TensorMath.lua:184: in main chunk
        [C]: at 0x00405330
make[2]: *** [TensorMath.c] Error 1
make[1]: *** [CMakeFiles/cutorch.dir/all] Error 2
make: *** [all] Error 2

This issue is documented here, and the suggested solution is to remove the global installation in /usr/local/share. That was not an option for me, so here is what I did instead.

I cloned torch/distro as usual, say to ~/torch:

git clone git@github.com:torch/distro.git ~/torch --recursive

I went into ~/torch and ran install.sh, which failed with the error above. However, Torch itself still got installed even though some packages failed. Check that this is the case by running which th and which luarocks; both should point into ~/torch. If so, run th and type in:

> print(package.path)
> print(package.cpath)

Copy these strings into the environment variables LUA_PATH and LUA_CPATH, respectively, leaving out any references to /usr/local/share! The result might look something like this in your ~/.bashrc:

export TORCH_DIR=$HOME/torch
export LUA_PATH="$TORCH_DIR/install/share/lua/5.1/?.lua;$TORCH_DIR/install/share/lua/5.1/?/init.lua;$TORCH_DIR/install/share/luajit-2.1.0-alpha/?.lua"
export LUA_CPATH="$TORCH_DIR/install/lib/lua/5.1/?.so"

Note that you have to quote the strings, since their use of ; as a delimiter does not play well with bash. Once saved, refresh your shell by running source ~/.bashrc and try installing the packages that failed. In my case that meant:

luarocks install cutorch
luarocks install cunn

This time around it worked and I was good to go.

Python dictionary to HDF5

I used to be a big fan of NumPy's savez and load, since you can throw any Python structure in there that you want to save. However, these files are not compatible between Python 2 and 3, so they no longer fit my needs now that I have computers running both versions. I took the matter to Stack Overflow, but a clear winner did not emerge.
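To make the problem concrete, here is a sketch of the pattern I am talking about (filename and contents made up). Plain arrays are stored natively in the .npz file, but anything else goes through pickle, and that pickled payload is what breaks between Python 2 and 3:

import numpy as np

# The arrays are stored natively, but the dict is stored via pickle.
np.savez('stash.npz',
         images=np.zeros((10, 3, 32, 32)),
         meta={'name': 'example', 'classes': 10})

d = np.load('stash.npz')   # newer NumPy versions also need allow_pickle=True
print(d['meta'].item())    # the pickled part; this is what can fail when
                           # written by Python 2 and read by Python 3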

Finally, I decided to write my own alternative to savez based on HDF5, using PyTables. The result can be found in our deepdish project (in hdf5io.py). It also doubles as a general-purpose HDF5 saver/loader. First, an example of how to write a Caffe-compatible data file:

import deepdish as dd
import numpy as np

X = np.zeros((100, 3, 32, 32))
y = np.zeros(100)

dd.io.save('test.h5', {'data': X, 'label': y}, compression=None)

Note that Caffe does not like the compressed version, so we are turning off compression. Let’s take a look at it:

$ h5ls test.h5
data                     Dataset {100, 3, 32, 32}
label                    Dataset {100}

It will load into a dictionary with dd.io.load:

dd.io.load('test.h5')
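For example, using the names from the file we just wrote:

>>> d = dd.io.load('test.h5')
>>> d['data'].shape
(100, 3, 32, 32)
>>> d['label'].shape
(100,)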

Now, it does much more than that. It can save numbers, lists, strings, dictionaries, and NumPy arrays. It will try its best to store things natively in HDF5, so that the files can be read by other programs as well. Another example:

>>> x = [np.arange(3), {'d': 100, 'e': 'hello'}]
>>> dd.io.save('test.h5', x)
>>> dd.io.load('test.h5')
[array([0, 1, 2]), {'e': 'hello', 'd': 100}]

If it doesn't know how to save a particular data type, it will fall back to pickling. Saving will still work, but you lose the compatibility across Python 2 and 3.
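For instance, a custom class has no natural HDF5 representation, so it gets pickled (Foo is a made-up example):

class Foo(object):
    def __init__(self, value):
        self.value = value

# No native HDF5 mapping for Foo, so deepdish falls back to pickling it.
# It still round-trips, but only reliably within one Python major version.
dd.io.save('test.h5', {'custom': Foo(42)})
print(dd.io.load('test.h5')['custom'].value)  # 42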

Caffe with weighted samples

Caffe is a great framework for training and running deep learning networks. However, it does not support weighted samples, where you assign an importance weight to each sample. A weight (importance) of 2 should have the same semantics for a sample as duplicating it.
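To pin down those semantics, here is a small NumPy sketch of a weighted softmax loss (my own illustration, not code from the fork). Note that it normalizes by the total weight, which is exactly what makes a weight of 2 behave like a duplicated sample:

import numpy as np

def weighted_softmax_loss(scores, labels, weights):
    # Softmax over class scores, one row per sample.
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # Each sample's log-loss is scaled by its weight; normalizing by the
    # total weight makes a weight of 2 equivalent to a duplicate.
    ll = np.log(p[np.arange(len(labels)), labels])
    return -(weights * ll).sum() / weights.sum()

scores = np.random.randn(3, 5)
labels = np.array([0, 2, 1])

# Weighting sample 0 by 2...
loss_weighted = weighted_softmax_loss(scores, labels, np.array([2.0, 1.0, 1.0]))

# ...gives the same loss as physically duplicating it.
scores2 = np.vstack([scores[:1], scores])
labels2 = np.array([0, 0, 2, 1])
loss_duplicated = weighted_softmax_loss(scores2, labels2, np.ones(4))

assert np.isclose(loss_weighted, loss_duplicated)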

I created an experimental fork of Caffe that supports this.

This modification is so far rough around the edges and likely easy to break. I have also not implemented support for it in all the loss layers, only a select few.

It works by adding the blob sample_weight to the dataset, alongside data and label. The easiest way is to save the data as HDF5, which can easily be done through Python:

from __future__ import print_function  # so print(..., file=f) works on Python 2

import h5py
import os

# X should have shape (samples, color channel, width, height)
# y should have shape (samples,)
# w should have shape (samples,)
# They should have dtype np.float32, even label

# DIR is an absolute path (important!)

h5_fn = os.path.join(DIR, 'data.h5')

with h5py.File(h5_fn, 'w') as f:
    f['data'] = X
    f['label'] = y
    f['sample_weight'] = w

text_fn = os.path.join(DIR, 'data.txt')
with open(text_fn, 'w') as f:
    print(h5_fn, file=f)

Or, if you have our deepdish package installed, saving the HDF5 can be done as follows (also see this post):

import deepdish as dd

dd.io.save(h5_fn, dict(data=X, label=y, sample_weight=w))

Now, load the sample_weight in your data layer:

layers {
    name: "example"
    type: HDF5_DATA
    top: "data"
    top: "label"
    top: "sample_weight"  # <-- add this
    hdf5_data_param {
        source: "/path/to/data.txt"
        batch_size: 100
    }
}

The file data.txt should contain a single line with the absolute path to h5_fn, for instance /path/to/data.h5. Next, hook the weights up to the softmax loss layer:

layers {
    name: "loss"
    type: SOFTMAX_LOSS
    bottom: "last_layer"
    bottom: "label"
    bottom: "sample_weight"  # <-- add this
    top: "loss"
}

The layer SOFTMAX_LOSS is one of the few layers that have been adapted to use sample_weight. If you want to use one that has not been implemented yet, take inspiration from src/caffe/softmax_loss_layer.cpp. Remember to also update the corresponding .hpp and .cu files where needed. If you end up doing this, pull requests are welcome.
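The change itself is mechanical: scale each sample's loss and its gradient by the weight. As a rough guide (again a NumPy sketch of the math, not the fork's actual C++), the backward pass matching the weighted loss sketched earlier would be:

import numpy as np

def weighted_softmax_loss_grad(scores, labels, weights):
    # Standard softmax-cross-entropy gradient, scaled per sample by its
    # weight and normalized by the total weight, matching the forward
    # pass sketched in the section above.
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0
    return weights[:, None] * p / weights.sum()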