20 Feb 2015
Gustav Larsson
This post describes how to do a local Torch7 installation while ignoring a potentially conflicting global installation in /usr/local/share.
A local Torch7 installation is easily done using torch/distro. However, when running install.sh, I ran into the following error:
/usr/larsson/torch/install/bin/luajit: /tmp/luarocks_cutorch-scm-1-5301/cutorch/TensorMath.lua:184: attempt to call method 'registerDefaultArgument' (a nil value)
stack traceback:
/tmp/luarocks_cutorch-scm-1-5301/cutorch/TensorMath.lua:184: in main chunk
[C]: at 0x00405330
make[2]: *** [TensorMath.c] Error 1
make[1]: *** [CMakeFiles/cutorch.dir/all] Error 2
make: *** [all] Error 2
This issue is documented here, and the suggested solution is to remove the global installation in /usr/local/share. That was not an option for me, so here is what I did instead.
I cloned torch/distro as you would, let us say to ~/torch:
git clone git@github.com:torch/distro.git ~/torch --recursive
I went into ~/torch and ran install.sh, which failed. Even so, Torch itself still got installed despite some packages failing. Check that this is the case by running which th and which luarocks; both should point into ~/torch. If they do, run th and type in:
> print(package.path)
> print(package.cpath)
Copy these strings to your LUA_PATH and LUA_CPATH, respectively. Leave out any references to /usr/local/share! This might look something like this in your ~/.bashrc:
export TORCH_DIR=$HOME/torch
export LUA_PATH="$TORCH_DIR/install/share/lua/5.1/?.lua;$TORCH_DIR/install/share/lua/5.1/?/init.lua;$TORCH_DIR/install/share/luajit-2.1.0-alpha/?.lua"
export LUA_CPATH="$TORCH_DIR/install/lib/lua/5.1/?.so"
Note that you have to quote the strings, since their use of ; as a delimiter does not play well with bash. Once saved, refresh your shell by running source ~/.bashrc and try installing the packages that failed. I did:
luarocks install cutorch
luarocks install cunn
This time around it worked and I was good to go.
11 Nov 2014
Gustav Larsson
I used to be a big fan of NumPy's savez and load, since you can throw in any Python structure that you want to save. However, these files are not compatible between Python 2 and 3, so they no longer fit my needs since I have computers running both versions. I took the matter to Stackoverflow, but a clear winner did not emerge.
Finally, I decided to write my own alternative to savez based on HDF5, using PyTables. The result can be found in our deepdish project (in hdf5io.py). It also doubles as a general-purpose HDF5 saver/loader. First, an example of how to write a Caffe-compatible data file:
import deepdish as dd
import numpy as np
X = np.zeros((100, 3, 32, 32))
y = np.zeros(100)
dd.io.save('test.h5', {'data': X, 'label': y}, compression=None)
Note that Caffe does not like the compressed version, so we are turning off
compression. Let’s take a look at it:
$ h5ls test.h5
data Dataset {100, 3, 32, 32}
label Dataset {100}
It will load into a dictionary with dd.io.load:
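For instance, a quick round-trip check might look like this (a small sketch; the output simply reflects the shapes we saved above):

>>> import deepdish as dd
>>> d = dd.io.load('test.h5')
>>> d['data'].shape
(100, 3, 32, 32)
>>> d['label'].shape
(100,)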
Now, it does much more than that. It can save numbers, lists, strings, dictionaries and NumPy arrays. It will try its best to store things natively in HDF5, so that they can be read by other programs as well. Another example:
>>> x = [np.arange(3), {'d': 100, 'e': 'hello'}]
>>> dd.io.save('test.h5', x)
>>> dd.io.load('test.h5')
[array([0, 1, 2]), {'e': 'hello', 'd': 100}]
If it doesn't know how to save a particular data type, it will fall back to pickling. This means it will still work, but you will lose the compatibility across Python 2 and 3.
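As a hypothetical illustration of that fallback (the Point class below is made up; the point is only that objects without a native HDF5 representation still round-trip, via pickle):

>>> class Point(object):
...     def __init__(self, x, y):
...         self.x, self.y = x, y
...
>>> dd.io.save('point.h5', {'p': Point(1, 2)})  # 'p' has no native HDF5 form, so it gets pickled
>>> dd.io.load('point.h5')['p'].x
1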
04 Nov 2014
Gustav Larsson
Caffe is a great framework for training and running deep learning networks. However, it does not support weighted samples, that is, assigning an importance to each sample. A weight (importance) of 2 should have the same semantics for a sample as making a duplicate of it.
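To make that concrete, here is a small NumPy sketch (not from the fork itself) of why a weight of 2 in a weighted loss behaves like duplicating the sample:

import numpy as np

# Per-sample losses, e.g. negative log-likelihoods for three samples.
losses = np.array([0.3, 1.2, 0.7])

# Giving the second sample a weight of 2 ...
weights = np.array([1.0, 2.0, 1.0])
weighted_total = np.sum(weights * losses)

# ... yields the same total loss as duplicating that sample.
duplicated_total = np.sum(np.array([0.3, 1.2, 1.2, 0.7]))

assert np.isclose(weighted_total, duplicated_total)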
I created an experimental fork of Caffe that supports this. The modification is so far rough around the edges and likely easy to break. I have also not implemented support for it in all the loss layers, only a select few.
It works by adding the blob sample_weight to the dataset, alongside data and label. The easiest way is to save the data as HDF5, which can be done through Python:
from __future__ import print_function  # so print(..., file=f) also works on Python 2
import h5py
import os

# X should have shape (samples, color channel, width, height)
# y should have shape (samples,)
# w should have shape (samples,)
# They should have dtype np.float32, even label
# DIR is an absolute path (important!)
h5_fn = os.path.join(DIR, 'data.h5')
with h5py.File(h5_fn, 'w') as f:
    f['data'] = X
    f['label'] = y
    f['sample_weight'] = w

text_fn = os.path.join(DIR, 'data.txt')
with open(text_fn, 'w') as f:
    print(h5_fn, file=f)
Or, if you have our deepdish package
installed, saving the HDF5 can be done as follows (also see this post):
import deepdish as dd

dd.io.save(h5_fn, dict(data=X, label=y, sample_weight=w))
Now, load the sample_weight
in your data layer:
layers {
  name: "example"
  type: HDF5_DATA
  top: "data"
  top: "label"
  top: "sample_weight"   # <-- add this
  hdf5_data_param {
    source: "/path/to/data.txt"
    batch_size: 100
  }
}
The file data.txt should contain a single line with the absolute path to h5_fn, for instance /path/to/data.h5. Next, hook it up to the softmax layer as:
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "last_layer"
  bottom: "label"
  bottom: "sample_weight"   # <-- add this
  top: "loss"
}
The layer SOFTMAX_LOSS is one of the few layers that have been adapted to use sample_weight. If you want to use one that has not been adapted yet, take inspiration from src/caffe/softmax_loss_layer.cpp. Remember to also update the corresponding hpp and cu files where needed. If you end up doing this, pull requests are welcome.