deepdish

Serving Up Chicago-Style Deep Learning.

© 2015. All rights reserved.

Python dictionary to HDF5

I used to be a big fan of NumPy’s savez and load, since you can throw any Python structure in there that you want to save. However, these files are not compatible between Python 2 and 3, so they no longer fit my needs, since I have computers running both versions. I took the matter to Stack Overflow, but a clear winner did not emerge.

Finally, I decided to write my own alternative to savez based on HDF5 using PyTables. The result can be found in our deepdish project (in hdf5io.py). It also doubles as a general-purpose HDF5 saver/loader. First, an example of how to write a Caffe-compatible data file:

import deepdish as dd
import numpy as np

X = np.zeros((100, 3, 32, 32))
y = np.zeros(100)

dd.io.save('test.h5', {'data': X, 'label': y}, compression=None)

Note that Caffe does not like the compressed version, so we are turning off compression. Let’s take a look at it:

$ h5ls test.h5
data                     Dataset {100, 3, 32, 32}
label                    Dataset {100}
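
The same check can be done from Python. The following is a minimal sketch using h5py, which is an assumption here, since this post only uses deepdish and h5ls. To stay self-contained it writes an equivalent uncompressed file directly with h5py rather than reading the one produced above, and the helper name list_datasets is made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Return {name: (shape, compression)} for the top-level datasets in an HDF5 file."""
    with h5py.File(path, 'r') as f:
        return {name: (node.shape, node.compression)
                for name, node in f.items()
                if isinstance(node, h5py.Dataset)}


# Write a file with the same layout as above: two top-level datasets
# and no compression filter (h5py does not compress unless asked to).
path = os.path.join(tempfile.mkdtemp(), 'layout.h5')
with h5py.File(path, 'w') as f:
    f.create_dataset('data', data=np.zeros((100, 3, 32, 32)))
    f.create_dataset('label', data=np.zeros(100))

# Print one line per dataset: name, shape, compression filter (None if absent).
for name, (shape, compression) in sorted(list_datasets(path).items()):
    print(name, shape, compression)
```

A compression value of None for both datasets is what the compression=None argument is meant to achieve; pointing list_datasets at a file written by dd.io.save should report the same kind of information, though that is not verified here.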

The file loads back into a dictionary with dd.io.load:

dd.io.load('test.h5')

Now, it does much more than that. It can save numbers, lists, strings, dictionaries and NumPy arrays. It will try its best to store things natively in HDF5, so that the file can be read by other programs as well. Another example:

>>> x = [np.arange(3), {'d': 100, 'e': 'hello'}]
>>> dd.io.save('test.h5', x)
>>> dd.io.load('test.h5')
[array([0, 1, 2]), {'e': 'hello', 'd': 100}]

If it doesn’t know how to save a particular data type, it will fall back to pickling. This means it will still work, but you will lose compatibility across Python 2 and 3.
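
To spot values that might trigger the pickling fallback before saving, one option is to walk the structure and flag anything outside the types listed above. This is only a sketch: the helper find_unlisted is hypothetical, and its type list is an assumption taken from this post (numbers, lists, strings, dictionaries, NumPy arrays), not read out of deepdish itself, which may store more types natively.

```python
try:
    import numpy as np
    ARRAY_TYPES = (np.ndarray, np.generic)
except ImportError:
    # Without NumPy the check still works for plain Python objects.
    ARRAY_TYPES = ()

# Assumption: "numbers" and "strings" from the post map to these types.
SCALAR_TYPES = (int, float, complex, str)


def find_unlisted(obj, path='root'):
    """Return the paths of all values whose type is not in the post's list.

    Dictionary keys are not inspected, only the values.
    """
    if isinstance(obj, SCALAR_TYPES + ARRAY_TYPES):
        return []
    if isinstance(obj, list):
        found = []
        for i, item in enumerate(obj):
            found += find_unlisted(item, '%s[%d]' % (path, i))
        return found
    if isinstance(obj, dict):
        found = []
        for key, value in obj.items():
            found += find_unlisted(value, '%s[%r]' % (path, key))
        return found
    # Anything else is outside the list and is reported by its path.
    return [path]


print(find_unlisted([1.5, {'d': 100, 'e': 'hello'}]))
print(find_unlisted({'ok': [1, 2], 'when': {3, 4}}))
```

An empty result means every value has one of the listed types. A non-empty result names the entries worth converting (for example, a set to a list) if the file has to stay readable from both Python 2 and 3.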