Thursday, April 30, 2015

Numpy exercises

When you start writing in Python, a typical first reaction is disappointment at how slow it is compared to any compiled language. After a while, you learn numpy and find out it's actually not so bad.

Having spent a month with numpy, I found out that many things can be written with it.
Having spent a year with it, I found out that almost any algorithm can be vectorized, though sometimes it's non-trivial.

I'm still quite disappointed by most answers on StackOverflow, where people fall back to plain Python for anything nontrivial.

For instance, suppose you need to compute the order statistics (ranks) of the values in an array. Well, you can sort the array and keep track of the initial positions. Alternatively, you can do it with a numpy one-liner:

order_statistics = numpy.argsort(numpy.argsort(initial_array))

Don't believe it? Check this!
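A quick way to check it is to compare against the explicit approach: sort the positions by value, then write each rank back to its original position. The array below is made-up test data. The trick works because the second `argsort` inverts the permutation returned by the first one. With tied values, the ranks among ties depend on the sort order, so `kind='stable'` can be passed to `argsort` if that matters.

```python
import numpy

# made-up example data
initial_array = numpy.array([0.3, 0.1, 0.7, 0.5])

# first argsort: positions of the elements in sorted order;
# second argsort: inverts that permutation, giving each element's rank
order_statistics = numpy.argsort(numpy.argsort(initial_array))

# the same result with a plain Python loop that tracks initial positions
positions_sorted = sorted(range(len(initial_array)), key=lambda i: initial_array[i])
ranks_loop = numpy.empty(len(initial_array), dtype=int)
for rank, position in enumerate(positions_sorted):
    ranks_loop[position] = rank

print(order_statistics)  # [1 0 3 2]
print(numpy.array_equal(order_statistics, ranks_loop))  # True
```

If the second sort bothers you, one sort followed by a scatter gives the same ranks: `order = numpy.argsort(initial_array)`, then `ranks = numpy.empty_like(order)` and `ranks[order] = numpy.arange(len(order))`.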
Want to compute the mean value within each group of events? Another one-liner:
means = numpy.bincount(group_indices, weights=values) / numpy.bincount(group_indices)
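A small sketch with made-up data. Here `group_indices` must hold non-negative integer group labels and `values` the quantity recorded for each event. The first `bincount` sums the weights falling into each group, the second counts the events in each group, so their ratio is the per-group mean.

```python
import numpy

# made-up data: group label of each event and the value recorded for it
group_indices = numpy.array([0, 1, 0, 2, 1, 0])
values = numpy.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# per-group sum of values divided by per-group number of events
means = numpy.bincount(group_indices, weights=values) / numpy.bincount(group_indices)

# plain-Python check: average the values of each group separately
means_loop = [values[group_indices == g].mean() for g in range(group_indices.max() + 1)]

print(means)  # [3. 4. 4.]
```

A label that never occurs produces 0/0 for its slot, i.e. `nan` together with a RuntimeWarning; passing the same `minlength` to both `bincount` calls fixes the length of the output when the number of groups is known in advance.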

Applying an oblivious decision tree (one where all nodes at the same depth use the same split) in numpy is very simple and really fast.
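A sketch of applying an oblivious tree, assuming the tree is stored as one feature index and one threshold per depth level plus an array of `2 ** depth` leaf values. This representation and the name `apply_oblivious_tree` are my own choices for illustration. Since every level uses the same split, the binary decisions of a sample are simply the bits of its leaf index, so no node-by-node traversal is needed.

```python
import numpy

def apply_oblivious_tree(X, features, thresholds, leaf_values):
    """Predict with an oblivious tree of depth d (hypothetical layout).

    X           : (n_samples, n_features) array
    features    : (d,) int array, feature checked at each depth
    thresholds  : (d,) float array, threshold used at each depth
    leaf_values : (2 ** d,) array, prediction stored in each leaf
    """
    depth = len(features)
    # (n_samples, d) boolean matrix: one binary decision per depth level
    decisions = X[:, features] > thresholds
    # read the decisions as bits of the leaf index, first level = most significant bit
    powers = 2 ** numpy.arange(depth - 1, -1, -1)
    leaf_indices = (decisions * powers).sum(axis=1)
    return leaf_values[leaf_indices]

# made-up tree of depth 2: split on feature 0 at 1.0, then on feature 1 at 3.0
X = numpy.array([[0.0, 5.0], [2.0, 1.0], [2.0, 7.0]])
predictions = apply_oblivious_tree(X, numpy.array([0, 1]), numpy.array([1.0, 3.0]),
                                   numpy.array([10.0, 20.0, 30.0, 40.0]))
print(predictions)  # [20. 30. 40.]
```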

As a non-trivial exercise: can you write the application of an ordinary decision tree in pure numpy? For simplicity, you can first consider only trees in which all leaves have the same depth.
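One possible sketch for the simplified case (a spoiler, so try the exercise first). It assumes the tree is complete and stored in heap order: internal node `i` has children `2*i + 1` (left) and `2*i + 2` (right), `features` and `thresholds` hold one entry per internal node, and `leaf_values` holds one entry per leaf. This layout and the name `apply_complete_tree` are assumptions for illustration. The Python loop runs only `depth` times, and each iteration moves all samples one level down at once.

```python
import numpy

def apply_complete_tree(X, features, thresholds, leaf_values):
    """Predict with a complete binary tree stored in heap order (hypothetical layout).

    features, thresholds : (2 ** depth - 1,) arrays, one entry per internal node
    leaf_values          : (2 ** depth,) array, one entry per leaf
    """
    n_internal = len(features)
    depth = (n_internal + 1).bit_length() - 1
    rows = numpy.arange(len(X))
    # every sample starts at the root
    node = numpy.zeros(len(X), dtype=int)
    for _ in range(depth):
        # each sample looks up the split of the node it currently sits in
        go_right = X[rows, features[node]] > thresholds[node]
        node = 2 * node + 1 + go_right
    # leaves occupy heap positions n_internal .. 2 * n_internal
    return leaf_values[node - n_internal]

# made-up depth-2 tree: root splits feature 0 at 1.0, children split feature 1 at 3.0 / 6.0
X = numpy.array([[0.0, 5.0], [2.0, 1.0], [2.0, 7.0]])
predictions = apply_complete_tree(X, numpy.array([0, 1, 1]), numpy.array([1.0, 3.0, 6.0]),
                                  numpy.array([10.0, 20.0, 30.0, 40.0]))
print(predictions)  # [20. 30. 40.]
```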
