I have been playing around with Neo4j v1.6 in embedded mode with Python. The Neo4j library for Python uses JPype. I was quite unhappy with the raw speed, so I decided to test the following code in Python, or more specifically CPython.
The code is pretty basic. It takes a csv file with headers and creates nodes out of that data. On a sample file, this code took ~415 seconds to run.
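The listing that was timed is not reproduced here, so what follows is only a minimal sketch of a loader of this kind, not the exact code that was measured. The helper name `load_nodes`, the database path and the file name are hypothetical. The commented wiring assumes the neo4j-embedded calls `GraphDatabase`, `db.transaction` and `db.node`.

```python
import csv


def load_nodes(lines, create_node):
    """Create one node per CSV data row and return how many were created.

    `lines` is any iterable of text lines (for example an open file) whose
    first line holds the column headers. `create_node` is called once per
    row with that row's columns passed as keyword arguments.
    """
    count = 0
    for row in csv.DictReader(lines):
        create_node(**row)
        count += 1
    return count


# Hypothetical wiring to neo4j-embedded (needs JPype and a JVM):
#
#   from neo4j import GraphDatabase
#   db = GraphDatabase('/tmp/csv-import-db')   # assumed path
#   with open('sample.csv') as f:              # assumed file name
#       with db.transaction:
#           load_nodes(f, db.node)
#   db.shutdown()
```

Passing the node factory in as a callable keeps the CSV handling separate from the database layer, so the same loop can be timed against different backends.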
I could have used the batch insertion API provided by Michael Hunger here. But it’s not appropriate if you are manipulating the data from the csv file.
Then I rewrote the code to use the native Neo4j Java library through Jython. Anecdotally I’ve heard that Jython is slower than CPython, but I wanted to test it for this particular program. I allocated the same Java max heap size to both programs.
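The Jython listing is also missing, so here is a rough sketch of how such an import might look against the Neo4j 1.x Java API (`beginTx`, `createNode`, `setProperty`, `success`, `finish`). The function name `import_rows`, the path and the file name are hypothetical, and this is not necessarily how the timed program was structured.

```python
def import_rows(db, rows):
    """Create one node per row inside a single transaction.

    `db` is expected to behave like a Neo4j 1.x GraphDatabaseService and
    `rows` is an iterable of dicts mapping property names to values.
    Returns the number of nodes created.
    """
    tx = db.beginTx()
    count = 0
    try:
        for row in rows:
            node = db.createNode()
            for key, value in row.items():
                node.setProperty(key, value)
            count += 1
        tx.success()   # mark the transaction for commit
    finally:
        tx.finish()    # commits if success() was called, otherwise rolls back
    return count


# Hypothetical usage under Jython with the Neo4j jars on the classpath:
#
#   import csv
#   from org.neo4j.kernel import EmbeddedGraphDatabase
#   db = EmbeddedGraphDatabase('/tmp/csv-import-db')   # assumed path
#   f = open('sample.csv')                             # assumed file name
#   try:
#       import_rows(db, csv.DictReader(f))
#   finally:
#       f.close()
#       db.shutdown()
```

Because Jython calls the Java classes directly, there is no JPype bridge between the Python loop and the database.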
On the same sample file as before, this took ~130 seconds to run. That is a more than 3x performance improvement (415 s / 130 s ≈ 3.2) from switching to Jython. That’s a major speed improvement!
So I decided to use Jython for the component that does the bulk data import and CPython everywhere else.
Your mileage may vary.