I have been playing around with Neo4j v1.6 in embedded mode with Python. The Neo4j library for Python uses JPype, and I was quite unhappy with its raw speed. So I decided to test the following code in Python, or more specifically CPython.
    import csv, sys, datetime
    from neo4j import GraphDatabase

    if __name__ == "__main__":
        start = datetime.datetime.now()
        db = GraphDatabase("test")
        reader = csv.reader(open(sys.argv[1]))
        header = reader.next()
        with db.transaction:
            for row in reader:
                # do something with row
                node = db.node(**dict(zip(header, row)))
        print "Closing....."
        db.shutdown()
        end = datetime.datetime.now()
        print "Time taken: %s" % (end - start, )
The code is pretty basic. It takes a CSV file with a header row and creates one node per data row, using the header names as property keys. On a sample file, this code took ~415 seconds to run.
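To make the header-to-property mapping concrete, here is a minimal sketch that runs without Neo4j. The helper name `rows_as_properties` and the sample data are made up for illustration; the result is the dictionary that would be passed on to the node-creation call.

```python
import csv

def rows_as_properties(lines):
    """Yield one {column: value} dict per data row; the first row is the header."""
    reader = csv.reader(lines)
    header = next(reader)  # works on both Python 2 and Python 3
    for row in reader:
        yield dict(zip(header, row))

# Hypothetical sample data standing in for the CSV file.
sample = ["name,city", "Alice,Pune", "Bob,Oslo"]
props = list(rows_as_properties(sample))
print(props)
```

Note that every value comes out of the `csv` module as a string, so any type conversion has to happen before the node is created.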
I could have used the batch insertion API provided by Michael Hunger, but it’s not appropriate if you need to manipulate the data from the CSV file as you load it.
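As an illustration of the kind of per-row manipulation I mean, here is a small sketch. The function `clean_properties` and its rules (trim whitespace, drop empty cells, turn digit-only strings into integers) are hypothetical examples, not what my actual import does.

```python
def clean_properties(props):
    """Strip whitespace, drop empty values, and convert digit-only strings to int."""
    cleaned = {}
    for key, value in props.items():
        value = value.strip()
        if not value:
            continue  # skip empty cells rather than storing ""
        cleaned[key] = int(value) if value.isdigit() else value
    return cleaned

# Hypothetical row, as it would come out of dict(zip(header, row)).
row = {"name": " Alice ", "age": "30", "nickname": ""}
print(clean_properties(row))
```

A step like this would sit where the `# do something with row` comment is in the scripts.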
Then I rewrote the code to use the native Neo4j library through Jython. Anecdotally, I’ve heard that Jython is slower than CPython, but I wanted to test it for this particular program. I allocated the same Java max heap size for both programs.
    import sys, datetime, csv
    from org.neo4j.kernel import EmbeddedGraphDatabase

    if __name__ == "__main__":
        start = datetime.datetime.now()
        db = EmbeddedGraphDatabase("test-jy")
        reader = csv.reader(open(sys.argv[1]))
        header = reader.next()
        tx = db.beginTx()
        try:
            for row in reader:
                # do something with row.
                node = db.createNode()
                for k, v in zip(header, row):
                    node.setProperty(k, v)
            tx.success()
        except:
            tx.failure()
        tx.finish()
        print "Closing....."
        db.shutdown()
        end = datetime.datetime.now()
        print "Time taken: %s" % (end - start, )
On the same sample file as before, this took ~130 seconds to run. I got a more than 3x performance improvement by switching to Jython. That’s a major speed improvement!
So I decided to use Jython for the component that does the bulk data import and CPython everywhere else.
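For reference, the ratio works out as follows, using the two approximate timings above (both are rounded wall-clock figures from a single sample file, so treat the result as a rough estimate):

```python
cpython_seconds = 415.0  # approximate CPython + JPype run time
jython_seconds = 130.0   # approximate Jython run time

speedup = cpython_seconds / jython_seconds
saved = cpython_seconds - jython_seconds

print("Speedup: %.2fx" % speedup)         # Speedup: 3.19x
print("Time saved: %d seconds" % saved)   # Time saved: 285 seconds
```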
Your mileage may vary.