The Bible in a Graph Database

Graph databases are a rising trend in technology today. We hear discussions of why we should use them, how they are different, what value they bring and just how cool the technology is. Developing technology for Bible Translation, we at Bridge Connectivity Solutions (BCS) have been asking the relevant questions: Would graphs suit our requirements? What data would go into them? What improvements could they bring, if any? To answer these, we made a small exploration into the world of current graph technologies, toyed around with a couple of databases and tried to model the data we have as graphs. Here is the story of our little journey and what we found out.

Relational databases have been around for a long time and have greatly influenced our thinking. I tend to unknowingly model all data that I think of as tables. Their long history also means that their tooling is well established. But graph databases require us to move away from that, because they have a very different data structure. While relational databases claim to be about relations, graphs are actually all about relations. Data modelling in a graph database is about defining what becomes the nodes and how to connect them via relationships. Once you are a little familiar with the idea of nodes and edges, it becomes a more natural and intuitive way to model data, and even a means of drawing things out on a whiteboard while you explain something complex. You will start to see that it's not just social-network data that has complex connections; a lot of the data we work with today is deeply connected and fit to be modeled as graphs.

To begin with, we tried out the Neo4j graph database, which is popular and has been used in the industry since 2007. We are also currently exploring the Dgraph database, which is more recent, has some promising features and claims better performance.

We have an alignment project at BCS, as part of the AutographaMT platform, where we built a tool to align the words of the Bible in 12 Indian languages with the original languages, Greek and Hebrew. While working on that, we became familiar with the data structures used to represent the Bible, the various versions of the Bible, issues with versification mismatches between versions, and so on. When we wanted to experiment with a graph database, we thought of using the same data. It includes:

  • UGNT Bible for Greek,
  • IRV Bible in Hindi,
  • ULT Bible for English,
  • The Alignment we created between these (test/sample data, the ‘Quality Check’ stage of which has not been completed), and
  • Marking up of Strong’s numbers and translation words in the UGNT Bible.

So we had some data we wanted to build a graph with. The next challenge was the actual data modeling. Trying to do that, we found ourselves pondering a bunch of questions: what becomes nodes, how to connect them with relationships efficiently, how to choose the data types, how to create indices, and the like. The book Graph Databases, by Ian Robinson et al., was a help here. It has a good introduction to the various concepts of graph databases and to how to model data effectively for one.
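
To make the nodes-and-relationships idea concrete, here is a minimal sketch in plain Python (not Neo4j code, and not our actual loading scripts). It assumes the hierarchy that the demo queries further down use: WORD, VERSE, CHAPTER and BOOK nodes, each linked to its parent by a BELONGS_TO relationship, up to a BIBLE node. The Node class, the helper functions and the position property are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                  # node type, e.g. "BIBLE" or "VERSE"
    props: dict                                 # properties stored on the node
    edges: list = field(default_factory=list)   # outgoing (relation, target) pairs

    def connect(self, relation, target):
        """Add a directed, named relationship from this node to target."""
        self.edges.append((relation, target))


def add_child(label, parent, **props):
    """Create a node and link it to its parent with BELONGS_TO."""
    node = Node(label, props)
    node.connect("BELONGS_TO", parent)
    return node


def path_to_root(node):
    """Follow BELONGS_TO edges upward and return the labels visited."""
    path = [node.label]
    while True:
        parents = [target for rel, target in node.edges if rel == "BELONGS_TO"]
        if not parents:
            return path
        node = parents[0]
        path.append(node.label)


# Numbers are kept as strings, mirroring the string comparisons in the demo queries.
bible = Node("BIBLE", {"name": "Hindi_IRV4"})
book = add_child("BOOK", bible, number="40")
chapter = add_child("CHAPTER", book, number="5")
verse = add_child("VERSE", chapter, number="3")
word = add_child("WORD", verse, position=1)     # 'position' is a hypothetical property

print(path_to_root(word))
# prints ['WORD', 'VERSE', 'CHAPTER', 'BOOK', 'BIBLE']
```

A graph database stores and indexes exactly this kind of structure, so a traversal like path_to_root becomes a declarative pattern in the query language instead of a hand-written loop.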

By going through this process, one thing we realized for certain is that graph databases are going to add more value and power to the data processing we do. They are going to be an integral part of the Vachan Engine we are building at BCS, which will serve as a smart data engine powering various AI-aided Bible Translation applications.

That said, I am excited to give you access to a demo site we have set up with the aligned Biblical data on Neo4j (chosen because it had better-looking visualizations). You can try it out here. When it asks you to connect to the DB, use the following credentials:
url: bolt://staging.autographamt.com:7687
user: neo4j
password: 111111
Also make sure you are using an http connection, not https.

Here are a few queries you can try out. We would love to see the cool queries you come up with, so please share them as comments on this post!

Try

MATCH (n:BIBLE) RETURN n

or

MATCH p=((bib:BIBLE)<-[:BELONGS_TO]-(b:BOOK)<-[:BELONGS_TO]-(c:CHAPTER)<-[:BELONGS_TO]-(v:VERSE)<-[:BELONGS_TO]-(w:WORD))
WHERE bib.name='Hindi_IRV4' and b.number='40' and c.number='5' and v.number='3' 
RETURN p

or, for something a bit more sophisticated

MATCH p=((bib:BIBLE)<-[:BELONGS_TO]-(b:BOOK)<-[:BELONGS_TO]-(c:CHAPTER)<-[:BELONGS_TO]-(v:VERSE)<-[:BELONGS_TO]-(w:WORD))
WHERE b.number='40' and c.number='5' and v.number='3' 
RETURN p

And click your way into the depths of the Graph!

This video has more details about the work we did and a walkthrough of the database mentioned above.
Presentation on Graph DB

This is probably a silly question, but why use a database at all? It seems like you would get the best performance by creating data objects that represent exactly what you need for your application instead of trying to make your application work within the confines of a database. Unless you’re expecting to have more data in your database than can be reasonably held in memory, I don’t see the point of using a database for any type of storage in an application. And even if the data is much larger than can be held in memory, it seems like a custom system will always out-perform an off-the-shelf database solution. :thinking:

I haven’t once used a database in an application I was writing and thought the application was better because of it. Granted, I’ve only used a database in like three applications over the years and none were web-based. :grin:

Maybe my thinking comes from my distaste of having to “query” for information instead of just accessing data objects in the code. :face_with_raised_eyebrow:

Great job Kavitha and BCS team! I see a lot of potential here. I’ve been talking with many people recently about semantic representations and graph representations of the Bible, and I have been helping Robert Rouse (viz.bible) with a few things (he is using graph representations). I think there are some interesting uses of the graph DB representation in:

  • Conversational AI and chatbots (where the semantic representation could allow for really nice question and answer and exploration)
  • Translation and checking assistants / algos
  • Visualizations
  • maybe more

Also, I would highly recommend DGraph. I’ve been using it on a project with SIL, and it’s so nice to work with. It’s super quick, deploys and scales nicely, and it has a GraphQL-like query language (which is a HUGE advantage for front end people). I know the CEO and developer advocate there fairly well, and I think the latter is in Bangalore. Happy to make a connection and/or act as a sounding board for these sorts of things.

Building a custom system may be the right choice, but it is still better to first use something off-the-shelf and see how far it gets you. Then, based on what you learn and, more importantly, a quantitative understanding of the limitations, you can make an informed choice about whether to build something custom. This is especially true for something that could be an alternative to an established database system which already does a lot of things right. I should also mention that the time and money required to make a custom system might make it infeasible.

There are applications out there that may have been better off if they had not used a DB :smile:. But at the same time there are far too many important applications and industries that would not be possible if it weren’t for ACID-compliant databases: banks, mapping systems and eCommerce, to name a few. These are use cases where designers have made a deliberate choice to use databases since that is the most performant alternative. Also yes, query languages are an added layer of abstraction :stuck_out_tongue:

@kavitha, this is awesome! Good work! You’ve done a great job of explaining the rationale and showing some use cases of alignment data. The online demo is also really cool!

I keep noticing similarities with Text Fabric, which is also a graph-based system, just minus the database component. There may be interesting applications for distributing resources using the Text Fabric file structure and then using (or not using @FoolRunning!) a database to ingest and use the data in an app. :slight_smile:

Thanks @jag3773! Text Fabric looks like a cool project. It is feasible to make our graph data and structure available in their format if anyone wants it. Seeing data like this being used in apps would be amazing!