Splitting Bible dictionary entries which include multiple unique entities

Bible dictionaries (and digital versions of them) are organized so that each entry covers a single word. When that word is a proper name shared by more than one person or place, the entry may contain a bulleted or numbered list, with one item for each individual or location.

Here’s an example from Easton’s Bible dictionary about Abda:

Servant.
(1.) The father of Adoniram, whom Solomon set over the tribute (1 Kings 4:6); i.e., the forced labour (R.V., “levy”).

(2.) A Levite of the family of Jeduthun (Neh. 11:17), also called Obadiah (1 Chr. 9:16).

This poses a special problem for any tool which aims to show relevant information for one individual, not both. If I were studying 1 Kings, I would want to see only the first section. For Nehemiah, I’d want my application to show only the text in the second section.

Parsing the segments
To that end, I have begun working on a script to segment these entries and match them with unique identifiers for people and places in my Bible knowledge graph. If anyone is aware of similar efforts elsewhere, please let me know how to get in touch with them.

The script will work like this, using Easton’s dictionary:

  1. Split dictionary entries into separate records for each item in the numbered list.
  2. Query the knowledge graph API for matching names and verse references.
  3. Declare a “match” when verse references from the API correspond to one or more references given in the dictionary text.
  4. Create a relationship between the list item in the dictionary and the matched person/place node.
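
Steps 3 and 4 could be sketched roughly as follows. This is only an illustration: the candidate records, their `id` and `verses` fields, and the identifiers `abda_1` and `abda_2` are made up for the example and stand in for whatever the knowledge graph API actually returns for a name query.

```python
def find_matches(segment_refs, candidates):
    """Return IDs of candidates that share at least one verse with the segment."""
    wanted = set(segment_refs)
    return [c["id"] for c in candidates if wanted & set(c["verses"])]


# Hypothetical candidates, standing in for the result of a name query for "Abda".
candidates = [
    {"id": "abda_1", "verses": ["1Kgs.4.6"]},
    {"id": "abda_2", "verses": ["Neh.11.17"]},
]

# Verse references pulled from each numbered item of the dictionary entry.
segments = {1: ["1Kgs.4.6"], 2: ["Neh.11.17", "1Chr.9.16"]}

# Map each list item to the person/place nodes it matched.
links = {number: find_matches(refs, candidates) for number, refs in segments.items()}
```

A segment that ends up with zero matches or with more than one would probably need manual review before a relationship is created.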

Tech approach
I am writing the Python script in Google Colab; it is available here. The source file comes from CCEL, which offers the dictionary in Theological Markup Language (ThML), making it easy to pull out machine-readable verse references from each entry. Here’s an example:

<def id="a-p16.5">
<p id="a-p17">Servant. (1.) The father of Adoniram, whom Solomon set over the
tribute (<scripRef passage="1 Kings 4:6" id="a-p17.1" parsed="|1Kgs|4|6|0|0" osisRef="Bible:1Kgs.4.6">1 Kings 4:6</scripRef>); i.e., the forced labour (R.V., &#8220;levy&#8221;).</p>

<p id="a-p18">(2.) A Levite of the family of Jeduthun (<scripRef passage="Neh. 11:17" id="a-p18.1" parsed="|Neh|11|17|0|0" osisRef="Bible:Neh.11.17">Neh. 11:17</scripRef>), also
called Obadiah (<scripRef passage="1 Chr. 9:16" id="a-p18.2" parsed="|1Chr|9|16|0|0" osisRef="Bible:1Chr.9.16">1 Chr. 9:16</scripRef>).</p>
</def>
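
A minimal sketch of how an entry like this might be split, using only the standard library. It assumes each numbered item sits in its own `<p>` element, as in the Abda example; entries that pack several items into one paragraph would need extra handling. The function name `split_entry` and the record layout are illustrative, not taken from the notebook.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<def id="a-p16.5">
<p id="a-p17">Servant. (1.) The father of Adoniram, whom Solomon set over the
tribute (<scripRef passage="1 Kings 4:6" id="a-p17.1" parsed="|1Kgs|4|6|0|0" osisRef="Bible:1Kgs.4.6">1 Kings 4:6</scripRef>); i.e., the forced labour (R.V., &#8220;levy&#8221;).</p>

<p id="a-p18">(2.) A Levite of the family of Jeduthun (<scripRef passage="Neh. 11:17" id="a-p18.1" parsed="|Neh|11|17|0|0" osisRef="Bible:Neh.11.17">Neh. 11:17</scripRef>), also
called Obadiah (<scripRef passage="1 Chr. 9:16" id="a-p18.2" parsed="|1Chr|9|16|0|0" osisRef="Bible:1Chr.9.16">1 Chr. 9:16</scripRef>).</p>
</def>"""

# Matches list markers such as "(1.)" but not references such as "(1 Kings 4:6)".
MARKER = re.compile(r"\((\d+)\.\)\s*")


def split_entry(def_xml):
    """Return one record per numbered paragraph: item number, text, verse refs."""
    root = ET.fromstring(def_xml)
    records = []
    for p in root.iter("p"):
        # Flatten the paragraph to plain text and collapse line breaks.
        text = " ".join("".join(p.itertext()).split())
        marker = MARKER.search(text)
        if marker is None:
            continue  # unnumbered paragraph; this sketch skips it
        # Strip the "Bible:" prefix from each osisRef attribute.
        refs = [s.get("osisRef", "").split(":", 1)[-1] for s in p.iter("scripRef")]
        records.append(
            {"number": int(marker.group(1)), "text": text[marker.end():], "refs": refs}
        )
    return records


records = split_entry(SAMPLE)
```

Text before the first marker (here the gloss “Servant.”) is dropped from the record, so it would have to be stored separately if it is needed.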

End result
Once parsed, the matched descriptive text will be added to the knowledge graph. This will be useful for disambiguation and to serve only the relevant portion of dictionary text within Bible applications.


This looks great!
In case you were interested, there is also the open-source translationWords created by unfoldingWord: https://git.door43.org/unfoldingWord/en_tw
Those are the words identified as crucial, from a translator’s perspective, for ensuring proper transfer of meaning. Each of those words has translation suggestions and also lists the connected Strong’s numbers, which give clues to the different ways the word can be translated (or the different senses attached to that word).

Updated post to include Colab notebook: https://colab.research.google.com/drive/1S8QMEpeadtzATfvIpRBk7hBYu0neX5w0

Thanks, I’ve looked through the unfoldingWord data in a couple of formats. I could probably use that to tie person/place names to Strong’s numbers in my database, which would be helpful down the road.