Load BKO_id to BKO

Commit d86b3b96 authored 5 years ago by Ribary, Marton Dr (School of Law)
--- a/Ddf_documentation.md
+++ b/Ddf_documentation.md
@@ -6,7 +6,7 @@ The documentation describes the creation of the core dataframes of the `pyDigest

 The documentation follows the order of output files stored in the project's `dump` folder. The titles of individual steps include the method and the name of the output file in the format of `[Method] > [outputFile]`. The `[Method]` is either (1) manual or (2) executed in Python files which are stored in the `script` folder. The `[Method]` and `[outputFile]` correspond with elements in `Ddf_flowchart`. These steps document the data manipulation pipeline from the raw text of the _Digest_ to its [relational database](link).

-![Ddf_flowchart](https://raw.githubusercontent.com/mribary/pyDigest/master/images/Ddf_flowchart.png)
+![Ddf_flowchart](https://raw.githubusercontent.com/mribary/pyDigest/master/images/Ddf_flowchart.graphml)

 1. Manual editing > ROMTEXT.txt

@@ -212,6 +212,10 @@ The manually corrected `Ddf_v104.csv` and `BKO_v004.csv` files are free from err

 Manual inspection has revealed that Greek script had been lost during manual editing in step 7 above. The Python file recovers the `TextUnit` column from `Ddf_v102.csv` and inserts it into `Ddf_v105.csv`. The output `csv` file is streamlined to keep only the index, the four-level _Digest_ reference and the text. These columns will be used to create the SQL database.

+12. Manual editing > BKO_v006.csv
+
+Based on `Work_ref`, a verbose `title` is added to the `BKO` dataframe which includes the title of the work in an unabbreviated format and drops the book numbers where the work in the `BKO` dataframe is split. Where the verbose title could not be verified by checking against Mommsen's print edition, a `?` mark is added to the title to indicate that the data may need to be updated at a later point.
+
 ### 3. Additional dataframes

 1. Sections dataframes
@@ -224,7 +228,7 @@ An additional dataframe including section_IDs with their corresponding section t

 2. ID dataframes

-Ddf_IDs_001.py > Ddf_IDs_001.csv, Ddf_BKO_IDs_001.csv, Ddf_Work_IDs_001.csv, Ddf_Book_IDs_001.csv
+`Ddf_IDs_001.py > Ddf_IDs_001.csv, Ddf_BKO_IDs_001.csv, Ddf_Work_IDs_001.csv, Ddf_Book_IDs_001.csv`

 The script initiates a dataframe `df` with the `BKO_key`, `Work` and `TextUnit_ref` columns from `Ddf_v104.csv`. It creates separate dataframes for unique `BKO_key` (294), `Work` (251), and `TextUnit_ref` values (1352) where values are sorted alphabetically and associated with a unique ID. These dataframes are exported as `Ddf_BKO_IDs_v001.csv`, `Ddf_Work_IDs_v001.csv` and `Ddf_Book_IDs_v001.csv`.

@@ -232,11 +236,11 @@ The script links the reference IDs above with the 21055 text units of the Digest

 3. Jurists dataframes

-> 3.1. Ddf_jurists.py > Jurists_v001.csv
+> 3.1. `Ddf_jurists.py > Jurists_v001.csv`

 The script initiates a dataframe `df` with the `Jurist_name` column from `BKO_v004.csv`. It strips whitespace, orders the list of unique values, associates items with unique IDs and outputs the `Jurists_v001.csv` file to be enriched with data manually.

-> 3.2. Manual editing: Jurists_v001.csv > Jurists_v002.csv, Ddf_v105.csv
+> 3.2. `Manual editing: Jurists_v001.csv > Jurists_v002.csv, Ddf_v105.csv`

 Jurists are associated with a date range of their lifetime according to information available in Adolf Berger's _Dictionary of Roman law_[<sup id="inline6">6</sup>](#fn6) consulted in conjunction with _Paulys Realencyclopädie der classischen Altertumswissenschaft_[<sup id="inline7">7</sup>](#fn7). The manually edited `Jurists_v002.csv` includes a `Note` which explains how the date range is estimated and a column with `Reference` information to _Berger_ and the _RE_. `Start_date` corresponds to the (estimated) birth of the jurist, `Mid_date` to his (estimated) most active period (_floruit_) at the age of 40, and `End_date` to his (estimated) death at the age of 60. Where exact dates are available for any of the three dates from _Berger_ and the _RE_, it is entered into the appropriate column instead of the estimate.

@@ -244,14 +248,20 @@ For those jurists who are dated only by rough estimates in the _RE_ or _Berger_,

 Minor typos and alternative versions of headings are corrected in `Ddf`, which is updated manually in `Ddf_v105.csv`. The consistency of the `Ddf` and `BKO` dataframes is checked in `Ddf_BKO_check_4.py`.

-> 3.3. Ddf_IDs_002.py > Ddf_IDs_002.csv, Ddf_BKO_IDs_002.csv, Ddf_Work_IDs_002.csv, Ddf_Book_IDs_002.csv
+> 3.3. `Ddf_IDs_002.py > Ddf_IDs_002.csv, Ddf_BKO_IDs_002.csv, Ddf_Work_IDs_002.csv, Ddf_Book_IDs_002.csv`

 ID dataframes are updated according to manually edited files in the previous step.

-> 3.4. Ddf_IDs_003.py > BKO_v005.csv, Ddf_IDs_003.csv, Ddf_BKO_IDs_003.csv, Ddf_Work_IDs_003.csv, Ddf_Book_IDs_003.csv
+> 3.4. `Ddf_IDs_003.py > BKO_v005.csv, Ddf_IDs_003.csv, Ddf_BKO_IDs_003.csv, Ddf_Work_IDs_003.csv, Ddf_Book_IDs_003.csv`

 `Jurist_id` and `Mid_date` columns are inserted into the `BKO`, `BKO_IDs` and `Ddf_IDs` dataframes by merging. The `Work_IDs` and `Book_IDs` dataframes are updated by removing duplicate values in the `Book_id` and `Work_id` columns in the new `Ddf_IDs` dataframe. All text units in `Ddf`, all elements in `BKO`, and all elements in the ID dataframes are now associated with a date which is stipulated to be the most active period of the corresponding jurist.

+4. BKO dataframe alignment
+
+`Ddf_IDs_004.py > BKO_v007.csv`
+
+A new column `BKO_id` is added to the `BKO` dataframe which aligns `Work_ref` in the `BKO` dataframe with `BKO_label` in the `Ddf_BKO_IDs` dataframe. `None` is entered where `Work_ref` cannot be matched with a `BKO_label`.
+
 ### Footnotes

 [<sup id="fn1">1</sup>](#inline1) Georg Klingenberg, "Die ROMTEXT-Datenbank," _Informatica e diritto_ 4 (1995): 223-232.

--- a/dump/BKO_v007.csv
+++ b/dump/BKO_v007.csv
--- a/images/Ddf_flowchart.graphml
+++ b/images/Ddf_flowchart.graphml
@@ -855,7 +855,7 @@
      <data key="d6">
        <y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
          <y:Geometry height="55.0" width="35.0" x="-503.74999999999994" y="647.5"/>
-          <y:Fill color="#FBBC05" transparent="false"/>
+          <y:Fill color="#FFFFFFE6" transparent="false"/>
          <y:BorderStyle color="#000000" type="line" width="1.0"/>
          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="e" textColor="#000000" verticalTextPosition="bottom" visible="true" width="89.318359375" x="39.0" xml:space="preserve" y="18.515625">BKO_v005.csv</y:NodeLabel>
          <y:StyleProperties>
@@ -920,7 +920,6 @@
      </data>
    </node>
    <node id="n54">
-      <data key="d5"/>
      <data key="d6">
        <y:GenericNode configuration="com.yworks.bpmn.Event.withShadow">
          <y:Geometry height="30.0" width="30.0" x="-315.0" y="1230.0"/>
@@ -928,8 +927,8 @@
          <y:BorderStyle color="#27AE27" type="line" width="1.0"/>
          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="s" textColor="#000000" verticalTextPosition="bottom" visible="true" width="72.51953125" x="-21.259765625" xml:space="preserve" y="34.0">Ddf_005.py</y:NodeLabel>
          <y:StyleProperties>
-            <y:Property class="com.yworks.yfiles.bpmn.view.EventCharEnum" name="com.yworks.bpmn.characteristic" value="EVENT_CHARACTERISTIC_START"/>
            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
+            <y:Property class="com.yworks.yfiles.bpmn.view.EventCharEnum" name="com.yworks.bpmn.characteristic" value="EVENT_CHARACTERISTIC_START"/>
            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="EVENT_TYPE_CONDITIONAL"/>
@@ -938,13 +937,66 @@
      </data>
    </node>
    <node id="n55">
-      <data key="d5"/>
      <data key="d6">
        <y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
          <y:Geometry height="55.0" width="35.0" x="-126.25" y="1217.5"/>
          <y:Fill color="#FBBC05" transparent="false"/>
          <y:BorderStyle color="#000000" type="line" width="1.0"/>
          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="s" textColor="#000000" verticalTextPosition="bottom" visible="true" width="84.853515625" x="-24.9267578125" xml:space="preserve" y="59.0">Ddf_v106.csv</y:NodeLabel>
+          <y:StyleProperties>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
+            <y:Property class="com.yworks.yfiles.bpmn.view.DataObjectTypeEnum" name="com.yworks.bpmn.dataObjectType" value="DATA_OBJECT_TYPE_PLAIN"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
+            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="ARTIFACT_TYPE_DATA_OBJECT"/>
+          </y:StyleProperties>
+        </y:GenericNode>
+      </data>
+    </node>
+    <node id="n56">
+      <data key="d5"/>
+      <data key="d6">
+        <y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
+          <y:Geometry height="55.0" width="35.0" x="-503.75" y="546.25"/>
+          <y:Fill color="#FFFFFFE6" transparent="false"/>
+          <y:BorderStyle color="#000000" type="line" width="1.0"/>
+          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="e" textColor="#000000" verticalTextPosition="bottom" visible="true" width="89.318359375" x="39.0" xml:space="preserve" y="18.515625">BKO_v006.csv</y:NodeLabel>
+          <y:StyleProperties>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
+            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="ARTIFACT_TYPE_DATA_OBJECT"/>
+            <y:Property class="com.yworks.yfiles.bpmn.view.DataObjectTypeEnum" name="com.yworks.bpmn.dataObjectType" value="DATA_OBJECT_TYPE_PLAIN"/>
+          </y:StyleProperties>
+        </y:GenericNode>
+      </data>
+    </node>
+    <node id="n57">
+      <data key="d5"/>
+      <data key="d6">
+        <y:GenericNode configuration="com.yworks.bpmn.Event.withShadow">
+          <y:Geometry height="30.0" width="30.0" x="-594.375" y="558.75"/>
+          <y:Fill color="#FFFFFFE6" color2="#D4D4D4CC" transparent="false"/>
+          <y:BorderStyle color="#27AE27" type="line" width="1.0"/>
+          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="custom" textColor="#000000" verticalTextPosition="bottom" visible="true" width="97.55078125" x="-33.775390625" xml:space="preserve" y="34.0">Ddf_IDs_004.py<y:LabelModel><y:SmartNodeLabelModel distance="4.0"/></y:LabelModel><y:ModelParameter><y:SmartNodeLabelModelParameter labelRatioX="0.0" labelRatioY="-0.5" nodeRatioX="0.0" nodeRatioY="0.5" offsetX="0.0" offsetY="4.0" upX="0.0" upY="-1.0"/></y:ModelParameter></y:NodeLabel>
+          <y:StyleProperties>
+            <y:Property class="com.yworks.yfiles.bpmn.view.EventCharEnum" name="com.yworks.bpmn.characteristic" value="EVENT_CHARACTERISTIC_START"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
+            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
+            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="EVENT_TYPE_CONDITIONAL"/>
+          </y:StyleProperties>
+        </y:GenericNode>
+      </data>
+    </node>
+    <node id="n58">
+      <data key="d5"/>
+      <data key="d6">
+        <y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
+          <y:Geometry height="55.0" width="35.0" x="-690.0" y="546.25"/>
+          <y:Fill color="#FBBC05" transparent="false"/>
+          <y:BorderStyle color="#000000" type="line" width="1.0"/>
+          <y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="n" textColor="#000000" verticalTextPosition="bottom" visible="true" width="89.318359375" x="-27.1591796875" xml:space="preserve" y="-21.96875">BKO_v007.csv</y:NodeLabel>
          <y:StyleProperties>
            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
            <y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
@@ -1749,7 +1801,6 @@
      </data>
    </edge>
    <edge id="e66" source="n41" target="n54">
-      <data key="d9"/>
      <data key="d10">
        <y:GenericEdge configuration="com.yworks.bpmn.Connection">
          <y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
@@ -1762,6 +1813,44 @@
      </data>
    </edge>
    <edge id="e67" source="n54" target="n55">
+      <data key="d10">
+        <y:GenericEdge configuration="com.yworks.bpmn.Connection">
+          <y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
+          <y:LineStyle color="#000000" type="line" width="1.0"/>
+          <y:Arrows source="none" target="delta"/>
+          <y:StyleProperties>
+            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="CONNECTION_TYPE_SEQUENCE_FLOW"/>
+          </y:StyleProperties>
+        </y:GenericEdge>
+      </data>
+    </edge>
+    <edge id="e68" source="n50" target="n56">
+      <data key="d9"/>
+      <data key="d10">
+        <y:GenericEdge configuration="com.yworks.bpmn.Connection">
+          <y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
+          <y:LineStyle color="#000000" type="line" width="1.0"/>
+          <y:Arrows source="none" target="delta"/>
+          <y:StyleProperties>
+            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="CONNECTION_TYPE_SEQUENCE_FLOW"/>
+          </y:StyleProperties>
+        </y:GenericEdge>
+      </data>
+    </edge>
+    <edge id="e69" source="n56" target="n57">
+      <data key="d9"/>
+      <data key="d10">
+        <y:GenericEdge configuration="com.yworks.bpmn.Connection">
+          <y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
+          <y:LineStyle color="#000000" type="line" width="1.0"/>
+          <y:Arrows source="none" target="delta"/>
+          <y:StyleProperties>
+            <y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="CONNECTION_TYPE_SEQUENCE_FLOW"/>
+          </y:StyleProperties>
+        </y:GenericEdge>
+      </data>
+    </edge>
+    <edge id="e70" source="n57" target="n58">
      <data key="d9"/>
      <data key="d10">
        <y:GenericEdge configuration="com.yworks.bpmn.Connection">

--- a/script/Ddf_IDs_004.py
+++ b/script/Ddf_IDs_004.py
+import pandas as pd 
+
+# Load dataframes
+path_BKO = '/home/mribary/Dropbox/pyDigest/dump/BKO_v006.csv'
+BKO = pd.read_csv(path_BKO, index_col=0)
+path_BKO_ids = '/home/mribary/Dropbox/pyDigest/dump/Ddf_BKO_IDs_v003.csv'
+BKO_ids = pd.read_csv(path_BKO_ids, index_col=0)
+
+# Align BKO with BKO_IDs: look up the BKO_id whose BKO_label equals each Work_ref
+BKO_id_list = []
+for i in BKO.index:
+    match = BKO_ids.BKO_id[BKO_ids.BKO_label == BKO.Work_ref[i]]
+    if match.size:                      # At least one matching label
+        BKO_id = int(match.iloc[0])     # Take the first match as a plain int
+    else:                               # No matching label
+        BKO_id = None
+    BKO_id_list.append(BKO_id)
+BKO.insert(loc=3, column='BKO_id', value=BKO_id_list)
+
+BKO.to_csv('/home/mribary/Dropbox/pyDigest/dump/BKO_v007.csv')
\ No newline at end of file
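
The row-by-row lookup in `script/Ddf_IDs_004.py` could probably also be written as a single vectorised lookup. The following is only a sketch under stated assumptions, not the committed implementation: the helper name `add_bko_id` and the toy rows are hypothetical, and it assumes that a repeated `BKO_label` should resolve to its first `BKO_id`. Unmatched `Work_ref` values end up as `<NA>` in a nullable integer column rather than as `None`.

```python
import pandas as pd


def add_bko_id(bko: pd.DataFrame, bko_ids: pd.DataFrame, loc: int = 3) -> pd.DataFrame:
    """Return a copy of `bko` with a `BKO_id` column looked up via `Work_ref`.

    Hypothetical helper: not part of the pyDigest scripts.
    """
    # Build a label -> id lookup; keep the first id if a label repeats.
    lookup = bko_ids.drop_duplicates("BKO_label").set_index("BKO_label")["BKO_id"]
    out = bko.copy()
    # Unmatched Work_ref values become <NA>; Int64 keeps matched ids as integers.
    ids = out["Work_ref"].map(lookup).astype("Int64")
    # Clamp the position so the insert also works on narrow toy frames.
    out.insert(loc=min(loc, out.shape[1]), column="BKO_id", value=ids)
    return out


# Toy data, invented for illustration only (not taken from BKO_v006.csv).
bko = pd.DataFrame(
    {
        "Jurist_name": ["Ulpianus", "Paulus", "Gaius"],
        "Work": ["Ad edictum", "Ad edictum", "Institutiones"],
        "Work_ref": ["Ulp. ed.", "Paul. ed.", "Gai. inst."],
    }
)
bko_ids = pd.DataFrame({"BKO_id": [0, 1], "BKO_label": ["Paul. ed.", "Ulp. ed."]})

result = add_bko_id(bko, bko_ids, loc=2)
print(result)
```

Keeping the lookup in a function that takes dataframes rather than file paths makes it possible to check the alignment on small inputs before running it on the files in `dump`.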