Commit d86b3b96 authored by Ribary, Marton Dr (School of Law)

Load BKO_id to BKO

parent d1615c03
......@@ -6,7 +6,7 @@ The documentation describes the creation of the core dataframes of the `pyDigest
The documentation follows the order of output files stored in the project's `dump` folder. The titles of individual steps include the method and the name of the output file in the format of `[Method] > [outputFile]`. The `[Method]` is either (1) manual or (2) executed in Python files which are stored in the `script` folder. The `[Method]` and `[outputFile]` correspond with elements in `Ddf_flowchart`. These steps document the data manipulation pipeline from the raw text of the _Digest_ to its [relational database](link).
![Ddf_flowchart](https://raw.githubusercontent.com/mribary/pyDigest/master/images/Ddf_flowchart.png)
![Ddf_flowchart](https://raw.githubusercontent.com/mribary/pyDigest/master/images/Ddf_flowchart.graphml)
1. Manual editing > ROMTEXT.txt
......@@ -212,6 +212,10 @@ The manually corrected `Ddf_v104.csv` and `BKO_v004.csv` files are free from err
Manual inspection has revealed that Greek script had been lost during manual editing in step 7 above. The Python file recovers the `TextUnit` column from `Ddf_v102.csv` and inserts it into `Ddf_v105.csv`. The output `csv` file is streamlined to keep the index, the four-level _Digest_ reference and the text only. These columns will be used to create the SQL database.
12. Manual editing > BKO_v006.csv
Based on `Work_ref`, a verbose `title` is added to the `BKO` dataframe which includes the title of the work in an unabbreviated format and drops the book numbers where the work in the `BKO` dataframe is split. Where the verbose title could not be verified by checking against Mommsen's print edition, a `?` mark is added to the title to indicate that the data may need to be updated at a later point.
### 3. Additional dataframes
1. Sections dataframes
......@@ -224,7 +228,7 @@ An additional dataframe including section_IDs with their corresponding section t
2. ID dataframes
Ddf_IDs_001.py > Ddf_IDs_001.csv, Ddf_BKO_IDs_001.csv, Ddf_Work_IDs_001.csv, Ddf_Book_IDs_001.csv
`Ddf_IDs_001.py > Ddf_IDs_001.csv, Ddf_BKO_IDs_001.csv, Ddf_Work_IDs_001.csv, Ddf_Book_IDs_001.csv`
The script initiates a dataframe `df` with the `BKO_key`, `Work` and `TextUnit_ref` columns from `Ddf_v104.csv`. It creates separate dataframes for unique `BKO_key` (294), `Work` (251), and `TextUnit_ref` values (1352) where values are sorted alphabetically and associated with a unique ID. These dataframes are exported as `Ddf_BKO_IDs_v001.csv`, `Ddf_Work_IDs_v001.csv` and `Ddf_Book_IDs_v001.csv`.
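The ID step described above can be sketched as follows. This is a minimal illustration rather than the contents of `Ddf_IDs_001.py`: the sample rows, the helper `make_id_table`, the `Work_label` column name and the choice to start IDs at 0 are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the columns read from Ddf_v104.csv
df = pd.DataFrame({
    "BKO_key": ["S", "E", "S", "P"],
    "Work": ["Ulp. ad ed.", "Paul. ad ed.", "Ulp. ad ed.", "Gai. inst."],
})


def make_id_table(values: pd.Series, label_col: str, id_col: str) -> pd.DataFrame:
    """Pair the alphabetically sorted unique values of a column with a running ID."""
    labels = sorted(values.dropna().unique())
    # Assumption: IDs are consecutive integers starting at 0
    return pd.DataFrame({id_col: range(len(labels)), label_col: labels})


work_ids = make_id_table(df["Work"], label_col="Work_label", id_col="Work_id")
bko_ids = make_id_table(df["BKO_key"], label_col="BKO_label", id_col="BKO_id")
print(work_ids)
```

Each resulting table could then be written out with `DataFrame.to_csv`, one file per reference type.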
......@@ -232,11 +236,11 @@ The script links the reference IDs above with the 21055 text units of the Digest
3. Jurists dataframes
> 3.1. Ddf_jurists.py > Jurists_v001.csv
> 3.1. `Ddf_jurists.py > Jurists_v001.csv`
The script initiates a dataframe `df` with the `Jurist_name` column from `BKO_v004.csv`. It strips whitespace, orders the list of unique values, associates items with unique IDs and outputs the `Jurists_v001.csv` file to be enriched with data manually.
> 3.2. Manual editing: Jurists_v001.csv > Jurists_v002.csv, Ddf_v105.csv
> 3.2. `Manual editing: Jurists_v001.csv > Jurists_v002.csv, Ddf_v105.csv`
Jurists are associated with a date range of their lifetime according to information available in Adolf Berger's _Dictionary of Roman law_[<sup id="inline6">6</sup>](#fn6) consulted in conjunction with _Paulys Realencyclopädie der classischen Altertumswissenschaft_[<sup id="inline7">7</sup>](#fn7). The manually edited `Jurists_v002.csv` includes a `Note` column which explains how the date range is estimated and a `Reference` column with information pointing to _Berger_ and the _RE_. `Start_date` corresponds to the (estimated) birth of the jurist, `Mid_date` to his (estimated) most active period (_floruit_) at the age of 40, and `End_date` to his (estimated) death at the age of 60. Where an exact date is available for any of the three from _Berger_ and the _RE_, it is entered into the appropriate column instead of the estimate.
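The 40/60 rule can be illustrated with a short sketch. The dates in `Jurists_v002.csv` were entered manually, so the function below is only one possible reading of the rule: `estimate_dates` is a hypothetical helper, years are assumed to be signed integers, and the example values are arbitrary rather than taken from the dataset.

```python
from typing import Optional, Tuple

FLORUIT_AGE = 40  # age at the most active period, per the rule above
DEATH_AGE = 60    # estimated age at death, per the rule above


def estimate_dates(start: Optional[int] = None,
                   mid: Optional[int] = None,
                   end: Optional[int] = None) -> Tuple[int, int, int]:
    """Fill in missing dates from whichever date is known; known dates are kept."""
    if start is not None:
        birth = start
    elif mid is not None:
        birth = mid - FLORUIT_AGE
    elif end is not None:
        birth = end - DEATH_AGE
    else:
        raise ValueError("at least one date is required")
    return (
        start if start is not None else birth,
        mid if mid is not None else birth + FLORUIT_AGE,
        end if end is not None else birth + DEATH_AGE,
    )


only_floruit = estimate_dates(mid=200)              # birth and death are estimated
birth_and_death = estimate_dates(start=100, end=170)  # only the floruit is estimated
print(only_floruit, birth_and_death)
```

Exact dates passed in are returned unchanged, which mirrors the statement that exact dates replace the estimate in the corresponding column.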
......@@ -244,14 +248,20 @@ For those jurists who are dated only by rough estimates in the _RE_ or _Berger_,
Minor typos and alternative versions of headings are corrected in `Ddf`, which is updated manually as `Ddf_v105.csv`. The consistency of the `Ddf` and `BKO` dataframes is checked in `Ddf_BKO_check_4.py`.
> 3.3. Ddf_IDs_002.py > Ddf_IDs_002.csv, Ddf_BKO_IDs_002.csv, Ddf_Work_IDs_002.csv, Ddf_Book_IDs_002.csv
> 3.3. `Ddf_IDs_002.py > Ddf_IDs_002.csv, Ddf_BKO_IDs_002.csv, Ddf_Work_IDs_002.csv, Ddf_Book_IDs_002.csv`
ID dataframes are updated according to manually edited files in the previous step.
> 3.4. Ddf_IDs_003.py > BKO_v005.csv, Ddf_IDs_003.csv, Ddf_BKO_IDs_003.csv, Ddf_Work_IDs_003.csv, Ddf_Book_IDs_003.csv
> 3.4. `Ddf_IDs_003.py > BKO_v005.csv, Ddf_IDs_003.csv, Ddf_BKO_IDs_003.csv, Ddf_Work_IDs_003.csv, Ddf_Book_IDs_003.csv`
`Jurist_id` and `Mid_date` columns are inserted into the `BKO`, `BKO_IDs` and `Ddf_IDs` dataframes by merging. The `Work_IDs` and `Book_IDs` dataframes are updated by removing duplicate values in the `Book_id` and `Work_id` columns in the new `Ddf_IDs` dataframe. All text units in `Ddf`, all elements in `BKO`, and all elements in the ID dataframes are now associated with a date which is stipulated to be the most active period of the corresponding jurist.
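A merge of this kind might look like the sketch below. It is not the code of `Ddf_IDs_003.py`: the sample rows and dates are invented placeholders, and joining on `Jurist_name` is an assumption, since the text does not name the merge key.

```python
import pandas as pd

# Hypothetical stand-ins for Jurists_v002.csv and the BKO dataframe
jurists = pd.DataFrame({
    "Jurist_id": [0, 1],
    "Jurist_name": ["Jurist A", "Jurist B"],
    "Mid_date": [150, 210],  # placeholder values, not real data
})
bko = pd.DataFrame({
    "Work_ref": ["Work 1", "Work 2", "Work 3"],
    "Jurist_name": ["Jurist B ", "Jurist A", "Jurist B"],
})

# Strip stray whitespace so that the join keys match exactly
bko["Jurist_name"] = bko["Jurist_name"].str.strip()

# A left merge keeps every BKO row; validate raises if a jurist appears twice
merged = bko.merge(
    jurists[["Jurist_name", "Jurist_id", "Mid_date"]],
    on="Jurist_name",
    how="left",
    validate="many_to_one",
)
print(merged)
```

The `validate` argument is a guard against accidental row duplication, which would otherwise silently inflate the merged dataframe.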
4. BKO dataframe alignment
`Ddf_IDs_004.py > BKO_v007.csv`
A new column `BKO_id` is added to the `BKO` dataframe which aligns `Work_ref` in the `BKO` dataframe with `BKO_label` in the `Ddf_BKO_IDs` dataframe. `None` is entered where `Work_ref` cannot be matched with a `BKO_label`.
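The same alignment can also be expressed without an explicit loop. The sketch below is an alternative illustration, not the method used in `Ddf_IDs_004.py`: it uses a lookup with `Series.map` on invented sample rows, and it marks unmatched references as `<NA>` in a nullable integer column instead of `None`.

```python
import pandas as pd

# Hypothetical stand-ins for the BKO and Ddf_BKO_IDs dataframes
bko = pd.DataFrame({"Work_ref": ["Work A", "Work B", "Work X"]})
bko_ids = pd.DataFrame({"BKO_id": [0, 1], "BKO_label": ["Work A", "Work B"]})

# Build a label -> id lookup; labels are de-duplicated so the index is unique
lookup = bko_ids.drop_duplicates("BKO_label").set_index("BKO_label")["BKO_id"]

# Unmatched Work_ref values become missing; Int64 keeps matched IDs as integers
bko["BKO_id"] = bko["Work_ref"].map(lookup).astype("Int64")
print(bko)
```

Counting `bko["BKO_id"].isna().sum()` afterwards gives a quick measure of how many works still lack a matching label.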
### Footnotes
[<sup id="fn1">1</sup>](#inline1) Georg Klingenberg, "Die ROMTEXT-Datenbank," _Informatica e diritto_ 4 (1995): 223-232.
......
......@@ -855,7 +855,7 @@
<data key="d6">
<y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
<y:Geometry height="55.0" width="35.0" x="-503.74999999999994" y="647.5"/>
<y:Fill color="#FBBC05" transparent="false"/>
<y:Fill color="#FFFFFFE6" transparent="false"/>
<y:BorderStyle color="#000000" type="line" width="1.0"/>
<y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="e" textColor="#000000" verticalTextPosition="bottom" visible="true" width="89.318359375" x="39.0" xml:space="preserve" y="18.515625">BKO_v005.csv</y:NodeLabel>
<y:StyleProperties>
......@@ -920,7 +920,6 @@
</data>
</node>
<node id="n54">
<data key="d5"/>
<data key="d6">
<y:GenericNode configuration="com.yworks.bpmn.Event.withShadow">
<y:Geometry height="30.0" width="30.0" x="-315.0" y="1230.0"/>
......@@ -928,8 +927,8 @@
<y:BorderStyle color="#27AE27" type="line" width="1.0"/>
<y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="s" textColor="#000000" verticalTextPosition="bottom" visible="true" width="72.51953125" x="-21.259765625" xml:space="preserve" y="34.0">Ddf_005.py</y:NodeLabel>
<y:StyleProperties>
<y:Property class="com.yworks.yfiles.bpmn.view.EventCharEnum" name="com.yworks.bpmn.characteristic" value="EVENT_CHARACTERISTIC_START"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
<y:Property class="com.yworks.yfiles.bpmn.view.EventCharEnum" name="com.yworks.bpmn.characteristic" value="EVENT_CHARACTERISTIC_START"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="EVENT_TYPE_CONDITIONAL"/>
......@@ -938,13 +937,66 @@
</data>
</node>
<node id="n55">
<data key="d5"/>
<data key="d6">
<y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
<y:Geometry height="55.0" width="35.0" x="-126.25" y="1217.5"/>
<y:Fill color="#FBBC05" transparent="false"/>
<y:BorderStyle color="#000000" type="line" width="1.0"/>
<y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="s" textColor="#000000" verticalTextPosition="bottom" visible="true" width="84.853515625" x="-24.9267578125" xml:space="preserve" y="59.0">Ddf_v106.csv</y:NodeLabel>
<y:StyleProperties>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
<y:Property class="com.yworks.yfiles.bpmn.view.DataObjectTypeEnum" name="com.yworks.bpmn.dataObjectType" value="DATA_OBJECT_TYPE_PLAIN"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="ARTIFACT_TYPE_DATA_OBJECT"/>
</y:StyleProperties>
</y:GenericNode>
</data>
</node>
<node id="n56">
<data key="d5"/>
<data key="d6">
<y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
<y:Geometry height="55.0" width="35.0" x="-503.75" y="546.25"/>
<y:Fill color="#FFFFFFE6" transparent="false"/>
<y:BorderStyle color="#000000" type="line" width="1.0"/>
<y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="e" textColor="#000000" verticalTextPosition="bottom" visible="true" width="89.318359375" x="39.0" xml:space="preserve" y="18.515625">BKO_v006.csv</y:NodeLabel>
<y:StyleProperties>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="ARTIFACT_TYPE_DATA_OBJECT"/>
<y:Property class="com.yworks.yfiles.bpmn.view.DataObjectTypeEnum" name="com.yworks.bpmn.dataObjectType" value="DATA_OBJECT_TYPE_PLAIN"/>
</y:StyleProperties>
</y:GenericNode>
</data>
</node>
<node id="n57">
<data key="d5"/>
<data key="d6">
<y:GenericNode configuration="com.yworks.bpmn.Event.withShadow">
<y:Geometry height="30.0" width="30.0" x="-594.375" y="558.75"/>
<y:Fill color="#FFFFFFE6" color2="#D4D4D4CC" transparent="false"/>
<y:BorderStyle color="#27AE27" type="line" width="1.0"/>
<y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="custom" textColor="#000000" verticalTextPosition="bottom" visible="true" width="97.55078125" x="-33.775390625" xml:space="preserve" y="34.0">Ddf_IDs_004.py<y:LabelModel><y:SmartNodeLabelModel distance="4.0"/></y:LabelModel><y:ModelParameter><y:SmartNodeLabelModelParameter labelRatioX="0.0" labelRatioY="-0.5" nodeRatioX="0.0" nodeRatioY="0.5" offsetX="0.0" offsetY="4.0" upX="0.0" upY="-1.0"/></y:ModelParameter></y:NodeLabel>
<y:StyleProperties>
<y:Property class="com.yworks.yfiles.bpmn.view.EventCharEnum" name="com.yworks.bpmn.characteristic" value="EVENT_CHARACTERISTIC_START"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill" value="#ffffffe6"/>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="EVENT_TYPE_CONDITIONAL"/>
</y:StyleProperties>
</y:GenericNode>
</data>
</node>
<node id="n58">
<data key="d5"/>
<data key="d6">
<y:GenericNode configuration="com.yworks.bpmn.Artifact.withShadow">
<y:Geometry height="55.0" width="35.0" x="-690.0" y="546.25"/>
<y:Fill color="#FBBC05" transparent="false"/>
<y:BorderStyle color="#000000" type="line" width="1.0"/>
<y:NodeLabel alignment="center" autoSizePolicy="content" fontFamily="Dialog" fontSize="12" fontStyle="plain" hasBackgroundColor="false" hasLineColor="false" height="17.96875" horizontalTextPosition="center" iconTextGap="4" modelName="sides" modelPosition="n" textColor="#000000" verticalTextPosition="bottom" visible="true" width="89.318359375" x="-27.1591796875" xml:space="preserve" y="-21.96875">BKO_v007.csv</y:NodeLabel>
<y:StyleProperties>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.line.color" value="#000000"/>
<y:Property class="java.awt.Color" name="com.yworks.bpmn.icon.fill2" value="#d4d4d4cc"/>
......@@ -1749,7 +1801,6 @@
</data>
</edge>
<edge id="e66" source="n41" target="n54">
<data key="d9"/>
<data key="d10">
<y:GenericEdge configuration="com.yworks.bpmn.Connection">
<y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
......@@ -1762,6 +1813,44 @@
</data>
</edge>
<edge id="e67" source="n54" target="n55">
<data key="d10">
<y:GenericEdge configuration="com.yworks.bpmn.Connection">
<y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
<y:LineStyle color="#000000" type="line" width="1.0"/>
<y:Arrows source="none" target="delta"/>
<y:StyleProperties>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="CONNECTION_TYPE_SEQUENCE_FLOW"/>
</y:StyleProperties>
</y:GenericEdge>
</data>
</edge>
<edge id="e68" source="n50" target="n56">
<data key="d9"/>
<data key="d10">
<y:GenericEdge configuration="com.yworks.bpmn.Connection">
<y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
<y:LineStyle color="#000000" type="line" width="1.0"/>
<y:Arrows source="none" target="delta"/>
<y:StyleProperties>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="CONNECTION_TYPE_SEQUENCE_FLOW"/>
</y:StyleProperties>
</y:GenericEdge>
</data>
</edge>
<edge id="e69" source="n56" target="n57">
<data key="d9"/>
<data key="d10">
<y:GenericEdge configuration="com.yworks.bpmn.Connection">
<y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0"/>
<y:LineStyle color="#000000" type="line" width="1.0"/>
<y:Arrows source="none" target="delta"/>
<y:StyleProperties>
<y:Property class="com.yworks.yfiles.bpmn.view.BPMNTypeEnum" name="com.yworks.bpmn.type" value="CONNECTION_TYPE_SEQUENCE_FLOW"/>
</y:StyleProperties>
</y:GenericEdge>
</data>
</edge>
<edge id="e70" source="n57" target="n58">
<data key="d9"/>
<data key="d10">
<y:GenericEdge configuration="com.yworks.bpmn.Connection">
......
import pandas as pd

# Load dataframes
path_BKO = '/home/mribary/Dropbox/pyDigest/dump/BKO_v006.csv'
BKO = pd.read_csv(path_BKO, index_col=0)
path_BKO_ids = '/home/mribary/Dropbox/pyDigest/dump/Ddf_BKO_IDs_v003.csv'
BKO_ids = pd.read_csv(path_BKO_ids, index_col=0)

# Align BKO and BKO_IDs and load BKO_id to the BKO
BKO_id_list = []
for i in BKO.index:
    # BKO_id values whose BKO_label equals this row's Work_ref
    matches = BKO_ids.BKO_id[BKO_ids.BKO_label == BKO.Work_ref[i]]
    if matches.size:  # Check if there is at least one match
        # Labels are expected to be unique, so take the first match
        BKO_id = int(matches.iloc[0])
    else:  # If there is no match
        BKO_id = None
    BKO_id_list.append(BKO_id)

# Insert the new column at position 3 and save the result
BKO.insert(loc=3, column='BKO_id', value=BKO_id_list)
BKO.to_csv('/home/mribary/Dropbox/pyDigest/dump/BKO_v007.csv')
\ No newline at end of file