Mortaza Shoae Bargh, Research and Data Center, Dutch Ministry of Justice and Security, The Netherlands
Sunil Choenni, Research and Data Center, Dutch Ministry of Justice and Security, The Netherlands
Data and data sharing foster information systems that can create value for society, individuals, businesses and organizations. Data sharing and usage require an appropriate data ecosystem in which solid and effective data governance and management are in place to deal with the associated risks, such as data being biased, personal, sensitive or stigmatizing, to name a few. Data lineage is a necessary means for data governance and management. In this contribution, we revisit the objectives of data lineage and investigate how it can be deployed in cross-organizational settings. Specifically, we provide an overview of the objectives to which contemporary data lineage can contribute, revise the existing definition(s) of data lineage and adapt them to cross-organizational settings, and propose architectural models for deploying data lineage across loosely coupled, semi-autonomous organizations. Our revised definition of data lineage conceptualizes the physical distribution of data-related objects as well as the semantic distribution of the concepts that relate or apply to those data objects. The proposed architecture for managing data lineage metadata relies on existing organizational structures and can therefore scale up organically, as seen in the case of federated identity management in academia.