There are four MDM (Master Data Management) implementation styles:
—> The Registry style
—> The Consolidation style
—> The Coexistence style
—> The Centralized style
Registry :
—> Data is authored in different systems
—> Data is then sent to the MDM
—> MDM deduplicates data
—> Data is cleansed using match and merge.
—> Data is not sent back to source systems.
—> Data from all systems can now be analysed, and duplicates spotted, in one central place.
Consolidation :
—> Data is authored in different systems
—> Data is then sent to the MDM
—> MDM deduplicates data
—> Data is cleansed using match and merge
—> Data is not sent back to source systems
—> Used as a high quality DW/BI system for reporting and analytical purposes.
Coexistence :
—> Data is authored in different systems
—> Data is then sent to the MDM
—> MDM deduplicates data
—> Data is cleansed using match and merge
—> Clean data is sent back to source systems
—> Single version of truth shared between MDM and source systems.
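The match and merge step shared by the Registry, Consolidation and Coexistence styles can be sketched roughly as below. This is only an illustration: the match rule (same e-mail address), the field names and the sample records are all assumptions, and real MDM tools use far richer matching.

```python
from collections import defaultdict

def match_key(record):
    """Hypothetical match rule: same e-mail, ignoring case and surrounding spaces."""
    return record["email"].strip().lower()

def match_and_merge(records):
    """Group records that match, then merge each group into one master record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    masters = []
    for key, group in groups.items():
        merged = {}
        for record in group:
            for field, value in record.items():
                if field == "source":
                    continue
                # Keep the first non-empty value seen for each field.
                if value and not merged.get(field):
                    merged[field] = value
        merged["email"] = key  # store the normalised e-mail
        merged["sources"] = sorted({r["source"] for r in group})
        masters.append(merged)
    return masters

# Made-up records authored in two source systems.
records = [
    {"source": "CRM", "email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""},
    {"source": "ERP", "email": "ann@example.com", "name": "", "phone": "555-0100"},
    {"source": "CRM", "email": "bob@example.com", "name": "Bob Roy", "phone": ""},
]

masters = match_and_merge(records)
for master in masters:
    print(master)
```

In the Registry and Consolidation styles the `masters` list would stay in the hub. In the Coexistence style each merged record would also be sent back to the systems listed in its `sources`.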
Centralized :
—> Master data is authored in MDM, not in the systems.
—> All systems subscribe to the MDM hub for Master data.
—> MDM is the sole source of data.
—> Systems do not write back data to MDM.
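The one-way flow of the Centralized style can be pictured as a small publish/subscribe sketch. The `MdmHub` class, its method names and the two subscribing systems are hypothetical and only illustrate the direction of the data flow; they are not the API of any MDM product.

```python
class MdmHub:
    """Hypothetical hub: the only place where master data is authored."""

    def __init__(self):
        self._records = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a system that wants to receive master data changes."""
        self._subscribers.append(callback)

    def author(self, record_id, record):
        """Create or update a master record, then push a copy to every subscriber."""
        self._records[record_id] = dict(record)
        for notify in self._subscribers:
            notify(record_id, dict(record))


def make_receiver(local_store):
    """Build a callback that saves incoming master data into a system's local store."""
    def receive(record_id, record):
        local_store[record_id] = record
    return receive


crm_copy, billing_copy = {}, {}
hub = MdmHub()
hub.subscribe(make_receiver(crm_copy))
hub.subscribe(make_receiver(billing_copy))

# Data is authored once, in the hub, and flows out to the systems.
hub.author("C-1", {"name": "Ann Lee", "country": "NZ"})
print(crm_copy)
print(billing_copy)
```

The subscribing systems only receive data. Nothing in the sketch lets them write back to the hub, which mirrors the last point above.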
Style | Registry | Consolidation | Coexistence | Centralized |
Source of data | Systems | Systems | Systems | MDM |
Complexity | Low | Medium | High | High |
Single Source of Truth | MDM | MDM | MDM + Systems | MDM |
Data sent back to source systems? | No | No | Yes | N/A (authored in MDM) |
Best used for | Data analysis for compliance | BI/Reporting | Distributed Source of Truth | Top down approach |
System of Record (SOR) :
—> The authoritative data source for a given data element or piece of information.
—> The data repository where a data object, as a whole or specific attributes of it, is maintained.
—> Maintenance includes creating, updating and deleting data.
System of Record Characteristics
—> Durable
Eliminate single point of failure
—> Correct
Storage technology
—> Restorable
Be able to restore backed-up data
—> Disaster-Ready
Be able to switch to a DR (disaster recovery) site in case of a disaster
Golden Record :
In some cases, there is no single system that has the complete data set.
—> Bits and pieces of the required data are stored in multiple systems.
—> We cannot easily combine the bits and pieces to make a complete, meaningful data set.
In other cases, the same data with the same attributes is available in multiple systems.
—> But the data values differ, and we are not sure which of them is accurate.
The solution would be to create a ‘Golden Record’ in a single source of truth (SSOT) by compiling element attributes from the different systems.
MDM creates a master record (also known as a ‘Golden Record’ or ‘Best Version of the Truth’) that contains the essential information upon which a business or organization relies.
The ‘Golden Record’ contains what an organization needs to know about critical ‘things’: a customer, location, product, supplier and so on.
Single Source of Truth (SSOT) :
—> The Single Source of Truth (SSOT), sometimes called the Golden Source of Truth, is a trusted data source that hosts a complete picture of a given data element.
—> It contains the ‘Golden Records’.
—> It can be used as a source for any Business Intelligence and Data Warehouse system.
Deduplication Approaches
—> Survival of the fittest record
—> Use of Data Survivorship Rules
Survival of the fittest record :
The simplest approach is to select the record that is the most fit according to a data quality rule. The rule(s) that determine which record survives are most often based on either :
—> Lineage, where the source systems are prioritised : the record that belongs to the highest-priority source is selected as the golden record.
—> Completeness, i.e. which record has the most fields and characters filled : the record with the highest percentage of completed fields is selected as the golden record.
The downside of this approach : the surviving record carries values from one source only, so better values held in the other sources are lost.
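Both selection rules, and the downside, can be shown in a few lines. The source priorities, field names and sample records are invented for illustration, and completeness is simplified here to count filled fields only (not characters).

```python
# Hypothetical lineage ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"CRM": 1, "ERP": 2, "WEB": 3}

def completeness(record):
    """Fraction of data fields (everything except 'source') that are non-empty."""
    fields = [f for f in record if f != "source"]
    filled = sum(1 for f in fields if record[f])
    return filled / len(fields)

def fittest_by_lineage(duplicates):
    """Pick the record that comes from the highest-priority source."""
    return min(duplicates, key=lambda r: SOURCE_PRIORITY[r["source"]])

def fittest_by_completeness(duplicates):
    """Pick the record with the highest share of completed fields."""
    return max(duplicates, key=completeness)

# Two made-up records that were matched as the same customer.
duplicates = [
    {"source": "CRM", "name": "Ann Lee", "phone": "", "city": ""},
    {"source": "ERP", "name": "A. Lee", "phone": "555-0100", "city": "Leeds"},
]

print(fittest_by_lineage(duplicates)["source"])
print(fittest_by_completeness(duplicates)["source"])
```

With this sample the lineage rule keeps the CRM record and so loses the phone number and city that only ERP holds, while the completeness rule keeps the ERP record and loses the full name from CRM.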
Use of Data Survivorship Rules :
In order to determine which field values should be selected while establishing the golden record in the master hub, it will be necessary to define data survivorship rules.
Data Survivorship Rules are a set of rules applied to the same data element originating from different source systems, in order to determine the golden record version of that data element.
Data Survivorship Rule factors
Factor | Description |
Accuracy | Records from a specific system might have a higher accuracy than records in any other system, regardless of all other factors. |
Recency | A record that was created or updated more recently is more reliable than a record that was last updated years ago. |
Frequency | A field value that is the same in several systems is more reliable than a value that appears only once, in another system. |
Completeness | A record with more completed fields is more reliable than a record with fewer completed fields. |
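Unlike the survival of the fittest record, survivorship rules work field by field. Below is a minimal sketch of two of the factors, recency and frequency. Which rule goes with which field, along with the field names and sample data, is an assumption made for the example.

```python
from collections import Counter
from datetime import date

def most_recent(candidates):
    """Recency: take the value from the most recently updated record."""
    return max(candidates, key=lambda c: c["updated"])["value"]

def most_frequent(candidates):
    """Frequency: take the value that appears in the most source systems."""
    counts = Counter(c["value"] for c in candidates)
    return counts.most_common(1)[0][0]

# Hypothetical assignment of one survivorship rule per field.
RULES = {"phone": most_recent, "city": most_frequent}

def build_golden_record(duplicates):
    """Apply each field's rule to the non-empty values found across sources."""
    golden = {}
    for field, rule in RULES.items():
        candidates = [
            {"value": r[field], "updated": r["updated"]}
            for r in duplicates
            if r.get(field)
        ]
        if candidates:
            golden[field] = rule(candidates)
    return golden

# Made-up duplicates of one customer, held in three systems.
duplicates = [
    {"source": "CRM", "updated": date(2021, 3, 1), "phone": "555-0100", "city": "Leeds"},
    {"source": "ERP", "updated": date(2023, 7, 9), "phone": "555-0199", "city": "York"},
    {"source": "WEB", "updated": date(2022, 1, 5), "phone": "", "city": "Leeds"},
]

golden = build_golden_record(duplicates)
print(golden)
```

Here the phone number comes from ERP (the latest update) while the city comes from the value that CRM and WEB agree on, so the golden record mixes values from several sources.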