DAMA - Rocky Mountain Chapter

DMBoK Figure 91 Context Diagram: Data Quality

12/31/2024 7:00 AM | Anonymous member

Data Quality can be defined as the degree to which dimensions of Data Quality meet the requirements. This implies that requirements should be formulated for each (relevant) dimension. A much shorter definition for quality of data is ‘fit for purpose.’

Data that meets the requirements is of sufficient quality; data that does not meet the requirements is of insufficient quality. For simplicity, we refer to these as high-quality and low-quality (or poor-quality) data, respectively.

Effective Data Management involves a set of interrelated processes enabling an organization to use its data to achieve strategic goals. An underlying assertion is that the data itself is of high quality. Data Quality Management is the planning, implementation, and control of activities that apply quality management techniques to data in order to assure it is fit for consumption and meets the needs of data consumers.

High quality data is context driven. This means that the same data may be viewed as high quality by some areas of an organization while simultaneously being viewed as low quality by others. Many organizations fail to engage with this question of context, namely that high Data Quality means data that is fit for a specific purpose.
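To make the "requirements per dimension" and "context driven" ideas concrete, here is a minimal sketch, assuming each consumer states a minimum score for the dimensions it cares about. The dimension names, scores, thresholds, and consumer names (`monthly_reporting`, `real_time_operations`) are hypothetical and for illustration only.

```python
# Hypothetical sketch: the same measured data judged against two consumers'
# requirements. Dimension names, scores, and thresholds are assumptions.

# Measured scores for one data set, per Data Quality dimension (0.0 to 1.0).
MEASURED = {"completeness": 0.97, "validity": 0.92, "timeliness": 0.80}

# Each consumer formulates a minimum score for the dimensions relevant to it.
REQUIREMENTS = {
    "monthly_reporting": {"completeness": 0.95, "validity": 0.90},
    "real_time_operations": {"completeness": 0.95, "timeliness": 0.95},
}


def unmet_dimensions(measured: dict[str, float], required: dict[str, float]) -> list[str]:
    """Dimensions whose measured score falls short of the stated requirement.
    A dimension that was never measured counts as unmet."""
    return [
        dim
        for dim, minimum in required.items()
        if dim not in measured or measured[dim] < minimum
    ]


def fit_for_purpose(measured: dict[str, float], required: dict[str, float]) -> bool:
    """Sufficient ('high') quality for a consumer means every requirement is met."""
    return not unmet_dimensions(measured, required)


if __name__ == "__main__":
    for consumer, required in REQUIREMENTS.items():
        verdict = "high" if fit_for_purpose(MEASURED, required) else "low"
        print(consumer, verdict, unmet_dimensions(MEASURED, required))
```

With these assumed numbers, the same data counts as high quality for monthly reporting but low quality for real-time operations, because only the latter formulated a strict timeliness requirement.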

If we understand organizations as data manufacturing machines, we can assert (from our experience in manufacturing) that organizations that formally manage the quality of data will be more effective, more efficient and deliver a better experience than those that leave Data Quality to chance. However, no organization has perfect business processes, technical processes, or data management practices. In reality, all organizations experience problems related to their Data Quality. Many factors undermine quality data: lack of understanding about the effects on organizational success, leadership that does not value Data Quality, poor planning, ‘siloed’ system design, inconsistent development processes, incomplete documentation, a lack of standards, or a lack of Data Governance.

As is the case with Data Governance and with Data Management as a whole, Data Quality Management is a function, not a program or project. This is because projects and even programs have starts, middles, and ends. A Data Quality Function is, or should be, a continuing business as usual set of activities. It will include both projects and programs (to address specific Data Quality improvements) as well as operational work, along with a commitment to communications and training. Most importantly, the long-term success of a Data Quality improvement program depends on getting an organization to change its culture and adopt a quality mindset. As stated in The Leader’s Data Manifesto, “fundamental, lasting change requires committed leadership and involvement from people at all levels in an organization.” People who use data to do their jobs – which in most organizations is a very large percentage of employees – need to drive change. One of the most critical changes to focus on is how their organizations manage and improve the quality of their data.

Formal Data Quality Management is similar to continuous quality management for other product manufacturing. It includes managing data through its lifecycle by setting standards, building quality into the processes that create, transform, and store data, and measuring data against standards. Managing data to this level usually requires a Data Quality Function team. The Data Quality Function team is responsible for engaging both business and technical data management professionals and driving the work of applying quality management techniques to data to ensure that data is fit for consumption for a variety of purposes. The team will likely be involved with a series of projects through which they can establish processes and best practices while addressing high priority data issues.
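"Measuring data against standards" can start very small. The sketch below is one illustration, assuming two simple standards for a single field: completeness (a value is present) and validity (the value matches a format rule). The field name, sample records, and the US ZIP-code rule are hypothetical.

```python
# Hypothetical sketch: measuring one field against two simple standards.
# The field name, sample records, and ZIP-code rule are assumptions.
import re

ZIP_CODE = re.compile(r"\d{5}(-\d{4})?")  # assumed standard: US ZIP or ZIP+4


def measure(records: list[dict], field: str, rule: re.Pattern) -> dict[str, float]:
    values = [record.get(field) for record in records]
    populated = [v for v in values if v not in (None, "")]
    valid = [v for v in populated if rule.fullmatch(v)]
    return {
        # share of records where the field is populated
        "completeness": len(populated) / len(values) if values else 0.0,
        # share of populated values that conform to the format rule
        "validity": len(valid) / len(populated) if populated else 0.0,
    }


if __name__ == "__main__":
    sample = [{"zip": "80202"}, {"zip": "80202-1234"}, {"zip": "8020"}, {"zip": ""}, {}]
    print(measure(sample, "zip", ZIP_CODE))
```

Scores like these, tracked over time, are what a Data Quality Function team would compare against each consumer's requirements.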

#damarmc TechYeet Slack Channel

12/27/2024 7:00 AM | Anonymous member

DAMA Rocky Mountain Chapter is happy to announce its partnership with TechYeet, a Slack community focused on connecting people across the wider data and technology communities. With over 5,000 TechYeet members, the platform gives DAMA-RMC a way to bring people together in the Data Management space.

Please reach out to Greg Sheridan PMI-ACP, VP of Partnerships & Sponsorships, at PartnershipsVP@damarmc.org, if you are in TechYeet and would like to join the DAMA-RMC channel.

DMBoK Figure 90 Sample System Lineage Flow Diagram

12/26/2024 7:00 AM | Anonymous member

Although a lineage graphic, such as in last week's figure, describes what is happening to a particular data element, not all business users will understand it. Higher levels of lineage (e.g., ‘System Lineage’) summarize movement at the system or application level. Many visualization tools provide zoom-in / zoom-out capability, to show data element lineage in the context of system lineage. For example, this figure shows a sample system lineage, where at a glance, general data movement is understood and visualized at a system or an application level.
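One way to picture the zoom-out from data element lineage to system lineage is to collapse element-level edges into system-level edges. This is a minimal sketch under the assumption that every element is tagged with the system it lives in; the system and element names are hypothetical.

```python
# Hypothetical sketch: collapsing element-level lineage into system lineage.
# System and element names are assumptions for illustration.

Element = tuple[str, str]  # (system, data element)


def system_lineage(edges: list[tuple[Element, Element]]) -> list[tuple[str, str]]:
    """Keep one (source system, target system) pair per cross-system movement;
    movement within a single system is hidden at this level."""
    return sorted({(src[0], tgt[0]) for src, tgt in edges if src[0] != tgt[0]})


ELEMENT_EDGES: list[tuple[Element, Element]] = [
    (("CRM", "customer.id"), ("Staging", "stg_customer.id")),
    (("CRM", "customer.name"), ("Staging", "stg_customer.name")),
    (("Staging", "stg_customer.id"), ("Warehouse", "dim_customer.id")),
    (("Staging", "stg_customer.name"), ("Warehouse", "dim_customer.name")),
    (("Warehouse", "dim_customer.id"), ("Warehouse", "fact_sales.customer_id")),
]

if __name__ == "__main__":
    print(system_lineage(ELEMENT_EDGES))
    # [('CRM', 'Staging'), ('Staging', 'Warehouse')]
```

Five element-level edges reduce to two system-level arrows, which is the "at a glance" view the figure aims for.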

As the number of data elements in a system grows, lineage discovery becomes complex and difficult to manage. To achieve the business goals, the strategy for discovering and importing assets into the Metadata repository requires deliberate planning and design. Successful lineage discovery needs to account for both a business focus and a technical focus:

Business focus: Limit the lineage discovery to data elements prioritized by the business. Start from the target locations and trace back to the source systems where the specific data originates. By limiting the scanned assets to those that move, transfer, or update the selected data elements, this approach will enable business data consumers to understand what is happening to the specific data element as it moves through systems. If coupled with Data Quality measurements, lineage can be used to pinpoint where system design adversely impacts the quality of the data.
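The business-focused strategy is essentially a backward walk from a prioritized target element. Here is a minimal sketch, assuming lineage is available as a map from each element to the elements it is derived from; all element names are hypothetical. Note that elements which do not feed the target are never scanned.

```python
from collections import deque

# Hypothetical lineage: each element maps to the elements it is derived from.
UPSTREAM: dict[str, list[str]] = {
    "bi.net_revenue": ["dw.fact_sales.amount", "dw.fact_sales.discount"],
    "dw.fact_sales.amount": ["stg.orders.amount"],
    "dw.fact_sales.discount": ["stg.orders.discount"],
    "stg.orders.amount": ["erp.order_line.amount"],
    "stg.orders.discount": ["erp.order_line.discount"],
    "bi.customer_count": ["dw.dim_customer.id"],  # not prioritized here
    "dw.dim_customer.id": ["crm.customer.id"],
}


def trace_to_origins(target: str, upstream: dict[str, list[str]]) -> tuple[set[str], set[str]]:
    """Walk backwards from a prioritized target element.

    Returns (scanned, origins): every element visited on the way back, and
    the elements with no recorded inputs, treated as where the data originates.
    Elements that do not feed the target are never scanned."""
    scanned = {target}
    origins: set[str] = set()
    queue = deque([target])
    while queue:
        element = queue.popleft()
        inputs = upstream.get(element, [])
        if not inputs:
            origins.add(element)
        for source in inputs:
            if source not in scanned:
                scanned.add(source)
                queue.append(source)
    return scanned, origins
```

Starting from `bi.net_revenue`, the walk touches seven elements and reports the two ERP columns as origins, while the customer-count lineage stays out of scope.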

Technical focus: Start at the source systems and identify all the immediate consumers, then identify all the subsequent consumers of that first set, and keep repeating these steps until all systems are identified. Technology users benefit more from this system discovery strategy because it helps answer a wide range of questions about the data. This approach enables technology and business users to answer questions about locating data elements across the enterprise, like “Where is social security number?”, or to generate impact reports like “What systems are impacted if the width of a specific column is changed?” This strategy can, however, be complex to manage.
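The "identify consumers, then their consumers, and repeat" loop is a breadth-first traversal. A minimal sketch, assuming system-level lineage is known as a map from each system to the systems that consume its data (the system names are hypothetical):

```python
from collections import deque

# Hypothetical system-level lineage: system -> systems that consume its data.
CONSUMERS: dict[str, list[str]] = {
    "ERP": ["Staging"],
    "CRM": ["Staging", "Marketing"],
    "Staging": ["Warehouse"],
    "Warehouse": ["BI", "Finance Mart"],
    "Finance Mart": ["BI"],
}


def discover_downstream(start: str, consumers: dict[str, list[str]]) -> list[str]:
    """Find the immediate consumers of `start`, then their consumers, and repeat
    until no new systems appear (breadth-first). Returns systems in discovery order."""
    seen = {start}
    found: list[str] = []
    queue = deque([start])
    while queue:
        system = queue.popleft()
        for consumer in consumers.get(system, []):
            if consumer not in seen:  # guards against revisits and cycles
                seen.add(consumer)
                found.append(consumer)
                queue.append(consumer)
    return found
```

Run from the system that owns a column whose width is about to change, the same traversal gives a rough system-level answer to "What systems are impacted?"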

Many data integration tools offer lineage analysis that considers not only the developed population code but also the data model and the physical database. Some offer web interfaces that let business users monitor and update definitions. These begin to look like business glossaries.

Documented lineage helps both business and technical people use data. Without it, much time is wasted investigating anomalies, potential change impacts, or unknown results. Look to implement an integrated impact and lineage tool that can understand all the moving parts involved in the load process as well as in end user reporting and analytics. Impact reports outline which components are affected by a potential change, which expedites and streamlines estimation and maintenance tasks.

Give the Gift of Professional Membership!

12/24/2024 1:25 PM | Anonymous member

Gift a colleague, or yourself, a DAMA-RMC professional membership at 25% off this holiday season. Join as a professional member OR upgrade from a guest membership. Professional membership includes:

Entry to all chapter meetings AND mingle events
Meeting and presentation archive access
CDMP virtual study group, bootcamp and discounts
DMBoK discount
Conference discounts
DAMA events and programming discounts
Plus so much more...

Promo Code: 12HOLIDAY25

Join HERE.

Congratulations New CDMPs!

12/23/2024 1:10 PM | Anonymous member

Thanks to everyone who participated in DAMA-RMC's study sessions, bootcamp, and "Pay-If-You-Pass" exam prep over the last few months.

We are thrilled at the progress everyone made and excited to announce 6 new CDMPs:

Funke Bishi

John Lieto

Katrina Miyamoto

Kris New

Benjamin Seidle

Rachel Udow

Several others will be completing their tests in the next few weeks.

We wish everyone luck!

Learn more: CDMP Certification with DAMA-RMC Support

Questions?

Please contact ProfessionalDevelopmentVP@damarmc.org.

December 2024 Newsletter

12/20/2024 7:00 AM | Anonymous member

December 2024 Newsletter.pdf

DMBoK Figure 89 Sample Data Element Lineage Flow Diagram

12/18/2024 5:30 PM | Anonymous member

A key benefit of discovering and documenting Metadata about the physical assets is to provide information on how data is transformed as it moves between systems. Many Metadata tools carry information about what is happening to the data within their environments and provide capabilities to view the lineage across the span of the systems or applications they interface. The current version of the lineage based on programming code is referred to as ‘As Implemented Lineage’. In contrast, lineage described in mapping specification documents is referred to as ‘As Designed Lineage’.

A lineage build is limited by the coverage of the Metadata management system. Function-specific Metadata repositories or data visualization tools have information about data lineage within the scope of the environments they interact with, but they provide no visibility into what happens to the data outside those environments.

Metadata management systems import the ‘As Implemented’ lineage from the various tools that can provide this lineage detail and then augment it with ‘As Designed’ lineage wherever the actual implementation details are not extractable. The process of connecting the pieces of the data lineage is referred to as stitching. It results in a holistic visualization of the data as it moves from its original locations (official source or system of record) to its final destination.
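Stitching can be sketched as merging two edge sets with a precedence rule. The sketch below assumes one simple rule: an ‘As Designed’ edge is used only for a target that has no ‘As Implemented’ lineage at all. The rule, the `stg.`/`dw.`/`erp.` prefixes, and the `old.qty` element are hypothetical; the `yy_*` and `zz_total` column names come from the Figure 89 example.

```python
# Hypothetical sketch of lineage stitching with an assumed precedence rule:
# implemented lineage wins; designed lineage fills targets with no implemented edges.

Edge = tuple[str, str]  # (source element, target element)


def stitch(as_implemented: set[Edge], as_designed: set[Edge]) -> dict[Edge, str]:
    """Return every edge in the combined lineage, labelled with its provenance."""
    covered_targets = {target for _, target in as_implemented}
    stitched = {edge: "as_implemented" for edge in as_implemented}
    for edge in as_designed:
        if edge[1] not in covered_targets:  # no extractable implementation detail
            stitched[edge] = "as_designed"
    return stitched


def upstream(element: str, lineage: dict[Edge, str]) -> set[str]:
    """All elements the given element ultimately depends on in the stitched view."""
    found: set[str] = set()
    frontier = [element]
    while frontier:
        current = frontier.pop()
        for source, target in lineage:
            if target == current and source not in found:
                found.add(source)
                frontier.append(source)
    return found


if __name__ == "__main__":
    implemented = {
        ("stg.yy_unit_cost", "dw.zz_total"),
        ("stg.yy_tax", "dw.zz_total"),
        ("stg.yy_qty", "dw.zz_total"),
    }
    # From a mapping specification: one gap-filling edge, one superseded edge.
    designed = {("erp.unit_cost", "stg.yy_unit_cost"), ("old.qty", "dw.zz_total")}
    print(sorted(upstream("dw.zz_total", stitch(implemented, designed))))
```

Real tools apply richer reconciliation rules; the point here is only that the stitched graph can be traversed end to end even though its pieces came from different places.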

This figure shows a sample data element lineage. In it, the ‘Total Backorder’ business data element, which is physically implemented as column ‘zz_total’, depends on three other data elements: ‘Units Cost in Cents’, physically implemented as ‘yy_unit_cost’; ‘Tax in Ship to State’, implemented in ‘yy_tax’; and ‘Back Order Quantity’, implemented in ‘yy_qty’.

DMBoK Figure 88 Example Metadata Repository Metamodel

12/11/2024 7:00 AM | Anonymous member

A Metadata Management system must be capable of extracting Metadata from many sources. Design the architecture to scan the various Metadata sources and periodically update the repository. The system must also support manual updates of Metadata, as well as requests, searches, and lookups of Metadata by various user groups.
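Periodic scanning and manual updates pull in different directions: a rescan should refresh harvested Metadata without wiping what users entered by hand. A minimal sketch, assuming the repository stores the two kinds separately; the class, method names, and sample assets are hypothetical.

```python
# Hypothetical sketch: a repository refreshed by periodic scans that keeps
# manually entered Metadata. All names are assumptions for illustration.


class MetadataRepository:
    def __init__(self) -> None:
        self._scanned: dict[str, dict[str, str]] = {}  # asset -> harvested Metadata
        self._manual: dict[str, list[str]] = {}  # asset -> user-entered notes

    def refresh(self, source: str, harvested: dict[str, dict[str, str]]) -> None:
        """Replace everything previously harvested from `source` with the new scan,
        so assets dropped at the source also disappear from the repository."""
        self._scanned = {
            asset: meta for asset, meta in self._scanned.items() if meta["source"] != source
        }
        for asset, meta in harvested.items():
            self._scanned[asset] = {**meta, "source": source}

    def annotate(self, asset: str, note: str) -> None:
        """Manual update: survives later scans because it is stored separately."""
        self._manual.setdefault(asset, []).append(note)

    def lookup(self, asset: str) -> dict:
        return {**self._scanned.get(asset, {}), "notes": list(self._manual.get(asset, []))}

    def search(self, text: str) -> list[str]:
        return sorted(a for a in self._scanned if text.lower() in a.lower())
```

Keeping manual entries in their own store is one design choice among several; it makes the "rescan replaces, annotations persist" behavior easy to reason about.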

A managed Metadata environment should isolate the end user from the various and disparate Metadata sources. The architecture should provide a single access point for the Metadata repository. The access point must supply all related Metadata resources transparently to the user. Users should be able to access Metadata without being aware of the differing environments of the data sources. In analytics and Big Data solutions, the interface may have largely user-defined functions (UDF) to draw on various data sets, and the Metadata exposure to the end user is inherent to those customizations. With less reliance on UDF in solutions, end users will be gathering, inspecting, and using data sets more directly and various supporting Metadata is usually more exposed.

Design of the architecture depends on the specific requirements of the organization. Three technical architectural approaches to building a common Metadata repository mimic the approaches to designing data warehouses: centralized, distributed, and hybrid (see Section 1.3.6). These approaches all take into account implementation of the repository, and how the update mechanisms operate.

Create a data model for the Metadata repository, or metamodel, as one of the first design steps after the Metadata strategy is complete and the business requirements are understood. Different levels of metamodel may be developed as needed: a high-level conceptual model that explains the relationships between systems, and a lower-level metamodel that details the attributes describing the elements and processes of a model. In addition to being a planning tool and a means of articulating requirements, the metamodel is itself a valuable source of Metadata. This figure depicts a sample Metadata repository metamodel. The boxes represent the high-level major entities, which contain the data.
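To show what "the metamodel is itself a source of Metadata" can mean in practice, here is a small metamodel sketched as Python dataclasses. The entities (`System`, `DataStructure`, `DataElement`, `BusinessTerm`) and their attributes are hypothetical and are not the entities in the DMBoK figure.

```python
# Hypothetical sketch of a small metamodel; entities and attributes are
# assumptions, not the entities shown in the DMBoK figure.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BusinessTerm:
    name: str
    definition: str


@dataclass
class DataElement:
    physical_name: str
    data_type: str
    term: Optional[BusinessTerm] = None  # link into the business glossary


@dataclass
class DataStructure:
    """A table, file, or similar container of data elements."""
    name: str
    elements: list[DataElement] = field(default_factory=list)


@dataclass
class System:
    """High-level (conceptual) entity: an application or platform."""
    name: str
    structures: list[DataStructure] = field(default_factory=list)


def implementations_of(systems: list[System], term_name: str) -> list[str]:
    """Query the metamodel itself: every physical element mapped to a business term."""
    return [
        f"{s.name}.{d.name}.{e.physical_name}"
        for s in systems
        for d in s.structures
        for e in d.elements
        if e.term is not None and e.term.name == term_name
    ]
```

`System` corresponds to the high-level conceptual level, while `DataElement` carries the lower-level attributes; a query such as `implementations_of` answers "where is this business concept physically stored?"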

DMBoK Figure 87 Hybrid Metadata Architecture

12/04/2024 1:04 PM | Anonymous member

Another advanced architectural approach is the bi-directional Metadata Architecture, which allows Metadata to change in any part of the architecture (source, data integration, user interface) and then uses the repository as a broker to coordinate feeding those changes back into their original source.

Various challenges are apparent in this approach. The design forces the Metadata repository to contain the latest version of the Metadata source and forces it to manage changes to the source, as well. Changes must be trapped systematically, and then resolved. Additional sets of process interfaces to tie the repository back to the Metadata source(s) must be built and maintained.

This figure illustrates how common Metadata from different sources is collected in a centralized Metadata store. Users submit their queries to the Metadata portal, which passes each request to the centralized repository. The centralized repository tries to fulfill the request from the common Metadata collected initially from the various sources. As the request becomes more specific, or the user needs more detailed Metadata, the centralized repository delegates down to the specific source to research the details. Global search across the various tools is available because of the common Metadata collected in the centralized repository.
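The hybrid request flow can be sketched in a few lines, assuming each tool exposes a shared "common" subset of its Metadata plus a way to fetch full detail. The classes `ToolSource` and `CentralRepository`, their methods, and the sample assets are hypothetical.

```python
# Hypothetical sketch of the hybrid flow: answer from common Metadata held
# centrally; delegate to the owning tool only for detailed requests.


class ToolSource:
    """Stand-in for a tool's own Metadata store (e.g. an ETL or BI tool)."""

    def __init__(self, name: str, details: dict[str, dict[str, str]]) -> None:
        self.name = name
        self._details = details  # asset -> full, tool-specific Metadata
        self.detail_calls = 0  # counts delegated requests

    def common_metadata(self) -> dict[str, str]:
        # Only a shared subset (here: the description) is collected centrally.
        return {asset: d["description"] for asset, d in self._details.items()}

    def detail(self, asset: str) -> dict[str, str]:
        self.detail_calls += 1
        return self._details[asset]


class CentralRepository:
    def __init__(self, sources: list[ToolSource]) -> None:
        self._sources = {s.name: s for s in sources}
        # asset -> (owning source name, description), collected up front
        self._common = {
            asset: (s.name, desc) for s in sources for asset, desc in s.common_metadata().items()
        }

    def search(self, text: str) -> list[str]:
        """Global search across all tools, served from the central store alone."""
        needle = text.lower()
        return sorted(a for a, (_, desc) in self._common.items() if needle in f"{a} {desc}".lower())

    def request(self, asset: str, detailed: bool = False) -> dict[str, str]:
        source_name, desc = self._common[asset]
        if not detailed:
            return {"asset": asset, "source": source_name, "description": desc}
        return self._sources[source_name].detail(asset)  # delegate down to the source
```

Global search and summary requests never touch the tools; only a detailed request is delegated, which mirrors the flow described for the figure.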

DMBoK Figure 86 Distributed Metadata Architecture

11/27/2024 7:00 AM | Anonymous member

A completely distributed architecture maintains a single access point. The Metadata retrieval engine responds to user requests by retrieving data from source systems in real time; there is no persistent repository. In this architecture, the Metadata management environment maintains the necessary source system catalogs and lookup information needed to process user queries and searches effectively. A common object request broker or similar middleware protocol accesses these source systems.

Advantages of distributed Metadata Architecture include:

Metadata is always as current and valid as possible because it is retrieved from its source
Queries are distributed, possibly improving response and process time
Metadata requests from proprietary systems are limited to query processing rather than requiring a detailed understanding of proprietary data structures, therefore minimizing the implementation and maintenance effort required
Development of automated Metadata query processing is likely simpler, requiring minimal manual intervention
Batch processing is reduced, with no Metadata replication or synchronization processes

Distributed architectures also have limitations:

No ability to support user-defined or manually inserted Metadata entries since there is no repository in which to place these additions
Presenting Metadata from various systems in a standardized way is difficult
Query capabilities are directly affected by the availability of the participating source systems
The quality of Metadata depends solely on the participating source systems

This figure illustrates a distributed Metadata Architecture. There is no centralized Metadata repository, and the portal passes users’ requests to the appropriate tool to execute. Because there is no centralized store in which Metadata from the various tools is collected, every request has to be delegated down to the sources; hence, no capability exists for a global search across the various Metadata sources.
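For contrast with the hybrid sketch, a distributed portal keeps only lookup information and forwards every request live. This is a minimal sketch, assuming a hypothetical routing rule where the asset-name prefix identifies the owning source; class names and sample data are illustrative.

```python
# Hypothetical sketch of the distributed flow: the portal holds only a catalog
# of which source owns which asset prefix (an assumed routing rule) and
# forwards every request in real time. No Metadata is persisted.


class SourceUnavailable(Exception):
    """Raised when a participating source system cannot be reached."""


class SourceSystem:
    def __init__(self, name: str, metadata: dict[str, str]) -> None:
        self.name = name
        self.metadata = metadata  # the source's live Metadata
        self.online = True

    def query(self, asset: str) -> str:
        if not self.online:
            raise SourceUnavailable(self.name)
        return self.metadata[asset]


class DistributedPortal:
    def __init__(self, catalog: dict[str, SourceSystem]) -> None:
        self._catalog = catalog  # asset prefix -> owning source (lookup info only)

    def query(self, asset: str) -> str:
        prefix = asset.split(".", 1)[0]
        return self._catalog[prefix].query(asset)  # retrieved live on every call
```

Two of the listed trade-offs fall out directly: answers are always as current as the source, and a query fails whenever its source is offline. There is also nowhere to store user-entered Metadata or to run a global search.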
