Pythian Blog: Technical Track

Data Product Resiliency

We often discuss resiliency through the lens of systems, facilities and people. We build systems resiliency through security controls, disaster recovery plans and regular trials of our processes. We build facility resiliency through security protocols and designated alternative work locations. Our people resiliency comes from documented processes, designated leaders for crisis management and governance of our controls over decision making and documentation lifecycle management.

As we structure more of our data for consumption and monetization, we begin to see a gap in our resiliency planning. While we plan for systems, facilities and people, many organizations are only now beginning to assess the resiliency of their data products. Data products are the curated results of business processes, data transformations and systems integration that enable data to be consumed for decision making and monetization. They are the final value-producing component of our complex enterprise data ecosystems. Planning around data products ensures that our resiliency planning looks at the final output of data for consumption and avoids missing key systems or processes that may individually be deemed low risk or low importance.

Resiliency is about adjusting to change: assessing the severity of different unplanned events and building proactive controls and responses that are proportionate to the identified risks. For data products that change daily to meet business needs and also contain large amounts of sensitive data, we must assess and produce controls and processes to recover them and ensure their availability under various conditions.

A data product can contain up to nine (9) defined elements. As an organization matures, its data products will define and account for more of these elements: consumer persona, creator persona, owner persona, quality measures, time measures, source elements, transformation logic, exposed features and journey touch points. Together, these definitions enable practitioners to clearly describe data in terms of usage, value derived, and the preparation needed to maximize value.
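As an illustration only, the nine elements can be captured as a simple record so that a team can see which ones a given data product has defined so far. The class name, field types and helper methods below are hypothetical; only the nine element names come from the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataProductDefinition:
    """Hypothetical record of the nine elements; None means not yet defined."""
    consumer_persona: Optional[str] = None
    creator_persona: Optional[str] = None
    owner_persona: Optional[str] = None
    quality_measures: Optional[str] = None
    time_measures: Optional[str] = None
    source_elements: Optional[str] = None
    transformation_logic: Optional[str] = None
    exposed_features: Optional[str] = None
    journey_touch_points: Optional[str] = None

    def defined_elements(self) -> List[str]:
        # Names of the elements that have a non-empty description.
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def missing_elements(self) -> List[str]:
        # Names of the elements that still need to be defined.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example values are invented for illustration.
product = DataProductDefinition(
    consumer_persona="finance analyst",
    owner_persona="data steward",
)
print(len(product.defined_elements()), "of 9 elements defined")
print("Still missing:", ", ".join(product.missing_elements()))
```

A count like this is only a rough proxy for maturity; it says nothing about how well each element is described.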

As we assess our data products and begin to develop a plan to increase resiliency, we must first understand how they are used, by whom, and the impact of loss or compromise.

  • Business dependence - We must assess the business dependencies and cycle of consumption for a data product. This means asking how often people access or update the data product and what the implications are if that access is delayed for different periods of time.
  • Data movement & transformation considerations - We must understand where data movement and transformations are occurring so that our resiliency plans can include support and recoverability for the necessary dependent data sources, systems and process flows.
  • Risk of inadvertent changes - We must assess the risk of and plan to mitigate unanticipated or unauthorized changes to our data products. This assessment includes the necessary tooling and definition of policy to detect unauthorized changes and the processes for incident response.
  • Threats - As we assess the level of investment needed to mitigate the risks to our data products, we must understand and assess the threats they face. Threats can be as diverse as outside actors, insiders, service failures and lapses in vendor support.
  • Privacy & compliance obligations - Data is subject to an ever-changing landscape of privacy and compliance obligations. Today's consumption of data could be eliminated tomorrow by new regulations. Organizations must continually assess their use of data for compliance with applicable laws and determine mitigation plans that sustain business operations if data use is suddenly limited. Additional considerations come from the risk of fines for non-compliance and the influence our investment in data protection has on our resiliency plans.
  • Our data consumer - Our data consumer is the final step in realizing the value of our data products. Our resiliency plans should consider their needs, the tools they use, the location they work from and the steps in their journey for data usage and decision making. Our plans should also consider how to augment their process should data products become unavailable for varying lengths of time.
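The "risk of inadvertent changes" item above mentions tooling to detect unauthorized changes. One possible approach, sketched here under the assumption that a data product can be read as a list of row dictionaries, is to store a content fingerprint when a version is approved and compare against it later. The function names `fingerprint` and `has_changed` are hypothetical, and the sample rows are invented.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 hex digest of the rows, independent of row order."""
    # Serialize each row with sorted keys so key order does not matter,
    # then sort the serialized rows so row order does not matter either.
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def has_changed(rows, approved_fingerprint):
    """True if the current rows no longer match the approved fingerprint."""
    return fingerprint(rows) != approved_fingerprint


# Fingerprint recorded when this version of the data product was approved.
approved = fingerprint([{"id": 1, "region": "EMEA"}, {"id": 2, "region": "APAC"}])

# Later check: one value has been altered.
current = [{"id": 1, "region": "EMEA"}, {"id": 2, "region": "LATAM"}]
if has_changed(current, approved):
    print("Data product differs from the approved version; start incident response.")
```

A fingerprint only shows that something changed. Deciding whether the change was authorized still requires comparing it against an approved change record, which is a policy and process question rather than a tooling one.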

Risk management is about deciding how to invest against escalating levels of severity. To properly plan resiliency for data products, we must build a risk profile for them, including what could occur, the chances of occurrence, and which mitigations we are willing to invest in as an organization. Future posts will dig deeper into risk management for data products and into balancing investment in incident prevention against the cost of incident response.
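To make the idea of a risk profile concrete, here is a minimal sketch that treats each risk as an event with a probability, an impact and a candidate mitigation. It uses a simple expected-loss calculation (probability times impact), which is a common simplification and an assumption of this example, not a method prescribed here. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    event: str                    # what could occur
    annual_probability: float     # chance of occurrence in a year, 0.0 to 1.0
    impact_cost: float            # estimated cost if the event occurs
    mitigation_cost: float        # annual cost of the proposed mitigation
    mitigated_probability: float  # chance of occurrence with mitigation in place

    def expected_loss(self) -> float:
        return self.annual_probability * self.impact_cost

    def mitigated_expected_loss(self) -> float:
        return self.mitigated_probability * self.impact_cost

    def net_benefit(self) -> float:
        # Reduction in expected loss, minus what the mitigation costs.
        reduction = self.expected_loss() - self.mitigated_expected_loss()
        return reduction - self.mitigation_cost


def worth_mitigating(risks):
    """Risks whose mitigation pays for itself, largest net benefit first."""
    candidates = [r for r in risks if r.net_benefit() > 0]
    return sorted(candidates, key=lambda r: r.net_benefit(), reverse=True)


# Invented figures for illustration only.
profile = [
    Risk("Upstream source outage", 0.5, 100_000, 10_000, 0.25),
    Risk("Vendor tool retired", 0.125, 8_000, 5_000, 0.0),
]
for risk in worth_mitigating(profile):
    print(risk.event, "net benefit:", risk.net_benefit())
```

In practice, probabilities and impacts are rough estimates, and factors such as regulatory exposure or reputational harm may not reduce to a single cost figure.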

The concept of resiliency planning for our data is not new. We have regularly assessed our databases, our enterprise data warehouses and business systems against disaster recovery and business continuity requirements and developed plans for recovery. The shift now is to the data product as the point of evaluation and resiliency planning, driven by the ongoing increase in integration between systems, which leads to new and complex failures that must be mitigated. Data products give us a focal point for assessing the impact of unanticipated change as close to the data consumer as possible.

As we do with all cyber security, compliance and operational resiliency initiatives, we must "shift left" as we design and deploy an increasing number of data products across our enterprise. As we define requirements for resiliency early in the process, we better enable engineering teams to take action on key design decisions and improve the accuracy, reliability and availability of our data products.  
