Pythian Blog: Technical Track

Geography in Data Governance

Previously, we discussed the growth of certain types of data, specifically data lineage. Data lineage information allows us to programmatically build systems that respond to changing conditions in data access, data quality, and consumption.

Another type of data that is growing in both creation and consumption is geospatial data. It introduces new and distinct risks for storage and processing because it is highly regulated and captures detail at the level of the individual.

As we think about our data governance programs, geospatial data takes on multiple dimensions in how we define policies, train our staff on proper use, and consume data. Geospatial data is a specific data type that we generate, manage, transform, and destroy. It is often present alongside other data about a user’s actions, searches, purchases, or behaviors. Geospatial data carries specific policies that protect the privacy of the consumer it describes. These policies are derived from the consumer and their location, and they dictate how the data is managed, retained, used, and stored.

In our digitally connected world, geospatial data drives much of the personalization that consumers have come to rely on. It provides unique opportunities for both individual personalization and highly specific segmentation of digital experiences, placements, and ad targeting. Some of the main uses of geospatial data include:

  • Personalization: Ensuring that each user has an experience specific to them, their actions, their preferences, and their current location. For example, a user can place a mobile order at the nearest store while the app also draws on their past order history.
  • Recommendations: Going beyond personalization, we can use geospatial data to recommend options near the user at that moment, such as restaurants, where to fill up a gas tank, or the fastest route through traffic.
  • Targeting: Providing users with advertisements that match their needs, based on both their location and their past online activity. A common example is a car dealership sending ads to social media users in the same city, county, or state, informed by each user’s car preferences from online searches.
  • Financial Modeling: Many retail businesses analyze large pools of geographic data, often de-identified, to determine where new store locations would be most profitable and most accessible to their target customers.
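The financial modeling use case depends on location data that has been de-identified before analysis. As a minimal sketch, assuming Python 3.9+, one common first step is to snap coordinates to a coarse grid and suppress sparsely populated cells so that no individual point can be singled out. The grid precision (`GRID_DECIMALS`) and minimum group size (`MIN_COUNT`) are hypothetical values chosen for illustration, and coarsening on its own is not a complete de-identification method.

```python
from collections import Counter
from typing import Iterable

# Hypothetical grid precision: 2 decimal places is roughly 1 km of latitude.
GRID_DECIMALS = 2
# Hypothetical minimum group size: cells with fewer points are suppressed.
MIN_COUNT = 3


def coarsen(lat: float, lon: float, decimals: int = GRID_DECIMALS) -> tuple[float, float]:
    """Snap a coordinate to a coarse grid cell so exact locations are not kept."""
    return (round(lat, decimals), round(lon, decimals))


def aggregate_cells(points: Iterable[tuple[float, float]],
                    min_count: int = MIN_COUNT) -> dict[tuple[float, float], int]:
    """Count points per grid cell, dropping cells too sparse to share safely."""
    counts = Counter(coarsen(lat, lon) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

For example, three nearby points in one city collapse into a single cell with a count of 3, while a lone point elsewhere is dropped entirely rather than published as a cell with a count of 1.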

While this data is valuable, there are key questions we must ask as an organization to understand what we can use geospatial data for and whether it is having a positive impact.

  • Do our users allow us to gather and use this data, and under what conditions? The answer becomes the foundation for how we implement our applications and data pipelines, and it establishes the baseline data usage policies that all employees must know and follow.
  • Did the user follow our recommendations? This helps us understand the value of the data as an organization and the actions we are taking based on geospatial data.
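The second question can only be answered once "following a recommendation" is defined. The sketch below assumes a hypothetical definition: a recommendation counts as followed if the same user visits the recommended place within a time window after it was shown. The `Recommendation` and `Visit` types and the two-hour default window are illustrative assumptions, and the visit data would itself be collected only under the conditions the user agreed to.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Recommendation:
    user_id: str
    place_id: str
    shown_at: datetime


@dataclass(frozen=True)
class Visit:
    user_id: str
    place_id: str
    visited_at: datetime


def follow_through_rate(recs: Sequence[Recommendation],
                        visits: Sequence[Visit],
                        window: timedelta = timedelta(hours=2)) -> float:
    """Fraction of recommendations followed by a matching visit within `window`."""
    if not recs:
        return 0.0  # nothing was shown, so there is nothing to measure
    followed = sum(
        1
        for rec in recs
        if any(
            v.user_id == rec.user_id
            and v.place_id == rec.place_id
            and rec.shown_at <= v.visited_at <= rec.shown_at + window
            for v in visits
        )
    )
    return followed / len(recs)
```

The window is the key design choice: too short and genuine follow-through is missed, too long and unrelated visits are credited to the recommendation.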

Now we must apply these considerations to how our data governance programs define, manage, and set policies for geospatial data. These programs must treat this data in specific ways because of the risk associated with its personalized nature and the regulatory obligations that apply to data capable of tracking individuals’ movements and locations.

  • Data Literacy: Our data literacy programs should include training material and examples of appropriate usage for geospatial data. It’s not uncommon to have special certifications for staff who handle this type of higher risk data.
  • Boundaries: As data moves across more systems, the risk of misuse increases. For financial data, it is common to apply policies that minimize the number of systems in which it is stored and processed. The same model can be applied to geospatial data: microservices with specific API boundaries for accessing and consuming the data leave data governance programs fewer points to monitor, manage, and audit.
  • Consumer Obligations: Whenever internal processes move geospatial data between systems, the usage context should travel with it. The user’s preferences for how we use, analyze, share, and destroy their data should be carried with the data itself, allowing downstream consumers to make informed decisions about how to handle the data they receive.
  • Legal/Compliance: The legal and compliance obligations for geospatial data are complex and will only become more so. A company’s legal team should be involved early to build a matrix that tells engineering teams how data must be handled, processed, and destroyed based on user preferences, the location where the data was created, and the consumer’s place of residence.
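The Consumer Obligations and Legal/Compliance points fit together naturally: resolve a usage context from the legal matrix once, then attach it to each record so it travels downstream. In this minimal sketch, the jurisdiction codes, purposes, and retention values in `POLICY_MATRIX` are hypothetical placeholders rather than legal guidance, and the most restrictive outcome across the user's consent, the creation location, and the place of residence is assumed to win.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JurisdictionRule:
    allowed_purposes: frozenset[str]
    max_retention_days: int


# Hypothetical legal matrix: placeholder jurisdictions and values only.
POLICY_MATRIX: dict[str, JurisdictionRule] = {
    "REGION_A": JurisdictionRule(
        frozenset({"personalization", "recommendations", "targeting"}), 365),
    "REGION_B": JurisdictionRule(frozenset({"personalization"}), 30),
}


@dataclass(frozen=True)
class UsageContext:
    allowed_purposes: frozenset[str]
    retention_days: int


@dataclass(frozen=True)
class GeoRecord:
    user_id: str
    lat: float
    lon: float
    context: UsageContext  # carried with the record to every downstream consumer


def resolve_usage_context(consented_purposes: Iterable[str],
                          created_in: str,
                          resides_in: str,
                          matrix: dict[str, JurisdictionRule] = POLICY_MATRIX) -> UsageContext:
    """Intersect user consent with the rules for both jurisdictions.

    An unknown jurisdiction raises KeyError instead of silently
    falling back to a permissive default.
    """
    rules = [matrix[created_in], matrix[resides_in]]
    purposes = frozenset(consented_purposes)
    for rule in rules:
        purposes &= rule.allowed_purposes  # keep only purposes every rule allows
    retention = min(rule.max_retention_days for rule in rules)  # shortest wins
    return UsageContext(purposes, retention)
```

For instance, a user who consented to personalization and targeting, whose data was created in `REGION_A` but who resides in `REGION_B`, ends up with a context allowing only personalization and a 30-day retention period.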

As with lineage data, geospatial data must be uniquely identified and handled by data governance programs. Automating policy implementation, using the policies defined by legal teams and the user context carried with the data, helps ensure compliance across systems in a repeatable and auditable manner. The value of geospatial data can’t be overstated: it drives much of the personalization and value found in today’s mobile applications. With that value comes risk, which can be managed when it is addressed holistically across the organization through strong data governance programs.
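Automated, auditable enforcement can be sketched as a single access function acting as the API boundary: every read is checked against the usage context carried with each record, and every decision is written to an audit log. The `read_geo_records` function, the record fields, and the in-memory list used as the audit log are hypothetical; a real system would persist the log and apply the rules supplied by its own legal team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class GeoRecord:
    record_id: str
    lat: float
    lon: float
    created_at: datetime
    allowed_purposes: frozenset[str]  # usage context carried with the record
    retention_days: int


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    purpose: str
    allowed: bool
    reason: str
    checked_at: datetime


def read_geo_records(records: Iterable[GeoRecord], purpose: str, now: datetime,
                     audit_log: list[AuditEntry]) -> list[GeoRecord]:
    """Hypothetical single access point: release only records usable for
    `purpose`, appending one audit entry per decision."""
    released = []
    for rec in records:
        # Retention is checked first so expired data is never released.
        if now > rec.created_at + timedelta(days=rec.retention_days):
            allowed, reason = False, "retention expired"
        elif purpose not in rec.allowed_purposes:
            allowed, reason = False, "purpose not permitted"
        else:
            allowed, reason = True, "ok"
        audit_log.append(AuditEntry(rec.record_id, purpose, allowed, reason, now))
        if allowed:
            released.append(rec)
    return released
```

Because every consumer goes through the same function, the audit log records both releases and refusals in one place, which is what makes the policy repeatable and reviewable.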

In our next post, we’ll explore data retention policies and how to balance the need for historical information on events against the cost of storage and the legal requirements for long-term availability of data. We’ll discuss the ongoing myth of “keeping all data forever” while exploring the actual time periods during which retained data delivers value, where usage is maximized and risk is minimized. Make sure to sign up for updates so you don’t miss it.