Pythian Blog: Technical Track

Breaking down data silos: BI’s broken data integration promise

This is the first in a series of four posts on breaking down data silos to gain more complete, accurate business insights.

When setting up a data program for your organization, it can make sense to start by adopting one of the many available out-of-the-box BI or visualization tools. After all, these tools are supposed to be relatively easy to integrate into your organization. Just sign up, connect the tool to your data sources (on-prem or cloud service of choice), add as many users in your organization as makes sense, and then kick back and watch the insights roll in.

But while it’s true that out-of-the-box, self-service BI tools offer slick user interfaces and powerful visualizations, they often tout themselves as one-size-fits-all solutions to nearly everything, including getting insights from multisource data, governance, data prep and data access. The reality is that a lot of work has to happen behind the scenes to get data ready for analysis and visualization, particularly if you’re trying to get insights from multiple, disparate data sources.

BI tools are, of course, very useful in many cases. But using only out-of-the-box software to try to scale an enterprise-level data program may not solve the most common barrier to getting complete insights: data silos. At best, these tools will present a single data set in a cleaner, easier-to-digest visual format. At worst, they can result in analysis built on bad data, which inevitably begets the most unfortunate outcome of all: business decisions based on inaccurate or incomplete insights.

So what good are these tools if they can’t solve the most common data-related issues? The short answer is that they are very valuable. The problem lies not in the tools themselves, but in the fact that companies adopt them without a full understanding of what it takes to get data prepared for analysis and visualization. Those companies don’t always account for complex processes like cleaning, unifying and integrating data for consumption by end users and their tools. But this is something savvy data users understand all too well. In fact, a majority of data scientists agree that the most time-consuming element of analysis isn’t actually analysis at all. It’s cleaning the data so it can be used in the first place.

What is data integration?

Data integration is a critical component of the data preparation process, particularly when the data is coming from multiple sources. The preparation process includes data integration, cleansing, formatting and organizing. It also involves validation for accuracy and consistency so the data can be analyzed using business intelligence and visualization software, or used as input for systems such as decision support. The data preparation process also focuses on business user requirements, improving data quality and transforming data into a format that meets user needs.

At its core, data integration is the combining of data from different sources so that users see a unified view of all relevant data instead of just one source or type, even when that data is of different types, resides in different places, or is generated within different departments or business units. Because business units and departments are prone to walling off their data for structural, political or other reasons, organizations must be proactive to avoid falling into the data silo trap.
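To make the idea of a unified view concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two sources (a CRM export and a billing CSV), their field names, and the sample values are hypothetical, and a real platform would handle far more than this. It shows two sources that describe the same customers in different shapes being combined into one record per customer.

```python
import csv
import io

# Hypothetical source 1: a CRM export, already parsed into dictionaries.
crm_records = [
    {"Email": "Ana@Example.com ", "Full Name": "Ana Ruiz"},
    {"Email": "bo@example.com", "Full Name": "Bo Chen"},
]

# Hypothetical source 2: a billing system export delivered as CSV text.
billing_csv = """customer_email,total_spend
ana@example.com,120.50
cy@example.com,75.00
"""


def normalize_email(raw):
    """Trim whitespace and lowercase so the same customer matches across sources."""
    return raw.strip().lower()


def unified_view(crm, billing_text):
    """Combine both sources into one record per customer, keyed by email."""
    customers = {}
    for row in crm:
        key = normalize_email(row["Email"])
        customers[key] = {"email": key, "name": row["Full Name"], "total_spend": None}
    for row in csv.DictReader(io.StringIO(billing_text)):
        key = normalize_email(row["customer_email"])
        # Customers known only to billing still get a record, with no name yet.
        record = customers.setdefault(
            key, {"email": key, "name": None, "total_spend": None}
        )
        record["total_spend"] = float(row["total_spend"])
    return sorted(customers.values(), key=lambda r: r["email"])


for record in unified_view(crm_records, billing_csv):
    print(record)
```

Note that the match only works because the email addresses are normalized first. The missing values that remain (a customer with no billing history, a payer with no CRM entry) are exactly the gaps that stay invisible when each source is viewed on its own.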

Data silos

Data silos are data sets segregated from the rest of the enterprise. They are essentially islands of data unto themselves, and their existence makes it difficult and expensive to analyze for trends and insights across an organization as a whole. Data silos are also prone to containing duplicate or conflicting data and can lead to bad analysis and false conclusions.

Integrating data, by contrast, brings several benefits that organizations should keep in mind:
  • It’s best to speak the same language: datasets, data catalogs and data definitions within your organization are best created and managed in a consistent, governed, repeatable and shareable way. Reinventing the wheel every time analysis is required is a recipe for disaster.
  • All data types have their strengths: when combined effectively, various data types from different sources paint a much more coherent picture than siloed data pulled from your HR or sales departments and viewed in a vacuum.
  • It allows for unified data governance levels: Using data governance maturity models and integrating data from all sources, organizations can holistically determine which level of data governance is right for them. This is much more difficult when data is siloed in various departments.
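The duplicate and conflicting data that silos tend to accumulate can be surfaced with a simple cross-silo comparison. The sketch below is illustrative only: the silo names, the `employee_id` and `department` fields, and the `compare_silos` helper are all hypothetical, and it assumes each silo shares a common key, which real silos often do not.

```python
from collections import defaultdict

# Hypothetical extracts from two departmental silos describing the same employees.
hr_silo = [
    {"employee_id": 101, "department": "Sales"},
    {"employee_id": 102, "department": "Finance"},
]
sales_silo = [
    {"employee_id": 101, "department": "Sales"},      # duplicate of the HR record
    {"employee_id": 102, "department": "Marketing"},  # conflicts with the HR record
    {"employee_id": 103, "department": "Sales"},      # known only to this silo
]


def compare_silos(silos, key, field):
    """Classify each key as 'duplicate', 'conflict' or 'unique' across silos."""
    values_seen = defaultdict(list)
    for silo_name, rows in silos.items():
        for row in rows:
            values_seen[row[key]].append((silo_name, row[field]))

    report = {}
    for record_key, sightings in values_seen.items():
        distinct_values = {value for _, value in sightings}
        if len(sightings) == 1:
            report[record_key] = "unique"
        elif len(distinct_values) == 1:
            report[record_key] = "duplicate"
        else:
            report[record_key] = "conflict"
    return report


report = compare_silos({"hr": hr_silo, "sales": sales_silo}, "employee_id", "department")
print(report)
```

A check like this only finds the problem. Deciding which department’s value is authoritative is a governance question, which is why shared definitions matter as much as the tooling.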

So what are BI tools good for?

As mentioned before, the many available BI or visualization tools are great for working with good data, not for cleaning data, unifying data formats or aggregating data from multiple sources. Popular tools are excellent for analytics, off-the-cuff reporting, data experimentation and other isolated projects or activities that don’t require a ton of integration with the rest of the organization. Users can play with their data using BI tools to see what works and what doesn’t. And if they come up with something valuable to more than just their department, it can then be migrated to the company-wide data platform.

BI tools can also be used to great effect in combination with a modern data platform, which can automate the cleaning, de-duplication and unification of data sources and types. This ensures the derived insights are based on clean, unsiloed data.

What many popular BI tools aren’t so great at, however, is creating governed and repeatable data processes across an organization. They can also force organizations to continually re-invest in different data modeling, management and creation processes for each tool brought into the organization, as opposed to having everyone work from the same playbook. And that’s where a modern data platform comes in.

The benefits of a modern data platform

The first benefit of a modern data platform that includes a data lake, built-in ETL and support for advanced analytics and machine learning is full data integration. Modern data platforms specialize in automating data integration across an organization, including cleaning, de-duplicating and unifying data, which makes your data more valuable. After all, your data’s value is highest when it’s well managed and the processes behind it are repeatable. A modern data platform involves a combination of different technologies that work together to ingest, store, process and present data, all while enabling governance, data audit readiness, security and PII protection, resulting in a truly scalable and flexible solution.

And those BI tools we mentioned? They are very valuable once you’ve prepared the data appropriately for them. A centralized platform allows users to plug in any number of their favorite data visualization tools while working from the same data sets and producing repeatable results across the organization.

Learn how to break down data silos and achieve true integration with cloud-based master data management platforms in our latest ebook, The book of data integration zen.
