
Top big data analytics trends hold true as we look toward 2020

The overall importance of data and information within organizations has continued to grow. We’ve also seen the continued rise of megatrends like IoT, big data (even too much data) and, of course, machine learning, along with the ongoing maturation of other, perhaps lesser-known but equally important data initiatives such as governance and integration in the cloud. Early in January, we forecast the top trends for 2019. Now that we are well into the year, we are seeing new and exciting developments take shape that we anticipate will spur even more data sources and types, more demand for integration and cost optimization, and even better analytics and insights for organizations. You may have already noticed these trends gaining momentum in 2019:
  1. IoT and the growth of digital twins. Even though the Internet of Things was on everyone’s lips in 2018, the buzz around the digitization of the world around us and its implications for data isn’t going away. The frenzied growth of IoT data – along with many organizations’ continued inability to handle or make sense of all that data with their traditional data warehouses – continued to be a major theme of 2019. And it promises to gain momentum, presenting very real business opportunities for more organizations in 2020. Adding fuel to this ever-expanding fire is the ongoing growth of digital twins, which are digital replicas of physical objects, people, places and systems powered by real-time data collected by sensors. By some estimates there will be more than 20 billion connected sensors by 2020, powering potentially billions of digital twins. To capture the value of all that data, it needs to be integrated onto a modern data platform using an automated data integration solution that performs data cleaning, de-duplication and unification of disparate and unstructured sources (a minimal sketch of that kind of clean-up appears after this list).
  2. Augmented analytics. Until recently, most qualitative insights still had to be teased out by data scientists or analysts after poring over reams of quantitative data. But with augmented analytics, systems can use artificial intelligence and machine learning to suggest insights pre-emptively. Gartner says this will soon become a widespread feature of data preparation, management, analytics and business process management, leading to more citizen data scientists as barriers to entry come down – especially when combined with natural language processing, which makes possible interfaces that let users query their data in ordinary, conversational language.
  3. The harnessing of dark data. Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” This kind of data is often recorded and stored for compliance purposes only, taking up a lot of storage without being monetized either directly or through analytics to gain a competitive advantage. But with organizations increasingly leaving no business-intelligence stone unturned, we’re likely to see more emphasis placed on this as-yet relatively untapped resource, including the digitization of analog records and items (think everything from dusty old files to fossils sitting on museum shelves) and their integration into the data warehouse.
  4. Cold storage and cloud cost optimization. Migrating your data warehouse to the cloud is almost always less expensive than an on-premises build, but that doesn’t mean cloud systems can’t be cost-optimized even further. Because of this, in 2019 more organizations have been turning to cold data storage tiers such as Azure Blob Storage’s cool tier and Google Cloud Storage’s Nearline and Coldline classes. And with good reason: parking older, rarely accessed data in cold storage can save organizations as much as 50 percent on storage costs, freeing up cash to invest in data activities that can generate ROI instead of being a money drain (a short lifecycle-policy sketch follows this list).
  5. Edge computing and analytics. Edge computing takes advantage of proximity by processing information as physically close to sensors and endpoints as possible, thus reducing latency and traffic in the network. Gartner predicted that edge computing and cloud computing would become complementary models in 2019, with cloud services expanding to live not only in centralized servers, but also in distributed on-premises servers and even on the edge devices themselves. This will continue to decrease latency and costs for organizations processing real-time data through 2020. Some say that edge computing and analytics can also help increase security thanks to their decentralized approach, which localizes processing and reduces the need to send data over networks or to other processors. Others, however, note that the increased number of access points for hackers that these devices represent – not to mention that most edge devices lack IT security protocols – leaves organizations even more open to attack. Either way, the explosion in edge computing and analytics means an even greater need for a flexible data warehouse that can integrate all your data types when it’s time to run analytics (see the edge-aggregation sketch after this list).
  6. Data storytelling and visualization. Data storytelling and visualization became more established in 2019 as more organizations moved their traditional and often siloed data warehouses to the cloud. An increase in the use of cloud-based data integration tools and platforms means a more unified approach to data, which in turn means more and more employees will be able to tell relevant, accurate stories with data using an organization’s single version of the truth. And as organizations use better integration tools to solve their data silo problems, data storytelling will become more trusted by the C-suite as insights gleaned across the organization become more relevant to business outcomes.
  7. DataOps. The concept of DataOps really started to emerge this year, growing in importance as data pipelines became more complex and required more integration and governance tooling. DataOps applies Agile and DevOps methods to the entire data analytics lifecycle, from collection to preparation to analysis, employing automated testing and delivery for better data quality and analytics. It promotes collaboration, quality and continuous improvement, and uses statistical process control to monitor the data pipeline to ensure constant, consistent quality. And when experts predict that organizations should be able to handle 1,000 data sources in their data warehouses, truly automated, always-on data integration becomes the difference between delivering value and drowning (a simple control-chart sketch follows this list).
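To make the integration point in trend 1 concrete, here is a minimal Python sketch of the kind of clean-up an automated integration layer performs before sensor data can feed a digital twin. The two sources (an MQTT feed and a CSV export) and all field names are hypothetical; a real platform would do this continuously and at scale, while the sketch only illustrates normalization, de-duplication and unification onto one schema.

```python
from datetime import datetime, timezone

# Hypothetical raw readings from two disparate sources feeding the same digital twin.
mqtt_readings = [
    {"deviceId": "pump-01", "ts": "2019-06-01T12:00:00Z", "tempC": 71.2},
    {"deviceId": "pump-01", "ts": "2019-06-01T12:00:00Z", "tempC": 71.2},  # duplicate message
]
csv_readings = [
    {"device": "PUMP-01", "timestamp": "2019-06-01 12:05:00", "temperature_c": "70.9"},
]

def normalize_mqtt(r):
    """Map the MQTT payload onto a common schema."""
    return {
        "device_id": r["deviceId"].lower(),
        "ts": datetime.strptime(r["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc),
        "temp_c": float(r["tempC"]),
    }

def normalize_csv(r):
    """Map the CSV export onto the same schema."""
    return {
        "device_id": r["device"].lower(),
        "ts": datetime.strptime(r["timestamp"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc),
        "temp_c": float(r["temperature_c"]),
    }

def unify(records):
    """De-duplicate on (device_id, ts) and return one ordered stream for the twin."""
    seen, unified = set(), []
    for reading in records:
        key = (reading["device_id"], reading["ts"])
        if key not in seen:
            seen.add(key)
            unified.append(reading)
    return sorted(unified, key=lambda r: r["ts"])

records = [normalize_mqtt(r) for r in mqtt_readings] + [normalize_csv(r) for r in csv_readings]
print(unify(records))  # two unique readings remain to feed the pump's digital twin
```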
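For trend 4, cold storage is often just a lifecycle policy on a bucket. The sketch below uses the google-cloud-storage Python client to move objects into Nearline and then Coldline as they age; the bucket name and the 30- and 365-day thresholds are hypothetical and should be tuned to your own access patterns and retention requirements.

```python
# Minimal sketch, assuming the google-cloud-storage client library is installed
# (pip install google-cloud-storage) and credentials are configured in the environment.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-archive-bucket")  # hypothetical bucket name

# Move objects untouched for 30 days to Nearline, and after 365 days to Coldline.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=365)
bucket.patch()  # apply the updated lifecycle rules to the bucket
```

Azure Blob Storage offers the equivalent through lifecycle management rules that shift blobs into its cool or archive tiers as they age.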
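The latency and traffic argument behind trend 5 is easiest to see in code. This hypothetical Python sketch aggregates a window of sensor readings locally on the edge device and publishes one compact summary upstream instead of every raw data point; the sensor read and the publish call are stand-ins for whatever driver and transport (MQTT, HTTPS and so on) a real device would use.

```python
import json
import random  # stands in for a real sensor driver in this sketch
import statistics
import time

WINDOW = 60  # aggregate one minute of readings locally before sending anything upstream

def read_sensor():
    """Placeholder for a real sensor read (e.g. a temperature in Celsius)."""
    return 20.0 + random.random() * 5

def summarize(window):
    """Reduce raw readings to a single compact summary record."""
    return {
        "count": len(window),
        "mean": round(statistics.mean(window), 2),
        "min": round(min(window), 2),
        "max": round(max(window), 2),
        "ts": int(time.time()),
    }

def publish(summary):
    """Stand-in for an upstream publish over MQTT, HTTPS, etc."""
    print(json.dumps(summary))

readings = [read_sensor() for _ in range(WINDOW)]
publish(summarize(readings))  # one small message instead of 60 raw data points
```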
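Finally, the statistical process control idea in trend 7 can be as simple as a control-chart rule applied to pipeline metrics. The sketch below, using hypothetical daily row counts, flags a load whose volume falls outside three standard deviations of recent history; in a real DataOps pipeline the same check would run automatically on every batch and fail the run or alert the team.

```python
import statistics

def control_check(history, today, sigmas=3.0):
    """Return (in_control, (lower, upper)): is today's metric within
    mean +/- N standard deviations of recent history?"""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    lower, upper = mean - sigmas * stdev, mean + sigmas * stdev
    return lower <= today <= upper, (lower, upper)

# Hypothetical daily row counts landed by one pipeline over the last two weeks.
row_counts = [98_200, 101_450, 99_870, 100_310, 102_040, 97_950, 100_890,
              99_420, 101_120, 98_760, 100_540, 99_980, 101_760, 100_230]

today_count = 57_300  # today's load is suspiciously small
ok, (lower, upper) = control_check(row_counts, today=today_count)
if not ok:
    # In a real pipeline this would fail the run or page the on-call engineer.
    print(f"Row count {today_count:,} outside control limits ({lower:,.0f} - {upper:,.0f})")
```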
To fully take advantage of these trends and more, most organizations are coming to understand that their traditional data warehouse just won’t cut it. As more and more endpoints, edge devices and other data sources spur newer and newer data types, it’s imperative to stay prepared by using a flexible data platform that’s able to automate and integrate all your data sources and types at scale. Find out how Pythian can help your organization meet these challenges and opportunities.
