A large software-as-a-service (SaaS) company depended on an on-premises, proprietary neural network (advanced machine learning) application to continually refine its product. Because the product is AI-based, properly training its underlying algorithms was essential to a well-functioning service that could quickly and accurately adapt to customers’ needs. The client’s on-prem Hadoop cluster (running on Cloudera Express) collected tens of thousands of online audio files per day, along with their metadata; the data was stored in HBase and processed using Apache Spark and the Apache Impala SQL query engine. It was then fed into the neural network application and DataFox, with the goal of continually refining and improving the system’s understanding of, and ability to respond to, these interactions. However, because its version of Cloudera Express was no longer supported, the company needed to re-evaluate its data preparation and ingestion processes to find the most cost-effective and scalable alternative with the least possible disruption.
Pythian’s experience in data science, machine learning and neural networks – including our Machine Learning Partner Specialization from Google and a wealth of expertise with other machine learning tools such as Amazon SageMaker, Apache Spark MLlib, TensorFlow, and Apache MXNet – meant we were well-positioned to advise the client on its best options. Pythian advised that the best way to maintain continuity in the machine learning program while improving scalability was to replace the Hadoop cluster with a cloud-native solution, such as AWS with Amazon Athena or Google Cloud Platform with Google BigQuery.
Pythian’s recommendation confirmed the client’s hunch that moving its machine learning data collection and ingestion processes to the cloud was the best way to continue its machine learning operations with the least disruption, ensuring the company’s software could keep improving in near real time. The move would also improve scalability and cost-effectiveness through the use of cloud-native, ephemeral tools.