Published in World Economic Forum in February 2015.
The past few years have seen an explosion in the number of platforms available for big data analytical tasks.
The open source Hadoop framework is free to use, but very technical to set up and not specialized towards any particular job or industry. To use it in your business, you need a “platform” to operate it from.
These platforms are commercial offerings (you pay an ongoing service charge), most of which take the Hadoop framework and build on it, to provide analytical services of practical use to businesses and organizations.
So here is a rundown, in no particular order, of ten of the best and most widely used of these services. Like any commercial product in a competitive market, each has its advantages and disadvantages, and you need to make sure you are picking the right tool for the job.
Cloudera was formed by former employees of Google, Yahoo, Facebook and Oracle and offers open source as well as commercial Hadoop-based big data solutions under the label Cloudera Distribution including Hadoop, known as CDH. Its distributions make use of its Impala analytics engine, which has also been adopted and included in packages offered by competitors such as Amazon and MapR.
Unlike every other big data analytics platform, Hortonworks Data Platform (HDP) is composed entirely of open source code, with all of its elements built through the Apache Software Foundation. Hortonworks makes its money by offering services and support to get it running and to deliver the results you are after.
Microsoft’s flagship analytical offering, HDInsight, is based on Hortonworks Data Platform but tailored to work with Microsoft’s own Azure cloud services and SQL Server database management system. A big advantage for businesses is that it integrates with Excel, meaning even staff with only basic IT skills can dip their toes into big data analytics.
IBM offers a range of products and services designed to make complex big data analysis more accessible to businesses. They offer their own Hadoop distribution known as InfoSphere BigInsights.
This platform is specifically geared to businesses that generate a lot of their own data through their own machinery. Their stated goal is “machine data to operational intelligence”. The Internet of Things is key to their strategy, and among other products they provide the analytics behind the Nest Wi-Fi-enabled smart thermostat. Their analytics also drive Domino’s Pizza’s US coupon campaigns.
Although everyone thinks of it as an online store, Amazon also makes money by selling the magic that makes its business run so smoothly to other companies. The business model was based on big data from the start – using personal information to offer a personalized shopping experience. Amazon Web Services includes its Elastic Compute Cloud (EC2) and Elastic MapReduce services, which offer large-scale data storage and analysis in the cloud.
Pivotal’s big data package comprises their own Hadoop distribution, Pivotal HD, and their analytics platform, Pivotal Analytics. Their business model allows customers to store an unlimited amount of data and pay a subscription fee which varies according to how much they analyze. The company is strongly invested in the “data lake” philosophy of a unified, object-based storage repository for all of an organization’s data.
Another database management system, again available in both a free, open source edition and a paid-for proprietary version. This product is geared towards users looking to get involved with the Internet of Things. They offer three levels of service for paid users; higher-tier customers get helpdesk access for more users and quicker email support response times.
MapR offers its own distribution of Hadoop, which differs notably from others in that it replaces the commonly used Hadoop Distributed File System (HDFS) with its alternative, MapR Data Platform, which it claims offers better performance and ease of use.
Like many of the other systems here, this takes data from your Hadoop or cloud-based storage network and gives users access to a range of advanced analytical functions. Kognitio is used by BT to help set its call charges and by loyalty program Nectar for its customer analytics.
As always, I hope this was useful. Please let me know if you have any views or comments on the topic. For example, are there other platforms you would include? Any practical tips on picking the right one for you?
This article is published in collaboration with LinkedIn. Publication does not imply endorsement of views by the World Economic Forum.
Author: Bernard Marr is a globally recognized expert in strategy, performance management, analytics, KPIs and big data.