What are the top data analytics platforms of 2015?

Published in World Economic Forum in February 2015.

The past few years have seen an explosion in the number of platforms available for big data analytics tasks.

The open source Hadoop framework is free to use, but very technical to set up and not specialized towards any particular job or industry. To use it in your business, you need a “platform” to operate it from.

These platforms are commercial offerings (you pay an ongoing service charge), most of which take the Hadoop framework and build on it, to provide analytical services of practical use to businesses and organizations.

So here is a rundown, in no particular order, of ten of the best and most widely used of these services. Like any commercial product in a competitive market, each has its advantages and disadvantages, and you need to make sure you are picking the right tool for the job.

Cloudera CDH

Cloudera was formed by former employees of Google, Yahoo, Facebook and Oracle, and offers open source as well as commercial Hadoop-based big data solutions under the label Cloudera Distribution Including Hadoop, known as CDH. Their distributions make use of their Impala analytics engine, which has also been adopted and included in packages offered by competitors such as Amazon and MapR.

Hortonworks Data Platform

Unlike every other big analytics platform, HDP is composed entirely of open source code, with all of its elements built through the Apache Software Foundation. Hortonworks makes its money by offering services and support to get the platform running and deliver the results you are after.

Microsoft HDInsight

Microsoft’s flagship analytical offering, HDInsight is based on Hortonworks Data Platform, but tailored to work with their own Azure cloud services and SQL Server database management system. A big advantage for businesses is that it integrates with Excel, meaning even staff with only basic IT skills can dip their toes into big data analytics.

IBM Big Data Platform

IBM offers a range of products and services designed to make complex big data analysis more accessible to businesses. They offer their own Hadoop distribution known as InfoSphere BigInsights.

Splunk Enterprise

This platform is specifically geared to businesses that generate a lot of their own data through their own machinery. Splunk’s stated goal is “machine data to operational intelligence”. The Internet of Things is key to their strategy, and among other products they provide the analytics behind the Nest wifi-enabled smart thermostat. Their analytics also drive Domino’s Pizza’s US coupon campaigns.

Amazon Web Services

Although everyone thinks of Amazon as an online store, the company also makes money by selling the magic that makes its business run so smoothly to other companies. The business model was based on big data from the start, using personal information to offer a personalized shopping experience. Amazon Web Services includes the Elastic Compute Cloud and Elastic MapReduce services, offering large-scale data storage and analysis in the cloud.

Pivotal Big Data Suite

Pivotal’s big data package comprises their own Hadoop distribution, Pivotal HD, and their analytics platform, Pivotal Analytics. Their business model allows customers to store an unlimited amount of data and pay a subscription fee which varies according to how much they analyze. The company is strongly invested in the “data lake” philosophy of a unified, object-based storage repository for all of an organization’s data.

Infobright

Another database management system, available in both a free, open source edition and a paid-for proprietary version. The product is geared towards users looking to get involved with the Internet of Things. Infobright offers three levels of service for paying customers, with higher tiers allowing more users access to the helpdesk and offering quicker email support response times.

MapR

MapR offers its own distribution of Hadoop, notably different from the others in that it replaces the commonly used Hadoop Distributed File System (HDFS) with its own alternative, the MapR Data Platform, which it claims offers better performance and ease of use.

Kognitio Analytical Platform

Like many of the other systems here, this takes data from your Hadoop or cloud-based storage network and gives users access to a range of advanced analytical functions. Kognitio is used by BT to help set its call charges and by the loyalty program Nectar for its customer analytics.

As always, I hope this was useful. Please let me know if you have any views or comments on the topic. For example, are there other platforms you would include? Any practical tips on picking the right one for you?

This article is published in collaboration with Linkedin. Publication does not imply endorsement of views by the World Economic Forum.

To keep up with Agenda, subscribe to our weekly newsletter.

Author: Bernard Marr is a globally recognized expert in strategy, performance management, analytics, KPIs and big data.

Image: Visitors stand in front of QR-codes information panels during a ceremony to open an information showroom in central Moscow April 29, 2014. REUTERS/Maxim Shemetov 

Communications and Impact Metrics for Think Tanks

Published in CIGI online in July 2013

This blog is based on a presentation made at the conference “Think Tanks – Facing the Changing World,” hosted by the Chinese Academy of Social Sciences in Beijing, June 17-18, 2013.

Today, many of the world’s 5,500 think tanks are seeking more effective ways to communicate, to increase their impact – and exploring better ways to measure that impact.

My views on these tasks are shaped by 35 years in communications, including in newspapers and news websites, as well as my work these past three years with an independent, non-partisan global think tank, The Centre for International Governance Innovation (CIGI).

For any organization, including think tanks, good communications begin with the creation of an overall strategic plan. This may seem obvious, but any enterprise is more likely to succeed with a clear mission and goals (many of us can identify cases where a muddy plan led to poor results). Mission is a definition of purpose. Goals define what success will look like: the desired impact. Tactics are the actions necessary to achieve those goals. It helps everyone in the organization if a strategy combining these elements in a logical fashion is written consultatively, then shared internally, so that each person can see how his or her work contributes to the overall plan.

A traditional view of think tanks is that their strategy requires them to conduct research and analysis to develop policy ideas, and then communicate their policy ideas both directly and indirectly. They can communicate directly, to policy makers who exercise power by making decisions. They can also communicate indirectly, to policy influencers, such as the media, scholars and citizens.

One challenge, however, is measuring the influence of think tanks, especially in the areas of policy impact, to assess whether the strategic plan was successful. The problem is one of attribution — who gets the credit for a policy that is implemented? Policy input comes from many places. Public or governmental policy development is a complex and iterative process in which policy ideas are researched, analyzed, discussed and refined — often through broad consultations with many stakeholders. When a policy is finally adopted, it may wear the fingerprints of many hands. For these reasons, a think tank cannot always claim success and say, “this policy was our idea.” In many cases, it would be highly unusual for a political leader to give credit to a particular think tank for a specific policy; such leaders must take ownership of their own policies, to be accountable for them.

In creating impact, a think tank can extend its role beyond that of conducting research, analyzing and identifying policy problems or sharing policy ideas. For example, think tanks also have the ability to convene meetings of different groups at conferences, seminars and workshops — to connect people and to facilitate dialogue. As conveners, think tanks have the ability to build bridges among diverse groups such as policy makers, non-governmental organizations, academics, business leaders and the media. In this way, think tanks can create a sort of “Track II” process — a catalytic role in which the think tank’s own influence is, once again, hard to measure. Think tanks may also have a role in education; through training programs, education and outreach, think tanks can help to develop the next generation of diplomats, bureaucrats and political leaders.

In communications, it is important for think tanks to reach the right people, with the right message, using the right method. Think tanks use a variety of communications channels, as different channels may be more effective with certain audiences. To reach top leaders, for example, relying on academic-style research papers is ineffective, because high-level leaders are busy and have little time to read. The best method of outreach to senior leaders might be small meetings to present research findings in person, but this depends on having access to leaders through a think tank’s network of well-connected people. Meanwhile, middle-level officials can be reached through multiple channels, such as conferences, workshops, papers and policy briefs (research papers might be 5,000 to 10,000 words, or more; policy briefs are shorter documents of 1,000 to 1,500 words, which distill the research into a few concise findings or policy recommendations). Academics and scholars are more easily reached through well-written research papers and scholarly books. The wider public can best be reached through accessible websites and through the news media. For outreach through news media, think tanks must deploy skilled communications specialists who can create and send news releases written in journalistic style, and who will follow up personally with journalists with whom they have developed relationships through regular contact. Other channels of communication include social media, newsletters (including email newsletters) and annual reports, each suitable to a particular audience. Good communication plans use a combination of all of these channels to achieve the greatest impact.

The “Cycle of Impact” for a think tank has three phases. The first phase is to Plan. Researchers within think tanks consult with policy makers to better understand the challenges and issues those policy makers are facing; they design projects to address those topics, and the design includes an allocation of resources, budgets, staff and timelines. The second phase is to Engage. The think tank may engage in deep research and analysis of the topic, including the historical context and policy options; it may also convene conferences and public or private meetings as necessary; and it may communicate its findings through publications, websites and social media. The final step is to Measure. The think tank may track the quantity of outputs in publications, media mentions, website traffic and social media hits; it may evaluate the quality of the outputs (even if this is a subjective judgment) and it may even try to assess the actual impact on public policies (although this raises the difficulty of attribution, as discussed earlier); and it may report on these measurements to stakeholders, such as funders of the think tank. The third phase is the easiest to overlook, but measuring outcomes can yield valuable lessons to help a think tank improve its work.

We can think of many things to measure at a think tank. What follows is a list of 15 possible metrics, as suggested by various experts on think tanks — and unfortunately, the more useful ones to consider may also be the hardest to measure in exact numbers. These metrics can be grouped, with the first five metrics being measures of Exposure, based on an assumption that more influential think tanks are more exposed to public view.

  1. Media mentions: These are citations of the think tank, by name, in media such as newspapers and news websites. Third-party services can be hired to measure citations, or think-tank staff can search the Web with Internet search engines. Online searches are imperfect, however; they may not capture references that occur in traditional print only, or on television or radio; and they may miss citations behind paywalls or other security measures.
  2. Number and type of publications. This is strictly a quantitative measure of the think tank’s publications, and does not evaluate the actual content of the publications as being of a high quality or not.
  3. Scholarly citations. These include citations of the think tank’s work in academic journals.
  4. Government citations. These include citations of the think tank’s work in government meetings or official party proceedings.
  5. Think tank ratings. How did the think tank fare in annual ratings, such as those produced by the University of Pennsylvania? Some critics see such rankings as mere popularity votes, based on perceptions only, with methodologies that do not take into account different structures, funders, missions or other characteristics of think tanks. Nevertheless, the ratings do garner considerable attention.

The next group of metrics looks at Resources, based on the assumption that more resources allow a think tank to exercise more clout and, hence, achieve more influence.

  6. Quality, diversity and stability of funding. The source of its money may reflect on a think tank’s independence, support and connections.
  7. Number, experience, skills and reputation of experts, analysts and researchers. It’s easy to count heads, but reputation is a subjective quality and harder to measure.
  8. Quality and extent of networks and partnerships. Influence is not just a question of who you are, but who you know.

The next group of metrics concerns Demand — that is, does anyone actually want to see or hear from a particular think tank?

  9. Events. The number of conferences, lectures and workshops, and the number of attendees (both are simple quantifiable measures). Harder to measure is the quality of the attendees. Are we just filling the room, or are we attracting influential opinion leaders, powerful policy makers and top-level experts?
  10. Digital traffic and engagement. Number of website visitors, page views, time spent on pages, “likes” or followers.
  11. Official access. Number of consultations with officials, as requested by the officials themselves.
  12. Publications sold or downloaded from websites. This is not a measure of output, but rather of the external “pull” on the publications.

The final group of metrics considers Policy Impact and Quality of Work. These may be the most important things to measure, but also are among the most difficult to quantify.

  13. Policy recommendations considered or actually adopted. As discussed previously, this is a problem of attribution. A think tank may say it put forward an idea, but if others had the same idea, who gets the credit if a policy is implemented?
  14. Testimonials. Praise, criticism or other assessments of a think tank’s work can be collected through interviews with policy makers or recognized experts; this work can be done by external, independent evaluators, reporting to the think tank’s board or funders. As well, opinions about the think tank can be collected through formal surveys of the organization’s event attendees or subscribers to its newsletters and publications.
  15. Quality of the think tank’s work. This is the most subjective of all metrics, but criteria for quality can be developed and defined, and placed on scales (such as from 1 to 10). How good were the publications in terms of readability and insight? How relevant were the projects and outputs to real-world problems and issues? How effective is the think tank in communicating its messages? Again, external and independent evaluators can be hired to make these highly subjective judgments.

In summary, to achieve maximum impact, think tanks should develop an overall strategic plan for the organization, plan their research projects consultatively with policy makers, engage their audiences through channels that are carefully designed to reach the right people using the right method and, finally, measure the outcomes of their work to ensure the goals were met.

Prepared with the assistance of CIGI Public Affairs Coordinator Kelly Lorimer. 

Achieving a data revolution in sustainable development: open data for development

Published by Publish What You Fund in October 2014.

Submission to the UN Expert Advisory Panel on the Data Revolution


We welcome the appointment of the Independent Expert Advisory Group (IEAG) on the Data Revolution. We believe that open, timely and comparable data is needed to unlock the power of information to drive positive outcomes for sustainable development.

Definition of a data revolution

A data revolution will see timely, accessible, comprehensive and comparable information about development-related activities and impacts made public, in a way that different users can freely access it to monitor, compare, use and reuse the information for decision making, planning and accountability. This includes financial, descriptive and performance-related information.

Principles for a data revolution

Transparency, accountability and citizen engagement are now accepted as central to more effective development and are reflected in the current discussions on the post-2015 Development Agenda. In support of these discussions, the Expert Advisory Group should establish some basic principles in order to maximise both the availability and use of the data:

  1. Information should be published proactively: All providers and recipients of sustainable development flows should make public what they are doing, for whom, when and how.
  2. Information should be comprehensive, timely, accessible and comparable: Development information should be provided in open, comparable formats. Organisations should develop their systems to better facilitate the collection and publication of timely information.
  3. Everyone should be able to request and receive information on sustainable development processes: Everyone needs to be able to access the information as and when they wish. The information should be open by default.
  4. The right of access to information should be promoted: Governments and other organisations engaged in sustainable development should actively promote this right. This includes private companies, foundations, academic institutions, civil society organisations (CSOs) and other third parties.
  5. Open data and new technologies should be leveraged: Organisations should draw on the potential that new technologies offer for transparency and accountability. They should provide incentives to make the data more accessible to different stakeholders, including by investing in capacity building and adopting open data policies and practices.
  6. Build an enabling environment for citizen-led accountability: Open data is not a ‘silver bullet’ for accountability; citizens need political space to feed into discussions and enabling conditions to exert their right to access to information and participation.

Open data standards

Within the context of the Post-2015 Development Agenda, the international development community has been discussing the importance of standards and harmonisation to increase the transparency and comparability of development flows. These discussions can benefit from the lessons learnt from establishing and implementing the International Aid Transparency Initiative (IATI), a multi-stakeholder initiative comprising donors, partner countries, foundations, open data experts and civil society.

Agreed in 2011, the IATI Standard is a technical publishing framework allowing open data from different development organisations to be compared, aligned with partner country budgets, and linked to results at national level. The Standard was developed after extensive consultations on the information needs of partner countries, CSOs and donors.
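To give a flavour of what publishing to such a framework looks like, the sketch below shows a simplified, IATI-style activity record in XML. This is an illustration only, not a literal excerpt of the Standard: the organisation name, identifier and values are invented, and the real Standard defines many more elements and controlled codelists.

```xml
<!-- Illustrative IATI-style activity record (simplified; names and values are invented) -->
<iati-activities>
  <iati-activity default-currency="USD">
    <iati-identifier>XM-EXAMPLE-PROJECT-001</iati-identifier>
    <reporting-org ref="XM-EXAMPLE">Example Development Fund</reporting-org>
    <title>Clean water programme, district level</title>
    <activity-status code="2"/>
    <transaction>
      <transaction-date iso-date="2014-06-01"/>
      <value currency="USD" value-date="2014-06-01">250000</value>
    </transaction>
  </iati-activity>
</iati-activities>
```

Because every publisher uses the same element names and codelists, records from different organisations can be machine-read, aggregated and compared side by side, which is precisely what makes a common open standard more useful than a collection of bespoke spreadsheets.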

Based on Publish What You Fund’s experience of advocating for an open standard for aid data, we recommend the establishment of compatible open data standards for other flows based on the following approach:

  1. Consult with users: Open consultations with user groups help identify their needs early on and build them into open data standards as they are being developed.
  2. Engage all stakeholders: A multi-stakeholder governance structure and working groups help ensure that open data standards are fit for purpose for various providers and users of the information.
  3. Invest in information management systems: The best quality information is backed up by good internal data collection systems that allow detailed, disaggregated information to be published automatically, using the “publish once, use often” approach. As well as publishing the information as raw data, visualising it and making it available via open data portals help improve accessibility for non-expert users.
  4. Publish, use and improve: Several organisations initially published a limited amount of information to IATI and then continually made improvements, both to the coverage and timeliness of the data. This “publish, use, improve” approach allows for quick progress and external feedback helps identify systems improvements for driving up the quality of the data.
  5. Build awareness and capacity: Data supply does not guarantee use. This is partly due to capacity constraints of users; lack of awareness of what new data is available; and lack of systems required for mapping the information to other datasets already being used.
  6. Coordinate processes: Close coordination between the policy and technical functions of some IATI publishers has meant they have progressed quickly. Sharing best practice with others helps drive the supply and increase the quality of the data made available.

Getting the basics right

For a data revolution to have maximum impact, it needs to be built on some basic foundations.


Joining up different datasets, standardising them and ensuring data quality need to remain at the centre of discussions on the data revolution. Crucially, lessons learnt from opening up information on development flows via a common, open standard need to be incorporated into any new open data initiatives, building on the work done to date. The next step is to ensure the interoperability of different standards so that the richness and usefulness of the data are enhanced.

Adequate resources and funds need to be allocated to all these activities to ensure continuous progress. The potential for open data to have a profound impact on development outcomes is enormous but it will require a truly multi-stakeholder approach to ensure that citizens and users are at the heart of discussions on a data revolution.