Open-source platforms for big data have exploded in popularity. And in the past few months, it seems like nearly everyone is feeling the fallout.
Cost, flexibility and the availability of trained personnel are major reasons for the open-source boom. Hadoop, R and NoSQL are now the supporting pillars of many enterprises' big data strategies, whether they involve managing unstructured data or performing complex statistical analyses on it.
It's almost hard to keep up: SAP AG recently released a new product, SAP BusinessObjects Predictive Analysis, software that integrates algorithms from the open-source R language, which is used extensively in the academic community for advanced statistical modelling.
A few weeks before that, Teradata Corp. announced that its new integrated analytics portfolio would include R functionality as well as a connection to GeoServer, a Java-based open-source geolocation platform. Countless other companies are rushing to build links to Hadoop.
Widespread Adoption, Feverish Innovation
James Kobielus, then an analyst at Forrester Research Inc. (he's now senior program director for product marketing of big data analytics solutions at IBM Corp.), wrote in an e-mail message that "open-source approaches have the momentum of the most widespread adoption and the most feverish innovation."
But what's the rush?
First of all, Kobielus explains, just as open-source products ranging from Mozilla to Android have earned widespread acceptance in the IT community after some birth pains, open-source data storage and analysis software has now matured ("no longer the risky bet they were just a year or two ago," as he puts it).
Secondly, Kobielus wrote, platforms like Hadoop, R and NoSQL have enjoyed an advantage over proprietary software because they were able to evolve faster. They're also being continuously developed and refined by many different parties. Pretty soon, he predicts, open source will begin to dominate the big data world.
"As the footprint of closed-source software shrinks in many data/analytics environments, many incumbent vendors will evolve their business models toward open-source approaches," he wrote, "and also ramp up professional services and systems integration to assist customers in their moves towards open-source, cloud-oriented analytics, much of it focused on Hadoop and R.
"Forrester regards Hadoop, for example, as the nucleus of the next-generation enterprise data warehouse (EDW) in the cloud, and R as a key codebase in the coming wave of integrated big data development tools. We also expect various open-source NoSQL databases and tools to coalesce into rich alternatives to closed-source content analytics offerings."