Over the past six months I have been engaged with a number of EMC customers and partners creating Big Data Analytics solutions. Big Data Analytics is one of the fastest growing segments of IT. The amount of data being generated by us and about us is projected to grow by more than 50x this decade, according to the latest IDC Digital Universe study. Customers are looking to analyze their data more deeply to make better decisions.
Customers are typically looking to deploy Hadoop services to analyze their vast amounts of data. Their IT organizations face three main types of challenges in supporting Big Data Analytics Hadoop projects:
- Complexity
- Cost
- Security
The complexity challenge is created by the expanding number of Hadoop distributions (Cloudera, Hortonworks, Pivotal, Intel, …). Each of these distributions offers unique capabilities, and all have a significant number of customer deployments. All Hadoop distributions require IT to deploy a new file system service, HDFS, yet our customers need to leverage their existing storage investment. Over the past year at EMC we have added native HDFS support to our most popular data analytics storage product, Isilon. Both existing Isilon customers and new Isilon deployments can make data stored on Isilon accessible through HDFS without any data copying. To further reduce complexity for our customers, we have built deployment guides for all the major Hadoop distributions that provide a simple, low-risk, repeatable process for deploying Hadoop using existing IT capacity. These deployment guides, or starter kits as we like to call them, are available here.
EMC will shortly be adding HDFS data services to our ViPR product. This will allow customers to use the HDFS protocol to access data on any of the ViPR-supported storage platforms. A preview demo of Cloudera using ViPR HDFS services was recently published by Jim Ruddy here.
Enabling customers to leverage their existing IT infrastructure reduces their upfront costs. Our customers find it easier to get started with their Hadoop deployments, and they also eliminate the training costs needed to support new infrastructure dedicated to Hadoop processing.
Recently, many of our customers in the healthcare and financial markets have needed to implement more robust security governance capabilities. As customers implement Hadoop solutions, they are often taking copies of data from several of their transaction processing systems, which have mature security governance processes in place, and dumping them into large HDFS data lakes with little security governance. This exposes customers to HIPAA and financial (SEC 17a-4) violations. We have recently partnered with a company, Rainstor, at several customers. Rainstor provides a robust set of protection and auditing security services for Hadoop data repositories, leveraging tight integration with EMC Isilon. The security governance services they provide are:
- Authentication – Role Based Access Controls
- Authorization – ACLs by user
- Encryption – Data at Rest
- Audit Trail – logs data access by user for audit
- Immutability – data can never be changed
The Rainstor services can be added with minimal configuration changes to the main Hadoop distributions.
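To make three of the controls listed above concrete (per-user ACLs, an audit trail, and immutability), here is a minimal Python sketch of how they fit together. It is a toy in-memory model for illustration only: `GovernedStore`, `AccessDenied`, and `ImmutableRecordError` are hypothetical names, not part of Rainstor, Isilon, or any Hadoop distribution, and authentication and encryption at rest are deliberately left out.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a user's ACL does not allow the requested action."""


class ImmutableRecordError(Exception):
    """Raised when a caller tries to overwrite an existing record."""


@dataclass
class GovernedStore:
    # Hypothetical toy store; none of these names come from a real product.
    acl: dict  # user -> set of allowed actions, e.g. {"alice": {"read", "write"}}
    _records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def _check(self, user, action, key):
        allowed = action in self.acl.get(user, set())
        # Audit trail: record every attempt, whether it is allowed or denied.
        self.audit_log.append((user, action, key, allowed))
        if not allowed:
            raise AccessDenied(f"{user} may not {action} {key}")

    def write(self, user, key, value):
        self._check(user, "write", key)
        # Immutability: a key can be written once and never changed afterwards.
        if key in self._records:
            raise ImmutableRecordError(f"{key} already exists")
        self._records[key] = value

    def read(self, user, key):
        self._check(user, "read", key)
        return self._records[key]
```

In this sketch, a user with only `"read"` in their ACL entry can call `read` but gets `AccessDenied` on `write`, and every call leaves an entry in `audit_log` that an auditor could review later.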
Customers have responded positively to our Hadoop solutions, including our imminent ViPR support for HDFS. In the last quarter we had over 100 new Hadoop implementations using EMC infrastructure. I look forward to continuing to work with our customers and partners to reduce complexity and cost and to improve security for their Big Data Analytics services. I see this market continuing to grow rapidly in 2014, and IT departments will need to know how to quickly deploy robust Hadoop services with minimal cost and risk.
I actually didn't even realize that Intel had their own distro. That's pretty cool to see them involved in Hadoop. Guess I need to go check that out now.
Great post, Eddie!
Posted by: Clint | 09/24/2014 at 03:04 PM