Storage

Cloud Interworking – Distributed Data Access

In my previous post, Cloud Interworking Services, I described a new set of IT infrastructure services that enable reliable and secure inter-cloud access. In this post I am going to describe inter-cloud data access by your applications. As more applications leverage cloud infrastructure services, data sets are being distributed across several clouds. Most applications will need access to data sets stored in one or more cloud infrastructure services different from where they are running. For example, when developing a new customer engagement mobile application that runs in your private cloud, you may need access to data stored in the Salesforce.com cloud and to SAP application data running at Virtustream. A well-architected cloud infrastructure needs to enable frictionless data access by the new mobile application. Application access to any of your data sets is a basic requirement to compete in the digital economy. The faster IT can iterate on application development, the faster the business will deliver customer value.
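
Here is a minimal sketch of how that mobile application might read CRM data through the Salesforce REST API. The org URL, access token, and SOQL query are hypothetical placeholders; a real application would obtain the token through an OAuth flow.

```python
# Minimal sketch: a private-cloud application reading CRM data from the
# Salesforce REST API. The instance URL, API version, access token, and
# SOQL query are illustrative placeholders, not values from the post.
import requests

INSTANCE_URL = "https://example.my.salesforce.com"   # hypothetical org URL
ACCESS_TOKEN = "00D...example-token"                  # obtained via OAuth 2.0 in practice
API_VERSION = "v59.0"

def query_accounts():
    """Run a SOQL query against the Salesforce REST API and return the rows."""
    url = f"{INSTANCE_URL}/services/data/{API_VERSION}/query"
    params = {"q": "SELECT Id, Name FROM Account LIMIT 10"}
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()["records"]

if __name__ == "__main__":
    for record in query_accounts():
        print(record["Id"], record["Name"])
```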

Application access to data sets created and maintained remotely is not a new challenge for IT. Starting at the beginning of this decade, the industry began using storage virtualization technologies to enable data sets to be accessible in multiple data centers. Products like EMC VPLEX, Hitachi USP V, and NetApp V-Series provide these capabilities. These storage virtualization technologies were primarily designed to enable rapid-restart business continuity between clouds up to hundreds of miles apart. It is not easy for multiple applications to access the same data sets simultaneously without implementing a complex distributed lock manager to keep the data sets in a consistent state. I have seen many customers successfully create snapshot copies of the data so that other applications can access read-only copies of transactional data sets for analytics processing. Storage virtualization is limited by distance and network latency, typically not exceeding 50 ms of latency or roughly 100 miles. It is also mostly limited to block storage protocols, which restricts application access.

More recently, storage gateway technologies have been introduced to place data sets in their most cost-effective cloud service while maintaining application access over traditional block storage and file protocols. Typically these storage gateways cache the most frequently accessed data locally to minimize access latency, and they pull any data not cached locally from the cloud transparently to applications. The challenge with most storage gateway technologies is that the data is not easily accessible by applications running anywhere but the source site. The storage gateway products I see most often are EMC CloudArray and Panzura.
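
Conceptually, a storage gateway behaves like a read-through cache in front of a cloud object store. The sketch below illustrates the idea; the class and method names are illustrative and do not correspond to any particular gateway product's API.

```python
# Minimal sketch of the read-through caching a storage gateway performs:
# serve reads from a local cache and fetch misses from the cloud object
# store transparently. Names are illustrative; keys are assumed to be
# simple flat object names.
import os

class CloudBackedCache:
    def __init__(self, object_store, cache_dir="/var/cache/gateway"):
        self.object_store = object_store            # remote object store client
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def read(self, key):
        """Return the object, pulling it from the cloud on a cache miss."""
        local_path = os.path.join(self.cache_dir, key)
        if os.path.exists(local_path):               # cache hit: serve locally
            with open(local_path, "rb") as f:
                return f.read()
        data = self.object_store.get(key)            # cache miss: fetch remotely
        with open(local_path, "wb") as f:            # populate the local cache
            f.write(data)
        return data
```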

Both storage virtualization and storage gateway technologies fail to give IT ubiquitous access to data sets across multiple cloud services. In order to decouple the data from the applications, a new architecture is required. New applications should access all data through standard APIs rather than traditional storage protocols. Data sets must be accessible independent of any single application and cloud infrastructure. Application architectures for modern mobile, web, and social applications follow The Twelve-Factor App architecture, where data sources are treated as backing services that are attached at run time. For example, a modern 12-factor app should be able to attach to and detach from any number of MySQL databases and object stores the same way each time, regardless of which cloud infrastructure the application or the data set is operating in.
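
A minimal sketch of that backing-services pattern: the MySQL database and the object store are attached at run time purely through environment variables, so the same code runs unchanged wherever the application lands. The variable names and endpoints are assumptions for illustration.

```python
# Minimal sketch of Twelve-Factor backing services: both data sources are
# attached at run time from environment configuration. Variable names and
# endpoints are illustrative placeholders.
import os
import pymysql
import boto3

def attach_database():
    """Attach whatever MySQL instance the environment points at."""
    return pymysql.connect(
        host=os.environ["DATABASE_HOST"],
        user=os.environ["DATABASE_USER"],
        password=os.environ["DATABASE_PASSWORD"],
        database=os.environ["DATABASE_NAME"],
    )

def attach_object_store():
    """Attach an S3-compatible object store, wherever it happens to run."""
    return boto3.client(
        "s3",
        endpoint_url=os.environ["OBJECT_STORE_ENDPOINT"],
        aws_access_key_id=os.environ["OBJECT_STORE_KEY"],
        aws_secret_access_key=os.environ["OBJECT_STORE_SECRET"],
    )
```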

For existing data sets that are tightly coupled to applications, new data fabrics will be necessary to virtualize access to data sources. For example, if you want an application to perform data analytics against data sets in a SQL database and an HDFS file system, your application will need to rely on a data fabric product like Pivotal HAWQ to access the two different data formats and execute a single SQL query. New applications will leverage data fabric APIs to access legacy data sources such as ERP databases. These modern data fabrics manage metadata describing data sets, including location and format. Since new applications are creating more unstructured data (e.g. audio, video, images) in addition to traditional structured data (spreadsheets, SQL databases), applications will need a data fabric to manage access consistently regardless of format.
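
To make this concrete, the sketch below shows a single SQL query joining a native database table with clickstream files sitting in HDFS, in the spirit of the HAWQ/PXF external-table approach. The host, table, and column names are illustrative, and the exact external-table syntax varies by HAWQ and PXF version.

```python
# Minimal sketch of a data-fabric style query: one SQL statement spanning a
# relational table and files in HDFS. All identifiers are hypothetical.
import psycopg2

DDL = """
CREATE EXTERNAL TABLE clickstream_hdfs (customer_id int, url text, ts timestamp)
LOCATION ('pxf://namenode:51200/data/clicks?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
"""

QUERY = """
SELECT c.name, count(*) AS page_views
FROM customers c                      -- regular SQL table
JOIN clickstream_hdfs h               -- files in HDFS, exposed as a table
  ON c.id = h.customer_id
GROUP BY c.name
ORDER BY page_views DESC;
"""

with psycopg2.connect(host="hawq-master.example.com", dbname="analytics") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)      # register the HDFS data set once
        cur.execute(QUERY)    # one SQL query spans both data sources
        for name, views in cur.fetchall():
            print(name, views)
```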

Application access to all your data sets is critical to developing and operating new software. While we have been making IT infrastructures more flexible with storage virtualization and gateways, the new data fabrics are what will enable consumption of cloud infrastructure at scale. In order for companies to successfully compete in the digital economy, they need to quickly develop new custom software that delivers differentiated products and customer experiences. To get that application development speed and scale, these applications need to be deployed in cloud infrastructures with a robust inter-cloud data service.


Cloud Interworking Services

In my previous post, Cloud Is Not A Place, I presented my case for enterprise IT needing four types of cloud services to support their application workloads. Many enterprise IT customers I work with are adopting a bimodal IT strategy. One mode of cloud services supports their traditional three-tier client-server applications such as SAP R/3, Oracle ERP, SharePoint, and SQL Server-based applications. Most of these traditional systems are their systems of record. The second mode of cloud services is optimized for modern mobile, web, social, and big data applications such as Salesforce.com and custom-developed web portal systems. Many of these applications are their systems of customer engagement.

Many application workloads can be supported by just one of these cloud types, but every enterprise IT application portfolio requires a combination of more than one of them. For example, many businesses run SAP for ERP and use Salesforce.com for CRM; these two workloads will be supported by different cloud types. As you add more application workloads, you must deal with applications that need access to data sets generated by other applications, which may not run on the same cloud type. You will also see opportunities to use one cloud type for primary data and other cloud types for redundancy and protection. Frictionless access between these different cloud services is critical.

A new class of cloud services, which I call Cloud Interworking services, is needed. These services are critical to optimal application workload placement and interoperability. I believe Cloud Interworking services will enable enterprise IT organizations to provide the most differentiated and cost-effective IT services for their businesses.

We have identified three basic Cloud Interworking services that modern enterprise IT needs to support:

  • Data Set Access – access data sets easily from any cloud
  • Data Security – encryption of data in transit and at rest (see the sketch after this list)
  • Data Protection – data copies that can be used to restore data when access to the primary copy fails
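
As a sketch of the Data Security service, the example below encrypts a data set copy client-side before it leaves the source cloud (protecting it at rest) and pushes it to another cloud over HTTPS (protecting it in transit). The key handling and destination URL are placeholders; a real service would use a managed key management system.

```python
# Minimal sketch of inter-cloud data security: client-side encryption at rest,
# TLS in transit. The key and URL are illustrative placeholders.
from cryptography.fernet import Fernet
import requests

key = Fernet.generate_key()          # in practice, fetched from a KMS
cipher = Fernet(key)

def replicate(payload: bytes, url: str) -> None:
    """Encrypt a data set copy and push it to another cloud over TLS."""
    ciphertext = cipher.encrypt(payload)               # at rest: encrypted blob
    requests.put(url, data=ciphertext, timeout=60)     # in transit: HTTPS

replicate(b"customer data set", "https://backup.example.com/objects/ds-001")
```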

In my next series of posts I am going to discuss how these capabilities can be implemented today. These Cloud Interworking services will enable enterprise IT infrastructure teams to become their company's cloud portfolio managers. As the cloud portfolio manager, they will be able to reduce friction with their application development teams while reducing costs and improving agility.