In my previous post, Cloud Interworking Services, I described a new set of IT infrastructure services that enable reliable and secure inter-cloud access. In this post I am going to describe inter-cloud data access by your applications. As more applications leverage cloud infrastructure services, data sets are being distributed across several clouds. Most applications will need access to data sets stored in one or more cloud infrastructure services other than the one where they run. For example, a new customer engagement mobile application that runs in your private cloud may need access to data stored in the Salesforce.com cloud and to SAP application data running at Virtustream. A well-architected cloud infrastructure needs to give the new mobile application frictionless access to that data. Application access to any of your data sets is a basic requirement to compete in the digital economy: the faster IT can iterate on application development, the faster the business will deliver customer value.
Application access to data sets created and maintained remotely is not a new challenge for IT. At the beginning of this decade the industry began using storage virtualization technologies to make data sets accessible in multiple data centers. Products like EMC VPLEX, Hitachi USP V, and NetApp V-Series provide these capabilities. These storage virtualization technologies were primarily designed to enable rapid-restart business continuity between clouds up to hundreds of miles apart. Without a complex distributed lock manager to keep the data sets in a consistent state, it is not easy for multiple applications to access the same data sets simultaneously. I have seen many customers successfully create snapshot copies of the data so that other applications can access read-only copies of transactional data sets for analytics processing. Storage virtualization is also limited by distance and network latency, typically to no more than 50 ms or less than 100 miles, and it is mostly limited to block storage protocols, which restricts how applications can access the data.
More recently, storage gateway technologies have been introduced to place data sets in their most cost-effective cloud service while maintaining application access over traditional block storage and file protocols. Typically these storage gateways cache the most frequently accessed data locally to minimize access latency, and they pull any data that is not cached locally, transparently to applications. The challenge with most storage gateway technologies is that the data is not easily accessible to applications running anywhere other than the source site. The storage gateway products I see most often are EMC CloudArray and Panzura.
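The gateway read path described above is essentially a read-through cache. The Python sketch below is a hypothetical illustration, not how EMC CloudArray or Panzura are implemented: `fetch_remote` stands in for whatever call pulls a block from the cloud store, and least-recently-used eviction is an assumption chosen for simplicity.

```python
from collections import OrderedDict
from typing import Callable, Dict


class ReadThroughGatewayCache:
    """Hypothetical sketch of a storage gateway read path: serve hot blocks
    locally and fetch misses from the backing cloud store transparently."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._fetch_remote = fetch_remote  # assumed stand-in for the cloud object store
        self._capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self._cache:
            # Local hit: mark as most recently used, no remote call needed.
            self._cache.move_to_end(block_id)
            self.hits += 1
            return self._cache[block_id]
        # Miss: pull from the remote cloud; the caller cannot tell the difference.
        self.misses += 1
        data = self._fetch_remote(block_id)
        self._cache[block_id] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used block to keep the local cache bounded.
            self._cache.popitem(last=False)
        return data


if __name__ == "__main__":
    remote_store: Dict[str, bytes] = {f"blk{i}": f"data{i}".encode() for i in range(5)}
    remote_calls = []

    def fetch(block_id: str) -> bytes:
        remote_calls.append(block_id)
        return remote_store[block_id]

    gw = ReadThroughGatewayCache(fetch, capacity=2)
    for b in ["blk0", "blk1", "blk0", "blk2", "blk1"]:
        gw.read(b)
    print(gw.hits, gw.misses, remote_calls)  # 1 4 ['blk0', 'blk1', 'blk2', 'blk1']
```

Note that the cache lives only at the gateway's site, which is exactly why applications running elsewhere do not benefit from it.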
Neither storage virtualization nor gateway technologies allow IT to provide ubiquitous access to data sets across multiple cloud services. De-coupling data from applications requires a new architecture. New applications should access all data through standard APIs rather than traditional storage protocols, and data sets must be accessible independent of any single application or cloud infrastructure. Application architectures for modern mobile, web, and social applications follow The Twelve-Factor App methodology, where data sources are treated as backing services that are attached at run time. For example, a modern 12-factor app should be able to attach to and detach from any number of MySQL databases and object stores the same way every time, regardless of which cloud infrastructure the application or data set is running in.
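The Twelve-Factor App keeps the URL or other locator of each backing service in config, typically environment variables. A minimal Python sketch, assuming hypothetical variable names such as `ORDERS_DB_URL` and made-up example hostnames, shows how an application can resolve its MySQL connection details at run time without hard-coding which cloud hosts the database. It only parses the locator; a real application would pass these details to a MySQL driver.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class BackingService:
    """Connection details for an attached resource, independent of its hosting cloud."""
    scheme: str
    host: str
    port: Optional[int]
    database: str
    user: Optional[str]


def attach(name: str, env: Mapping[str, str] = os.environ) -> BackingService:
    """Resolve a backing service from a URL stored in config (environment variables).

    Moving the database from a private cloud to a public one means changing the
    variable's value, not the application code.
    """
    url = env.get(name)
    if url is None:
        raise KeyError(f"backing service {name!r} is not attached (env var missing)")
    parsed = urlparse(url)
    return BackingService(
        scheme=parsed.scheme,
        host=parsed.hostname or "",
        port=parsed.port,
        database=parsed.path.lstrip("/"),
        user=parsed.username,
    )


if __name__ == "__main__":
    # Hypothetical config: one database in a private cloud, one in a public cloud.
    config = {
        "ORDERS_DB_URL": "mysql://app@db.private.example.com:3306/orders",
        "CRM_DB_URL": "mysql://app@mysql.public-cloud.example.net/crm",
    }
    orders = attach("ORDERS_DB_URL", config)
    crm = attach("CRM_DB_URL", config)
    print(orders.host, orders.port, orders.database)
    print(crm.host, crm.port, crm.database)
```

Both databases are attached through the same code path; only the config differs.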
For existing data sets that are tightly coupled to applications, new data fabrics will be necessary to virtualize access to data sources. For example, if you want an application to run analytics against data sets in a SQL database and the HDFS file system, your application will need to rely on a data fabric product like Pivotal HAWQ to access the two different data formats and execute a SQL query. New applications will use data fabric APIs to access legacy data sources such as ERP databases. These modern data fabrics manage metadata describing data sets, including their location and format. Since new applications are creating more unstructured data (e.g., audio, video, images) in addition to traditional structured data (spreadsheets, SQL databases), applications will need a data fabric to manage access consistently regardless of format.
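To make the metadata idea concrete, here is a hypothetical Python sketch of a tiny data fabric catalog that records each data set's location and format and picks a reader accordingly, so the application code that combines the data never deals with formats. It is not Pivotal HAWQ's API: the `hdfs://` and `s3://` strings are only lookup keys into an in-memory stand-in store, and CSV and JSON Lines stand in for the different data formats.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterator

Row = Dict[str, str]


@dataclass(frozen=True)
class DatasetEntry:
    """Catalog metadata: where a data set lives and how it is encoded."""
    name: str
    location: str  # hypothetical URI used as a lookup key
    fmt: str       # "csv" or "jsonl" in this sketch


def _read_csv(raw: str) -> Iterator[Row]:
    yield from csv.DictReader(io.StringIO(raw))


def _read_jsonl(raw: str) -> Iterator[Row]:
    for line in raw.splitlines():
        if line.strip():
            # Normalize values to strings so every reader yields the same row shape.
            yield {k: str(v) for k, v in json.loads(line).items()}


READERS: Dict[str, Callable[[str], Iterator[Row]]] = {"csv": _read_csv, "jsonl": _read_jsonl}


class DataFabricCatalog:
    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # hypothetical transport: location -> raw content
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        if entry.fmt not in READERS:
            raise ValueError(f"no reader for format {entry.fmt!r}")
        self._entries[entry.name] = entry

    def scan(self, name: str) -> Iterator[Row]:
        # Look up location and format in the metadata, then dispatch to the right reader.
        entry = self._entries[name]
        return READERS[entry.fmt](self._fetch(entry.location))


if __name__ == "__main__":
    storage = {
        "hdfs://cluster/customers.csv": "id,name\n1,Ada\n2,Grace\n",
        "s3://bucket/orders.jsonl": '{"customer_id": 1, "total": 30}\n'
                                    '{"customer_id": 2, "total": 12}\n'
                                    '{"customer_id": 1, "total": 5}\n',
    }
    fabric = DataFabricCatalog(storage.__getitem__)
    fabric.register(DatasetEntry("customers", "hdfs://cluster/customers.csv", "csv"))
    fabric.register(DatasetEntry("orders", "s3://bucket/orders.jsonl", "jsonl"))
    names = {r["id"]: r["name"] for r in fabric.scan("customers")}
    totals: Dict[str, int] = {}
    for o in fabric.scan("orders"):
        cust = names[o["customer_id"]]
        totals[cust] = totals.get(cust, 0) + int(o["total"])
    print(totals)  # {'Ada': 35, 'Grace': 12}
```

Adding a new format means registering one more reader; the application-level join stays unchanged.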
Application access to all your data sets is critical to developing and operating new software. While storage virtualization and gateways have made IT infrastructures more flexible, the new data fabrics are critical to enabling the consumption of cloud infrastructure. To compete successfully in the digital economy, companies need to be able to quickly develop new custom software that delivers differentiated products and customer experiences. To achieve that application development speed and scale, these applications need to be deployed in cloud infrastructures with a robust inter-cloud data service.