I apologize for the lack of postings lately, but the APM marketplace is full of news and I’ve been off covering it in a number of new papers posted at www.emausa.com. I also took the time to write an article for APM Digest entitled “Why Your APM Solutions may not be Cloud Ready”, available at www.apmdigest.com/why-your-apm-solutions-may-not-be-cloud-ready. In that article, I discussed some of the ways that hybrid Cloud transactions (those spanning multiple hops, on- and off-premise) are “breaking” traditional APM paradigms.

My coverage of hybrid Cloud management has been ongoing for several years. The latest research indicates that approximately 50% of companies are already running transactions that span on- and off-premise hosting. At the same time, there are few good answers to the problem of monitoring and troubleshooting these complex environments.

I recently heard about an IT organization that has deployed 2,000-hop transactions. Yes, Virginia, a single transaction that traverses 2,000 hops. These complex environments, which I call “FrankenApps”, break traditional deep-dive application management toolsets for two primary reasons (other than sheer scale). One is a lack of APIs and instrumentation on the side of the Cloud provider. The other is that most APM vendors haven’t invested the resources necessary to connect to the APIs of those providers that do supply them. And without visibility into off-premise tiers, APM toolsets cannot support deep-dive troubleshooting and root cause diagnosis in these complex environments.

If you remember, Dr. Frankenstein created his “manlike” invention from bits and pieces of bodies, wire, and string he picked up on his gruesome nightly forays. He brought the entity to life with lightning, and his creation functioned more or less as a viable living creature. However, the outcome was less than successful. The monster took on a life of its own, spiraled out of control, and wrought general havoc.

The similarities to FrankenApps are obvious. They can span multiple bits and pieces of hardware and software, from back office mainframes to virtualized database and application servers, web technology, hosted SOA components, IaaS, PaaS, and/or SaaS, endpoints, etc. When orchestrated and integrated, they resemble an “applicationlike” entity which, like Frankenstein’s monster, is very likely to get out of control.

In fact, the dirty little secret about public Cloud “ease of use” is that most such environments introduce significant monitoring and management challenges. Some, but not all, IaaS vendors, for example, provide monitoring tools and/or APIs. Amazon and Rackspace do, but this is not the case for all such providers. Fewer SaaS and PaaS vendors provide them, although more mature companies such as OpSource have done so for years.

Even for public Cloud providers which do deliver such visibility, it is highly variable in terms of granularity and accessibility. Typically, where such capabilities do exist, they are self-contained, stand-alone solutions delivering variable levels of visibility to the outside world via APIs.

The problem is exacerbated by software vendors, whose Application Performance Management tools may or may not integrate with these APIs. The result is that many existing toolsets have minimal visibility into these environments.

While synthetic transactions and end user experience monitoring tools are good options for determining high-level, end-to-end performance and availability, as stand-alone technology they lack the granularity necessary for troubleshooting public Cloud tiers. Most IT organizations I have spoken to, for example, tell me they are leveraging synthetic transactions for high-level performance metrics, but manually correlate performance across monitoring stations and/or geography for troubleshooting purposes.
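The manual correlation described above can be roughed out in a few lines of code. The following is only a minimal sketch in Python, assuming synthetic response times have already been exported as (station, milliseconds) pairs. The station names, the sample values, and the "twice the baseline" threshold are all illustrative assumptions, not output from any particular monitoring product.

```python
from collections import defaultdict
from statistics import median


def median_by_station(samples):
    """Group (station, response_ms) pairs and return the median per station."""
    grouped = defaultdict(list)
    for station, response_ms in samples:
        grouped[station].append(response_ms)
    return {station: median(values) for station, values in grouped.items()}


def slow_stations(samples, factor=2.0):
    """Return stations whose median exceeds `factor` times the median of all station medians.

    The factor is an arbitrary illustrative threshold, not an industry rule.
    """
    per_station = median_by_station(samples)
    baseline = median(per_station.values())
    return sorted(s for s, m in per_station.items() if m > factor * baseline)


# Hypothetical synthetic-transaction results from three monitoring stations.
samples = [
    ("boston", 210), ("boston", 190),
    ("london", 220), ("london", 240),
    ("tokyo", 900), ("tokyo", 1100),
]

print(slow_stations(samples))
```

A comparison like this can show where a transaction is slow geographically, but it still says nothing about which public Cloud tier is responsible, which is exactly the granularity gap.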

From an industry perspective, we need two things immediately:

Cloud service providers (virtualization, IaaS, PaaS, and SaaS vendors) must deliver “hooks” that provide monitoring metrics to traditional APM solutions. At minimum, the tools need to be able to “see” the transaction as it enters and exits to quantify time spent and verify completion. Ideally, error messages and even payload contents should be visible as well. Such information makes it possible to isolate performance problems in multi-hop transactions to a single tier or set of tiers.

Cloud APM vendors must develop the partnerships and “hooks” necessary to incorporate provider metrics into existing monitoring systems. Most vendors I speak to are notably lacking these partnerships, which means they lack visibility into Cloud provider metrics. This is aggravated by the fact that there is no common cross-vendor API/protocol standard of the kind network management solutions have in SNMP. IBM is promoting the Open Services for Lifecycle Collaboration (OSLC) standard (see open-services.net). As of this writing, however, the member list includes only two major APM vendors: IBM and Oracle. While such a standard is an excellent longer-term option, history has proven that such standards take a long time to develop. In the short term, APM tools vendors need to integrate the old-fashioned way: by developing relationships and creating adapters for Cloud provider APIs.
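To make the first requirement concrete, here is a minimal sketch in Python of what an APM tool could do if every tier reported when a transaction entered and exited. The `HopRecord` schema, the tier names, and the timings are hypothetical assumptions made up for illustration. No real Cloud provider API is being described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HopRecord:
    """One tier's view of a transaction (hypothetical schema, not a real provider API)."""
    tier: str
    entered_ms: int
    exited_ms: Optional[int]  # None if the tier never reported an exit
    error: Optional[str] = None


def time_per_tier(hops):
    """Sum time spent in each tier, counting only hops that reported an exit."""
    totals = {}
    for hop in hops:
        if hop.exited_ms is not None:
            elapsed = hop.exited_ms - hop.entered_ms
            totals[hop.tier] = totals.get(hop.tier, 0) + elapsed
    return totals


def incomplete_tiers(hops):
    """Tiers where the transaction never exited or an error was reported."""
    return sorted({h.tier for h in hops if h.exited_ms is None or h.error})


def slowest_tier(hops):
    """Tier with the largest total elapsed time, or None if nothing completed."""
    totals = time_per_tier(hops)
    return max(totals, key=totals.get) if totals else None


# Illustrative five-hop transaction spanning on- and off-premise tiers.
trace = [
    HopRecord("on-prem-web", 0, 40),
    HopRecord("iaas-app", 40, 95),
    HopRecord("saas-billing", 95, 900),
    HopRecord("on-prem-db", 900, 930),
    HopRecord("paas-queue", 930, None, error="timeout"),
]

print(slowest_tier(trace))
print(incomplete_tiers(trace))
```

Even this bare minimum of entry time, exit time, and an error field is enough to narrow a multi-hop problem down to a single tier. Without the provider-side “hooks”, the off-premise rows in a trace like this simply would not exist.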

I will be writing about this topic throughout the year, as I truly believe we as an industry finally need to step up to the plate with a better answer. Heck, as a short-term stopgap, I’d even be willing to extend SNMP to applications. It is already the de facto monitoring standard for everything from routers to servers to toasters (well, maybe not toasters…). It is also easy to embed in software systems (a command line is all you need), exists now, and would at least provide SOME level of information until application-specific standards can be developed, a process which, regrettably, could take years.

Meanwhile, Cloud IT customers are the ones who pay the price, since they get the phone call from users when performance goes south. In the end, CIOs have the power to vote with their pocketbooks. Requirements for manageability APIs, and for API support, should be part of the RFP process, and customers should buy public Cloud, virtualization, and APM solutions accordingly.

I would love to hear from Cloud providers and APM vendors who are addressing these concerns. Feel free to respond via a comment to this blog post or at: jcraig@enterprisemanagement.com.
