Businesses are attracted to the public cloud due to the elastic, on-demand and limitless compute and storage it provides to meet ever-changing business needs. The approach used by cloud service providers is to leverage a multi-tenant environment, where multiple clients are sharing the same infrastructure. The sharing of common resources makes the public cloud economically feasible for the providers. However, different clients will require different levels of service, and there is a need to ensure priority of one over another. Another challenge with the multi-tenant approach is that there is an opportunity for one client to negatively impact the performance of all others by consuming an excessive portion of performance resources. The term “noisy neighbor” is used for a co-tenant that monopolizes performance, bandwidth or compute resources to the detriment of other tenants sharing the same infrastructure.
Even on-premises private clouds can experience this problem when multiple business groups within the same company share a common infrastructure. Consolidation of data centers along with server virtualization have increased the degree of resource sharing and the opportunity for this performance conflict.

Quality of Service, or QoS, is a concept that originated in networking: transmission rates, error rates, and other characteristics can be quantified and furnished to meet specific levels of attainment. In storage, QoS guarantees a specific level of IOPS or latency on a per-virtual-machine or per-volume basis. Storage systems typically enable the creation of multiple levels of QoS, such as gold, silver, and bronze, associating each with distinct levels of IOPS and latency. The objective is for the system to allocate more resources to the holder of a higher QoS level. Maximums (or, in the case of latency, minimums) can be assigned to each service level, preventing a noisy neighbor from monopolizing resources.
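To make the gold/silver/bronze idea concrete, here is a minimal sketch in Python. The class, the field names, and the numbers are all hypothetical illustrations, not any vendor's actual policy format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceLevel:
    """A hypothetical per-volume QoS service level."""
    name: str
    min_iops: int          # guaranteed floor
    max_iops: int          # cap that fences in noisy neighbors
    max_latency_ms: float  # response-time ceiling the level targets

# Illustrative gold/silver/bronze tiers; the numbers are invented.
GOLD = ServiceLevel("gold", min_iops=20_000, max_iops=100_000, max_latency_ms=1.0)
SILVER = ServiceLevel("silver", min_iops=5_000, max_iops=30_000, max_latency_ms=5.0)
BRONZE = ServiceLevel("bronze", min_iops=500, max_iops=5_000, max_latency_ms=20.0)

def cap_iops(level: ServiceLevel, requested_iops: int) -> int:
    """Cap a tenant's I/O rate at its service level's maximum."""
    return min(requested_iops, level.max_iops)
```

A bronze tenant asking for 50,000 IOPS would be capped at its 5,000 maximum, which is exactly the fence that keeps a noisy neighbor inside its lane.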

Performance is as far as most storage systems take their QoS engineering, but there are other aspects of storage that could be managed to deliver a broader QoS experience. Extending the reach of QoS would create an automated way to ensure service-level agreements across a wider landscape of requirements. Such an approach is not easy to engineer and must be architected in from the ground up. What follows are some examples of what this broader landscape would include beyond IOPS and latency.

QoS for performance is the policy that exists in most storage systems that currently claim a QoS capability. The degree to which different systems deploy performance QoS varies, but it should include a target, maximum, and minimum for IOPS. The maximum prevents the effects of noisy neighbors, and the minimum ensures the performance QoS attainment. Establishing a target latency value also ensures that response times meet a pre-determined requirement. The key to this process is the identification of resources, coupled with the aggregated QoS levels, to assure the required infrastructure is in place. When performance QoS levels cannot be met in aggregate, identifying the situation and recommending actions, such as adding more ports or flash, would give administrators the guidance they need.
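A common way to enforce a per-volume IOPS maximum is a token bucket; the sketch below is an illustrative implementation, not drawn from any particular array's code. Tokens refill continuously at the cap rate and each I/O spends one token, so a volume is throttled once it exhausts its budget.

```python
import time

class IopsLimiter:
    """Minimal token-bucket sketch that caps a volume at max_iops.
    The `now` parameters exist so the clock can be injected for testing."""

    def __init__(self, max_iops, now=None):
        self.max_iops = max_iops
        self.tokens = float(max_iops)  # allow a burst up to one second's budget
        self.last = time.monotonic() if now is None else now

    def try_io(self, now=None):
        """Return True if the I/O may proceed, False if it should be throttled."""
        now = time.monotonic() if now is None else now
        # Refill tokens for the elapsed interval, bounded by the bucket size.
        self.tokens = min(self.max_iops, self.tokens + (now - self.last) * self.max_iops)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller defers or queues the I/O
```

With a cap of 2 IOPS, two back-to-back I/Os succeed, a third is throttled, and half a second later one token has been refilled.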

QoS priority is workload-oriented and establishes a hierarchy for servicing multiple VMs that share the same performance policy. Each QoS performance policy can have an associated priority level that determines the prioritization of the policy under load. As an example, one could assign high (mission-critical), medium (business-critical), and low (non-critical) priority levels to each QoS performance policy, and these levels would influence how the system prioritizes resources to maintain each QoS policy's targets. Two volumes with the same policy will be prioritized by priority level if system resources become constrained.
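Under contention, priority could translate into weighted shares of the constrained budget. The following sketch assumes invented weights and a single-pass proportional split; as a simplification, leftover capacity from volumes capped at their own demand is not redistributed.

```python
# Hypothetical weights for the three priority levels described above.
PRIORITY_WEIGHT = {"high": 4, "medium": 2, "low": 1}

def allocate_iops(available, demands):
    """Split a constrained IOPS budget across volumes in proportion to
    priority weight, never granting more than a volume asked for.

    demands: {volume_name: (requested_iops, priority)}
    """
    total_weight = sum(PRIORITY_WEIGHT[prio] for _, prio in demands.values())
    grants = {}
    for vol, (requested, prio) in demands.items():
        share = available * PRIORITY_WEIGHT[prio] // total_weight
        grants[vol] = min(requested, share)
    return grants
```

If two volumes each demand 10,000 IOPS but only 7,000 are available, a high-priority volume (weight 4) receives 5,600 and a low-priority volume (weight 1) receives 1,400.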

For QoS Location, there is an opportunity to use data tiering algorithms integrated with the QoS policies. This approach is especially useful when multiple classes of storage are in use, for example RAM, PCIe flash, consumer-grade SSDs, and capacity-rated hard drives. The highest-priority virtual volumes could consume more of the faster flash, with lower-rated volumes consuming the other classes of storage based on their QoS Location level. It is important that the location service level prevents hotspots as part of its overall algorithm.
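One simple placement rule is to let the location level bound the fastest tier a volume may use, spilling downward when a tier lacks headroom rather than creating a hotspot. The tier names, level mapping, and capacities below are assumptions for illustration only.

```python
# Fastest to slowest; illustrative media classes.
TIERS = ["pcie_flash", "ssd", "hdd"]

# Hypothetical mapping of QoS Location level to the fastest tier it may use.
LOCATION_LEVEL = {"gold": 0, "silver": 1, "bronze": 2}

def place_volume(level, free_gb, size_gb):
    """Place a volume on the fastest tier its location level permits that
    still has headroom; spill to slower tiers instead of overfilling one."""
    for i in range(LOCATION_LEVEL[level], len(TIERS)):
        tier = TIERS[i]
        if free_gb[tier] >= size_gb:
            free_gb[tier] -= size_gb
            return tier
    return None  # no capacity at or below the permitted tier
```

A gold volume lands on PCIe flash while space remains, the next gold volume spills to SSD, and a bronze volume is confined to hard drives regardless of free flash.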

QoS Availability would expand the Location level of service to include the level of availability. Mission- and business-critical data would be stored on hardware with redundancy and a high level of flash endurance, while data with lower criticality could be stored on media with lower reliability and without alternate pathing to reduce costs.
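An availability policy could be expressed as a small table mapping criticality to redundancy, pathing, and endurance attributes. The entries below are purely illustrative, not a recommendation for any particular RAID scheme.

```python
# Hypothetical availability attributes per criticality level.
AVAILABILITY = {
    "mission_critical":  {"raid": "RAID6", "multipath": True,  "flash_endurance": "high"},
    "business_critical": {"raid": "RAID6", "multipath": True,  "flash_endurance": "medium"},
    "non_critical":      {"raid": "RAID5", "multipath": False, "flash_endurance": "low"},
}

def media_for(criticality):
    """Look up the hardware attributes a volume of this criticality requires."""
    return AVAILABILITY[criticality]
```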

QoS Data Protection could schedule and execute snapshots, replication, and retention according to both performance policies and priority QoS levels. The combination of the two determines both the frequency of protection and the order in which volumes are snapshotted and replicated.
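Combining the two levels might look like a lookup table of snapshot intervals, with the interval also fixing the order in which overlapping protection jobs run. The policy names and intervals here are invented for illustration.

```python
# Hypothetical snapshot intervals (minutes), keyed by
# (performance policy, priority level).
SNAPSHOT_INTERVAL_MIN = {
    ("gold", "high"): 15,
    ("gold", "medium"): 30,
    ("silver", "high"): 30,
    ("silver", "medium"): 60,
    ("bronze", "low"): 240,
}

def protection_order(volumes):
    """Order volumes so those with shorter intervals (more critical data)
    are snapshotted and replicated first when protection windows overlap."""
    return sorted(
        volumes,
        key=lambda v: SNAPSHOT_INTERVAL_MIN[(v["policy"], v["priority"])],
    )
```

A gold/high volume (15-minute interval) would be protected ahead of a bronze/low volume (4-hour interval) whenever both are due.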

Have other ideas that will make good QoS elements for storage? Give me a shout at 303-534-9500 X134 or jmiller@emausa.com