Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers
Table of contents
- Executive summary
- Introduction
- Overview
- Solution components
- Workload testing
- Configuration guidance
- Bill of materials
- Summary
- Appendix A: Sample Reference Ceph Configuration File
- Appendix B: Sample Reference Pool Configuration
- Appendix C: Syntactical Conventions for command samples
- Appendix D: Server Preparation
- Appendix E: Cluster Installation
- Naming Conventions
- Ceph Deploy Setup
- Ceph Node Setup
- Create a Cluster
- Add Object Gateways
- Apache/FastCGI W/100-Continue
- Configure Apache/FastCGI
- Enable SSL
- Install Ceph Object Gateway
- Add gateway configuration to Ceph
- Redeploy Ceph Configuration
- Create Data Directory
- Create Gateway Configuration
- Enable the Configuration
- Add Ceph Object Gateway Script
- Generate Keyring and Key for the Gateway
- Restart Services and Start the Gateway
- Create a Gateway User
- Appendix F: Newer Ceph Features
- Appendix G: Helpful Commands
- Appendix H: Workload Tool Detail
- Glossary
- For more information

Block testing
• Test phases for random IO are 8k read, 8k write, and an 8k 70% read/30% write mix. Test phases for sequential IO are 256k read and 256k write. All block IO is submitted to the same 4TB RADOS block device, which is mapped to all three traffic generators. The sequential tests start at offsets of 0, 1TB, and 2TB on the block device.
• The RBD pool was left at its default 4MB striping.
• Each block IO test pass lasts 30 minutes.
• The test was set up with a level of load that is ‘reasonably’ stressful, in order to characterize the cluster rather than to chase maximum performance. The iodepth was set to 8, and the ioengine used was asynchronous; a sketch of equivalent fio invocations follows the next paragraph.
These mixes characterize ‘real world’ small block random and large block sequential loads, respectively. Tests
like these are a subset of common canned block benchmark loads and are representative of IO on VM boot/data drive
images. Unlike object testing, there is less opportunity for caching on the reads, since they run before a write
phase or with random distribution across the pool. The biggest performance limiting factor here is the amount of load from
the test, not any element of the cluster. The block test suite represents load on a cluster with performance headroom.
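To make the parameters above concrete, fio invocations of the following shape would reproduce the random mix and one of the sequential phases. This is a sketch rather than the exact command set used in the tests: the /dev/rbd0 path is a placeholder for the mapped RADOS block device, and libaio is assumed as the asynchronous ioengine.

  # 8k random 70% read/30% write mix at queue depth 8, timed 30-minute run
  fio --name=rand-mix --filename=/dev/rbd0 --rw=randrw --rwmixread=70 \
      --bs=8k --iodepth=8 --ioengine=libaio --direct=1 \
      --time_based --runtime=1800

  # 256k sequential read starting 1TB into the device; the other two
  # traffic generators would use --offset=0 and --offset=2t
  fio --name=seq-read --filename=/dev/rbd0 --rw=read --bs=256k \
      --iodepth=8 --ioengine=libaio --direct=1 --offset=1t \
      --time_based --runtime=1800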
Bounding principles and choices
With a large number of variables and a lot of data to present, the test matrix was chosen as a good representation of cluster
behavior under load while limiting the number of scaling and tuning variables. This type of benchmarking won’t represent
production traffic, but it does form a base from which the reader can extrapolate when configuring their own cluster.
Important factors to consider about the tests chosen:
• Without a particular use case to simulate, the tests standardize on a single thread count that stresses the system without
overly thrashing it. There’s no perfect thread count across all object sizes, so the number aims for a ‘good’ fit.
• Traffic generators were pushed to utilize as much network bandwidth and CPU as possible. This means very few clients
are required to saturate resources. A production environment usually has less bandwidth per client and a higher number of
clients generating application load, but that’s a variable HP is not currently prepared to model in a way useful to the reader.
• High bandwidth PUT tests are unlikely to be useful for colder object storage load planning.
• DELETEs are not benchmarked from a performance standpoint for a few reasons. They didn’t seem as critical to system
performance planning, since this class of data is purged infrequently. Under load, the variance per object size was much less
significant than the variances for GET and PUT. They’re also time consuming to gauge accurately at larger object sizes, since
writing the objects takes far more time than acquiring a significant DELETE sample does.
Workload generator tools
These are brief descriptions of the tools used to create and characterize the workload. See Appendix H: Workload Tool
Detail for more information, including how to obtain the tools.
getput
To test REST API object interface performance through the object storage gateways, an HP-authored tool named getput is used.
Getput is written in Python and uses the Swift client library to do Swift object IO. The getput program is the ‘workhorse’
piece; the author of getput also provides code for building test suites and synchronizing getput runs across multiple traffic
generators.
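As a rough illustration, a getput run against a gateway might look like the lines below. The endpoint, credentials, container, and object names are placeholders, and the flag spellings follow the getput documentation at the time of writing; check the README of the version you install.

  # Authentication uses the standard Swift client environment variables
  export ST_AUTH=http://gateway.example.com/auth/v1.0   # placeholder gateway endpoint
  export ST_USER=testuser:swift                         # placeholder user:subuser
  export ST_KEY=secret                                  # placeholder key

  # PUT, then GET, then DELETE 1000 1MB objects in container 'testcont'
  getput -c testcont -o testobj -s 1m -n 1000 -t p,g,d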
fio
For block testing, fio is utilized. It’s a publicly available tool that spawns threads performing a user-specified IO mix. It’s fairly
common to use fio for both benchmarking and stress/hardware verification.
collectl
During testing, collectl is run to gather periodic samples of CPU, memory, and disk stats on an object gateway and one of the
OSD hosts. Collectl gathers performance data on a number of subsystems and allows later playback of the recorded
samples, filtering to the information of interest.
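For reference, a minimal collectl record-and-playback sketch might look like the following; the output directory and filename are placeholders.

  # Record CPU (c), disk (d), and memory (m) samples to files under /var/log/collectl
  collectl -scdm -f /var/log/collectl

  # Play back a recorded file later, filtering to the same subsystems
  collectl -p /var/log/collectl/<hostname>-*.raw.gz -scdm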
haproxy
We used an open source load balancer, haproxy, as a way of demonstrating a simple way to connect clients to a number of object
gateways. The test configuration uses a single load balancer with one 10GbE port between the traffic generator clients and
the object storage gateways. This does restrict overall bandwidth for object storage benchmarking, but it keeps the
configuration simple.
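A minimal haproxy configuration for round-robin balancing across object gateways might look like the sketch below; the frontend/backend names and server addresses are placeholders, and production timeouts would need tuning.

  defaults
      mode http
      timeout connect 5s
      timeout client  60s
      timeout server  60s

  frontend rgw_frontend
      bind *:80
      default_backend rgw_gateways

  backend rgw_gateways
      balance roundrobin
      server rgw1 10.0.0.11:80 check    # placeholder gateway addresses
      server rgw2 10.0.0.12:80 check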
Workload results and analysis
These results cover bandwidth, IOPS, and latency data for the object and block IO tests. Object data also includes CPU usage
graphs representing load on an OSD host and the object gateways. IO results are the sum of the three traffic generator client results.