Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers
Table Of Contents
- Executive summary
- Introduction
- Overview
- Solution components
- Workload testing
- Configuration guidance
- Bill of materials
- Summary
- Appendix A: Sample Reference Ceph Configuration File
- Appendix B: Sample Reference Pool Configuration
- Appendix C: Syntactical Conventions for command samples
- Appendix D: Server Preparation
- Appendix E: Cluster Installation
- Naming Conventions
- Ceph Deploy Setup
- Ceph Node Setup
- Create a Cluster
- Add Object Gateways
- Apache/FastCGI W/100-Continue
- Configure Apache/FastCGI
- Enable SSL
- Install Ceph Object Gateway
- Add gateway configuration to Ceph
- Redeploy Ceph Configuration
- Create Data Directory
- Create Gateway Configuration
- Enable the Configuration
- Add Ceph Object Gateway Script
- Generate Keyring and Key for the Gateway
- Restart Services and Start the Gateway
- Create a Gateway User
- Appendix F: Newer Ceph Features
- Appendix G: Helpful Commands
- Appendix H: Workload Tool Detail
- Glossary
- For more information
Appendix F: Newer Ceph Features
While the sample reference configuration in this paper used the Dumpling release, Ceph continues to deliver significant new features with each release. This section lists features that are already available in stable code bases or are coming soon. There are many features on the Inktank roadmap; those highlighted here are from the Emperor and Firefly releases.
Multi-Site
The Ceph Emperor release has fully functional support for multi-site clusters. Ceph Object Gateway regions and metadata synchronization agents maintain a global namespace across different geographies and even across clusters. Zones can be defined within regions to synchronize and maintain additional copies of the data. A typical configuration is one Ceph cluster per region, with zones defined as needed within each region for failover, disaster recovery, and backup protection.
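As a hedged sketch only (the region names, zone names, JSON map files, and gateway instance name below are illustrative placeholders, not values from this reference architecture), a federated configuration of that era is driven through radosgw-admin, with the radosgw-agent then replicating metadata, and optionally data, from the master zone to the secondary zones:
radosgw-admin region set --infile us.json --name client.radosgw.us-east-1
radosgw-admin zone set --rgw-zone=us-east --infile us-east.json --name client.radosgw.us-east-1
radosgw-admin zone set --rgw-zone=us-west --infile us-west.json --name client.radosgw.us-east-1
radosgw-admin regionmap update --name client.radosgw.us-east-1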
There are, of course, hardware impacts when deploying multi-site. Make sure the SL4540 compute node density works well for splitting failure domains across sites (clearly a single SL4540 chassis cannot be divided). To match the region and zone configuration, the count of object gateways and monitors will be higher than the same cluster OSD host count would require on a single site. It is also likely that the object gateway distribution will dictate additional load balancers per site.
Erasure Coding
Replication has the performance advantage of data locality, since a full copy of the data is present on each device in the acting set. It also provides sufficient protection for data at massive scale. It does, however, come with the drawback of being less storage efficient than traditional RAID 5/RAID 6 architectures. At larger scales, especially where cost per usable gigabyte is a primary driver of the storage architecture, this becomes a significant scaling drawback.
Erasure coding is a forward error correction technique that encodes an object of 'k' data symbols into 'n' total symbols such that the original object can be recovered from a subset of only k of the n symbols. The code computes additional parity data so that only a subset of the stored chunks is needed to reconstruct the object. It is similar in principle to RAID 6, but the SLA, latency, and scale characteristics of an object store require tolerating more than two drive failures. Erasure coding can therefore be tuned for 'n' and 'k' based on the scale and failure tolerance of the cluster.
The tradeoff is lower performance, but instead of a 3.2:1 storage efficiency the ratio is more in the 1.2-1.8:1 range. As implemented in the Ceph Firefly release, erasure coding can be configured as a 'storage tier' behind more performant replicated pools. Objects that are 'colder' are migrated to the erasure-coded storage, so erasure coding provides a layer of storage with price/performance appropriate to the temperature of the data.
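As an illustrative sketch (the profile name, pool name, and placement group count are hypothetical, and k=6/m=2 is just one example choice rather than a recommendation), a Firefly-era erasure-coded pool is created from an erasure code profile. With k=6 data chunks and m=2 coding chunks, the raw-to-usable ratio is (6+2)/6, or roughly 1.33:1, which falls in the range quoted above:
ceph osd erasure-code-profile set ecprofile k=6 m=2 ruleset-failure-domain=host
ceph osd erasure-code-profile get ecprofile
ceph osd pool create ecpool 128 128 erasure ecprofile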
Cache Tiering
For pools that require more performance, Ceph implements a cache pool tier in Firefly. There are two defined use cases for the initial release:
• Writeback cache: take an existing data pool and put a fast cache pool (such as SSDs) in front of it. Writes are acknowledged from the cache tier pool and flushed to the data pool based on the defined policy.
• Read-only pool, weak consistency: take an existing data pool and add one or more read-only cache pools. Data is copied to the cache pool(s) on read, and writes are forwarded to the original data pool. Stale data is expired from the cache pools based on the defined policy.
These tiers will be most useful when combined with applications whose access patterns match these caching properties. The object gateway is one example, but cache tiering could also act as a performance accelerator for a block layer that needs write performance or carries a cacheable read load (for example, 'golden image' VM boot volumes).
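A minimal command-level sketch of the writeback case, assuming a hypothetical fast pool named cachepool placed in front of an existing datapool (both names are placeholders, and the hit-set and sizing values below must be tuned for the actual cluster and policy):
ceph osd tier add datapool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay datapool cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool hit_set_count 1
ceph osd pool set cachepool hit_set_period 3600
ceph osd pool set cachepool target_max_bytes 1000000000000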