Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers
Table Of Contents
- Executive summary
- Introduction
- Overview
- Solution components
- Workload testing
- Configuration guidance
- Bill of materials
- Summary
- Appendix A: Sample Reference Ceph Configuration File
- Appendix B: Sample Reference Pool Configuration
- Appendix C: Syntactical Conventions for command samples
- Appendix D: Server Preparation
- Appendix E: Cluster Installation
- Naming Conventions
- Ceph Deploy Setup
- Ceph Node Setup
- Create a Cluster
- Add Object Gateways
- Apache/FastCGI W/100-Continue
- Configure Apache/FastCGI
- Enable SSL
- Install Ceph Object Gateway
- Add gateway configuration to Ceph
- Redeploy Ceph Configuration
- Create Data Directory
- Create Gateway Configuration
- Enable the Configuration
- Add Ceph Object Gateway Script
- Generate Keyring and Key for the Gateway
- Restart Services and Start the Gateway
- Create a Gateway User
- Appendix F: Newer Ceph Features
- Appendix G: Helpful Commands
- Appendix H: Workload Tool Detail
- Glossary
- For more information
Appendix F: Newer Ceph Features
While the sample reference configuration in this paper used the Dumpling release, Ceph continues to deliver significant new features with each release. This section lists features that are already available in stable code bases or are coming soon. There are many features on the Inktank roadmap; those highlighted here are from the Emperor and Firefly releases.
Multi-Site
The Ceph Emperor release has fully functional support for multi-site clusters. Ceph Object Gateway regions and metadata synchronization agents maintain a global namespace across different geographies and even across clusters. Zones can be defined within regions to synchronize and maintain additional copies of the data. A typical configuration is one Ceph cluster per region, with zones defined as needed within each region for failover, disaster recovery, and backup protection.
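As a hedged sketch only (the region names, zone names, JSON map files, and gateway instance name below are illustrative placeholders, not values from this reference architecture), a federated configuration of that era is driven through radosgw-admin, with the radosgw-agent then replicating metadata, and optionally data, from the master zone to the secondary zones:
radosgw-admin region set --infile us.json --name client.radosgw.us-east-1
radosgw-admin zone set --rgw-zone=us-east --infile us-east.json --name client.radosgw.us-east-1
radosgw-admin zone set --rgw-zone=us-west --infile us-west.json --name client.radosgw.us-east-1
radosgw-admin regionmap update --name client.radosgw.us-east-1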
There are, of course, hardware impacts when deploying multi-site. Make sure the SL4540 compute node density works well for splitting failure domains across sites (clearly a single SL4540 chassis cannot be divided). To match the region and zone configuration, the count of object gateways and monitors will be higher than the same cluster OSD host count would require on a single site. It is also likely that the object gateway distribution will dictate additional load balancers per site.
Erasure Coding
Replication has the performance advantage of data locality, since a full copy of the data is present on each device in the acting set. It also provides sufficient protection for data at massive scale. It does, however, come with the drawback of being less storage efficient than traditional RAID 5/RAID 6 architectures. At larger scales, especially where cost per usable gigabyte is a primary driver of the storage architecture, this becomes a significant scaling drawback.
Erasure coding is a forward error correction technique that encodes an object of 'k' data symbols into 'n' total symbols such that the original object can be recovered from a subset of only k of the n symbols. The code computes additional parity data so that only a subset of the stored chunks is needed to reconstruct the object. It is similar in principle to RAID 6, but the SLA, latency, and scale characteristics of an object store require tolerating more than two drive failures. Erasure coding can therefore be tuned for 'n' and 'k' based on the scale and failure tolerance of the cluster.
The tradeoff is lower performance, but instead of a 3.2:1 storage efficiency the ratio is more in the 1.2-1.8:1 range. As implemented in the Ceph Firefly release, erasure coding can be configured as a 'storage tier' behind more performant replicated pools. Objects that are 'colder' are migrated to the erasure-coded storage, so erasure coding provides a layer of storage with price/performance appropriate to the temperature of the data.
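As an illustrative sketch (the profile name, pool name, and placement group count are hypothetical, and k=6/m=2 is just one example choice rather than a recommendation), a Firefly-era erasure-coded pool is created from an erasure code profile. With k=6 data chunks and m=2 coding chunks, the raw-to-usable ratio is (6+2)/6, or roughly 1.33:1, which falls in the range quoted above:
ceph osd erasure-code-profile set ecprofile k=6 m=2 ruleset-failure-domain=host
ceph osd erasure-code-profile get ecprofile
ceph osd pool create ecpool 128 128 erasure ecprofile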
Cache Tiering
For pools that require more performance, Ceph implements a cache pool tier in Firefly. There are two defined use cases for the initial release:
• Writeback cache: take an existing data pool and put a fast cache pool (such as SSDs) in front of it. Writes are acknowledged from the cache tier pool and flushed to the data pool based on the defined policy.
• Read-only pool, weak consistency: take an existing data pool and add one or more read-only cache pools. Data is copied to the cache pool(s) on read, and writes are forwarded to the original data pool. Stale data is expired from the cache pools based on the defined policy.
These tiers will be most useful when combined with applications whose access patterns match these caching properties. The object gateway is one example, but cache tiering could also act as a performance accelerator for a block layer that needs write performance or carries a cacheable read load (for example, 'golden image' VM boot volumes).
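A minimal command-level sketch of the writeback case, assuming a hypothetical fast pool named cachepool placed in front of an existing datapool (both names are placeholders, and the hit-set and sizing values below must be tuned for the actual cluster and policy):
ceph osd tier add datapool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay datapool cachepool
ceph osd pool set cachepool hit_set_type bloom
ceph osd pool set cachepool hit_set_count 1
ceph osd pool set cachepool hit_set_period 3600
ceph osd pool set cachepool target_max_bytes 1000000000000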