Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers
Table Of Contents
- Executive summary
- Introduction
- Overview
- Solution components
- Workload testing
- Configuration guidance
- Bill of materials
- Summary
- Appendix A: Sample Reference Ceph Configuration File
- Appendix B: Sample Reference Pool Configuration
- Appendix C: Syntactical Conventions for command samples
- Appendix D: Server Preparation
- Appendix E: Cluster Installation
- Naming Conventions
- Ceph Deploy Setup
- Ceph Node Setup
- Create a Cluster
- Add Object Gateways
- Apache/FastCGI W/100-Continue
- Configure Apache/FastCGI
- Enable SSL
- Install Ceph Object Gateway
- Add gateway configuration to Ceph
- Redeploy Ceph Configuration
- Create Data Directory
- Create Gateway Configuration
- Enable the Configuration
- Add Ceph Object Gateway Script
- Generate Keyring and Key for the Gateway
- Restart Services and Start the Gateway
- Create a Gateway User
- Appendix F: Newer Ceph Features
- Appendix G: Helpful Commands
- Appendix H: Workload Tool Detail
- Glossary
- For more information

Storage is accessed through a RESTful interface, which improves client independence and removes state-tracking load from the server. HTTP is typically used as the transport mechanism connecting applications to the data, so virtually any device on the network can connect to the object store.
The IO interface is designed for static data: there are no file handles, locking concerns, or reservations on objects. An S3 or Swift API object IO translates to an HTTP PUT (a write of the entire object), an HTTP GET (a read), or an HTTP DELETE. Along with the flat structure, this makes it much easier for the storage architecture to support client concurrency, because write concurrency on an object doesn't exist. If multiple clients attempt to write to the same object, one version will 'win', and the entire resulting object will be coherent with that client's PUT. Which version wins may not be easy to predict, so what simplifies the storage architecture can shift complexity onto client software.
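As an illustration of that whole-object model, the following hypothetical sequence uses the s3cmd client against the gateway's S3 interface; the bucket and object names are placeholders, and s3cmd is assumed to already be configured with a gateway user's access and secret keys.

# Whole-object write (HTTP PUT), read (HTTP GET), and delete (HTTP DELETE)
s3cmd mb s3://demo-bucket
s3cmd put ./report.pdf s3://demo-bucket/report.pdf
s3cmd get s3://demo-bucket/report.pdf ./report-copy.pdf
s3cmd del s3://demo-bucket/report.pdf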
Object storage commonly includes multi-tenancy, with access keys and ACLs controlling access to storage. With its metadata-rich focus, object storage is built around 'what' is in the data rather than where it is located. The work of guaranteeing enterprise availability (sites, replica counts, and so on) stays in the cluster, while the client code focuses on the data context.
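In Ceph, a tenant typically maps to a gateway user with its own access and secret keys. A minimal sketch, where the user ID and display name are hypothetical:

# Create a gateway user; the output includes an S3 access_key and secret_key
radosgw-admin user create --uid=tenant1 --display-name="Tenant One"

# Review the user and its keys later
radosgw-admin user info --uid=tenant1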
At the core of the object storage concept is the way clients leverage a (relatively) flat namespace, metadata tags on objects, and the RESTful interface. Various object storage interfaces may have more or less hierarchy in the namespace, may allow partial writes to existing objects (RADOS does), or may not require client features such as access keys or ACLs. Because this document covers object storage access through the APIs provided by the object storage gateway, HP has provided additional details specific to those interfaces.
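As a sketch of the metadata tagging mentioned above, the Swift interface lets a client attach arbitrary user metadata to an object and read it back; the auth endpoint, credentials, container, and object below are placeholders.

# Attach a custom metadata tag to an existing object
swift -A http://gateway.example.com/auth/1.0 -U tenant1:swift -K {swift_key} \
  post demo-container report.pdf -m "project:alpha"

# Read the object headers back, including X-Object-Meta-Project
swift -A http://gateway.example.com/auth/1.0 -U tenant1:swift -K {swift_key} \
  stat demo-container report.pdf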
Key solution technologies
Using industry-standard servers as cluster components gives enormous flexibility for customizing, configuring, and
balancing cost for the use case (CPU per disk, storage density, network infrastructure, etc.). With massive scale, costs of the
cluster building blocks add up, so choosing the right components for the task makes a difference.
It is very important for enterprise adopters to develop a roadmap for understanding and implementing a maintainable object storage solution. As an early adopter of object storage in general, and of an open source solution in particular, expect to realize cost and feature benefits that can make a real difference when operating at scale. But also plan for an engineering load, both to support a Ceph cluster and to develop code that utilizes object storage.
Cluster architecture
A Ceph cluster is a software-defined storage (SDS) architecture layered on top of traditional server storage. It provides a federated view of storage across multiple industry-standard servers using block storage and traditional file systems, and does so with an object storage architecture. This approach has the advantage of leveraging existing work and standard hardware where appropriate, while still providing the overall scale and performance the solution needs. See Ceph Architecture for more details.
At the core of mapping a GET/PUT or block read/write from any of the access methods to Ceph objects is CRUSH (Controlled Replication Under Scalable Hashing), the algorithm Ceph uses to compute object storage locations. All access methods are converted into some number of Ceph native objects on the back end.
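On a running cluster, the result of that CRUSH computation can be inspected directly with the ceph CLI; the pool and object names below are hypothetical.

# Show the placement group and set of OSDs that CRUSH computes for an object
ceph osd map data report.pdf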
Cluster Roles
There are three primary roles in the Ceph cluster covered by this sample reference configuration:
OSD Host:
The HP ProLiant SL4540 Gen8 Server is presented as the object storage host, which is how Ceph terms the role of the server storing object data. The Ceph OSD Daemon is the software that interacts with an OSD (Object Storage Disk); for production clusters there is a 1:1 mapping of OSD Daemon to logical volume. The default file system on an OSD for this sample reference configuration is xfs, although btrfs and ext4 are also supported.
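On a running cluster, the per-host layout of OSD daemons and the file system backing each OSD can be verified with commands such as the following; the paths shown assume a default installation.

# List OSDs grouped by host, as placed in the CRUSH map
ceph osd tree

# On an OSD host, confirm the file system type backing each OSD data directory
mount | grep /var/lib/ceph/osd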
Ceph Monitor (MON):
A Ceph Monitor maintains maps of the cluster state, including the monitor map, the OSD map, the
Placement Group Map, and the CRUSH map. Ceph maintains a history (called an “epoch”) of each state change in the Ceph
Monitors, Ceph OSD Daemons, and PGs.
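The maps and their epochs can be dumped from a running cluster with the standard ceph CLI; a few examples:

# Current monitor map, including its epoch
ceph mon dump

# Current OSD map (epoch, pools, and OSD states)
ceph osd dump | head -20

# Placement group summary
ceph pg stat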
Object Gateway (RGW):
An object storage interface to provide applications with a RESTful gateway to Ceph Storage Clusters.
The Ceph Object Storage Gateway supports two interfaces, S3 and Swift. These interfaces support a large subset of their respective APIs as implemented by Amazon and OpenStack Swift.
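Both interfaces can be enabled for the same gateway user; as a hedged sketch (the user ID is hypothetical), S3 keys belong to the user itself, while Swift access is granted through a subuser with its own Swift secret key.

# S3 access: keys are generated when the gateway user is created
radosgw-admin user info --uid=tenant1

# Swift access: create a subuser and generate a Swift key for it
radosgw-admin subuser create --uid=tenant1 --subuser=tenant1:swift --access=full
radosgw-admin key create --subuser=tenant1:swift --key-type=swift --gen-secret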
There is also a metadata server for file system support, which is outside the scope of this reference architecture.
How RADOS IO Works
This is a relatively high-level view of IO. The Ceph architecture documentation and source code provide more detail, but understanding how IO functions helps show how the cluster provides unified control of all the storage and protects data. To start, here is a model of how Ceph is accessed by clients and how that access is layered on top of the object storage architecture.
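For reference, the native RADOS object layer beneath the gateway can also be exercised directly with the rados CLI; a minimal sketch against a hypothetical pool named data.

# Write a file as a native RADOS object, list the pool, read it back, and remove it
rados -p data put report.pdf ./report.pdf
rados -p data ls
rados -p data get report.pdf ./report-copy.pdf
rados -p data rm report.pdf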