Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers

A cluster network offloads replication traffic from the data network and provides an isolated failure domain. With the tested replication settings, there are two replication writes on the cluster network for every client IO. That is a significant amount of traffic to isolate from the data network.
It is also recommended to reserve a separate 1GbE network for management, as it carries a different class and purpose of traffic than cluster IO.
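As a minimal sketch of this separation (the subnets shown are illustrative assumptions, not the tested addressing), the data and cluster networks are declared in the [global] section of ceph.conf:

    [global]
    # client-facing (data) traffic
    public network = 192.168.10.0/24
    # replication and recovery traffic
    cluster network = 192.168.20.0/24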
Matching object gateways to traffic
Start by identifying the typical object size and IO pattern, then compare them to the sample reference configuration results. The object gateway limits depend on the object traffic, so accurate scaling requires testing and characterization with a load representative of the use case. Here are some considerations when determining how many object gateways to select for the cluster:
Per-operation processing in the object gateway tends to limit small object transfer rates. File system caching for GETs tends to have the biggest performance impact at these small sizes.
For larger object and cluster sizes, gateway network bandwidth is the typical limiting factor for performance.
In testing across object sizes, HP observed peaks of roughly 3000-5000 ops/sec per object gateway; that range was seen at small object sizes. The maximum practical bandwidth observed was in the 900MB/sec-1GB/sec range on a 10GbE link.
Load balancing makes sense at scale to improve latency, IOPS, and bandwidth. Consider placing at least three object gateways behind a load balancer; a sketch of one such configuration follows this list.
Very cold storage or environments with limited clients may only ever need a single gateway.
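As one illustrative approach to fronting multiple gateways, the sketch below uses HAProxy for round-robin balancing. The choice of HAProxy, and the hostnames, addresses, and port, are assumptions for illustration, not part of the tested configuration:

    defaults
        mode http
        timeout connect 5s
        timeout client 30s
        timeout server 30s

    frontend rgw_frontend
        # single client-facing endpoint for all object traffic
        bind *:80
        default_backend rgw_backend

    backend rgw_backend
        balance roundrobin
        # three object gateway hosts (example addresses)
        server rgw1 192.168.10.11:80 check
        server rgw2 192.168.10.12:80 check
        server rgw3 192.168.10.13:80 check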
Because the monitor process has relatively lightweight resource requirements, a monitor can run on the same hardware used for an object gateway. Performance and failure domain requirements may dictate that not every monitor host is an object gateway, and vice versa. To maximize client traffic per object gateway, or to meet the strictest failure domain requirements, it is recommended that the two roles be hosted on separate hardware.
Planning monitor count
Use a minimum of three monitors for a production setup. While it is possible to run with just one monitor, that is not recommended for an enterprise deployment, as larger counts are important for quorum and redundancy. With multiple sites, it makes sense to extend the monitor count higher to maintain a quorum with a site down. A quorum requires a strict majority of monitors; for example, five monitors split three and two across two sites can still form a quorum of three if the two-monitor site fails.
Use physical servers rather than VMs, so that monitor failures stay on separate hardware. Do not run a monitor on the same server as OSDs; Ceph documentation recommends avoiding that because the monitor's use of fsync() can impact OSD performance.
The sample reference configuration did not stress DL360p monitor resources in a 200 OSD cluster. Therefore, there are no scaling recommendations for monitors based on cluster size.
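As a minimal sketch of declaring three monitors (the hostnames and addresses are illustrative assumptions), ceph.conf would include:

    [global]
    mon initial members = mon1, mon2, mon3
    mon host = 192.168.10.2, 192.168.10.3, 192.168.10.4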
Cluster installation
Hardware platform and OS preparation details are contained in Appendix D: Server Preparation. Most installation details are
broken out in Appendix E: Cluster Installation.
Even for more complicated clusters, the quick Ceph deployment flow using ceph-deploy is a good starting point for cluster installation. There is community work with more advanced configuration management tools to further automate cluster installation (e.g., Chef, Juju, Puppet), but those details are outside the scope of this document.
Whether or not Ceph’s quick start instructions are used, it is recommended to use ceph-deploy over manual configuration where possible, as the steps tend to be simpler to execute and maintain. Do expect to make manual configuration changes to ceph.conf regardless: object gateway configuration is currently not supported under ceph-deploy, and cluster use and configuration may dictate that non-default parameters be added.
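As a minimal sketch of the quick ceph-deploy flow run from an admin node (the hostnames and device paths are illustrative assumptions; see Appendix E for the tested steps):

    # create a new cluster definition with an initial monitor host
    ceph-deploy new mon1
    # install Ceph packages on the cluster nodes
    ceph-deploy install mon1 osd1 osd2 osd3
    # create the initial monitor(s) and gather keys
    ceph-deploy mon create-initial
    # prepare and activate an OSD on a storage node
    ceph-deploy osd prepare osd1:/dev/sdb
    ceph-deploy osd activate osd1:/dev/sdb1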
Cluster tuning
This section contains tuning guidance that HP considered important to general system configuration. It is not an ‘optimal performance’ guide; there are many settings that modify operating behavior, and the goal was easily reproducible performance with a good baseline configuration.
Placement groups
The tested ratio (for the sum of all pools) recommended by online documentation is:

    total_placement_group_count = (# OSDs * 100) / replica count
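As a worked example under that formula (the pool name below is hypothetical): a 200 OSD cluster with a replica count of 3 gives (200 * 100) / 3 ≈ 6667 placement groups in total, and Ceph documentation suggests rounding a pool's placement group count up to a power of two, which here would be 8192. The count is supplied when a pool is created:

    # create a pool with 8192 placement groups (pg_num and pgp_num)
    ceph osd pool create mypool 8192 8192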
Some tuning heuristics: