Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers

Overview
Business problem
Businesses are looking for better and more cost-effective ways to manage their exploding data storage requirements.
In recent years, the amount of storage businesses require has grown dramatically. Sources such as oil and gas
exploration, patient medical records, and user- and machine-generated content produce massive amounts of new data
every day. At the same time, businesses are shifting from tape-based to disk-based backup. Cost per gigabyte
and ease of retrieval are important factors for choosing a solution that can scale quickly and economically over many years
of continually increasing capacities and data retention requirements.
Many organizations still need to manage much or all of that data in-house. Regulations and privacy considerations can
make offsite storage impractical or impossible. Hosting on a public cloud may not meet cost or data-control requirements
over the long term; the performance and control of on-premises equipment still offer real business advantages.
Organizations that have been trying to keep up with data growth using traditional file and block storage solutions are finding
that the complexity of managing and operating them has grown significantly, as have the costs of storage infrastructure.
Typical architectures vs. object storage
Storage solutions designed for traditional IT tasks are not optimal for petabyte-scale unstructured data
Typical architectures often struggle to meet business service level agreements (SLAs) when applied to petabyte-scale
unstructured and archival data. In addition, traditional storage solutions can mean paying for features that aren’t
needed while getting less flexibility, scale, and reliability than the SLA requires.
Here are some ways traditional thinking falls short when architecting a solution to serve unstructured data at massive scale.
Architectural and cost mismatches
File and block storage methods that make sense for structured data impose unnecessary overhead for unstructured
data, particularly at large scale. Traditionally, businesses buy block storage optimized for classic data access cases, like
database workloads and file systems. These solutions support high IOPS and heavy, concurrent write loads. However,
unstructured and archival data is often written just once. Bandwidth and storage capacity are much more
important for unstructured and archival data than low latency. Traditional storage means paying for drive classes and
features an unstructured use case may not need.
When trying to drive the lowest cost per GB, tape immediately comes to mind. For many Big Data use cases, however, the
worst-case latency of tape-based storage falls outside the required latency behaviors for data access. Unstructured and
archival data may sit dormant for long periods but still needs to be available quickly, with maximum latency measured in
seconds rather than minutes. And where tape latencies are acceptable, many enterprises simply don’t want to manage tape
storage for onsite data.
Gaps in reliability, manageability, scalability
Storage systems designed for smaller-scale, single-site deployments are often not capable of delivering the overall
reliability and data durability necessary to support complex, multi-site scale-out configurations.
Many existing storage solutions are a challenge to manage and control at massive scale. Management silos and user
interface limitations make it harder to deploy new storage into business infrastructure.
Unstructured deployments can accumulate billions of objects and petabytes of data. At that scale, file system limits on the
number and size of files, and block storage limits on the size of presented block devices, become significant management
and deployment challenges.
Why object storage technology
Businesses need an architecture that is more scalable and provides an easier way to manage and access data. The
enterprise still requires availability and access control, even if the performance requirements differ from those of
traditional storage architectures.
Object storage is designed for the scale, characteristics, and requirements of unstructured data
By creating an interface that isn’t encumbered by the design restrictions of file and block storage, but is instead optimized
for unstructured data, it’s possible to build a cluster architecture that avoids the drawbacks of typical scale-out storage
architectures.
Object Storage Architecture Details
Object storage allows the storage of arbitrary-sized “objects” using a flat, wide namespace where each object can be tagged
with its own metadata. This simple architecture makes it much easier for software to support massive numbers of objects
across the object store. The APIs provided by the object storage gateway add an additional layer above objects, called
‘containers’ (Swift) or ‘buckets’ (S3), to hold groupings of objects.
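To make the model concrete, the following is a minimal in-memory sketch of the flat bucket/object namespace in Python. The `ObjectStore` class and its method names are illustrative assumptions for this example only, not a real Ceph or S3 API; a production gateway would expose the same operations over HTTP (for example, the S3 PUT Object call carries user metadata in request headers).

```python
class ObjectStore:
    """Illustrative sketch of a flat object namespace (not a real Ceph/S3 API).

    Each bucket maps object keys directly to (data, metadata) pairs. There is
    no directory hierarchy: an object is addressed only by bucket + key, and
    each object carries its own metadata tags.
    """

    def __init__(self):
        self.buckets = {}  # bucket name -> {object key: (bytes, metadata dict)}

    def create_bucket(self, bucket):
        self.buckets.setdefault(bucket, {})

    def put(self, bucket, key, data, metadata=None):
        # Objects are arbitrary-sized blobs tagged with their own metadata.
        self.buckets[bucket][key] = (bytes(data), dict(metadata or {}))

    def get(self, bucket, key):
        # Returns the (data, metadata) pair for the object.
        return self.buckets[bucket][key]


store = ObjectStore()
store.create_bucket("backups")
# The slash in the key is just part of the object's name, not a directory.
store.put("backups", "2013/scan-0001.dcm", b"raw scan bytes",
          {"patient-id": "anon-42", "modality": "CT"})
data, meta = store.get("backups", "2013/scan-0001.dcm")
print(meta["modality"])  # CT
```

Because a slash in the key is simply part of the object's name, there is no directory tree to traverse or keep balanced; lookups stay a flat key access, which is what lets the namespace scale to billions of objects.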