Open Source Object Storage for Unstructured Data: Ceph on HP ProLiant SL4540 Gen8 Servers

Overview
Business problem
Businesses are looking for better and more cost-effective ways to manage their exploding data storage requirements.
In recent years, the amount of storage businesses require has grown dramatically. Sources such as oil and gas
exploration, patient medical records, and user- and machine-generated content produce massive amounts of new data
every day. At the same time, businesses are shifting from tape-based to disk-based backup. Cost per gigabyte
and ease of retrieval are important factors for choosing a solution that can scale quickly and economically over many years
of continually increasing capacities and data retention requirements.
Many organizations still need to manage much or all of that data in-house. Regulations and privacy considerations can
make offsite storage impractical or impossible. Hosting on a public cloud may not meet cost or data-control requirements
over the long term; the performance and control of on-premises equipment still offer real business advantages.
Organizations that have been trying to keep up with data growth using traditional file and block storage solutions are finding
that the complexity of managing and operating them has grown significantly, as have the costs of storage infrastructure.
Typical architectures vs. object storage
Storage solutions designed for traditional IT tasks are not optimal for petabyte-scale unstructured data
Typical architectures often struggle to meet business service level agreements (SLAs) when applied to petabyte-scale
unstructured and archival data. In addition, traditional storage solutions can mean paying for features that aren’t
needed while getting less flexibility, scale, and reliability than the SLA requires.
Here are some ways traditional thinking falls short when architecting a solution to serve unstructured data at massive scale.
Architectural and cost mismatches
File and block storage methods that make sense for structured data impose unnecessary overhead for unstructured
data, particularly at large scale. Traditionally, businesses buy block storage optimized for classic data access cases, like
database workloads and file systems. These solutions support high IOPS and heavy, concurrent write loads. However,
unstructured and archival data is often written just once. Bandwidth and storage capacity are much more
important for unstructured and archival data than low latency. Traditional storage means paying for drive classes and
features an unstructured use case may not need.
When trying to drive the lowest cost per GB, tape immediately comes to mind. For many Big Data use cases, however, the
worst-case latency of tape-based storage falls outside the required latency behaviors for data access. Unstructured and
archival data may sit dormant for long periods but still needs to be available quickly, with maximum latency measured in
seconds rather than minutes. And where tape latencies are acceptable, many enterprises simply don’t want to manage tape
storage for onsite data.
Gaps in reliability, manageability, scalability
Storage systems designed for smaller-scale, single-site deployments are often not capable of delivering the overall
reliability and data durability necessary to support complex, multi-site scale-out configurations.
Many existing storage solutions are a challenge to manage and control at massive scale. Management silos and user
interface limitations make it harder to deploy new storage into business infrastructure.
Unstructured deployments can accumulate billions of objects and petabytes of data. At that scale, file system limits on the
number and size of files, and block storage limits on the size of presented block devices, become significant management
and deployment challenges.
Why object storage technology
Businesses need an architecture that is more scalable and provides an easier way to manage and access data. The
enterprise still requires availability and access control, even if the performance requirements differ from those of
traditional storage architectures.
Object storage is designed for the scale, characteristics, and requirements of unstructured data
By creating an interface that isn’t encumbered by the design restrictions of file and block storage, but is instead optimized
for unstructured data, it’s possible to build a cluster architecture that avoids the drawbacks of typical scale-out storage
architectures.
Object Storage Architecture Details
Object storage allows the storage of arbitrary-sized “objects” using a flat, wide namespace where each object can be tagged
with its own metadata. This simple architecture makes it much easier for software to support massive numbers of objects
across the object store. The APIs provided by the object storage gateway add an additional layer above objects, called
‘containers’ (Swift) or ‘buckets’ (S3), to hold groupings of objects.
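To make the model concrete, the following is a minimal in-memory sketch of the flat bucket/object namespace in Python. The `ObjectStore` class and its method names are illustrative assumptions for this example only, not a real Ceph or S3 API; a production gateway would expose the same operations over HTTP (for example, the S3 PUT Object call carries user metadata in request headers).

```python
class ObjectStore:
    """Illustrative sketch of a flat object namespace (not a real Ceph/S3 API).

    Each bucket maps object keys directly to (data, metadata) pairs. There is
    no directory hierarchy: an object is addressed only by bucket + key, and
    each object carries its own metadata tags.
    """

    def __init__(self):
        self.buckets = {}  # bucket name -> {object key: (bytes, metadata dict)}

    def create_bucket(self, bucket):
        self.buckets.setdefault(bucket, {})

    def put(self, bucket, key, data, metadata=None):
        # Objects are arbitrary-sized blobs tagged with their own metadata.
        self.buckets[bucket][key] = (bytes(data), dict(metadata or {}))

    def get(self, bucket, key):
        # Returns the (data, metadata) pair for the object.
        return self.buckets[bucket][key]


store = ObjectStore()
store.create_bucket("backups")
# The slash in the key is just part of the object's name, not a directory.
store.put("backups", "2013/scan-0001.dcm", b"raw scan bytes",
          {"patient-id": "anon-42", "modality": "CT"})
data, meta = store.get("backups", "2013/scan-0001.dcm")
print(meta["modality"])  # CT
```

Because a slash in the key is simply part of the object's name, there is no directory tree to traverse or keep balanced; lookups stay a flat key access, which is what lets the namespace scale to billions of objects.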