performance requirements—such as
analyzing high-velocity data—while
relieving performance bottlenecks
within the cluster.
Intel® Xeon® Processor E7 v2 Product Family: Performance for CPU-Intensive Workloads
The Intel Xeon processor E7 v2 family, built on Intel's 22-nanometer process, reaches new levels of processing density. Each socket has up to 15 cores and, with Intel® Hyper-Threading Technology (Intel® HT Technology)⁴ enabled, 30 logical processors. The family natively supports two-, four-, and eight-socket configurations, for a maximum of 120 cores and 240 logical processors per server. Enterprises can extend socket configurations even further with third-party controllers. With more cores and threads available, enterprises can deploy Apache Hadoop clusters with greater processing capabilities while using fewer servers.
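The per-server figures follow directly from the per-socket numbers. The Python sketch below reproduces that arithmetic. It also estimates how many servers each socket configuration would need, under two simplifying assumptions:

* The workload needs a fixed number of logical processors. The 960-thread target is hypothetical.
* Memory, I/O, and per-node overhead can be ignored.

```python
import math

CORES_PER_SOCKET = 15   # maximum per socket in the Intel Xeon processor E7 v2 family
THREADS_PER_CORE = 2    # Intel HT Technology enabled: two logical processors per core


def server_capacity(sockets):
    """Return (physical cores, logical processors) for one server."""
    cores = sockets * CORES_PER_SOCKET
    return cores, cores * THREADS_PER_CORE


def servers_needed(target_logical, sockets):
    """Servers needed to reach a target logical-processor count, rounded up."""
    _, logical = server_capacity(sockets)
    return math.ceil(target_logical / logical)


if __name__ == "__main__":
    TARGET = 960  # hypothetical cluster-wide logical-processor requirement
    for sockets in (2, 4, 8):  # natively supported socket counts
        cores, logical = server_capacity(sockets)
        print(f"{sockets}-socket: {cores} cores, {logical} logical processors, "
              f"{servers_needed(TARGET, sockets)} servers for {TARGET} threads")
    # 2-socket: 30 cores, 60 logical processors, 16 servers for 960 threads
    # 4-socket: 60 cores, 120 logical processors, 8 servers for 960 threads
    # 8-socket: 120 cores, 240 logical processors, 4 servers for 960 threads
```

Under these assumptions, the same thread count needs 16 two-socket servers but only 4 eight-socket servers.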
A four-socket server configuration can support up to 6 terabytes (TB) of memory, while an eight-socket configuration can scale to 12 TB. Enterprises can use these larger memory configurations to temporarily store frequently used or high-velocity data for analysis by Apache Hadoop services. For example, system engineers can configure servers to reserve a portion of each server's RAM as storage space and then configure Apache Hadoop to use that space for temporary data storage. RAM is orders of magnitude faster than disk-based storage. As a result, Apache Hadoop can analyze larger amounts of high-velocity data held in RAM-based storage, and analyze it faster than if the data were on disk. After Apache Hadoop completes its analysis, it can write the data to slower disks for longer-term storage.
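One way to implement the RAM-based storage described above is HDFS memory storage, assuming Apache Hadoop 2.6 or later:

* A tmpfs mount on each DataNode is tagged with the `RAM_DISK` storage type in `dfs.datanode.data.dir`.
* Files written under a directory with the `LAZY_PERSIST` storage policy are placed in memory first and persisted to disk asynchronously.

The Python sketch below only generates the corresponding hdfs-site.xml properties. The disk paths, the `/mnt/dn-tmpfs` mount point, and the 64 GiB locked-memory budget are hypothetical values. The tmpfs mount itself must be created separately on each node.

```python
import xml.etree.ElementTree as ET


def hdfs_ram_disk_config(disk_dirs, ram_disk_dir, locked_memory_bytes):
    """Build an hdfs-site.xml <configuration> element that adds a RAM disk.

    Assumes ram_disk_dir is already a tmpfs mount on every DataNode.
    Untagged directories default to the DISK storage type.
    """
    data_dirs = list(disk_dirs) + ["[RAM_DISK]" + ram_disk_dir]
    properties = {
        "dfs.datanode.data.dir": ",".join(data_dirs),
        # Locked-memory budget (bytes) the DataNode may use for in-memory replicas.
        "dfs.datanode.max.locked.memory": str(locked_memory_bytes),
        "dfs.storage.policy.enabled": "true",
    }
    config = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return config


if __name__ == "__main__":
    # Hypothetical node layout: two data disks plus a 64 GiB tmpfs mount.
    cfg = hdfs_ram_disk_config(["/grid/0", "/grid/1"], "/mnt/dn-tmpfs",
                               64 * 1024**3)
    ET.indent(cfg)  # pretty-print; requires Python 3.9+
    print(ET.tostring(cfg, encoding="unicode"))
```

Merging these properties into an existing hdfs-site.xml is left to the cluster's usual configuration management. So is restarting the DataNodes and assigning the storage policy to the target directories.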
The Intel Xeon processor E7 v2 family also builds on a tradition of advanced reliability, availability, and serviceability (RAS) features that can give Apache Hadoop clusters better recovery from server hardware failures. The family is designed for systems with 99.999 percent uptime requirements. It provides continuous self-monitoring and self-healing capabilities that rival those of RISC-based systems.⁵ Some of these features include:
• Machine Check Architecture (MCA) Recovery, which lets the CPU and operating system isolate errors that would normally crash a server, such as unrecoverable memory errors.
• MCA Recovery Execution Path, which handles uncorrectable data errors passed to the CPU. This feature enables operating systems and applications to assist in recovering from errors that cannot be corrected at the hardware level.
• MCA I/O, which reports uncorrectable I/O errors so that the operating system can take action. Operating systems or monitoring tools can use this information to determine the cause of system errors and enable preventive maintenance.
• Enhanced Machine Check Architecture (eMCA) Gen 1, which provides enhanced logging information to the operating system and applications, helping them better diagnose errors and proactively predict failures.
• PCIe Live Error Recovery (LER)⁶, which provides recovery from and containment of PCIe errors.
Combined, the features of the Intel Xeon processor E7 v2 family complement those of Apache Hadoop. Together, they help enterprises increase the computing capability and reliability of their Apache Hadoop clusters with fewer servers, which can lower overall TCO. Fewer servers also mean less complexity and lower power and cooling requirements.
Intel® Solid-State Drives: High-Performance Storage for I/O-Intensive Workloads
Intel continues to be a leader in solid-state drive (SSD) technology. Its drives deliver dramatically higher performance than mechanical hard disks, combined with greater reliability. Intel® SSDs eliminate the mechanical limitations of hard disks and provide higher I/O operations per second (IOPS) and a longer mean time between failures (MTBF). Apache Hadoop clusters that require higher-performance storage, such as those that perform real-time analysis on in-flight data, can benefit from Intel SSDs.
Intel SSDs can provide higher throughput than traditional mechanical hard drives, which can reduce the risk of storage bottlenecks on cluster nodes. In tests against mechanical hard drives, SSDs delivered up to 2.7 times the throughput for I/O-intensive Apache Hadoop workloads, such as Sort.⁷
In addition to being more reliable, Intel SSDs also require less power and cooling than traditional mechanical hard drives. Together, these advantages can lead to greater node uptime and lower TCO.
Figure 1: Intel® SSDs can improve Apache Hadoop* throughput by up to 2.7 times over traditional mechanical hard disks.⁷
[Chart: Apache Hadoop* data throughput, Intel® SSDs 776 MB/s vs. HDD 289 MB/s, 2.7x faster overall]
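The 2.7x figure is the ratio of the two measured throughputs in Figure 1. The Python sketch below reproduces that ratio. It also estimates how long streaming a hypothetical 1 TB (decimal) dataset would take, assuming each device sustains its measured rate. That is a simplification: real Apache Hadoop jobs also spend time on computation, network transfer, and shuffle.

```python
SSD_MB_PER_S = 776.0  # Intel SSD throughput reported in Figure 1
HDD_MB_PER_S = 289.0  # mechanical HDD throughput reported in Figure 1


def speedup(fast_mb_per_s, slow_mb_per_s):
    """Ratio of two throughputs."""
    return fast_mb_per_s / slow_mb_per_s


def seconds_to_stream(megabytes, mb_per_s):
    """Time to read or write a dataset at a constant sustained rate."""
    return megabytes / mb_per_s


if __name__ == "__main__":
    print(f"Speedup: {speedup(SSD_MB_PER_S, HDD_MB_PER_S):.2f}x")  # Speedup: 2.69x
    dataset_mb = 1_000_000  # hypothetical 1 TB dataset
    for label, rate in (("SSD", SSD_MB_PER_S), ("HDD", HDD_MB_PER_S)):
        minutes = seconds_to_stream(dataset_mb, rate) / 60
        print(f"{label}: {minutes:.1f} min")  # SSD: 21.5 min, HDD: 57.7 min
```

The computed ratio, about 2.69, rounds to the 2.7x reported in the figure.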
Accelerate Big Data Analysis with Intel® Technologies