
Big Data Integration for Massive Scalability
Many businesses already need to store and analyze terabytes, and in some cases petabytes, of data. It is possible to build a real-time, in-memory business platform with petabyte scalability (see the sidebar, "In-Memory Computing at Petabyte Scale"). However, this approach is neither financially practical nor necessary for most businesses. Apache Hadoop* offers a more cost-effective solution for integrating very large data volumes with in-memory computing environments.
Hadoop provides massively scalable storage and analytics on a distributed architecture based on affordable servers and commodity disk drives. It offers a cost-effective way to ingest, prepare, and store warm data for inclusion in the real-time analytics environment. Petabytes of data, including all data types, can be stored at a cost per terabyte that is not only much lower than an in-memory database, but also much lower than a traditional, disk-based storage system.
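As a minimal sketch of what landing such warm data in Hadoop might look like, the example below assumes a Hadoop cluster reachable over WebHDFS and uses the open-source hdfs (HdfsCLI) Python client; the host name, path, and record layout are hypothetical illustrations, not details from this paper.

# Illustrative sketch: land aged operational records in Hadoop as warm data.
# Assumes the open-source "hdfs" (HdfsCLI) Python client and a WebHDFS endpoint;
# the host, port, path, and record layout are hypothetical examples.
import csv
import io
from hdfs import InsecureClient

client = InsecureClient("http://hadoop-namenode:50070", user="etl")

# Records that have aged out of the hot, in-memory tier (made-up sample data).
warm_records = [
    ("2013-04-01", "ORD-10001", 412.50),
    ("2013-04-01", "ORD-10002", 87.25),
]

buffer = io.StringIO()
csv.writer(buffer).writerows(warm_records)

# Write the batch into a dated directory so downstream tools (for example,
# a Hive external table) can expose it for analysis alongside the hot data.
client.write("/warehouse/sales_archive/2013-04-01/part-0000.csv",
             data=buffer.getvalue(), encoding="utf-8", overwrite=True)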
Intel and SAP offer an integrated solution today based on SAP HANA and the Intel® Distribution for Apache Hadoop (IDH) software. Business users and data analysts see the data in Hadoop as an extension of the SAP HANA data set, and queries are automatically federated across both platforms. IDH also provides enterprise-class tools and capabilities for management and data security. The combined solution supports real-time analytics acting on petabytes of data.
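To make the federation concrete, here is a minimal sketch of what a single federated query might look like from the analyst's side, using SAP's hdbcli Python client. The table names SALES_HOT and SALES_ARCHIVE, the host, and the credentials are hypothetical, and the sketch assumes an administrator has already exposed the Hadoop-resident table to SAP HANA as a virtual table.

# Illustrative sketch of a federated query against SAP HANA.
# Assumes SAP's hdbcli Python client. SALES_HOT (in-memory) and SALES_ARCHIVE
# (a virtual table backed by data stored in Hadoop) are hypothetical names;
# the remote source is assumed to be configured separately.
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=30015,
                     user="ANALYST", password="example-only")
cursor = conn.cursor()

# One query spans both tiers; the platform federates the Hadoop portion,
# so the analyst works with a single result set.
cursor.execute("""
    SELECT region, SUM(amount) AS total_sales
    FROM (
        SELECT region, amount FROM SALES_HOT      -- recent, in-memory data
        UNION ALL
        SELECT region, amount FROM SALES_ARCHIVE  -- warm data held in Hadoop
    ) combined
    GROUP BY region
    ORDER BY total_sales DESC
""")

for region, total_sales in cursor.fetchall():
    print(region, total_sales)

cursor.close()
conn.close()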
Other database vendors are following suit. Hadoop is becoming the de facto standard for storing and analyzing massive, unstructured data sets. There may come a time when it is economical to store all data in main memory. Until then, integrating Hadoop and other massively scalable solutions with in-memory computing platforms will be key to optimizing capability versus cost across all enterprise data and all business requirements.
Where to Start
In time, all business computing will be done in memory. Today, businesses have to move forward intelligently, by balancing cost, risk, and value. Companies with high-value use cases that cannot be solved using traditional tools should consider implementing in-memory computing sooner rather than later. For others, it may make more sense to wait. As vendors continue to integrate in-memory capability into their core products, costs will go down and integration will become simpler.
Regardless of your current needs and goals,
now is the time to:
• Evaluate the potential of in-memory computing in your specific business and industry. Work with business units to identify potential, high-value use cases. Consider what you could do better, faster, or differently if you could analyze large data sets, including fresh operational data, almost instantly.
• Explore current in-memory solutions and track progress as new solutions emerge. Given the proven business value and the maturity of the enabling technologies, in-memory computing can be expected to advance rapidly.
• Quantify the potential business value and consider the cost and risk of implementation. There is no doubt that a time will come when the benefits of in-memory computing exceed the cost and risk of implementation for your business. The potential benefits are huge. Be prepared so you can make the right move at the right time.
Conclusion
In-memory computing represents a paradigm shift in the way businesses manage and use data. The unprecedented speed and scale of in-memory databases allow companies to host transactional and analytical applications on the same database. Operational data is available for analysis as soon as it is generated, and complex queries can be completed in seconds rather than hours or days. Infrastructure and operational requirements are also greatly reduced, which can lead to dramatic savings in total cost of ownership.
In-memory solutions are available today from dozens of vendors, and all major database vendors, including SAP, IBM, Oracle, and Microsoft, offer or will soon offer in-memory options. The Intel Xeon processor E7 v2 family powers a new generation of servers that are specifically optimized for in-memory computing, delivering up to 2x higher performance than previous-generation servers[11] and providing up to 3x higher memory scalability. They are ideal for the data-intensive, mission-critical demands of in-memory computing.
In-Memory Computing at Petabyte Scale
In May 2012, Intel and SAP launched the SAP HANA* petascale cloud, a 100 TB in-memory system consisting of 100 servers based on the Intel® Xeon® processor E7 family. They have since expanded this cloud infrastructure to include more than 250 servers, 8,000 threads, 4,000 cores, and 250 TB of RAM, all capable of running a single instance of SAP HANA.
Used for customer proof-of-concept projects and as a laboratory for Intel and SAP research teams, this petascale cloud environment has clearly demonstrated that in-memory computing can scale to deliver real-time performance while acting on massive data volumes.
For more information, see the Intel and SAP solution brief, "Scaling Real-Time Analytics across the Enterprise—and into the Cloud."[7]