
The “Big Data” Challenge
The term big data refers to the growing
flood of data in today’s hyperconnected
world, and to the enormous challenges
and opportunities this data presents. In a
very real sense, however, data has been
too big for decades. Since the first Intel
processor was launched more than forty
years ago, the data access speeds of bulk
storage systems have not kept pace with
processor capability.
Processor speed and functionality are
delivered in silicon, and have benefitted
from the rapid transition toward smaller
and faster silicon process technologies
predicted by Moore’s Law.
Bulk storage has long been based on mechanical technologies, from punch cards to magnetic tape to the spinning magnetic disks of today's hard disk drives (HDDs). Although the cost per gigabyte (GB) of storage has declined rapidly over the years, storage performance has been limited by the mechanical nature of the data access process.
Faster, silicon-based data storage technologies have existed for many years, but have been too costly for bulk storage. Instead, such technologies have been used for the main memory of computing systems and for the even faster cache subsystems that reside directly on the processor die. Although these high-speed memory subsystems ameliorate the data access problem to some degree, their limited capacity has been a performance bottleneck. Getting the right data out of bulk storage and into the right processor registers at the right time has been a tough challenge for decades.
Database vendors have done much over the years to work around and mask this performance gap, but the resulting cost and complexity have been significant. As illustrated in Figure 1, traditional HDD-dependent information infrastructure requires:
• Separate databases for transactional and analytical applications, along with separate infrastructure. In each case, hardware and software must be optimized to achieve acceptable performance.
• Multiple data marts to address specialized business intelligence needs without overloading data warehouses.
• Constant tuning and optimization of databases and storage systems to deliver acceptable performance, especially for analytical workloads.
Despite all the cost and effort, customers still experience long delays between the time that data is generated and the time it is available for analysis. Data must be extracted from transactional systems, transformed into required formats, and loaded into the analytics environment. In many cases, the data models in the warehouse or data mart must then be re-optimized for performance. Even with all this preparatory work, complex queries can still take many hours to complete.
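To make the extract-transform-load (ETL) pattern described above concrete, the following is a minimal sketch in Python. The table names, schema, and transformation rule are illustrative assumptions, not features of any particular product; the sketch simply shows why analytics data lags the transactional system: rows must be pulled out in bulk, reshaped, and reloaded in periodic batches.

import sqlite3

# Hypothetical example: a transactional (OLTP) database and a separate
# analytics (warehouse) database, connected only by a periodic batch job.
oltp = sqlite3.connect(":memory:")       # stands in for the transactional system
warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse

oltp.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "EMEA", 120.0), (2, "APAC", 75.5), (3, "EMEA", 210.0)])
oltp.commit()

warehouse.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")

def run_etl_batch():
    # Extract: pull raw rows from the transactional system.
    rows = oltp.execute("SELECT region, amount FROM orders").fetchall()
    # Transform: aggregate into the shape the analytics model expects.
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    # Load: replace the warehouse table with the freshly prepared data.
    warehouse.execute("DELETE FROM sales_by_region")
    warehouse.executemany("INSERT INTO sales_by_region VALUES (?, ?)",
                          totals.items())
    warehouse.commit()

# Analysts query the warehouse copy, which is only as fresh as the last batch.
run_etl_batch()
print(warehouse.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())

Because each run copies and reshapes the data in bulk, the warehouse lags the transactional system by the length of the batch window, which is the delay described above.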
Harnessing Big Data with
In-Memory Computing
In-memory computing changes the computing paradigm fundamentally (see Figure 2 on the next page). All relevant data is kept in the main memory of the computing system, rather than in a separate bulk storage system. Data can be accessed orders of magnitude faster, so fast that transactional and analytical applications can run simultaneously on the same database running on the same infrastructure. The need for separate data warehouses and data marts is eliminated, along with the associated costs.
Data is available for analysis as soon as it is generated. Even if it is generated on a separate system, it can be replicated almost instantly into an in-memory database.
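As a rough illustration of this idea, and not a depiction of any specific in-memory product, the sketch below keeps a single SQLite database entirely in memory and runs a transactional insert and an analytical aggregation against the same table, with no intermediate copy. The table and column names are assumptions made up for the example.

import sqlite3

# Hypothetical example: one in-memory database serves both workloads.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Transactional work: new orders are written as they arrive.
db.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("EMEA", 120.0))
db.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("APAC", 75.5))
db.commit()

# Analytical work: the same data is immediately visible to aggregate queries,
# with no extract-transform-load step and no separate warehouse copy.
for region, total in db.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"):
    print(region, total)

The point of the sketch is only the data path: because the working copy lives in memory and is shared by both kinds of queries, analysis sees new records as soon as they are committed.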
[Figure 1 diagram: transactional infrastructure (OLTP, ERP, CRM) feeds a data warehouse and data marts through batch extraction, transformation, and loading (ETL), which in turn supply the analytics infrastructure (OLAP, reporting, data mining). Challenges of traditional business intelligence: slow business response; high infrastructure and operational costs; long time to insight (hours to days).]
Figure 1. Traditional, HDD-dependent information infrastructure requires separate transactional and analytical infrastructure, resulting in high costs and long delays between data generation and insight.