Technical information
Before You Begin: Important Concepts
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 5
number can be different from the overall performance. The best way to evaluate the
performance of complex application code is to measure it in a real system.
The standard method for improving software performance is to run the software at high speed
long enough to collect performance data. This is done using profiling tools or the performance
monitor unit embedded in the Cortex-A9 processor. With the data collected, you can find the
bottlenecks or hotspots that consume the most execution time. Usually, the amount of such code
is limited, so you can focus on these relatively small pieces of code and improve the overall
performance quickly, with minimal effort.
Profiling tools are usually required for finding bottlenecks. Advanced use of profiling tools is
beyond the scope of this document, but a few solutions are presented to help make you aware
of available options.
Gprof
Gprof is a GNU tool that provides an easy way to profile a C/C++ application. To use it, the
source code must be compiled using GCC with the -pg flag. For line-by-line profiling, the -g
option is also needed. These options add profiling instrumentation that collects data about
function calls at run time. Running the compiled program then writes the profiling data to a
file (gmon.out by default), which gprof analyzes to produce a report such as a flat profile
and a call graph.
Gprof can only be used to profile code that has been re-built using the -pg option. It cannot be
applied to anything that has not been built for profiling (for example, libc or the kernel). Be
aware that operating system kernel limitations or I/O bottlenecks (such as memory
fragmentation or file accesses) can affect profiling results.
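The gprof workflow described above can be tried on a small test case. The following is a minimal sketch, not part of the original application note: the file name hotspot.c and the functions prefix_total_slow, prefix_total_fast, and run_workload are hypothetical names chosen for illustration. The build and analysis commands appear in the header comment.

```c
/* hotspot.c: hypothetical workload for trying out gprof.
 *
 * Add a main() that calls run_workload(), then build, run, and analyze
 * (file names are illustrative):
 *   gcc -pg -g hotspot.c -o hotspot
 *   ./hotspot                  (writes gmon.out in the current directory)
 *   gprof ./hotspot gmon.out > report.txt
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORKLOAD_SIZE 20000

/* Deliberately expensive: recomputes each prefix sum from scratch, O(n^2). */
uint64_t prefix_total_slow(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j <= i; j++)
            total += data[j];
    return total;
}

/* Same result with a running prefix sum, O(n). */
uint64_t prefix_total_fast(const uint32_t *data, size_t n)
{
    uint64_t total = 0, prefix = 0;
    for (size_t i = 0; i < n; i++) {
        prefix += data[i];
        total += prefix;
    }
    return total;
}

/* Runs both versions on the same input; returns 1 if the results agree. */
int run_workload(void)
{
    static uint32_t data[WORKLOAD_SIZE];
    for (size_t i = 0; i < WORKLOAD_SIZE; i++)
        data[i] = (uint32_t)(i % 7);

    uint64_t slow = prefix_total_slow(data, WORKLOAD_SIZE);
    uint64_t fast = prefix_total_fast(data, WORKLOAD_SIZE);
    assert(slow == fast);
    return slow == fast;
}
```

In the resulting flat profile, prefix_total_slow would be expected to account for most of the run time, which makes it the kind of small hotspot worth optimizing first.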
OProfile
OProfile is a Linux whole-system profiling tool that includes the kernel in its metrics. It uses a
statistical sampling method. That is, it examines the system at regular intervals, determines
which code is running, and updates the appropriate counters. Because it uses interrupts, code
that disables interrupts can cause inaccuracies.
OProfile can also be made to trigger on hardware events and record all system activity,
including kernel and library code execution. OProfile does not require code to be recompiled
with any special flags. It can use a Cortex-A9 performance monitor unit (PMU) to provide useful
hardware information, such as clock cycles and cache misses.
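The statistical sampling idea can be illustrated with a small user-space sketch. This is not how OProfile is implemented: instead of sampling the program counter from an interrupt, it assumes a POSIX system and uses a profiling timer (setitimer with ITIMER_PROF) whose SIGPROF handler records which manually marked region was running. The names enter_region, start_sampling, stop_sampling, samples_in, and spin are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

enum { REGION_IDLE, REGION_A, REGION_B, NUM_REGIONS };

static volatile sig_atomic_t current_region = REGION_IDLE;
static volatile unsigned long samples[NUM_REGIONS];

/* Stand-in for the sampling interrupt: count the region that was running. */
static void on_sample(int signo)
{
    (void)signo;
    samples[current_region]++;
}

/* Mark which region of code is about to execute. */
void enter_region(int region)
{
    assert(region >= 0 && region < NUM_REGIONS);
    current_region = region;
}

/* Deliver SIGPROF after every interval_us microseconds of CPU time. */
int start_sampling(long interval_us)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = interval_us / 1000000;
    tv.it_interval.tv_usec = interval_us % 1000000;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the profiling timer. */
int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}

/* Number of samples attributed to a region so far. */
unsigned long samples_in(int region)
{
    assert(region >= 0 && region < NUM_REGIONS);
    return samples[region];
}

/* CPU-bound filler work; volatile keeps the loop from being optimized away. */
unsigned long spin(unsigned long iterations)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < iterations; i++)
        x += i;
    return x;
}
```

A caller would invoke start_sampling(1000), bracket each piece of work with enter_region, and compare the counts from samples_in afterward. Regions that run longer collect proportionally more samples. Code that blocks SIGPROF goes unsampled, which parallels the inaccuracy caused by code that disables interrupts.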
ARM DS-5 Streamline
ARM DS-5 Streamline is a GUI-based performance analysis tool for Linux or Android systems.
It is part of ARM DS-5 and consists of a Linux kernel driver, target daemon, and Eclipse-based
UI.
DS-5 Streamline samples the system periodically and presents the data visually as statistics.
It uses both the hardware performance monitor unit, which has counters for processor events,
and Linux kernel metrics to trace application information.
One issue with profiling is that it might require the operating system to be up and running.
Sometimes, if you are interested only in a specific algorithm or are highly confident that you
know the location of the bottleneck, you can extract the time-critical code and run it in
standalone mode. You can try several code sequences to find the one that performs best.
Usually, this method requires high-precision timers.
There are some advantages to optimizing code in standalone mode:
• Easy and convenient
• No interference from the operating system
• Fast turn-around
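Trying several code sequences against a timer can be sketched as a small harness. This is a minimal sketch under stated assumptions, not the application note's method: read_timer_ns is a hypothetical helper that uses POSIX clock_gettime so the example runs on a host system, and on a bare-metal target it would instead read a hardware timer. The two candidate kernels, sum_simple and sum_unrolled, are likewise illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*kernel_fn)(const uint32_t *data, size_t n);

/* Hypothetical timer read; replace with a hardware timer on bare metal. */
uint64_t read_timer_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Candidate 1: straightforward loop. */
uint32_t sum_simple(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Candidate 2: unrolled by four with independent accumulators. */
uint32_t sum_unrolled(const uint32_t *data, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += data[i];
    return s0 + s1 + s2 + s3;
}

/* Time fn over several runs and return the shortest duration in ns.
 * The kernel's last result is stored in *result so the work stays live. */
uint64_t best_time_ns(kernel_fn fn, const uint32_t *data, size_t n,
                      int runs, uint32_t *result)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; r++) {
        uint64_t t0 = read_timer_ns();
        uint32_t out = fn(data, n);
        uint64_t t1 = read_timer_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
        *result = out;
    }
    return best;
}
```

Calling best_time_ns once per candidate on the same input and comparing the returned durations identifies the faster sequence. Taking the minimum over several runs is one design choice for reducing the effect of cold caches and other one-off disturbances, and checking that both candidates return the same result guards against optimizing a kernel into an incorrect one.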
Correctly measuring time is important when using the ARM DS-5 Streamline tool. Usually, high-
resolution timers are expected. Cortex-A9 processors have one global timer (64-bit) for all