Shark User Guide (Legacy)
Contents
Introduction
Overview
Philosophy
Organization of This Document
Getting Started with Shark
Main Window
Mini Configuration Editors
Perform Sampling
Session Windows and Files
Session Files
Session Information Sheet
Session Report
Advanced Settings Drawer
Shark Preferences
Time Profiling
Statistical Sampling
Taking a Time Profile
Profile Browser
Heavy View
Tree View
Profile Display Preferences
Chart View
Advanced Chart View Settings
System Tracing
Tracing Methodology
Basic Usage
Interpreting Sessions
Summary View In-depth
Trace View In-depth
Timeline View In-depth
Sign Posts
Tips and Tricks
Other Profiling and Tracing Techniques
Time Profile (All Thread States)
Malloc Trace
Using a Malloc Trace
Advanced Display Options
Static Analysis
Using Shark with Java Programs
Java Tracing Techniques
Linking Shark with the Java Virtual Machine
Event Counting and Profiling Over
Network/iPhone Profiling
Using Shared Profiling Mode
Mac OS X Firewall Considerations
Advanced Session Management and Data Mining
Automatic Symbolication Troubleshooting
Symbol Lookup
Debugging Information
Manual Session Symbolication
Managing Sessions
Comparing Sessions
Merging Sessions
Data Mining
Callstack Data Mining
Perf Count Data Mining
Example: Using Data Mining with a Time Profile
A Performance Problem...
Hardware Counter Configuration
Configuring the Sampling Technique: The Sampling Tab
Common Elements in Performance Counter Configuration Tabs
Counter Control
Privilege Level Filtering
Process Marking
MacOS X OS-Level Counters Configuration
Intel CPU Performance Counter Configuration
PowerPC G3/G4/G4+ CPU Performance Counter Configuration
PowerPC G5 (970) Performance Counter Configuration
PowerPC North Bridge Counter Configuration
U1.
PPC 7450 (G4+) Performance Counter Event List
PPC 970 (G5) Performance Counter Event List
UniNorth-2 (U1.5/2) Performance Counter Event List
UniNorth-3 (U3) Performance Counter Event List
Kodiak (U4) Performance Counter Event List
ARM11 Performance Counter Event List
Document Revision History
Retired Document | 2012-07-23 | Copyright © 2012 Apple Inc. All Rights Reserved.
Figures, Tables, and Listings
Getting Started with Shark
Figure 1-1 Main Window
Figure 1-2 Process Target
Figure 1-3 Mini Configuration Editor
Figure 1-4 Session Inspector Panel
Figure 1-5 Sample Window with Advanced Settings Drawer visible at right
Figure 1-6 Shark Preferences — Appearance
Figure 1-7 Shark Preferences — Sampling
Figure 1-8 Shark Preferences — Sessions
Figure 1-9 Shark Preferences — Search Paths
Time Profiling
Figure 2-1 Figure 2-2 Figure 2-3 Figure 2-4 F
System Tracing
Figure 3-1 Time Profile vs.
Figure 3-2 through Figure 3-20
Listing 3-1 through Listing 3-3
Figure 4-14 Chart View with additional timed counter graphs
Advanced Profiling Control
Figure 5-1 Process Attach
Figure 5-2 Launch Process Panel
Figure 5-3 Batch Mode
Figure 5-4 Normal Profiling Workflow
Figure 5-5 Windowed Time Facility Workflow
Figure 5-6 The Windowed Time Facility Timeline
Figure 5-7 Unresponsive Application Triggering
Figure 5-8 Samples Taken for T
Figure 5-9 through Figure 5-12
Listing 5-1, Listing 5-2
Figure 6-22 Data Mining Contextual Menu
Figure 6-23 After Focus Symbol -[SKTGraphicView drawRect:]
Figure 6-24 After focus and expansion
Figure 6-25 Source View: SKTGraphic drawInView:isSelected:
Figure 6-26 Source View: SKGraphic drawHandlesInView:
Figure 6-27 Source View: SKGraphic drawHandleAtPoint:inView:
Figure 6-28 Heavy View of Focused Sketch
Figure 6-29 Expa
Figure 6-30 through Figure 6-36
Figure 8-8 PowerPC 970 IMC (IFU) Configuration Tab
Figure 8-9 PowerPC 970 IMC (IDU) Configuration Tab
Figure 8-10 U1.
Figure 8-11 through Figure 8-15
Introduction
Important: This document may not represent best practices for current development. Links to downloads and other resources may no longer be valid.
Overview
Shark is a tool for performance understanding and optimization. Why is it called “Shark?” Performance tuning requires a hunter’s mentality, and no animal is as pure in this quest as a shark. A shark is also an expert in his field — one who uses all potential resources to achieve his goals.
2. It must be relevant. Optimizing functionality that is rarely used is usually counter-productive.
3. It shows up as a hot spot in a time profile. If there is no obvious hot spot in your code or you are spending a lot of time in system libraries, performance is more likely to improve through high-level improvements (architectural changes).
Organization of This Document
● Getting Started with Shark — This introduction and Getting Started with Shark (page 17) are designed to give you an overall introduction to Shark. After covering some basic philosophy here, Getting Started with Shark (page 17) describes basic ways to use Shark to sample your applications, features of the Session windows that open after you sample your applications, and the use of Shark’s global preferences.
Counter Event List (page 252), PPC 750 (G3) Performance Counter Event List (page 263), PPC 7400 (G4) Performance Counter Event List (page 265), PPC 7450 (G4+) Performance Counter Event List (page 271), PPC 970 (G5) Performance Counter Event List (page 282), UniNorth-2 (U1.
Getting Started with Shark
Starting to use Shark is a relatively simple process. You only need to choose one or two items from menus and press a big “Start” button in order to start sampling your applications. This chapter describes these basic steps and a few other general Shark features, such as its preferences.
Main Window
Figure 1-1 Main Window
After launching Shark, you will be presented with Shark’s main window, as illustrated in Figure 1-1.
● Malloc Trace — If your program allocates and deallocates a lot of memory, performance can suffer and the odds of accidental memory leaks increase. Shark can help you find and analyze these allocations. Malloc Trace (page 101) talks about this more.
● Static Analysis — Shark can provide some basic optimization hints without actually running code. See Static Analysis (page 107) for more information.
Mini Configuration Editors
Each configuration typically has a few parameters that are frequently modified. Shark allows you to edit these easily using the mini configuration editors associated with each configuration. You can enable mini configuration editors by selecting the Config→Show Mini Config Editor menu item (Command-Shift-C).
Note: Occasionally you may notice a small delay while Shark allocates the sample buffers it needs to record data, due to time spent in the Mac OS X virtual memory system performing the necessary memory allocations.
Session Windows and Files
Shark allows you to work with multiple sampling sessions at a time, displaying a separate window for each session. This is useful for comparing two or more sampling sessions side-by-side. The currently displayed session can be changed using the Window menu. By default, sessions are listed in the order they are loaded or created. In addition, each new session is given a unique name, in the format of “Session # - Configuration.
Note: Shark’s session files have slowly evolved and changed over time, as new features have been added that made it difficult to keep backwards-compatible file formats. The current file format (.mshark) is only compatible with Shark 4.6 and later. Shark 4.0–4.5 use a transitional file format (also called .mshark) that can still be read by more recent versions of Shark. However, users of these versions of Shark who need to read Shark 4.
1. Basic Statistics — This section of the pane contains basic information about the system at the time the session was recorded. The system’s name, the current user, date, and time are available here.
2. Software Configuration — This shows version information about Shark and the underlying Mac OS X and frameworks.
3. Sampling Configuration — This shows a text description of the configuration used for recording the session.
Advanced Settings menu item (Command-Shift-M). An example is depicted in Figure 1-5. The controls presented will vary depending upon the current session viewer visible in the window, and so instructions on how to use these controls are provided in sections following the descriptions of the session viewers themselves.
3. Alternating/Solid Table Background — For tabular session window views, such as the profile browsers and code browsers described in Profile Browser (page 32) and Code Browser (page 45), Shark can use either a solid background color behind the text or alternate between a color and white on every row. Select the viewing option that you prefer here.
4.
3. Remain in Background — Shark normally brings itself to the front when sampling completes. This means that it will be the main application while it analyzes samples and then displays a session window. Generally, this is the desirable behavior, because most users want to examine their sampled sessions immediately.
1. Ask About Unsaved Sessions — With Shark, you can optionally disable the usual behavior of asking if you want to individually save each session file when closing it or quitting Shark. Some users tend to examine their data right after sampling, and therefore will rarely need to save Shark session files. If you tend to work this way, then you might find the default behavior annoying and wish to uncheck this box.
2.
1. Source — Shark will usually find source files automatically if they are not moved between compilation and session viewing times. If you must move the source at all, however, then you will need to specify a path to the new source location here so that Shark can find your source. Probably the most common reason why you might need to use this is if you compile the source on one system and then execute your code and examine your session on another.
2.
Time Profiling
The first and most frequently used Shark configuration is the Time Profile. This produces a statistical sampling of the program’s or system’s execution by recording a new sample every time that a timer interrupt occurs, at a user-specified frequency (1 kHz, for 1 ms sampling intervals, by default).
as taking an entire time quantum balances out the numerous times that it is missed entirely, providing a fairly accurate measurement of the time spent executing the routine overall. As a result, execution time measurements for the most critical routines, where the program spends most of its time executing, are generally very good.
sampling mechanism are spread out to affect most areas of measured execution more or less equally. In contrast, most event counting-based mechanisms, such as function or basic block counting, record data at preset code locations, and therefore distort performance more near the preset sample points than elsewhere.
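The statistical argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Shark itself is implemented: the workload, the function names (decode, floor, copy), and the helper time_profile are hypothetical, and time is modeled in whole microseconds. It samples a repeating workload at a fixed 1 ms interval and shows that even a routine shorter than the sampling interval, which is missed on most passes and charged a whole sample on others, ends up with roughly its true share of the samples.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def time_profile(pattern, repeats, interval_us=1000):
    """Sample a repeating workload at a fixed interval.

    pattern: list of (name, duration_us) pairs that run back to back;
             the whole pattern is executed `repeats` times.
    Returns a Counter mapping each name to the number of samples
    that landed while that (hypothetical) function was running.
    """
    names = [name for name, _ in pattern]
    # End offset of each function within one pass through the pattern.
    ends = list(accumulate(duration for _, duration in pattern))
    period = ends[-1]
    hits = Counter()
    for t in range(0, period * repeats, interval_us):
        offset = t % period  # where in the pattern this sample lands
        hits[names[bisect_right(ends, offset)]] += 1
    return hits


# Hypothetical workload: 0.9 ms + 0.37 ms + 0.1 ms per pass.
workload = [("decode", 900), ("floor", 370), ("copy", 100)]
hits = time_profile(workload, repeats=2000)
total = sum(hits.values())
for name, duration in workload:
    true_share = duration / 1370
    sampled_share = hits[name] / total
    print(f"{name}: true {true_share:.3f}, sampled {sampled_share:.3f}")
```

Note that the workload period here is deliberately not a multiple of the sampling interval. If the two were aligned, every sample would land at the same point in the pattern and the profile would be badly skewed.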
5. Sample Limit — The maximum number of samples to record. Specifying a maximum of N samples will result in at most N samples being taken, even on a multi-processor system, so this should be scaled up as larger systems are sampled. When the sample limit is reached, data collection automatically stops. With the Windowed Time Facility mode, its sample history field replaces this one, and if the Time Limit is very small it may be reached first.
menu (#8), if you would rather see the “Tree” view, which is described in Tree View (page 36) and organizes the sample groups according to the program’s callgraph tree, or “Heavy and Tree” view, which splits the window and shows both simultaneously.
Figure 2-4 The Profile Browser
The window consists of several main parts:
1.
The Edit→Find→Find command (Command-F) and the related Edit→Find→Find Next (Command-G) and Edit→Find→Find Previous (Command-Shift-G) commands are very useful when you are searching for particular entries in a profile browser listing many symbols. Simply type the desired library or symbol name into the Find... dialog box, and Shark will automatically find and highlight the next instance of that library or symbol.
The table consists of five columns:
a.
e. Symbol — The symbol where this sample was located. Most of the time, this is the name of the function or subroutine that was executing when the sample was taken, but the precise definition is controlled by the compiler. One particular area for wariness is with macros and inline functions. These will usually be labeled according to the name of the calling function, and not the macro or inline function name itself.
6. Process Popup Menu — This lists all of the sampled processes, in order of descending number of samples in the profile, plus an “All” option at the top. When you choose an option here, the Results Table is constrained to only show samples falling within the selected process. Each entry in the process list displays the following information: the percent of total samples taken within that process, process name, and process ID (PID).
The “Tree” view gives you an overall picture of the program calling structure. In the sample profile (Figure 2-8), the top-level function is [CelestiaOpenGLView drawRect:], which in turn calls [CelestiaController display], which then calls CelestiaCore::draw(), and so on. In “Tree” view, the Total column lists the amount of time spent in a function and its descendants, while the Self column lists the time spent only inside the listed function.
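The relationship between the Total and Self columns can be sketched in a few lines. This is an illustrative model only, assuming each sample is a callstack ordered from the top-level caller down to the leaf; the helper tree_counts and the sample data are hypothetical and do not reflect Shark's internal data structures. Each node of the tree is identified by its call path: a sample adds to Self only for its full path (the leaf), and to Total for every prefix of that path.

```python
from collections import Counter


def tree_counts(samples):
    """Compute per-call-path Self and Total sample counts.

    samples: iterable of callstacks, each ordered from the top-level
             caller to the leaf function that was executing.
    Returns (self_counts, total_counts), both keyed by call-path tuples.
    """
    self_counts, total_counts = Counter(), Counter()
    for stack in samples:
        path = tuple(stack)
        self_counts[path] += 1  # time spent in the leaf itself
        for depth in range(1, len(path) + 1):
            total_counts[path[:depth]] += 1  # leaf plus every ancestor
    return self_counts, total_counts


# Hypothetical samples using the call chain described above.
draw_rect = "[CelestiaOpenGLView drawRect:]"
display = "[CelestiaController display]"
draw = "CelestiaCore::draw()"
samples = [
    [draw_rect, display, draw],
    [draw_rect, display, draw],
    [draw_rect, display],
    [draw_rect],
]
self_counts, total_counts = tree_counts(samples)
for path in sorted(total_counts, key=len):
    print(" > ".join(path), "total:", total_counts[path], "self:", self_counts[path])
```

In this model the top-level entry always has a Total equal to the number of samples in its subtree, while a function that only calls other functions has a Self count near zero.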
Note on Heavy/Tree comparisons: Please note that there may not be a one-to-one correspondence between entries in “Tree” view and “Heavy” view. If you select a function in “Heavy” view and then switch to “Tree” view, it will always select exactly one function in the tree. On the other hand, if you select a function in “Tree” view and then switch back to “Heavy” view, Shark will automatically select the “heaviest” symbol corresponding to that callpath.
deep callstacks being over-represented in the profile, since they are counted many times, but makes it easier to find symbols for frequently-occurring but non-leaf functions, since one no longer must drill down through multiple levels of disclosure triangles to find them. While Shark will allow you to use this mode with “Tree” view, it is not recommended.
2. Color By Library — Uses colors to differentiate libraries in the Results table.
3.
Chart View
Click Shark’s Chart tab to explore sample data chronologically, from either a thread- or CPU-based perspective. This can help you understand the chronological calling behavior in your program, as opposed to the summary calling behavior shown in the Results Table. Using this chart, you can see at a glance whether your program calls functions rarely or often and whether there are any recurring patterns in the way your program calls functions.
1. Callstack Chart — This chart displays the depth (y-axis) of the callstack for each sample, chronologically from left-to-right over time (x-axis). The figure also clearly shows several key features of the chart:
a. User Callstack — Most callstacks, in blue, represent user-level code from your program.
b. Supervisor Callstack — Callstacks in dark red represent supervisor-level code stacks that were sampled.
c.
6. Callstack Table — This displays the functions within the callstack for the currently selected sample, with the leaf function at the top and the base of the stack at the bottom. As you select different samples in the chart, this listing will update to reflect the location of the current selection. This is displayed or hidden using the Callstack Table Button (#4).
7.
13. View Popup Menu — This popup lets you choose to view sets of samples from different processor cores.
Advanced Chart View Settings
The first pane of the Advanced Settings drawer displays a new set of options if you switch to a Chart view (Figure 2-11). These controls affect the appearance of the chart, and are generally fairly minor:
1.
7. Color Selection — Choose colors to use for user sample callstacks, kernel sample callstacks, and the selection area by clicking on these color wells.
Figure 2-11 Advanced Settings for the Chart View
The remainder of the controls visible in the Advanced Settings Drawer, which control Data Mining, are described in Data Mining (page 151).
Code Browser
Double-clicking on an entry in the Results Table or Callstack Table will open a Code Browser view for that entry, as shown in Figure 2-12. If available, the source code for the selected function is displayed. Source line and file information are available if the sampled application was compiled with debugging information (see Debugging Information (page 146)).
2. Browse Buttons — You can use these buttons to maneuver through function calls. After you double-click on a function call (denoted by blue text) and go to the actual function, the “back” button here (left arrow) will be enabled. To return to the caller, just click on the “back” button. After you have maneuvered through a function call, you can navigate through code forward and backward just as you would navigate web pages in a web browser.
3.
b. Total — This optional column lists the percentage of displayed references for each instruction or source line, including called functions. To see sample counts instead of percentages, double-click on the column.
c. Line — The line number for each line from your original source code file. This column is particularly useful for sorting the browser window, in order to keep the line numbers there in sequence.
d.
9. Source File Popup Menu — A given memory range can contain source code from more than one file because of inlining done by the compiler. You can select which source file to view using this menu.
10. Edit Button — You can open the currently displayed source file in Xcode by selecting the Edit button. The file will open up and scroll to your selected line, or the line with the most samples if nothing is selected.
11.
a. Address Column — This displays the address of the assembly-language instruction displayed on this row. With PowerPC, this value simply increases by 4 with every row, but with x86 this will change by 1–18 bytes per row, depending upon the variable length of each instruction. You can double-click on this column to switch to relative decimal or hexadecimal offsets from the beginning of the address range.
b.
4. Asm Help Button — Press this button to get help for the selected assembly-language instruction, as described in ISA Reference Window (page 54).
Figure 2-14 Assembly Browser
Advanced Code Browser Settings
The Advanced Settings drawer displays a new set of options if you switch to a Code Browser view (Figure 2-15). These options allow you to customize the viewing of source and assembly code and turn on and off various features of the browser.
6. Show Self Column — Toggles display of the column that lists the percentage of displayed references for each instruction or source line, but not including called functions.
7. Show Perf Event Column(s) — If the current profile contains performance counter information, this setting toggles display of that data within the browser. Otherwise, it will be disabled.
8. Show Code Analysis Columns — Toggles display of the columns that provide optimization tips.
5. Show G5 (PPC970) Details Drawer — (PowerPC-only) Shark can display graphs of instruction dispatch slot and functional unit utilization in an additional, G5-specific “details” drawer. Further details on this can be found in Code Analysis with the G5 (PPC970) Model (page 242). This function is always disabled for sessions recorded on Macs with other processor architectures.
Figure 2-15 Advanced Settings for the Code Browser
Other architectures have slightly different options for items 3–5 of the Asm Browser settings. For x86-based systems, these options are:
● Syntax — Chooses whether to display the x86 instructions in Intel assembler syntax or AT&T syntax (the default).
● Show Prefixes — If checked, instruction prefixes (like lock and temporary mode shifts) will be displayed.
● Show Operand Sizes — If checked, each instruction explicitly encodes its operand size into the mnemonic (AT&T syntax) or operand list (Intel syntax).
The ISA Reference Window provides an indexed, searchable interface to the PowerPC, IA-32 (32-bit x86), or EM64T (64-bit x86) instruction sets. The reference is also integrated with selection in the Shark Assembly Browser – selecting an instruction in the table causes the ISA Reference Window to jump to that instruction’s definition in the manual.
Figure 2-18 ISA Reference Window
Tips and Tricks
This section points out a few things that you might see while looking at a Time Profile, what they may mean, and how to optimize your code if you see them. The tips and tricks listed herein are organized according to the view most commonly used to infer the associated behavior.
● Profile Browser
    ● Where should I start?: When first presented with a Profile Browser, you will want to begin in “Heavy View” sorted by “Self” and see what pops to the top.
● Chart View
    ● Different parts of the chart look visibly different: Different-looking areas were probably created by different code in your program as it moves through different phases of execution. In most applications, each of these will need to be optimized separately. As a result, you may want to sample these with different Shark sessions, so that you can examine the different phases separately.
Example: Optimizing MPEG-2 using Time Profiles
Shark. Please note that in Xcode you will need to adjust the build settings for the Target that you are testing and the correct (optimized) build configuration. Unfortunately, it is quite easy to set options for the wrong target or build configuration accidentally.
After compiling and running the reference decoder, Shark generated the session displayed in Figure 2-19. Just by pressing the “Start” and “Stop” buttons, we get a session that lets us see that about half the execution time is spent in a combination of the Reference_IDCT() function and the floor() function.
Vectorization
Optimizing the Reference_IDCT() function by converting it from floating point to integer also presented another possible optimization that could be helpful: SIMD vectorization. All Intel Macintoshes support the SSE instruction set extensions, allowing them to process 128-bit vectors of data, and most PowerPC Macintoshes support the very similar AltiVec™ extensions.
Add_Block()), colorspace conversion (dither()), and pixel interpolation (conv420to422() and conv422to444()) achieved a speedup of 5.69x over the original code — a dramatic improvement made possible in a relatively short amount of time thanks to the feedback provided by Shark.
Optimization Step    Speedup
Fast floor()         1.12x
Integer IDCT         1.86x
Vector IDCT          2.05x
All Vector           5.69x
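To see how much each individual step contributed, the figures in the table can be broken down with a little arithmetic. The sketch below assumes that every speedup in the table is cumulative, that is, measured against the original code, which is how the text describes the final 5.69x figure; whether the intermediate rows are measured the same way is an assumption. The helper incremental_speedups is hypothetical.

```python
def incremental_speedups(steps):
    """Convert cumulative speedups into per-step gains.

    steps: list of (name, cumulative_speedup) in the order applied,
           each speedup assumed to be relative to the original code.
    Returns a list of (name, gain_over_previous_step).
    """
    gains = []
    previous = 1.0  # the unoptimized baseline
    for name, cumulative in steps:
        gains.append((name, cumulative / previous))
        previous = cumulative
    return gains


table = [
    ("Fast floor()", 1.12),
    ("Integer IDCT", 1.86),
    ("Vector IDCT", 2.05),
    ("All Vector", 5.69),
]
for name, gain in incremental_speedups(table):
    print(f"{name}: {gain:.2f}x over the previous step")
```

Under that assumption, the runtime remaining after each step is simply 1 divided by the cumulative speedup, so the final version would take a little under 18% of the original time.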
System Tracing
Shark’s System Trace configuration records an exact trace of system-level events, such as system calls, thread scheduling decisions, interrupts, and virtual memory faults. System Trace allows you to measure and understand how your code interacts with Mac OS X and how the threads in your multi-threaded application interact with each other. If you would like to gain a clear understanding of the multi-threaded behavior of a given program, characterize user vs.
and multithreading problems, because these issues frequently hinge upon managing the precise timing of interaction events properly in order to minimize the time that threads spend waiting for resources (blocked), as opposed to minimizing execution time.
Figure 3-1 Time Profile vs.
● Start Time
● Stop Time
● A backtrace of the user-space function calls (callstack) associated with each event
● Additional data customized depending on the event type that triggers recording (see Trace View In-depth (page 73) for details)
In the course of profiling your application, it may become necessary to trim or expand the number of events recorded. Most of the typical options are tunable by displaying the Mini Config Editor, depicted in Figure 3-2.
Out of memory errors?: If you see these when starting a system trace, then just reduce the Sample Limit value until Shark is able to successfully allocate a buffer for itself.
Interpreting Sessions
Upon opening a System Trace session, Shark will present you with three different views, each in a separate tab.
Summary View In-depth
The Summary View is the starting point for most types of analysis, and is shown in Figure 3-3. Its most salient feature is a pie chart that gives an overview of where time was spent during the session. Time is broken down between user, system call, virtual memory fault, interrupt, idle, and other kernel time.
Figure 3-3 Summary View
Underneath the pie chart, there are individual summaries of the various event types.
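The kind of breakdown shown in the pie chart can be approximated by summing the duration of traced intervals per category. The sketch below is illustrative only: the record layout (category, start, stop in microseconds), the category labels, and the helper summarize are assumptions made for this example and are not Shark's session format.

```python
from collections import defaultdict

# Category labels mirroring the breakdown described above (assumed spelling).
CATEGORIES = ("user", "system call", "vm fault", "interrupt", "idle", "kernel")


def summarize(intervals):
    """Return the percentage of traced time spent in each category.

    intervals: iterable of (category, start_us, stop_us) records.
    Categories with no recorded time are omitted from the result.
    """
    totals = defaultdict(int)
    for category, start_us, stop_us in intervals:
        totals[category] += stop_us - start_us
    grand_total = sum(totals.values())
    return {
        category: 100.0 * totals[category] / grand_total
        for category in CATEGORIES
        if totals[category]
    }


# Hypothetical 1 ms trace of a single CPU.
trace = [
    ("user", 0, 600),
    ("system call", 600, 800),
    ("idle", 800, 1000),
]
for category, percent in summarize(trace).items():
    print(f"{category}: {percent:.1f}%")
```

A large idle or system call slice in such a summary is usually the cue to move on to the per-event summaries and the Timeline View to find out which threads were waiting and why.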
Scheduler Summary
The Scheduler Summary tab, shown in Figure 3-4, summarizes the overall scheduling behavior of the threads running in the system during the trace. Each thread is listed in the outline underneath its owning process, as shown at (1). To the left of each thread’s name, Shark displays the number of run intervals of that thread (or all threads within a process) that it recorded in the course of this session.
Note on Thread IDs: Thread IDs on Mac OS X are not necessarily unique across the duration of a System Trace Session. The Thread IDs reported by the kernel are not static, single-use identifiers; they are actually memory addresses. When you destroy a thread, and then create a new one immediately thereafter, there is a very high probability that the new thread will have the same thread ID as the one you just destroyed.
Note on System Trace callstacks: In rare cases, it is not possible for System Trace to accurately determine the user callstack for the currently active thread. In this case, it may just copy the callstack from the previous sample. While it occurs so rarely that it is usually not a problem, this “interpolation” can occasionally result in bad callstack information.
More settings for modifying this display are available in the Advanced Settings drawer, and are described in Summary View Advanced Settings (page 71).
Figure 3-6 Summary View: VM Faults
Summary View Advanced Settings
When you are viewing the System Calls Summary and VM Faults Summary tabs, several options are available in the Advanced Settings drawer (see Advanced Settings Drawer (page 23)), as seen in Figure 3-7:
1.
4. Callstack Data Mining — The System Call and VM Fault summaries support Shark’s data mining options, described in Data Mining (page 151), which can also be used to customize the presentation of the data.
Figure 3-7 Summary View Advanced Settings Drawer
Trace View In-depth
The Trace View lists all of the events that occurred in the currently selected scope. Because events are most commonly viewed with “System” scope (all processes and all CPUs), each event list has a Process and a Thread column describing the execution context in which it took place. As with the Summary View, the Trace View is sub-divided according to the class of event.
● Reason — Reason that the thread tenure ended (described in Thread Run Intervals (page 79))
● Priority — Dynamic scheduling priority of the thread
Figure 3-8 Trace View: Scheduler
System Call Trace
The System Call Trace tab, shown in Figure 3-9, lists the system call events that occurred during the trace. In most respects, this tab behaves much like the scheduler tab described previously, but it does have a couple of new features.
occurred. Otherwise, the beginning and ending thread interval indices are listed. Because it is possible for an event to start before the beginning of a trace session, or end after a trace session is stopped, event records may be incomplete. Incomplete events are listed with “?” for the unknown thread run interval index, and have a gray background in the event lists.
● Process — Shows the process in which the system call occurred.
You can toggle the display of the Callstack Table, which displays the user callstack for the currently selected VM fault entry, by clicking the button in the lower right corner of the trace table. The columns in the trace view have the following meanings in this tab:
● Index — A unique index for the VM fault event, assigned by Shark
● Interval — Displays thread run interval(s) in which the VM fault occurred. Each fault occurs over one or more thread run intervals.
● Size — Number of bytes affected by the fault, an integral multiple of the 4096-byte system page size
Figure 3-10 Trace View: VM Faults
Timeline View In-depth
The Timeline View, displayed in Figure 3-11, allows you to visualize a complete picture of system events and threading behavior in detail, instead of just summaries. Each row in the timeline corresponds to a traced thread, with the horizontal axis representing time.
● Keyboard Navigation— After highlighting a Thread Run Interval by clicking on it, the Left or Right Arrow keys will take you to the previous or next run interval from the same thread, respectively. If you highlighted a System Call, VM Fault, or Interrupt, the arrow keys will scroll to the next event of any of these types. Holding the Option key, however, will scroll to the next event of the same type.
Thread Run Intervals
Each time interval that a thread is actively running on a CPU is a thread run interval. Thread run intervals are depicted as solid rectangles in the Timeline View, as shown in Figure 3-12, with lines depicting context switches joining the ends of the two threads running before and after each context switch.
There are five basic reasons a thread will be switched out by the system to run another thread:
Blocked— The thread is waiting on a resource and has voluntarily released the processor while it waits.
Explicit Yield— The thread voluntarily released its processor, even though it is not waiting on any particular resource.
Quantum Expired— The thread ran for the maximum allowed time slice, normally 10 ms, and was therefore interrupted and descheduled by the kernel.
MIG Message— Mach interface generator routines, which are usually only used within the kernel
Figure 3-14 Timeline View: System Calls
Calls from all of these groups are visible in Figure 3-14. Clicking on the icon for a system call will bring up the System Call Inspector, as seen in Figure 3-15. The resulting inspector displays many useful pieces of information which you can use to correlate the system call to your application’s code.
● Arguments— The first four integer arguments
Figure 3-15 System Call Inspector
VM Faults
As is the case with almost all modern operating systems, Mac OS X implements a virtual memory system. Virtual memory works by dividing up the addressable space (typically 4 GB on a 32-bit machine, currently 256 TB on 64-bit machines) into pages (typically 4 KB in size).
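The page arithmetic behind these figures can be sketched in a few lines of C. The helper names page_count and page_base are hypothetical (they are not part of Shark or Mac OS X), and the sketch assumes only the 4096-byte page size mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages needed to cover `bytes` of address space, rounding up. */
uint64_t page_count(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return bytes / page_size + (bytes % page_size != 0);
}

/* Address of the first byte of the page that contains `addr`.
 * Assumes page_size is a power of two, as 4096 is. */
uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}
```

For a 4 GB address space, page_count(4ULL << 30, 4096) is 1,048,576 pages. A fault at address 0x12345 falls in the page that starts at page_base(0x12345, 4096), which is 0x12000.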
Non-Zero Fill— A previously unused page not marked “zero fill on demand” was touched for the first time. Generally, this is only used in situations when the OS knows that the page being allocated will immediately be overwritten with new data, such as when it allocates I/O buffers.
Copy on Write (COW)— A shared, read-only page was modified, so the OS made a private, read-write copy for this process.
Page Cache Hit— A memory-resident but unmapped page was touched.
Three of these types of faults are visible in Figure 3-16. A zero-fill fault is circled to highlight it. Clicking on a VM Fault Icon will bring up the VM Fault Inspector, as seen in Figure 3-17. This inspector functions much like the System Call Inspector, except instead of listing arguments and return values, the VM Fault Inspector lists the fault address, size, and — for code faults — the library in which it occurred.
Clicking on an Interrupt icon will bring up the Interrupt Inspector. This inspector lists the amount of time the interrupt consumed, broken down by CPU and wait time.
Figure 3-18 Interrupt Inspector
Sign Posts
You can often get a good idea of your application’s current state by inspecting the user callstacks associated with the built-in VM fault and system call events that occur in your application.
the amount of time spent on the CPU and time spent Waiting between the begin and end event. Since you can supply different arguments to the start and end points of an interval sign post, the inspector supplies “Begin” and “End” tabs that display the arguments supplied to the start and end points, respectively.
4. Draw Context Switch Lines— Check this to enable (default) or disable the thin gray lines that show context switches, linking the thread tenures before and after the switch that ran on the same CPU core.
5. Detailed Event Icons— Deselecting this instructs Shark to turn off the icons that identify the various types of VM faults and system calls and replace them with generic “plain page” VM fault and “gray phone” system call icons.
6. Label Events— These checkboxes allow you to enable or disable the display of event icons either entirely, by type group, or on an individual, type-by-type basis. For example, you can use them to enable interrupt icons or to remove icons for events, such as VM faults, that you may not be interested in at the present time.
Retired Document | 2012-07-23 | Copyright © 2012 Apple Inc. All Rights Reserved.
Figure 3-20 Timeline View Advanced Settings Drawer
Sign Posts
Even with all of the system-level instrumentation already included in Mac OS X, you may sometimes find that it is helpful or even necessary to further instrument your code. Whether to orient yourself within a long trace, or to time certain operations, Sign Posts can be inserted in your code to accomplish these and other tasks.
● User Applications using CHUD Framework: User applications that link with the CHUD.framework can simply call chudRecordSignPost(), which has the following API:
int chudRecordSignPost(unsigned code, chud_signpost_t type, unsigned arg1, unsigned arg2, unsigned arg3, unsigned arg4);
● User Applications not using CHUD Framework: User applications for which you prefer not to link with the CHUD.framework can still create signposts using explicit system calls.
Listing 3-2 signPostExample.c #include #include
/*
 * Use the kernel_debug() method when in the kernel (arg5 is unused),
 * DBG_FUNC_START corresponds to chudBeginIntervalSignPost.
It can also indicate that your threads frequently block while waiting for locks. In this case, it is possible that the short intervals are inherent to your program’s locking needs. However, you may want to see if you can reduce the inter-thread contention for locks in your code so that the locks are not contested nearly as much.
● Inordinate count of the same system call: Sometimes, things like select() are called too often.
● Multi-threaded application has only one thread running at a time: First of all, ensure you’ve performed the System Trace on a multiprocessor machine. You can do this by pressing Command+I to bring up the session inspector, which will list the pertinent hardware information from the machine on which the session was created. Usually, this is not an issue. Second, ensure your selected scope is not limited to a single CPU.
Another possibility is that you’ve simply not given your worker threads enough work to do. Verify this theory using the tip from the summary view suggestions above.
● One processor doesn’t show any thread run intervals until much later than another: If this happens, chances are you are using the Windowed Time Facility. This is due to a fundamental difference in how the data is recorded when using this mode.
Other Profiling and Tracing Techniques Not every performance problem stems from computation in a program or a program’s interaction with the operating system. For these other types of problems, Shark provides a number of profiling and tracing configurations that focus on individual types of performance problems. Any of them may be chosen using the configuration list in the main Shark window before pressing “Start.
5. Prefer User Callstacks— When enabled, Shark will ignore and discard any samples from threads running exclusively in the kernel. This can eliminate spurious samples from places such as idle threads and interrupt handlers, if your program is not affected by these. 6.
showing you how much time your threads are blocked and how often they are running. As a result, it is a good “sanity check” technique to make sure that threads that are supposed to be CPU-bound are not accidentally wasting time blocked, and that threads that are supposed to be blocked really are idle.
few disclosure triangles open, this view lets you logically follow your code paths until you reach a point where they call blocking library routines. At that point, and possibly with the help of a code browser, you should be able to get a good idea of which parts of your code are blocking, and how frequently this blocking is occurring.
Note regarding launched target processes: When launching a process (as described in Process Launch (page 122)) with Time Profile (All Thread States), you may notice samples in _dyld_start. Since Shark starts sampling the process before the launched process begins executing, some samples will fall in this method.
3. Start Delay— Specify a length of time that Shark should wait after being told to start collecting a profile before the collection begins. If the program action to be profiled requires a sequence of actions to start, this option can be used to delay the start until after the setup actions have been completed.
is often a good idea to look over the routines near the top of this list and make sure both that the routines allocating memory are the ones you think should be allocating memory and that the amount of memory allocated by each routine makes sense.
Figure 4-5 Malloc Trace session, profile browser
● Code Browsers: Locate Allocating Code— If you see a potentially troublesome routine, double-clicking it will bring it up in a code browser window.
allocation and deallocation operations outside of loops, so that you can reuse the same memory buffers repeatedly without reallocating them each time through the loop.
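The following sketch illustrates the kind of change described here. Both functions and their names are hypothetical and are not taken from the guide. The first allocates a scratch buffer on every pass through the loop, so each pass would appear as a separate allocation event in a Malloc Trace. The second hoists the allocation out of the loop and reuses one buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates and frees a scratch buffer on every pass through the loop. */
size_t checksum_rows_slow(const unsigned char *rows, size_t nrows, size_t row_len)
{
    size_t sum = 0;
    assert(rows != NULL && row_len > 0);
    for (size_t r = 0; r < nrows; r++) {
        unsigned char *scratch = malloc(row_len);
        if (scratch == NULL)
            abort();
        memcpy(scratch, rows + r * row_len, row_len);
        for (size_t i = 0; i < row_len; i++)
            sum += scratch[i];
        free(scratch);
    }
    return sum;
}

/* Same result with a single allocation that is reused by every pass. */
size_t checksum_rows_fast(const unsigned char *rows, size_t nrows, size_t row_len)
{
    size_t sum = 0;
    assert(rows != NULL && row_len > 0);
    unsigned char *scratch = malloc(row_len);
    if (scratch == NULL)
        abort();
    for (size_t r = 0; r < nrows; r++) {
        memcpy(scratch, rows + r * row_len, row_len);
        for (size_t i = 0; i < row_len; i++)
            sum += scratch[i];
    }
    free(scratch);
    return sum;
}
```

Both versions return the same value. The difference a Malloc Trace would show is the number of allocation events: one per row in the first version, and one in total in the second.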
Advanced Display Options
Each Malloc Trace records a few additional pieces of information at each allocation event. These are not displayed by default, but can be useful in some situations.
When you enable display of a particular type of data, it will appear in several places. First, columns displaying it will appear in the Profile Browser (as shown previously in Figure 4-5 (page 103)), although this type of display is really only meaningful for Alloc Size. Raw values are displayed in the list of samples at the bottom of the Chart View (as shown in Figure 4-8), and all types of displays are useful here.
Static Analysis
Most of Shark’s profiling methods limit their code analysis to functions that are actually executed during profiling. Dead or otherwise unused code is not analyzed or presented for optimization precisely because it has very little effect on the measured performance.
● PowerPC Model — Selects the PowerPC model to use when searching for and assigning problem severities.
● Intel Model — Selects the Intel model to use when searching for and assigning problem severities.
Figure 4-9 Static Analysis mini configuration editor
Once you have created a Static Analysis session, you can examine it to see Shark’s optimization suggestions for your program.
Sun’s Java virtual machine included with Mac OS X do provide an interface that Shark can use. As a result, Shark includes some special, Java-only configurations that use this interface to allow you to usefully profile your Java applications.
Figure 4-10 How Shark-for-Java differs from regular Shark configurations
[Figure labels: “Java Classes”, “Java”, “JavaVM”, “Shark attaches to the JVM instead”, “Shark normally profiles by looking at the machine”]
● Java Alloc Trace: This records memory allocations and the sizes of the objects allocated, and is analogous to a regular Malloc Trace (see Malloc Trace (page 101)). Not surprisingly, the resulting session window produced by Shark is very similar to one produced by Malloc Trace. As with Malloc Trace, the display is just that of a Time Profile — albeit a Java Time Profile, in this case — with an added “allocation size” column.
Event Counting and Profiling Overview
After analyzing an application using a Time Profile, you may find it informative to count system events or even sample based on system events in order to understand why your application spends time where it does. The best way to do this is to take advantage of the performance counters built into your Mac’s processors (usually called PMCs) and Mac OS X itself.
Counter Spreadsheet Advanced Settings (page 116)). Selecting rows in this list also selects the corresponding columns in the counter table, graphing them. Use Command-clicks (for discontiguous selection) and/or Shift-clicks (for contiguous selection) to select multiple rows simultaneously.
2. a. “Eye” Column — Uncheck the checkboxes in this column to hide columns in the results table(s).
e. Shortcut Result Column(s) — These columns show the performance counter results after they have been processed by the math in any “shortcut” equations.
see Adding Shortcut Equations (page 119), below. For a complete description of how to write performance counter equations, including how to add them permanently to your configurations, see Counter Spreadsheet Analysis PlugIn Editor (page 196).
Figure 4-11 Performance Counter Spreadsheet
The Counters Menu
When you switch to the Counters tab in a session made with timed performance counters, a Counters menu will appear in the menu bar. You can also access this menu by control-clicking (or right-clicking, with a 2-button mouse) in the Results table, as shown in Figure 4-12.
Performance Counter Spreadsheet Advanced Settings
With the session window in the foreground, select Window → Show Advanced Settings (Command-Shift-M), as we described earlier in Advanced Settings Drawer (page 23). The palette of advanced controls will appear (Performance Counter Spreadsheet Advanced Settings).
This drawer contains three main panels, each with many different controls that affect the presentation of results:
1. Counter Shortcut Equations— This table displays the “Shortcut Equations” used to generate each of the computed results columns in the Results Table, one equation per row.
● Bars — Display results using a vertical bar chart. Bars from multiple selected columns will be superimposed over one another.
● Stacks of Bars — Display results as stacked vertical bar charts, with the values from all selected columns added together at each sample point.
Adding Shortcut Equations
This section gives a brief summary of how to add new “shortcut equation” results columns to your performance counter spreadsheet. For a full description of all the capabilities of shortcut equations, see Using the Editor (page 196).
1. Open up the Advanced Settings drawer, if you do not already have it open.
2.
Note: The built-in L2 cache miss profile configuration is a great way to find lines in your code that access memory in ways that cause very slow L2 cache misses, events which can significantly slow down processors like the ones in modern Macs. Optimizing this code to reduce the number of cache misses by adjusting your algorithms and/or memory access patterns can be a very helpful way to improve performance significantly.
While none of the default configurations use this capability, you can build your own custom configuration that records callstacks, as a Time Profile does, simultaneously with timed counter information. This gives you timed counter recording with a way to approximately correlate results with your code.
Advanced Profiling Control Although the Start button makes starting and stopping Shark quite simple, sometimes it can be impractical, or even impossible to use. For example, how can you press the start button on a headless server? Profiling application launch can be hard to accomplish by hand as well.
Process Attach mode, by selecting Process from the Target popup (Command-2). Now, select the “Launch...” target from the top of the process list (or use Command-Shift-L). Choosing this “process” and then pressing “Start” will bring up the Process Launch Panel, shown below in Figure 5-2.
2. Working Dir— The full path to the working directory that the application will start using. By default, this is the path where the executable is located, but you may point it anywhere else that you like. When the application is executed, it will appear to have been started from a shell that had this directory as its working directory (i.e. the output of pwd) just before executing the command.
Batch Mode
Batch mode queues up any sessions recorded without displaying them. Pending sessions are listed in the main Shark window. Batch mode allows multiple sessions to be recorded in quick succession, by not immediately incurring the overhead of displaying the viewers for each session. To enter Batch Mode, select the Sampling → Batch Mode menu item (or Command-Shift-B).
leaving the area of interest. However, you may not always know when you will encounter the “interesting” region of your program in advance. WTF mode sidesteps this problem by eliminating the need to “start” normally. While Shark stops in the normal way, as we show in Figure 5-5, the starting position is just a fixed number of samples (or effectively amount of time, with Time Profiling) before the end, instead of at a user-specified point.
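The windowing idea can be illustrated with a small ring buffer that always holds the most recent samples. This is only a conceptual sketch: the guide does not describe how Shark stores its window, and the type and function names (sample_window, window_record, and so on) and the tiny capacity are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_CAPACITY 4   /* hypothetical window size, in samples */

typedef struct {
    unsigned long samples[WINDOW_CAPACITY];
    size_t next;    /* slot that the next sample will overwrite */
    size_t count;   /* number of valid samples, at most WINDOW_CAPACITY */
} sample_window;

void window_init(sample_window *w)
{
    assert(w != NULL);
    w->next = 0;
    w->count = 0;
}

/* Record one sample, discarding the oldest one once the window is full. */
void window_record(sample_window *w, unsigned long sample)
{
    assert(w != NULL);
    w->samples[w->next] = sample;
    w->next = (w->next + 1) % WINDOW_CAPACITY;
    if (w->count < WINDOW_CAPACITY)
        w->count++;
}

/* "Stop": copy the retained samples, oldest first, and return how many. */
size_t window_snapshot(const sample_window *w, unsigned long *out)
{
    assert(w != NULL && out != NULL);
    size_t start = (w->next + WINDOW_CAPACITY - w->count) % WINDOW_CAPACITY;
    for (size_t i = 0; i < w->count; i++)
        out[i] = w->samples[(start + i) % WINDOW_CAPACITY];
    return w->count;
}
```

If samples 1 through 6 are recorded into this four-entry window and sampling is then stopped, the snapshot contains samples 3, 4, 5, and 6. The session begins a fixed number of samples before the stop point, no matter when recording began.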
Tracing the execution around an asynchronous event (such as inter-thread communication, the arrival of a network packet, or an OS event such as a page fault) is a situation in which WTF mode can make profiling easier, especially when these “glitches” occur at hard-to-predict times.
Second, the beginning of a WTF System Trace Timeline (see Figure 5-6) can appear a bit strange; different processors might first appear at vastly different points in the timeline. In the figure below, the timeline for the processor at (1) begins well before the other three processor timelines at (2).
Sampling → Unresponsive Applications menu item (Command-Shift-A). When Unresponsive Application Triggering is enabled, Shark will automatically switch to Batch Mode and display unresponsive application triggering options, as shown below.
general, it is intended that you use it to collect sessions and then review your results with a graphical copy of Shark later. This section will discuss some common ways to use command line shark. A more complete description of the options available when using command line shark is available in its man page, shark(1).
Basic Methodology
There are four main ways to use command-line shark.
Remote Mode
A third way to use command line shark is remote mode, which works much like the remote mode supported by graphical Shark and described in Interprocess Remote Control (page 134). Once started in this mode, shark will wait for start/stop signals from other processes to arrive in any one of three ways:
1. A program instrumented with chudStartRemotePerfMonitor() and chudStopRemotePerfMonitor() calls can start and stop shark, respectively.
● Time Interval — shark -I allows you to change the sampling interval for configurations that support a sampling interval. Valid times are entered the same way as for time limits.
● Sample Limit — shark -S allows you to specify the maximum number of samples to record during each session.
Some other, less commonly used options change the behavior of shark:
● Quiet Mode — shark -q will limit terminal output to reporting of errors only.
Reports
Command line shark supports generation of textual reports, either from session files that you’ve already created, or from new sessions as they are generated. These reports can be simple summaries (-g or -G options), or complete analysis reports (-t option). When creating a summary report, you can either create one from a session that is already saved on disk, with shark -g, or create it for new sessions, with shark -G.
More Information
This section has presented some of the most common options and techniques for using command-line shark. For more detailed information on all available options, please read the man page: shark(1).
Interprocess Remote Control
In some cases, it is best to have your programs start and stop Shark’s sampling at precisely chosen points in their execution.
It is important to keep in mind that many profiling techniques used by Shark employ statistical sampling in order to generate a profile. If the sampling interval is longer than the time it takes to execute the instrumented section of code, you may see few or no samples in the resulting profile. Statistical sampling is most useful when at least several hundred samples are taken.
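A rough way to check whether an instrumented region is long enough is to divide its running time by the sampling interval. The helpers below are a hypothetical sketch of that arithmetic and are not part of the CHUD API. They assume evenly spaced samples and ignore sampling overhead.

```c
#include <assert.h>

/* Expected number of samples when sampling every interval_us microseconds
 * across a region that runs for duration_us microseconds. */
unsigned long expected_samples(unsigned long duration_us, unsigned long interval_us)
{
    assert(interval_us > 0);
    return duration_us / interval_us;
}

/* Shortest region, in microseconds, that yields at least `wanted` samples. */
unsigned long min_duration_us(unsigned long wanted, unsigned long interval_us)
{
    assert(interval_us > 0);
    return wanted * interval_us;
}
```

With an illustrative 1,000-microsecond interval, a region that runs for 500 microseconds yields no samples at all, while collecting 300 samples requires the region to run for at least 300,000 microseconds (0.3 seconds).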
    sprintf(label_str, "Hanoi #%d", i);
    chudStartRemotePerfMonitor(label_str);
    Hanoi('A','B','C',i);
    chudStopRemotePerfMonitor();
}
chudReleaseRemoteAccess();
Note: To compile this example program, you must instruct gcc to link with the CHUD framework:
gcc -framework CHUD -F/System/Library/PrivateFrameworks towersOfHanoi.c
The Towers of Hanoi test program demonstrates the need for a sampling interval that is much shorter than the time between the calls to start and stop Shark (see Figure 5-8). Fewer than 100 samples are taken unless the problem size is at least 15 disks. As a result, you will often find that it is better to sample your entire application and use Shark’s powerful Data Mining (page 151) mechanisms to narrow down what is displayed after sampling.
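The disk count matters so much in the Towers of Hanoi example because of how quickly the work grows. The sketch below assumes that the example’s Hanoi() routine is the standard recursive solution, which makes 2^n - 1 moves for n disks. The excerpt does not show that routine, so treat this as an assumption, and note that hanoi_moves is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

/* Number of single-disk moves made by the standard recursive solution. */
unsigned long hanoi_moves(unsigned disks)
{
    assert(disks < sizeof(unsigned long) * CHAR_BIT);
    return (1UL << disks) - 1UL;
}
```

Under that assumption, each additional disk roughly doubles the running time of the instrumented region, and with it the number of samples taken between the start and stop calls: 14 disks need 16,383 moves, and 15 disks need 32,767.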
When used to stop profiling, chudRemoteCtrl will not return until Shark has stopped profiling. In the case of command-line shark, chudRemoteCtrl will not exit until the session file is written to disk. More information on chudRemoteCtrl is available from its man page, chudRemoteCtrl(1).
Important: Shark cannot capture symbol information on the iPhone itself, so “raw” sessions recorded from an iPhone will appear in Shark labeled only based on sample address ranges. This can make it very difficult to understand the results that Shark returns. Instead, you must tell Shark to recover symbol information afterwards from a copy of your iOS application which is stored locally on your Macintosh.
● Control network profiling of shared computers — Any computers on the network (in the local domain) running Shark in “shared” network mode will automatically be listed as available for control by this instance of Shark.
● Config — The currently active Sampling Configuration on the shared computer. The entries in this column are menus, just like the one in Shark’s main window (see Main Window (page 17)). The selected configuration on the remote shared computer can be changed by changing the selection in the menu.
● Target — The currently active Profiling Target on the shared computer.
then respond to network requests to start and stop profiling. A sample transcript of a remote command line shark in “Network Sharing” mode is shown in Figure 5-10. For more information on the usage and configuration of command line shark, see Timed Counters: The Performance Counter Spreadsheet (page 111).
Figure 5-10 Command Line Shark in Network Profiling Mode
Mac OS X Firewall Considerations
The sharing firewall on Mac OS X can prevent Shark’s network profiling from working in either sharing or control mode. When the Shark Network Manager successfully opens a network connection, the communication port number Shark is using is listed in brackets. This is normally port number 7475. Unfortunately, this port is not “open” through the firewall by default.
Click the Sharing... button in the warning dialog to bring up the System Preferences window Sharing tab. Otherwise click the Ignore button to dismiss the dialog, but note that doing so may result in the inability to use Shark over the network. Once in the Sharing Preference window of System Preferences, select the Firewall tab.
Advanced Session Management and Data Mining
Often, the profile analysis windows can provide you with a very helpful view of your application’s behavior using the default settings. However, there are also many tools available in Shark that can help you sort through the large quantity of data that Shark can collect quite quickly.
If symbol lookup fails, Shark may present the missing “symbols” in two different ways. If the memory of the process is readable — for example, a binary that has had its symbols stripped — Shark tries to determine the range of the source function by looking for typical compiler-generated function prologue and epilogue sequences around the address of the sampled instruction.
require debugging information to work, but it can be much more helpful if it’s available. If you record a Shark session and discover that symbols have not been captured, you can attempt to have Shark add them in afterwards.
Figure 6-1 Session Inspector: Symbols
The most common way to “symbolicate” or add symbols (along with other debugging information) to your session is to simply use the File → Symbolicate... command.
No matter which way you choose to get here, you will be presented with a Symbolication dialog (Figure 6-2).
Figure 6-2 Symbolication Dialog
Use this dialog to select a symbol-rich (but otherwise identical) version of the binary you are symbolicating. The version, creation date, and size are shown for both the original and selected binary. For maximum flexibility, Shark does not restrict what you can select in any way.
Shark will warn you if you select a binary that is potentially problematic. If you do happen to select an executable that isn’t a good match, the profile results will be incorrect. Figure 6-3 and Figure 6-4 show an example session before and after symbolication.
Figure 6-3 Before Symbolication
Figure 6-4 After Symbolication
Managing Sessions
If you have multiple sessions measuring the same application, it is possible to use Shark to compare or merge those sessions with each other.
Comparing Sessions
Shark can be used for tracking performance regressions. Shark allows you to compare the contents of two session files sampling the same process through the File → Compare... menu item (Command-Option-C).
When used, a new session is created from two existing ones: Session A and Session B. The first session (Session A) is given a negative scaling factor, and the second session (Session B) is given a positive scaling factor. The result of a compare operation is a new session with negative profile entries for more samples in the earlier session (Session A), and positive profile entries for more samples in the later session (Session B).
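In terms of arithmetic, the comparison amounts to subtracting Session A’s sample counts from Session B’s. The sketch below illustrates this with hypothetical names. It assumes scaling factors of exactly -1 and +1, and that both arrays list the same symbols in the same order; neither detail is spelled out in the guide.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = b[i] - a[i]: Session A is weighted by -1 and Session B by +1.
 * Entry i of every array refers to the same symbol. */
void compare_sessions(const long *a, const long *b, long *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = b[i] - a[i];
}
```

For example, if three symbols have 40, 10, and 5 samples in Session A and 25, 10, and 9 samples in Session B, the compared session holds -15, 0, and 4. The first symbol got cheaper, the second is unchanged, and the third got more expensive.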
Callstack Data Mining
In order to understand how to use data mining to better understand your application, it is necessary to first understand a few fundamental concepts about samples and callstacks. Each Shark session contains some number of samples.
large routines farther down the callstack that call many other routines in the course of their execution. Once you have a clear picture of how callstacks are converted into call trees, it is easier to understand the application of the data mining operations.
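The relationship between callstacks and per-function Self and Total counts can be sketched as follows. The callstack type, the MAX_DEPTH limit, and both function names are hypothetical. The sketch counts per symbol across all samples and is not a description of Shark’s internal data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8   /* hypothetical limit on frames per sample */

/* One sample: frames ordered from the root (main) to the leaf.
 * Unused trailing slots are NULL. */
typedef struct {
    const char *frames[MAX_DEPTH];
} callstack;

/* Samples whose leaf frame is `symbol` (the "Self" count). */
size_t self_count(const callstack *stacks, size_t n, const char *symbol)
{
    size_t hits = 0;
    assert(stacks != NULL && symbol != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t depth = 0;
        while (depth < MAX_DEPTH && stacks[i].frames[depth] != NULL)
            depth++;
        if (depth > 0 && strcmp(stacks[i].frames[depth - 1], symbol) == 0)
            hits++;
    }
    return hits;
}

/* Samples with `symbol` anywhere in the stack (the "Total" count). */
size_t total_count(const callstack *stacks, size_t n, const char *symbol)
{
    size_t hits = 0;
    assert(stacks != NULL && symbol != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t d = 0; d < MAX_DEPTH && stacks[i].frames[d] != NULL; d++) {
            if (strcmp(stacks[i].frames[d], symbol) == 0) {
                hits++;
                break;   /* count each sample once, even under recursion */
            }
        }
    }
    return hits;
}
```

Given five made-up samples (main→foo→bar→cos, main→foo→bar→sqrt, main→baz→sqrt, main→baz→cos, and main→baz), main has a Total of 5 and a Self of 0, baz has a Total of 3 and a Self of 1, and sqrt has a Total and Self of 2.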
Figure 6-7 Tree View
[Figure: call tree rooted at main, with nodes for foo, bar, baz, cos, and sqrt, each annotated with its Total and Self sample counts.]
Shark’s Data Mining operations allow you to prune down call trees in order to make them easier to understand.
in controlled ways. For example, you often won’t care about the exact places that samples occur within Mac OS X’s extensive libraries — only which of your functions are calling them too much. Data Mining can help with simplifications like this. It is accessible in three different ways.
a flag such as ‘-g’ with GCC or XLC, and in the process eliminating a lot of user-level code that you probably do not have control over. Samples from code that isn’t called from debug-friendly code are eliminated entirely.
4. Hide Weight < N— Hides any granules that have a total weight less than the specified limit. This macro helps reduce visual noise caused by granules (i.e.
9. Focus Callers of Symbol X — Removes functions called by the specified symbol and removes callstacks that do not contain the specified symbol.
10. Focus Callers of Library X — Removes functions called by the specified library and removes callstacks that do not contain the specified library.
11. Unfocus All— Undo all Focus operations.
This same menu appears as a contextual menu on entries in the Heavy, Tree and Callstack results tables.
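As an illustration of what “Focus Callers of Symbol X” does to a single callstack, the hypothetical helper below keeps the frames from the root down to the chosen symbol and discards everything that symbol calls. It is only a sketch of the description above: the function name is invented, and cutting at the outermost occurrence of a recursive symbol is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* frames[0..depth-1] is one callstack, ordered from root to leaf.
 * Returns how many leading frames to keep after focusing on the callers
 * of `symbol`, or 0 if the stack does not contain `symbol` and should be
 * removed entirely. */
size_t focus_callers(const char *const *frames, size_t depth, const char *symbol)
{
    assert(frames != NULL && symbol != NULL);
    for (size_t i = 0; i < depth; i++) {
        if (strcmp(frames[i], symbol) == 0)
            return i + 1;   /* keep root..symbol; drop what it calls */
    }
    return 0;
}
```

For the stack main→foo→bar→cos, focusing on the callers of bar keeps three frames (main→foo→bar), while focusing on the callers of baz returns 0, so that callstack would be dropped.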
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile The Perf Count Data Mining palette also supplies a global enable/disable toggle, much like the one available with conventional data mining, and check boxes for toggling the visibility of perf count information (the eye column) and whether or not the perf count data is accumulated across processors (the column), on a per-counter basis.
2. Make four shapes as shown in Figure 6-11.
Figure 6-11 Example Shapes
3. Repeat the following steps until the app becomes sluggish (takes a half second or second to select all):
● Select All (Command-A)
● Copy (Command-C)
● Paste (Command-V)
Retired Document | 2012-07-23 | Copyright © 2012 Apple Inc. All Rights Reserved.
This should take 8-10 repetitions (maybe more), depending on hardware. When you are done, it should look similar to Figure 6-12.
Figure 6-12 Example Shapes, Replicated
4. Click in a blank area of the window to deselect all the shapes.
5. Choose Select All and notice how long it takes for all of them to be selected. This is a performance problem.
Taking Samples 1.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile This reveals a third pop-up button that you can use to target your application. Select Sketch from the list of running applications. Figure 6-13 Sampling a Specific Process 3. Switch back to Sketch and make sure nothing is selected. 4. Move the Sketch window to expose the Shark window (optional but makes things easier). 5. Press Option-Escape to start sampling. 6.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile High Level Analysis The session window gives you by default a summary of all the functions that the sampler found samples in and the percentage of the samples that were found there. So in the example, 14.1% of the samples were found in objc_msgSend. This view is very useful for doing analysis of performance when the bottlenecks occur in leaf functions.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile 3. Click on the callstack button on the lower right corner of the table to reveal the callstack pane, as shown in Figure 6-15. As you click on symbols on the left, the callstack pane will show you the stack leading up to the selected symbol. Since system libraries and frameworks were filtered out in the previous step, you will only see your application's symbols.
2. Double-click on the symbol -[SKTGraphicView selectAll:] in the tree view above. You will see a source window that looks like Figure 6-17.
Figure 6-17 Source View: SKTGraphicView selectAll
The code browser uses yellow to indicate sample counts that occur in this function or in functions that it calls.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile 3. Double-click on the yellow colored line to navigate to the function (performSelector) called here. When the new source window comes up, double-click in the yellow area marked with 2.7 s.
4. Double-click on the yellow colored line [self performSelector: sel withObject:[array objectAtIndex:i]]; and you'll get Figure 6-19:
Figure 6-19 Source View: SKTGraphicView selectGraphic
There are several hotspots here: At line 116, there is a call to indexOfObjectIdenticalTo:graphic. This is a linear search of the selected graphics.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile 5. Double-click on [self invalidateGraphic:graphic]; and you'll get Figure 6-20. This contains one line of expensive code that tests for nested objects. Figure 6-20 Source View: SKTGraphicView invalidateGraphic It is interesting to note that even with this fairly quick analysis we have already identified several glaring problems.
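To see why a linear search of the selection is a hotspot when it runs once per graphic, consider the Python sketch below. It is not Sketch's Objective-C code, and both function names are hypothetical. It counts the identity comparisons performed when every graphic is selected in turn, then shows one possible alternative (an identity set). That alternative is offered purely as an illustration and is not necessarily the fix this guide has in mind.

```python
def select_all_with_list(graphics):
    """Select every graphic, scanning the selection linearly each time."""
    selected = []
    comparisons = 0
    for g in graphics:
        found = False
        for s in selected:       # linear identity scan of the selection
            comparisons += 1
            if s is g:
                found = True
                break
        if not found:
            selected.append(g)
    return selected, comparisons

def select_all_with_set(graphics):
    """Same result, using a set of object identities for the membership test."""
    seen = set()
    selected = []
    for g in graphics:
        if id(g) not in seen:    # constant-time lookup on average
            seen.add(id(g))
            selected.append(g)
    return selected

graphics = [object() for _ in range(1000)]
selected, comparisons = select_all_with_list(graphics)
print(comparisons)  # grows as n*(n-1)/2, i.e. quadratically
```

Each round of Select All, Copy, and Paste in the earlier steps roughly doubles the number of shapes, which is why a quadratic cost becomes visible after only a handful of repetitions.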
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile Introduction To Focusing This example will take us through analyzing the behavior of drawing the selected rectangles. Here, we will develop ideas for analyzing larger and more complex programs (or frameworks) that involve multiple libraries. In doing so, we will introduce the Analysis menu/context menu and the ideas of focusing and filtering.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile 5. Choose "Focus Symbol -[SKTGraphicView drawRect:]" and you will get something that looks like Figure 6-23 Figure 6-23 After Focus Symbol -[SKTGraphicView drawRect:] The bottom pane (Tree view) is now rooted on the symbol that we focused on and the items in the top pane (Heavy view) have changed to reflect only the leaf times relative to the execution tree under this new root.
6. Expand -[SKTGraphicView drawRect:] in the bottom outline a few times until it looks like Figure 6-24:
Figure 6-24 After focus and expansion
There are two interesting things here:
● The self time is pretty large in this function
● A lot of time is spent in -[SKTGraphic drawHandleAtPoint: inView]
Let's look at the self time first.
7. Double-click on -[SKTGraphic drawInView:isSelected:] to see the source, as shown in Figure 6-25:
Figure 6-25 Source View: SKTGraphic drawInView:isSelected:
Here we see that time is split pretty evenly between the AppKit graphics primitive [path stroke] and the call to -[SKTGraphic drawHandleAtPoint: inView].
8. Double-click on line 406 on the text -[self drawHandlesInView: view] and you'll get Figure 6-26:
Figure 6-26 Source View: SKTGraphic drawHandlesInView:
This continues on with other calls to [self drawHandleAtPoint: inView], so it's been elided for brevity.
9. Double-click on line 502 in the text [self drawHandleAtPoint: ...] and it will take you to the code for -[SKTGraphic drawHandleAtPoint: ...], which is shown in Figure 6-27.
Figure 6-27 Source View: SKTGraphic drawHandleAtPoint:inView:
Here we see another call into an NS drawing primitive.
2. We're going to work with the “Heavy View” (the upper profile) for a bit. So click the and set it back to .
3. Select the first symbol in the upper profile, as shown in Figure 6-28.
Figure 6-28 Heavy View of Focused Sketch
Notice that the stack view on the right shows a backtrace leading up to our old friend -[SKTGraphicView drawRect:].
4.
Advanced Session Management and Data Mining Example: Using Data Mining with a Time Profile 5. In the left hand outline select the symbol ripd_mark and control+click on it to bring up the data mining contextual menu. Choose "Charge Library libRIP.A.dylib" and you get Figure 6-30: Figure 6-30 After Charge Library libRIP.A.dylib Notice that the symbols for libRIP.A.dylib are gone from the samples. Now this is a bit cleaner, but there are still multiple layers in CoreGraphics.
Advanced Session Management and Data Mining Example: Graphical Analysis using Chart View with a Malloc Trace This example is a bit simplistic, but it shows the power of the exclusion operations to strip out unnecessary information and identify where the real choke points are in the middle part of the execution tree. Please note that using the data mining operations does not change the underlying sample data that you've recorded.
Advanced Session Management and Data Mining Example: Graphical Analysis using Chart View with a Malloc Trace 3. Target your application and choose “Malloc Trace” instead of “Time Profile,” as with Figure 6-32. Figure 6-32 Malloc Trace Main Window 4. Switch back to Sketch. 5. Move the Sketch window to expose the Shark window (optional but makes things easier). 6. Make sure everything is selected. 7. Press Option+Escape to start sampling. 8.
The window should look like Figure 6-33, if you have gone through Tutorial 1 first. Otherwise, it will look similar but not exactly the same.
Figure 6-33 Result of Malloc Sampling
Graphical Analysis of a Malloc Trace
1. Click on the Chart Tab and you'll get a window that looks like Figure 6-34.
Figure 6-34 Chart View
The lower graph is a standard plot of the callstacks, with sample number on the X axis and stack depth on the Y axis, while the upper graph is a plot of the size of each allocated block plotted against the sample number.
Advanced Session Management and Data Mining Example: Graphical Analysis using Chart View with a Malloc Trace 2. Select the first hump just before sample 6,000 and enlarge it, as shown in Figure 6-35: Figure 6-35 Place to Select The yellow indicates the tenure of different stack frames. Stack frame 0 is main and it is active the entire time. As you get deeper into the stack the tenures get narrower and narrower.
Advanced Session Management and Data Mining Example: Graphical Analysis using Chart View with a Malloc Trace 3. Now use the slider on the bottom left of the window to adjust zoom. Play with this a bit. As you zoom in and out you'll see that there are multiple levels of unfolding complexity — much like a fractal.
4. We'll finish up with another good application of this graphical analysis. Click on the call stack button to reveal the call stack for this sample, as shown in Figure 6-36:
Figure 6-36 Graph View with Call-Stack Pane
Using the callstack view, notice that a bunch of XML parsing to build up some kind of NSPrintInfo is occurring. This is surprising since all we did was a clipboard copy.
Custom Configurations Up until now, you have been using the configuration menu in Shark’s main window (in Figure 7-1) to select from various built-in sampling methods. Each of these sampling methods is called a configuration (abbreviated as “config”), and Shark saves each configuration as a separate configuration file (which is also often called a “config”).
The Config Editor
The Configuration Editor lets you individually modify settings for any of Shark’s modules, which are called PlugIns. The properties available in each PlugIn differ depending on the nature of the work that particular PlugIn is designed to do. Shark uses three types of PlugIns:
● Data Source – These are responsible for collecting and/or generating session data.
● You can Rename any custom config in the list, but not built-in config files. A renamed config will be changed in the appropriate Configs folder immediately.
● You can Import any config that you may have saved on your system or a mounted fileserver. Imported configs are copied to your home $USER/Library/Application Support/Shark/Configs folder. You can also perform this function without invoking the Configuration Editor by using the Config Import...
Custom Configurations Simple Timed Samples and Counters Config Editor ● In Advanced mode, all of the available plugins are listed with a checkbox next to each indicating whether or not it is enabled in the current config. Figure 7-2 Config Editor The remainder of this chapter describes Shark’s wide variety of PlugIn editors that are controllable through the Configuration Editor .
Custom Configurations Simple Timed Samples and Counters Config Editor ● Sampling Tab – The controls on this tab (see Figure 7-3) determine when to start and stop recording samples. 1. Windowed Time Facility— If enabled, Shark will collect samples until you explicitly stop it. However, it will only store the last N samples, where N is the number entered into the sample history field (10,000 by default). This mode is also described in Windowed Time Facility (WTF) (page 125). 2.
column to select the performance counter mode (None, Counter, or Trigger). Only a small subset of possible counter options are available here. For more, you will have to use the Advanced settings, described in Hardware Counter Configuration (page 202).
Figure 7-4 Simple Timed Samples and Counters Data Source - Counter Settings
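The Windowed Time Facility option on the Sampling tab, which keeps only the last N samples while sampling runs indefinitely, behaves like a fixed-size ring buffer. The Python sketch below illustrates that idea only. It says nothing about how Shark stores samples internally, and the function name `record_windowed` is hypothetical.

```python
from collections import deque

def record_windowed(sample_stream, history=10_000):
    """Consume samples until the stream ends, keeping only the last `history`."""
    window = deque(maxlen=history)  # the oldest entries fall off automatically
    for sample in sample_stream:
        window.append(sample)
    return list(window)

# 25,000 hypothetical samples, numbered in arrival order.
kept = record_windowed(range(25_000), history=10_000)
print(len(kept), kept[0], kept[-1])
```

With the default history of 10,000 mentioned above, memory use stays bounded no matter how long sampling runs, and the retained samples are the ones leading up to the moment you stop.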
Malloc Data Source PlugIn Editor
The Malloc data source is used for the Malloc Trace config described in Malloc Trace (page 101). It is used for collecting a memory allocation profile from a particular executable. All of its configurable controls are contained in a single tab (see Figure 7-5), which modifies the timing of starting and stopping of memory allocation recording behavior:
Figure 7-5 Malloc Data Source - Sampling Settings
1.
Custom Configurations Static Analysis Data Source PlugIn Editor Static Analysis Data Source PlugIn Editor The Static Analysis data source is used by the Static Analysis default configuration, described in Static Analysis (page 107). It is used to search for potential performance issues by looking for problems that might crop up through some other (as yet untested) code path.
4. Processor Settings — Shark needs to know which model of processor is your target before it can examine code and find potential problems. Separate menus are provided for PowerPC and Intel processors because Shark can analyze for one model of each processor family simultaneously.
● PowerPC Model — Selects the PowerPC model to use when searching for and assigning problem severities.
Custom Configurations Sampler Data Source PlugIn Editor Sampler Data Source PlugIn Editor The Sampler data source provides the same functionality as the separate Sampler application and command-line tool. It is not used for any of the default configurations provided with Shark, as most of its functionality has been superseded by features of the much more sophisticated “Timed Samples and Counters” PlugIn.
System Trace Data Source PlugIn Editor
This data source collects data for the System Trace default configuration, described in System Tracing (page 63). All configurable features can be modified on a single tab (see Figure 7-9), which adjusts basic timing parameters:
Figure 7-9 System Trace Data Source - Settings
1. Sample Limit — The maximum number of samples to record.
Custom Configurations All Thread States Data Source PlugIn Editor All Thread States Data Source PlugIn Editor This data source collects data for the Time Profile (All Thread States) default configuration, described in Time Profile (All Thread States) (page 97), which samples the callstacks of all threads on the system simultaneously, whether they are running or blocked.
Custom Configurations Analysis and Viewer PlugIn Summary Analysis and Viewer PlugIn Summary All Data Source PlugIns include configuration editors. However, most of the analysis and viewer editors do not. While you generally will not need to spend much time worrying about these plugins during the configuration process, you will still need to enable or disable the correct PlugIns in your configuration in order to be able to see your results in the way you expect.
Custom Configurations Counter Spreadsheet Analysis PlugIn Editor ● System Trace: Timeline— This can only be used with the “System Trace” data source and analysis PlugIns. It displays the Timeline tab used by System Trace and described in Timeline View In-depth (page 77). ● System Trace: Raw— This can only be used with the “System Trace” data source and analysis PlugIns. It displays raw and unprocessed samples recorded by System Trace, and is normally not used by end users.
This view contains the following constituent parts:
1. PMC Summary Table – This table summarizes all the performance counters (PMCs) that are currently selected and enabled in the Timed Samples and Counters data source.
● PMC column — This is a short description of the counter and the device in which this performance monitor counter is found.
● Mode column — The counter’s current mode.
Shortcut | Description
Equation Terms
pNcY | Represents a summation of results from all processors on counter-Y. For example: pNc1 is the term that represents event count samples for every active processor’s counter #1, all added together. You could get the same effect with an equation of your own like (p1c1+p2c1+p3c1+p4c1), but this would only work correctly on a four processor system.
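The difference between the pNcY shortcut and a hand-written sum such as (p1c1+p2c1+p3c1+p4c1) can be illustrated with a small Python sketch. The data layout here (one row per sample, one dictionary of counter values per processor) and the function name `pNc` are assumptions made for this example, not Shark's internal format.

```python
def pNc(rows, counter):
    """Sum one counter across every processor present in each sample row."""
    return [sum(cpu[counter] for cpu in row) for row in rows]

# Hypothetical counts: two sample rows from a two-processor system.
# Each inner dict maps a counter number to that processor's event count.
rows = [
    [{1: 100, 2: 40}, {1: 50, 2: 10}],
    [{1: 120, 2: 60}, {1: 30, 2: 20}],
]
totals_c1 = pNc(rows, 1)
totals_c2 = pNc(rows, 2)
print(totals_c1, totals_c2)
```

Because the sum iterates over however many processors appear in a row, the same expression works on any processor count, which is the advantage the shortcut has over a fixed four-term equation.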
Custom Configurations Counter Spreadsheet Analysis PlugIn Editor Spreadsheet Configuration Example Because this editor is very flexible and powerful, an example can be helpful to illustrate how it might be used. Starting with a predefined config, we will add some performance counter events, and activate the Performance Counter Spreadsheet plugins. Last, we will add some shortcut equations to the analysis. Select the configuration named “Processor Bandwidth (Intel Core 2)” (Figure 7-12).
Custom Configurations Counter Spreadsheet Analysis PlugIn Editor 2. Next search the list by typing “INST” into the search field, as is shown in Figure 7-13. Select the “INST_RETIRED” entry and change the mode to “Counter” as with the first event. Figure 7-13 Enabling two performance counters Click on the Counter Spreadsheet line in the list of PlugIns to see the Performance Counter Spreadsheet . You will see the editor described previously in Using the Editor (page 196).
Custom Configurations Counter Spreadsheet Analysis PlugIn Editor Next, enter the equation pNc3/pNc2, as is shown in Figure 7-14. This will automatically calculate the number of cycles per completed instruction, or CPI, and allow you to display it alongside the “raw” counts of CPU cycles, instructions completed, and the bus bandwidths already calculated by the original “Processor Bandwidth” configuration.
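As a worked instance of the pNc3/pNc2 equation, the Python sketch below computes CPI from per-processor counts. It assumes, following the equation above, that counter 3 holds CPU cycles and counter 2 holds completed instructions. The numbers and the function name `cpi` are invented for illustration.

```python
def cpi(cycles_per_cpu, instructions_per_cpu):
    """Cycles per completed instruction, summed over all processors.

    Returns None when no instructions completed, to avoid dividing by zero.
    """
    total_instructions = sum(instructions_per_cpu)
    if total_instructions == 0:
        return None
    return sum(cycles_per_cpu) / total_instructions

# Hypothetical counts for one sample on a two-processor machine.
value = cpi(cycles_per_cpu=[3000, 1000], instructions_per_cpu=[1500, 500])
print(value)
```

Note that this divides the summed cycles by the summed instructions, as the pNc terms do; averaging each processor's own CPI would generally give a different number.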
Hardware Counter Configuration The different CPUs and North bridge chipsets available in Macintosh systems have widely varying performance monitoring capabilities. Because there is such a wide variety of counters and ways in which they can be combined to get useful information, the default configurations supplied with each version of Shark can only scratch the surface of the immense variety of possible configurations.
Hardware Counter Configuration Configuring the Sampling Technique: The Sampling Tab Once you have decided which counters you want to measure, and thought a bit about how you might want to control sampling, there are several configuration steps that must be performed using the controls on the Sampling tab, illustrated in Figure 8-1. Figure 8-1 Timed Samples & Counters Data Source - Advanced Sampling Tab 1.
2. Sample Limit — Sets the maximum number of samples to record. Specifying a maximum of N samples will result in at most N samples being taken on a uniprocessor machine or C*N samples taken on a multiprocessor system with C processors. This prevents the sample buffers from growing too large in case you happen to choose a combination of a large time limit and high sampling rate.
● chudRecordUserSample — A sample is recorded for every call to the CHUD.framework chudRecordUserSample() function. This is analogous to using signposts (Sign Posts (page 85)) with system trace.
Finally, once you have chosen a sampling mode, there is one additional variation that can be applied to the nominal sampling rate.
Hardware Counter Configuration Common Elements in Performance Counter Configuration Tabs Common Elements in Performance Counter Configuration Tabs All of the various performance counter configuration tabs have many unique elements, as the various processors and North bridges supported by MacOS X are significantly different from each other in many ways.
3. Sample Interval — This is the number of events that must occur before this PMC will trigger sampling. It is ignored unless this particular counter has been set to Trigger mode using the Enable Button. If a counter cannot support Trigger mode, then this box will not be present.
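The effect of the Sample Interval in Trigger mode can be pictured with the simplified Python model below: a sample is taken each time the counter has seen the configured number of events. This is a conceptual sketch only. The function name and the event stream are hypothetical, and real hardware behavior (for example, the delay between an event and the resulting sample) is not modeled.

```python
def triggered_samples(events, interval):
    """Return the events at which an every-`interval`-th-event trigger fires."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    samples = []
    count = 0
    for event in events:
        count += 1
        if count == interval:   # counter reached the sample interval
            samples.append(event)
            count = 0           # start counting toward the next trigger
    return samples

# Ten hypothetical events, labeled 1 through 10, with an interval of 4.
fired = triggered_samples(range(1, 11), interval=4)
print(fired)
```

A smaller interval yields more samples per event and therefore more overhead, while a larger interval yields fewer samples from the same run.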
Hardware Counter Configuration MacOS X OS-Level Counters Configuration You can mark processes with Shark’s Process Marker (Figure 8-3). The Process Marker can be opened via the Sampling Mark Process menu item. Shark disables this menu item for timer sampling, because the marked bit is ignored in that case.
Hardware Counter Configuration Intel CPU Performance Counter Configuration ● Scheduler Events: Events such as context switches, “thread ready” events, and stack handoffs ● Disk I/O Events: Disk reads and writes, with optional breakdown by type (data, disk control metadata, VM page-in, and VM page-out) and timing (synchronous and asynchronous) No unique controls are needed to control these counters; for information on all of the controls, see Counter Control (page 206).
Hardware Counter Configuration Intel CPU Performance Counter Configuration both count a similar but not identical list of events on the programmable processors. Full event listings are provided in Intel Core Performance Counter Event List (page 246) and Intel Core 2 Performance Counter Event List (page 252) for the Core and Core 2, respectively. Figure 8-5 shows the single configuration tab for the Intel Core 2 processor (the one for the Core is virtually identical, but lacks PMCs 3–5).
Hardware Counter Configuration PowerPC G3/G4/G4+ CPU Performance Counter Configuration bit-names in the mask list. Any bit in the list labeled *Reserved* should not be enabled. A brief summary of which bits are active for any particular event is included in the event lists in Intel Core Performance Counter Event List (page 246) and Intel Core 2 Performance Counter Event List (page 252).
Hardware Counter Configuration PowerPC G3/G4/G4+ CPU Performance Counter Configuration Figure 8-6 shows the single configuration tab for the G4+ processor (the one for the G3 and G4 is virtually identical, but lacks PMCs 5–6). For the most part, it uses the standard controls from Counter Control (page 206). Both user/supervisor event counting selection and “marked” threads and processes can be used, but all counters must use the same settings at once.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration Warning: If you leave branch folding disabled and exit Shark, branch folding will remain disabled. While this will not cause any correctness problems or crashes, it can adversely affect performance.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration In addition, several additional controls are provided. Most are multiplexer controls to switch the various event pre-filtering multiplexers, but the last two adjust features specific to these processors. These controls are numbered on Figure 8-7, and are: 1.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration 8. TB Select: This is the divider used for timebase events that cause processor exceptions, and selects from four different division ratios. More information is available in the PowerPC 970 Documentation. Figure 8-7 PowerPC 970 Processor Performance Counters Configuration Figure 8-8 shows the second tab used to configure the PowerPC 970 performance counters, the IMC tab.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration to count it. Please note that as long as an instruction resides in the L1 instruction cache, its match bit will remain unchanged. Hence, if the match condition for an instruction changes, then the L1 instruction cache should be flushed to force the lines to be reloaded and the “match” bits to be recalculated.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration Due to the very flexible and complex nature of these mechanisms, it is highly recommended that you read the pertinent sections of the PowerPC 970 Documentation, Sections 10.9 and 10.10 in the main user’s manual. Figure 8-8 PowerPC 970 IMC (IFU) Configuration Tab In the top part of the IMC pane are some general controls (black numbers on white): 1.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration 2. IOP Marking – This pre-filter will limit the type of internal PowerPC microinstructions (IOPs) that are matched or sampled. ● All IOPs – (default) Any IOP will pass ● µCode IOPs –Only IOPs resulting from microcode expansion will pass. ● One Per Inst – Only pass one IOP per PowerPC instruction.
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration 5. 6. 7. Major Opcode Bits— This allows you to select marked instructions on the basis of their six major opcode bits (bits 0–5 of each PowerPC instruction). There is a column for each bit, and you can individually control matching on the basis of each bit. ● X — (default) Ignore the bit ● 0 — Only match if this bit is a zero ● 1 — Only match if this bit is a one ● * (any) — Match any state.
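The per-bit matching on the six major opcode bits can be sketched in Python as follows. The pattern string format and both function names are hypothetical conveniences for this example. Only the X, 0, and 1 settings are modeled, since the excerpt above does not spell out how the * setting differs from X. The sketch relies on the PowerPC convention that bit 0 is the most significant bit, so the major opcode occupies the top six bits of the 32-bit instruction word.

```python
def major_opcode(instruction):
    """Extract bits 0-5 (the top six bits) of a 32-bit PowerPC instruction."""
    return (instruction >> 26) & 0x3F

def matches(instruction, pattern):
    """Check an instruction against a 6-character pattern for bits 0-5.

    Each character is 'X' (ignore the bit), '0', or '1'.
    """
    if len(pattern) != 6 or set(pattern) - set("X01"):
        raise ValueError("pattern must be six characters drawn from X, 0, 1")
    bits = format(major_opcode(instruction), "06b")
    return all(p == "X" or p == b for p, b in zip(pattern, bits))

ADDI = 0x38630001  # addi r3,r3,1  (major opcode 14, binary 001110)
LWZ = 0x80610000   # lwz r3,0(r1)  (major opcode 32, binary 100000)

print(matches(ADDI, "001110"), matches(LWZ, "001110"))
print(matches(ADDI, "1XXXXX"), matches(LWZ, "1XXXXX"))
```

Leaving most columns at X and fixing only a few bits selects a whole family of opcodes at once, as the second pair of checks shows.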
Hardware Counter Configuration PowerPC G5 (970) Performance Counter Configuration 2. 3. ● BSFL column — This lists the BSFL (Branch instruction, instruction that will be Split, First instruction in a dispatch group, and Last instruction in a dispatch group) bits associated with every instruction in the L1 cache. ● Classification column — This gives the name of every microinstruction class. IMRMASK bits— These bits mask off an instruction’s BSFL bits before matching.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration ● 1 — Match this bit position with 1. Normally only desired if the corresponding IMRMASK bit is 1, or if you want to intentionally match nothing.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration settings, we strongly suggest that you start using the “Simple” settings at first, as described in Simple Timed Samples and Counters Config Editor (page 186), at least until you learn which combinations of settings are best at producing useful information. Memory controller counters are available on PowerPC machines with UniNorth v1.5 and later memory controllers. Unfortunately, no equivalent exists for Intel processors.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration 4. FireWire/Enet — The dedicated FireWire and Ethernet I/O ports Figure 8-10 U1.5/U2 Configuration Tab U3 North Bridge This section describes how you can make custom configurations for Macs equipped with the U3 North bridge chipset used in some PowerPC G5 Macs.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration b. Write — Only store requests to memory can increment the counter. c. Read — Only load requests from memory can increment the counter. d. Any — All memory requests cause a counter increment, read or write. 2. Divider PopUp— Because U3 PMCs are 32-bit, you may overflow a counter if you are counting high-frequency events or are counting continuously for a long time.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration e. AGP — The AGP interface Figure 8-11 U3 Memory Configuration Tab Figure 8-12 shows the second of U3’s two configuration tabs, the API configuration panel. As with the memory tab, the first line of each PMC’s controls are just standard controls from Counter Control (page 206). Below this is a pair of custom controls that may be set independently for each of the different PMCs: 1.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration 2. Divider PopUp— This is the same as the Divider PopUp on the memory tab. Figure 8-12 U3 API Configuration Tab U4 (Kodiak) North Bridge This section describes how you can make custom configurations for Macs equipped with the U4 (Kodiak) North bridge chipset used in some PowerPC G5 Macs.
Hardware Counter Configuration PowerPC North Bridge Counter Configuration b. Write — Only store requests to memory can increment the counter. c. Read — Only load requests from memory can increment the counter. d. Any — All memory requests cause a counter increment, read or write. 2. Divider PopUp— Because Kodiak PMCs are 32-bit, you may overflow a counter if you are counting high-frequency events or are counting continuously for a long time.
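To get a feel for how quickly a 32-bit PMC can overflow, the Python sketch below estimates the time until a counter wraps at a given event rate. It assumes that a divider of D makes the counter increment once per D events, which is an interpretation rather than something this excerpt states. The event rate and the function name are likewise hypothetical.

```python
def seconds_until_overflow(events_per_second, divider=1, counter_bits=32):
    """Estimate how long a counter can run before it wraps.

    Assumption: with divider D, the counter increments once per D events.
    """
    increments_per_second = events_per_second / divider
    return (2 ** counter_bits) / increments_per_second

# A hypothetical event that occurs 100 million times per second.
undivided = seconds_until_overflow(100e6)
divided = seconds_until_overflow(100e6, divider=16)
print(round(undivided, 1), round(divided, 1))
```

Under this assumption the time to overflow scales linearly with the divider, at the cost of coarser counts that must be multiplied back by the divider when interpreting results.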
Hardware Counter Configuration PowerPC North Bridge Counter Configuration Figure 8-14 shows the second of U4’s two configuration tabs, the API configuration panel. As with the memory tab, the first line of each PMC’s controls are just standard controls from Counter Control (page 206). Below this are four custom controls that may be set independently for each of the different PMCs: 1.
Hardware Counter Configuration ARM11 CPU Performance Counter Configuration ARM11 CPU Performance Counter Configuration This section describes how you can make custom configurations for iOS devices with ARM11 processors. These devices have two identical, fully programmable performance counters plus one counter (#1) that can record cycle counts only. Full event listings are provided in ARM11 Performance Counter Event List (page 327). Figure 8-15 shows the configuration tab for the ARM11 processor.
Command Reference
Menu Reference
This section summarizes Shark’s commands, arranged by menu.
Shark
This menu contains the usual application-menu commands.
Command | Shortcut | Description | Where Described
About Shark... | | See revision information for Shark. |
Preferences... | Cmd-, | Edit some global Shark parameters. |
Hide Shark | Cmd-H | Hides Shark's window(s) and switches to the next-frontmost application. |
Hide Others | Opt-Cmd-H | Hides all other applications' windows. |
Command Reference Menu Reference Command Shortcut Description Close Cmd-W Close the frontmost window. If the frontmost window is the main control window, this will quit Shark. Close All Opt-Cmd-W Close all session windows. Save Cmd-S Save the frontmost session. Session Files (page 21) Save As... Shift-Cmd-S Save the frontmost session to a new location. Session Files (page 21) Attach a copy of the frontmost session to a new email in your default email program.
Command | Shortcut | Description
Redo | Shift-Cmd-Z | Redo the next action.
Cut | Cmd-X | Cut the selected text, placing it on the clipboard.
Copy | Cmd-C | Copy the selected text to the clipboard.
Paste | Cmd-V | Paste the contents of the clipboard.
Paste and Match Style | Opt-Shift-Cmd-V | Paste the contents of the clipboard using the same style as existing text.
Select All | Cmd-A | Select all of whatever was most recently selected (samples, text, etc.).
Find...
Command Reference Menu Reference Format All items in this menu are standard text processing commands. Since it is generally not possible to apply custom formats to most text within Shark, this menu is seldom used. Command Shortcut Description Show Fonts Cmd-T Show the Font palette. Bold Cmd-B Toggle the bold attribute of the selected text. Font Italic Toggle the italic attribute of the selected text. Underline Cmd-U Toggle the underline attribute of the selected text.
Command Reference Menu Reference Command Shortcut Description Where Described Show/Hide Mini Config Editor Shift-Cmd-C Show/Hide the mini config editor attached to the main control window. Mini Configuration Editors (page 19) Edit... Opt-Shift-Cmd-C Edit the current configuration. The Config Editor (page 184) New... Cmd-N Create a new configuration. The Config Editor (page 184) Export... Export the current configuration to a file. The Config Editor (page 184) Import...
Command Reference Menu Reference Command Shortcut Description Where Described Network/iPhone Profiling... Shift-Cmd-N Enable Network Profiling of other computers or iPhones, instead of local profiling, or share this computer for others to profile. Network/iPhone Profiling (page 138) Data Mining This menu, which disappears when data mining is not possible, provides access to Shark’s powerful symbol-level data mining capabilities. These are described in more detail in Data Mining (page 151).
Command Reference Alphabetical Reference

Window

Along with standard window control functionality, this menu contains the command to show or hide the Advanced Settings drawer on the right side of each session window, as described in Advanced Settings Drawer (page 23).

Command       Shortcut  Description
Minimize      Cmd-M     Minimize the frontmost window.
Minimize All            Minimize all Shark windows.
Zoom                    Zoom the frontmost window.
Command Reference Alphabetical Reference Command Shortcut Description Where Described Menu Batch Mode Shift-Cmd-B Toggles Batch mode, allowing the recording of multiple sessions before analysis begins. Automatically enabled when Unresponsive Applications is checked. Batch Mode (page 125) Sampling Charge Library to Callers Shift-Cmd-E Add the cost of all calls to the selected library(ies) to their caller(s), and hide the selected library(ies).
Command Reference Alphabetical Reference Command Shortcut Description Where Described Menu Flatten Library Shift-Cmd-F Just hide the selected library(ies), without adding time to the callers. Data Mining (page 151) Data Mining Focus Callers of Library Opt-Shift-Cmd-Y Hide all except the caller(s) of the selected library(ies). Data Mining (page 151) Data Mining Focus Callers of Symbol Opt-Cmd-Y Hide all except the caller(s) of the selected symbol(s).
Command Reference Alphabetical Reference Command Shortcut Mail This Session Description Where Described Menu Attach a copy of the frontmost session to a new email in your default email program. Session Files (page 21) File Merge... Opt-Cmd-M Merge two saved sessions. Merging Sessions (page 151) File Network/iPhone Profiling... Shift-Cmd-N Enable Network Profiling of other computers or iPhones, instead of local profiling, or share this computer for others to profile.
Command Reference Alphabetical Reference Command Shortcut Description Where Described Menu Remove Callstacks with Symbol Cmd-K Hide all callstacks which contain the selected symbol(s). Data Mining (page 151) Data Mining Restore All Cmd-R Show all symbols and libraries previously hidden by "Charge Symbol to Callers", "Charge Library to Callers" or "Remove Callstacks with Symbol", and restore original costs for all symbols.
Command Reference Alphabetical Reference Command Shortcut Description Where Described Menu Show/Hide Mini Config Editor Shift-Cmd-C Show/Hide the mini config editor attached to the main control window. Mini Configuration Editors (page 19) Config Symbolicate... Opt-Cmd-S Add symbols to the frontmost session from a symbol-rich copy of the target application on disk. Manual Session Symbolication (page 146) File Show all symbols and libraries previously hidden by any of the Focus commands.
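As a rough illustration of what the Remove Callstacks with Symbol and Restore All commands listed above do to a set of samples, the following sketch filters a small list of callstacks. It is not Shark's implementation: the sample format, the symbol names, and the function names are all hypothetical, chosen only to show the effect on total cost.

```python
from typing import List, Tuple

# Hypothetical sample format: (callstack from root to leaf, sample count).
Sample = Tuple[Tuple[str, ...], int]


def remove_callstacks_with_symbol(samples: List[Sample], symbol: str) -> List[Sample]:
    """Return only the callstacks that do not contain `symbol` anywhere."""
    return [(stack, count) for stack, count in samples if symbol not in stack]


def total_cost(samples: List[Sample]) -> int:
    """Sum the sample counts of all visible callstacks."""
    return sum(count for _, count in samples)


# Made-up data for illustration only.
SAMPLES: List[Sample] = [
    (("main", "render", "memcpy"), 40),
    (("main", "parse", "malloc"), 25),
    (("main", "render", "draw"), 35),
]

# Hiding every callstack that passes through memcpy.
filtered = remove_callstacks_with_symbol(SAMPLES, "memcpy")

# "Restore All" corresponds to going back to the untouched original list.
restored = SAMPLES
```

Because the filter builds a new list instead of modifying the original, restoring is just a matter of returning to the unfiltered samples.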
Miscellaneous Topics Code Analysis with the G5 (PPC970) Model Shark offers several features designed to help the programmer understand instruction execution behavior on the G5 (PPC970). From the Advanced Settings drawer’s Assembly Browser tab, you can set the Assembly Browser to display an estimate of G5 dispatch group formations, using the check box near item #1 in Figure B-1.
Miscellaneous Topics Supervisor Space Sampling Guidelines note that the data in the G5 Resource Utilization drawer is based on the currently selected instructions in the Code Table, or on the entire code sequence if nothing is selected. The user can specify a subset of instructions within the current Code Table, and the G5 Resource Utilization charts and tables will update dynamically.
Miscellaneous Topics Supervisor Space Sampling Guidelines any timer interrupts that occur in it are not serviced until interrupts are reenabled in ml_restore(). It is for this reason that all of the timer samples appear to come from the isync instruction at 0x96da8 (see Figure B-2). Figure B-2 Timer Sampling in the Kernel Retired Document | 2012-07-23 | Copyright © 2012 Apple Inc. All Rights Reserved.
Miscellaneous Topics Supervisor Space Sampling Guidelines A more accurate picture of the kernel behavior can be seen with event sampling (Figure B-3). This is because CPU event sampling reads the SIAR (sampled instruction address register) rather than the originating PC when the performance monitor interrupt is serviced. Whenever a CPU performance monitor interrupt (PMI) occurs, the SIAR is set to the currently executing PC (program counter).
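The difference between the two sampling methods can be illustrated with a toy model. The sketch below is not how Shark or the hardware is implemented; the trace format, the addresses, and the name attribute_samples are hypothetical. It assumes that a timer interrupt raised while interrupts are disabled is only serviced at the first later point where they are enabled again, whereas SIAR-based sampling records the address that was executing when the event occurred.

```python
from typing import List, Tuple

# Hypothetical trace: one (instruction address, interrupts enabled?) entry per time step.
Trace = List[Tuple[int, bool]]


def attribute_samples(trace: Trace, sample_times: List[int], use_siar: bool) -> List[int]:
    """Return the address each sample is attributed to.

    use_siar=True  models event sampling: the address at the moment of the event.
    use_siar=False models timer sampling: the address where the pending
    interrupt is finally serviced, after interrupts are re-enabled.
    """
    attributed = []
    for t in sample_times:
        if use_siar:
            attributed.append(trace[t][0])
            continue
        serviced = t
        # Walk forward until interrupts are enabled (or the trace ends).
        while serviced < len(trace) - 1 and not trace[serviced][1]:
            serviced += 1
        attributed.append(trace[serviced][0])
    return attributed


# Made-up trace: interrupts are disabled for 0x104..0x10C and re-enabled at 0x110.
TRACE: Trace = [
    (0x100, True),
    (0x104, False),
    (0x108, False),
    (0x10C, False),
    (0x110, True),
    (0x114, True),
]
SAMPLE_TIMES = [1, 2, 3, 5]

timer_view = attribute_samples(TRACE, SAMPLE_TIMES, use_siar=False)
siar_view = attribute_samples(TRACE, SAMPLE_TIMES, use_siar=True)
```

In this model every timer sample taken inside the interrupts-disabled region piles up on the instruction where interrupts come back on, while the SIAR-style samples stay spread across the region.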
Intel Core Performance Counter Event List

Intel’s Core processors have 2 performance counters per core. Both are programmable, and can count 111 (#1) or 112 (#2) different types of events. Most of the events are reserved and are not listed here. The available events can be modified by enabling Event-Mask bits in the PMC control registers. There are eight such bits in each programmable PMC.
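To make the relationship between an event number and its Event-Mask bits concrete, here is a minimal sketch that packs both into a single control value. It follows the layout of Intel's architectural event-select registers, where the event select occupies bits 0 to 7 and the unit mask occupies bits 8 to 15. Treating Shark's Event-Mask bit n as unit-mask bit n is an assumption made for this example, and the helper name encode_event is hypothetical. The sample values are the BUS_TRAN_MEM row of the event list (event 111, valid mask bits listed as 567, read here as bits 5, 6, and 7).

```python
from typing import Iterable


def encode_event(event_number: int,
                 mask_bits: Iterable[int],
                 valid_mask_bits: Iterable[int]) -> int:
    """Pack an event number and Event-Mask bits into one integer.

    Bits 0-7 hold the event number; bits 8-15 hold the mask.
    Raises ValueError if a requested mask bit is not valid for the event.
    """
    if not 0 <= event_number <= 0xFF:
        raise ValueError("event number must fit in 8 bits")
    requested = set(mask_bits)
    illegal = requested - set(valid_mask_bits)
    if illegal:
        raise ValueError(f"mask bits not valid for this event: {sorted(illegal)}")
    mask = 0
    for bit in requested:
        if not 0 <= bit <= 7:
            raise ValueError("mask bit must be in the range 0-7")
        mask |= 1 << bit
    return (mask << 8) | event_number


# BUS_TRAN_MEM: event 111 with all three of its valid mask bits enabled.
BUS_TRAN_MEM = encode_event(111, [5, 6, 7], valid_mask_bits=[5, 6, 7])
```

A real control register also carries enable and privilege-level bits, which this sketch leaves out.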
Performance Counter Event Name  Event Number  PMC Number  Valid Event-Mask Bits
BR_CALL_MISSP_EXEC              147           1,2         none
BR_CND_EXEC                     139           1,2         none
BR_CND_MISSP_EXEC               140           1,2         none
BR_IND_CALL_EXEC                148           1,2         none
BR_IND_EXEC                     141           1,2         none
BR_IND_MISSP_EXEC               142           1,2         none
BR_INST_DECODED                 224           1,2         none
BR_INST_EXEC                    136           1,2         none
BR_INST_RETIRED                 196           1,2         none
BR_MISS_PRED_RETIRED            197           1,2         none
BR_MISS_PRED_TAKEN_RET          202           1,2         none
BR_MISSP_EXEC
Performance Counter Event Name  Event Number  PMC Number  Valid Event-Mask Bits
BUS_TRAN_BRD                    101           1,2         6
BUS_TRAN_BURST                  110           1,2         567
BUS_TRAN_DEF                    109           1,2         567
BUS_TRAN_IFETCH                 104           1,2         6
BUS_TRAN_INVAL                  105           1,2         6
BUS_TRAN_MEM                    111           1,2         567
BUS_TRAN_PWR                    106           1,2         6
BUS_TRAN_RFO                    102           1,2         6
BUS_TRANS_IO                    108           1,2         6
BUS_TRANS_P                     107           1,2         6
BUS_TRANS_WB                    103           1,2         567
CPU_CLK_UNHALTED                60            1,2         01
CYCLES_DIV_BUSY                 20            1           none
CYCLES_INT_MAS
Performance Counter Event Name   Event Number  PMC Number  Valid Event-Mask Bits
EMON_ESP_UOPS                    215           1,2         none
EMON_FUSED_UOPS_RET              218           1,2         01
EMON_KNI_PREF_DISPATCHED         7             1,2         01
EMON_KNI_PREF_MISS               75            1,2         01
EMON_PREF_RQSTS_DN               248           1,2         none
EMON_PREF_RQSTS_UP               240           1,2         none
EMON_SIMD_INSTR_RETIRED          206           1,2         none
EMON_SSE_SSE2_COMP_INST_RETIRED  217           1,2         01
EMON_SSE_SSE2_INST_RETIRED       216           1,2         012
EMON_SYNCH_UOPS                  211           1,2         none
EMON_UNFUSION
Performance Counter Event Name      Event Number  PMC Number  Valid Event-Mask Bits
INST_RETIRED                        192           1,2         none
ITLB_MISS                           133           1,2         none
L1_CACHEABLE_DATA_READS             64            1,2         0123
L1_CACHEABLE_DATA_READS_AND_WRITES  68            1,2         0123
L1_CACHEABLE_DATA_WRITES            65            1,2         0123
L1_CACHEABLE_LOCK_READS             66            1,2         0123
L1_PREFETCH_REQUEST_MISSES          79            1,2         none
L2_ADS                              33            1,2         6
L2_DBUS_BUSY                        34            1,2         none
L2_DBUS_BUSY_RD                     35            1,2         6
L2_IFETCH                           40            1,2         0123456
L2_LD
Performance Counter Event Name  Event Number  PMC Number  Valid Event-Mask Bits
MMX_INSTR_EXEC                  176           1,2         none
MMX_INSTR_TYPE_EXEC             179           1,2         012345
MMX_SAT_INSTR_EXEC              177           1,2         none
MMX_SAT_INSTR_RET               207           1,2         none
MMX_UOPS_EXEC                   178           1,2         none
MUL                             18            2           none
PARTIAL_RAT_STALLS              210           1,2         none
RESOURCE_STALLS                 162           1,2         none
RET_SEG_RENAMES                 214           1,2         none
SB_DRAINS                       4             1,2         none
SEG_REG_RENAMES                 213           1,2         0123
SEG_RENAME_STALLS               212           1,2         0123
Intel Core 2 Performance Counter Event List Intel’s Core 2 processors have 5 performance counters per core. Two of these are fully programmable, and can count 116 (#1) or 115 (#2) different types of events. The other three counters are fixed, and can only count one type of event (for counter 3: INSTR_RETIRED.ANY, 4: CPU_CLK_UNHALTED.CORE, and 5: CPU_CLK_UNHALTED.REF). In addition, the available events can be modified by enabling any of the eight Event-Mask bits associated with each programmable counter.
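The split between fixed and programmable counters can be sketched as a small planning helper. This is only an illustration under simplifying assumptions: the function name plan_counters is hypothetical, and the sketch treats every non-fixed event as countable on either programmable counter, which ignores the per-counter differences (116 versus 115 event types) mentioned above. L2_LD and L2_ST are used purely as example event names from the event list.

```python
from typing import Dict, List

# Fixed counters and the single event each one counts, as described in the text.
FIXED_EVENT_COUNTER: Dict[str, int] = {
    "INSTR_RETIRED.ANY": 3,
    "CPU_CLK_UNHALTED.CORE": 4,
    "CPU_CLK_UNHALTED.REF": 5,
}
PROGRAMMABLE_COUNTERS = (1, 2)


def plan_counters(events: List[str]) -> Dict[int, str]:
    """Assign each requested event to a counter number.

    Fixed events go to their dedicated counter; every other event takes the
    first free programmable counter. Raises ValueError when nothing is free.
    """
    assignment: Dict[int, str] = {}
    for event in events:
        if event in FIXED_EVENT_COUNTER:
            candidates = (FIXED_EVENT_COUNTER[event],)
        else:
            candidates = PROGRAMMABLE_COUNTERS
        free = [c for c in candidates if c not in assignment]
        if not free:
            raise ValueError(f"no free counter for {event}")
        assignment[free[0]] = event
    return assignment


plan = plan_counters(
    ["INSTR_RETIRED.ANY", "L2_LD", "CPU_CLK_UNHALTED.CORE", "L2_ST"]
)
```

With this model, at most two non-fixed events can be measured in one configuration, alongside any of the three fixed events.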
Intel Core 2 Performance Counter Event List Performance Counter Event Name BR_CALL_EXEC 146 BR_CALL_MISSP_EXEC BR_CND_EXEC 147 139 BR_CND_MISSP_EXEC 140 BR_IND_CALL_EXEC 148 BR_IND_EXEC 141 BR_IND_MISSP_EXEC 142 BR_INST_DECODED 224 BR_INST_EXEC 136 BR_INST_RETIRED 196 BR_INST_RETIRED.
Intel Core 2 Performance Counter Event List Performance Counter Event Name BR_RET_BAC_MISSP_EXEC BR_RET_EXEC BR_RET_MISSP_EXEC BR_TKN_BUBBLE_1 BR_TKN_BUBBLE_2 BUS_BNR_DRV BUS_DATA_RCV BUS_DRDY_CLOCKS BUS_HIT_DRV BUS_HITM_DRV BUS_IO_WAIT Event Number 145 143 144 151 152 97 100 98 122 123 127 PMC Number Valid Event-Mask Bits 2 0 1 none 2 0 1 none 2 0 1 none 2 0 1 none 2 0 1 none 2 0 1 5 2 0 1 67 2 07 1 5 2 0 1 5 2 02357 1 5 2 023 1 67 2 0
Intel Core 2 Performance Counter Event List Performance Counter Event Name Event Number PMC Number Valid Event-Mask Bits BUS_LOCK_CLOCKS (Core and Bus Agents masks apply) 99 1 567 2 067 1 4567 2 0 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 1 567 2 07 BUS_REQ_OUTSTANDING BUS_TRAN_RFO BUS_TRANS_ANY BUS_TRANS_BRD BUS_TRANS_BURST BUS_TRANS_DEF BUS_TRANS_IFETCH BUS_TRANS_INVAL BUS_TRANS_IO BUS_
Intel Core 2 Performance Counter Event List Performance Counter Event Name Event Number PMC Number Valid Event-Mask Bits BUS_TRANS_PWR 106 1 567 2 07 1 567 2 0 1 67 2 07 1 0167 2 0 1 01 2 0 BUS_TRANS_WB 103 BUSQ_EMPTY 125 CMP_SNOOP 120 CPU_CLK_UNHALTED 60 CPU_CLK_UNHALTED.CORE 0 4 none CPU_CLK_UNHALTED.
Intel Core 2 Performance Counter Event List Performance Counter Event Name EXT_SNOOP Event Number 119 PMC Number Valid Event-Mask Bits 2 0 1 0135 2 0 FP_ASSIST 17 2 0 FP_COMP_OPS_EXE 16 1 none FP_MMX_TRANS 204 1 01 2 0 1 none 2 0 HW_INT_RCV 200 IDLE_DURING_DIV 24 1 none ILD_STALL 135 1 none 2 0 1 1 2 0 1 012 2 0 INST_QUEUE.FULL INST_RETIRED 131 192 INSTR_RETIRED.
Intel Core 2 Performance Counter Event List Performance Counter Event Name L1D_CACHE_LOCK 66 L1D_CACHE_ST 65 L1D_M_EVICT 71 L1D_M_REPL 70 L1D_PEND_MISS 72 L1D_PREFETCH.
Intel Core 2 Performance Counter Event List Performance Counter Event Name L2_IFETCH L2_LD L2_LINES_IN L2_LINES_OUT L2_LOCK L2_M_LINES_IN L2_M_LINES_OUT L2_NO_REQ L2_REJECT_BUSQ L2_RQSTS L2_ST LOAD_BLOCK Event Number 40 41 36 38 43 37 39 50 48 46 42 3 PMC Number Valid Event-Mask Bits 2 0 1 012367 2 0 1 01234567 2 067 1 4567 2 0 1 4567 2 0 1 01234567 2 02345 1 67 2 0 1 4567 2 067 1 67 2 0 1 01234567 2 0 1 01234567 2 0234567 1 012367 2
Intel Core 2 Performance Counter Event List Performance Counter Event Name LOAD_HIT_PRE Event Number 76 MACHINE_NUKES 195 MACRO_INSTS.
Intel Core 2 Performance Counter Event List Performance Counter Event Name SEG_REG_RENAMES 213 SEG_RENAME_STALLS 212 SEGMENT_REG_LOADS SIMD_ASSIST SIMD_INST_RETIRED 202 199 SIMD_INSTR_RETIRED 206 SIMD_SAT_INSTR_RETIRED SIMD_SAT_UOP_EXEC 207 177 SIMD_UOP_TYPE_EXEC SNOOP_STALL_DRV 6 205 SIMD_COMP_INST_RETIRED SIMD_UOPS_EXEC Event Number 179 176 126 PMC Number Valid Event-Mask Bits 2 0 1 0123 2 02345 1 0123 2 0 1 none 2 034567 1 none 2 02345 1 0123 2 023456 1
Intel Core 2 Performance Counter Event List Performance Counter Event Name SSE_PRE_EXEC SSE_PRE_MISS STORES BLOCKED THERMAL_TRIP UOPS_RETIRED X87_OPS_RETIRED Event Number 7 75 4 59 194 193 PMC Number Valid Event-Mask Bits 2 07 1 01 2 0235 1 01 2 0 1 013 2 0 1 67 2 0 1 0123 2 0 1 0123456 2 0
PPC 750 (G3) Performance Counter Event List The PowerPC 750 (G3) cores contain four independent performance counters, each of which can count 12–17 different types of events. Four commonly measured types of events (CPU cycles, instructions completed, timebase clock transitions, and instructions dispatched) can be counted on any counter, while other types of events can only be counted on a limited subset of the counters.
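Because some events are limited to a subset of the four counters, choosing a set of events to measure together amounts to a small assignment problem. The sketch below shows one way to solve it with backtracking. The function name assign_counters is hypothetical, and the EVENTS dictionary holds only a handful of rows copied from this processor's event list; it is not a complete table.

```python
from typing import Dict, List, Optional, Tuple

# Event name -> (PMC numbers that can count it, event number).
# A partial excerpt of the PPC 750 event list, for illustration.
EVENTS: Dict[str, Tuple[Tuple[int, ...], int]] = {
    "Instr Completed": ((1, 2, 3, 4), 2),
    "Instr Dispatched": ((1, 2, 3, 4), 4),
    "Instr Fetches": ((1, 2), 8),
    "Integer Instr": ((4,), 13),
    "L2 Hits": ((1, 2), 7),
    "L2 Snoop Castouts": ((3,), 12),
    "Mispredicted Branches": ((4,), 8),
}


def assign_counters(wanted: List[str]) -> Optional[Dict[int, Tuple[str, int]]]:
    """Map PMC number -> (event name, event number), or None if impossible."""
    # Place the most constrained events first to fail fast.
    order = sorted(wanted, key=lambda name: len(EVENTS[name][0]))
    assignment: Dict[int, Tuple[str, int]] = {}

    def place(index: int) -> bool:
        if index == len(order):
            return True
        name = order[index]
        pmcs, number = EVENTS[name]
        for pmc in pmcs:
            if pmc not in assignment:
                assignment[pmc] = (name, number)
                if place(index + 1):
                    return True
                del assignment[pmc]  # Undo and try the next counter.
        return False

    return assignment if place(0) else None


ok = assign_counters(["Instr Completed", "L2 Hits", "Integer Instr", "L2 Snoop Castouts"])
conflict = assign_counters(["Integer Instr", "Mispredicted Branches"])
```

The second request fails because both events can only be counted on PMC 4, so they cannot be measured in the same configuration.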
Performance Counter Event Name          PMC Number(s)  Event Number
Instr Bkpt Matches                      1, 2           9
Instr Completed                         1, 2, 3, 4     2
Instr Dispatched                        1, 2, 3, 4     4
Instr Fetches                           1, 2           8
Integer Instr                           4              13
ITLB Search Cycles                      1, 2           6
L2 Castouts                             4              5
L2 Hits                                 1, 2           7
L2 Snoop Castouts                       3              12
Marked/Unmarked Supervisor Transitions  4              9
Marked/Unmarked User Transitions        3              9
Mispredicted Branches                   4              8
Nothing                                 1, 2, 3, 4     0
Snoop Retries                           4              12
STWCX Instr
PPC 7400 (G4) Performance Counter Event List The PowerPC 7400 (G4) cores contain four independent performance counters, each of which can count 27–48 different types of events. Four commonly measured types of events (CPU cycles, instructions completed, timebase clock transitions, and instructions dispatched) can be counted on any counter, while other types of events can only be counted on a limited subset of the counters.
PPC 7400 (G4) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number Branch Unit LR/CTR Stall Cycles 3 15 Branch Unit Speculative Load Stall Cycles 1 37 Branch Unit Speculative Stall Cycles 1 13 4 14 Branches Taken 3 5 Bus Kill Transactions (Non-Retried) 4 12 Bus Multi Beat Write TAs 4 25 Bus Multi-Beat Read TAs 2 21 Bus Read TAs 2 42 Bus Retries 2 20 Bus Single Beat Read TAs 1 28 Bus Single Beat Write TAs 3 29 Bus Transactions (Non-Re
PPC 7400 (G4) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number dL1 Cycles 3 18 dL1 Hits 1 24 dL1 Load Hits 1 22 dL1 Load Misses 2 15 dL1 Miss Cycles > Threshold 1 11 dL1 Misses 2 17 dL1 Reloads 3 30 dL1 Snoop Hits 4 23 dL1 Snoop Interventions 4 16 4 26 dL1 Store Hits 1 23 dL1 Store Misses 2 16 dL1 Touch Hits 3 16 dL1 Touch Misses 4 15 dL1 Writes Hit Shared 1 30 dL2 Hits 1 33 dL2 Misses 2 26 DSS Instr 3 24 DSSALL
PPC 7400 (G4) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number EIEIO Instr 1 5 External Snoop Requests 1 42 Fall through Branches 2 5 Floating Point Instr 3 11 Full Cache Line Store Miss Merge 3 21 Hit Exclusive Interventions 3 28 Hit Interventions 1 45 Hit Modified Interventions 2 36 Hit Shared Interventions 4 24 iL1 Misses 1 32 iL1 Reloads 2 25 iL2 Hits 2 27 iL2 Misses 1 34 Instr Bkpt match 1 9 Instr Completed 1, 2, 3,
PPC 7400 (G4) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number L2 Allocations 1 36 L2 Castout Snoop Hits 1 44 L2 Sectors Castout 2 29 L2 Snoop Hits 3 27 L2 Snoop Interventions 3 13 L2 Tag Accesses 1 26 L2 Tag Lookup 1 25 L2 Tag Snoop Writes 4 17 L2 Tag Snoops 3 19 L2 Tag Writes 2 18 L2 Write Hit on Shared 2 23 L2SRAM Cycles 4 18 L2SRAM Read Cycles 2 19 L2SRAM Write Cycles 3 20 Load Instr 2 11 Mispredicted Branches 4 5
PPC 7400 (G4) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number SYNC Instr 4 11 System Register Unit Instr 2 14 TimeBase (Lower) 0->1 bit transitions 1, 2, 3, 4 3 TLBI Instr 2 40 TLBSYNC Instr 3 10 Unresolved Branches 1 12 User/Supervisor Switches 2 9 VCIU Wait Cycles 2 8 VSIU Instr 2 7 VTE Data Reload Table Hits 3 22 VTE dL1 Hits 4 20 VTE L1 Cache Misses 2 30 VTE Line Fetches 2 32 VTE Premature Cancels 2 33 VTE Refresh 1
PPC 7450 (G4+) Performance Counter Event List The PowerPC 7450 (G4+) cores contain six independent performance counters, each of which can count 20–94 different types of events.
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number AltiVec MFVSCR Instr Sync Cycles 1, 2 18 AltiVec MTVRSAVE Instr 1, 2, 4 13 AltiVec MTVSCR Instr 1, 2, 4 12 AltiVec Permute Instr 1, 2, 4 8 AltiVec Permute Stall Cycles 1, 2 14 AltiVec SFX Instr 1, 2, 4 10 AltiVec SFX Stall Cycles 1, 2 16 AltiVec VSCR[SAT] 0->1 1, 2 19 Branch Flushes 3 26 Branch Instr 1 34 Branch Link Stack Correct 2 59 Branch Link Stack Mispredicts 3
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number Bus Retry from L1 Retry 6 47 Bus Retry from Prev-Adjacent 6 48 Bus TA's for Reads 6 42 Bus TA's for Writes 6 43 Bus Writes not Retried 6 45 Cache-Inhibited Stores 1 52 Canceled iL1 Misses 3 19 Completed 0 Instr 1 32 Completed 1 Instr 2 33 Completed 2 Instr 3 8 Completed 3 Instr 4 14 Completion Queue > Threshold 2 32 Complex Integer Instr 1 33 CPU Cycles 1, 2, 3,
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number dL1 Load Hits 1 53 dL1 Load Miss Cycles 3 21 dL1 Load Misses 2 37 dL1 Load-Miss Cycles > Threshold 1 43 dL1 Misses 2, 3 23 dL1 Pushes 3 22 dL1 Reloads 2 49 dL1 Snoop Hit in COQ 1 48 dL1 Snoop Hit in COQ Retry 1 49 dL1 Snoop Hit Modified 1 44 dL1 Snoop Hits 1 50 dL1 Snoops 1, 2 22 dL1 Store Hits 1 55 dL1 Store Misses 2 39 dL1 Touch Hits 1 54 dL1 Touch Miss C
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number DTLB Misses 3 18 DTLB Search Cycles 4 23 DTLB Search Cycles > Threshold 1 40 DTQ Full 6 25 EIEIO Instr 1 35 Extern Perf Monitor 1, 2, 3, 4 7 External Interventions 6 22 External Pushes 6 23 External Snoop Retries 6 24 Fall Thru Branches 2 54 Fast BTIC Hits 3 30 Folded Branches 4 29 FP Denorm Result 1 94 FP Denormalize 1 67 FP Instr Dispatched to FPR Queue 2
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number FPSCR Renames 1/2 Busy 1 91 FPSCR Renames 1/4 Busy 1 90 FPSCR Renames 3/4 Busy 1 92 FPSCR Renames All Busy 1 93 FPU Instr 3 14 GPR Issue Queue > Threshold Cycles 4 16 GPR Issue Queue Stall Cycles 4 17 GPR Rename Buffer > Threshold 3 12 iL1 Accesses 1 41 iL1 Miss Cycles 2 36 iL1 Misses 1, 2 21 iL1 Reloads 2 48 iL2 Misses 5, 6 4 iL3 Misses 5, 6 5 6 33 Instr Bk
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number L1 External Interventions 6 19 L2 Castout Queue Full Cycles 6 10 L2 Castouts 6 8 L2 External Interventions 6 20 L2 Hits 5, 6 2 L2 Load Hits 5 8 L2 Misses 5 19 6 29 L2 Store Hits 5 9 L2 Touch Hits 5, 6 13 L2 Valid Requests 6 27 L3 Castout Queue Full Cycles 6 11 L3 Castouts 6 9 L3 External Interventions 6 21 L3 Hits 5, 6 3 6 31 5 10 6 35 5 20 6 30 6
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number L3 Touch Hits 5, 6 14 6 37 L3 Write Queue Full Cycles 6 17 LD/ST Alias vs. CSQ 1 73 LD/ST Alias vs. FSQ/WB0/WB1 1 72 LD/ST CSQ Forwards 1 86 LD/ST Indexed Alias Stalls 1 71 1st Spec Branch Buffer Correct 2 55 LD/ST LMQ Full Stalls 1 78 LD/ST LMQ Index Alias Stalls 1 84 LD/ST Load vs. STQ Alias Stalls 1 83 LD/ST Load-Hit Line vs. CSQ0 1 74 LD/ST Load-Miss Line vs.
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number LWARX Instr 2 29 MFSPR Instr 2 30 Mispredicted Branches 4 28 MTSPR Instr 1 36 Nothing 1, 2, 3, 4, 5, 6 0 Perf Monitor Interrupts 1, 2, 3, 4 5 Prefetch Engine Full 6 57 Prefetch Engine Requests 6 52 Prefetch Instr Fetch Collisions 6 55 Prefetch Load Collisions 6 53 Prefetch Load/Store/Instr Fetch Collisions 6 56 Prefetch Store Collisions 6 54 Refetch Serializations
PPC 7450 (G4+) Performance Counter Event List Performance Counter Event Name PMC Number(s) Event Number Store String/Multi Pieces 4 22 STSWI/STSWX/STMW Instr 2 27 STWCX Instr 3 15 Successful STWCX Instr 4 25 SYNC Instr 4 21 Taken Branches 3 25 TimeBase (Lower) 0->1 bit transitions 1, 2, 3, 4 3 TLBIE Instr 2 28 TLBIE Snoops 2 47 TLBSYNC Instr 4 20 Touch Alias 1 47 Unaligned Load Instr 1 87 Unaligned Load/Store Instr 1 89 Unaligned Store Instr 1 88 Unresolved Bran
Performance Counter Event Name  PMC Number(s)  Event Number
VTE2 Line Fetches               3              24
VTE3 Line Fetches               4              26
Write-Through Stores            1              51
PPC 970 (G5) Performance Counter Event List

The PowerPC 970 (G5) cores contain an extremely sophisticated and complex set of performance counters. Unlike with the other processors used in Macintoshes, you cannot simply choose a counter and a type of performance counter event for it to count, because these processors can count too many different possible events.
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [FPU] fp0 estimate + fp1 estimate 0 3 0: FPU 1: TTM0 [FPU] fp0 finished and produced a result 19 3, 4, 7, 8 0: FPU 1: TTM0 [FPU] fp0 finished and produced a result + fp1 finished and produced a result 0 4 0: FPU 1: TTM0 [FPU] fp0 fpscr 24 3, 4, 7, 8 0: FPU 3: TTM0 [FPU] fp0 fpscr + nothing 32 8 0: FPU 3: TTM0 [FPU] fp0 move, estimate 16
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [FPU] fp1 add, mult, sub, compare, fsel 23 1, 2, 5, 6 0: FPU 0: TTM0 [FPU] fp1 denorm operand 28 1, 2, 5, 6 0: FPU 2: TTM0 [FPU] fp1 divide 20 1, 2, 5, 6 0: FPU 0: TTM0 [FPU] fp1 estimate 22 3, 4, 7, 8 0: FPU 1: TTM0 [FPU] fp1 finished and produced a result 23 3, 4, 7, 8 0: FPU 1: TTM0 [FPU] fp1 move, estimate 20 3, 4, 7, 8 0: FPU 1:
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [GPS] Cacheable store queue full 30 1, 2, 5, 6 1: GPS 2: TTM1 [GPS] I=1 load operation completed on bus 16 3, 4, 7, 8 1: GPS 1: TTM1 [GPS] I=1 load operation completed on bus + Master L2 store transaction on bus was retried 0 8 1: GPS 1: TTM1 [GPS] I=1 store operation (before gathering) 22 1, 2, 5, 6 1: GPS 0: TTM1 [GPS] I=1 store operation c
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [GPS] L2 miss on store access (R, S, I) + I=1 store operation completed on bus 0 5 1: GPS 0: TTM1 [GPS] L2 miss, bus response is modified intervention 20 1, 2, 5, 6 1: GPS 0: TTM1 [GPS] L2 miss, bus response is shared intervention 21 1, 2, 5, 6 1: GPS 0: TTM1 [GPS] Load or store dispatch retries 26 1, 2, 5, 6 1: GPS 2: TTM1 [GPS] Load or stor
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [GPS] Master L2 read transaction on bus was retried 21 3, 4, 7, 8 1: GPS 1: TTM1 [GPS] Master L2 store transaction on bus was retried 20 3, 4, 7, 8 1: GPS 1: TTM1 [GPS] Master SYNC operation competed 22 3, 4, 7, 8 1: GPS 1: TTM1 [GPS] Master SYNC operation retried 23 3, 4, 7, 8 1: GPS 1: TTM1 [GPS] Snoop (external) 24 3, 4, 7, 8 1: GPS 3:
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [GPS] Snoop state machine dispatched 25 3, 4, 7, 8 1: GPS 3: TTM1 [GPS] Snoop state machine dispatched + Snoop caused cache transition from E to S 32 7 1: GPS 3: TTM1 [IDU] instruction queue fullness 16, 17, 18, 19, 20, 21, 22, 23, 16, 17, 18, 19, 20, 21, 22, 23, 16, 17, 18, 19, 20, 21, 22, 23, 16, 17, 18, 19, 20, 21, 22, 23 3, 4, 4, 4, 4, 4, 4, 4, 4
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [IFU] cycles i L1 write active + nothing 32 4 0: IFU 3: TTM0 [IFU] i cache data source 24, 25, 26, 27, 24, 25, 26, 27, 24, 25, 26, 27, 24, 25, 26, 27 1, 2, 2, 2, 2, 5, 5, 5, 5, 6, 6, 6, 6 0: IFU 2: TTM0 [IFU] i cache data source + instr prefetch installed in prefetch buffer 32 6 0: IFU 2: TTM0 [IFU] i cache data source + instruction prefetch reque
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [ISU] completion table full + cr mapper full 0 1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [ISU] duration MSR(EE) = 0 + MSR(EE)=0 and interrupt pending 32 4 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 0: TTM0 1: ISU
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [ISU] fx0 produced a result + fx1 produced a result 32 3 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number 1: ISU 2: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU 2: TTM0 1: ISU 2: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 3: TTM0 1: ISU 3: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 1: TTM0 1: ISU 1: TTM1 0: ISU 0: TTM0 1: ISU 0: TTM1 0: ISU 0: TTM0 [ISU] instructions dispa
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number 1: ISU 0: TTM1 [LSU0] d erat miss side 0 18 1, 2, 5, 6 0: LSU0 [LSU0] d erat miss side 0 + d erat miss side 1 0 6 0: LSU0 [LSU0] d erat miss side 1 22 1, 2, 5, 6 0: LSU0 [LSU0] d slb miss 21 1, 2, 5, 6 0: LSU0 [LSU0] d tlb miss 20 1, 2, 5, 6 0: LSU0 [LSU0] fl pt load side 0 24 3, 4, 7, 8 3: LSU0 [LSU0] fl pt load side 0 + fl pt load sid
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU0] marked flush from LRQ shl, lhl side 0 + marked flush from LRQ shl, lhl side 1 0 3 1: LSU0 [LSU0] marked flush from LRQ shl, lhl side 1 22 3, 4, 7, 8 1: LSU0 [LSU0] marked flush SRQ lhs side 0 19 3, 4, 7, 8 1: LSU0 [LSU0] marked flush SRQ lhs side 0 + marked flush SRQ lhs side 1 0 4 1: LSU0 [LSU0] marked flush SRQ lhs side 1 23 3, 4, 7, 8
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU0] marked L1 d cache store miss + larx executed 0 32 5 2: LSU0 [LSU0] marked L1 dcache load miss side 0 24 1, 2, 5, 6 2: LSU0 [LSU0] marked L1 dcache load miss side 0 + marked L1 dcache load miss side 1 32 1 2: LSU0 [LSU0] marked L1 dcache load miss side 1 28 1, 2, 5, 6 2: LSU0 [LSU0] marked stcx fail 30 1, 2, 5, 6 2: LSU0 [LSU0] new stre
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number 3: LSU1 6|7 [LSU1] flush from LRQ shl,lhl side 0 + flush from LRQ shl,lhl side 1 0 6 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] flush from LRQ shl,lhl side 1 22 1, 2, 5, 6 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] flush SRQ lhs side 0 19 1, 2, 5, 6 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LS
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] flush unaligned load side 0 16 1, 2, 5, 6 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] flush unaligned load side 0 + flush unaligned load side 1 0 1 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] flush unaligned load side 1 20 1, 2, 5, 6 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] flush u
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] flush unaligned store side 1 21 1, 2, 5, 6 3: LSU1 2|3 0: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 29 5 0: FPU 2: TTM0 0: ISU 0: IFU 0: VMX 1: IDU 2: TTM1 1: ISU 1: GPS 2: LSU0 3: LSU1 2|3 2: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 d cache store miss 19 3, 4, 7, 8 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 27 1,
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] L1 d cache store miss + L1 dcache entries invalidated from L2 0 4 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 d cache store miss + LSU ls1 reject 32 5 3: LSU1 2|3 2: LSU1 3: LSU1 2|7 [LSU1] L1 dcache entries invalidated from L2 23 3, 4, 7, 8 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 dcache load
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 dcache load side 0 16 3, 4, 7, 8 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 dcache load side 0 + L1 dcache load side 1 0 8 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 dcache load side 1 20 3, 4, 7, 8 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [L
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] L1 dcache store side 1 21 3, 4, 7, 8 3: LSU1 2|3 1: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 reload data source 24, 25, 26, 27, 24, 25, 26, 27, 24, 25, 26, 27, 24, 25, 26, 27 3, 4, 4, 4, 4, 7, 7, 7, 7, 8, 8, 8, 8 3: LSU1 2|3 3: LSU1 3: LSU1 2|7 3: LSU1 6|3 3: LSU1 6|7 [LSU1] L1 reload data source + L1 reload data valid 32 8 3: LSU1
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] L1 reload data source + Marked L1 reload data source valid 32 8 3: LSU1 2|7 3: LSU1 3: LSU1 6|7 [LSU1] L1 reload data source + Marked SRQ valid 32 3 3: LSU1 2|7 3: LSU1 3: LSU1 6|7 [LSU1] L1 reload data source + nothing 32 4 3: LSU1 2|7 3: LSU1 3: LSU1 6|7 [LSU1] L1 reload data valid 28 3, 4, 7, 8 3: LSU1 2|3 3: LSU1 3: LSU1 6|3 [LSU1]
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] LMQ slot 0 allocated 30 3, 4, 7, 8 3: LSU1 2|3 3: LSU1 3: LSU1 6|3 [LSU1] LMQ slot 0 valid 29 3, 4, 7, 8 3: LSU1 2|3 3: LSU1 3: LSU1 6|3 [LSU1] LRQ slot 0 allocated 30 1, 2, 5, 6 3: LSU1 2|3 2: LSU1 3: LSU1 2|7 [LSU1] LRQ slot 0 valid 26 1, 2, 5, 6 3: LSU1 2|3 2: LSU1 3: LSU1 2|7 [LSU1] LRQ slot 0 valid + LRQ slot 0 allocated 32 6 3
PPC 970 (G5) Performance Counter Event List Performance Counter Event Event PMC TTM Mux Byte Lane Name Number(s) Number(s) Number Number [LSU1] LS1 reject - reload cdf or tag updata collision 30 1, 2, 5, 6 3: LSU1 6|3 2: LSU1 3: LSU1 6|7 [LSU1] LSU ls1 reject 31 1, 2, 5, 6 3: LSU1 2|3 2: LSU1 3: LSU1 2|7 [LSU1] Marked L1 reload data source valid 28 3, 4, 7, 8 3: LSU1 2|7 3: LSU1 3: LSU1 6|7 [LSU1] Marked SRQ valid 30 3, 4, 7, 8 3: LSU1 2|7 3: LSU1 3: LSU1 6|7 [LSU1] SRQ reject
[LSU1] SRQ store forwarding side 0 24 1, 2, 5, 6 3: LSU1 2|3 2: LSU1 3: LSU1 2|7
[LSU1] SRQ store forwarding side 0 + SRQ store forwarding side 1 32 1 3: LSU1 2|3 2: LSU1 3: LSU1 2|7
[LSU1] SRQ store forwarding side 1 28 1, 2, 5, 6 3: LSU1 2|3 2: LSU1 3: LSU1 2|7
[SPECA] reserved 11 5
[SPECB] reserved 11 7
[SPECC] reserved 12 5
[SPECD] re
[VMX] forwarding occurred from perm or alu or load 29 1, 2, 6 0: VMX 2: TTM0
[VMX] Generic forward 17 3, 4, 7, 8 0: VMX 1: TTM0
[VMX] Generic forward + finish contention cycle 0 7 0: VMX 1: TTM0
[VMX] instruction finish with IMR 28 1, 2, 5, 6 0: VMX 2: TTM0
[VMX] issue valid 30 1, 2, 5, 6 0: VMX 2: TTM0
[VMX] Perm issue marked inst 19
CPU Marked Instruction finish        5  4
Dispatch Successes                   1  5
dL2 Hit (dL1 reload from L2)         7  1  3: LSU1
dL2 Miss (dL1 reload from Memory)    7  3  3: LSU1
External Interrupt                   2  8
FPU marked instruction finish        4  7
FXU Marked Instr finish              4  6
FXU0 busy and FXU1 busy              2  6
FXU0 busy and FXU1 idle              2  7
FXU0 Idle and FXU1 Busy              2  4
FXU0 idle and FXU1
Instr Src Encode 0 (Lane 2 not set to IFU) 6 1 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU0 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU1 0: ISU 0: IFU 0: VMX
Instr Src Encode 1 (Lane 2 not set to IFU) 6 2 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU1 0: ISU 0: IFU 0: VMX
Instr Src Encode 2 (Lane 2 not set to IFU) 6 3 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU0 0: ISU 0: IFU 0: VMX 0: FPU 0: ISU 0: IFU 0: VMX
Retired Document | 2012-07-23 | Copyright © 2012 Apple Inc. All Rights Reserved.
Instr Src Encode 3 (Lane 2 not set to IFU) 6 4 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU0 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU1 0: ISU 0: IFU 0: VMX
Instr Src Encode 4 (Lane 2 not set to IFU) 6 5 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU1 0: ISU 0: IFU 0: VMX
Instr Src Encode 5 (Lane 2 not set to IFU) 6 6 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU0 0: ISU 0: IFU 0: VMX 0: FPU 0: ISU 0: IFU 0: VMX
Instr Src Encode 6 (Lane 2 not set to IFU) 6 7 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU0 0: ISU 0: IFU 0: VMX 0: FPU 2: LSU1 0: ISU 0: IFU 0: VMX
Instr Src Encode 7 (Lane 2 not set to IFU) 6 8 0: FPU 2: TTM0 0: ISU 0: VMX 0: FPU 2: TTM1 0: ISU 0: IFU 0: VMX 0: FPU
0: ISU 0: IFU 0: VMX 0: FPU 0: ISU 0: IFU 0: VMX
Instructions Completed (ppc,io,ld/st)    1  1
LSU empty (LMQ and SRQ empty)            2  2, 3
LSU Marked Instr finish                  4  8
Marked Group complete                    4  4
Marked Group Complete Timeout            5  5
Marked group dispatch                    2  1
Marked Group issued                      5  6
Marked Instr finish in any unit          5  7
Marked store complete                    3  1
Marked St
Overflow from PMC3               10  4
Overflow from PMC4               10  5
Overflow from PMC5               10  6
Overflow from PMC6               10  7
Overflow from PMC7               10  8
Overflow from PMC8               10  1
Run Cycles                       5   1
SRQ empty                        3   4
Stop Completion                  1   3
Threshold Timeout                3   2
Timebase Event                   5   8
VMX Marked Instruction finish    5   3
Work Held                        1   2
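The overflow entries in the list above pair each counter with a neighbor: event number 10 on one PMC counts the overflows of another PMC. The Python sketch below records only the pairings visible in this part of the list and shows how a counter value and its overflow count might be combined into one wider count. The helper names and the default wrap value are assumptions made for illustration; they are not taken from this guide.

```python
# Pairings read from the event list: with event number 10 selected, the PMC
# given as the key counts overflows of the PMC given as the value.
OVERFLOW_EVENT_NUMBER = 10
OVERFLOW_SOURCE = {4: 3, 5: 4, 6: 5, 7: 6, 8: 7, 1: 8}


def overflow_source(pmc: int) -> int:
    """Return the PMC whose overflows `pmc` can count with event 10."""
    try:
        return OVERFLOW_SOURCE[pmc]
    except KeyError:
        raise ValueError(f"no overflow pairing listed here for PMC{pmc}") from None


def combined_count(low: int, overflows: int, wrap: int = 2**32) -> int:
    """Combine a counter value with the overflow count of a paired counter.

    `wrap` is the count at which the low counter is assumed to overflow.
    The default is a hypothetical value; check the processor documentation.
    """
    if not 0 <= low < wrap:
        raise ValueError("low value outside the assumed counter range")
    return overflows * wrap + low
```

For example, `overflow_source(4)` returns 3, matching the "Overflow from PMC3" entry that lists PMC 4.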
UniNorth-2 (U1.5/2) Performance Counter Event List
The U1.5 and U2 North bridge chipsets contain four independent counters, each of which can count any one of 55 different types of events. The table lists the events alphabetically by name, followed by the Event Number that must be selected to activate counting of a particular event. Some event names end with a term in square brackets.
Performance Counter Event Name    Event Number
Burst Read Reqs [Bus]             72
Burst Write Reqs [Bus]            73
Burst Xacts [Bus]                 65
Cache Inhib.
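Each complete entry in this table consists of an event name, an optional bracketed suffix, and an event number, so a row can be split mechanically. The following Python sketch shows one way to do that; the function name and the regular expression are illustrative assumptions, not part of Shark.

```python
import re

# name, optional "[suffix]", then a decimal event number at the end of the row
ROW_PATTERN = re.compile(
    r"^(?P<name>.+?)(?:\s+\[(?P<suffix>[^\]]+)\])?\s+(?P<number>\d+)$"
)


def parse_event_row(row: str):
    """Split one table row into (name, suffix or None, event number)."""
    match = ROW_PATTERN.match(row.strip())
    if match is None:
        raise ValueError(f"not a complete table row: {row!r}")
    return match["name"], match["suffix"], int(match["number"])


# Rows copied from the table above.
for row in ("Burst Read Reqs [Bus] 72", "Burst Xacts [Bus] 65"):
    print(parse_event_row(row))
```

A row with no event number, such as a truncated entry, raises `ValueError` rather than yielding a guessed value.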
UniNorth-3 (U3) Performance Counter Event List
The U3 North bridge chipsets contain two distinct sets of counters. The first set of counters counts memory events, in a manner similar to the counters for the other North bridge chips. Six independent memory counters are present, each of which can count any one of five different general types of events.
API Performance Counter Event Name    Event Number
API Cycles                            0x00
Nothing                               0xFF
Queue Reservations                    0x03
Queue Transactions                    0x01
Retries                               0x05
Transaction Size (bytes)              0x04

API Event Source Name                 Source Number
API0 Mem MI Target Rq Queue           0x1A
API0 Mem Rd Target Rq Queue           0x16
API0 Mem Wt Target Rq Queue           0x18
API1 Mem MI Target Rq Queue           0x1B
API1 Mem Rd Target Rq Queue           0x17
API1 Mem Wt Target Rq Queue           0x19
Command Slot                          0x01
Ht Coh Rd Rq Q
Master Tag: API0                      0x200
Master Tag: API0 and API1             0x400
Master Tag: API1                      0x300
Master Tag: HT                        0xA00
Master Tag: PCI                       0x900
Master Tag: VSP                       0x800
Master Tag: VSP, PCI, and HT          0xC00
Mem Rd Data Queue                     0x70
Mem Rd Target Rq Queue                0x05
Mem Response Queue                    0x0C
Mem Wt Data Queue                     0x20
Mem Wt Target Rq Queue                0x04
Pci Coh Rd Rq Queue                   0x11
Pci Coh Wt Rq Queue                   0x12
Pci Rd Data Queue                     0x80
Pci Rd Target Rq Qu
Synchronization Queue                 0x00
Vsp Coh Rd Rq Queue                   0x15
Vsp Rd Data Queue                     0xA0
Vsp Response Queue                    0x0F
Vsp Target Rq Queue                   0x0A
Vsp Wt Data Queue                     0x50
Write Data Buffer                     0x100
Write Data Buffer API0 MI             0x130
Write Data Buffer API0 Wr             0x110
Write Data Buffer API1 MI             0x140
Write Data Buffer API1 Wr             0x120
Kodiak (U4) Performance Counter Event List
The U4/Kodiak North bridge chipsets contain two distinct sets of counters. The first set of counters counts memory events, in a manner similar to the counters for the other North bridge chips. Six independent memory counters are present, each of which can count any one of 22 different general types of events.
Memory Performance Counter Event Name                               Event Number
Issued transfer size (accumulate events, no filters)                83
Non-coherent read request [RT #24253] (count events, no filters)    97
Non-coherent request [RT #24252] (count events, no filters)         96
Nothing                                                             0
Precharge commands -- close page (filtered and counted)             6
Read reorder queue empty (count events, no filters)                 17
Read requests (filtered and counted)                                1
Request queue empty [RT #23441] (count events, no fi
API Event Source Name          Source Number
API Wt Data Buffer             0x28
Bypass Queue                   0x10
Command Slot                   0x01
GCR Rd Data Queue              0x27
GCR Response Queue             0x0B
GCR Target Rq Queue            0x08
GCR Wt Data Queue              0x23
Ht Coh Rd Pending Queue        0x14
Ht Coh Rd Rq Queue             0x0E
Ht Coh Wt Pending Queue        0x15
Ht Coh Wt Rq Queue             0x0F
Ht Rd Data Queue               0x26
Ht Rd Target Rq Queue          0x07
Ht Response Queue              0x0A
Ht Wt Data Queue               0x22
Ht Wt Target Rq Queue          0x06
Intervention Buf
PCIE Coh Wt Rq Queue           0x0D
PCIE Rd Data Queue             0x25
PCIE Rd Target Rq Queue        0x05
PCIE Response Queue            0x09
PCIE Wt Data Queue             0x21
PCIE Wt Target Rq Queue        0x04
Power Management               0x3F
Snoop Slots                    0x02
Synchronization Queue          0x00
ARM11 Performance Counter Event List
The ARM11 cores used in iOS devices contain three independent performance counters. The first counter can count only cycles, while the other two, which are identical, can each count 25 different types of events. The table below lists each Event Name, the counter (PMC) number(s) of the counters that can count the event, and each event’s number. For more information on how to configure these counters, see ARM11 CPU Performance Counter Configuration (page 229).
Performance Counter Event Name                                                 PMC Number(s)  Event Number
Main TLB miss                                                                  2-3            15
Procedure call instruction executed                                            2-3            35
Procedure return instruction executed, return address predicted incorrectly    2-3            38
Procedure return instruction executed, return address predicted                2-3            37
Procedure return instruction executed                                          2-3            36
Software changed the PC                                                        2-3            13
Stall, data dependency                                                         2-3            2
Stall, instruction buffer cannot deliver                                       2-3            1
Stall, LSU request
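The counter layout described for the ARM11 can be captured in a small lookup that rejects invalid counter and event combinations. The Python sketch below assumes that the cycle-only counter is PMC 1, since the table lists PMCs 2-3 for every event, and it contains only the events whose numbers are visible in this excerpt. The function name is hypothetical.

```python
# Counter layout as described in the text: one counter counts cycles only
# (assumed here to be PMC 1), and PMCs 2 and 3 are identical event counters.
CYCLE_COUNTER = 1
EVENT_COUNTERS = (2, 3)

# Event numbers copied from the complete rows of the table above.
EVENT_NUMBERS = {
    "Main TLB miss": 15,
    "Procedure call instruction executed": 35,
    "Procedure return instruction executed, return address predicted incorrectly": 38,
    "Procedure return instruction executed, return address predicted": 37,
    "Procedure return instruction executed": 36,
    "Software changed the PC": 13,
    "Stall, data dependency": 2,
    "Stall, instruction buffer cannot deliver": 1,
}


def event_number_for(pmc: int, event_name: str) -> int:
    """Return the event number to select on `pmc`, or raise ValueError."""
    if pmc == CYCLE_COUNTER:
        raise ValueError("PMC 1 counts cycles only")
    if pmc not in EVENT_COUNTERS:
        raise ValueError(f"unknown counter: PMC {pmc}")
    try:
        return EVENT_NUMBERS[event_name]
    except KeyError:
        raise ValueError(f"event not in this excerpt: {event_name!r}") from None
```

Because PMCs 2 and 3 are identical, the same event number applies to either one; only the choice of counter differs.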
Document Revision History
This table describes the changes to Shark User Guide.

Date        Notes
2008-04-14  TBD
2007-10-31  New document that explains how to analyze code performance by profiling the system.
Apple Inc. Copyright © 2012 Apple Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, mechanical, electronic, photocopying, recording, or otherwise, without prior written permission of Apple Inc.