User Guide

84 General-Purpose Programming

AMD64 Technology 24592—Rev. 3.15—November 2009

3.7.7 System Calls

A disadvantage of far CALLs and far RETs is that they use segment-based protection and privilege-

checking. This involves significant overhead associated with loading new segment selectors and their

corresponding descriptors into the segment registers. The overhead includes not only the time required

to load the descriptors from memory but also the time required to perform the privilege, type, and limit

checks. Privilege-changing CALLs to the operating system are slowed further by the control transfer

through a gate descriptor.

SYSCALL and SYSRET. SYSCALL and SYSRET are low-latency system-call and system-return

control-transfer instructions. They can be used in protected mode. These instructions eliminate

segment-based privilege checking by using predetermined target and return code segments and stack

segments. The operating system sets up and maintains the predetermined segments using special

registers within the processor, so the segment descriptors do not need to be fetched from memory when

the instructions are used. The simplifications made to privilege checking allow SYSCALL and

SYSRET to complete in far fewer processor clock cycles than CALL and RET.

SYSRET can only be used to return from CPL = 0 procedures and is not available to application

software. SYSCALL can be used by applications to call operating system service routines running at

CPL = 0. The SYSCALL instruction does not take operands. Linkage conventions are initialized and

maintained by the operating system. “System-Management Instructions” in Volume 2 contains

detailed information on the operation of SYSCALL and SYSRET.
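
As an illustration of how application software might invoke an operating-system service with SYSCALL, the following C sketch wraps the instruction in GCC-style inline assembly. The linkage shown (service number in RAX, result returned in RAX) is an assumption borrowed from the Linux x86-64 convention, not something this section defines, and raw_syscall0 and current_pid are hypothetical names. In 64-bit mode SYSCALL overwrites RCX and R11 with the return address and the saved rFLAGS, so both are declared as clobbered. On other platforms the sketch simply falls back to the C library's getpid().

```c
#include <unistd.h>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/syscall.h>

/* Issue a zero-argument system call with the SYSCALL instruction.
 * Assumed linkage (Linux x86-64, not defined by this manual):
 * service number passed in RAX, result returned in RAX. */
long raw_syscall0(long number)
{
    long result;
    __asm__ volatile (
        "syscall"
        : "=a"(result)              /* result comes back in RAX           */
        : "a"(number)               /* service number goes in via RAX     */
        : "rcx", "r11", "memory"    /* SYSCALL overwrites RCX and R11     */
    );
    return result;
}

/* Ask the operating system for the caller's process ID. */
long current_pid(void)
{
    return raw_syscall0(SYS_getpid);
}
#else
/* Fallback for other platforms: let the C library make the call. */
long current_pid(void)
{
    return (long)getpid();
}
#endif
```

Under these assumptions, current_pid() should return the same value as the library's getpid(). Note that the instruction itself takes no operands; everything the operating system needs is passed in registers according to the linkage convention it has established.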

SYSENTER and SYSEXIT. The SYSENTER and SYSEXIT instructions provide similar capabilities

to SYSCALL and SYSRET. However, these instructions can be used only in legacy mode and are not

supported in long mode. SYSCALL and SYSRET are the preferred instructions for calling privileged

software. See “System-Management Instructions” in Volume 2 for further information on SYSENTER

and SYSEXIT.

3.7.8 General Considerations for Branching

Branching causes delays that depend on the branch-prediction capabilities of the hardware

implementation. Sequential flow avoids the delays caused by branching but is still exposed to delays

caused by cache misses, memory bus bandwidth, and other factors.

In general, branching code should be replaced with sequential code whenever practical. This is

especially important when the branch body is small (resulting in frequent branching) and when branches

depend on random data (resulting in frequent mispredictions of the branch target). In certain hardware

implementations, far branches (as opposed to near branches) may not be predictable by the hardware,

and recursive functions (those that call themselves) may overflow a return-address stack.
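
The replacement of branching code with sequential code can be sketched in C as follows. The function names are hypothetical and the code is a minimal illustration, not a recommended implementation. The first function takes a data-dependent branch on every element; the second adds the 0-or-1 result of the comparison instead; the third selects between two values with a mask. A compiler may translate the sequential forms into instructions such as SETcc or CMOVcc, but this is not guaranteed, and an optimizing compiler may also remove the branch from the first form on its own, so the generated code should be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching form: one data-dependent conditional branch per element. */
size_t count_below_branchy(const uint32_t *data, size_t n, uint32_t limit)
{
    size_t count = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (data[i] < limit)
            count++;
    }
    return count;
}

/* Sequential form: the comparison yields 0 or 1, which is added directly,
 * so the loop body contains no data-dependent branch in the source. */
size_t count_below_sequential(const uint32_t *data, size_t n, uint32_t limit)
{
    size_t count = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        count += (size_t)(data[i] < limit);
    return count;
}

/* Return a when cond is nonzero, otherwise b, without an if statement.
 * mask is all ones when cond is nonzero and all zeros otherwise. */
uint32_t select_u32(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

Both counting functions return the same result for the same input. Any difference in speed depends on the data and on the hardware implementation, and the sequential form tends to help most when the comparison outcome is close to random.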

All calls and returns should be paired for optimal performance. Hardware implementations that

include a return-address stack can lose stack synchronization if calls and returns are not paired.
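
The remark above about recursive functions and the return-address stack can be illustrated with a short C sketch. The type and function names are hypothetical. The recursive form makes one call per list node and then unwinds through an equal number of returns; on an implementation whose return-address stack is shallower than the recursion depth, some of those returns may be mispredicted. The iterative form makes no calls at all. A compiler may or may not convert one form into the other, so this is a sketch of the idea rather than a statement about generated code.

```c
#include <stddef.h>

/* Minimal singly linked list node, for illustration only. */
struct node {
    struct node *next;
};

/* Recursive form: call depth equals the length of the list. */
size_t length_recursive(const struct node *head)
{
    if (head == NULL)
        return 0;
    return 1 + length_recursive(head->next);
}

/* Iterative form: same result, with no calls and no returns in the loop. */
size_t length_iterative(const struct node *head)
{
    size_t n = 0;
    while (head != NULL) {
        n++;
        head = head->next;
    }
    return n;
}
```

Both functions return the same length for any properly terminated list. The iterative form also avoids growing the program stack with the depth of the list.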