User Guide

84 General-Purpose Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
3.7.7 System Calls
A disadvantage of far CALLs and far RETs is that they use segment-based protection and privilege
checking. Loading new segment selectors and their corresponding descriptors into the segment
registers carries significant overhead. That overhead includes not only the time required to load the
descriptors from memory but also the time required to perform the privilege, type, and limit checks.
Privilege-changing CALLs to the operating system are slowed further by the control transfer through
a gate descriptor.
SYSCALL and SYSRET. SYSCALL and SYSRET are low-latency system-call and system-return
control-transfer instructions. They can be used in protected mode. These instructions eliminate
segment-based privilege checking by using predetermined target and return code segments and stack
segments. The operating system sets up and maintains the predetermined segments using special
registers within the processor, so the segment descriptors do not need to be fetched from memory when
the instructions are used. The simplifications made to privilege checking allow SYSCALL and
SYSRET to complete in far fewer processor clock cycles than CALL and RET.
SYSRET can only be used to return from CPL = 0 procedures and is not available to application
software. SYSCALL can be used by applications to call operating system service routines running at
CPL = 0. The SYSCALL instruction does not take operands. Linkage conventions are initialized and
maintained by the operating system. “System-Management Instructions” in Volume 2 contains
detailed information on the operation of SYSCALL and SYSRET.
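Because SYSCALL takes no operands, the service number and its arguments travel in registers chosen by the operating system's linkage convention. The following is a minimal sketch of that idea, not a definitive implementation. It assumes the Linux x86-64 convention (service number in RAX, arguments in RDI, RSI, and RDX, result in RAX) and GCC-style inline assembly, neither of which is defined by this manual. The function name demo_write is hypothetical, and service number 1 is the Linux "write" call. On other platforms the sketch falls back to the C library wrapper.

```c
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to file descriptor fd.
 * On Linux x86-64 it issues SYSCALL directly; elsewhere it uses write(). */
long demo_write(int fd, const void *buf, size_t len)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* Assumed Linux convention: RAX = service number (1 = write),
     * RDI, RSI, RDX = arguments, RAX = result (negative on error). */
    __asm__ volatile (
        "syscall"
        : "=a"(ret)
        : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
        /* SYSCALL saves the return address in RCX and the flags in R11,
         * so both registers are listed as clobbered. */
        : "rcx", "r11", "memory");
    return ret;
#else
    /* Portable fallback: let the C library perform the system call. */
    return (long)write(fd, buf, len);
#endif
}
```

Application code normally calls the operating system's library wrappers instead of issuing SYSCALL itself; the sketch only shows where the operands go when the instruction carries none.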
SYSENTER and SYSEXIT. The SYSENTER and SYSEXIT instructions provide similar capabilities
to SYSCALL and SYSRET. However, these instructions can be used only in legacy mode and are not
supported in long mode. SYSCALL and SYSRET are the preferred instructions for calling privileged
software. See “System-Management Instructions” in Volume 2 for further information on SYSENTER
and SYSEXIT.
3.7.8 General Considerations for Branching
Branching causes delays that depend on the branch-prediction capabilities of the hardware
implementation. Sequential flow avoids the delays caused by branching but is still exposed to delays
caused by cache misses, memory bus bandwidth, and other factors.
In general, branching code should be replaced with sequential code whenever practical. This is
especially important when the branch body is small (resulting in frequent branching) and when branches
depend on random data (resulting in frequent mispredictions of the branch target). In certain hardware
implementations, far branches (as opposed to near branches) may not be predictable by the hardware,
and recursive functions (those that call themselves) may overflow a return-address stack.
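As an illustration of replacing a small, data-dependent branch with sequential code, the sketch below sums the array elements that are at least a given threshold in two ways. The function names and the masking technique are illustrative assumptions, not taken from this manual. A compiler may already turn either form into conditional-move or vector code, so any benefit should be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching form: one conditional branch per element whose outcome
 * depends on the data. */
int64_t sum_at_least_branching(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] >= threshold)
            sum += v[i];
    }
    return sum;
}

/* Sequential form: the comparison result (0 or 1) is negated into a mask
 * of all zeros or all ones, so the loop body has no conditional on v[i]. */
int64_t sum_at_least_sequential(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] >= threshold); /* 0 or -1 */
        sum += (int64_t)v[i] & mask;                  /* v[i] or 0 */
    }
    return sum;
}
```

Both functions return the same result for any input; only the control flow inside the loop differs.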
All calls and returns should be paired for optimal performance. Hardware implementations that
include a return-address stack can lose stack synchronization if calls and returns are not paired.
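The remark about recursive functions can be illustrated with a sketch that counts the nodes of a linked list. The struct and function names are hypothetical. The recursive form nests one call per node, so on a long list the call depth may exceed the capacity of a hardware return-address stack. The iterative form makes no nested calls at all, which also keeps every call paired with exactly one return.

```c
#include <stddef.h>

/* Hypothetical singly linked list node used only for this sketch. */
struct node {
    struct node *next;
};

/* Recursive form: call depth grows with the length of the list. */
size_t length_recursive(const struct node *n)
{
    if (n == NULL)
        return 0;
    return 1 + length_recursive(n->next);
}

/* Iterative form: a single loop with no nested calls or returns. */
size_t length_iterative(const struct node *n)
{
    size_t count = 0;
    for (; n != NULL; n = n->next)
        count++;
    return count;
}
```

Whether the rewrite pays off depends on the hardware implementation and on what the compiler already does with the recursive form.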