x86-64 Tour of Intel Manuals

MazeGen, 2007-10-04

Revision: 1.0

I'm always surprised by how few asmers use what is probably the best source of information available: the official processor manuals, either Intel's or AMD's. That's why this article was written. It should guide you step by step through the complexity of the Intel manuals, describing the x86-64 architecture in the process.

The majority of asmers learn from various unofficial references and information sources. The reason may be that finding your way around the intel.com or amd.com websites isn't easy, and direct links to the manuals are surprisingly rare. Additionally, these manuals are very complex and it takes time to learn to use them. In short, they are not very popular, even though the unofficial sources are often incomplete and imprecise.

This article assumes an understanding of 32-bit assembly programming in protected mode. It is a guide to the Intel manuals with some additional notes. Intel manuals and more information can be obtained from here.

The article is written from an application assembly programmer's point of view, which is why it doesn't deal much with system programming issues. Instruction encoding is mentioned only very briefly, too.

Note. At the time of writing, I use the latest revision of the Intel manuals, which is no. 022. Much of the information may differ if you have a revision older than, say, 020. I recommend using the latest ones. (The revision number is located on the very first page of any Intel manual as the last three digits of the order number.)

x86-64, x64

What does x86-64 mean anyway? It is an extension to the original x86-32 architecture, which was born with the 80386 processor. Recently, Intel started calling this extended architecture Intel 64 Architecture (formerly known as IA-32 Intel Architecture with 64-bit extensions). AMD has consistently called it AMD64. To refer to this architecture independently of the manufacturer, the name x86-64 or x64 is used.

Intel 64 Architecture

A summary of the architecture's features can be found in section 2.2.7 Intel® 64 Architecture of the Basic Architecture manual:

Intel 64 architecture increases the linear address space for software to 64 bits and supports physical address space up to 40 bits. The technology also introduces a new operating mode referred to as IA-32e mode.

IA-32e mode operates in one of two sub-modes: (1) compatibility mode enables a 64-bit operating system to run most legacy 32-bit software unmodified, (2) 64-bit mode enables a 64-bit operating system to run applications written to access 64-bit address space.

In the 64-bit mode, applications may access:

64-bit flat linear addressing

8 additional general-purpose registers (GPRs)

8 additional registers for streaming SIMD extensions (SSE, SSE2, SSE3 and SSSE3)

64-bit-wide GPRs and instruction pointers

uniform byte-register addressing

fast interrupt-prioritization mechanism

a new instruction-pointer relative-addressing mode

An Intel 64 architecture processor supports existing IA-32 software because it is able to run all non-64-bit legacy modes supported by IA-32 architecture. Most existing IA-32 applications also run in compatibility mode.

IA-32e Mode

Intel 64 architecture runs in IA-32e mode. This mode is described in section 3.1.1 Intel® 64 Architecture of the Basic Architecture manual. The interesting part is 64-bit mode:

Intel 64 architecture adds IA-32e mode. IA-32e mode has two sub-modes. These are:

Compatibility mode (sub-mode of IA-32e mode) – …

64-bit mode (sub-mode of IA-32e mode) – This mode enables a 64-bit operating system to run applications written to access 64-bit linear address space. For brevity, the 64-bit sub-mode is referred to as 64-bit mode in IA-32 architecture. 64-bit mode extends the number of general purpose registers and SIMD extension registers from 8 to 16. General purpose registers are widened to 64 bits. The mode also introduces a new opcode prefix (REX) to access the register extensions. See Section 3.2.1 for a detailed description. 64-bit mode is enabled by the operating system on a code-segment basis. Its default address size is 64 bits and its default operand size is 32 bits. The default operand size can be overridden on an instruction-by-instruction basis using a REX opcode prefix in conjunction with an operand size override prefix. REX prefixes allow a 64-bit operand to be specified when operating in 64-bit mode. By using this mechanism, many existing instructions have been promoted to allow the use of 64-bit registers and 64-bit addresses.

The description of Compatibility Mode is omitted on purpose, because this mode is virtually identical to 32-bit protected mode.

Note. The term Long Mode is often used in connection with the x64 architecture. AMD introduced this term, and it means nothing other than IA-32e mode. It is also often misused in situations where "64-bit mode" should be used. The term "64-bit mode" is shared by both Intel and AMD.

64-bit Mode

Section 3.2.1 64-Bit Mode Execution Environment of the Basic Architecture manual summarizes the differences in 64-bit mode:

The execution environment for 64-bit mode is similar to that described in Section 3.2. The following paragraphs describe the differences that apply.

Address space – A task or program running in 64-bit mode on an IA-32 processor can address linear address space of up to 2^64 bytes (subject to the canonical addressing requirement described in Section 3.3.7.1) and physical address space of up to 2^40 bytes. Software can query CPUID for the physical address size supported by a processor.

Basic program execution registers – The number of general-purpose registers (GPRs) available is 16. GPRs are 64-bits wide and they support operations on byte, word, doubleword and quadword integers. Accessing byte registers is done uniformly to the lowest 8 bits. The instruction pointer register becomes 64 bits. The EFLAGS register is extended to 64 bits wide, and is referred to as the RFLAGS register. The upper 32 bits of RFLAGS is reserved. The lower 32 bits of RFLAGS is the same as EFLAGS. See Figure 3-2.

XMM registers – There are 16 XMM data registers for SIMD operations. See Section 10.2, SSE Programming Environment, for more information about these registers.

Stack – The stack pointer size is 64 bits. Stack size is not controlled by a bit in the SS descriptor (as it is in non-64-bit modes) nor can the pointer size be overridden by an instruction prefix.

Control registers – Control registers expand to 64 bits. A new control register (the task priority register: CR8 or TPR) has been added. See Chapter 2, Intel® 64 and IA-32 Architectures, in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.

Debug registers – Debug registers expand to 64 bits. See Chapter 18, Debugging and Performance Monitoring, in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.

Descriptor table registers – The global descriptor table register (GDTR) and interrupt descriptor table register (IDTR) expand to 10 bytes so that they can hold a full 64-bit base address. The local descriptor table register (LDTR) and the task register (TR) also expand to hold a full 64-bit base address.

The interesting part for an assembler programmer is the change of stack pointer size, which is fixed to 64 bits in 64-bit mode.

Note. In 32-bit protected mode, the B (big) flag of the stack segment descriptor controls the default size of the stack pointer, independently of the code segment setting. It can be used, for instance, to force a 16-bit stack in 32-bit mode.

Memory Model

The memory model is described in section 3.3.4 Modes of Operation vs. Memory Model of the Basic Architecture manual (synopsis):

Segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. Specifically, the processor treats the segment base of CS, DS, ES, and SS as zero in 64-bit mode (this makes a linear address equal an effective address). Segmented and real address modes are not available in 64-bit mode.

Segment Registers

The new memory model changes how segment registers are interpreted. This is described in detail in section 3.4.2.1 Segment Registers in 64-Bit Mode of the Basic Architecture manual:

In 64-bit mode: CS, DS, ES, SS are treated as if each segment base is 0, regardless of the value of the associated segment descriptor base. This creates a flat address space for code, data, and stack. FS and GS are exceptions. Both segment registers may be used as additional base registers in linear address calculations (in the addressing of local data and certain operating system data structures).

Even though segmentation is generally disabled, segment register loads may cause the processor to perform segment access assists. During these activities, enabled processors will still perform most of the legacy checks on loaded values (even if the checks are not applicable in 64-bit mode). Such checks are needed because a segment register loaded in 64-bit mode may be used by an application running in compatibility mode.

Limit checks for CS, DS, ES, SS, FS, and GS are disabled in 64-bit mode.

Default Operand and Address Size

Another important change is the new set of rules for default operand and address size, which are much simpler than in 32-bit protected mode.

Note. In 32-bit protected mode, the D (default size) flag of the code segment descriptor controls the default operand and address size, so several code segments with different default operand and address sizes can exist at the same time.

These rules are described in section 3.6.1 Operand Size and Address Size in 64-Bit Mode of the Basic Architecture manual (synopsis):

In 64-bit mode, the default address size is 64 bits and the default operand size is 32 bits. Defaults can be overridden using prefixes. Address-size and operand-size prefixes allow mixing of 32/64-bit data and 32/64-bit addresses on an instruction-by-instruction basis. Table 3-4 shows valid combinations of the 66H instruction prefix and the REX.W prefix that may be used to specify operand-size overrides in 64-bit mode. Note that 16-bit addresses are not supported in 64-bit mode.

REX prefixes consist of 4-bit fields that form 16 different values. The W-bit field in the REX prefixes is referred to as REX.W. If the REX.W field is properly set, the prefix specifies an operand size override to 64 bits. Note that software can still use the operand-size 66H prefix to toggle to a 16-bit operand size. However, setting REX.W takes precedence over the operand-size prefix (66H) when both are used.

In the case of SSE/SSE2/SSE3/SSSE3 SIMD instructions: the 66H, F2H, and F3H prefixes are mandatory for opcode extensions. In such a case, there is no interaction between a valid REX.W prefix and a 66H opcode extension prefix.

Table 3-4. Effective Operand- and Address-Size Attributes in 64-Bit Mode
L Flag in Code Segment Descriptor	1	1	1	1	1	1	1	1
REX.W Prefix	0	0	0	0	1	1	1	1
Operand-Size Prefix 66H	N	N	Y	Y	N	N	Y	Y
Address-Size Prefix 67H	N	Y	N	Y	N	Y	N	Y
Effective Operand Size	32	32	16	16	64	64	64	64
Effective Address Size	64	32	64	32	64	32	64	32
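
Table 3-4 can be read as a small decision rule. The following Python sketch models it under stated assumptions: the function name `effective_sizes` and its interface are hypothetical, the prefixes are passed as plain byte values, and the special cases (instructions with a default 64-bit operand size, and 66H used as a mandatory SIMD prefix) are deliberately ignored.

```python
def effective_sizes(prefixes):
    """Return (operand_size, address_size) in bits for 64-bit mode.

    `prefixes` is an iterable of prefix byte values, e.g. [0x66, 0x48].
    Hypothetical helper: it models Table 3-4 only.
    """
    prefixes = list(prefixes)
    has_66 = 0x66 in prefixes          # operand-size override prefix
    has_67 = 0x67 in prefixes          # address-size override prefix
    # A REX prefix is any byte 40h..4Fh; bit 3 of it is the W field.
    rex_w = any(0x40 <= p <= 0x4F and bool(p & 0x08) for p in prefixes)

    if rex_w:
        operand_size = 64              # REX.W takes precedence over 66H
    elif has_66:
        operand_size = 16
    else:
        operand_size = 32              # default operand size

    address_size = 32 if has_67 else 64
    return operand_size, address_size
```

For example, `effective_sizes([0x66, 0x48])` yields a 64-bit operand size, which matches the last columns of the table, where REX.W wins over 66H.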

Instruction Pointer

The change of instruction pointer size is described in section 3.5.1 Instruction Pointer in 64-Bit Mode of the Basic Architecture manual:

In 64-bit mode, the RIP register becomes the instruction pointer. This register holds the 64-bit offset of the next instruction to be executed. 64-bit mode also supports a technique called RIP-relative addressing. Using this technique, the effective address is determined by adding a displacement to the RIP of the next instruction.

The manual also mentions a new addressing mode, RIP-relative addressing. It is one of the most interesting features added with 64-bit mode. More about RIP-relative addressing can be found in section 3.7.5.1 Specifying an Offset in 64-Bit Mode of the Basic Architecture manual:

The offset part of a memory address in 64-bit mode can be specified directly as a static value or through an address computation made up of one or more of the following components:

Displacement – 8-bit, 16-bit, or 32-bit value.

Base – The value in a 32-bit (or 64-bit if REX.W is set) general-purpose register.

Index – The value in a 32-bit (or 64-bit if REX.W is set) general-purpose register.

Scale factor – A value of 2, 4, or 8 that is multiplied by the index value.

The base and index value can be specified in one of sixteen available general-purpose registers in most cases. See Chapter 2, Instruction Format, in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.

The following unique combination of address components is also available.

RIP + Displacement – In 64-bit mode, RIP-relative addressing uses a signed 32-bit displacement to calculate the effective address of the next instruction by sign-extend the 32-bit value and add to the 64-bit value in RIP.

An important aspect of RIP-relative addressing is that no other register can be used in the address, so forms like [RIP+EAX] are not possible. Another (not so obvious) aspect is that RIP-relative addressing, just like any other addressing, is controlled by the address-size override prefix 67. With this prefix, it is possible to address relative to EIP:

67 8B 05 10 00 00 00   MOV EAX, [EIP+10h]

This is not described directly anywhere in the manuals, nor is it called EIP-relative addressing there.
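
The arithmetic behind both forms can be sketched in a few lines of Python. This is only an illustrative model: the function `rip_relative_target`, its parameters, and the load address 401000h used in the example are assumptions, not something taken from the manuals. The displacement is sign-extended and added to the address of the next instruction, and with prefix 67 the result is truncated to 32 bits.

```python
import struct

def rip_relative_target(instr_addr, instr_bytes, disp_offset, addr32=False):
    """Compute the effective address of a RIP-relative memory operand.

    instr_addr  : address of the first byte of the instruction (assumed)
    instr_bytes : complete encoding, including prefixes
    disp_offset : index of the 4-byte displacement within instr_bytes
    addr32      : True when the 67h address-size prefix is present
    """
    # The displacement is a signed 32-bit little-endian value.
    (disp,) = struct.unpack_from("<i", instr_bytes, disp_offset)
    # RIP already points to the next instruction when the address is formed.
    next_rip = instr_addr + len(instr_bytes)
    mask = 0xFFFFFFFF if addr32 else 0xFFFFFFFFFFFFFFFF
    return (next_rip + disp) & mask

# MOV EAX, [RIP+10h] and its 67h-prefixed variant, both placed at 401000h
plain = bytes.fromhex("8B0510000000")
with_67 = bytes.fromhex("678B0510000000")
print(hex(rip_relative_target(0x401000, plain, 2)))
print(hex(rip_relative_target(0x401000, with_67, 3, addr32=True)))
```

Note that the prefixed form is one byte longer, so the same displacement reaches an address one byte higher.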

Address Calculation

As mentioned, the default address size is 64 bits. Section 3.3.7 Address Calculations in 64-Bit Mode of the Basic Architecture manual starts with a description of 64-bit address and instruction pointer calculations:

In most cases, 64-bit mode uses flat address space for code, data, and stacks. In 64-bit mode (if there is no address-size override), the size of effective address calculations is 64 bits. An effective-address calculation uses a 64-bit base and index registers and sign-extend displacements to 64 bits.

In the flat address space of 64-bit mode, linear addresses are equal to effective addresses because the base address is zero. In the event that FS or GS segments are used with a non-zero base, this rule does not hold. In 64-bit mode, the effective address components are added and the effective address is truncated (See for example the instruction LEA) before adding the full 64-bit segment base. The base is never truncated, regardless of addressing mode in 64-bit mode.

The instruction pointer is extended to 64 bits to support 64-bit code offsets. The 64-bit instruction pointer is called the RIP. Table 3-1 shows the relationship between RIP, EIP, and IP.

Table 3-1. Instruction Pointer Sizes
	Bits 63:32	Bits 31:16	Bits 15:0
16-bit instruction pointer	Not Modified	Not Modified	IP
32-bit instruction pointer	Zero Extension	EIP (bits 31:0)	EIP (bits 31:0)
64-bit instruction pointer	RIP (bits 63:0)	RIP (bits 63:0)	RIP (bits 63:0)
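
The address calculation rules quoted above (add the components, truncate to the effective address size, then add the untruncated segment base) can be modeled roughly as follows. This is a sketch under assumptions: `sign_extend` and `linear_address` are hypothetical helper names, registers are passed as plain integers, and canonical-address checks are left out.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def linear_address(base=0, index=0, scale=1, disp=0, disp_bits=32,
                   addr_size=64, seg_base=0):
    """Model the linear address calculation in 64-bit mode."""
    # Effective address: base + index*scale + sign-extended displacement,
    # truncated to the effective address size (64, or 32 with prefix 67h).
    ea = base + index * scale + sign_extend(disp, disp_bits)
    ea &= (1 << addr_size) - 1
    # The segment base (non-zero only for FS and GS) is never truncated.
    return (seg_base + ea) & MASK64
```

With `addr_size=32`, a sum that overflows 32 bits wraps around before the FS or GS base is added, which is the behavior the manual describes for 32-bit addresses generated in 64-bit mode.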

This section further describes calculations of other addresses and immediates:

Generally, displacements and immediates in 64-bit mode are not extended to 64 bits. They are still limited to 32 bits and sign-extended during effective-address calculations. In 64-bit mode, however, support is provided for 64-bit displacement and immediate forms of the MOV instruction.

All 16-bit and 32-bit address calculations are zero-extended in IA-32e mode to form 64-bit addresses. Address calculations are first truncated to the effective address size of the current mode (64-bit mode or compatibility mode), as overridden by any address-size prefix. The result is then zero-extended to the full 64-bit address width. Because of this, 16-bit and 32-bit applications running in compatibility mode can access only the low 4 GBytes of the 64-bit mode effective addresses. Likewise, a 32-bit address generated in 64-bit mode can access only the low 4 GBytes of the 64-bit mode effective addresses.

The preceding paragraphs need further explanation. As for address calculation, one fact is quite obvious: the displacement size remains 32 bits, and the displacement is sign-extended when the address is calculated. A less obvious consequence is that an address consisting only of a displacement can't reach the range from 80000000h to FFFFFFFF_7FFFFFFFh, inclusive. The exception is those forms of the MOV instruction in which one operand is the accumulator (the rAX register) and the other is an immediate memory offset (opcodes A0, A1, A2, and A3). In these cases, a full 64-bit address can be used, which makes these instructions somewhat privileged:

48 A1 00 00 00 00 00 00 00 80   MOV RAX, [8000000000000000]
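
The unreachable range follows directly from sign extension of the 32-bit displacement. A minimal Python sketch, assuming displacement-only addressing with the default 64-bit address size (both function names are hypothetical):

```python
def disp32_only_target(disp32):
    """Address selected by a displacement-only operand (64-bit address size)."""
    disp32 &= 0xFFFFFFFF
    if disp32 & 0x80000000:
        # Sign bit set: the upper 32 bits of the address become all ones.
        return disp32 | 0xFFFFFFFF00000000
    return disp32

def reachable_by_disp32(addr):
    """True if a displacement-only operand can address the 64-bit value `addr`."""
    return addr <= 0x7FFFFFFF or addr >= 0xFFFFFFFF80000000
```

The two reachable windows are the lowest 2 GB and the highest 2 GB of the 64-bit address space; everything in between needs a register in the address or one of the moffs forms of MOV shown above.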

As for immediates, they remain 32-bit. There is again one exception: those forms of the MOV instruction whose destination operand is one of the general-purpose registers (opcodes B8 to BF):

48 BA 00 00 00 00 00 00 00 80   MOV RDX, 8000000000000000
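
The encoding shown above can be reproduced with a short Python sketch. The helper name `encode_mov_r64_imm64` is hypothetical; the layout it assumes is a REX prefix with W set (plus the B bit for R8 to R15), opcode B8 plus the low three bits of the register number, and the immediate as eight little-endian bytes.

```python
import struct

# 64-bit register names in encoding order (register numbers 0..15)
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
         "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]

def encode_mov_r64_imm64(reg, imm):
    """Encode MOV r64, imm64 as REX.W + (B8 + r) + 8 immediate bytes."""
    n = REG64.index(reg.upper())
    rex = 0x48 | (n >> 3)            # REX.W set; REX.B selects R8..R15
    opcode = 0xB8 + (n & 7)          # low 3 bits of the register number
    return bytes([rex, opcode]) + struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)

print(encode_mov_r64_imm64("RDX", 0x8000000000000000).hex())
```

Calling it for RDX and 8000000000000000h gives the same ten bytes as the listing above.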

Default 64-bit operand

The default address size is always 64 bits in 64-bit mode. However, the default operand size is 32 bits, while the stack width is 64 bits. This leads to further exceptions, described in section 2.2.1.7 Default 64-Bit Operand Size of the Instruction Set Reference, A-M manual:

In 64-bit mode, two groups of instructions have a default operand size of 64 bits (do not need a REX prefix for this operand size). These are:

Near branches

All instructions, except far branches, that implicitly reference the RSP

The fact that near branches are 64-bit by default (their operand is the RIP register) won't surprise anyone. Much more about them can be found in section 6.3.7 Branch Functions in 64-Bit Mode of the Basic Architecture manual, but there's no need to quote it here.

The second group of instructions (which includes PUSH and others) has much bigger consequences for an assembly programmer. It means that an instruction like PUSH EAX cannot be used, only PUSH RAX (or PUSH AX with prefix 66). These instructions use a 64-bit operand by default, and there is no way to encode them with 32-bit operands. More can be found in section 6.2.5 Stack Behavior in 64-Bit Mode of the Basic Architecture manual:

In 64-bit mode, address calculations that reference SS segments are treated as if the segment base is zero. Fields (base, limit, and attribute) in segment descriptor registers are ignored. SS DPL is modified such that it is always equal to CPL. This will be true even if it is the only field in the SS descriptor that is modified.

Registers E(SP), E(IP) and E(BP) are promoted to 64-bits and are re-named RSP, RIP, and RBP respectively. Some forms of segment load instructions are invalid (for example, LDS, POP ES).

PUSH/POP instructions increment/decrement the stack using a 64-bit width. When the contents of a segment register is pushed onto 64-bit stack, the pointer is automatically aligned to 64 bits (as with a stack that has a 32-bit width).
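
The stack behavior described above can be illustrated with a tiny Python model. It is only a sketch: the function `push`, the dictionary standing in for memory, and the starting RSP value are assumptions made for the example. The point is that a push moves RSP by 8 bytes, or by 2 bytes with prefix 66, and that there is no 4-byte variant.

```python
def push(rsp, memory, value, prefix_66=False):
    """Model PUSH in 64-bit mode and return the new RSP.

    memory    : dict mapping byte addresses to byte values (stand-in for RAM)
    prefix_66 : True for the 16-bit form, e.g. PUSH AX
    """
    size = 2 if prefix_66 else 8             # a 4-byte push cannot be encoded
    rsp = (rsp - size) & 0xFFFFFFFFFFFFFFFF  # the stack grows downwards
    data = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "little")
    for i, b in enumerate(data):
        memory[rsp + i] = b
    return rsp

mem = {}
rsp = push(0x7FFF0000, mem, 0x1122334455667788)   # like PUSH RAX
rsp = push(rsp, mem, 0xABCD, prefix_66=True)      # like PUSH AX
print(hex(rsp))
```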

General-purpose Registers

As mentioned before, these registers are extended to 64 bits and eight new registers are added. 64-bit mode also has a few surprising features, described in section 3.4.1.1 General-Purpose Registers in 64-Bit Mode of the Basic Architecture manual (synopsis):

In 64-bit mode, there are 16 general purpose registers and the default operand size is 32 bits. However, general-purpose registers are able to work with either 32-bit or 64-bit operands. If a 32-bit operand size is specified: EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, R8D - R15D are available. If a 64-bit operand size is specified: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8-R15 are available. R8D-R15D/R8-R15 represent eight new general-purpose registers. All of these registers can be accessed at the byte, word, dword, and qword level. REX prefixes are used to generate 64-bit operand sizes or to reference registers R8-R15.

In 64-bit mode, there are limitations on accessing byte registers. An instruction cannot reference legacy high-bytes (for example: AH, BH, CH, DH) and one of the new byte registers at the same time (for example: the low byte of the RAX register). However, instructions may reference legacy low-bytes (for example: AL, BL, CL or DL) and new byte registers at the same time (for example: the low byte of the R8 register, or RBP). The architecture enforces this limitation by changing high-byte references (AH, BH, CH, DH) to low byte references (BPL, SPL, DIL, SIL: the low 8 bits for RBP, RSP, RDI and RSI) for instructions using a REX prefix.

When in 64-bit mode, operand size determines the number of valid bits in the destination general-purpose register:

64-bit operands generate a 64-bit result in the destination general-purpose register.

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register.

8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56 bits or 48 bits (respectively) of the destination general-purpose register are not be modified by the operation. If the result of an 8-bit or 16-bit operation is intended for 64-bit address calculation, explicitly sign-extend the register to the full 64-bits.

Table 3-2. Addressable General Purpose Registers
Register Type	Without REX	With REX
Byte Registers	AL, BL, CL, DL, AH, BH, CH, DH	AL, BL, CL, DL, DIL, SIL, BPL, SPL, R8L - R15L
Word Registers	AX, BX, CX, DX, DI, SI, BP, SP	AX, BX, CX, DX, DI, SI, BP, SP, R8W - R15W
Doubleword Registers	EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP	EAX, EBX, ECX, EDX, EDI, ESI, EBP, ESP, R8D - R15D
Quadword Registers	N.A.	RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8 - R15

Perhaps the most surprising fact is that an instruction such as MOV EAX, EBX automatically zeroes the upper 32 bits of the RAX register. This doesn't happen with instructions that only read the destination register, like TEST EAX, EBX. In that case RAX remains unmodified. There is one exception to this rule: the CMOVcc instructions, for example CMOVBE. These instructions zero the upper 32 bits even if the condition is false and the move doesn't occur.
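
The write rules for the different operand sizes can be summarized in a short Python sketch. The function `write_gpr` is a hypothetical model that treats a register as a 64-bit integer and covers only writes to the low part of a register (not AH, BH, CH, or DH).

```python
MASK64 = (1 << 64) - 1

def write_gpr(old, value, size):
    """Return the new 64-bit register content after writing `value`
    as a `size`-bit operand into the low part of the register."""
    if size == 64:
        return value & MASK64
    if size == 32:
        return value & 0xFFFFFFFF            # upper 32 bits are zeroed
    if size in (8, 16):
        mask = (1 << size) - 1
        # Upper 56 or 48 bits of the old content are preserved.
        return (old & ~mask & MASK64) | (value & mask)
    raise ValueError("size must be 8, 16, 32 or 64")

rax = 0x1122334455667788
print(hex(write_gpr(rax, 0xAABBCCDD, 32)))   # like MOV EAX, 0AABBCCDDh
print(hex(write_gpr(rax, 0xBEEF, 16)))       # like MOV AX, 0BEEFh
```

Only the 32-bit case discards the old upper half; the 8-bit and 16-bit cases behave as they always did.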

Another surprising fact is that the 8-bit registers AH, CH, DH, and BH cannot be used in any instruction that requires a REX prefix, that is, together with any new feature of 64-bit mode. For example, the instruction MOV AH, SIL cannot be encoded in 64-bit mode, because the SIL register requires a REX prefix (40). The reason is that any REX prefix remaps the original registers AH, CH, DH, and BH to SPL, BPL, SIL, and DIL:

-- 88 EC   MOV AH, CH
40 88 EC   MOV SPL, BPL
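
The remapping can be checked with a minimal decoder for this one instruction form. The Python sketch below is an illustration only: the names `byte_reg_name` and `decode_mov_rm8_r8` are hypothetical, and it handles nothing but the register-to-register form of opcode 88 with an optional REX prefix.

```python
LEGACY = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
WITH_REX = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL",
            "R8L", "R9L", "R10L", "R11L", "R12L", "R13L", "R14L", "R15L"]

def byte_reg_name(field, rex=None, ext_bit=0):
    """Name the 8-bit register selected by a 3-bit ModRM field.

    rex     : the REX prefix byte (40h..4Fh), or None if there is none
    ext_bit : the REX.R or REX.B bit that extends this particular field
    """
    if rex is None:
        return LEGACY[field & 7]
    return WITH_REX[(ext_bit << 3) | (field & 7)]

def decode_mov_rm8_r8(code):
    """Decode register-to-register MOV r/m8, r8 (opcode 88, mod = 11)."""
    code = bytes(code)
    rex = code[0] if 0x40 <= code[0] <= 0x4F else None
    opcode, modrm = code[-2], code[-1]
    if opcode != 0x88 or modrm >> 6 != 3:
        raise ValueError("only 88 /r with mod=11 is modeled")
    r = (rex >> 2) & 1 if rex is not None else 0   # REX.R extends ModRM.reg
    b = rex & 1 if rex is not None else 0          # REX.B extends ModRM.rm
    dst = byte_reg_name(modrm & 7, rex, b)
    src = byte_reg_name((modrm >> 3) & 7, rex, r)
    return f"MOV {dst}, {src}"

print(decode_mov_rm8_r8(bytes.fromhex("88EC")))
print(decode_mov_rm8_r8(bytes.fromhex("4088EC")))
```

The same two bytes 88 EC name different registers depending only on whether a REX prefix precedes them, even when that prefix is the otherwise empty value 40.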

RFLAGS Register

The EFLAGS register is extended to the 64-bit RFLAGS register, as described in section 3.4.3.4 RFLAGS Register in 64-Bit Mode of the Basic Architecture manual:

In 64-bit mode, EFLAGS is extended to 64 bits and called RFLAGS. The upper 32 bits of RFLAGS register is reserved. The lower 32 bits of RFLAGS is the same as EFLAGS.

x87 FPU, MMX

Sections 8.1.1 x87 FPU in 64-Bit Mode and Compatibility Mode and 9.2.1 MMX Technology in 64-Bit Mode and Compatibility Mode of the Basic Architecture manual just say that there are virtually no changes.

SSE/2/3, SSSE3

Sections 10.2.1 SSE in 64-Bit Mode and Compatibility Mode, 11.2.1 SSE2 in 64-Bit Mode and Compatibility Mode, and 12.1.1 SSE3/SSSE3 in 64-Bit Mode and Compatibility Mode of the Basic Architecture manual just repeat that there are eight new XMM registers, as mentioned above.

New Instructions

The new 64-bit mode comes with a few new instructions. Most of them are just existing instructions extended to 64 bits, so they aren't really new. The list of these instructions can be found in section 5.10 64-BIT MODE INSTRUCTIONS of the Basic Architecture manual:

The following instructions are introduced in 64-bit mode. This mode is a sub-mode of IA-32e mode.

CDQE Convert doubleword to quadword

CMPSQ Compare string operands

CMPXCHG16B Compare RDX:RAX with m128

LODSQ Load qword at address (R)SI into RAX

MOVSQ Move qword from address (R)SI to (R)DI

MOVZX (64-bits) Move doubleword to quadword, zero-extension

STOSQ Store RAX at address RDI

SWAPGS Exchanges current GS base register value with value in MSR address C0000102H

SYSCALL Fast call to privilege level 0 system procedures

SYSRET Return from fast system call

Instruction Encoding

Instruction encoding is described mainly in Chapter 2 Instruction Format of the Instruction Set Reference, A-M manual. This topic falls outside the scope of this article and is not covered here.

Note. If you are interested in a similar article about the AMD manuals, let me know. I will consider writing one if there is enough interest.

Revisions

2007-10-04	1.0	First public version	MazeGen

(the date format conforms to ISO 8601)
