Portable x86 Flat Syntax

MazeGen, 2007-04-18

Revision: 1.0

From time to time, speculations about portable assembler (what is a contradiction itself) araise. As an assembly programmer, I couldn't avoid these weird speculations. The similarity between x86-32 and x64 simply can't be overlooked and itself leads to finding some conjunctive syntax, here called Portable x86 Flat Syntax (PFS).

X64 Architecture News

x64 architecture highlights:

RIP-relative and EIP-relative addressing (example: MOV [RIP-2008h], RAX)
Operands extended to 64 bits
Eight new general registers, r8 to r15
Eight new XMM registers, xmm8 to xmm15

Let's start with prohibition of all these new features. This way, 32-bit code is portable to x64 architecture. No problem, all works well :-)

Unfortunately, it is not so simple. These are incompatible issues:

The size (width) of instruction pointer is 64 bits as well (RIP), thus all pointers should be 64-bit too. Even if whole process would be loaded below 4GB boundary, where 32-bit pointers are big enough, pointers returned by operating system can be above this boundary.
Default stack size (width) is 64 bits. This makes encoding of instructions like PUSH EAX impossible – only PUSH RAX or PUSH AX can be used (this applies to 32-bit memory operand as well).

It is clear that we must set specific rules in this concept.

Minimal PFS

What rules should be given?

A register holding a pointer must be used the same way like in 64-bit code. While compiling to 32-bit code, it will be transformed to 32 bits. When stored in memory, it must use an abstract type, say PVOID. This type is ruled same way. It would be defined this way: (preudocode)

IF compile-for-64-bits
 PVOID typedef QWORD
ELSE
 PVOID typedef DWORD
ENDIF

Example of using a pointer within PFS:

port_ptr PVOID ?
 ...
 mov [port_ptr], rax   ; in 32-bit code, rax becomes eax

Making instructions like PUSH DWORD PTR [foo] portable is a puzzler. Forcing 64-bit declaration of foo variable is not the way to go (because of memory consumption etc.). New type, specific to variables which pass through stack, is also not suitable. To solve this issue, let's assume that the implementation of PFS (called Portable x86 Flat Framework, PFF) will provide comfortable way of using local variables in functions. This way, there is almost no need for explicit use of PUSH and POP instructions and these can be forbidden. The exception is PUSH const, which always default to current stack width using zero extension (this feature makes the instruction partially portable).

These two rules should solve all incompatibilities. It is something what we could call Minimal Portable Flat Syntax (MPFS). Let's try if we can do better. Again, here go new 64-bit features:

Concept of final PFS Rules

RIP-relative and EIP-relative adressing

Can't be explicitly enabled, this addressing mode is not available in 32-bit mode. There is a trick to get current EIP, we won't use it for simplicity though. Besides this, an assembler can choose this addressing mode regardless of a syntax.

Operands extended to 64 bits

These could be enabled only using complicated transformations in 32-bit code. For simplicity, it is not enabled (it works only for pointers, see first rule).

Eight new general registers, r8 to r15
Eight new XMM registers, xmm8 to xmm15

Can be enabled, if we assume that a PFS implementation supports creation of threads on its own and can manage issues connected with emulation of these registers. Additionally, at least one original general register must is reserved for transformation of instructions like MOV EAX, [R8] to 32-bit code: (pseudocode)

 ; r8d_reg is emulated register r8d
 ; tmp_reg is a general register, reserved for PFS

 mov tmp_reg, [r8d_reg]
 mov eax, [tmp_reg]

This implies that if we'd want to enable addressing like MOV EAX, [R8+R9*4], another reserved register would be necessary:

 mov tmp_reg1, [r8d_reg]
 mov tmp_reg2, [r9d_reg]
 mov eax, [tmp_reg1+tmp_reg2*4]

First possible reserved register, which comes to my mind, is rBX. This register is used by default only within XLAT instruction, which is not much in use and can be easily replaced.

It gets worse for the other reserved register, because all remaining ones hold specific meaning in some instructions. Now, we need to realize that some instructions have the same syntax in both modes. If we add the no-64-bit-operands rule, it appers that, for example, REP STOSD instruction has the same syntax also within PFS. If we make clear what circumtances lead to use (rewrite) of reserved registers, we can reserve also rCX register. This one is difficult to replace only in case of REP prefixes family. Using this register, we can use also this code within PFS:

 xor eax, eax        ; (or mov eax, 0 for those who don't like this ;-))
 lea rdi, [buffer]
 mov ecx, [rdx]      ; beware of any new registers (r8, ...)
 rep stosd

Still, not all operand combinations are solved. Most complicated one is MOV [R8+R9], R10. Ideal solution would be another reserved register:

 mov tmp_reg1, [r8d_reg]
 mov tmp_reg2, [r9d_reg]
 mov tmp_reg3, [r10d_reg]
 mov [tmp_reg1+tmp_reg2], tmp_reg3

However, we can't allocate another one, three reserved registers would be too much. We can work around it this way instead:

 mov tmp_reg1, [r8d_reg]
 mov tmp_reg2, [r9d_reg]
 lea tmp_reg1, [tmp_reg1+tmp_reg2]   ; release tmp_reg2
 mov tmp_reg2, [r10d_reg]
 mov [tmp_reg1], tmp_reg2

None of XMM register is hardcoded so we can reserve, for example, xmm7.

PFS Rules

All rules are now given. They can be recapitulated and specified like the following:

PFS is similar to 64-bit code syntax with these differences:

Full 64-bit registers can't be used. The exception is using base register, which must be always 64-bit. Scaled index can't never be above FFFFFFFFh, otherwise it couldn't be accessible from within 32-bit code.
A pointer can be stored in memory only using specific type, PVOID. This type manages portability of pointers.
It is not possible to use RIP-relative addressing, which can't be emulated in 32-bit code. However, an assembler can generate this addressing, when appropriate.
It is not possible to use PUSH and POP instructions explicitly. The only exception is PUSH const instruction, where const is below or equal FFFFFFFFh.
It is possible to use all registers (including the new ones). An exception is limited use of eBX, eCX and xmm7 registers (see next rule).
Transformation to 32-bit code works this way: If one of the new general registers is used either directly or as a base register in an address, ebx register is rewritten (destroyed). If one of those registers is used as an index additionally, ebx register is rewritten. If one of the new XMM registers is used, xmm7 register is rewritten. In all other cases, it is guaranteed that these registers will remain the same.
Threads are created by PFS implementation on its own. This way, the emulation of new registers can be handled in multi-threading environment.

An example of most of these rules:

port_base PVOID ?
port_index DWORD ?
 ...
 mov [port_base], rax
 mov r8, [port_base]
 mov r9d, [port_index]        ; acts as movzx in 64-bit mode
 add r10w, [buffer+r8+r9*2]

The 32-bit transformation would look like this:

 mov [port_base], eax         ; mov [port_base], rax
 mov ebx, [port_base]         ; mov r8, [port_base]
 mov [r8d_reg], ebx
 mov ebx, [port_index]        ; mov r9d, [port_index]
 mov [r9d_reg], ebx
 mov ebx, [r8d_reg]           ; add r10w, [buffer+r8+r9*2]
 mov ecx, [r9d_reg]
 mov bx, [buffer+ebx+ecx*2]
 add word ptr [r10d_reg], bx

Portable X86 Flat Framework (PFF)

To bring this syntax to light, here goes description of PFF, created using MASM macros.

Since I'm not interested in complete framework, the sample is just a basic demo, which includes only portable code of primary thread (only a few instructions). For compilation, I use ML.EXE and ML64.EXE 8.00.50727.42 (shipped with Visual Studio 2005).

This is how the command lines look like:

ml /c /Cp /Fl /DPFF32 demo.asm
link /SUBSYSTEM:WINDOWS /entry:main demo.obj

ml64 /c /Cp /Fl demo.asm
link /SUBSYSTEM:WINDOWS /entry:main demo.obj

PFS Demo

Instructions like mov@ and similar are macros, which manage the portability.

;

IFDEF PFF32			; add the header only for 32-bit code
.686
.MODEL FLAT, STDCALL
ENDIF

include pff.asm

.DATA?
port_base  PVOID ?
port_index DWORD ?

.DATA
buffer     WORD 5 DUP (20h)

.CODE
main PROC
 lea rax, [buffer]		; acts the same in both modes

 mov@ [port_base], rax
 mov@ [port_index], 1

 mov@ r8, [port_base]
 mov@ r9d, [port_index]		; (acts as movzx in 64-bit mode)

 mov@ r11w, 2
 add@ [r8+r9*2], r11w

 mov@ r12w, 22h
 cmp@ r12w, [r8+r9*2]
 jne main
main ENDP

END

;

PFF Macros

;

IFDEF PFF32
 PVOID TYPEDEF DWORD
ELSE
 PVOID TYPEDEF QWORD
ENDIF

IFDEF PFF32

; init: no temp register is being used

PFF_EBX = 0
PFF_ECX = 0

; 64-bit general registers (which may hold only a pointer) are simply
; EQUated to 32-bit ones for 32-bit mode

rax TEXTEQU <eax>
;...

; usage of any of new general registers causes calling of pff_r macro,
; which move the emulated value into free reserved register (eBX or eCX)

r8  EQU <pff_r (r8, d)>
r9  EQU <pff_r (r9, d)>
;...

r9d  EQU <pff_r (r9, d)>
;...

r11w EQU <pff_r (r11, w)>
r12w EQU <pff_r (r12, w)>
;...

; set registers mapping to reserved registers so it is possible to test
; whether a register is emulated or not

rax_mapping TEXTEQU <>
;...

r8_mapping  TEXTEQU <ebx>
r9_mapping  TEXTEQU <ebx>
;...

r9d_mapping  TEXTEQU <ebx>
;...

r11w_mapping TEXTEQU <bx>
r12w_mapping TEXTEQU <bx>
;...

; Macro pff_get_tmp_r
;
; This macro returns appropriate reserved register, which would be currently
; used with given emulated register
;
; If no reserved register is available, macro returns blank string.
;
; Input:
;  regex   emulated register name with "_" postfix

pff_get_tmp_r MACRO regex:REQ
LOCAL postfix

 IF PFF_EBX AND PFF_ECX
   EXITM <>				; no reserved register available
 ENDIF

 postfix SUBSTR <regex>, @SizeStr (regex) - 1

%IFIDN <postfix>, <d_>		; dword register
   IFE PFF_EBX
     EXITM <ebx>
   ELSE
     EXITM <ecx>
   ENDIF
%ELSEIFIDN <postfix>, <w_>	; word
   IFE PFF_EBX
     EXITM <bx>
   ELSE
     EXITM <cx>
   ENDIF
%ELSEIFIDN <postfix>, <b_>	; byte
   IFE PFF_EBX
     EXITM <bl>
   ELSE
     EXITM <cl>
   ENDIF
 ELSE				; qword
   IFE PFF_EBX
     EXITM <ebx>
   ELSE
     EXITM <ecx>
   ENDIF
 ENDIF
ENDM

; Macro pff_r
;
; This macro moves the value of an emulated register to free reserved
; register and returns the register.
;
; Input:
;  regex   emulated register name
;  size    emulated register size

pff_r MACRO regex:REQ, size:REQ
 IFIDNI <size>, <b>
   IFE PFF_EBX
     PFF_EBX = 1
     mov bl, pff.global&regex&size
     EXITM <bl>
   ELSE
     PFF_ECX = 1
     mov cl, pff.global&regex&size
     EXITM <cl>
   ENDIF

 ELSEIFIDNI <size>, <w>
   IFE PFF_EBX
     PFF_EBX = 1
     mov bx, pff.global&regex&size
     EXITM <bx>
   ELSE
     PFF_ECX = 1
     mov cx, pff.global&regex&size
     EXITM <cx>
   ENDIF

 ELSE
   IFE PFF_EBX
     PFF_EBX = 1
     mov ebx, pff.global&regex&size
     EXITM <ebx>
   ELSE
     PFF_ECX = 1
     mov ecx, pff.global&regex&size
     EXITM <ecx>
   ENDIF
 ENDIF
ENDM

; Macro pff_meta
;
; This macro provides the facility for two-operand instructions.
;
; Input:
;  type   type of operation: read/write
;  op     the operation itself (mov, add, cmp, test, ...)
;  op1    destination operand
;  op2    source operand

pff_meta MACRO type:REQ, op:REQ, op1:REQ, op2:REQ
LOCAL src, dst
LOCAL tmp

 ; add "_" to prevent expansion of possible emulated register

 IFE @InStr (, op1&_, <[>)	; destination operand is not a memory location

   ; if the destination is an emulated register...

   tmp TEXTEQU op1&_mapping

  %IFNB <tmp>

     ; if the source is a memory location, load it first to tmp register

     IF @InStr (, op2&_, <[>)	; source operand is a memory location
       mov op1&_mapping, op2

       PFF_ECX = 0		; now, the second reserved register can be used

       ; don't load current emulated value if the operation is MOV

       IFDIF <op>, <mov>
         dst TEXTEQU op1
       ELSE
         dst TEXTEQU pff_get_tmp_r (op1&_)
       ENDIF

       ; don't perform MOV operation since it is unnecessary in this case
       ; (MOV is here actually performed by the the former and the latter MOV)

       IFDIF <op>, <mov>
         op dst, op1&_mapping
       ENDIF

       IFIDN <type>, <write>	; if mov, add, etc., write it back
         mov [pff.global&op1&], dst
       ENDIF

     ELSE			; source operand is not a memory location
       IFDIF <op>, <mov>
         dst TEXTEQU op1
       ELSE
         dst TEXTEQU pff_get_tmp_r (op1&_)
       ENDIF
       op dst, op2
       IFIDN <type>, <write>
         mov [pff.global&op1&], dst
       ENDIF
     ENDIF

   ELSE
     op op1, op2
   ENDIF

 ELSE	; op1 is a memory location

   IFNDEF op2&_mapping		; catch immediate source operand
     tmp TEXTEQU <>
   ELSE
     tmp TEXTEQU op2&_mapping
   ENDIF

  %IFNB <tmp>			; source is an emulated register

     ; if the destination is a memory location, load first its address
     ; to tmp register

     IF @InStr (, op1&_, <[>)	; destination operand is a memory location
       lea ebx, op1

       PFF_ECX = 0		; now, the second reserved register can be used

       src TEXTEQU op2		; load current emulated value

       op [ebx], src

     ELSE			; destination operand is not a memory location
       op op1, op2&_mapping
     ENDIF

   ELSE
     op op1, op2
   ENDIF
 ENDIF

 ; set both temp registers as unused

 PFF_EBX = 0
 PFF_ECX = 0
ENDM

; Macros supplying original instructions

mov@ MACRO op1:REQ, op2:REQ
 pff_meta write, mov, op1, op2
ENDM

add@ MACRO op1:REQ, op2:REQ
 pff_meta write, add, op1, op2
ENDM

cmp@ MACRO op1:REQ, op2:REQ
 pff_meta read, cmp, op1, op2
ENDM

; Internal macro pff_global_r
;
; This internal macro is just used to declare global memory space for
; emulated registers; see PFF struct

pff_global_r MACRO regex:REQ
 UNION
  global&regex     DWORD ?
  global&regex&d   DWORD ?
  global&regex&w   WORD ?
  global&regex&b   BYTE ?
 ENDS
ENDM

PFF STRUCT
 pff_global_r r8
 pff_global_r r9
 pff_global_r r10
 pff_global_r r11
 pff_global_r r12
 pff_global_r r13
 pff_global_r r14
 pff_global_r r15
PFF ENDS

ELSE ; IF PFF32

mov@ TEXTEQU <mov>
add@ TEXTEQU <add>
cmp@ TEXTEQU <cmp>

ENDIF

.DATA?
IFDEF PFF32
pff PFF <>   ; reserve space for emulated registers
ENDIF

;

Resulting 64-bit Code

The resulting 64-bit code equates to the source one.

main:
 lea rax, [402000h]
 mov [402010h], rax
 mov dword ptr [402018h], 1
 mov r8, [402010h]
 mov r9d, [402018h]
 mov r11w, 2
 add [r8+r9*2], r11w
 mov r12w, 22h
 cmp r12w, [r8+r9*2]
 jne main

Resulting 32-bit Code

The code is edited by hand to make it more clear.

main:
 ; lea rax, [buffer]

 lea eax, [402000]

 ; mov@ [port_base], rax

 mov [402030], eax

 ; mov@ [port_index], 1

 mov dword ptr [402034], 1

 ; mov@ r8, [port_base]

 mov ebx, [402030]
 mov [402010], ebx

 ; mov@ r9d, [port_index]

 mov ebx, [402034]
 mov [402014], ebx

 ; mov@ r11w, 2

 mov bx, 2          ; unnecessary, don't care
 mov [40201C], bx

 ; add@ [r8+r9*2], r11w

 mov ebx, [402010]
 mov ecx, [402014]
 lea ebx, [ebx+ecx*2]
 mov cx, [40201C]
 add [ebx], cx

 ; mov@ r12w, 22h

 mov bx, 22         ; unnecessary, don't care
 mov [402020], bx

 ; cmp@ r12w, [r8+r9*2]

 mov ebx, [402010]
 mov ecx, [402014]
 mov bx, [ebx+ecx*2]
 mov cx, [402020]      ; unnecessary, don't care
 cmp cx, bx

 jnz main