| MazeGen, 2007-04-18 | Revision: 1.0 |
From time to time, speculations about portable assembler (what is a contradiction itself) araise. As an assembly programmer, I couldn't avoid these weird speculations. The similarity between x86-32 and x64 simply can't be overlooked and itself leads to finding some conjunctive syntax, here called Portable x86 Flat Syntax (PFS).
x64 architecture highlights:
MOV [RIP-2008h], RAX)r8 to r15xmm8 to xmm15Let's start with prohibition of all these new features. This way, 32-bit code is portable to x64 architecture. No problem, all works well :-)
Unfortunately, it is not so simple. These are incompatible issues:
RIP), thus all pointers should be 64-bit too.
Even if whole process would be loaded below 4GB boundary, where
32-bit pointers are big enough, pointers returned by operating
system can be above this boundary.PUSH EAX impossible
– only PUSH RAX or
PUSH AX can be used (this applies to 32-bit
memory operand as well).It is clear that we must set specific rules in this concept.
What rules should be given?
A register holding a pointer must be used the same
way like in 64-bit code. While compiling to 32-bit code, it will be
transformed to 32 bits. When stored in memory, it must use an
abstract type, say PVOID. This type is ruled same
way. It would be defined this way: (preudocode)
IF compile-for-64-bits PVOID typedef QWORD ELSE PVOID typedef DWORD ENDIF
Example of using a pointer within PFS:
port_ptr PVOID ? ... mov [port_ptr], rax ; in 32-bit code, rax becomes eax
Making instructions like
PUSH DWORD PTR [foo] portable is
a puzzler. Forcing 64-bit declaration of foo variable
is not the way to go (because of memory consumption etc.). New type,
specific to variables which pass through stack, is also not
suitable. To solve this issue, let's assume that the implementation
of PFS (called Portable x86 Flat Framework,
PFF) will provide comfortable way of using local variables
in functions. This way, there is almost no need for explicit use of
PUSH and POP instructions and these can be
forbidden. The exception is PUSH const, which always
default to current stack width using zero extension (this feature
makes the instruction partially portable).
These two rules should solve all incompatibilities. It is something what we could call Minimal Portable Flat Syntax (MPFS). Let's try if we can do better. Again, here go new 64-bit features:
Can't be explicitly enabled, this addressing mode is not
available in 32-bit mode. There is a trick to get current
EIP, we won't use it for simplicity though. Besides
this, an assembler can choose this addressing mode regardless of
a syntax.
These could be enabled only using complicated transformations in 32-bit code. For simplicity, it is not enabled (it works only for pointers, see first rule).
r8 to r15xmm8 to xmm15Can be enabled, if we assume that a PFS
implementation supports creation of threads on its own and can manage
issues connected with emulation of these registers. Additionally, at
least one original general register must is reserved for transformation
of instructions like MOV EAX, [R8] to 32-bit
code: (pseudocode)
; r8d_reg is emulated register r8d ; tmp_reg is a general register, reserved for PFS mov tmp_reg, [r8d_reg] mov eax, [tmp_reg]
This implies that if we'd want to enable addressing
like MOV EAX, [R8+R9*4], another reserved
register would be necessary:
mov tmp_reg1, [r8d_reg] mov tmp_reg2, [r9d_reg] mov eax, [tmp_reg1+tmp_reg2*4]
First possible reserved register, which comes to
my mind, is rBX. This register is used by default only
within XLAT instruction, which is not much in
use and can be easily replaced.
It gets worse for the other reserved register,
because all remaining ones hold specific meaning in some
instructions. Now, we need to realize that some instructions have
the same syntax in both modes. If we add the no-64-bit-operands
rule, it appers that, for example, REP STOSD
instruction has the same syntax also within PFS. If we
make clear what circumtances lead to use (rewrite) of reserved
registers, we can reserve also rCX register. This one
is difficult to replace only in case of REP prefixes
family. Using this register, we can use also this code within
PFS:
xor eax, eax ; (or mov eax, 0 for those who don't like this ;-)) lea rdi, [buffer] mov ecx, [rdx] ; beware of any new registers (r8, ...) rep stosd
Still, not all operand combinations are solved. Most
complicated one is MOV [R8+R9], R10. Ideal
solution would be another reserved register:
mov tmp_reg1, [r8d_reg] mov tmp_reg2, [r9d_reg] mov tmp_reg3, [r10d_reg] mov [tmp_reg1+tmp_reg2], tmp_reg3
However, we can't allocate another one, three reserved registers would be too much. We can work around it this way instead:
mov tmp_reg1, [r8d_reg] mov tmp_reg2, [r9d_reg] lea tmp_reg1, [tmp_reg1+tmp_reg2] ; release tmp_reg2 mov tmp_reg2, [r10d_reg] mov [tmp_reg1], tmp_reg2
None of XMM register is hardcoded so we
can reserve, for example, xmm7.
All rules are now given. They can be recapitulated and specified like the following:
PFS is similar to 64-bit code syntax with these differences:
PVOID. This type manages portability of pointers.PUSH and
POP instructions explicitly. The only exception is
PUSH const instruction, where const
is below or equal FFFFFFFFh.eBX,
eCX and xmm7 registers (see next
rule).ebx register is rewritten
(destroyed). If one of those registers is used as an index
additionally, ebx register is rewritten. If one of
the new XMM registers is used, xmm7
register is rewritten. In all other cases, it is guaranteed that
these registers will remain the same.An example of most of these rules:
port_base PVOID ? port_index DWORD ? ... mov [port_base], rax mov r8, [port_base] mov r9d, [port_index] ; acts as movzx in 64-bit mode add r10w, [buffer+r8+r9*2]
The 32-bit transformation would look like this:
mov [port_base], eax ; mov [port_base], rax mov ebx, [port_base] ; mov r8, [port_base] mov [r8d_reg], ebx mov ebx, [port_index] ; mov r9d, [port_index] mov [r9d_reg], ebx mov ebx, [r8d_reg] ; add r10w, [buffer+r8+r9*2] mov ecx, [r9d_reg] mov bx, [buffer+ebx+ecx*2] add word ptr [r10d_reg], bx
To bring this syntax to light, here goes description of PFF, created using MASM macros.
Since I'm not interested in complete framework, the sample is just a basic demo, which includes only portable code of primary thread (only a few instructions). For compilation, I use ML.EXE and ML64.EXE 8.00.50727.42 (shipped with Visual Studio 2005).
This is how the command lines look like:
ml /c /Cp /Fl /DPFF32 demo.asm link /SUBSYSTEM:WINDOWS /entry:main demo.obj
ml64 /c /Cp /Fl demo.asm link /SUBSYSTEM:WINDOWS /entry:main demo.obj
Instructions like mov@ and similar are
macros, which manage the portability.
; IFDEF PFF32 ; add the header only for 32-bit code .686 .MODEL FLAT, STDCALL ENDIF include pff.asm .DATA? port_base PVOID ? port_index DWORD ? .DATA buffer WORD 5 DUP (20h) .CODE main PROC lea rax, [buffer] ; acts the same in both modes mov@ [port_base], rax mov@ [port_index], 1 mov@ r8, [port_base] mov@ r9d, [port_index] ; (acts as movzx in 64-bit mode) mov@ r11w, 2 add@ [r8+r9*2], r11w mov@ r12w, 22h cmp@ r12w, [r8+r9*2] jne main main ENDP END ;
;
IFDEF PFF32
PVOID TYPEDEF DWORD
ELSE
PVOID TYPEDEF QWORD
ENDIF
IFDEF PFF32
; init: no temp register is being used
PFF_EBX = 0
PFF_ECX = 0
; 64-bit general registers (which may hold only a pointer) are simply
; EQUated to 32-bit ones for 32-bit mode
rax TEXTEQU <eax>
;...
; usage of any of new general registers causes calling of pff_r macro,
; which move the emulated value into free reserved register (eBX or eCX)
r8 EQU <pff_r (r8, d)>
r9 EQU <pff_r (r9, d)>
;...
r9d EQU <pff_r (r9, d)>
;...
r11w EQU <pff_r (r11, w)>
r12w EQU <pff_r (r12, w)>
;...
; set registers mapping to reserved registers so it is possible to test
; whether a register is emulated or not
rax_mapping TEXTEQU <>
;...
r8_mapping TEXTEQU <ebx>
r9_mapping TEXTEQU <ebx>
;...
r9d_mapping TEXTEQU <ebx>
;...
r11w_mapping TEXTEQU <bx>
r12w_mapping TEXTEQU <bx>
;...
; Macro pff_get_tmp_r
;
; This macro returns appropriate reserved register, which would be currently
; used with given emulated register
;
; If no reserved register is available, macro returns blank string.
;
; Input:
; regex emulated register name with "_" postfix
pff_get_tmp_r MACRO regex:REQ
LOCAL postfix
IF PFF_EBX AND PFF_ECX
EXITM <> ; no reserved register available
ENDIF
postfix SUBSTR <regex>, @SizeStr (regex) - 1
%IFIDN <postfix>, <d_> ; dword register
IFE PFF_EBX
EXITM <ebx>
ELSE
EXITM <ecx>
ENDIF
%ELSEIFIDN <postfix>, <w_> ; word
IFE PFF_EBX
EXITM <bx>
ELSE
EXITM <cx>
ENDIF
%ELSEIFIDN <postfix>, <b_> ; byte
IFE PFF_EBX
EXITM <bl>
ELSE
EXITM <cl>
ENDIF
ELSE ; qword
IFE PFF_EBX
EXITM <ebx>
ELSE
EXITM <ecx>
ENDIF
ENDIF
ENDM
; Macro pff_r
;
; This macro moves the value of an emulated register to free reserved
; register and returns the register.
;
; Input:
; regex emulated register name
; size emulated register size
pff_r MACRO regex:REQ, size:REQ
IFIDNI <size>, <b>
IFE PFF_EBX
PFF_EBX = 1
mov bl, pff.global®ex&size
EXITM <bl>
ELSE
PFF_ECX = 1
mov cl, pff.global®ex&size
EXITM <cl>
ENDIF
ELSEIFIDNI <size>, <w>
IFE PFF_EBX
PFF_EBX = 1
mov bx, pff.global®ex&size
EXITM <bx>
ELSE
PFF_ECX = 1
mov cx, pff.global®ex&size
EXITM <cx>
ENDIF
ELSE
IFE PFF_EBX
PFF_EBX = 1
mov ebx, pff.global®ex&size
EXITM <ebx>
ELSE
PFF_ECX = 1
mov ecx, pff.global®ex&size
EXITM <ecx>
ENDIF
ENDIF
ENDM
; Macro pff_meta
;
; This macro provides the facility for two-operand instructions.
;
; Input:
; type type of operation: read/write
; op the operation itself (mov, add, cmp, test, ...)
; op1 destination operand
; op2 source operand
pff_meta MACRO type:REQ, op:REQ, op1:REQ, op2:REQ
LOCAL src, dst
LOCAL tmp
; add "_" to prevent expansion of possible emulated register
IFE @InStr (, op1&_, <[>) ; destination operand is not a memory location
; if the destination is an emulated register...
tmp TEXTEQU op1&_mapping
%IFNB <tmp>
; if the source is a memory location, load it first to tmp register
IF @InStr (, op2&_, <[>) ; source operand is a memory location
mov op1&_mapping, op2
PFF_ECX = 0 ; now, the second reserved register can be used
; don't load current emulated value if the operation is MOV
IFDIF <op>, <mov>
dst TEXTEQU op1
ELSE
dst TEXTEQU pff_get_tmp_r (op1&_)
ENDIF
; don't perform MOV operation since it is unnecessary in this case
; (MOV is here actually performed by the the former and the latter MOV)
IFDIF <op>, <mov>
op dst, op1&_mapping
ENDIF
IFIDN <type>, <write> ; if mov, add, etc., write it back
mov [pff.global&op1&], dst
ENDIF
ELSE ; source operand is not a memory location
IFDIF <op>, <mov>
dst TEXTEQU op1
ELSE
dst TEXTEQU pff_get_tmp_r (op1&_)
ENDIF
op dst, op2
IFIDN <type>, <write>
mov [pff.global&op1&], dst
ENDIF
ENDIF
ELSE
op op1, op2
ENDIF
ELSE ; op1 is a memory location
IFNDEF op2&_mapping ; catch immediate source operand
tmp TEXTEQU <>
ELSE
tmp TEXTEQU op2&_mapping
ENDIF
%IFNB <tmp> ; source is an emulated register
; if the destination is a memory location, load first its address
; to tmp register
IF @InStr (, op1&_, <[>) ; destination operand is a memory location
lea ebx, op1
PFF_ECX = 0 ; now, the second reserved register can be used
src TEXTEQU op2 ; load current emulated value
op [ebx], src
ELSE ; destination operand is not a memory location
op op1, op2&_mapping
ENDIF
ELSE
op op1, op2
ENDIF
ENDIF
; set both temp registers as unused
PFF_EBX = 0
PFF_ECX = 0
ENDM
; Macros supplying original instructions
mov@ MACRO op1:REQ, op2:REQ
pff_meta write, mov, op1, op2
ENDM
add@ MACRO op1:REQ, op2:REQ
pff_meta write, add, op1, op2
ENDM
cmp@ MACRO op1:REQ, op2:REQ
pff_meta read, cmp, op1, op2
ENDM
; Internal macro pff_global_r
;
; This internal macro is just used to declare global memory space for
; emulated registers; see PFF struct
pff_global_r MACRO regex:REQ
UNION
global®ex DWORD ?
global®ex&d DWORD ?
global®ex&w WORD ?
global®ex&b BYTE ?
ENDS
ENDM
PFF STRUCT
pff_global_r r8
pff_global_r r9
pff_global_r r10
pff_global_r r11
pff_global_r r12
pff_global_r r13
pff_global_r r14
pff_global_r r15
PFF ENDS
ELSE ; IF PFF32
mov@ TEXTEQU <mov>
add@ TEXTEQU <add>
cmp@ TEXTEQU <cmp>
ENDIF
.DATA?
IFDEF PFF32
pff PFF <> ; reserve space for emulated registers
ENDIF
;
The resulting 64-bit code equates to the source one.
main: lea rax, [402000h] mov [402010h], rax mov dword ptr [402018h], 1 mov r8, [402010h] mov r9d, [402018h] mov r11w, 2 add [r8+r9*2], r11w mov r12w, 22h cmp r12w, [r8+r9*2] jne main
The code is edited by hand to make it more clear.
main: ; lea rax, [buffer] lea eax, [402000] ; mov@ [port_base], rax mov [402030], eax ; mov@ [port_index], 1 mov dword ptr [402034], 1 ; mov@ r8, [port_base] mov ebx, [402030] mov [402010], ebx ; mov@ r9d, [port_index] mov ebx, [402034] mov [402014], ebx ; mov@ r11w, 2 mov bx, 2 ; unnecessary, don't care mov [40201C], bx ; add@ [r8+r9*2], r11w mov ebx, [402010] mov ecx, [402014] lea ebx, [ebx+ecx*2] mov cx, [40201C] add [ebx], cx ; mov@ r12w, 22h mov bx, 22 ; unnecessary, don't care mov [402020], bx ; cmp@ r12w, [r8+r9*2] mov ebx, [402010] mov ecx, [402014] mov bx, [ebx+ecx*2] mov cx, [402020] ; unnecessary, don't care cmp cx, bx jnz main
pff.asm, PFF macros.
demo.asm, PFF demo.
result64.lst, resulting 64-bit code listing.
result32.lst, resulting 32-bit code listing.
compile64.bat, a batch for 64-bit compilation.
compile32.bat, a batch for 32-bit compilation.
x86-64 Tour of Intel Manuals: Summary of new x64 features, as served by Intel manuals
Writing 64-bit programs by Jeremy Gordon
Microsoft Macro Assembler Reference, MASM for x64 (ml64.exe)
Continue to discussion board.
My contact information is here.
| 2007-04-18 | 1.0 | First public version | MazeGen |
(dates format correspond to ISO 8601)