MazeGen, 2007-04-18 | Revision: 1.0 |
From time to time, speculation about a portable assembler (which is a contradiction in itself) arises. As an assembly programmer, I couldn't avoid these weird speculations. The similarity between x86-32 and x64 simply can't be overlooked, and it leads naturally to a search for some common syntax, here called Portable x86 Flat Syntax (PFS).
x64 architecture highlights:
- RIP-relative addressing (MOV [RIP-2008h], RAX)
- new general registers r8 to r15
- new XMM registers xmm8 to xmm15
Let's start by prohibiting all these new features. That way, 32-bit code would be portable to the x64 architecture. No problem, everything works well :-)
Unfortunately, it is not so simple. These are the incompatible issues:
- The instruction pointer is 64-bit (RIP), thus all pointers should be 64-bit too. Even if the whole process were loaded below the 4 GB boundary, where 32-bit pointers are big enough, pointers returned by the operating system can be above this boundary.
- PUSH EAX is impossible; only PUSH RAX or PUSH AX can be used (this applies to a 32-bit memory operand as well).
It is clear that we must set specific rules for this concept.
What rules should be given?
1. A register holding a pointer must be used the same way as in 64-bit code. When compiling to 32-bit code, it will be transformed to 32 bits. When stored in memory, a pointer must use an abstract type, say PVOID. This type follows the same rule. It would be defined this way (pseudocode):
    IF compile-for-64-bits
        PVOID typedef QWORD
    ELSE
        PVOID typedef DWORD
    ENDIF
Example of using a pointer within PFS:
    port_ptr PVOID ?
    ...
    mov [port_ptr], rax ; in 32-bit code, rax becomes eax
2. Making instructions like PUSH DWORD PTR [foo] portable is a puzzler. Forcing a 64-bit declaration of the foo variable is not the way to go (because of memory consumption etc.). A new type, specific to variables that pass through the stack, is not suitable either. To solve this issue, let's assume that the implementation of PFS (called Portable x86 Flat Framework, PFF) will provide a comfortable way of using local variables in functions. This way, there is almost no need for explicit use of the PUSH and POP instructions, and these can be forbidden. The exception is PUSH const, which always defaults to the current stack width (in 64-bit mode the 32-bit immediate is sign-extended to 64 bits); this feature makes the instruction partially portable.
These two rules should solve all incompatibilities. The result is something we could call Minimal Portable Flat Syntax (MPFS). Let's see if we can do better. Again, here are the new 64-bit features:
- RIP-relative addressing. It can't be explicitly enabled, because this addressing mode is not available in 32-bit mode. There is a trick to get the current EIP, but we won't use it, for simplicity. Besides, an assembler can choose this addressing mode regardless of the syntax.
- 64-bit operands. These could be enabled only by using complicated transformations in 32-bit code. For simplicity, they are not enabled (they work only for pointers, see the first rule).
- New registers r8 to r15 and xmm8 to xmm15. These can be enabled, if we assume that a PFS implementation supports creation of threads on its own and can manage the issues connected with emulation of these registers. Additionally, at least one original general register must be reserved for the transformation of instructions like MOV EAX, [R8] to 32-bit code (pseudocode):
    ; r8d_reg is emulated register r8d
    ; tmp_reg is a general register, reserved for PFS
    mov tmp_reg, [r8d_reg]
    mov eax, [tmp_reg]
This implies that if we want to enable addressing like MOV EAX, [R8+R9*4], another reserved register is necessary:
    mov tmp_reg1, [r8d_reg]
    mov tmp_reg2, [r9d_reg]
    mov eax, [tmp_reg1+tmp_reg2*4]
The first candidate for a reserved register that comes to mind is rBX. Among everyday instructions, this register is used implicitly only by XLAT, which is rarely used and can be easily replaced.
It gets worse for the other reserved register, because all the remaining ones have a specific meaning in some instructions. Now, we need to realize that some instructions have the same syntax in both modes. If we add the no-64-bit-operands rule, it appears that, for example, the REP STOSD instruction has the same syntax within PFS too. If we make clear which circumstances lead to the use (rewriting) of the reserved registers, we can reserve the rCX register as well. It is difficult to replace only in the case of the REP prefix family. With this register reserved, we can still use code like this within PFS:
    xor eax, eax      ; (or mov eax, 0 for those who don't like this ;-))
    lea rdi, [buffer]
    mov ecx, [rdx]    ; beware of any new registers (r8, ...)
    rep stosd
Still, not all operand combinations are solved. The most complicated one is MOV [R8+R9], R10. The ideal solution would be another reserved register:
    mov tmp_reg1, [r8d_reg]
    mov tmp_reg2, [r9d_reg]
    mov tmp_reg3, [r10d_reg]
    mov [tmp_reg1+tmp_reg2], tmp_reg3
However, we can't allocate another one; three reserved registers would be too many. We can work around it this way instead:
    mov tmp_reg1, [r8d_reg]
    mov tmp_reg2, [r9d_reg]
    lea tmp_reg1, [tmp_reg1+tmp_reg2] ; release tmp_reg2
    mov tmp_reg2, [r10d_reg]
    mov [tmp_reg1], tmp_reg2
None of the XMM registers is hardcoded, so we can reserve, for example, xmm7.
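The trick for reading the current EIP in 32-bit mode, mentioned above, is commonly done with a CALL/POP pair. A minimal sketch for ML.EXE follows; the procedure name get_eip and the label next_insn are made up for this illustration and are not part of PFS or PFF.

```asm
.686
.MODEL FLAT, STDCALL

.CODE
; get_eip (hypothetical helper): returns in EAX the address of next_insn
get_eip PROC
    call next_insn      ; CALL pushes the address of the following instruction
next_insn:
    pop  eax            ; EAX = address of next_insn, i.e. the current EIP
    ret                 ; the stack is balanced again, so RET returns normally
get_eip ENDP
END
```

An address relative to the instruction pointer could then be formed by adding a displacement to EAX, at the cost of extra instructions and one more clobbered register.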
All rules are now given. They can be recapitulated and specified as follows:
PFS is similar to 64-bit code syntax, with these differences:
- Pointers stored in memory must be declared with the abstract type PVOID. This type manages portability of pointers.
- Do not use the PUSH and POP instructions explicitly. The only exception is the PUSH const instruction, where const is below or equal to FFFFFFFFh.
- Treat the eBX, eCX and xmm7 registers as reserved (see the next rule).
- If one of the new general registers is used, the ebx register is rewritten (destroyed). If one of those registers is additionally used as an index, the ecx register is rewritten. If one of the new XMM registers is used, the xmm7 register is rewritten. In all other cases, it is guaranteed that these registers will remain the same.
An example of most of these rules:
    port_base  PVOID ?
    port_index DWORD ?
    ...
    mov [port_base], rax
    mov r8, [port_base]
    mov r9d, [port_index]      ; acts as movzx in 64-bit mode
    add r10w, [buffer+r8+r9*2]
The 32-bit transformation would look like this:
    mov [port_base], eax        ; mov [port_base], rax
    mov ebx, [port_base]        ; mov r8, [port_base]
    mov [r8d_reg], ebx
    mov ebx, [port_index]       ; mov r9d, [port_index]
    mov [r9d_reg], ebx
    mov ebx, [r8d_reg]          ; add r10w, [buffer+r8+r9*2]
    mov ecx, [r9d_reg]
    mov bx, [buffer+ebx+ecx*2]
    add word ptr [r10d_reg], bx
To bring this syntax to life, here is a description of PFF, created using MASM macros.
Since I'm not interested in a complete framework, the sample is just a basic demo, which includes only the portable code of the primary thread (only a few instructions). For compilation, I use ML.EXE and ML64.EXE 8.00.50727.42 (shipped with Visual Studio 2005).
This is what the command lines look like:
    ml /c /Cp /Fl /DPFF32 demo.asm
    link /SUBSYSTEM:WINDOWS /entry:main demo.obj

    ml64 /c /Cp /Fl demo.asm
    link /SUBSYSTEM:WINDOWS /entry:main demo.obj
Instructions like mov@ and similar are macros, which manage the portability.
    ;
    IFDEF PFF32 ; add the header only for 32-bit code
        .686
        .MODEL FLAT, STDCALL
    ENDIF

    include pff.asm

    .DATA?
    port_base   PVOID ?
    port_index  DWORD ?

    .DATA
    buffer      WORD 5 DUP (20h)

    .CODE
    main PROC
        lea   rax, [buffer]       ; acts the same in both modes
        mov@  [port_base], rax
        mov@  [port_index], 1
        mov@  r8, [port_base]
        mov@  r9d, [port_index]   ; (acts as movzx in 64-bit mode)
        mov@  r11w, 2
        add@  [r8+r9*2], r11w
        mov@  r12w, 22h
        cmp@  r12w, [r8+r9*2]
        jne   main
    main ENDP
    END
    ;
    ;
    IFDEF PFF32
        PVOID TYPEDEF DWORD
    ELSE
        PVOID TYPEDEF QWORD
    ENDIF

    IFDEF PFF32

    ; init: no temp register is being used
    PFF_EBX = 0
    PFF_ECX = 0

    ; 64-bit general registers (which may hold only a pointer) are simply
    ; EQUated to 32-bit ones for 32-bit mode
    rax TEXTEQU <eax>
    ;...

    ; usage of any of new general registers causes calling of pff_r macro,
    ; which move the emulated value into free reserved register (eBX or eCX)
    r8   EQU <pff_r (r8, d)>
    r9   EQU <pff_r (r9, d)>
    ;...
    r9d  EQU <pff_r (r9, d)>
    ;...
    r11w EQU <pff_r (r11, w)>
    r12w EQU <pff_r (r12, w)>
    ;...

    ; set registers mapping to reserved registers so it is possible to test
    ; whether a register is emulated or not
    rax_mapping  TEXTEQU <>
    ;...
    r8_mapping   TEXTEQU <ebx>
    r9_mapping   TEXTEQU <ebx>
    ;...
    r9d_mapping  TEXTEQU <ebx>
    ;...
    r11w_mapping TEXTEQU <bx>
    r12w_mapping TEXTEQU <bx>
    ;...

    ; Macro pff_get_tmp_r
    ;
    ; This macro returns appropriate reserved register, which would be currently
    ; used with given emulated register
    ;
    ; If no reserved register is available, macro returns blank string.
    ;
    ; Input:
    ;   regex   emulated register name with "_" postfix
    pff_get_tmp_r MACRO regex:REQ
        LOCAL postfix
        IF PFF_EBX AND PFF_ECX
            EXITM <>                    ; no reserved register available
        ENDIF
        postfix SUBSTR <regex>, @SizeStr (regex) - 1
        %IFIDN <postfix>, <d_>          ; dword register
            IFE PFF_EBX
                EXITM <ebx>
            ELSE
                EXITM <ecx>
            ENDIF
        %ELSEIFIDN <postfix>, <w_>      ; word
            IFE PFF_EBX
                EXITM <bx>
            ELSE
                EXITM <cx>
            ENDIF
        %ELSEIFIDN <postfix>, <b_>      ; byte
            IFE PFF_EBX
                EXITM <bl>
            ELSE
                EXITM <cl>
            ENDIF
        ELSE                            ; qword
            IFE PFF_EBX
                EXITM <ebx>
            ELSE
                EXITM <ecx>
            ENDIF
        ENDIF
    ENDM

    ; Macro pff_r
    ;
    ; This macro moves the value of an emulated register to free reserved
    ; register and returns the register.
    ;
    ; Input:
    ;   regex   emulated register name
    ;   size    emulated register size
    pff_r MACRO regex:REQ, size:REQ
        IFIDNI <size>, <b>
            IFE PFF_EBX
                PFF_EBX = 1
                mov bl, pff.global&regex&size
                EXITM <bl>
            ELSE
                PFF_ECX = 1
                mov cl, pff.global&regex&size
                EXITM <cl>
            ENDIF
        ELSEIFIDNI <size>, <w>
            IFE PFF_EBX
                PFF_EBX = 1
                mov bx, pff.global&regex&size
                EXITM <bx>
            ELSE
                PFF_ECX = 1
                mov cx, pff.global&regex&size
                EXITM <cx>
            ENDIF
        ELSE
            IFE PFF_EBX
                PFF_EBX = 1
                mov ebx, pff.global&regex&size
                EXITM <ebx>
            ELSE
                PFF_ECX = 1
                mov ecx, pff.global&regex&size
                EXITM <ecx>
            ENDIF
        ENDIF
    ENDM

    ; Macro pff_meta
    ;
    ; This macro provides the facility for two-operand instructions.
    ;
    ; Input:
    ;   type    type of operation: read/write
    ;   op      the operation itself (mov, add, cmp, test, ...)
    ;   op1     destination operand
    ;   op2     source operand
    pff_meta MACRO type:REQ, op:REQ, op1:REQ, op2:REQ
        LOCAL src, dst
        LOCAL tmp
        ; add "_" to prevent expansion of possible emulated register
        IFE @InStr (, op1&_, <[>)       ; destination operand is not a memory location
            ; if the destination is an emulated register...
            tmp TEXTEQU op1&_mapping
            %IFNB <tmp>
                ; if the source is a memory location, load it first to tmp register
                IF @InStr (, op2&_, <[>)    ; source operand is a memory location
                    mov op1&_mapping, op2
                    PFF_ECX = 0         ; now, the second reserved register can be used
                    ; don't load current emulated value if the operation is MOV
                    IFDIF <op>, <mov>
                        dst TEXTEQU op1
                    ELSE
                        dst TEXTEQU pff_get_tmp_r (op1&_)
                    ENDIF
                    ; don't perform MOV operation since it is unnecessary in this case
                    ; (MOV is here actually performed by the the former and the latter MOV)
                    IFDIF <op>, <mov>
                        op dst, op1&_mapping
                    ENDIF
                    IFIDN <type>, <write>   ; if mov, add, etc., write it back
                        mov [pff.global&op1&], dst
                    ENDIF
                ELSE                    ; source operand is not a memory location
                    IFDIF <op>, <mov>
                        dst TEXTEQU op1
                    ELSE
                        dst TEXTEQU pff_get_tmp_r (op1&_)
                    ENDIF
                    op dst, op2
                    IFIDN <type>, <write>
                        mov [pff.global&op1&], dst
                    ENDIF
                ENDIF
            ELSE
                op op1, op2
            ENDIF
        ELSE                            ; op1 is a memory location
            IFNDEF op2&_mapping         ; catch immediate source operand
                tmp TEXTEQU <>
            ELSE
                tmp TEXTEQU op2&_mapping
            ENDIF
            %IFNB <tmp>                 ; source is an emulated register
                ; if the destination is a memory location, load first its address
                ; to tmp register
                IF @InStr (, op1&_, <[>)    ; destination operand is a memory location
                    lea ebx, op1
                    PFF_ECX = 0         ; now, the second reserved register can be used
                    src TEXTEQU op2     ; load current emulated value
                    op [ebx], src
                ELSE                    ; destination operand is not a memory location
                    op op1, op2&_mapping
                ENDIF
            ELSE
                op op1, op2
            ENDIF
        ENDIF
        ; set both temp registers as unused
        PFF_EBX = 0
        PFF_ECX = 0
    ENDM

    ; Macros supplying original instructions
    mov@ MACRO op1:REQ, op2:REQ
        pff_meta write, mov, op1, op2
    ENDM
    add@ MACRO op1:REQ, op2:REQ
        pff_meta write, add, op1, op2
    ENDM
    cmp@ MACRO op1:REQ, op2:REQ
        pff_meta read, cmp, op1, op2
    ENDM

    ; Internal macro pff_global_r
    ;
    ; This internal macro is just used to declare global memory space for
    ; emulated registers; see PFF struct
    pff_global_r MACRO regex:REQ
        UNION
            global&regex    DWORD ?
            global&regex&d  DWORD ?
            global&regex&w  WORD ?
            global&regex&b  BYTE ?
        ENDS
    ENDM

    PFF STRUCT
        pff_global_r r8
        pff_global_r r9
        pff_global_r r10
        pff_global_r r11
        pff_global_r r12
        pff_global_r r13
        pff_global_r r14
        pff_global_r r15
    PFF ENDS

    ELSE ; IF PFF32

    mov@ TEXTEQU <mov>
    add@ TEXTEQU <add>
    cmp@ TEXTEQU <cmp>

    ENDIF

    .DATA?
    IFDEF PFF32
        pff PFF <> ; reserve space for emulated registers
    ENDIF
    ;
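Further instructions could presumably be supplied the same way as mov@, add@ and cmp@. The sketch below adds sub@ and test@; both names are hypothetical and do not exist in pff.asm. It is untested, and it assumes that pff_meta handles these operations exactly as it handles add and cmp (its header comment lists test among the supported operations). The fragment would have to be placed after the definition of pff_meta.

```asm
IFDEF PFF32
    ; hypothetical additions, modelled on the existing add@ and cmp@ macros
    sub@ MACRO op1:REQ, op2:REQ
        pff_meta write, sub, op1, op2   ; SUB modifies its destination: "write"
    ENDM
    test@ MACRO op1:REQ, op2:REQ
        pff_meta read, test, op1, op2   ; TEST only sets flags: "read"
    ENDM
ELSE
    ; in 64-bit mode the macros collapse to the plain instructions
    sub@  TEXTEQU <sub>
    test@ TEXTEQU <test>
ENDIF
```

The read/write argument mirrors the existing split: a "write" operation stores the result back to the emulated register's memory slot, while a "read" operation does not.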
The resulting 64-bit code is equivalent to the source:
    main:
        lea rax, [402000h]
        mov [402010h], rax
        mov dword ptr [402018h], 1
        mov r8, [402010h]
        mov r9d, [402018h]
        mov r11w, 2
        add [r8+r9*2], r11w
        mov r12w, 22h
        cmp r12w, [r8+r9*2]
        jne main
The resulting 32-bit code has been edited by hand to make it clearer:
    main:
        ; lea rax, [buffer]
        lea eax, [402000]
        ; mov@ [port_base], rax
        mov [402030], eax
        ; mov@ [port_index], 1
        mov dword ptr [402034], 1
        ; mov@ r8, [port_base]
        mov ebx, [402030]
        mov [402010], ebx
        ; mov@ r9d, [port_index]
        mov ebx, [402034]
        mov [402014], ebx
        ; mov@ r11w, 2
        mov bx, 2               ; unnecessary, don't care
        mov [40201C], bx
        ; add@ [r8+r9*2], r11w
        mov ebx, [402010]
        mov ecx, [402014]
        lea ebx, [ebx+ecx*2]
        mov cx, [40201C]
        add [ebx], cx
        ; mov@ r12w, 22h
        mov bx, 22              ; unnecessary, don't care
        mov [402020], bx
        ; cmp@ r12w, [r8+r9*2]
        mov ebx, [402010]
        mov ecx, [402014]
        mov bx, [ebx+ecx*2]
        mov cx, [402020]        ; unnecessary, don't care
        cmp cx, bx
        jnz main
pff.asm, PFF macros.
demo.asm, PFF demo.
result64.lst, resulting 64-bit code listing.
result32.lst, resulting 32-bit code listing.
compile64.bat, a batch for 64-bit compilation.
compile32.bat, a batch for 32-bit compilation.
x86-64 Tour of Intel Manuals: Summary of new x64 features, as served by Intel manuals
Writing 64-bit programs by Jeremy Gordon
Microsoft Macro Assembler Reference, MASM for x64 (ml64.exe)
2007-04-18 | 1.0 | First public version | MazeGen |
(date format corresponds to ISO 8601)