vid, 2007-09-29 | Revision: 1.0 |
I often get into arguments with various Linux guys about AT&T versus Intel syntax. There are many things I dislike about AT&T syntax, so I decided to write them all down in this article.
This article is specifically about the most often used AT&T-syntax assembler, the GNU Assembler (GAS). GAS implements "AT&T syntax" and adds its own extensions as technology evolves. I will discuss GAS-specific extensions in this article too.
Testing for this article was done with MinGW GAS version 2.17.50 20060824.
The order of instruction arguments is the first point both sides bring up in a discussion. GAS uses "source, dest" order of arguments, while Intel uses "dest, source".
GAS proponents argue that "source, dest" is more readable and more logical.
The first thing you will always hear is that movl %eax, %ebx in GAS reads nicely as "move register eax to register ebx". The equivalent in Intel syntax, mov ebx, eax, is trickier to read. Okay, I agree that in this one case, when transformed to human language, GAS's mov instruction reads better. But there are much more important problems with reading assembly than mov.
A very basic counterexample to mov are instructions like xor, cmp, and, test, etc. These read better in Intel syntax:
xor eax, 10 ;bitwise xor register eax with 10
and [var], 0xFF ;bitwise and var with 0xFF
cmp eax, 10 ;compare eax to 10
GAS equivalents are not so readable:
xor $10, %eax
and $0xFF, var
cmp $10, %eax
This alone is enough for me to outweigh the advantage GAS has with mov, add, and others.
Also, readability for me doesn't mean how easily an instruction is transformed into human language; readability means how easily the purpose of the instruction is understood. That makes a difference.
A more serious problem with GAS syntax is the mnemonics of conditional jumps. Intel is the one who designed the processors and decided on the instruction mnemonics, and it chose these mnemonics for Intel syntax, not for AT&T syntax. This seriously hurts the readability of GAS code with conditional instructions:
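cmp eax, 10 ;compare eax to 10
jg greater ;jump if eax is greater than 10
cmovl eax, ebx ;move if eax is less than 10
In GAS:
cmp $10, %eax
jg greater
cmovl %ebx, %eax
Here the comparison reads as "compare 10 to eax", yet jg still jumps when eax is greater than 10, not when 10 is greater than eax.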
Another problem with GAS, which we will see numerous times in this article, is that GAS keeps compatibility with ancient standards, which were often bad decisions. Keeping backwards compatibility is of course important for it, otherwise a lot of old code would have to be rewritten. But programmers can decide to use newer and better tools.
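Regarding the order of arguments, this causes a problem with FPU instructions. In certain cases (non-commutative arithmetic instructions with two register operands), FPU instructions in GAS do not follow the "source, dest" convention and behave as if the operands were written the other way round. Even many proponents of GAS consider this a bug; see "AT&T Syntax bugs" in the GAS documentation as an example.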
The discussion about which of "dest, source" or "source, dest" is more logical extends beyond the area of assemblers. There are languages using both ways, but "dest, source" languages strongly prevail. Consider C:
a = 5;
a += 10;
There are some rare languages that use the other way:
5 = a
5 += a
But I can't in any way find the latter more logical, especially if there is more than one source operand:
a = (x + y) / 2
imul eax, ebx, 10 ;eax = ebx*10
Versus:
(x + y) / 2 = a
imull $10, %ebx, %eax #eax = ebx*10
The way I see it, the first information you get is "what am I working with", and only then do you get to the more complex "what am I doing with it". Otherwise, you must first "parse" through all the things you are doing to the value, and only then do you find out what you were doing all those things with. It is not as big a problem in assembly, but still…
Memory addressing is in my opinion one of the worst problems of GAS. How it works:
If an argument of an instruction has no special marker (such as % for a register or $ for a numeric constant), then it is a memory access. So the following:
movl 10, %eax
movl foo, %eax
Corresponds to Intel syntax:
mov eax, [10]
mov eax, [foo]
To use a numeric constant, or the address of a label, there is the $ operator:
movl $10, %eax
movl $foo, %eax
In Intel syntax:
mov eax, 10
mov eax, offset foo
Note that in NASM-style syntax, the last instruction (getting the offset of foo) is
mov eax, foo
This is my preferred style; it makes the syntax most clear and unambiguous.
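For readers more at home in C, the difference between the two forms can be sketched roughly like this. It is only an analogy under my own assumptions: the variable foo and the helpers load_value and load_address are hypothetical names, and a C compiler is free to emit different instructions than the ones mentioned in the comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variable standing in for the label foo. */
static int32_t foo = 42;

/* Reads the contents of memory at foo.
   Roughly: movl foo, %eax (GAS), mov eax, [foo] (Intel). */
int32_t load_value(void)
{
    return foo;
}

/* Takes the address of foo without reading memory.
   Roughly: movl $foo, %eax (GAS), mov eax, offset foo (MASM-style Intel). */
int32_t *load_address(void)
{
    return &foo;
}
```

In C the "address of" case is the one that needs an extra marker (&), which is the same choice NASM-style brackets make from the other direction: the memory access is the marked case.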
One more minor GAS problem, with 16-bit addressing, is that I wasn't able to force 32-bit addressing with an immediate address in 16-bit code.
Another minor thing is that you have to keep the order (%bx, %si) or (%bx, %di); you cannot use the other order of arguments. It makes sense for AT&T syntax, but may be annoying, especially if you use the registers in the other sense (e.g., SI is base and BX is index).
In Intel syntax, full 32-bit addressing looks like this:
push segment:[base + scale*index + displacement]
For example:
push fs:[table + 8*ecx + ebx]
push gs:[8*eax + 4]
Note. Some Intel-syntax-derived assemblers use this form:
push [segment : base + scale*index + displacement]
push [fs:0]
Note. Some assemblers can automatically create "base + scale*index" from a single "value*register", like the following. I am not a fan of this feature, but some people do like it.
lea eax, [9*eax] ;what you write
lea eax, [eax + 8*eax] ;what the assembler encodes
In AT&T syntax, full x86-32 addressing is written as:
segment:displacement(base, index, scale)
For example:
pushl %fs:table(%ebx, %ecx, 8)
pushl %gs:4(,%eax,8)
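Both notations describe the same arithmetic, only with the parts written in a different order. As a minimal sketch (effective_offset is a hypothetical helper of mine, and it deliberately leaves out the segment base, which the CPU adds afterwards), the offset within the segment is computed like this:

```c
#include <assert.h>
#include <stdint.h>

/* Offset computed by a 32-bit memory operand (segment base not included).
   Intel: [base + scale*index + displacement]
   AT&T:  displacement(base, index, scale)
   scale must be 1, 2, 4 or 8; the arithmetic wraps modulo 2^32. */
uint32_t effective_offset(uint32_t displacement, uint32_t base,
                          uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return displacement + base + index * scale;
}
```

For example, with eax holding 3, pushl %gs:4(,%eax,8) reads from offset effective_offset(4, 0, 3, 8), which is 28, within the gs segment. An omitted base or index simply contributes 0.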
A more complicated example follows. table is an array of ITEM structures. sizeof.ITEM holds the size of the ITEM structure, and ITEM.foo holds the offset of member foo within structure ITEM. The following code gets member foo of the ITEM structure that is at index EBX in array table (i.e., ax = table[ebx].foo in C).
sizeof.ITEM = 8
ITEM.foo = 2
mov ax, [table + sizeof.ITEM*ebx + ITEM.foo]
In GAS:
mov table+ITEM.foo(,%ebx,sizeof.ITEM), %ax
I think Intel-style syntax clearly wins in readability here.
Also, there is the same problem as with 16-bit mode: I wasn't able to force 16-bit addressing in 32-bit code.
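To make the table[ebx].foo example concrete, here is a small C sketch of the same address arithmetic. The struct layout is my own assumption, chosen only so that the size is 8 and foo sits at offset 2 as in the example; the members first and rest and the helper get_foo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout chosen so that sizeof.ITEM = 8 and ITEM.foo = 2. */
struct ITEM {
    uint16_t first;   /* offset 0 (assumed member) */
    uint16_t foo;     /* offset 2 */
    uint32_t rest;    /* offset 4 (assumed member) */
};

/* Fail at compile time if the ABI lays the struct out differently. */
_Static_assert(sizeof(struct ITEM) == 8, "sizeof.ITEM must be 8");
_Static_assert(offsetof(struct ITEM, foo) == 2, "ITEM.foo must be 2");

/* Same arithmetic as [table + sizeof.ITEM*ebx + ITEM.foo]. */
uint16_t get_foo(const struct ITEM *table, size_t index)
{
    const unsigned char *p = (const unsigned char *)table
                           + sizeof(struct ITEM) * index
                           + offsetof(struct ITEM, foo);
    uint16_t value;
    memcpy(&value, p, sizeof value);   /* 16-bit load, like mov ax, [...] */
    return value;
}
```

Written this way, the Intel form maps onto the expression term by term, while the GAS form moves the constant part in front of the parentheses and the scale to the end.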
Adding support for the AMD64 architecture caused great trouble in all assemblers.
AMD64 provides RIP-relative addressing. With this addressing, you can write position-independent code very easily. This addressing is what you want 99.9% of the time when writing 64-bit code. Using absolute addresses is still possible, but pretty much limited to a 4GB address space, and they require relocations. The only exceptions are the mov instructions (opcodes A0h-A3h), where absolute 64-bit addressing is available.
Some assemblers (YASM, GAS) still use absolute addresses by default in 64-bit mode, and they require explicit notation for RIP-relative addressing. This makes writing position-independent code a pain.
mov variable(%rip), %eax
Other assemblers (FASM, MASM) use RIP-relative addressing by default. MASM doesn't provide any way to use absolute addressing, while FASM does offer a way to use it in the rare cases when it is needed.
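What makes RIP-relative addressing position-independent is that the instruction stores only a signed 32-bit distance, counted from the address of the following instruction. A minimal sketch of that calculation (rip_relative_target is a hypothetical helper, and the addresses used with it are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand: the signed 32-bit
   displacement is added to the address of the NEXT instruction. */
uint64_t rip_relative_target(uint64_t next_instruction, int32_t disp32)
{
    /* Sign-extend the displacement to 64 bits before adding. */
    return next_instruction + (uint64_t)(int64_t)disp32;
}
```

If the code and its data are moved together, next_instruction and the target shift by the same amount, so the encoded displacement stays valid and no relocation is needed.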
As a minor problem, GAS doesn't provide some exotic 64-bit addressing modes. These are not really needed, but would be nice for the sake of completeness of the assembler.
One more thing that I dislike about GAS is its lazy (easy to parse) syntax. I understand that it was useful a few decades ago, when parsing a language had to be as simple as possible to save expensive memory. Nowadays this isn't an issue, especially for an assembler. Unfortunately, GAS inherited this syntax.
Having % before every register name is reasonable, to separate it from a label with the same name as a register. This is a good approach for GAS as a back-end for gcc compilers. But for hand-written assembly code, having to type the % character before every register is a little bit annoying. Registers are used much more often than variables in assembly. Some other assemblers solve this issue in a different way: MASM (and all MS tools) decorate every name with an underscore, and FASM allows giving a symbol a "global" name in the object that differs from its private name. As far as I know, NASM doesn't solve this issue.
Using special notation for memory addressing makes more sense in handwritten assembly code than using special notation for registers. And for compilers, it doesn't matter.
Using $ is similar to the offset operator in MASM/TASM-style assemblers. However, GAS has less ambiguity here than MASM, because even constant values are treated as memory addresses unless prefixed with $. But still, NASM/FASM syntax is in my opinion easier to understand and easier to write.
One thing that is no longer an issue is specifying operand size in mnemonics, like movl, cmpw, etc. GAS can now deduce the operand size from the registers used, and in case there is neither a register nor an explicit size in the mnemonic, it throws an error.
A feature that I miss in GAS is the ability to assign a size to a label. In many other assemblers you can assign a size to a label, and if that label is used as an address and no explicit size is given for the instruction, the assembler uses the size associated with this label:
var1 dd 10
label var2 dword
mov [var1], 10
cmp [var2], 0
GAS can only output object files. It is unable to create a pure binary file. It is still possible to produce pure binary output using linker scripts, but doing it this way has several limitations. These limitations are imposed by the object format in which the code produced by GAS is stored.
All addresses in an object format must be relocatable, even though in the resulting binary they will be constant. GAS doesn't know about this, and has to treat all addresses as relocatable. That makes it impossible to do things like:
lea eax, [variable and (not 0xFFF)] ;get address of page where a variable is
because there is no relocation for and (not 0xFFF) in ELF objects.
This is a minor limitation, but it is still a little annoying, and this problem is solved in other assemblers with pure binary output (NASM, YASM, FASM).
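The calculation itself is trivial once the final address is known, which is exactly what an assembler with pure binary output can rely on. As a sketch, assuming 4 KiB pages (that is what the 0xFFF mask implies) and using a hypothetical helper page_base:

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 4 KiB pages, matching the 0xFFF mask in the example. */
#define PAGE_MASK ((uintptr_t)0xFFF)

/* variable and (not 0xFFF): clear the low 12 bits of the address. */
uintptr_t page_base(uintptr_t address)
{
    return address & ~PAGE_MASK;
}
```

A relocation in an object file can, as far as I know, only express things like "symbol address plus a constant", so a masking step like this cannot be postponed to the linker.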
A very annoying thing in GAS is that it treats all undefined symbols as external dependencies. That means if you mistype some name, your program assembles fine, and you only find the error later during linking, having to go back to the assembly source.
If you are unlucky, you may happen to have a symbol with that name defined in another module; linking will succeed too, and you have a nasty bug to look for. If GAS didn't behave this way, you could catch such a bug immediately.
As for the other assemblers I know, only FASM and MASM solve this issue properly. See my External Dependencies in Assemblers article for more details.
In my view, GAS is too low-level an assembler. It is fine as a backend for compilers, but nowadays it doesn't matter at all to a compiler how low-level its backend is. For a human, an assembler can behave more nicely than GAS does, without losing any control or simplicity.
Another problem of GAS is that it still keeps very old standards, not all of which are the best choice. There are newer and better assemblers to use, even though they are less standard.
Note. This article is only a first version, and I am no GAS expert. I believe GAS proponents will reply, explain how to do things I wasn't able to do with GAS, and correct possible mistakes.
GAS is part of binutils package.
http://www.x86-64.org/, resources and discussion on 64-bit programming with GAS.
You can contact the author using e-mail vid@x86asm.net.
2007-09-29 | 1.0 | First public version | vid |
(dates format correspond to ISO 8601)