It's helpful to realize x86 assembly is not what's executed by the machine; machine code is. One assembly instruction, e.g. ADDL, is translated to several different machine code instructions depending on the destination, source, and addressing mode.
I'm looking at the Microsoft Macro Assembler 5.1 Reference manual (it was nearby and easily accessible to me; yes, it's old (from the very late 80s or early 90s), but it covers the 32-bit 80386, which is still valid).
Anyway, it shows three different encodings for the ADD instruction. The first:
000000dw mod,reg,r/m
This adds register to register, or memory to register (either direction, the d above) using either 8 or 16/32 bits (the w above [1]). The second form:
100000sw mod,000,r/m
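This adds an immediate value (8 or 16/32 bits, w again) to a register or memory operand. The s bit says how the immediate is stored: with s=1 it is a single byte that gets sign-extended to the operand size; with s=0 it is stored at the full operand size [2]. The final form: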
0000010w data
This adds an immediate value to the accumulator register (EAX, AX, AL) [1]. That's three different encodings for the "same" instruction. The MOV instruction (and again, I'm only talking about the 80386 here) has 8 different encodings, depending upon registers used.
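To make the three bit patterns concrete, here is a minimal Python sketch that packs them by hand. It assumes a 32-bit code segment (so w=1 means 32 bits), register-direct operands only (mod=11), and helper names (modrm, add_reg_reg, add_reg_imm, add_acc_imm) that I made up for illustration; it is not how MASM does it internally.

```python
# Register numbers used in the reg and r/m fields (32-bit registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack the mod (2 bits), reg (3 bits) and r/m (3 bits) fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

def add_reg_reg(dst, src, d=0):
    """Form 1: 000000dw mod,reg,r/m with w=1 and mod=11 (register direct)."""
    opcode = 0b00000000 | (d << 1) | 1
    if d == 0:
        # reg field is the source, r/m field is the destination
        return bytes([opcode, modrm(0b11, REG32[src], REG32[dst])])
    # d == 1: reg field is the destination, r/m field is the source
    return bytes([opcode, modrm(0b11, REG32[dst], REG32[src])])

def add_reg_imm(dst, imm, s=0):
    """Form 2: 100000sw mod,000,r/m with w=1 and mod=11."""
    opcode = 0b10000000 | (s << 1) | 1
    size = 1 if s else 4  # s=1: one immediate byte, sign-extended by the CPU
    return bytes([opcode, modrm(0b11, 0b000, REG32[dst])]) + imm.to_bytes(size, "little", signed=True)

def add_acc_imm(imm):
    """Form 3: 0000010w data with w=1 (the destination is implicitly EAX)."""
    return bytes([0b00000101]) + imm.to_bytes(4, "little", signed=True)

print(add_reg_reg("eax", "ecx").hex())       # 01c8
print(add_reg_reg("eax", "ecx", d=1).hex())  # 03c1
print(add_reg_imm("eax", 1, s=1).hex())      # 83c001
print(add_reg_imm("eax", 1).hex())           # 81c001000000
print(add_acc_imm(1).hex())                  # 0501000000
```

The last three lines are all "add eax, 1", at three, six and five bytes respectively, which is why an assembler that cares about size would normally pick the s=1 form for small immediates.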
[1] If the current code segment is designated as a 16-bit segment, then the w means 16 bits, unless a size override byte (an opcode prefix byte) is present, in which case it means 32 bits. If the current code segment is designated as a 32-bit segment, then the w means 32 bits, again unless a size override byte is present, in which case it means 16 bits.
[2] It seems to me that if w=0, then the s bit is extraneous (an 8-bit immediate needs no extension for an 8-bit add) and thus could be used to encode other instructions. I'm not sure if that is the case, but it's common to use an otherwise nonsensical instruction encoding to do something useful.
> It seems to me that if w=0, then the s bit is extraneous (an 8-bit immediate needs no extension for an 8-bit add) and thus could be used to encode other instructions. I'm not sure if that is the case, but it's common to use an otherwise nonsensical instruction encoding to do something useful.
Opcode 82h is an alias for 80h --- it presumably sign-extends the immediate value into an internal temporary register, but the upper bits don't matter anyway since it's an 8-bit add. Some interesting discussion on that here, along with an example application:
There's a bit of a miscommunication going on. What I meant was, when you write an assembly instruction, that maps to one machine instruction.
I.e., because of things like addressing modes, different invocations of an ADD instruction can map to different machine instructions. But one ADD invocation will always map to one machine instruction.
The parent comment sounded to me like one assembly instruction could map to several machine instructions, like one line of C is equivalent to several lines of assembly. Just wanted to clarify that that isn't the case.
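A toy way to picture "one mnemonic, one encoding chosen from a set" is a selection function like the Python sketch below. The (kind, value) operand tuples and the function name choose_add_encoding are hypothetical, and the rules are deliberately simplified: a real assembler also weighs immediate size and may prefer a shorter form.

```python
def choose_add_encoding(dst, src):
    """Return the bit pattern of the single ADD form this toy assembler picks.

    Operands are hypothetical (kind, value) tuples such as
    ("reg", "eax"), ("mem", "[ebx]") or ("imm", 5).
    """
    dst_kind, dst_val = dst
    src_kind, _ = src
    if src_kind == "imm":
        if dst_kind == "reg" and dst_val in ("al", "ax", "eax"):
            return "0000010w data"         # immediate to accumulator
        return "100000sw mod,000,r/m"      # immediate to register or memory
    if dst_kind == "mem" and src_kind == "mem":
        # x86 ADD has no memory-to-memory form
        raise ValueError("ADD cannot take two memory operands")
    return "000000dw mod,reg,r/m"          # register/memory forms

# Three invocations of the same mnemonic, three different forms,
# but each invocation resolves to exactly one of them.
print(choose_add_encoding(("reg", "ebx"), ("reg", "ecx")))
print(choose_add_encoding(("mem", "[ebx]"), ("imm", 5)))
print(choose_add_encoding(("reg", "eax"), ("imm", 5)))
```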
> There's a bit of a miscommunication going on. What I meant was, when you write an assembly instruction, that maps to one machine instruction.
That's not true on x86-16, x86-32 and x86-64. For example:
060o, 310o
and
062o, 301o
(...o means "octal"; for the reason why I give this example in octal instead of hexadecimal, cf. https://news.ycombinator.com/item?id=13051770) both stand for "xor al, cl" (the assembler you use will pick one of the two encodings). For those who really prefer hexadecimal, this corresponds to
30h, C8h
and
32h, C1h
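The two byte pairs above can be reproduced with a short Python sketch. The function name xor_reg8_encodings and the small register table are my own, and it only handles register-to-register operands (mod=11).

```python
# Register numbers for a few 8-bit registers (reg and r/m fields).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}

def xor_reg8_encodings(dst, src):
    """Return both register-direct encodings of `xor dst, src` for 8-bit registers."""
    d, s = REG8[dst], REG8[src]
    # Opcode 30h (direction bit 0): reg field = source, r/m field = destination.
    form_a = bytes([0x30, 0xC0 | (s << 3) | d])
    # Opcode 32h (direction bit 1): reg field = destination, r/m field = source.
    form_b = bytes([0x32, 0xC0 | (d << 3) | s])
    return form_a, form_b

for form in xor_reg8_encodings("al", "cl"):
    octal = ", ".join(f"{b:03o}o" for b in form)
    hexa = ", ".join(f"{b:02X}h" for b in form)
    print(octal, "=", hexa)
# 060o, 310o = 30h, C8h
# 062o, 301o = 32h, C1h
```

In octal the three ModRM fields (2, 3 and 3 bits) line up with the three digits, so 310o reads as mod=3, reg=1 (cl), r/m=0 (al), and 301o as mod=3, reg=0 (al), r/m=1 (cl).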
The fact that there are different ways to encode some instructions was used by the A86 assembler (https://en.wikipedia.org/w/index.php?title=A86_(software)&ol...) to watermark machine code that was generated by it; in particular to detect whether it was generated by a registered or unregistered version of A86:
"The assembler automatically embeds a "fingerprint" into the generated code through a particular choice of functionally equivalent instruction encodings. This makes it possible to tell if code was assembled with A86, and also to distinguish between registered and unregistered versions of the assembler, although access to the source code is required."
That is also true --- for example, "mov reg, reg" is a special case of "mov reg, r/m" or "mov r/m, reg" with the r/m specifying a register, so basically two separate sequences of bytes which perform the same operation. This has been exploited by copy-protection and steganography, going back to the A86 shareware assembler which was the first use of this technique that I can remember, to more recent developments:
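As a rough illustration of the idea only (I don't know the actual scheme A86 or any copy-protection product uses), here is a Python sketch that hides one bit per "mov reg, reg" by choosing between the 89h and 8Bh forms. The names encode_mov, embed and extract are hypothetical, and extract assumes the code consists solely of two-byte register-to-register MOVs.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov(dst, src, bit):
    """Encode `mov dst, src`, letting the opcode's direction bit carry one hidden bit."""
    d, s = REG32[dst], REG32[src]
    if bit == 0:
        return bytes([0x89, 0xC0 | (s << 3) | d])  # mov r/m32, r32
    return bytes([0x8B, 0xC0 | (d << 3) | s])      # mov r32, r/m32

def embed(pairs, bits):
    """Assemble a list of (dst, src) moves, hiding one bit in each."""
    return b"".join(encode_mov(dst, src, bit)
                    for (dst, src), bit in zip(pairs, bits))

def extract(code):
    """Read the hidden bits back from bit 1 of every opcode byte."""
    return [(code[i] >> 1) & 1 for i in range(0, len(code), 2)]

moves = [("eax", "ecx"), ("ebx", "edx"), ("esi", "edi")]
code = embed(moves, [1, 0, 1])
print(code.hex())     # 8bc189d38bf7
print(extract(code))  # [1, 0, 1]
```

Either way the CPU performs the same three moves; only the byte choice differs, which is what makes the mark invisible at the source level.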
>Can you point to a source for this? All x86 assemblers that I know of map one assembly instruction to one machine instruction.
Seeing the confusion and clarifications in the replies to your comment, I think it may have been clearer if you had said (and you probably meant):
"Can you point to a source for this? All x86 assemblers that I know of map one assembly instruction to one out of a set of machine instructions, where the chosen machine instruction depends on things like the addressing mode (immediate, indexed, indirect indexed, etc.)." (I'm using older terms for addressing modes and am not sure they are still valid with newer processors, but the concept is the same.)