Assembler x86 commands. Assembly instructions and machine instructions

Assembler is a low-level programming language used to program various processors, microprocessors, and microcontrollers. This test covers assembler for x86 processors.

Assembly language programs consist of a sequence of instructions. A translator (the assembler) converts these instructions into machine code, which the central processor then executes. Instructions let you perform arithmetic, work with memory and I/O ports, and so on.

Assembler is typically used to optimize speed-critical sections of code, and in device drivers, viruses and other malware, operating systems, compilers, etc.

Target audience of the x86 Assembler test

The test assesses knowledge of assembly language and the x86 architecture. It focuses on practical knowledge of the language and the architecture, so it will interest system programmers and students who want to check their knowledge, and it will also help any programmer deepen their understanding of computer architecture and low-level programming.

Structure of the x86 Assembler test

The questions can be roughly divided into the following topics:

General issues
Processor operating modes (real, protected)
Processor instructions

Further development of the x86 assembler test

In the future, we plan to add questions on topics not yet covered (the FPU, working with devices and ports). An intermediate-level test is also in development and will soon be available.

Assembly language can be regarded as an autocode extended with additional constructs. It is inherently platform-dependent: assembly languages for different hardware platforms are incompatible, although they may be broadly similar.

In Russian, it is often called simply “assembler” (expressions like “write a program in assembler” are typical). Strictly speaking, this is incorrect, because an assembler is the utility that translates a program written in assembly language into machine code.

General definition

Assembly language is a notation for representing programs written in machine code in a readable form. It lets the programmer use alphabetic mnemonic operation codes, assign symbolic names to registers and memory as they see fit, and use convenient addressing schemes (for example, indexed or indirect). It also allows numeric constants to be written in different number systems (for example, decimal or hexadecimal) and lets program lines be marked with symbolic labels, so that other parts of the program can refer to them by name rather than by address (for example, to transfer control).

Translating an assembly language program into executable machine code (evaluating expressions, expanding macros, replacing mnemonics with the actual machine codes, and replacing symbolic addresses with absolute or relative ones) is performed by the assembler, the translator program that gave assembly language its name.

Assembly language instructions correspond essentially one-to-one to processor instructions: they are a more human-friendly symbolic notation (mnemonics) for the instructions and their arguments. However, one assembly language instruction may correspond to several variants of a processor instruction (for example, different encodings for different operand types).

In addition, assembly language allows symbolic labels to be used instead of memory addresses; during assembly, the assembler or linker replaces them with absolute or relative addresses. It also supports so-called directives: assembler statements that are not translated into processor machine instructions but are executed by the assembler itself.
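To make the replacement of a label with a relative address concrete, here is a minimal C sketch that computes the bytes an assembler would emit for an x86 near jump, `jmp label`, using the E9 rel32 encoding. It assumes the standard x86 rule that the 32-bit displacement is measured from the address of the next instruction; the function name `encode_jmp_rel32` is hypothetical.

```c
#include <stdint.h>

/* Encode "jmp label" as E9 rel32 (5 bytes) placed at address `at`.
   The CPU adds rel32 to the address of the NEXT instruction (at + 5),
   so the assembler stores target - (at + 5). Unsigned arithmetic wraps
   modulo 2^32, which yields the two's complement form of a negative offset. */
void encode_jmp_rel32(uint32_t at, uint32_t target, uint8_t out[5]) {
    uint32_t rel = target - (at + 5u);
    out[0] = 0xE9;                      /* opcode: JMP rel32 */
    out[1] = (uint8_t)(rel & 0xFF);     /* x86 is little-endian: low byte first */
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
}
```

For example, a jump at 0x1000 to a label at 0x1010 becomes E9 0B 00 00 00, and a jump to itself becomes E9 FB FF FF FF (a displacement of -5).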

Directives make it possible, in particular, to include data blocks, assemble program fragments conditionally, set label values, and use macros with parameters.

Each processor model (or family) has its own instruction set and a corresponding assembly language. For x86, the most popular assembly syntaxes are Intel syntax and AT&T syntax.

There are computers whose machine language is a high-level language (Forth, Lisp, El-76). In such computers, these languages effectively play the role of assembly languages.

Capabilities

The use of assembly language provides the programmer with a number of features that are usually not available when programming in high-level languages. Most of them are related to the proximity of the language to the hardware platform.

The ability to use every feature of the hardware platform makes it possible, in theory, to write the fastest and most compact code possible for a given processor. A skilled programmer can, as a rule, significantly optimize a program compared with a translator from a high-level language in one or more respects and produce code close to Pareto-optimal (as a rule, speed is gained at the cost of longer code, and vice versa):
- through more rational use of processor resources: for example, by placing all initial data in registers as efficiently as possible, unnecessary RAM accesses can be eliminated;
- through manual optimization of calculations, including more efficient use of intermediate results, the code can be made smaller and faster.
The ability to access hardware directly, in particular I/O ports, specific memory addresses, and processor registers (however, this ability is significantly limited, because many operating systems block application programs from writing directly to the registers of peripheral devices for the sake of system reliability).

Using assembler has almost no alternative when creating:

hardware drivers and the operating system kernel (at least, machine-dependent subsystems of the OS kernel), when it is important to coordinate the operation of peripheral devices with the central processor;
programs that must fit in limited ROM and/or run on devices with limited performance (“firmware” of computers and various electronic devices);
platform-specific components of compilers and interpreters of high-level languages, system libraries and code that implements platform compatibility.

Separately, it can be noted that with the help of a disassembler program, it is possible to convert a compiled program into an assembly language program. In most cases, this is the only (albeit extremely time-consuming) way to reverse engineer the program's algorithms if its source code in a high-level language is not available.

Restrictions

Application

Historically, if machine codes are considered the first generation of programming languages, assembly language can be considered the second. The shortcomings of assembly language and the difficulty of developing large software systems in it led to the emergence of third-generation languages, the high-level programming languages (such as Fortran, Lisp, Cobol, Pascal, C, etc.). High-level languages and their successors are what the information technology industry mainly uses today. However, assembly languages retain their niche thanks to their unique advantages in efficiency and their ability to make full use of the specific features of a particular platform.

Programs or their fragments are written in assembly language in cases where they are critical:

performance (drivers, games);
the amount of memory used (boot sectors, embedded software, programs for microcontrollers and processors with limited resources, viruses, software protection).

Assembly language programming is used for:

Optimization of speed-critical sections of programs written in high-level languages such as C++ or Pascal. This is especially relevant for game consoles, which have fixed performance, and for multimedia codecs, which are expected to be as lightweight and fast as possible.
Creation of operating systems (OS) or their components. Today the vast majority of operating systems are written in higher-level languages (mainly C, a high-level language created specifically to write one of the first versions of UNIX). Hardware-dependent pieces of code, such as the OS loader, the hardware abstraction layer, and parts of the kernel, are often written in assembly language. There is actually very little assembly code in the Windows or Linux kernels, because the authors strive for portability and reliability, but it is there nonetheless. Some hobby operating systems, such as MenuetOS and KolibriOS, are written entirely in assembly language. Both MenuetOS and KolibriOS fit on a floppy disk, yet include a graphical multi-window interface.
Programming of microcontrollers (MCUs) and other embedded processors. According to Professor Tanenbaum, the development of microcontrollers repeats the historical development of modern computers. Today (2013), assembly language is very often used to program microcontrollers (although languages such as C are also widely used in this area). Microcontroller programs have to move individual bytes and bits between different memory cells. Microcontroller programming is very important because, according to Tanenbaum, the car and home of a typical modern person contain about 50 microcontrollers on average.
Creation of drivers. Drivers (or some of their modules) are programmed in assembly language. Nowadays, however, drivers also tend to be written in high-level languages (it is much easier to write a reliable driver in a high-level language), because of increased reliability requirements, the sufficient performance of modern processors (their speed keeps processes in the device and the processor synchronized), and the maturity of high-level-language compilers (no unnecessary data transfers in the generated code); the vast majority of modern drivers are written in C. Reliability is especially important for drivers, because in Windows NT and UNIX (including Linux) drivers run in the kernel mode of the system. A single subtle error in a driver can crash the entire system.
Creation of antiviruses and other protective programs.
Writing code for low-level libraries of translators of programming languages.

Linking programs in different languages

Since usually only fragments of programs are written in assembly language, they must be linked with the rest of the software system written in other programming languages. This is achieved in two main ways:

At the compilation stage: assembler fragments (inline assembler) are inserted into the source code of a high-level-language program using special language directives. This method is convenient for simple data transformations, but it cannot be used to write full-fledged assembly code with its own data and subroutines, including subroutines with multiple entry and exit points, which high-level languages do not support.
At the linking stage, with separate compilation. For the linked modules to interact, it is sufficient that the imported functions (defined in some modules and used in others) follow a common calling convention. Separate modules can be written in any language, including assembly language.
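As an illustration of the first approach (inline assembler), here is a minimal sketch using the GCC/Clang extended `asm` syntax with AT&T operand order. This is one common compiler extension; other compilers, such as MSVC, use a different syntax. The function name `add_u32` is hypothetical, and a plain C fallback is used on other compilers or architectures.

```c
#include <stdint.h>

/* Adds two 32-bit values, using an inline x86 instruction where available. */
uint32_t add_u32(uint32_t a, uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* AT&T syntax: addl src, dst  =>  a = a + b.
       "+r"(a): a is read and written in a register; "r"(b): b is an input;
       "cc": the instruction modifies the flags. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;   /* portable fallback */
#endif
}
```

The operand constraints are what let the compiler connect the C variables to registers; the assembly fragment itself cannot declare its own data or subroutines, which is the limitation mentioned above.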

Syntax

Assembly language syntax is determined by the instruction set of a particular processor.

Command set

Typical assembly language instructions include the following (most examples are for the x86 architecture, Intel syntax):

Data transfer commands (mov, etc.)
Arithmetic commands (add, sub, imul, etc.)
Logical and bitwise operations (or, and, xor, shr, etc.)
Program flow control commands (jmp, loop, ret, etc.)
Interrupt call instructions (sometimes referred to as control instructions): int
I/O commands to ports (in, out)
Microcontrollers and microcomputers also typically have instructions that combine a test with a conditional jump, for example:

cjne - compare, and jump if not equal
djnz - decrement, and jump if the result is non-zero
cfsneq - compare, and skip the next instruction if not equal
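The behavior of a decrement-and-jump instruction such as djnz can be modeled in C. This sketch assumes an 8-bit counter register (as on the 8051 or Z80); the function name is hypothetical. One consequence worth seeing is that a counter starting at 0 wraps to 255, so the loop body runs 256 times. The x86 `loop` instruction behaves the same way with the cx/ecx/rcx counter.

```c
#include <stdint.h>

/* Counts how many times a loop body executes when the loop is closed by
   "djnz reg, label" and the 8-bit register reg starts at `start`. */
unsigned djnz_iterations(uint8_t start) {
    uint8_t reg = start;
    unsigned count = 0;
    do {
        count++;          /* the loop body */
        reg--;            /* djnz: decrement (0 wraps around to 255) ... */
    } while (reg != 0);   /* ... and jump back if the result is non-zero */
    return count;
}
```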

Instructions

Typical instruction format

[label:] [ [prefix] mnemocode [operand (, operand)] ] [ ;comment]

where mnemocode is the mnemonic of the processor instruction itself. Prefixes (repetition, addressing-mode changes, etc.) may be added to it.

The mnemonics used are usually the same for all processors of the same architecture or family of architectures (among the widely known ones are x86, ARM, SPARC, PowerPC, M68k processor and controller mnemonics). They are described in the processor specification. Possible exceptions:

if the assembler uses cross-platform AT&T syntax (original mnemonics are converted to AT&T syntax);
if initially there were two standards for writing mnemonics (the instruction system was inherited from the processor of another manufacturer).

For example, the Zilog Z80 processor inherited the Intel 8080 instruction set, expanded it and changed the mnemonics (and register designations) in its own way. Motorola Fireball processors inherited the Z80 instruction set, cutting it down a bit. At the same time, Motorola has officially returned to Intel mnemonics and at the moment half of the Fireball assemblers work with Intel mnemonics, and half with Zilog mnemonics.

Directives

An assembly language program may contain directives: statements that are not translated directly into machine instructions but control the operation of the translator. Their set and syntax vary considerably and depend not on the hardware platform but on the translator used (which gives rise to different dialects within the same family of architectures). A typical basic set of directives includes:

definition of data (constants and variables),
managing the organization of the program in memory and the parameters of the output file,
setting the compiler mode,
all kinds of abstractions (that is, elements of high-level languages), from procedure and function declarations (to simplify the procedural programming paradigm) to conditional constructs and loops (for the structured programming paradigm).

Program example

Examples of the “Hello, world!” program for different platforms and dialects:

NASM, Linux (32-bit x86):

SECTION .data
msg:    db "Hello, world", 10
len:    equ $-msg

SECTION .text
global _start
_start:
        mov edx, len
        mov ecx, msg
        mov ebx, 1      ; stdout
        mov eax, 4      ; write(2)
        int 0x80
        mov ebx, 0
        mov eax, 1      ; exit(2)
        int 0x80

NASM, FreeBSD (32-bit x86, system call arguments passed on the stack):

SECTION .data
msg:    db "Hello, world", 10
len:    equ $-msg

SECTION .text
global _start
syscall:
        int 0x80
        ret
_start:
        push len
        push msg
        push 1          ; stdout
        mov eax, 4      ; write(2)
        call syscall
        add esp, 3*4
        push 0
        mov eax, 1      ; exit(2)
        call syscall

MASM (MASM32 package), Win32 console:

.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
msg     db "Hello, world", 13, 10
len     equ $-msg

.data?
written dd ?

.code
start:
        push -11
        call GetStdHandle
        push 0
        push OFFSET written
        push len
        push OFFSET msg
        push eax
        call WriteFile
        push 0
        call ExitProcess
end start

FASM, Win32 console:

format PE console
entry start
include "include\win32a.inc"

section ".data" data readable writeable
message db "Hello, world!", 0

section ".code" code readable executable
start:
        ; CINVOKE macro in FASM.
        ; Allows you to call CDECL functions.
        cinvoke printf, message
        cinvoke getch
        ; INVOKE is a similar macro for STDCALL functions.
        invoke ExitProcess, 0

section ".idata" import data readable
library kernel, "kernel32.dll",\
        msvcrt, "msvcrt.dll"
import kernel,\
       ExitProcess, "ExitProcess"
import msvcrt,\
       printf, "printf",\
       getch, "_getch"

YASM, Win64:

;yasm-1.0.0-win32.exe -f win64 HelloWorld_Yasm.asm
;setenv /Release /x64 /xp
;link HelloWorld_Yasm.obj Kernel32.lib User32.lib /entry:main /subsystem:windows /LARGEADDRESSAWARE:NO
bits 64
global main
extern MessageBoxA
extern ExitProcess

section .data
mytit db "The 64-bit world of Windows & assembler...", 0
mymsg db "Hello World!", 0

section .text
main:
        sub rsp, 40     ; 32-byte shadow space, and align the stack to 16 bytes
        mov r9d, 0      ; uType = MB_OK
        mov r8, mytit   ; LPCSTR lpCaption
        mov rdx, mymsg  ; LPCSTR lpText
        mov rcx, 0      ; hWnd = HWND_DESKTOP
        call MessageBoxA
        mov ecx, eax    ; uExitCode = MessageBox(...)
        call ExitProcess
        ret

SPARC:

        .section ".data"
hello:  .asciz "Hello World!\n"

        .section ".text"
        .align 4
        .global main
main:
        save %sp, -96, %sp      ! allocate memory
        mov 4, %g1              ! 4 = WRITE (system call)
        mov 1, %o0              ! 1 = STDOUT
        set hello, %o1
        mov 13, %o2             ! number of characters
        ta 8                    ! system call
        ! program exit
        mov 1, %g1              ! move 1 (exit() syscall) into %g1
        mov 0, %o0              ! move 0 (exit status) into %o0
        ta 8                    ! system call

x86 boot sector (16-bit real mode):

        ORG 7C00H
        USE16
        JMP Code
        NOP
        DB "HELLOWRD"
Sectsize  DW 00200H
Clustsize DB 001H
Ressecs   DW 00001H
Fatcnt    DB 002H
Rootsiz   DW 000E0H
Totsecs   DW 00B40H
Media     DB 0F0H
Fatsize   DW 00009H
Trksecs   DW 00012H
Headcnt   DW 00002H
HidnSec   DW 00000H

Code:
        cli
        mov ax, cs
        mov ds, ax
        mov ss, ax
        mov sp, 7C00h
        sti
        mov ax, 0B800h
        mov es, ax
        mov di, 200
        mov ah, 2
        mov bx, MessStr
msg_print:
        mov al, [cs:bx]
        mov [es:di], ax
        inc bx
        add di, 2
        cmp bx, MessEnd
        jnz msg_print
loo:
        jmp loo
MessStr equ $
Message db "Hello, World!"
MessEnd equ $
        times 510-($-$$) db 0   ; pad the sector to 510 bytes
        dw 0AA55h               ; boot sector signature

History and terminology

This type of language got its name from its translator (compiler), the assembler (from the English “assemble”). The name reflects the fact that the program was “assembled automatically” rather than entered by hand, instruction by instruction, directly in machine codes. This also causes some confusion of terms: “assembler” often refers not only to the translator but also to the corresponding programming language (“an assembler program”).

I am writing a program in assembler (x86) for a projector. The point is that there is an automatic slide-switching mode, which needs to delay the slide show in 10-second increments. Someone helped me write such a delay. Below are the two procedures (creating the delay and decrementing it).

MakeDelay Proc Near
        mov al, Timer                   ; delay value in the range of 10 to 90 units
        shr al, 4
        mov ah, al
        xor al, al
        ror ax, 2
        mov word ptr Delay+1, ax
        mov byte ptr Delay, 0
        mov byte ptr Delay+3, 0
        ret
MakeDelay Endp

        mov ax, word ptr Delay
        or ax, word ptr Delay+2
        cmp ax, 0
        je nxtslide                     ; Yes - go to slide selection
        sub word ptr Delay, 1           ; No - decrease
        sbb word ptr Delay+2, 0         ; the delay
        jmp ext5                        ; Exit subroutine

In the first procedure, I don't understand how this delay algorithm was arrived at. In the delay decrement code, I don't understand why the OR is needed. The problem is that the delay depends heavily on the operating system: for example, setting the value to 60 gives a delay of 65 seconds in a Windows XP virtual machine and 50 seconds in Windows 7. Please help me sort out this mess.
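The arithmetic in the two fragments can be traced with a small C model. This is a sketch under assumptions: `Delay` is treated as a 32-bit little-endian variable and `Timer` as an 8-bit value, and the function names `make_delay` and `delay_tick` are hypothetical. `MakeDelay` stores (Timer >> 4) << 6 into bytes 1..2 of Delay and zeroes bytes 0 and 3, so Delay = (Timer >> 4) * 16384. The decrement code ORs the low and high 16-bit halves because the result is zero only when the whole 32-bit counter is zero; the sub/sbb pair then decrements the 32-bit value, propagating the borrow. Since nothing here reads a clock, the real duration appears to depend on how often the decrement code runs, which is consistent with the different timings on different systems.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of MakeDelay: returns the 32-bit value left in Delay. */
uint32_t make_delay(uint8_t timer) {
    uint8_t al = (uint8_t)(timer >> 4);              /* shr al,4 */
    uint16_t ax = (uint16_t)(al << 8);               /* mov ah,al / xor al,al */
    ax = (uint16_t)((ax >> 2) | (ax << 14));         /* ror ax,2 */
    uint8_t bytes[4];
    bytes[0] = 0;                                    /* mov byte ptr Delay,0 */
    bytes[1] = (uint8_t)(ax & 0xFF);                 /* mov word ptr Delay+1,ax */
    bytes[2] = (uint8_t)(ax >> 8);
    bytes[3] = 0;                                    /* mov byte ptr Delay+3,0 */
    /* Reassemble the little-endian 32-bit variable. */
    return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Model of the decrement step: returns true when the delay has expired. */
bool delay_tick(uint32_t *delay) {
    uint16_t lo = (uint16_t)(*delay & 0xFFFF);       /* word ptr Delay   */
    uint16_t hi = (uint16_t)(*delay >> 16);          /* word ptr Delay+2 */
    if ((lo | hi) == 0)                              /* or ax,... ; je nxtslide */
        return true;
    uint16_t borrow = (lo == 0);                     /* sub ...,1 sets CF on borrow */
    lo = (uint16_t)(lo - 1);
    hi = (uint16_t)(hi - borrow);                    /* sbb ...,0 */
    *delay = ((uint32_t)hi << 16) | lo;
    return false;
}
```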

Code for the task: "Assembler x86"


Program listing

ACP proc near                   ; procedure to obtain a scaled value from the ADC
        push ax                 ; save registers used in this procedure
        push dx
        mov al, 01h             ; turn on "Start"
        out 03h, al
Waiting:
        in al, 02h              ; wait for "Rdy" to light up
        test al, 01h
        jz Waiting
        mov al, 0               ; reset "Start"
        out 03h, al
        in al, 01h              ; read the value on the input register
        mov ah, 10d
        mov dl, 0FFh            ; load into dl the maximum number that can be applied to the input register
        mul ah                  ; multiply the input value by the upper limit of the ADC (max. voltage)
        div dl                  ; divide by the maximum input value to get a number in the range 0..10
        mov skor_ACP, al
        pop dx
        pop ax
        ret
ACP endp
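Assuming the ADC delivers an 8-bit reading (0..255), the `mul ah` / `div dl` pair computes reading * 10 / 255 with integer truncation. A minimal C model (the function name `adc_scale` is hypothetical) shows that the result lies in 0..10, and that the 16-bit product never exceeds 2550, so the 8-bit quotient of `div` cannot overflow.

```c
#include <stdint.h>

/* Model of the ACP arithmetic: AX = AL * AH, then AL = AX / DL. */
uint8_t adc_scale(uint8_t raw) {
    uint8_t  ah = 10;                        /* mov ah,10d : upper limit of the scale */
    uint8_t  dl = 0xFF;                      /* mov dl,0FFh: maximum raw reading      */
    uint16_t ax = (uint16_t)(raw * ah);      /* mul ah: 16-bit product in AX          */
    return (uint8_t)(ax / dl);               /* div dl: quotient in AL, remainder in AH */
}
```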

This information was originally posted on the Key Table Explanations page, but it was later decided to move these long general discussions to a separate page. After the move, they grew somewhat longer. Now they are perhaps suitable only for the "Miscellaneous Notes" section...

Assembly instructions and machine instructions

First of all, we must not forget that assembly language instructions and machine instructions are two different things, although the two concepts are clearly closely related.

An assembler instruction is a mnemonic name. For x86 processors, these names are based on English words. For example, the addition instruction is named ADD, and the subtraction instruction is named SUB.

In the tables of this handbook, the Command column shows the instruction name in assembly language.

The basis of a machine instruction is the opcode, which is simply a number. For x86 processors (as for other processors), opcodes are customarily written as hexadecimal numbers. (In passing, we note that octal numbers were used on Soviet computers; they caused less confusion because such numbers consist only of digits and contain no letters.)

In the tables of this handbook, the Code column shows the opcode of the machine instruction, and the Format column shows the format of the machine instruction.

We can assume that the number of different machine instructions for a given processor is equal to the number of possible operation codes. By the format, you can find out what components a given machine instruction consists of. Different machine instructions may have different formats. The opcode of a machine instruction completely defines its format.

Often, one assembler instruction has several different machine-instruction variants, and the formats of these variants can differ.

For example, the assembler instruction ADD has ten variants of machine instructions with different opcodes. But there are fewer different formats, only three. And each of these three formats requires different types of operands when writing an instruction in assembly language.
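To see how one mnemonic maps to several machine instructions, here is a minimal C sketch that encodes `add reg, imm` for the eight 32-bit registers. It picks the sign-extended 8-bit immediate form (83 /0 ib) when the constant fits in a signed byte, the short accumulator form (05 id) for eax, and the general form (81 /0 id) otherwise. Many assemblers make a similar choice, but the selection policy and the name `encode_add_r32_imm` are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Register numbers as encoded in the ModRM byte. */
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* Encode "add reg, imm"; returns the number of bytes written (at most 6). */
size_t encode_add_r32_imm(int reg, int32_t imm, uint8_t out[6]) {
    size_t n = 0;
    if (imm >= -128 && imm <= 127) {
        out[n++] = 0x83;                    /* ADD r/m32, imm8 (sign-extended) */
        out[n++] = (uint8_t)(0xC0 | reg);   /* ModRM: mod=11, reg=/0, rm=reg   */
        out[n++] = (uint8_t)imm;
        return n;
    }
    if (reg == EAX) {
        out[n++] = 0x05;                    /* ADD EAX, imm32 (no ModRM byte) */
    } else {
        out[n++] = 0x81;                    /* ADD r/m32, imm32 */
        out[n++] = (uint8_t)(0xC0 | reg);
    }
    uint32_t u = (uint32_t)imm;
    for (int i = 0; i < 4; i++)             /* imm32, little-endian */
        out[n++] = (uint8_t)(u >> (8 * i));
    return n;
}
```

For example, `add eax, 1000` comes out as the 5-byte 05 E8 03 00 00, while `add ecx, 1000` needs the 6-byte 81 C1 E8 03 00 00: the same operation under two different opcodes.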

It is important to note here that all these ten machine instructions perform the same elementary operation, which in assembly language is called ADD.

It would therefore seem that the processor can perform as many distinct elementary operations as there are distinct assembler instructions. This simple principle needs some qualification, however, because some assembler instructions have synonyms (for example, JE and JZ).

A general list of all processor instructions can be built in different ways, by choosing a different order of instructions. There are two main ways.

Method (1). Take the assembly language instructions as the basis and arrange them alphabetically. This yields tables such as “All commands in alphabetical order (briefly)”.

Method (2). Take the opcode of the machine instruction as the basis and arrange the instructions in opcode order. In this case it is better to split the general list into two parts, with separate lists for instructions with a one-byte opcode and for instructions with a two-byte opcode (see the “First opcode byte” and “Second opcode byte” tables).

Of course, there is also a third way, usually used in textbooks: divide all instructions into groups by meaning and study them group by group, starting with the simplest.

Main opcode byte

In the x86 command system, one byte (256 different combinations) was not enough to encode all commands. Therefore, the opcode in a machine instruction occupies either one byte or two bytes.

If the first byte contains the code 0F, then the opcode consists of two bytes.

If the opcode in a machine instruction consists of one byte, then this single byte is the main byte of the opcode. And the content of this byte determines what the operation is.

If the opcode in a machine command consists of two bytes, then not the first, but the second byte will be the main and defining byte in the opcode.
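The rule for locating the main opcode byte can be written as a small C function. This is a sketch that handles only the one- and two-byte cases described above (it ignores instruction prefixes and the longer opcode maps used by newer extensions); the name `main_opcode_byte` is hypothetical. For example, 0F AF is the two-byte opcode of the two-operand IMUL, whose main byte is AF, while 01 is a one-byte ADD opcode.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the main (defining) opcode byte and stores the opcode length in *len. */
uint8_t main_opcode_byte(const uint8_t *code, size_t *len) {
    if (code[0] == 0x0F) {   /* escape byte: the opcode is two bytes long */
        *len = 2;
        return code[1];      /* the second byte defines the operation */
    }
    *len = 1;                /* one-byte opcode: this byte is the main byte */
    return code[0];
}
```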

In the tables of this handbook that show the encoding of machine instructions, the main opcode byte is usually shown twice: first in the "Code" column as a hexadecimal number, and then in the "Format" column as a row of eight dashes on which any special bits of the main opcode byte are marked.


Excuse me, do you have a minute to talk about our savior, assembler? In the last article, we wrote our first hello world application in asm, learned how to compile and debug it, and also learned how to make system calls in Linux. Today we will get acquainted directly with assembler instructions, the concept of registers, the stack, and all that. Assemblers for the x86 (a.k.a. i386) and x64 (a.k.a. amd64) architectures are very similar, so it makes no sense to cover them in separate articles. I will focus mainly on x64, noting the differences from x86 along the way, if any. The following assumes that you already know, for example, how a stack differs from a heap, so there is no need to explain such things.

General purpose registers

A register is a small (usually 4 or 8 bytes) piece of memory inside the processor with extremely fast access. Registers are divided into special-purpose registers and general-purpose registers. Here we are interested in general-purpose registers. As the name suggests, a program can use these registers for its own needs however it pleases.

On x86, eight 32-bit general-purpose registers are available: eax, ebx, ecx, edx, esp, ebp, esi and edi. Registers have no predefined type, that is, they can be treated as signed or unsigned integers, pointers, booleans, ASCII character codes, and so on. Although in theory these registers can be used in any way, in practice each register is usually used in a particular way. For example, esp points to the top of the stack, ecx plays the role of a counter, and eax holds the result of an operation or procedure. There are also 16-bit registers ax, bx, cx, dx, sp, bp, si, and di, which are the least significant 16 bits of the corresponding 32-bit registers. Finally, the 8-bit registers ah, al, bh, bl, ch, cl, dh, and dl are the upper and lower bytes of the ax, bx, cx, and dx registers, respectively.

Consider an example. Let's say the following three instructions are executed:

(gdb) x/3i $pc
=> 0x8048074: mov $0xaabbccdd,%eax
0x8048079: mov $0xee,%al
0x804807b: mov $0x1234,%ax

Values after the first instruction (writing 0xaabbccdd to eax):

(gdb) p/x $eax
$1 = 0xaabbccdd
(gdb) p/x $ax
$2 = 0xccdd
(gdb) p/x $ah
$3 = 0xcc
(gdb) p/x $al
$4 = 0xdd

Values after writing 0xEE to al:

(gdb) p/x $eax
$5 = 0xaabbccee
(gdb) p/x $ax
$6 = 0xccee
(gdb) p/x $ah
$7 = 0xcc
(gdb) p/x $al
$8 = 0xee

Values after writing 0x1234 to ax:

(gdb) p/x $eax
$9 = 0xaabb1234
(gdb) p/x $ax
$10 = 0x1234
(gdb) p/x $ah
$11 = 0x12
(gdb) p/x $al
$12 = 0x34

As you can see, nothing complicated.
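The aliasing of eax, ax, ah and al can be reproduced in C with bit masks. This minimal sketch (all helper names are hypothetical) performs the same writes as the gdb session above and relies on what that session shows: writing an 8- or 16-bit sub-register leaves the remaining bits of eax unchanged.

```c
#include <stdint.h>

/* Write the low byte (al); bits 8..31 are unchanged. */
uint32_t write_al(uint32_t eax, uint8_t v)  { return (eax & 0xFFFFFF00u) | v; }
/* Write bits 8..15 (ah); all other bits are unchanged. */
uint32_t write_ah(uint32_t eax, uint8_t v)  { return (eax & 0xFFFF00FFu) | ((uint32_t)v << 8); }
/* Write the low 16 bits (ax); bits 16..31 are unchanged. */
uint32_t write_ax(uint32_t eax, uint16_t v) { return (eax & 0xFFFF0000u) | v; }

uint16_t read_ax(uint32_t eax) { return (uint16_t)eax; }
uint8_t  read_ah(uint32_t eax) { return (uint8_t)(eax >> 8); }
uint8_t  read_al(uint32_t eax) { return (uint8_t)eax; }
```

Starting from 0xaabbccdd, `write_al(eax, 0xee)` gives 0xaabbccee, and a subsequent `write_ax(eax, 0x1234)` gives 0xaabb1234, matching the gdb output.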

Note: GAS syntax lets you specify operand sizes explicitly with the suffixes b (byte), w (word, 2 bytes), l (long word, 4 bytes), q (quadword, 8 bytes) and some others. For example, instead of mov $0xEE, %al you can write movb $0xEE, %al, instead of mov $0x1234, %ax you can write movw $0x1234, %ax, and so on. In modern GAS these suffixes are optional whenever the operand size can be inferred from a register operand, and I personally don't use them. But don't be alarmed if you see them in someone else's code.

On x64, the register size has been increased to 64 bits. The corresponding registers are named rax, rbx, and so on. In addition, there are sixteen general-purpose registers instead of eight; the additional ones are named r8, r9, ..., r15. The registers representing their lower 32, 16 and 8 bits are called r8d, r8w, r8b, and likewise for r9-r15. x64 also adds registers for the lower 8 bits of rsi, rdi, rbp and rsp: sil, dil, bpl and spl, respectively.

About addressing

As already noted, registers can be treated as pointers to data in memory. To dereference such pointers, a special syntax is used:

mov (%rsp), %rax

This entry means "read 8 bytes from the address in the rsp register and store them in the rax register." When a program is started, rsp points to the top of the stack, which stores the number of arguments passed to the program (argc), pointers to those arguments, as well as environment variables and some other information. Thus, as a result of executing the above instruction (of course, provided that no other instructions were executed before it), the number of arguments with which the program was launched will be written to rax.

In one command, you can specify the address and the offset (both positive and negative) relative to it:

mov 8(%rsp), %rax

This entry means "take rsp, add 8 to it, read 8 bytes at the resulting address and put them in rax." Thus, rax will contain the address of the string representing the first argument of the program, that is, the name of the executable file.

When working with arrays, it can be convenient to refer to an element at a specific index. Relevant syntax:

# the xchg instruction swaps values
xchg 16(%rsp,%rcx,8), %rax

It reads like this: “calculate rcx*8 + rsp + 16, and swap the 8 bytes (the register size) at the resulting address with the value of the rax register.” In other words, rsp is the base address, 16 is the offset, rcx is the index into the array, and 8 is the size of an array element. With this syntax, the only valid element sizes are 1, 2, 4, and 8. If some other size is required, you can use multiplication, binary shifts, and other instructions, which we will discuss next.
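The address computation described above can be sketched in C. The helper below (a hypothetical name) computes base + index*scale + displacement with 64-bit wraparound, treats the displacement as a sign-extended 32-bit value, and rejects any scale other than 1, 2, 4 or 8, since only those can be encoded.

```c
#include <stdint.h>
#include <stdbool.h>

/* Effective address of disp(base,index,scale) = base + index*scale + disp.
   Returns false if the scale cannot be encoded (only 1, 2, 4, 8 are valid). */
bool effective_address(uint64_t base, uint64_t index, unsigned scale,
                       int32_t disp, uint64_t *addr) {
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8)
        return false;
    /* Sign-extend disp to 64 bits; unsigned arithmetic wraps modulo 2^64. */
    *addr = base + index * scale + (uint64_t)(int64_t)disp;
    return true;
}
```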

Finally, the following code is also valid:

.data
msg:
.ascii "Hello, world!\n"
.text

.globl _start
_start:
# reset rcx
xor %rcx, %rcx
mov msg(,%rcx,8), %al
mov msg, %ah

In other words, you can omit the base register, or specify no registers at all. After this code executes, both al and ah contain the ASCII code of the letter H, 0x48.

In this context, I would like to mention one more useful assembler instruction:

# rax := rcx*8 + rax + 123
lea 123(%rax,%rcx,8), %rax

The lea instruction is very handy: it only evaluates the address expression, without accessing memory, so it can perform a multiplication and several additions in a single instruction.
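Because lea only evaluates the address expression, compilers commonly use it for small multiplications. A minimal C model (function names hypothetical) of `lea disp(base,index,scale), dst`: when base and index are the same register, scales 2, 4 and 8 give multiplication by 3, 5 and 9.

```c
#include <stdint.h>

/* Model of "lea disp(base,index,scale), dst": dst = base + index*scale + disp.
   No memory is accessed and the flags are not changed. */
uint64_t lea(uint64_t base, uint64_t index, unsigned scale, int32_t disp) {
    return base + index * scale + (uint64_t)(int64_t)disp;
}

/* lea (%rax,%rax,8), %rax computes rax * 9 in one instruction. */
uint64_t times9(uint64_t x) { return lea(x, x, 8, 0); }
```

With rax = 1 and rcx = 2, the instruction above computes 1 + 2*8 + 123 = 140.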

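Fun fact: on x64, the displacement in a memory operand is at most 32 bits; 64-bit absolute addresses appear only in a few special forms of mov (the movabs variants). Also, unlike on x86, instructions often address memory not by absolute address but relative to the instruction itself (more precisely, to the address of the next instruction), which lets them reach the nearest ±2 GB of memory. The corresponding syntax: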

movb msg(%rip), %al

Let's compare the lengths of the "regular" and "relative" mov opcodes (objdump -d):

4000b0: 8a 0c 25 e8 00 60 00 mov 0x6000e8,%cl
4000b7: 8a 05 2b 00 20 00 mov 0x20002b(%rip),%al # 0x6000e8

As you can see, the "relative" mov is also one byte shorter! We will find out what kind of register rip is a little later.
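The `# 0x6000e8` annotation that objdump prints can be checked by hand: the displacement is added to the address of the next instruction, that is, the instruction's address plus its length. A minimal C sketch (the function name is hypothetical):

```c
#include <stdint.h>

/* Target of a RIP-relative operand: address of the NEXT instruction + disp32. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    uint64_t next_rip = insn_addr + insn_len;
    return next_rip + (uint64_t)(int64_t)disp;   /* disp is sign-extended */
}
```

For the second objdump line, 0x4000b7 + 6 + 0x20002b = 0x6000e8, the same address the 7-byte absolute form encodes directly.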

To write the full 64-bit value to the register, a special instruction is provided:

movabs $0x1122334455667788, %rax

In other words, x64 encodes instructions about as compactly as x86, so nowadays there is little reason to use x86 processors even in systems with a couple of gigabytes of RAM or less (mobile devices, refrigerators, microwave ovens, and so on). x64 processors are likely to be even more efficient, thanks to more registers and larger register sizes.

Arithmetic operations

Consider the basic arithmetic operations:

# initialize register values
mov $123, %rax
mov $456, %rcx

# increment: rax = rax + 1 = 124
inc %rax

# decrement: rax = rax - 1 = 123
dec %rax

# addition: rax = rax + rcx = 579
add %rcx, %rax

# subtraction: rax = rax - rcx = 123
sub %rcx, %rax

# change sign: rcx = -rcx = -456
neg %rcx

Here and below, operands can be registers, memory locations, or constants, but at most one operand can be a memory location. This rule applies to all x86/x64 assembler instructions, at least those discussed in this article.
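The arithmetic sequence above can be replayed in C with 64-bit integers to confirm the values in the comments. This is only a model (the function name is hypothetical); unlike the real instructions, it does not track the flags.

```c
#include <stdint.h>

/* Replays the inc/dec/add/sub/neg sequence and returns the final register values. */
void arithmetic_demo(int64_t *rax_out, int64_t *rcx_out) {
    int64_t rax = 123, rcx = 456;   /* mov $123,%rax ; mov $456,%rcx */
    rax += 1;                       /* inc %rax      -> 124  */
    rax -= 1;                       /* dec %rax      -> 123  */
    rax += rcx;                     /* add %rcx,%rax -> 579  */
    rax -= rcx;                     /* sub %rcx,%rax -> 123  */
    rcx = -rcx;                     /* neg %rcx      -> -456 */
    *rax_out = rax;
    *rcx_out = rcx;
}
```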

Multiplication example:

mov $100, %al
mov $3, %cl
mul %cl

In this example, the mul instruction multiplies al by cl and stores the product in the ah:al register pair, that is, in ax. Thus ax takes the value 0x12C, or 300 in decimal. In the worst case, the product of two N-byte values needs 2*N bytes. Depending on the operand size, the result is stored in ah:al (ax), dx:ax, edx:eax, or rdx:rax. One multiplier is always the first register of the pair (al, ax, eax, or rax), and the other is the operand passed to the instruction.
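Here is a minimal C model of the one-operand mul for the 8- and 32-bit cases (function names hypothetical): the product is computed at double width and then split into the high and low halves that the processor places in ah:al or edx:eax.

```c
#include <stdint.h>

/* mul r/m8: AX = AL * src (the 16-bit product occupies AH:AL). */
uint16_t mul8(uint8_t al, uint8_t src) {
    return (uint16_t)((unsigned)al * src);
}

/* mul r/m32: EDX:EAX = EAX * src. */
void mul32(uint32_t eax, uint32_t src, uint32_t *edx_out, uint32_t *eax_out) {
    uint64_t product = (uint64_t)eax * src;   /* up to 64 bits are needed */
    *edx_out = (uint32_t)(product >> 32);     /* high half -> edx */
    *eax_out = (uint32_t)product;             /* low half  -> eax */
}
```

For the example above, `mul8(100, 3)` is 0x12C, so ah = 0x01 and al = 0x2C.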

Signed multiplication is done in exactly the same way with the imul instruction. In addition, imul has two- and three-operand forms. These keep only the low half of the product, which is the same size as the destination:

mov $123, %rax
mov $456, %rcx

# rax = rax * rcx = 56088
imul %rcx, %rax

# rcx = rax * 10 = 560880
imul $10, %rax, %rcx
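Why can the two- and three-operand forms of imul get away with a single destination register? Here is a sketch in C, assuming 64-bit operands. These forms keep only the low 64 bits of the product. Those bits are identical whether the operands are read as signed or unsigned, which is also why compilers commonly emit imul for unsigned multiplication. The function name imul_low64 is hypothetical.

```c
#include <stdint.h>

/* Models "imul %rcx, %rax" and "imul $imm, %rax, %rcx": only the low 64 bits
   of the product are kept. Unsigned multiplication in C wraps modulo 2^64,
   which yields exactly those bits for signed and unsigned inputs alike. */
static uint64_t imul_low64(uint64_t a, uint64_t b) {
    return a * b;
}
```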

The div and idiv instructions do the opposite of mul and imul. For example:

mov $0, %rdx
mov $456, %rax
mov $123, %rcx

# rax = rdx:rax / rcx = 3
# rdx = rdx:rax % rcx = 87
div %rcx

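As you can see, we get both the quotient of the integer division (in rax) and the remainder (in rdx). Note that div always divides the double-width value rdx:rax, which is why rdx is zeroed first. Before idiv, rdx is usually filled with the sign of rax using the cqo instruction (cqto in AT&T syntax).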
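A small C model of the 32-bit form, div %ecx, shows why the high half matters. This form divides edx:eax by ecx. The sketch assumes only the arithmetic matters, and div32 and div_result are hypothetical names. If the quotient does not fit into the destination register, the CPU raises a divide error (#DE), just as it does for division by zero. So leftover garbage in edx can crash a program even when the divisor is valid.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool ok;        /* false where the CPU would raise #DE instead */
    uint32_t quot;  /* ends up in eax */
    uint32_t rem;   /* ends up in edx */
} div_result;

/* Models "div %ecx": divides the 64-bit value edx:eax by ecx. */
static div_result div32(uint32_t edx, uint32_t eax, uint32_t ecx) {
    div_result r = { false, 0, 0 };
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    if (ecx == 0 || dividend / ecx > UINT32_MAX)
        return r;   /* divide error: zero divisor or quotient too large */
    r.ok = true;
    r.quot = (uint32_t)(dividend / ecx);
    r.rem = (uint32_t)(dividend % ecx);
    return r;
}
```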

These are not all the arithmetic instructions. There are also adc (add with carry), sbb (subtract with borrow), the instructions that set and clear the carry flag (stc, clc), and many others. But they are much less common, so they are outside the scope of this article.

Logic and bit operations

As already noted, there is no typing as such in x86/x64 assembler. So don't be surprised that it has no separate instructions for Boolean operations and for bitwise operations. Instead, there is one set of instructions that work with bits, and how to interpret the result is up to the specific program.

For example, evaluating a simple logical expression looks like this:

mov $0, %rax # a = false
mov $1, %rbx # b = true
mov $0, %rcx # c = false

# rdx := a || !(b && c)
mov %rcx, %rdx # rdx = c
and %rbx, %rdx # rdx &= b
not %rdx # rdx = ~rdx
or %rax, %rdx # rdx |= a
and $1, %rdx # rdx &= 1

Note that only the least significant bit of each 64-bit register carries a value here. The not instruction therefore leaves garbage in the high bits, which the last instruction clears.
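The same computation can be mirrored step by step in C, with every register modeled as a uint64_t. This sketch confirms that masking with 1 at the end gives the Boolean result. eval_expr is a hypothetical name.

```c
#include <stdint.h>

/* Mirrors the listing: evaluates a || !(b && c) with bitwise operations only.
   Only bit 0 of each 64-bit value is meaningful. */
static uint64_t eval_expr(uint64_t a, uint64_t b, uint64_t c) {
    uint64_t rdx = c;   /* mov %rcx, %rdx */
    rdx &= b;           /* and %rbx, %rdx */
    rdx = ~rdx;         /* not %rdx: bit 0 is correct, bits 1..63 are garbage */
    rdx |= a;           /* or %rax, %rdx */
    rdx &= 1;           /* and $1, %rdx: clear the garbage in the high bits */
    return rdx;
}
```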

Another useful instruction is xor (exclusive or). It is rare in Boolean expressions but common for zeroing registers. The instruction encodings show why:

4000b3: 48 31 db xor %rbx,%rbx
4000b6: 48 ff c3 inc %rbx
4000b9: 48 c7 c3 01 00 00 00 mov $0x1,%rbx

As you can see, xor and inc are encoded in only three bytes each. The mov instruction, which achieves the same result as the two of them combined, takes as many as seven. Of course, each particular case is best benchmarked separately. But the general rule of thumb is this: the shorter the code, the more of it fits into the processor caches, and the faster it runs.

In this context, the bit shift, bit test, and bit scan instructions are also worth mentioning:

# put something in the register
movabs $0xc0de1c0ffee2beef, %rax

# shift left by 4 bits
# rax = 0x0de1c0ffee2beef0
shl $4, %rax

# shift right by 7 bits
# rax = 0x001bc381ffdc57dd
shr $7, %rax

# rotate right by 5 bits
# rax = 0xe800de1c0ffee2be
ror $5, %rax

# rotate left by 5 bits
# rax = 0x001bc381ffdc57dd
rol $5, %rax

# copy bit 13 into CF (bit test)
# rax = 0x001bc381ffdc57dd, CF = 0
bt $13, %rax

# ditto + set the bit (bit test and set)
# rax = 0x001bc381ffdc77dd, CF = 0
bts $13, %rax

# ditto + reset the bit (bit test and reset)
# rax = 0x001bc381ffdc57dd, CF = 1
btr $13, %rax

# ditto + invert the bit (bit test and complement)
# rax = 0x001bc381ffdc77dd, CF = 0
btc $13, %rax

# find the least significant set bit (bit scan forward)
# rcx = 0, ZF = 0
bsf %rax, %rcx

# find the most significant set bit (bit scan reverse)
# rdx = 52, ZF = 0
bsr %rax, %rdx

# if all bits are zero, ZF = 1 and the value of rdx is undefined
xor %rax, %rax
bsf %rax, %rdx

There are also arithmetic shifts (sal, sar; sal is in fact the same instruction as shl), rotations through the carry flag (rcl, rcr), and double-precision shifts (shld, shrd). But they are used less often, and listing every single instruction would get tedious. So I leave them to you as homework.
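The values in the comments of the bit-manipulation listing are easy to get wrong by hand, so here is a sketch of a C model that reproduces them. The helper names are hypothetical. Rotations are assumed to be by 1 to 63 bits, and shl/shr map directly onto C's << and >> on uint64_t. In the listing, bts, btr, and btc each operate on the result of the previous instruction, and the CF each one sets is the old bit, which bt64 reads.

```c
#include <stdint.h>

/* Rotations for 1 <= n <= 63: C has no rotate operator, so combine shifts. */
static uint64_t ror64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
static uint64_t rol64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }

/* bt: the tested bit is what ends up in CF. */
static int bt64(uint64_t x, unsigned n) { return (int)((x >> n) & 1); }

/* bts/btr/btc: the new register value; CF is bt64() of the old value. */
static uint64_t bts64(uint64_t x, unsigned n) { return x | (UINT64_C(1) << n); }
static uint64_t btr64(uint64_t x, unsigned n) { return x & ~(UINT64_C(1) << n); }
static uint64_t btc64(uint64_t x, unsigned n) { return x ^ (UINT64_C(1) << n); }

/* bsf/bsr for x != 0 (for x == 0 the CPU sets ZF and the result is undefined). */
static unsigned bsf64(uint64_t x) { unsigned i = 0; while (!((x >> i) & 1)) i++; return i; }
static unsigned bsr64(uint64_t x) { unsigned i = 63; while (!((x >> i) & 1)) i--; return i; }
```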

Conditionals and Loops

Flags have been mentioned several times above, for example, the carry flag. Flags are bits of the special register eflags / rflags (its name on x86 and x64, respectively). This register cannot be accessed directly by mov, add, and similar instructions, but various instructions change and use it implicitly. For example, the carry flag (CF) is stored in bit 0 of eflags / rflags and is used by the bit test instructions above. Other frequently used flags include the zero flag (ZF, bit 6), the sign flag (SF, bit 7), the direction flag (DF, bit 10), and the overflow flag (OF, bit 11).
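Since flags are just bits, reading one from a saved copy of rflags is a shift and a mask. Such a copy can be obtained, for example, with pushf followed by pop. Below is a minimal C sketch using the bit positions listed above. The enum constants and flag_set are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the frequently used flags in eflags / rflags. */
enum { CF_BIT = 0, ZF_BIT = 6, SF_BIT = 7, DF_BIT = 10, OF_BIT = 11 };

/* Tests a single flag in a saved eflags / rflags value. */
static bool flag_set(uint64_t rflags, unsigned bit) {
    return (rflags >> bit) & 1;
}
```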

Another such implicit register is eip / rip, which holds the address of the next instruction to be executed. It cannot be accessed directly either. However, GDB shows it together with eflags / rflags if you type info registers, and every instruction changes it implicitly. Most instructions simply advance eip / rip by their own length, but there are exceptions to this rule. For example, the jmp instruction simply jumps to the given address:

# reset rax
xor %rax, %rax
jmp next
# this instruction will be skipped
inc %rax
next:
inc %rax

As a result, rax will be equal to one, since the first inc instruction is skipped. Note that the jump address can also be stored in a register:

xor %rax, %rax
mov $next, %rcx
jmp *%rcx
inc %rax
next:
inc %rax

However, in practice such code is best avoided where possible. Indirect jumps are harder for the branch predictor to handle and are therefore often less efficient.

Note: GAS allows labels to have numeric names like 1:, 2:, and so on. You can then jump to the nearest preceding or following label with a given number using instructions like jmp 1b and jmp 1f. This is quite handy, as it can sometimes be hard to come up with meaningful names for labels.

Conditional jumps are usually implemented with the cmp instruction, which compares its two operands and sets the appropriate flags, followed by an instruction from the je, jg, and similar families:

# compare rcx with rax (AT&T order: flags are set as for rcx - rax)
cmp %rax, %rcx

je 1f # jump if rcx == rax (equal)
jl 1f # jump if rcx < rax, signed (less)
jb 1f # jump if rcx < rax, unsigned (below)
jg 1f # jump if rcx > rax, signed (greater)
ja 1f # jump if rcx > rax, unsigned (above)

There are also the instructions jne (jump if not equal), jle (jump if less than or equal, signed), jna (jump if not above, unsigned), and the like; the naming principle is hopefully obvious. Instead of je / jne, people often write jz / jnz. These are just other names for the same instructions, which simply check ZF. Other instructions check other flags, such as js, jo, and jp, but in practice they are less common. All of these instructions together are commonly referred to as jcc: instead of a specific condition, the letters "cc" (from "condition code") are written.
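A small C model summarizes how cmp sets the flags and which flags each jcc condition reads. It assumes 64-bit operands and follows the AT&T operand order, where cmp %rax, %rcx sets the flags as for rcx - rax. cmp64, flags, and the cond_* helpers are hypothetical names. The signed conditions (jl, jg) combine SF and OF, while the unsigned ones (jb, ja) look at CF. That is why the same bit patterns can compare differently under the two families.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags that "cmp %rax, %rcx" derives from the subtraction rcx - rax. */
typedef struct { bool zf, sf, cf, of; } flags;

static flags cmp64(uint64_t rcx, uint64_t rax) {
    uint64_t diff = rcx - rax;   /* wraps around, like the hardware */
    flags f;
    f.zf = diff == 0;
    f.sf = (diff >> 63) & 1;
    f.cf = rcx < rax;            /* unsigned borrow */
    /* signed overflow: operands differ in sign and the result's sign differs from rcx */
    f.of = (((rcx ^ rax) & (rcx ^ diff)) >> 63) & 1;
    return f;
}

/* Conditions tested by jcc (setcc and cmovcc use the same ones). */
static bool cond_e(flags f) { return f.zf; }                    /* je / jz */
static bool cond_l(flags f) { return f.sf != f.of; }            /* jl */
static bool cond_b(flags f) { return f.cf; }                    /* jb */
static bool cond_g(flags f) { return !f.zf && f.sf == f.of; }   /* jg */
static bool cond_a(flags f) { return !f.cf && !f.zf; }          /* ja */
```

For example, comparing all-ones (-1 when signed) with 1 takes jl and ja but not jb or jg.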

Besides cmp, the test instruction is also often used:

test %rax, %rax
jz 1f # jump if rax == 0
js 2f # jump if rax < 0
1:
# some code
2:
# some other code

Fun fact: cmp and test are essentially sub and and, except that they discard the result and leave their operands unchanged. This means you can perform a sub or an and and branch on its result directly, without an extra cmp or test instruction.

Among other instructions related to conditional jumps, the following are worth noting:

jrcxz 1f
# some code
1:

The jrcxz instruction jumps only if the value of the rcx register is zero.

cmovge %rcx, %rax

Instructions of the cmovcc family (conditional move) work like mov, but perform the move only if the specified condition is met. The conditions are named by analogy with jcc.

setnz %al

The setcc instructions set a single-byte register or byte in memory to 1 if the specified condition is true, and 0 otherwise.

cmpxchg %rcx, (%rdx)

This instruction compares rax with the memory operand, here the 8 bytes at the address in rdx. If they are equal, it sets ZF and stores the value of the source register (rcx in this example) at that address. Otherwise, it clears ZF and loads the value from memory into rax. The destination operand can also be a register instead of memory.
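Written out as a non-atomic C model, the two branches of cmpxchg look like this. mem stands for the memory operand (%rdx), and cmpxchg64 is a hypothetical name. Only ZF is modeled, although the instruction sets the other flags as cmp would.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic model of "cmpxchg %rcx, (%rdx)". Returns the resulting ZF. */
static bool cmpxchg64(uint64_t *mem, uint64_t *rax, uint64_t rcx) {
    if (*mem == *rax) {
        *mem = rcx;   /* equal: store the source register into memory */
        return true;  /* ZF = 1 */
    }
    *rax = *mem;      /* not equal: load the current memory value into rax */
    return false;     /* ZF = 0 */
}
```

In portable C11 code, the atomic counterpart is atomic_compare_exchange_strong from <stdatomic.h>, which also writes the current value back into the "expected" variable on failure. On x86-64, compilers typically implement it with lock cmpxchg.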

cmpxchg8b (%rsi)
cmpxchg16b (%rsi)

The cmpxchg8b instruction is mostly needed on x86. It works like cmpxchg, but compares and swaps 8 bytes at once: edx:eax holds the value to compare against, and ecx:ebx holds the value to be written. On the same principle, cmpxchg16b compares and swaps 16 bytes at once on x64, using rdx:rax and rcx:rbx. Its memory operand must be 16-byte aligned.

Important! Without the lock prefix, none of these compare-and-swap instructions is atomic.

mov $10, %rcx
1:
# some code
loop 1b
# loopz 1b
# loopnz 1b

The loop instruction decrements rcx by one and, if rcx != 0 afterwards, jumps to the given label. The loopz and loopnz instructions work the same way with more complex conditions: (rcx != 0) && (ZF == 1) and (rcx != 0) && (ZF == 0), respectively.
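One subtlety follows from the order of operations: loop decrements rcx before testing it. If rcx is zero on entry, the body runs 2^64 times, which is what jrcxz is often used to guard against. The C sketch below counts how many times the body runs, assuming nothing else touches rcx. loop_iterations is a hypothetical name. Calling it with 0 would take practically forever, just like the real loop.

```c
#include <stdint.h>

/* Counts how many times the body of
       mov $n, %rcx
   1:  # body
       loop 1b
   executes. The body always runs at least once. */
static uint64_t loop_iterations(uint64_t rcx) {
    uint64_t count = 0;
    do {
        count++;          /* the body */
        rcx--;            /* loop: rcx = rcx - 1 (flags are not affected) */
    } while (rcx != 0);   /* jump back while rcx != 0 */
    return count;
}
```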

Building if-then-else constructs or for/while loops out of these instructions is straightforward, so let's move on.

"String" operations

Consider the following piece of code:

mov $str1, %rsi
mov $str2, %rdi
cld
cmpsb

The rsi and rdi registers are loaded with the addresses of two strings. The cld instruction clears the direction flag (DF); the instruction that does the opposite is called std. Then cmpsb comes into play: it compares the bytes at (%rsi) and (%rdi) and sets the flags according to the result. After that, rsi and rdi are incremented by one (the size of the compared item in bytes) if DF = 0, and decremented otherwise. The similar instructions cmpsw, cmpsl, and cmpsq compare words, long words, and quad words, respectively.

The cmps instructions are interesting because they can be used with the repeat prefixes repe (repz) and repne (repnz). For example:

mov $str1, %rsi
mov $str2, %rdi
mov $len, %rcx
cld
repe cmpsb
jne not_equal

The rep prefix repeats the instruction as many times as specified in rcx, decrementing rcx after each iteration. The repz and repnz prefixes do the same, but also check ZF after each iteration: repz stops as soon as ZF = 0, and repnz as soon as ZF = 1. So the code above checks whether two buffers of the same size are equal.
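Here is a C model of the repe cmpsb comparison above. It is a sketch assuming DF = 0 and a length of at least one. With rcx = 0 the hardware performs no iterations and leaves the flags untouched, which this model does not cover. repe_cmpsb_equal is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "repe cmpsb" with DF = 0, for rcx >= 1 on entry.
   Returns the final ZF: true means the following jne is not taken. */
static bool repe_cmpsb_equal(const uint8_t *rsi, const uint8_t *rdi, size_t rcx) {
    bool zf;
    do {
        zf = (*rsi == *rdi);  /* cmpsb: flags as for (%rsi) - (%rdi) */
        rsi++;                /* DF = 0: both pointers move forward */
        rdi++;
        rcx--;                /* one repetition consumed */
    } while (rcx != 0 && zf); /* repe: stop when rcx == 0 or ZF == 0 */
    return zf;
}
```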

The similar movs instructions copy data from the buffer whose address is in rsi to the buffer whose address is in rdi. This is easy to remember: rsi is the source index and rdi the destination index. The stos instructions fill the buffer at the address in rdi with the value of al, ax, eax, or rax, depending on the particular instruction. The lods instructions do the opposite: they load data from the address in rsi into al, ax, eax, or rax. Finally, the scas instructions compare al, ax, eax, or rax with the data in the buffer pointed to by rdi, which makes them handy for searching. movs, stos, and lods are used with the rep prefix, while scas, like cmps, is used with repe and repne.

These instructions make it easy to implement memcmp, memcpy, strcmp, and similar procedures. Interestingly, Intel engineers recommend rep stosb for zeroing memory on modern processors. That means zeroing byte by byte rather than, say, by quad words. On processors with the Enhanced REP MOVSB/STOSB feature, the microcode itself processes larger chunks.
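Leaving speed aside, rep stosb and rep movsb can be sketched in C as plain byte loops, assuming DF = 0. The function names are hypothetical. Real memset and memcpy implementations are far more elaborate.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "rep stosb" (DF = 0): store al into rcx bytes starting at rdi. */
static void rep_stosb(uint8_t *rdi, uint8_t al, size_t rcx) {
    while (rcx != 0) {
        *rdi++ = al;
        rcx--;
    }
}

/* Model of "rep movsb" (DF = 0): copy rcx bytes from rsi to rdi. */
static void rep_movsb(uint8_t *rdi, const uint8_t *rsi, size_t rcx) {
    while (rcx != 0) {
        *rdi++ = *rsi++;
        rcx--;
    }
}
```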

Stack Handling and Procedures

Working with the stack is very simple. The push instruction pushes its operand onto the stack, and pop pops a value off it. For example, if you forget about the xchg instruction for a moment, you can swap the values of two registers like this:

push %rax
mov %rcx, %rax
pop %rcx
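The push/pop swap above can be replayed on a toy C model of the hardware stack. The stack grows toward lower addresses, and each push or pop moves 8 bytes. The model is a sketch: rsp is an index into a small array rather than an address, there are no overflow checks, and stack64, push64, and pop64 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum { STACK_SLOTS = 16 };

typedef struct {
    uint64_t mem[STACK_SLOTS];
    size_t rsp;   /* index of the current top; starts at STACK_SLOTS (empty) */
} stack64;

/* push: rsp -= 8, then store the value at (%rsp) */
static void push64(stack64 *s, uint64_t v) { s->mem[--s->rsp] = v; }

/* pop: load the value at (%rsp), then rsp += 8 */
static uint64_t pop64(stack64 *s) { return s->mem[s->rsp++]; }
```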

There are also instructions that push the rflags / eflags register onto the stack and pop it back:

pushf
# do something that changes the flags
popf
# flags restored, it's time to do jcc

This, for example, is how you can get the value of the CF flag:

pushf
pop %rax
and $1, %rax

On x86, there are also the pusha and popa instructions, which save and restore all general-purpose registers on the stack. On x64, these instructions are no longer available. Presumably this is because there are now more registers and they are wider, so saving and restoring all of them has become much more expensive.

Procedures are usually "created" with the call and ret instructions. The call instruction pushes the address of the next instruction onto the stack and transfers control to the address given in its operand. The ret instruction pops the return address off the stack and transfers control to it. For example:

someproc:
# typical procedure prologue
# for example, allocate 0x10 bytes on the stack for local variables
# rbp - pointer to stack frame
push %rbp
mov %rsp, %rbp
sub $0x10, %rsp

# some kind of calculation here...
mov $1, %rax

# typical procedure epilogue
add $0x10, %rsp
pop %rbp

# exit procedure
ret

start:
# as with jmp, the jump address can be in a register
call someproc
test %rax, %rax
jnz error

Note: a similar prologue and epilogue can be written with the instructions enter $0x10, $0 and leave. However, enter is rarely used these days because it is slow, partly due to its extra support for nested procedures. leave, on the other hand, is still commonly emitted by compilers.

As a rule, the return value is passed in the rax register. If it does not fit there, it is written to a memory area (for example, a structure) whose address is passed as a hidden argument. As for passing arguments, there are a great many calling conventions. In some, all arguments are always passed on the stack (in what order is a separate question), and the callee is responsible for removing them. In others, some arguments are passed in registers and the rest on the stack, and the caller is responsible for the cleanup. There are lots of variations in between, with separate rules for aligning arguments on the stack, for passing this in OOP languages, and so on. In the general case, for an arbitrary architecture, compiler, and programming language, the calling convention can be anything at all.

    hash = ((hash << 5) + hash) + data[i];
}
return hash;
}
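The C source of the procedure is only partly preserved here. Judging by the annotated disassembly below, it was likely close to the following sketch. The name hash_bytes is hypothetical; the rest is read off the listing: the initial value 0x4841434b, an int loop counter compared with a size_t length, and hash = hash * 33 + data[i].

```c
#include <stddef.h>

/* Hypothetical reconstruction based on the -O0 disassembly. */
unsigned int hash_bytes(const unsigned char *data, const size_t data_len) {
    unsigned int hash = 0x4841434b;          /* movl $0x4841434b,-0x14(%rbp) */
    for (int i = 0; i < data_len; i++) {     /* int vs size_t: movslq + unsigned jae */
        hash = ((hash << 5) + hash) + data[i];
    }
    return hash;                             /* returned in eax */
}
```

The mix of a signed int counter and an unsigned size_t length explains the movslq (sign extension of i) followed by an unsigned comparison (jae) in the listing.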

Disassembly listing (compiled with -O0; the comments are mine):

# typical procedure prologue
# register rsp does not change, because the procedure does not call any
# other procedures
400950: 55 push %rbp
400951: 48 89 e5 mov %rsp,%rbp

# initialization of local variables:
# -0x08(%rbp) - const unsigned char *data (8 bytes)
# -0x10(%rbp) - const size_t data_len (8 bytes)
# -0x14(%rbp) - unsigned int hash (4 bytes)
# -0x18(%rbp) - int i (4 bytes)
400954: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400958: 48 89 75 f0 mov %rsi,-0x10(%rbp)
40095c: c7 45 ec 4b 43 41 48 movl $0x4841434b,-0x14(%rbp)
400963: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)

# rax := i; if data_len is reached, exit the loop
40096a: 48 63 45 e8 movslq -0x18(%rbp),%rax
40096e: 48 3b 45 f0 cmp -0x10(%rbp),%rax
400972: 0f 83 28 00 00 00 jae 4009a0

# eax := (hash << 5) + hash
400978: 8b 45 ec mov -0x14(%rbp),%eax
40097b: c1 e0 05 shl $0x5,%eax
40097e: 03 45 ec add -0x14(%rbp),%eax

# eax += data[i]
400981: 48 63 4d e8 movslq -0x18(%rbp),%rcx
400985: 48 8b 55 f8 mov -0x8(%rbp),%rdx
400989: 0f b6 34 0a movzbl (%rdx,%rcx,1),%esi
40098d: 01 f0 add %esi,%eax

# hash := eax
40098f: 89 45 ec mov %eax,-0x14(%rbp)

# i++ and go to the beginning of the loop
400992: 8b 45 e8 mov -0x18(%rbp),%eax
400995: 83 c0 01 add $0x1,%eax
400998: 89 45 e8 mov %eax,-0x18(%rbp)
40099b: e9 ca ff ff ff jmpq 40096a

# the return value (hash) is put into the eax register
4009a0: 8b 45 ec mov -0x14(%rbp),%eax

# typical epilogue
4009a3: 5d pop %rbp
4009a4: c3 retq

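Here we meet two new instruction families, movs and movz. In AT&T syntax they carry size suffixes, as in movslq and movzbl; Intel syntax calls them movsx/movsxd and movzx, and this movs should not be confused with the string instruction movs. They work like mov, but extend the source operand to the size of the destination, using sign extension and zero extension, respectively. For example, movzbl (%rdx,%rcx,1),%esi reads a byte (b) at (%rdx,%rcx,1), extends it to a long word (l) by prepending zeros (z), and puts the result in the esi register.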
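The difference between the two kinds of extension is easiest to see on a byte with the high bit set. In this C sketch, the function names simply mirror the AT&T mnemonics and are not a real API. The casts to signed types assume the two's-complement conversion that GCC and Clang guarantee.

```c
#include <stdint.h>

/* movzbl: zero-extend a byte to 32 bits (the new high bits are zeros). */
static uint32_t movzbl(uint8_t b) { return (uint32_t)b; }

/* movsbl: sign-extend a byte to 32 bits (the new high bits copy bit 7). */
static uint32_t movsbl(uint8_t b) { return (uint32_t)(int32_t)(int8_t)b; }

/* movslq: sign-extend 32 bits to 64, as done for "int i" in the listing. */
static uint64_t movslq(uint32_t l) { return (uint64_t)(int64_t)(int32_t)l; }
```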

As you can see, the two arguments were passed to the procedure in the rdi and rsi registers. This is the calling convention known as the System V AMD64 ABI, the de facto standard for x64 on *nix systems. I see no reason to retell it here; interested readers can find the full description in the ABI specification itself.

Conclusion

Needless to say, a single article cannot describe the entire x86 / x64 assembly language (moreover, I'm not sure I know all of it myself). At the very least, this article leaves out floating-point operations, MMX, SSE, and AVX instructions, and all sorts of exotic instructions like lidt, lgdt, bswap, rdtsc, cpuid, movbe, xlatb, or prefetch. I will try to cover them in future articles, but I do not promise anything. It should also be noted that the objdump -d output of most real programs very rarely contains anything beyond what is described above.

Another interesting topic left out is atomic operations, memory barriers, spinlocks, and so on. For example, compare-and-swap is often implemented simply as a cmpxchg instruction with the lock prefix. Atomic increment, decrement, and similar operations are implemented in the same way. Alas, all of this deserves a separate article.

For further reading, I can recommend the book Modern X86 Assembly Language Programming and, of course, the Intel manuals. The x86 Assembly book on wikibooks.org is also pretty good.

