May 15, 2004 8:35 PM
There are three assembly languages to learn this semester: those of the hypothetical SIC/XE[1], the MIPS R2000/R3000 and the Intel 80x86. Apart from that, I'm also fooling around with Knuth's MIXAL and with AT&T mnemonics. After a lot of searching on the net and in our library, I figured out how to assemble instructions for my 80686 in Linux.
Many compilers for high-level languages, gcc among them, translate source code into assembly language as an intermediate step. Programming directly in assembly language is more challenging and cumbersome, especially in the real-mode segmented memory model of DOS. In gcc, the -S switch stops compilation after this step and lets you view the intermediate assembly code[2] it generates. Without -S, the .s file is normally passed on to gas (the GNU assembler) and then deleted.
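To try this, a tiny C file like the one below can be compiled with gcc -S to get sumsq.s. On x86, adding -masm=intel makes gcc emit Intel-style mnemonics (closer to NASM) instead of AT&T ones. The file name and function are only illustrative, and the exact assembly depends on the gcc version and optimization level.

```c
/* sumsq.c: a deliberately small function whose generated assembly is
   easy to read. Hypothetical example; view its assembly with
       gcc -S -O0 sumsq.c                 (AT&T syntax, writes sumsq.s)
       gcc -S -O0 -masm=intel sumsq.c     (Intel syntax)              */

/* Returns a*a + b*b; look for the multiply and add instructions
   in the resulting .s file. */
int sum_of_squares(int a, int b)
{
    return a * a + b * b;
}
```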
Tools
NASM (the Netwide Assembler) is a free assembler with all the features needed here. ALINK in conjunction with NASM provides a complete, free toolchain for assembly-level programming in DOS. Everybody I know is a MASM addict. But with NASM, assembling for DOS as well as for the mighty 32-bit protected-mode operating system Linux is a snap.
Assembling the code and linking it against the standard C library[3] can be done with the following simple makefile (each command line must begin with a tab):
anihi: anihi.o
	gcc anihi.o -o anihi
anihi.o: anihi.asm
	nasm anihi.asm -f elf
Coding
When you link with gcc, the heart of an assembly program in Linux is a subroutine, main, which is called from the C startup code linked in at the link stage. Simply put, when the program starts, some standard C library code runs first and executes a CALL instruction to the main: label in your program; main eventually returns control to that C library code with a RET instruction.
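The CALL/RET handshake between the startup code and main can be modelled in plain C as below. This is a heavily simplified sketch under stated assumptions, not glibc's actual startup code: start_model and demo_main are hypothetical names, argc is counted from argv for convenience (the real startup code reads it from the initial stack), and the status is returned instead of being passed to exit().

```c
#include <stddef.h>

/* A main that also receives the environment: a common Unix extension
   to the standard two-argument form. */
typedef int (*main_fn)(int argc, char **argv, char **envp);

/* Hypothetical model of the C startup code: it "CALLs" main and gets
   back main's return value (which main leaves in EAX before its RET on
   i386). Real startup code hands that value to exit(); here it is
   returned so the control flow can be observed. */
int start_model(main_fn entry, char **argv, char **envp)
{
    int argc = 0;
    while (argv[argc] != NULL)   /* argv is NULL-terminated */
        argc++;
    return entry(argc, argv, envp);
}

/* Example "main" (hypothetical): uses its argument count as exit status. */
int demo_main(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    return argc;
}
```

Calling start_model(demo_main, argv, envp) with argv = {"prog", "x", NULL} returns 2, just as a real main's return value becomes the process exit status.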
Stack frames provide an easy way to access command line arguments and environment variables. Generally, every assembly routine that follows the C calling conventions, main included, should begin with these instructions:
push ebp
mov ebp,esp
push ebx
push esi
push edi
A stack frame is set up by pushing the caller's copy of EBP onto the stack and then copying ESP to EBP. The called routine must also save EBX, ESI and EDI on the stack before using them, because the C calling convention requires these registers to be preserved across calls.
Once a stack frame is created, you have to destroy it before main returns, restoring the saved registers in the reverse order in which they were pushed. Therefore, the following lines, followed by a RET, must come at the end of every such assembly program in Linux:
pop edi
pop esi
pop ebx
mov esp,ebp
pop ebp
Sandwich your code in between these two, compile it using make with the sample makefile and presto, you have your first assembly program in Linux.
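With that prologue in place, every value in the frame sits at a fixed distance from EBP, which is why command line arguments are easy to reach (in NASM, mov eax,[ebp+12] loads argv). The C sketch below merely records those distances for 32-bit x86 under the C (cdecl) calling convention; the enum and arg_offset names are hypothetical.

```c
/* Layout of a 32-bit cdecl stack frame after the prologue
       push ebp / mov ebp,esp / push ebx / push esi / push edi
   Offsets are in bytes relative to EBP. The stack grows toward lower
   addresses, so the saved registers sit below EBP and the caller's
   arguments sit above the return address. */
enum frame_offset {
    OFF_SAVED_EDI = -12, /* [ebp-12]  pushed last                    */
    OFF_SAVED_ESI = -8,  /* [ebp-8]                                  */
    OFF_SAVED_EBX = -4,  /* [ebp-4]   pushed right after mov ebp,esp */
    OFF_SAVED_EBP = 0,   /* [ebp]     caller's EBP                   */
    OFF_RET_ADDR  = 4    /* [ebp+4]   return address pushed by CALL  */
};

/* Offset of the n-th (0-based) 4-byte argument: [ebp+8], [ebp+12], ...
   For main: argc at [ebp+8], argv at [ebp+12], envp at [ebp+16]. */
int arg_offset(int n)
{
    return OFF_RET_ADDR + 4 + 4 * n;
}
```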
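Calling external functions like printf is easy: declare the symbol with extern printf, push the arguments in right-to-left order (the last argument of the C call first) and execute call printf. Afterwards, the caller removes the arguments from the stack by adding 4 bytes per pushed argument to ESP.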
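To make the ordering concrete, the hypothetical C model below pushes arguments onto a simulated, downward-growing 32-bit stack. For printf("%d %d\n", a, b), NASM code would push b, then a, then the pointer to the format string, call printf and finish with add esp,12; the model reproduces that order and the 12-byte cleanup. stack32, push32 and push_cdecl_args are illustrative names, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64

/* Hypothetical model of a 32-bit stack that grows downward:
   sp is the index of the top slot; STACK_SLOTS means "empty". */
typedef struct {
    uint32_t slot[STACK_SLOTS];
    size_t sp;
} stack32;

/* Like PUSH: move the top down by one 4-byte slot, then store. */
void push32(stack32 *s, uint32_t value)
{
    s->slot[--s->sp] = value;
}

/* Push n arguments the cdecl way, last argument first, so args[0]
   ends up on top of the stack ([esp] when CALL executes).
   Returns the byte count the caller must add to ESP afterwards.
   Assumes n <= s->sp (this sketch does no overflow check). */
size_t push_cdecl_args(stack32 *s, const uint32_t *args, size_t n)
{
    for (size_t i = n; i > 0; i--)
        push32(s, args[i - 1]);
    return 4 * n;
}
```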
Here's the full code for a hello world program:
[SECTION .text]
extern puts             ;reference to C library function puts
global main             ;make main visible to the C startup code
main:
    push ebp            ;set up the stack frame
    mov ebp,esp         ;ebp points to our stack frame
    push ebx            ;save the callee-saved ebx, esi and edi
    push esi
    push edi
;;; our code begins
    push dword msg      ;push 32-bit pointer to msg on the stack
    call puts           ;call C library function puts
    add esp,4           ;clean the stack by moving esp back by 4 bytes
;;; our code ends
    xor eax,eax         ;return 0 (the exit status) from main
    pop edi
    pop esi
    pop ebx
    mov esp,ebp         ;destroy stack frame before returning
    pop ebp
    ret
[SECTION .data]
msg: db "Hiiiiiiiiii World",0   ;for n00bies, 0 is the terminating null; puts adds the newline itself
[SECTION .bss]
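For comparison, here is roughly what the same program looks like in C. The function is named hello rather than main only so the sketch can sit beside other code; that naming is an assumption of this example.

```c
#include <stdio.h>

/* Rough C counterpart of the NASM hello world (hypothetical name).
   puts writes the string plus a trailing newline and returns a
   nonnegative value on success, or EOF on error. */
int hello(void)
{
    if (puts("Hiiiiiiiiii World") == EOF)
        return 1;   /* nonzero status signals failure */
    return 0;       /* in main, this value travels back to the startup code in EAX */
}
```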
Conclusion
At this point, the expected question would be:
Why would you want to use assembly language in Linux?
Assuming you already have an answer to "Why use assembly language?", the answer might simply be that you, the assembly language geek, get to harness the power of a free protected-mode operating system: a flat memory model with no segment juggling, more versatile 32-bit registers and addressing modes, and a stable environment in which a buggy program crashes only itself, not the whole machine.
[1] Simplified Instructional Computer.
[2] By default, gcc's .s files use AT&T syntax, which is quite different from the Intel syntax used by MASM, TASM and NASM; on x86, gcc's -masm=intel switch produces Intel syntax instead.
[3] You don't need the C library if you only use the INT 80H[4] kernel (system call) interface.
[4] Visit this website for more info.