Assembly in Linux

May 15, 2004 8:35 PM
There are three assembly languages to learn this semester: one for the hypothetical SIC¹/XE, one for the MIPS R2000/R3000 and one for the Intel 80x86. Apart from that, I'm also fooling around with Knuth's MIXAL and AT&T mnemonics. After a lot of searching on the net and in our library, I figured out how to assemble instructions for my 80686 in Linux.

Most compiled high-level languages are translated to assembly language on the way to machine code. Programming in assembly language directly is much more challenging and cumbersome, especially in the real mode segmented model. In gcc, the -S switch (capital S; the lowercase -s strips symbols from the output instead) lets you view the intermediate assembly code² it generates. Normally the resulting .s file is assembled by gas (the GNU assembler) and then deleted.

Tools

NASM is a free assembler with all the features needed here. Together with ALINK it makes a complete free toolchain for assembly-level programming in DOS. Everybody I know is a MASM addict. But with NASM, assembling for DOS as well as for the mighty 32-bit protected mode operating system, Linux, is a snap.

Assembling the code and linking it with the usual standard C libraries³ can be accomplished using the following simple makefile:

anihi: anihi.o
	gcc anihi.o -o anihi
anihi.o: anihi.asm
	nasm anihi.asm -f elf

Coding

In Linux, the critical portion of an assembly program is a subroutine that is called from the startup code added at the link stage. Simply put, when a program begins, some standard C library code runs first and executes a CALL instruction to the main: label in the program. When main is done, a RET instruction returns control to the standard C library code.

Stack frames provide an easy way to access command line arguments and environment variables. Generally, all assembly programs that follow the C calling conventions should have these instructions at the beginning of the program:

push ebp 
mov ebp,esp
push ebx
push esi
push edi

A stack frame is instantiated by pushing the caller's copy of EBP onto the stack and by copying ESP to EBP. The called routine (here, main) must also preserve EBX, ESI and EDI for its caller, so it pushes them onto the stack before it uses them.

Once a stack frame is created, you have to destroy it before your program terminates. Therefore, the following lines must be at the end of every assembly program in Linux:

pop edi
pop esi
pop ebx
mov esp,ebp
pop ebp

Sandwich your code in between these two, follow the epilogue with a ret, build it using make with the sample makefile and presto, you have your first assembly program in Linux.
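
To make the point about command line arguments concrete, here is a minimal sketch that prints argv[0] using the frame set up above. It assumes the usual 32-bit C calling convention, in which the startup code calls main(argc, argv, envp), so that [ebp+8] holds argc, [ebp+12] holds argv and [ebp+16] holds the environment pointer. puts is the same C library routine used in the hello world program further down. On a current 64-bit distribution you would probably also have to pass -m32 (and possibly -no-pie) to gcc in the makefile; treat that as an assumption about your setup rather than part of the recipe.

```nasm
[SECTION .text]
	extern puts		; C library routine: prints a string plus a newline
	global main

main:
	push ebp		; set up the stack frame
	mov ebp,esp
	push ebx
	push esi
	push edi
;;; [ebp+4] = return address, [ebp+8] = argc, [ebp+12] = argv
	mov esi,[ebp+12]	; esi = argv, a pointer to an array of string pointers
	push dword [esi]	; argv[0], the name the program was started with
	call puts
	add esp,4		; remove the one 4-byte argument again
	xor eax,eax		; return 0 to the startup code
;;; tear the stack frame down
	pop edi
	pop esi
	pop ebx
	mov esp,ebp
	pop ebp
	ret
```

Using [esi+4] instead of [esi] would pick up the first real argument, provided argc at [ebp+8] says there is one.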

Linking to external functions like printf is easy: declare the function with extern printf, push the arguments in right-to-left order compared to the C version, and then do call printf. After the call returns, the caller removes the arguments again by adding their total size to ESP.

Here's the full code for a hello world program:
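
As a rough illustration of the right-to-left rule, the sketch below is meant to be the equivalent of the C call printf("The answer is %d\n", 42). The label fmt and the value 42 are made up for the example. Note that the format string, which is the first argument in C, is pushed last.

```nasm
[SECTION .text]
	extern printf		; reference to the C library function printf
	global main

main:
	push ebp		; set up the stack frame
	mov ebp,esp
	push ebx
	push esi
	push edi
;;; our code begins
	push dword 42		; second C argument is pushed first
	push dword fmt		; first C argument (the format string) is pushed last
	call printf
	add esp,8		; two 4-byte arguments, so adjust esp by 8
	xor eax,eax		; return 0 to the startup code
;;; our code ends
	pop edi
	pop esi
	pop ebx
	mov esp,ebp		; destroy the stack frame
	pop ebp
	ret

[SECTION .data]

fmt:	db "The answer is %d",10,0	; 10 is newline, 0 terminates the string
```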

[SECTION .text]
	extern puts;reference to c library function puts
	global main;for linker entry point

main:
	push ebp		;Set up the stack frame
	mov ebp,esp		;ebp points to our stack frame
	push ebx		;save ebx, esi and edi (ebp is already saved)
	push esi
	push edi
;;; our code begins
	push dword msg		;push 32-bit pointer to msg on stack
	call puts		;call c lib func puts
	add esp,4		;Clean the stack by adjusting esp back by 4 bytes
	xor eax,eax		;return 0 instead of whatever puts left in eax
;;; our code ends
	pop edi
	pop esi
	pop ebx
	mov esp,ebp		;Destroy stack frame before returning
	pop ebp
	ret

	[SECTION .data]

msg:	db "Hiiiiiiiiii World",10,0	;for n00bies, 10 is newline and 0 is null (puts adds one more newline itself)

	[SECTION .bss]
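
Footnote 3 notes that the C library is not needed if you only use the INT 80H kernel interface. A minimal sketch of that route follows, assuming the 32-bit Linux system call numbers 4 (write) and 1 (exit). There is no main and no stack frame here, since nothing calls us: the linker's default entry point _start is used instead. The file name hello.asm is made up. Something like nasm -f elf hello.asm followed by ld -o hello hello.o should build it; on a 64-bit system ld will most likely need -m elf_i386 as well.

```nasm
[SECTION .text]
	global _start		; default entry point that ld looks for

_start:
	mov eax,4		; system call 4 = write
	mov ebx,1		; file descriptor 1 = standard output
	mov ecx,msg		; address of the bytes to write
	mov edx,len		; number of bytes to write
	int 80h			; enter the kernel
	mov eax,1		; system call 1 = exit
	xor ebx,ebx		; exit status 0
	int 80h			; does not return

[SECTION .data]

msg:	db "Hiiiiiiiiii World",10	; no trailing 0: write takes a length
len:	equ $-msg			; length computed by the assembler
```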

Conclusion

At this point, the expected question would be:

Why would you want to use assembly language in Linux?

Assuming you have an answer to "Why use assembly language?", the answer is simple: you, the assembly language geek, get to harness the power of a free protected mode operating system. That means no segments, more versatile registers and addressing modes, and a stable environment in which a buggy program does not take the whole machine down with it.


^[1] Simplified Instructional Computer.
^[2] .s files use AT&T syntax, which is quite different from the Intel-style syntax used by MASM, TASM and NASM.
^[3] You don't need them if you only use the INT 80H⁴ kernel interface.
^[4] Visit this website for more info.