Assembly Language Getting started with Assembly Language Machine code


Machine code is term for the data in particular native machine format, which are directly processed by the machine - usually by the processor called CPU (Central Processing Unit).

Common computer architecture (von Neumann architecture) consist of general purpose processor (CPU), general purpose memory - storing both program (ROM/RAM) and processed data and input and output devices (I/O devices).

The major advantage of this architecture is relative simplicity and universality of each of components - when compared to computer machines before (with hard-wired program in the machine construction), or competing architectures (for example the Harvard architecture separating memory of program from memory of data). Disadvantage is a bit worse general performance. Over long run the universality allowed for flexible usage, which usually outweighed the performance cost.

How does this relate to machine code?

Program and data are stored in these computers as numbers, in the memory. There's no genuine way to tell apart code from data, so the operating systems and machine operators give the CPU hints, at which entry point of memory starts the program, after loading all the numbers into memory. The CPU then reads the instruction (number) stored at entry point, and processing it rigorously, sequentially reading next numbers as further instructions, unless the program itself tells CPU to continue with execution elsewhere.

For example a two 8 bit numbers (8 bits grouped together are equal to 1 byte, that's an unsigned integer number within 0-255 range): 60 201, when executed as code on Zilog Z80 CPU will be processed as two instructions: INC a (incrementing value in register a by one) and RET (returning from sub-routine, pointing CPU to execute instructions from different part of memory).

To define this program a human can enter those numbers by some memory/file editor, for example in hex-editor as two bytes: 3C C9 (decimal numbers 60 and 201 written in base 16 encoding). That would be programming in machine code.

To make the task of CPU programming easier for humans, an Assembler programs were created, capable to read text file containing something like:

    INC   a

    DB    60          ; define byte with value 60 right after the ret

outputting byte hex-numbers sequence 3C C9 3C, wrapped around with optional additional numbers specific for target platform: marking which part of such binary is executable code, where is the entry point for program (the first instruction of it), which parts are encoded data (not executable), etc.

Notice how the programmer specified the last byte with value 60 as "data", but from CPU perspective it does not differ in any way from INC a byte. It's up to the executing program to correctly navigate CPU over bytes prepared as instructions, and process data bytes only as data for instructions.

Such output is usually stored in a file on storage device, loaded later by OS (Operating System - a machine code already running on the computer, helping to manipulate with the computer) into memory ahead of executing it, and finally pointing the CPU on the entry point of program.

The CPU can process and execute only machine code - but any memory content, even random one, can be processed as such, although result may be random, ranging from "crash" detected and handled by OS up to accidental wipe of data from I/O devices, or damage of sensitive equipment connected to the computer (not a common case for home computers :) ).

The similar process is followed by many other high level programming languages, compiling the source (human readable text form of program) into numbers, either representing the machine code (native instructions of CPU), or in case of interpreted/hybrid languages into some general language-specific virtual machine code, which is further decoded into native machine code during execution by interpreter or virtual machine.

Some compilers use the Assembler as intermediate stage of compilation, translating the source firstly into Assembler form, then running assembler tool to get final machine code out of it (GCC example: run gcc -S helloworld.c to get an assembler version of C program helloworld.c).