How To Write An Assembler


  • Before you begin, it is useful to read how other assemblers do it. There are a few open source assemblers, such as FASM.

    The first thing you need is a syntax. You need to decide what kind of syntax your assembler is going to use for defining the elements of the language. For example, how are variables declared? How are structures (which are basically collections of declared fields) declared? And so on.

    The next thing you need is a manual for the CPU you are writing your assembler for. From that manual you can see which instructions your assembler is going to convert to code and exactly how these instructions look in binary form.
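
    To illustrate what the manual gives you, here is a small sketch in Python (picked only for illustration). It assumes 32-bit x86, where the Intel manual lists MOV r32, imm32 as opcode B8+rd followed by a 4-byte little-endian immediate. The names REG32 and encode_mov_r32_imm32 are made up for this example.

```python
import struct

# Register numbers used in x86 encodings (32-bit general-purpose registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg, value):
    """Encode `mov reg, value` as B8+rd followed by imm32 (little-endian)."""
    opcode = 0xB8 + REG32[reg.lower()]
    # Mask so that negative values are stored in two's complement form.
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFFFFFF)

if __name__ == "__main__":
    # mov eax, 12345678h
    print(encode_mov_r32_imm32("eax", 0x12345678).hex(" "))  # b8 78 56 34 12
```

    A real assembler needs a table like this for every instruction form it supports; this sketch covers exactly one.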

    Then you need to parse your source code. It is actually simple. You need to divide your source code into lines. Usually a line can continue onto the next one (if it is terminated with a backslash character '\'), so you need to account for this. Also, do not forget to remove comments. That part is tricky, because the comment symbol can be encountered inside a literal, like this:

    variable db 'This is not a comment;' ; This is a REAL comment
                                      |  |
                                      |  +--- Now this one stops the line
                                      +------ Skip that one (it is inside a literal)

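
    The two preprocessing steps above could be sketched like this in Python. It is only a sketch under some assumptions of mine: the comment symbol is ';', a continuation backslash is the last non-blank character on its line, and literals have no escape sequences. The function names are hypothetical.

```python
def join_continued_lines(text):
    """Split source into logical lines, merging lines that end with a backslash."""
    logical, pending = [], ""
    for raw in text.splitlines():
        if raw.rstrip().endswith("\\"):
            # Drop the backslash and keep collecting the logical line.
            pending += raw.rstrip()[:-1]
        else:
            logical.append(pending + raw)
            pending = ""
    if pending:
        logical.append(pending)  # the source ended on a continued line
    return logical

def strip_comment(line):
    """Remove a ';' comment, ignoring any ';' inside a quoted literal."""
    quote = None  # the quote character that opened the current literal, if any
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None      # the literal ends here
        elif ch in "'\"":
            quote = ch            # a literal starts here
        elif ch == ";":
            return line[:i].rstrip()
    return line.rstrip()
```

    Applied to the example line above, strip_comment keeps everything up to and including the closing quote and drops the real comment.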
    Now, a parser of a line should include only two functions:

    1. Skip white space
    2. Get token

    - Skipping white space is simple. You advance your source code pointer while it points to a blank or a TAB. As soon as it encounters a different symbol it should return. Of course, when it sees the end of the line it should stop processing the line.

    - Getting a token is more complex. There are three types of tokens:

    A) A single symbol which is not part of an identifier. This includes the punctuation characters, like .,;:/[]{}+-*& and so on.

    B) An identifier, which contains the symbols a..z, A..Z, 0..9 and underscore (you can also add the @ character here; in some assemblers it plays a significant role). Identifiers are values and names, like 0FFFFh or Var1.

    C) A literal token. It is any text enclosed in single or double quotes. When you are processing a literal, you should not 'see' any tokens of type A or B. Just skip everything until the terminating quote.

    So, now that you have these functions, calling them in sequence will give you the tokens from a parsed line of source code, so your algorithm will look something like this:
    while (not end of line)
        skip white space
        get next token
        token analyser
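
    The loop above, together with the two functions and the three token types, might look like this in Python. Treat it as a sketch: the names and the (kind, text) tuples are my own choices, the line is assumed to have its comment already removed, and the tokens are collected into a list for an analyser to consume rather than analysed one at a time.

```python
import string

# Characters allowed in a type B token (identifier or number), including '@'.
IDENT_CHARS = set(string.ascii_letters + string.digits + "_@")

def skip_white_space(line, pos):
    """Advance pos past blanks and TABs; stop at any other symbol or end of line."""
    while pos < len(line) and line[pos] in " \t":
        pos += 1
    return pos

def get_token(line, pos):
    """Return (kind, text, new_pos) for the token starting at pos."""
    ch = line[pos]
    if ch in "'\"":                       # type C: literal
        end = line.find(ch, pos + 1)      # skip everything up to the closing quote
        if end == -1:
            raise SyntaxError("unterminated literal")
        return "literal", line[pos + 1:end], end + 1
    if ch in IDENT_CHARS:                 # type B: identifier or number
        end = pos
        while end < len(line) and line[end] in IDENT_CHARS:
            end += 1
        return "ident", line[pos:end], end
    return "symbol", ch, pos + 1          # type A: single symbol

def tokenize(line):
    """Skip white space, get the next token, repeat until end of line."""
    tokens = []
    pos = skip_white_space(line, 0)
    while pos < len(line):
        kind, text, pos = get_token(line, pos)
        tokens.append((kind, text))
        pos = skip_white_space(line, pos)
    return tokens

if __name__ == "__main__":
    print(tokenize("mov eax, [Var1+0FFFFh]"))
```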

    Now, the token analyser will work according to the syntax rules of your assembler.

    And, finally, the last thing you need is to build the binary output file. This may be a Win32 PE (EXE) file or some other executable format for a different system. In order to do that you need to know the structure of the file you are going to build. Search the web for it.
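
    A PE file needs headers and section tables, which is too much for a short example. As a much simpler stand-in (not the PE format mentioned above), a DOS .COM file has no header at all: it is just the raw code bytes, which DOS loads at offset 100h. A sketch in Python, where the function name and the file name exit.com are hypothetical:

```python
from pathlib import Path

def build_com_image(code):
    """A .COM image is just the code itself; there is no header to emit."""
    # The image has to fit in one 64 KB segment after the 256-byte PSP.
    if len(code) > 0xFF00:
        raise ValueError("code too large for a .COM file")
    return bytes(code)

# Hand-assembled 16-bit code:
#   B8 00 4C   mov ax, 4C00h
#   CD 21      int 21h        (DOS "terminate program" call)
PROGRAM = bytes([0xB8, 0x00, 0x4C, 0xCD, 0x21])

if __name__ == "__main__":
    Path("exit.com").write_bytes(build_com_image(PROGRAM))
```

    For a real format such as PE, build_com_image would be replaced by code that writes the headers first and then the sections at the offsets those headers promise.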
  • Sir Mr.AsmGuru69

    I send you, Sir, as many thanks as the letters you have typed,
    without a word being mentioned in my question.
    That was because my computer got a virus and it suddenly
    would not allow my message to be posted on the site.
    I ran the anti-virus and 33 trojans were detected.
    So here is what you would have found
    if my question had been allowed to post:

    First of all, I don't even know assembly, although I understand the nature of it now. I downloaded a few of Intel's books, 6 of them, in which they list opcodes. The reason I mention this is:
    what if I write opcodes directly in Notepad (that's what assemblers are doing, aren't they, converting statements into opcodes?),
    put an extension of .exe on the file and run it? Would it work?
    I tried it. It opened a DOS shell and made the cursor bounce all
    over as if it did not know what to do, though I never asked for that.

    So is this what an assembler does: throw opcodes into memory and
    let the computer run over them, resulting in their execution?

    And while you are at it, Sir, now that I
    understand the meaning of your name:
    you are an abbreviation of
    "Assembly Guru 69",
    while all this time I was thinking
    "Awesome Guru 69".
    Now that's AweS:-)me!!!
    I will get answers about assembly.
    Three cheers for that:
    Hip Hip Hurray :-)
    Hip Hip Hurray :-)
    Hip hip Hurray!!!!!!! :-)

    So then :-)
    would you explain a little
    what ModR/M and SIB mean
    when issuing a general instruction to the processor?

    Instruction Prefixes | Opcode | ModR/M | SIB | Displacement | Immediate
                                      |       |
                                      |       +--- Scale | Index | Base
                                      +----------- Mod | Reg/Opcode | R/M


    Thanks !!!
