ALE and fun with line compilers

anon80485093 · October 2, 2021, 3:10am

Hi Everyone,

The thing I really love about assembly language is that you get absolutely no help whatsoever in getting it right, and pretty much all the assistance you need in getting it wrong. To that end I’ve made a lot of changes to the ALE compiler to bring it more into line with ye olde compilers of yesteryear. Now I’m not going to be completely horrible, I’m still going to give you warnings about things you should do or haven’t done, but if it isn’t an error that would crash the compiler, then your ALE program will still run, and if you choose to ignore the warnings you might get some unexpected results. This is what I so enjoy, and ALE is after all, in my mind, all about learning.

So what am I talking about? Firstly you need to know that ALE’s compiler is a traditional single pass line compiler, and single pass compilers need to make a few assumptions. They also don’t take any final program flow into consideration, they just hope that you’ll make everything right in the future. So let’s consider the following snippet of code:

_start:

	jmp ABC123

	mov 1, eax
	int

In the above code you can see that we have a jmp instruction to the label ABC123, but if you examine the rest of the code you can clearly see that there is no label ABC123. When the compiler encounters this line it will create an uninitialised pointer for the label, it then assumes that later on in the code you will actually define the label and the pointer will be updated with the correct address. As far as the compiler is concerned from this point on the label actually exists. If you were to run the program, blissfully ignoring the warning about the missing label, your ALE program would immediately exit as soon as it encounters the jmp instruction, as it has no idea where to go.

The same is true of variables. If the compiler encounters an instruction which is clearly reading or writing to a variable, then it will assign a memory address and hope that you define the variable in the future.

So why does any of this occur? Well because the compiler doesn’t consider any program flow, it doesn’t know that you might call a subroutine to create your variables before you need them. It just assumes that you’re going to do the right thing later. Also with many assembly language implementations variables are created in a section called data at the end of the program, so a similar thing would occur.
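
To make the forward-reference behaviour concrete, here is a minimal sketch in Python (not ALE's actual source, which isn't shown here). It assumes a toy format where a label is any line ending in a colon and only jmp refers to labels; the function name assemble_single_pass is made up for illustration. A reference to an unknown label creates an empty pointer, a later definition fills it in, and anything still empty at the end becomes a warning rather than a fatal error.

```python
def assemble_single_pass(lines):
    """Walk the source once; return (labels, warnings).

    labels maps each label name to its address, or to None if the
    label was referenced but never defined.
    """
    labels = {}    # name -> address, or None while still unresolved
    address = 0    # address of the next instruction
    for text in lines:
        text = text.strip()
        if not text:
            continue
        if text.endswith(":"):
            # A definition fills in (or creates) the pointer.
            labels[text[:-1]] = address
            continue
        parts = text.split()
        if parts[0] == "jmp" and len(parts) > 1:
            # A reference creates an unresolved pointer if none exists yet.
            labels.setdefault(parts[1], None)
        address += 1
    warnings = [f"label '{name}' is never defined"
                for name, addr in labels.items() if addr is None]
    return labels, warnings


source = ["_start:", "jmp ABC123", "mov 1, eax", "int"]
labels, warnings = assemble_single_pass(source)
```

With the snippet from earlier in this post, ABC123 ends up in the table with no address and a single warning is produced. If the label is defined further down, the pointer is filled in and the warning goes away.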

Now what about scope? Traditionally assembly language has no scope, you can just think of everything as being global. This is also why you can only ever have one label or variable of a given name, even if you were to create a variable inside a subroutine. Yes there are some later assembly language implementations which add a layer on top to handle scope and memory management, but ALE isn’t one of them.

Now I have added a very simple scope check to ALE’s compiler, just to make sure you’re not jumping into a subroutine without using a call statement, which would be bad and could cause all sorts of stack and execution pointer errors.

0	_start:
0		call mySub
0		mov 1, eax
0		int
1	mySub:
1		add r0, r1
1		ret

In the above code you can see that I assign an increasing index to each called subroutine. For jump instructions I just check to see that the line you’re jumping from and the line you’re jumping to have the same index, which means you’re not jumping out of scope. With variables I check that the variable’s index is either the same as that of the line making the change, or 0, which means the variable is global.

None of this actually affects the runtime, as the engine just treats everything as global anyway, but by following the scope warnings you can write more robust code.
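
As a rough illustration of the jump check, here is a short Python sketch. It assumes the lines have already been given their scope index, exactly as in the listing above, and the function name check_jump_scope is hypothetical. It also collects the labels up front to keep things short, which is a simplification and not how a single pass compiler would do it.

```python
def check_jump_scope(indexed_lines):
    """indexed_lines: list of (scope_index, text) pairs, as in the listing."""
    # Record the scope index of every label definition.
    label_scope = {text.strip()[:-1]: scope
                   for scope, text in indexed_lines
                   if text.strip().endswith(":")}
    warnings = []
    for scope, text in indexed_lines:
        parts = text.split()
        if len(parts) == 2 and parts[0] == "jmp":
            target = parts[1]
            target_scope = label_scope.get(target)
            # Warn only when the target exists and lives in another scope.
            if target_scope is not None and target_scope != scope:
                warnings.append(
                    f"jmp to '{target}' crosses from scope {scope} "
                    f"to scope {target_scope}")
    return warnings


program = [
    (0, "_start:"),
    (0, "call mySub"),
    (0, "jmp mySub"),   # deliberately wrong: jumps into a subroutine
    (0, "mov 1, eax"),
    (0, "int"),
    (1, "mySub:"),
    (1, "add r0, r1"),
    (1, "ret"),
]
warnings = check_jump_scope(program)
```

The call on the second line is fine because call is the proper way into a subroutine. Only the extra jmp, added here to trigger the check, produces a warning.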

Now you don’t have to follow any of this if you don’t wish as you can turn off the options in the editor, but I do think these changes are beneficial, because now you get a whole list of warnings for your code, rather than just a single error code which only gave a vague idea of what was wrong. By following the warnings I believe you can write better code.

I hope this helps.

Kind Regards

TC

Robert_Weaver · October 2, 2021, 7:56am

Thanks for the info.

My own personal philosophy is that any error that can be detected, should be detected and flagged. Assembly language programmers need all the breaks they can get.

Several years ago, I wrote an assembler for PIC microcontrollers. Coincidentally, I wrote it in Xojo’s predecessor: Real Basic. It was a two pass assembler: the first pass scanned the code for labels and built the symbol table, and the second pass compiled the machine code. I paid a lot of attention to detecting errors. In the end, it would flag a lot of things that Microchip’s PIC assembler would miss. One, in particular, was detecting the wrong number of operands for an instruction. Microchip’s assembler would miss that, causing all kinds of grief.
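
For anyone who hasn't seen the two pass approach, here is a minimal Python sketch of the idea, including an operand count check. It is only an illustration under assumed names: the OPERAND_COUNTS table and the function assemble_two_pass are hypothetical and have nothing to do with the real PIC instruction set or with the original Real Basic code.

```python
# Hypothetical operand counts, for illustration only (not a real PIC table).
OPERAND_COUNTS = {"mov": 2, "add": 2, "jmp": 1, "ret": 0}


def assemble_two_pass(lines):
    """Return (symbols, errors) for a toy assembly source."""
    # Pass 1: scan for labels and build the symbol table.
    symbols = {}
    address = 0
    for text in lines:
        text = text.strip()
        if not text:
            continue
        if text.endswith(":"):
            symbols[text[:-1]] = address
        else:
            address += 1

    # Pass 2: every label is now known, so each instruction can be checked.
    errors = []
    for number, text in enumerate(lines, start=1):
        text = text.strip()
        if not text or text.endswith(":"):
            continue
        parts = text.replace(",", " ").split()
        mnemonic, operands = parts[0], parts[1:]
        expected = OPERAND_COUNTS.get(mnemonic)
        if expected is None:
            errors.append(f"line {number}: unknown instruction '{mnemonic}'")
        elif len(operands) != expected:
            errors.append(f"line {number}: '{mnemonic}' expects {expected} "
                          f"operand(s), got {len(operands)}")
        if mnemonic == "jmp" and operands and operands[0] not in symbols:
            errors.append(f"line {number}: undefined label '{operands[0]}'")
    return symbols, errors


source = ["start:", "mov r0", "jmp done", "add r0, r1", "done:", "ret"]
symbols, errors = assemble_two_pass(source)
```

Because the symbol table is complete before the second pass starts, the forward jump to done resolves without any guessing, and the mov with a missing operand is reported with its line number.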

BTW, you’re apparently old enough to remember punched cards. The original two pass assemblers required the user to run the source code deck through the card reader twice in order to do the two pass assembly. Hence, to save the nuisance of having to do this, the single pass assembler was born.

anon80485093 · October 2, 2021, 8:01am

Hi Robert,

Thanks for that, I probably wasn’t that clear, I certainly am flagging all the errors so you can easily see them, I’m just not automatically fixing them for you in the background anymore. I like to understand what I got wrong and not have it hidden. Any critical error wouldn’t even be allowed to run.

TC

anon80485093 · October 2, 2021, 8:03am

I started in IT a while after punch cards, but I do recall seeing them, though I never had the pleasure of using them.

TC