Destructible or self-modifying code

I’ve been looking at a little-used modifier on some assembly code which I think would be fun to implement, at least in my dev version, along with the ATOMIC modifier, which is even stranger and just as fun.

Let me explain. Say you wanted to create a subroutine to write a carriage return to the text console, but you only wanted to create one subroutine, not one to build the CRLF string and another to display it. Without calling an external library, zero level assembler doesn’t really have a CHR method, so you have to push the character codes onto the stack and then pop them off onto the string to create your CRLF string.

Now you could certainly create your CRLF string somewhere in your program init, which would be logical, but say you wanted to keep it all nicely encapsulated in one place.

So let’s examine what’s happening in the following code. The _crlfStr$ variable definition, blank lines and comments will be stripped by the compiler, and the variable _crlfStr will be allocated a memory address on the heap containing a pointer to its string data on the string stack.

The next four lines, with the Destruct modifier, will destroy themselves after being run just once, so they’ll disappear as well. What really happens is that they replace their bitcode with a 0, making them a NOP, so they’ll just be skipped in the future.

So in the end we are left with just the last three lines, which actually write the CRLF to the console.

In essence we have self-modifying, or self-optimising, code that doesn’t need a conditional check around it to see whether the CRLF string has already been created. Conditionals in any language are not known for being super fast, and organising our code so they are not required is a sure-fire way to improve performance.

I’d be interested to know if anyone else has had experience with destructible or self-modifying code. One of my closest friends, who is a serious developer, is horrified by the idea, as he believes it would be impossible to debug because you wouldn’t know what’s actually running, which is a fair point…

_doCRLF
	_crlfStr$ "" 

	push 10	: Destruct
	push 13 : Destruct
	popsc _crlfStr : Destruct
	popsc _crlfStr : Destruct

	mov _crlfStr, ebx
	mov 1, eax
	int 0x21
ret

I expect McAfee, Avast, AVG and many others would be all over that.

True, but it’s certainly not my idea and it’s been around for a long time. I’ve seen it used in some game engines to improve performance… The only thing that changes is the bitcode of the instruction; the line still contains all the other data.

I was just talking to an older friend of mine who started programming back in the very early days of LISP, and he was able to confirm that even at that time LISP had self-modifying code if you wanted to use it. So this is by no means a new idea…

LISP is a lot of fun. It doesn’t distinguish between program code and data. The program syntax is identical to data syntax. So, you can write a program that writes a program on the fly and then runs it. Lambda expressions and the Eval function are the principal building blocks for doing this.
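The same code-is-data trick can be sketched in Python, which also lets a program build source text at runtime and execute it. This is just my analogue for illustration, not LISP itself, and eval here stands in for LISP’s Eval:

```python
# A program that writes a program on the fly and then runs it.
# The "program" starts life as ordinary data: a string.
src = "lambda x: x * x"

# eval turns the data into runnable code (a function object).
square = eval(src)

print(square(7))   # prints 49
```

In LISP this is even more natural because the program text is itself a LISP data structure (a list), not just a string, so code can be built and taken apart with the same list operations used on any other data.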

Sounds like I’m going to have to have a play with it… I’m sure my friend would be delighted… :slight_smile:

Tossing in my 2 cents …

Based on my experiences many years ago when I was programming in assembly language on Data General Eclipse, DEC PDP-10 and DEC VAX …

The early designs of computers combined program and data space, probably to simplify the hardware. This allowed for self-modifying code, which could be handy when memory was tight. Developers were both pleased with the cleverness shown and upset with the potential for bugs and the difficulty of debugging.

Currently I consider it a design defect that this is allowed at all. With large memory spaces and virtual memory, there should be no real need for it, just compatibility with old stuff. These days it’s mostly useful for hacking.

Windows has an optional feature called Data Execution Prevention which attempts to segregate data and program space and block execution of data. Intel and AMD have added hardware support for this. But in Windows 10, the default setting is off except for “essential Windows programs and services.” There must still be enough application software that mixes code and data. Maybe created by old compilers that didn’t separate program and data?

And if you look at a file system in a similar way, you see the same problem. The same file space stores both programs and data. Recent operating systems attempt to use permissions to prevent overwriting programs, but this doesn’t stop admin users or other techniques.

In the file system, self-modifying code can be useful allowing software to update itself automatically.

Aside from software that needs to update itself, I don’t see any real need for self-modifying software any longer. It may be useful to teach techniques to computer science students so they understand how things work.


Even 35 years ago, memory blocks could be marked as no-access, read-only, read-write, or execute-only. In fact I thought VMS did this routinely BICBW. There’s nothing new about such techniques.

Either way, self-modifying code should generally be consigned to the dustbin of history. There may be specific instances where bootstraps, installers, and the like might be rendered simpler if one is able to overwrite an address or so here and there, but these must be the exceptions.

When I used to lecture at uni on programming concepts and also computer history, I was often amused by the opinions of students who would only use the newest method of doing something because it was new, so it was obviously better than the old way. I’m not having a go at anyone. Everyone is entitled to their own opinion, but that doesn’t mean their opinion is any more or less valid than mine. If we all had the same opinion about something then there would be no room for discussion on anything, and the world would be a much more boring place than it already is.

I’m not suggesting that anyone use any of the historical concepts I describe, and I never would; I merely post them because, as a lover of programming history, I think that understanding old techniques can give you ideas for improving your newer code. The current example might make you think about how you could remove conditionals or redesign your code in places to speed up execution, and in OO you certainly don’t need to modify running code to do that; and if there are now operating system protections in place to prevent it, then of course it is impossible anyway. But knowing that it was once possible can make you think.

I would ask new students a question: “A client has asked you to write a large business application, so what are you going to write it in?” Invariably I’d get responses naming whatever was the latest and greatest programming language at the time, when in fact the correct answer was, “whatever tool is best suited to the task, meets the client requirements and gets the project completed in the desired time frame, regardless of the language.” If you needed to use a RAD tool which used, to the students’ horror, BASIC or Pascal, then that’s what you’d use.

When I moved on to a company called QCOM in 1999, they were already Australia’s oldest software house, being over 30 years old when I joined them. I was responsible for coding some of Queensland Rail’s train monitoring systems, as well as graduate recruitment. Now QCOM had a lot of very well thought out QA and coding procedures which many of the new graduates couldn’t cope with, because the procedures used old, tried, tested and proven ways of doing things. Some graduates would often leave soon after because they didn’t want to learn the old ways of doing something. In 2000 QCOM was acquired by Unisys, and I spent almost the next 20 years being contracted out to code for some big names, and I can honestly say we didn’t always use the latest techniques or technologies.

Now it’s no secret that I’m really not much of a lover of object-oriented coding. I like procedural code, and I particularly like zero level assembler, which means everything is global: there is no scope, no dynamic memory management, or really any memory management of any kind. All variables are static, and their addresses and memory are allocated at compile time. The only dynamic memory you have is the stack, which you can push and pop items on and off at will. If you allocate too much memory to the heap and it crashes into the stack, then everything goes down really quickly.

Now I’m reminded of an interesting example of using an old method over new. In 1999 ID software released Quake 3 Arena, in 1999 we had pentium 2, 3, Celeron’s and Xeon processors all with floating point coprocessors, the 486DX was released in 1989 so consumer level FPU’s on a chip had been around for a while. So the question is why did ID decide to use a software implementation of doing a Fast Inverse Square Root function instead of using the FPU. The FISR algorithm as it’s now known is legendary, it works strangely by smashing a floating point number into an Integer then converting it back to a floating point by using a hexadecimal constant of 0x5F3759DF to get an estimate of the reciprocal. Now if you look into the history no one is one hundred percent sure who came up with the code, ID claims it was one person but the algorithm can be traced back to SGI with the introduction of the Silicon Graphics Indigo back in about 1991. No one knows who came up with the constant though. If you’re interested in a really good mystery reading up on the FISR algorithm is fascinating. You could also watch a YouTube video about it on the channel Dave’s Garage, hosted by Dave Plummer an ex Microsoft programmer responsible for such things as dos 6.2, TaskManager, Zip Folders the dreaded Windows activation code and much much more…
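For the curious, here’s the gist of the trick as a sketch in Python rather than the original C (Python’s struct module does the bit-reinterpretation that C achieves with pointer casts). The constant is the famous one; the rest is my paraphrase of the widely published algorithm:

```python
import struct

def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) using the FISR bit trick."""
    # "Smash" the 32-bit float into an integer: same bits, new meaning.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The legendary magic constant, origin uncertain as noted above.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the bits back as a float: a surprisingly good estimate.
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration refines the estimate.
    y = y * (1.5 - 0.5 * x * y * y)
    return y
```

With one Newton step the result lands within a fraction of a percent of the true value, e.g. fast_inverse_sqrt(4.0) comes out very close to 0.5, which was plenty accurate for lighting calculations in a game engine and cheaper than the FPU division and square root of the day.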

As you can tell, I’m more than a little passionate about how the old ways can teach us new ways of doing things. I hope this little trip into some of my past, and the reason I am the way that I am, has not been too tiring for everyone…

TC

IMHO, the specific reason self-modifying code should be avoided now is simply the growing need to separate program code from data: keeping the program code isolated and read-only can reduce the potential for malware. Sadly, malware has become more of an issue than in the days when self-modifying code was first used to work around footprint sizes or hardware constraints.

I also recall that self-modifying code was used in some programs to make the code much harder to disassemble in a debugger using breakpoints.

Decades ago I used to enjoy assembler and trying to optimize programs. Now that I am older and the hardware price/performance vs programmer time ratio has changed dramatically, I tend to be much more focused on code readability and maintainability. And self-modifying code fails that test, even if the hardware and OS allow it.