It's very common for people to learn assembly using x86(-64), but this ISA is so messy, complicated and layered that it always seems like a bad choice to me. It's like teaching an intro programming course using C++: I understand why it's practical, but it seems like it will be more trouble than it's worth.
I'd probably recommend getting some ARM or ARM64 board and starting with that; most of the concepts will carry over to other assemblies anyway. Writing a simple CPU emulator for some simple architecture and then coding on it can also be a great teaching experience, if a little more involved.
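To give an idea of how small such an emulator can start out, here is a minimal sketch. The instruction set (LOAD, ADD, DEC, CMP, JZ, JMP, CALL, RET, HALT), the register names and the `run` function are all invented for illustration and do not correspond to any real ISA.

```python
# Toy register machine. The opcodes and register names are made up for this
# sketch; they are not taken from ARM, x86 or any other real architecture.

def run(program, max_steps=10_000):
    """Execute a list of (opcode, *operands) tuples and return the registers."""
    regs = {"r0": 0, "r1": 0}
    stack = []          # return addresses pushed by CALL
    zero_flag = False   # set by CMP, read by JZ
    pc = 0              # program counter: index of the next instruction

    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":        # LOAD reg, imm
            regs[args[0]] = args[1]
        elif op == "ADD":       # ADD dst, src  (dst += src)
            regs[args[0]] += regs[args[1]]
        elif op == "DEC":       # DEC reg
            regs[args[0]] -= 1
        elif op == "CMP":       # CMP reg, imm  (only updates the zero flag)
            zero_flag = regs[args[0]] == args[1]
        elif op == "JZ":        # jump if the zero flag is set
            if zero_flag:
                pc = args[0]
        elif op == "JMP":       # unconditional jump
            pc = args[0]
        elif op == "CALL":      # push the return address, then jump
            stack.append(pc)
            pc = args[0]
        elif op == "RET":       # pop the return address and resume there
            pc = stack.pop()
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached")


# Sums 5 + 4 + 3 + 2 + 1 into r0 using a compare-and-branch loop.
SUM_TO_FIVE = [
    ("LOAD", "r0", 0),    # 0: accumulator
    ("LOAD", "r1", 5),    # 1: counter
    ("CMP", "r1", 0),     # 2: loop head
    ("JZ", 7),            # 3: leave the loop when the counter is zero
    ("ADD", "r0", "r1"),  # 4
    ("DEC", "r1"),        # 5
    ("JMP", 2),           # 6
    ("HALT",),            # 7
]

# A call into a two-instruction "subroutine" that returns to the HALT.
CALL_DEMO = [
    ("CALL", 2),          # 0
    ("HALT",),            # 1
    ("LOAD", "r0", 7),    # 2: subroutine body
    ("RET",),             # 3
]

if __name__ == "__main__":
    print(run(SUM_TO_FIVE))
    print(run(CALL_DEMO))
```

Even something this small already forces you to think about the program counter, a flag, and a return stack, which is most of what carries over to a real ISA.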
I find this tutorial a bit oddly structured too, but that may be because I'm from the un*x world and I don't have the Windows mindset. It basically goes from "int3 ret" to "The PE Format and DLL Imports". That seems like quite the jump, and not necessarily super relevant to learning ASM IMO.
Kind of disagree, but only because the literature on ASM for x86 is that good. What I usually miss is a declaration of the platform. In most cases "learning ASM" means x86/Linux or x64/Linux; this time it is x64/Windows. That matters because you will quickly get to a point where you want to do system calls, and those vary from platform to platform.
If you like an easy instruction set and don't care about platform specific syscalls, I would recommend an 8bit µC. An Arduino for example.
I get your point (and agree about 8bit µC) but I would also point out that ARM/Linux and ARM64/Linux are pretty common and accessible these days. Buying an RPi to learn ASM and not have to deal with x86 nonsense would be a good investment IMO.
Maybe not everybody shares my deeply rooted dislike for x86 though.
I think I agree, but it is very important that the student learns which of the facts they were taught are about ARM/ARM64/MIPS (or whatever you choose to use) and which are about computers in general. Otherwise, you get people coming to Stack Overflow with questions like "How come this book says that on x86 you can jump into the middle of an instruction and execute those bytes as a different instruction, when clearly this is impossible, because I studied a RISC-based computer in a COA class and learned that instructions are read sequentially and that instructions are read from memory 4 bytes at a time".
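For anyone who hasn't seen the "jump into the middle of an instruction" effect, here is a rough sketch. The `decode` function is a hypothetical linear decoder that knows just three real x86 one-byte opcodes (0xB8 = mov eax, imm32; 0x90 = nop; 0xC3 = ret) and nothing else, so it is only good enough for this demonstration.

```python
import struct


def decode(code, offset):
    """Linearly decode `code` starting at `offset`.

    Only three x86 opcodes are handled; everything else is reported as an
    unhandled byte. This is an illustration, not a real disassembler.
    """
    out = []
    i = offset
    while i < len(code):
        opcode = code[i]
        if opcode == 0xB8 and i + 5 <= len(code):
            # mov eax, imm32: one opcode byte plus a little-endian 32-bit value
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax, {imm:#x}")
            i += 5
        elif opcode == 0x90:
            out.append("nop")
            i += 1
        elif opcode == 0xC3:
            out.append("ret")
            i += 1
        else:
            out.append(f"(unhandled byte {opcode:#04x})")
            i += 1
    return out


# The same five bytes, read from two different starting points.
CODE = bytes([0xB8, 0x90, 0x90, 0x90, 0xC3])

if __name__ == "__main__":
    print(decode(CODE, 0))  # one 5-byte instruction
    print(decode(CODE, 1))  # four 1-byte instructions hidden in its immediate
```

Started at offset 0, the bytes are a single `mov`; started one byte later, the immediate operand itself decodes as `nop; nop; nop; ret`. On a fixed-width, aligned ISA the second reading simply cannot be reached, which is exactly the assumption the hypothetical student is over-generalizing.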
I mainly use assembly for optimization myself, in general to double-check that
the compiler did what I expected it to do. I also sometimes use it when I do
bare-metal programming or other very low level code, but that's pretty niche
these days.
But even more generally I think you won't have any trouble picking up x86
idiosyncrasies if you've familiarized yourself with ARM/MIPS/Z80 assembly.
Although maybe MIPS would be a bad choice because it doesn't use flags which are
an important concept for many assembly languages.
IMO the key concepts for an assembler tutorial would be, off the top of my
head:
- The stack,
- Banking registers, the frame pointer,
- The various types of jumps/calls/branches and their differences,
- Conditionals,
- Calling conventions,
- Banking/context switching/IRQs (at least for low level programming, not so
important if you're only dealing with userland I suppose),
That stuff exists on basically any architecture. Then you have things like
immediate encoding and addressing modes which are also very important and
architecture-specific.
I suppose SIMD could be interesting as well, but these days it seems to be
mostly done with intrinsics instead of raw assembly, at least in my experience.
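On the flags and conditionals points above, a small sketch of what a compare instruction leaves behind may help. `compare8` and the three predicate helpers are hypothetical names; the carry flag here follows the x86 convention (set on borrow), whereas ARM sets C when there is no borrow, so treat that line as an assumption about which architecture you are modelling.

```python
def compare8(a, b):
    """Return the flags an 8-bit subtraction a - b would produce."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    return {
        "Z": result == 0,                           # zero: operands were equal
        "N": sign_r == 1,                           # negative: top bit of result
        "C": a < b,                                 # carry as borrow (x86 convention, assumed)
        "V": sign_a != sign_b and sign_r != sign_a, # signed overflow
    }


def is_equal(flags):
    return flags["Z"]


def unsigned_less(flags):
    # Corresponds to x86 "jb" after a cmp: taken when the carry flag is set.
    return flags["C"]


def signed_less(flags):
    # Corresponds to x86 "jl" after a cmp: taken when sign != overflow.
    return flags["N"] != flags["V"]


if __name__ == "__main__":
    # 0x01 vs 0xFF: 1 < 255 unsigned, but 1 > -1 signed.
    f = compare8(0x01, 0xFF)
    print(f, unsigned_less(f), signed_less(f))
    # 0x80 vs 0x01: 128 > 1 unsigned, but -128 < 1 signed.
    f = compare8(0x80, 0x01)
    print(f, unsigned_less(f), signed_less(f))
```

The point is that one compare produces all the flags at once, and the choice of branch instruction decides whether the operands are interpreted as signed or unsigned. An ISA without flags has to fold that choice into the compare or branch instruction itself.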
Yes, when you study assembly as a general programming language, the same way you learn another language. But again, for people who are interested in reverse engineering, the focus is different, and you'll find that if you start looking for assembly tutorials online (and also the books on modern assembly, e.g. I have "Modern x64 assembly language programming" and "Windows 64 bit assembly language programming quick start" here next to me and those fit that description), a lot of those are for aspiring reverse engineers. The (numerical) optimization angle is a lot harder to find, agner.org probably being the exception.
E.g., knowing what _main is and how to call an OS primitive is a lot more important when reverse engineering than knowing about context switching and SIMD. I was just trying to say that's where the focus on a specific OS comes from, along with a focus on things that people who do manual numerical optimization would consider irrelevant, or at best tangential, to what they consider 'important' in assembly.
There's no reason that somebody's first exposure to bare-metal programming should be with the ISA they actually want to understand in practice, though.
Although no obvious example is springing to mind (any help?), I'm pretty sure there are cases where it's faster to teach someone (even an adult!) a lie-to-children model (https://en.wikipedia.org/wiki/Lie-to-children) of a system, and then teach them the actual system, than it is to teach them the actual system from the start.
I believe that, for the same reason I believe that people get better at the reflex skills of a video game if early parts of the game hold back some of the game's mechanics, focusing on only a core subset. You're allowed to develop just those sub-skills in isolation, and get good at them, so that they can become subconscious-enough that you won't be distracted thinking about them any more by the time the new sub-skills are introduced.
Honestly, I think a great way to learn bare-metal programming would be to not work with bare-metal at all, but rather to work with a virtual machine for a custom ISA, where that supported ISA has 100 different sub-variants (ranging from very simplified to very "realistic") and the VM supports all of them. You'd learn to work with the simplest ISA (e.g. target a compiler backend to it), then learn the next-simplest (and tweak your compiler to also emit the new instructions introduced), etc. Evolve from RISC to CISC, from stack-based to register-based, from SISD to SIMD, gradually add vector instructions, etc.
By the end of a process like that, you'd understand a (fake) ISA nearly as complex as x86-64; but more importantly, you'd understand the why of it, not just the what. You'd understand the history behind each instruction, and why it has the options/limits it does. At that point, learning x86-64, or AArch64, or whatever else, would just be "learning the vocabulary", with no new skills per se.
> It's very common for people to learn assembly using x86(-64) but this ISA is so messy, complicated and layered that it always seems like the bad choice to me.
Agreed. As someone who did lots of assembly for the 6502 and the whole Motorola 68k series CPUs while on Mac & Commodore-machines, I thought I might as well learn x86 assembly when I moved to "regular" PCs.
After a week or so of en-bafflement, I just decided that dealing with assembly on any sort of regular basis was not something I was interested in doing on the Intel-platform.
It was inferior to competing platforms in almost every conceivable way back then (except performance) and it hasn't improved since. It's just utterly terrible and the only reason it's still around is backwards compatibility. Nobody would design anything like this today.
It really depends what you started with - I've never touched 6502, but I did a lot of z80 assembly back in the day. Since Zilog was started by ex-Intel people there's a lot in common.
For me jumping from z80 -> x86 was only a small change. The same notion of paired-registers and very similar instructions made it a simple enough change.
Of course these days there have been a lot of changes, but the basics are still basic, albeit idiosyncratic.
It's too complicated IMO. The surface to cover is absolutely huge. Teaching complete novices to program is complicated enough, explaining the difference between char * and std::string and how a char is not really a char if you deal with unicode and OOP and destructors and exceptions and virtual calls and global constructors and RAII and templates and overloading and...
I think it would make more sense to start with a simpler language and work your way to C++ if that's what you want.
Depends on whether your goal is to teach the maximum number of people the fundamentals of logic, or to transfer as much practical knowledge as quickly as possible to the students with an aptitude for programming.
I recently took up the challenge of porting JonesForth to x86-64. It was one of the most rewarding personal challenges in a very long time. For one, I was able to write x86-64 without needing to whip up a complete application. And learning Forth during the process was even more fun.
Almost right. At the end of the exercise, I at least wrote TIME&DATE, a word to get the current system time. And I implemented DO..LOOP, which turned out to be quite tricky. But yes, I agree. Getting the system to run was much more interesting than actually using it :-)
Nice illustration, thanks. Regarding the formatting, for a chart like this you will find it easier if you switch to a monospaced font. You can do that with two spaces at the beginning of each line, and you won't need the blank lines either. Here is a start:
Completely off topic, but I just have to say it: This is one of the few light-text-on-dark-background web pages that actually does it well. Contrast is managed so the text is still readable without escaping to Reader View or wearing sunglasses. I wish those sites that use squeaky-white on pitch-black would take example of this.
Around 2000 we did this in the first week of university alongside C. The number of people who changed courses to more analysis- or business-focused degrees was high. Of course it was deliberate. We barely touched assembly again unless you opted into specific classes.
When I was in my master's I had never programmed in C and barely any assembly, as I was in a Java/Python/JS school. Nevertheless, I found it doable to learn them both at the same time.
How?
GDB, combined with the TUI window and typing:
* layout asm
* layout regs
* focus cmd
And then ni and si were my favorite "next instruction" and "step into instruction" commands.
And for C the same thing but then simply with layout src and sometimes layout asm and layout regs as well.
Maybe I'm just dumb, but anytime I've tried to learn anything about modern x86 platforms I just end up completely lost.
I find it interesting that the author uses FASM, and I've seen it used a bit more than I did in the past. Several years ago I toyed with it and found it neat because of the editor and all the samples it shipped with. It did seem a bit different from things like nasm or gas, as the FASM code I saw used all sorts of interesting macros that provided quasi-high-level constructs like if statements.
I feel like there should be a text that teaches assembly rather than assuming it, but teaches it specifically with an eye toward shellcode and/or reversing. Every "learn assembly" text I've ever seen either teaches it in relation to C or architecture or simply by itself. Seems like a gap that could be filled by nostarch or someone willing to self-publish.
Write position-independent code. This is much easier nowadays, now that you can use RIP-relative addressing. To include data, just append it to your code, or even put it inline with jmps around it to avoid executing it. To accomplish tasks, use syscalls. If you want a library, load it dynamically with dlopen and look up what you need with dlsym.
A mid-90s object-oriented textbook somewhere is screaming at me, "pigeon-hole computing is dead and unappealing unless you are a pigeon or a mailman! Don't treat computers like a list of robots and mailboxes, don't focus on the hows of computing, focus on the whys!"
Huge thanks! Was recently implementing an assembly language of my own, and this read seems to capture all of the important things that were otherwise hidden in ten different outdated books, specifications and websites.
Yes, that was like assembly for dummies. Still, I don't think it really exploited its concept well enough. It was not remotely challenging to someone with a software background.