Can someone explain how the microcode can be reconstructed from a high-resolution image of the die? I'm really curious, what's the process? Is the output some sort of Verilog? Does the process involve recognizing each and every transistor and modeling a circuit from that? I'm fascinated that something like this is possible at all...
Then you have to classify them as 0's or 1's. Each is visually distinct, a 1 being encoded by the presence of a transistor and a gap in the polysilicon. We didn't have to guess which is which: by the nature of Intel microcode we could assume 0's were much more frequent, so a transistor meant a 1.
There are some automatic tools designed to perform this work via color thresholding, but they didn't work very well here because some of the mosaic was blurry, and a lot of dust had crept in which created false 1 bits.
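To make the failure mode concrete, here is a minimal sketch of threshold-based classification on a single bit cell. It is not any of the actual tools mentioned; the patch size, the grey levels, the threshold, and the assumption that a transistor shows up darker than an empty cell are all made up for illustration.

```python
def mean(patch):
    """Average grey level of a 2D patch (list of rows of 0-255 ints)."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels)


def classify_by_threshold(patch, threshold=128):
    """Return 1 if the patch is darker than `threshold`, else 0.

    Assumption for this sketch only: a transistor appears as a dark region.
    """
    return 1 if mean(patch) < threshold else 0


empty_cell = [[200] * 4 for _ in range(4)]       # bright: no transistor
transistor_cell = [[60] * 4 for _ in range(4)]   # dark: transistor present

# An empty cell where dust darkens three of the four pixel rows.
dusty_cell = [[200] * 4 for _ in range(4)]
for r in range(1, 4):
    for c in range(4):
        dusty_cell[r][c] = 20

print(classify_by_threshold(empty_cell))       # 0
print(classify_by_threshold(transistor_cell))  # 1
print(classify_by_threshold(dusty_cell))       # 1, a false 1 caused by dust
```

A global threshold only looks at brightness, so anything that darkens an empty cell reads as a 1.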
Instead, we trained a convolutional neural network to classify the extracted bit regions into 0's and 1's. This was overlaid back onto the original mosaic as white or black squares at 50% opacity.
Then we spent several long, tedious days just checking the results for errors. Finally we had the raw 2D array of bits. The next step was to extract the microcode words from the bit array.
Documents from the NEC vs Intel lawsuit ended up describing the microcode word format for both the 8088 and NEC V20 CPUs, but unfortunately, we were on our own for the 386. But we could take educated guesses - working off the 8088 field format, what additional microcode fields would a 386 add? What fields would expand and how many bits would they need?
We used a lot of Python scripts to decode the microcode array into 37-pixel wide, very long bitmaps, in different permutations, to see if any vertical patterns emerged that would hint at the boundaries of microcode word fields. And some did emerge!
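A minimal sketch of that kind of script, assuming the ROM bits are already available as a flat list of 0/1 values. Only the 37-bit width comes from the description above; the function names and the toy data are hypothetical, and the real scripts are not shown in this thread.

```python
import random

WORD_WIDTH = 37  # bitmap width mentioned above


def to_rows(bits, width=WORD_WIDTH):
    """Cut a flat bit list into rows of `width` bits, dropping any ragged tail."""
    usable = len(bits) - len(bits) % width
    return [bits[i:i + width] for i in range(0, usable, width)]


def column_density(rows):
    """Fraction of 1 bits in each column; flat or extreme runs hint at fields."""
    n = len(rows)
    return [sum(row[c] for row in rows) / n for c in range(len(rows[0]))]


def render(rows):
    """ASCII stand-in for the bitmap: '#' for 1, '.' for 0, one word per line."""
    return "\n".join("".join("#" if b else "." for b in row) for row in rows)


# Toy data (made up): columns 0-4 always 0, column 5 always 1, the rest random.
random.seed(0)
toy_bits = []
for _ in range(200):
    toy_bits.extend([0] * 5 + [1] + [random.randint(0, 1) for _ in range(WORD_WIDTH - 6)])

rows = to_rows(toy_bits)
density = column_density(rows)
print(render(rows[:8]))
print([round(d, 2) for d in density])
```

In the rendered rows the constant columns show up as solid vertical stripes, which is the sort of visual pattern that suggests where one field ends and the next begins. Trying other widths or bit permutations only means changing `width` or reordering `bits` before calling `to_rows`.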
> The photo above shows part of the microcode ROM. Under a microscope, the contents of the microcode ROM are visible, and the bits can be read out, based on the presence or absence of transistors in each position.
[1]: https://www.righto.com/2020/06/a-look-at-die-of-8086-process...
z386: An Open-Source 80386 Built Around Original Microcode - https://news.ycombinator.com/item?id=48248014 - May 2026 (22 comments)
It's especially fun seeing his blog going back 33 years.
Easy to find a free pdf
For people that don’t have access to a uni, I recommend nand2tetris.org
There certainly is no need to go to university to learn chip design. Watching a few Alan Kay talks [3] or browsing Bitsavers computer designs [4] are good starting points.
We made an easier way (than FPGA) to simulate and convert your gate level design into transistors on a chip (for less than $200 in 2026). We call it Morphle Logic [1].
Eventually you grow into making the largest, fastest, and cheapest supercomputer with wafer-scale integration [2].
[1] https://github.com/fiberhood/MorphleLogic/blob/main/README_M...
[2] https://www.youtube.com/watch?v=vbqKClBwFwI
[3] https://www.youtube.com/watch?v=f1605Zmwek8
[4] http://bitsavers.informatik.uni-stuttgart.de/pdf/xerox/alto/...
I don’t think anyone would actually label it as microcode (not when the entire point of RISC was to avoid microcode); they would call it a sequencer or finite state machine. But really it’s the same thing. It’s certainly much simpler than the full microcode of any contemporary CISC, and the bulk of instructions execute in a single cycle without using it.
If you want a design with zero microcode, you really need to look at MIPS, or the original Berkeley RISC. Those ISAs go out of their way to avoid multicycle instructions. Not entirely successfully, but they don't use PLAs [1] to implement any state machines for the few remaining instructions like multiply and divide.
[0] http://daveshacks.blogspot.com/2016/01/inside-armv1-instruct...
[1] At least on the few MIPS designs I've looked at. And I'm not sure if they deliberately avoided PLAs for doctrine reasons, or it was just more efficient to do so.
z386: An Open-Source 80386 Built Around Original Microcode
There's sort of a wild west nostalgia that came with the 8086 and 8088 chips and a sense of approachable individual adventure that came along with it. Staring into the 386 is like staring into the cold and dispassionate industrial machine future that Fritz Lang was trying to portray in Metropolis.
Still fun to look at though. Great post.
Other instructions like PUSHA and POPA are implemented as loops that iterate by incrementing the fields corresponding to registers - and we know in what order they operate.
Bit by bit, relation by relation, you can puzzle out the format of the microcode. Of course, this is glossing over the enormous added complexity of protected-mode operations. This was a herculean effort by reenigne, and I don't think it is hyperbole to call it one of the more impressive human achievements I have witnessed in my lifetime.
That language can then be translated into Verilog, and has been.
http://brianluft.com/images/2026/05/386_microcode_bits.jpg -- my fully annotated result. I was working from a higher-quality PNG; this is highly compressed because it's a big image.
It's not really needless complication if there is a reason for the complication. Obviously in this case the need to be backward compatible with an old design made the implementation more complicated than if they didn't need to do that. There were very, very strong business reasons why backward compatibility was a design requirement.
In a way I guess the instructions in nand2tetris are the microcode. The bits of the instructions directly control the hardware, with the first bit choosing between the 2 instruction types, so there’s only 1 step of code per instruction, unlike with microcode where an instruction can have any number of microcode steps.
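As a small illustration of "the bits directly control the hardware", here is a sketch that splits a 16-bit Hack instruction into its fields. The layout follows the published Hack format (top bit 0 for an A-instruction, 1 for a C-instruction with a, comp, dest and jump fields); the function name and the dictionary shape are just choices made for this example.

```python
def decode_hack(word):
    """Split a 16-bit Hack instruction into its control fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("Hack instructions are 16 bits wide")
    if (word >> 15) == 0:
        # A-instruction: the low 15 bits are a value loaded into the A register.
        return {"type": "A", "value": word & 0x7FFF}
    # C-instruction: 1 1 1 a cccccc ddd jjj
    return {
        "type": "C",
        "a": (word >> 12) & 0x1,      # selects A or M as the ALU's second input
        "comp": (word >> 6) & 0x3F,   # six ALU control bits
        "dest": (word >> 3) & 0x7,    # which of A, D, M receive the result
        "jump": word & 0x7,           # jump condition bits
    }


print(decode_hack(0b0000000000000010))  # the assembly instruction @2
print(decode_hack(0b1110110000010000))  # the assembly instruction D=A
```

There is no step counter anywhere in this decode, which is the point being made above: each field goes straight to a piece of hardware within a single step.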
In Ben Eater’s series of videos building an 8-bit CPU on breadboards he has ROMs that are indexed by the opcode (4 bits of the instruction) + a step counter to determine the control word. The ROM stands in for what could be done with sufficiently complicated logic gates. I like it as a next step on the hardware side as you get hands on experience with electronics and having to troubleshoot it.
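A rough sketch of that ROM-as-control-logic idea, with the address formed from a 4-bit opcode plus a step counter. The signal names, opcode values and step sequences below are invented for illustration and are not copied from the videos.

```python
# Control signals, one bit each (illustrative names, not the real build's).
(PC_OUT, MAR_IN, RAM_OUT, IR_IN, PC_INC,
 IR_OUT, A_IN, B_IN, ALU_OUT) = (1 << i for i in range(9))

STEP_BITS = 3  # room for up to 8 steps per instruction

# Every instruction starts with the same two fetch steps.
FETCH = [PC_OUT | MAR_IN, RAM_OUT | IR_IN | PC_INC]

# Per-opcode steps that follow the fetch (hypothetical opcodes).
PROGRAMS = {
    0b0001: [IR_OUT | MAR_IN, RAM_OUT | A_IN],                  # load A
    0b0010: [IR_OUT | MAR_IN, RAM_OUT | B_IN, ALU_OUT | A_IN],  # add to A
}


def build_rom():
    """Fill a ROM image addressed by (opcode << STEP_BITS) | step."""
    rom = [0] * (16 << STEP_BITS)
    for opcode in range(16):
        for step, word in enumerate(FETCH + PROGRAMS.get(opcode, [])):
            rom[(opcode << STEP_BITS) | step] = word
    return rom


ROM = build_rom()


def control_word(opcode, step):
    """Look up the control signals to assert for this opcode at this step."""
    return ROM[(opcode << STEP_BITS) | step]


for step in range(5):
    print(step, format(control_word(0b0010, step), "09b"))
```

Unused steps are left as 0, meaning no signal is asserted; a real design might instead reset the step counter early.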
It’s disappointing how it only has 16 bytes of RAM so you can’t really build higher levels of abstraction like you can with nand2tetris. But at that point you could (I should) either redo it with a better design (and put it on PCBs) or move on to the 6502 project, and then since that puts together a timer, CPU, ROM, RAM, I/O, UART, etc. mentally group those together and move on to microcontrollers that already have them together.
Anyone interested in reading about how a CPU could be made out of logic gates could also read Code by Charles Petzold (moves slower, recently updated) and/or The Pattern on the Stone by Danny Hillis (moves faster).
Edit: I just checked Code (2nd edition) and that uses a 4 bit cycle counter and hard logic gates to determine what to do each cycle. But then it uses an array of diodes for part of the logic. Would that be considered microcode?
[0] there were classes that covered more advanced (pipelined) CPUs in another CS class but not at quite a low level where you felt like you could make one yourself
I might upload Tristam Island (a Z-Machine v3 game, like Zork and the Infocom games; they already have the interpreter) among the feelies in ASCII format. Yes, dfrotz runs snappier than the vi clone they have. And more stable than their ed implementation.
More modern devices are of course more difficult due to layers, feature size, and less visually obvious ROM bit designs.
Anyway, the impressive part of this project was really understanding the undocumented microcode assembly language through inference and trace following; the 1s and 0s look like they were the easy part!
1. Extract the ROM bits.
2. Determine physical-to-logical bit ordering.
3. Identify microinstruction boundaries.
4. Infer field boundaries.
5. Associate fields with hardware destinations (check with die tracing).
6. Decode instruction-dispatch programmable logic arrays.
7. Associate x86 instructions with microcode entry points.
8. Infer repeated idioms: moves, ALU ops, termination, calls, tests, redirects.
9. Decode accelerator protocols.
10. Validate against known architectural behavior.
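Once field boundaries have been guessed (steps 3 to 5), slicing a word into named fields is mechanical. The sketch below assumes a 37-bit word, going by the 37-pixel bitmap width mentioned earlier in the thread. The field names and widths are entirely hypothetical placeholders, since the real 386 layout is not given here.

```python
WORD_WIDTH = 37

# Hypothetical layout, most significant field first. Widths add up to 37.
LAYOUT = [("src", 5), ("dst", 5), ("alu_op", 6), ("seq", 4), ("target", 12), ("misc", 5)]


def split_fields(word, layout=LAYOUT, width=WORD_WIDTH):
    """Slice a microinstruction word into named fields under a guessed layout."""
    if sum(w for _, w in layout) != width:
        raise ValueError("field widths must add up to the word width")
    fields = {}
    shift = width
    for name, w in layout:
        shift -= w
        fields[name] = (word >> shift) & ((1 << w) - 1)
    return fields


def pack_fields(fields, layout=LAYOUT):
    """Inverse of split_fields, handy for checking a layout guess round-trips."""
    word = 0
    for name, w in layout:
        value = fields[name]
        if value >> w:
            raise ValueError(f"{name} does not fit in {w} bits")
        word = (word << w) | value
    return word


example = {"src": 3, "dst": 17, "alu_op": 0b101010, "seq": 2, "target": 0x7FF, "misc": 0}
word = pack_fields(example)
print(f"{word:037b}")
print(split_fields(word))
```

Keeping the layout as data rather than code makes it cheap to try a different guess and re-dump every word, which suits the trial-and-error nature of steps 4 and 5.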