Writing a Game Boy Emulator in OCaml (2022) (linoscope.github.io)
254 points by ibobev 1 day ago | 62 comments





Would anyone here assert that there's any particular programming language that's better for writing emulators, virtual machines, bytecode interpreters, etc?

Where, when I say "better", I'm not so much talking about getting results that are particularly efficient/performant; nor in making fewer implementation errors... but more in terms of the experience of implementing an emulator in this particular language, being more rewarding, intuitive, and/or teaching you more about both emulators and the language.

I ask because I know that this sort of language exists in other domains. Erlang, for example, is particularly rewarding to implement a "soft-realtime nine-nines-of-uptime distributed system" in. The language, its execution semantics, its runtime, and its core libraries, were all co-designed to address this particular problem domain. Using Erlang "for what it's for" can thus teach you a lot about distributed systems (due to the language/runtime/etc guiding your hand toward its own idiomatic answers to distributed-systems problems, which usually are "best practice" solutions in theory as well); and can lead you to a much-deeper understanding of Erlang (exploring all its corners, discovering all the places where the language designers considered the problems you'd be having and set you up for success) than you'd get by trying to use it to solve problems in some other domain.

Is there a language like that... but where the "problem domain" that the language's designers were targeting, was "describing machines in code"?


Haskell excels at DSLs and the sort of data manipulation needed in compilers. OCaml, Lisp, and really any language with support for ADTs and such things do the trick as well. You can even try hard with modern C++ and variant types and such, but it won't be as pretty.

Of course, if you actually want to run games on the emulator, C or C++ is where the game is. I suppose Rust would work too, but I can't speak much for its low-level memory manipulation.


Haskell and OCaml are excellent for compilers, because - as you suggest - you end up building, walking, and transforming tree data structures where sum types are really useful. Lisp is an odd suggestion there, as it doesn’t really have any built-in support for this sort of thing.

At any rate, that’s not really the case when building an emulator or bytecode interpreter. And Haskell ends up being mostly a liability here, because most work is just going to be imperatively modifying your virtual machine’s state.


Lisp has one of the most powerful macro systems.

Also when people say Lisp in 2025, usually we can assume Common Lisp, which is far beyond the Lisp 1.5 reference manual in capabilities.

In fact, back when I was at university, when Caml Light was still recent and Miranda was still part of programming language lectures, the languages forbidden on compiler development assignments were Lisp and Prolog, as they would have made the assignment super easy.


> And Haskell ends up being mostly a liability here, because most work is just going to be imperatively modifying your virtual machine’s state.

That sounds odd to me. Haskell is great for managing state, since it makes it possible to do so in a much more controlled manner than non-pure languages.


Yeah, I don't understand what the "liability" here is. I never claimed it was going to be optimal, and I already pointed out C/C++ as the only reasonable choice if you actually want to run games on the thing and get as much performance as possible. But manipulating the machine state in Haskell is otherwise perfect. Code will look like equations, everything becomes trivially testable and REPLable, and you'd even get a free time machine from the immutability of the data, which makes debugging easy.

If you're effectively always in a stateful monad, Haskell's purity offers nothing. Code doesn't look like equations, things aren't trivially testable and REPLable, you don't get a free time machine, and there's syntactic overhead from things like lifting or writes to deeply nested structures and arrays, since the language doesn't have built-in syntactic support for them.

Even if you use a stateful monad (not necessarily the State monad), you can take snapshots of the state of the machine and literally produce a log. You haven't lost immutability or the time machine, and you can 'deriving Show' the hell out of everything and get human-readable output for free. Fuck, you could even lift functions in such a way that they produce a trace of assertions that each function of (state -> state) must satisfy. A state-debugger-log monad.

Not that you'd need a monad for something like this anyway.


On the other hand, it does have support for things like side-effectful traversals, folds, side effects conditional on value existing, etc. In most other languages you have to write lower-level code to accomplish the same thing.

I've heard Haskell described as the best imperative language.


Haskell isn't a liability for that lol

I’d also point out, that even in the compiler space, there are basically no production compilers written in Haskell and OCaml.

I believe those two languages themselves self-host. So not saying it’s impossible. And I have no clue about the technical merits.

But if you look around programming forums, there's this idea that "OCaml is one of the leading languages for compiler writers", which seems to be a completely made-up statistic.


I don't know that many production compilers are in them, but how much of that is compilers tending towards self hosting once they get far enough along these days? My understanding is early Rust compilers were written in Ocaml, but they transitioned to Rust to self-host.

What do you define as a production compiler? Two related languages have compilers built in Haskell: PureScript and Elm.

Also, Haskell has parsers for all major languages. You can find them on Hackage with the `language-` prefix: language-python, language-rust, language-javascript, etc.

https://hackage.haskell.org/packages/browse?terms=language


Obviously C is the ultimate compiler of compilers.

But I would call Rust, Haxe and Hack production compilers. (As mentioned by sibling, Rust bootstraps itself since its early days. But that doesn't diminish that OCaml was the choice before bootstrapping.)


Most C compilers are written in C++ nowadays.

Yes, C and C++ have an odd symbiosis. I should have said C/C++.

Most C and C++ developers take umbrage with combining them. Since C++11, and especially C++17, the languages have diverged significantly. C is still largely compatible (outside of things like uncasted malloc) since the rules are still largely valid in C++; but both have gained fairly substantial incompatibilities to each other. Writing a pure C++ application today will look nothing like a modern C app.

RAII, iterators, templates, object encapsulation, smart pointers, data ownership, etc are entrenched in C++; while C is still raw pointers, no generics (no, _Generic doesn't count), procedural, void* casting, manual malloc/free, etc.

I code in both, and enjoy each (generally for different use cases), but certainly they are significantly differing experiences.


What are you on? Rust was written in OCaml, and Haxe is still going strong after 25 years with an OCaml-based compiler, and is very much production grade.

We must be looking at different compilers.

I wrote a GBC emulator in Haskell (https://github.com/CLowcay/hgbc/tree/master). It's nice for modelling the instruction set, decoding and dispatching instructions. Optimization is tough though. To achieve playable performance, everything has to go into the IO monad. Haskell is famous for lazy evaluation. I found that occasionally useful, but mostly a source of performance problems.

Ultimately, the hard thing in emulation was not decoding instructions. It was synchronization, timing, and faithfully reproducing all the hardware glitches (because many games will not work without certain hardware bugs). Haskell doesn't help much for those things. If I was doing another emulation project I'd choose rust.


I'm curious why you put ADTs ahead of basic arrays. The latter seems a lot more relevant for writing hardware emulation.

I would argue that systems languages (C, C++, Rust, and Zig stand out) are the most “fulfilling” (in my experience).

The reason being that the methodologies are far more orthogonal. A uint8 directly represents a byte in memory, doing a memcpy is equivalent to a blit, etc. You spend far less time trying to wrangle a JavaScript Number type into acting like a byte/word/etc for a bitshift operation for a simple ADC. A very simple example that you’ll run into in the first day of writing a js emulator.
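To make the "wrangling" concrete, here is a minimal sketch of an 8-bit add-with-carry in a language without fixed-width integers (Python here, but the same masking is needed with a JavaScript Number). The function name `adc8` and the flag dictionary are made up for illustration; the flag semantics follow the usual Game Boy CPU conventions.

```python
def adc8(a: int, b: int, carry_in: int):
    """Add two bytes plus a carry bit and return (result, flags)."""
    total = a + b + carry_in          # may exceed 8 bits: ints do not wrap on their own
    result = total & 0xFF             # wrap to a byte by hand
    flags = {
        "Z": result == 0,                                  # zero flag
        "N": False,                                        # subtract flag, cleared by an add
        "H": ((a & 0x0F) + (b & 0x0F) + carry_in) > 0x0F,  # carry out of bit 3
        "C": total > 0xFF,                                 # carry out of bit 7
    }
    return result, flags


result, flags = adc8(0xFF, 0x01, 0)
print(hex(result), flags)
```

In C or Rust the wrap-around comes with the `uint8_t`/`u8` type, so only the flag computation is left to write by hand.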

That all being said, if the language can paint to some surface and has the memory size to handle the machine you’re emulating, they’re all roughly equivalent. So the answer becomes “whichever language you’re most comfortable with is the one in which writing an emulator is most enjoyable”.


> Would anyone here assert that there's any particular programming language that's better for writing emulators, virtual machines, bytecode interpreters, etc?

No, absolutely not. Emulation is super easy to implement in any language with arrays (constant-time lookup of arbitrary indices) and bit operations. At least before considering JIT.

And even functional languages have arrays and bitwise operations.


sml, specifically the MLTon dialect. It's good for all the same reasons ocaml is good, it's just a much better version of the ML-language in my opinion.

I think the only thing that ocaml has that I miss in sml is applicative functors, but in the end that just translates to slightly different module styles.


Can you expand on what makes SML better, in your eyes, than OCaml?

IMO: it's certainly "simpler" and "cleaner" (although it's been a while but IIRC the treatment of things like equality and arithmetic is hacky in its own way), which I think causes some people to prefer SML on aesthetic grounds, but TBH I feel like many of OCaml's features missing in SML are quite useful. You mentioned applicative functors, but there are also things like labelled arguments, polymorphic variants, GADTs, even the much-maligned object system that have their place. Is there anything SML really brings to the table besides the omission of features like this?


> the treatment of things like equality and arithmetic is hacky in its own way

mlton allows you to use a keyword to get the same facility for function overloading that is used for addition and equality. it's disabled by default for hygienic reasons, function overloading shouldn't be abused.

https://baturin.org/code/mlton-overload/

> labelled arguments

generally speaking if my functions are large enough for this to matter, i'd rather be passing around refs to structures so refactoring is easier.

> polymorphic variants

haven't really missed them.

> GADTs

afaik being able to store functors inside of modules would fix this (and I think sml/nj supports this), but SML's type system is more than capable of expressing virtual machines in a comfortable way with normal ADTs. if i wanted to get that cute with the type system, i'd probably go the whole country mile and reach for idris.

> even the much-maligned object system that have their place

never used it.

> Is there anything SML really brings to the table besides the omission of features like this?

mlton is whole-program optimizing (and very good at it)[1], has a much better FFI[2][3], is much less opinionated as a language, and the parallelism is about 30 years ahead[4]. the most important feature to me is that sml is more comfortable to use than ocaml. being nicer syntactically matters, and that increases in proportion with the amount of code you have to read and write. you don't go hiking in flip flops. as a knock-on effect, that simplicity in sml ends up with a language that allows for a lot more mechanical sympathy.

all of these things combine for me, as an engineer, to what's fundamentally a more pragmatic language. the french have peculiar taste in programming languages, marseille prolog is also kind of weird. ocaml feels quirky in the same way as a french car, and i don't necessarily want that from a tool.

[1] - http://www.mlton.org/Performance

[2] - http://www.mlton.org/ForeignFunctionInterface

[3] - http://www.mlton.org/MLNLFFIGen

[4] - https://sss.cs.purdue.edu/projects/multiMLton/mML/Documentat...


I love, love, love StandardML.

I respect the sheer power of what mlton does. The language itself is clean, easy to understand, reads better than anything else out there, and is also well-formalised. I read (enjoyed!) the tiger book before I knew anything about SML.

Sadly, this purism (not as in Haskell but as a vision) is what probably killed it. MLTon or not, the language needed to evolve, expand, rework the stdlib, etc.

But authors were just not interested in the boring part of language maintenance.


What are your thoughts on basis[1] and successorml[2]?

[1] - http://www.mlton.org/MLBasis

[2] - https://smlfamily.github.io/successor-ml/


I don't think these change anything for the language. Too little, too late.

I remember working through Appel's compiler textbook in school, in SML and Java editions side by side, and the SML version was of course laughably more concise. It felt like cheating, because it practically was.

Nowadays you could make the Java version quite similar to the SML one, if there were a new edition.

I have been looking forward to ML-like capabilities in mainstream languages ever since using Caml Light.

Regarding those books, while we used the Java version, alongside JavaCC, when time came to actually buy the book, I also got the SML edition.


Another fun option, in the browser, is Elm. Check out this similar, older project: https://github.com/Malax/elmboy

I want to dissuade anyone possible from looking at Elm. I used it professionally for about six months before I convinced my org to switch away. It simply doesn't have the type system to support complex programs, and the maintainers have a really noxious relationship with the community.

OCaml is really good for this, and is usually "fast enough" for most things. When you REALLY need speed, there is the OxCaml project by Jane Street that gets you all sorts of perf improvements and close to C speed, while still having a GC for things not critical for raw speed.

One option for fast iteration would be Forth. In its circles, it is famous for generating targets and cross-compiling between architectures. Search the net and you should find plenty.

OCaml doesn't seem like a bad choice here. Haven't played with it much, but I wonder if Racket might be a good choice as well?


Verilog?

...just kidding (maybe).

Assuming we're talking about a pure interpreter, pretty much anything that makes it straightforward to work with bytes and/or arrays is going to work fine. I probably wouldn't recommend Haskell, just because most operations are going to involve imperatively mutating the state of the machine, so pure FP won't win you much.

The basic process of interpretation is just: "read an opcode, then dispatch on it". You'll probably have some memory address space to maintain. And that's kind of it? Most languages can do that fine. So your preference should be based on just about everything else: how comfortable are you using it, how much do you like its abilities to interface with your host platform, how much do you like type checking, and so on.
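The "read an opcode, then dispatch on it" loop can be sketched in a few lines. This is only an illustration under stated assumptions: the `CPU` class, the handler names, and the `run` helper are hypothetical, the opcode table covers a handful of Game Boy opcodes (NOP, LD A,d8, INC A, ADD A,d8, HALT), and flags and timing are left out entirely.

```python
class CPU:
    def __init__(self, program: bytes):
        self.mem = bytearray(0x10000)        # flat 64 KiB address space
        self.mem[:len(program)] = program    # load the program at address 0
        self.pc = 0                          # program counter
        self.a = 0                           # accumulator
        self.halted = False

    def fetch(self) -> int:
        """Read the byte at PC and advance PC (wrapping at 16 bits)."""
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def step(self) -> None:
        """Fetch one opcode and dispatch to its handler."""
        opcode = self.fetch()
        handler = DISPATCH.get(opcode)
        if handler is None:
            raise NotImplementedError(f"opcode {opcode:#04x}")
        handler(self)


def nop(cpu): pass
def ld_a_d8(cpu): cpu.a = cpu.fetch()                     # load immediate byte into A
def inc_a(cpu): cpu.a = (cpu.a + 1) & 0xFF                # increment A, wrapping to a byte
def add_a_d8(cpu): cpu.a = (cpu.a + cpu.fetch()) & 0xFF   # add immediate byte to A
def halt(cpu): cpu.halted = True

# Opcode -> handler table; a real emulator would fill in all entries.
DISPATCH = {0x00: nop, 0x3E: ld_a_d8, 0x3C: inc_a, 0xC6: add_a_d8, 0x76: halt}


def run(program: bytes) -> CPU:
    cpu = CPU(program)
    while not cpu.halted:
        cpu.step()
    return cpu


cpu = run(bytes([0x3E, 0xFE, 0x3C, 0xC6, 0x03, 0x00, 0x76]))
print(hex(cpu.a))
```

A dictionary of handlers is just one choice; a big `match`/`switch` or an array of function pointers expresses the same idea in other languages.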


C is probably the best language for this.

C isn't really the best language for anything anymore. Maybe as a compilation target for other languages.

The only thing C is interesting for at this point is as a target for a compiler you can implement over a weekend.

Only from a UNIX culture mindset.

As opposed to what?

Everyone else who understands that C is not the first systems language to have existed since FORTRAN in 1958, nor will it be around forever outside of anything tied to UNIX/POSIX.

C isn't even a systems language lol, it's too delicate to rely on for systems.

The UNIX authors saw it otherwise though, as they created it for the UNIX rewrite.

I quite frankly disagree. From personal experience I don't think there's any mainstream programming language that in itself teaches you anything much about emulating systems like the Game Boy or NES - in fact, I'd go so far as to say that none of them even yields implementations that are both elegant and accurate.

People write "production-grade" emulators in C because it's fast, not because it's uniquely suited to the domain as a language.



Cool. The demo runs way too fast, though. The throttle checkbox doesn't really change it. Unchecking it, if anything, makes it run slower. It runs at 240 fps with throttle and at 180 fps without. With the throttle checkbox active, one real second is already about four seconds in the emulator. I suspect this is related to the screen refresh rate, which is 240Hz in my case.

probably they are calling requestAnimationFrame() and then not accounting for deltaTime?
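If that guess is right (it is only a guess; nothing here is taken from the demo's source), the usual remedy is a time accumulator: each display callback adds the elapsed wall-clock time and runs only as many emulated frames as that time covers. A minimal sketch in Python, where `now` stands in for the timestamp a browser would pass to the animation callback, and the names `Throttle` and `emulated_frames` are hypothetical:

```python
# One Game Boy frame is 70224 clock cycles at 4194304 Hz (about 59.7 frames per second).
GB_FRAME_SECONDS = 70224 / 4194304


class Throttle:
    def __init__(self):
        self.accumulator = 0.0   # wall-clock time not yet "spent" on emulated frames
        self.last_time = None    # timestamp of the previous callback

    def frames_due(self, now: float) -> int:
        """Return how many emulated frames to run for this display callback."""
        if self.last_time is None:
            self.last_time = now
            return 0
        self.accumulator += now - self.last_time
        self.last_time = now
        frames = int(self.accumulator // GB_FRAME_SECONDS)
        self.accumulator -= frames * GB_FRAME_SECONDS   # keep the remainder for later
        return frames


def emulated_frames(refresh_hz: float, seconds: float) -> int:
    """Simulate display callbacks at refresh_hz and count emulated frames."""
    throttle = Throttle()
    total = 0
    for i in range(int(refresh_hz * seconds) + 1):
        total += throttle.frames_due(i / refresh_hz)
    return total


print(emulated_frames(60, 10), emulated_frames(240, 10))
```

With this scheme a 240 Hz display simply sees many callbacks that run zero emulated frames, so the emulated speed no longer depends on the refresh rate.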

This is a very nice write up of not only Ocaml but also gameboy emulator implementation. Great job and thank you to the author!

As an aside I’ve always thought it would be awesome to create a single page app with an assembler editor and assembler/linker/loader to enable doing gameboy homebrew in the browser. I think it would be a great, accessible embedded development teaching opportunity.


rgbds-live is more-or-less this same idea, with an embedded version of RGBDS: https://gbdev.io/rgbds-live/

I know it’s a long shot, but does anyone know of a tutorial for the sound of a game boy emulator? Most of these tutorials never cover that piece and when I try it on my own I find it hard to properly implement or even understand the reference material well enough to implement on my own.

Not a tutorial per-se, but here are 2 slides describing how I've done it:

https://www.slideshare.net/slideshow/emulating-game-boy-in-j...

Essentially, there are 4 channels, each providing a number 0-15 on every tick. The emulator should mix them together (arithmetic average), scale up to 0-255 and feed the result to the sound buffer, adjusting the tick rate (4.19MHz) to the sound output rate (e.g.: 22 kHz) - taking every ~190th value (4.19MHz / 22 kHz) is a good start.
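The mixing and downsampling described above can be sketched roughly as follows. This is not the code from the linked project; the names `mix` and `downsample` are hypothetical, and 22050 Hz is assumed as the concrete output rate.

```python
CPU_HZ = 4_194_304   # Game Boy clock rate (the "4.19MHz" tick rate)


def mix(ch1: int, ch2: int, ch3: int, ch4: int) -> int:
    """Average four 0..15 channel values and scale the result to 0..255."""
    average = (ch1 + ch2 + ch3 + ch4) / 4
    return round(average * 255 / 15)


def downsample(ticks, output_hz: int = 22_050):
    """Keep every Nth per-tick sample, where N = CPU_HZ // output_hz (190 here)."""
    step = CPU_HZ // output_hz
    return ticks[::step]


print(mix(15, 15, 15, 15), mix(15, 0, 0, 0), CPU_HZ // 22_050)
```

Dropping samples like this is the crudest form of resampling; it is a reasonable starting point, and filtering can be added later if the output sounds harsh.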

Now the 0..15 value that should be produced by each channel depends on its characteristics, but it's well documented:

https://gbdev.gg8.se/wiki/articles/Gameboy_sound_hardware

Channels 1 and 2 produce square waves, so a bunch of low (0) and high (15) values, with optional volume envelope (gradually going down from 15 to 0 on the "high" part of the square) and frequency sweep (alternating 0s and 15s slower or faster).
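As a rough illustration of a square channel with a decaying volume envelope, here is a simplified sketch. It assumes a 50% duty cycle and a purely tick-based envelope, and omits the frequency sweep; `square_channel` and its parameters are hypothetical names, not the hardware registers.

```python
def square_channel(length: int, half_period: int, volume: int = 15, envelope_step: int = 0):
    """Generate `length` samples alternating between 0 and the current volume.

    half_period   -- number of ticks spent low, then high (50% duty assumed)
    envelope_step -- if non-zero, lower the volume by 1 every this many ticks
    """
    samples = []
    for tick in range(length):
        if envelope_step and tick > 0 and tick % envelope_step == 0:
            volume = max(0, volume - 1)            # envelope: fade toward silence
        high = (tick // half_period) % 2 == 1      # alternate low and high halves
        samples.append(volume if high else 0)
    return samples


print(square_channel(8, 2))
print(square_channel(8, 2, envelope_step=4))
```

A frequency sweep would amount to changing `half_period` over time, which makes the 0s and 15s alternate slower or faster.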

Channel 3 allows an arbitrary waveform, read from the memory.

Channel 4 is random noise, generated by an LFSR (linear-feedback shift register).
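A minimal sketch of such a noise generator, assuming the commonly documented 15-bit arrangement (XOR the two low bits, shift right, feed the result into the top bit, and output the inverted low bit). The function name, the all-ones starting state, and the one-sample-per-step timing are simplifications for illustration.

```python
def lfsr_noise(steps: int, volume: int = 15, state: int = 0x7FFF):
    """Return `steps` noise samples, each either 0 or `volume`."""
    samples = []
    for _ in range(steps):
        feedback = (state ^ (state >> 1)) & 1          # XOR of bits 0 and 1
        state = (state >> 1) | (feedback << 14)        # shift right, feedback into bit 14
        samples.append(volume if (state & 1) == 0 else 0)   # output is the inverted bit 0
    return samples


print(lfsr_noise(20))
```

The real channel clocks the register at a configurable rate and has a shorter 7-bit mode as well; both are left out here.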

See SoundModeX.java for the reference:

https://github.com/trekawek/coffee-gb/tree/master/src/main/j...




Excellent writeup and cool project.

Needs a (2022).


Beautiful write-up! Thanks for sharing this. I want to write a game boy emulator in Rust and your blogpost really inspired me to kick this off. I’m bookmarking this.

ah nice ! great use of functors, GADTs

I wanna compare a CHIP 8 or NES emulator or port CAMLBOY to WASM using ocaml-wasm


It should already be possible to run CAMLBOY on WASM because of the new WASM backend of js_of_ocaml (wasm_of_ocaml).


