C65gm (pronounced cee-gee-em) is a language that mainly targets the Commodore 64 computer.
It compiles c65gm code using the well-known ACME 6502 cross-assembler as a backend to generate the actual .prg files.
ASM blocks are allowed pretty much everywhere, and mixing c65gm code with assembler is not only supported but highly encouraged.
As a high-level language with full assembler access and compile-time scripting via Starlark, it lets you write the bulk of your program quickly and drop to hand-tuned code where it matters. The goal is productivity without losing control.
The following small example shows what c65gm code might look like:
```
// @runnable
#INCLUDE <c64start.c65>
GOTO start

FUNC fill_screen({BYTE char})
    WORD w
    FOR w = $0400 TO $0400+999
        POKE w , char
    NEXT
FEND

LABEL start
    fill_screen(255)
SUBEND
```

(Choose the run button in the example above to try it out right here in the browser.)
The example above fills the C64 screen with character $ff (255).
While this is not terribly impressive, let us extend the program slightly with a more general function:
```
FUNC fill_mem( {WORD from @ $fb} {WORD to} {BYTE value} )
    FOR from = from TO to
        POKE from , value
    NEXT
FEND
```

As you can see, the routine is generic and reusable. The `@ $fb` syntax pins the pointer to zero-page locations ($fb and $fc for a WORD), letting the compiler generate indirect indexed addressing, which is invisible to the programmer but critical for 6502 performance. The same 5-line function replaces 15 to 20 lines of raw assembly (loop init, LDA/STA with indirect indexed addressing, compare, branch). You think in terms of ranges and operations, not carry flags.
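To make the role of the zero-page pointer concrete, here is a small Python model of how a 6502 `STA ($fb),Y` store resolves its target: the CPU reads a little-endian 16-bit address from $fb/$fc and adds Y. This is only a sketch for illustration, not c65gm code and not what the compiler emits; the helper names `set_pointer` and `sta_indirect_y` are made up for this example.

```python
# Toy model of 6502 indirect indexed addressing, STA (zp),Y.
# Illustrative only: memory is a flat 64 KiB bytearray, no cycle timing.

memory = bytearray(0x10000)

def set_pointer(zp, address):
    """Store a 16-bit address little-endian at zero-page locations zp, zp+1."""
    memory[zp] = address & 0xFF
    memory[zp + 1] = (address >> 8) & 0xFF

def sta_indirect_y(zp, y, value):
    """Write value to (pointer stored at zp) + y, as STA (zp),Y would."""
    base = memory[zp] | (memory[zp + 1] << 8)
    memory[(base + y) & 0xFFFF] = value

def fill_mem(start, end, value, zp=0xFB):
    """Fill start..end inclusive by stepping the zero-page pointer."""
    for address in range(start, end + 1):
        set_pointer(zp, address)
        sta_indirect_y(zp, 0, value)

fill_mem(0x0400, 0x0400 + 999, 32)  # clear the 40x25 text screen with spaces
```

The pointer has to live in zero page because the 6502 offers this addressing mode only through a zero-page operand, which is why pinning `from` to $fb matters.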
Now here is where c65gm starts to shine. SCRIPT blocks let you embed Starlark code (very Python-like) that runs at compile time:
```
// @runnable
// @show-asm
#INCLUDE <c64start.c65>
GOTO start

SCRIPT
addrs = []
for x in range(40):
    y = int(12 + 10 * math.sin(x * 2.0 * math.pi / 40.0))
    addrs.append(0x0400 + y * 40 + x)
print("sine_lo")
for a in addrs:
    print(" !8 %d" % (a & 0xff))
print("sine_hi")
for a in addrs:
    print(" !8 %d" % (a >> 8))
ENDSCRIPT

FUNC fill_mem( {WORD from @ $fb} {WORD to} {BYTE value} )
    FOR from = from TO to
        POKE from , value
    NEXT
FEND

FUNC plot_sine
    WORD screen_ptr @ $fb
    BYTE x
    FOR x = 0 TO 39
        ASM
            ldx |x|
            lda sine_lo,x
            sta |screen_ptr|
            lda sine_hi,x
            sta |screen_ptr|+1
        ENDASM
        POKE screen_ptr , $2e // '.' character
    NEXT
FEND

LABEL start
    fill_mem($0400, $0400+999, 32) // Clear screen
    plot_sine()
SUBEND
```

This clears the screen, then plots a sine wave of dots across it. Starlark computes 40 target screen addresses using `math.sin` at compile time and emits them as two tables: one for the low bytes and one for the high bytes. The runtime loop uses the classic 6502 split-table pattern: index into both tables, load the 16-bit address into a zero-page pointer, and POKE through it. The resulting lookup data is embedded directly in the PRG and costs zero cycles to compute at runtime.
Some sceners resort to external Python scripts for sine tables, but c65gm lets you keep it all in one source file.
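To see what the split tables contain, the same arithmetic can be mirrored in plain Python. This is a sketch used here only because it can be run and checked outside the compiler; the function names `sine_tables` and `lookup` are hypothetical, and it returns lists instead of printing ACME `!8` directives.

```python
import math

SCREEN_BASE = 0x0400  # default C64 screen RAM
COLUMNS = 40

def sine_tables():
    """Return (lo, hi) byte tables of screen addresses, one entry per column."""
    lo, hi = [], []
    for x in range(COLUMNS):
        y = int(12 + 10 * math.sin(x * 2.0 * math.pi / 40.0))
        address = SCREEN_BASE + y * COLUMNS + x
        lo.append(address & 0xFF)  # goes into the sine_lo table
        hi.append(address >> 8)    # goes into the sine_hi table
    return lo, hi

def lookup(lo, hi, x):
    """Recombine the split tables the way the ASM block fills the pointer."""
    return lo[x] | (hi[x] << 8)

lo, hi = sine_tables()
print(hex(lookup(lo, hi, 0)))  # column 0 sits on row 12: 0x5e0
```

Splitting the addresses into two byte tables lets the 8-bit X register index both halves with a plain `lda table,x`, with no multiplication by two and no 16-bit pointer arithmetic at runtime.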
The compiler was born as an internal tool of the Siders C64 (and VIC-20) group back in the late 1990s. In 2025 it was reborn as a complete rewrite in Go, and it gained features such as:
- A powerful macro scripting language
- Variable and function usage analysis (detects unused variables and functions, can auto-remove them from output)
- Zero page usage analysis by analyzing the call graph (catches conflicting assignments across function boundaries)
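
The idea behind the zero-page analysis can be sketched in a few lines of Python. This is a guess at the general technique, not the compiler's actual implementation: it assumes a conflict means that a function pins a zero-page address and, directly or indirectly, calls another function that pins the same address. The data layout (`pins`, `calls`) and the function names are hypothetical.

```python
# Hypothetical sketch: flag zero-page addresses pinned both by a function and
# by any function it can reach through calls. Not the c65gm implementation.

def reachable(calls, start):
    """Return every function reachable from start via the call graph."""
    seen, stack = set(), list(calls.get(start, ()))
    while stack:
        func = stack.pop()
        if func not in seen:
            seen.add(func)
            stack.extend(calls.get(func, ()))
    return seen

def zero_page_conflicts(pins, calls):
    """Return (caller, callee, address) triples where both pin the address."""
    conflicts = []
    for caller in sorted(pins):
        for callee in sorted(reachable(calls, caller)):
            if callee == caller:
                continue  # recursion is ignored in this sketch
            for address in sorted(pins[caller] & pins.get(callee, set())):
                conflicts.append((caller, callee, address))
    return conflicts

# A WORD pinned at $fb occupies $fb and $fc.
pins = {"fill_mem": {0xFB, 0xFC}, "plot_sine": {0xFB, 0xFC}}
safe_calls = {"start": ["fill_mem", "plot_sine"]}
risky_calls = {"start": ["plot_sine"], "plot_sine": ["fill_mem"]}
```

With `safe_calls`, the two functions reuse $fb/$fc one after the other and nothing is reported. With `risky_calls`, `plot_sine` would have its pointer overwritten by `fill_mem` while still in use, so both bytes are flagged.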
Siders decided to make the tool public because we think the C64 scene deserves a modern toolchain that bridges the gap between high-level productivity and low-level control. The programming style lets you choose: write clean high-level loops for the bulk of your code, drop into ASM blocks for hot paths, and use SCRIPT macros to generate optimized sequences, all in the same file.
Is c65gm perfect? No. It does not yet perform peephole optimizations. A loop like `FOR from = from TO to` (above) compiles to correct code, but the generated assembly loads Y with zero inside the loop body and reloads the loop counter from memory even when it is already in a register. The `fill_mem` function from earlier compiles to about 30 cycles per byte; a hand-tuned version with self-modifying code and a counted loop fulfils the same contract in about 23 cycles per byte. The gap is real, and that is fine. The fastest possible code is always within reach. You just do not have to write it everywhere.
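To put those per-byte figures in perspective, here is a rough back-of-the-envelope calculation for a full 1000-byte screen fill. It assumes PAL timing (63 cycles per raster line, 312 lines per frame) and ignores cycles stolen by the VIC-II, so treat the result as an estimate only; `fill_cost` is a name made up for this sketch.

```python
CYCLES_PER_FRAME = 63 * 312  # PAL C64: 63 cycles per raster line, 312 lines
SCREEN_BYTES = 1000          # 40 x 25 text screen

def fill_cost(cycles_per_byte, byte_count=SCREEN_BYTES):
    """Return (total cycles, fraction of PAL frames) for a memory fill."""
    cycles = cycles_per_byte * byte_count
    return cycles, cycles / CYCLES_PER_FRAME

for label, per_byte in (("compiled", 30), ("hand tuned", 23)):
    cycles, frames = fill_cost(per_byte)
    print("%-10s %6d cycles  %.2f frames" % (label, cycles, frames))
```

Under these assumptions the compiled fill takes roughly 1.5 frames and the hand-tuned one roughly 1.2, which matters inside a raster-timed effect and hardly at all for a one-off screen clear.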