GPT-OSS
This will be a simple, fast, and somewhat funny blog post. Take it as it is, and thanks for reading it!
The tale of the requirements
I’ve always been curious about local LLMs, but the excitement has always been cut short by hardware requirements.
Hear me out and read everything before quitting the post. Here are the minimum requirements from the OpenAI Cookbook article:
- The smaller model:
  - Best with ≥16GB VRAM or unified memory
  - Perfect for higher-end consumer GPUs or Apple Silicon Macs
And, more importantly:
- You can offload to CPU if you’re short on VRAM, but expect it to run slower.
You can complain with “Ehm… yeah… yuhu! Okay… you can also offload Llama 2/3, Phi-3, Gemma, Mistral, and several other models. That’s… that’s not new; I mean, it’s not exactly a feature.”
Hey, this is my blog, and I am free to be happy and excited about whatever I want!
Jokes aside, thanks to quantization in the Microscaling 4-bit floating-point format (in short, MXFP4, from the Open Compute Project), the Mixture-of-Experts weights are stored in a way that dramatically reduces the memory footprint.
This setup makes it possible to run the 20B model with just 16GB of VRAM (and about 80GB for the 120B model), which is not really uncommon hardware, especially for AI tinkerers, DIY geeks, or high-end gamers.
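A quick back-of-envelope check (my own arithmetic, not from the Cookbook): MXFP4 stores each MoE weight in roughly 4.25 bits (4-bit values plus shared per-block scales), so the bulk of the 20.9B parameters fits in about 11GB, versus roughly 42GB at 16-bit precision:

awk 'BEGIN { printf "MXFP4: ~%.1f GB   BF16: ~%.1f GB\n", 20.9e9*4.25/8/1e9, 20.9e9*16/8/1e9 }'   # bits -> bytes -> GB

The non-MoE tensors (attention, embeddings, and so on) stay at higher precision, which helps explain why the actual download weighs in at about 13GB rather than 11GB.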
How?
curl -fsSL https://ollama.com/install.sh | sh # install ollama
ollama pull gpt-oss:20b # pull the 20B model ~13GB
ollama run gpt-oss:20b # enjoy
>>> Send a message (/? for help)
Done!
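Bonus: once the model is pulled, Ollama also serves it over a local HTTP API (port 11434 by default), so you can script against it instead of chatting in the terminal. A minimal sketch; the prompt is just a placeholder:

curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Explain MXFP4 in one sentence.",
  "stream": false
}'   # returns a JSON object whose "response" field holds the generated text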
Some gpt-oss 20B specifications
>>> /show info
Model
  architecture        gptoss
  parameters          20.9B
  context length      131072
  embedding length    2880
  quantization        MXFP4

Capabilities
  completion
  tools
  thinking

Parameters
  temperature 1

License
  Apache License
  Version 2.0, January 2004
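By the way, you don’t need the interactive prompt to read this model card; with the standard Ollama CLI, the same information is one command away:

ollama show gpt-oss:20b   # prints architecture, context length, quantization, and license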
Thanks for reaching the end of this short post.
I hope it made you smile.
See you in the next one!
Will it be about LLaVA? MXFP4? Dracula? Bulbasaur? Who knows!