GPT-OSS
This will be a simple, fast, and somewhat funny blog post. Take it as it is, and thanks for reading it!
The tale of the requirements
I’ve always been curious about local LLMs, but the excitement has always been cut short by hardware requirements.
Hear me out and read everything before quitting the post. Here are the minimum requirements from the OpenAI Cookbook article:
- The smaller model:
  - Best with ≥16GB VRAM or unified memory
  - Perfect for higher-end consumer GPUs or Apple Silicon Macs
And, more importantly:
- You can offload to CPU if you’re short on VRAM, but expect it to run slower.
You can complain with “Ehm… yeah… yuhu! Okay… you can also offload Llama 2/3, Phi-3, Gemma, Mistral, and several other models. That’s… that’s not new; I mean, it’s not exactly a feature.”
Hey, this is my blog, and I am free to be happy and excited about whatever I want!
Jokes aside, thanks to quantization in the Microscaling 4-bit floating-point format (in short, MXFP4, from the Open Compute Project), the Mixture-of-Experts weights are stored in a way that dramatically reduces the memory footprint.
This setup makes it possible to run the 20B model with just 16GB of VRAM (and about 80GB for the 120B model), which is not really uncommon hardware, especially for AI tinkerers, DIY geeks, or high-end gamers.
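A quick back-of-envelope check (my own arithmetic, not from the Cookbook): MXFP4 stores each MoE weight in roughly 4.25 bits (4-bit values plus shared per-block scales), so the bulk of the 20.9B parameters fits in about 11GB, versus roughly 42GB at 16-bit precision:

awk 'BEGIN { printf "MXFP4: ~%.1f GB   BF16: ~%.1f GB\n", 20.9e9*4.25/8/1e9, 20.9e9*16/8/1e9 }'   # bits -> bytes -> GB

The non-MoE tensors (attention, embeddings, and so on) stay at higher precision, which helps explain why the actual download weighs in at about 13GB rather than 11GB.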
How?
curl -fsSL https://ollama.com/install.sh | sh # install ollama
ollama pull gpt-oss:20b # pull the 20B model ~13GB
ollama run gpt-oss:20b # enjoy
>>> Send a message (/? for help)
Done!
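Bonus: once the model is pulled, Ollama also serves it over a local HTTP API (port 11434 by default), so you can script against it instead of chatting in the terminal. A minimal sketch; the prompt is just a placeholder:

curl http://localhost:11434/api/generate -d '{
  "model": "gpt-oss:20b",
  "prompt": "Explain MXFP4 in one sentence.",
  "stream": false
}'   # returns a JSON object whose "response" field holds the generated text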
Some gpt-oss 20B specifications
>>> /show info
Model
  architecture        gptoss
  parameters          20.9B
  context length      131072
  embedding length    2880
  quantization        MXFP4

Capabilities
  completion
  tools
  thinking

Parameters
  temperature 1

License
  Apache License
  Version 2.0, January 2004
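By the way, you don’t need the interactive prompt to read this model card; with the standard Ollama CLI, the same information is one command away:

ollama show gpt-oss:20b   # prints architecture, context length, quantization, and license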
Thanks for reaching the end of this short post.
I hope it made you smile.
See you in the next one!
Will it be about LLaVA? MXFP4? Dracula? Bulbasaur? Who knows!