Mar 23, 2026 by Alexandre Croix
https://cylab.be/blog/494/running-local-llms-on-amd-ryzen-ai-9-a-linux-setup-guide
Running a local Large Language Model (LLM) is sometimes desirable to avoid being fully dependent on the big AI actors such as OpenAI, Anthropic, and Google.
Although Linux hardware support improves day after day, it can still be a little tricky to set up your computer properly and run a local model the right way, especially on AMD CPUs/GPUs.
Here, we will describe how to set up, configure, and run a local LLM on a laptop with an AMD Ryzen AI 9 CPU, its integrated Radeon 890M GPU, and 64GB of shared RAM, running an Ubuntu-based distribution (ZorinOS in our case).
To fully utilize your AMD GPU, you need the ROCm (Radeon Open Compute) software stack. More than just a driver, ROCm includes the libraries and debugging tools necessary for AI workloads.
We need to download and install the latest version of the ROCm package from the AMD website: https://www.amd.com/en/support/download/linux-drivers.html
Pay attention to pick the correct version for your hardware. For the machine used in this tutorial (AMD Ryzen AI), the kernel must be 6.14-1018 or newer:
sudo apt update
wget https://repo.radeon.com/amdgpu-install/7.2/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb
sudo apt install ./amdgpu-install_7.2.70200-1_all.deb
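Before going further, you can verify that your running kernel meets the requirement mentioned above (only the standard `uname` tool is assumed here):

```shell
# Print the running kernel release; it should be 6.14-1018 or newer
uname -r
```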
It is time to set up the ROCm use case:
amdgpu-install -y --usecase=rocm --no-dkms
The --no-dkms option tells the installer to use the AMD drivers already built into your Linux kernel rather than trying to compile and install a separate (and often conflicting) kernel module.
If you are using ZorinOS, the amdgpu-install script might fail because it looks for the “Ubuntu” ID in /etc/os-release. To fix this, temporarily change the line ID=zorin to ID=ubuntu in that file before running the installer.
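One way to make that change safely is to back the file up first and restore it once the installer has finished (a sketch; the paths are the standard ones):

```shell
# Back up os-release, masquerade as Ubuntu for the installer, then restore
sudo cp /etc/os-release /etc/os-release.bak
sudo sed -i 's/^ID=zorin$/ID=ubuntu/' /etc/os-release
# ... run amdgpu-install here ...
sudo mv /etc/os-release.bak /etc/os-release
```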
Finally, add your user to the necessary groups and reboot:
sudo usermod -a -G render,video $LOGNAME
sudo reboot
Now it is time to check whether your GPU is properly recognized by the OS. The command rocminfo should produce output containing information about your hardware:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
[...]
This hardware uses Unified Memory, meaning the CPU and GPU share the same 64GB of RAM. While you can reserve a fixed amount of VRAM in the BIOS (UMA Frame Buffer Size), AMD recommends keeping this at its minimum and using the Translation Table Manager (TTM) limit instead.
What is the TTM (Translation Table Manager)?
Reserving 16GB in the BIOS makes that memory unavailable to the CPU, even when the GPU is idle. TTM allows the GPU to request memory dynamically “on the fly.”
The amd-ttm tool, part of AMD's amd-debug-tools package, lets you inspect and adjust this limit. Install it with pipx:
sudo apt install pipx
pipx ensurepath
pipx install amd-debug-tools
Running amd-ttm without arguments shows the current limit:
❯ amd-ttm
💻 Current TTM pages limit: 14680064 pages (56.00 GB)
💻 Total system memory: 61.45 GB
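TTM counts memory in 4 KiB pages, which is how 14680064 pages maps to 56 GB. A quick sanity check in bash:

```shell
# TTM pages are 4 KiB each: pages * 4096 bytes -> GiB
pages=14680064
echo $(( pages * 4096 / 1024 / 1024 / 1024 ))   # prints 56
```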
amd-ttm can be used to change the amount of memory your GPU is allowed to use:
❯ amd-ttm --set 55
🐧 Successfully set TTM pages limit to 14417920 pages (55.00 GB)
🐧 Configuration written to /etc/modprobe.d/ttm.conf
○ NOTE: You need to reboot for changes to take effect.
Would you like to reboot the system now? (y/n): y
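For reference, the generated file simply sets the `pages_limit` parameter of the ttm kernel module. Based on the amd-ttm output above (so treat the exact content as an assumption), it should look roughly like this:

```
# /etc/modprobe.d/ttm.conf
options ttm pages_limit=14417920
```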
To run models, we will use Ollama. It is a fantastic tool, but it may not “whitelist” brand-new chips like the Radeon 890M immediately. If you run a model and see no GPU usage in nvtop or radeontop, you need to force-identify the hardware.
Additionally, we can optimize performance using Flash Attention and KV Cache Quantization. These reduce the memory bottleneck between the RAM and the GPU. For example, 4-bit quantization of the context window can reduce memory usage by 4x with minimal loss in precision.
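To see the order of magnitude involved, here is a back-of-the-envelope KV cache calculation in bash. The model dimensions (32 layers, 8 KV heads, head dimension 128, 8192-token context) are hypothetical, not taken from any specific model, and q4_0 is treated as an ideal 4x reduction versus f16:

```shell
# KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/element
layers=32; kv_heads=8; head_dim=128; ctx=8192
f16=$(( 2 * layers * kv_heads * head_dim * ctx * 2 ))    # f16: 2 bytes/element
echo "f16  KV cache: $(( f16 / 1024 / 1024 )) MiB"       # 1024 MiB
echo "q4_0 KV cache: ~$(( f16 / 4 / 1024 / 1024 )) MiB"  # ~256 MiB
```

In practice q4_0 carries a small amount of quantization metadata, so the real saving is slightly under 4x.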
To apply these optimizations, edit the Ollama service:
sudo systemctl edit ollama.service
Add the following environment variables into the [Service] section:
[Service]
# Force Ollama to recognize the RDNA3 architecture
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
# Enable Flash Attention to optimize data transfer
Environment="OLLAMA_FLASH_ATTENTION=1"
# Enable 4-bit quantization for the KV cache (saves VRAM)
Environment="OLLAMA_KV_CACHE_TYPE=q4_0"
Save the file and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
You are now ready to run high-performance local LLMs directly on your AMD-powered laptop!
This blog post is licensed under
CC BY-SA 4.0