Mar 23, 2026 by Alexandre Croix
https://cylab.be/blog/494/running-local-llms-on-amd-ryzen-ai-9-a-linux-setup-guide
Running a local Large Language Model (LLM) is sometimes desirable to avoid being fully dependent on the big AI actors such as OpenAI, Anthropic, and Google.
Although Linux hardware support improves day after day, it can still be a little tricky to set up your computer properly and run a local model the right way, especially on AMD CPUs/GPUs.
Here, we will describe how to set up, configure, and run a local LLM on a laptop with an AMD Ryzen AI 9 CPU, its integrated Radeon 890M GPU, and 64GB of shared RAM, running an Ubuntu-based distribution (ZorinOS in our case).
To fully utilize your AMD GPU, you need the ROCm (Radeon Open Compute) software stack. More than just a driver, ROCm includes the libraries and debugging tools necessary for AI workloads.
We need to download and install the latest version of the ROCm package from the AMD website: https://www.amd.com/en/support/download/linux-drivers.html
Pay attention to pick the correct version for your hardware. For the machine used in this tutorial (AMD Ryzen AI), the kernel must be 6.14-1018 or newer:
sudo apt update
wget https://repo.radeon.com/amdgpu-install/7.2/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb
sudo apt install ./amdgpu-install_7.2.70200-1_all.deb
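Before going further, you can verify that your running kernel meets the requirement mentioned above (only the standard `uname` tool is assumed here):

```shell
# Print the running kernel release; it should be 6.14-1018 or newer
uname -r
```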
It is time to set up the ROCm use case:
amdgpu-install -y --usecase=rocm --no-dkms
The --no-dkms option tells the installer to use the AMD drivers already built into your Linux kernel rather than trying to compile and install a separate (and often conflicting) kernel module.
If you are using ZorinOS, the amdgpu-install script might fail because it looks for the “Ubuntu” ID in /etc/os-release. To fix this, temporarily change the line ID=zorin to ID=ubuntu in that file before running the installer.
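One way to make that change safely is to back the file up first and restore it once the installer has finished (a sketch; the paths are the standard ones):

```shell
# Back up os-release, masquerade as Ubuntu for the installer, then restore
sudo cp /etc/os-release /etc/os-release.bak
sudo sed -i 's/^ID=zorin$/ID=ubuntu/' /etc/os-release
# ... run amdgpu-install here ...
sudo mv /etc/os-release.bak /etc/os-release
```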
Finally, add your user to the necessary groups and reboot:
sudo usermod -a -G render,video $LOGNAME
sudo reboot
Now it is time to check whether your GPU is properly recognized by the OS. The command rocminfo should produce output containing information about your hardware:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
[...]
This hardware uses Unified Memory, meaning the CPU and GPU share the same 64GB of RAM. While you can reserve a fixed amount of VRAM in the BIOS (UMA Frame Buffer Size), AMD recommends keeping this at its minimum and using the Translation Table Manager (TTM) limit instead.
What is the TTM (Translation Table Manager)?
Reserving 16GB in the BIOS makes that memory unavailable to the CPU, even when the GPU is idle. TTM allows the GPU to request memory dynamically “on the fly.”
The amd-ttm tool, part of AMD's amd-debug-tools package, lets you inspect and adjust this limit. Install it with pipx:
sudo apt install pipx
pipx ensurepath
pipx install amd-debug-tools
Running amd-ttm without arguments shows the current limit:
❯ amd-ttm
💻 Current TTM pages limit: 14680064 pages (56.00 GB)
💻 Total system memory: 61.45 GB
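TTM counts memory in 4 KiB pages, which is how 14680064 pages maps to 56 GB. A quick sanity check in bash:

```shell
# TTM pages are 4 KiB each: pages * 4096 bytes -> GiB
pages=14680064
echo $(( pages * 4096 / 1024 / 1024 / 1024 ))   # prints 56
```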
amd-ttm can be used to change the amount of memory your GPU is allowed to use:
❯ amd-ttm --set 55
🐧 Successfully set TTM pages limit to 14417920 pages (55.00 GB)
🐧 Configuration written to /etc/modprobe.d/ttm.conf
○ NOTE: You need to reboot for changes to take effect.
Would you like to reboot the system now? (y/n): y
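For reference, the generated file simply sets the `pages_limit` parameter of the ttm kernel module. Based on the amd-ttm output above (so treat the exact content as an assumption), it should look roughly like this:

```
# /etc/modprobe.d/ttm.conf
options ttm pages_limit=14417920
```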
To run models, we will use Ollama. It is a fantastic tool, but it may not “whitelist” brand-new chips like the Radeon 890M immediately. If you run a model and see no GPU usage in nvtop or radeontop, you need to force-identify the hardware.
Additionally, we can optimize performance using Flash Attention and KV Cache Quantization. These reduce the memory bottleneck between the RAM and the GPU. For example, 4-bit quantization of the context window can reduce memory usage by 4x with minimal loss in precision.
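To see the order of magnitude involved, here is a back-of-the-envelope KV cache calculation in bash. The model dimensions (32 layers, 8 KV heads, head dimension 128, 8192-token context) are hypothetical, not taken from any specific model, and q4_0 is treated as an ideal 4x reduction versus f16:

```shell
# KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/element
layers=32; kv_heads=8; head_dim=128; ctx=8192
f16=$(( 2 * layers * kv_heads * head_dim * ctx * 2 ))    # f16: 2 bytes/element
echo "f16  KV cache: $(( f16 / 1024 / 1024 )) MiB"       # 1024 MiB
echo "q4_0 KV cache: ~$(( f16 / 4 / 1024 / 1024 )) MiB"  # ~256 MiB
```

In practice q4_0 carries a small amount of quantization metadata, so the real saving is slightly under 4x.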
To apply these optimizations, edit the Ollama service:
sudo systemctl edit ollama.service
Add the following environment variables into the [Service] section:
[Service]
# Force Ollama to recognize the RDNA3 architecture
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
# Enable Flash Attention to optimize data transfer
Environment="OLLAMA_FLASH_ATTENTION=1"
# Enable 4-bit quantization for the KV cache (saves VRAM)
Environment="OLLAMA_KV_CACHE_TYPE=q4_0"
Save the file and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
You are now ready to run high-performance local LLMs directly on your AMD-powered laptop!
This blog post is licensed under
CC BY-SA 4.0