Today I want to try my luck running llama.cpp on a ThinkPad L14 Gen 1 under Fedora Linux 42. I have 32 GB of RAM, mind you, so I can try to load bigger models.
Install BLIS
Get BLIS:
git clone https://github.com/flame/blis && cd blis
This command clones the BLIS (BLAS-like Library Instantiation Software) repository from GitHub and navigates into the cloned directory. BLIS is a high-performance BLAS library optimized for various CPU architectures.
Compile and install
CFLAGS="-O3 -march=native -mtune=native -funroll-loops -fomit-frame-pointer" LDFLAGS="-ljemalloc" ./configure --prefix=/usr --libdir=/usr/lib64 --enable-cblas -t openmp,pthreads zen2
make -j$(nproc)
sudo make install
This configure command sets up BLIS for compilation targeted at the Zen 2 architecture (used in the Ryzen CPU in this laptop). It enables the CBLAS interface and both OpenMP and pthreads threading, and installs into the system directories. The CFLAGS enable aggressive optimizations tuned for the native architecture, and the LDFLAGS link against jemalloc for memory allocation.
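If your machine isn't running a Zen 2 CPU, you don't have to guess the right sub-configuration: BLIS can detect it for you with the auto target. A minimal sketch, reusing the same flags as above:

CFLAGS="-O3 -march=native -mtune=native -funroll-loops -fomit-frame-pointer" LDFLAGS="-ljemalloc" ./configure --prefix=/usr --libdir=/usr/lib64 --enable-cblas -t openmp,pthreads auto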
NOTE:
I install the library into /usr because llama-cli doesn’t link against /usr/local/lib/libblis.so.
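Before moving on, it’s worth double-checking that the shared library really landed where the dynamic linker looks for it (these paths assume the --prefix=/usr --libdir=/usr/lib64 install above):

sudo ldconfig
ldconfig -p | grep libblis

If libblis.so shows up under /usr/lib64, llama.cpp should be able to find it at link time.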
Install llama.cpp
Get llama.cpp:
cd ..
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
Let’s compile it:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=FLAME -DCMAKE_C_FLAGS="-O3 -march=znver2 -mtune=znver2 -fPIC" -DCMAKE_CXX_FLAGS="-O3 -march=znver2 -mtune=znver2 -fPIC" -DCMAKE_EXE_LINKER_FLAGS="-lblis -ljemalloc"
cmake --build build --config Release
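When the build finishes, a quick way to make sure the binaries were actually linked against BLIS (and not silently built without BLAS) is to inspect them with ldd; the path below assumes the default build directory used above:

ldd build/bin/llama-cli | grep -i blis

If nothing is printed, re-check the cmake configure output to confirm BLAS was detected before continuing.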
Installing the Binaries
Since llama.cpp does not provide a standard installation script (make install), you need to place the compiled binaries manually. I opt to copy the bin directory from build into a local directory and add it to the system PATH. The reason I don’t dump everything into a general bin directory is that the build produces a lot of binaries and libraries.
Here’s a common way to do it:
- Create a directory to store the llama.cpp binaries. We’ll use ~/.local/share/llama.cpp.

  mkdir -p ~/.local/share/llama.cpp

- Copy the compiled binaries from the build/bin directory to the newly created directory.

  cp -a build/bin/ ~/.local/share/llama.cpp/

- Add the new directory to your PATH. This command appends a line to your .bashrc file, so the PATH is updated automatically in future terminal sessions.

  echo 'export PATH=$PATH:$HOME/.local/share/llama.cpp/bin' >> ~/.bashrc

- Apply the changes to your current terminal session.

  source ~/.bashrc
Now, you can verify that the installation was successful by checking the version of llama-cli:
$ llama-cli --version
version: 6294 (bcbddcd5)
built with gcc (GCC) 14.2.1 20240805 (Red Hat 14.2.1-1) for x86_64-redhat-linux
The output confirms that the llama-cli command is now accessible and that it was compiled with GCC.
Running a Model
With llama.cpp installed, you can now run a model. The llama-cli tool can automatically download a model from Hugging Face and run it.
Here is an example command to run a model:
llama-cli -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_K_M --color -c 2048 -n 512 -t 3 --temp 0.7
Here’s a breakdown of the parameters used in this command:
- -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_K_M: Downloads the Qwen3-4B-Instruct model (quantized to Q4_K_M) from the Hugging Face repository unsloth/Qwen3-4B-Instruct-2507-GGUF
- --color: Enables colorized output for better readability
- -c 2048: Sets the context size to 2048 tokens (conservative for laptop memory)
- -n 512: Sets the maximum number of tokens to generate to 512 (reasonable response length)
- -t 3: Uses 3 threads for processing (conservative for laptop thermal management)
- --temp 0.7: Sets the temperature for sampling (balanced creativity vs. determinism)
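If you have already downloaded a GGUF file, you can skip the Hugging Face fetch and point llama-cli at it directly with -m; the model path below is just a placeholder for wherever you keep your models:

# the .gguf path is a placeholder; substitute your own model file
llama-cli -m ~/models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf --color -c 2048 -n 512 -t 3 --temp 0.7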