Today I want to try my luck running llama.cpp on a ThinkPad L14 Gen 1 under Fedora Linux 42. I have 32 GB of RAM, mind you, so I can try to load bigger models.
Install BLIS
Get BLIS:
git clone https://github.com/flame/blis && cd blis
This command clones the BLIS (BLAS-like Library Instantiation Software) repository from GitHub and navigates into the cloned directory. BLIS is a high-performance BLAS library optimized for various CPU architectures.
Compile and install
CFLAGS="-O3 -march=native -mtune=native -funroll-loops -fomit-frame-pointer" LDFLAGS="-ljemalloc" ./configure --prefix=/usr --libdir=/usr/lib64 --enable-cblas -t openmp,pthreads zen2
make -j$(nproc)
sudo make install
This configure command sets up BLIS for compilation targeted at the Zen 2 architecture (used in the Ryzen CPU in this laptop). It enables the CBLAS interface and both OpenMP and pthreads threading, and installs into the system directories. The CFLAGS enable aggressive optimizations tuned for the native architecture, and the LDFLAGS link against jemalloc for memory allocation.
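If your machine isn't running a Zen 2 CPU, you don't have to guess the right sub-configuration: BLIS can detect it for you with the auto target. A minimal sketch, reusing the same flags as above:

CFLAGS="-O3 -march=native -mtune=native -funroll-loops -fomit-frame-pointer" LDFLAGS="-ljemalloc" ./configure --prefix=/usr --libdir=/usr/lib64 --enable-cblas -t openmp,pthreads auto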
NOTE:
I install the library into /usr because llama-cli doesn’t link against /usr/local/lib/libblis.so.
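Before moving on, it’s worth double-checking that the shared library really landed where the dynamic linker looks for it (these paths assume the --prefix=/usr --libdir=/usr/lib64 install above):

sudo ldconfig
ldconfig -p | grep libblis

If libblis.so shows up under /usr/lib64, llama.cpp should be able to find it at link time.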
Install llama.cpp
Get llama.cpp:
cd ..
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
Let’s compile it:
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=FLAME -DCMAKE_C_FLAGS="-O3 -march=znver2 -mtune=znver2 -fPIC" -DCMAKE_CXX_FLAGS="-O3 -march=znver2 -mtune=znver2 -fPIC" -DCMAKE_EXE_LINKER_FLAGS="-lblis -ljemalloc"
cmake --build build --config Release
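When the build finishes, a quick way to make sure the binaries were actually linked against BLIS (and not silently built without BLAS) is to inspect them with ldd; the path below assumes the default build directory used above:

ldd build/bin/llama-cli | grep -i blis

If nothing is printed, re-check the cmake configure output to confirm BLAS was detected before continuing.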
Installing the Binaries
Since llama.cpp does not provide a standard installation script (make install), you need to place the compiled binaries manually. I opt to copy the bin directory from build into a local directory and add it to the system PATH. The reason I don’t dump everything into a general bin directory is that the build produces a lot of binaries and libraries.
Here’s a common way to do it:
- Create a directory to store the llama.cpp binaries. We’ll use ~/.local/share/llama.cpp.

  mkdir -p ~/.local/share/llama.cpp

- Copy the compiled binaries from the build/bin directory to the newly created directory.

  cp -a build/bin/ ~/.local/share/llama.cpp/

- Add the new directory to your PATH. This command appends a line to your .bashrc file, so the PATH is updated automatically in future terminal sessions.

  echo 'export PATH=$PATH:$HOME/.local/share/llama.cpp/bin' >> ~/.bashrc

- Apply the changes to your current terminal session.

  source ~/.bashrc
Now, you can verify that the installation was successful by checking the version of llama-cli:
$ llama-cli --version
version: 6294 (bcbddcd5)
built with gcc (GCC) 14.2.1 20240805 (Red Hat 14.2.1-1) for x86_64-redhat-linux
The output confirms that the llama-cli command is now accessible and that it was compiled with GCC.
Running a Model
With llama.cpp installed, you can now run a model. The llama-cli tool can automatically download a model from Hugging Face and run it.
Here is an example command to run a model:
llama-cli -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_K_M --color -c 2048 -n 512 -t 3 --temp 0.7
Here’s a breakdown of the parameters used in this command:
- -hf unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_K_M: Downloads the Qwen3-4B-Instruct model (quantized to Q4_K_M) from the Hugging Face repository unsloth/Qwen3-4B-Instruct-2507-GGUF
- --color: Enables colorized output for better readability
- -c 2048: Sets the context size to 2048 tokens (conservative for laptop memory)
- -n 512: Sets the maximum number of tokens to generate to 512 (reasonable response length)
- -t 3: Uses 3 threads for processing (conservative for laptop thermal management)
- --temp 0.7: Sets the temperature for sampling (balanced creativity vs. determinism)
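If you have already downloaded a GGUF file, you can skip the Hugging Face fetch and point llama-cli at it directly with -m; the model path below is just a placeholder for wherever you keep your models:

# the .gguf path is a placeholder; substitute your own model file
llama-cli -m ~/models/Qwen3-4B-Instruct-2507-Q4_K_M.gguf --color -c 2048 -n 512 -t 3 --temp 0.7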