Talos Vulnerability Report

TALOS-2024-1913

llama.cpp GGUF library gguf_fread_str heap-based buffer overflow vulnerability

February 26, 2024

CVE Number

CVE-2024-23496

SUMMARY

A heap-based buffer overflow vulnerability exists in the GGUF library gguf_fread_str functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

llama.cpp Commit 18c2e17

PRODUCT URLS

llama.cpp - https://github.com/ggerganov/llama.cpp

CVSSv3 SCORE

8.8 - CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-190 - Integer Overflow or Wraparound

DETAILS

llama.cpp is the C++ implementation for running LLaMA models. The project relies on ggml, a tensor library that provides several functionalities.

The LLaMA project and many others rely on the GGUF file format. GGUF is a popular file format for storing LLM model representations. In this library, the function that parses a .gguf file is gguf_init_from_file:

struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params) {
    FILE * file = fopen(fname, "rb");
    if (!file) {
        return NULL;
    }
    [...]
    struct gguf_context * ctx = GGML_ALIGNED_MALLOC(sizeof(struct gguf_context));

    // read the header
    {
        [...]

        ctx->kv    = NULL;
        ctx->infos = NULL;
        ctx->data  = NULL;

        ok = ok && gguf_fread_el(file, &ctx->header.version,   sizeof(ctx->header.version),   &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_tensors, sizeof(ctx->header.n_tensors), &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_kv,      sizeof(ctx->header.n_kv),      &offset);

        [...]
    }
    [...]
    // read the kv pairs
    {
        ctx->kv = malloc(ctx->header.n_kv * sizeof(struct gguf_kv));

        for (uint64_t i = 0; i < ctx->header.n_kv; ++i) {
            struct gguf_kv * kv = &ctx->kv[i];

[1]         ok = ok && gguf_fread_str(file, &kv->key,                    &offset);
            
            [...]
        }
    }
    [...]
}

The gguf_init_from_file function uses the gguf_fread_str function. For example, at [1], it is called to read the key of a kv element. The gguf_fread_str function fetches the string length from the file into p->n at [2], then calls calloc at [3] with a size of p->n + 1 and saves the resulting pointer in p->data. Once the memory is allocated, at [4] the string is read from the file, through the gguf_fread_el function, into the newly allocated buffer:

static bool gguf_fread_str(FILE * file, struct gguf_str * p, size_t * offset) {
    p->n    = 0;
    p->data = NULL;

    bool ok = true;

[2] ok = ok && gguf_fread_el(file, &p->n,    sizeof(p->n), offset);
[3] p->data = calloc(p->n + 1, 1);
[4] ok = ok && gguf_fread_el(file,  p->data, p->n,         offset);

    return ok;
}

Because p->n is an arbitrary uint64_t value read from the file, the p->n + 1 calculation performed at [3] can overflow: for example, a p->n of 0xFFFFFFFFFFFFFFFF makes the sum wrap to 0. This integer overflow causes calloc to allocate less space than required. Then, at [4], p->data is filled with up to p->n bytes from the file even though the buffer is too small to hold them, which can lead to a heap-based buffer overflow.
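The wraparound can be illustrated in isolation. The sketch below is not taken from llama.cpp and is not the upstream patch: unchecked_alloc_size and read_str_checked are hypothetical names, the max_len cap is an assumed policy, and the length is read in host byte order for simplicity. It shows the size computed at [3] and one possible way to reject a hostile length before allocating.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Size the vulnerable code passes to calloc at [3]: n + 1 in unsigned
 * 64-bit arithmetic, which wraps to 0 when n == UINT64_MAX. */
static uint64_t unchecked_alloc_size(uint64_t n) {
    return n + 1;
}

/* Hypothetical guarded reader (an assumption, not the vendor fix): validate
 * the length read from the file before it is used as an allocation size. */
static bool read_str_checked(FILE *file, uint64_t max_len,
                             char **out, uint64_t *out_len) {
    uint64_t n = 0;
    *out = NULL;
    *out_len = 0;

    /* Read the 8-byte length prefix (host byte order assumed here). */
    if (fread(&n, sizeof(n), 1, file) != 1) {
        return false;
    }
    /* Reject lengths above the caller's cap, and any length for which
     * n + 1 would not fit in size_t (this covers the wrap to 0). */
    if (n > max_len || n >= SIZE_MAX) {
        return false;
    }
    char *buf = calloc((size_t)n + 1, 1);
    if (buf == NULL) {
        return false;
    }
    /* Read exactly n bytes; a short read means the file is truncated. */
    if (n > 0 && fread(buf, 1, (size_t)n, file) != (size_t)n) {
        free(buf);
        return false;
    }
    *out = buf;       /* NUL-terminated because calloc zeroed n + 1 bytes */
    *out_len = n;
    return true;
}
```

With these definitions, unchecked_alloc_size(UINT64_MAX) evaluates to 0, while read_str_checked refuses the same length and leaves *out set to NULL. The caller owns the returned buffer and releases it with free.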

Crash Information

=================================================================
==51876==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000651 at pc 0x7fa18dc9a559 bp 0x7ffd8e912170 sp 0x7ffd8e911920
WRITE of size 23 at 0x602000000651 thread T0
    #0 0x7fa18dc9a558 in __interceptor_fread ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1025
    #1 0x563e97d0b37c in gguf_fread_el /home/vagrant/llama.cpp/ggml.c:18652
    #2 0x563e97d0b506 in gguf_fread_str /home/vagrant/llama.cpp/ggml.c:18664
    #3 0x563e97d0bed8 in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18753
    #4 0x563e97dab7e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #5 0x563e97d57592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #6 0x563e97d76355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #7 0x563e97e951b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #8 0x563e97c418b1 in main examples/main/main.cpp:187
    #9 0x7fa18d75ed09 in __libc_start_main ../csu/libc-start.c:308
    #10 0x563e97c3bf49 in _start (/home/vagrant/llama.cpp/main+0x41f49)

0x602000000651 is located 0 bytes to the right of 1-byte region [0x602000000650,0x602000000651)
allocated by thread T0 here:
    #0 0x7fa18dd06037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x563e97d0b4af in gguf_fread_str /home/vagrant/llama.cpp/ggml.c:18663
    #2 0x563e97d0bed8 in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18753
    #3 0x563e97dab7e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #4 0x563e97d57592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #5 0x563e97d76355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #6 0x563e97e951b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #7 0x563e97c418b1 in main examples/main/main.cpp:187
    #8 0x7fa18d75ed09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1025 in __interceptor_fread
Shadow bytes around the buggy address:
  0x0c047fff8070: fa fa fd fa fa fa fd fa fa fa 00 fa fa fa 00 fa
  0x0c047fff8080: fa fa fd fa fa fa fd fa fa fa 00 fa fa fa 00 fa
  0x0c047fff8090: fa fa fd fa fa fa fd fa fa fa 00 fa fa fa 00 fa
  0x0c047fff80a0: fa fa fd fa fa fa fd fa fa fa 00 fa fa fa 00 fa
  0x0c047fff80b0: fa fa fd fa fa fa fd fa fa fa 00 fa fa fa 00 fa
=>0x0c047fff80c0: fa fa 00 fa fa fa 01 fa fa fa[01]fa fa fa fa fa
  0x0c047fff80d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff80e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff80f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==51876==ABORTING

VENDOR RESPONSE

Databricks independently reported this vulnerability concurrently with our own discovery.

We have not received a response from the vendor; however, we confirmed that this vulnerability has been fixed.

TIMELINE

2024-01-29 - Initial Vendor Contact
2024-01-29 - Vendor Patch Release
2024-01-30 - Vendor Disclosure
2024-02-26 - Public Release

CREDIT

Discovered by Francesco Benvenuto of Cisco Talos.
