Talos Vulnerability Report

TALOS-2024-1914

llama.cpp GGUF library info->ne heap-based buffer overflow vulnerability

February 26, 2024

CVE Number

CVE-2024-21802

SUMMARY

A heap-based buffer overflow vulnerability exists in the GGUF library info->ne functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

llama.cpp Commit 18c2e17

PRODUCT URLS

llama.cpp - https://github.com/ggerganov/llama.cpp

CVSSv3 SCORE

8.8 - CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-122 - Heap-based Buffer Overflow

DETAILS

llama.cpp is a C++ implementation for running LLaMA models. The project relies on ggml, a tensor library that provides several functionalities.

The LLaMA project, like many others, relies on the GGUF file format, a popular format for storing LLM model representations. In this library, the function that parses a .gguf file is gguf_init_from_file:

struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params) {
    FILE * file = fopen(fname, "rb");
    if (!file) {
        return NULL;
    }
    [...]
    
    struct gguf_context * ctx = GGML_ALIGNED_MALLOC(sizeof(struct gguf_context));

    // read the header
    {
        [...]

        ctx->kv    = NULL;
        ctx->infos = NULL;
        ctx->data  = NULL;

        ok = ok && gguf_fread_el(file, &ctx->header.version,   sizeof(ctx->header.version),   &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_tensors, sizeof(ctx->header.n_tensors), &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_kv,      sizeof(ctx->header.n_kv),      &offset);

        [...]
    }

    // read the tensor infos
    {
[1]     ctx->infos = malloc(ctx->header.n_tensors * sizeof(struct gguf_tensor_info));

        for (uint64_t i = 0; i < ctx->header.n_tensors; ++i) {
[2]         struct gguf_tensor_info * info = &ctx->infos[i];

            for (int j = 0; j < GGML_MAX_DIMS; ++j) {
[3]             info->ne[j] = 1;
            }

            ok = ok && gguf_fread_str(file, &info->name,                          &offset);
[4]         ok = ok && gguf_fread_el (file, &info->n_dims, sizeof(info->n_dims),  &offset);
            for (uint32_t j = 0; j < info->n_dims; ++j) {
[5]             ok = ok && gguf_fread_el(file, &info->ne[j], sizeof(info->ne[j]), &offset);
            }
            ok = ok && gguf_fread_el (file, &info->type,   sizeof(info->type),    &offset);
            ok = ok && gguf_fread_el (file, &info->offset, sizeof(info->offset),  &offset);
            [...]
        }
        [...]
    }
    [...]
}

We will focus on the tensor-parsing part. At [1], ctx->header.n_tensors, a value parsed from the provided file, is used to allocate an array of that many gguf_tensor_info elements. For each element, the parser then reads from the file its name, the number of entries used in the info->ne array, and other information. At [2] the i-th element of the allocated array is fetched, and at [3] its info->ne array is filled with ones. At [4] the info->n_dims member is populated from the file; this value is the upper bound of the for loop that follows. At [5], the j-th element of info->ne is populated with data read from the file. Because info->ne is an array of GGML_MAX_DIMS elements (4 at the time of writing) and info->n_dims is an arbitrary uint32_t value taken from the file, with no check against that size in the code shown, this can lead to a heap-based buffer overflow: the line at [5] can index past the end of info->ne and write file-controlled data outside the buffer.
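
The missing validation can be illustrated with a minimal sketch. It is not the upstream patch: sketch_tensor_dims, sketch_read_dims and SKETCH_MAX_DIMS are hypothetical names introduced here, the struct is a reduced stand-in for the n_dims and ne members of gguf_tensor_info, and plain fread is used in place of gguf_fread_el. The only point is that n_dims is compared against the array capacity before any element of ne is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: mirrors GGML_MAX_DIMS, which the report gives as 4. */
#define SKETCH_MAX_DIMS 4

/* Hypothetical, reduced stand-in for the n_dims/ne part of gguf_tensor_info. */
struct sketch_tensor_dims {
    uint32_t n_dims;
    uint64_t ne[SKETCH_MAX_DIMS];
};

/* Reads n_dims, then n_dims extents, from `file` into `info`.
 * Returns false on a short read or when n_dims exceeds the capacity of ne[]. */
static bool sketch_read_dims(FILE * file, struct sketch_tensor_dims * info) {
    assert(file != NULL && info != NULL);

    /* Same default as step [3]: unused dimensions keep an extent of 1. */
    for (int j = 0; j < SKETCH_MAX_DIMS; ++j) {
        info->ne[j] = 1;
    }

    if (fread(&info->n_dims, sizeof(info->n_dims), 1, file) != 1) {
        return false;
    }

    /* The bound that step [4] lacks: reject before any write to ne[]. */
    if (info->n_dims > SKETCH_MAX_DIMS) {
        return false;
    }

    for (uint32_t j = 0; j < info->n_dims; ++j) {
        if (fread(&info->ne[j], sizeof(info->ne[j]), 1, file) != 1) {
            return false;
        }
    }
    return true;
}
```

With a check of this kind, a file that declares n_dims = 9 is rejected before the loop at [5] runs. Whether the actual fix in llama.cpp takes this form is not stated in this report.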

Crash Information

=================================================================
==3992039==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x612000000148 at pc 0x7fab54324559 bp 0x7fff1c2a4660 sp 0x7fff1c2a3e10
WRITE of size 8 at 0x612000000148 thread T0
    #0 0x7fab54324558 in __interceptor_fread ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1025
    #1 0x55bfba96137c in gguf_fread_el /home/vagrant/llama.cpp/ggml.c:18652
    #2 0x55bfba962b90 in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18833
    #3 0x55bfbaa017e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #4 0x55bfba9ad592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #5 0x55bfba9cc355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #6 0x55bfbaaeb1b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #7 0x55bfba8978b1 in main examples/main/main.cpp:187
    #8 0x7fab53de8d09 in __libc_start_main ../csu/libc-start.c:308
    #9 0x55bfba891f49 in _start (/home/vagrant/llama.cpp/main+0x41f49)

0x612000000148 is located 0 bytes to the right of 264-byte region [0x612000000040,0x612000000148)
allocated by thread T0 here:
    #0 0x7fab5438fe8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x55bfba9629b0 in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18821
    #2 0x55bfbaa017e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #3 0x55bfba9ad592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #4 0x55bfba9cc355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #5 0x55bfbaaeb1b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #6 0x55bfba8978b1 in main examples/main/main.cpp:187
    #7 0x7fab53de8d09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:1025 in __interceptor_fread
Shadow bytes around the buggy address:
  0x0c247fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c247fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c247fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c247fff8000: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c247fff8010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c247fff8020: 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa
  0x0c247fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c247fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c247fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c247fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c247fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==3992039==ABORTING
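
The numbers in the report above can be cross-checked with a little arithmetic. The sketch below rests on assumptions that are not stated in this report: that sizeof(struct gguf_tensor_info) is 88 bytes on the 64-bit build used, and that the ne array starts 24 bytes into each element. Under those assumptions the 264-byte region corresponds to three tensor infos (3 x 88), and the helper computes the first ne index whose 8-byte slot starts at or beyond the end of the allocation. sketch_first_oob_index is a hypothetical name introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smallest j such that slot ne[j] of element `elem_index`
 * starts at or past the end of an allocation of n_elems * elem_size bytes.
 * ne_offset is the byte offset of ne[] inside one element and slot_size
 * is the size of one ne[] entry. All layout values are caller-supplied
 * assumptions, not values taken from the llama.cpp sources. */
static size_t sketch_first_oob_index(size_t n_elems, size_t elem_size,
                                     size_t ne_offset, size_t slot_size,
                                     size_t elem_index) {
    assert(elem_index < n_elems && slot_size > 0 && ne_offset <= elem_size);

    size_t region_size = n_elems * elem_size;               /* e.g. 3 * 88 = 264 */
    size_t ne_start    = elem_index * elem_size + ne_offset;

    /* Ceiling division: first slot whose start is >= region_size. */
    return (region_size - ne_start + slot_size - 1) / slot_size;
}
```

For the assumed layout, sketch_first_oob_index(3, 88, 24, 8, 2) yields 8 for the last element and sketch_first_oob_index(3, 88, 24, 8, 0) yields 30 for the first. In both cases the writes to ne[4] and onward, up to that index, already fall outside info->ne but still inside the same heap allocation, so AddressSanitizer only reports the write that reaches the redzone ("0 bytes to the right" of the region).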

VENDOR RESPONSE

Databricks has independently reported this vulnerability concurrently with our own discovery.

We have not received a response from the vendor; however, we confirmed that this vulnerability has been fixed.

TIMELINE

2024-01-29 - Initial Vendor Contact
2024-01-29 - Vendor Patch Release
2024-01-30 - Vendor Disclosure
2024-02-26 - Public Release

Credit

Discovered by Francesco Benvenuto of Cisco Talos.