Talos Vulnerability Report

TALOS-2024-1912

llama.cpp GGUF library GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing heap-based buffer overflow vulnerability

February 26, 2024

CVE Number

CVE-2024-21825

SUMMARY

A heap-based buffer overflow vulnerability exists in the GGUF library GGUF_TYPE_ARRAY/GGUF_TYPE_STRING parsing functionality of llama.cpp Commit 18c2e17. A specially crafted .gguf file can lead to code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

llama.cpp Commit 18c2e17

PRODUCT URLS

llama.cpp - https://github.com/ggerganov/llama.cpp

CVSSv3 SCORE

8.8 - CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-190 - Integer Overflow or Wraparound

DETAILS

llama.cpp is a C++ implementation for running LLaMA models. The project relies on the ggml library, a tensor library that provides several functionalities.

The LLaMA project, like many others, relies on the GGUF file format, a popular format for storing LLM model representations. In this library, the function that parses a .gguf file is gguf_init_from_file:

struct gguf_context * gguf_init_from_file(const char * fname, struct gguf_init_params params) {
    FILE * file = fopen(fname, "rb");
    if (!file) {
        return NULL;
    }
    [...]
    struct gguf_context * ctx = GGML_ALIGNED_MALLOC(sizeof(struct gguf_context));

    // read the header
    {
        [...]

        ctx->kv    = NULL;
        ctx->infos = NULL;
        ctx->data  = NULL;

        ok = ok && gguf_fread_el(file, &ctx->header.version,   sizeof(ctx->header.version),   &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_tensors, sizeof(ctx->header.n_tensors), &offset);
        ok = ok && gguf_fread_el(file, &ctx->header.n_kv,      sizeof(ctx->header.n_kv),      &offset);

        [...]
    }
    [...]
    // read the kv pairs
    {
[1]     ctx->kv = malloc(ctx->header.n_kv * sizeof(struct gguf_kv));

        for (uint64_t i = 0; i < ctx->header.n_kv; ++i) {
            struct gguf_kv * kv = &ctx->kv[i];

            ok = ok && gguf_fread_str(file, &kv->key,                    &offset);
            ok = ok && gguf_fread_el (file, &kv->type, sizeof(kv->type), &offset);

            switch (kv->type) {
                [...]
                case GGUF_TYPE_ARRAY:
                    {
                        ok = ok && gguf_fread_el(file, &kv->value.arr.type, sizeof(kv->value.arr.type), &offset);
                        ok = ok && gguf_fread_el(file, &kv->value.arr.n,    sizeof(kv->value.arr.n), &offset);

                        switch (kv->value.arr.type) {
                            [...]
                            case GGUF_TYPE_STRING:
                                {
[2]                                 kv->value.arr.data = malloc(kv->value.arr.n * sizeof(struct gguf_str));
                                    for (uint64_t j = 0; j < kv->value.arr.n; ++j) {
[3]                                     ok = ok && gguf_fread_str(file, &((struct gguf_str *) kv->value.arr.data)[j], &offset);
                                    }
                                } break;
                            case GGUF_TYPE_ARRAY:
                            case GGUF_TYPE_COUNT: GGML_ASSERT(false && "invalid type"); break;
                        }
                    } break;
                case GGUF_TYPE_COUNT: GGML_ASSERT(false && "invalid type");
            }

            if (!ok) {
                break;
            }
        }

        if (!ok) {
            fprintf(stderr, "%s: failed to read key-value pairs\n", __func__);
            fclose(file);
            gguf_free(ctx);
            return NULL;
        }
    }
    [...]
}

We will focus on the kv parsing part. At [1], ctx->header.n_kv, a value parsed from the provided file, is used to allocate the corresponding number of gguf_kv elements. Then, for each kv element, its key and its type are read from the file. Different actions are performed depending on the type. If the type is GGUF_TYPE_ARRAY, two more values are read from the file: the type of the array, stored in kv->value.arr.type, and the number of elements, stored in kv->value.arr.n. If kv->value.arr.type is GGUF_TYPE_STRING, the code at [2] is reached.

At [2], kv->value.arr.data = malloc(kv->value.arr.n * sizeof(struct gguf_str)) is executed to allocate kv->value.arr.n elements of type struct gguf_str. Then, for each element, the call gguf_fread_str(file, &((struct gguf_str *) kv->value.arr.data)[j], &offset); at [3] is executed to read a string into kv->value.arr.data[j].
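The relationship between the element count and the bytes each loop iteration touches can be sketched in isolation. The struct below is a hypothetical stand-in, not the library's definition: it assumes a 64-bit length followed by a data pointer, which is consistent with the 16-byte size stated in this report on a 64-bit platform. The helper names are also invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one string slot (assumed layout:
 * a length plus a pointer to separately allocated bytes). */
struct str_slot {
    uint64_t n;     /* string length read from the file */
    char    *data;  /* heap buffer holding the string bytes */
};

/* Byte offset of slot j from the start of the array.
 * The assert documents the precondition that the product fits in size_t. */
static size_t slot_offset(size_t j) {
    assert(j <= SIZE_MAX / sizeof(struct str_slot));
    return j * sizeof(struct str_slot);
}

/* Bytes needed for `count` slots, under the same precondition. */
static size_t slots_bytes(size_t count) {
    return slot_offset(count);
}

/* True when slot j lies entirely inside an allocation of alloc_bytes. */
static int slot_in_bounds(size_t j, size_t alloc_bytes) {
    return j < alloc_bytes / sizeof(struct str_slot);
}
```

Under these assumptions, a 192-byte allocation holds slots 0 through 11, and a write to slot 12 begins exactly at the first byte past the end of the region.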

At [2], kv->value.arr.n is an arbitrary uint64_t value taken from the file, and sizeof(struct gguf_str) is 16 bytes. Multiplying the two can cause an integer overflow, which results in less space being allocated than required. The loop will then execute up to kv->value.arr.n times, which can exceed the number of elements actually allocated. This can lead to a heap-based buffer overflow in the gguf_fread_str function when it writes the string structure at &((struct gguf_str *) kv->value.arr.data)[j].
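The wraparound can be reproduced with plain arithmetic. The sketch below is illustrative only: the element count used in the crashing file is not given in this report, so the value 0x100000000000000C (2^60 + 12) is an assumption, chosen because any count of the form 12 + k * 2^60 yields a 192-byte request, which is consistent with the region size in the crash output. The function alloc_slots_checked is a hypothetical guard showing one common way to reject such counts; it is not the vendor's patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* sizeof(struct gguf_str) as stated in this report. */
#define SLOT_SIZE 16u

/* Byte count as an unchecked 64-bit multiplication computes it
 * (unsigned arithmetic wraps modulo 2^64). */
static uint64_t wrapped_bytes(uint64_t n) {
    return n * (uint64_t) SLOT_SIZE;
}

/* Hypothetical guard: allocate n slots, or return NULL when
 * n * SLOT_SIZE would not fit in size_t. */
static void *alloc_slots_checked(uint64_t n) {
    if (n > SIZE_MAX / SLOT_SIZE) {
        return NULL;  /* the product would wrap: reject the input */
    }
    return malloc((size_t) n * SLOT_SIZE);
}
```

With the assumed count, wrapped_bytes returns 192 while the loop bound remains 2^60 + 12, so the thirteenth iteration already writes past the allocation. The guard returns NULL for the same count, and a caller would be expected to treat that as a parse failure.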

Crash Information

=================================================================
==3991119==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x610000000200 at pc 0x562bf62483fb bp 0x7ffd27aa13a0 sp 0x7ffd27aa1398
WRITE of size 8 at 0x610000000200 thread T0
    #0 0x562bf62483fa in gguf_fread_str /home/vagrant/llama.cpp/ggml.c:18658
    #1 0x562bf62496e7 in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18796
    #2 0x562bf62e87e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #3 0x562bf6294592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #4 0x562bf62b3355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #5 0x562bf63d21b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #6 0x562bf617e8b1 in main examples/main/main.cpp:187
    #7 0x7fe6229d9d09 in __libc_start_main ../csu/libc-start.c:308
    #8 0x562bf6178f49 in _start (/home/vagrant/llama.cpp/main+0x41f49)

0x610000000200 is located 0 bytes to the right of 192-byte region [0x610000000140,0x610000000200)
allocated by thread T0 here:
    #0 0x7fe622f80e8f in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x562bf6249639 in gguf_init_from_file /home/vagrant/llama.cpp/ggml.c:18794
    #2 0x562bf62e87e9 in llama_model_loader::llama_model_loader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, llama_model_kv_override const*) (/home/vagrant/llama.cpp/main+0x1b17e9)
    #3 0x562bf6294592 in llama_model_load /home/vagrant/llama.cpp/llama.cpp:3792
    #4 0x562bf62b3355 in llama_load_model_from_file /home/vagrant/llama.cpp/llama.cpp:9291
    #5 0x562bf63d21b4 in llama_init_from_gpt_params(gpt_params&) common/common.cpp:1105
    #6 0x562bf617e8b1 in main examples/main/main.cpp:187
    #7 0x7fe6229d9d09 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/vagrant/llama.cpp/ggml.c:18658 in gguf_fread_str
Shadow bytes around the buggy address:
  0x0c207fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c207fff8000: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c207fff8010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c207fff8020: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c207fff8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c207fff8040:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c207fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==3991119==ABORTING

VENDOR RESPONSE

Databricks independently reported this vulnerability concurrently with our own discovery.

We have not received a response from the vendor; however, we confirmed that this vulnerability has been fixed.

TIMELINE

2024-01-29 - Initial Vendor Contact
2024-01-29 - Vendor Patch Release
2024-01-30 - Vendor Disclosure
2024-02-26 - Public Release

Credit

Discovered by Francesco Benvenuto of Cisco Talos.