Talos Vulnerability Report

TALOS-2023-1807

GTKWave VCD sorted bsearch arbitrary write vulnerabilities

January 8, 2024

CVE Number

CVE-2023-37921, CVE-2023-37922, CVE-2023-37923

SUMMARY

Multiple arbitrary write vulnerabilities exist in the VCD sorted bsearch functionality of GTKWave 3.3.115. A specially crafted .vcd file can lead to arbitrary code execution. A victim would need to open a malicious file to trigger these vulnerabilities.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

GTKWave 3.3.115

PRODUCT URLS

GTKWave - https://gtkwave.sourceforge.net

CVSSv3 SCORE

7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-118 - Incorrect Access of Indexable Resource (‘Range Error’)

DETAILS

GTKWave is a wave viewer, often used to analyze FPGA simulations and logic analyzer captures. It includes a GUI to view and analyze traces, and it can convert between several file formats (.lxt, .lxt2, .vzt, .fst, .ghw, .vcd, .evcd) either through the UI or through its command line tools. GTKWave is available for Linux, Windows and macOS. Trace files can be shared within teams or organizations, for example to compare results of simulation runs across different design implementations, to analyze protocols captured with logic analyzers, or simply as a reference when porting design implementations.

GTKWave sets up mime types for its supported extensions.

VCD (Value Change Dump) files are parsed by the vcd_parse function. This function is duplicated in several conversion utilities (vcd2lxt, vcd2lxt2, vcd2vzt) but not in the GUI parsers of GTKWave. In general the various implementations are very similar or identical, but in some cases they differ enough to not allow the issue described in this advisory to be triggered.

Let’s describe the execution flow for the vcd2lxt utility; the other implementations have very similar behavior.

The function vcd_parse loops over each line in the file [1]. Depending on which token has been read [2], a different switch block is executed:

     static void vcd_parse(int linear) {
         int tok;

[1]      for (;;) {
[2]          switch (get_token()) {
                 ...

The get_token() function simply extracts a token from the file at the current cursor position, saving it to the global yytext buffer, and assigning the token’s length to the global yylen.
Moreover, if the token starts with “$”, the token is considered a special symbol, and it has to match one of these tokens:

char *tokens[] = {"var", "end", "scope", "upscope",
                  "comment", "date", "dumpall", "dumpoff", "dumpon",
                  "dumpvars", "enddefinitions",
                  "dumpports", "dumpportsoff", "dumpportson", "dumpportsall",
                  "timescale", "version", "vcdclose", "timezero",
                  "", "", ""};

The return value of get_token is a token type, which is an index inside the tokens array above.
If the token does not start with “$”, the token is considered a string and the returned token type will be T_STRING.
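
The classification rule described above can be sketched as follows. This is a simplified model, not the GTKWave implementation: classify_token, TOK_UNKNOWN and TOK_STRING are hypothetical names, and the behavior for an unrecognized “$” keyword is an assumption, since the advisory does not show it.

```c
#include <assert.h>
#include <string.h>

/* Keyword table in the same order as the excerpt above (the "" padding is omitted). */
static const char *const keywords[] = {
    "var", "end", "scope", "upscope", "comment", "date", "dumpall", "dumpoff",
    "dumpon", "dumpvars", "enddefinitions", "dumpports", "dumpportsoff",
    "dumpportson", "dumpportsall", "timescale", "version", "vcdclose", "timezero"};

enum { NUM_KEYWORDS = sizeof(keywords) / sizeof(keywords[0]) };

/* Hypothetical sentinels; the real constants are not shown in the advisory. */
enum { TOK_UNKNOWN = NUM_KEYWORDS, TOK_STRING = NUM_KEYWORDS + 1 };

/* Returns the keyword index for "$keyword" tokens, TOK_STRING for anything
 * that does not start with '$', and TOK_UNKNOWN for an unrecognized keyword. */
static int classify_token(const char *text)
{
    assert(text != NULL);
    if (text[0] != '$')
        return TOK_STRING;
    for (int i = 0; i < NUM_KEYWORDS; i++) {
        if (strcmp(text + 1, keywords[i]) == 0)
            return i;
    }
    return TOK_UNKNOWN;
}
```

Under this model, “$var” maps to index 0 and “$enddefinitions” to index 10, while a value-change line such as “p” is classified as a plain string.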

Going back to the switch above, if the parsed token is “$var”, the token type will be T_VAR and we will enter the block at [3]:

[3]  case T_VAR: {
         int vtok;
         struct vcdsymbol *v = NULL;

         var_prevch = 0;
         ...
[4]      vtok = get_vartoken(1);

A new token is read using get_vartoken() [4] and saved into vtok.
This function, similarly to get_token(), extracts a token from the file, delimited by any of “ ”, “\t”, “\n”, or “\r”.
If match_kw (the first argument to the function) is 1, then the token is matched against the vartypes array and the return value is set to the matching index inside it:

static char *vartypes[] = {"event", "parameter",
                           "integer", "real", "real_parameter", "realtime", "string", "reg", "supply0",
                           "supply1", "time", "tri", "triand", "trior",
                           "trireg", "tri0", "tri1", "wand", "wire", "wor", "port", "in", "out", "inout",
                           "$end", "", "", "", ""};

Continuing with the T_VAR case, the token must be “port” or one of the entries that precede it in vartypes [5]:

         ...
[5]      if (vtok > V_PORT) goto bail;

[6]      v = (struct vcdsymbol *)calloc_2(1, sizeof(struct vcdsymbol));
         v->vartype = vtok;
         v->msi = v->lsi = vcd_explicit_zero_subscripts; /* indicate [un]subscripted status */

         if (vtok == V_PORT) {
             ...
         } else /* regular vcd var, not an evcd port var */
         {
[7]          vtok = get_vartoken(1);
             if (vtok == V_END) goto err;
[7]          v->size = atoi_64(yytext);
[8]          vtok = get_strtoken();
             if (vtok == V_END) goto err;
             v->id = (char *)malloc_2(yylen + 1);
[8]          strcpy(v->id, yytext);
             v->nid = vcdid_hash(yytext, yylen);

             if (v->nid < vcd_minid) vcd_minid = v->nid;
             if (v->nid > vcd_maxid) vcd_maxid = v->nid;

[9]          vtok = get_vartoken(0);
             if (vtok != V_STRING) goto err;
             if (slisthier_len) {
                ...
             } else {
                 v->name = (char *)malloc_2(yylen + 1);
[9]              strcpy(v->name, yytext);
             }

[10]         vtok = get_vartoken(1);
             if (vtok == V_END) goto dumpv;
             ...
         }

A series of tokens is then extracted to populate the vcdsymbol pointed to by v [6], in this order:

[7] v->size (integer) is the size of this symbol
[8] v->id is a string representing the symbol ID
[9] v->name is a string representing the name of this symbol
[10] the declaration must end with the string “$end”
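
As a rough illustration of the field order listed above, the sketch below parses the simplest form of a declaration with sscanf. It is not how GTKWave tokenizes the file: struct var_decl, parse_var_decl and the buffer sizes are hypothetical, and the optional parts elided in the excerpt (ports, subscripts) are not handled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields of one "$var" declaration. */
struct var_decl {
    char type[16];
    int size;
    char id[64];
    char name[64];
};

/* Parses "$var <type> <size> <id> <name> $end".
 * Returns 1 on success and 0 if a field or the closing "$end" is missing. */
static int parse_var_decl(const char *line, struct var_decl *out)
{
    int consumed = -1;
    int fields;

    assert(line != NULL && out != NULL);
    fields = sscanf(line, "$var %15s %d %63s %63s $end%n",
                    out->type, &out->size, out->id, out->name, &consumed);
    /* %n is only stored when the literal "$end" was matched. */
    return fields == 4 && consumed >= 0;
}
```

For the line “$var reg 2 x x $end” this yields type “reg”, size 2, id “x” and name “x”.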

The code then continues by populating v->value and v->narray:

     dumpv:
         ...

         /* initial conditions */
[11]     v->value = (char *)malloc_2(v->size + 1);
         v->value[v->size] = 0;
         v->narray = (struct Node **)calloc_2(v->size, sizeof(struct Node *));
         {
             int i;
             for (i = 0; i < v->size; i++) {
[12]             v->value[i] = 'x';

                 v->narray[i] = (struct Node *)calloc_2(1, sizeof(struct Node));
                 v->narray[i]->head.time = -1;
                 v->narray[i]->head.v.val = 1;
             }
         }

         ...

[13]    if (!vcdsymroot) {
            vcdsymroot = vcdsymcurr = v;
        } else {
            vcdsymcurr->next = v;
            vcdsymcurr = v;
        }
        numsyms++;

         ...

     bail:
         if (vtok != V_END) sync_end(NULL);
         break;
     }

v->value is allocated with a size of v->size + 1 [11], then it is initialized with “x” characters [12] and null-terminated.
Finally, the new v symbol is added to the symbols list [13] and numsyms is incremented. Here the case block ends and we go back to the loop at [1].
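
The list bookkeeping at [13] can be modeled in isolation as below. The names struct sym, struct sym_list and sym_list_append are hypothetical stand-ins for vcdsymbol, the vcdsymroot/vcdsymcurr globals and the inline code; the point is only that the counter grows by one per declaration, regardless of whether the ID was seen before.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct vcdsymbol: only the list link is modeled. */
struct sym {
    struct sym *next;
};

/* Stand-in for the vcdsymroot / vcdsymcurr / numsyms globals. */
struct sym_list {
    struct sym *root;
    struct sym *curr;
    int count;
};

/* Appends s at the tail and bumps the counter, mirroring the logic at [13]. */
static void sym_list_append(struct sym_list *list, struct sym *s)
{
    assert(list != NULL && s != NULL);
    s->next = NULL;
    if (!list->root) {
        list->root = list->curr = s;
    } else {
        list->curr->next = s;
        list->curr = s;
    }
    list->count++;
}
```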

By defining variables using $var, we can create an arbitrary number of vcd symbols (as many as can fit in memory), and numsyms is incremented accordingly, without any upper limit.

After all variables are declared, “$enddefinitions” is read, which lets us enter the switch block at [14]:

[14] case T_ENDDEFINITIONS:
         if (!header_over) {
[15]         header_over = 1; /* do symbol table management here */
[16]         create_sorted_table();
             if ((!sorted) && (!indexed)) {
                 fprintf(stderr, "No symbols in VCD file..nothing to do!\n");
                 exit(1);
             }

             if (linear) lt_set_no_interlace(lt);
         }
         break;

At [15], the variable header_over is set to 1. This is important, because this variable must be set to 1 in order to call the parse_valuechange() function in the next step.
As the variable definition has completed, the function create_sorted_table() is called to store all vcd symbols in an index for faster access [16].

     static void create_sorted_table(void) {
         struct vcdsymbol *v;
         struct vcdsymbol **pnt;
         unsigned int vcd_distance;
         struct vcdsymbol *root_v;
         int i;

         if (numsyms) {
[17]         vcd_distance = vcd_maxid - vcd_minid + 1;

[18]         if (vcd_distance <= 8 * 1024 * 1024) {
[19]             indexed = (struct vcdsymbol **)calloc_2(vcd_distance, sizeof(struct vcdsymbol *));

                 printf("%d symbols span ID range of %d, using indexing...\n", numsyms, vcd_distance);

                 v = vcdsymroot;
                 while (v) {
                     if (!(root_v = indexed[v->nid - vcd_minid])) {
                         indexed[v->nid - vcd_minid] = v;
                     }
                     alias_vs_normal_symadd(v, root_v);

                     v = v->next;
                 }
[20]         } else {
                 pnt = sorted = (struct vcdsymbol **)calloc_2(numsyms, sizeof(struct vcdsymbol *));
                 v = vcdsymroot;
                 while (v) {
                     *(pnt++) = v;
                     v = v->next;
                 }

                 qsort(sorted, numsyms, sizeof(struct vcdsymbol *), vcdsymcompare);

                 ...
         }
     }

At [17], the span between vcd_maxid and vcd_minid is calculated. If two variables with IDs “A” and “AAAAA” were declared, the distance would be very large: 0x9b3890cb - 0x21 + 1. If the distance is at most 8 * 1024 * 1024 (about 8 million) [18], create_sorted_table creates a simple direct-index table in indexed [19], so a symbol can be retrieved directly after computing the vcdid_hash of its ID. The size of this table is at most 8 * 1024 * 1024 * sizeof(void *), so 32 MB for 32-bit code and 64 MB for 64-bit code.
If instead the distance is too large [20], a sorted array is created in sorted.
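
The numbers quoted above can be reproduced with a small model. The advisory does not show vcdid_hash, so id_hash below is an assumption: it treats each character as a base-94 digit (character code minus 32), which yields exactly the two hash values mentioned for “A” and “AAAAA”. uses_index_table and max_index_table_bytes are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model of vcdid_hash: base-94 digits (c - 32), with the last
 * character of the ID as the most significant digit. */
static unsigned int id_hash(const char *s, int len)
{
    unsigned int val = 0;

    assert(s != NULL && len >= 0);
    for (int i = len - 1; i >= 0; i--)
        val = val * 94u + ((unsigned char)s[i] - 32u);
    return val;
}

enum { INDEX_LIMIT = 8 * 1024 * 1024 };

/* Nonzero when the ID span is small enough for the direct-index table [18]. */
static int uses_index_table(unsigned int min_id, unsigned int max_id)
{
    unsigned int distance = max_id - min_id + 1u;
    return distance <= (unsigned int)INDEX_LIMIT;
}

/* Upper bound on the size of the indexed table for a given pointer size. */
static size_t max_index_table_bytes(size_t pointer_size)
{
    return (size_t)INDEX_LIMIT * pointer_size;
}
```

With these definitions, id_hash("A", 1) is 0x21 and id_hash("AAAAA", 5) is 0x9b3890cb, so a file declaring both IDs takes the sorted-array path, while a file whose IDs all hash to the same value takes the indexed path.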

At this point, we go back to the loop at [1].

The next token is read. If it’s a string, we enter the switch block at [21].

     case T_STRING:
[21]     if (header_over) {
             /* catchall for events when header over */
[22]         if (yytext[0] == '#') {
                ...
             } else {
[23]             parse_valuechange();
             }
         }
         break;

If header_over is 1 [21] and the string does not start with “#” [22], the function parse_valuechange() is called.

     static void parse_valuechange(void) {
         struct vcdsymbol *v;
         char *vector;
         int vlen;

         switch (yytext[0]) {
             ...
[24]         case 'p':
                 /* extract port dump value.. */
[25]             vector = malloc_2(yylen_cache = yylen);
                 strcpy(vector, yytext + 1);
                 vlen = yylen - 1;

[26]             get_strtoken(); /* throw away 0_strength_component */
                 get_strtoken(); /* throw away 0_strength_component */
                 get_strtoken(); /* this is the id                  */
[27]             v = bsearch_vcd(yytext, yylen);
                 if (!v) {
                     fprintf(stderr, "Near line %d, Unknown identifier: '%s'\n", vcdlineno, yytext);
                     free_2(vector);
                 } else {
[28]                 if (vlen < v->size) /* fill in left part */
                     {
                         char extend;
                         int i, fill;

                         extend = '0';

                         fill = v->size - vlen;
[29]                     for (i = 0; i < fill; i++) {
[30]                         v->value[i] = extend;
                         }
                         evcd_strcpy(v->value + fill, vector);
                     }
                     ...
     }

If the token starts with the letter “p”, we enter the block at [24].
The array vector is allocated to store the string after “p” [25].
Two tokens are then extracted and discarded [26]. Finally, the id token is extracted and looked up in the vcdsymbol list [27] using bsearch_vcd. If it is found, v points to the resulting vcdsymbol.
If the symbol has been found, we reach [28]. Here, if vlen is smaller than v->size, the string in vector is too short to fill the whole v->value buffer, which is therefore padded on the left with “0” characters. Note that the loop that does the padding is controlled by the symbol’s size v->size.

It’s important to keep in mind that if the vcdsymbol returned by bsearch_vcd [27] is attacker-controlled, writing to arbitrary memory becomes straightforward: both v->size and v->value are fields of that vcdsymbol, so the loop at [29] writes a controlled number of “0” characters through a controlled pointer [30].
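
To make the padding step concrete, here is a minimal model of the branch at [28] operating on a legitimate buffer. pad_left_with_zeros is a hypothetical name, and memcpy stands in for evcd_strcpy, whose body is not shown in the advisory. The number of bytes written by the loop is size - vlen, and both the destination and the size come from the symbol.

```c
#include <assert.h>
#include <string.h>

/* Left-pads vector with '0' into value, which must hold size + 1 bytes.
 * Does nothing when vector already has at least size characters. */
static void pad_left_with_zeros(char *value, int size, const char *vector)
{
    int vlen;

    assert(value != NULL && vector != NULL && size >= 0);
    vlen = (int)strlen(vector);
    if (vlen < size) {
        int fill = size - vlen;
        for (int i = 0; i < fill; i++)
            value[i] = '0';
        /* Copy the remaining characters plus the terminating NUL. */
        memcpy(value + fill, vector, (size_t)vlen + 1);
    }
}
```

For a symbol of size 8 and the vector “101”, the buffer ends up holding “00000101”. With an empty vector and size 2, two “0” characters are written.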

Indeed, bsearch_vcd presents an issue that allows for an arbitrary vcdsymbol to be returned:

     static struct vcdsymbol *bsearch_vcd(char *key, int len) {
         struct vcdsymbol **v;
         struct vcdsymbol *t;

[31]     if (indexed) {
[32]         unsigned int hsh = vcdid_hash(key, len);
[33]         if ((hsh >= vcd_minid) && (hsh <= vcd_maxid)) {
                 return (indexed[hsh - vcd_minid]);
             }
         }

[34]     v = (struct vcdsymbol **)bsearch(key, sorted, numsyms,
                                          sizeof(struct vcdsymbol *), vcdsymbsearchcompare);

         if (v) {
     #ifndef VCD_BSEARCH_IS_PERFECT
             for (;;) {
                 t = *v;

                 if ((v == sorted) || (strcmp((*(--v))->id, key))) {
                     return (t);
                 }
             }
     #else
             return (*v);
     #endif
         } else {
             return (NULL);
         }
     }

Let’s assume that we have a .vcd file with 0x20000 (131072) “$var” declarations, all with the same id “x”, and one port dump line:

$var reg 2 x x $end
... 0x20000 times ...
$var reg 2 x x $end

$enddefinitions
p a a A

In this case vcd_minid and vcd_maxid are equal because there is only one hash (the one for the ID “x”). When “$enddefinitions” is parsed, create_sorted_table() fills the indexed table (the ID span is 1), and sorted is left at 0 (its initialization value).

The 0x20000 “$var” declarations increment the numsyms value to 0x20000. Then, the p line is parsed, leading to the call bsearch_vcd("A", 1) to look the symbol up.
At [31], indexed is indeed not 0, so the hash for “A” is calculated [32]. However, it won’t be within vcd_minid and vcd_maxid [33].

The function therefore does not return early and falls through to [34], calling bsearch on the sorted array. The code is simply missing a null check on sorted.
bsearch is called with sorted set to 0, and numsyms set to 0x20000.
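
A guard of the kind described as missing could look like the sketch below. This is one possible check on a simplified model, not necessarily the change shipped by the vendor. struct sym, compare_key_to_sym and lookup_sorted are hypothetical names standing in for vcdsymbol, vcdsymbsearchcompare and the tail of bsearch_vcd.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for struct vcdsymbol: only the ID is modeled. */
struct sym {
    const char *id;
};

/* Compares a string key against an element of an array of symbol pointers. */
static int compare_key_to_sym(const void *key, const void *elem)
{
    const struct sym *s = *(const struct sym *const *)elem;
    return strcmp((const char *)key, s->id);
}

/* Looks key up in a sorted pointer array, refusing to search when the
 * array was never built, whatever the element count claims. */
static struct sym *lookup_sorted(const char *key, struct sym **sorted, size_t count)
{
    struct sym **hit;

    assert(key != NULL);
    if (sorted == NULL || count == 0)
        return NULL;
    hit = bsearch(key, sorted, count, sizeof(*sorted), compare_key_to_sym);
    return hit ? *hit : NULL;
}
```

With this check, the situation from the example (a null array and a count of 0x20000) yields “not found” instead of a binary search over unmapped or attacker-influenced memory.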

Let’s see glibc’s bsearch implementation:

     __extern_inline void *
     bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size, __compar_fn_t __compar)
     {
       size_t __l, __u, __idx;
       const void *__p;
       int __comparison;

       __l = 0;
       __u = __nmemb;
       while (__l < __u)
         {
[35]       __idx = (__l + __u) / 2;
[36]       __p = (void *) (((const char *) __base) + (__idx * __size));
[37]       __comparison = (*__compar) (__key, __p);
           if (__comparison < 0)
         __u = __idx;
           else if (__comparison > 0)
         __l = __idx + 1;
           else
         return (void *) __p;
         }

       return NULL;
     }

Because bsearch performs a binary search, it assumes that sorted points to a buffer of numsyms * sizeof(struct vcdsymbol *) bytes. In our example, assuming 32-bit mode, that is 0x80000 bytes. At [35], the middle index is calculated and __p (the element to compare) is derived from sorted [36]; since sorted is 0, the element accessed is at address 0x40000.
Finally it calls the __compar function vcdsymbsearchcompare:

     static int vcdsymbsearchcompare(const void *s1, const void *s2) {
         char *v1;
         struct vcdsymbol *v2;

         v1 = (char *)s1;
[38]     v2 = *((struct vcdsymbol **)s2);

         return (strcmp(v1, v2->id));
     }

At [38] the address 0x40000 is dereferenced, looking for a vcdsymbol, which is then returned if v2->id matches “A”. If an attacker controls the content of address 0x40000 (or any other, depending on the value set by numsyms, which is controllable by the number of “$var” definitions), they can return a vcdsymbol structure with arbitrary contents, leading to the arbitrary write described earlier [30]. This, in turn, could lead to arbitrary code execution. As this is a parser with few limitations on sizes and number of elements, controlling the content of arbitrary addresses, especially in 32-bit mode, may be achieved by carefully manipulating heap allocations.
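
The relationship between the number of declarations and the first address probed can be written down directly from the glibc excerpt ([35] and [36]). first_probe_offset is a hypothetical helper name; with a null base pointer, the returned offset is the probed address itself. Later probes depend on the results of the comparisons and are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset, relative to the base pointer, of the first element examined
 * by a midpoint binary search over nmemb elements of elem_size bytes. */
static size_t first_probe_offset(size_t nmemb, size_t elem_size)
{
    assert(elem_size > 0);
    /* First iteration: __l = 0 and __u = nmemb, so __idx = nmemb / 2. */
    return (nmemb / 2) * elem_size;
}
```

For 0x20000 symbols and 4-byte pointers this gives 0x40000, matching the example. The same count with 8-byte pointers gives 0x80000.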

As mentioned before, this issue affects 3 different source files, listed separately below.

CVE-2023-37921 - vcd2vzt

The vcd2vzt conversion utility does not check if the sorted buffer is initialized before calling bsearch at line src/helpers/vcd2vzt.c:206, leading to an arbitrary read which could be turned into an arbitrary write.

CVE-2023-37922 - vcd2lxt2

The vcd2lxt2 conversion utility does not check if the sorted buffer is initialized before calling bsearch at line src/helpers/vcd2lxt2.c:204, leading to an arbitrary read which could be turned into an arbitrary write.

CVE-2023-37923 - vcd2lxt

The vcd2lxt conversion utility does not check if the sorted buffer is initialized before calling bsearch at line src/helpers/vcd2lxt.c:198, leading to an arbitrary read which could be turned into an arbitrary write.

Crash Information

AddressSanitizer:DEADLYSIGNAL
=================================================================
==273333==ERROR: AddressSanitizer: SEGV on unknown address 0x00040100 (pc 0x565585e4 bp 0xffffd5b8 sp 0xffffd590 T0)
==273333==The signal is caused by a READ memory access.
    #0 0x565585e4 in vcdsymbsearchcompare src/helpers/vcd2lxt.c:175
    #1 0xf765eab4 in __GI_bsearch ../bits/stdlib-bsearch.h:33
    #2 0xf79dd4ab in __interceptor_bsearch ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:10155
    #3 0xf79dd4ab in __interceptor_bsearch ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:10150
    #4 0x56558703 in bsearch_vcd src/helpers/vcd2lxt.c:198
    #5 0x5655c4ca in parse_valuechange src/helpers/vcd2lxt.c:881
    #6 0x5655f9de in vcd_parse src/helpers/vcd2lxt.c:1417
    #7 0x56561640 in vcd_main src/helpers/vcd2lxt.c:1704
    #8 0x56562dad in main src/helpers/vcd2lxt.c:1959
    #9 0xf7647294 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #10 0xf7647357 in __libc_start_main_impl ../csu/libc-start.c:381
    #11 0x565583f6 in _start (vcd2lxt+0x33f6)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/helpers/vcd2lxt.c:175 in vcdsymbsearchcompare

VENDOR RESPONSE

Fixed in version 3.3.118, available from https://sourceforge.net/projects/gtkwave/files/gtkwave-3.3.118/

TIMELINE

2023-08-01 - Vendor Disclosure
2023-12-31 - Vendor Patch Release
2024-01-08 - Public Release

Credit

Discovered by Claudio Bozzato of Cisco Talos.
