Talos Vulnerability Report

TALOS-2023-1805

GTKWave VCD var definition section out-of-bounds read vulnerabilities

January 8, 2024
CVE Number

CVE-2023-37447, CVE-2023-37446, CVE-2023-37445, CVE-2023-37444, CVE-2023-37442, CVE-2023-37443

SUMMARY

Multiple out-of-bounds read vulnerabilities exist in the VCD var definition section functionality of GTKWave 3.3.115. A specially crafted .vcd file can lead to arbitrary code execution. A victim would need to open a malicious file to trigger these vulnerabilities.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

GTKWave 3.3.115

PRODUCT URLS

GTKWave - https://gtkwave.sourceforge.net

CVSSv3 SCORE

7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-119 - Improper Restriction of Operations within the Bounds of a Memory Buffer

DETAILS

GTKWave is a wave viewer, often used to analyze FPGA simulations and logic analyzer captures. It includes a GUI to view and analyze traces, as well as to convert between several file formats (.lxt, .lxt2, .vzt, .fst, .ghw, .vcd, .evcd), either through the UI or through its command line tools. GTKWave is available for Linux, Windows and macOS. Trace files can be shared within teams or organizations, for example to compare results of simulation runs across different design implementations, to analyze protocols captured with logic analyzers or just as a reference when porting design implementations.

GTKWave sets up mime types for its supported extensions.

VCD (Value Change Dump) files are parsed by the vcd_parse function. This function is duplicated in several conversion utilities (vcd2lxt, vcd2lxt2, vcd2vzt) and in the GUI portion of GTKWave. In general the various implementations are very similar or identical, and in this case they are all affected by the issue described in this advisory.
Let’s describe the execution flow for the vcd2lxt utility; the other implementations have very similar behavior.

The function vcd_parse loops over the tokens in the file [1]. Depending on which token has been read [2], a different case of the switch is executed:

     static void vcd_parse(int linear) {
         int tok;

[1]      for (;;) {
[2]          switch (get_token()) {
                 ...

The get_token() function simply extracts a token from the file at the current cursor position, saving it to the global yytext buffer and assigning the token’s length to the global yylen.
Moreover, if the token starts with “$”, the token is considered a special symbol, and the text after the “$” has to match one of these tokens:

char *tokens[] = {"var", "end", "scope", "upscope",
                  "comment", "date", "dumpall", "dumpoff", "dumpon",
                  "dumpvars", "enddefinitions",
                  "dumpports", "dumpportsoff", "dumpportson", "dumpportsall",
                  "timescale", "version", "vcdclose", "timezero",
                  "", "", ""};

The return value of get_token is a token type, which is an index inside the tokens array above.
If the token does not start with “$”, the token is considered a string and the returned token type will be T_STRING.
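
The classification just described can be sketched as follows. This is a minimal illustration and not GTKWave's get_token(): the function name classify_token and the TOK_STRING / TOK_UNKNOWN values are hypothetical stand-ins, and only the keyword table is taken from the listing above (without its trailing empty entries).

```c
#include <stddef.h>
#include <string.h>

/* Keyword table copied from the advisory (trailing "" entries omitted). */
static const char *const tokens[] = {
    "var", "end", "scope", "upscope",
    "comment", "date", "dumpall", "dumpoff", "dumpon",
    "dumpvars", "enddefinitions",
    "dumpports", "dumpportsoff", "dumpportson", "dumpportsall",
    "timescale", "version", "vcdclose", "timezero"
};
#define NUM_TOKENS (sizeof(tokens) / sizeof(tokens[0]))

/* Hypothetical values; the real parser uses its own T_* constants. */
enum { TOK_UNKNOWN = -2, TOK_STRING = -1 };

/* Classify one whitespace-delimited token:
 *   "$keyword"    -> index of keyword inside tokens[]
 *   "$other"      -> TOK_UNKNOWN
 *   anything else -> TOK_STRING */
int classify_token(const char *tok)
{
    size_t i;

    if (tok[0] != '$')
        return TOK_STRING;

    for (i = 0; i < NUM_TOKENS; i++) {
        if (strcmp(tok + 1, tokens[i]) == 0)
            return (int)i;
    }
    return TOK_UNKNOWN;
}
```

With this sketch, classify_token("$var") yields index 0 and classify_token("$enddefinitions") yields index 10, while a value-change token such as "p" is classified as a plain string.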

Going back to the switch above, if the parsed token is “$var”, the token type will be T_VAR and we will enter the block at [3]:

[3]  case T_VAR: {
         int vtok;
         struct vcdsymbol *v = NULL;

         var_prevch = 0;
         ...
[4]      vtok = get_vartoken(1);

A new token is read using get_vartoken() [4] and saved into vtok.
This function, similarly to get_token(), extracts a token from the file, separated by any of “ ”, “\t”, “\n”, or “\r”.
If match_kw (the first argument to the function) is 1, then the token is matched against the vartypes array, and the return value is set to the matching index inside it:

static char *vartypes[] = {"event", "parameter",
                           "integer", "real", "real_parameter", "realtime", "string", "reg", "supply0",
                           "supply1", "time", "tri", "triand", "trior",
                           "trireg", "tri0", "tri1", "wand", "wire", "wor", "port", "in", "out", "inout",
                           "$end", "", "", "", ""};

Continuing with the T_VAR case, the token needs to be “port” or any of the other symbols that precede it in vartypes [5]:

         ...
[5]      if (vtok > V_PORT) goto bail;

[6]      v = (struct vcdsymbol *)calloc_2(1, sizeof(struct vcdsymbol));
         v->vartype = vtok;
         v->msi = v->lsi = vcd_explicit_zero_subscripts; /* indicate [un]subscripted status */

         if (vtok == V_PORT) {
             ...
         } else /* regular vcd var, not an evcd port var */
         {
[7]          vtok = get_vartoken(1);
             if (vtok == V_END) goto err;
[7]          v->size = atoi_64(yytext);
[8]          vtok = get_strtoken();
             if (vtok == V_END) goto err;
             v->id = (char *)malloc_2(yylen + 1);
[8]          strcpy(v->id, yytext);
[11]         v->nid = vcdid_hash(yytext, yylen);

[12]         if (v->nid < vcd_minid) vcd_minid = v->nid;
             if (v->nid > vcd_maxid) vcd_maxid = v->nid;

[9]          vtok = get_vartoken(0);
             if (vtok != V_STRING) goto err;
             if (slisthier_len) {
                ...
             } else {
                 v->name = (char *)malloc_2(yylen + 1);
[9]              strcpy(v->name, yytext);
             }

[10]         vtok = get_vartoken(1);
             if (vtok == V_END) goto dumpv;
             ...
         }

A series of tokens is then extracted to populate the vcdsymbol pointed to by v [6], in this order:

  • [7] v->size (integer) is the size of this symbol
  • [8] v->id is a string representing the symbol ID
  • [9] v->name is a string representing the name of this symbol
  • [10] the declaration must end with the string “$end”

Most importantly, a hash of v->id is computed [11], and the vcd_minid and vcd_maxid global variables are updated [12] to hold, respectively, the minimum and maximum hash values encountered so far.

It is interesting to note how the hash is computed in the vcdid_hash function:

static unsigned int vcdid_hash(char *s, int len) {
    unsigned int val = 0;
    int i;

    s += (len - 1);

    for (i = 0; i < len; i++) {
        val *= 94;
        val += (((unsigned char)*s) - 32);
        s--;
    }

    return (val);
}

As we can see, the input symbol s is walked from its last character to its first; in each iteration val is multiplied by 94 and the current character (minus 32) is added to it. This means that short symbols will have small hashes, while longer symbols can produce hashes that span the whole unsigned integer space. Let’s see two simple examples:

vcdid_hash("A", 1) returns 0x21
vcdid_hash("AAAAA", 5) returns 0x9b3890cb
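
To check these values independently, the hash can be re-implemented in a few lines. The function below is a sketch that follows the listing above; the name vcdid_hash_demo is ours, and a 32-bit unsigned int is assumed, as on common platforms.

```c
/* Re-implementation of the hash shown above, for experimentation only.
 * The identifier is read as a base-94 number whose least significant
 * digit is the first character; each digit is (character - 32).
 * Arithmetic wraps around at the width of unsigned int. */
unsigned int vcdid_hash_demo(const char *s, int len)
{
    unsigned int val = 0;
    int i;

    /* Walk from the last character to the first, as the original does. */
    for (i = len - 1; i >= 0; i--)
        val = val * 94 + ((unsigned char)s[i] - 32);

    return val;
}
```

For the identifiers used later in this advisory, this sketch gives 0xc3f for "AA" and 0x401a052d for "symid" (the latter after wrapping around 32 bits).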

Going back to the variable definition, the code continues by adding the new v symbol to the symbols list [13], and numsyms is incremented:

     dumpv:
         ...

[13]    if (!vcdsymroot) {
            vcdsymroot = vcdsymcurr = v;
        } else {
            vcdsymcurr->next = v;
            vcdsymcurr = v;
        }
        numsyms++;

         ...

     bail:
         if (vtok != V_END) sync_end(NULL);
         break;
     }

The next token is read. If it matches “$enddefinitions” the switch block at [14] is entered.

[14] case T_ENDDEFINITIONS:
         if (!header_over) {
[15]         header_over = 1; /* do symbol table management here */
[16]         create_sorted_table();
             if ((!sorted) && (!indexed)) {
                 fprintf(stderr, "No symbols in VCD file..nothing to do!\n");
                 exit(1);
             }

             if (linear) lt_set_no_interlace(lt);
         }
         break;

At [15], the variable header_over is set to 1. This is important, as it declares the end of the variable definitions and allows the definition of the next sections. As the variable definition has completed, the function create_sorted_table() is called to store all VCD symbols in an index for faster access [16].

     static void create_sorted_table(void) {
         struct vcdsymbol *v;
         struct vcdsymbol **pnt;
         unsigned int vcd_distance;
         struct vcdsymbol *root_v;
         int i;

         if (numsyms) {
[17]         vcd_distance = vcd_maxid - vcd_minid + 1;

[18]         if (vcd_distance <= 8 * 1024 * 1024) {
[19]             indexed = (struct vcdsymbol **)calloc_2(vcd_distance, sizeof(struct vcdsymbol *));

                 printf("%d symbols span ID range of %d, using indexing...\n", numsyms, vcd_distance);

                 v = vcdsymroot;
                 while (v) {
                     if (!(root_v = indexed[v->nid - vcd_minid])) {
                         indexed[v->nid - vcd_minid] = v;
                     }
                     alias_vs_normal_symadd(v, root_v);

                     v = v->next;
                 }
[20]         } else {
                 pnt = sorted = (struct vcdsymbol **)calloc_2(numsyms, sizeof(struct vcdsymbol *));
                 v = vcdsymroot;
                 while (v) {
                     *(pnt++) = v;
                     v = v->next;
                 }

                 qsort(sorted, numsyms, sizeof(struct vcdsymbol *), vcdsymcompare);

                 ...
         }
     }

At [17], the distance between vcd_maxid and vcd_minid is calculated. Considering the example above, if two variables with symbols “A” and “AAAAA” are declared, their distance will be very large: 0x9b3890cb - 0x21 + 1. If the distance is at most 8 * 1024 * 1024 (about 8.4 million) [18], create_sorted_table will create a simple hash table in indexed [19], so that symbols can be retrieved directly after computing the vcdid_hash of the symbol to look up. The size of this table is at most 8 * 1024 * 1024 * sizeof(void *) bytes, so 32 MB for 32-bit code and 64 MB for 64-bit code.
If instead the distance is larger than that [20], a sorted array is created in sorted.
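
The decision at [18] and the resulting table size can be sketched as below. The names INDEX_LIMIT, uses_direct_index and max_index_bytes are hypothetical; only the arithmetic comes from create_sorted_table.

```c
#include <stddef.h>

/* Threshold used at [18]: 8 * 1024 * 1024 distinct hash values. */
#define INDEX_LIMIT (8u * 1024u * 1024u)

/* Returns 1 when the [minid, maxid] span is small enough for the
 * direct-index table, 0 when the sorted array would be used instead. */
int uses_direct_index(unsigned int minid, unsigned int maxid)
{
    unsigned int distance = maxid - minid + 1; /* same formula as [17] */
    return distance <= INDEX_LIMIT;
}

/* Worst-case size, in bytes, of the direct-index table. */
size_t max_index_bytes(void)
{
    return (size_t)INDEX_LIMIT * sizeof(void *);
}
```

A file declaring only “symid” has a distance of 1 and is indexed directly, whereas declaring both “A” and “AAAAA” gives a distance of 0x9b3890ab and falls back to the sorted array.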

At this point, we go back to the loop at [1].

As “$enddefinitions” has been issued and the index has been created, it should not be possible to create a new variable definition. However, there’s nothing preventing it. This is the core of the issue described in this advisory.
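
To make the missing check concrete, here is a minimal sketch of a parser state that refuses “$var” once the header is over. It is not the vendor's patch; the struct, the function names and the return codes are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical return codes for the sketch. */
enum { PARSE_OK = 0, PARSE_MALFORMED = -1 };

/* Hypothetical reduced parser state: only the flag set at [15]. */
struct header_state {
    int header_over;
};

/* Called when "$enddefinitions" is seen: the symbol index is built here,
 * so no further definitions may change the ID range afterwards. */
void on_enddefinitions(struct header_state *st)
{
    st->header_over = 1;
}

/* Called when "$var" is seen: reject it if the header has been closed. */
int on_var(const struct header_state *st)
{
    if (st->header_over) {
        fprintf(stderr, "$var after $enddefinitions: malformed VCD\n");
        return PARSE_MALFORMED;
    }
    return PARSE_OK;
}
```

In this sketch a caller would stop parsing, or skip the definition, whenever on_var returns PARSE_MALFORMED, so that vcd_minid and vcd_maxid stay fixed once the index exists.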

If another variable is defined, we end up again in the T_VAR case (reporting the code from above for reference):

[3]  case T_VAR: {
         int vtok;
         struct vcdsymbol *v = NULL;

         var_prevch = 0;
         ...
[4]      vtok = get_vartoken(1);

...

[11]         v->nid = vcdid_hash(yytext, yylen);

[12]         if (v->nid < vcd_minid) vcd_minid = v->nid;
             if (v->nid > vcd_maxid) vcd_maxid = v->nid;

Here the code may set new vcd_minid and vcd_maxid values, depending on the hash value of the new variable. If vcd_minid changes, the hash lookups will be affected, as we will see shortly.

Let’s assume we go back to the loop at [1] and we encounter a port dump definition. This would hit the T_STRING case:

     case T_STRING:
[21]     if (header_over) {
             /* catchall for events when header over */
[22]         if (yytext[0] == '#') {
                ...
             } else {
[23]             parse_valuechange();
             }
         }
         break;

If header_over is 1 [21] and the string does not start with “#” [22], the function parse_valuechange() is called [23].

     static void parse_valuechange(void) {
         struct vcdsymbol *v;
         char *vector;
         int vlen;

         switch (yytext[0]) {
             ...
[24]         case 'p':
                 /* extract port dump value.. */
                 vector = malloc_2(yylen_cache = yylen);
                 strcpy(vector, yytext + 1);
                 vlen = yylen - 1;

[25]             get_strtoken(); /* throw away 0_strength_component */
                 get_strtoken(); /* throw away 0_strength_component */
                 get_strtoken(); /* this is the id                  */
[26]             v = bsearch_vcd(yytext, yylen);
                 if (!v) {
                     fprintf(stderr, "Near line %d, Unknown identifier: '%s'\n", vcdlineno, yytext);
                     free_2(vector);
                 } else {
[27]                 if (vlen < v->size) /* fill in left part */
                     {
                         char extend;
                         int i, fill;

                         extend = '0';

                         fill = v->size - vlen;
                         for (i = 0; i < fill; i++) {
                             v->value[i] = extend;
                         }
                         evcd_strcpy(v->value + fill, vector);
                     }
                     ...

If the token starts with the letter “p”, we enter the block at [24].
Two tokens are extracted and discarded [25]. Finally, the id token is extracted and looked up in the list of vcdsymbols [26] using bsearch_vcd.

     static struct vcdsymbol *bsearch_vcd(char *key, int len) {
         struct vcdsymbol **v;
         struct vcdsymbol *t;

         if (indexed) {
[28]         unsigned int hsh = vcdid_hash(key, len);
[29]         if ((hsh >= vcd_minid) && (hsh <= vcd_maxid)) {
[30]             return (indexed[hsh - vcd_minid]);
             }
         }

         v = (struct vcdsymbol **)bsearch(key, sorted, numsyms,
                                          sizeof(struct vcdsymbol *), vcdsymbsearchcompare);
         ...
     }

If the vcd symbols have been stored in the indexed hash table, a direct array access is performed to retrieve the symbol.
At [28] the hash is calculated, and if it’s within vcd_minid and vcd_maxid [29], the nth element from indexed is returned, calculated as hsh - vcd_minid [30].

As vcd_minid may have been modified after $enddefinitions, using a direct access this way may lead to an out-of-bounds read.
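
One defensive way to write such a lookup is to remember the base and length the table was built with, and to check the computed slot against them rather than against the live globals. The sketch below illustrates the idea under that assumption; struct sym, struct sym_index and index_lookup are hypothetical names and do not appear in GTKWave.

```c
#include <stddef.h>

/* Hypothetical reduced symbol type for the sketch. */
struct sym {
    const char *id;
};

/* Hypothetical snapshot taken when the index is created. */
struct sym_index {
    struct sym **slots; /* the direct-index array                   */
    unsigned int base;  /* minimum hash at the time it was built    */
    size_t len;         /* number of slots actually allocated       */
};

/* Returns the symbol stored for hsh, or NULL when hsh falls outside
 * the range the table was built for. */
struct sym *index_lookup(const struct sym_index *ix, unsigned int hsh)
{
    if (ix->slots == NULL || hsh < ix->base)
        return NULL;
    if ((size_t)(hsh - ix->base) >= ix->len)
        return NULL;
    return ix->slots[hsh - ix->base];
}
```

Because base and len are frozen together with the allocation, later changes to the minimum or maximum ID cannot move a lookup outside the array.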

For example, assume this input file:

$var wire 2 symid symname $end
$enddefinitions
$var wire 5 A anything $end
p AA

The first line will create a vcdsymbol and set both vcd_minid and vcd_maxid to 0x401a052d (the hash for “symid”).
Since vcd_maxid - vcd_minid + 1 is 1, the indexed hash table is created with a single entry, and the only variable defined (symname) is stored at position 0. The size of indexed will thus be 4 bytes in 32-bit code (or more, depending on the pointer size and the malloc implementation, but this does not matter in this case).
Another variable is defined, this time with an id of “A”. vcd_minid will be set to 0x21. When the last line p AA is read, bsearch_vcd("AA", 2) will be called. Inside bsearch_vcd, hsh will be set to 0xc3f, which is within vcd_minid (0x21) and vcd_maxid (0x401a052d). Finally, indexed[0xc3f - 0x21] is returned. As indexed only has one element, this will read out-of-bounds in the heap.
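
The arithmetic of this example can be replayed without touching any real heap memory. The sketch below only models the three values involved; struct index_state and the helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical snapshot of the parser globals involved in the lookup. */
struct index_state {
    unsigned int minid; /* current vcd_minid                          */
    unsigned int maxid; /* current vcd_maxid                          */
    size_t table_len;   /* slots allocated for indexed[] when built   */
};

/* Mirrors the range check at [29]. */
int passes_range_check(const struct index_state *st, unsigned int hsh)
{
    return hsh >= st->minid && hsh <= st->maxid;
}

/* Slot that the array access at [30] would read. */
size_t slot_for(const struct index_state *st, unsigned int hsh)
{
    return (size_t)(hsh - st->minid);
}

/* 1 when the lookup is accepted but the slot is past the allocation. */
int is_out_of_bounds(const struct index_state *st, unsigned int hsh)
{
    return passes_range_check(st, hsh) && slot_for(st, hsh) >= st->table_len;
}
```

Starting from minid = maxid = 0x401a052d with one slot, lowering minid to 0x21 makes the lookup of 0xc3f pass the range check and select slot 0xc1e, far beyond the single allocated entry. Note that even the legitimate ID “symid” now resolves to slot 0x401a050c.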

Even though this is a read operation, this can later be used to perform arbitrary writes. By controlling the heap, it is possible at this point to return a vcdsymbol that points to arbitrary contents. With this assumption, let’s see an example of how to turn this into an arbitrary write. When we return from bsearch_vcd we are back in parse_valuechange:

[26]             v = bsearch_vcd(yytext, yylen);
                 if (!v) {
                     fprintf(stderr, "Near line %d, Unknown identifier: '%s'\n", vcdlineno, yytext);
                     free_2(vector);
                 } else {
[27]                 if (vlen < v->size) /* fill in left part */
                     {
                         char extend;
                         int i, fill;

                         extend = '0';

[31]                     fill = v->size - vlen;
                         for (i = 0; i < fill; i++) {
[32]                         v->value[i] = extend;
                         }
                         evcd_strcpy(v->value + fill, vector);
                     }
                     ...

At [27] vlen is checked to be smaller than v->size, but as v->size is controlled, we can choose to enter this block. fill at [31] is also controlled, as we control v->size, which lets us control the number of loop iterations. Finally, as we control the v->value pointer, we can write the ‘0’ character anywhere in memory [32]. This allows, in turn, arbitrary code execution.
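
The effect of the left-fill logic can be observed safely on a local buffer. The sketch below reproduces only the loop from the block at [27]; struct fake_symbol and left_fill are hypothetical names, and the trailing evcd_strcpy call is left out.

```c
/* Hypothetical reduced stand-in for struct vcdsymbol: only the two
 * fields used by the fill loop. */
struct fake_symbol {
    int size;    /* declared width of the symbol        */
    char *value; /* destination buffer for the value    */
};

/* Same left-fill logic as the block at [27]. Returns the number of
 * '0' bytes written through v->value. */
int left_fill(struct fake_symbol *v, int vlen)
{
    int i, fill = 0;

    if (vlen < v->size) {
        fill = v->size - vlen;
        for (i = 0; i < fill; i++)
            v->value[i] = '0';
    }
    return fill;
}
```

With size set to 5 and a one-character vector, four bytes are overwritten. Nothing in the loop validates value or size, which is why a forged symbol record turns it into a write of an attacker-chosen length at an attacker-chosen address.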

As mentioned before, this issue affects both the GUI program and some conversion utilities, which lie in separate source files. For this reason, we are listing each issue separately below.

CVE-2023-37442 - VCD GUI recoder

The GUI’s recoder VCD parsing code (default parser) allows “$var” definitions to happen after “$enddefinitions”. This allows the out-of-bounds read described earlier to be exploited into an out-of-bounds write, leading to arbitrary code execution.

Note that vcd_recoder.c:1590 appears to contain the right check to defend against this issue, but the check is nullified by the && 0, so the if condition can never be true:

case T_VAR:
    if((GLOBALS->header_over_vcd_recoder_c_3)&&(0))
    {
    fprintf(stderr,"$VAR encountered after $ENDDEFINITIONS near byte %d.  VCD is malformed, exiting.\n",
        (int)(GLOBALS->vcdbyteno_vcd_recoder_c_3+(GLOBALS->vst_vcd_recoder_c_3-GLOBALS->vcdbuf_vcd_recoder_c_3)));
    vcd_exit(255);
    }

This issue does not need any special command-line switch to be triggered when starting GTKWave.

CVE-2023-37443 - VCD GUI legacy

The GUI’s legacy VCD parsing code allows “$var” definitions to happen after “$enddefinitions”. This allows the out-of-bounds read described earlier to be exploited into an out-of-bounds write, leading to arbitrary code execution.

Note that vcd.c:1248 appears to contain the right check to defend against this issue, but the check is nullified by the && 0, so the if condition can never be true:

case T_VAR:
    if((GLOBALS->header_over_vcd_c_1)&&(0))
    {
    fprintf(stderr,"$VAR encountered after $ENDDEFINITIONS near byte %d.  VCD is malformed, exiting.\n",
        (int)(GLOBALS->vcdbyteno_vcd_c_1+(GLOBALS->vst_vcd_c_1-GLOBALS->vcdbuf_vcd_c_1)));
    vcd_exit(255);
    }

This issue can be triggered by using the -L flag when starting GTKWave.

CVE-2023-37444 - VCD GUI interactive

The GUI’s interactive VCD parsing code allows “$var” definitions to happen after “$enddefinitions”. This allows the out-of-bounds read described earlier to be exploited into an out-of-bounds write, leading to arbitrary code execution.

Note that vcd_partial.c:1196 appears to contain the right check to defend against this issue, but the check is nullified by the && 0, so the if condition can never be true:

case T_VAR:
    if((GLOBALS->header_over_vcd_partial_c_2)&&(0))
    {
    fprintf(stderr,"$VAR encountered after $ENDDEFINITIONS near byte %d.  VCD is malformed, exiting.\n",
        (int)(GLOBALS->vcdbyteno_vcd_partial_c_2+(GLOBALS->vst_vcd_partial_c_2-GLOBALS->vcdbuf_vcd_partial_c_2)));
    exit(0);
    }

This issue can be triggered by using the -I flag when starting GTKWave.

CVE-2023-37445 - vcd2vzt

The VCD parsing code in the vcd2vzt conversion utility allows “$var” definitions to happen after “$enddefinitions”. This allows the out-of-bounds read described earlier to be exploited into an out-of-bounds write, leading to arbitrary code execution.

CVE-2023-37446 - vcd2lxt2

The VCD parsing code in the vcd2lxt2 conversion utility allows “$var” definitions to happen after “$enddefinitions”. This allows the out-of-bounds read described earlier to be exploited into an out-of-bounds write, leading to arbitrary code execution.

CVE-2023-37447 - vcd2lxt

The VCD parsing code in the vcd2lxt conversion utility allows “$var” definitions to happen after “$enddefinitions”. This allows the out-of-bounds read described earlier to be exploited into an out-of-bounds write, leading to arbitrary code execution.

VENDOR RESPONSE

Fixed in version 3.3.118, available from https://sourceforge.net/projects/gtkwave/files/gtkwave-3.3.118/

TIMELINE

2023-08-01 - Vendor Disclosure
2023-12-31 - Vendor Patch Release
2024-01-08 - Public Release

Credit

Discovered by Claudio Bozzato of Cisco Talos.