CVE-2023-37921, CVE-2023-37923, CVE-2023-37922
Multiple arbitrary write vulnerabilities exist in the VCD sorted bsearch functionality of GTKWave 3.3.115. A specially crafted .vcd file can lead to arbitrary code execution. A victim would need to open a malicious file to trigger these vulnerabilities.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
GTKWave 3.3.115
GTKWave - https://gtkwave.sourceforge.net
7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE-118 - Incorrect Access of Indexable Resource (‘Range Error’)
GTKWave is a wave viewer, often used to analyze FPGA simulations and logic analyzer captures. It includes a GUI to view and analyze traces, as well as convert between several file formats (.lxt, .lxt2, .vzt, .fst, .ghw, .vcd, .evcd) either by using the UI or its command line tools. GTKWave is available for Linux, Windows and macOS. Trace files can be shared within teams or organizations, for example to compare results of simulation runs across different design implementations, to analyze protocols captured with logic analyzers or just as a reference when porting design implementations.
GTKWave sets up mime types for its supported extensions.
VCD (Value Change Dump) files are parsed by the vcd_parse function. This function is duplicated in several conversion utilities (vcd2lxt, vcd2lxt2, vcd2vzt) but not in the GUI parsers of GTKWave. In general the various implementations are very similar or identical, but in some cases they differ enough to not allow the issue described in this advisory to be triggered.
Let’s describe the execution flow for the vcd2lxt utility; the other implementations have very similar behavior.
The function vcd_parse loops over each line in the file [1]. Depending on which token has been read [2], a different switch block is executed:
     static void vcd_parse(int linear) {
         int tok;
[1]      for (;;) {
[2]          switch (get_token()) {
                 ...
The get_token() function simply extracts a token from the file at the current cursor position, saving it to the global yytext buffer, and assigning the token’s length to the global yylen.
Moreover, if the token starts with “$”, the token is considered a special symbol, and it has to match one of these tokens:
     char *tokens[] = {"var", "end", "scope", "upscope",
                       "comment", "date", "dumpall", "dumpoff", "dumpon",
                       "dumpvars", "enddefinitions",
                       "dumpports", "dumpportsoff", "dumpportson", "dumpportsall",
                       "timescale", "version", "vcdclose", "timezero",
                       "", "", ""};
The return value of get_token is a token type, which is an index inside the tokens array above.
If the token does not start with “$”, the token is considered a string and the returned token type will be T_STRING.
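The classification rule just described can be sketched as follows. This is only an illustration, not GTKWave's get_token(): the helper name sketch_classify_token and the two SKETCH_T_* values are hypothetical, and the keyword table is the one quoted above with its trailing empty entries left out.

```c
#include <assert.h>
#include <string.h>

/* Keyword table as quoted in the advisory (trailing "" entries omitted). */
static const char *const sketch_tokens[] = {
    "var", "end", "scope", "upscope", "comment", "date", "dumpall",
    "dumpoff", "dumpon", "dumpvars", "enddefinitions", "dumpports",
    "dumpportsoff", "dumpportson", "dumpportsall", "timescale", "version",
    "vcdclose", "timezero"
};

enum {
    SKETCH_NUM_TOKENS = sizeof(sketch_tokens) / sizeof(sketch_tokens[0]),
    SKETCH_T_UNKNOWN = SKETCH_NUM_TOKENS,   /* hypothetical: "$" + unknown keyword */
    SKETCH_T_STRING = SKETCH_NUM_TOKENS + 1 /* hypothetical: no leading "$" */
};

/* Hypothetical helper: map a raw token to a token type. */
int sketch_classify_token(const char *text)
{
    assert(text != NULL);
    if (text[0] != '$')
        return SKETCH_T_STRING;          /* plain string token */
    for (int i = 0; i < SKETCH_NUM_TOKENS; i++) {
        if (strcmp(text + 1, sketch_tokens[i]) == 0)
            return i;                    /* index into the keyword table */
    }
    return SKETCH_T_UNKNOWN;
}
```

Under these assumptions, sketch_classify_token("$var") yields index 0 of the table, while a value-change line such as "p" is classified as a string.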
Going back to the switch above, if the parsed token is “$var”, the token type will be T_VAR and we will enter the block at [3]:
[3]  case T_VAR: {
         int vtok;
         struct vcdsymbol *v = NULL;
         var_prevch = 0;
         ...
[4]      vtok = get_vartoken(1);
A new token is read using get_vartoken() [4] and saved into vtok.
This function, similarly to get_token(), extracts a token from the file, separated by any of “ ”, “\t”, “\n”, or “\r”.
If match_kw (the first argument to the function) is 1, then the token is matched against the vartypes array and the return value is set to the matching index inside it:
     static char *vartypes[] = {"event", "parameter",
                                "integer", "real", "real_parameter", "realtime", "string", "reg", "supply0",
                                "supply1", "time", "tri", "triand", "trior",
                                "trireg", "tri0", "tri1", "wand", "wire", "wor", "port", "in", "out", "inout",
                                "$end", "", "", "", ""};
Continuing with the T_VAR case, our token needs to be “port” or any of the other symbols before it [5]:
         ...
[5]      if (vtok > V_PORT) goto bail;
[6]      v = (struct vcdsymbol *)calloc_2(1, sizeof(struct vcdsymbol));
         v->vartype = vtok;
         v->msi = v->lsi = vcd_explicit_zero_subscripts; /* indicate [un]subscripted status */
         if (vtok == V_PORT) {
             ...
         } else /* regular vcd var, not an evcd port var */
         {
[7]          vtok = get_vartoken(1);
             if (vtok == V_END) goto err;
[7]          v->size = atoi_64(yytext);
[8]          vtok = get_strtoken();
             if (vtok == V_END) goto err;
             v->id = (char *)malloc_2(yylen + 1);
[8]          strcpy(v->id, yytext);
             v->nid = vcdid_hash(yytext, yylen);
             if (v->nid < vcd_minid) vcd_minid = v->nid;
             if (v->nid > vcd_maxid) vcd_maxid = v->nid;
[9]          vtok = get_vartoken(0);
             if (vtok != V_STRING) goto err;
             if (slisthier_len) {
                 ...
             } else {
                 v->name = (char *)malloc_2(yylen + 1);
[9]              strcpy(v->name, yytext);
             }
[10]         vtok = get_vartoken(1);
             if (vtok == V_END) goto dumpv;
             ...
         }
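A series of tokens is then extracted to populate the vcdsymbol pointed to by v [6], in this order: the size, stored in v->size [7]; the identifier, copied into v->id [8]; and the name, copied into v->name [9]. One more token is then read [10] and, if it is “$end”, execution jumps to dumpv.
The code then continues by populating v->value and v->narray: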
     dumpv:
         ...
         /* initial conditions */
[11]     v->value = (char *)malloc_2(v->size + 1);
         v->value[v->size] = 0;
         v->narray = (struct Node **)calloc_2(v->size, sizeof(struct Node *));
         {
             int i;
             for (i = 0; i < v->size; i++) {
[12]             v->value[i] = 'x';
                 v->narray[i] = (struct Node *)calloc_2(1, sizeof(struct Node));
                 v->narray[i]->head.time = -1;
                 v->narray[i]->head.v.val = 1;
             }
         }
         ...
[13]     if (!vcdsymroot) {
             vcdsymroot = vcdsymcurr = v;
         } else {
             vcdsymcurr->next = v;
             vcdsymcurr = v;
         }
         numsyms++;
         ...
     bail:
         if (vtok != V_END) sync_end(NULL);
         break;
     }
v->value is allocated with a size of v->size + 1 [11], then it is initialized with “x” characters [12] and null-terminated.
Finally, the new v symbol is added to the symbols list [13] and numsyms is incremented. Here the case block ends and we go back to the loop at [1].
By defining variables using $var, we can create an arbitrary number of vcd symbols (as many as can fit in memory), and numsyms increments accordingly without a limit.
After all variables are declared, “$enddefinitions” is read, which lets us enter the switch block at [14]:
[14] case T_ENDDEFINITIONS:
         if (!header_over) {
[15]         header_over = 1; /* do symbol table management here */
[16]         create_sorted_table();
             if ((!sorted) && (!indexed)) {
                 fprintf(stderr, "No symbols in VCD file..nothing to do!\n");
                 exit(1);
             }
             if (linear) lt_set_no_interlace(lt);
         }
         break;
At [15], the variable header_over is set to 1. This is important, because this variable must be set to 1 in order to call the parse_valuechange() function in the next step.
As the variable definition has completed, the function create_sorted_table() is called to store all vcd symbols in an index for faster access [16].
     static void create_sorted_table(void) {
         struct vcdsymbol *v;
         struct vcdsymbol **pnt;
         unsigned int vcd_distance;
         struct vcdsymbol *root_v;
         int i;
         if (numsyms) {
[17]         vcd_distance = vcd_maxid - vcd_minid + 1;
[18]         if (vcd_distance <= 8 * 1024 * 1024) {
[19]             indexed = (struct vcdsymbol **)calloc_2(vcd_distance, sizeof(struct vcdsymbol *));
                 printf("%d symbols span ID range of %d, using indexing...\n", numsyms, vcd_distance);
                 v = vcdsymroot;
                 while (v) {
                     if (!(root_v = indexed[v->nid - vcd_minid])) {
                         indexed[v->nid - vcd_minid] = v;
                     }
                     alias_vs_normal_symadd(v, root_v);
                     v = v->next;
                 }
[20]         } else {
                 pnt = sorted = (struct vcdsymbol **)calloc_2(numsyms, sizeof(struct vcdsymbol *));
                 v = vcdsymroot;
                 while (v) {
                     *(pnt++) = v;
                     v = v->next;
                 }
                 qsort(sorted, numsyms, sizeof(struct vcdsymbol *), vcdsymcompare);
                 ...
         }
     }
At [17], the difference between vcd_maxid and vcd_minid is calculated. If two variables with symbols “A” and “AAAAA” were declared, their distance will be very big: 0x9b3890cb - 0x21 + 1. If the distance is at most 8 * 1024 * 1024 (about 8 million) [18], create_sorted_table will create a simple hash table in indexed [19] so symbols can be retrieved directly, after computing the vcdid_hash of the symbol to look up. The size of this table is at maximum 8 * 1024 * 1024 * sizeof(void *), so 32 MB for 32-bit code and 64 MB for 64-bit code.
If instead the distance is too big [20], a sorted array is created in sorted.
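The choice between the two tables comes down to a little unsigned arithmetic, which the following sketch reproduces with the numbers quoted above. The function names are hypothetical, and the sketch assumes a 32-bit unsigned int for the IDs, as the declaration of vcd_distance in the excerpt suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Threshold taken from the create_sorted_table() excerpt [18]. */
#define SKETCH_INDEX_LIMIT (8u * 1024u * 1024u)

/* Same arithmetic as [17]: unsigned span of the ID range. */
unsigned int sketch_vcd_distance(unsigned int minid, unsigned int maxid)
{
    assert(minid <= maxid);
    return maxid - minid + 1u;
}

/* 1 when the direct-index table would be chosen [19],
 * 0 when the sorted array would be built instead [20]. */
int sketch_uses_indexed(unsigned int minid, unsigned int maxid)
{
    return sketch_vcd_distance(minid, maxid) <= SKETCH_INDEX_LIMIT;
}

/* Worst-case size in bytes of the index table for a given pointer width. */
unsigned long long sketch_max_index_bytes(size_t pointer_size)
{
    return (unsigned long long)SKETCH_INDEX_LIMIT * pointer_size;
}
```

For the “A” and “AAAAA” pair the distance is 0x9b3890ab, far above the limit, so the sorted array is used. A file whose symbols all share one ID has a distance of 1 and takes the indexed path, leaving sorted untouched.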
At this point, we go back to the loop at [1].
The next token is read. If it’s a string, we enter the switch block at [21].
     case T_STRING:
[21]     if (header_over) {
             /* catchall for events when header over */
[22]         if (yytext[0] == '#') {
                 ...
             } else {
[23]             parse_valuechange();
             }
         }
         break;
If header_over is 1 [21] and the string does not start with “#” [22], the function parse_valuechange() is called.
     static void parse_valuechange(void) {
         struct vcdsymbol *v;
         char *vector;
         int vlen;
         switch (yytext[0]) {
             ...
[24]         case 'p':
                 /* extract port dump value.. */
[25]             vector = malloc_2(yylen_cache = yylen);
                 strcpy(vector, yytext + 1);
                 vlen = yylen - 1;
[26]             get_strtoken(); /* throw away 0_strength_component */
                 get_strtoken(); /* throw away 0_strength_component */
                 get_strtoken(); /* this is the id */
[27]             v = bsearch_vcd(yytext, yylen);
                 if (!v) {
                     fprintf(stderr, "Near line %d, Unknown identifier: '%s'\n", vcdlineno, yytext);
                     free_2(vector);
                 } else {
[28]                 if (vlen < v->size) /* fill in left part */
                     {
                         char extend;
                         int i, fill;
                         extend = '0';
                         fill = v->size - vlen;
[29]                     for (i = 0; i < fill; i++) {
[30]                         v->value[i] = extend;
                         }
                         evcd_strcpy(v->value + fill, vector);
                     }
                     ...
                 }
If the token starts with the letter “p”, we enter the block at [24].
The array vector is allocated to store the string after “p” [25].
Two tokens are then extracted and discarded [26]. Finally, the id token is extracted and searched for in the vcdsymbol list [27] using bsearch_vcd. If found, the resulting vcdsymbol is then pointed to by v.
If the symbol has been found, we reach [28]. Here, if vlen is smaller than v->size, the string in vector is not big enough to fill the whole v->value buffer, which is thus padded on the left with “0” characters. Note that the loop that does the padding is controlled by the symbol’s size v->size.
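To make the padding step concrete, here is a minimal sketch of the logic at [28]-[30] on a reduced structure. The struct sketch_padsym and the function sketch_pad_left are hypothetical stand-ins, and a plain memcpy replaces evcd_strcpy(), whose behavior is not shown in this advisory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct vcdsymbol: only the two
 * fields the padding code relies on. */
struct sketch_padsym {
    char *value; /* destination buffer, expected to hold size + 1 bytes */
    int size;    /* declared width from the $var line */
};

/* Mirrors the left-padding at [28]-[30]; returns the number of fill
 * bytes written. memcpy stands in for evcd_strcpy() (assumption). */
int sketch_pad_left(struct sketch_padsym *v, const char *vector)
{
    int vlen = (int)strlen(vector);
    int fill = 0;

    assert(v != NULL && v->value != NULL);
    if (vlen < v->size) {
        fill = v->size - vlen;
        for (int i = 0; i < fill; i++)
            v->value[i] = '0';  /* destination and count both come from *v */
        memcpy(v->value + fill, vector, (size_t)vlen + 1);
    }
    return fill;
}
```

With a well-formed symbol of size 4 and the vector "10", the buffer ends up holding "0010". The point of the sketch is that nothing in this step validates v itself: both the write destination and the number of bytes written are read from the structure.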
It’s important to keep in mind that bsearch_vcd [27] returns a vcdsymbol that, if arbitrarily controlled, allows for easily writing to arbitrary memory. v->size is controlled, so the loop at [29] can write an arbitrary number of “0” characters to v->value, which is a pointer stored inside the vcdsymbol.
Indeed, bsearch_vcd has an issue that allows for an arbitrary vcdsymbol to be returned:
     static struct vcdsymbol *bsearch_vcd(char *key, int len) {
         struct vcdsymbol **v;
         struct vcdsymbol *t;
[31]     if (indexed) {
[32]         unsigned int hsh = vcdid_hash(key, len);
[33]         if ((hsh >= vcd_minid) && (hsh <= vcd_maxid)) {
                 return (indexed[hsh - vcd_minid]);
             }
         }
[34]     v = (struct vcdsymbol **)bsearch(key, sorted, numsyms,
                                          sizeof(struct vcdsymbol *), vcdsymbsearchcompare);
         if (v) {
     #ifndef VCD_BSEARCH_IS_PERFECT
             for (;;) {
                 t = *v;
                 if ((v == sorted) || (strcmp((*(--v))->id, key))) {
                     return (t);
                 }
             }
     #else
             return (*v);
     #endif
         } else {
             return (NULL);
         }
     }
Let’s assume that we have a .vcd file with 0x20000 (131072) “$var” declarations, all with the same id “x”, and one port dump line:
$var reg 2 x x $end
... 0x20000 times ...
$var reg 2 x x $end
$enddefinitions
p a a A
In this case vcd_minid and vcd_maxid will be equal because there’s only one hash (the one for the symbol “x”). Also, when “$enddefinitions” was parsed, create_sorted_table() filled the indexed hash table (as there is only one hash), and sorted was left at 0 (its initialization value).
The 0x20000 “$var” declarations increment the numsyms value to 0x20000. Then, the p line is parsed, leading to the call bsearch_vcd("A", 1) to look the symbol up.
At [31], indexed is indeed not 0, so the hash for “A” is calculated [32]. However, it won’t be within vcd_minid and vcd_maxid [33].
The code therefore does not return and falls through to [34], calling bsearch on the sorted array. Here the code is simply missing a null check against sorted.
bsearch is called with sorted set to 0, and numsyms set to 0x20000.
Let’s see glibc’s bsearch implementation:
     __extern_inline void *
     bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size, __compar_fn_t __compar)
     {
         size_t __l, __u, __idx;
         const void *__p;
         int __comparison;
         __l = 0;
         __u = __nmemb;
         while (__l < __u)
         {
[35]         __idx = (__l + __u) / 2;
[36]         __p = (void *) (((const char *) __base) + (__idx * __size));
[37]         __comparison = (*__compar) (__key, __p);
             if (__comparison < 0)
                 __u = __idx;
             else if (__comparison > 0)
                 __l = __idx + 1;
             else
                 return (void *) __p;
         }
         return NULL;
     }
As expected, since bsearch is doing a binary search, it assumes that sorted points to a buffer of size numsyms * sizeof(struct vcdsymbol *). In our example, assuming we’re executing in 32-bit mode, that would be 0x80000. At [35], the middle of this buffer is calculated and __p (the element to compare) is taken from sorted [36], which means it’s accessing an element at address 0x40000.
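The address arithmetic at [35] and [36] is easy to check by hand. The sketch below computes the byte offset of the first element a binary search inspects; the function name is hypothetical. Since the base pointer is 0 in this scenario, the offset is also the absolute address that gets read.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset, from the base pointer, of the first element inspected by
 * a binary search over nmemb elements of elem_size bytes each. */
size_t sketch_first_probe_offset(size_t nmemb, size_t elem_size)
{
    size_t lo = 0, hi = nmemb;

    assert(nmemb > 0);
    return ((lo + hi) / 2) * elem_size;
}
```

With 0x20000 symbols and 4-byte pointers the first probe lands at 0x40000, matching the text; with 8-byte pointers it would land at 0x80000. Because the element count is numsyms, the number of “$var” declarations in the file determines which address is read first.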
Finally it calls the __compar function vcdsymbsearchcompare:
     static int vcdsymbsearchcompare(const void *s1, const void *s2) {
         char *v1;
         struct vcdsymbol *v2;
         v1 = (char *)s1;
[38]     v2 = *((struct vcdsymbol **)s2);
         return (strcmp(v1, v2->id));
     }
At [38] the address 0x40000 is dereferenced, looking for a vcdsymbol, which is then returned if v2->id matches “A”. If an attacker controls the content of address 0x40000 (or any other, depending on the value of numsyms, which is controllable through the number of “$var” definitions), they can return a vcdsymbol structure with arbitrary contents, leading to the arbitrary write described earlier [30]. This, in turn, could lead to arbitrary code execution. As this is a parser with few limitations on sizes and number of elements, controlling the content of arbitrary addresses, especially in 32-bit mode, may be achieved by carefully manipulating heap allocations.
As mentioned before, this issue affects 3 different source files, listed separately below.
The vcd2vzt conversion utility does not check if the sorted buffer is initialized before calling bsearch at line src/helpers/vcd2vzt.c:206, leading to an arbitrary read which could be turned into an arbitrary write.
The vcd2lxt2 conversion utility does not check if the sorted buffer is initialized before calling bsearch at line src/helpers/vcd2lxt2.c:204, leading to an arbitrary read which could be turned into an arbitrary write.
The vcd2lxt conversion utility does not check if the sorted buffer is initialized before calling bsearch at line src/helpers/vcd2lxt.c:198, leading to an arbitrary read which could be turned into an arbitrary write.
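The missing check is small. The following sketch shows one way a lookup over the sorted array could refuse to search when the array was never built. It is an illustration on a reduced structure and is not the vendor's actual patch: struct sketch_idsym, sketch_compare and sketch_lookup_sorted are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct vcdsymbol. */
struct sketch_idsym {
    const char *id;
};

/* Comparator with the same shape as vcdsymbsearchcompare(). */
static int sketch_compare(const void *key, const void *elem)
{
    const struct sketch_idsym *sym = *(const struct sketch_idsym *const *)elem;
    return strcmp((const char *)key, sym->id);
}

/* Look a key up in the sorted pointer array, returning NULL instead of
 * searching when the array was never allocated. */
struct sketch_idsym *sketch_lookup_sorted(const char *key,
                                          struct sketch_idsym **sorted,
                                          size_t numsyms)
{
    struct sketch_idsym **hit;

    assert(key != NULL);
    if (sorted == NULL || numsyms == 0)
        return NULL; /* the check the advisory describes as missing */
    hit = bsearch(key, sorted, numsyms, sizeof(*sorted), sketch_compare);
    return hit ? *hit : NULL;
}
```

With this guard, the example file's lookup of “A” (sorted equal to 0, numsyms equal to 0x20000) would return NULL and reach the existing “Unknown identifier” branch of parse_valuechange() rather than dereferencing low memory.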
AddressSanitizer:DEADLYSIGNAL
=================================================================
==273333==ERROR: AddressSanitizer: SEGV on unknown address 0x00040100 (pc 0x565585e4 bp 0xffffd5b8 sp 0xffffd590 T0)
==273333==The signal is caused by a READ memory access.
#0 0x565585e4 in vcdsymbsearchcompare src/helpers/vcd2lxt.c:175
#1 0xf765eab4 in __GI_bsearch ../bits/stdlib-bsearch.h:33
#2 0xf79dd4ab in __interceptor_bsearch ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:10155
#3 0xf79dd4ab in __interceptor_bsearch ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:10150
#4 0x56558703 in bsearch_vcd src/helpers/vcd2lxt.c:198
#5 0x5655c4ca in parse_valuechange src/helpers/vcd2lxt.c:881
#6 0x5655f9de in vcd_parse src/helpers/vcd2lxt.c:1417
#7 0x56561640 in vcd_main src/helpers/vcd2lxt.c:1704
#8 0x56562dad in main src/helpers/vcd2lxt.c:1959
#9 0xf7647294 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#10 0xf7647357 in __libc_start_main_impl ../csu/libc-start.c:381
#11 0x565583f6 in _start (vcd2lxt+0x33f6)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/helpers/vcd2lxt.c:175 in vcdsymbsearchcompare
Fixed in version 3.3.118, available from https://sourceforge.net/projects/gtkwave/files/gtkwave-3.3.118/
2023-08-01 - Vendor Disclosure
2023-12-31 - Vendor Patch Release
2024-01-08 - Public Release
Discovered by Claudio Bozzato of Cisco Talos.