CVE-2015-8790
A specially crafted Unicode string can cause an off-by-few read on the heap in the Unicode string parsing code in libebml. This issue can potentially be used for information leaks.
http://matroska.org
An off-by-few read on the heap occurs when parsing Unicode strings in EbmlUnicodeString.cpp:UTFstring::UpdateFromUTF8. The string is parsed in a for loop, but in the case of a four-byte character no check is made on whether the last bytes accessed fall outside the allocated buffer.
Technical information below:
The vulnerable code is located in EbmlUnicodeString.cpp:UTFstring::UpdateFromUTF8:
    for (j=0, i=0; i<UTF8string.length(); j++) {
        uint8 lead = static_cast<uint8>(UTF8string[i]);
        if (lead < 0x80) {
            _Data[j] = lead;
            i++;
        } else if ((lead >> 5) == 0x6) {
            _Data[j] = ((lead & 0x1F) << 6) + (UTF8string[i+1] & 0x3F);
            i += 2;
        } else if ((lead >> 4) == 0xe) {
            _Data[j] = ((lead & 0x0F) << 12) + ((UTF8string[i+1] & 0x3F) << 6) + (UTF8string[i+2] & 0x3F);
            i += 3;
        } else if ((lead >> 3) == 0x1e) {
            printf("i is now %d and the highest accessed byte is %d\n",i,i+3 );
            _Data[j] = ((lead & 0x07) << 18) + ((UTF8string[i+1] & 0x3F) << 12) + ((UTF8string[i+2] & 0x3F) << 6) + (UTF8string[i+3] & 0x3F);
            i += 4;
        } else
            // Invalid char?
            break;
    }
If the last byte in the string being parsed satisfies the else if ((lead >> 3) == 0x1e) condition, for example 0xf2, the loop treats it as the lead byte of a four-byte character and reads 3 bytes past the end of the buffer, causing an out-of-bounds read on the heap.
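The missing check can be illustrated with a minimal sketch. This is not the libebml patch: SequenceLength and DecodeUtf8Checked are hypothetical names introduced here, and the decoder returns a std::u32string rather than filling _Data. It assumes that stopping at the first truncated or invalid sequence is acceptable, and, like the original loop, it does not validate that continuation bytes lie in the range 0x80 to 0xBF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of libebml): number of bytes in the UTF-8
// sequence introduced by `lead`, or 0 if `lead` is not a recognized lead byte.
// The tests mirror the conditions used in the vulnerable loop.
std::size_t SequenceLength(std::uint8_t lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x06) return 2;
    if ((lead >> 4) == 0x0e) return 3;
    if ((lead >> 3) == 0x1e) return 4;
    return 0;
}

// Hypothetical stand-in for the loop in UTFstring::UpdateFromUTF8.
// Decoding stops at the first invalid or truncated sequence instead of
// reading past the end of the input.
std::u32string DecodeUtf8Checked(const std::string &in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t lead = static_cast<std::uint8_t>(in[i]);
        const std::size_t len = SequenceLength(lead);
        // The check the original loop lacks: every byte of the sequence
        // must lie inside the buffer before any of them is read.
        if (len == 0 || len > in.size() - i) break;
        assert(i + len <= in.size());

        // Keep only the payload bits of the lead byte.
        char32_t cp;
        switch (len) {
        case 1:  cp = lead;        break;
        case 2:  cp = lead & 0x1F; break;
        case 3:  cp = lead & 0x0F; break;
        default: cp = lead & 0x07; break;
        }
        // Append six payload bits from each continuation byte.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<std::uint8_t>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

With this check, an input whose final byte is 0xf2 decodes every complete character before it and then stops, so no byte beyond the end of the string is accessed. The same test also covers truncated two-byte and three-byte sequences.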
Richard Johnson and Aleksandar Nikolic