Talos Vulnerability Report

TALOS-2016-0036

Matroska libebml EbmlUnicodeString Heap Information Leak

January 28, 2016

CVE Number

CVE-2015-8790

Description

A specially crafted Unicode string can cause an off-by-few read on the heap in the Unicode string parsing code in libebml. This issue can potentially be used for information leaks.

Tested Versions

  • libmatroska master branch

Product URLs

http://matroska.org

Details

An off-by-few read on the heap occurs when parsing Unicode strings in EbmlUnicodeString.cpp:UTFstring::UpdateFromUTF8. The string is parsed in a for loop, but in the case of a four-byte character, no check is made that the last bytes accessed fall inside the allocated buffer.

Technical information below:

The vulnerable code is located in EbmlUnicodeString.cpp:UTFstring::UpdateFromUTF8:

    for (j=0, i=0; i<UTF8string.length(); j++) {
      uint8 lead = static_cast<uint8>(UTF8string[i]);
      if (lead < 0x80) {
        _Data[j] = lead;
        i++;
      } else if ((lead >> 5) == 0x6) {
        _Data[j] = ((lead & 0x1F) << 6) + (UTF8string[i+1] & 0x3F);
        i += 2;
      } else if ((lead >> 4) == 0xe) {
        _Data[j] = ((lead & 0x0F) << 12) + ((UTF8string[i+1] & 0x3F) << 6) + (UTF8string[i+2] & 0x3F);
        i += 3;
      } else if ((lead >> 3) == 0x1e) {
        printf("i is now %d and the highest accessed byte is %d\n",i,i+3 );
        _Data[j] = ((lead & 0x07) << 18) + ((UTF8string[i+1] & 0x3F) << 12) + ((UTF8string[i+2] & 0x3F) << 6) + (UTF8string[i+3] & 0x3F);
        i += 4;
      } else
        // Invalid char?
        break;
    }

If the last byte in the string being parsed satisfies the else if ((lead >> 3) == 0x1e) condition, for example 0xf2, the loop reads 3 bytes past the end of the buffer, causing an out-of-bounds read on the heap.
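
For illustration only, the missing check could look roughly like the sketch below. It is not the libebml code or the upstream fix: the names SequenceLength and DecodeUtf8Checked are hypothetical, and the output type is std::u32string rather than the library's wchar_t buffer. The sketch derives the sequence length from the lead byte and stops decoding when the sequence would extend past the end of the input. Like the loop above, it does not validate the continuation bytes themselves.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of libebml): number of bytes a UTF-8
// sequence occupies according to its lead byte, or 0 for an invalid lead.
static std::size_t SequenceLength(std::uint8_t lead) {
  if (lead < 0x80) return 1;
  if ((lead >> 5) == 0x06) return 2;
  if ((lead >> 4) == 0x0e) return 3;
  if ((lead >> 3) == 0x1e) return 4;
  return 0;
}

// Hypothetical decoder: converts UTF-8 to code points and stops at the
// first invalid or truncated sequence instead of reading past the input.
std::u32string DecodeUtf8Checked(const std::string &in) {
  std::u32string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const std::uint8_t lead = static_cast<std::uint8_t>(in[i]);
    const std::size_t len = SequenceLength(lead);
    // The bounds check: the whole sequence must fit in the remaining input.
    if (len == 0 || len > in.size() - i) break;

    // Payload bits carried by the lead byte depend on the sequence length.
    char32_t cp;
    switch (len) {
      case 1:  cp = lead;        break;
      case 2:  cp = lead & 0x1F; break;
      case 3:  cp = lead & 0x0F; break;
      default: cp = lead & 0x07; break;
    }
    // Each continuation byte contributes its low six bits.
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<std::uint8_t>(in[i + k]) & 0x3F);

    out.push_back(cp);
    i += len;
  }
  return out;
}
```

With this sketch, an input ending in a lone 0xf2 lead byte yields only the characters decoded before it, because the four-byte sequence it announces does not fit in the remaining input.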

Credit

Richard Johnson and Aleksandar Nikolic