Invalid read of size 8 in strncmp() from is_dst()

Bug #2015216 reported by Daniel van Vugt
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
glibc (Ubuntu)
Invalid
Undecided
Unassigned
valgrind (Fedora)
Unknown
Unknown
valgrind (Ubuntu)
Fix Released
High
Simon Chopin

Bug Description

[Impact]

This bug makes valgrind detect memory error false positives in ld.so now that it started using strncmp. in is_dst. The fix is to extend the special treatment of strncmp done in libc.so to ld.so as well. The patch is already available upstream in a new release, this is just about cherry-picking it.

[Rationale]

Given that the false-positive is triggered in ld.so, it's fairly likely that quite a few users will hit it.

[Original report]

Valgrind reports this in gnome-shell on almost every run:

==34822== Invalid read of size 8
==34822==    at 0x40264A8: strncmp (strcmp-sse2.S:162)
==34822==    by 0x400554E: is_dst (dl-load.c:216)
==34822==    by 0x40067D6: _dl_dst_count (dl-load.c:253)
==34822==    by 0x40067D6: expand_dynamic_string_token (dl-load.c:395)
==34822==    by 0x4006981: fillin_rpath.isra.0 (dl-load.c:483)
==34822==    by 0x4006CB2: decompose_rpath (dl-load.c:654)
==34822==    by 0x40092DF: cache_rpath (dl-load.c:696)
==34822==    by 0x40092DF: _dl_map_object (dl-load.c:2114)
==34822==    by 0x4002934: openaux (dl-deps.c:64)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x4002D6E: _dl_map_object_deps (dl-deps.c:232)
==34822==    by 0x400CE5E: dl_open_worker_begin (dl-open.c:592)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x400C2E9: dl_open_worker (dl-open.c:782)
==34822==  Address 0xe5c00a9 is 9 bytes inside a block of size 15 alloc'd
==34822==    at 0x4843828: malloc (vg_replace_malloc.c:381)
==34822==    by 0x402628E: malloc (rtld-malloc.h:56)
==34822==    by 0x402628E: strdup (strdup.c:42)
==34822==    by 0x4006C44: decompose_rpath (dl-load.c:629)
==34822==    by 0x40092DF: cache_rpath (dl-load.c:696)
==34822==    by 0x40092DF: _dl_map_object (dl-load.c:2114)
==34822==    by 0x4002934: openaux (dl-deps.c:64)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x4002D6E: _dl_map_object_deps (dl-deps.c:232)
==34822==    by 0x400CE5E: dl_open_worker_begin (dl-open.c:592)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x400C2E9: dl_open_worker (dl-open.c:782)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x400C6BB: _dl_open (dl-open.c:884)
==34822==
==34822== Invalid read of size 8
==34822==    at 0x40264A8: strncmp (strcmp-sse2.S:162)
==34822==    by 0x400554E: is_dst (dl-load.c:216)
==34822==    by 0x4006645: _dl_dst_substitute (dl-load.c:295)
==34822==    by 0x4006981: fillin_rpath.isra.0 (dl-load.c:483)
==34822==    by 0x4006CB2: decompose_rpath (dl-load.c:654)
==34822==    by 0x40092DF: cache_rpath (dl-load.c:696)
==34822==    by 0x40092DF: _dl_map_object (dl-load.c:2114)
==34822==    by 0x4002934: openaux (dl-deps.c:64)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x4002D6E: _dl_map_object_deps (dl-deps.c:232)
==34822==    by 0x400CE5E: dl_open_worker_begin (dl-open.c:592)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x400C2E9: dl_open_worker (dl-open.c:782)
==34822==  Address 0xe5c00a9 is 9 bytes inside a block of size 15 alloc'd
==34822==    at 0x4843828: malloc (vg_replace_malloc.c:381)
==34822==    by 0x402628E: malloc (rtld-malloc.h:56)
==34822==    by 0x402628E: strdup (strdup.c:42)
==34822==    by 0x4006C44: decompose_rpath (dl-load.c:629)
==34822==    by 0x40092DF: cache_rpath (dl-load.c:696)
==34822==    by 0x40092DF: _dl_map_object (dl-load.c:2114)
==34822==    by 0x4002934: openaux (dl-deps.c:64)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x4002D6E: _dl_map_object_deps (dl-deps.c:232)
==34822==    by 0x400CE5E: dl_open_worker_begin (dl-open.c:592)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x400C2E9: dl_open_worker (dl-open.c:782)
==34822==    by 0x40014DC: _dl_catch_exception (dl-catch.c:237)
==34822==    by 0x400C6BB: _dl_open (dl-open.c:884)

ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: libc6 2.37-0ubuntu2
ProcVersionSignature: Ubuntu 6.2.0-18.18-generic 6.2.6
Uname: Linux 6.2.0-18-generic x86_64
ApportVersion: 2.26.0-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
Date: Tue Apr 4 18:01:17 2023
InstallationDate: Installed on 2022-11-28 (127 days ago)
InstallationMedia: Ubuntu 23.04 "Lunar Lobster" - Alpha amd64 (20221126)
SourcePackage: glibc
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Daniel van Vugt (vanvugt) wrote :
information type: Public → Private Security
information type: Private Security → Public Security
Revision history for this message
Florian Weimer (fweimer) wrote :

This is likely a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2081583, which was fixed in valgrind in https://bugs.kde.org/show_bug.cgi?id=434764. Based on our analysis, it is not a security bug.

Revision history for this message
Simon Chopin (schopin) wrote :

Thanks for the links, Florian, much appreciated. I'll prepare a valgrind upload with the appropriate patch.

I'm also going to dump below a rough explanation of the relevant parts of that routine as I understand it. It's mostly for my benefit, as I was curious to understand what was going on.

Disclaimer: I'm not fluent in asm *at all*, nor am I particularly well versed in security issues.

The relevant code is this (with plenty of ifdefs removed):

ENTRY (STRCMP)
 test %RDX_LP, %RDX_LP
 je LABEL(strcmp_exitz)
 cmp $1, %RDX_LP
 je LABEL(Byte0)
 mov %RDX_LP, %R11_LP

 mov %esi, %ecx
 mov %edi, %eax
        /* Use 64bit AND here to avoid long NOP padding. */
 and $0x3f, %rcx /* rsi alignment in cache line */
 and $0x3f, %rax /* rdi alignment in cache line */

 cmp $0x30, %ecx
 ja LABEL(crosscache) /* rsi: 16-byte load will cross cache line */
 cmp $0x30, %eax
 ja LABEL(crosscache) /* rdi: 16-byte load will cross cache line */
 movlpd (%rdi), %xmm1
 movlpd (%rsi), %xmm2
 movhpd 8(%rdi), %xmm1
 movhpd 8(%rsi), %xmm2 <- valgrind complains.

 pxor %xmm0, %xmm0 /* clear %xmm0 for null char checks */
 pcmpeqb %xmm1, %xmm0 /* Any null chars? */
 pcmpeqb %xmm2, %xmm1 /* compare first 16 bytes for equality */
 psubb %xmm0, %xmm1 /* packed sub of comparison results*/
 pmovmskb %xmm1, %edx

 sub $0xffff, %edx /* if first 16 bytes are same, edx == 0xffff */
 jnz LABEL(less16bytes) /* If not, find different value or null char */

 sub $16, %r11
 jbe LABEL(strcmp_exitz) /* finish comparision */

 add $16, %rsi /* prepare to search next 16 bytes */
 add $16, %rdi /* prepare to search next 16 bytes */
/* SNIP: after that control flow becomes... complicated, and it's not relevant for our use case since we're with small buffers here. */

LABEL(less16bytes):
 bsf %rdx, %rdx /* find and store bit index in %rdx */

 sub %rdx, %r11
 jbe LABEL(strcmp_exitz)

 movzbl (%rsi, %rdx), %ecx
 movzbl (%rdi, %rdx), %eax

 sub %ecx, %eax
 ret

LABEL(strcmp_exitz):
 xor %eax, %eax
 ret

The pointers to our buffers are stored in rdi and rsi.
First we're copying the LSB of the addresses in eax/ecx, which we then mask off further, under the assumption that a cache line is 64 bytes. Then, if one ptr is > 48, ptr+16 will be in another cache line, so we abort in that case.

So now we know our next 16 bytes are all in the same cache line. A fortiori they're in the same memory page. Those two conditions mean that once you read the first byte, reading the others doesn't have any visible side-effect regarding the cache. Since we're not in C, the usual notion of undefined behaviour due to reading outside a given buffer doesn't apply, either, so we don't have to fear the compiler eating our children.

Further down the line there are some SSE calculations done on the whole 16 bytes which end up with a bitmask of which bytes are unequal or null. The position of its least significant set bit can then be checked to see if it starts to diverge in our buffer or not. If not, we simply return success, otherwise we just return the result of the comparison on that first divergent byte.

Simon Chopin (schopin)
Changed in glibc (Ubuntu):
status: New → Invalid
Changed in valgrind (Ubuntu):
status: New → In Progress
importance: Undecided → High
information type: Public Security → Public
Simon Chopin (schopin)
Changed in valgrind (Ubuntu):
status: In Progress → Confirmed
description: updated
Revision history for this message
Simon Chopin (schopin) wrote :

Attaching a debdiff of the uploaded package for the FFe.

tags: added: fixed-in-valgrind-3.20 fixed-upstream
Changed in valgrind (Ubuntu):
assignee: nobody → Simon Chopin (schopin)
status: Confirmed → In Progress
tags: added: focal jammy kinetic
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm not convinced we have the best fix even if it is the upstream fix and only fix right now. What concerns me is that it looks like a workaround specifically for strncmp(), which even if correct suggests there's a more fundamental inaccuracy in Valgrind's machine emulation. And if so then that might affect other functions again in future.

Revision history for this message
Simon Chopin (schopin) wrote :

As I understand it, there is a fundamental inaccuracy in Valgrind's machine emulation. It's not emulating hardware but rather some kind of C machine model, with some extra bells & whistles when it interacts with the OS. That works very well because pretty much everything in our systems boils down to C code (I'd be curious to see how valgrind and golang play together, though).

Glibc is also implementing that C machine model against actual hardware, and is thus sometimes stepping outside of it to take advantage of hardware specificities that valgrind doesn't know about. Valgrind is aware of it, that's why it's been intercepting calls to glibc functions that don't play by its rules.

Now, I dig a bit further into the issue, and it turns out that Valgrind has suppression files to, well, suppress some outputs that are known false positives. There's a file for glibc in the upstream sources, but it isn't shipped in our package. I'll be looking into that.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

That actually sounds encouraging because it suggests Valgrind is more portable than I thought :)

Revision history for this message
Simon Chopin (schopin) wrote :

Again, I don't have any deep knowledge of Valgrind. Regarding portability, it still needs pretty intimate knowledge of the underlying system since real world programs don't just use malloc() to get addresses, there are stuff like mmap() to take into account. Also, tracking *access* would likely require hooking into very lowlevel machinery.

Regarding the suppression files, the glibc suppression file is actually merged into the default suppression and so is shipped in the package, but it's tailored to the build-time version of glibc, and based on the file patterns it probably doesn't work on Debian-based systems anyway. I confirmed this by rebuilding the current package (without the patch), and the strncmp issue still shows (I'm using the test program outlined at https://bugs.kde.org/show_bug.cgi?id=434764 ).

I'll file a bug in Debian for the suppression file pattern, but in the mean time we still want the patch.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package valgrind - 1:3.19.0-1ubuntu2

---------------
valgrind (1:3.19.0-1ubuntu2) lunar; urgency=medium

  * d/p/lp2015216.patch: Cherry-picked from upstream to silence errors caused
    by asm-optimised strncmp in ld.so from recent glibc (LP: #2015216)

 -- Simon Chopin <email address hidden> Tue, 04 Apr 2023 20:25:00 +0200

Changed in valgrind (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.