Received: by 2002:ab2:1689:0:b0:1f7:5705:b850 with SMTP id d9csp1375005lqa; Mon, 29 Apr 2024 07:00:39 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCUpKV7UP0jqW9R3DlbMIkunVPv+ub/Ol9S8TclcA7QrNq3bcHJbRWQxOmZt17oaoHob+Ffm+zf32bETUUn5glKHfkcVNRBR32u1goW1nQ== X-Google-Smtp-Source: AGHT+IG2jxnSbU9LKpRGsHDwLs1BBn3PV3rVqWBdmlnO51ERne5SYLUfRHvRl4umQbha1bPmhzfD X-Received: by 2002:a17:906:f847:b0:a58:ecff:79fb with SMTP id ks7-20020a170906f84700b00a58ecff79fbmr4713289ejb.62.1714399238856; Mon, 29 Apr 2024 07:00:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1714399238; cv=none; d=google.com; s=arc-20160816; b=Nq2bapdxQRmWo6mki17e2IXwZZjMYBm16Ek1mnbMcRd4qQq7zMh6YvWziTJBtDx99Y Ht693fXckmBE2xhKnzNToMthZzXR+0rMVxa4Vs/F92a7LSC1q8+Hh98nAhpG1wlSb5p8 45ezqj4RbA3Bsimc9kOF+zW237Z/a8nrYh+GSCJktkrBFIUW89cc6NzR3MRkzDo3fsht BBBgdD/WzhIoSen+pLMhSJu2JVVkQ0TCrpQm9cH9PFQKsBJrZaHaMSCuJ/bevlPb1Fz1 TQPFz/nJYIUdNdQNqmnwaSun+1sHAV4AoV63MUON6rrxEzA5PPWWY5RJ/iguVid0O8Mr g1Jw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=subject:content-transfer-encoding:in-reply-to:references:to :mime-version:user-agent:from:date:message-id:dkim-signature :delivered-to:delivered-to:reply-to:list-id:list-subscribe :list-unsubscribe:list-help:list-post:precedence:mailing-list; bh=840buJdFvlUr9oaDPhn/J4UETUpPVIvPJwqcj3xN5RQ=; fh=9jsPTyo6edd9xvAeG+KFFrRrXMmgB/RdwUKOrvy9dcA=; b=QYlO70N5C4e5yrqiTst1V2Pw0ec4LvbuRlFE22uJCtVwFMvMJ1sLqbEjmvIQk5KHtn XaXZw7hod5mYdz9xU9Lmwl/sp4Vajkz3fYF2oa35ka8tiPG6TIOtJw9tM/8O4JaSgBqn B+fy9U98xbj1WRPvJ4P22xKOJFnvGTEW25fvqbnFy+mCvdbju35ELlqeNmlNSxJhTrck p5MLw/qdnNCgxTbjnA6a20qsH+XuCXZUk3hNpZ5Bw13ms9fPFmnbBI8ku3DxuW0w9CBg VzWAg5H1dQBIzS9ivz+2g9j8DS4zPE0w0Xhugjd9mBURi7HXJhF3jqm6RB2Jdhum6uwn LvPA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20230601 header.b=gkuEaV18; spf=pass (google.com: domain of oss-security-return-30093-linux.lists.archive=gmail.com@lists.openwall.com designates 193.110.157.125 as permitted sender) smtp.mailfrom="oss-security-return-30093-linux.lists.archive=gmail.com@lists.openwall.com"; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from second.openwall.net (second.openwall.net. [193.110.157.125]) by mx.google.com with SMTP id nc38-20020a1709071c2600b00a58db5d5ba6si3639489ejc.816.2024.04.29.07.00.38 for ; Mon, 29 Apr 2024 07:00:38 -0700 (PDT) Received-SPF: pass (google.com: domain of oss-security-return-30093-linux.lists.archive=gmail.com@lists.openwall.com designates 193.110.157.125 as permitted sender) client-ip=193.110.157.125; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20230601 header.b=gkuEaV18; spf=pass (google.com: domain of oss-security-return-30093-linux.lists.archive=gmail.com@lists.openwall.com designates 193.110.157.125 as permitted sender) smtp.mailfrom="oss-security-return-30093-linux.lists.archive=gmail.com@lists.openwall.com"; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (qmail 22432 invoked by uid 550); 29 Apr 2024 14:00:15 -0000 Mailing-List: contact oss-security-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: oss-security@lists.openwall.com Delivered-To: mailing list oss-security@lists.openwall.com Delivered-To: moderator for oss-security@lists.openwall.com Received: (qmail 28429 invoked from network); 29 Apr 2024 04:48:27 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714366097; x=1714970897; darn=lists.openwall.com; h=content-transfer-encoding:in-reply-to:references:subject:to :mime-version:user-agent:reply-to:from:date:message-id:from:to:cc :subject:date:message-id:reply-to; bh=840buJdFvlUr9oaDPhn/J4UETUpPVIvPJwqcj3xN5RQ=; b=gkuEaV18O6uOQq+ELwSpvwN2NXAV62W/MmC6Z8cmZz5knn0+YusGJzmX+cavGlA+wc smuLAJakQ9NY9Y6/EtLfLhatqCEu/urJDLaVQfNtrXEbKFSOGrVYe6+MhijjHEGDHlNC Nf8P/+hVyWwoSuNxTuV2LeX9PZEuNqw8AWzACCAkfqSRTS2BV+31SvD7G1exRRHab7lp 3+bjTe4GzxNEUSiN0zRecjcEXrlnJHWmQ6MjPtooEk1BM+haym6FYltG8npc8MBr3R1H 2abjrlUj2xA9fMNbEJ/jU36/LZZ+Na2SPLeZY3WQn17ShKr+cOmHlj4TDvmlHb/TDWds TGEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714366097; x=1714970897; h=content-transfer-encoding:in-reply-to:references:subject:to :mime-version:user-agent:reply-to:from:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=840buJdFvlUr9oaDPhn/J4UETUpPVIvPJwqcj3xN5RQ=; b=mGvLnQVmOg+UgScKx/hxLpu8neLJ1y0FJjkaS8peaX2klIuoIB7ENKNswBfAz12b5Z RZT0KEOwR2p4lopNRlWEyAZSSkVqyq9ZmItbh0Cagh6STLQi0X181m7/c+e8g6qmWcDX nVquZnF1FI/oieGpuD4VW+xAjGHJrzaj5zN328mdUNHR9A8Err9zo7tgEM++UP79FZCi wj1Lw4+xrONUpk/mmcsc0OzQj9pEU1C5a2FVz86/Onp3PZLiq7Ry9y/AUQYtxpCB7lRP EQk0BXe9/FOnYK+E3vMh4YawJUcKk47ghm+gmUBCkOm4FGRtwY0rijRuet5mU+GMNFSK f3bg== X-Gm-Message-State: AOJu0YwS7G7p/SHLypWLwIInBT8KHDlwr7QlSwzMFaqZspWKdmbmB+ta s9Nt1pvnztILwXoNBp4xmNsIQJVoIPnhoHANQFBEb2Jr3mZB8tpwICgPOw== X-Received: by 2002:a05:6808:bcf:b0:3c8:4d79:3c5d with SMTP id o15-20020a0568080bcf00b003c84d793c5dmr12700020oik.47.1714366097025; Sun, 28 Apr 2024 21:48:17 -0700 (PDT) Message-ID: <662F268D.1050906@gmail.com> Date: Sun, 28 Apr 2024 23:48:13 -0500 From: Jacob Bachmeyer User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0 MIME-Version: 1.0 To: oss-security@lists.openwall.com References: <20240427234834.c0219029-fe37-49ef-a563-4d24eea118c2@korelogic.com> In-Reply-To: <20240427234834.c0219029-fe37-49ef-a563-4d24eea118c2@korelogic.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [oss-security] Update on the distro-backdoor-scanner effort Hank Leininger wrote: > On 2024-04-27, Jacob Bachmeyer wrote: > > >>> - Output is manageable; able to rule out all hits not part of the >>> actual xz-utils backdoors as false positives. >>> >> This is what I would expect: the backdoor dropper appears to have >> been specifically developed for xz-utils, but could /possibly/ be >> adaptable to other compression tools. >> > > Indeed, well my thinking was more along the lines of: "This is an > impressive amount of moving parts to be created new for this project, > and burned for just this project. What if what we are seeing is the 2nd > or 3rd gen of such a toolkit, and earlier ones have some similar > characteristics but maybe fewer layers, etc. so we could spot them?" I disagree here because, while I agree that there are quite a few moving parts, I also (having replicated unpacking the dropper shell scripts) see incremental development paths, and a few major mistakes that a more polished backdoor could have avoided. If this were a 2nd or 3rd generation toolkit, those mistakes would have been fixed. Overall, I think the "Jia Tan" crew went straight for the "Golden Key" and bit off /way/ more than they could chew. > [...] while the liblzma decompressor had > certain things going for it as a target, why not other things? (Nor am I > restricting to "other things that would get linked into sshd", but > really more broadly.) > I disagree with this assessment: the backdoor blob, according to reports so far, does not appear to actually affect xz or liblzma /at/ /all/; liblzma is merely used (as a dependency of libsystemd) to smuggle the blob into the sshd process and pass control into the blob during process initialization. From the reports that I have seen, the entrypoint that liblzma is patched to call does very little itself, and only modifies ld.so's data segment to half-register the blob with the LD_AUDIT framework so it will be called again later. Nearly all of the backdoor seems to be independent of liblzma. > [...] The > xz-utils stuff is many generations ahead of those; it would be > interesting to spot some missing links. > I suspect that the "missing links" you seek are to be found in various Windows malware over the years, at least as far as the binary backdoor blob is involved, but I have not been directly involved in that analysis. From the reports I have read, it seems to be more-or-less a piece of Windows malware adapted to an ELF environment. The adaptations also seem to be exactly the parts that went poorly, causing performance problems that led to the backdoor's discovery. (Or ld.so really is that much more efficient than the Windows module loader, and performance issues that are routinely lost in the noise on Windows got the "Jia Tan" crew caught.) The dropper scripts do not appear well-developed to me. Overall, they /do/ strike me as a first-generation effort, likely the authors' first shell scripts on top of that, and evidence a poor understanding of the GNU system. There are worse-than-beginner mistakes in there, like omitting the spaces around the '=' in a string comparison, and overall patterns that suggest a poor understanding of shell scripting. For example, a common idiom in shell programming is to use && and || for conditional execution of single commands, especially conditional exits; the dropper uses this idiom once, getting the test wrong (the aforementioned missing spaces), but then uses if/then/fi later for conditional exits. The dropper scripts also appear to be the product of copy-and-paste cargo cult programming: some commands are written as "[...] || true" even though their exit status will actually be ignored, suggesting that they were adapted from a Makefile or a script that runs with "set -e" in effect. Lastly, the outer dropper uses a linear series of head(1) commands to "skip 1KiB of garbage, extract 2KiB of backdoor, repeat" but simply reading the manuals would have revealed more appropriate tools for that---and a way to make the outer dropper independent of the inner dropper's length. >> You might get better results by indexing macro definitions found in >> *.m4 files, instead of trying to fuzzily hash the files. The >> interesting comparison is then different definitions of macros with >> the same name. >> > > I like this a lot as a potential next layer for the m4 reconciliation. > Essentially a field-level matching once things that match at a > file-level have been eliminated. I don't see why (he says, not actually > having dug into the m4 format much) we couldn't break apart all the m4s > we are choosing to consider known-good, and catalog each individual > macro, and then do the same when bashing project-specific files. Could > be we can entirely "clear" a file with an unknown checksum because it > consists 100% of idnividual macros that are known. More likely, you would be able to "clear" a file that actually differs in either whitespace or comments---or catch a file that has some nastiness inserted between macro definitions. > (Insert weird machine > here that combined OK macros in surprising ways.) > While m4 technically is Turing-complete, I strongly doubt the existence of m4 weird machines: m4 is a macro processor that performs text expansion. You should have a manual for it in the Info system. >> many (most?) modern Linux kernels are compressed using xz, which means >> that a Thompsonesque attack could binary-patch a freshly built kernel >> while compressing vmlinux to make vmlinuz. >> > > Good call. This may be far out on the.... "far out" scale, but it'd be > pretty trivial to harvest distro kernels that used xz to make their > vmlinuz, and then run those through multiple independent implementations > like we have done with .tar.xz files. Sold. > Make sure that at least one of those implementations is a wrapper around the xz-embedded decompresser that Linux itself uses. (As in, mmap() vmlinuz MAP_PRIVATE, mmap an output file at the appropriate virtual address, patch the jump into the kernel at the end of the decompression stub with a return, and call the embedded decompresser.) If you are concerned about weird machines, there is a tiny, paranoid, possibility that a tampered compressed stream would decompress "clean" with any other decompresser. I doubt that you will find anything, though. The "Jia Tan" crew does not seem to have been that advanced. >> The IFUNC mechanism is actually a security feature. In "inner-loop" >> code, having multiple implementations with different optimizations >> with the preferred implementation for the local processor chosen at >> runtime is fairly common. >> > > Thanks for this! I've seen that discussed as a (valid, useful) use of > IFUNC but also AFAIK things like musl don't implement such a thing, so > either software that wants it just doesn't support musl, or can't pick > an optimization, or does so in an undesirable way like writable pointers > in the data segment or... some other option? When IFUNC usage was added to xz-utils, the older "dispatch through a function pointer" technique was kept as a fallback when building without IFUNC. > [...] I'd be satisfied being able to say "IFUNCs are > used by 15 out of 10,000 packages; that's a small enough number we can > a) audit them all b) add alerting to tooling used for builds; when a > package suddenly starts using them, look into it." If the real number > turns out to be 1,000 out of 10,000, then that's good to know, and > probably give up. > Alerting should be simple matter of `grep -Ri ifunc .` at the top of the package tree. >> I currently suspect that the crackers used IFUNC support as a covert >> flag. The "jankiness" of the current glibc IFUNC implementation >> provided a convenient excuse to ask oss-fuzz to --disable-ifunc when >> building xz-utils, which *also* conveniently inhibited the backdoor >> dropper and ensured that the fuzzing builds would not contain the >> backdoor. >> > > Uncynically, I like this conspiracy theory. > Then I will go a step farther: I half-suspect that the entire CLMUL-based CRC calculation may have been added as an excuse to add the dispatch logic, which in turn was an excuse to use IFUNC. While I have not run tests, I suspect that the baseline table-driven CRC calculation is probably already I/O-bound on modern processors, especially x86-64 processors new enough to /have/ CLMUL. > [...] > I think Sam looked into existing pkg-config verifiers and found they do > not complain about things we thought they should complain about (this > could just mean we misunderstand their purpose). A strict lint-checker > for such files would be better than just checking for specific > suspicious patterns. But, I don't yet know how strict a format we could > insist on (would it turn out 10% of files in fact break what we > initially think are reasonable rules?). Even still, I think you could > embed badness in legit variables, although I haven't dug in enough to > know that for sure. > Quoting pkg-config(1): > Files have two kinds of line: keyword lines start with a keyword plus a > colon, and variable definitions start with an alphanumeric string plus > an equals sign. Keywords are defined in advance and have special > meaning to /pkg-config/; variables do not, you can have any variables > that you wish (however, users may expect to retrieve the usual > directory name variables). The format is clearly defined; any pkg-config file containing nonblank lines /not/ matching the above description should be rejected, and pkg-config itself should implement this check. This rule ensures that a pkg-config file cannot also contain shell commands. The sample backdoor also uses an *-uninstalled.pc file to override a legitimate file, but places the "uninstalled" file in a system directory, which means it is actually installed! The pkg-config program should also refuse to read *-uninstalled.pc files from system directories, and emit a warning if such a file is found to exist. -- Jacob