Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp4146276rwd; Tue, 23 May 2023 04:00:43 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7PKurhWfzKMP0r9JP0W5Y00/YimGd5O82VS8BzfWBqfPLEzERg4br0KpPbSdknFPwNVVdf X-Received: by 2002:a17:90a:1b6c:b0:250:ce6a:cf1a with SMTP id q99-20020a17090a1b6c00b00250ce6acf1amr13190590pjq.38.1684839643277; Tue, 23 May 2023 04:00:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684839643; cv=none; d=google.com; s=arc-20160816; b=fzxHv089mLyZTdYjSJ6qXHa8S7PqcNdRU1Xw9BpGzlHXwIi6iuuTzuTbDUxdJtg4h7 PiW5JTU4e/kINVEaxhwrOkq98Xa5PlZ5TBcjNKdzsZfONAeYcZIKxRQDjorSE1S2Zy9+ OKJf5J4/u34/bpFbEZCusB5UI5ciSLbk7WtSYTVJCHHqRcG2bzUnhMMz5irwijstfRSI E21+abhkYg6ERhr9JxZIWXhmhVd/bWCO55a9WOJkm2YafN3SK2kzC+o4FEN9CWHJ8X96 49kQ317V33EQWv1TXmT4gs5lzuMAA9PU8ERlL1Ac5TtHrqHnFdGsdH81OjMhSJTsNYbR 5dmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:message-id:user-agent :references:in-reply-to:subject:cc:to:from:date:mime-version :dkim-signature:dkim-filter; bh=RMMF/TIJahmFqopd2qxNaPFmrmsiIjxi/vD3Y9rzttM=; b=MgiVTvIl5zKBcZokElBqikgamB4ngNJpXq/gvbYBk4KPRaiGscL/yqfFUVBC4FLuCk 4/iPsZbAkJtdcy7mS1Hr36c59Rd+zqwYaMcGWOk9qK3M3OMcl/REkOq5a1ns/WfhUeTE XuIGkHDcsb5LW9VfHQuFg/jsXo8wRaEkUkDK4OpJxVFuY4ofMWOVnn04exdKkDJRjaRy BbnkSLlJPFjsjeMj0X7w/JUhgYpz09XHQzOVj8NhKcWIOjFlo01u7jwC02wTpwtFCFT+ cH6rkPVpAfXdwFlmjrblpO6Emrpcd+gCm9+S/1AAMMaJQDvW7ACWi6lKZh4dkHOykN7l QVTA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ispras.ru header.s=default header.b=rnA8PS7r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ispras.ru Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gv22-20020a17090b11d600b0024e0d0ebda1si8346989pjb.75.2023.05.23.04.00.29; Tue, 23 May 2023 04:00:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ispras.ru header.s=default header.b=rnA8PS7r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ispras.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235695AbjEWKxP (ORCPT + 99 others); Tue, 23 May 2023 06:53:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37216 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232450AbjEWKxO (ORCPT ); Tue, 23 May 2023 06:53:14 -0400 Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B75F011A for ; Tue, 23 May 2023 03:53:11 -0700 (PDT) Received: from mail.ispras.ru (unknown [83.149.199.84]) by mail.ispras.ru (Postfix) with ESMTPSA id 7EFDB4076265; Tue, 23 May 2023 10:53:09 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 mail.ispras.ru 7EFDB4076265 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ispras.ru; s=default; t=1684839189; bh=RMMF/TIJahmFqopd2qxNaPFmrmsiIjxi/vD3Y9rzttM=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=rnA8PS7r5OC0HUVXFetZ0ldVcZbaRzcXsw1p9kwWVk9I7Ciwxx5x1VwVF+JkuoknT yXUn/KV0QjUeCqwAdJ29jEGulbUk+7SfcBLnYgX/+2QNbxR4FzAWXypByPcEpBmwcu T4MPsQInZc8Dne4aUf0AXoqzspzjRzuNSmGQhR5Q= MIME-Version: 1.0 Date: Tue, 23 May 2023 13:53:09 +0300 From: Alexey Izbyshev To: David Hildenbrand Cc: Florent Revest , linux-kernel@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, catalin.marinas@arm.com, anshuman.khandual@arm.com, joey.gouly@arm.com, mhocko@suse.com, keescook@chromium.org, peterx@redhat.com, broonie@kernel.org, szabolcs.nagy@arm.com, kpsingh@kernel.org, gthelen@google.com, toiwoton@gmail.com Subject: Re: [PATCH v2 3/5] mm: Make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long In-Reply-To: References: <20230517150321.2890206-1-revest@chromium.org> <20230517150321.2890206-4-revest@chromium.org> <884d131bbc28ebfa0b729176e6415269@ispras.ru> <3c2e210b75bd56909322e8a3e5086d91@ispras.ru> User-Agent: Roundcube Webmail/1.4.4 Message-ID: <7c572622c0d8e283fc880fe3f4ffac27@ispras.ru> X-Sender: izbyshev@ispras.ru Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,PDS_BTC_ID,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2023-05-23 12:12, David Hildenbrand wrote: > On 22.05.23 20:58, Alexey Izbyshev wrote: >> On 2023-05-22 19:22, David Hildenbrand wrote: >>> On 22.05.23 12:35, Alexey Izbyshev wrote: >>>> On 2023-05-22 11:55, David Hildenbrand wrote: >>>>> On 17.05.23 17:03, Florent Revest wrote: >>>>>> Alexey pointed out that defining a prctl flag as an int is a >>>>>> footgun >>>>>> because, under some circumstances, when used as a flag to prctl, >>>>>> it >>>>>> can >>>>>> be casted to long with garbage upper bits which would result in >>>>>> unexpected behaviors. >>>>>> >>>>>> This patch changes the constant to a UL to eliminate these >>>>>> possibilities. >>>>>> >>>>>> Signed-off-by: Florent Revest >>>>>> Suggested-by: Alexey Izbyshev >>>>>> --- >>>>>> include/uapi/linux/prctl.h | 2 +- >>>>>> tools/include/uapi/linux/prctl.h | 2 +- >>>>>> 2 files changed, 2 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/include/uapi/linux/prctl.h >>>>>> b/include/uapi/linux/prctl.h >>>>>> index f23d9a16507f..6e9af6cbc950 100644 >>>>>> --- a/include/uapi/linux/prctl.h >>>>>> +++ b/include/uapi/linux/prctl.h >>>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map { >>>>>> /* Memory deny write / execute */ >>>>>> #define PR_SET_MDWE 65 >>>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN 1 >>>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0) >>>>>> #define PR_GET_MDWE 66 >>>>>> diff --git a/tools/include/uapi/linux/prctl.h >>>>>> b/tools/include/uapi/linux/prctl.h >>>>>> index 759b3f53e53f..6e6563e97fef 100644 >>>>>> --- a/tools/include/uapi/linux/prctl.h >>>>>> +++ b/tools/include/uapi/linux/prctl.h >>>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map { >>>>>> /* Memory deny write / execute */ >>>>>> #define PR_SET_MDWE 65 >>>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN 1 >>>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0) >>>>>> #define PR_GET_MDWE 66 >>>>>> >>>>> >>>>> Both are changing existing uapi, so you'll already have existing >>>>> user >>>>> space using the old values, that your kernel code has to deal with >>>>> no? >>>> >>>> I'm the one who suggested this change, so I feel the need to >>>> clarify. >>>> >>>> For any existing 64-bit user space code using the kernel and the >>>> uapi >>>> headers before this patch and doing the wrong prctl(PR_SET_MDWE, >>>> PR_MDWE_REFUSE_EXEC_GAIN) call instead of the correct >>>> prctl(PR_SET_MDWE, >>>> (unsigned long)PR_MDWE_REFUSE_EXEC_GAIN), there are two >>>> possibilities >>>> when prctl() implementation extracts the second argument via >>>> va_arg(op, >>>> unsigned long): >>>> >>>> * It gets lucky, and the upper 32 bits of the argument are zero. The >>>> call does what is expected by the user. >>>> >>>> * The upper 32 bits are non-zero junk. The flags argument is >>>> rejected >>>> by >>>> the kernel, and the call fails with EINVAL (unexpectedly for the >>>> user). >>>> >>>> This change is intended to affect only the second case, and only >>>> after >>>> the program is recompiled with the new uapi headers. The currently >>>> wrong, but naturally-looking prctl(PR_SET_MDWE, >>>> PR_MDWE_REFUSE_EXEC_GAIN) call becomes correct. >>>> >>>> The kernel ABI is unaffected by this change, since it has been >>>> defined >>>> in terms of unsigned long from the start. >>> >>> The thing I'm concerned about is the following: old user space (that >>> would fail) on new kernel where we define some upper 32bit to >>> actually >>> have a meaning (where it would succeed with wrong semantics). >>> >>> IOW, can we ever really "use" these upper 32bit, or should we instead >>> only consume the lower 32bit in the kernel and effectively ignore the >>> upper 32bit? >>> >> I see, thanks. But I think this question is mostly independent from >> this >> patch. The patch removes a footgun, but it doesn't change the flags >> check in the kernel, and nothing stops the user from doing >> >> int flags = PR_MDWE_REFUSE_EXEC_GAIN; >> prctl(PR_SET_MDWE, flags); >> >> So we have to decide whether to ignore the upper 32 bits or not even >> if >> this patch is not applied (actually *had to* when PR_SET_MDWE op was >> being added). > > Well, an alternative to this patch would be to say "well, for this > prctl we ignore any upper 32bit. Then, this change would not be > needed. Yes, I also don't like that :) > > Bu unrelated, I looked at some other random prctl. > > #define SUID_DUMP_USER 1 > > And in kernel/sys.c: > > case PR_SET_DUMPABLE: > if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER) > ... > > Wouldn't that also suffer from the same issue, or how is this > different? > Yes, it is the same issue, so e.g. prctl(PR_SET_DUMPABLE, SUID_DUMP_DISABLE ) may wrongly fail with EINVAL on 64-bit targets. > Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We > need arg2 -> arg5 to be 0. But wouldn't the following also just pass a > 0 "int" ? > > prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0) > Yes, this is not reliable on 64-bit targets too. The simplest fix is to use "0L", as done in MDWE self-tests (but many other tests get this wrong). Florent also expressed surprise[1] that we don't see a lot of failures due to such issues, and I tried to provide some reasons. To elaborate on the x86-64 thing, for prctl(PR_SET_DUMPABLE, 0) the compiler will likely generate "xorl %esi, %esi" to pass zero, but this instruction will also clear the upper 32 bits of %rsi, so the problem is masked (and I believe CPU vendors are motivated to do such zeroing to reduce false dependencies). But this zeroing is not required by the ABI, so in a more complex situation junk might get through. Real-world examples of very similar breakage in variadic functions involving NULL sentinels are mentioned in [2] (the musl bug report is [3]). In short, musl defined NULL as plain 0 for C++, so when people do e.g. execl("/bin/true", "true", NULL), junk might prevent detection of the sentinel in execl() impl. (Though if the sentinel is passed via stack because there are a lot of preceding arguments, the breakage becomes more apparent because auto-zeroing of registers doesn't come into play anymore.) > > I'm easily confused by such (va_args) things, so sorry for the dummy > questions. This stuff *is* confusing, and note that Linux man pages don't even tell that prctl() is actually declared as a variadic function (and for ptrace() this is mentioned only in the notes, but not in its signature). Thanks, Alexey [1] https://lore.kernel.org/lkml/3a38319a3b241e578729ffa5484ad24b@ispras.ru [2] https://ewontfix.com/11 [3] https://www.openwall.com/lists/musl/2013/01/09/1