Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp2809641pxb; Sun, 15 Nov 2020 19:08:19 -0800 (PST) X-Google-Smtp-Source: ABdhPJwiB1A48HvNbMLrrD8siCr+7qg38E/dS0GfCIULl8exvgugyT+dJ8bOfZ9uSA3LNG1UpZjw X-Received: by 2002:a05:6402:1352:: with SMTP id y18mr13525632edw.378.1605496099645; Sun, 15 Nov 2020 19:08:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1605496099; cv=none; d=google.com; s=arc-20160816; b=kQ6wF4SBFcOTHR32BE/GkRro3fq+XIHH2tzo9zfBollW7pZaUIkUjk8jmZC6WBuvkc Wyl4fDxcx+EDXRPj3EFdayfcC4JW/dqKR5L7KMrKn023cztq6C4yfrTpq0VuTKOmPYFb wfUXGJIFJQxPV1CBLleAEhIKXRLP37ntcj+cof+/eglYai4Kr2AUWSXLLE3e9scp7jTo P1hOqsYy6AI2rZeBBuCPx8fimjbhAIWPLWirjrNQ6f8+iVQZX2O/j4Zl2Ag7GA/8pfLh usVc3+Qrs6CAmTQE94CkaHKl6eV1Y0iFGEH74zoad/aM2Z51o4FILEMcxcrrwXQN/PzL rU9g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature; bh=gJxn6sq6qhon1acm2iYoh2wPDnbLeK+vN/VWPc4HSoI=; b=nbQJntRUSc4Fcdj3ALzoywxHQQ8DEsLQsPo0fUH+KEyUci1RPn/C8A2xZUenqmZAzc A/mc05uqVV4P4YfowQnP//Pv1odUVuHWBWCfLDacRVZ3c615hoG1eQjZGhfBGUVsTwsR Lb0Va3rORd8vvnx/FuwL3jw4UBNlu7bGvKu5Xi4sw5tYqQ6EN/HS0Pyp5KcmE9kG3eAI JRlQQbnK+NNAgpSjuhQwBNJIrA0CVHZX23P6scM6J//1N7p6rYU5A568BEv/UMeZ3tY9 pjFrQwGiJ03XG00G44+SXNRYnYxPeOe44kQR2bwR6WJbUXjPcrBlYb9b5OcX2Fy8SzAe xAXA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@jrtc27.com header.s=gmail.jrtc27.user header.b=coGc1PVR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id os24si10845656ejb.272.2020.11.15.19.07.57; Sun, 15 Nov 2020 19:08:19 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@jrtc27.com header.s=gmail.jrtc27.user header.b=coGc1PVR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727936AbgKPAzK (ORCPT + 99 others); Sun, 15 Nov 2020 19:55:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60600 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725981AbgKPAzJ (ORCPT ); Sun, 15 Nov 2020 19:55:09 -0500 Received: from mail-wr1-x441.google.com (mail-wr1-x441.google.com [IPv6:2a00:1450:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 938F5C0613CF for ; Sun, 15 Nov 2020 16:55:09 -0800 (PST) Received: by mail-wr1-x441.google.com with SMTP id r17so16971178wrw.1 for ; Sun, 15 Nov 2020 16:55:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jrtc27.com; s=gmail.jrtc27.user; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=gJxn6sq6qhon1acm2iYoh2wPDnbLeK+vN/VWPc4HSoI=; b=coGc1PVRPjuIFI2jeppkSKXeBYOeMb/8VA4zIWi0XOUmssmUrSPESOCPtxZ79EKh2p lRzdN2Mkp3B+9N3fX7Enax2EKUl4qvqi3NTvFrxqiBgLQfhWfqaU7A+bE5L6tvTjWQLW /zqmtVmanB+QUrTNKHEeLQVLl3PnvFYIe8NnVhGVt9kJgCb9kLUkya1yeV0u/Y9q/Ire cUbPKK+zmP5bkVAREqFKaQy4RQri1WJfTwrWwCirov18u3YOp+H6qXT/2RqlvIqci7C+ y23LNOyjJI6H6ulWL6c/zJpF15EuRjvaazDslKb1SQgYMiSD7AK1+g2qAe/RXcBo0qit uA8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=gJxn6sq6qhon1acm2iYoh2wPDnbLeK+vN/VWPc4HSoI=; b=uFFnjVp1GpCNmt8BJBcNM/7EhqQLrihnxe9tYQkotgLcg9qtBhLXH4e5E1DtziX38Y K7sMCkLUlX4DIEUPk0CNbqn1D+0A6Eo7LLuFS5/9lKatKPpCQCUkig+MyBlkSEwhZLas lJiWdgXW1mPxD8NKvWFeOToQUIWQiRc2031vaRCg6Dz251YuMFoZ9RIlfk14zdpPdaHb GDJneX1hlEvKILXas1oXQsGu6f4Ox/YlV7xLbEhA1R7zvdXNhDRfhdwa91i8v258mLDO GMsxZ8pBdHnyTW7e8ks0VvWDj7SzGIGnJi6DS5W0z7+T5gu5jhZb0VvqwI533hSBXnlO RA8A== X-Gm-Message-State: AOAM533xJjeCBHr4TJvZ/Ikev2wdNt5nn1UllNchgI3NIPowtSxFP9hq jZnUQBy0Ks9GwsTiMva1/kItDQ== X-Received: by 2002:a05:6000:182:: with SMTP id p2mr17712466wrx.116.1605488108155; Sun, 15 Nov 2020 16:55:08 -0800 (PST) Received: from [192.168.149.251] (trinity-students-nat.trin.cam.ac.uk. [131.111.193.104]) by smtp.gmail.com with ESMTPSA id c6sm21928315wrh.74.2020.11.15.16.55.06 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 15 Nov 2020 16:55:07 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\)) Subject: Re: [PATCH v2] x86: Fix x32 System V message queue syscalls From: Jessica Clarke In-Reply-To: <20201101210102.GO534@brightrain.aerifal.cx> Date: Mon, 16 Nov 2020 00:55:05 +0000 Cc: Andy Lutomirski , Florian Weimer , linux-x86_64@vger.kernel.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , X86 ML , "H. Peter Anvin" , LKML Content-Transfer-Encoding: quoted-printable Message-Id: <29423184-A433-42D4-B635-CDEFE7271B40@jrtc27.com> References: <1156938F-A9A3-4EE9-B059-2294A0B9FBFE@jrtc27.com> <20201012134444.1905-1-jrtc27@jrtc27.com> <20201101012202.GM534@brightrain.aerifal.cx> <7842A462-0ADB-4EE3-B4CB-AE6DCD70CE1C@jrtc27.com> <20201101015013.GN534@brightrain.aerifal.cx> <04832096-ED7F-4754-993D-F578D4A90843@jrtc27.com> <20201101210102.GO534@brightrain.aerifal.cx> To: Rich Felker X-Mailer: Apple Mail (2.3608.120.23.2.4) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1 Nov 2020, at 21:01, Rich Felker wrote: >=20 > On Sun, Nov 01, 2020 at 06:27:10PM +0000, Jessica Clarke wrote: >> On 1 Nov 2020, at 18:15, Jessica Clarke wrote: >>>=20 >>> On 1 Nov 2020, at 18:07, Andy Lutomirski wrote: >>>>=20 >>>> On Sat, Oct 31, 2020 at 6:50 PM Rich Felker = wrote: >>>>>=20 >>>>> On Sun, Nov 01, 2020 at 01:27:35AM +0000, Jessica Clarke wrote: >>>>>> On 1 Nov 2020, at 01:22, Rich Felker wrote: >>>>>>> On Sat, Oct 31, 2020 at 04:30:44PM -0700, Andy Lutomirski wrote: >>>>>>>> cc: some libc folks >>>>>>>>=20 >>>>>>>> On Mon, Oct 12, 2020 at 6:45 AM Jessica Clarke = wrote: >>>>>>>>>=20 >>>>>>>>> POSIX specifies that the first field of the supplied msgp, = namely mtype, >>>>>>>>> is a long, not a __kernel_long_t, and it's a user-defined = struct due to >>>>>>>>> the variable-length mtext field so we can't even bend the spec = and make >>>>>>>>> it a __kernel_long_t even if we wanted to. Thus we must use = the compat >>>>>>>>> syscalls on x32 to avoid buffer overreads and overflows in = msgsnd and >>>>>>>>> msgrcv respectively. >>>>>>>>=20 >>>>>>>> This is a mess. >>>>>>>>=20 >>>>>>>> include/uapi/linux/msg.h has: >>>>>>>>=20 >>>>>>>> /* message buffer for msgsnd and msgrcv calls */ >>>>>>>> struct msgbuf { >>>>>>>> __kernel_long_t mtype; /* type of message */ >>>>>>>> char mtext[1]; /* message text */ >>>>>>>> }; >>>>>>>>=20 >>>>>>>> Your test has: >>>>>>>>=20 >>>>>>>> struct msg_long { >>>>>>>> long mtype; >>>>>>>> char mtext[8]; >>>>>>>> }; >>>>>>>>=20 >>>>>>>> struct msg_long_ext { >>>>>>>> struct msg_long msg_long; >>>>>>>> char mext[4]; >>>>>>>> }; >>>>>>>>=20 >>>>>>>> and I'm unclear as to exactly what you're trying to do there = with the >>>>>>>> "mext" part. >>>>>>>>=20 >>>>>>>> POSIX says: >>>>>>>>=20 >>>>>>>> The application shall ensure that the argument msgp points = to a user- >>>>>>>> defined buffer that contains first a field of type long = specifying the >>>>>>>> type of the message, and then a data portion that holds the = data bytes >>>>>>>> of the message. The structure below is an example of what = this user-de=E2=80=90 >>>>>>>> fined buffer might look like: >>>>>>>>=20 >>>>>>>> struct mymsg { >>>>>>>> long mtype; /* Message type. */ >>>>>>>> char mtext[1]; /* Message text. */ >>>>>>>> } >>>>>>>>=20 >>>>>>>> NTP has this delightful piece of code: >>>>>>>>=20 >>>>>>>> 44 typedef union { >>>>>>>> 45 struct msgbuf msgp; >>>>>>>> 46 struct { >>>>>>>> 47 long mtype; >>>>>>>> 48 int code; >>>>>>>> 49 struct timeval tv; >>>>>>>> 50 } msgb; >>>>>>>> 51 } MsgBuf; >>>>>>>>=20 >>>>>>>> bluefish has: >>>>>>>>=20 >>>>>>>> struct small_msgbuf { >>>>>>>> long mtype; >>>>>>>> char mtext[MSQ_QUEUE_SMALL_SIZE]; >>>>>>>> } small_msgp; >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> My laptop has nothing at all in /dev/mqueue. >>>>>>>>=20 >>>>>>>> So I don't really know what the right thing to do is. = Certainly if >>>>>>>> we're going to apply this patch, we should also fix the header. = I >>>>>>>> almost think we should *delete* struct msgbuf from the headers, = since >>>>>>>> it's all kinds of busted, but that will break the NTP build. = Ideally >>>>>>>> we would go back in time and remove it from the headers. >>>>>>>>=20 >>>>>>>> Libc people, any insight? We can probably fix the bug without >>>>>>>> annoying anyone given how lightly x32 is used and how lightly = POSIX >>>>>>>> message queues are used. >>>>>>>=20 >>>>>>> If it's that outright wrong and always has been, I feel like the = old >>>>>>> syscall numbers should just be deprecated and new ones assigned. >>>>>>> Otherwise, there's no way for userspace to be safe against data >>>>>>> corruption when run on older kernels. If there's a new syscall = number, >>>>>>> libc can just use the new one unconditionally (giving ENOSYS on >>>>>>> kernels where it would be broken) or have a x32-specific >>>>>>> implementation that makes the old syscall and performs = translation if >>>>>>> the new one fails with ENOSYS. >>>>>>=20 >>>>>> That doesn't really help broken code continue to work reliably, = as >>>>>> upgrading libc will just pull in the new syscall for a binary = that's >>>>>> expecting the broken behaviour, unless you do symbol versioning, = but >>>>>> then it'll just break when you next recompile the code, and = there's no >>>>>> way for that to be diagnosed given the *application* has to = define the >>>>>> type. But given it's application-defined I really struggle to see = how >>>>>> any code out there is actually expecting the current x32 = behaviour as >>>>>> you'd have to go really out of your way to find out that x32 is = broken >>>>>> and needs __kernel_long_t. I don't think there's any way around = just >>>>>> technically breaking ABI whilst likely really fixing ABI in = 99.999% of >>>>>> cases (maybe 100%). >>>>>=20 >>>>> I'm not opposed to "breaking ABI" here because the current syscall >>>>> doesn't work unless someone wrote bogus x32-specific code to work >>>>> around it being wrong. I don't particularly want to preserve any = of >>>>> the current behavior. >>>>>=20 >>>>> What I am somewhat opposed to is making a situation where an = updated >>>>> libc can't be safe against getting run on a kernel with a broken >>>>> version of the syscall and silently corrupting data. I'm flexible >>>>> about how avoiding tha tis achieved. >>>>=20 >>>> If we're sufficiently confident that we won't regress anything by >>>> fixing the bug, I propose we do the following. First, we commit a = fix >>>> that's Jessica's patch plus a fix to struct msghdr, and we mark = that >>>> for -stable. Then we commit another patch that removes 'struct >>>> msghdr' from uapi entirely, but we don't mark that for -stable. If >>>> people complain about the latter, we revert it. >>>=20 >>> Thinking about this more, MIPS n32 is also affected by that header. = In >>> fact the n32 syscalls currently do the right thing and use the = compat >>> implementations, so the header is currently out-of-sync with the = kernel >>> there*. This should be noted when committing the change to msg.h. >>=20 >> Never mind, it seems MIPS n32 is weird and leaves __kernel_long_t as = a >> normal long despite being an ILP32-on-64-bit ABI, I guess because = it's >> inherited from IRIX rather than being invented by the GNU world. >=20 > Yes, the whole __kernel_long_t invention is largely x32-only (maybe > theoretically on aarch64-ilp32 too? if that even really exists?) and > is pretty much entirely a mistake from lacking the proper > infrastructure to do time64 when x32 was introduced (note that n32 has > 32-bit old-time_t). I hope effort will be made to keep the same > mistake from creeping into future ilp32-on-64 ABIs if there are any. Ping? Does anyone have further thoughts/are people happy for this to land? Jess