Received: by 2002:a05:6a10:9e8c:0:0:0:0 with SMTP id y12csp2794825pxx; Sun, 1 Nov 2020 10:10:43 -0800 (PST) X-Google-Smtp-Source: ABdhPJwD1dfZQcDJkWrGrHo8OZbxL3pq6BBVChIrO+l/OccSw4C0hQmOWRlb59AbN6fdNQsf1Rnr X-Received: by 2002:aa7:dc12:: with SMTP id b18mr13003845edu.295.1604254242891; Sun, 01 Nov 2020 10:10:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604254242; cv=none; d=google.com; s=arc-20160816; b=YJcphPbMLpYKrViOLoCfPym2sHaHgd9CflHUN1L35TRZHjtBjPIcV3SzeWOBwUZCPF UcR1kw6fc6v3ZyzQ+YBqcWcybfnZeUtguvI9WRu3ERZgGezfJY1R0Ytz7V4NKn4nVgBN 2XDleMTut3Rj6KQ5JanZf6lGma+0R8SmT8JGb2M8n9kstKLnN5sJDGnRjXFTigm4MqWd aLp1zD6G1ld5H1V/lUE/mN2YD6x6IsKNiZhnAI1QrThedCTdq2UreBnLtGNNkDjdLHWf zz69R959EQoF0wW9nSH7tbV7sN2D/FS72czbjoXEGcSOL/yqRhXUqMaJqtmT7yp6f5Qn UntQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=UVed8ayiVwLeYquHeWopE8OlBBgVTp0O9zzfPtxAJ7Q=; b=eLEZMT5CPIG+dNxGM2npXUEe5x/ZRxZpOt6cqSYHFTMuTOxgl8pylIJLgon4oTUezs 86+ZOuUHPQqv9OGIL54wpucmSrIOii0J2g94ai5upg+O7XZqCEt5XkqOnoQ6JiuCF9dh VE32bhdjRPlVkGERkLrpmsNm0FAXgFFATub4Jbn4vxA8kCC0+qNE7t2DE+42S7fX9fcP Zj7CDqJip2wn/Rj8Q0sODQoFMZIl3NBaZdaxRkNeUA+mKWwtdV1lz0UxkCmLn8a7qt5R jgCERBtAHxQRcVxHQS6Ne4QV0G4ht1Y5r005Gkn/XAcCXepWCdARcWJWghi8Pw6O2XB7 djzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=ECSincph; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h16si1243229ejd.579.2020.11.01.10.10.18; Sun, 01 Nov 2020 10:10:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=ECSincph; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727233AbgKASID (ORCPT + 99 others); Sun, 1 Nov 2020 13:08:03 -0500 Received: from mail.kernel.org ([198.145.29.99]:32954 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727111AbgKASIC (ORCPT ); Sun, 1 Nov 2020 13:08:02 -0500 Received: from mail-wr1-f48.google.com (mail-wr1-f48.google.com [209.85.221.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id AF9FA22253 for ; Sun, 1 Nov 2020 18:08:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1604254082; bh=OfBjCIPtc0TxbS1/4RHTocGxkKTL3vnRTUDz7fBxN1s=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=ECSincphbOM5qbSu3Xdc6BUpyip/6hXudBbbPlAJm8riQGC6eQ60ewr48wfyZIStO 1hd8Y7QABAoZY/PQ3+O91ngG/ORXZQYzClemLRo0Jg1zvBTIU9kVlgbfzELq2lBsvH UyDyOe/bxmlP0z9RKlZDIXyeOXFEmpT80xEoMkJY= Received: by mail-wr1-f48.google.com with SMTP id n15so12033086wrq.2 for ; Sun, 01 Nov 2020 10:08:01 -0800 (PST) X-Gm-Message-State: AOAM531LealxlE3k35J2ZPSHpB3Hq4n0yD3T2CwmqaEiHWh0/OIVssQ6 ijHhKEBlylSTEMWoqo8GHJ8P0zqMHgmTURY7G6IZWw== X-Received: by 2002:adf:e682:: with SMTP id r2mr15522981wrm.184.1604254080252; Sun, 01 Nov 2020 10:08:00 -0800 (PST) MIME-Version: 1.0 References: <1156938F-A9A3-4EE9-B059-2294A0B9FBFE@jrtc27.com> <20201012134444.1905-1-jrtc27@jrtc27.com> <20201101012202.GM534@brightrain.aerifal.cx> <7842A462-0ADB-4EE3-B4CB-AE6DCD70CE1C@jrtc27.com> <20201101015013.GN534@brightrain.aerifal.cx> In-Reply-To: <20201101015013.GN534@brightrain.aerifal.cx> From: Andy Lutomirski Date: Sun, 1 Nov 2020 10:07:48 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2] x86: Fix x32 System V message queue syscalls To: Rich Felker Cc: Jessica Clarke , Andy Lutomirski , Florian Weimer , linux-x86_64@vger.kernel.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , X86 ML , "H. Peter Anvin" , LKML Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Oct 31, 2020 at 6:50 PM Rich Felker wrote: > > On Sun, Nov 01, 2020 at 01:27:35AM +0000, Jessica Clarke wrote: > > On 1 Nov 2020, at 01:22, Rich Felker wrote: > > > On Sat, Oct 31, 2020 at 04:30:44PM -0700, Andy Lutomirski wrote: > > >> cc: some libc folks > > >> > > >> On Mon, Oct 12, 2020 at 6:45 AM Jessica Clarke w= rote: > > >>> > > >>> POSIX specifies that the first field of the supplied msgp, namely m= type, > > >>> is a long, not a __kernel_long_t, and it's a user-defined struct du= e to > > >>> the variable-length mtext field so we can't even bend the spec and = make > > >>> it a __kernel_long_t even if we wanted to. Thus we must use the com= pat > > >>> syscalls on x32 to avoid buffer overreads and overflows in msgsnd a= nd > > >>> msgrcv respectively. > > >> > > >> This is a mess. > > >> > > >> include/uapi/linux/msg.h has: > > >> > > >> /* message buffer for msgsnd and msgrcv calls */ > > >> struct msgbuf { > > >> __kernel_long_t mtype; /* type of message */ > > >> char mtext[1]; /* message text */ > > >> }; > > >> > > >> Your test has: > > >> > > >> struct msg_long { > > >> long mtype; > > >> char mtext[8]; > > >> }; > > >> > > >> struct msg_long_ext { > > >> struct msg_long msg_long; > > >> char mext[4]; > > >> }; > > >> > > >> and I'm unclear as to exactly what you're trying to do there with th= e > > >> "mext" part. > > >> > > >> POSIX says: > > >> > > >> The application shall ensure that the argument msgp points to = a user- > > >> defined buffer that contains first a field of type long speci= fying the > > >> type of the message, and then a data portion that holds the da= ta bytes > > >> of the message. The structure below is an example of what this= user-de=E2=80=90 > > >> fined buffer might look like: > > >> > > >> struct mymsg { > > >> long mtype; /* Message type. */ > > >> char mtext[1]; /* Message text. */ > > >> } > > >> > > >> NTP has this delightful piece of code: > > >> > > >> 44 typedef union { > > >> 45 struct msgbuf msgp; > > >> 46 struct { > > >> 47 long mtype; > > >> 48 int code; > > >> 49 struct timeval tv; > > >> 50 } msgb; > > >> 51 } MsgBuf; > > >> > > >> bluefish has: > > >> > > >> struct small_msgbuf { > > >> long mtype; > > >> char mtext[MSQ_QUEUE_SMALL_SIZE]; > > >> } small_msgp; > > >> > > >> > > >> My laptop has nothing at all in /dev/mqueue. > > >> > > >> So I don't really know what the right thing to do is. Certainly if > > >> we're going to apply this patch, we should also fix the header. I > > >> almost think we should *delete* struct msgbuf from the headers, sinc= e > > >> it's all kinds of busted, but that will break the NTP build. Ideall= y > > >> we would go back in time and remove it from the headers. > > >> > > >> Libc people, any insight? We can probably fix the bug without > > >> annoying anyone given how lightly x32 is used and how lightly POSIX > > >> message queues are used. > > > > > > If it's that outright wrong and always has been, I feel like the old > > > syscall numbers should just be deprecated and new ones assigned. > > > Otherwise, there's no way for userspace to be safe against data > > > corruption when run on older kernels. If there's a new syscall number= , > > > libc can just use the new one unconditionally (giving ENOSYS on > > > kernels where it would be broken) or have a x32-specific > > > implementation that makes the old syscall and performs translation if > > > the new one fails with ENOSYS. > > > > That doesn't really help broken code continue to work reliably, as > > upgrading libc will just pull in the new syscall for a binary that's > > expecting the broken behaviour, unless you do symbol versioning, but > > then it'll just break when you next recompile the code, and there's no > > way for that to be diagnosed given the *application* has to define the > > type. But given it's application-defined I really struggle to see how > > any code out there is actually expecting the current x32 behaviour as > > you'd have to go really out of your way to find out that x32 is broken > > and needs __kernel_long_t. I don't think there's any way around just > > technically breaking ABI whilst likely really fixing ABI in 99.999% of > > cases (maybe 100%). > > I'm not opposed to "breaking ABI" here because the current syscall > doesn't work unless someone wrote bogus x32-specific code to work > around it being wrong. I don't particularly want to preserve any of > the current behavior. > > What I am somewhat opposed to is making a situation where an updated > libc can't be safe against getting run on a kernel with a broken > version of the syscall and silently corrupting data. I'm flexible > about how avoiding tha tis achieved. If we're sufficiently confident that we won't regress anything by fixing the bug, I propose we do the following. First, we commit a fix that's Jessica's patch plus a fix to struct msghdr, and we mark that for -stable. Then we commit another patch that removes 'struct msghdr' from uapi entirely, but we don't mark that for -stable. If people complain about the latter, we revert it. After all, we could argue that the x32 bug here isn't really that different from any of the huge number of various syscall bugs we've had in the past and should just be fixed in the kernel. If you run user programs on a buggy kernel, you get buggy behavior... --Andy