From: Andy Lutomirski
Date: Mon, 21 Sep 2020 17:58:20 -0700
Subject: Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
To: Pavel Begunkov
Cc: Andy Lutomirski, Arnd Bergmann, Christoph Hellwig, Al Viro,
 Andrew Morton, Jens Axboe, David Howells, linux-arm-kernel, X86 ML,
 LKML, "open list:MIPS", Parisc List, linuxppc-dev, linux-s390,
 sparclinux, linux-block, Linux SCSI List, Linux FS Devel, linux-aio,
 io-uring@vger.kernel.org, linux-arch, Linux-MM, Network Development,
 keyrings@vger.kernel.org, LSM List
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov wrote:
>
> On 22/09/2020 02:51, Andy Lutomirski wrote:
> > On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov wrote:
> >>
> >> On 21/09/2020 19:10, Pavel Begunkov wrote:
> >>> On 20/09/2020 01:22, Andy Lutomirski wrote:
> >>>>
> >>>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann wrote:
> >>>>>
> >>>>> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>>>>>>> Said that, why not provide a variant that would take an explicit
> >>>>>>>> "is it compat" argument and use it there? And have the normal
> >>>>>>>> one pass in_compat_syscall() to that...
> >>>>>>>
> >>>>>>> That would help to not introduce a regression with this series yes.
> >>>>>>> But it wouldn't fix existing bugs when io_uring is used to access
> >>>>>>> read or write methods that use in_compat_syscall(). One example that
> >>>>>>> I recently ran into is drivers/scsi/sg.c.
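Al Viro's suggestion is a common kernel refactoring: move the ABI-dependent work into a variant that takes the bitness as an explicit argument, and keep the existing entry point as a thin wrapper that passes in_compat_syscall(). A caller running in a kernel thread rather than a system call (the io_uring case raised in this thread) could then pass the bitness recorded for the ring instead. This is a userspace sketch under stated assumptions: put_user_long_explicit(), put_user_long() and the global current_is_compat are hypothetical names, and the stubbed in_compat_syscall() only reads that global, whereas the real kernel helper inspects the state of the current syscall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in so the sketch runs in userspace. In the kernel,
 * in_compat_syscall() reports whether the current syscall entered through
 * a compat (32-bit) ABI; here it only reads a global. */
static bool current_is_compat;

static bool in_compat_syscall(void)
{
	return current_is_compat;
}

/* Hypothetical worker with an explicit "is it compat" argument: writes a
 * long in the layout the chosen ABI expects and returns the bytes written.
 * A caller outside the submitter's syscall context (e.g. a kernel worker
 * thread) can pass a bitness that was recorded earlier. */
static size_t put_user_long_explicit(void *dst, int64_t val, bool compat)
{
	assert(dst != NULL);
	if (compat) {
		int32_t v32 = (int32_t)val;	/* compat ABI: 32-bit long */
		memcpy(dst, &v32, sizeof(v32));
		return sizeof(v32);
	}
	memcpy(dst, &val, sizeof(val));		/* native ABI: 64-bit long */
	return sizeof(val);
}

/* The normal entry point keeps its old behavior by passing
 * in_compat_syscall() to the explicit variant. */
static size_t put_user_long(void *dst, int64_t val)
{
	return put_user_long_explicit(dst, val, in_compat_syscall());
}
```

This keeps existing syscall-path callers unchanged, which is why it avoids a regression in the converted helper; as Christoph notes, it does nothing for read or write methods that call in_compat_syscall() themselves.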
> >>>>>
> >>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
> >>>>> and that one would in fact be broken by your patch in the hypothetical
> >>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
> >>>>>
> >>>>> For reference, I checked the socket timestamp handling that has a
> >>>>> number of corner cases with time32/time64 formats in compat mode,
> >>>>> but none of those appear to be affected by the problem.
> >>>>>
> >>>>>> Aside from the potentially nasty use of per-task variables, one thing
> >>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way. If we're
> >>>>>> going to have a generic mechanism for this, shouldn't we allow a full
> >>>>>> override of the syscall arch instead of just allowing forcing compat
> >>>>>> so that a compat syscall can do a non-compat operation?
> >>>>>
> >>>>> The only reason it's needed here is that the caller is in a kernel
> >>>>> thread rather than a system call. Are there any possible scenarios
> >>>>> where one would actually need the opposite?
> >>>>>
> >>>>
> >>>> I can certainly imagine needing to force x32 mode from a kernel thread.
> >>>>
> >>>> As for the other direction: what exactly are the desired bitness/arch
> >>>> semantics of io_uring? Is the operation bitness chosen by the io_uring
> >>>> creation or by the io_uring_enter() bitness?
> >>>
> >>> It's rather the second one. Even though AFAIR it wasn't discussed
> >>> specifically, that's how it works now (_partially_).
> >>
> >> Double checked -- I'm wrong, that's the former one. Most of it is based
> >> on a flag that was set at creation.
> >>
> >
> > Could we get away with making io_uring_enter() return -EINVAL (or
> > maybe -ENOTTY?) if you try to do it with bitness that doesn't match
> > the io_uring? And disable SQPOLL in compat mode?
>
> Something like below.
> If PF_FORCE_COMPAT or any other solution
> doesn't land in time, I'll take a look at whether other io_uring
> syscalls need similar checks, etc.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0458f02d4ca8..aab20785fa9a 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>         if (ctx->flags & IORING_SETUP_R_DISABLED)
>                 goto out;
>
> +       ret = -EINVAL;
> +       if (ctx->compat != in_compat_syscall())
> +               goto out;
> +

This seems entirely reasonable to me. Sharing an io_uring ring
between programs with different ABIs seems a bit nutty.

>         /*
>          * For SQ polling, the thread will do all submissions and completions.
>          * Just return the requested submit count, and wake the thread if
> @@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
>         if (ret)
>                 goto err;
>
> +       ret = -EINVAL;
> +       if (ctx->compat)
> +               goto err;
> +

I may be looking at a different kernel than you, but aren't you
preventing creating an io_uring regardless of whether SQPOLL is
requested?

>         /* Only gets the ring fd, doesn't install it in the file table */
>         fd = io_uring_get_fd(ctx, &file);
>         if (fd < 0) {
> --
> Pavel Begunkov