Date: Sun, 20 Sep 2020 19:07:42 +0100
From: Al Viro
To: Matthew Wilcox
Cc: Christoph Hellwig, Andrew Morton, Jens Axboe, Arnd Bergmann,
	David Howells, linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, sparclinux@vger.kernel.org,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
	io-uring@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, netdev@vger.kernel.org, keyrings@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
Message-ID: <20200920180742.GN3421308@ZenIV.linux.org.uk>
References: <20200918124533.3487701-1-hch@lst.de>
	<20200918124533.3487701-2-hch@lst.de>
	<20200920151510.GS32101@casper.infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200920151510.GS32101@casper.infradead.org>
Sender: Al Viro
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Sep 20, 2020 at 04:15:10PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 18, 2020 at 02:45:25PM +0200, Christoph Hellwig wrote:
> > Add a flag to force processing a syscall as a compat syscall. This is
> > required so that in_compat_syscall() works for I/O submitted by io_uring
> > helper threads on behalf of compat syscalls.
>
> Al doesn't like this much, but my suggestion is to introduce two new
> opcodes -- IORING_OP_READV32 and IORING_OP_WRITEV32. The compat code
> can translate IORING_OP_READV to IORING_OP_READV32 and then the core
> code can know what that user pointer is pointing to.

Let's separate two issues:
	1) compat syscalls want 32bit iovecs. Nothing to do with the
drivers, dealt with just fine.
	2) a few drivers are really fucked in head. They use different
*DATA* layouts for reads/writes, depending upon the calling process.
IOW, if you fork/exec a 32bit binary and your stdin is one of those,
reads from stdin in parent and child will yield different data layouts.
On the same struct file.
	That's what Christoph worries about (/dev/sg he'd mentioned is
one of those).

IMO we should simply have that dozen or so of pathological files
marked with FMODE_SHITTY_ABI; it's not about how they'd been opened -
it describes the userland ABI provided by those. And it's cast in stone.

Any in_compat_syscall() in ->read()/->write() instances is an ABI
bug, plain and simple. Some are unfixable for compatibility reasons, but
any new caller like that should be a big red flag.

How we import iovec array is none of the drivers' concern; we do
not need to mess with in_compat_syscall() reporting the matching value,
etc. for that.
It's about the instances that want in_compat_syscall() to
decide between the 32bit and 64bit data layouts. And I believe that
we should simply have them marked as such and rejected by io_uring. With
any new occurrences getting slapped down hard.

Current list of those turds:
/dev/sg (pointer-chasing, generally insane)
/sys/firmware/efi/vars/*/raw_var (fucked binary structure)
/sys/firmware/efi/vars/new_var (fucked binary structure)
/sys/firmware/efi/vars/del_var (fucked binary structure)
/dev/uhid (pointer-chasing for one obsolete command)
/dev/input/event* (timestamps)
/dev/uinput (timestamps)
/proc/bus/input/devices (fucked bitmap-to-text representation)
/sys/class/input/*/capabilities/* (fucked bitmap-to-text representation)
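
For illustration only, here is a minimal userspace sketch of the "mark them and reject them" idea. It is not kernel code and not a patch. Only the FMODE_SHITTY_ABI name and the fact that the input event files differ in their timestamps come from this mail. Everything else is hypothetical: struct file_model, submit_check(), the flag's bit value, the -EOPNOTSUPP return and the two event structs are invented to show the shape of the check and why the layout matters.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Name taken from the mail; the bit value is invented for this sketch. */
#define FMODE_SHITTY_ABI (1u << 0)

/* Hypothetical stand-in for struct file: just a path and mode bits. */
struct file_model {
	const char *path;
	unsigned int f_mode;
};

/*
 * Hypothetical models of a timestamped event record as seen by 64-bit
 * and 32-bit callers: same fields, different widths for the timestamp.
 */
struct event64 { int64_t sec, usec; uint16_t type, code; int32_t value; };
struct event32 { int32_t sec, usec; uint16_t type, code; int32_t value; };

/* The two layouts differ in size, so a reader must know which one it gets. */
static_assert(sizeof(struct event64) != sizeof(struct event32),
	      "layouts must differ for the example to make sense");

/*
 * Model of the proposed policy: an async submission path refuses files
 * whose data layout depends on the caller's ABI. The error code is an
 * assumption; the mail only says such files should be rejected.
 */
int submit_check(const struct file_model *f)
{
	if (f->f_mode & FMODE_SHITTY_ABI)
		return -EOPNOTSUPP;
	return 0;
}
```

In this model the flag is a property of the file itself rather than of the caller, so the check needs no knowledge of whether the submitter was a compat task.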