Date: Thu, 18 Jan 2018 18:12:18 +0000
From: Al Viro
To: Linus Torvalds
Cc: Christoph Hellwig, Alan Cox, Eric Dumazet, Dan Williams,
 Linux Kernel Mailing List, linux-arch@vger.kernel.org, Andi Kleen,
 Kees Cook, kernel-hardening@lists.openwall.com, Greg Kroah-Hartman,
 the arch/x86 maintainers, Ingo Molnar, "H.
 Peter Anvin", Thomas Gleixner, Andrew Morton
Subject: Re: [PATCH v3 8/9] x86: use __uaccess_begin_nospec and ASM_IFENCE in get_user paths
Message-ID: <20180118181218.GB13338@ZenIV.linux.org.uk>
References: <151586748981.5820.14559543798744763404.stgit@dwillia2-desk3.amr.corp.intel.com>
 <1516198646.4184.13.camel@linux.intel.com>
 <20180118163818.GB16649@infradead.org>

On Thu, Jan 18, 2018 at 08:49:31AM -0800, Linus Torvalds wrote:
> On Thu, Jan 18, 2018 at 8:38 AM, Christoph Hellwig wrote:
> >
> > > But there are about ~100 set_fs() calls in generic code, and some of
> > > those really are pretty fundamental. Doing things like "kernel_read()"
> > > without set_fs() is basically impossible.
> >
> > Not if we move to iov_iter or iov_iter-like behavior for all reads
> > and writes.
>
> Not going to happen. Really. We have how many tens of thousands of
> drivers again, all doing "copy_to_user()".

The real PITA is not even that (we could provide helpers making
conversion from ->read() to ->read_iter() easy for char devices, etc.).
It's the semantics of readv(2). Consider e.g. readv() from /dev/rtc,
with an iovec array consisting of 10 segments, each int-sized. Right now
we'll get rtc_dev_read() called in a loop, once for each segment. A
single read() into a 40-byte buffer will fill one long and bugger off.
Converting it to ->read_iter() will mean more than just "use
copy_to_iter() instead of put_user()" - that would be trivial. But to
preserve the current behaviour we would need something like

	total = 0;
	while (iov_iter_count(to)) {
		count = iov_iter_single_seg_count(to);
		/* current body of rtc_dev_read(), with
		 * put_user() replaced with copy_to_iter()
		 */
		....
		if (res < 0) {
			if (!total)
				total = res;
			break;
		}
		total += res;
		if (res != count)
			break;
	}
	return total;

in that thing. And similar boilerplate would be needed in a whole lot
of drivers. Sure, each instance is individually trivial, but they would
add up to shitloads of code to get wrong.

These are basically all ->read() instances that ignore *ppos and, unlike
pipes, do not attempt to fill as much of the buffer as possible. We do
have quite a few of those.

Some ->read() instances can be easily converted to ->read_iter() and
will, in fact, be better off that way. We had patches of that sort and
I'm certain that we still have such places left. Ditto for ->write() and
->write_iter(). But those are not even close to being the majority.
Sorry.

We could, in principle, do something like

	dev_rtc_read_iter(iocb, to)
	{
		return loop_read_iter(iocb, to, modified_dev_rtc_read);
	}

with modified_dev_rtc_read() being the result of minimal conversion
(put_user() and copy_to_user() replaced with uses of copy_to_iter()).
It would be less boilerplate that way, but I really don't see big
benefits from doing that.

On the write side things are just as unpleasant - we have a lot of
->write() instances that parse the beginning of the buffer, ignore the
rest and report that everything got written. writev() on those will
parse each iovec segment, ignoring the junk at the end of each. Again,
that loop needs to go somewhere. And we do have a bunch of "parse the
buffer and do some action once" ->write() instances - in char devices,
debugfs, etc.
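
For illustration only, here is a userspace model of the per-segment
read loop described above. It is a sketch, not kernel code:
loop_read_segments() and toy_read_one_int() are hypothetical names made
up for this example, a plain struct iovec array stands in for the
kernel's iov_iter, and the toy handler only loosely imitates a driver
that hands out one fixed-size value per call and ignores any extra room
in the buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical per-segment handler, shaped like a classic ->read() body. */
typedef ssize_t (*seg_read_fn)(void *buf, size_t count);

/*
 * Userspace model of the loop: call the handler once per segment,
 * stop on an error or on a short read. An error is reported only if
 * nothing has been transferred yet; otherwise the partial total wins.
 */
static ssize_t loop_read_segments(const struct iovec *iov, int nsegs,
				  seg_read_fn fn)
{
	ssize_t total = 0;
	int i;

	assert(fn != NULL);

	for (i = 0; i < nsegs; i++) {
		size_t count = iov[i].iov_len;
		ssize_t res;

		if (!count)	/* modelling choice: skip empty segments */
			continue;

		res = fn(iov[i].iov_base, count);
		if (res < 0) {
			if (!total)
				total = res;
			break;
		}
		total += res;
		if ((size_t)res != count)
			break;
	}
	return total;
}

/*
 * Toy stand-in for a device read: stores exactly one int per call,
 * no matter how large the buffer is, and rejects buffers that are
 * too small. The value 42 is arbitrary.
 */
static ssize_t toy_read_one_int(void *buf, size_t count)
{
	const int value = 42;

	if (count < sizeof(value))
		return -EINVAL;
	memcpy(buf, &value, sizeof(value));
	return sizeof(value);
}
```

Called with ten int-sized segments, loop_read_segments() invokes the
toy handler ten times and returns 10 * sizeof(int). Called with one
segment covering the same ten ints, it invokes the handler once, sees a
short read, and returns sizeof(int). That is the readv(2) vs. read(2)
difference any conversion would have to preserve.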