From: Andreas Gruenbacher
Date: Tue, 27 Jul 2021 13:13:47 +0200
Subject: Re: [PATCH v4 1/8] iov_iter: Introduce iov_iter_fault_in_writeable helper
To: David Laight
Cc: Linus Torvalds, Alexander Viro, Christoph Hellwig, "Darrick J. Wong",
    Jan Kara, Matthew Wilcox, cluster-devel, linux-fsdevel,
    Linux Kernel Mailing List, "ocfs2-devel@oss.oracle.com"
References: <20210724193449.361667-1-agruenba@redhat.com>
    <20210724193449.361667-2-agruenba@redhat.com>
    <03e0541400e946cf87bc285198b82491@AcuMS.aculab.com>
In-Reply-To: <03e0541400e946cf87bc285198b82491@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 27, 2021 at 11:30 AM David Laight wrote:
> From: Linus Torvalds
> > Sent: 24 July 2021 20:53
> >
> > On Sat, Jul 24, 2021 at 12:35 PM Andreas Gruenbacher wrote:
> > >
> > > +int iov_iter_fault_in_writeable(const struct iov_iter *i, size_t bytes)
> > > +{
> > ...
> > > +		if (fault_in_user_pages(start, len, true) != len)
> > > +			return -EFAULT;
> >
> > Looking at this once more, I think this is likely wrong.
> >
> > Why?
> >
> > Because any user can/should only care about at least *part* of the
> > area being writable.
> >
> > Imagine that you're doing a large read. If the *first* page is
> > writable, you should still return the partial read, not -EFAULT.
>
> My 2c...
>
> Is it actually worth doing any more than ensuring the first byte
> of the buffer is paged in before entering the block that has
> to disable page faults?

We definitely do want to process as many pages as we can, especially
if allocations are involved during a write.

> Most of the all the pages are present so the IO completes.

That's not guaranteed. There are cases in which none of the pages are
present, and then there are cases in which only the first page is
present (for example, because of a previous access that wasn't page
aligned).

> The pages can always get unmapped (due to page pressure or
> another application thread unmapping them) so there needs
> to be a retry loop.
> Given the cost of actually faulting in a page going around
> the outer loop may not matter.
> Indeed, if an application has just mmap()ed in a very large
> file and is then doing a write() from it then it is quite
> likely that the pages got unmapped!
>
> Clearly there needs to be extra code to ensure progress is made.
> This might actually require the use of 'bounce buffers'
> for really problematic user requests.

I'm not sure if repeated unmapping of the pages that we've just
faulted in is going to be a problem (in terms of preventing progress).
But a suitable heuristic might be to shrink the fault-in "window" on
each retry until it's only one page.

> I also wonder what actually happens for pipes and fifos.
> IIRC reads and write of up to PIPE_MAX (typically 4096)
> are expected to be atomic.
> This should be true even if there are page faults part way
> through the copy_to/from_user().
>
> It has to be said I can't see any reference to PIPE_MAX
> in the linux man pages, but I'm sure it is in the POSIX/TOG
> spec.
>
> David

Thanks,
Andreas
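
To make the shrinking-window heuristic mentioned above a bit more concrete, here is a rough user-space sketch in C. It is not kernel code and not taken from the patch: `struct sim`, `sim_fault_in()`, `sim_copy_nofault()`, `shrink_window()` and `copy_with_shrinking_window()` are all hypothetical names, and the "pressure" model (a fault-in larger than `capacity` is evicted again immediately) is an assumption chosen only to make retries observable. Giving up once a one-page window makes no progress is likewise an assumption so that the loop terminates. The sketch also reflects the point about partial results: a short copy is returned as a byte count, and -EFAULT is used only when nothing was copied at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Hypothetical user-space model of a user buffer; not kernel code. */
struct sim {
	size_t accessible;   /* bytes from offset 0 that can be faulted in at all */
	size_t capacity;     /* assumption: larger fault-ins get evicted right away */
	size_t resident_end; /* bytes [0, resident_end) are currently resident */
	unsigned fault_ins;  /* number of fault-in attempts, for inspection */
};

/* Model of the fault-in step: returns how many bytes could be faulted in. */
static size_t sim_fault_in(struct sim *s, size_t offset, size_t len)
{
	size_t got;

	s->fault_ins++;
	if (offset >= s->accessible)
		return 0;
	got = s->accessible - offset;
	if (got > len)
		got = len;
	/* Assumed pressure model: an oversized window does not stay resident. */
	s->resident_end = (got > s->capacity) ? offset : offset + got;
	return got;
}

/* Model of a copy with page faults disabled: only resident bytes are copied. */
static size_t sim_copy_nofault(const struct sim *s, size_t offset, size_t len)
{
	size_t avail;

	if (s->resident_end <= offset)
		return 0;
	avail = s->resident_end - offset;
	return len < avail ? len : avail;
}

/* Halve the fault-in window, but never go below one page. */
static size_t shrink_window(size_t window)
{
	size_t half = window / 2;

	return half < SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE : half;
}

/*
 * Copy up to `total` bytes. Returns the number of bytes copied (possibly
 * short), or -EFAULT only if no byte could be copied at all.
 */
static long copy_with_shrinking_window(struct sim *s, size_t total)
{
	size_t done = 0, window = total;
	int faulted = 0;

	assert(s != NULL);
	while (done < total) {
		size_t n = sim_copy_nofault(s, done, total - done);
		size_t want;

		done += n;
		if (done == total)
			break;
		if (faulted && n == 0) {
			/* The last fault-in did not help: shrink, or give up. */
			if (window <= SKETCH_PAGE_SIZE)
				break;
			window = shrink_window(window);
		}
		want = total - done < window ? total - done : window;
		if (sim_fault_in(s, done, want) == 0)
			break; /* the next byte is not accessible at all */
		faulted = 1;
	}
	if (done == 0 && total > 0)
		return -EFAULT;
	return (long)done;
}
```

In this model, a 16-page request with a two-page `capacity` completes once the window has shrunk from 16 pages to two, and a buffer of which only the first part is accessible yields a short count rather than an error. The window stays shrunk for the rest of the call, which is a simplification; growing it again after progress would be another option.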