From: David Howells
To: Al Viro
Cc: dhowells@redhat.com, linux-fsdevel@vger.kernel.org,
    linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 04/10] iov_iter: Add mapping and discard iterator types
Date: Mon, 17 Sep 2018 21:58:56 +0100
Message-ID: <3533.1537217936@warthog.procyon.org.uk>
In-Reply-To: <20180914041831.GY19965@ZenIV.linux.org.uk>
References: <20180914041831.GY19965@ZenIV.linux.org.uk>
    <153685389564.14766.11306559824641824935.stgit@warthog.procyon.org.uk>
    <153685392942.14766.3347355712333618914.stgit@warthog.procyon.org.uk>
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley
    Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United
    Kingdom. Registered in England and Wales under Company Registration
    No. 3798903
X-Mailing-List: linux-kernel@vger.kernel.org

Al Viro wrote:

> > Add two new iterator types to iov_iter:
> >
> >  (1) ITER_MAPPING
> >
> >      This walks through a set of pages attached to an address_space that
> >      are pinned or locked, starting at a given page and offset and walking
> >      for the specified amount of space. A facility to get a callback each
> >      time a page is entirely processed is provided.
> >
> >      This is useful for copying data from socket buffers to inodes in
> >      network filesystems.
>
> Interesting... Questions:
>   * what will hold those pages? IOW, where will you unlock/drop/whatnot
> those sucker?

The caller needs to have those pages pinned - say with PG_locked or
PG_writeback. Sorry - I mentioned this in the cover letter, but not here.
You can either undo these changes in the callback or upon completion of the
iteration.

>   * "callback" sounds dangerous - it appears to imply that you won't
> copy to/from the same page twice.
> Not true for a lot of iov_iter users; what happens if you pass such a beast
> to them?

Similar to ITER_PIPE. There's no rewind. Once you've passed a page, it's
gone. Under what circumstances would you want to copy to/from the same page
twice?

>   * why not simply "build and populate ITER_BVEC aliasing a piece of
> mapping", possibly in "grab" and "grab+lock" variants?

ITER_BVEC is inefficient. It is what the upstream code currently uses - see
afs_load_bvec(). There's a practical limit to the number of pages I can
shovel into a single ITER_BVEC in one go. Further, every time I reach the
end of an ITER_BVEC, I have to return to process context, which then has to
round up the next bundle of pages by calling into the radix tree. It seems
to work out better to put the radix iteration into the iterator if we can.
The caller guarantees that the contents of the region of interest are
(a) fully populated and (b) pinned.

Yet further, with ITER_BVEC, I can't release any of the pinned pages until
the entire iteration is finished. That means if I have a 4GB BVEC, those
pages are going to be pinned for a long time. With ITER_MAPPING, they're
released incrementally via the callback.

> Those ITER_MAPPING do seem to be related to ITER_BVEC, at the very least.

Only in the sense that the current position can be described by the same
three numbers: page, len, offset. I'm reusing struct bio_vec so that I can
share some of the code with ITER_BVEC.

> Note, BTW, that iov_iter_get_pages...() might mutate into something similar
> - "build and populate ITER_BVEC aliasing a piece of given iov_iter". Or,
> perhaps, a nicer-on-memory analogue of ITER_BVEC - with pointer to pages
> array> instead of as elements, with
> the same "populate from mapping" to get something similar to your
> functionality and "populate from iov_iter" for
> iov_iter_get_pages... replacement

The whole point is to avoid having to use ITER_BVEC.
ITER_BVEC has a number of issues that ITER_MAPPING overcomes - though
ITER_MAPPING can only be used with a mapping (or, at least, a radix tree).
There is no point in a loop that shifts page runs into a page array for use
with a BVEC when the mapping already carries the same information. Skipping
it saves memory and processing time.

David
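
For illustration only, here is a userspace toy model of the forward-only,
release-as-you-go behaviour described above. It is a sketch under stated
assumptions, not the kernel implementation: struct toy_page, struct
toy_mapping_iter, toy_copy_to_iter(), toy_unpin() and toy_demo() are all
made-up names, a flat array stands in for the mapping's radix tree, and an
int flag stands in for PG_locked/PG_writeback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SZ 8 /* tiny "page" so the walk is easy to follow */
#define NPAGES  4

/* Hypothetical stand-in for a page; "pinned" models PG_locked/PG_writeback. */
struct toy_page {
	char data[PAGE_SZ];
	int pinned;
};

/* Hypothetical callback, called once a page has been entirely processed. */
typedef void (*toy_done_fn)(struct toy_page *page, void *ctx);

/* Hypothetical forward-only iterator over a run of pinned pages. */
struct toy_mapping_iter {
	struct toy_page *pages; /* flat array standing in for the radix tree */
	size_t npages;
	size_t index;     /* current page */
	size_t offset;    /* offset within the current page */
	size_t remaining; /* bytes left in the region of interest */
	toy_done_fn done;
	void *ctx;
};

/* Copy up to len bytes into the pages. There is no rewind: a finished
 * page is handed to the callback and never visited again. */
static size_t toy_copy_to_iter(struct toy_mapping_iter *it,
			       const char *src, size_t len)
{
	size_t copied = 0;

	if (len > it->remaining)
		len = it->remaining;
	while (copied < len) {
		struct toy_page *page;
		size_t chunk = PAGE_SZ - it->offset;

		assert(it->index < it->npages);
		page = &it->pages[it->index];
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(page->data + it->offset, src + copied, chunk);
		copied += chunk;
		it->offset += chunk;
		it->remaining -= chunk;
		if (it->offset == PAGE_SZ || it->remaining == 0) {
			if (it->done)
				it->done(page, it->ctx); /* release right away */
			it->index++;
			it->offset = 0;
		}
	}
	return copied;
}

/* Example callback: unpin the page and count how many were released. */
static void toy_unpin(struct toy_page *page, void *ctx)
{
	page->pinned = 0;
	++*(int *)ctx;
}

static int toy_count_pinned(const struct toy_page *pages, size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		count += pages[i].pinned;
	return count;
}

/* Feed the iterator in two steps, as if two socket buffers had arrived.
 * Returns the number of pages still pinned after the first step. */
static int toy_demo(void)
{
	struct toy_page pages[NPAGES];
	int released = 0, pinned_after_first;
	size_t i;

	for (i = 0; i < NPAGES; i++) {
		memset(pages[i].data, 0, PAGE_SZ);
		pages[i].pinned = 1; /* the caller pins everything up front */
	}
	struct toy_mapping_iter it = {
		.pages = pages, .npages = NPAGES,
		.remaining = NPAGES * PAGE_SZ,
		.done = toy_unpin, .ctx = &released,
	};
	toy_copy_to_iter(&it, "abcdefghijkl", 12);
	pinned_after_first = toy_count_pinned(pages, NPAGES);
	printf("after 12 bytes: %d pinned, %d released\n",
	       pinned_after_first, released);
	toy_copy_to_iter(&it, "01234567890123456789", 20);
	printf("after 32 bytes: %d pinned, %d released\n",
	       toy_count_pinned(pages, NPAGES), released);
	return pinned_after_first;
}
```

Calling toy_demo() from a main() shows the first page being released as soon
as it has been filled, rather than at the end of the whole transfer, which
is the property that an array-of-pages approach lacks in this model.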