Date: Mon, 17 Oct 2022
 06:15:56 -0700
From: Christoph Hellwig
To: David Howells
Cc: Christoph Hellwig, Al Viro, willy@infradead.org, dchinner@redhat.com,
 Steve French, Shyam Prasad N, Rohith Surabattula, Jeff Layton,
 torvalds@linux-foundation.org, linux-cifs@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: How to convert I/O iterators to iterators, sglists and RDMA lists
In-Reply-To: <1762414.1665761217@warthog.procyon.org.uk>

On Fri, Oct 14, 2022 at 04:26:57PM +0100, David Howells wrote:
> (1) Async direct I/O.
>
> In the async case direct I/O, we cannot hold on to the iterator when we
> return, even if the operation is still in progress (ie. we return
> EIOCBQUEUED), as it is likely to be on the caller's stack.
>
> Also, simply copying the iterator isn't sufficient as virtual userspace
> addresses cannot be trusted and we may have to pin the pages that
> comprise the buffer.

This is closely related to the discussion we are having with Ira and Al
about pinning for O_DIRECT. What block file systems do is take the pages
from the iter, plus some flags recording what is pinned. We can
generalize this to store all extra state in a flags word, or bite the
bullet and allow cloning of the iter in one form or another.

> (2) Crypto.
>
> The crypto interface takes scatterlists, not iterators, so we need to be
> able to convert an iterator into a scatterlist in order to do content
> encryption within netfslib. Doing this in netfslib makes it easier to
> store content-encrypted files encrypted in fscache.

Note that the scatterlist is generally a pretty bad interface. We've
been talking for a while about having an interface that takes a page
array as input and returns an array of { dma_addr, len } tuples.
Thinking about it, taking in an iter might actually be an even better
idea.

> (3) RDMA.
>
> To perform RDMA, a buffer list needs to be presented as a QPE array.
> Currently, cifs converts the iterator it is given to lists of pages, then
> each list to a scatterlist and thence to a QPE array. I have code to
> pass the iterator down to the bottom, using an intermediate BVEC iterator
> instead of a page list if I can't pass down the original directly (eg. an
> XARRAY iterator on the pagecache), but I still end up converting it to a
> scatterlist, which is then converted to a QPE. I'm trying to go directly
> from an iterator to a QPE array, thus avoiding the need to allocate an
> sglist.

I'm not sure what you mean by QPE. The fundamental low-level interface
in RDMA is the ib_sge. If you feed it to RDMA READ/WRITE requests, the
interface for that is the RDMA R/W API in drivers/infiniband/core/rw.c,
which currently takes a scatterlist but to which all of the above
remarks on DMA interfaces apply. For RDMA SEND the ULP has to do a
dma_map_single/page to fill it, which is a quite horrible layering
violation and should move into the driver, but that is going to be a
massive change to the whole RDMA subsystem, so it is unlikely to happen
anytime soon.

Neither case has anything to do with what should be in common iov_iter
code. All of this needs to live in the RDMA subsystem as a consumer.
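
For illustration only, the { dma_addr, len } idea from (2) can be
sketched as a small userspace model in C. This is not kernel code and
not a proposed API: struct model_seg, struct model_dma_range,
model_map_segs() and MODEL_PAGE_SIZE are all hypothetical names, and the
"DMA address" is just a number supplied by the caller rather than the
result of a real mapping. The only thing it shows is the shape of the
interface: per-page segments go in, and a shorter array of coalesced
address/length ranges comes out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical input: one segment inside a single page. */
struct model_seg {
	uint64_t page_addr;	/* pretend DMA address of the page start */
	uint32_t offset;	/* byte offset into the page */
	uint32_t len;		/* byte length, must not cross the page end */
};

/* Hypothetical output tuple: { dma_addr, len }. */
struct model_dma_range {
	uint64_t dma_addr;
	uint64_t len;
};

/*
 * Convert segments to { dma_addr, len } ranges. A segment is merged into
 * the previous range when it starts exactly where that range ends.
 * Returns the number of ranges written, or -1 if 'out' is too small.
 */
int model_map_segs(const struct model_seg *segs, size_t nsegs,
		   struct model_dma_range *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nsegs; i++) {
		uint64_t addr = segs[i].page_addr + segs[i].offset;

		/* Model restriction: a segment stays within one page. */
		assert((uint64_t)segs[i].offset + segs[i].len <= MODEL_PAGE_SIZE);

		if (segs[i].len == 0)
			continue;	/* nothing to map */

		if (n > 0 && out[n - 1].dma_addr + out[n - 1].len == addr) {
			out[n - 1].len += segs[i].len;	/* contiguous: extend */
			continue;
		}

		if (n == max_out)
			return -1;	/* caller's array is full */

		out[n].dma_addr = addr;
		out[n].len = segs[i].len;
		n++;
	}
	return (int)n;
}
```

In this model, two segments on adjacent pages collapse into a single
range, which is the property that makes such an array cheaper to hand to
hardware than one entry per page. A real interface would also have to
deal with pinning, IOMMU mappings and unmapping, none of which is
modelled here.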