References: <20201004154340.1080481-1-leon@kernel.org> <20201005235650.GA89159@nvidia.com> <20201006104122.GA438822@phenom.ffwll.local> <20201006114627.GE5177@ziepe.ca>
In-Reply-To: <20201006114627.GE5177@ziepe.ca>
From: Daniel Vetter
Date: Wed, 7 Oct 2020 10:15:31 +0200
Subject: Re: [PATCH rdma-next v5 0/4] Dynamicaly allocate SG table from the pages
To: Jason Gunthorpe
Cc: Leon Romanovsky, Doug Ledford, Christoph Hellwig, David Airlie,
    dri-devel, intel-gfx, Jani Nikula, Joonas Lahtinen,
    Linux Kernel Mailing List, linux-rdma, Maor Gottlieb, Rodrigo Vivi,
    Roland Scheidegger, Tvrtko Ursulin, VMware Graphics
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 7, 2020 at 9:22 AM Jason Gunthorpe wrote:
> On Tue, Oct 06, 2020 at 12:41:22PM +0200, Daniel Vetter wrote:
> > On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > > > This series extends __sg_alloc_table_from_pages to allow chaining of
> > > > new pages to already initialized SG table.
> > > >
> > > > This allows for the drivers to utilize the optimization of merging contiguous
> > > > pages without a need to pre allocate all the pages and hold them in
> > > > a very large temporary buffer prior to the call to SG table initialization.
> > > >
> > > > The second patch changes the Infiniband driver to use the new API.
> > > > It removes duplicate functionality from the code and benefits the
> > > > optimization of allocating dynamic SG table from pages.
> > > >
> > > > In huge pages system of 2MB page size, without this change, the SG table
> > > > would contain x512 SG entries.
> > > > E.g. for 100GB memory registration:
> > > >
> > > >            Number of entries    Size
> > > >   Before        26214400        600.0MB
> > > >   After            51200          1.2MB
> > > >
> > > > Thanks
> > > >
> > > > Maor Gottlieb (2):
> > > >   lib/scatterlist: Add support in dynamic allocation of SG table from
> > > >     pages
> > > >   RDMA/umem: Move to allocate SG table from pages
> > > >
> > > > Tvrtko Ursulin (2):
> > > >   tools/testing/scatterlist: Rejuvenate bit-rotten test
> > > >   tools/testing/scatterlist: Show errors in human readable form
> > >
> > > This looks OK, I'm going to send it into linux-next on the hmm tree
> > > for awhile to see if anything gets broken. If there is more
> > > remarks/tags/etc please continue
> >
> > An idea that just crossed my mind: A pin_user_pages_sgt might be useful
> > for both rdma and drm, since this would avoid the possible huge interim
> > struct pages array for thp pages. Or anything else that could be coalesced
> > down into a single sg entry.
> >
> > Not sure it's worth it, but would at least give a slightly neater
> > interface I think.
>
> We've talked about it. Christoph wants to see this area move to a biovec
> interface instead of sgl, but it might still be worthwhile to have an
> interim step at least as an API consolidation.

Hm but then we'd need a new struct for the mapped side of things
(which would still be what you get from dma-buf). That would be quite
a bit of work to roll out everywhere, and sgt isn't such a huge misfit
for passing buffer object mappings and system memory backing storage
around, and hence it's what we've been (very slowly) converging
drivers/gpu towards over the past 10 years or so.

And moving the dma_map step out of dma-buf doesn't work, because some
of the use-cases we have are for very special iommus which are managed
by the gpu driver directly. Stuff that e.g. rotates/retiles/compresses
on the fly, and is accessible by other (gfx related like video code,
camera, ..) devices. Not something I expect to ever be relevant for
rdma since this exists mostly on some small soc, but it's a thing.
Without that, dma-buf could hand out biovec for struct_page backed
stuff, or some pfn_vec for the p2p stuff.

Anyway was just an idea, I guess we'll have to live with some
impedance mismatch since rolling out the one and only iovec structure
which suits everyone is I think impossible :-)

> Avoiding the page list would be complicated as we'd somehow have to
> code share the page table iterator scheme.

We're (slowly) getting towards thp for vram mappings and everything so
I guess for drivers/gpu we might make that happen. But yeah it'd be
not so pretty I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch