Date: Tue, 26 Jun 2018 09:52:13 +0200
From: Jan Kara
To: John Hubbard
Cc: Jan Kara, Matthew Wilcox, Dan Williams, Christoph Hellwig,
	Jason Gunthorpe, John Hubbard, Michal Hocko, Christopher Lameter,
	Linux MM, LKML, linux-rdma
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
Message-ID: <20180626075213.qn7ykt7j5usgvuiq@quack2.suse.cz>
References: <0e6053b3-b78c-c8be-4fab-e8555810c732@nvidia.com>
 <20180619082949.wzoe42wpxsahuitu@quack2.suse.cz>
 <20180619090255.GA25522@bombadil.infradead.org>
 <20180619104142.lpilc6esz7w3a54i@quack2.suse.cz>
 <70001987-3938-d33e-11e0-de5b19ca3bdf@nvidia.com>
 <20180620120824.bghoklv7qu2z5wgy@quack2.suse.cz>
 <151edbf3-66ff-df0c-c1cc-5998de50111e@nvidia.com>
 <20180621163036.jvdbsv3t2lu34pdl@quack2.suse.cz>
 <20180625152150.jnf5suiubecfppcl@quack2.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20170912 (1.9.0)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 25-06-18 12:03:37, John Hubbard wrote:
> On 06/25/2018 08:21 AM, Jan Kara wrote:
> > On Thu 21-06-18 18:30:36, Jan Kara wrote:
> >> On Wed 20-06-18 15:55:41, John Hubbard wrote:
> >>> On 06/20/2018 05:08 AM, Jan Kara wrote:
> >>>> On Tue 19-06-18 11:11:48, John Hubbard wrote:
> >>>>> On 06/19/2018 03:41 AM, Jan Kara wrote:
> >>>>>> On Tue 19-06-18 02:02:55, Matthew Wilcox wrote:
> >>>>>>> On Tue, Jun 19, 2018 at 10:29:49AM +0200, Jan Kara wrote:
> >>> [...]
> > I've spent some time on this. There are two obstacles with my approach of
> > putting a special entry into the inode's VMA tree:
> >
> > 1) If I want to place this special entry in the inode's VMA tree, I either
> > need to allocate a full VMA and somehow initialize it so that it's clear
> > it is a special "pinned" range rather than a real VMA => uses unnecessarily
> > much memory and is ugly. Another solution I was hoping for was to factor
> > out some common bits of vm_area_struct (pgoff, rb_node, ...) into a
> > structure shared by VMAs and locked ranges => doable, but it causes a lot
> > of churn, as VMAs are accessed (and modified!) at hundreds of places in
> > the kernel. Some accessor functions would help to reduce the churn a bit,
> > but then stuff like vma_set_pgoff(vma, pgoff) isn't exactly beautiful
> > either.
> >
> > 2) Some users of GUP (e.g. direct IO) get a block of pages and then put
> > references to these pages at different times and in random order -
> > basically the reference is dropped when IO for a given page completes, and
> > one GUP call can acquire page references for pages which end up in
> > multiple different bios (we don't know in advance). This makes it
> > difficult to implement a counterpart to GUP that 'unpins' a range of
> > pages - we'd either have to support partial unpins (and splitting of
> > pinned ranges and all such fun), or track internally how many pages of
> > the originally pinned range are still pinned and release the pin once all
> > individual pages are unpinned - but then it's difficult to e.g. get to
> > this internal structure from an IO completion callback where we only have
> > the bio.
> >
> > So I think Matthew's idea of removing pinned pages from the LRU is
> > definitely worth trying, to see how complex that would end up being. Did
> > you get to looking into it? If not, I can probably find some time to try
> > that out.
>
> OK. Even if we remove the pages from the LRU, we still have to insert a
> "put_gup_page" or similarly named call. But it could be a simple
> replacement for put_page with that approach, so that does make it much,
> much easier.

Yes, that's exactly what I thought about as well.

> I was (and still am) planning on tackling this today, so let me see how
> far I get before yelling for help. :)

OK, good.

								Honza
--
Jan Kara
SUSE Labs, CR
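
As a footnote to the exchange above, a minimal sketch of what the "pull the
page off the LRU on pin, put it back on unpin" counterpart could look like.
Everything named here is an assumption drawn from the thread, not existing
kernel API at the time of this mail: gup_pin_page() and put_user_page() are
hypothetical names (John's mail only says "put_gup_page" or similar), and the
SetPageDmaPinned()/TestClearPageDmaPinned() helpers would exist only if the
PG_dma_pinned flag proposed by this series were merged. Only
isolate_lru_page(), putback_lru_page() and put_page() are existing calls, and
the sketch assumes it sits in mm/, where mm/internal.h is available.

/*
 * Sketch only, under the assumptions stated above; not current kernel API.
 * Conceptually this would live in mm/gup.c, next to get_user_pages().
 */
#include <linux/mm.h>
#include <linux/page-flags.h>
#include <linux/swap.h>		/* putback_lru_page() */
#include "internal.h"		/* isolate_lru_page() */

/*
 * Pin side: called for each page right after get_user_pages() has taken
 * its reference.  Pulling the page off the LRU keeps reclaim and migration
 * away from it; PG_dma_pinned records that the unpin side has an LRU
 * putback to do.
 */
static void gup_pin_page(struct page *page)
{
	/* isolate_lru_page() relies on the elevated refcount GUP holds */
	if (isolate_lru_page(page) == 0)
		SetPageDmaPinned(page);	/* needs the proposed PG_dma_pinned */
}

/*
 * Unpin side: a drop-in replacement for put_page(), called per page by GUP
 * users (e.g. from direct IO completion), in any order.
 */
void put_user_page(struct page *page)
{
	if (TestClearPageDmaPinned(page))	/* needs the proposed flag */
		putback_lru_page(page);	/* back on the LRU, drops isolate ref */
	put_page(page);			/* drop the reference GUP took */
}

The appeal over the VMA-tree approach from points 1) and 2) above is that
unpinning is purely per-page, so random-order puts from bio completion need
no range-tracking structure at all.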