Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp4239486imm; Mon, 25 Jun 2018 12:05:07 -0700 (PDT) X-Google-Smtp-Source: ADUXVKLqt3G5PfKATWQJCHZ1xjmAHhKu28+ZKFmI701OAsF6IyF2pJpollD4aKd5bbQoxqbDht8A X-Received: by 2002:a65:6157:: with SMTP id o23-v6mr11926646pgv.310.1529953507278; Mon, 25 Jun 2018 12:05:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1529953507; cv=none; d=google.com; s=arc-20160816; b=l8mvUKWI44KiSN346T3L/7L5V9nTMjawnRjkw6zfGMk3fco5xM2mITRfC+CmZcz4fw bF+1RgbgkmzFaVupBpksZOMbiYCyz9ED2m9UWR5GqeaXcGFTJ2VYAiQl38FJ3Rq96Jnh 2VPzQQ0nsPL1Hj/04oO3eZXhwRhIXvMza6V5u6WHgMWyl9otkXCpBDJi16BZDvwSbluD Rh4pKtaq9/3M71dE4l2vVitVfgCICDV6d287mZ4NNo3djNP/jvwEKbl06miWYfN6opVl g2fo52sOQqek/dho8HdlfLgWBoF74fyyzI7TLiqo59cDxtAf9Se0QF+Oz7ZEMdRM2JHe x7vQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=o8GvcH/Gy6Mzw2H18Mi8hdRt9ylsL9zsXmkCtTxyP+A=; b=p8cDo5t28XHznKRPfdcHJlOGqdUcGKNYwAABB5Zq/5I9V78pmNIylMUnf3dM0CswXQ CllOV79azoHc9tDsW0Qy0R0UFc1xXc9foQjMUcL3Lt5zF2vDjSOWXemvNmKVnzszzLoY mAH8uEcKqyeSZliLLBwzGhZ0qPIKMzxPpoccqWMjjxzyKnRyIPD+/oH+5lMvLliwt4G0 EJ48SY+FPPykSsslVDPGvwX3wQv4auanp3fdHgYGeczUT9/nuIQReqILIKeZkYWPQJSv WJUWHC+Ts6NEUpKispDRDCpqPStCfJXkRHqqO3Rsk4R7tok7AAo/gDjtc9fIrqRPmBj0 3eNw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r133-v6si6997956pgr.17.2018.06.25.12.04.51; Mon, 25 Jun 2018 12:05:07 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932723AbeFYTEK (ORCPT + 99 others); Mon, 25 Jun 2018 15:04:10 -0400 Received: from hqemgate15.nvidia.com ([216.228.121.64]:13281 "EHLO hqemgate15.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752146AbeFYTEI (ORCPT ); Mon, 25 Jun 2018 15:04:08 -0400 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqemgate15.nvidia.com (using TLS: TLSv1, AES128-SHA) id ; Mon, 25 Jun 2018 12:03:39 -0700 Received: from HQMAIL107.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Mon, 25 Jun 2018 12:04:07 -0700 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Mon, 25 Jun 2018 12:04:07 -0700 Received: from [10.2.161.233] (10.2.161.233) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Mon, 25 Jun 2018 19:04:03 +0000 Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*() To: Jan Kara CC: Matthew Wilcox , Dan Williams , Christoph Hellwig , Jason Gunthorpe , John Hubbard , Michal Hocko , Christopher Lameter , Linux MM , LKML , linux-rdma References: <3898ef6b-2fa0-e852-a9ac-d904b47320d5@nvidia.com> <0e6053b3-b78c-c8be-4fab-e8555810c732@nvidia.com> <20180619082949.wzoe42wpxsahuitu@quack2.suse.cz> <20180619090255.GA25522@bombadil.infradead.org> <20180619104142.lpilc6esz7w3a54i@quack2.suse.cz> <70001987-3938-d33e-11e0-de5b19ca3bdf@nvidia.com> <20180620120824.bghoklv7qu2z5wgy@quack2.suse.cz> <151edbf3-66ff-df0c-c1cc-5998de50111e@nvidia.com> <20180621163036.jvdbsv3t2lu34pdl@quack2.suse.cz> <20180625152150.jnf5suiubecfppcl@quack2.suse.cz> From: John Hubbard X-Nvconfidentiality: public Message-ID: Date: Mon, 25 Jun 2018 12:03:37 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <20180625152150.jnf5suiubecfppcl@quack2.suse.cz> X-Originating-IP: [10.2.161.233] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/25/2018 08:21 AM, Jan Kara wrote: > On Thu 21-06-18 18:30:36, Jan Kara wrote: >> On Wed 20-06-18 15:55:41, John Hubbard wrote: >>> On 06/20/2018 05:08 AM, Jan Kara wrote: >>>> On Tue 19-06-18 11:11:48, John Hubbard wrote: >>>>> On 06/19/2018 03:41 AM, Jan Kara wrote: >>>>>> On Tue 19-06-18 02:02:55, Matthew Wilcox wrote: >>>>>>> On Tue, Jun 19, 2018 at 10:29:49AM +0200, Jan Kara wrote: >>> [...] > I've spent some time on this. There are two obstacles with my approach of > putting special entry into inode's VMA tree: > > 1) If I want to place this special entry in inode's VMA tree, I either need > to allocate full VMA, somehow initiate it so that it's clear it's a special > "pinned" range, not a VMA => uses unnecessarily too much memory, it is > ugly. Another solution I was hoping for was that I would factor out some > common bits of vm_area_struct (pgoff, rb_node, ..) into a structure common > for VMA and the locked range => doable but causes a lot of churn as VMAs > are accessed (and modified!) at hundreds of places in the kernel. Some > accessor functions would help to reduce the churn a bit but then stuff like > vma_set_pgoff(vma, pgoff) isn't exactly beautiful either. > > 2) Some users of GUP (e.g. direct IO) get a block of pages and then put > references to these pages at different times and in random order - > basically when IO for given page is completed, reference is dropped and one > GUP call can acquire page references for pages which end up in multiple > different bios (we don't know in advance). This makes is difficult to > implement counterpart to GUP to 'unpin' a range of pages - we'd either have > to support partial unpins (and splitting of pinned ranges and all such fun) > or just have to track internally in how many pages are still pinned in the > originally pinned range and release the pin once all individual pages are > unpinned but then it's difficult to e.g. get to this internal structure > from IO completion callback where we only have the bio. > > So I think the Matthew's idea of removing pinned pages from LRU is > definitely worth trying to see how complex that would end up being. Did you > get to looking into it? If not, I can probably find some time to try that > out. > OK. Even if we remove the pages from the LRU, we still have to insert a "put_gup_page" or similarly named call. But it could be a simple replacement for put_page, with that approach, so that does make it much much easier. I was (and still am) planning on tackling this today, so let me see how far I get before yelling for help. :) thanks, -- John Hubbard NVIDIA