Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
To: Jan Kara, Matthew Wilcox
CC: Dan Williams, Christoph Hellwig, Jason Gunthorpe, John Hubbard,
 Michal Hocko, Christopher Lameter, Linux MM, LKML, linux-rdma
References: <311eba48-60f1-b6cc-d001-5cc3ed4d76a9@nvidia.com>
 <20180618081258.GB16991@lst.de>
 <3898ef6b-2fa0-e852-a9ac-d904b47320d5@nvidia.com>
 <0e6053b3-b78c-c8be-4fab-e8555810c732@nvidia.com>
 <20180619082949.wzoe42wpxsahuitu@quack2.suse.cz>
 <20180619090255.GA25522@bombadil.infradead.org>
 <20180619104142.lpilc6esz7w3a54i@quack2.suse.cz>
From: John Hubbard
Message-ID: <70001987-3938-d33e-11e0-de5b19ca3bdf@nvidia.com>
Date: Tue, 19 Jun 2018 11:11:48 -0700
In-Reply-To: <20180619104142.lpilc6esz7w3a54i@quack2.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/19/2018 03:41 AM, Jan Kara wrote:
> On Tue 19-06-18 02:02:55, Matthew Wilcox wrote:
>> On Tue, Jun 19, 2018 at 10:29:49AM +0200, Jan Kara wrote:
>>> And for record, the problem with page cache pages is not only that
>>> try_to_unmap() may unmap them. It is also that page_mkclean() can
>>> write-protect them. And once PTEs are write-protected filesystems may end
>>> up doing bad things if DMA then modifies the page contents (DIF/DIX
>>> failures, data corruption, oopses). As such I don't think that solutions
>>> based on page reference count have a big chance of dealing with the
>>> problem.
>>>
>>> And your page flag approach would also need to take page_mkclean() into
>>> account. And there the issue is that until the flag is cleared (i.e., we
>>> are sure there are no writers using references from GUP) you cannot
>>> writeback the page safely which does not work well with your idea of
>>> clearing the flag only once the page is evicted from page cache (hint, page
>>> cache page cannot get evicted until it is written back).
>>>
>>> So as sad as it is, I don't see an easy solution here.
>>
>> Pages which are "got" don't need to be on the LRU list. They'll be
>> marked dirty when they're put, so we can use page->lru for fun things
>> like a "got" refcount. If we use bit 1 of page->lru for PageGot, we've
>> got 30/62 bits in the first word and a full 64 bits in the second word.
>
> Interesting idea! It would destroy the aging information for the page but
> for pages accessed through GUP references that is very much vague concept
> anyway. It might be a bit tricky as pulling a page out of LRU requires page
> lock but I don't think that's a huge problem. And page cache pages not on
> LRU exist even currently when they are under reclaim so hopefully there
> won't be too many places in MM that would need fixing up for such pages.

This sounds promising, I'll try it out!

>
> I'm also still pondering the idea of inserting a "virtual" VMA into vma
> interval tree in the inode - as the GUP references are IMHO closest to an
> mlocked mapping - and that would achieve all the functionality we need as
> well. I just didn't have time to experiment with it.

How would this work? Would it have the same virtual address range? And how
does it avoid the problems we've been discussing? Sorry to be a bit slow
here. :)

>
> And then there's the aspect that both these approaches are a bit too
> heavyweight for some get_user_pages_fast() users (e.g. direct IO) - Al Viro
> had an idea to use page lock for that path but e.g. fs/direct-io.c would have
> problems due to lock ordering constraints (filesystem ->get_block would
> suddenly get called with the page lock held). But we can probably leave
> performance optimizations for phase two.

So I assume that phase one would be to apply this approach only to
get_user_pages_longterm. (Please let me know if that's wrong.)

thanks,
--
John Hubbard
NVIDIA
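
For readers following the thread, the bit budget in Matthew Wilcox's quoted suggestion (one flag bit in the first word of page->lru, with 30 or 62 bits left over for a "got" refcount) can be illustrated with a small userspace sketch. This is not code from the patch series or from the kernel. The names (GOT_FLAG, word_get, word_put, and so on) are hypothetical, the word is a plain uintptr_t rather than a real struct page field, bit 0 is simply assumed to be reserved for some other use, and a real implementation would also need atomic updates and overflow handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout of one pointer-sized word, for illustration only:
 *   bit 0     - left untouched (assumed to be reserved for another use)
 *   bit 1     - "got" flag (the PageGot bit from the discussion)
 *   bits 2..  - count of outstanding "got" references
 * That leaves 62 count bits on a 64-bit word and 30 on a 32-bit word.
 */
#define GOT_FLAG        ((uintptr_t)1 << 1)
#define GOT_COUNT_SHIFT 2
#define GOT_COUNT_ONE   ((uintptr_t)1 << GOT_COUNT_SHIFT)

/* True if the "got" flag is set in the word. */
static inline bool word_is_got(uintptr_t w)
{
    return (w & GOT_FLAG) != 0;
}

/* Number of outstanding references stored above the two low bits. */
static inline uintptr_t word_got_count(uintptr_t w)
{
    return w >> GOT_COUNT_SHIFT;
}

/* Take one reference: set the flag and bump the count (not atomic). */
static inline uintptr_t word_get(uintptr_t w)
{
    return (w | GOT_FLAG) + GOT_COUNT_ONE;
}

/* Drop one reference; clear the flag when the last reference goes away. */
static inline uintptr_t word_put(uintptr_t w)
{
    assert(word_is_got(w) && word_got_count(w) > 0);
    w -= GOT_COUNT_ONE;
    if (word_got_count(w) == 0)
        w &= ~GOT_FLAG;
    return w;
}
```

Under these assumptions, a word that goes through the same number of word_get() and word_put() calls returns to its starting value, and bit 0 is never disturbed.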