Date: Tue, 19 Jun 2018 12:41:42 +0200
From: Jan Kara
To: Matthew Wilcox
Cc: Jan Kara, John Hubbard, Dan Williams, Christoph Hellwig, Jason Gunthorpe,
	John Hubbard, Michal Hocko, Christopher Lameter, Linux MM, LKML,
	linux-rdma
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
Message-ID: <20180619104142.lpilc6esz7w3a54i@quack2.suse.cz>
References: <311eba48-60f1-b6cc-d001-5cc3ed4d76a9@nvidia.com>
 <20180618081258.GB16991@lst.de>
 <3898ef6b-2fa0-e852-a9ac-d904b47320d5@nvidia.com>
 <0e6053b3-b78c-c8be-4fab-e8555810c732@nvidia.com>
 <20180619082949.wzoe42wpxsahuitu@quack2.suse.cz>
 <20180619090255.GA25522@bombadil.infradead.org>
In-Reply-To: <20180619090255.GA25522@bombadil.infradead.org>
User-Agent: NeoMutt/20170912 (1.9.0)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 19-06-18 02:02:55, Matthew Wilcox wrote:
> On Tue, Jun 19, 2018 at 10:29:49AM +0200, Jan Kara wrote:
> > And for the record, the problem with page cache pages is not only that
> > try_to_unmap() may unmap them. It is also that page_mkclean() can
> > write-protect them. And once PTEs are write-protected, filesystems may
> > end up doing bad things if DMA then modifies the page contents (DIF/DIX
> > failures, data corruption, oopses). As such, I don't think that
> > solutions based on the page reference count have a big chance of
> > dealing with the problem.
> >
> > And your page flag approach would also need to take page_mkclean() into
> > account. And there the issue is that until the flag is cleared (i.e.,
> > we are sure there are no writers using references from GUP) you cannot
> > write back the page safely, which does not work well with your idea of
> > clearing the flag only once the page is evicted from the page cache
> > (hint: a page cache page cannot get evicted until it is written back).
> >
> > So as sad as it is, I don't see an easy solution here.
>
> Pages which are "got" don't need to be on the LRU list. They'll be
> marked dirty when they're put, so we can use page->lru for fun things
> like a "got" refcount. If we use bit 1 of page->lru for PageGot, we've
> got 30/62 bits in the first word and a full 64 bits in the second word.

Interesting idea! It would destroy the aging information for the page, but
for pages accessed through GUP references that is a rather vague concept
anyway. It might be a bit tricky, as pulling a page out of the LRU requires
the page lock, but I don't think that's a huge problem. And page cache
pages not on the LRU already exist today while they are under reclaim, so
hopefully there won't be too many places in MM that would need fixing up
for such pages.

I'm also still pondering the idea of inserting a "virtual" VMA into the vma
interval tree in the inode - as GUP references are IMHO closest to an
mlocked mapping - and that would achieve all the functionality we need as
well. I just didn't have time to experiment with it.

And then there's the aspect that both these approaches are a bit too
heavyweight for some get_user_pages_fast() users (e.g. direct IO) - Al Viro
had an idea to use the page lock for that path, but e.g. fs/direct-io.c
would have problems due to lock ordering constraints (the filesystem's
->get_block would suddenly get called with the page lock held). But we can
probably leave performance optimizations for phase two.

								Honza
-- 
Jan Kara
SUSE Labs, CR
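
[For context, the get_user_pages_fast() usage pattern under discussion
(direct IO and similar users) looks roughly like the sketch below. It is
illustrative, not taken from fs/direct-io.c or any real driver: the
2018-era four-argument get_user_pages_fast() signature is assumed, and
start_device_dma()/wait_for_device_dma() are hypothetical stand-ins for a
device's DMA machinery. Note that between the pin and the final put_page()
nothing stops page_mkclean() from write-protecting the PTEs and writeback
from starting while the DMA is still in flight, which is exactly the
corruption window described above.]

	#include <linux/mm.h>
	#include <linux/slab.h>

	/* Hypothetical device hooks, for illustration only. */
	void start_device_dma(struct page **pages, int nr);
	void wait_for_device_dma(void);

	static int dma_into_user_buffer(unsigned long uaddr, int nr_pages)
	{
		struct page **pages;
		int i, got;

		pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
		if (!pages)
			return -ENOMEM;

		/* 2018-era signature: write=1 asks for writable references */
		got = get_user_pages_fast(uaddr, nr_pages, 1, pages);
		if (got < 0) {
			kfree(pages);
			return got;
		}

		/* Device writes into the pinned pages. */
		start_device_dma(pages, got);
		wait_for_device_dma();

		for (i = 0; i < got; i++) {
			set_page_dirty_lock(pages[i]);	/* tell writeback we wrote */
			put_page(pages[i]);		/* drop the GUP reference */
		}
		kfree(pages);
		return 0;
	}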
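
[A very rough sketch of the page->lru encoding Matthew proposes, with all
helper names hypothetical: once a page has been isolated from the LRU
(under the page lock), the two list_head words in page->lru are reused as
a marker bit plus a "got" count. Bit 0 of the first word is left alone
because it overlaps the compound_head tail-page encoding, hence bit 1.
This only sketches the encoding; LRU isolation/putback and the interaction
with page_mkclean() that Jan raises are the hard part and are not shown.]

	#include <linux/mm_types.h>

	#define PAGE_GOT	(1UL << 1)	/* bit 0 is the tail-page bit */

	/* Hypothetical: call with the page locked and off the LRU. */
	static void page_got_init(struct page *page)
	{
		page->lru.next = (struct list_head *)PAGE_GOT;	/* marker word */
		page->lru.prev = NULL;				/* 64-bit "got" count */
	}

	static bool page_is_got(const struct page *page)
	{
		return (unsigned long)page->lru.next & PAGE_GOT;
	}

	static void page_got_inc(struct page *page)
	{
		page->lru.prev = (struct list_head *)
				 ((unsigned long)page->lru.prev + 1);
	}

	static bool page_got_dec_and_test(struct page *page)
	{
		unsigned long count = (unsigned long)page->lru.prev - 1;

		page->lru.prev = (struct list_head *)count;
		return count == 0;	/* last put: page may rejoin the LRU */
	}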