Date: Sun, 14 Oct 2018 10:01:24 +1100
From: Dave Chinner
To: John Hubbard
Cc: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe,
	Dan Williams, Jan Kara, linux-mm@kvack.org, Andrew Morton, LKML,
	linux-rdma, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 4/6] mm: introduce page->dma_pinned_flags, _count
Message-ID: <20181013230124.GB18822@dastard>
References: <20181012060014.10242-1-jhubbard@nvidia.com>
	<20181012060014.10242-5-jhubbard@nvidia.com>
	<20181013035516.GA18822@dastard>
	<7c2e3b54-0b1d-6726-a508-804ef8620cfd@nvidia.com>
In-Reply-To: <7c2e3b54-0b1d-6726-a508-804ef8620cfd@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
On Sat, Oct 13, 2018 at 12:34:12AM -0700, John Hubbard wrote:
> On 10/12/18 8:55 PM, Dave Chinner wrote:
> > On Thu, Oct 11, 2018 at 11:00:12PM -0700, john.hubbard@gmail.com wrote:
> >> From: John Hubbard
> [...]
> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> >> index 5ed8f6292a53..017ab82e36ca 100644
> >> --- a/include/linux/mm_types.h
> >> +++ b/include/linux/mm_types.h
> >> @@ -78,12 +78,22 @@ struct page {
> >>  	 */
> >>  	union {
> >>  		struct {	/* Page cache and anonymous pages */
> >> -			/**
> >> -			 * @lru: Pageout list, eg. active_list protected by
> >> -			 * zone_lru_lock.  Sometimes used as a generic list
> >> -			 * by the page owner.
> >> -			 */
> >> -			struct list_head lru;
> >> +			union {
> >> +				/**
> >> +				 * @lru: Pageout list, eg. active_list protected
> >> +				 * by zone_lru_lock.  Sometimes used as a
> >> +				 * generic list by the page owner.
> >> +				 */
> >> +				struct list_head lru;
> >> +				/* Used by get_user_pages*(). Pages may not be
> >> +				 * on an LRU while these dma_pinned_* fields
> >> +				 * are in use.
> >> +				 */
> >> +				struct {
> >> +					unsigned long dma_pinned_flags;
> >> +					atomic_t dma_pinned_count;
> >> +				};
> >> +			};
> > 
> > Isn't this broken for mapped file-backed pages? i.e. they may be
> > passed as the user buffer to read/write direct IO and so the pages
> > passed to gup will be on the active/inactive LRUs. hence I can't see
> > how you can have dual use of the LRU list head like this....
> > 
> > What am I missing here?
> 
> Hi Dave,
> 
> In patch 6/6, pin_page_for_dma(), which is called at the end of
> get_user_pages(), unceremoniously rips the pages out of the LRU, as a
> prerequisite to using either of the page->dma_pinned_* fields.

How is that safe? If you've ripped the page out of the LRU, it's no
longer being tracked by the page cache aging and reclaim algorithms.
Patch 6 doesn't appear to put these pages back in the LRU, either, so
it looks to me like this just dumps them on the ground after the gup
reference is dropped. How do we reclaim these page cache pages when
there is memory pressure if they aren't in the LRU?

> The idea is that LRU is not especially useful for this situation anyway,
> so we'll just make it one or the other: either a page is dma-pinned, and
> just hanging out doing RDMA most likely (and LRU is less meaningful during
> that time), or it's possibly on an LRU list.

gup isn't just used for RDMA. It's used by direct IO in far, far more
situations and machines than RDMA is. Please explain why ripping
pages out of the LRU and not putting them back is safe, has no side
effects, doesn't adversely impact page cache reclaim, etc. Indeed, I'd
love to see a description of all the page references and where they
come and go so we know the changes aren't just leaking these pages
until the filesystem invalidates them at unmount.

Maybe I'm not seeing why this is safe yet, but seeing as you haven't
explained why it is safe, then, at minimum, the patch descriptions are
incomplete.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com