Date: Mon, 2 Jul 2018 12:15:42 +0200
From: Jan Kara
To: john.hubbard@gmail.com
Cc: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe,
	Dan Williams, Jan Kara, linux-mm@kvack.org, LKML, linux-rdma,
	linux-fsdevel@vger.kernel.org, John Hubbard
Subject: Re: [PATCH v2 6/6] mm: page_mkclean, ttu: handle pinned pages
Message-ID: <20180702101542.fi7ndfkg5fpzodey@quack2.suse.cz>
References: <20180702005654.20369-1-jhubbard@nvidia.com>
	<20180702005654.20369-7-jhubbard@nvidia.com>
In-Reply-To: <20180702005654.20369-7-jhubbard@nvidia.com>
On Sun 01-07-18 17:56:54, john.hubbard@gmail.com wrote:
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 9d142b9b86dc..c4bc8d216746 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -931,6 +931,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	int kill = 1, forcekill;
>  	struct page *hpage = *hpagep;
>  	bool mlocked = PageMlocked(hpage);
> +	bool skip_pinned_pages = false;

I'm not sure we can afford to wait for page pins when handling page
poisoning. In an ideal world we would, but... I guess this is for
someone who understands memory poisoning better to judge.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index 6db729dc4c50..c137c43eb2ad 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -879,6 +879,26 @@ int page_referenced(struct page *page,
>  	return pra.referenced;
>  }
>  
> +/* Must be called with pinned_dma_lock held. */
> +static void wait_for_dma_pinned_to_clear(struct page *page)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	while (PageDmaPinnedFlags(page)) {
> +		spin_unlock(zone_gup_lock(zone));
> +
> +		schedule();
> +
> +		spin_lock(zone_gup_lock(zone));
> +	}
> +}

Ouch, we definitely need something better here. Either reuse the
page_waitqueue() mechanism or at least create a global wait queue for
this (I don't expect too much contention on the waitqueue, and even if
there eventually is, we can switch to page_waitqueue() when we see it).
But this busy-looping is a no-go... A rough sketch of the wait queue
idea is at the end of this mail.

> +
> +struct page_mkclean_info {
> +	int cleaned;
> +	int skipped;
> +	bool skip_pinned_pages;
> +};
> +
>  static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>  			     unsigned long address, void *arg)
>  {
> @@ -889,7 +909,24 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>  		.flags = PVMW_SYNC,
>  	};
>  	unsigned long start = address, end;
> -	int *cleaned = arg;
> +	struct page_mkclean_info *mki = (struct page_mkclean_info *)arg;
> +	bool is_dma_pinned;
> +	struct zone *zone = page_zone(page);
> +
> +	/* Serialize with get_user_pages: */
> +	spin_lock(zone_gup_lock(zone));
> +	is_dma_pinned = PageDmaPinned(page);

Hum, why do you do this for each page table the page is mapped in? Also
the locking is IMHO going to hurt a lot and we need to avoid it. What I
think needs to happen is that in page_mkclean(), after you've cleared
all the page tables, you check PageDmaPinned() and wait if needed. The
page cannot be faulted in again as we hold the page lock, so races with
concurrent GUP are fairly limited. So with some careful ordering &
memory barriers you should be able to get away without any locking
(sketch below as well). Ditto for the unmap path...
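For illustration, the global wait queue variant could look something
like the below. This is a completely untested sketch: PageDmaPinned()
is the test from your series, and put_user_page_notify() is just a
made-up name for whatever ends up dropping the last pin on the put
side.

static DECLARE_WAIT_QUEUE_HEAD(dma_pinned_wq);

/* Sleep until the last pin is dropped - no zone lock, no busy loop. */
static void wait_for_dma_pinned_to_clear(struct page *page)
{
	wait_event(dma_pinned_wq, !PageDmaPinned(page));
}

/* Hypothetical put side: once the pinned state is clear, wake waiters. */
static void put_user_page_notify(struct page *page)
{
	if (!PageDmaPinned(page))
		wake_up(&dma_pinned_wq);
}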
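And the page_mkclean() side could then be roughly as below - again
untested, reusing dma_pinned_wq from the sketch above; the exact
barrier pairing with the GUP path would need careful thought.

	/* In page_mkclean(), after the rmap walk has cleaned all ptes: */
	rmap_walk(page, &rwc);

	/*
	 * Make the pte changes visible before sampling the pin state -
	 * pairs with a barrier in the GUP path, so that a new pin
	 * either sees the write-protected ptes or we see the pin and
	 * wait for it.
	 */
	smp_mb();
	if (PageDmaPinned(page))
		wait_event(dma_pinned_wq, !PageDmaPinned(page));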
								Honza
-- 
Jan Kara
SUSE Labs, CR