Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp5607293ybi; Wed, 31 Jul 2019 00:15:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqywgBmNQ8LzVf/tCWjfc8sgNxHGQ1WxfyvN+rcoZ9Uiy3YjiG9j4CgE0UoC/afs4WsZDdKX X-Received: by 2002:a17:90a:220a:: with SMTP id c10mr1431622pje.33.1564557312656; Wed, 31 Jul 2019 00:15:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564557312; cv=none; d=google.com; s=arc-20160816; b=eRhQ3eMs6r0fDOrHAJegx3i5mL3Jd8+bO/GH7l/uBXAhi5PsFNOtQiP7hwySxpIn+J LJbLOW2OxYhiTn93CS1S0ESNoUcKe5Xh350IFS0xpQf0E1ltLlSW8yTOJd2V1VE68yx0 bDX8k/dHv1I5r8tfwsLGs1WmfAHip8ytDbc/CAuyYBKessSt6EmqBVS7QUTWghnd9b6N QBSjU30o8JBrNAprq2rtT6VOG40ey4Iojj2rrSStlvDbf6gkB3JMzbkmYedxNqKritEN tm52g8N1iBSe8X59HKv4KPzDFK6Ts7OEKpUL0G4W7+JFkcOMWtIe6tytoZB2lMvIiDU6 /+ZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=iqFMksg+GEFnhHcmM4K8TjXa4SGBpwoJKaNawCff3BE=; b=kacmHyYohQ7yfKCvNJz4ThyIcD1sz2oqe6bUgV9Ms/yr7utcg2uD/FZwCvDecV6nBY UM8EEu/zkV3KlVHcAPiKnYz8/6IP0ial8UiQIfYHStadiAexAYP4ZPc5kYIsbmQBB7qt vK85t71yU6d5st8SsFwcrZGaZxO9Z7QQiuSLdgFiPbg3+/uY1gWxrOJ3i0low4jp6Waj 9HdXpHvGPGP3JL6T6h1aLtEHnsKyuzQ5v6QGBNNTbqoPQOaV1AnrYnXEylkBPgksgVnT XjGU6KQ8rcADxj8ZLWisKnqY6nE9irPDcTbwl/a5pg/v6kUQ2hNPoLm1QHUe7TjU1Plx HgKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=pIS3Y8Su; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n6si37784590pgs.333.2019.07.31.00.14.56; Wed, 31 Jul 2019 00:15:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@gmail.com header.s=20161025 header.b=pIS3Y8Su; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727158AbfGaGOr (ORCPT + 99 others); Wed, 31 Jul 2019 02:14:47 -0400 Received: from mail-pl1-f196.google.com ([209.85.214.196]:41792 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726078AbfGaGOr (ORCPT ); Wed, 31 Jul 2019 02:14:47 -0400 Received: by mail-pl1-f196.google.com with SMTP id m9so29887790pls.8 for ; Tue, 30 Jul 2019 23:14:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=iqFMksg+GEFnhHcmM4K8TjXa4SGBpwoJKaNawCff3BE=; b=pIS3Y8SuH3hxHZYjEY+VPhyyjAyo+FGqtGWW4NAuNJfqiqSygcvaq1zogoZMZ22TDV e5/jFcKAqDx4QgkhPV/1LWdmUTVf2YDlRCVHSue09MBhLKtMMGWoxspklADGUYxUARbJ 46c8B/8+nQT0JQ7zxbD0kr8Vock5IGm9m0UJ+m6oCCl3Q06B+1fGK6bCiDdMJdQQucoY 0v8ReN13g5if32JdgIjkTLHW7JfgkznVnENKTdGMGreyrD3BpaVGsIQIjIoZXNLxzCzo 0R7aNcYuW3KqKa7dbBwWu3xZthNfGW+NlU04sAXWE4Ds4uO6RSpJ1mgPAfQfkoGmC6qg gL/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to:user-agent; bh=iqFMksg+GEFnhHcmM4K8TjXa4SGBpwoJKaNawCff3BE=; b=fwYbxNDjTVFhwBSuM02vyr4VeJi6q5Tc7ZuIWKhjKByMcMcoLwENzh4vd7Aqw3SgZ7 sptzWWtL4W7ywYx7/sBhXavaGUbtWQ6IynO17gBKBX82uIYHqlUFJwqXtkd/oDpivSpa koUgi9GFZ0KyhFhlqSW+IMk2ucrsBsnIiH6Skts4F8+r/oYLRmDNG7G3zQwoUH5Pw3va YcUqJYVcXDjqgXtbzNGAB0viwrbsgPJ5SuTZW9HVcY7oCufUR7S5WxCdiyHBjFxWJdaQ +Hyi8sQnkoYt0ISjp8qVnuBwvcy5vdDjq1JO2ubWbjgGICcIpab1qyIgcAxdNXqGmeYj jl9Q== X-Gm-Message-State: APjAAAUAqSabs5wm6GjE0Gx7F7fPNakdps2Q6zvorvDP9yC/ON9Rv2Dp NDBZTfTmQDubgIbqYZn4sHA= X-Received: by 2002:a17:902:2bc5:: with SMTP id l63mr122711848plb.30.1564553685998; Tue, 30 Jul 2019 23:14:45 -0700 (PDT) Received: from google.com ([2401:fa00:d:0:98f1:8b3d:1f37:3e8]) by smtp.gmail.com with ESMTPSA id 22sm76624580pfu.179.2019.07.30.23.14.42 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Tue, 30 Jul 2019 23:14:44 -0700 (PDT) Date: Wed, 31 Jul 2019 15:14:40 +0900 From: Minchan Kim To: Andrew Morton Cc: Michal Hocko , LKML , linux-mm , Miguel de Dios , Wei Wang , Johannes Weiner , Mel Gorman Subject: Re: [PATCH] mm: release the spinlock on zap_pte_range Message-ID: <20190731061440.GC155569@google.com> References: <20190729071037.241581-1-minchan@kernel.org> <20190729074523.GC9330@dhcp22.suse.cz> <20190729082052.GA258885@google.com> <20190730124207.da70f92f19dc021bf052abd0@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190730124207.da70f92f19dc021bf052abd0@linux-foundation.org> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 30, 2019 at 12:42:07PM -0700, Andrew Morton wrote: > On Mon, 29 Jul 2019 17:20:52 +0900 Minchan Kim wrote: > > > > > @@ -1022,7 +1023,16 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, > > > > flush_tlb_batched_pending(mm); > > > > arch_enter_lazy_mmu_mode(); > > > > do { > > > > - pte_t ptent = *pte; > > > > + pte_t ptent; > > > > + > > > > + if (progress >= 32) { > > > > + progress = 0; > > > > + if (need_resched()) > > > > + break; > > > > + } > > > > + progress += 8; > > > > > > Why 8? > > > > Just copied from copy_pte_range. > > copy_pte_range() does > > if (pte_none(*src_pte)) { > progress++; > continue; > } > entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte, > vma, addr, rss); > if (entry.val) > break; > progress += 8; > > which appears to be an attempt to balance the cost of copy_one_pte() > against the cost of not calling copy_one_pte(). > Indeed. > Your code doesn't do this balancing and hence can be simpler. Based on the balancing code of copy_one_pte, it seems we should balance it with cost of mark_page_accessed against the cost of not calling mark_page_accessed. IOW, add up 8 only when mark_page_accessed is called. However, every mark_page_accessed is not heavy since it uses pagevec and caller couldn't know whether the target page will be activated or just have PG_referenced which is cheap. Thus, I agree, do not make it complicated. > > It all seems a bit overdesigned. need_resched() is cheap. It's > possibly a mistake to check need_resched() on *every* loop because some > crazy scheduling load might livelock us. But surely it would be enough > to do something like > > if (progress++ && need_resched()) { > > progress = 0; > } > > and leave it at that? Seems like this? From bb1d7aaf520e98a6f9d988c25121602c28e12e67 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Mon, 29 Jul 2019 15:28:48 +0900 Subject: [PATCH] mm: release the spinlock on zap_pte_range In our testing(carmera recording), Miguel and Wei found unmap_page_range takes above 6ms with preemption disabled easily. When I see that, the reason is it holds page table spinlock during entire 512 page operation in a PMD. 6.2ms is never trivial for user experince if RT task couldn't run in the time because it could make frame drop or glitch audio problem. I had a time to benchmark it via adding some trace_printk hooks between pte_offset_map_lock and pte_unmap_unlock in zap_pte_range. The testing device is 2018 premium mobile device. I can get 2ms delay rather easily to release 2M(ie, 512 pages) when the task runs on little core even though it doesn't have any IPI and LRU lock contention. It's already too heavy. If I remove activate_page, 35-40% overhead of zap_pte_range is gone so most of overhead(about 0.7ms) comes from activate_page via mark_page_accessed. Thus, if there are LRU contention, that 0.7ms could accumulate up to several ms. Thus, this patch adds preemption point for once every 32 times in the loop. Reported-by: Miguel de Dios Reported-by: Wei Wang Cc: Michal Hocko Cc: Johannes Weiner Cc: Mel Gorman Signed-off-by: Minchan Kim --- mm/memory.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 2e796372927fd..8bfcef09da674 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1007,6 +1007,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, struct zap_details *details) { struct mm_struct *mm = tlb->mm; + int progress = 0; int force_flush = 0; int rss[NR_MM_COUNTERS]; spinlock_t *ptl; @@ -1022,7 +1023,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, flush_tlb_batched_pending(mm); arch_enter_lazy_mmu_mode(); do { - pte_t ptent = *pte; + pte_t ptent; + + if (progress++ >= 32) { + progress = 0; + if (need_resched()) + break; + } + + ptent = *pte; if (pte_none(ptent)) continue; @@ -1123,8 +1132,11 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb, if (force_flush) { force_flush = 0; tlb_flush_mmu(tlb); - if (addr != end) - goto again; + } + + if (addr != end) { + progress = 0; + goto again; } return addr; -- 2.22.0.709.g102302147b-goog