From: SeongJae Park
To: David Hildenbrand
Cc: SeongJae Park, akpm@linux-foundation.org, markubo@amazon.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, SeongJae Park
Subject: Re: [PATCH] mm/damon/vaddr: Safely walk page table
Date: Tue, 31 Aug 2021 10:49:38 +0000
Message-Id: <20210831104938.33548-1-sjpark@amazon.de>
In-Reply-To:
X-Mailing-List: linux-kernel@vger.kernel.org

From: SeongJae Park

On Tue, 31 Aug 2021 11:53:05 +0200 David Hildenbrand wrote:

> On 27.08.21 17:04, SeongJae Park wrote:
> > From: SeongJae Park
> > 
> > Commit d7f647622761 ("mm/damon: implement primitives for the virtual
> > memory address spaces") of linux-mm[1] tries to find the PTE or PMD for
> > an arbitrary virtual address using 'follow_invalidate_pte()' without
> > proper locking[2]. This commit fixes the issue by using another page
> > table walk function intended for more general use cases, under proper
> > locking.
> > 
> > [1] https://github.com/hnaz/linux-mm/commit/d7f647622761
> > [2] https://lore.kernel.org/linux-mm/3b094493-9c1e-6024-bfd5-7eca66399b7e@redhat.com
> > 
> > Fixes: d7f647622761 ("mm/damon: implement primitives for the virtual memory address spaces")
> > Reported-by: David Hildenbrand
> > Signed-off-by: SeongJae Park
> > ---
> >  mm/damon/vaddr.c | 81 +++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 74 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> > index 230db7413278..b3677f2ef54b 100644
> > --- a/mm/damon/vaddr.c
> > +++ b/mm/damon/vaddr.c
> > @@ -8,10 +8,12 @@
> >  #define pr_fmt(fmt) "damon-va: " fmt
> >  
> >  #include <asm-generic/mman-common.h>
> > +#include <linux/hugetlb.h>
> >  #include <linux/damon.h>
> >  #include <linux/mm.h>
> >  #include <linux/mmu_notifier.h>
> >  #include <linux/page_idle.h>
> > +#include <linux/pagewalk.h>
> >  #include <linux/random.h>
> >  #include <linux/sched/mm.h>
> >  #include <linux/slab.h>
> > @@ -446,14 +448,69 @@ static void damon_pmdp_mkold(pmd_t *pmd, struct mm_struct *mm,
> >  #endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
> >  }
> >  
> > +struct damon_walk_private {
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +	spinlock_t *ptl;
> > +};
> > +
> > +static int damon_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next,
> > +		struct mm_walk *walk)
> > +{
> > +	struct damon_walk_private *priv = walk->private;
> > +
> > +	if (pmd_huge(*pmd)) {
> > +		priv->ptl = pmd_lock(walk->mm, pmd);
> > +		if (pmd_huge(*pmd)) {
> > +			priv->pmd = pmd;
> > +			return 0;
> > +		}
> > +		spin_unlock(priv->ptl);
> > +	}
> > +
> > +	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
> > +		return -EINVAL;
> > +	priv->pte = pte_offset_map_lock(walk->mm, pmd, addr, &priv->ptl);
> > +	if (!pte_present(*priv->pte)) {
> > +		pte_unmap_unlock(priv->pte, priv->ptl);
> > +		priv->pte = NULL;
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static struct mm_walk_ops damon_walk_ops = {
> > +	.pmd_entry = damon_pmd_entry,
> > +};
> > +
> > +int damon_follow_pte_pmd(struct mm_struct *mm, unsigned long addr,
> > +		struct damon_walk_private *private)
> > +{
> > +	int rc;
> > +
> > +	private->pte = NULL;
> > +	private->pmd = NULL;
> > +	rc = walk_page_range(mm, addr, addr + 1, &damon_walk_ops, private);
> > +	if (!rc && !private->pte && !private->pmd)
> > +		return -EINVAL;
> > +	return rc;
> > +}
> > +
> >  static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
> >  {
> > -	pte_t *pte = NULL;
> > -	pmd_t *pmd = NULL;
> > +	struct damon_walk_private walk_result;
> > +	pte_t *pte;
> > +	pmd_t *pmd;
> >  	spinlock_t *ptl;
> >  
> > -	if (follow_invalidate_pte(mm, addr, NULL, &pte, &pmd, &ptl))
> > +	mmap_write_lock(mm);
> 
> Can you elaborate why mmap_read_lock() isn't sufficient for your use
> case? The write mode might heavily affect damon performance and workload
> impact.

Because, as you also mentioned in the previous mail, 'we can walk page
tables ignoring VMAs with the mmap semaphore held in write mode', and in
this case we don't know to which VMA the address belongs. I thought the
link to that mail could help people understand the reason. But, as you
suggest, putting an elaborated explanation here would be much better. I
will also add a warning about the possible performance impact.
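To make the locking point concrete, here is a rough, illustrative sketch
(not part of the posted diff, which is truncated in the quote above) of how
the patched damon_va_mkold() would consume the walk result under the write
lock. It assumes damon_ptep_mkold() as the pre-existing PTE-side
counterpart of the damon_pmdp_mkold() helper visible in the hunk context:

static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
{
	struct damon_walk_private walk_result;

	/*
	 * Write mode: damon_follow_pte_pmd() walks the page table without
	 * knowing or checking which VMA the address belongs to, so all
	 * other users of the mmap semaphore must be excluded.
	 */
	mmap_write_lock(mm);
	if (damon_follow_pte_pmd(mm, addr, &walk_result)) {
		mmap_write_unlock(mm);
		return;
	}

	if (walk_result.pmd) {
		/* Huge page: walk_result.ptl is the pmd-level lock. */
		damon_pmdp_mkold(walk_result.pmd, mm, addr);
		spin_unlock(walk_result.ptl);
	} else {
		damon_ptep_mkold(walk_result.pte, mm, addr);
		pte_unmap_unlock(walk_result.pte, walk_result.ptl);
	}
	mmap_write_unlock(mm);
}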
> 
> Also, I wonder if it wouldn't be much easier and cleaner to just handle
> it completely in the .pmd_entry callback, instead of returning PMDs,
> PTEs, LOCKs, ... here.
> 
> You could have
> 
> static struct mm_walk_ops damon_mkold_ops = {
> 	.pmd_entry = damon_mkold_pmd_entry,
> };
> 
> and
> 
> static struct mm_walk_ops damon_young_ops = {
> 	.pmd_entry = damon_young_pmd_entry,
> };
> 
> And then just handle everything completely inside the callback, avoiding
> having to return locked PTEs, PMDs, ... and instead handling it at a
> single location. Simply forward the page_sz pointer in the latter case
> to damon_young_ops.
> 
> damon_va_mkold()/damon_va_young() would mostly only call
> walk_page_range() with the right ops and eventually convert some return
> values.

I just wanted to make the change as small as possible, but you're right.
That would be much cleaner. I will post the next version soon.


Thanks,
SeongJae Park

> 
> -- 
> Thanks,
> 
> David / dhildenb
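For reference, the consolidated shape David sketches above, which the reply
promises for the next version, might look roughly like the following. This
is an illustrative sketch, not the actual follow-up posting, under the same
assumption that damon_ptep_mkold() is the existing PTE-side helper:

static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr,
		unsigned long next, struct mm_walk *walk)
{
	pte_t *pte;
	spinlock_t *ptl;

	/* Recheck under the pmd lock; the entry can change concurrently. */
	if (pmd_huge(*pmd)) {
		ptl = pmd_lock(walk->mm, pmd);
		if (pmd_huge(*pmd)) {
			damon_pmdp_mkold(pmd, walk->mm, addr);
			spin_unlock(ptl);
			return 0;
		}
		spin_unlock(ptl);
	}

	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
		return 0;
	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
	if (pte_present(*pte))
		damon_ptep_mkold(pte, walk->mm, addr);
	pte_unmap_unlock(pte, ptl);
	return 0;
}

static const struct mm_walk_ops damon_mkold_ops = {
	.pmd_entry = damon_mkold_pmd_entry,
};

static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
{
	/*
	 * walk_page_range() checks VMAs itself, so the read lock is
	 * sufficient here, addressing the performance concern above.
	 */
	mmap_read_lock(mm);
	walk_page_range(mm, addr, addr + 1, &damon_mkold_ops, NULL);
	mmap_read_unlock(mm);
}

Every reference and lock is taken and released inside the single callback,
so nothing locked ever escapes the walk.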