Date: Wed, 17 Feb 2021 15:25:18 -0500
From: Peter Xu
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-s390@vger.kernel.org, shu wang, Axel Rasmussen,
    Andrea Arcangeli, Heiko Carstens, Alexey Dobriyan, Matthew Wilcox,
    Michel Lespinasse, Andrew Morton
Subject: Re: [RFC PATCH 5/5] mm proc/task_mmu.c: add hugetlb specific routine for clear_refs
Message-ID: <20210217202518.GA19238@xz-x1>
In-Reply-To: <20210211000322.159437-6-mike.kravetz@oracle.com>
References: <20210211000322.159437-1-mike.kravetz@oracle.com>
 <20210211000322.159437-6-mike.kravetz@oracle.com>

On Wed, Feb 10, 2021 at 04:03:22PM -0800, Mike Kravetz wrote:
> There is no hugetlb specific routine for clearing soft dirty and
> other references.  The 'default' routines would only clear the
> VM_SOFTDIRTY flag in the vma.
>
> Add a new routine specifically for hugetlb vmas.
>
> Signed-off-by: Mike Kravetz
> ---
>  fs/proc/task_mmu.c | 110 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 110 insertions(+)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 829b35016aaa..f06cf9b131a8 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1116,6 +1116,115 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
>  }
>  #endif
>
> +#ifdef CONFIG_HUGETLB_PAGE
> +static inline bool huge_pte_is_pinned(struct vm_area_struct *vma,
> +				      unsigned long addr, pte_t pte)
> +{
> +	struct page *page;
> +
> +	if (likely(!atomic_read(&vma->vm_mm->has_pinned)))
> +		return false;
> +	page = pte_page(pte);
> +	if (!page)
> +		return false;
> +	return page_maybe_dma_pinned(page);
> +}
> +
> +static int clear_refs_hugetlb_range(pte_t *ptep, unsigned long hmask,
> +				    unsigned long addr, unsigned long end,
> +				    struct mm_walk *walk)
> +{
> +	struct clear_refs_private *cp = walk->private;
> +	struct vm_area_struct *vma = walk->vma;
> +	struct hstate *h = hstate_vma(walk->vma);
> +	unsigned long adj_start = addr, adj_end = end;
> +	spinlock_t *ptl;
> +	pte_t old_pte, pte;
> +
> +	/*
> +	 * clear_refs should only operate on complete vmas.  Therefore,
> +	 * values passed here should be huge page aligned and huge page
> +	 * size in length.  Quick validation before taking any action in
> +	 * case upstream code is changed.
> +	 */
> +	if ((addr & hmask) != addr || end - addr != huge_page_size(h)) {
> +		WARN_ONCE(1, "%s passed unaligned address\n", __func__);
> +		return 1;
> +	}

I wouldn't worry too much about the interface change - whoever changes the
interface should guarantee all existing hooks still work, shouldn't they? :)

It's slightly confusing to me why "clear_refs should only operate on
complete vmas" is related to the check, though.

> +
> +	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, ptep);
> +
> +	/* Soft dirty and pmd sharing do not mix */

Right, this seems to be a placeholder for the unsharing code.  Though maybe
we could do that earlier, in the pre_vma() hook?  That would be per-vma
rather than handling one specific huge page here, hence more efficient imho.

This also reminds me that I should move hugetlb_unshare_all_pmds() from my
other patch into hugetlb.c so that this code can call it.  Currently it's a
static function in userfaultfd.c.
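Something like this completely untested sketch is what I have in mind.
Note that clear_refs_pre_vma() is hypothetical, and it assumes
hugetlb_unshare_all_pmds() has already been moved into hugetlb.c and
exported:

static int clear_refs_pre_vma(unsigned long start, unsigned long end,
			      struct mm_walk *walk)
{
	struct clear_refs_private *cp = walk->private;
	struct vm_area_struct *vma = walk->vma;

	/*
	 * Unshare all pmds of the vma up front, so that the per-page
	 * hook never sees a shared pmd and no longer needs to adjust
	 * the walked range.
	 */
	if (is_vm_hugetlb_page(vma) && cp->type == CLEAR_REFS_SOFT_DIRTY)
		hugetlb_unshare_all_pmds(vma);

	return 0;
}

Then clear_refs_walk_ops would grow a ".pre_vma = clear_refs_pre_vma"
entry next to the existing .test_walk hook.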
> +
> +	pte = huge_ptep_get(ptep);
> +	if (!pte_present(pte))
> +		goto out;
> +	if (unlikely(is_hugetlb_entry_hwpoisoned(pte)))
> +		goto out;
> +
> +	if (cp->type == CLEAR_REFS_SOFT_DIRTY) {

Maybe move this check into clear_refs_test_walk()?  We could bail out
earlier there if:

	(is_vm_hugetlb_page(vma) && (type != CLEAR_REFS_SOFT_DIRTY))

> +		if (huge_pte_is_pinned(vma, addr, pte))
> +			goto out;

Off topic for this patchset, but it's definitely a pity that we can't track
soft dirty for pinned pages.  The current assumption of the pte code path
is: "if this page can be DMA written then we won't know whether the data
changed after all, so tracking dirty is meaningless"; however that's prone
to change as new hardware arrives - say, an IOMMU could start to trap DMA
writes.  But that's another story... here we should just follow what we do
for non-hugetlbfs, until some day we'd like to revive soft dirty tracking
for pinned pages.

> +
> +		/*
> +		 * soft dirty and pmd sharing do not work together as
> +		 * per-process is tracked in ptes, and pmd sharing allows
> +		 * processes to share ptes.  We unshare any pmds here.
> +		 */
> +		adjust_range_if_pmd_sharing_possible(vma, &adj_start, &adj_end);

Ideally, by the time we reach here huge pmd sharing won't exist any more,
right?  Then do we still need to adjust the range at all?

Thanks,

-- 
Peter Xu