Date: Wed, 21 Jul 2021 15:41:05 +0000
From: Sean Christopherson
To: Yang Shi
Cc: Zi Yan, Christian Borntraeger, Huang Ying, Andrew Morton, Linux MM,
    Linux Kernel Mailing List, Dan Carpenter, Mel Gorman, Gerald Schaefer,
    Heiko Carstens, Hugh Dickins, Andrea Arcangeli, "Kirill A. Shutemov",
    Michal Hocko, Vasily Gorbik, Paolo Bonzini, kvm list
Subject: Re: [PATCH] mm,do_huge_pmd_numa_page: remove unnecessary TLB flushing code
References: <20210720065529.716031-1-ying.huang@intel.com> <0D75A92F-E2AA-480C-9E9A-0B6EE7897757@nvidia.com>

On Tue, Jul 20, 2021, Yang Shi wrote:
> On Tue, Jul 20, 2021 at 2:04 PM Zi Yan wrote:
> >
> > On 20 Jul 2021, at 16:53, Yang Shi wrote:
> >
> > > On Tue, Jul 20, 2021 at 7:25 AM Christian Borntraeger wrote:
> > >>> -	if (mm_tlb_flush_pending(vma->vm_mm)) {
> > >>> -		flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
> > >>> -		/*
> > >>> -		 * change_huge_pmd() released the pmd lock before
> > >>> -		 * invalidating the secondary MMUs sharing the primary
> > >>> -		 * MMU pagetables (with ->invalidate_range()). The
> > >>> -		 * mmu_notifier_invalidate_range_end() (which
> > >>> -		 * internally calls ->invalidate_range()) in
> > >>> -		 * change_pmd_range() will run after us, so we can't
> > >>> -		 * rely on it here and we need an explicit invalidate.
> > >>> -		 */
> > >>> -		mmu_notifier_invalidate_range(vma->vm_mm, haddr,
> > >>> -					      haddr + HPAGE_PMD_SIZE);
> > >>> -	}
> > >>
> > >> CC Paolo/KVM list, since we also remove the mmu notifier here. Do we
> > >> need those now in migrate_pages? I am not an expert in that code, but
> > >> I can't find an equivalent mmu_notifier in migrate_misplaced_pages.
> > >> I might be totally wrong, just something that I noticed.
> > >
> > > Do you mean the missed mmu notifier invalidate for the THP migration
> > > case? Yes, I noticed that too. But I'm not sure whether it is intended
> > > or just missed.
> >
> > From my understanding of the mmu_notifier documentation,
> > mmu_notifier_invalidate_range() is needed only if the PTE is updated to
> > point to a new page or the page pointed to by the PTE is freed. Page
> > migration does not fall into either case.

The "new page" part of "a page table entry is updated to point to a new
page" is referring to a different physical page, i.e. a different pfn, not
a different struct page.  do_huge_pmd_numa_page() is moving a THP between
nodes, thus it's changing the backing pfn and needs to invalidate secondary
MMUs at some point.

> > In addition, in migrate_pages(), more specifically try_to_migrate_one(),
> > there is a pair of mmu_notifier_invalidate_range_start() and
> > mmu_notifier_invalidate_range_end() around the PTE manipulation code,
> > which should be sufficient to notify secondary TLBs (including KVM)
> > about the PTE change for page migration. Correct me if I am wrong.
>
> Thanks, I think you are correct. Looking into commit 7066f0f933a1
> ("mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()"),
> the TLB flush and mmu notifier invalidate were needed because the old
> NUMA fault implementation didn't change the PTE to a migration entry,
> so writes through a GPU's secondary MMU could cause data corruption.
>
> The refactor does use the generic migration code, which converts the PTE
> to a migration entry before copying data to the new page.

That's my understanding as well, based on this blurb from commit
7066f0f933a1:

    The standard PAGE_SIZEd migrate_misplaced_page is less accelerated
    and uses the generic migrate_pages which transitions the pte from
    numa/protnone to a migration entry in try_to_unmap_one() and flushes
    TLBs and all mmu notifiers there before copying the page.

That analysis/justification for removing the invalidate_range() call
should be captured in the changelog.  Confirmation from Andrea would be a
nice bonus.