Date: Fri, 11 Mar 2022 12:41:41 -0800
From: Dave Hansen
To: Nadav Amit, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Nadav Amit, Andrea Arcangeli, Andrew Cooper, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
References: <20220311190749.338281-1-namit@vmware.com> <20220311190749.338281-6-namit@vmware.com>
In-Reply-To: <20220311190749.338281-6-namit@vmware.com>
Subject: Re: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/11/22 11:07, Nadav Amit wrote:
> From: Nadav Amit
>
> Calls to change_protection_range() on THP can trigger, at least on x86,
> two TLB flushes for one page: one immediately, when pmdp_invalidate() is
> called by change_huge_pmd(), and then another one later (that can be
> batched) when change_protection_range() finishes.
>
> The first TLB flush is only necessary to prevent the dirty bit (and with
> a lesser importance the access bit) from changing while the PTE is
> modified. However, this is not necessary as the x86 CPUs set the
> dirty-bit atomically with an additional check that the PTE is (still)
> present. One caveat is Intel's Knights Landing that has a bug and does
> not do so.

First of all, thank you for your diligence here. This is a super
obscure issue. I think I put handling for it in the kernel and I'm not
sure I would have even thought about this angle.

That said, I'm not sure this is all necessary.

Yes, the Dirty bit can get set unexpectedly in some PTEs. But, the
question is whether it is *VALUABLE* and needs to be preserved. The
current kernel code pretty much just lets the hardware set the Dirty bit
and then ignores it. If it were valuable, ignoring it would have been a
bad thing. We'd be losing data on today's kernels because the hardware
told us about a write that happened but that the kernel ignored.

My mental model of what the microcode responsible for the erratum does
is something along these lines:

	if (write)
		pte |= _PAGE_DIRTY;
	if (!pte_present(pte))
		#PF

The PTE is marked dirty, but the write never actually executes. The
thread that triggered the A/D setting *also* gets a fault.

I'll double-check with some Intel folks to make sure I'm not missing
something. But, either way, I don't think we should be going to this
much trouble for the good ol' Xeon Phi. I doubt there are many still
around and I *REALLY* doubt they're running new kernels.

*If* we need this (and I'm not convinced we do), my first instinct would
be to just do this instead:

	clear_cpu_cap(c, X86_FEATURE_PSE);

on KNL systems. If anyone cares, they know where to find us.
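
The mental model above can be spelled out as a small user-space sketch.
This is only a toy, not kernel code and not a description of what the
hardware really does: the toy_* names and TOY_* constants are
hypothetical, and the bit positions are assumptions chosen to resemble
x86. It contrasts a write that checks Present before setting Dirty with
the erratum-style ordering, where Dirty is set first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy PTE layout; bit positions are assumptions of this sketch. */
#define TOY_PAGE_PRESENT (UINT64_C(1) << 0)
#define TOY_PAGE_DIRTY   (UINT64_C(1) << 6)

typedef uint64_t toy_pte_t;

bool toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PAGE_PRESENT) != 0;
}

/*
 * Expected ordering: Dirty is set only if the PTE is (still) present.
 * Returns true if the write faults.
 */
bool toy_write_sane(toy_pte_t *pte)
{
	if (!toy_pte_present(*pte))
		return true;		/* fault, PTE left untouched */
	*pte |= TOY_PAGE_DIRTY;
	return false;
}

/*
 * Erratum-style ordering from the mental model: Dirty is set first and
 * presence is checked afterwards. Returns true if the write faults.
 */
bool toy_write_erratum(toy_pte_t *pte)
{
	*pte |= TOY_PAGE_DIRTY;		/* set even for a cleared PTE */
	return !toy_pte_present(*pte);	/* the write still faults */
}
```

Calling both helpers on a cleared (zero) PTE reports a fault in each
case; only toy_write_erratum() leaves the toy Dirty bit behind. That
matches the point made above: the stray bit does not correspond to a
write that actually executed.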