Subject: Re: [RFC PATCH 1/2] x86/ibpb: Skip IBPB when we switch back to same user process
To: Peter Zijlstra
Cc: Tim Chen, linux-kernel@vger.kernel.org, KarimAllah Ahmed, Andi Kleen,
    Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick,
    Borislav Petkov, Dan Williams, Dave Hansen, David Woodhouse,
    Greg Kroah-Hartman,
Peter Anvin" , Ingo Molnar , Janakarajan Natarajan , Joerg Roedel , Jun Nakajima , Laura Abbott , Linus Torvalds , Masami Hiramatsu , Paolo Bonzini , rkrcmar@redhat.com, Thomas Gleixner , Tom Lendacky , x86@kernel.org References: <20180125085820.GV2228@hirez.programming.kicks-ass.net> <20180125092233.GE2295@hirez.programming.kicks-ass.net> <86541aca-8de7-163d-b620-083dddf29184@linux.intel.com> <20180125135055.GK2249@hirez.programming.kicks-ass.net> From: Arjan van de Ven Message-ID: Date: Thu, 25 Jan 2018 06:07:07 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: <20180125135055.GK2249@hirez.programming.kicks-ass.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/25/2018 5:50 AM, Peter Zijlstra wrote: > On Thu, Jan 25, 2018 at 05:21:30AM -0800, Arjan van de Ven wrote: >>> >>> This means that 'A -> idle -> A' should never pass through switch_mm to >>> begin with. >>> >>> Please clarify how you think it does. >>> >> >> the idle code does leave_mm() to avoid having to IPI CPUs in deep sleep states >> for a tlb flush. > > The intel_idle code does, not the idle code. This is squirreled away in > some driver :/ afaik (but haven't looked in a while) acpi drivers did too > >> (trust me, that you really want, sequentially IPI's a pile of cores in a deep sleep >> state to just flush a tlb that's empty, the performance of that is horrific) > > Hurmph. I'd rather fix that some other way than leave_mm(), this is > piling special on special. > the problem was tricky. but of course if something better is possible lets figure this out problem is that an IPI to an idle cpu is both power inefficient and will take time, exit of a deep C state can be, say 50 to 100 usec range of time (it varies by many things, but for abstractly thinking about the problem one should generally round up to nice round numbers) if you have say 64 cores that had the mm at some point, but 63 are in idle, the 64th really does not want to IPI each of those 63 serially (technically this is does not need to be serial but IPI code is tricky, some things end up serializing this a bit) to get the 100 usec hit 63 times. Actually, even if it's not serialized, even ONE hit of 100 usec is unpleasant. so a CPU that goes idle wants to "unsubscribe" itself from those IPIs as general objective. but not getting flush IPIs is only safe if the TLBs in the CPU have nothing that such IPI would want to flush, so the TLB needs to be empty of those things. the only way to do THAT is to switch to an mm that is safe; a leave_mm() does this, but I'm sure other options exist. note: While a CPU that is in a deeper C state will itself flush the TLB, you don't know if you will actually enter that deep at the time of making OS decisions (if an interrupt comes in the cycle before mwait, mwait becomes a nop for example). In addition, once you wake up, you don't want the CPU to go start filling the TLBs with invalid data so you can't really just set a bit and flush after leaving idle