Date: Tue, 25 Apr 2023 17:10:42 -0700
From: "H. Peter Anvin"
To: Dave Hansen, Thomas Gleixner, Tony Battersby, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org
CC: Mario Limonciello, Tom Lendacky, "linux-kernel@vger.kernel.org", Andi Kleen
Subject: Re: [PATCH RFC] x86/cpu: fix intermittent lockup on poweroff
References: <3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com> <87o7nbzn8w.ffs@tglx>

On April 25, 2023 3:29:49 PM PDT, Dave Hansen wrote:
>On 4/25/23 14:05, Thomas Gleixner wrote:
>> The only consequence of looking at bit 0 of some random other leaf is
>> that all CPUs which run stop_this_cpu() issue WBINVD in parallel, which
>> is slow but should not be a fatal issue.
>>
>> Tony observed this is a 50% chance to hang, which means this is a timing
>> issue.
>
>I _think_ the system in question is a dual-socket Westmere. I don't see
>any obvious errata that we could pin this on:
>
>> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf
>
>Andi Kleen had an interesting theory. WBINVD is a pretty expensive
>operation. It's possible that it has some degenerative behavior when
>it's called on a *bunch* of CPUs all at once (which this path can do).
>If the instruction takes too long, it could trigger one of the CPU's
>internal lockup detectors and trigger a machine check. At
>that point, all hell breaks loose.
>
>I don't know the cache coherency protocol well enough to say for sure,
>but I wonder if there's a storm of cache coherency traffic as all those
>lines get written back. One of the CPUs gets starved from making enough
>forward progress and trips a CPU-internal watchdog.
>
>Andi also says that it _should_ log something in the machine check banks
>when this happens so there should be at least some kind of breadcrumb.
>
>Either way, I'm hoping this hand waving satiates tglx's morbid curiosity
>about hardware that came out from before I even worked at Intel. ;)

"Pretty expensive" doesn't really cover it. It is by far the longest time an x86 CPU can block out all outside events.
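
For anyone following the "bit 0 of some random other leaf" remark quoted above, here is a minimal userspace sketch of why an unchecked CPUID bit test can make every CPU take the WBINVD path. It is a model, not kernel code. The leaf number 0x8000001f, the struct, and the function names are assumptions for illustration and do not appear in this message. The sketch also assumes the commonly documented behaviour that some CPUs answer a read of an unimplemented leaf with the data of their highest basic leaf, while others return zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only; none of these names are kernel APIs. */
#define FEATURE_LEAF 0x8000001fu /* assumed leaf, not named in this message */

struct cpu_model {
    uint32_t max_ext_leaf;      /* highest extended leaf the CPU implements */
    uint32_t feature_leaf_eax;  /* EAX of FEATURE_LEAF when it is implemented */
    uint32_t highest_basic_eax; /* EAX of the highest basic leaf */
    bool aliases_out_of_range;  /* true: unimplemented leaves return the highest
                                   basic leaf; false: they return zeros */
};

/* Model of the EAX value a CPUID read of an extended `leaf` would produce. */
static uint32_t model_cpuid_eax(const struct cpu_model *cpu, uint32_t leaf)
{
    assert(cpu != NULL);
    if (leaf > cpu->max_ext_leaf)
        return cpu->aliases_out_of_range ? cpu->highest_basic_eax : 0;
    return leaf == FEATURE_LEAF ? cpu->feature_leaf_eax : 0;
}

/* Unchecked test: trusts bit 0 of whatever the read happens to return. */
static bool wants_wbinvd_unchecked(const struct cpu_model *cpu)
{
    return model_cpuid_eax(cpu, FEATURE_LEAF) & 1u;
}

/* Checked test: trusts bit 0 only if the leaf is actually implemented. */
static bool wants_wbinvd_checked(const struct cpu_model *cpu)
{
    return cpu->max_ext_leaf >= FEATURE_LEAF &&
           (model_cpuid_eax(cpu, FEATURE_LEAF) & 1u);
}
```

In this model, a CPU that lacks the leaf but aliases out-of-range reads can report bit 0 set purely by accident, so the unchecked variant would send it down the WBINVD path while the checked variant would not. Whether that matches the exact code under discussion is for the patch itself to settle.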