From: Thomas Gleixner
To: Ricardo Neri, Ingo Molnar, Borislav Petkov
Cc: "H. Peter Anvin", Ashok Raj, Andi Kleen, Tony Luck, Nicholas Piggin, "Peter Zijlstra (Intel)", Andrew Morton, Stephane Eranian, Suravee Suthikulpanit, "Ravi V. Shankar", x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v5 07/16] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
In-Reply-To: <20210504190526.22347-8-ricardo.neri-calderon@linux.intel.com>
Date: Tue, 04 May 2021 22:53:46 +0200
Message-ID: <87o8dqi5k5.ffs@nanos.tec.linutronix.de>

Ricardo,

On Tue, May 04 2021 at 12:05, Ricardo Neri wrote:
> +static int hardlockup_detector_nmi_handler(unsigned int type,
> +                                           struct pt_regs *regs)
> +{
> +        struct hpet_hld_data *hdata = hld_data;
> +        int cpu = smp_processor_id();
> +
> +        if (is_hpet_wdt_interrupt(hdata)) {
> +                /*
> +                 * Make a copy of the target mask. We need this as once a CPU
> +                 * gets the watchdog NMI it will clear itself from ipi_cpumask.
> +                 * Also, target_cpumask will be updated in a workqueue for the
> +                 * next NMI IPI.
> +                 */
> +                cpumask_copy(hld_data->ipi_cpumask, hld_data->monitored_cpumask);
> +                /*
> +                 * Even though the NMI IPI will be sent to all CPUs but self,
> +                 * clear the CPU to identify a potential unrelated NMI.
> +                 */
> +                cpumask_clear_cpu(cpu, hld_data->ipi_cpumask);
> +                if (cpumask_weight(hld_data->ipi_cpumask))
> +                        apic->send_IPI_mask_allbutself(hld_data->ipi_cpumask,
> +                                                       NMI_VECTOR);

How is this supposed to work correctly?

x2apic_cluster:

  x2apic_send_IPI_mask_allbutself()
    __x2apic_send_IPI_mask()
      tmpmsk = this_cpu_cpumask_var_ptr(ipi_mask);
      cpumask_copy(tmpmsk, mask);

So if an NMI hits right after or in the middle of the cpumask_copy()
then the IPI sent from that NMI overwrites tmpmsk, and when it is done
tmpmsk is empty. The same happens when the NMI hits in the middle of
processing, just with the difference that maybe a few IPIs have been
sent already. But the not yet sent ones are lost...

Also anything which ends up in __default_send_IPI_dest_field() is
borked:

  __default_send_IPI_dest_field()
      cfg = __prepare_ICR2(mask);
      native_apic_mem_write(APIC_ICR2, cfg);

          <- NMI hits and invokes IPI which invokes
             __default_send_IPI_dest_field()...

      cfg = __prepare_ICR(0, vector, dest);
      native_apic_mem_write(APIC_ICR, cfg);

IOW, when the NMI returns ICR2 has been overwritten and the interrupted
IPI goes into lala land.

Thanks,

        tglx
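
Both races above can be reproduced in a small user-space model. This is only
a sketch under loud assumptions, not kernel code: a 64-bit word stands in for
a cpumask, plain globals stand in for the per-CPU scratch mask and the ICR2
register, and a nested function call stands in for the NMI. The names
model_send_mask(), model_send_dest(), scratch_mask, icr2_reg and
MODEL_NMI_VECTOR are made up for illustration, and the real x2apic cluster
code walks clusters rather than single bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* --- Race 1: shared per-CPU scratch mask (models tmpmsk) --- */

static uint64_t scratch_mask;	/* stands in for the per-CPU ipi_mask variable */
static uint64_t delivered;	/* one bit per CPU that was sent an IPI */

/*
 * Hypothetical stand-in for __x2apic_send_IPI_mask(): copy the request into
 * the shared scratch mask, then send to one CPU at a time, clearing bits.
 * If nmi_after >= 0, a simulated NMI arrives once that many IPIs have been
 * sent, and its handler sends nmi_mask through the same function.
 */
static void model_send_mask(uint64_t mask, int nmi_after, uint64_t nmi_mask)
{
	int sent = 0;

	scratch_mask = mask;			/* cpumask_copy(tmpmsk, mask) */
	for (;;) {
		if (sent == nmi_after) {
			nmi_after = -1;		/* the NMI fires only once */
			model_send_mask(nmi_mask, -1, 0);
		}
		if (!scratch_mask)
			break;
		uint64_t bit = scratch_mask & (0 - scratch_mask); /* lowest set bit */
		delivered |= bit;
		scratch_mask &= ~bit;
		sent++;
	}
}

/* --- Race 2: two-step ICR2/ICR write --- */

#define MODEL_NMI_VECTOR 2	/* vector used by the simulated NMI handler */
#define MODEL_LOG_SIZE   4

struct ipi {
	uint32_t dest;
	uint32_t vector;
};

static uint32_t icr2_reg;			/* stands in for APIC_ICR2 */
static struct ipi ipi_log[MODEL_LOG_SIZE];	/* what the model actually sent */
static int ipi_count;

/*
 * Hypothetical stand-in for __default_send_IPI_dest_field(): program the
 * destination, then write the command, which sends to whatever destination
 * the register holds at that moment. With nmi_between set, a simulated NMI
 * handler sends its own IPI to nmi_dest between the two writes.
 */
static void model_send_dest(uint32_t dest, uint32_t vector,
			    bool nmi_between, uint32_t nmi_dest)
{
	icr2_reg = dest;			/* write APIC_ICR2 */
	if (nmi_between)
		model_send_dest(nmi_dest, MODEL_NMI_VECTOR, false, 0);
	assert(ipi_count < MODEL_LOG_SIZE);
	ipi_log[ipi_count].dest = icr2_reg;	/* write APIC_ICR: IPI goes out */
	ipi_log[ipi_count].vector = vector;
	ipi_count++;
}
```

With delivered cleared first, model_send_mask(0x0e, 0, 0x10) leaves delivered
at 0x10: the nested send drains the shared scratch mask, so CPUs 1-3 never
get their IPI. Likewise model_send_dest(5, 0xfb, true, 9) logs the
interrupted IPI with destination 9 instead of 5, because the nested call
rewrote icr2_reg before the outer command write.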