From: Thomas Gleixner
To: Borislav Petkov, Srinivas Pandruvada
Cc: tony.luck@intel.com, mingo@redhat.com, hpa@zytor.com, x86@kernel.org,
    linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Chris Wilson
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU
In-Reply-To: <87y2ssm0sz.fsf@nanos.tec.linutronix.de>
References: <20200222162432.497201-1-srinivas.pandruvada@linux.intel.com>
    <20200222175151.GD11284@zn.tnic>
    <40989625ca5496a986ca3e595957da83723777f4.camel@linux.intel.com>
    <20200224125525.GA29318@zn.tnic>
    <87y2ssm0sz.fsf@nanos.tec.linutronix.de>
Date: Mon, 24 Feb 2020 20:25:36 +0100
Message-ID: <87lforn5xr.fsf@nanos.tec.linutronix.de>

Thomas Gleixner writes:
> Which is wrong as well. Trying to "fix" it in the work queue callback is
> papering over the root cause.
>
> Why is any work scheduled on an outgoing CPU after this CPU executed
> thermal_throttle_offline()?
>
> When thermal_throttle_offline() is invoked the cpu bound work queues are
> still functional and thermal_throttle_offline() cancels outstanding
> work.
>
> So no, please fix the root cause not the symptom.

And if you look at thermal_throttle_online() then you'll notice that it is
asymmetric vs. thermal_throttle_offline().

Also you want to do cancel_delayed_work_sync() and not just
cancel_delayed_work() because only the former guarantees that the work
is not enqueued anymore while the latter does not take running or self
requeueing work into account.

Something like the untested patch below.

Thanks,

        tglx
---
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -487,8 +487,12 @@ static int thermal_throttle_offline(unsi
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
-	cancel_delayed_work(&state->package_throttle.therm_work);
-	cancel_delayed_work(&state->core_throttle.therm_work);
+	/* Mask the thermal vector before draining evtl. pending work */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
+
+	cancel_delayed_work_sync(&state->package_throttle.therm_work);
+	cancel_delayed_work_sync(&state->core_throttle.therm_work);
 
 	state->package_throttle.rate_control_active = false;
 	state->core_throttle.rate_control_active = false;