Message-ID: <9edd2470285db5cf38556d00cfc56215069b2d4c.camel@linux.intel.com>
Subject: Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU
From: Srinivas Pandruvada
To: Thomas Gleixner, Borislav Petkov
Cc: tony.luck@intel.com, mingo@redhat.com, hpa@zytor.com, x86@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Chris Wilson
Date: Tue, 25 Feb 2020 01:46:43 -0800
In-Reply-To: <87lforn5xr.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2020-02-24 at 20:25 +0100, Thomas Gleixner wrote:
> Thomas Gleixner writes:
> > Which is wrong as well. Trying to "fix" it in the work queue callback
> > is papering over the root cause.
> >
> > Why is any work scheduled on an outgoing CPU after this CPU executed
> > thermal_throttle_offline()?
> >
> > When thermal_throttle_offline() is invoked, the CPU-bound work queues
> > are still functional and thermal_throttle_offline() cancels outstanding
> > work.
> >
> > So no, please fix the root cause, not the symptom.
>
> And if you look at thermal_throttle_online(), you'll notice that it is
> asymmetric vs. thermal_throttle_offline().
>
> Also, you want to use cancel_delayed_work_sync() and not just
> cancel_delayed_work(), because only the former guarantees that the work
> is no longer enqueued, while the latter does not take running or
> self-requeueing work into account.
>
> Something like the untested patch below.

I tested this patch. After simulating 20 million thermal interrupts and
running CPU online/offline tests for 12+ hours, I don't see the issue, so
this change fixes it.
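To make the difference between cancel_delayed_work() and cancel_delayed_work_sync() concrete, here is a small userspace sketch using POSIX threads. It is a model under stated assumptions, not kernel code: struct model_work, model_cancel(), model_cancel_sync() and the other helpers are hypothetical names, and the release flag is only a test hook that keeps the callback "in progress". The sync variant mirrors the documented workqueue guarantee that, on return, the work is neither pending nor executing, even if it re-queues itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT the kernel workqueue API: it mimics the
 * documented difference between cancel_delayed_work() and
 * cancel_delayed_work_sync() for a work item that re-queues itself. */
struct model_work {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool pending;   /* queued, callback not started yet */
	bool running;   /* callback is executing */
	bool canceling; /* sync cancel in progress: requeue is refused */
	bool release;   /* test hook: lets the callback finish */
};

/* Callback: does its "work", then re-arms itself like a polling work item. */
static void *model_callback(void *arg)
{
	struct model_work *w = arg;
	pthread_mutex_lock(&w->lock);
	w->pending = false;
	w->running = true;
	pthread_cond_broadcast(&w->cond);
	while (!w->release)             /* stand-in for work in progress */
		pthread_cond_wait(&w->cond, &w->lock);
	if (!w->canceling)
		w->pending = true;      /* self-requeue */
	w->running = false;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Like cancel_delayed_work(): drop a pending instance, never wait. */
static void model_cancel(struct model_work *w)
{
	pthread_mutex_lock(&w->lock);
	w->pending = false;
	pthread_mutex_unlock(&w->lock);
}

/* Like cancel_delayed_work_sync(): refuse requeue, drop pending, wait. */
static void model_cancel_sync(struct model_work *w)
{
	pthread_mutex_lock(&w->lock);
	w->canceling = true;
	w->pending = false;
	while (w->running)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Queue the work, run the callback on another thread, wait until it is mid-run. */
static void start_running(struct model_work *w, pthread_t *t)
{
	w->pending = true;
	int rc = pthread_create(t, NULL, model_callback, w);
	assert(rc == 0);
	pthread_mutex_lock(&w->lock);
	while (!w->running)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

static void let_callback_finish(struct model_work *w)
{
	pthread_mutex_lock(&w->lock);
	w->release = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Plain cancel while the callback runs: true if the work ends up queued again. */
static bool plain_cancel_leaves_work_queued(void)
{
	struct model_work w = { .lock = PTHREAD_MUTEX_INITIALIZER, .cond = PTHREAD_COND_INITIALIZER };
	pthread_t t;

	start_running(&w, &t);
	model_cancel(&w);               /* returns while the callback still runs */
	let_callback_finish(&w);
	pthread_join(t, NULL);
	return w.pending;
}

/* Sync cancel: true if nothing is pending or running once it has returned. */
static bool sync_cancel_leaves_work_idle(void)
{
	struct model_work w = { .lock = PTHREAD_MUTEX_INITIALIZER, .cond = PTHREAD_COND_INITIALIZER };
	pthread_t t;
	bool idle;

	start_running(&w, &t);
	let_callback_finish(&w);        /* may finish before or during the cancel */
	model_cancel_sync(&w);
	pthread_mutex_lock(&w.lock);
	idle = !w.pending && !w.running;
	pthread_mutex_unlock(&w.lock);
	pthread_join(t, NULL);
	return idle;
}
```

In the first scenario the plain cancel returns while the callback is still running, and the callback re-arms itself afterwards. In the second, the requeue is refused and the caller waits for the callback, so both helpers return true regardless of how the two threads interleave.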
I can send the change on your behalf, or you can add:

Tested-by: Pandruvada, Srinivas

Thanks,
Srinivas

>
> Thanks,
>
> tglx
> ---
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -487,8 +487,12 @@ static int thermal_throttle_offline(unsi
>  	struct thermal_state *state = &per_cpu(thermal_state, cpu);
>  	struct device *dev = get_cpu_device(cpu);
>
> -	cancel_delayed_work(&state->package_throttle.therm_work);
> -	cancel_delayed_work(&state->core_throttle.therm_work);
> +	/* Mask the thermal vector before draining evtl. pending work */
> +	l = apic_read(APIC_LVTTHMR);
> +	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
> +
> +	cancel_delayed_work_sync(&state->package_throttle.therm_work);
> +	cancel_delayed_work_sync(&state->core_throttle.therm_work);
>
>  	state->package_throttle.rate_control_active = false;
>  	state->core_throttle.rate_control_active = false;
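The ordering in the patch (mask the thermal vector, then drain the work) can be sketched with a single-threaded model. It assumes, as the patch comment suggests, that the thermal interrupt is what schedules the throttle work. struct model_cpu, model_thermal_irq() and the two model_offline_*() functions are hypothetical; only the mask bit position (bit 16 of a local APIC LVT entry) corresponds to real hardware, and the model ignores concurrency entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-CPU model, NOT apic_read()/apic_write(). The mask bit
 * mirrors the local APIC LVT mask bit (bit 16); other bits are ignored. */
#define MODEL_LVT_MASKED (1u << 16)

struct model_cpu {
	uint32_t lvt_thermal; /* stand-in for the APIC_LVTTHMR register */
	bool work_pending;    /* stand-in for the delayed therm_work items */
};

/* A thermal interrupt queues throttle work only while the vector is unmasked. */
static void model_thermal_irq(struct model_cpu *c)
{
	if (!(c->lvt_thermal & MODEL_LVT_MASKED))
		c->work_pending = true;
}

/* Before the patch: cancel the work, leave the vector live. */
static void model_offline_old(struct model_cpu *c)
{
	c->work_pending = false;
}

/* With the patch: mask the vector (read-modify-write), then drain the work. */
static void model_offline_new(struct model_cpu *c)
{
	uint32_t l = c->lvt_thermal;

	c->lvt_thermal = l | MODEL_LVT_MASKED;
	c->work_pending = false;
}

/* Offline the CPU, deliver a late interrupt, report whether work was queued again. */
static bool late_irq_requeues_work(void (*offline)(struct model_cpu *))
{
	struct model_cpu c = { .lvt_thermal = 0, .work_pending = true };

	offline(&c);
	assert(!c.work_pending);  /* the offline step itself drained the work */
	model_thermal_irq(&c);    /* interrupt arriving after the drain */
	return c.work_pending;
}
```

With the old sequence, an interrupt that arrives after the cancel queues work again on the outgoing CPU, which is the situation tglx asks about; masking first closes that window. Waiting for a callback that is already running is the separate half of the fix, covered by cancel_delayed_work_sync().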