Message-ID: <3dd730013f98dde18b5a92ff56446703788195f9.camel@linux.intel.com>
Subject: Re: Bug: d0e936adbd22 crashes at boot
From: Srinivas Pandruvada
To: Jens Axboe, LKML, "Rafael J. Wysocki", Len Brown, inux-pm@vger.kernel.org
Date: Fri, 03 Sep 2021 08:11:44 -0700
In-Reply-To: <767fe00f-bf31-1eb0-09cc-1be91c633bb4@kernel.dk>
References: <942f4041-e4e7-1b08-3301-008ab37ff5b8@kernel.dk>
 <3ac87893-55ba-f2d4-bb1e-382868f12d4c@kernel.dk>
 <7f115f0476618d34b24ddec772acbbd7c0c4a572.camel@linux.intel.com>
 <767fe00f-bf31-1eb0-09cc-1be91c633bb4@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
> On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
> > On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
> > > On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> > > > Hi Axboe,
> > > >
> > > > Thanks for reporting.
> > > > On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> > > > > Hi,
> > > > >
> > > > > Booting Linus's tree causes a crash on my laptop, an x1 gen9. This was a bit
> > > > > difficult to pin down as it crashes before the display is up, but I managed
> > > > > to narrow it down to:
> > > > >
> > > > > commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> > > > > Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > > > Date:   Thu Aug 19 19:40:06 2021 -0700
> > > > >
> > > > >     cpufreq: intel_pstate: Process HWP Guaranteed change notification
> > > > >
> > > > > which crashes with a NULL pointer deref in notify_hwp_interrupt() ->
> > > > > queue_delayed_work_on().
> > > > >
> > > > > Reverting this change makes the laptop boot fine again.
> > > > >
> > > > Does this change fixes your issue?
> > >
> > > I would assume so, as it's crashing on cpudata == NULL :-)
> > >
> > > But why is it NULL? Happy to test patches, but the below doesn't look like
> > > a real fix and more of a work-around.
> >
> > This platform is sending an HWP interrupt on a CPU which we didn't yet
> > bring it up for pstate control. So somehow firmware decided to send
> > very early during boot, which previously we would have ignored it
> >
> > Actually try this, with more prevention
>
> I can give this a whirl.
>
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index b4ffe6c8a0d0..6ee88d7640ea 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1645,12 +1645,24 @@ void notify_hwp_interrupt(void)
> >         if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> >                 return;
> >
> > -       rdmsrl(MSR_HWP_STATUS, value);
> > +       rdmsrl_safe(MSR_HWP_STATUS, &value);
> >         if (!(value & 0x01))
> >                 return;
> >
> > +       /*
> > +        * After hwp_active is set and all_cpu_data is allocated, there
> > +        * is small window.
> > +        */
> > +       if (!all_cpu_data) {
> > +               wrmsrl_safe(MSR_HWP_STATUS, 0);
> > +               return;
> > +       }
>
> What synchronizes the all_cpu_data setup and the interrupt? Can the
> interrupt come in while it's still being setup?

Yes. I am working on a change to simulate that case and fix.

Thanks,
Srinivas

>
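
To make the ordering question above concrete, here is a minimal userspace sketch in C11. It is not the driver fix and not kernel code. It only illustrates one way a handler can be kept from seeing a half-built table: build everything first, publish the pointer last, and check both the table and the per-CPU entry in the handler. The struct layout, the `notifications` counter, and the names `setup_cpu_data()` and `notify_interrupt()` are hypothetical; only `all_cpu_data` is borrowed from the patch as a name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-CPU state. */
struct cpudata {
	int cpu;
	int notifications;	/* how many notifications were handled */
};

/* Published table; stays NULL until setup has completely finished. */
static _Atomic(struct cpudata **) all_cpu_data;

/*
 * Models the notification path. Returns 1 if the event was handled,
 * 0 if it was ignored because the state for this CPU is not ready.
 */
static int notify_interrupt(int cpu)
{
	struct cpudata **table =
		atomic_load_explicit(&all_cpu_data, memory_order_acquire);

	/* Check the table AND the entry: either may be missing early on. */
	if (!table || !table[cpu])
		return 0;

	table[cpu]->notifications++;
	return 1;
}

/* Builds every entry first, then publishes the table in one store. */
static int setup_cpu_data(int ncpus)
{
	struct cpudata **table;
	int i;

	assert(ncpus > 0);

	table = calloc((size_t)ncpus, sizeof(*table));
	if (!table)
		return -1;

	for (i = 0; i < ncpus; i++) {
		table[i] = calloc(1, sizeof(**table));
		if (!table[i]) {
			while (i-- > 0)
				free(table[i]);
			free(table);
			return -1;
		}
		table[i]->cpu = i;
	}

	/* Release store pairs with the acquire load in the handler. */
	atomic_store_explicit(&all_cpu_data, table, memory_order_release);
	return 0;
}
```

Calling `notify_interrupt(0)` before `setup_cpu_data()` returns 0, and after a successful setup it returns 1. A real driver would also have to deal with teardown and with entries that are filled in one CPU at a time, which this sketch deliberately leaves out.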