References: <20180223121456.GZ25201@hirez.programming.kicks-ass.net> <20180226203937.GA21543@tassilo.jf.intel.com>
From: Josh Hunt
Date: Mon, 19 Aug 2019 14:17:25 -0700
Subject: Re: Long standing kernel warning: perfevents: irq loop stuck!
To: Thomas Gleixner
Cc: Andi Kleen, Peter Zijlstra, Cong Wang, "Liang, Kan", jolsa@redhat.com, bigeasy@linutronix.de, "H. Peter Anvin", Ingo Molnar, x86, LKML
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 12, 2019 at 12:42 PM Josh Hunt wrote:
>
> On Mon, Aug 12, 2019 at 12:34 PM Thomas Gleixner wrote:
> >
> > On Mon, 12 Aug 2019, Josh Hunt wrote:
> > > On Mon, Aug 12, 2019 at 10:55 AM Thomas Gleixner wrote:
> > > >
> > > > On Mon, 12 Aug 2019, Josh Hunt wrote:
> > > > > Was there any progress made on debugging this issue? We are still
> > > > > seeing it on 4.19.44:
> > > >
> > > > I haven't seen anyone looking at this.
> > > >
> > > > Can you please try the patch Ingo posted:
> > > >
> > > >   https://lore.kernel.org/lkml/20150501070226.GB18957@gmail.com/
> > > >
> > > > and if it fixes the issue decrease the value from 128 to the point where it
> > > > comes back, i.e. 128 -> 64 -> 32 ...
> > > >
> > > > Thanks,
> > > >
> > > >         tglx
> > >
> > > I just checked the machines where this problem occurs and they're both
> > > Nehalem boxes. I think Ingo's patch would only help Haswell machines.
> > > Please let me know if I misread the patch or if what I'm seeing is a
> > > different issue than the one Cong originally reported.
> >
> > Find the NHM hack below.
> >
> > Thanks,
> >
> >         tglx
> >
> > 8<----------------
> >
> > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > index 648260b5f367..93c1a4f0e73e 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -3572,6 +3572,11 @@ static u64 bdw_limit_period(struct perf_event *event, u64 left)
> >  	return left;
> >  }
> >
> > +static u64 nhm_limit_period(struct perf_event *event, u64 left)
> > +{
> > +	return max(left, 128ULL);
> > +}
> > +
> >  PMU_FORMAT_ATTR(event, "config:0-7" );
> >  PMU_FORMAT_ATTR(umask, "config:8-15" );
> >  PMU_FORMAT_ATTR(edge, "config:18" );
> > @@ -4606,6 +4611,7 @@ __init int intel_pmu_init(void)
> >  		x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
> >  		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
> >  		x86_pmu.extra_regs = intel_nehalem_extra_regs;
> > +		x86_pmu.limit_period = nhm_limit_period;
> >
> >  		mem_attr = nhm_mem_events_attrs;
> >
>
> Thanks Thomas. Will try this and let you know.
>
> --
> Josh

Thomas,

I found on my setup that setting the value to 32 was the lowest value
I could use to keep the problem from happening. Let me know if you
want me to send a patch with the updated value, etc.

I saw in the original thread from Ingo and Vince that this was seen on
Haswell, but I checked our Haswell boxes and so far we have not
reproduced the problem there.

--
Josh
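
For readers following the thread, here is a rough userspace sketch of the two ideas discussed above: clamping the remaining sample period to a floor, as max(left, 128ULL) does in the quoted hack, and halving that floor (128 -> 64 -> 32 ...) until the problem comes back. It is not kernel code. The names limit_period_sketch, lowest_working_floor, reproduces_fn and reproduces_below_32 are hypothetical, and the predicate only stands in for "rebuild, boot and run the workload on the affected machine"; its threshold of 32 is taken from the result reported in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Clamp the remaining sample period to a floor. This mirrors
 * max(left, 128ULL) from the quoted patch, with the floor made a
 * parameter so that different values can be tried.
 */
static uint64_t limit_period_sketch(uint64_t left, uint64_t floor)
{
	return left > floor ? left : floor;
}

/*
 * Hypothetical predicate: returns true if the warning still shows up
 * with the given floor. On real hardware this is a manual test run.
 */
typedef bool (*reproduces_fn)(uint64_t floor);

/*
 * Stand-in predicate modelling the result reported above: the problem
 * stays away at 32 and comes back for anything smaller.
 */
static bool reproduces_below_32(uint64_t floor)
{
	return floor < 32;
}

/*
 * Halve the floor starting from 'start' for as long as the problem
 * stays away, and return the lowest floor that did not reproduce it.
 * 'start' itself must be a known good value.
 */
static uint64_t lowest_working_floor(uint64_t start, reproduces_fn reproduces)
{
	uint64_t good = start;
	uint64_t f;

	assert(start > 0 && !reproduces(start));

	for (f = start / 2; f > 0; f /= 2) {
		if (reproduces(f))
			break;
		good = f;
	}
	return good;
}
```

Calling lowest_working_floor(128, reproduces_below_32) walks 128, 64, 32, 16 and stops once the stand-in predicate reports a failure. Only powers of two are probed, as in the suggestion quoted above, so this finds the lowest working value in that sequence rather than the exact threshold.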