Date: Tue, 2 Apr 2019 16:22:00 +0300
From: Cyrill Gorcunov
To: Peter Zijlstra
Cc: "Lendacky, Thomas", "x86@kernel.org", "linux-kernel@vger.kernel.org", Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar, Borislav Petkov, Namhyung Kim, Thomas Gleixner, Jiri Olsa, Vince Weaver, Stephane Eranian
Subject: Re: [RFC PATCH v3 0/3] x86/perf/amd: AMD PMC counters and NMI latency
Message-ID: <20190402132200.GA23501@uranus>
In-Reply-To: <20190402130302.GL12232@hirez.programming.kicks-ass.net>
References: <155415519143.24457.2706922532995302758.stgit@tlendack-t1.amdoffice.net> <20190402130302.GL12232@hirez.programming.kicks-ass.net>

On Tue, Apr 02, 2019 at 03:03:02PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 01, 2019 at 09:46:33PM +0000, Lendacky, Thomas wrote:
> > This patch series addresses issues with increased NMI latency in newer
> > AMD processors that can result in unknown NMI messages when PMC counters
> > are active.
> >
> > The following fixes are included in this series:
> >
> > - Resolve a race condition when disabling an overflowed PMC counter,
> >   specifically when updating the PMC counter with a new value.
> > - Resolve handling of active PMC counter overflows in the perf NMI
> >   handler and when to report that the NMI is not related to a PMC.
> > - Remove earlier workaround for spurious NMIs by re-ordering the
> >   PMC stop sequence to disable the PMC first and then remove the PMC
> >   bit from the active_mask bitmap. As part of disabling the PMC, the
> >   code will wait for an overflow to be reset.
> >
> > The last patch re-works the order of when the PMC is removed from the
> > active_mask. There was a comment from a long time ago about having
> > to clear the bit in active_mask before disabling the counter because
> > the perf NMI handler could re-enable the PMC again. Looking at the
> > handler today, I don't see that as possible, hence the reordering. The
> > question will be whether the Intel PMC support will now have issues.
> > There is still support for using x86_pmu_handle_irq() in the Intel
> > core.c file. Did Intel have any issues with spurious NMIs in the past?
> > Peter Z, any thoughts on this?
>
> I can't remember :/ I suppose we'll see if anything pops up after these
> here patches. At least then we get a chance to properly document things.
>
> > Also, I couldn't completely get rid of the "running" bit because it
> > is used by arch/x86/events/intel/p4.c. An old commit comment that
> > seems to indicate the p4 code suffered the spurious interrupts:
> > 03e22198d237 ("perf, x86: Handle in flight NMIs on P4 platform").
> > So maybe that partially answers my previous question...
>
> Yeah, the P4 code is magic, and I don't have any such machines left, nor
> do I think does Cyrill who wrote much of that.

It was so long ago :) What I remember off the top of my head is that some
of the counters were broken at the hardware level, so I had to use only
one counter instead of the two present in the system. And there were
spurious NMIs too. I think we can move this "running" bit to a per-cpu
variable declared only inside the p4 code, and so remove it from
cpu_hw_events?

> I have vague memories of the P4 thing crashing with Vince's perf_fuzzer,
> but maybe I'm wrong.

No, you're correct. p4 crashed many times before we managed to make it
more or less stable. The main problem, though, is that finding a working
p4 box is really hard.

> Ideally we'd find a willing victim to maintain that thing, or possibly
> just delete it, dunno if anybody still cares.

As for me, I would rather mark this p4pmu code as deprecated until there
is a *real* need for its support.

>
> Anyway, I like these patches, but I cannot apply since you send them
> base64 encoded and my script chokes on that.