Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp7097999imu; Thu, 31 Jan 2019 05:03:01 -0800 (PST) X-Google-Smtp-Source: ALg8bN6rMm91gtcMJdYiey4Rtv7ak52XR7yzW/07BohZb7a8Krrx+DJ5QMgi1w+AnDOyZugNZwg8 X-Received: by 2002:a63:9306:: with SMTP id b6mr30112482pge.36.1548939781203; Thu, 31 Jan 2019 05:03:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548939781; cv=none; d=google.com; s=arc-20160816; b=Cp4xYygUoXb3DgU6gpSC9QMZ5+QJsPg/yoqgL6BcczfUzsUxpl49qilrschCaINUCE 0on7rAlYH625LmdpOZ7X1T0YH4Mn3rJzr/IOhBpV/FsUpaaG6UU/LMi3/4lcnTuRFQMb CpQIEcYbXJEv2HoB569xamasZHIZPlxOMkGB6tlBhnyisvj8i9WZyTrIs2wA4ZtES3JM 7nU67UT8ecCHKpLSSVncIX7uUPePxSB3kJ7D92fJMOcrW38kc0EgyNYM4jHsRBQnH1E3 O2ZtsalMX90VjQYMPJ7F8Z6c2irh5o4d0j68GCpUC6g/v57dS2/13gI8oIw3SibyVOgu Sw4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=fqCm6Wqxu0V8kKnOqe3F9d1n6TEKdoGtSOC0RY6W0JA=; b=PJDNJnSPvMYArAjaZkNu9CALljmByF6wiEsqAayXpk+f03JgyGXVwYOMYDHIO+LNue 3jIOqi6Hg2vSVAj081iy5aRwzfE4VN0B67D8sPgUTPhjBCElr6N6L1eY5G0k9CyXLA+g 1xt+VOHjB+4A48rERfjZ79Wuh1JPBdQHfQewq0DSeDXJ8+ZLJP4v9hXyBWFfmAYlBKUC 5uQ5g/AGdfH1Xha8NcehaeEPU1SovkfHvm3pILPEkBhiugayr3oJqNOpIj6MqiPKrXer 1PutPPi2JhWkOdwQ41hiPoRPzCs8M0SpOldoVZfksNl5kVT0229UYpFKTuOhcCb+y4xt V9uw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g184si4558137pfb.288.2019.01.31.05.02.40; Thu, 31 Jan 2019 05:03:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733283AbfAaNAb (ORCPT + 99 others); Thu, 31 Jan 2019 08:00:31 -0500 Received: from mga05.intel.com ([192.55.52.43]:28597 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726266AbfAaNAb (ORCPT ); Thu, 31 Jan 2019 08:00:31 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 31 Jan 2019 05:00:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,544,1539673200"; d="scan'208";a="139506095" Received: from tassilo.jf.intel.com (HELO tassilo.localdomain) ([10.7.201.137]) by fmsmga002.fm.intel.com with ESMTP; 31 Jan 2019 05:00:30 -0800 Received: by tassilo.localdomain (Postfix, from userid 1000) id BF3F0301201; Thu, 31 Jan 2019 05:00:30 -0800 (PST) Date: Thu, 31 Jan 2019 05:00:30 -0800 From: Andi Kleen To: Ravi Bangoria Cc: lkml , Jiri Olsa , Peter Zijlstra , linux-perf-users@vger.kernel.org, Arnaldo Carvalho de Melo , eranian@google.com, vincent.weaver@maine.edu, "Naveen N. Rao" Subject: Re: System crash with perf_fuzzer (kernel: 5.0.0-rc3) Message-ID: <20190131130030.GW6118@tassilo.jf.intel.com> References: <7c7ec3d9-9af6-8a1d-515d-64dcf8e89b78@linux.ibm.com> <20190125160056.GG6118@tassilo.jf.intel.com> <33d300b5-f8b0-4987-4d4c-b5175b6b6c60@linux.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <33d300b5-f8b0-4987-4d4c-b5175b6b6c60@linux.ibm.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jan 31, 2019 at 01:28:34PM +0530, Ravi Bangoria wrote: > Hi Andi, > > On 1/25/19 9:30 PM, Andi Kleen wrote: > >> [Fri Jan 25 10:28:53 2019] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750 > >> [Fri Jan 25 10:29:08 2019] perf: interrupt took too long (3136 > 3126), lowering kernel.perf_event_max_sample_rate to 63750 > >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (4140 > 3920), lowering kernel.perf_event_max_sample_rate to 48250 > >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (5231 > 5175), lowering kernel.perf_event_max_sample_rate to 38000 > >> [Fri Jan 25 10:29:11 2019] perf: interrupt took too long (6736 > 6538), lowering kernel.perf_event_max_sample_rate to 29500 > > > > These are fairly normal. > > I understand that throttling mechanism is designed exactly to do this. > But I've observed that, everytime I run the fuzzer, max_sample_rates is > been throttled down to 250 (which is CONFIG_HZ I guess). Doesn't this > mean the interrupt time is somehow increasing gradually? Is that fine? It's more like the throttling mechanism is an controller and it takes multiple tries to zoom in on the truely needed value. You can measure the PMI time by enabling the nmi:nmi_handler trace point. It directly reports it. From what I've seen it's a long tail distribution with regular large outliers. Most of the PMIs are not that slow, just an occassional few are. When I did some investigation on this a couple years back the outliers were either due to call stack processing, or due to flushing the perf ring buffer. There were some fixes on the the call stack case back then, but I'm sure more could be done. For the call stack processing there isn't much more we can do I think (other than switching to call stack LBR only), but I suspect the buffer flushing problem could be improved more. It's relatively easy to investigate with a variant of the ftrace recipe I posted earlier (but you need to fix the Makefile first to enable ftrace for all of perf) Just add a ftrace trigger on the nmi_handler trace point to stop tracing when the nmi_handler time exceeds a threshold and look at the traces. -Andi