Subject: Re: [perf] perf_fuzzer causes crash in intel_pmu_drain_pebs_nhm()
To: Peter Zijlstra, Vince Weaver
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Arnaldo Carvalho de Melo,
 Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
 Stephane Eranian
References: <61a56699-aab4-ef6-ed8d-a22b6bf532d@maine.edu>
 <7170d3b-c17f-1ded-52aa-cc6d9ae999f4@maine.edu>
From: "Liang, Kan"
Message-ID: <32888c33-c286-c600-66cb-8b1b03beeb8b@linux.intel.com>
Date: Mon, 1 Mar 2021 08:20:48 -0500
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/11/2021 9:53 AM, Peter Zijlstra wrote:
>
> Kan, do you have time to look at this?
>
> On Thu, Jan 28, 2021 at 02:49:47PM -0500, Vince Weaver wrote:
>> On Thu, 28 Jan 2021, Vince Weaver wrote:
>>
>>> the perf_fuzzer has turned up a repeatable crash on my haswell system.
>>>
>>> addr2line is not being very helpful, it points to DECLARE_PER_CPU_FIRST.
>>> I'll investigate more when I have the chance.
>>
>> so I poked around some more.
>>
>> This seems to be caused in
>>
>> __intel_pmu_pebs_event()
>> get_next_pebs_record_by_bit() ds.c line 1639
>> get_pebs_status(at) ds.c line 1317
>> return ((struct pebs_record_nhm *)n)->status;
>>
>> where "n" has the value of 0xc0 rather than a proper pointer.
>>

I think I found the suspicious patch. The commit ID is 01330d7288e00
("perf/x86: Allow zero PEBS status with only single active event").
https://lore.kernel.org/lkml/tip-01330d7288e0050c5aaabc558059ff91589e67cd@git.kernel.org/

The patch is an SW workaround for some old CPUs (HSW and earlier), which
may set the PEBS status to 0. It adds a check in
intel_pmu_drain_pebs_nhm() and tries to minimize the impact of the
defect by not dropping the PEBS records which have PEBS status 0.
But it doesn't correct the PEBS status, which may cause problems,
especially for large PEBS.
It's possible that all the PEBS records in a large PEBS have PEBS
status 0. If so, the first get_next_pebs_record_by_bit() in
__intel_pmu_pebs_event() returns NULL, so at = NULL. Since it's a large
PEBS, the 'count' parameter must be > 1. The second
get_next_pebs_record_by_bit() then runs on an invalid pointer and will
crash.

Could you please revert the patch and check whether it fixes your issue?

Thanks,
Kan