Message-ID: <9f137001-276d-0c7c-d0f3-1d0a34355f4c@linux.alibaba.com>
Date: Wed, 20 Apr 2022 22:44:27 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101
 Thunderbird/91.8.0
From: Wen Yang
Subject: Re: [RESEND PATCH 2/2] perf/x86: improve the event scheduling to avoid unnecessary pmu_stop/start
To: Stephane Eranian, Peter Zijlstra
Cc: Wen Yang, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin,
 Thomas Gleixner, mark rutland, jiri olsa, namhyung kim, borislav petkov,
 x86@kernel.org, "h. peter anvin", linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <20220304110351.47731-1-simon.wy@alibaba-inc.com>
 <20220304110351.47731-2-simon.wy@alibaba-inc.com>
 <0c119da1-053b-a2d6-1579-8fb09dbe8e63@linux.alibaba.com>
 <271bc186-7ffb-33c8-4934-cda2beb94816@linux.alibaba.com>
 <05861b8c-2c7c-ae89-613a-41fcace6a174@linux.alibaba.com>
 <20220419205738.GZ2731@worktop.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/4/20 5:18 AM, Stephane Eranian wrote:
> Hi,
>
> Going back to the original description of this patch 2/2, it seems the
> problem was that you expected PINNED events to always remain in
> the same counters. This is NOT what the interface guarantees. A pinned
> event is guaranteed to either be on a counter or in error state if active.
> But while active the event can change counters because of event scheduling
> and this is fine. The kernel only computes deltas of the raw counter. If you
> are using the read() syscall to extract a value, then this is totally
> transparent and you will see no jumps. If you are instead using RDPMC,
> then you cannot
> assume the counter index of a pinned event remains the same.
> If you do, then yes, you will see discrepancies in the count returned by
> RDPMC. You cannot just use RDPMC to read a counter from user space. You
> need kernel help. The info you need is in the page you must mmap on the
> fd of the event. It shows the current counter index of the event along
> with sequence number and timing to help scale the count if necessary.
> This proper loop for RDPMC is documented in
> include/uapi/linux/perf_event.h inside the perf_event_mmap_page
> definition.
>
> As for TFA, it is not clear to me why this is a problem unless you
> have the RDPMC problem I described above.
>

Thank you for your comments.

Our scenario is that all four general-purpose (GP) counters are in use,
and abnormal PMC3 counter values are observed on several machines. The
kernel versions involved are 4.9 and 4.19.

After we encountered the problem of abnormal CPI data a few months ago,
we checked all kinds of applications according to your suggestions here
and finally determined that they all comply with the specification in
include/uapi/linux/perf_event.h.

After lengthy experiments, we found that the problem is caused by TFA:

    When Restricted Transactional Memory (RTM) is supported
    (CPUID.07H.EBX.RTM [bit 11] = 1) and CPUID.07H.EDX[bit 13]=1 and
    TSX_FORCE_ABORT[RTM_FORCE_ABORT]=0 (described later in this
    document), then Performance Monitor Unit (PMU) general purpose
    counter 3 (IA32_PMC3, MSR C4H and IA32_A_PMC3, MSR 4C4H) may contain
    unexpected values. Specifically, IA32_PMC3 (MSR C4H),
    IA32_PERF_GLOBAL_CTRL[3] (MSR 38FH) and IA32_PERFEVTSEL3 (MSR 189H)
    may contain unexpected values, which also affects IA32_A_PMC3
    (MSR 4C4H) and IA32_PERF_GLOBAL_INUSE[3] (MSR 392H).
--> from
https://www.intel.com/content/dam/support/us/en/documents/processors/Performance-Monitoring-Impact-of-TSX-Memory-Ordering-Issue-604224.pdf

We also submitted an IPS to Intel:
https://premiersupport.intel.com/IPS/5003b00001fqdhaAAA

For the latest kernel, this issue could be handled by the following commit:

400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")

However, many production environments run 4.9, 4.19, or even 3.10
kernels, which do not contain the above commit, and it is difficult to
make a hotfix from it, so these kernels remain affected by this problem.

This patch 2/2 attempts to avoid the switching of the PMU counters in
various perf_events, so the special behavior of a single PMU counter
(e.g., PMC3 here) will not be propagated to other events. We also made a
hotfix from it and verified it on some machines.

Please have another look.

Thanks

--
Best wishes,
Wen

>
> On Tue, Apr 19, 2022 at 1:57 PM Peter Zijlstra wrote:
>>
>> On Tue, Apr 19, 2022 at 10:16:12PM +0800, Wen Yang wrote:
>>> We finally found that TFA (TSX Force Abort) may affect PMC3's behavior,
>>> refer to the following patch:
>>>
>>> 400816f60c54 perf/x86/intel: ("Implement support for TSX Force Abort")
>>>
>>> When the MSR gets set; the microcode will no longer use PMC3 but will
>>> Force Abort every TSX transaction (upon executing COMMIT).
>>>
>>> When TSX Force Abort (TFA) is allowed (default); the MSR gets set when
>>> PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
>>> unused.
>>>
>>> When TFA is not allowed; clear PMC3 from all constraints such that it
>>> will not get used.
>>>
>>>
>>>>
>>>> However, this patch attempts to avoid the switching of the pmu counters
>>>> in various perf_events, so the special behavior of a single pmu counter
>>>> will not be propagated to other events.
>>>>
>>>
>>> Since PMC3 may have special behaviors, the continuous switching of PMU
>>> counters may not only affect the performance, but also may lead to abnormal
>>> data, please consider this patch again.
>>
>> I'm not following. How do you get abnormal data?
>>
>> Are you using RDPMC from userspace? If so, are you following the
>> prescribed logic using the self-monitoring interface?
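
For readers following the thread, here is a minimal sketch of the kind of sequence-checked read loop that the perf_event_mmap_page comment in include/uapi/linux/perf_event.h describes. It is not code from this patch or from the kernel. The struct mmap_page_view type, the rdpmc_fn callback, demo_rdpmc() and read_self_count() are hypothetical names introduced only for illustration, and the time_enabled/time_running scaling is left out. The struct mirrors only the fields of the real mmap page that the loop touches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, trimmed-down view of the fields of struct
 * perf_event_mmap_page that the read loop needs. Real code would include
 * <linux/perf_event.h> and use the page mmap()ed from the event fd.
 */
struct mmap_page_view {
    volatile uint32_t lock;           /* sequence count, changes on kernel updates */
    volatile uint32_t index;          /* hardware counter index + 1; 0 = not readable */
    volatile int64_t  offset;         /* value to add to the raw counter */
    volatile uint16_t pmc_width;      /* counter width in bits */
    volatile uint8_t  cap_user_rdpmc; /* nonzero if user-space RDPMC is permitted */
};

/* Assumption: the caller supplies the counter read (RDPMC on x86). */
typedef uint64_t (*rdpmc_fn)(uint32_t counter);

/* Stand-in reader so the sketch can be exercised without PMU access. */
uint64_t demo_rdpmc(uint32_t counter)
{
    (void)counter;
    return 42;
}

/* Compiler-only barrier (GCC/Clang syntax). */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/*
 * Returns 0 and stores the event count in *out, or -1 if the event is not
 * currently readable from user space (the caller should then use read()).
 */
int read_self_count(const struct mmap_page_view *pc, rdpmc_fn rdpmc,
                    int64_t *out)
{
    uint32_t seq, idx;
    uint16_t width;
    int64_t count;
    uint64_t raw;
    int usable;

    assert(pc != NULL && rdpmc != NULL && out != NULL);

    do {
        seq = pc->lock;
        compiler_barrier();

        /* Re-read the index on every pass: the event may have moved. */
        idx = pc->index;
        count = pc->offset;
        width = pc->pmc_width;
        usable = pc->cap_user_rdpmc && idx != 0 && width > 0 && width <= 64;
        raw = usable ? rdpmc(idx - 1) : 0;

        compiler_barrier();
    } while (pc->lock != seq); /* the kernel changed the page; retry */

    if (!usable)
        return -1;

    /* Sign-extend the raw value from the counter width to 64 bits. */
    int64_t pmc = (int64_t)(raw << (64 - width)) >> (64 - width);
    *out = count + pmc;
    return 0;
}
```

The point relevant to this discussion is that the counter index is taken from the page inside the sequence check on every read, so a pinned event that the scheduler moves to another counter is still read correctly. On real hardware the callback would execute RDPMC for the given counter instead of returning a constant.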