Date: Tue, 17 Jul 2018 10:50:39 +0200
From: Andreas Herrmann
To: "Rafael J. Wysocki"
Cc: "Rafael J. Wysocki", Peter Zijlstra, Frederic Weisbecker, Viresh Kumar, Linux PM, Linux Kernel Mailing List
Subject: Re: Commit 554c8aa8ecad causing severe performance degression with pcc-cpufreq
Message-ID: <20180717085039.kqxwbkgruhj5qxtx@suselix>

On Tue, Jul 17, 2018 at 10:03:41AM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 17, 2018 at 9:33 AM, Rafael J. Wysocki wrote:
> > Hi,
> >
> > Thanks for your report!
> >
> > On Tue, Jul 17, 2018 at 8:50 AM, Andreas Herrmann wrote:
> >> Hello,
> >>
> >> I've recently noticed that commit 554c8aa8ecad ("sched: idle: Select
> >> idle state before stopping the tick") causes a severe performance drop
> >> for systems using the pcc-cpufreq driver. Depending on the number of
> >> CPUs, the system might be almost unusable. The OS jitter for 4.17.y and
> >> 4.18-rcX kernels is off the charts; you can even spot it with the top
> >> command (issued while the system is supposedly idle), e.g.
> >>
> >> top - 14:44:24 up 2 min,  1 user,  load average: 90.11, 38.20, 14.38
> >> Tasks: 1199 total, 109 running, 541 sleeping,   0 stopped,   0 zombie
> >> %Cpu(s):  1.2 us, 58.7 sy,  0.0 ni, 39.3 id,  0.6 wa,  0.0 hi,  0.3 si,  0.0 st
> >> KiB Mem:  13137064+total,  1192168 used, 13017848+free,     2340 buffers
> >> KiB Swap:  2104316 total,        0 used,  2104316 free.   522296 cached Mem
> >>
> >>    PID USER      PR  NI    VIRT    RES    SHR S   %CPU  %MEM     TIME+ COMMAND
> >>   3373 root      20   0  982024  49916  36120 R 96.691 0.038   0:19.54 kubelet
> >>     67 root      20   0       0      0      0 R 78.676 0.000   0:49.36 kworker/9:0
> >>     25 root      20   0       0      0      0 R 78.125 0.000   0:49.67 kworker/2:0
> >>    182 root      20   0       0      0      0 R 75.735 0.000   1:18.17 kworker/28:0
> >>     43 root      20   0       0      0      0 R 75.000 0.000   0:11.56 kworker/5:0
> >>    103 root      20   0       0      0      0 R 74.449 0.000   0:46.83 kworker/15:0
> >>    334 root      20   0       0      0      0 R 72.978 0.000   1:06.88 kworker/53:0
> >>    789 root      20   0       0      0      0 R 69.853 0.000   1:29.50 kworker/38:1
> >>    418 root      20   0       0      0      0 R 69.301 0.000   0:41.33 kworker/67:0
> >>    779 root      20   0       0      0      0 R 68.934 0.000   1:33.60 kworker/27:1
> >>    773 root      20   0       0      0      0 R 68.566 0.000   1:37.91 kworker/22:1
> >>    762 root      20   0       0      0      0 R 68.015 0.000   1:41.01 kworker/11:1
> >>    769 root      20   0       0      0      0 R 67.647 0.000   1:37.65 kworker/18:1
> >>    805 root      20   0       0      0      0 R 67.096 0.000   1:30.96 kworker/54:1
> >>    840 root      20   0       0      0      0 R 66.912 0.000   1:23.82 kworker/89:1
> >>    812 root      20   0       0      0      0 R 66.728 0.000   1:31.89 kworker/59:1
> >>    847 root      20   0       0      0      0 R 66.360 0.000   1:28.40 kworker/96:1
> >>    763 root      20   0       0      0      0 R 66.176 0.000   1:42.57 kworker/12:1
> >>    772 root      20   0       0      0      0 R 66.176 0.000   1:12.58 kworker/21:1
> >>    821 root      20   0       0      0      0 R 66.176 0.000   1:29.62 kworker/69:1
> >>    923 root      20   0       0      0      0 R 65.809 0.000   1:44.32 kworker/3:18
> >>   1284 root      20   0       0      0      0 R 65.809 0.000   1:23.50 kworker/101:2
> >>     61 root      20   0       0      0      0 R 65.625 0.000   1:29.37 kworker/8:0
> >>   3531 root      20   0   24384   3768   2356 R 65.625 0.003   0:08.91 top
> >>    771 root      20   0       0      0      0 R 65.074 0.000   1:37.90 kworker/20:1
> >>    767 root      20   0       0      0      0 R 64.706 0.000   1:38.01 kworker/16:1
> >>    764 root      20   0       0      0      0 R 64.522 0.000   1:40.28 kworker/13:1
> >>    765 root      20   0       0      0      0 R 64.154 0.000   1:40.13 kworker/14:1
> >>
> >> When I apply the patch below (trying to revert the essential parts of
> >> commit 554c8aa8ecad), behaviour seems to be back to normal.
> >
> > Well, that basically defeats the purpose of the change in commit
> > 554c8aa8ecad, so it's not what I'd like to do to fix this problem.
> >
> > Also it would be good to understand what actually happens.
> >
> >> I know that the pcc-cpufreq driver is not "state-of-the-art" when it
> >> comes to cpufreq drivers and you'd better not use it.
> >
> > That's exactly right.
> >
> >> But I wonder whether commit 554c8aa8ecad ("sched: idle: Select idle
> >> state before stopping the tick") introduced bad behaviour for other
> >> cases as well.
> >
> > It has been tested quite extensively in that respect, although
> > admittedly not with the pcc-cpufreq driver.
> >
> > Nothing bad related to it has been reported so far, FWIW.
> >
> >> I'll send some performance results to illustrate the issue asap. I've
> >> also tried to modify pcc-cpufreq to reduce the number of frequency
> >> changes triggered by this driver, but this does not help for kernels
> >> where commit 554c8aa8ecad is applied.
> >
> > Can you replace pcc-cpufreq with a different cpufreq driver on the
> > affected systems? If so, do performance numbers look bad after that
> > too?
>
> Also, what cpufreq governor do you use with pcc-cpufreq?

The ondemand governor, which triggers a lot of PCC-related platform
calls. And, as Peter already noticed, the driver has a severe
bottleneck: a single lock protects the shared memory that all CPUs use
to pass data to and from the platform for PCC calls.

> Does changing it to something like "performance" improve things?

With the performance governor, the above-mentioned bottleneck is not an
issue.

To sum up: before this commit, users could use pcc-cpufreq, albeit with
already suboptimal performance (compared to, say, the intel_pstate
driver, which can be used after changing BIOS options). Starting with
this commit, systems using pcc-cpufreq become unusable with a high
number of CPUs (the top output above is from a system with 120 CPUs).

So should the driver be removed (sooner or later), should this behaviour
be documented somewhere, or should we just leave it as is?

Andreas
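To make the single-lock bottleneck concrete, here is a minimal user-space model of it, assuming (for illustration only) that every PCC platform call holds one global lock for a fixed latency. It does not use any real PCC, ACPI or cpufreq interface; SharedPccRegion, pcc_platform_call and run_governor_load are hypothetical names, and the latency value is made up. It shows that, with one lock shared by all CPUs, at most one call is ever in flight, so the total time spent grows with the number of CPUs issuing requests.

```python
import threading
import time

# Hypothetical, simplified model of the bottleneck described above.
# It does NOT use any real PCC, ACPI or cpufreq interface; all names
# below are made up for illustration.


class SharedPccRegion:
    """One shared-memory region used by every CPU, guarded by a single lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.in_flight = 0      # platform calls currently inside the critical section
        self.max_in_flight = 0  # highest concurrency ever observed
        self.completed = 0      # total platform calls completed


def pcc_platform_call(region, latency_s):
    # Every CPU takes the same lock before touching the shared region,
    # so platform calls from different CPUs are fully serialized.
    with region.lock:
        region.in_flight += 1
        region.max_in_flight = max(region.max_in_flight, region.in_flight)
        time.sleep(latency_s)  # assumed fixed stand-in for platform latency
        region.in_flight -= 1
        region.completed += 1


def run_governor_load(n_cpus, calls_per_cpu, latency_s):
    """Model n_cpus threads, each issuing calls_per_cpu frequency requests."""
    region = SharedPccRegion()

    def worker():
        for _ in range(calls_per_cpu):
            pcc_platform_call(region, latency_s)

    threads = [threading.Thread(target=worker) for _ in range(n_cpus)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return region, time.monotonic() - start


if __name__ == "__main__":
    # Same per-CPU request rate, more CPUs: wall time grows roughly linearly,
    # because at most one call can hold the lock at a time.
    for n in (1, 4, 16):
        region, elapsed = run_governor_load(n, calls_per_cpu=5, latency_s=0.002)
        print(f"{n:2d} CPUs: {region.completed} calls, "
              f"max in flight = {region.max_in_flight}, {elapsed:.3f} s")
```

In this model the serialized time is roughly n_cpus times calls_per_cpu times latency_s, so it is the product of CPU count and request rate that hurts. A governor that rarely requests frequency changes (such as performance, which simply keeps the CPU at its maximum frequency) keeps calls_per_cpu small, which fits the observation above that the bottleneck is not an issue there.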