Subject: Re: ACPI _CST introduced performance regresions on Haswll
From: "Rafael J. Wysocki"
Organization: Intel Technology Poland Sp. z o. o., KRS 101882, ul. Slowackiego 173, 80-298 Gdansk
To: Mel Gorman
Cc: Takashi Iwai, linux-kernel@vger.kernel.org
Date: Thu, 8 Oct 2020 19:15:46 +0200
In-Reply-To: <20201008090909.GP3227@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/8/2020 11:09 AM, Mel Gorman wrote:
> On Wed, Oct 07, 2020 at 05:45:30PM +0200, Rafael J. Wysocki wrote:
>>> pre-cst is just before your patch
>>> enable-cst is your patch that was bisected
>>> enable-cst-no-hsx-acpi is your patch with use_acpi disabled
>>> 5.9-rc8-vanilla is what it sounds like
>>> 5.9-rc8-no-hsx-acpi disables use_acpi
>>>
>>> The enable-cst-no-hsx-acpi result indicates that use_acpi was the issue for
>>> Haswell (at least these machines). Looking just at 5.9-rc8-vanilla might
>>> have been misleading because its performance is not far off the baseline
>>> due to unrelated changes that mostly offset the performance penalty.
>>>
>>> The key question is -- how appropriate would it be to disable acpi for
>>> Haswell? Would that be generally safe or could it hide other surprises?
>>>
>> It should be safe, but let's try to do something more fine-grained.
>>
>> There is the CPUIDLE_FLAG_ALWAYS_ENABLE flag that is set for C1E. Can you
>> please try to set it for C6 in hsw_cstates instead of clearing use_acpi in
>> idle_cpu_hsx and retest?
>>
> Performance-wise, always enabling C6 helps but it may be specific to
> this workload.
> Looking across all tested kernels I get;
>
> netperf-udp
>                                    5.5.0           5.5.0-rc2           5.5.0-rc2           5.9.0-rc8           5.9.0-rc8           5.9.0-rc8
>                                  vanilla             pre-cst          enable-cst             vanilla        disable-acpi           enable-c6
> Hmean     send-64       196.31 (  0.00%)    208.56 *  6.24%*    181.15 * -7.72%*    199.84 *  1.80%*    235.09 * 19.76%*    234.79 * 19.60%*
> Hmean     send-128      391.75 (  0.00%)    408.13 *  4.18%*    359.92 * -8.12%*    396.81 (  1.29%)    469.44 * 19.83%*    465.55 * 18.84%*
> Hmean     send-256      776.38 (  0.00%)    798.39 *  2.84%*    707.31 * -8.90%*    781.63 (  0.68%)    917.19 * 18.14%*    905.06 * 16.57%*
> Hmean     send-1024    3019.64 (  0.00%)   3099.00 *  2.63%*   2756.32 * -8.72%*   3017.06 ( -0.09%)   3509.84 * 16.23%*   3532.85 * 17.00%*
> Hmean     send-2048    5790.31 (  0.00%)   6209.53 *  7.24%*   5394.42 * -6.84%*   5846.11 (  0.96%)   6861.93 * 18.51%*   6852.08 * 18.34%*
> Hmean     send-3312    8909.98 (  0.00%)   9483.92 *  6.44%*   8332.35 * -6.48%*   9047.52 *  1.54%*  10677.93 * 19.84%*  10509.41 * 17.95%*
> Hmean     send-4096   10517.63 (  0.00%)  11044.19 *  5.01%*   9851.70 * -6.33%*  10914.24 *  3.77%*  12719.58 * 20.94%*  12731.06 * 21.04%*
> Hmean     send-8192   17355.48 (  0.00%)  18344.50 *  5.70%*  15844.38 * -8.71%*  17690.46 (  1.93%)  20777.97 * 19.72%*  20220.24 * 16.51%*
> Hmean     send-16384  28585.78 (  0.00%)  28950.90 (  1.28%)  25946.88 * -9.23%*  26643.69 * -6.79%*  30891.89 *  8.07%*  30701.46 *  7.40%*
>
> The difference between always using ACPI and force enabling C6 is
> negligible in this case but more on that later
>
> netperf-udp
>                                5.9.0-rc8           5.9.0-rc8
>                             disable-acpi           enable-c6
> Hmean     send-64       235.09 (  0.00%)    234.79 ( -0.13%)
> Hmean     send-128      469.44 (  0.00%)    465.55 ( -0.83%)
> Hmean     send-256      917.19 (  0.00%)    905.06 ( -1.32%)
> Hmean     send-1024    3509.84 (  0.00%)   3532.85 (  0.66%)
> Hmean     send-2048    6861.93 (  0.00%)   6852.08 ( -0.14%)
> Hmean     send-3312   10677.93 (  0.00%)  10509.41 * -1.58%*
> Hmean     send-4096   12719.58 (  0.00%)  12731.06 (  0.09%)
> Hmean     send-8192   20777.97 (  0.00%)  20220.24 * -2.68%*
> Hmean     send-16384  30891.89 (  0.00%)  30701.46 ( -0.62%)
>
> The default status and enabled states differ.
>
> For 5.9-rc8 vanilla, the default and disabled status for cstates are
>
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:1
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:1
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status:enabled
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status:enabled
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status:enabled
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/default_status:disabled
> ./5.9.0-rc8-vanilla/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/default_status:disabled
>
> For use_acpi == false, all c-states are enabled
>
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:0
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:0
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status:enabled
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status:enabled
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status:enabled
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/default_status:enabled
> ./5.9.0-rc8-disable-acpi/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/default_status:enabled
>
> Force enabling C6
>
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/disable:0
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/disable:0
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/disable:0
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/disable:1
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/disable:0
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/default_status:enabled
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/default_status:enabled
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/default_status:enabled
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/default_status:disabled
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/default_status:enabled
>
> Note that as expected, C3 remains disabled when only C6 is forced (state3
> == c3, state4 == c6). While this particular workload does not appear to
> care as it does not remain idle for long, the exit latency difference
> between c3 and c6 is large so potentially a workload that idles for short
> durations that are somewhere between c1e and c3 exit latency might take
> a larger penalty exiting from c6 state if the deeper c-state is selected
> for idling.
>
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state0/residency:0
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state1/residency:2
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state2/residency:20
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state3/residency:100
> ./5.9.0-rc8-enable-c6/iter-0/sys/devices/system/cpu/cpu0/cpuidle/state4/residency:400
>

If you are worried that C6 might be used instead of C3 in some cases, this is not going to happen.
In all cases in which C3 would have been used had it not been disabled,
C1E will be used instead.

Which BTW indicates that using C1E more often adds a lot of latency to the
workload (if C3 and C6 are both disabled, C1E is used in all cases in which
one of them would have been used).

With C6 enabled, that state is used at least sometimes (so C1E is used less
often), but PC6 doesn't seem to be really used - it looks like only core C6
is entered, which may be why C6 adds less latency than C1E (and analogously
for C3).
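
To illustrate the fallback described above, here is a rough sketch in Python. It is a simplified model, not the kernel's governor code: it assumes a state is picked only by comparing the predicted idle duration against the target residency, and it ignores exit-latency limits and how the prediction is made. The residency values are the ones from the quoted sysfs output; the names IdleState, make_states and pick_state are made up for this example, and "state0"/"state1" are placeholder labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IdleState:
    name: str                 # label used only in this sketch
    target_residency_us: int  # taken from the quoted "residency" files
    enabled: bool


def make_states(c3_enabled: bool, c6_enabled: bool) -> List[IdleState]:
    """Build the five states from the listing, shallowest first."""
    return [
        IdleState("state0", 0, True),
        IdleState("state1", 2, True),
        IdleState("C1E", 20, True),
        IdleState("C3", 100, c3_enabled),
        IdleState("C6", 400, c6_enabled),
    ]


def pick_state(states: List[IdleState],
               predicted_idle_us: int) -> Optional[IdleState]:
    """Pick the deepest enabled state whose target residency fits."""
    chosen = None
    for state in states:
        if not state.enabled:
            continue  # a disabled state is skipped, not replaced by a deeper one
        if state.target_residency_us > predicted_idle_us:
            break     # residencies only grow from here, so stop looking
        chosen = state
    return chosen


# The enable-c6 configuration: C3 disabled, C6 enabled.
enable_c6 = make_states(c3_enabled=False, c6_enabled=True)
print(pick_state(enable_c6, 150).name)  # C1E
print(pick_state(enable_c6, 500).name)  # C6
```

With a predicted idle time of 150 us, C3 would have been the pick had it been enabled; with C3 disabled the model falls back to C1E, because C6's target residency of 400 us is still too large. C6 is only picked once the predicted idle time reaches its own target residency.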