Message-ID: <69076f8e-191b-2e3e-d810-ea72d8ff18bb@linux.vnet.ibm.com>
Date: Wed, 26 Jul 2023 13:34:43 +0530
Subject: Re: [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms
From: Shrikanth Hegde
To: Mathieu Desnoyers
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Valentin Schneider,
    Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
    Vincent Guittot, Juri Lelli, Swapnil Sapkal, Aaron Lu, x86@kernel.org,
    Peter Zijlstra, shrikanth hegde, Srikar Dronamraju
In-Reply-To: <20230725193048.124796-1-mathieu.desnoyers@efficios.com>

On 7/26/23 1:00 AM, Mathieu Desnoyers wrote:
> Allow select_task_rq to consider a cpu as idle for 1ms after that cpu
> has exited the idle loop.
>
> This speeds up the following hackbench workload on a 192 cores AMD EPYC
> 9654 96-Core Processor (over 2 sockets):
>
> hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100
>
> from 49s to 34s. (30% speedup)
>
> My working hypothesis for why this helps is: queuing more than a single
> task on the runqueue of a cpu which just exited idle rather than
> spreading work over other idle cpus helps power efficiency on systems
> with large number of cores.
>
> This was developed as part of the investigation into a weird regression
> reported by AMD where adding a raw spinlock in the scheduler context
> switch accelerated hackbench.
>
> It turned out that changing this raw spinlock for a loop of 10000x
> cpu_relax within do_idle() had similar benefits.
>
> This patch achieve a similar effect without the busy-waiting by
> introducing a runqueue state sampling the sched_clock() when exiting
> idle, which allows select_task_rq to consider "as idle" a cpu which has
> recently exited idle.
>
> This patch should be considered "food for thoughts", and I would be glad
> to hear feedback on whether it causes regressions on _other_ workloads,
> and whether it helps with the hackbench workload on large Intel system
> as well.
>
> Link: https://lore.kernel.org/r/09e0f469-a3f7-62ef-75a1-e64cec2dcfc5@amd.com
> Signed-off-by: Mathieu Desnoyers
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Valentin Schneider
> Cc: Steven Rostedt
> Cc: Ben Segall
> Cc: Mel Gorman
> Cc: Daniel Bristot de Oliveira
> Cc: Vincent Guittot
> Cc: Juri Lelli
> Cc: Swapnil Sapkal
> Cc: Aaron Lu
> Cc: x86@kernel.org
> ---
>  kernel/sched/core.c  | 4 ++++
>  kernel/sched/sched.h | 3 +++
>  2 files changed, 7 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a68d1276bab0..d40e3a0a5ced 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6769,6 +6769,7 @@ void __sched schedule_idle(void)
>           * TASK_RUNNING state.
>           */
>          WARN_ON_ONCE(current->__state);
> +        WRITE_ONCE(this_rq()->idle_end_time, sched_clock());
>          do {
>                  __schedule(SM_NONE);
>          } while (need_resched());
> @@ -7300,6 +7301,9 @@ int idle_cpu(int cpu)
>  {
>          struct rq *rq = cpu_rq(cpu);
>
> +        if (sched_clock() < READ_ONCE(rq->idle_end_time) + IDLE_CPU_DELAY_NS)

Wouldn't this hurt latency badly? Especially on a loaded system with a
workload that does a lot of wakeups.

I ran schbench on a 50% loaded system with stress-ng (there could be a
better benchmark to measure latency). I see that latency takes a hit,
especially the tail latencies. Full log below with different schbench
groups.

              6.5-rc3    6.5-rc3+this patch
Groups: 1
50.0th:          14.0       13.0
75.0th:          16.0       16.0
90.0th:          19.5       20.0
95.0th:          53.0      226.0
99.0th:        1969.0     2165.0
99.5th:        2912.0     2648.0
99.9th:        4680.0     4142.0
Groups: 2
50.0th:          15.5       15.5
75.0th:          18.0       19.5
90.0th:          25.5      497.0
95.0th:         323.0     1384.0
99.0th:        2055.0     3144.0
99.5th:        2972.0     4014.0
99.9th:        6026.0     6560.0
Groups: 4
50.0th:          18.0       18.5
75.0th:          21.5       26.0
90.0th:          56.0      940.5
95.0th:         678.0     1896.0
99.0th:        2484.0     3756.0
99.5th:        3224.0     4616.0
99.9th:        4960.0     6824.0
Groups: 8
50.0th:          23.5       25.5
75.0th:          30.5      421.5
90.0th:         443.5     1722.0
95.0th:        1410.0     2736.0
99.0th:        3942.0     5496.0
99.5th:        5232.0     7016.0
99.9th:        7996.0     8896.0
Groups: 16
50.0th:          33.5       41.5
75.0th:          49.0      752.0
90.0th:        1067.5     2332.0
95.0th:        2093.0     3468.0
99.0th:        5048.0     6728.0
99.5th:        6760.0     7624.0
99.9th:        8592.0     9504.0
Groups: 32
50.0th:          60.0       79.0
75.0th:         456.5     1712.0
90.0th:        2788.0     3996.0
95.0th:        4544.0     5768.0
99.0th:        8444.0     9104.0
99.5th:        9168.0     9808.0
99.9th:       11984.0    12448.0

> +                return 1;
> +
>          if (rq->curr != rq->idle)
>                  return 0;
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 81ac605b9cd5..8932e198a33a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -97,6 +97,8 @@
>  # define SCHED_WARN_ON(x)      ({ (void)(x), 0; })
>  #endif
>
> +#define IDLE_CPU_DELAY_NS       1000000         /* 1ms */
> +
>  struct rq;
>  struct cpuidle_state;
>
> @@ -1010,6 +1012,7 @@ struct rq {
>
>          struct task_struct __rcu        *curr;
>          struct task_struct      *idle;
> +        u64                     idle_end_time;
>          struct task_struct      *stop;
>          unsigned long           next_balance;
>          struct mm_struct        *prev_mm;
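
For anyone following the thread, the check being discussed can be modeled
in user space roughly as below. This is only a minimal sketch and not
kernel code: struct rq_model, rq_note_idle_exit() and rq_idle_cpu() are
hypothetical names, timestamps are passed in by the caller instead of
coming from sched_clock(), and the checks that follow
"rq->curr != rq->idle" in the real idle_cpu() are collapsed into a plain
"return 1".

```c
#include <stdbool.h>
#include <stdint.h>

#define IDLE_CPU_DELAY_NS 1000000ULL /* 1ms, same value as the quoted patch */

/* Hypothetical stand-in for the two pieces of rq state the check reads. */
struct rq_model {
	uint64_t idle_end_time;   /* timestamp (ns) of the last idle exit */
	bool running_idle_task;   /* models rq->curr == rq->idle */
};

/* Models the WRITE_ONCE() the patch adds to schedule_idle(). */
static void rq_note_idle_exit(struct rq_model *rq, uint64_t now_ns)
{
	rq->idle_end_time = now_ns;
}

/* Models only the part of idle_cpu() that is visible in the quoted hunk. */
static int rq_idle_cpu(const struct rq_model *rq, uint64_t now_ns)
{
	/* Grace window: still reported idle for 1ms after leaving idle. */
	if (now_ns < rq->idle_end_time + IDLE_CPU_DELAY_NS)
		return 1;

	/* A task other than the idle task is running: not idle. */
	if (!rq->running_idle_task)
		return 0;

	/* Assumption: the remaining checks of the real function are skipped. */
	return 1;
}
```

With an idle exit noted at t = 5000000 ns and a non-idle task running,
rq_idle_cpu() returns 1 at t = 5500000 ns and 0 at t = 6000000 ns. During
that window a wakeup could be placed on a CPU that is in fact busy, which
is the case the latency question above is about.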