Message-ID: <4ad76a27-dfb4-23be-fdb3-49c0780df670@didichuxing.com>
Date: Fri, 30 Sep 2022 08:58:51 +0800
Subject: Re: [RFC PATCH] sched/fair: Choose the CPU where short task is running during wake up
To: K Prateek Nayak, Chen Yu
CC: Peter Zijlstra, Vincent Guittot, Tim Chen, Mel Gorman, Juri Lelli,
 Rik van Riel, Aaron Lu, Abel Wu, Yicong Yang, "Gautham R .
 Shenoy", Ingo Molnar, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Daniel Bristot de Oliveira, Valentin Schneider
From: Honglei Wang
In-Reply-To: <2c50baa4-beef-54b9-74fe-1cbf6e8f8dbd@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Prateek,

On 2022/9/30 01:34, K Prateek Nayak wrote:
> Hello Honglei,
>
> Thank you for looking into this.
>
> On 9/29/2022 12:29 PM, Honglei Wang wrote:
>>
>> [..snip..]
>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 914096c5b1ae..7519ab5b911c 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
>>>>>       return 1;
>>>>>   }
>>>>>
>>>>> +/*
>>>>> + * If a task switches in and then voluntarily relinquishes the
>>>>> + * CPU quickly, it is regarded as a short running task.
>>>>> + * sysctl_sched_min_granularity is chosen as the threshold,
>>>>> + * as this value is the minimal slice if there are too many
>>>>> + * runnable tasks, see __sched_period().
>>>>> + */
>>>>> +static int is_short_task(struct task_struct *p)
>>>>> +{
>>>>> +    return (p->se.sum_exec_runtime <=
>>>>> +        (p->nvcsw * sysctl_sched_min_granularity));
>>>>> +}
>>>>> +
>>>>>   /*
>>>>>    * The purpose of wake_affine() is to quickly determine on which CPU we can run
>>>>>    * soonest. For the purpose of speed we only consider the waking and previous
>>>>> @@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>>>>>       if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>>>>>           return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>>>>>
>>>>> -    if (sync && cpu_rq(this_cpu)->nr_running == 1)
>>>>> +    if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
>>>>> +        is_short_task(cpu_curr(this_cpu)))
>>
>> Seems it a bit breaks idle (or will be idle) purpose of wake_affine_idle() here. Maybe we can do it something like this?
>>
>> if ((sync || is_short_task(cpu_curr(this_cpu))) && cpu_rq(this_cpu)->nr_running == 1)
>
> I believe this will still cause performance degradation on split-LLC
> system for Stream like workloads. Based on the logs below, we can
> have a situation as follows:
>
> stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
>
> Where sync is 0 but is_short_task() may return 1 and the
> current_rq->nr_running is 1. This will lead to two Stream threads
> getting placed on same LLC during wakeup which will cause cache
> contention and performance degradation.
>

What I meant was that we should not break the purpose of
wake_affine_idle(). The 'nr_running == 1' check makes sure there won't be
a long queue on this CPU, which might help in the benchmark tests as
well. The short code snippet I sent was probably not well thought
through; it was only meant as a hint.

I saw your test results in the other mail. They are great and exactly
what I was thinking we should test.

Thanks,
Honglei

>>
>> Thanks,
>> Honglei
>>
>>>>
>>>> This change seems to optimize for affine wakeup which benefits
>>>> tasks with producer-consumer pattern but is not ideal for Stream.
>>>> Currently the logic ends will do an affine wakeup even if sync
>>>> flag is not set:
>>>>
>>>>            stream-4135    [029] d..2.   353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
>>>>            stream-4135    [029] d..2.   353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
>>>>            stream-4135    [029] d..2.   353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
>>>>            <idle>-0       [030] dNh2.   353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030
>
> This is the exact situation observed during our testing.
>
>>>>
>>>> [..snip..]
>>>>
> --
> Thanks and Regards,
> Prateek
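
For reference, the three forms of the wake_affine_idle() check discussed in this thread can be compared in a small userspace sketch. This is only a model under stated assumptions, not kernel code: struct wake_ctx, is_short_task_model() and the pick_this_cpu_*() helpers are hypothetical names, and the fields simply stand in for sync, cpu_rq(this_cpu)->nr_running and is_short_task(cpu_curr(this_cpu)). The min_gran_ns parameter stands in for sysctl_sched_min_granularity, and its value is whatever the caller passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inputs of the check; not a kernel type. */
struct wake_ctx {
    bool sync;               /* sync flag passed to wake_affine_idle() */
    unsigned int nr_running; /* models cpu_rq(this_cpu)->nr_running */
    bool curr_is_short;      /* models is_short_task(cpu_curr(this_cpu)) */
};

/* Models the quoted is_short_task(): average run time per voluntary
 * context switch is at most the granularity passed in min_gran_ns. */
static bool is_short_task_model(uint64_t sum_exec_runtime_ns,
                                uint64_t nvcsw, uint64_t min_gran_ns)
{
    return sum_exec_runtime_ns <= nvcsw * min_gran_ns;
}

/* Check on the '-' line of the quoted diff (before the RFC patch). */
static bool pick_this_cpu_baseline(const struct wake_ctx *c)
{
    return c->sync && c->nr_running == 1;
}

/* Check on the '+' lines of the quoted diff (the RFC patch). */
static bool pick_this_cpu_rfc(const struct wake_ctx *c)
{
    return (c->sync && c->nr_running == 1) || c->curr_is_short;
}

/* Variant suggested in the reply: keep the nr_running == 1 guard. */
static bool pick_this_cpu_suggested(const struct wake_ctx *c)
{
    return (c->sync || c->curr_is_short) && c->nr_running == 1;
}

/* Replays the traced wakeup: sync(0) rq->nr_running(1) is_short_task(1). */
static void check_traced_case(void)
{
    const struct wake_ctx traced = {
        .sync = false, .nr_running = 1, .curr_is_short = true,
    };

    assert(!pick_this_cpu_baseline(&traced));
    assert(pick_this_cpu_rfc(&traced));
    assert(pick_this_cpu_suggested(&traced));
}
```

On the traced values the suggested variant still selects this_cpu, which matches Prateek's point that it would not help Stream. The two patched variants only differ when the run queue is longer than one task, where the nr_running == 1 guard makes the suggested variant decline.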