From: Josh Don
Date: Mon, 31 Oct 2022 15:44:46 -0700
Subject: Re: [PATCH] sched/fair: favor non-idle group in tick preemption
To: Chuyi Zhou
Cc: peterz@infradead.org, juri.lelli@redhat.com, mingo@redhat.com, vincent.guittot@linaro.org, linux-kernel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <5af26ac9-3bdb-32d2-77a7-6cd8feca97aa@bytedance.com>
References: <20221027081630.34081-1-zhouchuyi@bytedance.com> <64d963b6-2d9c-3f93-d427-a1ff705fb65a@bytedance.com> <5af26ac9-3bdb-32d2-77a7-6cd8feca97aa@bytedance.com>

On Mon, Oct 31, 2022 at 1:39 AM
Chuyi Zhou wrote:
>
> On 2022/10/28 07:34, Josh Don wrote:
> > The reason for limiting the control of weight for idle cgroups is to
> > match the semantics of the per-task SCHED_IDLE api, which gives
> > SCHED_IDLE threads minimum weight. The idea behind SCHED_IDLE is that
> > these entities are intended to soak "unused" cpu cycles, and should
> > give minimal interference to any non-idle thread. However, we don't
> > have strict priority between idle and non-idle, due to the potential
> > for starvation issues.
> >
> > Perhaps you could clarify your use case a bit further. Why do you want
> > to change the weight? Is it to adjust the competition between two idle
> > groups, or something else?
> >
> Suppose we have two cgroups (idle & non-idle) in /sys/fs/cgroup/cpu.
> The idle cgroup contains offline services, such as big data processing;
> the non-idle cgroup contains online services which have higher priority
> to users and are sensitive to latency. We set quota/period for the idle
> cgroup, which is its *cpu limit*. In general, we consider that the
> closer the idle cgroup's cpu usage is to its limit, the better. However,
> when the system is busy, the idle cgroup can only get a little cpu time
> with the minimum weight. To cope with the above situation, we changed
> the default weight.

I see. So you want the part of SCHED_IDLE that makes the entity highly
preemptible (and avoids preemption of non-idle entities), but want to
adjust weight to reach a target cpu split? That seems a bit
counterintuitive to me, since by giving the idle entities higher weight,
you'll end up pushing out the round-robin latency for the non-idle
entities.

Worth noting that SCHED_IDLE is a bit of a CFS hack, but the intended
semantics of it are that these threads soak only "remaining cycles".
This comes with many implications beyond just weight.
For example, a cpu running only SCHED_IDLE entities is considered as
"idle" from the perspective of non-idle entities. If we give these idle
entities meaningful weight, we start to break assumptions there; for
example, see sched_idle_cpu() and load balancing.

I wonder if maybe dusting off SCHED_BATCH is a better answer here, for
this type of use case (some amount of throughput "guarantee", but with
preemption properties similar to SCHED_IDLE). Peter, thoughts?

> One more question: why do you think this patch can starve an idle
> entity?
>
>	/*
>	 * Ensure that a task that missed wakeup preemption by a
>	 * narrow margin doesn't have to wait for a full slice.
>	 * This also mitigates buddy induced latencies under load.
>	 */
>	se = __pick_first_entity(cfs_rq);
>	delta = curr->vruntime - se->vruntime;
>
>	if (delta < 0)
>		return;
>
>	if (delta > ideal_runtime)
>		resched_curr(rq_of(cfs_rq));
>
> se can preempt curr only when
>	curr->vruntime > se->vruntime &&
>	curr->vruntime - se->vruntime > ideal_runtime
> is true. I think the original purpose is that se doesn't have to wait
> for a full slice, reducing response time if se is latency sensitive.
> This patch just lets curr exhaust its ideal_runtime when se is idle and
> curr is non-idle. Normally se will be chosen by pick_next_entity().
>
> Maybe I missed something?
> Thanks

No, that was my mistake; I accidentally thought this delta was being
applied to the 'if (delta_exec > ideal_runtime) {' above in
check_preempt_tick().

Some weirdness about this change, though, is that if there is a non-idle
current entity, and the two next entities on the cfs_rq are idle and
non-idle respectively, we'll now take longer to preempt the on-cpu
non-idle entity, because the non-idle entity on the cfs_rq is 'hidden'
by the idle 'first' entity. Wakeup preemption is different because we're
always directly comparing the current entity with the newly woken
entity.

Best,
Josh