Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp438205pxv; Wed, 14 Jul 2021 07:25:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzpbNW2nI8ttLHjFCT/OLkK2Nn79sFViIViQAqB1ixn7Rk2cd9WyGDHSQH8NNZjU8wyxCiQ X-Received: by 2002:a50:cd86:: with SMTP id p6mr862679edi.212.1626272730059; Wed, 14 Jul 2021 07:25:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626272730; cv=none; d=google.com; s=arc-20160816; b=iDGLbR67+N9w51IWBKfE83ZfjC6PcXWlagXu+U8LZj9RlvgfrV76hPO41lhtpJMZdl xhJ4I9gtbQ1crxRd6uvB+ADclXLSZRW8jJNF6J2tr+fXGmNEiQOq3JWfjgvFSLIZY5R+ ffwwmLtIUdf92X6CeAbNlvXas91dP/8GKQEvIhaThgZ9sSt6A640nv7wIpOl3Z4VHJl4 7soerrR/69e0JJKq2PMUq4NGHocwzW55KHOK3JmFflLEc3D2cwiZPXfMEoRleAupOWKe KAAJvUVfOCpM+iPFLlWqwfNtYwBIlpMBoKposuNmuyCTKOrr9WPhtpObUZisiMCA1quY uEYg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=zWKuXmV9NTR6uWsb5ZRs6PtCut+FHwMqDqou+HUiiPs=; b=yzM0Kqk33HEYGNqKEoKOSsWdPgkRQZcV74uBI/wqhzsDNWTKAA3F3+qxCavGhIhQEI wHarQOKqzHaqanv8jGdFLkoq11AmZxxsQseHjxLy3KTXKuQtInF+plTOJe/dIiSuWlNC u/kPjURG26dzLH16bC6507XbvEozeMuVu74c3GIGbnkyUroMollOktmoTAOmXQMm1bPD 6uv0Gv+wqYGIT5Xi4tGz/nBQRo8oEbC3gwZkZ74Lw637CqfaJFJf7o9AE3t9bE4nzDsG j4FQC4ombEclyrnBChGGn3mjtufAisUJm0BA5/B3kr+Y6u4jPnQ7sEZ7L4HNXV7KXbhC H3RA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id w10si3281079ejv.584.2021.07.14.07.25.06; Wed, 14 Jul 2021 07:25:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232702AbhGNO0F (ORCPT + 99 others); Wed, 14 Jul 2021 10:26:05 -0400 Received: from foss.arm.com ([217.140.110.172]:35540 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232265AbhGNO0F (ORCPT ); Wed, 14 Jul 2021 10:26:05 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3532231B; Wed, 14 Jul 2021 07:23:13 -0700 (PDT) Received: from [192.168.178.6] (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7D9EB3F694; Wed, 14 Jul 2021 07:23:11 -0700 (PDT) Subject: Re: [RFC PATCH 1/1] sched: do active load balance in balance callback To: Yafang Shao , Vincent Guittot , Peter Zijlstra , Valentin Schneider , Ingo Molnar , Juri Lelli , Steven Rostedt , Benjamin Segall , Mel Gorman , Daniel Bristot de Oliveira Cc: LKML References: From: Dietmar Eggemann Message-ID: Date: Wed, 14 Jul 2021 16:23:10 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/07/2021 09:40, Yafang Shao wrote: > The active load balance which means to migrate the CFS task running on > the busiest CPU to the new idle CPU has a known issue[1][2] that > there are some race window between waking up the migration thread on the > busiest CPU and it begins to preempt the current running CFS task. > These race window may cause unexpected behavior that the latency > sensitive RT tasks may be preempted by the migration thread as it has a > higher priority. > > This RFC patch tries to improve this situation. Instead of waking up the > migration thread to do this work, this patch do it in the balance > callback as follows, > > The New idle CPUm The target CPUn > find the target task A CFS task A is running > queue it into the target CPUn A is scheduling out > do balance callback and migrate A to CPUm > It avoids two context switches - task A to migration/n and migration/n to > task B. And it avoids preempting the RT task if the RT task has already > preempted task A before we do the queueing. > > TODO: > - I haven't done some benchmark to measure the impact on performance > - To avoid deadlock I have to unlock the busiest_rq->lock before > calling attach_one_task() and lock it again after executing > attach_one_task(). That may re-introduce the issue addressed by > commit 565790d28b1e ("sched: Fix balance_callback()") > > [1]. https://lore.kernel.org/lkml/CAKfTPtBygNcVewbb0GQOP5xxO96am3YeTZNP5dK9BxKHJJAL-g@mail.gmail.com/ > [2]. https://lore.kernel.org/lkml/20210615121551.31138-1-laoar.shao@gmail.com/ This didn't apply for me and I guess won't compile on tip/sched/core: raw_spin_{,un}lock(&busiest_rq->lock) -> raw_spin_rq_{,un}lock(busiest_rq) p->state == TASK_RUNNING -> p->__state or task_is_running(p) > Signed-off-by: Yafang Shao > --- > kernel/sched/core.c | 1 + > kernel/sched/fair.c | 69 ++++++++++++++------------------------------ > kernel/sched/sched.h | 6 +++- > 3 files changed, 28 insertions(+), 48 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 4ca80df205ce..a0a90a37e746 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -8208,6 +8208,7 @@ void __init sched_init(void) > rq->cpu_capacity = rq->cpu_capacity_orig = SCHED_CAPACITY_SCALE; > rq->balance_callback = &balance_push_callback; > rq->active_balance = 0; > + rq->active_balance_target = NULL; > rq->next_balance = jiffies; > rq->push_cpu = 0; > rq->cpu = i; [...] > +DEFINE_PER_CPU(struct callback_head, active_balance_head); > + > /* > * Check this_cpu to ensure it is balanced within domain. Attempt to move > * tasks if there is an imbalance. > @@ -9845,15 +9817,14 @@ static int load_balance(int this_cpu, struct > rq *this_rq, > if (!busiest->active_balance) { > busiest->active_balance = 1; > busiest->push_cpu = this_cpu; > + busiest->active_balance_target = busiest->curr; > active_balance = 1; > } > - raw_spin_unlock_irqrestore(&busiest->lock, flags); > > - if (active_balance) { > - stop_one_cpu_nowait(cpu_of(busiest), > - active_load_balance_cpu_stop, busiest, > - &busiest->active_balance_work); > - } > + if (active_balance) > + queue_balance_callback(busiest, > &per_cpu(active_balance_head, busiest->cpu), > active_load_balance_cpu_stop); When you defer the active load balance of p into a balance_callback (from __schedule()) p has to stop running on busiest, right? Deferring active load balance for too long might be defeat the purpose of load balance which has to happen now. Also, before balance_callback get invoked, active balancing might try to migrate p again and again but fails because `busiest->active_balance` is still 1 (you kept this former synchronization meant for active_balance_work). In this case the likelihood increases that one of the error condition in active_load_balance_cpu_stop() hit when it's finally called. What's wrong with the FIFO-1 "stopper" for CFS active lb? [...]