From: Vincent Guittot
Date: Thu, 29 Jun 2023 10:33:49 +0200
Subject: Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling
To: Xiongfeng Wang
Cc: Thomas Gleixner, vschneid@redhat.com, Phil Auld, vdonnefort@google.com,
    Linux Kernel Mailing List, Wei Li, "liaoyu (E)", zhangqiao22@huawei.com,
    Peter Zijlstra, Dietmar Eggemann, Ingo Molnar
X-Mailing-List: linux-kernel@vger.kernel.org
On Thu, 29 Jun 2023 at 03:26, Xiongfeng Wang wrote:
>
>
> On 2023/6/28 0:46, Vincent Guittot wrote:
> > On Mon, 26 Jun 2023 at 10:23, Xiongfeng Wang wrote:
> >>
> >> Hi,
> >>
> >> Kindly ping~
> >> Could you please take a look at this issue and the below temporary fix?
> >>
> >> Thanks,
> >> Xiongfeng
> >>
> >> On 2023/6/12 20:49, Xiongfeng Wang wrote:
> >>>
> >>>
> >>> On 2023/6/9 22:55, Thomas Gleixner wrote:
> >>>> On Fri, Jun 09 2023 at 19:24, Xiongfeng Wang wrote:
> >>>>
> >>>> Cc+ scheduler people, leave context intact
> >>>>
> >>>>> Hello,
> >>>>> When I do some low power tests, the following hung task is printed.

[...]

> >>> diff --cc kernel/sched/fair.c
> >>> index d9d6519fae01,bd6624353608..000000000000
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@@ -5411,10 -5411,16 +5411,15 @@@ void start_cfs_bandwidth(struct cfs_ban
> >>>   {
> >>>   	lockdep_assert_held(&cfs_b->lock);
> >>>
> >>> - 	if (cfs_b->period_active)
> >>> + 	if (cfs_b->period_active) {
> >>> + 		struct hrtimer_clock_base *clock_base = cfs_b->period_timer.base;
> >>> + 		int cpu = clock_base->cpu_base->cpu;
> >>> + 		if (!cpu_active(cpu) && cpu != smp_processor_id())
> >>> + 			hrtimer_start_expires(&cfs_b->period_timer,
> >>> + 					      HRTIMER_MODE_ABS_PINNED);
> >>>   		return;
> >>> + 	}
> >
> > I have been able to reproduce your problem and run your fix on top. I
> > still wonder if there is a
>
> Sorry, I forgot to provide the kernel modification that helps reproduce the
> issue. At first, the issue could only be reproduced in the production
> environment with a production stress testcase. After figuring out the
> reason, I added the following modification. It makes sure the process runs
> out of its cfs quota and can be scheduled out in free_vm_stack_cache().
> Although the real schedule point is in __vunmap(), this still shows that
> the issue exists.

I have been able to reproduce the problem (or at least something
similar) without your change below, with a shorter cfs_quota_us and
other tasks always running in the cgroup.

>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 0fb86b65ae60..3b2d83fb407a 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -110,6 +110,8 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/task.h>
>
> +#include <linux/delay.h>
> +
>  /*
>   * Minimum number of threads to boot the kernel
>   */
> @@ -199,6 +201,9 @@ static int free_vm_stack_cache(unsigned int cpu)
>  	struct vm_struct **cached_vm_stacks = per_cpu_ptr(cached_stacks, cpu);
>  	int i;
>
> +	mdelay(2000);
> +	cond_resched();
> +
>  	for (i = 0; i < NR_CACHED_STACKS; i++) {
>  		struct vm_struct *vm_stack = cached_vm_stacks[i];
>
> Thanks,
> Xiongfeng
>
> > Could we have a helper from hrtimer to get the cpu of the clock_base?
> >
> >
> >>>
> >>>   	cfs_b->period_active = 1;
> >>> -
> >>>   	hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
> >>>   	hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
> >>>   }
> >>>
> >>> Thanks,
> >>> Xiongfeng
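
To illustrate the helper Vincent asks about above: a minimal sketch, with an
invented name (hrtimer_clock_base_cpu() is not an existing hrtimer API), that
simply wraps the same pointer chase the temporary fix open-codes:

/*
 * Hypothetical helper (name made up for illustration): return the CPU
 * whose hrtimer_cpu_base this timer is currently attached to. Same
 * timer->base->cpu_base->cpu chain as in the start_cfs_bandwidth() diff
 * above; untested sketch only.
 */
static inline int hrtimer_clock_base_cpu(const struct hrtimer *timer)
{
	return timer->base->cpu_base->cpu;
}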
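
With such a helper, the temporary fix would reduce to something like the
sketch below, assembled from the two quoted diff hunks; this is not a merged
patch, just an untested illustration of the resulting function:

void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
{
	lockdep_assert_held(&cfs_b->lock);

	if (cfs_b->period_active) {
		int cpu = hrtimer_clock_base_cpu(&cfs_b->period_timer);

		/*
		 * The period timer is still queued, but possibly on a CPU
		 * that is going down; re-arm it so it is requeued on an
		 * active CPU instead of being lost with the dying one.
		 */
		if (!cpu_active(cpu) && cpu != smp_processor_id())
			hrtimer_start_expires(&cfs_b->period_timer,
					      HRTIMER_MODE_ABS_PINNED);
		return;
	}

	cfs_b->period_active = 1;
	hrtimer_forward_now(&cfs_b->period_timer, cfs_b->period);
	hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
}

Whether such a helper belongs in the hrtimer core at all, or the hotplug race
should instead be fixed on the hrtimer side, is exactly the question left
open in this thread.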