From: Vincent Guittot
Date: Thu, 27 Dec 2018 18:01:46 +0100
Subject: Re: [PATCH] sched: fix infinity loop in update_blocked_averages
To: Sargun Dhillon
Cc: Xie XiuQi, Ingo Molnar, Peter Zijlstra, xiezhipeng1@huawei.com, huawei.libin@huawei.com, linux-kernel, Dmitry Adamushko, Tejun Heo

On Thu, 27 Dec 2018 at 17:40, Sargun Dhillon wrote:
>
> On Thu, Dec 27, 2018 at 5:23 AM Vincent Guittot wrote:
> >
> > Adding Sargun and Dmitry, who faced a similar problem.
> > Adding Tejun.
> >
> > On Thu, 27 Dec 2018 at 11:21, Vincent Guittot wrote:
> > >
> > > On Thursday 27 Dec 2018 at 10:21:53 (+0100), Vincent Guittot wrote:
> > > > Hi Xie,
> > > >
> > > > On Thu, 27 Dec 2018 at 03:57, Xie XiuQi wrote:
> > > > >
> > > > > Zhipeng Xie reported a bug: there is an infinite loop in
> > > > > update_blocked_averages().
> > > > >
> > > > > PID: 14233  TASK: ffff800b2de08fc0  CPU: 1  COMMAND: "docker"
> > > > >  #0 [ffff00002213b9d0] update_blocked_averages at ffff00000811e4a8
> > > > >  #1 [ffff00002213ba60] pick_next_task_fair at ffff00000812a3b4
> > > > >  #2 [ffff00002213baf0] __schedule at ffff000008deaa88
> > > > >  #3 [ffff00002213bb70] schedule at ffff000008deb1b8
> > > > >  #4 [ffff00002213bb80] futex_wait_queue_me at ffff000008180754
> > > > >  #5 [ffff00002213bbd0] futex_wait at ffff00000818192c
> > > > >  #6 [ffff00002213bd00] do_futex at ffff000008183ee4
> > > > >  #7 [ffff00002213bde0] __arm64_sys_futex at ffff000008184398
> > > > >  #8 [ffff00002213be60] el0_svc_common at ffff0000080979ac
> > > > >  #9 [ffff00002213bea0] el0_svc_handler at ffff000008097a6c
> > > > > #10 [ffff00002213bff0] el0_svc at ffff000008084044
> > > > >
> > > > > rq->tmp_alone_branch, introduced in 4.10, is used to point to
> > > > > the new beginning of the list. If this cfs_rq is deleted
> > > > > somewhere else, tmp_alone_branch becomes invalid and causes
> > > > > a list_add corruption.
> > > >
> > > > Shouldn't the whole sequence be protected by rq_lock?
> > > >
> > > > >
> > > > > (With DEBUG_LIST enabled, we found this list_add corruption.)
> > > > >
> > > > > [ 2546.741103] list_add corruption. next->prev should be prev
> > > > > (ffff800b4d61ad40), but was ffff800ba434fa38. (next=ffff800b6a95e740).
> > > > > [ 2546.741130] ------------[ cut here ]------------
> > > > > [ 2546.741132] kernel BUG at lib/list_debug.c:25!
> > > > > [ 2546.741136] Internal error: Oops - BUG: 0 [#1] SMP
> > > > > [ 2546.742870] CPU: 1 PID: 29428 Comm: docker-runc Kdump: loaded Tainted: G E 4.19.5-1.aarch64 #1
> > > > > [ 2546.745415] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> > > > > [ 2546.747402] pstate: 40000085 (nZcv daIf -PAN -UAO)
> > > > > [ 2546.749015] pc : __list_add_valid+0x50/0x90
> > > > > [ 2546.750485] lr : __list_add_valid+0x50/0x90
> > > > > [ 2546.751975] sp : ffff00001b5eb910
> > > > > [ 2546.753286] x29: ffff00001b5eb910 x28: ffff800abacf0000
> > > > > [ 2546.754976] x27: ffff00001b5ebbb0 x26: ffff000009570000
> > > > > [ 2546.756665] x25: ffff00000960d000 x24: 00000250f41ca8f8
> > > > > [ 2546.758366] x23: ffff800b6a95e740 x22: ffff800b4d61ad40
> > > > > [ 2546.760066] x21: ffff800b4d61ad40 x20: ffff800ba434f080
> > > > > [ 2546.761742] x19: ffff800b4d61ac00 x18: ffffffffffffffff
> > > > > [ 2546.763425] x17: 0000000000000000 x16: 0000000000000000
> > > > > [ 2546.765089] x15: ffff000009570748 x14: 6666662073617720
> > > > > [ 2546.766755] x13: 747562202c293034 x12: 6461313664346230
> > > > > [ 2546.768429] x11: 3038666666662820 x10: 0000000000000000
> > > > > [ 2546.770124] x9 : 0000000000000001 x8 : ffff000009f34a0f
> > > > > [ 2546.771831] x7 : 0000000000000000 x6 : 000000000000250d
> > > > > [ 2546.773525] x5 : 0000000000000000 x4 : 0000000000000000
> > > > > [ 2546.775227] x3 : 0000000000000000 x2 : 70ef7f624013ca00
> > > > > [ 2546.776929] x1 : 0000000000000000 x0 : 0000000000000075
> > > > > [ 2546.778623] Process docker-runc (pid: 29428, stack limit = 0x00000000293494a2)
> > > > > [ 2546.780742] Call trace:
> > > > > [ 2546.781955]  __list_add_valid+0x50/0x90
> > > > > [ 2546.783469]  enqueue_entity+0x4a0/0x6e8
> > > > > [ 2546.784957]  enqueue_task_fair+0xac/0x610
> > > > > [ 2546.786502]  sched_move_task+0x134/0x178
> > > > > [ 2546.787993]  cpu_cgroup_attach+0x40/0x78
> > > > > [ 2546.789540]  cgroup_migrate_execute+0x378/0x3a8
> > > > > [ 2546.791169]  cgroup_migrate+0x6c/0x90
> > > > > [ 2546.792663]  cgroup_attach_task+0x148/0x238
> > > > > [ 2546.794211]  __cgroup1_procs_write.isra.2+0xf8/0x160
> > > > > [ 2546.795935]  cgroup1_procs_write+0x38/0x48
> > > > > [ 2546.797492]  cgroup_file_write+0xa0/0x170
> > > > > [ 2546.799010]  kernfs_fop_write+0x114/0x1e0
> > > > > [ 2546.800558]  __vfs_write+0x60/0x190
> > > > > [ 2546.801977]  vfs_write+0xac/0x1c0
> > > > > [ 2546.803341]  ksys_write+0x6c/0xd8
> > > > > [ 2546.804674]  __arm64_sys_write+0x24/0x30
> > > > > [ 2546.806146]  el0_svc_common+0x78/0x100
> > > > > [ 2546.807584]  el0_svc_handler+0x38/0x88
> > > > > [ 2546.809017]  el0_svc+0x8/0xc
> > > > >
> > > >
> > > > Have you got more details about the sequence that generates this bug?
> > > > Is it easily reproducible?
> > > >
> > > > > In this patch, we move rq->tmp_alone_branch to point to its prev
> > > > > before deleting it from the list.
> > > > >
> > > > > Reported-by: Zhipeng Xie
> > > > > Cc: Bin Li
> > > > > Cc: [4.10+]
> > > > > Fixes: 9c2791f936ef (sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list)
> > > >
> > > > If it only happens in update_blocked_averages(), the del leaf has been added by:
> > > > a9e7f6544b9c (sched/fair: Fix O(nr_cgroups) in load balance path)
> > > >
> > > > > Signed-off-by: Xie XiuQi
> > > > > Tested-by: Zhipeng Xie
> > > > > ---
> > > > >  kernel/sched/fair.c | 5 +++++
> > > > >  1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index ac855b2..7a72702 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> > > > >  static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
> > > > >  {
> > > > >  	if (cfs_rq->on_list) {
> > > > > +		struct rq *rq = rq_of(cfs_rq);
> > > > > +
> > > > > +		if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
> > > > > +			rq->tmp_alone_branch = cfs_rq->leaf_cfs_rq_list.prev;
> > > > > +
> > >
> > > I'm afraid that your patch will break the ordering of leaf_cfs_rq_list.
> > >
> > > Can you try the patch below:
> > >
> > > ---
> > >  kernel/sched/fair.c | 7 -------
> > >  1 file changed, 7 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index ca46964..4d51b2d 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7694,13 +7694,6 @@ static void update_blocked_averages(int cpu)
> > >  		if (se && !skip_blocked_update(se))
> > >  			update_load_avg(cfs_rq_of(se), se, 0);
> > >
> > > -		/*
> > > -		 * There can be a lot of idle CPU cgroups.  Don't let fully
> > > -		 * decayed cfs_rqs linger on the list.
> > > -		 */
> > > -		if (cfs_rq_is_decayed(cfs_rq))
> > > -			list_del_leaf_cfs_rq(cfs_rq);
> > > -
> > >  		/* Don't need periodic decay once load/util_avg are null */
> > >  		if (cfs_rq_has_blocked(cfs_rq))
> > >  			done = false;
> > > --
> > > 2.7.4
> > >
> >
>
> Tested-by: Sargun Dhillon

Thanks

> This patch fixes things for me. I imagine this code that's being
> removed has a purpose?

In the original behavior, the cfs_rq was removed from the list only when
the cgroup was removed. Patch a9e7f6544b9c (sched/fair: Fix O(nr_cgroups)
in load balance path) added an optimization that removes the cfs_rq when
there is no more blocked load to update, in order to shorten the loop, but
it introduced a race condition that creates this infinite loop. The patch
above fixes the problem by removing the optimization.
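To make the failure mode concrete, here is a minimal, self-contained C
sketch of the corruption pattern described above. It is an illustration
only, not the kernel code: the helper names are made up for the example,
and the cached insertion pointer merely plays the role that
rq->tmp_alone_branch plays in kernel/sched/fair.c.

```c
#include <stdio.h>

struct node {
	struct node *prev, *next;
};

/* Insert n just before pos, with a DEBUG_LIST-style sanity check. */
static void list_insert_before(struct node *n, struct node *pos)
{
	if (pos->prev->next != pos)
		printf("list_add corruption. next->prev should be prev (%p), but was %p.\n",
		       (void *)pos->prev, (void *)pos->prev->next);
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
}

/* Unlink n; its own prev/next are left dangling, as after a deletion. */
static void list_remove(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

int main(void)
{
	struct node head = { &head, &head };
	struct node a, b, c;

	list_insert_before(&a, &head);	/* list: a */
	list_insert_before(&b, &head);	/* list: a, b */

	/* Cached insertion point -- the analogue of rq->tmp_alone_branch. */
	struct node *cached_pos = &b;

	/* Another path unlinks b; cached_pos now points into a dead node. */
	list_remove(&b);

	/* The next insertion through the stale pointer trips the check. */
	list_insert_before(&c, cached_pos);

	return 0;
}
```

Running this prints a message of the same shape as the DEBUG_LIST splat
quoted earlier in the thread: the node being inserted before is no longer
reachable from its supposed neighbours, so next->prev no longer matches
prev.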
I will look at re-adding the optimization once I have a fix for the race
condition.

> > > > >  		list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
> > > > >  		cfs_rq->on_list = 0;
> > > > >  	}
> > > > > --
> > > > > 1.8.3.1
> > > > >