From: Vincent Guittot
Date: Tue, 3 Mar 2020 08:55:40 +0100
Subject: Re: 5.6-rc3: WARNING: CPU: 48 PID: 17435 at kernel/sched/fair.c:380 enqueue_task_fair+0x328/0x440
To: Christian Borntraeger
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
References: <1a607a98-f12a-77bd-2062-c3e599614331@de.ibm.com> <20200228163545.GA18662@vingu-book> <49a2ebb7-c80b-9e2b-4482-7f9ff938417d@de.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 3 Mar 2020 at 08:37, Christian Borntraeger wrote:
>
> On 02.03.20 19:17, Christian Borntraeger wrote:
> > On 02.03.20 12:16, Christian Borntraeger wrote:
> >>
> >> On 28.02.20 17:35, Vincent Guittot wrote:
> >>> On Friday 28 Feb 2020 at 16:42:27 (+0100), Christian Borntraeger wrote:
> >>>>
> >>>> On 28.02.20 16:37, Vincent Guittot wrote:
> >>>>> On Fri, 28 Feb 2020 at 16:08, Christian Borntraeger
> >>>>> wrote:
> >>>>>>
> >>>>>> Also happened with 5.4:
> >>>>>> Seems that I just happen to have an interesting test workload/system size interaction
> >>>>>> on a newly installed system that triggers this.
> >>>>>
> >>>>> you will probably go back to 5.1 which is the version where we put
> >>>>> back the deletion of unused cfs_rq from the list which can trigger the
> >>>>> warning:
> >>>>> commit 039ae8bcf7a5 : (Fix O(nr_cgroups) in the load balancing path)
> >>>>>
> >>>>> AFAICT, we haven't changed this since
> >>>>
> >>>> So you do know what is the problem? If not is there any debug option or
> >>>> patch that I could apply to give you more information?
> >>>
> >>> No I don't know what is happening. Your test probably goes through an unexpected path
> >>>
> >>> Would it be difficult for me to reproduce your test env ?
> >>
> >> Not sure. Its a 32CPU (SMT2 -> 64) host. I have about 10 KVM guests running doing different
> >> things.
> >>
> >>>
> >>> There is an optimization in the code which could generate problem if assumption is not
> >>> true. Could you try the patch below ?
> >>>
> >>> ---
> >>>  kernel/sched/fair.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 3c8a379c357e..beb773c23e7d 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -4035,8 +4035,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >>>  		__enqueue_entity(cfs_rq, se);
> >>>  	se->on_rq = 1;
> >>>
> >>> +	list_add_leaf_cfs_rq(cfs_rq);
> >>>  	if (cfs_rq->nr_running == 1) {
> >>> -		list_add_leaf_cfs_rq(cfs_rq);
> >>>  		check_enqueue_throttle(cfs_rq);
> >>>  	}
> >>>  }
> >>
> >> Now running for 3 hours. I have not seen the issue yet. I can tell tomorrow if this fixes
> >> the issue.
> >
> > Still running fine. I can tell for sure tomorrow, but I have the impression that this makes the
> > WARN_ON go away.
>
> So I guess this change "fixed" the issue. If you want me to test additional patches, let me know.

Thanks for the test. For now, I don't have any other patch to test. I
have to look more deeply into how this situation happens.
I will let you know if I have another patch to test.