From: Joel Fernandes
Date: Tue, 5 Apr 2022 15:25:36 -0400
Subject: Re: [PATCH] sched/core: Fix forceidle balancing
To: Peter Zijlstra
Cc: Ingo Molnar, Steven Rostedt, Vincent Guittot, LKML, Thomas Gleixner,
    Sebastian Andrzej Siewior, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Guenter Roeck
References: <20220330160535.GN8939@worktop.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 1, 2022 at 7:46 AM Peter Zijlstra wrote:
>
> On Thu, Mar 31, 2022 at 03:00:40PM -0400, Joel Fernandes wrote:
> > Hi,
> >
> > By the way, might be slightly related - we still see crashes with
> > pick_task_fair() in our kernel even with this change:
> > https://lkml.org/lkml/2020/11/17/2137
>
> Please as to not use lkml.org. Please use something with a MsgID in like
> lore.

Yep, will do.

> > Is it possible that when doing pick_task_fair() especially on a remote
> > CPU, both the "cfs_rq->curr" and the rbtree's "left" be NULL with core
> > scheduling? In this case, se will be NULL and can cause crashes right?
> > I think the code assumes this can never happen.
> >
> > +Guenter Roeck kindly debugged pick_task_fair() in a crash as
> > follows. Copying some details he mentioned in a bug report:
> >
> > Assembler/source:
> >
> >   25:   e8 4f 11 00 00          call   0x1179            ; se = pick_next_entity(cfs_rq, curr);
> >   2a:*  48 8b 98 60 01 00 00    mov    0x160(%rax),%rbx  ; trapping instruction [cfs_rq = group_cfs_rq(se);]
> >   31:   48 85 db                test   %rbx,%rbx
> >   34:   75 d1                   jne    0x7
> >   36:   48 89 c7                mov    %rax,%rdi
> >
> > At 2a: RAX = se == NULL after pick_next_entity(). Looking closely into
> > pick_next_entity(), it can indeed return NULL if curr is NULL and if
> > left in pick_next_entity() is NULL. Per line 7:, curr is in %r14 and
> > indeed 0.
> >
> > Thoughts?
>
> It is possible for ->curr and ->leftmost to be NULL, but then we should
> also be having ->nr_running == 0 and not call pick in the first place.
> Because picking a task from no tasks doesn't make much sense.

Indeed, the code checks for nr_running, so it is really bizarre. My guess
is that this is kernel memory corruption due to an unrelated bug or
something; it is also not easy to trigger.

Thanks,

 - Joel
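
For readers following the thread, the failure mode above can be illustrated with a small userspace model in C. This is only a sketch under stated assumptions, not the kernel's code: the struct layouts, the `my_q` field, and the names `pick_next_entity_model()`, `pick_task_model()` and `model_selfcheck()` are hypothetical, and the real pick_next_entity() also compares vruntime, which is left out here. The model shows that an nr_running check keeps the pick away from an empty runqueue, and that a state where nr_running is non-zero while both curr and leftmost are NULL is exactly where se would be dereferenced as NULL. The explicit NULL check in the loop is an addition for the model so that it reports the inconsistency instead of faulting.

```c
#include <assert.h>
#include <stddef.h>

struct cfs_rq;

/* Simplified stand-in for a scheduling entity (hypothetical layout). */
struct sched_entity {
    struct cfs_rq *my_q;   /* non-NULL when the entity is a group with its own queue */
    int id;
};

/* Simplified stand-in for a CFS runqueue (hypothetical layout). */
struct cfs_rq {
    unsigned int nr_running;
    struct sched_entity *curr;
    struct sched_entity *leftmost;
};

/* Model of the pick: prefer leftmost, else curr; NULL when both are NULL. */
static struct sched_entity *pick_next_entity_model(struct cfs_rq *cfs_rq)
{
    return cfs_rq->leftmost ? cfs_rq->leftmost : cfs_rq->curr;
}

/* Model of the walk down the group hierarchy described in the thread. */
static struct sched_entity *pick_task_model(struct cfs_rq *cfs_rq)
{
    struct sched_entity *se;

    /* Nothing runnable: do not pick at all. */
    if (!cfs_rq->nr_running)
        return NULL;

    do {
        se = pick_next_entity_model(cfs_rq);
        /* Model-only guard: nr_running != 0 but nothing is queued. */
        if (!se)
            return NULL;
        /* Corresponds to the load that trapped when se was NULL. */
        cfs_rq = se->my_q;
    } while (cfs_rq);

    return se;
}

/* Exercises the three states discussed: empty, consistent, inconsistent. */
static void model_selfcheck(void)
{
    struct sched_entity task = { .my_q = NULL, .id = 1 };
    struct cfs_rq empty = { .nr_running = 0, .curr = NULL, .leftmost = NULL };
    struct cfs_rq normal = { .nr_running = 1, .curr = NULL, .leftmost = &task };
    struct cfs_rq corrupt = { .nr_running = 1, .curr = NULL, .leftmost = NULL };

    assert(pick_task_model(&empty) == NULL);
    assert(pick_task_model(&normal) == &task);
    assert(pick_task_model(&corrupt) == NULL);
}
```

Calling model_selfcheck() from a main() runs the three cases. The "corrupt" case is the one matching the crash report: without the model-only guard, the `se->my_q` load would dereference NULL, just like the mov at offset 2a.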