From: Valentin Schneider
To: Dexuan Cui
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, x86@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Michael Kelley
Subject: Re: v5.10: sched_cpu_dying() hits BUG_ON during hibernation: kernel BUG at kernel/sched/core.c:7596!
Date: Tue, 22 Dec 2020 13:39:53 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 22/12/20 09:13, Dexuan Cui wrote:
> Hi,
> I'm running a Linux VM with the recent mainline (48342fc07272, 12/20/2020) on Hyper-V.
> When I test hibernation, the VM can easily hit the below BUG_ON during the resume
> procedure (I estimate this can repro about 1/5 of the time). BTW, my VM has 40 vCPUs.
>
> I can't repro the BUG_ON with v5.9.0, so I suspect something in v5.10.0 may be broken?
>
> In v5.10.0, when the BUG_ON happens, rq->nr_running==2, and rq->nr_pinned==0:
>
> 7587 int sched_cpu_dying(unsigned int cpu)
> 7588 {
> 7589         struct rq *rq = cpu_rq(cpu);
> 7590         struct rq_flags rf;
> 7591
> 7592         /* Handle pending wakeups and then migrate everything off */
> 7593         sched_tick_stop(cpu);
> 7594
> 7595         rq_lock_irqsave(rq, &rf);
> 7596         BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq));
> 7597         rq_unlock_irqrestore(rq, &rf);
> 7598
> 7599         calc_load_migrate(rq);
> 7600         update_max_interval();
> 7601         nohz_balance_exit_idle(rq);
> 7602         hrtick_clear(rq);
> 7603         return 0;
> 7604 }
>
> The last commit that touches the BUG_ON line is the commit
> 3015ef4b98f5 ("sched/core: Make migrate disable and CPU hotplug cooperative")
> but the commit looks good to me.
>
> Any idea?
>

I'd wager this extra task is a kworker; could you give this series a try?

https://lore.kernel.org/r/20201218170919.2950-1-jiangshanlai@gmail.com

> Thanks,
> -- Dexuan