Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752056AbdGEP3C (ORCPT ); Wed, 5 Jul 2017 11:29:02 -0400 Received: from mail-qk0-f195.google.com ([209.85.220.195]:33818 "EHLO mail-qk0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750908AbdGEP3B (ORCPT ); Wed, 5 Jul 2017 11:29:01 -0400 Date: Wed, 5 Jul 2017 11:28:55 -0400 From: Tejun Heo To: Abdul Haleem Cc: sachinp , Stephen Rothwell , ego , linux-kernel , Li Zefan , linuxppc-dev , Ingo Molnar , mpe Subject: Re: [next-20170609] Oops while running CPU off-on (cpuset.c/cpuset_can_attach) Message-ID: <20170705152855.GD19330@htj.duckdns.org> References: <1497266622.15415.39.camel@abdul.in.ibm.com> <20170627153608.GD2289@htj.duckdns.org> <1499092582.10651.15.camel@abdul.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1499092582.10651.15.camel@abdul.in.ibm.com> User-Agent: Mutt/1.8.3 (2017-05-23) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3015 Lines: 109 Hello, Abdul. Thanks for the debug info. Can you please see whether the following patch fixes the issue? If the problem is too difficult to reproduce to confirm the fix by seeing whether it no longer triggers, please let me know. We can instead apply a patch which triggers WARN on the failing condition to confirm the diagnosis. Thanks. diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index 793565c05742..8b4c3c2f2509 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -33,6 +33,9 @@ struct cgroup_taskset { struct list_head src_csets; struct list_head dst_csets; + /* the number of tasks in the set */ + int nr_tasks; + /* the subsys currently being processed */ int ssid; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index dbfd7028b1c6..e3c4152741a3 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1954,6 +1954,8 @@ static void cgroup_migrate_add_task(struct task_struct *task, if (!cset->mg_src_cgrp) return; + mgctx->tset.nr_tasks++; + list_move_tail(&task->cg_list, &cset->mg_tasks); if (list_empty(&cset->mg_node)) list_add_tail(&cset->mg_node, @@ -2047,16 +2049,18 @@ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx) return 0; /* check that we can legitimately attach to the cgroup */ - do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { - if (ss->can_attach) { - tset->ssid = ssid; - ret = ss->can_attach(tset); - if (ret) { - failed_ssid = ssid; - goto out_cancel_attach; + if (tset->nr_tasks) { + do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { + if (ss->can_attach) { + tset->ssid = ssid; + ret = ss->can_attach(tset); + if (ret) { + failed_ssid = ssid; + goto out_cancel_attach; + } } - } - } while_each_subsys_mask(); + } while_each_subsys_mask(); + } /* * Now that we're guaranteed success, proceed to move all tasks to @@ -2085,25 +2089,29 @@ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx) */ tset->csets = &tset->dst_csets; - do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { - if (ss->attach) { - tset->ssid = ssid; - ss->attach(tset); - } - } while_each_subsys_mask(); + if (tset->nr_tasks) { + do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { + if (ss->attach) { + tset->ssid = ssid; + ss->attach(tset); + } + } while_each_subsys_mask(); + } ret = 0; goto out_release_tset; out_cancel_attach: - do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { - if (ssid == failed_ssid) - break; - if (ss->cancel_attach) { - tset->ssid = ssid; - ss->cancel_attach(tset); - } - } while_each_subsys_mask(); + if (tset->nr_tasks) { + do_each_subsys_mask(ss, ssid, mgctx->ss_mask) { + if (ssid == failed_ssid) + break; + if (ss->cancel_attach) { + tset->ssid = ssid; + ss->cancel_attach(tset); + } + } while_each_subsys_mask(); + } out_release_tset: spin_lock_irq(&css_set_lock); list_splice_init(&tset->dst_csets, &tset->src_csets);