Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753758AbbKYQdt (ORCPT ); Wed, 25 Nov 2015 11:33:49 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33332 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753266AbbKYQdq (ORCPT ); Wed, 25 Nov 2015 11:33:46 -0500 Date: Wed, 25 Nov 2015 17:34:27 +0100 From: Oleg Nesterov To: Aleksa Sarai , Johannes Weiner , Li Zefan , Tejun Heo Cc: linux-kernel@vger.kernel.org Subject: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate() Message-ID: <20151125163427.GA8139@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3555 Lines: 115 Hello, I am trying to backport cgroup_pids.c and I fail to understand pids_fork() which does css = task_get_css(task, pids_cgrp_id); pids = css_pids(css); /* * If the association has changed, we have to revert and reapply the * charge/uncharge on the wrong hierarchy to the current one. Since * the association can only change due to an organisation event, its * okay for us to ignore the limit in this case. */ if (pids != old_pids) { pids_uncharge(old_pids, 1); pids_charge(pids, 1); } But if the association has changed, pids_can_attach() which moved the child into another cgrp has called pids_uncharge(old_pids) too? IOW. Suppose that the new child is moved right before cgroup_post_fork() does for_each_subsys_which(...) ss->fork(child); doesn't this mean that after ss->fork() we do the same sequence pids_uncharge(old_pids, 1); pids_charge(pids, 1); twice? Note that threadgroup_change_begin/end depends on CLONE_THREAD. So we can actually hit WARN_ON() in pids_cancel(). However, we can't simply remove this uncharge/charge afaics. We need this in case when the parent was moved to another cgroup before cgroup_post_fork(), and then css_set_move_task() moves the child. I know almost nothing about cgroups, perhaps I missed something, please correct me. If am right. How about the patch below? percpu_down_read() is cheap. And we can simplify cgroup_pids after this change. And. We can probably unify cgroup_threadgroup_rwsem and dup_mmap_sem. Note that if we take cgroup_threadgroup_rwsem for reading if CLONE_THREAD, otherwise we take another percpu-rwsem in dup_mmap(), dup_mmap_sem. Or I am totally confused? Oleg. --- x/kernel/fork.c +++ x/kernel/fork.c @@ -1368,8 +1368,7 @@ static struct task_struct *copy_process( p->real_start_time = ktime_get_boot_ns(); p->io_context = NULL; p->audit_context = NULL; - if (clone_flags & CLONE_THREAD) - threadgroup_change_begin(current); + threadgroup_change_begin(current); cgroup_fork(p); #ifdef CONFIG_NUMA p->mempolicy = mpol_dup(p->mempolicy); @@ -1610,8 +1609,7 @@ static struct task_struct *copy_process( proc_fork_connector(p); cgroup_post_fork(p, cgrp_ss_priv); - if (clone_flags & CLONE_THREAD) - threadgroup_change_end(current); + threadgroup_change_end(current); perf_event_fork(p); trace_task_newtask(p, clone_flags); --- x/kernel/cgroup_pids.c +++ x/kernel/cgroup_pids.c @@ -243,27 +243,10 @@ static void pids_cancel_fork(struct task static void pids_fork(struct task_struct *task, void *priv) { - struct cgroup_subsys_state *css; - struct cgroup_subsys_state *old_css = priv; - struct pids_cgroup *pids; - struct pids_cgroup *old_pids = css_pids(old_css); - - css = task_get_css(task, pids_cgrp_id); - pids = css_pids(css); - - /* - * If the association has changed, we have to revert and reapply the - * charge/uncharge on the wrong hierarchy to the current one. Since - * the association can only change due to an organisation event, its - * okay for us to ignore the limit in this case. - */ - if (pids != old_pids) { - pids_uncharge(old_pids, 1); - pids_charge(pids, 1); - } + struct cgroup_subsys_state *css = priv; + WARN_ON(task_css(task, pids_cgrp_id) != css); css_put(css); - css_put(old_css); } static void pids_free(struct task_struct *task) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/