From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Oleg Nesterov, Tejun Heo, Sasha Levin, cgroups@vger.kernel.org
Subject: [PATCH AUTOSEL 5.0 203/262] cgroup/pids: turn cgroup_subsys->free() into cgroup_subsys->release() to fix the accounting
Date: Wed, 27 Mar 2019 14:00:58 -0400
Message-Id: 
 <20190327180158.10245-203-sashal@kernel.org>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20190327180158.10245-1-sashal@kernel.org>
References: <20190327180158.10245-1-sashal@kernel.org>
MIME-Version: 1.0
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Oleg Nesterov

[ Upstream commit 51bee5abeab2058ea5813c5615d6197a23dbf041 ]

The only user of cgroup_subsys->free() callback is pids_cgrp_subsys which
needs pids_free() to uncharge the pid.

However, ->free() is called from __put_task_struct()->cgroup_free() and this
is too late. Even the trivial program which does

	for (;;) {
		int pid = fork();
		assert(pid >= 0);
		if (pid)
			wait(NULL);
		else
			exit(0);
	}

can run out of limits because release_task()->call_rcu(delayed_put_task_struct)
implies an RCU gp after the task/pid goes away and before the final put().

Test-case:

	mkdir -p /tmp/CG
	mount -t cgroup2 none /tmp/CG
	echo '+pids' > /tmp/CG/cgroup.subtree_control

	mkdir /tmp/CG/PID
	echo 2 > /tmp/CG/PID/pids.max

	perl -e 'while ($p = fork) { wait; } $p // die "fork failed: $!\n"' &
	echo $! > /tmp/CG/PID/cgroup.procs

Without this patch the forking process fails soon after migration.

Rename cgroup_subsys->free() to cgroup_subsys->release() and move the callsite
into the new helper, cgroup_release(), called by release_task() which actually
frees the pid(s).

Reported-by: Herton R. Krzesinski
Reported-by: Jan Stancek
Signed-off-by: Oleg Nesterov
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
---
 include/linux/cgroup-defs.h |  2 +-
 include/linux/cgroup.h      |  2 ++
 kernel/cgroup/cgroup.c      | 15 +++++++++------
 kernel/cgroup/pids.c        |  4 ++--
 kernel/exit.c               |  1 +
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8fcbae1b8db0..120d1d40704b 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -602,7 +602,7 @@ struct cgroup_subsys {
 	void (*cancel_fork)(struct task_struct *task);
 	void (*fork)(struct task_struct *task);
 	void (*exit)(struct task_struct *task);
-	void (*free)(struct task_struct *task);
+	void (*release)(struct task_struct *task);
 	void (*bind)(struct cgroup_subsys_state *root_css);
 
 	bool early_init:1;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9968332cceed..81f58b4a5418 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -121,6 +121,7 @@ extern int cgroup_can_fork(struct task_struct *p);
 extern void cgroup_cancel_fork(struct task_struct *p);
 extern void cgroup_post_fork(struct task_struct *p);
 void cgroup_exit(struct task_struct *p);
+void cgroup_release(struct task_struct *p);
 void cgroup_free(struct task_struct *p);
 
 int cgroup_init_early(void);
@@ -697,6 +698,7 @@ static inline int cgroup_can_fork(struct task_struct *p) { return 0; }
 static inline void cgroup_cancel_fork(struct task_struct *p) {}
 static inline void cgroup_post_fork(struct task_struct *p) {}
 static inline void cgroup_exit(struct task_struct *p) {}
+static inline void cgroup_release(struct task_struct *p) {}
 static inline void cgroup_free(struct task_struct *p) {}
 
 static inline int cgroup_init_early(void) { return 0; }
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 503bba3c4bae..f84bf28f36ba 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -197,7 +197,7 @@ static u64 css_serial_nr_next = 1;
  */
 static u16 have_fork_callback __read_mostly;
 static u16 have_exit_callback __read_mostly;
-static u16 have_free_callback __read_mostly;
+static u16 have_release_callback __read_mostly;
 static u16 have_canfork_callback __read_mostly;
 
 /* cgroup namespace for init task */
@@ -5316,7 +5316,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 
 	have_fork_callback |= (bool)ss->fork << ss->id;
 	have_exit_callback |= (bool)ss->exit << ss->id;
-	have_free_callback |= (bool)ss->free << ss->id;
+	have_release_callback |= (bool)ss->release << ss->id;
 	have_canfork_callback |= (bool)ss->can_fork << ss->id;
 
 	/* At system boot, before all subsystems have been
@@ -5752,16 +5752,19 @@ void cgroup_exit(struct task_struct *tsk)
 	} while_each_subsys_mask();
 }
 
-void cgroup_free(struct task_struct *task)
+void cgroup_release(struct task_struct *task)
 {
-	struct css_set *cset = task_css_set(task);
 	struct cgroup_subsys *ss;
 	int ssid;
 
-	do_each_subsys_mask(ss, ssid, have_free_callback) {
-		ss->free(task);
+	do_each_subsys_mask(ss, ssid, have_release_callback) {
+		ss->release(task);
 	} while_each_subsys_mask();
+}
 
+void cgroup_free(struct task_struct *task)
+{
+	struct css_set *cset = task_css_set(task);
 	put_css_set(cset);
 }
 
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 9829c67ebc0a..c9960baaa14f 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -247,7 +247,7 @@ static void pids_cancel_fork(struct task_struct *task)
 	pids_uncharge(pids, 1);
 }
 
-static void pids_free(struct task_struct *task)
+static void pids_release(struct task_struct *task)
 {
 	struct pids_cgroup *pids = css_pids(task_css(task, pids_cgrp_id));
 
@@ -342,7 +342,7 @@ struct cgroup_subsys pids_cgrp_subsys = {
 	.cancel_attach	= pids_cancel_attach,
 	.can_fork	= pids_can_fork,
 	.cancel_fork	= pids_cancel_fork,
-	.free		= pids_free,
+	.release	= pids_release,
 	.legacy_cftypes	= pids_files,
 	.dfl_cftypes	= pids_files,
 	.threaded	= true,
diff --git a/kernel/exit.c b/kernel/exit.c
index 2639a30a8aa5..2166c2d92ddc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -219,6 +219,7 @@ void release_task(struct task_struct *p)
 	}
 
 	write_unlock_irq(&tasklist_lock);
+	cgroup_release(p);
 	release_thread(p);
 	call_rcu(&p->rcu, delayed_put_task_struct);
 
-- 
2.19.1
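
For reference, the fork()/wait() loop quoted in the commit message can be
fleshed out into a small user-space helper. The sketch below is not part of
the patch: fork_wait_loop() and its iteration cap are hypothetical, chosen
only for illustration, and the loop is bounded so that it terminates instead
of running forever. It assumes a POSIX system and reports the errno from a
failed fork(), which for a pids.max limit should be EAGAIN.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): fork a child that exits at
 * once, reap it, and repeat up to `iterations` times, mirroring the loop
 * in the commit message. Returns the number of completed fork/wait
 * cycles and stops early if fork() or waitpid() fails.
 */
int fork_wait_loop(int iterations)
{
	int done = 0;

	for (int i = 0; i < iterations; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			/* Inside a pids-limited cgroup this is where the
			 * accounting problem would show up. */
			fprintf(stderr, "fork failed after %d cycles: %s\n",
				done, strerror(errno));
			break;
		}
		if (pid == 0)
			_exit(0);	/* child: exit immediately */

		/* parent: reap the child before forking again */
		if (waitpid(pid, NULL, 0) < 0) {
			perror("waitpid");
			break;
		}
		done++;
	}
	return done;
}
```

A caller could invoke fork_wait_loop() with a large count from a process that
has been moved into /tmp/CG/PID as in the test case, then compare the return
value with the requested count. Going by the commit message, a short count
would be expected on a kernel without this patch, because the pid is only
uncharged after the RCU grace period.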