Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756503AbdLVQL3 (ORCPT ); Fri, 22 Dec 2017 11:11:29 -0500 Received: from mail-wr0-f195.google.com ([209.85.128.195]:34830 "EHLO mail-wr0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752594AbdLVQL1 (ORCPT ); Fri, 22 Dec 2017 11:11:27 -0500 X-Google-Smtp-Source: ACJfBotiDM9IhGu2nYnniEsnMw2WbSE7Pyddk3xStLjtJovR7JG8p48lac+//v88/XrizO3lWqGQGw== Date: Fri, 22 Dec 2017 19:11:23 +0300 From: Alexey Dobriyan To: "Eric W. Biederman" Cc: Dave Jones , Linus Torvalds , Al Viro , Linux Kernel , syzkaller-bugs@googlegroups.com, Gargi Sharma , Oleg Nesterov , Rik van Riel , Andrew Morton Subject: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush_task oops) Message-ID: <20171222161123.GA2632@avx2> References: <871sjp1cjz.fsf@xmission.com> <20171221031606.GA4636@codemonkey.org.uk> <87po78trjm.fsf@xmission.com> <20171221220044.GA4977@codemonkey.org.uk> <87wp1fk0pd.fsf@xmission.com> <20171222033500.GA17273@codemonkey.org.uk> <87y3lvi480.fsf@xmission.com> <87lghuesel.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87lghuesel.fsf@xmission.com> User-Agent: Mutt/1.7.2 (2016-11-26) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2787 Lines: 102 On Fri, Dec 22, 2017 at 08:41:54AM -0600, Eric W. Biederman wrote: > Alexey Dobriyan writes: > > unshare > > fork > > alloc_pid in level 1 succeeds > > alloc_pid in level 0 fails, ->idr_next is 2 > > fork > > alloc pid 2 > > exit > > > > Reliable reproducer and fail injection patch attached > > > > I'd say proper fix is allocating pids in the opposite order > > so that failure in the last layer doesn't move IDR cursor > > in baby child pidns. > > I agree with you about changing the order. That will make > the code simpler and in the second loop actually conforming C code, > and fix the immediate problem. Something like that (barely tested) --- include/linux/pid_namespace.h | 2 ++ kernel/pid.c | 15 +++++++++++---- kernel/pid_namespace.c | 3 --- 3 files changed, 13 insertions(+), 7 deletions(-) --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -28,6 +28,8 @@ struct pid_namespace { unsigned int pid_allocated; struct task_struct *child_reaper; struct kmem_cache *pid_cachep; +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ +#define MAX_PID_NS_LEVEL 32 unsigned int level; struct pid_namespace *parent; #ifdef CONFIG_PROC_FS --- a/kernel/pid.c +++ b/kernel/pid.c @@ -146,6 +146,7 @@ void free_pid(struct pid *pid) struct pid *alloc_pid(struct pid_namespace *ns) { + struct pid_namespace *pid_ns[MAX_PID_NS_LEVEL + 1]; struct pid *pid; enum pid_type type; int i, nr; @@ -157,12 +158,19 @@ struct pid *alloc_pid(struct pid_namespace *ns) if (!pid) return ERR_PTR(retval); - tmp = ns; pid->level = ns->level; - for (i = ns->level; i >= 0; i--) { + tmp = ns; + do { + pid_ns[tmp->level] = tmp; + tmp = tmp->parent; + } while (tmp->level != 0); + + for (i = 0; i <= ns->level; i++) { int pid_min = 1; + tmp = pid_ns[i]; + idr_preload(GFP_KERNEL); spin_lock_irq(&pidmap_lock); @@ -189,7 +197,6 @@ struct pid *alloc_pid(struct pid_namespace *ns) pid->numbers[i].nr = nr; pid->numbers[i].ns = tmp; - tmp = tmp->parent; } if (unlikely(is_child_reaper(pid))) { @@ -223,7 +230,7 @@ struct pid *alloc_pid(struct pid_namespace *ns) out_free: spin_lock_irq(&pidmap_lock); - while (++i <= ns->level) + while (--i >= 0) idr_remove(&ns->idr, (pid->numbers + i)->nr); spin_unlock_irq(&pidmap_lock); --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -80,9 +80,6 @@ static void proc_cleanup_work(struct work_struct *work) pid_ns_release_proc(ns); } -/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */ -#define MAX_PID_NS_LEVEL 32 - static struct ucounts *inc_pid_namespaces(struct user_namespace *ns) { return inc_ucount(ns, current_euid(), UCOUNT_PID_NAMESPACES);