Date: Mon, 23 Feb 2015 21:17:01 +0100
From: Michal Hocko <mhocko@suse.cz>
To: linux-api@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>, Oleg Nesterov <oleg@redhat.com>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        Michael Kerrisk <mtk.manpages@gmail.com>,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fork: report pid reservation failure properly
Message-ID: <20150223201701.GA28639@dhcp22.suse.cz>
References: <20150204102719.GB29434@dhcp22.suse.cz>
 <1423046628-23638-1-git-send-email-mhocko@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1423046628-23638-1-git-send-email-mhocko@suse.cz>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5105
Lines: 150

ping on this one? Should I just resend (your way Andrew)? Or are there
any objections to the patch as is.

On Wed 04-02-15 11:43:48, Michal Hocko wrote:
> copy_process will report any failure in alloc_pid as ENOMEM currently
> which is misleading because the pid allocation might fail not only when
> the memory is short but also when the pid space is consumed already.
> 
> The current man page even mentions this case:
> "
> EAGAIN
> 
>       A system-imposed limit on the number of threads was encountered.
>       There are a number of limits that may trigger this error: the
>       RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which
>       limits the number of processes and threads for a real user ID, was
>       reached; the kernel's system-wide limit on the number of processes
>       and threads, /proc/sys/kernel/threads-max, was reached (see
>       proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max,
>       was reached (see proc(5)).
> "
> 
> so the current behavior is also incorrect wrt. documentation. POSIX man
> page also suggest returing EAGAIN when the process count limit is
> reached.
> 
> This patch simply propagates error code from alloc_pid and makes sure we
> return -EAGAIN due to reservation failure. This will make behavior of
> fork closer to both our documentation and POSIX.
> 
> alloc_pid might alsoo fail when the reaper in the pid namespace is dead
> (the namespace basically disallows all new processes) and there is no
> good error code which would match documented ones. We have traditionally
> returned ENOMEM for this case which is misleading as well but as per
> Eric W. Biederman this behavior is documented in man pid_namespaces(7)
> "
> If the "init" process of a PID namespace terminates, the kernel
> terminates all of the processes in the namespace via a SIGKILL signal.
> This behavior reflects the fact that the "init" process is essential for
> the correct operation of a PID namespace.  In this case, a subsequent
> fork(2) into this PID namespace will fail with the error ENOMEM; it is
> not possible to create a new processes in a PID namespace whose "init"
> process has terminated.
> "
> and introducing a new error code would be too risky so let's stick to
> ENOMEM for this case.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  kernel/fork.c |  5 +++--
>  kernel/pid.c  | 15 ++++++++-------
>  2 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3ce8d57cff09..c37b88a162c5 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1405,10 +1405,11 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  		goto bad_fork_cleanup_io;
>  
>  	if (pid != &init_struct_pid) {
> -		retval = -ENOMEM;
>  		pid = alloc_pid(p->nsproxy->pid_ns_for_children);
> -		if (!pid)
> +		if (IS_ERR(pid)) {
> +			retval = PTR_ERR(pid);
>  			goto bad_fork_cleanup_io;
> +		}
>  	}
>  
>  	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 82430c858d69..bbb805ccd893 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -179,7 +179,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
>  			spin_unlock_irq(&pidmap_lock);
>  			kfree(page);
>  			if (unlikely(!map->page))
> -				break;
> +				return -ENOMEM;
>  		}
>  		if (likely(atomic_read(&map->nr_free))) {
>  			for ( ; ; ) {
> @@ -207,7 +207,7 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
>  		}
>  		pid = mk_pid(pid_ns, map, offset);
>  	}
> -	return -1;
> +	return -EAGAIN;
>  }
>  
>  int next_pidmap(struct pid_namespace *pid_ns, unsigned int last)
> @@ -298,17 +298,20 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  	int i, nr;
>  	struct pid_namespace *tmp;
>  	struct upid *upid;
> +	int retval = -ENOMEM;
>  
>  	pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
>  	if (!pid)
> -		goto out;
> +		return ERR_PTR(retval);
>  
>  	tmp = ns;
>  	pid->level = ns->level;
>  	for (i = ns->level; i >= 0; i--) {
>  		nr = alloc_pidmap(tmp);
> -		if (nr < 0)
> +		if (IS_ERR_VALUE(nr)) {
> +			retval = nr;
>  			goto out_free;
> +		}
>  
>  		pid->numbers[i].nr = nr;
>  		pid->numbers[i].ns = tmp;
> @@ -336,7 +339,6 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  	}
>  	spin_unlock_irq(&pidmap_lock);
>  
> -out:
>  	return pid;
>  
>  out_unlock:
> @@ -348,8 +350,7 @@ out_free:
>  		free_pidmap(pid->numbers + i);
>  
>  	kmem_cache_free(ns->pid_cachep, pid);
> -	pid = NULL;
> -	goto out;
> +	return ERR_PTR(retval);
>  }
>  
>  void disable_pid_allocation(struct pid_namespace *ns)
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/