Date: Fri, 6 Feb 2015 20:44:05 +0100
From: Oleg Nesterov
To: Konstantin Khlebnikov
Cc: linux-api@vger.kernel.org, Andrew Morton, Linus Torvalds,
	linux-kernel@vger.kernel.org, Roman Gushchin, Nikita Vetoshkin,
	Pavel Emelyanov
Subject: Re: [PATCH 1/2] kernel/fork: handle put_user errors for CLONE_CHILD_SETTID/CLEARTID
Message-ID: <20150206194405.GA13960@redhat.com>
References: <20150206162301.18031.32251.stgit@buzz>
In-Reply-To: <20150206162301.18031.32251.stgit@buzz>

On 02/06, Konstantin Khlebnikov wrote:
>
> Whole sequence looks like: task calls fork, glibc calls syscall clone with
> CLONE_CHILD_SETTID and passes pointer to TLS THREAD_SELF->tid as argument.
> Child task gets read-only copy of VM including TLS. Child calls put_user()
> to handle CLONE_CHILD_SETTID from schedule_tail(). put_user() trigger page
> fault and it fails because do_wp_page() hits memcg limit without invoking
> OOM-killer because this is page-fault from kernel-space.

Because of !FAULT_FLAG_USER?

Perhaps we should fix this? Say mem_cgroup_oom_enable/disable around put_user(),
I dunno.

> Put_user returns
> -EFAULT, which is ignored. Child returns into user-space and catches here
> assert (THREAD_GETMEM (self, tid) != ppid),

If only I understood why else we need CLONE_CHILD_SETTID ;)

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2312,8 +2312,20 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
>  	post_schedule(rq);
>  	preempt_enable();
>
> -	if (current->set_child_tid)
> -		put_user(task_pid_vnr(current), current->set_child_tid);
> +	if (current->set_child_tid &&
> +	    unlikely(put_user(task_pid_vnr(current), current->set_child_tid))) {
> +		int dummy;
> +
> +		/*
> +		 * If this address is unreadable then userspace has not set
> +		 * proper pointer. Application either doesn't care or will
> +		 * notice this soon. If this address is readable then task
> +		 * will be mislead about its own tid. It's better to die.
> +		 */
> +		if (!get_user(dummy, current->set_child_tid) &&
> +		    !fatal_signal_pending(current))
> +			force_sig(SIGSEGV, current);
> +	}

Well, get_user() can fail the same way? The page we need to cow can be
swapped out.

At first glance, to me this problem should be solved somewhere else...
I'll try to reread this all tomorrow.

Oleg.
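
To make the get_user() objection concrete, here is a rough userspace model of the decision the quoted hunk makes. It is only a sketch, not kernel code: settid_action() and the tid_action enum are hypothetical names invented for illustration, and the put_user()/get_user() results are passed in as plain error codes rather than produced by real user-memory accesses.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the patched schedule_tail() logic.
 * Nothing here exists in the kernel; it only mirrors the branch structure
 * of the quoted hunk so the outcomes can be enumerated.
 */
enum tid_action {
	TID_OK,		/* put_user() succeeded, tid was written */
	TID_IGNORE,	/* write failed and the task is left alone */
	TID_KILL,	/* write failed, read worked: force SIGSEGV */
};

static enum tid_action settid_action(int put_err, int get_err,
				     bool fatal_pending)
{
	if (!put_err)
		return TID_OK;

	/* Same condition as the patch: readable address, no fatal signal. */
	if (!get_err && !fatal_pending)
		return TID_KILL;

	return TID_IGNORE;
}
```

With put_err = -EFAULT and get_err = 0 the model returns TID_KILL, which is the case the patch targets. With both set to -EFAULT (for instance if the read fails for the same reason the write did) it returns TID_IGNORE, so the child would still go back to userspace with a stale tid.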