Date: Fri, 6 Feb 2015 23:27:59 +0300
From: Konstantin Khlebnikov
To: Oleg Nesterov
Cc: Konstantin Khlebnikov, Linux API, Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, Roman Gushchin, Nikita Vetoshkin, Pavel Emelyanov
Subject: Re: [PATCH 1/2] kernel/fork: handle put_user errors for CLONE_CHILD_SETTID/CLEARTID
In-Reply-To: <20150206195529.GA15517@redhat.com>
References: <20150206162301.18031.32251.stgit@buzz> <20150206194405.GA13960@redhat.com> <20150206195529.GA15517@redhat.com>

On Fri, Feb 6, 2015 at 10:55 PM, Oleg Nesterov wrote:
> On 02/06, Oleg Nesterov wrote:
>>
>> On 02/06, Konstantin Khlebnikov wrote:
>> >
>> > Whole sequence looks like: task calls fork, glibc calls syscall clone with
>> > CLONE_CHILD_SETTID and passes pointer to TLS THREAD_SELF->tid as argument.
>> > Child task gets read-only copy of VM including TLS. Child calls put_user()
>> > to handle CLONE_CHILD_SETTID from schedule_tail(). put_user() trigger page
>> > fault and it fails because do_wp_page() hits memcg limit without invoking
>> > OOM-killer because this is page-fault from kernel-space.
>>
>> Because of !FAULT_FLAG_USER?

Yep. As far as I can see, memcg triggers the OOM killer only on page
faults, and only on those that come from user-space.

>>
>> Perhaps we should fix this? Say mem_cgroup_oom_enable/disable around put_user(),
>> I dunno.
>>
>> > Put_user returns
>> > -EFAULT, which is ignored.
>> > Child returns into user-space and catches here
>> > assert (THREAD_GETMEM (self, tid) != ppid),
>>
>> If only I understood why else we need CLONE_CHILD_SETTID ;)

I dunno, CLONE_PARENT_SETTID should be enough for everybody,
but it's broken too. Twice. See the next patch =)

>>
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -2312,8 +2312,20 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> >       post_schedule(rq);
>> >       preempt_enable();
>> >
>> > -     if (current->set_child_tid)
>> > -             put_user(task_pid_vnr(current), current->set_child_tid);
>> > +     if (current->set_child_tid &&
>> > +         unlikely(put_user(task_pid_vnr(current), current->set_child_tid))) {
>> > +             int dummy;
>> > +
>> > +             /*
>> > +              * If this address is unreadable then userspace has not set
>> > +              * proper pointer. Application either doesn't care or will
>> > +              * notice this soon. If this address is readable then task
>> > +              * will be mislead about its own tid. It's better to die.
>> > +              */
>> > +             if (!get_user(dummy, current->set_child_tid) &&
>> > +                 !fatal_signal_pending(current))
>> > +                     force_sig(SIGSEGV, current);
>> > +     }
>>
>> Well, get_user() can fail the same way? The page we need to cow can be
>> swapped out.
>>
>> At first glance, to me this problem should be solved somewhere else...
>> I'll try to reread this all tomorrow.
>
> And in fact I think that this is not set_child_tid/etc-specific. Perhaps
> I am totally confused, but I think that put_user() simply should not fail
> this way. Say, why a syscall should return -EFAULT if memory allocation
> "silently" fails? Confused.

That's how memcg works. All other places handle this explicitly and
return -ENOMEM or -EFAULT to user-space. A strange NUMA policy or
memory affinity setting could probably trigger this too.

Probably all page faults should be forced into a succeed-or-die mode;
in that case all errors in put_user/copy_to_user could simply be ignored.
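
To make the branches of the quoted hunk easier to discuss, here is a
rough userspace model of its decision logic. It is only a sketch: the
enum, the function name and the boolean parameters are hypothetical
stand-ins, not kernel identifiers. The flags stand for "put_user()
returned nonzero", "get_user() returned nonzero" and
"fatal_signal_pending(current)".

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; these are not kernel names. */
enum tid_store_outcome {
    TID_STORE_OK,       /* no pointer registered, or the store succeeded */
    TID_STORE_IGNORED,  /* store failed; address unreadable or task already dying */
    TID_STORE_KILL      /* store failed but address is readable: force SIGSEGV */
};

/*
 * Mirrors the branch structure of the patched schedule_tail() hunk.
 * have_ptr      models current->set_child_tid != NULL
 * put_failed    models a nonzero return from put_user()
 * get_failed    models a nonzero return from get_user()
 * fatal_pending models fatal_signal_pending(current)
 */
enum tid_store_outcome
child_tid_store_outcome(bool have_ptr, bool put_failed,
                        bool get_failed, bool fatal_pending)
{
    /* Nothing to do, or the tid reached user memory. */
    if (!have_ptr || !put_failed)
        return TID_STORE_OK;

    /* The address is readable but could not be written: kill the task. */
    if (!get_failed && !fatal_pending)
        return TID_STORE_KILL;

    /* Otherwise the failure is silently dropped, as before the patch. */
    return TID_STORE_IGNORED;
}
```

The case Oleg points at is (have_ptr, put_failed, get_failed): if
get_user() fails for the same reason as put_user(), the model returns
TID_STORE_IGNORED and the child would still run with a stale tid.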
--
Konstantin