From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Andrea Arcangeli, zhong jiang, syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com, Oleg Nesterov, Jann Horn, Hugh Dickins, Mike Rapoport, Mike Kravetz, Peter Xu, Jason Gunthorpe, "Kirill A .
Shutemov", Michal Hocko, Andrew Morton, Linus Torvalds
Subject: [PATCH 4.19 058/105] userfaultfd: use RCU to free the task struct when fork fails
Date: Mon, 20 May 2019 14:14:04 +0200
Message-Id: <20190520115251.126549391@linuxfoundation.org>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20190520115247.060821231@linuxfoundation.org>
References: <20190520115247.060821231@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Andrea Arcangeli

commit c3f3ce049f7d97cc7ec9c01cb51d9ec74e0f37c2 upstream.

The task structure is freed while get_mem_cgroup_from_mm() holds
rcu_read_lock() and dereferences mm->owner.

    get_mem_cgroup_from_mm()                failing fork()
    ----                                    ---
    task = mm->owner
                                            mm->owner = NULL;
                                            free(task)
    if (task) *task; /* use after free */

The fix is to free the task with RCU in the fork failure case too,
exactly as already happens on the regular exit(2) path.  That is enough
to make the rcu_read_lock() held in get_mem_cgroup_from_mm() (left side
above) effective at preventing a use after free when dereferencing the
task structure.

An alternative fix would be to defer the delivery of the userfaultfd
contexts to the monitor until after fork() is guaranteed to succeed.
Such a change would be more invasive because it would create a strict
ordering dependency: the uffd methods would need to be called after the
last potentially failing branch in order to be safe.  This solution, by
contrast, only requires common code to set mm->owner to NULL and to
free with RCU the task struct that mm->owner pointed to, if fork ends
up failing.  The userfaultfd methods can still be called anywhere
during the fork runtime, and the monitor will keep discarding orphaned
"mm" coming from failed forks in userland.

This race condition couldn't trigger if CONFIG_MEMCG was set =n at build
time.

[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
Fixes: 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event")
Signed-off-by: Andrea Arcangeli
Tested-by: zhong jiang
Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: Oleg Nesterov
Cc: Jann Horn
Cc: Hugh Dickins
Cc: Mike Rapoport
Cc: Mike Kravetz
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: "Kirill A . Shutemov"
Cc: Michal Hocko
Cc: zhong jiang
Cc: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 kernel/fork.c |   31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -907,6 +907,15 @@ static void mm_init_aio(struct mm_struct
 #endif
 }
 
+static __always_inline void mm_clear_owner(struct mm_struct *mm,
+					   struct task_struct *p)
+{
+#ifdef CONFIG_MEMCG
+	if (mm->owner == p)
+		WRITE_ONCE(mm->owner, NULL);
+#endif
+}
+
 static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 {
 #ifdef CONFIG_MEMCG
@@ -1286,6 +1295,7 @@ static struct mm_struct *dup_mm(struct t
 free_pt:
 	/* don't put binfmt in mmput, we haven't got module yet */
 	mm->binfmt = NULL;
+	mm_init_owner(mm, NULL);
 	mmput(mm);
 
 fail_nomem:
@@ -1617,6 +1627,21 @@ static inline void rcu_copy_process(stru
 #endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
+static void __delayed_free_task(struct rcu_head *rhp)
+{
+	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
+
+	free_task(tsk);
+}
+
+static __always_inline void delayed_free_task(struct task_struct *tsk)
+{
+	if (IS_ENABLED(CONFIG_MEMCG))
+		call_rcu(&tsk->rcu, __delayed_free_task);
+	else
+		free_task(tsk);
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2072,8 +2097,10 @@ bad_fork_cleanup_io:
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-	if (p->mm)
+	if (p->mm) {
+		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
+	}
 bad_fork_cleanup_signal:
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
@@ -2104,7 +2131,7 @@ bad_fork_cleanup_count:
 bad_fork_free:
 	p->state = TASK_DEAD;
 	put_task_stack(p);
-	free_task(p);
+	delayed_free_task(p);
 fork_out:
 	spin_lock_irq(&current->sighand->siglock);
 	hlist_del_init(&delayed.node);
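
For readers who want to see the ordering from the changelog in isolation, here is a minimal single-threaded userspace sketch of the deferred-free pattern. It is not kernel code and not real RCU: toy_read_lock(), toy_read_unlock(), toy_call_rcu(), toy_grace_period(), fail_fork() and replay_race() are hypothetical names made up for this illustration, and the structs are cut-down stand-ins carrying only the fields the example touches. The "grace period" is modelled as "no reader is currently inside a critical section", which is a deliberate simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel types; only the fields used here. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

struct task_struct {
	int pid;
	struct rcu_head rcu;
};

struct mm_struct {
	struct task_struct *owner;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int active_readers;       /* models rcu_read_lock() nesting */
static struct rcu_head *pending; /* callbacks waiting for a grace period */
static int freed_tasks;          /* instrumentation for the example only */

static void toy_read_lock(void)   { active_readers++; }
static void toy_read_unlock(void) { active_readers--; }

/* Queue a callback instead of running it now (assumed model of call_rcu()). */
static void toy_call_rcu(struct rcu_head *head,
			 void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Run queued callbacks, but only when no reader is in a critical section. */
static int toy_grace_period(void)
{
	int ran = 0;

	if (active_readers > 0)
		return 0;
	while (pending) {
		struct rcu_head *head = pending;

		pending = head->next;
		head->func(head);
		ran++;
	}
	return ran;
}

/* Recover the enclosing task from its embedded rcu_head and free it. */
static void delayed_free_cb(struct rcu_head *rhp)
{
	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);

	free(tsk);
	freed_tasks++;
}

/* Failing-fork side: clear the owner first, then defer the free. */
static void fail_fork(struct mm_struct *mm, struct task_struct *p)
{
	if (mm->owner == p)
		mm->owner = NULL;
	toy_call_rcu(&p->rcu, delayed_free_cb);
}

/* Replays the interleaving from the changelog; returns the pid the reader saw. */
static int replay_race(void)
{
	struct mm_struct mm;
	struct task_struct *task, *p = malloc(sizeof(*p));
	int seen;

	if (!p)
		return -1;
	p->pid = 42;
	mm.owner = p;

	toy_read_lock();
	task = mm.owner;               /* reader samples the owner */
	fail_fork(&mm, p);             /* fork fails "concurrently" */
	(void)toy_grace_period();      /* runs nothing: a reader is active */
	seen = task ? task->pid : -1;  /* memory has not been freed yet */
	toy_read_unlock();

	(void)toy_grace_period();      /* now the deferred free runs */
	return seen;
}
```

Calling replay_race() walks through the left and right columns of the diagram in the changelog: the reader still sees a valid task because the free is queued rather than performed, and the task is released only after toy_read_unlock(). With an immediate free() in fail_fork() instead of toy_call_rcu(), the read of task->pid would be the use after free the patch addresses.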