Received: by 2002:a05:6520:4d:b0:139:a872:a4c9 with SMTP id i13csp2565652lkm; Mon, 20 Sep 2021 18:52:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzRpwXJfPkWDTzPR6R//4re/RBnOzBJXBLT8aJLXUAMv6qrTZy6L3ku6piHB7kW/XQujni2 X-Received: by 2002:a17:906:c20d:: with SMTP id d13mr32180844ejz.259.1632189124567; Mon, 20 Sep 2021 18:52:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1632189124; cv=none; d=google.com; s=arc-20160816; b=iqUPsWwoDqNsD4SZqQZAz6az4WK1JSt/gEbcmB3UKRXku83BAKJG2Mq5c2NTQLFrBJ KEvcNeJ210+VIzsmnfqFuNniVnAkN242TtFR+u7eDlfbkFvPavuFzUWXkXlSDe9hT+m1 5MEUpTFLKW2tqRrNuFTjQurTcdM79l1BKqzuO+0iJNmcLfk0wOjvPH4w2Y1K7ZECysgB y7SZ1N9vAM4zj89kgAKGDUcBFAYUAZR8gDeeDKvieDRwVdRGbLgbUMt7dWhOFr1VRuYX SxVadHEhEOY2bIDsI7PMTuI8LCzkvzLr4JrKRvWKjsxwBGpWJxyuDLN+4c7JOWMoi+So qfow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=DBPmqZyESBD8DwFnv62yvOWbX4yGTnIj8+kFIaD879Y=; b=uY4FQf1xkkkDRJmkQSfqu6heMTD/K9AXFq9XfRHOkoiXgwrD5NozeM2I3dZliwJdrA ow3GO/68DR1Anriq2UlYR5tV2XXbt31rBnARmsajrKtMjGp2U/5xZxn9Ok8pfZTfHa/K bf4f+lP3zFhZDij1aS17uJC0JwQ0W9HXLzbpSJRoiBnJCH7MWJTe7yl1iHhLIH8x3nxZ p8d3cOsQFqfHEmpbE8COScaHdN+RsklKFkb9ngKMYpJjqtFZDBSI5aob2rgHs5BHVozz xADTwXvhYpOC73G+/bpwGSbu1F79ClqkOeQSbFGg4ut8ncXeIWPB4UtYdOIj0xNmQRBa SG8w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=pMRx6dgy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id dz15si17648027edb.601.2021.09.20.18.51.41; Mon, 20 Sep 2021 18:52:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=pMRx6dgy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376305AbhITSLh (ORCPT + 99 others); Mon, 20 Sep 2021 14:11:37 -0400 Received: from mail.kernel.org ([198.145.29.99]:32930 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1358005AbhITSFT (ORCPT ); Mon, 20 Sep 2021 14:05:19 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id AB2A56324E; Mon, 20 Sep 2021 17:17:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1632158254; bh=eSddgF1DPNBm4217aqC9hoSExfd74lACsyFo6XosSKI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=pMRx6dgyvTeQ2sNuQoP8gLeU3s6LPwWQHZ4ZkH4T70if9+nc7GE3ijWLxzEYDX5Xr lUFQd3w32tc6bRNmUCuryZZCNTmpFahP2R2HuHFvsN9A/gsfKwKkiJDKKSK2dGnwIP fCJNwGX5LpZffHMidN6XhXGSljzCXxdT2IHqeieY= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Nadav Amit , Alexander Viro , Andrea Arcangeli , Axel Rasmussen , Jens Axboe , Mike Rapoport , Peter Xu , Andrew Morton , Linus Torvalds , Sasha Levin Subject: [PATCH 5.4 071/260] userfaultfd: prevent concurrent API initialization Date: Mon, 20 Sep 2021 18:41:29 +0200 Message-Id: <20210920163933.538524640@linuxfoundation.org> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20210920163931.123590023@linuxfoundation.org> References: <20210920163931.123590023@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Nadav Amit [ Upstream commit 22e5fe2a2a279d9a6fcbdfb4dffe73821bef1c90 ] userfaultfd assumes that the enabled features are set once and never changed after UFFDIO_API ioctl succeeded. However, currently, UFFDIO_API can be called concurrently from two different threads, succeed on both threads and leave userfaultfd's features in non-deterministic state. Theoretically, other uffd operations (ioctl's and page-faults) can be dispatched while adversely affected by such changes of features. Moreover, the writes to ctx->state and ctx->features are not ordered, which can - theoretically, again - let userfaultfd_ioctl() think that userfaultfd API completed, while the features are still not initialized. To avoid races, it is arguably best to get rid of ctx->state. Since there are only 2 states, record the API initialization in ctx->features as the uppermost bit and remove ctx->state. Link: https://lkml.kernel.org/r/20210808020724.1022515-3-namit@vmware.com Fixes: 9cd75c3cd4c3d ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor") Signed-off-by: Nadav Amit Cc: Alexander Viro Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: Jens Axboe Cc: Mike Rapoport Cc: Peter Xu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- fs/userfaultfd.c | 93 +++++++++++++++++++++++------------------------- 1 file changed, 45 insertions(+), 48 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 2c807283115d..ec57bbb6bb05 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -32,11 +32,6 @@ int sysctl_unprivileged_userfaultfd __read_mostly = 1; static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly; -enum userfaultfd_state { - UFFD_STATE_WAIT_API, - UFFD_STATE_RUNNING, -}; - /* * Start with fault_pending_wqh and fault_wqh so they're more likely * to be in the same cacheline. @@ -68,8 +63,6 @@ struct userfaultfd_ctx { unsigned int flags; /* features requested from the userspace */ unsigned int features; - /* state machine */ - enum userfaultfd_state state; /* released */ bool released; /* memory mappings are changing because of non-cooperative event */ @@ -103,6 +96,14 @@ struct userfaultfd_wake_range { unsigned long len; }; +/* internal indication that UFFD_API ioctl was successfully executed */ +#define UFFD_FEATURE_INITIALIZED (1u << 31) + +static bool userfaultfd_is_initialized(struct userfaultfd_ctx *ctx) +{ + return ctx->features & UFFD_FEATURE_INITIALIZED; +} + static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, int wake_flags, void *key) { @@ -699,7 +700,6 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) refcount_set(&ctx->refcount, 1); ctx->flags = octx->flags; - ctx->state = UFFD_STATE_RUNNING; ctx->features = octx->features; ctx->released = false; ctx->mmap_changing = false; @@ -980,38 +980,33 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait) poll_wait(file, &ctx->fd_wqh, wait); - switch (ctx->state) { - case UFFD_STATE_WAIT_API: + if (!userfaultfd_is_initialized(ctx)) return EPOLLERR; - case UFFD_STATE_RUNNING: - /* - * poll() never guarantees that read won't block. - * userfaults can be waken before they're read(). - */ - if (unlikely(!(file->f_flags & O_NONBLOCK))) - return EPOLLERR; - /* - * lockless access to see if there are pending faults - * __pollwait last action is the add_wait_queue but - * the spin_unlock would allow the waitqueue_active to - * pass above the actual list_add inside - * add_wait_queue critical section. So use a full - * memory barrier to serialize the list_add write of - * add_wait_queue() with the waitqueue_active read - * below. - */ - ret = 0; - smp_mb(); - if (waitqueue_active(&ctx->fault_pending_wqh)) - ret = EPOLLIN; - else if (waitqueue_active(&ctx->event_wqh)) - ret = EPOLLIN; - - return ret; - default: - WARN_ON_ONCE(1); + + /* + * poll() never guarantees that read won't block. + * userfaults can be waken before they're read(). + */ + if (unlikely(!(file->f_flags & O_NONBLOCK))) return EPOLLERR; - } + /* + * lockless access to see if there are pending faults + * __pollwait last action is the add_wait_queue but + * the spin_unlock would allow the waitqueue_active to + * pass above the actual list_add inside + * add_wait_queue critical section. So use a full + * memory barrier to serialize the list_add write of + * add_wait_queue() with the waitqueue_active read + * below. + */ + ret = 0; + smp_mb(); + if (waitqueue_active(&ctx->fault_pending_wqh)) + ret = EPOLLIN; + else if (waitqueue_active(&ctx->event_wqh)) + ret = EPOLLIN; + + return ret; } static const struct file_operations userfaultfd_fops; @@ -1205,7 +1200,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf, struct uffd_msg msg; int no_wait = file->f_flags & O_NONBLOCK; - if (ctx->state == UFFD_STATE_WAIT_API) + if (!userfaultfd_is_initialized(ctx)) return -EINVAL; for (;;) { @@ -1807,9 +1802,10 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, static inline unsigned int uffd_ctx_features(__u64 user_features) { /* - * For the current set of features the bits just coincide + * For the current set of features the bits just coincide. Set + * UFFD_FEATURE_INITIALIZED to mark the features as enabled. */ - return (unsigned int)user_features; + return (unsigned int)user_features | UFFD_FEATURE_INITIALIZED; } /* @@ -1822,12 +1818,10 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, { struct uffdio_api uffdio_api; void __user *buf = (void __user *)arg; + unsigned int ctx_features; int ret; __u64 features; - ret = -EINVAL; - if (ctx->state != UFFD_STATE_WAIT_API) - goto out; ret = -EFAULT; if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api))) goto out; @@ -1844,9 +1838,13 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, ret = -EFAULT; if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api))) goto out; - ctx->state = UFFD_STATE_RUNNING; + /* only enable the requested features for this uffd context */ - ctx->features = uffd_ctx_features(features); + ctx_features = uffd_ctx_features(features); + ret = -EINVAL; + if (cmpxchg(&ctx->features, 0, ctx_features) != 0) + goto err_out; + ret = 0; out: return ret; @@ -1863,7 +1861,7 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd, int ret = -EINVAL; struct userfaultfd_ctx *ctx = file->private_data; - if (cmd != UFFDIO_API && ctx->state == UFFD_STATE_WAIT_API) + if (cmd != UFFDIO_API && !userfaultfd_is_initialized(ctx)) return -EINVAL; switch(cmd) { @@ -1964,7 +1962,6 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) refcount_set(&ctx->refcount, 1); ctx->flags = flags; ctx->features = 0; - ctx->state = UFFD_STATE_WAIT_API; ctx->released = false; ctx->mmap_changing = false; ctx->mm = current->mm; -- 2.30.2