From: Nadav Amit
To: linux-fsdevel@vger.kernel.org
Cc: Nadav Amit, Jens Axboe, Andrea Arcangeli, Peter Xu, Alexander Viro,
    io-uring@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 05/13] fs/userfaultfd: introduce UFFD_FEATURE_POLL
Date: Sat, 28 Nov 2020 16:45:40 -0800
Message-Id: <20201129004548.1619714-6-namit@vmware.com>
In-Reply-To: <20201129004548.1619714-1-namit@vmware.com>
References: <20201129004548.1619714-1-namit@vmware.com>

From: Nadav Amit

Add a feature UFFD_FEATURE_POLL that makes the faulting thread spin while
waiting for the page-fault to be handled.

Users of this feature should take care to run the page-fault handling
thread on a different physical CPU and, where possible, to ensure that
cores are available to run the handler; otherwise they will see
performance degradation.

We can later enhance it by adding one or two timeouts: one until the
page-fault is handled and another until the handler is woken.

Cc: Jens Axboe
Cc: Andrea Arcangeli
Cc: Peter Xu
Cc: Alexander Viro
Cc: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 24 ++++++++++++++++++++----
 include/uapi/linux/userfaultfd.h |  9 ++++++++-
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fedf7c1615d5..b6a04e526025 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -122,7 +122,9 @@ static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
 	if (len && (start > uwq->msg.arg.pagefault.address ||
 		    start + len <= uwq->msg.arg.pagefault.address))
 		goto out;
-	WRITE_ONCE(uwq->waken, true);
+
+	smp_store_mb(uwq->waken, true);
+
 	/*
 	 * The Program-Order guarantees provided by the scheduler
 	 * ensure uwq->waken is visible before the task is woken.
@@ -377,6 +379,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	bool must_wait;
 	long blocking_state;
+	bool poll;
 
 	/*
 	 * We don't do userfault handling for the final child pid update.
@@ -410,6 +413,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
 
+	poll = ctx->features & UFFD_FEATURE_POLL;
+
 	/*
 	 * If it's already released don't get it. This avoids to loop
 	 * in __get_user_pages if userfaultfd_release waits on the
@@ -495,7 +500,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 * following the spin_unlock to happen before the list_add in
 	 * __add_wait_queue.
 	 */
-	set_current_state(blocking_state);
+
+	if (!poll)
+		set_current_state(blocking_state);
+
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
 	if (!is_vm_hugetlb_page(vmf->vma))
@@ -509,10 +517,18 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {
 		wake_up_poll(&ctx->fd_wqh, EPOLLIN);
-		schedule();
+		if (poll) {
+			while (!READ_ONCE(uwq.waken) && !READ_ONCE(ctx->released) &&
+			       !signal_pending(current)) {
+				cpu_relax();
+				cond_resched();
+			}
+		} else
+			schedule();
 	}
 
-	__set_current_state(TASK_RUNNING);
+	if (!poll)
+		__set_current_state(TASK_RUNNING);
 
 	/*
 	 * Here we race with the list_del; list_add in
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..4eeba4235afe 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -27,7 +27,9 @@
 			   UFFD_FEATURE_MISSING_HUGETLBFS |	\
 			   UFFD_FEATURE_MISSING_SHMEM |		\
 			   UFFD_FEATURE_SIGBUS |		\
-			   UFFD_FEATURE_THREAD_ID)
+			   UFFD_FEATURE_THREAD_ID |		\
+			   UFFD_FEATURE_POLL)
+
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -171,6 +173,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
 	 * be returned, if feature is not requested 0 will be returned.
+	 *
+	 * UFFD_FEATURE_POLL polls upon page-fault if the feature is requested
+	 * instead of descheduling. This feature should only be enabled for
+	 * low-latency handlers and when CPUs are not overcommitted.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -181,6 +187,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
 #define UFFD_FEATURE_SIGBUS			(1<<7)
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
+#define UFFD_FEATURE_POLL			(1<<9)
 	__u64 features;
 
 	__u64 ioctls;
-- 
2.25.1
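
For readers who want to see the waiting strategy in isolation, the following is a rough userspace analogy of the poll path, not kernel code and not part of the patch. It only illustrates the shape of the loop: a waiter spins on a flag and yields the CPU between checks, while a second thread publishes a result and then sets the flag. All names here (`waken` as a stand-in for `uwq.waken`, `resolved_page`, `handler_thread`, `wait_by_polling`, `run_poll_demo`) are hypothetical. `sched_yield()` is used as an approximate stand-in for `cpu_relax()`/`cond_resched()`, the sequentially consistent `atomic_store()` is only loosely comparable to `smp_store_mb()`, and the `ctx->released` and `signal_pending()` exit conditions are omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for uwq.waken in the patch. */
static atomic_bool waken;
/* Data the "handler" publishes before it wakes the waiter. */
static int resolved_page;

/* Plays the role of the userfault handler: resolve first, then signal. */
static void *handler_thread(void *arg)
{
	(void)arg;
	resolved_page = 42;          /* plain store, published by the flag below */
	atomic_store(&waken, true);  /* seq_cst store; loose analogue of smp_store_mb() */
	return NULL;
}

/* Plays the role of the faulting thread in poll mode: spin, but yield. */
static int wait_by_polling(void)
{
	while (!atomic_load(&waken))
		sched_yield();       /* approximate stand-in for cpu_relax()/cond_resched() */
	return resolved_page;
}

/* Runs one handler/waiter round trip; returns the value the waiter saw, or -1. */
int run_poll_demo(void)
{
	pthread_t tid;
	int seen;

	atomic_store(&waken, false);
	resolved_page = 0;
	if (pthread_create(&tid, NULL, handler_thread, NULL) != 0)
		return -1;
	seen = wait_by_polling();
	assert(atomic_load(&waken)); /* the loop only exits once the flag is set */
	pthread_join(tid, NULL);
	return seen;
}
```

Calling `run_poll_demo()` from a small `main()` and building with `-pthread` should be enough to try it. As in the commit message, the waiter burns CPU time until the handler runs, so the sketch only behaves well when the handler thread can be scheduled promptly on another core.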