From: Usama Arif <usama.arif@bytedance.com>
To: io-uring@vger.kernel.org, axboe@kernel.dk, asml.silence@gmail.com,
	linux-kernel@vger.kernel.org
Cc: fam.zheng@bytedance.com, Usama Arif <usama.arif@bytedance.com>
Subject: [RFC] io_uring: avoid ring quiesce while registering/unregistering eventfd
Date: Wed, 2 Feb 2022 15:59:23 +0000
Message-Id: <20220202155923.4117285-1-usama.arif@bytedance.com>
X-Mailer: git-send-email 2.25.1

Acquire completion_lock at the start of __io_uring_register
before registering/unregistering eventfd and release it at the end.
Hence all calls to io_cqring_ev_posted, which adds to the eventfd
counter, will finish before acquiring the spin_lock in
io_uring_register, and all new calls will wait till the eventfd is
registered. This avoids ring quiesce, which is much more expensive
than acquiring the spin_lock.

On the system tested with this patch, io_uring_register with
IORING_REGISTER_EVENTFD takes less than 1ms, compared to 15ms before.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Reviewed-by: Fam Zheng <fam.zheng@bytedance.com>
---
 fs/io_uring.c | 50 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2e04f718319d..e75d8abd225a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1803,11 +1803,11 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 			   ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
 	}
 
-	if (posted)
+	if (posted) {
 		io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	if (posted)
 		io_cqring_ev_posted(ctx);
+	}
+	spin_unlock(&ctx->completion_lock);
 	return all_flushed;
 }
 
@@ -1971,8 +1971,8 @@ static void io_req_complete_post(struct io_kiocb *req, s32 res,
 	spin_lock(&ctx->completion_lock);
 	__io_req_complete_post(req, res, cflags);
 	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
+	spin_unlock(&ctx->completion_lock);
 }
 
 static inline void io_req_complete_state(struct io_kiocb *req, s32 res,
@@ -2231,11 +2231,11 @@ static void __io_req_find_next_prep(struct io_kiocb *req)
 
 	spin_lock(&ctx->completion_lock);
 	posted = io_disarm_next(req);
-	if (posted)
+	if (posted) {
 		io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	if (posted)
 		io_cqring_ev_posted(ctx);
+	}
+	spin_unlock(&ctx->completion_lock);
 }
 
 static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
@@ -2272,8 +2272,8 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
 static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
 {
 	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
+	spin_unlock(&ctx->completion_lock);
 }
 
 static void handle_prev_tw_list(struct io_wq_work_node *node,
@@ -2535,8 +2535,8 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 	}
 
 	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
+	spin_unlock(&ctx->completion_lock);
 	state->flush_cqes = false;
 }
 
@@ -5541,10 +5541,12 @@ static int io_poll_check_events(struct io_kiocb *req)
 			filled = io_fill_cqe_aux(ctx, req->user_data, mask,
 						 IORING_CQE_F_MORE);
 			io_commit_cqring(ctx);
-			spin_unlock(&ctx->completion_lock);
-			if (unlikely(!filled))
+			if (unlikely(!filled)) {
+				spin_unlock(&ctx->completion_lock);
 				return -ECANCELED;
+			}
 			io_cqring_ev_posted(ctx);
+			spin_unlock(&ctx->completion_lock);
 		} else if (req->result) {
 			return 0;
 		}
@@ -5579,8 +5581,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	hash_del(&req->hash_node);
 	__io_req_complete_post(req, req->result, 0);
 	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
+	spin_unlock(&ctx->completion_lock);
 }
 
 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
@@ -8351,8 +8353,8 @@ static void __io_rsrc_put_work(struct io_rsrc_node *ref_node)
 			spin_lock(&ctx->completion_lock);
 			io_fill_cqe_aux(ctx, prsrc->tag, 0, 0);
 			io_commit_cqring(ctx);
-			spin_unlock(&ctx->completion_lock);
 			io_cqring_ev_posted(ctx);
+			spin_unlock(&ctx->completion_lock);
 			io_ring_submit_unlock(ctx, lock_ring);
 		}
 
@@ -9639,11 +9641,11 @@ static __cold bool io_kill_timeouts(struct io_ring_ctx *ctx,
 		}
 	}
 	spin_unlock_irq(&ctx->timeout_lock);
-	if (canceled != 0)
+	if (canceled != 0) {
 		io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	if (canceled != 0)
 		io_cqring_ev_posted(ctx);
+	}
+	spin_unlock(&ctx->completion_lock);
 	return canceled != 0;
 }
 
@@ -10970,6 +10972,8 @@ static bool io_register_op_must_quiesce(int op)
 	case IORING_REGISTER_IOWQ_AFF:
 	case IORING_UNREGISTER_IOWQ_AFF:
 	case IORING_REGISTER_IOWQ_MAX_WORKERS:
+	case IORING_REGISTER_EVENTFD:
+	case IORING_UNREGISTER_EVENTFD:
 		return false;
 	default:
 		return true;
@@ -11030,6 +11034,17 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			return -EACCES;
 	}
 
+	/*
+	 * Acquire completion_lock at the start of __io_uring_register before
+	 * registering/unregistering eventfd and release it at the end. Any
+	 * completion events pending before this call will finish before acquiring
+	 * the spin_lock here, and all new completion events will wait till the
+	 * eventfd is registered. This avoids ring quiesce which is much more
+	 * expensive than acquiring spin_lock.
+	 */
+	if (opcode == IORING_REGISTER_EVENTFD || opcode == IORING_UNREGISTER_EVENTFD)
+		spin_lock(&ctx->completion_lock);
+
 	if (io_register_op_must_quiesce(opcode)) {
 		ret = io_ctx_quiesce(ctx);
 		if (ret)
@@ -11141,6 +11156,9 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 		break;
 	}
 
+	if (opcode == IORING_REGISTER_EVENTFD || opcode == IORING_UNREGISTER_EVENTFD)
+		spin_unlock(&ctx->completion_lock);
+
 	if (io_register_op_must_quiesce(opcode)) {
 		/* bring the ctx back to life */
 		percpu_ref_reinit(&ctx->refs);
-- 
2.25.1
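
For reference, the operation this patch speeds up is eventfd registration via
the io_uring_register(2) syscall with IORING_REGISTER_EVENTFD. Below is a
minimal userspace sketch, assuming liburing is available, of how an
application registers and unregisters an eventfd with a ring; it is
illustrative only and not part of the patch.

/*
 * Illustrative only, not part of the patch: a minimal userspace sketch,
 * assuming liburing is available. io_uring_register_eventfd() issues the
 * io_uring_register(2) syscall with IORING_REGISTER_EVENTFD, the call the
 * commit message reports dropping from ~15ms to under 1ms.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	int efd, ret;

	/* Set up a small ring; 8 entries is arbitrary for this sketch. */
	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
		return 1;
	}

	efd = eventfd(0, 0);
	if (efd < 0) {
		perror("eventfd");
		io_uring_queue_exit(&ring);
		return 1;
	}

	/* The registration that previously forced a full ring quiesce. */
	ret = io_uring_register_eventfd(&ring, efd);
	if (ret < 0)
		fprintf(stderr, "register_eventfd: %s\n", strerror(-ret));

	/*
	 * ... submit requests here; every posted CQE now signals efd,
	 * so a poll/epoll loop can wait on efd for completions ...
	 */

	io_uring_unregister_eventfd(&ring);
	io_uring_queue_exit(&ring);
	close(efd);
	return 0;
}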