Received: by 2002:a05:6a10:1a4d:0:0:0:0 with SMTP id nk13csp3307093pxb; Fri, 4 Feb 2022 06:02:31 -0800 (PST) X-Google-Smtp-Source: ABdhPJwA5xeNwmR11HqzYOqMAy5Gb2WHBvPBCTLIoxQcLksLh6YFMHRTodcFmsXzGQFRz8PLDNUe X-Received: by 2002:a05:6a00:98e:: with SMTP id u14mr3285479pfg.12.1643983351074; Fri, 04 Feb 2022 06:02:31 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1643983351; cv=none; d=google.com; s=arc-20160816; b=tpH4iGShJaJfSSYfm9eHU7CyttRhq/nI4UdFLw372aXq9BznYqUKb+S1e9/bOxqEuh +5a8jjMOIRF/elGBSbiDIUqy3bXA86tt+Wnh9GykmSjJi4xg6TkxX8aPDmfPEKi0/kAP HJaSyexytrGrHcvSo3WZNDVCfW1AzF4EmXQHXMoZ1SWG8Q8Zi6VOm8bDK992BsA+nqi8 Fx9LxZks2dq4T2GPLVVrtKjyOXPDUuDNfNqb3PDsSfqAqfE/b+9fEfrl5RUNx03iZroB B4wnA3f6wOHB2pCjjhzukxY2H6xy4qti3f2MNKnGxz+A2ieTzRspqNItZ6v4c+if8Rcw z0hQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=cMLtxoaKOMblqcoSXrvnXLJcL2/9t7S9kcEj9MMGPJY=; b=NwCzsYmoGHWBcH3jM1VRMNTgkHTqLe/P8aO0dUtOeN8gf0y5jiEffSm0t83vTtkECQ u+zgMbTT2Bt/lSRaK/zCn7uX8CfiMZ5f/XanyjuSVtqBbhesc6stL+c7vMiDrOZB/NFL ceu99TkPAeiO2+UvYuYzfBfjmRdymQe4XdfL2tBC++91TZY6sg6hHfYvTcNXb69vL4aQ mzOIrBlYLUkhZrSQPoseKVazDZCE+pHElH4yhLNGeKvnM9prwE7y5Hj30naxpQYnEDO2 6Sq+8ga1CD4qn7xLVDwRFDhZVU1DyPW78+JfLwcEb/AAVJRNun81anvLpNk7iiKBXMuA x4xg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=FYr5rXb9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a191si1909063pge.446.2022.02.04.06.02.17; Fri, 04 Feb 2022 06:02:31 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=FYr5rXb9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242347AbiBCXhZ (ORCPT + 99 others); Thu, 3 Feb 2022 18:37:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44170 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229532AbiBCXhW (ORCPT ); Thu, 3 Feb 2022 18:37:22 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99EA7C061714 for ; Thu, 3 Feb 2022 15:37:22 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id j25so7353105wrb.2 for ; Thu, 03 Feb 2022 15:37:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=cMLtxoaKOMblqcoSXrvnXLJcL2/9t7S9kcEj9MMGPJY=; b=FYr5rXb9w8sXNRUvhTwRwCnK5L0UdP1E5QGqXF7uiMrty2lCwUv56Dbw9vokV6mV3n P81HRfth66vdb6VBQ+ZkcqGkF5DPIRjcJKTVBwli4m0kM8g7k7LKuhcXKCwQ6fkGEA+H Vhn5OfUeWVEb/qSZAm4O+OVPdPZ0ca4HBdxaMVtz1LIxEnEki8349JKgE8M2o1sdD42j /NzivFp4zc/xCHqaTnNxOR0htps6ZjWdGyxfbC1xhmCLrsYqHxCRY1YIWDUzlt6hI2rX QcUkhV+kc9s3Zo7QgCXqtG6zF4W9LGOpQE3YC2SsrP2X10di0cj041gILiQcXCvDG6rN SsvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=cMLtxoaKOMblqcoSXrvnXLJcL2/9t7S9kcEj9MMGPJY=; b=nqRGItk/sXBA9nk3XQUtZ1f8eL0NBvKDayty19+5PbmNJ0/hneeHsxtdiK9kMe6kPi Tcd5BTVJbLh17gxe5eN0TA7rdCYWxSo2CsUWPqwZGTOO3mUitwkbqSTPHtae98+cmJOd eycmijEVixfQljvzgq5Tb6vqxv5QlETYgPgW6MhoJiQAdJez+0oMuDV1KC0FkVXFZgmF 89gwb6p/s1cJJ2541KPq1nf/ISs1ERkDjUPObUABg3J8seFxaba4p/XmeVX5iq235Pnx myoABFqO7aqGmLWAS8SXUfqUGY/x05zgIP8iurduJBnt4Rmjt20hs+DZldc9ol3dF637 lguQ== X-Gm-Message-State: AOAM533c2jSqhufbAi8w3Pr0Ki7+LYRzaU/jqyt6ZfIuGtFnb4mRYg9m 7td2yUGK1JWGN6dohpN1tyLKtg== X-Received: by 2002:adf:fb05:: with SMTP id c5mr230271wrr.220.1643931441200; Thu, 03 Feb 2022 15:37:21 -0800 (PST) Received: from ?IPv6:2a02:6b6d:f804:0:28c2:5854:c832:e580? ([2a02:6b6d:f804:0:28c2:5854:c832:e580]) by smtp.gmail.com with ESMTPSA id g4sm202184wrd.111.2022.02.03.15.37.20 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 03 Feb 2022 15:37:20 -0800 (PST) Subject: Re: [External] Re: [PATCH v4 2/3] io_uring: avoid ring quiesce while registering/unregistering eventfd To: Jens Axboe , io-uring@vger.kernel.org, asml.silence@gmail.com, linux-kernel@vger.kernel.org Cc: fam.zheng@bytedance.com References: <20220203182441.692354-1-usama.arif@bytedance.com> <20220203182441.692354-3-usama.arif@bytedance.com> <8369e0be-f922-ba6b-ceed-24886ebcdb78@kernel.dk> <11e423ca-4272-86cf-8d51-2620094cfe29@kernel.dk> From: Usama Arif Message-ID: <7e1a1452-c7be-fa98-f1b1-e19088088424@bytedance.com> Date: Thu, 3 Feb 2022 23:37:20 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 MIME-Version: 1.0 In-Reply-To: <11e423ca-4272-86cf-8d51-2620094cfe29@kernel.dk> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/02/2022 19:12, Jens Axboe wrote: > On 2/3/22 12:05 PM, Usama Arif wrote: >> >> >> On 03/02/2022 18:49, Jens Axboe wrote: >>> On 2/3/22 11:24 AM, Usama Arif wrote: >>>> -static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx) >>>> +static void io_eventfd_signal(struct io_ring_ctx *ctx) >>>> { >>>> - if (likely(!ctx->cq_ev_fd)) >>>> - return false; >>>> + struct io_ev_fd *ev_fd; >>>> + >>>> + rcu_read_lock(); >>>> + /* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */ >>>> + ev_fd = rcu_dereference(ctx->io_ev_fd); >>>> + >>>> + if (likely(!ev_fd)) >>>> + goto out; >>>> if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED) >>>> - return false; >>>> - return !ctx->eventfd_async || io_wq_current_is_worker(); >>>> + goto out; >>>> + >>>> + if (!ctx->eventfd_async || io_wq_current_is_worker()) >>>> + eventfd_signal(ev_fd->cq_ev_fd, 1); >>>> + >>>> +out: >>>> + rcu_read_unlock(); >>>> } >>> >>> This still needs what we discussed in v3, something ala: >>> >>> /* >>> * This will potential race with eventfd registration, but that's >>> * always going to be the case if there is IO inflight while an eventfd >>> * descriptor is being registered. >>> */ >>> if (!rcu_dereference_raw(ctx->io_ev_fd)) >>> return; >>> >>> rcu_read_lock(); >> >> Hmm, so i am not so worried about the registeration, but actually >> worried about unregisteration. >> If after the check and before the rcu_read_lock, the eventfd is >> unregistered won't we get a NULL pointer exception at >> eventfd_signal(ev_fd->cq_ev_fd, 1)? > > You need to check it twice, that's a hard requirement. The first racy > check is safe because we don't care if we miss a notification, once > inside rcu_read_lock() it needs to be done properly of course. Like you > do below, that's how it should be done. > >>> I wonder if we can get away with assigning ctx->io_ev_fd to NULL when we >>> do the call_rcu(). The struct itself will remain valid as long as we're >>> under rcu_read_lock() protection, so I think we'd be fine? If we do >>> that, then we don't need any rcu_barrier() or synchronize_rcu() calls, >>> as we can register a new one while the previous one is still being >>> killed. >>> >>> Hmm? >>> >> >> We would have to remove the check that ctx->io_ev_fd != NULL. That we >> would also result in 2 successive calls to io_eventfd_register without >> any unregister in between being successful? Which i dont think is the >> right behaviour? >> >> I think the likelihood of hitting the rcu_barrier itself is quite low, >> so probably the cost is low as well. > > Yeah it might very well be. To make what I suggested work, we'd need a > way to mark the io_ev_fd as going away. Which would be feasible, as we > know the memory will remain valid for us to check. So it could > definitely work, you'd just need a check for that. > >> Thanks, will do that this in the next patchset with the above >> io_eventfd_signal changes if those look ok as well? > > The code you pasted looked good. Consider the "is unregistration in > progress" suggestion as well, as it would be nice to avoid any kind of > rcu synchronization if at all possible. > Thanks for the review comments! I think all of them should have been addressed now in v5. I also removed ring quiesce from io_uring_register as the remaining 2 opcodes don't need them (Thanks Pavel for confirming that!) Regards, Usama