Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting
From: Jens Axboe
To: Pavel Begunkov, Peter Zijlstra
Cc: Ingo Molnar, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Tue, 24 Sep 2019 21:32:22 +0200
Message-ID: <6b896ca1-f2a1-8133-3a91-3512bb86a2ac@kernel.dk>

On 9/24/19 12:28 PM, Pavel Begunkov wrote:
> On 24/09/2019 20:46, Jens Axboe wrote:
>> On 9/24/19 11:33 AM, Pavel Begunkov wrote:
>>> On 24/09/2019 16:13, Jens Axboe wrote:
>>>> On 9/24/19 5:23 AM, Pavel Begunkov wrote:
>>>>>> Yep that should do it, and saves 8 bytes of stack as well.
>>>>>>
>>>>>> BTW, did you test my patch, this one or the previous? Just curious if it
>>>>>> worked for you.
>>>>>>
>>>>> Not yet, going to do that tonight
>>>>
>>>> Thanks! For reference, the final version is below. There was still a
>>>> signal mishap in there; now it should all be correct, afaict.
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 9b84232e5cc4..d2a86164d520 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -2768,6 +2768,38 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit,
>>>>  	return submit;
>>>>  }
>>>>
>>>> +struct io_wait_queue {
>>>> +	struct wait_queue_entry wq;
>>>> +	struct io_ring_ctx *ctx;
>>>> +	unsigned to_wait;
>>>> +	unsigned nr_timeouts;
>>>> +};
>>>> +
>>>> +static inline bool io_should_wake(struct io_wait_queue *iowq)
>>>> +{
>>>> +	struct io_ring_ctx *ctx = iowq->ctx;
>>>> +
>>>> +	/*
>>>> +	 * Wake up if we have enough events, or if a timeout occurred since we
>>>> +	 * started waiting. For timeouts, we always want to return to userspace,
>>>> +	 * regardless of event count.
>>>> +	 */
>>>> +	return io_cqring_events(ctx->rings) >= iowq->to_wait ||
>>>> +			atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
>>>> +}
>>>> +
>>>> +static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
>>>> +			    int wake_flags, void *key)
>>>> +{
>>>> +	struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
>>>> +							wq);
>>>> +
>>>> +	if (!io_should_wake(iowq))
>>>> +		return -1;
>>>
>>> It would try to schedule only the first task in the wait list. Is that
>>> the semantics you want?
>>> E.g. for waiters=[32,8] and nr_events == 8, io_wake_function() returns
>>> after @32, and won't wake up the second one.
>>
>> Right, those are the semantics I want. We keep the list ordered by using
>> the exclusive wait addition. Which means that for the case you list,
>> waiters=32 came first, and we should not wake others before that task
>> gets the completions it wants. Otherwise we could potentially starve
>> higher-count waiters if we always kept going as new waiters came in.
>>
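For context, the quoted hunk stops before the wait loop itself. Below is
a minimal sketch of how the wait side presumably consumes these helpers;
the loop shape and the min_events/ret names are assumptions here, not
part of the posted diff:

	/*
	 * Register as an *exclusive* waiter: exclusive entries are appended
	 * at the tail, so the list stays FIFO-ordered, and the wake-up walk
	 * in __wake_up_common() stops as soon as a wake function returns a
	 * negative value. The head waiter therefore gates everyone behind it.
	 */
	struct io_wait_queue iowq = {
		.wq = {
			.private	= current,
			.func		= io_wake_function,
			.entry		= LIST_HEAD_INIT(iowq.wq.entry),
		},
		.ctx		= ctx,
		.to_wait	= min_events,
	};
	int ret = 0;

	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
	do {
		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
						TASK_INTERRUPTIBLE);
		if (io_should_wake(&iowq))
			break;
		schedule();
		if (signal_pending(current)) {
			ret = -ERESTARTSYS;
			break;
		}
	} while (1);
	finish_wait(&ctx->wait, &iowq.wq);

That negative return from io_wake_function() is what makes the
waiters=[32,8] case above behave as described: the wake-up sweep halts at
the first unsatisfied waiter instead of scanning the whole list.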
> Yes. I think it would be better to have this documented in the
> userspace API. I could imagine some crazy case deadlocking userspace.
> E.g.
> thread 1: wait_events(8), reap_events
> thread 2: wait_events(32), wait(thread 1), reap_events

No matter how you handle cases like this, there will always be possible
deadlocks... So I don't think that's a huge concern. It's more important
not to have intentional livelocks, which we would have if we always
allowed the lowest wait count to get woken and steal the budget every
time.

> works well
> Reviewed-by: Pavel Begunkov
> Tested-by: Pavel Begunkov

Thanks, will add!

> BTW, I searched for wait_event*(), and it seems there are plenty of
> similar use cases. So a generic version would be useful, but this is
> for later.

Agree, it would undoubtedly be useful.

-- 
Jens Axboe