Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting
From: Jens Axboe
To: Pavel Begunkov, Ingo Molnar
Cc: Ingo Molnar, Peter Zijlstra, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Date: Tue, 24 Sep 2019 10:02:35 +0200
References: <20190923083549.GA42487@gmail.com>
	<731b2087-7786-5374-68ff-8cba42f0cd68@kernel.dk>
	<759b9b48-1de3-1d43-3e39-9c530bfffaa0@kernel.dk>
	<43244626-9cfd-0c0b-e7a1-878363712ef3@gmail.com>
In-Reply-To: <43244626-9cfd-0c0b-e7a1-878363712ef3@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/24/19 1:06 AM, Pavel Begunkov wrote:
> On 24/09/2019 02:00, Jens Axboe wrote:
>>> I think we can do the same thing, just wrapping the waitqueue in a
>>> structure with a count in it, on the stack. Got some flight time
>>> coming up later today, let me try and cook up a patch.
>>
>> Totally untested, and sent out 5 min before departure...
>> But something like this.
>
> Hmm, reminds me of my first version. Basically that's the same thing,
> but with the macros inlined. I wanted to make it reusable and
> self-contained, though.
>
> If you don't think it could be useful in other places, sure, we could
> do something like that. Is that so?

I totally agree it could be useful in other places. Maybe it could be
formalized and used with wake_up_nr() instead of adding a new primitive?
I haven't looked into that, so I may be talking nonsense. In any case, I
did get a chance to test it and it works for me. Here's the "finished"
version, slightly cleaned up and with a comment added for good measure.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca7570aca430..14fae454cf75 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2768,6 +2768,42 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit,
 	return submit;
 }
 
+struct io_wait_queue {
+	struct wait_queue_entry wq;
+	struct io_ring_ctx *ctx;
+	struct task_struct *task;
+	unsigned to_wait;
+	unsigned nr_timeouts;
+};
+
+static inline bool io_should_wake(struct io_wait_queue *iowq)
+{
+	struct io_ring_ctx *ctx = iowq->ctx;
+
+	/*
+	 * Wake up if we have enough events, or if a timeout occurred since we
+	 * started waiting. For timeouts, we always want to return to userspace,
+	 * regardless of event count.
+	 */
+	return io_cqring_events(ctx->rings) >= iowq->to_wait ||
+			atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
+}
+
+static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
+			    int wake_flags, void *key)
+{
+	struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
+							wq);
+
+	if (io_should_wake(iowq)) {
+		list_del_init(&curr->entry);
+		wake_up_process(iowq->task);
+		return 1;
+	}
+
+	return -1;
+}
+
 /*
  * Wait until events become available, if we don't already have some. The
  * application must reap them itself, as they reside on the shared cq ring.
@@ -2775,8 +2811,16 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit,
 static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			  const sigset_t __user *sig, size_t sigsz)
 {
+	struct io_wait_queue iowq = {
+		.wq = {
+			.func	= io_wake_function,
+			.entry	= LIST_HEAD_INIT(iowq.wq.entry),
+		},
+		.task		= current,
+		.ctx		= ctx,
+		.to_wait	= min_events,
+	};
 	struct io_rings *rings = ctx->rings;
-	unsigned nr_timeouts;
-	int ret;
+	int ret = 0;
 
 	if (io_cqring_events(rings) >= min_events)
@@ -2795,15 +2839,20 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		return ret;
 	}
 
-	nr_timeouts = atomic_read(&ctx->cq_timeouts);
-	/*
-	 * Return if we have enough events, or if a timeout occured since
-	 * we started waiting. For timeouts, we always want to return to
-	 * userspace.
-	 */
-	ret = wait_event_interruptible(ctx->wait,
-			io_cqring_events(rings) >= min_events ||
-			atomic_read(&ctx->cq_timeouts) != nr_timeouts);
+	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
+	prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, TASK_INTERRUPTIBLE);
+	do {
+		if (io_should_wake(&iowq))
+			break;
+		schedule();
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
+		set_current_state(TASK_INTERRUPTIBLE);
+	} while (1);
+	finish_wait(&ctx->wait, &iowq.wq);
+
 	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
 	if (ret == -ERESTARTSYS)
 		ret = -EINTR;

-- 
Jens Axboe