Subject: Re: [PATCH 12/12] io_uring: support true async buffered reads, if file provides it
To: Pavel Begunkov, io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20200523185755.8494-1-axboe@kernel.dk> <20200523185755.8494-13-axboe@kernel.dk> <8d429d6b-81ee-0a28-8533-2e1d4faa6b37@gmail.com> <717e474a-5168-8e1e-2e02-c1bdff007bd9@kernel.dk>
From: Jens Axboe
Message-ID: <69516a01-a209-8a7e-6b9a-7d5b6fef4e96@kernel.dk>
Date: Tue, 26 May 2020 07:50:49 -0600
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On 5/26/20 1:44 AM, Pavel Begunkov wrote:
> On 25/05/2020 22:59, Jens Axboe wrote:
>> On 5/25/20 1:29 AM, Pavel Begunkov wrote:
>>> On 23/05/2020 21:57, Jens Axboe wrote:
>>>> If the file is flagged with FMODE_BUF_RASYNC, then we don't have to punt
>>>> the buffered read to an io-wq worker. Instead we can rely on page
>>>> unlocking callbacks to support retry based async IO. This is a lot more
>>>> efficient than doing async thread offload.
>>>>
>>>> The retry is done similarly to how we handle poll based retry. From
>>>> the unlock callback, we simply queue the retry to a task_work based
>>>> handler.
>>>>
>>>> Signed-off-by: Jens Axboe
>>>> ---
>>>>  fs/io_uring.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 99 insertions(+)
>>>>
>>> ...
>>>> +
>>>> +	init_task_work(&rw->task_work, io_async_buf_retry);
>>>> +	/* submit ref gets dropped, acquire a new one */
>>>> +	refcount_inc(&req->refs);
>>>> +	tsk = req->task;
>>>> +	ret = task_work_add(tsk, &rw->task_work, true);
>>>> +	if (unlikely(ret)) {
>>>> +		/* queue just for cancelation */
>>>> +		init_task_work(&rw->task_work, io_async_buf_cancel);
>>>> +		tsk = io_wq_get_task(req->ctx->io_wq);
>>>
>>> IIRC, task will be put somewhere around io_free_req(). Then shouldn't here be
>>> some juggling with reassigning req->task with task_{get,put}()?
>>
>> Not sure I follow? Yes, we'll put this task again when the request
>> is freed, but not sure what you mean with juggling?
>
> I meant something like:
>
> ...
> 	/* queue just for cancelation */
> 	init_task_work(&rw->task_work, io_async_buf_cancel);
> +	put_task_struct(req->task);
> +	req->task = get_task_struct(io_wq_task);
>
> but, thinking twice, if I got the whole idea right, it should be ok as
> is -- io-wq won't go away before the request anyway, and leaving
> req->task pinned down for a bit is not a problem.

OK good, then I think we agree it's fine.

>>>> +		task_work_add(tsk, &rw->task_work, true);
>>>> +	}
>>>> +	wake_up_process(tsk);
>>>> +	return 1;
>>>> +}
>>> ...
>>>> static int io_read(struct io_kiocb *req, bool force_nonblock)
>>>> {
>>>> 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
>>>> @@ -2601,6 +2696,7 @@ static int io_read(struct io_kiocb *req, bool force_nonblock)
>>>>  	if (!ret) {
>>>>  		ssize_t ret2;
>>>>
>>>> +retry:
>>>>  		if (req->file->f_op->read_iter)
>>>>  			ret2 = call_read_iter(req->file, kiocb, &iter);
>>>>  		else
>>>> @@ -2619,6 +2715,9 @@ static int io_read(struct io_kiocb *req, bool force_nonblock)
>>>>  			if (!(req->flags & REQ_F_NOWAIT) &&
>>>>  			    !file_can_poll(req->file))
>>>>  				req->flags |= REQ_F_MUST_PUNT;
>>>> +			if (io_rw_should_retry(req))
>>>
>>> It looks like a state machine with IOCB_WAITQ and gotos. Wouldn't it be cleaner
>>> to call call_read_iter()/loop_rw_iter() here directly instead of "goto retry" ?
>>
>> We could, probably making that part a separate helper then. How about the
>> below incremental?
>
> IMHO, it was easy to get lost with such implicit state switching.
> Looks better now! See a small comment below.

Agree, that is cleaner.
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index a5a4d9602915..669dccd81207 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2677,6 +2677,13 @@ static bool io_rw_should_retry(struct io_kiocb *req)
>>  	return false;
>>  }
>>
>> +static int __io_read(struct io_kiocb *req, struct iov_iter *iter)
>> +{
>> +	if (req->file->f_op->read_iter)
>> +		return call_read_iter(req->file, &req->rw.kiocb, iter);
>> +	return loop_rw_iter(READ, req->file, &req->rw.kiocb, iter);
>> +}
>> +
>>  static int io_read(struct io_kiocb *req, bool force_nonblock)
>>  {
>>  	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
>> @@ -2710,11 +2717,7 @@ static int io_read(struct io_kiocb *req, bool force_nonblock)
>>  	if (!ret) {
>>  		ssize_t ret2;
>>
>> -retry:
>> -		if (req->file->f_op->read_iter)
>> -			ret2 = call_read_iter(req->file, kiocb, &iter);
>> -		else
>> -			ret2 = loop_rw_iter(READ, req->file, kiocb, &iter);
>> +		ret2 = __io_read(req, &iter);
>>
>>  		/* Catch -EAGAIN return for forced non-blocking submission */
>>  		if (!force_nonblock || ret2 != -EAGAIN) {
>> @@ -2729,8 +2732,11 @@ static int io_read(struct io_kiocb *req, bool force_nonblock)
>>  			if (!(req->flags & REQ_F_NOWAIT) &&
>>  			    !file_can_poll(req->file))
>>  				req->flags |= REQ_F_MUST_PUNT;
>> -			if (io_rw_should_retry(req))
>> -				goto retry;
>> +			if (io_rw_should_retry(req)) {
>> +				ret2 = __io_read(req, &iter);
>> +				if (ret2 != -EAGAIN)
>> +					goto out_free;
>
> "goto out_free" returns ret=0, so someone should add a cqe
>
> 	if (ret2 != -EAGAIN) {
> 		kiocb_done(kiocb, ret2);
> 		goto free_out;
> 	}

Fixed up in the current one.

-- 
Jens Axboe