Received: by 2002:a25:2c96:0:0:0:0:0 with SMTP id s144csp450511ybs; Sun, 24 May 2020 10:17:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy7Fxb9yiKQka/WBEFqD07NG0fPutRBCtAtfb+lEuzFoEdzZmMaFFYW3qSrFeNlAfj3eVs8 X-Received: by 2002:a17:906:b31a:: with SMTP id n26mr15378969ejz.441.1590340637841; Sun, 24 May 2020 10:17:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590340637; cv=none; d=google.com; s=arc-20160816; b=pTwpsBQfGNwwM2KKenXMeDUltRusQg1tCTJ9OV235JaHxhYjKts/MjjzEOKphCRyjK 4E6H6Acy993YTFzto3FQF5EEX/hJ1Al83olnfxd0A7jycKcOJ+CPkWyVvUqGeh6cJVkR kQvkddx4fz5OTzwymscrsHxr4VJWQVC8Z4WaVyDlOtAzmPyke33Y0Ih04VuZVXNKCvqQ BHPgx2D8o8hyy7nfVvueMWayIZQd8XZwigjKCYWEopJa7rIDYKdWQvto4Oi/yg7uUvD2 YZaw92OwrL0Ic7ucLZfkfBua21OT0dsdQiHRAVJ3f346UjlJ3tqWmIZ9ZdkJ0EPa8OWx iMPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=xZU7YPutzDudiXDbOu5V9kxMpNCAHn/94lq+JvMMNGA=; b=emgP0bqJltdNTiBDmBcfd2WCBX0wz9pwHcJOe3h4l2d5LXYm0bzJfn2G0tQxKUZY7j 4LPyIoSlvjruMBp6V1MNFF5GK57KI7jFbQFsn7BVg/HURn4XqssSq58u3UVLDZgJgiJe 6eHNmczRqUbkUPN6AJ0RlTlz77PndIX5W7sIZX06KbA7U0u9Ro8bWrv+l9YUN86gkaHF z/hibQCIGXBLn7WM97oL4GUMCHPsN0iGMgK2PBdDYCZDpICPOYuxosEQChXU9BIF44fy qife6b5sJ8Ivr0Uu7kwbpEX7QYWJpKc8bh/Kjhe/z/g5bC+UEM9xV/5lyIGkQdIWDyiy 4eKA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel-dk.20150623.gappssmtp.com header.s=20150623 header.b=s3WWLmhI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a23si8380057ejk.653.2020.05.24.10.16.54; Sun, 24 May 2020 10:17:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel-dk.20150623.gappssmtp.com header.s=20150623 header.b=s3WWLmhI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729338AbgEXRMu (ORCPT + 99 others); Sun, 24 May 2020 13:12:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45050 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729045AbgEXRMu (ORCPT ); Sun, 24 May 2020 13:12:50 -0400 Received: from mail-pg1-x541.google.com (mail-pg1-x541.google.com [IPv6:2607:f8b0:4864:20::541]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3ED8EC08C5C0 for ; Sun, 24 May 2020 10:12:50 -0700 (PDT) Received: by mail-pg1-x541.google.com with SMTP id j21so7683697pgb.7 for ; Sun, 24 May 2020 10:12:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=xZU7YPutzDudiXDbOu5V9kxMpNCAHn/94lq+JvMMNGA=; b=s3WWLmhIBDzWHUwLoFqnGK0yBXoTyqmTZJLgIGqlAUxRBDhAfMZwdqyk3Zsnnu9fd4 /5MP0Rk6jT0nU3BIeuywIlcoU4UapexFBY2evnBLVr6MuE4WSrNOId3FsiIxNN2884TZ 8cVqh9J2/L0anXIafBCS19mEzqQY0MNNZG4nC3xBMnNe2cL+hKfCZj0ZUQ1ZQYf1Hh2i a9NAdULi4EXWgYSZNJyq7QBcVUwieQ2Z6kaqOQwaALR+KGYwiqxNlHNskqNol2GHI3k5 TSJnrBRuVtNuQHmw4zsag6UaCs5IKIQw5KqubNpAGDm3HkiTeIuOvY/nWCrbAZoVM4bR wxOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=xZU7YPutzDudiXDbOu5V9kxMpNCAHn/94lq+JvMMNGA=; b=PJ4Gctst5Qp/GloeuyT/bU+e6nkzkMajU6ZNRhOqixtGIcnd5roxFLZBaDnXI02hLR 4152OcwqK94i4RuweLSPHW+/8sI8I5Qs229M7z/hplS8HvQSWZAB3K4JUeHTidgDX1/K pdojfBsF49m0uszxTx2kPUXZG+Sasq7JmTgEhnpXHINn4o132KJrAjnw9gBioTiM32WZ 5M5RHxrDFijNMKjp+Bhdd1obmLi5vP6cbn+6cr870+XJCQN/H9QC/Frgu9iP45q40fF/ CY2bRmOx539QOGe4hCt+tUgxfECvPcYeYuklkuLO+xC6sg7hvuVEDlfTjAODsRzHyIST zrnw== X-Gm-Message-State: AOAM533O/Hhjk/MP79T2vYO7q5UIRm4srhsVRPHUAj0XXVKp0yFfoThP BSFshUgeVskeOZYukZnp29Nbt9dwlHLiRQ== X-Received: by 2002:a05:6a00:46:: with SMTP id i6mr13904666pfk.146.1590340369238; Sun, 24 May 2020 10:12:49 -0700 (PDT) Received: from ?IPv6:2605:e000:100e:8c61:8568:4ec4:ebd3:32d1? ([2605:e000:100e:8c61:8568:4ec4:ebd3:32d1]) by smtp.gmail.com with ESMTPSA id r21sm10949605pjo.2.2020.05.24.10.12.48 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 24 May 2020 10:12:48 -0700 (PDT) Subject: Re: [PATCH 05/12] mm: support async buffered reads in generic_file_buffered_read() To: Trond Myklebust , "io-uring@vger.kernel.org" Cc: "linux-mm@kvack.org" , "linux-fsdevel@vger.kernel.org" , "linux-kernel@vger.kernel.org" References: <20200523185755.8494-1-axboe@kernel.dk> <20200523185755.8494-6-axboe@kernel.dk> <264614fc4fa08df2b0899da1cd38bb07150cd7f3.camel@hammerspace.com> <2fa7104a-ea85-55f2-692c-514eb3b88a88@kernel.dk> From: Jens Axboe Message-ID: <476e123f-0269-5e88-5318-de9a63f13048@kernel.dk> Date: Sun, 24 May 2020 11:12:47 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/24/20 11:11 AM, Trond Myklebust wrote: > On Sun, 2020-05-24 at 10:40 -0600, Jens Axboe wrote: >> On 5/24/20 10:30 AM, Jens Axboe wrote: >>> On 5/24/20 8:05 AM, Trond Myklebust wrote: >>>> On Sat, 2020-05-23 at 12:57 -0600, Jens Axboe wrote: >>>>> Use the async page locking infrastructure, if IOCB_WAITQ is set >>>>> in >>>>> the >>>>> passed in iocb. The caller must expect an -EIOCBQUEUED return >>>>> value, >>>>> which means that IO is started but not done yet. This is >>>>> similar to >>>>> how >>>>> O_DIRECT signals the same operation. Once the callback is >>>>> received by >>>>> the caller for IO completion, the caller must retry the >>>>> operation. >>>>> >>>>> Signed-off-by: Jens Axboe >>>>> --- >>>>> mm/filemap.c | 33 ++++++++++++++++++++++++++------- >>>>> 1 file changed, 26 insertions(+), 7 deletions(-) >>>>> >>>>> diff --git a/mm/filemap.c b/mm/filemap.c >>>>> index c746541b1d49..a3b86c9acdc8 100644 >>>>> --- a/mm/filemap.c >>>>> +++ b/mm/filemap.c >>>>> @@ -1219,6 +1219,14 @@ static int >>>>> __wait_on_page_locked_async(struct >>>>> page *page, >>>>> return ret; >>>>> } >>>>> >>>>> +static int wait_on_page_locked_async(struct page *page, >>>>> + struct wait_page_queue >>>>> *wait) >>>>> +{ >>>>> + if (!PageLocked(page)) >>>>> + return 0; >>>>> + return __wait_on_page_locked_async(compound_head(page), >>>>> wait, >>>>> false); >>>>> +} >>>>> + >>>>> /** >>>>> * put_and_wait_on_page_locked - Drop a reference and wait for >>>>> it to >>>>> be unlocked >>>>> * @page: The page to wait for. >>>>> @@ -2058,17 +2066,25 @@ static ssize_t >>>>> generic_file_buffered_read(struct kiocb *iocb, >>>>> index, last_index - >>>>> index); >>>>> } >>>>> if (!PageUptodate(page)) { >>>>> - if (iocb->ki_flags & IOCB_NOWAIT) { >>>>> - put_page(page); >>>>> - goto would_block; >>>>> - } >>>>> - >>>>> /* >>>>> * See comment in do_read_cache_page on >>>>> why >>>>> * wait_on_page_locked is used to avoid >>>>> unnecessarily >>>>> * serialisations and why it's safe. >>>>> */ >>>>> - error = >>>>> wait_on_page_locked_killable(page); >>>>> + if (iocb->ki_flags & IOCB_WAITQ) { >>>>> + if (written) { >>>>> + put_page(page); >>>>> + goto out; >>>>> + } >>>>> + error = >>>>> wait_on_page_locked_async(page, >>>>> + >>>>> iocb- >>>>>> private); >>>> >>>> If it is being used in 'generic_file_buffered_read()' as storage >>>> for a >>>> wait queue, then it is hard to consider this a 'private' field. >>> >>> private isn't the prettiest, and in fact this one in particular is >>> a bit >>> of a mess. It's not clear if it's caller or callee owned. It's >>> generally >>> not used, outside of the old usb gadget code, iomap O_DIRECT, and >>> ocfs2. >>> With FMODE_BUF_RASYNC, the fs obviously can't set it if it uses >>> ->private for buffered IO. >>> >>>> Perhaps either rename and add type checking, or else add a >>>> separate >>>> field altogether to struct kiocb? >>> >>> I'd hate to add a new field and increase the size of the kiocb... >>> One >>> alternative is to do: >>> >>> union { >>> void *private; >>> struct wait_page_queue *ki_waitq; >>> }; >>> >>> and still use IOCB_WAITQ to say that ->ki_waitq is valid. >>> >>> There's also 4 bytes of padding in the kiocb struct. And some >>> fields are >>> only used for O_DIRECT as well, eg ->ki_cookie which is just used >>> for >>> polled O_DIRECT. So we could also do: >>> >>> union { >>> unsigned int ki_cookie; >>> struct wait_page_queue *ki_waitq; >>> }; >>> >>> and still not grow the kiocb. How about we go with this approach, >>> and >>> also add: >>> >>> if (kiocb->ki_flags & IOCB_HIPRI) >>> return -EOPNOTSUPP; >>> >>> to kiocb_wait_page_queue_init() to make sure that this combination >>> isn't >>> valid? >> >> Here's the incremental, which is spread over 3 patches. I think this >> one >> makes sense, as polled IO doesn't support buffered IO. And because >> doing >> an async callback for completion is not how polled IO operates >> anyway, >> so even if buffered IO supported it, we'd not use the callback for >> polled IO anyway. kiocb_wait_page_queue_init() checks and backs this >> up. >> >> >> diff --git a/include/linux/fs.h b/include/linux/fs.h >> index 0ef5f5973b1c..f7b1eb765c6e 100644 >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -317,7 +317,7 @@ enum rw_hint { >> #define IOCB_SYNC (1 << 5) >> #define IOCB_WRITE (1 << 6) >> #define IOCB_NOWAIT (1 << 7) >> -/* iocb->private holds wait_page_async struct */ >> +/* iocb->ki_waitq is valid */ >> #define IOCB_WAITQ (1 << 8) >> >> struct kiocb { >> @@ -332,7 +332,10 @@ struct kiocb { >> int ki_flags; >> u16 ki_hint; >> u16 ki_ioprio; /* See linux/ioprio.h */ >> - unsigned int ki_cookie; /* for ->iopoll */ >> + union { >> + unsigned int ki_cookie; /* for ->iopoll */ >> + struct wait_page_queue *ki_waitq; /* for async >> buffered IO */ >> + }; >> >> randomized_struct_fields_end >> }; >> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h >> index def58de92053..8b65420410ee 100644 >> --- a/include/linux/pagemap.h >> +++ b/include/linux/pagemap.h >> @@ -498,13 +498,16 @@ static inline int >> kiocb_wait_page_queue_init(struct kiocb *kiocb, >> wait_queue_func_t func, >> void *data) >> { >> + /* Can't support async wakeup with polled IO */ >> + if (kiocb->ki_flags & IOCB_HIPRI) >> + return -EINVAL; >> if (kiocb->ki_filp->f_mode & FMODE_BUF_RASYNC) { >> wait->wait.func = func; >> wait->wait.private = data; >> wait->wait.flags = 0; >> INIT_LIST_HEAD(&wait->wait.entry); >> kiocb->ki_flags |= IOCB_WAITQ; >> - kiocb->private = wait; >> + kiocb->ki_waitq = wait; >> return 0; >> } >> >> diff --git a/mm/filemap.c b/mm/filemap.c >> index a3b86c9acdc8..18022de7dc33 100644 >> --- a/mm/filemap.c >> +++ b/mm/filemap.c >> @@ -2077,7 +2077,7 @@ static ssize_t >> generic_file_buffered_read(struct kiocb *iocb, >> goto out; >> } >> error = wait_on_page_locked_async(page, >> - iocb- >>> private); >> + iocb- >>> ki_waitq); >> } else { >> if (iocb->ki_flags & IOCB_NOWAIT) { >> put_page(page); >> @@ -2173,7 +2173,7 @@ static ssize_t >> generic_file_buffered_read(struct kiocb *iocb, >> page_not_up_to_date: >> /* Get exclusive access to the page ... */ >> if (iocb->ki_flags & IOCB_WAITQ) >> - error = lock_page_async(page, iocb->private); >> + error = lock_page_async(page, iocb->ki_waitq); >> else >> error = lock_page_killable(page); >> if (unlikely(error)) > > Ack. That seems cleaner to me. I agree, this is better. Ran it through the testing and updated the series accordingly: https://git.kernel.dk/cgit/linux-block/log/?h=async-buffered.4 -- Jens Axboe