MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.10.1612121635020.2777@sstabellini-ThinkPad-X260>
References: <alpine.DEB.2.10.1612081249000.22778@sstabellini-ThinkPad-X260>
 <1481230746-16741-1-git-send-email-sstabellini@kernel.org>
 <1481230746-16741-4-git-send-email-sstabellini@kernel.org>
 <20161210015059.GD1555@ZenIV.linux.org.uk> <alpine.DEB.2.10.1612121635020.2777@sstabellini-ThinkPad-X260>
From: Latchesar Ionkov <lucho@ionkov.net>
Date: Tue, 13 Dec 2016 07:29:33 -0700
Message-ID: <CAOha14z_D4S6Z6sS9V=dXSzxTf5v1DW-zSi-43K7z+YyBi9w3w@mail.gmail.com>
Subject: Re: [PATCH 4/5] 9p: introduce async read requests
To: Stefano Stabellini <sstabellini@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
        V9FS Developers <v9fs-developer@lists.sourceforge.net>,
        Eric Van Hensbergen <ericvh@gmail.com>, rminnich@sandia.gov,
        Linux Kernel <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4788
Lines: 100

If the user asked for more than msize-iohdrsz (or rather iounit)
bytes, you should do your best to read as much as possible before you
return to user space. That means that if a read returns the maximum
possible count you HAVE to issue another read until you get a short
read.

What Al is hinting at is that you can issue multiple read requests
simultaneously, assuming that most of them will return data.

So the way I see your options are two, with an additional tweak. The
first option is to issue more read calls when the previous ones
complete (your second option). The second option is at the beginning
to issue all the read calls simultaneously. And the tweak is to do
something in between and issue up to a limit of simultaneous read
calls.

Thanks,
    Lucho

On Mon, Dec 12, 2016 at 6:15 PM, Stefano Stabellini
<sstabellini@kernel.org> wrote:
> Hi Al,
> thanks for looking at the patch.
>
> On Sat, 10 Dec 2016, Al Viro wrote:
>> On Thu, Dec 08, 2016 at 12:59:05PM -0800, Stefano Stabellini wrote:
>>
>>
>> > +           } else {
>> > +                   req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
>> > +                   if (IS_ERR(req)) {
>> > +                           *err = PTR_ERR(req);
>> > +                           break;
>> > +                   }
>> > +                   req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec,
>> > +                                   (size_t)rsize, &req->offset);
>> > +                   req->kiocb = iocb;
>> > +                   for (i = 0; i < req->rsize; i += PAGE_SIZE)
>> > +                           page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]);
>> > +                   req->callback = p9_client_read_complete;
>> > +
>> > +                   *err = clnt->trans_mod->request(clnt, req);
>> > +                   if (*err < 0) {
>> > +                           clnt->status = Disconnected;
>> > +                           release_pages(req->pagevec,
>> > +                                           (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE,
>> > +                                           true);
>> > +                           kvfree(req->pagevec);
>> > +                           p9_free_req(clnt, req);
>> > +                           break;
>> > +                   }
>> > +
>> > +                   *err = -EIOCBQUEUED;
>>
>> IDGI.  AFAICS, your code will result in shitloads of short reads - every
>> time when you give it a multi-iovec array, only the first one will be
>> issued and the rest won't be even looked at.  Sure, it is technically
>> legal, but I very much doubt that aio users will be happy with that.
>>
>> What am I missing here?
>
> There is a problem with short reads, but I don't think it is the one
> you described, unless I made a mistake somewhere.
>
> The code above will issue one request as big as possible (size is
> limited by clnt->msize, which is the size of the channel). No matter how
> many segs in the iov_iter, the request will cover the maximum amount of
> bytes allowed by the channel, typically 4K but can be larger. So
> multi-iovec arrays should be handled correctly up to msize. Please let
> me know if I am wrong, I am not an expert on this. I tried, but couldn't
> actually get any multi-iovec arrays in my tests.
>
>
> That said, reading the code again, there are indeed two possible causes
> of short reads. The first one are responses which have a smaller byte
> count than requested. Currently this case is not handled, forwarding
> the short read up to the user. But I wrote and tested successfully a
> patch that issues another follow-up request from the completion
> function. Example:
>   p9_client_read, issue request, size 4K
>   p9_client_read_complete, receive response, count 2K
>   p9_client_read_complete, issue request size 4K-2K = 2K
>   p9_client_read_complete, receive response size 2K
>   p9_client_read_complete, call ki_complete
>
>
> The second possible cause of short reads are requests which are bigger
> than msize. For example a 2MB read over a 4K channel. In this patch we
> simply issue one request as big as msize, and eventually return to the
> caller a smaller byte count. One option is to simply fall back to sync
> IO in these cases. Another solution is to use the same technique
> described above: we issue the first request as big as msize, then, from
> the callback function, we issue a follow-up request, and again, and
> again, until we fully complete the large read. This way, although we are
> not issuing any simultaneous requests for a specific large read, at
> least we can issue other aio requests in parallel for other reads or
> writes. It should still be more performant than sync IO.
>
> I am thinking of implementing this second option in the next version of
> the series.