Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933081AbcLMO3h (ORCPT ); Tue, 13 Dec 2016 09:29:37 -0500 Received: from mail-oi0-f68.google.com ([209.85.218.68]:35883 "EHLO mail-oi0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752597AbcLMO3e (ORCPT ); Tue, 13 Dec 2016 09:29:34 -0500 MIME-Version: 1.0 In-Reply-To: References: <1481230746-16741-1-git-send-email-sstabellini@kernel.org> <1481230746-16741-4-git-send-email-sstabellini@kernel.org> <20161210015059.GD1555@ZenIV.linux.org.uk> From: Latchesar Ionkov Date: Tue, 13 Dec 2016 07:29:33 -0700 X-Google-Sender-Auth: lyeeUNmlzgb_gwvmw-IaVGxt7E0 Message-ID: Subject: Re: [PATCH 4/5] 9p: introduce async read requests To: Stefano Stabellini Cc: Al Viro , V9FS Developers , Eric Van Hensbergen , rminnich@sandia.gov, Linux Kernel Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4788 Lines: 100 If the user asked for more than msize-iohdrsz (or rather iounit) bytes, you should do your best to read as much as possible before you return to user space. That means that if a read returns the maximum possible count you HAVE to issue another read until you get a short read. What Al is hinting at is that you can issue multiple read requests simultaneously, assuming that most of them will return data. So the way I see your options are two, with an additional tweak. The first option is to issue more read calls when the previous ones complete (your second option). The second option is at the beginning to issue all the read calls simultaneously. And the tweak is to do something in between and issue up to a limit of simultaneous read calls. Thanks, Lucho On Mon, Dec 12, 2016 at 6:15 PM, Stefano Stabellini wrote: > Hi Al, > thanks for looking at the patch. > > On Sat, 10 Dec 2016, Al Viro wrote: >> On Thu, Dec 08, 2016 at 12:59:05PM -0800, Stefano Stabellini wrote: >> >> >> > + } else { >> > + req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize); >> > + if (IS_ERR(req)) { >> > + *err = PTR_ERR(req); >> > + break; >> > + } >> > + req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec, >> > + (size_t)rsize, &req->offset); >> > + req->kiocb = iocb; >> > + for (i = 0; i < req->rsize; i += PAGE_SIZE) >> > + page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]); >> > + req->callback = p9_client_read_complete; >> > + >> > + *err = clnt->trans_mod->request(clnt, req); >> > + if (*err < 0) { >> > + clnt->status = Disconnected; >> > + release_pages(req->pagevec, >> > + (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE, >> > + true); >> > + kvfree(req->pagevec); >> > + p9_free_req(clnt, req); >> > + break; >> > + } >> > + >> > + *err = -EIOCBQUEUED; >> >> IDGI. AFAICS, your code will result in shitloads of short reads - every >> time when you give it a multi-iovec array, only the first one will be >> issued and the rest won't be even looked at. Sure, it is technically >> legal, but I very much doubt that aio users will be happy with that. >> >> What am I missing here? > > There is a problem with short reads, but I don't think it is the one > you described, unless I made a mistake somewhere. > > The code above will issue one request as big as possible (size is > limited by clnt->msize, which is the size of the channel). No matter how > many segs in the iov_iter, the request will cover the maximum amount of > bytes allowed by the channel, typically 4K but can be larger. So > multi-iovec arrays should be handled correctly up to msize. Please let > me know if I am wrong, I am not an expert on this. I tried, but couldn't > actually get any multi-iovec arrays in my tests. > > > That said, reading the code again, there are indeed two possible causes > of short reads. The first one are responses which have a smaller byte > count than requested. Currently this case is not handled, forwarding > the short read up to the user. But I wrote and tested successfully a > patch that issues another follow-up request from the completion > function. Example: > p9_client_read, issue request, size 4K > p9_client_read_complete, receive response, count 2K > p9_client_read_complete, issue request size 4K-2K = 2K > p9_client_read_complete, receive response size 2K > p9_client_read_complete, call ki_complete > > > The second possible cause of short reads are requests which are bigger > than msize. For example a 2MB read over a 4K channel. In this patch we > simply issue one request as big as msize, and eventually return to the > caller a smaller byte count. One option is to simply fall back to sync > IO in these cases. Another solution is to use the same technique > described above: we issue the first request as big as msize, then, from > the callback function, we issue a follow-up request, and again, and > again, until we fully complete the large read. This way, although we are > not issuing any simultaneous requests for a specific large read, at > least we can issue other aio requests in parallel for other reads or > writes. It should still be more performant than sync IO. > > I am thinking of implementing this second option in the next version of > the series.