Date: Fri, 9 Dec 2016 14:22:54 -0800 (PST)
From: Stefano Stabellini
To: Dominique Martinet
cc: Stefano Stabellini, v9fs-developer@lists.sourceforge.net, ericvh@gmail.com, rminnich@sandia.gov, linux-kernel@vger.kernel.org, lucho@ionkov.net
Subject: Re: [V9fs-developer] [PATCH 4/5] 9p: introduce async read requests
In-Reply-To: <20161209072717.GD18158@nautica>
References: <1481230746-16741-1-git-send-email-sstabellini@kernel.org>
 <1481230746-16741-4-git-send-email-sstabellini@kernel.org>
 <20161209072717.GD18158@nautica>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 9 Dec 2016, Dominique Martinet wrote:
> Stefano Stabellini wrote on Thu, Dec 08, 2016:
> > If the read is an async operation, send a 9p request and return
> > EIOCBQUEUED. Do not wait for completion.
> >
> > Complete the read operation from a callback instead.
> >
> > Signed-off-by: Stefano Stabellini
> > ---
> >  net/9p/client.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 86 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/9p/client.c b/net/9p/client.c
> > index eb589ef..f9f09db 100644
> > --- a/net/9p/client.c
> > +++ b/net/9p/client.c
> > @@ -28,6 +28,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -1554,13 +1555,68 @@ int p9_client_unlinkat(struct p9_fid *dfid, const char *name, int flags)
> >  }
> >  EXPORT_SYMBOL(p9_client_unlinkat);
> >
> > +static void
> > +p9_client_read_complete(struct p9_client *clnt, struct p9_req_t *req, int status)
> > +{
> > +	int err, count, n, i, total = 0;
> > +	char *dataptr, *to;
> > +
> > +	if (req->status == REQ_STATUS_ERROR) {
> > +		p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
> > +		err = req->t_err;
> > +		goto out;
> > +	}
> > +	err = p9_check_errors(clnt, req);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = p9pdu_readf(req->rc, clnt->proto_version,
> > +			"D", &count, &dataptr);
> > +	if (err) {
> > +		trace_9p_protocol_dump(clnt, req->rc);
> > +		goto out;
> > +	}
> > +	if (!count) {
> > +		p9_debug(P9_DEBUG_ERROR, "count=%d\n", count);
> > +		err = 0;
> > +		goto out;
> > +	}
> > +
> > +	p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
> > +	if (count > req->rsize)
> > +		count = req->rsize;
> > +
> > +	for (i = 0; i < ((req->rsize + PAGE_SIZE - 1) / PAGE_SIZE); i++) {
> > +		to = kmap(req->pagevec[i]);
> > +		to += req->offset;
> > +		n = PAGE_SIZE - req->offset;
> > +		if (n > count)
> > +			n = count;
> > +		memcpy(to, dataptr, n);
> > +		kunmap(req->pagevec[i]);
> > +		req->offset = 0;
> > +		count -= n;
> > +		total += n;
> > +	}
> > +
> > +	err = total;
> > +	req->kiocb->ki_pos += total;
> > +
> > +out:
> > +	req->kiocb->ki_complete(req->kiocb, err, 0);
> > +
> > +	release_pages(req->pagevec, (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE, false);
> > +	kvfree(req->pagevec);
> > +	p9_free_req(clnt, req);
> > +}
> > +
> >  int
> >  p9_client_read(struct p9_fid *fid, struct kiocb *iocb, u64 offset,
> >  		struct iov_iter *to, int *err)
> >  {
> >  	struct p9_client *clnt = fid->clnt;
> >  	struct p9_req_t *req;
> > -	int total = 0;
> > +	int total = 0, i;
> >  	*err = 0;
> >
> >  	p9_debug(P9_DEBUG_9P, ">>> TREAD fid %d offset %llu %d\n",
> > @@ -1587,10 +1643,38 @@ int p9_client_unlinkat(struct p9_fid *dfid, const char *name, int flags)
> >  			req = p9_client_zc_rpc(clnt, P9_TREAD, to, NULL, rsize,
> >  					       0, 11, "dqd", fid->fid,
> >  					       offset, rsize);
> > -		} else {
> > +		/* sync request */
> > +		} else if(iocb == NULL || is_sync_kiocb(iocb)) {
> >  			non_zc = 1;
> >  			req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
> >  					    rsize);
> > +		/* async request */
> > +		} else {
>
> I'm not too familiar with iocb/how async IOs should work, but a logic
> question just to make sure that has been thought out:
> We prefer zc here to async, even if zc can be slow?
>
> Ideally at some point zc and async aren't exclusive so we'll have async
> zc and async normal, but for now I'd say async comes before zc - yes
> there will be an extra copy in memory, but it will be done
> asynchronously.
> Was it intentional to prefer zc here?

I wasn't sure what to do about zc. The backends I am testing with don't
support zc, so I didn't feel confident changing its behavior.

I think whether zc is faster than async+copy depends on the specific
benchmark: the iodepth and blocksize parameters in fio, for example.
With iodepth=1, zc would be faster; the higher the iodepth, the faster
async+copy becomes in comparison. At some point async+copy will overtake
zc, but I am not sure where the threshold is, and it would probably
depend on the storage backend too. Maybe around iodepth=3, but that is
only a guess: I haven't run any numbers to confirm it.

That said, I am happy to follow any strategy you suggest with regard to
zc.
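
To make the two orderings concrete, here is a minimal userspace sketch,
not kernel code. All names in it are hypothetical: zc_eligible stands in
for whatever condition guards the zero-copy branch in p9_client_read,
and async_iocb stands for "iocb != NULL && !is_sync_kiocb(iocb)".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in net/9p. */
enum read_path { READ_PATH_SYNC, READ_PATH_ZC, READ_PATH_ASYNC };

/*
 * Ordering in the patch as posted: the zero-copy branch is tested first.
 * zc_eligible is an assumed stand-in for the condition guarding that
 * branch; async_iocb means iocb != NULL && !is_sync_kiocb(iocb).
 */
static enum read_path pick_zc_first(bool zc_eligible, bool async_iocb)
{
	if (zc_eligible)
		return READ_PATH_ZC;
	return async_iocb ? READ_PATH_ASYNC : READ_PATH_SYNC;
}

/* Ordering suggested in review: an async iocb is tested before zero-copy. */
static enum read_path pick_async_first(bool zc_eligible, bool async_iocb)
{
	if (async_iocb)
		return READ_PATH_ASYNC;
	return zc_eligible ? READ_PATH_ZC : READ_PATH_SYNC;
}

/* The two orderings disagree only when both conditions hold. */
static void check_orderings(void)
{
	assert(pick_zc_first(true, true) == READ_PATH_ZC);
	assert(pick_async_first(true, true) == READ_PATH_ASYNC);
	assert(pick_zc_first(true, false) == pick_async_first(true, false));
	assert(pick_zc_first(false, true) == pick_async_first(false, true));
	assert(pick_zc_first(false, false) == pick_async_first(false, false));
}
```

So the choice only matters for async iocbs that would also qualify for
zc, which is exactly the case I cannot test with my backends.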

> > +			req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
> > +			if (IS_ERR(req)) {
> > +				*err = PTR_ERR(req);
> > +				break;
> > +			}
> > +			req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec,
> > +					(size_t)rsize, &req->offset);
> > +			req->kiocb = iocb;
> > +			for (i = 0; i < req->rsize; i += PAGE_SIZE)
> > +				page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]);
> > +			req->callback = p9_client_read_complete;
> > +
> > +			*err = clnt->trans_mod->request(clnt, req);
> > +			if (*err < 0) {
> > +				clnt->status = Disconnected;
> > +				release_pages(req->pagevec,
> > +						(req->rsize + PAGE_SIZE - 1) / PAGE_SIZE,
> > +						true);
> > +				kvfree(req->pagevec);
> > +				p9_free_req(clnt, req);
> > +				break;
> > +			}
> > +
> > +			*err = -EIOCBQUEUED;
> > +			break;
>
> (Just a note to myself (or anyone who wants to do this) that a couple of
> places only look at err if size written is 0... They should check if err
> has been set)
>
> >  		}
> >  		if (IS_ERR(req)) {
> >  			*err = PTR_ERR(req);
>