Subject: Re: [PATCH V10 09/19] block: introduce bio_bvecs()
From: Sagi Grimberg
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, linux-block@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Dave Chinner,
    Kent Overstreet, Mike Snitzer, dm-devel@redhat.com, Alexander Viro,
    linux-fsdevel@vger.kernel.org, Shaohua Li, linux-raid@vger.kernel.org,
    linux-erofs@lists.ozlabs.org, David Sterba, linux-btrfs@vger.kernel.org,
    "Darrick J. Wong", linux-xfs@vger.kernel.org, Gao Xiang,
    Theodore Ts'o, linux-ext4@vger.kernel.org, Coly Li,
    linux-bcache@vger.kernel.org, Boaz Harrosh, Bob Peterson,
    cluster-devel@redhat.com
Date: Tue, 20 Nov 2018 20:25:46 -0800
Message-ID: <2a47d336-c19b-6bf4-c247-d7382871eeea@grimberg.me>
In-Reply-To: <20181121034415.GA8408@ming.t460p>

>> I would like to avoid growing bvec tables and keep everything
>> preallocated. Plus, a bvec_iter operates on a bvec, which means
>> we'll need a table there as well... Not liking it so far...
>
> In case of bios in one request, we can't know how many bvecs there
> are except by calling rq_bvecs(), so it may not be suitable to
> preallocate the table. If you have to send the IO request in one
> send(), runtime allocation may be inevitable.

I don't want to do that; I want to work on a single bvec at a time,
like the current implementation does.

> If you don't require sending the IO request in one send(), you may
> send one bio at a time and just use the bio's bvec table directly,
> such as the single-bio case in lo_rw_aio().

We'd need some indication that we have to reinit the iter with the new
bvec; today we do:

static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	req->snd.data_sent += len;
	req->pdu_sent += len;
	iov_iter_advance(&req->snd.iter, len);

	/* exhausted this bio's data but not the request? move to next bio */
	if (!iov_iter_count(&req->snd.iter) &&
	    req->snd.data_sent < req->data_len) {
		req->snd.curr_bio = req->snd.curr_bio->bi_next;
		nvme_tcp_init_send_iter(req);
	}
}

and initialize the send iter. I imagine that now I will need to switch
to the next bvec, and only when I'm on the last one move to the next
bio... Do you offer an API for that?

>>> Can this way avoid your blocking issue? You may see this
>>> example in the 'rq->bio != rq->biotail' branch of lo_rw_aio().
>>
>> This is exactly an example of not ignoring the bios...
>
> Yeah, that is the most common example, given that merging is enabled
> in most cases. If the driver or device doesn't care about merging,
> you can disable it and always get single-bio requests; then the
> bio's bvec table can be reused for send().
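If I follow, for the single-bio case that would mean pointing the send
iterator straight at the bio's bvec table, roughly like the sketch
below (illustration only; the helper name is made up, and req->snd just
mirrors our current fields):

/*
 * Sketch, not in-tree code: reuse a single bio's bvec table for the
 * send iterator, with no table allocation of our own.
 */
static void nvme_tcp_init_send_iter_from_bio(struct nvme_tcp_request *req,
		struct bio *bio)
{
	/* hand the bio's own bvec table to the iov_iter */
	iov_iter_bvec(&req->snd.iter, WRITE, bio->bi_io_vec,
		      bio_segments(bio), bio->bi_iter.bi_size);
}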
Does bvec_iter span bvecs with your patches? I didn't see that change.

>> I'm not sure how this helps me either. Unless we can set a bvec_iter
>> to span bvecs, or have an abstract bio crossing when we re-initialize
>> the bvec_iter, I don't see how I can ignore bios completely...
>
> rq_for_each_bvec() will iterate over all bvecs from all bios, so you
> needn't see any bio in this req.

But I don't need this iteration; I need a transparent API like:

	bvec2 = rq_bvec_next(rq, bvec)

This way I can simply always reinit my iter without thinking about how
the request/bios/bvecs are constructed...

> rq_bvecs() will return how many bvecs there are in this request
> (covering all bios in this req)

Still not very useful, given that I don't want to use a table...

>>> So it looks like the nvme-tcp host driver might be the 2nd driver
>>> which benefits from multi-page bvecs directly.
>>>
>>> The multi-page bvec V11 has passed my tests and addressed almost
>>> all the comments during review on V10. I removed bio_vecs() in V11,
>>> but it won't be a big deal; we can introduce them anytime there is
>>> a requirement.
>>
>> Multi-page bvecs and nvme-tcp are going to conflict, so it would be
>> good to coordinate on this. I think that the nvme-tcp host needs some
>> adjustments in how it sets up a bvec_iter. I'm under the impression
>> that the change is rather small and self-contained, but I'm not sure
>> I have the full picture here.
>
> I guess I may not have gotten nvme-tcp's exact requirement on the
> block IO iterator, :-(

They are pretty much listed above. Today nvme-tcp sets up an iterator
with:

	vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
	nsegs = bio_segments(bio);
	size = bio->bi_iter.bi_size;
	offset = bio->bi_iter.bi_bvec_done;
	iov_iter_bvec(&req->snd.iter, WRITE, vec, nsegs, size);

and when done, iterates to the next bio and does the same. With
multi-page bvecs it would be great if we could simply have something
like rq_bvec_next(), which would pretty much satisfy the requirements
from the nvme-tcp side...
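To make the ask concrete, with a hypothetical rq_bvec_next() the
advance path above could collapse to something like this (pure sketch:
rq_bvec_next() and the curr_bvec field don't exist today, they are
only meant to show the shape of the API I'm after):

/*
 * Pure sketch: rq_bvec_next() is the hypothetical API requested above
 * and req->snd.curr_bvec is an invented field; neither exists in the
 * tree. blk_mq_rq_from_pdu() recovers the request from our pdu.
 */
static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	struct request *rq = blk_mq_rq_from_pdu(req);

	req->snd.data_sent += len;
	req->pdu_sent += len;
	iov_iter_advance(&req->snd.iter, len);

	if (!iov_iter_count(&req->snd.iter) &&
	    req->snd.data_sent < req->data_len) {
		/* step to the next bvec, regardless of bio boundaries */
		req->snd.curr_bvec = rq_bvec_next(rq, req->snd.curr_bvec);
		iov_iter_bvec(&req->snd.iter, WRITE, req->snd.curr_bvec,
			      1, req->snd.curr_bvec->bv_len);
	}
}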