Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752638AbbETJfO (ORCPT ); Wed, 20 May 2015 05:35:14 -0400 Received: from smtp02.citrix.com ([66.165.176.63]:61419 "EHLO SMTP02.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751520AbbETJfL (ORCPT ); Wed, 20 May 2015 05:35:11 -0400 X-IronPort-AV: E=Sophos;i="5.13,464,1427760000"; d="scan'208";a="266817003" Message-ID: <555C5533.8010205@citrix.com> Date: Wed, 20 May 2015 11:34:43 +0200 From: =?windows-1252?Q?Roger_Pau_Monn=E9?= User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: Bob Liu CC: , , , , , Subject: Re: [PATCH v3 2/2] xen/block: add multi-page ring support References: <1431428504-6623-1-git-send-email-bob.liu@oracle.com> <1431428504-6623-2-git-send-email-bob.liu@oracle.com> <5555D4D1.1030705@citrix.com> <5555E15B.8000406@oracle.com> In-Reply-To: <5555E15B.8000406@oracle.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 8bit X-DLP: MIA2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7635 Lines: 189 El 15/05/15 a les 14.06, Bob Liu ha escrit: > > On 05/15/2015 07:13 PM, Roger Pau Monn? wrote: >> El 12/05/15 a les 13.01, Bob Liu ha escrit: >>> Extend xen/block to support multi-page ring, so that more requests can be issued >>> by using more than one pages as the request ring between blkfront and backend. >>> As a result, the performance can get improved significantly. >> ^ s/can get improved/improves/ >> >>> >>> We got some impressive improvements on our highend iscsi storage cluster backend. >>> If using 64 pages as the ring, the IOPS increased about 15 times for the >>> throughput testing and above doubled for the latency testing. >>> >>> The reason was the limit on outstanding requests is 32 if use only one-page >>> ring, but in our case the iscsi lun was spread across about 100 physical drives, >>> 32 was really not enough to keep them busy. >>> >>> Changes in v2: >>> - Rebased to 4.0-rc6 >>> - Added description on how this protocol works into io/blkif.h >> >> I don't see any changes to io/blkif.h in this patch, is something missing? >> > > Sorry, I should mention in v3 that these changed were removed because I followed the protocol > already defined in XEN git tree: xen/include/public/io/blkif.h So since you were reusing the old protocol specification there was no need to add anything to blkif.h? AFAICT from the conversation in the other thread (the one about updating blkif.h in Xen tree) if you want to reuse an existing protocol you need to make sure it's compatible with the previous implementation. >> Also you use XENBUS_MAX_RING_PAGES which AFAICT it's not defined anywhere. >> > > It was defined in include/xen/xenbus.h. > >>> >>> Changes in v3: >>> - Follow the protocol defined in io/blkif.h on XEN tree >>> >>> Signed-off-by: Bob Liu >>> --- >>> drivers/block/xen-blkback/blkback.c | 14 ++++- >>> drivers/block/xen-blkback/common.h | 4 +- >>> drivers/block/xen-blkback/xenbus.c | 83 ++++++++++++++++++++++------- >>> drivers/block/xen-blkfront.c | 102 +++++++++++++++++++++++++++--------- >>> 4 files changed, 156 insertions(+), 47 deletions(-) >>> >>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c >>> index 713fc9f..f191083 100644 >>> --- a/drivers/block/xen-blkback/blkback.c >>> +++ b/drivers/block/xen-blkback/blkback.c >>> @@ -84,6 +84,12 @@ MODULE_PARM_DESC(max_persistent_grants, >>> "Maximum number of grants to map persistently"); >>> >>> /* >>> + * Maximum number of pages to be used as the ring between front and backend >>> + */ >>> +unsigned int xen_blkif_max_ring_pages = XENBUS_MAX_RING_PAGES; >>> +module_param_named(max_ring_pages, xen_blkif_max_ring_pages, int, S_IRUGO); >>> +MODULE_PARM_DESC(max_ring_pages, "Maximum amount of pages to be used as the ring"); >>> +/* >>> * The LRU mechanism to clean the lists of persistent grants needs to >>> * be executed periodically. The time interval between consecutive executions >>> * of the purge mechanism is set in ms. >>> @@ -630,7 +636,7 @@ purge_gnt_list: >>> } >>> >>> /* Shrink if we have more than xen_blkif_max_buffer_pages */ >>> - shrink_free_pagepool(blkif, xen_blkif_max_buffer_pages); >>> + shrink_free_pagepool(blkif, xen_blkif_max_buffer_pages * blkif->nr_ring_pages); >> >> You are greatly increasing the buffer of free (ballooned) pages. >> Possibly making it 32 times bigger than it used to be, is this really >> needed? >> > > Hmm.. it's a bit aggressive. > How about (xen_blkif_max_buffer_pages * blkif->nr_ring_pages) / 2? IMHO I'm not sure about the best solution here. xen_blkif_max_buffer_pages is set by the user and should represent the buffer pages used by blkback. Modifying it this way means that we are effectively lifting the limit without the user consent, which looks wrong. I think we should change the meaning, so that it's better suited for your proposes. >>> >>> if (log_stats && time_after(jiffies, blkif->st_print)) >>> print_stats(blkif); >>> @@ -1435,6 +1441,12 @@ static int __init xen_blkif_init(void) >>> { >>> int rc = 0; >>> >>> + if (xen_blkif_max_ring_pages > XENBUS_MAX_RING_PAGES) { >>> + pr_info("Invalid max_ring_pages (%d), will use default max: %d.\n", >>> + xen_blkif_max_ring_pages, XENBUS_MAX_RING_PAGES); >>> + xen_blkif_max_ring_pages = XENBUS_MAX_RING_PAGES; >>> + } >>> + >>> if (!xen_domain()) >>> return -ENODEV; >>> >>> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h >>> index f620b5d..84a964c 100644 >>> --- a/drivers/block/xen-blkback/common.h >>> +++ b/drivers/block/xen-blkback/common.h >>> @@ -44,6 +44,7 @@ >>> #include >>> #include >>> >>> +extern unsigned int xen_blkif_max_ring_pages; >>> /* >>> * This is the maximum number of segments that would be allowed in indirect >>> * requests. This value will also be passed to the frontend. >>> @@ -248,7 +249,7 @@ struct backend_info; >>> #define PERSISTENT_GNT_WAS_ACTIVE 1 >>> >>> /* Number of requests that we can fit in a ring */ >>> -#define XEN_BLKIF_REQS 32 >>> +#define XEN_BLKIF_REQS (32 * XENBUS_MAX_RING_PAGES) >> >> This should be made a member of xen_blkif and set dynamically, or else >> we are just wasting memory if the frontend doesn't support multipage >> rings. We seem to be doing this for a bunch of features, but allocating >> 32 times the needed amount of requests (if the frontend doesn't support >> multipage rings) seems too much IMHO. >> > > Right, the reason I use static is to simple the code. > Else we should: > 1) Move xen_blkif_alloc() from backend_probe() to some place until we read the > negotiated 'num-ring-pages' written by front. > 2) Deal with memory alloc/free when domU migrated and 'num-ring-pages' changed. > > I'll try to find a suitable place in frontend_changed() but may be in an new patch late. Ack. As said, we have been doing this for a long time. When I added indirect descriptors I've also allocated everything before knowing if indirect descriptors will be used or not. Maybe it's time to change that and provide a way to allocate how many requests we need, and which fields should be allocated based on the supported features. [...] >> >> Why is this moved to here... >> > > Else the kernel will panic. > >>> + info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff; >>> xenbus_switch_state(dev, XenbusStateInitialised); >>> >>> return 0; >>> @@ -1422,9 +1467,8 @@ static int blkfront_probe(struct xenbus_device *dev, >>> info->connected = BLKIF_STATE_DISCONNECTED; >>> INIT_WORK(&info->work, blkif_restart_queue); >>> >>> - for (i = 0; i < BLK_RING_SIZE; i++) >>> + for (i = 0; i < BLK_MAX_RING_SIZE; i++) >>> info->shadow[i].req.u.rw.id = i+1; >>> - info->shadow[BLK_RING_SIZE-1].req.u.rw.id = 0x0fffffff; >> >> from here. Isn't the previous location suitable anymore? >> > > See: > #define BLK_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * info->nr_ring_pages) > > Because we haven't get info->nr_ring_pages yet here(in probe()), info->nr_ring_pages > is read out in talk_to_blkback() which I already moved to blkback_changed(). Maybe it would be clearer if the initialization for loop was also moved to the place above were info->shadow[BLK_RING_SIZE-1]... is being set? Roger. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/