Message-ID: <555EF0CF.8030009@oracle.com>
Date: Fri, 22 May 2015 17:03:11 +0800
From: Bob Liu <bob.liu@oracle.com>
To: Paul Durrant
Cc: xen-devel@lists.xen.org, David Vrabel, justing@spectralogic.com,
    konrad.wilk@oracle.com, Roger Pau Monne, Julien Grall,
    boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 2/2] xen/block: add multi-page ring support
References: <1432252787-24268-1-git-send-email-bob.liu@oracle.com>
    <1432252787-24268-2-git-send-email-bob.liu@oracle.com>
    <9AAE0902D5BC7E449B7C8E4E778ABCD0259141D9@AMSPEX01CL01.citrite.net>
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD0259141D9@AMSPEX01CL01.citrite.net>

On 05/22/2015 04:31 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Bob Liu [mailto:bob.liu@oracle.com]
>> Sent: 22 May 2015 01:00
>> To: xen-devel@lists.xen.org
>> Cc: David Vrabel; justing@spectralogic.com; konrad.wilk@oracle.com;
>> Roger Pau Monne; Paul Durrant; Julien Grall; boris.ostrovsky@oracle.com;
>> linux-kernel@vger.kernel.org; Bob Liu
>> Subject: [PATCH v5 2/2] xen/block: add multi-page ring support
>>
>> Extend xen/block to support multi-page rings, so that more requests can
>> be issued by using more than one page as the request ring between
>> blkfront and the backend.
>> As a result, performance can improve significantly.
>>
>> We saw some impressive improvements on our high-end iSCSI storage
>> cluster backend. Using 64 pages as the ring, IOPS increased about 15
>> times in the throughput test and more than doubled in the latency test.
>>
>> The reason is that the limit on outstanding requests is 32 when using a
>> one-page ring; in our case the iSCSI LUN was spread across about 100
>> physical drives, and 32 requests were not nearly enough to keep them
>> busy.
>>
>> Changes in v2:
>>  - Rebased to 4.0-rc6.
>>  - Documented how the multi-page ring feature works in linux io/blkif.h.
>>
>> Changes in v3:
>>  - Removed the changes to linux io/blkif.h and follow the protocol
>>    defined in io/blkif.h of the XEN tree.
>>  - Rebased to 4.1-rc3.
>>
>> Changes in v4:
>>  - Switched to 'ring-page-order' and 'max-ring-page-order'.
>>  - Addressed a few comments from Roger.
>>
>> Changes in v5:
>>  - Clarified the 4k granularity in the comment.
>>  - Addressed more comments from Roger.
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkback/blkback.c |  13 ++++
>>  drivers/block/xen-blkback/common.h  |   3 +-
>>  drivers/block/xen-blkback/xenbus.c  |  88 +++++++++++++++++------
>>  drivers/block/xen-blkfront.c        | 135 +++++++++++++++++++++++++-----------
>>  4 files changed, 179 insertions(+), 60 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
>> index 713fc9f..2126842 100644
>> --- a/drivers/block/xen-blkback/blkback.c
>> +++ b/drivers/block/xen-blkback/blkback.c
>> @@ -84,6 +84,13 @@ MODULE_PARM_DESC(max_persistent_grants,
>>                   "Maximum number of grants to map persistently");
>>
>>  /*
>> + * Maximum order of pages to be used for the shared ring between front and
>> + * backend, 4KB page granularity is used.
>> + */
>> +unsigned int xen_blkif_max_ring_order = XENBUS_MAX_RING_PAGE_ORDER;
>> +module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
>> +MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
>> +/*
>>   * The LRU mechanism to clean the lists of persistent grants needs to
>>   * be executed periodically. The time interval between consecutive
>>   * executions of the purge mechanism is set in ms.
>> @@ -1438,6 +1445,12 @@ static int __init xen_blkif_init(void)
>>  	if (!xen_domain())
>>  		return -ENODEV;
>>
>> +	if (xen_blkif_max_ring_order > XENBUS_MAX_RING_PAGE_ORDER) {
>> +		pr_info("Invalid max_ring_order (%d), will use default max: %d.\n",
>> +			xen_blkif_max_ring_order, XENBUS_MAX_RING_PAGE_ORDER);
>> +		xen_blkif_max_ring_order = XENBUS_MAX_RING_PAGE_ORDER;
>> +	}
>> +
>>  	rc = xen_blkif_interface_init();
>>  	if (rc)
>>  		goto failed_init;
>> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
>> index f620b5d..919a1ab 100644
>> --- a/drivers/block/xen-blkback/common.h
>> +++ b/drivers/block/xen-blkback/common.h
>> @@ -44,6 +44,7 @@
>>  #include
>>  #include
>>
>> +extern unsigned int xen_blkif_max_ring_order;
>>  /*
>>   * This is the maximum number of segments that would be allowed in
>>   * indirect requests. This value will also be passed to the frontend.
>> @@ -248,7 +249,7 @@ struct backend_info;
>>  #define PERSISTENT_GNT_WAS_ACTIVE	1
>>
>>  /* Number of requests that we can fit in a ring */
>> -#define XEN_BLKIF_REQS			32
>> +#define XEN_MAX_BLKIF_REQS		(32 * XENBUS_MAX_RING_PAGES)
>>
>>  struct persistent_gnt {
>>  	struct page *page;
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index 6ab69ad..bc33888 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -25,6 +25,7 @@
>>
>>  /* Enlarge the array size in order to fully show blkback name.
>>  */
>>  #define BLKBACK_NAME_LEN	(20)
>> +#define RINGREF_NAME_LEN	(20)
>>
>>  struct backend_info {
>>  	struct xenbus_device	*dev;
>> @@ -152,7 +153,7 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
>>  	INIT_LIST_HEAD(&blkif->pending_free);
>>  	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
>>
>> -	for (i = 0; i < XEN_BLKIF_REQS; i++) {
>> +	for (i = 0; i < XEN_MAX_BLKIF_REQS; i++) {
>
> How big is XEN_MAX_BLKIF_REQS? These allocations are per-instance, so
> I'd be concerned that the increase in the number of allocations would
> hit system scalability.
>

Right. Roger and I have agreed to delay request allocation (including
the indirect-page-related memory) until we know the exact value. That
will come in a new patch soon.

"
Ack. As said, we have been doing this for a long time. When I added
indirect descriptors I also allocated everything before knowing whether
indirect descriptors would be used or not. Maybe it's time to change
that and provide a way to allocate only as many requests as we need,
and to decide which fields should be allocated based on the supported
features.
"

Thanks,
-Bob