2012-11-09 19:18:28

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 00/10] make I/O path allocations more numa-friendly

Hi,

This patch set makes memory allocations for data structures used in
the I/O path more numa friendly by allocating them from the same numa
node as the storage device. I've only converted a handful of drivers
at this point. My testing showed that, for workloads where the I/O
processes were not tied to the numa node housing the device, a speedup
of around 6% was observed. When the I/O processes were tied to the
numa node of the device, there was no measurable difference in my test
setup. Given my relatively low-end setup[1], I wouldn't be surprised
if others could show a more significant performance advantage.

Comments would be greatly appreciated.

Cheers,
Jeff

[1] LSI Megaraid SAS controller with 1GB battery-backed cache,
fronting a RAID 6 10+2. The workload I used was tuned to not
have to hit disk. Fio file attached.

--
changes from v2->v3:
- Made the numa_node Scsi_Host structure member dependent on CONFIG_NUMA
- Got rid of a GFP_ZERO I added accidentally
changes from v1->v2:
- got rid of the vfs patch, as Al pointed out some fundamental
problems with it
- credited Bart van Assche properly


Jeff Moyer (10):
scsi: add scsi_host_alloc_node
scsi: make __scsi_alloc_queue numa-aware
scsi: make scsi_alloc_sdev numa-aware
scsi: allocate scsi_cmnd-s from the device's local numa node
sd: use alloc_disk_node
ata: use scsi_host_alloc_node
megaraid_sas: use scsi_host_alloc_node
mpt2sas: use scsi_host_alloc_node
lpfc: use scsi_host_alloc_node
cciss: use blk_init_queue_node

drivers/ata/libata-scsi.c | 3 ++-
drivers/block/cciss.c | 3 ++-
drivers/scsi/hosts.c | 13 +++++++++++--
drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
drivers/scsi/megaraid/megaraid_sas_base.c | 5 +++--
drivers/scsi/mpt2sas/mpt2sas_scsih.c | 4 ++--
drivers/scsi/scsi.c | 16 ++++++++++------
drivers/scsi/scsi_lib.c | 3 ++-
drivers/scsi/scsi_scan.c | 4 ++--
drivers/scsi/sd.c | 2 +-
include/scsi/scsi_host.h | 28 ++++++++++++++++++++++++++++
11 files changed, 69 insertions(+), 22 deletions(-)


2012-11-09 19:18:49

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 01/10] scsi: add scsi_host_alloc_node

Allow an LLD to specify on which numa node to allocate scsi data
structures. Thanks to Bart Van Assche for the suggestion.

Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/hosts.c | 13 +++++++++++--
include/scsi/scsi_host.h | 28 ++++++++++++++++++++++++++++
2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 593085a..06ce602 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -336,16 +336,25 @@ static struct device_type scsi_host_type = {
**/
struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
{
+ return scsi_host_alloc_node(sht, privsize, NUMA_NO_NODE);
+}
+EXPORT_SYMBOL(scsi_host_alloc);
+
+struct Scsi_Host *scsi_host_alloc_node(struct scsi_host_template *sht,
+ int privsize, int node)
+{
struct Scsi_Host *shost;
gfp_t gfp_mask = GFP_KERNEL;

if (sht->unchecked_isa_dma && privsize)
gfp_mask |= __GFP_DMA;

- shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
+ shost = kzalloc_node(sizeof(struct Scsi_Host) + privsize,
+ gfp_mask, node);
if (!shost)
return NULL;

+ scsi_host_set_numa_node(shost, node);
shost->host_lock = &shost->default_lock;
spin_lock_init(shost->host_lock);
shost->shost_state = SHOST_CREATED;
@@ -443,7 +452,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
kfree(shost);
return NULL;
}
-EXPORT_SYMBOL(scsi_host_alloc);
+EXPORT_SYMBOL(scsi_host_alloc_node);

struct Scsi_Host *scsi_register(struct scsi_host_template *sht, int privsize)
{
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 4908480..438856d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -732,6 +732,14 @@ struct Scsi_Host {
*/
struct device *dma_dev;

+#ifdef CONFIG_NUMA
+ /*
+ * Numa node this device is closest to, used for allocating
+ * data structures locally.
+ */
+ int numa_node;
+#endif
+
/*
* We should ensure that this is aligned, both for better performance
* and also because some compilers (m68k) don't automatically force
@@ -776,6 +784,8 @@ extern int scsi_queue_work(struct Scsi_Host *, struct work_struct *);
extern void scsi_flush_work(struct Scsi_Host *);

extern struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *, int);
+extern struct Scsi_Host *scsi_host_alloc_node(struct scsi_host_template *,
+ int, int);
extern int __must_check scsi_add_host_with_dma(struct Scsi_Host *,
struct device *,
struct device *);
@@ -919,6 +929,24 @@ static inline unsigned char scsi_host_get_guard(struct Scsi_Host *shost)
return shost->prot_guard_type;
}

+#ifdef CONFIG_NUMA
+static inline int scsi_host_get_numa_node(struct Scsi_Host *shost)
+{
+ return shost->numa_node;
+}
+
+static inline void scsi_host_set_numa_node(struct Scsi_Host *shost, int node)
+{
+ shost->numa_node = node;
+}
+#else /* CONFIG_NUMA */
+static inline int scsi_host_get_numa_node(struct Scsi_Host *shost)
+{
+ return NUMA_NO_NODE;
+}
+static inline void scsi_host_set_numa_node(struct Scsi_Host *shost, int node) {}
+#endif
+
/* legacy interfaces */
extern struct Scsi_Host *scsi_register(struct scsi_host_template *, int);
extern void scsi_unregister(struct Scsi_Host *);
--
1.7.1

2012-11-09 19:18:56

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 03/10] scsi: make scsi_alloc_sdev numa-aware

Use the numa node id set in the Scsi_Host to allocate the sdev structure
on the device-local numa node.

Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/scsi_scan.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..d91749d 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -232,8 +232,8 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
extern void scsi_evt_thread(struct work_struct *work);
extern void scsi_requeue_run_queue(struct work_struct *work);

- sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
- GFP_ATOMIC);
+ sdev = kzalloc_node(sizeof(*sdev) + shost->transportt->device_size,
+ GFP_ATOMIC, scsi_host_get_numa_node(shost));
if (!sdev)
goto out;

--
1.7.1

2012-11-09 19:18:52

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 02/10] scsi: make __scsi_alloc_queue numa-aware

Pass the numa node id set in the Scsi_Host on to blk_init_queue_node
in order to keep all allocations local to the numa node the device is
closest to.

Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/scsi_lib.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index da36a3a..ebad5e8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1664,7 +1664,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
struct request_queue *q;
struct device *dev = shost->dma_dev;

- q = blk_init_queue(request_fn, NULL);
+ q = blk_init_queue_node(request_fn, NULL,
+ scsi_host_get_numa_node(shost));
if (!q)
return NULL;

--
1.7.1

2012-11-09 19:19:07

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 07/10] megaraid_sas: use scsi_host_alloc_node


Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index d2c5366..707a6cd 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4020,8 +4020,9 @@ megasas_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
if (megasas_set_dma_mask(pdev))
goto fail_set_dma_mask;

- host = scsi_host_alloc(&megasas_template,
- sizeof(struct megasas_instance));
+ host = scsi_host_alloc_node(&megasas_template,
+ sizeof(struct megasas_instance),
+ dev_to_node(&pdev->dev));

if (!host) {
printk(KERN_DEBUG "megasas: scsi_host_alloc failed\n");
--
1.7.1

2012-11-09 19:19:14

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 09/10] lpfc: use scsi_host_alloc_node

Acked-By: James Smart <[email protected]>
Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/lpfc/lpfc_init.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 7dc4218..65956d3 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3051,11 +3051,13 @@ lpfc_create_port(struct lpfc_hba *phba, int instance, struct device *dev)
int error = 0;

if (dev != &phba->pcidev->dev)
- shost = scsi_host_alloc(&lpfc_vport_template,
- sizeof(struct lpfc_vport));
+ shost = scsi_host_alloc_node(&lpfc_vport_template,
+ sizeof(struct lpfc_vport),
+ dev_to_node(&phba->pcidev->dev));
else
- shost = scsi_host_alloc(&lpfc_template,
- sizeof(struct lpfc_vport));
+ shost = scsi_host_alloc_node(&lpfc_template,
+ sizeof(struct lpfc_vport),
+ dev_to_node(&phba->pcidev->dev));
if (!shost)
goto out;

--
1.7.1

2012-11-09 19:19:11

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 10/10] cciss: use blk_init_queue_node


Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/block/cciss.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index b0f553b..5fe5546 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1930,7 +1930,8 @@ static void cciss_get_serial_no(ctlr_info_t *h, int logvol,
static int cciss_add_disk(ctlr_info_t *h, struct gendisk *disk,
int drv_index)
{
- disk->queue = blk_init_queue(do_cciss_request, &h->lock);
+ disk->queue = blk_init_queue_node(do_cciss_request, &h->lock,
+ dev_to_node(&h->dev));
if (!disk->queue)
goto init_queue_failure;
sprintf(disk->disk_name, "cciss/c%dd%d", h->ctlr, drv_index);
--
1.7.1

2012-11-09 19:19:04

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 06/10] ata: use scsi_host_alloc_node


Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/ata/libata-scsi.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e3bda07..9d5dd09 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3586,7 +3586,8 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
struct Scsi_Host *shost;

rc = -ENOMEM;
- shost = scsi_host_alloc(sht, sizeof(struct ata_port *));
+ shost = scsi_host_alloc_node(sht, sizeof(struct ata_port *),
+ dev_to_node(host->dev));
if (!shost)
goto err_alloc;

--
1.7.1

2012-11-09 19:19:00

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 08/10] mpt2sas: use scsi_host_alloc_node


Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/mpt2sas/mpt2sas_scsih.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index af4e6c4..a4d6b36 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -8011,8 +8011,8 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
struct MPT2SAS_ADAPTER *ioc;
struct Scsi_Host *shost;

- shost = scsi_host_alloc(&scsih_driver_template,
- sizeof(struct MPT2SAS_ADAPTER));
+ shost = scsi_host_alloc_node(&scsih_driver_template,
+ sizeof(struct MPT2SAS_ADAPTER), dev_to_node(&pdev->dev));
if (!shost)
return -ENODEV;

--
1.7.1

2012-11-09 19:18:44

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 05/10] sd: use alloc_disk_node


Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/sd.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 12f6fdf..a5dae6b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2714,7 +2714,7 @@ static int sd_probe(struct device *dev)
if (!sdkp)
goto out;

- gd = alloc_disk(SD_MINORS);
+ gd = alloc_disk_node(SD_MINORS, scsi_host_get_numa_node(sdp->host));
if (!gd)
goto out_free;

--
1.7.1

2012-11-09 19:18:40

by Jeff Moyer

[permalink] [raw]
Subject: [patch,v3 04/10] scsi: allocate scsi_cmnd-s from the device's local numa node


Signed-off-by: Jeff Moyer <[email protected]>
---
drivers/scsi/scsi.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2936b44..1750702 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -173,16 +173,19 @@ static DEFINE_MUTEX(host_cmd_pool_mutex);
* NULL on failure
*/
static struct scsi_cmnd *
-scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask)
+scsi_pool_alloc_command(struct scsi_host_cmd_pool *pool, gfp_t gfp_mask,
+ int node)
{
struct scsi_cmnd *cmd;

- cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
+ cmd = kmem_cache_alloc_node(pool->cmd_slab,
+ gfp_mask | pool->gfp_mask | __GFP_ZERO,
+ node);
if (!cmd)
return NULL;

- cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab,
- gfp_mask | pool->gfp_mask);
+ cmd->sense_buffer = kmem_cache_alloc_node(pool->sense_slab,
+ gfp_mask | pool->gfp_mask, node);
if (!cmd->sense_buffer) {
kmem_cache_free(pool->cmd_slab, cmd);
return NULL;
@@ -223,7 +226,8 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
{
struct scsi_cmnd *cmd;

- cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask);
+ cmd = scsi_pool_alloc_command(shost->cmd_pool, gfp_mask,
+ scsi_host_get_numa_node(shost));
if (!cmd)
return NULL;

@@ -435,7 +439,7 @@ struct scsi_cmnd *scsi_allocate_command(gfp_t gfp_mask)
if (!pool)
return NULL;

- return scsi_pool_alloc_command(pool, gfp_mask);
+ return scsi_pool_alloc_command(pool, gfp_mask, NUMA_NO_NODE);
}
EXPORT_SYMBOL(scsi_allocate_command);

--
1.7.1

2012-11-10 08:59:58

by Bart Van Assche

[permalink] [raw]
Subject: Re: [patch,v3 04/10] scsi: allocate scsi_cmnd-s from the device's local numa node

On 11/09/12 20:18, Jeff Moyer wrote:
> - cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
> + cmd = kmem_cache_alloc_node(pool->cmd_slab,
> + gfp_mask | pool->gfp_mask | __GFP_ZERO,
> + node);

Hello Jeff,

Is it necessary to add __GFP_ZERO here ? And if so, why ?

Thanks,

Bart.

2012-11-12 15:12:23

by Jeff Moyer

[permalink] [raw]
Subject: Re: [patch,v3 04/10] scsi: allocate scsi_cmnd-s from the device's local numa node

Bart Van Assche <[email protected]> writes:

> On 11/09/12 20:18, Jeff Moyer wrote:
>> - cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask | pool->gfp_mask);
>> + cmd = kmem_cache_alloc_node(pool->cmd_slab,
>> + gfp_mask | pool->gfp_mask | __GFP_ZERO,
>> + node);
>
> Hello Jeff,
>
> Is it necessary to add __GFP_ZERO here ? And if so, why ?

Hi, Bart,

The code used to do zeroing, so I just kept that aspect the same. There
is no kmem_cache_zalloc_node, and it didn't seem worth adding one.

Cheers,
Jeff

2012-11-16 04:33:49

by Jeff Garzik

[permalink] [raw]
Subject: Re: [patch,v3 06/10] ata: use scsi_host_alloc_node

On 11/09/2012 02:18 PM, Jeff Moyer wrote:
> Signed-off-by: Jeff Moyer <[email protected]>
> ---
> drivers/ata/libata-scsi.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)

Acked-by: Jeff Garzik <[email protected]>


2012-11-16 08:08:45

by Bart Van Assche

[permalink] [raw]
Subject: Re: [patch,v3 00/10] make I/O path allocations more numa-friendly

On 11/09/12 20:17, Jeff Moyer wrote:
> This patch set makes memory allocations for data structures used in
> the I/O path more numa friendly by allocating them from the same numa
> node as the storage device. I've only converted a handful of drivers
> at this point. My testing showed that, for workloads where the I/O
> processes were not tied to the numa node housing the device, a speedup
> of around 6% was observed. When the I/O processes were tied to the
> numa node of the device, there was no measurable difference in my test
> setup. Given my relatively low-end setup[1], I wouldn't be surprised
> if others could show a more significant performance advantage.
>
> Comments would be greatly appreciated.

Sorry but I'm not familiar with any of the SCSI LLDs modified via this
patch series. But I'm fine with the SCSI core patches in this series. So
if you want you can add the following to the first five patches in this
series:

Reviewed-by: Bart Van Assche <[email protected]>

Bart.