2016-12-07 22:04:04

by Dan Streetman

Subject: [PATCH] nvme: use the correct msix vector for each queue

Change each queue's cq_vector to match its qid, instead of qid - 1.

The first queue is always the admin queue, and the remaining queues are
I/O queues. The interrupt vectors they use are all in the same array;
however, the vector indexes for the admin and I/O queues are set up
differently: the admin queue's cq_vector is manually set to 0, while
each I/O queue's cq_vector is set to qid - 1. Since the admin queue
is qid 0, and the I/O queues start at qid 1, using qid - 1 is wrong for the
I/O queues, as it makes the first I/O queue (qid 1) share the vector with
the admin queue (qid 0), and leaves the last interrupt vector unused.
Instead, each I/O queue should set its cq_vector to qid.

Signed-off-by: Dan Streetman <[email protected]>
---
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5e52034..def2285 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1117,7 +1117,7 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
struct nvme_dev *dev = nvmeq->dev;
int result;

- nvmeq->cq_vector = qid - 1;
+ nvmeq->cq_vector = qid;
result = adapter_alloc_cq(dev, qid, nvmeq);
if (result < 0)
return result;
--
2.9.3


2016-12-07 22:34:31

by Keith Busch

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 07, 2016 at 05:03:48PM -0500, Dan Streetman wrote:
> Change each queue's cq_vector to match its qid, instead of qid - 1.
>
> The first queue is always the admin queue, and the remaining queues are
> I/O queues. The interrupt vectors they use are all in the same array;
> however, the vector indexes for the admin and I/O queues are set up
> differently: the admin queue's cq_vector is manually set to 0, while
> each I/O queue's cq_vector is set to qid - 1. Since the admin queue
> is qid 0, and the I/O queues start at qid 1, using qid - 1 is wrong for the
> I/O queues, as it makes the first I/O queue (qid 1) share the vector with
> the admin queue (qid 0), and leaves the last interrupt vector unused.
> Instead, each I/O queue should set its cq_vector to qid.

pci_alloc_irq_vectors doesn't know you intend to make the first
vector special, so it's going to come up with a CPU affinity from
blk_mq_pci_map_queues that clashes with what you've programmed in the
IO completion queues.

2016-12-07 22:36:43

by Dan Streetman

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 7, 2016 at 5:44 PM, Keith Busch <[email protected]> wrote:
> On Wed, Dec 07, 2016 at 05:03:48PM -0500, Dan Streetman wrote:
>> Change each queue's cq_vector to match its qid, instead of qid - 1.
>>
>> The first queue is always the admin queue, and the remaining queues are
>> I/O queues. The interrupt vectors they use are all in the same array;
>> however, the vector indexes for the admin and I/O queues are set up
>> differently: the admin queue's cq_vector is manually set to 0, while
>> each I/O queue's cq_vector is set to qid - 1. Since the admin queue
>> is qid 0, and the I/O queues start at qid 1, using qid - 1 is wrong for the
>> I/O queues, as it makes the first I/O queue (qid 1) share the vector with
>> the admin queue (qid 0), and leaves the last interrupt vector unused.
>> Instead, each I/O queue should set its cq_vector to qid.
>
> pci_alloc_irq_vectors doesn't know you intend to make the first
> vector special, so it's going to come up with a CPU affinity from
> blk_mq_pci_map_queues that clashes with what you've programmed in the
> IO completion queues.

I don't follow. You're saying you mean to share cq_vector 0 between
the admin queue and io queue 1?

2016-12-07 22:39:45

by Keith Busch

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 07, 2016 at 05:36:00PM -0500, Dan Streetman wrote:
> On Wed, Dec 7, 2016 at 5:44 PM, Keith Busch <[email protected]> wrote:
> > pci_alloc_irq_vectors doesn't know you intend to make the first
> > vector special, so it's going to come up with a CPU affinity from
> > blk_mq_pci_map_queues that clashes with what you've programmed in the
> > IO completion queues.
>
> I don't follow. You're saying you mean to share cq_vector 0 between
> the admin queue and io queue 1?

I'm just saying that blk-mq's hctx mapping will end up choosing a queue
whose vector is mapped to a different CPU, and we don't want that.

We are currently sharing the first IO queue's interrupt vector with
the admin queue's on purpose. Are you saying there's something wrong
with that?

2016-12-07 22:46:32

by Christoph Hellwig

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 07, 2016 at 05:49:42PM -0500, Keith Busch wrote:
> I'm just saying that blk-mq's hctx mapping will end up choosing a queue
> whose vector is mapped to a different CPU, and we don't want that.

Right. For 4.10 we could use the pci_alloc_irq_vectors_affinity helper
to set aside a pre_vector IFF we want a separate vector for the admin
queue.

> We are currently sharing the first IO queue's interrupt vector with
> the admin queue's on purpose. Are you saying there's something wrong
> with that?

But given that the sharing was done intentionally, and that we had a long
discussion on it back then, there should be no real reason to change
the assignment in NVMe.

2016-12-07 22:47:17

by Dan Streetman

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 7, 2016 at 5:49 PM, Keith Busch <[email protected]> wrote:
> On Wed, Dec 07, 2016 at 05:36:00PM -0500, Dan Streetman wrote:
>> On Wed, Dec 7, 2016 at 5:44 PM, Keith Busch <[email protected]> wrote:
>> > pci_alloc_irq_vectors doesn't know you intend to make the first
>> > vector special, so it's going to come up with a CPU affinity from
>> > blk_mq_pci_map_queues that clashes with what you've programmed in the
>> > IO completion queues.
>>
>> I don't follow. You're saying you mean to share cq_vector 0 between
>> the admin queue and io queue 1?
>
> I'm just saying that blk-mq's hctx mapping will end up choosing a queue
> whose vector is mapped to a different CPU, and we don't want that.
>
> We are currently sharing the first IO queue's interrupt vector with
> the admin queue's on purpose. Are you saying there's something wrong
> with that?

That's intentional? OK then. That's extremely non-obvious.

Is there a reason you want to share the interrupt between the queues?

2016-12-07 22:50:02

by Dan Streetman

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 7, 2016 at 5:46 PM, Christoph Hellwig <[email protected]> wrote:
> On Wed, Dec 07, 2016 at 05:49:42PM -0500, Keith Busch wrote:
>> I'm just saying that blk-mq's hctx mapping will end up choosing a queue
>> whose vector is mapped to a different CPU, and we don't want that.
>
> Right. For 4.10 we could use the pci_alloc_irq_vectors_affinity helper
> to set aside a pre_vector IFF we want a separate vector for the admin
> queue.
>
>> We are currently sharing the first IO queue's interrupt vector with
>> the admin queue's on purpose. Are you saying there's something wrong
>> with that?
>
> But given that the sharing was done intentionally, and that we had a long
> discussion on it back then, there should be no real reason to change
> the assignment in NVMe.

Sorry, I missed the past discussion. It still seems strange and
obscure that it's intentional, from reading the code at least.

2016-12-07 22:55:20

by Keith Busch

Subject: Re: [PATCH] nvme: use the correct msix vector for each queue

On Wed, Dec 07, 2016 at 05:46:34PM -0500, Dan Streetman wrote:
>
> Is there a reason you want to share the interrupt between the queues?

The admin queue is hardly ever used (if at all) compared to an I/O queue's
usage, so why waste the resource? I bet you can't measure a performance
difference on queues that don't share, and it's reasonable to expect an
NVMe controller's MSI-X count to match the I/O queue count, which forces
us to share a vector.