2024-05-10 14:15:40

by Keith Busch

Subject: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

From: Keith Busch <[email protected]>

Some people _really_ want to control their interrupt affinity.

Signed-off-by: Keith Busch <[email protected]>
---
drivers/nvme/host/pci.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8e0bb9692685d..4c2799c3f45f5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -63,6 +63,11 @@ MODULE_PARM_DESC(sgl_threshold,
"Use SGLs when average request segment size is larger or equal to "
"this size. Use 0 to disable SGLs.");

+static bool managed_irqs = true;
+module_param(managed_irqs, bool, 0444);
+MODULE_PARM_DESC(managed_irqs,
+ "set to false for user controlled irq affinity");
+
#define NVME_PCI_MIN_QUEUE_SIZE 2
#define NVME_PCI_MAX_QUEUE_SIZE 4095
static int io_queue_depth_set(const char *val, const struct kernel_param *kp);
@@ -456,7 +461,7 @@ static void nvme_pci_map_queues(struct blk_mq_tag_set *set)
* affinity), so use the regular blk-mq cpu mapping
*/
map->queue_offset = qoff;
- if (i != HCTX_TYPE_POLL && offset)
+ if (managed_irqs && i != HCTX_TYPE_POLL && offset)
blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
else
blk_mq_map_queues(map);
@@ -2180,6 +2185,9 @@ static void nvme_calc_irq_sets(struct irq_affinity *affd, unsigned int nrirqs)
struct nvme_dev *dev = affd->priv;
unsigned int nr_read_queues, nr_write_queues = dev->nr_write_queues;

+ if (!nrirqs)
+ nrirqs = affd->post_vectors;
+
/*
* If there is no interrupt available for queues, ensure that
* the default queue is set to 1. The affinity set size is
@@ -2226,6 +2234,9 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
poll_queues = min(dev->nr_poll_queues, nr_io_queues - 1);
dev->io_queues[HCTX_TYPE_POLL] = poll_queues;

+ if (!managed_irqs)
+ affd.post_vectors = nr_io_queues - poll_queues;
+
/*
* Initialize for the single interrupt case, will be updated in
* nvme_calc_irq_sets().
--
2.43.0
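For readers trying this out: since the parameter above is declared read-only
(0444), it would presumably be set at module load time. A usage sketch,
assuming the module name and parameter spelling from the diff:

  # on the kernel command line
  nvme.managed_irqs=0

  # or when loading the module by hand
  modprobe nvme managed_irqs=0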



2024-05-10 15:12:01

by Christoph Hellwig

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
> From: Keith Busch <[email protected]>
>
> Some people _really_ want to control their interrupt affinity.

So let them argue why. I'd rather have a really, really, really
good argument for this crap, and I'd like to hear it from the horse's
mouth.


2024-05-10 16:28:07

by Keith Busch

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Fri, May 10, 2024 at 05:10:47PM +0200, Christoph Hellwig wrote:
> On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
> > From: Keith Busch <[email protected]>
> >
> > Some people _really_ want to control their interrupt affinity.
>
> So let them argue why. I'd rather have a really, really, really
> good argument for this crap, and I'd like to hear it from the horses
> mouth.

It's just prioritizing predictable user task scheduling for a subset of
CPUs instead of having consistently better storage performance.

We already have the "isolcpus=managed_irq," parameter to prevent managed
interrupts from running on a subset of CPUs, so the use case is already
kind of supported. The problem with that parameter is that it is a no-op if
the starting affinity spread contains only isolated CPUs.

2024-05-10 23:50:48

by Ming Lei

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Fri, May 10, 2024 at 10:20:02AM -0600, Keith Busch wrote:
> On Fri, May 10, 2024 at 05:10:47PM +0200, Christoph Hellwig wrote:
> > On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
> > > From: Keith Busch <[email protected]>
> > >
> > > Some people _really_ want to control their interrupt affinity.
> >
> > So let them argue why. I'd rather have a really, really, really
> > good argument for this crap, and I'd like to hear it from the horses
> > mouth.
>
> It's just prioritizing predictable user task scheduling for a subset of
> CPUs instead of having consistently better storage performance.
>
> We already have "isolcpus=managed_irq," parameter to prevent managed
> interrupts from running on a subset of CPUs, so the use case is already
> kind of supported. The problem with that parameter is it is a no-op if
> the starting affinity spread contains only isolated CPUs.

Can you explain a bit why it is a no-op? If only isolated CPUs are
spread on one queue, there will be no IO originating from these isolated
CPUs, which is exactly what the isolation needs.



Thanks,
Ming


2024-05-11 00:43:54

by Keith Busch

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Sat, May 11, 2024 at 07:50:21AM +0800, Ming Lei wrote:
> On Fri, May 10, 2024 at 10:20:02AM -0600, Keith Busch wrote:
> > On Fri, May 10, 2024 at 05:10:47PM +0200, Christoph Hellwig wrote:
> > > On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
> > > > From: Keith Busch <[email protected]>
> > > >
> > > > Some people _really_ want to control their interrupt affinity.
> > >
> > > So let them argue why. I'd rather have a really, really, really
> > > good argument for this crap, and I'd like to hear it from the horses
> > > mouth.
> >
> > It's just prioritizing predictable user task scheduling for a subset of
> > CPUs instead of having consistently better storage performance.
> >
> > We already have "isolcpus=managed_irq," parameter to prevent managed
> > interrupts from running on a subset of CPUs, so the use case is already
> > kind of supported. The problem with that parameter is it is a no-op if
> > the starting affinity spread contains only isolated CPUs.
>
> Can you explain a bit why it is a no-op? If only isolated CPUs are
> spread on one queue, there will be no IO originated from these isolated
> CPUs, that is exactly what the isolation needs.

The "isolcpus=managed_irq," option doesn't limit the dispatching CPUs.
It only limits where the managed irq will assign the effective_cpus as a
best effort.

Example: I boot a system with 4 threads, one nvme device, and the
kernel parameter:

isolcpus=managed_irq,2-3

Run this:

for i in $(seq 0 3); do taskset -c $i dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000 iflag=direct; done

Check /proc/interrupts | grep nvme0:

          CPU0      CPU1      CPU2      CPU3
..
 26:      1000         0         0         0   PCI-MSIX-0000:00:05.0   1-edge   nvme0q1
 27:         0      1004         0         0   PCI-MSIX-0000:00:05.0   2-edge   nvme0q2
 28:         0         0      1000         0   PCI-MSIX-0000:00:05.0   3-edge   nvme0q3
 29:         0         0         0      1043   PCI-MSIX-0000:00:05.0   4-edge   nvme0q4

The isolcpus did nothing because each vector's mask had just one
cpu; there was nowhere else the managed irq could send it. The
documentation seems to indicate that was by design, as a "best effort".
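For anyone reproducing this, the per-vector masks produced by the spread can
be inspected directly; a small shell sketch (the IRQ numbers are the ones from
the example above and will differ on other systems, and the
effective_affinity_list file is only present when the architecture exposes an
effective affinity mask):

  for irq in 26 27 28 29; do
      printf 'irq %s: allowed=%s effective=%s\n' "$irq" \
          "$(cat /proc/irq/$irq/smp_affinity_list)" \
          "$(cat /proc/irq/$irq/effective_affinity_list)"
  done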

2024-05-11 00:59:44

by Ming Lei

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Fri, May 10, 2024 at 06:41:58PM -0600, Keith Busch wrote:
> On Sat, May 11, 2024 at 07:50:21AM +0800, Ming Lei wrote:
> > On Fri, May 10, 2024 at 10:20:02AM -0600, Keith Busch wrote:
> > > On Fri, May 10, 2024 at 05:10:47PM +0200, Christoph Hellwig wrote:
> > > > On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
> > > > > From: Keith Busch <[email protected]>
> > > > >
> > > > > Some people _really_ want to control their interrupt affinity.
> > > >
> > > > So let them argue why. I'd rather have a really, really, really
> > > > good argument for this crap, and I'd like to hear it from the horses
> > > > mouth.
> > >
> > > It's just prioritizing predictable user task scheduling for a subset of
> > > CPUs instead of having consistently better storage performance.
> > >
> > > We already have "isolcpus=managed_irq," parameter to prevent managed
> > > interrupts from running on a subset of CPUs, so the use case is already
> > > kind of supported. The problem with that parameter is it is a no-op if
> > > the starting affinity spread contains only isolated CPUs.
> >
> > Can you explain a bit why it is a no-op? If only isolated CPUs are
> > spread on one queue, there will be no IO originated from these isolated
> > CPUs, that is exactly what the isolation needs.
>
> The "isolcpus=managed_irq," option doesn't limit the dispatching CPUs.

Please see commit a46c27026da1 ("blk-mq: don't schedule block kworker on isolated CPUs")
in for-6.10/block.

> It only limits where the managed irq will assign the effective_cpus as a
> best effort.

Most of the time it does work.

>
> Example, I boot with a system with 4 threads, one nvme device, and
> kernel parameter:
>
> isolcpus=managed_irq,2-3
>
> Run this:
>
> for i in $(seq 0 3); do taskset -c $i dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000 iflag=direct; done

That is a problem with the test: when you isolate '2-3', you aren't expected
to submit IO or run applications on those isolated CPUs.


Thanks,
Ming


2024-05-12 06:36:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Fri, May 10 2024 at 18:41, Keith Busch wrote:
> On Sat, May 11, 2024 at 07:50:21AM +0800, Ming Lei wrote:
>> Can you explain a bit why it is a no-op? If only isolated CPUs are
>> spread on one queue, there will be no IO originated from these isolated
>> CPUs, that is exactly what the isolation needs.
>
> The "isolcpus=managed_irq," option doesn't limit the dispatching CPUs.
> It only limits where the managed irq will assign the effective_cpus as a
> best effort.
>
> Example, I boot with a system with 4 threads, one nvme device, and
> kernel parameter:
>
> isolcpus=managed_irq,2-3
>
> Run this:
>
> for i in $(seq 0 3); do taskset -c $i dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000 iflag=direct; done
>
> Check /proc/interrupts | grep nvme0:
>
>           CPU0      CPU1      CPU2      CPU3
> ..
>  26:      1000         0         0         0   PCI-MSIX-0000:00:05.0   1-edge   nvme0q1
>  27:         0      1004         0         0   PCI-MSIX-0000:00:05.0   2-edge   nvme0q2
>  28:         0         0      1000         0   PCI-MSIX-0000:00:05.0   3-edge   nvme0q3
>  29:         0         0         0      1043   PCI-MSIX-0000:00:05.0   4-edge   nvme0q4
>
> The isolcpus did nothing becuase the each vector's mask had just one
> cpu; there was no where else that the managed irq could send it. The
> documentation seems to indicate that was by design as a "best effort".

That's expected as you pin the I/O operation on the isolated CPUs which
in turn makes them use the per CPU queue.

The isolated CPUs are only excluded for device management interrupts,
but not for the affinity spread of the queues.

Thanks,

tglx

2024-05-13 07:33:39

by Benjamin Meier

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

> From: Christoph Hellwig <[email protected]>
>
> So let them argue why. I'd rather have a really, really, really
> good argument for this crap, and I'd like to hear it from the horses
> mouth.

I reached out to Keith to explore the possibility of manually defining
which cores handle NVMe interrupts.

The application we develop and maintain (at the company I work for)
has very high latency requirements. We have some isolated cores
and we run our application on those.

Our system uses kernel 5.4, which unfortunately does not support
"isolcpus=managed_irq". Actually, we did not even know about that
option because we are focused on kernel 5.4. It solves part
of our problem, but being able to specify exactly where interrupts
run is still superior in our opinion.

E.g. assume the number of house-keeping cores is small because we
want to have full control over the system. In our case we have threads
of different priorities, where some get an exclusive core and other
threads share a core (or a group of cores). We are still happy to
assign some interrupts to cores that we consider "medium-priority".
Due to the small number of non-isolated cores, it can
be tricky to assign all interrupts to those without a performance penalty.

Given these requirements, manually specifying interrupt/core assignments
would offer greater flexibility and control over system performance.
Moreover, the proposed code changes appear minimal and have no
impact on existing functionalities.
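To make the request concrete, the workflow the proposed parameter would enable
looks roughly like the following. This is a sketch only: the IRQ numbers and
CPU lists are invented, and it relies on the fact that the affinity of managed
interrupts cannot be changed from user space, while unmanaged ones accept
writes to /proc/irq/<N>/smp_affinity_list:

  # load nvme with the proposed parameter so its queue vectors are unmanaged
  modprobe nvme managed_irqs=0

  # steer latency-insensitive queue vectors to the house-keeping cores (0-1)
  echo 0-1 > /proc/irq/40/smp_affinity_list
  echo 0-1 > /proc/irq/41/smp_affinity_list

  # keep one vector on a "medium-priority" core
  echo 2 > /proc/irq/42/smp_affinity_list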


2024-05-13 08:40:20

by Ming Lei

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Mon, May 13, 2024 at 09:33:27AM +0200, Benjamin Meier wrote:
> > From: Christoph Hellwig <[email protected]>
> >
> > So let them argue why. I'd rather have a really, really, really
> > good argument for this crap, and I'd like to hear it from the horses
> > mouth.
>
> I reached out to Keith to explore the possibility of manually defining
> which cores handle NVMe interrupts.
>
> The application which we develop and maintain (in the company I work)
> has very high requirements regarding latency. We have some isolated cores

Are these isolated cores controlled by kernel command line `isolcpus=`?

> and we run our application on those.
>
> Our system is using kernel 5.4 which unfortunately does not support
> "isolcpus=managed_irq". Actually, we did not even know about that
> option, because we are focussed on kernel 5.4. It solves part
> of our problem, but being able to specify where exactly interrupts
> are running is still superior in our opinion.
>
> E.g. assume the number of house-keeping cores is small, because we
> want to have full control over the system. In our case we have threads
> of different priorities where some get an exclusive core. Some other threads
> share a core (or a group of cores) with other threads. Now we are still
> happy to assign some interrupts to some of the cores which we consider as
> "medium-priority". Due to the small number of non-isolated cores, it can

So these "medium-priority" cores belong to the isolated cpu list, but you still
expect NVMe interrupts to be handled on these cpu cores, do I understand correctly?

If yes, I think your case can still be covered with 'isolcpus=managed_irq',
whose cpu list needn't be the same as the one specified with plain `isolcpus=`:
for example, exclude the medium-priority cores from 'isolcpus=managed_irq'
while including them in plain `isolcpus=`.
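As a command-line sketch of that split (assuming house-keeping CPUs 0-1,
"medium-priority" CPUs 2-3 and "high-priority" CPUs 4-7, and a kernel recent
enough to keep a separate housekeeping mask per isolation feature; worth
verifying on the target kernel):

  isolcpus=domain,2-7 isolcpus=managed_irq,4-7

With that, CPUs 2-7 are isolated from the scheduler domains, while managed
interrupts are only steered away from CPUs 4-7 (as a best effort), so NVMe
vectors may still be serviced on the medium-priority CPUs 2-3.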

> be tricky to assign all interrupts to those without a performance-penalty.
>
> Given these requirements, manually specifying interrupt/core assignments
> would offer greater flexibility and control over system performance.
> Moreover, the proposed code changes appear minimal and have no
> impact on existing functionalities.

It looks like your main concern is performance, but as Keith mentioned, the
proposed change may degrade nvme perf too:

https://lore.kernel.org/linux-nvme/[email protected]/



thanks,
Ming


2024-05-13 08:59:30

by Benjamin Meier

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

> > The application which we develop and maintain (in the company I work)
> > has very high requirements regarding latency. We have some isolated cores
>
> Are these isolated cores controlled by kernel command line `isolcpus=`?

Yes, exactly.

> > and we run our application on those.
> >
> > Our system is using kernel 5.4 which unfortunately does not support
> > "isolcpus=managed_irq". Actually, we did not even know about that
> > option, because we are focussed on kernel 5.4. It solves part
> > of our problem, but being able to specify where exactly interrupts
> > are running is still superior in our opinion.
> >
> > E.g. assume the number of house-keeping cores is small, because we
> > want to have full control over the system. In our case we have threads
> > of different priorities where some get an exclusive core. Some other threads
> > share a core (or a group of cores) with other threads. Now we are still
> > happy to assign some interrupts to some of the cores which we consider as
> > "medium-priority". Due to the small number of non-isolated cores, it can
>
> So these "medium-priority" cores belong to isolated cpu list, you still expect
> NVMe interrupts can be handled on these cpu cores, do I understand correctly?

We want to avoid having the NVMe interrupts on the "high-priority" cores.
Noise on them is quite bad for us, so we wanted to move some interrupts to
house-keeping cores and, if needed (due to performance issues), keep some on
those "medium-priority" isolated cores. NVMe is not the highest priority for
us, but running too much on the house-keeping cores could also be bad.

> If yes, I think your case still can be covered with 'isolcpus=managed_irq' which
> needn't to be same with cpu cores specified from `isolcpus=`, such as
> excluding medium-priority cores from 'isolcpus=managed_irq', and
> meantime include them in plain `isolcpus=`.

Unfortunately, our kernel version (5.4) does not support "managed_irq", which
is why we're happy with the patch. However, I see that for newer kernel
versions the already existing arguments could be sufficient to do everything.

> > be tricky to assign all interrupts to those without a performance-penalty.
> >
> > Given these requirements, manually specifying interrupt/core assignments
> > would offer greater flexibility and control over system performance.
> > Moreover, the proposed code changes appear minimal and have no
> > impact on existing functionalities.
>
> Looks your main concern is performance, but as Keith mentioned, the proposed
> change may degrade nvme perf too:
>
> https://lore.kernel.org/linux-nvme/[email protected]/

Yes, but for NVMe it's not that critical. The most important point for us is
to keep the interrupts away from our "high-priority" cores. We still wanted to
have control over where those interrupts run, partly because we just did not
know about the "managed_irq" option.

Thanks,
Benjamin

2024-05-13 09:25:32

by Ming Lei

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Mon, May 13, 2024 at 10:59:02AM +0200, Benjamin Meier wrote:
> > > The application which we develop and maintain (in the company I work)
> > > has very high requirements regarding latency. We have some isolated cores
> >
> > Are these isolated cores controlled by kernel command line `isolcpus=`?
>
> Yes, exactly.
>
> > > and we run our application on those.
> > >
> > > Our system is using kernel 5.4 which unfortunately does not support
> > > "isolcpus=managed_irq". Actually, we did not even know about that
> > > option, because we are focussed on kernel 5.4. It solves part
> > > of our problem, but being able to specify where exactly interrupts
> > > are running is still superior in our opinion.
> > >
> > > E.g. assume the number of house-keeping cores is small, because we
> > > want to have full control over the system. In our case we have threads
> > > of different priorities where some get an exclusive core. Some other threads
> > > share a core (or a group of cores) with other threads. Now we are still
> > > happy to assign some interrupts to some of the cores which we consider as
> > > "medium-priority". Due to the small number of non-isolated cores, it can
> >
> > So these "medium-priority" cores belong to isolated cpu list, you still expect
> > NVMe interrupts can be handled on these cpu cores, do I understand correctly?
>
> We want to avoid that the NVMe interrupts are on the "high priority" cores.
> Having noise on them is quite bad for us, so we wanted to move some interrupts
> to house-keeping cores and if needed (due to performance issues) keep some on
> those "medium-priority" isolated cores. NVMe is not that highest priority for
> us, but possibly running too much on the house-keeping cores could also be bad.
>
> > If yes, I think your case still can be covered with 'isolcpus=managed_irq' which
> > needn't to be same with cpu cores specified from `isolcpus=`, such as
> > excluding medium-priority cores from 'isolcpus=managed_irq', and
> > meantime include them in plain `isolcpus=`.
>
> Unfortunately, our kernel version (5.4) does not support "managed_irq" and due
> to that we're happy with the patch. However, I see that for newer kernel
> versions the already existing arguments could be sufficient to do everything.

'isolcpus=managed_irq' enablement patches are small, and shouldn't be very
hard to backport.

>
> > > be tricky to assign all interrupts to those without a performance-penalty.
> > >
> > > Given these requirements, manually specifying interrupt/core assignments
> > > would offer greater flexibility and control over system performance.
> > > Moreover, the proposed code changes appear minimal and have no
> > > impact on existing functionalities.
> >
> > Looks your main concern is performance, but as Keith mentioned, the proposed
> > change may degrade nvme perf too:
> >
> > https://lore.kernel.org/linux-nvme/[email protected]/
>
> Yes, but for NVMe it's not that critical. The most important point for us is
> to keep them away from our "high-priority" cores. We still wanted to have
> control where we run those interrupts, but also because we just did not know
> the "managed_irq" option.

OK, thanks for sharing the input!

Now, from the upstream viewpoint, 'isolcpus=managed_irq' should work for your
case, and it doesn't seem necessary to support nvme unmanaged irqs for this
requirement, at least.


thanks,
Ming


2024-05-13 12:33:25

by Benjamin Meier

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

> 'isolcpus=managed_irq' enablement patches are small, and shouldn't be very
> hard to backport.

I have big respect for kernel code, and it's probably not so easy for
non-kernel devs :)

But yeah, we'll look into this.

> > > > be tricky to assign all interrupts to those without a performance-penalty.
> > > >
> > > > Given these requirements, manually specifying interrupt/core assignments
> > > > would offer greater flexibility and control over system performance.
> > > > Moreover, the proposed code changes appear minimal and have no
> > > > impact on existing functionalities.
> > >
> > > Looks your main concern is performance, but as Keith mentioned, the
> > > proposed change may degrade nvme perf too:
> > >
> > > https://lore.kernel.org/linux-nvme/[email protected]/
> >
> > Yes, but for NVMe it's not that critical. The most important point for us is
> > to keep them away from our "high-priority" cores. We still wanted to have
> > control where we run those interrupts, but also because we just did not know
> > the "managed_irq" option.
>
> OK, thanks for share the input!
>
> Now from upstream viewpoint, 'isolcpus=managed_irq' should work for your case,
> and seems not necessary to support nvme unmanaged irq for this requirement
> at least.

Yes, probably that will do it. Personally, I still think it's a nice thing if
it's possible to assign interrupts to specific cores, but practically the
advantages are likely not that big compared to 'isolcpus=managed_irq'.

Thanks for all the explanations

2024-05-13 13:13:03

by Bart Van Assche

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On 5/10/24 08:10, Christoph Hellwig wrote:
> On Fri, May 10, 2024 at 07:14:59AM -0700, Keith Busch wrote:
>> From: Keith Busch <[email protected]>
>>
>> Some people _really_ want to control their interrupt affinity.
>
> So let them argue why. I'd rather have a really, really, really
> good argument for this crap, and I'd like to hear it from the horses
> mouth.

Performance can be increased by modifying the interrupt assignments
carefully, especially in storage appliances that have to process a
large number of network and storage interrupts. By assigning interrupts
carefully, the number of completions processed per interrupt can be
increased, and hence performance also increases. In 2014 I was working
on a product that benefited from this approach.

Thanks,

Bart.


2024-05-20 15:38:00

by Christoph Hellwig

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Sun, May 12, 2024 at 08:35:55AM +0200, Thomas Gleixner wrote:
> That's expected as you pin the I/O operation on the isolated CPUs which
> in turn makes them use the per CPU queue.
>
> The isolated CPUs are only excluded for device management interrupts,
> but not for the affinity spread of the queues.

We'll probably need a version of isolcpus that also excludes the
interrupt spread given that users are asking for it. And I'd much
prefer that over adding random module options to every driver to disable
managed interrupts.


2024-05-20 20:34:56

by Thomas Gleixner

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Mon, May 20 2024 at 17:37, Christoph Hellwig wrote:

> On Sun, May 12, 2024 at 08:35:55AM +0200, Thomas Gleixner wrote:
>> That's expected as you pin the I/O operation on the isolated CPUs which
>> in turn makes them use the per CPU queue.
>>
>> The isolated CPUs are only excluded for device management interrupts,
>> but not for the affinity spread of the queues.
>
> We'll probably need a version of isolcpus that also excludes the
> interrupt spread given that users are asking for it. And I'd much
> prefer that over adding radom module options to every driver to disable
> managed interrupts.

No objections from my side.

2024-05-21 02:32:26

by Ming Lei

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Mon, May 20, 2024 at 05:37:42PM +0200, Christoph Hellwig wrote:
> On Sun, May 12, 2024 at 08:35:55AM +0200, Thomas Gleixner wrote:
> > That's expected as you pin the I/O operation on the isolated CPUs which
> > in turn makes them use the per CPU queue.
> >
> > The isolated CPUs are only excluded for device management interrupts,
> > but not for the affinity spread of the queues.
>
> We'll probably need a version of isolcpus that also excludes the
> interrupt spread given that users are asking for it. And I'd much
> prefer that over adding radom module options to every driver to disable
> managed interrupts.

BTW, isolcpus has been marked as deprecated, and it can't be adjusted at
runtime.

isolcpus= [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
[Deprecated - use cpusets instead]



Thanks,
Ming


2024-05-21 08:38:36

by Thomas Gleixner

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Tue, May 21 2024 at 10:31, Ming Lei wrote:

> On Mon, May 20, 2024 at 05:37:42PM +0200, Christoph Hellwig wrote:
>> On Sun, May 12, 2024 at 08:35:55AM +0200, Thomas Gleixner wrote:
>> > That's expected as you pin the I/O operation on the isolated CPUs which
>> > in turn makes them use the per CPU queue.
>> >
>> > The isolated CPUs are only excluded for device management interrupts,
>> > but not for the affinity spread of the queues.
>>
>> We'll probably need a version of isolcpus that also excludes the
>> interrupt spread given that users are asking for it. And I'd much
>> prefer that over adding radom module options to every driver to disable
>> managed interrupts.
>
> BTW, isolcpus has been marked as deprecated, and it can't be adjust
> runtime.

Which is far from reality, as cpusets do not allow doing what isolcpus
does today.

Also, runtime adjustment of managed interrupts needs way more thought.

Thanks

tglx

2024-05-21 10:07:13

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] nvme-pci: allow unmanaged interrupts

On Tue, May 21, 2024 at 10:38:25AM +0200, Thomas Gleixner wrote:
> On Tue, May 21 2024 at 10:31, Ming Lei wrote:
>
> > On Mon, May 20, 2024 at 05:37:42PM +0200, Christoph Hellwig wrote:
> >> On Sun, May 12, 2024 at 08:35:55AM +0200, Thomas Gleixner wrote:
> >> > That's expected as you pin the I/O operation on the isolated CPUs which
> >> > in turn makes them use the per CPU queue.
> >> >
> >> > The isolated CPUs are only excluded for device management interrupts,
> >> > but not for the affinity spread of the queues.
> >>
> >> We'll probably need a version of isolcpus that also excludes the
> >> interrupt spread given that users are asking for it. And I'd much
> >> prefer that over adding radom module options to every driver to disable
> >> managed interrupts.
> >
> > BTW, isolcpus has been marked as deprecated, and it can't be adjust
> > runtime.
>
> Which is far from reality as cpusets do not allow to do what isolcpus
> does today.
>
> Also runtime adjusting managed interrupts needs way more thoughts.

I'll remove that comment (unless someone beats me to it?). We used to think
that cpusets would indeed deprecate isolcpus, but for several reasons this
will never be the case.

Thanks.