From:   Thomas Gleixner <tglx@linutronix.de>
To:     "Michael S. Tsirkin" <mst@redhat.com>
Cc:     Angus Chen <angus.chen@jaguarmicro.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Ming Lei <ming.lei@redhat.com>,
        Jason Wang <jasowang@redhat.com>
Subject: Re: IRQ affinity problem from virtio_blk
In-Reply-To: <20221115183339-mutt-send-email-mst@kernel.org>
References: <TY2PR06MB3424CB11DB57CA1FAA16F10D85049@TY2PR06MB3424.apcprd06.prod.outlook.com>
 <87v8nfrhbw.ffs@tglx> <20221115174152-mutt-send-email-mst@kernel.org>
 <87sfijrf9o.ffs@tglx> <87o7t7rec7.ffs@tglx>
 <20221115183339-mutt-send-email-mst@kernel.org>
Date:   Wed, 16 Nov 2022 11:43:24 +0100
Message-ID: <87leobqiwj.ffs@tglx>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk

On Tue, Nov 15 2022 at 18:36, Michael S. Tsirkin wrote:
> On Wed, Nov 16, 2022 at 12:24:24AM +0100, Thomas Gleixner wrote:
>> I just checked on a random VM. The PCI device as advertised to the guest
>> does not expose that many vectors. One has 2 and the other 4.
>> 
>> But as the interrupts are requested 'managed' the core ends up setting
>> the vectors aside. That's a fundamental property of managed interrupts.
>> 
>> Assume you have less queues than CPUs, which is the case with 2 vectors
>> and tons of CPUs, i.e. one ends up for config and the other for the
>> actual queue. So the affinity spreading code will end up having the full
>> cpumask for the queue vector, which is marked managed. And managed means
>> that it's guaranteed e.g. in the CPU hotplug case that the interrupt can
>> be migrated to a still online CPU.
>> 
>> So we end up setting 79 vectors aside (one per CPU) in the case that the
>> virtio device only provides two vectors.
>> 
>> But that's not the end of the world as you really would need ~200 such
>> devices to exhaust the vector space...
>
> Let's say we have 20 queues - then just 10 devices will exhaust the
> vector space right?

No.

If you have 20 queues then the queues are spread out over the
CPUs. Assume 80 CPUs:

Then each queue is associated with 80/20 = 4 CPUs and the resulting
affinity mask of each queue contains exactly 4 CPUs:

q0:      0 -  3
q1:      4 -  7
...
q19:    76 - 79

So this puts exactly 80 vectors aside, one per CPU.
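
To make the arithmetic concrete, here is a minimal sketch of the
spreading in Python. It is not the kernel's algorithm (the real code
behind irq_create_affinity_masks() is NUMA aware); it assumes an even,
contiguous split and num_queues <= num_cpus. The names spread_queues()
and reserved_per_cpu() are made up for this illustration.

```python
def spread_queues(num_cpus, num_queues):
    """Split CPUs 0..num_cpus-1 into contiguous groups, one group per queue.

    Simplification: assumes num_queues <= num_cpus and ignores NUMA topology.
    """
    base, extra = divmod(num_cpus, num_queues)
    masks = []
    start = 0
    for q in range(num_queues):
        # The first 'extra' queues take one additional CPU if the split is uneven.
        size = base + (1 if q < extra else 0)
        masks.append(range(start, start + size))
        start += size
    return masks


def reserved_per_cpu(masks, num_cpus):
    """Count how many managed vectors are set aside on each CPU."""
    counts = [0] * num_cpus
    for mask in masks:
        for cpu in mask:
            counts[cpu] += 1
    return counts


masks = spread_queues(80, 20)
counts = reserved_per_cpu(masks, 80)

print("q0: ", masks[0][0], "-", masks[0][-1])
print("q19:", masks[19][0], "-", masks[19][-1])
print("vectors set aside in total:", sum(counts))
print("max vectors set aside on one CPU:", max(counts))
```

Every CPU shows up in exactly one queue mask, so the per-CPU count never
goes above one no matter how the 80 CPUs are divided up.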

As long as at least one CPU of a queue mask is online the queue is
enabled. If the last CPU of a queue mask goes offline then the queue is
shut down, which means the interrupt associated with the queue is shut
down too. That's all handled by the block MQ and the interrupt core. If
a CPU of a queue mask comes back online then the guaranteed vector is
allocated again.
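
The hotplug rule can be modelled in a few lines of Python. This is only
a toy model of the behaviour described above, not how block MQ or the
interrupt core implement it; queue_enabled() and the CPU sets are
hypothetical names for the illustration.

```python
def queue_enabled(mask, online_cpus):
    """A queue (and its interrupt) stays up while any CPU in its mask is online."""
    return any(cpu in online_cpus for cpu in mask)


q0_mask = {0, 1, 2, 3}      # affinity mask of q0 in the 80 CPU / 20 queue case
online = set(range(80))     # all CPUs online to begin with

# Take CPUs 0-2 offline: CPU 3 is still there, so q0 keeps running.
online -= {0, 1, 2}
still_up = queue_enabled(q0_mask, online)

# Take the last CPU of the mask offline: the queue and its interrupt shut down.
online -= {3}
after_last_offline = queue_enabled(q0_mask, online)

# Bring one CPU of the mask back: the queue can be enabled again, using
# the vector which was guaranteed for it on that CPU.
online |= {2}
after_return = queue_enabled(q0_mask, online)

print(still_up, after_last_offline, after_return)
```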

So it does not matter how many queues per device you have. Each device
will reserve exactly ONE interrupt per CPU.

Ergo you need 200 devices to exhaust the vector space.
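
A rough sketch of that count in Python, under stated assumptions: the
figure of about 200 usable vectors per CPU is taken from the estimate
earlier in this thread, the spreading is the simplified even split (no
NUMA awareness, queues <= CPUs), and the non-managed config vector and
all other interrupt users are ignored. The function names are invented
for the example.

```python
VECTORS_PER_CPU = 200   # assumption: rough per-CPU budget quoted in this thread


def max_reserved_on_any_cpu(num_cpus, num_queues):
    """Managed vectors one device sets aside on the most loaded CPU.

    Simplified even split; assumes num_queues <= num_cpus.
    """
    counts = [0] * num_cpus
    base, extra = divmod(num_cpus, num_queues)
    cpu = 0
    for q in range(num_queues):
        for _ in range(base + (1 if q < extra else 0)):
            counts[cpu] += 1
            cpu += 1
    return max(counts)


def devices_to_exhaust(num_cpus, num_queues, vectors_per_cpu=VECTORS_PER_CPU):
    """Number of identical devices needed to use up one CPU's vector budget."""
    return vectors_per_cpu // max_reserved_on_any_cpu(num_cpus, num_queues)


results = {q: devices_to_exhaust(80, q) for q in (1, 2, 20, 80)}
for queues, devices in results.items():
    print(f"{queues:2d} queues per device -> {devices} devices")
```

The queue count drops out of the result because each CPU appears in one
queue mask per device.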

Thanks,

        tglx