From: John Garry
Subject: Re: Question on handling managed IRQs when hotplugging CPUs
To: Keith Busch
References: <20190129154433.GF15302@localhost.localdomain>
CC: "tglx@linutronix.de", Christoph Hellwig, Marc Zyngier, "axboe@kernel.dk", Peter Zijlstra, Michael Ellerman, Linuxarm, "linux-kernel@vger.kernel.org", Hannes Reinecke
Message-ID: <757902fc-a9ea-090b-7853-89944a0ce1b5@huawei.com>
Date: Tue, 29 Jan 2019 17:12:40 +0000
In-Reply-To: <20190129154433.GF15302@localhost.localdomain>

On 29/01/2019 15:44, Keith Busch wrote:
> On Tue, Jan 29, 2019 at 03:25:48AM -0800, John Garry wrote:
>> Hi,
>>
>> I have a question on $subject which I hope you can shed some light on.
>>
>> According to commit c5cb83bb337c25 ("genirq/cpuhotplug: Handle managed
>> IRQs on CPU hotplug"), if we offline the last CPU in a managed IRQ
>> affinity mask, the IRQ is shut down.
>>
>> The reasoning is that this IRQ is thought to be associated with a
>> specific queue on an MQ device, and the CPUs in the IRQ affinity mask
>> are the same CPUs associated with the queue. So, if no CPU is using the
>> queue, then there is no need for the IRQ.
>>
>> However, how does this handle the scenario of the last CPU in an IRQ
>> affinity mask being offlined while IO associated with the queue is
>> still in flight?
>>
>> Or if we decide to use the queue associated with the current CPU, and
>> then that CPU (being the last CPU online in the queue's IRQ affinity
>> mask) goes offline and we finish the delivery with another CPU?
>>
>> In these cases, when the IO completes, it would not be serviced and
>> would time out.
>>
>> I have actually tried this on my arm64 system and I see IO timeouts.
>
> Hm, we used to freeze the queues with the CPUHP_BLK_MQ_PREPARE callback,
> which would reap all outstanding commands before the CPU and IRQ are
> taken offline. That was removed with commit 4b855ad37194f ("blk-mq:
> Create hctx for each present CPU"). It sounds like we should bring
> something like that back, but make it more fine-grained to the per-CPU
> context.
>

Seems reasonable. But we would need it to deal with drivers which expose
only a single queue to blk-mq yet use many queues internally; I think
megaraid_sas does this, for example.

I would also be slightly concerned about commands which the driver issues
without blk-mq knowing about them, such as SCSI TMFs.

I have appended a few rough sketches below my signature to make the above
more concrete.

Thanks,
John

> .
>
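A simplified sketch of the shutdown decision described in commit
c5cb83bb337c25, as I read it. This is not the exact code in
kernel/irq/cpuhotplug.c, just an illustration of the behaviour in
question (irq_shutdown() itself is internal to kernel/irq/):

#include <linux/irq.h>
#include <linux/irqdesc.h>
#include <linux/cpumask.h>

/*
 * Simplified sketch, not the real hotplug migration code: when the
 * last online CPU in a managed interrupt's affinity mask goes away,
 * the interrupt is shut down rather than migrated.
 */
static void managed_irq_hotplug_sketch(struct irq_desc *desc)
{
	struct irq_data *d = irq_desc_get_irq_data(desc);
	const struct cpumask *affinity = irq_data_get_affinity_mask(d);

	if (!irqd_affinity_is_managed(d))
		return;	/* non-managed IRQs are migrated instead */

	/*
	 * No CPU in the affinity mask remains online, so the queue this
	 * vector serves is assumed to have no users left and the vector
	 * is shut down. Any IO still in flight on that queue never sees
	 * its completion interrupt - the timeout case described above.
	 */
	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
		irq_shutdown(desc);
}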
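For the kind of thing Keith suggests - draining the per-CPU context
before the CPU and its vector go away - the outline below is roughly
what I have in mind. Note it mixes real cpuhp/blk-mq APIs with
hypothetical driver helpers (my_dev_get_queue(), hctx_for_cpu(),
last_online_cpu_for_hctx(), drain_hctx()), so it is only a sketch of
the idea, not a proposal for the actual hook point:

#include <linux/init.h>
#include <linux/cpuhotplug.h>
#include <linux/blk-mq.h>

/* Teardown callback invoked while @cpu is being taken offline. */
static int my_dev_cpu_dead(unsigned int cpu)
{
	struct request_queue *q = my_dev_get_queue();		/* hypothetical */
	struct blk_mq_hw_ctx *hctx = hctx_for_cpu(q, cpu);	/* hypothetical */

	/*
	 * Only act if @cpu is the last online CPU mapped to this hctx;
	 * otherwise another CPU can still service the completions.
	 */
	if (!last_online_cpu_for_hctx(hctx, cpu))		/* hypothetical */
		return 0;

	/* Stop new submissions, then wait out the in-flight IO. */
	blk_mq_quiesce_queue(q);
	drain_hctx(hctx);					/* hypothetical */
	blk_mq_unquiesce_queue(q);

	return 0;
}

static int __init my_dev_register_hotplug(void)
{
	int ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "block/mydev:dead",
				    NULL, my_dev_cpu_dead);

	return ret < 0 ? ret : 0;
}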
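And this is roughly what I mean about drivers which expose a single
queue to blk-mq but spread completions over many internal reply queues;
the queue selection below is driver-private and invisible to blk-mq
(struct my_hba and nr_internal_queues are made up for illustration):

#include <linux/smp.h>

struct my_hba {
	unsigned int nr_internal_queues;	/* hypothetical */
};

/*
 * blk-mq sees a single hw queue; the driver privately spreads work
 * across its internal reply queues based on the submitting CPU. If
 * that CPU later goes offline and the matching vector is shut down,
 * blk-mq has no idea the completion path just disappeared.
 */
static unsigned int pick_internal_reply_queue(struct my_hba *hba)
{
	return raw_smp_processor_id() % hba->nr_internal_queues;
}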