2017-12-04 16:21:14

by Christoph Hellwig

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
> Seems that this is the place where the system stops. (see the sysrq-t output
> at the bottom).

Can you check which of the patches in the tree is the culprit?


2017-12-06 12:25:22

by Christian Borntraeger

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On 12/04/2017 05:21 PM, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:18:09PM +0100, Christian Borntraeger wrote:
>> Works fine under KVM with virtio-blk, but still hangs during boot in an LPAR.
>> FWIW, the system not only has scsi disks via fcp but also DASDs as a boot disk.
>> Seems that this is the place where the system stops. (see the sysrq-t output
>> at the bottom).
>
> Can you check which of the patches in the tree is the culprit?


From this branch:

git://git.infradead.org/users/hch/block.git blk-mq-hotplug-fix

commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
blk-mq: create a blk_mq_ctx for each possible CPU
does not boot on DASD and
commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
genirq/affinity: assign vectors to all possible CPUs
does boot with DASD disks.

Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
s390 irq handling code).


Some history:
Since 4.13 (and also in 4.12 stable) I get this warning
"WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)"
on CPU hotplug of previously unavailable CPUs (real hotplug, not offline/online).

This was introduced with

blk-mq: Create hctx for each present CPU
commit 4b855ad37194f7bdbb200ce7a1c7051fecb56a08

Christoph is currently working on a fix. With it, the kernel boots with virtio-blk and
the warning is gone, but it hangs (outstanding I/O) with DASD disks.
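A toy model may help show why a per-CPU mapping built from *present* CPUs trips over real hotplug. It assumes, as a simplification, that blk-mq decides once at setup which CPUs are covered by some hardware queue's cpumask, and that the WARNING above fires when a queue is run on a CPU outside that set. `toy_hctx_map`, `toy_map_init()` and `toy_map_has_cpu()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPU_IDS 8 /* hypothetical number of possible CPUs */

/* Hypothetical stand-in for "CPUs covered by some hctx->cpumask". */
struct toy_hctx_map {
	bool covered[TOY_NR_CPU_IDS];
};

/* Build the map once, from whichever CPU set is passed in (present or possible). */
static void toy_map_init(struct toy_hctx_map *map, const bool *cpus)
{
	for (int cpu = 0; cpu < TOY_NR_CPU_IDS; cpu++)
		map->covered[cpu] = cpus[cpu];
}

/* An uncovered CPU is where (by assumption) the warning would fire. */
static bool toy_map_has_cpu(const struct toy_hctx_map *map, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPU_IDS);
	return map->covered[cpu];
}
```

Building the map from present CPUs and then hot-adding CPU 4 leaves `toy_map_has_cpu(&map, 4)` false, while building it from possible CPUs, the direction of "blk-mq: create a blk_mq_ctx for each possible CPU", covers CPU 4 from the start.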

2017-12-06 23:29:27

by Christoph Hellwig

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
> blk-mq: create a blk_mq_ctx for each possible CPU
> does not boot on DASD and
> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
> genirq/affinity: assign vectors to all possible CPUs
> does boot with DASD disks.
>
> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> s390 irq handling code).

That is interesting as it really isn't related to interrupts at all,
it just ensures that possible CPUs are set in ->cpumask.

I guess we'd really want:

e005655c389e3d25bf3e43f71611ec12f3012de0
"blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"

before this commit, but it seems like the whole stack didn't work for
you either.

I wonder if there is some weird thing about nr_cpu_ids in s390?

2017-12-07 09:20:27

by Christian Borntraeger

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)



On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>> blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>> genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
>
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> you either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?

The problem starts as soon as NR_CPUS is larger than the number
of real CPUs.

A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than one non-online CPU?

E.g., don't we need something like this (whitespace and indentation damaged):

@@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
if (--hctx->next_cpu_batch <= 0) {
int next_cpu;

+ do {
next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
- if (!cpu_online(next_cpu))
- next_cpu = cpumask_next(next_cpu, hctx->cpumask);
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(hctx->cpumask);
+ } while (!cpu_online(next_cpu));

hctx->next_cpu = next_cpu;
hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;

It does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS).
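The question about more than one non-online CPU can be checked with a small userspace model. `mask_next()` loosely mimics `cpumask_next()` (next set bit after `cpu`, or the mask size if none); `TOY_NR_CPU_IDS`, `next_cpu_skip_once()` and `next_online_cpu()` are hypothetical names. `next_cpu_skip_once()` follows the removed lines of the diff, and with CPUs 2 and 3 both offline it can still return CPU 3. `next_online_cpu()` is one way to write the loop so that it terminates; note that it advances from the last candidate, whereas the diff as quoted recomputes `cpumask_next(hctx->next_cpu, ...)` on every pass and so cannot move past an offline CPU. This is a sketch under those assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPU_IDS 8 /* hypothetical stand-in for nr_cpu_ids */

/* Next set bit after `cpu` (cpu may be -1), or TOY_NR_CPU_IDS if none. */
static int mask_next(int cpu, const bool *mask)
{
	for (int i = cpu + 1; i < TOY_NR_CPU_IDS; i++)
		if (mask[i])
			return i;
	return TOY_NR_CPU_IDS;
}

/* First set bit, or TOY_NR_CPU_IDS for an empty mask. */
static int mask_first(const bool *mask)
{
	return mask_next(-1, mask);
}

/* Mirrors the removed lines of the diff: skips at most one offline CPU. */
static int next_cpu_skip_once(int prev, const bool *mask, const bool *online)
{
	int next = mask_next(prev, mask);

	if (next < TOY_NR_CPU_IDS && !online[next])
		next = mask_next(next, mask);
	if (next >= TOY_NR_CPU_IDS)
		next = mask_first(mask);
	return next;
}

/* Keeps advancing (with wrap-around) until it finds an online CPU in the mask. */
static int next_online_cpu(int prev, const bool *mask, const bool *online)
{
	int next = prev;

	assert(prev >= -1 && prev < TOY_NR_CPU_IDS);
	/* Each probe lands on one set bit, so TOY_NR_CPU_IDS + 1 probes cover a full wrap. */
	for (int probe = 0; probe <= TOY_NR_CPU_IDS; probe++) {
		next = mask_next(next, mask);
		if (next >= TOY_NR_CPU_IDS)
			next = mask_first(mask);
		if (next < TOY_NR_CPU_IDS && online[next])
			return next;
	}
	return -1; /* no online CPU in the mask */
}
```

With all eight CPUs in the mask but only CPUs 0 and 1 online, `next_cpu_skip_once(1, ...)` returns the offline CPU 3, while `next_online_cpu(1, ...)` wraps around to CPU 0. The scan is linear in the mask size, which matches the concern about large NR_CPUS.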


2017-12-14 17:32:31

by Christian Borntraeger

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

Independent of the issues with the DASD disks, this also does not seem to enable
additional hardware queues.

With CPUs 0 and 1 (and 248 CPUs max), I get CPUs 0 and 2-247 attached to
hardware context 0 and CPU 1 attached to hardware context 1.

If I now add a CPU, nothing changes: hardware contexts 2, 3, 4, etc.
all have no CPU, and hardware context 0 keeps sitting on all CPUs (except 1).




On 12/07/2017 10:20 AM, Christian Borntraeger wrote:
>
>
> On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>> does not boot on DASD and
>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>> genirq/affinity: assign vectors to all possible CPUs
>>> does boot with DASD disks.
>>>
>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>> s390 irq handling code).
>>
>> That is interesting as it really isn't related to interrupts at all,
>> it just ensures that possible CPUs are set in ->cpumask.
>>
>> I guess we'd really want:
>>
>> e005655c389e3d25bf3e43f71611ec12f3012de0
>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>
>> before this commit, but it seems like the whole stack didn't work for
>> you either.
>>
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>
> The problem starts as soon as NR_CPUS is larger than the number
> of real CPUs.
>
> A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than one non-online CPU?
>
> E.g., don't we need something like this (whitespace and indentation damaged):
>
> @@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
> if (--hctx->next_cpu_batch <= 0) {
> int next_cpu;
>
> + do {
> next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
> - if (!cpu_online(next_cpu))
> - next_cpu = cpumask_next(next_cpu, hctx->cpumask);
> if (next_cpu >= nr_cpu_ids)
> next_cpu = cpumask_first(hctx->cpumask);
> + } while (!cpu_online(next_cpu));
>
> hctx->next_cpu = next_cpu;
> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>
> It does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS).
>
>
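The mapping described at the top of this message can be reproduced with a simple rule, assumed here purely for illustration and not taken from blk_mq_map_queues(): CPUs that are not online when the map is built go to queue 0, and online CPUs are spread round-robin. `toy_map_queues()` and `toy_cpus_on_queue()` are hypothetical. What it illustrates is that a map computed once does not move a newly added CPU off queue 0 until it is rebuilt.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 248 /* possible CPUs, as in the report */

/*
 * Hypothetical mapping rule, assumed for illustration only: CPUs that are
 * not online when the map is built fall back to queue 0; online CPUs are
 * spread round-robin over the hardware queues.
 */
static void toy_map_queues(int *queue_of_cpu, const bool *online, int nr_queues)
{
	assert(nr_queues > 0);
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		queue_of_cpu[cpu] = online[cpu] ? cpu % nr_queues : 0;
}

/* Count how many CPUs are currently mapped to hardware queue `queue`. */
static int toy_cpus_on_queue(const int *queue_of_cpu, int queue)
{
	int n = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (queue_of_cpu[cpu] == queue)
			n++;
	return n;
}
```

With 4 queues and only CPUs 0 and 1 online, queue 0 ends up with 247 CPUs (0 and 2-247), queue 1 with CPU 1, and queues 2 and 3 with none, the layout reported above. Marking CPU 2 online leaves it on queue 0 until `toy_map_queues()` is called again.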

2017-12-18 13:56:14

by Stefan Haberland

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On 07.12.2017 00:29, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>> blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>> genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>> s390 irq handling code).
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> you either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-s390" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

I tried this on my system, and the blk-mq-hotplug-fix branch does not
boot for me either.
The disks come up and I/O works fine; at least partition detection and
the EXT4-fs mount work.

But at some point the disks stop getting any requests.

I currently have no clue why.
I took a dump and looked at the disk states, and they are fine. There are no
errors in the logs or in our debug entries, just empty DASD devices
waiting to be called for I/O requests.

Do you have anything I could have a look at?

2017-12-20 15:47:32

by Christian Borntraeger

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> On 07.12.2017 00:29, Christoph Hellwig wrote:
>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>> does not boot on DASD and
>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>> genirq/affinity: assign vectors to all possible CPUs
>>> does boot with DASD disks.
>>>
>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>> s390 irq handling code).
>> That is interesting as it really isn't related to interrupts at all,
>> it just ensures that possible CPUs are set in ->cpumask.
>>
>> I guess we'd really want:
>>
>> e005655c389e3d25bf3e43f71611ec12f3012de0
>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>
>> before this commit, but it seems like the whole stack didn't work for
>> you either.
>>
>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>
> I tried this on my system, and the blk-mq-hotplug-fix branch does not boot for me either.
> The disks come up and I/O works fine; at least partition detection and the EXT4-fs mount work.
>
> But at some point the disks stop getting any requests.
>
> I currently have no clue why.
> I took a dump and looked at the disk states, and they are fine. There are no errors in the logs or in our debug entries, just empty DASD devices waiting to be called for I/O requests.
>
> Do you have anything I could have a look at?

Jens, Christoph, so what do we do about this?
To summarize:
- commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
- Jens' quick revert did fix the issue and did not break DASD support, but has some issues
with interrupt affinity.
- Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
without hotplug).

Christian

2018-01-11 09:13:48

by Ming Lei

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
> > On 07.12.2017 00:29, Christoph Hellwig wrote:
> >> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
> >>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
> >>> blk-mq: create a blk_mq_ctx for each possible CPU
> >>> does not boot on DASD and
> >>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
> >>> genirq/affinity: assign vectors to all possible CPUs
> >>> does boot with DASD disks.
> >>>
> >>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
> >>> s390 irq handling code).
> >> That is interesting as it really isn't related to interrupts at all,
> >> it just ensures that possible CPUs are set in ->cpumask.
> >>
> >> I guess we'd really want:
> >>
> >> e005655c389e3d25bf3e43f71611ec12f3012de0
> >> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
> >>
> >> before this commit, but it seems like the whole stack didn't work for
> >> you either.
> >>
> >> I wonder if there is some weird thing about nr_cpu_ids in s390?
> >
> > I tried this on my system, and the blk-mq-hotplug-fix branch does not boot for me either.
> > The disks come up and I/O works fine; at least partition detection and the EXT4-fs mount work.
> >
> > But at some point the disks stop getting any requests.
> >
> > I currently have no clue why.
> > I took a dump and looked at the disk states, and they are fine. There are no errors in the logs or in our debug entries, just empty DASD devices waiting to be called for I/O requests.
> >
> > Do you have anything I could have a look at?
>
> Jens, Christoph, so what do we do about this?
> To summarize:
> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
> with interrupt affinity.
> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
> without hotplug).

Hello,

This is a valid use case for VMs; I think we need to fix it.

It looks like there is an issue in the fourth patch ("blk-mq: only select online
CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the
other 3 patches are the same as Christoph's:

https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix

gitweb:
https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix

Could you test it and provide feedback?

BTW, if it doesn't help with this issue, could you boot from a normal disk first
and then dump the blk-mq debugfs entries of the DASD?

Thanks,
Ming

2018-01-11 09:26:40

by Stefan Haberland

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On 11.01.2018 10:13, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>>>> does not boot on DASD and
>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>>>> genirq/affinity: assign vectors to all possible CPUs
>>>>> does boot with DASD disks.
>>>>>
>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>> s390 irq handling code).
>>>> That is interesting as it really isn't related to interrupts at all,
>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>
>>>> I guess we'd really want:
>>>>
>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>
>>>> before this commit, but it seems like the whole stack didn't work for
>>>> you either.
>>>>
>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>> I tried this on my system, and the blk-mq-hotplug-fix branch does not boot for me either.
>>> The disks come up and I/O works fine; at least partition detection and the EXT4-fs mount work.
>>>
>>> But at some point the disks stop getting any requests.
>>>
>>> I currently have no clue why.
>>> I took a dump and looked at the disk states, and they are fine. There are no errors in the logs or in our debug entries, just empty DASD devices waiting to be called for I/O requests.
>>>
>>> Do you have anything I could have a look at?
>> Jens, Christoph, so what do we do about this?
>> To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
>> with interrupt affinity.
>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
>> without hotplug).
> Hello,
>
> This is a valid use case for VMs; I think we need to fix it.
>
> It looks like there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the
> other 3 patches are the same as Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback?
>
> BTW, if it doesn't help with this issue, could you boot from a normal disk first
> and then dump the blk-mq debugfs entries of the DASD?
>
> Thanks,
> Ming
>

Hi,

Thanks for the patch. I had suspected pretty much the same place.
I will test it ASAP.

Regards,
Stefan

2018-01-11 11:45:02

by Christian Borntraeger

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)



On 01/11/2018 10:13 AM, Ming Lei wrote:
> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>>>> does not boot on DASD and
>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>>>> genirq/affinity: assign vectors to all possible CPUs
>>>>> does boot with DASD disks.
>>>>>
>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>> s390 irq handling code).
>>>> That is interesting as it really isn't related to interrupts at all,
>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>
>>>> I guess we'd really want:
>>>>
>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>
>>>> before this commit, but it seems like the whole stack didn't work for
>>>> you either.
>>>>
>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>
>>> I tried this on my system, and the blk-mq-hotplug-fix branch does not boot for me either.
>>> The disks come up and I/O works fine; at least partition detection and the EXT4-fs mount work.
>>>
>>> But at some point the disks stop getting any requests.
>>>
>>> I currently have no clue why.
>>> I took a dump and looked at the disk states, and they are fine. There are no errors in the logs or in our debug entries, just empty DASD devices waiting to be called for I/O requests.
>>>
>>> Do you have anything I could have a look at?
>>
>> Jens, Christoph, so what do we do about this?
>> To summarize:
>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
>> with interrupt affinity.
>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
>> without hotplug).
>
> Hello,
>
> This is a valid use case for VMs; I think we need to fix it.
>
> It looks like there is an issue in the fourth patch ("blk-mq: only select online
> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the
> other 3 patches are the same as Christoph's:
>
> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>
> gitweb:
> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>
> Could you test it and provide feedback?
>
> BTW, if it doesn't help with this issue, could you boot from a normal disk first
> and then dump the blk-mq debugfs entries of the DASD?

That kernel seems to boot fine on my system with DASD disks.

2018-01-11 13:17:16

by Stefan Haberland

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On 11.01.2018 12:44, Christian Borntraeger wrote:
>
> On 01/11/2018 10:13 AM, Ming Lei wrote:
>> On Wed, Dec 20, 2017 at 04:47:21PM +0100, Christian Borntraeger wrote:
>>> On 12/18/2017 02:56 PM, Stefan Haberland wrote:
>>>> On 07.12.2017 00:29, Christoph Hellwig wrote:
>>>>> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>>>>>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>>>>> blk-mq: create a blk_mq_ctx for each possible CPU
>>>>>> does not boot on DASD and
>>>>>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>>>>> genirq/affinity: assign vectors to all possible CPUs
>>>>>> does boot with DASD disks.
>>>>>>
>>>>>> Also adding Stefan Haberland if he has an idea why this fails on DASD and adding Martin (for the
>>>>>> s390 irq handling code).
>>>>> That is interesting as it really isn't related to interrupts at all,
>>>>> it just ensures that possible CPUs are set in ->cpumask.
>>>>>
>>>>> I guess we'd really want:
>>>>>
>>>>> e005655c389e3d25bf3e43f71611ec12f3012de0
>>>>> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>>>>>
>>>>> before this commit, but it seems like the whole stack didn't work for
>>>>> you either.
>>>>>
>>>>> I wonder if there is some weird thing about nr_cpu_ids in s390?
>>>> I tried this on my system, and the blk-mq-hotplug-fix branch does not boot for me either.
>>>> The disks come up and I/O works fine; at least partition detection and the EXT4-fs mount work.
>>>>
>>>> But at some point the disks stop getting any requests.
>>>>
>>>> I currently have no clue why.
>>>> I took a dump and looked at the disk states, and they are fine. There are no errors in the logs or in our debug entries, just empty DASD devices waiting to be called for I/O requests.
>>>>
>>>> Do you have anything I could have a look at?
>>> Jens, Christoph, so what do we do about this?
>>> To summarize:
>>> - commit 4b855ad37194f7 ("blk-mq: Create hctx for each present CPU") broke CPU hotplug.
>>> - Jens' quick revert did fix the issue and did not break DASD support, but has some issues
>>> with interrupt affinity.
>>> - Christoph's patch set fixes the hotplug issue for virtio-blk but causes I/O hangs on DASDs (even
>>> without hotplug).
>> Hello,
>>
>> This is a valid use case for VMs; I think we need to fix it.
>>
>> It looks like there is an issue in the fourth patch ("blk-mq: only select online
>> CPUs in blk_mq_hctx_next_cpu"). I fixed it in the following tree; the
>> other 3 patches are the same as Christoph's:
>>
>> https://github.com/ming1/linux.git v4.15-rc-block-for-next-cpuhot-fix
>>
>> gitweb:
>> https://github.com/ming1/linux/commits/v4.15-rc-block-for-next-cpuhot-fix
>>
>> Could you test it and provide feedback?
>>
>> BTW, if it doesn't help with this issue, could you boot from a normal disk first
>> and then dump the blk-mq debugfs entries of the DASD?
> That kernel seems to boot fine on my system with DASD disks.
>

I did some regression testing and it works quite well. Boot works, and
attaching CPUs at runtime on z/VM and enabling them in Linux works
as well.
I also ran some loops of DASD online/offline and CPU enable/disable.

Regards,
Stefan
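For reference, the CPU enable/disable part of such a loop can be driven through the standard `/sys/devices/system/cpu/cpuN/online` attribute. The sketch below is hypothetical: the helper names are made up, writing the attribute needs root, and it does not cover attaching CPUs on z/VM or setting DASDs online/offline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the sysfs path of CPU `cpu`'s "online" attribute; -1 if buf is too small. */
static int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (online) or "0" (offline); needs root. Returns 0 on success, -1 on error. */
static int set_cpu_online(unsigned int cpu, int online)
{
	const char *val = online ? "1" : "0";
	char path[64];
	FILE *f;

	if (cpu_online_path(path, sizeof(path), cpu))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fwrite(val, 1, strlen(val), f) != strlen(val)) {
		fclose(f);
		return -1;
	}
	/* With stdio buffering, a rejected write may only surface when fclose() flushes. */
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical stress loop: take `cpu` offline and back online `rounds` times. */
static int cpu_toggle_loop(unsigned int cpu, int rounds)
{
	assert(rounds >= 0);
	for (int i = 0; i < rounds; i++)
		if (set_cpu_online(cpu, 0) || set_cpu_online(cpu, 1))
			return -1;
	return 0;
}
```

Run as root, `cpu_toggle_loop(3, 100)` would offline and re-online CPU 3 a hundred times, stopping at the first write the kernel rejects.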

2018-01-11 17:46:58

by Christoph Hellwig

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

Thanks for looking into this, Ming; I had missed it in my current
work overload. Can you send the updated series to Jens?

2018-01-12 01:17:06

by Ming Lei

Subject: Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)

On Thu, Jan 11, 2018 at 06:46:54PM +0100, Christoph Hellwig wrote:
> Thanks for looking into this, Ming; I had missed it in my current
> work overload. Can you send the updated series to Jens?

OK, I will post it out soon.

Thanks,
Ming