2020-08-17 13:54:08

by Ahmed S. Darwish

[permalink] [raw]
Subject: v5.9-rc1 commit reliably breaks pci nvme detection

Hello,

Below v5.9-rc1 commit reliably breaks my boot on a Thinkpad e480
laptop. PCI nvme detection fails, and the kernel becomes not able
anymore to find the rootfs / parse "root=".

Bisecting v5.8=>v5.9-rc1 blames that commit. Reverting it *reliably*
fixes the problem and makes me able to boot v5.9-rc1.

Please advise.

commit 61f3b89630973037f67d8e25e5d26e80a51a7b37
Author: Chaitanya Kulkarni <[email protected]>
Date: Wed Jun 17 10:05:13 2020 +0200

nvme-pci: use unsigned for io queue depth

The NVMe PCIe declares module parameter io_queue_depth as int. Change
this to u16 as queue depth can never be negative. Now to reflect this
update module parameter getter function from param_get_int() ->
param_get_uint() and respective setter function with type of n changed
from int to u16 with param_set_int() to param_set_ushort(). Finally
update struct nvme_dev q_depth member to u16 and use u16 in min_t()
when calculating dev->q_depth in the nvme_pci_enable() (since q_depth is
now u16) and use unsigned int instead of int when calculating
dev->tagset.queue_depth as target variable tagset->queue_depth is of type
unsigned int in nvme_dev_add().

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


2020-08-17 18:39:04

by Jens Axboe

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On 8/17/20 8:56 AM, Keith Busch wrote:
> On Mon, Aug 17, 2020 at 03:50:11PM +0200, Ahmed S. Darwish wrote:
>> Hello,
>>
>> Below v5.9-rc1 commit reliably breaks my boot on a Thinkpad e480
>> laptop. PCI nvme detection fails, and the kernel becomes not able
>> anymore to find the rootfs / parse "root=".
>>
>> Bisecting v5.8=>v5.9-rc1 blames that commit. Reverting it *reliably*
>> fixes the problem and makes me able to boot v5.9-rc1.
>
> The fix is staged in the nvme tree here:
>
> http://git.infradead.org/nvme.git/commit/286155561ecd13b6c85a78eaf2880d3baea03b9e

That would have been nice to have in -rc1...

--
Jens Axboe

2020-08-17 18:45:12

by Keith Busch

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On Mon, Aug 17, 2020 at 03:50:11PM +0200, Ahmed S. Darwish wrote:
> Hello,
>
> Below v5.9-rc1 commit reliably breaks my boot on a Thinkpad e480
> laptop. PCI nvme detection fails, and the kernel becomes not able
> anymore to find the rootfs / parse "root=".
>
> Bisecting v5.8=>v5.9-rc1 blames that commit. Reverting it *reliably*
> fixes the problem and makes me able to boot v5.9-rc1.

The fix is staged in the nvme tree here:

http://git.infradead.org/nvme.git/commit/286155561ecd13b6c85a78eaf2880d3baea03b9e

2020-08-20 16:39:45

by Jens Axboe

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On 8/17/20 9:58 AM, Jens Axboe wrote:
> On 8/17/20 8:56 AM, Keith Busch wrote:
>> On Mon, Aug 17, 2020 at 03:50:11PM +0200, Ahmed S. Darwish wrote:
>>> Hello,
>>>
>>> Below v5.9-rc1 commit reliably breaks my boot on a Thinkpad e480
>>> laptop. PCI nvme detection fails, and the kernel becomes not able
>>> anymore to find the rootfs / parse "root=".
>>>
>>> Bisecting v5.8=>v5.9-rc1 blames that commit. Reverting it *reliably*
>>> fixes the problem and makes me able to boot v5.9-rc1.
>>
>> The fix is staged in the nvme tree here:
>>
>> http://git.infradead.org/nvme.git/commit/286155561ecd13b6c85a78eaf2880d3baea03b9e
>
> That would have been nice to have in -rc1...

And now we're getting very close to shipping items for -rc2, and it's still
not in. Can we please get the nvme pull request out for -rc2?

--
Jens Axboe

2020-08-20 17:10:51

by Ahmed S. Darwish

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On Thu, Aug 20, 2020 at 09:35:38AM -0600, Jens Axboe wrote:
> On 8/17/20 9:58 AM, Jens Axboe wrote:
> > On 8/17/20 8:56 AM, Keith Busch wrote:
> >> On Mon, Aug 17, 2020 at 03:50:11PM +0200, Ahmed S. Darwish wrote:
> >>> Hello,
> >>>
> >>> Below v5.9-rc1 commit reliably breaks my boot on a Thinkpad e480
> >>> laptop. PCI nvme detection fails, and the kernel becomes not able
> >>> anymore to find the rootfs / parse "root=".
> >>>
> >>> Bisecting v5.8=>v5.9-rc1 blames that commit. Reverting it *reliably*
> >>> fixes the problem and makes me able to boot v5.9-rc1.
> >>
> >> The fix is staged in the nvme tree here:
> >>
> >> http://git.infradead.org/nvme.git/commit/286155561ecd13b6c85a78eaf2880d3baea03b9e
> >
> > That would have been nice to have in -rc1...
>
> And now we're getting very close to shipping items for -rc2, and it's still
> not in. Can we please get the nvme pull request out for -rc2?
>

I keep wondering that myself. Completely breaking the boot like this is
really not nice -- and for x86-64 laptops no less :-(

The fix is really small and isolated. "Urgent pull requests", containing
only a fix or two, were created *exactly* for this reson...

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

2020-08-20 17:13:38

by Jens Axboe

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On 8/20/20 11:07 AM, Ahmed S. Darwish wrote:
> On Thu, Aug 20, 2020 at 09:35:38AM -0600, Jens Axboe wrote:
>> On 8/17/20 9:58 AM, Jens Axboe wrote:
>>> On 8/17/20 8:56 AM, Keith Busch wrote:
>>>> On Mon, Aug 17, 2020 at 03:50:11PM +0200, Ahmed S. Darwish wrote:
>>>>> Hello,
>>>>>
>>>>> Below v5.9-rc1 commit reliably breaks my boot on a Thinkpad e480
>>>>> laptop. PCI nvme detection fails, and the kernel becomes not able
>>>>> anymore to find the rootfs / parse "root=".
>>>>>
>>>>> Bisecting v5.8=>v5.9-rc1 blames that commit. Reverting it *reliably*
>>>>> fixes the problem and makes me able to boot v5.9-rc1.
>>>>
>>>> The fix is staged in the nvme tree here:
>>>>
>>>> http://git.infradead.org/nvme.git/commit/286155561ecd13b6c85a78eaf2880d3baea03b9e
>>>
>>> That would have been nice to have in -rc1...
>>
>> And now we're getting very close to shipping items for -rc2, and it's still
>> not in. Can we please get the nvme pull request out for -rc2?
>>
>
> I keep wondering that myself. Completely breaking the boot like this is
> really not nice -- and for x86-64 laptops no less :-(

To be fair, I've only heard this one complaint about it, so hopefully it's
not too widespread. I'm on an x86-64 laptop myself with nvme, and it works
just fine :-)

> The fix is really small and isolated. "Urgent pull requests", containing
> only a fix or two, were created *exactly* for this reson...

Totally agree on that, it should have gone in for -rc1. This will be going
upstream tomorrow, so the end is near...

--
Jens Axboe

2020-08-20 17:16:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On Thu, Aug 20, 2020 at 11:10:58AM -0600, Jens Axboe wrote:
> To be fair, I've only heard this one complaint about it, so hopefully it's
> not too widespread. I'm on an x86-64 laptop myself with nvme, and it works
> just fine :-)

The cause for this is the weird NVMe of by ones, where 0 in a field
means 1. So for the overflow to happen you need a controller that
supports USHORT_MAX queue entries. Which don't seem to be all that
common.

2020-08-20 17:17:17

by Jens Axboe

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On 8/20/20 11:12 AM, Christoph Hellwig wrote:
> On Thu, Aug 20, 2020 at 11:10:58AM -0600, Jens Axboe wrote:
>> To be fair, I've only heard this one complaint about it, so hopefully it's
>> not too widespread. I'm on an x86-64 laptop myself with nvme, and it works
>> just fine :-)
>
> The cause for this is the weird NVMe of by ones, where 0 in a field
> means 1. So for the overflow to happen you need a controller that
> supports USHORT_MAX queue entries. Which don't seem to be all that
> common.

Yeah, don't think I've ever seen those. I come across 1023 and 128 all
the time, but I don't have one in my arsenal of NVMe drives that is
any different than those two.

--
Jens Axboe

2020-08-21 07:32:21

by John Garry

[permalink] [raw]
Subject: Re: v5.9-rc1 commit reliably breaks pci nvme detection

On 20/08/2020 18:12, Christoph Hellwig wrote:
> On Thu, Aug 20, 2020 at 11:10:58AM -0600, Jens Axboe wrote:
>> To be fair, I've only heard this one complaint about it, so hopefully it's
>> not too widespread. I'm on an x86-64 laptop myself with nvme, and it works
>> just fine:-)
> The cause for this is the weird NVMe of by ones, where 0 in a field
> means 1.

I thought that this was a common trick by spec writers to fit a number
in range (0, 2^x] in x bits (as opposed to x+1).