2020-12-08 21:21:09

by John Garry

Subject: Re: problem booting 5.10

On 08/12/2020 19:19, Linus Torvalds wrote:
> On Tue, Dec 8, 2020 at 10:59 AM Martin K. Petersen
> <[email protected]> wrote:
>>
>>> So I'm adding SCSI people to the cc, just in case they go "Hmm..".
>>
>> Only change in this department was:
>>
>> 831e3405c2a3 scsi: core: Don't start concurrent async scan on same host
>
> Yeah, I found that one too, and dismissed it for the same reason you
> did - it wasn't in rc1. Plus it looked very simple.
>
> That said, maybe Julia might have misspoken, and rc1 was ok, so I
> guess it's possible. The scan_mutex does show up in that "locks held"
> list, although I can't see why it would matter. But it does
> potentially change timing (so it could expose some existing race), if
> nothing else.
>
> But let's make sure Jens is aware of this too, in case it's some ATA
> issue. Not that any of those handful of 5.10 changes look remotely
> likely _either_.
>
> Jens, see
>
> https://lore.kernel.org/lkml/alpine.DEB.2.22.394.2012081813310.2680@hadrien/
>
> if you don't already have the lkml thread locally.. There's not enough
> of the dmesg to even really guess what Julia's actual hardware is,
> apart from it being a Seagate SATA disk. Julia? What controllers and
> disks do you have show up when things work?
>
> Linus
> .
>

JFYI, About "scsi: megaraid_sas: Added support for shared host tagset
for cpuhotplug", we did have an issue reported here already from Qian
about a boot hang:

https://lore.kernel.org/linux-scsi/[email protected]/

And the solution to that specific problem is in:
https://lore.kernel.org/linux-block/[email protected]/

This issue may be related, so you could test by reverting that megaraid
sas commit or setting the driver module param "host_tagset_enable=0"
just to see.

Thanks,
John
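
A minimal sketch of the two usual ways to pass the module parameter mentioned above. It assumes the driver module is named megaraid_sas; the parameter name host_tagset_enable comes from the message, while the helper function names are hypothetical and only format the option strings.

```shell
#!/bin/sh
# Assumption: the driver module is called megaraid_sas.
module=megaraid_sas
# Parameter name as given in the message above.
param=host_tagset_enable

# Hypothetical helper: option in kernel command line form (module.param=value).
cmdline_arg() { printf '%s.%s=%s\n' "$module" "$param" "$1"; }

# Hypothetical helper: option in modprobe.d form (options module param=value).
modprobe_line() { printf 'options %s %s=%s\n' "$module" "$param" "$1"; }

cmdline_arg 0      # prints megaraid_sas.host_tagset_enable=0
modprobe_line 0    # prints options megaraid_sas host_tagset_enable=0

# After booting, read the value back if the driver exports it through sysfs.
sysfs="/sys/module/$module/parameters/$param"
if [ -r "$sysfs" ]; then
    cat "$sysfs"
fi
```

The first string would be appended to the kernel command line in the boot loader; the second would go into a file under /etc/modprobe.d/, which typically also requires regenerating the initramfs when the driver is loaded from there.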


2020-12-08 21:27:50

by Linus Torvalds

Subject: Re: problem booting 5.10

On Tue, Dec 8, 2020 at 1:14 PM John Garry <[email protected]> wrote:
>
> JFYI, About "scsi: megaraid_sas: Added support for shared host tagset
> for cpuhotplug", we did have an issue reported here already from Qian
> about a boot hang:

Hmm. That does sound like it might be it.

At this point, the patches from Ming Lei seem to be a riskier approach
than perhaps just reverting the megaraid_sas change?

It looks like those patches are queued up for 5.11, and we could
re-apply the megaraid_sas change then?

Jens, comments?

And Julia - if it's that thing, then a

git revert 103fbf8e4020

would be the thing to test.

Linus
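
The revert test suggested above could be scripted roughly as follows. This is a sketch under stated assumptions: the base tag v5.10-rc7 is the one named later in the thread, the function names run and test_revert are hypothetical, and the build steps assume an existing .config in a checkout of the kernel tree. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# run: print the command, and execute it only when DRY_RUN is not 1.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

# test_revert BASE COMMIT: check out BASE, revert COMMIT, rebuild the kernel.
# Hypothetical helper; stops at the first failing step.
test_revert() {
    base=$1
    commit=$2
    run git checkout "$base" &&
    run git revert --no-edit "$commit" &&
    run make olddefconfig &&
    run make -j"$(getconf _NPROCESSORS_ONLN)"
}

# Dry run with the commit from the message above.
DRY_RUN=1 test_revert v5.10-rc7 103fbf8e4020
```

Installing and booting the resulting kernel is left out because it depends on the distribution and boot loader.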

2020-12-08 22:45:34

by Julia Lawall

Subject: Re: problem booting 5.10



On Tue, 8 Dec 2020, Linus Torvalds wrote:

> On Tue, Dec 8, 2020 at 1:14 PM John Garry <[email protected]> wrote:
> >
> > JFYI, About "scsi: megaraid_sas: Added support for shared host tagset
> > for cpuhotplug", we did have an issue reported here already from Qian
> > about a boot hang:
>
> Hmm. That does sound like it might be it.
>
> At this point, the patches from Ming Lei seem to be a riskier approach
> than perhaps just reverting the megaraid_sas change?
>
> It looks like those patches are queued up for 5.11, and we could
> re-apply the megaraid_sas change then?
>
> Jens, comments?
>
> And Julia - if it's that thing, then a
>
> git revert 103fbf8e4020
>
> would be the thing to test.

This solves the problem. Starting from 5.10-rc7 and doing this revert, I
get a kernel that boots.

thanks,
julia

2020-12-08 22:55:11

by Martin K. Petersen

Subject: Re: problem booting 5.10


Julia,

> This solves the problem. Starting from 5.10-rc7 and doing this revert, I
> get a kernel that boots.

Thanks for testing!

I'll go ahead and revert 103fbf8e4020 in 5.10/scsi-fixes. We can revisit
this change in 5.11 when Ming's fixes are in place.

--
Martin K. Petersen Oracle Linux Engineering

2020-12-09 01:13:33

by Linus Torvalds

Subject: Re: problem booting 5.10

[ Just re-sending with Jens added back - he's been on a couple of the
emails, but wasn't on this one. Sorry for the duplication ]

On Tue, Dec 8, 2020 at 1:23 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Dec 8, 2020 at 1:14 PM John Garry <[email protected]> wrote:
> >
> > JFYI, About "scsi: megaraid_sas: Added support for shared host tagset
> > for cpuhotplug", we did have an issue reported here already from Qian
> > about a boot hang:
>
> Hmm. That does sound like it might be it.
>
> At this point, the patches from Ming Lei seem to be a riskier approach
> than perhaps just reverting the megaraid_sas change?
>
> It looks like those patches are queued up for 5.11, and we could
> re-apply the megaraid_sas change then?
>
> Jens, comments?
>
> And Julia - if it's that thing, then a
>
> git revert 103fbf8e4020
>
> would be the thing to test.
>
> Linus

2020-12-09 01:15:23

by Jens Axboe

Subject: Re: problem booting 5.10

On 12/8/20 2:25 PM, Linus Torvalds wrote:
> [ Just re-sending with Jens added back - he's been on a couple of the
> emails, but wasn't on this one. Sorry for the duplication ]

Don't think I was, but gmail shows me the rest of the thread now.

> On Tue, Dec 8, 2020 at 1:23 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Tue, Dec 8, 2020 at 1:14 PM John Garry <[email protected]> wrote:
>>>
>>> JFYI, About "scsi: megaraid_sas: Added support for shared host tagset
>>> for cpuhotplug", we did have an issue reported here already from Qian
>>> about a boot hang:
>>
>> Hmm. That does sound like it might be it.
>>
>> At this point, the patches from Ming Lei seem to be a riskier approach
>> than perhaps just reverting the megaraid_sas change?
>>
>> It looks like those patches are queued up for 5.11, and we could
>> re-apply the megaraid_sas change then?
>>
>> Jens, comments?
>>
>> And Julia - if it's that thing, then a
>>
>> git revert 103fbf8e4020
>>
>> would be the thing to test.

Ming's series is queued up for 5.11, so if the revert does show that
this is indeed the issue (and it sure looks like it), then I'd suggest
we simply revert this commit from 5.10 and we can revisit after the
merge window opens and Ming's patches are in anyway.

--
Jens Axboe

2020-12-09 01:15:35

by Jens Axboe

Subject: Re: problem booting 5.10

On Tue, Dec 8, 2020 at 3:42 PM Julia Lawall <[email protected]> wrote:
> On Tue, 8 Dec 2020, Linus Torvalds wrote:
>
> > On Tue, Dec 8, 2020 at 1:14 PM John Garry <[email protected]> wrote:
> > >
> > > JFYI, About "scsi: megaraid_sas: Added support for shared host tagset
> > > for cpuhotplug", we did have an issue reported here already from Qian
> > > about a boot hang:
> >
> > Hmm. That does sound like it might be it.
> >
> > At this point, the patches from Ming Lei seem to be a riskier approach
> > than perhaps just reverting the megaraid_sas change?
> >
> > It looks like those patches are queued up for 5.11, and we could
> > re-apply the megaraid_sas change then?
> >
> > Jens, comments?
> >
> > And Julia - if it's that thing, then a
> >
> > git revert 103fbf8e4020
> >
> > would be the thing to test.
>
> This solves the problem. Starting from 5.10-rc7 and doing this
> revert, I get a kernel that boots.

Thanks for testing! Linus, do you just want to revert this, or do you
want me to queue it up?

--
Jens Axboe

2020-12-09 01:16:13

by John Garry

Subject: Re: problem booting 5.10

On 08/12/2020 22:51, Martin K. Petersen wrote:
>
> Julia,
>
>> This solves the problem. Starting from 5.10-rc7 and doing this revert, I
>> get a kernel that boots.
>

Hi Julia,

Can you also please test Ming's patchset here (without the megaraid sas
revert) when you get a chance:
https://lore.kernel.org/linux-block/20201203012638.543321-1-ming.lei@redhat.com/

And please also share your .config, as I guess that it is not mainline
vanilla and we will want to recreate this to be sure for future. Qian's
issue was only exposed with a specific .config enabling lots of heavy
debug options.

Thanks,
John

> Thanks for testing!
>
> I'll go ahead and revert 103fbf8e4020 in 5.10/scsi-fixes. We can revisit
> this change in 5.11 when Ming's fixes are in place.
>

2020-12-09 08:26:34

by Julia Lawall

Subject: Re: problem booting 5.10



On Tue, 8 Dec 2020, John Garry wrote:

> On 08/12/2020 22:51, Martin K. Petersen wrote:
> >
> > Julia,
> >
> > > This solves the problem. Starting from 5.10-rc7 and doing this revert, I
> > > get a kernel that boots.
> >
>
> Hi Julia,
>
> Can you also please test Ming's patchset here (without the megaraid sas
> revert) when you get a chance:
> https://lore.kernel.org/linux-block/20201203012638.543321-1-ming.lei@redhat.com/
>
> And please also share your .config, as I guess that it is not mainline vanilla
> and we will want to recreate this to be sure for future. Qian's issue was only
> exposed with a specific .config enabling lots of heavy debug options.

My config is attached. I'll try the patchset shortly.

julia


Attachments:
i80.config (235.81 kB)

2020-12-09 15:50:21

by Julia Lawall

Subject: Re: problem booting 5.10



On Tue, 8 Dec 2020, John Garry wrote:

> On 08/12/2020 22:51, Martin K. Petersen wrote:
> >
> > Julia,
> >
> > > This solves the problem. Starting from 5.10-rc7 and doing this revert, I
> > > get a kernel that boots.
> >
>
> Hi Julia,
>
> Can you also please test Ming's patchset here (without the megaraid sas
> revert) when you get a chance:
> https://lore.kernel.org/linux-block/20201203012638.543321-1-ming.lei@redhat.com/

5.10-rc7 plus these three commits boots fine.

thanks,
julia

>
> And please also share your .config, as I guess that it is not mainline vanilla
> and we will want to recreate this to be sure for future. Qian's issue was only
> exposed with a specific .config enabling lots of heavy debug options.
>
> Thanks,
> John
>
> > Thanks for testing!
> >
> > I'll go ahead and revert 103fbf8e4020 in 5.10/scsi-fixes. We can revisit
> > this change in 5.11 when Ming's fixes are in place.
> >
>

2020-12-09 15:57:38

by John Garry

Subject: Re: problem booting 5.10

On 09/12/2020 15:44, Julia Lawall wrote:
>
> On Tue, 8 Dec 2020, John Garry wrote:
>
>> On 08/12/2020 22:51, Martin K. Petersen wrote:
>>> Julia,
>>>
>>>> This solves the problem. Starting from 5.10-rc7 and doing this revert, I
>>>> get a kernel that boots.
>> Hi Julia,
>>
>> Can you also please test Ming's patchset here (without the megaraid sas
>> revert) when you get a chance:
>> https://lore.kernel.org/linux-block/20201203012638.543321-1-ming.lei@redhat.com/
> 5.10-rc7 plus these three commits boots fine.
>

Hi Julia,

Ok, Thanks for the confirmation. A sort of relief.

@Kashyap, It would be good if we could recreate this, just in case.

Cheers,
John

2020-12-09 18:54:13

by Kashyap Desai

Subject: RE: problem booting 5.10

> -----Original Message-----
> From: John Garry [mailto:[email protected]]
> Sent: Wednesday, December 9, 2020 9:22 PM
> To: Julia Lawall <[email protected]>; Kashyap Desai
> <[email protected]>
> Cc: Martin K. Petersen <[email protected]>; Linus Torvalds
> <[email protected]>; James E.J. Bottomley
> <[email protected]>; Linux Kernel Mailing List <linux-
> [email protected]>; [email protected]; linux-scsi
> <[email protected]>; Ming Lei <[email protected]>; Sumit Saxena
> <[email protected]>; Shivasharan S
> <[email protected]>
> Subject: Re: problem booting 5.10
>
> On 09/12/2020 15:44, Julia Lawall wrote:
> >
> > On Tue, 8 Dec 2020, John Garry wrote:
> >
> >> On 08/12/2020 22:51, Martin K. Petersen wrote:
> >>> Julia,
> >>>
> >>>> This solves the problem. Starting from 5.10-rc7 and doing this
> >>>> revert, I get a kernel that boots.
> >> Hi Julia,
> >>
> >> Can you also please test Ming's patchset here (without the megaraid sas
> >> revert) when you get a chance:
> >> https://lore.kernel.org/linux-block/20201203012638.543321-1-ming.lei@redhat.com/
> > 5.10-rc7 plus these three commits boots fine.
> >
>
> Hi Julia,
>
> Ok, Thanks for the confirmation. A sort of relief.
>
> @Kashyap, It would be good if we could recreate this, just in case.

I have already tested the series, and it fixes the issue for me. The final
patch (V2) provided by Ming already carries my Tested-by tag.
I have now confirmed this again using the config file provided by Julia
Lawall, with the same result: the issue reproduces, and it is fixed by Ming's
patch below.

https://lore.kernel.org/linux-block/[email protected]/

Kashyap

>
> Cheers,
> John


2020-12-10 01:45:48

by Martin K. Petersen

Subject: Re: problem booting 5.10


Julia,

> 5.10-rc7 plus these three commits boots fine.

Great! Thanks for confirming.

--
Martin K. Petersen Oracle Linux Engineering