2023-05-29 16:28:36

by Sedat Dilek

Subject: Revert "module: error out early on concurrent load of the same module file"

Hi,

after building Linux v6.4-rc4 I can NOT boot into my Debian GNU/Linux
AMD64 system with root-ext4 (/dev/sdc2 - of course using UUID in
/etc/fstab).

Hanging in...

Running /scripts/local-block

Entering the busybox shell shows that none of /dev/sdaX and /dev/sdbX are
Ext4-FS formatted, and "ls /dev/sdc*" finds nothing.

Attached are my qemu script + log if that helps.

I have CONFIG_EXT4_FS=m - might be a good idea to move that to built-in.
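
As a rough illustration of why CONFIG_EXT4_FS=m matters here: when the root
filesystem driver is a module, it has to be loaded from the initramfs before
the root device can be mounted, so a module-loading regression can stop the
boot. The sketch below scans kernel config text and reports the value of an
option. The function name `option_state` and the sample config text are
hypothetical and are not taken from the attached config.

```python
import re


def option_state(config_text: str, option: str) -> str:
    """Return the value of a CONFIG_ option ('y', 'm', ...) or 'n' if unset."""
    # Anchor on the full option name so CONFIG_EXT4_FS does not match
    # CONFIG_EXT4_FS_POSIX_ACL.
    pattern = re.compile(rf"^{re.escape(option)}=(\S+)$", re.MULTILINE)
    match = pattern.search(config_text)
    if match:
        return match.group(1)
    # Absent, or present only as "# CONFIG_FOO is not set".
    return "n"


# Made-up sample for illustration only.
sample = """\
CONFIG_EXT4_FS=m
# CONFIG_BTRFS_FS is not set
CONFIG_BLK_DEV_SD=y
"""

for opt in ("CONFIG_EXT4_FS", "CONFIG_BTRFS_FS", "CONFIG_BLK_DEV_SD"):
    print(opt, option_state(sample, opt))
```

An option that reports "m" for the root filesystem is the case where boot
depends on module loading working from the initramfs.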

kmod is Debian version 30+20221128-1 - for the record.

Building from scratch with latest Linus Git including:

Revert "module: error out early on concurrent load of the same module file"
https://git.kernel.org/linus/ac2263b588dffd3a1efd7ed0b156ea6c5aea200d

My kernel-config is also attached.

Thanks.

Best regards,
-Sedat-


Attachments:
qemu-log.txt (25.48 kB)
run_qemu.sh (289.00 B)
config-6.4.0-rc4-1-amd64-clang16-kcfi (257.94 kB)

2023-05-29 17:46:51

by Linus Torvalds

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

On Mon, May 29, 2023 at 12:18 PM Sedat Dilek <[email protected]> wrote:
>
> Building from scratch with latest Linus Git including:
>
> Revert "module: error out early on concurrent load of the same module file"
> https://git.kernel.org/linus/ac2263b588dffd3a1efd7ed0b156ea6c5aea200d

So just to confirm: both plain 6.4 _and_ with that revert hangs?

The revert is pure "go back to old state", so the revert really
shouldn't cause any problems what-so-ever.

So if rc4 doesn't boot for you, and the revert also didn't fix it for
you, then there is something else going on.

There have been other module changes during the merge window, and
obviously it might be entirely unrelated to modules too. Can you try
to narrow down when it started failing?

Linus

2023-05-30 04:00:41

by Sedat Dilek

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

On Mon, May 29, 2023 at 7:33 PM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, May 29, 2023 at 12:18 PM Sedat Dilek <[email protected]> wrote:
> >
> > Building from scratch with latest Linus Git including:
> >
> > Revert "module: error out early on concurrent load of the same module file"
> > https://git.kernel.org/linus/ac2263b588dffd3a1efd7ed0b156ea6c5aea200d
>
> So just to confirm: both plain 6.4 _and_ with that revert hangs?
>
> The revert is pure "go back to old state", so the revert really
> shouldn't cause any problems what-so-ever.
>
> So if rc4 doesn't boot for you, and the revert also didn't fix it for
> you, then there is something else going on.
>
> There have been other module changes during the merge window, and
> obviously it might be entirely unrelated to modules too. Can you try
> to narrow down when it started failing?
>

I was able to boot into my system again with the reverted commit.
Note: Ext4-FS was built as module.

-Sedat-

2023-05-30 21:48:38

by Dan Williams

Subject: RE: Revert "module: error out early on concurrent load of the same module file"

[ add linux-cxl ]

Sedat Dilek wrote:
> Hi,
>
> after building Linux v6.4-rc4 I can NOT boot into my Debian GNU/Linux
> AMD64 system with root-ext4 (/dev/sdc2 - of course using UUID in
> /etc/fstab).

I did not find a mailing-list thread for "9828ed3f695a module: error out early
on concurrent load of the same module file", so replying here. This
commit breaks the basic CXL smoke test of loading the combination of
cxl_acpi, cxl_pci, and cxl_mem modules.

Just wanted to highlight this as a test case for the next attempt at this
fix.


2023-05-30 22:03:21

by Linus Torvalds

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

On Tue, May 30, 2023 at 5:43 PM Dan Williams <[email protected]> wrote:
>
> [ add linux-cxl ]
>
> Sedat Dilek wrote:
> > Hi,
> >
> > after building Linux v6.4-rc4 I can NOT boot into my Debian GNU/Linux
> > AMD64 system with root-ext4 (/dev/sdc2 - of course using UUID in
> > /etc/fstab).
>
> I did not find a mailing-list thread for "9828ed3f695a module: error out early
> on concurrent load of the same module file", so replying here.

It is this thread:

https://lore.kernel.org/lkml/[email protected]/

which initially proposed a different solution, then that "just reject
concurrent loads", and after that caused problems, there's yet another
proposal at

https://lore.kernel.org/lkml/CAHk-=wg7ihygotpO9x5a6QJO5oAom9o91==L_Kx-gUHvRYuXiQ@mail.gmail.com/

although if you want to try out that approach, Johan pointed out a
missing initialization of a spinlock in that patch in a reply there.
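
To make the difference between "just reject concurrent loads" and waiting for
the first load a little more concrete, here is a toy model. It is not the
kernel's module loader, and this thread does not state the root cause of the
failures; the sketch only shows one plausible way that an early error could
leave a second caller without a usable module. All names (`ToyLoader`,
`State`, `second_caller_sees_live`) are hypothetical.

```python
from enum import Enum


class State(Enum):
    ABSENT = "absent"
    LOADING = "loading"
    LIVE = "live"


class ToyLoader:
    """Toy model of load requests; not the kernel's module loader."""

    def __init__(self, reject_concurrent: bool) -> None:
        self.reject_concurrent = reject_concurrent
        self.states = {}  # module name -> State

    def request(self, name: str) -> str:
        """Handle a load request and report what the caller observes."""
        current = self.states.get(name, State.ABSENT)
        if current is State.LIVE:
            return "already loaded"
        if current is State.LOADING:
            if self.reject_concurrent:
                # Caller gets an error while the module is still not usable.
                return "busy"
            # Stand-in for blocking until the first load completes.
            self.states[name] = State.LIVE
            return "loaded after wait"
        self.states[name] = State.LIVE
        return "loaded"


def second_caller_sees_live(reject_concurrent: bool) -> bool:
    """A first caller is mid-load of 'ext4' when a second request arrives."""
    loader = ToyLoader(reject_concurrent)
    loader.states["ext4"] = State.LOADING  # first load in progress
    loader.request("ext4")                 # second, concurrent request
    return loader.states["ext4"] is State.LIVE


print(second_caller_sees_live(reject_concurrent=True))
print(second_caller_sees_live(reject_concurrent=False))
```

In this model, the rejecting variant returns to the second caller while the
module is still loading, so anything that caller does next (for example
mounting a filesystem) cannot assume the module is live.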

> Just wanted to highlight this as a test case for the next attempt at this
> fix.

See above: the next attempt won't be until 6.5, but if you saw the
failure on your test-cases, it might be a good idea to check out that
next attempt early..

Linus

2023-05-30 22:04:12

by Luis Chamberlain

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

On Tue, May 30, 2023 at 02:43:12PM -0700, Dan Williams wrote:
> [ add linux-cxl ]
>
> Sedat Dilek wrote:
> > Hi,
> >
> > after building Linux v6.4-rc4 I can NOT boot into my Debian GNU/Linux
> > AMD64 system with root-ext4 (/dev/sdc2 - of course using UUID in
> > /etc/fstab).
>
> I did not find a mailing-list thread for "9828ed3f695a module: error out early
> on concurrent load of the same module file", so replying here. This
> commit breaks the basic CXL smoke test of loading the combination of
> cxl_acpi, cxl_pci, and cxl_mem modules.
>
> Just wanted to highlight this as a test case for the next attempt at this
> fix.

The revert has already been done:

commit ac2263b588dffd3a1efd7ed0b156ea6c5aea200d
Author: Linus Torvalds <[email protected]>
Date:   Mon May 29 06:40:33 2023 -0400

    Revert "module: error out early on concurrent load of the same module file"

The smoke test, is this the ndctl cxl tests with the mock driver? If so
then I could use it to test future efforts for alternatives for this
work before any new changes get merged.

Luis

2023-05-30 22:38:44

by Dan Williams

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

Linus Torvalds wrote:
> On Tue, May 30, 2023 at 5:43 PM Dan Williams <[email protected]> wrote:
> >
> > [ add linux-cxl ]
> >
> > Sedat Dilek wrote:
> > > Hi,
> > >
> > > after building Linux v6.4-rc4 I can NOT boot into my Debian GNU/Linux
> > > AMD64 system with root-ext4 (/dev/sdc2 - of course using UUID in
> > > /etc/fstab).
> >
> > I did not find a mailing-list thread for "9828ed3f695a module: error out early
> > on concurrent load of the same module file", so replying here.
>
> It is this thread:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> which initially proposed a different solution, then that "just reject
> concurrent loads", and after that caused problems, there's yet another
> proposal at
>
> https://lore.kernel.org/lkml/CAHk-=wg7ihygotpO9x5a6QJO5oAom9o91==L_Kx-gUHvRYuXiQ@mail.gmail.com/
>
> although if you want to try out that approach, Johan pointed out a
> missing initialization of a spinlock in that patch in a reply there.
>
> > Just wanted to highlight this as a test case for the next attempt at this
> > fix.
>
> See above: the next attempt won't be until 6.5, but if you saw the
> failure on your test-cases, it might be a good idea to check out that
> next attempt early..

Thanks, will check those out.

I know that the "Link:" for "mailing-list thread where patch originated"
is mostly useless information [1], but when it comes to quickly reporting
test results on the output of "git bisect", it comes in handy.

[1]: https://lore.kernel.org/all/CAHk-=wgzRUT1fBpuz3xcN+YdsX0SxqOzHWRtj0ReHpUBb5TKbA@mail.gmail.com/

2023-05-30 22:40:05

by Dan Williams

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

Luis Chamberlain wrote:
> On Tue, May 30, 2023 at 02:43:12PM -0700, Dan Williams wrote:
> > [ add linux-cxl ]
> >
> > Sedat Dilek wrote:
> > > Hi,
> > >
> > > after building Linux v6.4-rc4 I can NOT boot into my Debian GNU/Linux
> > > AMD64 system with root-ext4 (/dev/sdc2 - of course using UUID in
> > > /etc/fstab).
> >
> > I did not find a mailing-list thread for "9828ed3f695a module: error out early
> > on concurrent load of the same module file", so replying here. This
> > commit breaks the basic CXL smoke test of loading the combination of
> > cxl_acpi, cxl_pci, and cxl_mem modules.
> >
> > Just wanted to highlight this as a test case for the next attempt at this
> > fix.
>
> The revert has already been done:

Yup, found that shortly after.

>
> commit ac2263b588dffd3a1efd7ed0b156ea6c5aea200d
> Author: Linus Torvalds <[email protected]>
> Date:   Mon May 29 06:40:33 2023 -0400
>
>     Revert "module: error out early on concurrent load of the same module file"
>
> The smoke test, is this the ndctl cxl tests with the mock driver? If so
> then I could use it to test future efforts for alternatives for this
> work before any new changes get merged.

In this case this was even before running those. I typically just run
"cxl list" on my base QEMU config, and if it comes up empty then
something regressed.
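
A minimal sketch of that kind of check, assuming the listing is printed as
JSON and that an empty system yields either no output or an empty JSON
container. The exact output of "cxl list" on an empty system is an assumption
here, and the helper name `listing_is_empty` and the sample strings are made
up for illustration.

```python
import json


def listing_is_empty(output: str) -> bool:
    """Return True when a JSON listing contains no objects."""
    text = output.strip()
    if not text:
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Unexpected output: do not claim the listing is empty.
        return False
    return isinstance(data, (list, dict)) and len(data) == 0


# Made-up sample outputs, not captured from a real system.
print(listing_is_empty(""))
print(listing_is_empty("[]\n"))
print(listing_is_empty('[{"memdev": "mem0"}]'))
```

On a machine with the tool installed, the string could come from
`subprocess.run(["cxl", "list"], capture_output=True, text=True).stdout`.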

2023-05-30 22:49:24

by Linus Torvalds

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

On Tue, May 30, 2023 at 6:19 PM Dan Williams <[email protected]> wrote:
>
> I know that the "Link:" for "mailing-list thread where patch originated"
> is mostly useless information [1], but when it comes to quickly reporting
> test results on the output of "git bisect", it comes in handy.

It was literally there in this case. We had multiple links, and you
may just have been overwhelmed by the pure cornucopia of links.

In this case it was the third one:

Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%[email protected]/ [3]

which linked to that thread.

It's the links to pure patch submissions that are useless (ie the
"this is where I sent the patch"). Those lore can find for you
automatically.

Links to actual threads with background and test commentary are
useful, and I add those myself. There were several of them.

Linus

2023-05-30 23:18:07

by Dan Williams

Subject: Re: Revert "module: error out early on concurrent load of the same module file"

Linus Torvalds wrote:
> On Tue, May 30, 2023 at 6:19 PM Dan Williams <[email protected]> wrote:
> >
> > I know that the "Link:" for "mailing-list thread where patch originated"
> > is mostly useless information [1], but when it comes to quickly reporting
> > test results on the output of "git bisect", it comes in handy.
>
> It was literally there in this case. We had multiple links, and you
> may just have been overwhelmed by the pure cornucopia of links.
>
> In this case it was the third one:
>
> Link: https://lore.kernel.org/lkml/ZG%2Fa+nrt4%[email protected]/ [3]
>
> which linked to that thread.
>
> It's the links to pure patch submissions that are useless (ie the
> "this is where I sent the patch"). Those lore can find for you
> automatically.
>
> Links to actual threads with background and test commentary are
> useful, and I add those myself. There were several of them.

Yes, my bad, I was lost in the cornucopia of that thread.