2019-11-19 19:01:49

by Linus Torvalds

Subject: Re: general protection fault in kernfs_add_one

So looking at the decode, as usual the noise generated by KASAN isn't
being very helpful, but it does look like at least one of the reports
(I picked 5.2 because I don't care about 4.19 etc) is because
'kernfs_root(kn)' is NULL in kernfs_add_one().

Looking at the reports, every single one seems to have a call chain
that comes from vhci_write() -> vhci_get_user() ->
vhci_create_device() -> __vhci_create_device() -> hci_register_dev()
-> device_add() -> kobject_add().

(In this case, "every single one" is by looking at the last 10 reports
sorted by date, it wasn't exhaustive).

The way it got into 'write()' can be a bit varied (splice, write, whatever).

That makes me think it's bluetooth that is the problem, but it might
be an effect of how syzbot groups the reports too, of course.

Might the device have been added at the same time that the last
previous device was removed, so that the parent was deleted as the new
device was added? I dunno. The repro seems to be a repeated "open
/dev/vhci, write two random bytes to it"
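
A minimal userspace sketch of that repro pattern, for illustration only. It is not the syzbot C reproducer linked below. The helper name `open_write_close_loop` is hypothetical, and the packet bytes in the usage note are an assumption, not something taken from the reports.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: repeatedly open `path`, write the two bytes in
 * `pkt`, and close the descriptor again.
 *
 * Returns the number of iterations in which both bytes were written.
 * Iterations where open() or write() fails are skipped, not retried.
 */
static int open_write_close_loop(const char *path,
                                 const unsigned char pkt[2],
                                 int iterations)
{
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            continue;               /* device node missing or no permission */

        if (write(fd, pkt, 2) == 2)
            ok++;                   /* on /dev/vhci this may register a device */

        close(fd);                  /* tears the device down again, if one was made */
    }
    return ok;
}
```

Called as `open_write_close_loop("/dev/vhci", pkt, n)` with `pkt = {0xff, 0x00}` (assuming 0xff is the HCI vendor packet type that leads into vhci_create_device()), each iteration would add and then remove a device in quick succession, which is the add/remove overlap speculated about above. Running several copies in parallel would be needed to exercise an actual race.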

Or might it be some "it happens after you've added enough devices that
something overflows" issue?

Adding bluetooth people to the cc.

Linus

On Mon, Nov 18, 2019 at 10:27 PM syzbot
<[email protected]> wrote:
>
> syzbot has bisected this bug to:
>
> commit 726e41097920a73e4c7c33385dcc0debb1281e18
> Author: Benjamin Herrenschmidt <[email protected]>
> Date: Tue Jul 10 00:29:10 2018 +0000
>
> drivers: core: Remove glue dirs from sysfs earlier
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=168e1012e00000
> start commit: 5e335542 Merge branch 'for-linus' of git://git.kernel.org/..
> git tree: upstream
> final crash: https://syzkaller.appspot.com/x/report.txt?x=158e1012e00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=118e1012e00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=9917ff4b798e1a1e
> dashboard link: https://syzkaller.appspot.com/bug?extid=db1637662f412ac0d556
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10a66c11400000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1346c771400000
>
> Reported-by: [email protected]
> Fixes: 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection


2019-11-19 23:04:06

by Marcel Holtmann

Subject: Re: general protection fault in kernfs_add_one

Hi Linus,

> So looking at the decode, as usual the noise generated by KASAN isn't
> being very helpful, but it does look like at least one of the reports
> (I picked 5.2 because I don't care about 4.19 etc) is because
> 'kernfs_root(kn)' is NULL in kernfs_add_one().
>
> Looking at the reports, every single one seems to have a call chain
> that comes from vhci_write() -> vhci_get_user() ->
> vhci_create_device() -> __vhci_create_device() -> hci_register_dev()
> -> device_add() -> kobject_add().
>
> (In this case, "every single one" is by looking at the last 10 reports
> sorted by date, it wasn't exhaustive).
>
> The way it got into 'write()' can be a bit varied (splice, write, whatever).
>
> That makes me think it's bluetooth that is the problem, but it might
> be an effect of how syzbot groups the reports too, of course.
>
> Might the device have been added at the same time that the last
> previous device was removed, so that the parent was deleted as the new
> device was added? I dunno. The repro seems to be a repeated "open
> /dev/vhci, write two random bytes to it"
>
> Or might it be some "it happens after you've added enough devices that
> something overflows" issue?

A long time ago there used to be an issue with quick device remove / device add operations, but that was fixed. I am just too fuzzy on the details since it has been a while.

We also haven’t touched our sysfs integration in a while and Bluetooth support is so old that this might have been bit-rotting.

I need to run the reproducer myself and see if anything stands out.

Regards

Marcel


2019-11-20 04:04:58

by Benjamin Herrenschmidt

Subject: Re: general protection fault in kernfs_add_one

On Tue, 2019-11-19 at 11:00 -0800, Linus Torvalds wrote:
> So looking at the decode, as usual the noise generated by KASAN isn't
> being very helpful, but it does look like at least one of the reports
> (I picked 5.2 because I don't care about 4.19 etc) is because
> 'kernfs_root(kn)' is NULL in kernfs_add_one().
>
> Looking at the reports, every single one seems to have a call chain
> that comes from vhci_write() -> vhci_get_user() ->
> vhci_create_device() -> __vhci_create_device() -> hci_register_dev()
> -> device_add() -> kobject_add().
>
> (In this case, "every single one" is by looking at the last 10 reports
> sorted by date, it wasn't exhaustive).
>
> The way it got into 'write()' can be a bit varied (splice, write, whatever).
>
> That makes me think it's bluetooth that is the problem, but it might
> be an effect of how syzbot groups the reports too, of course.
>
> Might the device have been added at the same time that the last
> previous device was removed, so that the parent was deleted as the new
> device was added? I dunno. The repro seems to be a repeated "open
> /dev/vhci, write two random bytes to it"
>
> Or might it be some "it happens after you've added enough devices that
> something overflows" issue?
>
> Adding bluetooth people to the cc.

Could this be what was fixed by:

ac43432cb1f5c2950408534987e57c2071e24d8f
("driver core: Fix use-after-free and double free on glue directory")

Which went into 5.3 afaik ?

Cheers,
Ben.

> Linus
>
> On Mon, Nov 18, 2019 at 10:27 PM syzbot
> <[email protected]> wrote:
> >
> > syzbot has bisected this bug to:
> >
> > commit 726e41097920a73e4c7c33385dcc0debb1281e18
> > Author: Benjamin Herrenschmidt <[email protected]>
> > Date: Tue Jul 10 00:29:10 2018 +0000
> >
> > drivers: core: Remove glue dirs from sysfs earlier
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=168e1012e00000
> > start commit: 5e335542 Merge branch 'for-linus' of git://git.kernel.org/..
> > git tree: upstream
> > final crash: https://syzkaller.appspot.com/x/report.txt?x=158e1012e00000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=118e1012e00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=9917ff4b798e1a1e
> > dashboard link: https://syzkaller.appspot.com/bug?extid=db1637662f412ac0d556
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10a66c11400000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1346c771400000
> >
> > Reported-by: [email protected]
> > Fixes: 726e41097920 ("drivers: core: Remove glue dirs from sysfs earlier")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection


2019-11-20 17:37:03

by Linus Torvalds

Subject: Re: general protection fault in kernfs_add_one

On Tue, Nov 19, 2019 at 8:04 PM Benjamin Herrenschmidt
<[email protected]> wrote:
>
> Could this be what was fixed by:
>
> ac43432cb1f5c2950408534987e57c2071e24d8f
> ("driver core: Fix use-after-free and double free on glue directory")
>
> Which went into 5.3 afaik ?

Hmm. Sounds very possible. It matches the commit syzbot bisected to,
and looking at the reports, I can't find anything that is 5.3 or
later.

I did find a 5.3.0-rc2+ report, but that's still consistent with that
commit: it got merged just before 5.3-rc4.

So I think you're right.

I forget what the magic email rule was to report that something is
fixed to syzbot..

Linus

2019-11-22 08:13:10

by Dmitry Vyukov

Subject: Re: general protection fault in kernfs_add_one

On Wed, Nov 20, 2019 at 5:54 PM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Nov 19, 2019 at 8:04 PM Benjamin Herrenschmidt
> <[email protected]> wrote:
> >
> > Could this be what was fixed by:
> >
> > ac43432cb1f5c2950408534987e57c2071e24d8f
> > ("driver core: Fix use-after-free and double free on glue directory")
> >
> > Which went into 5.3 afaik ?
>
> Hmm. Sounds very possible. It matches the commit syzbot bisected to,
> and looking at the reports, I can't find anything that is 5.3 or
> later.
>
> I did find a 5.3.0-rc2+ report, but that's still consistent with that
> commit: it got merged just before 5.3-rc4.
>
> So I think you're right.
>
> I forget what the magic email rule was to report that something is
> fixed to syzbot..

Hi Linus,

This would be:

#syz fix: driver core: Fix use-after-free and double free on glue directory

FTR, the cheat sheet is referenced in every bug report:

> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.