2021-11-01 20:05:20

by Tadeusz Struk

[permalink] [raw]
Subject: Re: general protection fault in del_gendisk

On 10/29/21 12:13, Tadeusz Struk wrote:
> Hi,
> I'm looking at a bug found by the syzkaller robot [1], and I just wanted
> to confirm that my understanding is correct, and the issue can be closed.
> First, the kernel is configured with some fault injections enabled:
>
> CONFIG_FAULT_INJECTION=y
> CONFIG_FAILSLAB=y
> CONFIG_FAIL_PAGE_ALLOC=y
>
> The test adds loop devices, which causes some entries in sysfs to be created.
> It does some magic with ioctls, which calls:
> __device_add_disk() -> register_disk()
> which eventually triggers sysfs_create_files() and it crashes there,
> in line 627 [2], because the fault injector logic triggers it.
> That can be seen in the trace [3]:
> [   34.089707][ T1813] FAULT_INJECTION: forcing a failure.
>
> Sysfs code returns a -ENOMEM error, but because the __device_add_disk()
> implementation mostly uses void function, and doesn't return on errors [4]
> it goes farther, hits some warnings, like:
> disk_add_events() -> sysfs_create_files() -> sysfs_create_file_ns() - > WARN()
> and eventually triggers general protection fault in sysfs code, and panics there.
>
> I think for this to recover and return an error to the caller via ioctl()
> the __device_add_disk() code would need be reworked to handle errors,
> and return errors to the caller.
> My question is: is it implemented like this by design? Are there any plans
> to make it fail more gracefully?

Hi,
Any comments on this one?

--
Thanks,
Tadeusz


2021-11-01 20:07:08

by Jens Axboe

[permalink] [raw]
Subject: Re: general protection fault in del_gendisk

On 11/1/21 2:01 PM, Tadeusz Struk wrote:
> On 10/29/21 12:13, Tadeusz Struk wrote:
>> Hi,
>> I'm looking at a bug found by the syzkaller robot [1], and I just wanted
>> to confirm that my understanding is correct, and the issue can be closed.
>> First, the kernel is configured with some fault injections enabled:
>>
>> CONFIG_FAULT_INJECTION=y
>> CONFIG_FAILSLAB=y
>> CONFIG_FAIL_PAGE_ALLOC=y
>>
>> The test adds loop devices, which causes some entries in sysfs to be created.
>> It does some magic with ioctls, which calls:
>> __device_add_disk() -> register_disk()
>> which eventually triggers sysfs_create_files() and it crashes there,
>> in line 627 [2], because the fault injector logic triggers it.
>> That can be seen in the trace [3]:
>> [   34.089707][ T1813] FAULT_INJECTION: forcing a failure.
>>
>> Sysfs code returns a -ENOMEM error, but because the __device_add_disk()
>> implementation mostly uses void function, and doesn't return on errors [4]
>> it goes farther, hits some warnings, like:
>> disk_add_events() -> sysfs_create_files() -> sysfs_create_file_ns() - > WARN()
>> and eventually triggers general protection fault in sysfs code, and panics there.
>>
>> I think for this to recover and return an error to the caller via ioctl()
>> the __device_add_disk() code would need be reworked to handle errors,
>> and return errors to the caller.
>> My question is: is it implemented like this by design? Are there any plans
>> to make it fail more gracefully?
>
> Hi,
> Any comments on this one?

People will take a look at it, but you sent it out on a Saturday right
before a merge window, doing a 'ping' kind of followup on a Monday is
way too soon.

--
Jens Axboe

2021-11-02 06:48:07

by Christoph Hellwig

[permalink] [raw]
Subject: Re: general protection fault in del_gendisk

> People will take a look at it, but you sent it out on a Saturday right
> before a merge window, doing a 'ping' kind of followup on a Monday is
> way too soon.

Please retests on 5.16-rc1 once it is out. Sorting out add_disk error
handling is one of the big changes in this merge window. And no, it is
not easily backportable.

2021-11-09 03:15:48

by Tadeusz Struk

[permalink] [raw]
Subject: Re: general protection fault in del_gendisk

On 11/1/21 23:45, Christoph Hellwig wrote:
>> People will take a look at it, but you sent it out on a Saturday right
>> before a merge window, doing a 'ping' kind of followup on a Monday is
>> way too soon.
> Please retests on 5.16-rc1 once it is out. Sorting out add_disk error
> handling is one of the big changes in this merge window. And no, it is
> not easily backportable.

I have ran this on the latest mainline. I can confirm that the errors are
nicely handled there. It triggers a warning and panics in device_add_disk()
because of:
return WARN_ON_ONCE(ret); /* keep until all callers handle errors */
and
Kernel panic - not syncing: panic_on_warn set ...

I will close the issue.
--
Thanks,
Tadeusz

2021-11-12 06:36:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: general protection fault in del_gendisk

On Mon, Nov 08, 2021 at 11:19:35AM -0800, Tadeusz Struk wrote:
> I have ran this on the latest mainline. I can confirm that the errors are
> nicely handled there. It triggers a warning and panics in device_add_disk()
> because of:
> return WARN_ON_ONCE(ret); /* keep until all callers handle errors */
> and
> Kernel panic - not syncing: panic_on_warn set ...

The WARRN_ON will go away once all drivers are fixed. And no, it does
not panic the kernel unless you set an obscure sysctl asking for it to
panic on warnings, in which case you get what you ask for.