2022-03-30 03:46:11

by Fabio M. De Francesco

[permalink] [raw]
Subject: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

In sd_probe(), if device_add_disk() fails it simply calls put_device()
and jumps to the "out" label but the device is never deleted from system.
This leads to a memory leak as reported by Syzbot.[1]

Fix this bug by calling device_del() soon before put_device() when
device_add_disk() fails.

[1] [syzbot] memory leak in blk_mq_init_tags
https://lore.kernel.org/lkml/[email protected]/

Reported-by: [email protected]
Suggested-by: Dan Carpenter <[email protected]>
Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
Signed-off-by: Fabio M. De Francesco <[email protected]>
---

This patch replace the previous attempt to fix the bug reported by
Syzbot. Therefore, the previous wrong patch at
https://lore.kernel.org/lkml/[email protected]/
must be discarded.

Many thanks to Dan Carpenter.

drivers/scsi/sd.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a390679cf458..13d96d0f9dde 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3474,6 +3474,7 @@ static int sd_probe(struct device *dev)

error = device_add_disk(dev, gd, NULL);
if (error) {
+ device_del(&sdkp->disk_dev);
put_device(&sdkp->disk_dev);
goto out;
}
--
2.34.1


2022-03-30 16:13:55

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On Tue, Mar 29, 2022 at 05:49:48PM +0200, Fabio M. De Francesco wrote:
> In sd_probe(), if device_add_disk() fails it simply calls put_device()
> and jumps to the "out" label but the device is never deleted from system.
> This leads to a memory leak as reported by Syzbot.[1]
>
> Fix this bug by calling device_del() soon before put_device() when
> device_add_disk() fails.
>
> [1] [syzbot] memory leak in blk_mq_init_tags
> https://lore.kernel.org/lkml/[email protected]/
>
> Reported-by: [email protected]
> Suggested-by: Dan Carpenter <[email protected]>
> Fixes: 2a7a891f4c40 ("scsi: sd: Add error handling support for add_disk()")
> Signed-off-by: Fabio M. De Francesco <[email protected]>

Thanks!

regards,
dan carpenter

2022-03-31 06:05:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

> The temptation was to call device_unregister() which is a combined
> device_del(); device_put(); but when the device_initialize() and
> device_add() are called separately, then I think it is more readable to
> call del and put separately as well.

I think we should also consolidate the initialization side. Using
device_register and device_unregister would have prevented this bug
and I should have switched to that before refactoring the code.

2022-03-31 06:35:10

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On Thu, Mar 31, 2022 at 11:26:22AM -0400, 'Wenchao Hao' via syzkaller-bugs wrote:
> I do not think it's necessary to call device_del() on this path. If the device
> has been added, put_device() would delete it from sysfs. So the origin error
> handle is ok with me.
>

No. The original is buggy and it was detected at runtime by syzbot.
It's not static analysis, it is an actual bug found in testing.

The device_put() unwinds device_initialize(). The device_del() unwinds
device_add(). Take a look at the comments to device_add() or take a
look at how device_register/unregister() work.

The temptation was to call device_unregister() which is a combined
device_del(); device_put(); but when the device_initialize() and
device_add() are called separately, then I think it is more readable to
call del and put separately as well.

regards,
dan carpenter

2022-03-31 14:31:57

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
of questions in turn.

Fabio, did you test your patch?

This is another reason why syzbot should display the whole dmesg because
otherwise we can't ask people link to their test results if it just says
"PASSED" with no additional information. If syzbot provided a dmesg
at the end then I would require a link to it under the --- cut off
line for patches that I review.

In a way this gets back to the original testing that syzbot did do. If
Wenchao's reading of the code is correct the Fabio's patch caused a
series of use after free bugs but because the test results just said
"PASSED" with no additional information.

Either way, failing to call device_del() is still a bug.

Also, I don't really understand why we don't have to call
put_device(&sdkp->disk_dev) at the end of sd_remove().

regards,
dan carpenter

2022-03-31 16:14:39

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On Thu, 2022-03-31 at 16:42 +0300, Dan Carpenter wrote:
[...]
> Also, I don't really understand why we don't have to call
> put_device(&sdkp->disk_dev) at the end of sd_remove().

That's because the final put is done by the gendisk ->free_disk()
function which is scsi_disk_free_disk(). Most of the gendisk functions
we provide convert a gendisk to a scsi_disk (via the gendisk
private_data), so the sdkp has to live as long as the gendisk.

James


2022-03-31 17:02:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On Thu, Mar 31, 2022 at 11:07:45AM +0200, Fabio M. De Francesco wrote:
> If I don't misunderstand what you wrote, I think you mean something like
> the following changes:
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index a390679cf458..7a000a9a9dbe 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3431,7 +3431,7 @@ static int sd_probe(struct device *dev)
> sdkp->disk_dev.class = &sd_disk_class;
> dev_set_name(&sdkp->disk_dev, "%s", dev_name(dev));
>
> - error = device_add(&sdkp->disk_dev);
> + error = device_register(&sdkp->disk_dev);

The device_initialize call about also need to go.

> if (error) {
> put_device(&sdkp->disk_dev);

.. and this put_device

> goto out;
> @@ -3474,7 +3474,7 @@ static int sd_probe(struct device *dev)
>
> error = device_add_disk(dev, gd, NULL);
> if (error) {
> - put_device(&sdkp->disk_dev);
> + device_unregister(&sdkp->disk_dev);
> goto out;
> }

.. and then the cleanup patch would need the same logic. But thinking
about it I don't think we actually can do that due to the split
unregistration. So I take my suggestion back.

2022-03-31 18:32:59

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On giovedì? 31 marzo 2022 07:45:12 CEST Christoph Hellwig wrote:
> > The temptation was to call device_unregister() which is a combined
> > device_del(); device_put(); but when the device_initialize() and
> > device_add() are called separately, then I think it is more readable to
> > call del and put separately as well.
>
> I think we should also consolidate the initialization side. Using
> device_register and device_unregister would have prevented this bug
> and I should have switched to that before refactoring the code.
>
If I don't misunderstand what you wrote, I think you mean something like
the following changes:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a390679cf458..7a000a9a9dbe 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3431,7 +3431,7 @@ static int sd_probe(struct device *dev)
sdkp->disk_dev.class = &sd_disk_class;
dev_set_name(&sdkp->disk_dev, "%s", dev_name(dev));

- error = device_add(&sdkp->disk_dev);
+ error = device_register(&sdkp->disk_dev);
if (error) {
put_device(&sdkp->disk_dev);
goto out;
@@ -3474,7 +3474,7 @@ static int sd_probe(struct device *dev)

error = device_add_disk(dev, gd, NULL);
if (error) {
- put_device(&sdkp->disk_dev);
+ device_unregister(&sdkp->disk_dev);
goto out;
}

@Dan, @Christoph: what do you think of the changes that I've copy-pasted above?

Thanks,

Fabio M. De Francesco



2022-04-01 10:46:09

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On gioved? 31 marzo 2022 11:13:27 CEST Christoph Hellwig wrote:
>
> .. and then the cleanup patch would need the same logic. But thinking
> about it I don't think we actually can do that due to the split
> unregistration. So I take my suggestion back.
>

If I understand correctly, after thinking about it some more, you decided
to withdraw your own suggestion.

Dan had already approved this patch.

Therefore, I'll leave the patch as it is now and wait for someone to place
a "Reviewed-by" tag and Maintainers to merge (if, in the meantime, nobody
else require further changes).

Thanks,

Fabio



2022-04-01 12:02:40

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On Thu, Mar 31, 2022 at 10:19:57AM -0400, James Bottomley wrote:
> On Thu, 2022-03-31 at 16:42 +0300, Dan Carpenter wrote:
> [...]
> > Also, I don't really understand why we don't have to call
> > put_device(&sdkp->disk_dev) at the end of sd_remove().
>
> That's because the final put is done by the gendisk ->free_disk()
> function which is scsi_disk_free_disk(). Most of the gendisk functions
> we provide convert a gendisk to a scsi_disk (via the gendisk
> private_data), so the sdkp has to live as long as the gendisk.
>
> James

Thanks James.

Ah... And scsi_disk_free_disk() will not be called unless
device_add_disk() succeeds. So Wenchao Hao's patch is correct.

And after that there is just a missing device_del() and also I see
another problem where if device_add() fails then we need to call
put_disk(gd); on that error path.

regards,
dan carpenter

2022-04-01 14:05:26

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On gioved? 31 marzo 2022 15:42:10 CEST Dan Carpenter wrote:
> Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
> of questions in turn.
>
> Fabio, did you test your patch?

Yes, I did, Dan. I tested it the usual way with the "#syz test:" command.
Obviously I have not the hardware to test code on it.

Therefore, the messages that say "Syzbot tested the patch and it didn't
trigger any issue" is all that I can know about the code being good or not.
This is the criterion I've always used before sending patches for Syzbot's
reports.

However, my knowledge of these subsystems and the API that are related to
this bug were very little, but now I can say that during the last couple of
days it has improved to the point where I can affirm that Wenchao's patch
seems to me to be the only correct solution.

Thanks for all the help you and the the other developers provided. It was
invaluable for a better understanding of this matter.

Regards,

Fabio M. De Francesco


2022-04-01 17:40:44

by Fabio M. De Francesco

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On gioved? 31 marzo 2022 18:24:16 CEST Dan Carpenter wrote:
> On Thu, Mar 31, 2022 at 06:14:27PM +0200, Fabio M. De Francesco wrote:
> > On gioved? 31 marzo 2022 15:42:10 CEST Dan Carpenter wrote:
> > > Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
> > > of questions in turn.
> > >
> > > Fabio, did you test your patch?
> >
> > Yes, I did, Dan. I tested it the usual way with the "#syz test:" command.
> > Obviously I have not the hardware to test code on it.
> >
>
> Yeah. What a nightmare. You posted a link to the first test. It said
> passed but definitely introduced some use after frees but how was anyone
> supposed to know?

Maybe that a "spare-time Linux developer" like me should leave these
kinds of bug fixes to more experienced people. But we should also note
that I tried two or three different patches and _all_ of them passed
the tests.

>
> No way we would have figured this out.

I think that something should change about the way Syzbot tests patches
and about how it provides the results. The other four or five bugs that
I have fixed were based mainly to the fact that they passed the Syzbot
tests.

Perhaps I've been lucky but my patches were good and they were merged.

However, I began to trust Syzbot too much. This is not how I should
approach and try to solve bugs.

> I'm working to make Smatch
> understand device_put() better but this one is way difficult.
>
> Sorry that you went through this.

Please don't be sorry :)

Believe me when I say that I cannot explain how many things I have
learned during these days while working on this issue. I see no
problems at all but only opportunities for learning.

Thank you very much!

Fabio M. De Francesco

>
> regards,
> dan carpenter
>
>




2022-04-01 18:32:30

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH] scsi: sd: call device_del() if device_add_disk() fails

On Thu, Mar 31, 2022 at 06:14:27PM +0200, Fabio M. De Francesco wrote:
> On gioved? 31 marzo 2022 15:42:10 CEST Dan Carpenter wrote:
> > Wenchao Hao, what you're saying makes a lot of sense but it raises a lot
> > of questions in turn.
> >
> > Fabio, did you test your patch?
>
> Yes, I did, Dan. I tested it the usual way with the "#syz test:" command.
> Obviously I have not the hardware to test code on it.
>

Yeah. What a nightmare. You posted a link to the first test. It said
passed but definitely introduced some use after frees but how was anyone
supposed to know?

No way we would have figured this out. I'm working to make Smatch
understand device_put() better but this one is way difficult.

Sorry that you went through this.

regards,
dan carpenter