2024-03-04 16:36:33

by Greg KH

[permalink] [raw]
Subject: Re: [Bug] INFO: task hung in hub_activate

On Mon, Mar 04, 2024 at 11:15:24AM -0500, Alan Stern wrote:
> Third, this must be a generic problem. It will occur any time a sysfs
> attribute callback tries to lock its device while another process is
> trying to unregister that device.
>
> We faced this sort of problem some years ago when we were worrying
> about "suicidal" attributes -- ones which would unregister their own
> devices. I don't remember what the fix was or how it worked. But we
> need something like it here.
>
> Greg and Tejun, any ideas? Is it possible somehow for an attribute file
> to be removed while its callback is still running?

Yes, it's a pain, and I hate it, but I think SCSI does this somehow for
one of their attributes. I don't remember how at the moment, and I
can't look it up (am traveling), but this should be a good hint.

thanks,

greg k-h


2024-03-04 18:30:59

by Alan Stern

[permalink] [raw]
Subject: Re: [Bug] INFO: task hung in hub_activate

On Mon, Mar 04, 2024 at 05:36:19PM +0100, Greg KH wrote:
> On Mon, Mar 04, 2024 at 11:15:24AM -0500, Alan Stern wrote:
> > Third, this must be a generic problem. It will occur any time a sysfs
> > attribute callback tries to lock its device while another process is
> > trying to unregister that device.
> >
> > We faced this sort of problem some years ago when we were worrying
> > about "suicidal" attributes -- ones which would unregister their own
> > devices. I don't remember what the fix was or how it worked. But we
> > need something like it here.
> >
> > Greg and Tejun, any ideas? Is it possible somehow for an attribute file
> > to be removed while its callback is still running?
>
> Yes, it's a pain, and I hate it, but I think SCSI does this somehow for
> one of their attributes. I don't remember how at the moment, and I
> can't look it up (am traveling), but this should be a good hint.

Are you thinking of the sysfs_break_active_protection() and
sysfs_unbreak_active_protection() functions? They seem to be the
appropriate ones to use here.

Alan Stern

2024-03-04 19:17:47

by Alan Stern

[permalink] [raw]
Subject: Bug in sysfs_break_active_protection()

Tejun:

It looks like there's a memory leak in sysfs_break_active_protection().
It and its companion routine do this:

struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj,
const struct attribute *attr)
{
struct kernfs_node *kn;

kobject_get(kobj);
kn = kernfs_find_and_get(kobj->sd, attr->name);
if (kn)
kernfs_break_active_protection(kn);
return kn;
}

void sysfs_unbreak_active_protection(struct kernfs_node *kn)
{
struct kobject *kobj = kn->parent->priv;

kernfs_unbreak_active_protection(kn);
kernfs_put(kn);
kobject_put(kobj);
}

If kn is NULL then the kobject_get(kobj) reference is never dropped.
It looks like this could happen if two processes want to unregister the
same kobject at the same time.

Shouldn't sysfs_break_active_protection() do this?

kobject_get(kobj);
kn = kernfs_find_and_get(kobj->sd, attr->name);
if (kn)
kernfs_break_active_protection(kn);
+ else
+ kobject_put(kobj);
return kn;

Alan Stern

2024-03-05 12:03:16

by Greg KH

[permalink] [raw]
Subject: Re: [Bug] INFO: task hung in hub_activate

On Mon, Mar 04, 2024 at 01:30:35PM -0500, Alan Stern wrote:
> On Mon, Mar 04, 2024 at 05:36:19PM +0100, Greg KH wrote:
> > On Mon, Mar 04, 2024 at 11:15:24AM -0500, Alan Stern wrote:
> > > Third, this must be a generic problem. It will occur any time a sysfs
> > > attribute callback tries to lock its device while another process is
> > > trying to unregister that device.
> > >
> > > We faced this sort of problem some years ago when we were worrying
> > > about "suicidal" attributes -- ones which would unregister their own
> > > devices. I don't remember what the fix was or how it worked. But we
> > > need something like it here.
> > >
> > > Greg and Tejun, any ideas? Is it possible somehow for an attribute file
> > > to be removed while its callback is still running?
> >
> > Yes, it's a pain, and I hate it, but I think SCSI does this somehow for
> > one of their attributes. I don't remember how at the moment, and I
> > can't look it up (am traveling), but this should be a good hint.
>
> Are you thinking of the sysfs_break_active_protection() and
> sysfs_unbreak_active_protection() functions? They seem to be the
> appropriate ones to use here.

Yes, that sounds correct.

2024-03-13 20:45:25

by Tejun Heo

[permalink] [raw]
Subject: Re: Bug in sysfs_break_active_protection()

Sorry about late reply.

On Mon, Mar 04, 2024 at 02:17:27PM -0500, Alan Stern wrote:
..
> struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj,
> const struct attribute *attr)
> {
> struct kernfs_node *kn;
>
> kobject_get(kobj);
> kn = kernfs_find_and_get(kobj->sd, attr->name);
> if (kn)
> kernfs_break_active_protection(kn);
> return kn;
> }
..
> If kn is NULL then the kobject_get(kobj) reference is never dropped.
> It looks like this could happen if two processes want to unregister the
> same kobject at the same time.
>
> Shouldn't sysfs_break_active_protection() do this?
>
> kobject_get(kobj);
> kn = kernfs_find_and_get(kobj->sd, attr->name);
> if (kn)
> kernfs_break_active_protection(kn);
> + else
> + kobject_put(kobj);
> return kn;

Yeah, I think you're right. It's an obvious ref leak. Would you mind writing
up a patch?

Thank you.

--
tejun

2024-03-13 21:43:56

by Alan Stern

[permalink] [raw]
Subject: [PATCH] fs: sysfs: Fix reference leak in sysfs_break_active_protection()

The sysfs_break_active_protection() routine has an obvious reference
leak in its error path. If the call to kernfs_find_and_get() fails then
kn will be NULL, so the companion sysfs_unbreak_active_protection()
routine won't get called (and would only cause an access violation by
trying to dereference kn->parent if it was called). As a result, the
reference to kobj acquired at the start of the function will never be
released.

Fix the leak by adding an explicit kobject_put() call when kn is NULL.

Signed-off-by: Alan Stern <[email protected]>
Fixes: 2afc9166f79b ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
Cc: Bart Van Assche <[email protected]>
Cc: <[email protected]>

---

fs/sysfs/file.c | 2 ++
1 file changed, 2 insertions(+)

Index: usb-devel/fs/sysfs/file.c
===================================================================
--- usb-devel.orig/fs/sysfs/file.c
+++ usb-devel/fs/sysfs/file.c
@@ -463,6 +463,8 @@ struct kernfs_node *sysfs_break_active_p
kn = kernfs_find_and_get(kobj->sd, attr->name);
if (kn)
kernfs_break_active_protection(kn);
+ else
+ kobject_put(kobj);
return kn;
}
EXPORT_SYMBOL_GPL(sysfs_break_active_protection);

2024-03-13 21:44:59

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] fs: sysfs: Fix reference leak in sysfs_break_active_protection()

On Wed, Mar 13, 2024 at 05:43:41PM -0400, Alan Stern wrote:
> The sysfs_break_active_protection() routine has an obvious reference
> leak in its error path. If the call to kernfs_find_and_get() fails then
> kn will be NULL, so the companion sysfs_unbreak_active_protection()
> routine won't get called (and would only cause an access violation by
> trying to dereference kn->parent if it was called). As a result, the
> reference to kobj acquired at the start of the function will never be
> released.
>
> Fix the leak by adding an explicit kobject_put() call when kn is NULL.
>
> Signed-off-by: Alan Stern <[email protected]>
> Fixes: 2afc9166f79b ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
> Cc: Bart Van Assche <[email protected]>
> Cc: <[email protected]>

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2024-03-13 22:10:36

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] fs: sysfs: Fix reference leak in sysfs_break_active_protection()

On 3/13/24 14:43, Alan Stern wrote:
> The sysfs_break_active_protection() routine has an obvious reference
> leak in its error path. If the call to kernfs_find_and_get() fails then
> kn will be NULL, so the companion sysfs_unbreak_active_protection()
> routine won't get called (and would only cause an access violation by
> trying to dereference kn->parent if it was called). As a result, the
> reference to kobj acquired at the start of the function will never be
> released.
>
> Fix the leak by adding an explicit kobject_put() call when kn is NULL.

Reviewed-by: Bart Van Assche <[email protected]>