On Mon, Mar 04, 2024 at 11:15:24AM -0500, Alan Stern wrote:
> Third, this must be a generic problem. It will occur any time a sysfs
> attribute callback tries to lock its device while another process is
> trying to unregister that device.
>
> We faced this sort of problem some years ago when we were worrying
> about "suicidal" attributes -- ones which would unregister their own
> devices. I don't remember what the fix was or how it worked. But we
> need something like it here.
>
> Greg and Tejun, any ideas? Is it possible somehow for an attribute file
> to be removed while its callback is still running?
Yes, it's a pain, and I hate it, but I think SCSI does this somehow for
one of their attributes. I don't remember how at the moment, and I
can't look it up (am traveling), but this should be a good hint.
thanks,
greg k-h
On Mon, Mar 04, 2024 at 05:36:19PM +0100, Greg KH wrote:
> On Mon, Mar 04, 2024 at 11:15:24AM -0500, Alan Stern wrote:
> > Third, this must be a generic problem. It will occur any time a sysfs
> > attribute callback tries to lock its device while another process is
> > trying to unregister that device.
> >
> > We faced this sort of problem some years ago when we were worrying
> > about "suicidal" attributes -- ones which would unregister their own
> > devices. I don't remember what the fix was or how it worked. But we
> > need something like it here.
> >
> > Greg and Tejun, any ideas? Is it possible somehow for an attribute file
> > to be removed while its callback is still running?
>
> Yes, it's a pain, and I hate it, but I think SCSI does this somehow for
> one of their attributes. I don't remember how at the moment, and I
> can't look it up (am traveling), but this should be a good hint.
Are you thinking of the sysfs_break_active_protection() and
sysfs_unbreak_active_protection() functions? They seem to be the
appropriate ones to use here.
Alan Stern
Tejun:
It looks like there's a memory leak in sysfs_break_active_protection().
It and its companion routine do this:
struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj,
const struct attribute *attr)
{
struct kernfs_node *kn;
kobject_get(kobj);
kn = kernfs_find_and_get(kobj->sd, attr->name);
if (kn)
kernfs_break_active_protection(kn);
return kn;
}
void sysfs_unbreak_active_protection(struct kernfs_node *kn)
{
struct kobject *kobj = kn->parent->priv;
kernfs_unbreak_active_protection(kn);
kernfs_put(kn);
kobject_put(kobj);
}
If kn is NULL then the kobject_get(kobj) reference is never dropped.
It looks like this could happen if two processes want to unregister the
same kobject at the same time.
Shouldn't sysfs_break_active_protection() do this?
kobject_get(kobj);
kn = kernfs_find_and_get(kobj->sd, attr->name);
if (kn)
kernfs_break_active_protection(kn);
+ else
+ kobject_put(kobj);
return kn;
Alan Stern
On Mon, Mar 04, 2024 at 01:30:35PM -0500, Alan Stern wrote:
> On Mon, Mar 04, 2024 at 05:36:19PM +0100, Greg KH wrote:
> > On Mon, Mar 04, 2024 at 11:15:24AM -0500, Alan Stern wrote:
> > > Third, this must be a generic problem. It will occur any time a sysfs
> > > attribute callback tries to lock its device while another process is
> > > trying to unregister that device.
> > >
> > > We faced this sort of problem some years ago when we were worrying
> > > about "suicidal" attributes -- ones which would unregister their own
> > > devices. I don't remember what the fix was or how it worked. But we
> > > need something like it here.
> > >
> > > Greg and Tejun, any ideas? Is it possible somehow for an attribute file
> > > to be removed while its callback is still running?
> >
> > Yes, it's a pain, and I hate it, but I think SCSI does this somehow for
> > one of their attributes. I don't remember how at the moment, and I
> > can't look it up (am traveling), but this should be a good hint.
>
> Are you thinking of the sysfs_break_active_protection() and
> sysfs_unbreak_active_protection() functions? They seem to be the
> appropriate ones to use here.
Yes, that sounds correct.
Sorry about late reply.
On Mon, Mar 04, 2024 at 02:17:27PM -0500, Alan Stern wrote:
..
> struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj,
> const struct attribute *attr)
> {
> struct kernfs_node *kn;
>
> kobject_get(kobj);
> kn = kernfs_find_and_get(kobj->sd, attr->name);
> if (kn)
> kernfs_break_active_protection(kn);
> return kn;
> }
..
> If kn is NULL then the kobject_get(kobj) reference is never dropped.
> It looks like this could happen if two processes want to unregister the
> same kobject at the same time.
>
> Shouldn't sysfs_break_active_protection() do this?
>
> kobject_get(kobj);
> kn = kernfs_find_and_get(kobj->sd, attr->name);
> if (kn)
> kernfs_break_active_protection(kn);
> + else
> + kobject_put(kobj);
> return kn;
Yeah, I think you're right. It's an obvious ref leak. Would you mind writing
up a patch?
Thank you.
--
tejun
The sysfs_break_active_protection() routine has an obvious reference
leak in its error path. If the call to kernfs_find_and_get() fails then
kn will be NULL, so the companion sysfs_unbreak_active_protection()
routine won't get called (and would only cause an access violation by
trying to dereference kn->parent if it was called). As a result, the
reference to kobj acquired at the start of the function will never be
released.
Fix the leak by adding an explicit kobject_put() call when kn is NULL.
Signed-off-by: Alan Stern <[email protected]>
Fixes: 2afc9166f79b ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
Cc: Bart Van Assche <[email protected]>
Cc: <[email protected]>
---
fs/sysfs/file.c | 2 ++
1 file changed, 2 insertions(+)
Index: usb-devel/fs/sysfs/file.c
===================================================================
--- usb-devel.orig/fs/sysfs/file.c
+++ usb-devel/fs/sysfs/file.c
@@ -463,6 +463,8 @@ struct kernfs_node *sysfs_break_active_p
kn = kernfs_find_and_get(kobj->sd, attr->name);
if (kn)
kernfs_break_active_protection(kn);
+ else
+ kobject_put(kobj);
return kn;
}
EXPORT_SYMBOL_GPL(sysfs_break_active_protection);
On Wed, Mar 13, 2024 at 05:43:41PM -0400, Alan Stern wrote:
> The sysfs_break_active_protection() routine has an obvious reference
> leak in its error path. If the call to kernfs_find_and_get() fails then
> kn will be NULL, so the companion sysfs_unbreak_active_protection()
> routine won't get called (and would only cause an access violation by
> trying to dereference kn->parent if it was called). As a result, the
> reference to kobj acquired at the start of the function will never be
> released.
>
> Fix the leak by adding an explicit kobject_put() call when kn is NULL.
>
> Signed-off-by: Alan Stern <[email protected]>
> Fixes: 2afc9166f79b ("scsi: sysfs: Introduce sysfs_{un,}break_active_protection()")
> Cc: Bart Van Assche <[email protected]>
> Cc: <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Thanks.
--
tejun
On 3/13/24 14:43, Alan Stern wrote:
> The sysfs_break_active_protection() routine has an obvious reference
> leak in its error path. If the call to kernfs_find_and_get() fails then
> kn will be NULL, so the companion sysfs_unbreak_active_protection()
> routine won't get called (and would only cause an access violation by
> trying to dereference kn->parent if it was called). As a result, the
> reference to kobj acquired at the start of the function will never be
> released.
>
> Fix the leak by adding an explicit kobject_put() call when kn is NULL.
Reviewed-by: Bart Van Assche <[email protected]>