When getting disconnected we need to release eventual grabs on the
underlying input device as we also release the input device itself.
Otherwise, we would try to release the grab when the client that
requested it closes its handle, accessing the input device which
might already be freed.
Signed-off-by: Bj?rn Steinbrink <[email protected]>
---
I can't reproduce the bug on my UP box and currently can't afford
crashing my SMP box (all the oopses seem to come from SMP kernels, so I
guess it needs SMP to crash), so while this doesn't show any new
problems, I can't tell whether it actually fixes anything. Testers
welcome!
drivers/input/evdev.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 0727b0a..99562ce 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -853,6 +853,9 @@ static void evdev_cleanup(struct evdev *evdev)
evdev_hangup(evdev);
evdev_remove_chrdev(evdev);
+ if (evdev->grab)
+ evdev_ungrab(evdev, evdev->grab);
+
/* evdev is marked dead so no one else accesses evdev->open */
if (evdev->open) {
input_flush_device(handle, NULL);
--
1.5.5.rc2
On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
>
> I can't reproduce the bug on my UP box and currently can't afford
> crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> guess it needs SMP to crash), so while this doesn't show any new
> problems, I can't tell whether it actually fixes anything. Testers
> welcome!
Ok, I applied this because I will do an -rc8 today or tomorrow, but I
really really hope somebody can figure out what made this all start to
trigger. It does smell like some core device layer change, because we do
not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
seem relevant.
Greg, are there any refcounting changes that would cause the input devices
to be free'd earlier or something?
Linus
On Sun, Mar 30, 2008 at 02:51:02PM -0700, Linus Torvalds wrote:
>
>
> On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> >
> > I can't reproduce the bug on my UP box and currently can't afford
> > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > guess it needs SMP to crash), so while this doesn't show any new
> > problems, I can't tell whether it actually fixes anything. Testers
> > welcome!
>
> Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> really really hope somebody can figure out what made this all start to
> trigger. It does smell like some core device layer change, because we do
> not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> seem relevant.
>
> Greg, are there any refcounting changes that would cause the input devices
> to be free'd earlier or something?
Earlier? No, not that I know of at all, as long as the reference
counting logic was correct originally. All of the problems we have been
fixing were ones where we accidentally were grabbing too many references
and then wondering why things were not getting cleaned up properly as
the kobject rework exposed these problems making them more obvious.
I haven't heard of the opposite happening. Anything that I can try to
test for here, I have a lot of removable input devices to test with.
thanks,
greg k-h
On Sun, 30 Mar 2008, Greg KH wrote:
>
> I haven't heard of the opposite happening. Anything that I can try to
> test for here, I have a lot of removable input devices to test with.
Ahh, you weren't cc'd earlier, and may have missed the discussion. It's on
Linux-kernel in the thread called "Oops/Warning report for the week of
March 28th 2008"
Johannes says this triggers for him:
On the other hand, it should be easily reproducible by anyone else with
the same trick, here's what I do:
* configure X to use /dev/input/event* devices
* in an xterm, do something like
rmmod usbhid ; modprobe usbhid
* switch to a VT
* watch kernel crash as X releases the grab on the event device
and I haven't tried it because none of my normal machines use modules, and
I was lazy and really hoping somebody else who actually knows the device
and input layers would take a peek.
Linus
On 2008.03.30 15:22:28 -0700, Greg KH wrote:
> On Sun, Mar 30, 2008 at 02:51:02PM -0700, Linus Torvalds wrote:
> > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > > I can't reproduce the bug on my UP box and currently can't afford
> > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > guess it needs SMP to crash), so while this doesn't show any new
> > > problems, I can't tell whether it actually fixes anything. Testers
> > > welcome!
> >
> > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > really really hope somebody can figure out what made this all start to
> > trigger. It does smell like some core device layer change, because we do
> > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > seem relevant.
> >
> > Greg, are there any refcounting changes that would cause the input devices
> > to be free'd earlier or something?
>
> Earlier? No, not that I know of at all, as long as the reference
> counting logic was correct originally. All of the problems we have been
> fixing were ones where we accidentally were grabbing too many references
> and then wondering why things were not getting cleaned up properly as
> the kobject rework exposed these problems making them more obvious.
Not freeing the input device at all would of course also hide any
access-after-free problems :-) So if that's the case, that might explain
the sudden exposure of the problem. IMHO, my patch is the right thing to
do anyway, because releasing a grab on the underlying input device from
within evdev clearly needs to happen before we release that device. So
AFAICT we're really just looking for "why do we see that bug now?" and
"is there another bug?"
> I haven't heard of the opposite happening. Anything that I can try to
> test for here, I have a lot of removable input devices to test with.
Sorry, forgot to set the In-Reply-To header when sending the patch. The
original thread, with a reproducing recipe is here:
http://lkml.org/lkml/2008/3/28/442
Message-Id: <[email protected]>
Bj?rn
Bj?rn Steinbrink wrote:
> On 2008.03.30 15:22:28 -0700, Greg KH wrote:
>> On Sun, Mar 30, 2008 at 02:51:02PM -0700, Linus Torvalds wrote:
>>> On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
>>>> I can't reproduce the bug on my UP box and currently can't afford
>>>> crashing my SMP box (all the oopses seem to come from SMP kernels, so I
>>>> guess it needs SMP to crash), so while this doesn't show any new
>>>> problems, I can't tell whether it actually fixes anything. Testers
>>>> welcome!
>>> Ok, I applied this because I will do an -rc8 today or tomorrow, but I
>>> really really hope somebody can figure out what made this all start to
>>> trigger. It does smell like some core device layer change, because we do
>>> not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
>>> seem relevant.
>>>
>>> Greg, are there any refcounting changes that would cause the input devices
>>> to be free'd earlier or something?
>> Earlier? No, not that I know of at all, as long as the reference
>> counting logic was correct originally. All of the problems we have been
>> fixing were ones where we accidentally were grabbing too many references
>> and then wondering why things were not getting cleaned up properly as
>> the kobject rework exposed these problems making them more obvious.
>
> Not freeing the input device at all would of course also hide any
> access-after-free problems :-) So if that's the case, that might explain
> the sudden exposure of the problem. IMHO, my patch is the right thing to
> do anyway, because releasing a grab on the underlying input device from
> within evdev clearly needs to happen before we release that device. So
> AFAICT we're really just looking for "why do we see that bug now?" and
> "is there another bug?"
>
>> I haven't heard of the opposite happening. Anything that I can try to
>> test for here, I have a lot of removable input devices to test with.
>
> Sorry, forgot to set the In-Reply-To header when sending the patch. The
> original thread, with a reproducing recipe is here:
> http://lkml.org/lkml/2008/3/28/442
> Message-Id: <[email protected]>
>
one note.. you do want to enable slab poison, just to catch the bug right away...
Hi Linus,
On Sunday 30 March 2008, Linus Torvalds wrote:
>
> On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> >
> > I can't reproduce the bug on my UP box and currently can't afford
> > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > guess it needs SMP to crash), so while this doesn't show any new
> > problems, I can't tell whether it actually fixes anything. Testers
> > welcome!
>
> Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> really really hope somebody can figure out what made this all start to
> trigger. It does smell like some core device layer change, because we do
> not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> seem relevant.
>
> Greg, are there any refcounting changes that would cause the input devices
> to be free'd earlier or something?
>
The following commit changed lifetime runes on kobjects breaking input:
commit 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806
Author: Kay Sievers <[email protected]>
Date: Wed Dec 19 01:40:42 2007 +0100
Kobject: auto-cleanup on final unref
We save the current state in the object itself, so we can do proper
cleanup when the last reference is dropped.
If the initial reference is dropped, the object will be removed from
sysfs if needed, if an "add" event was sent, "remove" will be send, and
the allocated resources are released.
This allows us to clean up some driver core usage as well as allowing us
to do other such changes to the rest of the kernel.
Signed-off-by: Kay Sievers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Before we dropped reference to kobject's parent only when child kobject
was released (in kobject_cleanup). The changeset above moves the release
to kobject_del() which is way too early in my opinion. The kobject is only
marked for deletion at that time, not really deleted.
I will look how to properly fix evdev and the rest of input interfaces
tomorrow.
--
Dmitry
On Mon, Mar 31, 2008 at 02:15:39AM -0400, Dmitry Torokhov wrote:
> Hi Linus,
>
> On Sunday 30 March 2008, Linus Torvalds wrote:
> >
> > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > >
> > > I can't reproduce the bug on my UP box and currently can't afford
> > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > guess it needs SMP to crash), so while this doesn't show any new
> > > problems, I can't tell whether it actually fixes anything. Testers
> > > welcome!
> >
> > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > really really hope somebody can figure out what made this all start to
> > trigger. It does smell like some core device layer change, because we do
> > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > seem relevant.
> >
> > Greg, are there any refcounting changes that would cause the input devices
> > to be free'd earlier or something?
> >
>
> The following commit changed lifetime runes on kobjects breaking input:
>
> commit 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806
> Author: Kay Sievers <[email protected]>
> Date: Wed Dec 19 01:40:42 2007 +0100
>
> Kobject: auto-cleanup on final unref
>
> We save the current state in the object itself, so we can do proper
> cleanup when the last reference is dropped.
>
> If the initial reference is dropped, the object will be removed from
> sysfs if needed, if an "add" event was sent, "remove" will be send, and
> the allocated resources are released.
>
> This allows us to clean up some driver core usage as well as allowing us
> to do other such changes to the rest of the kernel.
>
> Signed-off-by: Kay Sievers <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> Before we dropped reference to kobject's parent only when child kobject
> was released (in kobject_cleanup). The changeset above moves the release
> to kobject_del() which is way too early in my opinion. The kobject is only
> marked for deletion at that time, not really deleted.
It was "deleted" from sysfs, and should have never been used again by
any callers. If the reference count was dropped to zero with this call,
it would be cleaned up as well, it seems that you were assuming that it
would not be? Perhaps you just need to grab another reference as this
would have caused you problems without this change anyway, but without
slab debugging, you never saw it.
> I will look how to properly fix evdev and the rest of input interfaces
> tomorrow.
If you need any help, please let me know.
thanks,
greg k-h
On Mon, Mar 31, 2008 at 10:28:13AM -0700, Greg KH wrote:
> On Mon, Mar 31, 2008 at 02:15:39AM -0400, Dmitry Torokhov wrote:
> > Hi Linus,
> >
> > On Sunday 30 March 2008, Linus Torvalds wrote:
> > >
> > > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > > >
> > > > I can't reproduce the bug on my UP box and currently can't afford
> > > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > > guess it needs SMP to crash), so while this doesn't show any new
> > > > problems, I can't tell whether it actually fixes anything. Testers
> > > > welcome!
> > >
> > > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > > really really hope somebody can figure out what made this all start to
> > > trigger. It does smell like some core device layer change, because we do
> > > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > > seem relevant.
> > >
> > > Greg, are there any refcounting changes that would cause the input devices
> > > to be free'd earlier or something?
> > >
> >
> > The following commit changed lifetime runes on kobjects breaking input:
> >
> > commit 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806
> > Author: Kay Sievers <[email protected]>
> > Date: Wed Dec 19 01:40:42 2007 +0100
> >
> > Kobject: auto-cleanup on final unref
> >
> > We save the current state in the object itself, so we can do proper
> > cleanup when the last reference is dropped.
> >
> > If the initial reference is dropped, the object will be removed from
> > sysfs if needed, if an "add" event was sent, "remove" will be send, and
> > the allocated resources are released.
> >
> > This allows us to clean up some driver core usage as well as allowing us
> > to do other such changes to the rest of the kernel.
> >
> > Signed-off-by: Kay Sievers <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >
> > Before we dropped reference to kobject's parent only when child kobject
> > was released (in kobject_cleanup). The changeset above moves the release
> > to kobject_del() which is way too early in my opinion. The kobject is only
> > marked for deletion at that time, not really deleted.
>
> It was "deleted" from sysfs, and should have never been used again by
> any callers. If the reference count was dropped to zero with this call,
> it would be cleaned up as well, it seems that you were assuming that it
> would not be? Perhaps you just need to grab another reference as this
> would have caused you problems without this change anyway, but without
> slab debugging, you never saw it.
>
Greg, please look at the change again. Before kobject_put(kobj->parent)
was done in kobject_cleanup() and so the parent would only be freed when
all its children are gone. Now parent is deleted early, even if its
children are still referenced by other users. This is lifetime rule
change and should really be announced as such.
If this change it intentional and is here to stay then I will just grab
the references myself, although I wonder what else might be broken by
it.
--
Dmitry
On Mon, 31 Mar 2008, Dmitry Torokhov wrote:
>
> Greg, please look at the change again. Before kobject_put(kobj->parent)
> was done in kobject_cleanup() and so the parent would only be freed when
> all its children are gone. Now parent is deleted early, even if its
> children are still referenced by other users. This is lifetime rule
> change and should really be announced as such.
>
> If this change it intentional and is here to stay then I will just grab
> the references myself, although I wonder what else might be broken by
> it.
I do agree that this might want reverting, unless there is some rally good
reason for it. People may have pefectly valid reasons to expect topology
and reachability to remain valid - it's certainly what we guarantee in the
VFS code for similar rules (ie the parent of a dentry is only free'd after
all children have gone away).
Greg, is it possible to get the old lifetime rules back wrt his? They seem
valid and sane..
Linus
On 2008.03.30 20:42:59 +0200, Bj?rn Steinbrink wrote:
> I can't reproduce the bug on my UP box and currently can't afford
> crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> guess it needs SMP to crash), so while this doesn't show any new
> problems, I can't tell whether it actually fixes anything. Testers
> welcome!
Hm, crap, can't reproduce on my x86_64 SMP box either. Tried with
various preemption models as well as rcu classic and preempt. Johannes,
is there anything "special" in your configuration?
Bj?rn
On Mon, Mar 31, 2008 at 02:01:20PM -0400, Dmitry Torokhov wrote:
> On Mon, Mar 31, 2008 at 10:28:13AM -0700, Greg KH wrote:
> > On Mon, Mar 31, 2008 at 02:15:39AM -0400, Dmitry Torokhov wrote:
> > > Hi Linus,
> > >
> > > On Sunday 30 March 2008, Linus Torvalds wrote:
> > > >
> > > > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > > > >
> > > > > I can't reproduce the bug on my UP box and currently can't afford
> > > > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > > > guess it needs SMP to crash), so while this doesn't show any new
> > > > > problems, I can't tell whether it actually fixes anything. Testers
> > > > > welcome!
> > > >
> > > > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > > > really really hope somebody can figure out what made this all start to
> > > > trigger. It does smell like some core device layer change, because we do
> > > > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > > > seem relevant.
> > > >
> > > > Greg, are there any refcounting changes that would cause the input devices
> > > > to be free'd earlier or something?
> > > >
> > >
> > > The following commit changed lifetime runes on kobjects breaking input:
> > >
> > > commit 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806
> > > Author: Kay Sievers <[email protected]>
> > > Date: Wed Dec 19 01:40:42 2007 +0100
> > >
> > > Kobject: auto-cleanup on final unref
> > >
> > > We save the current state in the object itself, so we can do proper
> > > cleanup when the last reference is dropped.
> > >
> > > If the initial reference is dropped, the object will be removed from
> > > sysfs if needed, if an "add" event was sent, "remove" will be send, and
> > > the allocated resources are released.
> > >
> > > This allows us to clean up some driver core usage as well as allowing us
> > > to do other such changes to the rest of the kernel.
> > >
> > > Signed-off-by: Kay Sievers <[email protected]>
> > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > >
> > > Before we dropped reference to kobject's parent only when child kobject
> > > was released (in kobject_cleanup). The changeset above moves the release
> > > to kobject_del() which is way too early in my opinion. The kobject is only
> > > marked for deletion at that time, not really deleted.
> >
> > It was "deleted" from sysfs, and should have never been used again by
> > any callers. If the reference count was dropped to zero with this call,
> > it would be cleaned up as well, it seems that you were assuming that it
> > would not be? Perhaps you just need to grab another reference as this
> > would have caused you problems without this change anyway, but without
> > slab debugging, you never saw it.
> >
>
> Greg, please look at the change again. Before kobject_put(kobj->parent)
> was done in kobject_cleanup() and so the parent would only be freed when
> all its children are gone. Now parent is deleted early, even if its
> children are still referenced by other users. This is lifetime rule
> change and should really be announced as such.
Ugh, this was done because of scsi, they required that if you really
were deleting the parent, you wanted it gone.
> If this change it intentional and is here to stay then I will just grab
> the references myself, although I wonder what else might be broken by
> it.
Yes, if you need those references, you are going to have to hold on for
them, the kobject layer will not keep them around. It now is a "does
what you ask for" type model :)
I fail to see where this affects the input code though, in glancing at
it, it looks like you are doing things properly. Kay, any thoughts
here, I think you looked at the kobject input layer interaction in the
past.
thanks,
greg k-h
Hi Bjorn,
On Mon, Mar 31, 2008 at 12:42:03AM +0200, Bj?rn Steinbrink wrote:
> On 2008.03.30 15:22:28 -0700, Greg KH wrote:
> > On Sun, Mar 30, 2008 at 02:51:02PM -0700, Linus Torvalds wrote:
> > > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > > > I can't reproduce the bug on my UP box and currently can't afford
> > > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > > guess it needs SMP to crash), so while this doesn't show any new
> > > > problems, I can't tell whether it actually fixes anything. Testers
> > > > welcome!
> > >
> > > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > > really really hope somebody can figure out what made this all start to
> > > trigger. It does smell like some core device layer change, because we do
> > > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > > seem relevant.
> > >
> > > Greg, are there any refcounting changes that would cause the input devices
> > > to be free'd earlier or something?
> >
> > Earlier? No, not that I know of at all, as long as the reference
> > counting logic was correct originally. All of the problems we have been
> > fixing were ones where we accidentally were grabbing too many references
> > and then wondering why things were not getting cleaned up properly as
> > the kobject rework exposed these problems making them more obvious.
>
> Not freeing the input device at all would of course also hide any
> access-after-free problems :-) So if that's the case, that might explain
> the sudden exposure of the problem. IMHO, my patch is the right thing to
> do anyway, because releasing a grab on the underlying input device from
> within evdev clearly needs to happen before we release that device. So
> AFAICT we're really just looking for "why do we see that bug now?" and
> "is there another bug?"
>
If device is being disconnected (rdestroyed) then we dont really need to
release grab since there won't be any input events coming through anyway,
so there is no "another bug". I am considering removing the call to
release device once we sort out the issue with lifetime rules change,
since it is not needed.
--
Dmitry
On Mon, Mar 31, 2008 at 01:42:21PM -0700, Greg KH wrote:
> On Mon, Mar 31, 2008 at 02:01:20PM -0400, Dmitry Torokhov wrote:
> > On Mon, Mar 31, 2008 at 10:28:13AM -0700, Greg KH wrote:
> > > On Mon, Mar 31, 2008 at 02:15:39AM -0400, Dmitry Torokhov wrote:
> > > > Hi Linus,
> > > >
> > > > On Sunday 30 March 2008, Linus Torvalds wrote:
> > > > >
> > > > > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > > > > >
> > > > > > I can't reproduce the bug on my UP box and currently can't afford
> > > > > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > > > > guess it needs SMP to crash), so while this doesn't show any new
> > > > > > problems, I can't tell whether it actually fixes anything. Testers
> > > > > > welcome!
> > > > >
> > > > > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > > > > really really hope somebody can figure out what made this all start to
> > > > > trigger. It does smell like some core device layer change, because we do
> > > > > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > > > > seem relevant.
> > > > >
> > > > > Greg, are there any refcounting changes that would cause the input devices
> > > > > to be free'd earlier or something?
> > > > >
> > > >
> > > > The following commit changed lifetime runes on kobjects breaking input:
> > > >
> > > > commit 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806
> > > > Author: Kay Sievers <[email protected]>
> > > > Date: Wed Dec 19 01:40:42 2007 +0100
> > > >
> > > > Kobject: auto-cleanup on final unref
> > > >
> > > > We save the current state in the object itself, so we can do proper
> > > > cleanup when the last reference is dropped.
> > > >
> > > > If the initial reference is dropped, the object will be removed from
> > > > sysfs if needed, if an "add" event was sent, "remove" will be send, and
> > > > the allocated resources are released.
> > > >
> > > > This allows us to clean up some driver core usage as well as allowing us
> > > > to do other such changes to the rest of the kernel.
> > > >
> > > > Signed-off-by: Kay Sievers <[email protected]>
> > > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > > >
> > > > Before we dropped reference to kobject's parent only when child kobject
> > > > was released (in kobject_cleanup). The changeset above moves the release
> > > > to kobject_del() which is way too early in my opinion. The kobject is only
> > > > marked for deletion at that time, not really deleted.
> > >
> > > It was "deleted" from sysfs, and should have never been used again by
> > > any callers. If the reference count was dropped to zero with this call,
> > > it would be cleaned up as well, it seems that you were assuming that it
> > > would not be? Perhaps you just need to grab another reference as this
> > > would have caused you problems without this change anyway, but without
> > > slab debugging, you never saw it.
> > >
> >
> > Greg, please look at the change again. Before kobject_put(kobj->parent)
> > was done in kobject_cleanup() and so the parent would only be freed when
> > all its children are gone. Now parent is deleted early, even if its
> > children are still referenced by other users. This is lifetime rule
> > change and should really be announced as such.
>
> Ugh, this was done because of scsi, they required that if you really
> were deleting the parent, you wanted it gone.
>
Gone from sysfs or gone from memory?
> > If this change it intentional and is here to stay then I will just grab
> > the references myself, although I wonder what else might be broken by
> > it.
>
> Yes, if you need those references, you are going to have to hold on for
> them, the kobject layer will not keep them around. It now is a "does
> what you ask for" type model :)
>
> I fail to see where this affects the input code though, in glancing at
> it, it looks like you are doing things properly. Kay, any thoughts
> here, I think you looked at the kobject input layer interaction in the
> past.
>
Ok, I really liked the old behavior better, but if it is to stay then
we need something like this (not for inclusion yet as mousedev and joydev
need to be adjusted as well):
Signed-off-by: Dmitry Torokhov <[email protected]>
---
drivers/input/evdev.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
Index: linux/drivers/input/evdev.c
===================================================================
--- linux.orig/drivers/input/evdev.c
+++ linux/drivers/input/evdev.c
@@ -124,6 +124,7 @@ static void evdev_free(struct device *de
{
struct evdev *evdev = container_of(dev, struct evdev, dev);
+ input_put_device(evdev->handle.dev);
kfree(evdev);
}
@@ -853,9 +854,6 @@ static void evdev_cleanup(struct evdev *
evdev_hangup(evdev);
evdev_remove_chrdev(evdev);
- if (evdev->grab)
- evdev_ungrab(evdev, evdev->grab);
-
/* evdev is marked dead so no one else accesses evdev->open */
if (evdev->open) {
input_flush_device(handle, NULL);
@@ -896,7 +894,7 @@ static int evdev_connect(struct input_ha
evdev->exist = 1;
evdev->minor = minor;
- evdev->handle.dev = dev;
+ evdev->handle.dev = input_get_device(dev);
evdev->handle.name = evdev->name;
evdev->handle.handler = handler;
evdev->handle.private = evdev;
--
Dmitry
On Mon, 31 Mar 2008, Greg KH wrote:
> > Greg, please look at the change again. Before kobject_put(kobj->parent)
> > was done in kobject_cleanup() and so the parent would only be freed when
> > all its children are gone. Now parent is deleted early, even if its
> > children are still referenced by other users. This is lifetime rule
> > change and should really be announced as such.
> Ugh, this was done because of scsi, they required that if you really
> were deleting the parent, you wanted it gone.
What is the exact meaning of "gone" here please?
--
Jiri Kosina
On Mon, Mar 31, 2008 at 04:57:36PM -0400, Dmitry Torokhov wrote:
> On Mon, Mar 31, 2008 at 01:42:21PM -0700, Greg KH wrote:
> > On Mon, Mar 31, 2008 at 02:01:20PM -0400, Dmitry Torokhov wrote:
> > > On Mon, Mar 31, 2008 at 10:28:13AM -0700, Greg KH wrote:
> > > > On Mon, Mar 31, 2008 at 02:15:39AM -0400, Dmitry Torokhov wrote:
> > > > > Hi Linus,
> > > > >
> > > > > On Sunday 30 March 2008, Linus Torvalds wrote:
> > > > > >
> > > > > > On Sun, 30 Mar 2008, Bj?rn Steinbrink wrote:
> > > > > > >
> > > > > > > I can't reproduce the bug on my UP box and currently can't afford
> > > > > > > crashing my SMP box (all the oopses seem to come from SMP kernels, so I
> > > > > > > guess it needs SMP to crash), so while this doesn't show any new
> > > > > > > problems, I can't tell whether it actually fixes anything. Testers
> > > > > > > welcome!
> > > > > >
> > > > > > Ok, I applied this because I will do an -rc8 today or tomorrow, but I
> > > > > > really really hope somebody can figure out what made this all start to
> > > > > > trigger. It does smell like some core device layer change, because we do
> > > > > > not seem to have a lot of changes since 2.6.24 in evdev.c and input.c that
> > > > > > seem relevant.
> > > > > >
> > > > > > Greg, are there any refcounting changes that would cause the input devices
> > > > > > to be free'd earlier or something?
> > > > > >
> > > > >
> > > > > The following commit changed lifetime runes on kobjects breaking input:
> > > > >
> > > > > commit 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806
> > > > > Author: Kay Sievers <[email protected]>
> > > > > Date: Wed Dec 19 01:40:42 2007 +0100
> > > > >
> > > > > Kobject: auto-cleanup on final unref
> > > > >
> > > > > We save the current state in the object itself, so we can do proper
> > > > > cleanup when the last reference is dropped.
> > > > >
> > > > > If the initial reference is dropped, the object will be removed from
> > > > > sysfs if needed, if an "add" event was sent, "remove" will be send, and
> > > > > the allocated resources are released.
> > > > >
> > > > > This allows us to clean up some driver core usage as well as allowing us
> > > > > to do other such changes to the rest of the kernel.
> > > > >
> > > > > Signed-off-by: Kay Sievers <[email protected]>
> > > > > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > > > >
> > > > > Before we dropped reference to kobject's parent only when child kobject
> > > > > was released (in kobject_cleanup). The changeset above moves the release
> > > > > to kobject_del() which is way too early in my opinion. The kobject is only
> > > > > marked for deletion at that time, not really deleted.
> > > >
> > > > It was "deleted" from sysfs, and should have never been used again by
> > > > any callers. If the reference count was dropped to zero with this call,
> > > > it would be cleaned up as well, it seems that you were assuming that it
> > > > would not be? Perhaps you just need to grab another reference as this
> > > > would have caused you problems without this change anyway, but without
> > > > slab debugging, you never saw it.
> > > >
> > >
> > > Greg, please look at the change again. Before kobject_put(kobj->parent)
> > > was done in kobject_cleanup() and so the parent would only be freed when
> > > all its children are gone. Now parent is deleted early, even if its
> > > children are still referenced by other users. This is lifetime rule
> > > change and should really be announced as such.
> >
> > Ugh, this was done because of scsi, they required that if you really
> > were deleting the parent, you wanted it gone.
> >
>
> Gone from sysfs or gone from memory?
>
> > > If this change it intentional and is here to stay then I will just grab
> > > the references myself, although I wonder what else might be broken by
> > > it.
> >
> > Yes, if you need those references, you are going to have to hold on for
> > them, the kobject layer will not keep them around. It now is a "does
> > what you ask for" type model :)
> >
> > I fail to see where this affects the input code though, in glancing at
> > it, it looks like you are doing things properly. Kay, any thoughts
> > here, I think you looked at the kobject input layer interaction in the
> > past.
> >
>
> Ok, I really liked the old behavior better, but if it is to stay then
> we need something like this (not for inclusion yet as mousedev and joydev
> need to be adjusted as well):
Yes, that's the proper behavior anyway, as you are passing off a pointer
to a device, you need to keep the reference to that device around until
you are finished with it.
I'm amazed that this wasn't causing a problem before the kobject change,
as this should have been needed there as well. Would running with slab
debugging cause it to hit then?
> Signed-off-by: Dmitry Torokhov <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
thanks,
greg k-h
On Mon, 2008-03-31 at 23:27 +0200, Jiri Kosina wrote:
> On Mon, 31 Mar 2008, Greg KH wrote:
>
> > > Greg, please look at the change again. Before kobject_put(kobj->parent)
> > > was done in kobject_cleanup() and so the parent would only be freed when
> > > all its children are gone. Now parent is deleted early, even if its
> > > children are still referenced by other users. This is lifetime rule
> > > change and should really be announced as such.
> > Ugh, this was done because of scsi, they required that if you really
> > were deleting the parent, you wanted it gone.
>
> What is the exact meaning of "gone" here please?
Gone means, that if you remove a device from sysfs, you drop the
implicit reference to the parent device, as this is no longer needed.
You are expected to keep a ref to the parent object (same way as to any
other used object) if you need to access the data. Removed objects are
isolated now, which means that you just pin their data and not their
parents.
This is the expected behavior and makes it possible to resolve refcount
loops (parent ref's child) which could not be released with the implicit
parent ref that was only released on object cleanup.
Thanks,
Kay
> I do agree that this might want reverting, unless there is some rally good
> reason for it. People may have pefectly valid reasons to expect topology
> and reachability to remain valid - it's certainly what we guarantee in the
> VFS code for similar rules (ie the parent of a dentry is only free'd after
> all children have gone away).
>
> Greg, is it possible to get the old lifetime rules back wrt his? They seem
> valid and sane..
Looks like we are seeing something similar with suspend, I just got this
oops log. I think what happens is that appletouch suspend causes it to
disconnect and then X console switches or closes the evdev, whatever,
kaboom ...
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
usbcore: deregistering interface driver appletouch
input: appletouch disconnected
PM: Syncing filesystems ... done.
Oops: Kernel access of bad area, sig: 11 [#1]
PowerMac
Modules linked in: sg sd_mod binfmt_misc appletalk psnap llc hci_usb radeon drm rfcomm l2cap bluetooth cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fuse i2c_dev therm_adt746x sr_mod sbp2 apm_emu apm_emulation arc4 ecb snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa b43 mac80211 joydev pcmcia cfg80211 rng_core ohci1394 snd_aoa_i2sbus ieee1394 snd_pcm snd_page_alloc snd_seq_midi snd_rawmidi pmac_zilog serial_core snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore snd_aoa_soundbus yenta_socket rsrc_nonstatic pcmcia_core ssb uninorth_agp agpgart [last unloaded: appletouch]
NIP: c02932a8 LR: c01f961c CTR: c002ac78
REGS: d26e7dc0 TRAP: 0300 Not tainted (2.6.25-rc7-p1)
MSR: 00009032 <EE,ME,IR,DR> CR: 24000888 XER: 00000000
DAR: 00000000, DSISR: 42000000
TASK = d5be6d30[14676] 'Xorg' THREAD: d26e6000
GPR00: d26e7e78 d26e7e70 d5be6d30 ef065640 c037b570 c03a60d0 00000f40 00000001
GPR08: 00000008 00000000 d26e7ea4 00000000 24000888 101fba44 101f3bc4 101f3bec
GPR16: bffb0660 00000000 102187e4 10218ae4 102186e4 10218764 102189e4 bffb0574
GPR24: 102187e4 101f3cf8 ef63c0d4 ef80cc20 ef152c1c ef065644 d5be6d30 ef065640
NIP [c02932a8] __mutex_lock_slowpath+0x2c/0xc0
LR [c01f961c] input_release_device+0x24/0x48
Call Trace:
[d26e7e70] [c0380000] 0xc0380000 (unreliable)
[d26e7ea0] [c01f961c] input_release_device+0x24/0x48
[d26e7ec0] [c01fd9c4] evdev_ungrab+0x4c/0x64
[d26e7ed0] [c01fdac8] evdev_release+0xec/0xf0
[d26e7ef0] [c008b744] __fput+0xbc/0x1a4
[d26e7f10] [c0088178] filp_close+0x68/0xb0
[d26e7f30] [c0088250] sys_close+0x90/0xb4
[d26e7f40] [c0011c78] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfc1d34c
LR = 0x10077d78
Instruction dump:
4bfffe70 9421ffd0 7c0802a6 bfa10024 3ba30004 7c7f1b78 90010034 38010008
7c5e1378 93a10008 81230008 90030008 <90090000> 9121000c 3920ffff 90410010
---[ end trace aeb589f151c28f3f ]---
agpgart: Putting AGP V2 device at 0000:00:0b.0 into 4x mode
On Tue, Apr 01, 2008 at 10:12:44AM +1100, Benjamin Herrenschmidt wrote:
>
> > I do agree that this might want reverting, unless there is some rally good
> > reason for it. People may have pefectly valid reasons to expect topology
> > and reachability to remain valid - it's certainly what we guarantee in the
> > VFS code for similar rules (ie the parent of a dentry is only free'd after
> > all children have gone away).
> >
> > Greg, is it possible to get the old lifetime rules back wrt his? They seem
> > valid and sane..
>
> Looks like we are seeing something similar with suspend, I just got this
> oops log. I think what happens is that appletouch suspend causes it to
> disconnect and then X console switches or closes the evdev, whatever,
> kaboom ...
>
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> usbcore: deregistering interface driver appletouch
> input: appletouch disconnected
> PM: Syncing filesystems ... done.
> Oops: Kernel access of bad area, sig: 11 [#1]
> PowerMac
> Modules linked in: sg sd_mod binfmt_misc appletalk psnap llc hci_usb radeon drm rfcomm l2cap bluetooth cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fuse i2c_dev therm_adt746x sr_mod sbp2 apm_emu apm_emulation arc4 ecb snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa b43 mac80211 joydev pcmcia cfg80211 rng_core ohci1394 snd_aoa_i2sbus ieee1394 snd_pcm snd_page_alloc snd_seq_midi snd_rawmidi pmac_zilog serial_core snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore snd_aoa_soundbus yenta_socket rsrc_nonstatic pcmcia_core ssb uninorth_agp agpgart [last unloaded: appletouch]
> NIP: c02932a8 LR: c01f961c CTR: c002ac78
> REGS: d26e7dc0 TRAP: 0300 Not tainted (2.6.25-rc7-p1)
> MSR: 00009032 <EE,ME,IR,DR> CR: 24000888 XER: 00000000
> DAR: 00000000, DSISR: 42000000
> TASK = d5be6d30[14676] 'Xorg' THREAD: d26e6000
> GPR00: d26e7e78 d26e7e70 d5be6d30 ef065640 c037b570 c03a60d0 00000f40 00000001
> GPR08: 00000008 00000000 d26e7ea4 00000000 24000888 101fba44 101f3bc4 101f3bec
> GPR16: bffb0660 00000000 102187e4 10218ae4 102186e4 10218764 102189e4 bffb0574
> GPR24: 102187e4 101f3cf8 ef63c0d4 ef80cc20 ef152c1c ef065644 d5be6d30 ef065640
> NIP [c02932a8] __mutex_lock_slowpath+0x2c/0xc0
> LR [c01f961c] input_release_device+0x24/0x48
> Call Trace:
> [d26e7e70] [c0380000] 0xc0380000 (unreliable)
> [d26e7ea0] [c01f961c] input_release_device+0x24/0x48
> [d26e7ec0] [c01fd9c4] evdev_ungrab+0x4c/0x64
> [d26e7ed0] [c01fdac8] evdev_release+0xec/0xf0
Can you try it with the patch that was just posted by Dmitry for the
evdev code?
thanks,
greg k-h
On Mon, 2008-03-31 at 16:51 -0700, Greg KH wrote:
>
> Can you try it with the patch that was just posted by Dmitry for the
> evdev code?
Yup, it works, ship it ! :-)
Cheers,
Ben.
On Mon, Mar 31, 2008 at 03:09:06PM -0700, Greg KH wrote:
> On Mon, Mar 31, 2008 at 04:57:36PM -0400, Dmitry Torokhov wrote:
> >
> > Ok, I really liked the old behavior better, but if it is to stay then
> > we need something like this (not for inclusion yet as mousedev and joydev
> > need to be adjusted as well):
>
> Yes, that's the proper behavior anyway, as you are passing off a pointer
> to a device, you need to keep the reference to that device around until
> you are finished with it.
>
> I'm amazed that this wasn't causing a problem before the kobject change,
> as this should have been needed there as well. Would running with slab
> debugging cause it to hit then?
>
It worked because evdev (and mousedev, joydev) are direct children of
input_dev and prior to Kay change parent would stay till all childrens
are gone.
I will ask Linus to pull extended patch covering also joydev and mousedev
shortly.
--
Dmitry