2006-02-22 09:12:31

by Ville Tervo

Subject: [Bluez-devel] Soft lockup

Hi,

I got the attached oops while playing with a Nokia 770. The attached patch
helps.

Comments?

--
Ville


Attachments:
(No filename) (102.00 B)
rfcomm_oopt.txt (3.53 kB)
rfcomm_dlc_patch.txt (594.00 B)

2006-02-24 11:03:58

by Ville Tervo

Subject: Re: [Bluez-devel] Soft lockup

Hi Marcel,

On Wed, Feb 22, 2006 at 02:54:25PM +0200, ext Ville Tervo wrote:
> Hi Marcel,
>
> On Wed, Feb 22, 2006 at 12:20:45PM +0100, ext Marcel Holtmann wrote:
> > Hi Ville,
> >
> > > I got the attached oops while playing with a Nokia 770. The attached
> > > patch helps.
> >
> > I am not really happy with this fix. Do you have an idea why this soft
> > lockup happens? Is this OMAP specific?
>
> Me neither. I'll try to reproduce it with i386 and see if I can get more
> information.

Now I have more information about this problem.

I had debugging enabled in net/bluetooth/rfcomm/core.c and I noticed that
for some reason rfcomm_dlc_clear_timer() is called for an already freed
dlc, and lock_timer_base() gets stuck because it is trying to use an
invalid timer pointer. I also tried to reproduce the bug on i386, without
success. Also, when debugging is on, the bug is much harder to reproduce.
To me this looks like the dlc locking doesn't work as it should.

One other note: rfcomm_dlc_clear_timer() uses timer_pending()
together with del_timer(), which already calls timer_pending(). So the
timer_pending() call in rfcomm_dlc_clear_timer() is redundant.
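
The point about the redundant check can be illustrated with a small
user-space model. This is only a sketch: the model_* names and the struct
layouts are made up for illustration and are not the kernel's definitions,
and the idea that a queued timer holds one reference on the dlc is an
assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified user-space stand-ins; NOT the kernel's definitions. */
struct model_timer {
    bool pending;               /* true while the timer is queued */
};

struct model_dlc {
    struct model_timer timer;
    int refcnt;                 /* assumption: a queued timer holds one ref */
};

bool model_timer_pending(const struct model_timer *t)
{
    return t->pending;
}

/* Modelled on del_timer(): returns 1 if it deactivated a pending timer,
 * 0 if the timer was not pending. The pending check happens in here. */
int model_del_timer(struct model_timer *t)
{
    if (!model_timer_pending(t))
        return 0;
    t->pending = false;
    return 1;
}

void model_dlc_put(struct model_dlc *d)
{
    assert(d->refcnt > 0);      /* dropping a reference nobody holds is a bug */
    d->refcnt--;
}

/* Variant with the extra timer_pending()-style check in front. */
void clear_timer_with_check(struct model_dlc *d)
{
    if (model_timer_pending(&d->timer) && model_del_timer(&d->timer))
        model_dlc_put(d);
}

/* Variant that relies on the return value of the delete alone. */
void clear_timer_plain(struct model_dlc *d)
{
    if (model_del_timer(&d->timer))
        model_dlc_put(d);
}
```

In this model both variants behave identically. Both also dereference d,
so neither one protects against being called on a dlc that has already
been freed.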

--
Ville


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2006-02-22 12:54:25

by Ville Tervo

Subject: Re: [Bluez-devel] Soft lockup

Hi Marcel,

On Wed, Feb 22, 2006 at 12:20:45PM +0100, ext Marcel Holtmann wrote:
> Hi Ville,
>
> > I got the attached oops while playing with a Nokia 770. The attached
> > patch helps.
>
> I am not really happy with this fix. Do you have an idea why this soft
> lockup happens? Is this OMAP specific?

Me neither. I'll try to reproduce it with i386 and see if I can get more
information.

--
Ville



2006-02-22 11:20:45

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Ville,

> I got the attached oops while playing with a Nokia 770. The attached
> patch helps.

I am not really happy with this fix. Do you have an idea why this soft
lockup happens? Is this OMAP specific?

Regards

Marcel





2006-03-01 04:48:21

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Ville,

> Now I have more information about this problem.
>
> I had debugging enabled in net/bluetooth/rfcomm/core.c and I noticed that
> for some reason rfcomm_dlc_clear_timer() is called for an already freed
> dlc, and lock_timer_base() gets stuck because it is trying to use an
> invalid timer pointer. I also tried to reproduce the bug on i386, without
> success. Also, when debugging is on, the bug is much harder to reproduce.
> To me this looks like the dlc locking doesn't work as it should.
>
> One other note: rfcomm_dlc_clear_timer() uses timer_pending()
> together with del_timer(), which already calls timer_pending(). So the
> timer_pending() call in rfcomm_dlc_clear_timer() is redundant.

This all looks like we are missing a memory barrier somewhere. That
could also be the reason why it only gets triggered on OMAP systems. Any
further ideas?
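
For readers unfamiliar with the term, the following sketch shows the kind
of ordering guarantee a memory barrier provides. It uses user-space C11
atomics rather than the kernel's own primitives, and the names (payload,
slot, published, publish, try_consume) are hypothetical. It does not claim
to show where a barrier might be missing in RFCOMM; the thread does not
identify that.

```c
#include <stdatomic.h>
#include <stddef.h>

struct payload {
    int value;
};

static struct payload slot;
static _Atomic(struct payload *) published = NULL;

/* Writer: fill in the data first, then publish the pointer. The release
 * store keeps the write to slot.value from being reordered after the
 * pointer becomes visible to other CPUs. */
void publish(int value)
{
    slot.value = value;
    atomic_store_explicit(&published, &slot, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a
 * reader that sees the pointer also sees the data written before it.
 * Returns 1 and fills *out if something was published, else 0. */
int try_consume(int *out)
{
    struct payload *p =
        atomic_load_explicit(&published, memory_order_acquire);

    if (p == NULL)
        return 0;
    *out = p->value;
    return 1;
}
```

Without the release/acquire pairing, a weakly ordered CPU would be allowed
to make the pointer visible before the data it points to, which is one way
a bug can show up on one architecture and not on another.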

Regards

Marcel





2007-05-28 08:40:23

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Bastien,

> > Works with 2.6.22-rc3 (for a meaning of works, my code doesn't seem to
> > work, but that's a different matter).
>
> I forgot to mention that the serial service doesn't seem to create any
> physical devices (ie. test-serial doesn't create a /dev/rfcomm0 despite
> it being the return value of the ConnectService function).

That looks like a udev problem. udev should create the device nodes.

Regards

Marcel




2007-05-28 08:39:23

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Bastien,

> > > > > Here's what I could capture from the i386 crash:
> > > > > BUG warning at lib/kref.c:32/kref_get()
> > > >
> > > > any chance you can test a 2.6.22-rc3 kernel. We fixed some potential
> > > > problems in RFCOMM and Greg pushed some driver model changes that might
> > > > make this go away. I really have no idea what triggers this at the
> > > > moment. However I would like to see a simple reproducer.
> > >
> > > Works with 2.6.22-rc3 (for a meaning of works, my code doesn't seem to
> > > work, but that's a different matter).
> >
> > so this is actually fixed now within the upstream kernel. This also
> > means that I didn't really bother about older kernels.
>
> Any plans for a backport? It would be most useful, at least for testing.

I don't really have plans for a backport, mainly because I don't know
which patch finally solved it, that is, whether it was one of the
Bluetooth patches or one of Greg's for the driver model.

> > > I can reproduce the crash at will on 2 different machines. It might be
> > > easier to create a minimal reproducer if you have a debug version of the
> > > serial service.
> >
> > Actually using SIGUSR2 should switch debugging mode on/off, but I think
> > we forgot to introduce that within the service implementation. Need to
> > fix that at some point.
>
> It would also be useful if the different functions could have debugging
> output of their different entry points, after argument parsing.

That might be a little bit too much debug overhead. However all services
can be started with -s and then they work standalone. So you can use gdb
and valgrind with them.

Regards

Marcel




2007-05-27 22:22:48

by Bastien Nocera

Subject: Re: [Bluez-devel] Soft lockup

On Sun, 2007-05-27 at 15:43 +0100, Bastien Nocera wrote:
> Hey Marcel,
>
> On Sun, 2007-05-27 at 14:17 +0200, Marcel Holtmann wrote:
> > > Here's what I could capture from the i386 crash:
> > > BUG warning at lib/kref.c:32/kref_get()
> >
> > any chance you can test a 2.6.22-rc3 kernel. We fixed some potential
> > problems in RFCOMM and Greg pushed some driver model changes that might
> > make this go away. I really have no idea what triggers this at the
> > moment. However I would like to see a simple reproducer.
>
> Works with 2.6.22-rc3 (for a meaning of works, my code doesn't seem to
> work, but that's a different matter).

I forgot to mention that the serial service doesn't seem to create any
physical devices (ie. test-serial doesn't create a /dev/rfcomm0 despite
it being the return value of the ConnectService function).

Is that normal?

--
Bastien Nocera <[email protected]>



2007-05-27 22:16:10

by Bastien Nocera

Subject: Re: [Bluez-devel] Soft lockup

On Sun, 2007-05-27 at 18:28 +0200, Marcel Holtmann wrote:
> Hi Bastien,
>
> > > > Here's what I could capture from the i386 crash:
> > > > BUG warning at lib/kref.c:32/kref_get()
> > >
> > > any chance you can test a 2.6.22-rc3 kernel. We fixed some potential
> > > problems in RFCOMM and Greg pushed some driver model changes that might
> > > make this go away. I really have no idea what triggers this at the
> > > moment. However I would like to see a simple reproducer.
> >
> > Works with 2.6.22-rc3 (for a meaning of works, my code doesn't seem to
> > work, but that's a different matter).
>
> so this is actually fixed now within the upstream kernel. This also
> means that I didn't really bother about older kernels.

Any plans for a backport? It would be most useful, at least for testing.

> > I can reproduce the crash at will on 2 different machines. It might be
> > easier to create a minimal reproducer if you have a debug version of the
> > serial service.
>
> Actually using SIGUSR2 should switch debugging mode on/off, but I think
> we forgot to introduce that within the service implementation. Need to
> fix that at some point.

It would also be useful if the different functions could have debugging
output of their different entry points, after argument parsing.

--
Bastien Nocera <[email protected]>



2007-05-27 16:28:28

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Bastien,

> > > Here's what I could capture from the i386 crash:
> > > BUG warning at lib/kref.c:32/kref_get()
> >
> > any chance you can test a 2.6.22-rc3 kernel. We fixed some potential
> > problems in RFCOMM and Greg pushed some driver model changes that might
> > make this go away. I really have no idea what triggers this at the
> > moment. However I would like to see a simple reproducer.
>
> Works with 2.6.22-rc3 (for a meaning of works, my code doesn't seem to
> work, but that's a different matter).

So this is actually fixed now in the upstream kernel. This also
means that I haven't really bothered with older kernels.

> I can reproduce the crash at will on 2 different machines. It might be
> easier to create a minimal reproducer if you have a debug version of the
> serial service.

Actually using SIGUSR2 should switch debugging mode on/off, but I think
we forgot to introduce that within the service implementation. Need to
fix that at some point.
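
A minimal sketch of how a SIGUSR2 debug toggle is commonly wired up in a
POSIX daemon follows. It is not taken from the bluez-utils sources; the
function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Only a sig_atomic_t flag is touched from the handler. */
static volatile sig_atomic_t debug_enabled = 0;

static void toggle_debug(int signo)
{
    (void)signo;
    debug_enabled = !debug_enabled;
}

/* Installs the SIGUSR2 handler; returns 0 on success, -1 on failure. */
int install_debug_toggle(void)
{
    struct sigaction sa = {0};

    sa.sa_handler = toggle_debug;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* The rest of the program polls this before emitting debug output. */
int debug_is_enabled(void)
{
    return debug_enabled != 0;
}
```

With something like this in place, sending SIGUSR2 to the running process
(for example with kill -USR2 and its pid) would flip debug output on or
off without a restart.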

Regards

Marcel




2007-05-27 14:43:23

by Bastien Nocera

Subject: Re: [Bluez-devel] Soft lockup

Hey Marcel,

On Sun, 2007-05-27 at 14:17 +0200, Marcel Holtmann wrote:
> > Here's what I could capture from the i386 crash:
> > BUG warning at lib/kref.c:32/kref_get()
>
> any chance you can test a 2.6.22-rc3 kernel. We fixed some potential
> problems in RFCOMM and Greg pushed some driver model changes that might
> make this go away. I really have no idea what triggers this at the
> moment. However I would like to see a simple reproducer.

Works with 2.6.22-rc3 (for a meaning of works, my code doesn't seem to
work, but that's a different matter).

To reproduce the problem:
- Install gnome-vfs2-obexftp (let me know if you want me to provide
  binaries) from
  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=231005
- Make sure you have bluez-utils 3.11 (for the serial service)
- On the console:
    eval `dbus-launch --sh-syntax`   # Launches a session D-Bus daemon
    /usr/libexec/gnome-vfs-daemon    # Launches the gnome-vfs daemon
    # 00:00:11:22:22:00 is the bdaddr of a phone, or other device with
    # ObexFTP support
    gnomevfs-ls obex://[00:00:11:22:22:00]/

I can reproduce the crash at will on 2 different machines. It might be
easier to create a minimal reproducer if you have a debug version of the
serial service.

--
Bastien Nocera <[email protected]>



2007-05-27 12:17:09

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Bastien,

> > > Now I have more information about this problem.
> > >
> > > I had debugging enabled in net/bluetooth/rfcomm/core.c and I noticed that
> > > for some reason rfcomm_dlc_clear_timer() is called for an already freed
> > > dlc, and lock_timer_base() gets stuck because it is trying to use an
> > > invalid timer pointer. I also tried to reproduce the bug on i386, without
> > > success. Also, when debugging is on, the bug is much harder to reproduce.
> > > To me this looks like the dlc locking doesn't work as it should.
> > >
> > > One other note: rfcomm_dlc_clear_timer() uses timer_pending()
> > > together with del_timer(), which already calls timer_pending(). So the
> > > timer_pending() call in rfcomm_dlc_clear_timer() is redundant.
> >
> > this all looks like we are missing a memory barrier somewhere. This
> > could also be the reason why it only gets triggered on OMAP systems. Any
> > further ideas?
>
> I'm getting the same kind of trace (on both x86-64 and i386) when
> testing the gnome-vfs2-obexftp package with a patch to use the new
> serial service[1].
>
> It happens on 2.6.19-1.2895.fc6 (x86-64) and 2.6.21-1.3191.fc7 (i386).
>
> Here's what I could capture from the i386 crash:
> BUG warning at lib/kref.c:32/kref_get()

Any chance you can test a 2.6.22-rc3 kernel? We fixed some potential
problems in RFCOMM and Greg pushed some driver model changes that might
make this go away. I really have no idea what triggers this at the
moment. However I would like to see a simple reproducer.

Regards

Marcel




2007-05-27 11:30:56

by Bastien Nocera

Subject: Re: [Bluez-devel] Soft lockup

On Wed, 2006-03-01 at 05:48 +0100, Marcel Holtmann wrote:
> Hi Ville,
>
> > Now I have more information about this problem.
> >
> > I had debugging enabled in net/bluetooth/rfcomm/core.c and I noticed that
> > for some reason rfcomm_dlc_clear_timer() is called for an already freed
> > dlc, and lock_timer_base() gets stuck because it is trying to use an
> > invalid timer pointer. I also tried to reproduce the bug on i386, without
> > success. Also, when debugging is on, the bug is much harder to reproduce.
> > To me this looks like the dlc locking doesn't work as it should.
> >
> > One other note: rfcomm_dlc_clear_timer() uses timer_pending()
> > together with del_timer(), which already calls timer_pending(). So the
> > timer_pending() call in rfcomm_dlc_clear_timer() is redundant.
>
> this all looks like we are missing a memory barrier somewhere. This
> could also be the reason why it only gets triggered on OMAP systems. Any
> further ideas?

I'm getting the same kind of trace (on both x86-64 and i386) when
testing the gnome-vfs2-obexftp package with a patch to use the new
serial service[1].

It happens on 2.6.19-1.2895.fc6 (x86-64) and 2.6.21-1.3191.fc7 (i386).

Here's what I could capture from the i386 crash:
BUG warning at lib/kref.c:32/kref_get()

kref_get
kobject_get
get_device
device_move
rfcomm_tty_close
release_dev
rfcomm_dlc_send
rfcomm_tty_write
file_has_perm
tty_release
__fput
filp_close
sys_close
syscall_call
wext_handle_ioctl

BUG unable to handle kernel paging request at virtual address ffffffff

rfcomm_dlc_send
rfcomm_dlc_write
file_has_perm
tty_release
__fput
filp_close
sys_close
syscall_call
wext_handle_ioctl

BUG unable to handle kernel paging request at virtual address ffffffff

last sysfs file: /class/tty/rfcomm0/dev

[... Memory corruption dump]

The ffffffff address is the one in the actual oops.
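
One possible reading of the first warning, sketched below as a user-space
model: a reference is being taken on an object whose count has already
dropped to zero, meaning the object is being or has been torn down. That
interpretation is an assumption, since the source line at lib/kref.c:32 is
not quoted in this thread, and the model_ref names are hypothetical rather
than the kernel's kref API.

```c
#include <stdbool.h>

/* Simplified stand-in for a reference counter; NOT the kernel's kref. */
struct model_ref {
    int count;
};

/* Takes a reference. Returns true if the object was still live, false if
 * the count was already zero, which is the situation a "get on a dead
 * object" warning would flag. */
bool model_ref_get(struct model_ref *r)
{
    bool live = r->count > 0;

    r->count++;
    return live;
}

/* Drops a reference. Returns true when the last reference went away,
 * i.e. the point at which the object would be released. */
bool model_ref_put(struct model_ref *r)
{
    r->count--;
    return r->count == 0;
}
```

In the model, a get that follows the final put returns false. That is the
same shape of problem as the earlier report of
rfcomm_dlc_clear_timer() being called on an already freed dlc.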

[1]: Package at:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=231005
--
Bastien Nocera <[email protected]>



2007-06-01 15:16:07

by Marcel Holtmann

Subject: Re: [Bluez-devel] Soft lockup

Hi Pierre-Yves,

> > That might be a little bit too much debug overhead. However all services
> > can be started with -s and then they work standalone. So you can use gdb
> > and valgrind with them.
>
> Could you or anybody knowing about it please explain more about the
> function and usage of this "-s" parameter, which I can't find in the man
> pages? I experience various crashes of hcid on some systems, and would
> like to track the problem down as much as possible, and it seems this
> option may help me.

The -s option is only for the services. It can only be used for service
debugging. If hcid crashed, then that is something different.

The -s option only tells the service to register itself, which makes it
possible to run it standalone (hence the option character).

Regards

Marcel




2007-06-01 14:53:43

by Pierre-Yves Paulus

Subject: Re: [Bluez-devel] Soft lockup

Hello Marcel and everyone,

> That might be a little bit too much debug overhead. However all services
> can be started with -s and then they work standalone. So you can use gdb
> and valgrind with them.

Could you or anybody knowing about it please explain more about the
function and usage of this "-s" parameter, which I can't find in the man
pages? I experience various crashes of hcid on some systems, and would
like to track the problem down as much as possible, and it seems this
option may help me.

Thanks in advance,
Best Regards.
