2013-07-09 17:05:02

by Gianluca Anzolin

[permalink] [raw]
Subject: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

In net/bluetooth/rfcomm/tty.c the struct tty is used without proper
refcounting. This leads to OOPS and panics triggered by the tty layer functions
which are invoked after the struct tty has already been destroyed.

This happens for example when the bluetooth connection is lost because the host
goes down unexpectedly.

The fix uses the tty_port_* helpers already in place.

This patch depends on patch "Fix refcount leak in tty_port.c" already sent to
linux-kernel. [0]

Signed-off-by: Gianluca Anzolin <[email protected]>

[0] http://lkml.indiana.edu/hypermail/linux/kernel/1307.1/00600.html


Attachments:
(No filename) (602.00 B)
rfcomm-tty-refcount.patch (3.00 kB)
Download all attachments

2013-07-10 17:01:10

by Gianluca Anzolin

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

On Wed, Jul 10, 2013 at 05:55:41PM +0100, Gustavo Padovan wrote:
> Hi Gianluca,
>
> * Gianluca Anzolin <[email protected]> [2013-07-10 18:24:18 +0200]:
>
> > Hello,
> >
> > On Wed, Jul 10, 2013 at 04:46:23PM +0100, Gustavo Padovan wrote:
> > > Hi Gianluca,
> > >
> > > haven't looked into detail into these patches, but to get rfcomm patches
> > > upstream you would first need the tty maintainer to accept this patch you are
> > > mentioning since our side would depend on it. It seems to be a regression
> > > caused by aa27a094e2c and your patch seems to be the right fix.
> >
> > Thank you for pointing that out. These patches were also sent to Marcel Holtman
> > who seems to be the maintainer by looking at the source and by using the script
> > get_maintainers.pl and I'm waiting for a reply.
> >
> > The source mentions also Maxim Krasnyansky, I didn't include him since the
> > maintainers script didn't return his name anymore. Should I send the patches
> > also to him?
>
> No, just keep sending the bluetooth patches here. Once you get the drivers/tty
> patch applied we can go ahead with the bluetooth ones. It helps if you also
> use git formatted patches.
>
> Gustavo

Ah now I get it, sorry I'm boiled.

If you have time could you also review the second patch? Then I'll resend them
both.

Thank you,

Gianluca

2013-07-10 16:55:41

by Gustavo Padovan

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

Hi Gianluca,

* Gianluca Anzolin <[email protected]> [2013-07-10 18:24:18 +0200]:

> Hello,
>
> On Wed, Jul 10, 2013 at 04:46:23PM +0100, Gustavo Padovan wrote:
> > Hi Gianluca,
> >
> > haven't looked into detail into these patches, but to get rfcomm patches
> > upstream you would first need the tty maintainer to accept this patch you are
> > mentioning since our side would depend on it. It seems to be a regression
> > caused by aa27a094e2c and your patch seems to be the right fix.
>
> Thank you for pointing that out. These patches were also sent to Marcel Holtman
> who seems to be the maintainer by looking at the source and by using the script
> get_maintainers.pl and I'm waiting for a reply.
>
> The source mentions also Maxim Krasnyansky, I didn't include him since the
> maintainers script didn't return his name anymore. Should I send the patches
> also to him?

No, just keep sending the bluetooth patches here. Once you get the drivers/tty
patch applied we can go ahead with the bluetooth ones. It helps if you also
use git formatted patches.

Gustavo

2013-07-10 16:24:18

by Gianluca Anzolin

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

Hello,

On Wed, Jul 10, 2013 at 04:46:23PM +0100, Gustavo Padovan wrote:
> Hi Gianluca,
>
> haven't looked into detail into these patches, but to get rfcomm patches
> upstream you would first need the tty maintainer to accept this patch you are
> mentioning since our side would depend on it. It seems to be a regression
> caused by aa27a094e2c and your patch seems to be the right fix.

Thank you for pointing that out. These patches were also sent to Marcel Holtman
who seems to be the maintainer by looking at the source and by using the script
get_maintainers.pl and I'm waiting for a reply.

The source mentions also Maxim Krasnyansky, I didn't include him since the
maintainers script didn't return his name anymore. Should I send the patches
also to him?

>
>
> > --- linux/net/bluetooth/rfcomm/tty.c.orig 2013-07-09 18:10:09.071322663 +0200
> > +++ linux/net/bluetooth/rfcomm/tty.c 2013-07-09 18:14:44.517783673 +0200
> > @@ -333,10 +333,11 @@ static inline unsigned int rfcomm_room(s
> > static void rfcomm_wfree(struct sk_buff *skb)
> > {
> > struct rfcomm_dev *dev = (void *) skb->sk;
> > - struct tty_struct *tty = dev->port.tty;
> > +
> > atomic_sub(skb->truesize, &dev->wmem_alloc);
> > - if (test_bit(RFCOMM_TTY_ATTACHED, &dev->flags) && tty)
> > - tty_wakeup(tty);
> > + if (test_bit(RFCOMM_TTY_ATTACHED, &dev->flags))
> > + tty_port_tty_wakeup(&dev->port);
> > +
> > tty_port_put(&dev->port);
> > }
> >
> > @@ -410,6 +411,7 @@ static int rfcomm_release_dev(void __use
> > {
> > struct rfcomm_dev_req req;
> > struct rfcomm_dev *dev;
> > + struct tty_struct *tty;
> >
> > if (copy_from_user(&req, arg, sizeof(req)))
> > return -EFAULT;
> > @@ -429,11 +431,15 @@ static int rfcomm_release_dev(void __use
> > rfcomm_dlc_close(dev->dlc, 0);
> >
> > /* Shut down TTY synchronously before freeing rfcomm_dev */
> > - if (dev->port.tty)
> > - tty_vhangup(dev->port.tty);
> > + tty = tty_port_tty_get(&dev->port);
> > + if (tty) {
> > + tty_vhangup(tty);
> > + tty_kref_put(tty);
> > + }
> >
> > if (!test_bit(RFCOMM_RELEASE_ONHUP, &dev->flags))
> > rfcomm_dev_del(dev);
> > +
>
> Please remove the extra blank line.

I will do it.

>
> > tty_port_put(&dev->port);
> > return 0;
> > }
> > @@ -563,6 +569,7 @@ static void rfcomm_dev_data_ready(struct
> > static void rfcomm_dev_state_change(struct rfcomm_dlc *dlc, int err)
> > {
> > struct rfcomm_dev *dev = dlc->owner;
> > + struct tty_struct *tty;
> > if (!dev)
> > return;
> >
> > @@ -572,7 +579,8 @@ static void rfcomm_dev_state_change(stru
> > wake_up_interruptible(&dev->wait);
> >
> > if (dlc->state == BT_CLOSED) {
> > - if (!dev->port.tty) {
> > + tty = tty_port_tty_get(&dev->port);
> > + if (!tty) {
> > if (test_bit(RFCOMM_RELEASE_ONHUP, &dev->flags)) {
> > /* Drop DLC lock here to avoid deadlock
> > * 1. rfcomm_dev_get will take rfcomm_dev_lock
> > @@ -591,8 +599,10 @@ static void rfcomm_dev_state_change(stru
> > tty_port_put(&dev->port);
> > rfcomm_dlc_lock(dlc);
> > }
> > - } else
> > - tty_hangup(dev->port.tty);
> > + } else {
> > + tty_hangup(tty);
> > + tty_kref_put(tty);
> > + }
>
> Shouldn't we be using tty_port_tyy_hangup?

I couldn't use the helper here because I have to check for tty == NULL, and
take the first branch of the if statement.

However I agree that the 2nd patch which gets rid of that codepath should use
the helper, I overlooked that.

I'll wait for a reply from the maintainer then and if he agrees I'll resend the
patches with these further fixes.

I really hope that my patches or an equivalent fix will get accepted soon,
since I tried to capture an oops this afternoon but 9 times out of 10 the
machine directly reboots.

Thank you,

Gianluca


>
> > }
> > }
> >
> > @@ -604,10 +614,8 @@ static void rfcomm_dev_modem_status(stru
> >
> > BT_DBG("dlc %p dev %p v24_sig 0x%02x", dlc, dev, v24_sig);
> >
> > - if ((dev->modem_status & TIOCM_CD) && !(v24_sig & RFCOMM_V24_DV)) {
> > - if (dev->port.tty && !C_CLOCAL(dev->port.tty))
> > - tty_hangup(dev->port.tty);
> > - }
> > + if ((dev->modem_status & TIOCM_CD) && !(v24_sig & RFCOMM_V24_DV))
> > + tty_port_tty_hangup(&dev->port, true);
> >
> > dev->modem_status =
> > ((v24_sig & RFCOMM_V24_RTC) ? (TIOCM_DSR | TIOCM_DTR) : 0) |
> > @@ -674,7 +682,7 @@ static int rfcomm_tty_open(struct tty_st
> >
> > rfcomm_dlc_lock(dlc);
> > tty->driver_data = dev;
> > - dev->port.tty = tty;
> > + tty_port_tty_set(&dev->port, tty);
> > rfcomm_dlc_unlock(dlc);
> > set_bit(RFCOMM_TTY_ATTACHED, &dev->flags);
> >
> > @@ -742,7 +750,7 @@ static void rfcomm_tty_close(struct tty_
> >
> > rfcomm_dlc_lock(dev->dlc);
> > tty->driver_data = NULL;
> > - dev->port.tty = NULL;
> > + tty_port_tty_set(&dev->port, NULL);
> > rfcomm_dlc_unlock(dev->dlc);
> >
> > if (test_bit(RFCOMM_TTY_RELEASED, &dev->flags)) {
>
>
> Gustavo

2013-07-10 15:46:23

by Gustavo Padovan

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

Hi Gianluca,

* Gianluca Anzolin <[email protected]> [2013-07-09 19:05:02 +0200]:

> In net/bluetooth/rfcomm/tty.c the struct tty is used without proper
> refcounting. This leads to OOPS and panics triggered by the tty layer functions
> which are invoked after the struct tty has already been destroyed.
>
> This happens for example when the bluetooth connection is lost because the host
> goes down unexpectedly.
>
> The fix uses the tty_port_* helpers already in place.
>
> This patch depends on patch "Fix refcount leak in tty_port.c" already sent to
> linux-kernel. [0]
>
> Signed-off-by: Gianluca Anzolin <[email protected]>
>
> [0] http://lkml.indiana.edu/hypermail/linux/kernel/1307.1/00600.html
>

haven't looked into detail into these patches, but to get rfcomm patches
upstream you would first need the tty maintainer to accept this patch you are
mentioning since our side would depend on it. It seems to be a regression
caused by aa27a094e2c and your patch seems to be the right fix.


> --- linux/net/bluetooth/rfcomm/tty.c.orig 2013-07-09 18:10:09.071322663 +0200
> +++ linux/net/bluetooth/rfcomm/tty.c 2013-07-09 18:14:44.517783673 +0200
> @@ -333,10 +333,11 @@ static inline unsigned int rfcomm_room(s
> static void rfcomm_wfree(struct sk_buff *skb)
> {
> struct rfcomm_dev *dev = (void *) skb->sk;
> - struct tty_struct *tty = dev->port.tty;
> +
> atomic_sub(skb->truesize, &dev->wmem_alloc);
> - if (test_bit(RFCOMM_TTY_ATTACHED, &dev->flags) && tty)
> - tty_wakeup(tty);
> + if (test_bit(RFCOMM_TTY_ATTACHED, &dev->flags))
> + tty_port_tty_wakeup(&dev->port);
> +
> tty_port_put(&dev->port);
> }
>
> @@ -410,6 +411,7 @@ static int rfcomm_release_dev(void __use
> {
> struct rfcomm_dev_req req;
> struct rfcomm_dev *dev;
> + struct tty_struct *tty;
>
> if (copy_from_user(&req, arg, sizeof(req)))
> return -EFAULT;
> @@ -429,11 +431,15 @@ static int rfcomm_release_dev(void __use
> rfcomm_dlc_close(dev->dlc, 0);
>
> /* Shut down TTY synchronously before freeing rfcomm_dev */
> - if (dev->port.tty)
> - tty_vhangup(dev->port.tty);
> + tty = tty_port_tty_get(&dev->port);
> + if (tty) {
> + tty_vhangup(tty);
> + tty_kref_put(tty);
> + }
>
> if (!test_bit(RFCOMM_RELEASE_ONHUP, &dev->flags))
> rfcomm_dev_del(dev);
> +

Please remove the extra blank line.

> tty_port_put(&dev->port);
> return 0;
> }
> @@ -563,6 +569,7 @@ static void rfcomm_dev_data_ready(struct
> static void rfcomm_dev_state_change(struct rfcomm_dlc *dlc, int err)
> {
> struct rfcomm_dev *dev = dlc->owner;
> + struct tty_struct *tty;
> if (!dev)
> return;
>
> @@ -572,7 +579,8 @@ static void rfcomm_dev_state_change(stru
> wake_up_interruptible(&dev->wait);
>
> if (dlc->state == BT_CLOSED) {
> - if (!dev->port.tty) {
> + tty = tty_port_tty_get(&dev->port);
> + if (!tty) {
> if (test_bit(RFCOMM_RELEASE_ONHUP, &dev->flags)) {
> /* Drop DLC lock here to avoid deadlock
> * 1. rfcomm_dev_get will take rfcomm_dev_lock
> @@ -591,8 +599,10 @@ static void rfcomm_dev_state_change(stru
> tty_port_put(&dev->port);
> rfcomm_dlc_lock(dlc);
> }
> - } else
> - tty_hangup(dev->port.tty);
> + } else {
> + tty_hangup(tty);
> + tty_kref_put(tty);
> + }

Shouldn't we be using tty_port_tyy_hangup?

> }
> }
>
> @@ -604,10 +614,8 @@ static void rfcomm_dev_modem_status(stru
>
> BT_DBG("dlc %p dev %p v24_sig 0x%02x", dlc, dev, v24_sig);
>
> - if ((dev->modem_status & TIOCM_CD) && !(v24_sig & RFCOMM_V24_DV)) {
> - if (dev->port.tty && !C_CLOCAL(dev->port.tty))
> - tty_hangup(dev->port.tty);
> - }
> + if ((dev->modem_status & TIOCM_CD) && !(v24_sig & RFCOMM_V24_DV))
> + tty_port_tty_hangup(&dev->port, true);
>
> dev->modem_status =
> ((v24_sig & RFCOMM_V24_RTC) ? (TIOCM_DSR | TIOCM_DTR) : 0) |
> @@ -674,7 +682,7 @@ static int rfcomm_tty_open(struct tty_st
>
> rfcomm_dlc_lock(dlc);
> tty->driver_data = dev;
> - dev->port.tty = tty;
> + tty_port_tty_set(&dev->port, tty);
> rfcomm_dlc_unlock(dlc);
> set_bit(RFCOMM_TTY_ATTACHED, &dev->flags);
>
> @@ -742,7 +750,7 @@ static void rfcomm_tty_close(struct tty_
>
> rfcomm_dlc_lock(dev->dlc);
> tty->driver_data = NULL;
> - dev->port.tty = NULL;
> + tty_port_tty_set(&dev->port, NULL);
> rfcomm_dlc_unlock(dev->dlc);
>
> if (test_bit(RFCOMM_TTY_RELEASED, &dev->flags)) {


Gustavo

2013-07-10 11:24:46

by Gianluca Anzolin

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

On Wed, Jul 10, 2013 at 10:39:21AM +0200, Mathias Hasselmann wrote:
> > Is this failure easy to reproduce ?
>
> Just setup a RFCOMM tty and then violently break the link, by killing
> the rfcomm tool, by plugging the bt adapter, by turning off one
> peer, ... As soon as timeouts occur on the still running side you'll get
> the panic. Almost 100% reproducible.

Hi,

yes indeed that's the easiest way to reproduce the issue. I was wondering
why so few people hit these bugs since they're so easy to trigger.

Did you have a chance to test the two patches I sent?

Ciao,
Gianluca

>
> Ciao,
> Mathias
> --
> Mathias Hasselmann | [email protected] | Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel. Germany +49-30-521325470, Sweden (HQ) +46-563-540090
> KDAB - Qt Experts - Platform-independent software solutions
>
>

2013-07-10 10:43:10

by Gianluca Anzolin

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

On Wed, Jul 10, 2013 at 11:37:24AM +0200, Gianluca Anzolin wrote:
> Linux kernel 3.10, but I see the issue is reproducible also on 3.9 because
> other people reported it a while ago: have a look at this thread
>
> http://marc.info/?t=136868685500006&r=1&w=2

Oh wait, it seems you were already aware of this issue, since you were part of
that discussion too...

Sorry if I didn't CC you directly when I sent the patches.

Ciao,
Gianluca

2013-07-10 09:37:24

by Gianluca Anzolin

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

On Wed, Jul 10, 2013 at 09:17:27AM +0100, Dean Jenkins wrote:
> Hi Gianluca,
>
> On 09/07/13 18:05, Gianluca Anzolin wrote:
> >In net/bluetooth/rfcomm/tty.c the struct tty is used without proper
> >refcounting. This leads to OOPS and panics triggered by the tty layer functions
> >which are invoked after the struct tty has already been destroyed.
> >
> >This happens for example when the bluetooth connection is lost because the host
> >goes down unexpectedly.
> >
> >The fix uses the tty_port_* helpers already in place.
> >
> >This patch depends on patch "Fix refcount leak in tty_port.c" already sent to
> >linux-kernel. [0]
> >
> >Signed-off-by: Gianluca Anzolin <[email protected]>
> >
> >[0] http://lkml.indiana.edu/hypermail/linux/kernel/1307.1/00600.html
> >
>
> Do you have any backtraces of the OOPS and panics ?

Not at hand because when they happen I can't even sync to disk. I could take a
picture however maybe this evening.

>
> In which kernel did you discover the failure ?

Linux kernel 3.10, but I see the issue is reproducible also on 3.9 because
other people reported it a while ago: have a look at this thread

http://marc.info/?t=136868685500006&r=1&w=2

Unfortunately that discussion died without results so I gave a look at the
code: the tty struct we were using died under us because it was missing
proper refcounting. So I fixed that problem with the first patch.

Then it turned out that also the tty_port structure didn't work as expected and
was destroyed while being used: the code manually checked for port.count
instead of relying on proper refcounting. That's what the 2nd patch is for.

It had a look at usb-serial.c and changed the code to mimic that behaviour. It
works for me reliably now, however it's better if someone with more knowledge
gives the patches a look.

>
> What platform were you using, I mean x86, ARM or something else ?
>
> Is this failure easy to reproduce ?

I'm seeing this on a x86_64 xeon cpu. It's very easy to reproduce, all you need
to do is to power off the bluetooth host you're communicating to while reading
from the tty device with some program, I use picocom for instance.

But the current code fails in many other ways: removing the device or releasing
the tty port with the port open for example.

>
> Did you use Bluez userland to control Bluetooth ?

I'm using the bluez4 packages of ARCH linux:

blueman 1.23-10
bluez-libs 5.7-1
bluez-utils 5.7-1
bluez4 4.101-3

>
> In the failure case, what protocol or application was bound to the
> RFCOMM TTY ? I mean was it SLIP, minicom or something else ?
>

I was using picocom: picocom /dev/rfcomm0

> Thanks,
>
> Regards,
> Dean Jenkins
> Mentor Graphics

Thank you for your reply :D

Gianluca

2013-07-10 08:39:21

by Mathias Hasselmann

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

Am Mittwoch, den 10.07.2013, 09:17 +0100 schrieb Dean Jenkins:
> Hi Gianluca,
>
> On 09/07/13 18:05, Gianluca Anzolin wrote:
> > In net/bluetooth/rfcomm/tty.c the struct tty is used without proper
> > refcounting. This leads to OOPS and panics triggered by the tty layer functions
> > which are invoked after the struct tty has already been destroyed.
> >
> > This happens for example when the bluetooth connection is lost because the host
> > goes down unexpectedly.
> >
> > The fix uses the tty_port_* helpers already in place.
> >
> > This patch depends on patch "Fix refcount leak in tty_port.c" already sent to
> > linux-kernel. [0]
> >
> > Signed-off-by: Gianluca Anzolin <[email protected]>
> >
> > [0] http://lkml.indiana.edu/hypermail/linux/kernel/1307.1/00600.html
> >
>
> Do you have any backtraces of the OOPS and panics ?
>
> In which kernel did you discover the failure ?
>
> What platform were you using, I mean x86, ARM or something else ?
>
> Is this failure easy to reproduce ?

Just setup a RFCOMM tty and then violently break the link, by killing
the rfcomm tool, by plugging the bt adapter, by turning off one
peer, ... As soon as timeouts occur on the still running side you'll get
the panic. Almost 100% reproducible.

Ciao,
Mathias
--
Mathias Hasselmann | [email protected] | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel. Germany +49-30-521325470, Sweden (HQ) +46-563-540090
KDAB - Qt Experts - Platform-independent software solutions

2013-07-10 08:17:27

by Dean Jenkins

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix tty refcounting in rfcomm/tty.c

Hi Gianluca,

On 09/07/13 18:05, Gianluca Anzolin wrote:
> In net/bluetooth/rfcomm/tty.c the struct tty is used without proper
> refcounting. This leads to OOPS and panics triggered by the tty layer functions
> which are invoked after the struct tty has already been destroyed.
>
> This happens for example when the bluetooth connection is lost because the host
> goes down unexpectedly.
>
> The fix uses the tty_port_* helpers already in place.
>
> This patch depends on patch "Fix refcount leak in tty_port.c" already sent to
> linux-kernel. [0]
>
> Signed-off-by: Gianluca Anzolin <[email protected]>
>
> [0] http://lkml.indiana.edu/hypermail/linux/kernel/1307.1/00600.html
>

Do you have any backtraces of the OOPS and panics ?

In which kernel did you discover the failure ?

What platform were you using, I mean x86, ARM or something else ?

Is this failure easy to reproduce ?

Did you use Bluez userland to control Bluetooth ?

In the failure case, what protocol or application was bound to the
RFCOMM TTY ? I mean was it SLIP, minicom or something else ?

Thanks,

Regards,
Dean Jenkins
Mentor Graphics