2013-07-02 15:42:02

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH 0/5] SLIP SLIP-Improve robustness to crashing

Using SLIP bound to RFCOMM or PTY/TTY has identified some weaknesses to crashing
under abnormal conditions.

Here is a proposed patchset baselined and built on Linux 3.9.

Note the patches have not been tested on x86 Linux 3.9. However similar patches
have been used on ARM Linux 2.6.34 to avoid kernel crashes in a commercial
project. I believe the same weaknesses still exist in Linux 3.9.

If some or all of the patches look to be useful to the community then I may
attempt to test on x86 but this is not straight forward for me.

I welcome any feedback and whether the fixes are a suitable solution.

Who is the maintainer of SLIP in the kernel ?

The patchset consists of:

0001-Bluetooth-Add-RFCOMM-TTY-write-return-error-codes.patch
0002-SLIP-Handle-error-codes-from-the-TTY-layer.patch
0003-SLIP-Prevent-recursion-stack-overflow-and-scheduler-.patch
0004-SLIP-Add-error-message-for-xleft-non-zero.patch
0005-SLIP-Fix-transmission-segmentation-mechanism.patches

Some background:

0001-Bluetooth-Add-RFCOMM-TTY-write-return-error-codes.patch
This patch is a Bluetooth change to add some error return codes to RFCOMM to
avoid NULL pointer dereference crashes. Note RFCOMM can already generate an
error code that will cause SLIP to malfunction.

0002-SLIP-Handle-error-codes-from-the-TTY-layer.patches
This patch allows SLIP to handle error codes from RFCOMM or other bound TTY layers.

0003-SLIP-Prevent-recursion-stack-overflow-and-scheduler-.patches
This patch prevents SLIP from causing a recursive loop that overflows the stack
and catastrophically crashes the kernel. The scenario is SLIP bound to PTY/TTY.
The underlying trigger is a probably a failure to allocate a TTY buffer in
tty_buffer_alloc() but this is unproven. The crash is sporadic in an ARM
embedded environment where resources are limited.

0004-SLIP-Add-error-message-for-xleft-non-zero.patch
This is an error message patch to identify when a SLIP frame has not been fully
transmitted meaning the frame was truncated.

0005-SLIP-Fix-transmission-segmentation-mechanism.patches
This patch allows multiple attempts to transmit segments of the SLIP frame.
Currently only 1 attempt at writing the whole SLIP frame to PTY/TTY occurs.
This could truncate transmitted SLIP frames. In addition the modification
relies on the TTY write wake-up event to complete the transmission of the
SLIP frame rather than the sl_encaps() call to pty_write(). Probably,
pty_write() should not call tty_wakeup() but safer to modify SLIP rather
than the PTY/TTY layer.

Thanks,
Dean Jenkins
Mentor Graphics
--
1.8.1.5


2013-07-02 15:42:00

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH 1/5] Bluetooth: Add RFCOMM TTY write return error codes

It appears that rfcomm_tty_write() does not check that the
passed in TTY device_data is not NULL and also does not check
that the RFCOMM DLC serial data link pointer is not NULL.

A kernel crash was observed whilst SLIP was bound to /dev/rfcomm0
but the /dev/rfcomm0 had subsequently disconnected. Unfortunately,
SLIP attempted to write to the now non-existant RFCOMM TTY device
which caused a NULL pointer dereference because the device_data
no longer existed.

Therefore, add NULL pointer checks for the dev and dlc pointers
and output kernel error debug to show that NULL had been detected.

Signed-off-by: Dean Jenkins <[email protected]>
---
net/bluetooth/rfcomm/tty.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c
index b6e44ad..56d28d1 100644
--- a/net/bluetooth/rfcomm/tty.c
+++ b/net/bluetooth/rfcomm/tty.c
@@ -761,12 +761,24 @@ static void rfcomm_tty_close(struct tty_struct *tty, struct file *filp)
static int rfcomm_tty_write(struct tty_struct *tty, const unsigned char *buf, int count)
{
struct rfcomm_dev *dev = (struct rfcomm_dev *) tty->driver_data;
- struct rfcomm_dlc *dlc = dev->dlc;
+ struct rfcomm_dlc *dlc;
struct sk_buff *skb;
int err = 0, sent = 0, size;

BT_DBG("tty %p count %d", tty, count);

+ if (!dev) {
+ BT_ERR("RFCOMM TTY device data structure does not exist");
+ return -ENODEV;
+ }
+
+ dlc = dev->dlc;
+
+ if (!dlc) {
+ BT_ERR("RFCOMM serial data link does not exist");
+ return -ENOLINK;
+ }
+
while (count) {
size = min_t(uint, count, dlc->mtu);

--
1.8.1.5

2013-07-02 15:42:25

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH 2/5] SLIP: Handle error codes from the TTY layer

It appears that SLIP does not handle error codes from the TTY layer.
This will result in a malfunction because the remaining length of
data will be corrupted by the negative error code values from the TTY
layer.

Therefore, add error code checks in sl_encaps() and sl_encaps_wakeup()
to prevent the corruption of the sent data length.

Note that SLIP is connectionless so on TTY error indicate that all data
was sent. It seems SLIP does not return error codes to the network
layer.

Signed-off-by: Dean Jenkins <[email protected]>
---
drivers/net/slip/slip.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index a34d6bf..bed819f 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -374,7 +374,7 @@ static void sl_bump(struct slip *sl)
static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
{
unsigned char *p;
- int actual, count;
+ int actual, count, err;

if (len > sl->mtu) { /* Sigh, shouldn't occur BUT ... */
printk(KERN_WARNING "%s: truncating oversized transmit packet!\n", sl->dev->name);
@@ -404,7 +404,16 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
* 14 Oct 1994 Dmitry Gorodchanin.
*/
set_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
- actual = sl->tty->ops->write(sl->tty, sl->xbuff, count);
+ err = sl->tty->ops->write(sl->tty, sl->xbuff, count);
+
+ if (err < 0) {
+ /* error case, say all was sent as connectionless */
+ actual = count;
+ } else {
+ /* good case, err contains the number sent */
+ actual = err;
+ }
+
#ifdef SL_CHECK_TRANSMIT
sl->dev->trans_start = jiffies;
#endif
@@ -422,7 +431,7 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
*/
static void slip_write_wakeup(struct tty_struct *tty)
{
- int actual;
+ int actual, err;
struct slip *sl = tty->disc_data;

/* First make sure we're connected. */
@@ -438,7 +447,16 @@ static void slip_write_wakeup(struct tty_struct *tty)
return;
}

- actual = tty->ops->write(tty, sl->xhead, sl->xleft);
+ err = tty->ops->write(tty, sl->xhead, sl->xleft);
+
+ if (err < 0) {
+ /* error case, say all was sent as connectionless */
+ actual = sl->xleft;
+ } else {
+ /* good case, err contains the number sent */
+ actual = err;
+ }
+
sl->xleft -= actual;
sl->xhead += actual;
}
--
1.8.1.5

2013-07-02 15:41:59

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH 3/5] SLIP: Prevent recursion stack overflow and scheduler crash

This is an issue when SLIP is bound to a PTY/TTY. If
TTY_DO_WRITE_WAKEUP is set, pty_write() calls tty_wakeup()
then slip_write_wakeup() can be called. slip_write_wakeup() can
be called by pty_write(). This is a recursion loop.

pty_write() is called in sl_encaps().
slip_write_wakeup() can be called by the TTY wakeup event.

The pty_write() call in sl_encaps() will also call
slip_write_wakeup() but xleft has not been updated and contains
the value from the previous SLIP frame transmission. xleft is zero
unless the previous SLIP frame failed to be fully transmitted in
which case xleft has a positive value. A failed transmission causes
the next SLIP frame pending transmission to cause a crash.

In the failure case when xleft is positive in slip_write_wakeup(),
recursion causes the stack to overflow and task structures located
near the stack are corrupted by the stack overflow. The corrupted
task structures crash the kernel's scheduler and the system
crashes with exception handlers crashing and the emergency reboot
fails.

The recursion loop is:
slip_write_wakeup()-->pty_write()-->tty_wakeup()-->slip_write_wakeup()
etc.

Therefore ensure xleft is zero before writing the SLIP frame to the
PTY/TTY layers. This prevents the xleft value of the previous SLIP
frame from interfering with the slip_write_wakeup() execution when
SLIP is bound to a PTY/TTY.

Note the transmission segmentation mechanism is broken and only a
single call to the write() function pointer will take place per
SLIP frame. This could cause missing or truncated SLIP frames to
be transmitted when the write() function fails to write the complete
frame. In other words the TTY wakeup event does nothing because
the TTY_DO_WRITE_WAKEUP flag has been cleared.

Signed-off-by: Dean Jenkins <[email protected]>
---
drivers/net/slip/slip.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index bed819f..f7303e0 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -395,6 +395,11 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
#endif
count = slip_esc(p, sl->xbuff, len);

+ /* ensure xleft set by the previous SLIP frame is zero for this frame
+ * otherwise slip_write_wakeup() can cause a recursive loop.
+ */
+ sl->xleft = 0;
+
/* Order of next two lines is *very* important.
* When we are sending a little amount of data,
* the transfer may be completed inside the ops->write()
--
1.8.1.5

2013-07-02 15:42:58

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH 5/5] SLIP: Fix transmission segmentation mechanism

Binding SLIP with a PTY/TTY can cause truncated SLIP frames to be
transmitted when the first write to the TTY fails to consume all
the characters of the SLIP frame.

Asynchronous to the write function is a wakeup event from the TTY
that indicates the TTY can accept more data. The wakeup event
calls tty_wakeup() which calls slip_write_wakeup() when
TTY_DO_WRITE_WAKEUP is set.

To complete the transmission of a SLIP frame to the TTY,
slip_write_wakeup() must run at least once. Unfortunately, pty_write()
also calls tty_wakeup() and when TTY_DO_WRITE_WAKEUP is set,
slip_write_wakeup() is called. xleft is always zero and
causes slip_write_wakeup() to complete the transmission and
clears the TTY_DO_WRITE_WAKEUP flag. This can cause a truncated
SLIP frame because any remaining characters will not get sent.

The wakeup event is unable to process the remaining characters
because the TTY_DO_WRITE_WAKEUP flag has been cleared.

The code modification fixes the transmission segmentation
mechanism by preventing pty_write() from calling
slip_write_wakeup() by clearing TTY_DO_WRITE_WAKEUP before calling
pty_write(). After pty_write() returns, TTY_DO_WRITE_WAKEUP is set
to allow the TTY wakeup event to call slip_write_wakeup() to
attempt to complete the transmission of the SLIP frame.

This may not be foolproof because a timeout is needed to
break out of the cycle of transmission attempts. Note error
codes from the TTY layer will break out of the cycle of
transmission attempts.

Signed-off-by: Dean Jenkins <[email protected]>
---
drivers/net/slip/slip.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index e2eff84..0ded23d 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -404,15 +404,13 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
*/
sl->xleft = 0;

- /* Order of next two lines is *very* important.
- * When we are sending a little amount of data,
- * the transfer may be completed inside the ops->write()
- * routine, because it's running with interrupts enabled.
- * In this case we *never* got WRITE_WAKEUP event,
- * if we did not request it before write operation.
- * 14 Oct 1994 Dmitry Gorodchanin.
+ /* ensure slip_write_wakeup() does not run due to write()
+ * or write_wakeup event and this prevents slip_write_wakeup()
+ * responding to an out of date xleft value.
*/
- set_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
+ clear_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
+
+ /* attempt to write the SLIP frame to the TTY buffer */
err = sl->tty->ops->write(sl->tty, sl->xbuff, count);

if (err < 0) {
@@ -432,6 +430,11 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
/* VSV */
clear_bit(SLF_OUTWAIT, &sl->flags); /* reset outfill flag */
#endif
+ /* xleft will be zero when all characters have been written.
+ * if xleft is positive then additional writes are needed.
+ * write_wakeup event is needed to complete the transmission.
+ */
+ set_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
}

/*
@@ -447,15 +450,18 @@ static void slip_write_wakeup(struct tty_struct *tty)
if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev))
return;

+ clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
+
if (sl->xleft <= 0) {
+ /* Whole SLIP frame has been written. */
/* Now serial buffer is almost free & we can start
* transmission of another packet */
sl->dev->stats.tx_packets++;
- clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
sl_unlock(sl);
return;
}

+ /* attempt to write the remaining SLIP frame characters */
err = tty->ops->write(tty, sl->xhead, sl->xleft);

if (err < 0) {
@@ -468,6 +474,11 @@ static void slip_write_wakeup(struct tty_struct *tty)

sl->xleft -= actual;
sl->xhead += actual;
+
+ /* allow the next tty wakeup event to attempt to complete
+ * the transmission
+ */
+ set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
}

static void sl_tx_timeout(struct net_device *dev)
--
1.8.1.5

2013-07-02 15:42:56

by Dean Jenkins

[permalink] [raw]
Subject: [PATCH 4/5] SLIP: Add error message for xleft non-zero

Add a printk to show when xleft is non-zero in sl_encaps.

The idea is to see whether a previous SLIP frame failed to be
fully transmitted.

Signed-off-by: Dean Jenkins <[email protected]>
---
drivers/net/slip/slip.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index f7303e0..e2eff84 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -395,6 +395,10 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
#endif
count = slip_esc(p, sl->xbuff, len);

+ if (sl->xleft)
+ printk(KERN_ERR "%s: ERROR: xleft is non-zero %d\n",
+ __func__, sl->xleft);
+
/* ensure xleft set by the previous SLIP frame is zero for this frame
* otherwise slip_write_wakeup() can cause a recursive loop.
*/
--
1.8.1.5

2013-07-24 22:18:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 2/5] SLIP: Handle error codes from the TTY layer

On Tue, Jul 02, 2013 at 04:31:31PM +0100, Dean Jenkins wrote:
> It appears that SLIP does not handle error codes from the TTY layer.
> This will result in a malfunction because the remaining length of
> data will be corrupted by the negative error code values from the TTY
> layer.
>
> Therefore, add error code checks in sl_encaps() and sl_encaps_wakeup()
> to prevent the corruption of the sent data length.
>
> Note that SLIP is connectionless so on TTY error indicate that all data
> was sent. It seems SLIP does not return error codes to the network
> layer.
>
> Signed-off-by: Dean Jenkins <[email protected]>

Acked-by: Greg Kroah-Hartman <[email protected]>

2013-07-24 22:18:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 5/5] SLIP: Fix transmission segmentation mechanism

On Tue, Jul 02, 2013 at 04:31:34PM +0100, Dean Jenkins wrote:
> Binding SLIP with a PTY/TTY can cause truncated SLIP frames to be
> transmitted when the first write to the TTY fails to consume all
> the characters of the SLIP frame.
>
> Asynchronous to the write function is a wakeup event from the TTY
> that indicates the TTY can accept more data. The wakeup event
> calls tty_wakeup() which calls slip_write_wakeup() when
> TTY_DO_WRITE_WAKEUP is set.
>
> To complete the transmission of a SLIP frame to the TTY,
> slip_write_wakeup() must run at least once. Unfortunately, pty_write()
> also calls tty_wakeup() and when TTY_DO_WRITE_WAKEUP is set,
> slip_write_wakeup() is called. xleft is always zero and
> causes slip_write_wakeup() to complete the transmission and
> clears the TTY_DO_WRITE_WAKEUP flag. This can cause a truncated
> SLIP frame because any remaining characters will not get sent.
>
> The wakeup event is unable to process the remaining characters
> because the TTY_DO_WRITE_WAKEUP flag has been cleared.
>
> The code modification fixes the transmission segmentation
> mechanism by preventing pty_write() from calling
> slip_write_wakeup() by clearing TTY_DO_WRITE_WAKEUP before calling
> pty_write(). After pty_write() returns, TTY_DO_WRITE_WAKEUP is set
> to allow the TTY wakeup event to call slip_write_wakeup() to
> attempt to complete the transmission of the SLIP frame.
>
> This may not be foolproof because a timeout is needed to
> break out of the cycle of transmission attempts. Note error
> codes from the TTY layer will break out of the cycle of
> transmission attempts.
>
> Signed-off-by: Dean Jenkins <[email protected]>

Acked-by: Greg Kroah-Hartman <[email protected]>

2013-07-24 22:18:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] SLIP: Add error message for xleft non-zero

On Tue, Jul 02, 2013 at 04:31:33PM +0100, Dean Jenkins wrote:
> Add a printk to show when xleft is non-zero in sl_encaps.
>
> The idea is to see whether a previous SLIP frame failed to be
> fully transmitted.
>
> Signed-off-by: Dean Jenkins <[email protected]>
> ---
> drivers/net/slip/slip.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
> index f7303e0..e2eff84 100644
> --- a/drivers/net/slip/slip.c
> +++ b/drivers/net/slip/slip.c
> @@ -395,6 +395,10 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
> #endif
> count = slip_esc(p, sl->xbuff, len);
>
> + if (sl->xleft)
> + printk(KERN_ERR "%s: ERROR: xleft is non-zero %d\n",
> + __func__, sl->xleft);

dev_err() perhaps?

thanks,

greg k-h

2013-07-24 22:41:28

by Paul Bolle

[permalink] [raw]
Subject: Re: [PATCH 4/5] SLIP: Add error message for xleft non-zero

On Wed, 2013-07-24 at 15:18 -0700, Greg Kroah-Hartman wrote:
> On Tue, Jul 02, 2013 at 04:31:33PM +0100, Dean Jenkins wrote:
> > Add a printk to show when xleft is non-zero in sl_encaps.
> >
> > The idea is to see whether a previous SLIP frame failed to be
> > fully transmitted.
> >
> > Signed-off-by: Dean Jenkins <[email protected]>
> > ---
> > drivers/net/slip/slip.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
> > index f7303e0..e2eff84 100644
> > --- a/drivers/net/slip/slip.c
> > +++ b/drivers/net/slip/slip.c
> > @@ -395,6 +395,10 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
> > #endif
> > count = slip_esc(p, sl->xbuff, len);
> >
> > + if (sl->xleft)
> > + printk(KERN_ERR "%s: ERROR: xleft is non-zero %d\n",
> > + __func__, sl->xleft);
>
> dev_err() perhaps?

After looking at the commit explanation and the patch itself I wonder
why this should be printed at error level. Especially since patch 3/5
will set sl->xleft to zero immediately after.

So, dev_dbg() perhaps?

And can't this be merged into 3/5?


Paul Bolle

2013-07-24 23:12:13

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 1/5] Bluetooth: Add RFCOMM TTY write return error codes

On 07/02/2013 11:31 AM, Dean Jenkins wrote:
> It appears that rfcomm_tty_write() does not check that the
> passed in TTY device_data is not NULL and also does not check
> that the RFCOMM DLC serial data link pointer is not NULL.
>
> A kernel crash was observed whilst SLIP was bound to /dev/rfcomm0
> but the /dev/rfcomm0 had subsequently disconnected. Unfortunately,
> SLIP attempted to write to the now non-existant RFCOMM TTY device
> which caused a NULL pointer dereference because the device_data
> no longer existed.
>
> Therefore, add NULL pointer checks for the dev and dlc pointers
> and output kernel error debug to show that NULL had been detected.

Dean,

Sorry, I didn't see these until just now.

> Signed-off-by: Dean Jenkins <[email protected]>
> ---
> net/bluetooth/rfcomm/tty.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/rfcomm/tty.c b/net/bluetooth/rfcomm/tty.c
> index b6e44ad..56d28d1 100644
> --- a/net/bluetooth/rfcomm/tty.c
> +++ b/net/bluetooth/rfcomm/tty.c
> @@ -761,12 +761,24 @@ static void rfcomm_tty_close(struct tty_struct *tty, struct file *filp)
> static int rfcomm_tty_write(struct tty_struct *tty, const unsigned char *buf, int count)
> {
> struct rfcomm_dev *dev = (struct rfcomm_dev *) tty->driver_data;
> - struct rfcomm_dlc *dlc = dev->dlc;
> + struct rfcomm_dlc *dlc;
> struct sk_buff *skb;
> int err = 0, sent = 0, size;
>
> BT_DBG("tty %p count %d", tty, count);
>
> + if (!dev) {
> + BT_ERR("RFCOMM TTY device data structure does not exist");
> + return -ENODEV;
> + }
> +
> + dlc = dev->dlc;
> +
> + if (!dlc) {
> + BT_ERR("RFCOMM serial data link does not exist");
> + return -ENOLINK;
> + }
> +

A number of bugs in rfcomm contributed to the crash you observed.
Several other reporters have also noted crashes stemming from
device disconnect. The rfcomm device must not disappear while a tty
is in-use.

Fixes for these are in-progress in a patch series by Gianluca Anzolin
on linux-bluetooth ML.

Also, for the future, kernel bluetooth patches go to:
Gustavo Padovan <[email protected]>
with cc's to:
Johan Hedberg <[email protected]>
Marcel Holtmann <[email protected]>
[email protected]

Regards,
Peter Hurley

2013-07-25 00:13:29

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 5/5] SLIP: Fix transmission segmentation mechanism

On 07/02/2013 11:31 AM, Dean Jenkins wrote:
> Binding SLIP with a PTY/TTY can cause truncated SLIP frames to be
> transmitted when the first write to the TTY fails to consume all
> the characters of the SLIP frame.
>
> Asynchronous to the write function is a wakeup event from the TTY
> that indicates the TTY can accept more data. The wakeup event
> calls tty_wakeup() which calls slip_write_wakeup() when
> TTY_DO_WRITE_WAKEUP is set.
>
> To complete the transmission of a SLIP frame to the TTY,
> slip_write_wakeup() must run at least once. Unfortunately, pty_write()
> also calls tty_wakeup() and when TTY_DO_WRITE_WAKEUP is set,
> slip_write_wakeup() is called. xleft is always zero and
> causes slip_write_wakeup() to complete the transmission and
> clears the TTY_DO_WRITE_WAKEUP flag. This can cause a truncated
> SLIP frame because any remaining characters will not get sent.
>
> The wakeup event is unable to process the remaining characters
> because the TTY_DO_WRITE_WAKEUP flag has been cleared.
>
> The code modification fixes the transmission segmentation
> mechanism by preventing pty_write() from calling
> slip_write_wakeup() by clearing TTY_DO_WRITE_WAKEUP before calling
> pty_write(). After pty_write() returns, TTY_DO_WRITE_WAKEUP is set
> to allow the TTY wakeup event to call slip_write_wakeup() to
> attempt to complete the transmission of the SLIP frame.
>
> This may not be foolproof because a timeout is needed to
> break out of the cycle of transmission attempts. Note error
> codes from the TTY layer will break out of the cycle of
> transmission attempts.
>
> Signed-off-by: Dean Jenkins <[email protected]>
> ---
> drivers/net/slip/slip.c | 29 ++++++++++++++++++++---------
> 1 file changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
> index e2eff84..0ded23d 100644
> --- a/drivers/net/slip/slip.c
> +++ b/drivers/net/slip/slip.c
> @@ -404,15 +404,13 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
> */
> sl->xleft = 0;
>
> - /* Order of next two lines is *very* important.
> - * When we are sending a little amount of data,
> - * the transfer may be completed inside the ops->write()
> - * routine, because it's running with interrupts enabled.
> - * In this case we *never* got WRITE_WAKEUP event,
> - * if we did not request it before write operation.
> - * 14 Oct 1994 Dmitry Gorodchanin.
> + /* ensure slip_write_wakeup() does not run due to write()
> + * or write_wakeup event and this prevents slip_write_wakeup()
> + * responding to an out of date xleft value.
> */
> - set_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
> + clear_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);
> +
> + /* attempt to write the SLIP frame to the TTY buffer */
> err = sl->tty->ops->write(sl->tty, sl->xbuff, count);
>
> if (err < 0) {
> @@ -432,6 +430,11 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
> /* VSV */
> clear_bit(SLF_OUTWAIT, &sl->flags); /* reset outfill flag */
> #endif
> + /* xleft will be zero when all characters have been written.
> + * if xleft is positive then additional writes are needed.
> + * write_wakeup event is needed to complete the transmission.
> + */
> + set_bit(TTY_DO_WRITE_WAKEUP, &sl->tty->flags);

What happens if the tty_wakeup() has already happened on another
CPU before TTY_DO_WRITE_WAKEUP was set?

> }
>
> /*
> @@ -447,15 +450,18 @@ static void slip_write_wakeup(struct tty_struct *tty)
> if (!sl || sl->magic != SLIP_MAGIC || !netif_running(sl->dev))
> return;
>
> + clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
> +
> if (sl->xleft <= 0) {
> + /* Whole SLIP frame has been written. */
> /* Now serial buffer is almost free & we can start
> * transmission of another packet */
> sl->dev->stats.tx_packets++;
> - clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
> sl_unlock(sl);
> return;
> }
>
> + /* attempt to write the remaining SLIP frame characters */
> err = tty->ops->write(tty, sl->xhead, sl->xleft);
>
> if (err < 0) {
> @@ -468,6 +474,11 @@ static void slip_write_wakeup(struct tty_struct *tty)
>
> sl->xleft -= actual;
> sl->xhead += actual;
> +
> + /* allow the next tty wakeup event to attempt to complete
> + * the transmission
> + */
> + set_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);

Same here.

> }
>
> static void sl_tx_timeout(struct net_device *dev)
>

2013-07-25 01:12:24

by Peter Hurley

[permalink] [raw]
Subject: Re: [PATCH 3/5] SLIP: Prevent recursion stack overflow and scheduler crash

On 07/02/2013 11:31 AM, Dean Jenkins wrote:
> This is an issue when SLIP is bound to a PTY/TTY. If
> TTY_DO_WRITE_WAKEUP is set, pty_write() calls tty_wakeup()
> then slip_write_wakeup() can be called. slip_write_wakeup() can
> be called by pty_write(). This is a recursion loop.
>
> pty_write() is called in sl_encaps().
> slip_write_wakeup() can be called by the TTY wakeup event.
>
> The pty_write() call in sl_encaps() will also call
> slip_write_wakeup() but xleft has not been updated and contains
> the value from the previous SLIP frame transmission. xleft is zero
> unless the previous SLIP frame failed to be fully transmitted in
> which case xleft has a positive value. A failed transmission causes
> the next SLIP frame pending transmission to cause a crash.
>
> In the failure case when xleft is positive in slip_write_wakeup(),
> recursion causes the stack to overflow and task structures located
> near the stack are corrupted by the stack overflow. The corrupted
> task structures crash the kernel's scheduler and the system
> crashes with exception handlers crashing and the emergency reboot
> fails.
>
> The recursion loop is:
> slip_write_wakeup()-->pty_write()-->tty_wakeup()-->slip_write_wakeup()
> etc.

pty_write() calling tty_wakeup() directly is a no-no.

This is fixed in tty-next.

Regards,
Peter Hurley

> Therefore ensure xleft is zero before writing the SLIP frame to the
> PTY/TTY layers. This prevents the xleft value of the previous SLIP
> frame from interfering with the slip_write_wakeup() execution when
> SLIP is bound to a PTY/TTY.
>
> Note the transmission segmentation mechanism is broken and only a
> single call to the write() function pointer will take place per
> SLIP frame. This could cause missing or truncated SLIP frames to
> be transmitted when the write() function fails to write the complete
> frame. In other words the TTY wakeup event does nothing because
> the TTY_DO_WRITE_WAKEUP flag has been cleared.
>
> Signed-off-by: Dean Jenkins <[email protected]>
> ---
> drivers/net/slip/slip.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
> index bed819f..f7303e0 100644
> --- a/drivers/net/slip/slip.c
> +++ b/drivers/net/slip/slip.c
> @@ -395,6 +395,11 @@ static void sl_encaps(struct slip *sl, unsigned char *icp, int len)
> #endif
> count = slip_esc(p, sl->xbuff, len);
>
> + /* ensure xleft set by the previous SLIP frame is zero for this frame
> + * otherwise slip_write_wakeup() can cause a recursive loop.
> + */
> + sl->xleft = 0;
> +
> /* Order of next two lines is *very* important.
> * When we are sending a little amount of data,
> * the transfer may be completed inside the ops->write()
>