2014-04-08 12:43:49

by Manfred Schlaegl

Subject: [PATCH] tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc

The race was introduced during the development of linux-3.11 by
commits e8437d7ecbc50198705331449367d401ebb3181f and
e9975fdec0138f1b2a85b9624e41660abd9865d4.
It was originally found and reproduced on linux-3.12.15 and
linux-3.12.15-rt25 by sending 500-byte blocks at 115 kbaud to the
target UART in a loop with a 100 millisecond delay.

In short:
1. The consumer flush_to_ldisc is about to remove the head tty_buffer.
2. The producer adds a number of bytes, so that a new tty_buffer must
be allocated and added by __tty_buffer_request_room.
3. The consumer removes the head tty_buffer element, without handling
newly committed data.

Detailed example:
 * Initial buffer:
   * Head, Tail -> 0: used=250; commit=250; read=240; next=NULL
 * Consumer: ''flush_to_ldisc''
   * consumed 10 bytes
   * buffer:
     * Head, Tail -> 0: used=250; commit=250; read=250; next=NULL
{{{
	count = head->commit - head->read;	// count = 0
	if (!count) {				// enter
		// INTERRUPTED BY PRODUCER ->
		if (head->next == NULL)
			break;
		buf->head = head->next;
		tty_buffer_free(port, head);
		continue;
	}
}}}
 * Producer: tty_insert_flip_... 10 bytes + tty_flip_buffer_push
   * buffer:
     * Head, Tail -> 0: used=250; commit=250; read=250; next=NULL
   * added 6 bytes: head element filled to its maximum.
     * buffer:
       * Head, Tail -> 0: used=256; commit=250; read=250; next=NULL
   * added 4 bytes: __tty_buffer_request_room is called
     * buffer:
       * Head -> 0: used=256; commit=256; read=250; next=1
       * Tail -> 1: used=4; commit=0; read=0; next=NULL
   * push (tty_flip_buffer_push)
     * buffer:
       * Head -> 0: used=256; commit=256; read=250; next=1
       * Tail -> 1: used=4; commit=4; read=0; next=NULL
 * Consumer
{{{
	count = head->commit - head->read;	// stale: still 0
	if (!count) {
		// INTERRUPTED BY PRODUCER <-
		if (head->next == NULL)		// -> no break
			break;
		buf->head = head->next;
		tty_buffer_free(port, head);
		// ERROR: tty_buffer head freed -> 6 bytes lost
		continue;
	}
}}}

This patch reintroduces a spin_lock to protect this case. Perhaps later
a lock-less solution could be found.

Signed-off-by: Manfred Schlaegl <[email protected]>
---
drivers/tty/tty_buffer.c | 16 ++++++++++++++--
include/linux/tty.h | 1 +
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index 8ebd9f8..f1d30f6 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -255,11 +255,16 @@ static int __tty_buffer_request_room(struct tty_port *port, size_t size,
 	if (change || left < size) {
 		/* This is the slow path - looking for new buffers to use */
 		if ((n = tty_buffer_alloc(port, size)) != NULL) {
+			unsigned long iflags;
+
 			n->flags = flags;
 			buf->tail = n;
+
+			spin_lock_irqsave(&buf->flush_lock, iflags);
 			b->commit = b->used;
-			smp_mb();
 			b->next = n;
+			spin_unlock_irqrestore(&buf->flush_lock, iflags);
+
 		} else if (change)
 			size = 0;
 		else
@@ -443,6 +448,7 @@ static void flush_to_ldisc(struct work_struct *work)
 	mutex_lock(&buf->lock);

 	while (1) {
+		unsigned long flags;
 		struct tty_buffer *head = buf->head;
 		int count;

@@ -450,14 +456,19 @@ static void flush_to_ldisc(struct work_struct *work)
 		if (atomic_read(&buf->priority))
 			break;

+		spin_lock_irqsave(&buf->flush_lock, flags);
 		count = head->commit - head->read;
 		if (!count) {
-			if (head->next == NULL)
+			if (head->next == NULL) {
+				spin_unlock_irqrestore(&buf->flush_lock, flags);
 				break;
+			}
 			buf->head = head->next;
+			spin_unlock_irqrestore(&buf->flush_lock, flags);
 			tty_buffer_free(port, head);
 			continue;
 		}
+		spin_unlock_irqrestore(&buf->flush_lock, flags);

 		count = receive_buf(tty, head, count);
 		if (!count)
@@ -512,6 +523,7 @@ void tty_buffer_init(struct tty_port *port)
 	struct tty_bufhead *buf = &port->buf;

 	mutex_init(&buf->lock);
+	spin_lock_init(&buf->flush_lock);
 	tty_buffer_reset(&buf->sentinel, 0);
 	buf->head = &buf->sentinel;
 	buf->tail = &buf->sentinel;
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 1c3316a..036cccd 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -61,6 +61,7 @@ struct tty_bufhead {
 	struct tty_buffer *head; /* Queue head */
 	struct work_struct work;
 	struct mutex lock;
+	spinlock_t flush_lock;
 	atomic_t priority;
 	struct tty_buffer sentinel;
 	struct llist_head free; /* Free queue head */


2014-04-18 08:12:38

by Manfred Schlaegl

Subject: Re: [PATCH] tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc

On 2014-04-08 14:42, Manfred Schlaegl wrote:
> The race was introduced during the development of linux-3.11 by
> commits e8437d7ecbc50198705331449367d401ebb3181f and
> e9975fdec0138f1b2a85b9624e41660abd9865d4.
> It was originally found and reproduced on linux-3.12.15 and
> linux-3.12.15-rt25 by sending 500-byte blocks at 115 kbaud to the
> target UART in a loop with a 100 millisecond delay.
>
> In short:
> 1. The consumer flush_to_ldisc is about to remove the head tty_buffer.
> 2. The producer adds a number of bytes, so that a new tty_buffer must
> be allocated and added by __tty_buffer_request_room.
> 3. The consumer removes the head tty_buffer element, without handling
> newly committed data.

Hi!

Reminder: the problem still exists in linux-3.11, 3.12, 3.13, 3.14 and pre-3.15 kernels.

The previously posted patch applies cleanly to pre-3.15 (torvalds tree) and 3.14(.1).

Manfred

2014-04-22 10:56:17

by Alan Cox

Subject: Re: [PATCH] tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc

On Fri, 18 Apr 2014 10:10:17 +0200
Manfred Schlaegl <[email protected]> wrote:

> On 2014-04-08 14:42, Manfred Schlaegl wrote:
> > The race was introduced during the development of linux-3.11 by
> > commits e8437d7ecbc50198705331449367d401ebb3181f and
> > e9975fdec0138f1b2a85b9624e41660abd9865d4.
> > It was originally found and reproduced on linux-3.12.15 and
> > linux-3.12.15-rt25 by sending 500-byte blocks at 115 kbaud to the
> > target UART in a loop with a 100 millisecond delay.
> >
> > In short:
> > 1. The consumer flush_to_ldisc is about to remove the head tty_buffer.
> > 2. The producer adds a number of bytes, so that a new tty_buffer must
> > be allocated and added by __tty_buffer_request_room.
> > 3. The consumer removes the head tty_buffer element, without handling
> > newly committed data.
>
> Hi!
>
> Reminder: the problem still exists in linux-3.11, 3.12, 3.13, 3.14 and pre-3.15 kernels.
>
> The previously posted patch applies cleanly to pre-3.15 (torvalds tree) and 3.14(.1).

Greg, what is happening with this one? We can't just ignore losing
bytes caused by incorrect lock removal. We are seeing breakages in things
like Bluetooth over serial which are quite probably down to this problem.

Alan

2014-04-22 11:16:33

by Greg Kroah-Hartman

Subject: Re: [PATCH] tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc

On Tue, Apr 22, 2014 at 11:55:52AM +0100, One Thousand Gnomes wrote:
> On Fri, 18 Apr 2014 10:10:17 +0200
> Manfred Schlaegl <[email protected]> wrote:
>
> > On 2014-04-08 14:42, Manfred Schlaegl wrote:
> > > The race was introduced during the development of linux-3.11 by
> > > commits e8437d7ecbc50198705331449367d401ebb3181f and
> > > e9975fdec0138f1b2a85b9624e41660abd9865d4.
> > > It was originally found and reproduced on linux-3.12.15 and
> > > linux-3.12.15-rt25 by sending 500-byte blocks at 115 kbaud to the
> > > target UART in a loop with a 100 millisecond delay.
> > >
> > > In short:
> > > 1. The consumer flush_to_ldisc is about to remove the head tty_buffer.
> > > 2. The producer adds a number of bytes, so that a new tty_buffer must
> > > be allocated and added by __tty_buffer_request_room.
> > > 3. The consumer removes the head tty_buffer element, without handling
> > > newly committed data.
> >
> > Hi!
> >
> > Reminder: the problem still exists in linux-3.11, 3.12, 3.13, 3.14 and pre-3.15 kernels.
> >
> > The previously posted patch applies cleanly to pre-3.15 (torvalds tree) and 3.14(.1).
>
> Greg, what is happening with this one? We can't just ignore losing
> bytes caused by incorrect lock removal. We are seeing breakages in things
> like Bluetooth over serial which are quite probably down to this problem.

It's in my queue, sorry, I've been traveling...