2013-03-11 20:45:46

by Peter Hurley

Subject: [PATCH v5 00/44] ldisc patchset

Greg,
This patchset already includes
'tty: Drop lock contention stat from ldsem trylocks'
so there is no need to apply that patch separately. Also, I noticed you
kept the 'tty is NULL' removal on a different branch, so I have left
in this series my patch that removes it.

This series applies cleanly to tty-next.

v5 changes:

After completing an audit of the recursive use of ldisc
references, I discovered that the _blocking_ recursive acquisition
of ldisc references was limited to line disciplines misusing
the tty_perform_flush() function.
With that now resolved in
'tty: Fix recursive deadlock in tty_perform_flush()',
the recursion support in ldsem has been removed.

The recursion removal is in its own patch,
'tty: Remove ldsem recursion support'
to ease review for those that have already reviewed the
ldsem implementation.

In addition, this patchset implements lock stealing derived
from the work of Michel Lespinasse <[email protected]> on
writer lock stealing in rwsem.

Although the rwsem write lock stealing changes were motivated
by performance, the equivalent changes here are motivated by a
reduced line count and a simpler design.
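A minimal user-space sketch of what write lock stealing means, using C11 atomics. The `steal_sem` type, its count encoding (0 free, -1 write-locked, >0 readers) and both trylock functions are hypothetical illustrations, not the ldsem or rwsem code; ldsem uses its own bias constants. The point is only the contrast between strict handoff, where a newly arriving writer defers to writers already queued, and stealing, where any writer that finds the lock free simply takes it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding for illustration only:
 * 0 = free, -1 = write-locked, >0 = number of active readers. */
struct steal_sem {
	atomic_long count;
	atomic_int  queued_writers;	/* writers asleep on the wait list */
};

static void steal_sem_init(struct steal_sem *s)
{
	atomic_init(&s->count, 0L);
	atomic_init(&s->queued_writers, 0);
}

/* Strict handoff: a newly arriving writer must not jump the queue. */
static bool write_trylock_handoff(struct steal_sem *s)
{
	long expected = 0;

	if (atomic_load(&s->queued_writers) > 0)
		return false;
	return atomic_compare_exchange_strong(&s->count, &expected, -1L);
}

/* Lock stealing: any writer that finds the lock free takes it, even if
 * other writers are queued; woken waiters simply retry and may lose. */
static bool write_trylock_steal(struct steal_sem *s)
{
	long expected = 0;

	return atomic_compare_exchange_strong(&s->count, &expected, -1L);
}

/* Release only: ownership is not transferred to any particular waiter. */
static void write_unlock(struct steal_sem *s)
{
	atomic_store(&s->count, 0L);
}
```

Because the unlock path in this sketch only releases the count, no handoff bookkeeping is needed, which is one plausible source of the simplification mentioned above.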


*** Edited below to remove recursion discussion ***

Back in early December I realized that a classic read/write semaphore
with writer priority was the ideal mechanism for handling the
line discipline referencing.

Line discipline references act as "readers"; closing or changing the
line discipline is prevented while these references are outstanding.
Conversely, line discipline references should not be granted while
the line discipline is closing or changing; these are the "writers".

Unfortunately, the existing rwsem grants the lock to waiting threads
in FIFO order (no writer priority) and does not support timed waits.

So this implements just that: a writer-priority
read/write semaphore with timed waits.
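A minimal user-space sketch, assuming POSIX threads, of the two properties described above: readers defer not only to an active writer but also to any *waiting* writer, and both sides can give up after a timeout. The names (`wp_rwsem`, `wp_down_read()`, `wp_down_write()`, etc.) are hypothetical; this is not the ldsem implementation in drivers/tty/tty_ldsem.c, which uses its own atomic count and wait queues rather than a mutex/condvar pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct wp_rwsem {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int  readers;		/* active readers ("ldisc references") */
	int  waiting_writers;	/* writers queued for the lock */
	bool writer;		/* a writer ("ldisc close/change") holds it */
};

static void wp_init(struct wp_rwsem *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	s->readers = 0;
	s->waiting_writers = 0;
	s->writer = false;
}

/* Convert a relative timeout in milliseconds to an absolute deadline. */
static struct timespec deadline_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Writer priority: a new reader also waits while any writer is queued. */
static int wp_down_read(struct wp_rwsem *s, long timeout_ms)
{
	struct timespec ts = deadline_ms(timeout_ms);

	pthread_mutex_lock(&s->lock);
	while (s->writer || s->waiting_writers) {
		if (pthread_cond_timedwait(&s->cond, &s->lock, &ts) == ETIMEDOUT) {
			pthread_mutex_unlock(&s->lock);
			return ETIMEDOUT;
		}
	}
	s->readers++;
	pthread_mutex_unlock(&s->lock);
	return 0;
}

static int wp_down_write(struct wp_rwsem *s, long timeout_ms)
{
	struct timespec ts = deadline_ms(timeout_ms);
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	s->waiting_writers++;		/* blocks new readers from now on */
	while (s->writer || s->readers) {
		if (pthread_cond_timedwait(&s->cond, &s->lock, &ts) == ETIMEDOUT) {
			ret = ETIMEDOUT;
			break;
		}
	}
	s->waiting_writers--;
	if (ret == 0)
		s->writer = true;
	else
		pthread_cond_broadcast(&s->cond);	/* let blocked readers retry */
	pthread_mutex_unlock(&s->lock);
	return ret;
}

static void wp_up_read(struct wp_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->readers == 0)
		pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}

static void wp_up_write(struct wp_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	s->writer = false;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
}
```

This also illustrates why blocking recursive read acquisition is hazardous with writer priority: a thread holding one reference that asks for a second blocks behind a queued writer, while that writer waits for the first reference to be released.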

Initially, I intended to split this into 2 patchsets, but
since the v2 and v3 patchsets were pushed back, this ends up
being an all-in-one :)

Other changes in v4 from v3:

- From Jiri's review of v3:

'tty: Add diagnostic for halted line discipline' was split:
the function relocation is now in a separate patch,
'tty: Relocate tty_ldisc_halt() to avoid forward declaration'

'n_tty: Factor packet mode status change for reuse':
packet_mode_flush() was renamed n_tty_packet_mode_flush()

'tty: Refactor wait for ldisc refs out of tty_ldisc_hangup()':
the parentheses were removed in the return value conversion.

'tty: Strengthen no-subsequent-use guarantee of tty_ldisc_halt()':
although addressed in the follow-on patch, the timeout values
are now identical so as to avoid confusion in reviewing.

'tty: Kick waiters _after_ the ldisc is locked' was dropped:
although my logic is solid, this change made the conversion
to a r/w semaphore awkward-looking. I think it makes sense to
fix this separately in n_tty anyway.

'tty: Remove unnecessary buffer work flush' was merged with
new patch 'tty: Complete ownership transfer of flip buffers'

Patches 19-31 implement the switch to ldsem.

Patch 32 removes the 'tty is NULL' diagnostic. The logic supporting
this change is in the commit message but I'll repeat it here:

Now that the driver i/o path is separate from tty lifetimes
(implemented in Jiri's last patch series, soon to be in 3.9-rc1),
a driver may unknowingly submit i/o to a tty that no longer exists.
There is little sense in WARNing about an expected outcome.

Patch 14/32 'tty: Complete ownership transfer of flip buffers' ensures
that the superfluous work does no harm (beyond having been submitted
for no good reason in the first place): it waits for any work that
may have retrieved what will soon be a stale tty value, and it cancels
outstanding work before the port is destroyed (the work is owned by
the port and contained within its structure).

As before, this series passes the stress tests that Ilya wrote plus some
new ones that I have written.


Peter Hurley (44):
tty: Add diagnostic for halted line discipline
n_tty: Factor packet mode status change for reuse
n_tty: Don't flush buffer when closing ldisc
tty: Refactor wait for ldisc refs out of tty_ldisc_hangup()
tty: Remove unnecessary re-test of ldisc ref count
tty: Fix ldisc halt sequence on hangup
tty: Relocate tty_ldisc_halt() to avoid forward declaration
tty: Strengthen no-subsequent-use guarantee of tty_ldisc_halt()
tty: Halt both ldiscs concurrently
tty: Wait for SAK work before waiting for hangup work
n_tty: Correct unthrottle-with-buffer-flush comments
n_tty: Fully initialize ldisc before restarting buffer work
tty: Don't reenable already enabled ldisc
tty: Complete ownership transfer of flip buffers
tty: Make core responsible for synchronizing its work
tty: Fix 'deferred reopen' ldisc comment
tty: Bracket ldisc release with TTY_DEBUG_HANGUP messages
tty: Add ldisc hangup debug messages
tty: Don't protect atomic operation with mutex
tty: Separate release semantics of ldisc reference
tty: Document unsafe ldisc reference acquire
tty: Fold one-line assign function into callers
tty: Locate get/put ldisc functions together
tty: Remove redundant tty_wait_until_sent()
tty: Fix recursive deadlock in tty_perform_flush()
tty: Add read-recursive, writer-prioritized rw semaphore
tty: Drop lock contention stat from ldsem trylocks
tty: Remove ldsem recursion support
tty: Add lock/unlock ldisc pair functions
tty: Replace ldisc locking with ldisc_sem
tty: Clarify ldisc variable
tty: Fix hangup race with TIOCSETD ioctl
tty: Clarify multiple-references comment in TIOCSETD ioctl
tty: Fix tty_ldisc_lock name collision
tty: Drop "tty is NULL" flip buffer diagnostic
tty: Inline ldsem down_failed() into down_{read,write}_failed()
tty: Drop ldsem wait type
tty: Drop wake type optimization
tty: Factor ldsem writer trylock
tty: Simplify lock taking for waiting writers
tty: Implement ldsem fast path write lock stealing
tty: Reduce and simplify ldsem atomic ops
tty: Early-out ldsem write lock stealing
tty: Early-out tardy ldsem readers

drivers/net/ppp/ppp_async.c | 2 +-
drivers/net/ppp/ppp_synctty.c | 2 +-
drivers/tty/Makefile | 2 +-
drivers/tty/n_tty.c | 62 +++--
drivers/tty/tty_buffer.c | 4 +-
drivers/tty/tty_io.c | 33 ++-
drivers/tty/tty_ioctl.c | 28 ++-
drivers/tty/tty_ldisc.c | 551 +++++++++++++++---------------------------
drivers/tty/tty_ldsem.c | 450 ++++++++++++++++++++++++++++++++++
drivers/tty/tty_port.c | 1 +
include/linux/tty.h | 7 +-
include/linux/tty_ldisc.h | 49 +++-
12 files changed, 779 insertions(+), 412 deletions(-)
create mode 100644 drivers/tty/tty_ldsem.c

--
1.8.1.2


2013-03-11 20:45:48

by Peter Hurley

Subject: [PATCH v5 01/44] tty: Add diagnostic for halted line discipline

Flip buffer work must not be scheduled by the line discipline
after the line discipline has been halted; issue a warning if it is.

Note: drivers can still schedule flip buffer work.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/n_tty.c | 8 ++++++++
drivers/tty/tty_ldisc.c | 7 ++++++-
include/linux/tty.h | 1 +
3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 05e72be..9c18e37 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -153,6 +153,12 @@ static void n_tty_set_room(struct tty_struct *tty)
if (left && !old_left) {
WARN_RATELIMIT(tty->port->itty == NULL,
"scheduling with invalid itty\n");
+ /* see if ldisc has been killed - if so, this means that
+ * even though the ldisc has been halted and ->buf.work
+ * cancelled, ->buf.work is about to be rescheduled
+ */
+ WARN_RATELIMIT(test_bit(TTY_LDISC_HALTED, &tty->flags),
+ "scheduling buffer work for halted ldisc\n");
schedule_work(&tty->port->buf.work);
}
}
@@ -1645,6 +1651,8 @@ static int n_tty_open(struct tty_struct *tty)
goto err_free_bufs;

tty->disc_data = ldata;
+ /* indicate buffer work may resume */
+ clear_bit(TTY_LDISC_HALTED, &tty->flags);
reset_buffer_flags(tty);
tty_unthrottle(tty);
ldata->column = 0;
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index d794087..c641321 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -375,6 +375,7 @@ static inline void tty_ldisc_put(struct tty_ldisc *ld)

void tty_ldisc_enable(struct tty_struct *tty)
{
+ clear_bit(TTY_LDISC_HALTED, &tty->flags);
set_bit(TTY_LDISC, &tty->flags);
clear_bit(TTY_LDISC_CHANGING, &tty->flags);
wake_up(&tty_ldisc_wait);
@@ -513,8 +514,11 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)

static int tty_ldisc_halt(struct tty_struct *tty)
{
+ int scheduled;
clear_bit(TTY_LDISC, &tty->flags);
- return cancel_work_sync(&tty->port->buf.work);
+ scheduled = cancel_work_sync(&tty->port->buf.work);
+ set_bit(TTY_LDISC_HALTED, &tty->flags);
+ return scheduled;
}

/**
@@ -820,6 +824,7 @@ void tty_ldisc_hangup(struct tty_struct *tty)
clear_bit(TTY_LDISC, &tty->flags);
tty_unlock(tty);
cancel_work_sync(&tty->port->buf.work);
+ set_bit(TTY_LDISC_HALTED, &tty->flags);
mutex_unlock(&tty->ldisc_mutex);
retry:
tty_lock(tty);
diff --git a/include/linux/tty.h b/include/linux/tty.h
index c75d886..7aa4be6 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -315,6 +315,7 @@ struct tty_file_private {
#define TTY_NO_WRITE_SPLIT 17 /* Preserve write boundaries to driver */
#define TTY_HUPPED 18 /* Post driver->hangup() */
#define TTY_HUPPING 21 /* ->hangup() in progress */
+#define TTY_LDISC_HALTED 22 /* Line discipline is halted */

#define TTY_WRITE_FLUSH(tty) tty_write_flush((tty))

--
1.8.1.2
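A small user-space model of the TTY_LDISC_HALTED protocol in the patch above: the halted bit is set only after buffer work is cancelled, cleared again when the ldisc is enabled or opened, and checked (warn, then schedule anyway) where n_tty schedules buffer work. `struct model_tty`, the `M_*` bit numbers and all `model_*` functions are hypothetical stand-ins; the `work_queued` flag merely imitates cancel_work_sync() reporting whether work was pending.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bit numbers for this model only. */
enum { M_LDISC = 0, M_LDISC_HALTED = 1 };

struct model_tty {
	unsigned long flags;
	bool work_queued;	/* stands in for port->buf.work being queued */
	int warnings;		/* count of "halted ldisc" warnings issued */
};

static bool model_test(const struct model_tty *t, int bit)
{
	return (t->flags >> bit) & 1UL;
}

static void model_set(struct model_tty *t, int bit)   { t->flags |= 1UL << bit; }
static void model_clear(struct model_tty *t, int bit) { t->flags &= ~(1UL << bit); }

/* Analogue of the n_tty_set_room() check: warn, but still schedule. */
static void model_schedule_work(struct model_tty *t)
{
	if (model_test(t, M_LDISC_HALTED)) {
		t->warnings++;
		fprintf(stderr, "scheduling buffer work for halted ldisc\n");
	}
	t->work_queued = true;
}

/* Analogue of tty_ldisc_halt(): HALTED is set only after cancellation. */
static bool model_halt(struct model_tty *t)
{
	bool scheduled;

	model_clear(t, M_LDISC);
	scheduled = t->work_queued;	/* cancel_work_sync() stand-in */
	t->work_queued = false;
	model_set(t, M_LDISC_HALTED);
	return scheduled;
}

/* Analogue of tty_ldisc_enable() / n_tty_open(): work may resume. */
static void model_enable(struct model_tty *t)
{
	model_clear(t, M_LDISC_HALTED);
	model_set(t, M_LDISC);
}
```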

2013-03-11 20:45:54

by Peter Hurley

Subject: [PATCH v5 02/44] n_tty: Factor packet mode status change for reuse

Factor the packet mode status change out of n_tty_flush_buffer()
for use by a follow-on patch.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/n_tty.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 9c18e37..4d3ab2c 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -238,6 +238,18 @@ static void reset_buffer_flags(struct tty_struct *tty)
n_tty_set_room(tty);
}

+static void n_tty_packet_mode_flush(struct tty_struct *tty)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&tty->ctrl_lock, flags);
+ if (tty->link->packet) {
+ tty->ctrl_status |= TIOCPKT_FLUSHREAD;
+ wake_up_interruptible(&tty->link->read_wait);
+ }
+ spin_unlock_irqrestore(&tty->ctrl_lock, flags);
+}
+
/**
* n_tty_flush_buffer - clean input queue
* @tty: terminal device
@@ -252,19 +264,11 @@ static void reset_buffer_flags(struct tty_struct *tty)

static void n_tty_flush_buffer(struct tty_struct *tty)
{
- unsigned long flags;
/* clear everything and unthrottle the driver */
reset_buffer_flags(tty);

- if (!tty->link)
- return;
-
- spin_lock_irqsave(&tty->ctrl_lock, flags);
- if (tty->link->packet) {
- tty->ctrl_status |= TIOCPKT_FLUSHREAD;
- wake_up_interruptible(&tty->link->read_wait);
- }
- spin_unlock_irqrestore(&tty->ctrl_lock, flags);
+ if (tty->link)
+ n_tty_packet_mode_flush(tty);
}

/**
--
1.8.1.2

2013-03-11 20:46:03

by Peter Hurley

Subject: [PATCH v5 05/44] tty: Remove unnecessary re-test of ldisc ref count

Since tty_set_ldisc() is prevented from changing tty->ldisc
while a tty is being hung up, re-testing the ldisc user count is
unnecessary: the ldisc cannot have been replaced and its user count
cannot have increased (assuming the caller meets the precondition
that the TTY_LDISC flag is cleared).

Removal of the 'early-out' locking optimization is necessary for
the subsequent patch 'tty: Fix ldisc halt sequence on hangup'.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index c5b848a..fa0170e 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -558,29 +558,29 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
* have been halted for this to guarantee it remains idle.
*
* Caller must hold legacy and ->ldisc_mutex.
+ *
+ * NB: tty_set_ldisc() is prevented from changing the ldisc concurrently
+ * with this function by checking the TTY_HUPPING flag.
*/
static bool tty_ldisc_hangup_wait_idle(struct tty_struct *tty)
{
- while (tty->ldisc) { /* Not yet closed */
- if (atomic_read(&tty->ldisc->users) != 1) {
- char cur_n[TASK_COMM_LEN], tty_n[64];
- long timeout = 3 * HZ;
- tty_unlock(tty);
-
- while (tty_ldisc_wait_idle(tty, timeout) == -EBUSY) {
- timeout = MAX_SCHEDULE_TIMEOUT;
- printk_ratelimited(KERN_WARNING
- "%s: waiting (%s) for %s took too long, but we keep waiting...\n",
- __func__, get_task_comm(cur_n, current),
- tty_name(tty, tty_n));
- }
- /* must reacquire both locks and preserve lock order */
- mutex_unlock(&tty->ldisc_mutex);
- tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);
- continue;
+ char cur_n[TASK_COMM_LEN], tty_n[64];
+ long timeout = 3 * HZ;
+
+ if (tty->ldisc) { /* Not yet closed */
+ tty_unlock(tty);
+
+ while (tty_ldisc_wait_idle(tty, timeout) == -EBUSY) {
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ printk_ratelimited(KERN_WARNING
+ "%s: waiting (%s) for %s took too long, but we keep waiting...\n",
+ __func__, get_task_comm(cur_n, current),
+ tty_name(tty, tty_n));
}
- break;
+ /* must reacquire both locks and preserve lock order */
+ mutex_unlock(&tty->ldisc_mutex);
+ tty_lock(tty);
+ mutex_lock(&tty->ldisc_mutex);
}
return !!tty->ldisc;
}
--
1.8.1.2

2013-03-11 20:46:14

by Peter Hurley

Subject: [PATCH v5 08/44] tty: Strengthen no-subsequent-use guarantee of tty_ldisc_halt()

In preparation for destroying and freeing the tty, the line discipline
must first be brought to an inactive state before it can be destroyed.
This line discipline shutdown must:
- disallow new users of the ldisc
- wait for existing ldisc users to finish
- only then, cancel/flush their pending/running work

Move the tty_ldisc_wait_idle() calls out of tty_set_ldisc() and
tty_ldisc_kill() and into tty_ldisc_halt() to ensure this shutdown order.

Failure to provide this guarantee can result in scheduled work
running after the tty has already been freed, as indicated in the
following log message:

[ 88.331234] WARNING: at drivers/tty/tty_buffer.c:435 flush_to_ldisc+0x194/0x1d0()
[ 88.334505] Hardware name: Bochs
[ 88.335618] tty is bad=-1
[ 88.335703] Modules linked in: netconsole configfs bnep rfcomm bluetooth ......
[ 88.345272] Pid: 39, comm: kworker/1:1 Tainted: G W 3.7.0-next-20121129+ttydebug-xeon #20121129+ttydebug
[ 88.347736] Call Trace:
[ 88.349024] [<ffffffff81058aff>] warn_slowpath_common+0x7f/0xc0
[ 88.350383] [<ffffffff81058bf6>] warn_slowpath_fmt+0x46/0x50
[ 88.351745] [<ffffffff81432bd4>] flush_to_ldisc+0x194/0x1d0
[ 88.353047] [<ffffffff816f7fe1>] ? _raw_spin_unlock_irq+0x21/0x50
[ 88.354190] [<ffffffff8108a809>] ? finish_task_switch+0x49/0xe0
[ 88.355436] [<ffffffff81077ad1>] process_one_work+0x121/0x490
[ 88.357674] [<ffffffff81432a40>] ? __tty_buffer_flush+0x90/0x90
[ 88.358954] [<ffffffff81078c84>] worker_thread+0x164/0x3e0
[ 88.360247] [<ffffffff81078b20>] ? manage_workers+0x120/0x120
[ 88.361282] [<ffffffff8107e230>] kthread+0xc0/0xd0
[ 88.362284] [<ffffffff816f0000>] ? cmos_do_probe+0x2eb/0x3bf
[ 88.363391] [<ffffffff8107e170>] ? flush_kthread_worker+0xb0/0xb0
[ 88.364797] [<ffffffff816fff6c>] ret_from_fork+0x7c/0xb0
[ 88.366087] [<ffffffff8107e170>] ? flush_kthread_worker+0xb0/0xb0
[ 88.367266] ---[ end trace 453a7c9f38fbfec0 ]---

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 42 ++++++++++++++++++++++++------------------
1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index f691c76..525ee53 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -530,24 +530,38 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
/**
* tty_ldisc_halt - shut down the line discipline
* @tty: tty device
+ * @pending: returns true if work was scheduled when cancelled
+ * (can be set to NULL)
+ * @timeout: # of jiffies to wait for ldisc refs to be released
*
* Shut down the line discipline and work queue for this tty device.
* The TTY_LDISC flag being cleared ensures no further references can
* be obtained while the delayed work queue halt ensures that no more
* data is fed to the ldisc.
*
+ * Furthermore, guarantee that existing ldisc references have been
+ * released, which in turn, guarantees that no future buffer work
+ * can be rescheduled.
+ *
* You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex)
* in order to make sure any currently executing ldisc work is also
* flushed.
*/

-static int tty_ldisc_halt(struct tty_struct *tty)
+static int tty_ldisc_halt(struct tty_struct *tty, int *pending, long timeout)
{
- int scheduled;
+ int scheduled, retval;
+
clear_bit(TTY_LDISC, &tty->flags);
+ retval = tty_ldisc_wait_idle(tty, timeout);
+ if (retval)
+ return retval;
+
scheduled = cancel_work_sync(&tty->port->buf.work);
set_bit(TTY_LDISC_HALTED, &tty->flags);
- return scheduled;
+ if (pending)
+ *pending = scheduled;
+ return 0;
}

/**
@@ -688,9 +702,9 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
* parallel to the change and re-referencing the tty.
*/

- work = tty_ldisc_halt(tty);
- if (o_tty)
- o_work = tty_ldisc_halt(o_tty);
+ retval = tty_ldisc_halt(tty, &work, 5 * HZ);
+ if (!retval && o_tty)
+ retval = tty_ldisc_halt(o_tty, &o_work, 5 * HZ);

/*
* Wait for ->hangup_work and ->buf.work handlers to terminate.
@@ -701,8 +715,6 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)

tty_ldisc_flush_works(tty);

- retval = tty_ldisc_wait_idle(tty, 5 * HZ);
-
tty_lock(tty);
mutex_lock(&tty->ldisc_mutex);

@@ -921,11 +933,6 @@ int tty_ldisc_setup(struct tty_struct *tty, struct tty_struct *o_tty)

static void tty_ldisc_kill(struct tty_struct *tty)
{
- /* There cannot be users from userspace now. But there still might be
- * drivers holding a reference via tty_ldisc_ref. Do not steal them the
- * ldisc until they are done. */
- tty_ldisc_wait_idle(tty, MAX_SCHEDULE_TIMEOUT);
-
mutex_lock(&tty->ldisc_mutex);
/*
* Now kill off the ldisc
@@ -958,13 +965,12 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
* race with the set_ldisc code path.
*/

- tty_ldisc_halt(tty);
- if (o_tty)
- tty_ldisc_halt(o_tty);
-
+ tty_ldisc_halt(tty, NULL, MAX_SCHEDULE_TIMEOUT);
tty_ldisc_flush_works(tty);
- if (o_tty)
+ if (o_tty) {
+ tty_ldisc_halt(o_tty, NULL, MAX_SCHEDULE_TIMEOUT);
tty_ldisc_flush_works(o_tty);
+ }

tty_lock_pair(tty, o_tty);
/* This will need doing differently if we need to lock */
--
1.8.1.2
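The three-step shutdown order in the patch above can be modeled in a few lines. This is a single-threaded sketch with hypothetical names (`struct model_ldisc`, `model_wait_idle()`, `model_ldisc_halt()`); the polling loop with a 1 ms tick stands in for the real tty_ldisc_wait_idle(), which waits (with a timeout in jiffies) for the reference count to drop. As in the patch, a timeout returns -EBUSY before any work is cancelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct model_ldisc {
	bool accepting;		/* TTY_LDISC analogue: new refs allowed */
	int users;		/* outstanding ldisc references */
	bool work_pending;	/* buf.work analogue */
	bool halted;		/* TTY_LDISC_HALTED analogue */
};

/* Polling stand-in for tty_ldisc_wait_idle(): -EBUSY on timeout. */
static int model_wait_idle(struct model_ldisc *ld, long timeout_ms)
{
	const struct timespec tick = { 0, 1000000L };	/* 1 ms */

	while (ld->users > 0) {
		if (timeout_ms-- <= 0)
			return -EBUSY;
		nanosleep(&tick, NULL);
	}
	return 0;
}

static int model_ldisc_halt(struct model_ldisc *ld, int *pending, long timeout_ms)
{
	int retval;

	ld->accepting = false;				/* 1. no new users */
	retval = model_wait_idle(ld, timeout_ms);	/* 2. drain existing users */
	if (retval)
		return retval;				/* work left untouched */

	if (pending)					/* 3. only then cancel work */
		*pending = ld->work_pending;
	ld->work_pending = false;
	ld->halted = true;
	return 0;
}
```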

2013-03-11 20:46:08

by Peter Hurley

Subject: [PATCH v5 07/44] tty: Relocate tty_ldisc_halt() to avoid forward declaration

tty_ldisc_halt() will use the file-scoped function, tty_ldisc_wait_idle(),
in the following patch.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 46 +++++++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 15667c0..f691c76 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -499,29 +499,6 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
}

/**
- * tty_ldisc_halt - shut down the line discipline
- * @tty: tty device
- *
- * Shut down the line discipline and work queue for this tty device.
- * The TTY_LDISC flag being cleared ensures no further references can
- * be obtained while the delayed work queue halt ensures that no more
- * data is fed to the ldisc.
- *
- * You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex)
- * in order to make sure any currently executing ldisc work is also
- * flushed.
- */
-
-static int tty_ldisc_halt(struct tty_struct *tty)
-{
- int scheduled;
- clear_bit(TTY_LDISC, &tty->flags);
- scheduled = cancel_work_sync(&tty->port->buf.work);
- set_bit(TTY_LDISC_HALTED, &tty->flags);
- return scheduled;
-}
-
-/**
* tty_ldisc_flush_works - flush all works of a tty
* @tty: tty device to flush works for
*
@@ -551,6 +528,29 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
}

/**
+ * tty_ldisc_halt - shut down the line discipline
+ * @tty: tty device
+ *
+ * Shut down the line discipline and work queue for this tty device.
+ * The TTY_LDISC flag being cleared ensures no further references can
+ * be obtained while the delayed work queue halt ensures that no more
+ * data is fed to the ldisc.
+ *
+ * You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex)
+ * in order to make sure any currently executing ldisc work is also
+ * flushed.
+ */
+
+static int tty_ldisc_halt(struct tty_struct *tty)
+{
+ int scheduled;
+ clear_bit(TTY_LDISC, &tty->flags);
+ scheduled = cancel_work_sync(&tty->port->buf.work);
+ set_bit(TTY_LDISC_HALTED, &tty->flags);
+ return scheduled;
+}
+
+/**
* tty_ldisc_hangup_halt - halt the line discipline for hangup
* @tty: tty being hung up
*
--
1.8.1.2

2013-03-11 20:46:20

by Peter Hurley

Subject: [PATCH v5 09/44] tty: Halt both ldiscs concurrently

The pty driver does not obtain an ldisc reference to the linked
tty when writing. When the ldiscs are halted sequentially, it
is possible for one ldisc to be halted and then, before the second
ldisc can be halted, for a concurrent write to schedule buffer work on
the first ldisc. This can lead to a use-after-free error when
the scheduled buffer work starts on the closed ldisc.

Prevent use after halt by performing each stage of the halt
on both ttys before moving on to the next stage.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 525ee53..7712091 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -530,14 +530,17 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
/**
* tty_ldisc_halt - shut down the line discipline
* @tty: tty device
+ * @o_tty: paired pty device (can be NULL)
* @pending: returns true if work was scheduled when cancelled
* (can be set to NULL)
+ * @o_pending: returns true if work was scheduled when cancelled
+ * (can be set to NULL)
* @timeout: # of jiffies to wait for ldisc refs to be released
*
- * Shut down the line discipline and work queue for this tty device.
- * The TTY_LDISC flag being cleared ensures no further references can
- * be obtained while the delayed work queue halt ensures that no more
- * data is fed to the ldisc.
+ * Shut down the line discipline and work queue for this tty device and
+ * its paired pty (if exists). Clearing the TTY_LDISC flag ensures
+ * no further references can be obtained while the work queue halt
+ * ensures that no more data is fed to the ldisc.
*
* Furthermore, guarantee that existing ldisc references have been
* released, which in turn, guarantees that no future buffer work
@@ -548,19 +551,32 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
* flushed.
*/

-static int tty_ldisc_halt(struct tty_struct *tty, int *pending, long timeout)
+static int tty_ldisc_halt(struct tty_struct *tty, struct tty_struct *o_tty,
+ int *pending, int *o_pending, long timeout)
{
- int scheduled, retval;
+ int scheduled, o_scheduled, retval;

clear_bit(TTY_LDISC, &tty->flags);
+ if (o_tty)
+ clear_bit(TTY_LDISC, &o_tty->flags);
+
retval = tty_ldisc_wait_idle(tty, timeout);
+ if (!retval && o_tty)
+ retval = tty_ldisc_wait_idle(o_tty, timeout);
if (retval)
return retval;

scheduled = cancel_work_sync(&tty->port->buf.work);
set_bit(TTY_LDISC_HALTED, &tty->flags);
+ if (o_tty) {
+ o_scheduled = cancel_work_sync(&o_tty->port->buf.work);
+ set_bit(TTY_LDISC_HALTED, &o_tty->flags);
+ }
+
if (pending)
*pending = scheduled;
+ if (o_tty && o_pending)
+ *o_pending = o_scheduled;
return 0;
}

@@ -702,9 +718,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
* parallel to the change and re-referencing the tty.
*/

- retval = tty_ldisc_halt(tty, &work, 5 * HZ);
- if (!retval && o_tty)
- retval = tty_ldisc_halt(o_tty, &o_work, 5 * HZ);
+ retval = tty_ldisc_halt(tty, o_tty, &work, &o_work, 5 * HZ);

/*
* Wait for ->hangup_work and ->buf.work handlers to terminate.
@@ -965,12 +979,10 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
* race with the set_ldisc code path.
*/

- tty_ldisc_halt(tty, NULL, MAX_SCHEDULE_TIMEOUT);
+ tty_ldisc_halt(tty, o_tty, NULL, NULL, MAX_SCHEDULE_TIMEOUT);
tty_ldisc_flush_works(tty);
- if (o_tty) {
- tty_ldisc_halt(o_tty, NULL, MAX_SCHEDULE_TIMEOUT);
+ if (o_tty)
tty_ldisc_flush_works(o_tty);
- }

tty_lock_pair(tty, o_tty);
/* This will need doing differently if we need to lock */
--
1.8.1.2
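A single-threaded model of the race described in the patch above and of the fix. Following the commit message, a write on one pty end takes a reference only on the writer's own ldisc but queues buffer work on its peer. `struct model_tty`, `model_pty_write()` and `model_halt_pair()` are hypothetical; refusing the write with -EAGAIN is a simplification of a reference acquisition that cannot succeed while TTY_LDISC is clear, and stage 2 only checks the count instead of waiting.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_tty {
	bool accepting;		/* TTY_LDISC analogue: new refs allowed */
	int users;		/* outstanding ldisc references */
	bool work_pending;	/* buf.work analogue */
	bool halted;		/* TTY_LDISC_HALTED analogue */
	bool late_work;		/* work queued after halt: the bug */
};

/* A write on 'from' needs a reference on 'from' only, yet queues
 * buffer work on its pty peer 'to'. */
static int model_pty_write(struct model_tty *from, struct model_tty *to)
{
	if (!from->accepting)
		return -EAGAIN;
	to->work_pending = true;
	if (to->halted)
		to->late_work = true;
	return 0;
}

/* Each stage runs on both ttys before the next stage begins. */
static int model_halt_pair(struct model_tty *tty, struct model_tty *o_tty)
{
	struct model_tty *t[2] = { tty, o_tty };
	int n = o_tty ? 2 : 1;
	int i;

	for (i = 0; i < n; i++)		/* stage 1: no new references */
		t[i]->accepting = false;
	for (i = 0; i < n; i++)		/* stage 2: references drained */
		if (t[i]->users)
			return -EBUSY;
	for (i = 0; i < n; i++) {	/* stage 3: cancel work, mark halted */
		t[i]->work_pending = false;
		t[i]->halted = true;
	}
	return 0;
}
```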

2013-03-11 20:45:59

by Peter Hurley

Subject: [PATCH v5 03/44] n_tty: Don't flush buffer when closing ldisc

A buffer flush is both undesirable and unnecessary when the ldisc
is closing. A buffer flush performs the following:
1. resets ldisc data fields to their initial state
2. resets tty->receive_room to indicate more data can be sent
3. schedules buffer work to receive more data
4. signals a buffer flush has happened to linked pty in packet mode

Since the ldisc has been halted and the tty may soon be destroyed,
buffer work must not be scheduled, as that work might access
an invalid tty and ldisc state. Scheduling it would be pointless
anyway, since the ldisc read buffer is about to be freed.

Resetting the ldisc data fields is pointless as well since that
structure is about to be freed.

Resetting tty->receive_room is unnecessary, as it will be properly
reset if a new ldisc is reopened. Besides, resetting the original
receive_room value would be wrong since the read buffer will be
gone.

Since the packet mode flush is observable from userspace, this
behavior has been preserved.

The test jig originally authored by Ilya Zykov <[email protected]> and
signed off by him is included below. The test jig triggers the
following warnings, which this patch fixes.

[ 38.051111] ------------[ cut here ]------------
[ 38.052113] WARNING: at drivers/tty/n_tty.c:160 n_tty_set_room.part.6+0x8b/0xa0()
[ 38.053916] Hardware name: Bochs
[ 38.054819] Modules linked in: netconsole configfs bnep rfcomm bluetooth parport_pc ppdev snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq psmouse snd_timer serio_raw mac_hid snd_seq_device
snd microcode lp parport virtio_balloon soundcore i2c_piix4 snd_page_alloc floppy 8139too 8139cp
[ 38.059704] Pid: 1564, comm: pty_kill Tainted: G W 3.7.0-next-20121130+ttydebug-xeon #20121130+ttydebug
[ 38.061578] Call Trace:
[ 38.062491] [<ffffffff81058b4f>] warn_slowpath_common+0x7f/0xc0
[ 38.063448] [<ffffffff81058baa>] warn_slowpath_null+0x1a/0x20
[ 38.064439] [<ffffffff8142dc2b>] n_tty_set_room.part.6+0x8b/0xa0
[ 38.065381] [<ffffffff8142dc82>] n_tty_set_room+0x42/0x80
[ 38.066323] [<ffffffff8142e6f2>] reset_buffer_flags+0x102/0x160
[ 38.077508] [<ffffffff8142e76d>] n_tty_flush_buffer+0x1d/0x90
[ 38.078782] [<ffffffff81046569>] ? default_spin_lock_flags+0x9/0x10
[ 38.079734] [<ffffffff8142e804>] n_tty_close+0x24/0x60
[ 38.080730] [<ffffffff81431b61>] tty_ldisc_close.isra.2+0x41/0x60
[ 38.081680] [<ffffffff81431bbb>] tty_ldisc_kill+0x3b/0x80
[ 38.082618] [<ffffffff81432a07>] tty_ldisc_release+0x77/0xe0
[ 38.083549] [<ffffffff8142b781>] tty_release+0x451/0x4d0
[ 38.084525] [<ffffffff811950be>] __fput+0xae/0x230
[ 38.085472] [<ffffffff8119524e>] ____fput+0xe/0x10
[ 38.086401] [<ffffffff8107aa88>] task_work_run+0xc8/0xf0
[ 38.087334] [<ffffffff8105ea56>] do_exit+0x196/0x4b0
[ 38.088304] [<ffffffff8106c77b>] ? __dequeue_signal+0x6b/0xb0
[ 38.089240] [<ffffffff8105ef34>] do_group_exit+0x44/0xa0
[ 38.090182] [<ffffffff8106f43d>] get_signal_to_deliver+0x20d/0x4e0
[ 38.091125] [<ffffffff81016979>] do_signal+0x29/0x130
[ 38.092096] [<ffffffff81431a9e>] ? tty_ldisc_deref+0xe/0x10
[ 38.093030] [<ffffffff8142a317>] ? tty_write+0xb7/0xf0
[ 38.093976] [<ffffffff81193f53>] ? vfs_write+0xb3/0x180
[ 38.094904] [<ffffffff81016b20>] do_notify_resume+0x80/0xc0
[ 38.095830] [<ffffffff81700492>] int_signal+0x12/0x17
[ 38.096788] ---[ end trace 5f6f7a9651cd999b ]---

[ 2730.570602] ------------[ cut here ]------------
[ 2730.572130] WARNING: at drivers/tty/n_tty.c:160 n_tty_set_room+0x107/0x140()
[ 2730.574904] scheduling buffer work for halted ldisc
[ 2730.578303] Pid: 9691, comm: trinity-child15 Tainted: G W 3.7.0-rc8-next-20121205-sasha-00023-g59f0d85 #207
[ 2730.588694] Call Trace:
[ 2730.590486] [<ffffffff81c41d77>] ? n_tty_set_room+0x107/0x140
[ 2730.592559] [<ffffffff8110c827>] warn_slowpath_common+0x87/0xb0
[ 2730.595317] [<ffffffff8110c8b1>] warn_slowpath_fmt+0x41/0x50
[ 2730.599186] [<ffffffff81c41d77>] n_tty_set_room+0x107/0x140
[ 2730.603141] [<ffffffff81c42fe7>] reset_buffer_flags+0x137/0x150
[ 2730.607166] [<ffffffff81c43018>] n_tty_flush_buffer+0x18/0x90
[ 2730.610123] [<ffffffff81c430af>] n_tty_close+0x1f/0x60
[ 2730.612068] [<ffffffff81c461f2>] tty_ldisc_close.isra.4+0x52/0x60
[ 2730.614078] [<ffffffff81c462ab>] tty_ldisc_reinit+0x3b/0x70
[ 2730.615891] [<ffffffff81c46db2>] tty_ldisc_hangup+0x102/0x1e0
[ 2730.617780] [<ffffffff81c3e537>] __tty_hangup+0x137/0x440
[ 2730.619547] [<ffffffff81c3e869>] tty_vhangup+0x9/0x10
[ 2730.621266] [<ffffffff81c48f1c>] pty_close+0x14c/0x160
[ 2730.622952] [<ffffffff81c3fd45>] tty_release+0xd5/0x490
[ 2730.624674] [<ffffffff8127fbe2>] __fput+0x122/0x250
[ 2730.626195] [<ffffffff8127fd19>] ____fput+0x9/0x10
[ 2730.627758] [<ffffffff81134602>] task_work_run+0xb2/0xf0
[ 2730.629491] [<ffffffff811139ad>] do_exit+0x36d/0x580
[ 2730.631159] [<ffffffff81113c8a>] do_group_exit+0x8a/0xc0
[ 2730.632819] [<ffffffff81127351>] get_signal_to_deliver+0x501/0x5b0
[ 2730.634758] [<ffffffff8106de34>] do_signal+0x24/0x100
[ 2730.636412] [<ffffffff81204865>] ? user_exit+0xa5/0xd0
[ 2730.638078] [<ffffffff81183cd8>] ? trace_hardirqs_on_caller+0x118/0x140
[ 2730.640279] [<ffffffff81183d0d>] ? trace_hardirqs_on+0xd/0x10
[ 2730.642164] [<ffffffff8106df78>] do_notify_resume+0x48/0xa0
[ 2730.643966] [<ffffffff83cdff6a>] int_signal+0x12/0x17
[ 2730.645672] ---[ end trace a40d53149c07fce0 ]---

/*
 * pty_thrash.c
 *
 * Based on original test jig by Ilya Zykov <[email protected]>
 *
 * Signed-off-by: Peter Hurley <[email protected]>
 * Signed-off-by: Ilya Zykov <[email protected]>
 */

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int fd = -1;

static void error_exit(char *f, ...)
{
	va_list va;
	int saved_errno = errno;	/* vprintf() may clobber errno */

	va_start(va, f);
	vprintf(f, va);
	printf(": %s\n", strerror(saved_errno));
	va_end(va);

	if (fd >= 0)
		close(fd);

	exit(EXIT_FAILURE);
}

int main(void)
{
	pid_t child_id;
	char pts_name[24];
	int ptn, unlock;

	while (1) {

		fd = open("/dev/ptmx", O_RDWR);
		if (fd < 0)
			error_exit("opening pty master");
		unlock = 0;
		if (ioctl(fd, TIOCSPTLCK, &unlock) < 0)
			error_exit("unlocking pty pair");
		if (ioctl(fd, TIOCGPTN, &ptn) < 0)
			error_exit("getting pty #");
		snprintf(pts_name, sizeof(pts_name), "/dev/pts/%d", ptn);

		child_id = fork();
		if (child_id == -1)
			error_exit("forking child");

		if (child_id) {	/* Parent */
			pid_t id;
			int err, status;
			char buf[128];
			int n;

			n = read(fd, buf, sizeof(buf));
			if (n < 0)
				error_exit("master reading");
			if (n > 0)	/* print without the trailing newline */
				printf("%.*s\n", n - 1, buf);

			close(fd);
			fd = -1;

			err = kill(child_id, SIGKILL);
			if (err < 0)
				error_exit("killing child");
			id = waitpid(child_id, &status, 0);
			if (id < 0 || id != child_id)
				error_exit("waiting for child");

		} else {	/* Child */

			close(fd);
			printf("Test cycle on slave pty %s\n", pts_name);
			fd = open(pts_name, O_RDWR);
			if (fd < 0)
				error_exit("opening pty slave");

			while (1) {
				char pattern[] = "test\n";
				if (write(fd, pattern, strlen(pattern)) < 0)
					error_exit("slave writing");
			}

		}
	}

	/* never gets here */
	return 0;
}

Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/n_tty.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 4d3ab2c..0b85693 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1617,7 +1617,9 @@ static void n_tty_close(struct tty_struct *tty)
{
struct n_tty_data *ldata = tty->disc_data;

- n_tty_flush_buffer(tty);
+ if (tty->link)
+ n_tty_packet_mode_flush(tty);
+
kfree(ldata->read_buf);
kfree(ldata->echo_buf);
kfree(ldata);
--
1.8.1.2

2013-03-11 20:48:07

by Peter Hurley

Subject: [PATCH v5 06/44] tty: Fix ldisc halt sequence on hangup

Flip buffer work cannot be cancelled until all outstanding ldisc
references have been released. Convert the ldisc ref wait into
a full ldisc halt with buffer work cancellation.

Note that the legacy mutex is not held while cancelling.
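
A minimal single-threaded sketch of why the order matters. It assumes, hypothetically, that a reference holder may queue buffer work just before dropping its reference; struct halt_model and its helpers are illustrative names, not kernel APIs. Cancelling before the holders have finished (the previous hangup order) can leave work queued again:

```c
#include <stdbool.h>

/* Hypothetical single-threaded model; none of these names are kernel APIs. */
struct halt_model {
	bool ldisc_enabled;	/* stands in for the TTY_LDISC bit */
	bool halted;		/* stands in for the TTY_LDISC_HALTED bit */
	int refs;		/* outstanding ldisc references */
	bool work_pending;	/* stands in for port->buf.work being queued */
};

/* A new reference is only granted while the ldisc is enabled. */
static bool model_ldisc_ref(struct halt_model *m)
{
	if (!m->ldisc_enabled)
		return false;
	m->refs++;
	return true;
}

/* Assumption: a holder may queue buffer work just before dropping its ref. */
static void model_ldisc_deref(struct halt_model *m)
{
	m->work_pending = true;
	m->refs--;
}

/* Order used by this patch: block new refs, drain old ones, then cancel. */
static bool halt_then_cancel(struct halt_model *m)
{
	m->ldisc_enabled = false;
	while (m->refs > 0)		/* stands in for waiting on each holder */
		model_ldisc_deref(m);
	m->work_pending = false;	/* cancel_work_sync() equivalent */
	m->halted = true;
	return m->work_pending;		/* true would mean work is still queued */
}

/* Previous hangup order: cancel and mark halted before the holders finish. */
static bool cancel_then_halt(struct halt_model *m)
{
	m->ldisc_enabled = false;
	m->work_pending = false;
	m->halted = true;
	while (m->refs > 0)
		model_ldisc_deref(m);
	return m->work_pending;
}
```

In the kernel the drain step is a timed sleep in tty_ldisc_wait_idle() rather than a loop that releases the references itself; the model only captures the ordering.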

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index fa0170e..15667c0 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -551,22 +551,30 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
}

/**
- * tty_ldisc_hangup_wait_idle - wait for the ldisc to become idle
- * @tty: tty to wait for
- *
- * Wait for the line discipline to become idle. The discipline must
- * have been halted for this to guarantee it remains idle.
+ * tty_ldisc_hangup_halt - halt the line discipline for hangup
+ * @tty: tty being hung up
*
+ * Shut down the line discipline and work queue for the tty device
+ * being hungup. Clear the TTY_LDISC flag to ensure no further
+ * references can be obtained, wait for remaining references to be
+ * released, and cancel pending buffer work to ensure no more
+ * data is fed to this ldisc.
* Caller must hold legacy and ->ldisc_mutex.
*
* NB: tty_set_ldisc() is prevented from changing the ldisc concurrently
* with this function by checking the TTY_HUPPING flag.
+ *
+ * NB: if tty->ldisc is NULL then buffer work does not need to be
+ * cancelled because it must already have done as a precondition
+ * of closing the ldisc and setting tty->ldisc to NULL
*/
-static bool tty_ldisc_hangup_wait_idle(struct tty_struct *tty)
+static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
{
char cur_n[TASK_COMM_LEN], tty_n[64];
long timeout = 3 * HZ;

+ clear_bit(TTY_LDISC, &tty->flags);
+
if (tty->ldisc) { /* Not yet closed */
tty_unlock(tty);

@@ -577,6 +585,10 @@ static bool tty_ldisc_hangup_wait_idle(struct tty_struct *tty)
__func__, get_task_comm(cur_n, current),
tty_name(tty, tty_n));
}
+
+ cancel_work_sync(&tty->port->buf.work);
+ set_bit(TTY_LDISC_HALTED, &tty->flags);
+
/* must reacquire both locks and preserve lock order */
mutex_unlock(&tty->ldisc_mutex);
tty_lock(tty);
@@ -851,24 +863,11 @@ void tty_ldisc_hangup(struct tty_struct *tty)
*/
mutex_lock(&tty->ldisc_mutex);

- /*
- * this is like tty_ldisc_halt, but we need to give up
- * the BTM before calling cancel_work_sync, which may
- * need to wait for another function taking the BTM
- */
- clear_bit(TTY_LDISC, &tty->flags);
- tty_unlock(tty);
- cancel_work_sync(&tty->port->buf.work);
- set_bit(TTY_LDISC_HALTED, &tty->flags);
- mutex_unlock(&tty->ldisc_mutex);
- tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);
-
/* At this point we have a closed ldisc and we want to
reopen it. We could defer this to the next open but
it means auditing a lot of other paths so this is
a FIXME */
- if (tty_ldisc_hangup_wait_idle(tty)) {
+ if (tty_ldisc_hangup_halt(tty)) {
if (reset == 0) {

if (!tty_ldisc_reinit(tty, tty->termios.c_line))
--
1.8.1.2

2013-03-11 20:45:57

by Peter Hurley

Subject: [PATCH v5 04/44] tty: Refactor wait for ldisc refs out of tty_ldisc_hangup()

Refactor tty_ldisc_hangup() to extract a standalone function,
tty_ldisc_hangup_wait_idle(), that waits for ldisc references
to be released.
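
For readers less familiar with the wait loop being moved, here is a rough user-space analogue of its shape (a bounded first wait, then a warning, then waiting indefinitely), built on POSIX condition variables. struct ldisc_model, its helpers, and the millisecond timeouts are assumptions standing in for the ldisc use count, tty_ldisc_wait_idle() and HZ-based jiffies:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical user-space stand-in for an ldisc reference count. */
struct ldisc_model {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* broadcast whenever a reference is dropped */
	int users;		/* 1 == only the tty's own reference is left */
};

/* Drop a reader reference and wake any waiter. */
static void model_deref(struct ldisc_model *ld)
{
	pthread_mutex_lock(&ld->lock);
	ld->users--;
	pthread_cond_broadcast(&ld->idle);
	pthread_mutex_unlock(&ld->lock);
}

/* Wait up to timeout_ms for users to reach 1; returns 0 or ETIMEDOUT. */
static int model_wait_idle(struct ldisc_model *ld, long timeout_ms)
{
	struct timespec deadline;
	int ret;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&ld->lock);
	while (ld->users != 1) {
		if (pthread_cond_timedwait(&ld->idle, &ld->lock, &deadline) == ETIMEDOUT)
			break;
	}
	ret = (ld->users == 1) ? 0 : ETIMEDOUT;
	pthread_mutex_unlock(&ld->lock);
	return ret;
}

/* Roughly the shape of tty_ldisc_hangup_wait_idle(): bounded first wait,
 * then warn and keep waiting. Returns how many warnings were printed. */
static int model_hangup_wait_idle(struct ldisc_model *ld)
{
	long timeout_ms = 3000;		/* stands in for 3 * HZ */
	int warnings = 0;

	while (model_wait_idle(ld, timeout_ms) == ETIMEDOUT) {
		timeout_ms = 60000;	/* no MAX_SCHEDULE_TIMEOUT here; re-arm */
		warnings++;
		fprintf(stderr, "waiting for ldisc took too long, but we keep waiting...\n");
	}
	return warnings;
}
```

The kernel version also drops and retakes the tty lock and ->ldisc_mutex around the wait and re-checks tty->ldisc afterwards; the sketch leaves that surrounding locking out.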

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 54 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index c641321..c5b848a 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -551,6 +551,41 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
}

/**
+ * tty_ldisc_hangup_wait_idle - wait for the ldisc to become idle
+ * @tty: tty to wait for
+ *
+ * Wait for the line discipline to become idle. The discipline must
+ * have been halted for this to guarantee it remains idle.
+ *
+ * Caller must hold legacy and ->ldisc_mutex.
+ */
+static bool tty_ldisc_hangup_wait_idle(struct tty_struct *tty)
+{
+ while (tty->ldisc) { /* Not yet closed */
+ if (atomic_read(&tty->ldisc->users) != 1) {
+ char cur_n[TASK_COMM_LEN], tty_n[64];
+ long timeout = 3 * HZ;
+ tty_unlock(tty);
+
+ while (tty_ldisc_wait_idle(tty, timeout) == -EBUSY) {
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ printk_ratelimited(KERN_WARNING
+ "%s: waiting (%s) for %s took too long, but we keep waiting...\n",
+ __func__, get_task_comm(cur_n, current),
+ tty_name(tty, tty_n));
+ }
+ /* must reacquire both locks and preserve lock order */
+ mutex_unlock(&tty->ldisc_mutex);
+ tty_lock(tty);
+ mutex_lock(&tty->ldisc_mutex);
+ continue;
+ }
+ break;
+ }
+ return !!tty->ldisc;
+}
+
+/**
* tty_set_ldisc - set line discipline
* @tty: the terminal to set
* @ldisc: the line discipline
@@ -826,7 +861,6 @@ void tty_ldisc_hangup(struct tty_struct *tty)
cancel_work_sync(&tty->port->buf.work);
set_bit(TTY_LDISC_HALTED, &tty->flags);
mutex_unlock(&tty->ldisc_mutex);
-retry:
tty_lock(tty);
mutex_lock(&tty->ldisc_mutex);

@@ -834,23 +868,7 @@ retry:
reopen it. We could defer this to the next open but
it means auditing a lot of other paths so this is
a FIXME */
- if (tty->ldisc) { /* Not yet closed */
- if (atomic_read(&tty->ldisc->users) != 1) {
- char cur_n[TASK_COMM_LEN], tty_n[64];
- long timeout = 3 * HZ;
- tty_unlock(tty);
-
- while (tty_ldisc_wait_idle(tty, timeout) == -EBUSY) {
- timeout = MAX_SCHEDULE_TIMEOUT;
- printk_ratelimited(KERN_WARNING
- "%s: waiting (%s) for %s took too long, but we keep waiting...\n",
- __func__, get_task_comm(cur_n, current),
- tty_name(tty, tty_n));
- }
- mutex_unlock(&tty->ldisc_mutex);
- goto retry;
- }
-
+ if (tty_ldisc_hangup_wait_idle(tty)) {
if (reset == 0) {

if (!tty_ldisc_reinit(tty, tty->termios.c_line))
--
1.8.1.2

2013-03-11 21:15:12

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 10/44] tty: Wait for SAK work before waiting for hangup work

SAK work may schedule hangup work (if TTY_SOFT_SAK is defined), so
SAK work must be flushed before hangup work.
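
A single-threaded sketch of the ordering problem, with run-if-pending standing in for flush_work(); struct works_model and its helpers are hypothetical. Flushing hangup work first lets the SAK handler queue a fresh hangup afterwards:

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of the two work items. */
struct works_model {
	bool sak_pending;
	bool hangup_pending;
	bool soft_sak;		/* models building with TTY_SOFT_SAK defined */
};

/* With TTY_SOFT_SAK, the SAK path ends in a hangup, which queues hangup work. */
static void run_sak(struct works_model *w)
{
	w->sak_pending = false;
	if (w->soft_sak)
		w->hangup_pending = true;
}

static void run_hangup(struct works_model *w)
{
	w->hangup_pending = false;
}

/* Single-threaded stand-ins for flush_work(): run the item if queued. */
static void flush_sak(struct works_model *w)
{
	if (w->sak_pending)
		run_sak(w);
}

static void flush_hangup(struct works_model *w)
{
	if (w->hangup_pending)
		run_hangup(w);
}

/* Both return true if hangup work is still queued after flushing. */
static bool flush_old_order(struct works_model *w)
{
	flush_hangup(w);
	flush_sak(w);
	return w->hangup_pending;
}

static bool flush_new_order(struct works_model *w)
{
	flush_sak(w);
	flush_hangup(w);
	return w->hangup_pending;
}
```

Without TTY_SOFT_SAK both orders leave nothing queued, which is why the original order only misbehaves in that configuration.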

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 7712091..37671fcc 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -506,8 +506,8 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
*/
static void tty_ldisc_flush_works(struct tty_struct *tty)
{
- flush_work(&tty->hangup_work);
flush_work(&tty->SAK_work);
+ flush_work(&tty->hangup_work);
flush_work(&tty->port->buf.work);
}

--
1.8.1.2

2013-03-11 21:15:42

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 37/44] tty: Drop ldsem wait type

Now that lock failures are handled independently for reads and
writes, the wait flags are no longer necessary.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index d849fb85..ddfbdfe 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -76,9 +76,6 @@
struct ldsem_waiter {
struct list_head list;
struct task_struct *task;
- unsigned int flags;
-#define LDSEM_READ_WAIT 0x00000001
-#define LDSEM_WRITE_WAIT 0x00000002
};

/* Wake types for __ldsem_wake(). Note: RWSEM_WAKE_NO_CHECK implies
@@ -226,19 +223,13 @@ static struct ld_semaphore __sched *
down_read_failed(struct ld_semaphore *sem, long timeout)
{
struct ldsem_waiter waiter;
- long flags = LDSEM_READ_WAIT;
long adjust = -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS;

/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
-
- if (flags & LDSEM_READ_WAIT)
- list_add_tail(&waiter.list, &sem->read_wait);
- else
- list_add_tail(&waiter.list, &sem->write_wait);
+ list_add_tail(&waiter.list, &sem->read_wait);

waiter.task = current;
- waiter.flags = flags;
get_task_struct(current);

/* change the lock attempt to a wait --
@@ -287,19 +278,13 @@ static struct ld_semaphore __sched *
down_write_failed(struct ld_semaphore *sem, long timeout)
{
struct ldsem_waiter waiter;
- long flags = LDSEM_WRITE_WAIT;
long adjust = -LDSEM_ACTIVE_BIAS;

/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
-
- if (flags & LDSEM_READ_WAIT)
- list_add_tail(&waiter.list, &sem->read_wait);
- else
- list_add_tail(&waiter.list, &sem->write_wait);
+ list_add_tail(&waiter.list, &sem->write_wait);

waiter.task = current;
- waiter.flags = flags;
get_task_struct(current);

/* change the lock attempt to a wait --
--
1.8.1.2

2013-03-11 21:15:52

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 24/44] tty: Remove redundant tty_wait_until_sent()

tty_ioctl() already waits until sent.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 9 ---------
1 file changed, 9 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 4e46c17..1afe192 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -625,15 +625,6 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
return 0;
}

- tty_unlock(tty);
- /*
- * Problem: What do we do if this blocks ?
- * We could deadlock here
- */
-
- tty_wait_until_sent(tty, 0);
-
- tty_lock(tty);
mutex_lock(&tty->ldisc_mutex);

/*
--
1.8.1.2

2013-03-11 21:15:55

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 14/44] tty: Complete ownership transfer of flip buffers

Once the line discipline is halted, waiting for buffer work to
complete is not required to safely change the line discipline.
The buffer work routine, flush_to_ldisc(), will be unable to
acquire an ldisc ref, and all existing references have already
been released (so it cannot already hold one).

Ensure that any running buffer work which may reference the
soon-to-be-gone tty completes, and that any buffer work running
after this point retrieves a NULL tty.

Also, ensure all buffer work is cancelled on port destruction.
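
A hypothetical sketch of the two guards buffer work relies on after this change, loosely modelled on the checks at the top of flush_to_ldisc(): a NULL port->itty means the tty has been released, and a halted ldisc refuses new references. The model types are illustrative only:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models; not the kernel's tty_struct / tty_port. */
struct tty_model {
	bool ldisc_enabled;	/* models the TTY_LDISC bit */
	int ldisc_refs;
	int received;		/* bytes delivered to the ldisc */
};

struct port_model {
	struct tty_model *itty;	/* cleared on release, as in release_tty() */
	int pending_bytes;
};

/* Stand-in for the buffer work routine; returns bytes delivered. */
static int flush_to_ldisc_model(struct port_model *port)
{
	struct tty_model *tty = port->itty;
	int n;

	if (!tty)			/* tty already released: nothing to feed */
		return 0;
	if (!tty->ldisc_enabled)	/* halted ldisc: no new reference */
		return 0;

	tty->ldisc_refs++;		/* ldisc ref held while feeding data */
	n = port->pending_bytes;
	tty->received += n;
	port->pending_bytes = 0;
	tty->ldisc_refs--;
	return n;
}
```

Data left pending while the ldisc is halted simply stays in the port until the work is rescheduled, which is why tty_set_ldisc() now restarts the work unconditionally.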

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_io.c | 1 +
drivers/tty/tty_ldisc.c | 47 ++++++++++++-----------------------------------
drivers/tty/tty_port.c | 1 +
3 files changed, 14 insertions(+), 35 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 1ee318a..3613d8b 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1549,6 +1549,7 @@ static void release_tty(struct tty_struct *tty, int idx)
tty_free_termios(tty);
tty_driver_remove_tty(tty->driver, tty);
tty->port->itty = NULL;
+ cancel_work_sync(&tty->port->buf.work);

if (tty->link)
tty_kref_put(tty->link);
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 9c727da..cbb945b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -508,7 +508,6 @@ static void tty_ldisc_flush_works(struct tty_struct *tty)
{
flush_work(&tty->SAK_work);
flush_work(&tty->hangup_work);
- flush_work(&tty->port->buf.work);
}

/**
@@ -531,20 +530,12 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
* tty_ldisc_halt - shut down the line discipline
* @tty: tty device
* @o_tty: paired pty device (can be NULL)
- * @pending: returns true if work was scheduled when cancelled
- * (can be set to NULL)
- * @o_pending: returns true if work was scheduled when cancelled
- * (can be set to NULL)
* @timeout: # of jiffies to wait for ldisc refs to be released
*
* Shut down the line discipline and work queue for this tty device and
* its paired pty (if exists). Clearing the TTY_LDISC flag ensures
- * no further references can be obtained while the work queue halt
- * ensures that no more data is fed to the ldisc.
- *
- * Furthermore, guarantee that existing ldisc references have been
- * released, which in turn, guarantees that no future buffer work
- * can be rescheduled.
+ * no further references can be obtained, while waiting for existing
+ * references to be released ensures no more data is fed to the ldisc.
*
* You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex)
* in order to make sure any currently executing ldisc work is also
@@ -552,9 +543,9 @@ static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
*/

static int tty_ldisc_halt(struct tty_struct *tty, struct tty_struct *o_tty,
- int *pending, int *o_pending, long timeout)
+ long timeout)
{
- int scheduled, o_scheduled, retval;
+ int retval;

clear_bit(TTY_LDISC, &tty->flags);
if (o_tty)
@@ -566,17 +557,10 @@ static int tty_ldisc_halt(struct tty_struct *tty, struct tty_struct *o_tty,
if (retval)
return retval;

- scheduled = cancel_work_sync(&tty->port->buf.work);
set_bit(TTY_LDISC_HALTED, &tty->flags);
- if (o_tty) {
- o_scheduled = cancel_work_sync(&o_tty->port->buf.work);
+ if (o_tty)
set_bit(TTY_LDISC_HALTED, &o_tty->flags);
- }

- if (pending)
- *pending = scheduled;
- if (o_tty && o_pending)
- *o_pending = o_scheduled;
return 0;
}

@@ -586,17 +570,12 @@ static int tty_ldisc_halt(struct tty_struct *tty, struct tty_struct *o_tty,
*
* Shut down the line discipline and work queue for the tty device
* being hungup. Clear the TTY_LDISC flag to ensure no further
- * references can be obtained, wait for remaining references to be
- * released, and cancel pending buffer work to ensure no more
- * data is fed to this ldisc.
+ * references can be obtained and wait for remaining references to be
+ * released to ensure no more data is fed to this ldisc.
* Caller must hold legacy and ->ldisc_mutex.
*
* NB: tty_set_ldisc() is prevented from changing the ldisc concurrently
* with this function by checking the TTY_HUPPING flag.
- *
- * NB: if tty->ldisc is NULL then buffer work does not need to be
- * cancelled because it must already have done as a precondition
- * of closing the ldisc and setting tty->ldisc to NULL
*/
static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
{
@@ -616,7 +595,6 @@ static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
tty_name(tty, tty_n));
}

- cancel_work_sync(&tty->port->buf.work);
set_bit(TTY_LDISC_HALTED, &tty->flags);

/* must reacquire both locks and preserve lock order */
@@ -644,7 +622,6 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
{
int retval;
struct tty_ldisc *o_ldisc, *new_ldisc;
- int work, o_work = 0;
struct tty_struct *o_tty;

new_ldisc = tty_ldisc_get(ldisc);
@@ -718,7 +695,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
* parallel to the change and re-referencing the tty.
*/

- retval = tty_ldisc_halt(tty, o_tty, &work, &o_work, 5 * HZ);
+ retval = tty_ldisc_halt(tty, o_tty, 5 * HZ);

/*
* Wait for ->hangup_work and ->buf.work handlers to terminate.
@@ -782,10 +759,10 @@ enable:

/* Restart the work queue in case no characters kick it off. Safe if
already running */
- if (work)
- schedule_work(&tty->port->buf.work);
- if (o_work)
+ schedule_work(&tty->port->buf.work);
+ if (o_tty)
schedule_work(&o_tty->port->buf.work);
+
mutex_unlock(&tty->ldisc_mutex);
tty_unlock(tty);
return retval;
@@ -979,7 +956,7 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
* race with the set_ldisc code path.
*/

- tty_ldisc_halt(tty, o_tty, NULL, NULL, MAX_SCHEDULE_TIMEOUT);
+ tty_ldisc_halt(tty, o_tty, MAX_SCHEDULE_TIMEOUT);
tty_ldisc_flush_works(tty);
if (o_tty)
tty_ldisc_flush_works(o_tty);
diff --git a/drivers/tty/tty_port.c b/drivers/tty/tty_port.c
index b7ff59d..6d9e0b2 100644
--- a/drivers/tty/tty_port.c
+++ b/drivers/tty/tty_port.c
@@ -132,6 +132,7 @@ EXPORT_SYMBOL(tty_port_free_xmit_buf);
*/
void tty_port_destroy(struct tty_port *port)
{
+ cancel_work_sync(&port->buf.work);
tty_buffer_free_all(port);
}
EXPORT_SYMBOL(tty_port_destroy);
--
1.8.1.2

2013-03-11 21:16:39

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 17/44] tty: Bracket ldisc release with TTY_DEBUG_HANGUP messages

Expected typical log output:
[ 2.437211] tty_open: opening pts1...
[ 2.443376] tty_open: opening pts5...
[ 2.447830] tty_release: ptm0 (tty count=1)...
[ 2.447849] pts0 vhangup...
[ 2.447865] tty_release: ptm0: final close
[ 2.447876] tty_release: ptm0: freeing structure...
[ 2.451634] tty_release: tty1 (tty count=1)...
[ 2.451638] tty_release: tty1: final close
[ 2.451654] tty_release: tty1: freeing structure...
[ 2.452505] tty_release: pts5 (tty count=2)...
[ 2.453029] tty_open: opening pts0...

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_io.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 9e8ff84..2ac516d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1790,7 +1790,7 @@ int tty_release(struct inode *inode, struct file *filp)
return 0;

#ifdef TTY_DEBUG_HANGUP
- printk(KERN_DEBUG "%s: freeing tty structure...\n", __func__);
+ printk(KERN_DEBUG "%s: %s: final close\n", __func__, tty_name(tty, buf));
#endif
/*
* Ask the line discipline code to release its structures
@@ -1802,6 +1802,9 @@ int tty_release(struct inode *inode, struct file *filp)
if (o_tty)
tty_flush_works(o_tty);

+#ifdef TTY_DEBUG_HANGUP
+ printk(KERN_DEBUG "%s: %s: freeing structure...\n", __func__, tty_name(tty, buf));
+#endif
/*
* The release_tty function takes care of the details of clearing
* the slots and preserving the termios structure. The tty_unlock_pair
--
1.8.1.2

2013-03-11 21:16:38

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 20/44] tty: Separate release semantics of ldisc reference

tty_ldisc_ref()/tty_ldisc_deref() have usage semantics
equivalent to down_read_trylock()/up_read(). Only
callers of tty_ldisc_put() perform the additional
operations necessary for proper ldisc teardown, and then only
after ensuring no outstanding 'read lock' remains.

Thus, tty_ldisc_deref() should never release the last reference;
WARN if it does. Conversely, tty_ldisc_put() should never destroy
the ldisc while the use count != 1.
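
The intended pairing can be modelled with a C11 atomic counter. This is a sketch under the assumption that the owner's reference (from tty_ldisc_get()) accounts for a use count of 1; struct ld_model and its functions are hypothetical, and the bool return values mark where the patch issues a WARN:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an ldisc's use count; not the kernel structure. */
struct ld_model {
	atomic_int users;
	bool freed;
};

/* Like tty_ldisc_get(): the owner's reference starts the count at 1. */
static void ld_model_get(struct ld_model *ld)
{
	atomic_init(&ld->users, 1);
	ld->freed = false;
}

/* Like a successful tty_ldisc_ref(): take an extra (reader) reference. */
static void ld_model_ref(struct ld_model *ld)
{
	atomic_fetch_add(&ld->users, 1);
}

/* Like tty_ldisc_deref(): true where the patch would WARN, i.e. when a
 * reader drops what was the last reference. */
static bool ld_model_deref(struct ld_model *ld)
{
	return atomic_fetch_sub(&ld->users, 1) == 1;
}

/* Like tty_ldisc_put(): true where the patch would WARN, i.e. when reader
 * references are still outstanding. As in the patch, it frees either way. */
static bool ld_model_put(struct ld_model *ld)
{
	bool warn = atomic_fetch_sub(&ld->users, 1) != 1;

	ld->freed = true;
	return warn;
}
```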

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 69 +++++++++++++++++++++++++------------------------
1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 328ff5b..9362a10 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -49,37 +49,6 @@ static inline struct tty_ldisc *get_ldisc(struct tty_ldisc *ld)
return ld;
}

-static void put_ldisc(struct tty_ldisc *ld)
-{
- unsigned long flags;
-
- if (WARN_ON_ONCE(!ld))
- return;
-
- /*
- * If this is the last user, free the ldisc, and
- * release the ldisc ops.
- *
- * We really want an "atomic_dec_and_raw_lock_irqsave()",
- * but we don't have it, so this does it by hand.
- */
- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
- if (atomic_dec_and_test(&ld->users)) {
- struct tty_ldisc_ops *ldo = ld->ops;
-
- ldo->refcount--;
- module_put(ldo->owner);
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
-
- kfree(ld);
- return;
- }
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
-
- if (waitqueue_active(&ld->wq_idle))
- wake_up(&ld->wq_idle);
-}
-
/**
* tty_register_ldisc - install a line discipline
* @disc: ldisc number
@@ -363,13 +332,45 @@ EXPORT_SYMBOL_GPL(tty_ldisc_ref);

void tty_ldisc_deref(struct tty_ldisc *ld)
{
- put_ldisc(ld);
+ unsigned long flags;
+
+ if (WARN_ON_ONCE(!ld))
+ return;
+
+ raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ /*
+ * WARNs if one-too-many reader references were released
+ * - the last reference must be released with tty_ldisc_put
+ */
+ WARN_ON(atomic_dec_and_test(&ld->users));
+ raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+
+ if (waitqueue_active(&ld->wq_idle))
+ wake_up(&ld->wq_idle);
}
EXPORT_SYMBOL_GPL(tty_ldisc_deref);

+/**
+ * tty_ldisc_put - release the ldisc
+ *
+ * Complement of tty_ldisc_get().
+ */
static inline void tty_ldisc_put(struct tty_ldisc *ld)
{
- put_ldisc(ld);
+ unsigned long flags;
+
+ if (WARN_ON_ONCE(!ld))
+ return;
+
+ raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+
+ /* unreleased reader reference(s) will cause this WARN */
+ WARN_ON(!atomic_dec_and_test(&ld->users));
+
+ ld->ops->refcount--;
+ module_put(ld->ops->owner);
+ kfree(ld);
+ raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
}

/**
@@ -1001,7 +1002,7 @@ void tty_ldisc_init(struct tty_struct *tty)
*/
void tty_ldisc_deinit(struct tty_struct *tty)
{
- put_ldisc(tty->ldisc);
+ tty_ldisc_put(tty->ldisc);
tty_ldisc_assign(tty, NULL);
}

--
1.8.1.2

2013-03-11 21:15:50

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 29/44] tty: Add lock/unlock ldisc pair functions

Just as the tty pair must be locked in a stable sequence
(ie, independent of which is considered the 'other' tty), so must
the ldisc pair be locked in a stable sequence as well.
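
A minimal sketch of stable-order pair locking by address, mirroring the comparison used in tty_ldisc_lock_pair_timeout(); struct tty_lk and model_lock_pair() are hypothetical, and acquisition order is recorded in an array instead of taking real locks:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tty; lock order is recorded, not enforced. */
struct tty_lk {
	int id;
};

static int lock_log[4];
static int lock_log_len;

static void model_lock(struct tty_lk *t)
{
	if (lock_log_len < 4)
		lock_log[lock_log_len++] = t->id;
}

/* Lock both in a stable (address) order, whichever is passed first.
 * tty2 may be NULL, as in tty_ldisc_lock_pair(). */
static void model_lock_pair(struct tty_lk *tty, struct tty_lk *tty2)
{
	if (tty2 && (uintptr_t)tty2 < (uintptr_t)tty) {
		struct tty_lk *tmp = tty;

		tty = tty2;
		tty2 = tmp;
	}
	model_lock(tty);		/* lower address first */
	if (tty2 && tty2 != tty)
		model_lock(tty2);	/* kernel uses a nested lockdep subclass here */
}
```

Because both callers agree on the order, two tasks locking the same pty pair from opposite ends cannot each hold one lock while waiting for the other.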

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 1afe192..ae0287f 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -31,6 +31,13 @@
#define tty_ldisc_debug(tty, f, args...)
#endif

+/* lockdep nested classes for tty->ldisc_sem */
+enum {
+ LDISC_SEM_NORMAL,
+ LDISC_SEM_OTHER,
+};
+
+
/*
* This guards the refcounted line discipline lists. The lock
* must be taken with irqs off because there are hangup path
@@ -351,6 +358,86 @@ void tty_ldisc_deref(struct tty_ldisc *ld)
}
EXPORT_SYMBOL_GPL(tty_ldisc_deref);

+
+static inline int __lockfunc
+tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
+{
+ return ldsem_down_write(&tty->ldisc_sem, timeout);
+}
+
+static inline int __lockfunc
+tty_ldisc_lock_nested(struct tty_struct *tty, unsigned long timeout)
+{
+ return ldsem_down_write_nested(&tty->ldisc_sem,
+ LDISC_SEM_OTHER, timeout);
+}
+
+static inline void tty_ldisc_unlock(struct tty_struct *tty)
+{
+ return ldsem_up_write(&tty->ldisc_sem);
+}
+
+static int __lockfunc
+tty_ldisc_lock_pair_timeout(struct tty_struct *tty, struct tty_struct *tty2,
+ unsigned long timeout)
+{
+ int ret;
+
+ if (tty < tty2) {
+ ret = tty_ldisc_lock(tty, timeout);
+ if (ret) {
+ ret = tty_ldisc_lock_nested(tty2, timeout);
+ if (!ret)
+ tty_ldisc_unlock(tty);
+ }
+ } else {
+ /* if this is possible, it has lots of implications */
+ WARN_ON_ONCE(tty == tty2);
+ if (tty2 && tty != tty2) {
+ ret = tty_ldisc_lock(tty2, timeout);
+ if (ret) {
+ ret = tty_ldisc_lock_nested(tty, timeout);
+ if (!ret)
+ tty_ldisc_unlock(tty2);
+ }
+ } else
+ ret = tty_ldisc_lock(tty, timeout);
+ }
+
+ if (!ret)
+ return -EBUSY;
+
+ set_bit(TTY_LDISC_HALTED, &tty->flags);
+ if (tty2)
+ set_bit(TTY_LDISC_HALTED, &tty2->flags);
+ return 0;
+}
+
+static void __lockfunc
+tty_ldisc_lock_pair(struct tty_struct *tty, struct tty_struct *tty2)
+{
+ tty_ldisc_lock_pair_timeout(tty, tty2, MAX_SCHEDULE_TIMEOUT);
+}
+
+static void __lockfunc tty_ldisc_unlock_pair(struct tty_struct *tty,
+ struct tty_struct *tty2)
+{
+ tty_ldisc_unlock(tty);
+ if (tty2)
+ tty_ldisc_unlock(tty2);
+}
+
+static void __lockfunc tty_ldisc_enable_pair(struct tty_struct *tty,
+ struct tty_struct *tty2)
+{
+ clear_bit(TTY_LDISC_HALTED, &tty->flags);
+ if (tty2)
+ clear_bit(TTY_LDISC_HALTED, &tty2->flags);
+
+ tty_ldisc_unlock_pair(tty, tty2);
+}
+
+
/**
* tty_ldisc_enable - allow ldisc use
* @tty: terminal to activate ldisc on
--
1.8.1.2

2013-03-11 21:17:21

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 22/44] tty: Fold one-line assign function into callers

Now that tty_ldisc_assign() is a one-line file-scoped function,
remove it and perform the simple assignment at its call sites.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 5ee0b2b..f26ef1a 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -228,24 +228,6 @@ const struct file_operations tty_ldiscs_proc_fops = {
};

/**
- * tty_ldisc_assign - set ldisc on a tty
- * @tty: tty to assign
- * @ld: line discipline
- *
- * Install an instance of a line discipline into a tty structure. The
- * ldisc must have a reference count above zero to ensure it remains.
- * The tty instance refcount starts at zero.
- *
- * Locking:
- * Caller must hold references
- */
-
-static void tty_ldisc_assign(struct tty_struct *tty, struct tty_ldisc *ld)
-{
- tty->ldisc = ld;
-}
-
-/**
* tty_ldisc_try - internal helper
* @tty: the tty
*
@@ -488,7 +470,7 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
/* There is an outstanding reference here so this is safe */
old = tty_ldisc_get(old->ops->num);
WARN_ON(IS_ERR(old));
- tty_ldisc_assign(tty, old);
+ tty->ldisc = old;
tty_set_termios_ldisc(tty, old->ops->num);
if (tty_ldisc_open(tty, old) < 0) {
tty_ldisc_put(old);
@@ -496,7 +478,7 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
new_ldisc = tty_ldisc_get(N_TTY);
if (IS_ERR(new_ldisc))
panic("n_tty: get");
- tty_ldisc_assign(tty, new_ldisc);
+ tty->ldisc = new_ldisc;
tty_set_termios_ldisc(tty, N_TTY);
r = tty_ldisc_open(tty, new_ldisc);
if (r < 0)
@@ -725,7 +707,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
tty_ldisc_close(tty, o_ldisc);

/* Now set up the new line discipline. */
- tty_ldisc_assign(tty, new_ldisc);
+ tty->ldisc = new_ldisc;
tty_set_termios_ldisc(tty, ldisc);

retval = tty_ldisc_open(tty, new_ldisc);
@@ -799,11 +781,10 @@ static int tty_ldisc_reinit(struct tty_struct *tty, int ldisc)

tty_ldisc_close(tty, tty->ldisc);
tty_ldisc_put(tty->ldisc);
- tty->ldisc = NULL;
/*
* Switch the line discipline back
*/
- tty_ldisc_assign(tty, ld);
+ tty->ldisc = ld;
tty_set_termios_ldisc(tty, ldisc);

return 0;
@@ -986,7 +967,7 @@ void tty_ldisc_init(struct tty_struct *tty)
struct tty_ldisc *ld = tty_ldisc_get(N_TTY);
if (IS_ERR(ld))
panic("n_tty: init_tty");
- tty_ldisc_assign(tty, ld);
+ tty->ldisc = ld;
}

/**
@@ -999,7 +980,7 @@ void tty_ldisc_init(struct tty_struct *tty)
void tty_ldisc_deinit(struct tty_struct *tty)
{
tty_ldisc_put(tty->ldisc);
- tty_ldisc_assign(tty, NULL);
+ tty->ldisc = NULL;
}

void tty_ldisc_begin(void)
--
1.8.1.2

2013-03-11 21:15:48

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 32/44] tty: Fix hangup race with TIOCSETD ioctl

The hangup may already have happened; check for that state also.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 9ace119..84ba790 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -543,10 +543,8 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
old_ldisc = tty->ldisc;
tty_lock(tty);

- /* FIXME: for testing only */
- WARN_ON(test_bit(TTY_HUPPED, &tty->flags));
-
- if (test_bit(TTY_HUPPING, &tty->flags)) {
+ if (test_bit(TTY_HUPPING, &tty->flags) ||
+ test_bit(TTY_HUPPED, &tty->flags)) {
/* We were raced by the hangup method. It will have stomped
the ldisc data and closed the ldisc down */
tty_ldisc_enable_pair(tty, o_tty);
--
1.8.1.2

2013-03-11 21:17:47

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 28/44] tty: Remove ldsem recursion support

Read lock recursion is no longer required for ldisc references;
remove the mechanism.
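
With the recursion bitmap gone, a reader that already holds the lock gets no special treatment. The simplified C11 model here, patterned on the count checks in ldsem_down_read_trylock() and ldsem_down_write_trylock(), assumes a 64-bit long long and uses hypothetical names; it shows that once any waiter has added the wait bias, every new read trylock fails, including one from a thread that already holds a read lock:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified, hypothetical ldsem-style count (64-bit long long assumed):
 * the low half counts active lockers; the whole value goes negative for
 * a writer or any waiter. */
#define SEM_UNLOCKED	0LL
#define SEM_ACTIVE_BIAS	1LL
#define SEM_WAIT_BIAS	(-(1LL << 32))
#define SEM_READ_BIAS	SEM_ACTIVE_BIAS
#define SEM_WRITE_BIAS	(SEM_WAIT_BIAS + SEM_ACTIVE_BIAS)

struct sem_model {
	atomic_llong count;
};

static void sem_model_init(struct sem_model *s)
{
	atomic_init(&s->count, SEM_UNLOCKED);
}

/* Readers only get in while the count is non-negative. */
static bool sem_model_read_trylock(struct sem_model *s)
{
	long long count = atomic_load(&s->count);

	while (count >= 0) {
		/* on failure, count is reloaded with the current value */
		if (atomic_compare_exchange_weak(&s->count, &count,
						 count + SEM_READ_BIAS))
			return true;
	}
	return false;
}

static bool sem_model_write_trylock(struct sem_model *s)
{
	long long expected = SEM_UNLOCKED;

	return atomic_compare_exchange_strong(&s->count, &expected,
					      SEM_WRITE_BIAS);
}

static void sem_model_read_unlock(struct sem_model *s)
{
	atomic_fetch_sub(&s->count, SEM_READ_BIAS);
}

static void sem_model_write_unlock(struct sem_model *s)
{
	atomic_fetch_sub(&s->count, SEM_WRITE_BIAS);
}

/* A blocked locker announces itself by adding the wait bias. */
static void sem_model_add_waiter(struct sem_model *s)
{
	atomic_fetch_add(&s->count, SEM_WAIT_BIAS);
}
```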

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 83 +++++------------------------------------------
include/linux/tty_ldisc.h | 2 --
2 files changed, 8 insertions(+), 77 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index c162295..a60d7e3 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -3,28 +3,14 @@
*
* The ldisc semaphore is semantically a rw_semaphore but which enforces
* an alternate policy, namely:
- * 1) Recursive read locking is allowed
- * 2) Supports lock wait timeouts
- * 3) Write waiter has priority, even if lock is already read owned, except:
- * 4) Write waiter does not prevent recursive locking
- * 5) Downgrading is not supported (because of #3 & #4 above)
+ * 1) Supports lock wait timeouts
+ * 2) Write waiter has priority
+ * 3) Downgrading is not supported
*
* Implementation notes:
* 1) Upper half of semaphore count is a wait count (differs from rwsem
* in that rwsem normalizes the upper half to the wait bias)
* 2) Lacks overflow checking
- * 3) Read recursion is tracked with a bitmap indexed by hashed 'current'
- * This approach results in some false positives; ie, a non-recursive
- * read lock may be granted while a write lock is waited.
- * However, this approach does not produce false-negatives
- * (ie. not granting a read lock to a recursive attempt) which might
- * deadlock.
- * Testing the bitmap need not be atomic wrt. setting the bitmap
- * (as the 'current' thread cannot contend with itself); however,
- * since the bitmap is cleared when write lock is granted.
- * Note: increasing the bitmap size reduces the probability of false
- * positives, and thus the probability of granting a non-recursive
- * read lock with writer(s) waiting.
*
* The generic counting was copied and modified from include/asm-generic/rwsem.h
* by Paul Mackerras <[email protected]>.
@@ -53,12 +39,12 @@
# ifdef CONFIG_PROVE_LOCKING
# define lockdep_acquire(l, s, t, i) __acq(l, s, t, 0, 2, NULL, i)
# define lockdep_acquire_nest(l, s, t, n, i) __acq(l, s, t, 0, 2, n, i)
-# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 2, 2, NULL, i)
+# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 1, 2, NULL, i)
# define lockdep_release(l, n, i) __rel(l, n, i)
# else
# define lockdep_acquire(l, s, t, i) __acq(l, s, t, 0, 1, NULL, i)
# define lockdep_acquire_nest(l, s, t, n, i) __acq(l, s, t, 0, 1, n, i)
-# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 2, 1, NULL, i)
+# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 1, 1, NULL, i)
# define lockdep_release(l, n, i) __rel(l, n, i)
# endif
#else
@@ -107,26 +93,6 @@ static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
}


-static inline unsigned long __hash_current(void)
-{
- return (unsigned long)current % TASK_MAP_BITS;
-}
-
-static inline void ldsem_clear_task_map(struct ld_semaphore *sem)
-{
- bitmap_zero(sem->task_map, TASK_MAP_BITS);
-}
-
-static inline void ldsem_update_task_map(struct ld_semaphore *sem)
-{
- __set_bit(__hash_current(), sem->task_map);
-}
-
-static inline int ldsem_lock_recursion(struct ld_semaphore *sem)
-{
- return test_bit(__hash_current(), sem->task_map);
-}
-
/*
* Initialize an ldsem:
*/
@@ -144,7 +110,6 @@ void __init_ldsem(struct ld_semaphore *sem, const char *name,
raw_spin_lock_init(&sem->wait_lock);
INIT_LIST_HEAD(&sem->read_wait);
INIT_LIST_HEAD(&sem->write_wait);
- ldsem_clear_task_map(sem);
}

static void __ldsem_wake_readers(struct ld_semaphore *sem, int wake_type)
@@ -217,9 +182,6 @@ static void __ldsem_wake_writer(struct ld_semaphore *sem)
return;
} while (1);

- /* reset read lock recursion map */
- ldsem_clear_task_map(sem);
-
/* We must be careful not to touch 'waiter' after we set ->task = NULL.
* It is an allocated on the waiter's stack and may become invalid at
* any time after that point (due to a wakeup from another source).
@@ -268,17 +230,9 @@ down_failed(struct ld_semaphore *sem, unsigned flags, long adjust, long timeout)
/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);

- if (flags & LDSEM_READ_WAIT) {
- /* Handle recursive read locking -- if the reader already has
- * a read lock then allow lock acquire without waiting
- * but also without waking other waiters
- */
- if (ldsem_lock_recursion(sem)) {
- raw_spin_unlock_irq(&sem->wait_lock);
- return sem;
- }
+ if (flags & LDSEM_READ_WAIT)
list_add_tail(&waiter.list, &sem->read_wait);
- } else
+ else
list_add_tail(&waiter.list, &sem->write_wait);

waiter.task = current;
@@ -358,9 +312,6 @@ static inline int __ldsem_down_read_nested(struct ld_semaphore *sem,
}
}
lock_stat(sem, acquired);
-
- /* used for read lock recursion test */
- ldsem_update_task_map(sem);
return 1;
}

@@ -371,17 +322,9 @@ static inline int __ldsem_down_write_nested(struct ld_semaphore *sem,

lockdep_acquire(sem, subclass, 0, _RET_IP_);

- raw_spin_lock_irq(&sem->wait_lock);
-
count = atomic_long_add_return(LDSEM_WRITE_BIAS,
(atomic_long_t *)&sem->count);
- if (count == LDSEM_WRITE_BIAS) {
- /* reset read lock recursion map */
- ldsem_clear_task_map(sem);
- raw_spin_unlock_irq(&sem->wait_lock);
- } else {
- raw_spin_unlock_irq(&sem->wait_lock);
-
+ if (count != LDSEM_WRITE_BIAS) {
lock_stat(sem, contended);
if (!down_write_failed(sem, timeout)) {
lockdep_release(sem, 1, _RET_IP_);
@@ -414,8 +357,6 @@ int ldsem_down_read_trylock(struct ld_semaphore *sem)
count + LDSEM_READ_BIAS)) {
lockdep_acquire_read(sem, 0, 1, _RET_IP_);
lock_stat(sem, acquired);
-
- ldsem_update_task_map(sem);
return 1;
}
}
@@ -438,21 +379,13 @@ int ldsem_down_write_trylock(struct ld_semaphore *sem)
{
long count;

- raw_spin_lock_irq(&sem->wait_lock);
-
count = atomic_long_cmpxchg(&sem->count, LDSEM_UNLOCKED,
LDSEM_WRITE_BIAS);
if (count == LDSEM_UNLOCKED) {
- /* reset read lock recursion map */
- ldsem_clear_task_map(sem);
-
- raw_spin_unlock_irq(&sem->wait_lock);
-
lockdep_acquire(sem, 0, 1, _RET_IP_);
lock_stat(sem, acquired);
return 1;
}
- raw_spin_unlock_irq(&sem->wait_lock);
return 0;
}

diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index bbefe71..bfbe41a 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -122,8 +122,6 @@ struct ld_semaphore {
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif
-#define TASK_MAP_BITS 157
- DECLARE_BITMAP(task_map, TASK_MAP_BITS);
};

extern void __init_ldsem(struct ld_semaphore *sem, const char *name,
--
1.8.1.2
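
For reference, the recursion check removed above hashed each reader's task pointer into a 157-bit map (TASK_MAP_BITS). A bit was set on every read-lock acquisition, and the whole map was cleared only when a writer took the lock.

The user-space C sketch below reproduces that hashing. Plain integers stand in for current, and a small hand-rolled bitmap stands in for the kernel's bitmap helpers. The names struct task_map, map_clear(), map_mark() and map_test() are hypothetical.

It shows that the test was probabilistic. Any two tasks whose pointer values are congruent modulo 157 share a bit. In the removed down_failed() path, a task that held no read lock could therefore still have been granted the read lock without waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TASK_MAP_BITS	157	/* same size as the removed map */
#define WORD_BITS	64
#define MAP_WORDS	((TASK_MAP_BITS + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical stand-in for DECLARE_BITMAP(task_map, TASK_MAP_BITS). */
struct task_map {
	uint64_t bits[MAP_WORDS];
};

/* Mirrors __hash_current(): the task pointer value modulo the map size. */
static unsigned long hash_task(uintptr_t task)
{
	return (unsigned long)(task % TASK_MAP_BITS);
}

/* Mirrors ldsem_clear_task_map(): done when a writer takes the lock. */
static void map_clear(struct task_map *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}

/* Mirrors ldsem_update_task_map(): done on every read-lock acquisition. */
static void map_mark(struct task_map *m, uintptr_t task)
{
	unsigned long b = hash_task(task);

	assert(b < TASK_MAP_BITS);
	m->bits[b / WORD_BITS] |= (uint64_t)1 << (b % WORD_BITS);
}

/* Mirrors ldsem_lock_recursion(): "does this task already hold a read lock?" */
static bool map_test(const struct task_map *m, uintptr_t task)
{
	unsigned long b = hash_task(task);

	return (m->bits[b / WORD_BITS] >> (b % WORD_BITS)) & 1;
}
```

Two task values 157 apart always land on the same bit. Once one of them is marked, map_test() reports both as recursive readers.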

2013-03-11 21:15:46

by Peter Hurley

Subject: [PATCH v5 43/44] tty: Early-out ldsem write lock stealing

If, when attempting to reverse the count bump, the writer discovers
the lock is unclaimed, try to immediately steal the lock.

Derived from Michel Lespinasse's write lock stealing work on
rwsem.
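
The reverse-or-steal loop this patch adds to down_write_failed() can be modeled with user-space C11 atomics, as a rough sketch.

The bias constants are ASSUMED: a rwsem-style layout with active lockers counted in the low 32 bits and a negative WAIT_BIAS per waiter. Their real values are defined elsewhere in tty_ldsem.c and are not shown in this patch.

Other simplifications:
- writer_reverse_or_steal(), count_t and add_return() are hypothetical names.
- The wait_lock is omitted.
- atomic_compare_exchange_strong() stands in for ldsem_cmpxchg(), because on failure it likewise refreshes the caller's copy with the in-memory value.

```c
#include <assert.h>
#include <stdatomic.h>

/* ASSUMED rwsem-style bias values, not copied from tty_ldsem.c */
#define ACTIVE_MASK	0xffffffffLL
#define ACTIVE_BIAS	1LL
#define WAIT_BIAS	(-ACTIVE_MASK - 1)
#define READ_BIAS	ACTIVE_BIAS
#define WRITE_BIAS	(WAIT_BIAS + ACTIVE_BIAS)

typedef _Atomic long long count_t;

/* Like atomic_long_add_return(): returns the updated value. */
static long long add_return(count_t *sem, long long delta)
{
	return atomic_fetch_add(sem, delta) + delta;
}

/*
 * Called after the write fast path saw contention; @count is the value
 * that fast path's add returned. Turn the attempt into a wait (drop the
 * active bias, keep the wait bias), unless the count has changed and our
 * own active bias is now the only one, in which case the lock is ours.
 * Returns 1 if the lock was stolen, 0 if the caller must queue and sleep.
 */
static int writer_reverse_or_steal(count_t *sem, long long count)
{
	assert((count & ACTIVE_MASK) != ACTIVE_BIAS);

	for (;;) {
		/* on failure, C11 refreshes @count with the in-memory value */
		if (atomic_compare_exchange_strong(sem, &count, count - ACTIVE_BIAS))
			return 0;
		if ((count & ACTIVE_MASK) == ACTIVE_BIAS)
			return 1;
	}
}
```

Consider a reader that drops its lock between the writer's failed fast path and the reversal. The cmpxchg fails, and the refreshed count shows only the writer's own active bias. The writer therefore keeps the lock instead of queueing.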

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index e750ac3..84ea8a7 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -247,7 +247,7 @@ down_read_failed(struct ld_semaphore *sem, long timeout)
* wait for the write lock to be granted
*/
static struct ld_semaphore __sched *
-down_write_failed(struct ld_semaphore *sem, long timeout)
+down_write_failed(struct ld_semaphore *sem, long count, long timeout)
{
struct ldsem_waiter waiter;
long adjust = -LDSEM_ACTIVE_BIAS;
@@ -255,17 +255,24 @@ down_write_failed(struct ld_semaphore *sem, long timeout)

/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
+
+ /* Try to reverse the lock attempt but if the count has changed
+ * so that reversing fails, check if the lock is now owned,
+ * and early-out if so */
+ do {
+ if (ldsem_cmpxchg(&count, count + adjust, sem))
+ break;
+ if ((count & LDSEM_ACTIVE_MASK) == LDSEM_ACTIVE_BIAS) {
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return sem;
+ }
+ } while (1);
+
list_add_tail(&waiter.list, &sem->write_wait);

waiter.task = current;
get_task_struct(current);

- /* change the lock attempt to a wait --
- * if there are no active locks, wake the new lock owner(s)
- */
- if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
- __ldsem_wake(sem);
-
set_current_state(TASK_UNINTERRUPTIBLE);
for (;;) {
if (!timeout)
@@ -320,7 +327,7 @@ static inline int __ldsem_down_write_nested(struct ld_semaphore *sem,
count = ldsem_atomic_update(LDSEM_WRITE_BIAS, sem);
if ((count & LDSEM_ACTIVE_MASK) != LDSEM_ACTIVE_BIAS) {
lock_stat(sem, contended);
- if (!down_write_failed(sem, timeout)) {
+ if (!down_write_failed(sem, count, timeout)) {
lockdep_release(sem, 1, _RET_IP_);
return 0;
}
--
1.8.1.2

2013-03-11 21:18:29

by Peter Hurley

Subject: [PATCH v5 13/44] tty: Don't reenable already enabled ldisc

tty_ldisc_hangup() guarantees the ldisc is enabled (or that there
is no ldisc). Since __tty_hangup() was the only user outside
tty_ldisc.c, make tty_ldisc_enable() static (file scope).

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_io.c | 1 -
drivers/tty/tty_ldisc.c | 2 +-
include/linux/tty.h | 2 --
3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index fd47363..1ee318a 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -666,7 +666,6 @@ static void __tty_hangup(struct tty_struct *tty)
*/
set_bit(TTY_HUPPED, &tty->flags);
clear_bit(TTY_HUPPING, &tty->flags);
- tty_ldisc_enable(tty);

tty_unlock(tty);

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 37671fcc..9c727da 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -373,7 +373,7 @@ static inline void tty_ldisc_put(struct tty_ldisc *ld)
* Clearing directly is allowed.
*/

-void tty_ldisc_enable(struct tty_struct *tty)
+static void tty_ldisc_enable(struct tty_struct *tty)
{
clear_bit(TTY_LDISC_HALTED, &tty->flags);
set_bit(TTY_LDISC, &tty->flags);
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 7aa4be6..bfa6fca 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -527,8 +527,6 @@ extern void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty);
extern void tty_ldisc_init(struct tty_struct *tty);
extern void tty_ldisc_deinit(struct tty_struct *tty);
extern void tty_ldisc_begin(void);
-/* This last one is just for the tty layer internals and shouldn't be used elsewhere */
-extern void tty_ldisc_enable(struct tty_struct *tty);


/* n_tty.c */
--
1.8.1.2

2013-03-11 21:15:40

by Peter Hurley

Subject: [PATCH v5 34/44] tty: Fix tty_ldisc_lock name collision

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 9725c94..ba49c0e 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -44,7 +44,7 @@ enum {
* callers who will do ldisc lookups and cannot sleep.
*/

-static DEFINE_RAW_SPINLOCK(tty_ldisc_lock);
+static DEFINE_RAW_SPINLOCK(tty_ldiscs_lock);
/* Line disc dispatch table */
static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];

@@ -58,7 +58,7 @@ static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];
* from this point onwards.
*
* Locking:
- * takes tty_ldisc_lock to guard against ldisc races
+ * takes tty_ldiscs_lock to guard against ldisc races
*/

int tty_register_ldisc(int disc, struct tty_ldisc_ops *new_ldisc)
@@ -69,11 +69,11 @@ int tty_register_ldisc(int disc, struct tty_ldisc_ops *new_ldisc)
if (disc < N_TTY || disc >= NR_LDISCS)
return -EINVAL;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
tty_ldiscs[disc] = new_ldisc;
new_ldisc->num = disc;
new_ldisc->refcount = 0;
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);

return ret;
}
@@ -88,7 +88,7 @@ EXPORT_SYMBOL(tty_register_ldisc);
* currently in use.
*
* Locking:
- * takes tty_ldisc_lock to guard against ldisc races
+ * takes tty_ldiscs_lock to guard against ldisc races
*/

int tty_unregister_ldisc(int disc)
@@ -99,12 +99,12 @@ int tty_unregister_ldisc(int disc)
if (disc < N_TTY || disc >= NR_LDISCS)
return -EINVAL;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
if (tty_ldiscs[disc]->refcount)
ret = -EBUSY;
else
tty_ldiscs[disc] = NULL;
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);

return ret;
}
@@ -115,7 +115,7 @@ static struct tty_ldisc_ops *get_ldops(int disc)
unsigned long flags;
struct tty_ldisc_ops *ldops, *ret;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
ret = ERR_PTR(-EINVAL);
ldops = tty_ldiscs[disc];
if (ldops) {
@@ -125,7 +125,7 @@ static struct tty_ldisc_ops *get_ldops(int disc)
ret = ldops;
}
}
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);
return ret;
}

@@ -133,10 +133,10 @@ static void put_ldops(struct tty_ldisc_ops *ldops)
{
unsigned long flags;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
ldops->refcount--;
module_put(ldops->owner);
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);
}

/**
@@ -149,7 +149,7 @@ static void put_ldops(struct tty_ldisc_ops *ldops)
* available
*
* Locking:
- * takes tty_ldisc_lock to guard against ldisc races
+ * takes tty_ldiscs_lock to guard against ldisc races
*/

static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc)
--
1.8.1.2

2013-03-11 21:19:14

by Peter Hurley

Subject: [PATCH v5 35/44] tty: Drop "tty is NULL" flip buffer diagnostic

Although this warning uncovered a multitude of races and errors
in the tty and ldisc layers, this diagnostic now produces too
many false positives.

An expected outcome of the separation of driver i/o paths from
tty lifetimes (added in 3.9) is that the tty may already be
in advanced stages of teardown when scheduled flip buffer work
runs. Ldisc i/o is prevented in this case because flush_to_ldisc()
cannot acquire an ldisc reference.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index 0dd35ce..8e8d730 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -425,7 +425,7 @@ static void flush_to_ldisc(struct work_struct *work)
struct tty_ldisc *disc;

tty = port->itty;
- if (WARN_RATELIMIT(tty == NULL, "tty is NULL\n"))
+ if (tty == NULL)
return;

disc = tty_ldisc_ref(tty);
--
1.8.1.2

2013-03-11 21:19:36

by Peter Hurley

Subject: [PATCH v5 39/44] tty: Factor ldsem writer trylock

Prepare to simplify lock acquisition by a waiting writer.

Derived from Michel Lespinasse's write lock stealing work on rwsem.
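
As a sketch of what the factored writer_trylock() does, here is a user-space C11 model of the add-and-undo form it has in this patch. A queued writer may take the lock only if the active part of the count can go from 0 to 1.

The bias constants are ASSUMED rwsem-style values, not taken from tty_ldsem.c. count_t and add_return() are hypothetical helpers; add_return() behaves like atomic_long_add_return().

```c
#include <assert.h>
#include <stdatomic.h>

/* ASSUMED rwsem-style bias values, not copied from tty_ldsem.c */
#define ACTIVE_MASK	0xffffffffLL
#define ACTIVE_BIAS	1LL
#define WAIT_BIAS	(-ACTIVE_MASK - 1)
#define READ_BIAS	ACTIVE_BIAS
#define WRITE_BIAS	(WAIT_BIAS + ACTIVE_BIAS)

typedef _Atomic long long count_t;

static long long add_return(count_t *sem, long long delta)
{
	return atomic_fetch_add(sem, delta) + delta;
}

/* Returns 1 if the queued writer now owns the lock, 0 otherwise. */
static int writer_trylock(count_t *sem)
{
	long long count;

	/* a queued writer's WAIT_BIAS keeps the count negative */
	assert(atomic_load(sem) < 0);

	do {
		/* try to move the active part from 0 to 1 */
		count = add_return(sem, ACTIVE_BIAS);
		if ((count & ACTIVE_MASK) == ACTIVE_BIAS)
			return 1;
		/* someone is active: undo, but retry if they left meanwhile */
		count = add_return(sem, -ACTIVE_BIAS);
		if (count & ACTIVE_MASK)
			return 0;
	} while (1);
}
```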

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index d2f091a..48f1ce8 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -136,13 +136,8 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem)
sem->wait_readers = 0;
}

-static void __ldsem_wake_writer(struct ld_semaphore *sem)
+static inline int writer_trylock(struct ld_semaphore *sem)
{
- struct ldsem_waiter *waiter;
- struct task_struct *tsk;
-
- waiter = list_entry(sem->write_wait.next, struct ldsem_waiter, list);
-
/* only wake this writer if the active part of the count can be
* transitioned from 0 -> 1
*/
@@ -159,9 +154,22 @@ static void __ldsem_wake_writer(struct ld_semaphore *sem)
*/
count = ldsem_atomic_update(-LDSEM_ACTIVE_BIAS, sem);
if (count & LDSEM_ACTIVE_MASK)
- return;
+ return 0;
} while (1);

+ return 1;
+}
+
+static void __ldsem_wake_writer(struct ld_semaphore *sem)
+{
+ struct ldsem_waiter *waiter;
+ struct task_struct *tsk;
+
+ waiter = list_entry(sem->write_wait.next, struct ldsem_waiter, list);
+
+ if (!writer_trylock(sem))
+ return;
+
/* We must be careful not to touch 'waiter' after we set ->task = NULL.
* It is an allocated on the waiter's stack and may become invalid at
* any time after that point (due to a wakeup from another source).
--
1.8.1.2

2013-03-11 21:15:38

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v5 33/44] tty: Clarify multiple-references comment in TIOCSETD ioctl

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 84ba790..9725c94 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -567,13 +567,15 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
tty_ldisc_restore(tty, old_ldisc);
}

- /* At this point we hold a reference to the new ldisc and a
- a reference to the old ldisc. If we ended up flipping back
- to the existing ldisc we have two references to it */
-
if (tty->ldisc->ops->num != old_ldisc->ops->num && tty->ops->set_ldisc)
tty->ops->set_ldisc(tty);

+ /* At this point we hold a reference to the new ldisc and a
+ reference to the old ldisc, or we hold two references to
+ the old ldisc (if it was restored as part of error cleanup
+ above). In either case, releasing a single reference from
+ the old ldisc is correct. */
+
tty_ldisc_put(old_ldisc);

/*
--
1.8.1.2

2013-03-11 21:22:05

by Peter Hurley

Subject: [PATCH v5 42/44] tty: Reduce and simplify ldsem atomic ops

Merge all atomic operations on the sem count into just
two functions: an atomic add that returns the result, and a
cmpxchg that returns success/failure and passes back the
in-memory value.

Reduce the waiting readers and writer trylocks to a single
optimistic grant attempt followed by looping on unsuccessful
reversal attempts with cmpxchg. This allows unsuccessful
reversals that already grant the lock to pass through without
needing to retry the grant atomically.
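
Here is a user-space C11 sketch of the two remaining primitives and the single-grant-then-reverse pattern. It uses ASSUMED rwsem-style bias constants; the real values live elsewhere in tty_ldsem.c.

- count_cmpxchg() is a hypothetical stand-in with the same contract as ldsem_cmpxchg(). C11's atomic_compare_exchange_strong() already writes the in-memory value back into *old on failure.
- grant_readers() and writer_trylock() model __ldsem_wake_readers() and writer_trylock() without the wait-list handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* ASSUMED rwsem-style bias values, not copied from tty_ldsem.c */
#define ACTIVE_MASK	0xffffffffLL
#define ACTIVE_BIAS	1LL
#define WAIT_BIAS	(-ACTIVE_MASK - 1)
#define READ_BIAS	ACTIVE_BIAS

typedef _Atomic long long count_t;

/* Atomic add returning the new value (like ldsem_atomic_update()). */
static long long add_return(count_t *sem, long long delta)
{
	return atomic_fetch_add(sem, delta) + delta;
}

/* Same contract as ldsem_cmpxchg(): true on success; on failure *old
 * is updated to the value currently in memory. */
static bool count_cmpxchg(long long *old, long long new, count_t *sem)
{
	return atomic_compare_exchange_strong(sem, old, new);
}

/* Grant all @nr queued readers at once by swapping each reader's
 * WAIT_BIAS for an ACTIVE_BIAS. A positive result means no writer is
 * queued or active, so the grant stands. */
static bool grant_readers(count_t *sem, long long nr)
{
	long long adjust = nr * (ACTIVE_BIAS - WAIT_BIAS);
	long long count;

	assert(nr > 0);
	count = add_return(sem, adjust);	/* single optimistic grant */
	for (;;) {
		if (count > 0)
			return true;
		/* reverse; if the count moved meanwhile, re-test it */
		if (count_cmpxchg(&count, count - adjust, sem))
			return false;
	}
}

/* Move the active part from 0 to 1 for a queued writer. */
static bool writer_trylock(count_t *sem)
{
	long long count = add_return(sem, ACTIVE_BIAS);

	for (;;) {
		if ((count & ACTIVE_MASK) == ACTIVE_BIAS)
			return true;
		if (count_cmpxchg(&count, count - ACTIVE_BIAS, sem))
			return false;
	}
}
```

A failed cmpxchg refreshes count, so each loop iteration re-tests the grant condition against the current value. A reversal that loses a race with an unlock can therefore still end with the lock granted, without a second grant attempt.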

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 59 +++++++++++++++++++------------------------------
1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index fd95950..e750ac3 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -83,6 +83,12 @@ static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
return atomic_long_add_return(delta, (atomic_long_t *)&sem->count);
}

+static inline int ldsem_cmpxchg(long *old, long new, struct ld_semaphore *sem)
+{
+ long tmp = *old;
+ *old = atomic_long_cmpxchg(&sem->count, *old, new);
+ return *old == tmp;
+}

/*
* Initialize an ldsem:
@@ -108,20 +114,18 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem)
{
struct ldsem_waiter *waiter, *next;
struct task_struct *tsk;
- long adjust;
+ long adjust, count;

/* Try to grant read locks to all readers on the read wait list.
* Note the 'active part' of the count is incremented by
* the number of readers before waking any processes up.
*/
adjust = sem->wait_readers * (LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS);
+ count = ldsem_atomic_update(adjust, sem);
do {
- long count;
- count = ldsem_atomic_update(adjust, sem);
if (count > 0)
break;
- count = ldsem_atomic_update(-adjust, sem);
- if (count + adjust < 0)
+ if (ldsem_cmpxchg(&count, count - adjust, sem))
return;
} while (1);

@@ -141,23 +145,13 @@ static inline int writer_trylock(struct ld_semaphore *sem)
/* only wake this writer if the active part of the count can be
* transitioned from 0 -> 1
*/
+ long count = ldsem_atomic_update(LDSEM_ACTIVE_BIAS, sem);
do {
- long count;
-
- count = ldsem_atomic_update(LDSEM_ACTIVE_BIAS, sem);
- if ((count & LDSEM_ACTIVE_MASK) == 1)
- break;
-
- /* Someone grabbed the sem already -
- * undo the change to the active count, but check for
- * a transition 1->0
- */
- count = ldsem_atomic_update(-LDSEM_ACTIVE_BIAS, sem);
- if (count & LDSEM_ACTIVE_MASK)
+ if ((count & LDSEM_ACTIVE_MASK) == LDSEM_ACTIVE_BIAS)
+ return 1;
+ if (ldsem_cmpxchg(&count, count - LDSEM_ACTIVE_BIAS, sem))
return 0;
} while (1);
-
- return 1;
}

static void __ldsem_wake_writer(struct ld_semaphore *sem)
@@ -305,7 +299,7 @@ static inline int __ldsem_down_read_nested(struct ld_semaphore *sem,
{
lockdep_acquire_read(sem, subclass, 0, _RET_IP_);

- if (atomic_long_inc_return((atomic_long_t *)&sem->count) <= 0) {
+ if (ldsem_atomic_update(LDSEM_READ_BIAS, sem) <= 0) {
lock_stat(sem, contended);
if (!down_read_failed(sem, timeout)) {
lockdep_release(sem, 1, _RET_IP_);
@@ -323,8 +317,7 @@ static inline int __ldsem_down_write_nested(struct ld_semaphore *sem,

lockdep_acquire(sem, subclass, 0, _RET_IP_);

- count = atomic_long_add_return(LDSEM_WRITE_BIAS,
- (atomic_long_t *)&sem->count);
+ count = ldsem_atomic_update(LDSEM_WRITE_BIAS, sem);
if ((count & LDSEM_ACTIVE_MASK) != LDSEM_ACTIVE_BIAS) {
lock_stat(sem, contended);
if (!down_write_failed(sem, timeout)) {
@@ -351,11 +344,10 @@ int __sched ldsem_down_read(struct ld_semaphore *sem, long timeout)
*/
int ldsem_down_read_trylock(struct ld_semaphore *sem)
{
- long count;
+ long count = sem->count;

- while ((count = sem->count) >= 0) {
- if (count == atomic_long_cmpxchg(&sem->count, count,
- count + LDSEM_READ_BIAS)) {
+ while (count >= 0) {
+ if (ldsem_cmpxchg(&count, count + LDSEM_READ_BIAS, sem)) {
lockdep_acquire_read(sem, 0, 1, _RET_IP_);
lock_stat(sem, acquired);
return 1;
@@ -378,14 +370,10 @@ int __sched ldsem_down_write(struct ld_semaphore *sem, long timeout)
*/
int ldsem_down_write_trylock(struct ld_semaphore *sem)
{
- long count;
-
- while (((count = sem->count) & LDSEM_ACTIVE_MASK) == 0) {
- long tmp;
+ long count = sem->count;

- tmp = atomic_long_cmpxchg(&sem->count, count,
- count + LDSEM_WRITE_BIAS);
- if (count == tmp) {
+ while ((count & LDSEM_ACTIVE_MASK) == 0) {
+ if (ldsem_cmpxchg(&count, count + LDSEM_WRITE_BIAS, sem)) {
lockdep_acquire(sem, 0, 1, _RET_IP_);
lock_stat(sem, acquired);
return 1;
@@ -403,7 +391,7 @@ void ldsem_up_read(struct ld_semaphore *sem)

lockdep_release(sem, 1, _RET_IP_);

- count = atomic_long_dec_return((atomic_long_t *)&sem->count);
+ count = ldsem_atomic_update(-LDSEM_READ_BIAS, sem);
if (count < 0 && (count & LDSEM_ACTIVE_MASK) == 0)
ldsem_wake(sem);
}
@@ -417,8 +405,7 @@ void ldsem_up_write(struct ld_semaphore *sem)

lockdep_release(sem, 1, _RET_IP_);

- count = atomic_long_sub_return(LDSEM_WRITE_BIAS,
- (atomic_long_t *)&sem->count);
+ count = ldsem_atomic_update(-LDSEM_WRITE_BIAS, sem);
if (count < 0)
ldsem_wake(sem);
}
--
1.8.1.2

2013-03-11 21:22:01

by Peter Hurley

Subject: [PATCH v5 44/44] tty: Early-out tardy ldsem readers

As long as there are no other waiters, read locks should early out.
Otherwise, a reader can end up sleeping while readers are already
running and there are no waiting writers.
This can happen in the following scenario:

CPU 0 | CPU 1
|
| down_write()

... CPU 1 has the write lock for the semaphore.
Meanwhile, 1 or more down_read(s) are attempted and fail;
these are put on the wait list. Then ...

down_read() | up_write()
local = atomic_update(+read_bias) |
local <= 0? | local = atomic_update(-write_bias)
if (true) | local < 0?
down_read_failed() | if (true)
| wake()
| grab wait_lock
wait for wait_lock | wake all readers
| release wait_lock

... At this point, sem->count > 0 and the wait list is empty,
but down_read_failed() will sleep the reader.

Instead, try to reverse the down_read() attempt, but if the count has
changed so that reversing fails, check whether the count is now
positive (no waiters and no writer), and early-out if so.
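
The scenario above can be replayed single-threaded against a user-space C11 model of the new reversal in down_read_failed(). This is a sketch under ASSUMED rwsem-style bias constants, not the values from tty_ldsem.c. reader_reverse_or_early_out() is a hypothetical name. The wait-list insertion and the wakeup check that follow a successful reversal are omitted.

```c
#include <assert.h>
#include <stdatomic.h>

/* ASSUMED rwsem-style bias values, not copied from tty_ldsem.c */
#define ACTIVE_MASK	0xffffffffLL
#define ACTIVE_BIAS	1LL
#define WAIT_BIAS	(-ACTIVE_MASK - 1)
#define READ_BIAS	ACTIVE_BIAS
#define WRITE_BIAS	(WAIT_BIAS + ACTIVE_BIAS)

typedef _Atomic long long count_t;

static long long add_return(count_t *sem, long long delta)
{
	return atomic_fetch_add(sem, delta) + delta;
}

/*
 * Convert a failed read attempt (@count is what the fast path's add
 * returned) into a wait, unless the count has since become positive
 * (no queued or active writer). In that case the reader keeps the
 * READ_BIAS it already added and does not sleep.
 * Returns 1 for early-out, 0 if the reader must queue.
 */
static int reader_reverse_or_early_out(count_t *sem, long long count)
{
	long long adjust = -ACTIVE_BIAS + WAIT_BIAS;

	assert(count <= 0);	/* only called after a failed fast path */
	for (;;) {
		if (atomic_compare_exchange_strong(sem, &count, count + adjust))
			return 0;
		if (count > 0)
			return 1;
	}
}
```

Replaying the diagram:
1. The count starts at WRITE_BIAS + WAIT_BIAS: CPU 1 holds the write lock and one reader is queued.
2. The tardy reader's fast path adds READ_BIAS and fails.
3. up_write() removes WRITE_BIAS, and the queued reader is granted, leaving a count of 2.
4. The tardy reader's cmpxchg fails, sees a positive count, and returns 1 without sleeping.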

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 84ea8a7..2f35661 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -191,23 +191,34 @@ static void ldsem_wake(struct ld_semaphore *sem)
* wait for the read lock to be granted
*/
static struct ld_semaphore __sched *
-down_read_failed(struct ld_semaphore *sem, long timeout)
+down_read_failed(struct ld_semaphore *sem, long count, long timeout)
{
struct ldsem_waiter waiter;
long adjust = -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS;

/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
+
+ /* Try to reverse the lock attempt but if the count has changed
+ * so that reversing fails, check if there are no waiters,
+ * and early-out if not */
+ do {
+ if (ldsem_cmpxchg(&count, count + adjust, sem))
+ break;
+ if (count > 0) {
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return sem;
+ }
+ } while (1);
+
list_add_tail(&waiter.list, &sem->read_wait);
sem->wait_readers++;

waiter.task = current;
get_task_struct(current);

- /* change the lock attempt to a wait --
- * if there are no active locks, wake the new lock owner(s)
- */
- if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
+ /* if there are no active locks, wake the new lock owner(s) */
+ if ((count & LDSEM_ACTIVE_MASK) == 0)
__ldsem_wake(sem);

raw_spin_unlock_irq(&sem->wait_lock);
@@ -304,11 +315,14 @@ down_write_failed(struct ld_semaphore *sem, long count, long timeout)
static inline int __ldsem_down_read_nested(struct ld_semaphore *sem,
int subclass, long timeout)
{
+ long count;
+
lockdep_acquire_read(sem, subclass, 0, _RET_IP_);

- if (ldsem_atomic_update(LDSEM_READ_BIAS, sem) <= 0) {
+ count = ldsem_atomic_update(LDSEM_READ_BIAS, sem);
+ if (count <= 0) {
lock_stat(sem, contended);
- if (!down_read_failed(sem, timeout)) {
+ if (!down_read_failed(sem, count, timeout)) {
lockdep_release(sem, 1, _RET_IP_);
return 0;
}
--
1.8.1.2

2013-03-11 21:15:35

by Peter Hurley

Subject: [PATCH v5 30/44] tty: Replace ldisc locking with ldisc_sem

Line discipline locking was performed with a combination of
a mutex, a status bit, a count, and a waitqueue -- basically,
a rw semaphore.

Replace the existing combination with an ld_semaphore.

Fixes:
1) the 'reference acquire after ldisc locked' bug
2) the over-complicated halt mechanism
3) lock order wrt. tty_lock()
4) dropping locks while changing ldisc
5) previously unidentified deadlock while locking ldisc from
both linked ttys concurrently
6) previously unidentified recursive deadlocks

Adds much-needed lockdep diagnostics.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_buffer.c | 2 +-
drivers/tty/tty_io.c | 7 +-
drivers/tty/tty_ldisc.c | 324 ++++++----------------------------------------
include/linux/tty.h | 4 +-
include/linux/tty_ldisc.h | 3 +-
5 files changed, 48 insertions(+), 292 deletions(-)

diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index bb11993..0dd35ce 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -429,7 +429,7 @@ static void flush_to_ldisc(struct work_struct *work)
return;

disc = tty_ldisc_ref(tty);
- if (disc == NULL) /* !TTY_LDISC */
+ if (disc == NULL)
return;

spin_lock_irqsave(&buf->lock, flags);
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index bf33440..adc1d01 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1328,8 +1328,7 @@ static int tty_reopen(struct tty_struct *tty)
struct tty_driver *driver = tty->driver;

if (test_bit(TTY_CLOSING, &tty->flags) ||
- test_bit(TTY_HUPPING, &tty->flags) ||
- test_bit(TTY_LDISC_CHANGING, &tty->flags))
+ test_bit(TTY_HUPPING, &tty->flags))
return -EIO;

if (driver->type == TTY_DRIVER_TYPE_PTY &&
@@ -1345,7 +1344,7 @@ static int tty_reopen(struct tty_struct *tty)
}
tty->count++;

- WARN_ON(!test_bit(TTY_LDISC, &tty->flags));
+ WARN_ON(!tty->ldisc);

return 0;
}
@@ -2956,7 +2955,7 @@ void initialize_tty_struct(struct tty_struct *tty,
tty->pgrp = NULL;
mutex_init(&tty->legacy_mutex);
mutex_init(&tty->termios_mutex);
- mutex_init(&tty->ldisc_mutex);
+ init_ldsem(&tty->ldisc_sem);
init_waitqueue_head(&tty->write_wait);
init_waitqueue_head(&tty->read_wait);
INIT_WORK(&tty->hangup_work, do_tty_hangup);
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index ae0287f..a150f95 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -45,7 +45,6 @@ enum {
*/

static DEFINE_RAW_SPINLOCK(tty_ldisc_lock);
-static DECLARE_WAIT_QUEUE_HEAD(tty_ldisc_wait);
/* Line disc dispatch table */
static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];

@@ -153,7 +152,7 @@ static void put_ldops(struct tty_ldisc_ops *ldops)
* takes tty_ldisc_lock to guard against ldisc races
*/

-static struct tty_ldisc *tty_ldisc_get(int disc)
+static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc)
{
struct tty_ldisc *ld;
struct tty_ldisc_ops *ldops;
@@ -180,8 +179,7 @@ static struct tty_ldisc *tty_ldisc_get(int disc)
}

ld->ops = ldops;
- atomic_set(&ld->users, 1);
- init_waitqueue_head(&ld->wq_idle);
+ ld->tty = tty;

return ld;
}
@@ -193,20 +191,11 @@ static struct tty_ldisc *tty_ldisc_get(int disc)
*/
static inline void tty_ldisc_put(struct tty_ldisc *ld)
{
- unsigned long flags;
-
if (WARN_ON_ONCE(!ld))
return;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
-
- /* unreleased reader reference(s) will cause this WARN */
- WARN_ON(!atomic_dec_and_test(&ld->users));
-
- ld->ops->refcount--;
- module_put(ld->ops->owner);
+ put_ldops(ld->ops);
kfree(ld);
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
}

static void *tty_ldiscs_seq_start(struct seq_file *m, loff_t *pos)
@@ -258,34 +247,6 @@ const struct file_operations tty_ldiscs_proc_fops = {
};

/**
- * tty_ldisc_try - internal helper
- * @tty: the tty
- *
- * Make a single attempt to grab and bump the refcount on
- * the tty ldisc. Return 0 on failure or 1 on success. This is
- * used to implement both the waiting and non waiting versions
- * of tty_ldisc_ref
- *
- * Locking: takes tty_ldisc_lock
- */
-
-static struct tty_ldisc *tty_ldisc_try(struct tty_struct *tty)
-{
- unsigned long flags;
- struct tty_ldisc *ld;
-
- /* FIXME: this allows reference acquire after TTY_LDISC is cleared */
- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
- ld = NULL;
- if (test_bit(TTY_LDISC, &tty->flags) && tty->ldisc) {
- ld = tty->ldisc;
- atomic_inc(&ld->users);
- }
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
- return ld;
-}
-
-/**
* tty_ldisc_ref_wait - wait for the tty ldisc
* @tty: tty device
*
@@ -298,16 +259,15 @@ static struct tty_ldisc *tty_ldisc_try(struct tty_struct *tty)
* against a discipline change, such as an existing ldisc reference
* (which we check for)
*
- * Locking: call functions take tty_ldisc_lock
+ * Note: only callable from a file_operations routine (which
+ * guarantees tty->ldisc != NULL when the lock is acquired).
*/

struct tty_ldisc *tty_ldisc_ref_wait(struct tty_struct *tty)
{
- struct tty_ldisc *ld;
-
- /* wait_event is a macro */
- wait_event(tty_ldisc_wait, (ld = tty_ldisc_try(tty)) != NULL);
- return ld;
+ ldsem_down_read(&tty->ldisc_sem, MAX_SCHEDULE_TIMEOUT);
+ WARN_ON(!tty->ldisc);
+ return tty->ldisc;
}
EXPORT_SYMBOL_GPL(tty_ldisc_ref_wait);

@@ -318,13 +278,16 @@ EXPORT_SYMBOL_GPL(tty_ldisc_ref_wait);
* Dereference the line discipline for the terminal and take a
* reference to it. If the line discipline is in flux then
* return NULL. Can be called from IRQ and timer functions.
- *
- * Locking: called functions take tty_ldisc_lock
*/

struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
{
- return tty_ldisc_try(tty);
+ if (ldsem_down_read_trylock(&tty->ldisc_sem)) {
+ if (!tty->ldisc)
+ ldsem_up_read(&tty->ldisc_sem);
+ return tty->ldisc;
+ }
+ return NULL;
}
EXPORT_SYMBOL_GPL(tty_ldisc_ref);

@@ -334,27 +297,11 @@ EXPORT_SYMBOL_GPL(tty_ldisc_ref);
*
* Undoes the effect of tty_ldisc_ref or tty_ldisc_ref_wait. May
* be called in IRQ context.
- *
- * Locking: takes tty_ldisc_lock
*/

void tty_ldisc_deref(struct tty_ldisc *ld)
{
- unsigned long flags;
-
- if (WARN_ON_ONCE(!ld))
- return;
-
- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
- /*
- * WARNs if one-too-many reader references were released
- * - the last reference must be released with tty_ldisc_put
- */
- WARN_ON(atomic_dec_and_test(&ld->users));
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
-
- if (waitqueue_active(&ld->wq_idle))
- wake_up(&ld->wq_idle);
+ ldsem_up_read(&ld->tty->ldisc_sem);
}
EXPORT_SYMBOL_GPL(tty_ldisc_deref);

@@ -439,26 +386,6 @@ static void __lockfunc tty_ldisc_enable_pair(struct tty_struct *tty,


/**
- * tty_ldisc_enable - allow ldisc use
- * @tty: terminal to activate ldisc on
- *
- * Set the TTY_LDISC flag when the line discipline can be called
- * again. Do necessary wakeups for existing sleepers. Clear the LDISC
- * changing flag to indicate any ldisc change is now over.
- *
- * Note: nobody should set the TTY_LDISC bit except via this function.
- * Clearing directly is allowed.
- */
-
-static void tty_ldisc_enable(struct tty_struct *tty)
-{
- clear_bit(TTY_LDISC_HALTED, &tty->flags);
- set_bit(TTY_LDISC, &tty->flags);
- clear_bit(TTY_LDISC_CHANGING, &tty->flags);
- wake_up(&tty_ldisc_wait);
-}
-
-/**
* tty_ldisc_flush - flush line discipline queue
* @tty: tty
*
@@ -555,14 +482,14 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
int r;

/* There is an outstanding reference here so this is safe */
- old = tty_ldisc_get(old->ops->num);
+ old = tty_ldisc_get(tty, old->ops->num);
WARN_ON(IS_ERR(old));
tty->ldisc = old;
tty_set_termios_ldisc(tty, old->ops->num);
if (tty_ldisc_open(tty, old) < 0) {
tty_ldisc_put(old);
/* This driver is always present */
- new_ldisc = tty_ldisc_get(N_TTY);
+ new_ldisc = tty_ldisc_get(tty, N_TTY);
if (IS_ERR(new_ldisc))
panic("n_tty: get");
tty->ldisc = new_ldisc;
@@ -576,101 +503,6 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
}

/**
- * tty_ldisc_wait_idle - wait for the ldisc to become idle
- * @tty: tty to wait for
- * @timeout: for how long to wait at most
- *
- * Wait for the line discipline to become idle. The discipline must
- * have been halted for this to guarantee it remains idle.
- */
-static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
-{
- long ret;
- ret = wait_event_timeout(tty->ldisc->wq_idle,
- atomic_read(&tty->ldisc->users) == 1, timeout);
- return ret > 0 ? 0 : -EBUSY;
-}
-
-/**
- * tty_ldisc_halt - shut down the line discipline
- * @tty: tty device
- * @o_tty: paired pty device (can be NULL)
- * @timeout: # of jiffies to wait for ldisc refs to be released
- *
- * Shut down the line discipline and work queue for this tty device and
- * its paired pty (if exists). Clearing the TTY_LDISC flag ensures
- * no further references can be obtained, while waiting for existing
- * references to be released ensures no more data is fed to the ldisc.
- *
- * You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex)
- * in order to make sure any currently executing ldisc work is also
- * flushed.
- */
-
-static int tty_ldisc_halt(struct tty_struct *tty, struct tty_struct *o_tty,
- long timeout)
-{
- int retval;
-
- clear_bit(TTY_LDISC, &tty->flags);
- if (o_tty)
- clear_bit(TTY_LDISC, &o_tty->flags);
-
- retval = tty_ldisc_wait_idle(tty, timeout);
- if (!retval && o_tty)
- retval = tty_ldisc_wait_idle(o_tty, timeout);
- if (retval)
- return retval;
-
- set_bit(TTY_LDISC_HALTED, &tty->flags);
- if (o_tty)
- set_bit(TTY_LDISC_HALTED, &o_tty->flags);
-
- return 0;
-}
-
-/**
- * tty_ldisc_hangup_halt - halt the line discipline for hangup
- * @tty: tty being hung up
- *
- * Shut down the line discipline and work queue for the tty device
- * being hungup. Clear the TTY_LDISC flag to ensure no further
- * references can be obtained and wait for remaining references to be
- * released to ensure no more data is fed to this ldisc.
- * Caller must hold legacy and ->ldisc_mutex.
- *
- * NB: tty_set_ldisc() is prevented from changing the ldisc concurrently
- * with this function by checking the TTY_HUPPING flag.
- */
-static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
-{
- char cur_n[TASK_COMM_LEN], tty_n[64];
- long timeout = 3 * HZ;
-
- clear_bit(TTY_LDISC, &tty->flags);
-
- if (tty->ldisc) { /* Not yet closed */
- tty_unlock(tty);
-
- while (tty_ldisc_wait_idle(tty, timeout) == -EBUSY) {
- timeout = MAX_SCHEDULE_TIMEOUT;
- printk_ratelimited(KERN_WARNING
- "%s: waiting (%s) for %s took too long, but we keep waiting...\n",
- __func__, get_task_comm(cur_n, current),
- tty_name(tty, tty_n));
- }
-
- set_bit(TTY_LDISC_HALTED, &tty->flags);
-
- /* must reacquire both locks and preserve lock order */
- mutex_unlock(&tty->ldisc_mutex);
- tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);
- }
- return !!tty->ldisc;
-}
-
-/**
* tty_set_ldisc - set line discipline
* @tty: the terminal to set
* @ldisc: the line discipline
@@ -679,103 +511,45 @@ static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
* context. The ldisc change logic has to protect itself against any
* overlapping ldisc change (including on the other end of pty pairs),
* the close of one side of a tty/pty pair, and eventually hangup.
- *
- * Locking: takes tty_ldisc_lock, termios_mutex
*/

int tty_set_ldisc(struct tty_struct *tty, int ldisc)
{
int retval;
struct tty_ldisc *o_ldisc, *new_ldisc;
- struct tty_struct *o_tty;
+ struct tty_struct *o_tty = tty->link;

- new_ldisc = tty_ldisc_get(ldisc);
+ new_ldisc = tty_ldisc_get(tty, ldisc);
if (IS_ERR(new_ldisc))
return PTR_ERR(new_ldisc);

- tty_lock(tty);
- /*
- * We need to look at the tty locking here for pty/tty pairs
- * when both sides try to change in parallel.
- */
-
- o_tty = tty->link; /* o_tty is the pty side or NULL */
-
+ retval = tty_ldisc_lock_pair_timeout(tty, o_tty, 5 * HZ);
+ if (retval)
+ return retval;

/*
* Check the no-op case
*/

if (tty->ldisc->ops->num == ldisc) {
- tty_unlock(tty);
+ tty_ldisc_enable_pair(tty, o_tty);
tty_ldisc_put(new_ldisc);
return 0;
}

- mutex_lock(&tty->ldisc_mutex);
-
- /*
- * We could be midstream of another ldisc change which has
- * dropped the lock during processing. If so we need to wait.
- */
-
- while (test_bit(TTY_LDISC_CHANGING, &tty->flags)) {
- mutex_unlock(&tty->ldisc_mutex);
- tty_unlock(tty);
- wait_event(tty_ldisc_wait,
- test_bit(TTY_LDISC_CHANGING, &tty->flags) == 0);
- tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);
- }
-
- set_bit(TTY_LDISC_CHANGING, &tty->flags);
-
- /*
- * No more input please, we are switching. The new ldisc
- * will update this value in the ldisc open function
- */
-
+ /* FIXME: why 'shutoff' input if the ldisc is locked? */
tty->receive_room = 0;

o_ldisc = tty->ldisc;
-
- tty_unlock(tty);
- /*
- * Make sure we don't change while someone holds a
- * reference to the line discipline. The TTY_LDISC bit
- * prevents anyone taking a reference once it is clear.
- * We need the lock to avoid racing reference takers.
- *
- * We must clear the TTY_LDISC bit here to avoid a livelock
- * with a userspace app continually trying to use the tty in
- * parallel to the change and re-referencing the tty.
- */
-
- retval = tty_ldisc_halt(tty, o_tty, 5 * HZ);
-
- /*
- * Wait for hangup to complete, if pending.
- * We must drop the mutex here in case a hangup is also in process.
- */
-
- mutex_unlock(&tty->ldisc_mutex);
-
- flush_work(&tty->hangup_work);
-
tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);

- /* handle wait idle failure locked */
- if (retval) {
- tty_ldisc_put(new_ldisc);
- goto enable;
- }
+ /* FIXME: for testing only */
+ WARN_ON(test_bit(TTY_HUPPED, &tty->flags));

if (test_bit(TTY_HUPPING, &tty->flags)) {
/* We were raced by the hangup method. It will have stomped
the ldisc data and closed the ldisc down */
- clear_bit(TTY_LDISC_CHANGING, &tty->flags);
- mutex_unlock(&tty->ldisc_mutex);
+ tty_ldisc_enable_pair(tty, o_tty);
tty_ldisc_put(new_ldisc);
tty_unlock(tty);
return -EIO;
@@ -804,14 +578,10 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)

tty_ldisc_put(o_ldisc);

-enable:
/*
* Allow ldisc referencing to occur again
*/
-
- tty_ldisc_enable(tty);
- if (o_tty)
- tty_ldisc_enable(o_tty);
+ tty_ldisc_enable_pair(tty, o_tty);

/* Restart the work queue in case no characters kick it off. Safe if
already running */
@@ -819,7 +589,6 @@ enable:
if (o_tty)
schedule_work(&o_tty->port->buf.work);

- mutex_unlock(&tty->ldisc_mutex);
tty_unlock(tty);
return retval;
}
@@ -852,7 +621,7 @@ static void tty_reset_termios(struct tty_struct *tty)

static int tty_ldisc_reinit(struct tty_struct *tty, int ldisc)
{
- struct tty_ldisc *ld = tty_ldisc_get(ldisc);
+ struct tty_ldisc *ld = tty_ldisc_get(tty, ldisc);

if (IS_ERR(ld))
return -1;
@@ -891,14 +660,8 @@ void tty_ldisc_hangup(struct tty_struct *tty)

tty_ldisc_debug(tty, "closing ldisc: %p\n", tty->ldisc);

- /*
- * FIXME! What are the locking issues here? This may me overdoing
- * things... This question is especially important now that we've
- * removed the irqlock.
- */
ld = tty_ldisc_ref(tty);
if (ld != NULL) {
- /* We may have no line discipline at this point */
if (ld->ops->flush_buffer)
ld->ops->flush_buffer(tty);
tty_driver_flush_buffer(tty);
@@ -909,21 +672,22 @@ void tty_ldisc_hangup(struct tty_struct *tty)
ld->ops->hangup(tty);
tty_ldisc_deref(ld);
}
- /*
- * FIXME: Once we trust the LDISC code better we can wait here for
- * ldisc completion and fix the driver call race
- */
+
wake_up_interruptible_poll(&tty->write_wait, POLLOUT);
wake_up_interruptible_poll(&tty->read_wait, POLLIN);
+
+ tty_unlock(tty);
+
/*
* Shutdown the current line discipline, and reset it to
* N_TTY if need be.
*
* Avoid racing set_ldisc or tty_ldisc_release
*/
- mutex_lock(&tty->ldisc_mutex);
+ tty_ldisc_lock_pair(tty, tty->link);
+ tty_lock(tty);

- if (tty_ldisc_hangup_halt(tty)) {
+ if (tty->ldisc) {

/* At this point we have a halted ldisc; we want to close it and
reopen a new ldisc. We could defer the reopen to the next
@@ -942,9 +706,8 @@ void tty_ldisc_hangup(struct tty_struct *tty)
BUG_ON(tty_ldisc_reinit(tty, N_TTY));
WARN_ON(tty_ldisc_open(tty, tty->ldisc));
}
- tty_ldisc_enable(tty);
}
- mutex_unlock(&tty->ldisc_mutex);
+ tty_ldisc_enable_pair(tty, tty->link);
if (reset)
tty_reset_termios(tty);

@@ -976,15 +739,12 @@ int tty_ldisc_setup(struct tty_struct *tty, struct tty_struct *o_tty)
tty_ldisc_close(tty, ld);
return retval;
}
- tty_ldisc_enable(o_tty);
}
- tty_ldisc_enable(tty);
return 0;
}

static void tty_ldisc_kill(struct tty_struct *tty)
{
- mutex_lock(&tty->ldisc_mutex);
/*
* Now kill off the ldisc
*/
@@ -995,7 +755,6 @@ static void tty_ldisc_kill(struct tty_struct *tty)

/* Ensure the next open requests the N_TTY ldisc */
tty_set_termios_ldisc(tty, N_TTY);
- mutex_unlock(&tty->ldisc_mutex);
}

/**
@@ -1017,15 +776,16 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)

tty_ldisc_debug(tty, "closing ldisc: %p\n", tty->ldisc);

- tty_ldisc_halt(tty, o_tty, MAX_SCHEDULE_TIMEOUT);
-
+ tty_ldisc_lock_pair(tty, o_tty);
tty_lock_pair(tty, o_tty);
- /* This will need doing differently if we need to lock */
+
tty_ldisc_kill(tty);
if (o_tty)
tty_ldisc_kill(o_tty);

tty_unlock_pair(tty, o_tty);
+ tty_ldisc_unlock_pair(tty, o_tty);
+
/* And the memory resources remaining (buffers, termios) will be
disposed of when the kref hits zero */

@@ -1042,7 +802,7 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)

void tty_ldisc_init(struct tty_struct *tty)
{
- struct tty_ldisc *ld = tty_ldisc_get(N_TTY);
+ struct tty_ldisc *ld = tty_ldisc_get(tty, N_TTY);
if (IS_ERR(ld))
panic("n_tty: init_tty");
tty->ldisc = ld;
diff --git a/include/linux/tty.h b/include/linux/tty.h
index bfa6fca..2c109a3 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -238,7 +238,7 @@ struct tty_struct {
int index;

/* Protects ldisc changes: Lock tty not pty */
- struct mutex ldisc_mutex;
+ struct ld_semaphore ldisc_sem;
struct tty_ldisc *ldisc;

struct mutex atomic_write_lock;
@@ -306,8 +306,6 @@ struct tty_file_private {
#define TTY_DO_WRITE_WAKEUP 5 /* Call write_wakeup after queuing new */
#define TTY_PUSH 6 /* n_tty private */
#define TTY_CLOSING 7 /* ->close() in progress */
-#define TTY_LDISC 9 /* Line discipline attached */
-#define TTY_LDISC_CHANGING 10 /* Line discipline changing */
#define TTY_LDISC_OPEN 11 /* Line discipline is open */
#define TTY_HW_COOK_OUT 14 /* Hardware can do output cooking */
#define TTY_HW_COOK_IN 15 /* Hardware can do input cooking */
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index bfbe41a..6ee666f 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -196,8 +196,7 @@ struct tty_ldisc_ops {

struct tty_ldisc {
struct tty_ldisc_ops *ops;
- atomic_t users;
- wait_queue_head_t wq_idle;
+ struct tty_struct *tty;
};

#define TTY_LDISC_MAGIC 0x5403
--
1.8.1.2

2013-03-11 21:23:10

by Peter Hurley

Subject: [PATCH v5 40/44] tty: Simplify lock taking for waiting writers

Rather than granting the lock from the wakeup thread,
have the woken writer claim the lock itself. This may
delay lock acquisition by the waiting writer (readers may
have bumped the semaphore count but can't back it out until
they acquire the wait_lock to put themselves on the
read_wait list). However, this step is necessary to
implement write lock stealing.
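
A minimal userspace sketch of the new waiter loop. It assumes pthreads in place of the kernel scheduler and a 64-bit count layout modeled on ldsem; sem_model, wake_writer(), down_write_wait() and deadline_ms() are hypothetical names. For simplicity the count is protected by wait_lock here, whereas ldsem updates it atomically outside the lock. The waker only signals. The woken writer re-checks and claims the lock itself under wait_lock, and makes one last attempt after its timeout expires:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical count layout, modeled on ldsem with 64-bit longs. */
#define ACTIVE_BIAS	1LL
#define ACTIVE_MASK	0xffffffffLL
#define WAIT_BIAS	(-ACTIVE_MASK - 1)
#define WRITE_BIAS	(WAIT_BIAS + ACTIVE_BIAS)

struct sem_model {
	pthread_mutex_t wait_lock;
	pthread_cond_t wake;	/* stands in for wake_up_process() */
	long long count;	/* protected by wait_lock in this model */
};

static void sem_model_init(struct sem_model *s, long long count)
{
	pthread_mutex_init(&s->wait_lock, NULL);
	pthread_cond_init(&s->wake, NULL);
	s->count = count;
}

/* Waker: no longer grants the lock, it only wakes the waiter. */
static void wake_writer(struct sem_model *s)
{
	pthread_mutex_lock(&s->wait_lock);
	pthread_cond_signal(&s->wake);
	pthread_mutex_unlock(&s->wait_lock);
}

/* Claim the lock only if the active part goes from 0 to 1. */
static bool writer_trylock(struct sem_model *s)
{
	if ((s->count & ACTIVE_MASK) != 0)
		return false;
	s->count += ACTIVE_BIAS;
	return true;
}

/* Waiter: queue with WAIT_BIAS (ldsem gets here by backing out the
 * active part of its failed WRITE_BIAS), then try to claim the lock
 * each time it wakes. Returns false if 'deadline' passes first. */
static bool down_write_wait(struct sem_model *s,
			    const struct timespec *deadline)
{
	bool locked;

	pthread_mutex_lock(&s->wait_lock);
	s->count += WAIT_BIAS;
	while (!(locked = writer_trylock(s))) {
		if (pthread_cond_timedwait(&s->wake, &s->wait_lock,
					   deadline) == ETIMEDOUT) {
			locked = writer_trylock(s);	/* one last try */
			break;
		}
	}
	if (!locked)
		s->count -= WAIT_BIAS;	/* timed out: withdraw the wait */
	pthread_mutex_unlock(&s->wait_lock);
	return locked;
}

/* Absolute CLOCK_REALTIME deadline 'ms' milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}
```

Because ownership is only ever taken by the waiter, a waker that races with a timeout can no longer leave a "granted but timed out" lock behind. This is the case the removed cleanup code had to handle.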

Derived from Michel Lespinasse's write lock stealing work on
rwsem.

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 52 +++++++++++++++----------------------------------
1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 48f1ce8..372e897 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -163,23 +163,9 @@ static inline int writer_trylock(struct ld_semaphore *sem)
static void __ldsem_wake_writer(struct ld_semaphore *sem)
{
struct ldsem_waiter *waiter;
- struct task_struct *tsk;

waiter = list_entry(sem->write_wait.next, struct ldsem_waiter, list);
-
- if (!writer_trylock(sem))
- return;
-
- /* We must be careful not to touch 'waiter' after we set ->task = NULL.
- * It is an allocated on the waiter's stack and may become invalid at
- * any time after that point (due to a wakeup from another source).
- */
- list_del(&waiter->list);
- tsk = waiter->task;
- smp_mb();
- waiter->task = NULL;
- wake_up_process(tsk);
- put_task_struct(tsk);
+ wake_up_process(waiter->task);
}

/*
@@ -271,6 +257,7 @@ down_write_failed(struct ld_semaphore *sem, long timeout)
{
struct ldsem_waiter waiter;
long adjust = -LDSEM_ACTIVE_BIAS;
+ int locked = 0;

/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
@@ -285,36 +272,29 @@ down_write_failed(struct ld_semaphore *sem, long timeout)
if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
__ldsem_wake(sem);

- raw_spin_unlock_irq(&sem->wait_lock);
-
- /* wait to be given the lock */
+ set_current_state(TASK_UNINTERRUPTIBLE);
for (;;) {
- set_current_state(TASK_UNINTERRUPTIBLE);
-
- if (!waiter.task)
- break;
if (!timeout)
break;
+ raw_spin_unlock_irq(&sem->wait_lock);
timeout = schedule_timeout(timeout);
+ raw_spin_lock_irq(&sem->wait_lock);
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if ((locked = writer_trylock(sem)))
+ break;
}

+ list_del(&waiter.list);
+ raw_spin_unlock_irq(&sem->wait_lock);
+ put_task_struct(waiter.task);
+
__set_current_state(TASK_RUNNING);

- if (!timeout) {
- /* lock timed out but check if this task was just
- * granted lock ownership - if so, pretend there
- * was no timeout; otherwise, cleanup lock wait */
- raw_spin_lock_irq(&sem->wait_lock);
- if (waiter.task) {
- ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem);
- list_del(&waiter.list);
- put_task_struct(waiter.task);
- raw_spin_unlock_irq(&sem->wait_lock);
- return NULL;
- }
- raw_spin_unlock_irq(&sem->wait_lock);
+ /* lock wait may have timed out */
+ if (!locked) {
+ ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem);
+ return NULL;
}
-
return sem;
}

--
1.8.1.2

2013-03-11 21:23:31

by Peter Hurley

Subject: [PATCH v5 41/44] tty: Implement ldsem fast path write lock stealing

Even if writers are already waiting, a new writer that
finds no active lock holders simply grabs the lock.
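
A minimal C11 sketch of the stealing trylock. It assumes <stdatomic.h> in place of atomic_long_cmpxchg() and a hypothetical 64-bit count layout modeled on ldsem; write_trylock_steal() is not a kernel function. The loop gives up only once it observes an active locker. Queued waiters alone, which show up as LD_WAIT_BIAS in the upper half of the count, do not stop a new writer:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical count layout, modeled on ldsem with 64-bit longs. */
#define LD_ACTIVE_BIAS	1LL
#define LD_ACTIVE_MASK	0xffffffffLL
#define LD_WAIT_BIAS	(-LD_ACTIVE_MASK - 1)
#define LD_WRITE_BIAS	(LD_WAIT_BIAS + LD_ACTIVE_BIAS)

/* Write trylock that may steal the lock: it succeeds whenever no
 * locker is active, even if waiters have already added LD_WAIT_BIAS. */
static bool write_trylock_steal(atomic_llong *count)
{
	long long c = atomic_load(count);

	while ((c & LD_ACTIVE_MASK) == 0) {
		/* On failure, c is refreshed with the current count and
		 * the active-part test is re-evaluated. */
		if (atomic_compare_exchange_weak(count, &c,
						 c + LD_WRITE_BIAS))
			return true;
	}
	return false;
}
```

The old version only succeeded from LDSEM_UNLOCKED exactly. Comparing against the active part instead is what lets a running writer take the lock ahead of one that is still being woken.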

Derived from Michel Lespinasse's write lock stealing work on
rwsem.

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 372e897..fd95950 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -325,7 +325,7 @@ static inline int __ldsem_down_write_nested(struct ld_semaphore *sem,

count = atomic_long_add_return(LDSEM_WRITE_BIAS,
(atomic_long_t *)&sem->count);
- if (count != LDSEM_WRITE_BIAS) {
+ if ((count & LDSEM_ACTIVE_MASK) != LDSEM_ACTIVE_BIAS) {
lock_stat(sem, contended);
if (!down_write_failed(sem, timeout)) {
lockdep_release(sem, 1, _RET_IP_);
@@ -380,12 +380,16 @@ int ldsem_down_write_trylock(struct ld_semaphore *sem)
{
long count;

- count = atomic_long_cmpxchg(&sem->count, LDSEM_UNLOCKED,
- LDSEM_WRITE_BIAS);
- if (count == LDSEM_UNLOCKED) {
- lockdep_acquire(sem, 0, 1, _RET_IP_);
- lock_stat(sem, acquired);
- return 1;
+ while (((count = sem->count) & LDSEM_ACTIVE_MASK) == 0) {
+ long tmp;
+
+ tmp = atomic_long_cmpxchg(&sem->count, count,
+ count + LDSEM_WRITE_BIAS);
+ if (count == tmp) {
+ lockdep_acquire(sem, 0, 1, _RET_IP_);
+ lock_stat(sem, acquired);
+ return 1;
+ }
}
return 0;
}
--
1.8.1.2

2013-03-11 21:24:35

by Peter Hurley

Subject: [PATCH v5 38/44] tty: Drop wake type optimization

Prepare to implement write lock stealing. Since a writer might
grab the lock from waiting readers, the LDSEM_WAKE_NO_CHECK
optimization is no longer safe. Instead, waiting readers must be
granted the lock in the same way as the other lock grants;
i.e., the sem count is optimistically updated and, if the result
shows the lock was granted, the readers are woken. If the result
shows the lock was not granted, the grant is reversed.
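
A minimal C11 sketch of the optimistic grant with reversal. It assumes a 64-bit count layout modeled on ldsem, and grant_waiting_readers() is a hypothetical name; the patch itself uses sem->wait_readers and ldsem_atomic_update(). Each queued reader is converted from one wait bias into one active bias. A positive result means the upper (wait) half of the count is zero, so no writer holds the lock and nothing else is queued:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical count layout, modeled on ldsem with 64-bit longs. */
#define LD_ACTIVE_BIAS	1LL
#define LD_ACTIVE_MASK	0xffffffffLL
#define LD_WAIT_BIAS	(-LD_ACTIVE_MASK - 1)
#define LD_WRITE_BIAS	(LD_WAIT_BIAS + LD_ACTIVE_BIAS)

/* Try to turn nr_readers queued read waits into active read locks.
 * Returns true if the grant stuck (the caller would then wake them). */
static bool grant_waiting_readers(atomic_llong *count, unsigned nr_readers)
{
	long long adjust = (long long)nr_readers *
			   (LD_ACTIVE_BIAS - LD_WAIT_BIAS);

	for (;;) {
		/* Optimistically apply the grant. */
		long long c = atomic_fetch_add(count, adjust) + adjust;
		if (c > 0)
			return true;

		/* A writer holds (or stole) the lock: reverse the grant. */
		c = atomic_fetch_sub(count, adjust) - adjust;
		if (c + adjust < 0)
			return false;	/* still write-locked */
		/* The writer released in between: try again. */
	}
}
```

When the grant is reversed, the readers stay on read_wait. They are granted later, when the writer that beat them releases the lock and triggers another wakeup.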

Derived from Michel Lespinasse's write lock stealing work on rwsem.

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 56 +++++++++++++++++------------------------------
include/linux/tty_ldisc.h | 1 +
2 files changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index ddfbdfe..d2f091a 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -78,12 +78,6 @@ struct ldsem_waiter {
struct task_struct *task;
};

-/* Wake types for __ldsem_wake(). Note: RWSEM_WAKE_NO_CHECK implies
- * the spinlock must have been kept held since the ldsem value was observed.
- */
-#define LDSEM_WAKE_NORMAL 0 /* All race conditions checked */
-#define LDSEM_WAKE_NO_CHECK 1 /* Reader wakeup can skip race checking */
-
static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
{
return atomic_long_add_return(delta, (atomic_long_t *)&sem->count);
@@ -104,44 +98,32 @@ void __init_ldsem(struct ld_semaphore *sem, const char *name,
lockdep_init_map(&sem->dep_map, name, key, 0);
#endif
sem->count = LDSEM_UNLOCKED;
+ sem->wait_readers = 0;
raw_spin_lock_init(&sem->wait_lock);
INIT_LIST_HEAD(&sem->read_wait);
INIT_LIST_HEAD(&sem->write_wait);
}

-static void __ldsem_wake_readers(struct ld_semaphore *sem, int wake_type)
+static void __ldsem_wake_readers(struct ld_semaphore *sem)
{
struct ldsem_waiter *waiter, *next;
struct task_struct *tsk;
long adjust;

- /* If we come here from up_xxxx(), another thread might have reached
- * down_failed() before we acquired the spinlock and
- * woken up a waiter, making it now active. We prefer to check for
- * this first in order to not spend too much time with the spinlock
- * held if we're not going to be able to wake up readers in the end.
- *
- * Note that we do not need to update the ldsem count: any writer
- * trying to acquire ldsem will run down_write_failed() due
- * to the waiting threads and block trying to acquire the spinlock.
- *
- * We use a dummy atomic update in order to acquire the cache line
- * exclusively since we expect to succeed and run the final ldsem
- * count adjustment pretty soon.
- */
- if (wake_type == LDSEM_WAKE_NORMAL &&
- (ldsem_atomic_update(0, sem) & LDSEM_ACTIVE_MASK) != 0)
- /* Someone grabbed the sem for write already */
- return;
-
- /* Grant read locks to all readers on the read wait list.
+ /* Try to grant read locks to all readers on the read wait list.
* Note the 'active part' of the count is incremented by
* the number of readers before waking any processes up.
*/
- adjust = 0;
- list_for_each_entry(waiter, &sem->read_wait, list)
- adjust += LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS;
- ldsem_atomic_update(adjust, sem);
+ adjust = sem->wait_readers * (LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS);
+ do {
+ long count;
+ count = ldsem_atomic_update(adjust, sem);
+ if (count > 0)
+ break;
+ count = ldsem_atomic_update(-adjust, sem);
+ if (count + adjust < 0)
+ return;
+ } while (1);

list_for_each_entry_safe(waiter, next, &sem->read_wait, list) {
tsk = waiter->task;
@@ -151,6 +133,7 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem, int wake_type)
put_task_struct(tsk);
}
INIT_LIST_HEAD(&sem->read_wait);
+ sem->wait_readers = 0;
}

static void __ldsem_wake_writer(struct ld_semaphore *sem)
@@ -199,12 +182,12 @@ static void __ldsem_wake_writer(struct ld_semaphore *sem)
* - the spinlock must be held by the caller
* - woken process blocks are discarded from the list after having task zeroed
*/
-static void __ldsem_wake(struct ld_semaphore *sem, int wake_type)
+static void __ldsem_wake(struct ld_semaphore *sem)
{
if (!list_empty(&sem->write_wait))
__ldsem_wake_writer(sem);
else if (!list_empty(&sem->read_wait))
- __ldsem_wake_readers(sem, wake_type);
+ __ldsem_wake_readers(sem);
}

static void ldsem_wake(struct ld_semaphore *sem)
@@ -212,7 +195,7 @@ static void ldsem_wake(struct ld_semaphore *sem)
unsigned long flags;

raw_spin_lock_irqsave(&sem->wait_lock, flags);
- __ldsem_wake(sem, LDSEM_WAKE_NORMAL);
+ __ldsem_wake(sem);
raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
}

@@ -228,6 +211,7 @@ down_read_failed(struct ld_semaphore *sem, long timeout)
/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
list_add_tail(&waiter.list, &sem->read_wait);
+ sem->wait_readers++;

waiter.task = current;
get_task_struct(current);
@@ -236,7 +220,7 @@ down_read_failed(struct ld_semaphore *sem, long timeout)
* if there are no active locks, wake the new lock owner(s)
*/
if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
- __ldsem_wake(sem, LDSEM_WAKE_NO_CHECK);
+ __ldsem_wake(sem);

raw_spin_unlock_irq(&sem->wait_lock);

@@ -291,7 +275,7 @@ down_write_failed(struct ld_semaphore *sem, long timeout)
* if there are no active locks, wake the new lock owner(s)
*/
if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
- __ldsem_wake(sem, LDSEM_WAKE_NO_CHECK);
+ __ldsem_wake(sem);

raw_spin_unlock_irq(&sem->wait_lock);

diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index 6ee666f..272075e 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -117,6 +117,7 @@
struct ld_semaphore {
long count;
raw_spinlock_t wait_lock;
+ unsigned int wait_readers;
struct list_head read_wait;
struct list_head write_wait;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
--
1.8.1.2

2013-03-11 21:15:33

by Peter Hurley

Subject: [PATCH v5 31/44] tty: Clarify ldisc variable

Rename o_ldisc to avoid confusion with the ldisc of the
'other' tty.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index a150f95..9ace119 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -516,7 +516,7 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
int tty_set_ldisc(struct tty_struct *tty, int ldisc)
{
int retval;
- struct tty_ldisc *o_ldisc, *new_ldisc;
+ struct tty_ldisc *old_ldisc, *new_ldisc;
struct tty_struct *o_tty = tty->link;

new_ldisc = tty_ldisc_get(tty, ldisc);
@@ -540,7 +540,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
/* FIXME: why 'shutoff' input if the ldisc is locked? */
tty->receive_room = 0;

- o_ldisc = tty->ldisc;
+ old_ldisc = tty->ldisc;
tty_lock(tty);

/* FIXME: for testing only */
@@ -555,8 +555,8 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
return -EIO;
}

- /* Shutdown the current discipline. */
- tty_ldisc_close(tty, o_ldisc);
+ /* Shutdown the old discipline. */
+ tty_ldisc_close(tty, old_ldisc);

/* Now set up the new line discipline. */
tty->ldisc = new_ldisc;
@@ -566,17 +566,17 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
if (retval < 0) {
/* Back to the old one or N_TTY if we can't */
tty_ldisc_put(new_ldisc);
- tty_ldisc_restore(tty, o_ldisc);
+ tty_ldisc_restore(tty, old_ldisc);
}

/* At this point we hold a reference to the new ldisc and a
a reference to the old ldisc. If we ended up flipping back
to the existing ldisc we have two references to it */

- if (tty->ldisc->ops->num != o_ldisc->ops->num && tty->ops->set_ldisc)
+ if (tty->ldisc->ops->num != old_ldisc->ops->num && tty->ops->set_ldisc)
tty->ops->set_ldisc(tty);

- tty_ldisc_put(o_ldisc);
+ tty_ldisc_put(old_ldisc);

/*
* Allow ldisc referencing to occur again
--
1.8.1.2

2013-03-11 21:24:59

by Peter Hurley

Subject: [PATCH v5 36/44] tty: Inline ldsem down_failed() into down_{read,write}_failed()

Separate the read and write lock paths to simplify handling of
initial acquire failure.

Derived from Michel Lespinasse's write lock stealing work on rwsem.

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index a60d7e3..d849fb85 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -220,12 +220,14 @@ static void ldsem_wake(struct ld_semaphore *sem)
}

/*
- * wait for a lock to be granted
+ * wait for the read lock to be granted
*/
static struct ld_semaphore __sched *
-down_failed(struct ld_semaphore *sem, unsigned flags, long adjust, long timeout)
+down_read_failed(struct ld_semaphore *sem, long timeout)
{
struct ldsem_waiter waiter;
+ long flags = LDSEM_READ_WAIT;
+ long adjust = -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS;

/* set up my own style of waitqueue */
raw_spin_lock_irq(&sem->wait_lock);
@@ -279,22 +281,64 @@ down_failed(struct ld_semaphore *sem, unsigned flags, long adjust, long timeout)
}

/*
- * wait for the read lock to be granted
- */
-static struct ld_semaphore __sched *
-down_read_failed(struct ld_semaphore *sem, long timeout)
-{
- return down_failed(sem, LDSEM_READ_WAIT,
- -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS, timeout);
-}
-
-/*
* wait for the write lock to be granted
*/
static struct ld_semaphore __sched *
down_write_failed(struct ld_semaphore *sem, long timeout)
{
- return down_failed(sem, LDSEM_WRITE_WAIT, -LDSEM_ACTIVE_BIAS, timeout);
+ struct ldsem_waiter waiter;
+ long flags = LDSEM_WRITE_WAIT;
+ long adjust = -LDSEM_ACTIVE_BIAS;
+
+ /* set up my own style of waitqueue */
+ raw_spin_lock_irq(&sem->wait_lock);
+
+ if (flags & LDSEM_READ_WAIT)
+ list_add_tail(&waiter.list, &sem->read_wait);
+ else
+ list_add_tail(&waiter.list, &sem->write_wait);
+
+ waiter.task = current;
+ waiter.flags = flags;
+ get_task_struct(current);
+
+ /* change the lock attempt to a wait --
+ * if there are no active locks, wake the new lock owner(s)
+ */
+ if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
+ __ldsem_wake(sem, LDSEM_WAKE_NO_CHECK);
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ /* wait to be given the lock */
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+
+ if (!waiter.task)
+ break;
+ if (!timeout)
+ break;
+ timeout = schedule_timeout(timeout);
+ }
+
+ __set_current_state(TASK_RUNNING);
+
+ if (!timeout) {
+ /* lock timed out but check if this task was just
+ * granted lock ownership - if so, pretend there
+ * was no timeout; otherwise, cleanup lock wait */
+ raw_spin_lock_irq(&sem->wait_lock);
+ if (waiter.task) {
+ ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem);
+ list_del(&waiter.list);
+ put_task_struct(waiter.task);
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return NULL;
+ }
+ raw_spin_unlock_irq(&sem->wait_lock);
+ }
+
+ return sem;
}


--
1.8.1.2

2013-03-11 21:15:28

by Peter Hurley

Subject: [PATCH v5 12/44] n_tty: Fully initialize ldisc before restarting buffer work

Buffer work may already be pending when the n_tty ldisc is re-opened,
e.g., when setting the ldisc (via the TIOCSETD ioctl) and when hanging up
the tty. Since n_tty_set_room() may restart buffer work, first ensure
the ldisc is completely initialized.

Factor n_tty_set_room() out of reset_buffer_flags() (only 2 callers)
and reorganize n_tty_open() to set termios last; buffer work will
be restarted there if necessary, after the char_map is properly
initialized.

Fixes this WARNING:

[ 549.561769] ------------[ cut here ]------------
[ 549.598755] WARNING: at drivers/tty/n_tty.c:160 n_tty_set_room+0xff/0x130()
[ 549.604058] scheduling buffer work for halted ldisc
[ 549.607741] Pid: 9417, comm: trinity-child28 Tainted: G D W 3.7.0-next-20121217-sasha-00023-g8689ef9 #219
[ 549.652580] Call Trace:
[ 549.662754] [<ffffffff81c432cf>] ? n_tty_set_room+0xff/0x130
[ 549.665458] [<ffffffff8110cae7>] warn_slowpath_common+0x87/0xb0
[ 549.668257] [<ffffffff8110cb71>] warn_slowpath_fmt+0x41/0x50
[ 549.671007] [<ffffffff81c432cf>] n_tty_set_room+0xff/0x130
[ 549.673268] [<ffffffff81c44597>] reset_buffer_flags+0x137/0x150
[ 549.675607] [<ffffffff81c45b71>] n_tty_open+0x131/0x1c0
[ 549.677699] [<ffffffff81c47824>] tty_ldisc_open.isra.5+0x54/0x70
[ 549.680147] [<ffffffff81c482bf>] tty_ldisc_hangup+0x11f/0x1e0
[ 549.682409] [<ffffffff81c3fa17>] __tty_hangup+0x137/0x440
[ 549.684634] [<ffffffff81c3fd49>] tty_vhangup+0x9/0x10
[ 549.686443] [<ffffffff81c4a42c>] pty_close+0x14c/0x160
[ 549.688446] [<ffffffff81c41225>] tty_release+0xd5/0x490
[ 549.690460] [<ffffffff8127d8a2>] __fput+0x122/0x250
[ 549.692577] [<ffffffff8127d9d9>] ____fput+0x9/0x10
[ 549.694534] [<ffffffff811348c2>] task_work_run+0xb2/0xf0
[ 549.696349] [<ffffffff81113c6d>] do_exit+0x36d/0x580
[ 549.698286] [<ffffffff8107d964>] ? syscall_trace_enter+0x24/0x2e0
[ 549.702729] [<ffffffff81113f4a>] do_group_exit+0x8a/0xc0
[ 549.706775] [<ffffffff81113f92>] sys_exit_group+0x12/0x20
[ 549.711088] [<ffffffff83cfab18>] tracesys+0xe1/0xe6
[ 549.728001] ---[ end trace 73eb41728f11f87e ]---

Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/n_tty.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index a786f4e..66ce178 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -219,9 +219,8 @@ static void check_unthrottle(struct tty_struct *tty)
* Locking: tty_read_lock for read fields.
*/

-static void reset_buffer_flags(struct tty_struct *tty)
+static void reset_buffer_flags(struct n_tty_data *ldata)
{
- struct n_tty_data *ldata = tty->disc_data;
unsigned long flags;

raw_spin_lock_irqsave(&ldata->read_lock, flags);
@@ -234,7 +233,6 @@ static void reset_buffer_flags(struct tty_struct *tty)

ldata->canon_head = ldata->canon_data = ldata->erasing = 0;
bitmap_zero(ldata->read_flags, N_TTY_BUF_SIZE);
- n_tty_set_room(tty);
}

static void n_tty_packet_mode_flush(struct tty_struct *tty)
@@ -262,7 +260,8 @@ static void n_tty_packet_mode_flush(struct tty_struct *tty)

static void n_tty_flush_buffer(struct tty_struct *tty)
{
- reset_buffer_flags(tty);
+ reset_buffer_flags(tty->disc_data);
+ n_tty_set_room(tty);

if (tty->link)
n_tty_packet_mode_flush(tty);
@@ -1654,14 +1653,14 @@ static int n_tty_open(struct tty_struct *tty)
goto err_free_bufs;

tty->disc_data = ldata;
- /* indicate buffer work may resume */
- clear_bit(TTY_LDISC_HALTED, &tty->flags);
- reset_buffer_flags(tty);
- tty_unthrottle(tty);
+ reset_buffer_flags(tty->disc_data);
ldata->column = 0;
- n_tty_set_termios(tty, NULL);
tty->minimum_to_wake = 1;
tty->closing = 0;
+ /* indicate buffer work may resume */
+ clear_bit(TTY_LDISC_HALTED, &tty->flags);
+ n_tty_set_termios(tty, NULL);
+ tty_unthrottle(tty);

return 0;
err_free_bufs:
--
1.8.1.2

2013-03-11 21:15:26

by Peter Hurley

Subject: [PATCH v5 16/44] tty: Fix 'deferred reopen' ldisc comment

This comment is a victim of code migration from
"tty: Fix the ldisc hangup race"; re-parent it.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 7f7e1a3..0030d55 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -854,11 +854,12 @@ void tty_ldisc_hangup(struct tty_struct *tty)
*/
mutex_lock(&tty->ldisc_mutex);

- /* At this point we have a closed ldisc and we want to
- reopen it. We could defer this to the next open but
- it means auditing a lot of other paths so this is
- a FIXME */
if (tty_ldisc_hangup_halt(tty)) {
+
+ /* At this point we have a halted ldisc; we want to close it and
+ reopen a new ldisc. We could defer the reopen to the next
+ open but it means auditing a lot of other paths so this is
+ a FIXME */
if (reset == 0) {

if (!tty_ldisc_reinit(tty, tty->termios.c_line))
--
1.8.1.2

2013-03-11 21:26:31

by Peter Hurley

Subject: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rw semaphore

The semantics of a rw semaphore are almost ideally suited
for tty line discipline lifetime management; multiple active
threads obtain "references" (read locks) while performing i/o
to prevent the loss or change of the current line discipline
(write lock).

Unfortunately, the existing rw_semaphore is ill-suited in other
ways:
1) obtaining a "reference" can be recursive, i.e., a reference holder
may attempt to obtain another "reference". Recursive read locks
are not supported by rwsem.
2) The TIOCSETD ioctl (change line discipline) is expected to return an
error if the line discipline cannot be exclusively locked within
5 seconds. Lock wait timeouts are not supported by rwsem.
3) A tty hangup is expected to halt and scrap pending i/o, so
exclusive locking must be prioritized without preventing
existing reference holders from obtaining recursive read locks.
Writer priority is not supported by rwsem.

Add ld_semaphore which implements these requirements in a
semantically and operationally similar way to rw_semaphore.

Read recursion is tracked with a small bitmap indexed by a hash
of 'current' (granting an occasional non-recursive read lock while
a writer is waiting is acceptable, whereas refusing a recursive
read lock, which could deadlock, is not). This comes at a small cost
to write lock speed, as the bitmap is cleared under spinlock when
a write lock is obtained.
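
A rough sketch of the recursion bitmap. It assumes a single 64-bit word and an arbitrary pointer hash. Both are hypothetical, as are the recursion_map helpers, because the patch's bitmap size and hash are not shown here. The property that matters is one-sided. A task that already holds a read lock always finds its bit set, while an unrelated task may also see a set bit through a hash collision:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one 64-bit word indexed by a hash of the task pointer. */
#define RECURSION_BITS	64

struct recursion_map {
	uint64_t bits;
};

static unsigned task_hash(const void *task)
{
	uintptr_t p = (uintptr_t)task;

	p ^= p >> 17;	/* fold high bits into low bits */
	return (unsigned)((p >> 4) % RECURSION_BITS);
}

/* Called when a read lock is granted to 'task'. */
static void mark_reader(struct recursion_map *m, const void *task)
{
	m->bits |= UINT64_C(1) << task_hash(task);
}

/* With a writer waiting, may 'task' still take a read lock?
 * True for every current reader (no false negatives); possibly true
 * for other tasks that collide (false positives). */
static bool may_recurse(const struct recursion_map *m, const void *task)
{
	return (m->bits >> task_hash(task)) & 1;
}

/* Called under the spinlock when the write lock is granted, i.e. when
 * no read locks are held, so no reader loses its bit. */
static void clear_readers(struct recursion_map *m)
{
	m->bits = 0;
}
```

A larger bitmap lowers the collision rate, and with it the chance of admitting a non-recursive reader ahead of a waiting writer.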

Writer priority is handled by separate wait lists for readers and
writers. Pending write waits are prioritized ahead of existing read
waits and block further read locks, except for the aforementioned
read lock recursion.

Wait timeouts are trivial to add, but they change the lock
semantics: lock attempts can now fail (though only by timing out).
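
For comparison, here is a minimal pthread-based sketch of requirements 2) and 3) above: a writer-priority rw lock whose write acquire can time out. It is hypothetical and deliberately simpler than ld_semaphore. It has no recursion bitmap and no lockdep, and its state is guarded by a mutex rather than an atomic count. All wp_* names and deadline_ms() are illustrative:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct wp_rwlock {
	pthread_mutex_t lock;
	pthread_cond_t readers_ok;	/* no writer active or waiting */
	pthread_cond_t writer_ok;	/* lock may have become free */
	int active_readers;
	int waiting_writers;
	bool writer_active;
};

static void wp_init(struct wp_rwlock *l)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->readers_ok, NULL);
	pthread_cond_init(&l->writer_ok, NULL);
	l->active_readers = 0;
	l->waiting_writers = 0;
	l->writer_active = false;
}

static void wp_read_lock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	/* Writer priority: new readers queue behind any waiting writer. */
	while (l->writer_active || l->waiting_writers > 0)
		pthread_cond_wait(&l->readers_ok, &l->lock);
	l->active_readers++;
	pthread_mutex_unlock(&l->lock);
}

static void wp_read_unlock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	if (--l->active_readers == 0)
		pthread_cond_signal(&l->writer_ok);
	pthread_mutex_unlock(&l->lock);
}

/* Returns true if the write lock was taken before 'deadline'
 * (CLOCK_REALTIME), false on timeout. */
static bool wp_write_lock_timed(struct wp_rwlock *l,
				const struct timespec *deadline)
{
	bool ok = true;

	pthread_mutex_lock(&l->lock);
	l->waiting_writers++;
	while (l->writer_active || l->active_readers > 0) {
		if (pthread_cond_timedwait(&l->writer_ok, &l->lock,
					   deadline) == ETIMEDOUT) {
			ok = !l->writer_active && l->active_readers == 0;
			break;
		}
	}
	l->waiting_writers--;
	if (ok)
		l->writer_active = true;
	else if (l->waiting_writers == 0 && !l->writer_active)
		pthread_cond_broadcast(&l->readers_ok);	/* unblock readers */
	pthread_mutex_unlock(&l->lock);
	return ok;
}

static void wp_write_unlock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	l->writer_active = false;
	if (l->waiting_writers > 0)
		pthread_cond_signal(&l->writer_ok);
	else
		pthread_cond_broadcast(&l->readers_ok);
	pthread_mutex_unlock(&l->lock);
}

/* Absolute CLOCK_REALTIME deadline 'ms' milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}
```

A TIOCSETD-style caller would pass something like deadline_ms(5000) and return an error when wp_write_lock_timed() fails. Note that a writer giving up must re-admit blocked readers, or they would stay parked behind a writer that no longer exists.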

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/Makefile | 2 +-
drivers/tty/tty_ldsem.c | 507 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/tty_ldisc.h | 47 +++++
3 files changed, 555 insertions(+), 1 deletion(-)
create mode 100644 drivers/tty/tty_ldsem.c

diff --git a/drivers/tty/Makefile b/drivers/tty/Makefile
index 6b78399..58ad1c0 100644
--- a/drivers/tty/Makefile
+++ b/drivers/tty/Makefile
@@ -1,5 +1,5 @@
obj-$(CONFIG_TTY) += tty_io.o n_tty.o tty_ioctl.o tty_ldisc.o \
- tty_buffer.o tty_port.o tty_mutex.o
+ tty_buffer.o tty_port.o tty_mutex.o tty_ldsem.o
obj-$(CONFIG_LEGACY_PTYS) += pty.o
obj-$(CONFIG_UNIX98_PTYS) += pty.o
obj-$(CONFIG_AUDIT) += tty_audit.o
diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
new file mode 100644
index 0000000..0ab5b09
--- /dev/null
+++ b/drivers/tty/tty_ldsem.c
@@ -0,0 +1,507 @@
+/*
+ * Ldisc rw semaphore
+ *
+ * The ldisc semaphore is semantically a rw_semaphore but which enforces
+ * an alternate policy, namely:
+ * 1) Recursive read locking is allowed
+ * 2) Supports lock wait timeouts
+ * 3) Write waiter has priority, even if lock is already read owned, except:
+ * 4) Write waiter does not prevent recursive locking
+ * 5) Downgrading is not supported (because of #3 & #4 above)
+ *
+ * Implementation notes:
+ * 1) Upper half of semaphore count is a wait count (differs from rwsem
+ * in that rwsem normalizes the upper half to the wait bias)
+ * 2) Lacks overflow checking
+ * 3) Read recursion is tracked with a bitmap indexed by hashed 'current'.
+ * This approach results in some false positives; i.e., a non-recursive
+ * read lock may be granted while a writer is waiting.
+ * However, this approach does not produce false negatives
+ * (i.e., not granting a read lock to a recursive attempt) which might
+ * deadlock.
+ * Testing the bitmap need not be atomic wrt. setting the bitmap
+ * (as the 'current' thread cannot contend with itself); however,
+ * the bitmap is cleared under the wait_lock when a write lock is granted.
+ * Note: increasing the bitmap size reduces the probability of false
+ * positives, and thus the probability of granting a non-recursive
+ * read lock with writer(s) waiting.
+ *
+ * The generic counting was copied and modified from include/asm-generic/rwsem.h
+ * by Paul Mackerras <[email protected]>.
+ *
+ * The scheduling policy was copied and modified from lib/rwsem.c
+ * Written by David Howells ([email protected]).
+ *
+ * Copyright (C) 2013 Peter Hurley <[email protected]>
+ *
+ * This file may be redistributed under the terms of the GNU General Public
+ * License v2.
+ */
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <linux/tty.h>
+#include <linux/sched.h>
+
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define __acq(l, s, t, r, c, n, i) \
+ lock_acquire(&(l)->dep_map, s, t, r, c, n, i)
+# define __rel(l, n, i) \
+ lock_release(&(l)->dep_map, n, i)
+# ifdef CONFIG_PROVE_LOCKING
+# define lockdep_acquire(l, s, t, i) __acq(l, s, t, 0, 2, NULL, i)
+# define lockdep_acquire_nest(l, s, t, n, i) __acq(l, s, t, 0, 2, n, i)
+# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 2, 2, NULL, i)
+# define lockdep_release(l, n, i) __rel(l, n, i)
+# else
+# define lockdep_acquire(l, s, t, i) __acq(l, s, t, 0, 1, NULL, i)
+# define lockdep_acquire_nest(l, s, t, n, i) __acq(l, s, t, 0, 1, n, i)
+# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 2, 1, NULL, i)
+# define lockdep_release(l, n, i) __rel(l, n, i)
+# endif
+#else
+# define lockdep_acquire(l, s, t, i) do { } while (0)
+# define lockdep_acquire_nest(l, s, t, n, i) do { } while (0)
+# define lockdep_acquire_read(l, s, t, i) do { } while (0)
+# define lockdep_release(l, n, i) do { } while (0)
+#endif
+
+#ifdef CONFIG_LOCK_STAT
+# define lock_stat(_lock, stat) lock_##stat(&(_lock)->dep_map, _RET_IP_)
+#else
+# define lock_stat(_lock, stat) do { } while (0)
+#endif
+
+
+#if BITS_PER_LONG == 64
+# define LDSEM_ACTIVE_MASK 0xffffffffL
+#else
+# define LDSEM_ACTIVE_MASK 0x0000ffffL
+#endif
+
+#define LDSEM_UNLOCKED 0L
+#define LDSEM_ACTIVE_BIAS 1L
+#define LDSEM_WAIT_BIAS (-LDSEM_ACTIVE_MASK-1)
+#define LDSEM_READ_BIAS LDSEM_ACTIVE_BIAS
+#define LDSEM_WRITE_BIAS (LDSEM_WAIT_BIAS + LDSEM_ACTIVE_BIAS)
+
+struct ldsem_waiter {
+ struct list_head list;
+ struct task_struct *task;
+ unsigned int flags;
+#define LDSEM_READ_WAIT 0x00000001
+#define LDSEM_WRITE_WAIT 0x00000002
+};
+
+/* Wake types for __ldsem_wake(). Note: LDSEM_WAKE_NO_CHECK implies
+ * the spinlock must have been kept held since the ldsem value was observed.
+ */
+#define LDSEM_WAKE_NORMAL 0 /* All race conditions checked */
+#define LDSEM_WAKE_NO_CHECK 1 /* Reader wakeup can skip race checking */
+
+static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
+{
+ return atomic_long_add_return(delta, (atomic_long_t *)&sem->count);
+}
+
+
+static inline unsigned long __hash_current(void)
+{
+ return (unsigned long)current % TASK_MAP_BITS;
+}
+
+static inline void ldsem_clear_task_map(struct ld_semaphore *sem)
+{
+ bitmap_zero(sem->task_map, TASK_MAP_BITS);
+}
+
+static inline void ldsem_update_task_map(struct ld_semaphore *sem)
+{
+ __set_bit(__hash_current(), sem->task_map);
+}
+
+static inline int ldsem_lock_recursion(struct ld_semaphore *sem)
+{
+ return test_bit(__hash_current(), sem->task_map);
+}
+
+/*
+ * Initialize an ldsem:
+ */
+void __init_ldsem(struct ld_semaphore *sem, const char *name,
+ struct lock_class_key *key)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ /*
+ * Make sure we are not reinitializing a held semaphore:
+ */
+ debug_check_no_locks_freed((void *)sem, sizeof(*sem));
+ lockdep_init_map(&sem->dep_map, name, key, 0);
+#endif
+ sem->count = LDSEM_UNLOCKED;
+ raw_spin_lock_init(&sem->wait_lock);
+ INIT_LIST_HEAD(&sem->read_wait);
+ INIT_LIST_HEAD(&sem->write_wait);
+ ldsem_clear_task_map(sem);
+}
+
+static void __ldsem_wake_readers(struct ld_semaphore *sem, int wake_type)
+{
+ struct ldsem_waiter *waiter, *next;
+ struct task_struct *tsk;
+ long adjust;
+
+ /* If we come here from up_xxxx(), another thread might have reached
+ * down_failed() before we acquired the spinlock and
+ * woken up a waiter, making it now active. We prefer to check for
+ * this first in order to not spend too much time with the spinlock
+ * held if we're not going to be able to wake up readers in the end.
+ *
+ * Note that we do not need to update the ldsem count: any writer
+ * trying to acquire ldsem will run down_write_failed() due
+ * to the waiting threads and block trying to acquire the spinlock.
+ *
+ * We use a dummy atomic update in order to acquire the cache line
+ * exclusively since we expect to succeed and run the final ldsem
+ * count adjustment pretty soon.
+ */
+ if (wake_type == LDSEM_WAKE_NORMAL &&
+ (ldsem_atomic_update(0, sem) & LDSEM_ACTIVE_MASK) != 0)
+ /* Someone grabbed the sem for write already */
+ return;
+
+ /* Grant read locks to all readers on the read wait list.
+ * Note the 'active part' of the count is incremented by
+ * the number of readers before waking any processes up.
+ */
+ adjust = 0;
+ list_for_each_entry(waiter, &sem->read_wait, list)
+ adjust += LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS;
+ ldsem_atomic_update(adjust, sem);
+
+ list_for_each_entry_safe(waiter, next, &sem->read_wait, list) {
+ tsk = waiter->task;
+ smp_mb();
+ waiter->task = NULL;
+ wake_up_process(tsk);
+ put_task_struct(tsk);
+ }
+ INIT_LIST_HEAD(&sem->read_wait);
+}
+
+static void __ldsem_wake_writer(struct ld_semaphore *sem)
+{
+ struct ldsem_waiter *waiter;
+ struct task_struct *tsk;
+
+ waiter = list_entry(sem->write_wait.next, struct ldsem_waiter, list);
+
+ /* only wake this writer if the active part of the count can be
+ * transitioned from 0 -> 1
+ */
+ do {
+ long count;
+
+ count = ldsem_atomic_update(LDSEM_ACTIVE_BIAS, sem);
+ if ((count & LDSEM_ACTIVE_MASK) == 1)
+ break;
+
+ /* Someone grabbed the sem already -
+ * undo the change to the active count, but check for
+ * a transition 1->0
+ */
+ count = ldsem_atomic_update(-LDSEM_ACTIVE_BIAS, sem);
+ if (count & LDSEM_ACTIVE_MASK)
+ return;
+ } while (1);
+
+ /* reset read lock recursion map */
+ ldsem_clear_task_map(sem);
+
+ /* We must be careful not to touch 'waiter' after we set ->task = NULL.
+ * It is allocated on the waiter's stack and may become invalid at
+ * any time after that point (due to a wakeup from another source).
+ */
+ list_del(&waiter->list);
+ tsk = waiter->task;
+ smp_mb();
+ waiter->task = NULL;
+ wake_up_process(tsk);
+ put_task_struct(tsk);
+}
+
+/*
+ * handle the lock release when processes blocked on it can now run
+ * - if we come here from up_xxxx(), then:
+ * - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
+ * - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
+ * - the spinlock must be held by the caller
+ * - woken process blocks are discarded from the list after having task zeroed
+ */
+static void __ldsem_wake(struct ld_semaphore *sem, int wake_type)
+{
+ if (!list_empty(&sem->write_wait))
+ __ldsem_wake_writer(sem);
+ else if (!list_empty(&sem->read_wait))
+ __ldsem_wake_readers(sem, wake_type);
+}
+
+static void ldsem_wake(struct ld_semaphore *sem)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&sem->wait_lock, flags);
+ __ldsem_wake(sem, LDSEM_WAKE_NORMAL);
+ raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+}
+
+/*
+ * wait for a lock to be granted
+ */
+static struct ld_semaphore __sched *
+down_failed(struct ld_semaphore *sem, unsigned flags, long adjust, long timeout)
+{
+ struct ldsem_waiter waiter;
+
+ /* set up my own style of waitqueue */
+ raw_spin_lock_irq(&sem->wait_lock);
+
+ if (flags & LDSEM_READ_WAIT) {
+ /* Handle recursive read locking -- if the reader already has
+ * a read lock then allow lock acquire without waiting
+ * but also without waking other waiters
+ */
+ if (ldsem_lock_recursion(sem)) {
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return sem;
+ }
+ list_add_tail(&waiter.list, &sem->read_wait);
+ } else
+ list_add_tail(&waiter.list, &sem->write_wait);
+
+ waiter.task = current;
+ waiter.flags = flags;
+ get_task_struct(current);
+
+ /* change the lock attempt to a wait --
+ * if there are no active locks, wake the new lock owner(s)
+ */
+ if ((ldsem_atomic_update(adjust, sem) & LDSEM_ACTIVE_MASK) == 0)
+ __ldsem_wake(sem, LDSEM_WAKE_NO_CHECK);
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ /* wait to be given the lock */
+ for (;;) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+
+ if (!waiter.task)
+ break;
+ if (!timeout)
+ break;
+ timeout = schedule_timeout(timeout);
+ }
+
+ __set_current_state(TASK_RUNNING);
+
+ if (!timeout) {
+ /* lock timed out but check if this task was just
+ * granted lock ownership - if so, pretend there
+ * was no timeout; otherwise, cleanup lock wait */
+ raw_spin_lock_irq(&sem->wait_lock);
+ if (waiter.task) {
+ ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem);
+ list_del(&waiter.list);
+ put_task_struct(waiter.task);
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return NULL;
+ }
+ raw_spin_unlock_irq(&sem->wait_lock);
+ }
+
+ return sem;
+}
+
+/*
+ * wait for the read lock to be granted
+ */
+static struct ld_semaphore __sched *
+down_read_failed(struct ld_semaphore *sem, long timeout)
+{
+ return down_failed(sem, LDSEM_READ_WAIT,
+ -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS, timeout);
+}
+
+/*
+ * wait for the write lock to be granted
+ */
+static struct ld_semaphore __sched *
+down_write_failed(struct ld_semaphore *sem, long timeout)
+{
+ return down_failed(sem, LDSEM_WRITE_WAIT, -LDSEM_ACTIVE_BIAS, timeout);
+}
+
+
+
+static inline int __ldsem_down_read_nested(struct ld_semaphore *sem,
+ int subclass, long timeout)
+{
+ lockdep_acquire_read(sem, subclass, 0, _RET_IP_);
+
+ if (atomic_long_inc_return((atomic_long_t *)&sem->count) <= 0) {
+ lock_stat(sem, contended);
+ if (!down_read_failed(sem, timeout)) {
+ lockdep_release(sem, 1, _RET_IP_);
+ return 0;
+ }
+ }
+ lock_stat(sem, acquired);
+
+ /* used for read lock recursion test */
+ ldsem_update_task_map(sem);
+ return 1;
+}
+
+static inline int __ldsem_down_write_nested(struct ld_semaphore *sem,
+ int subclass, long timeout)
+{
+ long count;
+
+ lockdep_acquire(sem, subclass, 0, _RET_IP_);
+
+ raw_spin_lock_irq(&sem->wait_lock);
+
+ count = atomic_long_add_return(LDSEM_WRITE_BIAS,
+ (atomic_long_t *)&sem->count);
+ if (count == LDSEM_WRITE_BIAS) {
+ /* reset read lock recursion map */
+ ldsem_clear_task_map(sem);
+ raw_spin_unlock_irq(&sem->wait_lock);
+ } else {
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ lock_stat(sem, contended);
+ if (!down_write_failed(sem, timeout)) {
+ lockdep_release(sem, 1, _RET_IP_);
+ return 0;
+ }
+ }
+ lock_stat(sem, acquired);
+ return 1;
+}
+
+
+/*
+ * lock for reading -- returns 1 if successful, 0 if timed out
+ */
+int __sched ldsem_down_read(struct ld_semaphore *sem, long timeout)
+{
+ might_sleep();
+ return __ldsem_down_read_nested(sem, 0, timeout);
+}
+
+/*
+ * trylock for reading -- returns 1 if successful, 0 if contention
+ */
+int ldsem_down_read_trylock(struct ld_semaphore *sem)
+{
+ long count;
+
+ while ((count = sem->count) >= 0) {
+ if (count == atomic_long_cmpxchg((atomic_long_t *)&sem->count, count,
+ count + LDSEM_READ_BIAS)) {
+ lockdep_acquire_read(sem, 0, 1, _RET_IP_);
+ lock_stat(sem, acquired);
+
+ ldsem_update_task_map(sem);
+ return 1;
+ }
+ }
+ lock_stat(sem, contended);
+ return 0;
+}
+
+/*
+ * lock for writing -- returns 1 if successful, 0 if timed out
+ */
+int __sched ldsem_down_write(struct ld_semaphore *sem, long timeout)
+{
+ might_sleep();
+ return __ldsem_down_write_nested(sem, 0, timeout);
+}
+
+/*
+ * trylock for writing -- returns 1 if successful, 0 if contention
+ */
+int ldsem_down_write_trylock(struct ld_semaphore *sem)
+{
+ long count;
+
+ raw_spin_lock_irq(&sem->wait_lock);
+
+ count = atomic_long_cmpxchg((atomic_long_t *)&sem->count, LDSEM_UNLOCKED,
+ LDSEM_WRITE_BIAS);
+ if (count == LDSEM_UNLOCKED) {
+ /* reset read lock recursion map */
+ ldsem_clear_task_map(sem);
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ lockdep_acquire(sem, 0, 1, _RET_IP_);
+ lock_stat(sem, acquired);
+ return 1;
+ }
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+ lock_stat(sem, contended);
+ return 0;
+}
+
+/*
+ * release a read lock
+ */
+void ldsem_up_read(struct ld_semaphore *sem)
+{
+ long count;
+
+ lockdep_release(sem, 1, _RET_IP_);
+
+ count = atomic_long_dec_return((atomic_long_t *)&sem->count);
+ if (count < 0 && (count & LDSEM_ACTIVE_MASK) == 0)
+ ldsem_wake(sem);
+}
+
+/*
+ * release a write lock
+ */
+void ldsem_up_write(struct ld_semaphore *sem)
+{
+ long count;
+
+ lockdep_release(sem, 1, _RET_IP_);
+
+ count = atomic_long_sub_return(LDSEM_WRITE_BIAS,
+ (atomic_long_t *)&sem->count);
+ if (count < 0)
+ ldsem_wake(sem);
+}
+
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+int ldsem_down_read_nested(struct ld_semaphore *sem, int subclass, long timeout)
+{
+ might_sleep();
+ return __ldsem_down_read_nested(sem, subclass, timeout);
+}
+
+int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass,
+ long timeout)
+{
+ might_sleep();
+ return __ldsem_down_write_nested(sem, subclass, timeout);
+}
+
+#endif
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index 455a0d7..bbefe71 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -110,6 +110,53 @@
#include <linux/wait.h>
#include <linux/wait.h>

+
+/*
+ * the semaphore definition
+ */
+struct ld_semaphore {
+ long count;
+ raw_spinlock_t wait_lock;
+ struct list_head read_wait;
+ struct list_head write_wait;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+#define TASK_MAP_BITS 157
+ DECLARE_BITMAP(task_map, TASK_MAP_BITS);
+};
+
+extern void __init_ldsem(struct ld_semaphore *sem, const char *name,
+ struct lock_class_key *key);
+
+#define init_ldsem(sem) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __init_ldsem((sem), #sem, &__key); \
+} while (0)
+
+
+extern int ldsem_down_read(struct ld_semaphore *sem, long timeout);
+extern int ldsem_down_read_trylock(struct ld_semaphore *sem);
+extern int ldsem_down_write(struct ld_semaphore *sem, long timeout);
+extern int ldsem_down_write_trylock(struct ld_semaphore *sem);
+extern void ldsem_up_read(struct ld_semaphore *sem);
+extern void ldsem_up_write(struct ld_semaphore *sem);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern int ldsem_down_read_nested(struct ld_semaphore *sem, int subclass,
+ long timeout);
+extern int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass,
+ long timeout);
+#else
+# define ldsem_down_read_nested(sem, subclass, timeout) \
+ ldsem_down_read(sem, timeout)
+# define ldsem_down_write_nested(sem, subclass, timeout) \
+ ldsem_down_write(sem, timeout)
+#endif
+
+
struct tty_ldisc_ops {
int magic;
char *name;
--
1.8.1.2

2013-03-11 21:26:34

by Peter Hurley

Subject: [PATCH v5 27/44] tty: Drop lock contention stat from ldsem trylocks

When lockdep is notified of lock contention, lockdep expects its
internal state to indicate the lock is held. Since trylocks cannot
reflect this state unless the lock is actually acquired, contention
stats cannot be collected for trylocks.

Fixes:
[ 1473.912280] =================================
[ 1473.913180] [ BUG: bad contention detected! ]
[ 1473.914071] 3.8.0-next-20130220-sasha-00038-g1ad55df-dirty #8 Tainted: G W
[ 1473.915684] ---------------------------------
[ 1473.916549] kworker/1:1/361 is trying to contend lock (&tty->ldisc_sem) at:
[ 1473.918031] [<ffffffff81c493df>] tty_ldisc_ref+0x1f/0x60
[ 1473.919060] but there are no locks held!
[ 1473.919813]
[ 1473.919813] other info that might help us debug this:
[ 1473.920044] 2 locks held by kworker/1:1/361:
[ 1473.920044] #0: (events){.+.+.+}, at: [<ffffffff811328a8>] process_one_work+0x228/0x6a0
[ 1473.920044] #1: ((&buf->work)){+.+...}, at: [<ffffffff811328a8>] process_one_work+0x228/0x6a0
[ 1473.920044]
[ 1473.920044] stack backtrace:
[ 1473.920044] Pid: 361, comm: kworker/1:1 Tainted: G W 3.8.0-next-20130220-sasha-00038-g1ad55df-dirty #8
[ 1473.920044] Call Trace:
[ 1473.920044] [<ffffffff81c493df>] ? tty_ldisc_ref+0x1f/0x60
[ 1473.920044] [<ffffffff81182026>] print_lock_contention_bug+0xf6/0x110
[ 1473.920044] [<ffffffff81184973>] lock_contended+0x213/0x4e0
[ 1473.920044] [<ffffffff81c4bb41>] ldsem_down_read_trylock+0xb1/0xc0
[ 1473.920044] [<ffffffff81c493df>] tty_ldisc_ref+0x1f/0x60
[ 1473.920044] [<ffffffff81c4a687>] flush_to_ldisc+0x37/0x1a0
[ 1473.920044] [<ffffffff811329e6>] process_one_work+0x366/0x6a0
[ 1473.920044] [<ffffffff811328a8>] ? process_one_work+0x228/0x6a0
[ 1473.920044] [<ffffffff811332a8>] worker_thread+0x238/0x370
[ 1473.920044] [<ffffffff81133070>] ? rescuer_thread+0x310/0x310
[ 1473.920044] [<ffffffff8113d873>] kthread+0xe3/0xf0
[ 1473.920044] [<ffffffff8113d790>] ? flush_kthread_work+0x1f0/0x1f0
[ 1473.920044] [<ffffffff83dab4fc>] ret_from_fork+0x7c/0xb0
[ 1473.920044] [<ffffffff8113d790>] ? flush_kthread_work+0x1f0/0x1f0

Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldsem.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 0ab5b09..c162295 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -419,7 +419,6 @@ int ldsem_down_read_trylock(struct ld_semaphore *sem)
return 1;
}
}
- lock_stat(sem, contended);
return 0;
}

@@ -453,9 +452,7 @@ int ldsem_down_write_trylock(struct ld_semaphore *sem)
lock_stat(sem, acquired);
return 1;
}
-
raw_spin_unlock_irq(&sem->wait_lock);
- lock_stat(sem, contended);
return 0;
}

--
1.8.1.2

2013-03-11 21:15:24

by Peter Hurley

Subject: [PATCH v5 15/44] tty: Make core responsible for synchronizing its work

The tty core relies on the ldisc layer for synchronizing destruction
of the tty. Instead, the final tty release must wait for any pending tty
work to complete prior to tty destruction.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_io.c | 17 +++++++++++++++++
drivers/tty/tty_ldisc.c | 24 ++++--------------------
2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 3613d8b..9e8ff84 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1464,6 +1464,17 @@ void tty_free_termios(struct tty_struct *tty)
}
EXPORT_SYMBOL(tty_free_termios);

+/**
+ * tty_flush_works - flush all works of a tty
+ * @tty: tty device to flush works for
+ *
+ * Sync flush all works belonging to @tty.
+ */
+static void tty_flush_works(struct tty_struct *tty)
+{
+ flush_work(&tty->SAK_work);
+ flush_work(&tty->hangup_work);
+}

/**
* release_one_tty - release tty structure memory
@@ -1785,6 +1796,12 @@ int tty_release(struct inode *inode, struct file *filp)
* Ask the line discipline code to release its structures
*/
tty_ldisc_release(tty, o_tty);
+
+ /* Wait for pending work before tty destruction commences */
+ tty_flush_works(tty);
+ if (o_tty)
+ tty_flush_works(o_tty);
+
/*
* The release_tty function takes care of the details of clearing
* the slots and preserving the termios structure. The tty_unlock_pair
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index cbb945b..7f7e1a3 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -499,18 +499,6 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
}

/**
- * tty_ldisc_flush_works - flush all works of a tty
- * @tty: tty device to flush works for
- *
- * Sync flush all works belonging to @tty.
- */
-static void tty_ldisc_flush_works(struct tty_struct *tty)
-{
- flush_work(&tty->SAK_work);
- flush_work(&tty->hangup_work);
-}
-
-/**
* tty_ldisc_wait_idle - wait for the ldisc to become idle
* @tty: tty to wait for
* @timeout: for how long to wait at most
@@ -698,13 +686,13 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
retval = tty_ldisc_halt(tty, o_tty, 5 * HZ);

/*
- * Wait for ->hangup_work and ->buf.work handlers to terminate.
+ * Wait for hangup to complete, if pending.
* We must drop the mutex here in case a hangup is also in process.
*/

mutex_unlock(&tty->ldisc_mutex);

- tty_ldisc_flush_works(tty);
+ flush_work(&tty->hangup_work);

tty_lock(tty);
mutex_lock(&tty->ldisc_mutex);
@@ -951,15 +939,11 @@ static void tty_ldisc_kill(struct tty_struct *tty)
void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
{
/*
- * Prevent flush_to_ldisc() from rescheduling the work for later. Then
- * kill any delayed work. As this is the final close it does not
- * race with the set_ldisc code path.
+ * Shutdown this line discipline. As this is the final close,
+ * it does not race with the set_ldisc code path.
*/

tty_ldisc_halt(tty, o_tty, MAX_SCHEDULE_TIMEOUT);
- tty_ldisc_flush_works(tty);
- if (o_tty)
- tty_ldisc_flush_works(o_tty);

tty_lock_pair(tty, o_tty);
/* This will need doing differently if we need to lock */
--
1.8.1.2

2013-03-11 21:15:23

by Peter Hurley

Subject: [PATCH v5 19/44] tty: Don't protect atomic operation with mutex

test_bit() is already atomic; drop mutex lock/unlock.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_io.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 2ac516d..bf33440 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1345,9 +1345,7 @@ static int tty_reopen(struct tty_struct *tty)
}
tty->count++;

- mutex_lock(&tty->ldisc_mutex);
WARN_ON(!test_bit(TTY_LDISC, &tty->flags));
- mutex_unlock(&tty->ldisc_mutex);

return 0;
}
--
1.8.1.2

2013-03-11 21:27:44

by Peter Hurley

Subject: [PATCH v5 25/44] tty: Fix recursive deadlock in tty_perform_flush()

tty_perform_flush() can deadlock when called while holding
a line discipline reference. By definition, all ldisc drivers
hold an ldisc reference, so calls originating from ldisc drivers
must not block for an ldisc reference.

The deadlock can occur when:
CPU 0 | CPU 1
|
tty_ldisc_ref(tty) |
.... | <line discipline halted>
tty_ldisc_ref_wait(tty) |
|

CPU 0 cannot progress because it cannot obtain an ldisc reference
while the line discipline is halted (no new references
are granted).
CPU 1 cannot progress because an outstanding ldisc reference
has not been released.
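
The patch below splits tty_perform_flush() into a public entry point that takes the reference and an internal helper whose caller already holds one. The following toy model shows why that split avoids the deadlock. Every toy_* name is hypothetical. Returning -1 stands in for tty_ldisc_ref_wait() sleeping until the halt completes, which in the real scenario never happens.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 'halted' models a pending ldisc halt that refuses
 * new references; 'refs' counts outstanding references */
struct toy_tty {
    bool halted;
    int refs;
    int flushes;
};

/* Non-blocking model of tty_ldisc_ref_wait(): false where the real call sleeps */
static bool toy_ref(struct toy_tty *t)
{
    if (t->halted)
        return false;
    t->refs++;
    return true;
}

static void toy_deref(struct toy_tty *t)
{
    assert(t->refs > 0);
    t->refs--;
}

/* Caller guarantees a reference is held (like __tty_perform_flush()) */
static int toy_flush_locked(struct toy_tty *t)
{
    assert(t->refs > 0);
    t->flushes++;
    return 0;
}

/* Entry point for callers without a reference (like tty_perform_flush());
 * returns -1 where the real code would block */
static int toy_flush(struct toy_tty *t)
{
    int ret;

    if (!toy_ref(t))
        return -1;
    ret = toy_flush_locked(t);
    toy_deref(t);
    return ret;
}
```

An ldisc ioctl that already holds a reference must call the "locked" variant. Taking a second reference while a halt is pending is the CPU 0 path in the diagram above.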

An in-tree call-tree audit of tty_perform_flush() [1] shows 5
ldisc drivers calling tty_perform_flush() indirectly via
n_tty_ioctl_helper() and 2 ldisc drivers calling directly.
A single tty driver safely uses the function.

[1]
Recursive usage:

/* These functions are line discipline ioctls and thus
* recursive wrt line discipline references */

tty_perform_flush() - drivers/tty/tty_ioctl.c
n_tty_ioctl_helper()
hci_uart_tty_ioctl(default) - drivers/bluetooth/hci_ldisc.c (N_HCI)
n_hdlc_tty_ioctl(default) - drivers/tty/n_hdlc.c (N_HDLC)
gsmld_ioctl(default) - drivers/tty/n_gsm.c (N_GSM0710)
n_tty_ioctl(default) - drivers/tty/n_tty.c (N_TTY)
gigaset_tty_ioctl(default) - drivers/isdn/gigaset/ser-gigaset.c (N_GIGASET_M101)
ppp_synctty_ioctl(TCFLSH) - drivers/net/ppp/ppp_synctty.c
ppp_asynctty_ioctl(TCFLSH) - drivers/net/ppp/ppp_async.c

Non-recursive use:

tty_perform_flush() - drivers/tty/tty_ioctl.c
ipw_ioctl(TCFLSH) - drivers/tty/ipwireless/tty.c
/* This function is a tty i/o ioctl method, which
* is invoked by tty_ioctl() */

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/net/ppp/ppp_async.c | 2 +-
drivers/net/ppp/ppp_synctty.c | 2 +-
drivers/tty/tty_ioctl.c | 28 +++++++++++++++++++---------
3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index a031f6b..9c889e0 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -314,7 +314,7 @@ ppp_asynctty_ioctl(struct tty_struct *tty, struct file *file,
/* flush our buffers and the serial port's buffer */
if (arg == TCIOFLUSH || arg == TCOFLUSH)
ppp_async_flush_output(ap);
- err = tty_perform_flush(tty, arg);
+ err = n_tty_ioctl_helper(tty, file, cmd, arg);
break;

case FIONREAD:
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 1a12033..bdf3b13 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -355,7 +355,7 @@ ppp_synctty_ioctl(struct tty_struct *tty, struct file *file,
/* flush our buffers and the serial port's buffer */
if (arg == TCIOFLUSH || arg == TCOFLUSH)
ppp_sync_flush_output(ap);
- err = tty_perform_flush(tty, arg);
+ err = n_tty_ioctl_helper(tty, file, cmd, arg);
break;

case FIONREAD:
diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c
index d58b92c..935b032 100644
--- a/drivers/tty/tty_ioctl.c
+++ b/drivers/tty/tty_ioctl.c
@@ -1086,14 +1086,12 @@ int tty_mode_ioctl(struct tty_struct *tty, struct file *file,
}
EXPORT_SYMBOL_GPL(tty_mode_ioctl);

-int tty_perform_flush(struct tty_struct *tty, unsigned long arg)
+
+/* Caller guarantees ldisc reference is held */
+static int __tty_perform_flush(struct tty_struct *tty, unsigned long arg)
{
- struct tty_ldisc *ld;
- int retval = tty_check_change(tty);
- if (retval)
- return retval;
+ struct tty_ldisc *ld = tty->ldisc;

- ld = tty_ldisc_ref_wait(tty);
switch (arg) {
case TCIFLUSH:
if (ld && ld->ops->flush_buffer) {
@@ -1111,12 +1109,24 @@ int tty_perform_flush(struct tty_struct *tty, unsigned long arg)
tty_driver_flush_buffer(tty);
break;
default:
- tty_ldisc_deref(ld);
return -EINVAL;
}
- tty_ldisc_deref(ld);
return 0;
}
+
+int tty_perform_flush(struct tty_struct *tty, unsigned long arg)
+{
+ struct tty_ldisc *ld;
+ int retval = tty_check_change(tty);
+ if (retval)
+ return retval;
+
+ ld = tty_ldisc_ref_wait(tty);
+ retval = __tty_perform_flush(tty, arg);
+ if (ld)
+ tty_ldisc_deref(ld);
+ return retval;
+}
EXPORT_SYMBOL_GPL(tty_perform_flush);

int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file,
@@ -1155,7 +1165,7 @@ int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file,
}
return 0;
case TCFLSH:
- return tty_perform_flush(tty, arg);
+ return __tty_perform_flush(tty, arg);
default:
/* Try the mode commands */
return tty_mode_ioctl(tty, file, cmd, arg);
--
1.8.1.2

2013-03-11 21:15:20

by Peter Hurley

Subject: [PATCH v5 18/44] tty: Add ldisc hangup debug messages

Expected typical debug log:
[ 582.721965] tty_open: opening pts3...
[ 582.721970] tty_open: opening pts3...
[ 582.721977] tty_release: pts3 (tty count=3)...
[ 582.721980] tty_release: ptm3 (tty count=1)...
[ 582.722015] pts3 vhangup...
[ 582.722020] tty_ldisc_hangup: pts3: closing ldisc: ffff88007a920540
[ 582.724128] tty_release: pts3 (tty count=2)...
[ 582.724217] tty_ldisc_hangup: pts3: re-opened ldisc: ffff88007a920580
[ 582.724221] tty_release: ptm3: final close
[ 582.724234] tty_ldisc_release: ptm3: closing ldisc: ffff88007a920a80
[ 582.724238] tty_ldisc_release: ptm3: ldisc closed
[ 582.724241] tty_release: ptm3: freeing structure...
[ 582.724741] tty_open: opening pts3...

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 0030d55..328ff5b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -20,6 +20,17 @@
#include <linux/uaccess.h>
#include <linux/ratelimit.h>

+#undef LDISC_DEBUG_HANGUP
+
+#ifdef LDISC_DEBUG_HANGUP
+#define tty_ldisc_debug(tty, f, args...) ({ \
+ char __b[64]; \
+ printk(KERN_DEBUG "%s: %s: " f, __func__, tty_name(tty, __b), ##args); \
+})
+#else
+#define tty_ldisc_debug(tty, f, args...)
+#endif
+
/*
* This guards the refcounted line discipline lists. The lock
* must be taken with irqs off because there are hangup path
@@ -822,6 +833,8 @@ void tty_ldisc_hangup(struct tty_struct *tty)
int reset = tty->driver->flags & TTY_DRIVER_RESET_TERMIOS;
int err = 0;

+ tty_ldisc_debug(tty, "closing ldisc: %p\n", tty->ldisc);
+
/*
* FIXME! What are the locking issues here? This may me overdoing
* things... This question is especially important now that we've
@@ -878,6 +891,8 @@ void tty_ldisc_hangup(struct tty_struct *tty)
mutex_unlock(&tty->ldisc_mutex);
if (reset)
tty_reset_termios(tty);
+
+ tty_ldisc_debug(tty, "re-opened ldisc: %p\n", tty->ldisc);
}

/**
@@ -944,6 +959,8 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
* it does not race with the set_ldisc code path.
*/

+ tty_ldisc_debug(tty, "closing ldisc: %p\n", tty->ldisc);
+
tty_ldisc_halt(tty, o_tty, MAX_SCHEDULE_TIMEOUT);

tty_lock_pair(tty, o_tty);
@@ -955,6 +972,8 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)
tty_unlock_pair(tty, o_tty);
/* And the memory resources remaining (buffers, termios) will be
disposed of when the kref hits zero */
+
+ tty_ldisc_debug(tty, "ldisc closed\n");
}

/**
--
1.8.1.2

2013-03-11 21:28:34

by Peter Hurley

Subject: [PATCH v5 23/44] tty: Locate get/put ldisc functions together

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 46 +++++++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index f26ef1a..4e46c17 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -179,6 +179,29 @@ static struct tty_ldisc *tty_ldisc_get(int disc)
return ld;
}

+/**
+ * tty_ldisc_put - release the ldisc
+ *
+ * Complement of tty_ldisc_get().
+ */
+static inline void tty_ldisc_put(struct tty_ldisc *ld)
+{
+ unsigned long flags;
+
+ if (WARN_ON_ONCE(!ld))
+ return;
+
+ raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+
+ /* unreleased reader reference(s) will cause this WARN */
+ WARN_ON(!atomic_dec_and_test(&ld->users));
+
+ ld->ops->refcount--;
+ module_put(ld->ops->owner);
+ kfree(ld);
+ raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+}
+
static void *tty_ldiscs_seq_start(struct seq_file *m, loff_t *pos)
{
return (*pos < NR_LDISCS) ? pos : NULL;
@@ -329,29 +352,6 @@ void tty_ldisc_deref(struct tty_ldisc *ld)
EXPORT_SYMBOL_GPL(tty_ldisc_deref);

/**
- * tty_ldisc_put - release the ldisc
- *
- * Complement of tty_ldisc_get().
- */
-static inline void tty_ldisc_put(struct tty_ldisc *ld)
-{
- unsigned long flags;
-
- if (WARN_ON_ONCE(!ld))
- return;
-
- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
-
- /* unreleased reader reference(s) will cause this WARN */
- WARN_ON(!atomic_dec_and_test(&ld->users));
-
- ld->ops->refcount--;
- module_put(ld->ops->owner);
- kfree(ld);
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
-}
-
-/**
* tty_ldisc_enable - allow ldisc use
* @tty: terminal to activate ldisc on
*
--
1.8.1.2

2013-03-11 21:28:53

by Peter Hurley

Subject: [PATCH v5 21/44] tty: Document unsafe ldisc reference acquire

Merge get_ldisc() into its only call site.
Note how, after merging, the unsafe acquire of an ldisc reference
is obvious.

CPU 0 in tty_ldisc_try() | CPU 1 in tty_ldisc_halt()
|
test_bit(TTY_LDISC, &tty_flags) |
if (true) | clear_bit(TTY_LDISC, &tty_flags)
tty->ldisc != 0? | atomic_read(&tty->ldisc->users)
if (true) | ret_val == 1?
atomic_inc(&tty->ldisc->users) | if (false)
| wait
|
<goes on assuming safe ldisc use> | <doesn't wait - proceeds w/ close>
|

The spin lock in tty_ldisc_try() does nothing wrt synchronizing
the ldisc halt since it's not acquired as part of halting.
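As an illustration only, the userspace sketch below (C11 with POSIX threads) models one way to close that window: the acquire path and the halt path take the same lock, so no reference can be granted between clearing the enable flag and sampling the user count. struct ldisc_model, model_try_ref(), model_put_ref() and model_halt() are hypothetical names, not tty APIs. The series itself moves ldisc referencing to a new lock (ldsem) rather than using this pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a tty's ldisc state; not the kernel structures. */
struct ldisc_model {
	pthread_mutex_t lock; /* taken by BOTH the acquire and the halt path */
	bool enabled;         /* plays the role of the TTY_LDISC bit */
	int users;            /* outstanding references */
};

/* Grant a reference only while the ldisc is still enabled. */
static bool model_try_ref(struct ldisc_model *ld)
{
	bool ok;

	pthread_mutex_lock(&ld->lock);
	ok = ld->enabled;
	if (ok)
		ld->users++;
	pthread_mutex_unlock(&ld->lock);
	return ok;
}

static void model_put_ref(struct ldisc_model *ld)
{
	pthread_mutex_lock(&ld->lock);
	ld->users--;
	pthread_mutex_unlock(&ld->lock);
}

/*
 * Stop granting references and return how many are still outstanding.
 * The flag is cleared and users is sampled under the same lock that
 * model_try_ref() takes, so no acquire can slip in between the two.
 * A real halt would then sleep until the count drains.
 */
static int model_halt(struct ldisc_model *ld)
{
	int outstanding;

	pthread_mutex_lock(&ld->lock);
	ld->enabled = false;
	outstanding = ld->users;
	pthread_mutex_unlock(&ld->lock);
	return outstanding;
}
```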

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 9362a10..5ee0b2b 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -42,13 +42,6 @@ static DECLARE_WAIT_QUEUE_HEAD(tty_ldisc_wait);
/* Line disc dispatch table */
static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];

-static inline struct tty_ldisc *get_ldisc(struct tty_ldisc *ld)
-{
- if (ld)
- atomic_inc(&ld->users);
- return ld;
-}
-
/**
* tty_register_ldisc - install a line discipline
* @disc: ldisc number
@@ -269,10 +262,13 @@ static struct tty_ldisc *tty_ldisc_try(struct tty_struct *tty)
unsigned long flags;
struct tty_ldisc *ld;

+ /* FIXME: this allows reference acquire after TTY_LDISC is cleared */
raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
ld = NULL;
- if (test_bit(TTY_LDISC, &tty->flags))
- ld = get_ldisc(tty->ldisc);
+ if (test_bit(TTY_LDISC, &tty->flags) && tty->ldisc) {
+ ld = tty->ldisc;
+ atomic_inc(&ld->users);
+ }
raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
return ld;
}
--
1.8.1.2

2013-03-11 21:29:14

by Peter Hurley

Subject: [PATCH v5 11/44] n_tty: Correct unthrottle-with-buffer-flush comments

The driver is no longer unthrottled on buffer reset, so remove
comments that claim it is.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/n_tty.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 0b85693..a786f4e 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -213,9 +213,8 @@ static void check_unthrottle(struct tty_struct *tty)
* reset_buffer_flags - reset buffer state
* @tty: terminal to reset
*
- * Reset the read buffer counters, clear the flags,
- * and make sure the driver is unthrottled. Called
- * from n_tty_open() and n_tty_flush_buffer().
+ * Reset the read buffer counters and clear the flags.
+ * Called from n_tty_open() and n_tty_flush_buffer().
*
* Locking: tty_read_lock for read fields.
*/
@@ -254,17 +253,15 @@ static void n_tty_packet_mode_flush(struct tty_struct *tty)
* n_tty_flush_buffer - clean input queue
* @tty: terminal device
*
- * Flush the input buffer. Called when the line discipline is
- * being closed, when the tty layer wants the buffer flushed (eg
- * at hangup) or when the N_TTY line discipline internally has to
- * clean the pending queue (for example some signals).
+ * Flush the input buffer. Called when the tty layer wants the
+ * buffer flushed (eg at hangup) or when the N_TTY line discipline
+ * internally has to clean the pending queue (for example some signals).
*
* Locking: ctrl_lock, read_lock.
*/

static void n_tty_flush_buffer(struct tty_struct *tty)
{
- /* clear everything and unthrottle the driver */
reset_buffer_flags(tty);

if (tty->link)
--
1.8.1.2

2013-03-11 21:37:33

by Peter Hurley

Subject: Re: [PATCH v5 25/44] tty: Fix recursive deadlock in tty_perform_flush()

[ +cc Paul Mackerras, linux-ppp, netdev ]

I neglected to cc the proper folks. Sorry about that.

Regards,
Peter Hurley

On Mon, 2013-03-11 at 16:44 -0400, Peter Hurley wrote:
> tty_perform_flush() can deadlock when called while holding
> a line discipline reference. By definition, all ldisc drivers
> hold an ldisc reference, so calls originating from ldisc drivers
> must not block for a ldisc reference.
>
> The deadlock can occur when:
> CPU 0 | CPU 1
> |
> tty_ldisc_ref(tty) |
> .... | <line discipline halted>
> tty_ldisc_ref_wait(tty) |
> |
>
> CPU 0 cannot progress because it cannot obtain an ldisc reference
> while the line discipline is halted (thus no new references
> are granted).
> CPU 1 cannot progress because an outstanding ldisc reference
> has not been released.
>
> An in-tree call-tree audit of tty_perform_flush() [1] shows 5
> ldisc drivers calling tty_perform_flush() indirectly via
> n_tty_ioctl_helper() and 2 ldisc drivers calling directly.
> A single tty driver safely uses the function.
>
> [1]
> Recursive usage:
>
> /* These functions are line discipline ioctls and thus
> * recursive wrt line discipline references */
>
> tty_perform_flush() - ./drivers/tty/tty_ioctl.c
> n_tty_ioctl_helper()
> hci_uart_tty_ioctl(default) - drivers/bluetooth/hci_ldisc.c (N_HCI)
> n_hdlc_tty_ioctl(default) - drivers/tty/n_hdlc.c (N_HDLC)
> gsmld_ioctl(default) - drivers/tty/n_gsm.c (N_GSM0710)
> n_tty_ioctl(default) - drivers/tty/n_tty.c (N_TTY)
> gigaset_tty_ioctl(default) - drivers/isdn/gigaset/ser-gigaset.c (N_GIGASET_M101)
> ppp_synctty_ioctl(TCFLSH) - drivers/net/ppp/ppp_synctty.c
> ppp_asynctty_ioctl(TCFLSH) - drivers/net/ppp/ppp_async.c
>
> Non-recursive use:
>
> tty_perform_flush() - drivers/tty/tty_ioctl.c
> ipw_ioctl(TCFLSH) - drivers/tty/ipwireless/tty.c
> /* This function is a tty i/o ioctl method, which
> * is invoked by tty_ioctl() */
>
> Signed-off-by: Peter Hurley <[email protected]>
> ---
> drivers/net/ppp/ppp_async.c | 2 +-
> drivers/net/ppp/ppp_synctty.c | 2 +-
> drivers/tty/tty_ioctl.c | 28 +++++++++++++++++++---------
> 3 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
> index a031f6b..9c889e0 100644
> --- a/drivers/net/ppp/ppp_async.c
> +++ b/drivers/net/ppp/ppp_async.c
> @@ -314,7 +314,7 @@ ppp_asynctty_ioctl(struct tty_struct *tty, struct file *file,
> /* flush our buffers and the serial port's buffer */
> if (arg == TCIOFLUSH || arg == TCOFLUSH)
> ppp_async_flush_output(ap);
> - err = tty_perform_flush(tty, arg);
> + err = n_tty_ioctl_helper(tty, file, cmd, arg);
> break;
>
> case FIONREAD:
> diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
> index 1a12033..bdf3b13 100644
> --- a/drivers/net/ppp/ppp_synctty.c
> +++ b/drivers/net/ppp/ppp_synctty.c
> @@ -355,7 +355,7 @@ ppp_synctty_ioctl(struct tty_struct *tty, struct file *file,
> /* flush our buffers and the serial port's buffer */
> if (arg == TCIOFLUSH || arg == TCOFLUSH)
> ppp_sync_flush_output(ap);
> - err = tty_perform_flush(tty, arg);
> + err = n_tty_ioctl_helper(tty, file, cmd, arg);
> break;
>
> case FIONREAD:
> diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c
> index d58b92c..935b032 100644
> --- a/drivers/tty/tty_ioctl.c
> +++ b/drivers/tty/tty_ioctl.c
> @@ -1086,14 +1086,12 @@ int tty_mode_ioctl(struct tty_struct *tty, struct file *file,
> }
> EXPORT_SYMBOL_GPL(tty_mode_ioctl);
>
> -int tty_perform_flush(struct tty_struct *tty, unsigned long arg)
> +
> +/* Caller guarantees ldisc reference is held */
> +static int __tty_perform_flush(struct tty_struct *tty, unsigned long arg)
> {
> - struct tty_ldisc *ld;
> - int retval = tty_check_change(tty);
> - if (retval)
> - return retval;
> + struct tty_ldisc *ld = tty->ldisc;
>
> - ld = tty_ldisc_ref_wait(tty);
> switch (arg) {
> case TCIFLUSH:
> if (ld && ld->ops->flush_buffer) {
> @@ -1111,12 +1109,24 @@ int tty_perform_flush(struct tty_struct *tty, unsigned long arg)
> tty_driver_flush_buffer(tty);
> break;
> default:
> - tty_ldisc_deref(ld);
> return -EINVAL;
> }
> - tty_ldisc_deref(ld);
> return 0;
> }
> +
> +int tty_perform_flush(struct tty_struct *tty, unsigned long arg)
> +{
> + struct tty_ldisc *ld;
> + int retval = tty_check_change(tty);
> + if (retval)
> + return retval;
> +
> + ld = tty_ldisc_ref_wait(tty);
> + retval = __tty_perform_flush(tty, arg);
> + if (ld)
> + tty_ldisc_deref(ld);
> + return retval;
> +}
> EXPORT_SYMBOL_GPL(tty_perform_flush);
>
> int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file,
> @@ -1155,7 +1165,7 @@ int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file,
> }
> return 0;
> case TCFLSH:
> - return tty_perform_flush(tty, arg);
> + return __tty_perform_flush(tty, arg);
> default:
> /* Try the mode commands */
> return tty_mode_ioctl(tty, file, cmd, arg);
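The deadlock and the shape of the fix can be modeled without any tty code. In this sketch (hypothetical names throughout; a false return from ref_try_get() stands in for "would block forever"), a caller that already holds a reference cannot take a second one once a writer is pending, while a helper that relies on the caller's reference, in the manner of __tty_perform_flush(), completes normally.

```c
#include <stdbool.h>

/* Hypothetical writer-priority reference model; not the kernel ldsem. */
struct ref_model {
	int readers;         /* outstanding ldisc references */
	bool writer_pending; /* a halt or TIOCSETD is waiting */
};

/*
 * Writer priority: no new reference while a writer waits. A real lock
 * would sleep here; returning false stands in for "would block".
 */
static bool ref_try_get(struct ref_model *m)
{
	if (m->writer_pending)
		return false;
	m->readers++;
	return true;
}

static void ref_put(struct ref_model *m)
{
	m->readers--;
}

/* Caller guarantees a reference is held, so none is taken here. */
static int model_flush_held(struct ref_model *m)
{
	(void)m; /* the actual flushing work would go here */
	return 0;
}

/* Entry point for callers that do NOT already hold a reference. */
static int model_flush(struct ref_model *m)
{
	int ret;

	if (!ref_try_get(m))
		return -1; /* in the kernel this would be a blocking wait */
	ret = model_flush_held(m);
	ref_put(m);
	return ret;
}
```

An ldisc ioctl handler already holds a reference, so it should call the "held" variant; only callers from outside the ldisc should go through the acquiring wrapper.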

2013-03-12 02:28:39

by Michel Lespinasse

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Mon, Mar 11, 2013 at 1:44 PM, Peter Hurley <[email protected]> wrote:
> Greg,
> This patchset includes
> 'tty: Drop lock contention stat from ldsem trylocks'
> so no need to apply that on this series. Also, I noticed you
> kept the 'tty is NULL' removal on a different branch so I left
> my patch in this series that removes it.
>
> This series applies cleanly to tty-next.
>
> v5 changes:
>
> After completing an audit of the recursive use of ldisc
> references, I discovered the _blocking_ recursive acquisition
> of ldisc references was limited to line disciplines misusing
> the tty_perform_flush() function.
> With that now resolved in,
> 'tty: Fix recursive deadlock in tty_perform_flush()'
> the recursion design in ldsem has been removed.
>
> The recursion removal is in its own patch,
> 'tty: Remove ldsem recursion support'
> to ease review for those that have already reviewed the
> ldsem implementation.
>
> In addition, this patchset implements lock stealing derived
> from the work of Michel Lespinasse <[email protected]> on
> writer lock stealing in rwsem.
>
> Although the rwsem write lock stealing changes are motivated
> by performance criteria, these changes are motivated by reduced
> code line count and simplicity of design.
>
> *** Edited below to remove recursion discussion ***
>
> Back in early December I realized that a classic read/write semaphore
> with writer priority was the ideal mechanism for handling the
> line discipline referencing.
>
> Line discipline references act as "readers"; closing or changing the
> line discipline is prevented while these references are outstanding.
> Conversely, line discipline references should not be granted while
> the line discipline is closing or changing; these are the "writers".
>
> Unfortunately, the existing rwsem uses a FIFO priority for
> waiting threads and does not support timeouts.
>
> So this implements just that: a writer-priority
> read/write semaphore with timed waits.

Thanks for eliminating the recursion requirement. I think this really
helps - I didn't like that multiple readers with a colliding current
hash could basically starve out a writer forever.

Not knowing anything about the tty layer, I am curious about the
context for your other requirements. What are ldisc references taken
for and for how long are they held ? I am surprised that the writers
may hit a 5 second timeout (because I didn't expect the references to
be held for very long).

Also why the write-priority requirement rather than reader-writer
fairness ? Is it to make it less likely to hit the writer timeouts ?

In short: I am worried about the introduction of a new lock type, and
would be happier if rwsem could be made to fit. BTW, extending rwsem
itself to add writer timeouts seems quite doable (but making it work
as a write priority lock would seem like a bad idea).

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-03-12 16:47:59

by Peter Hurley

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Mon, 2013-03-11 at 19:28 -0700, Michel Lespinasse wrote:
> On Mon, Mar 11, 2013 at 1:44 PM, Peter Hurley <[email protected]> wrote:
> > Greg,
> > This patchset includes
> > 'tty: Drop lock contention stat from ldsem trylocks'
> > so no need to apply that on this series. Also, I noticed you
> > kept the 'tty is NULL' removal on a different branch so I left
> > my patch in this series that removes it.
> >
> > This series applies cleanly to tty-next.
> >
> > v5 changes:
> >
> > After completing an audit of the recursive use of ldisc
> > references, I discovered the _blocking_ recursive acquisition
> > of ldisc references was limited to line disciplines misusing
> > the tty_perform_flush() function.
> > With that now resolved in,
> > 'tty: Fix recursive deadlock in tty_perform_flush()'
> > the recursion design in ldsem has been removed.
> >
> > The recursion removal is in its own patch,
> > 'tty: Remove ldsem recursion support'
> > to ease review for those that have already reviewed the
> > ldsem implementation.
> >
> > In addition, this patchset implements lock stealing derived
> > from the work of Michel Lespinasse <[email protected]> on
> > writer lock stealing in rwsem.
> >
> > Although the rwsem write lock stealing changes are motivated
> > by performance criteria, these changes are motivated by reduced
> > code line count and simplicity of design.
> >
> > *** Edited below to remove recursion discussion ***
> >
> > Back in early December I realized that a classic read/write semaphore
> > with writer priority was the ideal mechanism for handling the
> > line discipline referencing.
> >
> > Line discipline references act as "readers"; closing or changing the
> > line discipline is prevented while these references are outstanding.
> > Conversely, line discipline references should not be granted while
> > the line discipline is closing or changing; these are the "writers".
> >
> > Unfortunately, the existing rwsem uses a FIFO priority for
> > waiting threads and does not support timeouts.
> >
> > So this implements just that: a writer-priority
> > read/write semaphore with timed waits.
>
> Thanks for eliminating the recursion requirement. I think this really
> helps - I didn't like that multiple readers with a colliding current
> hash could basically starve out a writer forever.

Yeah, writer starvation was a definite down-side. It looked like the
only option at the time.

> Not knowing anything about the tty layer, I am curious about the
> context for your other requirements. What are ldisc references taken
> for and for how long are they held ? I am surprised that the writers
> may hit a 5 second timeout (because I didn't expect the references to
> be held for very long).

Some background:

A line discipline (ldisc) abstracts raw device-side i/o for the tty
layer. The line discipline is responsible for i/o translation,
buffering, and flow control. Line disciplines can also provide other
features, such as i/o routing; eg., N_PPP routes serial i/o out to the
network stack, instead of providing user-space file i/o.

Line disciplines are packaged as loadable drivers. Each tty device has
its own 'instance' of a line discipline. In addition, a tty's line
discipline can be changed via the TIOCSETD ioctl.

The N_TTY line discipline is the default ldisc and implements the
standard terminal control features such as ^C -> SIGINT, CR/NL mangling,
echoing and line-edit mode.

While the tty layer manages the state of the tty device and provides the
user-space file i/o interface, the line discipline actually implements
the file i/o. This means that user-space blocking reads and blocking
polls block in the respective line discipline i/o methods.

Ldisc references:

Line discipline (ldisc) references pin the tty device's ldisc instance
while in use. This can be short -- eg., when device-side input is
received -- or can be really long -- eg., getty uses blocking read to
park the virtual terminals.

Two requirements complicate this otherwise straightforward
reference-counting pattern.

First, the TIOCSETD ioctl must change the line discipline atomically
wrt. user-space file i/o _and_ device-side i/o. Of course, this can only
happen if there are no existing references to this ldisc. Since ldisc
references have highly variable lifetimes, changing the line discipline
may fail if not all existing references to this ldisc are released in
time. Thus the lock timeout requirement.
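A minimal sketch of such a timed exclusive acquire, assuming a userspace model built on pthread_cond_timedwait() rather than the kernel's wait primitives: the writer waits until the reference count drains or the deadline passes, and reports ETIMEDOUT in the latter case, roughly analogous to TIOCSETD giving up after the 5-second limit mentioned in this thread. struct timed_ref and its functions are hypothetical, and only a single writer is modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical reference count with a timed exclusive acquire. */
struct timed_ref {
	pthread_mutex_t lock;
	pthread_cond_t drained; /* broadcast when readers drops to zero */
	int readers;
	bool writer;            /* true while exclusively held */
};

static bool timed_ref_get(struct timed_ref *t)
{
	bool ok;

	pthread_mutex_lock(&t->lock);
	ok = !t->writer;
	if (ok)
		t->readers++;
	pthread_mutex_unlock(&t->lock);
	return ok;
}

static void timed_ref_put(struct timed_ref *t)
{
	pthread_mutex_lock(&t->lock);
	if (--t->readers == 0)
		pthread_cond_broadcast(&t->drained);
	pthread_mutex_unlock(&t->lock);
}

/* Returns 0 once exclusive, or ETIMEDOUT if references outlive the deadline. */
static int timed_ref_lock_exclusive(struct timed_ref *t, long timeout_ms)
{
	struct timespec deadline;
	int err = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&t->lock);
	while (t->readers > 0 && err == 0)
		err = pthread_cond_timedwait(&t->drained, &t->lock, &deadline);
	if (t->readers == 0) { /* drained, possibly right at the deadline */
		t->writer = true;
		err = 0;
	}
	pthread_mutex_unlock(&t->lock);
	return err;
}
```

The deadline is absolute and computed against CLOCK_REALTIME, the default clock for a condition variable created with PTHREAD_COND_INITIALIZER; the loop also tolerates spurious wakeups.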

Second, ttys can be hung up, synchronously (like logout and usb
disconnect) and asynchronously (like carrier loss). A tty hangup
recycles the ldisc so at the conclusion of the tty hangup, a fresh reset
ldisc is ready for use. This idiosyncrasy is both a security measure,
and also a requirement to support legacy logins which don't properly
reset the terminal state.

From the point at which a hangup occurs, all current and future i/o must
cease. Foreground processes which may have existing ldisc references are
signalled so they can exit the ldisc i/o loop and drop their references.

> Also why the write-priority requirement rather than reader-writer
> fairness ? Is it to make it less likely to hit the writer timeouts ?

Since tty i/o can be really [painfully] slow, allowing waiting future
references to succeed is not an option.
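To see why admitting new references while a writer waits is a problem when references are long-lived, here is a small deterministic simulation (hypothetical, not kernel code). It compares a writer-first admission rule with a reader-preferring one; the latter is not rwsem's FIFO ordering, it simply shows the starvation risk that writer priority avoids. One reader arrives per tick and holds its reference for hold ticks, so under reader preference the overlapping references never all drain.

```c
#include <stdbool.h>

enum rw_policy { POLICY_READER_FIRST, POLICY_WRITER_FIRST };

#define MAX_READERS 64

/* May a newly arriving reader be admitted while writers are waiting? */
static bool admit_new_reader(enum rw_policy p, int writers_waiting)
{
	return !(p == POLICY_WRITER_FIRST && writers_waiting > 0);
}

/*
 * A writer starts waiting at tick 0 behind initial_readers references.
 * Each tick, existing references age by one and one new reader arrives.
 * Returns the tick at which no references remain (the writer can then
 * proceed), or -1 if that never happens within ticks.
 */
static int writer_start_tick(enum rw_policy p, int ticks, int hold,
			     int initial_readers)
{
	int remaining[MAX_READERS];
	int n = 0;

	while (n < initial_readers && n < MAX_READERS)
		remaining[n++] = hold;

	for (int t = 0; t < ticks; t++) {
		int kept = 0;

		/* Age each reference; keep the ones still in use. */
		for (int i = 0; i < n; i++)
			if (--remaining[i] > 0)
				remaining[kept++] = remaining[i];
		n = kept;

		if (n == 0)
			return t;

		/* One new reference request arrives while the writer waits. */
		if (admit_new_reader(p, 1) && n < MAX_READERS)
			remaining[n++] = hold;
	}
	return -1;
}
```

With hold = 2 and one initial reference, the writer-first rule lets the writer in at tick 1, while the reader-preferring rule never drains within 100 ticks.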

> In short: I am worried about the introduction of a new lock type, and
> would be happier if rwsem could be made to fit. BTW, extending rwsem
> itself to add writer timeouts seems quite doable (but making it work
> as a write priority lock would seem like a bad idea).

I understand the concern regarding the potential proliferation of new
lock types. Lock implementations are hard to get right, and no one wants
to debug 7 different lock policy implementations of a read/write
semaphore.

OTOH, a lack of existing options has spawned a DIY approach without
higher-order locks that is rarely correct, but which goes largely
unnoticed exactly because it's not a new lock. A brief review of the
hangs, races, and deadlocks fixed by this patchset should be convincing
enough of that fact. In my opinion, this is the overriding concern.

The two main problems with a one-size-fits-all lock policy are that,
1) lock experts can't realistically foresee the consequences of policy
changes without already being experts in the subsystems in which that
lock is used. Even domain experts may miss potential consequences, and
2) domain experts typically wouldn't even consider writing a new lock.
So they make do with atomic bit states, spinlocks, reference counts,
mutexes, and waitqueues, making a mostly-functional, higher-order lock.

I won't make a case for 2) above, as I think that's self-evident.

As an example of 1) above, Alex's and your work on write lock stealing
is predicated on the proposition that writers advancing in front of
waiting readers is acceptable; that may or may not be true for every use
case. [ It's not my intention to be critical here -- obviously I
appreciate your work because I used it in ldsem. :) ]

Perhaps a future direction for rwsem would be to provide a selectable
lock policy (fifo, mostly-fair, writer-first) on initialization so that
the different use cases can be easily accommodated?

Regards,
Peter Hurley

2013-03-13 11:37:17

by Michel Lespinasse

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Tue, Mar 12, 2013 at 9:47 AM, Peter Hurley <[email protected]> wrote:
> On Mon, 2013-03-11 at 19:28 -0700, Michel Lespinasse wrote:
>> Also why the write-priority requirement rather than reader-writer
>> fairness ? Is it to make it less likely to hit the writer timeouts ?
>
> Since tty i/o can be really [painfully] slow, allowing waiting future
> references to succeed is not an option.

All right, that makes sense after your explanation.

> I understand the concern regarding the potential proliferation of new
> lock types. Lock implementations are hard to get right, and no one wants
> to debug 7 different lock policy implementations of a read/write
> semaphore.
>
> OTOH, a lack of existing options has spawned a DIY approach without
> higher-order locks that is rarely correct, but which goes largely
> unnoticed exactly because it's not a new lock. A brief review of the
> hangs, races, and deadlocks fixed by this patchset should be convincing
> enough of that fact. In my opinion, this is the overriding concern.

Agree that having a suitable lock for your usage is much nicer than
having ad-hoc solutions.

> The two main problems with a one-size-fits-all lock policy are that,
> 1) lock experts can't realistically foresee the consequences of policy
> changes without already being experts in the subsystems in which that
> lock is used. Even domain experts may miss potential consequences, and
> 2) domain experts typically wouldn't even consider writing a new lock.
> So they make do with atomic bit states, spinlocks, reference counts,
> mutexes, and waitqueues, making a mostly-functional, higher-order lock.

Have you considered building your ldlock based on lib/rwsem-spinlock.c
instead ? i.e. having an internal spinlock to protect the ldisc
reference count and the reader and writer queues. This would seem much
simpler to get right. The downside would be that a spinlock would be
taken for a short time whenever an ldisc reference is taken or
released. I don't expect that the internal spinlock would get
significant contention ?
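A rough userspace rendering of that suggestion, for illustration only: every field is protected by one internal lock (a pthread mutex standing in for the kernel spinlock), readers and writers sleep on separate queues, and new readers are refused while any writer waits. All names are hypothetical; this is not lib/rwsem-spinlock.c itself, and timeouts are omitted for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical writer-priority rw lock; all state under one mutex. */
struct wp_rwlock {
	pthread_mutex_t lock;     /* protects every field below */
	pthread_cond_t readers_q; /* waiting readers sleep here */
	pthread_cond_t writers_q; /* waiting writers sleep here */
	int active_readers;
	bool writer_active;
	int writers_waiting;
};

#define WP_RWLOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
			 PTHREAD_COND_INITIALIZER, 0, false, 0 }

static bool wp_read_trylock(struct wp_rwlock *l)
{
	bool ok;

	pthread_mutex_lock(&l->lock);
	/* Writer priority: refuse new readers while any writer waits. */
	ok = !l->writer_active && l->writers_waiting == 0;
	if (ok)
		l->active_readers++;
	pthread_mutex_unlock(&l->lock);
	return ok;
}

static void wp_read_lock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	while (l->writer_active || l->writers_waiting > 0)
		pthread_cond_wait(&l->readers_q, &l->lock);
	l->active_readers++;
	pthread_mutex_unlock(&l->lock);
}

static void wp_read_unlock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	/* Last reader out hands the lock to a waiting writer, if any. */
	if (--l->active_readers == 0 && l->writers_waiting > 0)
		pthread_cond_signal(&l->writers_q);
	pthread_mutex_unlock(&l->lock);
}

static void wp_write_lock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	l->writers_waiting++;
	while (l->writer_active || l->active_readers > 0)
		pthread_cond_wait(&l->writers_q, &l->lock);
	l->writers_waiting--;
	l->writer_active = true;
	pthread_mutex_unlock(&l->lock);
}

static void wp_write_unlock(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	l->writer_active = false;
	if (l->writers_waiting > 0)
		pthread_cond_signal(&l->writers_q); /* writers go first */
	else
		pthread_cond_broadcast(&l->readers_q);
	pthread_mutex_unlock(&l->lock);
}
```

The cost Michel mentions is visible here: every reference get and put takes the internal lock briefly, in exchange for state transitions that are easy to reason about.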

> Perhaps a future direction for rwsem would be to provide a selectable
> lock policy (fifo, mostly-fair, writer-first) on initialization so that
> the different use cases can be easily accommodated?

Probably makes more sense to have different locks for the different
usage models IMO...

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-03-14 01:13:18

by Peter Hurley

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Wed, 2013-03-13 at 04:36 -0700, Michel Lespinasse wrote:
> Have you considered building your ldlock based on lib/rwsem-spinlock.c
> instead ? i.e. having an internal spinlock to protect the ldisc
> reference count and the reader and writer queues. This would seem much
> simpler to get right. The downside would be that a spinlock would be
> taken for a short time whenever an ldisc reference is taken or
> released. I don't expect that the internal spinlock would get
> significant contention ?

That would have been too easy :)

TBH, I hadn't considered it until I was most of the way through a working
atomic version. I had already split the reader/writer wait lists. And
figured out how to always use the wait bias for every waiting reader and
writer -- rather than the rwsem way of testing for an empty list --
which made the timeout handling easier.

At the time, the only thing that I was still struggling with was
recursion, and the spinlock flavor wasn't going to fix that. So I just
kept with the atomic flavor.
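The "wait bias" approach described above can be illustrated with count arithmetic alone. In this sketch, each waiter, reader or writer, adds its own WAIT_BIAS to a single count, so a waiter that times out just subtracts its own bias again, with no need to check whether a wait list has become empty. The constants follow the general style of the rwsem count encoding but are assumptions here, as are all names; the real lock updates the count atomically, which is omitted.

```c
/* Assumed constants in the general style of the rwsem count encoding. */
#define ACTIVE_BIAS 1L
#define ACTIVE_MASK 0xffffL
#define WAIT_BIAS   (-ACTIVE_MASK - 1)        /* -0x10000 per waiter */
#define READ_BIAS   ACTIVE_BIAS
#define WRITE_BIAS  (WAIT_BIAS + ACTIVE_BIAS) /* an active writer also carries a wait bias */

/* Number of active lockers, taken from the low bits of the count. */
static long active_count(long count)
{
	return (long)((unsigned long)count & (unsigned long)ACTIVE_MASK);
}

/* Number of WAIT_BIAS units in the count (waiters, plus any active writer). */
static long wait_units(long count)
{
	return (count - active_count(count)) / WAIT_BIAS;
}

/* Every waiter adds its own bias on entry... */
static long add_waiter(long count)
{
	return count + WAIT_BIAS;
}

/* ...and removes exactly that bias on wakeup or timeout. */
static long remove_waiter(long count)
{
	return count - WAIT_BIAS;
}
```

A negative count therefore means "a writer holds the lock or someone is waiting", and a timed-out waiter's cleanup is a single subtraction.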

2013-03-14 07:25:13

by Michel Lespinasse

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Wed, Mar 13, 2013 at 6:12 PM, Peter Hurley <[email protected]> wrote:
> On Wed, 2013-03-13 at 04:36 -0700, Michel Lespinasse wrote:
>> Have you considered building your ldlock based on lib/rwsem-spinlock.c
>> instead ? i.e. having an internal spinlock to protect the ldisc
>> reference count and the reader and writer queues. This would seem much
>> simpler to get right. The downside would be that a spinlock would be
>> taken for a short time whenever an ldisc reference is taken or
>> released. I don't expect that the internal spinlock would get
>> significant contention ?
>
> That would have been too easy :)
>
> TBH, I hadn't considered it until I was most of the way through a working
> atomic version. I had already split the reader/writer wait lists. And
> figured out how to always use the wait bias for every waiting reader and
> writer -- rather than the rwsem way of testing for an empty list --
> which made the timeout handling easier.
>
> At the time, the only thing that I was still struggling with was
> recursion, and the spinlock flavor wasn't going to fix that. So I just
> kept with the atomic flavor.

It's not too late to run away from it and preserve your sanity (as well
as that of the next person working on the tty layer :)

I think I know that rwsem code pretty well by now and I still get
surprised here and there, as in our other discussion...

Seriously, things will be easier if you can use an internal spinlock.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-03-14 11:43:11

by Peter Hurley

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Thu, 2013-03-14 at 00:25 -0700, Michel Lespinasse wrote:
> On Wed, Mar 13, 2013 at 6:12 PM, Peter Hurley <[email protected]> wrote:
> > On Wed, 2013-03-13 at 04:36 -0700, Michel Lespinasse wrote:
> >> Have you considered building your ldlock based on lib/rwsem-spinlock.c
> >> instead ? i.e. having an internal spinlock to protect the ldisc
> >> reference count and the reader and writer queues. This would seem much
> >> simpler get right. The downside would be that a spinlock would be
> >> taken for a short time whenever an ldisc reference is taken or
> >> released. I don't expect that the internal spinlock would get
> >> significant contention ?
> >
> > That would have been too easy :)
> >
> > TBH, I hadn't considered it until I was most of the way through a working
> > atomic version. I had already split the reader/writer wait lists. And
> > figured out how to always use the wait bias for every waiting reader and
> > writer -- rather than the rwsem way of testing for an empty list --
> > which made the timeout handling easier.
> >
> > At the time, the only thing that I was still struggling with was
> > recursion, and the spinlock flavor wasn't going to fix that. So I just
> > kept with the atomic flavor.
>
> It's not too late to run away from it and preserve your sanity (as well
> as that of the next person working on the tty layer :)

The long-term plan is to migrate it to lib so it won't be a maintenance
burden to tty.

2013-03-14 12:13:09

by Michel Lespinasse

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Thu, Mar 14, 2013 at 4:42 AM, Peter Hurley <[email protected]> wrote:
> On Thu, 2013-03-14 at 00:25 -0700, Michel Lespinasse wrote:
>> It's not too late to run away from it and preserve your sanity (as well
>> as that of the next person working on the tty layer :)
>
> The long-term plan is to migrate it to lib so it won't be a maintenance
> burden to tty.

That only moves the problem though, and makes sense only if we know of
another place where an unfair rwsem is desired...

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-03-18 23:52:49

by Greg Kroah-Hartman

Subject: Re: [PATCH v5 00/44] ldisc patchset

On Mon, Mar 11, 2013 at 04:44:20PM -0400, Peter Hurley wrote:
> Greg,
> This patchset includes
> 'tty: Drop lock contention stat from ldsem trylocks'
> so no need to apply that on this series. Also, I noticed you
> kept the 'tty is NULL' removal on a different branch so I left
> my patch in this series that removes it.
>
> This series applies cleanly to tty-next.

I've applied the first 25 patches, right up to the "create a new lock"
patch. I'll comment on that one next.

thanks,

greg k-h

2013-03-18 23:57:01

by Greg Kroah-Hartman

Subject: Re: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rw semaphore

On Mon, Mar 11, 2013 at 04:44:46PM -0400, Peter Hurley wrote:
> The semantics of a rw semaphore are almost ideally suited
> for tty line discipline lifetime management; multiple active
> threads obtain "references" (read locks) while performing i/o
> to prevent the loss or change of the current line discipline
> (write lock).
>
> Unfortunately, the existing rw_semaphore is ill-suited in other
> ways;
> 1) obtaining a "reference" can be recursive, ie., a reference holder
> may attempt to obtain another "reference". Recursive read locks
> are not supported by rwsem.

Why does a ldisc need to obtain this recursively?

> 2) TIOCSETD ioctl (change line discipline) expects to return an
> error if the line discipline cannot be exclusively locked within
> 5 secs. Lock wait timeouts are not supported by rwsem.

Don't we have some other lock that can timeout?

> 3) A tty hangup is expected to halt and scrap pending i/o, so
> exclusive locking must be prioritized without precluding
> existing reference holders from obtaining recursive read locks.
> Writer priority is not supported by rwsem.

But how bad is it really if we have to wait a bit for that write lock to
get through all of the existing readers? Either way, we are supposed to
be dropping i/o, so it shouldn't be a big deal, right?

> Add ld_semaphore which implements these requirements in a
> semantically and operationally similar way to rw_semaphore.

I _really_ don't want to add a new lock to the kernel, especially one
that is only used by one "driver". You are going to have to convince
the current lock authors that this really is needed, before I can take
it, sorry.

What is wrong with the existing ldisc code that the creation of this
lock is needed? Is our current code that broken?

Ok, it is the tty layer, so it probably is, but it's made it this far
for the past 20 years...

greg k-h

2013-03-18 23:58:26

by Greg Kroah-Hartman

Subject: Re: [PATCH v5 28/44] tty: Remove ldsem recursion support

On Mon, Mar 11, 2013 at 04:44:48PM -0400, Peter Hurley wrote:
> Read lock recursion is no longer required for ldisc references;
> remove mechanism.
>
> Signed-off-by: Peter Hurley <[email protected]>
> ---
> drivers/tty/tty_ldsem.c | 83 +++++------------------------------------------
> include/linux/tty_ldisc.h | 2 --
> 2 files changed, 8 insertions(+), 77 deletions(-)

Wait, why did you add something 3 patches ago, only to remove it here?
Why not just smush these patches together in the first place?

greg k-h

2013-03-19 00:01:23

by Peter Hurley

Subject: Re: [PATCH v5 28/44] tty: Remove ldsem recursion support

On Mon, 2013-03-18 at 16:59 -0700, Greg Kroah-Hartman wrote:
> On Mon, Mar 11, 2013 at 04:44:48PM -0400, Peter Hurley wrote:
> > Read lock recursion is no longer required for ldisc references;
> > remove mechanism.
> >
> > Signed-off-by: Peter Hurley <[email protected]>
> > ---
> > drivers/tty/tty_ldsem.c | 83 +++++------------------------------------------
> > include/linux/tty_ldisc.h | 2 --
> > 2 files changed, 8 insertions(+), 77 deletions(-)
>
> Wait, why did you add something 3 patches ago, only to remove it here?
> Why not just smush these patches together in the first place?

From [PATCH v5 00/44] ldisc patchset...

On Mon, 2013-03-11 at 16:44 -0400, Peter Hurley wrote:
> v5 changes:
>
> After completing an audit of the recursive use of ldisc
> references, I discovered the _blocking_ recursive acquisition
> of ldisc references was limited to line disciplines misusing
> the tty_perform_flush() function.
> With that now resolved in,
> 'tty: Fix recursive deadlock in tty_perform_flush()'
> the recursion design in ldsem has been removed.
>
> The recursion removal is in its own patch,
> 'tty: Remove ldsem recursion support'
> to ease review for those that have already reviewed the
> ldsem implementation.

2013-03-19 00:03:52

by Greg Kroah-Hartman

Subject: Re: [PATCH v5 28/44] tty: Remove ldsem recursion support

On Mon, Mar 18, 2013 at 08:01:01PM -0400, Peter Hurley wrote:
> On Mon, 2013-03-18 at 16:59 -0700, Greg Kroah-Hartman wrote:
> > On Mon, Mar 11, 2013 at 04:44:48PM -0400, Peter Hurley wrote:
> > > Read lock recursion is no longer required for ldisc references;
> > > remove mechanism.
> > >
> > > Signed-off-by: Peter Hurley <[email protected]>
> > > ---
> > > drivers/tty/tty_ldsem.c | 83 +++++------------------------------------------
> > > include/linux/tty_ldisc.h | 2 --
> > > 2 files changed, 8 insertions(+), 77 deletions(-)
> >
> > Wait, why did you add something 3 patches ago, only to remove it here?
> > Why not just smush these patches together in the first place?
>
> From [PATCH v5 00/44] ldisc patchset...
>
> On Mon, 2013-03-11 at 16:44 -0400, Peter Hurley wrote:
> > v5 changes:
> >
> > After completing an audit of the recursive use of ldisc
> > references, I discovered the _blocking_ recursive acquisition
> > of ldisc references was limited to line disciplines misusing
> > the tty_perform_flush() function.
> > With that now resolved in,
> > 'tty: Fix recursive deadlock in tty_perform_flush()'
> > the recursion design in ldsem has been removed.
> >
> > The recursion removal is in its own patch,
> > 'tty: Remove ldsem recursion support'
> > to ease review for those that have already reviewed the
> > ldsem implementation.

Ah, ok. Who reviewed the ldsem implementation? I didn't see any other
acks on it, or did I miss them?

thanks,

greg k-h

2013-03-19 00:12:23

by Peter Hurley

Subject: Re: [PATCH v5 28/44] tty: Remove ldsem recursion support

On Mon, 2013-03-18 at 17:05 -0700, Greg Kroah-Hartman wrote:
> On Mon, Mar 18, 2013 at 08:01:01PM -0400, Peter Hurley wrote:
> > On Mon, 2013-03-18 at 16:59 -0700, Greg Kroah-Hartman wrote:
> > > On Mon, Mar 11, 2013 at 04:44:48PM -0400, Peter Hurley wrote:
> > > > Read lock recursion is no longer required for ldisc references;
> > > > remove mechanism.
> > > >
> > > > Signed-off-by: Peter Hurley <[email protected]>
> > > > ---
> > > > drivers/tty/tty_ldsem.c | 83 +++++------------------------------------------
> > > > include/linux/tty_ldisc.h | 2 --
> > > > 2 files changed, 8 insertions(+), 77 deletions(-)
> > >
> > > Wait, why did you add something 3 patches ago, only to remove it here?
> > > Why not just smush these patches together in the first place?
> >
> > From [PATCH v5 00/44] ldisc patchset...
> >
> > On Mon, 2013-03-11 at 16:44 -0400, Peter Hurley wrote:
> > > v5 changes:
> > >
> > > After completing an audit of the recursive use of ldisc
> > > references, I discovered the _blocking_ recursive acquisition
> > > of ldisc references was limited to line disciplines misusing
> > > the tty_perform_flush() function.
> > > With that now resolved in,
> > > 'tty: Fix recursive deadlock in tty_perform_flush()'
> > > the recursion design in ldsem has been removed.
> > >
> > > The recursion removal is in its own patch,
> > > 'tty: Remove ldsem recursion support'
> > > to ease review for those that have already reviewed the
> > > ldsem implementation.
>
> Ah, ok. Who reviewed the ldsem implementation? I didn't see any other
> acks on it, or did I miss them?

Nobody ack'd it. What I meant by that was, if someone was working their
way through it, it would suck to have the base implementation all
different again, and much easier to review just the changes.

2013-03-19 01:01:41

by Peter Hurley

Subject: Re: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rw semaphore

On Mon, 2013-03-18 at 16:58 -0700, Greg Kroah-Hartman wrote:
> On Mon, Mar 11, 2013 at 04:44:46PM -0400, Peter Hurley wrote:
> > The semantics of a rw semaphore are almost ideally suited
> > for tty line discipline lifetime management; multiple active
> > threads obtain "references" (read locks) while performing i/o
> > to prevent the loss or change of the current line discipline
> > (write lock).
> >
> > Unfortunately, the existing rw_semaphore is ill-suited in other
> > ways:
> > 1) obtaining a "reference" can be recursive, i.e., a reference holder
> > may attempt to obtain another "reference". Recursive read locks
> > are not supported by rwsem.
>
> Why does a ldisc need to obtain this recursively?

You already discovered it doesn't (but it used to be required).

BTW, it's only because I had a real lock with lockdep support, that this
recursive usage which deadlocks was even discoverable.
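
Why recursive read acquisition deadlocks once writers have priority can be shown with a toy, single-threaded model. The struct and function names below are hypothetical (this is not the ldsem API); the model replays one interleaving in which a reader holding a reference asks for a second one after a writer has started waiting.

```c
#include <stdbool.h>

/* Toy state of a writer-priority r/w semaphore (hypothetical, illustration only). */
struct toy_rwsem {
	int active_readers;	/* read locks currently held */
	int waiting_writers;	/* writers queued for the lock */
	bool writer_active;	/* write lock currently held */
};

/* Writer priority: a new read lock is refused while any writer waits. */
static bool toy_read_can_proceed(const struct toy_rwsem *s)
{
	return !s->writer_active && s->waiting_writers == 0;
}

/* A writer may only proceed once every reader has released its lock. */
static bool toy_write_can_proceed(const struct toy_rwsem *s)
{
	return !s->writer_active && s->active_readers == 0;
}

/*
 * Replays: reader A takes a read lock, writer W starts waiting,
 * then A asks for a second (recursive) read lock.
 * Returns true if neither A nor W can make progress.
 */
static bool toy_recursive_read_deadlocks(void)
{
	struct toy_rwsem s = { 0, 0, false };

	s.active_readers++;	/* A: first read lock succeeds */
	s.waiting_writers++;	/* W: must wait, a reader is active */

	bool reader_blocked = !toy_read_can_proceed(&s);	/* A's second read */
	bool writer_blocked = !toy_write_can_proceed(&s);	/* W still waits */

	/* A cannot drop its first lock until the second one is granted. */
	return reader_blocked && writer_blocked;
}
```

The second read waits behind the writer, and the writer waits for the first read to be released, so neither side can move. A reader-preferring policy would admit the second read, which is one way such recursion can stay hidden without lock debugging.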

> > 2) TIOCSETD ioctl (change line discipline) expects to return an
> > error if the line discipline cannot be exclusively locked within
> > 5 secs. Lock wait timeouts are not supported by rwsem.
>
> Don't we have some other lock that can timeout?

Not that behaves like a r/w semaphore.
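
For comparison, POSIX user space does provide a timed exclusive acquisition on a r/w lock, pthread_rwlock_timedwrlock(). The user-space sketch below only illustrates the TIOCSETD-style requirement (give up and report failure if references are not released in time). The helper names and the 50 ms deadline are assumptions, and none of this is kernel API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_rwlock_t ldisc_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Try to take the lock exclusively, giving up after timeout_ms. */
static int timed_exclusive(pthread_rwlock_t *lock, long timeout_ms)
{
	struct timespec deadline;

	/* pthread_rwlock_timedwrlock() expects an absolute CLOCK_REALTIME time. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	return pthread_rwlock_timedwrlock(lock, &deadline);	/* 0 or ETIMEDOUT */
}

/* Plays the "change line discipline" side in its own thread. */
static void *writer(void *arg)
{
	int *result = arg;

	*result = timed_exclusive(&ldisc_lock, 50);
	if (*result == 0)
		pthread_rwlock_unlock(&ldisc_lock);
	return NULL;
}

/* Attempt a timed write lock from another thread, optionally while a reference is held. */
static int try_change_while(int hold_reference)
{
	pthread_t t;
	int result = -1;

	if (hold_reference)
		pthread_rwlock_rdlock(&ldisc_lock);
	pthread_create(&t, NULL, writer, &result);
	pthread_join(t, NULL);
	if (hold_reference)
		pthread_rwlock_unlock(&ldisc_lock);
	return result;
}
```

try_change_while(1) returns ETIMEDOUT because a read reference is held for the whole deadline; try_change_while(0) returns 0.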

> > 3) A tty hangup is expected to halt and scrap pending i/o, so
> > exclusive locking must be prioritized without precluding
> > existing reference holders from obtaining recursive read locks.
> > Writer priority is not supported by rwsem.
>
> But how bad is it really if we have to wait a bit for that write lock to
> get through all of the existing readers? Either way, we are supposed to
> be dropping i/o, so it shouldn't be a big deal, right?

The rwsem behavior is in the process of changing. Write lock stealing
has already been added and refinements there will likely allow some
readers in front of writers.

With slow serial i/o, I'd rather have hangups occur promptly than let a
bunch more i/o through.
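
The cost of letting readers past a waiting writer can be made concrete with a small deterministic replay (hypothetical names, not kernel code). Events are 'R' (a reader asks for a reference), 'r' (a reader drops one) and 'W' (a hangup starts waiting). The function reports how many reader releases the hangup had to wait for under each admission policy.

```c
#include <stdbool.h>

/*
 * Replays a string of events against a toy r/w lock:
 *   'R' a reader asks for a reference, 'r' one reader releases,
 *   'W' a writer (e.g. hangup) starts waiting.
 * Returns how many reader releases occurred while the writer waited,
 * or -1 if the writer never obtained the lock.
 */
static int readers_drained_before_writer(const char *events, bool writer_priority)
{
	int active = 0, drained = 0;
	bool writer_waiting = false;

	for (const char *e = events; *e; e++) {
		switch (*e) {
		case 'R':
			/* Writer priority: no new references once a writer waits.
			 * A refused reader simply keeps waiting (not modelled). */
			if (!writer_waiting || !writer_priority)
				active++;
			break;
		case 'r':
			if (active > 0) {
				active--;
				if (writer_waiting)
					drained++;
			}
			break;
		case 'W':
			writer_waiting = true;
			break;
		}
		if (writer_waiting && active == 0)
			return drained;	/* writer acquires the lock */
	}
	return -1;
}
```

For "RWRRrrr", writer priority lets the hangup proceed after the single reference that was already held (1), while a reader-preferring policy also makes it wait for the two readers that arrived after it (3).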

> > Add ld_semaphore which implements these requirements in a
> > semantically and operationally similar way to rw_semaphore.
>
> I _really_ don't want to add a new lock to the kernel, especially one
> that is only used by one "driver". You are going to have to convince
> the current lock authors that this really is needed, before I can take
> it, sorry.

That's fine. I can understand the reluctance to take on a new lock
[although you might be interested to read my analysis of rwsem here
https://lkml.org/lkml/2013/3/11/533 which outlines an existing flaw].

That said, part of the reason why the current ldisc implementation is
broken is the lack of appropriate locks. As I recently explained
(actually in this patchset's thread),

a lack of existing options has spawned a DIY approach without
higher-order locks that is rarely correct, but which goes largely
unnoticed exactly because it's not a new lock. A brief review of the
hangs, races, and deadlocks fixed by this patchset should be convincing
enough of that fact. In my opinion, this is the overriding concern.

The two main problems with a one-size-fits-all lock policy are that,
1) lock experts can't realistically foresee the consequences of policy
changes without already being experts in the subsystems in which that
lock is used. Even domain experts may miss potential consequences, and
2) domain experts typically wouldn't even consider writing a new lock.
So they make do with atomic bit states, spinlocks, reference counts,
mutexes, and waitqueues, making a mostly-functional, higher-order lock.

From whom would you like me to get an ack for this?

> What is wrong with the existing ldisc code that the creation of this
> lock is needed? Is our current code that broken?

Yes. Even just the acquisition of the ldisc reference is wrong [the
analysis is in the patch 21 changelog].
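
As an illustration of how a hand-rolled flag-plus-count scheme can admit a reference after a halt, the sketch below replays, in a single thread, one interleaving consistent with the FIXME in the removed tty_ldisc_try() ("this allows reference acquire after TTY_LDISC is cleared"). Because the removed tty_ldisc_halt() cleared the bit and polled the count without taking tty_ldisc_lock, the spinlock in tty_ldisc_try() does not order the two paths. The struct is a simplified, hypothetical stand-in (users == 1 meaning idle, as in the removed tty_ldisc_wait_idle()); this is not the patch 21 analysis itself.

```c
#include <stdbool.h>

/* Simplified stand-ins for the old scheme: a flag bit plus a user count. */
struct old_ldisc_state {
	bool ldisc_flag;	/* models TTY_LDISC: references allowed */
	int users;		/* models ld->users; 1 means idle */
	bool halted;		/* models TTY_LDISC_HALTED */
};

/*
 * Replays: T1 (tty_ldisc_try) tests the flag, T2 (tty_ldisc_halt) clears
 * it and finds the ldisc idle, then T1 bumps the user count.
 * Returns the number of live references left after the halt completed.
 */
static int refs_after_halt(void)
{
	struct old_ldisc_state s = { true, 1, false };

	bool t1_saw_flag = s.ldisc_flag;	/* T1: test_bit(TTY_LDISC) passes */

	s.ldisc_flag = false;			/* T2: clear_bit(TTY_LDISC) */
	if (s.users == 1)			/* T2: wait_idle sees users == 1 */
		s.halted = true;		/* T2: halt completes */

	if (t1_saw_flag)
		s.users++;			/* T1: atomic_inc(&ld->users), too late */

	return s.halted ? s.users - 1 : 0;
}
```

The replay ends with the ldisc marked halted while one reference is still live.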

If you'd like, I can send you 6 or so short user test programs that
hang, crash, or deadlock inside 60 seconds on mainline and next, but not
with this patchset.

Regards,
Peter Hurley

2013-03-19 01:58:22

by Greg Kroah-Hartman

Subject: Re: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rw semaphore

On Mon, Mar 18, 2013 at 09:01:19PM -0400, Peter Hurley wrote:
> On Mon, 2013-03-18 at 16:58 -0700, Greg Kroah-Hartman wrote:
> > On Mon, Mar 11, 2013 at 04:44:46PM -0400, Peter Hurley wrote:
> > > 2) TIOCSETD ioctl (change line discipline) expects to return an
> > > error if the line discipline cannot be exclusively locked within
> > > 5 secs. Lock wait timeouts are not supported by rwsem.
> >
> > Don't we have some other lock that can timeout?
>
> Not that behaves like a r/w semaphore.

Can't we just add it? Or is that too much work?

> > > 3) A tty hangup is expected to halt and scrap pending i/o, so
> > > exclusive locking must be prioritized without precluding
> > > existing reference holders from obtaining recursive read locks.
> > > Writer priority is not supported by rwsem.
> >
> > But how bad is it really if we have to wait a bit for that write lock to
> > get through all of the existing readers? Either way, we are supposed to
> > be dropping i/o, so it shouldn't be a big deal, right?
>
> The rwsem behavior is in the process of changing. Write lock stealing
> has already been added and refinements there will likely allow some
> readers in front of writers.
>
> With slow serial i/o, I'd rather have hangups occur promptly than let a
> bunch more i/o through.

So all we are now lacking, with the changes to rwsem, is the timeout
problem?

> > > Add ld_semaphore which implements these requirements in a
> > > semantically and operationally similar way to rw_semaphore.
> >
> > I _really_ don't want to add a new lock to the kernel, especially one
> > that is only used by one "driver". You are going to have to convince
> > the current lock authors that this really is needed, before I can take
> > it, sorry.
>
> That's fine. I can understand the reluctance to take on a new lock
> [although you might be interested to read my analysis of rwsem here
> https://lkml.org/lkml/2013/3/11/533 which outlines an existing flaw].
>
> That said, part of the reason why the current ldisc implementation is
> broken is the lack of appropriate locks. As I recently explained
> (actually in this patchset's thread),
>
> a lack of existing options has spawned a DIY approach without
> higher-order locks that is rarely correct, but which goes largely
> unnoticed exactly because it's not a new lock. A brief review of the
> hangs, races, and deadlocks fixed by this patchset should be convincing
> enough of that fact. In my opinion, this is the overriding concern.
>
> The two main problems with a one-size-fits-all lock policy are that,
> 1) lock experts can't realistically foresee the consequences of policy
> changes without already being experts in the subsystems in which that
> lock is used. Even domain experts may miss potential consequences, and
> 2) domain experts typically wouldn't even consider writing a new lock.
> So they make do with atomic bit states, spinlocks, reference counts,
> mutexes, and waitqueues, making a mostly-functional, higher-order lock.

I read that, however rolling your own lock is almost never the solution.

> From whom would you like me to get an ack for this?

The people who wrote the rwsem code?

> > What is wrong with the existing ldisc code that the creation of this
> > lock is needed? Is our current code that broken?
>
> Yes. Even just the acquisition of the ldisc reference is wrong [the
> analysis is in the patch 21 changelog].

Yes, very nice work, I'm not saying that this isn't a messed up area at
all, it's just that such deep flaws that require a new type of a lock
don't usually come up all that often.

> If you'd like, I can send you 6 or so short user test programs that
> hang, crash, or deadlock inside 60 seconds on mainline and next, but not
> with this patchset.

That would be interesting to have, please send them.

And I hope that they only lock up when run as root, but I'm afraid to
ask that question...

thanks,

greg k-h

2013-03-19 15:43:24

by Peter Hurley

Subject: Re: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rw semaphore


On Mon, 2013-03-18 at 18:59 -0700, Greg Kroah-Hartman wrote:
> On Mon, Mar 18, 2013 at 09:01:19PM -0400, Peter Hurley wrote:
> > On Mon, 2013-03-18 at 16:58 -0700, Greg Kroah-Hartman wrote:
> > > On Mon, Mar 11, 2013 at 04:44:46PM -0400, Peter Hurley wrote:
> > > > 2) TIOCSETD ioctl (change line discipline) expects to return an
> > > > error if the line discipline cannot be exclusively locked within
> > > > 5 secs. Lock wait timeouts are not supported by rwsem.
> > >
> > > Don't we have some other lock that can timeout?
> >
> > Not that behaves like a r/w semaphore.
>
> Can't we just add it? Or is that too much work?

See my comments below about rolling your own lock.

> > > > 3) A tty hangup is expected to halt and scrap pending i/o, so
> > > > exclusive locking must be prioritized without precluding
> > > > existing reference holders from obtaining recursive read locks.
> > > > Writer priority is not supported by rwsem.
> > >
> > > But how bad is it really if we have to wait a bit for that write lock to
> > > get through all of the existing readers? Either way, we are supposed to
> > > be dropping i/o, so it shouldn't be a big deal, right?
> >
> > The rwsem behavior is in the process of changing. Write lock stealing
> > has already been added and refinements there will likely allow some
> > readers in front of writers.
> >
> > With slow serial i/o, I'd rather have hangups occur promptly than let a
> > bunch more i/o through.
>
> So all we are now lacking, with the changes to rwsem, is the timeout
> problem?

No. What I'm saying is the existing rwsem lock policy is not ideal and
the future lock policy is unlikely to be any more ideal, and possibly
worse.

> > > > Add ld_semaphore which implements these requirements in a
> > > > semantically and operationally similar way to rw_semaphore.
> > >
> > > I _really_ don't want to add a new lock to the kernel, especially one
> > > that is only used by one "driver". You are going to have to convince
> > > the current lock authors that this really is needed, before I can take
> > > it, sorry.
> >
> > That's fine. I can understand the reluctance to take on a new lock
> > [although you might be interested to read my analysis of rwsem here
> > https://lkml.org/lkml/2013/3/11/533 which outlines an existing flaw].
> >
> > That said, part of the reason why the current ldisc implementation is
> > broken is the lack of appropriate locks. As I recently explained
> > (actually in this patchset's thread),
> >
> > a lack of existing options has spawned a DIY approach without
> > higher-order locks that is rarely correct, but which goes largely
> > unnoticed exactly because it's not a new lock. A brief review of the
> > hangs, races, and deadlocks fixed by this patchset should be convincing
> > enough of that fact. In my opinion, this is the overriding concern.
> >
> > The two main problems with a one-size-fits-all lock policy are that,
> > 1) lock experts can't realistically foresee the consequences of policy
> > changes without already being experts in the subsystems in which that
> > lock is used. Even domain experts may miss potential consequences, and
> > 2) domain experts typically wouldn't even consider writing a new lock.
> > So they make do with atomic bit states, spinlocks, reference counts,
> > mutexes, and waitqueues, making a mostly-functional, higher-order lock.
>
> I read that, however rolling your own lock is almost never the solution.

Except that's the whole issue, isn't it?

There is no existing lock solution because there is no r/w semaphore
that times out.

So, no matter what, a new lock is required.

Since there is only one use-case for the new lock, it makes sense for
that lock to have the lock policy best suited for the one use-case,
especially since it's already done.
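
To make the combined policy under discussion concrete, here is a minimal user-space sketch of a r/w semaphore in which readers are refused while any writer holds or awaits the lock, and a writer gives up at a deadline. It is built from a pthread mutex and two condition variables; all names are hypothetical, and it is not the ld_semaphore implementation from this series.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct wp_rwsem {
	pthread_mutex_t lock;
	pthread_cond_t readers_ok;	/* readers may retry */
	pthread_cond_t writer_ok;	/* a writer may retry */
	int active_readers;
	int waiting_writers;
	bool writer_active;
};

static void wp_init(struct wp_rwsem *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->readers_ok, NULL);
	pthread_cond_init(&s->writer_ok, NULL);
	s->active_readers = 0;
	s->waiting_writers = 0;
	s->writer_active = false;
}

/* Non-blocking read; refused while a writer holds or waits for the lock. */
static bool wp_down_read_trylock(struct wp_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	bool ok = !s->writer_active && s->waiting_writers == 0;
	if (ok)
		s->active_readers++;
	pthread_mutex_unlock(&s->lock);
	return ok;
}

static void wp_down_read(struct wp_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->writer_active || s->waiting_writers > 0)
		pthread_cond_wait(&s->readers_ok, &s->lock);
	s->active_readers++;
	pthread_mutex_unlock(&s->lock);
}

static void wp_up_read(struct wp_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	if (--s->active_readers == 0)
		pthread_cond_signal(&s->writer_ok);
	pthread_mutex_unlock(&s->lock);
}

/* Returns 1 if the write lock was taken, 0 if the absolute deadline passed. */
static int wp_down_write_timeout(struct wp_rwsem *s, const struct timespec *deadline)
{
	int err = 0;

	pthread_mutex_lock(&s->lock);
	s->waiting_writers++;		/* from here on, new readers are refused */
	while ((s->writer_active || s->active_readers > 0) && err != ETIMEDOUT)
		err = pthread_cond_timedwait(&s->writer_ok, &s->lock, deadline);
	s->waiting_writers--;
	if (s->writer_active || s->active_readers > 0) {
		/* Timed out: if no writer is left waiting, let readers retry. */
		if (s->waiting_writers == 0)
			pthread_cond_broadcast(&s->readers_ok);
		pthread_mutex_unlock(&s->lock);
		return 0;
	}
	s->writer_active = true;
	pthread_mutex_unlock(&s->lock);
	return 1;
}

static void wp_up_write(struct wp_rwsem *s)
{
	pthread_mutex_lock(&s->lock);
	s->writer_active = false;
	if (s->waiting_writers > 0)
		pthread_cond_signal(&s->writer_ok);	/* writers keep priority */
	else
		pthread_cond_broadcast(&s->readers_ok);
	pthread_mutex_unlock(&s->lock);
}
```

The non-zero-on-success return of wp_down_write_timeout() is chosen to match how the pair-locking code in patch 2/7 tests its result; whether ldsem uses exactly that convention is an assumption here.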

> > From whom would you like me to get an ack for this?
>
> The people who wrote the rwsem code?

Ok.


Regards,
Peter Hurley


2013-03-19 19:51:32

by Peter Hurley

Subject: [PATCH 4/7] tty: Clarify ldisc variable

Rename o_ldisc to avoid confusion with the ldisc of the
'other' tty.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index a150f95..9ace119 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -516,7 +516,7 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
int tty_set_ldisc(struct tty_struct *tty, int ldisc)
{
int retval;
- struct tty_ldisc *o_ldisc, *new_ldisc;
+ struct tty_ldisc *old_ldisc, *new_ldisc;
struct tty_struct *o_tty = tty->link;

new_ldisc = tty_ldisc_get(tty, ldisc);
@@ -540,7 +540,7 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
/* FIXME: why 'shutoff' input if the ldisc is locked? */
tty->receive_room = 0;

- o_ldisc = tty->ldisc;
+ old_ldisc = tty->ldisc;
tty_lock(tty);

/* FIXME: for testing only */
@@ -555,8 +555,8 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
return -EIO;
}

- /* Shutdown the current discipline. */
- tty_ldisc_close(tty, o_ldisc);
+ /* Shutdown the old discipline. */
+ tty_ldisc_close(tty, old_ldisc);

/* Now set up the new line discipline. */
tty->ldisc = new_ldisc;
@@ -566,17 +566,17 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
if (retval < 0) {
/* Back to the old one or N_TTY if we can't */
tty_ldisc_put(new_ldisc);
- tty_ldisc_restore(tty, o_ldisc);
+ tty_ldisc_restore(tty, old_ldisc);
}

/* At this point we hold a reference to the new ldisc and a
a reference to the old ldisc. If we ended up flipping back
to the existing ldisc we have two references to it */

- if (tty->ldisc->ops->num != o_ldisc->ops->num && tty->ops->set_ldisc)
+ if (tty->ldisc->ops->num != old_ldisc->ops->num && tty->ops->set_ldisc)
tty->ops->set_ldisc(tty);

- tty_ldisc_put(o_ldisc);
+ tty_ldisc_put(old_ldisc);

/*
* Allow ldisc referencing to occur again
--
1.8.1.2

2013-03-19 19:52:19

by Peter Hurley

Subject: [PATCH 7/7] tty: Fix tty_ldisc_lock name collision

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 9725c94..ba49c0e 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -44,7 +44,7 @@ enum {
* callers who will do ldisc lookups and cannot sleep.
*/

-static DEFINE_RAW_SPINLOCK(tty_ldisc_lock);
+static DEFINE_RAW_SPINLOCK(tty_ldiscs_lock);
/* Line disc dispatch table */
static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];

@@ -58,7 +58,7 @@ static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];
* from this point onwards.
*
* Locking:
- * takes tty_ldisc_lock to guard against ldisc races
+ * takes tty_ldiscs_lock to guard against ldisc races
*/

int tty_register_ldisc(int disc, struct tty_ldisc_ops *new_ldisc)
@@ -69,11 +69,11 @@ int tty_register_ldisc(int disc, struct tty_ldisc_ops *new_ldisc)
if (disc < N_TTY || disc >= NR_LDISCS)
return -EINVAL;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
tty_ldiscs[disc] = new_ldisc;
new_ldisc->num = disc;
new_ldisc->refcount = 0;
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);

return ret;
}
@@ -88,7 +88,7 @@ EXPORT_SYMBOL(tty_register_ldisc);
* currently in use.
*
* Locking:
- * takes tty_ldisc_lock to guard against ldisc races
+ * takes tty_ldiscs_lock to guard against ldisc races
*/

int tty_unregister_ldisc(int disc)
@@ -99,12 +99,12 @@ int tty_unregister_ldisc(int disc)
if (disc < N_TTY || disc >= NR_LDISCS)
return -EINVAL;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
if (tty_ldiscs[disc]->refcount)
ret = -EBUSY;
else
tty_ldiscs[disc] = NULL;
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);

return ret;
}
@@ -115,7 +115,7 @@ static struct tty_ldisc_ops *get_ldops(int disc)
unsigned long flags;
struct tty_ldisc_ops *ldops, *ret;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
ret = ERR_PTR(-EINVAL);
ldops = tty_ldiscs[disc];
if (ldops) {
@@ -125,7 +125,7 @@ static struct tty_ldisc_ops *get_ldops(int disc)
ret = ldops;
}
}
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);
return ret;
}

@@ -133,10 +133,10 @@ static void put_ldops(struct tty_ldisc_ops *ldops)
{
unsigned long flags;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
+ raw_spin_lock_irqsave(&tty_ldiscs_lock, flags);
ldops->refcount--;
module_put(ldops->owner);
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
+ raw_spin_unlock_irqrestore(&tty_ldiscs_lock, flags);
}

/**
@@ -149,7 +149,7 @@ static void put_ldops(struct tty_ldisc_ops *ldops)
* available
*
* Locking:
- * takes tty_ldisc_lock to guard against ldisc races
+ * takes tty_ldiscs_lock to guard against ldisc races
*/

static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc)
--
1.8.1.2
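
The dispatch table that tty_ldiscs_lock protects follows a common registry protocol: lookups pin an entry by bumping its refcount under the lock, and unregistering fails with -EBUSY while any reference is outstanding. The user-space sketch below uses a pthread mutex in place of the raw spinlock; the names and table size are hypothetical, module reference counting is omitted, and it also tolerates unregistering an empty slot.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define NR_DISCS 8			/* hypothetical table size */

struct disc_ops {
	const char *name;
	int num;
	int refcount;			/* outstanding get_ops() references */
};

static pthread_mutex_t discs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct disc_ops *discs[NR_DISCS];

static int register_disc(int disc, struct disc_ops *ops)
{
	if (disc < 0 || disc >= NR_DISCS)
		return -EINVAL;
	pthread_mutex_lock(&discs_lock);
	discs[disc] = ops;
	ops->num = disc;
	ops->refcount = 0;
	pthread_mutex_unlock(&discs_lock);
	return 0;
}

static int unregister_disc(int disc)
{
	int ret = 0;

	if (disc < 0 || disc >= NR_DISCS)
		return -EINVAL;
	pthread_mutex_lock(&discs_lock);
	if (discs[disc] && discs[disc]->refcount)
		ret = -EBUSY;		/* still in use: refuse */
	else
		discs[disc] = NULL;
	pthread_mutex_unlock(&discs_lock);
	return ret;
}

/* Look up and pin an entry; NULL if nothing is registered there. */
static struct disc_ops *get_ops(int disc)
{
	struct disc_ops *ops = NULL;

	if (disc < 0 || disc >= NR_DISCS)
		return NULL;
	pthread_mutex_lock(&discs_lock);
	if (discs[disc]) {
		ops = discs[disc];
		ops->refcount++;
	}
	pthread_mutex_unlock(&discs_lock);
	return ops;
}

static void put_ops(struct disc_ops *ops)
{
	pthread_mutex_lock(&discs_lock);
	ops->refcount--;
	pthread_mutex_unlock(&discs_lock);
}
```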

2013-03-19 19:53:38

by Peter Hurley

Subject: [PATCH 2/7] tty: Add lock/unlock ldisc pair functions

Just as the tty pair must be locked in a stable sequence
(i.e., independent of which is considered the 'other' tty), so must
the ldisc pair be locked in a stable sequence as well.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 1afe192..ae0287f 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -31,6 +31,13 @@
#define tty_ldisc_debug(tty, f, args...)
#endif

+/* lockdep nested classes for tty->ldisc_sem */
+enum {
+ LDISC_SEM_NORMAL,
+ LDISC_SEM_OTHER,
+};
+
+
/*
* This guards the refcounted line discipline lists. The lock
* must be taken with irqs off because there are hangup path
@@ -351,6 +358,86 @@ void tty_ldisc_deref(struct tty_ldisc *ld)
}
EXPORT_SYMBOL_GPL(tty_ldisc_deref);

+
+static inline int __lockfunc
+tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
+{
+ return ldsem_down_write(&tty->ldisc_sem, timeout);
+}
+
+static inline int __lockfunc
+tty_ldisc_lock_nested(struct tty_struct *tty, unsigned long timeout)
+{
+ return ldsem_down_write_nested(&tty->ldisc_sem,
+ LDISC_SEM_OTHER, timeout);
+}
+
+static inline void tty_ldisc_unlock(struct tty_struct *tty)
+{
+ return ldsem_up_write(&tty->ldisc_sem);
+}
+
+static int __lockfunc
+tty_ldisc_lock_pair_timeout(struct tty_struct *tty, struct tty_struct *tty2,
+ unsigned long timeout)
+{
+ int ret;
+
+ if (tty < tty2) {
+ ret = tty_ldisc_lock(tty, timeout);
+ if (ret) {
+ ret = tty_ldisc_lock_nested(tty2, timeout);
+ if (!ret)
+ tty_ldisc_unlock(tty);
+ }
+ } else {
+ /* if this is possible, it has lots of implications */
+ WARN_ON_ONCE(tty == tty2);
+ if (tty2 && tty != tty2) {
+ ret = tty_ldisc_lock(tty2, timeout);
+ if (ret) {
+ ret = tty_ldisc_lock_nested(tty, timeout);
+ if (!ret)
+ tty_ldisc_unlock(tty2);
+ }
+ } else
+ ret = tty_ldisc_lock(tty, timeout);
+ }
+
+ if (!ret)
+ return -EBUSY;
+
+ set_bit(TTY_LDISC_HALTED, &tty->flags);
+ if (tty2)
+ set_bit(TTY_LDISC_HALTED, &tty2->flags);
+ return 0;
+}
+
+static void __lockfunc
+tty_ldisc_lock_pair(struct tty_struct *tty, struct tty_struct *tty2)
+{
+ tty_ldisc_lock_pair_timeout(tty, tty2, MAX_SCHEDULE_TIMEOUT);
+}
+
+static void __lockfunc tty_ldisc_unlock_pair(struct tty_struct *tty,
+ struct tty_struct *tty2)
+{
+ tty_ldisc_unlock(tty);
+ if (tty2)
+ tty_ldisc_unlock(tty2);
+}
+
+static void __lockfunc tty_ldisc_enable_pair(struct tty_struct *tty,
+ struct tty_struct *tty2)
+{
+ clear_bit(TTY_LDISC_HALTED, &tty->flags);
+ if (tty2)
+ clear_bit(TTY_LDISC_HALTED, &tty2->flags);
+
+ tty_ldisc_unlock_pair(tty, tty2);
+}
+
+
/**
* tty_ldisc_enable - allow ldisc use
* @tty: terminal to activate ldisc on
--
1.8.1.2
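
The stable ordering in this patch is the classic remedy for ABBA deadlock: whichever tty is passed first, the lower-addressed one is locked first, so two threads locking the same pair from opposite ends cannot each hold one lock while waiting for the other. The user-space sketch below follows the same shape with pthread mutexes, including the NULL and same-object cases; the helper names are hypothetical, and unlike the patch it has no timeout and sets no HALTED flags. Pointers are compared as uintptr_t because relational comparison of pointers to unrelated objects is undefined in ISO C.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Which lock lock_pair() takes first: the lower address, whatever the argument order. */
static pthread_mutex_t *first_locked(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (b == NULL || b == a)
		return a;
	return (uintptr_t)a < (uintptr_t)b ? a : b;
}

/* Lock a and (optionally) b in a stable, address-based order. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_t *first = first_locked(a, b);

	pthread_mutex_lock(first);
	if (b != NULL && b != a)
		pthread_mutex_lock(first == a ? b : a);
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	if (b != NULL && b != a)
		pthread_mutex_unlock(b);
}
```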

2013-03-19 20:03:40

by Peter Hurley

Subject: [PATCH 3/7] tty: Replace ldisc locking with ldisc_sem

Line discipline locking was performed with a combination of
a mutex, a status bit, a count, and a waitqueue -- basically,
a rw semaphore.

Replace the existing combination with an ld_semaphore.

Fixes:
1) the 'reference acquire after ldisc locked' bug
2) the over-complicated halt mechanism
3) lock order wrt. tty_lock()
4) dropping locks while changing ldisc
5) previously unidentified deadlock while locking ldisc from
both linked ttys concurrently
6) previously unidentified recursive deadlocks

Adds much-needed lockdep diagnostics.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_buffer.c | 2 +-
drivers/tty/tty_io.c | 7 +-
drivers/tty/tty_ldisc.c | 324 ++++++----------------------------------------
include/linux/tty.h | 4 +-
include/linux/tty_ldisc.h | 3 +-
5 files changed, 48 insertions(+), 292 deletions(-)
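
With ldisc_sem in place, the non-blocking tty_ldisc_ref() in this patch reduces to a read-side trylock followed by a validity check: if no ldisc is installed, the read lock is dropped again and NULL is returned. The user-space analogue below uses pthread_rwlock_tryrdlock() with hypothetical struct and function names. It reads the pointer once while the lock is held, and since the default pthread rwlock does not promise writer priority, it shows only the reference protocol, not the locking policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

struct fake_ldisc { int num; };

struct fake_tty {
	pthread_rwlock_t ldisc_sem;	/* stands in for tty->ldisc_sem */
	struct fake_ldisc *ldisc;	/* NULL while no ldisc is installed */
};

/* Non-blocking reference: NULL if a writer holds the lock or no ldisc is set. */
static struct fake_ldisc *fake_ldisc_ref(struct fake_tty *tty)
{
	struct fake_ldisc *ld;

	if (pthread_rwlock_tryrdlock(&tty->ldisc_sem) != 0)
		return NULL;		/* ldisc is being changed or halted */
	ld = tty->ldisc;		/* read once, under the lock */
	if (ld == NULL)
		pthread_rwlock_unlock(&tty->ldisc_sem);
	return ld;			/* non-NULL: caller must fake_ldisc_deref() */
}

static void fake_ldisc_deref(struct fake_tty *tty)
{
	pthread_rwlock_unlock(&tty->ldisc_sem);
}
```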

diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c
index 578aa75..8e8d730 100644
--- a/drivers/tty/tty_buffer.c
+++ b/drivers/tty/tty_buffer.c
@@ -429,7 +429,7 @@ static void flush_to_ldisc(struct work_struct *work)
return;

disc = tty_ldisc_ref(tty);
- if (disc == NULL) /* !TTY_LDISC */
+ if (disc == NULL)
return;

spin_lock_irqsave(&buf->lock, flags);
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 2f77af4..06f0e25 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1326,8 +1326,7 @@ static int tty_reopen(struct tty_struct *tty)
struct tty_driver *driver = tty->driver;

if (test_bit(TTY_CLOSING, &tty->flags) ||
- test_bit(TTY_HUPPING, &tty->flags) ||
- test_bit(TTY_LDISC_CHANGING, &tty->flags))
+ test_bit(TTY_HUPPING, &tty->flags))
return -EIO;

if (driver->type == TTY_DRIVER_TYPE_PTY &&
@@ -1343,7 +1342,7 @@ static int tty_reopen(struct tty_struct *tty)
}
tty->count++;

- WARN_ON(!test_bit(TTY_LDISC, &tty->flags));
+ WARN_ON(!tty->ldisc);

return 0;
}
@@ -2952,7 +2951,7 @@ void initialize_tty_struct(struct tty_struct *tty,
tty->pgrp = NULL;
mutex_init(&tty->legacy_mutex);
mutex_init(&tty->termios_mutex);
- mutex_init(&tty->ldisc_mutex);
+ init_ldsem(&tty->ldisc_sem);
init_waitqueue_head(&tty->write_wait);
init_waitqueue_head(&tty->read_wait);
INIT_WORK(&tty->hangup_work, do_tty_hangup);
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index ae0287f..a150f95 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -45,7 +45,6 @@ enum {
*/

static DEFINE_RAW_SPINLOCK(tty_ldisc_lock);
-static DECLARE_WAIT_QUEUE_HEAD(tty_ldisc_wait);
/* Line disc dispatch table */
static struct tty_ldisc_ops *tty_ldiscs[NR_LDISCS];

@@ -153,7 +152,7 @@ static void put_ldops(struct tty_ldisc_ops *ldops)
* takes tty_ldisc_lock to guard against ldisc races
*/

-static struct tty_ldisc *tty_ldisc_get(int disc)
+static struct tty_ldisc *tty_ldisc_get(struct tty_struct *tty, int disc)
{
struct tty_ldisc *ld;
struct tty_ldisc_ops *ldops;
@@ -180,8 +179,7 @@ static struct tty_ldisc *tty_ldisc_get(int disc)
}

ld->ops = ldops;
- atomic_set(&ld->users, 1);
- init_waitqueue_head(&ld->wq_idle);
+ ld->tty = tty;

return ld;
}
@@ -193,20 +191,11 @@ static struct tty_ldisc *tty_ldisc_get(int disc)
*/
static inline void tty_ldisc_put(struct tty_ldisc *ld)
{
- unsigned long flags;
-
if (WARN_ON_ONCE(!ld))
return;

- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
-
- /* unreleased reader reference(s) will cause this WARN */
- WARN_ON(!atomic_dec_and_test(&ld->users));
-
- ld->ops->refcount--;
- module_put(ld->ops->owner);
+ put_ldops(ld->ops);
kfree(ld);
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
}

static void *tty_ldiscs_seq_start(struct seq_file *m, loff_t *pos)
@@ -258,34 +247,6 @@ const struct file_operations tty_ldiscs_proc_fops = {
};

/**
- * tty_ldisc_try - internal helper
- * @tty: the tty
- *
- * Make a single attempt to grab and bump the refcount on
- * the tty ldisc. Return 0 on failure or 1 on success. This is
- * used to implement both the waiting and non waiting versions
- * of tty_ldisc_ref
- *
- * Locking: takes tty_ldisc_lock
- */
-
-static struct tty_ldisc *tty_ldisc_try(struct tty_struct *tty)
-{
- unsigned long flags;
- struct tty_ldisc *ld;
-
- /* FIXME: this allows reference acquire after TTY_LDISC is cleared */
- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
- ld = NULL;
- if (test_bit(TTY_LDISC, &tty->flags) && tty->ldisc) {
- ld = tty->ldisc;
- atomic_inc(&ld->users);
- }
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
- return ld;
-}
-
-/**
* tty_ldisc_ref_wait - wait for the tty ldisc
* @tty: tty device
*
@@ -298,16 +259,15 @@ static struct tty_ldisc *tty_ldisc_try(struct tty_struct *tty)
* against a discipline change, such as an existing ldisc reference
* (which we check for)
*
- * Locking: call functions take tty_ldisc_lock
+ * Note: only callable from a file_operations routine (which
+ * guarantees tty->ldisc != NULL when the lock is acquired).
*/

struct tty_ldisc *tty_ldisc_ref_wait(struct tty_struct *tty)
{
- struct tty_ldisc *ld;
-
- /* wait_event is a macro */
- wait_event(tty_ldisc_wait, (ld = tty_ldisc_try(tty)) != NULL);
- return ld;
+ ldsem_down_read(&tty->ldisc_sem, MAX_SCHEDULE_TIMEOUT);
+ WARN_ON(!tty->ldisc);
+ return tty->ldisc;
}
EXPORT_SYMBOL_GPL(tty_ldisc_ref_wait);

@@ -318,13 +278,16 @@ EXPORT_SYMBOL_GPL(tty_ldisc_ref_wait);
* Dereference the line discipline for the terminal and take a
* reference to it. If the line discipline is in flux then
* return NULL. Can be called from IRQ and timer functions.
- *
- * Locking: called functions take tty_ldisc_lock
*/

struct tty_ldisc *tty_ldisc_ref(struct tty_struct *tty)
{
- return tty_ldisc_try(tty);
+ if (ldsem_down_read_trylock(&tty->ldisc_sem)) {
+ if (!tty->ldisc)
+ ldsem_up_read(&tty->ldisc_sem);
+ return tty->ldisc;
+ }
+ return NULL;
}
EXPORT_SYMBOL_GPL(tty_ldisc_ref);

@@ -334,27 +297,11 @@ EXPORT_SYMBOL_GPL(tty_ldisc_ref);
*
* Undoes the effect of tty_ldisc_ref or tty_ldisc_ref_wait. May
* be called in IRQ context.
- *
- * Locking: takes tty_ldisc_lock
*/

void tty_ldisc_deref(struct tty_ldisc *ld)
{
- unsigned long flags;
-
- if (WARN_ON_ONCE(!ld))
- return;
-
- raw_spin_lock_irqsave(&tty_ldisc_lock, flags);
- /*
- * WARNs if one-too-many reader references were released
- * - the last reference must be released with tty_ldisc_put
- */
- WARN_ON(atomic_dec_and_test(&ld->users));
- raw_spin_unlock_irqrestore(&tty_ldisc_lock, flags);
-
- if (waitqueue_active(&ld->wq_idle))
- wake_up(&ld->wq_idle);
+ ldsem_up_read(&ld->tty->ldisc_sem);
}
EXPORT_SYMBOL_GPL(tty_ldisc_deref);

@@ -439,26 +386,6 @@ static void __lockfunc tty_ldisc_enable_pair(struct tty_struct *tty,


/**
- * tty_ldisc_enable - allow ldisc use
- * @tty: terminal to activate ldisc on
- *
- * Set the TTY_LDISC flag when the line discipline can be called
- * again. Do necessary wakeups for existing sleepers. Clear the LDISC
- * changing flag to indicate any ldisc change is now over.
- *
- * Note: nobody should set the TTY_LDISC bit except via this function.
- * Clearing directly is allowed.
- */
-
-static void tty_ldisc_enable(struct tty_struct *tty)
-{
- clear_bit(TTY_LDISC_HALTED, &tty->flags);
- set_bit(TTY_LDISC, &tty->flags);
- clear_bit(TTY_LDISC_CHANGING, &tty->flags);
- wake_up(&tty_ldisc_wait);
-}
-
-/**
* tty_ldisc_flush - flush line discipline queue
* @tty: tty
*
@@ -555,14 +482,14 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
int r;

/* There is an outstanding reference here so this is safe */
- old = tty_ldisc_get(old->ops->num);
+ old = tty_ldisc_get(tty, old->ops->num);
WARN_ON(IS_ERR(old));
tty->ldisc = old;
tty_set_termios_ldisc(tty, old->ops->num);
if (tty_ldisc_open(tty, old) < 0) {
tty_ldisc_put(old);
/* This driver is always present */
- new_ldisc = tty_ldisc_get(N_TTY);
+ new_ldisc = tty_ldisc_get(tty, N_TTY);
if (IS_ERR(new_ldisc))
panic("n_tty: get");
tty->ldisc = new_ldisc;
@@ -576,101 +503,6 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
}

/**
- * tty_ldisc_wait_idle - wait for the ldisc to become idle
- * @tty: tty to wait for
- * @timeout: for how long to wait at most
- *
- * Wait for the line discipline to become idle. The discipline must
- * have been halted for this to guarantee it remains idle.
- */
-static int tty_ldisc_wait_idle(struct tty_struct *tty, long timeout)
-{
- long ret;
- ret = wait_event_timeout(tty->ldisc->wq_idle,
- atomic_read(&tty->ldisc->users) == 1, timeout);
- return ret > 0 ? 0 : -EBUSY;
-}
-
-/**
- * tty_ldisc_halt - shut down the line discipline
- * @tty: tty device
- * @o_tty: paired pty device (can be NULL)
- * @timeout: # of jiffies to wait for ldisc refs to be released
- *
- * Shut down the line discipline and work queue for this tty device and
- * its paired pty (if exists). Clearing the TTY_LDISC flag ensures
- * no further references can be obtained, while waiting for existing
- * references to be released ensures no more data is fed to the ldisc.
- *
- * You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex)
- * in order to make sure any currently executing ldisc work is also
- * flushed.
- */
-
-static int tty_ldisc_halt(struct tty_struct *tty, struct tty_struct *o_tty,
- long timeout)
-{
- int retval;
-
- clear_bit(TTY_LDISC, &tty->flags);
- if (o_tty)
- clear_bit(TTY_LDISC, &o_tty->flags);
-
- retval = tty_ldisc_wait_idle(tty, timeout);
- if (!retval && o_tty)
- retval = tty_ldisc_wait_idle(o_tty, timeout);
- if (retval)
- return retval;
-
- set_bit(TTY_LDISC_HALTED, &tty->flags);
- if (o_tty)
- set_bit(TTY_LDISC_HALTED, &o_tty->flags);
-
- return 0;
-}
-
-/**
- * tty_ldisc_hangup_halt - halt the line discipline for hangup
- * @tty: tty being hung up
- *
- * Shut down the line discipline and work queue for the tty device
- * being hungup. Clear the TTY_LDISC flag to ensure no further
- * references can be obtained and wait for remaining references to be
- * released to ensure no more data is fed to this ldisc.
- * Caller must hold legacy and ->ldisc_mutex.
- *
- * NB: tty_set_ldisc() is prevented from changing the ldisc concurrently
- * with this function by checking the TTY_HUPPING flag.
- */
-static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
-{
- char cur_n[TASK_COMM_LEN], tty_n[64];
- long timeout = 3 * HZ;
-
- clear_bit(TTY_LDISC, &tty->flags);
-
- if (tty->ldisc) { /* Not yet closed */
- tty_unlock(tty);
-
- while (tty_ldisc_wait_idle(tty, timeout) == -EBUSY) {
- timeout = MAX_SCHEDULE_TIMEOUT;
- printk_ratelimited(KERN_WARNING
- "%s: waiting (%s) for %s took too long, but we keep waiting...\n",
- __func__, get_task_comm(cur_n, current),
- tty_name(tty, tty_n));
- }
-
- set_bit(TTY_LDISC_HALTED, &tty->flags);
-
- /* must reacquire both locks and preserve lock order */
- mutex_unlock(&tty->ldisc_mutex);
- tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);
- }
- return !!tty->ldisc;
-}
-
-/**
* tty_set_ldisc - set line discipline
* @tty: the terminal to set
* @ldisc: the line discipline
@@ -679,103 +511,45 @@ static bool tty_ldisc_hangup_halt(struct tty_struct *tty)
* context. The ldisc change logic has to protect itself against any
* overlapping ldisc change (including on the other end of pty pairs),
* the close of one side of a tty/pty pair, and eventually hangup.
- *
- * Locking: takes tty_ldisc_lock, termios_mutex
*/

int tty_set_ldisc(struct tty_struct *tty, int ldisc)
{
int retval;
struct tty_ldisc *o_ldisc, *new_ldisc;
- struct tty_struct *o_tty;
+ struct tty_struct *o_tty = tty->link;

- new_ldisc = tty_ldisc_get(ldisc);
+ new_ldisc = tty_ldisc_get(tty, ldisc);
if (IS_ERR(new_ldisc))
return PTR_ERR(new_ldisc);

- tty_lock(tty);
- /*
- * We need to look at the tty locking here for pty/tty pairs
- * when both sides try to change in parallel.
- */
-
- o_tty = tty->link; /* o_tty is the pty side or NULL */
-
+ retval = tty_ldisc_lock_pair_timeout(tty, o_tty, 5 * HZ);
+ if (retval)
+ return retval;

/*
* Check the no-op case
*/

if (tty->ldisc->ops->num == ldisc) {
- tty_unlock(tty);
+ tty_ldisc_enable_pair(tty, o_tty);
tty_ldisc_put(new_ldisc);
return 0;
}

- mutex_lock(&tty->ldisc_mutex);
-
- /*
- * We could be midstream of another ldisc change which has
- * dropped the lock during processing. If so we need to wait.
- */
-
- while (test_bit(TTY_LDISC_CHANGING, &tty->flags)) {
- mutex_unlock(&tty->ldisc_mutex);
- tty_unlock(tty);
- wait_event(tty_ldisc_wait,
- test_bit(TTY_LDISC_CHANGING, &tty->flags) == 0);
- tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);
- }
-
- set_bit(TTY_LDISC_CHANGING, &tty->flags);
-
- /*
- * No more input please, we are switching. The new ldisc
- * will update this value in the ldisc open function
- */
-
+ /* FIXME: why 'shutoff' input if the ldisc is locked? */
tty->receive_room = 0;

o_ldisc = tty->ldisc;
-
- tty_unlock(tty);
- /*
- * Make sure we don't change while someone holds a
- * reference to the line discipline. The TTY_LDISC bit
- * prevents anyone taking a reference once it is clear.
- * We need the lock to avoid racing reference takers.
- *
- * We must clear the TTY_LDISC bit here to avoid a livelock
- * with a userspace app continually trying to use the tty in
- * parallel to the change and re-referencing the tty.
- */
-
- retval = tty_ldisc_halt(tty, o_tty, 5 * HZ);
-
- /*
- * Wait for hangup to complete, if pending.
- * We must drop the mutex here in case a hangup is also in process.
- */
-
- mutex_unlock(&tty->ldisc_mutex);
-
- flush_work(&tty->hangup_work);
-
tty_lock(tty);
- mutex_lock(&tty->ldisc_mutex);

- /* handle wait idle failure locked */
- if (retval) {
- tty_ldisc_put(new_ldisc);
- goto enable;
- }
+ /* FIXME: for testing only */
+ WARN_ON(test_bit(TTY_HUPPED, &tty->flags));

if (test_bit(TTY_HUPPING, &tty->flags)) {
/* We were raced by the hangup method. It will have stomped
the ldisc data and closed the ldisc down */
- clear_bit(TTY_LDISC_CHANGING, &tty->flags);
- mutex_unlock(&tty->ldisc_mutex);
+ tty_ldisc_enable_pair(tty, o_tty);
tty_ldisc_put(new_ldisc);
tty_unlock(tty);
return -EIO;
@@ -804,14 +578,10 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)

tty_ldisc_put(o_ldisc);

-enable:
/*
* Allow ldisc referencing to occur again
*/
-
- tty_ldisc_enable(tty);
- if (o_tty)
- tty_ldisc_enable(o_tty);
+ tty_ldisc_enable_pair(tty, o_tty);

/* Restart the work queue in case no characters kick it off. Safe if
already running */
@@ -819,7 +589,6 @@ enable:
if (o_tty)
schedule_work(&o_tty->port->buf.work);

- mutex_unlock(&tty->ldisc_mutex);
tty_unlock(tty);
return retval;
}
@@ -852,7 +621,7 @@ static void tty_reset_termios(struct tty_struct *tty)

static int tty_ldisc_reinit(struct tty_struct *tty, int ldisc)
{
- struct tty_ldisc *ld = tty_ldisc_get(ldisc);
+ struct tty_ldisc *ld = tty_ldisc_get(tty, ldisc);

if (IS_ERR(ld))
return -1;
@@ -891,14 +660,8 @@ void tty_ldisc_hangup(struct tty_struct *tty)

tty_ldisc_debug(tty, "closing ldisc: %p\n", tty->ldisc);

- /*
- * FIXME! What are the locking issues here? This may me overdoing
- * things... This question is especially important now that we've
- * removed the irqlock.
- */
ld = tty_ldisc_ref(tty);
if (ld != NULL) {
- /* We may have no line discipline at this point */
if (ld->ops->flush_buffer)
ld->ops->flush_buffer(tty);
tty_driver_flush_buffer(tty);
@@ -909,21 +672,22 @@ void tty_ldisc_hangup(struct tty_struct *tty)
ld->ops->hangup(tty);
tty_ldisc_deref(ld);
}
- /*
- * FIXME: Once we trust the LDISC code better we can wait here for
- * ldisc completion and fix the driver call race
- */
+
wake_up_interruptible_poll(&tty->write_wait, POLLOUT);
wake_up_interruptible_poll(&tty->read_wait, POLLIN);
+
+ tty_unlock(tty);
+
/*
* Shutdown the current line discipline, and reset it to
* N_TTY if need be.
*
* Avoid racing set_ldisc or tty_ldisc_release
*/
- mutex_lock(&tty->ldisc_mutex);
+ tty_ldisc_lock_pair(tty, tty->link);
+ tty_lock(tty);

- if (tty_ldisc_hangup_halt(tty)) {
+ if (tty->ldisc) {

/* At this point we have a halted ldisc; we want to close it and
reopen a new ldisc. We could defer the reopen to the next
@@ -942,9 +706,8 @@ void tty_ldisc_hangup(struct tty_struct *tty)
BUG_ON(tty_ldisc_reinit(tty, N_TTY));
WARN_ON(tty_ldisc_open(tty, tty->ldisc));
}
- tty_ldisc_enable(tty);
}
- mutex_unlock(&tty->ldisc_mutex);
+ tty_ldisc_enable_pair(tty, tty->link);
if (reset)
tty_reset_termios(tty);

@@ -976,15 +739,12 @@ int tty_ldisc_setup(struct tty_struct *tty, struct tty_struct *o_tty)
tty_ldisc_close(tty, ld);
return retval;
}
- tty_ldisc_enable(o_tty);
}
- tty_ldisc_enable(tty);
return 0;
}

static void tty_ldisc_kill(struct tty_struct *tty)
{
- mutex_lock(&tty->ldisc_mutex);
/*
* Now kill off the ldisc
*/
@@ -995,7 +755,6 @@ static void tty_ldisc_kill(struct tty_struct *tty)

/* Ensure the next open requests the N_TTY ldisc */
tty_set_termios_ldisc(tty, N_TTY);
- mutex_unlock(&tty->ldisc_mutex);
}

/**
@@ -1017,15 +776,16 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)

tty_ldisc_debug(tty, "closing ldisc: %p\n", tty->ldisc);

- tty_ldisc_halt(tty, o_tty, MAX_SCHEDULE_TIMEOUT);
-
+ tty_ldisc_lock_pair(tty, o_tty);
tty_lock_pair(tty, o_tty);
- /* This will need doing differently if we need to lock */
+
tty_ldisc_kill(tty);
if (o_tty)
tty_ldisc_kill(o_tty);

tty_unlock_pair(tty, o_tty);
+ tty_ldisc_unlock_pair(tty, o_tty);
+
/* And the memory resources remaining (buffers, termios) will be
disposed of when the kref hits zero */

@@ -1042,7 +802,7 @@ void tty_ldisc_release(struct tty_struct *tty, struct tty_struct *o_tty)

void tty_ldisc_init(struct tty_struct *tty)
{
- struct tty_ldisc *ld = tty_ldisc_get(N_TTY);
+ struct tty_ldisc *ld = tty_ldisc_get(tty, N_TTY);
if (IS_ERR(ld))
panic("n_tty: init_tty");
tty->ldisc = ld;
diff --git a/include/linux/tty.h b/include/linux/tty.h
index bfa6fca..2c109a3 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -238,7 +238,7 @@ struct tty_struct {
int index;

/* Protects ldisc changes: Lock tty not pty */
- struct mutex ldisc_mutex;
+ struct ld_semaphore ldisc_sem;
struct tty_ldisc *ldisc;

struct mutex atomic_write_lock;
@@ -306,8 +306,6 @@ struct tty_file_private {
#define TTY_DO_WRITE_WAKEUP 5 /* Call write_wakeup after queuing new */
#define TTY_PUSH 6 /* n_tty private */
#define TTY_CLOSING 7 /* ->close() in progress */
-#define TTY_LDISC 9 /* Line discipline attached */
-#define TTY_LDISC_CHANGING 10 /* Line discipline changing */
#define TTY_LDISC_OPEN 11 /* Line discipline is open */
#define TTY_HW_COOK_OUT 14 /* Hardware can do output cooking */
#define TTY_HW_COOK_IN 15 /* Hardware can do input cooking */
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index ca000fc..272075e 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -197,8 +197,7 @@ struct tty_ldisc_ops {

struct tty_ldisc {
struct tty_ldisc_ops *ops;
- atomic_t users;
- wait_queue_head_t wq_idle;
+ struct tty_struct *tty;
};

#define TTY_LDISC_MAGIC 0x5403
--
1.8.1.2

2013-03-19 20:04:32

by Peter Hurley

Subject: [PATCH 1/7] tty: Add timed, writer-prioritized rw semaphore

The semantics of a rw semaphore are almost ideally suited
for tty line discipline lifetime management; multiple active
threads obtain "references" (read locks) while performing i/o
to prevent the loss or change of the current line discipline
(write lock).

Unfortunately, the existing rw_semaphore is ill-suited in other
ways:
1) TIOCSETD ioctl (change line discipline) expects to return an
error if the line discipline cannot be exclusively locked within
5 secs. Lock wait timeouts are not supported by rwsem.
2) A tty hangup is expected to halt and scrap pending i/o, so
exclusive locking must be prioritized.
Writer priority is not supported by rwsem.

Add ld_semaphore which implements these requirements in a
semantically similar way to rw_semaphore.

Writer priority is handled by separate wait lists for readers and
writers. Pending write waits are prioritized before existing read
waits and prevent further read locks.

Wait timeouts are trivially added, but obviously change the lock
semantics as lock attempts can fail (but only due to timeout).
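
To make this policy concrete, here is a minimal userspace sketch (not the kernel implementation) of a writer-priority rw lock with timed waits, built on POSIX mutexes and condition variables. The wp_* names and WP_RWLOCK_INIT are hypothetical, and timeouts are in milliseconds rather than jiffies. Like ldsem_down_read()/ldsem_down_write(), the lock functions return 1 on success and 0 on timeout. A pending writer (waiting_writers > 0) blocks new readers, which is the writer-priority rule described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace analogue of ld_semaphore; not the kernel code. */
struct wp_rwlock {
	pthread_mutex_t lock;
	pthread_cond_t readers_cv;	/* readers wait here */
	pthread_cond_t writers_cv;	/* writers wait here */
	int active_readers;
	int waiting_writers;		/* pending writers block new readers */
	bool writer_active;
};

#define WP_RWLOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
			 PTHREAD_COND_INITIALIZER, 0, 0, false }

/* Relative timeout (ms) -> absolute CLOCK_REALTIME deadline, which is
 * what pthread_cond_timedwait() expects by default. */
static struct timespec deadline_after(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/* Returns 1 if the read lock is granted, 0 on timeout. */
int wp_down_read(struct wp_rwlock *l, long timeout_ms)
{
	struct timespec dl = deadline_after(timeout_ms);

	pthread_mutex_lock(&l->lock);
	while (l->writer_active || l->waiting_writers > 0) {
		/* On timeout, still succeed if the lock became free meanwhile. */
		if (pthread_cond_timedwait(&l->readers_cv, &l->lock, &dl) == ETIMEDOUT &&
		    (l->writer_active || l->waiting_writers > 0)) {
			pthread_mutex_unlock(&l->lock);
			return 0;
		}
	}
	l->active_readers++;
	pthread_mutex_unlock(&l->lock);
	return 1;
}

/* Returns 1 if the write lock is granted, 0 on timeout. */
int wp_down_write(struct wp_rwlock *l, long timeout_ms)
{
	struct timespec dl = deadline_after(timeout_ms);

	pthread_mutex_lock(&l->lock);
	l->waiting_writers++;		/* from now on, new readers must wait */
	while (l->writer_active || l->active_readers > 0) {
		if (pthread_cond_timedwait(&l->writers_cv, &l->lock, &dl) == ETIMEDOUT &&
		    (l->writer_active || l->active_readers > 0)) {
			/* Give up; readers held off only by us may now proceed. */
			if (--l->waiting_writers == 0 && !l->writer_active)
				pthread_cond_broadcast(&l->readers_cv);
			pthread_mutex_unlock(&l->lock);
			return 0;
		}
	}
	l->waiting_writers--;
	l->writer_active = true;
	pthread_mutex_unlock(&l->lock);
	return 1;
}

void wp_up_read(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	assert(l->active_readers > 0);
	if (--l->active_readers == 0 && l->waiting_writers > 0)
		pthread_cond_signal(&l->writers_cv);
	pthread_mutex_unlock(&l->lock);
}

void wp_up_write(struct wp_rwlock *l)
{
	pthread_mutex_lock(&l->lock);
	l->writer_active = false;
	/* Writers first; readers are released only when no writer is pending. */
	if (l->waiting_writers > 0)
		pthread_cond_signal(&l->writers_cv);
	else
		pthread_cond_broadcast(&l->readers_cv);
	pthread_mutex_unlock(&l->lock);
}
```

With a second thread blocked in wp_down_write(), further wp_down_read() calls wait (or time out) even though only readers currently hold the lock; that is the ordering a hangup or TIOCSETD needs. Unlike ldsem, this sketch takes a mutex on every operation instead of using an atomic fast path on a single count.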

This implementation incorporates the write-lock stealing work of
Michel Lespinasse <[email protected]>.

Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/Makefile | 2 +-
drivers/tty/tty_ldsem.c | 453 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/tty_ldisc.h | 46 +++++
3 files changed, 500 insertions(+), 1 deletion(-)
create mode 100644 drivers/tty/tty_ldsem.c

diff --git a/drivers/tty/Makefile b/drivers/tty/Makefile
index 6b78399..58ad1c0 100644
--- a/drivers/tty/Makefile
+++ b/drivers/tty/Makefile
@@ -1,5 +1,5 @@
obj-$(CONFIG_TTY) += tty_io.o n_tty.o tty_ioctl.o tty_ldisc.o \
- tty_buffer.o tty_port.o tty_mutex.o
+ tty_buffer.o tty_port.o tty_mutex.o tty_ldsem.o
obj-$(CONFIG_LEGACY_PTYS) += pty.o
obj-$(CONFIG_UNIX98_PTYS) += pty.o
obj-$(CONFIG_AUDIT) += tty_audit.o
diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
new file mode 100644
index 0000000..22fad8a
--- /dev/null
+++ b/drivers/tty/tty_ldsem.c
@@ -0,0 +1,453 @@
+/*
+ * Ldisc rw semaphore
+ *
+ * The ldisc semaphore is semantically a rw_semaphore but which enforces
+ * an alternate policy, namely:
+ * 1) Supports lock wait timeouts
+ * 2) Write waiter has priority
+ * 3) Downgrading is not supported
+ *
+ * Implementation notes:
+ * 1) Upper half of semaphore count is a wait count (differs from rwsem
+ * in that rwsem normalizes the upper half to the wait bias)
+ * 2) Lacks overflow checking
+ *
+ * The generic counting was copied and modified from include/asm-generic/rwsem.h
+ * by Paul Mackerras <[email protected]>.
+ *
+ * The scheduling policy was copied and modified from lib/rwsem.c
+ * Written by David Howells ([email protected]).
+ *
+ * This implementation incorporates the write lock stealing work of
+ * Michel Lespinasse <[email protected]>.
+ *
+ * Copyright (C) 2013 Peter Hurley <[email protected]>
+ *
+ * This file may be redistributed under the terms of the GNU General Public
+ * License v2.
+ */
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <linux/tty.h>
+#include <linux/sched.h>
+
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+# define __acq(l, s, t, r, c, n, i) \
+ lock_acquire(&(l)->dep_map, s, t, r, c, n, i)
+# define __rel(l, n, i) \
+ lock_release(&(l)->dep_map, n, i)
+# ifdef CONFIG_PROVE_LOCKING
+# define lockdep_acquire(l, s, t, i) __acq(l, s, t, 0, 2, NULL, i)
+# define lockdep_acquire_nest(l, s, t, n, i) __acq(l, s, t, 0, 2, n, i)
+# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 1, 2, NULL, i)
+# define lockdep_release(l, n, i) __rel(l, n, i)
+# else
+# define lockdep_acquire(l, s, t, i) __acq(l, s, t, 0, 1, NULL, i)
+# define lockdep_acquire_nest(l, s, t, n, i) __acq(l, s, t, 0, 1, n, i)
+# define lockdep_acquire_read(l, s, t, i) __acq(l, s, t, 1, 1, NULL, i)
+# define lockdep_release(l, n, i) __rel(l, n, i)
+# endif
+#else
+# define lockdep_acquire(l, s, t, i) do { } while (0)
+# define lockdep_acquire_nest(l, s, t, n, i) do { } while (0)
+# define lockdep_acquire_read(l, s, t, i) do { } while (0)
+# define lockdep_release(l, n, i) do { } while (0)
+#endif
+
+#ifdef CONFIG_LOCK_STAT
+# define lock_stat(_lock, stat) lock_##stat(&(_lock)->dep_map, _RET_IP_)
+#else
+# define lock_stat(_lock, stat) do { } while (0)
+#endif
+
+
+#if BITS_PER_LONG == 64
+# define LDSEM_ACTIVE_MASK 0xffffffffL
+#else
+# define LDSEM_ACTIVE_MASK 0x0000ffffL
+#endif
+
+#define LDSEM_UNLOCKED 0L
+#define LDSEM_ACTIVE_BIAS 1L
+#define LDSEM_WAIT_BIAS (-LDSEM_ACTIVE_MASK-1)
+#define LDSEM_READ_BIAS LDSEM_ACTIVE_BIAS
+#define LDSEM_WRITE_BIAS (LDSEM_WAIT_BIAS + LDSEM_ACTIVE_BIAS)
+
+struct ldsem_waiter {
+ struct list_head list;
+ struct task_struct *task;
+};
+
+static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem)
+{
+ return atomic_long_add_return(delta, (atomic_long_t *)&sem->count);
+}
+
+static inline int ldsem_cmpxchg(long *old, long new, struct ld_semaphore *sem)
+{
+ long tmp = *old;
+ *old = atomic_long_cmpxchg(&sem->count, *old, new);
+ return *old == tmp;
+}
+
+/*
+ * Initialize an ldsem:
+ */
+void __init_ldsem(struct ld_semaphore *sem, const char *name,
+ struct lock_class_key *key)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ /*
+ * Make sure we are not reinitializing a held semaphore:
+ */
+ debug_check_no_locks_freed((void *)sem, sizeof(*sem));
+ lockdep_init_map(&sem->dep_map, name, key, 0);
+#endif
+ sem->count = LDSEM_UNLOCKED;
+ sem->wait_readers = 0;
+ raw_spin_lock_init(&sem->wait_lock);
+ INIT_LIST_HEAD(&sem->read_wait);
+ INIT_LIST_HEAD(&sem->write_wait);
+}
+
+static void __ldsem_wake_readers(struct ld_semaphore *sem)
+{
+ struct ldsem_waiter *waiter, *next;
+ struct task_struct *tsk;
+ long adjust, count;
+
+ /* Try to grant read locks to all readers on the read wait list.
+ * Note the 'active part' of the count is incremented by
+ * the number of readers before waking any processes up.
+ */
+ adjust = sem->wait_readers * (LDSEM_ACTIVE_BIAS - LDSEM_WAIT_BIAS);
+ count = ldsem_atomic_update(adjust, sem);
+ do {
+ if (count > 0)
+ break;
+ if (ldsem_cmpxchg(&count, count - adjust, sem))
+ return;
+ } while (1);
+
+ list_for_each_entry_safe(waiter, next, &sem->read_wait, list) {
+ tsk = waiter->task;
+ smp_mb();
+ waiter->task = NULL;
+ wake_up_process(tsk);
+ put_task_struct(tsk);
+ }
+ INIT_LIST_HEAD(&sem->read_wait);
+ sem->wait_readers = 0;
+}
+
+static inline int writer_trylock(struct ld_semaphore *sem)
+{
+ /* only wake this writer if the active part of the count can be
+ * transitioned from 0 -> 1
+ */
+ long count = ldsem_atomic_update(LDSEM_ACTIVE_BIAS, sem);
+ do {
+ if ((count & LDSEM_ACTIVE_MASK) == LDSEM_ACTIVE_BIAS)
+ return 1;
+ if (ldsem_cmpxchg(&count, count - LDSEM_ACTIVE_BIAS, sem))
+ return 0;
+ } while (1);
+}
+
+static void __ldsem_wake_writer(struct ld_semaphore *sem)
+{
+ struct ldsem_waiter *waiter;
+
+ waiter = list_entry(sem->write_wait.next, struct ldsem_waiter, list);
+ wake_up_process(waiter->task);
+}
+
+/*
+ * handle the lock release when processes blocked on it that can now run
+ * - if we come here from up_xxxx(), then:
+ * - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
+ * - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
+ * - the spinlock must be held by the caller
+ * - woken process blocks are discarded from the list after having task zeroed
+ */
+static void __ldsem_wake(struct ld_semaphore *sem)
+{
+ if (!list_empty(&sem->write_wait))
+ __ldsem_wake_writer(sem);
+ else if (!list_empty(&sem->read_wait))
+ __ldsem_wake_readers(sem);
+}
+
+static void ldsem_wake(struct ld_semaphore *sem)
+{
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&sem->wait_lock, flags);
+ __ldsem_wake(sem);
+ raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+}
+
+/*
+ * wait for the read lock to be granted
+ */
+static struct ld_semaphore __sched *
+down_read_failed(struct ld_semaphore *sem, long count, long timeout)
+{
+ struct ldsem_waiter waiter;
+ struct task_struct *tsk = current;
+ long adjust = -LDSEM_ACTIVE_BIAS + LDSEM_WAIT_BIAS;
+
+ /* set up my own style of waitqueue */
+ raw_spin_lock_irq(&sem->wait_lock);
+
+ /* Try to reverse the lock attempt but if the count has changed
+ * so that reversing fails, check if there are no waiters,
+ * and early-out if so */
+ do {
+ if (ldsem_cmpxchg(&count, count + adjust, sem))
+ break;
+ if (count > 0) {
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return sem;
+ }
+ } while (1);
+
+ list_add_tail(&waiter.list, &sem->read_wait);
+ sem->wait_readers++;
+
+ waiter.task = tsk;
+ get_task_struct(tsk);
+
+ /* if there are no active locks, wake the new lock owner(s) */
+ if ((count & LDSEM_ACTIVE_MASK) == 0)
+ __ldsem_wake(sem);
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ /* wait to be given the lock */
+ for (;;) {
+ set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+
+ if (!waiter.task)
+ break;
+ if (!timeout)
+ break;
+ timeout = schedule_timeout(timeout);
+ }
+
+ __set_task_state(tsk, TASK_RUNNING);
+
+ if (!timeout) {
+ /* lock timed out but check if this task was just
+ * granted lock ownership - if so, pretend there
+ * was no timeout; otherwise, cleanup lock wait */
+ raw_spin_lock_irq(&sem->wait_lock);
+ if (waiter.task) {
+ ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem);
+ list_del(&waiter.list);
+ raw_spin_unlock_irq(&sem->wait_lock);
+ put_task_struct(waiter.task);
+ return NULL;
+ }
+ raw_spin_unlock_irq(&sem->wait_lock);
+ }
+
+ return sem;
+}
+
+/*
+ * wait for the write lock to be granted
+ */
+static struct ld_semaphore __sched *
+down_write_failed(struct ld_semaphore *sem, long count, long timeout)
+{
+ struct ldsem_waiter waiter;
+ struct task_struct *tsk = current;
+ long adjust = -LDSEM_ACTIVE_BIAS;
+ int locked = 0;
+
+ /* set up my own style of waitqueue */
+ raw_spin_lock_irq(&sem->wait_lock);
+
+ /* Try to reverse the lock attempt but if the count has changed
+ * so that reversing fails, check if the lock is now owned,
+ * and early-out if so */
+ do {
+ if (ldsem_cmpxchg(&count, count + adjust, sem))
+ break;
+ if ((count & LDSEM_ACTIVE_MASK) == LDSEM_ACTIVE_BIAS) {
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return sem;
+ }
+ } while (1);
+
+ list_add_tail(&waiter.list, &sem->write_wait);
+
+ waiter.task = tsk;
+
+ set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+ for (;;) {
+ if (!timeout)
+ break;
+ raw_spin_unlock_irq(&sem->wait_lock);
+ timeout = schedule_timeout(timeout);
+ raw_spin_lock_irq(&sem->wait_lock);
+ set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+ if ((locked = writer_trylock(sem)))
+ break;
+ }
+
+ if (!locked)
+ ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem);
+ list_del(&waiter.list);
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ __set_task_state(tsk, TASK_RUNNING);
+
+ /* lock wait may have timed out */
+ if (!locked)
+ return NULL;
+ return sem;
+}
+
+
+
+static inline int __ldsem_down_read_nested(struct ld_semaphore *sem,
+ int subclass, long timeout)
+{
+ long count;
+
+ lockdep_acquire_read(sem, subclass, 0, _RET_IP_);
+
+ count = ldsem_atomic_update(LDSEM_READ_BIAS, sem);
+ if (count <= 0) {
+ lock_stat(sem, contended);
+ if (!down_read_failed(sem, count, timeout)) {
+ lockdep_release(sem, 1, _RET_IP_);
+ return 0;
+ }
+ }
+ lock_stat(sem, acquired);
+ return 1;
+}
+
+static inline int __ldsem_down_write_nested(struct ld_semaphore *sem,
+ int subclass, long timeout)
+{
+ long count;
+
+ lockdep_acquire(sem, subclass, 0, _RET_IP_);
+
+ count = ldsem_atomic_update(LDSEM_WRITE_BIAS, sem);
+ if ((count & LDSEM_ACTIVE_MASK) != LDSEM_ACTIVE_BIAS) {
+ lock_stat(sem, contended);
+ if (!down_write_failed(sem, count, timeout)) {
+ lockdep_release(sem, 1, _RET_IP_);
+ return 0;
+ }
+ }
+ lock_stat(sem, acquired);
+ return 1;
+}
+
+
+/*
+ * lock for reading -- returns 1 if successful, 0 if timed out
+ */
+int __sched ldsem_down_read(struct ld_semaphore *sem, long timeout)
+{
+ might_sleep();
+ return __ldsem_down_read_nested(sem, 0, timeout);
+}
+
+/*
+ * trylock for reading -- returns 1 if successful, 0 if contention
+ */
+int ldsem_down_read_trylock(struct ld_semaphore *sem)
+{
+ long count = sem->count;
+
+ while (count >= 0) {
+ if (ldsem_cmpxchg(&count, count + LDSEM_READ_BIAS, sem)) {
+ lockdep_acquire_read(sem, 0, 1, _RET_IP_);
+ lock_stat(sem, acquired);
+ return 1;
+ }
+ }
+ return 0;
+}
+
+/*
+ * lock for writing -- returns 1 if successful, 0 if timed out
+ */
+int __sched ldsem_down_write(struct ld_semaphore *sem, long timeout)
+{
+ might_sleep();
+ return __ldsem_down_write_nested(sem, 0, timeout);
+}
+
+/*
+ * trylock for writing -- returns 1 if successful, 0 if contention
+ */
+int ldsem_down_write_trylock(struct ld_semaphore *sem)
+{
+ long count = sem->count;
+
+ while ((count & LDSEM_ACTIVE_MASK) == 0) {
+ if (ldsem_cmpxchg(&count, count + LDSEM_WRITE_BIAS, sem)) {
+ lockdep_acquire(sem, 0, 1, _RET_IP_);
+ lock_stat(sem, acquired);
+ return 1;
+ }
+ }
+ return 0;
+}
+
+/*
+ * release a read lock
+ */
+void ldsem_up_read(struct ld_semaphore *sem)
+{
+ long count;
+
+ lockdep_release(sem, 1, _RET_IP_);
+
+ count = ldsem_atomic_update(-LDSEM_READ_BIAS, sem);
+ if (count < 0 && (count & LDSEM_ACTIVE_MASK) == 0)
+ ldsem_wake(sem);
+}
+
+/*
+ * release a write lock
+ */
+void ldsem_up_write(struct ld_semaphore *sem)
+{
+ long count;
+
+ lockdep_release(sem, 1, _RET_IP_);
+
+ count = ldsem_atomic_update(-LDSEM_WRITE_BIAS, sem);
+ if (count < 0)
+ ldsem_wake(sem);
+}
+
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+int ldsem_down_read_nested(struct ld_semaphore *sem, int subclass, long timeout)
+{
+ might_sleep();
+ return __ldsem_down_read_nested(sem, subclass, timeout);
+}
+
+int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass,
+ long timeout)
+{
+ might_sleep();
+ return __ldsem_down_write_nested(sem, subclass, timeout);
+}
+
+#endif
diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h
index 455a0d7..ca000fc 100644
--- a/include/linux/tty_ldisc.h
+++ b/include/linux/tty_ldisc.h
@@ -110,6 +110,52 @@
#include <linux/wait.h>
#include <linux/wait.h>

+
+/*
+ * the semaphore definition
+ */
+struct ld_semaphore {
+ long count;
+ raw_spinlock_t wait_lock;
+ unsigned int wait_readers;
+ struct list_head read_wait;
+ struct list_head write_wait;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
+};
+
+extern void __init_ldsem(struct ld_semaphore *sem, const char *name,
+ struct lock_class_key *key);
+
+#define init_ldsem(sem) \
+do { \
+ static struct lock_class_key __key; \
+ \
+ __init_ldsem((sem), #sem, &__key); \
+} while (0)
+
+
+extern int ldsem_down_read(struct ld_semaphore *sem, long timeout);
+extern int ldsem_down_read_trylock(struct ld_semaphore *sem);
+extern int ldsem_down_write(struct ld_semaphore *sem, long timeout);
+extern int ldsem_down_write_trylock(struct ld_semaphore *sem);
+extern void ldsem_up_read(struct ld_semaphore *sem);
+extern void ldsem_up_write(struct ld_semaphore *sem);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern int ldsem_down_read_nested(struct ld_semaphore *sem, int subclass,
+ long timeout);
+extern int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass,
+ long timeout);
+#else
+# define ldsem_down_read_nested(sem, subclass, timeout) \
+ ldsem_down_read(sem, timeout)
+# define ldsem_down_write_nested(sem, subclass, timeout) \
+ ldsem_down_write(sem, timeout)
+#endif
+
+
struct tty_ldisc_ops {
int magic;
char *name;
--
1.8.1.2
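
The count encoding described in the implementation notes of tty_ldsem.c can be checked in userspace. The sketch below uses C11 atomics, copies the 64-bit LDSEM_* constants from the patch, and mirrors the policy of ldsem_down_read_trylock() and ldsem_down_write_trylock(). The toy_* names and struct toy_ldsem are hypothetical, and there is no wait-list or sleeping logic. Readers add 1 to the low (active) half; a writer adds LDSEM_WRITE_BIAS, which sets the active half to 1 and makes the whole count negative, so the readers' count >= 0 test fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* 64-bit values of the constants defined in tty_ldsem.c. */
#define LDSEM_ACTIVE_MASK	INT64_C(0xffffffff)
#define LDSEM_UNLOCKED		INT64_C(0)
#define LDSEM_ACTIVE_BIAS	INT64_C(1)
#define LDSEM_WAIT_BIAS		(-LDSEM_ACTIVE_MASK - 1)
#define LDSEM_READ_BIAS		LDSEM_ACTIVE_BIAS
#define LDSEM_WRITE_BIAS	(LDSEM_WAIT_BIAS + LDSEM_ACTIVE_BIAS)

/* Hypothetical stand-in for the count field of struct ld_semaphore. */
struct toy_ldsem {
	_Atomic int64_t count;
};

/* A reader may enter only while count >= 0: no writer holds the lock
 * and no task is queued. */
static int toy_down_read_trylock(struct toy_ldsem *sem)
{
	int64_t count = atomic_load(&sem->count);

	while (count >= 0) {
		/* On failure, count is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&sem->count, &count,
						 count + LDSEM_READ_BIAS))
			return 1;
	}
	return 0;
}

/* A writer may enter only while the active half is 0; the waiting
 * half is not examined. */
static int toy_down_write_trylock(struct toy_ldsem *sem)
{
	int64_t count = atomic_load(&sem->count);

	while ((count & LDSEM_ACTIVE_MASK) == 0) {
		if (atomic_compare_exchange_weak(&sem->count, &count,
						 count + LDSEM_WRITE_BIAS))
			return 1;
	}
	return 0;
}

static void toy_up_read(struct toy_ldsem *sem)
{
	int64_t old = atomic_fetch_sub(&sem->count, LDSEM_READ_BIAS);

	assert((old & LDSEM_ACTIVE_MASK) != 0);	/* a read lock was held */
}

static void toy_up_write(struct toy_ldsem *sem)
{
	atomic_fetch_sub(&sem->count, LDSEM_WRITE_BIAS);
}
```

Adding LDSEM_WAIT_BIAS to the count (as a queued waiter does) makes toy_down_read_trylock() fail, while toy_down_write_trylock() can still succeed because it only looks at the active half. On 32-bit kernels the patch uses a 16-bit active mask (0x0000ffff) instead; the arithmetic is otherwise the same.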

2013-03-19 20:05:08

by Peter Hurley

Subject: [PATCH 5/7] tty: Fix hangup race with TIOCSETD ioctl

The hangup may already have happened; check for that state also.

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 9ace119..84ba790 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -543,10 +543,8 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
old_ldisc = tty->ldisc;
tty_lock(tty);

- /* FIXME: for testing only */
- WARN_ON(test_bit(TTY_HUPPED, &tty->flags));
-
- if (test_bit(TTY_HUPPING, &tty->flags)) {
+ if (test_bit(TTY_HUPPING, &tty->flags) ||
+ test_bit(TTY_HUPPED, &tty->flags)) {
/* We were raced by the hangup method. It will have stomped
the ldisc data and closed the ldisc down */
tty_ldisc_enable_pair(tty, o_tty);
--
1.8.1.2

2013-03-19 20:05:23

by Peter Hurley

Subject: [PATCH 0/7] ldsem patchset

Ingo and David,

Greg has asked me to get your acks on the r/w semaphore implementation
in the first patch,
'tty: Add timed, writer-prioritized rw semaphore'

Would you please review the implementation and comment?


This is a re-spin of the remainder of the 'ldisc patchset'.
The semaphore-related patches of that set have been squashed into the
first patch.


Sasha and Dave,

Please don't run the 44-patch ldisc patchset anymore. That patchset
was only partially applied so trinity testing for linux-next may
give false assurance that those bugs are fixed.


Peter Hurley (7):
tty: Add timed, writer-prioritized rw semaphore
tty: Add lock/unlock ldisc pair functions
tty: Replace ldisc locking with ldisc_sem
tty: Clarify ldisc variable
tty: Fix hangup race with TIOCSETD ioctl
tty: Clarify multiple-references comment in TIOCSETD ioctl
tty: Fix tty_ldisc_lock name collision

drivers/tty/Makefile | 2 +-
drivers/tty/tty_buffer.c | 2 +-
drivers/tty/tty_io.c | 7 +-
drivers/tty/tty_ldisc.c | 447 +++++++++++++++------------------------------
drivers/tty/tty_ldsem.c | 453 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/tty.h | 4 +-
include/linux/tty_ldisc.h | 49 ++++-
7 files changed, 653 insertions(+), 311 deletions(-)
create mode 100644 drivers/tty/tty_ldsem.c

--
1.8.1.2

2013-03-19 20:06:07

by Peter Hurley

Subject: [PATCH 6/7] tty: Clarify multiple-references comment in TIOCSETD ioctl

Signed-off-by: Peter Hurley <[email protected]>
---
drivers/tty/tty_ldisc.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 84ba790..9725c94 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -567,13 +567,15 @@ int tty_set_ldisc(struct tty_struct *tty, int ldisc)
tty_ldisc_restore(tty, old_ldisc);
}

- /* At this point we hold a reference to the new ldisc and a
- a reference to the old ldisc. If we ended up flipping back
- to the existing ldisc we have two references to it */
-
if (tty->ldisc->ops->num != old_ldisc->ops->num && tty->ops->set_ldisc)
tty->ops->set_ldisc(tty);

+ /* At this point we hold a reference to the new ldisc and a
+ reference to the old ldisc, or we hold two references to
+ the old ldisc (if it was restored as part of error cleanup
+ above). In either case, releasing a single reference from
+ the old ldisc is correct. */
+
tty_ldisc_put(old_ldisc);

/*
--
1.8.1.2
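
The reference accounting described by the new TIOCSETD comment can be modeled with plain counters. This is a hypothetical model (struct model_ldisc, model_get(), model_put() and model_set_ldisc() are not kernel APIs): it follows the two exit paths the comment describes and ignores the N_TTY fallback inside tty_ldisc_restore(). The old ldisc starts with one reference, standing for the one the tty already holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not kernel code. */
struct model_ldisc {
	int refs;
};

static void model_get(struct model_ldisc *ld)
{
	ld->refs++;
}

static void model_put(struct model_ldisc *ld)
{
	assert(ld->refs > 0);
	ld->refs--;
}

/* Returns the ldisc the tty ends up with; open_ok simulates whether
 * opening the new ldisc succeeded. */
static struct model_ldisc *model_set_ldisc(struct model_ldisc *old,
					   struct model_ldisc *new_ld,
					   bool open_ok)
{
	struct model_ldisc *cur;

	model_get(new_ld);		/* reference to the new ldisc */
	if (open_ok) {
		cur = new_ld;
	} else {
		model_put(new_ld);	/* error cleanup drops the new ldisc */
		model_get(old);		/* restore takes a second ref to old */
		cur = old;
	}
	/* Here: one ref each on new and old, or two refs on old. */
	model_put(old);			/* the single put of the old ldisc */
	return cur;
}
```

In both paths the single model_put() of the old ldisc leaves exactly one reference on whichever ldisc the tty ends up using and none on the other.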

2013-03-26 23:48:40

by Peter Hurley

Subject: Re: [PATCH v5 26/44] tty: Add read-recursive, writer-prioritized rw semaphore

On Mon, 2013-03-18 at 18:59 -0700, Greg Kroah-Hartman wrote:
> > If you'd like, I can send you 6 or so short user test programs that
> > hang, crash, or deadlock inside 60 seconds on mainline and next, but not
> > with this patchset.
>
> That would be interesting to have, please send them.
>
> And I hope that they only lock up when run as root, but I'm afraid to
> ask that question...

Sorry Greg, I meant to get to this sooner.

This one is pretty straightforward. Does not require suid.
I would still recommend only running this in a disposable vm -- that's
what I do. The vm should have 6 cores to do real business.

It helps to hook up netconsole or whatever so you can see what kind of
progress it's making but don't use a slow framebuffer console because
that invariably locks up :). If it pauses for 5 secs., that's actually
correct behavior.

--- >% ---

Signed-off-by: Peter Hurley <[email protected]>
---
pts_test3.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 151 insertions(+)
create mode 100644 pts_test3.c

diff --git a/pts_test3.c b/pts_test3.c
new file mode 100644
index 0000000..b673f17
--- /dev/null
+++ b/pts_test3.c
@@ -0,0 +1,151 @@
+/*
+ * pts_test3.c
+ *
+ * Created on: Dec, 2012
+ * Copyright (C) 2012 Ilya Zykov
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Modified-by: Peter Hurley <[email protected]>
+ */
+
+#include <stdio.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <termios.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <signal.h>
+#include <unistd.h>
+#define BUF_SIZE 2
+#define ERROR_EXIT_CODE 1
+#define parent child_id
+
+static int
+mfd=-1, sfd=-1, parent=1;
+
+static pthread_t
+pth_id;
+
+static char
+pty_name[24], buf[]={ '1', '\n' };
+
+
+static void
+pty_exit(int ret, char * exit_message){
+ if (sfd >= 0) close(sfd);
+ if (mfd >= 0) close(mfd);
+ printf("%s %s %s exit. \n",exit_message?exit_message:"",
+ ret?"Error":"Normal", parent?"parent":"child");
+ exit(ret);
+}
+
+static void
+pty_init(void){
+ int ptn;
+ if( (mfd=open("/dev/ptmx", O_RDWR )) < 0 )
+ pty_exit(ERROR_EXIT_CODE,"Couldn't open /dev/ptmx. \n");
+ if (ioctl(mfd, TIOCGPTN, &ptn) < 0 )
+ pty_exit(ERROR_EXIT_CODE,"Couldn't get pty number. \n");
+ snprintf(pty_name, sizeof(pty_name), "/dev/pts/%d", ptn);
+ //printf("Slave pty name = %s.\n",pty_name);
+ ptn=0;
+ if (ioctl(mfd, TIOCSPTLCK, &ptn) < 0 )
+ pty_exit(ERROR_EXIT_CODE,"Couldn't unlock pty slave. \n");
+ if ( (sfd=open(pty_name, O_RDWR )) < 0 )
+ pty_exit(ERROR_EXIT_CODE, "Couldn't open pty slave. \n");
+}
+
+static void *
+pty_thread_open(void * arg) {
+ static const char ret[]="Thread open has been created.\n";
+ fputs(ret, stdout);
+ do {
+ close(open(pty_name, O_RDWR ));
+ } while(1);
+ return (void *)ret;
+}
+
+static void *
+pty_thread_read(void * arg) {
+ static const char ret[]="Thread read has been created.\n";
+ fputs(ret, stdout);
+ do {
+ read(sfd, buf, BUF_SIZE);
+ } while(1);
+ return (void *)ret;
+}
+
+static void *
+pty_thread_write(void * arg) {
+ static const char ret[]="Thread write has been created.\n";
+ fputs(ret, stdout);
+ do {
+ write(mfd, buf, BUF_SIZE);
+ } while(1);
+ return (void *)ret;
+}
+
+#define N_PPS 18	/* PPS line discipline number, as in <linux/tty.h> */
+
+static void *
+pty_thread_msetd(void * arg) {
+ static const char ret[]="Thread msetd has been created.\n";
+ static int ldisc;
+ fputs(ret, stdout);
+ do {
+ ldisc = N_PPS;
+ ioctl(mfd, TIOCSETD, &ldisc);
+ ldisc = N_TTY;
+ ioctl(mfd, TIOCSETD, &ldisc);
+ } while(1);
+ return (void *)ret;
+}
+
+static void *
+pty_thread_ssetd(void * arg) {
+ static const char ret[]="Thread ssetd has been created.\n";
+ static int ldisc;
+ fputs(ret, stdout);
+ do {
+ ldisc = N_PPS;
+ ioctl(sfd, TIOCSETD, &ldisc);
+ ldisc = N_TTY;
+ ioctl(sfd, TIOCSETD, &ldisc);
+ } while(1);
+ return (void *)ret;
+}
+
+int main(int argc,char *argv[]) {
+ pty_init();
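+ if ((child_id = fork()) < 0) pty_exit(ERROR_EXIT_CODE, "fork() failed. \n");
+ if(parent) {
+ sleep(100);	/* let the child stress the pty for 100 seconds */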
+ kill(child_id, SIGINT);
+ pty_exit(0,"Parent normal exit\n");
+ }
+ pthread_create(&pth_id, NULL, &pty_thread_open, 0);
+ /* For WARNINGS. */
+ pthread_create(&pth_id, NULL, &pty_thread_write, 0);
+ pthread_create(&pth_id, NULL, &pty_thread_read, 0);
+
+ pthread_create(&pth_id, NULL, &pty_thread_msetd, 0);
+ pthread_create(&pth_id, NULL, &pty_thread_ssetd, 0);
+ do {
+ close(sfd);
+ close(mfd);
+ pty_init();
+ } while(1);
+ return 0;
+}
--
1.8.1.2