2013-05-13 21:58:48

by Jörn Engel

Subject: [PATCH 0/3] target: Fix two races leading to use-after-free

In our testing we've encountered use-after-free bugs, usually in the shape
of a double list_del, at a rate of 2-10 per week. Patches 2 and 3 fix two
races that can each lead to a use-after-free; after applying both patches,
we have been bug-free for some weeks now.
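
To illustrate what a double list_del does, here is a minimal user-space
sketch, not taken from the patches themselves: the first deletion poisons
the node's pointers (as the kernel's list_del() does), so the second one
writes through the poison values and faults.

/*
 * Minimal user-space sketch -- not from the patches above -- of why a
 * double list_del() behaves like a use-after-free.
 */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_POISON1 ((struct list_head *)0x00100100)
#define LIST_POISON2 ((struct list_head *)0x00200200)

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;	/* poison, as the kernel does */
	entry->prev = LIST_POISON2;
}

int main(void)
{
	struct list_head head = { &head, &head };
	struct list_head node;

	list_add(&node, &head);
	list_del(&node);	/* first deletion: fine */
	printf("after first list_del: next=%p prev=%p\n",
	       (void *)node.next, (void *)node.prev);
	list_del(&node);	/* second deletion: dereferences poison, crashes */
	return 0;
}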

Patch 1 is an unrelated trivial cleanup. I just happened to spot it
while I was in the area.

Joern Engel (3):
target: removed unused transport_state flag
target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5
target: simplify target_wait_for_sess_cmds()

drivers/infiniband/ulp/srpt/ib_srpt.c | 2 +-
drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 +-
drivers/target/target_core_transport.c | 73 +++++++++-----------------------
include/linux/kref.h | 33 +++++++++++++++
include/target/target_core_base.h | 3 --
include/target/target_core_fabric.h | 2 +-
6 files changed, 57 insertions(+), 58 deletions(-)

--
1.7.10.4


2013-05-13 21:58:21

by Jörn Engel

Subject: [PATCH 1/3] target: removed unused transport_state flag

Signed-off-by: Joern Engel <[email protected]>
---
include/target/target_core_base.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index c4af592..068ec0f 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -463,7 +463,6 @@ struct se_cmd {
#define CMD_T_ABORTED (1 << 0)
#define CMD_T_ACTIVE (1 << 1)
#define CMD_T_COMPLETE (1 << 2)
-#define CMD_T_QUEUED (1 << 3)
#define CMD_T_SENT (1 << 4)
#define CMD_T_STOP (1 << 5)
#define CMD_T_FAILED (1 << 6)
--
1.7.10.4

2013-05-13 21:59:07

by Jörn Engel

Subject: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

The second parameter was always 0, leading to effectively dead code. It
called list_del() and se_cmd->se_tfo->release_cmd(), and had to set a
flag to prevent target_release_cmd_kref() from doing the same. But most
of all, it iterated the list without taking se_sess->sess_cmd_lock,
leading to races against ABORT and LUN_RESET.

Since the whole point of the function is to wait for the list to drain,
and potentially print a bit of debug information in case that never
happens, I've replaced the wait_for_completion() with a 100ms sleep. The
only call path that would get delayed by this is rmmod, afaics, so I
didn't want the overhead of a waitqueue.

Signed-off-by: Joern Engel <[email protected]>
---
drivers/infiniband/ulp/srpt/ib_srpt.c | 2 +-
drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 +-
drivers/target/target_core_transport.c | 64 +++++++++-----------------------
include/target/target_core_base.h | 2 -
include/target/target_core_fabric.h | 2 +-
5 files changed, 20 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index c09d41b..c318f7c 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2328,7 +2328,7 @@ static void srpt_release_channel_work(struct work_struct *w)
se_sess = ch->sess;
BUG_ON(!se_sess);

- target_wait_for_sess_cmds(se_sess, 0);
+ target_wait_for_sess_cmds(se_sess);

transport_deregister_session_configfs(se_sess);
transport_deregister_session(se_sess);
diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
index d182c96..7a3870f 100644
--- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
+++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
@@ -1370,7 +1370,7 @@ static void tcm_qla2xxx_free_session(struct qla_tgt_sess *sess)
dump_stack();
return;
}
- target_wait_for_sess_cmds(se_sess, 0);
+ target_wait_for_sess_cmds(se_sess);

transport_deregister_session_configfs(sess->se_sess);
transport_deregister_session(sess->se_sess);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 0d46276..5b6dbf9 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1043,7 +1043,6 @@ void transport_init_se_cmd(
init_completion(&cmd->transport_lun_fe_stop_comp);
init_completion(&cmd->transport_lun_stop_comp);
init_completion(&cmd->t_transport_stop_comp);
- init_completion(&cmd->cmd_wait_comp);
init_completion(&cmd->task_stop_comp);
spin_lock_init(&cmd->t_state_lock);
cmd->transport_state = CMD_T_DEV_ACTIVE;
@@ -2219,11 +2218,6 @@ static void target_release_cmd_kref(struct kref *kref)
se_cmd->se_tfo->release_cmd(se_cmd);
return;
}
- if (se_sess->sess_tearing_down && se_cmd->cmd_wait_set) {
- spin_unlock(&se_sess->sess_cmd_lock);
- complete(&se_cmd->cmd_wait_comp);
- return;
- }
list_del(&se_cmd->se_cmd_list);
spin_unlock(&se_sess->sess_cmd_lock);

@@ -2241,68 +2235,44 @@ int target_put_sess_cmd(struct se_session *se_sess, struct se_cmd *se_cmd)
}
EXPORT_SYMBOL(target_put_sess_cmd);

-/* target_sess_cmd_list_set_waiting - Flag all commands in
- * sess_cmd_list to complete cmd_wait_comp. Set
+/* target_sess_cmd_list_set_waiting - Set
* sess_tearing_down so no more commands are queued.
* @se_sess: session to flag
*/
void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
{
- struct se_cmd *se_cmd;
unsigned long flags;

spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
-
WARN_ON(se_sess->sess_tearing_down);
se_sess->sess_tearing_down = 1;
-
- list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list)
- se_cmd->cmd_wait_set = 1;
-
spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
}
EXPORT_SYMBOL(target_sess_cmd_list_set_waiting);

/* target_wait_for_sess_cmds - Wait for outstanding descriptors
* @se_sess: session to wait for active I/O
- * @wait_for_tasks: Make extra transport_wait_for_tasks call
*/
-void target_wait_for_sess_cmds(
- struct se_session *se_sess,
- int wait_for_tasks)
+void target_wait_for_sess_cmds(struct se_session *se_sess)
{
- struct se_cmd *se_cmd, *tmp_cmd;
- bool rc = false;
-
- list_for_each_entry_safe(se_cmd, tmp_cmd,
- &se_sess->sess_cmd_list, se_cmd_list) {
- list_del(&se_cmd->se_cmd_list);
-
- pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
- " %d\n", se_cmd, se_cmd->t_state,
- se_cmd->se_tfo->get_cmd_state(se_cmd));
-
- if (wait_for_tasks) {
- pr_debug("Calling transport_wait_for_tasks se_cmd: %p t_state: %d,"
- " fabric state: %d\n", se_cmd, se_cmd->t_state,
- se_cmd->se_tfo->get_cmd_state(se_cmd));
-
- rc = transport_wait_for_tasks(se_cmd);
-
- pr_debug("After transport_wait_for_tasks se_cmd: %p t_state: %d,"
- " fabric state: %d\n", se_cmd, se_cmd->t_state,
- se_cmd->se_tfo->get_cmd_state(se_cmd));
- }
+ struct se_cmd *se_cmd, *last_cmd = NULL;
+ unsigned long flags;

- if (!rc) {
- wait_for_completion(&se_cmd->cmd_wait_comp);
- pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
- " fabric state: %d\n", se_cmd, se_cmd->t_state,
- se_cmd->se_tfo->get_cmd_state(se_cmd));
+ spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
+ while (!list_empty(&se_sess->sess_cmd_list)) {
+ se_cmd = list_entry(se_sess->sess_cmd_list.next, struct se_cmd,
+ se_cmd_list);
+ if (se_cmd != last_cmd) { /* print this only once per command */
+ pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state: %d\n",
+ se_cmd, se_cmd->t_state,
+ se_cmd->se_tfo->get_cmd_state(se_cmd));
+ last_cmd = se_cmd;
}
-
- se_cmd->se_tfo->release_cmd(se_cmd);
+ spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+ msleep_interruptible(100);
+ spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
}
+ spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
}
EXPORT_SYMBOL(target_wait_for_sess_cmds);

diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 068ec0f..16d58b5 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -426,7 +426,6 @@ struct se_cmd {
enum transport_state_table t_state;
/* Used to signal cmd->se_tfo->check_release_cmd() usage per cmd */
unsigned check_release:1;
- unsigned cmd_wait_set:1;
unsigned unknown_data_length:1;
/* See se_cmd_flags_table */
u32 se_cmd_flags;
@@ -449,7 +448,6 @@ struct se_cmd {
struct se_session *se_sess;
struct se_tmr_req *se_tmr_req;
struct list_head se_cmd_list;
- struct completion cmd_wait_comp;
struct kref cmd_kref;
struct target_core_fabric_ops *se_tfo;
sense_reason_t (*execute_cmd)(struct se_cmd *);
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index aaa1ee6..57f8fb7 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -123,7 +123,7 @@ int transport_send_check_condition_and_sense(struct se_cmd *,

int target_put_sess_cmd(struct se_session *, struct se_cmd *);
void target_sess_cmd_list_set_waiting(struct se_session *);
-void target_wait_for_sess_cmds(struct se_session *, int);
+void target_wait_for_sess_cmds(struct se_session *);

int core_alua_check_nonop_delay(struct se_cmd *);

--
1.7.10.4

2013-05-13 21:58:38

by Jörn Engel

Subject: [PATCH 2/3] target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5

It is possible for one thread to take se_sess->sess_cmd_lock in
core_tmr_abort_task() before taking a reference count on
se_cmd->cmd_kref, while another thread in target_put_sess_cmd() drops
se_cmd->cmd_kref before taking se_sess->sess_cmd_lock.

This introduces kref_put_spinlock_irqsave() and uses it in
target_put_sess_cmd() to close the race window.
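
For reference, a minimal user-space analog of the pattern the new helper
implements; this is only a sketch, with C11 atomics and a pthread mutex
standing in for the kernel's atomic_t and spinlock, not the kernel code
itself. The point is that the final reference drop only ever happens with
the lock held, so anyone walking the list under that lock can never see an
object whose refcount has already reached zero.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct obj {
	atomic_int refcount;
	pthread_mutex_t *lock;		/* protects the list the object is on */
};

static void release(struct obj *o)
{
	/* Called with o->lock held: unlink from the list, unlock, free. */
	pthread_mutex_unlock(o->lock);
	free(o);
}

/* Returns true iff this call dropped the last reference. */
static bool obj_put(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	/* Fast path: drop the reference lock-free unless it is the last one
	 * (mirrors atomic_add_unless(&kref->refcount, -1, 1)). */
	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->refcount, &old, old - 1))
			return false;
	}

	/* Slow path: take the lock first, then do the final decrement under
	 * it, so list walkers holding the lock never see refcount == 0. */
	pthread_mutex_lock(o->lock);
	if (atomic_fetch_sub(&o->refcount, 1) == 1) {
		release(o);		/* release() drops the lock */
		return true;
	}
	pthread_mutex_unlock(o->lock);
	return false;
}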

Signed-off-by: Joern Engel <[email protected]>
---
drivers/target/target_core_transport.c | 11 +++++------
include/linux/kref.h | 33 ++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 3243ea7..0d46276 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2213,21 +2213,19 @@ static void target_release_cmd_kref(struct kref *kref)
{
struct se_cmd *se_cmd = container_of(kref, struct se_cmd, cmd_kref);
struct se_session *se_sess = se_cmd->se_sess;
- unsigned long flags;

- spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
if (list_empty(&se_cmd->se_cmd_list)) {
- spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+ spin_unlock(&se_sess->sess_cmd_lock);
se_cmd->se_tfo->release_cmd(se_cmd);
return;
}
if (se_sess->sess_tearing_down && se_cmd->cmd_wait_set) {
- spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+ spin_unlock(&se_sess->sess_cmd_lock);
complete(&se_cmd->cmd_wait_comp);
return;
}
list_del(&se_cmd->se_cmd_list);
- spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
+ spin_unlock(&se_sess->sess_cmd_lock);

se_cmd->se_tfo->release_cmd(se_cmd);
}
@@ -2238,7 +2236,8 @@ static void target_release_cmd_kref(struct kref *kref)
*/
int target_put_sess_cmd(struct se_session *se_sess, struct se_cmd *se_cmd)
{
- return kref_put(&se_cmd->cmd_kref, target_release_cmd_kref);
+ return kref_put_spinlock_irqsave(&se_cmd->cmd_kref, target_release_cmd_kref,
+ &se_sess->sess_cmd_lock);
}
EXPORT_SYMBOL(target_put_sess_cmd);

diff --git a/include/linux/kref.h b/include/linux/kref.h
index 4972e6e..7419c02 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -19,6 +19,7 @@
#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
+#include <linux/spinlock.h>

struct kref {
atomic_t refcount;
@@ -95,6 +96,38 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
return kref_sub(kref, 1, release);
}

+/**
+ * kref_put_spinlock_irqsave - decrement refcount for object.
+ * @kref: object.
+ * @release: pointer to the function that will clean up the object when the
+ * last reference to the object is released.
+ * This pointer is required, and it is not acceptable to pass kfree
+ * in as this function.
+ * @lock: lock to take in release case
+ *
+ * Behaves identical to kref_put with one exception. If the reference count
+ * drops to zero, the lock will be taken atomically wrt dropping the reference
+ * count. The release function has to call spin_unlock() without _irqrestore.
+ */
+static inline int kref_put_spinlock_irqsave(struct kref *kref,
+ void (*release)(struct kref *kref),
+ spinlock_t *lock)
+{
+ unsigned long flags;
+
+ WARN_ON(release == NULL);
+ if (atomic_add_unless(&kref->refcount, -1, 1))
+ return 0;
+ spin_lock_irqsave(lock, flags);
+ if (atomic_dec_and_test(&kref->refcount)) {
+ release(kref);
+ local_irq_restore(flags);
+ return 1;
+ }
+ spin_unlock_irqrestore(lock, flags);
+ return 0;
+}
+
static inline int kref_put_mutex(struct kref *kref,
void (*release)(struct kref *kref),
struct mutex *lock)
--
1.7.10.4

2013-05-13 22:57:18

by Nicholas A. Bellinger

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Mon, 2013-05-13 at 16:30 -0400, Joern Engel wrote:
> The second parameter was always 0, leading to effectively dead code. It
> called list_del() and se_cmd->se_tfo->release_cmd(), and had to set a
> flag to prevent target_release_cmd_kref() from doing the same.

Look again. The call to transport_wait_for_tasks() was dead code, but
the wait_for_completion(&se_cmd->cmd_wait_comp) most certainly is not.

> But most
> of all, it iterated the list without taking se_sess->sess_cmd_lock,
> leading to races against ABORT and LUN_RESET.
>

Ugh. You'll recall that target_wait_for_sess_cmds() originally did not
have to take the lock because the list was spliced into
sess_wait_list.

When Roland removed sess_wait_list in commit 1c7b13fe, no one re-added
the lock here.

> Since the whole point of the function is to wait for the list to drain,
> and potentially print a bit of debug information in case that never
> happens, I've replaced the wait_for_completion() with 100ms sleep. The
> only callpath that would get delayed by this is rmmod, afaics, so I
> didn't want the overhead of a waitqueue.
>

This seems totally wrong..

> Signed-off-by: Joern Engel <[email protected]>
> ---
> drivers/infiniband/ulp/srpt/ib_srpt.c | 2 +-
> drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 +-
> drivers/target/target_core_transport.c | 64 +++++++++-----------------------
> include/target/target_core_base.h | 2 -
> include/target/target_core_fabric.h | 2 +-
> 5 files changed, 20 insertions(+), 52 deletions(-)
>

<SNIP>

> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 0d46276..5b6dbf9 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -1043,7 +1043,6 @@ void transport_init_se_cmd(
> init_completion(&cmd->transport_lun_fe_stop_comp);
> init_completion(&cmd->transport_lun_stop_comp);
> init_completion(&cmd->t_transport_stop_comp);
> - init_completion(&cmd->cmd_wait_comp);
> init_completion(&cmd->task_stop_comp);
> spin_lock_init(&cmd->t_state_lock);
> cmd->transport_state = CMD_T_DEV_ACTIVE;
> @@ -2219,11 +2218,6 @@ static void target_release_cmd_kref(struct kref *kref)
> se_cmd->se_tfo->release_cmd(se_cmd);
> return;
> }
> - if (se_sess->sess_tearing_down && se_cmd->cmd_wait_set) {
> - spin_unlock(&se_sess->sess_cmd_lock);
> - complete(&se_cmd->cmd_wait_comp);
> - return;
> - }
> list_del(&se_cmd->se_cmd_list);
> spin_unlock(&se_sess->sess_cmd_lock);
>
> @@ -2241,68 +2235,44 @@ int target_put_sess_cmd(struct se_session *se_sess, struct se_cmd *se_cmd)
> }
> EXPORT_SYMBOL(target_put_sess_cmd);
>
> -/* target_sess_cmd_list_set_waiting - Flag all commands in
> - * sess_cmd_list to complete cmd_wait_comp. Set
> +/* target_sess_cmd_list_set_waiting - Set
> * sess_tearing_down so no more commands are queued.
> * @se_sess: session to flag
> */
> void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
> {
> - struct se_cmd *se_cmd;
> unsigned long flags;
>
> spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> -
> WARN_ON(se_sess->sess_tearing_down);
> se_sess->sess_tearing_down = 1;
> -
> - list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list)
> - se_cmd->cmd_wait_set = 1;
> -
> spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> }
> EXPORT_SYMBOL(target_sess_cmd_list_set_waiting);
>
> /* target_wait_for_sess_cmds - Wait for outstanding descriptors
> * @se_sess: session to wait for active I/O
> - * @wait_for_tasks: Make extra transport_wait_for_tasks call
> */
> -void target_wait_for_sess_cmds(
> - struct se_session *se_sess,
> - int wait_for_tasks)
> +void target_wait_for_sess_cmds(struct se_session *se_sess)
> {
> - struct se_cmd *se_cmd, *tmp_cmd;
> - bool rc = false;
> -
> - list_for_each_entry_safe(se_cmd, tmp_cmd,
> - &se_sess->sess_cmd_list, se_cmd_list) {
> - list_del(&se_cmd->se_cmd_list);
> -
> - pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
> - " %d\n", se_cmd, se_cmd->t_state,
> - se_cmd->se_tfo->get_cmd_state(se_cmd));
> -
> - if (wait_for_tasks) {
> - pr_debug("Calling transport_wait_for_tasks se_cmd: %p t_state: %d,"
> - " fabric state: %d\n", se_cmd, se_cmd->t_state,
> - se_cmd->se_tfo->get_cmd_state(se_cmd));
> -
> - rc = transport_wait_for_tasks(se_cmd);
> -
> - pr_debug("After transport_wait_for_tasks se_cmd: %p t_state: %d,"
> - " fabric state: %d\n", se_cmd, se_cmd->t_state,
> - se_cmd->se_tfo->get_cmd_state(se_cmd));
> - }
> + struct se_cmd *se_cmd, *last_cmd = NULL;
> + unsigned long flags;
>
> - if (!rc) {
> - wait_for_completion(&se_cmd->cmd_wait_comp);
> - pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
> - " fabric state: %d\n", se_cmd, se_cmd->t_state,
> - se_cmd->se_tfo->get_cmd_state(se_cmd));
> + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> + while (!list_empty(&se_sess->sess_cmd_list)) {
> + se_cmd = list_entry(se_sess->sess_cmd_list.next, struct se_cmd,
> + se_cmd_list);
> + if (se_cmd != last_cmd) { /* print this only once per command */
> + pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state: %d\n",
> + se_cmd, se_cmd->t_state,
> + se_cmd->se_tfo->get_cmd_state(se_cmd));
> + last_cmd = se_cmd;
> }
> -
> - se_cmd->se_tfo->release_cmd(se_cmd);
> + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> + msleep_interruptible(100);
> + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> }
> + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> }

So what happens when the backend se_cmd I/O does not complete before the
msleep finishes..?

It seems totally wrong to drop the initial cmd_wait_set = 1 assignment,
the target_release_cmd_kref() completion of cmd_wait_comp, and the wait on
cmd_wait_comp that allows se_cmd to finish processing here.

Who cares about waitqueue overhead in a shutdown slow path when the
replacement doesn't address long outstanding commands..?

--nab

2013-05-13 22:59:39

by Nicholas A. Bellinger

Subject: Re: [PATCH 2/3] target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5

On Mon, 2013-05-13 at 16:30 -0400, Joern Engel wrote:
> It is possible for one thread to take se_sess->sess_cmd_lock in
> core_tmr_abort_task() before taking a reference count on
> se_cmd->cmd_kref, while another thread in target_put_sess_cmd() drops
> se_cmd->cmd_kref before taking se_sess->sess_cmd_lock.
>
> This introduces kref_put_spinlock_irqsave() and uses it in
> target_put_sess_cmd() to close the race window.
>
> Signed-off-by: Joern Engel <[email protected]>
> ---

I'm fine with applying this one..

Greg-KH, are you OK with the kref.h changes here..?

--nab

> drivers/target/target_core_transport.c | 11 +++++------
> include/linux/kref.h | 33 ++++++++++++++++++++++++++++++++
> 2 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 3243ea7..0d46276 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2213,21 +2213,19 @@ static void target_release_cmd_kref(struct kref *kref)
> {
> struct se_cmd *se_cmd = container_of(kref, struct se_cmd, cmd_kref);
> struct se_session *se_sess = se_cmd->se_sess;
> - unsigned long flags;
>
> - spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> if (list_empty(&se_cmd->se_cmd_list)) {
> - spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> + spin_unlock(&se_sess->sess_cmd_lock);
> se_cmd->se_tfo->release_cmd(se_cmd);
> return;
> }
> if (se_sess->sess_tearing_down && se_cmd->cmd_wait_set) {
> - spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> + spin_unlock(&se_sess->sess_cmd_lock);
> complete(&se_cmd->cmd_wait_comp);
> return;
> }
> list_del(&se_cmd->se_cmd_list);
> - spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> + spin_unlock(&se_sess->sess_cmd_lock);
>
> se_cmd->se_tfo->release_cmd(se_cmd);
> }
> @@ -2238,7 +2236,8 @@ static void target_release_cmd_kref(struct kref *kref)
> */
> int target_put_sess_cmd(struct se_session *se_sess, struct se_cmd *se_cmd)
> {
> - return kref_put(&se_cmd->cmd_kref, target_release_cmd_kref);
> + return kref_put_spinlock_irqsave(&se_cmd->cmd_kref, target_release_cmd_kref,
> + &se_sess->sess_cmd_lock);
> }
> EXPORT_SYMBOL(target_put_sess_cmd);
>
> diff --git a/include/linux/kref.h b/include/linux/kref.h
> index 4972e6e..7419c02 100644
> --- a/include/linux/kref.h
> +++ b/include/linux/kref.h
> @@ -19,6 +19,7 @@
> #include <linux/atomic.h>
> #include <linux/kernel.h>
> #include <linux/mutex.h>
> +#include <linux/spinlock.h>
>
> struct kref {
> atomic_t refcount;
> @@ -95,6 +96,38 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
> return kref_sub(kref, 1, release);
> }
>
> +/**
> + * kref_put_spinlock_irqsave - decrement refcount for object.
> + * @kref: object.
> + * @release: pointer to the function that will clean up the object when the
> + * last reference to the object is released.
> + * This pointer is required, and it is not acceptable to pass kfree
> + * in as this function.
> + * @lock: lock to take in release case
> + *
> + * Behaves identical to kref_put with one exception. If the reference count
> + * drops to zero, the lock will be taken atomically wrt dropping the reference
> + * count. The release function has to call spin_unlock() without _irqrestore.
> + */
> +static inline int kref_put_spinlock_irqsave(struct kref *kref,
> + void (*release)(struct kref *kref),
> + spinlock_t *lock)
> +{
> + unsigned long flags;
> +
> + WARN_ON(release == NULL);
> + if (atomic_add_unless(&kref->refcount, -1, 1))
> + return 0;
> + spin_lock_irqsave(lock, flags);
> + if (atomic_dec_and_test(&kref->refcount)) {
> + release(kref);
> + local_irq_restore(flags);
> + return 1;
> + }
> + spin_unlock_irqrestore(lock, flags);
> + return 0;
> +}
> +
> static inline int kref_put_mutex(struct kref *kref,
> void (*release)(struct kref *kref),
> struct mutex *lock)

2013-05-13 23:28:51

by Jörn Engel

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Mon, 13 May 2013 16:00:03 -0700, Nicholas A. Bellinger wrote:
> On Mon, 2013-05-13 at 16:30 -0400, Joern Engel wrote:
> > The second parameter was always 0, leading to effectively dead code. It
> > called list_del() and se_cmd->se_tfo->release_cmd(), and had to set a
> > flag to prevent target_release_cmd_kref() from doing the same.
>
> Look again. The call to transport_wait_for_tasks() was dead code, but
> the wait_for_completion(&se_cmd->cmd_wait_comp) most certainly is not.

See "totally wrong" below.

> > But most
> > of all, it iterated the list without taking se_sess->sess_cmd_lock,
> > leading to races against ABORT and LUN_RESET.
>
> Ugh. You'll recall that target_wait_for_sess_cmds() originally did not
> have to take the lock because the list was spliced into
> sess_wait_list.
>
> When Roland removed sess_wait_list in commit 1c7b13fe, no one re-added
> the lock here.

Interesting point. That seems to imply that reverting 1c7b13fe would
be an alternative approach.

> > Since the whole point of the function is to wait for the list to drain,
> > and potentially print a bit of debug information in case that never
> > happens, I've replaced the wait_for_completion() with 100ms sleep. The
> > only callpath that would get delayed by this is rmmod, afaics, so I
> > didn't want the overhead of a waitqueue.
>
> This seems totally wrong..

The wait_for_completion() was not dead code, but it was just one
possible implementation of "wait for the list to drain". I dislike
that particular implementation because you have to drop the spinlock
before waiting and at the same time wait for a specific command.
Since you no longer hold any locks, the command can say *poof* and
disappear from under you at any point. Indeed it has to. So maybe
you could take a refcount while waiting for this command to prevent
that, which implies you have to check for this special refcount
elsewhere and...

At this point most readers should shudder in disgust and look for some
alternate implementation. I don't say mine is perfect, but at least
it does not care about any particular command.

> > - if (!rc) {
> > - wait_for_completion(&se_cmd->cmd_wait_comp);
> > - pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
> > - " fabric state: %d\n", se_cmd, se_cmd->t_state,
> > - se_cmd->se_tfo->get_cmd_state(se_cmd));
> > + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> > + while (!list_empty(&se_sess->sess_cmd_list)) {
> > + se_cmd = list_entry(se_sess->sess_cmd_list.next, struct se_cmd,
> > + se_cmd_list);
> > + if (se_cmd != last_cmd) { /* print this only once per command */
> > + pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state: %d\n",
> > + se_cmd, se_cmd->t_state,
> > + se_cmd->se_tfo->get_cmd_state(se_cmd));
> > + last_cmd = se_cmd;
> > }
> > -
> > - se_cmd->se_tfo->release_cmd(se_cmd);
> > + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> > + msleep_interruptible(100);
> > + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> > }
> > + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> > }
>
> So what happens when the backend se_cmd I/O does not complete before the
> msleep finishes..?

You take the lock, check list_empty() and go to sleep again. Repeat
until the backend I/O does complete or you get a hung task, whichever
comes first.

Which is exactly the same behaviour you had before.

> It seems totally wrong to drop the initial cmd_wait_set =1 assignment,
> target_release_cmd_kref() completion for cmd_wait_comp, and wait on
> cmd_wait_comp to allow se_cmd to finish processing here.
>
> Who cares about waitqueue overhead in a shutdown slow path when the
> replacement doesn't address long outstanding commands..?

I agree that the overhead doesn't matter. The msleep(100) spells this
out rather explicitly. What does matter is that a) the patch retains
old behaviour with much simpler code and b) it fixes a race that kills
the machine. I can live without a, but very much want to keep b. ;)

Jörn

--
Sometimes, asking the right question is already the answer.
-- Unknown

2013-05-14 03:05:59

by Nicholas A. Bellinger

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Mon, 2013-05-13 at 18:00 -0400, Jörn Engel wrote:
> On Mon, 13 May 2013 16:00:03 -0700, Nicholas A. Bellinger wrote:
> > On Mon, 2013-05-13 at 16:30 -0400, Joern Engel wrote:
> > > The second parameter was always 0, leading to effectively dead code. It
> > > called list_del() and se_cmd->se_tfo->release_cmd(), and had to set a
> > > flag to prevent target_release_cmd_kref() from doing the same.
> >
> > Look again. The call to transport_wait_for_tasks() was dead code, but
> > the wait_for_completion(&se_cmd->cmd_wait_comp) most certainly is not.
>
> See "totally wrong" below.
>
> > > But most
> > > of all, it iterated the list without taking se_sess->sess_cmd_lock,
> > > leading to races against ABORT and LUN_RESET.
> >
> > Ugh. You'll recall that target_wait_for_sess_cmds() originally did not
> > have to take the lock because the list was spliced into
> > sess_wait_list.
> >
> > When Roland removed sess_wait_list in commit 1c7b13fe, no one re-added
> > the lock here.
>
> Interesting point. That seems to imply that reverting 1c7b13fe would
> be an alternative approach.
>
> > > Since the whole point of the function is to wait for the list to drain,
> > > and potentially print a bit of debug information in case that never
> > > happens, I've replaced the wait_for_completion() with 100ms sleep. The
> > > only callpath that would get delayed by this is rmmod, afaics, so I
> > > didn't want the overhead of a waitqueue.
> >
> > This seems totally wrong..
>
> The wait_for_completion() was not dead code, but it was just one
> possible implementation of "wait for the list to drain". I dislike
> that particular implementation because you have to drop the spinlock
> before waiting and at the same time wait for a specific command.
> Since you no longer hold any locks, the command can say *poof* and
> disappear from under you at any point.

So I'd rather reinstate the list splice within
target_sess_cmd_list_set_waiting(), keep target_wait_for_sess_cmds()
lock-less, performing wait_for_completion() on cmd_wait_comp, and keep
the existing cmd_wait_set assignment + check in place.
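
Roughly this shape -- only a sketch of the direction, not a tested patch,
and it assumes se_session grows back the sess_wait_list field that commit
1c7b13fe removed:

void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
{
	struct se_cmd *se_cmd;
	unsigned long flags;

	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
	WARN_ON(se_sess->sess_tearing_down);
	se_sess->sess_tearing_down = 1;

	/* Splice to a private list; later completions can no longer race
	 * with the shutdown walk, so no lock is needed there. */
	list_splice_init(&se_sess->sess_cmd_list, &se_sess->sess_wait_list);
	list_for_each_entry(se_cmd, &se_sess->sess_wait_list, se_cmd_list)
		se_cmd->cmd_wait_set = 1;
	spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
}

void target_wait_for_sess_cmds(struct se_session *se_sess)
{
	struct se_cmd *se_cmd, *tmp_cmd;

	list_for_each_entry_safe(se_cmd, tmp_cmd,
				 &se_sess->sess_wait_list, se_cmd_list) {
		list_del(&se_cmd->se_cmd_list);
		/* target_release_cmd_kref() completes cmd_wait_comp once the
		 * last reference is dropped. */
		wait_for_completion(&se_cmd->cmd_wait_comp);
		se_cmd->se_tfo->release_cmd(se_cmd);
	}
}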


> Indeed it has to. So maybe
> you could take a refcount while waiting for this command to prevent
> that, which implies you have to check for this special refcount
> elsewhere and...

I don't think that will be necessary..

>
> At this point most readers should shudder in disgust and look for some
> alternate implementation. I don't say mine is perfect, but at least
> it does not care about any particular command.
>
> > > - if (!rc) {
> > > - wait_for_completion(&se_cmd->cmd_wait_comp);
> > > - pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
> > > - " fabric state: %d\n", se_cmd, se_cmd->t_state,
> > > - se_cmd->se_tfo->get_cmd_state(se_cmd));
> > > + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> > > + while (!list_empty(&se_sess->sess_cmd_list)) {
> > > + se_cmd = list_entry(se_sess->sess_cmd_list.next, struct se_cmd,
> > > + se_cmd_list);
> > > + if (se_cmd != last_cmd) { /* print this only once per command */
> > > + pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state: %d\n",
> > > + se_cmd, se_cmd->t_state,
> > > + se_cmd->se_tfo->get_cmd_state(se_cmd));
> > > + last_cmd = se_cmd;
> > > }
> > > -
> > > - se_cmd->se_tfo->release_cmd(se_cmd);
> > > + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> > > + msleep_interruptible(100);
> > > + spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> > > }
> > > + spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> > > }
> >
> > So what happens when the backend se_cmd I/O does not complete before the
> > msleep finishes..?
>
> You take the lock, check list_empty() and go to sleep again. Repeat
> until the backend I/O does complete or you get a hung task, whichever
> comes first.
>
> Which is exactly the same behaviour you had before.
>
> > It seems totally wrong to drop the initial cmd_wait_set =1 assignment,
> > target_release_cmd_kref() completion for cmd_wait_comp, and wait on
> > cmd_wait_comp to allow se_cmd to finish processing here.
> >
> > Who cares about waitqueue overhead in a shutdown slow path when the
> > replacement doesn't address long outstanding commands..?
>
> I agree that the overhead doesn't matter. The msleep(100) spells this
> out rather explicitly. What does matter is that a) the patch retains
> old behaviour with much simpler code and b) it fixes a race that kills
> the machine. I can live without a, but very much want to keep b. ;)
>

Fucking around with ->sess_cmd_lock during each loop of ->sess_cmd_list
in target_wait_for_sess_cmds is not simpler code..

Please re-spin a patch that reinstates the list splice part of commit
1c7b13fe6, and only drops the wait_for_tasks check in
target_wait_for_sess_cmds().

--nab

2013-05-14 17:57:37

by Jörn Engel

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Mon, 13 May 2013 20:08:44 -0700, Nicholas A. Bellinger wrote:
> On Mon, 2013-05-13 at 18:00 -0400, Jörn Engel wrote:
> >
> > I agree that the overhead doesn't matter. The msleep(100) spells this
> > out rather explicitly. What does matter is that a) the patch retains
> > old behaviour with much simpler code and b) it fixes a race that kills
> > the machine. I can live without a, but very much want to keep b. ;)
>
> Fucking around with ->sess_cmd_lock during each loop of ->sess_cmd_list
> in target_wait_for_sess_cmds is not simpler code..

I could argue that fucking around with ->sess_cmd_lock during each
loop is simpler than the communication through cmd_wait_set and
cmd_wait_comp. But simplicity is ultimately subjective and we can
argue all day.

drivers/infiniband/ulp/srpt/ib_srpt.c | 2 +-
drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 +-
drivers/target/target_core_transport.c | 64 +++++++++-----------------------
include/target/target_core_base.h | 2 -
include/target/target_core_fabric.h | 2 +-
5 files changed, 20 insertions(+), 52 deletions(-)

But diffstat is reasonably objective. Do you really want me to come
up with an alternative patch that adds code instead of removing it?

Jörn

--
Simplicity is prerequisite for reliability.
-- Edsger W. Dijkstra

2013-05-14 19:04:41

by Greg Kroah-Hartman

Subject: Re: [PATCH 2/3] target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5

On Mon, May 13, 2013 at 04:30:06PM -0400, Joern Engel wrote:
> It is possible for one thread to take se_sess->sess_cmd_lock in
> core_tmr_abort_task() before taking a reference count on
> se_cmd->cmd_kref, while another thread in target_put_sess_cmd() drops
> se_cmd->cmd_kref before taking se_sess->sess_cmd_lock.
>
> This introduces kref_put_spinlock_irqsave() and uses it in
> target_put_sess_cmd() to close the race window.
>
> Signed-off-by: Joern Engel <[email protected]>

For the kref.h part, please feel free to add:

Acked-by: Greg Kroah-Hartman <[email protected]>

2013-05-15 03:02:34

by Nicholas A. Bellinger

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Tue, 2013-05-14 at 12:29 -0400, Jörn Engel wrote:
> On Mon, 13 May 2013 20:08:44 -0700, Nicholas A. Bellinger wrote:
> > On Mon, 2013-05-13 at 18:00 -0400, Jörn Engel wrote:
> > >
> > > I agree that the overhead doesn't matter. The msleep(100) spells this
> > > out rather explicitly. What does matter is that a) the patch retains
> > > old behaviour with much simpler code and b) it fixes a race that kills
> > > the machine. I can live without a, but very much want to keep b. ;)
> >
> > Fucking around with ->sess_cmd_lock during each loop of ->sess_cmd_list
> > in target_wait_for_sess_cmds is not simpler code..
>
> I could argue that fucking around with ->sess_cmd_lock during each
> loop is simpler than the communication through cmd_wait_set and
> cmd_wait_comp. But simplicity is ultimately subjective and we can
> argue all day.
>

What I don't like is the endless loop in target_wait_for_sess_cmds()
that acquires and releases ->sess_cmd_lock for every command, with a
hard-coded msleep thrown in.

>
>
> drivers/infiniband/ulp/srpt/ib_srpt.c | 2 +-
> drivers/scsi/qla2xxx/tcm_qla2xxx.c | 2 +-
> drivers/target/target_core_transport.c | 64 +++++++++-----------------------
> include/target/target_core_base.h | 2 -
> include/target/target_core_fabric.h | 2 +-
> 5 files changed, 20 insertions(+), 52 deletions(-)
>
> But diffstat is reasonably objective. Do you really want me to come
> up with an alternative patch that adds code instead of removing it?
>

What I want is to revert the part of Roland's commit that introduced the
regression, which started you down this particular path of adding
unnecessary locking to target_wait_for_sess_cmds().

And if you're using diffstat as a guide, after re-adding the handful of
parts for ->sess_wait_list from commit 1c7b13fe6 plus removing the dead
code in target_wait_for_sess_cmds(), the number of changed lines will
still be a net minus.

--nab

2013-05-15 03:04:31

by Nicholas A. Bellinger

Subject: Re: [PATCH 2/3] target: close target_put_sess_cmd() vs. core_tmr_abort_task() race v5

On Tue, 2013-05-14 at 15:04 -0400, Greg Kroah-Hartman wrote:
> On Mon, May 13, 2013 at 04:30:06PM -0400, Joern Engel wrote:
> > It is possible for one thread to take se_sess->sess_cmd_lock in
> > core_tmr_abort_task() before taking a reference count on
> > se_cmd->cmd_kref, while another thread in target_put_sess_cmd() drops
> > se_cmd->cmd_kref before taking se_sess->sess_cmd_lock.
> >
> > This introduces kref_put_spinlock_irqsave() and uses it in
> > target_put_sess_cmd() to close the race window.
> >
> > Signed-off-by: Joern Engel <[email protected]>
>
> For the kref.h part, please feel free to add:
>
> Acked-by: Greg Kroah-Hartman <[email protected]>

Applied to target-pending/queue

Since this fixes a real long-standing bug within the tcm_qla2xxx code, I'm
adding a CC to stable as well.

Thanks Joern + Greg-KH!

--nab

2013-05-15 03:47:08

by Jörn Engel

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Tue, 14 May 2013 20:05:30 -0700, Nicholas A. Bellinger wrote:
> On Tue, 2013-05-14 at 12:29 -0400, Jörn Engel wrote:
> > On Mon, 13 May 2013 20:08:44 -0700, Nicholas A. Bellinger wrote:
> > > On Mon, 2013-05-13 at 18:00 -0400, Jörn Engel wrote:
> > > >
> > > > I agree that the overhead doesn't matter. The msleep(100) spells this
> > > > out rather explicitly. What does matter is that a) the patch retains
> > > > old behaviour with much simpler code and b) it fixes a race that kills
> > > > the machine. I can live without a, but very much want to keep b. ;)
> > >
> > > Fucking around with ->sess_cmd_lock during each loop of ->sess_cmd_list
> > > in target_wait_for_sess_cmds is not simpler code..
> >
> > I could argue that fucking around with ->sess_cmd_lock during each
> > loop is simpler than the communication through cmd_wait_set and
> > cmd_wait_comp. But simplicity is ultimately subjective and we can
> > argue all day.
>
> What I don't like is the endless loop in target_wait_for_sess_cmds()
> that acquires and releases ->sess_cmd_lock for every command, with a
> hard-coded msleep thrown in.

Not for every command. If the list is empty, it sleeps exactly zero
times. If all the commands finish within 100ms, it sleeps exactly once.
Otherwise it waits for as long as it takes, plus up to 100ms.

I agree this sucks, but the alternative was more code and we got it
wrong. The old adage is that for every 20 lines of code you add a
bug, and these 32 lines definitely had one. Which is why I almost
always prefer less code.

There is more to complexity than mere line count. Your original code
also made it impossible to judge the correctness of the code without
using grep. My loop is "either the commands eventually all complete,
or we hang forever." Your loop was "grep for cmd_wait_set, grep for
cmd_wait_comp and check every function that lights up. Assuming all
of that is correct, either the commands eventually all complete, or we
hang forever."

But if I cannot convince you, I guess we have to live with the bug
as-is. Telling management that I have to spend another week of my
time and several weeks of testing for a bug that is already fixed is a
hard sell. And even if I had that much free time and my wife didn't
complain, I don't have the necessary equipment. So the decision is
yours. You are the maintainer and have every right to block my patch.

Jörn

--
Modules are evil. They are a security issue, and they encourage a
"distro kernel" approach that takes forever to compile. Just say no.
Build a lean and mean kernel that actually has what you need, and nothing
more. And don't spend stupid time compiling modules you won't need.
-- Linus Torvalds

2013-05-15 06:02:13

by Nicholas A. Bellinger

Subject: Re: [PATCH 3/3] target: simplify target_wait_for_sess_cmds()

On Tue, 2013-05-14 at 22:19 -0400, Jörn Engel wrote:
> On Tue, 14 May 2013 20:05:30 -0700, Nicholas A. Bellinger wrote:
> > On Tue, 2013-05-14 at 12:29 -0400, Jörn Engel wrote:
> > > On Mon, 13 May 2013 20:08:44 -0700, Nicholas A. Bellinger wrote:
> > > > On Mon, 2013-05-13 at 18:00 -0400, Jörn Engel wrote:
> > > > >
> > > > > I agree that the overhead doesn't matter. The msleep(100) spells this
> > > > > out rather explicitly. What does matter is that a) the patch retains
> > > > > old behaviour with much simpler code and b) it fixes a race that kills
> > > > > the machine. I can live without a, but very much want to keep b. ;)
> > > >
> > > > Fucking around with ->sess_cmd_lock during each loop of ->sess_cmd_list
> > > > in target_wait_for_sess_cmds is not simpler code..
> > >
> > > I could argue that fucking around with ->sess_cmd_lock during each
> > > loop is simpler than the communication through cmd_wait_set and
> > > cmd_wait_comp. But simplicity is ultimately subjective and we can
> > > argue all day.
> >
> > What I don't like is the endless loop in target_wait_for_sess_cmds()
> > that acquires and releases ->sess_cmd_lock for every command, with a
> > hard-coded msleep thrown in.
>
> Not for every command. If the list is empty, it waits exactly 0x. If
> all the commands finish within 100ms, it waits exactly 1x. Otherwise
> it waits for as long as it takes, plus up to 100ms.
>
> I agree this sucks, but the alternative was more code and we got it
> wrong. The old adage is that for every 20 lines of code you add a
> bug, and these 32 lines definitely had one. Which is why I almost
> always prefer less code.
>

I'd rather judge by the code itself than by some artificial line count,
especially when the few lines we're talking about reinstating are two
pieces of initialization and a single list_splice().

> There is more to complexity than mere line count. Your original code
> also made it impossible to judge the correctness of the code without
> using grep. My loop is "either the commands eventually all complete,
> or we hang forever." Your loop was "grep for cmd_wait_set, grep for
> cmd_wait_comp and check every function that lights up. Assuming all
> of that is correct, either the commands eventually all complete, or we
> hang forever."
>
> But if I cannot convince you, I guess we have to live with the bug
> as-is.

Fine. I'll post a patch shortly for the version that I'd prefer, and
you can include it in the test setup at your leisure.

Feel free to complain if you think it's logically incorrect.

> Telling management that I have to spend another week of my
> time and several weeks of testing for a bug that is already fixed is a
> hard sell. And even if I had that much free time and my wife didn't
> complain, I don't have the necessary equipment. So the decision is
> yours. You are the maintainer and have every right to block my patch.
>

Not sure what to say here. The end result for how the logic actually
works is AFAICT the same; I just don't like the extra lock acquires and
releases and the hard-coded msleep.

--nab