2016-10-27 06:59:24

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 00/10] infiniband: Remove semaphores

Hi,

These are a set of patches [v4] which removes semaphores from infiniband.
These are part of a bigger effort to eliminate all semaphores from the
linux kernel.

v3 -> v4:

IB/mlx5: Added patch - Replace semaphore umr_common:sem with wait_event
IB/mlx5: Fixed a bug pointed out by Leon Romanovsky

Thanks,
Binoy

Binoy Jayan (10):
IB/core: iwpm_nlmsg_request: Replace semaphore with completion
IB/core: Replace semaphore sm_sem with an atomic wait
IB/hns: Replace semaphore poll_sem with mutex
IB/mthca: Replace semaphore poll_sem with mutex
IB/isert: Replace semaphore sem with completion
IB/hns: Replace counting semaphore event_sem with wait_event
IB/mthca: Replace counting semaphore event_sem with wait_event
IB/mlx5: Add helper mlx5_ib_post_send_wait
IB/mlx5: Replace semaphore umr_common:sem with wait_event
IB/mlx5: Simplify completion into a wait_event

drivers/infiniband/core/iwpm_msg.c | 8 +-
drivers/infiniband/core/iwpm_util.c | 7 +-
drivers/infiniband/core/iwpm_util.h | 3 +-
drivers/infiniband/core/user_mad.c | 20 +++--
drivers/infiniband/hw/hns/hns_roce_cmd.c | 57 ++++++++-----
drivers/infiniband/hw/hns/hns_roce_device.h | 5 +-
drivers/infiniband/hw/mlx5/main.c | 6 +-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 9 +-
drivers/infiniband/hw/mlx5/mr.c | 124 +++++++++-------------------
drivers/infiniband/hw/mthca/mthca_cmd.c | 57 ++++++++-----
drivers/infiniband/hw/mthca/mthca_cmd.h | 1 +
drivers/infiniband/hw/mthca/mthca_dev.h | 5 +-
drivers/infiniband/ulp/isert/ib_isert.c | 6 +-
drivers/infiniband/ulp/isert/ib_isert.h | 3 +-
include/rdma/ib_verbs.h | 1 +
15 files changed, 153 insertions(+), 159 deletions(-)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2016-10-27 06:59:40

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 03/10] IB/hns: Replace semaphore poll_sem with mutex

The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future. So replace it with a mutex. Also,
remove mutex_[un]lock from mthca_cmd_use_events and mthca_cmd_use_polling
respectively.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/hns/hns_roce_cmd.c | 11 ++++-------
drivers/infiniband/hw/hns/hns_roce_device.h | 3 ++-
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 2a0b6c0..51a0675 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -119,7 +119,7 @@ static int hns_roce_cmd_mbox_post_hw(struct hns_roce_dev *hr_dev, u64 in_param,
return ret;
}

-/* this should be called with "poll_sem" */
+/* this should be called with "poll_mutex" */
static int __hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
u64 out_param, unsigned long in_modifier,
u8 op_modifier, u16 op,
@@ -167,10 +167,10 @@ static int hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param,
{
int ret;

- down(&hr_dev->cmd.poll_sem);
+ mutex_lock(&hr_dev->cmd.poll_mutex);
ret = __hns_roce_cmd_mbox_poll(hr_dev, in_param, out_param, in_modifier,
op_modifier, op, timeout);
- up(&hr_dev->cmd.poll_sem);
+ mutex_unlock(&hr_dev->cmd.poll_mutex);

return ret;
}
@@ -275,7 +275,7 @@ int hns_roce_cmd_init(struct hns_roce_dev *hr_dev)
struct device *dev = &hr_dev->pdev->dev;

mutex_init(&hr_dev->cmd.hcr_mutex);
- sema_init(&hr_dev->cmd.poll_sem, 1);
+ mutex_init(&hr_dev->cmd.poll_mutex);
hr_dev->cmd.use_events = 0;
hr_dev->cmd.toggle = 1;
hr_dev->cmd.max_cmds = CMD_MAX_NUM;
@@ -319,8 +319,6 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
hr_cmd->token_mask = CMD_TOKEN_MASK;
hr_cmd->use_events = 1;

- down(&hr_cmd->poll_sem);
-
return 0;
}

@@ -335,7 +333,6 @@ void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
down(&hr_cmd->event_sem);

kfree(hr_cmd->context);
- up(&hr_cmd->poll_sem);
}

struct hns_roce_cmd_mailbox
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..2afe075 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -34,6 +34,7 @@
#define _HNS_ROCE_DEVICE_H

#include <rdma/ib_verbs.h>
+#include <linux/mutex.h>

#define DRV_NAME "hns_roce"

@@ -358,7 +359,7 @@ struct hns_roce_cmdq {
struct dma_pool *pool;
u8 __iomem *hcr;
struct mutex hcr_mutex;
- struct semaphore poll_sem;
+ struct mutex poll_mutex;
/*
* Event mode: cmd register mutex protection,
* ensure to not exceed max_cmds and user use limit region
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 06:59:28

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 01/10] IB/core: iwpm_nlmsg_request: Replace semaphore with completion

Semaphore sem in iwpm_nlmsg_request is used as completion, so
convert it to a struct completion type. Semaphores are going
away in the future.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/core/iwpm_msg.c | 8 ++++----
drivers/infiniband/core/iwpm_util.c | 7 +++----
drivers/infiniband/core/iwpm_util.h | 3 ++-
3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/iwpm_msg.c b/drivers/infiniband/core/iwpm_msg.c
index 1c41b95..761358f 100644
--- a/drivers/infiniband/core/iwpm_msg.c
+++ b/drivers/infiniband/core/iwpm_msg.c
@@ -394,7 +394,7 @@ int iwpm_register_pid_cb(struct sk_buff *skb, struct netlink_callback *cb)
/* always for found nlmsg_request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
- up(&nlmsg_request->sem);
+ complete(&nlmsg_request->comp);
return 0;
}
EXPORT_SYMBOL(iwpm_register_pid_cb);
@@ -463,7 +463,7 @@ int iwpm_add_mapping_cb(struct sk_buff *skb, struct netlink_callback *cb)
/* always for found request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
- up(&nlmsg_request->sem);
+ complete(&nlmsg_request->comp);
return 0;
}
EXPORT_SYMBOL(iwpm_add_mapping_cb);
@@ -555,7 +555,7 @@ int iwpm_add_and_query_mapping_cb(struct sk_buff *skb,
/* always for found request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
- up(&nlmsg_request->sem);
+ complete(&nlmsg_request->comp);
return 0;
}
EXPORT_SYMBOL(iwpm_add_and_query_mapping_cb);
@@ -749,7 +749,7 @@ int iwpm_mapping_error_cb(struct sk_buff *skb, struct netlink_callback *cb)
/* always for found request */
kref_put(&nlmsg_request->kref, iwpm_free_nlmsg_request);
barrier();
- up(&nlmsg_request->sem);
+ complete(&nlmsg_request->comp);
return 0;
}
EXPORT_SYMBOL(iwpm_mapping_error_cb);
diff --git a/drivers/infiniband/core/iwpm_util.c b/drivers/infiniband/core/iwpm_util.c
index ade71e7..08ddd2e 100644
--- a/drivers/infiniband/core/iwpm_util.c
+++ b/drivers/infiniband/core/iwpm_util.c
@@ -323,8 +323,7 @@ struct iwpm_nlmsg_request *iwpm_get_nlmsg_request(__u32 nlmsg_seq,
nlmsg_request->nl_client = nl_client;
nlmsg_request->request_done = 0;
nlmsg_request->err_code = 0;
- sema_init(&nlmsg_request->sem, 1);
- down(&nlmsg_request->sem);
+ init_completion(&nlmsg_request->comp);
return nlmsg_request;
}

@@ -368,8 +367,8 @@ int iwpm_wait_complete_req(struct iwpm_nlmsg_request *nlmsg_request)
{
int ret;

- ret = down_timeout(&nlmsg_request->sem, IWPM_NL_TIMEOUT);
- if (ret) {
+ ret = wait_for_completion_timeout(&nlmsg_request->comp, IWPM_NL_TIMEOUT);
+ if (!ret) {
ret = -EINVAL;
pr_info("%s: Timeout %d sec for netlink request (seq = %u)\n",
__func__, (IWPM_NL_TIMEOUT/HZ), nlmsg_request->nlmsg_seq);
diff --git a/drivers/infiniband/core/iwpm_util.h b/drivers/infiniband/core/iwpm_util.h
index af1fc14..ea6c299 100644
--- a/drivers/infiniband/core/iwpm_util.h
+++ b/drivers/infiniband/core/iwpm_util.h
@@ -43,6 +43,7 @@
#include <linux/delay.h>
#include <linux/workqueue.h>
#include <linux/mutex.h>
+#include <linux/completion.h>
#include <linux/jhash.h>
#include <linux/kref.h>
#include <net/netlink.h>
@@ -69,7 +70,7 @@ struct iwpm_nlmsg_request {
u8 nl_client;
u8 request_done;
u16 err_code;
- struct semaphore sem;
+ struct completion comp;
struct kref kref;
};

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 06:59:49

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 06/10] IB/hns: Replace counting semaphore event_sem with wait_event

Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/hns/hns_roce_cmd.c | 46 ++++++++++++++++++++---------
drivers/infiniband/hw/hns/hns_roce_device.h | 2 +-
2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 51a0675..12ef3d8 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -189,6 +189,34 @@ void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 token, u8 status,
complete(&context->done);
}

+static inline struct hns_roce_cmd_context *
+hns_roce_try_get_context(struct hns_roce_cmdq *cmd)
+{
+ struct hns_roce_cmd_context *context = NULL;
+
+ spin_lock(&cmd->context_lock);
+
+ if (cmd->free_head < 0)
+ goto out;
+
+ context = &cmd->context[cmd->free_head];
+ context->token += cmd->token_mask + 1;
+ cmd->free_head = context->next;
+out:
+ spin_unlock(&cmd->context_lock);
+ return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct hns_roce_cmd_context *
+hns_roce_get_free_context(struct hns_roce_cmdq *cmd)
+{
+ struct hns_roce_cmd_context *context;
+
+ wait_event(cmd->wq, (context = hns_roce_try_get_context(cmd)));
+ return context;
+}
+
/* this should be called with "use_events" */
static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
u64 out_param, unsigned long in_modifier,
@@ -200,13 +228,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
struct hns_roce_cmd_context *context;
int ret = 0;

- spin_lock(&cmd->context_lock);
- WARN_ON(cmd->free_head < 0);
- context = &cmd->context[cmd->free_head];
- context->token += cmd->token_mask + 1;
- cmd->free_head = context->next;
- spin_unlock(&cmd->context_lock);
-
+ context = hns_roce_get_free_context(cmd);
init_completion(&context->done);

ret = hns_roce_cmd_mbox_post_hw(hr_dev, in_param, out_param,
@@ -238,6 +260,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
context->next = cmd->free_head;
cmd->free_head = context - cmd->context;
spin_unlock(&cmd->context_lock);
+ wake_up(&cmd->wq);

return ret;
}
@@ -248,10 +271,8 @@ static int hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
{
int ret = 0;

- down(&hr_dev->cmd.event_sem);
ret = __hns_roce_cmd_mbox_wait(hr_dev, in_param, out_param,
in_modifier, op_modifier, op, timeout);
- up(&hr_dev->cmd.event_sem);

return ret;
}
@@ -313,7 +334,7 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
hr_cmd->context[hr_cmd->max_cmds - 1].next = -1;
hr_cmd->free_head = 0;

- sema_init(&hr_cmd->event_sem, hr_cmd->max_cmds);
+ init_waitqueue_head(&hr_cmd->wq);
spin_lock_init(&hr_cmd->context_lock);

hr_cmd->token_mask = CMD_TOKEN_MASK;
@@ -325,12 +346,9 @@ int hns_roce_cmd_use_events(struct hns_roce_dev *hr_dev)
void hns_roce_cmd_use_polling(struct hns_roce_dev *hr_dev)
{
struct hns_roce_cmdq *hr_cmd = &hr_dev->cmd;
- int i;

hr_cmd->use_events = 0;
-
- for (i = 0; i < hr_cmd->max_cmds; ++i)
- down(&hr_cmd->event_sem);
+ hr_cmd->free_head = -1;

kfree(hr_cmd->context);
}
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 2afe075..ac95f52 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -364,7 +364,7 @@ struct hns_roce_cmdq {
* Event mode: cmd register mutex protection,
* ensure to not exceed max_cmds and user use limit region
*/
- struct semaphore event_sem;
+ wait_queue_head_t wq;
int max_cmds;
spinlock_t context_lock;
int free_head;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 06:59:37

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 02/10] IB/core: Replace semaphore sm_sem with an atomic wait

The semaphore 'sm_sem' is used for an exclusive ownership of the device
so model the same as an atomic variable with an associated wait_event.
Semaphores are going away in the future.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/core/user_mad.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 415a318..6101c0a 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -67,6 +67,8 @@ enum {
IB_UMAD_MINOR_BASE = 0
};

+#define UMAD_F_CLAIM 0x01
+
/*
* Our lifetime rules for these structs are the following:
* device special file is opened, we take a reference on the
@@ -87,7 +89,8 @@ struct ib_umad_port {

struct cdev sm_cdev;
struct device *sm_dev;
- struct semaphore sm_sem;
+ wait_queue_head_t wq;
+ unsigned long flags;

struct mutex file_mutex;
struct list_head file_list;
@@ -1030,12 +1033,14 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp)
port = container_of(inode->i_cdev, struct ib_umad_port, sm_cdev);

if (filp->f_flags & O_NONBLOCK) {
- if (down_trylock(&port->sm_sem)) {
+ if (test_and_set_bit(UMAD_F_CLAIM, &port->flags)) {
ret = -EAGAIN;
goto fail;
}
} else {
- if (down_interruptible(&port->sm_sem)) {
+ if (wait_event_interruptible(port->wq,
+ !test_and_set_bit(UMAD_F_CLAIM,
+ &port->flags))) {
ret = -ERESTARTSYS;
goto fail;
}
@@ -1060,7 +1065,8 @@ static int ib_umad_sm_open(struct inode *inode, struct file *filp)
ib_modify_port(port->ib_dev, port->port_num, 0, &props);

err_up_sem:
- up(&port->sm_sem);
+ clear_bit(UMAD_F_CLAIM, &port->flags);
+ wake_up(&port->wq);

fail:
return ret;
@@ -1079,7 +1085,8 @@ static int ib_umad_sm_close(struct inode *inode, struct file *filp)
ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props);
mutex_unlock(&port->file_mutex);

- up(&port->sm_sem);
+ clear_bit(UMAD_F_CLAIM, &port->flags);
+ wake_up(&port->wq);

kobject_put(&port->umad_dev->kobj);

@@ -1177,7 +1184,8 @@ static int ib_umad_init_port(struct ib_device *device, int port_num,

port->ib_dev = device;
port->port_num = port_num;
- sema_init(&port->sm_sem, 1);
+ init_waitqueue_head(&port->wq);
+ __clear_bit(UMAD_F_CLAIM, &port->flags);
mutex_init(&port->file_mutex);
INIT_LIST_HEAD(&port->file_list);

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 06:59:53

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 07/10] IB/mthca: Replace counting semaphore event_sem with wait_event

Counting semaphores are going away in the future, so replace the semaphore
mthca_cmd::event_sem with a conditional wait_event.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/mthca/mthca_cmd.c | 47 ++++++++++++++++++++++-----------
drivers/infiniband/hw/mthca/mthca_dev.h | 3 ++-
2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index 49c6e19..d6a048a 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -405,6 +405,34 @@ void mthca_cmd_event(struct mthca_dev *dev,
complete(&context->done);
}

+static inline struct mthca_cmd_context *
+mthca_try_get_context(struct mthca_cmd *cmd)
+{
+ struct mthca_cmd_context *context = NULL;
+
+ spin_lock(&cmd->context_lock);
+
+ if (cmd->free_head < 0)
+ goto out;
+
+ context = &cmd->context[cmd->free_head];
+ context->token += cmd->token_mask + 1;
+ cmd->free_head = context->next;
+out:
+ spin_unlock(&cmd->context_lock);
+ return context;
+}
+
+/* wait for and acquire a free context */
+static inline struct mthca_cmd_context *
+mthca_get_free_context(struct mthca_cmd *cmd)
+{
+ struct mthca_cmd_context *context;
+
+ wait_event(cmd->wq, (context = mthca_try_get_context(cmd)));
+ return context;
+}
+
static int mthca_cmd_wait(struct mthca_dev *dev,
u64 in_param,
u64 *out_param,
@@ -417,15 +445,7 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
int err = 0;
struct mthca_cmd_context *context;

- down(&dev->cmd.event_sem);
-
- spin_lock(&dev->cmd.context_lock);
- BUG_ON(dev->cmd.free_head < 0);
- context = &dev->cmd.context[dev->cmd.free_head];
- context->token += dev->cmd.token_mask + 1;
- dev->cmd.free_head = context->next;
- spin_unlock(&dev->cmd.context_lock);
-
+ context = mthca_get_free_context(&dev->cmd);
init_completion(&context->done);

err = mthca_cmd_post(dev, in_param,
@@ -458,8 +478,8 @@ static int mthca_cmd_wait(struct mthca_dev *dev,
context->next = dev->cmd.free_head;
dev->cmd.free_head = context - dev->cmd.context;
spin_unlock(&dev->cmd.context_lock);
+ wake_up(&dev->cmd.wq);

- up(&dev->cmd.event_sem);
return err;
}

@@ -571,7 +591,7 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
dev->cmd.context[dev->cmd.max_cmds - 1].next = -1;
dev->cmd.free_head = 0;

- sema_init(&dev->cmd.event_sem, dev->cmd.max_cmds);
+ init_waitqueue_head(&dev->cmd.wq);
spin_lock_init(&dev->cmd.context_lock);

for (dev->cmd.token_mask = 1;
@@ -590,12 +610,9 @@ int mthca_cmd_use_events(struct mthca_dev *dev)
*/
void mthca_cmd_use_polling(struct mthca_dev *dev)
{
- int i;
-
dev->cmd.flags &= ~MTHCA_CMD_USE_EVENTS;

- for (i = 0; i < dev->cmd.max_cmds; ++i)
- down(&dev->cmd.event_sem);
+ dev->cmd.free_head = -1;

kfree(dev->cmd.context);
}
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 87ab964..2fc86db 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -46,6 +46,7 @@
#include <linux/list.h>
#include <linux/semaphore.h>

+#include <rdma/ib_sa.h>
#include "mthca_provider.h"
#include "mthca_doorbell.h"

@@ -121,7 +122,7 @@ struct mthca_cmd {
struct pci_pool *pool;
struct mutex hcr_mutex;
struct mutex poll_mutex;
- struct semaphore event_sem;
+ wait_queue_head_t wq;
int max_cmds;
spinlock_t context_lock;
int free_head;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 07:00:00

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event

Convert the completion 'mlx5_ib_umr_context:done' to a wait_event as it
just waits for the return value to be filled.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
drivers/infiniband/hw/mlx5/mr.c | 9 +++++----
include/rdma/ib_verbs.h | 1 +
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index de31b5f..cf496b5 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -524,7 +524,7 @@ struct mlx5_ib_mw {
struct mlx5_ib_umr_context {
struct ib_cqe cqe;
enum ib_wc_status status;
- struct completion done;
+ wait_queue_head_t wq;
};

struct umr_common {
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index dfaf6f6..49ff2af 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -846,14 +846,14 @@ static void mlx5_ib_umr_done(struct ib_cq *cq, struct ib_wc *wc)
container_of(wc->wr_cqe, struct mlx5_ib_umr_context, cqe);

context->status = wc->status;
- complete(&context->done);
+ wake_up(&context->wq);
}

static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
{
context->cqe.done = mlx5_ib_umr_done;
- context->status = -1;
- init_completion(&context->done);
+ context->status = IB_WC_STATUS_NONE;
+ init_waitqueue_head(&context->wq);
}

static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
@@ -873,7 +873,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
if (err) {
mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
} else {
- wait_for_completion(&umr_context.done);
+ wait_event(umr_context.wq,
+ umr_context.status != IB_WC_STATUS_NONE);
if (umr_context.status != IB_WC_SUCCESS) {
mlx5_ib_warn(dev, "reg umr failed (%u)\n",
umr_context.status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..8b15b6f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -823,6 +823,7 @@ struct ib_ah_attr {
};

enum ib_wc_status {
+ IB_WC_STATUS_NONE = -1,
IB_WC_SUCCESS,
IB_WC_LOC_LEN_ERR,
IB_WC_LOC_QP_OP_ERR,
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 07:00:19

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 09/10] IB/mlx5: Replace semaphore umr_common:sem with wait_event

Remove semaphore umr_common:sem used to limit concurrent access to umr qp
and introduce an atomic value 'users' to keep track of the same. Use a
wait_event to block when the limit is reached.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/mlx5/main.c | 6 +-----
drivers/infiniband/hw/mlx5/mlx5_ib.h | 7 ++++++-
drivers/infiniband/hw/mlx5/mr.c | 6 ++++--
3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2217477..eb72bff 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2437,10 +2437,6 @@ static void destroy_umrc_res(struct mlx5_ib_dev *dev)
ib_dealloc_pd(dev->umrc.pd);
}

-enum {
- MAX_UMR_WR = 128,
-};
-
static int create_umr_res(struct mlx5_ib_dev *dev)
{
struct ib_qp_init_attr *init_attr = NULL;
@@ -2520,7 +2516,7 @@ static int create_umr_res(struct mlx5_ib_dev *dev)
dev->umrc.cq = cq;
dev->umrc.pd = pd;

- sema_init(&dev->umrc.sem, MAX_UMR_WR);
+ init_waitqueue_head(&dev->umrc.wq);
ret = mlx5_mr_cache_init(dev);
if (ret) {
mlx5_ib_warn(dev, "mr cache init failed %d\n", ret);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index dcdcd19..de31b5f 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -533,7 +533,12 @@ struct umr_common {
struct ib_qp *qp;
/* control access to UMR QP
*/
- struct semaphore sem;
+ wait_queue_head_t wq;
+ atomic_t users;
+};
+
+enum {
+ MAX_UMR_WR = 128,
};

enum {
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1593856..dfaf6f6 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -867,7 +867,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
mlx5_ib_init_umr_context(&umr_context);
umrwr->wr.wr_cqe = &umr_context.cqe;

- down(&umrc->sem);
+ /* limit number of concurrent ib_post_send() on qp */
+ wait_event(umrc->wq, atomic_add_unless(&umrc->users, 1, MAX_UMR_WR));
err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
if (err) {
mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
@@ -879,7 +880,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
err = -EFAULT;
}
}
- up(&umrc->sem);
+ atomic_dec(&umrc->users);
+ wake_up(&umrc->wq);
return err;
}

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 07:00:34

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 08/10] IB/mlx5: Add helper mlx5_ib_post_send_wait

Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

- Initialize 'mlx5_ib_umr_context' on stack
- Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
- Acquire the semaphore
- call ib_post_send with a single ib_send_wr
- wait_for_completion()
- Check for umr_context.status
- Release the semaphore

As semaphores are going away in the future, moving all of these into the
shared helper leaves only a single function using the semaphore, which
can then be rewritten to use something else.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/mlx5/mr.c | 115 +++++++++++-----------------------------
1 file changed, 32 insertions(+), 83 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index d4ad672..1593856 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -856,16 +856,40 @@ static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
init_completion(&context->done);
}

+static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
+ struct mlx5_umr_wr *umrwr)
+{
+ struct umr_common *umrc = &dev->umrc;
+ struct ib_send_wr *bad;
+ int err;
+ struct mlx5_ib_umr_context umr_context;
+
+ mlx5_ib_init_umr_context(&umr_context);
+ umrwr->wr.wr_cqe = &umr_context.cqe;
+
+ down(&umrc->sem);
+ err = ib_post_send(umrc->qp, &umrwr->wr, &bad);
+ if (err) {
+ mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
+ } else {
+ wait_for_completion(&umr_context.done);
+ if (umr_context.status != IB_WC_SUCCESS) {
+ mlx5_ib_warn(dev, "reg umr failed (%u)\n",
+ umr_context.status);
+ err = -EFAULT;
+ }
+ }
+ up(&umrc->sem);
+ return err;
+}
+
static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
u64 virt_addr, u64 len, int npages,
int page_shift, int order, int access_flags)
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct device *ddev = dev->ib_dev.dma_device;
- struct umr_common *umrc = &dev->umrc;
- struct mlx5_ib_umr_context umr_context;
struct mlx5_umr_wr umrwr = {};
- struct ib_send_wr *bad;
struct mlx5_ib_mr *mr;
struct ib_sge sg;
int size;
@@ -894,24 +918,12 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
if (err)
goto free_mr;

- mlx5_ib_init_umr_context(&umr_context);
-
- umrwr.wr.wr_cqe = &umr_context.cqe;
prep_umr_reg_wqe(pd, &umrwr.wr, &sg, dma, npages, mr->mmkey.key,
page_shift, virt_addr, len, access_flags);

- down(&umrc->sem);
- err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
- if (err) {
- mlx5_ib_warn(dev, "post send failed, err %d\n", err);
+ err = mlx5_ib_post_send_wait(dev, &umrwr);
+ if (err && err != -EFAULT)
goto unmap_dma;
- } else {
- wait_for_completion(&umr_context.done);
- if (umr_context.status != IB_WC_SUCCESS) {
- mlx5_ib_warn(dev, "reg umr failed\n");
- err = -EFAULT;
- }
- }

mr->mmkey.iova = virt_addr;
mr->mmkey.size = len;
@@ -920,7 +932,6 @@ static struct mlx5_ib_mr *reg_umr(struct ib_pd *pd, struct ib_umem *umem,
mr->live = 1;

unmap_dma:
- up(&umrc->sem);
dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);

kfree(mr_pas);
@@ -940,13 +951,10 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,
{
struct mlx5_ib_dev *dev = mr->dev;
struct device *ddev = dev->ib_dev.dma_device;
- struct umr_common *umrc = &dev->umrc;
- struct mlx5_ib_umr_context umr_context;
struct ib_umem *umem = mr->umem;
int size;
__be64 *pas;
dma_addr_t dma;
- struct ib_send_wr *bad;
struct mlx5_umr_wr wr;
struct ib_sge sg;
int err = 0;
@@ -1011,10 +1019,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,

dma_sync_single_for_device(ddev, dma, size, DMA_TO_DEVICE);

- mlx5_ib_init_umr_context(&umr_context);
-
memset(&wr, 0, sizeof(wr));
- wr.wr.wr_cqe = &umr_context.cqe;

sg.addr = dma;
sg.length = ALIGN(npages * sizeof(u64),
@@ -1031,19 +1036,7 @@ int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index, int npages,
wr.mkey = mr->mmkey.key;
wr.target.offset = start_page_index;

- down(&umrc->sem);
- err = ib_post_send(umrc->qp, &wr.wr, &bad);
- if (err) {
- mlx5_ib_err(dev, "UMR post send failed, err %d\n", err);
- } else {
- wait_for_completion(&umr_context.done);
- if (umr_context.status != IB_WC_SUCCESS) {
- mlx5_ib_err(dev, "UMR completion failed, code %d\n",
- umr_context.status);
- err = -EFAULT;
- }
- }
- up(&umrc->sem);
+ err = mlx5_ib_post_send_wait(dev, &wr);
}
dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);

@@ -1210,39 +1203,14 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
static int unreg_umr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
{
struct mlx5_core_dev *mdev = dev->mdev;
- struct umr_common *umrc = &dev->umrc;
- struct mlx5_ib_umr_context umr_context;
struct mlx5_umr_wr umrwr = {};
- struct ib_send_wr *bad;
- int err;

if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
return 0;

- mlx5_ib_init_umr_context(&umr_context);
-
- umrwr.wr.wr_cqe = &umr_context.cqe;
prep_umr_unreg_wqe(dev, &umrwr.wr, mr->mmkey.key);

- down(&umrc->sem);
- err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
- if (err) {
- up(&umrc->sem);
- mlx5_ib_dbg(dev, "err %d\n", err);
- goto error;
- } else {
- wait_for_completion(&umr_context.done);
- up(&umrc->sem);
- }
- if (umr_context.status != IB_WC_SUCCESS) {
- mlx5_ib_warn(dev, "unreg umr failed\n");
- err = -EFAULT;
- goto error;
- }
- return 0;
-
-error:
- return err;
+ return mlx5_ib_post_send_wait(dev, &umrwr);
}

static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr, u64 virt_addr,
@@ -1251,19 +1219,13 @@ static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr, u64 virt_addr,
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct device *ddev = dev->ib_dev.dma_device;
- struct mlx5_ib_umr_context umr_context;
- struct ib_send_wr *bad;
struct mlx5_umr_wr umrwr = {};
struct ib_sge sg;
- struct umr_common *umrc = &dev->umrc;
dma_addr_t dma = 0;
__be64 *mr_pas = NULL;
int size;
int err;

- mlx5_ib_init_umr_context(&umr_context);
-
- umrwr.wr.wr_cqe = &umr_context.cqe;
umrwr.wr.send_flags = MLX5_IB_SEND_UMR_FAIL_IF_FREE;

if (flags & IB_MR_REREG_TRANS) {
@@ -1291,21 +1253,8 @@ static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr, u64 virt_addr,
}

/* post send request to UMR QP */
- down(&umrc->sem);
- err = ib_post_send(umrc->qp, &umrwr.wr, &bad);
+ err = mlx5_ib_post_send_wait(dev, &umrwr);

- if (err) {
- mlx5_ib_warn(dev, "post send failed, err %d\n", err);
- } else {
- wait_for_completion(&umr_context.done);
- if (umr_context.status != IB_WC_SUCCESS) {
- mlx5_ib_warn(dev, "reg umr failed (%u)\n",
- umr_context.status);
- err = -EFAULT;
- }
- }
-
- up(&umrc->sem);
if (flags & IB_MR_REREG_TRANS) {
dma_unmap_single(ddev, dma, size, DMA_TO_DEVICE);
kfree(mr_pas);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 07:00:58

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion

The semaphore 'sem' in isert_device is used as completion, so convert
it to struct completion. Semaphores are going away in the future.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/ulp/isert/ib_isert.c | 6 +++---
drivers/infiniband/ulp/isert/ib_isert.h | 3 ++-
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..de80f56 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -619,7 +619,7 @@
mutex_unlock(&isert_np->mutex);

isert_info("np %p: Allow accept_np to continue\n", isert_np);
- up(&isert_np->sem);
+ complete(&isert_np->comp);
}

static void
@@ -2311,7 +2311,7 @@ struct rdma_cm_id *
isert_err("Unable to allocate struct isert_np\n");
return -ENOMEM;
}
- sema_init(&isert_np->sem, 0);
+ init_completion(&isert_np->comp);
mutex_init(&isert_np->mutex);
INIT_LIST_HEAD(&isert_np->accepted);
INIT_LIST_HEAD(&isert_np->pending);
@@ -2427,7 +2427,7 @@ struct rdma_cm_id *
int ret;

accept_wait:
- ret = down_interruptible(&isert_np->sem);
+ ret = wait_for_completion_interruptible(&isert_np->comp);
if (ret)
return -ENODEV;

diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index c02ada5..a1277c0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -3,6 +3,7 @@
#include <linux/in6.h>
#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>
+#include <linux/completion.h>
#include <rdma/rw.h>
#include <scsi/iser.h>

@@ -190,7 +191,7 @@ struct isert_device {

struct isert_np {
struct iscsi_np *np;
- struct semaphore sem;
+ struct completion comp;
struct rdma_cm_id *cm_id;
struct mutex mutex;
struct list_head accepted;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 07:01:17

by Binoy Jayan

[permalink] [raw]
Subject: [PATCH v4 04/10] IB/mthca: Replace semaphore poll_sem with mutex

The semaphore 'poll_sem' is a simple mutex, so it should be written as one.
Semaphores are going away in the future. So replace it with a mutex. Also,
remove mutex_[un]lock from mthca_cmd_use_events and mthca_cmd_use_polling
respectively.

Signed-off-by: Binoy Jayan <[email protected]>
---
drivers/infiniband/hw/mthca/mthca_cmd.c | 10 +++-------
drivers/infiniband/hw/mthca/mthca_cmd.h | 1 +
drivers/infiniband/hw/mthca/mthca_dev.h | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c
index c7f49bb..49c6e19 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.c
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.c
@@ -347,7 +347,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
unsigned long end;
u8 status;

- down(&dev->cmd.poll_sem);
+ mutex_lock(&dev->cmd.poll_mutex);

err = mthca_cmd_post(dev, in_param,
out_param ? *out_param : 0,
@@ -382,7 +382,7 @@ static int mthca_cmd_poll(struct mthca_dev *dev,
}

out:
- up(&dev->cmd.poll_sem);
+ mutex_unlock(&dev->cmd.poll_mutex);
return err;
}

@@ -520,7 +520,7 @@ static int mthca_cmd_imm(struct mthca_dev *dev,
int mthca_cmd_init(struct mthca_dev *dev)
{
mutex_init(&dev->cmd.hcr_mutex);
- sema_init(&dev->cmd.poll_sem, 1);
+ mutex_init(&dev->cmd.poll_mutex);
dev->cmd.flags = 0;

dev->hcr = ioremap(pci_resource_start(dev->pdev, 0) + MTHCA_HCR_BASE,
@@ -582,8 +582,6 @@ int mthca_cmd_use_events(struct mthca_dev *dev)

dev->cmd.flags |= MTHCA_CMD_USE_EVENTS;

- down(&dev->cmd.poll_sem);
-
return 0;
}

@@ -600,8 +598,6 @@ void mthca_cmd_use_polling(struct mthca_dev *dev)
down(&dev->cmd.event_sem);

kfree(dev->cmd.context);
-
- up(&dev->cmd.poll_sem);
}

struct mthca_mailbox *mthca_alloc_mailbox(struct mthca_dev *dev,
diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.h b/drivers/infiniband/hw/mthca/mthca_cmd.h
index d2e5b19..a7f197e 100644
--- a/drivers/infiniband/hw/mthca/mthca_cmd.h
+++ b/drivers/infiniband/hw/mthca/mthca_cmd.h
@@ -35,6 +35,7 @@
#ifndef MTHCA_CMD_H
#define MTHCA_CMD_H

+#include <linux/mutex.h>
#include <rdma/ib_verbs.h>

#define MTHCA_MAILBOX_SIZE 4096
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index 4393a02..87ab964 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -120,7 +120,7 @@ enum {
struct mthca_cmd {
struct pci_pool *pool;
struct mutex hcr_mutex;
- struct semaphore poll_sem;
+ struct mutex poll_mutex;
struct semaphore event_sem;
int max_cmds;
spinlock_t context_lock;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2016-10-27 07:13:37

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] infiniband: Remove semaphores

On Thu, Oct 27, 2016 at 12:29:04PM +0530, Binoy Jayan wrote:
> Hi,
>
> These are a set of patches [v4] which removes semaphores from infiniband.
> These are part of a bigger effort to eliminate all semaphores from the
> linux kernel.
>
> v3 -> v4:
>
> IB/mlx5: Added patch - Replace semaphore umr_common:sem with wait_event
> IB/mlx5: Fixed a bug pointed out by Leon Romanovsky

Please keep full changelog for your next submissions/respins.

Thanks


Attachments:
(No filename) (449.00 B)
signature.asc (819.00 B)
Download all attachments

2016-10-30 21:12:49

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion

> The semaphore 'sem' in isert_device is used as completion, so convert
> it to struct completion. Semaphores are going away in the future.

Umm, this is 100% *not* true. np->sem is designed as a counting to
sync the iscsi login thread with the connect requests coming from the
initiators. So this is actually a reliable bug insertion :(

NAK from me...

Also, I would appreciate if you include get_maintainer.pl in your
patch submissions so I won't need to fish these in the Linux-rdma
patch traffic.

2016-10-30 21:18:57

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event



On 27/10/16 09:59, Binoy Jayan wrote:
> Convert the completion 'mlx5_ib_umr_context:done' to a wait_event as it
> just waits for the return value to be filled.
>
> Signed-off-by: Binoy Jayan <[email protected]>
> ---
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
> drivers/infiniband/hw/mlx5/mr.c | 9 +++++----
> include/rdma/ib_verbs.h | 1 +
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index de31b5f..cf496b5 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -524,7 +524,7 @@ struct mlx5_ib_mw {
> struct mlx5_ib_umr_context {
> struct ib_cqe cqe;
> enum ib_wc_status status;
> - struct completion done;
> + wait_queue_head_t wq;
> };
>
> struct umr_common {
> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> index dfaf6f6..49ff2af 100644
> --- a/drivers/infiniband/hw/mlx5/mr.c
> +++ b/drivers/infiniband/hw/mlx5/mr.c
> @@ -846,14 +846,14 @@ static void mlx5_ib_umr_done(struct ib_cq *cq, struct ib_wc *wc)
> container_of(wc->wr_cqe, struct mlx5_ib_umr_context, cqe);
>
> context->status = wc->status;
> - complete(&context->done);
> + wake_up(&context->wq);
> }
>
> static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
> {
> context->cqe.done = mlx5_ib_umr_done;
> - context->status = -1;
> - init_completion(&context->done);
> + context->status = IB_WC_STATUS_NONE;
> + init_waitqueue_head(&context->wq);
> }
>
> static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> @@ -873,7 +873,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> if (err) {
> mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
> } else {
> - wait_for_completion(&umr_context.done);
> + wait_event(umr_context.wq,
> + umr_context.status != IB_WC_STATUS_NONE);

How is this simpler?


> enum ib_wc_status {
> + IB_WC_STATUS_NONE = -1,
> IB_WC_SUCCESS,
> IB_WC_LOC_LEN_ERR,
> IB_WC_LOC_QP_OP_ERR,
>

Huh? Where did this bogus status came from? IMHO, this is polluting
the verbs interface for no good reason at all, sorry.

2016-11-03 04:41:48

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event

On Sun, Oct 30, 2016 at 11:17:57PM +0200, Sagi Grimberg wrote:
>
>
> On 27/10/16 09:59, Binoy Jayan wrote:
> >Convert the completion 'mlx5_ib_umr_context:done' to a wait_event as it
> >just waits for the return value to be filled.

On top of Sagi's response, I'm failing to understand why it is needed.
Can you elaborate more about it?

> >
> >Signed-off-by: Binoy Jayan <[email protected]>
> >---
> > drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
> > drivers/infiniband/hw/mlx5/mr.c | 9 +++++----
> > include/rdma/ib_verbs.h | 1 +
> > 3 files changed, 7 insertions(+), 5 deletions(-)
> >
> >diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> >index de31b5f..cf496b5 100644
> >--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> >+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> >@@ -524,7 +524,7 @@ struct mlx5_ib_mw {
> > struct mlx5_ib_umr_context {
> > struct ib_cqe cqe;
> > enum ib_wc_status status;
> >- struct completion done;
> >+ wait_queue_head_t wq;
> > };
> >
> > struct umr_common {
> >diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
> >index dfaf6f6..49ff2af 100644
> >--- a/drivers/infiniband/hw/mlx5/mr.c
> >+++ b/drivers/infiniband/hw/mlx5/mr.c
> >@@ -846,14 +846,14 @@ static void mlx5_ib_umr_done(struct ib_cq *cq, struct ib_wc *wc)
> > container_of(wc->wr_cqe, struct mlx5_ib_umr_context, cqe);
> >
> > context->status = wc->status;
> >- complete(&context->done);
> >+ wake_up(&context->wq);
> > }
> >
> > static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
> > {
> > context->cqe.done = mlx5_ib_umr_done;
> >- context->status = -1;
> >- init_completion(&context->done);
> >+ context->status = IB_WC_STATUS_NONE;
> >+ init_waitqueue_head(&context->wq);
> > }
> >
> > static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> >@@ -873,7 +873,8 @@ static inline int mlx5_ib_post_send_wait(struct mlx5_ib_dev *dev,
> > if (err) {
> > mlx5_ib_warn(dev, "UMR post send failed, err %d\n", err);
> > } else {
> >- wait_for_completion(&umr_context.done);
> >+ wait_event(umr_context.wq,
> >+ umr_context.status != IB_WC_STATUS_NONE);
>
> How is this simpler?
>
>
> > enum ib_wc_status {
> >+ IB_WC_STATUS_NONE = -1,
> > IB_WC_SUCCESS,
> > IB_WC_LOC_LEN_ERR,
> > IB_WC_LOC_QP_OP_ERR,
> >
>
> Huh? Where did this bogus status came from? IMHO, this is polluting
> the verbs interface for no good reason at all, sorry.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Attachments:
(No filename) (2.61 kB)
signature.asc (819.00 B)
Download all attachments

2016-11-03 06:03:08

by Binoy Jayan

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event

Hi,

On 31 October 2016 at 02:47, Sagi Grimberg <[email protected]> wrote:

> How is this simpler?

It is simpler in the sense that it is a light weight primitive and that only
one thread waits on the event here. In our case since 'umr_context.done'
is an "on stack" variable, and has only one thread waiting on that event,
no race conditions occur. So, we do not need completions here which
are usually used to provide a race-free but easy-to-use solution involving
multiple threads waiting on an event.

>> enum ib_wc_status {
>> + IB_WC_STATUS_NONE = -1,
>> IB_WC_SUCCESS,
>> IB_WC_LOC_LEN_ERR,
>> IB_WC_LOC_QP_OP_ERR,
>>
>
> Huh? Where did this bogus status came from? IMHO, this is polluting
> the verbs interface for no good reason at all, sorry.

context->status is initialized to -1 in the following code, so I just thought of
replacing it with a name.

static inline void mlx5_ib_init_umr_context(struct mlx5_ib_umr_context *context)
{
context->cqe.done = mlx5_ib_umr_done;
- context->status = -1;
- init_completion(&context->done);
+ context->status = IB_WC_STATUS_NONE;
+ init_waitqueue_head(&context->wq);
}

Thanks,
Binoy

2016-11-03 15:37:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event

On Wed, Oct 26, 2016 at 11:59 PM, Binoy Jayan <[email protected]> wrote:
> Convert the completion 'mlx5_ib_umr_context:done' to a wait_event as it
> just waits for the return value to be filled.

This is wrong.

Since that "umr_context" variable is on the stack, and you are waiting
for it to be fully done, it really should be a completion.

Why?

Because a "complete()" guarantees that the *last* access to the
variable is done by the thread that does "wait_for_completion()".

In contrast, doing a "wait_event()" can actually *race* with whoever
wakes it up, and the "wait_event()" may race with the wakeup, where
the wakeup() ends up removing the entry from the list, so that
"list_empty_careful()" sees that it's all done, but the "wakeup()" can
still be holding (and about to release) the waitqueue spinlock.

What's the problem?

When the waiter is on the stack, the umr_context" variable may be
about to be relased, and then the "wake_up()" may end up accessing the
umr_context waitqueue spinlock after it has already become something
else.

This is unlikely, but it's very much one of the things that a
"completion" is all about. A completion (along with a plain spinlock)
is guaranteed to be synchronous in a way that semaphores and
wait-queues are *not*, so that when somebody has done "complete()",
there is no access to the variable that can race.

So no. A wait-queue wakeup is NOT AT ALL the same thing as a completion.

There really is a very real reason why "complete()" exists, and why it
is used for one-time initializations like this. "complete()" along
with spinlocks are synchronous in ways that other concepts are not.

Linus

2016-11-07 05:51:15

by Binoy Jayan

[permalink] [raw]
Subject: Re: [PATCH v4 10/10] IB/mlx5: Simplify completion into a wait_event

Hi Linus,

On 3 November 2016 at 21:07, Linus Torvalds
<[email protected]> wrote:
> This is wrong.

Will change it back.

> Since that "umr_context" variable is on the stack, and you are waiting
> for it to be fully done, it really should be a completion.

Thank you for sharing your insight with wait_event and completion.

-Binoy

2016-11-18 06:57:37

by Binoy Jayan

[permalink] [raw]
Subject: Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion

Hi Sagi,

On 31 October 2016 at 02:42, Sagi Grimberg <[email protected]> wrote:
>> The semaphore 'sem' in isert_device is used as completion, so convert
>> it to struct completion. Semaphores are going away in the future.
>
>
> Umm, this is 100% *not* true. np->sem is designed as a counting to
> sync the iscsi login thread with the connect requests coming from the
> initiators. So this is actually a reliable bug insertion :(
>
> NAK from me...

Sorry for the late reply as I was held up in other activities.

I converted this to a wait_event() implementation but as I was doing it,
I was wondering how it would have been different if it was a completion
and not a semaphore.

File: drivers/infiniband/ulp/isert/ib_isert.c

If isert_connected_handler() is called multiple times, adding an entry to the
list, and if that happens while we use completion, 'done' (part of struct
completion) would be incremented by 1 each time 'complete' is called from
isert_connected_handler. After 'n' iterations, done will be equal to 'n'. If we
call wait_for_completion now from isert_accept_np, it would just decrement
'done' by one and continue without blocking, consuming one node at a time
from the list 'isert_np->pending'.

Alternatively if "done" becomes zero, and the next time wait_for_completion is
called, the API would add a node at the end of the wait queue 'wait' in 'struct
completion' and block until "done" is nonzero. (Ref: do_wait_for_common)
It exists the wait when a call to 'complete' turns 'done' back to 1.
But if there
are multiple waits called before calling complete, all the tasks
calling the wait
gets queued up and they will all would see "done" set to zero. When complete
is called now, done turns 1 again and the first task in the queue is woken up
as it is serialized as FIFO. Now the first wait returns and the done is
decremented by 1 just before the return.

Am I missing something here?

Thanks,
Binoy

2016-11-18 08:58:37

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion

On Friday, November 18, 2016 12:27:32 PM CET Binoy Jayan wrote:
> Hi Sagi,
>
> On 31 October 2016 at 02:42, Sagi Grimberg <[email protected]> wrote:
> >> The semaphore 'sem' in isert_device is used as completion, so convert
> >> it to struct completion. Semaphores are going away in the future.
> >
> >
> > Umm, this is 100% *not* true. np->sem is designed as a counting to
> > sync the iscsi login thread with the connect requests coming from the
> > initiators. So this is actually a reliable bug insertion :(
> >
> > NAK from me...
>
> Sorry for the late reply as I was held up in other activities.
>
> I converted this to a wait_event() implementation but as I was doing it,
> I was wondering how it would have been different if it was a completion
> and not a semaphore.
>
> File: drivers/infiniband/ulp/isert/ib_isert.c
>
> If isert_connected_handler() is called multiple times, adding an entry to the
> list, and if that happens while we use completion, 'done' (part of struct
> completion) would be incremented by 1 each time 'complete' is called from
> isert_connected_handler. After 'n' iterations, done will be equal to 'n'. If we
> call wait_for_completion now from isert_accept_np, it would just decrement
> 'done' by one and continue without blocking, consuming one node at a time
> from the list 'isert_np->pending'.
>
> Alternatively if "done" becomes zero, and the next time wait_for_completion is
> called, the API would add a node at the end of the wait queue 'wait' in 'struct
> completion' and block until "done" is nonzero. (Ref: do_wait_for_common)
> It exists the wait when a call to 'complete' turns 'done' back to 1.
> But if there
> are multiple waits called before calling complete, all the tasks
> calling the wait
> gets queued up and they will all would see "done" set to zero. When complete
> is called now, done turns 1 again and the first task in the queue is woken up
> as it is serialized as FIFO. Now the first wait returns and the done is
> decremented by 1 just before the return.
>
> Am I missing something here?

I think you are right. This is behavior is actuallly documented in
Documentation/scheduler/completion.txt:

If complete() is called multiple times then this will allow for that number
of waiters to continue - each call to complete() will simply increment the
done element. Calling complete_all() multiple times is a bug though. Both
complete() and complete_all() can be called in hard-irq/atomic context
safely.

However, this is fairly unusual behavior and I wasn't immediately aware
of it either when I read Sagi's reply. While your patch looks correct,
it's probably a good idea to point out the counting behavior of this
completion as explicitly as possible, in the changelog text of the patch
as well as in a code comment and perhaps in the naming of the completion.

Arnd

2016-11-18 09:10:18

by Binoy Jayan

[permalink] [raw]
Subject: Re: [PATCH v4 05/10] IB/isert: Replace semaphore sem with completion

Hi Arnd,

On 18 November 2016 at 14:28, Arnd Bergmann <[email protected]> wrote:
> On Friday, November 18, 2016 12:27:32 PM CET Binoy Jayan wrote:
>> Hi Sagi,

> I think you are right. This is behavior is actuallly documented in
> Documentation/scheduler/completion.txt:

Thanking for having a look.

> However, this is fairly unusual behavior and I wasn't immediately aware
> of it either when I read Sagi's reply. While your patch looks correct,
> it's probably a good idea to point out the counting behavior of this
> completion as explicitly as possible, in the changelog text of the patch
> as well as in a code comment and perhaps in the naming of the completion.

Will mention this and resend the patch series.

Thanks,
Binoy