2013-08-31 03:08:43

by Nicholas A. Bellinger

Subject: [PATCH-v5 0/6] target/vhost/iscsi: Add per-cpu ida tag pre-allocation for v3.12

From: Nicholas Bellinger <[email protected]>

Hi folks,

This is an updated -v5 series adding tag pre-allocation support for
target fabric descriptor memory, utilizing Kent's latest per-cpu ida
bits and incorporating akpm's last round of feedback. The full -v5
changelog is included below.

The first patch is a standalone version of per-cpu-ida, separate from
Kent's full idr rewrite that is still being discussed. Jens has also
expressed interest in a blk-mq conversion to use these per-cpu-ida
primitives, so getting this piece merged for v3.12 would make life
easier for both of us. ;)

The second patch includes target-core setup of se_sess->sess_cmd_map +
se_sess->sess_tag_pool resources at session creation time, using
fabric-independent code in transport_init_session_tags().

The third patch is the initial conversion of vhost-scsi fabric code
to use per-cpu ida logic for obtaining a new tcm_vhost_cmd descriptor
via vhost_scsi_get_tag() during vhost_work_fn_t->handle_kick() ->
vhost_scsi_handle_vq() callback execution.

The fourth patch is a vhost-scsi change that adds pre-allocation of
per-tcm_vhost_cmd descriptor scatterlist + user-space page pointer
memory, allowing the last two fast-path allocations to be dropped from
tcm_vhost_submission_work() -> vhost_scsi_map_to_sgl() execution.

The fifth patch converts iscsi/iser-target to use allocations based on
iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
using an embedded isert_cmd->iscsi_cmd. This makes the conversion to
percpu-ida pre-allocation easier.

And the sixth patch enables iscsi-target to use the same pre-allocation
logic for per-cpu session tag pooling, with internal percpu_ida_alloc()
+ percpu_ida_free() calls based upon the saved iscsi_cmd->se_cmd.map_tag id.

v5 changes:

- Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
- Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
- Convert steal_tags() to use cpumask_weight() + cpumask_next() +
cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
- Add comment for alloc_global_tags() (kmo + akpm)
- Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
- Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
- Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
(kmo + akpm)
- Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
(kmo + akpm)
- Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
- Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)
- Convert target/vhost/iscsi-target to use percpu_ida.h (nab)

Please review as v3.12 material.

Thank you,

--nab

Kent Overstreet (1):
idr: Percpu ida

Nicholas Bellinger (5):
target: Add transport_init_session_tags using per-cpu ida
vhost/scsi: Convert to per-cpu ida_alloc + ida_free command map
vhost/scsi: Add pre-allocation for tv_cmd SGL + upages memory
iscsi/iser-target: Convert to command priv_size usage
iscsi-target: Convert to per-cpu ida_alloc + ida_free command map

drivers/infiniband/ulp/isert/ib_isert.c | 114 +++------
drivers/infiniband/ulp/isert/ib_isert.h | 2 +-
drivers/target/iscsi/iscsi_target.c | 16 +--
drivers/target/iscsi/iscsi_target.h | 1 -
drivers/target/iscsi/iscsi_target_configfs.c | 2 +-
drivers/target/iscsi/iscsi_target_core.h | 3 +-
drivers/target/iscsi/iscsi_target_nego.c | 28 ++-
drivers/target/iscsi/iscsi_target_util.c | 41 ++--
drivers/target/target_core_transport.c | 48 ++++
drivers/vhost/scsi.c | 132 ++++++++---
include/linux/percpu_ida.h | 59 +++++
include/target/iscsi/iscsi_transport.h | 8 +-
include/target/target_core_base.h | 5 +
include/target/target_core_fabric.h | 3 +
lib/Makefile | 5 +-
lib/percpu_ida.c | 335 ++++++++++++++++++++++++++
16 files changed, 652 insertions(+), 150 deletions(-)
create mode 100644 include/linux/percpu_ida.h
create mode 100644 lib/percpu_ida.c

--
1.7.2.5


2013-08-31 03:08:57

by Nicholas A. Bellinger

Subject: [PATCH-v5 5/6] iscsi/iser-target: Convert to command priv_size usage

From: Nicholas Bellinger <[email protected]>

This patch converts iscsi/iser-target to use allocations based on
iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
using an embedded isert_cmd->iscsi_cmd.

This includes removing iscsit_transport->iscsit_alloc_cmd() usage,
along with updating isert-target code to use iscsit_priv_cmd().

Also, replace the left-over iscsi_cmd->release_cmd() usage with
direct calls to iscsit_release_cmd(), and drop the now unused
lio_cmd_cache and isert_cmd_cache.
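
The net effect is that struct iscsi_cmd and the fabric private area
live in a single allocation, with the private area starting immediately
past the iscsi_cmd. A condensed sketch of the scheme (taken from the
hunks below):

    /* iscsit_allocate_cmd() sizes one allocation for both structs */
    cmd = kzalloc(sizeof(struct iscsi_cmd) +
                  conn->conn_transport->priv_size, gfp_mask);

    /* iscsit_priv_cmd() returns the memory just past the iscsi_cmd */
    static inline void *iscsit_priv_cmd(struct iscsi_cmd *cmd)
    {
            return (void *)(cmd + 1);
    }

    /* so isert recovers its descriptor without container_of() */
    struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
    isert_cmd->iscsi_cmd = cmd;    /* back-pointer replaces embedding */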

Cc: Or Gerlitz <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/infiniband/ulp/isert/ib_isert.c | 114 +++++++++-----------------
drivers/infiniband/ulp/isert/ib_isert.h | 2 +-
drivers/target/iscsi/iscsi_target.c | 16 +----
drivers/target/iscsi/iscsi_target.h | 1 -
drivers/target/iscsi/iscsi_target_configfs.c | 2 +-
drivers/target/iscsi/iscsi_target_core.h | 1 -
drivers/target/iscsi/iscsi_target_util.c | 20 +----
include/target/iscsi/iscsi_transport.h | 8 ++-
8 files changed, 55 insertions(+), 109 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index d17ff13..a535b22 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -39,7 +39,6 @@ static DEFINE_MUTEX(device_list_mutex);
static LIST_HEAD(device_list);
static struct workqueue_struct *isert_rx_wq;
static struct workqueue_struct *isert_comp_wq;
-static struct kmem_cache *isert_cmd_cache;

static void
isert_qp_event_callback(struct ib_event *e, void *context)
@@ -879,43 +878,30 @@ isert_rx_login_req(struct iser_rx_desc *rx_desc, int rx_buflen,
schedule_delayed_work(&conn->login_work, 0);
}

-static void
-isert_release_cmd(struct iscsi_cmd *cmd)
-{
- struct isert_cmd *isert_cmd = container_of(cmd, struct isert_cmd,
- iscsi_cmd);
-
- pr_debug("Entering isert_release_cmd %p >>>>>>>>>>>>>>>.\n", isert_cmd);
-
- kfree(cmd->buf_ptr);
- kfree(cmd->tmr_req);
-
- kmem_cache_free(isert_cmd_cache, isert_cmd);
-}
-
static struct iscsi_cmd
-*isert_alloc_cmd(struct iscsi_conn *conn, gfp_t gfp)
+*isert_allocate_cmd(struct iscsi_conn *conn, gfp_t gfp)
{
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct isert_cmd *isert_cmd;
+ struct iscsi_cmd *cmd;

- isert_cmd = kmem_cache_zalloc(isert_cmd_cache, gfp);
- if (!isert_cmd) {
- pr_err("Unable to allocate isert_cmd\n");
+ cmd = iscsit_allocate_cmd(conn, gfp);
+ if (!cmd) {
+ pr_err("Unable to allocate iscsi_cmd + isert_cmd\n");
return NULL;
}
+ isert_cmd = iscsit_priv_cmd(cmd);
isert_cmd->conn = isert_conn;
- isert_cmd->iscsi_cmd.release_cmd = &isert_release_cmd;
+ isert_cmd->iscsi_cmd = cmd;

- return &isert_cmd->iscsi_cmd;
+ return cmd;
}

static int
isert_handle_scsi_cmd(struct isert_conn *isert_conn,
- struct isert_cmd *isert_cmd, struct iser_rx_desc *rx_desc,
- unsigned char *buf)
+ struct isert_cmd *isert_cmd, struct iscsi_cmd *cmd,
+ struct iser_rx_desc *rx_desc, unsigned char *buf)
{
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
struct iscsi_conn *conn = isert_conn->conn;
struct iscsi_scsi_req *hdr = (struct iscsi_scsi_req *)buf;
struct scatterlist *sg;
@@ -1022,9 +1008,9 @@ isert_handle_iscsi_dataout(struct isert_conn *isert_conn,

static int
isert_handle_nop_out(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
- struct iser_rx_desc *rx_desc, unsigned char *buf)
+ struct iscsi_cmd *cmd, struct iser_rx_desc *rx_desc,
+ unsigned char *buf)
{
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
struct iscsi_conn *conn = isert_conn->conn;
struct iscsi_nopout *hdr = (struct iscsi_nopout *)buf;
int rc;
@@ -1041,9 +1027,9 @@ isert_handle_nop_out(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,

static int
isert_handle_text_cmd(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
- struct iser_rx_desc *rx_desc, struct iscsi_text *hdr)
+ struct iscsi_cmd *cmd, struct iser_rx_desc *rx_desc,
+ struct iscsi_text *hdr)
{
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
struct iscsi_conn *conn = isert_conn->conn;
u32 payload_length = ntoh24(hdr->dlength);
int rc;
@@ -1088,26 +1074,26 @@ isert_rx_opcode(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc,

switch (opcode) {
case ISCSI_OP_SCSI_CMD:
- cmd = iscsit_allocate_cmd(conn, GFP_KERNEL);
+ cmd = isert_allocate_cmd(conn, GFP_KERNEL);
if (!cmd)
break;

- isert_cmd = container_of(cmd, struct isert_cmd, iscsi_cmd);
+ isert_cmd = iscsit_priv_cmd(cmd);
isert_cmd->read_stag = read_stag;
isert_cmd->read_va = read_va;
isert_cmd->write_stag = write_stag;
isert_cmd->write_va = write_va;

- ret = isert_handle_scsi_cmd(isert_conn, isert_cmd,
+ ret = isert_handle_scsi_cmd(isert_conn, isert_cmd, cmd,
rx_desc, (unsigned char *)hdr);
break;
case ISCSI_OP_NOOP_OUT:
- cmd = iscsit_allocate_cmd(conn, GFP_KERNEL);
+ cmd = isert_allocate_cmd(conn, GFP_KERNEL);
if (!cmd)
break;

- isert_cmd = container_of(cmd, struct isert_cmd, iscsi_cmd);
- ret = isert_handle_nop_out(isert_conn, isert_cmd,
+ isert_cmd = iscsit_priv_cmd(cmd);
+ ret = isert_handle_nop_out(isert_conn, isert_cmd, cmd,
rx_desc, (unsigned char *)hdr);
break;
case ISCSI_OP_SCSI_DATA_OUT:
@@ -1115,7 +1101,7 @@ isert_rx_opcode(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc,
(unsigned char *)hdr);
break;
case ISCSI_OP_SCSI_TMFUNC:
- cmd = iscsit_allocate_cmd(conn, GFP_KERNEL);
+ cmd = isert_allocate_cmd(conn, GFP_KERNEL);
if (!cmd)
break;

@@ -1123,7 +1109,7 @@ isert_rx_opcode(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc,
(unsigned char *)hdr);
break;
case ISCSI_OP_LOGOUT:
- cmd = iscsit_allocate_cmd(conn, GFP_KERNEL);
+ cmd = isert_allocate_cmd(conn, GFP_KERNEL);
if (!cmd)
break;

@@ -1134,12 +1120,12 @@ isert_rx_opcode(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc,
HZ);
break;
case ISCSI_OP_TEXT:
- cmd = iscsit_allocate_cmd(conn, GFP_KERNEL);
+ cmd = isert_allocate_cmd(conn, GFP_KERNEL);
if (!cmd)
break;

- isert_cmd = container_of(cmd, struct isert_cmd, iscsi_cmd);
- ret = isert_handle_text_cmd(isert_conn, isert_cmd,
+ isert_cmd = iscsit_priv_cmd(cmd);
+ ret = isert_handle_text_cmd(isert_conn, isert_cmd, cmd,
rx_desc, (struct iscsi_text *)hdr);
break;
default:
@@ -1267,7 +1253,7 @@ isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
static void
isert_put_cmd(struct isert_cmd *isert_cmd)
{
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
+ struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
struct isert_conn *isert_conn = isert_cmd->conn;
struct iscsi_conn *conn = isert_conn->conn;

@@ -1318,7 +1304,7 @@ isert_put_cmd(struct isert_cmd *isert_cmd)
* Fall-through
*/
default:
- isert_release_cmd(cmd);
+ iscsit_release_cmd(cmd);
break;
}
}
@@ -1354,7 +1340,7 @@ isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
struct isert_cmd *isert_cmd)
{
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
+ struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
struct se_cmd *se_cmd = &cmd->se_cmd;
struct ib_device *ib_dev = isert_cmd->conn->conn_cm_id->device;

@@ -1390,7 +1376,7 @@ isert_do_control_comp(struct work_struct *work)
struct isert_cmd, comp_work);
struct isert_conn *isert_conn = isert_cmd->conn;
struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
+ struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;

switch (cmd->i_state) {
case ISTATE_SEND_TASKMGTRSP:
@@ -1436,7 +1422,7 @@ isert_response_completion(struct iser_tx_desc *tx_desc,
struct isert_conn *isert_conn,
struct ib_device *ib_dev)
{
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
+ struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;

if (cmd->i_state == ISTATE_SEND_TASKMGTRSP ||
cmd->i_state == ISTATE_SEND_LOGOUTRSP ||
@@ -1628,8 +1614,7 @@ isert_post_response(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd)
static int
isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
{
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;
struct iscsi_scsi_rsp *hdr = (struct iscsi_scsi_rsp *)
@@ -1678,8 +1663,7 @@ static int
isert_put_nopin(struct iscsi_cmd *cmd, struct iscsi_conn *conn,
bool nopout_response)
{
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;

@@ -1698,8 +1682,7 @@ isert_put_nopin(struct iscsi_cmd *cmd, struct iscsi_conn *conn,
static int
isert_put_logout_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
{
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;

@@ -1717,8 +1700,7 @@ isert_put_logout_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
static int
isert_put_tm_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
{
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;

@@ -1736,8 +1718,7 @@ isert_put_tm_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
static int
isert_put_reject(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
{
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;
struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
@@ -1769,8 +1750,7 @@ isert_put_reject(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
static int
isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
{
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *send_wr = &isert_cmd->tx_desc.send_wr;
struct iscsi_text_rsp *hdr =
@@ -1812,7 +1792,7 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
struct ib_sge *ib_sge, struct ib_send_wr *send_wr,
u32 data_left, u32 offset)
{
- struct iscsi_cmd *cmd = &isert_cmd->iscsi_cmd;
+ struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
struct scatterlist *sg_start, *tmp_sg;
struct ib_device *ib_dev = isert_conn->conn_cm_id->device;
u32 sg_off, page_off;
@@ -1857,8 +1837,7 @@ static int
isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
{
struct se_cmd *se_cmd = &cmd->se_cmd;
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *wr_failed, *send_wr;
@@ -1961,8 +1940,7 @@ static int
isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
{
struct se_cmd *se_cmd = &cmd->se_cmd;
- struct isert_cmd *isert_cmd = container_of(cmd,
- struct isert_cmd, iscsi_cmd);
+ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
struct isert_conn *isert_conn = (struct isert_conn *)conn->context;
struct ib_send_wr *wr_failed, *send_wr;
@@ -2408,12 +2386,12 @@ static void isert_free_conn(struct iscsi_conn *conn)
static struct iscsit_transport iser_target_transport = {
.name = "IB/iSER",
.transport_type = ISCSI_INFINIBAND,
+ .priv_size = sizeof(struct isert_cmd),
.owner = THIS_MODULE,
.iscsit_setup_np = isert_setup_np,
.iscsit_accept_np = isert_accept_np,
.iscsit_free_np = isert_free_np,
.iscsit_free_conn = isert_free_conn,
- .iscsit_alloc_cmd = isert_alloc_cmd,
.iscsit_get_login_rx = isert_get_login_rx,
.iscsit_put_login_tx = isert_put_login_tx,
.iscsit_immediate_queue = isert_immediate_queue,
@@ -2440,21 +2418,10 @@ static int __init isert_init(void)
goto destroy_rx_wq;
}

- isert_cmd_cache = kmem_cache_create("isert_cmd_cache",
- sizeof(struct isert_cmd), __alignof__(struct isert_cmd),
- 0, NULL);
- if (!isert_cmd_cache) {
- pr_err("Unable to create isert_cmd_cache\n");
- ret = -ENOMEM;
- goto destroy_tx_cq;
- }
-
iscsit_register_transport(&iser_target_transport);
pr_debug("iSER_TARGET[0] - Loaded iser_target_transport\n");
return 0;

-destroy_tx_cq:
- destroy_workqueue(isert_comp_wq);
destroy_rx_wq:
destroy_workqueue(isert_rx_wq);
return ret;
@@ -2462,7 +2429,6 @@ destroy_rx_wq:

static void __exit isert_exit(void)
{
- kmem_cache_destroy(isert_cmd_cache);
destroy_workqueue(isert_comp_wq);
destroy_workqueue(isert_rx_wq);
iscsit_unregister_transport(&iser_target_transport);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 191117b..0d45945 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -67,7 +67,7 @@ struct isert_cmd {
u32 write_va_off;
u32 rdma_wr_num;
struct isert_conn *conn;
- struct iscsi_cmd iscsi_cmd;
+ struct iscsi_cmd *iscsi_cmd;
struct ib_sge *ib_sge;
struct iser_tx_desc tx_desc;
struct isert_rdma_wr rdma_wr;
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 7228cc7..92f2776 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -63,7 +63,6 @@ spinlock_t sess_idr_lock;

struct iscsit_global *iscsit_global;

-struct kmem_cache *lio_cmd_cache;
struct kmem_cache *lio_qr_cache;
struct kmem_cache *lio_dr_cache;
struct kmem_cache *lio_ooo_cache;
@@ -500,7 +499,6 @@ static struct iscsit_transport iscsi_target_transport = {
.iscsit_setup_np = iscsit_setup_np,
.iscsit_accept_np = iscsit_accept_np,
.iscsit_free_np = iscsit_free_np,
- .iscsit_alloc_cmd = iscsit_alloc_cmd,
.iscsit_get_login_rx = iscsit_get_login_rx,
.iscsit_put_login_tx = iscsit_put_login_tx,
.iscsit_get_dataout = iscsit_build_r2ts_for_cmd,
@@ -541,22 +539,13 @@ static int __init iscsi_target_init_module(void)
goto ts_out1;
}

- lio_cmd_cache = kmem_cache_create("lio_cmd_cache",
- sizeof(struct iscsi_cmd), __alignof__(struct iscsi_cmd),
- 0, NULL);
- if (!lio_cmd_cache) {
- pr_err("Unable to kmem_cache_create() for"
- " lio_cmd_cache\n");
- goto ts_out2;
- }
-
lio_qr_cache = kmem_cache_create("lio_qr_cache",
sizeof(struct iscsi_queue_req),
__alignof__(struct iscsi_queue_req), 0, NULL);
if (!lio_qr_cache) {
pr_err("nable to kmem_cache_create() for"
" lio_qr_cache\n");
- goto cmd_out;
+ goto ts_out2;
}

lio_dr_cache = kmem_cache_create("lio_dr_cache",
@@ -600,8 +589,6 @@ dr_out:
kmem_cache_destroy(lio_dr_cache);
qr_out:
kmem_cache_destroy(lio_qr_cache);
-cmd_out:
- kmem_cache_destroy(lio_cmd_cache);
ts_out2:
iscsi_deallocate_thread_sets();
ts_out1:
@@ -619,7 +606,6 @@ static void __exit iscsi_target_cleanup_module(void)
iscsi_thread_set_free();
iscsit_release_discovery_tpg();
iscsit_unregister_transport(&iscsi_target_transport);
- kmem_cache_destroy(lio_cmd_cache);
kmem_cache_destroy(lio_qr_cache);
kmem_cache_destroy(lio_dr_cache);
kmem_cache_destroy(lio_ooo_cache);
diff --git a/drivers/target/iscsi/iscsi_target.h b/drivers/target/iscsi/iscsi_target.h
index f82f627..e936d56 100644
--- a/drivers/target/iscsi/iscsi_target.h
+++ b/drivers/target/iscsi/iscsi_target.h
@@ -39,7 +39,6 @@ extern struct target_fabric_configfs *lio_target_fabric_configfs;

extern struct kmem_cache *lio_dr_cache;
extern struct kmem_cache *lio_ooo_cache;
-extern struct kmem_cache *lio_cmd_cache;
extern struct kmem_cache *lio_qr_cache;
extern struct kmem_cache *lio_r2t_cache;

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
index ddfa32c..116114f 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1925,7 +1925,7 @@ static void lio_release_cmd(struct se_cmd *se_cmd)
struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd);

pr_debug("Entering lio_release_cmd for se_cmd: %p\n", se_cmd);
- cmd->release_cmd(cmd);
+ iscsit_release_cmd(cmd);
}

/* End functions for target_core_fabric_ops */
diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
index 8a4c32d..5e1a068 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -489,7 +489,6 @@ struct iscsi_cmd {
u32 first_data_sg_off;
u32 kmapped_nents;
sense_reason_t sense_reason;
- void (*release_cmd)(struct iscsi_cmd *);
} ____cacheline_aligned;

struct iscsi_tmr_req {
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 1df06d5..5784cad 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -149,18 +149,6 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
spin_unlock_bh(&cmd->r2t_lock);
}

-struct iscsi_cmd *iscsit_alloc_cmd(struct iscsi_conn *conn, gfp_t gfp_mask)
-{
- struct iscsi_cmd *cmd;
-
- cmd = kmem_cache_zalloc(lio_cmd_cache, gfp_mask);
- if (!cmd)
- return NULL;
-
- cmd->release_cmd = &iscsit_release_cmd;
- return cmd;
-}
-
/*
* May be called from software interrupt (timer) context for allocating
* iSCSI NopINs.
@@ -168,8 +156,9 @@ struct iscsi_cmd *iscsit_alloc_cmd(struct iscsi_conn *conn, gfp_t gfp_mask)
struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, gfp_t gfp_mask)
{
struct iscsi_cmd *cmd;
+ int priv_size = conn->conn_transport->priv_size;

- cmd = conn->conn_transport->iscsit_alloc_cmd(conn, gfp_mask);
+ cmd = kzalloc(sizeof(struct iscsi_cmd) + priv_size, gfp_mask);
if (!cmd) {
pr_err("Unable to allocate memory for struct iscsi_cmd.\n");
return NULL;
@@ -696,8 +685,9 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
kfree(cmd->iov_data);
kfree(cmd->text_in_ptr);

- kmem_cache_free(lio_cmd_cache, cmd);
+ kfree(cmd);
}
+EXPORT_SYMBOL(iscsit_release_cmd);

static void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool scsi_cmd,
bool check_queues)
@@ -761,7 +751,7 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
/* Fall-through */
default:
__iscsit_free_cmd(cmd, false, shutdown);
- cmd->release_cmd(cmd);
+ iscsit_release_cmd(cmd);
break;
}
}
diff --git a/include/target/iscsi/iscsi_transport.h b/include/target/iscsi/iscsi_transport.h
index e5d09d2..a12589c 100644
--- a/include/target/iscsi/iscsi_transport.h
+++ b/include/target/iscsi/iscsi_transport.h
@@ -6,13 +6,13 @@ struct iscsit_transport {
#define ISCSIT_TRANSPORT_NAME 16
char name[ISCSIT_TRANSPORT_NAME];
int transport_type;
+ int priv_size;
struct module *owner;
struct list_head t_node;
int (*iscsit_setup_np)(struct iscsi_np *, struct __kernel_sockaddr_storage *);
int (*iscsit_accept_np)(struct iscsi_np *, struct iscsi_conn *);
void (*iscsit_free_np)(struct iscsi_np *);
void (*iscsit_free_conn)(struct iscsi_conn *);
- struct iscsi_cmd *(*iscsit_alloc_cmd)(struct iscsi_conn *, gfp_t);
int (*iscsit_get_login_rx)(struct iscsi_conn *, struct iscsi_login *);
int (*iscsit_put_login_tx)(struct iscsi_conn *, struct iscsi_login *, u32);
int (*iscsit_immediate_queue)(struct iscsi_conn *, struct iscsi_cmd *, int);
@@ -22,6 +22,11 @@ struct iscsit_transport {
int (*iscsit_queue_status)(struct iscsi_conn *, struct iscsi_cmd *);
};

+static inline void *iscsit_priv_cmd(struct iscsi_cmd *cmd)
+{
+ return (void *)(cmd + 1);
+}
+
/*
* From iscsi_target_transport.c
*/
@@ -92,3 +97,4 @@ extern int iscsit_tmr_post_handler(struct iscsi_cmd *, struct iscsi_conn *);
extern struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *, gfp_t);
extern int iscsit_sequence_cmd(struct iscsi_conn *, struct iscsi_cmd *,
unsigned char *, __be32);
+extern void iscsit_release_cmd(struct iscsi_cmd *);
--
1.7.2.5

2013-08-31 03:08:55

by Nicholas A. Bellinger

Subject: [PATCH-v5 1/6] idr: Percpu ida

From: Kent Overstreet <[email protected]>

Percpu frontend for allocating ids. With percpu allocation (that works),
it's impossible to guarantee it will always be possible to allocate all
nr_tags - typically, some will be stuck on a remote percpu freelist
where the current job can't get to them.

We do guarantee that it will always be possible to allocate at least
(nr_tags / 2) tags - this is done by keeping track of which and how many
cpus have tags on their percpu freelists. On allocation failure, if
enough cpus have tags that there could potentially be (nr_tags / 2) tags
stuck on remote percpu freelists, we pick a remote cpu at random to
steal from.

Note that there's no cpu hotplug notifier - we don't care, because
steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
more allocations if we had a notifier - but we'll still meet our
guarantees and it's absolutely not a correctness issue, so I don't think
it's worth the extra code.
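
For reference, a minimal usage sketch of the API this patch exports (a
hypothetical caller, not part of the patch itself):

    #include <linux/percpu_ida.h>

    static struct percpu_ida pool;

    static int example_init(void)
    {
            /* one-time setup: tags 0..127 become available */
            return percpu_ida_init(&pool, 128);
    }

    static void example_use(void)
    {
            /* fast path hits this cpu's freelist; GFP_KERNEL includes
             * __GFP_WAIT, so this sleeps until a tag is freed rather
             * than return -ENOSPC */
            int tag = percpu_ida_alloc(&pool, GFP_KERNEL);

            /* push the tag back on this cpu's freelist, wake sleepers */
            percpu_ida_free(&pool, tag);
    }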

v5 changes:
- Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
- Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
- Convert steal_tags() to use cpumask_weight() + cpumask_next() +
cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
- Add comment for alloc_global_tags() (kmo + akpm)
- Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
- Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
- Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
(kmo + akpm)
- Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
(kmo + akpm)
- Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
- Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)

v4 changes:

- Fix tags.c reference in percpu_ida_init (akpm)

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Nicholas A. Bellinger" <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
include/linux/percpu_ida.h | 59 ++++++++
lib/Makefile | 5 +-
lib/percpu_ida.c | 335 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 397 insertions(+), 2 deletions(-)
create mode 100644 include/linux/percpu_ida.h
create mode 100644 lib/percpu_ida.c

diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
new file mode 100644
index 0000000..8a39cdc
--- /dev/null
+++ b/include/linux/percpu_ida.h
@@ -0,0 +1,59 @@
+#ifndef __PERCPU_IDA_H__
+#define __PERCPU_IDA_H__
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/spinlock_types.h>
+#include <linux/wait.h>
+
+struct percpu_ida_cpu;
+
+struct percpu_ida {
+ /*
+ * number of tags available to be allocated, as passed to
+ * percpu_ida_init()
+ */
+ unsigned nr_tags;
+
+ struct percpu_ida_cpu __percpu *tag_cpu;
+
+ /*
+ * Bitmap of cpus that (may) have tags on their percpu freelists:
+ * steal_tags() uses this to decide when to steal tags, and which cpus
+ * to try stealing from.
+ *
+ * It's ok for a freelist to be empty when its bit is set - steal_tags()
+ * will just keep looking - but the bitmap _must_ be set whenever a
+ * percpu freelist does have tags.
+ */
+ cpumask_t cpus_have_tags;
+
+ struct {
+ spinlock_t lock;
+ /*
+ * When we go to steal tags from another cpu (see steal_tags()),
+ * we want to pick a cpu at random. Cycling through them every
+ * time we steal is a bit easier and more or less equivalent:
+ */
+ unsigned cpu_last_stolen;
+
+ /* For sleeping on allocation failure */
+ wait_queue_head_t wait;
+
+ /*
+ * Global freelist - it's a stack where nr_free points to the
+ * top
+ */
+ unsigned nr_free;
+ unsigned *freelist;
+ } ____cacheline_aligned_in_smp;
+};
+
+int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp);
+void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
+
+void percpu_ida_destroy(struct percpu_ida *pool);
+int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags);
+
+#endif /* __PERCPU_IDA_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index 7baccfd..1cb8356 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
- earlycpio.o percpu-refcount.o
+ earlycpio.o percpu-refcount.o percpu_ida.o

obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
lib-$(CONFIG_MMU) += ioremap.o
@@ -24,7 +24,8 @@ lib-y += kobject.o klist.o
obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
- bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o
+ bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+ percpu_ida.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
obj-y += kstrtox.o
diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
new file mode 100644
index 0000000..3ef70e8
--- /dev/null
+++ b/lib/percpu_ida.c
@@ -0,0 +1,335 @@
+/*
+ * Percpu IDA library
+ *
+ * Copyright (C) 2013 Datera, Inc. Kent Overstreet
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/hardirq.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/spinlock.h>
+#include <linux/percpu_ida.h>
+
+/*
+ * Number of tags we move between the percpu freelist and the global freelist at
+ * a time
+ */
+#define IDA_PCPU_BATCH_MOVE 32U
+
+/* Max size of percpu freelist, */
+#define IDA_PCPU_SIZE ((IDA_PCPU_BATCH_MOVE * 3) / 2)
+
+struct percpu_ida_cpu {
+ /*
+ * Even though this is percpu, we need a lock for tag stealing by remote
+ * CPUs:
+ */
+ spinlock_t lock;
+
+ /* nr_free/freelist form a stack of free IDs */
+ unsigned nr_free;
+ unsigned freelist[];
+};
+
+static inline void move_tags(unsigned *dst, unsigned *dst_nr,
+ unsigned *src, unsigned *src_nr,
+ unsigned nr)
+{
+ *src_nr -= nr;
+ memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
+ *dst_nr += nr;
+}
+
+/*
+ * Try to steal tags from a remote cpu's percpu freelist.
+ *
+ * We first check how many percpu freelists have tags - we don't steal tags
+ * unless enough percpu freelists have tags on them that it's possible more than
+ * half the total tags could be stuck on remote percpu freelists.
+ *
+ * Then we iterate through the cpus until we find some tags - we don't attempt
+ * to find the "best" cpu to steal from, to keep cacheline bouncing to a
+ * minimum.
+ */
+static inline void steal_tags(struct percpu_ida *pool,
+ struct percpu_ida_cpu *tags)
+{
+ unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
+ struct percpu_ida_cpu *remote;
+
+ for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
+ cpus_have_tags * IDA_PCPU_SIZE > pool->nr_tags / 2;
+ cpus_have_tags--) {
+ cpu = cpumask_next(cpu, &pool->cpus_have_tags);
+
+ if (cpu >= nr_cpu_ids)
+ cpu = cpumask_first(&pool->cpus_have_tags);
+
+ if (cpu >= nr_cpu_ids)
+ BUG();
+
+ pool->cpu_last_stolen = cpu;
+ remote = per_cpu_ptr(pool->tag_cpu, cpu);
+
+ cpumask_clear_cpu(cpu, &pool->cpus_have_tags);
+
+ if (remote == tags)
+ continue;
+
+ spin_lock(&remote->lock);
+
+ if (remote->nr_free) {
+ memcpy(tags->freelist,
+ remote->freelist,
+ sizeof(unsigned) * remote->nr_free);
+
+ tags->nr_free = remote->nr_free;
+ remote->nr_free = 0;
+ }
+
+ spin_unlock(&remote->lock);
+
+ if (tags->nr_free)
+ break;
+ }
+}
+
+/*
+ * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them onto
+ * our percpu freelist:
+ */
+static inline void alloc_global_tags(struct percpu_ida *pool,
+ struct percpu_ida_cpu *tags)
+{
+ move_tags(tags->freelist, &tags->nr_free,
+ pool->freelist, &pool->nr_free,
+ min(pool->nr_free, IDA_PCPU_BATCH_MOVE));
+}
+
+static inline unsigned alloc_local_tag(struct percpu_ida *pool,
+ struct percpu_ida_cpu *tags)
+{
+ int tag = -ENOSPC;
+
+ spin_lock(&tags->lock);
+ if (tags->nr_free)
+ tag = tags->freelist[--tags->nr_free];
+ spin_unlock(&tags->lock);
+
+ return tag;
+}
+
+/**
+ * percpu_ida_alloc - allocate a tag
+ * @pool: pool to allocate from
+ * @gfp: gfp flags
+ *
+ * Returns a tag - an integer in the range [0..nr_tags) (passed to
+ * tag_pool_init()), or otherwise -ENOSPC on allocation failure.
+ *
+ * Safe to be called from interrupt context (assuming it isn't passed
+ * __GFP_WAIT, of course).
+ *
+ * @gfp indicates whether or not to wait until a free id is available (it's not
+ * used for internal memory allocations); thus if passed __GFP_WAIT we may sleep
+ * however long it takes until another thread frees an id (same semantics as a
+ * mempool).
+ *
+ * Will not fail if passed __GFP_WAIT.
+ */
+int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
+{
+ DEFINE_WAIT(wait);
+ struct percpu_ida_cpu *tags;
+ unsigned long flags;
+ int tag;
+
+ local_irq_save(flags);
+ tags = this_cpu_ptr(pool->tag_cpu);
+
+ /* Fastpath */
+ tag = alloc_local_tag(pool, tags);
+ if (likely(tag >= 0)) {
+ local_irq_restore(flags);
+ return tag;
+ }
+
+ while (1) {
+ spin_lock(&pool->lock);
+
+ /*
+ * prepare_to_wait() must come before steal_tags(), in case
+ * percpu_ida_free() on another cpu flips a bit in
+ * cpus_have_tags
+ *
+ * global lock held and irqs disabled, don't need percpu lock
+ */
+ prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
+
+ if (!tags->nr_free)
+ alloc_global_tags(pool, tags);
+ if (!tags->nr_free)
+ steal_tags(pool, tags);
+
+ if (tags->nr_free) {
+ tag = tags->freelist[--tags->nr_free];
+ if (tags->nr_free)
+ cpumask_set_cpu(smp_processor_id(),
+ &pool->cpus_have_tags);
+ }
+
+ spin_unlock(&pool->lock);
+ local_irq_restore(flags);
+
+ if (tag >= 0 || !(gfp & __GFP_WAIT))
+ break;
+
+ schedule();
+
+ local_irq_save(flags);
+ tags = this_cpu_ptr(pool->tag_cpu);
+ }
+
+ finish_wait(&pool->wait, &wait);
+ return tag;
+}
+EXPORT_SYMBOL_GPL(percpu_ida_alloc);
+
+/**
+ * percpu_ida_free - free a tag
+ * @pool: pool @tag was allocated from
+ * @tag: a tag previously allocated with percpu_ida_alloc()
+ *
+ * Safe to be called from interrupt context.
+ */
+void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
+{
+ struct percpu_ida_cpu *tags;
+ unsigned long flags;
+ unsigned nr_free;
+
+ BUG_ON(tag >= pool->nr_tags);
+
+ local_irq_save(flags);
+ tags = this_cpu_ptr(pool->tag_cpu);
+
+ spin_lock(&tags->lock);
+ tags->freelist[tags->nr_free++] = tag;
+
+ nr_free = tags->nr_free;
+ spin_unlock(&tags->lock);
+
+ if (nr_free == 1) {
+ cpumask_set_cpu(smp_processor_id(),
+ &pool->cpus_have_tags);
+ wake_up(&pool->wait);
+ }
+
+ if (nr_free == IDA_PCPU_SIZE) {
+ spin_lock(&pool->lock);
+
+ /*
+ * Global lock held and irqs disabled, don't need percpu
+ * lock
+ */
+ if (tags->nr_free == IDA_PCPU_SIZE) {
+ move_tags(pool->freelist, &pool->nr_free,
+ tags->freelist, &tags->nr_free,
+ IDA_PCPU_BATCH_MOVE);
+
+ wake_up(&pool->wait);
+ }
+ spin_unlock(&pool->lock);
+ }
+
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(percpu_ida_free);
+
+/**
+ * percpu_ida_destroy - release a tag pool's resources
+ * @pool: pool to free
+ *
+ * Frees the resources allocated by percpu_ida_init().
+ */
+void percpu_ida_destroy(struct percpu_ida *pool)
+{
+ free_percpu(pool->tag_cpu);
+ free_pages((unsigned long) pool->freelist,
+ get_order(pool->nr_tags * sizeof(unsigned)));
+}
+EXPORT_SYMBOL_GPL(percpu_ida_destroy);
+
+/**
+ * percpu_ida_init - initialize a percpu tag pool
+ * @pool: pool to initialize
+ * @nr_tags: number of tags that will be available for allocation
+ *
+ * Initializes @pool so that it can be used to allocate tags - integers in the
+ * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
+ * preallocated array of tag structures.
+ *
+ * Allocation is percpu, but sharding is limited by nr_tags - for best
+ * performance, the workload should not span more cpus than nr_tags / 128.
+ */
+int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
+{
+ unsigned i, cpu, order;
+
+ memset(pool, 0, sizeof(*pool));
+
+ init_waitqueue_head(&pool->wait);
+ spin_lock_init(&pool->lock);
+ pool->nr_tags = nr_tags;
+
+ /* Guard against overflow */
+ if (nr_tags > (unsigned) INT_MAX + 1) {
+ pr_err("percpu_ida_init(): nr_tags too large\n");
+ return -EINVAL;
+ }
+
+ order = get_order(nr_tags * sizeof(unsigned));
+ pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
+ if (!pool->freelist)
+ return -ENOMEM;
+
+ for (i = 0; i < nr_tags; i++)
+ pool->freelist[i] = i;
+
+ pool->nr_free = nr_tags;
+
+ pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
+ IDA_PCPU_SIZE * sizeof(unsigned),
+ sizeof(unsigned));
+ if (!pool->tag_cpu)
+ goto err;
+
+ for_each_possible_cpu(cpu)
+ spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
+
+ return 0;
+err:
+ percpu_ida_destroy(pool);
+ return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(percpu_ida_init);
--
1.7.2.5

2013-08-31 03:09:15

by Nicholas A. Bellinger

Subject: [PATCH-v5 6/6] iscsi-target: Convert to per-cpu ida_alloc + ida_free command map

From: Nicholas Bellinger <[email protected]>

This patch changes iscsi-target to use transport_alloc_session_tags()
pre-allocation logic for per-cpu session tag pooling, with internal
percpu_ida_alloc() + percpu_ida_free() calls based upon the saved
se_cmd->map_tag id.

This includes tag pool setup based upon per NodeACL queue_depth after
locating se_node_acl in iscsi_target_locate_portal().

Also update iscsit_allocate_cmd() and iscsit_release_cmd() to use
percpu_ida_alloc() and percpu_ida_free() respectively.
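
To make the sizing concrete, a worked sketch of the logic added to
iscsi_target_locate_portal() (assuming a NodeACL queue_depth of 32 and
the iser transport):

    tag_num  = max_t(u32, ISCSIT_MIN_TAGS, queue_depth); /* max(16, 32) = 32 */
    tag_num += ISCSIT_EXTRA_TAGS;                        /* + 8 immediate = 40 */
    tag_size = sizeof(struct iscsi_cmd) +
               conn->conn_transport->priv_size;  /* + sizeof(struct isert_cmd) */

    /* 40 tag_size-byte descriptors carved out of one sess_cmd_map */
    ret = transport_alloc_session_tags(sess->se_sess, tag_num, tag_size);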

v5 changes:
- Convert to percpu_ida.h include

v2 changes:
- Fix bug with SessionType=Discovery in iscsi_target_locate_portal()

Cc: Or Gerlitz <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/target/iscsi/iscsi_target_core.h | 2 ++
drivers/target/iscsi/iscsi_target_nego.c | 28 ++++++++++++++++++++++++----
drivers/target/iscsi/iscsi_target_util.c | 27 ++++++++++++++++++++-------
3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_core.h b/drivers/target/iscsi/iscsi_target_core.h
index 5e1a068..e37b3b0 100644
--- a/drivers/target/iscsi/iscsi_target_core.h
+++ b/drivers/target/iscsi/iscsi_target_core.h
@@ -17,6 +17,8 @@
#define SECONDS_FOR_ASYNC_TEXT 10
#define SECONDS_FOR_LOGOUT_COMP 15
#define WHITE_SPACE " \t\v\f\n\r"
+#define ISCSIT_MIN_TAGS 16
+#define ISCSIT_EXTRA_TAGS 8

/* struct iscsi_node_attrib sanity values */
#define NA_DATAOUT_TIMEOUT 3
diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index daebe32..582b2db 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -876,8 +876,9 @@ int iscsi_target_locate_portal(
struct iscsi_tiqn *tiqn;
struct iscsi_tpg_np *tpg_np = NULL;
struct iscsi_login_req *login_req;
- u32 payload_length;
- int sessiontype = 0, ret = 0;
+ struct se_node_acl *se_nacl;
+ u32 payload_length, queue_depth = 0;
+ int sessiontype = 0, ret = 0, tag_num, tag_size;

login->np = np;

@@ -973,7 +974,7 @@ int iscsi_target_locate_portal(
goto out;
}
ret = 0;
- goto out;
+ goto alloc_tags;
}

get_target:
@@ -1070,8 +1071,27 @@ get_target:
ret = -1;
goto out;
}
+ se_nacl = sess->se_sess->se_node_acl;
+ queue_depth = se_nacl->queue_depth;
+ /*
+ * Setup pre-allocated tags based upon allowed per NodeACL CmdSN
+ * depth for non immediate commands, plus extra tags for immediate
+ * commands.
+ *
+ * Also enforce a ISCSIT_MIN_TAGS to prevent unnecessary contention
+ * in per-cpu-ida tag allocation logic + small queue_depth.
+ */
+alloc_tags:
+ tag_num = max_t(u32, ISCSIT_MIN_TAGS, queue_depth);
+ tag_num += ISCSIT_EXTRA_TAGS;
+ tag_size = sizeof(struct iscsi_cmd) + conn->conn_transport->priv_size;

- ret = 0;
+ ret = transport_alloc_session_tags(sess->se_sess, tag_num, tag_size);
+ if (ret < 0) {
+ iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
+ ISCSI_LOGIN_STATUS_NO_RESOURCES);
+ ret = -1;
+ }
out:
kfree(tmpbuf);
return ret;
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 5784cad..8d85d8c 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -19,6 +19,7 @@
******************************************************************************/

#include <linux/list.h>
+#include <linux/percpu_ida.h>
#include <scsi/scsi_tcq.h>
#include <scsi/iscsi_proto.h>
#include <target/target_core_base.h>
@@ -156,13 +157,15 @@ void iscsit_free_r2ts_from_list(struct iscsi_cmd *cmd)
struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn *conn, gfp_t gfp_mask)
{
struct iscsi_cmd *cmd;
- int priv_size = conn->conn_transport->priv_size;
+ struct se_session *se_sess = conn->sess->se_sess;
+ int size, tag;

- cmd = kzalloc(sizeof(struct iscsi_cmd) + priv_size, gfp_mask);
- if (!cmd) {
- pr_err("Unable to allocate memory for struct iscsi_cmd.\n");
- return NULL;
- }
+ tag = percpu_ida_alloc(&se_sess->sess_tag_pool, gfp_mask);
+ size = sizeof(struct iscsi_cmd) + conn->conn_transport->priv_size;
+ cmd = (struct iscsi_cmd *)(se_sess->sess_cmd_map + (tag * size));
+ memset(cmd, 0, size);
+
+ cmd->se_cmd.map_tag = tag;
cmd->conn = conn;
INIT_LIST_HEAD(&cmd->i_conn_node);
INIT_LIST_HEAD(&cmd->datain_list);
@@ -678,6 +681,16 @@ void iscsit_free_queue_reqs_for_conn(struct iscsi_conn *conn)

void iscsit_release_cmd(struct iscsi_cmd *cmd)
{
+ struct iscsi_session *sess;
+ struct se_cmd *se_cmd = &cmd->se_cmd;
+
+ if (cmd->conn)
+ sess = cmd->conn->sess;
+ else
+ sess = cmd->sess;
+
+ BUG_ON(!sess || !sess->se_sess);
+
kfree(cmd->buf_ptr);
kfree(cmd->pdu_list);
kfree(cmd->seq_list);
@@ -685,7 +698,7 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
kfree(cmd->iov_data);
kfree(cmd->text_in_ptr);

- kfree(cmd);
+ percpu_ida_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
}
EXPORT_SYMBOL(iscsit_release_cmd);

--
1.7.2.5

2013-08-31 03:08:53

by Nicholas A. Bellinger

Subject: [PATCH-v5 2/6] target: Add transport_init_session_tags using per-cpu ida

From: Nicholas Bellinger <[email protected]>

This patch adds lib/percpu_ida.c based transport_init_session_tags()
logic that allows fabric drivers to set up a per-cpu se_sess->sess_tag_pool
and associated se_sess->sess_cmd_map for basic tagged pre-allocation
of fabric descriptor sized memory.
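
A fabric driver is expected to use this roughly as follows (a sketch
based on the vhost-scsi and iscsi-target conversions later in this
series; struct my_fabric_cmd is a placeholder descriptor):

    struct my_fabric_cmd {          /* placeholder fabric descriptor */
            struct se_cmd se_cmd;   /* embeds se_cmd to carry map_tag */
            /* ... fabric private state ... */
    };

    struct se_session *se_sess;
    struct my_fabric_cmd *cmd;
    int tag;

    /* nexus/session creation: tag pool + contiguous descriptor array */
    se_sess = transport_init_session_tags(128,
                                          sizeof(struct my_fabric_cmd));
    if (IS_ERR(se_sess))
            return PTR_ERR(se_sess);

    /* per I/O: the tag doubles as an index into sess_cmd_map */
    tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_KERNEL);
    cmd = &((struct my_fabric_cmd *)se_sess->sess_cmd_map)[tag];
    memset(cmd, 0, sizeof(*cmd));
    cmd->se_cmd.map_tag = tag;

    /* descriptor release: hand the tag back instead of kfree() */
    percpu_ida_free(&se_sess->sess_tag_pool, cmd->se_cmd.map_tag);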

v5 changes:
- Convert to percpu_ida.h include

v4 changes:
- Add transport_alloc_session_tags() for fabrics that need early
transport_init_session()

v3 changes:
- Update to percpu-ida usage

Cc: Kent Overstreet <[email protected]>
Cc: Asias He <[email protected]>
Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/target/target_core_transport.c | 48 ++++++++++++++++++++++++++++++++
include/target/target_core_base.h | 5 +++
include/target/target_core_fabric.h | 3 ++
3 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 7172d00..98ec711 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -232,6 +232,50 @@ struct se_session *transport_init_session(void)
}
EXPORT_SYMBOL(transport_init_session);

+int transport_alloc_session_tags(struct se_session *se_sess,
+ unsigned int tag_num, unsigned int tag_size)
+{
+ int rc;
+
+ se_sess->sess_cmd_map = kzalloc(tag_num * tag_size, GFP_KERNEL);
+ if (!se_sess->sess_cmd_map) {
+ pr_err("Unable to allocate se_sess->sess_cmd_map\n");
+ return -ENOMEM;
+ }
+
+ rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
+ if (rc < 0) {
+ pr_err("Unable to init se_sess->sess_tag_pool,"
+ " tag_num: %u\n", tag_num);
+ kfree(se_sess->sess_cmd_map);
+ se_sess->sess_cmd_map = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(transport_alloc_session_tags);
+
+struct se_session *transport_init_session_tags(unsigned int tag_num,
+ unsigned int tag_size)
+{
+ struct se_session *se_sess;
+ int rc;
+
+ se_sess = transport_init_session();
+ if (IS_ERR(se_sess))
+ return se_sess;
+
+ rc = transport_alloc_session_tags(se_sess, tag_num, tag_size);
+ if (rc < 0) {
+ transport_free_session(se_sess);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ return se_sess;
+}
+EXPORT_SYMBOL(transport_init_session_tags);
+
/*
* Called with spin_lock_irqsave(&struct se_portal_group->session_lock called.
*/
@@ -367,6 +411,10 @@ EXPORT_SYMBOL(transport_deregister_session_configfs);

void transport_free_session(struct se_session *se_sess)
{
+ if (se_sess->sess_cmd_map) {
+ percpu_ida_destroy(&se_sess->sess_tag_pool);
+ kfree(se_sess->sess_cmd_map);
+ }
kmem_cache_free(se_sess_cache, se_sess);
}
EXPORT_SYMBOL(transport_free_session);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index e34fc90..bd55acd 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -5,6 +5,7 @@
#include <linux/configfs.h>
#include <linux/dma-mapping.h>
#include <linux/blkdev.h>
+#include <linux/percpu_ida.h>
#include <scsi/scsi_cmnd.h>
#include <net/sock.h>
#include <net/tcp.h>
@@ -415,6 +416,8 @@ struct se_cmd {
enum dma_data_direction data_direction;
/* For SAM Task Attribute */
int sam_task_attr;
+ /* Used for se_sess->sess_tag_pool */
+ unsigned int map_tag;
/* Transport protocol dependent state, see transport_state_table */
enum transport_state_table t_state;
unsigned cmd_wait_set:1;
@@ -536,6 +539,8 @@ struct se_session {
struct list_head sess_wait_list;
spinlock_t sess_cmd_lock;
struct kref sess_kref;
+ void *sess_cmd_map;
+ struct percpu_ida sess_tag_pool;
};

struct se_device;
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 7a16178..d559c36 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -84,6 +84,9 @@ struct target_core_fabric_ops {
};

struct se_session *transport_init_session(void);
+int transport_alloc_session_tags(struct se_session *, unsigned int,
+ unsigned int);
+struct se_session *transport_init_session_tags(unsigned int, unsigned int);
void __transport_register_session(struct se_portal_group *,
struct se_node_acl *, struct se_session *, void *);
void transport_register_session(struct se_portal_group *,
--
1.7.2.5

2013-08-31 03:09:44

by Nicholas A. Bellinger

Subject: [PATCH-v5 4/6] vhost/scsi: Add pre-allocation for tv_cmd SGL + upages memory

From: Nicholas Bellinger <[email protected]>

This patch adds support for pre-allocation of per-tv_cmd descriptor
scatterlist + user-space page pointer memory using se_sess->sess_cmd_map
within tcm_vhost_make_nexus() code.

This includes sanity checks within vhost_scsi_map_to_sgl() to reject
I/O that exceeds the hardcoded TCM_VHOST_PREALLOC_SGLS +
TCM_VHOST_PREALLOC_PAGES limits, plus the necessary cleanup in the
tcm_vhost_make_nexus() failure path and in tcm_vhost_drop_nexus().
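
Back-of-envelope footprint for the hardcoded limits (assuming a 64-bit
build, where struct scatterlist is roughly 24-40 bytes depending on
config):

    per tag:   2048 * sizeof(struct scatterlist)  /* ~64 KB */
             + 2048 * sizeof(struct page *)       /* ~16 KB */
    per nexus: TCM_VHOST_DEFAULT_TAGS (256) * ~80 KB = ~20 MB

so the pre-allocation trades a fixed per-nexus cost for the removal of
two allocations from every fast-path I/O.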

v3 changes:
- Rebase to v3.11-rc5 code

Cc: Michael S. Tsirkin <[email protected]>
Cc: Asias He <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/vhost/scsi.c | 99 ++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 80 insertions(+), 19 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 8cd545a..d52a3a0 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -56,6 +56,8 @@
#define TCM_VHOST_NAMELEN 256
#define TCM_VHOST_MAX_CDB_SIZE 32
#define TCM_VHOST_DEFAULT_TAGS 256
+#define TCM_VHOST_PREALLOC_SGLS 2048
+#define TCM_VHOST_PREALLOC_PAGES 2048

struct vhost_scsi_inflight {
/* Wait for the flush operation to finish */
@@ -81,6 +83,7 @@ struct tcm_vhost_cmd {
u32 tvc_lun;
/* Pointer to the SGL formatted memory from virtio-scsi */
struct scatterlist *tvc_sgl;
+ struct page **tvc_upages;
/* Pointer to response */
struct virtio_scsi_cmd_resp __user *tvc_resp;
/* Pointer to vhost_scsi for our device */
@@ -458,8 +461,6 @@ static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
u32 i;
for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
put_page(sg_page(&tv_cmd->tvc_sgl[i]));
-
- kfree(tv_cmd->tvc_sgl);
}

tcm_vhost_put_inflight(tv_cmd->inflight);
@@ -716,6 +717,8 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
struct tcm_vhost_cmd *cmd;
struct tcm_vhost_nexus *tv_nexus;
struct se_session *se_sess;
+ struct scatterlist *sg;
+ struct page **pages;
int tag;

tv_nexus = tpg->tpg_nexus;
@@ -727,8 +730,12 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,

tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_KERNEL);
cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[tag];
+ sg = cmd->tvc_sgl;
+ pages = cmd->tvc_upages;
memset(cmd, 0, sizeof(struct tcm_vhost_cmd));

+ cmd->tvc_sgl = sg;
+ cmd->tvc_upages = pages;
cmd->tvc_se_cmd.map_tag = tag;
cmd->tvc_tag = v_req->tag;
cmd->tvc_task_attr = v_req->task_attr;
@@ -746,7 +753,8 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
* Returns the number of scatterlist entries used or -errno on error.
*/
static int
-vhost_scsi_map_to_sgl(struct scatterlist *sgl,
+vhost_scsi_map_to_sgl(struct tcm_vhost_cmd *tv_cmd,
+ struct scatterlist *sgl,
unsigned int sgl_count,
struct iovec *iov,
int write)
@@ -758,13 +766,25 @@ vhost_scsi_map_to_sgl(struct scatterlist *sgl,
struct page **pages;
int ret, i;

+ if (sgl_count > TCM_VHOST_PREALLOC_SGLS) {
+ pr_err("vhost_scsi_map_to_sgl() psgl_count: %u greater than"
+ " preallocated TCM_VHOST_PREALLOC_SGLS: %u\n",
+ sgl_count, TCM_VHOST_PREALLOC_SGLS);
+ return -ENOBUFS;
+ }
+
pages_nr = iov_num_pages(iov);
if (pages_nr > sgl_count)
return -ENOBUFS;

- pages = kmalloc(pages_nr * sizeof(struct page *), GFP_KERNEL);
- if (!pages)
- return -ENOMEM;
+ if (pages_nr > TCM_VHOST_PREALLOC_PAGES) {
+ pr_err("vhost_scsi_map_to_sgl() pages_nr: %u greater than"
+ " preallocated TCM_VHOST_PREALLOC_PAGES: %u\n",
+ pages_nr, TCM_VHOST_PREALLOC_PAGES);
+ return -ENOBUFS;
+ }
+
+ pages = tv_cmd->tvc_upages;

ret = get_user_pages_fast((unsigned long)ptr, pages_nr, write, pages);
/* No pages were pinned */
@@ -789,7 +809,6 @@ vhost_scsi_map_to_sgl(struct scatterlist *sgl,
}

out:
- kfree(pages);
return ret;
}

@@ -813,24 +832,20 @@ vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *cmd,

/* TODO overflow checking */

- sg = kmalloc(sizeof(cmd->tvc_sgl[0]) * sgl_count, GFP_ATOMIC);
- if (!sg)
- return -ENOMEM;
- pr_debug("%s sg %p sgl_count %u is_err %d\n", __func__,
- sg, sgl_count, !sg);
+ sg = cmd->tvc_sgl;
+ pr_debug("%s sg %p sgl_count %u\n", __func__, sg, sgl_count);
sg_init_table(sg, sgl_count);

- cmd->tvc_sgl = sg;
cmd->tvc_sgl_count = sgl_count;

pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count);
for (i = 0; i < niov; i++) {
- ret = vhost_scsi_map_to_sgl(sg, sgl_count, &iov[i], write);
+ ret = vhost_scsi_map_to_sgl(cmd, sg, sgl_count, &iov[i],
+ write);
if (ret < 0) {
for (i = 0; i < cmd->tvc_sgl_count; i++)
put_page(sg_page(&cmd->tvc_sgl[i]));
- kfree(cmd->tvc_sgl);
- cmd->tvc_sgl = NULL;
+
cmd->tvc_sgl_count = 0;
return ret;
}
@@ -1660,11 +1675,31 @@ static void tcm_vhost_drop_nodeacl(struct se_node_acl *se_acl)
kfree(nacl);
}

+static void tcm_vhost_free_cmd_map_res(struct tcm_vhost_nexus *nexus,
+ struct se_session *se_sess)
+{
+ struct tcm_vhost_cmd *tv_cmd;
+ unsigned int i;
+
+ if (!se_sess->sess_cmd_map)
+ return;
+
+ for (i = 0; i < TCM_VHOST_DEFAULT_TAGS; i++) {
+ tv_cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[i];
+
+ kfree(tv_cmd->tvc_sgl);
+ kfree(tv_cmd->tvc_upages);
+ }
+}
+
static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
const char *name)
{
struct se_portal_group *se_tpg;
+ struct se_session *se_sess;
struct tcm_vhost_nexus *tv_nexus;
+ struct tcm_vhost_cmd *tv_cmd;
+ unsigned int i;

mutex_lock(&tpg->tv_tpg_mutex);
if (tpg->tpg_nexus) {
@@ -1692,6 +1727,26 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
kfree(tv_nexus);
return -ENOMEM;
}
+ se_sess = tv_nexus->tvn_se_sess;
+ for (i = 0; i < TCM_VHOST_DEFAULT_TAGS; i++) {
+ tv_cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[i];
+
+ tv_cmd->tvc_sgl = kzalloc(sizeof(struct scatterlist) *
+ TCM_VHOST_PREALLOC_SGLS, GFP_KERNEL);
+ if (!tv_cmd->tvc_sgl) {
+ mutex_unlock(&tpg->tv_tpg_mutex);
+ pr_err("Unable to allocate tv_cmd->tvc_sgl\n");
+ goto out;
+ }
+
+ tv_cmd->tvc_upages = kzalloc(sizeof(struct page *) *
+ TCM_VHOST_PREALLOC_PAGES, GFP_KERNEL);
+ if (!tv_cmd->tvc_upages) {
+ mutex_unlock(&tpg->tv_tpg_mutex);
+ pr_err("Unable to allocate tv_cmd->tvc_upages\n");
+ goto out;
+ }
+ }
/*
* Since we are running in 'demo mode' this call with generate a
* struct se_node_acl for the tcm_vhost struct se_portal_group with
@@ -1703,9 +1758,7 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
mutex_unlock(&tpg->tv_tpg_mutex);
pr_debug("core_tpg_check_initiator_node_acl() failed"
" for %s\n", name);
- transport_free_session(tv_nexus->tvn_se_sess);
- kfree(tv_nexus);
- return -ENOMEM;
+ goto out;
}
/*
* Now register the TCM vhost virtual I_T Nexus as active with the
@@ -1717,6 +1770,12 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,

mutex_unlock(&tpg->tv_tpg_mutex);
return 0;
+
+out:
+ tcm_vhost_free_cmd_map_res(tv_nexus, se_sess);
+ transport_free_session(se_sess);
+ kfree(tv_nexus);
+ return -ENOMEM;
}

static int tcm_vhost_drop_nexus(struct tcm_vhost_tpg *tpg)
@@ -1756,6 +1815,8 @@ static int tcm_vhost_drop_nexus(struct tcm_vhost_tpg *tpg)
pr_debug("TCM_vhost_ConfigFS: Removing I_T Nexus to emulated"
" %s Initiator Port: %s\n", tcm_vhost_dump_proto_id(tpg->tport),
tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+
+ tcm_vhost_free_cmd_map_res(tv_nexus, se_sess);
/*
* Release the SCSI I_T Nexus to the emulated vhost Target Port
*/
--
1.7.2.5

2013-08-31 03:10:11

by Nicholas A. Bellinger

Subject: [PATCH-v5 3/6] vhost/scsi: Convert to per-cpu ida_alloc + ida_free command map

From: Nicholas Bellinger <[email protected]>

This patch changes vhost/scsi to use transport_init_session_tags()
pre-allocation logic for per-cpu session tag pooling, with internal
percpu_ida_alloc() + percpu_ida_free() calls based upon the saved
se_cmd->map_tag id.

FIXME: Make the transport_init_session_tags() tag count configurable
per vring client via configfs
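
One behavioral note worth calling out (a condensed before/after, not a
complete hunk): the old GFP_ATOMIC kzalloc() could fail under memory
pressure, while percpu_ida_alloc() with GFP_KERNEL (which includes
__GFP_WAIT) blocks until a tag is freed, bounding in-flight descriptors
per nexus at TCM_VHOST_DEFAULT_TAGS:

    /* old: could return NULL, caller had to cope */
    cmd = kzalloc(sizeof(struct tcm_vhost_cmd), GFP_ATOMIC);

    /* new: sleeps until one of the 256 pre-allocated tags frees up */
    tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_KERNEL);
    cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[tag];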

v5 changes:
- Convert to percpu_ida.h include

v3 changes:
- Update to percpu-ida usage
- Rebase to v3.11-rc5 code

Cc: Michael S. Tsirkin <[email protected]>
Cc: Asias He <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Nicholas Bellinger <[email protected]>
---
drivers/vhost/scsi.c | 33 +++++++++++++++++++++------------
1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0c27c7d..8cd545a 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -48,12 +48,14 @@
#include <linux/virtio_scsi.h>
#include <linux/llist.h>
#include <linux/bitmap.h>
+#include <linux/percpu_ida.h>

#include "vhost.h"

#define TCM_VHOST_VERSION "v0.1"
#define TCM_VHOST_NAMELEN 256
#define TCM_VHOST_MAX_CDB_SIZE 32
+#define TCM_VHOST_DEFAULT_TAGS 256

struct vhost_scsi_inflight {
/* Wait for the flush operation to finish */
@@ -450,6 +452,7 @@ static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
{
struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
struct tcm_vhost_cmd, tvc_se_cmd);
+ struct se_session *se_sess = se_cmd->se_sess;

if (tv_cmd->tvc_sgl_count) {
u32 i;
@@ -460,7 +463,7 @@ static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
}

tcm_vhost_put_inflight(tv_cmd->inflight);
- kfree(tv_cmd);
+ percpu_ida_free(&se_sess->sess_tag_pool, se_cmd->map_tag);
}

static int tcm_vhost_shutdown_session(struct se_session *se_sess)
@@ -704,7 +707,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
}

static struct tcm_vhost_cmd *
-vhost_scsi_allocate_cmd(struct vhost_virtqueue *vq,
+vhost_scsi_get_tag(struct vhost_virtqueue *vq,
struct tcm_vhost_tpg *tpg,
struct virtio_scsi_cmd_req *v_req,
u32 exp_data_len,
@@ -712,18 +715,21 @@ vhost_scsi_allocate_cmd(struct vhost_virtqueue *vq,
{
struct tcm_vhost_cmd *cmd;
struct tcm_vhost_nexus *tv_nexus;
+ struct se_session *se_sess;
+ int tag;

tv_nexus = tpg->tpg_nexus;
if (!tv_nexus) {
pr_err("Unable to locate active struct tcm_vhost_nexus\n");
return ERR_PTR(-EIO);
}
+ se_sess = tv_nexus->tvn_se_sess;

- cmd = kzalloc(sizeof(struct tcm_vhost_cmd), GFP_ATOMIC);
- if (!cmd) {
- pr_err("Unable to allocate struct tcm_vhost_cmd\n");
- return ERR_PTR(-ENOMEM);
- }
+ tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_KERNEL);
+ cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[tag];
+ memset(cmd, 0, sizeof(struct tcm_vhost_cmd));
+
+ cmd->tvc_se_cmd.map_tag = tag;
cmd->tvc_tag = v_req->tag;
cmd->tvc_task_attr = v_req->task_attr;
cmd->tvc_exp_data_len = exp_data_len;
@@ -989,10 +995,10 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
for (i = 0; i < data_num; i++)
exp_data_len += vq->iov[data_first + i].iov_len;

- cmd = vhost_scsi_allocate_cmd(vq, tpg, &v_req,
- exp_data_len, data_direction);
+ cmd = vhost_scsi_get_tag(vq, tpg, &v_req,
+ exp_data_len, data_direction);
if (IS_ERR(cmd)) {
- vq_err(vq, "vhost_scsi_allocate_cmd failed %ld\n",
+ vq_err(vq, "vhost_scsi_get_tag failed %ld\n",
PTR_ERR(cmd));
goto err_cmd;
}
@@ -1675,9 +1681,12 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
return -ENOMEM;
}
/*
- * Initialize the struct se_session pointer
+ * Initialize the struct se_session pointer and setup tagpool
+ * for struct tcm_vhost_cmd descriptors
*/
- tv_nexus->tvn_se_sess = transport_init_session();
+ tv_nexus->tvn_se_sess = transport_init_session_tags(
+ TCM_VHOST_DEFAULT_TAGS,
+ sizeof(struct tcm_vhost_cmd));
if (IS_ERR(tv_nexus->tvn_se_sess)) {
mutex_unlock(&tpg->tv_tpg_mutex);
kfree(tv_nexus);
--
1.7.2.5

2013-09-03 08:14:28

by Asias He

[permalink] [raw]
Subject: Re: [PATCH-v5 4/6] vhost/scsi: Add pre-allocation for tv_cmd SGL + upages memory

On Sat, Aug 31, 2013 at 02:52:34AM +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <[email protected]>
>
> This patch adds support for pre-allocation of per tv_cmd descriptor
> scatterlist + user-space page pointer memory using se_sess->sess_cmd_map
> within tcm_vhost_make_nexus() code.
>
> This includes sanity checks within vhost_scsi_map_to_sgl()
> to reject I/O that exceeds these initial hardcoded values, and
> the necessary cleanup in tcm_vhost_make_nexus() failure path +
> tcm_vhost_drop_nexus().
>
> v3 changes:
> - Rebase to v3.11-rc5 code
>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: Asias He <[email protected]>
> Cc: Kent Overstreet <[email protected]>
> Signed-off-by: Nicholas Bellinger <[email protected]>

Reviewed-by: Asias He <[email protected]>

> ---
> drivers/vhost/scsi.c | 99 ++++++++++++++++++++++++++++++++++++++++----------
> 1 files changed, 80 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index 8cd545a..d52a3a0 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -56,6 +56,8 @@
> #define TCM_VHOST_NAMELEN 256
> #define TCM_VHOST_MAX_CDB_SIZE 32
> #define TCM_VHOST_DEFAULT_TAGS 256
> +#define TCM_VHOST_PREALLOC_SGLS 2048
> +#define TCM_VHOST_PREALLOC_PAGES 2048
>
> struct vhost_scsi_inflight {
> /* Wait for the flush operation to finish */
> @@ -81,6 +83,7 @@ struct tcm_vhost_cmd {
> u32 tvc_lun;
> /* Pointer to the SGL formatted memory from virtio-scsi */
> struct scatterlist *tvc_sgl;
> + struct page **tvc_upages;
> /* Pointer to response */
> struct virtio_scsi_cmd_resp __user *tvc_resp;
> /* Pointer to vhost_scsi for our device */
> @@ -458,8 +461,6 @@ static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
> u32 i;
> for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
> put_page(sg_page(&tv_cmd->tvc_sgl[i]));
> -
> - kfree(tv_cmd->tvc_sgl);
> }
>
> tcm_vhost_put_inflight(tv_cmd->inflight);
> @@ -716,6 +717,8 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
> struct tcm_vhost_cmd *cmd;
> struct tcm_vhost_nexus *tv_nexus;
> struct se_session *se_sess;
> + struct scatterlist *sg;
> + struct page **pages;
> int tag;
>
> tv_nexus = tpg->tpg_nexus;
> @@ -727,8 +730,12 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
>
> tag = percpu_ida_alloc(&se_sess->sess_tag_pool, GFP_KERNEL);
> cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[tag];
> + sg = cmd->tvc_sgl;
> + pages = cmd->tvc_upages;
> memset(cmd, 0, sizeof(struct tcm_vhost_cmd));
>
> + cmd->tvc_sgl = sg;
> + cmd->tvc_upages = pages;
> cmd->tvc_se_cmd.map_tag = tag;
> cmd->tvc_tag = v_req->tag;
> cmd->tvc_task_attr = v_req->task_attr;
> @@ -746,7 +753,8 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq,
> * Returns the number of scatterlist entries used or -errno on error.
> */
> static int
> -vhost_scsi_map_to_sgl(struct scatterlist *sgl,
> +vhost_scsi_map_to_sgl(struct tcm_vhost_cmd *tv_cmd,
> + struct scatterlist *sgl,
> unsigned int sgl_count,
> struct iovec *iov,
> int write)
> @@ -758,13 +766,25 @@ vhost_scsi_map_to_sgl(struct scatterlist *sgl,
> struct page **pages;
> int ret, i;
>
> + if (sgl_count > TCM_VHOST_PREALLOC_SGLS) {
> + pr_err("vhost_scsi_map_to_sgl() psgl_count: %u greater than"
> + " preallocated TCM_VHOST_PREALLOC_SGLS: %u\n",
> + sgl_count, TCM_VHOST_PREALLOC_SGLS);
> + return -ENOBUFS;
> + }
> +
> pages_nr = iov_num_pages(iov);
> if (pages_nr > sgl_count)
> return -ENOBUFS;
>
> - pages = kmalloc(pages_nr * sizeof(struct page *), GFP_KERNEL);
> - if (!pages)
> - return -ENOMEM;
> + if (pages_nr > TCM_VHOST_PREALLOC_PAGES) {
> + pr_err("vhost_scsi_map_to_sgl() pages_nr: %u greater than"
> + " preallocated TCM_VHOST_PREALLOC_PAGES: %u\n",
> + pages_nr, TCM_VHOST_PREALLOC_PAGES);
> + return -ENOBUFS;
> + }
> +
> + pages = tv_cmd->tvc_upages;
>
> ret = get_user_pages_fast((unsigned long)ptr, pages_nr, write, pages);
> /* No pages were pinned */
> @@ -789,7 +809,6 @@ vhost_scsi_map_to_sgl(struct scatterlist *sgl,
> }
>
> out:
> - kfree(pages);
> return ret;
> }
>
> @@ -813,24 +832,20 @@ vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *cmd,
>
> /* TODO overflow checking */
>
> - sg = kmalloc(sizeof(cmd->tvc_sgl[0]) * sgl_count, GFP_ATOMIC);
> - if (!sg)
> - return -ENOMEM;
> - pr_debug("%s sg %p sgl_count %u is_err %d\n", __func__,
> - sg, sgl_count, !sg);
> + sg = cmd->tvc_sgl;
> + pr_debug("%s sg %p sgl_count %u\n", __func__, sg, sgl_count);
> sg_init_table(sg, sgl_count);
>
> - cmd->tvc_sgl = sg;
> cmd->tvc_sgl_count = sgl_count;
>
> pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count);
> for (i = 0; i < niov; i++) {
> - ret = vhost_scsi_map_to_sgl(sg, sgl_count, &iov[i], write);
> + ret = vhost_scsi_map_to_sgl(cmd, sg, sgl_count, &iov[i],
> + write);
> if (ret < 0) {
> for (i = 0; i < cmd->tvc_sgl_count; i++)
> put_page(sg_page(&cmd->tvc_sgl[i]));
> - kfree(cmd->tvc_sgl);
> - cmd->tvc_sgl = NULL;
> +
> cmd->tvc_sgl_count = 0;
> return ret;
> }
> @@ -1660,11 +1675,31 @@ static void tcm_vhost_drop_nodeacl(struct se_node_acl *se_acl)
> kfree(nacl);
> }
>
> +static void tcm_vhost_free_cmd_map_res(struct tcm_vhost_nexus *nexus,
> + struct se_session *se_sess)
> +{
> + struct tcm_vhost_cmd *tv_cmd;
> + unsigned int i;
> +
> + if (!se_sess->sess_cmd_map)
> + return;
> +
> + for (i = 0; i < TCM_VHOST_DEFAULT_TAGS; i++) {
> + tv_cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[i];
> +
> + kfree(tv_cmd->tvc_sgl);
> + kfree(tv_cmd->tvc_upages);
> + }
> +}
> +
> static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
> const char *name)
> {
> struct se_portal_group *se_tpg;
> + struct se_session *se_sess;
> struct tcm_vhost_nexus *tv_nexus;
> + struct tcm_vhost_cmd *tv_cmd;
> + unsigned int i;
>
> mutex_lock(&tpg->tv_tpg_mutex);
> if (tpg->tpg_nexus) {
> @@ -1692,6 +1727,26 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
> kfree(tv_nexus);
> return -ENOMEM;
> }
> + se_sess = tv_nexus->tvn_se_sess;
> + for (i = 0; i < TCM_VHOST_DEFAULT_TAGS; i++) {
> + tv_cmd = &((struct tcm_vhost_cmd *)se_sess->sess_cmd_map)[i];
> +
> + tv_cmd->tvc_sgl = kzalloc(sizeof(struct scatterlist) *
> + TCM_VHOST_PREALLOC_SGLS, GFP_KERNEL);
> + if (!tv_cmd->tvc_sgl) {
> + mutex_unlock(&tpg->tv_tpg_mutex);
> + pr_err("Unable to allocate tv_cmd->tvc_sgl\n");
> + goto out;
> + }
> +
> + tv_cmd->tvc_upages = kzalloc(sizeof(struct page *) *
> + TCM_VHOST_PREALLOC_PAGES, GFP_KERNEL);
> + if (!tv_cmd->tvc_upages) {
> + mutex_unlock(&tpg->tv_tpg_mutex);
> + pr_err("Unable to allocate tv_cmd->tvc_upages\n");
> + goto out;
> + }
> + }
> /*
> * Since we are running in 'demo mode' this call will generate a
> * struct se_node_acl for the tcm_vhost struct se_portal_group with
> @@ -1703,9 +1758,7 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
> mutex_unlock(&tpg->tv_tpg_mutex);
> pr_debug("core_tpg_check_initiator_node_acl() failed"
> " for %s\n", name);
> - transport_free_session(tv_nexus->tvn_se_sess);
> - kfree(tv_nexus);
> - return -ENOMEM;
> + goto out;
> }
> /*
> * Now register the TCM vhost virtual I_T Nexus as active with the
> @@ -1717,6 +1770,12 @@ static int tcm_vhost_make_nexus(struct tcm_vhost_tpg *tpg,
>
> mutex_unlock(&tpg->tv_tpg_mutex);
> return 0;
> +
> +out:
> + tcm_vhost_free_cmd_map_res(tv_nexus, se_sess);
> + transport_free_session(se_sess);
> + kfree(tv_nexus);
> + return -ENOMEM;
> }
>
> static int tcm_vhost_drop_nexus(struct tcm_vhost_tpg *tpg)
> @@ -1756,6 +1815,8 @@ static int tcm_vhost_drop_nexus(struct tcm_vhost_tpg *tpg)
> pr_debug("TCM_vhost_ConfigFS: Removing I_T Nexus to emulated"
> " %s Initiator Port: %s\n", tcm_vhost_dump_proto_id(tpg->tport),
> tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
> +
> + tcm_vhost_free_cmd_map_res(tv_nexus, se_sess);
> /*
> * Release the SCSI I_T Nexus to the emulated vhost Target Port
> */
> --
> 1.7.2.5
>

--
Asias
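
A detail worth calling out in the hunk above: vhost_scsi_get_tag() must
preserve the pre-allocated tvc_sgl and tvc_upages pointers across the
per-descriptor memset(), which is why the patch saves and restores them:

    sg = cmd->tvc_sgl;
    pages = cmd->tvc_upages;
    memset(cmd, 0, sizeof(struct tcm_vhost_cmd));
    cmd->tvc_sgl = sg;
    cmd->tvc_upages = pages;

Without the save/restore, the memset() would zero the pointers and the
backing memory allocated once in tcm_vhost_make_nexus() would be lost
after the first use of each tag.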

2013-09-03 08:16:20

by Asias He

[permalink] [raw]
Subject: Re: [PATCH-v5 2/6] target: Add transport_init_session_tags using per-cpu ida

On Sat, Aug 31, 2013 at 02:52:32AM +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <[email protected]>
>
> This patch adds lib/idr.c based transport_init_session_tags() logic
> that allows fabric drivers to setup a per-cpu se_sess->sess_tag_pool
> and associated se_sess->sess_cmd_map for basic tagged pre-allocation
> of fabric descriptor sized memory.
>
> v5 changes:
> - Convert to percpu_ida.h include
>
> v4 changes:
> - Add transport_alloc_session_tags() for fabrics that need early
> transport_init_session()
>
> v3 changes:
> - Update to percpu-ida usage
>
> Cc: Kent Overstreet <[email protected]>
> Cc: Asias He <[email protected]>
> Cc: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Nicholas Bellinger <[email protected]>

Reviewed-by: Asias He <[email protected]>

> ---
> drivers/target/target_core_transport.c | 48 ++++++++++++++++++++++++++++++++
> include/target/target_core_base.h | 5 +++
> include/target/target_core_fabric.h | 3 ++
> 3 files changed, 56 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 7172d00..98ec711 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -232,6 +232,50 @@ struct se_session *transport_init_session(void)
> }
> EXPORT_SYMBOL(transport_init_session);
>
> +int transport_alloc_session_tags(struct se_session *se_sess,
> + unsigned int tag_num, unsigned int tag_size)
> +{
> + int rc;
> +
> + se_sess->sess_cmd_map = kzalloc(tag_num * tag_size, GFP_KERNEL);
> + if (!se_sess->sess_cmd_map) {
> + pr_err("Unable to allocate se_sess->sess_cmd_map\n");
> + return -ENOMEM;
> + }
> +
> + rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
> + if (rc < 0) {
> + pr_err("Unable to init se_sess->sess_tag_pool,"
> + " tag_num: %u\n", tag_num);
> + kfree(se_sess->sess_cmd_map);
> + se_sess->sess_cmd_map = NULL;
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(transport_alloc_session_tags);
> +
> +struct se_session *transport_init_session_tags(unsigned int tag_num,
> + unsigned int tag_size)
> +{
> + struct se_session *se_sess;
> + int rc;
> +
> + se_sess = transport_init_session();
> + if (IS_ERR(se_sess))
> + return se_sess;
> +
> + rc = transport_alloc_session_tags(se_sess, tag_num, tag_size);
> + if (rc < 0) {
> + transport_free_session(se_sess);
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + return se_sess;
> +}
> +EXPORT_SYMBOL(transport_init_session_tags);
> +
> /*
> * Called with spin_lock_irqsave(&struct se_portal_group->session_lock) held.
> */
> @@ -367,6 +411,10 @@ EXPORT_SYMBOL(transport_deregister_session_configfs);
>
> void transport_free_session(struct se_session *se_sess)
> {
> + if (se_sess->sess_cmd_map) {
> + percpu_ida_destroy(&se_sess->sess_tag_pool);
> + kfree(se_sess->sess_cmd_map);
> + }
> kmem_cache_free(se_sess_cache, se_sess);
> }
> EXPORT_SYMBOL(transport_free_session);
> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
> index e34fc90..bd55acd 100644
> --- a/include/target/target_core_base.h
> +++ b/include/target/target_core_base.h
> @@ -5,6 +5,7 @@
> #include <linux/configfs.h>
> #include <linux/dma-mapping.h>
> #include <linux/blkdev.h>
> +#include <linux/percpu_ida.h>
> #include <scsi/scsi_cmnd.h>
> #include <net/sock.h>
> #include <net/tcp.h>
> @@ -415,6 +416,8 @@ struct se_cmd {
> enum dma_data_direction data_direction;
> /* For SAM Task Attribute */
> int sam_task_attr;
> + /* Used for se_sess->sess_tag_pool */
> + unsigned int map_tag;
> /* Transport protocol dependent state, see transport_state_table */
> enum transport_state_table t_state;
> unsigned cmd_wait_set:1;
> @@ -536,6 +539,8 @@ struct se_session {
> struct list_head sess_wait_list;
> spinlock_t sess_cmd_lock;
> struct kref sess_kref;
> + void *sess_cmd_map;
> + struct percpu_ida sess_tag_pool;
> };
>
> struct se_device;
> diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
> index 7a16178..d559c36 100644
> --- a/include/target/target_core_fabric.h
> +++ b/include/target/target_core_fabric.h
> @@ -84,6 +84,9 @@ struct target_core_fabric_ops {
> };
>
> struct se_session *transport_init_session(void);
> +int transport_alloc_session_tags(struct se_session *, unsigned int,
> + unsigned int);
> +struct se_session *transport_init_session_tags(unsigned int, unsigned int);
> void __transport_register_session(struct se_portal_group *,
> struct se_node_acl *, struct se_session *, void *);
> void transport_register_session(struct se_portal_group *,
> --
> 1.7.2.5
>

--
Asias
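
For reference, patch 3/6 uses this API at nexus creation time roughly as
follows (a sketch with error handling trimmed; see that patch for the
real code):

    se_sess = transport_init_session_tags(TCM_VHOST_DEFAULT_TAGS,
                                          sizeof(struct tcm_vhost_cmd));
    if (IS_ERR(se_sess))
        return PTR_ERR(se_sess);
    /* tags handed out by percpu_ida_alloc() later index
     * se_sess->sess_cmd_map as a struct tcm_vhost_cmd array */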

2013-09-03 15:59:47

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: [PATCH-v5 1/6] idr: Percpu ida

Hi Andrew & Co,

Are there any other review comments to be addressed for this patch..?

If not, please kindly give your Reviewed-by if you're OK for an initial
standalone merge.

Thank you,

--nab

On Sat, 2013-08-31 at 02:52 +0000, Nicholas A. Bellinger wrote:
> From: Kent Overstreet <[email protected]>
>
> Percpu frontend for allocating ids. With percpu allocation (that works),
> it's impossible to guarantee it will always be possible to allocate all
> nr_tags - typically, some will be stuck on a remote percpu freelist
> where the current job can't get to them.
>
> We do guarantee that it will always be possible to allocate at least
> (nr_tags / 2) tags - this is done by keeping track of which and how many
> cpus have tags on their percpu freelists. On allocation failure if
> enough cpus have tags that there could potentially be (nr_tags / 2) tags
> stuck on remote percpu freelists, we then pick a remote cpu at random to
> steal from.
>
> Note that there's no cpu hotplug notifier - we don't care, because
> steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
> more allocations if we had a notifier - but we'll still meet our
> guarantees and it's absolutely not a correctness issue, so I don't think
> it's worth the extra code.
>
> v5 changes:
> - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
> - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
> - Convert steal_tags() to use cpumask_weight() + cpumask_next() +
> cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
> - Add comment for alloc_global_tags() (kmo + akpm)
> - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
> - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
> - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
> (kmo + akpm)
> - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
> (kmo + akpm)
> - Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
> - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)
>
> v4 changes:
>
> - Fix tags.c reference in percpu_ida_init (akpm)
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: "Nicholas A. Bellinger" <[email protected]>
> Signed-off-by: Nicholas Bellinger <[email protected]>
> ---
> include/linux/percpu_ida.h | 59 ++++++++
> lib/Makefile | 5 +-
> lib/percpu_ida.c | 335 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 397 insertions(+), 2 deletions(-)
> create mode 100644 include/linux/percpu_ida.h
> create mode 100644 lib/percpu_ida.c
>
> diff --git a/include/linux/percpu_ida.h b/include/linux/percpu_ida.h
> new file mode 100644
> index 0000000..8a39cdc
> --- /dev/null
> +++ b/include/linux/percpu_ida.h
> @@ -0,0 +1,59 @@
> +#ifndef __PERCPU_IDA_H__
> +#define __PERCPU_IDA_H__
> +
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/init.h>
> +#include <linux/spinlock_types.h>
> +#include <linux/wait.h>
> +
> +struct percpu_ida_cpu;
> +
> +struct percpu_ida {
> + /*
> + * number of tags available to be allocated, as passed to
> + * percpu_ida_init()
> + */
> + unsigned nr_tags;
> +
> + struct percpu_ida_cpu __percpu *tag_cpu;
> +
> + /*
> + * Bitmap of cpus that (may) have tags on their percpu freelists:
> + * steal_tags() uses this to decide when to steal tags, and which cpus
> + * to try stealing from.
> + *
> + * It's ok for a freelist to be empty when its bit is set - steal_tags()
> + * will just keep looking - but the bitmap _must_ be set whenever a
> + * percpu freelist does have tags.
> + */
> + cpumask_t cpus_have_tags;
> +
> + struct {
> + spinlock_t lock;
> + /*
> + * When we go to steal tags from another cpu (see steal_tags()),
> + * we want to pick a cpu at random. Cycling through them every
> + * time we steal is a bit easier and more or less equivalent:
> + */
> + unsigned cpu_last_stolen;
> +
> + /* For sleeping on allocation failure */
> + wait_queue_head_t wait;
> +
> + /*
> + * Global freelist - it's a stack where nr_free points to the
> + * top
> + */
> + unsigned nr_free;
> + unsigned *freelist;
> + } ____cacheline_aligned_in_smp;
> +};
> +
> +int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp);
> +void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
> +
> +void percpu_ida_destroy(struct percpu_ida *pool);
> +int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags);
> +
> +#endif /* __PERCPU_IDA_H__ */
> diff --git a/lib/Makefile b/lib/Makefile
> index 7baccfd..1cb8356 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
> sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
> proportions.o flex_proportions.o prio_heap.o ratelimit.o show_mem.o \
> is_single_threaded.o plist.o decompress.o kobject_uevent.o \
> - earlycpio.o percpu-refcount.o
> + earlycpio.o percpu-refcount.o percpu_ida.o
>
> obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
> lib-$(CONFIG_MMU) += ioremap.o
> @@ -24,7 +24,8 @@ lib-y += kobject.o klist.o
> obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
> bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
> gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
> - bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o
> + bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
> + percpu_ida.o
> obj-y += string_helpers.o
> obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
> obj-y += kstrtox.o
> diff --git a/lib/percpu_ida.c b/lib/percpu_ida.c
> new file mode 100644
> index 0000000..3ef70e8
> --- /dev/null
> +++ b/lib/percpu_ida.c
> @@ -0,0 +1,335 @@
> +/*
> + * Percpu IDA library
> + *
> + * Copyright (C) 2013 Datera, Inc. Kent Overstreet
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2, or (at
> + * your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + */
> +
> +#include <linux/bitmap.h>
> +#include <linux/bitops.h>
> +#include <linux/bug.h>
> +#include <linux/err.h>
> +#include <linux/export.h>
> +#include <linux/hardirq.h>
> +#include <linux/idr.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include <linux/spinlock.h>
> +#include <linux/percpu_ida.h>
> +
> +/*
> + * Number of tags we move between the percpu freelist and the global freelist at
> + * a time
> + */
> +#define IDA_PCPU_BATCH_MOVE 32U
> +
> +/* Max size of percpu freelist, */
> +#define IDA_PCPU_SIZE ((IDA_PCPU_BATCH_MOVE * 3) / 2)
> +
> +struct percpu_ida_cpu {
> + /*
> + * Even though this is percpu, we need a lock for tag stealing by remote
> + * CPUs:
> + */
> + spinlock_t lock;
> +
> + /* nr_free/freelist form a stack of free IDs */
> + unsigned nr_free;
> + unsigned freelist[];
> +};
> +
> +static inline void move_tags(unsigned *dst, unsigned *dst_nr,
> + unsigned *src, unsigned *src_nr,
> + unsigned nr)
> +{
> + *src_nr -= nr;
> + memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
> + *dst_nr += nr;
> +}
> +
> +/*
> + * Try to steal tags from a remote cpu's percpu freelist.
> + *
> + * We first check how many percpu freelists have tags - we don't steal tags
> + * unless enough percpu freelists have tags on them that it's possible more than
> + * half the total tags could be stuck on remote percpu freelists.
> + *
> + * Then we iterate through the cpus until we find some tags - we don't attempt
> + * to find the "best" cpu to steal from, to keep cacheline bouncing to a
> + * minimum.
> + */
> +static inline void steal_tags(struct percpu_ida *pool,
> + struct percpu_ida_cpu *tags)
> +{
> + unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
> + struct percpu_ida_cpu *remote;
> +
> + for (cpus_have_tags = cpumask_weight(&pool->cpus_have_tags);
> + cpus_have_tags * IDA_PCPU_SIZE > pool->nr_tags / 2;
> + cpus_have_tags--) {
> + cpu = cpumask_next(cpu, &pool->cpus_have_tags);
> +
> + if (cpu >= nr_cpu_ids)
> + cpu = cpumask_first(&pool->cpus_have_tags);
> +
> + if (cpu >= nr_cpu_ids)
> + BUG();
> +
> + pool->cpu_last_stolen = cpu;
> + remote = per_cpu_ptr(pool->tag_cpu, cpu);
> +
> + cpumask_clear_cpu(cpu, &pool->cpus_have_tags);
> +
> + if (remote == tags)
> + continue;
> +
> + spin_lock(&remote->lock);
> +
> + if (remote->nr_free) {
> + memcpy(tags->freelist,
> + remote->freelist,
> + sizeof(unsigned) * remote->nr_free);
> +
> + tags->nr_free = remote->nr_free;
> + remote->nr_free = 0;
> + }
> +
> + spin_unlock(&remote->lock);
> +
> + if (tags->nr_free)
> + break;
> + }
> +}
> +
> +/*
> + * Pop up to IDA_PCPU_BATCH_MOVE IDs off the global freelist, and push them onto
> + * our percpu freelist:
> + */
> +static inline void alloc_global_tags(struct percpu_ida *pool,
> + struct percpu_ida_cpu *tags)
> +{
> + move_tags(tags->freelist, &tags->nr_free,
> + pool->freelist, &pool->nr_free,
> + min(pool->nr_free, IDA_PCPU_BATCH_MOVE));
> +}
> +
> +static inline unsigned alloc_local_tag(struct percpu_ida *pool,
> + struct percpu_ida_cpu *tags)
> +{
> + int tag = -ENOSPC;
> +
> + spin_lock(&tags->lock);
> + if (tags->nr_free)
> + tag = tags->freelist[--tags->nr_free];
> + spin_unlock(&tags->lock);
> +
> + return tag;
> +}
> +
> +/**
> + * percpu_ida_alloc - allocate a tag
> + * @pool: pool to allocate from
> + * @gfp: gfp flags
> + *
> + * Returns a tag - an integer in the range [0..nr_tags) (passed to
> + * percpu_ida_init()), or otherwise -ENOSPC on allocation failure.
> + *
> + * Safe to be called from interrupt context (assuming it isn't passed
> + * __GFP_WAIT, of course).
> + *
> + * @gfp indicates whether or not to wait until a free id is available (it's not
> + * used for internal memory allocations); thus if passed __GFP_WAIT we may sleep
> + * however long it takes until another thread frees an id (same semantics as a
> + * mempool).
> + *
> + * Will not fail if passed __GFP_WAIT.
> + */
> +int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
> +{
> + DEFINE_WAIT(wait);
> + struct percpu_ida_cpu *tags;
> + unsigned long flags;
> + int tag;
> +
> + local_irq_save(flags);
> + tags = this_cpu_ptr(pool->tag_cpu);
> +
> + /* Fastpath */
> + tag = alloc_local_tag(pool, tags);
> + if (likely(tag >= 0)) {
> + local_irq_restore(flags);
> + return tag;
> + }
> +
> + while (1) {
> + spin_lock(&pool->lock);
> +
> + /*
> + * prepare_to_wait() must come before steal_tags(), in case
> + * percpu_ida_free() on another cpu flips a bit in
> + * cpus_have_tags
> + *
> + * global lock held and irqs disabled, don't need percpu lock
> + */
> + prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
> +
> + if (!tags->nr_free)
> + alloc_global_tags(pool, tags);
> + if (!tags->nr_free)
> + steal_tags(pool, tags);
> +
> + if (tags->nr_free) {
> + tag = tags->freelist[--tags->nr_free];
> + if (tags->nr_free)
> + cpumask_set_cpu(smp_processor_id(),
> + &pool->cpus_have_tags);
> + }
> +
> + spin_unlock(&pool->lock);
> + local_irq_restore(flags);
> +
> + if (tag >= 0 || !(gfp & __GFP_WAIT))
> + break;
> +
> + schedule();
> +
> + local_irq_save(flags);
> + tags = this_cpu_ptr(pool->tag_cpu);
> + }
> +
> + finish_wait(&pool->wait, &wait);
> + return tag;
> +}
> +EXPORT_SYMBOL_GPL(percpu_ida_alloc);
> +
> +/**
> + * percpu_ida_free - free a tag
> + * @pool: pool @tag was allocated from
> + * @tag: a tag previously allocated with percpu_ida_alloc()
> + *
> + * Safe to be called from interrupt context.
> + */
> +void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
> +{
> + struct percpu_ida_cpu *tags;
> + unsigned long flags;
> + unsigned nr_free;
> +
> + BUG_ON(tag >= pool->nr_tags);
> +
> + local_irq_save(flags);
> + tags = this_cpu_ptr(pool->tag_cpu);
> +
> + spin_lock(&tags->lock);
> + tags->freelist[tags->nr_free++] = tag;
> +
> + nr_free = tags->nr_free;
> + spin_unlock(&tags->lock);
> +
> + if (nr_free == 1) {
> + cpumask_set_cpu(smp_processor_id(),
> + &pool->cpus_have_tags);
> + wake_up(&pool->wait);
> + }
> +
> + if (nr_free == IDA_PCPU_SIZE) {
> + spin_lock(&pool->lock);
> +
> + /*
> + * Global lock held and irqs disabled, don't need percpu
> + * lock
> + */
> + if (tags->nr_free == IDA_PCPU_SIZE) {
> + move_tags(pool->freelist, &pool->nr_free,
> + tags->freelist, &tags->nr_free,
> + IDA_PCPU_BATCH_MOVE);
> +
> + wake_up(&pool->wait);
> + }
> + spin_unlock(&pool->lock);
> + }
> +
> + local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(percpu_ida_free);
> +
> +/**
> + * percpu_ida_destroy - release a tag pool's resources
> + * @pool: pool to free
> + *
> + * Frees the resources allocated by percpu_ida_init().
> + */
> +void percpu_ida_destroy(struct percpu_ida *pool)
> +{
> + free_percpu(pool->tag_cpu);
> + free_pages((unsigned long) pool->freelist,
> + get_order(pool->nr_tags * sizeof(unsigned)));
> +}
> +EXPORT_SYMBOL_GPL(percpu_ida_destroy);
> +
> +/**
> + * percpu_ida_init - initialize a percpu tag pool
> + * @pool: pool to initialize
> + * @nr_tags: number of tags that will be available for allocation
> + *
> + * Initializes @pool so that it can be used to allocate tags - integers in the
> + * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
> + * preallocated array of tag structures.
> + *
> + * Allocation is percpu, but sharding is limited by nr_tags - for best
> + * performance, the workload should not span more cpus than nr_tags / 128.
> + */
> +int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
> +{
> + unsigned i, cpu, order;
> +
> + memset(pool, 0, sizeof(*pool));
> +
> + init_waitqueue_head(&pool->wait);
> + spin_lock_init(&pool->lock);
> + pool->nr_tags = nr_tags;
> +
> + /* Guard against overflow */
> + if (nr_tags > (unsigned) INT_MAX + 1) {
> + pr_err("percpu_ida_init(): nr_tags too large\n");
> + return -EINVAL;
> + }
> +
> + order = get_order(nr_tags * sizeof(unsigned));
> + pool->freelist = (void *) __get_free_pages(GFP_KERNEL, order);
> + if (!pool->freelist)
> + return -ENOMEM;
> +
> + for (i = 0; i < nr_tags; i++)
> + pool->freelist[i] = i;
> +
> + pool->nr_free = nr_tags;
> +
> + pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
> + IDA_PCPU_SIZE * sizeof(unsigned),
> + sizeof(unsigned));
> + if (!pool->tag_cpu)
> + goto err;
> +
> + for_each_possible_cpu(cpu)
> + spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
> +
> + return 0;
> +err:
> + percpu_ida_destroy(pool);
> + return -ENOMEM;
> +}
> +EXPORT_SYMBOL_GPL(percpu_ida_init);
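
Putting the interface together, a consumer drives the pool roughly like
this (a minimal sketch; with GFP_KERNEL, which includes __GFP_WAIT, the
allocation waits for a free tag rather than failing):

    struct percpu_ida pool;
    int tag;

    if (percpu_ida_init(&pool, 256))
        return -ENOMEM;

    tag = percpu_ida_alloc(&pool, GFP_KERNEL);  /* tag in [0, 256) */
    /* ... use 'tag' to index a preallocated descriptor array ... */
    percpu_ida_free(&pool, tag);

    percpu_ida_destroy(&pool);

As a worked example of the stealing threshold: with nr_tags = 256,
IDA_PCPU_SIZE is (32 * 3) / 2 = 48, so steal_tags() keeps iterating
while cpus_have_tags * 48 > 128, i.e. while three or more cpus may
still hold tags on their percpu freelists.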

2013-09-03 16:03:16

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH-v5 0/6] target/vhost/iscsi: Add per-cpu ida tag pre-allocation for v3.12

On Sat, Aug 31, 2013 at 02:52:30AM +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <[email protected]>
>
> Hi folks,
>
> This is an updated -v5 series for adding tag pre-allocation support of
> target fabric descriptor memory, utilizing Kent's latest per-cpu ida
> bits and incorporates akpm's last round of feedback. The full -v5
> changelog is included below.
>
> The first patch is a standalone version of per-cpu-ida, seperate from
> the full idr rewrite from Kent that is still being discussed. Jens has
> also expressed interest in a blk-mq conversion to use these per-cpu-ida
> primatives, so getting this piece merged for v3.12 would make life easier
> for both of us. ;)
>
> The second patch includes target-core setup of se_sess->sess_cmd_map +
> se_sess->sess_tag_pool resources at session creation time, using
> fabric independent code in transport_init_session_tags().
>
> The third patch is the initial conversion of vhost-scsi fabric code
> to use per-cpu ida logic for obtaining a new tcm_vhost_cmd descriptor
> via vhost_scsi_get_tag() during vhost_work_fn_t->handle_kick() ->
> vhost_scsi_handle_vq() callback execution.
>
> The forth patch is a vhost-scsi change that adds pre-allocation of
> per tcm_vhost_cmd descriptor scatterlist + user-space page pointer
> memory, that allows the last two fast-path allocations to be dropped
> from tcm_vhost_submission_work() -> vhost_scsi_map_to_sgl() fast-path
> execution.
>
> The fifth patch converts iscsi/iser-target to use allocations based on
> iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
> using an embedded isert_cmd->iscsi_cmd. This makes the conversion to
> percpu-ida pre-allocation easier.
>
> And the sixth patch enables iscsi-target to use pre-allocation logic for
> per-cpu session tag pooling with internal ida_alloc() + ida_free() calls
> based upon the saved iscsi_cmd->se_cmd.map_tag id.

For the vhost changes:
Acked-by: Michael S. Tsirkin <[email protected]>



> v5 changes:
>
> - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
> - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
> - Convert steal_tags() to use cpumask_weight() + cpumask_next() +
> cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
> - Add comment for alloc_global_tags() (kmo + akpm)
> - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
> - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
> - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
> (kmo + akpm)
> - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
> (kmo + akpm)
> - Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
> - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)
> - Convert target/vhost/iscsi-target to use percpu_ida.h (nab)
>
> Please review as v3.12 material.
>
> Thank you,
>
> --nab
>
> Kent Overstreet (1):
> idr: Percpu ida
>
> Nicholas Bellinger (5):
> target: Add transport_init_session_tags using per-cpu ida
> vhost/scsi: Convert to per-cpu ida_alloc + ida_free command map
> vhost/scsi: Add pre-allocation for tv_cmd SGL + upages memory
> iscsi/iser-target: Convert to command priv_size usage
> iscsi-target: Convert to per-cpu ida_alloc + ida_free command map
>
> drivers/infiniband/ulp/isert/ib_isert.c | 114 +++------
> drivers/infiniband/ulp/isert/ib_isert.h | 2 +-
> drivers/target/iscsi/iscsi_target.c | 16 +--
> drivers/target/iscsi/iscsi_target.h | 1 -
> drivers/target/iscsi/iscsi_target_configfs.c | 2 +-
> drivers/target/iscsi/iscsi_target_core.h | 3 +-
> drivers/target/iscsi/iscsi_target_nego.c | 28 ++-
> drivers/target/iscsi/iscsi_target_util.c | 41 ++--
> drivers/target/target_core_transport.c | 48 ++++
> drivers/vhost/scsi.c | 132 ++++++++---
> include/linux/percpu_ida.h | 59 +++++
> include/target/iscsi/iscsi_transport.h | 8 +-
> include/target/target_core_base.h | 5 +
> include/target/target_core_fabric.h | 3 +
> lib/Makefile | 5 +-
> lib/percpu_ida.c | 335 ++++++++++++++++++++++++++
> 16 files changed, 652 insertions(+), 150 deletions(-)
> create mode 100644 include/linux/percpu_ida.h
> create mode 100644 lib/percpu_ida.c
>
> --
> 1.7.2.5

2013-09-04 01:27:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH-v5 1/6] idr: Percpu ida

On Tue, 03 Sep 2013 09:06:58 -0700 "Nicholas A. Bellinger" <[email protected]> wrote:

> Are there any other review comments to be addressed for this patch..?
>
> If not, please kindly give your Reviewed-by if you're OK for an initial
> standalone merge.

Sorry, I'm largely offline for the next week...

It looks OK to me (that's as close as I get to an ack :))

Nit:

> + if (cpu >= nr_cpu_ids)
> + cpu = cpumask_first(&pool->cpus_have_tags);
> +
> + if (cpu >= nr_cpu_ids)
> + BUG();

The second `if' can be moved inside the first, but hopefully the
compiler already did that.

I have dim memories that doing

local_irq_save()
spin_lock()

is not equivalent to

spin_lock_irqsave()

in some circumstances. Perhaps a lockdep thing. Or perhaps this was fixed in
the intervening years and this is no longer the case.

Such things probably won't be detected during 3.12-rcX because few
people will test iscsi, I expect. So please be sure that you've
runtime tested it with all the lockdeppy things enabled.
Documentation/SubmitChecklist section 12 has a list, but it might be
dated.
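
For concreteness, percpu_ida_alloc() open-codes the first form around
the global lock:

    local_irq_save(flags);
    ...
    spin_lock(&pool->lock);

rather than using the combined:

    spin_lock_irqsave(&pool->lock, flags);

so any lockdep-visible difference between the two would apply here.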

2013-09-04 04:34:56

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: [PATCH-v5 1/6] idr: Percpu ida

On Tue, 2013-09-03 at 18:27 -0700, Andrew Morton wrote:
> On Tue, 03 Sep 2013 09:06:58 -0700 "Nicholas A. Bellinger" <[email protected]> wrote:
>
> > Are there any other review comments to be addressed for this patch..?
> >
> > If not, please kindly give your Reviewed-by if you're OK for an initial
> > standalone merge.
>
> Sorry, I'm largely offline for the next week...
>
> It looks OK to me (that's as close as I get to an ack :))
>

Thanks. That quote is going in the commit log btw. ;)

> Nit:
>
> > + if (cpu >= nr_cpu_ids)
> > + cpu = cpumask_first(&pool->cpus_have_tags);
> > +
> > + if (cpu >= nr_cpu_ids)
> > + BUG();
>
> The second `if' can be moved inside the first, but hopefully the
> compiler already did that.

Fixed.

>
> I have dim memories that doing
>
> local_irq_save()
> spin_lock()
>
> is not equivalent to
>
> spin_lock_irqsave()
>
> in some circumstances. Perhaps a lockdep thing. Or perhaps this was fixed in
> the intervening years and this is no longer the case.

Mmmm, I had wondered the same thing on the release side with
kref_put_spinlock_irqsave() here before:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/linux/kref.h?id=ccf5ae83a6cf3d9cfe9a7038bfe7cd38ab03d5e1

but at some point was convinced it was not an issue anymore..

>
> Such things probably won't be detected during 3.12-rcX because few
> people will test iscsi, I expect. So please be sure that you've
> runtime tested it with all the lockdeppy things enabled.
> Documentation/SubmitChecklist section 12 has a list, but it might be
> dated.

<nod>

FYI, my v3.11-rc5 kernel has been running with CONFIG_LOCKDEP_SUPPORT=y
using vhost/iscsi/iser fabrics on the -v5 patch, and not generated any
warnings (yet).

I'll double check against the other debug options in Section 12.
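
Concretely, the lock-debugging options most likely to catch an
irq/lock ordering problem in this series would be something like
(my reading, not a quote from the checklist):

    CONFIG_PROVE_LOCKING=y
    CONFIG_DEBUG_SPINLOCK=y
    CONFIG_DEBUG_ATOMIC_SLEEP=y
    CONFIG_DEBUG_LOCK_ALLOC=y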

Thank you,

--nab