2008-12-02 01:52:24

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Changes to Linux/SCSI target mode infrastructure for v2.6.28

Greetings Tomo-san and Co,

With the ongoing work in Linux/SCSI for v2.6.28 to map target mode
struct scatterlist memory directly down to struct scsi_cmnd without the
need for a intermediate struct bio as with the existing
scsi_execute_async(), I have started the porting process for the
Linux/SCSI subsystem plugin in generic target core v3.0
(lio-core-2.6.git) on v2.6.28-rc6.

So far, using struct request for ICF_SCSI_CONTROL_NONSG_IO_CDB is up
using blk_rq_map_kern(), as well as ICF_SCSI_NON_DATA_CDB ops using
struct request. In order to get the first READ_10s of type
ICF_SCSI_DATA_SG_IO_CDB to work, I had to add a temporary
EXPORT_SYMBOL_GPL() for drivers/scsi/scsi_lib.c:scsi_req_map_sg() in
lio-core-2.6.git for v2.6.28-rc6 in order to get TYPE_DISK up using an
software emulated MPT-Fusion HBA driver with struct request. I have
been looking at drivers/scsi/scsi_tgt_lib.c() (which currently uses
struct request), and I figure we need something similar for the generic
target infrastructure, although __scsi_get_command() and
__scsi_put_command() are currently used in that code.

Below is what my patch looks like so far, I will probably just end up
commiting an temporary ifdef to keep scsi_execute_async() until the
proper pieces are in place and the other issues are resolved below.
>From there I will be able to drop in the proper upstream mapping bits
for struct scatterlist in
drivers/lio-core/target_core_pscsi.c:pscsi_map_task_SG() get rid of
scsi_req_map_sg() usage all together.

So far during my initial testing, I am running into a two different
exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
trace from SCSI softirq context:

http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png

The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
that happens after a few hundred MB of READ_10 traffic, which also
appears to pass through elv_dequeue_request() at some point:

http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png

Tomo, Boaz, Jens, any comments..?

--nab

diff --git a/drivers/lio-core/target_core_pscsi.c b/drivers/lio-core/target_core_pscsi.c
index ed9f588..5161483 100644
--- a/drivers/lio-core/target_core_pscsi.c
+++ b/drivers/lio-core/target_core_pscsi.c
@@ -37,6 +37,7 @@
#include <linux/spinlock.h>
#include <linux/smp_lock.h>
#include <linux/genhd.h>
+#include <linux/cdrom.h>
#include <linux/file.h>

#include <scsi/scsi.h>
@@ -44,6 +45,7 @@
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>
#include <sd.h>
+#include <sr.h>

#include <iscsi_linux_os.h>
#include <iscsi_linux_defs.h>
@@ -763,11 +765,11 @@ extern void *pscsi_allocate_request (
se_device_t *dev)
{
pscsi_plugin_task_t *pt;
- if (!(pt = kmalloc(sizeof(pscsi_plugin_task_t), GFP_KERNEL))) {
- TRACE_ERROR("Unable to allocate pscsi_plugin_task_t\n");
+
+ if (!(pt = kzalloc(sizeof(pscsi_plugin_task_t), GFP_KERNEL))) {
+ printk(KERN_ERR "Unable to allocate pscsi_plugin_task_t\n");
return(NULL);
}
- memset(pt, 0, sizeof(pscsi_plugin_task_t));

return(pt);
}
@@ -788,7 +790,44 @@ extern void pscsi_get_evpd_sn (unsigned char *buf, u32 size, se_device_t *dev)
return;
}

-/* pscsi_do_task(): (Part of se_subsystem_api_t template)
+static int pscsi_blk_get_request (se_task_t *task)
+{
+ pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
+ pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
+
+ pt->pscsi_req = blk_get_request(pdv->pdv_sd->request_queue,
+ (pt->pscsi_direction == DMA_TO_DEVICE), GFP_KERNEL);
+ if (!(pt->pscsi_req) || IS_ERR(pt->pscsi_req)) {
+ printk(KERN_ERR "PSCSI: blk_get_request() failed: %ld\n",
+ IS_ERR(pt->pscsi_req));
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+ }
+ /*
+ * Defined as "scsi command" in include/linux/blkdev.h.
+ */
+ pt->pscsi_req->cmd_type = REQ_TYPE_BLOCK_PC;
+ /*
+ * Setup the done function pointer for struct request,
+ * also set the end_io_data pointer.to se_task_t.
+ */
+ pt->pscsi_req->end_io = pscsi_req_done;
+ pt->pscsi_req->end_io_data = (void *)task;
+ /*
+ * Load the referenced se_task_t's SCSI CDB into
+ * include/linux/blkdev.h:struct request->cmd
+ */
+ pt->pscsi_req->cmd_len = COMMAND_SIZE(pt->pscsi_cdb[0]);
+ memcpy(pt->pscsi_req->cmd, pt->pscsi_cdb, pt->pscsi_req->cmd_len);
+ /*
+ * Setup pointer for outgoing sense data.
+ */
+ pt->pscsi_req->sense = (void *)&pt->pscsi_sense[0];
+ pt->pscsi_req->sense_len = 0;
+
+ return(0);
+}
+
+/* pscsi_do_task(): (Part of se_subsystem_api_t template)
*
*
*/
@@ -796,17 +835,32 @@ extern int pscsi_do_task (se_task_t *task)
{
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
- int ret;
-
- if ((ret = scsi_execute_async((struct scsi_device *)pdv->pdv_sd,
- pt->pscsi_cdb, COMMAND_SIZE(pt->pscsi_cdb[0]), pt->pscsi_direction,
- pt->pscsi_buf, task->task_size, task->task_sg_num,
- (task->se_obj_api->get_device_type(task->se_obj_ptr) == 0) ?
- PS_TIMEOUT_DISK : PS_TIMEOUT_OTHER, PS_RETRY,
- (void *)task, pscsi_req_done, GFP_KERNEL)) != 0) {
- TRACE_ERROR("PSCSI Execute(): returned: %d\n", ret);
- return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
- }
+ struct gendisk *gd = NULL;
+ /*
+ * Grab pointer to struct gendisk for TYPE_DISK and TYPE_ROM
+ * cases (eg: cases where struct scsi_device has a backing
+ * struct block_device. Also set the struct request->timeout
+ * value based on peripheral device type (from SCSI).
+ */
+ if (pdv->pdv_sd->type == TYPE_DISK) {
+ struct scsi_disk *sdisk = dev_get_drvdata(
+ &pdv->pdv_sd->sdev_gendev);
+ gd = sdisk->disk;
+ pt->pscsi_req->timeout = PS_TIMEOUT_DISK;
+ } else if (pdv->pdv_sd->type == TYPE_ROM) {
+ struct scsi_cd *scd = dev_get_drvdata(
+ &pdv->pdv_sd->sdev_gendev);
+ gd = scd->disk;
+ pt->pscsi_req->timeout = PS_TIMEOUT_OTHER;
+ } else
+ pt->pscsi_req->timeout = PS_TIMEOUT_OTHER;
+
+ pt->pscsi_req->retries = PS_RETRY;
+ /*
+ * Queue the struct request into the struct scsi_device->request_queue.
+ */
+ blk_execute_rq_nowait(pdv->pdv_sd->request_queue, gd,
+ pt->pscsi_req, 1, pscsi_req_done);

return(PYX_TRANSPORT_SENT_TO_TRANSPORT);
}
@@ -817,7 +871,14 @@ extern int pscsi_do_task (se_task_t *task)
*/
extern void pscsi_free_task (se_task_t *task)
{
- kfree(task->transport_req);
+ pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
+
+ if (pt->pscsi_req) {
+ blk_put_request(pt->pscsi_req);
+ pt->pscsi_req = NULL;
+ }
+
+ kfree(pt);
return;
}

@@ -1099,31 +1160,65 @@ extern void __pscsi_get_dev_info (pscsi_dev_virt_t *pdv, char *b, int *bl)
return;
}

-/* pscsi_map_task_SG():
+extern int scsi_req_map_sg(struct request *, struct scatterlist *, int, unsigned , gfp_t );
+
+/* pscsi_map_task_SG():
*
*
*/
-extern void pscsi_map_task_SG (se_task_t *task)
+extern int pscsi_map_task_SG (se_task_t *task)
{
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
+ int ret = 0;
+
pt->pscsi_buf = (void *)task->task_buf;

- return;
+ if (!task->task_size)
+ return(0);
+#if 0
+ if ((ret = blk_rq_map_sg(pdv->pdv_sd->request_queue,
+ pt->pscsi_req,
+ (struct scatterlist *)pt->pscsi_buf)) < 0) {
+ printk(KERN_ERR "PSCSI: blk_rq_map_sg() returned %d\n", ret);
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+ }
+#else
+ if ((ret = scsi_req_map_sg(pt->pscsi_req,
+ (struct scatterlist *)pt->pscsi_buf,
+ task->task_sg_num, task->task_size,
+ GFP_KERNEL)) < 0) {
+ printk(KERN_ERR "PSCSI: scsi_req_map_sg() failed: %d\n", ret);
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+ }
+#endif
+ return(0);
}

/* pscsi_map_task_non_SG():
*
*
*/
-extern void pscsi_map_task_non_SG (se_task_t *task)
+extern int pscsi_map_task_non_SG (se_task_t *task)
{
iscsi_cmd_t *cmd = task->iscsi_cmd;
+ pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
+ pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
unsigned char *buf = (unsigned char *) T_TASK(cmd)->t_task_buf;
+ int ret = 0;

- pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
pt->pscsi_buf = (void *)buf;

- return;
+ if (!task->task_size)
+ return(0);
+
+ if ((ret = blk_rq_map_kern(pdv->pdv_sd->request_queue,
+ pt->pscsi_req, pt->pscsi_buf,
+ task->task_size, GFP_KERNEL)) < 0) {
+ printk(KERN_ERR "PSCSI: blk_rq_map_kern() failed: %d\n", ret);
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+ }
+
+ return(0);
}

/* pscsi_CDB_inquiry():
@@ -1135,9 +1230,11 @@ extern int pscsi_CDB_inquiry (se_task_t *task, u32 size)
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;

pt->pscsi_direction = DMA_FROM_DEVICE;
- pscsi_map_task_non_SG(task);
+
+ if (pscsi_blk_get_request(task) < 0)
+ return(-1);

- return(0);
+ return(pscsi_map_task_non_SG(task));
}

extern int pscsi_CDB_none (se_task_t *task, u32 size)
@@ -1146,7 +1243,7 @@ extern int pscsi_CDB_none (se_task_t *task, u32 size)

pt->pscsi_direction = DMA_NONE;

- return(0);
+ return(pscsi_blk_get_request(task));
}

/* pscsi_CDB_read_non_SG():
@@ -1158,9 +1255,11 @@ extern int pscsi_CDB_read_non_SG (se_task_t *task, u32 size)
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;

pt->pscsi_direction = DMA_FROM_DEVICE;
- pscsi_map_task_non_SG(task);

- return(0);
+ if (pscsi_blk_get_request(task) < 0)
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+
+ return(pscsi_map_task_non_SG(task));
}

/* pscsi_CDB_read_SG():
@@ -1172,7 +1271,12 @@ extern int pscsi_CDB_read_SG (se_task_t *task, u32 size)
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;

pt->pscsi_direction = DMA_FROM_DEVICE;
- pscsi_map_task_SG(task);
+
+ if (pscsi_blk_get_request(task) < 0)
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+
+ if (pscsi_map_task_SG(task) < 0)
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);

return(task->task_sg_num);
}
@@ -1186,9 +1290,11 @@ extern int pscsi_CDB_write_non_SG (se_task_t *task, u32 size)
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;

pt->pscsi_direction = DMA_TO_DEVICE;
- pscsi_map_task_non_SG(task);

- return(0);
+ if (pscsi_blk_get_request(task) < 0)
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+
+ return(pscsi_map_task_non_SG(task));
}

/* pscsi_CDB_write_SG():
@@ -1200,7 +1306,12 @@ extern int pscsi_CDB_write_SG (se_task_t *task, u32 size)
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;

pt->pscsi_direction = DMA_TO_DEVICE;
- pscsi_map_task_SG(task);
+
+ if (pscsi_blk_get_request(task) < 0)
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+
+ if (pscsi_map_task_SG(task) < 0)
+ return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);

return(task->task_sg_num);
}
@@ -1386,22 +1497,25 @@ extern void pscsi_shutdown_hba (se_hba_t *hba)
*
*
*/
-//#warning FIXME: We can do some custom handling of HBA fuckups here.
-extern inline void pscsi_process_SAM_status (se_task_t *task, unsigned char *cdb, int result)
+static inline void pscsi_process_SAM_status (
+ se_task_t *task,
+ pscsi_plugin_task_t *pt)
{
- if ((task->task_scsi_status = status_byte(result))) {
+ if ((task->task_scsi_status = status_byte(pt->pscsi_result))) {
task->task_scsi_status <<= 1;
- PYXPRINT("Parallel SCSI Status Byte exception - ITT: 0x%08x Task: %p CDB: 0x%02x"
- " Result: 0x%08x\n", task->iscsi_cmd->init_task_tag, task, cdb[0], result);
+ PYXPRINT("PSCSI Status Byte exception at task: %p CDB: 0x%02x"
+ " Result: 0x%08x\n", task, pt->pscsi_cdb[0],
+ pt->pscsi_result);
}

- switch (host_byte(result)) {
+ switch (host_byte(pt->pscsi_result)) {
case DID_OK:
transport_complete_task(task, (!task->task_scsi_status));
break;
default:
- PYXPRINT("Parallel SCSI Host Byte exception - ITT: 0x%08x Task: %p CDB: 0x%02x"
- " Result: 0x%08x\n", task->iscsi_cmd->init_task_tag, task, cdb[0], result);
+ PYXPRINT("PSCSI Host Byte exception at task: %p CDB: 0x%02x"
+ " Result: 0x%08x\n", task, pt->pscsi_cdb[0],
+ pt->pscsi_result);
task->task_scsi_status = SAM_STAT_CHECK_CONDITION;
task->task_error_status = PYX_TRANSPORT_UNKNOWN_SAM_OPCODE;
task->iscsi_cmd->transport_error_status = PYX_TRANSPORT_UNKNOWN_SAM_OPCODE;
@@ -1412,21 +1526,17 @@ extern inline void pscsi_process_SAM_status (se_task_t *task, unsigned char *cdb
return;
}

-extern void pscsi_req_done (void *data, char *sense, int result, int data_len)
+extern void pscsi_req_done (struct request *req, int uptodate)
{
- se_task_t *task = (se_task_t *)data;
+ se_task_t *task = (se_task_t *)req->end_io_data;
pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *)task->transport_req;
-#if 0
- printk("pscsi_req_done(): result: %08x, sense: %p data_len: %d\n",
- result, sense, data_len);
-#endif
- pt->pscsi_result = result;
- pt->pscsi_data_len = data_len;
-
- if (result != 0)
- memcpy(pt->pscsi_sense, sense, SCSI_SENSE_BUFFERSIZE);
+
+ pt->pscsi_result = req->errors;
+ pt->pscsi_resid = req->data_len;

- pscsi_process_SAM_status(task, &pt->pscsi_cdb[0], result);
+ pscsi_process_SAM_status(task, pt);
+
+ __blk_put_request(req->q, req);
+
return;
}
-
diff --git a/drivers/lio-core/target_core_pscsi.h b/drivers/lio-core/target_core_pscsi.h
index 980587d..090f0d2 100644
--- a/drivers/lio-core/target_core_pscsi.h
+++ b/drivers/lio-core/target_core_pscsi.h
@@ -90,7 +90,7 @@ extern struct scatterlist *pscsi_get_SG (se_task_t *);
extern u32 pscsi_get_SG_count (se_task_t *);
extern int pscsi_set_non_SG_buf (unsigned char *, se_task_t *);
extern void pscsi_shutdown_hba (struct se_hba_s *);
-extern void pscsi_req_done (void *, char *, int, int);
+extern void pscsi_req_done (struct request *, int);
#endif

#include <linux/device.h>
@@ -104,8 +104,9 @@ typedef struct pscsi_plugin_task_s {
unsigned char pscsi_sense[SCSI_SENSE_BUFFERSIZE];
int pscsi_direction;
int pscsi_result;
- u32 pscsi_data_len;
+ u32 pscsi_resid;
void *pscsi_buf;
+ struct request *pscsi_req;
} pscsi_plugin_task_t;

#define PDF_HAS_CHANNEL_ID 0x01
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5d3b96..9e03a02 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -303,8 +303,8 @@ static void scsi_bi_endio(struct bio *bio, int error)
* request can be sent to the block layer. We do not trust the scatterlist
* sent to use, as some ULDs use that struct to only organize the pages.
*/
-static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
- int nsegs, unsigned bufflen, gfp_t gfp)
+int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
+ int nsegs, unsigned bufflen, gfp_t gfp)
{
struct request_queue *q = rq->q;
int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
@@ -379,6 +379,8 @@ free_bios:
return err;
}

+EXPORT_SYMBOL_GPL(scsi_req_map_sg);
+
/**
* scsi_execute_async - insert request
* @sdev: scsi device


2008-12-02 02:04:56

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Mon, 2008-12-01 at 17:52 -0800, Nicholas A. Bellinger wrote:
> Greetings Tomo-san and Co,
>
> With the ongoing work in Linux/SCSI for v2.6.28 to map target mode
> struct scatterlist memory directly down to struct scsi_cmnd without the
> need for a intermediate struct bio as with the existing
> scsi_execute_async(), I have started the porting process for the
> Linux/SCSI subsystem plugin in generic target core v3.0
> (lio-core-2.6.git) on v2.6.28-rc6.
>
> So far, using struct request for ICF_SCSI_CONTROL_NONSG_IO_CDB is up
> using blk_rq_map_kern(), as well as ICF_SCSI_NON_DATA_CDB ops using
> struct request. In order to get the first READ_10s of type
> ICF_SCSI_DATA_SG_IO_CDB to work, I had to add a temporary
> EXPORT_SYMBOL_GPL() for drivers/scsi/scsi_lib.c:scsi_req_map_sg() in
> lio-core-2.6.git for v2.6.28-rc6 in order to get TYPE_DISK up using an
> software emulated MPT-Fusion HBA driver with struct request. I have
> been looking at drivers/scsi/scsi_tgt_lib.c() (which currently uses
> struct request), and I figure we need something similar for the generic
> target infrastructure, although __scsi_get_command() and
> __scsi_put_command() are currently used in that code.
>
> Below is what my patch looks like so far, I will probably just end up
> commiting an temporary ifdef to keep scsi_execute_async() until the
> proper pieces are in place and the other issues are resolved below.
> >From there I will be able to drop in the proper upstream mapping bits
> for struct scatterlist in
> drivers/lio-core/target_core_pscsi.c:pscsi_map_task_SG() get rid of
> scsi_req_map_sg() usage all together.
>
> So far during my initial testing, I am running into a two different
> exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
> login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
> trace from SCSI softirq context:
>
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png
>
> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> that happens after a few hundred MB of READ_10 traffic, which also
> appears to pass through elv_dequeue_request() at some point:
>
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
>

Ok, I just saw this patch:

[PATCH 2.6.28-rc6] block: internal dequeue shouldn't start timer

at http://lkml.org/lkml/2008/11/27/394.

It sounds very similar and I will try it out and see if it resolves the
issues above.

Thanks,

--nab

2008-12-02 03:11:27

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Mon, 2008-12-01 at 18:04 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2008-12-01 at 17:52 -0800, Nicholas A. Bellinger wrote:
> > Greetings Tomo-san and Co,
> >
> > With the ongoing work in Linux/SCSI for v2.6.28 to map target mode
> > struct scatterlist memory directly down to struct scsi_cmnd without the
> > need for a intermediate struct bio as with the existing
> > scsi_execute_async(), I have started the porting process for the
> > Linux/SCSI subsystem plugin in generic target core v3.0
> > (lio-core-2.6.git) on v2.6.28-rc6.
> >
> > So far, using struct request for ICF_SCSI_CONTROL_NONSG_IO_CDB is up
> > using blk_rq_map_kern(), as well as ICF_SCSI_NON_DATA_CDB ops using
> > struct request. In order to get the first READ_10s of type
> > ICF_SCSI_DATA_SG_IO_CDB to work, I had to add a temporary
> > EXPORT_SYMBOL_GPL() for drivers/scsi/scsi_lib.c:scsi_req_map_sg() in
> > lio-core-2.6.git for v2.6.28-rc6 in order to get TYPE_DISK up using an
> > software emulated MPT-Fusion HBA driver with struct request. I have
> > been looking at drivers/scsi/scsi_tgt_lib.c() (which currently uses
> > struct request), and I figure we need something similar for the generic
> > target infrastructure, although __scsi_get_command() and
> > __scsi_put_command() are currently used in that code.
> >
> > Below is what my patch looks like so far, I will probably just end up
> > commiting an temporary ifdef to keep scsi_execute_async() until the
> > proper pieces are in place and the other issues are resolved below.
> > >From there I will be able to drop in the proper upstream mapping bits
> > for struct scatterlist in
> > drivers/lio-core/target_core_pscsi.c:pscsi_map_task_SG() get rid of
> > scsi_req_map_sg() usage all together.
> >
> > So far during my initial testing, I am running into a two different
> > exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
> > login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
> > trace from SCSI softirq context:
> >
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png
> >
> > The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > that happens after a few hundred MB of READ_10 traffic, which also
> > appears to pass through elv_dequeue_request() at some point:
> >
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> >
>
> Ok, I just saw this patch:
>
> [PATCH 2.6.28-rc6] block: internal dequeue shouldn't start timer
>
> at http://lkml.org/lkml/2008/11/27/394.
>
> It sounds very similar and I will try it out and see if it resolves the
> issues above.
>

Ok, patch applied and rerunning, this time after ~20 Open/iSCSI
--login/--logout ops. The same BUG_ON in blk/blk-timeout.c:177 in
blk_add_timeout() again triggered again, this time coming from
blkdev_dequeue_request() -> scsi_request_fn() ->
__generic_unplugin_device().

http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-5.png

blkdev_dequeue_request() is used in a few other places in drivers/scsi:

target:/mnt/sdb/lio-core-2.6/drivers/scsi# grep blkdev_dequeue_request *
Binary file built-in.o matches
scsi_lib.c: blkdev_dequeue_request(req);
scsi_lib.c: blkdev_dequeue_request(req);
Binary file scsi_lib.o matches
Binary file scsi_mod.o matches
scsi_transport_sas.c: blkdev_dequeue_request(req);

Do these need to be changed to use elv_dequeue_request() as well..?

--nab

2008-12-02 03:25:45

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Mon, 2008-12-01 at 19:10 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2008-12-01 at 18:04 -0800, Nicholas A. Bellinger wrote:
> > On Mon, 2008-12-01 at 17:52 -0800, Nicholas A. Bellinger wrote:
> > > Greetings Tomo-san and Co,
> > >
> > > With the ongoing work in Linux/SCSI for v2.6.28 to map target mode
> > > struct scatterlist memory directly down to struct scsi_cmnd without the
> > > need for a intermediate struct bio as with the existing
> > > scsi_execute_async(), I have started the porting process for the
> > > Linux/SCSI subsystem plugin in generic target core v3.0
> > > (lio-core-2.6.git) on v2.6.28-rc6.
> > >
> > > So far, using struct request for ICF_SCSI_CONTROL_NONSG_IO_CDB is up
> > > using blk_rq_map_kern(), as well as ICF_SCSI_NON_DATA_CDB ops using
> > > struct request. In order to get the first READ_10s of type
> > > ICF_SCSI_DATA_SG_IO_CDB to work, I had to add a temporary
> > > EXPORT_SYMBOL_GPL() for drivers/scsi/scsi_lib.c:scsi_req_map_sg() in
> > > lio-core-2.6.git for v2.6.28-rc6 in order to get TYPE_DISK up using an
> > > software emulated MPT-Fusion HBA driver with struct request. I have
> > > been looking at drivers/scsi/scsi_tgt_lib.c() (which currently uses
> > > struct request), and I figure we need something similar for the generic
> > > target infrastructure, although __scsi_get_command() and
> > > __scsi_put_command() are currently used in that code.
> > >
> > > Below is what my patch looks like so far, I will probably just end up
> > > commiting an temporary ifdef to keep scsi_execute_async() until the
> > > proper pieces are in place and the other issues are resolved below.
> > > >From there I will be able to drop in the proper upstream mapping bits
> > > for struct scatterlist in
> > > drivers/lio-core/target_core_pscsi.c:pscsi_map_task_SG() get rid of
> > > scsi_req_map_sg() usage all together.
> > >
> > > So far during my initial testing, I am running into a two different
> > > exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
> > > login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
> > > trace from SCSI softirq context:
> > >
> > > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
> > > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png
> > >
> > > The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > that happens after a few hundred MB of READ_10 traffic, which also
> > > appears to pass through elv_dequeue_request() at some point:
> > >
> > > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > >
> >
> > Ok, I just saw this patch:
> >
> > [PATCH 2.6.28-rc6] block: internal dequeue shouldn't start timer
> >
> > at http://lkml.org/lkml/2008/11/27/394.
> >
> > It sounds very similar and I will try it out and see if it resolves the
> > issues above.
> >
>
> Ok, patch applied and rerunning, this time after ~20 Open/iSCSI
> --login/--logout ops. The same BUG_ON in blk/blk-timeout.c:177 in
> blk_add_timeout() again triggered again, this time coming from
> blkdev_dequeue_request() -> scsi_request_fn() ->
> __generic_unplugin_device().
>
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-5.png
>
> blkdev_dequeue_request() is used in a few other places in drivers/scsi:
>
> target:/mnt/sdb/lio-core-2.6/drivers/scsi# grep blkdev_dequeue_request *
> Binary file built-in.o matches
> scsi_lib.c: blkdev_dequeue_request(req);
> scsi_lib.c: blkdev_dequeue_request(req);
> Binary file scsi_lib.o matches
> Binary file scsi_mod.o matches
> scsi_transport_sas.c: blkdev_dequeue_request(req);
>
> Do these need to be changed to use elv_dequeue_request() as well..?
>

Ok, I am up and running using the following patch against v2.6.28-rc6
(along with Tejun's patch). Comments please..?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5d3b96..77f1fe0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1505,7 +1507,7 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
struct scsi_target *starget = scsi_target(sdev);
struct Scsi_Host *shost = sdev->host;

- blkdev_dequeue_request(req);
+ elv_dequeue_request(req->q, req);

if (unlikely(cmd == NULL)) {
printk(KERN_CRIT "impossible request in %s.\n",
@@ -1634,7 +1636,7 @@ static void scsi_request_fn(struct request_queue *q)
* Remove the request from the request list.
*/
if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
- blkdev_dequeue_request(req);
+ elv_dequeue_request(req->q, req);
sdev->device_busy++;

spin_unlock(q->queue_lock);

Also, blkdev_dequeue_request() is still used in drivers/scsi/scsi_transport_sas.c()..

Thanks,

--nab

2008-12-02 03:44:00

by Tejun Heo

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

Hello, Nicholas.

Nicholas A. Bellinger wrote:
> Ok, I am up and running using the following patch against v2.6.28-rc6
> (along with Tejun's patch). Comments please..?
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f5d3b96..77f1fe0 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1505,7 +1507,7 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
> struct scsi_target *starget = scsi_target(sdev);
> struct Scsi_Host *shost = sdev->host;
>
> - blkdev_dequeue_request(req);
> + elv_dequeue_request(req->q, req);

I don't think this really matters. Either one wouldn't really matter.

> if (unlikely(cmd == NULL)) {
> printk(KERN_CRIT "impossible request in %s.\n",
> @@ -1634,7 +1636,7 @@ static void scsi_request_fn(struct request_queue *q)
> * Remove the request from the request list.
> */
> if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
> - blkdev_dequeue_request(req);
> + elv_dequeue_request(req->q, req);
> sdev->device_busy++;
>
> spin_unlock(q->queue_lock);

And this change is incorrect. Timer should start here. I don't think
you're seeing the same problem. I'll reply to the original message.

Thanks.

--
tejun

2008-12-02 04:18:49

by Tejun Heo

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

Nicholas A. Bellinger wrote:
>>> So far during my initial testing, I am running into a two different
>>> exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
>>> login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
>>> trace from SCSI softirq context:
>>>
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png

Can you build with debug info and find out which line is the offending
one?

>>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
>>> that happens after a few hundred MB of READ_10 traffic, which also
>>> appears to pass through elv_dequeue_request() at some point:
>>>
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png

Hmmm... this means blk_add_timer() is being called after the request
is already completed. All the problem discovered till now have to do
with timeout going off without the low level driver knowing about the
request. I don't have much idea and it'll probably be best to trace
what's going on using blktrace or printks. Maybe this is caused by
list corruption as with the first issue or request completion races
with requeueing?

--
tejun

2008-12-02 05:05:53

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> Nicholas A. Bellinger wrote:
> >>> So far during my initial testing, I am running into a two different
> >>> exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
> >>> login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
> >>> trace from SCSI softirq context:
> >>>
> >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
> >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png
>
> Can you build with debug info and find out which line is the offending
> one?

(gdb) list *(elv_dequeue_request+0x1d)
0x1320 is in elv_dequeue_request (include/linux/list.h:92).
87 * This is only for internal list manipulation where we know
88 * the prev/next entries already!
89 */
90 static inline void __list_del(struct list_head * prev, struct
list_head * next)
91 {
92 next->prev = prev;
93 prev->next = next;
94 }
95
96 /**
(gdb) list *(elv_dequeue_request+0x19)
0x131c is in elv_dequeue_request (block/elevator.c:836).
831 EXPORT_SYMBOL(elv_next_request);
832
833 void elv_dequeue_request(struct request_queue *q, struct request
*rq)
834 {
835 BUG_ON(list_empty(&rq->queuelist));
836 BUG_ON(ELV_ON_HASH(rq));
837
838 list_del_init(&rq->queuelist);
839
840 /*


>
> >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> >>> that happens after a few hundred MB of READ_10 traffic, which also
> >>> appears to pass through elv_dequeue_request() at some point:
> >>>
> >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
>
> Hmmm... this means blk_add_timer() is being called after the request
> is already completed. All the problem discovered till now have to do
> with timeout going off without the low level driver knowing about the
> request. I don't have much idea and it'll probably be best to trace
> what's going on using blktrace or printks.

<nod>, OK.

> Maybe this is caused by
> list corruption as with the first issue or request completion races
> with requeueing?

Hrmm, yeah, perhaps the use of scsi_req_map_sg() (which obviously still
has struct bio behind it) is causing the breakage.. I will wait for
Tomo, Boaz and co to have a look at the original patch to
lio-core-2.6.git/drivers/lio-core/target_core_pscsi.c and see if I am
missing something obvious.

Also, with the previous patch to drivers/scsi/scsi_lib.c(), I am able to
move a few GB of bulk I/O and not hit the BUG_ON in
blk/blk-timeout.c:177 in blk_add_timeout() mentioned above when feeding
struct request into struct scsi_device->request_queue with
blk_execute_rq_nowait() with use_sg > 0 CDBs. However, I am still
running into another reproducable BUG_ON in
block/cfq-iosched.c:cfq_find_next_rq() after extended bulk I/O tests.

Many thanks for your comments,

--nab

>

2008-12-02 06:40:55

by Mike Anderson

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

Nicholas A. Bellinger <[email protected]> wrote:
> On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> >
> > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > >>> appears to pass through elv_dequeue_request() at some point:
> > >>>
> > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> >
> > Hmmm... this means blk_add_timer() is being called after the request
> > is already completed.

or is it possible since elv_dequeue_request BUG_ON check of queuelist did
not trigger a request is on the queuelist with a timeout_list not empty.

It would be interesting for a debug run to change the
"BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
the cmd_flags plus req->atomic_flags and also add a
"BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
is never added to the queuelist with a timeout_list not empty.

> All the problem discovered till now have to do
> > with timeout going off without the low level driver knowing about the
> > request. I don't have much idea and it'll probably be best to trace
> > what's going on using blktrace or printks.
>
> <nod>, OK.
>
> > Maybe this is caused by
> > list corruption as with the first issue or request completion races
> > with requeueing?
>
> Hrmm, yeah, perhaps the use of scsi_req_map_sg() (which obviously still
> has struct bio behind it) is causing the breakage.. I will wait for
> Tomo, Boaz and co to have a look at the original patch to
> lio-core-2.6.git/drivers/lio-core/target_core_pscsi.c and see if I am
> missing something obvious.
>
> Also, with the previous patch to drivers/scsi/scsi_lib.c(), I am able to
> move a few GB of bulk I/O and not hit the BUG_ON in
> blk/blk-timeout.c:177 in blk_add_timeout() mentioned above when feeding
> struct request into struct scsi_device->request_queue with
> blk_execute_rq_nowait() with use_sg > 0 CDBs. However, I am still
> running into another reproducable BUG_ON in
> block/cfq-iosched.c:cfq_find_next_rq() after extended bulk I/O tests.
>

-andmike
--
Michael Anderson
[email protected]

2008-12-02 07:49:40

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Mon, 2008-12-01 at 22:40 -0800, Mike Anderson wrote:
> Nicholas A. Bellinger <[email protected]> wrote:
> > On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> > >
> > > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > > >>> appears to pass through elv_dequeue_request() at some point:
> > > >>>
> > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > >
> > > Hmmm... this means blk_add_timer() is being called after the request
> > > is already completed.
>
> or is it possible since elv_dequeue_request BUG_ON check of queuelist did
> not trigger a request is on the queuelist with a timeout_list not empty.
>
> It would be interesting for a debug run to change the
> "BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
> the cmd_flags plus req->atomic_flags and also add a
> "BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
> is never added to the queuelist with a timeout_list not empty.
>

Sounds good. I will retest w/o the previous changes to scsi_lib.c for
elv_dequeue_request() and see what happens.

Thanks for your comments,

--nab

2008-12-02 08:31:00

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Mon, 2008-12-01 at 22:40 -0800, Mike Anderson wrote:
> Nicholas A. Bellinger <[email protected]> wrote:
> > On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> > >
> > > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > > >>> appears to pass through elv_dequeue_request() at some point:
> > > >>>
> > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > >
> > > Hmmm... this means blk_add_timer() is being called after the request
> > > is already completed.
>
> or is it possible since elv_dequeue_request BUG_ON check of queuelist did
> not trigger a request is on the queuelist with a timeout_list not empty.
>
> It would be interesting for a debug run to change the
> "BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
> the cmd_flags plus req->atomic_flags and also add a
> "BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
> is never added to the queuelist with a timeout_list not empty.
>

Ok, so blk_dump_rq_flags() is now being called in
block/blk-timeout.c:blk_add_timer() for the case
BUG_ON(list_empty(&req->timeout_list)) case:

http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-6.png

Hmm, the outputted "sector " range is definately is bogus, as the only
READ_10 that have been sent are at LBA offset 0 for 8 * 512 byte sectors
for the partition table during Open/iSCSI LUN scanning.

--nab

2008-12-02 08:35:43

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Tue, 2008-12-02 at 00:30 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2008-12-01 at 22:40 -0800, Mike Anderson wrote:
> > Nicholas A. Bellinger <[email protected]> wrote:
> > > On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> > > >
> > > > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > > > >>> appears to pass through elv_dequeue_request() at some point:
> > > > >>>
> > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > > >
> > > > Hmmm... this means blk_add_timer() is being called after the request
> > > > is already completed.
> >
> > or is it possible since elv_dequeue_request BUG_ON check of queuelist did
> > not trigger a request is on the queuelist with a timeout_list not empty.
> >
> > It would be interesting for a debug run to change the
> > "BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
> > the cmd_flags plus req->atomic_flags and also add a
> > "BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
> > is never added to the queuelist with a timeout_list not empty.
> >
>
> Ok, so blk_dump_rq_flags() is now being called in
> block/blk-timeout.c:blk_add_timer() for the case
> BUG_ON(list_empty(&req->timeout_list)) case:
>
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-6.png
>
> Hmm, the outputted "sector " range is definately is bogus, as the only
> READ_10 that have been sent are at LBA offset 0 for 8 * 512 byte sectors
> for the partition table during Open/iSCSI LUN scanning.
>

Also, the following code from block/blk-core.c:blk_dump_rq_flags():

if (blk_pc_request(rq)) {
printk(KERN_INFO " cdb: ");
for (bit = 0; bit < BLK_MAX_CDB; bit++)
printk("%02x ", rq->cmd[bit]);
printk("\n");
}

is not printing out the copied CDB in struct request->cmd[], which makes
me think the struct request->cmd_flags (that blk_pc_request() is
checking) are also bogus when blk_add_timer() is being called..

--nab

> --nab
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2008-12-02 08:39:48

by Boaz Harrosh

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

Nicholas A. Bellinger wrote:
> Greetings Tomo-san and Co,
>
> With the ongoing work in Linux/SCSI for v2.6.28 to map target mode
> struct scatterlist memory directly down to struct scsi_cmnd without the
> need for a intermediate struct bio as with the existing
> scsi_execute_async(), I have started the porting process for the
> Linux/SCSI subsystem plugin in generic target core v3.0
> (lio-core-2.6.git) on v2.6.28-rc6.
>
> So far, using struct request for ICF_SCSI_CONTROL_NONSG_IO_CDB is up
> using blk_rq_map_kern(), as well as ICF_SCSI_NON_DATA_CDB ops using
> struct request. In order to get the first READ_10s of type
> ICF_SCSI_DATA_SG_IO_CDB to work, I had to add a temporary
> EXPORT_SYMBOL_GPL() for drivers/scsi/scsi_lib.c:scsi_req_map_sg() in
> lio-core-2.6.git for v2.6.28-rc6 in order to get TYPE_DISK up using an
> software emulated MPT-Fusion HBA driver with struct request. I have
> been looking at drivers/scsi/scsi_tgt_lib.c() (which currently uses
> struct request), and I figure we need something similar for the generic
> target infrastructure, although __scsi_get_command() and
> __scsi_put_command() are currently used in that code.
>
> Below is what my patch looks like so far, I will probably just end up
> commiting an temporary ifdef to keep scsi_execute_async() until the
> proper pieces are in place and the other issues are resolved below.
>>From there I will be able to drop in the proper upstream mapping bits
> for struct scatterlist in
> drivers/lio-core/target_core_pscsi.c:pscsi_map_task_SG() get rid of
> scsi_req_map_sg() usage all together.
>
> So far during my initial testing, I am running into a two different
> exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
> login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
> trace from SCSI softirq context:
>
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png
>
> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> that happens after a few hundred MB of READ_10 traffic, which also
> appears to pass through elv_dequeue_request() at some point:
>
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
>
> Tomo, Boaz, Jens, any comments..?
>
> --nab
>
> diff --git a/drivers/lio-core/target_core_pscsi.c b/drivers/lio-core/target_core_pscsi.c
> index ed9f588..5161483 100644
> --- a/drivers/lio-core/target_core_pscsi.c
> +++ b/drivers/lio-core/target_core_pscsi.c
> @@ -37,6 +37,7 @@
> #include <linux/spinlock.h>
> #include <linux/smp_lock.h>
> #include <linux/genhd.h>
> +#include <linux/cdrom.h>
> #include <linux/file.h>
>
> #include <scsi/scsi.h>
> @@ -44,6 +45,7 @@
> #include <scsi/scsi_cmnd.h>
> #include <scsi/scsi_host.h>
> #include <sd.h>
> +#include <sr.h>
>
> #include <iscsi_linux_os.h>
> #include <iscsi_linux_defs.h>
> @@ -763,11 +765,11 @@ extern void *pscsi_allocate_request (
> se_device_t *dev)
> {
> pscsi_plugin_task_t *pt;
> - if (!(pt = kmalloc(sizeof(pscsi_plugin_task_t), GFP_KERNEL))) {
> - TRACE_ERROR("Unable to allocate pscsi_plugin_task_t\n");
> +
> + if (!(pt = kzalloc(sizeof(pscsi_plugin_task_t), GFP_KERNEL))) {
> + printk(KERN_ERR "Unable to allocate pscsi_plugin_task_t\n");
> return(NULL);
> }
> - memset(pt, 0, sizeof(pscsi_plugin_task_t));
>
> return(pt);
> }
> @@ -788,7 +790,44 @@ extern void pscsi_get_evpd_sn (unsigned char *buf, u32 size, se_device_t *dev)
> return;
> }
>
> -/* pscsi_do_task(): (Part of se_subsystem_api_t template)
> +static int pscsi_blk_get_request (se_task_t *task)
> +{
> + pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> + pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
> +
> + pt->pscsi_req = blk_get_request(pdv->pdv_sd->request_queue,
> + (pt->pscsi_direction == DMA_TO_DEVICE), GFP_KERNEL);
> + if (!(pt->pscsi_req) || IS_ERR(pt->pscsi_req)) {
> + printk(KERN_ERR "PSCSI: blk_get_request() failed: %ld\n",
> + IS_ERR(pt->pscsi_req));
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> + }
> + /*
> + * Defined as "scsi command" in include/linux/blkdev.h.
> + */
> + pt->pscsi_req->cmd_type = REQ_TYPE_BLOCK_PC;
> + /*
> + * Setup the done function pointer for struct request,
> + * also set the end_io_data pointer.to se_task_t.
> + */
> + pt->pscsi_req->end_io = pscsi_req_done;
> + pt->pscsi_req->end_io_data = (void *)task;
> + /*
> + * Load the referenced se_task_t's SCSI CDB into
> + * include/linux/blkdev.h:struct request->cmd
> + */
> + pt->pscsi_req->cmd_len = COMMAND_SIZE(pt->pscsi_cdb[0]);
> + memcpy(pt->pscsi_req->cmd, pt->pscsi_cdb, pt->pscsi_req->cmd_len);

pt->pscsi_cdb is that a static sized buffer? What do you do with larger
commands received on the iscsi wire. Are they dropped? If you did have
the full > 16 CDB in some buffer you could do:

+ pt->pscsi_req->cmd_len = scsi_command_size(pt->pscsi_cdb);
+ pt->pscsi_req->cmd = pt->pscsi_cdb;

> + /*
> + * Setup pointer for outgoing sense data.
> + */
> + pt->pscsi_req->sense = (void *)&pt->pscsi_sense[0];
> + pt->pscsi_req->sense_len = 0;
> +
> + return(0);
> +}
> +
> +/* pscsi_do_task(): (Part of se_subsystem_api_t template)
> *
> *
> */
> @@ -796,17 +835,32 @@ extern int pscsi_do_task (se_task_t *task)
> {
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
> - int ret;
> -
> - if ((ret = scsi_execute_async((struct scsi_device *)pdv->pdv_sd,
> - pt->pscsi_cdb, COMMAND_SIZE(pt->pscsi_cdb[0]), pt->pscsi_direction,
> - pt->pscsi_buf, task->task_size, task->task_sg_num,
> - (task->se_obj_api->get_device_type(task->se_obj_ptr) == 0) ?
> - PS_TIMEOUT_DISK : PS_TIMEOUT_OTHER, PS_RETRY,
> - (void *)task, pscsi_req_done, GFP_KERNEL)) != 0) {
> - TRACE_ERROR("PSCSI Execute(): returned: %d\n", ret);
> - return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> - }
> + struct gendisk *gd = NULL;
> + /*
> + * Grab pointer to struct gendisk for TYPE_DISK and TYPE_ROM
> + * cases (eg: cases where struct scsi_device has a backing
> + * struct block_device. Also set the struct request->timeout
> + * value based on peripheral device type (from SCSI).
> + */
> + if (pdv->pdv_sd->type == TYPE_DISK) {
> + struct scsi_disk *sdisk = dev_get_drvdata(
> + &pdv->pdv_sd->sdev_gendev);
> + gd = sdisk->disk;
> + pt->pscsi_req->timeout = PS_TIMEOUT_DISK;
> + } else if (pdv->pdv_sd->type == TYPE_ROM) {
> + struct scsi_cd *scd = dev_get_drvdata(
> + &pdv->pdv_sd->sdev_gendev);
> + gd = scd->disk;
> + pt->pscsi_req->timeout = PS_TIMEOUT_OTHER;
> + } else
> + pt->pscsi_req->timeout = PS_TIMEOUT_OTHER;
> +
> + pt->pscsi_req->retries = PS_RETRY;
> + /*
> + * Queue the struct request into the struct scsi_device->request_queue.
> + */
> + blk_execute_rq_nowait(pdv->pdv_sd->request_queue, gd,
> + pt->pscsi_req, 1, pscsi_req_done);
>
> return(PYX_TRANSPORT_SENT_TO_TRANSPORT);
> }
> @@ -817,7 +871,14 @@ extern int pscsi_do_task (se_task_t *task)
> */
> extern void pscsi_free_task (se_task_t *task)
> {
> - kfree(task->transport_req);
> + pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> +
> + if (pt->pscsi_req) {
> + blk_put_request(pt->pscsi_req);
> + pt->pscsi_req = NULL;
> + }
> +
> + kfree(pt);
> return;
> }
>
> @@ -1099,31 +1160,65 @@ extern void __pscsi_get_dev_info (pscsi_dev_virt_t *pdv, char *b, int *bl)
> return;
> }
>
> -/* pscsi_map_task_SG():
> +extern int scsi_req_map_sg(struct request *, struct scatterlist *, int, unsigned , gfp_t );
> +
> +/* pscsi_map_task_SG():
> *
> *
> */
> -extern void pscsi_map_task_SG (se_task_t *task)
> +extern int pscsi_map_task_SG (se_task_t *task)
> {
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> + int ret = 0;
> +
> pt->pscsi_buf = (void *)task->task_buf;
>
> - return;
> + if (!task->task_size)
> + return(0);
> +#if 0
> + if ((ret = blk_rq_map_sg(pdv->pdv_sd->request_queue,
> + pt->pscsi_req,
> + (struct scatterlist *)pt->pscsi_buf)) < 0) {
> + printk(KERN_ERR "PSCSI: blk_rq_map_sg() returned %d\n", ret);
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> + }
> +#else
> + if ((ret = scsi_req_map_sg(pt->pscsi_req,
> + (struct scatterlist *)pt->pscsi_buf,
> + task->task_sg_num, task->task_size,
> + GFP_KERNEL)) < 0) {
> + printk(KERN_ERR "PSCSI: scsi_req_map_sg() failed: %d\n", ret);
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> + }
> +#endif
> + return(0);
> }
>
> /* pscsi_map_task_non_SG():
> *
> *
> */
> -extern void pscsi_map_task_non_SG (se_task_t *task)
> +extern int pscsi_map_task_non_SG (se_task_t *task)
> {
> iscsi_cmd_t *cmd = task->iscsi_cmd;
> + pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> + pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
> unsigned char *buf = (unsigned char *) T_TASK(cmd)->t_task_buf;
> + int ret = 0;
>
> - pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> pt->pscsi_buf = (void *)buf;
>
> - return;
> + if (!task->task_size)
> + return(0);
> +
> + if ((ret = blk_rq_map_kern(pdv->pdv_sd->request_queue,
> + pt->pscsi_req, pt->pscsi_buf,
> + task->task_size, GFP_KERNEL)) < 0) {
> + printk(KERN_ERR "PSCSI: blk_rq_map_kern() failed: %d\n", ret);
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> + }
> +
> + return(0);
> }
>
> /* pscsi_CDB_inquiry():
> @@ -1135,9 +1230,11 @@ extern int pscsi_CDB_inquiry (se_task_t *task, u32 size)
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
>
> pt->pscsi_direction = DMA_FROM_DEVICE;
> - pscsi_map_task_non_SG(task);
> +
> + if (pscsi_blk_get_request(task) < 0)
> + return(-1);
>
> - return(0);
> + return(pscsi_map_task_non_SG(task));
> }
>
> extern int pscsi_CDB_none (se_task_t *task, u32 size)
> @@ -1146,7 +1243,7 @@ extern int pscsi_CDB_none (se_task_t *task, u32 size)
>
> pt->pscsi_direction = DMA_NONE;
>
> - return(0);
> + return(pscsi_blk_get_request(task));
> }
>
> /* pscsi_CDB_read_non_SG():
> @@ -1158,9 +1255,11 @@ extern int pscsi_CDB_read_non_SG (se_task_t *task, u32 size)
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
>
> pt->pscsi_direction = DMA_FROM_DEVICE;
> - pscsi_map_task_non_SG(task);
>
> - return(0);
> + if (pscsi_blk_get_request(task) < 0)
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> +
> + return(pscsi_map_task_non_SG(task));
> }
>
> /* pscsi_CDB_read_SG():
> @@ -1172,7 +1271,12 @@ extern int pscsi_CDB_read_SG (se_task_t *task, u32 size)
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
>
> pt->pscsi_direction = DMA_FROM_DEVICE;
> - pscsi_map_task_SG(task);
> +
> + if (pscsi_blk_get_request(task) < 0)
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> +
> + if (pscsi_map_task_SG(task) < 0)
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
>
> return(task->task_sg_num);
> }
> @@ -1186,9 +1290,11 @@ extern int pscsi_CDB_write_non_SG (se_task_t *task, u32 size)
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
>
> pt->pscsi_direction = DMA_TO_DEVICE;
> - pscsi_map_task_non_SG(task);
>
> - return(0);
> + if (pscsi_blk_get_request(task) < 0)
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> +
> + return(pscsi_map_task_non_SG(task));
> }
>
> /* pscsi_CDB_write_SG():
> @@ -1200,7 +1306,12 @@ extern int pscsi_CDB_write_SG (se_task_t *task, u32 size)
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
>
> pt->pscsi_direction = DMA_TO_DEVICE;
> - pscsi_map_task_SG(task);
> +
> + if (pscsi_blk_get_request(task) < 0)
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> +
> + if (pscsi_map_task_SG(task) < 0)
> + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
>
> return(task->task_sg_num);
> }
> @@ -1386,22 +1497,25 @@ extern void pscsi_shutdown_hba (se_hba_t *hba)
> *
> *
> */
> -//#warning FIXME: We can do some custom handling of HBA fuckups here.
> -extern inline void pscsi_process_SAM_status (se_task_t *task, unsigned char *cdb, int result)
> +static inline void pscsi_process_SAM_status (
> + se_task_t *task,
> + pscsi_plugin_task_t *pt)
> {
> - if ((task->task_scsi_status = status_byte(result))) {
> + if ((task->task_scsi_status = status_byte(pt->pscsi_result))) {
> task->task_scsi_status <<= 1;
> - PYXPRINT("Parallel SCSI Status Byte exception - ITT: 0x%08x Task: %p CDB: 0x%02x"
> - " Result: 0x%08x\n", task->iscsi_cmd->init_task_tag, task, cdb[0], result);
> + PYXPRINT("PSCSI Status Byte exception at task: %p CDB: 0x%02x"
> + " Result: 0x%08x\n", task, pt->pscsi_cdb[0],
> + pt->pscsi_result);
> }
>
> - switch (host_byte(result)) {
> + switch (host_byte(pt->pscsi_result)) {
> case DID_OK:
> transport_complete_task(task, (!task->task_scsi_status));
> break;
> default:
> - PYXPRINT("Parallel SCSI Host Byte exception - ITT: 0x%08x Task: %p CDB: 0x%02x"
> - " Result: 0x%08x\n", task->iscsi_cmd->init_task_tag, task, cdb[0], result);
> + PYXPRINT("PSCSI Host Byte exception at task: %p CDB: 0x%02x"
> + " Result: 0x%08x\n", task, pt->pscsi_cdb[0],
> + pt->pscsi_result);
> task->task_scsi_status = SAM_STAT_CHECK_CONDITION;
> task->task_error_status = PYX_TRANSPORT_UNKNOWN_SAM_OPCODE;
> task->iscsi_cmd->transport_error_status = PYX_TRANSPORT_UNKNOWN_SAM_OPCODE;
> @@ -1412,21 +1526,17 @@ extern inline void pscsi_process_SAM_status (se_task_t *task, unsigned char *cdb
> return;
> }
>
> -extern void pscsi_req_done (void *data, char *sense, int result, int data_len)
> +extern void pscsi_req_done (struct request *req, int uptodate)
> {
> - se_task_t *task = (se_task_t *)data;
> + se_task_t *task = (se_task_t *)req->end_io_data;
> pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *)task->transport_req;
> -#if 0
> - printk("pscsi_req_done(): result: %08x, sense: %p data_len: %d\n",
> - result, sense, data_len);
> -#endif
> - pt->pscsi_result = result;
> - pt->pscsi_data_len = data_len;
> -
> - if (result != 0)
> - memcpy(pt->pscsi_sense, sense, SCSI_SENSE_BUFFERSIZE);
> +
> + pt->pscsi_result = req->errors;
> + pt->pscsi_resid = req->data_len;
>
> - pscsi_process_SAM_status(task, &pt->pscsi_cdb[0], result);
> + pscsi_process_SAM_status(task, pt);
> +
> + __blk_put_request(req->q, req);
> +
> return;
> }
> -
> diff --git a/drivers/lio-core/target_core_pscsi.h b/drivers/lio-core/target_core_pscsi.h
> index 980587d..090f0d2 100644
> --- a/drivers/lio-core/target_core_pscsi.h
> +++ b/drivers/lio-core/target_core_pscsi.h
> @@ -90,7 +90,7 @@ extern struct scatterlist *pscsi_get_SG (se_task_t *);
> extern u32 pscsi_get_SG_count (se_task_t *);
> extern int pscsi_set_non_SG_buf (unsigned char *, se_task_t *);
> extern void pscsi_shutdown_hba (struct se_hba_s *);
> -extern void pscsi_req_done (void *, char *, int, int);
> +extern void pscsi_req_done (struct request *, int);
> #endif
>
> #include <linux/device.h>
> @@ -104,8 +104,9 @@ typedef struct pscsi_plugin_task_s {
> unsigned char pscsi_sense[SCSI_SENSE_BUFFERSIZE];
> int pscsi_direction;
> int pscsi_result;
> - u32 pscsi_data_len;
> + u32 pscsi_resid;
> void *pscsi_buf;

- void *pscsi_buf;
+ struct scatterlist kern_buff;
+ struct scatterlist *pscsi_sg;

See comment bellow, you will need to change all that code,
functions APIs the works

> + struct request *pscsi_req;
> } pscsi_plugin_task_t;
>
> #define PDF_HAS_CHANNEL_ID 0x01
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f5d3b96..9e03a02 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -303,8 +303,8 @@ static void scsi_bi_endio(struct bio *bio, int error)
> * request can be sent to the block layer. We do not trust the scatterlist
> * sent to use, as some ULDs use that struct to only organize the pages.
> */
> -static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
> - int nsegs, unsigned bufflen, gfp_t gfp)
> +int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
> + int nsegs, unsigned bufflen, gfp_t gfp)
> {
> struct request_queue *q = rq->q;
> int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> @@ -379,6 +379,8 @@ free_bios:
> return err;
> }
>
> +EXPORT_SYMBOL_GPL(scsi_req_map_sg);
> +

Hmmm this is going away from here soon I hope. Better copy past it into your
own code. Perhaps optimize to your needs.

> /**
> * scsi_execute_async - insert request
> * @sdev: scsi device
>
>

Do you have this code on a gitweb somewhere. I'm to lazy to download everything
here?

One thing I can not see is where you get your "(struct scatterlist *)pt->pscsi_buf"
scatterlist from? is that from the network layer? don't you get them one by one
from the network layer and copy them to your own allocated scatterlist array?
Do you actually receive a network allocated scatterlist array? because if not
you should go directly to BIOs and drop that scatterlist stuff.

Also that void* SG/NON_SG type cast crap is not acceptable any more. We worked
very hard to clean all that up. You need to hold typed scatterlist pointer and if
you happen to have single kernel buffers for submission you hold an struct scatterlist
somewhere and do an sg_init_one() on the kernel buffer. For me this is preliminary to
even consider this code.

Boaz

2008-12-02 09:13:59

by Mike Anderson

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

Nicholas A. Bellinger <[email protected]> wrote:
> On Tue, 2008-12-02 at 00:30 -0800, Nicholas A. Bellinger wrote:
> > On Mon, 2008-12-01 at 22:40 -0800, Mike Anderson wrote:
> > > Nicholas A. Bellinger <[email protected]> wrote:
> > > > On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> > > > >
> > > > > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > > > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > > > > >>> appears to pass through elv_dequeue_request() at some point:
> > > > > >>>
> > > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > > > >
> > > > > Hmmm... this means blk_add_timer() is being called after the request
> > > > > is already completed.
> > >
> > > or is it possible since elv_dequeue_request BUG_ON check of queuelist did
> > > not trigger a request is on the queuelist with a timeout_list not empty.
> > >
> > > It would be interesting for a debug run to change the
> > > "BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
> > > the cmd_flags plus req->atomic_flags and also add a
> > > "BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
> > > is never added to the queuelist with a timeout_list not empty.
> > >
> >
> > Ok, so blk_dump_rq_flags() is now being called in
> > block/blk-timeout.c:blk_add_timer() for the case
> > BUG_ON(list_empty(&req->timeout_list)) case:
> >
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-6.png
> >
> > Hmm, the outputted "sector " range is definately is bogus, as the only
> > READ_10 that have been sent are at LBA offset 0 for 8 * 512 byte sectors
> > for the partition table during Open/iSCSI LUN scanning.
> >
>
> Also, the following code from block/blk-core.c:blk_dump_rq_flags():
>
> if (blk_pc_request(rq)) {
> printk(KERN_INFO " cdb: ");
> for (bit = 0; bit < BLK_MAX_CDB; bit++)
> printk("%02x ", rq->cmd[bit]);
> printk("\n");
> }
>
> is not printing out the copied CDB in struct request->cmd[], which makes
> me think the struct request->cmd_flags (that blk_pc_request() is
> checking) are also bogus when blk_add_timer() is being called..
>
Based on the output from 2.6.28-rc6-oops-6.png the flags value of 82c21
looks like
(__REQ_ALLOCED,__REQ_ELVPRIV,__REQ_DONTPREP,__REQ_STARTED,__REQ_SORTED,__REQ_RW),
but it is late so someone maybe should check my shift counts.

The type is also REQ_TYPE_FS so the blk_pc cdb: info will not be printed.

I will talk to you more in the morning.

> --nab
>
> > --nab
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-andmike
--
Michael Anderson
[email protected]

2008-12-02 10:34:54

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Tue, 2008-12-02 at 10:39 +0200, Boaz Harrosh wrote:
> Nicholas A. Bellinger wrote:
> > Greetings Tomo-san and Co,
> >
> > With the ongoing work in Linux/SCSI for v2.6.28 to map target mode
> > struct scatterlist memory directly down to struct scsi_cmnd without the
> > need for a intermediate struct bio as with the existing
> > scsi_execute_async(), I have started the porting process for the
> > Linux/SCSI subsystem plugin in generic target core v3.0
> > (lio-core-2.6.git) on v2.6.28-rc6.
> >
> > So far, using struct request for ICF_SCSI_CONTROL_NONSG_IO_CDB is up
> > using blk_rq_map_kern(), as well as ICF_SCSI_NON_DATA_CDB ops using
> > struct request. In order to get the first READ_10s of type
> > ICF_SCSI_DATA_SG_IO_CDB to work, I had to add a temporary
> > EXPORT_SYMBOL_GPL() for drivers/scsi/scsi_lib.c:scsi_req_map_sg() in
> > lio-core-2.6.git for v2.6.28-rc6 in order to get TYPE_DISK up using an
> > software emulated MPT-Fusion HBA driver with struct request. I have
> > been looking at drivers/scsi/scsi_tgt_lib.c() (which currently uses
> > struct request), and I figure we need something similar for the generic
> > target infrastructure, although __scsi_get_command() and
> > __scsi_put_command() are currently used in that code.
> >
> > Below is what my patch looks like so far, I will probably just end up
> > commiting an temporary ifdef to keep scsi_execute_async() until the
> > proper pieces are in place and the other issues are resolved below.
> >>From there I will be able to drop in the proper upstream mapping bits
> > for struct scatterlist in
> > drivers/lio-core/target_core_pscsi.c:pscsi_map_task_SG() get rid of
> > scsi_req_map_sg() usage all together.
> >
> > So far during my initial testing, I am running into a two different
> > exceptions. One NULL pointer deference OOPS after half dozen Open/iSCSI
> > login/logouts in block/elevator.c:elv_dequeue_request(). Here is the
> > trace from SCSI softirq context:
> >
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png
> >
> > The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > that happens after a few hundred MB of READ_10 traffic, which also
> > appears to pass through elv_dequeue_request() at some point:
> >
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> >
> > Tomo, Boaz, Jens, any comments..?
> >
> > --nab
> >
> > diff --git a/drivers/lio-core/target_core_pscsi.c b/drivers/lio-core/target_core_pscsi.c
> > index ed9f588..5161483 100644
> > --- a/drivers/lio-core/target_core_pscsi.c
> > +++ b/drivers/lio-core/target_core_pscsi.c
> > @@ -37,6 +37,7 @@
> > #include <linux/spinlock.h>
> > #include <linux/smp_lock.h>
> > #include <linux/genhd.h>
> > +#include <linux/cdrom.h>
> > #include <linux/file.h>
> >
> > #include <scsi/scsi.h>
> > @@ -44,6 +45,7 @@
> > #include <scsi/scsi_cmnd.h>
> > #include <scsi/scsi_host.h>
> > #include <sd.h>
> > +#include <sr.h>
> >
> > #include <iscsi_linux_os.h>
> > #include <iscsi_linux_defs.h>
> > @@ -763,11 +765,11 @@ extern void *pscsi_allocate_request (
> > se_device_t *dev)
> > {
> > pscsi_plugin_task_t *pt;
> > - if (!(pt = kmalloc(sizeof(pscsi_plugin_task_t), GFP_KERNEL))) {
> > - TRACE_ERROR("Unable to allocate pscsi_plugin_task_t\n");
> > +
> > + if (!(pt = kzalloc(sizeof(pscsi_plugin_task_t), GFP_KERNEL))) {
> > + printk(KERN_ERR "Unable to allocate pscsi_plugin_task_t\n");
> > return(NULL);
> > }
> > - memset(pt, 0, sizeof(pscsi_plugin_task_t));
> >
> > return(pt);
> > }
> > @@ -788,7 +790,44 @@ extern void pscsi_get_evpd_sn (unsigned char *buf, u32 size, se_device_t *dev)
> > return;
> > }
> >
> > -/* pscsi_do_task(): (Part of se_subsystem_api_t template)
> > +static int pscsi_blk_get_request (se_task_t *task)
> > +{
> > + pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> > + pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
> > +
> > + pt->pscsi_req = blk_get_request(pdv->pdv_sd->request_queue,
> > + (pt->pscsi_direction == DMA_TO_DEVICE), GFP_KERNEL);
> > + if (!(pt->pscsi_req) || IS_ERR(pt->pscsi_req)) {
> > + printk(KERN_ERR "PSCSI: blk_get_request() failed: %ld\n",
> > + IS_ERR(pt->pscsi_req));
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > + }
> > + /*
> > + * Defined as "scsi command" in include/linux/blkdev.h.
> > + */
> > + pt->pscsi_req->cmd_type = REQ_TYPE_BLOCK_PC;
> > + /*
> > + * Setup the done function pointer for struct request,
> > + * also set the end_io_data pointer.to se_task_t.
> > + */
> > + pt->pscsi_req->end_io = pscsi_req_done;
> > + pt->pscsi_req->end_io_data = (void *)task;
> > + /*
> > + * Load the referenced se_task_t's SCSI CDB into
> > + * include/linux/blkdev.h:struct request->cmd
> > + */
> > + pt->pscsi_req->cmd_len = COMMAND_SIZE(pt->pscsi_cdb[0]);
> > + memcpy(pt->pscsi_req->cmd, pt->pscsi_cdb, pt->pscsi_req->cmd_len);
>
> pt->pscsi_cdb is that a static sized buffer? What do you do with larger
> commands received on the iscsi wire. Are they dropped?

Support for this is on the short-term TODO list for v3.0 in the generic
target engine..

> If you did have
> the full > 16 CDB in some buffer you could do:
>
> + pt->pscsi_req->cmd_len = scsi_command_size(pt->pscsi_cdb);
> + pt->pscsi_req->cmd = pt->pscsi_cdb;
>

<nod>, the plan is to remove pt->pscsi_cdb[] and use the struct
request->_cmd[] through the subsystem plugin API for the v3.0 generic
target core in lio-core-2.6.git..

> > + /*
> > + * Setup pointer for outgoing sense data.
> > + */
> > + pt->pscsi_req->sense = (void *)&pt->pscsi_sense[0];
> > + pt->pscsi_req->sense_len = 0;
> > +
> > + return(0);
> > +}
> > +
> > +/* pscsi_do_task(): (Part of se_subsystem_api_t template)
> > *
> > *
> > */
> > @@ -796,17 +835,32 @@ extern int pscsi_do_task (se_task_t *task)
> > {
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> > pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
> > - int ret;
> > -
> > - if ((ret = scsi_execute_async((struct scsi_device *)pdv->pdv_sd,
> > - pt->pscsi_cdb, COMMAND_SIZE(pt->pscsi_cdb[0]), pt->pscsi_direction,
> > - pt->pscsi_buf, task->task_size, task->task_sg_num,
> > - (task->se_obj_api->get_device_type(task->se_obj_ptr) == 0) ?
> > - PS_TIMEOUT_DISK : PS_TIMEOUT_OTHER, PS_RETRY,
> > - (void *)task, pscsi_req_done, GFP_KERNEL)) != 0) {
> > - TRACE_ERROR("PSCSI Execute(): returned: %d\n", ret);
> > - return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > - }
> > + struct gendisk *gd = NULL;
> > + /*
> > + * Grab pointer to struct gendisk for TYPE_DISK and TYPE_ROM
> > + * cases (eg: cases where struct scsi_device has a backing
> > + * struct block_device. Also set the struct request->timeout
> > + * value based on peripheral device type (from SCSI).
> > + */
> > + if (pdv->pdv_sd->type == TYPE_DISK) {
> > + struct scsi_disk *sdisk = dev_get_drvdata(
> > + &pdv->pdv_sd->sdev_gendev);
> > + gd = sdisk->disk;
> > + pt->pscsi_req->timeout = PS_TIMEOUT_DISK;
> > + } else if (pdv->pdv_sd->type == TYPE_ROM) {
> > + struct scsi_cd *scd = dev_get_drvdata(
> > + &pdv->pdv_sd->sdev_gendev);
> > + gd = scd->disk;
> > + pt->pscsi_req->timeout = PS_TIMEOUT_OTHER;
> > + } else
> > + pt->pscsi_req->timeout = PS_TIMEOUT_OTHER;
> > +
> > + pt->pscsi_req->retries = PS_RETRY;
> > + /*
> > + * Queue the struct request into the struct scsi_device->request_queue.
> > + */
> > + blk_execute_rq_nowait(pdv->pdv_sd->request_queue, gd,
> > + pt->pscsi_req, 1, pscsi_req_done);
> >
> > return(PYX_TRANSPORT_SENT_TO_TRANSPORT);
> > }
> > @@ -817,7 +871,14 @@ extern int pscsi_do_task (se_task_t *task)
> > */
> > extern void pscsi_free_task (se_task_t *task)
> > {
> > - kfree(task->transport_req);
> > + pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> > +
> > + if (pt->pscsi_req) {
> > + blk_put_request(pt->pscsi_req);
> > + pt->pscsi_req = NULL;
> > + }
> > +
> > + kfree(pt);
> > return;
> > }
> >
> > @@ -1099,31 +1160,65 @@ extern void __pscsi_get_dev_info (pscsi_dev_virt_t *pdv, char *b, int *bl)
> > return;
> > }
> >
> > -/* pscsi_map_task_SG():
> > +extern int scsi_req_map_sg(struct request *, struct scatterlist *, int, unsigned , gfp_t );
> > +
> > +/* pscsi_map_task_SG():
> > *
> > *
> > */
> > -extern void pscsi_map_task_SG (se_task_t *task)
> > +extern int pscsi_map_task_SG (se_task_t *task)
> > {
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> > + int ret = 0;
> > +
> > pt->pscsi_buf = (void *)task->task_buf;
> >
> > - return;
> > + if (!task->task_size)
> > + return(0);
> > +#if 0
> > + if ((ret = blk_rq_map_sg(pdv->pdv_sd->request_queue,
> > + pt->pscsi_req,
> > + (struct scatterlist *)pt->pscsi_buf)) < 0) {
> > + printk(KERN_ERR "PSCSI: blk_rq_map_sg() returned %d\n", ret);
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > + }
> > +#else
> > + if ((ret = scsi_req_map_sg(pt->pscsi_req,
> > + (struct scatterlist *)pt->pscsi_buf,
> > + task->task_sg_num, task->task_size,
> > + GFP_KERNEL)) < 0) {
> > + printk(KERN_ERR "PSCSI: scsi_req_map_sg() failed: %d\n", ret);
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > + }
> > +#endif
> > + return(0);
> > }
> >
> > /* pscsi_map_task_non_SG():
> > *
> > *
> > */
> > -extern void pscsi_map_task_non_SG (se_task_t *task)
> > +extern int pscsi_map_task_non_SG (se_task_t *task)
> > {
> > iscsi_cmd_t *cmd = task->iscsi_cmd;
> > + pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> > + pscsi_dev_virt_t *pdv = (pscsi_dev_virt_t *) task->iscsi_dev->dev_ptr;
> > unsigned char *buf = (unsigned char *) T_TASK(cmd)->t_task_buf;
> > + int ret = 0;
> >
> > - pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> > pt->pscsi_buf = (void *)buf;
> >
> > - return;
> > + if (!task->task_size)
> > + return(0);
> > +
> > + if ((ret = blk_rq_map_kern(pdv->pdv_sd->request_queue,
> > + pt->pscsi_req, pt->pscsi_buf,
> > + task->task_size, GFP_KERNEL)) < 0) {
> > + printk(KERN_ERR "PSCSI: blk_rq_map_kern() failed: %d\n", ret);
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > + }
> > +
> > + return(0);
> > }
> >
> > /* pscsi_CDB_inquiry():
> > @@ -1135,9 +1230,11 @@ extern int pscsi_CDB_inquiry (se_task_t *task, u32 size)
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> >
> > pt->pscsi_direction = DMA_FROM_DEVICE;
> > - pscsi_map_task_non_SG(task);
> > +
> > + if (pscsi_blk_get_request(task) < 0)
> > + return(-1);
> >
> > - return(0);
> > + return(pscsi_map_task_non_SG(task));
> > }
> >
> > extern int pscsi_CDB_none (se_task_t *task, u32 size)
> > @@ -1146,7 +1243,7 @@ extern int pscsi_CDB_none (se_task_t *task, u32 size)
> >
> > pt->pscsi_direction = DMA_NONE;
> >
> > - return(0);
> > + return(pscsi_blk_get_request(task));
> > }
> >
> > /* pscsi_CDB_read_non_SG():
> > @@ -1158,9 +1255,11 @@ extern int pscsi_CDB_read_non_SG (se_task_t *task, u32 size)
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> >
> > pt->pscsi_direction = DMA_FROM_DEVICE;
> > - pscsi_map_task_non_SG(task);
> >
> > - return(0);
> > + if (pscsi_blk_get_request(task) < 0)
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > +
> > + return(pscsi_map_task_non_SG(task));
> > }
> >
> > /* pscsi_CDB_read_SG():
> > @@ -1172,7 +1271,12 @@ extern int pscsi_CDB_read_SG (se_task_t *task, u32 size)
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> >
> > pt->pscsi_direction = DMA_FROM_DEVICE;
> > - pscsi_map_task_SG(task);
> > +
> > + if (pscsi_blk_get_request(task) < 0)
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > +
> > + if (pscsi_map_task_SG(task) < 0)
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> >
> > return(task->task_sg_num);
> > }
> > @@ -1186,9 +1290,11 @@ extern int pscsi_CDB_write_non_SG (se_task_t *task, u32 size)
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> >
> > pt->pscsi_direction = DMA_TO_DEVICE;
> > - pscsi_map_task_non_SG(task);
> >
> > - return(0);
> > + if (pscsi_blk_get_request(task) < 0)
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > +
> > + return(pscsi_map_task_non_SG(task));
> > }
> >
> > /* pscsi_CDB_write_SG():
> > @@ -1200,7 +1306,12 @@ extern int pscsi_CDB_write_SG (se_task_t *task, u32 size)
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *) task->transport_req;
> >
> > pt->pscsi_direction = DMA_TO_DEVICE;
> > - pscsi_map_task_SG(task);
> > +
> > + if (pscsi_blk_get_request(task) < 0)
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> > +
> > + if (pscsi_map_task_SG(task) < 0)
> > + return(PYX_TRANSPORT_LOGICAL_UNIT_COMMUNICATION_FAILURE);
> >
> > return(task->task_sg_num);
> > }
> > @@ -1386,22 +1497,25 @@ extern void pscsi_shutdown_hba (se_hba_t *hba)
> > *
> > *
> > */
> > -//#warning FIXME: We can do some custom handling of HBA fuckups here.
> > -extern inline void pscsi_process_SAM_status (se_task_t *task, unsigned char *cdb, int result)
> > +static inline void pscsi_process_SAM_status (
> > + se_task_t *task,
> > + pscsi_plugin_task_t *pt)
> > {
> > - if ((task->task_scsi_status = status_byte(result))) {
> > + if ((task->task_scsi_status = status_byte(pt->pscsi_result))) {
> > task->task_scsi_status <<= 1;
> > - PYXPRINT("Parallel SCSI Status Byte exception - ITT: 0x%08x Task: %p CDB: 0x%02x"
> > - " Result: 0x%08x\n", task->iscsi_cmd->init_task_tag, task, cdb[0], result);
> > + PYXPRINT("PSCSI Status Byte exception at task: %p CDB: 0x%02x"
> > + " Result: 0x%08x\n", task, pt->pscsi_cdb[0],
> > + pt->pscsi_result);
> > }
> >
> > - switch (host_byte(result)) {
> > + switch (host_byte(pt->pscsi_result)) {
> > case DID_OK:
> > transport_complete_task(task, (!task->task_scsi_status));
> > break;
> > default:
> > - PYXPRINT("Parallel SCSI Host Byte exception - ITT: 0x%08x Task: %p CDB: 0x%02x"
> > - " Result: 0x%08x\n", task->iscsi_cmd->init_task_tag, task, cdb[0], result);
> > + PYXPRINT("PSCSI Host Byte exception at task: %p CDB: 0x%02x"
> > + " Result: 0x%08x\n", task, pt->pscsi_cdb[0],
> > + pt->pscsi_result);
> > task->task_scsi_status = SAM_STAT_CHECK_CONDITION;
> > task->task_error_status = PYX_TRANSPORT_UNKNOWN_SAM_OPCODE;
> > task->iscsi_cmd->transport_error_status = PYX_TRANSPORT_UNKNOWN_SAM_OPCODE;
> > @@ -1412,21 +1526,17 @@ extern inline void pscsi_process_SAM_status (se_task_t *task, unsigned char *cdb
> > return;
> > }
> >
> > -extern void pscsi_req_done (void *data, char *sense, int result, int data_len)
> > +extern void pscsi_req_done (struct request *req, int uptodate)
> > {
> > - se_task_t *task = (se_task_t *)data;
> > + se_task_t *task = (se_task_t *)req->end_io_data;
> > pscsi_plugin_task_t *pt = (pscsi_plugin_task_t *)task->transport_req;
> > -#if 0
> > - printk("pscsi_req_done(): result: %08x, sense: %p data_len: %d\n",
> > - result, sense, data_len);
> > -#endif
> > - pt->pscsi_result = result;
> > - pt->pscsi_data_len = data_len;
> > -
> > - if (result != 0)
> > - memcpy(pt->pscsi_sense, sense, SCSI_SENSE_BUFFERSIZE);
> > +
> > + pt->pscsi_result = req->errors;
> > + pt->pscsi_resid = req->data_len;
> >
> > - pscsi_process_SAM_status(task, &pt->pscsi_cdb[0], result);
> > + pscsi_process_SAM_status(task, pt);
> > +
> > + __blk_put_request(req->q, req);
> > +
> > return;
> > }
> > -
> > diff --git a/drivers/lio-core/target_core_pscsi.h b/drivers/lio-core/target_core_pscsi.h
> > index 980587d..090f0d2 100644
> > --- a/drivers/lio-core/target_core_pscsi.h
> > +++ b/drivers/lio-core/target_core_pscsi.h
> > @@ -90,7 +90,7 @@ extern struct scatterlist *pscsi_get_SG (se_task_t *);
> > extern u32 pscsi_get_SG_count (se_task_t *);
> > extern int pscsi_set_non_SG_buf (unsigned char *, se_task_t *);
> > extern void pscsi_shutdown_hba (struct se_hba_s *);
> > -extern void pscsi_req_done (void *, char *, int, int);
> > +extern void pscsi_req_done (struct request *, int);
> > #endif
> >
> > #include <linux/device.h>
> > @@ -104,8 +104,9 @@ typedef struct pscsi_plugin_task_s {
> > unsigned char pscsi_sense[SCSI_SENSE_BUFFERSIZE];
> > int pscsi_direction;
> > int pscsi_result;
> > - u32 pscsi_data_len;
> > + u32 pscsi_resid;
> > void *pscsi_buf;
>
> - void *pscsi_buf;
> + struct scatterlist kern_buff;
> + struct scatterlist *pscsi_sg;
>
> See comment bellow, you will need to change all that code,
> functions APIs the works
>

<nod>, as the I/O becomes stable using struct request, the plan is to
remove all of the duplicated structure members in struct
pscsi_plugin_task_s, and most likely getting rid of the structure all
together..

> > + struct request *pscsi_req;
> > } pscsi_plugin_task_t;
> >
> > #define PDF_HAS_CHANNEL_ID 0x01
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index f5d3b96..9e03a02 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -303,8 +303,8 @@ static void scsi_bi_endio(struct bio *bio, int error)
> > * request can be sent to the block layer. We do not trust the scatterlist
> > * sent to use, as some ULDs use that struct to only organize the pages.
> > */
> > -static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
> > - int nsegs, unsigned bufflen, gfp_t gfp)
> > +int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
> > + int nsegs, unsigned bufflen, gfp_t gfp)
> > {
> > struct request_queue *q = rq->q;
> > int nr_pages = (bufflen + sgl[0].offset + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > @@ -379,6 +379,8 @@ free_bios:
> > return err;
> > }
> >
> > +EXPORT_SYMBOL_GPL(scsi_req_map_sg);
> > +
>
> Hmmm this is going away from here soon I hope. Better copy past it into your
> own code. Perhaps optimize to your needs.
>

The only reason I am using this (and hence using the BIOs behind it) is
because the upstream code for struct scatterlist -> struct request ->
struct scsi_device->request_queue via blk_execute_rq_nowait() ops does
not exist (yet :-)

> > /**
> > * scsi_execute_async - insert request
> > * @sdev: scsi device
> >
> >
>
> Do you have this code on a gitweb somewhere. I'm to lazy to download everything
> here?
>

The v3.0 tree is located at:

http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=summary

So, I have not commited the original patch for scsi_execute_async()
removal in the LIO PSCSI plugin in v2.6.28-rc6, but I will add some
temporary stubs for both codepaths and default to the legacy codepath
(until the timeout issues are resolved) and make the commit soon.

> One thing I can not see is where you get your "(struct scatterlist *)pt->pscsi_buf"
> scatterlist from? is that from the network layer? don't you get them one by one
> from the network layer and copy them to your own allocated scatterlist array?

So, the generic target engine in v3.0 lio-core-2.6.git can either
internally allocate *OR* accept linked list scatterlist with struct page
members in drivers/lio-core/target_core_transport.h:se_mem_t->se_page
coming from a fabric module. In current v2.9 and v3.0 LIO-Target
running code (just the iSCSi Target engine), this means the internally
allocation using Linux kernel sockets.

> Do you actually receive a network allocated scatterlist array? because if not
> you should go directly to BIOs and drop that scatterlist stuff.
>

Well, this discussion is for the target_core_mod/PSCSI plugin located in
drivers/lio-core/target-core-pscsi.c. The target_core_mod/IBLOCK plugin
located in drivers/lio-core/target-core-iblock.c already uses
bio_alloc() and bio_add_page() + submit_bio() when talking to struct
block_device for READ/WRITE I/O, and of course emulating the SCSI
control path for the rest. This is used when talking with virtual
struct block_device, that do not map struct back to a single struct
scsi_device.

Considering the target_core_mod/PSCSI plugin is intended to be directly
queuing passthrough CDBs (the generic engine does handle sectors >
$STORAGE_OBJECT->max_sectors) into struct scsi_device->request_queue, I
would like to get rid of the BIO usage with scsi_execute_async() (/me
thinks of legacy scsi_do_req()) when a generic target engine is handling
the hardware restrictions related to max_sectors, queue_depth and struct
page memory allocation.

So I understand that getting rid of scsi_execute_async() and
scsi_req_map_sg() is a work in progress, and I am really asking more of
what should be used in place of scsi_req_map_sg() in
target_core_mod/PSCSI..?

> Also that void* SG/NON_SG type cast crap is not acceptable any more. We worked
> very hard to clean all that up. You need to hold typed scatterlist pointer and if
> you happen to have single kernel buffers for submission you hold an struct scatterlist
> somewhere and do an sg_init_one() on the kernel buffer.

Ok, drivers/lio-core/target_core_transport.c:transport_calc_sg_num() is
using include/linux/scatterlist.h:sg_init_table() to setup the freshly
allocated contigious array of struct scatterlist. se_mem_t->se_page
located in the linked list of memory at
drivers/lio-core/target_core_base.h:se_transport_task_t->t_mem_list.
The generic target algortihms in target_core_transport.c allow the
linked list of struct page memory to be mapped to struct
scatterlist->page_link in
target_core_transport.c:transport_map_mem_to_sg() using
sg_assign_page(). This happens already across subsystem plugins for
Linux/SCSI, Linux/BLOCK and Linux/VFS using the same code path.

> For me this is preliminary to
> even consider this code.
>

Ok, I have no problem getting rid of the casts between target_core_mod
and the subsystem plugins related to contigious buffer or non contigious
buffer usage with struct scatterlist (or whatever that can reference
struct page). I will make the stub commits for the original patch later
today, and will remove the casts from se_task_t->task_buf that are going
down to drivers/lio-core/target_core_base.h:se_subsystem_api_t at some
point in the near future.

Many thanks for your comments,

--nab

2008-12-02 23:18:53

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Tue, 2008-12-02 at 01:13 -0800, Mike Anderson wrote:
> Nicholas A. Bellinger <[email protected]> wrote:
> > On Tue, 2008-12-02 at 00:30 -0800, Nicholas A. Bellinger wrote:
> > > On Mon, 2008-12-01 at 22:40 -0800, Mike Anderson wrote:
> > > > Nicholas A. Bellinger <[email protected]> wrote:
> > > > > On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> > > > > >
> > > > > > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > > > > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > > > > > >>> appears to pass through elv_dequeue_request() at some point:
> > > > > > >>>
> > > > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > > > > >
> > > > > > Hmmm... this means blk_add_timer() is being called after the request
> > > > > > is already completed.
> > > >
> > > > or is it possible since elv_dequeue_request BUG_ON check of queuelist did
> > > > not trigger a request is on the queuelist with a timeout_list not empty.
> > > >
> > > > It would be interesting for a debug run to change the
> > > > "BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
> > > > the cmd_flags plus req->atomic_flags and also add a
> > > > "BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
> > > > is never added to the queuelist with a timeout_list not empty.
> > > >
> > >
> > > Ok, so blk_dump_rq_flags() is now being called in
> > > block/blk-timeout.c:blk_add_timer() for the case
> > > BUG_ON(list_empty(&req->timeout_list)) case:
> > >
> > > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-6.png
> > >
> > > Hmm, the outputted "sector " range is definately is bogus, as the only
> > > READ_10 that have been sent are at LBA offset 0 for 8 * 512 byte sectors
> > > for the partition table during Open/iSCSI LUN scanning.
> > >
> >
> > Also, the following code from block/blk-core.c:blk_dump_rq_flags():
> >
> > if (blk_pc_request(rq)) {
> > printk(KERN_INFO " cdb: ");
> > for (bit = 0; bit < BLK_MAX_CDB; bit++)
> > printk("%02x ", rq->cmd[bit]);
> > printk("\n");
> > }
> >
> > is not printing out the copied CDB in struct request->cmd[], which makes
> > me think the struct request->cmd_flags (that blk_pc_request() is
> > checking) are also bogus when blk_add_timer() is being called..
> >
> Based on the output from 2.6.28-rc6-oops-6.png the flags value of 82c21
> looks like
> (__REQ_ALLOCED,__REQ_ELVPRIV,__REQ_DONTPREP,__REQ_STARTED,__REQ_SORTED,__REQ_RW),
> but it is late so someone maybe should check my shift counts.
>
> The type is also REQ_TYPE_FS so the blk_pc cdb: info will not be printed.
>
> I will talk to you more in the morning.
>

Ok, after double checking the original patch again, I noticed that
struct request was getting released with __blk_put_request() from the
struct request->end_io() caller context in pscsi_req_done(), and again
after the original target mode CDB was acknowledged from the fabric in
pscsi_free_task() with blk_put_request().. Not good.

Anyways, I removed the extra blk_put_request() in pscsi_free_task() and
am running some badblocks tests now. Things are obviously looking much
better.

Thanks for your help,

--nab

2008-12-03 00:47:26

by Nicholas A. Bellinger

[permalink] [raw]
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28

On Tue, 2008-12-02 at 15:18 -0800, Nicholas A. Bellinger wrote:
> On Tue, 2008-12-02 at 01:13 -0800, Mike Anderson wrote:
> > Nicholas A. Bellinger <[email protected]> wrote:
> > > On Tue, 2008-12-02 at 00:30 -0800, Nicholas A. Bellinger wrote:
> > > > On Mon, 2008-12-01 at 22:40 -0800, Mike Anderson wrote:
> > > > > Nicholas A. Bellinger <[email protected]> wrote:
> > > > > > On Tue, 2008-12-02 at 13:18 +0900, Tejun Heo wrote:
> > > > > > >
> > > > > > > >>> The other one is a BUG_ON in blk/blk-timeout.c:177 in blk_add_timeout()
> > > > > > > >>> that happens after a few hundred MB of READ_10 traffic, which also
> > > > > > > >>> appears to pass through elv_dequeue_request() at some point:
> > > > > > > >>>
> > > > > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
> > > > > > > >>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png
> > > > > > >
> > > > > > > Hmmm... this means blk_add_timer() is being called after the request
> > > > > > > is already completed.
> > > > >
> > > > > or is it possible since elv_dequeue_request BUG_ON check of queuelist did
> > > > > not trigger a request is on the queuelist with a timeout_list not empty.
> > > > >
> > > > > It would be interesting for a debug run to change the
> > > > > "BUG_ON(!list_empty(&req->timeout_list))" in blk_add_timer to print out
> > > > > the cmd_flags plus req->atomic_flags and also add a
> > > > > "BUG_ON(!list_empty(&rq->timeout_list))" to elv_insert to ensure a request
> > > > > is never added to the queuelist with a timeout_list not empty.
> > > > >
> > > >
> > > > Ok, so blk_dump_rq_flags() is now being called in
> > > > block/blk-timeout.c:blk_add_timer() for the case
> > > > BUG_ON(list_empty(&req->timeout_list)) case:
> > > >
> > > > http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-6.png
> > > >
> > > > Hmm, the outputted "sector " range is definately is bogus, as the only
> > > > READ_10 that have been sent are at LBA offset 0 for 8 * 512 byte sectors
> > > > for the partition table during Open/iSCSI LUN scanning.
> > > >
> > >
> > > Also, the following code from block/blk-core.c:blk_dump_rq_flags():
> > >
> > > if (blk_pc_request(rq)) {
> > > printk(KERN_INFO " cdb: ");
> > > for (bit = 0; bit < BLK_MAX_CDB; bit++)
> > > printk("%02x ", rq->cmd[bit]);
> > > printk("\n");
> > > }
> > >
> > > is not printing out the copied CDB in struct request->cmd[], which makes
> > > me think the struct request->cmd_flags (that blk_pc_request() is
> > > checking) are also bogus when blk_add_timer() is being called..
> > >
> > Based on the output from 2.6.28-rc6-oops-6.png the flags value of 82c21
> > looks like
> > (__REQ_ALLOCED,__REQ_ELVPRIV,__REQ_DONTPREP,__REQ_STARTED,__REQ_SORTED,__REQ_RW),
> > but it is late so someone maybe should check my shift counts.
> >
> > The type is also REQ_TYPE_FS so the blk_pc cdb: info will not be printed.
> >
> > I will talk to you more in the morning.
> >
>
> Ok, after double checking the original patch again, I noticed that
> struct request was getting released with __blk_put_request() from the
> struct request->end_io() caller context in pscsi_req_done(), and again
> after the original target mode CDB was acknowledged from the fabric in
> pscsi_free_task() with blk_put_request().. Not good.
>
> Anyways, I removed the extra blk_put_request() in pscsi_free_task() and
> am running some badblocks tests now. Things are obviously looking much
> better.
>

Ok, passing badblocks tests now and looking stable on v2.6.28-rc7 using
Tejun's patch, here the commit diff for the LIO-Core v3.0 PSCSI changes:

http://git.kernel.org/?p=linux/kernel/git/nab/lio-core-2.6.git;a=commitdiff;h=1b14b5e1fc9a7074322b1015bb86eca2a8ef4560

Now the question becomes, how can we get rid of scsi_req_map_sg() usage
(and struct bio in between) all together..?

--nab