2002-10-25 19:57:40

by Cameron, Steve

Subject: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements


Add a blk_rq_map_sg_one_by_one function to ll_rw_blk.c so that a low-level
driver can map scatter-gather elements from the block subsystem one at a
time into device-specific data structures, instead of copying all of them
out at once and then translating them into the device-specific format
afterwards.

To follow, a patch to the cciss driver using this interface which
allows up to 256 scatter gather elements (current limit is 31).

-- steve

drivers/block/ll_rw_blk.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/blkdev.h | 4 +++
2 files changed, 63 insertions

--- lx2544ac3/drivers/block/ll_rw_blk.c~blkdev_sg Fri Oct 25 13:46:08 2002
+++ lx2544ac3-scameron/drivers/block/ll_rw_blk.c Fri Oct 25 13:46:08 2002
@@ -766,6 +766,64 @@ inline int blk_hw_contig_segment(request
}

/*
+ * Map a request one scatter-gather element at a time. As each scatterlist
+ * element is constructed, consume_sg is called so that the element may be
+ * used. The idea is that the caller won't have to allocate all the space
+ * needed for a potentially large number of sg elements; instead the
+ * consume_sg callback consumes each element as it is made. sg_state is a
+ * pointer used by consume_sg to remember its state between calls.
+ */
+int blk_rq_map_sg_one_by_one(request_queue_t *q, struct request *rq,
+ consume_sg_callback consume_sg, void *sg_state)
+{
+ struct scatterlist sg;
+ struct bio_vec *bvec, *bvprv;
+ struct bio *bio;
+ int nsegs, i, cluster;
+
+ nsegs = 0;
+ cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
+
+ /*
+ * for each bio in rq
+ */
+ bvprv = NULL;
+ rq_for_each_bio(bio, rq) {
+ /*
+ * for each segment in bio
+ */
+ bio_for_each_segment(bvec, bio, i) {
+ int nbytes = bvec->bv_len;
+
+ if (bvprv && cluster) {
+ if (sg.length + nbytes > q->max_segment_size)
+ goto new_segment;
+
+ if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
+ goto new_segment;
+ if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
+ goto new_segment;
+
+ sg.length += nbytes;
+ } else {
+new_segment:
+ if (bvprv) consume_sg(&sg, sg_state);
+ memset(&sg,0,sizeof(struct scatterlist));
+ sg.page = bvec->bv_page;
+ sg.length = nbytes;
+ sg.offset = bvec->bv_offset;
+ nsegs++;
+ }
+ bvprv = bvec;
+ } /* segments in bio */
+ } /* bios in rq */
+
+ if (bvprv) consume_sg(&sg, sg_state);
+
+ return nsegs;
+}
+
+/*
* map a request to scatterlist, return number of sg entries setup. Caller
* must make sure sg can hold rq->nr_phys_segments entries
*/
@@ -2112,6 +2170,7 @@ EXPORT_SYMBOL(blk_queue_max_hw_segments)
EXPORT_SYMBOL(blk_queue_max_segment_size);
EXPORT_SYMBOL(blk_queue_hardsect_size);
EXPORT_SYMBOL(blk_queue_segment_boundary);
+EXPORT_SYMBOL(blk_rq_map_sg_one_by_one);
EXPORT_SYMBOL(blk_rq_map_sg);
EXPORT_SYMBOL(blk_nohighio);
EXPORT_SYMBOL(blk_dump_rq_flags);
--- lx2544ac3/include/linux/blkdev.h~blkdev_sg Fri Oct 25 13:46:08 2002
+++ lx2544ac3-scameron/include/linux/blkdev.h Fri Oct 25 13:46:08 2002
@@ -341,6 +341,10 @@ extern void blk_queue_prep_rq(request_qu
extern void blk_queue_merge_bvec(request_queue_t *, merge_bvec_fn *);
extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);

+typedef void (*consume_sg_callback)(struct scatterlist *, void *);
+extern int blk_rq_map_sg_one_by_one(request_queue_t *, struct request *,
+ consume_sg_callback consume_sg, void * sg_state);
+
extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
extern void blk_dump_rq_flags(struct request *, char *);
extern void generic_unplug_device(void *);



2002-10-25 20:18:32

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Fri, Oct 25 2002, Stephen Cameron wrote:
>
> Add blk_rq_map_sg_one_by_one function to ll_rw_blk.c in order to allow a low
> level driver to map scatter gather elements from the block subsystem one
> at a time into device specific data structures instead of having to take
> a copy of all of them all at once and then afterwards copy them into a device
> specific format.
>
> To follow, a patch to the cciss driver using this interface which
> allows up to 256 scatter gather elements (current limit is 31).

I have to say that I think this patch is ugly, and a complete duplicate
of existing code. This is always bad, especially in the case of
something not really straight forward (like blk_rq_map_sg()). A hack.

I can understand the need for something like this, for drivers that
can't use a struct scatterlist directly. I'd rather do this in a
different way, though.

__blk_rq_map(q, rq, sglist, callback)
{
	...
	/* do the mapping */
	if (sglist) {
		sglist[nsegs].page = bvec->bv_page;
		...
	} else
		callback(q, bvec, nsegs);
}

blk_rq_map_rq(q, rq, sglist)
{
	return __blk_rq_map(q, rq, sglist, NULL);
}

blk_rq_map_callback(q, rq, callback)
{
	return __blk_rq_map(q, rq, NULL, callback);
}

Or use a cookie like you currently do.

Oh, and do try to follow the style. It's

if (cond)
foo();

not

if (cond) foo();

--
Jens Axboe

2002-10-25 20:00:18

by Cameron, Steve

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements


Allow the cciss driver to use more than 31 scatter gather elements

drivers/block/Config.help | 12 ++
drivers/block/Config.in | 3
drivers/block/cciss.c | 261 ++++++++++++++++++++++++++++++++++-----------
drivers/block/cciss.h | 10 +
drivers/block/cciss_cmd.h | 12 ++
drivers/block/cciss_scsi.c | 26 ++++
6 files changed, 261 insertions, 63 deletions

--- lx2544ac3/drivers/block/cciss.c~sgc3 Fri Oct 25 13:46:12 2002
+++ lx2544ac3-scameron/drivers/block/cciss.c Fri Oct 25 13:46:12 2002
@@ -37,6 +37,7 @@
#include <linux/init.h>
#include <linux/hdreg.h>
#include <linux/spinlock.h>
+#include <linux/completion.h>
#include <asm/uaccess.h>
#include <asm/io.h>

@@ -164,7 +165,7 @@ static int cciss_proc_get_info(char *buf
" Current Q depth: %d\n"
" Max Q depth since init: %d\n"
" Max # commands on controller since init: %d\n"
- " Max SG entries since init: %d\n\n",
+ " Max SG entries since init: %d/%d\n\n",
h->devname,
h->product_name,
(unsigned long)h->board_id,
@@ -173,7 +174,8 @@ static int cciss_proc_get_info(char *buf
(unsigned int)h->intr,
h->num_luns,
h->highest_lun,
- h->Qdepth, h->maxQsinceinit, h->max_outstanding, h->maxSG);
+ h->Qdepth, h->maxQsinceinit, h->max_outstanding, h->maxSG,
+ CONFIG_CISS_MAX_SG_PAGES * MAXSGENTRIES);

pos += size; len += size;
cciss_proc_tape_report(ctlr, buffer, &pos, &len);
@@ -248,6 +250,44 @@ static void __init cciss_procinit(int i)
}
#endif /* CONFIG_PROC_FS */

+static int supports_sg_chaining(ctlr_info_t *h)
+{
+ int fw_ver = (h->firm_ver[0] - '0') * 100 + /* [1] == '.', skip */
+ (h->firm_ver[2] - '0') * 10 +
+ (h->firm_ver[3] - '0');
+ /* 1st generation controllers need >= 2.32 firmware */
+ if (h->pdev->device == PCI_DEVICE_ID_COMPAQ_CISS)
+ return (fw_ver >= 232);
+ /* 2nd generation controllers need >= 1.78 firmware */
+ if (h->pdev->device == PCI_DEVICE_ID_COMPAQ_CISSB)
+ return (fw_ver >= 178);
+ /* All controllers >= 3rd generation (will) support this. */
+ return 1;
+}
+
+static void cciss_allocate_extra_sg_pages(int ctlr)
+{
+ /* Allocate extra SG list pages if supported and requested */
+ if (supports_sg_chaining(hba[ctlr]) && CONFIG_CISS_MAX_SG_PAGES > 1) {
+ hba[ctlr]->sglist_pool = (SGDescriptor_struct *)
+ pci_alloc_consistent(hba[ctlr]->pdev,
+ (CONFIG_CISS_MAX_SG_PAGES-1) * MAXSGENTRIES *
+ NR_CMDS * sizeof(SGDescriptor_struct),
+ &(hba[ctlr]->sglist_pool_dhandle));
+ if (hba[ctlr]->sglist_pool == NULL)
+ printk(KERN_WARNING "cciss%d: Scatter-gather list "
+ "chaining not enabled.", ctlr);
+ }
+}
+
+static inline int figure_sg_limit(int i)
+{
+ /* figure out how many SGs we can allow block system to send us */
+ if (supports_sg_chaining(hba[i]) && hba[i]->sglist_pool != NULL)
+ return CONFIG_CISS_MAX_SG_PAGES * (MAXSGENTRIES-1);
+ return MAXSGENTRIES; /* no SG list chaining. */
+}
+
/*
* For operations that cannot sleep, a command block is allocated at init,
* and managed by cmd_alloc() and cmd_free() using a simple bitmap to track
@@ -258,7 +298,7 @@ static void __init cciss_procinit(int i)
static CommandList_struct * cmd_alloc(ctlr_info_t *h, int get_from_pool)
{
CommandList_struct *c;
- int i;
+ int i, j;
u64bit temp64;
dma_addr_t cmd_dma_handle, err_dma_handle;

@@ -273,14 +313,40 @@ static CommandList_struct * cmd_alloc(ct
c->err_info = (ErrorInfo_struct *)pci_alloc_consistent(
h->pdev, sizeof(ErrorInfo_struct),
&err_dma_handle);
-
- if (c->err_info == NULL)
- {
+
+ if (c->err_info == NULL) {
pci_free_consistent(h->pdev,
sizeof(CommandList_struct), c, cmd_dma_handle);
return NULL;
}
+ c->sgdlist[0].sgd = c->SG;
+ c->sgdlist[0].dma = cmd_dma_handle +
+ ((void *) c->SG - (void *) c);
memset(c->err_info, 0, sizeof(ErrorInfo_struct));
+
+ if (h->sglist_pool != NULL) {
+ c->sgdlist[1].sgd = (SGDescriptor_struct *)
+ pci_alloc_consistent(h->pdev, SGD_SIZE *
+ (CONFIG_CISS_MAX_SG_PAGES - 1),
+ &c->sgdlist[0].dma);
+
+ if (c->sgdlist[1].sgd == NULL) {
+ pci_free_consistent(h->pdev,
+ sizeof(ErrorInfo_struct), c->err_info,
+ err_dma_handle);
+ pci_free_consistent(h->pdev,
+ sizeof(CommandList_struct), c,
+ cmd_dma_handle);
+ return NULL;
+ }
+ memset(&c->sgdlist[0].sgd[0], 0, SGD_SIZE);
+ for (j=1;j<CONFIG_CISS_MAX_SG_PAGES-1; j++) {
+ c->sgdlist[j+1].sgd =
+ (void *)c->sgdlist[j].sgd+j*SGD_SIZE;
+ c->sgdlist[j+1].dma =
+ c->sgdlist[j].dma + j * SGD_SIZE;
+ }
+ }
} else /* get it out of the controllers pool */
{
do {
@@ -300,6 +366,22 @@ static CommandList_struct * cmd_alloc(ct
err_dma_handle = h->errinfo_pool_dhandle
+ i*sizeof(ErrorInfo_struct);
h->nr_allocs++;
+ c->sgdlist[0].sgd = c->SG;
+ c->sgdlist[0].dma = cmd_dma_handle +
+ ((void *) c->SG - (void *) c);
+ if (h->sglist_pool != NULL)
+ for (j=0; j < CONFIG_CISS_MAX_SG_PAGES-1 ; j++) {
+ c->sgdlist[j+1].sgd =
+ (void *) h->sglist_pool +
+ SGD_SIZE *
+ (i * (CONFIG_CISS_MAX_SG_PAGES - 1)+j);
+ // This memset is paranoia.
+ // memset(c->sgdlist[j+1].sgd, 0, SGD_SIZE);
+ c->sgdlist[j+1].dma =
+ h->sglist_pool_dhandle +
+ SGD_SIZE *
+ (i * (CONFIG_CISS_MAX_SG_PAGES - 1)+j);
+ }
}

c->busaddr = (__u32) cmd_dma_handle;
@@ -307,11 +389,8 @@ static CommandList_struct * cmd_alloc(ct
c->ErrDesc.Addr.lower = temp64.val32.lower;
c->ErrDesc.Addr.upper = temp64.val32.upper;
c->ErrDesc.Len = sizeof(ErrorInfo_struct);
-
c->ctlr = h->ctlr;
- return c;
-
-
+ return c;
}

/*
@@ -326,6 +405,9 @@ static void cmd_free(ctlr_info_t *h, Com
{
temp64.val32.lower = c->ErrDesc.Addr.lower;
temp64.val32.upper = c->ErrDesc.Addr.upper;
+ pci_free_consistent(h->pdev, SGD_SIZE *
+ (CONFIG_CISS_MAX_SG_PAGES - 1),
+ c->sgdlist[1].sgd, c->sgdlist[1].dma);
pci_free_consistent(h->pdev, sizeof(ErrorInfo_struct),
c->err_info, (dma_addr_t) temp64.val);
pci_free_consistent(h->pdev, sizeof(CommandList_struct),
@@ -1768,6 +1850,50 @@ static inline void complete_command( ctl
cmd_free(h,cmd,1);
}

+struct sg_setup_data {
+ int total_sgs;
+ int sgelem;
+ int sgd_number;
+ int dir;
+ CommandList_struct *c;
+};
+
+static void
+cciss_setup_sg(struct scatterlist *sg, struct sg_setup_data *sgs)
+{
+ /* this function gets called back by blk_rq_map_sg_one_by_one
+ once for each scatter gather element. */
+
+ SGDescriptor_struct *sgd;
+ ctlr_info_t *h = hba[sgs->c->ctlr];
+ u64bit temp64;
+
+ sgd = &sgs->c->sgdlist[sgs->sgd_number].sgd[sgs->sgelem];
+
+ /* if we can chain, and we hit end of page, chain to new SG page */
+ if (sgs->sgelem == MAXSGENTRIES-1 && h->sglist_pool != NULL) {
+ sgs->sgd_number++; // Move on to the next SG page
+ /* Set this SG elem to point to phys addr of next SG page */
+ temp64.val = (__u64) sgs->c->sgdlist[sgs->sgd_number].dma;
+ sgd->Addr.lower = temp64.val32.lower;
+ sgd->Addr.upper = temp64.val32.upper;
+ sgd->Ext = 0x80000000; // mark as a pointer to next SG page
+ sgd->Len = SGD_SIZE; // total no. of bytes in a SG page.
+ // (probably wrong for the last page.)
+ sgd = &sgs->c->sgdlist[sgs->sgd_number].sgd[0];
+ sgs->sgelem = 0; // Start at SG zero on the new SG page
+ sgs->total_sgs++; // ptrs to SG pages count as SG elems.
+ }
+ temp64.val = (__u64) pci_map_page(h->pdev, sg->page, sg->offset,
+ sg->length, sgs->dir);
+ sgd->Addr.lower = temp64.val32.lower;
+ sgd->Addr.upper = temp64.val32.upper;
+ sgd->Len = sg->length;
+ sgd->Ext = 0; // Ordinary SG element, not a ptr to SG page
+ sgs->sgelem++; // move on to next SG element on this page.
+ sgs->total_sgs++;
+}
+
/*
* Get a request and submit it to the controller.
*/
@@ -1777,10 +1903,8 @@ static void do_cciss_request(request_que
CommandList_struct *c;
int start_blk, seg;
struct request *creq;
- u64bit temp64;
- struct scatterlist tmp_sg[MAXSGENTRIES];
drive_info_struct *drv;
- int i, dir;
+ struct sg_setup_data sgs;

if (blk_queue_plugged(q))
goto startio;
@@ -1790,7 +1914,7 @@ queue:
goto startio;

creq = elv_next_request(q);
- if (creq->nr_phys_segments > MAXSGENTRIES)
+ if (creq->nr_phys_segments > h->sg_limit)
BUG();

if (( c = cmd_alloc(h, 1)) == NULL)
@@ -1822,33 +1946,32 @@ queue:
(int) creq->nr_sectors);
#endif /* CCISS_DEBUG */

- seg = blk_rq_map_sg(q, creq, tmp_sg);
-
- /* get the DMA records for the setup */
- if (c->Request.Type.Direction == XFER_READ)
- dir = PCI_DMA_FROMDEVICE;
- else
- dir = PCI_DMA_TODEVICE;
+ sgs.c = c;
+ sgs.sgelem = 0;
+ sgs.sgd_number = 0;
+ sgs.total_sgs = 0;
+ sgs.dir = (c->Request.Type.Direction == XFER_READ) ?
+ PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE;
+ seg = blk_rq_map_sg_one_by_one(q, creq,
+ (consume_sg_callback) cciss_setup_sg, &sgs);

- for (i=0; i<seg; i++)
- {
- c->SG[i].Len = tmp_sg[i].length;
- temp64.val = (__u64) pci_map_page(h->pdev, tmp_sg[i].page,
- tmp_sg[i].offset, tmp_sg[i].length,
- dir);
- c->SG[i].Addr.lower = temp64.val32.lower;
- c->SG[i].Addr.upper = temp64.val32.upper;
- c->SG[i].Ext = 0; // we are not chaining
- }
/* track how many SG entries we are using */
- if( seg > h->maxSG)
- h->maxSG = seg;
+ if( sgs.total_sgs > h->maxSG)
+ h->maxSG = sgs.total_sgs;

#ifdef CCISS_DEBUG
printk(KERN_DEBUG "cciss: Submitting %d sectors in %d segments\n", creq->nr_sectors, seg);
#endif /* CCISS_DEBUG */

- c->Header.SGList = c->Header.SGTotal = seg;
+ if (sgs.total_sgs > MAXSGENTRIES) {
+ c->Header.SGList = MAXSGENTRIES;
+ /* fixup last chained sg page length, probably not SGD_SIZE. */
+ c->sgdlist[sgs.sgd_number-1].sgd[MAXSGENTRIES-1].Len =
+ sgs.sgelem * sizeof(SGDescriptor_struct);
+ } else
+ c->Header.SGList = sgs.total_sgs;
+
+ c->Header.SGTotal = sgs.total_sgs;
c->Request.CDB[1]= 0;
c->Request.CDB[2]= (start_blk >> 24) & 0xff; //MSB
c->Request.CDB[3]= (start_blk >> 16) & 0xff;
@@ -2316,6 +2439,33 @@ static void free_hba(int i)
kfree(p);
}

+
+static void bail_out(int i)
+{
+ if(hba[i]->cmd_pool_bits)
+ kfree(hba[i]->cmd_pool_bits);
+ if(hba[i]->cmd_pool)
+ pci_free_consistent(hba[i]->pdev,
+ NR_CMDS * sizeof(CommandList_struct),
+ hba[i]->cmd_pool, hba[i]->cmd_pool_dhandle);
+ if(hba[i]->errinfo_pool)
+ pci_free_consistent(hba[i]->pdev,
+ NR_CMDS * sizeof( ErrorInfo_struct),
+ hba[i]->errinfo_pool,
+ hba[i]->errinfo_pool_dhandle);
+ if (hba[i]->sglist_pool != NULL)
+ pci_free_consistent(hba[i]->pdev,
+ (CONFIG_CISS_MAX_SG_PAGES-1) * MAXSGENTRIES *
+ NR_CMDS * sizeof(SGDescriptor_struct),
+ hba[i]->sglist_pool,
+ hba[i]->sglist_pool_dhandle);
+ free_irq(hba[i]->intr, hba[i]);
+ unregister_blkdev(MAJOR_NR+i, hba[i]->devname);
+ release_io_mem(hba[i]);
+ free_hba(i);
+ printk( KERN_ERR "cciss: out of memory");
+}
+
/*
* This is it. Find all the controllers and register them. I really hate
* stealing all these major device numbers.
@@ -2364,6 +2514,11 @@ static int __init cciss_init_one(struct
free_hba(i);
return(-1);
}
+ /* Assume we can't do SG page chaining until we know otherwise */
+ hba[i]->sglist_pool = NULL;
+ hba[i]->sglist_pool_dhandle = (dma_addr_t) NULL;
+ hba[i]->sg_limit = MAXSGENTRIES;
+
/* make sure the board interrupts are off */
hba[i]->access.set_intr_mask(hba[i], CCISS_INTR_OFF);
if( request_irq(hba[i]->intr, do_cciss_intr,
@@ -2386,24 +2541,8 @@ static int __init cciss_init_one(struct
&(hba[i]->errinfo_pool_dhandle));
if((hba[i]->cmd_pool_bits == NULL)
|| (hba[i]->cmd_pool == NULL)
- || (hba[i]->errinfo_pool == NULL))
- {
- if(hba[i]->cmd_pool_bits)
- kfree(hba[i]->cmd_pool_bits);
- if(hba[i]->cmd_pool)
- pci_free_consistent(hba[i]->pdev,
- NR_CMDS * sizeof(CommandList_struct),
- hba[i]->cmd_pool, hba[i]->cmd_pool_dhandle);
- if(hba[i]->errinfo_pool)
- pci_free_consistent(hba[i]->pdev,
- NR_CMDS * sizeof( ErrorInfo_struct),
- hba[i]->errinfo_pool,
- hba[i]->errinfo_pool_dhandle);
- free_irq(hba[i]->intr, hba[i]);
- unregister_blkdev(MAJOR_NR+i, hba[i]->devname);
- release_io_mem(hba[i]);
- free_hba(i);
- printk( KERN_ERR "cciss: out of memory");
+ || (hba[i]->errinfo_pool == NULL)) {
+ bail_out(i);
return(-1);
}

@@ -2419,7 +2558,7 @@ static int __init cciss_init_one(struct
#endif /* CCISS_DEBUG */

cciss_getgeometry(i);
-
+ cciss_allocate_extra_sg_pages(i);
cciss_scsi_setup(i);

/* Turn the interrupts on so we can service requests */
@@ -2427,21 +2566,16 @@ static int __init cciss_init_one(struct

cciss_procinit(i);

+ hba[i]->sg_limit = figure_sg_limit(i);
q = &hba[i]->queue;
q->queuedata = hba[i];
spin_lock_init(&hba[i]->lock);
blk_init_queue(q, do_cciss_request, &hba[i]->lock);
blk_queue_bounce_limit(q, hba[i]->pdev->dma_mask);
-
- /* This is a hardware imposed limit. */
- blk_queue_max_hw_segments(q, MAXSGENTRIES);
-
- /* This is a limit in the driver and could be eliminated. */
- blk_queue_max_phys_segments(q, MAXSGENTRIES);
-
+ blk_queue_max_hw_segments(q, hba[i]->sg_limit);
+ blk_queue_max_phys_segments(q, hba[i]->sg_limit);
blk_queue_max_sectors(q, 512);

-
for(j=0; j<NWD; j++) {
drive_info_struct *drv = &(hba[i]->drv[j]);
struct gendisk *disk = hba[i]->gendisk[j];
@@ -2509,6 +2643,11 @@ static void __devexit cciss_remove_one (
hba[i]->cmd_pool, hba[i]->cmd_pool_dhandle);
pci_free_consistent(hba[i]->pdev, NR_CMDS * sizeof( ErrorInfo_struct),
hba[i]->errinfo_pool, hba[i]->errinfo_pool_dhandle);
+ if (hba[i]->sglist_pool != NULL)
+ pci_free_consistent(hba[i]->pdev,
+ (CONFIG_CISS_MAX_SG_PAGES-1) * MAXSGENTRIES *
+ NR_CMDS * sizeof(SGDescriptor_struct),
+ hba[i]->sglist_pool, hba[i]->sglist_pool_dhandle);
kfree(hba[i]->cmd_pool_bits);
release_io_mem(hba[i]);
free_hba(i);
--- lx2544ac3/drivers/block/cciss_cmd.h~sgc3 Fri Oct 25 13:46:12 2002
+++ lx2544ac3-scameron/drivers/block/cciss_cmd.h Fri Oct 25 13:46:12 2002
@@ -7,6 +7,10 @@

//general boundary defintions
#define SENSEINFOBYTES 32//note that this value may vary between host implementations
+
+/* MAXSGENTRIES is the max scatter gather entries per SG page,
+ some adapters support chaining SG pages, using the last SG
+ entry to point to the next SG page. */
#define MAXSGENTRIES 31
#define MAXREPLYQS 256

@@ -197,6 +201,8 @@ typedef struct _SGDescriptor_struct {
DWORD Ext;
} SGDescriptor_struct;

+#define SGD_SIZE (sizeof(SGDescriptor_struct) * MAXSGENTRIES)
+
typedef union _MoreErrInfo_struct{
struct {
BYTE Reserved[3];
@@ -226,6 +232,11 @@ typedef struct _ErrorInfo_struct {
#define CMD_MSG_DONE 0x04
#define CMD_MSG_TIMEOUT 0x05

+struct sgd_list_t {
+ SGDescriptor_struct *sgd;
+ dma_addr_t dma;
+};
+
typedef struct _CommandList_struct {
CommandListHeader_struct Header;
RequestBlock_struct Request;
@@ -241,6 +252,7 @@ typedef struct _CommandList_struct {
struct request * rq;
struct completion *waiting;
int retry_count;
+ struct sgd_list_t sgdlist[CONFIG_CISS_MAX_SG_PAGES];
#ifdef CONFIG_CISS_SCSI_TAPE
void * scsi_cmd;
#endif
--- lx2544ac3/drivers/block/cciss.h~sgc3 Fri Oct 25 13:46:12 2002
+++ lx2544ac3-scameron/drivers/block/cciss.h Fri Oct 25 13:46:12 2002
@@ -36,6 +36,13 @@ typedef struct _drive_info_struct
int cylinders;
} drive_info_struct;

+#if CONFIG_CISS_MAX_SG_PAGES < 1
+#error "CONFIG_CISS_MAX_SG_PAGES must be greater than 0 and less than 9"
+#elif CONFIG_CISS_MAX_SG_PAGES > 8
+/* eight is rather arbitrary...no real limit here. */
+#error "CONFIG_CISS_MAX_SG_PAGES must be greater than 0 and less than 9"
+#endif
+
struct ctlr_info
{
int ctlr;
@@ -77,9 +84,12 @@ struct ctlr_info
dma_addr_t cmd_pool_dhandle;
ErrorInfo_struct *errinfo_pool;
dma_addr_t errinfo_pool_dhandle;
+ SGDescriptor_struct *sglist_pool;
+ dma_addr_t sglist_pool_dhandle;
unsigned long *cmd_pool_bits;
int nr_allocs;
int nr_frees;
+ int sg_limit;

// Disk structures we need to pass back
struct gendisk *gendisk[NWD];
--- lx2544ac3/drivers/block/cciss_scsi.c~sgc3 Fri Oct 25 13:46:12 2002
+++ lx2544ac3-scameron/drivers/block/cciss_scsi.c Fri Oct 25 13:46:12 2002
@@ -257,7 +257,7 @@ scsi_cmd_stack_free(int ctlr)
"Unknown" : scsi_device_types[n]

#if 0
-static int xmargin=8;
+static int xmargin=24;
static int amargin=60;

static void
@@ -297,6 +297,10 @@ print_bytes (unsigned char *c, int len,
static void
print_cmd(CommandList_struct *cp)
{
+ int i;
+ int sg, sgp;
+ SGDescriptor_struct *sgd;
+
printk("queue:%d\n", cp->Header.ReplyQueue);
printk("sglist:%d\n", cp->Header.SGList);
printk("sgtot:%d\n", cp->Header.SGTotal);
@@ -329,7 +333,25 @@ print_cmd(CommandList_struct *cp)
printk("edesc.Addr: 0x%08x/0%08x, Len = %d\n",
cp->ErrDesc.Addr.upper, cp->ErrDesc.Addr.lower,
cp->ErrDesc.Len);
- printk("sgs..........Errorinfo:\n");
+ printk("SGS:\n");
+ sgp = 0; sg = 0;
+ sgd = cp->sgdlist[sgp].sgd;
+ for (i=0;i<cp->Header.SGTotal;i++) {
+ printk("0x%08x%08x : %d: %d %s\n", sgd[sg].Addr.upper,
+ sgd[sg].Addr.lower, sgd[sg].Len,
+ sgd[sg].Ext, sg >= MAXSGENTRIES-1 ? " chain" : "");
+
+ if (sg == MAXSGENTRIES-1 && ! sgd[sg].Ext &&
+ cp->Header.SGTotal != cp->Header.SGList)
+ printk("Extended SG bit expected, but not set!\n");
+ sg++;
+ if (sg >= MAXSGENTRIES) {
+ sg = 0;
+ sgp++;
+ sgd = cp->sgdlist[sgp].sgd;
+ }
+ }
+ printk("Errorinfo:\n");
printk("scsistatus:%d\n", cp->err_info->ScsiStatus);
printk("senselen:%d\n", cp->err_info->SenseLen);
printk("cmd status:%d\n", cp->err_info->CommandStatus);
--- lx2544ac3/drivers/block/Config.help~sgc3 Fri Oct 25 13:46:12 2002
+++ lx2544ac3-scameron/drivers/block/Config.help Fri Oct 25 13:46:12 2002
@@ -196,6 +196,18 @@ CONFIG_BLK_CPQ_CISS_DA
boards supported by this driver, and for further information
on the use of this driver.

+CONFIG_CISS_MAX_SG_PAGES
+ This is the number of scatter gather pages to allocate for
+ each command. Each page can hold up to 31 address/length
+ pairs. The minimum and default value is 1 page. You may observe
+ the values reported in /proc/driver/cciss/cciss* for "Max SG
+ entries since init X/Y". These two values, X and Y, are the
+ maximum number of address/length pairs used by any one command
+ so far, and the total number of address/length pairs available
+ to each command respectively. If, after running a typical load,
+ X == Y, then adding another page may help. If Y - X > 31,
+ then subtracting a page may help.
+
CONFIG_CISS_SCSI_TAPE
When enabled (Y), this option allows SCSI tape drives and SCSI medium
changers (tape robots) to be accessed via a Compaq 5xxx array
--- lx2544ac3/drivers/block/Config.in~sgc3 Fri Oct 25 13:46:12 2002
+++ lx2544ac3-scameron/drivers/block/Config.in Fri Oct 25 13:46:12 2002
@@ -35,6 +35,9 @@ if [ "$CONFIG_PARIDE" = "y" -o "$CONFIG_
fi
dep_tristate 'Compaq SMART2 support' CONFIG_BLK_CPQ_DA $CONFIG_PCI
dep_tristate 'Compaq Smart Array 5xxx support' CONFIG_BLK_CPQ_CISS_DA $CONFIG_PCI
+if [ "$CONFIG_BLK_CPQ_CISS_DA" = "y" -o "$CONFIG_BLK_CPQ_CISS_DA" = "m" ]; then
+ int ' Maximum scatter gather pages per command (1-8)' CONFIG_CISS_MAX_SG_PAGES 1
+fi
dep_mbool ' SCSI tape drive support for Smart Array 5xxx' CONFIG_CISS_SCSI_TAPE $CONFIG_BLK_CPQ_CISS_DA $CONFIG_SCSI
dep_tristate 'Mylex DAC960/DAC1100 PCI RAID Controller support' CONFIG_BLK_DEV_DAC960 $CONFIG_PCI
dep_tristate 'Micro Memory MM5415 Battery Backed RAM support (EXPERIMENTAL)' CONFIG_BLK_DEV_UMEM $CONFIG_PCI $CONFIG_EXPERIMENTAL


2002-10-25 20:41:34

by Cameron, Steve

Subject: RE: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

Jens Axboe wrote:
> On Fri, Oct 25 2002, Stephen Cameron wrote:
> >
> > Add blk_rq_map_sg_one_by_one function to ll_rw_blk.c in order to allow a low
> > level driver to map scatter gather elements from the block subsystem one
> > at a time
[...]
> I have to say that I think this patch is ugly, and a complete
> duplicate of existing code. This is always bad, especially in the case of
> something not really straight forward (like blk_rq_map_sg()). A hack.

Yes, I sort of figured you'd say that. I was just trying to get the
ball rolling.

>
> I can understand the need for something like this, for drivers that
> can't use a struct scatterlist directly. I'd rather do this in a
> different way, though.
[...]

Ok, I will (try to) rewrite it as you suggest.

> Oh, and do try to follow the style. It's
[...]

Well, it doesn't look like cpqfc, at least.
Heh, I count it as a victory that you choose such small things
to complain about. Thanks for your time reviewing this.

-- steve

2002-10-25 21:05:20

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Fri, Oct 25 2002, Cameron, Steve wrote:
> Jens Axboe wrote:
> > On Fri, Oct 25 2002, Stephen Cameron wrote:
> > >
> > > Add blk_rq_map_sg_one_by_one function to ll_rw_blk.c in order to allow a low
> > > level driver to map scatter gather elements from the block subsystem one
> > > at a time
> [...]
> > I have to say that I think this patch is ugly, and a complete
> > duplicate of existing code. This is always bad, especially in the case of
> > something not really straight forward (like blk_rq_map_sg()). A hack.
>
> Yes, I sort of figured you'd say that. I was just trying to get the
> ball rolling.

I was hoping that was the case and you weren't really serious :-). I've
cut a draft version of this; it's not tested at all though, so be
careful... It should do what you want, yes? I'm not too happy with this
one, I need a bit more time to ponder.

> Well, it doesn't look like cpqfc, at least.

Well thank god. :)

===== drivers/block/cciss.c 1.63 vs edited =====
--- 1.63/drivers/block/cciss.c Fri Oct 18 12:55:58 2002
+++ edited/drivers/block/cciss.c Fri Oct 25 23:07:21 2002
@@ -1681,6 +1681,28 @@
end_that_request_last(cmd->rq);
}

+struct cciss_map_state {
+ CommandList_struct *command;
+ int data_dir;
+};
+
+static void cciss_map_sg(request_queue_t *q, struct scatterlist *sg, int nseg,
+ void *map_cookie)
+{
+ struct cciss_map_state *s = map_cookie;
+ CommandList_struct *c = s->command;
+ ctlr_info_t *h= q->queuedata;
+ u64bit temp64;
+
+ c->SG[nseg].Len = sg->length;
+ temp64.val = (__u64) pci_map_page(h->pdev, sg->page, sg->offset,
+ sg->length, s->data_dir);
+
+ c->SG[nseg].Addr.lower = temp64.val32.lower;
+ c->SG[nseg].Addr.upper = temp64.val32.upper;
+ c->SG[nseg].Ext = 0; // we are not chaining
+}
+
/*
* Get a request and submit it to the controller.
*/
@@ -1690,10 +1712,8 @@
CommandList_struct *c;
int start_blk, seg;
struct request *creq;
- u64bit temp64;
- struct scatterlist tmp_sg[MAXSGENTRIES];
+ struct cciss_map_state map_state;
drive_info_struct *drv;
- int i, dir;

if (blk_queue_plugged(q))
goto startio;
@@ -1735,24 +1755,16 @@
(int) creq->nr_sectors);
#endif /* CCISS_DEBUG */

- seg = blk_rq_map_sg(q, creq, tmp_sg);
-
/* get the DMA records for the setup */
if (c->Request.Type.Direction == XFER_READ)
- dir = PCI_DMA_FROMDEVICE;
+ map_state.data_dir = PCI_DMA_FROMDEVICE;
else
- dir = PCI_DMA_TODEVICE;
+ map_state.data_dir = PCI_DMA_TODEVICE;
+
+ map_state.command = c;
+
+ seg = blk_rq_map_callback(q, creq, cciss_map_sg, &map_state);

- for (i=0; i<seg; i++)
- {
- c->SG[i].Len = tmp_sg[i].length;
- temp64.val = (__u64) pci_map_page(h->pdev, tmp_sg[i].page,
- tmp_sg[i].offset, tmp_sg[i].length,
- dir);
- c->SG[i].Addr.lower = temp64.val32.lower;
- c->SG[i].Addr.upper = temp64.val32.upper;
- c->SG[i].Ext = 0; // we are not chaining
- }
/* track how many SG entries we are using */
if( seg > h->maxSG)
h->maxSG = seg;
===== drivers/block/ll_rw_blk.c 1.123 vs edited =====
--- 1.123/drivers/block/ll_rw_blk.c Fri Oct 18 19:41:37 2002
+++ edited/drivers/block/ll_rw_blk.c Fri Oct 25 23:00:55 2002
@@ -765,13 +793,11 @@
return 0;
}

-/*
- * map a request to scatterlist, return number of sg entries setup. Caller
- * must make sure sg can hold rq->nr_phys_segments entries
- */
-int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg)
+int __blk_rq_map(request_queue_t *q, struct request *rq,
+ struct scatterlist *sglist, map_sg_fn *map, void *map_cookie)
{
struct bio_vec *bvec, *bvprv;
+ struct scatterlist *sg, sgstat;
struct bio *bio;
int nsegs, i, cluster;

@@ -782,6 +808,7 @@
* for each bio in rq
*/
bvprv = NULL;
+ sg = NULL;
rq_for_each_bio(bio, rq) {
/*
* for each segment in bio
@@ -790,7 +817,7 @@
int nbytes = bvec->bv_len;

if (bvprv && cluster) {
- if (sg[nsegs - 1].length + nbytes > q->max_segment_size)
+ if (sg->length + nbytes > q->max_segment_size)
goto new_segment;

if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
@@ -798,13 +825,21 @@
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
goto new_segment;

- sg[nsegs - 1].length += nbytes;
+ sg->length += nbytes;
} else {
new_segment:
- memset(&sg[nsegs],0,sizeof(struct scatterlist));
- sg[nsegs].page = bvec->bv_page;
- sg[nsegs].length = nbytes;
- sg[nsegs].offset = bvec->bv_offset;
+ if (sglist) {
+ sg = &sglist[nsegs];
+ sg->page = bvec->bv_page;
+ sg->length = nbytes;
+ sg->offset = bvec->bv_offset;
+ } else {
+ sg = &sgstat;
+ sg->page = bvec->bv_page;
+ sg->length = nbytes;
+ sg->offset = bvec->bv_offset;
+ map(q, sg, nsegs, map_cookie);
+ }

nsegs++;
}
@@ -816,10 +851,27 @@
}

/*
+ * map a request to scatterlist, return number of sg entries setup. Caller
+ * must make sure sg can hold rq->nr_phys_segments entries
+ */
+int blk_rq_map_sg(request_queue_t *q, struct request *rq,struct scatterlist *sg)
+{
+ return __blk_rq_map(q, rq, sg, NULL, NULL);
+}
+
+/*
+ * map a request to scatterlist, calling the map function for each segment
+ */
+int blk_rq_map_callback(request_queue_t *q, struct request *rq, map_sg_fn *map,
+ void *map_cookie)
+{
+ return __blk_rq_map(q, rq, NULL, map, map_cookie);
+}
+
+/*
* the standard queue merge functions, can be overridden with device
* specific ones if so desired
*/
-
static inline int ll_new_mergeable(request_queue_t *q,
struct request *req,
struct bio *bio)
===== include/linux/blkdev.h 1.76 vs edited =====
--- 1.76/include/linux/blkdev.h Fri Oct 18 19:50:43 2002
+++ edited/include/linux/blkdev.h Fri Oct 25 23:06:51 2002
@@ -131,6 +136,8 @@
typedef int (prep_rq_fn) (request_queue_t *, struct request *);
typedef void (unplug_fn) (void *q);

+typedef void (map_sg_fn) (request_queue_t *q, struct scatterlist *,int, void *);
+
struct bio_vec;
typedef int (merge_bvec_fn) (request_queue_t *, struct bio *, struct bio_vec *);

@@ -339,9 +348,11 @@
extern void blk_queue_assign_lock(request_queue_t *, spinlock_t *);
extern void blk_queue_prep_rq(request_queue_t *, prep_rq_fn *pfn);
extern void blk_queue_merge_bvec(request_queue_t *, merge_bvec_fn *);
+extern void blk_queue_dma_alignment(request_queue_t *, int);
extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);

extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
+extern int blk_rq_map_callback(request_queue_t *, struct request *, map_sg_fn *, void *);
extern void blk_dump_rq_flags(struct request *, char *);
extern void generic_unplug_device(void *);
extern long nr_blockdev_pages(void);

--
Jens Axboe

2002-10-25 21:18:45

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Fri, Oct 25 2002, Jens Axboe wrote:
> On Fri, Oct 25 2002, Cameron, Steve wrote:
> > Jens Axboe wrote:
> > > On Fri, Oct 25 2002, Stephen Cameron wrote:
> > > >
> > > > Add blk_rq_map_sg_one_by_one function to ll_rw_blk.c in order to allow a low
> > > > level driver to map scatter gather elements from the block subsystem one
> > > > at a time
> > [...]
> > > I have to say that I think this patch is ugly, and a complete
> > > duplicate of existing code. This is always bad, especially in the case of
> > > something not really straight forward (like blk_rq_map_sg()). A hack.
> >
> > Yes, I sort of figured you'd say that. I was just trying to get the
> > ball rolling.
>
> I was hoping that was the case and you weren't really serious :-). I've
> cut a draft version of this, it's not tested at all though to be
> careful... It should do what you want, yes? I'm not too happy with this
> one, need a bit more time to ponder.

Well no doubt that this is much cleaner. It just needs to be evaluated
for overhead... It does incur an extra function call per segment
mapped. On the other hand, we save the conditional and branch. IOW, I'll
probably go with this one.

--
Jens Axboe

2002-10-25 21:20:20

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Fri, Oct 25 2002, Jens Axboe wrote:
> probably go with this one.

Ahem...

===== drivers/block/cciss.c 1.63 vs edited =====
--- 1.63/drivers/block/cciss.c Fri Oct 18 12:55:58 2002
+++ edited/drivers/block/cciss.c Fri Oct 25 23:07:21 2002
@@ -1681,6 +1681,28 @@
end_that_request_last(cmd->rq);
}

+struct cciss_map_state {
+ CommandList_struct *command;
+ int data_dir;
+};
+
+static void cciss_map_sg(request_queue_t *q, struct scatterlist *sg, int nseg,
+ void *map_cookie)
+{
+ struct cciss_map_state *s = map_cookie;
+ CommandList_struct *c = s->command;
+ ctlr_info_t *h= q->queuedata;
+ u64bit temp64;
+
+ c->SG[nseg].Len = sg->length;
+ temp64.val = (__u64) pci_map_page(h->pdev, sg->page, sg->offset,
+ sg->length, s->data_dir);
+
+ c->SG[nseg].Addr.lower = temp64.val32.lower;
+ c->SG[nseg].Addr.upper = temp64.val32.upper;
+ c->SG[nseg].Ext = 0; // we are not chaining
+}
+
/*
* Get a request and submit it to the controller.
*/
@@ -1690,10 +1712,8 @@
CommandList_struct *c;
int start_blk, seg;
struct request *creq;
- u64bit temp64;
- struct scatterlist tmp_sg[MAXSGENTRIES];
+ struct cciss_map_state map_state;
drive_info_struct *drv;
- int i, dir;

if (blk_queue_plugged(q))
goto startio;
@@ -1735,24 +1755,16 @@
(int) creq->nr_sectors);
#endif /* CCISS_DEBUG */

- seg = blk_rq_map_sg(q, creq, tmp_sg);
-
/* get the DMA records for the setup */
if (c->Request.Type.Direction == XFER_READ)
- dir = PCI_DMA_FROMDEVICE;
+ map_state.data_dir = PCI_DMA_FROMDEVICE;
else
- dir = PCI_DMA_TODEVICE;
+ map_state.data_dir = PCI_DMA_TODEVICE;
+
+ map_state.command = c;
+
+ seg = blk_rq_map_callback(q, creq, cciss_map_sg, &map_state);

- for (i=0; i<seg; i++)
- {
- c->SG[i].Len = tmp_sg[i].length;
- temp64.val = (__u64) pci_map_page(h->pdev, tmp_sg[i].page,
- tmp_sg[i].offset, tmp_sg[i].length,
- dir);
- c->SG[i].Addr.lower = temp64.val32.lower;
- c->SG[i].Addr.upper = temp64.val32.upper;
- c->SG[i].Ext = 0; // we are not chaining
- }
/* track how many SG entries we are using */
if( seg > h->maxSG)
h->maxSG = seg;
===== drivers/block/ll_rw_blk.c 1.123 vs edited =====
--- 1.123/drivers/block/ll_rw_blk.c Fri Oct 18 19:41:37 2002
+++ edited/drivers/block/ll_rw_blk.c Fri Oct 25 23:22:10 2002
@@ -765,13 +793,11 @@
return 0;
}

-/*
- * map a request to scatterlist, return number of sg entries setup. Caller
- * must make sure sg can hold rq->nr_phys_segments entries
- */
-int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg)
+int __blk_rq_map(request_queue_t *q, struct request *rq, map_sg_fn *map,
+ void *cookie)
{
struct bio_vec *bvec, *bvprv;
+ struct scatterlist sg;
struct bio *bio;
int nsegs, i, cluster;

@@ -790,7 +816,7 @@
int nbytes = bvec->bv_len;

if (bvprv && cluster) {
- if (sg[nsegs - 1].length + nbytes > q->max_segment_size)
+ if (sg.length + nbytes > q->max_segment_size)
goto new_segment;

if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
@@ -798,14 +824,13 @@
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
goto new_segment;

- sg[nsegs - 1].length += nbytes;
+ sg.length += nbytes;
} else {
new_segment:
- memset(&sg[nsegs],0,sizeof(struct scatterlist));
- sg[nsegs].page = bvec->bv_page;
- sg[nsegs].length = nbytes;
- sg[nsegs].offset = bvec->bv_offset;
-
+ sg.page = bvec->bv_page;
+ sg.offset = bvec->bv_offset;
+ sg.length = nbytes;
+ map(q, &sg, nsegs, cookie);
nsegs++;
}
bvprv = bvec;
@@ -815,11 +840,38 @@
return nsegs;
}

+static void blk_map_to_sg(request_queue_t *q, struct scatterlist *sg,
+ int segment, void *cookie)
+{
+ struct scatterlist *sglist = cookie;
+
+ sglist[segment].page = sg->page;
+ sglist[segment].offset = sg->offset;
+ sglist[segment].length = sg->length;
+}
+
+/*
+ * map a request to scatterlist, return number of sg entries setup. Caller
+ * must make sure sg can hold rq->nr_phys_segments entries
+ */
+int blk_rq_map_sg(request_queue_t *q, struct request *rq,struct scatterlist *sg)
+{
+ return __blk_rq_map(q, rq, blk_map_to_sg, sg);
+}
+
+/*
+ * map a request to scatterlist, calling the map function for each segment
+ */
+int blk_rq_map_callback(request_queue_t *q, struct request *rq, map_sg_fn *map,
+ void *map_cookie)
+{
+ return __blk_rq_map(q, rq, map, map_cookie);
+}
+
/*
* the standard queue merge functions, can be overridden with device
* specific ones if so desired
*/
-
static inline int ll_new_mergeable(request_queue_t *q,
struct request *req,
struct bio *bio)
===== include/linux/blkdev.h 1.76 vs edited =====
--- 1.76/include/linux/blkdev.h Fri Oct 18 19:50:43 2002
+++ edited/include/linux/blkdev.h Fri Oct 25 23:06:51 2002
@@ -131,6 +136,8 @@
typedef int (prep_rq_fn) (request_queue_t *, struct request *);
typedef void (unplug_fn) (void *q);

+typedef void (map_sg_fn) (request_queue_t *q, struct scatterlist *,int, void *);
+
struct bio_vec;
typedef int (merge_bvec_fn) (request_queue_t *, struct bio *, struct bio_vec *);

@@ -339,9 +348,11 @@
extern void blk_queue_assign_lock(request_queue_t *, spinlock_t *);
extern void blk_queue_prep_rq(request_queue_t *, prep_rq_fn *pfn);
extern void blk_queue_merge_bvec(request_queue_t *, merge_bvec_fn *);
+extern void blk_queue_dma_alignment(request_queue_t *, int);
extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);

extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
+extern int blk_rq_map_callback(request_queue_t *, struct request *, map_sg_fn *, void *);
extern void blk_dump_rq_flags(struct request *, char *);
extern void generic_unplug_device(void *);
extern long nr_blockdev_pages(void);

--
Jens Axboe

2002-10-25 21:25:52

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Fri, Oct 25 2002, Jens Axboe wrote:
> } else {
> new_segment:
> - memset(&sg[nsegs],0,sizeof(struct scatterlist));
> - sg[nsegs].page = bvec->bv_page;
> - sg[nsegs].length = nbytes;
> - sg[nsegs].offset = bvec->bv_offset;
> -
> + sg.page = bvec->bv_page;
> + sg.offset = bvec->bv_offset;
> + sg.length = nbytes;
> + map(q, &sg, nsegs, cookie);
if (sgprev)
map(q, sgprev, nsegs, cookie);

of course, and likewise for the cluster check. I'll cut a clean version
tomorrow, I'm out for today..

--
Jens Axboe

2002-10-25 22:00:13

by Cameron, Steve

Subject: RE: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

>
> On Fri, Oct 25 2002, Jens Axboe wrote:
[...]
> of course, and likewise for the cluster check. I'll cut a
> clean version
> tomorrow, I'm out for today..
>
Ok, Thanks. I'm out 'til Monday.

What originally drove me down this road was not wanting to
have:

struct scatterlist tmpsg[XXX]; /* 8 kbytes */

on the stack for every command when XXX might be 512, which
was what I was expecting when I started writing this patch.
Once I got it working, empirically I saw the most I ever
get is 64 SGs. I don't know if that's a kernel limit, or
if I just haven't beat on the system hard enough.

In hindsight, perhaps it would have been better for me to just put
it on the stack like before, only more so, and just change the
driver to use more SGs, and only later tackle the stack problem,
if necessary. As you noted, it's not clear that the overhead of
the function call per scatter gather element isn't worse than
just reformatting the entire scatter gather list in a separate
pass, as is currently done (before my patch).

Have a good weekend.

-- steve

2002-10-28 15:21:12

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Fri, Oct 25 2002, Cameron, Steve wrote:
> >
> > On Fri, Oct 25 2002, Jens Axboe wrote:
> [...]
> > of course, and likewise for the cluster check. I'll cut a
> > clean version
> > tomorrow, I'm out for today..
> >
> Ok, Thanks. I'm out 'til Monday.
>
> What originally drove me down this road was not wanting to
> have:
>
> struct scatterlist tmpsg[XXX]; /* 8 kbytes */
>
> on the stack for every command when XXX might be 512, which
> was what I was expecting when I started writing this patch.
> Once I got it working, empirically I saw the most I ever
> get is 64 SGs. I don't know if that's a kernel limit, or
> if I just haven't beat on the system hard enough.

It's not a kernel limit. If your queue limits are all beyond 64 entries,
what you typically see is that you just hit what mpage will fill in for
you. One thing that should give you the max number of sg entries is plain
and simple:

dd if=/dev/zero of=some_file bs=4096k

Besides, if you put 8kb on the stack you are already seriously in
trouble. 64 entries is already 1280 bytes on stack for PAE x86. On
64-bit archs you are hitting the 2kb limit. I like your idea of having a
consume function, so I think we should just stick with that.

> In hindsight, perhaps it would have been better for me to just put
> it on the stack like before, only moreso, and just change the
> driver to use more SGs, and only later tackle the stack problem,
> if necessary. As you noted, it's not clear that the overhead of
> the function call per scatter gather element isn't worse than
> just reformatting the entire scatter gather list in a separate
> pass, as is currently done (before my patch).

Well that's hard to say, it needs to be profiled. For reference, here's my
current patch. This one has even been booted, seems to work :-). Don't
mind the dma_alignment bits, that's just pollution from my tree.

===== drivers/block/cciss.c 1.63 vs edited =====
--- 1.63/drivers/block/cciss.c Fri Oct 18 12:55:58 2002
+++ edited/drivers/block/cciss.c Sun Oct 27 15:35:37 2002
@@ -1681,6 +1681,28 @@
end_that_request_last(cmd->rq);
}

+struct cciss_map_state {
+ CommandList_struct *command;
+ int data_dir;
+};
+
+static void cciss_map_sg(request_queue_t *q, struct scatterlist *sg, int nseg,
+ void *cookie)
+{
+ struct cciss_map_state *s = cookie;
+ CommandList_struct *c = s->command;
+ ctlr_info_t *h= q->queuedata;
+ u64bit temp64;
+
+ c->SG[nseg].Len = sg->length;
+ temp64.val = (__u64) pci_map_page(h->pdev, sg->page, sg->offset,
+ sg->length, s->data_dir);
+
+ c->SG[nseg].Addr.lower = temp64.val32.lower;
+ c->SG[nseg].Addr.upper = temp64.val32.upper;
+ c->SG[nseg].Ext = 0; // we are not chaining
+}
+
/*
* Get a request and submit it to the controller.
*/
@@ -1690,10 +1712,8 @@
CommandList_struct *c;
int start_blk, seg;
struct request *creq;
- u64bit temp64;
- struct scatterlist tmp_sg[MAXSGENTRIES];
+ struct cciss_map_state map_state;
drive_info_struct *drv;
- int i, dir;

if (blk_queue_plugged(q))
goto startio;
@@ -1735,24 +1755,16 @@
(int) creq->nr_sectors);
#endif /* CCISS_DEBUG */

- seg = blk_rq_map_sg(q, creq, tmp_sg);
-
/* get the DMA records for the setup */
if (c->Request.Type.Direction == XFER_READ)
- dir = PCI_DMA_FROMDEVICE;
+ map_state.data_dir = PCI_DMA_FROMDEVICE;
else
- dir = PCI_DMA_TODEVICE;
+ map_state.data_dir = PCI_DMA_TODEVICE;
+
+ map_state.command = c;
+
+ seg = blk_rq_map_consume(q, creq, cciss_map_sg, &map_state);

- for (i=0; i<seg; i++)
- {
- c->SG[i].Len = tmp_sg[i].length;
- temp64.val = (__u64) pci_map_page(h->pdev, tmp_sg[i].page,
- tmp_sg[i].offset, tmp_sg[i].length,
- dir);
- c->SG[i].Addr.lower = temp64.val32.lower;
- c->SG[i].Addr.upper = temp64.val32.upper;
- c->SG[i].Ext = 0; // we are not chaining
- }
/* track how many SG entries we are using */
if( seg > h->maxSG)
h->maxSG = seg;
===== drivers/block/ll_rw_blk.c 1.123 vs edited =====
--- 1.123/drivers/block/ll_rw_blk.c Fri Oct 18 19:41:37 2002
+++ edited/drivers/block/ll_rw_blk.c Mon Oct 28 11:43:27 2002
@@ -765,61 +793,123 @@
return 0;
}

-/*
- * map a request to scatterlist, return number of sg entries setup. Caller
- * must make sure sg can hold rq->nr_phys_segments entries
- */
-int blk_rq_map_sg(request_queue_t *q, struct request *rq, struct scatterlist *sg)
+static int __blk_rq_map(request_queue_t *q, struct request *rq,
+ consume_sg_fn *consume_fn, void *cookie)
{
struct bio_vec *bvec, *bvprv;
+ struct scatterlist sg;
struct bio *bio;
- int nsegs, i, cluster;
-
- nsegs = 0;
- cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
+ int nsegs, i;
+ const int cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
+ const int max_seg = q->max_segment_size;

/*
* for each bio in rq
*/
- bvprv = NULL;
+ nsegs = 0;
rq_for_each_bio(bio, rq) {
/*
* for each segment in bio
*/
+ bvprv = NULL;
bio_for_each_segment(bvec, bio, i) {
- int nbytes = bvec->bv_len;
-
- if (bvprv && cluster) {
- if (sg[nsegs - 1].length + nbytes > q->max_segment_size)
- goto new_segment;
-
- if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
- goto new_segment;
- if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
- goto new_segment;
-
- sg[nsegs - 1].length += nbytes;
- } else {
-new_segment:
- memset(&sg[nsegs],0,sizeof(struct scatterlist));
- sg[nsegs].page = bvec->bv_page;
- sg[nsegs].length = nbytes;
- sg[nsegs].offset = bvec->bv_offset;
+ /*
+ * if length does not exceed max segment size
+ * and does not straddle a physical memory
+ * boundary and does not start a new physical
+ * segment, cluster on to previous one
+ */
+ if ((bvprv && cluster)
+ && (sg.length + bvec->bv_len <= max_seg)
+ && BIOVEC_PHYS_MERGEABLE(bvprv, bvec)
+ && BIOVEC_SEG_BOUNDARY(q, bvprv, bvec)) {
+ sg.length += bvec->bv_len;
+ continue;
+ }

+ /*
+ * could/must not cluster, start a new segment making
+ * sure to consume the previous one first
+ */
+ if (bvprv) {
+ consume_fn(q, &sg, nsegs, cookie);
nsegs++;
}
+
+ sg.page = bvec->bv_page;
+ sg.offset = bvec->bv_offset;
+ sg.length = bvec->bv_len;
bvprv = bvec;
+
} /* segments in bio */
+
+ /*
+ * consume leftover segment, if any
+ */
+ if (bvprv) {
+ consume_fn(q, &sg, nsegs, cookie);
+ nsegs++;
+ }
+
} /* bios in rq */

return nsegs;
}

+static void blk_consume_sg(request_queue_t *q, struct scatterlist *sg,
+ int segment, void *cookie)
+{
+ struct scatterlist *sglist = cookie;
+
+ sglist[segment].page = sg->page;
+ sglist[segment].offset = sg->offset;
+ sglist[segment].length = sg->length;
+}
+
+/**
+ * blk_rq_map_sg - map a request to a scatterlist
+ * @q: The &request_queue_t the request belongs to
+ * @rq: The request
+ * @sg: The scatterlist to map to
+ *
+ * Description:
+ *
+ * Map a request to a scatterlist, returning the number of sg entries
+ * setup. Caller must make sure that @sg can hold ->nr_phys_segment
+ * worth of entries.
+ **/
+int blk_rq_map_sg(request_queue_t *q, struct request *rq,struct scatterlist *sg)
+{
+ return __blk_rq_map(q, rq, blk_consume_sg, sg);
+}
+
+/**
+ * blk_rq_map_consume - map a request, calling a consume function for each entry
+ * @q: The &request_queue_t the request belongs to
+ * @rq: The request
+ * @consume_fn: Function to call for each sg mapping
+ * @cookie: Per-mapping cookie
+ *
+ * Description:
+ *
+ * Like blk_rq_map_sg(), but call the @consume_fn for each sg entry
+ * prepared. Some drivers find this more handy if their internal
+ * scatterlist is different from the Linux scatterlist since they then
+ * don't have to allocate (on stack or dynamically) a large dummy scatterlist
+ * just for the intermediate request mapping. Drivers can use @cookie to
+ * separate various mappings from each other, typically by passing the actual
+ * hardware command here.
+ **/
+int blk_rq_map_consume(request_queue_t *q, struct request *rq,
+ consume_sg_fn *consume_fn, void *cookie)
+{
+ return __blk_rq_map(q, rq, consume_fn, cookie);
+}
+
/*
* the standard queue merge functions, can be overridden with device
* specific ones if so desired
*/
-
static inline int ll_new_mergeable(request_queue_t *q,
struct request *req,
struct bio *bio)
@@ -2112,7 +2266,9 @@
EXPORT_SYMBOL(blk_queue_max_segment_size);
EXPORT_SYMBOL(blk_queue_hardsect_size);
EXPORT_SYMBOL(blk_queue_segment_boundary);
+EXPORT_SYMBOL(blk_queue_dma_alignment);
EXPORT_SYMBOL(blk_rq_map_sg);
+EXPORT_SYMBOL(blk_rq_map_consume);
EXPORT_SYMBOL(blk_nohighio);
EXPORT_SYMBOL(blk_dump_rq_flags);
EXPORT_SYMBOL(submit_bio);
===== include/linux/blkdev.h 1.76 vs edited =====
--- 1.76/include/linux/blkdev.h Fri Oct 18 19:50:43 2002
+++ edited/include/linux/blkdev.h Sun Oct 27 22:06:19 2002
@@ -339,9 +353,14 @@
extern void blk_queue_assign_lock(request_queue_t *, spinlock_t *);
extern void blk_queue_prep_rq(request_queue_t *, prep_rq_fn *pfn);
extern void blk_queue_merge_bvec(request_queue_t *, merge_bvec_fn *);
+extern void blk_queue_dma_alignment(request_queue_t *, int);
extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);

extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
+
+typedef void (consume_sg_fn) (request_queue_t *q, struct scatterlist *, int, void *);
+extern int blk_rq_map_consume(request_queue_t *, struct request *, consume_sg_fn *, void *);
+
extern void blk_dump_rq_flags(struct request *, char *);
extern void generic_unplug_device(void *);
extern long nr_blockdev_pages(void);

--
Jens Axboe

2002-10-28 23:08:18

by Cameron, Steve

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

Jens Axboe wrote:
[...]
> It's not a kernel limit. If you queue limits are all beyond 64 entries,
> what you typically see is that you just hit what mpage will fill in for
> you. One thing that should give you max number of sg entries is plain
> and simple:
>
> dd if=/dev/zero of=some_file bs=4096k

What about

dd if=/dev/cciss/c0d1p1 of=/dev/null bs=4096k

Should be similar, right? When I tried that, I still got a max
of 64 SGs coming down. Tried lots of other block sizes too...
Maybe it's something to do with disk performance. A RAID 1 volume of two
disks seems to give me 64 SGs, but a RAID 0 set (1 disk) gets me
just 32. Maybe adding more disks in a stripe set would get me more.
I haven't experimented enough to see if that behavior is consistent.

[...]
> Besides, if you put 8kb on the stack you are already seriously in
> trouble.

That is what I thought, but I couldn't remember where I learned it.
Anyway, here is a new cciss patch to work with your ll_rw_blk.c patch
to use more than 31 scatter gather entries.
-- steve

drivers/block/Config.help | 16 +++
drivers/block/Config.in | 3
drivers/block/cciss.c | 216 +++++++++++++++++++++++++++++++++++----------
drivers/block/cciss.h | 10 ++
drivers/block/cciss_cmd.h | 13 ++
drivers/block/cciss_scsi.c | 26 +++++
6 files changed, 237 insertions, 47 deletions

--- lx2544ac3/drivers/block/cciss.c~sgc3 Mon Oct 28 15:18:44 2002
+++ lx2544ac3-root/drivers/block/cciss.c Mon Oct 28 16:02:07 2002
@@ -37,6 +37,7 @@
#include <linux/init.h>
#include <linux/hdreg.h>
#include <linux/spinlock.h>
+#include <linux/completion.h>
#include <asm/uaccess.h>
#include <asm/io.h>

@@ -164,7 +165,7 @@ static int cciss_proc_get_info(char *buf
" Current Q depth: %d\n"
" Max Q depth since init: %d\n"
" Max # commands on controller since init: %d\n"
- " Max SG entries since init: %d\n\n",
+ " Max SG entries since init: %d/%d\n\n",
h->devname,
h->product_name,
(unsigned long)h->board_id,
@@ -173,7 +174,8 @@ static int cciss_proc_get_info(char *buf
(unsigned int)h->intr,
h->num_luns,
h->highest_lun,
- h->Qdepth, h->maxQsinceinit, h->max_outstanding, h->maxSG);
+ h->Qdepth, h->maxQsinceinit, h->max_outstanding, h->maxSG,
+ h->sg_limit);

pos += size; len += size;
cciss_proc_tape_report(ctlr, buffer, &pos, &len);
@@ -248,6 +250,44 @@ static void __init cciss_procinit(int i)
}
#endif /* CONFIG_PROC_FS */

+static int supports_sg_chaining(ctlr_info_t *h)
+{
+ int fw_ver = (h->firm_ver[0] - '0') * 100 + /* [1] == '.', skip */
+ (h->firm_ver[2] - '0') * 10 +
+ (h->firm_ver[3] - '0');
+ /* 1st generation controllers need >= 2.32 firmware */
+ if (h->pdev->device == PCI_DEVICE_ID_COMPAQ_CISS)
+ return (fw_ver >= 232);
+ /* 2nd generation controllers need >= 1.78 firmware */
+ if (h->pdev->device == PCI_DEVICE_ID_COMPAQ_CISSB)
+ return (fw_ver >= 178);
+ /* All controllers >= 3rd generation (will) support this. */
+ return 1;
+}
+
+static void cciss_allocate_extra_sg_pages(int ctlr)
+{
+ /* Allocate extra SG list pages if supported and requested */
+ if (supports_sg_chaining(hba[ctlr]) && CONFIG_CISS_MAX_SG_PAGES > 1) {
+ hba[ctlr]->sglist_pool = (SGDescriptor_struct *)
+ pci_alloc_consistent(hba[ctlr]->pdev,
+ (CONFIG_CISS_MAX_SG_PAGES-1) * MAXSGENTRIES *
+ NR_CMDS * sizeof(SGDescriptor_struct),
+ &(hba[ctlr]->sglist_pool_dhandle));
+ if (hba[ctlr]->sglist_pool == NULL)
+ printk(KERN_WARNING "cciss%d: Scatter-gather list "
+ "chaining not enabled.", ctlr);
+ }
+}
+
+static inline int figure_sg_limit(int i)
+{
+ /* figure out how many SGs we want to support for this board. */
+ if (supports_sg_chaining(hba[i]) && hba[i]->sglist_pool != NULL)
+ return CONFIG_CISS_MAX_SG_PAGES * (MAXSGENTRIES-1);
+ return MAXSGENTRIES; /* no SG list chaining. */
+}
+
/*
* For operations that cannot sleep, a command block is allocated at init,
* and managed by cmd_alloc() and cmd_free() using a simple bitmap to track
@@ -258,7 +298,7 @@ static void __init cciss_procinit(int i)
static CommandList_struct * cmd_alloc(ctlr_info_t *h, int get_from_pool)
{
CommandList_struct *c;
- int i;
+ int i, j;
u64bit temp64;
dma_addr_t cmd_dma_handle, err_dma_handle;

@@ -273,14 +313,40 @@ static CommandList_struct * cmd_alloc(ct
c->err_info = (ErrorInfo_struct *)pci_alloc_consistent(
h->pdev, sizeof(ErrorInfo_struct),
&err_dma_handle);
-
- if (c->err_info == NULL)
- {
+
+ if (c->err_info == NULL) {
pci_free_consistent(h->pdev,
sizeof(CommandList_struct), c, cmd_dma_handle);
return NULL;
}
+ c->sgdlist[0].sgd = c->SG;
+ c->sgdlist[0].dma = cmd_dma_handle +
+ ((void *) c->SG - (void *) c);
memset(c->err_info, 0, sizeof(ErrorInfo_struct));
+
+ if (h->sglist_pool != NULL) {
+ c->sgdlist[1].sgd = (SGDescriptor_struct *)
+ pci_alloc_consistent(h->pdev, SGD_SIZE *
+ (CONFIG_CISS_MAX_SG_PAGES - 1),
+ &c->sgdlist[1].dma);
+
+ if (c->sgdlist[1].sgd == NULL) {
+ pci_free_consistent(h->pdev,
+ sizeof(ErrorInfo_struct), c->err_info,
+ err_dma_handle);
+ pci_free_consistent(h->pdev,
+ sizeof(CommandList_struct), c,
+ cmd_dma_handle);
+ return NULL;
+ }
+ memset(&c->sgdlist[0].sgd[0], 0, SGD_SIZE);
+ for (j=1;j<CONFIG_CISS_MAX_SG_PAGES-1; j++) {
+ c->sgdlist[j+1].sgd =
+ (void *)c->sgdlist[j].sgd + SGD_SIZE;
+ c->sgdlist[j+1].dma =
+ c->sgdlist[j].dma + SGD_SIZE;
+ }
+ }
} else /* get it out of the controllers pool */
{
do {
@@ -300,6 +366,20 @@ static CommandList_struct * cmd_alloc(ct
err_dma_handle = h->errinfo_pool_dhandle
+ i*sizeof(ErrorInfo_struct);
h->nr_allocs++;
+ c->sgdlist[0].sgd = c->SG;
+ c->sgdlist[0].dma = cmd_dma_handle +
+ ((void *) c->SG - (void *) c);
+ if (h->sglist_pool != NULL)
+ for (j=0; j < CONFIG_CISS_MAX_SG_PAGES-1 ; j++) {
+ c->sgdlist[j+1].sgd =
+ (void *) h->sglist_pool +
+ SGD_SIZE *
+ (i * (CONFIG_CISS_MAX_SG_PAGES - 1)+j);
+ c->sgdlist[j+1].dma =
+ h->sglist_pool_dhandle +
+ SGD_SIZE *
+ (i * (CONFIG_CISS_MAX_SG_PAGES - 1)+j);
+ }
}

c->busaddr = (__u32) cmd_dma_handle;
@@ -307,11 +387,8 @@ static CommandList_struct * cmd_alloc(ct
c->ErrDesc.Addr.lower = temp64.val32.lower;
c->ErrDesc.Addr.upper = temp64.val32.upper;
c->ErrDesc.Len = sizeof(ErrorInfo_struct);
-
c->ctlr = h->ctlr;
- return c;
-
-
+ return c;
}

/*
@@ -326,6 +403,9 @@ static void cmd_free(ctlr_info_t *h, Com
{
temp64.val32.lower = c->ErrDesc.Addr.lower;
temp64.val32.upper = c->ErrDesc.Addr.upper;
+ if (h->sglist_pool != NULL)
+ pci_free_consistent(h->pdev, SGD_SIZE *
+ (CONFIG_CISS_MAX_SG_PAGES - 1),
+ c->sgdlist[1].sgd, c->sgdlist[1].dma);
pci_free_consistent(h->pdev, sizeof(ErrorInfo_struct),
c->err_info, (dma_addr_t) temp64.val);
pci_free_consistent(h->pdev, sizeof(CommandList_struct),
@@ -1771,6 +1851,8 @@ static inline void complete_command( ctl
struct cciss_map_state {
CommandList_struct *command;
int data_dir;
+ int sgelem;
+ int sgpage;
};

static void cciss_map_sg(request_queue_t *q, struct scatterlist *sg, int nseg,
@@ -1778,16 +1860,32 @@ static void cciss_map_sg(request_queue_t
{
struct cciss_map_state *s = cookie;
CommandList_struct *c = s->command;
+ SGDescriptor_struct *sgd;
ctlr_info_t *h= q->queuedata;
u64bit temp64;

- c->SG[nseg].Len = sg->length;
- temp64.val = (__u64) pci_map_page(h->pdev, sg->page, sg->offset,
- sg->length, s->data_dir);
+ sgd = &c->sgdlist[s->sgpage].sgd[s->sgelem];

- c->SG[nseg].Addr.lower = temp64.val32.lower;
- c->SG[nseg].Addr.upper = temp64.val32.upper;
- c->SG[nseg].Ext = 0; // we are not chaining
+ /* if we hit end of SG page, then chain to new SG page, if we can */
+ if (s->sgelem == MAXSGENTRIES-1 && h->sglist_pool) {
+ s->sgpage++;
+ /* Set this SG elem to point to dma-able addr of next SG page */
+ temp64.val = (__u64) c->sgdlist[s->sgpage].dma;
+ sgd->Addr.lower = temp64.val32.lower;
+ sgd->Addr.upper = temp64.val32.upper;
+ sgd->Ext = CISS_SG_CHAIN;
+ sgd->Len = SGD_SIZE; /* total no. of bytes in a SG page. */
+ /* (wrong for last page.) */
+ sgd = &c->sgdlist[s->sgpage].sgd[0];
+ s->sgelem = 0; /* Start at SG zero on the new SG page */
+ }
+ temp64.val = (__u64) pci_map_page(h->pdev, sg->page, sg->offset,
+ sg->length, s->data_dir);
+ sgd->Addr.lower = temp64.val32.lower;
+ sgd->Addr.upper = temp64.val32.upper;
+ sgd->Len = sg->length;
+ sgd->Ext = 0;
+ s->sgelem++;
}

/*
@@ -1810,7 +1908,7 @@ queue:
goto startio;

creq = elv_next_request(q);
- if (creq->nr_phys_segments > MAXSGENTRIES)
+ if (creq->nr_phys_segments > h->sg_limit)
BUG();

if (( c = cmd_alloc(h, 1)) == NULL)
@@ -1849,8 +1947,11 @@ queue:
map_state.data_dir = PCI_DMA_TODEVICE;

map_state.command = c;
+ map_state.sgelem = 0;
+ map_state.sgpage = 0;

seg = blk_rq_map_consume(q, creq, cciss_map_sg, &map_state);
+ seg += map_state.sgpage;

/* track how many SG entries we are using */
if( seg > h->maxSG)
@@ -1860,7 +1961,16 @@ queue:
printk(KERN_DEBUG "cciss: Submitting %d sectors in %d segments\n", creq->nr_sectors, seg);
#endif /* CCISS_DEBUG */

- c->Header.SGList = c->Header.SGTotal = seg;
+ if (seg > MAXSGENTRIES) {
+ c->Header.SGList = MAXSGENTRIES;
+ /* fixup last chained sg page length, probably not SGD_SIZE. */
+ c->sgdlist[map_state.sgpage-1].sgd[MAXSGENTRIES-1].Len =
+ map_state.sgelem * sizeof(SGDescriptor_struct);
+ } else
+ c->Header.SGList = seg;
+
+ c->Header.SGTotal = seg;
+
c->Request.CDB[1]= 0;
c->Request.CDB[2]= (start_blk >> 24) & 0xff; //MSB
c->Request.CDB[3]= (start_blk >> 16) & 0xff;
@@ -2328,6 +2438,33 @@ static void free_hba(int i)
kfree(p);
}

+
+static void bail_out(int i)
+{
+ if(hba[i]->cmd_pool_bits)
+ kfree(hba[i]->cmd_pool_bits);
+ if(hba[i]->cmd_pool)
+ pci_free_consistent(hba[i]->pdev,
+ NR_CMDS * sizeof(CommandList_struct),
+ hba[i]->cmd_pool, hba[i]->cmd_pool_dhandle);
+ if(hba[i]->errinfo_pool)
+ pci_free_consistent(hba[i]->pdev,
+ NR_CMDS * sizeof( ErrorInfo_struct),
+ hba[i]->errinfo_pool,
+ hba[i]->errinfo_pool_dhandle);
+ if (hba[i]->sglist_pool != NULL)
+ pci_free_consistent(hba[i]->pdev,
+ (CONFIG_CISS_MAX_SG_PAGES-1) * MAXSGENTRIES *
+ NR_CMDS * sizeof(SGDescriptor_struct),
+ hba[i]->sglist_pool,
+ hba[i]->sglist_pool_dhandle);
+ free_irq(hba[i]->intr, hba[i]);
+ unregister_blkdev(MAJOR_NR+i, hba[i]->devname);
+ release_io_mem(hba[i]);
+ free_hba(i);
+ printk( KERN_ERR "cciss: out of memory");
+}
+
/*
* This is it. Find all the controllers and register them. I really hate
* stealing all these major device numbers.
@@ -2376,6 +2513,11 @@ static int __init cciss_init_one(struct
free_hba(i);
return(-1);
}
+ /* Assume we can't do SG page chaining until we know otherwise */
+ hba[i]->sglist_pool = NULL;
+ hba[i]->sglist_pool_dhandle = (dma_addr_t) NULL;
+ hba[i]->sg_limit = MAXSGENTRIES;
+
/* make sure the board interrupts are off */
hba[i]->access.set_intr_mask(hba[i], CCISS_INTR_OFF);
if( request_irq(hba[i]->intr, do_cciss_intr,
@@ -2398,24 +2540,8 @@ static int __init cciss_init_one(struct
&(hba[i]->errinfo_pool_dhandle));
if((hba[i]->cmd_pool_bits == NULL)
|| (hba[i]->cmd_pool == NULL)
- || (hba[i]->errinfo_pool == NULL))
- {
- if(hba[i]->cmd_pool_bits)
- kfree(hba[i]->cmd_pool_bits);
- if(hba[i]->cmd_pool)
- pci_free_consistent(hba[i]->pdev,
- NR_CMDS * sizeof(CommandList_struct),
- hba[i]->cmd_pool, hba[i]->cmd_pool_dhandle);
- if(hba[i]->errinfo_pool)
- pci_free_consistent(hba[i]->pdev,
- NR_CMDS * sizeof( ErrorInfo_struct),
- hba[i]->errinfo_pool,
- hba[i]->errinfo_pool_dhandle);
- free_irq(hba[i]->intr, hba[i]);
- unregister_blkdev(MAJOR_NR+i, hba[i]->devname);
- release_io_mem(hba[i]);
- free_hba(i);
- printk( KERN_ERR "cciss: out of memory");
+ || (hba[i]->errinfo_pool == NULL)) {
+ bail_out(i);
return(-1);
}

@@ -2431,7 +2557,7 @@ static int __init cciss_init_one(struct
#endif /* CCISS_DEBUG */

cciss_getgeometry(i);
-
+ cciss_allocate_extra_sg_pages(i);
cciss_scsi_setup(i);

/* Turn the interrupts on so we can service requests */
@@ -2439,21 +2565,16 @@ static int __init cciss_init_one(struct

cciss_procinit(i);

+ hba[i]->sg_limit = figure_sg_limit(i);
q = &hba[i]->queue;
q->queuedata = hba[i];
spin_lock_init(&hba[i]->lock);
blk_init_queue(q, do_cciss_request, &hba[i]->lock);
blk_queue_bounce_limit(q, hba[i]->pdev->dma_mask);
-
- /* This is a hardware imposed limit. */
- blk_queue_max_hw_segments(q, MAXSGENTRIES);
-
- /* This is a limit in the driver and could be eliminated. */
- blk_queue_max_phys_segments(q, MAXSGENTRIES);
-
+ blk_queue_max_hw_segments(q, hba[i]->sg_limit);
+ blk_queue_max_phys_segments(q, hba[i]->sg_limit);
blk_queue_max_sectors(q, 512);

-
for(j=0; j<NWD; j++) {
drive_info_struct *drv = &(hba[i]->drv[j]);
struct gendisk *disk = hba[i]->gendisk[j];
@@ -2521,6 +2642,11 @@ static void __devexit cciss_remove_one (
hba[i]->cmd_pool, hba[i]->cmd_pool_dhandle);
pci_free_consistent(hba[i]->pdev, NR_CMDS * sizeof( ErrorInfo_struct),
hba[i]->errinfo_pool, hba[i]->errinfo_pool_dhandle);
+ if (hba[i]->sglist_pool != NULL)
+ pci_free_consistent(hba[i]->pdev,
+ (CONFIG_CISS_MAX_SG_PAGES-1) * MAXSGENTRIES *
+ NR_CMDS * sizeof(SGDescriptor_struct),
+ hba[i]->sglist_pool, hba[i]->sglist_pool_dhandle);
kfree(hba[i]->cmd_pool_bits);
release_io_mem(hba[i]);
free_hba(i);
--- lx2544ac3/drivers/block/cciss_cmd.h~sgc3 Mon Oct 28 15:18:44 2002
+++ lx2544ac3-root/drivers/block/cciss_cmd.h Mon Oct 28 15:42:13 2002
@@ -7,6 +7,10 @@

//general boundary defintions
#define SENSEINFOBYTES 32//note that this value may vary between host implementations
+
+/* MAXSGENTRIES is the max scatter gather entries per SG page,
+ some adapters support chaining SG pages, using the last SG
+ entry to point to the next SG page. */
#define MAXSGENTRIES 31
#define MAXREPLYQS 256

@@ -195,8 +199,11 @@ typedef struct _SGDescriptor_struct {
QWORD Addr;
DWORD Len;
DWORD Ext;
+#define CISS_SG_CHAIN 0x80000000
} SGDescriptor_struct;

+#define SGD_SIZE (sizeof(SGDescriptor_struct) * MAXSGENTRIES)
+
typedef union _MoreErrInfo_struct{
struct {
BYTE Reserved[3];
@@ -226,6 +233,11 @@ typedef struct _ErrorInfo_struct {
#define CMD_MSG_DONE 0x04
#define CMD_MSG_TIMEOUT 0x05

+struct sgd_list_t {
+ SGDescriptor_struct *sgd;
+ dma_addr_t dma;
+};
+
typedef struct _CommandList_struct {
CommandListHeader_struct Header;
RequestBlock_struct Request;
@@ -241,6 +253,7 @@ typedef struct _CommandList_struct {
struct request * rq;
struct completion *waiting;
int retry_count;
+ struct sgd_list_t sgdlist[CONFIG_CISS_MAX_SG_PAGES];
#ifdef CONFIG_CISS_SCSI_TAPE
void * scsi_cmd;
#endif
--- lx2544ac3/drivers/block/cciss.h~sgc3 Mon Oct 28 15:18:44 2002
+++ lx2544ac3-root/drivers/block/cciss.h Mon Oct 28 15:18:44 2002
@@ -36,6 +36,13 @@ typedef struct _drive_info_struct
int cylinders;
} drive_info_struct;

+#if CONFIG_CISS_MAX_SG_PAGES < 1
+#error "CONFIG_CISS_MAX_SG_PAGES must be greater than 0 and less than 9"
+#elif CONFIG_CISS_MAX_SG_PAGES > 8
+/* eight is rather arbitrary...no real limit here. */
+#error "CONFIG_CISS_MAX_SG_PAGES must be greater than 0 and less than 9"
+#endif
+
struct ctlr_info
{
int ctlr;
@@ -77,9 +84,12 @@ struct ctlr_info
dma_addr_t cmd_pool_dhandle;
ErrorInfo_struct *errinfo_pool;
dma_addr_t errinfo_pool_dhandle;
+ SGDescriptor_struct *sglist_pool;
+ dma_addr_t sglist_pool_dhandle;
unsigned long *cmd_pool_bits;
int nr_allocs;
int nr_frees;
+ int sg_limit;

// Disk structures we need to pass back
struct gendisk *gendisk[NWD];
--- lx2544ac3/drivers/block/cciss_scsi.c~sgc3 Mon Oct 28 15:18:44 2002
+++ lx2544ac3-root/drivers/block/cciss_scsi.c Mon Oct 28 15:18:44 2002
@@ -257,7 +257,7 @@ scsi_cmd_stack_free(int ctlr)
"Unknown" : scsi_device_types[n]

#if 0
-static int xmargin=8;
+static int xmargin=24;
static int amargin=60;

static void
@@ -297,6 +297,10 @@ print_bytes (unsigned char *c, int len,
static void
print_cmd(CommandList_struct *cp)
{
+ int i;
+ int sg, sgp;
+ SGDescriptor_struct *sgd;
+
printk("queue:%d\n", cp->Header.ReplyQueue);
printk("sglist:%d\n", cp->Header.SGList);
printk("sgtot:%d\n", cp->Header.SGTotal);
@@ -329,7 +333,25 @@ print_cmd(CommandList_struct *cp)
printk("edesc.Addr: 0x%08x/0%08x, Len = %d\n",
cp->ErrDesc.Addr.upper, cp->ErrDesc.Addr.lower,
cp->ErrDesc.Len);
- printk("sgs..........Errorinfo:\n");
+ printk("SGS:\n");
+ sgp = 0; sg = 0;
+ sgd = cp->sgdlist[sgp].sgd;
+ for (i=0;i<cp->Header.SGTotal;i++) {
+ printk("0x%08x%08x : %d: %d %s\n", sgd[sg].Addr.upper,
+ sgd[sg].Addr.lower, sgd[sg].Len,
+ sgd[sg].Ext, sg >= MAXSGENTRIES-1 ? " chain" : "");
+
+ if (sg == MAXSGENTRIES-1 && ! sgd[sg].Ext &&
+ cp->Header.SGTotal != cp->Header.SGList)
+ printk("Extended SG bit expected, but not set!\n");
+ sg++;
+ if (sg >= MAXSGENTRIES) {
+ sg = 0;
+ sgp++;
+ sgd = cp->sgdlist[sgp].sgd;
+ }
+ }
+ printk("Errorinfo:\n");
printk("scsistatus:%d\n", cp->err_info->ScsiStatus);
printk("senselen:%d\n", cp->err_info->SenseLen);
printk("cmd status:%d\n", cp->err_info->CommandStatus);
--- lx2544ac3/drivers/block/Config.help~sgc3 Mon Oct 28 15:18:44 2002
+++ lx2544ac3-root/drivers/block/Config.help Mon Oct 28 15:18:44 2002
@@ -196,6 +196,22 @@ CONFIG_BLK_CPQ_CISS_DA
boards supported by this driver, and for further information
on the use of this driver.

+CONFIG_CISS_MAX_SG_PAGES
+ This is the number of scatter gather pages to pre-allocate
+ for each command block. Each page for each command can hold
+ up to 31 scatter gather elements. Values greater than 1 may
+ yield better performance at the cost of memory usage. The
+ minimum value is 1, which is also likely to be a reasonable
+ value. You may observe the values reported in
+ /proc/driver/cciss/cciss* for "Max SG entries since init X/Y".
+ There are two values, X/Y, where X is the maximum number of SG
+ elements used by any one command so far, and Y is the total
+ number of scatter gather elements available for each command.
+ If X == Y, then at least once, some command used up all its
+ available scatter gather elements. If X remains less than Y
+ after some period of running a typical load, then there is no
+ benefit in increasing this parameter.
+
CONFIG_CISS_SCSI_TAPE
When enabled (Y), this option allows SCSI tape drives and SCSI medium
changers (tape robots) to be accessed via a Compaq 5xxx array
--- lx2544ac3/drivers/block/Config.in~sgc3 Mon Oct 28 15:18:44 2002
+++ lx2544ac3-root/drivers/block/Config.in Mon Oct 28 15:18:44 2002
@@ -35,6 +35,9 @@ if [ "$CONFIG_PARIDE" = "y" -o "$CONFIG_
fi
dep_tristate 'Compaq SMART2 support' CONFIG_BLK_CPQ_DA $CONFIG_PCI
dep_tristate 'Compaq Smart Array 5xxx support' CONFIG_BLK_CPQ_CISS_DA $CONFIG_PCI
+if [ "$CONFIG_BLK_CPQ_CISS_DA" = "y" -o "$CONFIG_BLK_CPQ_CISS_DA" = "m" ]; then
+ int ' Maximum scatter gather pages per command' CONFIG_CISS_MAX_SG_PAGES 1
+fi
dep_mbool ' SCSI tape drive support for Smart Array 5xxx' CONFIG_CISS_SCSI_TAPE $CONFIG_BLK_CPQ_CISS_DA $CONFIG_SCSI
dep_tristate 'Mylex DAC960/DAC1100 PCI RAID Controller support' CONFIG_BLK_DEV_DAC960 $CONFIG_PCI
dep_tristate 'Micro Memory MM5415 Battery Backed RAM support (EXPERIMENTAL)' CONFIG_BLK_DEV_UMEM $CONFIG_PCI $CONFIG_EXPERIMENTAL

.

2002-10-29 07:22:27

by Jens Axboe

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

On Mon, Oct 28 2002, Stephen Cameron wrote:
> Jens Axboe wrote:
> [...]
> > It's not a kernel limit. If your queue limits are all beyond 64 entries,
> > what you typically see is that you just hit what mpage will fill in for
> > you. One thing that should give you max number of sg entries is plain
> > and simple:
> >
> > dd if=/dev/zero of=some_file bs=4096k
>
> What about
>
> dd if=/dev/cciss/c0d1p1 of=/dev/null bs=4096k

It's much better to do it with writes; you know the io scheduler will
stack those up in a long line, all at max size. Reads are more
difficult to predict.

--
Jens Axboe

2002-10-30 20:21:06

by Cameron, Steve

Subject: Re: [PATCH] 2.5.44-ac3, cciss, more scatter gather elements

I wrote:
[...]
> Anyway, here is a new cciss patch to work with your ll_rw_blk.c patch
> to use more than 31 scatter gather entries.

And that patch was missing something important. It didn't unmap
all the addresses correctly. That's what I get for sitting on a patch
since 2.5.7, I guess. Here's a patch to fix that.

-- steve

drivers/block/cciss.c | 20 ++++++++++++++++----
1 files changed, 16 insertions, 4 deletions

--- lx2544ac5/drivers/block/cciss.c~sgc4 Wed Oct 30 14:09:56 2002
+++ lx2544ac5-root/drivers/block/cciss.c Wed Oct 30 14:09:56 2002
@@ -1719,6 +1719,8 @@ static inline void complete_command( ctl
int i;
int retry_cmd = 0;
u64bit temp64;
+ int sgpage, sgelem;
+ SGDescriptor_struct *sgd;

if (timeout)
status = 0;
@@ -1830,13 +1832,23 @@ static inline void complete_command( ctl
}
/* command did not need to be retried */
/* unmap the DMA mapping for all the scatter gather elements */
- for(i=0; i<cmd->Header.SGList; i++) {
- temp64.val32.lower = cmd->SG[i].Addr.lower;
- temp64.val32.upper = cmd->SG[i].Addr.upper;
+ sgpage = 0;
+ sgelem = 0;
+ for(i=0; i<cmd->Header.SGTotal; i++) {
+ sgd = &cmd->sgdlist[sgpage].sgd[sgelem];
+ if (sgelem == MAXSGENTRIES-1 && sgd->Ext) {
+ i++;
+ sgpage++;
+ sgelem=0;
+ sgd = &cmd->sgdlist[sgpage].sgd[sgelem];
+ }
+ temp64.val32.lower = sgd->Addr.lower;
+ temp64.val32.upper = sgd->Addr.upper;
pci_unmap_page(hba[cmd->ctlr]->pdev,
- temp64.val, cmd->SG[i].Len,
+ temp64.val, sgd->Len,
(cmd->Request.Type.Direction == XFER_READ) ?
PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE);
+ sgelem++;
}
complete_buffers(cmd->rq->bio, status);


.