2007-12-20 05:45:36

by Rusty Russell

[permalink] [raw]
Subject: [PATCH 0/5] sg_ring for scsi

OK, some fixes since last time, as I wade through more SCSI drivers. Some
drivers use "use_sg" as a flag to know whether the request_buffer is a
scatterlist: I don't need the counter, but I still need the flag, so I fixed
that in a more intuitive way (an explicit ->sg pointer in the cmd).

Also, I've updated and tested scsi_debug, after Douglas's excellent
suggestion.

(I just found out about struct scsi_pointer, so I'm off to update that now,
too).

Cheers,
Rusty.


2007-12-20 05:49:18

by Rusty Russell

[permalink] [raw]
Subject: [PATCH 1/5] sg_ring: sg_ring.h

(Updates since last time: const-safe iterators and sg_ring_num helper for
counting scatterlist entries across entire ring).

This patch introduces 'struct sg_ring', a layer on top of scatterlist
arrays. It meshes nicely with routines which expect a simple array of
'struct scatterlist' because it is easy to break down the ring into
its constituent arrays.

The sg_ring header also encodes the maximum number of entries, useful
for routines which populate an sg. We need never hand around a number
of elements any more.

Signed-off-by: Rusty Russell <[email protected]>
---
include/linux/sg_ring.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 74 insertions(+), 0 deletions(-)
create mode 100644 include/linux/sgring.h

diff --git a/include/linux/sg_ring.h b/include/linux/sg_ring.h
new file mode 100644
--- /dev/null
+++ b/include/linux/sg_ring.h
@@ -0,0 +1,124 @@
+#ifndef _LINUX_SG_RING_H
+#define _LINUX_SG_RING_H
+#include <linux/scatterlist.h>
+
+/**
+ * struct sg_ring - a ring of scatterlists
+ * @list: the list_head chaining them together
+ * @num: the number of valid sg entries
+ * @max: the maximum number of sg entries (size of the sg array).
+ * @sg: the array of scatterlist entries.
+ *
+ * This provides a convenient encapsulation of one or more scatter gather
+ * arrays.
+ */
+struct sg_ring
+{
+ struct list_head list;
+ unsigned int num, max;
+ struct scatterlist sg[0];
+};
+
+/* This helper declares an sg ring on the stack or in a struct. */
+#define DECLARE_SG_RING(name, max) \
+ struct { \
+ struct sg_ring ring; \
+ struct scatterlist sg[max]; \
+ } name
+
+/**
+ * sg_ring_init - initialize a scatterlist ring.
+ * @sg: the sg_ring.
+ * @max: the size of the trailing sg array.
+ *
+ * After initialization sg is alone in the ring.
+ */
+static inline void sg_ring_init(struct sg_ring *sg, unsigned int max)
+{
+#ifdef CONFIG_DEBUG_SG
+ unsigned int i;
+ for (i = 0; i < max; i++)
+ sg->sg[i].sg_magic = SG_MAGIC;
+#endif
+ INIT_LIST_HEAD(&sg->list);
+ sg->max = max;
+ /* FIXME: This is to clear the page bits. */
+ sg_init_table(sg->sg, sg->max);
+}
+
+/**
+ * sg_ring_single - initialize a one-element scatterlist ring.
+ * @sg: the sg_ring.
+ * @buf: the pointer to the buffer.
+ * @buflen: the length of the buffer.
+ *
+ * Does sg_ring_init and also sets up first (and only) sg element.
+ */
+static inline void sg_ring_single(struct sg_ring *sg,
+ const void *buf,
+ unsigned int buflen)
+{
+ sg_ring_init(sg, 1);
+ sg->num = 1;
+ sg_init_one(&sg->sg[0], buf, buflen);
+}
+
+/**
+ * sg_ring_next - next array in a scatterlist ring.
+ * @sg: the sg_ring.
+ * @head: the sg_ring head.
+ *
+ * This will return NULL once @sg has looped back around to @head.
+ */
+static inline struct sg_ring *sg_ring_next(const struct sg_ring *sg,
+ const struct sg_ring *head)
+{
+ sg = list_first_entry(&sg->list, struct sg_ring, list);
+ if (sg == head)
+ sg = NULL;
+ return (struct sg_ring *)sg;
+}
+
+/* Helper for writing for loops. */
+static inline struct sg_ring *sg_ring_iter(const struct sg_ring *head,
+ const struct sg_ring *sg,
+ unsigned int *i)
+{
+ (*i)++;
+ /* While loop lets us skip any zero-entry sg_ring arrays */
+ while (*i == sg->num) {
+ *i = 0;
+ sg = sg_ring_next(sg, head);
+ if (!sg)
+ break;
+ }
+ return (struct sg_ring *)sg;
+}
+
+/**
+ * sg_ring_for_each - iterate through an entire sg_ring ring
+ * @head: the head of the sg_ring.
+ * @sg: the sg_ring iterator.
+ * @i: an (unsigned) integer which refers to sg->sg[i].
+ *
+ * The current scatterlist element is sg->sg[i].
+ */
+#define sg_ring_for_each(head, sg, i) \
+ for (sg = head, i = 0; sg; sg = sg_ring_iter(head, sg, &i))
+
+/**
+ * sg_ring_num - how many struct scatterlists are used in this sg_ring.
+ * @head: the sg_ring
+ *
+ * Simple helper function to add up the number of scatterlists.
+ */
+static inline unsigned sg_ring_num(const struct sg_ring *head)
+{
+ unsigned int num = 0, i;
+ const struct sg_ring *sg;
+
+ sg_ring_for_each(head, sg, i)
+ num += sg->num;
+ return num;
+}
+#endif /* _LINUX_SG_RING_H */

2007-12-20 05:49:57

by Rusty Russell

[permalink] [raw]
Subject: [PATCH 2/5] dma_map_sg_ring() helper

Obvious counterpart to dma_map_sg. Note that this is arch-independent
code; sg_rings are backwards compatible with simple sg arrays.

Signed-off-by: Rusty Russell <[email protected]>
---
drivers/base/dma-mapping.c | 13 +++++++++++++
include/linux/dma-mapping.h | 4 ++++
2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -8,6 +8,7 @@
*/

#include <linux/dma-mapping.h>
+#include <linux/sg_ring.h>

/*
* Managed DMA API
@@ -162,6 +163,59 @@ void dmam_free_noncoherent(struct device
}
EXPORT_SYMBOL(dmam_free_noncoherent);

+/**
+ * dma_map_sg_ring - Map an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * This returns -ENOMEM if mapping fails. It's not clear that telling you
+ * it failed is useful though.
+ */
+int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction)
+{
+ struct sg_ring *i;
+ unsigned int num;
+
+ for (i = sg; i; i = sg_ring_next(i, sg)) {
+ BUG_ON(i->num > i->max);
+ num = dma_map_sg(dev, i->sg, i->num, direction);
+ if (num == 0 && i->num != 0)
+ goto unmap;
+ }
+ return 0;
+
+unmap:
+ while (sg) {
+ dma_unmap_sg(dev, sg->sg, sg->num, direction);
+ sg = sg_ring_next(sg, i);
+ }
+ return -ENOMEM;
+
+}
+EXPORT_SYMBOL(dma_map_sg_ring);
+
+/**
+ * dma_unmap_sg_ring - Unmap an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * Call after dma_map_sg_ring() succeeds.
+ */
+void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction)
+{
+ struct sg_ring *i;
+
+ for (i = sg; i; i = sg_ring_next(i, sg)) {
+ BUG_ON(i->num > i->max);
+ dma_unmap_sg(dev, i->sg, i->num, direction);
+ }
+}
+EXPORT_SYMBOL(dma_unmap_sg_ring);
+
#ifdef ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY

static void dmam_coherent_decl_release(struct device *dev, void *res)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -87,6 +87,12 @@ dma_mark_declared_memory_occupied(struct
}
#endif

+struct sg_ring;
+extern int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+extern void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+
/*
* Managed DMA API
*/

2007-12-20 05:51:00

by Rusty Russell

[permalink] [raw]
Subject: [PATCH 3/5] blk_rq_map_sg_ring as a counterpart to blk_rq_map_sg.

Obvious counterpart to blk_rq_map_sg.

Signed-off-by: Rusty Russell <[email protected]>

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -31,6 +31,7 @@
#include <linux/blktrace_api.h>
#include <linux/fault-inject.h>
#include <linux/scatterlist.h>
+#include <linux/sg_ring.h>

/*
* for max sense size
@@ -1364,6 +1365,68 @@ new_segment:

EXPORT_SYMBOL(blk_rq_map_sg);

+/**
+ * blk_rq_map_sg_ring - map a request to a scatterlist ring.
+ * @q: the request queue this request applies to.
+ * @rq: the request to map
+ * @sg: the sg_ring to populate.
+ *
+ * There must be enough elements in the sg_ring(s) to map the request.
+ */
+void blk_rq_map_sg_ring(struct request_queue *q, struct request *rq,
+ struct sg_ring *sg)
+{
+ struct bio_vec *bvec, *bvprv;
+ struct req_iterator iter;
+ int i, cluster;
+ struct sg_ring *head = sg;
+ struct scatterlist *sgprv;
+
+ i = 0;
+ cluster = q->queue_flags & (1 << QUEUE_FLAG_CLUSTER);
+
+ /*
+ * for each bio in rq
+ */
+ bvprv = NULL;
+ sgprv = NULL;
+ rq_for_each_segment(bvec, rq, iter) {
+ int nbytes = bvec->bv_len;
+
+ if (bvprv && cluster) {
+ if (sgprv->length + nbytes > q->max_segment_size)
+ goto new_segment;
+
+ if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
+ goto new_segment;
+ if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
+ goto new_segment;
+
+ sgprv->length += nbytes;
+ } else {
+new_segment:
+ sg_set_page(sg->sg + i, bvec->bv_page, nbytes,
+ bvec->bv_offset);
+ sgprv = sg->sg + i;
+ if (++i == sg->max) {
+ sg->num = i;
+ sg = sg_ring_next(sg, head);
+ i = 0;
+ }
+ }
+ bvprv = bvec;
+ } /* segments in rq */
+
+ /* If we were still working on an sg_ring, set the number and
+ * clear any following sg_rings. */
+ if (sg) {
+ sg->num = i;
+ for (sg = sg_ring_next(sg,head); sg; sg = sg_ring_next(sg,head))
+ sg->num = 0;
+ }
+}
+EXPORT_SYMBOL(blk_rq_map_sg_ring);
+
/*
* the standard queue merge functions, can be overridden with device
* specific ones if so desired
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -777,6 +777,8 @@ extern void blk_ordered_complete_seq(str
extern void blk_ordered_complete_seq(struct request_queue *, unsigned, int);

extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *);
+struct sg_ring;
+extern void blk_rq_map_sg_ring(struct request_queue *, struct request *, struct sg_ring *);
extern void blk_dump_rq_flags(struct request *, char *);
extern void generic_unplug_device(struct request_queue *);
extern void __generic_unplug_device(struct request_queue *);

2007-12-20 05:52:20

by Rusty Russell

[permalink] [raw]
Subject: [PATCH 4/5] sg_ring: Convert core scsi code to sg_ring.

(Update: explicit sg member to replace use_sg indicating it's a scatter gather)

Details:
1) The use_sg (and __use_sg) fields are removed: sg_rings contain their own
count, and it's confusing to have two.
2) use_sg used to do double duty: if non-zero, it meant that the request_buffer
actually pointed to a scatterlist. Replace this with an explicit sg member:
if NULL, then request_buffer contains a pointer to the raw data.
3) The scsi_sg_count(), scsi_sglist() and scsi_bufflen() wrappers are removed.
scsi_sg_count() is no longer necessary, scsi_sglist() is now cmd->sg, and
scsi_bufflen should just be cmd->request_bufflen if you need it.

If nothing else, the simplification of this logic shows why I prefer
sg_ring over scatterlist chaining.

Signed-off-by: Rusty Russell <[email protected]>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -617,20 +617,21 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd
ses->cmd_len = scmd->cmd_len;
memcpy(ses->cmnd, scmd->cmnd, sizeof(scmd->cmnd));
ses->data_direction = scmd->sc_data_direction;
+ ses->sg = scmd->sg;
ses->bufflen = scmd->request_bufflen;
ses->buffer = scmd->request_buffer;
- ses->use_sg = scmd->use_sg;
ses->resid = scmd->resid;
ses->result = scmd->result;

if (sense_bytes) {
scmd->request_bufflen = min_t(unsigned,
sizeof(scmd->sense_buffer), sense_bytes);
- sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
- scmd->request_bufflen);
- scmd->request_buffer = &ses->sense_sgl;
+
+ sg_ring_single(&ses->sense_sg.ring, scmd->sense_buffer,
+ scmd->request_bufflen);
+ scmd->sg = &ses->sense_sg.ring;
+ scmd->request_buffer = NULL;
scmd->sc_data_direction = DMA_FROM_DEVICE;
- scmd->use_sg = 1;
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
scmd->cmnd[0] = REQUEST_SENSE;
scmd->cmnd[4] = scmd->request_bufflen;
@@ -639,7 +640,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd
scmd->request_buffer = NULL;
scmd->request_bufflen = 0;
scmd->sc_data_direction = DMA_NONE;
- scmd->use_sg = 0;
+ scmd->sg = NULL;
if (cmnd) {
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
memcpy(scmd->cmnd, cmnd, cmnd_size);
@@ -678,7 +679,7 @@ void scsi_eh_restore_cmnd(struct scsi_cm
scmd->sc_data_direction = ses->data_direction;
scmd->request_bufflen = ses->bufflen;
scmd->request_buffer = ses->buffer;
- scmd->use_sg = ses->use_sg;
+ scmd->sg = ses->sg;
scmd->resid = ses->resid;
scmd->result = ses->result;
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -17,7 +17,7 @@
#include <linux/pci.h>
#include <linux/delay.h>
#include <linux/hardirq.h>
-#include <linux/scatterlist.h>
+#include <linux/sg_ring.h>

#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -737,21 +737,41 @@ static inline unsigned int scsi_sgtable_
return index;
}

-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static void free_sgring(struct sg_ring *head)
{
struct scsi_host_sg_pool *sgp;
- struct scatterlist *sgl, *prev, *ret;
+ struct sg_ring *sg, *n;
+
+ /* Free any other entries in the ring. */
+ list_for_each_entry_safe(sg, n, &head->list, list) {
+ list_del(&sg->list);
+ sgp = scsi_sg_pools + scsi_sgtable_index(sg->max);
+ mempool_free(sg, sgp->pool);
+ }
+
+ /* Now free the head of the ring. */
+ BUG_ON(!list_empty(&head->list));
+
+ sgp = scsi_sg_pools + scsi_sgtable_index(head->max);
+ mempool_free(head, sgp->pool);
+}
+
+struct sg_ring *scsi_alloc_sgring(struct scsi_cmnd *cmd, unsigned int num,
+ gfp_t gfp_mask)
+{
+ struct scsi_host_sg_pool *sgp;
+ struct sg_ring *sg, *ret;
unsigned int index;
int this, left;

- BUG_ON(!cmd->use_sg);
+ BUG_ON(!num);

- left = cmd->use_sg;
- ret = prev = NULL;
+ left = num;
+ ret = NULL;
do {
this = left;
if (this > SCSI_MAX_SG_SEGMENTS) {
- this = SCSI_MAX_SG_SEGMENTS - 1;
+ this = SCSI_MAX_SG_SEGMENTS;
index = SG_MEMPOOL_NR - 1;
} else
index = scsi_sgtable_index(this);
@@ -760,32 +780,20 @@ struct scatterlist *scsi_alloc_sgtable(s

sgp = scsi_sg_pools + index;

- sgl = mempool_alloc(sgp->pool, gfp_mask);
- if (unlikely(!sgl))
+ sg = mempool_alloc(sgp->pool, gfp_mask);
+ if (unlikely(!sg))
goto enomem;

- sg_init_table(sgl, sgp->size);
+ sg_ring_init(sg, sgp->size);

/*
- * first loop through, set initial index and return value
+ * first loop through, set return value, otherwise
+ * attach this to the tail.
*/
if (!ret)
- ret = sgl;
-
- /*
- * chain previous sglist, if any. we know the previous
- * sglist must be the biggest one, or we would not have
- * ended up doing another loop.
- */
- if (prev)
- sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
-
- /*
- * if we have nothing left, mark the last segment as
- * end-of-list
- */
- if (!left)
- sg_mark_end(&sgl[this - 1]);
+ ret = sg;
+ else
+ list_add_tail(&sg->list, &ret->list);

/*
* don't allow subsequent mempool allocs to sleep, it would
@@ -793,85 +801,21 @@ struct scatterlist *scsi_alloc_sgtable(s
*/
gfp_mask &= ~__GFP_WAIT;
gfp_mask |= __GFP_HIGH;
- prev = sgl;
} while (left);

- /*
- * ->use_sg may get modified after dma mapping has potentially
- * shrunk the number of segments, so keep a copy of it for free.
- */
- cmd->__use_sg = cmd->use_sg;
return ret;
enomem:
- if (ret) {
- /*
- * Free entries chained off ret. Since we were trying to
- * allocate another sglist, we know that all entries are of
- * the max size.
- */
- sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
- prev = ret;
- ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
-
- while ((sgl = sg_chain_ptr(ret)) != NULL) {
- ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
- mempool_free(sgl, sgp->pool);
- }
-
- mempool_free(prev, sgp->pool);
- }
+ if (ret)
+ free_sgring(ret);
return NULL;
}
+EXPORT_SYMBOL(scsi_alloc_sgring);

-EXPORT_SYMBOL(scsi_alloc_sgtable);
-
-void scsi_free_sgtable(struct scsi_cmnd *cmd)
+void scsi_free_sgring(struct scsi_cmnd *cmd)
{
- struct scatterlist *sgl = cmd->request_buffer;
- struct scsi_host_sg_pool *sgp;
-
- /*
- * if this is the biggest size sglist, check if we have
- * chained parts we need to free
- */
- if (cmd->__use_sg > SCSI_MAX_SG_SEGMENTS) {
- unsigned short this, left;
- struct scatterlist *next;
- unsigned int index;
-
- left = cmd->__use_sg - (SCSI_MAX_SG_SEGMENTS - 1);
- next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
- while (left && next) {
- sgl = next;
- this = left;
- if (this > SCSI_MAX_SG_SEGMENTS) {
- this = SCSI_MAX_SG_SEGMENTS - 1;
- index = SG_MEMPOOL_NR - 1;
- } else
- index = scsi_sgtable_index(this);
-
- left -= this;
-
- sgp = scsi_sg_pools + index;
-
- if (left)
- next = sg_chain_ptr(&sgl[sgp->size - 1]);
-
- mempool_free(sgl, sgp->pool);
- }
-
- /*
- * Restore original, will be freed below
- */
- sgl = cmd->request_buffer;
- sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
- } else
- sgp = scsi_sg_pools + scsi_sgtable_index(cmd->__use_sg);
-
- mempool_free(sgl, sgp->pool);
+ free_sgring(cmd->sg);
}
-
-EXPORT_SYMBOL(scsi_free_sgtable);
+EXPORT_SYMBOL(scsi_free_sgring);

/*
* Function: scsi_release_buffers()
@@ -892,13 +836,14 @@ EXPORT_SYMBOL(scsi_free_sgtable);
*/
static void scsi_release_buffers(struct scsi_cmnd *cmd)
{
- if (cmd->use_sg)
- scsi_free_sgtable(cmd);
+ if (cmd->sg)
+ scsi_free_sgring(cmd);

/*
* Zero these out. They now point to freed memory, and it is
* dangerous to hang onto the pointers.
*/
+ cmd->sg = NULL;
cmd->request_buffer = NULL;
cmd->request_bufflen = 0;
}
@@ -976,7 +921,10 @@ void scsi_io_completion(struct scsi_cmnd
SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
"%d bytes done.\n",
req->nr_sectors, good_bytes));
- SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
+ SCSI_LOG_HLCOMPLETE(1, printk("sg elems %d%s\n",
+ cmd->sg ? cmd->sg->num : 0,
+ cmd->sg && !list_empty(&cmd->sg->list) ?
+ "+" : ""));

if (clear_errors)
req->errors = 0;
@@ -1106,20 +1054,20 @@ static int scsi_init_io(struct scsi_cmnd
static int scsi_init_io(struct scsi_cmnd *cmd)
{
struct request *req = cmd->request;
- int count;

/*
* We used to not use scatter-gather for single segment request,
* but now we do (it makes highmem I/O easier to support without
* kmapping pages)
*/
- cmd->use_sg = req->nr_phys_segments;

+ cmd->request_buffer = NULL;
/*
* If sg table allocation fails, requeue request later.
*/
- cmd->request_buffer = scsi_alloc_sgtable(cmd, GFP_ATOMIC);
- if (unlikely(!cmd->request_buffer)) {
+ cmd->sg = scsi_alloc_sgring(cmd, req->nr_phys_segments, GFP_ATOMIC);
+ if (unlikely(!cmd->sg)) {
+ BUG();
scsi_unprep_request(req);
return BLKPREP_DEFER;
}
@@ -1134,9 +1082,7 @@ static int scsi_init_io(struct scsi_cmnd
* Next, walk the list, and fill in the addresses and sizes of
* each segment.
*/
- count = blk_rq_map_sg(req->q, req, cmd->request_buffer);
- BUG_ON(count > cmd->use_sg);
- cmd->use_sg = count;
+ blk_rq_map_sg_ring(req->q, req, cmd->sg);
return BLKPREP_OK;
}

@@ -1193,7 +1139,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_d

cmd->request_bufflen = 0;
cmd->request_buffer = NULL;
- cmd->use_sg = 0;
+ cmd->sg = NULL;
req->buffer = NULL;
}

@@ -1751,7 +1697,7 @@ int __init scsi_init_queue(void)

for (i = 0; i < SG_MEMPOOL_NR; i++) {
struct scsi_host_sg_pool *sgp = scsi_sg_pools + i;
- int size = sgp->size * sizeof(struct scatterlist);
+ int size = sizeof(struct sg_ring) + sgp->size * sizeof(struct scatterlist);

sgp->slab = kmem_cache_create(sgp->name, size, 0,
SLAB_HWCACHE_ALIGN, NULL);
diff --git a/drivers/scsi/scsi_lib_dma.c b/drivers/scsi/scsi_lib_dma.c
--- a/drivers/scsi/scsi_lib_dma.c
+++ b/drivers/scsi/scsi_lib_dma.c
@@ -15,22 +15,21 @@
* scsi_dma_map - perform DMA mapping against command's sg lists
* @cmd: scsi command
*
- * Returns the number of sg lists actually used, zero if the sg lists
- * is NULL, or -ENOMEM if the mapping failed.
+ * Returns -ENOMEM if the mapping failed, or the number of elements mapped
+ * (ie. sg_ring_num(cmd->sg)).
*/
int scsi_dma_map(struct scsi_cmnd *cmd)
{
- int nseg = 0;
+ if (cmd->sg) {
+ struct device *dev = cmd->device->host->shost_gendev.parent;
+ int err;

- if (scsi_sg_count(cmd)) {
- struct device *dev = cmd->device->host->shost_gendev.parent;
-
- nseg = dma_map_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd),
- cmd->sc_data_direction);
- if (unlikely(!nseg))
- return -ENOMEM;
+ err = dma_map_sg_ring(dev, cmd->sg, cmd->sc_data_direction);
+ if (err)
+ return err;
+ return sg_ring_num(cmd->sg);
}
- return nseg;
+ return 0;
}
EXPORT_SYMBOL(scsi_dma_map);

@@ -40,11 +39,10 @@ EXPORT_SYMBOL(scsi_dma_map);
*/
void scsi_dma_unmap(struct scsi_cmnd *cmd)
{
- if (scsi_sg_count(cmd)) {
+ if (cmd->sg) {
struct device *dev = cmd->device->host->shost_gendev.parent;

- dma_unmap_sg(dev, scsi_sglist(cmd), scsi_sg_count(cmd),
- cmd->sc_data_direction);
+ dma_unmap_sg_ring(dev, cmd->sg, cmd->sc_data_direction);
}
}
EXPORT_SYMBOL(scsi_dma_unmap);
diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -331,7 +331,7 @@ static void scsi_tgt_cmd_done(struct scs

scsi_tgt_uspace_send_status(cmd, tcmd->itn_id, tcmd->tag);

- if (cmd->request_buffer)
+ if (cmd->sg)
scsi_free_sgtable(cmd);

queue_work(scsi_tgtd, &tcmd->work);
@@ -358,15 +358,14 @@ static int scsi_tgt_init_cmd(struct scsi
struct request *rq = cmd->request;
int count;

- cmd->use_sg = rq->nr_phys_segments;
- cmd->request_buffer = scsi_alloc_sgtable(cmd, gfp_mask);
- if (!cmd->request_buffer)
+ cmd->sg = scsi_alloc_sgring(cmd, rq->nr_phys_segments, gfp_mask);
+ if (!cmd->sg)
return -ENOMEM;
-
+ cmd->request_buffer = NULL;
cmd->request_bufflen = rq->data_len;

- dprintk("cmd %p cnt %d %lu\n", cmd, cmd->use_sg, rq_data_dir(rq));
- count = blk_rq_map_sg(rq->q, rq, cmd->request_buffer);
+ dprintk("cmd %p cnt %d %lu\n", cmd, rq->nr_phys_segments, rq_data_dir(rq));
+ count = blk_rq_map_sg_ring(rq->q, rq, cmd->request_buffer);
BUG_ON(count > cmd->use_sg);
cmd->use_sg = count;
return 0;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -65,11 +65,8 @@ struct scsi_cmnd {
unsigned request_bufflen; /* Actual request size */

struct timer_list eh_timeout; /* Used to time out the command. */
- void *request_buffer; /* Actual requested buffer */
-
- /* These elements define the operation we ultimately want to perform */
- unsigned short use_sg; /* Number of pieces of scatter-gather */
- unsigned short __use_sg;
+ struct sg_ring *sg; /* if it's a scatter-gather, otherwise... */
+ void *request_buffer; /* Actual requested buffer */

unsigned underflow; /* Return error if less than
this amount is transferred */
@@ -128,15 +125,11 @@ extern void *scsi_kmap_atomic_sg(struct
size_t *offset, size_t *len);
extern void scsi_kunmap_atomic_sg(void *virt);

-extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
-extern void scsi_free_sgtable(struct scsi_cmnd *);
+extern struct sg_ring *scsi_alloc_sgring(struct scsi_cmnd *, unsigned, gfp_t);
+extern void scsi_free_sgring(struct scsi_cmnd *);

extern int scsi_dma_map(struct scsi_cmnd *cmd);
extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
-
-#define scsi_sg_count(cmd) ((cmd)->use_sg)
-#define scsi_sglist(cmd) ((struct scatterlist *)(cmd)->request_buffer)
-#define scsi_bufflen(cmd) ((cmd)->request_bufflen)

static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
{
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -1,7 +1,7 @@
#ifndef _SCSI_SCSI_EH_H
#define _SCSI_SCSI_EH_H

-#include <linux/scatterlist.h>
+#include <linux/sg_ring.h>

#include <scsi/scsi_cmnd.h>
struct scsi_device;
@@ -75,10 +75,10 @@ struct scsi_eh_save {

void *buffer;
unsigned bufflen;
- unsigned short use_sg;
+ struct sg_ring *sg;
int resid;

- struct scatterlist sense_sgl;
+ DECLARE_SG_RING(sense_sg, 1);
};

extern void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd,

2007-12-20 05:53:57

by Rusty Russell

[permalink] [raw]
Subject: [PATCH 5/5] sg_ring: Convert scsi_debug

Douglas Gilbert pointed out that converting scsi_debug would be a good
demonstration of the conversion required for other SCSI devices.

Details of the changes:
1) ->use_sg is replaced by ->sg, which if non-NULL, contains the sg_ring.
2) ->request_buffer can be NULL (it's only relevent if ->sg isn't set)
3) sg_ring_for_each is no longer required, just iterate directly over ->sg.
4) The iterator updates a struct sg_ring (sg) and an index (k), so previous
references to sg become &sg->sg[k] (the k'th element within the sg_ring sg).

Signed-off-by: Rusty Russell <[email protected]>

diff -r c5fe2cab1d48 drivers/scsi/scsi_debug.c
--- a/drivers/scsi/scsi_debug.c Thu Dec 20 13:12:43 2007 +1100
+++ b/drivers/scsi/scsi_debug.c Thu Dec 20 13:39:24 2007 +1100
@@ -601,16 +601,16 @@ static int fill_from_dev_buffer(struct s
int k, req_len, act_len, len, active;
void * kaddr;
void * kaddr_off;
- struct scatterlist * sg;
+ struct sg_ring * sg;

if (0 == scp->request_bufflen)
return 0;
- if (NULL == scp->request_buffer)
- return (DID_ERROR << 16);
if (! ((scp->sc_data_direction == DMA_BIDIRECTIONAL) ||
(scp->sc_data_direction == DMA_FROM_DEVICE)))
return (DID_ERROR << 16);
- if (0 == scp->use_sg) {
+ if (NULL == scp->sg) {
+ if (NULL == scp->request_buffer)
+ return (DID_ERROR << 16);
req_len = scp->request_bufflen;
act_len = (req_len < arr_len) ? req_len : arr_len;
memcpy(scp->request_buffer, arr, act_len);
@@ -622,14 +622,14 @@ static int fill_from_dev_buffer(struct s
}
active = 1;
req_len = act_len = 0;
- scsi_for_each_sg(scp, sg, scp->use_sg, k) {
+ sg_ring_for_each(scp->sg, sg, k) {
if (active) {
kaddr = (unsigned char *)
- kmap_atomic(sg_page(sg), KM_USER0);
+ kmap_atomic(sg_page(&sg->sg[k]), KM_USER0);
if (NULL == kaddr)
return (DID_ERROR << 16);
- kaddr_off = (unsigned char *)kaddr + sg->offset;
- len = sg->length;
+ kaddr_off = (unsigned char *)kaddr + sg->sg[k].offset;
+ len = sg->sg[k].length;
if ((req_len + len) > arr_len) {
active = 0;
len = arr_len - req_len;
@@ -638,7 +638,7 @@ static int fill_from_dev_buffer(struct s
kunmap_atomic(kaddr, KM_USER0);
act_len += len;
}
- req_len += sg->length;
+ req_len += sg->sg[k].length;
}
if (scp->resid)
scp->resid -= act_len;
@@ -654,29 +654,29 @@ static int fetch_to_dev_buffer(struct sc
int k, req_len, len, fin;
void * kaddr;
void * kaddr_off;
- struct scatterlist * sg;
+ struct sg_ring * sg;

if (0 == scp->request_bufflen)
return 0;
- if (NULL == scp->request_buffer)
- return -1;
if (! ((scp->sc_data_direction == DMA_BIDIRECTIONAL) ||
(scp->sc_data_direction == DMA_TO_DEVICE)))
return -1;
- if (0 == scp->use_sg) {
+ if (NULL == scp->sg) {
+ if (NULL == scp->request_buffer)
+ return -1;
req_len = scp->request_bufflen;
len = (req_len < max_arr_len) ? req_len : max_arr_len;
memcpy(arr, scp->request_buffer, len);
return len;
}
- sg = scsi_sglist(scp);
req_len = fin = 0;
- for (k = 0; k < scp->use_sg; ++k, sg = sg_next(sg)) {
- kaddr = (unsigned char *)kmap_atomic(sg_page(sg), KM_USER0);
+ sg_ring_for_each(scp->sg, sg, k) {
+ kaddr = (unsigned char *)kmap_atomic(sg_page(&sg->sg[k]),
+ KM_USER0);
if (NULL == kaddr)
return -1;
- kaddr_off = (unsigned char *)kaddr + sg->offset;
- len = sg->length;
+ kaddr_off = (unsigned char *)kaddr + sg->sg[k].offset;
+ len = sg->sg[k].length;
if ((req_len + len) > max_arr_len) {
len = max_arr_len - req_len;
fin = 1;
@@ -685,7 +685,7 @@ static int fetch_to_dev_buffer(struct sc
kunmap_atomic(kaddr, KM_USER0);
if (fin)
return req_len + len;
- req_len += sg->length;
+ req_len += sg->sg[k].length;
}
return req_len;
}

2007-12-20 07:07:19

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

On Thu, 20 Dec 2007 16:49:30 +1100
Rusty Russell <[email protected]> wrote:

> Obvious counterpart to dma_map_sg. Note that this is arch-independent
> code; sg_rings are backwards compatible with simple sg arrays.
>
> Signed-off-by: Rusty Russell <[email protected]>
> ---
> drivers/base/dma-mapping.c | 13 +++++++++++++
> include/linux/dma-mapping.h | 4 ++++
> 2 files changed, 17 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
> --- a/drivers/base/dma-mapping.c
> +++ b/drivers/base/dma-mapping.c
> @@ -8,6 +8,7 @@
> */
>
> #include <linux/dma-mapping.h>
> +#include <linux/sg_ring.h>
>
> /*
> * Managed DMA API
> @@ -162,6 +163,59 @@ void dmam_free_noncoherent(struct device
> }
> EXPORT_SYMBOL(dmam_free_noncoherent);
>
> +/**
> + * dma_map_sg_ring - Map an entire sg ring
> + * @dev: Device to free noncoherent memory for
> + * @sg: The sg_ring
> + * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
> + *
> + * This returns -ENOMEM if mapping fails. It's not clear that telling you
> + * it failed is useful though.
> + */
> +int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
> + enum dma_data_direction direction)
> +{
> + struct sg_ring *i;
> + unsigned int num;
> +
> + for (i = sg; i; i = sg_ring_next(i, sg)) {
> + BUG_ON(i->num > i->max);
> + num = dma_map_sg(dev, i->sg, i->num, direction);
> + if (num == 0 && i->num != 0)
> + goto unmap;
> + }
> + return 0;

I don't think that this works for IOMMUs that could merge sg entries.

2007-12-20 07:08:22

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Thu, 20 Dec 2007 16:45:18 +1100
Rusty Russell <[email protected]> wrote:

> OK, some fixes since last time, as I wade through more SCSI drivers. Some
> drivers use "use_sg" as a flag to know whether the request_buffer is a
> scatterlist: I don't need the counter, but I still need the flag, so I fixed
> that in a more intuitive way (an explicit ->sg pointer in the cmd).

use_sg and the request_buffer will be removed shortly.

http://marc.info/?l=linux-scsi&m=119754650614813&w=2


I think that we tried the similar idea before, scsi_sgtable, but we
seem to settle in the current simple approach.

2007-12-20 07:43:01

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

From: FUJITA Tomonori <[email protected]>
Date: Thu, 20 Dec 2007 16:06:31 +0900

> On Thu, 20 Dec 2007 16:49:30 +1100
> Rusty Russell <[email protected]> wrote:
>
> > +/**
> > + * dma_map_sg_ring - Map an entire sg ring
> > + * @dev: Device to free noncoherent memory for
> > + * @sg: The sg_ring
> > + * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
> > + *
> > + * This returns -ENOMEM if mapping fails. It's not clear that telling you
> > + * it failed is useful though.
> > + */
> > +int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
> > + enum dma_data_direction direction)
> > +{
> > + struct sg_ring *i;
> > + unsigned int num;
> > +
> > + for (i = sg; i; i = sg_ring_next(i, sg)) {
> > + BUG_ON(i->num > i->max);
> > + num = dma_map_sg(dev, i->sg, i->num, direction);
> > + if (num == 0 && i->num != 0)
> > + goto unmap;
> > + }
> > + return 0;
>
> I don't think that this works for IOMMUs that could merge sg entries.

Right, it won't work at all.

The caller has to be told how many DMA entries it really did use to
compose the mapping, and there has to be a way to properly iterate
over them. The assumption that the IOMMU will map the SG entries
1-to-1 is invalid.

The IOMMU code can and will map thousands of pages all to one linear
DMA address+length pair of all the segments start and end on page
boundaries, since it can virtually remap those pages however it
chooses.

2007-12-20 07:54:13

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote:
> On Thu, 20 Dec 2007 16:45:18 +1100
>
> Rusty Russell <[email protected]> wrote:
> > OK, some fixes since last time, as I wade through more SCSI drivers.
> > Some drivers use "use_sg" as a flag to know whether the request_buffer is
> > a scatterlist: I don't need the counter, but I still need the flag, so I
> > fixed that in a more intuitive way (an explicit ->sg pointer in the cmd).
>
> use_sg and the request_buffer will be removed shortly.
>
> http://marc.info/?l=linux-scsi&m=119754650614813&w=2

Thanks! Is there a git tree somewhere with these changes?

> I think that we tried the similar idea before, scsi_sgtable, but we
> seem to settle in the current simple approach.

Yes, a scsi-specific solution is a bad idea: other people use sg.
Manipulating the magic chains is horrible; it looks simple to the places
which simply want to iterate through it, but it's awful for code which wants
to create them.

Cheers,
Rusty.

2007-12-20 07:58:24

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

From: Rusty Russell <[email protected]>
Date: Thu, 20 Dec 2007 18:53:48 +1100

> Manipulating the magic chains is horrible; it looks simple to the
> places which simply want to iterate through it, but it's awful for
> code which wants to create them.

I'm not saying complexity is inherent in this stuff, but
assuming that it is the complexity should live as far away
from the minions (the iterators in this case). Therefore,
the creators is the right spot for the hard stuff.

2007-12-20 08:03:10

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Thu, Dec 20 2007, Rusty Russell wrote:
> On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote:
> > On Thu, 20 Dec 2007 16:45:18 +1100
> >
> > Rusty Russell <[email protected]> wrote:
> > > OK, some fixes since last time, as I wade through more SCSI drivers.
> > > Some drivers use "use_sg" as a flag to know whether the request_buffer is
> > > a scatterlist: I don't need the counter, but I still need the flag, so I
> > > fixed that in a more intuitive way (an explicit ->sg pointer in the cmd).
> >
> > use_sg and the request_buffer will be removed shortly.
> >
> > http://marc.info/?l=linux-scsi&m=119754650614813&w=2
>
> Thanks! Is there a git tree somewhere with these changes?
>
> > I think that we tried the similar idea before, scsi_sgtable, but we
> > seem to settle in the current simple approach.
>
> Yes, a scsi-specific solution is a bad idea: other people use sg.
> Manipulating the magic chains is horrible; it looks simple to the places
> which simply want to iterate through it, but it's awful for code which wants
> to create them.

The current code looks like that to minimize impact on 2.6.24, see this
branch:

http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=sg

for how it folds into lib/sg.c and the magic disappears from SCSI.
Rusty, nobody claimed the sg code in 2.6.24 is perfect. I like to get
things incrementally there.

--
Jens Axboe

2007-12-20 08:03:44

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Wed, Dec 19 2007, David Miller wrote:
> From: Rusty Russell <[email protected]>
> Date: Thu, 20 Dec 2007 18:53:48 +1100
>
> > Manipulating the magic chains is horrible; it looks simple to the
> > places which simply want to iterate through it, but it's awful for
> > code which wants to create them.
>
> I'm not saying complexity is inherent in this stuff, but
> assuming that it is the complexity should live as far away
> from the minions (the iterators in this case). Therefore,
> the creators is the right spot for the hard stuff.

Agree, and the missing bit is just moving the creators out of the driver
parts and into a core helper. See the previous post on the 'sg' branch
for 2.6.25.

--
Jens Axboe

2007-12-20 13:57:26

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Thu, Dec 20 2007 at 9:58 +0200, Jens Axboe <[email protected]> wrote:
> On Thu, Dec 20 2007, Rusty Russell wrote:
>> On Thursday 20 December 2007 18:07:41 FUJITA Tomonori wrote:
>>> On Thu, 20 Dec 2007 16:45:18 +1100
>>>
>>> Rusty Russell <[email protected]> wrote:
>>>> OK, some fixes since last time, as I wade through more SCSI drivers.
>>>> Some drivers use "use_sg" as a flag to know whether the request_buffer is
>>>> a scatterlist: I don't need the counter, but I still need the flag, so I
>>>> fixed that in a more intuitive way (an explicit ->sg pointer in the cmd).
>>> use_sg and the request_buffer will be removed shortly.
>>>
>>> http://marc.info/?l=linux-scsi&m=119754650614813&w=2
>> Thanks! Is there a git tree somewhere with these changes?
>>
>>> I think that we tried the similar idea before, scsi_sgtable, but we
>>> seem to settle in the current simple approach.
>> Yes, a scsi-specific solution is a bad idea: other people use sg.
>> Manipulating the magic chains is horrible; it looks simple to the places
>> which simply want to iterate through it, but it's awful for code which wants
>> to create them.
>
> The current code looks like that to minimize impact on 2.6.24, see this
> branch:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=sg
>
> for how it folds into lib/sg.c and the magic disappears from SCSI.
> Rusty, nobody claimed the sg code in 2.6.24 is perfect. I like to get
> things incrementally there.
>
Dear Jens.

Is this code scheduled for 2.6.25? I cannot find it in mm tree.

Because above code conflicts with the scsi_data_buffer patch,
that is in mm tree and I hope will get accepted into 2.6.25.
Now the concepts are not at all conflicting, only that they
both bang in the same places.
(And by the way it does not apply on scsi-misc either.
And it did not compile in your tree, a missing bit in
ide-scsi.c)

I have rebase and fixed your code on top of scsi_data_buffer patchset.
Please review. (Patchset posted as reply to this mail)

They are totaly untested, based on -mm tree.
We should decide the order of these patches and rebase
accordingly.

AND ...
Please send, to-be-included-in-next-kernel patches to Morton.
This way we can account for them. Also I do not see Ack-by:
of the scsi maintainer in the scsi bits of your patches.
Is it not a costume to Ack-on bits that belong to other maintainers,
even for maintainers?

Boaz

2007-12-20 14:01:05

by Boaz Harrosh

[permalink] [raw]
Subject: [PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers


Manually doing chained sg lists is not trivial, so add some helpers
to make sure that drivers get it right.

Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/scatterlist.h | 125 ++++---------------
lib/Makefile | 2 +-
lib/scatterlist.c | 281 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 307 insertions(+), 101 deletions(-)
create mode 100644 lib/scatterlist.c

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 416e000..c3ca848 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -7,6 +7,12 @@
#include <linux/string.h>
#include <asm/io.h>

+struct sg_table {
+ struct scatterlist *sgl; /* the list */
+ unsigned int nents; /* number of mapped entries */
+ unsigned int orig_nents; /* original size of list */
+};
+
/*
* Notes on SG table design.
*
@@ -106,31 +112,6 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
}

-/**
- * sg_next - return the next scatterlist entry in a list
- * @sg: The current sg entry
- *
- * Description:
- * Usually the next entry will be @sg@ + 1, but if this sg element is part
- * of a chained scatterlist, it could jump to the start of a new
- * scatterlist array.
- *
- **/
-static inline struct scatterlist *sg_next(struct scatterlist *sg)
-{
-#ifdef CONFIG_DEBUG_SG
- BUG_ON(sg->sg_magic != SG_MAGIC);
-#endif
- if (sg_is_last(sg))
- return NULL;
-
- sg++;
- if (unlikely(sg_is_chain(sg)))
- sg = sg_chain_ptr(sg);
-
- return sg;
-}
-
/*
* Loop over each sg element, following the pointer to a new list if necessary
*/
@@ -138,40 +119,6 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))

/**
- * sg_last - return the last scatterlist entry in a list
- * @sgl: First entry in the scatterlist
- * @nents: Number of entries in the scatterlist
- *
- * Description:
- * Should only be used casually, it (currently) scan the entire list
- * to get the last entry.
- *
- * Note that the @sgl@ pointer passed in need not be the first one,
- * the important bit is that @nents@ denotes the number of entries that
- * exist from @sgl@.
- *
- **/
-static inline struct scatterlist *sg_last(struct scatterlist *sgl,
- unsigned int nents)
-{
-#ifndef ARCH_HAS_SG_CHAIN
- struct scatterlist *ret = &sgl[nents - 1];
-#else
- struct scatterlist *sg, *ret = NULL;
- unsigned int i;
-
- for_each_sg(sgl, sg, nents, i)
- ret = sg;
-
-#endif
-#ifdef CONFIG_DEBUG_SG
- BUG_ON(sgl[0].sg_magic != SG_MAGIC);
- BUG_ON(!sg_is_last(ret));
-#endif
- return ret;
-}
-
-/**
* sg_chain - Chain two sglists together
* @prv: First scatterlist
* @prv_nents: Number of entries in prv
@@ -223,47 +170,6 @@ static inline void sg_mark_end(struct scatterlist *sg)
}

/**
- * sg_init_table - Initialize SG table
- * @sgl: The SG table
- * @nents: Number of entries in table
- *
- * Notes:
- * If this is part of a chained sg table, sg_mark_end() should be
- * used only on the last table part.
- *
- **/
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
- memset(sgl, 0, sizeof(*sgl) * nents);
-#ifdef CONFIG_DEBUG_SG
- {
- unsigned int i;
- for (i = 0; i < nents; i++)
- sgl[i].sg_magic = SG_MAGIC;
- }
-#endif
- sg_mark_end(&sgl[nents - 1]);
-}
-
-/**
- * sg_init_one - Initialize a single entry sg list
- * @sg: SG entry
- * @buf: Virtual address for IO
- * @buflen: IO length
- *
- * Notes:
- * This should not be used on a single entry that is part of a larger
- * table. Use sg_init_table() for that.
- *
- **/
-static inline void sg_init_one(struct scatterlist *sg, const void *buf,
- unsigned int buflen)
-{
- sg_init_table(sg, 1);
- sg_set_buf(sg, buf, buflen);
-}
-
-/**
* sg_phys - Return physical address of an sg entry
* @sg: SG entry
*
@@ -293,4 +199,23 @@ static inline void *sg_virt(struct scatterlist *sg)
return page_address(sg_page(sg)) + sg->offset;
}

+struct scatterlist *sg_next(struct scatterlist *);
+struct scatterlist *sg_last(struct scatterlist *s, unsigned int);
+void sg_init_table(struct scatterlist *, unsigned int);
+void sg_init_one(struct scatterlist *, const void *, unsigned int);
+
+typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t);
+typedef void (sg_free_fn)(struct scatterlist *, unsigned int);
+
+void __sg_free_table(struct sg_table *, sg_free_fn *);
+void sg_free_table(struct sg_table *);
+int __sg_alloc_table(struct sg_table *, unsigned int, gfp_t, sg_alloc_fn *);
+int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
+
+/*
+ * Maximum number of entries that will be allocated in one piece, if
+ * a list larger than this is required then chaining will be utilized.
+ */
+#define SG_MAX_SINGLE_ALLOC (PAGE_SIZE / sizeof(struct scatterlist))
+
#endif /* _LINUX_SCATTERLIST_H */
diff --git a/lib/Makefile b/lib/Makefile
index b6793ed..89841dc 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -6,7 +6,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o dump_stack.o \
idr.o int_sqrt.o extable.o prio_tree.o \
sha1.o irq_regs.o reciprocal_div.o argv_split.o \
- proportions.o prio_heap.o
+ proportions.o prio_heap.o scatterlist.o

lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
new file mode 100644
index 0000000..02aaa27
--- /dev/null
+++ b/lib/scatterlist.c
@@ -0,0 +1,281 @@
+/*
+ * Copyright (C) 2007 Jens Axboe <[email protected]>
+ *
+ * Scatterlist handling helpers.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+#include <linux/module.h>
+#include <linux/scatterlist.h>
+
+/**
+ * sg_next - return the next scatterlist entry in a list
+ * @sg: The current sg entry
+ *
+ * Description:
+ * Usually the next entry will be @sg@ + 1, but if this sg element is part
+ * of a chained scatterlist, it could jump to the start of a new
+ * scatterlist array.
+ *
+ **/
+struct scatterlist *sg_next(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ if (sg_is_last(sg))
+ return NULL;
+
+ sg++;
+ if (unlikely(sg_is_chain(sg)))
+ sg = sg_chain_ptr(sg);
+
+ return sg;
+}
+EXPORT_SYMBOL(sg_next);
+
+/**
+ * sg_last - return the last scatterlist entry in a list
+ * @sgl: First entry in the scatterlist
+ * @nents: Number of entries in the scatterlist
+ *
+ * Description:
+ * Should only be used casually, it (currently) scans the entire list
+ * to get the last entry.
+ *
+ * Note that the @sgl@ pointer passed in need not be the first one,
+ * the important bit is that @nents@ denotes the number of entries that
+ * exist from @sgl@.
+ *
+ **/
+struct scatterlist *sg_last(struct scatterlist *sgl, unsigned int nents)
+{
+#ifndef ARCH_HAS_SG_CHAIN
+ struct scatterlist *ret = &sgl[nents - 1];
+#else
+ struct scatterlist *sg, *ret = NULL;
+ unsigned int i;
+
+ for_each_sg(sgl, sg, nents, i)
+ ret = sg;
+
+#endif
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sgl[0].sg_magic != SG_MAGIC);
+ BUG_ON(!sg_is_last(ret));
+#endif
+ return ret;
+}
+EXPORT_SYMBOL(sg_last);
+
+/**
+ * sg_init_table - Initialize SG table
+ * @sgl: The SG table
+ * @nents: Number of entries in table
+ *
+ * Notes:
+ * If this is part of a chained sg table, sg_mark_end() should be
+ * used only on the last table part.
+ *
+ **/
+void sg_init_table(struct scatterlist *sgl, unsigned int nents)
+{
+ memset(sgl, 0, sizeof(*sgl) * nents);
+#ifdef CONFIG_DEBUG_SG
+ {
+ unsigned int i;
+ for (i = 0; i < nents; i++)
+ sgl[i].sg_magic = SG_MAGIC;
+ }
+#endif
+ sg_mark_end(&sgl[nents - 1]);
+}
+EXPORT_SYMBOL(sg_init_table);
+
+/**
+ * sg_init_one - Initialize a single entry sg list
+ * @sg: SG entry
+ * @buf: Virtual address for IO
+ * @buflen: IO length
+ *
+ **/
+void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
+{
+ sg_init_table(sg, 1);
+ sg_set_buf(sg, buf, buflen);
+}
+EXPORT_SYMBOL(sg_init_one);
+
+/*
+ * The default behaviour of sg_alloc_table() is to use these kmalloc/kfree
+ * helpers.
+ */
+static struct scatterlist *sg_kmalloc(unsigned int nents, gfp_t gfp_mask)
+{
+ if (nents == SG_MAX_SINGLE_ALLOC)
+ return (struct scatterlist *) __get_free_page(gfp_mask);
+ else
+ return kmalloc(nents * sizeof(struct scatterlist), gfp_mask);
+}
+
+static void sg_kfree(struct scatterlist *sg, unsigned int nents)
+{
+ if (nents == SG_MAX_SINGLE_ALLOC)
+ free_page((unsigned long) sg);
+ else
+ kfree(sg);
+}
+
+/**
+ * __sg_free_table - Free a previously mapped sg table
+ * @table: The sg table header to use
+ * @free_fn: Free function
+ *
+ * Description:
+ * Free an sg table previously allocated and setup with __sg_alloc_table().
+ *
+ **/
+void __sg_free_table(struct sg_table *table, sg_free_fn *free_fn)
+{
+ struct scatterlist *sgl, *next;
+
+ if (unlikely(!table->sgl))
+ return;
+
+ sgl = table->sgl;
+ while (table->orig_nents) {
+ unsigned int alloc_size = table->orig_nents;
+ unsigned int sg_size;
+
+ /*
+ * If we have more than SG_MAX_SINGLE_ALLOC segments left,
+ * then assign 'next' to the sg table after the current one.
+ * sg_size is then one less than alloc size, since the last
+ * element is the chain pointer.
+ */
+ if (alloc_size > SG_MAX_SINGLE_ALLOC) {
+ next = sg_chain_ptr(&sgl[SG_MAX_SINGLE_ALLOC - 1]);
+ alloc_size = SG_MAX_SINGLE_ALLOC;
+ sg_size = alloc_size - 1;
+ } else {
+ sg_size = alloc_size;
+ next = NULL;
+ }
+
+ table->orig_nents -= sg_size;
+ free_fn(sgl, alloc_size);
+ sgl = next;
+ }
+
+ table->sgl = NULL;
+}
+EXPORT_SYMBOL(__sg_free_table);
+
+/**
+ * sg_free_table - Free a previously allocated sg table
+ * @table: The mapped sg table header
+ *
+ **/
+void sg_free_table(struct sg_table *table)
+{
+ __sg_free_table(table, sg_kfree);
+}
+EXPORT_SYMBOL(sg_free_table);
+
+/**
+ * __sg_alloc_table - Allocate and initialize an sg table with given allocator
+ * @table: The sg table header to use
+ * @nents: Number of entries in sg list
+ * @gfp_mask: GFP allocation mask
+ * @alloc_fn: Allocator to use
+ *
+ * Notes:
+ * If this function returns non-0 (eg failure), the caller must call
+ * __sg_free_table() to cleanup any leftover allocations.
+ *
+ **/
+int __sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask,
+ sg_alloc_fn *alloc_fn)
+{
+ struct scatterlist *sg, *prv;
+ unsigned int left;
+
+#ifndef ARCH_HAS_SG_CHAIN
+ BUG_ON(nents > SG_MAX_SINGLE_ALLOC);
+#endif
+
+ memset(table, 0, sizeof(*table));
+
+ left = nents;
+ prv = NULL;
+ do {
+ unsigned int sg_size, alloc_size = left;
+
+ if (alloc_size > SG_MAX_SINGLE_ALLOC) {
+ alloc_size = SG_MAX_SINGLE_ALLOC;
+ sg_size = alloc_size - 1;
+ } else
+ sg_size = alloc_size;
+
+ left -= sg_size;
+
+ sg = alloc_fn(alloc_size, gfp_mask);
+ if (unlikely(!sg))
+ return -ENOMEM;
+
+ sg_init_table(sg, alloc_size);
+ table->nents = table->orig_nents += sg_size;
+
+ /*
+ * If this is the first mapping, assign the sg table header.
+ * If this is not the first mapping, chain previous part.
+ */
+ if (prv)
+ sg_chain(prv, SG_MAX_SINGLE_ALLOC, sg);
+ else
+ table->sgl = sg;
+
+ /*
+ * If no more entries after this one, mark the end
+ */
+ if (!left)
+ sg_mark_end(&sg[sg_size - 1]);
+
+ /*
+ * only really needed for mempool backed sg allocations (like
+ * SCSI), a possible improvement here would be to pass the
+ * table pointer into the allocator and let that clear these
+ * flags
+ */
+ gfp_mask &= ~__GFP_WAIT;
+ gfp_mask |= __GFP_HIGH;
+ prv = sg;
+ } while (left);
+
+ return 0;
+}
+EXPORT_SYMBOL(__sg_alloc_table);
+
+/**
+ * sg_alloc_table - Allocate and initialize an sg table
+ * @table: The sg table header to use
+ * @nents: Number of entries in sg list
+ * @gfp_mask: GFP allocation mask
+ *
+ * Description:
+ * Allocate and initialize an sg table. If @nents@ is larger than
+ * SG_MAX_SINGLE_ALLOC a chained sg table will be setup.
+ *
+ **/
+int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
+{
+ int ret;
+
+ ret = __sg_alloc_table(table, nents, gfp_mask, sg_kmalloc);
+ if (unlikely(ret))
+ __sg_free_table(table, sg_kfree);
+
+ return ret;
+}
+EXPORT_SYMBOL(sg_alloc_table);
--
1.5.3.3

2007-12-20 14:05:00

by Boaz Harrosh

[permalink] [raw]
Subject: [PATCH 2/3] SG: Convert SCSI to use scatterlist helpers for sg chaining

From: Jens Axboe <[email protected]>

Signed-off-by: Jens Axboe <[email protected]>
---
drivers/scsi/libsrp.c | 2 +-
drivers/scsi/scsi_error.c | 4 +-
drivers/scsi/scsi_lib.c | 150 +++++-------------------------------------
drivers/usb/storage/isd200.c | 4 +-
include/scsi/scsi_cmnd.h | 9 +--
5 files changed, 24 insertions(+), 145 deletions(-)

diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c
index 8a8562a..b81350d 100644
--- a/drivers/scsi/libsrp.c
+++ b/drivers/scsi/libsrp.c
@@ -427,7 +427,7 @@ int srp_cmd_queue(struct Scsi_Host *shost, struct srp_cmd *cmd, void *info,
sc->SCp.ptr = info;
memcpy(sc->cmnd, cmd->cdb, MAX_COMMAND_SIZE);
sc->sdb.length = len;
- sc->sdb.sglist = (void *) (unsigned long) addr;
+ sc->sdb.sgt.sgl = (void *) (unsigned long) addr;
sc->tag = tag;
err = scsi_tgt_queue_command(sc, itn_id, (struct scsi_lun *)&cmd->lun,
cmd->tag);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5c8ba6a..1fd2a8c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -629,9 +629,9 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
sizeof(scmd->sense_buffer), sense_bytes);
sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
scmd->sdb.length);
- scmd->sdb.sglist = &ses->sense_sgl;
+ scmd->sdb.sgt.sgl = &ses->sense_sgl;
scmd->sc_data_direction = DMA_FROM_DEVICE;
- scmd->sdb.sg_count = 1;
+ scmd->sdb.sgt.nents = 1;
memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
scmd->cmnd[0] = REQUEST_SENSE;
scmd->cmnd[4] = scmd->sdb.length;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a6aae56..c7107f1 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -750,136 +750,15 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents)
return index;
}

-static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb,
- unsigned short sg_count, gfp_t gfp_mask)
+static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
{
- struct scsi_host_sg_pool *sgp;
- struct scatterlist *sgl, *prev, *ret;
- unsigned int index;
- int this, left;
-
- left = sg_count;
- ret = prev = NULL;
- do {
- this = left;
- if (this > SCSI_MAX_SG_SEGMENTS) {
- this = SCSI_MAX_SG_SEGMENTS - 1;
- index = SG_MEMPOOL_NR - 1;
- } else
- index = scsi_sgtable_index(this);
-
- left -= this;
-
- sgp = scsi_sg_pools + index;
-
- sgl = mempool_alloc(sgp->pool, gfp_mask);
- if (unlikely(!sgl))
- goto enomem;
-
- sg_init_table(sgl, sgp->size);
-
- /*
- * first loop through, set initial index and return value
- */
- if (!ret)
- ret = sgl;
-
- /*
- * chain previous sglist, if any. we know the previous
- * sglist must be the biggest one, or we would not have
- * ended up doing another loop.
- */
- if (prev)
- sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
-
- /*
- * if we have nothing left, mark the last segment as
- * end-of-list
- */
- if (!left)
- sg_mark_end(&sgl[this - 1]);
-
- /*
- * don't allow subsequent mempool allocs to sleep, it would
- * violate the mempool principle.
- */
- gfp_mask &= ~__GFP_WAIT;
- gfp_mask |= __GFP_HIGH;
- prev = sgl;
- } while (left);
-
- /*
- * ->use_sg may get modified after dma mapping has potentially
- * shrunk the number of segments, so keep a copy of it for free.
- */
- sdb->alloc_sg_count = sdb->sg_count = sg_count;
- sdb->sglist = ret;
- return 0;
-enomem:
- if (ret) {
- /*
- * Free entries chained off ret. Since we were trying to
- * allocate another sglist, we know that all entries are of
- * the max size.
- */
- sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
- prev = ret;
- ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
-
- while ((sgl = sg_chain_ptr(ret)) != NULL) {
- ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
- mempool_free(sgl, sgp->pool);
- }
-
- mempool_free(prev, sgp->pool);
- }
- return -ENOMEM;
+ return mempool_alloc(scsi_sg_pools[scsi_sgtable_index(nents)].pool,
+ gfp_mask);
}

-static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
+static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
{
- struct scatterlist *sgl = sdb->sglist;
- struct scsi_host_sg_pool *sgp;
-
- /*
- * if this is the biggest size sglist, check if we have
- * chained parts we need to free
- */
- if (sdb->alloc_sg_count > SCSI_MAX_SG_SEGMENTS) {
- unsigned short this, left;
- struct scatterlist *next;
- unsigned int index;
-
- left = sdb->alloc_sg_count - (SCSI_MAX_SG_SEGMENTS - 1);
- next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
- while (left && next) {
- sgl = next;
- this = left;
- if (this > SCSI_MAX_SG_SEGMENTS) {
- this = SCSI_MAX_SG_SEGMENTS - 1;
- index = SG_MEMPOOL_NR - 1;
- } else
- index = scsi_sgtable_index(this);
-
- left -= this;
-
- sgp = scsi_sg_pools + index;
-
- if (left)
- next = sg_chain_ptr(&sgl[sgp->size - 1]);
-
- mempool_free(sgl, sgp->pool);
- }
-
- /*
- * Restore original, will be freed below
- */
- sgl = sdb->sglist;
- sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
- } else
- sgp = scsi_sg_pools + scsi_sgtable_index(sdb->alloc_sg_count);
-
- mempool_free(sgl, sgp->pool);
+ mempool_free(sgl, scsi_sg_pools[scsi_sgtable_index(nents)].pool);
}

/*
@@ -901,15 +780,15 @@ static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
*/
void scsi_release_buffers(struct scsi_cmnd *cmd)
{
- if (cmd->sdb.sglist)
- scsi_free_sgtable(&cmd->sdb);
+ if (cmd->sdb.sgt.sgl)
+ __sg_free_table(&cmd->sdb.sgt, scsi_sg_free);

memset(&cmd->sdb, 0, sizeof(cmd->sdb));

if (scsi_bidi_cmnd(cmd)) {
struct scsi_data_buffer *bidi_sdb =
cmd->request->next_rq->special;
- scsi_free_sgtable(bidi_sdb);
+ __sg_free_table(&bidi_sdb->sgt, scsi_sg_free);
kmem_cache_free(scsi_bidi_sdb_cache, bidi_sdb);
cmd->request->next_rq->special = NULL;
}
@@ -1135,9 +1014,12 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
gfp_t gfp_mask)
{
- int count;
+ int count, ret;

- if (scsi_alloc_sgtable(sdb, req->nr_phys_segments, gfp_mask)) {
+ ret = __sg_alloc_table(&sdb->sgt, req->nr_phys_segments, gfp_mask,
+ scsi_sg_alloc);
+ if (unlikely(ret)) {
+ __sg_free_table(&sdb->sgt, scsi_sg_free);
return BLKPREP_DEFER;
}

@@ -1151,9 +1033,9 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
* Next, walk the list, and fill in the addresses and sizes of
* each segment.
*/
- count = blk_rq_map_sg(req->q, req, sdb->sglist);
- BUG_ON(count > sdb->sg_count);
- sdb->sg_count = count;
+ count = blk_rq_map_sg(req->q, req, sdb->sgt.sgl);
+ BUG_ON(count > sdb->sgt.nents);
+ sdb->sgt.nents = count;
return BLKPREP_OK;
}

diff --git a/drivers/usb/storage/isd200.c b/drivers/usb/storage/isd200.c
index 2d9a32b..0214ddb 100644
--- a/drivers/usb/storage/isd200.c
+++ b/drivers/usb/storage/isd200.c
@@ -415,9 +415,9 @@ static void isd200_set_srb(struct isd200_info *info,
sg_init_one(&info->sg, buff, bufflen);

srb->sc_data_direction = dir;
- srb->sdb.sglist = buff ? &info->sg : NULL;
+ srb->sdb.sgt.sgl = buff ? &info->sg : NULL;
srb->sdb.length = bufflen;
- srb->sdb.sg_count = buff ? 1 : 0;
+ srb->sdb.sgt.nents = buff ? 1 : 0;
}

static void isd200_srb_set_bufflen(struct scsi_cmnd *srb, unsigned bufflen)
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index cd2851a..9933bbe 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -8,16 +8,13 @@
#include <linux/timer.h>
#include <linux/scatterlist.h>

-struct scatterlist;
struct Scsi_Host;
struct scsi_device;

struct scsi_data_buffer {
- struct scatterlist *sglist;
+ struct sg_table sgt;
unsigned length;
int resid;
- unsigned short sg_count;
- unsigned short alloc_sg_count; /* private to scsi_lib */
};

/* embedded in scsi_cmnd */
@@ -136,12 +133,12 @@ extern void scsi_dma_unmap(struct scsi_cmnd *cmd);

static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
{
- return cmd->sdb.sg_count;
+ return cmd->sdb.sgt.nents;
}

static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
{
- return cmd->sdb.sglist;
+ return cmd->sdb.sgt.sgl;
}

static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
--
1.5.3.3

2007-12-20 14:08:56

by Boaz Harrosh

[permalink] [raw]
Subject: [PATCH 3/3] SG: Update ide/ to use sg_table

From: Jens Axboe <[email protected]>

Signed-off-by: Jens Axboe <[email protected]>
---
drivers/ide/arm/icside.c | 6 +++---
drivers/ide/cris/ide-cris.c | 2 +-
drivers/ide/ide-dma.c | 8 ++++----
drivers/ide/ide-io.c | 2 +-
drivers/ide/ide-probe.c | 6 +-----
drivers/ide/ide-taskfile.c | 2 +-
drivers/ide/ide.c | 2 +-
drivers/ide/mips/au1xxx-ide.c | 8 ++++----
drivers/ide/pci/sgiioc4.c | 4 ++--
drivers/ide/ppc/pmac.c | 6 +++---
drivers/scsi/ide-scsi.c | 2 +-
include/linux/ide.h | 2 +-
12 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c
index 93f71fc..a48a6bd 100644
--- a/drivers/ide/arm/icside.c
+++ b/drivers/ide/arm/icside.c
@@ -210,7 +210,7 @@ static void icside_build_sglist(ide_drive_t *drive, struct request *rq)
{
ide_hwif_t *hwif = drive->hwif;
struct icside_state *state = hwif->hwif_data;
- struct scatterlist *sg = hwif->sg_table;
+ struct scatterlist *sg = hwif->sg_table.sgl;

ide_map_sg(drive, rq);

@@ -319,7 +319,7 @@ static int icside_dma_end(ide_drive_t *drive)
disable_dma(ECARD_DEV(state->dev)->dma);

/* Teardown mappings after DMA has completed. */
- dma_unmap_sg(state->dev, hwif->sg_table, hwif->sg_nents,
+ dma_unmap_sg(state->dev, hwif->sg_table.sgl, hwif->sg_nents,
hwif->sg_dma_direction);

return get_dma_residue(ECARD_DEV(state->dev)->dma) != 0;
@@ -373,7 +373,7 @@ static int icside_dma_setup(ide_drive_t *drive)
* Tell the DMA engine about the SG table and
* data direction.
*/
- set_dma_sg(ECARD_DEV(state->dev)->dma, hwif->sg_table, hwif->sg_nents);
+ set_dma_sg(ECARD_DEV(state->dev)->dma, hwif->sg_table.sgl, hwif->sg_nents);
set_dma_mode(ECARD_DEV(state->dev)->dma, dma_mode);

drive->waiting_for_dma = 1;
diff --git a/drivers/ide/cris/ide-cris.c b/drivers/ide/cris/ide-cris.c
index 476e0d6..3701aca 100644
--- a/drivers/ide/cris/ide-cris.c
+++ b/drivers/ide/cris/ide-cris.c
@@ -919,7 +919,7 @@ static int cris_ide_build_dmatable (ide_drive_t *drive)
unsigned int count = 0;
int i = 0;

- sg = hwif->sg_table;
+ sg = hwif->sg_table.sgl;

ata_tot_size = 0;

diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
index 4703837..33a2f56 100644
--- a/drivers/ide/ide-dma.c
+++ b/drivers/ide/ide-dma.c
@@ -190,7 +190,7 @@ static int ide_dma_good_drive(ide_drive_t *drive)
int ide_build_sglist(ide_drive_t *drive, struct request *rq)
{
ide_hwif_t *hwif = HWIF(drive);
- struct scatterlist *sg = hwif->sg_table;
+ struct scatterlist *sg = hwif->sg_table.sgl;

BUG_ON((rq->cmd_type == REQ_TYPE_ATA_TASKFILE) && rq->nr_sectors > 256);

@@ -234,7 +234,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq)
if (!i)
return 0;

- sg = hwif->sg_table;
+ sg = hwif->sg_table.sgl;
while (i) {
u32 cur_addr;
u32 cur_len;
@@ -293,7 +293,7 @@ int ide_build_dmatable (ide_drive_t *drive, struct request *rq)
printk(KERN_ERR "%s: empty DMA table?\n", drive->name);
use_pio_instead:
pci_unmap_sg(hwif->pci_dev,
- hwif->sg_table,
+ hwif->sg_table.sgl,
hwif->sg_nents,
hwif->sg_dma_direction);
return 0; /* revert to PIO for this request */
@@ -315,7 +315,7 @@ EXPORT_SYMBOL_GPL(ide_build_dmatable);
void ide_destroy_dmatable (ide_drive_t *drive)
{
struct pci_dev *dev = HWIF(drive)->pci_dev;
- struct scatterlist *sg = HWIF(drive)->sg_table;
+ struct scatterlist *sg = HWIF(drive)->sg_table.sgl;
int nents = HWIF(drive)->sg_nents;

pci_unmap_sg(dev, sg, nents, HWIF(drive)->sg_dma_direction);
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index bef781f..638e2db 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -819,7 +819,7 @@ static ide_startstop_t do_special (ide_drive_t *drive)
void ide_map_sg(ide_drive_t *drive, struct request *rq)
{
ide_hwif_t *hwif = drive->hwif;
- struct scatterlist *sg = hwif->sg_table;
+ struct scatterlist *sg = hwif->sg_table.sgl;

if (hwif->sg_mapped) /* needed by ide-scsi */
return;
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 2994523..770f8cf 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -1312,15 +1312,11 @@ static int hwif_init(ide_hwif_t *hwif)
if (!hwif->sg_max_nents)
hwif->sg_max_nents = PRD_ENTRIES;

- hwif->sg_table = kmalloc(sizeof(struct scatterlist)*hwif->sg_max_nents,
- GFP_KERNEL);
- if (!hwif->sg_table) {
+ if (sg_alloc_table(&hwif->sg_table, hwif->sg_max_nents, GFP_KERNEL)) {
printk(KERN_ERR "%s: unable to allocate SG table.\n", hwif->name);
goto out;
}

- sg_init_table(hwif->sg_table, hwif->sg_max_nents);
-
if (init_irq(hwif) == 0)
goto done;

diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index 2b60f1b..0a898d9 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -246,7 +246,7 @@ static u8 wait_drive_not_busy(ide_drive_t *drive)
static void ide_pio_sector(ide_drive_t *drive, unsigned int write)
{
ide_hwif_t *hwif = drive->hwif;
- struct scatterlist *sg = hwif->sg_table;
+ struct scatterlist *sg = hwif->sg_table.sgl;
struct scatterlist *cursg = hwif->cursg;
struct page *page;
#ifdef CONFIG_HIGHMEM
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index 54943da..a2ad85a 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -594,7 +594,7 @@ void ide_unregister(unsigned int index)
* Remove us from the kernel's knowledge
*/
blk_unregister_region(MKDEV(hwif->major, 0), MAX_DRIVES<<PARTN_BITS);
- kfree(hwif->sg_table);
+ sg_free_table(&hwif->sg_table);
unregister_blkdev(hwif->major, hwif->name);
spin_lock_irq(&ide_lock);

diff --git a/drivers/ide/mips/au1xxx-ide.c b/drivers/ide/mips/au1xxx-ide.c
index a4ce3ba..bccf3c0 100644
--- a/drivers/ide/mips/au1xxx-ide.c
+++ b/drivers/ide/mips/au1xxx-ide.c
@@ -216,7 +216,7 @@ static int auide_build_sglist(ide_drive_t *drive, struct request *rq)
{
ide_hwif_t *hwif = drive->hwif;
_auide_hwif *ahwif = (_auide_hwif*)hwif->hwif_data;
- struct scatterlist *sg = hwif->sg_table;
+ struct scatterlist *sg = hwif->sg_table.sgl;

ide_map_sg(drive, rq);

@@ -250,7 +250,7 @@ static int auide_build_dmatable(ide_drive_t *drive)
return 0;

/* fill the descriptors */
- sg = hwif->sg_table;
+ sg = hwif->sg_table.sgl;
while (i && sg_dma_len(sg)) {
u32 cur_addr;
u32 cur_len;
@@ -303,7 +303,7 @@ static int auide_build_dmatable(ide_drive_t *drive)

use_pio_instead:
dma_unmap_sg(ahwif->dev,
- hwif->sg_table,
+ hwif->sg_table.sgl,
hwif->sg_nents,
hwif->sg_dma_direction);

@@ -316,7 +316,7 @@ static int auide_dma_end(ide_drive_t *drive)
_auide_hwif *ahwif = (_auide_hwif*)hwif->hwif_data;

if (hwif->sg_nents) {
- dma_unmap_sg(ahwif->dev, hwif->sg_table, hwif->sg_nents,
+ dma_unmap_sg(ahwif->dev, hwif->sg_table.sgl, hwif->sg_nents,
hwif->sg_dma_direction);
hwif->sg_nents = 0;
}
diff --git a/drivers/ide/pci/sgiioc4.c b/drivers/ide/pci/sgiioc4.c
index de820aa..3ad9e4b 100644
--- a/drivers/ide/pci/sgiioc4.c
+++ b/drivers/ide/pci/sgiioc4.c
@@ -487,7 +487,7 @@ sgiioc4_build_dma_table(ide_drive_t * drive, struct request *rq, int ddir)
if (!i)
return 0; /* sglist of length Zero */

- sg = hwif->sg_table;
+ sg = hwif->sg_table.sgl;
while (i && sg_dma_len(sg)) {
dma_addr_t cur_addr;
int cur_len;
@@ -535,7 +535,7 @@ sgiioc4_build_dma_table(ide_drive_t * drive, struct request *rq, int ddir)
}

use_pio_instead:
- pci_unmap_sg(hwif->pci_dev, hwif->sg_table, hwif->sg_nents,
+ pci_unmap_sg(hwif->pci_dev, hwif->sg_table.sgl, hwif->sg_nents,
hwif->sg_dma_direction);

return 0; /* revert to PIO for this request */
diff --git a/drivers/ide/ppc/pmac.c b/drivers/ide/ppc/pmac.c
index 7f7a598..b982180 100644
--- a/drivers/ide/ppc/pmac.c
+++ b/drivers/ide/ppc/pmac.c
@@ -1503,7 +1503,7 @@ pmac_ide_build_dmatable(ide_drive_t *drive, struct request *rq)
return 0;

/* Build DBDMA commands list */
- sg = hwif->sg_table;
+ sg = hwif->sg_table.sgl;
while (i && sg_dma_len(sg)) {
u32 cur_addr;
u32 cur_len;
@@ -1555,7 +1555,7 @@ pmac_ide_build_dmatable(ide_drive_t *drive, struct request *rq)
printk(KERN_DEBUG "%s: empty DMA table?\n", drive->name);
use_pio_instead:
pci_unmap_sg(hwif->pci_dev,
- hwif->sg_table,
+ hwif->sg_table.sgl,
hwif->sg_nents,
hwif->sg_dma_direction);
return 0; /* revert to PIO for this request */
@@ -1567,7 +1567,7 @@ pmac_ide_destroy_dmatable (ide_drive_t *drive)
{
ide_hwif_t *hwif = drive->hwif;
struct pci_dev *dev = HWIF(drive)->pci_dev;
- struct scatterlist *sg = hwif->sg_table;
+ struct scatterlist *sg = hwif->sg_table.sgl;
int nents = hwif->sg_nents;

if (nents) {
diff --git a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
index 9706de9..762bd93 100644
--- a/drivers/scsi/ide-scsi.c
+++ b/drivers/scsi/ide-scsi.c
@@ -553,7 +553,7 @@ static int idescsi_map_sg(ide_drive_t *drive, idescsi_pc_t *pc)
if (idescsi_set_direction(pc))
return 1;

- sg = hwif->sg_table;
+ sg = hwif->sg_table.sgl;
scsi_sg = scsi_sglist(pc->scsi_cmd);
segments = scsi_sg_count(pc->scsi_cmd);

diff --git a/include/linux/ide.h b/include/linux/ide.h
index 9a6a41e..80ff2ee 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -750,7 +750,7 @@ typedef struct hwif_s {
/* dma physical region descriptor table (dma view) */
dma_addr_t dmatable_dma;
/* Scatter-gather list used to build the above */
- struct scatterlist *sg_table;
+ struct sg_table sg_table;
int sg_max_nents; /* Maximum number of entries in it */
int sg_nents; /* Current number of entries in it */
int sg_dma_direction; /* dma transfer direction */
--
1.5.3.3

2007-12-20 14:36:56

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers

A small comment in body

On Thu, Dec 20 2007 at 16:00 +0200, Boaz Harrosh <[email protected]> wrote:
> Manually doing chained sg lists is not trivial, so add some helpers
> to make sure that drivers get it right.
>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> include/linux/scatterlist.h | 125 ++++---------------
> lib/Makefile | 2 +-
> lib/scatterlist.c | 281 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 307 insertions(+), 101 deletions(-)
> create mode 100644 lib/scatterlist.c
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 416e000..c3ca848 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -7,6 +7,12 @@
> #include <linux/string.h>
> #include <asm/io.h>
>
> +struct sg_table {
> + struct scatterlist *sgl; /* the list */
> + unsigned int nents; /* number of mapped entries */
> + unsigned int orig_nents; /* original size of list */
> +};
> +
> /*
> * Notes on SG table design.
> *
> @@ -106,31 +112,6 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
> }
>
> -/**
> - * sg_next - return the next scatterlist entry in a list
> - * @sg: The current sg entry
> - *
> - * Description:
> - * Usually the next entry will be @sg@ + 1, but if this sg element is part
> - * of a chained scatterlist, it could jump to the start of a new
> - * scatterlist array.
> - *
> - **/
> -static inline struct scatterlist *sg_next(struct scatterlist *sg)
> -{
> -#ifdef CONFIG_DEBUG_SG
> - BUG_ON(sg->sg_magic != SG_MAGIC);
> -#endif
> - if (sg_is_last(sg))
> - return NULL;
> -
> - sg++;
> - if (unlikely(sg_is_chain(sg)))
> - sg = sg_chain_ptr(sg);
> -
> - return sg;
> -}
> -
> /*
> * Loop over each sg element, following the pointer to a new list if necessary
> */
> @@ -138,40 +119,6 @@ static inline struct scatterlist *sg_next(struct scatterlist *sg)
> for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
>
> /**
> - * sg_last - return the last scatterlist entry in a list
> - * @sgl: First entry in the scatterlist
> - * @nents: Number of entries in the scatterlist
> - *
> - * Description:
> - * Should only be used casually, it (currently) scan the entire list
> - * to get the last entry.
> - *
> - * Note that the @sgl@ pointer passed in need not be the first one,
> - * the important bit is that @nents@ denotes the number of entries that
> - * exist from @sgl@.
> - *
> - **/
> -static inline struct scatterlist *sg_last(struct scatterlist *sgl,
> - unsigned int nents)
> -{
> -#ifndef ARCH_HAS_SG_CHAIN
> - struct scatterlist *ret = &sgl[nents - 1];
> -#else
> - struct scatterlist *sg, *ret = NULL;
> - unsigned int i;
> -
> - for_each_sg(sgl, sg, nents, i)
> - ret = sg;
> -
> -#endif
> -#ifdef CONFIG_DEBUG_SG
> - BUG_ON(sgl[0].sg_magic != SG_MAGIC);
> - BUG_ON(!sg_is_last(ret));
> -#endif
> - return ret;
> -}
> -
> -/**
> * sg_chain - Chain two sglists together
> * @prv: First scatterlist
> * @prv_nents: Number of entries in prv
> @@ -223,47 +170,6 @@ static inline void sg_mark_end(struct scatterlist *sg)
> }
>
> /**
> - * sg_init_table - Initialize SG table
> - * @sgl: The SG table
> - * @nents: Number of entries in table
> - *
> - * Notes:
> - * If this is part of a chained sg table, sg_mark_end() should be
> - * used only on the last table part.
> - *
> - **/
> -static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
> -{
> - memset(sgl, 0, sizeof(*sgl) * nents);
> -#ifdef CONFIG_DEBUG_SG
> - {
> - unsigned int i;
> - for (i = 0; i < nents; i++)
> - sgl[i].sg_magic = SG_MAGIC;
> - }
> -#endif
> - sg_mark_end(&sgl[nents - 1]);
> -}
> -
> -/**
> - * sg_init_one - Initialize a single entry sg list
> - * @sg: SG entry
> - * @buf: Virtual address for IO
> - * @buflen: IO length
> - *
> - * Notes:
> - * This should not be used on a single entry that is part of a larger
> - * table. Use sg_init_table() for that.
> - *
> - **/
> -static inline void sg_init_one(struct scatterlist *sg, const void *buf,
> - unsigned int buflen)
> -{
> - sg_init_table(sg, 1);
> - sg_set_buf(sg, buf, buflen);
> -}
> -
> -/**
> * sg_phys - Return physical address of an sg entry
> * @sg: SG entry
> *
> @@ -293,4 +199,23 @@ static inline void *sg_virt(struct scatterlist *sg)
> return page_address(sg_page(sg)) + sg->offset;
> }
>
> +struct scatterlist *sg_next(struct scatterlist *);
I don't like that this became exported. I would prefer if
it could stay inline. if at all possible?

> +struct scatterlist *sg_last(struct scatterlist *s, unsigned int);
> +void sg_init_table(struct scatterlist *, unsigned int);
> +void sg_init_one(struct scatterlist *, const void *, unsigned int);
> +
> +typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t);
> +typedef void (sg_free_fn)(struct scatterlist *, unsigned int);
> +
> +void __sg_free_table(struct sg_table *, sg_free_fn *);
> +void sg_free_table(struct sg_table *);
> +int __sg_alloc_table(struct sg_table *, unsigned int, gfp_t, sg_alloc_fn *);
> +int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
> +
> +/*
> + * Maximum number of entries that will be allocated in one piece, if
> + * a list larger than this is required then chaining will be utilized.
> + */
> +#define SG_MAX_SINGLE_ALLOC (PAGE_SIZE / sizeof(struct scatterlist))
Yes this is a well needed calculation.

Only now the SCSI part of the allocator is missing.
What is needed is the code a posted a while back that
automatically adjusts pools and sizes to this constant.
I will post that again soon.
(Note that this is a bug on current code since above is 256
in 32 bit code and scsi_lib is only ready for 128)

> +
> #endif /* _LINUX_SCATTERLIST_H */
> diff --git a/lib/Makefile b/lib/Makefile
> index b6793ed..89841dc 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -6,7 +6,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
> rbtree.o radix-tree.o dump_stack.o \
> idr.o int_sqrt.o extable.o prio_tree.o \
> sha1.o irq_regs.o reciprocal_div.o argv_split.o \
> - proportions.o prio_heap.o
> + proportions.o prio_heap.o scatterlist.o
>
> lib-$(CONFIG_MMU) += ioremap.o
> lib-$(CONFIG_SMP) += cpumask.o
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> new file mode 100644
> index 0000000..02aaa27
> --- /dev/null
> +++ b/lib/scatterlist.c
> @@ -0,0 +1,281 @@
> +/*
> + * Copyright (C) 2007 Jens Axboe <[email protected]>
> + *
> + * Scatterlist handling helpers.
> + *
> + * This source code is licensed under the GNU General Public License,
> + * Version 2. See the file COPYING for more details.
> + */
> +#include <linux/module.h>
> +#include <linux/scatterlist.h>
> +
> +/**
> + * sg_next - return the next scatterlist entry in a list
> + * @sg: The current sg entry
> + *
> + * Description:
> + * Usually the next entry will be @sg@ + 1, but if this sg element is part
> + * of a chained scatterlist, it could jump to the start of a new
> + * scatterlist array.
> + *
> + **/
> +struct scatterlist *sg_next(struct scatterlist *sg)
> +{
> +#ifdef CONFIG_DEBUG_SG
> + BUG_ON(sg->sg_magic != SG_MAGIC);
> +#endif
> + if (sg_is_last(sg))
> + return NULL;
> +
> + sg++;
> + if (unlikely(sg_is_chain(sg)))
> + sg = sg_chain_ptr(sg);
> +
> + return sg;
> +}
> +EXPORT_SYMBOL(sg_next);
> +
> +/**
> + * sg_last - return the last scatterlist entry in a list
> + * @sgl: First entry in the scatterlist
> + * @nents: Number of entries in the scatterlist
> + *
> + * Description:
> + * Should only be used casually, it (currently) scans the entire list
> + * to get the last entry.
> + *
> + * Note that the @sgl@ pointer passed in need not be the first one,
> + * the important bit is that @nents@ denotes the number of entries that
> + * exist from @sgl@.
> + *
> + **/
> +struct scatterlist *sg_last(struct scatterlist *sgl, unsigned int nents)
> +{
> +#ifndef ARCH_HAS_SG_CHAIN
> + struct scatterlist *ret = &sgl[nents - 1];
> +#else
> + struct scatterlist *sg, *ret = NULL;
> + unsigned int i;
> +
> + for_each_sg(sgl, sg, nents, i)
> + ret = sg;
> +
> +#endif
> +#ifdef CONFIG_DEBUG_SG
> + BUG_ON(sgl[0].sg_magic != SG_MAGIC);
> + BUG_ON(!sg_is_last(ret));
> +#endif
> + return ret;
> +}
> +EXPORT_SYMBOL(sg_last);
> +
> +/**
> + * sg_init_table - Initialize SG table
> + * @sgl: The SG table
> + * @nents: Number of entries in table
> + *
> + * Notes:
> + * If this is part of a chained sg table, sg_mark_end() should be
> + * used only on the last table part.
> + *
> + **/
> +void sg_init_table(struct scatterlist *sgl, unsigned int nents)
> +{
> + memset(sgl, 0, sizeof(*sgl) * nents);
> +#ifdef CONFIG_DEBUG_SG
> + {
> + unsigned int i;
> + for (i = 0; i < nents; i++)
> + sgl[i].sg_magic = SG_MAGIC;
> + }
> +#endif
> + sg_mark_end(&sgl[nents - 1]);
> +}
> +EXPORT_SYMBOL(sg_init_table);
> +
> +/**
> + * sg_init_one - Initialize a single entry sg list
> + * @sg: SG entry
> + * @buf: Virtual address for IO
> + * @buflen: IO length
> + *
> + **/
> +void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
> +{
> + sg_init_table(sg, 1);
> + sg_set_buf(sg, buf, buflen);
> +}
> +EXPORT_SYMBOL(sg_init_one);
> +
> +/*
> + * The default behaviour of sg_alloc_table() is to use these kmalloc/kfree
> + * helpers.
> + */
> +static struct scatterlist *sg_kmalloc(unsigned int nents, gfp_t gfp_mask)
> +{
> + if (nents == SG_MAX_SINGLE_ALLOC)
> + return (struct scatterlist *) __get_free_page(gfp_mask);
> + else
> + return kmalloc(nents * sizeof(struct scatterlist), gfp_mask);
> +}
> +
> +static void sg_kfree(struct scatterlist *sg, unsigned int nents)
> +{
> + if (nents == SG_MAX_SINGLE_ALLOC)
> + free_page((unsigned long) sg);
> + else
> + kfree(sg);
> +}
> +
> +/**
> + * __sg_free_table - Free a previously mapped sg table
> + * @table: The sg table header to use
> + * @free_fn: Free function
> + *
> + * Description:
> + * Free an sg table previously allocated and setup with __sg_alloc_table().
> + *
> + **/
> +void __sg_free_table(struct sg_table *table, sg_free_fn *free_fn)
> +{
> + struct scatterlist *sgl, *next;
> +
> + if (unlikely(!table->sgl))
> + return;
> +
> + sgl = table->sgl;
> + while (table->orig_nents) {
> + unsigned int alloc_size = table->orig_nents;
> + unsigned int sg_size;
> +
> + /*
> + * If we have more than SG_MAX_SINGLE_ALLOC segments left,
> + * then assign 'next' to the sg table after the current one.
> + * sg_size is then one less than alloc size, since the last
> + * element is the chain pointer.
> + */
> + if (alloc_size > SG_MAX_SINGLE_ALLOC) {
> + next = sg_chain_ptr(&sgl[SG_MAX_SINGLE_ALLOC - 1]);
> + alloc_size = SG_MAX_SINGLE_ALLOC;
> + sg_size = alloc_size - 1;
> + } else {
> + sg_size = alloc_size;
> + next = NULL;
> + }
> +
> + table->orig_nents -= sg_size;
> + free_fn(sgl, alloc_size);
> + sgl = next;
> + }
> +
> + table->sgl = NULL;
> +}
> +EXPORT_SYMBOL(__sg_free_table);
> +
> +/**
> + * sg_free_table - Free a previously allocated sg table
> + * @table: The mapped sg table header
> + *
> + **/
> +void sg_free_table(struct sg_table *table)
> +{
> + __sg_free_table(table, sg_kfree);
> +}
> +EXPORT_SYMBOL(sg_free_table);
> +
> +/**
> + * __sg_alloc_table - Allocate and initialize an sg table with given allocator
> + * @table: The sg table header to use
> + * @nents: Number of entries in sg list
> + * @gfp_mask: GFP allocation mask
> + * @alloc_fn: Allocator to use
> + *
> + * Notes:
> + * If this function returns non-0 (eg failure), the caller must call
> + * __sg_free_table() to cleanup any leftover allocations.
> + *
> + **/
> +int __sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask,
> + sg_alloc_fn *alloc_fn)
> +{
> + struct scatterlist *sg, *prv;
> + unsigned int left;
> +
> +#ifndef ARCH_HAS_SG_CHAIN
> + BUG_ON(nents > SG_MAX_SINGLE_ALLOC);
> +#endif
> +
> + memset(table, 0, sizeof(*table));
> +
> + left = nents;
> + prv = NULL;
> + do {
> + unsigned int sg_size, alloc_size = left;
> +
> + if (alloc_size > SG_MAX_SINGLE_ALLOC) {
> + alloc_size = SG_MAX_SINGLE_ALLOC;
> + sg_size = alloc_size - 1;
> + } else
> + sg_size = alloc_size;
> +
> + left -= sg_size;
> +
> + sg = alloc_fn(alloc_size, gfp_mask);
> + if (unlikely(!sg))
> + return -ENOMEM;
> +
> + sg_init_table(sg, alloc_size);
> + table->nents = table->orig_nents += sg_size;
> +
> + /*
> + * If this is the first mapping, assign the sg table header.
> + * If this is not the first mapping, chain previous part.
> + */
> + if (prv)
> + sg_chain(prv, SG_MAX_SINGLE_ALLOC, sg);
> + else
> + table->sgl = sg;
> +
> + /*
> + * If no more entries after this one, mark the end
> + */
> + if (!left)
> + sg_mark_end(&sg[sg_size - 1]);
> +
> + /*
> + * only really needed for mempool backed sg allocations (like
> + * SCSI), a possible improvement here would be to pass the
> + * table pointer into the allocator and let that clear these
> + * flags
> + */
> + gfp_mask &= ~__GFP_WAIT;
> + gfp_mask |= __GFP_HIGH;
> + prv = sg;
> + } while (left);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(__sg_alloc_table);
> +
> +/**
> + * sg_alloc_table - Allocate and initialize an sg table
> + * @table: The sg table header to use
> + * @nents: Number of entries in sg list
> + * @gfp_mask: GFP allocation mask
> + *
> + * Description:
> + * Allocate and initialize an sg table. If @nents@ is larger than
> + * SG_MAX_SINGLE_ALLOC a chained sg table will be setup.
> + *
> + **/
> +int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
> +{
> + int ret;
> +
> + ret = __sg_alloc_table(table, nents, gfp_mask, sg_kmalloc);
> + if (unlikely(ret))
> + __sg_free_table(table, sg_kfree);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL(sg_alloc_table);

Boaz

2007-12-20 14:38:46

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH 2/3] SG: Convert SCSI to use scatterlist helpers for sg chaining

On Thu, Dec 20 2007 at 16:03 +0200, Boaz Harrosh <[email protected]> wrote:
> From: Jens Axboe <[email protected]>
>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> drivers/scsi/libsrp.c | 2 +-
> drivers/scsi/scsi_error.c | 4 +-
> drivers/scsi/scsi_lib.c | 150 +++++-------------------------------------
> drivers/usb/storage/isd200.c | 4 +-
> include/scsi/scsi_cmnd.h | 9 +--
> 5 files changed, 24 insertions(+), 145 deletions(-)
>
> diff --git a/drivers/scsi/libsrp.c b/drivers/scsi/libsrp.c
> index 8a8562a..b81350d 100644
> --- a/drivers/scsi/libsrp.c
> +++ b/drivers/scsi/libsrp.c
> @@ -427,7 +427,7 @@ int srp_cmd_queue(struct Scsi_Host *shost, struct srp_cmd *cmd, void *info,
> sc->SCp.ptr = info;
> memcpy(sc->cmnd, cmd->cdb, MAX_COMMAND_SIZE);
> sc->sdb.length = len;
> - sc->sdb.sglist = (void *) (unsigned long) addr;
> + sc->sdb.sgt.sgl = (void *) (unsigned long) addr;
> sc->tag = tag;
> err = scsi_tgt_queue_command(sc, itn_id, (struct scsi_lun *)&cmd->lun,
> cmd->tag);
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 5c8ba6a..1fd2a8c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -629,9 +629,9 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
> sizeof(scmd->sense_buffer), sense_bytes);
> sg_init_one(&ses->sense_sgl, scmd->sense_buffer,
> scmd->sdb.length);
> - scmd->sdb.sglist = &ses->sense_sgl;
> + scmd->sdb.sgt.sgl = &ses->sense_sgl;
> scmd->sc_data_direction = DMA_FROM_DEVICE;
> - scmd->sdb.sg_count = 1;
> + scmd->sdb.sgt.nents = 1;
> memset(scmd->cmnd, 0, sizeof(scmd->cmnd));
> scmd->cmnd[0] = REQUEST_SENSE;
> scmd->cmnd[4] = scmd->sdb.length;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index a6aae56..c7107f1 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -750,136 +750,15 @@ static inline unsigned int scsi_sgtable_index(unsigned short nents)
> return index;
> }
>
> -static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb,
> - unsigned short sg_count, gfp_t gfp_mask)
> +static struct scatterlist *scsi_sg_alloc(unsigned int nents, gfp_t gfp_mask)
> {
> - struct scsi_host_sg_pool *sgp;
> - struct scatterlist *sgl, *prev, *ret;
> - unsigned int index;
> - int this, left;
> -
> - left = sg_count;
> - ret = prev = NULL;
> - do {
> - this = left;
> - if (this > SCSI_MAX_SG_SEGMENTS) {
> - this = SCSI_MAX_SG_SEGMENTS - 1;
> - index = SG_MEMPOOL_NR - 1;
> - } else
> - index = scsi_sgtable_index(this);
> -
> - left -= this;
> -
> - sgp = scsi_sg_pools + index;
> -
> - sgl = mempool_alloc(sgp->pool, gfp_mask);
> - if (unlikely(!sgl))
> - goto enomem;
> -
> - sg_init_table(sgl, sgp->size);
> -
> - /*
> - * first loop through, set initial index and return value
> - */
> - if (!ret)
> - ret = sgl;
> -
> - /*
> - * chain previous sglist, if any. we know the previous
> - * sglist must be the biggest one, or we would not have
> - * ended up doing another loop.
> - */
> - if (prev)
> - sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
> -
> - /*
> - * if we have nothing left, mark the last segment as
> - * end-of-list
> - */
> - if (!left)
> - sg_mark_end(&sgl[this - 1]);
> -
> - /*
> - * don't allow subsequent mempool allocs to sleep, it would
> - * violate the mempool principle.
> - */
> - gfp_mask &= ~__GFP_WAIT;
> - gfp_mask |= __GFP_HIGH;
> - prev = sgl;
> - } while (left);
> -
> - /*
> - * ->use_sg may get modified after dma mapping has potentially
> - * shrunk the number of segments, so keep a copy of it for free.
> - */
> - sdb->alloc_sg_count = sdb->sg_count = sg_count;
> - sdb->sglist = ret;
> - return 0;
> -enomem:
> - if (ret) {
> - /*
> - * Free entries chained off ret. Since we were trying to
> - * allocate another sglist, we know that all entries are of
> - * the max size.
> - */
> - sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
> - prev = ret;
> - ret = &ret[SCSI_MAX_SG_SEGMENTS - 1];
> -
> - while ((sgl = sg_chain_ptr(ret)) != NULL) {
> - ret = &sgl[SCSI_MAX_SG_SEGMENTS - 1];
> - mempool_free(sgl, sgp->pool);
> - }
> -
> - mempool_free(prev, sgp->pool);
> - }
> - return -ENOMEM;
> + return mempool_alloc(scsi_sg_pools[scsi_sgtable_index(nents)].pool,
> + gfp_mask);
> }
>
> -static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
> +static void scsi_sg_free(struct scatterlist *sgl, unsigned int nents)
> {
> - struct scatterlist *sgl = sdb->sglist;
> - struct scsi_host_sg_pool *sgp;
> -
> - /*
> - * if this is the biggest size sglist, check if we have
> - * chained parts we need to free
> - */
> - if (sdb->alloc_sg_count > SCSI_MAX_SG_SEGMENTS) {
> - unsigned short this, left;
> - struct scatterlist *next;
> - unsigned int index;
> -
> - left = sdb->alloc_sg_count - (SCSI_MAX_SG_SEGMENTS - 1);
> - next = sg_chain_ptr(&sgl[SCSI_MAX_SG_SEGMENTS - 1]);
> - while (left && next) {
> - sgl = next;
> - this = left;
> - if (this > SCSI_MAX_SG_SEGMENTS) {
> - this = SCSI_MAX_SG_SEGMENTS - 1;
> - index = SG_MEMPOOL_NR - 1;
> - } else
> - index = scsi_sgtable_index(this);
> -
> - left -= this;
> -
> - sgp = scsi_sg_pools + index;
> -
> - if (left)
> - next = sg_chain_ptr(&sgl[sgp->size - 1]);
> -
> - mempool_free(sgl, sgp->pool);
> - }
> -
> - /*
> - * Restore original, will be freed below
> - */
> - sgl = sdb->sglist;
> - sgp = scsi_sg_pools + SG_MEMPOOL_NR - 1;
> - } else
> - sgp = scsi_sg_pools + scsi_sgtable_index(sdb->alloc_sg_count);
> -
> - mempool_free(sgl, sgp->pool);
> + mempool_free(sgl, scsi_sg_pools[scsi_sgtable_index(nents)].pool);
> }
>
> /*
> @@ -901,15 +780,15 @@ static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
> */
> void scsi_release_buffers(struct scsi_cmnd *cmd)
> {
> - if (cmd->sdb.sglist)
> - scsi_free_sgtable(&cmd->sdb);
> + if (cmd->sdb.sgt.sgl)
> + __sg_free_table(&cmd->sdb.sgt, scsi_sg_free);
>
> memset(&cmd->sdb, 0, sizeof(cmd->sdb));
>
> if (scsi_bidi_cmnd(cmd)) {
> struct scsi_data_buffer *bidi_sdb =
> cmd->request->next_rq->special;
> - scsi_free_sgtable(bidi_sdb);
> + __sg_free_table(&bidi_sdb->sgt, scsi_sg_free);
> kmem_cache_free(scsi_bidi_sdb_cache, bidi_sdb);
> cmd->request->next_rq->special = NULL;
> }
> @@ -1135,9 +1014,12 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
> static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
> gfp_t gfp_mask)
> {
> - int count;
> + int count, ret;
>
> - if (scsi_alloc_sgtable(sdb, req->nr_phys_segments, gfp_mask)) {
> + ret = __sg_alloc_table(&sdb->sgt, req->nr_phys_segments, gfp_mask,
> + scsi_sg_alloc);
> + if (unlikely(ret)) {
> + __sg_free_table(&sdb->sgt, scsi_sg_free);
> return BLKPREP_DEFER;
> }
>
> @@ -1151,9 +1033,9 @@ static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
> * Next, walk the list, and fill in the addresses and sizes of
> * each segment.
> */
> - count = blk_rq_map_sg(req->q, req, sdb->sglist);
> - BUG_ON(count > sdb->sg_count);
> - sdb->sg_count = count;
> + count = blk_rq_map_sg(req->q, req, sdb->sgt.sgl);
> + BUG_ON(count > sdb->sgt.nents);
> + sdb->sgt.nents = count;
> return BLKPREP_OK;
> }
>
> diff --git a/drivers/usb/storage/isd200.c b/drivers/usb/storage/isd200.c
> index 2d9a32b..0214ddb 100644
> --- a/drivers/usb/storage/isd200.c
> +++ b/drivers/usb/storage/isd200.c
> @@ -415,9 +415,9 @@ static void isd200_set_srb(struct isd200_info *info,
> sg_init_one(&info->sg, buff, bufflen);
>
> srb->sc_data_direction = dir;
> - srb->sdb.sglist = buff ? &info->sg : NULL;
> + srb->sdb.sgt.sgl = buff ? &info->sg : NULL;
> srb->sdb.length = bufflen;
> - srb->sdb.sg_count = buff ? 1 : 0;
> + srb->sdb.sgt.nents = buff ? 1 : 0;
> }
>
> static void isd200_srb_set_bufflen(struct scsi_cmnd *srb, unsigned bufflen)
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index cd2851a..9933bbe 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -8,16 +8,13 @@
> #include <linux/timer.h>
> #include <linux/scatterlist.h>
>
> -struct scatterlist;
> struct Scsi_Host;
> struct scsi_device;
>
> struct scsi_data_buffer {
> - struct scatterlist *sglist;
> + struct sg_table sgt;
> unsigned length;
> int resid;
> - unsigned short sg_count;
> - unsigned short alloc_sg_count; /* private to scsi_lib */
> };
>
> /* embedded in scsi_cmnd */
> @@ -136,12 +133,12 @@ extern void scsi_dma_unmap(struct scsi_cmnd *cmd);
>
> static inline unsigned scsi_sg_count(struct scsi_cmnd *cmd)
> {
> - return cmd->sdb.sg_count;
> + return cmd->sdb.sgt.nents;
> }
>
> static inline struct scatterlist *scsi_sglist(struct scsi_cmnd *cmd)
> {
> - return cmd->sdb.sglist;
> + return cmd->sdb.sgt.sgl;
> }
>
> static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)

The cmd->sdb.sgt.sgl thingy is a bit ugly, I would say. There is
not much we can do, unless Jens is willing to add the "length" and
"resid" members into his struct sg_table. I guess other drivers might
have use for them other than scsi. Jens could you do this for us?

Boaz

2007-12-21 00:35:29

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

On Friday 21 December 2007 11:00:27 FUJITA Tomonori wrote:
> We need to pass the whole sg entries to the IOMMUs at a time.

Hi Fujita,

OK, it's certainly possible to have an arch override. For which
architecture is this BTW?

Thanks,
Rusty.

2007-12-21 00:40:19

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

From: Rusty Russell <[email protected]>
Date: Fri, 21 Dec 2007 11:35:12 +1100

> On Friday 21 December 2007 11:00:27 FUJITA Tomonori wrote:
> > We need to pass the whole sg entries to the IOMMUs at a time.
>
> Hi Fujita,
>
> OK, it's certainly possible to have an arch override. For which
> architecture is this BTW?

SPARC64, POWERPC, maybe IA-64 etc.

Basically any platform that potentially does virtual
remamping and thus linearization.

I think it should always be provided, the new APIs give
less information to the implementation and that's a step
backwards.

2007-12-21 00:31:18

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

From: Rusty Russell <[email protected]>
Date: Fri, 21 Dec 2007 10:13:38 +1100

> But, as demonstrated, there are real benefits of having an explicit header:
>
> 1) It removes the chain-end/explicit count ambiguity (see
> http://lkml.org/lkml/2007/10/25/209 & thread)
> 2) It allows others to manipulate sg chains, which couldn't be done before
> (eg. the ATA code which wants to append a padding element).
> 3) I can now hand you an sg ring for you to fill: sg chains can't do that.
>
> In short, sg_ring is generally useful primitive. sg chains are a clever hack
> for scsi_lib to create, and everyone else to read.

I do not refute any of this :-)

2007-12-20 23:13:56

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Thursday 20 December 2007 18:58:07 David Miller wrote:
> From: Rusty Russell <[email protected]>
> Date: Thu, 20 Dec 2007 18:53:48 +1100
>
> > Manipulating the magic chains is horrible; it looks simple to the
> > places which simply want to iterate through it, but it's awful for
> > code which wants to create them.
>
> I'm not saying complexity is inherent in this stuff, but
> assuming that it is the complexity should live as far away
> from the minions (the iterators in this case). Therefore,
> the creators is the right spot for the hard stuff.

In this case, the main benefit of the sg chaining was that the conversion of
most scsi drivers was easy (basically sg++ -> sg = sg_next(sg)). The
conversion to sg_ring is more complex, but the end result is not
significantly more complex.

However, the cost to code which manipulates sg chains was significant: I tried
using them in virtio and it was too ugly to live (so that doesn't support sg
chaining). If this was the best we could do, that'd be fine.

But, as demonstrated, there are real benefits of having an explicit header:

1) It removes the chain-end/explicit count ambiguity (see
http://lkml.org/lkml/2007/10/25/209 & thread)
2) It allows others to manipulate sg chains, which couldn't be done before
(eg. the ATA code which wants to append a padding element).
3) I can now hand you an sg ring for you to fill: sg chains can't do that.

In short, sg_ring is generally useful primitive. sg chains are a clever hack
for scsi_lib to create, and everyone else to read.

Hope that clarifies,
Rusty.

2007-12-21 00:01:55

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

On Fri, 21 Dec 2007 09:58:56 +1100
Rusty Russell <[email protected]> wrote:

> On Thursday 20 December 2007 18:42:44 David Miller wrote:
> > From: FUJITA Tomonori <[email protected]>
> > Date: Thu, 20 Dec 2007 16:06:31 +0900
> >
> > > On Thu, 20 Dec 2007 16:49:30 +1100
> > >
> > > Rusty Russell <[email protected]> wrote:
> > > > +/**
> > > > + * dma_map_sg_ring - Map an entire sg ring
> > > > + * @dev: Device to free noncoherent memory for
> > > > + * @sg: The sg_ring
> > > > + * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
> > > > + *
> > > > + * This returns -ENOMEM if mapping fails. It's not clear that telling
> > > > you + * it failed is useful though.
> > > > + */
> > > > +int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
> > > > + enum dma_data_direction direction)
> > > > +{
> > > > + struct sg_ring *i;
> > > > + unsigned int num;
> > > > +
> > > > + for (i = sg; i; i = sg_ring_next(i, sg)) {
> > > > + BUG_ON(i->num > i->max);
> > > > + num = dma_map_sg(dev, i->sg, i->num, direction);
> > > > + if (num == 0 && i->num != 0)
> > > > + goto unmap;
> > > > + }
> > > > + return 0;
> > >
> > > I don't think that this works for IOMMUs that could merge sg entries.
> >
> > Right, it won't work at all.
> >
> > The caller has to be told how many DMA entries it really did use to
> > compose the mapping, and there has to be a way to properly iterate
> > over them. The assumption that the IOMMU will map the SG entries
> > 1-to-1 is invalid.
>
> Good catch. Indeed, what's missing is one line:
>
> i->num = num;
>
> Of course, an arch-specific version of this could merge between sg_rings,
> too, but that's an optimization.

We need to pass the whole sg entries to the IOMMUs at a time.

2007-12-20 22:59:23

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

On Thursday 20 December 2007 18:42:44 David Miller wrote:
> From: FUJITA Tomonori <[email protected]>
> Date: Thu, 20 Dec 2007 16:06:31 +0900
>
> > On Thu, 20 Dec 2007 16:49:30 +1100
> >
> > Rusty Russell <[email protected]> wrote:
> > > +/**
> > > + * dma_map_sg_ring - Map an entire sg ring
> > > + * @dev: Device to free noncoherent memory for
> > > + * @sg: The sg_ring
> > > + * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
> > > + *
> > > + * This returns -ENOMEM if mapping fails. It's not clear that telling
> > > you + * it failed is useful though.
> > > + */
> > > +int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
> > > + enum dma_data_direction direction)
> > > +{
> > > + struct sg_ring *i;
> > > + unsigned int num;
> > > +
> > > + for (i = sg; i; i = sg_ring_next(i, sg)) {
> > > + BUG_ON(i->num > i->max);
> > > + num = dma_map_sg(dev, i->sg, i->num, direction);
> > > + if (num == 0 && i->num != 0)
> > > + goto unmap;
> > > + }
> > > + return 0;
> >
> > I don't think that this works for IOMMUs that could merge sg entries.
>
> Right, it won't work at all.
>
> The caller has to be told how many DMA entries it really did use to
> compose the mapping, and there has to be a way to properly iterate
> over them. The assumption that the IOMMU will map the SG entries
> 1-to-1 is invalid.

Good catch. Indeed, what's missing is one line:

i->num = num;

Of course, an arch-specific version of this could merge between sg_rings,
too, but that's an optimization.

Thanks,
Rusty.

dma_map_sg_ring() helper

Obvious counterpart to dma_map_sg. Note that this is arch-independent
code; sg_rings are backwards compatible with simple sg arrays.

Signed-off-by: Rusty Russell <[email protected]>
---
drivers/base/dma-mapping.c | 13 +++++++++++++
include/linux/dma-mapping.h | 4 ++++
2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -8,6 +8,7 @@
*/

#include <linux/dma-mapping.h>
+#include <linux/sg_ring.h>

/*
* Managed DMA API
@@ -162,6 +163,60 @@ void dmam_free_noncoherent(struct device
}
EXPORT_SYMBOL(dmam_free_noncoherent);

+/**
+ * dma_map_sg_ring - Map an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * This returns -ENOMEM if mapping fails. It's not clear that telling you
+ * it failed is useful though.
+ */
+int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction)
+{
+ struct sg_ring *i;
+ unsigned int num;
+
+ for (i = sg; i; i = sg_ring_next(i, sg)) {
+ BUG_ON(i->num > i->max);
+ num = dma_map_sg(dev, i->sg, i->num, direction);
+ if (num == 0 && i->num != 0)
+ goto unmap;
+ i->num = num;
+ }
+ return 0;
+
+unmap:
+ while (sg) {
+ dma_unmap_sg(dev, sg->sg, sg->num, direction);
+ sg = sg_ring_next(sg, i);
+ }
+ return -ENOMEM;
+
+}
+EXPORT_SYMBOL(dma_map_sg_ring);
+
+/**
+ * dma_unmap_sg_ring - Unmap an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * Call after dma_map_sg_ring() succeeds.
+ */
+void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction)
+{
+ struct sg_ring *i;
+
+ for (i = sg; i; i = sg_ring_next(i, sg)) {
+ BUG_ON(i->num > i->max);
+ dma_unmap_sg(dev, i->sg, i->num, direction);
+ }
+}
+EXPORT_SYMBOL(dma_unmap_sg_ring);
+
#ifdef ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY

static void dmam_coherent_decl_release(struct device *dev, void *res)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -87,6 +87,12 @@ dma_mark_declared_memory_occupied(struct
}
#endif

+struct sg_ring;
+extern int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+extern void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+
/*
* Managed DMA API
*/

2007-12-21 02:23:38

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

On Thu, 20 Dec 2007 16:40:00 -0800 (PST)
David Miller <[email protected]> wrote:

> From: Rusty Russell <[email protected]>
> Date: Fri, 21 Dec 2007 11:35:12 +1100
>
> > On Friday 21 December 2007 11:00:27 FUJITA Tomonori wrote:
> > > We need to pass the whole sg entries to the IOMMUs at a time.
> >
> > Hi Fujita,
> >
> > OK, it's certainly possible to have an arch override. For which
> > architecture is this BTW?
>
> SPARC64, POWERPC, maybe IA-64 etc.

And x86_64, Alpha, and PARISC.


> Basically any platform that potentially does virtual
> remamping and thus linearization.
>
> I think it should always be provided, the new APIs give
> less information to the implementation and that's a step
> backwards.

Agreed.

2007-12-21 02:29:21

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Fri, 21 Dec 2007 10:13:38 +1100
Rusty Russell <[email protected]> wrote:

> On Thursday 20 December 2007 18:58:07 David Miller wrote:
> > From: Rusty Russell <[email protected]>
> > Date: Thu, 20 Dec 2007 18:53:48 +1100
> >
> > > Manipulating the magic chains is horrible; it looks simple to the
> > > places which simply want to iterate through it, but it's awful for
> > > code which wants to create them.
> >
> > I'm not saying complexity is inherent in this stuff, but
> > assuming that it is the complexity should live as far away
> > from the minions (the iterators in this case). Therefore,
> > the creators is the right spot for the hard stuff.
>
> In this case, the main benefit of the sg chaining was that the conversion of
> most scsi drivers was easy (basically sg++ -> sg = sg_next(sg)). The
> conversion to sg_ring is more complex, but the end result is not
> significantly more complex.
>
> However, the cost to code which manipulates sg chains was significant: I tried
> using them in virtio and it was too ugly to live (so that doesn't support sg
> chaining). If this was the best we could do, that'd be fine.
>
> But, as demonstrated, there are real benefits of having an explicit header:

I'm not sure about chaining the headers (as your sg_ring and
scsi_sgtable do) would simplify LLDs. Have you looked at ips or
qla1280?

2007-12-21 02:47:47

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 2/5] dma_map_sg_ring() helper

On Friday 21 December 2007 11:40:00 David Miller wrote:
> From: Rusty Russell <[email protected]>
> Date: Fri, 21 Dec 2007 11:35:12 +1100
>
> > On Friday 21 December 2007 11:00:27 FUJITA Tomonori wrote:
> > > We need to pass the whole sg entries to the IOMMUs at a time.
> >
> > Hi Fujita,
> >
> > OK, it's certainly possible to have an arch override. For which
> > architecture is this BTW?
>
> SPARC64, POWERPC, maybe IA-64 etc.
>
> Basically any platform that potentially does virtual
> remamping and thus linearization.

Fujita said "need" which confused me. I already said it should be handed
down as an optimization; I was curious what I had broken :)

> I think it should always be provided, the new APIs give
> less information to the implementation and that's a step
> backwards.

Absolutely. In fact, I think the sg_ring header would be made safer if it
had the "dma_num" in it as well: it's more explicit and less surprising to
the caller than mangling sg->num.

How are these two patches then?

===
Introduce sg_ring: a ring of scatterlist arrays.

This patch introduces 'struct sg_ring', a layer on top of scatterlist
arrays. It meshes nicely with routines which expect a simple array of
'struct scatterlist' because it is easy to break down the ring into
its constituent arrays.

The sg_ring header also encodes the maximum number of entries, useful
for routines which populate an sg. We need never hand around a number
of elements any more.

Signed-off-by: Rusty Russell <[email protected]>
---
include/linux/sg_ring.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 74 insertions(+), 0 deletions(-)
create mode 100644 include/linux/sgring.h

diff --git a/include/linux/sg_ring.h b/include/linux/sg_ring.h
new file mode 100644
--- /dev/null
+++ b/include/linux/sg_ring.h
@@ -0,0 +1,128 @@
+#ifndef _LINUX_SG_RING_H
+#define _LINUX_SG_RING_H
+#include <linux/scatterlist.h>
+
+/**
+ * struct sg_ring - a ring of scatterlists
+ * @list: the list_head chaining them together
+ * @num: the number of valid sg entries
+ * @dma_num: the number of valid sg entries after dma mapping
+ * @max: the maximum number of sg entries (size of the sg array).
+ * @sg: the array of scatterlist entries.
+ *
+ * This provides a convenient encapsulation of one or more scatter gather
+ * arrays. dma_map_sg_ring() (and friends) set @dma_num: some architectures
+ * coalesce sg entries, to this will be < num.
+ */
+struct sg_ring
+{
+ struct list_head list;
+ unsigned int num, dma_num, max;
+ struct scatterlist sg[0];
+};
+
+/* This helper declares an sg ring on the stack or in a struct. */
+#define DECLARE_SG_RING(name, max) \
+ struct { \
+ struct sg_ring ring; \
+ struct scatterlist sg[max]; \
+ } name
+
+/**
+ * sg_ring_init - initialize a scatterlist ring.
+ * @sg: the sg_ring.
+ * @max: the size of the trailing sg array.
+ *
+ * After initialization sg is alone in the ring.
+ */
+static inline void sg_ring_init(struct sg_ring *sg, unsigned int max)
+{
+#ifdef CONFIG_DEBUG_SG
+ unsigned int i;
+ for (i = 0; i < max; i++)
+ sg->sg[i].sg_magic = SG_MAGIC;
+ sg->num = 0xFFFFFFFF;
+ sg->dma_num = 0xFFFFFFFF;
+#endif
+ INIT_LIST_HEAD(&sg->list);
+ sg->max = max;
+ /* FIXME: This is to clear the page bits. */
+ sg_init_table(sg->sg, sg->max);
+}
+
+/**
+ * sg_ring_single - initialize a one-element scatterlist ring.
+ * @sg: the sg_ring.
+ * @buf: the pointer to the buffer.
+ * @buflen: the length of the buffer.
+ *
+ * Does sg_ring_init and also sets up first (and only) sg element.
+ */
+static inline void sg_ring_single(struct sg_ring *sg,
+ const void *buf,
+ unsigned int buflen)
+{
+ sg_ring_init(sg, 1);
+ sg->num = 1;
+ sg_init_one(&sg->sg[0], buf, buflen);
+}
+
+/**
+ * sg_ring_next - next array in a scatterlist ring.
+ * @sg: the sg_ring.
+ * @head: the sg_ring head.
+ *
+ * This will return NULL once @sg has looped back around to @head.
+ */
+static inline struct sg_ring *sg_ring_next(const struct sg_ring *sg,
+ const struct sg_ring *head)
+{
+ sg = list_first_entry(&sg->list, struct sg_ring, list);
+ if (sg == head)
+ sg = NULL;
+ return (struct sg_ring *)sg;
+}
+
+/* Helper for writing for loops. */
+static inline struct sg_ring *sg_ring_iter(const struct sg_ring *head,
+ const struct sg_ring *sg,
+ unsigned int *i)
+{
+ (*i)++;
+ /* While loop lets us skip any zero-entry sg_ring arrays */
+ while (*i == sg->num) {
+ *i = 0;
+ sg = sg_ring_next(sg, head);
+ if (!sg)
+ break;
+ }
+ return (struct sg_ring *)sg;
+}
+
+/**
+ * sg_ring_for_each - iterate through an entire sg_ring ring
+ * @head: the head of the sg_ring.
+ * @sg: the sg_ring iterator.
+ * @i: an (unsigned) integer which refers to sg->sg[i].
+ *
+ * The current scatterlist element is sg->sg[i].
+ */
+#define sg_ring_for_each(head, sg, i) \
+ for (sg = head, i = 0; sg; sg = sg_ring_iter(head, sg, &i))
+
+/**
+ * sg_ring_num - how many struct scatterlists are used in this sg_ring.
+ * @head: the sg_ring
+ *
+ * Simple helper function to add up the number of scatterlists.
+ */
+static inline unsigned sg_ring_num(const struct sg_ring *head)
+{
+ unsigned int num = 0, i;
+ const struct sg_ring *sg;
+
+ sg_ring_for_each(head, sg, i)
+ num += sg->num;
+ return num;
+}
+#endif /* _LINUX_SG_RING_H */


dma_map_sg_ring() helper

Obvious counterpart to dma_map_sg. Note that this is arch-independent
code; sg_rings are backwards compatible with simple sg arrays.

Signed-off-by: Rusty Russell <[email protected]>
---
drivers/base/dma-mapping.c | 13 +++++++++++++
include/linux/dma-mapping.h | 4 ++++
2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -8,6 +8,7 @@
*/

#include <linux/dma-mapping.h>
+#include <linux/sg_ring.h>

/*
* Managed DMA API
@@ -162,6 +163,60 @@ void dmam_free_noncoherent(struct device
}
EXPORT_SYMBOL(dmam_free_noncoherent);

+#ifndef ARCH_HAS_DMA_MAP_SG_RING
+/**
+ * dma_map_sg_ring - Map an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * This returns -ENOMEM if mapping fails. It's not clear that telling you
+ * it failed is useful though.
+ */
+int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction)
+{
+ struct sg_ring *i;
+
+ for (i = sg; i; i = sg_ring_next(i, sg)) {
+ BUG_ON(i->num > i->max);
+ i->dma_num = dma_map_sg(dev, i->sg, i->num, direction);
+ if (i->dma_num == 0 && i->num != 0)
+ goto unmap;
+ }
+ return 0;
+
+unmap:
+ while (sg) {
+ dma_unmap_sg(dev, sg->sg, sg->dma_num, direction);
+ sg = sg_ring_next(sg, i);
+ }
+ return -ENOMEM;
+
+}
+EXPORT_SYMBOL(dma_map_sg_ring);
+
+/**
+ * dma_unmap_sg_ring - Unmap an entire sg ring
+ * @dev: Device to free noncoherent memory for
+ * @sg: The sg_ring
+ * @direction: DMA_TO_DEVICE, DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
+ *
+ * Call after dma_map_sg_ring() succeeds.
+ */
+void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction)
+{
+ struct sg_ring *i;
+
+ for (i = sg; i; i = sg_ring_next(i, sg)) {
+ BUG_ON(i->dma_num > i->max);
+ dma_unmap_sg(dev, i->sg, i->dma_num, direction);
+ }
+}
+EXPORT_SYMBOL(dma_unmap_sg_ring);
+#endif /* !ARCH_HAS_DMA_MAP_SG_RING */
+
#ifdef ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY

static void dmam_coherent_decl_release(struct device *dev, void *res)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -87,6 +87,14 @@ dma_mark_declared_memory_occupied(struct
}
#endif

+#ifndef ARCH_HAS_DMA_MAP_SG_RING
+struct sg_ring;
+extern int dma_map_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+extern void dma_unmap_sg_ring(struct device *dev, struct sg_ring *sg,
+ enum dma_data_direction direction);
+#endif
+
/*
* Managed DMA API
*/

2007-12-21 03:27:00

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Friday 21 December 2007 13:28:34 FUJITA Tomonori wrote:
> I'm not sure about chaining the headers (as your sg_ring and
> scsi_sgtable do) would simplify LLDs. Have you looked at ips or
> qla1280?

Not yet, am working my way through the drivers, but I don't expect it will be
a simplification to the normal SCSI LLDs. Most of them are mere consumers of
sgs...

I'm not a SCSI person: I'm patching SCSI because I have to to get my own
sg-using code clean :)

Hope that clarifies,
Rusty.

2007-12-21 04:39:20

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Fri, 21 Dec 2007 14:26:47 +1100
Rusty Russell <[email protected]> wrote:

> On Friday 21 December 2007 13:28:34 FUJITA Tomonori wrote:
> > I'm not sure about chaining the headers (as your sg_ring and
> > scsi_sgtable do) would simplify LLDs. Have you looked at ips or
> > qla1280?
>
> Not yet, am working my way through the drivers, but I don't expect it will be
> a simplification to the normal SCSI LLDs. Most of them are mere consumers of
> sgs...

Some scsi drivers like ips access to sglist in a tricky way. I feel
that they don't work with the sg_ring interface well. So if you
convert scsi_lib.c to use sg_ring, please see how it works with the
tricky drivers before that.


> I'm not a SCSI person: I'm patching SCSI because I have to to get my
> own sg-using code clean :)

I'm SCSI-biased. If you don't convert scsi to use sg_ring, I don't
complain. :) Though it would be better to have only one mechanism to
handle large sglist in kernel.

2007-12-21 12:14:17

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

Rusty Russell <[email protected]> wrote:
>
> 2) It allows others to manipulate sg chains, which couldn't be done before
> (eg. the ATA code which wants to append a padding element).

Yes chaining at the end is very useful for the crypto layer as
well.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-12-21 12:16:20

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH 1/3] SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers

Boaz Harrosh <[email protected]> wrote:
>
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 416e000..c3ca848 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -7,6 +7,12 @@
> #include <linux/string.h>
> #include <asm/io.h>
>
> +struct sg_table {
> + struct scatterlist *sgl; /* the list */
> + unsigned int nents; /* number of mapped entries */
> + unsigned int orig_nents; /* original size of list */
> +};

If we're making massive changes like this, let's do it properly
as Rusty has demonstrated so that we support back-chaining as
well as frong-chaining.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2007-12-26 00:28:11

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Friday 21 December 2007 15:37:46 FUJITA Tomonori wrote:
> Some scsi drivers like ips access to sglist in a tricky way.

Indeed. I fail to see how this code works, in fact:

drivers/scsi/ips.c:ips_queue() line 1101

if (ips_is_passthru(SC)) {

ips_copp_wait_item_t *scratch;

/* A Reset IOCTL is only sent by the boot CD in extreme cases. */
/* There can never be any system activity ( network or disk ), but check */
/* anyway just as a good practice. */
pt = (ips_passthru_t *) scsi_sglist(SC);
if ((pt->CoppCP.cmd.reset.op_code == IPS_CMD_RESET_CHANNEL) &&
(pt->CoppCP.cmd.reset.adapter_flag == 1)) {

Casting the scsi_sglist(SC) (which ips_is_passthrough treated as a
scatterlist) to an ips_passthrough_t seems completely bogus. It looks like
it wants to access the region mapped by the scatterlist.

There are many signs through the code that it needs a great deal of work:
what is the purpose of sg_break? Why does the code check if kmap_atomic
fails?

===
Convert the ips SCSI driver to sg_ring.

Slightly non-trivial conversion, will need testing.

The first issue is the standard one with "scsi_for_each_sg"
conversion: scsi_for_each_sg will always start i at 0 and increment it
each time around; "sg_ring_for_each" will start i at 0 but it will
return to zero for each new sg_ring entry; you need a separate counter
if you really want a simple increment.

Various flaws in the driver have been maintained.

Signed-off-by: Rusty Russell <[email protected]>

diff -r 63176a8a6ce3 drivers/scsi/ips.c
--- a/drivers/scsi/ips.c Mon Dec 24 17:40:08 2007 +1100
+++ b/drivers/scsi/ips.c Wed Dec 26 11:20:52 2007 +1100
@@ -1105,7 +1105,7 @@ static int ips_queue(struct scsi_cmnd *S
/* A Reset IOCTL is only sent by the boot CD in extreme cases. */
/* There can never be any system activity ( network or disk ), but check */
/* anyway just as a good practice. */
- pt = (ips_passthru_t *) scsi_sglist(SC);
+ pt = (ips_passthru_t *) SC->sg;
if ((pt->CoppCP.cmd.reset.op_code == IPS_CMD_RESET_CHANNEL) &&
(pt->CoppCP.cmd.reset.adapter_flag == 1)) {
if (ha->scb_activelist.count != 0) {
@@ -1508,8 +1508,8 @@ static int ips_is_passthru(struct scsi_c
if ((SC->cmnd[0] == IPS_IOCTL_COMMAND) &&
(SC->device->channel == 0) &&
(SC->device->id == IPS_ADAPTER_ID) &&
- (SC->device->lun == 0) && scsi_sglist(SC)) {
- struct scatterlist *sg = scsi_sglist(SC);
+ (SC->device->lun == 0) && SC->sg) {
+ struct scatterlist *sg = &SC->sg->sg[0];
char *buffer;

/* kmap_atomic() ensures addressability of the user buffer.*/
@@ -1575,12 +1575,12 @@ ips_make_passthru(ips_ha_t *ha, struct s
ips_passthru_t *pt;
int length = 0;
int i, ret;
- struct scatterlist *sg = scsi_sglist(SC);
+ struct sg_ring *sg;

METHOD_TRACE("ips_make_passthru", 1);

- scsi_for_each_sg(SC, sg, scsi_sg_count(SC), i)
- length += sg[i].length;
+ sg_ring_for_each(SC->sg, sg, i)
+ length += sg->sg[i].length;

if (length < sizeof (ips_passthru_t)) {
/* wrong size */
@@ -2005,7 +2005,7 @@ ips_cleanup_passthru(ips_ha_t * ha, ips_

METHOD_TRACE("ips_cleanup_passthru", 1);

- if ((!scb) || (!scb->scsi_cmd) || (!scsi_sglist(scb->scsi_cmd))) {
+ if ((!scb) || (!scb->scsi_cmd) || (!scb->scsi_cmd->sg)) {
DEBUG_VAR(1, "(%s%d) couldn't cleanup after passthru",
ips_name, ha->host_num);

@@ -2758,16 +2758,18 @@ ips_next(ips_ha_t * ha, int intr)
scb->sg_count = scsi_dma_map(SC);
BUG_ON(scb->sg_count < 0);
if (scb->sg_count) {
- struct scatterlist *sg;
- int i;
+ struct sg_ring *sg;
+ int i, idx;

scb->flags |= IPS_SCB_MAP_SG;

- scsi_for_each_sg(SC, sg, scb->sg_count, i) {
+ idx = 0;
+ sg_ring_for_each(SC->sg, sg, i) {
if (ips_fill_scb_sg_single
- (ha, sg_dma_address(sg), scb, i,
- sg_dma_len(sg)) < 0)
- break;
+ (ha, sg_dma_address(&sg->sg[i]), scb, idx,
+ sg_dma_len(&sg->sg[i])) < 0)
+ break;
+ idx++;
}
scb->dcdb.transfer_length = scb->data_len;
} else {
@@ -3251,34 +3253,35 @@ ips_done(ips_ha_t * ha, ips_scb_t * scb)
* the rest of the data and continue.
*/
if ((scb->breakup) || (scb->sg_break)) {
- struct scatterlist *sg;
+ struct sg_ring *sg;
int i, sg_dma_index, ips_sg_index = 0;

/* we had a data breakup */
scb->data_len = 0;

- sg = scsi_sglist(scb->scsi_cmd);
-
/* Spin forward to last dma chunk */
- sg_dma_index = scb->breakup;
- for (i = 0; i < scb->breakup; i++)
- sg = sg_next(sg);
-
+ sg_dma_index = 0;
+ sg_ring_for_each(scb->scsi_cmd->sg, sg, i) {
+ if (sg_dma_index == scb->breakup)
+ break;
+ sg_dma_index++;
+ }
+
/* Take care of possible partial on last chunk */
ips_fill_scb_sg_single(ha,
- sg_dma_address(sg),
+ sg_dma_address(&sg->sg[i]),
scb, ips_sg_index++,
- sg_dma_len(sg));
-
- for (; sg_dma_index < scsi_sg_count(scb->scsi_cmd);
- sg_dma_index++, sg = sg_next(sg)) {
+ sg_dma_len(&sg->sg[i]));
+
+ do {
if (ips_fill_scb_sg_single
(ha,
- sg_dma_address(sg),
+ sg_dma_address(&sg->sg[i]),
scb, ips_sg_index++,
- sg_dma_len(sg)) < 0)
- break;
- }
+ sg_dma_len(&sg->sg[i])) < 0)
+ break;
+ } while ((sg = sg_ring_iter(scb->scsi_cmd->sg, sg, &i))
+ != NULL);

scb->dcdb.transfer_length = scb->data_len;
scb->dcdb.cmd_attribute |=
@@ -3510,26 +3513,28 @@ ips_scmd_buf_write(struct scsi_cmnd *scm
ips_scmd_buf_write(struct scsi_cmnd *scmd, void *data, unsigned int count)
{
int i;
- unsigned int min_cnt, xfer_cnt;
+ unsigned int min_cnt, xfer_cnt = 0;
char *cdata = (char *) data;
unsigned char *buffer;
unsigned long flags;
- struct scatterlist *sg = scsi_sglist(scmd);
-
- for (i = 0, xfer_cnt = 0;
- (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
- min_cnt = min(count - xfer_cnt, sg[i].length);
+ struct sg_ring *sg;
+
+ sg_ring_for_each(scmd->sg, sg, i) {
+ min_cnt = min(count - xfer_cnt, sg->sg[i].length);

/* kmap_atomic() ensures addressability of the data buffer.*/
/* local_irq_save() protects the KM_IRQ0 address slot. */
local_irq_save(flags);
- buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset;
+ buffer = kmap_atomic(sg_page(&sg->sg[i]), KM_IRQ0)
+ + sg->sg[i].offset;
memcpy(buffer, &cdata[xfer_cnt], min_cnt);
- kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
+ kunmap_atomic(buffer - sg->sg[i].offset, KM_IRQ0);
local_irq_restore(flags);

xfer_cnt += min_cnt;
- }
+ if (xfer_cnt == count)
+ break;
+ }
}

/****************************************************************************/
@@ -3543,26 +3548,28 @@ ips_scmd_buf_read(struct scsi_cmnd *scmd
ips_scmd_buf_read(struct scsi_cmnd *scmd, void *data, unsigned int count)
{
int i;
- unsigned int min_cnt, xfer_cnt;
+ unsigned int min_cnt, xfer_cnt = 0;
char *cdata = (char *) data;
unsigned char *buffer;
unsigned long flags;
- struct scatterlist *sg = scsi_sglist(scmd);
-
- for (i = 0, xfer_cnt = 0;
- (i < scsi_sg_count(scmd)) && (xfer_cnt < count); i++) {
- min_cnt = min(count - xfer_cnt, sg[i].length);
+ struct sg_ring *sg;
+
+ sg_ring_for_each(scmd->sg, sg, i) {
+ min_cnt = min(count - xfer_cnt, sg->sg[i].length);

/* kmap_atomic() ensures addressability of the data buffer.*/
/* local_irq_save() protects the KM_IRQ0 address slot. */
local_irq_save(flags);
- buffer = kmap_atomic(sg_page(&sg[i]), KM_IRQ0) + sg[i].offset;
+ buffer = kmap_atomic(sg_page(&sg->sg[i]), KM_IRQ0)
+ + sg->sg[i].offset;
memcpy(&cdata[xfer_cnt], buffer, min_cnt);
- kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
+ kunmap_atomic(buffer - sg->sg[i].offset, KM_IRQ0);
local_irq_restore(flags);

xfer_cnt += min_cnt;
- }
+ if (xfer_cnt == count)
+ break;
+ }
}

/****************************************************************************/
@@ -4196,7 +4203,7 @@ ips_rdcap(ips_ha_t * ha, ips_scb_t * scb

METHOD_TRACE("ips_rdcap", 1);

- if (scsi_bufflen(scb->scsi_cmd) < 8)
+ if (scb->scsi_cmd->request_bufflen < 8)
return (0);

cap.lba =

2007-12-27 02:11:20

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Wed, 26 Dec 2007 11:27:40 +1100
Rusty Russell <[email protected]> wrote:

> On Friday 21 December 2007 15:37:46 FUJITA Tomonori wrote:
> > Some scsi drivers like ips access to sglist in a tricky way.
>
> Indeed. I fail to see how this code works, in fact:
>
> drivers/scsi/ips.c:ips_queue() line 1101
>
> if (ips_is_passthru(SC)) {
>
> ips_copp_wait_item_t *scratch;
>
> /* A Reset IOCTL is only sent by the boot CD in extreme cases. */
> /* There can never be any system activity ( network or disk ), but check */
> /* anyway just as a good practice. */
> pt = (ips_passthru_t *) scsi_sglist(SC);
> if ((pt->CoppCP.cmd.reset.op_code == IPS_CMD_RESET_CHANNEL) &&
> (pt->CoppCP.cmd.reset.adapter_flag == 1)) {
>
> Casting the scsi_sglist(SC) (which ips_is_passthrough treated as a
> scatterlist) to an ips_passthrough_t seems completely bogus. It looks like
> it wants to access the region mapped by the scatterlist.

Yeah, it seems to be broken.

> There are many signs through the code that it needs a great deal of
> work: what is the purpose of sg_break? Why does the code check if kmap_atomic
> fails?

sg_break is a workaround for the hardware restrictions, I think.


> ===
> Convert the ips SCSI driver to sg_ring.
>
> Slightly non-trivial conversion, will need testing.

As I said, I don't think that converting SCSI drivers to sg_ring (with
lots of non-trivial work) provides any benefits. Surely, sg_ring
enables us to modify sg lists but SCSI drivers don't need the
feature. What SCSI drivers needs is just a efficient way to get the
next sg entry (they use 'sg++' in the past and sg_next now).

2007-12-27 11:53:11

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 0/5] sg_ring for scsi

On Thursday 27 December 2007 13:09:27 FUJITA Tomonori wrote:
> On Wed, 26 Dec 2007 11:27:40 +1100
>
> Rusty Russell <[email protected]> wrote:
> > There are many signs through the code that it needs a great deal of
> > work: what is the purpose of sg_break? Why does the code check if
> > kmap_atomic fails?
>
> sg_break is a workaround for the hardware restrictions, I think.

Well, scb->breakup does that, can't see what scb->sg_break is for... May have
to wade through git history to understand it.

> > ===
> > Convert the ips SCSI driver to sg_ring.
> >
> > Slightly non-trivial conversion, will need testing.
>
> As I said, I don't think that converting SCSI drivers to sg_ring (with
> lots of non-trivial work) provides any benefits. Surely, sg_ring
> enables us to modify sg lists but SCSI drivers don't need the
> feature. What SCSI drivers needs is just a efficient way to get the
> next sg entry (they use 'sg++' in the past and sg_next now).

Sure, iteration over a two-level structure is a little more tricky, but it's
pretty trivial for most drivers.

Indeed, I think that it's probably better to have an explicitly different
interface for sg_ring, and keep the current interface for small sg arrays.
That would provide a nicer changeover.

I'll prototype something and see what it's like...
Rusty.