2007-10-24 10:50:30

by FUJITA Tomonori

Subject: [PATCH -mm 0/11] fix iommu sg merging problem

IOMMUs merge scatter/gather segments without considering a low-level
driver's restrictions. The problem is that IOMMUs can't access those
limitations because they live in the request_queue.

This patchset introduces a new structure, device_dma_parameters,
containing DMA information. A pointer to device_dma_parameters is added
to struct device. The bus-specific structures (like pci_dev) include
device_dma_parameters. Low-level drivers can use dma_set_max_seg_size
to tell IOMMUs about the restrictions.

We can move more dma stuff in struct device (like dma_mask) to struct
device_dma_parameters later (needs some cleanups before that).
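
As a rough illustration of the intended flow (a sketch only, not code
from this series; my_probe() and iommu_merge_ok() are made-up names,
while pci_set_dma_max_seg_size and dma_get_max_seg_size are the
accessors added in patches 1 and 2):

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* LLD side: declare the hardware restriction once, at probe time. */
static int my_probe(struct pci_dev *pdev)
{
	/* example limit; a real driver would use its hardware's value */
	return pci_set_dma_max_seg_size(pdev, 65536 - 512);
}

/* IOMMU side: consult the restriction before merging two sg entries. */
static int iommu_merge_ok(struct device *dev, unsigned int cur_len,
			  unsigned int next_len)
{
	return cur_len + next_len <= dma_get_max_seg_size(dev);
}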

This includes patches for all the IOMMUs that could merge sg (x86_64,
ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
tested. The patches for other IOMMUs are only compile tested.

Thanks to everyone for the comments on the previous submission
to linux-scsi.

This is against 2.6.24-rc1. The same patchset is also available:

git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git iommu-sg-fixes


2007-10-24 10:48:38

by FUJITA Tomonori

Subject: [PATCH -mm 05/11] IA64: make sba_iommu respect the segment size limits


This patch makes sba iommu respect segment size limits when merging sg
lists.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/ia64/hp/common/sba_iommu.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index bc859a3..82bf7fa 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1265,7 +1265,7 @@ sba_fill_pdir(
* the sglist do both.
*/
static SBA_INLINE int
-sba_coalesce_chunks( struct ioc *ioc,
+sba_coalesce_chunks(struct ioc *ioc, struct device *dev,
struct scatterlist *startsg,
int nents)
{
@@ -1275,6 +1275,7 @@ sba_coalesce_chunks( struct ioc *ioc,
struct scatterlist *dma_sg; /* next DMA stream head */
unsigned long dma_offset, dma_len; /* start/len of DMA stream */
int n_mappings = 0;
+ unsigned int max_seg_size = dma_get_max_seg_size(dev);

while (nents > 0) {
unsigned long vaddr = (unsigned long) sba_sg_address(startsg);
@@ -1314,6 +1315,9 @@ sba_coalesce_chunks( struct ioc *ioc,
> DMA_CHUNK_SIZE)
break;

+ if (dma_len + startsg->length > max_seg_size)
+ break;
+
/*
** Then look for virtually contiguous blocks.
**
@@ -1441,7 +1445,7 @@ int sba_map_sg(struct device *dev, struct scatterlist *sglist, int nents, int di
** w/o this association, we wouldn't have coherent DMA!
** Access to the virtual address is what forces a two pass algorithm.
*/
- coalesced = sba_coalesce_chunks(ioc, sglist, nents);
+ coalesced = sba_coalesce_chunks(ioc, dev, sglist, nents);

/*
** Program the I/O Pdir
--
1.5.2.4

2007-10-24 10:49:09

by FUJITA Tomonori

Subject: [PATCH -mm 07/11] sparc64: make iommu respect the segment size limits

This patch makes iommu respect segment size limits when merging sg
lists.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/sparc64/kernel/iommu.c | 2 +-
arch/sparc64/kernel/iommu_common.c | 8 ++++++--
arch/sparc64/kernel/iommu_common.h | 3 ++-
arch/sparc64/kernel/pci_sun4v.c | 2 +-
4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/sparc64/kernel/iommu.c b/arch/sparc64/kernel/iommu.c
index 070a484..4b9115a 100644
--- a/arch/sparc64/kernel/iommu.c
+++ b/arch/sparc64/kernel/iommu.c
@@ -580,7 +580,7 @@ static int dma_4u_map_sg(struct device *dev, struct scatterlist *sglist,

/* Step 1: Prepare scatter list. */

- npages = prepare_sg(sglist, nelems);
+ npages = prepare_sg(dev, sglist, nelems);

/* Step 2: Allocate a cluster and context, if necessary. */

diff --git a/arch/sparc64/kernel/iommu_common.c b/arch/sparc64/kernel/iommu_common.c
index b70324e..62c3218 100644
--- a/arch/sparc64/kernel/iommu_common.c
+++ b/arch/sparc64/kernel/iommu_common.c
@@ -4,6 +4,7 @@
* Copyright (C) 1999 David S. Miller ([email protected])
*/

+#include <linux/dma-mapping.h>
#include "iommu_common.h"

/* You are _strongly_ advised to enable the following debugging code
@@ -201,21 +202,24 @@ void verify_sglist(struct scatterlist *sglist, int nents, iopte_t *iopte, int np
}
#endif

-unsigned long prepare_sg(struct scatterlist *sg, int nents)
+unsigned long prepare_sg(struct device *dev, struct scatterlist *sg, int nents)
{
struct scatterlist *dma_sg = sg;
unsigned long prev;
u32 dent_addr, dent_len;
+ unsigned int max_seg_size;

prev = (unsigned long) sg_virt(sg);
prev += (unsigned long) (dent_len = sg->length);
dent_addr = (u32) ((unsigned long)(sg_virt(sg)) & (IO_PAGE_SIZE - 1UL));
+ max_seg_size = dma_get_max_seg_size(dev);
while (--nents) {
unsigned long addr;

sg = sg_next(sg);
addr = (unsigned long) sg_virt(sg);
- if (! VCONTIG(prev, addr)) {
+ if (! VCONTIG(prev, addr) ||
+ dent_len + sg->length > max_seg_size) {
dma_sg->dma_address = dent_addr;
dma_sg->dma_length = dent_len;
dma_sg = sg_next(dma_sg);
diff --git a/arch/sparc64/kernel/iommu_common.h b/arch/sparc64/kernel/iommu_common.h
index 75b5a58..a90d046 100644
--- a/arch/sparc64/kernel/iommu_common.h
+++ b/arch/sparc64/kernel/iommu_common.h
@@ -9,6 +9,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
+#include <linux/device.h>

#include <asm/iommu.h>
#include <asm/scatterlist.h>
@@ -46,4 +47,4 @@ extern void verify_sglist(struct scatterlist *sg, int nents, iopte_t *iopte, int
#define VCONTIG(__X, __Y) (((__X) == (__Y)) || \
(((__X) | (__Y)) << (64UL - PAGE_SHIFT)) == 0UL)

-extern unsigned long prepare_sg(struct scatterlist *sg, int nents);
+extern unsigned long prepare_sg(struct device *dev, struct scatterlist *sg, int nents);
diff --git a/arch/sparc64/kernel/pci_sun4v.c b/arch/sparc64/kernel/pci_sun4v.c
index 8c4875b..cc03021 100644
--- a/arch/sparc64/kernel/pci_sun4v.c
+++ b/arch/sparc64/kernel/pci_sun4v.c
@@ -490,7 +490,7 @@ static int dma_4v_map_sg(struct device *dev, struct scatterlist *sglist,
goto bad;

/* Step 1: Prepare scatter list. */
- npages = prepare_sg(sglist, nelems);
+ npages = prepare_sg(dev, sglist, nelems);

/* Step 2: Allocate a cluster and context, if necessary. */
spin_lock_irqsave(&iommu->lock, flags);
--
1.5.2.4

2007-10-24 10:49:27

by FUJITA Tomonori

Subject: [PATCH -mm 09/11] call blk_queue_segment_boundary in __scsi_alloc_queue

request_queue and the device struct must have the same value for the
segment size limit. This patch adds a blk_queue_max_segment_size call in
__scsi_alloc_queue so LLDs don't need to call both
blk_queue_max_segment_size and dma_set_max_seg_size. An LLD that wants
to change the default value (64KB) can call the device_dma_parameters
accessors like pci_set_dma_max_seg_size when allocating the scsi_host.
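
To illustrate the resulting contract (a rough sketch, not part of this
patch; my_probe_one() and the 32768 value are made up), an LLD with a
stricter limit sets it on the PCI device before adding the host, and
__scsi_alloc_queue then propagates it to the request_queue:

#include <linux/pci.h>
#include <scsi/scsi_host.h>

static int my_probe_one(struct pci_dev *pdev, struct Scsi_Host *shost)
{
	int err;

	/* recorded once in device_dma_parameters; both the IOMMU and
	 * the request_queue (via __scsi_alloc_queue) will honour it */
	err = pci_set_dma_max_seg_size(pdev, 32768);
	if (err)
		return err;

	/* no blk_queue_max_segment_size() call in slave_configure needed */
	return scsi_add_host(shost, &pdev->dev);
}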

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/scsi/scsi_lib.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fdaf0..23a30ab 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1645,6 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
request_fn_proc *request_fn)
{
struct request_queue *q;
+ struct device *dev = shost->shost_gendev.parent;

q = blk_init_queue(request_fn, NULL);
if (!q)
@@ -1673,6 +1674,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
blk_queue_segment_boundary(q, shost->dma_boundary);

+ blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
+
if (!shost->use_clustering)
clear_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
return q;
--
1.5.2.4

2007-10-24 10:49:44

by FUJITA Tomonori

Subject: [PATCH -mm 10/11] sata_inic162x: use pci_set_dma_max_seg_size

This sets the segment size limit properly via pci_set_dma_max_seg_size
and removes the blk_queue_max_segment_size call because scsi-ml now does it.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/ata/sata_inic162x.c | 25 +++++++++++++------------
1 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/sata_inic162x.c b/drivers/ata/sata_inic162x.c
index 08595f3..4f3ba83 100644
--- a/drivers/ata/sata_inic162x.c
+++ b/drivers/ata/sata_inic162x.c
@@ -108,17 +108,6 @@ struct inic_port_priv {
u8 cached_pirq_mask;
};

-static int inic_slave_config(struct scsi_device *sdev)
-{
- /* This controller is braindamaged. dma_boundary is 0xffff
- * like others but it will lock up the whole machine HARD if
- * 65536 byte PRD entry is fed. Reduce maximum segment size.
- */
- blk_queue_max_segment_size(sdev->request_queue, 65536 - 512);
-
- return ata_scsi_slave_config(sdev);
-}
-
static struct scsi_host_template inic_sht = {
.module = THIS_MODULE,
.name = DRV_NAME,
@@ -132,7 +121,7 @@ static struct scsi_host_template inic_sht = {
.use_clustering = ATA_SHT_USE_CLUSTERING,
.proc_name = DRV_NAME,
.dma_boundary = ATA_DMA_BOUNDARY,
- .slave_configure = inic_slave_config,
+ .slave_configure = ata_scsi_slave_config,
.slave_destroy = ata_scsi_slave_destroy,
.bios_param = ata_std_bios_param,
};
@@ -730,6 +719,18 @@ static int inic_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
return rc;
}

+ /*
+ * This controller is braindamaged. dma_boundary is 0xffff
+ * like others but it will lock up the whole machine HARD if
+ * 65536 byte PRD entry is fed. Reduce maximum segment size.
+ */
+ rc = pci_set_dma_max_seg_size(pdev, 65536 - 512);
+ if (rc) {
+ dev_printk(KERN_ERR, &pdev->dev,
+ "failed to set the maximum segment size.\n");
+ return rc;
+ }
+
rc = init_controller(iomap[MMIO_BAR], hpriv->cached_hctl);
if (rc) {
dev_printk(KERN_ERR, &pdev->dev,
--
1.5.2.4

2007-10-24 10:50:01

by FUJITA Tomonori

Subject: [PATCH -mm 08/11] parisc: make iommu respect the segment size limits


This patch makes iommu respect segment size limits when merging sg
lists.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/parisc/ccio-dma.c | 2 +-
drivers/parisc/iommu-helpers.h | 7 ++++++-
drivers/parisc/sba_iommu.c | 2 +-
3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index 7c60cbd..2ec7db1 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -941,7 +941,7 @@ ccio_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
** w/o this association, we wouldn't have coherent DMA!
** Access to the virtual address is what forces a two pass algorithm.
*/
- coalesced = iommu_coalesce_chunks(ioc, sglist, nents, ccio_alloc_range);
+ coalesced = iommu_coalesce_chunks(ioc, dev, sglist, nents, ccio_alloc_range);

/*
** Program the I/O Pdir
diff --git a/drivers/parisc/iommu-helpers.h b/drivers/parisc/iommu-helpers.h
index 0a1f99a..97ba828 100644
--- a/drivers/parisc/iommu-helpers.h
+++ b/drivers/parisc/iommu-helpers.h
@@ -95,12 +95,14 @@ iommu_fill_pdir(struct ioc *ioc, struct scatterlist *startsg, int nents,
*/

static inline unsigned int
-iommu_coalesce_chunks(struct ioc *ioc, struct scatterlist *startsg, int nents,
+iommu_coalesce_chunks(struct ioc *ioc, struct device *dev,
+ struct scatterlist *startsg, int nents,
int (*iommu_alloc_range)(struct ioc *, size_t))
{
struct scatterlist *contig_sg; /* contig chunk head */
unsigned long dma_offset, dma_len; /* start/len of DMA stream */
unsigned int n_mappings = 0;
+ unsigned int max_seg_size = dma_get_max_seg_size(dev);

while (nents > 0) {

@@ -142,6 +144,9 @@ iommu_coalesce_chunks(struct ioc *ioc, struct scatterlist *startsg, int nents,
IOVP_SIZE) > DMA_CHUNK_SIZE))
break;

+ if (startsg->length + dma_len > max_seg_size)
+ break;
+
/*
** Next see if we can append the next chunk (i.e.
** it must end on one page and begin on another
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index e527a0e..d06627c 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -946,7 +946,7 @@ sba_map_sg(struct device *dev, struct scatterlist *sglist, int nents,
** w/o this association, we wouldn't have coherent DMA!
** Access to the virtual address is what forces a two pass algorithm.
*/
- coalesced = iommu_coalesce_chunks(ioc, sglist, nents, sba_alloc_range);
+ coalesced = iommu_coalesce_chunks(ioc, dev, sglist, nents, sba_alloc_range);

/*
** Program the I/O Pdir
--
1.5.2.4

2007-10-24 10:52:54

by FUJITA Tomonori

Subject: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

This sets the segment size limit properly via pci_set_dma_max_seg_size
and removes the blk_queue_max_segment_size call because scsi-ml now does it.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/scsi/aacraid/linit.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 038980b..04d6a65 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -435,9 +435,6 @@ static int aac_slave_configure(struct scsi_device *sdev)
else if (depth < 2)
depth = 2;
scsi_adjust_queue_depth(sdev, MSG_ORDERED_TAG, depth);
- if (!(((struct aac_dev *)host->hostdata)->adapter_info.options &
- AAC_OPT_NEW_COMM))
- blk_queue_max_segment_size(sdev->request_queue, 65536);
} else
scsi_adjust_queue_depth(sdev, 0, 1);

@@ -1045,6 +1042,12 @@ static int __devinit aac_probe_one(struct pci_dev *pdev,
if (error < 0)
goto out_deinit;

+ if (!(aac->adapter_info.options & AAC_OPT_NEW_COMM)) {
+ error = pci_set_dma_max_seg_size(pdev, 65536);
+ if (error)
+ goto out_deinit;
+ }
+
/*
* Lets override negotiations and drop the maximum SG limit to 34
*/
--
1.5.2.4

2007-10-24 10:53:22

by FUJITA Tomonori

Subject: [PATCH -mm 06/11] alpha: make pci_iommu respect the segment size limits

This patch makes pci_iommu respect segment size limits when merging sg
lists.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/alpha/kernel/pci_iommu.c | 24 ++++++++++++++++++------
include/asm-alpha/pci.h | 1 +
2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 2d00a08..26d3789 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -9,6 +9,7 @@
#include <linux/bootmem.h>
#include <linux/scatterlist.h>
#include <linux/log2.h>
+#include <linux/dma-mapping.h>

#include <asm/io.h>
#include <asm/hwrpb.h>
@@ -470,22 +471,29 @@ EXPORT_SYMBOL(pci_free_consistent);
#define SG_ENT_PHYS_ADDRESS(SG) __pa(SG_ENT_VIRT_ADDRESS(SG))

static void
-sg_classify(struct scatterlist *sg, struct scatterlist *end, int virt_ok)
+sg_classify(struct device *dev, struct scatterlist *sg, struct scatterlist *end,
+ int virt_ok)
{
unsigned long next_paddr;
struct scatterlist *leader;
long leader_flag, leader_length;
+ unsigned int max_seg_size;

leader = sg;
leader_flag = 0;
leader_length = leader->length;
next_paddr = SG_ENT_PHYS_ADDRESS(leader) + leader_length;

+ /* we will not merge sg without a device. */
+ max_seg_size = dev ? dma_get_max_seg_size(dev) : 0;
for (++sg; sg < end; ++sg) {
unsigned long addr, len;
addr = SG_ENT_PHYS_ADDRESS(sg);
len = sg->length;

+ if (leader_length + len > max_seg_size)
+ goto new_segment;
+
if (next_paddr == addr) {
sg->dma_address = -1;
leader_length += len;
@@ -494,6 +502,7 @@ sg_classify(struct scatterlist *sg, struct scatterlist *end, int virt_ok)
leader_flag = 1;
leader_length += len;
} else {
+new_segment:
leader->dma_address = leader_flag;
leader->dma_length = leader_length;
leader = sg;
@@ -512,7 +521,7 @@ sg_classify(struct scatterlist *sg, struct scatterlist *end, int virt_ok)
in the blanks. */

static int
-sg_fill(struct scatterlist *leader, struct scatterlist *end,
+sg_fill(struct device *dev, struct scatterlist *leader, struct scatterlist *end,
struct scatterlist *out, struct pci_iommu_arena *arena,
dma_addr_t max_dma, int dac_allowed)
{
@@ -562,8 +571,8 @@ sg_fill(struct scatterlist *leader, struct scatterlist *end,

/* Otherwise, break up the remaining virtually contiguous
hunks into individual direct maps and retry. */
- sg_classify(leader, end, 0);
- return sg_fill(leader, end, out, arena, max_dma, dac_allowed);
+ sg_classify(dev, leader, end, 0);
+ return sg_fill(dev, leader, end, out, arena, max_dma, dac_allowed);
}

out->dma_address = arena->dma_base + dma_ofs*PAGE_SIZE + paddr;
@@ -619,12 +628,15 @@ pci_map_sg(struct pci_dev *pdev, struct scatterlist *sg, int nents,
struct pci_iommu_arena *arena;
dma_addr_t max_dma;
int dac_allowed;
+ struct device *dev;

if (direction == PCI_DMA_NONE)
BUG();

dac_allowed = pdev ? pci_dac_dma_supported(pdev, pdev->dma_mask) : 0;

+ dev = pdev ? &pdev->dev : NULL;
+
/* Fast path single entry scatterlists. */
if (nents == 1) {
sg->dma_length = sg->length;
@@ -638,7 +650,7 @@ pci_map_sg(struct pci_dev *pdev, struct scatterlist *sg, int nents,
end = sg + nents;

/* First, prepare information about the entries. */
- sg_classify(sg, end, alpha_mv.mv_pci_tbi != 0);
+ sg_classify(dev, sg, end, alpha_mv.mv_pci_tbi != 0);

/* Second, figure out where we're going to map things. */
if (alpha_mv.mv_pci_tbi) {
@@ -658,7 +670,7 @@ pci_map_sg(struct pci_dev *pdev, struct scatterlist *sg, int nents,
for (out = sg; sg < end; ++sg) {
if ((int) sg->dma_address < 0)
continue;
- if (sg_fill(sg, end, out, arena, max_dma, dac_allowed) < 0)
+ if (sg_fill(dev, sg, end, out, arena, max_dma, dac_allowed) < 0)
goto error;
out++;
}
diff --git a/include/asm-alpha/pci.h b/include/asm-alpha/pci.h
index 30ee766..d5b10ef 100644
--- a/include/asm-alpha/pci.h
+++ b/include/asm-alpha/pci.h
@@ -4,6 +4,7 @@
#ifdef __KERNEL__

#include <linux/spinlock.h>
+#include <linux/dma-mapping.h>
#include <asm/scatterlist.h>
#include <asm/machvec.h>

--
1.5.2.4

2007-10-24 10:53:41

by FUJITA Tomonori

Subject: [PATCH -mm 04/11] ppc: make iommu respect the segment size limits

This patch makes iommu respect segment size limits when merging sg
lists.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/powerpc/kernel/dma_64.c | 2 +-
arch/powerpc/kernel/iommu.c | 8 ++++++--
include/asm-powerpc/iommu.h | 2 +-
3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/dma_64.c b/arch/powerpc/kernel/dma_64.c
index 14206e3..1806d96 100644
--- a/arch/powerpc/kernel/dma_64.c
+++ b/arch/powerpc/kernel/dma_64.c
@@ -68,7 +68,7 @@ static void dma_iommu_unmap_single(struct device *dev, dma_addr_t dma_handle,
static int dma_iommu_map_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction direction)
{
- return iommu_map_sg(dev->archdata.dma_data, sglist, nelems,
+ return iommu_map_sg(dev, sglist, nelems,
device_to_mask(dev), direction);
}

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2d0c9ef..7a5d247 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -270,15 +270,17 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t dma_addr,
spin_unlock_irqrestore(&(tbl->it_lock), flags);
}

-int iommu_map_sg(struct iommu_table *tbl, struct scatterlist *sglist,
+int iommu_map_sg(struct device *dev, struct scatterlist *sglist,
int nelems, unsigned long mask,
enum dma_data_direction direction)
{
+ struct iommu_table *tbl = dev->archdata.dma_data;
dma_addr_t dma_next = 0, dma_addr;
unsigned long flags;
struct scatterlist *s, *outs, *segstart;
int outcount, incount, i;
unsigned long handle;
+ unsigned int max_seg_size;

BUG_ON(direction == DMA_NONE);

@@ -297,6 +299,7 @@ int iommu_map_sg(struct iommu_table *tbl, struct scatterlist *sglist,

spin_lock_irqsave(&(tbl->it_lock), flags);

+ max_seg_size = dma_get_max_seg_size(dev);
for_each_sg(sglist, s, nelems, i) {
unsigned long vaddr, npages, entry, slen;

@@ -338,7 +341,8 @@ int iommu_map_sg(struct iommu_table *tbl, struct scatterlist *sglist,
/* We cannot merge if:
* - allocated dma_addr isn't contiguous to previous allocation
*/
- if (novmerge || (dma_addr != dma_next)) {
+ if (novmerge || (dma_addr != dma_next) ||
+ (outs->dma_length + s->length > max_seg_size)) {
/* Can't merge: create a new segment */
segstart = s;
outcount++;
diff --git a/include/asm-powerpc/iommu.h b/include/asm-powerpc/iommu.h
index 4a82fdc..5225f05 100644
--- a/include/asm-powerpc/iommu.h
+++ b/include/asm-powerpc/iommu.h
@@ -80,7 +80,7 @@ extern void iommu_free_table(struct device_node *dn);
extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);

-extern int iommu_map_sg(struct iommu_table *tbl, struct scatterlist *sglist,
+extern int iommu_map_sg(struct device *dev, struct scatterlist *sglist,
int nelems, unsigned long mask,
enum dma_data_direction direction);
extern void iommu_unmap_sg(struct iommu_table *tbl, struct scatterlist *sglist,
--
1.5.2.4

2007-10-24 10:57:32

by FUJITA Tomonori

Subject: [PATCH -mm 03/11] x86: make pci-gart iommu respect the segment size limits

This patch makes pci-gart iommu respect segment size limits when
merging sg lists.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/x86/kernel/pci-gart_64.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index c56e9ee..dfe3828 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -384,6 +384,8 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
int start;
unsigned long pages = 0;
int need = 0, nextneed;
+ unsigned int seg_size;
+ unsigned int max_seg_size;
struct scatterlist *s, *ps, *start_sg, *sgmap;

if (nents == 0)
@@ -395,6 +397,8 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
out = 0;
start = 0;
start_sg = sgmap = sg;
+ seg_size = 0;
+ max_seg_size = dma_get_max_seg_size(dev);
ps = NULL; /* shut up gcc */
for_each_sg(sg, s, nents, i) {
dma_addr_t addr = sg_phys(s);
@@ -408,11 +412,13 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
/* Can only merge when the last chunk ends on a page
boundary and the new one doesn't have an offset. */
if (!iommu_merge || !nextneed || !need || s->offset ||
+ (s->length + seg_size > max_seg_size) ||
(ps->offset + ps->length) % PAGE_SIZE) {
if (dma_map_cont(start_sg, i - start, sgmap,
pages, need) < 0)
goto error;
out++;
+ seg_size = 0;
sgmap = sg_next(sgmap);
pages = 0;
start = i;
@@ -420,6 +426,7 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
}
}

+ seg_size += s->length;
need = nextneed;
pages += to_pages(s->offset, s->length);
ps = s;
--
1.5.2.4

2007-10-24 10:58:22

by FUJITA Tomonori

Subject: [PATCH -mm 01/11] add device_dma_parameters structure

IOMMU code merges scatter/gather segments without considering a
low-level driver's restrictions. The problem is that IOMMU code can't
access those limitations because they live in the request_queue.

This patch adds a new structure, device_dma_parameters, containing DMA
information. A pointer to device_dma_parameters is added to struct
device.

- there are only max_segment_size and segment_boundary_mask there but
we'll move more dma stuff in struct device (like dma_mask) to struct
device_dma_parameters later. segment_boundary_mask is not supported
yet.

- new accessors for the dma parameters are added. So we can easily
change where to place struct device_dma_parameters in the future.

- dma_get_max_seg_size returns 64K if dma_parms in struct device isn't
set up properly. 64K is the default max_segment_size in the block
layer.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
include/linux/device.h | 11 +++++++++++
include/linux/dma-mapping.h | 15 +++++++++++++++
2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 2e15822..9eb883f 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -397,6 +397,15 @@ extern int devres_release_group(struct device *dev, void *id);
extern void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
extern void devm_kfree(struct device *dev, void *p);

+struct device_dma_parameters {
+ /*
+ * a low level driver may set these to teach IOMMU code about
+ * sg limitations.
+ */
+ unsigned int max_segment_size;
+ unsigned long segment_boundary_mask;
+};
+
struct device {
struct klist klist_children;
struct klist_node knode_parent; /* node in sibling list */
@@ -432,6 +441,8 @@ struct device {
64 bit addresses for consistent
allocations such descriptors. */

+ struct device_dma_parameters *dma_parms;
+
struct list_head dma_pools; /* dma pools (if dma'ble) */

struct dma_coherent_mem *dma_mem; /* internal for coherent mem
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 101a2d4..71972ca 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -60,6 +60,21 @@ static inline int is_device_dma_capable(struct device *dev)

extern u64 dma_get_required_mask(struct device *dev);

+static inline unsigned int dma_get_max_seg_size(struct device *dev)
+{
+ return dev->dma_parms ? dev->dma_parms->max_segment_size : 65536;
+}
+
+static inline unsigned int dma_set_max_seg_size(struct device *dev,
+ unsigned int size)
+{
+ if (dev->dma_parms) {
+ dev->dma_parms->max_segment_size = size;
+ return 0;
+ } else
+ return -EIO;
+}
+
/* flags for the coherent memory api */
#define DMA_MEMORY_MAP 0x01
#define DMA_MEMORY_IO 0x02
--
1.5.2.4

2007-10-24 10:58:36

by FUJITA Tomonori

Subject: [PATCH -mm 02/11] PCI: add device_dma_parameters support

This adds struct device_dma_parameters to struct pci_dev and properly
sets up the pointer in struct device.

The default max_segment_size is set to 64K, the same as the block
layer's default value.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/pci/pci.c | 8 ++++++++
drivers/pci/probe.c | 3 +++
include/linux/pci.h | 4 ++++
3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 71d561f..de97c2b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1397,6 +1397,14 @@ pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
}
#endif

+#ifndef HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_SIZE
+int pci_set_dma_max_seg_size(struct pci_dev *dev, unsigned int size)
+{
+ return dma_set_max_seg_size(&dev->dev, size);
+}
+EXPORT_SYMBOL(pci_set_dma_max_seg_size);
+#endif
+
/**
* pcix_get_max_mmrbc - get PCI-X maximum designed memory read byte count
* @dev: PCI device to query
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 463a5a9..54edea2 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -985,8 +985,11 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)

set_dev_node(&dev->dev, pcibus_to_node(bus));
dev->dev.dma_mask = &dev->dma_mask;
+ dev->dev.dma_parms = &dev->dma_parms;
dev->dev.coherent_dma_mask = 0xffffffffull;

+ pci_set_dma_max_seg_size(dev, 65536);
+
/* Fix up broken headers */
pci_fixup_device(pci_fixup_header, dev);

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5d2281f..f8cba3b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -152,6 +152,8 @@ struct pci_dev {
this if your device has broken DMA
or supports 64-bit transfers. */

+ struct device_dma_parameters dma_parms;
+
pci_power_t current_state; /* Current operating state. In ACPI-speak,
this is D0-D3, D0 being fully functional,
and D3 being off. */
@@ -556,6 +558,7 @@ void pci_intx(struct pci_dev *dev, int enable);
void pci_msi_off(struct pci_dev *dev);
int pci_set_dma_mask(struct pci_dev *dev, u64 mask);
int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask);
+int pci_set_dma_max_seg_size(struct pci_dev *dev, unsigned int size);
int pcix_get_max_mmrbc(struct pci_dev *dev);
int pcix_get_mmrbc(struct pci_dev *dev);
int pcix_set_mmrbc(struct pci_dev *dev, int mmrbc);
@@ -744,6 +747,7 @@ static inline void pci_set_master(struct pci_dev *dev) { }
static inline int pci_enable_device(struct pci_dev *dev) { return -EIO; }
static inline void pci_disable_device(struct pci_dev *dev) { }
static inline int pci_set_dma_mask(struct pci_dev *dev, u64 mask) { return -EIO; }
+static inline int pci_set_dma_max_seg_size(struct pci_dev *dev, unsigned int size) { return -EIO; }
static inline int pci_assign_resource(struct pci_dev *dev, int i) { return -EBUSY;}
static inline int __pci_register_driver(struct pci_driver *drv, struct module *owner) { return 0;}
static inline int pci_register_driver(struct pci_driver *drv) { return 0;}
--
1.5.2.4

2007-10-24 11:31:50

by Jeff Garzik

Subject: Re: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

FUJITA Tomonori wrote:
> This sets the segment size limit properly via pci_set_dma_max_seg_size
> and removes the blk_queue_max_segment_size call because scsi-ml now does it.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> drivers/scsi/aacraid/linit.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
> index 038980b..04d6a65 100644
> --- a/drivers/scsi/aacraid/linit.c
> +++ b/drivers/scsi/aacraid/linit.c
> @@ -435,9 +435,6 @@ static int aac_slave_configure(struct scsi_device *sdev)
> else if (depth < 2)
> depth = 2;
> scsi_adjust_queue_depth(sdev, MSG_ORDERED_TAG, depth);
> - if (!(((struct aac_dev *)host->hostdata)->adapter_info.options &
> - AAC_OPT_NEW_COMM))
> - blk_queue_max_segment_size(sdev->request_queue, 65536);
> } else
> scsi_adjust_queue_depth(sdev, 0, 1);
>
> @@ -1045,6 +1042,12 @@ static int __devinit aac_probe_one(struct pci_dev *pdev,
> if (error < 0)
> goto out_deinit;
>
> + if (!(aac->adapter_info.options & AAC_OPT_NEW_COMM)) {
> + error = pci_set_dma_max_seg_size(pdev, 65536);
> + if (error)
> + goto out_deinit;
> + }


is this needed, given that the default is already 65536?

Jeff



2007-10-24 11:33:20

by Jeff Garzik

Subject: Re: [PATCH -mm 01/11] add device_dma_parameters structure

FUJITA Tomonori wrote:
> IOMMU code merges scatter/gather segments without considering a
> low-level driver's restrictions. The problem is that IOMMU code can't
> access those limitations because they live in the request_queue.
>
> This patch adds a new structure, device_dma_parameters, containing DMA
> information. A pointer to device_dma_parameters is added to struct
> device.
>
> - there are only max_segment_size and segment_boundary_mask there but
> we'll move more dma stuff in struct device (like dma_mask) to struct
> device_dma_parameters later. segment_boundary_mask is not supported
> yet.
>
> - new accessors for the dma parameters are added. So we can easily
> change where to place struct device_dma_parameters in the future.
>
> - dma_get_max_seg_size returns 64K if dma_parms in struct device isn't
> set up properly. 64K is the default max_segment_size in the block
> layer.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> include/linux/device.h | 11 +++++++++++
> include/linux/dma-mapping.h | 15 +++++++++++++++
> 2 files changed, 26 insertions(+), 0 deletions(-)

ACK


2007-10-24 11:34:21

by Jeff Garzik

Subject: Re: [PATCH -mm 02/11] PCI: add device_dma_parameters support

FUJITA Tomonori wrote:
> index 463a5a9..54edea2 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -985,8 +985,11 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>
> set_dev_node(&dev->dev, pcibus_to_node(bus));
> dev->dev.dma_mask = &dev->dma_mask;
> + dev->dev.dma_parms = &dev->dma_parms;
> dev->dev.coherent_dma_mask = 0xffffffffull;
>
> + pci_set_dma_max_seg_size(dev, 65536);

this should check the return value of pci_set_dma_max_seg_size(), and do
something useful.

ACK everything else

2007-10-24 11:35:22

by Jeff Garzik

Subject: Re: [PATCH -mm 08/11] parisc: make iommu respect the segment size limits

FUJITA Tomonori wrote:
> This patch makes iommu respect segment size limits when merging sg
> lists.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> drivers/parisc/ccio-dma.c | 2 +-
> drivers/parisc/iommu-helpers.h | 7 ++++++-
> drivers/parisc/sba_iommu.c | 2 +-
> 3 files changed, 8 insertions(+), 3 deletions(-)

patches 3-8 look OK

2007-10-24 11:35:53

by FUJITA Tomonori

Subject: Re: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

On Wed, 24 Oct 2007 07:31:30 -0400
Jeff Garzik <[email protected]> wrote:

> FUJITA Tomonori wrote:
> > This sets the segment size limit properly via pci_set_dma_max_seg_size
> > and remove blk_queue_max_segment_size because scsi-ml calls it.
> >
> > Signed-off-by: FUJITA Tomonori <[email protected]>
> > ---
> > drivers/scsi/aacraid/linit.c | 9 ++++++---
> > 1 files changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
> > index 038980b..04d6a65 100644
> > --- a/drivers/scsi/aacraid/linit.c
> > +++ b/drivers/scsi/aacraid/linit.c
> > @@ -435,9 +435,6 @@ static int aac_slave_configure(struct scsi_device *sdev)
> > else if (depth < 2)
> > depth = 2;
> > scsi_adjust_queue_depth(sdev, MSG_ORDERED_TAG, depth);
> > - if (!(((struct aac_dev *)host->hostdata)->adapter_info.options &
> > - AAC_OPT_NEW_COMM))
> > - blk_queue_max_segment_size(sdev->request_queue, 65536);
> > } else
> > scsi_adjust_queue_depth(sdev, 0, 1);
> >
> > @@ -1045,6 +1042,12 @@ static int __devinit aac_probe_one(struct pci_dev *pdev,
> > if (error < 0)
> > goto out_deinit;
> >
> > + if (!(aac->adapter_info.options & AAC_OPT_NEW_COMM)) {
> > + error = pci_set_dma_max_seg_size(pdev, 65536);
> > + if (error)
> > + goto out_deinit;
> > + }
>
>
> is this needed, given that the default is already 65536?

Yeah, but I thought it would be better to set it explicitly, because
HBAs with 'aac->adapter_info.options & AAC_OPT_NEW_COMM' look to be
able to handle larger segments.

2007-10-24 11:39:29

by Jeff Garzik

Subject: Re: [PATCH -mm 09/11] call blk_queue_segment_boundary in __scsi_alloc_queue

FUJITA Tomonori wrote:
> request_queue and the device struct must have the same value for the
> segment size limit. This patch adds a blk_queue_max_segment_size call in
> __scsi_alloc_queue so LLDs don't need to call both
> blk_queue_max_segment_size and dma_set_max_seg_size. An LLD that wants
> to change the default value (64KB) can call the device_dma_parameters
> accessors like pci_set_dma_max_seg_size when allocating the scsi_host.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> drivers/scsi/scsi_lib.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 61fdaf0..23a30ab 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1645,6 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> request_fn_proc *request_fn)
> {
> struct request_queue *q;
> + struct device *dev = shost->shost_gendev.parent;
>
> q = blk_init_queue(request_fn, NULL);
> if (!q)
> @@ -1673,6 +1674,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
> blk_queue_segment_boundary(q, shost->dma_boundary);
>
> + blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
> +

it would be nice to have something more general that's usable in
drivers/block/sx8.c (for example), something like

static inline void
dev_blk_associate(struct device *dev, struct request_queue *q)
{
	blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
}

still, I will ACK the above patch (#9) in case you wish that to become a
future cleanup, or others dislike this suggestion

2007-10-24 11:39:44

by Jeff Garzik

Subject: Re: [PATCH -mm 10/11] sata_inic162x: use pci_set_dma_max_seg_size

FUJITA Tomonori wrote:
> This sets the segment size limit properly via pci_set_dma_max_seg_size
> and removes the blk_queue_max_segment_size call because scsi-ml now does it.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> drivers/ata/sata_inic162x.c | 25 +++++++++++++------------
> 1 files changed, 13 insertions(+), 12 deletions(-)

ACK


2007-10-24 11:41:14

by Jeff Garzik

Subject: Re: [PATCH -mm 0/11] fix iommu sg merging problem

FUJITA Tomonori wrote:
> IOMMUs merge scatter/gather segments without considering a low-level
> driver's restrictions. The problem is that IOMMUs can't access those
> limitations because they live in the request_queue.
>
> This patchset introduces a new structure, device_dma_parameters,
> containing DMA information. A pointer to device_dma_parameters is added
> to struct device. The bus-specific structures (like pci_dev) include
> device_dma_parameters. Low-level drivers can use dma_set_max_seg_size
> to tell IOMMUs about the restrictions.
>
> We can move more dma stuff in struct device (like dma_mask) to struct
> device_dma_parameters later (needs some cleanups before that).
>
> This includes patches for all the IOMMUs that could merge sg (x86_64,
> ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
> tested. The patches for other IOMMUs are only compile tested.
>
> Thanks to everyone for the comments on the previous submission
> to linux-scsi.
>
> This is against 2.6.24-rc1. The same patchset is also available:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git iommu-sg-fixes

Are you interested in collecting patches to libata that eliminate the
hand-rolled S/G splitting? e.g. now ata_fill_sg() and mv_fill_sg() can
be made much more efficient.

Again, thanks for doing this! It's been needed for a long time.

Jeff



2007-10-24 13:24:50

by Jens Axboe

Subject: Re: [PATCH -mm 0/11] fix iommu sg merging problem

On Wed, Oct 24 2007, FUJITA Tomonori wrote:
> IOMMUs merge scatter/gather segments without considering a low-level
> driver's restrictions. The problem is that IOMMUs can't access those
> limitations because they live in the request_queue.
>
> This patchset introduces a new structure, device_dma_parameters,
> containing DMA information. A pointer to device_dma_parameters is added
> to struct device. The bus-specific structures (like pci_dev) include
> device_dma_parameters. Low-level drivers can use dma_set_max_seg_size
> to tell IOMMUs about the restrictions.
>
> We can move more dma stuff in struct device (like dma_mask) to struct
> device_dma_parameters later (needs some cleanups before that).
>
> This includes patches for all the IOMMUs that could merge sg (x86_64,
> ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
> tested. The patches for other IOMMUs are only compile tested.
>
> Thanks to everyone for the comments on the previous submission
> to linux-scsi.
>
> This is against 2.6.24-rc1. The same patchset is also available:

Looks good to me, I think we should get this included sooner rather than
later.

--
Jens Axboe

2007-10-24 13:34:43

by Mark Salyzyn

Subject: RE: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

ACK

Based on the presence of the call. 2.6.22, for instance, does not have
this capability...

I did not test this change, just accepting it on principle. How much
testing of the change did you do, Fujita?

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: FUJITA Tomonori [mailto:[email protected]]
> Sent: Wednesday, October 24, 2007 6:49 AM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; AACRAID; [email protected]
> Subject: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size
>
> This sets the segment size limit properly via pci_set_dma_max_seg_size
> and removes the blk_queue_max_segment_size call because scsi-ml now does it.
>
> Signed-off-by: FUJITA Tomonori <[email protected]>
> ---
> drivers/scsi/aacraid/linit.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
> index 038980b..04d6a65 100644
> --- a/drivers/scsi/aacraid/linit.c
> +++ b/drivers/scsi/aacraid/linit.c
> @@ -435,9 +435,6 @@ static int aac_slave_configure(struct scsi_device *sdev)
> else if (depth < 2)
> depth = 2;
> scsi_adjust_queue_depth(sdev, MSG_ORDERED_TAG, depth);
> - if (!(((struct aac_dev *)host->hostdata)->adapter_info.options &
> - AAC_OPT_NEW_COMM))
> - blk_queue_max_segment_size(sdev->request_queue, 65536);
> } else
> scsi_adjust_queue_depth(sdev, 0, 1);
>
> @@ -1045,6 +1042,12 @@ static int __devinit aac_probe_one(struct pci_dev *pdev,
> if (error < 0)
> goto out_deinit;
>
> + if (!(aac->adapter_info.options & AAC_OPT_NEW_COMM)) {
> + error = pci_set_dma_max_seg_size(pdev, 65536);
> + if (error)
> + goto out_deinit;
> + }
> +
> /*
> * Lets override negotiations and drop the maximum SG limit to 34
> */
> --
> 1.5.2.4
>

2007-10-24 13:34:55

by Mark Salyzyn

Subject: RE: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

Jeff Garzik [mailto:[email protected]] writes:
> is this needed, given that the default is already 65536?

Apparently so, as we had to add it in the past, mainly because the
feature to limit the segment size was not part of the SCSI layer when
the original limit code was added. At that time it replaced a
complicated sg breakup algorithm.

Does your statement guarantee that the default will not change to a
larger value? If not, then we need to enforce it in perpetuity...

Sincerely -- Mark Salyzyn

2007-10-24 13:41:57

by FUJITA Tomonori

Subject: Re: [PATCH -mm 02/11] PCI: add device_dma_parameters support

On Wed, 24 Oct 2007 07:34:07 -0400
Jeff Garzik <[email protected]> wrote:

> FUJITA Tomonori wrote:
> > index 463a5a9..54edea2 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -985,8 +985,11 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> >
> > set_dev_node(&dev->dev, pcibus_to_node(bus));
> > dev->dev.dma_mask = &dev->dma_mask;
> > + dev->dev.dma_parms = &dev->dma_parms;
> > dev->dev.coherent_dma_mask = 0xffffffffull;
> >
> > + pci_set_dma_max_seg_size(dev, 65536);
>
> this should check the return value of pci_set_dma_max_seg_size(), and do
> something useful.
>
> ACK everything else

Thanks.

I wasn't sure what to do. Should pci_device_add fail if
pci_set_dma_max_seg_size fails, or should we just print a warning?
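
For reference, the warn-and-continue variant being asked about would
look roughly like this inside pci_device_add (a sketch of that option
only, not what the posted patch does):

	dev->dev.dma_parms = &dev->dma_parms;
	if (pci_set_dma_max_seg_size(dev, 65536))
		dev_warn(&dev->dev,
			 "failed to set the default DMA max segment size\n");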

2007-10-24 14:16:23

by FUJITA Tomonori

Subject: Re: [PATCH -mm 09/11] call blk_queue_segment_boundary in __scsi_alloc_queue

On Wed, 24 Oct 2007 07:39:16 -0400
Jeff Garzik <[email protected]> wrote:

> FUJITA Tomonori wrote:
> > request_queue and device struct must have the same value of a segment
> > size limit. This patch adds blk_queue_segment_boundary in
> > __scsi_alloc_queue so LLDs don't need to call both
> > blk_queue_segment_boundary and set_dma_max_seg_size. A LLD can change
> > the default value (64KB) can call device_dma_parameters accessors like
> > pci_set_dma_max_seg_size when allocating scsi_host.
> >
> > Signed-off-by: FUJITA Tomonori <[email protected]>
> > ---
> > drivers/scsi/scsi_lib.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 61fdaf0..23a30ab 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1645,6 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> > request_fn_proc *request_fn)
> > {
> > struct request_queue *q;
> > + struct device *dev = shost->shost_gendev.parent;
> >
> > q = blk_init_queue(request_fn, NULL);
> > if (!q)
> > @@ -1673,6 +1674,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> > blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
> > blk_queue_segment_boundary(q, shost->dma_boundary);
> >
> > + blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
> > +
>
> it would be nice to have something more general that's useable in
> drivers/block/sx8.c (for example), something like
>
> static inline void
> dev_blk_associate(struct device *dev, request_queue *q)
> {
> blk_queue_max_segment_size(q,
> dma_get_max_seg_size(dev));
> }
>
> still, I will ACK the above patch (#9) in case you wish that to become a
> future cleanup, or others dislike this suggestion

Yeah, I thought about something like that. But I can't find many
non-SCSI drivers (ide, sx8, mmc/card/, any more?) doing DMA and having
these restrictions, so I just call blk_queue_max_segment_size here.

Either is OK with me. I'll modify the patch if Jens prefers such a
function. Jens?

2007-10-24 14:28:35

by Jens Axboe

Subject: Re: [PATCH -mm 09/11] call blk_queue_segment_boundary in __scsi_alloc_queue

On Wed, Oct 24 2007, FUJITA Tomonori wrote:
> On Wed, 24 Oct 2007 07:39:16 -0400
> Jeff Garzik <[email protected]> wrote:
>
> > FUJITA Tomonori wrote:
> > > request_queue and device struct must have the same value of a segment
> > > size limit. This patch adds blk_queue_segment_boundary in
> > > __scsi_alloc_queue so LLDs don't need to call both
> > > blk_queue_segment_boundary and set_dma_max_seg_size. A LLD can change
> > > the default value (64KB) can call device_dma_parameters accessors like
> > > pci_set_dma_max_seg_size when allocating scsi_host.
> > >
> > > Signed-off-by: FUJITA Tomonori <[email protected]>
> > > ---
> > > drivers/scsi/scsi_lib.c | 3 +++
> > > 1 files changed, 3 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index 61fdaf0..23a30ab 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1645,6 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> > > request_fn_proc *request_fn)
> > > {
> > > struct request_queue *q;
> > > + struct device *dev = shost->shost_gendev.parent;
> > >
> > > q = blk_init_queue(request_fn, NULL);
> > > if (!q)
> > > @@ -1673,6 +1674,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> > > blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
> > > blk_queue_segment_boundary(q, shost->dma_boundary);
> > >
> > > + blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
> > > +
> >
> > it would be nice to have something more general that's useable in
> > drivers/block/sx8.c (for example), something like
> >
> > static inline void
> > dev_blk_associate(struct device *dev, request_queue *q)
> > {
> > blk_queue_max_segment_size(q,
> > dma_get_max_seg_size(dev));
> > }
> >
> > still, I will ACK the above patch (#9) in case you wish that to become a
> > future cleanup, or others dislike this suggestion
>
> Yeah, I thought about something like that. But I can't find many
> non-scsi drivers (ide, sx8, mmc/card/, anymore?) doing dma and having
> the restrictions so I just call blk_queue_max_segment_size here.
>
> Either is ok with me. I'll modify the patch if Jens prefers such
> function. Jens?

If there's a use for it, sure.

--
Jens Axboe

2007-10-24 14:33:54

by FUJITA Tomonori

Subject: Re: [PATCH -mm 0/11] fix iommu sg merging problem

On Wed, 24 Oct 2007 07:40:50 -0400
Jeff Garzik <[email protected]> wrote:

> FUJITA Tomonori wrote:
> > IOMMUs merges scatter/gather segments without considering a low level
> > driver's restrictions. The problem is that IOMMUs can't access to the
> > limitations because they are in request_queue.
> >
> > This patchset introduces a new structure, device_dma_parameters,
> > including dma information. A pointer to device_dma_parameters is added
> > to struct device. The bus specific structures (like pci_dev) includes
> > device_dma_parameters. Low level drivers can use dma_set_max_seg_size
> > to tell IOMMUs about the restrictions.
> >
> > We can move more dma stuff in struct device (like dma_mask) to struct
> > device_dma_parameters later (needs some cleanups before that).
> >
> > This includes patches for all the IOMMUs that could merge sg (x86_64,
> > ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
> > tested. The patches for other IOMMUs are only compile tested.
> >
> > Thanks to everyone for the comments on the previous submission
> > to linux-scsi.
> >
> > This is against 2.6.24-rc1. The same patchset is also available:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git iommu-sg-fixes
>
> Are you interested in collecting patches to libata that eliminate the
> hand-rolled S/G splitting? e.g. now ata_fill_sg() and mv_fill_sg() can
> be made much more efficient.

Yeah, but ATA has the segment boundary limit too. It's much trickier
than the segment size limit; it makes the IOMMU free space management
complicated.
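
To make the extra difficulty concrete: besides the size check, a merge
must also guarantee that the combined segment never crosses the device's
boundary. A rough sketch (the helper name is made up; the mask follows
the shost->dma_boundary convention, e.g. ATA's 0xffff, and this series
does not implement segment_boundary_mask yet):

static inline int seg_crosses_boundary(dma_addr_t dma_addr,
				       unsigned long total_len,
				       unsigned long boundary_mask)
{
	/* true if [dma_addr, dma_addr + total_len) spans more than one
	 * boundary-sized window, e.g. two 64KB regions for ATA */
	return (dma_addr & ~boundary_mask) !=
	       ((dma_addr + total_len - 1) & ~boundary_mask);
}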


> Again, thanks for doing this! It's been needed for a long time.

My pleasure.

2007-10-24 14:36:45

by Jeff Garzik

Subject: Re: [PATCH -mm 09/11] call blk_queue_segment_boundary in __scsi_alloc_queue

FUJITA Tomonori wrote:
> On Wed, 24 Oct 2007 07:39:16 -0400
> Jeff Garzik <[email protected]> wrote:
>
>> FUJITA Tomonori wrote:
>>> request_queue and device struct must have the same value of a segment
>>> size limit. This patch adds blk_queue_segment_boundary in
>>> __scsi_alloc_queue so LLDs don't need to call both
>>> blk_queue_segment_boundary and set_dma_max_seg_size. A LLD can change
>>> the default value (64KB) can call device_dma_parameters accessors like
>>> pci_set_dma_max_seg_size when allocating scsi_host.
>>>
>>> Signed-off-by: FUJITA Tomonori <[email protected]>
>>> ---
>>> drivers/scsi/scsi_lib.c | 3 +++
>>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 61fdaf0..23a30ab 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1645,6 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
>>> request_fn_proc *request_fn)
>>> {
>>> struct request_queue *q;
>>> + struct device *dev = shost->shost_gendev.parent;
>>>
>>> q = blk_init_queue(request_fn, NULL);
>>> if (!q)
>>> @@ -1673,6 +1674,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
>>> blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
>>> blk_queue_segment_boundary(q, shost->dma_boundary);
>>>
>>> + blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
>>> +
>> it would be nice to have something more general that's useable in
>> drivers/block/sx8.c (for example), something like
>>
>> static inline void
>> dev_blk_associate(struct device *dev, request_queue *q)
>> {
>> blk_queue_max_segment_size(q,
>> dma_get_max_seg_size(dev));
>> }
>>
>> still, I will ACK the above patch (#9) in case you wish that to become a
>> future cleanup, or others dislike this suggestion
>
> Yeah, I thought about something like that. But I can't find many
> non-scsi drivers (ide, sx8, mmc/card/, anymore?) doing dma and having
> the restrictions so I just call blk_queue_max_segment_size here.

My main idea was that request_queue will eventually want to know struct
device details directly, if it can.


> Either is ok with me. I'll modify the patch if Jens prefers such
> function. Jens?

I have no strong opinion... it was just a thought.

Jeff



2007-10-24 14:37:55

by FUJITA Tomonori

Subject: Re: [PATCH -mm 09/11] call blk_queue_segment_boundary in __scsi_alloc_queue

On Wed, 24 Oct 2007 16:28:11 +0200
Jens Axboe <[email protected]> wrote:

> On Wed, Oct 24 2007, FUJITA Tomonori wrote:
> > On Wed, 24 Oct 2007 07:39:16 -0400
> > Jeff Garzik <[email protected]> wrote:
> >
> > > FUJITA Tomonori wrote:
> > > > request_queue and device struct must have the same value of a segment
> > > > size limit. This patch adds blk_queue_segment_boundary in
> > > > __scsi_alloc_queue so LLDs don't need to call both
> > > > blk_queue_segment_boundary and set_dma_max_seg_size. A LLD can change
> > > > the default value (64KB) can call device_dma_parameters accessors like
> > > > pci_set_dma_max_seg_size when allocating scsi_host.
> > > >
> > > > Signed-off-by: FUJITA Tomonori <[email protected]>
> > > > ---
> > > > drivers/scsi/scsi_lib.c | 3 +++
> > > > 1 files changed, 3 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > > index 61fdaf0..23a30ab 100644
> > > > --- a/drivers/scsi/scsi_lib.c
> > > > +++ b/drivers/scsi/scsi_lib.c
> > > > @@ -1645,6 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> > > > request_fn_proc *request_fn)
> > > > {
> > > > struct request_queue *q;
> > > > + struct device *dev = shost->shost_gendev.parent;
> > > >
> > > > q = blk_init_queue(request_fn, NULL);
> > > > if (!q)
> > > > @@ -1673,6 +1674,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
> > > > blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
> > > > blk_queue_segment_boundary(q, shost->dma_boundary);
> > > >
> > > > + blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
> > > > +
> > >
> > > it would be nice to have something more general that's useable in
> > > drivers/block/sx8.c (for example), something like
> > >
> > > static inline void
> > > dev_blk_associate(struct device *dev, request_queue *q)
> > > {
> > > blk_queue_max_segment_size(q,
> > > dma_get_max_seg_size(dev));
> > > }
> > >
> > > still, I will ACK the above patch (#9) in case you wish that to become a
> > > future cleanup, or others dislike this suggestion
> >
> > Yeah, I thought about something like that. But I can't find many
> > non-scsi drivers (ide, sx8, mmc/card/, anymore?) doing dma and having
> > the restrictions so I just call blk_queue_max_segment_size here.
> >
> > Either is ok with me. I'll modify the patch if Jens prefers such
> > function. Jens?
>
> If there's a use for it, sure.

Ok, I'll update the patch and add some patches for non-scsi drivers.

2007-10-24 16:23:17

by FUJITA Tomonori

Subject: RE: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

On Wed, 24 Oct 2007 09:34:23 -0400
"Salyzyn, Mark" <[email protected]> wrote:

> ACK

Thanks.

> Based on the presence of the call. 2.6.22, for instance, does not have
> this capability...
>
> I did not test this change, just accepting on the principals. How much
> testing of the change did you do Fujita?

Unfortunately, I tested only the main component (the device and PCI
changes) with the ppc64 IOMMU. I don't have the other IOMMUs.

I didn't test the aacraid patch, but I think I have an aacraid card at
work, so I can test the patch (without the IOMMU changes).

2007-10-24 16:44:30

by Mark Salyzyn

Subject: RE: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size

I'm not requesting that you test (aacraid), just scoping the effort.

The cards in question are the (old) Dell PERC variety that would trigger
the need. I will notify our Dell liaison to see what they can do.

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: FUJITA Tomonori [mailto:[email protected]]
> Sent: Wednesday, October 24, 2007 12:22 PM
> To: Salyzyn, Mark
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; AACRAID; [email protected]
> Subject: RE: [PATCH -mm 11/11] aacraid: use pci_set_dma_max_seg_size
>
> On Wed, 24 Oct 2007 09:34:23 -0400
> "Salyzyn, Mark" <[email protected]> wrote:
>
> > ACK
>
> Thanks.
>
> > Based on the presence of the call. 2.6.22, for instance, does not have
> > this capability...
> >
> > I did not test this change, just accepting it on principle. How much
> > testing of the change did you do, Fujita?
>
> Unfortunately, I tested only the main component (the device and PCI
> changes) with the ppc64 IOMMU. I don't have the other IOMMUs.
>
> I didn't test the aacraid patch, but I think I have an aacraid card at
> work, so I can test the patch (without the IOMMU changes).
>