2010-06-27 10:14:04

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

53c700 is the only user of dma_is_consistent():

BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());

The above code tries to see if the system can allocate coherent memory
or not. It's for some old systems that can't allocate coherent memory
at all (e.g some parisc systems).

I think that we can safely remove the above usage:

- such old systems haven't triger the above checking for long.

- the above condition is important for systems that can't allocate
coherent memory if these systems do DMA. So probably it would be
better to have such checking in arch's DMA initialization code
instead of a driver.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/scsi/53c700.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 80dc3ac..89fc1c8 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -309,9 +309,6 @@ NCR_700_detect(struct scsi_host_template *tpnt,
hostdata->msgin = memory + MSGIN_OFFSET;
hostdata->msgout = memory + MSGOUT_OFFSET;
hostdata->status = memory + STATUS_OFFSET;
- /* all of these offsets are L1_CACHE_BYTES separated. It is fatal
- * if this isn't sufficient separation to avoid dma flushing issues */
- BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());
hostdata->slots = (struct NCR_700_command_slot *)(memory + SLOTS_OFFSET);
hostdata->dev = dev;

--
1.6.5


2010-06-27 10:14:22

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH -mm 2/2] remove dma_is_consistent() API

The definition of dma_is_consistent() isn't equal in architectures. So
it hasn't been so useful for drivers (we have only one user of the API
in tree). Even if we fix dma_is_consistent() in some architectures, it
doesn't look useful at all. It was invented long ago for some old
systems that can't allocate coherent memory at all. It's better to
export only APIs that are definitely necessary for drivers.

Let's remove this API.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
Documentation/DMA-API.txt | 6 ------
arch/alpha/include/asm/dma-mapping.h | 1 -
arch/arm/include/asm/dma-mapping.h | 5 -----
arch/avr32/include/asm/dma-mapping.h | 5 -----
arch/blackfin/include/asm/dma-mapping.h | 1 -
arch/cris/include/asm/dma-mapping.h | 2 --
arch/frv/include/asm/dma-mapping.h | 2 --
arch/ia64/include/asm/dma-mapping.h | 2 --
arch/m68k/include/asm/dma-mapping.h | 5 -----
arch/microblaze/include/asm/dma-mapping.h | 1 -
arch/mips/include/asm/dma-mapping.h | 2 --
arch/mips/mm/dma-default.c | 7 -------
arch/mn10300/include/asm/dma-mapping.h | 2 --
arch/parisc/include/asm/dma-mapping.h | 6 ------
arch/powerpc/include/asm/dma-mapping.h | 5 -----
arch/sh/include/asm/dma-mapping.h | 6 ------
arch/sparc/include/asm/dma-mapping.h | 1 -
arch/um/include/asm/dma-mapping.h | 1 -
arch/x86/include/asm/dma-mapping.h | 1 -
arch/xtensa/include/asm/dma-mapping.h | 2 --
include/asm-generic/dma-mapping-broken.h | 3 ---
21 files changed, 0 insertions(+), 66 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 05e2ae2..fe23269 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -456,12 +456,6 @@ be identical to those passed in (and returned by
dma_alloc_noncoherent()).

int
-dma_is_consistent(struct device *dev, dma_addr_t dma_handle)
-
-Returns true if the device dev is performing consistent DMA on the memory
-area pointed to by the dma_handle.
-
-int
dma_get_cache_alignment(void)

Returns the processor cache alignment. This is the absolute minimum
diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
index 1bce816..99aa5b9 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -41,7 +41,6 @@ static inline int dma_set_mask(struct device *dev, u64 mask)

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
-#define dma_is_consistent(d, h) (1)

#define dma_cache_sync(dev, va, size, dir) ((void)0)
#define dma_get_cache_alignment() L1_CACHE_BYTES
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 69ce072..40adf78 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -149,11 +149,6 @@ static inline int dma_get_cache_alignment(void)
return 32;
}

-static inline int dma_is_consistent(struct device *dev, dma_addr_t handle)
-{
- return !!arch_is_coherent();
-}
-
/*
* DMA errors are defined by all-bits-set in the DMA address.
*/
diff --git a/arch/avr32/include/asm/dma-mapping.h b/arch/avr32/include/asm/dma-mapping.h
index 0399359..eecb264 100644
--- a/arch/avr32/include/asm/dma-mapping.h
+++ b/arch/avr32/include/asm/dma-mapping.h
@@ -336,11 +336,6 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)

-static inline int dma_is_consistent(struct device *dev, dma_addr_t dma_addr)
-{
- return 1;
-}
-
static inline int dma_get_cache_alignment(void)
{
return boot_cpu_data.dcache.linesz;
diff --git a/arch/blackfin/include/asm/dma-mapping.h b/arch/blackfin/include/asm/dma-mapping.h
index 212cb80..ffc7007 100644
--- a/arch/blackfin/include/asm/dma-mapping.h
+++ b/arch/blackfin/include/asm/dma-mapping.h
@@ -22,7 +22,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *vaddr,
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
#define dma_supported(d, m) (1)
#define dma_get_cache_alignment() (32)
-#define dma_is_consistent(d, h) (1)

static inline int
dma_set_mask(struct device *dev, u64 dma_mask)
diff --git a/arch/cris/include/asm/dma-mapping.h b/arch/cris/include/asm/dma-mapping.h
index da8ef8e..d4490f6 100644
--- a/arch/cris/include/asm/dma-mapping.h
+++ b/arch/cris/include/asm/dma-mapping.h
@@ -158,8 +158,6 @@ dma_get_cache_alignment(void)
return (1 << INTERNODE_CACHE_SHIFT);
}

-#define dma_is_consistent(d, h) (1)
-
static inline void
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
diff --git a/arch/frv/include/asm/dma-mapping.h b/arch/frv/include/asm/dma-mapping.h
index 6af5d83..c66e651 100644
--- a/arch/frv/include/asm/dma-mapping.h
+++ b/arch/frv/include/asm/dma-mapping.h
@@ -131,8 +131,6 @@ int dma_get_cache_alignment(void)
return 1 << L1_CACHE_SHIFT;
}

-#define dma_is_consistent(d, h) (1)
-
static inline
void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
diff --git a/arch/ia64/include/asm/dma-mapping.h b/arch/ia64/include/asm/dma-mapping.h
index 7d09a09..f5c5094 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -99,6 +99,4 @@ dma_cache_sync (struct device *dev, void *vaddr, size_t size,
mb();
}

-#define dma_is_consistent(d, h) (1) /* all we do is coherent memory... */
-
#endif /* _ASM_IA64_DMA_MAPPING_H */
diff --git a/arch/m68k/include/asm/dma-mapping.h b/arch/m68k/include/asm/dma-mapping.h
index 26f5054..e0c4e83 100644
--- a/arch/m68k/include/asm/dma-mapping.h
+++ b/arch/m68k/include/asm/dma-mapping.h
@@ -21,11 +21,6 @@ static inline int dma_get_cache_alignment(void)
return 1 << L1_CACHE_SHIFT;
}

-static inline int dma_is_consistent(struct device *dev, dma_addr_t dma_addr)
-{
- return 0;
-}
-
extern void *dma_alloc_coherent(struct device *, size_t,
dma_addr_t *, gfp_t);
extern void dma_free_coherent(struct device *, size_t,
diff --git a/arch/microblaze/include/asm/dma-mapping.h b/arch/microblaze/include/asm/dma-mapping.h
index 18b3731..7d11eae 100644
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ b/arch/microblaze/include/asm/dma-mapping.h
@@ -112,7 +112,6 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
-#define dma_is_consistent(d, h) (1)

static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag)
diff --git a/arch/mips/include/asm/dma-mapping.h b/arch/mips/include/asm/dma-mapping.h
index 664ba53..bb72b4a 100644
--- a/arch/mips/include/asm/dma-mapping.h
+++ b/arch/mips/include/asm/dma-mapping.h
@@ -69,8 +69,6 @@ dma_get_cache_alignment(void)
return 128;
}

-extern int dma_is_consistent(struct device *dev, dma_addr_t dma_addr);
-
extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction);

diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
index 9547bc0..7ba8908 100644
--- a/arch/mips/mm/dma-default.c
+++ b/arch/mips/mm/dma-default.c
@@ -357,13 +357,6 @@ int dma_supported(struct device *dev, u64 mask)

EXPORT_SYMBOL(dma_supported);

-int dma_is_consistent(struct device *dev, dma_addr_t dma_addr)
-{
- return plat_device_is_coherent(dev);
-}
-
-EXPORT_SYMBOL(dma_is_consistent);
-
void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
{
diff --git a/arch/mn10300/include/asm/dma-mapping.h b/arch/mn10300/include/asm/dma-mapping.h
index 4ed1522..1123770 100644
--- a/arch/mn10300/include/asm/dma-mapping.h
+++ b/arch/mn10300/include/asm/dma-mapping.h
@@ -167,8 +167,6 @@ int dma_get_cache_alignment(void)
return 1 << L1_CACHE_SHIFT;
}

-#define dma_is_consistent(d) (1)
-
static inline
void dma_cache_sync(void *vaddr, size_t size,
enum dma_data_direction direction)
diff --git a/arch/parisc/include/asm/dma-mapping.h b/arch/parisc/include/asm/dma-mapping.h
index da69433..a80bede 100644
--- a/arch/parisc/include/asm/dma-mapping.h
+++ b/arch/parisc/include/asm/dma-mapping.h
@@ -190,12 +190,6 @@ dma_get_cache_alignment(void)
return dcache_stride;
}

-static inline int
-dma_is_consistent(struct device *dev, dma_addr_t dma_addr)
-{
- return (hppa_dma_ops->dma_sync_single_for_cpu == NULL);
-}
-
static inline void
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index c85ef23..6bbf57a 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -209,11 +209,6 @@ static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
-#ifdef CONFIG_NOT_COHERENT_CACHE
-#define dma_is_consistent(d, h) (0)
-#else
-#define dma_is_consistent(d, h) (1)
-#endif

static inline int dma_get_cache_alignment(void)
{
diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index bea3337..56a2cdb 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -42,12 +42,6 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)

-#ifdef CONFIG_DMA_COHERENT
-#define dma_is_consistent(d, h) (1)
-#else
-#define dma_is_consistent(d, h) (0)
-#endif
-
static inline int dma_get_cache_alignment(void)
{
/*
diff --git a/arch/sparc/include/asm/dma-mapping.h b/arch/sparc/include/asm/dma-mapping.h
index 4b4a0c0..13366df 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -11,7 +11,6 @@ extern int dma_supported(struct device *dev, u64 mask);

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
-#define dma_is_consistent(d, h) (1)

extern struct dma_map_ops *dma_ops, pci32_dma_ops;
extern struct bus_type pci_bus_type;
diff --git a/arch/um/include/asm/dma-mapping.h b/arch/um/include/asm/dma-mapping.h
index b948c14..17a2cb5 100644
--- a/arch/um/include/asm/dma-mapping.h
+++ b/arch/um/include/asm/dma-mapping.h
@@ -94,7 +94,6 @@ dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
-#define dma_is_consistent(d, h) (1)

static inline int
dma_get_cache_alignment(void)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index ac91eed..39c8b5c 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -54,7 +54,6 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)

#define dma_alloc_noncoherent(d, s, h, f) dma_alloc_coherent(d, s, h, f)
#define dma_free_noncoherent(d, s, v, h) dma_free_coherent(d, s, v, h)
-#define dma_is_consistent(d, h) (1)

extern int dma_supported(struct device *hwdev, u64 mask);
extern int dma_set_mask(struct device *dev, u64 mask);
diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h
index 51882ae..7947456 100644
--- a/arch/xtensa/include/asm/dma-mapping.h
+++ b/arch/xtensa/include/asm/dma-mapping.h
@@ -167,8 +167,6 @@ dma_get_cache_alignment(void)
return L1_CACHE_BYTES;
}

-#define dma_is_consistent(d, h) (1)
-
static inline void
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction)
diff --git a/include/asm-generic/dma-mapping-broken.h b/include/asm-generic/dma-mapping-broken.h
index 82cd0cb..ccf7b4f 100644
--- a/include/asm-generic/dma-mapping-broken.h
+++ b/include/asm-generic/dma-mapping-broken.h
@@ -72,9 +72,6 @@ dma_set_mask(struct device *dev, u64 mask);
extern int
dma_get_cache_alignment(void);

-extern int
-dma_is_consistent(struct device *dev, dma_addr_t dma_handle);
-
extern void
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction direction);
--
1.6.5

2010-06-27 15:08:54

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Sun, 2010-06-27 at 19:10 +0900, FUJITA Tomonori wrote:
> 53c700 is the only user of dma_is_consistent():
>
> BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());
>
> The above code tries to see if the system can allocate coherent memory
> or not. It's for some old systems that can't allocate coherent memory
> at all (e.g some parisc systems).

Actually, that's not the right explanation. The BUG_ON is because of an
efficiency in the driver ... it's nothing to do with the architecture.

The driver uses a set of mailboxes, but for efficiency's sake, it packs
them into a single coherent area and separates the different usages by a
L1 cache stride). On architectures capable of manufacturing coherent
memory, this is a nice speed up in the DMA infrastructure. However, for
incoherent architectures, it's fatal if the dma coherence stride is
greater than the L1 cache size, because now we'll get data corruption
due to cacheline interference. That's what the BUG_ON is checking for.

> I think that we can safely remove the above usage:
>
> - such old systems haven't triger the above checking for long.
>
> - the above condition is important for systems that can't allocate
> coherent memory if these systems do DMA. So probably it would be
> better to have such checking in arch's DMA initialization code
> instead of a driver.

Well, we can't check in the architecture because it's a driver specific
thing ... I suppose making it a rule that dma_get_cache_alignment()
*must* be <= L1_CACHE_BYTES fixes it ... we seem to have no architecture
violating that, so just add it to the documentation, and the check can
go.

James

2010-06-28 03:38:39

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Sun, 27 Jun 2010 10:08:48 -0500
James Bottomley <[email protected]> wrote:

> On Sun, 2010-06-27 at 19:10 +0900, FUJITA Tomonori wrote:
> > 53c700 is the only user of dma_is_consistent():
> >
> > BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());
> >
> > The above code tries to see if the system can allocate coherent memory
> > or not. It's for some old systems that can't allocate coherent memory
> > at all (e.g some parisc systems).
>
> Actually, that's not the right explanation. The BUG_ON is because of an
> efficiency in the driver ... it's nothing to do with the architecture.
>
> The driver uses a set of mailboxes, but for efficiency's sake, it packs
> them into a single coherent area and separates the different usages by a
> L1 cache stride). On architectures capable of manufacturing coherent
> memory, this is a nice speed up in the DMA infrastructure. However, for
> incoherent architectures, it's fatal if the dma coherence stride is
> greater than the L1 cache size, because now we'll get data corruption
> due to cacheline interference. That's what the BUG_ON is checking for.

Sorry, I should have looked the details of the driver.

You are talking about the following tricks, right?

#define MSG_ARRAY_SIZE 8
#define MSGOUT_OFFSET (L1_CACHE_ALIGN(sizeof(SCRIPT)))
__u8 *msgout;
#define MSGIN_OFFSET (MSGOUT_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
__u8 *msgin;
#define STATUS_OFFSET (MSGIN_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
__u8 *status;
#define SLOTS_OFFSET (STATUS_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
struct NCR_700_command_slot *slots;
#define TOTAL_MEM_SIZE (SLOTS_OFFSET + L1_CACHE_ALIGN(sizeof(struct NCR_700_command_slot) * NCR_700_COMMAND_SLOTS_PER_HOST))


> > I think that we can safely remove the above usage:
> >
> > - such old systems haven't triger the above checking for long.
> >
> > - the above condition is important for systems that can't allocate
> > coherent memory if these systems do DMA. So probably it would be
> > better to have such checking in arch's DMA initialization code
> > instead of a driver.
>
> Well, we can't check in the architecture because it's a driver specific
> thing ... I suppose making it a rule that dma_get_cache_alignment()
> *must* be <= L1_CACHE_BYTES fixes it ... we seem to have no architecture
> violating that, so just add it to the documentation, and the check can
> go.

Seems that on some architectures (arm and mips at least),
dma_get_cache_alignment() could greater than L1_CACHE_BYTES. But they
simply return the possible maximum size of cache size like:

static inline int dma_get_cache_alignment(void)
{
/* XXX Largest on any MIPS */
return 128;
}

So practically, we should be safe. I guess that we can simply convert
them to return L1_CACHE_BYTES.

Some PARISC and mips are only the fully non-coherent architectures
that we support now? We can remove the above checking if
dma_get_cache_alignment() is <= L1_CACHE_BYTES on PARISC and mips?

2010-06-28 14:56:06

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Mon, 2010-06-28 at 12:37 +0900, FUJITA Tomonori wrote:
> On Sun, 27 Jun 2010 10:08:48 -0500
> James Bottomley <[email protected]> wrote:
>
> > On Sun, 2010-06-27 at 19:10 +0900, FUJITA Tomonori wrote:
> > > 53c700 is the only user of dma_is_consistent():
> > >
> > > BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());
> > >
> > > The above code tries to see if the system can allocate coherent memory
> > > or not. It's for some old systems that can't allocate coherent memory
> > > at all (e.g some parisc systems).
> >
> > Actually, that's not the right explanation. The BUG_ON is because of an
> > efficiency in the driver ... it's nothing to do with the architecture.
> >
> > The driver uses a set of mailboxes, but for efficiency's sake, it packs
> > them into a single coherent area and separates the different usages by a
> > L1 cache stride). On architectures capable of manufacturing coherent
> > memory, this is a nice speed up in the DMA infrastructure. However, for
> > incoherent architectures, it's fatal if the dma coherence stride is
> > greater than the L1 cache size, because now we'll get data corruption
> > due to cacheline interference. That's what the BUG_ON is checking for.
>
> Sorry, I should have looked the details of the driver.
>
> You are talking about the following tricks, right?
>
> #define MSG_ARRAY_SIZE 8
> #define MSGOUT_OFFSET (L1_CACHE_ALIGN(sizeof(SCRIPT)))
> __u8 *msgout;
> #define MSGIN_OFFSET (MSGOUT_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
> __u8 *msgin;
> #define STATUS_OFFSET (MSGIN_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
> __u8 *status;
> #define SLOTS_OFFSET (STATUS_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
> struct NCR_700_command_slot *slots;
> #define TOTAL_MEM_SIZE (SLOTS_OFFSET + L1_CACHE_ALIGN(sizeof(struct NCR_700_command_slot) * NCR_700_COMMAND_SLOTS_PER_HOST))

Yes, that's it. The mailboxes themselves are pretty small, and the
minimum coherent allocation is usually a page, so we'd waste orders of
magnitude more coherent memory than we actually need without this trick
(and on a lot of platforms, coherent memory is scarce).

> > > I think that we can safely remove the above usage:
> > >
> > > - such old systems haven't triger the above checking for long.
> > >
> > > - the above condition is important for systems that can't allocate
> > > coherent memory if these systems do DMA. So probably it would be
> > > better to have such checking in arch's DMA initialization code
> > > instead of a driver.
> >
> > Well, we can't check in the architecture because it's a driver specific
> > thing ... I suppose making it a rule that dma_get_cache_alignment()
> > *must* be <= L1_CACHE_BYTES fixes it ... we seem to have no architecture
> > violating that, so just add it to the documentation, and the check can
> > go.
>
> Seems that on some architectures (arm and mips at least),
> dma_get_cache_alignment() could greater than L1_CACHE_BYTES. But they
> simply return the possible maximum size of cache size like:
>
> static inline int dma_get_cache_alignment(void)
> {
> /* XXX Largest on any MIPS */
> return 128;
> }
>
> So practically, we should be safe. I guess that we can simply convert
> them to return L1_CACHE_BYTES.

As long as that's architecturally true, yes. I mean I can't imagine
any architecture that had a dma alignment requirement that was greater
than its L1 cache width ... but I've been surprised be for making
"Obviously this can't happen ..." type statements where MIPS is
concerned.

> Some PARISC and mips are only the fully non-coherent architectures
> that we support now?

I think there might be some ARM SoC systems as well ... there were some
strange ones that had tight limits on the addresses the other SoC
components could DMA to which made it very difficult to make consistent
areas.

> We can remove the above checking if
> dma_get_cache_alignment() is <= L1_CACHE_BYTES on PARISC and mips?

I don't think we need to check, just document that
dma_get_cache_alignment cannot be greater than the L1 cache stride.

James

2010-06-28 21:27:46

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Mon, Jun 28, 2010 at 09:55:58AM -0500, James Bottomley wrote:
> On Mon, 2010-06-28 at 12:37 +0900, FUJITA Tomonori wrote:
> > Seems that on some architectures (arm and mips at least),
> > dma_get_cache_alignment() could greater than L1_CACHE_BYTES. But they
> > simply return the possible maximum size of cache size like:
> >
> > static inline int dma_get_cache_alignment(void)
> > {
> > /* XXX Largest on any MIPS */
> > return 128;
> > }
> >
> > So practically, we should be safe. I guess that we can simply convert
> > them to return L1_CACHE_BYTES.
>
> As long as that's architecturally true, yes. I mean I can't imagine
> any architecture that had a dma alignment requirement that was greater
> than its L1 cache width ... but I've been surprised be for making
> "Obviously this can't happen ..." type statements where MIPS is
> concerned.
>
> > Some PARISC and mips are only the fully non-coherent architectures
> > that we support now?
>
> I think there might be some ARM SoC systems as well ... there were some
> strange ones that had tight limits on the addresses the other SoC
> components could DMA to which made it very difficult to make consistent
> areas.
>
We had similar cases on SH where even though we can generally provide
consistent memory, it may not be visible or usable by certain
peripherals. On some of the earlier CPUs when the on-chip bus was being
overhauled there was an on-chip DMAC and a PCI DMAC on different
interconnects with their own addressing limitations. PCI DMA needed
buffers to be allocated from PCI space and would simply generate address
errors for anything on any of the other interconnects. On those systems
we could provide consistent memory for other PCI devices if and only if
we happened to have a PCI video card or something else with spare device
memory on the bus inserted -- which in turn would not be visible on any
other interconnects. In those cases it worked out that the DMA alignment
for PCI memory and L1 line size were the same, but that was really more
by coincidence than design.

There are still similar cases remaining. Most SH CPUs have a snoop
controller for snooping PCI <-> external memory transactions, but most
CPUs do not enable it on account of not being able to have the CPU enter
idle states while the controller is active. It's only been with the SMP
parts that a generic snoop controller has been provided that has general
L1 visibility.

> > We can remove the above checking if
> > dma_get_cache_alignment() is <= L1_CACHE_BYTES on PARISC and mips?
>
> I don't think we need to check, just document that
> dma_get_cache_alignment cannot be greater than the L1 cache stride.
>
Looking at the MIPS stuff, it also seems like there are cases where
L1_CACHE_BYTES == 32 while the kmalloc minalign value is bumped to 128
for certain CPU configurations, and kept at 32 for others. Those sorts of
values look a lot more like the L2 cache stride than the L1, perhaps
something to do with the snoop controller on exotic ccNUMA
configurations?

2010-06-29 06:04:49

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Mon, 28 Jun 2010 09:55:58 -0500
James Bottomley <[email protected]> wrote:

> > > > I think that we can safely remove the above usage:
> > > >
> > > > - such old systems haven't triger the above checking for long.
> > > >
> > > > - the above condition is important for systems that can't allocate
> > > > coherent memory if these systems do DMA. So probably it would be
> > > > better to have such checking in arch's DMA initialization code
> > > > instead of a driver.
> > >
> > > Well, we can't check in the architecture because it's a driver specific
> > > thing ... I suppose making it a rule that dma_get_cache_alignment()
> > > *must* be <= L1_CACHE_BYTES fixes it ... we seem to have no architecture
> > > violating that, so just add it to the documentation, and the check can
> > > go.
> >
> > Seems that on some architectures (arm and mips at least),
> > dma_get_cache_alignment() could greater than L1_CACHE_BYTES. But they
> > simply return the possible maximum size of cache size like:
> >
> > static inline int dma_get_cache_alignment(void)
> > {
> > /* XXX Largest on any MIPS */
> > return 128;
> > }
> >
> > So practically, we should be safe. I guess that we can simply convert
> > them to return L1_CACHE_BYTES.
>
> As long as that's architecturally true, yes. I mean I can't imagine
> any architecture that had a dma alignment requirement that was greater
> than its L1 cache width ... but I've been surprised be for making
> "Obviously this can't happen ..." type statements where MIPS is
> concerned.

How about using ARCH_KMALLOC_MINALIGN instead of L1_CACHE_BYTES?

In the previous merge window, we made sure that all the architectures
defines the minimum alignment and width of DMA properly (and the fully
coherent architectures don't define ARCH_KMALLOC_MINALIGN).

dma_get_cache_alignment should be equal to ARCH_KMALLOC_MINALIGN if an
architecture defines ARCH_KMALLOC_MINALIGN (probably,
dma_get_cache_alignment() can be implemented in the common place with
ARCH_KMALLOC_MINALIGN. It would be better to rename
ARCH_KMALLOC_MINALIGN to something like ARCH_DMA_MINALIGN).

It might be better to place DMA_ALIGN(x) in the common place. Seems
that some drivers wrongly use L1_CACHE_ALIGN() to get the dma
alignment. Well, using cache alignment magic in drivers isn't a good
idea though...

=
From: FUJITA Tomonori <[email protected]>
Subject: [PATCH] 53c700: remove dma_is_consistent usage in 53c700

ARCH_KMALLOC_MINALIGN returns the minimum alignment and width of DMA
on architectures that define ARCH_KMALLOC_MINALIGN (if it's not
defined, architectures are fully coherent).

So we can use ARCH_KMALLOC_MINALIGN instead of L1_CACHE_BYTES and
safely remove the alignment checking.

Signed-off-by: FUJITA Tomonori <[email protected]>
---
drivers/scsi/53c700.c | 1 -
drivers/scsi/53c700.h | 17 ++++++++++++-----
2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 80dc3ac..f5fd923 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -311,7 +311,6 @@ NCR_700_detect(struct scsi_host_template *tpnt,
hostdata->status = memory + STATUS_OFFSET;
/* all of these offsets are L1_CACHE_BYTES separated. It is fatal
* if this isn't sufficient separation to avoid dma flushing issues */
- BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment());
hostdata->slots = (struct NCR_700_command_slot *)(memory + SLOTS_OFFSET);
hostdata->dev = dev;

diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h
index e06bdfe..469552b 100644
--- a/drivers/scsi/53c700.h
+++ b/drivers/scsi/53c700.h
@@ -222,16 +222,23 @@ struct NCR_700_Host_Parameters {
* memory. All the msgin, msgout and status are allocated in
* this memory too (at separate cache lines). TOTAL_MEM_SIZE
* represents the total size of this area */
+
+#ifdef ARCH_KMALLOC_MINALIGN
+#define DMA_ALIGN(x) ALIGN(x, ARCH_KMALLOC_MINALIGN)
+#else
+#define DMA_ALIGN(x) ALIGN(x, 1)
+#endif
+
#define MSG_ARRAY_SIZE 8
-#define MSGOUT_OFFSET (L1_CACHE_ALIGN(sizeof(SCRIPT)))
+#define MSGOUT_OFFSET (DMA_ALIGN(sizeof(SCRIPT)))
__u8 *msgout;
-#define MSGIN_OFFSET (MSGOUT_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
+#define MSGIN_OFFSET (MSGOUT_OFFSET + DMA_ALIGN(MSG_ARRAY_SIZE))
__u8 *msgin;
-#define STATUS_OFFSET (MSGIN_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
+#define STATUS_OFFSET (MSGIN_OFFSET + DMA_ALIGN(MSG_ARRAY_SIZE))
__u8 *status;
-#define SLOTS_OFFSET (STATUS_OFFSET + L1_CACHE_ALIGN(MSG_ARRAY_SIZE))
+#define SLOTS_OFFSET (STATUS_OFFSET + DMA_ALIGN(MSG_ARRAY_SIZE))
struct NCR_700_command_slot *slots;
-#define TOTAL_MEM_SIZE (SLOTS_OFFSET + L1_CACHE_ALIGN(sizeof(struct NCR_700_command_slot) * NCR_700_COMMAND_SLOTS_PER_HOST))
+#define TOTAL_MEM_SIZE (SLOTS_OFFSET + DMA_ALIGN(sizeof(struct NCR_700_command_slot) * NCR_700_COMMAND_SLOTS_PER_HOST))
int saved_slot_position;
int command_slot_count; /* protected by state lock */
__u8 tag_negotiated;
--
1.6.5

2010-06-29 13:37:40

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Tue, 2010-06-29 at 15:04 +0900, FUJITA Tomonori wrote:
> On Mon, 28 Jun 2010 09:55:58 -0500
> James Bottomley <[email protected]> wrote:
>
> > > > > I think that we can safely remove the above usage:
> > > > >
> > > > > - such old systems haven't triger the above checking for long.
> > > > >
> > > > > - the above condition is important for systems that can't allocate
> > > > > coherent memory if these systems do DMA. So probably it would be
> > > > > better to have such checking in arch's DMA initialization code
> > > > > instead of a driver.
> > > >
> > > > Well, we can't check in the architecture because it's a driver specific
> > > > thing ... I suppose making it a rule that dma_get_cache_alignment()
> > > > *must* be <= L1_CACHE_BYTES fixes it ... we seem to have no architecture
> > > > violating that, so just add it to the documentation, and the check can
> > > > go.
> > >
> > > Seems that on some architectures (arm and mips at least),
> > > dma_get_cache_alignment() could greater than L1_CACHE_BYTES. But they
> > > simply return the possible maximum size of cache size like:
> > >
> > > static inline int dma_get_cache_alignment(void)
> > > {
> > > /* XXX Largest on any MIPS */
> > > return 128;
> > > }
> > >
> > > So practically, we should be safe. I guess that we can simply convert
> > > them to return L1_CACHE_BYTES.
> >
> > As long as that's architecturally true, yes. I mean I can't imagine
> > any architecture that had a dma alignment requirement that was greater
> > than its L1 cache width ... but I've been surprised be for making
> > "Obviously this can't happen ..." type statements where MIPS is
> > concerned.
>
> How about using ARCH_KMALLOC_MINALIGN instead of L1_CACHE_BYTES?
>
> In the previous merge window, we made sure that all the architectures
> defines the minimum alignment and width of DMA properly (and the fully
> coherent architectures don't define ARCH_KMALLOC_MINALIGN).
>
> dma_get_cache_alignment should be equal to ARCH_KMALLOC_MINALIGN if an
> architecture defines ARCH_KMALLOC_MINALIGN (probably,
> dma_get_cache_alignment() can be implemented in the common place with
> ARCH_KMALLOC_MINALIGN. It would be better to rename
> ARCH_KMALLOC_MINALIGN to something like ARCH_DMA_MINALIGN).
>
> It might be better to place DMA_ALIGN(x) in the common place. Seems
> that some drivers wrongly use L1_CACHE_ALIGN() to get the dma
> alignment. Well, using cache alignment magic in drivers isn't a good
> idea though...
>
> =
> From: FUJITA Tomonori <[email protected]>
> Subject: [PATCH] 53c700: remove dma_is_consistent usage in 53c700
>
> ARCH_KMALLOC_MINALIGN returns the minimum alignment and width of DMA
> on architectures that define ARCH_KMALLOC_MINALIGN (if it's not
> defined, architectures are fully coherent).
>
> So we can use ARCH_KMALLOC_MINALIGN instead of L1_CACHE_BYTES and
> safely remove the alignment checking.

Actually, I'd rather not do this. The reason is that L1_CACHE_ALIGN is
quite a big performance optimisation on x86 for the driver. Without it,
it's functionally correct, but the DMA use of the mailboxes really
thrashes the cache which damages performance (x86 has
ARCH_KMALLOC_MINALIGN set to 8 ... the default)

The only correctness problem, which the BUG is checking for is mismatch
in dma alignment ... as I said, I'm happy just to rely on that being
correct on every incoherent platform the driver operates on.

James

2010-06-30 02:39:58

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] scsi: remove dma_is_consistent usage in 53c700

On Tue, 29 Jun 2010 08:37:35 -0500
James Bottomley <[email protected]> wrote:

> > How about using ARCH_KMALLOC_MINALIGN instead of L1_CACHE_BYTES?

(snip)

> Actually, I'd rather not do this. The reason is that L1_CACHE_ALIGN is
> quite a big performance optimisation on x86 for the driver. Without it,
> it's functionally correct, but the DMA use of the mailboxes really
> thrashes the cache which damages performance (x86 has
> ARCH_KMALLOC_MINALIGN set to 8 ... the default)

Ah, I see.

If slab.h doesn't define ARCH_KMALLOC_MINALIGN for architectures that
don't define it, the driver could do something like:

#ifdef ARCH_KMALLOC_MINALIGN
#define DMA_ALIGN(x) ALIGN(x, ARCH_KMALLOC_MINALIGN)
#else
#define DMA_ALIGN(x) ALIGN(x, L1_CACHE_BYTES)
#endif

Seems that it's better to rename ARCH_KMALLOC_MINALIGN to something
like ARCH_DMA_MINALIGN and make ARCH_KMALLOC_MINALIGN the slab
internal thing.


> The only correctness problem, which the BUG is checking for is mismatch
> in dma alignment ... as I said, I'm happy just to rely on that being
> correct on every incoherent platform the driver operates on.

Ok, it's fine by me too. let's simply remove the BUG_ON.

I think that you want to document that dma_get_cache_alignment()
cannot be greater than the L1 cache stride. However, seems that
dma_get_cache_alignment() is greater than L1_CACHE_BYTES on some
architectures (they have some reasons, I assume). So I'll just remove
the BUG_ON.