Hello, all.
This patchset tries to fix a data corruption bug caused by not
handling cache coherency during block PIO. It implements
blk_kmap/unmap helpers which take an extra @dir argument and perform
the appropriate coherency actions. These are to block PIO what
dma_map/unmap are to block DMA transfers.
IDE, libata, SCSI, rd and md are converted. Still left are nbd, loop
and pktcdvd. If I missed something, please let me know.
Russell, can you please test whether this fixes the bug on ARM? If
it does and people agree with the approach, I'll follow up with
patches for the yet-unconverted drivers and a documentation update.
crypto/internal.h | 1
drivers/block/rd.c | 20 ++++----
drivers/ide/ide-floppy.c | 8 +--
drivers/ide/ide-taskfile.c | 6 +-
drivers/md/raid1.c | 13 +++--
drivers/md/raid5.c | 9 ++-
drivers/md/raid6main.c | 9 ++-
drivers/scsi/3w-9xxx.c | 8 +--
drivers/scsi/3w-xxxx.c | 4 -
drivers/scsi/aacraid/aachba.c | 4 -
drivers/scsi/gdth.c | 5 +-
drivers/scsi/ide-scsi.c | 14 +++--
drivers/scsi/ips.c | 21 +++++---
drivers/scsi/iscsi_tcp.c | 4 -
drivers/scsi/libata-core.c | 24 ++++++---
drivers/scsi/libata-scsi.c | 6 +-
drivers/scsi/megaraid.c | 8 ++-
drivers/scsi/qlogicpti.c | 5 +-
drivers/scsi/scsi_debug.c | 11 ++--
drivers/scsi/scsi_lib.c | 5 +-
fs/bio.c | 5 --
include/linux/bio.h | 59 ------------------------
include/linux/blkdev.h | 101 +++++++++++++++++++++++++++++++++++++++++-
include/linux/highmem.h | 2
24 files changed, 213 insertions(+), 139 deletions(-)
Thanks.
--
tejun
On architectures where highmem isn't used, the km_type arguments to
kmap/unmap are simply thrown away without being evaluated. This is
fine until a wrapper function is written: even though the argument is
ignored in the end, it still gets evaluated. As asm/highmem.h is not
included by linux/highmem.h when CONFIG_HIGHMEM is off, none of the
KM_* constants are defined, which results in a build error whenever
one of them is evaluated.
This patch makes linux/highmem.h include asm/kmap_types.h regardless
of CONFIG_HIGHMEM. The crypto subsystem used to include
asm/kmap_types.h directly to work around the same problem; this patch
removes that now-redundant include.
Signed-off-by: Tejun Heo <[email protected]>
---
crypto/internal.h | 1 -
include/linux/highmem.h | 1 +
2 files changed, 1 insertions(+), 1 deletions(-)
4e0462fa09e87da901867f37b2c7311ef714c3e7
diff --git a/crypto/internal.h b/crypto/internal.h
index 959e602..4188672 100644
--- a/crypto/internal.h
+++ b/crypto/internal.h
@@ -21,7 +21,6 @@
#include <linux/kernel.h>
#include <linux/rwsem.h>
#include <linux/slab.h>
-#include <asm/kmap_types.h>
extern struct list_head crypto_alg_list;
extern struct rw_semaphore crypto_alg_sem;
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 6bece92..c605f01 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -6,6 +6,7 @@
#include <linux/mm.h>
#include <asm/cacheflush.h>
+#include <asm/kmap_types.h>
#ifdef CONFIG_HIGHMEM
--
1.0.6
Convert drivers/block/rd.c to use the blk_kmap/unmap helpers. rd
already had all the needed cache flushes, so this patch doesn't change
its functionality.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/block/rd.c | 19 +++++++++++--------
1 files changed, 11 insertions(+), 8 deletions(-)
f09db99a6eb5be8a6a793c3a73e1457bfa61c928
diff --git a/drivers/block/rd.c b/drivers/block/rd.c
index ffd6abd..d0ff693 100644
--- a/drivers/block/rd.c
+++ b/drivers/block/rd.c
@@ -232,9 +232,11 @@ static int rd_blkdev_pagecache_IO(int rw
if (rw == READ) {
src = kmap_atomic(page, KM_USER0) + offset;
- dst = kmap_atomic(vec->bv_page, KM_USER1) + vec_offset;
+ dst = blk_kmap_atomic(vec->bv_page, KM_USER1,
+ DMA_FROM_DEVICE) + vec_offset;
} else {
- src = kmap_atomic(vec->bv_page, KM_USER0) + vec_offset;
+ src = blk_kmap_atomic(vec->bv_page, KM_USER0,
+ DMA_TO_DEVICE) + vec_offset;
dst = kmap_atomic(page, KM_USER1) + offset;
}
offset = 0;
@@ -242,13 +244,14 @@ static int rd_blkdev_pagecache_IO(int rw
memcpy(dst, src, count);
- kunmap_atomic(src, KM_USER0);
- kunmap_atomic(dst, KM_USER1);
-
- if (rw == READ)
- flush_dcache_page(vec->bv_page);
- else
+ if (rw == READ) {
+ kunmap_atomic(src, KM_USER0);
+ blk_kunmap_atomic(dst, KM_USER1, DMA_FROM_DEVICE);
+ } else {
+ blk_kunmap_atomic(src, KM_USER0, DMA_TO_DEVICE);
+ kunmap_atomic(dst, KM_USER1);
set_page_dirty(page);
+ }
unlock_page(page);
put_page(page);
} while (size);
--
1.0.6
Convert direct uses of kmap/unmap to blk_kmap/unmap in IDE. This,
combined with the previous bio helper change, fixes PIO cache
coherency bugs on architectures with aliased caches.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/ide/ide-taskfile.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
72ba1c82f766c7b42792499cc9156f89e76b176c
diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index 62ebefd..24d5e56 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -260,6 +260,7 @@ static void ide_pio_sector(ide_drive_t *
{
ide_hwif_t *hwif = drive->hwif;
struct scatterlist *sg = hwif->sg_table;
+ enum dma_data_direction dir = write ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
struct page *page;
#ifdef CONFIG_HIGHMEM
unsigned long flags;
@@ -277,7 +278,7 @@ static void ide_pio_sector(ide_drive_t *
#ifdef CONFIG_HIGHMEM
local_irq_save(flags);
#endif
- buf = kmap_atomic(page, KM_BIO_SRC_IRQ) + offset;
+ buf = blk_kmap_atomic(page, KM_BIO_SRC_IRQ, dir) + offset;
hwif->nleft--;
hwif->cursg_ofs++;
@@ -293,7 +294,7 @@ static void ide_pio_sector(ide_drive_t *
else
taskfile_input_data(drive, buf, SECTOR_WORDS);
- kunmap_atomic(buf, KM_BIO_SRC_IRQ);
+ blk_kunmap_atomic(buf, KM_BIO_SRC_IRQ, dir);
#ifdef CONFIG_HIGHMEM
local_irq_restore(flags);
#endif
--
1.0.6
Convert __bio_kmap_atomic, bvec_kmap_irq, __bio_kmap_irq, bio_kmap_irq
and their unmap counterparts such that they take a @dir argument and
use the blk_kmap helpers instead of calling kmap/unmap directly. This
patch also converts all users accordingly.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/ide/ide-floppy.c | 8 +++---
drivers/md/raid5.c | 9 +++++--
drivers/md/raid6main.c | 9 +++++--
drivers/scsi/scsi_lib.c | 5 ++--
fs/bio.c | 5 ++--
include/linux/bio.h | 59 --------------------------------------------
include/linux/blkdev.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 84 insertions(+), 72 deletions(-)
b35899fcc1babe0d74c9eb6d05beddb5992f4a60
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index 5945f55..af39130 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -615,9 +615,9 @@ static void idefloppy_input_buffers (ide
count = min(bvec->bv_len, bcount);
- data = bvec_kmap_irq(bvec, &flags);
+ data = bvec_kmap_irq(bvec, &flags, DMA_FROM_DEVICE);
drive->hwif->atapi_input_bytes(drive, data, count);
- bvec_kunmap_irq(data, &flags);
+ bvec_kunmap_irq(data, &flags, DMA_FROM_DEVICE);
bcount -= count;
pc->b_count += count;
@@ -649,9 +649,9 @@ static void idefloppy_output_buffers (id
count = min(bvec->bv_len, bcount);
- data = bvec_kmap_irq(bvec, &flags);
+ data = bvec_kmap_irq(bvec, &flags, DMA_TO_DEVICE);
drive->hwif->atapi_output_bytes(drive, data, count);
- bvec_kunmap_irq(data, &flags);
+ bvec_kunmap_irq(data, &flags, DMA_TO_DEVICE);
bcount -= count;
pc->b_count += count;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 54f4a98..db1fcd2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -689,12 +689,17 @@ static void copy_data(int frombio, struc
else clen = len;
if (clen > 0) {
- char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
+ enum dma_data_direction dir;
+ char *ba;
+
+ dir = frombio ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ ba = __bio_kmap_atomic(bio, i, KM_USER0, dir);
+
if (frombio)
memcpy(pa+page_offset, ba+b_offset, clen);
else
memcpy(ba+b_offset, pa+page_offset, clen);
- __bio_kunmap_atomic(ba, KM_USER0);
+ __bio_kunmap_atomic(ba, KM_USER0, dir);
}
if (clen < len) /* hit end of page */
break;
diff --git a/drivers/md/raid6main.c b/drivers/md/raid6main.c
index 8c823d6..f320291 100644
--- a/drivers/md/raid6main.c
+++ b/drivers/md/raid6main.c
@@ -720,12 +720,17 @@ static void copy_data(int frombio, struc
else clen = len;
if (clen > 0) {
- char *ba = __bio_kmap_atomic(bio, i, KM_USER0);
+ enum dma_data_direction dir;
+ char *ba;
+
+ dir = frombio ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+ ba = __bio_kmap_atomic(bio, i, KM_USER0, dir);
+
if (frombio)
memcpy(pa+page_offset, ba+b_offset, clen);
else
memcpy(ba+b_offset, pa+page_offset, clen);
- __bio_kunmap_atomic(ba, KM_USER0);
+ __bio_kunmap_atomic(ba, KM_USER0, dir);
}
if (clen < len) /* hit end of page */
break;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 00c9bf3..73285eb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -942,9 +942,10 @@ void scsi_io_completion(struct scsi_cmnd
else if (cmd->buffer != req->buffer) {
if (rq_data_dir(req) == READ) {
unsigned long flags;
- char *to = bio_kmap_irq(req->bio, &flags);
+ char *to = bio_kmap_irq(req->bio, &flags,
+ DMA_FROM_DEVICE);
memcpy(to, cmd->buffer, cmd->bufflen);
- bio_kunmap_irq(to, &flags);
+ bio_kunmap_irq(to, &flags, DMA_FROM_DEVICE);
}
kfree(cmd->buffer);
}
diff --git a/fs/bio.c b/fs/bio.c
index 7b30695..c7b5442 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -199,10 +199,9 @@ void zero_fill_bio(struct bio *bio)
int i;
bio_for_each_segment(bv, bio, i) {
- char *data = bvec_kmap_irq(bv, &flags);
+ char *data = bvec_kmap_irq(bv, &flags, DMA_FROM_DEVICE);
memset(data, 0, bv->bv_len);
- flush_dcache_page(bv->bv_page);
- bvec_kunmap_irq(data, &flags);
+ bvec_kunmap_irq(data, &flags, DMA_FROM_DEVICE);
}
}
EXPORT_SYMBOL(zero_fill_bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index b60ffe3..8c80a14 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -186,18 +186,6 @@ struct bio {
#define bvec_to_phys(bv) (page_to_phys((bv)->bv_page) + (unsigned long) (bv)->bv_offset)
/*
- * queues that have highmem support enabled may still need to revert to
- * PIO transfers occasionally and thus map high pages temporarily. For
- * permanent PIO fall back, user is probably better off disabling highmem
- * I/O completely on that queue (see ide-dma for example)
- */
-#define __bio_kmap_atomic(bio, idx, kmtype) \
- (kmap_atomic(bio_iovec_idx((bio), (idx))->bv_page, kmtype) + \
- bio_iovec_idx((bio), (idx))->bv_offset)
-
-#define __bio_kunmap_atomic(addr, kmtype) kunmap_atomic(addr, kmtype)
-
-/*
* merge helpers etc
*/
@@ -310,51 +298,4 @@ extern struct bio *bio_copy_user(struct
extern int bio_uncopy_user(struct bio *);
void zero_fill_bio(struct bio *bio);
-#ifdef CONFIG_HIGHMEM
-/*
- * remember to add offset! and never ever reenable interrupts between a
- * bvec_kmap_irq and bvec_kunmap_irq!!
- *
- * This function MUST be inlined - it plays with the CPU interrupt flags.
- */
-static inline char *bvec_kmap_irq(struct bio_vec *bvec, unsigned long *flags)
-{
- unsigned long addr;
-
- /*
- * might not be a highmem page, but the preempt/irq count
- * balancing is a lot nicer this way
- */
- local_irq_save(*flags);
- addr = (unsigned long) kmap_atomic(bvec->bv_page, KM_BIO_SRC_IRQ);
-
- BUG_ON(addr & ~PAGE_MASK);
-
- return (char *) addr + bvec->bv_offset;
-}
-
-static inline void bvec_kunmap_irq(char *buffer, unsigned long *flags)
-{
- unsigned long ptr = (unsigned long) buffer & PAGE_MASK;
-
- kunmap_atomic((void *) ptr, KM_BIO_SRC_IRQ);
- local_irq_restore(*flags);
-}
-
-#else
-#define bvec_kmap_irq(bvec, flags) (page_address((bvec)->bv_page) + (bvec)->bv_offset)
-#define bvec_kunmap_irq(buf, flags) do { *(flags) = 0; } while (0)
-#endif
-
-static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
- unsigned long *flags)
-{
- return bvec_kmap_irq(bio_iovec_idx(bio, idx), flags);
-}
-#define __bio_kunmap_irq(buf, flags) bvec_kunmap_irq(buf, flags)
-
-#define bio_kmap_irq(bio, flags) \
- __bio_kmap_irq((bio), (bio)->bi_idx, (flags))
-#define bio_kunmap_irq(buf,flags) __bio_kunmap_irq(buf, flags)
-
#endif /* __LINUX_BIO_H */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1040029..ed432cf 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -850,6 +850,67 @@ static inline void blk_kunmap(struct pag
kunmap(page);
}
+static inline void * __bio_kmap_atomic(struct bio *bio, unsigned short idx,
+ enum km_type type,
+ enum dma_data_direction dir)
+{
+ struct bio_vec *bvec = bio_iovec_idx(bio, idx);
+ return blk_kmap_atomic(bvec->bv_page, type, dir) + bvec->bv_offset;
+}
+
+static inline void __bio_kunmap_atomic(void *addr, enum km_type type,
+ enum dma_data_direction dir)
+{
+ addr = (void *)((unsigned long)addr & PAGE_MASK);
+ return blk_kunmap_atomic(addr, type, dir);
+}
+
+/*
+ * Never ever reenable interrupts between a bvec_kmap_irq and
+ * bvec_kunmap_irq!! This function MUST be inlined - it plays with
+ * the CPU interrupt flags.
+ */
+static inline char * bvec_kmap_irq(struct bio_vec *bvec, unsigned long *flags,
+ enum dma_data_direction dir)
+{
+ char *addr;
+
+ /*
+ * might not be a highmem page, but the preempt/irq count
+ * balancing is a lot nicer this way
+ */
+#ifdef CONFIG_HIGHMEM
+ local_irq_save(*flags);
+#endif
+ addr = blk_kmap_atomic(bvec->bv_page, KM_BIO_SRC_IRQ, dir);
+
+ BUG_ON(((unsigned long)addr) & ~PAGE_MASK);
+
+ return addr + bvec->bv_offset;
+}
+
+static inline void bvec_kunmap_irq(char *addr, unsigned long *flags,
+ enum dma_data_direction dir)
+{
+ addr = (char *)((unsigned long)addr & PAGE_MASK);
+ blk_kunmap_atomic(addr, KM_BIO_SRC_IRQ, dir);
+#ifdef CONFIG_HIGHMEM
+ local_irq_restore(*flags);
+#endif
+}
+
+static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
+ unsigned long *flags,
+ enum dma_data_direction dir)
+{
+ return bvec_kmap_irq(bio_iovec_idx(bio, idx), flags, dir);
+}
+#define __bio_kunmap_irq(buf, flags, dir) bvec_kunmap_irq(buf, flags, dir)
+
+#define bio_kmap_irq(bio, flags, dir) \
+ __bio_kmap_irq((bio), (bio)->bi_idx, (flags), dir)
+#define bio_kunmap_irq(buf,flags, dir) __bio_kunmap_irq(buf, flags, dir)
+
struct work_struct;
int kblockd_schedule_work(struct work_struct *work);
void kblockd_flush(void);
--
1.0.6
Convert direct uses of kmap/unmap to blk_kmap/unmap in SCSI. This,
combined with the previous bio helper change, fixes PIO cache
coherency bugs on architectures with aliased caches.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/scsi/3w-9xxx.c | 8 ++++----
drivers/scsi/3w-xxxx.c | 4 ++--
drivers/scsi/aacraid/aachba.c | 4 ++--
drivers/scsi/gdth.c | 5 +++--
drivers/scsi/ide-scsi.c | 14 ++++++++------
drivers/scsi/ips.c | 21 ++++++++++++++-------
drivers/scsi/iscsi_tcp.c | 4 ++--
drivers/scsi/megaraid.c | 8 +++++---
drivers/scsi/qlogicpti.c | 5 +++--
drivers/scsi/scsi_debug.c | 10 ++++++----
10 files changed, 49 insertions(+), 34 deletions(-)
388ce61deda6116ffeafe3f71ed0e36ba48c8e26
diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
index 3ff74f4..dd343e1 100644
--- a/drivers/scsi/3w-9xxx.c
+++ b/drivers/scsi/3w-9xxx.c
@@ -1863,9 +1863,9 @@ static int twa_scsiop_execute_scsi(TW_De
if ((tw_dev->srb[request_id]->use_sg == 1) && (tw_dev->srb[request_id]->request_bufflen < TW_MIN_SGL_LENGTH)) {
if (tw_dev->srb[request_id]->sc_data_direction == DMA_TO_DEVICE || tw_dev->srb[request_id]->sc_data_direction == DMA_BIDIRECTIONAL) {
struct scatterlist *sg = (struct scatterlist *)tw_dev->srb[request_id]->request_buffer;
- char *buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
+ char *buf = blk_kmap_atomic(sg->page, KM_IRQ0, DMA_TO_DEVICE) + sg->offset;
memcpy(tw_dev->generic_buffer_virt[request_id], buf, sg->length);
- kunmap_atomic(buf - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - sg->offset, KM_IRQ0, DMA_TO_DEVICE);
}
command_packet->sg_list[0].address = tw_dev->generic_buffer_phys[request_id];
command_packet->sg_list[0].length = TW_MIN_SGL_LENGTH;
@@ -1942,9 +1942,9 @@ static void twa_scsiop_execute_scsi_comp
}
if (tw_dev->srb[request_id]->use_sg == 1) {
struct scatterlist *sg = (struct scatterlist *)tw_dev->srb[request_id]->request_buffer;
- char *buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
+ char *buf = blk_kmap_atomic(sg->page, KM_IRQ0, DMA_FROM_DEVICE) + sg->offset;
memcpy(buf, tw_dev->generic_buffer_virt[request_id], sg->length);
- kunmap_atomic(buf - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - sg->offset, KM_IRQ0, DMA_FROM_DEVICE);
}
}
} /* End twa_scsiop_execute_scsi_complete() */
diff --git a/drivers/scsi/3w-xxxx.c b/drivers/scsi/3w-xxxx.c
index 283f6d2..21955ed 100644
--- a/drivers/scsi/3w-xxxx.c
+++ b/drivers/scsi/3w-xxxx.c
@@ -1511,7 +1511,7 @@ static void tw_transfer_internal(TW_Devi
if (cmd->use_sg) {
struct scatterlist *sg =
(struct scatterlist *)cmd->request_buffer;
- buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
+ buf = blk_kmap_atomic(sg->page, KM_IRQ0, DMA_FROM_DEVICE) + sg->offset;
transfer_len = min(sg->length, len);
} else {
buf = cmd->request_buffer;
@@ -1524,7 +1524,7 @@ static void tw_transfer_internal(TW_Devi
struct scatterlist *sg;
sg = (struct scatterlist *)cmd->request_buffer;
- kunmap_atomic(buf - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - sg->offset, KM_IRQ0, DMA_FROM_DEVICE);
}
}
diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 7139659..089ab56 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -366,7 +366,7 @@ static void aac_internal_transfer(struct
struct scatterlist *sg = scsicmd->request_buffer;
if (scsicmd->use_sg) {
- buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
+ buf = blk_kmap_atomic(sg->page, KM_IRQ0, DMA_FROM_DEVICE) + sg->offset;
transfer_len = min(sg->length, len + offset);
} else {
buf = scsicmd->request_buffer;
@@ -376,7 +376,7 @@ static void aac_internal_transfer(struct
memcpy(buf + offset, data, transfer_len - offset);
if (scsicmd->use_sg)
- kunmap_atomic(buf - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - sg->offset, KM_IRQ0, DMA_FROM_DEVICE);
}
diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index a6deb01..de5e5cf 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -2560,10 +2560,11 @@ static void gdth_copy_internal_data(int
return;
}
local_irq_save(flags);
- address = kmap_atomic(sl->page, KM_BIO_SRC_IRQ) + sl->offset;
+ address = blk_kmap_atomic(sl->page, KM_BIO_SRC_IRQ,
+ DMA_FROM_DEVICE) + sl->offset;
memcpy(address,buffer,cpnow);
flush_dcache_page(sl->page);
- kunmap_atomic(address, KM_BIO_SRC_IRQ);
+ blk_kunmap_atomic(address, KM_BIO_SRC_IRQ, DMA_FROM_DEVICE);
local_irq_restore(flags);
if (cpsum == cpcount)
break;
diff --git a/drivers/scsi/ide-scsi.c b/drivers/scsi/ide-scsi.c
index 3c688ef..ef935d7 100644
--- a/drivers/scsi/ide-scsi.c
+++ b/drivers/scsi/ide-scsi.c
@@ -184,11 +184,12 @@ static void idescsi_input_buffers (ide_d
unsigned long flags;
local_irq_save(flags);
- buf = kmap_atomic(pc->sg->page, KM_IRQ0) +
- pc->sg->offset;
+ buf = blk_kmap_atomic(pc->sg->page, KM_IRQ0,
+ DMA_FROM_DEVICE) + pc->sg->offset;
drive->hwif->atapi_input_bytes(drive,
buf + pc->b_count, count);
- kunmap_atomic(buf - pc->sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - pc->sg->offset, KM_IRQ0,
+ DMA_FROM_DEVICE);
local_irq_restore(flags);
} else {
buf = page_address(pc->sg->page) + pc->sg->offset;
@@ -219,11 +220,12 @@ static void idescsi_output_buffers (ide_
unsigned long flags;
local_irq_save(flags);
- buf = kmap_atomic(pc->sg->page, KM_IRQ0) +
- pc->sg->offset;
+ buf = blk_kmap_atomic(pc->sg->page, KM_IRQ0,
+ DMA_TO_DEVICE) + pc->sg->offset;
drive->hwif->atapi_output_bytes(drive,
buf + pc->b_count, count);
- kunmap_atomic(buf - pc->sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - pc->sg->offset, KM_IRQ0,
+ DMA_TO_DEVICE);
local_irq_restore(flags);
} else {
buf = page_address(pc->sg->page) + pc->sg->offset;
diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 3882d48..9c1ec4c 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -1626,14 +1626,17 @@ ips_is_passthru(Scsi_Cmnd * SC)
/* kmap_atomic() ensures addressability of the user buffer.*/
/* local_irq_save() protects the KM_IRQ0 address slot. */
local_irq_save(flags);
- buffer = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
+ buffer = blk_kmap_atomic(sg->page, KM_IRQ0,
+ DMA_TO_DEVICE) + sg->offset;
if (buffer && buffer[0] == 'C' && buffer[1] == 'O' &&
buffer[2] == 'P' && buffer[3] == 'P') {
- kunmap_atomic(buffer - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buffer - sg->offset, KM_IRQ0,
+ DMA_TO_DEVICE);
local_irq_restore(flags);
return 1;
}
- kunmap_atomic(buffer - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buffer - sg->offset, KM_IRQ0,
+ DMA_TO_DEVICE);
local_irq_restore(flags);
}
}
@@ -3675,9 +3678,11 @@ ips_scmd_buf_write(Scsi_Cmnd * scmd, voi
/* kmap_atomic() ensures addressability of the data buffer.*/
/* local_irq_save() protects the KM_IRQ0 address slot. */
local_irq_save(flags);
- buffer = kmap_atomic(sg[i].page, KM_IRQ0) + sg[i].offset;
+ buffer = blk_kmap_atomic(sg[i].page, KM_IRQ0,
+ DMA_FROM_DEVICE) + sg[i].offset;
memcpy(buffer, &cdata[xfer_cnt], min_cnt);
- kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
+ blk_kunmap_atomic(buffer - sg[i].offset, KM_IRQ0,
+ DMA_FROM_DEVICE);
local_irq_restore(flags);
xfer_cnt += min_cnt;
@@ -3714,9 +3719,11 @@ ips_scmd_buf_read(Scsi_Cmnd * scmd, void
/* kmap_atomic() ensures addressability of the data buffer.*/
/* local_irq_save() protects the KM_IRQ0 address slot. */
local_irq_save(flags);
- buffer = kmap_atomic(sg[i].page, KM_IRQ0) + sg[i].offset;
+ buffer = blk_kmap_atomic(sg[i].page, KM_IRQ0,
+ DMA_TO_DEVICE) + sg[i].offset;
memcpy(&cdata[xfer_cnt], buffer, min_cnt);
- kunmap_atomic(buffer - sg[i].offset, KM_IRQ0);
+ blk_kunmap_atomic(buffer - sg[i].offset, KM_IRQ0,
+ DMA_TO_DEVICE);
local_irq_restore(flags);
xfer_cnt += min_cnt;
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 10bcf42..376a56e 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -948,10 +948,10 @@ static int iscsi_scsi_data_in(struct isc
for (i = ctask->sg_count; i < sc->use_sg; i++) {
char *dest;
- dest = kmap_atomic(sg[i].page, KM_SOFTIRQ0);
+ dest = blk_kmap_atomic(sg[i].page, KM_SOFTIRQ0, DMA_FROM_DEVICE);
rc = iscsi_ctask_copy(conn, ctask, dest + sg[i].offset,
sg[i].length, offset);
- kunmap_atomic(dest, KM_SOFTIRQ0);
+ blk_kunmap_atomic(dest, KM_SOFTIRQ0, DMA_FROM_DEVICE);
if (rc == -EAGAIN)
/* continue with the next SKB/PDU */
return rc;
diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 4a6feb1..28087a1 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -661,8 +661,9 @@ mega_build_cmd(adapter_t *adapter, Scsi_
struct scatterlist *sg;
sg = (struct scatterlist *)cmd->request_buffer;
- buf = kmap_atomic(sg->page, KM_IRQ0) +
- sg->offset;
+ buf = blk_kmap_atomic(sg->page, KM_IRQ0,
+ DMA_FROM_DEVICE);
+ buf += sg->offset;
} else
buf = cmd->request_buffer;
memset(buf, 0, cmd->cmnd[4]);
@@ -670,7 +671,8 @@ mega_build_cmd(adapter_t *adapter, Scsi_
struct scatterlist *sg;
sg = (struct scatterlist *)cmd->request_buffer;
- kunmap_atomic(buf - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - sg->offset, KM_IRQ0,
+ DMA_FROM_DEVICE);
}
cmd->result = (DID_OK << 16);
cmd->scsi_done(cmd);
diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index 1fd5fc6..fe5f40a 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -1128,7 +1128,8 @@ static unsigned int scsi_rbuf_get(struct
struct scatterlist *sg;
sg = (struct scatterlist *) cmd->request_buffer;
- buf = kmap_atomic(sg->page, KM_IRQ0) + sg->offset;
+ buf = blk_kmap_atomic(sg->page, KM_IRQ0, DMA_FROM_DEVICE)
+ + sg->offset;
buflen = sg->length;
} else {
buf = cmd->request_buffer;
@@ -1145,7 +1146,7 @@ static void scsi_rbuf_put(struct scsi_cm
struct scatterlist *sg;
sg = (struct scatterlist *) cmd->request_buffer;
- kunmap_atomic(buf - sg->offset, KM_IRQ0);
+ blk_kunmap_atomic(buf - sg->offset, KM_IRQ0, DMA_FROM_DEVICE);
}
}
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 3ded9da..446fbb1 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -503,7 +503,8 @@ static int fill_from_dev_buffer(struct s
for (k = 0, req_len = 0, act_len = 0; k < scp->use_sg; ++k, ++sgpnt) {
if (active) {
kaddr = (unsigned char *)
- kmap_atomic(sgpnt->page, KM_USER0);
+ blk_kmap_atomic(sgpnt->page, KM_USER0,
+ DMA_FROM_DEVICE);
if (NULL == kaddr)
return (DID_ERROR << 16);
kaddr_off = (unsigned char *)kaddr + sgpnt->offset;
@@ -513,7 +514,7 @@ static int fill_from_dev_buffer(struct s
len = arr_len - req_len;
}
memcpy(kaddr_off, arr + req_len, len);
- kunmap_atomic(kaddr, KM_USER0);
+ blk_kunmap_atomic(kaddr, KM_USER0, DMA_FROM_DEVICE);
act_len += len;
}
req_len += sgpnt->length;
@@ -546,7 +547,8 @@ static int fetch_to_dev_buffer(struct sc
}
sgpnt = (struct scatterlist *)scp->request_buffer;
for (k = 0, req_len = 0, fin = 0; k < scp->use_sg; ++k, ++sgpnt) {
- kaddr = (unsigned char *)kmap_atomic(sgpnt->page, KM_USER0);
+ kaddr = (unsigned char *)blk_kmap_atomic(sgpnt->page, KM_USER0,
+ DMA_TO_DEVICE);
if (NULL == kaddr)
return -1;
kaddr_off = (unsigned char *)kaddr + sgpnt->offset;
@@ -556,7 +558,7 @@ static int fetch_to_dev_buffer(struct sc
fin = 1;
}
memcpy(arr + req_len, kaddr_off, len);
- kunmap_atomic(kaddr, KM_USER0);
+ blk_kunmap_atomic(kaddr, KM_USER0, DMA_TO_DEVICE);
if (fin)
return req_len + len;
req_len += sgpnt->length;
--
1.0.6
When block requests are handled via DMA, the DMA mapping functions
take care of cache coherency. Unfortunately, cache coherency was left
unhandled for block PIO until now, resulting in data corruption
issues on architectures with aliasing caches.
All block PIO operations use kmap/unmap to access the target memory
area, and the mapping/unmapping points are the perfect places for
cache flushing. kmap/unmap are to PIO'ing CPUs what dma_map/unmap are
to DMAing devices.
This patch implements blk kmap helpers which additionally take a
@direction argument and deal with cache coherency.
Signed-off-by: Tejun Heo <[email protected]>
---
include/linux/blkdev.h | 38 ++++++++++++++++++++++++++++++++++++++
1 files changed, 38 insertions(+), 0 deletions(-)
fdaeda6742b70451ddbb860b440d2533c6591fda
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02a585f..1040029 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -17,6 +17,10 @@
#include <asm/scatterlist.h>
+/* for PIO kmap helpers */
+#include <linux/highmem.h>
+#include <linux/dma-mapping.h>
+
struct request_queue;
typedef struct request_queue request_queue_t;
struct elevator_queue;
@@ -812,6 +816,40 @@ static inline void put_dev_sector(Sector
page_cache_release(p.v);
}
+/*
+ * PIO kmap helpers.
+ *
+ * Block PIO requires cache flushes on architectures with aliasing
+ * caches. If a driver wants to perform PIO on a user-mappable page
+ * (page cache page), it MUST use one of the following kmap/unmap
+ * helpers unless it handles cache coherency itself.
+ */
+static inline void * blk_kmap_atomic(struct page *page, enum km_type type,
+ enum dma_data_direction dir)
+{
+ return kmap_atomic(page, type);
+}
+
+static inline void blk_kunmap_atomic(void *addr, enum km_type type,
+ enum dma_data_direction dir)
+{
+ if (dir == DMA_BIDIRECTIONAL || dir == DMA_FROM_DEVICE)
+ flush_dcache_page(kmap_atomic_to_page(addr));
+ kunmap_atomic(addr, type);
+}
+
+static inline void * blk_kmap(struct page *page, enum dma_data_direction dir)
+{
+ return kmap(page);
+}
+
+static inline void blk_kunmap(struct page *page, enum dma_data_direction dir)
+{
+ if (dir == DMA_BIDIRECTIONAL || dir == DMA_FROM_DEVICE)
+ flush_dcache_page(page);
+ kunmap(page);
+}
+
struct work_struct;
int kblockd_schedule_work(struct work_struct *work);
void kblockd_flush(void);
--
1.0.6
Convert direct uses of kmap/unmap to blk_kmap/unmap in libata. This,
combined with the previous bio helper change, fixes PIO cache
coherency bugs on architectures with aliased caches.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/scsi/libata-core.c | 24 +++++++++++++++---------
drivers/scsi/libata-scsi.c | 5 +++--
2 files changed, 18 insertions(+), 11 deletions(-)
c1a1417612a1dc346baf783904d716c2e1d5f223
diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index f55b9b3..877a33c 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -2491,9 +2491,10 @@ static void ata_sg_clean(struct ata_queu
sg[qc->orig_n_elem - 1].length += qc->pad_len;
if (pad_buf) {
struct scatterlist *psg = &qc->pad_sgent;
- void *addr = kmap_atomic(psg->page, KM_IRQ0);
+ void *addr = blk_kmap_atomic(psg->page, KM_IRQ0,
+ DMA_FROM_DEVICE);
memcpy(addr + psg->offset, pad_buf, qc->pad_len);
- kunmap_atomic(addr, KM_IRQ0);
+ blk_kunmap_atomic(addr, KM_IRQ0, DMA_FROM_DEVICE);
}
} else {
if (sg_dma_len(&sg[0]) > 0)
@@ -2765,9 +2766,10 @@ static int ata_sg_setup(struct ata_queue
psg->offset = offset_in_page(offset);
if (qc->tf.flags & ATA_TFLAG_WRITE) {
- void *addr = kmap_atomic(psg->page, KM_IRQ0);
+ void *addr = blk_kmap_atomic(psg->page, KM_IRQ0,
+ DMA_TO_DEVICE);
memcpy(pad_buf, addr + psg->offset, qc->pad_len);
- kunmap_atomic(addr, KM_IRQ0);
+ blk_kunmap_atomic(addr, KM_IRQ0, DMA_TO_DEVICE);
}
sg_dma_address(psg) = ap->pad_dma + (qc->tag * ATA_DMA_PAD_SZ);
@@ -3077,6 +3079,7 @@ static void ata_pio_sector(struct ata_qu
int do_write = (qc->tf.flags & ATA_TFLAG_WRITE);
struct scatterlist *sg = qc->__sg;
struct ata_port *ap = qc->ap;
+ enum dma_data_direction dir;
struct page *page;
unsigned int offset;
unsigned char *buf;
@@ -3084,6 +3087,7 @@ static void ata_pio_sector(struct ata_qu
if (qc->cursect == (qc->nsect - 1))
ap->hsm_task_state = HSM_ST_LAST;
+ dir = do_write ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
page = sg[qc->cursg].page;
offset = sg[qc->cursg].offset + qc->cursg_ofs * ATA_SECT_SIZE;
@@ -3091,7 +3095,7 @@ static void ata_pio_sector(struct ata_qu
page = nth_page(page, (offset >> PAGE_SHIFT));
offset %= PAGE_SIZE;
- buf = kmap(page) + offset;
+ buf = blk_kmap(page, dir) + offset;
qc->cursect++;
qc->cursg_ofs++;
@@ -3104,10 +3108,9 @@ static void ata_pio_sector(struct ata_qu
DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
/* do the actual data transfer */
- do_write = (qc->tf.flags & ATA_TFLAG_WRITE);
ata_data_xfer(ap, buf, ATA_SECT_SIZE, do_write);
- kunmap(page);
+ blk_kunmap(page, dir);
}
/**
@@ -3127,6 +3130,7 @@ static void __atapi_pio_bytes(struct ata
int do_write = (qc->tf.flags & ATA_TFLAG_WRITE);
struct scatterlist *sg = qc->__sg;
struct ata_port *ap = qc->ap;
+ enum dma_data_direction dir;
struct page *page;
unsigned char *buf;
unsigned int offset, count;
@@ -3134,6 +3138,8 @@ static void __atapi_pio_bytes(struct ata
if (qc->curbytes + bytes >= qc->nbytes)
ap->hsm_task_state = HSM_ST_LAST;
+ dir = do_write ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+
next_sg:
if (unlikely(qc->cursg >= qc->n_elem)) {
/*
@@ -3173,7 +3179,7 @@ next_sg:
/* don't cross page boundaries */
count = min(count, (unsigned int)PAGE_SIZE - offset);
- buf = kmap(page) + offset;
+ buf = blk_kmap(page, dir) + offset;
bytes -= count;
qc->curbytes += count;
@@ -3189,7 +3195,7 @@ next_sg:
/* do the actual data transfer */
ata_data_xfer(ap, buf, count, do_write);
- kunmap(page);
+ blk_kunmap(page, dir);
if (bytes)
goto next_sg;
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index cfbceb5..8f8a19a 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -1372,7 +1372,8 @@ static unsigned int ata_scsi_rbuf_get(st
struct scatterlist *sg;
sg = (struct scatterlist *) cmd->request_buffer;
- buf = kmap_atomic(sg->page, KM_USER0) + sg->offset;
+ buf = blk_kmap_atomic(sg->page, KM_USER0, DMA_FROM_DEVICE);
+ buf += sg->offset;
buflen = sg->length;
} else {
buf = cmd->request_buffer;
@@ -1400,7 +1401,7 @@ static inline void ata_scsi_rbuf_put(str
struct scatterlist *sg;
sg = (struct scatterlist *) cmd->request_buffer;
- kunmap_atomic(buf - sg->offset, KM_USER0);
+ blk_kunmap_atomic(buf - sg->offset, KM_USER0, DMA_FROM_DEVICE);
}
}
--
1.0.6
Convert direct uses of kmap/unmap to blk_kmap/unmap in md. This,
combined with the previous bio helper change, fixes PIO cache
coherency bugs on architectures with aliased caches.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/md/raid1.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
c781a8f9320246a13cdc62610b314dcd3678266d
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a06ff91..6a940a1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -714,13 +714,19 @@ static struct page **alloc_behind_pages(
goto do_sync_io;
bio_for_each_segment(bvec, bio, i) {
+ void *src, *dst;
+
pages[i] = alloc_page(GFP_NOIO);
if (unlikely(!pages[i]))
goto do_sync_io;
- memcpy(kmap(pages[i]) + bvec->bv_offset,
- kmap(bvec->bv_page) + bvec->bv_offset, bvec->bv_len);
+
+ src = blk_kmap(bvec->bv_page, DMA_TO_DEVICE) + bvec->bv_offset;
+ dst = kmap(pages[i]) + bvec->bv_offset;
+
+ memcpy(dst, src, bvec->bv_len);
+
+ blk_kunmap(bvec->bv_page, DMA_TO_DEVICE);
kunmap(pages[i]);
- kunmap(bvec->bv_page);
}
return pages;
--
1.0.6
On Sat, Jan 14 2006, Tejun Heo wrote:
> Hello, all.
>
> This patchset tries to fix data corruption bug caused by not handling
> cache coherency during block PIO. This patch implements
> blk_kmap/unmap helpers which take extra @dir argument and perform
> appropriate coherency actions. These are to block PIO what dma_map/
> unmap are to block DMA transfers.
>
> IDE, libata, SCSI, rd and md are converted. Still left are nbd, loop
> and pktcdvd. If I missed something, please let me know.
>
> Russell, can you please test whether this fixes the bug on arm? If
> this fixes the bug and people agree with the approach, I'll follow up
> with patches for yet unconverted drivers and documentation update.
Tejun,
Patches look good; it's good to get this hole closed. Thanks!
--
Jens Axboe
On 1/13/06, Tejun Heo <[email protected]> wrote:
> Hello, all.
>
> This patchset tries to fix data corruption bug caused by not handling
> cache coherency during block PIO. This patch implements
> blk_kmap/unmap helpers which take extra @dir argument and perform
> appropriate coherency actions. These are to block PIO what dma_map/
> unmap are to block DMA transfers.
>
> IDE, libata, SCSI, rd and md are converted. Still left are nbd, loop
> and pktcdvd. If I missed something, please let me know.
>
> Russell, can you please test whether this fixes the bug on arm? If
> this fixes the bug and people agree with the approach, I'll follow up
> with patches for yet unconverted drivers and documentation update.
Looks fine, if it works you can add Acked-by: to the IDE patch.
Thanks!
Bartlomiej
On Sat, 2006-01-14 at 00:24 +0900, Tejun Heo wrote:
> This patchset tries to fix data corruption bug caused by not handling
> cache coherency during block PIO. This patch implements
> blk_kmap/unmap helpers which take extra @dir argument and perform
> appropriate coherency actions. These are to block PIO what dma_map/
> unmap are to block DMA transfers.
>
> IDE, libata, SCSI, rd and md are converted. Still left are nbd, loop
> and pktcdvd. If I missed something, please let me know.
>
> Russell, can you please test whether this fixes the bug on arm? If
> this fixes the bug and people agree with the approach, I'll follow up
> with patches for yet unconverted drivers and documentation update.
Actually, this doesn't look to be the correct thing to do. The
dma_map/unmap don't make the data coherent with respect to the user
space, only with respect to the kernel space. I've never liked this
(and indeed I wrote an OLS paper in 2004 trying to explain how we could
fix it) but that's our current model.
Our classic path for data on machines is that the driver makes the
kernel coherent and then whatever's transferring from the page cache to
the user makes user space coherent. It sounds, therefore, like
whatever's broken (what is the problem, by the way?) is broken in the
second half (page cache to user) not in the first half (driver to
kernel).
James
On Fri, Jan 13, 2006 at 09:50:19AM -0600, James Bottomley wrote:
> Actually, this doesn't look to be the correct thing to do. The
> dma_map/unmap don't make the data coherent with respect to the user
> space, only with respect to the kernel space. I've never liked this
> (and indeed I wrote an OLS paper in 2004 trying to explain how we could
> fix it) but that's our current model.
>
> Our classic path for data on machines is that the driver makes the
> kernel coherent and then whatever's transferring from the page cache to
> the user makes user space coherent. It sounds, therefore, like
> whatever's broken (what is the problem, by the way?) is broken in the
> second half (page cache to user) not in the first half (driver to
> kernel).
I think you're misunderstanding the issue. I'll give you essentially
my understanding of the explanation that Dave Miller gave me a number
of years ago. This is from memory, so Dave may wish to correct it.
1. When a driver DMAs data into a page cache page, it is written directly
to RAM and is made visible to the kernel mapping via the DMA API. As
a result, there will be no cache lines associated with the kernel
mapping at the point when the driver hands the page back to the page
cache.
However, in the PIO case, there is the possibility that the data read
from the device into the kernel mapping results in cache lines
associated with the page. Moreover, if the cache is write-allocate,
you _will_ have cache lines.
Therefore, you have two completely differing system states depending
on how the driver decided to transfer data from the device to the page
cache.
As such, drivers must ensure that PIO data transfers have the same
system state guarantees as DMA data transfers.
ISTR davem recommended flush_dcache_page() be used for this.
2. (this is my own) The cachetlb document specifies quite clearly what
is required whenever a page cache page is written to - that is
flush_dcache_page() is called. The situation when a driver uses PIO
quite clearly violates the requirements set out in that document.
From (2), it is quite clear that flush_dcache_page() is the correct
function to use; otherwise we would end up with a random set of page
states in the page cache. (1) merely reinforces that it's the
correct place for the decision to be made. In fact, it's the only
part of the kernel which _knows_ what needs to be done.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Fri, 2006-01-13 at 18:20 +0000, Russell King wrote:
> I think you're misunderstanding the issue. I'll give you essentially
> my understanding of the explanation that Dave Miller gave me a number
> of years ago. This is from memory, so Dave may wish to correct it.
>
> 1. When a driver DMAs data into a page cache page, it is written directly
> to RAM and is made visible to the kernel mapping via the DMA API. As
> a result, there will be no cache lines associated with the kernel
> mapping at the point when the driver hands the page back to the page
> cache.
Yes ... that's essentially what I said: DMA API makes us kernel
coherent. However, we explicitly *don't* mandate the way architectures
do this. It's certainly true, all the ones I know work by flushing the
kernel mapping cache lines, but I don't think you're entitled to rely on
this behaviour ... it's not inconceivable for large external cache
machines that the DMA could be done straight into the kernel cache.
> However, in the PIO case, there is the possibility that the data read
> from the device into the kernel mapping results in cache lines
> associated with the page. Moreover, if the cache is write-allocate,
> you _will_ have cache lines.
>
> Therefore, you have two completely differing system states depending
> on how the driver decided to transfer data from the device to the page
> cache.
>
> As such, drivers must ensure that PIO data transfers have the same
> system state guarantees as DMA data transfers.
>
> ISTR davem recommended flush_dcache_page() be used for this.
Ah ... perhaps this is the misunderstanding. To clear the kernel lines
associated with the page you use flush_kernel_dcache_page().
flush_dcache_page() is used to make a page cache page coherent with its
user mappings.
> 2. (this is my own) The cachetlb document specifies quite clearly what
> is required whenever a page cache page is written to - that is
> flush_dcache_page() is called. The situation when a driver uses PIO
> quite clearly violates the requirements set out in that document.
>
> From (2), it is quite clear that flush_dcache_page() is the correct
> function to use; otherwise we would end up with a random set of page
> states in the page cache. (1) merely reinforces that it's the
> correct place for the decision to be made. In fact, it's the only
> part of the kernel which _knows_ what needs to be done.
True, but your assumption that a driver should be doing this is what I'm
saying is incorrect. A driver's job is to deliver data coherently to
the kernel. The kernel's job is to deliver it coherently to the user.
Perhaps we should take this to linux-arch ... the audience there is well
versed in these arcane problems?
James
On Fri, Jan 13, 2006 at 12:35:24PM -0600, James Bottomley wrote:
> Perhaps we should take this to linux-arch ... the audience there is well
> versed in these arcane problems?
I think you need to wait for Dave Miller to reply and give a definitive
statement on how his cache coherency model is supposed to work in this
regard.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Sat, Jan 14, 2006 at 12:24:16AM +0900, Tejun Heo wrote:
> Russell, can you please test whether this fixes the bug on arm? If
> this fixes the bug and people agree with the approach, I'll follow up
> with patches for yet unconverted drivers and documentation update.
Unfortunately, as I previously explained, I'm not able to test this.
The reason is that in order to reproduce the bug, you need a system
with a VIVT write-back write-allocate cache.
Unfortunately, the few systems I have which have such a cache do not
have IDE, SCSI nor SATA (not even PCMCIA.) I suggest contacting the
folk who reported the bug in the first instance.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Fri, 2006-01-13 at 22:02 +0000, Russell King wrote:
> Unfortunately, as I previously explained, I'm not able to test this.
> The reason is that in order to reproduce the bug, you need a system
> with a VIVT write-back write-allocate cache.
>
> Unfortunately, the few systems I have which have such a cache do not
> have IDE, SCSI nor SATA (not even PCMCIA.) I suggest contacting the
> folk who reported the bug in the first instance.
Could someone explain (or give a reference to) the actual problem? If
it's a cache coherency issue it should show up with VIPT architectures
as well as VIVT ones ... I have access to parisc systems (with SCSI),
which are VIPT.
James
From: James Bottomley <[email protected]>
Date: Fri, 13 Jan 2006 16:38:49 -0600
> Could someone explain (or give a reference to) the actual problem? If
> it's a cache coherency issue it should show up with VIPT architectures
> as well as VIVT ones ... I have access to parisc systems (with SCSI),
> which are VIPT.
Not true, VIPT caches participate in cache coherency transactions
with the PCI host controller (and thus the PCI device), whereas
VIVT caches do not.
It does make a big difference; it's very hard to share data structures
with a device concurrently accessing them (what we call PCI consistent
DMA mappings) on a VIVT cache.
On Fri, 2006-01-13 at 14:43 -0800, David S. Miller wrote:
> Not true, VIPT caches participate in cache coherency transactions
> with the PCI host controller (and thus the PCI device), whereas
> VIVT caches do not.
Yes ... I understand that. However for VIPT caches, we can only specify
one Virtual Index to be coherent (and the user mappings and the kernel
are at different virtual indexes) so we specify the kernel mapping in
the current implementation. We rely on the page cache cache<->user
piece making the rest of the user mappings coherent. So on parisc we
come out of the dma_unmap with the kernel coherent but not userspace.
The content of these patches was to put a flush_dcache_page() in the PIO
path, which makes both kernel and user mappings fully coherent. If
there was a bug because that is necessary, then it should show up on the
VIPT machines like parisc because user space isn't currently coherent
after a dma_unmap. But like I asked, is there a pointer to the original
bug ... I really think I need to look at that.
> It does make a big difference; it's very hard to share data structures
> with a device concurrently accessing them (what we call PCI consistent
> DMA mappings) on a VIVT cache.
Yes ... I've always fancied a VIVT machine to play with ... but no-one's
ever sent me such a toy, sigh.
James
Tejun Heo wrote:
> Hello, all.
>
> This patchset tries to fix data corruption bug caused by not handling
> cache coherency during block PIO. This patch implements
> blk_kmap/unmap helpers which take extra @dir argument and perform
> appropriate coherency actions. These are to block PIO what dma_map/
> unmap are to block DMA transfers.
>
> IDE, libata, SCSI, rd and md are converted. Still left are nbd, loop
> and pktcdvd. If I missed something, please let me know.
>
> Russell, can you please test whether this fixes the bug on arm? If
> this fixes the bug and people agree with the approach, I'll follow up
> with patches for yet unconverted drivers and documentation update.
I ACK the libata portions, but I will let others ACK the overall
patchset goal (hopefully the arch people).
Jeff
On 1/13/06, Tejun Heo <[email protected]> wrote:
> Convert direct uses of kmap/unmap to blk_kmap/unmap in IDE. This
> combined with the previous bio helper change fixes PIO cache coherency
> bugs on architectures with aliased caches.
>
> Signed-off-by: Tejun Heo <[email protected]>
This series of patches makes booting from CF work on my PXA255 device. Thanks, Tejun.
Will these patches make 2.6.16?
Matt
Matt Reimer wrote:
> On 1/13/06, Tejun Heo <[email protected]> wrote:
>
>>Convert direct uses of kmap/unmap to blk_kmap/unmap in IDE. This
>>combined with the previous bio helper change fixes PIO cache coherency
>>bugs on architectures with aliased caches.
>>
>>Signed-off-by: Tejun Heo <[email protected]>
>
>
> This series of patches makes booting from CF work on my PXA255 device. Thanks, Tejun.
>
> Will these patches make 2.6.16?
>
Unfortunately, this patchset has some pending issues and probably should
be respun with another approach, although I'm currently
not sure what that approach should be. :-(
I'll try to do something. Thanks.
--
tejun
On Wed, Feb 15, 2006 at 11:05:02AM +0900, Tejun Heo wrote:
> Matt Reimer wrote:
> >On 1/13/06, Tejun Heo <[email protected]> wrote:
> >
> >>Convert direct uses of kmap/unmap to blk_kmap/unmap in IDE. This
> >>combined with the previous bio helper change fixes PIO cache coherency
> >>bugs on architectures with aliased caches.
> >>
> >>Signed-off-by: Tejun Heo <[email protected]>
> >
> >
> >This series of patches makes booting from CF on my PXA255 device. Thanks
> >Tejun.
> >
> >Will these patches make 2.6.16?
> >
>
> Unfortunately, this patchset has some pending issues and probably should
> be respun with another approach, although I'm currently
> not sure what that approach should be. :-(
>
> I'll try to do something. Thanks.
I think that's a mistake. Yes, James has decided to object, but I
think that James' objections are unfounded.
Since James doesn't even have a machine which shows this bug, it's
rather convenient for him to object and effectively stand in the way
of having the bug being fixed.
Or that's how I'm reading the current impasse on these patches.
Linus - can we merge Tejun's patches so that we have an IDE subsystem
which works on ARM platforms please? If James wants to come up with
another solution later on, I'm sure we can transition all drivers
over to that new solution once we know what it is. Until then, can
we please fix the bug?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Thu, 16 Feb 2006, Russell King wrote:
>
> Linus - can we merge Tejun's patches so that we have an IDE subsystem
> which works on ARM platforms please? If James wants to come up with
> another solution later on, I'm sure we can transition all drivers
> over to that new solution once we know what it is. Until then, can
> we please fix the bug?
I'm assuming this isn't a regression, and that it's just been that way
forever. If so, I'm going to vote for it being merged after 2.6.16 is out.
We don't need more new stuff, and I think PXA users can either apply the
patches themselves, or wait a few weeks more..
But if this is actually a regression from 2.6.15, I want to know more
about it.
Linus
On Thu, Feb 16, 2006 at 10:10:59AM -0800, Linus Torvalds wrote:
> On Thu, 16 Feb 2006, Russell King wrote:
> > Linus - can we merge Tejun's patches so that we have an IDE subsystem
> > which works on ARM platforms please? If James wants to come up with
> > another solution later on, I'm sure we can transition all drivers
> > over to that new solution once we know what it is. Until then, can
> > we please fix the bug?
>
> I'm assuming this isn't a regression, and that it's just been that way
> forever. If so, I'm going to vote for it being merged after 2.6.16 is out.
> We don't need more new stuff, and I think PXA users can either apply the
> patches themselves, or wait a few weeks more..
>
> But if this is actually a regression from 2.6.15, I want to know more
> about it.
It's a long standing bug, so given that it can wait for post 2.6.16.
I just didn't want to have this situation continuing indefinitely.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Fri, Jan 13, 2006 at 07:06:14PM +0000, Russell King wrote:
> On Fri, Jan 13, 2006 at 12:35:24PM -0600, James Bottomley wrote:
> > Perhaps we should take this to linux-arch ... the audience there is well
> > versed in these arcane problems?
>
> I think you need to wait for Dave Miller to reply and give a definitive
> statement on how his cache coherency model is supposed to work in this
> regard.
>
Hello, all.
This thread has been dead for quite some time, mainly because I didn't
know what to do. As it is a real outstanding bug that keeps bugging
people, and Matt Reimer thankfully reminded me of it[1], I'm taking
another shot at resolving this.
People seem to agree that it is the responsibility of the driver to
make sure read data gets to the page cache page (or whatever kernel
page). Only the driver knows how and when.
The objection raised by James Bottomley is that although syncing the
kernel page is the responsibility of the driver, syncing the user page is
not; thus, the use of flush_dcache_page() is excessive. James suggested
the use of flush_kernel_dcache_page().
I also asked a similar question[2] on lkml, and Russell replied that,
depending on the arch implementation, it shouldn't be much of a problem[3].
Another thing to consider is that all other drivers which currently
manage cache coherency use flush_dcache_page().
So, the questions are...
q1. James, aside from the use of flush_dcache_page(), do you agree
with the block layer kmap/kunmap API?
q2. Is flush_kernel_dcache_page() the correct one?
Whether or not flush_kernel_dcache_page() is the right one, I think
we should first go with flush_dcache_page(), as that's what drivers
have been doing up to this point. Switching from flush_dcache_page()
to flush_kernel_dcache_page() is very easy and should be done in a
separate patch anyway. No?
Another thing to keep in mind is that this problem is not limited to
block drivers. All the code that performs writes to kmap'ed pages takes
care of synchronization itself, and the popular choice seems to be
flush_dcache_page().
IMHO, the kmap API should have a flag or something to tell it how the
page is being used so that the kmap API can take care of synchronization
like the DMA mapping API does, rather than scattering sync code all over
the kernel. And if that's the right thing to do, some of the blk kmap
wrappers can/should be removed.
What do you guys think?
Thanks.
--
tejun
[1] http://article.gmane.org/gmane.linux.kernel/379304
[2] http://article.gmane.org/gmane.linux.kernel/360324 (the last question)
[3] http://article.gmane.org/gmane.linux.kernel/365607
On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
> The objection raised by James Bottomley is that although syncing the
> kernel page is the responsibility of the driver, syncing the user page is
> not; thus, the use of flush_dcache_page() is excessive. James suggested
> the use of flush_kernel_dcache_page().
The problem is that it's not only excessive, it would entangle us with
mm locking. Basically, all you want to ensure is that the underlying
memory has the information after you're done (rather than the CPU
cache); flush_kernel_dcache_page() will achieve this. The block layer
itself takes care of user space coherency.
> I also asked similar question[2] on lkml and Russell replied that
> depending on arch implementation it shouldn't be much of a problem[3].
> Another thing to consider is that all other drivers which currently
> manage cache coherency use flush_dcache_page().
>
> So, the questions are...
>
> q1. James, aside from the use of flush_dcache_page(), do you agree
> with the block layer kmap/kunmap API?
Yes.
> q2. Is flush_kernel_dcache_page() the correct one?
Yes.
> Whether or not flush_kernel_dcache_page() is the right one, I think
> we should first go with flush_dcache_page(), as that's what drivers
> have been doing up to this point. Switching from flush_dcache_page()
> to flush_kernel_dcache_page() is very easy and should be done in a
> separate patch anyway. No?
No. On parisc, flush_dcache_mm_lock is a write_lock_irq ... that will
enable interrupts after it's done (don't ask me why ... I think Hugh did
this). A lot of kmap API callers seem to be in interrupt context and may
have irqs off, so re-enabling would be impolite.
> Another thing to keep in mind is that this problem is not limited to
> block drivers. All the code that performs writes to kmap'ed pages takes
> care of synchronization itself, and the popular choice seems to be
> flush_dcache_page().
>
> IMHO, kmap API should have a flag or something to tell it how the page
> is being used such that kmap API can take care of synchronization like
> dma mapping API does rather than scattering sync code all over the
> kernel. And if that's the right thing to do, some of blk kmap
> wrappers can/should be removed.
you mean something like kmap_to_device/cpu (or perhaps
kmap_for_read/write?)
James
On Thu, Mar 02, 2006 at 12:46:28PM -0600, James Bottomley wrote:
> On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
> > The objection raised by James Bottomley is that although syncing the
> > kernel page is the responsibility of the driver, syncing the user page is
> > not; thus, the use of flush_dcache_page() is excessive. James suggested
> > the use of flush_kernel_dcache_page().
>
> The problem is that it's not only excessive, it would entangle us with
> mm locking. Basically, all you want to ensure is that the underlying
> memory has the information after you're done (rather than the CPU
> cache); flush_kernel_dcache_page() will achieve this. The block layer
> itself takes care of user space coherency.
Your understanding of the problem on ARM remains fundamentally flawed.
I see no way to resolve this since you don't seem to listen or accept
my reasoning.
Therefore, message I'm getting from you is that we are not allowed to
have an ARM system which can possibly work correctly with PIO.
As a result, I have no further interest in trying to resolve this issue,
period. ARM people will just have to accept that PIO mode IDE drivers
just will not be an option.
Thanks.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Wed, Feb 22 2006, Tejun Heo wrote:
> On Fri, Jan 13, 2006 at 07:06:14PM +0000, Russell King wrote:
> > On Fri, Jan 13, 2006 at 12:35:24PM -0600, James Bottomley wrote:
> > > Perhaps we should take this to linux-arch ... the audience there is well
> > > versed in these arcane problems?
> >
> > I think you need to wait for Dave Miller to reply and give a definitive
> > statement on how his cache coherency model is supposed to work in this
> > regard.
> >
>
> Hello, all.
>
> This thread has been dead for quite some time mainly because I didn't
> know what to do. As it is a real outstanding bug bugging people and
> Matt Reimer thankfully reminded me[1], I'm giving another shot at
> resolving this.
>
> People seem to agree that it is the responsibility of the driver to
> make sure read data gets to the page cache page (or whatever kernel
> page). Only driver knows how and when.
>
> The objection raised by James Bottomley is that although syncing the
> kernel page is the responsibility of the driver, syncing the user page is
> not; thus, the use of flush_dcache_page() is excessive. James suggested
> the use of flush_kernel_dcache_page().
>
> I also asked similar question[2] on lkml and Russell replied that
> depending on arch implementation it shouldn't be much of a problem[3].
> Another thing to consider is that all other drivers which currently
> manage cache coherency use flush_dcache_page().
>
> So, the questions are...
>
> q1. James, aside from the use of flush_dcache_page(), do you agree
> with the block layer kmap/kunmap API?
>
> q2. Is flush_kernel_dcache_page() the correct one?
>
> Whether or not flush_kernel_dcache_page() is the right one, I think
> we should first go with flush_dcache_page(), as that's what drivers
> have been doing up to this point. Switching from flush_dcache_page()
> to flush_kernel_dcache_page() is very easy and should be done in a
> separate patch anyway. No?
Fully agree. The API is the right one in my opinion, just simple
wrapping of the mapping and flushing when appropriate.
> Another thing to keep in mind is that this problem is not limited to
> block drivers. All the code that performs writes to kmap'ed pages takes
> care of synchronization itself, and the popular choice seems to be
> flush_dcache_page().
It's probably not widely seen outside of IDE, I would guess.
> IMHO, kmap API should have a flag or something to tell it how the page
> is being used such that kmap API can take care of synchronization like
> dma mapping API does rather than scattering sync code all over the
> kernel. And if that's the right thing to do, some of blk kmap
> wrappers can/should be removed.
>
> What do you guys think?
Might be even better. Shove the regular kmap_atomic() into
__kmap_atomic(), add a kmap_atomic() that just calls that, and add a
kmap_atomic_io() with an added flag for direction. Or an even easier
'hack': use a separate KM type for these types of mappings and flush
always. That might need both IRQ and non-IRQ versions, so it's probably
not the best idea. I'm inclined to vote for the first suggestion.
The point of it all is that there's no point in making this too fancy or
optimized; it would be a waste of time for an utterly slow and CPU
intensive path anyway.
--
Jens Axboe
On Thu, 2006-03-02 at 20:30 +0000, Russell King wrote:
> Your understanding of the problem on ARM remains fundamentally flawed.
> I see no way to resolve this since you don't seem to listen or accept
> my reasoning.
You have advanced no reason for using flush_dcache_page() instead of
flush_kernel_dcache_page() except this:
> ISTR davem recommended flush_dcache_page() be used for this.
and this:
> 2. (this is my own) The cachetlb document specifies quite clearly
> what
> is required whenever a page cache page is written to - that is
> flush_dcache_page() is called. The situation when a driver uses
> PIO
> quote clearly violates the requirements set out in that document.
>
Your problem, as you state:
> However, in the PIO case, there is the possibility that the data
> read
> from the device into the kernel mapping results in cache lines
> associated with the page. Moreover, if the cache is
> write-allocate,
> you _will_ have cache lines.
>
> Therefore, you have two completely differing system states
> depending
> on how the driver decided to transfer data from the device to the
> page
> cache.
It is my contention that flush_kernel_dcache_page() ejects the cache
lines that may be dirty in the kernel mapping and makes the underlying
memory coherent again. This is the same net effect as a DMA transfer
(data in memory but not in kernel cache).
> Therefore, message I'm getting from you is that we are not allowed to
> have an ARM system which can possibly work correctly with PIO.
>
> As a result, I have no further interest in trying to resolve this issue,
> period. ARM people will just have to accept that PIO mode IDE drivers
> just will not be an option.
Could you actually address the argument instead of getting all huffy?
James
On Thu, Mar 02 2006, Russell King wrote:
> On Thu, Mar 02, 2006 at 12:46:28PM -0600, James Bottomley wrote:
> > On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
> > > The objection raised by James Bottomley is that although syncing the
> > > kernel page is the responsibility of the driver, syncing the user page is
> > > not; thus, the use of flush_dcache_page() is excessive. James suggested
> > > the use of flush_kernel_dcache_page().
> >
> > The problem is that it's not only excessive, it would entangle us with
> > mm locking. Basically, all you want to ensure is that the underlying
> > memory has the information after you're done (rather than the CPU
> > cache); flush_kernel_dcache_page() will achieve this. The block layer
> > itself takes care of user space coherency.
>
> Your understanding of the problem on ARM remains fundamentally flawed.
> I see no way to resolve this since you don't seem to listen or accept
> my reasoning.
>
> Therefore, message I'm getting from you is that we are not allowed to
> have an ARM system which can possibly work correctly with PIO.
>
> As a result, I have no further interest in trying to resolve this issue,
> period. ARM people will just have to accept that PIO mode IDE drivers
> just will not be an option.
Hey Russell, calm down, let's get this thing fixed in the easiest and
least intrusive way for 2.6.17. As mentioned before, this isn't actually
a new problem by any stretch, so a 2.6.17 solution would be acceptable to
you, I hope.
What do you think of kmap_atomic_pio() (I'm notoriously bad at names,
but it should get the point across) and kunmap_atomic_pio(), the latter
accepting a read/write flag to note if we wrote to a vm page?
This is basically Tejun's original patch set, just moving it out of the
block layer so it's a general exported property of the kmap API.
--
Jens Axboe
On Thu, Mar 02, 2006 at 02:43:59PM -0600, James Bottomley wrote:
> On Thu, 2006-03-02 at 20:30 +0000, Russell King wrote:
> > Therefore, message I'm getting from you is that we are not allowed to
> > have an ARM system which can possibly work correctly with PIO.
> >
> > As a result, I have no further interest in trying to resolve this issue,
> > period. ARM people will just have to accept that PIO mode IDE drivers
> > just will not be an option.
>
> Could you actually address the argument instead of getting all huffy?
I'm sorry, I've addressed the argument, and as far as I'm concerned,
there's nothing more I have to usefully contribute to this thread.
Sorry, you've worn me out, you've won.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
> This thread has been dead for quite some time mainly because I didn't
> know what to do. As it is a real outstanding bug bugging people and
> Matt Reimer thankfully reminded me[1], I'm giving another shot at
> resolving this.
>
> People seem to agree that it is the responsibility of the driver to
> make sure read data gets to the page cache page (or whatever kernel
> page). Only driver knows how and when.
>
> The objection raised by James Bottomley is that although syncing the
> kernel page is the responsibility of the driver, syncing the user page is
> not; thus, use of flush_dcache_page() is excessive. James suggested
> use of flush_kernel_dcache_page().
>
> I also asked similar question[2] on lkml and Russell replied that
> depending on arch implementation it shouldn't be much of a problem[3].
> Another thing to consider is that all other drivers which currently
> manage cache coherency use flush_dcache_page().
>
> So, the questions are...
>
> q1. James, aside from the use of flush_dcache_page(), do you agree
> with the block layer kmap/kunmap API?
>
> q2. Is flush_kernel_dcache_page() the correct one?
>
> Whichever one turns out to be right, I think we should first go with
> flush_dcache_page(), as that's what drivers have been doing up to
> this point. Switching from flush_dcache_page() to
> flush_kernel_dcache_page() is very easy and should be done in a
> separate patch anyway. No?
>
> Another thing to keep in mind is that this problem is not limited to
> block drivers. All the code that performs writes to kmap'ed pages
> takes care of synchronization itself, and the popular choice seems to
> be flush_dcache_page().
>
> IMHO, the kmap API should have a flag or something to tell it how the
> page is being used, so that the kmap API can take care of
> synchronization like the DMA mapping API does, rather than scattering
> sync code all over the kernel. And if that's the right thing to do,
> some of the blk kmap wrappers can/should be removed.
>
> What do you guys think?
Here's my proposal to break this logjam.
I'm proposing introducing a new memory coherency API:
flush_kernel_dcache_page()
Which would be tasked with bringing cache coherency back to the kernel's
image of a user page after the kernel has modified it. On parisc this
will be a simple flush through the kernel cache.
I think on arm this should be implemented as
__cpuc_flush_dcache_page(page_address(page))
but you can implement this as flush_dcache_page() if you wish (I warn
you now that you have the same flush_dcache_mmap_lock problem that we
have on parisc, so if you do this, you'll return from
flush_dcache_page() with interrupts enabled, but at least this will no
longer be a parisc problem).
If everyone's happy with this approach, I'll take it over to linux-arch.
James
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 4ae4188..6232dd7 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -362,6 +362,15 @@ maps this page at its virtual address.
likely that you will need to flush the instruction cache
for copy_to_user_page().
+ void flush_kernel_dcache_page(struct page *page)
+ When the kernel needs to modify a user page it has obtained
+ with kmap, it calls this function after all modifications are
+ complete (but before kunmapping it) to bring the underlying
+ page up to date. It is assumed here that the user has no
+ incoherent cached copies (i.e. the original page was obtained
+ from a mechanism like get_user_pages())
+
+
void flush_icache_range(unsigned long start, unsigned long end)
When the kernel stores into addresses that it will execute
out of (eg when loading modules), this function is called.
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 6bece92..157bd2f 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -7,6 +7,12 @@
#include <asm/cacheflush.h>
+#ifndef ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
+static inline void flush_kernel_dcache_page(struct page *page)
+{
+}
+#endif
+
#ifdef CONFIG_HIGHMEM
#include <asm/highmem.h>
James Bottomley wrote:
> On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
>> This thread has been dead for quite some time mainly because I didn't
>> know what to do. As it is a real outstanding bug bugging people and
>> Matt Reimer thankfully reminded me[1], I'm giving another shot at
>> resolving this.
>>
>> People seem to agree that it is the responsibility of the driver to
>> make sure read data gets to the page cache page (or whatever kernel
>> page). Only the driver knows how and when.
>>
>> The objection raised by James Bottomley is that although syncing the
>> kernel page is the responsibility of the driver, syncing the user page is
>> not; thus, use of flush_dcache_page() is excessive. James suggested
>> use of flush_kernel_dcache_page().
>>
>> I also asked similar question[2] on lkml and Russell replied that
>> depending on arch implementation it shouldn't be much of a problem[3].
>> Another thing to consider is that all other drivers which currently
>> manage cache coherency use flush_dcache_page().
>>
>> So, the questions are...
>>
>> q1. James, aside from the use of flush_dcache_page(), do you agree
>> with the block layer kmap/kunmap API?
>>
>> q2. Is flush_kernel_dcache_page() the correct one?
>>
>> Whichever one turns out to be right, I think we should first go with
>> flush_dcache_page(), as that's what drivers have been doing up to
>> this point. Switching from flush_dcache_page() to
>> flush_kernel_dcache_page() is very easy and should be done in a
>> separate patch anyway. No?
>>
>> Another thing to keep in mind is that this problem is not limited to
>> block drivers. All the code that performs writes to kmap'ed pages
>> takes care of synchronization itself, and the popular choice seems to
>> be flush_dcache_page().
>>
>> IMHO, the kmap API should have a flag or something to tell it how the
>> page is being used, so that the kmap API can take care of
>> synchronization like the DMA mapping API does, rather than scattering
>> sync code all over the kernel. And if that's the right thing to do,
>> some of the blk kmap wrappers can/should be removed.
>>
>> What do you guys think?
>
> Here's my proposal to break this logjam.
>
> I'm proposing introducing a new memory coherency API:
>
> flush_kernel_dcache_page()
>
> Which would be tasked with bringing cache coherency back to the kernel's
> image of a user page after the kernel has modified it. On parisc this
> will be a simple flush through the kernel cache.
>
> I think on arm this should be implemented as
>
> __cpuc_flush_dcache_page(page_address(page))
>
> but you can implement this as flush_dcache_page() if you wish (I warn
> you now that you have the same flush_dcache_mmap_lock problem that we
> have on parisc, so if you do this, you'll return from
> flush_dcache_page() with interrupts enabled, but at least this will no
> longer be a parisc problem).
>
> If everyone's happy with this approach, I'll take it over to linux-arch.
>
> James
>
> diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
> index 4ae4188..6232dd7 100644
> --- a/Documentation/cachetlb.txt
> +++ b/Documentation/cachetlb.txt
> @@ -362,6 +362,15 @@ maps this page at its virtual address.
> likely that you will need to flush the instruction cache
> for copy_to_user_page().
>
> + void flush_kernel_dcache_page(struct page *page)
> + When the kernel needs to modify a user page it has obtained
> + with kmap, it calls this function after all modifications are
> + complete (but before kunmapping it) to bring the underlying
> + page up to date. It is assumed here that the user has no
> + incoherent cached copies (i.e. the original page was obtained
> + from a mechanism like get_user_pages())
> +
> +
> void flush_icache_range(unsigned long start, unsigned long end)
> When the kernel stores into addresses that it will execute
> out of (eg when loading modules), this function is called.
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 6bece92..157bd2f 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -7,6 +7,12 @@
>
> #include <asm/cacheflush.h>
>
> +#ifndef ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
> +static inline void flush_kernel_dcache_page(struct page *page)
> +{
> +}
> +#endif
> +
> #ifdef CONFIG_HIGHMEM
>
> #include <asm/highmem.h>
>
>
Okay by me, although I think we also need to fix flush_dcache_page()
so that it doesn't unexpectedly enable irqs on return. That's just
broken. I'll make some kmap wrappers so that callers don't have to try
too hard to get synchronization right.
Thanks.
--
tejun
On Tue, 2006-03-21 at 01:26 +0900, Tejun Heo wrote:
> Okay by me, although I think we also need to fix flush_dcache_page()
> such that it doesn't abruptly enable irq after itself. That's just
> broken. I'll make some kmap wrappers such that callers don't have to try
> too hard to get synchronization correct.
That would involve fixing all of the flush_dcache_mmap_lock/unlock()
wrappers to take an extra flags variable (which would be unused in the
null implementations) ... it can be done, but it's a lot of work. I
think, since none of the current users are in atomic context, we
shouldn't bother unless anyone can see a real need to call it from
atomic context.
James
James Bottomley wrote:
> On Tue, 2006-03-21 at 01:26 +0900, Tejun Heo wrote:
>> Okay by me, although I think we also need to fix flush_dcache_page()
>> such that it doesn't abruptly enable irq after itself. That's just
>> broken. I'll make some kmap wrappers such that callers don't have to try
>> too hard to get synchronization correct.
>
> That would involve fixing all of the flush_dcache_mmap_lock/unlock()
> wrappers to take an extra flags variable (which would be unused on null
> implementations) ... it can be done, but it's a lot of work; I think,
> since all the current users aren't in atomic context, that we shouldn't
> bother unless anyone can see a real need to call it from atomic context.
>
Hmmm... if fixing is too much work, how about adding WARN_ON() or one of
its friends if we enter flush_dcache_page() from atomic context?
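Sketched in plain C, such a guard might look as follows. This is only a model of the idea: the in_atomic_ctx flag stands in for the kernel's in_atomic()/irqs_disabled() checks, and the actual flush is elided.

```c
#include <assert.h>
#include <stdio.h>

/* Model of the suggested debug check.  in_atomic_ctx stands in for the
 * kernel's in_atomic()/irqs_disabled() tests; warned records whether
 * the WARN_ON()-style check fired. */
static int in_atomic_ctx;
static int warned;

static void warn_on(int cond)
{
	if (cond) {
		warned = 1;
		fprintf(stderr,
			"WARNING: flush_dcache_page() called from atomic context\n");
	}
}

static void flush_dcache_page_model(void *page)
{
	/* Catch callers that would trip the irq-enable problem before
	 * the (elided) flush runs. */
	warn_on(in_atomic_ctx);
	(void)page;
}
```

If the warning never triggers in practice, that supports James's position that fixing the locking isn't worth the effort.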
--
tejun
On Mon, 20 Mar 2006 10:12:51 -0600 James Bottomley wrote:
> On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
> > This thread has been dead for quite some time mainly because I didn't
> > know what to do. As it is a real outstanding bug bugging people and
> > Matt Reimer thankfully reminded me[1], I'm giving another shot at
> > resolving this.
> >
> > People seem to agree that it is the responsibility of the driver to
> > make sure read data gets to the page cache page (or whatever kernel
> > page). Only the driver knows how and when.
> >
> > The objection raised by James Bottomley is that although syncing the
> > kernel page is the responsibility of the driver, syncing the user page is
> > not; thus, use of flush_dcache_page() is excessive. James suggested
> > use of flush_kernel_dcache_page().
> >
> > I also asked similar question[2] on lkml and Russell replied that
> > depending on arch implementation it shouldn't be much of a problem[3].
> > Another thing to consider is that all other drivers which currently
> > manage cache coherency use flush_dcache_page().
> >
> > So, the questions are...
> >
> > q1. James, aside from the use of flush_dcache_page(), do you agree
> > with the block layer kmap/kunmap API?
> >
> > q2. Is flush_kernel_dcache_page() the correct one?
> >
> > Whichever one turns out to be right, I think we should first go with
> > flush_dcache_page(), as that's what drivers have been doing up to
> > this point. Switching from flush_dcache_page() to
> > flush_kernel_dcache_page() is very easy and should be done in a
> > separate patch anyway. No?
> >
> > Another thing to keep in mind is that this problem is not limited to
> > block drivers. All the code that performs writes to kmap'ed pages
> > takes care of synchronization itself, and the popular choice seems to
> > be flush_dcache_page().
> >
> > IMHO, the kmap API should have a flag or something to tell it how the
> > page is being used, so that the kmap API can take care of
> > synchronization like the DMA mapping API does, rather than scattering
> > sync code all over the kernel. And if that's the right thing to do,
> > some of the blk kmap wrappers can/should be removed.
> >
> > What do you guys think?
>
> Here's my proposal to break this logjam.
>
> I'm proposing introducing a new memory coherency API:
>
> flush_kernel_dcache_page()
>
> Which would be tasked with bringing cache coherency back to the kernel's
> image of a user page after the kernel has modified it. On parisc this
> will be a simple flush through the kernel cache.
>
> I think on arm this should be implemented as
>
> __cpuc_flush_dcache_page(page_address(page))
>
> but you can implement this as flush_dcache_page() if you wish (I warn
> you now that you have the same flush_dcache_mmap_lock problem that we
> have on parisc, so if you do this, you'll return from
> flush_dcache_page() with interrupts enabled, but at least this will no
> longer be a parisc problem).
>
> If everyone's happy with this approach, I'll take it over to linux-arch.
why is that the right place for it?
---
~Randy
On Tue, 2006-03-21 at 01:40 +0900, Tejun Heo wrote:
> Hmmm... if fixing is too much work, how about adding WARN_ON() or one of
> its friends if we enter flush_dcache_page() from atomic context?
Suits me ... if it ever triggers someone else can look at fixing it ...
James
On Mon, 2006-03-20 at 08:50 -0800, Randy.Dunlap wrote:
> > If everyone's happy with this approach, I'll take it over to
> linux-arch.
>
> why is that the right place for it?
Because the implementation details of flush_kernel_dcache_page() are
arch specific, so linux-arch is the list to notify about it.
James
On Tue, 2006-03-21 at 01:26 +0900, Tejun Heo wrote:
> Okay by me, although I think we also need to fix flush_dcache_page()
> such that it doesn't abruptly enable irq after itself. That's just
> broken. I'll make some kmap wrappers such that callers don't have to try
> too hard to get synchronization correct.
Andrew's taken the flush_kernel_dcache_page() changes, so if you can
re-roll your kmap/kunmap patch in the next week or so (before Linus goes
-rc) we might get it into 2.6.17.
James
On Thu, 2 Mar 2006, Jens Axboe wrote:
> On Thu, Mar 02 2006, Russell King wrote:
> > On Thu, Mar 02, 2006 at 12:46:28PM -0600, James Bottomley wrote:
> > > On Wed, 2006-02-22 at 17:27 +0900, Tejun Heo wrote:
> > > > The objection raised by James Bottomley is that although syncing the
> > > > kernel page is the responsibility of the driver, syncing the user page is
> > > > not; thus, use of flush_dcache_page() is excessive. James suggested
> > > > use of flush_kernel_dcache_page().
> > >
> > > The problem is that it's not only excessive, it would entangle us with
> > > mm locking. Basically, all you want to ensure is that the underlying
> > > memory has the information after you've done (rather than the CPU
> > > cache), flush_kernel_dcache_page() will achieve this. The block layer
> > > itself takes care of user space coherency.
> >
> > Your understanding of the problem on ARM remains fundamentally flawed.
> > I see no way to resolve this since you don't seem to listen or accept
> > my reasoning.
> >
> > Therefore, message I'm getting from you is that we are not allowed to
> > have an ARM system which can possibly work correctly with PIO.
> >
> > As a result, I have no further interest in trying to resolve this issue,
> > period. ARM people will just have to accept that PIO mode IDE drivers
> > just will not be an option.
>
> Hey Russell calm down, lets get this thing fixed in the easiest and
> least intrusive way for 2.6.17. As mentioned before, this isn't actually
> a new problem by any stretch, a 2.6.17 solution would be acceptable to
> you I hope.
Has any discussion about this problem led to some consensus?
> What do you think of the kmap_atomic_pio() (notoriously bad at names,
> but it should get the point across) and kunmap_atomic_pio(), the latter
> accepting a read/write flag to note if we wrote to a vm page?
>
> This is basically Tejuns original patch set, just moving it out of the
> block layer so it's a generel exported property of the kmap api.
What was the problem, again, with Tejun's patchset that RMK (and many
others) agreed to?
I do have hardware that exhibits the problem and therefore I wish the
discussion could be resumed.
Nicolas
Nicolas Pitre wrote:
> Has any discussion about this problem lead to some consensus?
Argghhh.. I completely forgot about this.
>> What do you think of the kmap_atomic_pio() (notoriously bad at names,
>> but it should get the point across) and kunmap_atomic_pio(), the latter
>> accepting a read/write flag to note if we wrote to a vm page?
>>
>> This is basically Tejuns original patch set, just moving it out of the
>> block layer so it's a generel exported property of the kmap api.
>
> What was the problem with Tejun's patchset already to which RMK (and
> many others) agreed?
>
> I do have hardware that exhibits the problem and therefore I wish the
> discussion could be resumed.
I'll follow up soon. Sorry.
--
tejun
On Tue, 30 May 2006, Tejun Heo wrote:
> Nicolas Pitre wrote:
> > Has any discussion about this problem lead to some consensus?
>
> Argghhh.. I completely forgot about this.
>
> > > What do you think of the kmap_atomic_pio() (notoriously bad at names,
> > > but it should get the point across) and kunmap_atomic_pio(), the latter
> > > accepting a read/write flag to note if we wrote to a vm page?
> > >
> > > This is basically Tejuns original patch set, just moving it out of the
> > > block layer so it's a generel exported property of the kmap api.
> >
> > What was the problem with Tejun's patchset already to which RMK (and many
> > others) agreed?
> >
> > I do have hardware that exhibits the problem and therefore I wish the
> > discussion could be resumed.
Partly to add myself to the cc-list, partly to add some fuel to the fire -
there have been a few discussions on arm-kernel recently, one of them
http://marc.theaimsgroup.com/?t=114136178100001&r=1&w=2
where at least you can get some more test results / failure pictures.
However, many have also stated that the patch set from Tejun,
unfortunately, doesn't fix 100% of IDE PIO cache coherency problems on
ARM.
Thanks for bringing this back up
Guennadi
---
Guennadi Liakhovetski
On Tue, 30 May 2006, Guennadi Liakhovetski wrote:
> On Tue, 30 May 2006, Tejun Heo wrote:
>
> > Nicolas Pitre wrote:
> > > I do have hardware that exhibits the problem and therefore I wish the
> > > discussion could be resumed.
>
> Partly to add myself to the cc-list, partly to add some oil in the fire -
> there have been a few discussions on arm-kernel recently, one of them
>
> http://marc.theaimsgroup.com/?t=114136178100001&r=1&w=2
>
> where at least you can get some more test results / failure pictures.
> However, many have also stated that the patch set from Tejun,
> unfortunately, doesn't fix 100% of IDE PIO cache coherency problems on
> ARM.
The other problem is probably due to a mistake in the interpretation of
some XScale document and if so is easily fixable (actually one bit
difference in the page table).
The much more fundamental issue, dirty lines left in a VIVT cache after
PIO that user space then fails to see, is a real and generic design
problem in its own right.
Nicolas
The IDE string ops need to flush the cache. We've been doing that
on sparc64 for more than 5 years... Otherwise, PIO transfers do
not "behave" like DMA.
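The pattern being described, string reads from the data register followed by a flush so the result "behaves" like DMA, can be modelled in plain C as below. read_data_reg() and flush_page_stub() are stand-ins for readw()/insw() and flush_kernel_dcache_page(); all names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stub for the IDE data register: yields an incrementing word stream. */
static uint16_t next_word;
static uint16_t read_data_reg(void) { return next_word++; }

/* Stub for flush_kernel_dcache_page(); records that a flush happened. */
static int page_flushed;
static void flush_page_stub(void *page) { (void)page; page_flushed = 1; }

/* insw()-style PIO read of one chunk into a (kmapped) page, followed by
 * the flush that makes the transfer look like DMA to the rest of the
 * system: without it, a VIVT or aliasing cache can keep the freshly
 * read data invisible to other mappings of the same page. */
static void pio_read_words(uint16_t *buf, size_t words)
{
	for (size_t i = 0; i < words; i++)
		buf[i] = read_data_reg();
	flush_page_stub(buf);
}
```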