2006-09-11 23:00:36

by Dan Williams

Subject: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

Neil,

The following patches implement hardware accelerated raid5 for the Intel
Xscale® series of I/O Processors. The MD changes allow stripe
operations to run outside the spin lock in a work queue. Hardware
acceleration is achieved by using a dma-engine-aware work queue routine
instead of the default software-only routine.

Since the last release of the raid5 changes, many bug fixes and other
improvements have been made as a result of stress testing. See the
per-patch change logs for details on what was fixed. This is the first
release of the full dma implementation.

The patches touch three areas: the md-raid5 driver, the generic dmaengine
interface, and a platform device driver for IOPs. The raid5 changes
follow your comments about making the acceleration implementation
similar to how the stripe cache handles I/O requests. The dmaengine
changes are the second release of this code; they expand the interface
to handle more than memcpy operations and add a generic raid5-dma
client. The iop-adma driver supports dma memcpy, xor, xor zero sum, and
memset across all IOP architectures (32x, 33x, and 13xx).

Regarding the context-switching performance concerns raised with the
previous release, I have observed the following. In the hardware
accelerated case performance is always better with the work queue than
without, since it allows multiple stripes to be operated on
simultaneously. I expect the same on an SMP platform, but so far my
testing has been limited to IOPs. For a single-processor
non-accelerated configuration I have not observed performance
degradation with work queue support enabled, but in the Kconfig option
help text I still recommend disabling it (CONFIG_MD_RAID456_WORKQUEUE).

Please consider the patches for -mm.

-Dan

[PATCH 01/19] raid5: raid5_do_soft_block_ops
[PATCH 02/19] raid5: move write operations to a workqueue
[PATCH 03/19] raid5: move check parity operations to a workqueue
[PATCH 04/19] raid5: move compute block operations to a workqueue
[PATCH 05/19] raid5: move read completion copies to a workqueue
[PATCH 06/19] raid5: move the reconstruct write expansion operation to a workqueue
[PATCH 07/19] raid5: remove compute_block and compute_parity5
[PATCH 08/19] dmaengine: enable multiple clients and operations
[PATCH 09/19] dmaengine: reduce backend address permutations
[PATCH 10/19] dmaengine: expose per channel dma mapping characteristics to clients
[PATCH 11/19] dmaengine: add memset as an asynchronous dma operation
[PATCH 12/19] dmaengine: dma_async_memcpy_err for DMA engines that do not support memcpy
[PATCH 13/19] dmaengine: add support for dma xor zero sum operations
[PATCH 14/19] dmaengine: add dma_sync_wait
[PATCH 15/19] dmaengine: raid5 dma client
[PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines
[PATCH 17/19] iop3xx: define IOP3XX_REG_ADDR[32|16|8] and clean up DMA/AAU defs
[PATCH 18/19] iop3xx: Give Linux control over PCI (ATU) initialization
[PATCH 19/19] iop3xx: IOP 32x and 33x support for the iop-adma driver

Note: the iop3xx patches apply against the iop3xx platform code
refactoring done by Lennert Buytenhek. His patches are reproduced,
with permission, on the Xscale IOP SourceForge site.

Also available on SourceForge:

Linux Symposium Paper: MD RAID Acceleration Support for Asynchronous
DMA/XOR Engines
http://prdownloads.sourceforge.net/xscaleiop/ols_paper_2006.pdf?download

Tar archive of the patch set
http://prdownloads.sourceforge.net/xscaleiop/md_raid_accel-2.6.18-rc6.tar.gz?download

[PATCH 01/19] http://prdownloads.sourceforge.net/xscaleiop/md-add-raid5-do-soft-block-ops.patch?download
[PATCH 02/19] http://prdownloads.sourceforge.net/xscaleiop/md-move-write-operations-to-a-workqueue.patch?download
[PATCH 03/19] http://prdownloads.sourceforge.net/xscaleiop/md-move-check-parity-operations-to-a-workqueue.patch?download
[PATCH 04/19] http://prdownloads.sourceforge.net/xscaleiop/md-move-compute-block-operations-to-a-workqueue.patch?download
[PATCH 05/19] http://prdownloads.sourceforge.net/xscaleiop/md-move-read-completion-copies-to-a-workqueue.patch?download
[PATCH 06/19] http://prdownloads.sourceforge.net/xscaleiop/md-move-expansion-operations-to-a-workqueue.patch?download
[PATCH 07/19] http://prdownloads.sourceforge.net/xscaleiop/md-remove-compute_block-and-compute_parity5.patch?download
[PATCH 08/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-multiple-clients-and-multiple-operations.patch?download
[PATCH 09/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-unite-backend-address-types.patch?download
[PATCH 10/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-dma-async-map-page.patch?download
[PATCH 11/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-dma-async-memset.patch?download
[PATCH 12/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-dma-async-memcpy-err.patch?download
[PATCH 13/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-dma-async-zero-sum.patch?download
[PATCH 14/19] http://prdownloads.sourceforge.net/xscaleiop/dmaengine-dma-sync-wait.patch?download
[PATCH 15/19] http://prdownloads.sourceforge.net/xscaleiop/md-raid5-dma-client.patch?download
[PATCH 16/19] http://prdownloads.sourceforge.net/xscaleiop/iop-adma-device-driver.patch?download
[PATCH 17/19] http://prdownloads.sourceforge.net/xscaleiop/iop3xx-register-macro-cleanup.patch?download
[PATCH 18/19] http://prdownloads.sourceforge.net/xscaleiop/iop3xx-pci-initialization.patch?download
[PATCH 19/19] http://prdownloads.sourceforge.net/xscaleiop/iop3xx-adma-support.patch?download

Optimal performance on IOPs is obtained with:
CONFIG_MD_RAID456_WORKQUEUE=y
CONFIG_MD_RAID5_HW_OFFLOAD=y
CONFIG_RAID5_DMA=y
CONFIG_INTEL_IOP_ADMA=y
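
For reference, the plumbing these options enable looks roughly like the
sketch below: raid5 gains a per-array do_block_ops hook plus a per-stripe
work_struct, and the hook points at either the software routine (patch 01)
or the dma client's routine (patch 15). This is an illustrative sketch
only; the exact assignment is made by the later patches and is not
reproduced verbatim here.

	/* array start-up (sketch): the work queue executes whatever
	 * routine conf->do_block_ops points at
	 */
	conf->do_block_ops = raid5_do_soft_block_ops;	/* patch 01, software */
	#ifdef CONFIG_RAID5_DMA
	conf->do_block_ops = dma_do_raid5_block_ops;	/* patch 15, hw offload */
	#endif

	/* per-stripe (sketch): old-style three-argument INIT_WORK */
	#ifdef CONFIG_MD_RAID456_WORKQUEUE
	INIT_WORK(&sh->ops.work, conf->do_block_ops, sh);
	#endif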


2006-09-11 23:17:43

by Dan Williams

Subject: [PATCH 01/19] raid5: raid5_do_soft_block_ops

From: Dan Williams <[email protected]>

raid5_do_soft_block_ops consolidates all the stripe cache maintenance
operations into a single routine. The stripe operations are:
* copying data between the stripe cache and user application buffers
* computing blocks to save a disk access, or to recover a missing block
* updating the parity on a write operation (reconstruct write and
read-modify-write)
* checking parity correctness
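
The hand-off between handle_stripe5 and this routine follows a
request/acknowledge pattern. A sketch, assembled from the later patches in
this series and shown here only to illustrate the flow:

	/* handle_stripe5, under sh->lock: request an operation */
	set_bit(STRIPE_OP_COMPUTE, &sh->state);			/* which op */
	set_bit(STRIPE_OP_COMPUTE_Prep, &sh->ops.state);	/* first sub-step */
	set_bit(R5_ComputeReq, &sh->dev[i].flags);		/* which block */
	sh->ops.pending++;
	queue_raid_work(sh);		/* schedules conf->do_block_ops */

	/* raid5_do_soft_block_ops: snapshot sh->ops.state, perform the
	 * xor/copy work outside the lock, then set STRIPE_OP_COMPUTE_Done
	 */

	/* a later handle_stripe5 pass: consume the result and clear the
	 * STRIPE_OP_COMPUTE / STRIPE_OP_COMPUTE_Done bits
	 */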

Signed-off-by: Dan Williams <[email protected]>
---

drivers/md/raid5.c | 289 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/raid/raid5.h | 129 +++++++++++++++++++-
2 files changed, 415 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4500660..8fde62b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1362,6 +1362,295 @@ static int stripe_to_pdidx(sector_t stri
return pd_idx;
}

+/*
+ * raid5_do_soft_block_ops - perform block memory operations on stripe data
+ * outside the spin lock.
+ */
+static void raid5_do_soft_block_ops(void *stripe_head_ref)
+{
+ struct stripe_head *sh = stripe_head_ref;
+ int i, pd_idx = sh->pd_idx, disks = sh->disks;
+ void *ptr[MAX_XOR_BLOCKS];
+ int overlap=0, work=0, written=0, compute=0, dd_idx=0;
+ int pd_uptodate=0;
+ unsigned long state, ops_state, ops_state_orig;
+ raid5_conf_t *conf = sh->raid_conf;
+
+ /* take a snapshot of what needs to be done at this point in time */
+ spin_lock(&sh->lock);
+ state = sh->state;
+ ops_state_orig = ops_state = sh->ops.state;
+ spin_unlock(&sh->lock);
+
+ if (test_bit(STRIPE_OP_BIOFILL, &state)) {
+ struct bio *return_bi=NULL;
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (test_bit(R5_ReadReq, &dev->flags)) {
+ struct bio *rbi, *rbi2;
+ PRINTK("%s: stripe %llu STRIPE_OP_BIOFILL op_state: %lx disk: %d\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state, i);
+ spin_lock_irq(&conf->device_lock);
+ rbi = dev->toread;
+ dev->toread = NULL;
+ spin_unlock_irq(&conf->device_lock);
+ overlap++;
+ while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+ copy_data(0, rbi, dev->page, dev->sector);
+ rbi2 = r5_next_bio(rbi, dev->sector);
+ spin_lock_irq(&conf->device_lock);
+ if (--rbi->bi_phys_segments == 0) {
+ rbi->bi_next = return_bi;
+ return_bi = rbi;
+ }
+ spin_unlock_irq(&conf->device_lock);
+ rbi = rbi2;
+ }
+ dev->read = return_bi;
+ }
+ }
+ if (overlap) {
+ set_bit(STRIPE_OP_BIOFILL_Done, &ops_state);
+ work++;
+ }
+ }
+
+ if (test_bit(STRIPE_OP_COMPUTE, &state)) {
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (test_bit(R5_ComputeReq, &dev->flags)) {
+ dd_idx = i;
+ i = -1;
+ break;
+ }
+ }
+ BUG_ON(i >= 0);
+ PRINTK("%s: stripe %llu STRIPE_OP_COMPUTE op_state: %lx block: %d\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state, dd_idx);
+ ptr[0] = page_address(sh->dev[dd_idx].page);
+
+ if (test_and_clear_bit(STRIPE_OP_COMPUTE_Prep, &ops_state)) {
+ memset(ptr[0], 0, STRIPE_SIZE);
+ set_bit(STRIPE_OP_COMPUTE_Parity, &ops_state);
+ }
+
+ if (test_and_clear_bit(STRIPE_OP_COMPUTE_Parity, &ops_state)) {
+ int count = 1;
+ for (i = disks ; i--; ) {
+ struct r5dev *dev = &sh->dev[i];
+ void *p;
+ if (i == dd_idx)
+ continue;
+ p = page_address(dev->page);
+ ptr[count++] = p;
+
+ check_xor();
+ }
+ if (count != 1)
+ xor_block(count, STRIPE_SIZE, ptr);
+
+ work++;
+ compute++;
+ set_bit(STRIPE_OP_COMPUTE_Done, &ops_state);
+ }
+ }
+
+ if (test_bit(STRIPE_OP_RMW, &state)) {
+ BUG_ON(test_bit(STRIPE_OP_RCW, &state));
+
+ PRINTK("%s: stripe %llu STRIPE_OP_RMW op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+ ptr[0] = page_address(sh->dev[pd_idx].page);
+
+ if (test_and_clear_bit(STRIPE_OP_RMW_ParityPre, &ops_state)) {
+ int count = 1;
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ struct bio *chosen;
+
+ /* Only process blocks that are known to be uptodate */
+ if (dev->towrite && test_bit(R5_RMWReq, &dev->flags)) {
+ ptr[count++] = page_address(dev->page);
+
+ spin_lock(&sh->lock);
+ chosen = dev->towrite;
+ dev->towrite = NULL;
+ BUG_ON(dev->written);
+ dev->written = chosen;
+ spin_unlock(&sh->lock);
+
+ overlap++;
+
+ check_xor();
+ }
+ }
+ if (count != 1)
+ xor_block(count, STRIPE_SIZE, ptr);
+ set_bit(STRIPE_OP_RMW_Drain, &ops_state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_RMW_Drain, &ops_state)) {
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ struct bio *wbi = dev->written;
+
+ if (dev->written)
+ written++;
+
+ while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+ copy_data(1, wbi, dev->page, dev->sector);
+ wbi = r5_next_bio(wbi, dev->sector);
+ }
+ }
+ set_bit(STRIPE_OP_RMW_ParityPost, &ops_state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_RMW_ParityPost, &ops_state)) {
+ int count = 1;
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (dev->written) {
+ ptr[count++] = page_address(dev->page);
+ check_xor();
+ }
+ }
+ if (count != 1)
+ xor_block(count, STRIPE_SIZE, ptr);
+
+ work++;
+ pd_uptodate++;
+ set_bit(STRIPE_OP_RMW_Done, &ops_state);
+ }
+
+ }
+
+ if (test_bit(STRIPE_OP_RCW, &state)) {
+ BUG_ON(test_bit(STRIPE_OP_RMW, &state));
+
+ PRINTK("%s: stripe %llu STRIPE_OP_RCW op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+ ptr[0] = page_address(sh->dev[pd_idx].page);
+
+ if (test_and_clear_bit(STRIPE_OP_RCW_Drain, &ops_state)) {
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ struct bio *chosen;
+ struct bio *wbi;
+
+ if (i!=pd_idx && dev->towrite &&
+ test_bit(R5_LOCKED, &dev->flags)) {
+
+ spin_lock(&sh->lock);
+ chosen = dev->towrite;
+ dev->towrite = NULL;
+ spin_unlock(&sh->lock);
+
+ BUG_ON(dev->written);
+ wbi = dev->written = chosen;
+
+ overlap++;
+ written++;
+
+ while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+ copy_data(1, wbi, dev->page, dev->sector);
+ wbi = r5_next_bio(wbi, dev->sector);
+ }
+ } else if (i==pd_idx)
+ memset(ptr[0], 0, STRIPE_SIZE);
+ }
+ set_bit(STRIPE_OP_RCW_Parity, &ops_state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_RCW_Parity, &ops_state)) {
+ int count = 1;
+ for (i=disks; i--;)
+ if (i != pd_idx) {
+ ptr[count++] = page_address(sh->dev[i].page);
+ check_xor();
+ }
+ if (count != 1)
+ xor_block(count, STRIPE_SIZE, ptr);
+
+ work++;
+ pd_uptodate++;
+ set_bit(STRIPE_OP_RCW_Done, &ops_state);
+
+ }
+ }
+
+ if (test_bit(STRIPE_OP_CHECK, &state)) {
+ PRINTK("%s: stripe %llu STRIPE_OP_CHECK op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+ ptr[0] = page_address(sh->dev[pd_idx].page);
+
+ if (test_and_clear_bit(STRIPE_OP_CHECK_Gen, &ops_state)) {
+ int count = 1;
+ for (i=disks; i--;)
+ if (i != pd_idx) {
+ ptr[count++] = page_address(sh->dev[i].page);
+ check_xor();
+ }
+ if (count != 1)
+ xor_block(count, STRIPE_SIZE, ptr);
+
+ set_bit(STRIPE_OP_CHECK_Verify, &ops_state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_CHECK_Verify, &ops_state)) {
+ if (page_is_zero(sh->dev[pd_idx].page))
+ set_bit(STRIPE_OP_CHECK_IsZero, &ops_state);
+
+ work++;
+ set_bit(STRIPE_OP_CHECK_Done, &ops_state);
+ }
+ }
+
+ spin_lock(&sh->lock);
+ /* Update the state of operations:
+ * -clear incoming requests
+ * -preserve output status (i.e. done status / check result)
+ * -preserve requests added since 'ops_state_orig' was set
+ */
+ sh->ops.state ^= (ops_state_orig & ~STRIPE_OP_COMPLETION_MASK);
+ sh->ops.state |= ops_state;
+
+ if (pd_uptodate)
+ set_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+
+ if (written)
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (dev->written)
+ set_bit(R5_UPTODATE, &dev->flags);
+ }
+
+ if (overlap)
+ for (i= disks; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (test_and_clear_bit(R5_Overlap, &dev->flags))
+ wake_up(&sh->raid_conf->wait_for_overlap);
+ }
+
+ if (compute) {
+ clear_bit(R5_ComputeReq, &sh->dev[dd_idx].flags);
+ set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
+ }
+
+ sh->ops.pending -= work;
+ BUG_ON(sh->ops.pending < 0);
+ clear_bit(STRIPE_OP_QUEUED, &sh->state);
+ set_bit(STRIPE_HANDLE, &sh->state);
+ queue_raid_work(sh);
+ spin_unlock(&sh->lock);
+
+ release_stripe(sh);
+}

/*
* handle_stripe - do things to a stripe.
diff --git a/include/linux/raid/raid5.h b/include/linux/raid/raid5.h
index 20ed4c9..c8a315b 100644
--- a/include/linux/raid/raid5.h
+++ b/include/linux/raid/raid5.h
@@ -116,13 +116,39 @@ #include <linux/raid/xor.h>
* attach a request to an active stripe (add_stripe_bh())
* lockdev attach-buffer unlockdev
* handle a stripe (handle_stripe())
- * lockstripe clrSTRIPE_HANDLE ... (lockdev check-buffers unlockdev) .. change-state .. record io needed unlockstripe schedule io
+ * lockstripe clrSTRIPE_HANDLE ... (lockdev check-buffers unlockdev) .. change-state .. record io/ops needed unlockstripe schedule io/ops
* release an active stripe (release_stripe())
* lockdev if (!--cnt) { if STRIPE_HANDLE, add to handle_list else add to inactive-list } unlockdev
*
* The refcount counts each thread that have activated the stripe,
* plus raid5d if it is handling it, plus one for each active request
- * on a cached buffer.
+ * on a cached buffer, and plus one if the stripe is undergoing stripe
+ * operations.
+ *
+ * Stripe operations are performed outside the stripe lock,
+ * the stripe operations are:
+ * -copying data between the stripe cache and user application buffers
+ * -computing blocks to save a disk access, or to recover a missing block
+ * -updating the parity on a write operation (reconstruct write and read-modify-write)
+ * -checking parity correctness
+ * These operations are carried out by either a software routine,
+ * raid5_do_soft_block_ops, or by a routine that arranges for the work to be
+ * done by dedicated DMA engines.
+ * When requesting an operation handle_stripe sets the proper state and work
+ * request flags, it then hands control to the operations routine. There are
+ * some critical dependencies between the operations that prevent some
+ * operations from being requested while another is in flight.
+ * Here are the inter-dependencies:
+ * -parity check operations destroy the in cache version of the parity block,
+ * so we prevent parity dependent operations like writes and compute_blocks
+ * from starting while a check is in progress.
+ * -when a write operation is requested we immediately lock the affected blocks,
+ * and mark them as not up to date. This causes new read requests to be held
+ * off, as well as parity checks and compute block operations.
+ * -once a compute block operation has been requested handle_stripe treats that
+ * block as if it is immediately up to date. The routine carrying out the
+ * operation guarantees that any operation that is dependent on the
+ * compute block result is initiated after the computation completes.
*/

struct stripe_head {
@@ -136,11 +162,18 @@ struct stripe_head {
spinlock_t lock;
int bm_seq; /* sequence number for bitmap flushes */
int disks; /* disks in stripe */
+ struct stripe_operations {
+ int pending; /* number of operations requested */
+ unsigned long state; /* state of block operations */
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ struct work_struct work; /* work queue descriptor */
+ #endif
+ } ops;
struct r5dev {
struct bio req;
struct bio_vec vec;
struct page *page;
- struct bio *toread, *towrite, *written;
+ struct bio *toread, *read, *towrite, *written;
sector_t sector; /* sector of this page */
unsigned long flags;
} dev[1]; /* allocated with extra space depending of RAID geometry */
@@ -158,6 +191,11 @@ #define R5_ReadError 8 /* seen a read er
#define R5_ReWrite 9 /* have tried to over-write the readerror */

#define R5_Expanded 10 /* This block now has post-expand data */
+#define R5_Consistent 11 /* Block is HW DMA-able without a cache flush */
+#define R5_ComputeReq 12 /* compute_block in progress treat as uptodate */
+#define R5_ReadReq 13 /* dev->toread contains a bio that needs filling */
+#define R5_RMWReq 14 /* distinguish blocks ready for rmw from other "towrites" */
+
/*
* Write method
*/
@@ -179,6 +217,72 @@ #define STRIPE_BIT_DELAY 8
#define STRIPE_EXPANDING 9
#define STRIPE_EXPAND_SOURCE 10
#define STRIPE_EXPAND_READY 11
+#define STRIPE_OP_RCW 12
+#define STRIPE_OP_RMW 13 /* RAID-5 only */
+#define STRIPE_OP_UPDATE 14 /* RAID-6 only */
+#define STRIPE_OP_CHECK 15
+#define STRIPE_OP_COMPUTE 16
+#define STRIPE_OP_COMPUTE2 17 /* RAID-6 only */
+#define STRIPE_OP_BIOFILL 18
+#define STRIPE_OP_QUEUED 19
+#define STRIPE_OP_DMA 20
+
+/*
+ * These flags are communication markers between the handle_stripe[5|6]
+ * routine and the block operations work queue
+ * - The *_Done definitions signal completion from work queue to handle_stripe
+ * - STRIPE_OP_CHECK_IsZero: signals parity correctness to handle_stripe
+ * - STRIPE_OP_RCW_Expand: expansion operations perform a modified RCW sequence
+ * - STRIPE_OP_COMPUTE_Recover_pd: recovering the parity disk involves an extra
+ * write back step
+ * - STRIPE_OP_*_Dma: flag operations that will be done once the DMA engine
+ * goes idle
+ * - All other definitions are service requests for the work queue
+ */
+#define STRIPE_OP_RCW_Drain 0
+#define STRIPE_OP_RCW_Parity 1
+#define STRIPE_OP_RCW_Done 2
+#define STRIPE_OP_RCW_Expand 3
+#define STRIPE_OP_RMW_ParityPre 4
+#define STRIPE_OP_RMW_Drain 5
+#define STRIPE_OP_RMW_ParityPost 6
+#define STRIPE_OP_RMW_Done 7
+#define STRIPE_OP_CHECK_Gen 8
+#define STRIPE_OP_CHECK_Verify 9
+#define STRIPE_OP_CHECK_Done 10
+#define STRIPE_OP_CHECK_IsZero 11
+#define STRIPE_OP_COMPUTE_Prep 12
+#define STRIPE_OP_COMPUTE_Parity 13
+#define STRIPE_OP_COMPUTE_Done 14
+#define STRIPE_OP_COMPUTE_Recover_pd 15
+#define STRIPE_OP_BIOFILL_Copy 16
+#define STRIPE_OP_BIOFILL_Done 17
+#define STRIPE_OP_RCW_Dma 18
+#define STRIPE_OP_RMW_Dma 19
+#define STRIPE_OP_UPDATE_Dma 20
+#define STRIPE_OP_CHECK_Dma 21
+#define STRIPE_OP_COMPUTE_Dma 22
+#define STRIPE_OP_COMPUTE2_Dma 23
+#define STRIPE_OP_BIOFILL_Dma 24
+
+/*
+ * Bit mask for status bits not to be auto-cleared by the work queue thread
+ */
+#define STRIPE_OP_COMPLETION_MASK (1 << STRIPE_OP_RCW_Done |\
+ 1 << STRIPE_OP_RMW_Done |\
+ 1 << STRIPE_OP_CHECK_Done |\
+ 1 << STRIPE_OP_CHECK_IsZero |\
+ 1 << STRIPE_OP_COMPUTE_Done |\
+ 1 << STRIPE_OP_COMPUTE_Recover_pd |\
+ 1 << STRIPE_OP_BIOFILL_Done |\
+ 1 << STRIPE_OP_RCW_Dma |\
+ 1 << STRIPE_OP_RMW_Dma |\
+ 1 << STRIPE_OP_UPDATE_Dma |\
+ 1 << STRIPE_OP_CHECK_Dma |\
+ 1 << STRIPE_OP_COMPUTE_Dma |\
+ 1 << STRIPE_OP_COMPUTE2_Dma |\
+ 1 << STRIPE_OP_BIOFILL_Dma)
+
/*
* Plugging:
*
@@ -229,11 +333,19 @@ struct raid5_private_data {
atomic_t preread_active_stripes; /* stripes with scheduled io */

atomic_t reshape_stripes; /* stripes with pending writes for reshape */
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ struct workqueue_struct *block_ops_queue;
+ #endif
+ void (*do_block_ops)(void *);
+
/* unfortunately we need two cache names as we temporarily have
* two caches.
*/
int active_name;
char cache_name[2][20];
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ char workqueue_name[20];
+ #endif
kmem_cache_t *slab_cache; /* for allocating stripes */

int seq_flush, seq_write;
@@ -264,6 +376,17 @@ struct raid5_private_data {
typedef struct raid5_private_data raid5_conf_t;

#define mddev_to_conf(mddev) ((raid5_conf_t *) mddev->private)
+/* must be called under the stripe lock */
+static inline void queue_raid_work(struct stripe_head *sh)
+{
+ if (sh->ops.pending != 0 && !test_bit(STRIPE_OP_QUEUED, &sh->state)) {
+ set_bit(STRIPE_OP_QUEUED, &sh->state);
+ atomic_inc(&sh->count);
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ queue_work(sh->raid_conf->block_ops_queue, &sh->ops.work);
+ #endif
+ }
+}

/*
* Our supported algorithms

2006-09-11 23:18:01

by Dan Williams

Subject: [PATCH 04/19] raid5: move compute block operations to a workqueue

From: Dan Williams <[email protected]>

Enable handle_stripe5 to pass off compute block operations to
raid5_do_soft_block_ops, formerly handled by compute_block.

Here are a few notes about the new flags R5_ComputeReq and
STRIPE_OP_COMPUTE_Recover_pd:

Previously, when handle_stripe5 found a block that needed to be computed it
updated it in the same step. Now that these operations are separated
(across multiple calls to handle_stripe5), an R5_ComputeReq flag is needed
to tell other parts of handle_stripe5 to treat the block under computation
as if it were up to date. The order of events in the work queue ensures that
the block is indeed up to date before further operations are performed.

STRIPE_OP_COMPUTE_Recover_pd was added to track when the parity block is being
computed due to a failed parity check. This allows the code in
handle_stripe5 that produces requests for check_parity and compute_block
operations to be separate from the code that consumes the result.

Changelog:
* count blocks under computation as uptodate
* removed handle_compute_operations5. All logic moved into handle_stripe5
so that we do not need to go through the initiation logic to end the
operation.
* since the write operations mark blocks !uptodate we hold off the code to
compute/read blocks until it completes.
* new compute block operations and reads are held off while a compute is in
flight
* do not compute a block while a check parity operation is pending, and do
not start a new check parity operation while a compute operation is pending
* STRIPE_OP_COMPUTE_Recover_pd holds off the clearing of the STRIPE_OP_COMPUTE state.
This allows the transition to be handled by the check parity logic that
writes recomputed parity to disk.
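
The repeated "R5_UPTODATE || R5_ComputeReq" tests in the diff below all
express one rule; stated explicitly as a hypothetical helper (not part of
the patch):

	/* a block may be used as an xor source, or skipped for pre-read, if
	 * its data is in the stripe cache now or if a queued compute
	 * operation is guaranteed to put it there first
	 */
	static inline int r5_block_available(struct r5dev *dev)
	{
		return test_bit(R5_UPTODATE, &dev->flags) ||
		       test_bit(R5_ComputeReq, &dev->flags);
	}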

Signed-off-by: Dan Williams <[email protected]>
---

drivers/md/raid5.c | 153 ++++++++++++++++++++++++++++++++++++----------------
1 files changed, 107 insertions(+), 46 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 24ed4d8..0c39203 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1300,7 +1300,8 @@ static int handle_write_operations5(stru
}
} else {
/* enter stage 1 of read modify write operation */
- BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags));
+ BUG_ON(!(test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags) ||
+ test_bit(R5_ComputeReq, &sh->dev[pd_idx].flags)));

set_bit(STRIPE_OP_RMW, &sh->state);
set_bit(STRIPE_OP_RMW_ParityPre, &sh->ops.state);
@@ -1314,7 +1315,8 @@ static int handle_write_operations5(stru
* so we distinguish these blocks by the RMWReq bit
*/
if (dev->towrite &&
- test_bit(R5_UPTODATE, &dev->flags)) {
+ (test_bit(R5_UPTODATE, &dev->flags) ||
+ test_bit(R5_ComputeReq, &dev->flags))) {
set_bit(R5_RMWReq, &dev->flags);
set_bit(R5_LOCKED, &dev->flags);
clear_bit(R5_UPTODATE, &dev->flags);
@@ -1748,7 +1750,7 @@ static void handle_stripe5(struct stripe
int i;
int syncing, expanding, expanded;
int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
- int non_overwrite=0, write_complete=0;
+ int compute=0, non_overwrite=0, write_complete=0;
int failed_num=0;
struct r5dev *dev;

@@ -1799,7 +1801,7 @@ static void handle_stripe5(struct stripe
/* now count some things */
if (test_bit(R5_LOCKED, &dev->flags)) locked++;
if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++;
-
+ if (test_bit(R5_ComputeReq, &dev->flags)) BUG_ON(++compute > 1);

if (dev->toread) to_read++;
if (dev->towrite) {
@@ -1955,40 +1957,83 @@ static void handle_stripe5(struct stripe
* parity, or to satisfy requests
* or to load a block that is being partially written.
*/
- if (to_read || non_overwrite || (syncing && (uptodate < disks)) || expanding) {
- for (i=disks; i--;) {
- dev = &sh->dev[i];
- if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
- (dev->toread ||
- (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
- syncing ||
- expanding ||
- (failed && (sh->dev[failed_num].toread ||
- (sh->dev[failed_num].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags))))
- )
- ) {
- /* we would like to get this block, possibly
- * by computing it, but we might not be able to
+ if (to_read || non_overwrite || (syncing && (uptodate + compute < disks)) || expanding ||
+ test_bit(STRIPE_OP_COMPUTE, &sh->state)) {
+ /* Finish any pending compute operations. Parity recovery implies
+ * a write-back which is handled later on in this routine
+ */
+ if (test_bit(STRIPE_OP_COMPUTE, &sh->state) &&
+ test_bit(STRIPE_OP_COMPUTE_Done, &sh->ops.state) &&
+ !test_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_COMPUTE, &sh->state);
+ clear_bit(STRIPE_OP_COMPUTE_Done, &sh->ops.state);
+ }
+
+ /* blocks being written are temporarily !UPTODATE */
+ if (!test_bit(STRIPE_OP_COMPUTE, &sh->state) &&
+ !test_bit(STRIPE_OP_RCW, &sh->state) &&
+ !test_bit(STRIPE_OP_RMW, &sh->state)) {
+ for (i=disks; i--;) {
+ dev = &sh->dev[i];
+
+ /* don't schedule compute operations or reads on
+ * the parity block while a check is in flight
*/
- if (uptodate == disks-1) {
- PRINTK("Computing block %d\n", i);
- compute_block(sh, i);
- uptodate++;
- } else if (test_bit(R5_Insync, &dev->flags)) {
- set_bit(R5_LOCKED, &dev->flags);
- set_bit(R5_Wantread, &dev->flags);
+ if ((i == sh->pd_idx) && test_bit(STRIPE_OP_CHECK, &sh->state))
+ continue;
+
+ if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
+ (dev->toread ||
+ (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
+ syncing ||
+ expanding ||
+ (failed && (sh->dev[failed_num].toread ||
+ (sh->dev[failed_num].towrite &&
+ !test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags))))
+ )
+ ) {
+ /* 1/ We would like to get this block, possibly
+ * by computing it, but we might not be able to.
+ *
+ * 2/ Since parity check operations make the parity
+ * block !uptodate it will need to be refreshed
+ * before any compute operations on data disks are
+ * scheduled.
+ *
+ * 3/ We hold off parity block re-reads until check
+ * operations have quiesced.
+ */
+ if ((uptodate == disks-1) && !test_bit(STRIPE_OP_CHECK, &sh->state)) {
+ set_bit(STRIPE_OP_COMPUTE, &sh->state);
+ set_bit(STRIPE_OP_COMPUTE_Prep, &sh->ops.state);
+ set_bit(R5_ComputeReq, &dev->flags);
+ sh->ops.pending++;
+ /* Careful: from this point on 'uptodate' is in the eye of the
+ * workqueue which services 'compute' operations before writes.
+ * R5_ComputeReq flags blocks that will be R5_UPTODATE
+ * in the work queue.
+ */
+ uptodate++;
+ } else if ((uptodate < disks-1) && test_bit(R5_Insync, &dev->flags)) {
+ /* Note: we hold off compute operations while checks are in flight,
+ * but we still prefer 'compute' over 'read' hence we only read if
+ * (uptodate < disks-1)
+ */
+ set_bit(R5_LOCKED, &dev->flags);
+ set_bit(R5_Wantread, &dev->flags);
#if 0
- /* if I am just reading this block and we don't have
- a failed drive, or any pending writes then sidestep the cache */
- if (sh->bh_read[i] && !sh->bh_read[i]->b_reqnext &&
- ! syncing && !failed && !to_write) {
- sh->bh_cache[i]->b_page = sh->bh_read[i]->b_page;
- sh->bh_cache[i]->b_data = sh->bh_read[i]->b_data;
- }
+ /* if I am just reading this block and we don't have
+ a failed drive, or any pending writes then sidestep the cache */
+ if (sh->bh_read[i] && !sh->bh_read[i]->b_reqnext &&
+ ! syncing && !failed && !to_write) {
+ sh->bh_cache[i]->b_page = sh->bh_read[i]->b_page;
+ sh->bh_cache[i]->b_data = sh->bh_read[i]->b_data;
+ }
#endif
- locked++;
- PRINTK("Reading block %d (sync=%d)\n",
- i, syncing);
+ locked++;
+ PRINTK("Reading block %d (sync=%d)\n",
+ i, syncing);
+ }
}
}
}
@@ -2055,7 +2100,7 @@ #if 0
|| sh->bh_page[i]!=bh->b_page
#endif
) &&
- !test_bit(R5_UPTODATE, &dev->flags)) {
+ !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_ComputeReq, &dev->flags))) {
if (test_bit(R5_Insync, &dev->flags)
/* && !(!mddev->insync && i == sh->pd_idx) */
)
@@ -2069,7 +2114,7 @@ #if 0
|| sh->bh_page[i] != bh->b_page
#endif
) &&
- !test_bit(R5_UPTODATE, &dev->flags)) {
+ !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_ComputeReq, &dev->flags))) {
if (test_bit(R5_Insync, &dev->flags)) rcw++;
else rcw += 2*disks;
}
@@ -2082,7 +2127,8 @@ #endif
for (i=disks; i--;) {
dev = &sh->dev[i];
if ((dev->towrite || i == sh->pd_idx) &&
- !test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
+ !test_bit(R5_LOCKED, &dev->flags) &&
+ !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_ComputeReq, &dev->flags)) &&
test_bit(R5_Insync, &dev->flags)) {
if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
{
@@ -2101,7 +2147,8 @@ #endif
for (i=disks; i--;) {
dev = &sh->dev[i];
if (!test_bit(R5_OVERWRITE, &dev->flags) && i != sh->pd_idx &&
- !test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
+ !test_bit(R5_LOCKED, &dev->flags) &&
+ !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_ComputeReq, &dev->flags)) &&
test_bit(R5_Insync, &dev->flags)) {
if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
{
@@ -2127,16 +2174,19 @@ #endif
* 2/ Hold off parity checks while parity dependent operations are in flight
* (RCW and RMW are protected by 'locked')
*/
- if ((syncing && locked == 0 &&
- !test_bit(STRIPE_INSYNC, &sh->state)) ||
- test_bit(STRIPE_OP_CHECK, &sh->state)) {
+ if ((syncing && locked == 0 && !test_bit(STRIPE_OP_COMPUTE, &sh->state) &&
+ !test_bit(STRIPE_INSYNC, &sh->state)) ||
+ test_bit(STRIPE_OP_CHECK, &sh->state) ||
+ test_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state)) {

set_bit(STRIPE_HANDLE, &sh->state);
/* Take one of the following actions:
* 1/ start a check parity operation if (uptodate == disks)
* 2/ finish a check parity operation and act on the result
+ * 3/ skip to the writeback section if we previously
+ * initiated a recovery operation
*/
- if (failed == 0) {
+ if (failed == 0 && !test_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state)) {
if (!test_bit(STRIPE_OP_CHECK, &sh->state)) {
BUG_ON(uptodate != disks);
set_bit(STRIPE_OP_CHECK, &sh->state);
@@ -2157,18 +2207,29 @@ #endif
/* don't try to repair!! */
set_bit(STRIPE_INSYNC, &sh->state);
else {
- compute_block(sh, sh->pd_idx);
+ set_bit(STRIPE_OP_COMPUTE, &sh->state);
+ set_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state);
+ set_bit(STRIPE_OP_COMPUTE_Prep, &sh->ops.state);
+ set_bit(R5_ComputeReq, &sh->dev[sh->pd_idx].flags);
+ sh->ops.pending++;
uptodate++;
}
}
}
}
+ if (test_bit(STRIPE_OP_COMPUTE_Done, &sh->ops.state) &&
+ test_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_COMPUTE, &sh->state);
+ clear_bit(STRIPE_OP_COMPUTE_Done, &sh->ops.state);
+ clear_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state);
+ }

- /* Wait for check parity operations to complete
+ /* Wait for check parity and compute block operations to complete
* before write-back
*/
if (!test_bit(STRIPE_INSYNC, &sh->state) &&
- !test_bit(STRIPE_OP_CHECK, &sh->state)) {
+ !test_bit(STRIPE_OP_CHECK, &sh->state) &&
+ !test_bit(STRIPE_OP_COMPUTE, &sh->state)) {

/* either failed parity check, or recovery is happening */
if (failed==0)

2006-09-11 23:18:07

by Dan Williams

Subject: [PATCH 05/19] raid5: move read completion copies to a workqueue

From: Dan Williams <[email protected]>

Enable handle_stripe5 to hand off the memory copy operations that satisfy
read requests to raid5_do_soft_block_ops; formerly this was handled inline
within handle_stripe5.

It adds a 'read' (past tense) pointer to the r5dev structure
to track reads that have been offloaded to the workqueue. When the copy
operation is complete the 'read' pointer is reused as the return_bi for the
bi_end_io() call.

Changelog:
* dev->read only holds reads that have been satisfied, previously it
doubled as a request queue to the operations routine
* added R5_ReadReq to mark the blocks that belong to a given bio fill
operation
* requested reads no longer count towards the 'to_read' count, 'to_fill'
tracks the number of requested reads
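
Condensed, the biofill life cycle now looks like this (a sketch of the flow
implemented by the hunks below together with the operations routine from
patch 01):

	/* 1/ handle_stripe5 (request): an uptodate block with a pending read */
	if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread)
		set_bit(R5_ReadReq, &dev->flags);	/* counted in to_fill */
	...
	if (to_fill && !test_bit(STRIPE_OP_BIOFILL, &sh->state)) {
		set_bit(STRIPE_OP_BIOFILL, &sh->state);
		sh->ops.pending++;
	}

	/* 2/ work queue: copy_data() from the stripe cache into the bios,
	 * park the completed bios on dev->read, set STRIPE_OP_BIOFILL_Done
	 */

	/* 3/ next handle_stripe5 pass: clear the BIOFILL bits and move
	 * dev->read onto return_bi so bi_end_io() gets called
	 */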

Signed-off-by: Dan Williams <[email protected]>
---

drivers/md/raid5.c | 67 +++++++++++++++++++++++++++++-----------------------
1 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0c39203..1a8dfd2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -240,11 +240,11 @@ static void init_stripe(struct stripe_he
for (i = sh->disks; i--; ) {
struct r5dev *dev = &sh->dev[i];

- if (dev->toread || dev->towrite || dev->written ||
+ if (dev->toread || dev->read || dev->towrite || dev->written ||
test_bit(R5_LOCKED, &dev->flags)) {
- printk("sector=%llx i=%d %p %p %p %d\n",
+ printk("sector=%llx i=%d %p %p %p %p %d\n",
(unsigned long long)sh->sector, i, dev->toread,
- dev->towrite, dev->written,
+ dev->read, dev->towrite, dev->written,
test_bit(R5_LOCKED, &dev->flags));
BUG();
}
@@ -1749,7 +1749,7 @@ static void handle_stripe5(struct stripe
struct bio *bi;
int i;
int syncing, expanding, expanded;
- int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
+ int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0, to_fill=0;
int compute=0, non_overwrite=0, write_complete=0;
int failed_num=0;
struct r5dev *dev;
@@ -1765,44 +1765,47 @@ static void handle_stripe5(struct stripe
syncing = test_bit(STRIPE_SYNCING, &sh->state);
expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
- /* Now to look around and see what can be done */

+ if (test_bit(STRIPE_OP_BIOFILL, &sh->state) &&
+ test_bit(STRIPE_OP_BIOFILL_Done, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_BIOFILL, &sh->state);
+ clear_bit(STRIPE_OP_BIOFILL_Done, &sh->ops.state);
+ }
+
+ /* Now to look around and see what can be done */
rcu_read_lock();
for (i=disks; i--; ) {
mdk_rdev_t *rdev;
dev = &sh->dev[i];
clear_bit(R5_Insync, &dev->flags);

- PRINTK("check %d: state 0x%lx read %p write %p written %p\n",
- i, dev->flags, dev->toread, dev->towrite, dev->written);
+ PRINTK("check %d: state 0x%lx toread %p read %p write %p written %p\n",
+ i, dev->flags, dev->toread, dev->read, dev->towrite, dev->written);
+
+ /* maybe we can acknowledge completion of a biofill operation */
+ if (test_bit(R5_ReadReq, &dev->flags) && !dev->toread)
+ clear_bit(R5_ReadReq, &dev->flags);
+
/* maybe we can reply to a read */
+ if (dev->read && !test_bit(R5_ReadReq, &dev->flags) &&
+ !test_bit(STRIPE_OP_BIOFILL, &sh->state)) {
+ return_bi = dev->read;
+ dev->read = NULL;
+ }
+
+ /* maybe we can start a biofill operation */
if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread) {
- struct bio *rbi, *rbi2;
- PRINTK("Return read for disc %d\n", i);
- spin_lock_irq(&conf->device_lock);
- rbi = dev->toread;
- dev->toread = NULL;
- if (test_and_clear_bit(R5_Overlap, &dev->flags))
- wake_up(&conf->wait_for_overlap);
- spin_unlock_irq(&conf->device_lock);
- while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
- copy_data(0, rbi, dev->page, dev->sector);
- rbi2 = r5_next_bio(rbi, dev->sector);
- spin_lock_irq(&conf->device_lock);
- if (--rbi->bi_phys_segments == 0) {
- rbi->bi_next = return_bi;
- return_bi = rbi;
- }
- spin_unlock_irq(&conf->device_lock);
- rbi = rbi2;
- }
+ to_read--;
+ if (!test_bit(STRIPE_OP_BIOFILL, &sh->state))
+ set_bit(R5_ReadReq, &dev->flags);
}

/* now count some things */
if (test_bit(R5_LOCKED, &dev->flags)) locked++;
if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++;
+ if (test_bit(R5_ReadReq, &dev->flags)) to_fill++;
if (test_bit(R5_ComputeReq, &dev->flags)) BUG_ON(++compute > 1);
-
+
if (dev->toread) to_read++;
if (dev->towrite) {
to_write++;
@@ -1824,9 +1827,15 @@ static void handle_stripe5(struct stripe
set_bit(R5_Insync, &dev->flags);
}
rcu_read_unlock();
+
+ if (to_fill && !test_bit(STRIPE_OP_BIOFILL, &sh->state)) {
+ set_bit(STRIPE_OP_BIOFILL, &sh->state);
+ sh->ops.pending++;
+ }
+
PRINTK("locked=%d uptodate=%d to_read=%d"
- " to_write=%d failed=%d failed_num=%d\n",
- locked, uptodate, to_read, to_write, failed, failed_num);
+ " to_write=%d to_fill=%d failed=%d failed_num=%d\n",
+ locked, uptodate, to_read, to_write, to_fill, failed, failed_num);
/* check if the array has lost two devices and, if so, some requests might
* need to be failed
*/

2006-09-11 23:18:43

by Dan Williams

Subject: [PATCH 10/19] dmaengine: expose per channel dma mapping characteristics to clients

From: Dan Williams <[email protected]>

Allow a client to ensure that the dma channel it has selected can
dma to the specified buffer or page address. Also allow the client to
pre-map address ranges to be passed to the operations API.

Changelog:
* make the dmaengine api EXPORT_SYMBOL_GPL
* zero sum support should be standalone, not integrated into xor
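
For example, a client can hold a long-lived mapping of a stripe cache page
and hand the resulting dma address to the operations API. Illustrative
usage only (the raid5 client in patch 15 keeps such a mapping in
sh->dev[i].dma; the direction constant is whatever the backend's mapping
API expects):

	dma_addr_t handle;

	/* map once, reuse across operations submitted to this channel */
	handle = dma_async_map_page(chan, sh->dev[i].page, 0, STRIPE_SIZE,
				    DMA_BIDIRECTIONAL);
	/* ... pass 'handle' as a source/destination to the xor/memcpy calls ... */
	dma_async_unmap_page(chan, handle, STRIPE_SIZE, DMA_BIDIRECTIONAL);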

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 4 ++++
drivers/dma/ioatdma.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/dmaengine.h | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9b02afa..e78ce89 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -630,3 +630,7 @@ EXPORT_SYMBOL_GPL(dma_async_device_unreg
EXPORT_SYMBOL_GPL(dma_chan_cleanup);
EXPORT_SYMBOL_GPL(dma_async_do_xor_err);
EXPORT_SYMBOL_GPL(dma_async_chan_init);
+EXPORT_SYMBOL_GPL(dma_async_map_page);
+EXPORT_SYMBOL_GPL(dma_async_map_single);
+EXPORT_SYMBOL_GPL(dma_async_unmap_page);
+EXPORT_SYMBOL_GPL(dma_async_unmap_single);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index dd5b9f0..0159d14 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -637,6 +637,37 @@ extern dma_cookie_t dma_async_do_xor_err
union dmaengine_addr src, unsigned int src_cnt,
unsigned int src_off, size_t len, unsigned long flags);

+static dma_addr_t ioat_map_page(struct dma_chan *chan, struct page *page,
+ unsigned long offset, size_t size,
+ int direction)
+{
+ struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+ return pci_map_page(ioat_chan->device->pdev, page, offset, size,
+ direction);
+}
+
+static dma_addr_t ioat_map_single(struct dma_chan *chan, void *cpu_addr,
+ size_t size, int direction)
+{
+ struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+ return pci_map_single(ioat_chan->device->pdev, cpu_addr, size,
+ direction);
+}
+
+static void ioat_unmap_page(struct dma_chan *chan, dma_addr_t handle,
+ size_t size, int direction)
+{
+ struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+ pci_unmap_page(ioat_chan->device->pdev, handle, size, direction);
+}
+
+static void ioat_unmap_single(struct dma_chan *chan, dma_addr_t handle,
+ size_t size, int direction)
+{
+ struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+ pci_unmap_single(ioat_chan->device->pdev, handle, size, direction);
+}
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -717,6 +748,10 @@ #endif
device->common.capabilities = DMA_MEMCPY;
device->common.device_do_dma_memcpy = do_ioat_dma_memcpy;
device->common.device_do_dma_xor = dma_async_do_xor_err;
+ device->common.map_page = ioat_map_page;
+ device->common.map_single = ioat_map_single;
+ device->common.unmap_page = ioat_unmap_page;
+ device->common.unmap_single = ioat_unmap_single;
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index df055cc..cb4cfcf 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -287,6 +287,15 @@ struct dma_device {
enum dma_status (*device_operation_complete)(struct dma_chan *chan,
dma_cookie_t cookie, dma_cookie_t *last,
dma_cookie_t *used);
+ dma_addr_t (*map_page)(struct dma_chan *chan, struct page *page,
+ unsigned long offset, size_t size,
+ int direction);
+ dma_addr_t (*map_single)(struct dma_chan *chan, void *cpu_addr,
+ size_t size, int direction);
+ void (*unmap_page)(struct dma_chan *chan, dma_addr_t handle,
+ size_t size, int direction);
+ void (*unmap_single)(struct dma_chan *chan, dma_addr_t handle,
+ size_t size, int direction);
void (*device_issue_pending)(struct dma_chan *chan);
};

@@ -592,6 +601,31 @@ static inline enum dma_status dma_async_
return DMA_IN_PROGRESS;
}

+static inline dma_addr_t dma_async_map_page(struct dma_chan *chan,
+ struct page *page, unsigned long offset, size_t size,
+ int direction)
+{
+ return chan->device->map_page(chan, page, offset, size, direction);
+}
+
+static inline dma_addr_t dma_async_map_single(struct dma_chan *chan,
+ void *cpu_addr, size_t size, int direction)
+{
+ return chan->device->map_single(chan, cpu_addr, size, direction);
+}
+
+static inline void dma_async_unmap_page(struct dma_chan *chan,
+ dma_addr_t handle, size_t size, int direction)
+{
+ chan->device->unmap_page(chan, handle, size, direction);
+}
+
+static inline void dma_async_unmap_single(struct dma_chan *chan,
+ dma_addr_t handle, size_t size, int direction)
+{
+ chan->device->unmap_single(chan, handle, size, direction);
+}
+
/* --- DMA device --- */

int dma_async_device_register(struct dma_device *device);

2006-09-11 23:18:55

by Dan Williams

Subject: [PATCH 14/19] dmaengine: add dma_sync_wait

From: Dan Williams <[email protected]>

dma_sync_wait is a common routine to live wait for a dma operation to
complete.
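
The raid5 dma client (patch 15) uses it to drain a full descriptor ring
before retrying a submission; roughly, from its dma_raid_copy_data routine:

	do {
		cookie = dma_async_memcpy_pg_to_dma(chan, dma + dma_offset,
						    bio_page, b_offset, clen);
		if (cookie == -ENOMEM)		/* descriptor ring full */
			dma_sync_wait(chan, last_cookie);
	} while (cookie == -ENOMEM);
	last_cookie = cookie;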

Signed-off-by: Dan Williams <[email protected]>
---

include/linux/dmaengine.h | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 9fd6cbd..0a70c9e 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -750,6 +750,18 @@ static inline void dma_async_unmap_singl
chan->device->unmap_single(chan, handle, size, direction);
}

+static inline enum dma_status dma_sync_wait(struct dma_chan *chan,
+ dma_cookie_t cookie)
+{
+ enum dma_status status;
+ dma_async_issue_pending(chan);
+ do {
+ status = dma_async_operation_complete(chan, cookie, NULL, NULL);
+ } while (status == DMA_IN_PROGRESS);
+
+ return status;
+}
+
/* --- DMA device --- */

int dma_async_device_register(struct dma_device *device);

2006-09-11 23:18:52

by Dan Williams

Subject: [PATCH 12/19] dmaengine: dma_async_memcpy_err for DMA engines that do not support memcpy

From: Dan Williams <[email protected]>

Default virtual function that returns an error if the user attempts a
memcpy operation. An XOR engine is an example of a DMA engine that does
not support memcpy.
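
A driver for such an engine would plug this stub into its capability
table, mirroring the way ioatdma uses dma_async_do_xor_err in patch 10.
An illustrative probe-time snippet (not from any driver in this series):

	/* xor-only engine: advertise xor, route memcpy requests to the stub */
	device->common.capabilities = DMA_XOR;
	device->common.device_do_dma_memcpy = dma_async_do_memcpy_err;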

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index fe62237..33ad690 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -593,6 +593,18 @@ void dma_async_device_unregister(struct
}

/**
+ * dma_async_do_memcpy_err - default function for dma devices that
+ * do not support memcpy
+ */
+dma_cookie_t dma_async_do_memcpy_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_off,
+ size_t len, unsigned long flags)
+{
+ return -ENXIO;
+}
+
+/**
* dma_async_do_xor_err - default function for dma devices that
* do not support xor
*/
@@ -642,6 +654,7 @@ EXPORT_SYMBOL_GPL(dma_async_issue_pendin
EXPORT_SYMBOL_GPL(dma_async_device_register);
EXPORT_SYMBOL_GPL(dma_async_device_unregister);
EXPORT_SYMBOL_GPL(dma_chan_cleanup);
+EXPORT_SYMBOL_GPL(dma_async_do_memcpy_err);
EXPORT_SYMBOL_GPL(dma_async_do_xor_err);
EXPORT_SYMBOL_GPL(dma_async_do_memset_err);
EXPORT_SYMBOL_GPL(dma_async_chan_init);

2006-09-11 23:19:50

by Dan Williams

Subject: [PATCH 15/19] dmaengine: raid5 dma client

From: Dan Williams <[email protected]>

Adds a dmaengine client that is the hardware accelerated version of
raid5_do_soft_block_ops. It utilizes the raid5 workqueue implementation to
operate on multiple stripes simultaneously. See the iop-adma.c driver for
an example of a driver that enables hardware accelerated raid5.

Changelog:
* mark operations as _Dma rather than _Done until all outstanding
operations have completed. Once all operations have completed update the
state and return it to the handle list
* add a helper routine to retrieve the last used cookie
* use dma_async_zero_sum_dma_list for checking parity which optionally
allows parity check operations to not dirty the parity block in the cache
(if 'disks' is less than 'MAX_ADMA_XOR_SOURCES')
* remove dependencies on iop13xx
* take into account the fact that dma engines have a staging buffer so we
can perform 1 less block operation compared to software xor
* added __arch_raid5_dma_chan_request __arch_raid5_dma_next_channel and
__arch_raid5_dma_check_channel to make the driver architecture independent
* added channel switching capability for architectures that implement
different operations (i.e. copy & xor) on individual channels
* added initial support for "non-blocking" channel switching
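
A platform only has to supply the __arch_raid5_dma_* hooks declared in the
client to participate. A minimal single-channel sketch (everything except
the two hook names is hypothetical; the real IOP implementation arrives
with patches 16 and 19):

	/* set from the client's channel-added event (plumbing not shown) */
	static struct dma_chan *plat_raid5_chan;

	void __arch_raid5_dma_chan_request(struct dma_client *client)
	{
		/* ask the dmaengine core for a channel that can do both
		 * memcpy and xor; how iop-adma wires this up is in patch 16
		 */
	}

	struct dma_chan *__arch_raid5_dma_next_channel(struct dma_client *client)
	{
		return plat_raid5_chan;	/* one channel: nothing to rotate */
	}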

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/Kconfig | 9 +
drivers/dma/Makefile | 1
drivers/dma/raid5-dma.c | 730 ++++++++++++++++++++++++++++++++++++++++++++
drivers/md/Kconfig | 11 +
drivers/md/raid5.c | 66 ++++
include/linux/dmaengine.h | 5
include/linux/raid/raid5.h | 24 +
7 files changed, 839 insertions(+), 7 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 30d021d..fced8c3 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -22,6 +22,15 @@ config NET_DMA
Since this is the main user of the DMA engine, it should be enabled;
say Y here.

+config RAID5_DMA
+ tristate "MD raid5: block operations offload"
+ depends on INTEL_IOP_ADMA && MD_RAID456
+ default y
+ ---help---
+ This enables the use of DMA engines in the MD-RAID5 driver to
+ offload stripe cache operations, freeing CPU cycles.
+ say Y here
+
comment "DMA Devices"

config INTEL_IOATDMA
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index bdcfdbd..4e36d6e 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,3 +1,4 @@
obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
obj-$(CONFIG_NET_DMA) += iovlock.o
+obj-$(CONFIG_RAID5_DMA) += raid5-dma.o
obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
diff --git a/drivers/dma/raid5-dma.c b/drivers/dma/raid5-dma.c
new file mode 100644
index 0000000..04a1790
--- /dev/null
+++ b/drivers/dma/raid5-dma.c
@@ -0,0 +1,730 @@
+/*
+ * Offload raid5 operations to hardware RAID engines
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+#include <linux/raid/raid5.h>
+#include <linux/dmaengine.h>
+
+static struct dma_client *raid5_dma_client;
+static atomic_t raid5_count;
+extern void release_stripe(struct stripe_head *sh);
+extern void __arch_raid5_dma_chan_request(struct dma_client *client);
+extern struct dma_chan *__arch_raid5_dma_next_channel(struct dma_client *client);
+
+#define MAX_HW_XOR_SRCS 16
+
+#ifndef STRIPE_SIZE
+#define STRIPE_SIZE PAGE_SIZE
+#endif
+
+#ifndef STRIPE_SECTORS
+#define STRIPE_SECTORS (STRIPE_SIZE>>9)
+#endif
+
+#ifndef r5_next_bio
+#define r5_next_bio(bio, sect) ( ( (bio)->bi_sector + ((bio)->bi_size>>9) < sect + STRIPE_SECTORS) ? (bio)->bi_next : NULL)
+#endif
+
+#define DMA_RAID5_DEBUG 0
+#define PRINTK(x...) ((void)(DMA_RAID5_DEBUG && printk(x)))
+
+/*
+ * Copy data between a page in the stripe cache, and one or more bion
+ * The page could align with the middle of the bio, or there could be
+ * several bion, each with several bio_vecs, which cover part of the page
+ * Multiple bion are linked together on bi_next. There may be extras
+ * at the end of this list. We ignore them.
+ */
+static dma_cookie_t dma_raid_copy_data(int frombio, struct bio *bio,
+ dma_addr_t dma, sector_t sector, struct dma_chan *chan,
+ dma_cookie_t cookie)
+{
+ struct bio_vec *bvl;
+ struct page *bio_page;
+ int i;
+ int dma_offset;
+ dma_cookie_t last_cookie = cookie;
+
+ if (bio->bi_sector >= sector)
+ dma_offset = (signed)(bio->bi_sector - sector) * 512;
+ else
+ dma_offset = (signed)(sector - bio->bi_sector) * -512;
+ bio_for_each_segment(bvl, bio, i) {
+ int len = bio_iovec_idx(bio,i)->bv_len;
+ int clen;
+ int b_offset = 0;
+
+ if (dma_offset < 0) {
+ b_offset = -dma_offset;
+ dma_offset += b_offset;
+ len -= b_offset;
+ }
+
+ if (len > 0 && dma_offset + len > STRIPE_SIZE)
+ clen = STRIPE_SIZE - dma_offset;
+ else clen = len;
+
+ if (clen > 0) {
+ b_offset += bio_iovec_idx(bio,i)->bv_offset;
+ bio_page = bio_iovec_idx(bio,i)->bv_page;
+ if (frombio)
+ do {
+ cookie = dma_async_memcpy_pg_to_dma(chan,
+ dma + dma_offset,
+ bio_page,
+ b_offset,
+ clen);
+ if (cookie == -ENOMEM)
+ dma_sync_wait(chan, last_cookie);
+ else
+ WARN_ON(cookie <= 0);
+ } while (cookie == -ENOMEM);
+ else
+ do {
+ cookie = dma_async_memcpy_dma_to_pg(chan,
+ bio_page,
+ b_offset,
+ dma + dma_offset,
+ clen);
+ if (cookie == -ENOMEM)
+ dma_sync_wait(chan, last_cookie);
+ else
+ WARN_ON(cookie <= 0);
+ } while (cookie == -ENOMEM);
+ }
+ last_cookie = cookie;
+ if (clen < len) /* hit end of page */
+ break;
+ dma_offset += len;
+ }
+
+ return last_cookie;
+}
+
+#define issue_xor() do { \
+ do { \
+ cookie = dma_async_xor_dma_list_to_dma( \
+ sh->ops.dma_chan, \
+ xor_destination_addr, \
+ dma, \
+ count, \
+ STRIPE_SIZE); \
+ if (cookie == -ENOMEM) \
+ dma_sync_wait(sh->ops.dma_chan, \
+ sh->ops.dma_cookie); \
+ else \
+ WARN_ON(cookie <= 0); \
+ } while (cookie == -ENOMEM); \
+ sh->ops.dma_cookie = cookie; \
+ dma[0] = xor_destination_addr; \
+ count = 1; \
+ } while(0)
+#define check_xor() do { \
+ if (count == MAX_HW_XOR_SRCS) \
+ issue_xor(); \
+ } while (0)
+
+#ifdef CONFIG_RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH
+extern struct dma_chan *__arch_raid5_dma_check_channel(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ struct dma_client *client,
+ unsigned long capabilities);
+
+#ifdef CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE
+#define check_channel(cap, bookmark) do { \
+bookmark: \
+ next_chan = __arch_raid5_dma_check_channel(sh->ops.dma_chan, \
+ sh->ops.dma_cookie, \
+ raid5_dma_client, \
+ (cap)); \
+ if (!next_chan) { \
+ BUG_ON(sh->ops.ops_bookmark); \
+ sh->ops.ops_bookmark = &&bookmark; \
+ goto raid5_dma_retry; \
+ } else { \
+ sh->ops.dma_chan = next_chan; \
+ sh->ops.dma_cookie = dma_async_get_last_cookie( \
+ next_chan); \
+ sh->ops.ops_bookmark = NULL; \
+ } \
+} while (0)
+#else
+#define check_channel(cap, bookmark) do { \
+bookmark: \
+ next_chan = __arch_raid5_dma_check_channel(sh->ops.dma_chan, \
+ sh->ops.dma_cookie, \
+ raid5_dma_client, \
+ (cap)); \
+ if (!next_chan) { \
+ dma_sync_wait(sh->ops.dma_chan, sh->ops.dma_cookie); \
+ goto bookmark; \
+ } else { \
+ sh->ops.dma_chan = next_chan; \
+ sh->ops.dma_cookie = dma_async_get_last_cookie( \
+ next_chan); \
+ } \
+} while (0)
+#endif /* CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE */
+#else
+#define check_channel(cap, bookmark) do { } while (0)
+#endif /* CONFIG_RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH */
+
+/*
+ * dma_do_raid5_block_ops - perform block memory operations on stripe data
+ * outside the spin lock with dma engines
+ *
+ * A note about the need for __arch_raid5_dma_check_channel:
+ * This function is only needed to support architectures where a single raid
+ * operation spans multiple hardware channels. For example on a reconstruct
+ * write, memory copy operations are submitted to a memcpy channel and then
+ * the routine must switch to the xor channel to complete the raid operation.
+ * __arch_raid5_dma_check_channel makes sure the previous operation has
+ * completed before returning the new channel.
+ * Some efficiency can be gained by putting the stripe back on the work
+ * queue rather than spin waiting. This code is a work in progress and is
+ * available via the 'broken' option CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE.
+ * If 'wait via requeue' is not defined the check_channel macro live waits
+ * for the next channel.
+ */
+static void dma_do_raid5_block_ops(void *stripe_head_ref)
+{
+ struct stripe_head *sh = stripe_head_ref;
+ int i, pd_idx = sh->pd_idx, disks = sh->disks;
+ dma_addr_t dma[MAX_HW_XOR_SRCS];
+ int overlap=0;
+ unsigned long state, ops_state, ops_state_orig;
+ raid5_conf_t *conf = sh->raid_conf;
+ dma_cookie_t cookie;
+ #ifdef CONFIG_RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH
+ struct dma_chan *next_chan;
+ #endif
+
+ if (!sh->ops.dma_chan) {
+ sh->ops.dma_chan = __arch_raid5_dma_next_channel(raid5_dma_client);
+ dma_chan_get(sh->ops.dma_chan);
+ /* retrieve the last used cookie on this channel */
+ sh->ops.dma_cookie = dma_async_get_last_cookie(sh->ops.dma_chan);
+ }
+
+ /* take a snapshot of what needs to be done at this point in time */
+ spin_lock(&sh->lock);
+ state = sh->state;
+ ops_state_orig = ops_state = sh->ops.state;
+ spin_unlock(&sh->lock);
+
+ #ifdef CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE
+ /* pick up where we left off */
+ if (sh->ops.ops_bookmark)
+ goto *sh->ops.ops_bookmark;
+ #endif
+
+ if (test_bit(STRIPE_OP_BIOFILL, &state) &&
+ !test_bit(STRIPE_OP_BIOFILL_Dma, &ops_state)) {
+ struct bio *return_bi;
+ PRINTK("%s: stripe %llu STRIPE_OP_BIOFILL op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+ check_channel(DMA_MEMCPY, stripe_op_biofill);
+ return_bi = NULL;
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (test_bit(R5_ReadReq, &dev->flags)) {
+ struct bio *rbi, *rbi2;
+ spin_lock_irq(&conf->device_lock);
+ rbi = dev->toread;
+ dev->toread = NULL;
+ spin_unlock_irq(&conf->device_lock);
+ overlap++;
+ while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+ sh->ops.dma_cookie = dma_raid_copy_data(0,
+ rbi, dev->dma, dev->sector,
+ sh->ops.dma_chan,
+ sh->ops.dma_cookie);
+ rbi2 = r5_next_bio(rbi, dev->sector);
+ spin_lock_irq(&conf->device_lock);
+ if (--rbi->bi_phys_segments == 0) {
+ rbi->bi_next = return_bi;
+ return_bi = rbi;
+ }
+ spin_unlock_irq(&conf->device_lock);
+ rbi = rbi2;
+ }
+ dev->read = return_bi;
+ }
+ }
+ if (overlap)
+ set_bit(STRIPE_OP_BIOFILL_Dma, &ops_state);
+ }
+
+ if (test_bit(STRIPE_OP_COMPUTE, &state) &&
+ !test_bit(STRIPE_OP_COMPUTE_Dma, &ops_state)) {
+
+ /* dma engines do not need to pre-zero the destination */
+ if (test_and_clear_bit(STRIPE_OP_COMPUTE_Prep, &ops_state))
+ set_bit(STRIPE_OP_COMPUTE_Parity, &ops_state);
+
+ if (test_and_clear_bit(STRIPE_OP_COMPUTE_Parity, &ops_state)) {
+ dma_addr_t xor_destination_addr;
+ int dd_idx;
+ int count;
+
+ check_channel(DMA_XOR, stripe_op_compute_parity);
+ dd_idx = -1;
+ count = 0;
+
+ for (i=disks ; i-- ; )
+ if (test_bit(R5_ComputeReq, &sh->dev[i].flags)) {
+ dd_idx = i;
+ PRINTK("%s: stripe %llu STRIPE_OP_COMPUTE "
+ "op_state: %lx block: %d\n",
+ __FUNCTION__,
+ (unsigned long long)sh->sector,
+ ops_state, dd_idx);
+ break;
+ }
+
+ BUG_ON(dd_idx < 0);
+
+ xor_destination_addr = sh->dev[dd_idx].dma;
+
+ for (i=disks ; i-- ; )
+ if (i != dd_idx) {
+ dma[count++] = sh->dev[i].dma;
+ check_xor();
+ }
+
+ if (count > 1)
+ issue_xor();
+
+ set_bit(STRIPE_OP_COMPUTE_Dma, &ops_state);
+ }
+ }
+
+ if (test_bit(STRIPE_OP_RMW, &state) &&
+ !test_bit(STRIPE_OP_RMW_Dma, &ops_state)) {
+ BUG_ON(test_bit(STRIPE_OP_RCW, &state));
+
+ PRINTK("%s: stripe %llu STRIPE_OP_RMW op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+ if (test_and_clear_bit(STRIPE_OP_RMW_ParityPre, &ops_state)) {
+ dma_addr_t xor_destination_addr;
+ int count;
+
+ check_channel(DMA_XOR, stripe_op_rmw_paritypre);
+ count = 0;
+
+ /* existing parity data is used in the xor subtraction */
+ xor_destination_addr = dma[count++] = sh->dev[pd_idx].dma;
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ struct bio *chosen;
+
+ /* Only process blocks that are known to be uptodate */
+ if (dev->towrite && test_bit(R5_RMWReq, &dev->flags)) {
+ dma[count++] = dev->dma;
+
+ spin_lock(&sh->lock);
+ chosen = dev->towrite;
+ dev->towrite = NULL;
+ BUG_ON(dev->written);
+ dev->written = chosen;
+ spin_unlock(&sh->lock);
+
+ overlap++;
+
+ check_xor();
+ }
+ }
+ if (count > 1)
+ issue_xor();
+
+ set_bit(STRIPE_OP_RMW_Drain, &ops_state);
+ }
+
+ if (test_and_clear_bit(STRIPE_OP_RMW_Drain, &ops_state)) {
+
+ check_channel(DMA_MEMCPY, stripe_op_rmw_drain);
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ struct bio *wbi = dev->written;
+
+ while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+ sh->ops.dma_cookie = dma_raid_copy_data(1,
+ wbi, dev->dma, dev->sector,
+ sh->ops.dma_chan,
+ sh->ops.dma_cookie);
+ wbi = r5_next_bio(wbi, dev->sector);
+ }
+ }
+ set_bit(STRIPE_OP_RMW_ParityPost, &ops_state);
+ }
+
+ if (test_and_clear_bit(STRIPE_OP_RMW_ParityPost, &ops_state)) {
+ dma_addr_t xor_destination_addr;
+ int count;
+
+ check_channel(DMA_XOR, stripe_op_rmw_paritypost);
+ count = 0;
+
+ xor_destination_addr = dma[count++] = sh->dev[pd_idx].dma;
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (dev->written) {
+ dma[count++] = dev->dma;
+ check_xor();
+ }
+ }
+ if (count > 1)
+ issue_xor();
+
+ set_bit(STRIPE_OP_RMW_Dma, &ops_state);
+ }
+ }
+
+ if (test_bit(STRIPE_OP_RCW, &state) &&
+ !test_bit(STRIPE_OP_RCW_Dma, &ops_state)) {
+ BUG_ON(test_bit(STRIPE_OP_RMW, &state));
+
+ PRINTK("%s: stripe %llu STRIPE_OP_RCW op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+
+ if (test_and_clear_bit(STRIPE_OP_RCW_Drain, &ops_state)) {
+
+ check_channel(DMA_MEMCPY, stripe_op_rcw_drain);
+
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ struct bio *chosen;
+ struct bio *wbi;
+
+ if (i!=pd_idx && dev->towrite &&
+ test_bit(R5_LOCKED, &dev->flags)) {
+
+ spin_lock(&sh->lock);
+ chosen = dev->towrite;
+ dev->towrite = NULL;
+ BUG_ON(dev->written);
+ wbi = dev->written = chosen;
+ spin_unlock(&sh->lock);
+
+ overlap++;
+
+ while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+ sh->ops.dma_cookie = dma_raid_copy_data(1,
+ wbi, dev->dma, dev->sector,
+ sh->ops.dma_chan,
+ sh->ops.dma_cookie);
+ wbi = r5_next_bio(wbi, dev->sector);
+ }
+ }
+ }
+ set_bit(STRIPE_OP_RCW_Parity, &ops_state);
+ }
+
+ if (test_and_clear_bit(STRIPE_OP_RCW_Parity, &ops_state)) {
+ dma_addr_t xor_destination_addr;
+ int count;
+
+ check_channel(DMA_XOR, stripe_op_rcw_parity);
+ count = 0;
+
+ xor_destination_addr = sh->dev[pd_idx].dma;
+
+ for (i=disks; i--;)
+ if (i != pd_idx) {
+ dma[count++] = sh->dev[i].dma;
+ check_xor();
+ }
+ if (count > 1)
+ issue_xor();
+
+ set_bit(STRIPE_OP_RCW_Dma, &ops_state);
+ }
+ }
+
+ if (test_bit(STRIPE_OP_CHECK, &state) &&
+ !test_bit(STRIPE_OP_CHECK_Dma, &ops_state)) {
+ PRINTK("%s: stripe %llu STRIPE_OP_CHECK op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ ops_state);
+
+ if (test_and_clear_bit(STRIPE_OP_CHECK_Gen, &ops_state)) {
+
+ check_channel(DMA_XOR | DMA_ZERO_SUM, stripe_op_check_gen);
+
+ if (disks > MAX_HW_XOR_SRCS) {
+ /* we need to do a destructive xor
+ * i.e. the result needs to be temporarily stored in memory
+ */
+ dma_addr_t xor_destination_addr;
+ int count = 0;
+ int skip = -1;
+
+ xor_destination_addr = dma[count++] = sh->dev[pd_idx].dma;
+
+ /* xor all but one block */
+ for (i=disks; i--;)
+ if (i != pd_idx) {
+ if (skip < 0) {
+ skip = i;
+ continue;
+ }
+ dma[count++] = sh->dev[i].dma;
+ check_xor();
+ }
+ if (count > 1)
+ issue_xor();
+
+ /* zero result check the skipped block with
+ * the new parity
+ */
+ count = 2;
+ dma[1] = sh->dev[skip].dma;
+ do {
+ cookie = dma_async_zero_sum_dma_list(
+ sh->ops.dma_chan,
+ dma,
+ count,
+ STRIPE_SIZE,
+ &sh->ops.dma_result);
+ if (cookie == -ENOMEM)
+ dma_sync_wait(sh->ops.dma_chan,
+ sh->ops.dma_cookie);
+ else
+ WARN_ON(cookie <= 0);
+ } while (cookie == -ENOMEM);
+ sh->ops.dma_cookie = cookie;
+ } else {
+ int count = 0;
+ for (i=disks; i--;)
+ dma[count++] = sh->dev[i].dma;
+ do {
+ cookie = dma_async_zero_sum_dma_list(
+ sh->ops.dma_chan,
+ dma,
+ count,
+ STRIPE_SIZE,
+ &sh->ops.dma_result);
+ if (cookie == -ENOMEM)
+ dma_sync_wait(sh->ops.dma_chan,
+ sh->ops.dma_cookie);
+ else
+ WARN_ON(cookie <= 0);
+ } while (cookie == -ENOMEM);
+ sh->ops.dma_cookie = cookie;
+ }
+ set_bit(STRIPE_OP_CHECK_Verify, &ops_state);
+ set_bit(STRIPE_OP_CHECK_Dma, &ops_state);
+ }
+ }
+
+#ifdef CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE
+raid5_dma_retry:
+#endif
+ spin_lock(&sh->lock);
+ /* Update the state of operations:
+ * -clear incoming requests
+ * -preserve output status (i.e. done status / check result / dma)
+ * -preserve requests added since 'ops_state_orig' was set
+ */
+ sh->ops.state ^= (ops_state_orig & ~STRIPE_OP_COMPLETION_MASK);
+ sh->ops.state |= ops_state;
+
+ /* if we cleared an overlap condition wake up threads in make_request */
+ if (overlap)
+ for (i= disks; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (test_and_clear_bit(R5_Overlap, &dev->flags))
+ wake_up(&sh->raid_conf->wait_for_overlap);
+ }
+
+ if (dma_async_operation_complete(sh->ops.dma_chan, sh->ops.dma_cookie,
+ NULL, NULL) == DMA_IN_PROGRESS)
+ dma_async_issue_pending(sh->ops.dma_chan);
+ else { /* now that dma operations have quiesced update the stripe state */
+ int written, work;
+ written = 0;
+ work = 0;
+
+ if (test_and_clear_bit(STRIPE_OP_BIOFILL_Dma, &sh->ops.state)) {
+ work++;
+ set_bit(STRIPE_OP_BIOFILL_Done, &sh->ops.state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_COMPUTE_Dma, &sh->ops.state)) {
+ for (i=disks ; i-- ;)
+ if (test_and_clear_bit(R5_ComputeReq,
+ &sh->dev[i].flags)) {
+ set_bit(R5_UPTODATE,
+ &sh->dev[i].flags);
+ break;
+ }
+ work++;
+ set_bit(STRIPE_OP_COMPUTE_Done, &sh->ops.state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_RCW_Dma, &sh->ops.state)) {
+ work++;
+ written++;
+ set_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+ set_bit(STRIPE_OP_RCW_Done, &sh->ops.state);
+ }
+ if (test_and_clear_bit(STRIPE_OP_RMW_Dma, &sh->ops.state)) {
+ work++;
+ written++;
+ set_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+ set_bit(STRIPE_OP_RMW_Done, &sh->ops.state);
+ }
+ if (written)
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (dev->written)
+ set_bit(R5_UPTODATE, &dev->flags);
+ }
+ if (test_and_clear_bit(STRIPE_OP_CHECK_Dma, &sh->ops.state)) {
+ if (test_and_clear_bit(STRIPE_OP_CHECK_Verify,
+ &sh->ops.state)) {
+ work++;
+ if (sh->ops.dma_result == 0) {
+ set_bit(STRIPE_OP_CHECK_IsZero,
+ &sh->ops.state);
+
+ /* if the parity is correct and we
+ * performed the check without dirtying
+ * the parity block, mark it up to date.
+ */
+ if (disks <= MAX_HW_XOR_SRCS)
+ set_bit(R5_UPTODATE,
+ &sh->dev[sh->pd_idx].flags);
+
+ } else
+ clear_bit(STRIPE_OP_CHECK_IsZero,
+ &sh->ops.state);
+
+ set_bit(STRIPE_OP_CHECK_Done, &sh->ops.state);
+
+ } else
+ BUG();
+ }
+
+ sh->ops.pending -= work;
+ BUG_ON(sh->ops.pending < 0);
+
+ #ifdef CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE
+ /* return to the bookmark to continue the operation */
+ if (sh->ops.ops_bookmark) {
+ overlap = 0;
+ state = sh->state;
+ ops_state_orig = ops_state = sh->ops.state;
+ spin_unlock(&sh->lock);
+ goto *sh->ops.ops_bookmark;
+ }
+ #endif
+
+ /* the stripe is done with the channel */
+ dma_chan_put(sh->ops.dma_chan);
+ sh->ops.dma_chan = NULL;
+ sh->ops.dma_cookie = 0;
+ }
+
+ BUG_ON(sh->ops.pending == 0 && sh->ops.dma_chan);
+ clear_bit(STRIPE_OP_QUEUED, &sh->state);
+ set_bit(STRIPE_HANDLE, &sh->state);
+ queue_raid_work(sh);
+ spin_unlock(&sh->lock);
+
+ release_stripe(sh);
+}
+
+static void raid5_dma_event_callback(struct dma_client *client,
+ struct dma_chan *chan, enum dma_event event)
+{
+ switch (event) {
+ case DMA_RESOURCE_SUSPEND:
+ PRINTK("%s: DMA_RESOURCE_SUSPEND\n", __FUNCTION__);
+ break;
+ case DMA_RESOURCE_RESUME:
+ PRINTK("%s: DMA_RESOURCE_RESUME\n", __FUNCTION__);
+ break;
+ case DMA_RESOURCE_ADDED:
+ PRINTK("%s: DMA_RESOURCE_ADDED\n", __FUNCTION__);
+ break;
+ case DMA_RESOURCE_REMOVED:
+ PRINTK("%s: DMA_RESOURCE_REMOVED\n", __FUNCTION__);
+ break;
+ default:
+ PRINTK("%s: unknown\n", __FUNCTION__);
+ break;
+ }
+
+}
+
+static int __init raid5_dma_init (void)
+{
+ raid5_dma_client = dma_async_client_register(
+ &raid5_dma_event_callback);
+
+ if (raid5_dma_client == NULL)
+ return -ENOMEM;
+
+ __arch_raid5_dma_chan_request(raid5_dma_client);
+
+ printk("raid5-dma: driver initialized\n");
+ return 0;
+
+}
+
+static void __exit raid5_dma_exit (void)
+{
+ if (raid5_dma_client)
+ dma_async_client_unregister(raid5_dma_client);
+
+ raid5_dma_client = NULL;
+}
+
+static struct dma_chan *raid5_dma_next_channel(void)
+{
+ return __arch_raid5_dma_next_channel(raid5_dma_client);
+}
+
+void raid5_dma_get_dma(struct raid5_dma *dma)
+{
+ dma->owner = THIS_MODULE;
+ dma->channel_iterate = raid5_dma_next_channel;
+ dma->do_block_ops = dma_do_raid5_block_ops;
+ atomic_inc(&raid5_count);
+}
+
+EXPORT_SYMBOL_GPL(raid5_dma_get_dma);
+
+module_init(raid5_dma_init);
+module_exit(raid5_dma_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("RAID5-DMA Offload Driver");
+MODULE_LICENSE("GPL");
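
Note for reviewers: every submission in dma_do_raid5_block_ops() above copes
with descriptor pool exhaustion the same way, by synchronously waiting on the
channel's last issued cookie and retrying. A condensed sketch of that pattern
follows (not part of the patch; the helper name is made up, the dmaengine
calls are the ones used above):

#include <linux/dmaengine.h>

/* illustrative only: the -ENOMEM retry loop used for the zero sum
 * submissions in dma_do_raid5_block_ops(), factored out for clarity
 */
static dma_cookie_t raid5_dma_submit_zero_sum(struct dma_chan *chan,
		dma_addr_t *dma, int count, size_t len, u32 *result,
		dma_cookie_t last_cookie)
{
	dma_cookie_t cookie;

	do {
		cookie = dma_async_zero_sum_dma_list(chan, dma, count,
						     len, result);
		if (cookie == -ENOMEM)
			/* the channel's descriptor pool is exhausted; wait
			 * for the last issued operation to retire and retry
			 */
			dma_sync_wait(chan, last_cookie);
		else
			WARN_ON(cookie <= 0);
	} while (cookie == -ENOMEM);

	return cookie;
}
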
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 2a16b3b..dbd3ddc 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -183,6 +183,17 @@ config MD_RAID456_WORKQUEUE_MULTITHREAD

If unsure say, Y.

+config MD_RAID5_HW_OFFLOAD
+ depends on MD_RAID456 && RAID5_DMA
+ bool "Execute raid5 xor/copy operations with hardware engines"
+ default y
+ ---help---
+ On platforms with the requisite hardware capabilities MD
+ can offload RAID5 stripe cache operations (i.e. parity
+ maintenance and bio buffer copies).
+
+ If unsure, say Y.
+
config MD_MULTIPATH
tristate "Multipath I/O support"
depends on BLK_DEV_MD
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ad6883b..4daa335 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -53,6 +53,16 @@ #include "raid6.h"

#include <linux/raid/bitmap.h>

+#ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+#include <linux/dma-mapping.h>
+extern void raid5_dma_get_dma(struct raid5_dma *dma);
+#endif /* CONFIG_MD_RAID5_HW_OFFLOAD */
+
+#ifdef CONFIG_MD_RAID6_HW_OFFLOAD
+#include <linux/dma-mapping.h>
+extern void raid6_dma_get_dma(struct raid6_dma *dma);
+#endif /* CONFIG_MD_RAID6_HW_OFFLOAD */
+
/*
* Stripe cache
*/
@@ -138,7 +148,7 @@ static void __release_stripe(raid5_conf_
}
}
}
-static void release_stripe(struct stripe_head *sh)
+void release_stripe(struct stripe_head *sh)
{
raid5_conf_t *conf = sh->raid_conf;
unsigned long flags;
@@ -193,6 +203,17 @@ static void shrink_buffers(struct stripe
p = sh->dev[i].page;
if (!p)
continue;
+ #ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ do {
+ raid5_conf_t *conf = sh->raid_conf;
+ struct dma_chan *chan = conf->dma.channel_iterate();
+ /* assumes that all channels share the same mapping
+ * characteristics
+ */
+ dma_async_unmap_page(chan, sh->dev[i].dma,
+ PAGE_SIZE, DMA_FROM_DEVICE);
+ } while (0);
+ #endif
sh->dev[i].page = NULL;
put_page(p);
}
@@ -209,6 +230,20 @@ static int grow_buffers(struct stripe_he
return 1;
}
sh->dev[i].page = page;
+ #ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ do {
+ raid5_conf_t *conf = sh->raid_conf;
+ struct dma_chan *chan = conf->dma.channel_iterate();
+ /* assumes that all channels share the same mapping
+ * characteristics
+ */
+ sh->dev[i].dma = dma_async_map_page(chan,
+ sh->dev[i].page,
+ 0,
+ PAGE_SIZE,
+ DMA_FROM_DEVICE);
+ } while (0);
+ #endif
}
return 0;
}
@@ -576,6 +611,13 @@ #if 0
#else
set_bit(R5_UPTODATE, &sh->dev[i].flags);
#endif
+#ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ /* If the backing block device driver performed a pio
+ * read then the buffer needs to be cleaned
+ */
+ consistent_sync(page_address(sh->dev[i].page), PAGE_SIZE,
+ DMA_TO_DEVICE);
+#endif
if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
rdev = conf->disks[i].rdev;
printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
@@ -666,6 +708,15 @@ static int raid5_end_write_request (stru
rdev_dec_pending(conf->disks[i].rdev, conf->mddev);

clear_bit(R5_LOCKED, &sh->dev[i].flags);
+
+ #ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ /* If the backing block device driver performed a pio
+ * write then the buffer needs to be invalidated
+ */
+ consistent_sync(page_address(sh->dev[i].page), PAGE_SIZE,
+ DMA_FROM_DEVICE);
+ #endif
+
set_bit(STRIPE_HANDLE, &sh->state);
__release_stripe(conf, sh);
spin_unlock_irqrestore(&conf->device_lock, flags);
@@ -1311,6 +1362,7 @@ static int stripe_to_pdidx(sector_t stri
return pd_idx;
}

+#ifndef CONFIG_MD_RAID5_HW_OFFLOAD
/*
* raid5_do_soft_block_ops - perform block memory operations on stripe data
* outside the spin lock.
@@ -1600,6 +1652,7 @@ static void raid5_do_soft_block_ops(void

release_stripe(sh);
}
+#endif /* #ifndef CONFIG_MD_RAID5_HW_OFFLOAD*/

/*
* handle_stripe - do things to a stripe.
@@ -3553,12 +3606,12 @@ static int run(mddev_t *mddev)
#endif
#endif

- /* To Do:
- * 1/ Offload to asynchronous copy / xor engines
- * 2/ Automated selection of optimal do_block_ops
- * routine similar to the xor template selection
- */
+ #ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ raid5_dma_get_dma(&conf->dma);
+ conf->do_block_ops = conf->dma.do_block_ops;
+ #else
conf->do_block_ops = raid5_do_soft_block_ops;
+ #endif


spin_lock_init(&conf->device_lock);
@@ -4184,6 +4237,7 @@ static void raid5_exit(void)
unregister_md_personality(&raid4_personality);
}

+EXPORT_SYMBOL(release_stripe);
module_init(raid5_init);
module_exit(raid5_exit);
MODULE_LICENSE("GPL");
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 0a70c9e..7fd5aaf 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -762,6 +762,11 @@ static inline enum dma_status dma_sync_w
return status;
}

+static inline dma_cookie_t dma_async_get_last_cookie(struct dma_chan *chan)
+{
+ return chan->cookie;
+}
+
/* --- DMA device --- */

int dma_async_device_register(struct dma_device *device);
diff --git a/include/linux/raid/raid5.h b/include/linux/raid/raid5.h
index 31ae55c..f5b021d 100644
--- a/include/linux/raid/raid5.h
+++ b/include/linux/raid/raid5.h
@@ -4,6 +4,9 @@ #define _RAID5_H
#include <linux/raid/md.h>
#include <linux/raid/xor.h>
#include <linux/workqueue.h>
+#ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+#include <linux/dmaengine.h>
+#endif

/*
*
@@ -169,16 +172,28 @@ struct stripe_head {
#ifdef CONFIG_MD_RAID456_WORKQUEUE
struct work_struct work; /* work queue descriptor */
#endif
+ #ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ u32 dma_result; /* storage for dma engine zero sum results */
+ dma_cookie_t dma_cookie; /* last issued dma operation */
+ struct dma_chan *dma_chan; /* dma channel for ops offload */
+ #ifdef CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE
+ void *ops_bookmark; /* place holder for requeued stripes */
+ #endif /* CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE */
+ #endif /* CONFIG_MD_RAID5_HW_OFFLOAD */
} ops;
struct r5dev {
struct bio req;
struct bio_vec vec;
struct page *page;
+ #ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ dma_addr_t dma;
+ #endif
struct bio *toread, *read, *towrite, *written;
sector_t sector; /* sector of this page */
unsigned long flags;
} dev[1]; /* allocated with extra space depending of RAID geometry */
};
+
/* Flags */
#define R5_UPTODATE 0 /* page contains current data */
#define R5_LOCKED 1 /* IO has been submitted on "req" */
@@ -190,7 +205,6 @@ #define R5_Wantwrite 5
#define R5_Overlap 7 /* There is a pending overlapping request on this block */
#define R5_ReadError 8 /* seen a read error here recently */
#define R5_ReWrite 9 /* have tried to over-write the readerror */
-
#define R5_Expanded 10 /* This block now has post-expand data */
#define R5_Consistent 11 /* Block is HW DMA-able without a cache flush */
#define R5_ComputeReq 12 /* compute_block in progress treat as uptodate */
@@ -373,6 +387,14 @@ struct raid5_private_data {
int pool_size; /* number of disks in stripeheads in pool */
spinlock_t device_lock;
struct disk_info *disks;
+#ifdef CONFIG_MD_RAID5_HW_OFFLOAD
+ struct raid5_dma {
+ struct module *owner;
+ void (*do_block_ops)(void *stripe_ref);
+ struct dma_chan * (*channel_iterate)(void);
+ } dma;
+#endif
+
};

typedef struct raid5_private_data raid5_conf_t;
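
One subtle piece of the work queue routine is the ops.state merge performed
under sh->lock at the raid5_dma_retry label in raid5-dma.c: clear the requests
that were serviced, keep the completion bits, and keep any requests that
arrived while the stripe was being operated on. The userspace toy below is
purely illustrative; the REQ_*/DONE_* names are stand-ins for the STRIPE_OP_*
flags and COMPLETION_MASK stands in for STRIPE_OP_COMPLETION_MASK. It shows
why the xor/or pair does exactly that:

#include <stdio.h>

#define REQ_A		(1UL << 0)	/* request present in the snapshot */
#define REQ_B		(1UL << 1)	/* request added after the snapshot */
#define DONE_A		(1UL << 2)	/* completion bit set while operating */
#define COMPLETION_MASK	DONE_A

int main(void)
{
	unsigned long shared = REQ_A;	/* sh->ops.state at snapshot time */
	unsigned long orig = shared;	/* ops_state_orig */
	unsigned long work = orig;	/* ops_state, the working copy */

	/* service REQ_A outside the lock: consume it, record completion */
	work &= ~REQ_A;
	work |= DONE_A;

	/* meanwhile another thread queues a new request on the stripe */
	shared |= REQ_B;

	/* the merge from dma_do_raid5_block_ops() */
	shared ^= (orig & ~COMPLETION_MASK);	/* clear serviced requests */
	shared |= work;				/* keep completion bits */

	/* prints: REQ_A=0 REQ_B=1 DONE_A=1 */
	printf("REQ_A=%d REQ_B=%d DONE_A=%d\n",
	       !!(shared & REQ_A), !!(shared & REQ_B), !!(shared & DONE_A));
	return 0;
}
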

2006-09-11 23:19:47

by Dan Williams

[permalink] [raw]
Subject: [PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines

From: Dan Williams <[email protected]>

This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor,
pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy
operations.

Changelog:
* fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few
slots to be requested, eventually leading to data corruption
* enabled the slot allocation routine to attempt to free slots before
returning -ENOMEM
* switched the cleanup routine to solely use the software chain and the
status register to determine if a descriptor is complete. This is
necessary to support other IOP engines that do not have status writeback
capability
* made the driver IOP generic
* modified the allocation routines to understand allocating a group of
slots for a single operation
* added a null xor initialization operation for the xor-only channel on
iop3xx
* added software emulation of zero sum on iop32x
* added support for xor operations on buffers larger than the hardware maximum
* added architecture-specific raid5-dma support functions

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/Kconfig | 27 +
drivers/dma/Makefile | 1
drivers/dma/iop-adma.c | 1501 +++++++++++++++++++++++++++++++++++
include/asm-arm/hardware/iop_adma.h | 98 ++
4 files changed, 1624 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index fced8c3..3556143 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -7,8 +7,8 @@ menu "DMA Engine support"
config DMA_ENGINE
bool "Support for DMA engines"
---help---
- DMA engines offload copy operations from the CPU to dedicated
- hardware, allowing the copies to happen asynchronously.
+ DMA engines offload block memory operations from the CPU to dedicated
+ hardware, allowing the operations to happen asynchronously.

comment "DMA Clients"

@@ -28,9 +28,19 @@ config RAID5_DMA
default y
---help---
This enables the use of DMA engines in the MD-RAID5 driver to
- offload stripe cache operations, freeing CPU cycles.
+ offload stripe cache operations (i.e. xor, memcpy), freeing CPU cycles.
say Y here

+config RAID5_DMA_WAIT_VIA_REQUEUE
+ bool "raid5-dma: Non-blocking channel switching"
+ depends on RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH && RAID5_DMA && BROKEN
+ default n
+ ---help---
+ This enables the raid5-dma driver to continue to operate on incoming
+ stripes when it determines that the current stripe must wait for a
+ hardware channel to finish operations. This code is a work in
+ progress; only say Y to debug the implementation, otherwise say N.
+
comment "DMA Devices"

config INTEL_IOATDMA
@@ -40,4 +50,15 @@ config INTEL_IOATDMA
---help---
Enable support for the Intel(R) I/OAT DMA engine.

+config INTEL_IOP_ADMA
+ tristate "Intel IOP ADMA support"
+ depends on DMA_ENGINE && (ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX)
+ select RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH if (ARCH_IOP32X || ARCH_IOP33X)
+ default m
+ ---help---
+ Enable support for the Intel(R) IOP Series RAID engines.
+
+config RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH
+ bool
+
endmenu
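
For reference, on an IOP32x build the defaults above, together with the
MD_RAID5_HW_OFFLOAD option added earlier in the series, resolve to something
like the following .config fragment (illustrative only; exact values depend
on the rest of the configuration):

CONFIG_MD_RAID456=y
CONFIG_MD_RAID5_HW_OFFLOAD=y
CONFIG_DMA_ENGINE=y
CONFIG_RAID5_DMA=y
CONFIG_RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH=y
# CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE is not set
CONFIG_INTEL_IOP_ADMA=m
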
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 4e36d6e..233eae7 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
obj-$(CONFIG_NET_DMA) += iovlock.o
obj-$(CONFIG_RAID5_DMA) += raid5-dma.o
obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
+obj-$(CONFIG_INTEL_IOP_ADMA) += iop-adma.o
diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c
new file mode 100644
index 0000000..51f1c54
--- /dev/null
+++ b/drivers/dma/iop-adma.c
@@ -0,0 +1,1501 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This driver supports the asynchronous DMA copy and RAID engines available
+ * on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x)
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/dmaengine.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <asm/arch/adma.h>
+#include <asm/memory.h>
+
+#define to_iop_adma_chan(chan) container_of(chan, struct iop_adma_chan, common)
+#define to_iop_adma_device(dev) container_of(dev, struct iop_adma_device, common)
+#define to_iop_adma_slot(lh) container_of(lh, struct iop_adma_desc_slot, slot_node)
+
+#define IOP_ADMA_DEBUG 0
+#define PRINTK(x...) ((void)(IOP_ADMA_DEBUG && printk(x)))
+
+/* software zero sum implementation bits for iop32x */
+#ifdef CONFIG_ARCH_IOP32X
+char iop32x_zero_result_buffer[PAGE_SIZE] __attribute__((aligned(256)));
+u32 *iop32x_zero_sum_output;
+#endif
+
+/**
+ * iop_adma_free_slots - flags descriptor slots for reuse
+ * @slot: Slot to free
+ * Caller must hold &iop_chan->lock while calling this function
+ */
+static inline void iop_adma_free_slots(struct iop_adma_desc_slot *slot)
+{
+ int stride = slot->stride;
+ while (stride--) {
+ slot->stride = 0;
+ slot = list_entry(slot->slot_node.next,
+ struct iop_adma_desc_slot,
+ slot_node);
+ }
+}
+
+static void __iop_adma_slot_cleanup(struct iop_adma_chan *iop_chan)
+{
+ struct iop_adma_desc_slot *iter, *_iter;
+ dma_cookie_t cookie = 0;
+ struct device *dev = &iop_chan->device->pdev->dev;
+ u32 current_desc = iop_chan_get_current_descriptor(iop_chan);
+ int busy = iop_chan_is_busy(iop_chan);
+ int seen_current = 0;
+
+ /* free completed slots from the chain starting with
+ * the oldest descriptor
+ */
+ list_for_each_entry_safe(iter, _iter, &iop_chan->chain,
+ chain_node) {
+ PRINTK("%s: [%d] cookie: %d busy: %x next: %x\n",
+ __FUNCTION__, iter->idx, iter->cookie, busy,
+ iop_desc_get_next_desc(iter, iop_chan));
+
+ /* do not advance past the current descriptor loaded into the
+ * hardware channel, subsequent descriptors are either in process
+ * or have not been submitted
+ */
+ if (seen_current)
+ break;
+
+ /* stop the search if we reach the current descriptor and the
+ * channel is busy, or if it appears that the current descriptor
+ * needs to be re-read (i.e. has been appended to)
+ */
+ if (iter->phys == current_desc) {
+ BUG_ON(seen_current++);
+ if (busy || iop_desc_get_next_desc(iter, iop_chan))
+ break;
+ }
+
+ /* if we are tracking a group of zero-result descriptors add
+ * the current result to the accumulator
+ */
+ if (iop_chan->zero_sum_group) {
+ iop_chan->result_accumulator |=
+ iop_desc_get_zero_result(iter);
+ PRINTK("%s: add to zero sum group acc: %d this: %d\n", __FUNCTION__,
+ iop_chan->result_accumulator, iop_desc_get_zero_result(iter));
+ }
+
+ if (iter->cookie) {
+ u32 src_cnt = iter->src_cnt;
+ u32 len = iop_desc_get_byte_count(iter, iop_chan);
+ dma_addr_t addr;
+
+ cookie = iter->cookie;
+ iter->cookie = 0;
+
+ /* the first and last descriptor in a zero sum group
+ * will have 'xor_check_result' set
+ */
+ if (iter->xor_check_result) {
+ if (iter->slot_cnt > iter->slots_per_op) {
+ if (!iop_chan->zero_sum_group) {
+ iop_chan->zero_sum_group = 1;
+ iop_chan->result_accumulator |=
+ iop_desc_get_zero_result(iter);
+ }
+ PRINTK("%s: start zero sum group acc: %d this: %d\n", __FUNCTION__,
+ iop_chan->result_accumulator, iop_desc_get_zero_result(iter));
+ } else {
+ if (!iop_chan->zero_sum_group)
+ iop_chan->result_accumulator |=
+ iop_desc_get_zero_result(iter);
+ else
+ iop_chan->zero_sum_group = 0;
+
+ *iter->xor_check_result = iop_chan->result_accumulator;
+ iop_chan->result_accumulator = 0;
+
+ PRINTK("%s: end zero sum group acc: %d this: %d\n", __FUNCTION__,
+ *iter->xor_check_result, iop_desc_get_zero_result(iter));
+ }
+ }
+
+ /* unmap dma ranges */
+ switch (iter->flags & (DMA_DEST_BUF | DMA_DEST_PAGE |
+ DMA_DEST_DMA)) {
+ case DMA_DEST_BUF:
+ addr = iop_desc_get_dest_addr(iter, iop_chan);
+ dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_PAGE:
+ addr = iop_desc_get_dest_addr(iter, iop_chan);
+ dma_unmap_page(dev, addr, len, DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_DMA:
+ break;
+ }
+
+ switch (iter->flags & (DMA_SRC_BUF |
+ DMA_SRC_PAGE | DMA_SRC_DMA |
+ DMA_SRC_PAGES | DMA_SRC_DMA_LIST)) {
+ case DMA_SRC_BUF:
+ addr = iop_desc_get_src_addr(iter, iop_chan, 0);
+ dma_unmap_single(dev, addr, len, DMA_TO_DEVICE);
+ break;
+ case DMA_SRC_PAGE:
+ addr = iop_desc_get_src_addr(iter, iop_chan, 0);
+ dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
+ break;
+ case DMA_SRC_PAGES:
+ while(src_cnt--) {
+ addr = iop_desc_get_src_addr(iter,
+ iop_chan,
+ src_cnt);
+ dma_unmap_page(dev, addr, len,
+ DMA_TO_DEVICE);
+ }
+ break;
+ case DMA_SRC_DMA:
+ case DMA_SRC_DMA_LIST:
+ break;
+ }
+ }
+
+ /* leave the last descriptor in the chain
+ * so we can append to it
+ */
+ if (iter->chain_node.next == &iop_chan->chain)
+ break;
+
+ PRINTK("iop adma%d: cleanup %d stride %d\n",
+ iop_chan->device->id, iter->idx, iter->stride);
+
+ list_del(&iter->chain_node);
+ iop_adma_free_slots(iter);
+ }
+
+ BUG_ON(!seen_current);
+
+ if (cookie) {
+ iop_chan->completed_cookie = cookie;
+
+ PRINTK("iop adma%d: completed cookie %d\n",
+ iop_chan->device->id, cookie);
+ }
+}
+
+static inline void iop_adma_slot_cleanup(struct iop_adma_chan *iop_chan)
+{
+ spin_lock_bh(&iop_chan->lock);
+ __iop_adma_slot_cleanup(iop_chan);
+ spin_unlock_bh(&iop_chan->lock);
+}
+
+static struct iop_adma_desc_slot *
+__iop_adma_alloc_slots(struct iop_adma_chan *iop_chan, int num_slots,
+ int slots_per_op, int recurse)
+{
+ struct iop_adma_desc_slot *iter = NULL, *alloc_start = NULL;
+ int i;
+
+ /* start search from the last allocated descriptor
+ * if a contiguous allocation cannot be found, start searching
+ * from the beginning of the list
+ */
+ for (i = 0; i < 2; i++) {
+ int slots_found = 0;
+ if (i == 0)
+ iter = iop_chan->last_used;
+ else {
+ iter = list_entry(&iop_chan->all_slots,
+ struct iop_adma_desc_slot,
+ slot_node);
+ }
+
+ list_for_each_entry_continue(iter, &iop_chan->all_slots, slot_node) {
+ if (iter->stride) {
+ /* give up after finding the first busy slot
+ * on the second pass through the list
+ */
+ if (i == 1)
+ break;
+
+ slots_found = 0;
+ continue;
+ }
+
+ /* start the allocation if the slot is correctly aligned */
+ if (!slots_found++) {
+ if (iop_desc_is_aligned(iter, slots_per_op))
+ alloc_start = iter;
+ else {
+ slots_found = 0;
+ continue;
+ }
+ }
+
+ if (slots_found == num_slots) {
+ iter = alloc_start;
+ while (num_slots) {
+ PRINTK("iop adma%d: allocated [%d] "
+ "(desc %p phys: %#x) stride %d\n",
+ iop_chan->device->id,
+ iter->idx, iter->hw_desc, iter->phys,
+ slots_per_op);
+ iop_chan->last_used = iter;
+ list_add_tail(&iter->chain_node,
+ &iop_chan->chain);
+ iter->slot_cnt = num_slots;
+ iter->slots_per_op = slots_per_op;
+ iter->xor_check_result = NULL;
+ iter->cookie = 0;
+ for (i = 0; i < slots_per_op; i++) {
+ iter->stride = slots_per_op - i;
+ iter = list_entry(iter->slot_node.next,
+ struct iop_adma_desc_slot,
+ slot_node);
+ }
+ num_slots -= slots_per_op;
+ }
+ return alloc_start;
+ }
+ }
+ }
+
+ /* try once to free some slots if the allocation fails */
+ if (recurse) {
+ __iop_adma_slot_cleanup(iop_chan);
+ return __iop_adma_alloc_slots(iop_chan, num_slots, slots_per_op, 0);
+ } else
+ return NULL;
+}
+
+static struct iop_adma_desc_slot *
+iop_adma_alloc_slots(struct iop_adma_chan *iop_chan,
+ int num_slots,
+ int slots_per_op)
+{
+ return __iop_adma_alloc_slots(iop_chan, num_slots, slots_per_op, 1);
+}
+
+static void iop_chan_start_null_memcpy(struct iop_adma_chan *iop_chan);
+static void iop_chan_start_null_xor(struct iop_adma_chan *iop_chan);
+
+/* returns the actual number of allocated descriptors */
+static int iop_adma_alloc_chan_resources(struct dma_chan *chan)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ struct iop_adma_desc_slot *slot = NULL;
+ char *hw_desc;
+ int i;
+ int init = iop_chan->slots_allocated ? 0 : 1;
+ struct iop_adma_platform_data *plat_data;
+
+ plat_data = iop_chan->device->pdev->dev.platform_data;
+
+ spin_lock_bh(&iop_chan->lock);
+ /* Allocate descriptor slots */
+ i = iop_chan->slots_allocated;
+ for (; i < (plat_data->pool_size/IOP_ADMA_SLOT_SIZE); i++) {
+ slot = kmalloc(sizeof(*slot), GFP_KERNEL);
+ if (!slot) {
+ printk(KERN_INFO "IOP ADMA Channel only initialized"
+ " %d descriptor slots", i--);
+ break;
+ }
+ hw_desc = (char *) iop_chan->device->dma_desc_pool_virt;
+ slot->hw_desc = (void *) &hw_desc[i * IOP_ADMA_SLOT_SIZE];
+
+ INIT_LIST_HEAD(&slot->chain_node);
+ INIT_LIST_HEAD(&slot->slot_node);
+ hw_desc = (char *) iop_chan->device->dma_desc_pool;
+ slot->phys = (dma_addr_t) &hw_desc[i * IOP_ADMA_SLOT_SIZE];
+ slot->stride = 0;
+ slot->cookie = 0;
+ slot->xor_check_result = NULL;
+ slot->idx = i;
+ list_add_tail(&slot->slot_node, &iop_chan->all_slots);
+ }
+ if (i && !iop_chan->last_used)
+ iop_chan->last_used = list_entry(iop_chan->all_slots.next,
+ struct iop_adma_desc_slot,
+ slot_node);
+
+ iop_chan->slots_allocated = i;
+ PRINTK("iop adma%d: allocated %d descriptor slots last_used: %p\n",
+ iop_chan->device->id, i, iop_chan->last_used);
+ spin_unlock_bh(&iop_chan->lock);
+
+ /* initialize the channel and the chain with a null operation */
+ if (init) {
+ if (iop_chan->device->common.capabilities & DMA_MEMCPY)
+ iop_chan_start_null_memcpy(iop_chan);
+ else if (iop_chan->device->common.capabilities & DMA_XOR)
+ iop_chan_start_null_xor(iop_chan);
+ else
+ BUG();
+ }
+
+ return (i > 0) ? i : -ENOMEM;
+}
+
+/* chain the descriptors */
+static inline void iop_chan_chain_desc(struct iop_adma_chan *iop_chan,
+ struct iop_adma_desc_slot *desc)
+{
+ struct iop_adma_desc_slot *prev = list_entry(desc->chain_node.prev,
+ struct iop_adma_desc_slot,
+ chain_node);
+ iop_desc_set_next_desc(prev, iop_chan, desc->phys);
+}
+
+static inline void iop_desc_assign_cookie(struct iop_adma_chan *iop_chan,
+ struct iop_adma_desc_slot *desc)
+{
+ dma_cookie_t cookie = iop_chan->common.cookie;
+ cookie++;
+ if (cookie < 0)
+ cookie = 1;
+ iop_chan->common.cookie = desc->cookie = cookie;
+ PRINTK("iop adma%d: %s cookie %d slot %d\n",
+ iop_chan->device->id, __FUNCTION__, cookie, desc->idx);
+}
+
+static inline void iop_adma_check_threshold(struct iop_adma_chan *iop_chan)
+{
+ if (iop_chan->pending >= IOP_ADMA_THRESHOLD) {
+ iop_chan->pending = 0;
+ iop_chan_append(iop_chan);
+ }
+}
+
+static dma_cookie_t do_iop_adma_memcpy(struct dma_chan *chan,
+ union dmaengine_addr dest,
+ unsigned int dest_off,
+ union dmaengine_addr src,
+ unsigned int src_off,
+ size_t len,
+ unsigned long flags)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ dma_cookie_t ret = -ENOMEM;
+ struct iop_adma_desc_slot *sw_desc;
+ int slot_cnt, slots_per_op;
+
+ if (!chan || !dest.dma || !src.dma)
+ return -EFAULT;
+ if (!len)
+ return iop_chan->common.cookie;
+
+ PRINTK("iop adma%d: %s len: %u flags: %#lx\n",
+ iop_chan->device->id, __FUNCTION__, len, flags);
+
+ switch (flags & (DMA_SRC_BUF | DMA_SRC_PAGE | DMA_SRC_DMA)) {
+ case DMA_SRC_BUF:
+ src.dma = dma_map_single(&iop_chan->device->pdev->dev,
+ src.buf, len, DMA_TO_DEVICE);
+ break;
+ case DMA_SRC_PAGE:
+ src.dma = dma_map_page(&iop_chan->device->pdev->dev,
+ src.pg, src_off, len, DMA_TO_DEVICE);
+ break;
+ case DMA_SRC_DMA:
+ break;
+ default:
+ return -EFAULT;
+ }
+
+ switch (flags & (DMA_DEST_BUF | DMA_DEST_PAGE | DMA_DEST_DMA)) {
+ case DMA_DEST_BUF:
+ dest.dma = dma_map_single(&iop_chan->device->pdev->dev,
+ dest.buf, len, DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_PAGE:
+ dest.dma = dma_map_page(&iop_chan->device->pdev->dev,
+ dest.pg, dest_off, len, DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_DMA:
+ break;
+ default:
+ return -EFAULT;
+ }
+
+ spin_lock_bh(&iop_chan->lock);
+ slot_cnt = iop_chan_memcpy_slot_count(len, &slots_per_op);
+ sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+ if (sw_desc) {
+ iop_desc_init_memcpy(sw_desc);
+ iop_desc_set_byte_count(sw_desc, iop_chan, len);
+ iop_desc_set_dest_addr(sw_desc, iop_chan, dest.dma);
+ iop_desc_set_memcpy_src_addr(sw_desc, src.dma, slot_cnt, slots_per_op);
+
+ iop_chan_chain_desc(iop_chan, sw_desc);
+ iop_desc_assign_cookie(iop_chan, sw_desc);
+
+ sw_desc->flags = flags;
+ iop_chan->pending++;
+ ret = sw_desc->cookie;
+ }
+ spin_unlock_bh(&iop_chan->lock);
+
+ iop_adma_check_threshold(iop_chan);
+
+ return ret;
+}
+
+static dma_cookie_t do_iop_adma_memset(struct dma_chan *chan,
+ union dmaengine_addr dest,
+ unsigned int dest_off,
+ int val,
+ size_t len,
+ unsigned long flags)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ dma_cookie_t ret = -ENOMEM;
+ struct iop_adma_desc_slot *sw_desc;
+ int slot_cnt, slots_per_op;
+
+ if (!chan || !dest.dma)
+ return -EFAULT;
+ if (!len)
+ return iop_chan->common.cookie;
+
+ PRINTK("iop adma%d: %s len: %u flags: %#lx\n",
+ iop_chan->device->id, __FUNCTION__, len, flags);
+
+ switch (flags & (DMA_DEST_BUF | DMA_DEST_PAGE | DMA_DEST_DMA)) {
+ case DMA_DEST_BUF:
+ dest.dma = dma_map_single(&iop_chan->device->pdev->dev,
+ dest.buf, len, DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_PAGE:
+ dest.dma = dma_map_page(&iop_chan->device->pdev->dev,
+ dest.pg, dest_off, len, DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_DMA:
+ break;
+ default:
+ return -EFAULT;
+ }
+
+ spin_lock_bh(&iop_chan->lock);
+ slot_cnt = iop_chan_memset_slot_count(len, &slots_per_op);
+ sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+ if (sw_desc) {
+ iop_desc_init_memset(sw_desc);
+ iop_desc_set_byte_count(sw_desc, iop_chan, len);
+ iop_desc_set_block_fill_val(sw_desc, val);
+ iop_desc_set_dest_addr(sw_desc, iop_chan, dest.dma);
+
+ iop_chan_chain_desc(iop_chan, sw_desc);
+ iop_desc_assign_cookie(iop_chan, sw_desc);
+
+ sw_desc->flags = flags;
+ iop_chan->pending++;
+ ret = sw_desc->cookie;
+ }
+ spin_unlock_bh(&iop_chan->lock);
+
+ iop_adma_check_threshold(iop_chan);
+
+ return ret;
+}
+
+/**
+ * do_iop_adma_xor - xor from source pages to a dest page
+ * @chan: common channel handle
+ * @dest: DMAENGINE destination address
+ * @dest_off: offset into the destination page
+ * @src: DMAENGINE source addresses
+ * @src_cnt: number of source pages
+ * @src_off: offset into the source pages
+ * @len: transaction length in bytes
+ * @flags: DMAENGINE address type flags
+ */
+static dma_cookie_t do_iop_adma_xor(struct dma_chan *chan,
+ union dmaengine_addr dest,
+ unsigned int dest_off,
+ union dmaengine_addr src,
+ unsigned int src_cnt,
+ unsigned int src_off,
+ size_t len,
+ unsigned long flags)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ struct device *dev = &iop_chan->device->pdev->dev;
+ struct iop_adma_desc_slot *sw_desc;
+ dma_cookie_t ret = -ENOMEM;
+ int slot_cnt, slots_per_op;
+
+ if (!chan || !dest.dma || !src.dma_list)
+ return -EFAULT;
+
+ if (!len)
+ return iop_chan->common.cookie;
+
+ PRINTK("iop adma%d: %s src_cnt: %d len: %u flags: %lx\n",
+ iop_chan->device->id, __FUNCTION__, src_cnt, len, flags);
+
+ spin_lock_bh(&iop_chan->lock);
+ slot_cnt = iop_chan_xor_slot_count(len, src_cnt, &slots_per_op);
+ sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+ if (sw_desc) {
+ #ifdef CONFIG_ARCH_IOP32X
+ if ((flags & DMA_DEST_BUF) &&
+ dest.buf == (void *) iop32x_zero_result_buffer) {
+ PRINTK("%s: iop32x zero sum emulation requested\n",
+ __FUNCTION__);
+ sw_desc->xor_check_result = iop32x_zero_sum_output;
+ }
+ #endif
+
+ iop_desc_init_xor(sw_desc, src_cnt);
+ iop_desc_set_byte_count(sw_desc, iop_chan, len);
+
+ switch (flags & (DMA_DEST_BUF | DMA_DEST_PAGE |
+ DMA_DEST_PAGES | DMA_DEST_DMA |
+ DMA_DEST_DMA_LIST)) {
+ case DMA_DEST_PAGE:
+ dest.dma = dma_map_page(dev, dest.pg, dest_off, len,
+ DMA_FROM_DEVICE);
+ break;
+ case DMA_DEST_BUF:
+ dest.dma = dma_map_single(dev, dest.buf, len,
+ DMA_FROM_DEVICE);
+ break;
+ }
+
+ iop_desc_set_dest_addr(sw_desc, iop_chan, dest.dma);
+
+ switch (flags & (DMA_SRC_BUF | DMA_SRC_PAGE |
+ DMA_SRC_PAGES | DMA_SRC_DMA |
+ DMA_SRC_DMA_LIST)) {
+ case DMA_SRC_PAGES:
+ while (src_cnt--) {
+ dma_addr_t addr = dma_map_page(dev,
+ src.pgs[src_cnt],
+ src_off, len,
+ DMA_TO_DEVICE);
+ iop_desc_set_xor_src_addr(sw_desc,
+ src_cnt,
+ addr,
+ slot_cnt,
+ slots_per_op);
+ }
+ break;
+ case DMA_SRC_DMA_LIST:
+ while (src_cnt--) {
+ iop_desc_set_xor_src_addr(sw_desc,
+ src_cnt,
+ src.dma_list[src_cnt],
+ slot_cnt,
+ slots_per_op);
+ }
+ break;
+ }
+
+ iop_chan_chain_desc(iop_chan, sw_desc);
+ iop_desc_assign_cookie(iop_chan, sw_desc);
+
+ sw_desc->flags = flags;
+ iop_chan->pending++;
+ ret = sw_desc->cookie;
+ }
+ spin_unlock_bh(&iop_chan->lock);
+
+ iop_adma_check_threshold(iop_chan);
+
+ return ret;
+}
+
+/**
+ * do_iop_adma_zero_sum - xor the sources together and report whether
+ * the sum is zero
+ * @chan: common channel handle
+ * @src: DMAENGINE source addresses
+ * @src_cnt: number of sources
+ * @src_off: offset into the sources
+ * @len: transaction length in bytes
+ * @flags: DMAENGINE address type flags
+ * @result: set to 1 if sum is zero else 0
+ */
+#ifndef CONFIG_ARCH_IOP32X
+static dma_cookie_t do_iop_adma_zero_sum(struct dma_chan *chan,
+ union dmaengine_addr src,
+ unsigned int src_cnt,
+ unsigned int src_off,
+ size_t len,
+ u32 *result,
+ unsigned long flags)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ struct iop_adma_desc_slot *sw_desc;
+ dma_cookie_t ret = -ENOMEM;
+ int slot_cnt, slots_per_op;
+
+ if (!chan || !src.dma_list || !result)
+ return -EFAULT;
+
+ if (!len)
+ return iop_chan->common.cookie;
+
+ PRINTK("iop adma%d: %s src_cnt: %d len: %u flags: %lx\n",
+ iop_chan->device->id, __FUNCTION__, src_cnt, len, flags);
+
+ spin_lock_bh(&iop_chan->lock);
+ slot_cnt = iop_chan_zero_sum_slot_count(len, src_cnt, &slots_per_op);
+ sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+ if (sw_desc) {
+ struct device *dev = &iop_chan->device->pdev->dev;
+ iop_chan->pending += iop_desc_init_zero_sum(sw_desc, src_cnt,
+ slot_cnt, slots_per_op);
+
+ switch (flags & (DMA_SRC_BUF | DMA_SRC_PAGE |
+ DMA_SRC_PAGES | DMA_SRC_DMA |
+ DMA_SRC_DMA_LIST)) {
+ case DMA_SRC_PAGES:
+ while (src_cnt--) {
+ dma_addr_t addr = dma_map_page(dev,
+ src.pgs[src_cnt],
+ src_off, len,
+ DMA_TO_DEVICE);
+ iop_desc_set_zero_sum_src_addr(sw_desc,
+ src_cnt,
+ addr,
+ slot_cnt,
+ slots_per_op);
+ }
+ break;
+ case DMA_SRC_DMA_LIST:
+ while (src_cnt--) {
+ iop_desc_set_zero_sum_src_addr(sw_desc,
+ src_cnt,
+ src.dma_list[src_cnt],
+ slot_cnt,
+ slots_per_op);
+ }
+ break;
+ }
+
+ iop_desc_set_zero_sum_byte_count(sw_desc, len, slots_per_op);
+
+ /* assign a cookie to the first descriptor so
+ * the buffers are unmapped
+ */
+ iop_desc_assign_cookie(iop_chan, sw_desc);
+ sw_desc->flags = flags;
+
+ /* assign cookie to the last descriptor in the group
+ * so the xor_check_result is updated. Also, set the
+ * xor_check_result ptr of the first and last descriptor
+ * so the cleanup routine can sum the group of results
+ */
+ if (slot_cnt > slots_per_op) {
+ struct iop_adma_desc_slot *desc;
+ desc = list_entry(iop_chan->chain.prev,
+ struct iop_adma_desc_slot,
+ chain_node);
+ iop_desc_assign_cookie(iop_chan, desc);
+ sw_desc->xor_check_result = result;
+ desc->xor_check_result = result;
+ ret = desc->cookie;
+ } else {
+ sw_desc->xor_check_result = result;
+ ret = sw_desc->cookie;
+ }
+
+ /* add the group to the chain */
+ iop_chan_chain_desc(iop_chan, sw_desc);
+ }
+ spin_unlock_bh(&iop_chan->lock);
+
+ iop_adma_check_threshold(iop_chan);
+
+ return ret;
+}
+#else
+/* iop32x does not support zero sum in hardware, so we simulate
+ * it in software. The emulation only supports a PAGE_SIZE length,
+ * which is enough for MD RAID.
+ */
+static dma_cookie_t do_iop_adma_zero_sum(struct dma_chan *chan,
+ union dmaengine_addr src,
+ unsigned int src_cnt,
+ unsigned int src_off,
+ size_t len,
+ u32 *result,
+ unsigned long flags)
+{
+ static union dmaengine_addr dest_addr = { .buf = iop32x_zero_result_buffer };
+ static dma_cookie_t last_zero_result_cookie = 0;
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ dma_cookie_t ret;
+
+ if (!chan || !src.dma_list || !result)
+ return -EFAULT;
+
+ if (!len)
+ return iop_chan->common.cookie;
+
+ if (len > sizeof(iop32x_zero_result_buffer)) {
+ printk(KERN_ERR "iop32x performs zero sum with a %d byte buffer, %d"
+ " bytes is too large\n", sizeof(iop32x_zero_result_buffer),
+ len);
+ BUG();
+ return -EFAULT;
+ }
+
+ /* we only have 1 result buffer, it can not be shared */
+ if (last_zero_result_cookie) {
+ PRINTK("%s: waiting for last_zero_result_cookie: %d\n",
+ __FUNCTION__, last_zero_result_cookie);
+ dma_sync_wait(chan, last_zero_result_cookie);
+ last_zero_result_cookie = 0;
+ }
+
+ PRINTK("iop adma%d: %s src_cnt: %d len: %u flags: %lx\n",
+ iop_chan->device->id, __FUNCTION__, src_cnt, len, flags);
+
+ flags |= DMA_DEST_BUF;
+ iop32x_zero_sum_output = result;
+
+ ret = do_iop_adma_xor(chan, dest_addr, 0, src, src_cnt, src_off,
+ len, flags);
+
+ if (ret > 0)
+ last_zero_result_cookie = ret;
+
+ return ret;
+}
+#endif
+
+static void iop_adma_free_chan_resources(struct dma_chan *chan)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ struct iop_adma_desc_slot *iter, *_iter;
+ int in_use_descs = 0;
+
+ iop_adma_slot_cleanup(iop_chan);
+
+ spin_lock_bh(&iop_chan->lock);
+ list_for_each_entry_safe(iter, _iter, &iop_chan->chain,
+ chain_node) {
+ in_use_descs++;
+ list_del(&iter->chain_node);
+ }
+ list_for_each_entry_safe_reverse(iter, _iter, &iop_chan->all_slots, slot_node) {
+ list_del(&iter->slot_node);
+ kfree(iter);
+ iop_chan->slots_allocated--;
+ }
+ iop_chan->last_used = NULL;
+
+ PRINTK("iop adma%d %s slots_allocated %d\n", iop_chan->device->id,
+ __FUNCTION__, iop_chan->slots_allocated);
+ spin_unlock_bh(&iop_chan->lock);
+
+ /* one is ok since we left it on the chain on purpose */
+ if (in_use_descs > 1)
+ printk(KERN_ERR "IOP: Freeing %d in use descriptors!\n",
+ in_use_descs - 1);
+}
+
+/**
+ * iop_adma_is_complete - poll the status of an ADMA transaction
+ * @chan: ADMA channel handle
+ * @cookie: ADMA transaction identifier
+ */
+static enum dma_status iop_adma_is_complete(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ dma_cookie_t *done,
+ dma_cookie_t *used)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ dma_cookie_t last_used;
+ dma_cookie_t last_complete;
+ enum dma_status ret;
+
+ last_used = chan->cookie;
+ last_complete = iop_chan->completed_cookie;
+
+ if (done)
+ *done= last_complete;
+ if (used)
+ *used = last_used;
+
+ ret = dma_async_is_complete(cookie, last_complete, last_used);
+ if (ret == DMA_SUCCESS)
+ return ret;
+
+ iop_adma_slot_cleanup(iop_chan);
+
+ last_used = chan->cookie;
+ last_complete = iop_chan->completed_cookie;
+
+ if (done)
+ *done= last_complete;
+ if (used)
+ *used = last_used;
+
+ return dma_async_is_complete(cookie, last_complete, last_used);
+}
+
+/* to do: can we use these interrupts to implement 'sleep until completed'? */
+static irqreturn_t iop_adma_eot_handler(int irq, void *data, struct pt_regs *regs)
+{
+ return IRQ_NONE;
+}
+
+static irqreturn_t iop_adma_eoc_handler(int irq, void *data, struct pt_regs *regs)
+{
+ return IRQ_NONE;
+}
+
+static irqreturn_t iop_adma_err_handler(int irq, void *data, struct pt_regs *regs)
+{
+ return IRQ_NONE;
+}
+
+static void iop_adma_issue_pending(struct dma_chan *chan)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ spin_lock(&iop_chan->lock);
+ if (iop_chan->pending) {
+ iop_chan->pending = 0;
+ iop_chan_append(iop_chan);
+ }
+ spin_unlock(&iop_chan->lock);
+}
+
+/*
+ * Perform a transaction to verify the HW works.
+ */
+#define IOP_ADMA_TEST_SIZE 2000
+
+static int __devinit iop_adma_memcpy_self_test(struct iop_adma_device *device)
+{
+ int i;
+ union dmaengine_addr src;
+ union dmaengine_addr dest;
+ struct dma_chan *dma_chan;
+ dma_cookie_t cookie;
+ int err = 0;
+
+ src.buf = kzalloc(sizeof(u8) * IOP_ADMA_TEST_SIZE, SLAB_KERNEL);
+ if (!src.buf)
+ return -ENOMEM;
+ dest.buf = kzalloc(sizeof(u8) * IOP_ADMA_TEST_SIZE, SLAB_KERNEL);
+ if (!dest.buf) {
+ kfree(src.buf);
+ return -ENOMEM;
+ }
+
+ /* Fill in src buffer */
+ for (i = 0; i < IOP_ADMA_TEST_SIZE; i++)
+ ((u8 *) src.buf)[i] = (u8)i;
+
+ memset(dest.buf, 0, IOP_ADMA_TEST_SIZE);
+
+ /* Start copy, using first DMA channel */
+ dma_chan = container_of(device->common.channels.next,
+ struct dma_chan,
+ device_node);
+ if (iop_adma_alloc_chan_resources(dma_chan) < 1) {
+ err = -ENODEV;
+ goto out;
+ }
+
+ cookie = do_iop_adma_memcpy(dma_chan, dest, 0, src, 0,
+ IOP_ADMA_TEST_SIZE, DMA_SRC_BUF | DMA_DEST_BUF);
+ iop_adma_issue_pending(dma_chan);
+ msleep(1);
+
+ if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+ printk(KERN_ERR "iop adma%d: Self-test copy timed out, disabling\n",
+ device->id);
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+ consistent_sync(dest.buf, IOP_ADMA_TEST_SIZE, DMA_FROM_DEVICE);
+ if (memcmp(src.buf, dest.buf, IOP_ADMA_TEST_SIZE)) {
+ printk(KERN_ERR "iop adma%d: Self-test copy failed compare, disabling\n",
+ device->id);
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+free_resources:
+ iop_adma_free_chan_resources(dma_chan);
+out:
+ kfree(src.buf);
+ kfree(dest.buf);
+ return err;
+}
+
+#define IOP_ADMA_NUM_SRC_TST 4 /* must be <= 15 */
+static int __devinit iop_adma_xor_zero_sum_self_test(struct iop_adma_device *device)
+{
+ int i, src_idx;
+ struct page *xor_srcs[IOP_ADMA_NUM_SRC_TST];
+ struct page *zero_sum_srcs[IOP_ADMA_NUM_SRC_TST + 1];
+ union dmaengine_addr dest;
+ union dmaengine_addr src;
+ struct dma_chan *dma_chan;
+ dma_cookie_t cookie;
+ u8 cmp_byte = 0;
+ u32 cmp_word;
+ u32 zero_sum_result;
+ int err = 0;
+
+ for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TST; src_idx++) {
+ xor_srcs[src_idx] = alloc_page(GFP_KERNEL);
+ if (!xor_srcs[src_idx]) {
+ /* free the pages allocated so far, then bail out */
+ while (src_idx--)
+ __free_page(xor_srcs[src_idx]);
+ return -ENOMEM;
+ }
+ }
+
+ dest.pg = alloc_page(GFP_KERNEL);
+ if (!dest.pg) {
+ /* free all source pages, then bail out */
+ while (src_idx--)
+ __free_page(xor_srcs[src_idx]);
+ return -ENOMEM;
+ }
+
+ /* Fill in src buffers */
+ for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TST; src_idx++) {
+ u8 *ptr = page_address(xor_srcs[src_idx]);
+ for (i = 0; i < PAGE_SIZE; i++)
+ ptr[i] = (1 << src_idx);
+ }
+
+ for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TST; src_idx++)
+ cmp_byte ^= (u8) (1 << src_idx);
+
+ cmp_word = (cmp_byte << 24) | (cmp_byte << 16) | (cmp_byte << 8) | cmp_byte;
+
+ memset(page_address(dest.pg), 0, PAGE_SIZE);
+
+ dma_chan = container_of(device->common.channels.next,
+ struct dma_chan,
+ device_node);
+ if (iop_adma_alloc_chan_resources(dma_chan) < 1) {
+ err = -ENODEV;
+ goto out;
+ }
+
+ /* test xor */
+ src.pgs = xor_srcs;
+ cookie = do_iop_adma_xor(dma_chan, dest, 0, src,
+ IOP_ADMA_NUM_SRC_TST, 0, PAGE_SIZE, DMA_DEST_PAGE | DMA_SRC_PAGES);
+ iop_adma_issue_pending(dma_chan);
+ msleep(8);
+
+ if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+ printk(KERN_ERR "iop_adma: Self-test xor timed out, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+ consistent_sync(page_address(dest.pg), PAGE_SIZE, DMA_FROM_DEVICE);
+ for (i = 0; i < (PAGE_SIZE / sizeof(u32)); i++) {
+ u32 *ptr = page_address(dest.pg);
+ if (ptr[i] != cmp_word) {
+ printk(KERN_ERR "iop_adma: Self-test xor failed compare, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+ }
+
+ /* zero sum the sources with the destination page */
+ for (i = 0; i < IOP_ADMA_NUM_SRC_TST; i++)
+ zero_sum_srcs[i] = xor_srcs[i];
+ zero_sum_srcs[i] = dest.pg;
+ src.pgs = zero_sum_srcs;
+
+ zero_sum_result = 1;
+ cookie = do_iop_adma_zero_sum(dma_chan, src, IOP_ADMA_NUM_SRC_TST + 1,
+ 0, PAGE_SIZE, &zero_sum_result, DMA_SRC_PAGES);
+ iop_adma_issue_pending(dma_chan);
+ msleep(8);
+
+ if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+ printk(KERN_ERR "iop_adma: Self-test zero sum timed out, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+ if (zero_sum_result != 0) {
+ printk(KERN_ERR "iop_adma: Self-test zero sum failed compare, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+ /* test memset */
+ cookie = do_iop_adma_memset(dma_chan, dest, 0, 0, PAGE_SIZE, DMA_DEST_PAGE);
+ iop_adma_issue_pending(dma_chan);
+ msleep(8);
+
+ if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+ printk(KERN_ERR "iop_adma: Self-test memset timed out, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+ consistent_sync(page_address(dest.pg), PAGE_SIZE, DMA_FROM_DEVICE);
+ for (i = 0; i < PAGE_SIZE/sizeof(u32); i++) {
+ u32 *ptr = page_address(dest.pg);
+ if (ptr[i]) {
+ printk(KERN_ERR "iop_adma: Self-test memset failed compare, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+ }
+
+ /* test for non-zero parity sum */
+ zero_sum_result = 0;
+ cookie = do_iop_adma_zero_sum(dma_chan, src, IOP_ADMA_NUM_SRC_TST + 1,
+ 0, PAGE_SIZE, &zero_sum_result, DMA_SRC_PAGES);
+ iop_adma_issue_pending(dma_chan);
+ msleep(8);
+
+ if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+ printk(KERN_ERR "iop_adma: Self-test non-zero sum timed out, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+ if (zero_sum_result != 1) {
+ printk(KERN_ERR "iop_adma: Self-test non-zero sum failed compare, disabling\n");
+ err = -ENODEV;
+ goto free_resources;
+ }
+
+free_resources:
+ iop_adma_free_chan_resources(dma_chan);
+out:
+ src_idx = IOP_ADMA_NUM_SRC_TST;
+ while (src_idx--)
+ __free_page(xor_srcs[src_idx]);
+ __free_page(dest.pg);
+ return err;
+}
+
+static int __devexit iop_adma_remove(struct platform_device *dev)
+{
+ struct iop_adma_device *device = platform_get_drvdata(dev);
+ struct dma_chan *chan, *_chan;
+ struct iop_adma_chan *iop_chan;
+ int i;
+ struct iop_adma_platform_data *plat_data = dev->dev.platform_data;
+
+
+ dma_async_device_unregister(&device->common);
+
+ for (i = 0; i < 3; i++) {
+ unsigned int irq;
+ irq = platform_get_irq(dev, i);
+ free_irq(irq, device);
+ }
+
+ dma_free_coherent(&dev->dev, plat_data->pool_size,
+ device->dma_desc_pool_virt, device->dma_desc_pool);
+
+ do {
+ struct resource *res;
+ res = platform_get_resource(dev, IORESOURCE_MEM, 0);
+ release_mem_region(res->start, res->end - res->start);
+ } while (0);
+
+ list_for_each_entry_safe(chan, _chan, &device->common.channels,
+ device_node) {
+ iop_chan = to_iop_adma_chan(chan);
+ list_del(&chan->device_node);
+ kfree(iop_chan);
+ }
+ kfree(device);
+
+ return 0;
+}
+
+static dma_addr_t iop_adma_map_page(struct dma_chan *chan, struct page *page,
+ unsigned long offset, size_t size,
+ int direction)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ return dma_map_page(&iop_chan->device->pdev->dev, page, offset, size,
+ direction);
+}
+
+static dma_addr_t iop_adma_map_single(struct dma_chan *chan, void *cpu_addr,
+ size_t size, int direction)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ return dma_map_single(&iop_chan->device->pdev->dev, cpu_addr, size,
+ direction);
+}
+
+static void iop_adma_unmap_page(struct dma_chan *chan, dma_addr_t handle,
+ size_t size, int direction)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ dma_unmap_page(&iop_chan->device->pdev->dev, handle, size, direction);
+}
+
+static void iop_adma_unmap_single(struct dma_chan *chan, dma_addr_t handle,
+ size_t size, int direction)
+{
+ struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+ dma_unmap_single(&iop_chan->device->pdev->dev, handle, size, direction);
+}
+
+extern dma_cookie_t dma_async_do_memcpy_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_off,
+ size_t len, unsigned long flags);
+
+extern dma_cookie_t dma_async_do_xor_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, unsigned long flags);
+
+extern dma_cookie_t dma_async_do_zero_sum_err(struct dma_chan *chan,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, u32 *result,
+ unsigned long flags);
+
+extern dma_cookie_t dma_async_do_memset_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ int val, size_t len, unsigned long flags);
+
+static int __devinit iop_adma_probe(struct platform_device *pdev)
+{
+ struct resource *res;
+ int ret=0, irq_eot=0, irq_eoc=0, irq_err=0, irq, i;
+ struct iop_adma_device *adev;
+ struct iop_adma_chan *iop_chan;
+ struct iop_adma_platform_data *plat_data = pdev->dev.platform_data;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res)
+ return -ENODEV;
+
+ if (!request_mem_region(res->start, res->end - res->start, pdev->name))
+ return -EBUSY;
+
+ if ((adev = kzalloc(sizeof(*adev), GFP_KERNEL)) == NULL) {
+ ret = -ENOMEM;
+ goto err_adev_alloc;
+ }
+
+ if ((adev->dma_desc_pool_virt = dma_alloc_writecombine(&pdev->dev,
+ plat_data->pool_size,
+ &adev->dma_desc_pool,
+ GFP_KERNEL)) == NULL) {
+ ret = -ENOMEM;
+ goto err_dma_alloc;
+ }
+
+ PRINTK("%s: allocted descriptor pool virt %p phys %p\n",
+ __FUNCTION__, adev->dma_desc_pool_virt, (void *) adev->dma_desc_pool);
+
+ adev->id = plat_data->hw_id;
+ adev->common.capabilities = plat_data->capabilities;
+
+ for (i = 0; i < 3; i++) {
+ irq = platform_get_irq(pdev, i);
+ if (irq < 0)
+ ret = -ENXIO;
+ else {
+ switch (i) {
+ case 0:
+ irq_eot = irq;
+ ret = request_irq(irq, iop_adma_eot_handler,
+ 0, pdev->name, adev);
+ if (ret) {
+ ret = -EIO;
+ goto err_irq0;
+ }
+ break;
+ case 1:
+ irq_eoc = irq;
+ ret = request_irq(irq, iop_adma_eoc_handler,
+ 0, pdev->name, adev);
+ if (ret) {
+ ret = -EIO;
+ goto err_irq1;
+ }
+ break;
+ case 2:
+ irq_err = irq;
+ ret = request_irq(irq, iop_adma_err_handler,
+ 0, pdev->name, adev);
+ if (ret) {
+ ret = -EIO;
+ goto err_irq2;
+ }
+ break;
+ }
+ }
+ }
+
+ adev->pdev = pdev;
+ platform_set_drvdata(pdev, adev);
+
+ INIT_LIST_HEAD(&adev->common.channels);
+ adev->common.device_alloc_chan_resources = iop_adma_alloc_chan_resources;
+ adev->common.device_free_chan_resources = iop_adma_free_chan_resources;
+ adev->common.device_operation_complete = iop_adma_is_complete;
+ adev->common.device_issue_pending = iop_adma_issue_pending;
+ adev->common.map_page = iop_adma_map_page;
+ adev->common.map_single = iop_adma_map_single;
+ adev->common.unmap_page = iop_adma_unmap_page;
+ adev->common.unmap_single = iop_adma_unmap_single;
+
+ if (adev->common.capabilities & DMA_MEMCPY)
+ adev->common.device_do_dma_memcpy = do_iop_adma_memcpy;
+ else
+ adev->common.device_do_dma_memcpy = dma_async_do_memcpy_err;
+
+ if (adev->common.capabilities & DMA_MEMSET)
+ adev->common.device_do_dma_memset = do_iop_adma_memset;
+ else
+ adev->common.device_do_dma_memset = dma_async_do_memset_err;
+
+ if (adev->common.capabilities & DMA_XOR)
+ adev->common.device_do_dma_xor = do_iop_adma_xor;
+ else
+ adev->common.device_do_dma_xor = dma_async_do_xor_err;
+
+ if (adev->common.capabilities & DMA_ZERO_SUM)
+ adev->common.device_do_dma_zero_sum = do_iop_adma_zero_sum;
+ else
+ adev->common.device_do_dma_zero_sum = dma_async_do_zero_sum_err;
+
+ if ((iop_chan = kzalloc(sizeof(*iop_chan), GFP_KERNEL)) == NULL) {
+ ret = -ENOMEM;
+ goto err_chan_alloc;
+ }
+
+ spin_lock_init(&iop_chan->lock);
+ iop_chan->device = adev;
+ INIT_LIST_HEAD(&iop_chan->chain);
+ INIT_LIST_HEAD(&iop_chan->all_slots);
+ iop_chan->last_used = NULL;
+ dma_async_chan_init(&iop_chan->common, &adev->common);
+
+ if (adev->common.capabilities & DMA_MEMCPY) {
+ ret = iop_adma_memcpy_self_test(adev);
+ PRINTK("iop adma%d: memcpy self test returned %d\n", adev->id, ret);
+ if (ret)
+ goto err_self_test;
+ }
+
+ if (adev->common.capabilities & (DMA_XOR | DMA_ZERO_SUM | DMA_MEMSET)) {
+ ret = iop_adma_xor_zero_sum_self_test(adev);
+ PRINTK("iop adma%d: xor self test returned %d\n", adev->id, ret);
+ if (ret)
+ goto err_self_test;
+ }
+
+ printk(KERN_INFO "Intel(R) IOP ADMA Engine found [%d]: "
+ "( %s%s%s%s%s%s%s%s%s)\n",
+ adev->id,
+ adev->common.capabilities & DMA_PQ_XOR ? "pq_xor " : "",
+ adev->common.capabilities & DMA_PQ_UPDATE ? "pq_update " : "",
+ adev->common.capabilities & DMA_PQ_ZERO_SUM ? "pq_zero_sum " : "",
+ adev->common.capabilities & DMA_XOR ? "xor " : "",
+ adev->common.capabilities & DMA_DUAL_XOR ? "dual_xor " : "",
+ adev->common.capabilities & DMA_ZERO_SUM ? "xor_zero_sum " : "",
+ adev->common.capabilities & DMA_MEMSET ? "memset " : "",
+ adev->common.capabilities & DMA_MEMCPY_CRC32C ? "memcpy+crc " : "",
+ adev->common.capabilities & DMA_MEMCPY ? "memcpy " : "");
+
+ dma_async_device_register(&adev->common);
+ goto out;
+
+err_self_test:
+ kfree(iop_chan);
+err_chan_alloc:
+err_irq2:
+ free_irq(irq_eoc, adev);
+err_irq1:
+ free_irq(irq_eot, adev);
+err_irq0:
+ dma_free_coherent(&adev->pdev->dev, plat_data->pool_size,
+ adev->dma_desc_pool_virt, adev->dma_desc_pool);
+err_dma_alloc:
+ kfree(adev);
+err_adev_alloc:
+ release_mem_region(res->start, res->end - res->start);
+out:
+ return ret;
+}
+
+static void iop_chan_start_null_memcpy(struct iop_adma_chan *iop_chan)
+{
+ struct iop_adma_desc_slot *sw_desc;
+ dma_cookie_t cookie;
+ int slot_cnt, slots_per_op;
+
+ spin_lock_bh(&iop_chan->lock);
+ slot_cnt = iop_chan_memcpy_slot_count(0, &slots_per_op);
+ sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+ if (sw_desc) {
+ iop_desc_init_memcpy(sw_desc);
+ iop_desc_set_byte_count(sw_desc, iop_chan, 0);
+ iop_desc_set_dest_addr(sw_desc, iop_chan, 0);
+ iop_desc_set_memcpy_src_addr(sw_desc, 0, slot_cnt, slots_per_op);
+
+ cookie = iop_chan->common.cookie;
+ cookie++;
+ if (cookie <= 1)
+ cookie = 2;
+
+ /* initialize the completed cookie to be less than
+ * the most recently used cookie
+ */
+ iop_chan->completed_cookie = cookie - 1;
+ iop_chan->common.cookie = sw_desc->cookie = cookie;
+
+ /* channel should not be busy */
+ BUG_ON(iop_chan_is_busy(iop_chan));
+
+ /* clear any prior error-status bits */
+ iop_chan_clear_status(iop_chan);
+
+ /* disable operation */
+ iop_chan_disable(iop_chan);
+
+ /* set the descriptor address */
+ iop_chan_set_next_descriptor(iop_chan, sw_desc->phys);
+
+ /* run the descriptor */
+ iop_chan_enable(iop_chan);
+ } else
+ printk(KERN_ERR "iop adma%d failed to allocate null descriptor\n",
+ iop_chan->device->id);
+ spin_unlock_bh(&iop_chan->lock);
+}
+
+static void iop_chan_start_null_xor(struct iop_adma_chan *iop_chan)
+{
+ struct iop_adma_desc_slot *sw_desc;
+ dma_cookie_t cookie;
+ int slot_cnt, slots_per_op;
+
+ spin_lock_bh(&iop_chan->lock);
+ slot_cnt = iop_chan_xor_slot_count(0, 2, &slots_per_op);
+ sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+ if (sw_desc) {
+ iop_desc_init_null_xor(sw_desc, 2);
+ iop_desc_set_byte_count(sw_desc, iop_chan, 0);
+ iop_desc_set_dest_addr(sw_desc, iop_chan, 0);
+ iop_desc_set_xor_src_addr(sw_desc, 0, 0, slot_cnt, slots_per_op);
+ iop_desc_set_xor_src_addr(sw_desc, 1, 0, slot_cnt, slots_per_op);
+
+ cookie = iop_chan->common.cookie;
+ cookie++;
+ if (cookie <= 1)
+ cookie = 2;
+
+ /* initialize the completed cookie to be less than
+ * the most recently used cookie
+ */
+ iop_chan->completed_cookie = cookie - 1;
+ iop_chan->common.cookie = sw_desc->cookie = cookie;
+
+ /* channel should not be busy */
+ BUG_ON(iop_chan_is_busy(iop_chan));
+
+ /* clear any prior error-status bits */
+ iop_chan_clear_status(iop_chan);
+
+ /* disable operation */
+ iop_chan_disable(iop_chan);
+
+ /* set the descriptor address */
+ iop_chan_set_next_descriptor(iop_chan, sw_desc->phys);
+
+ /* run the descriptor */
+ iop_chan_enable(iop_chan);
+ } else
+ printk(KERN_ERR "iop adma%d failed to allocate null descriptor\n",
+ iop_chan->device->id);
+ spin_unlock_bh(&iop_chan->lock);
+}
+
+static struct platform_driver iop_adma_driver = {
+ .probe = iop_adma_probe,
+ .remove = iop_adma_remove,
+ .driver = {
+ .owner = THIS_MODULE,
+ .name = "IOP-ADMA",
+ },
+};
+
+static int __init iop_adma_init(void)
+{
+ return platform_driver_register(&iop_adma_driver);
+}
+
+static void __exit iop_adma_exit(void)
+{
+ platform_driver_unregister(&iop_adma_driver);
+ return;
+}
+
+void __arch_raid5_dma_chan_request(struct dma_client *client)
+{
+ iop_raid5_dma_chan_request(client);
+}
+
+struct dma_chan *__arch_raid5_dma_next_channel(struct dma_client *client)
+{
+ return iop_raid5_dma_next_channel(client);
+}
+
+struct dma_chan *__arch_raid5_dma_check_channel(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ struct dma_client *client,
+ unsigned long capabilities)
+{
+ return iop_raid5_dma_check_channel(chan, cookie, client, capabilities);
+}
+
+EXPORT_SYMBOL_GPL(__arch_raid5_dma_chan_request);
+EXPORT_SYMBOL_GPL(__arch_raid5_dma_next_channel);
+EXPORT_SYMBOL_GPL(__arch_raid5_dma_check_channel);
+
+module_init(iop_adma_init);
+module_exit(iop_adma_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("IOP ADMA Engine Driver");
+MODULE_LICENSE("GPL");
diff --git a/include/asm-arm/hardware/iop_adma.h b/include/asm-arm/hardware/iop_adma.h
new file mode 100644
index 0000000..62bbbdf
--- /dev/null
+++ b/include/asm-arm/hardware/iop_adma.h
@@ -0,0 +1,98 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef IOP_ADMA_H
+#define IOP_ADMA_H
+#include <linux/types.h>
+#include <linux/dmaengine.h>
+
+#define IOP_ADMA_SLOT_SIZE 32
+#define IOP_ADMA_THRESHOLD 20
+
+/**
+ * struct iop_adma_device - internal representation of an ADMA device
+ * @pdev: Platform device
+ * @id: HW ADMA Device selector
+ * @dma_desc_pool: base of DMA descriptor region (DMA address)
+ * @dma_desc_pool_virt: base of DMA descriptor region (CPU address)
+ * @common: embedded struct dma_device
+ */
+struct iop_adma_device {
+ struct platform_device *pdev;
+ int id;
+ dma_addr_t dma_desc_pool;
+ void *dma_desc_pool_virt;
+ struct dma_device common;
+};
+
+/**
+ * struct iop_adma_chan - internal representation of an ADMA channel
+ * @lock: serializes enqueue/dequeue operations to the slot pool
+ * @device: parent device
+ * @chain: device chain view of the descriptors
+ * @common: common dmaengine channel object members
+ * @all_slots: complete domain of slots usable by the channel
+ * @pending: allows batching of hardware operations
+ * @result_accumulator: allows zero result sums of buffers > the hw maximum
+ * @zero_sum_group: flag to the clean up routine to collect zero sum results
+ * @completed_cookie: identifier for the most recently completed operation
+ * @slots_allocated: records the actual size of the descriptor slot pool
+ */
+struct iop_adma_chan {
+ spinlock_t lock;
+ struct iop_adma_device *device;
+ struct list_head chain;
+ struct dma_chan common;
+ struct list_head all_slots;
+ struct iop_adma_desc_slot *last_used;
+ int pending;
+ u8 result_accumulator;
+ u8 zero_sum_group;
+ dma_cookie_t completed_cookie;
+ int slots_allocated;
+};
+
+struct iop_adma_desc_slot {
+ void *hw_desc;
+ struct list_head slot_node;
+ struct list_head chain_node;
+ dma_cookie_t cookie;
+ dma_addr_t phys;
+ u16 stride;
+ u16 idx;
+ u16 slot_cnt;
+ u8 src_cnt;
+ u8 slots_per_op;
+ unsigned long flags;
+ union {
+ u32 *xor_check_result;
+ u32 *crc32_result;
+ };
+};
+
+struct iop_adma_platform_data {
+ int hw_id;
+ unsigned long capabilities;
+ size_t pool_size;
+};
+
+#define to_iop_sw_desc(addr_hw_desc) container_of(addr_hw_desc, struct iop_adma_desc_slot, hw_desc)
+#define iop_hw_desc_slot_idx(hw_desc, idx) ( (void *) (((unsigned long) hw_desc) + ((idx) << 5)) )
+#endif

2006-09-11 23:21:05

by Dan Williams

[permalink] [raw]
Subject: [PATCH 17/19] iop3xx: define IOP3XX_REG_ADDR[32|16|8] and clean up DMA/AAU defs

From: Dan Williams <[email protected]>

Also brings the iop3xx registers in line with the format of the iop13xx
register definitions.
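
For illustration, a minimal sketch of how a typed accessor reads at a call
site once the access width is carried by the macro itself
(iop3xx_atu_set_master() is a hypothetical helper used only for this example,
not part of the patch; 0x4 is the standard PCI command register bus-master
bit):

/* IOP3XX_ATUCMD now expands to a (volatile u16 *) via IOP3XX_REG_ADDR16(),
 * so the 16-bit access width comes from the definition rather than a cast
 * repeated at every use site.
 */
static void iop3xx_atu_set_master(void)
{
	u16 cmd = *IOP3XX_ATUCMD;	/* 16-bit read of the ATU command register */

	cmd |= 0x4;			/* bus-master enable */
	*IOP3XX_ATUCMD = cmd;		/* 16-bit write back */
}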

Signed-off-by: Dan Williams <[email protected]>
---

include/asm-arm/arch-iop32x/entry-macro.S | 2
include/asm-arm/arch-iop32x/iop32x.h | 14 +
include/asm-arm/arch-iop33x/entry-macro.S | 2
include/asm-arm/arch-iop33x/iop33x.h | 38 ++-
include/asm-arm/hardware/iop3xx.h | 347 +++++++++++++----------------
5 files changed, 188 insertions(+), 215 deletions(-)

diff --git a/include/asm-arm/arch-iop32x/entry-macro.S b/include/asm-arm/arch-iop32x/entry-macro.S
index 1500cbb..f357be4 100644
--- a/include/asm-arm/arch-iop32x/entry-macro.S
+++ b/include/asm-arm/arch-iop32x/entry-macro.S
@@ -13,7 +13,7 @@ #include <asm/arch/iop32x.h>
.endm

.macro get_irqnr_and_base, irqnr, irqstat, base, tmp
- ldr \base, =IOP3XX_REG_ADDR(0x07D8)
+ ldr \base, =0xfeffe7d8
ldr \irqstat, [\base] @ Read IINTSRC
cmp \irqstat, #0
clzne \irqnr, \irqstat
diff --git a/include/asm-arm/arch-iop32x/iop32x.h b/include/asm-arm/arch-iop32x/iop32x.h
index 15b4d6a..904a14d 100644
--- a/include/asm-arm/arch-iop32x/iop32x.h
+++ b/include/asm-arm/arch-iop32x/iop32x.h
@@ -19,16 +19,18 @@ #define __IOP32X_H
* Peripherals that are shared between the iop32x and iop33x but
* located at different addresses.
*/
-#define IOP3XX_GPIO_REG(reg) (IOP3XX_PERIPHERAL_VIRT_BASE + 0x07c0 + (reg))
-#define IOP3XX_TIMER_REG(reg) (IOP3XX_PERIPHERAL_VIRT_BASE + 0x07e0 + (reg))
+#define IOP3XX_GPIO_REG32(reg) (volatile u32 *)(IOP3XX_PERIPHERAL_VIRT_BASE +\
+ 0x07c0 + (reg))
+#define IOP3XX_TIMER_REG32(reg) (volatile u32 *)(IOP3XX_PERIPHERAL_VIRT_BASE +\
+ 0x07e0 + (reg))

#include <asm/hardware/iop3xx.h>

/* Interrupt Controller */
-#define IOP32X_INTCTL (volatile u32 *)IOP3XX_REG_ADDR(0x07d0)
-#define IOP32X_INTSTR (volatile u32 *)IOP3XX_REG_ADDR(0x07d4)
-#define IOP32X_IINTSRC (volatile u32 *)IOP3XX_REG_ADDR(0x07d8)
-#define IOP32X_FINTSRC (volatile u32 *)IOP3XX_REG_ADDR(0x07dc)
+#define IOP32X_INTCTL IOP3XX_REG_ADDR32(0x07d0)
+#define IOP32X_INTSTR IOP3XX_REG_ADDR32(0x07d4)
+#define IOP32X_IINTSRC IOP3XX_REG_ADDR32(0x07d8)
+#define IOP32X_FINTSRC IOP3XX_REG_ADDR32(0x07dc)


#endif
diff --git a/include/asm-arm/arch-iop33x/entry-macro.S b/include/asm-arm/arch-iop33x/entry-macro.S
index 92b7917..eb207d2 100644
--- a/include/asm-arm/arch-iop33x/entry-macro.S
+++ b/include/asm-arm/arch-iop33x/entry-macro.S
@@ -13,7 +13,7 @@ #include <asm/arch/iop33x.h>
.endm

.macro get_irqnr_and_base, irqnr, irqstat, base, tmp
- ldr \base, =IOP3XX_REG_ADDR(0x07C8)
+ ldr \base, =0xfeffe7c8
ldr \irqstat, [\base] @ Read IINTVEC
cmp \irqstat, #0
ldreq \irqstat, [\base] @ erratum 63 workaround
diff --git a/include/asm-arm/arch-iop33x/iop33x.h b/include/asm-arm/arch-iop33x/iop33x.h
index 9b38fde..c171383 100644
--- a/include/asm-arm/arch-iop33x/iop33x.h
+++ b/include/asm-arm/arch-iop33x/iop33x.h
@@ -18,28 +18,30 @@ #define __IOP33X_H
* Peripherals that are shared between the iop32x and iop33x but
* located at different addresses.
*/
-#define IOP3XX_GPIO_REG(reg) (IOP3XX_PERIPHERAL_VIRT_BASE + 0x1780 + (reg))
-#define IOP3XX_TIMER_REG(reg) (IOP3XX_PERIPHERAL_VIRT_BASE + 0x07d0 + (reg))
+#define IOP3XX_GPIO_REG32(reg) (volatile u32 *)(IOP3XX_PERIPHERAL_VIRT_BASE +\
+ 0x1780 + (reg))
+#define IOP3XX_TIMER_REG32(reg) (volatile u32 *)(IOP3XX_PERIPHERAL_VIRT_BASE +\
+ 0x07d0 + (reg))

#include <asm/hardware/iop3xx.h>

/* Interrupt Controller */
-#define IOP33X_INTCTL0 (volatile u32 *)IOP3XX_REG_ADDR(0x0790)
-#define IOP33X_INTCTL1 (volatile u32 *)IOP3XX_REG_ADDR(0x0794)
-#define IOP33X_INTSTR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0798)
-#define IOP33X_INTSTR1 (volatile u32 *)IOP3XX_REG_ADDR(0x079c)
-#define IOP33X_IINTSRC0 (volatile u32 *)IOP3XX_REG_ADDR(0x07a0)
-#define IOP33X_IINTSRC1 (volatile u32 *)IOP3XX_REG_ADDR(0x07a4)
-#define IOP33X_FINTSRC0 (volatile u32 *)IOP3XX_REG_ADDR(0x07a8)
-#define IOP33X_FINTSRC1 (volatile u32 *)IOP3XX_REG_ADDR(0x07ac)
-#define IOP33X_IPR0 (volatile u32 *)IOP3XX_REG_ADDR(0x07b0)
-#define IOP33X_IPR1 (volatile u32 *)IOP3XX_REG_ADDR(0x07b4)
-#define IOP33X_IPR2 (volatile u32 *)IOP3XX_REG_ADDR(0x07b8)
-#define IOP33X_IPR3 (volatile u32 *)IOP3XX_REG_ADDR(0x07bc)
-#define IOP33X_INTBASE (volatile u32 *)IOP3XX_REG_ADDR(0x07c0)
-#define IOP33X_INTSIZE (volatile u32 *)IOP3XX_REG_ADDR(0x07c4)
-#define IOP33X_IINTVEC (volatile u32 *)IOP3XX_REG_ADDR(0x07c8)
-#define IOP33X_FINTVEC (volatile u32 *)IOP3XX_REG_ADDR(0x07cc)
+#define IOP33X_INTCTL0 IOP3XX_REG_ADDR32(0x0790)
+#define IOP33X_INTCTL1 IOP3XX_REG_ADDR32(0x0794)
+#define IOP33X_INTSTR0 IOP3XX_REG_ADDR32(0x0798)
+#define IOP33X_INTSTR1 IOP3XX_REG_ADDR32(0x079c)
+#define IOP33X_IINTSRC0 IOP3XX_REG_ADDR32(0x07a0)
+#define IOP33X_IINTSRC1 IOP3XX_REG_ADDR32(0x07a4)
+#define IOP33X_FINTSRC0 IOP3XX_REG_ADDR32(0x07a8)
+#define IOP33X_FINTSRC1 IOP3XX_REG_ADDR32(0x07ac)
+#define IOP33X_IPR0 IOP3XX_REG_ADDR32(0x07b0)
+#define IOP33X_IPR1 IOP3XX_REG_ADDR32(0x07b4)
+#define IOP33X_IPR2 IOP3XX_REG_ADDR32(0x07b8)
+#define IOP33X_IPR3 IOP3XX_REG_ADDR32(0x07bc)
+#define IOP33X_INTBASE IOP3XX_REG_ADDR32(0x07c0)
+#define IOP33X_INTSIZE IOP3XX_REG_ADDR32(0x07c4)
+#define IOP33X_IINTVEC IOP3XX_REG_ADDR32(0x07c8)
+#define IOP33X_FINTVEC IOP3XX_REG_ADDR32(0x07cc)

/* UARTs */
#define IOP33X_UART0_PHYS (IOP3XX_PERIPHERAL_PHYS_BASE + 0x1700)
diff --git a/include/asm-arm/hardware/iop3xx.h b/include/asm-arm/hardware/iop3xx.h
index b5c12ef..295789a 100644
--- a/include/asm-arm/hardware/iop3xx.h
+++ b/include/asm-arm/hardware/iop3xx.h
@@ -34,153 +34,166 @@ #endif
/*
* IOP3XX processor registers
*/
-#define IOP3XX_PERIPHERAL_PHYS_BASE 0xffffe000
-#define IOP3XX_PERIPHERAL_VIRT_BASE 0xfeffe000
-#define IOP3XX_PERIPHERAL_SIZE 0x00002000
-#define IOP3XX_REG_ADDR(reg) (IOP3XX_PERIPHERAL_VIRT_BASE + (reg))
+#define IOP3XX_PERIPHERAL_PHYS_BASE 0xffffe000
+#define IOP3XX_PERIPHERAL_VIRT_BASE 0xfeffe000
+#define IOP3XX_PERIPHERAL_SIZE 0x00002000
+#define IOP3XX_REG_ADDR32(reg) (volatile u32 *)(IOP3XX_PERIPHERAL_VIRT_BASE + (reg))
+#define IOP3XX_REG_ADDR16(reg) (volatile u16 *)(IOP3XX_PERIPHERAL_VIRT_BASE + (reg))
+#define IOP3XX_REG_ADDR8(reg) (volatile u8 *)(IOP3XX_PERIPHERAL_VIRT_BASE + (reg))

/* Address Translation Unit */
-#define IOP3XX_ATUVID (volatile u16 *)IOP3XX_REG_ADDR(0x0100)
-#define IOP3XX_ATUDID (volatile u16 *)IOP3XX_REG_ADDR(0x0102)
-#define IOP3XX_ATUCMD (volatile u16 *)IOP3XX_REG_ADDR(0x0104)
-#define IOP3XX_ATUSR (volatile u16 *)IOP3XX_REG_ADDR(0x0106)
-#define IOP3XX_ATURID (volatile u8 *)IOP3XX_REG_ADDR(0x0108)
-#define IOP3XX_ATUCCR (volatile u32 *)IOP3XX_REG_ADDR(0x0109)
-#define IOP3XX_ATUCLSR (volatile u8 *)IOP3XX_REG_ADDR(0x010c)
-#define IOP3XX_ATULT (volatile u8 *)IOP3XX_REG_ADDR(0x010d)
-#define IOP3XX_ATUHTR (volatile u8 *)IOP3XX_REG_ADDR(0x010e)
-#define IOP3XX_ATUBIST (volatile u8 *)IOP3XX_REG_ADDR(0x010f)
-#define IOP3XX_IABAR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0110)
-#define IOP3XX_IAUBAR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0114)
-#define IOP3XX_IABAR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0118)
-#define IOP3XX_IAUBAR1 (volatile u32 *)IOP3XX_REG_ADDR(0x011c)
-#define IOP3XX_IABAR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0120)
-#define IOP3XX_IAUBAR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0124)
-#define IOP3XX_ASVIR (volatile u16 *)IOP3XX_REG_ADDR(0x012c)
-#define IOP3XX_ASIR (volatile u16 *)IOP3XX_REG_ADDR(0x012e)
-#define IOP3XX_ERBAR (volatile u32 *)IOP3XX_REG_ADDR(0x0130)
-#define IOP3XX_ATUILR (volatile u8 *)IOP3XX_REG_ADDR(0x013c)
-#define IOP3XX_ATUIPR (volatile u8 *)IOP3XX_REG_ADDR(0x013d)
-#define IOP3XX_ATUMGNT (volatile u8 *)IOP3XX_REG_ADDR(0x013e)
-#define IOP3XX_ATUMLAT (volatile u8 *)IOP3XX_REG_ADDR(0x013f)
-#define IOP3XX_IALR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0140)
-#define IOP3XX_IATVR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0144)
-#define IOP3XX_ERLR (volatile u32 *)IOP3XX_REG_ADDR(0x0148)
-#define IOP3XX_ERTVR (volatile u32 *)IOP3XX_REG_ADDR(0x014c)
-#define IOP3XX_IALR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0150)
-#define IOP3XX_IALR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0154)
-#define IOP3XX_IATVR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0158)
-#define IOP3XX_OIOWTVR (volatile u32 *)IOP3XX_REG_ADDR(0x015c)
-#define IOP3XX_OMWTVR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0160)
-#define IOP3XX_OUMWTVR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0164)
-#define IOP3XX_OMWTVR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0168)
-#define IOP3XX_OUMWTVR1 (volatile u32 *)IOP3XX_REG_ADDR(0x016c)
-#define IOP3XX_OUDWTVR (volatile u32 *)IOP3XX_REG_ADDR(0x0178)
-#define IOP3XX_ATUCR (volatile u32 *)IOP3XX_REG_ADDR(0x0180)
-#define IOP3XX_PCSR (volatile u32 *)IOP3XX_REG_ADDR(0x0184)
-#define IOP3XX_ATUISR (volatile u32 *)IOP3XX_REG_ADDR(0x0188)
-#define IOP3XX_ATUIMR (volatile u32 *)IOP3XX_REG_ADDR(0x018c)
-#define IOP3XX_IABAR3 (volatile u32 *)IOP3XX_REG_ADDR(0x0190)
-#define IOP3XX_IAUBAR3 (volatile u32 *)IOP3XX_REG_ADDR(0x0194)
-#define IOP3XX_IALR3 (volatile u32 *)IOP3XX_REG_ADDR(0x0198)
-#define IOP3XX_IATVR3 (volatile u32 *)IOP3XX_REG_ADDR(0x019c)
-#define IOP3XX_OCCAR (volatile u32 *)IOP3XX_REG_ADDR(0x01a4)
-#define IOP3XX_OCCDR (volatile u32 *)IOP3XX_REG_ADDR(0x01ac)
-#define IOP3XX_PDSCR (volatile u32 *)IOP3XX_REG_ADDR(0x01bc)
-#define IOP3XX_PMCAPID (volatile u8 *)IOP3XX_REG_ADDR(0x01c0)
-#define IOP3XX_PMNEXT (volatile u8 *)IOP3XX_REG_ADDR(0x01c1)
-#define IOP3XX_APMCR (volatile u16 *)IOP3XX_REG_ADDR(0x01c2)
-#define IOP3XX_APMCSR (volatile u16 *)IOP3XX_REG_ADDR(0x01c4)
-#define IOP3XX_PCIXCAPID (volatile u8 *)IOP3XX_REG_ADDR(0x01e0)
-#define IOP3XX_PCIXNEXT (volatile u8 *)IOP3XX_REG_ADDR(0x01e1)
-#define IOP3XX_PCIXCMD (volatile u16 *)IOP3XX_REG_ADDR(0x01e2)
-#define IOP3XX_PCIXSR (volatile u32 *)IOP3XX_REG_ADDR(0x01e4)
-#define IOP3XX_PCIIRSR (volatile u32 *)IOP3XX_REG_ADDR(0x01ec)
+#define IOP3XX_ATUVID IOP3XX_REG_ADDR16(0x0100)
+#define IOP3XX_ATUDID IOP3XX_REG_ADDR16(0x0102)
+#define IOP3XX_ATUCMD IOP3XX_REG_ADDR16(0x0104)
+#define IOP3XX_ATUSR IOP3XX_REG_ADDR16(0x0106)
+#define IOP3XX_ATURID IOP3XX_REG_ADDR8(0x0108)
+#define IOP3XX_ATUCCR IOP3XX_REG_ADDR32(0x0109)
+#define IOP3XX_ATUCLSR IOP3XX_REG_ADDR8(0x010c)
+#define IOP3XX_ATULT IOP3XX_REG_ADDR8(0x010d)
+#define IOP3XX_ATUHTR IOP3XX_REG_ADDR8(0x010e)
+#define IOP3XX_ATUBIST IOP3XX_REG_ADDR8(0x010f)
+#define IOP3XX_IABAR0 IOP3XX_REG_ADDR32(0x0110)
+#define IOP3XX_IAUBAR0 IOP3XX_REG_ADDR32(0x0114)
+#define IOP3XX_IABAR1 IOP3XX_REG_ADDR32(0x0118)
+#define IOP3XX_IAUBAR1 IOP3XX_REG_ADDR32(0x011c)
+#define IOP3XX_IABAR2 IOP3XX_REG_ADDR32(0x0120)
+#define IOP3XX_IAUBAR2 IOP3XX_REG_ADDR32(0x0124)
+#define IOP3XX_ASVIR IOP3XX_REG_ADDR16(0x012c)
+#define IOP3XX_ASIR IOP3XX_REG_ADDR16(0x012e)
+#define IOP3XX_ERBAR IOP3XX_REG_ADDR32(0x0130)
+#define IOP3XX_ATUILR IOP3XX_REG_ADDR8(0x013c)
+#define IOP3XX_ATUIPR IOP3XX_REG_ADDR8(0x013d)
+#define IOP3XX_ATUMGNT IOP3XX_REG_ADDR8(0x013e)
+#define IOP3XX_ATUMLAT IOP3XX_REG_ADDR8(0x013f)
+#define IOP3XX_IALR0 IOP3XX_REG_ADDR32(0x0140)
+#define IOP3XX_IATVR0 IOP3XX_REG_ADDR32(0x0144)
+#define IOP3XX_ERLR IOP3XX_REG_ADDR32(0x0148)
+#define IOP3XX_ERTVR IOP3XX_REG_ADDR32(0x014c)
+#define IOP3XX_IALR1 IOP3XX_REG_ADDR32(0x0150)
+#define IOP3XX_IALR2 IOP3XX_REG_ADDR32(0x0154)
+#define IOP3XX_IATVR2 IOP3XX_REG_ADDR32(0x0158)
+#define IOP3XX_OIOWTVR IOP3XX_REG_ADDR32(0x015c)
+#define IOP3XX_OMWTVR0 IOP3XX_REG_ADDR32(0x0160)
+#define IOP3XX_OUMWTVR0 IOP3XX_REG_ADDR32(0x0164)
+#define IOP3XX_OMWTVR1 IOP3XX_REG_ADDR32(0x0168)
+#define IOP3XX_OUMWTVR1 IOP3XX_REG_ADDR32(0x016c)
+#define IOP3XX_OUDWTVR IOP3XX_REG_ADDR32(0x0178)
+#define IOP3XX_ATUCR IOP3XX_REG_ADDR32(0x0180)
+#define IOP3XX_PCSR IOP3XX_REG_ADDR32(0x0184)
+#define IOP3XX_ATUISR IOP3XX_REG_ADDR32(0x0188)
+#define IOP3XX_ATUIMR IOP3XX_REG_ADDR32(0x018c)
+#define IOP3XX_IABAR3 IOP3XX_REG_ADDR32(0x0190)
+#define IOP3XX_IAUBAR3 IOP3XX_REG_ADDR32(0x0194)
+#define IOP3XX_IALR3 IOP3XX_REG_ADDR32(0x0198)
+#define IOP3XX_IATVR3 IOP3XX_REG_ADDR32(0x019c)
+#define IOP3XX_OCCAR IOP3XX_REG_ADDR32(0x01a4)
+#define IOP3XX_OCCDR IOP3XX_REG_ADDR32(0x01ac)
+#define IOP3XX_PDSCR IOP3XX_REG_ADDR32(0x01bc)
+#define IOP3XX_PMCAPID IOP3XX_REG_ADDR8(0x01c0)
+#define IOP3XX_PMNEXT IOP3XX_REG_ADDR8(0x01c1)
+#define IOP3XX_APMCR IOP3XX_REG_ADDR16(0x01c2)
+#define IOP3XX_APMCSR IOP3XX_REG_ADDR16(0x01c4)
+#define IOP3XX_PCIXCAPID IOP3XX_REG_ADDR8(0x01e0)
+#define IOP3XX_PCIXNEXT IOP3XX_REG_ADDR8(0x01e1)
+#define IOP3XX_PCIXCMD IOP3XX_REG_ADDR16(0x01e2)
+#define IOP3XX_PCIXSR IOP3XX_REG_ADDR32(0x01e4)
+#define IOP3XX_PCIIRSR IOP3XX_REG_ADDR32(0x01ec)

/* Messaging Unit */
-#define IOP3XX_IMR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0310)
-#define IOP3XX_IMR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0314)
-#define IOP3XX_OMR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0318)
-#define IOP3XX_OMR1 (volatile u32 *)IOP3XX_REG_ADDR(0x031c)
-#define IOP3XX_IDR (volatile u32 *)IOP3XX_REG_ADDR(0x0320)
-#define IOP3XX_IISR (volatile u32 *)IOP3XX_REG_ADDR(0x0324)
-#define IOP3XX_IIMR (volatile u32 *)IOP3XX_REG_ADDR(0x0328)
-#define IOP3XX_ODR (volatile u32 *)IOP3XX_REG_ADDR(0x032c)
-#define IOP3XX_OISR (volatile u32 *)IOP3XX_REG_ADDR(0x0330)
-#define IOP3XX_OIMR (volatile u32 *)IOP3XX_REG_ADDR(0x0334)
-#define IOP3XX_MUCR (volatile u32 *)IOP3XX_REG_ADDR(0x0350)
-#define IOP3XX_QBAR (volatile u32 *)IOP3XX_REG_ADDR(0x0354)
-#define IOP3XX_IFHPR (volatile u32 *)IOP3XX_REG_ADDR(0x0360)
-#define IOP3XX_IFTPR (volatile u32 *)IOP3XX_REG_ADDR(0x0364)
-#define IOP3XX_IPHPR (volatile u32 *)IOP3XX_REG_ADDR(0x0368)
-#define IOP3XX_IPTPR (volatile u32 *)IOP3XX_REG_ADDR(0x036c)
-#define IOP3XX_OFHPR (volatile u32 *)IOP3XX_REG_ADDR(0x0370)
-#define IOP3XX_OFTPR (volatile u32 *)IOP3XX_REG_ADDR(0x0374)
-#define IOP3XX_OPHPR (volatile u32 *)IOP3XX_REG_ADDR(0x0378)
-#define IOP3XX_OPTPR (volatile u32 *)IOP3XX_REG_ADDR(0x037c)
-#define IOP3XX_IAR (volatile u32 *)IOP3XX_REG_ADDR(0x0380)
+#define IOP3XX_IMR0 IOP3XX_REG_ADDR32(0x0310)
+#define IOP3XX_IMR1 IOP3XX_REG_ADDR32(0x0314)
+#define IOP3XX_OMR0 IOP3XX_REG_ADDR32(0x0318)
+#define IOP3XX_OMR1 IOP3XX_REG_ADDR32(0x031c)
+#define IOP3XX_IDR IOP3XX_REG_ADDR32(0x0320)
+#define IOP3XX_IISR IOP3XX_REG_ADDR32(0x0324)
+#define IOP3XX_IIMR IOP3XX_REG_ADDR32(0x0328)
+#define IOP3XX_ODR IOP3XX_REG_ADDR32(0x032c)
+#define IOP3XX_OISR IOP3XX_REG_ADDR32(0x0330)
+#define IOP3XX_OIMR IOP3XX_REG_ADDR32(0x0334)
+#define IOP3XX_MUCR IOP3XX_REG_ADDR32(0x0350)
+#define IOP3XX_QBAR IOP3XX_REG_ADDR32(0x0354)
+#define IOP3XX_IFHPR IOP3XX_REG_ADDR32(0x0360)
+#define IOP3XX_IFTPR IOP3XX_REG_ADDR32(0x0364)
+#define IOP3XX_IPHPR IOP3XX_REG_ADDR32(0x0368)
+#define IOP3XX_IPTPR IOP3XX_REG_ADDR32(0x036c)
+#define IOP3XX_OFHPR IOP3XX_REG_ADDR32(0x0370)
+#define IOP3XX_OFTPR IOP3XX_REG_ADDR32(0x0374)
+#define IOP3XX_OPHPR IOP3XX_REG_ADDR32(0x0378)
+#define IOP3XX_OPTPR IOP3XX_REG_ADDR32(0x037c)
+#define IOP3XX_IAR IOP3XX_REG_ADDR32(0x0380)

-/* DMA Controller */
-#define IOP3XX_DMA0_CCR (volatile u32 *)IOP3XX_REG_ADDR(0x0400)
-#define IOP3XX_DMA0_CSR (volatile u32 *)IOP3XX_REG_ADDR(0x0404)
-#define IOP3XX_DMA0_DAR (volatile u32 *)IOP3XX_REG_ADDR(0x040c)
-#define IOP3XX_DMA0_NDAR (volatile u32 *)IOP3XX_REG_ADDR(0x0410)
-#define IOP3XX_DMA0_PADR (volatile u32 *)IOP3XX_REG_ADDR(0x0414)
-#define IOP3XX_DMA0_PUADR (volatile u32 *)IOP3XX_REG_ADDR(0x0418)
-#define IOP3XX_DMA0_LADR (volatile u32 *)IOP3XX_REG_ADDR(0x041c)
-#define IOP3XX_DMA0_BCR (volatile u32 *)IOP3XX_REG_ADDR(0x0420)
-#define IOP3XX_DMA0_DCR (volatile u32 *)IOP3XX_REG_ADDR(0x0424)
-#define IOP3XX_DMA1_CCR (volatile u32 *)IOP3XX_REG_ADDR(0x0440)
-#define IOP3XX_DMA1_CSR (volatile u32 *)IOP3XX_REG_ADDR(0x0444)
-#define IOP3XX_DMA1_DAR (volatile u32 *)IOP3XX_REG_ADDR(0x044c)
-#define IOP3XX_DMA1_NDAR (volatile u32 *)IOP3XX_REG_ADDR(0x0450)
-#define IOP3XX_DMA1_PADR (volatile u32 *)IOP3XX_REG_ADDR(0x0454)
-#define IOP3XX_DMA1_PUADR (volatile u32 *)IOP3XX_REG_ADDR(0x0458)
-#define IOP3XX_DMA1_LADR (volatile u32 *)IOP3XX_REG_ADDR(0x045c)
-#define IOP3XX_DMA1_BCR (volatile u32 *)IOP3XX_REG_ADDR(0x0460)
-#define IOP3XX_DMA1_DCR (volatile u32 *)IOP3XX_REG_ADDR(0x0464)
+/* DMA Controllers */
+#define IOP3XX_DMA_OFFSET(chan, ofs) IOP3XX_REG_ADDR32(((chan) << 6) + (ofs))
+
+#define IOP3XX_DMA_CCR(chan) IOP3XX_DMA_OFFSET(chan, 0x0400)
+#define IOP3XX_DMA_CSR(chan) IOP3XX_DMA_OFFSET(chan, 0x0404)
+#define IOP3XX_DMA_DAR(chan) IOP3XX_DMA_OFFSET(chan, 0x040c)
+#define IOP3XX_DMA_NDAR(chan) IOP3XX_DMA_OFFSET(chan, 0x0410)
+#define IOP3XX_DMA_PADR(chan) IOP3XX_DMA_OFFSET(chan, 0x0414)
+#define IOP3XX_DMA_PUADR(chan) IOP3XX_DMA_OFFSET(chan, 0x0418)
+#define IOP3XX_DMA_LADR(chan) IOP3XX_DMA_OFFSET(chan, 0x041c)
+#define IOP3XX_DMA_BCR(chan) IOP3XX_DMA_OFFSET(chan, 0x0420)
+#define IOP3XX_DMA_DCR(chan) IOP3XX_DMA_OFFSET(chan, 0x0424)
+
+/* Application accelerator unit */
+#define IOP3XX_AAU_ACR IOP3XX_REG_ADDR32(0x0800)
+#define IOP3XX_AAU_ASR IOP3XX_REG_ADDR32(0x0804)
+#define IOP3XX_AAU_ADAR IOP3XX_REG_ADDR32(0x0808)
+#define IOP3XX_AAU_ANDAR IOP3XX_REG_ADDR32(0x080c)
+#define IOP3XX_AAU_SAR(src) IOP3XX_REG_ADDR32(0x0810 + ((src) << 2))
+#define IOP3XX_AAU_DAR IOP3XX_REG_ADDR32(0x0820)
+#define IOP3XX_AAU_ABCR IOP3XX_REG_ADDR32(0x0824)
+#define IOP3XX_AAU_ADCR IOP3XX_REG_ADDR32(0x0828)
+#define IOP3XX_AAU_SAR_EDCR(src_edc) IOP3XX_REG_ADDR32(0x082c + (((src_edc) - 4) << 2))
+#define IOP3XX_AAU_EDCR0_IDX 8
+#define IOP3XX_AAU_EDCR1_IDX 17
+#define IOP3XX_AAU_EDCR2_IDX 26
+
+#define IOP3XX_DMA0_ID 0
+#define IOP3XX_DMA1_ID 1
+#define IOP3XX_AAU_ID 2

/* Peripheral bus interface */
-#define IOP3XX_PBCR (volatile u32 *)IOP3XX_REG_ADDR(0x0680)
-#define IOP3XX_PBISR (volatile u32 *)IOP3XX_REG_ADDR(0x0684)
-#define IOP3XX_PBBAR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0688)
-#define IOP3XX_PBLR0 (volatile u32 *)IOP3XX_REG_ADDR(0x068c)
-#define IOP3XX_PBBAR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0690)
-#define IOP3XX_PBLR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0694)
-#define IOP3XX_PBBAR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0698)
-#define IOP3XX_PBLR2 (volatile u32 *)IOP3XX_REG_ADDR(0x069c)
-#define IOP3XX_PBBAR3 (volatile u32 *)IOP3XX_REG_ADDR(0x06a0)
-#define IOP3XX_PBLR3 (volatile u32 *)IOP3XX_REG_ADDR(0x06a4)
-#define IOP3XX_PBBAR4 (volatile u32 *)IOP3XX_REG_ADDR(0x06a8)
-#define IOP3XX_PBLR4 (volatile u32 *)IOP3XX_REG_ADDR(0x06ac)
-#define IOP3XX_PBBAR5 (volatile u32 *)IOP3XX_REG_ADDR(0x06b0)
-#define IOP3XX_PBLR5 (volatile u32 *)IOP3XX_REG_ADDR(0x06b4)
-#define IOP3XX_PMBR0 (volatile u32 *)IOP3XX_REG_ADDR(0x06c0)
-#define IOP3XX_PMBR1 (volatile u32 *)IOP3XX_REG_ADDR(0x06e0)
-#define IOP3XX_PMBR2 (volatile u32 *)IOP3XX_REG_ADDR(0x06e4)
+#define IOP3XX_PBCR IOP3XX_REG_ADDR32(0x0680)
+#define IOP3XX_PBISR IOP3XX_REG_ADDR32(0x0684)
+#define IOP3XX_PBBAR0 IOP3XX_REG_ADDR32(0x0688)
+#define IOP3XX_PBLR0 IOP3XX_REG_ADDR32(0x068c)
+#define IOP3XX_PBBAR1 IOP3XX_REG_ADDR32(0x0690)
+#define IOP3XX_PBLR1 IOP3XX_REG_ADDR32(0x0694)
+#define IOP3XX_PBBAR2 IOP3XX_REG_ADDR32(0x0698)
+#define IOP3XX_PBLR2 IOP3XX_REG_ADDR32(0x069c)
+#define IOP3XX_PBBAR3 IOP3XX_REG_ADDR32(0x06a0)
+#define IOP3XX_PBLR3 IOP3XX_REG_ADDR32(0x06a4)
+#define IOP3XX_PBBAR4 IOP3XX_REG_ADDR32(0x06a8)
+#define IOP3XX_PBLR4 IOP3XX_REG_ADDR32(0x06ac)
+#define IOP3XX_PBBAR5 IOP3XX_REG_ADDR32(0x06b0)
+#define IOP3XX_PBLR5 IOP3XX_REG_ADDR32(0x06b4)
+#define IOP3XX_PMBR0 IOP3XX_REG_ADDR32(0x06c0)
+#define IOP3XX_PMBR1 IOP3XX_REG_ADDR32(0x06e0)
+#define IOP3XX_PMBR2 IOP3XX_REG_ADDR32(0x06e4)

/* Peripheral performance monitoring unit */
-#define IOP3XX_GTMR (volatile u32 *)IOP3XX_REG_ADDR(0x0700)
-#define IOP3XX_ESR (volatile u32 *)IOP3XX_REG_ADDR(0x0704)
-#define IOP3XX_EMISR (volatile u32 *)IOP3XX_REG_ADDR(0x0708)
-#define IOP3XX_GTSR (volatile u32 *)IOP3XX_REG_ADDR(0x0710)
+#define IOP3XX_GTMR IOP3XX_REG_ADDR32(0x0700)
+#define IOP3XX_ESR IOP3XX_REG_ADDR32(0x0704)
+#define IOP3XX_EMISR IOP3XX_REG_ADDR32(0x0708)
+#define IOP3XX_GTSR IOP3XX_REG_ADDR32(0x0710)
/* PERCR0 DOESN'T EXIST - index from 1! */
-#define IOP3XX_PERCR0 (volatile u32 *)IOP3XX_REG_ADDR(0x0710)
+#define IOP3XX_PERCR0 IOP3XX_REG_ADDR32(0x0710)

/* General Purpose I/O */
-#define IOP3XX_GPOE (volatile u32 *)IOP3XX_GPIO_REG(0x0004)
-#define IOP3XX_GPID (volatile u32 *)IOP3XX_GPIO_REG(0x0008)
-#define IOP3XX_GPOD (volatile u32 *)IOP3XX_GPIO_REG(0x000c)
+#define IOP3XX_GPOE IOP3XX_GPIO_REG32(0x0004)
+#define IOP3XX_GPID IOP3XX_GPIO_REG32(0x0008)
+#define IOP3XX_GPOD IOP3XX_GPIO_REG32(0x000c)

/* Timers */
-#define IOP3XX_TU_TMR0 (volatile u32 *)IOP3XX_TIMER_REG(0x0000)
-#define IOP3XX_TU_TMR1 (volatile u32 *)IOP3XX_TIMER_REG(0x0004)
-#define IOP3XX_TU_TCR0 (volatile u32 *)IOP3XX_TIMER_REG(0x0008)
-#define IOP3XX_TU_TCR1 (volatile u32 *)IOP3XX_TIMER_REG(0x000c)
-#define IOP3XX_TU_TRR0 (volatile u32 *)IOP3XX_TIMER_REG(0x0010)
-#define IOP3XX_TU_TRR1 (volatile u32 *)IOP3XX_TIMER_REG(0x0014)
-#define IOP3XX_TU_TISR (volatile u32 *)IOP3XX_TIMER_REG(0x0018)
-#define IOP3XX_TU_WDTCR (volatile u32 *)IOP3XX_TIMER_REG(0x001c)
+#define IOP3XX_TU_TMR0 IOP3XX_TIMER_REG32(0x0000)
+#define IOP3XX_TU_TMR1 IOP3XX_TIMER_REG32(0x0004)
+#define IOP3XX_TU_TCR0 IOP3XX_TIMER_REG32(0x0008)
+#define IOP3XX_TU_TCR1 IOP3XX_TIMER_REG32(0x000c)
+#define IOP3XX_TU_TRR0 IOP3XX_TIMER_REG32(0x0010)
+#define IOP3XX_TU_TRR1 IOP3XX_TIMER_REG32(0x0014)
+#define IOP3XX_TU_TISR IOP3XX_TIMER_REG32(0x0018)
+#define IOP3XX_TU_WDTCR IOP3XX_TIMER_REG32(0x001c)
#define IOP3XX_TMR_TC 0x01
#define IOP3XX_TMR_EN 0x02
#define IOP3XX_TMR_RELOAD 0x04
@@ -190,69 +203,25 @@ #define IOP3XX_TMR_RATIO_4_1 0x10
#define IOP3XX_TMR_RATIO_8_1 0x20
#define IOP3XX_TMR_RATIO_16_1 0x30

-/* Application accelerator unit */
-#define IOP3XX_AAU_ACR (volatile u32 *)IOP3XX_REG_ADDR(0x0800)
-#define IOP3XX_AAU_ASR (volatile u32 *)IOP3XX_REG_ADDR(0x0804)
-#define IOP3XX_AAU_ADAR (volatile u32 *)IOP3XX_REG_ADDR(0x0808)
-#define IOP3XX_AAU_ANDAR (volatile u32 *)IOP3XX_REG_ADDR(0x080c)
-#define IOP3XX_AAU_SAR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0810)
-#define IOP3XX_AAU_SAR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0814)
-#define IOP3XX_AAU_SAR3 (volatile u32 *)IOP3XX_REG_ADDR(0x0818)
-#define IOP3XX_AAU_SAR4 (volatile u32 *)IOP3XX_REG_ADDR(0x081c)
-#define IOP3XX_AAU_DAR (volatile u32 *)IOP3XX_REG_ADDR(0x0820)
-#define IOP3XX_AAU_ABCR (volatile u32 *)IOP3XX_REG_ADDR(0x0824)
-#define IOP3XX_AAU_ADCR (volatile u32 *)IOP3XX_REG_ADDR(0x0828)
-#define IOP3XX_AAU_SAR5 (volatile u32 *)IOP3XX_REG_ADDR(0x082c)
-#define IOP3XX_AAU_SAR6 (volatile u32 *)IOP3XX_REG_ADDR(0x0830)
-#define IOP3XX_AAU_SAR7 (volatile u32 *)IOP3XX_REG_ADDR(0x0834)
-#define IOP3XX_AAU_SAR8 (volatile u32 *)IOP3XX_REG_ADDR(0x0838)
-#define IOP3XX_AAU_EDCR0 (volatile u32 *)IOP3XX_REG_ADDR(0x083c)
-#define IOP3XX_AAU_SAR9 (volatile u32 *)IOP3XX_REG_ADDR(0x0840)
-#define IOP3XX_AAU_SAR10 (volatile u32 *)IOP3XX_REG_ADDR(0x0844)
-#define IOP3XX_AAU_SAR11 (volatile u32 *)IOP3XX_REG_ADDR(0x0848)
-#define IOP3XX_AAU_SAR12 (volatile u32 *)IOP3XX_REG_ADDR(0x084c)
-#define IOP3XX_AAU_SAR13 (volatile u32 *)IOP3XX_REG_ADDR(0x0850)
-#define IOP3XX_AAU_SAR14 (volatile u32 *)IOP3XX_REG_ADDR(0x0854)
-#define IOP3XX_AAU_SAR15 (volatile u32 *)IOP3XX_REG_ADDR(0x0858)
-#define IOP3XX_AAU_SAR16 (volatile u32 *)IOP3XX_REG_ADDR(0x085c)
-#define IOP3XX_AAU_EDCR1 (volatile u32 *)IOP3XX_REG_ADDR(0x0860)
-#define IOP3XX_AAU_SAR17 (volatile u32 *)IOP3XX_REG_ADDR(0x0864)
-#define IOP3XX_AAU_SAR18 (volatile u32 *)IOP3XX_REG_ADDR(0x0868)
-#define IOP3XX_AAU_SAR19 (volatile u32 *)IOP3XX_REG_ADDR(0x086c)
-#define IOP3XX_AAU_SAR20 (volatile u32 *)IOP3XX_REG_ADDR(0x0870)
-#define IOP3XX_AAU_SAR21 (volatile u32 *)IOP3XX_REG_ADDR(0x0874)
-#define IOP3XX_AAU_SAR22 (volatile u32 *)IOP3XX_REG_ADDR(0x0878)
-#define IOP3XX_AAU_SAR23 (volatile u32 *)IOP3XX_REG_ADDR(0x087c)
-#define IOP3XX_AAU_SAR24 (volatile u32 *)IOP3XX_REG_ADDR(0x0880)
-#define IOP3XX_AAU_EDCR2 (volatile u32 *)IOP3XX_REG_ADDR(0x0884)
-#define IOP3XX_AAU_SAR25 (volatile u32 *)IOP3XX_REG_ADDR(0x0888)
-#define IOP3XX_AAU_SAR26 (volatile u32 *)IOP3XX_REG_ADDR(0x088c)
-#define IOP3XX_AAU_SAR27 (volatile u32 *)IOP3XX_REG_ADDR(0x0890)
-#define IOP3XX_AAU_SAR28 (volatile u32 *)IOP3XX_REG_ADDR(0x0894)
-#define IOP3XX_AAU_SAR29 (volatile u32 *)IOP3XX_REG_ADDR(0x0898)
-#define IOP3XX_AAU_SAR30 (volatile u32 *)IOP3XX_REG_ADDR(0x089c)
-#define IOP3XX_AAU_SAR31 (volatile u32 *)IOP3XX_REG_ADDR(0x08a0)
-#define IOP3XX_AAU_SAR32 (volatile u32 *)IOP3XX_REG_ADDR(0x08a4)
-
/* I2C bus interface unit */
-#define IOP3XX_ICR0 (volatile u32 *)IOP3XX_REG_ADDR(0x1680)
-#define IOP3XX_ISR0 (volatile u32 *)IOP3XX_REG_ADDR(0x1684)
-#define IOP3XX_ISAR0 (volatile u32 *)IOP3XX_REG_ADDR(0x1688)
-#define IOP3XX_IDBR0 (volatile u32 *)IOP3XX_REG_ADDR(0x168c)
-#define IOP3XX_IBMR0 (volatile u32 *)IOP3XX_REG_ADDR(0x1694)
-#define IOP3XX_ICR1 (volatile u32 *)IOP3XX_REG_ADDR(0x16a0)
-#define IOP3XX_ISR1 (volatile u32 *)IOP3XX_REG_ADDR(0x16a4)
-#define IOP3XX_ISAR1 (volatile u32 *)IOP3XX_REG_ADDR(0x16a8)
-#define IOP3XX_IDBR1 (volatile u32 *)IOP3XX_REG_ADDR(0x16ac)
-#define IOP3XX_IBMR1 (volatile u32 *)IOP3XX_REG_ADDR(0x16b4)
+#define IOP3XX_ICR0 IOP3XX_REG_ADDR32(0x1680)
+#define IOP3XX_ISR0 IOP3XX_REG_ADDR32(0x1684)
+#define IOP3XX_ISAR0 IOP3XX_REG_ADDR32(0x1688)
+#define IOP3XX_IDBR0 IOP3XX_REG_ADDR32(0x168c)
+#define IOP3XX_IBMR0 IOP3XX_REG_ADDR32(0x1694)
+#define IOP3XX_ICR1 IOP3XX_REG_ADDR32(0x16a0)
+#define IOP3XX_ISR1 IOP3XX_REG_ADDR32(0x16a4)
+#define IOP3XX_ISAR1 IOP3XX_REG_ADDR32(0x16a8)
+#define IOP3XX_IDBR1 IOP3XX_REG_ADDR32(0x16ac)
+#define IOP3XX_IBMR1 IOP3XX_REG_ADDR32(0x16b4)


/*
* IOP3XX I/O and Mem space regions for PCI autoconfiguration
*/
#define IOP3XX_PCI_MEM_WINDOW_SIZE 0x04000000
-#define IOP3XX_PCI_LOWER_MEM_PA 0x80000000
-#define IOP3XX_PCI_LOWER_MEM_BA (*IOP3XX_OMWTVR0)
+#define IOP3XX_PCI_LOWER_MEM_PA 0x80000000
+#define IOP3XX_PCI_LOWER_MEM_BA (*IOP3XX_OMWTVR0)

#define IOP3XX_PCI_IO_WINDOW_SIZE 0x00010000
#define IOP3XX_PCI_LOWER_IO_PA 0x90000000

2006-09-11 23:21:26

by Dan Williams

[permalink] [raw]
Subject: [PATCH 19/19] iop3xx: IOP 32x and 33x support for the iop-adma driver

From: Dan Williams <[email protected]>

Adds the platform device definitions and the architecture specific support
routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* add support for > 1k zero sum buffer sizes
* added dma/aau platform devices to iq80321 and iq80332 setup
* fixed the calculation in iop_desc_is_aligned
* support xor buffer sizes larger than 16MB (see the slot-count sketch below)
* fix places where software descriptors were assumed to be contiguous; only
hardware descriptors are contiguous
* iop32x does not support hardware zero sum, add software emulation support
for up to a PAGE_SIZE buffer size
* added raid5 dma driver support functions
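
The large-buffer changelog items come down to splitting one logical operation
across several hardware descriptors. As a rough sketch only (equivalent in
effect to the iop_chan_xor_slot_count() routine added below;
xor_slot_count_sketch() is a name invented for this example, and
IOP_ADMA_XOR_MAX_BYTE_COUNT is the 16MB per-descriptor limit defined in
iop3xx-adma.h):

/* Sketch: number of descriptor slots needed to xor 'len' bytes when one
 * hardware descriptor covers at most IOP_ADMA_XOR_MAX_BYTE_COUNT bytes
 * and each descriptor occupies 'slots_per_op' slots in the pool.
 */
static int xor_slot_count_sketch(size_t len, int slots_per_op)
{
	int slot_cnt = slots_per_op;		/* slots for the first descriptor */

	while (len > IOP_ADMA_XOR_MAX_BYTE_COUNT) {
		len -= IOP_ADMA_XOR_MAX_BYTE_COUNT;
		slot_cnt += slots_per_op;	/* one more descriptor per extra chunk */
	}
	return slot_cnt;
}

The zero-sum path follows the same pattern with the 1KB
IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT limit.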

Signed-off-by: Dan Williams <[email protected]>
---

arch/arm/mach-iop32x/iq80321.c | 141 +++++
arch/arm/mach-iop33x/iq80331.c | 9
arch/arm/mach-iop33x/iq80332.c | 8
arch/arm/mach-iop33x/setup.c | 132 +++++
include/asm-arm/arch-iop32x/adma.h | 5
include/asm-arm/arch-iop33x/adma.h | 5
include/asm-arm/hardware/iop3xx-adma.h | 901 ++++++++++++++++++++++++++++++++
7 files changed, 1201 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c
index cdd2265..79d6514 100644
--- a/arch/arm/mach-iop32x/iq80321.c
+++ b/arch/arm/mach-iop32x/iq80321.c
@@ -33,6 +33,9 @@ #include <asm/mach/time.h>
#include <asm/mach-types.h>
#include <asm/page.h>
#include <asm/pgtable.h>
+#ifdef CONFIG_DMA_ENGINE
+#include <asm/hardware/iop_adma.h>
+#endif

/*
* IQ80321 timer tick configuration.
@@ -170,12 +173,150 @@ static struct platform_device iq80321_se
.resource = &iq80321_uart_resource,
};

+#ifdef CONFIG_DMA_ENGINE
+/* AAU and DMA Channels */
+static struct resource iop3xx_dma_0_resources[] = {
+ [0] = {
+ .start = (unsigned long) IOP3XX_DMA_CCR(0),
+ .end = ((unsigned long) IOP3XX_DMA_DCR(0)) + 4,
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = IRQ_IOP32X_DMA0_EOT,
+ .end = IRQ_IOP32X_DMA0_EOT,
+ .flags = IORESOURCE_IRQ
+ },
+ [2] = {
+ .start = IRQ_IOP32X_DMA0_EOC,
+ .end = IRQ_IOP32X_DMA0_EOC,
+ .flags = IORESOURCE_IRQ
+ },
+ [3] = {
+ .start = IRQ_IOP32X_DMA0_ERR,
+ .end = IRQ_IOP32X_DMA0_ERR,
+ .flags = IORESOURCE_IRQ
+ }
+};
+
+static struct resource iop3xx_dma_1_resources[] = {
+ [0] = {
+ .start = (unsigned long) IOP3XX_DMA_CCR(1),
+ .end = ((unsigned long) IOP3XX_DMA_DCR(1)) + 4,
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = IRQ_IOP32X_DMA1_EOT,
+ .end = IRQ_IOP32X_DMA1_EOT,
+ .flags = IORESOURCE_IRQ
+ },
+ [2] = {
+ .start = IRQ_IOP32X_DMA1_EOC,
+ .end = IRQ_IOP32X_DMA1_EOC,
+ .flags = IORESOURCE_IRQ
+ },
+ [3] = {
+ .start = IRQ_IOP32X_DMA1_ERR,
+ .end = IRQ_IOP32X_DMA1_ERR,
+ .flags = IORESOURCE_IRQ
+ }
+};
+
+
+static struct resource iop3xx_aau_resources[] = {
+ [0] = {
+ .start = (unsigned long) IOP3XX_AAU_ACR,
+ .end = (unsigned long) IOP3XX_AAU_SAR_EDCR(32),
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = IRQ_IOP32X_AA_EOT,
+ .end = IRQ_IOP32X_AA_EOT,
+ .flags = IORESOURCE_IRQ
+ },
+ [2] = {
+ .start = IRQ_IOP32X_AA_EOC,
+ .end = IRQ_IOP32X_AA_EOC,
+ .flags = IORESOURCE_IRQ
+ },
+ [3] = {
+ .start = IRQ_IOP32X_AA_ERR,
+ .end = IRQ_IOP32X_AA_ERR,
+ .flags = IORESOURCE_IRQ
+ }
+};
+
+static u64 iop3xx_adma_dmamask = DMA_32BIT_MASK;
+
+static struct iop_adma_platform_data iop3xx_dma_0_data = {
+ .hw_id = IOP3XX_DMA0_ID,
+ .capabilities = DMA_MEMCPY | DMA_MEMCPY_CRC32C,
+ .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop3xx_dma_1_data = {
+ .hw_id = IOP3XX_DMA1_ID,
+ .capabilities = DMA_MEMCPY | DMA_MEMCPY_CRC32C,
+ .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop3xx_aau_data = {
+ .hw_id = IOP3XX_AAU_ID,
+ .capabilities = DMA_XOR | DMA_ZERO_SUM | DMA_MEMSET,
+ .pool_size = 3 * PAGE_SIZE,
+};
+
+struct platform_device iop3xx_dma_0_channel = {
+ .name = "IOP-ADMA",
+ .id = 0,
+ .num_resources = 4,
+ .resource = iop3xx_dma_0_resources,
+ .dev = {
+ .dma_mask = &iop3xx_adma_dmamask,
+ .coherent_dma_mask = DMA_64BIT_MASK,
+ .platform_data = (void *) &iop3xx_dma_0_data,
+ },
+};
+
+struct platform_device iop3xx_dma_1_channel = {
+ .name = "IOP-ADMA",
+ .id = 1,
+ .num_resources = 4,
+ .resource = iop3xx_dma_1_resources,
+ .dev = {
+ .dma_mask = &iop3xx_adma_dmamask,
+ .coherent_dma_mask = DMA_64BIT_MASK,
+ .platform_data = (void *) &iop3xx_dma_1_data,
+ },
+};
+
+struct platform_device iop3xx_aau_channel = {
+ .name = "IOP-ADMA",
+ .id = 2,
+ .num_resources = 4,
+ .resource = iop3xx_aau_resources,
+ .dev = {
+ .dma_mask = &iop3xx_adma_dmamask,
+ .coherent_dma_mask = DMA_64BIT_MASK,
+ .platform_data = (void *) &iop3xx_aau_data,
+ },
+};
+#endif /* CONFIG_DMA_ENGINE */
+
+extern struct platform_device iop3xx_dma_0_channel;
+extern struct platform_device iop3xx_dma_1_channel;
+extern struct platform_device iop3xx_aau_channel;
static void __init iq80321_init_machine(void)
{
platform_device_register(&iop3xx_i2c0_device);
platform_device_register(&iop3xx_i2c1_device);
platform_device_register(&iq80321_flash_device);
platform_device_register(&iq80321_serial_device);
+ #ifdef CONFIG_DMA_ENGINE
+ platform_device_register(&iop3xx_dma_0_channel);
+ platform_device_register(&iop3xx_dma_1_channel);
+ platform_device_register(&iop3xx_aau_channel);
+ #endif
+
}

MACHINE_START(IQ80321, "Intel IQ80321")
diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c
index 3807000..34bedc6 100644
--- a/arch/arm/mach-iop33x/iq80331.c
+++ b/arch/arm/mach-iop33x/iq80331.c
@@ -122,6 +122,10 @@ static struct platform_device iq80331_fl
.resource = &iq80331_flash_resource,
};

+
+extern struct platform_device iop3xx_dma_0_channel;
+extern struct platform_device iop3xx_dma_1_channel;
+extern struct platform_device iop3xx_aau_channel;
static void __init iq80331_init_machine(void)
{
platform_device_register(&iop3xx_i2c0_device);
@@ -129,6 +133,11 @@ static void __init iq80331_init_machine(
platform_device_register(&iop33x_uart0_device);
platform_device_register(&iop33x_uart1_device);
platform_device_register(&iq80331_flash_device);
+ #ifdef CONFIG_DMA_ENGINE
+ platform_device_register(&iop3xx_dma_0_channel);
+ platform_device_register(&iop3xx_dma_1_channel);
+ platform_device_register(&iop3xx_aau_channel);
+ #endif
}

MACHINE_START(IQ80331, "Intel IQ80331")
diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c
index 8780d55..ed36016 100644
--- a/arch/arm/mach-iop33x/iq80332.c
+++ b/arch/arm/mach-iop33x/iq80332.c
@@ -129,6 +129,9 @@ static struct platform_device iq80332_fl
.resource = &iq80332_flash_resource,
};

+extern struct platform_device iop3xx_dma_0_channel;
+extern struct platform_device iop3xx_dma_1_channel;
+extern struct platform_device iop3xx_aau_channel;
static void __init iq80332_init_machine(void)
{
platform_device_register(&iop3xx_i2c0_device);
@@ -136,6 +139,11 @@ static void __init iq80332_init_machine(
platform_device_register(&iop33x_uart0_device);
platform_device_register(&iop33x_uart1_device);
platform_device_register(&iq80332_flash_device);
+ #ifdef CONFIG_DMA_ENGINE
+ platform_device_register(&iop3xx_dma_0_channel);
+ platform_device_register(&iop3xx_dma_1_channel);
+ platform_device_register(&iop3xx_aau_channel);
+ #endif
}

MACHINE_START(IQ80332, "Intel IQ80332")
diff --git a/arch/arm/mach-iop33x/setup.c b/arch/arm/mach-iop33x/setup.c
index e72face..fbdb998 100644
--- a/arch/arm/mach-iop33x/setup.c
+++ b/arch/arm/mach-iop33x/setup.c
@@ -28,6 +28,9 @@ #include <asm/hardware.h>
#include <asm/hardware/iop3xx.h>
#include <asm/mach-types.h>
#include <asm/mach/arch.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <asm/hardware/iop_adma.h>

#define IOP33X_UART_XTAL 33334000

@@ -103,3 +106,132 @@ struct platform_device iop33x_uart1_devi
.num_resources = 2,
.resource = iop33x_uart1_resources,
};
+
+#ifdef CONFIG_DMA_ENGINE
+/* AAU and DMA Channels */
+static struct resource iop3xx_dma_0_resources[] = {
+ [0] = {
+ .start = (unsigned long) IOP3XX_DMA_CCR(0),
+ .end = ((unsigned long) IOP3XX_DMA_DCR(0)) + 4,
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = IRQ_IOP33X_DMA0_EOT,
+ .end = IRQ_IOP33X_DMA0_EOT,
+ .flags = IORESOURCE_IRQ
+ },
+ [2] = {
+ .start = IRQ_IOP33X_DMA0_EOC,
+ .end = IRQ_IOP33X_DMA0_EOC,
+ .flags = IORESOURCE_IRQ
+ },
+ [3] = {
+ .start = IRQ_IOP33X_DMA0_ERR,
+ .end = IRQ_IOP33X_DMA0_ERR,
+ .flags = IORESOURCE_IRQ
+ }
+};
+
+static struct resource iop3xx_dma_1_resources[] = {
+ [0] = {
+ .start = (unsigned long) IOP3XX_DMA_CCR(1),
+ .end = ((unsigned long) IOP3XX_DMA_DCR(1)) + 4,
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = IRQ_IOP33X_DMA1_EOT,
+ .end = IRQ_IOP33X_DMA1_EOT,
+ .flags = IORESOURCE_IRQ
+ },
+ [2] = {
+ .start = IRQ_IOP33X_DMA1_EOC,
+ .end = IRQ_IOP33X_DMA1_EOC,
+ .flags = IORESOURCE_IRQ
+ },
+ [3] = {
+ .start = IRQ_IOP33X_DMA1_ERR,
+ .end = IRQ_IOP33X_DMA1_ERR,
+ .flags = IORESOURCE_IRQ
+ }
+};
+
+
+static struct resource iop3xx_aau_resources[] = {
+ [0] = {
+ .start = (unsigned long) IOP3XX_AAU_ACR,
+ .end = (unsigned long) IOP3XX_AAU_SAR_EDCR(32),
+ .flags = IORESOURCE_MEM,
+ },
+ [1] = {
+ .start = IRQ_IOP33X_AA_EOT,
+ .end = IRQ_IOP33X_AA_EOT,
+ .flags = IORESOURCE_IRQ
+ },
+ [2] = {
+ .start = IRQ_IOP33X_AA_EOC,
+ .end = IRQ_IOP33X_AA_EOC,
+ .flags = IORESOURCE_IRQ
+ },
+ [3] = {
+ .start = IRQ_IOP33X_AA_ERR,
+ .end = IRQ_IOP33X_AA_ERR,
+ .flags = IORESOURCE_IRQ
+ }
+};
+
+static u64 iop3xx_adma_dmamask = DMA_32BIT_MASK;
+
+static struct iop_adma_platform_data iop3xx_dma_0_data = {
+ .hw_id = IOP3XX_DMA0_ID,
+ .capabilities = DMA_MEMCPY | DMA_MEMCPY_CRC32C,
+ .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop3xx_dma_1_data = {
+ .hw_id = IOP3XX_DMA1_ID,
+ .capabilities = DMA_MEMCPY | DMA_MEMCPY_CRC32C,
+ .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop3xx_aau_data = {
+ .hw_id = IOP3XX_AAU_ID,
+ .capabilities = DMA_XOR | DMA_ZERO_SUM | DMA_MEMSET,
+ .pool_size = 3 * PAGE_SIZE,
+};
+
+struct platform_device iop3xx_dma_0_channel = {
+ .name = "IOP-ADMA",
+ .id = 0,
+ .num_resources = 4,
+ .resource = iop3xx_dma_0_resources,
+ .dev = {
+ .dma_mask = &iop3xx_adma_dmamask,
+ .coherent_dma_mask = DMA_64BIT_MASK,
+ .platform_data = (void *) &iop3xx_dma_0_data,
+ },
+};
+
+struct platform_device iop3xx_dma_1_channel = {
+ .name = "IOP-ADMA",
+ .id = 1,
+ .num_resources = 4,
+ .resource = iop3xx_dma_1_resources,
+ .dev = {
+ .dma_mask = &iop3xx_adma_dmamask,
+ .coherent_dma_mask = DMA_64BIT_MASK,
+ .platform_data = (void *) &iop3xx_dma_1_data,
+ },
+};
+
+struct platform_device iop3xx_aau_channel = {
+ .name = "IOP-ADMA",
+ .id = 2,
+ .num_resources = 4,
+ .resource = iop3xx_aau_resources,
+ .dev = {
+ .dma_mask = &iop3xx_adma_dmamask,
+ .coherent_dma_mask = DMA_64BIT_MASK,
+ .platform_data = (void *) &iop3xx_aau_data,
+ },
+};
+#endif /* CONFIG_DMA_ENGINE */
diff --git a/include/asm-arm/arch-iop32x/adma.h b/include/asm-arm/arch-iop32x/adma.h
new file mode 100644
index 0000000..5ed9203
--- /dev/null
+++ b/include/asm-arm/arch-iop32x/adma.h
@@ -0,0 +1,5 @@
+#ifndef IOP32X_ADMA_H
+#define IOP32X_ADMA_H
+#include <asm/hardware/iop3xx-adma.h>
+#endif
+
diff --git a/include/asm-arm/arch-iop33x/adma.h b/include/asm-arm/arch-iop33x/adma.h
new file mode 100644
index 0000000..4b92f79
--- /dev/null
+++ b/include/asm-arm/arch-iop33x/adma.h
@@ -0,0 +1,5 @@
+#ifndef IOP33X_ADMA_H
+#define IOP33X_ADMA_H
+#include <asm/hardware/iop3xx-adma.h>
+#endif
+
diff --git a/include/asm-arm/hardware/iop3xx-adma.h b/include/asm-arm/hardware/iop3xx-adma.h
new file mode 100644
index 0000000..34624b6
--- /dev/null
+++ b/include/asm-arm/hardware/iop3xx-adma.h
@@ -0,0 +1,901 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef _IOP3XX_ADMA_H
+#define _IOP3XX_ADMA_H
+#include <linux/types.h>
+#include <asm/hardware.h>
+#include <asm/hardware/iop_adma.h>
+
+struct iop3xx_aau_desc_ctrl {
+ unsigned int int_en:1;
+ unsigned int blk1_cmd_ctrl:3;
+ unsigned int blk2_cmd_ctrl:3;
+ unsigned int blk3_cmd_ctrl:3;
+ unsigned int blk4_cmd_ctrl:3;
+ unsigned int blk5_cmd_ctrl:3;
+ unsigned int blk6_cmd_ctrl:3;
+ unsigned int blk7_cmd_ctrl:3;
+ unsigned int blk8_cmd_ctrl:3;
+ unsigned int blk_ctrl:2;
+ unsigned int dual_xor_en:1;
+ unsigned int tx_complete:1;
+ unsigned int zero_result_err:1;
+ unsigned int zero_result_en:1;
+ unsigned int dest_write_en:1;
+};
+
+struct iop3xx_aau_e_desc_ctrl {
+ unsigned int reserved:1;
+ unsigned int blk1_cmd_ctrl:3;
+ unsigned int blk2_cmd_ctrl:3;
+ unsigned int blk3_cmd_ctrl:3;
+ unsigned int blk4_cmd_ctrl:3;
+ unsigned int blk5_cmd_ctrl:3;
+ unsigned int blk6_cmd_ctrl:3;
+ unsigned int blk7_cmd_ctrl:3;
+ unsigned int blk8_cmd_ctrl:3;
+ unsigned int reserved2:7;
+};
+
+struct iop3xx_dma_desc_ctrl {
+ unsigned int pci_transaction:4;
+ unsigned int int_en:1;
+ unsigned int dac_cycle_en:1;
+ unsigned int mem_to_mem_en:1;
+ unsigned int crc_data_tx_en:1;
+ unsigned int crc_gen_en:1;
+ unsigned int crc_seed_dis:1;
+ unsigned int reserved:21;
+ unsigned int crc_tx_complete:1;
+};
+
+struct iop3xx_desc_dma {
+ u32 next_desc;
+ union {
+ u32 pci_src_addr;
+ u32 pci_dest_addr;
+ u32 src_addr;
+ };
+ union {
+ u32 upper_pci_src_addr;
+ u32 upper_pci_dest_addr;
+ };
+ union {
+ u32 local_pci_src_addr;
+ u32 local_pci_dest_addr;
+ u32 dest_addr;
+ };
+ u32 byte_count;
+ union {
+ u32 desc_ctrl;
+ struct iop3xx_dma_desc_ctrl desc_ctrl_field;
+ };
+ u32 crc_addr;
+};
+
+struct iop3xx_desc_aau {
+ u32 next_desc;
+ u32 src[4];
+ u32 dest_addr;
+ u32 byte_count;
+ union {
+ u32 desc_ctrl;
+ struct iop3xx_aau_desc_ctrl desc_ctrl_field;
+ };
+ union {
+ u32 src_addr;
+ u32 e_desc_ctrl;
+ struct iop3xx_aau_e_desc_ctrl e_desc_ctrl_field;
+ } src_edc[31];
+};
+
+
+struct iop3xx_aau_gfmr {
+ unsigned int gfmr1:8;
+ unsigned int gfmr2:8;
+ unsigned int gfmr3:8;
+ unsigned int gfmr4:8;
+};
+
+struct iop3xx_desc_pq_xor {
+ u32 next_desc;
+ u32 src[3];
+ union {
+ u32 data_mult1;
+ struct iop3xx_aau_gfmr data_mult1_field;
+ };
+ u32 dest_addr;
+ u32 byte_count;
+ union {
+ u32 desc_ctrl;
+ struct iop3xx_aau_desc_ctrl desc_ctrl_field;
+ };
+ union {
+ u32 src_addr;
+ u32 e_desc_ctrl;
+ struct iop3xx_aau_e_desc_ctrl e_desc_ctrl_field;
+ u32 data_multiplier;
+ struct iop3xx_aau_gfmr data_mult_field;
+ u32 reserved;
+ } src_edc_gfmr[19];
+};
+
+struct iop3xx_desc_dual_xor {
+ u32 next_desc;
+ u32 src0_addr;
+ u32 src1_addr;
+ u32 h_src_addr;
+ u32 d_src_addr;
+ u32 h_dest_addr;
+ u32 byte_count;
+ union {
+ u32 desc_ctrl;
+ struct iop3xx_aau_desc_ctrl desc_ctrl_field;
+ };
+ u32 d_dest_addr;
+};
+
+union iop3xx_desc {
+ struct iop3xx_desc_aau *aau;
+ struct iop3xx_desc_dma *dma;
+ struct iop3xx_desc_pq_xor *pq_xor;
+ struct iop3xx_desc_dual_xor *dual_xor;
+ void *ptr;
+};
+
+static inline u32 iop_chan_get_current_descriptor(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ return *IOP3XX_DMA_DAR(id);
+ case IOP3XX_AAU_ID:
+ return *IOP3XX_AAU_ADAR;
+ default:
+ BUG();
+ }
+ return 0;
+}
+
+static inline void iop_chan_set_next_descriptor(struct iop_adma_chan *chan,
+ u32 next_desc_addr)
+{
+ int id = chan->device->id;
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ *IOP3XX_DMA_NDAR(id) = next_desc_addr;
+ break;
+ case IOP3XX_AAU_ID:
+ *IOP3XX_AAU_ANDAR = next_desc_addr;
+ break;
+ }
+
+}
+
+#define IOP3XX_ADMA_STATUS_BUSY (1 << 10)
+#define IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT (1024)
+#define IOP_ADMA_XOR_MAX_BYTE_COUNT (16 * 1024 * 1024)
+
+static int iop_chan_is_busy(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+ int busy;
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ busy = (*IOP3XX_DMA_CSR(id) & IOP3XX_ADMA_STATUS_BUSY) ? 1 : 0;
+ break;
+ case IOP3XX_AAU_ID:
+ busy = (*IOP3XX_AAU_ASR & IOP3XX_ADMA_STATUS_BUSY) ? 1 : 0;
+ break;
+ default:
+ busy = 0;
+ BUG();
+ }
+
+ return busy;
+}
+
+static inline int iop_desc_is_aligned(struct iop_adma_desc_slot *desc,
+ int num_slots)
+{
+ /* num_slots will only ever be 1, 2, 4, or 8 */
+ return (desc->idx & (num_slots - 1)) ? 0 : 1;
+}
+
+/* to do: support large (i.e. > hw max) buffer sizes */
+static inline int iop_chan_memcpy_slot_count(size_t len, int *slots_per_op)
+{
+ *slots_per_op = 1;
+ return 1;
+}
+
+/* to do: support large (i.e. > hw max) buffer sizes */
+static inline int iop_chan_memset_slot_count(size_t len, int *slots_per_op)
+{
+ *slots_per_op = 1;
+ return 1;
+}
+
+static inline int iop3xx_aau_xor_slot_count(size_t len, int src_cnt,
+ int *slots_per_op)
+{
+ static const int slot_count_table[] = { 0,
+ 1, 1, 1, 1, /* 01 - 04 */
+ 2, 2, 2, 2, /* 05 - 08 */
+ 4, 4, 4, 4, /* 09 - 12 */
+ 4, 4, 4, 4, /* 13 - 16 */
+ 8, 8, 8, 8, /* 17 - 20 */
+ 8, 8, 8, 8, /* 21 - 24 */
+ 8, 8, 8, 8, /* 25 - 28 */
+ 8, 8, 8, 8, /* 29 - 32 */
+ };
+ *slots_per_op = slot_count_table[src_cnt];
+ return *slots_per_op;
+}
+
+static inline int iop_chan_xor_slot_count(size_t len, int src_cnt,
+ int *slots_per_op)
+{
+ int slot_cnt = iop3xx_aau_xor_slot_count(len, src_cnt, slots_per_op);
+
+ if (len <= IOP_ADMA_XOR_MAX_BYTE_COUNT)
+ return slot_cnt;
+
+ len -= IOP_ADMA_XOR_MAX_BYTE_COUNT;
+ while (len > IOP_ADMA_XOR_MAX_BYTE_COUNT) {
+ len -= IOP_ADMA_XOR_MAX_BYTE_COUNT;
+ slot_cnt += *slots_per_op;
+ }
+
+ if (len)
+ slot_cnt += *slots_per_op;
+
+ return slot_cnt;
+}
+
+/* zero sum on iop3xx is limited to 1k at a time so it requires multiple
+ * descriptors
+ */
+static inline int iop_chan_zero_sum_slot_count(size_t len, int src_cnt,
+ int *slots_per_op)
+{
+ int slot_cnt = iop3xx_aau_xor_slot_count(len, src_cnt, slots_per_op);
+
+ if (len <= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT)
+ return slot_cnt;
+
+ len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+ while (len > IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT) {
+ len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+ slot_cnt += *slots_per_op;
+ }
+
+ if (len)
+ slot_cnt += *slots_per_op;
+
+ return slot_cnt;
+}
+
+static inline u32 iop_desc_get_dest_addr(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ return hw_desc.dma->dest_addr;
+ case IOP3XX_AAU_ID:
+ return hw_desc.aau->dest_addr;
+ default:
+ BUG();
+ }
+ return 0;
+}
+
+static inline u32 iop_desc_get_byte_count(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ return hw_desc.dma->byte_count;
+ case IOP3XX_AAU_ID:
+ return hw_desc.aau->byte_count;
+ default:
+ BUG();
+ }
+ return 0;
+}
+
+static inline int iop3xx_src_edc_idx(int src_idx)
+{
+ static const int src_edc_idx_table[] = { 0, 0, 0, 0,
+ 0, 1, 2, 3,
+ 5, 6, 7, 8,
+ 9, 10, 11, 12,
+ 14, 15, 16, 17,
+ 18, 19, 20, 21,
+ 23, 24, 25, 26,
+ 27, 28, 29, 30,
+ };
+
+ return src_edc_idx_table[src_idx];
+}
+
+static inline u32 iop_desc_get_src_addr(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan,
+ int src_idx)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ return hw_desc.dma->src_addr;
+ case IOP3XX_AAU_ID:
+ break;
+ default:
+ BUG();
+ }
+
+ if (src_idx < 4)
+ return hw_desc.aau->src[src_idx];
+ else
+ return hw_desc.aau->src_edc[iop3xx_src_edc_idx(src_idx)].src_addr;
+}
+
+static inline void iop3xx_aau_desc_set_src_addr(struct iop3xx_desc_aau *hw_desc,
+ int src_idx, dma_addr_t addr)
+{
+ if (src_idx < 4)
+ hw_desc->src[src_idx] = addr;
+ else
+ hw_desc->src_edc[iop3xx_src_edc_idx(src_idx)].src_addr = addr;
+}
+
+static inline void iop_desc_init_memcpy(struct iop_adma_desc_slot *desc)
+{
+ struct iop3xx_desc_dma *hw_desc = desc->hw_desc;
+ union {
+ u32 value;
+ struct iop3xx_dma_desc_ctrl field;
+ } u_desc_ctrl;
+
+ desc->src_cnt = 1;
+ u_desc_ctrl.value = 0;
+ u_desc_ctrl.field.mem_to_mem_en = 1;
+ u_desc_ctrl.field.pci_transaction = 0xe; /* memory read block */
+ hw_desc->desc_ctrl = u_desc_ctrl.value;
+ hw_desc->upper_pci_src_addr = 0;
+ hw_desc->crc_addr = 0;
+ hw_desc->next_desc = 0;
+}
+
+static inline void iop_desc_init_memset(struct iop_adma_desc_slot *desc)
+{
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+ union {
+ u32 value;
+ struct iop3xx_aau_desc_ctrl field;
+ } u_desc_ctrl;
+
+ desc->src_cnt = 1;
+ u_desc_ctrl.value = 0;
+ u_desc_ctrl.field.blk1_cmd_ctrl = 0x2; /* memory block fill */
+ u_desc_ctrl.field.dest_write_en = 1;
+ hw_desc->desc_ctrl = u_desc_ctrl.value;
+ hw_desc->next_desc = 0;
+}
+
+static inline u32 iop3xx_desc_init_xor(struct iop3xx_desc_aau *hw_desc,
+ int src_cnt)
+{
+ int i, shift;
+ u32 edcr;
+ union {
+ u32 value;
+ struct iop3xx_aau_desc_ctrl field;
+ } u_desc_ctrl;
+
+ u_desc_ctrl.value = 0;
+ switch (src_cnt) {
+ case 25 ... 32:
+ u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+ edcr = 0;
+ shift = 1;
+ for (i = 24; i < src_cnt; i++) {
+ edcr |= (1 << shift);
+ shift += 3;
+ }
+ hw_desc->src_edc[IOP3XX_AAU_EDCR2_IDX].e_desc_ctrl = edcr;
+ src_cnt = 24;
+ /* fall through */
+ case 17 ... 24:
+ if (!u_desc_ctrl.field.blk_ctrl) {
+ hw_desc->src_edc[IOP3XX_AAU_EDCR2_IDX].e_desc_ctrl = 0;
+ u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+ }
+ edcr = 0;
+ shift = 1;
+ for (i = 16; i < src_cnt; i++) {
+ edcr |= (1 << shift);
+ shift += 3;
+ }
+ hw_desc->src_edc[IOP3XX_AAU_EDCR1_IDX].e_desc_ctrl = edcr;
+ src_cnt = 16;
+ /* fall through */
+ case 9 ... 16:
+ if (!u_desc_ctrl.field.blk_ctrl)
+ u_desc_ctrl.field.blk_ctrl = 0x2; /* use EDCR0 */
+ edcr = 0;
+ shift = 1;
+ for (i = 8; i < src_cnt; i++) {
+ edcr |= (1 << shift);
+ shift += 3;
+ }
+ hw_desc->src_edc[IOP3XX_AAU_EDCR0_IDX].e_desc_ctrl = edcr;
+ src_cnt = 8;
+ /* fall through */
+ case 2 ... 8:
+ shift = 1;
+ for (i = 0; i < src_cnt; i++) {
+ u_desc_ctrl.value |= (1 << shift);
+ shift += 3;
+ }
+
+ if (!u_desc_ctrl.field.blk_ctrl && src_cnt > 4)
+ u_desc_ctrl.field.blk_ctrl = 0x1; /* use mini-desc */
+ }
+
+ u_desc_ctrl.field.dest_write_en = 1;
+ u_desc_ctrl.field.blk1_cmd_ctrl = 0x7; /* direct fill */
+ hw_desc->desc_ctrl = u_desc_ctrl.value;
+ hw_desc->next_desc = 0;
+
+ return u_desc_ctrl.value;
+}
+
+static inline void iop_desc_init_xor(struct iop_adma_desc_slot *desc,
+ int src_cnt)
+{
+ desc->src_cnt = src_cnt;
+ iop3xx_desc_init_xor(desc->hw_desc, src_cnt);
+}
+
+/* return the number of operations */
+static inline int iop_desc_init_zero_sum(struct iop_adma_desc_slot *desc,
+ int src_cnt,
+ int slot_cnt,
+ int slots_per_op)
+{
+ struct iop3xx_desc_aau *hw_desc, *prev_hw_desc, *iter;
+ union {
+ u32 value;
+ struct iop3xx_aau_desc_ctrl field;
+ } u_desc_ctrl;
+ int i = 0, j = 0;
+ hw_desc = desc->hw_desc;
+ desc->src_cnt = src_cnt;
+
+ do {
+ iter = iop_hw_desc_slot_idx(hw_desc, i);
+ u_desc_ctrl.value = iop3xx_desc_init_xor(iter, src_cnt);
+ u_desc_ctrl.field.dest_write_en = 0;
+ u_desc_ctrl.field.zero_result_en = 1;
+ /* for the subsequent descriptors preserve the store queue
+ * and chain them together
+ */
+ if (i) {
+ prev_hw_desc = iop_hw_desc_slot_idx(hw_desc, i - slots_per_op);
+ prev_hw_desc->next_desc = (u32) (desc->phys + (i << 5));
+ }
+ iter->desc_ctrl = u_desc_ctrl.value;
+ slot_cnt -= slots_per_op;
+ i += slots_per_op;
+ j++;
+ } while (slot_cnt);
+
+ return j;
+}
+
+static inline void iop_desc_init_null_xor(struct iop_adma_desc_slot *desc,
+ int src_cnt)
+{
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+ union {
+ u32 value;
+ struct iop3xx_aau_desc_ctrl field;
+ } u_desc_ctrl;
+
+ u_desc_ctrl.value = 0;
+ switch (src_cnt) {
+ case 25 ... 32:
+ u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+ hw_desc->src_edc[IOP3XX_AAU_EDCR2_IDX].e_desc_ctrl = 0;
+ /* fall through */
+ case 17 ... 24:
+ if (!u_desc_ctrl.field.blk_ctrl) {
+ hw_desc->src_edc[IOP3XX_AAU_EDCR2_IDX].e_desc_ctrl = 0;
+ u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+ }
+ hw_desc->src_edc[IOP3XX_AAU_EDCR1_IDX].e_desc_ctrl = 0;
+ /* fall through */
+ case 9 ... 16:
+ if (!u_desc_ctrl.field.blk_ctrl)
+ u_desc_ctrl.field.blk_ctrl = 0x2; /* use EDCR0 */
+ hw_desc->src_edc[IOP3XX_AAU_EDCR0_IDX].e_desc_ctrl = 0;
+ /* fall through */
+ case 1 ... 8:
+ if (!u_desc_ctrl.field.blk_ctrl && src_cnt > 4)
+ u_desc_ctrl.field.blk_ctrl = 0x1; /* use mini-desc */
+ }
+
+ desc->src_cnt = src_cnt;
+ u_desc_ctrl.field.dest_write_en = 0;
+ hw_desc->desc_ctrl = u_desc_ctrl.value;
+ hw_desc->next_desc = 0;
+}
+
+static inline void iop_desc_set_byte_count(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan,
+ u32 byte_count)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ hw_desc.dma->byte_count = byte_count;
+ break;
+ case IOP3XX_AAU_ID:
+ hw_desc.aau->byte_count = byte_count;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline void iop_desc_set_zero_sum_byte_count(struct iop_adma_desc_slot *desc,
+ u32 len,
+ int slots_per_op)
+{
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc, *iter;
+ int i = 0;
+
+ if (len <= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT) {
+ hw_desc->byte_count = len;
+ } else {
+ do {
+ iter = iop_hw_desc_slot_idx(hw_desc, i);
+ iter->byte_count = IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+ len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+ i += slots_per_op;
+ } while (len > IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT);
+
+ if (len) {
+ iter = iop_hw_desc_slot_idx(hw_desc, i);
+ iter->byte_count = len;
+ }
+ }
+}
+
+static inline void iop_desc_set_dest_addr(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan,
+ dma_addr_t addr)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ hw_desc.dma->dest_addr = addr;
+ break;
+ case IOP3XX_AAU_ID:
+ hw_desc.aau->dest_addr = addr;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline void iop_desc_set_memcpy_src_addr(struct iop_adma_desc_slot *desc,
+ dma_addr_t addr, int slot_cnt,
+ int slots_per_op)
+{
+ struct iop3xx_desc_dma *hw_desc = desc->hw_desc;
+ hw_desc->src_addr = addr;
+}
+
+static inline void iop_desc_set_zero_sum_src_addr(struct iop_adma_desc_slot *desc,
+ int src_idx, dma_addr_t addr, int slot_cnt,
+ int slots_per_op)
+{
+
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc, *iter;
+ int i = 0;
+
+ do {
+ iter = iop_hw_desc_slot_idx(hw_desc, i);
+ iop3xx_aau_desc_set_src_addr(iter, src_idx, addr);
+ slot_cnt -= slots_per_op;
+ i += slots_per_op;
+ addr += IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+ } while (slot_cnt);
+}
+
+static inline void iop_desc_set_xor_src_addr(struct iop_adma_desc_slot *desc,
+ int src_idx, dma_addr_t addr, int slot_cnt,
+ int slots_per_op)
+{
+
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc, *iter;
+ int i = 0;
+
+ do {
+ iter = iop_hw_desc_slot_idx(hw_desc, i);
+ iop3xx_aau_desc_set_src_addr(iter, src_idx, addr);
+ slot_cnt -= slots_per_op;
+ i += slots_per_op;
+ addr += IOP_ADMA_XOR_MAX_BYTE_COUNT;
+ } while (slot_cnt);
+}
+
+static inline void iop_desc_set_next_desc(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan,
+ u32 next_desc_addr)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ BUG_ON(hw_desc.dma->next_desc);
+ hw_desc.dma->next_desc = next_desc_addr;
+ break;
+ case IOP3XX_AAU_ID:
+ BUG_ON(hw_desc.aau->next_desc);
+ hw_desc.aau->next_desc = next_desc_addr;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline u32 iop_desc_get_next_desc(struct iop_adma_desc_slot *desc,
+ struct iop_adma_chan *chan)
+{
+ union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+ switch (chan->device->id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ return hw_desc.dma->next_desc;
+ case IOP3XX_AAU_ID:
+ return hw_desc.aau->next_desc;
+ default:
+ BUG();
+ }
+
+ return 0;
+}
+
+static inline void iop_desc_set_block_fill_val(struct iop_adma_desc_slot *desc,
+ u32 val)
+{
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+ hw_desc->src[0] = val;
+}
+
+#ifndef CONFIG_ARCH_IOP32X
+static inline int iop_desc_get_zero_result(struct iop_adma_desc_slot *desc)
+{
+ struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+ struct iop3xx_aau_desc_ctrl desc_ctrl = hw_desc->desc_ctrl_field;
+
+ BUG_ON(!(desc_ctrl.tx_complete && desc_ctrl.zero_result_en));
+ return desc_ctrl.zero_result_err;
+}
+#else
+extern char iop32x_zero_result_buffer[PAGE_SIZE];
+static inline int iop_desc_get_zero_result(struct iop_adma_desc_slot *desc)
+{
+ int i;
+
+ consistent_sync(iop32x_zero_result_buffer,
+ sizeof(iop32x_zero_result_buffer),
+ DMA_FROM_DEVICE);
+
+ for (i = 0; i < sizeof(iop32x_zero_result_buffer)/sizeof(u32); i++)
+ if (((u32 *) iop32x_zero_result_buffer)[i])
+ return 1;
+ else if ((i & 0x7) == 0) /* prefetch the next cache line */
+ prefetch(((u32 *) iop32x_zero_result_buffer) + i + 8);
+
+ return 0;
+}
+#endif
+
+static inline void iop_chan_append(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+ /* drain write buffer so ADMA can see updated descriptor */
+ asm volatile ("mcr p15, 0, r1, c7, c10, 4" : : : "%r1");
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ *IOP3XX_DMA_CCR(id) |= 0x2;
+ break;
+ case IOP3XX_AAU_ID:
+ *IOP3XX_AAU_ACR |= 0x2;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline void iop_chan_clear_status(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+ u32 status;
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ status = *IOP3XX_DMA_CSR(id);
+ *IOP3XX_DMA_CSR(id) = status;
+ break;
+ case IOP3XX_AAU_ID:
+ status = *IOP3XX_AAU_ASR;
+ *IOP3XX_AAU_ASR = status;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline u32 iop_chan_get_status(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ return *IOP3XX_DMA_CSR(id);
+ case IOP3XX_AAU_ID:
+ return *IOP3XX_AAU_ASR;
+ default:
+ BUG();
+ }
+}
+
+static inline void iop_chan_disable(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ *IOP3XX_DMA_CCR(id) &= ~0x1;
+ break;
+ case IOP3XX_AAU_ID:
+ *IOP3XX_AAU_ACR &= ~0x1;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline void iop_chan_enable(struct iop_adma_chan *chan)
+{
+ int id = chan->device->id;
+
+ /* drain write buffer */
+ asm volatile ("mcr p15, 0, r1, c7, c10, 4" : : : "%r1");
+
+ switch (id) {
+ case IOP3XX_DMA0_ID:
+ case IOP3XX_DMA1_ID:
+ *IOP3XX_DMA_CCR(id) |= 0x1;
+ break;
+ case IOP3XX_AAU_ID:
+ *IOP3XX_AAU_ACR |= 0x1;
+ break;
+ default:
+ BUG();
+ }
+}
+
+static inline void iop_raid5_dma_chan_request(struct dma_client *client)
+{
+ dma_async_client_chan_request(client, 2, DMA_MEMCPY);
+ dma_async_client_chan_request(client, 1, DMA_XOR | DMA_ZERO_SUM);
+}
+
+static inline struct dma_chan *iop_raid5_dma_next_channel(struct dma_client *client)
+{
+ static struct dma_chan_client_ref *chan_ref = NULL;
+ static int req_idx = -1;
+ static struct dma_req *req[2];
+
+ if (unlikely(req_idx < 0)) {
+ req[0] = &client->req[0];
+ req[1] = &client->req[1];
+ }
+
+ if (++req_idx > 1)
+ req_idx = 0;
+
+ spin_lock(&client->lock);
+ if (unlikely(list_empty(&req[req_idx]->channels)))
+ chan_ref = NULL;
+ else if (!chan_ref || chan_ref->req_node.next == &req[req_idx]->channels)
+ chan_ref = list_entry(req[req_idx]->channels.next, typeof(*chan_ref),
+ req_node);
+ else
+ chan_ref = list_entry(chan_ref->req_node.next,
+ typeof(*chan_ref), req_node);
+ spin_unlock(&client->lock);
+
+ return chan_ref ? chan_ref->chan : NULL;
+}
+
+static inline struct dma_chan *iop_raid5_dma_check_channel(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ struct dma_client *client,
+ unsigned long capabilities)
+{
+ struct dma_chan_client_ref *chan_ref;
+
+ if ((chan->device->capabilities & capabilities) == capabilities)
+ return chan;
+ else if (dma_async_operation_complete(chan,
+ cookie,
+ NULL,
+ NULL) == DMA_SUCCESS) {
+ /* dma channels on req[0] */
+ if (capabilities & (DMA_MEMCPY | DMA_MEMCPY_CRC32C))
+ chan_ref = list_entry(client->req[0].channels.next,
+ typeof(*chan_ref),
+ req_node);
+ /* aau channel on req[1] */
+ else
+ chan_ref = list_entry(client->req[1].channels.next,
+ typeof(*chan_ref),
+ req_node);
+ /* switch to the new channel */
+ dma_chan_put(chan);
+ dma_chan_get(chan_ref->chan);
+
+ return chan_ref->chan;
+ } else
+ return NULL;
+}
+#endif /* _IOP3XX_ADMA_H */
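
As a rough illustration of how the iop-adma driver is expected to use these
helpers to build and launch one hardware XOR operation; iop_chan_xor_slot_count()
and the slot allocation step are assumed (they are not part of this header
excerpt), and 'len', 'src_cnt', 'dest', 'src[]', 'slot', 'prev_slot' and 'chan'
stand in for driver-internal state:

	/* illustration only: build one XOR descriptor and kick the channel */
	int slots_per_op, slot_cnt, i;

	slot_cnt = iop_chan_xor_slot_count(len, src_cnt, &slots_per_op);
	/* ... allocate slot_cnt contiguous descriptor slots -> 'slot' ... */

	iop_desc_init_xor(slot, src_cnt);
	iop_desc_set_byte_count(slot, chan, len);
	iop_desc_set_dest_addr(slot, chan, dest);
	for (i = 0; i < src_cnt; i++)
		iop_desc_set_xor_src_addr(slot, i, src[i], slot_cnt,
					  slots_per_op);

	/* chain it after the previous descriptor and start the engine */
	iop_desc_set_next_desc(prev_slot, chan, slot->phys);
	iop_chan_append(chan);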

2006-09-11 23:24:00

by Dan Williams

[permalink] [raw]
Subject: [PATCH 07/19] raid5: remove compute_block and compute_parity5

From: Dan Williams <[email protected]>

compute_block() and compute_parity5() have been replaced by the workqueue
implementation (raid5_do_soft_block_ops), so remove them.

Signed-off-by: Dan Williams <[email protected]>
---

drivers/md/raid5.c | 123 ----------------------------------------------------
1 files changed, 0 insertions(+), 123 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a07b52b..ad6883b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -964,129 +964,6 @@ #define check_xor() do { \
} while(0)


-static void compute_block(struct stripe_head *sh, int dd_idx)
-{
- int i, count, disks = sh->disks;
- void *ptr[MAX_XOR_BLOCKS], *p;
-
- PRINTK("compute_block, stripe %llu, idx %d\n",
- (unsigned long long)sh->sector, dd_idx);
-
- ptr[0] = page_address(sh->dev[dd_idx].page);
- memset(ptr[0], 0, STRIPE_SIZE);
- count = 1;
- for (i = disks ; i--; ) {
- if (i == dd_idx)
- continue;
- p = page_address(sh->dev[i].page);
- if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
- ptr[count++] = p;
- else
- printk(KERN_ERR "compute_block() %d, stripe %llu, %d"
- " not present\n", dd_idx,
- (unsigned long long)sh->sector, i);
-
- check_xor();
- }
- if (count != 1)
- xor_block(count, STRIPE_SIZE, ptr);
- set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
-}
-
-static void compute_parity5(struct stripe_head *sh, int method)
-{
- raid5_conf_t *conf = sh->raid_conf;
- int i, pd_idx = sh->pd_idx, disks = sh->disks, count;
- void *ptr[MAX_XOR_BLOCKS];
- struct bio *chosen;
-
- PRINTK("compute_parity5, stripe %llu, method %d\n",
- (unsigned long long)sh->sector, method);
-
- count = 1;
- ptr[0] = page_address(sh->dev[pd_idx].page);
- switch(method) {
- case READ_MODIFY_WRITE:
- BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags));
- for (i=disks ; i-- ;) {
- if (i==pd_idx)
- continue;
- if (sh->dev[i].towrite &&
- test_bit(R5_UPTODATE, &sh->dev[i].flags)) {
- ptr[count++] = page_address(sh->dev[i].page);
- chosen = sh->dev[i].towrite;
- sh->dev[i].towrite = NULL;
-
- if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
- wake_up(&conf->wait_for_overlap);
-
- BUG_ON(sh->dev[i].written);
- sh->dev[i].written = chosen;
- check_xor();
- }
- }
- break;
- case RECONSTRUCT_WRITE:
- memset(ptr[0], 0, STRIPE_SIZE);
- for (i= disks; i-- ;)
- if (i!=pd_idx && sh->dev[i].towrite) {
- chosen = sh->dev[i].towrite;
- sh->dev[i].towrite = NULL;
-
- if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
- wake_up(&conf->wait_for_overlap);
-
- BUG_ON(sh->dev[i].written);
- sh->dev[i].written = chosen;
- }
- break;
- case CHECK_PARITY:
- break;
- }
- if (count>1) {
- xor_block(count, STRIPE_SIZE, ptr);
- count = 1;
- }
-
- for (i = disks; i--;)
- if (sh->dev[i].written) {
- sector_t sector = sh->dev[i].sector;
- struct bio *wbi = sh->dev[i].written;
- while (wbi && wbi->bi_sector < sector + STRIPE_SECTORS) {
- copy_data(1, wbi, sh->dev[i].page, sector);
- wbi = r5_next_bio(wbi, sector);
- }
-
- set_bit(R5_LOCKED, &sh->dev[i].flags);
- set_bit(R5_UPTODATE, &sh->dev[i].flags);
- }
-
- switch(method) {
- case RECONSTRUCT_WRITE:
- case CHECK_PARITY:
- for (i=disks; i--;)
- if (i != pd_idx) {
- ptr[count++] = page_address(sh->dev[i].page);
- check_xor();
- }
- break;
- case READ_MODIFY_WRITE:
- for (i = disks; i--;)
- if (sh->dev[i].written) {
- ptr[count++] = page_address(sh->dev[i].page);
- check_xor();
- }
- }
- if (count != 1)
- xor_block(count, STRIPE_SIZE, ptr);
-
- if (method != CHECK_PARITY) {
- set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
- set_bit(R5_LOCKED, &sh->dev[pd_idx].flags);
- } else
- clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
-}
-
static void compute_parity6(struct stripe_head *sh, int method)
{
raid6_conf_t *conf = sh->raid_conf;

2006-09-11 23:23:54

by Dan Williams

[permalink] [raw]
Subject: [PATCH 02/19] raid5: move write operations to a workqueue

From: Dan Williams <[email protected]>

Enable handle_stripe5 to pass off write operations to
raid5_do_soft_block_ops (which can be run from a workqueue). The operations
moved are reconstruct-writes and read-modify-writes formerly handled by
compute_parity5.
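
As a condensed sketch of the resulting control flow in handle_stripe5,
taken from the hunks below (expansion and error paths omitted):

	/* start a write only when no write/check operation is in flight */
	if (to_write && !test_bit(STRIPE_OP_RCW, &sh->state) &&
	    !test_bit(STRIPE_OP_RMW, &sh->state) &&
	    !test_bit(STRIPE_OP_CHECK, &sh->state)) {
		/* ... count rmw/rcw candidates as before ... */
		if (locked == 0 && (rcw == 0 || rmw == 0) &&
		    !test_bit(STRIPE_BIT_DELAY, &sh->state))
			locked += handle_write_operations5(sh, rcw);
	}

	/* a later pass notices completion and schedules the disk writes */
	if (test_bit(STRIPE_OP_RCW, &sh->state) &&
	    test_bit(STRIPE_OP_RCW_Done, &sh->ops.state)) {
		/* clear the op bits, then set R5_Wantwrite on the parity
		 * block and all written blocks
		 */
	}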

Changelog:
* moved raid5_do_soft_block_ops changes into a separate patch
* changed handle_write_operations5 to only initiate write operations, which
prevents new writes from being requested while the current one is in flight
* all blocks undergoing a write are now marked locked and !uptodate at the
beginning of the write operation
* blocks undergoing a read-modify-write need a request flag to distinguish
them from blocks that are locked for reading. Reconstruct-writes still use
the R5_LOCKED bit to select blocks for the operation
* integrated the work queue Kconfig option

Signed-off-by: Dan Williams <[email protected]>
---

drivers/md/Kconfig | 21 +++++
drivers/md/raid5.c | 192 ++++++++++++++++++++++++++++++++++++++------
include/linux/raid/raid5.h | 3 +
3 files changed, 190 insertions(+), 26 deletions(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index bf869ed..2a16b3b 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -162,6 +162,27 @@ config MD_RAID5_RESHAPE
There should be enough spares already present to make the new
array workable.

+config MD_RAID456_WORKQUEUE
+ depends on MD_RAID456
+ bool "Offload raid work to a workqueue from raid5d"
+ ---help---
+ This option enables raid work (block copy and xor operations)
+ to run in a workqueue. If your platform has a high context
+ switch penalty, say N. If you are using hardware offload or
+ are running on an SMP platform, say Y.
+
+ If unsure, say Y.
+
+config MD_RAID456_WORKQUEUE_MULTITHREAD
+ depends on MD_RAID456_WORKQUEUE && SMP
+ bool "Enable multi-threaded raid processing"
+ default y
+ ---help---
+ This option controls whether the raid workqueue will be multi-
+ threaded or single threaded.
+
+ If unsure, say Y.
+
config MD_MULTIPATH
tristate "Multipath I/O support"
depends on BLK_DEV_MD
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8fde62b..e39d248 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -222,6 +222,8 @@ static void init_stripe(struct stripe_he

BUG_ON(atomic_read(&sh->count) != 0);
BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
+ BUG_ON(sh->ops.state);
+ BUG_ON(sh->ops.pending);

CHECK_DEVLOCK();
PRINTK("init_stripe called, stripe %llu\n",
@@ -331,6 +333,9 @@ static int grow_one_stripe(raid5_conf_t
memset(sh, 0, sizeof(*sh) + (conf->raid_disks-1)*sizeof(struct r5dev));
sh->raid_conf = conf;
spin_lock_init(&sh->lock);
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ INIT_WORK(&sh->ops.work, conf->do_block_ops, sh);
+ #endif

if (grow_buffers(sh, conf->raid_disks)) {
shrink_buffers(sh, conf->raid_disks);
@@ -1266,7 +1271,72 @@ static void compute_block_2(struct strip
}
}

+static int handle_write_operations5(struct stripe_head *sh, int rcw)
+{
+ int i, pd_idx = sh->pd_idx, disks = sh->disks;
+ int locked=0;
+
+ if (rcw == 0) {
+ /* skip the drain operation on an expand */
+ if (test_bit(STRIPE_OP_RCW_Expand, &sh->ops.state)) {
+ set_bit(STRIPE_OP_RCW, &sh->state);
+ set_bit(STRIPE_OP_RCW_Parity, &sh->ops.state);
+ for (i=disks ; i-- ;) {
+ set_bit(R5_LOCKED, &sh->dev[i].flags);
+ locked++;
+ }
+ } else { /* enter stage 1 of reconstruct write operation */
+ set_bit(STRIPE_OP_RCW, &sh->state);
+ set_bit(STRIPE_OP_RCW_Drain, &sh->ops.state);
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+
+ if (dev->towrite) {
+ set_bit(R5_LOCKED, &dev->flags);
+ clear_bit(R5_UPTODATE, &dev->flags);
+ locked++;
+ }
+ }
+ }
+ } else {
+ /* enter stage 1 of read modify write operation */
+ BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags));
+
+ set_bit(STRIPE_OP_RMW, &sh->state);
+ set_bit(STRIPE_OP_RMW_ParityPre, &sh->ops.state);
+ for (i=disks ; i-- ;) {
+ struct r5dev *dev = &sh->dev[i];
+ if (i==pd_idx)
+ continue;
+
+ /* For a read-modify write there may be blocks that are
+ * locked for reading while others are ready to be written
+ * so we distinguish these blocks by the RMWReq bit
+ */
+ if (dev->towrite &&
+ test_bit(R5_UPTODATE, &dev->flags)) {
+ set_bit(R5_RMWReq, &dev->flags);
+ set_bit(R5_LOCKED, &dev->flags);
+ clear_bit(R5_UPTODATE, &dev->flags);
+ locked++;
+ }
+ }
+ }
+
+ /* keep the parity disk locked while asynchronous operations
+ * are in flight
+ */
+ set_bit(R5_LOCKED, &sh->dev[pd_idx].flags);
+ clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
+ locked++;
+ sh->ops.pending++;

+ PRINTK("%s: stripe %llu locked: %d op_state: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ locked, sh->ops.state);
+
+ return locked;
+}

/*
* Each stripe/dev can have one or more bion attached.
@@ -1664,7 +1734,6 @@ static void raid5_do_soft_block_ops(void
* schedule a write of some buffers
* return confirmation of parity correctness
*
- * Parity calculations are done inside the stripe lock
* buffers are taken off read_list or write_list, and bh_cache buffers
* get BH_Lock set before the stripe lock is released.
*
@@ -1679,13 +1748,13 @@ static void handle_stripe5(struct stripe
int i;
int syncing, expanding, expanded;
int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
- int non_overwrite = 0;
+ int non_overwrite=0, write_complete=0;
int failed_num=0;
struct r5dev *dev;

- PRINTK("handling stripe %llu, cnt=%d, pd_idx=%d\n",
- (unsigned long long)sh->sector, atomic_read(&sh->count),
- sh->pd_idx);
+ PRINTK("handling stripe %llu, state=%#lx cnt=%d, pd_idx=%d\n",
+ (unsigned long long)sh->sector, sh->state, atomic_read(&sh->count),
+ sh->pd_idx);

spin_lock(&sh->lock);
clear_bit(STRIPE_HANDLE, &sh->state);
@@ -1926,8 +1995,56 @@ #endif
set_bit(STRIPE_HANDLE, &sh->state);
}

- /* now to consider writing and what else, if anything should be read */
- if (to_write) {
+ /* Now we check to see if any write operations have recently
+ * completed
+ */
+ if (test_bit(STRIPE_OP_RCW, &sh->state) &&
+ test_bit(STRIPE_OP_RCW_Done, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_RCW, &sh->state);
+ clear_bit(STRIPE_OP_RCW_Done, &sh->ops.state);
+ write_complete++;
+ }
+
+ if (test_bit(STRIPE_OP_RMW, &sh->state) &&
+ test_bit(STRIPE_OP_RMW_Done, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_RMW, &sh->state);
+ clear_bit(STRIPE_OP_RMW_Done, &sh->ops.state);
+ BUG_ON(++write_complete > 1);
+ for (i=disks; i--;)
+ clear_bit(R5_RMWReq, &sh->dev[i].flags);
+ }
+
+ /* All the 'written' buffers and the parity block are ready to be
+ * written back to disk
+ */
+ if (write_complete) {
+ BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags));
+ for (i=disks; i--;) {
+ dev = &sh->dev[i];
+ if (test_bit(R5_LOCKED, &dev->flags) &&
+ (i == sh->pd_idx || dev->written)) {
+ PRINTK("Writing block %d\n", i);
+ set_bit(R5_Wantwrite, &dev->flags);
+ if (!test_bit(R5_Insync, &dev->flags)
+ || (i==sh->pd_idx && failed == 0))
+ set_bit(STRIPE_INSYNC, &sh->state);
+ }
+ }
+ if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
+ atomic_dec(&conf->preread_active_stripes);
+ if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
+ md_wakeup_thread(conf->mddev->thread);
+ }
+ }
+
+ /* 1/ Now to consider new write requests and what else, if anything should be read
+ * 2/ Check operations clobber the parity block so do not start new writes while
+ * a check is in flight
+ * 3/ Write operations do not stack
+ */
+ if (to_write && !test_bit(STRIPE_OP_RCW, &sh->state) &&
+ !test_bit(STRIPE_OP_RMW, &sh->state) &&
+ !test_bit(STRIPE_OP_CHECK, &sh->state)) {
int rmw=0, rcw=0;
for (i=disks ; i--;) {
/* would I have to read this buffer for read_modify_write */
@@ -2000,25 +2117,8 @@ #endif
}
/* now if nothing is locked, and if we have enough data, we can start a write request */
if (locked == 0 && (rcw == 0 ||rmw == 0) &&
- !test_bit(STRIPE_BIT_DELAY, &sh->state)) {
- PRINTK("Computing parity...\n");
- compute_parity5(sh, rcw==0 ? RECONSTRUCT_WRITE : READ_MODIFY_WRITE);
- /* now every locked buffer is ready to be written */
- for (i=disks; i--;)
- if (test_bit(R5_LOCKED, &sh->dev[i].flags)) {
- PRINTK("Writing block %d\n", i);
- locked++;
- set_bit(R5_Wantwrite, &sh->dev[i].flags);
- if (!test_bit(R5_Insync, &sh->dev[i].flags)
- || (i==sh->pd_idx && failed == 0))
- set_bit(STRIPE_INSYNC, &sh->state);
- }
- if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
- atomic_dec(&conf->preread_active_stripes);
- if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
- }
- }
+ !test_bit(STRIPE_BIT_DELAY, &sh->state))
+ locked += handle_write_operations5(sh, rcw);
}

/* maybe we need to check and possibly fix the parity for this stripe
@@ -2150,8 +2250,17 @@ #endif
}
}

+ queue_raid_work(sh);
+
spin_unlock(&sh->lock);

+ #ifndef CONFIG_MD_RAID456_WORKQUEUE
+ while (test_bit(STRIPE_OP_QUEUED, &sh->state)) {
+ PRINTK("run do_block_ops\n");
+ conf->do_block_ops(sh);
+ }
+ #endif
+
while ((bi=return_bi)) {
int bytes = bi->bi_size;

@@ -3439,6 +3548,30 @@ static int run(mddev_t *mddev)
if (!conf->spare_page)
goto abort;
}
+
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ sprintf(conf->workqueue_name, "%s_raid5_ops",
+ mddev->gendisk->disk_name);
+
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE_MULTITHREAD
+ if ((conf->block_ops_queue = create_workqueue(conf->workqueue_name))
+ == NULL)
+ goto abort;
+ #else
+ if ((conf->block_ops_queue = create_singlethread_workqueue(
+ conf->workqueue_name)) == NULL)
+ goto abort;
+ #endif
+ #endif
+
+ /* To Do:
+ * 1/ Offload to asynchronous copy / xor engines
+ * 2/ Automated selection of optimal do_block_ops
+ * routine similar to the xor template selection
+ */
+ conf->do_block_ops = raid5_do_soft_block_ops;
+
+
spin_lock_init(&conf->device_lock);
init_waitqueue_head(&conf->wait_for_stripe);
init_waitqueue_head(&conf->wait_for_overlap);
@@ -3598,6 +3731,10 @@ abort:
safe_put_page(conf->spare_page);
kfree(conf->disks);
kfree(conf->stripe_hashtbl);
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ if (conf->do_block_ops)
+ destroy_workqueue(conf->block_ops_queue);
+ #endif
kfree(conf);
}
mddev->private = NULL;
@@ -3618,6 +3755,9 @@ static int stop(mddev_t *mddev)
blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
sysfs_remove_group(&mddev->kobj, &raid5_attrs_group);
kfree(conf->disks);
+ #ifdef CONFIG_MD_RAID456_WORKQUEUE
+ destroy_workqueue(conf->block_ops_queue);
+ #endif
kfree(conf);
mddev->private = NULL;
return 0;
diff --git a/include/linux/raid/raid5.h b/include/linux/raid/raid5.h
index c8a315b..31ae55c 100644
--- a/include/linux/raid/raid5.h
+++ b/include/linux/raid/raid5.h
@@ -3,6 +3,7 @@ #define _RAID5_H

#include <linux/raid/md.h>
#include <linux/raid/xor.h>
+#include <linux/workqueue.h>

/*
*
@@ -333,6 +334,7 @@ struct raid5_private_data {
atomic_t preread_active_stripes; /* stripes with scheduled io */

atomic_t reshape_stripes; /* stripes with pending writes for reshape */
+
#ifdef CONFIG_MD_RAID456_WORKQUEUE
struct workqueue_struct *block_ops_queue;
#endif
@@ -376,6 +378,7 @@ struct raid5_private_data {
typedef struct raid5_private_data raid5_conf_t;

#define mddev_to_conf(mddev) ((raid5_conf_t *) mddev->private)
+
/* must be called under the stripe lock */
static inline void queue_raid_work(struct stripe_head *sh)
{
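
For reference, the two execution models this patch supports, condensed from
the hunks above; the queue_work() call is an assumption about what
queue_raid_work() does, since its body is not part of this hunk:

	/* CONFIG_MD_RAID456_WORKQUEUE=y: hand the stripe to the per-array
	 * workqueue created in run(); the workqueue thread then invokes
	 * conf->do_block_ops(sh), i.e. raid5_do_soft_block_ops for now.
	 */
	queue_work(conf->block_ops_queue, &sh->ops.work);

	/* CONFIG_MD_RAID456_WORKQUEUE=n: handle_stripe5 runs the queued
	 * operations synchronously after dropping the stripe lock.
	 */
	while (test_bit(STRIPE_OP_QUEUED, &sh->state))
		conf->do_block_ops(sh);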

2006-09-11 23:18:38

by Dan Williams

[permalink] [raw]
Subject: [PATCH 11/19] dmaengine: add memset as an asynchronous dma operation

From: Dan Williams <[email protected]>

Changelog:
* make the dmaengine api EXPORT_SYMBOL_GPL
* zero sum support should be standalone, not integrated into xor

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 15 ++++++++++
drivers/dma/ioatdma.c | 5 +++
include/linux/dmaengine.h | 68 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index e78ce89..fe62237 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -604,6 +604,17 @@ dma_cookie_t dma_async_do_xor_err(struct
return -ENXIO;
}

+/**
+ * dma_async_do_memset_err - default function for dma devices that
+ * do not support memset
+ */
+dma_cookie_t dma_async_do_memset_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ int val, size_t len, unsigned long flags)
+{
+ return -ENXIO;
+}
+
static int __init dma_bus_init(void)
{
mutex_init(&dma_list_mutex);
@@ -621,6 +632,9 @@ EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to
EXPORT_SYMBOL_GPL(dma_async_memcpy_dma_to_dma);
EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to_dma);
EXPORT_SYMBOL_GPL(dma_async_memcpy_dma_to_pg);
+EXPORT_SYMBOL_GPL(dma_async_memset_buf);
+EXPORT_SYMBOL_GPL(dma_async_memset_page);
+EXPORT_SYMBOL_GPL(dma_async_memset_dma);
EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg);
EXPORT_SYMBOL_GPL(dma_async_xor_dma_list_to_dma);
EXPORT_SYMBOL_GPL(dma_async_operation_complete);
@@ -629,6 +643,7 @@ EXPORT_SYMBOL_GPL(dma_async_device_regis
EXPORT_SYMBOL_GPL(dma_async_device_unregister);
EXPORT_SYMBOL_GPL(dma_chan_cleanup);
EXPORT_SYMBOL_GPL(dma_async_do_xor_err);
+EXPORT_SYMBOL_GPL(dma_async_do_memset_err);
EXPORT_SYMBOL_GPL(dma_async_chan_init);
EXPORT_SYMBOL_GPL(dma_async_map_page);
EXPORT_SYMBOL_GPL(dma_async_map_single);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index 0159d14..231247c 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -637,6 +637,10 @@ extern dma_cookie_t dma_async_do_xor_err
union dmaengine_addr src, unsigned int src_cnt,
unsigned int src_off, size_t len, unsigned long flags);

+extern dma_cookie_t dma_async_do_memset_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ int val, size_t size, unsigned long flags);
+
static dma_addr_t ioat_map_page(struct dma_chan *chan, struct page *page,
unsigned long offset, size_t size,
int direction)
@@ -748,6 +752,7 @@ #endif
device->common.capabilities = DMA_MEMCPY;
device->common.device_do_dma_memcpy = do_ioat_dma_memcpy;
device->common.device_do_dma_xor = dma_async_do_xor_err;
+ device->common.device_do_dma_memset = dma_async_do_memset_err;
device->common.map_page = ioat_map_page;
device->common.map_single = ioat_map_single;
device->common.unmap_page = ioat_unmap_page;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index cb4cfcf..8d53b08 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -260,6 +260,7 @@ struct dma_chan_client_ref {
* @device_issue_pending: push appended descriptors to hardware
* @device_do_dma_memcpy: perform memcpy with a dma engine
* @device_do_dma_xor: perform block xor with a dma engine
+ * @device_do_dma_memset: perform block fill with a dma engine
*/
struct dma_device {

@@ -284,6 +285,9 @@ struct dma_device {
union dmaengine_addr src, unsigned int src_cnt,
unsigned int src_off, size_t len,
unsigned long flags);
+ dma_cookie_t (*device_do_dma_memset)(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ int value, size_t len, unsigned long flags);
enum dma_status (*device_operation_complete)(struct dma_chan *chan,
dma_cookie_t cookie, dma_cookie_t *last,
dma_cookie_t *used);
@@ -478,6 +482,70 @@ static inline dma_cookie_t dma_async_mem
}

/**
+ * dma_async_memset_buf - offloaded memset
+ * @chan: DMA channel to offload memset to
+ * @buf: destination buffer
+ * @val: value to initialize the buffer
+ * @len: length
+ */
+static inline dma_cookie_t dma_async_memset_buf(struct dma_chan *chan,
+ void *buf, int val, size_t len)
+{
+ unsigned long flags = DMA_DEST_BUF;
+ union dmaengine_addr dest_addr = { .buf = buf };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+ per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_memset(chan, dest_addr, 0, val,
+ len, flags);
+}
+
+/**
+ * dma_async_memset_page - offloaded memset
+ * @chan: DMA channel to offload memset to
+ * @page: destination page
+ * @offset: offset into the destination
+ * @val: value to initialize the buffer
+ * @len: length
+ */
+static inline dma_cookie_t dma_async_memset_page(struct dma_chan *chan,
+ struct page *page, unsigned int offset, int val, size_t len)
+{
+ unsigned long flags = DMA_DEST_PAGE;
+ union dmaengine_addr dest_addr = { .pg = page };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+ per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_memset(chan, dest_addr, offset, val,
+ len, flags);
+}
+
+/**
+ * dma_async_memset_dma - offloaded memset
+ * @chan: DMA channel to offload memset to
+ * @page: destination dma address
+ * @val: value to initialize the buffer
+ * @len: length
+ */
+static inline dma_cookie_t dma_async_memset_dma(struct dma_chan *chan,
+ dma_addr_t dma, int val, size_t len)
+{
+ unsigned long flags = DMA_DEST_DMA;
+ union dmaengine_addr dest_addr = { .dma = dma };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+ per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_memset(chan, dest_addr, 0, val,
+ len, flags);
+}
+
+/**
* dma_async_xor_pgs_to_pg - offloaded xor from pages to page
* @chan: DMA channel to offload xor to
* @dest_page: destination page
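
A minimal usage sketch for the new memset operation; the channel setup and
the surrounding function are assumed, and 'chan' and 'page' stand in for
caller state:

	dma_cookie_t cookie;

	/* zero one page asynchronously; an engine without memset support
	 * returns -ENXIO via dma_async_do_memset_err
	 */
	cookie = dma_async_memset_page(chan, page, 0, 0, PAGE_SIZE);
	if (cookie < 0)
		return cookie; /* no memset support on this channel */

	dma_async_issue_pending(chan);
	while (dma_async_operation_complete(chan, cookie, NULL, NULL)
			!= DMA_SUCCESS)
		cpu_relax();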

2006-09-11 23:22:24

by Dan Williams

[permalink] [raw]
Subject: [PATCH 13/19] dmaengine: add support for dma xor zero sum operations

From: Dan Williams <[email protected]>

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 15 ++++++++++++
drivers/dma/ioatdma.c | 6 +++++
include/linux/dmaengine.h | 56 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 33ad690..190c612 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -617,6 +617,18 @@ dma_cookie_t dma_async_do_xor_err(struct
}

/**
+ * dma_async_do_zero_sum_err - default function for dma devices that
+ * do not support xor zero sum
+ */
+dma_cookie_t dma_async_do_zero_sum_err(struct dma_chan *chan,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, u32 *result,
+ unsigned long flags)
+{
+ return -ENXIO;
+}
+
+/**
* dma_async_do_memset_err - default function for dma devices that
* do not support memset
*/
@@ -649,6 +661,8 @@ EXPORT_SYMBOL_GPL(dma_async_memset_page)
EXPORT_SYMBOL_GPL(dma_async_memset_dma);
EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg);
EXPORT_SYMBOL_GPL(dma_async_xor_dma_list_to_dma);
+EXPORT_SYMBOL_GPL(dma_async_zero_sum_pgs);
+EXPORT_SYMBOL_GPL(dma_async_zero_sum_dma_list);
EXPORT_SYMBOL_GPL(dma_async_operation_complete);
EXPORT_SYMBOL_GPL(dma_async_issue_pending);
EXPORT_SYMBOL_GPL(dma_async_device_register);
@@ -656,6 +670,7 @@ EXPORT_SYMBOL_GPL(dma_async_device_unreg
EXPORT_SYMBOL_GPL(dma_chan_cleanup);
EXPORT_SYMBOL_GPL(dma_async_do_memcpy_err);
EXPORT_SYMBOL_GPL(dma_async_do_xor_err);
+EXPORT_SYMBOL_GPL(dma_async_do_zero_sum_err);
EXPORT_SYMBOL_GPL(dma_async_do_memset_err);
EXPORT_SYMBOL_GPL(dma_async_chan_init);
EXPORT_SYMBOL_GPL(dma_async_map_page);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index 231247c..4e90b02 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -637,6 +637,11 @@ extern dma_cookie_t dma_async_do_xor_err
union dmaengine_addr src, unsigned int src_cnt,
unsigned int src_off, size_t len, unsigned long flags);

+extern dma_cookie_t dma_async_do_zero_sum_err(struct dma_chan *chan,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, u32 *result,
+ unsigned long flags);
+
extern dma_cookie_t dma_async_do_memset_err(struct dma_chan *chan,
union dmaengine_addr dest, unsigned int dest_off,
int val, size_t size, unsigned long flags);
@@ -752,6 +757,7 @@ #endif
device->common.capabilities = DMA_MEMCPY;
device->common.device_do_dma_memcpy = do_ioat_dma_memcpy;
device->common.device_do_dma_xor = dma_async_do_xor_err;
+ device->common.device_do_dma_zero_sum = dma_async_do_zero_sum_err;
device->common.device_do_dma_memset = dma_async_do_memset_err;
device->common.map_page = ioat_map_page;
device->common.map_single = ioat_map_single;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 8d53b08..9fd6cbd 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -260,6 +260,7 @@ struct dma_chan_client_ref {
* @device_issue_pending: push appended descriptors to hardware
* @device_do_dma_memcpy: perform memcpy with a dma engine
* @device_do_dma_xor: perform block xor with a dma engine
+ * @device_do_dma_zero_sum: perform block xor zero sum with a dma engine
* @device_do_dma_memset: perform block fill with a dma engine
*/
struct dma_device {
@@ -285,6 +286,10 @@ struct dma_device {
union dmaengine_addr src, unsigned int src_cnt,
unsigned int src_off, size_t len,
unsigned long flags);
+ dma_cookie_t (*device_do_dma_zero_sum)(struct dma_chan *chan,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, u32 *result,
+ unsigned long flags);
dma_cookie_t (*device_do_dma_memset)(struct dma_chan *chan,
union dmaengine_addr dest, unsigned int dest_off,
int value, size_t len, unsigned long flags);
@@ -601,6 +606,57 @@ static inline dma_cookie_t dma_async_xor
}

/**
+ * dma_async_zero_sum_pgs - offloaded xor zero sum from a list of pages
+ * @chan: DMA channel to offload zero sum to
+ * @src_pgs: array of source pages
+ * @src_cnt: number of source pages
+ * @src_off: offset in pages to xor from
+ * @len: length
+ * @result: set to 1 if sum is zero else 0
+ *
+ * The @src_pgs/@src_off pages must be mappable to a bus address according
+ * to the DMA mapping API rules for streaming mappings, and must stay
+ * memory resident for the duration of the operation
+ * (kernel memory or locked user space pages)
+ */
+static inline dma_cookie_t dma_async_zero_sum_pgs(struct dma_chan *chan,
+ struct page **src_pgs, unsigned int src_cnt, unsigned int src_off,
+ size_t len, u32 *result)
+{
+ unsigned long flags = DMA_DEST_PAGE | DMA_SRC_PAGES;
+ union dmaengine_addr src_addr = { .pgs = src_pgs };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_xor += len * src_cnt;
+ per_cpu_ptr(chan->local, cpu)->xor_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_zero_sum(chan,
+ src_addr, src_cnt, src_off, len, result, flags);
+}
+
+/**
+ * dma_async_zero_sum_dma_list - offloaded xor zero sum from a dma list
+ * @chan: DMA channel to offload zero sum to
+ * @src_list: array of sources already mapped and consistent
+ * @src_cnt: number of sources
+ * @len: length
+ * @result: set to 1 if sum is zero else 0
+ */
+static inline dma_cookie_t dma_async_zero_sum_dma_list(struct dma_chan *chan,
+ dma_addr_t *src_list, unsigned int src_cnt, size_t len, u32 *result)
+{
+ unsigned long flags = DMA_DEST_DMA | DMA_SRC_DMA_LIST;
+ union dmaengine_addr src_addr = { .dma_list = src_list };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_xor += len * src_cnt;
+ per_cpu_ptr(chan->local, cpu)->xor_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_zero_sum(chan,
+ src_addr, src_cnt, 0, len, result, flags);
+}
+
+/**
* dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*

2006-09-11 23:23:07

by Dan Williams

[permalink] [raw]
Subject: [PATCH 09/19] dmaengine: reduce backend address permutations

From: Dan Williams <[email protected]>

Change the backend dma driver API to accept a 'union dmaengine_addr'. The
intent is to be able to support a wide range of frontend address type
permutations without needing an equal number of function type permutations
on the backend.
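
For example, what used to require a dedicated device_memcpy_buf_to_pg()
backend method is now expressed by packing each side of the transfer into
the union and tagging it with a flag; this mirrors the updated
dma_async_memcpy_buf_to_pg() wrapper below ('page', 'kdata', 'offset' and
'len' stand in for caller arguments):

	union dmaengine_addr dest_addr = { .pg = page };
	union dmaengine_addr src_addr = { .buf = kdata };

	cookie = chan->device->device_do_dma_memcpy(chan, dest_addr, offset,
					src_addr, 0, len,
					DMA_DEST_PAGE | DMA_SRC_BUF);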

Changelog:
* make the dmaengine api EXPORT_SYMBOL_GPL
* zero sum support should be standalone, not integrated into xor

Signed-off-by: Dan Williams <[email protected]>
---

drivers/dma/dmaengine.c | 15 ++-
drivers/dma/ioatdma.c | 186 +++++++++++++++++--------------------------
include/linux/dmaengine.h | 193 +++++++++++++++++++++++++++++++++++++++------
3 files changed, 249 insertions(+), 145 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index e10f19d..9b02afa 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -593,12 +593,13 @@ void dma_async_device_unregister(struct
}

/**
- * dma_async_xor_pgs_to_pg_err - default function for dma devices that
+ * dma_async_do_xor_err - default function for dma devices that
* do not support xor
*/
-dma_cookie_t dma_async_xor_pgs_to_pg_err(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pgs,
- unsigned int src_cnt, unsigned int src_off, size_t len)
+dma_cookie_t dma_async_do_xor_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, unsigned long flags)
{
return -ENXIO;
}
@@ -617,11 +618,15 @@ EXPORT_SYMBOL_GPL(dma_async_client_chan_
EXPORT_SYMBOL_GPL(dma_async_memcpy_buf_to_buf);
EXPORT_SYMBOL_GPL(dma_async_memcpy_buf_to_pg);
EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to_pg);
+EXPORT_SYMBOL_GPL(dma_async_memcpy_dma_to_dma);
+EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to_dma);
+EXPORT_SYMBOL_GPL(dma_async_memcpy_dma_to_pg);
EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg);
+EXPORT_SYMBOL_GPL(dma_async_xor_dma_list_to_dma);
EXPORT_SYMBOL_GPL(dma_async_operation_complete);
EXPORT_SYMBOL_GPL(dma_async_issue_pending);
EXPORT_SYMBOL_GPL(dma_async_device_register);
EXPORT_SYMBOL_GPL(dma_async_device_unregister);
EXPORT_SYMBOL_GPL(dma_chan_cleanup);
-EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg_err);
+EXPORT_SYMBOL_GPL(dma_async_do_xor_err);
EXPORT_SYMBOL_GPL(dma_async_chan_init);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index 415de03..dd5b9f0 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -213,20 +213,25 @@ static void ioat_dma_free_chan_resources

/**
* do_ioat_dma_memcpy - actual function that initiates a IOAT DMA transaction
- * @ioat_chan: IOAT DMA channel handle
- * @dest: DMA destination address
- * @src: DMA source address
+ * @chan: IOAT DMA channel handle
+ * @dest: DMAENGINE destination address
+ * @dest_off: Page offset
+ * @src: DMAENGINE source address
+ * @src_off: Page offset
* @len: transaction length in bytes
*/

-static dma_cookie_t do_ioat_dma_memcpy(struct ioat_dma_chan *ioat_chan,
- dma_addr_t dest,
- dma_addr_t src,
- size_t len)
+static dma_cookie_t do_ioat_dma_memcpy(struct dma_chan *dma_chan,
+ union dmaengine_addr dest,
+ unsigned int dest_off,
+ union dmaengine_addr src,
+ unsigned int src_off,
+ size_t len,
+ unsigned long flags)
{
struct ioat_desc_sw *first;
struct ioat_desc_sw *prev;
- struct ioat_desc_sw *new;
+ struct ioat_desc_sw *new = 0;
dma_cookie_t cookie;
LIST_HEAD(new_chain);
u32 copy;
@@ -234,16 +239,47 @@ static dma_cookie_t do_ioat_dma_memcpy(s
dma_addr_t orig_src, orig_dst;
unsigned int desc_count = 0;
unsigned int append = 0;
+ struct ioat_dma_chan *ioat_chan = to_ioat_chan(dma_chan);

- if (!ioat_chan || !dest || !src)
+ if (!dma_chan || !dest.dma || !src.dma)
return -EFAULT;

if (!len)
return ioat_chan->common.cookie;

+ switch (flags & (DMA_SRC_BUF | DMA_SRC_PAGE | DMA_SRC_DMA)) {
+ case DMA_SRC_BUF:
+ src.dma = pci_map_single(ioat_chan->device->pdev,
+ src.buf, len, PCI_DMA_TODEVICE);
+ break;
+ case DMA_SRC_PAGE:
+ src.dma = pci_map_page(ioat_chan->device->pdev,
+ src.pg, src_off, len, PCI_DMA_TODEVICE);
+ break;
+ case DMA_SRC_DMA:
+ break;
+ default:
+ return -EFAULT;
+ }
+
+ switch (flags & (DMA_DEST_BUF | DMA_DEST_PAGE | DMA_DEST_DMA)) {
+ case DMA_DEST_BUF:
+ dest.dma = pci_map_single(ioat_chan->device->pdev,
+ dest.buf, len, PCI_DMA_FROMDEVICE);
+ break;
+ case DMA_DEST_PAGE:
+ dest.dma = pci_map_page(ioat_chan->device->pdev,
+ dest.pg, dest_off, len, PCI_DMA_FROMDEVICE);
+ break;
+ case DMA_DEST_DMA:
+ break;
+ default:
+ return -EFAULT;
+ }
+
orig_len = len;
- orig_src = src;
- orig_dst = dest;
+ orig_src = src.dma;
+ orig_dst = dest.dma;

first = NULL;
prev = NULL;
@@ -266,8 +302,8 @@ static dma_cookie_t do_ioat_dma_memcpy(s

new->hw->size = copy;
new->hw->ctl = 0;
- new->hw->src_addr = src;
- new->hw->dst_addr = dest;
+ new->hw->src_addr = src.dma;
+ new->hw->dst_addr = dest.dma;
new->cookie = 0;

/* chain together the physical address list for the HW */
@@ -279,8 +315,8 @@ static dma_cookie_t do_ioat_dma_memcpy(s
prev = new;

len -= copy;
- dest += copy;
- src += copy;
+ dest.dma += copy;
+ src.dma += copy;

list_add_tail(&new->node, &new_chain);
desc_count++;
@@ -321,89 +357,7 @@ static dma_cookie_t do_ioat_dma_memcpy(s
}

/**
- * ioat_dma_memcpy_buf_to_buf - wrapper that takes src & dest bufs
- * @chan: IOAT DMA channel handle
- * @dest: DMA destination address
- * @src: DMA source address
- * @len: transaction length in bytes
- */
-
-static dma_cookie_t ioat_dma_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest,
- void *src,
- size_t len)
-{
- dma_addr_t dest_addr;
- dma_addr_t src_addr;
- struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
-
- dest_addr = pci_map_single(ioat_chan->device->pdev,
- dest, len, PCI_DMA_FROMDEVICE);
- src_addr = pci_map_single(ioat_chan->device->pdev,
- src, len, PCI_DMA_TODEVICE);
-
- return do_ioat_dma_memcpy(ioat_chan, dest_addr, src_addr, len);
-}
-
-/**
- * ioat_dma_memcpy_buf_to_pg - wrapper, copying from a buf to a page
- * @chan: IOAT DMA channel handle
- * @page: pointer to the page to copy to
- * @offset: offset into that page
- * @src: DMA source address
- * @len: transaction length in bytes
- */
-
-static dma_cookie_t ioat_dma_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page,
- unsigned int offset,
- void *src,
- size_t len)
-{
- dma_addr_t dest_addr;
- dma_addr_t src_addr;
- struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
-
- dest_addr = pci_map_page(ioat_chan->device->pdev,
- page, offset, len, PCI_DMA_FROMDEVICE);
- src_addr = pci_map_single(ioat_chan->device->pdev,
- src, len, PCI_DMA_TODEVICE);
-
- return do_ioat_dma_memcpy(ioat_chan, dest_addr, src_addr, len);
-}
-
-/**
- * ioat_dma_memcpy_pg_to_pg - wrapper, copying between two pages
- * @chan: IOAT DMA channel handle
- * @dest_pg: pointer to the page to copy to
- * @dest_off: offset into that page
- * @src_pg: pointer to the page to copy from
- * @src_off: offset into that page
- * @len: transaction length in bytes. This is guaranteed not to make a copy
- * across a page boundary.
- */
-
-static dma_cookie_t ioat_dma_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg,
- unsigned int dest_off,
- struct page *src_pg,
- unsigned int src_off,
- size_t len)
-{
- dma_addr_t dest_addr;
- dma_addr_t src_addr;
- struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
-
- dest_addr = pci_map_page(ioat_chan->device->pdev,
- dest_pg, dest_off, len, PCI_DMA_FROMDEVICE);
- src_addr = pci_map_page(ioat_chan->device->pdev,
- src_pg, src_off, len, PCI_DMA_TODEVICE);
-
- return do_ioat_dma_memcpy(ioat_chan, dest_addr, src_addr, len);
-}
-
-/**
- * ioat_dma_memcpy_issue_pending - push potentially unrecognized appended descriptors to hw
+ * ioat_dma_memcpy_issue_pending - push potentially unrecognized appended descriptors to hw
* @chan: DMA channel handle
*/

@@ -626,24 +580,24 @@ #define IOAT_TEST_SIZE 2000
static int ioat_self_test(struct ioat_device *device)
{
int i;
- u8 *src;
- u8 *dest;
+ union dmaengine_addr src;
+ union dmaengine_addr dest;
struct dma_chan *dma_chan;
dma_cookie_t cookie;
int err = 0;

- src = kzalloc(sizeof(u8) * IOAT_TEST_SIZE, SLAB_KERNEL);
- if (!src)
+ src.buf = kzalloc(sizeof(u8) * IOAT_TEST_SIZE, SLAB_KERNEL);
+ if (!src.buf)
return -ENOMEM;
- dest = kzalloc(sizeof(u8) * IOAT_TEST_SIZE, SLAB_KERNEL);
- if (!dest) {
- kfree(src);
+ dest.buf = kzalloc(sizeof(u8) * IOAT_TEST_SIZE, SLAB_KERNEL);
+ if (!dest.buf) {
+ kfree(src.buf);
return -ENOMEM;
}

/* Fill in src buffer */
for (i = 0; i < IOAT_TEST_SIZE; i++)
- src[i] = (u8)i;
+ ((u8 *) src.buf)[i] = (u8)i;

/* Start copy, using first DMA channel */
dma_chan = container_of(device->common.channels.next,
@@ -654,7 +608,8 @@ static int ioat_self_test(struct ioat_de
goto out;
}

- cookie = ioat_dma_memcpy_buf_to_buf(dma_chan, dest, src, IOAT_TEST_SIZE);
+ cookie = do_ioat_dma_memcpy(dma_chan, dest, 0, src, 0,
+ IOAT_TEST_SIZE, DMA_SRC_BUF | DMA_DEST_BUF);
ioat_dma_memcpy_issue_pending(dma_chan);
msleep(1);

@@ -663,7 +618,7 @@ static int ioat_self_test(struct ioat_de
err = -ENODEV;
goto free_resources;
}
- if (memcmp(src, dest, IOAT_TEST_SIZE)) {
+ if (memcmp(src.buf, dest.buf, IOAT_TEST_SIZE)) {
printk(KERN_ERR "ioatdma: Self-test copy failed compare, disabling\n");
err = -ENODEV;
goto free_resources;
@@ -672,11 +627,16 @@ static int ioat_self_test(struct ioat_de
free_resources:
ioat_dma_free_chan_resources(dma_chan);
out:
- kfree(src);
- kfree(dest);
+ kfree(src.buf);
+ kfree(dest.buf);
return err;
}

+extern dma_cookie_t dma_async_do_xor_err(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len, unsigned long flags);
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -752,13 +712,11 @@ #endif

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
device->common.device_operation_complete = ioat_dma_is_complete;
- device->common.device_xor_pgs_to_pg = dma_async_xor_pgs_to_pg_err;
device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
device->common.capabilities = DMA_MEMCPY;
+ device->common.device_do_dma_memcpy = do_ioat_dma_memcpy;
+ device->common.device_do_dma_xor = dma_async_do_xor_err;
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 3599472..df055cc 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -86,6 +86,32 @@ enum dma_capabilities {
};

/**
+ * union dmaengine_addr - Private address types
+ * - passing a dma address to the hardware engine
+ * implies that the dma_map* step is skipped
+ */
+union dmaengine_addr {
+ void *buf;
+ struct page *pg;
+ struct page **pgs;
+ dma_addr_t dma;
+ dma_addr_t *dma_list;
+};
+
+enum dmaengine_flags {
+ DMA_SRC_BUF = 0x1,
+ DMA_SRC_PAGE = 0x2,
+ DMA_SRC_PAGES = 0x4,
+ DMA_SRC_DMA = 0x8,
+ DMA_SRC_DMA_LIST = 0x10,
+ DMA_DEST_BUF = 0x20,
+ DMA_DEST_PAGE = 0x40,
+ DMA_DEST_PAGES = 0x80,
+ DMA_DEST_DMA = 0x100,
+ DMA_DEST_DMA_LIST = 0x200,
+};
+
+/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
* @memcpy_count: transaction counter
@@ -230,11 +256,10 @@ struct dma_chan_client_ref {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
* @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
+ * @device_issue_pending: push appended descriptors to hardware
+ * @device_do_dma_memcpy: perform memcpy with a dma engine
+ * @device_do_dma_xor: perform block xor with a dma engine
*/
struct dma_device {

@@ -250,18 +275,15 @@ struct dma_device {

int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- dma_cookie_t (*device_xor_pgs_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page **src_pgs, unsigned int src_cnt,
- unsigned int src_off, size_t len);
+ dma_cookie_t (*device_do_dma_memcpy)(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_off,
+ size_t len, unsigned long flags);
+ dma_cookie_t (*device_do_dma_xor)(struct dma_chan *chan,
+ union dmaengine_addr dest, unsigned int dest_off,
+ union dmaengine_addr src, unsigned int src_cnt,
+ unsigned int src_off, size_t len,
+ unsigned long flags);
enum dma_status (*device_operation_complete)(struct dma_chan *chan,
dma_cookie_t cookie, dma_cookie_t *last,
dma_cookie_t *used);
@@ -275,9 +297,6 @@ void dma_async_client_unregister(struct
int dma_async_client_chan_request(struct dma_client *client,
unsigned int number, unsigned int mask);
void dma_async_chan_init(struct dma_chan *chan, struct dma_device *device);
-dma_cookie_t dma_async_xor_pgs_to_pg_err(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pgs,
- unsigned int src_cnt, unsigned int src_off, size_t len);

/**
* dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
@@ -294,12 +313,16 @@ dma_cookie_t dma_async_xor_pgs_to_pg_err
static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
void *dest, void *src, size_t len)
{
+ unsigned long flags = DMA_DEST_BUF | DMA_SRC_BUF;
+ union dmaengine_addr dest_addr = { .buf = dest };
+ union dmaengine_addr src_addr = { .buf = src };
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
per_cpu_ptr(chan->local, cpu)->memcpy_count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return chan->device->device_do_dma_memcpy(chan, dest_addr, 0,
+ src_addr, 0, len, flags);
}

/**
@@ -318,13 +341,16 @@ static inline dma_cookie_t dma_async_mem
static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
struct page *page, unsigned int offset, void *kdata, size_t len)
{
+ unsigned long flags = DMA_DEST_PAGE | DMA_SRC_BUF;
+ union dmaengine_addr dest_addr = { .pg = page };
+ union dmaengine_addr src_addr = { .buf = kdata };
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
per_cpu_ptr(chan->local, cpu)->memcpy_count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return chan->device->device_do_dma_memcpy(chan, dest_addr, offset,
+ src_addr, 0, len, flags);
}

/**
@@ -345,13 +371,101 @@ static inline dma_cookie_t dma_async_mem
struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
unsigned int src_off, size_t len)
{
+ unsigned long flags = DMA_DEST_PAGE | DMA_SRC_PAGE;
+ union dmaengine_addr dest_addr = { .pg = dest_pg };
+ union dmaengine_addr src_addr = { .pg = src_pg };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+ per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_memcpy(chan, dest_addr, dest_off,
+ src_addr, src_off, len, flags);
+}
+
+/**
+ * dma_async_memcpy_dma_to_dma - offloaded copy from dma to dma
+ * @chan: DMA channel to offload copy to
+ * @dest: destination already mapped and consistent
+ * @src: source already mapped and consistent
+ * @len: length
+ *
+ * Both @dest and @src are bus addresses that the caller has already
+ * mapped with the DMA mapping API; no mapping is performed here.
+ * The underlying memory must stay resident and mapped until the
+ * operation completes.
+ */
+static inline dma_cookie_t dma_async_memcpy_dma_to_dma(struct dma_chan *chan,
+ dma_addr_t dest, dma_addr_t src, size_t len)
+{
+ unsigned long flags = DMA_DEST_DMA | DMA_SRC_DMA;
+ union dmaengine_addr dest_addr = { .dma = dest };
+ union dmaengine_addr src_addr = { .dma = src };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+ per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_memcpy(chan, dest_addr, 0,
+ src_addr, 0, len, flags);
+}
+
+/**
+ * dma_async_memcpy_pg_to_dma - offloaded copy from page to dma
+ * @chan: DMA channel to offload copy to
+ * @dest: destination already mapped and consistent
+ * @src_pg: source page
+ * @src_off: offset in page to copy from
+ * @len: length
+ *
+ * Both @dest_page/@dest_off and @src_page/@src_off must be mappable to a bus
+ * address according to the DMA mapping API rules for streaming mappings.
+ * Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
+ * (kernel memory or locked user space pages)
+ */
+static inline dma_cookie_t dma_async_memcpy_pg_to_dma(struct dma_chan *chan,
+ dma_addr_t dest, struct page *src_pg,
+ unsigned int src_off, size_t len)
+{
+ unsigned long flags = DMA_DEST_DMA | DMA_SRC_PAGE;
+ union dmaengine_addr dest_addr = { .dma = dest };
+ union dmaengine_addr src_addr = { .pg = src_pg };
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
per_cpu_ptr(chan->local, cpu)->memcpy_count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return chan->device->device_do_dma_memcpy(chan, dest_addr, 0,
+ src_addr, src_off, len, flags);
+}
+
+/**
+ * dma_async_memcpy_dma_to_pg - offloaded copy from dma to page
+ * @chan: DMA channel to offload copy to
+ * @dest_pg: destination page
+ * @dest_off: offset in page to copy to
+ * @src: source already mapped and consistent
+ * @len: length
+ *
+ * @dest_pg/@dest_off must be mappable to a bus address according to the DMA
+ * mapping API rules for streaming mappings; @src is a bus address that has
+ * already been mapped.  Both must stay memory resident (kernel memory or
+ * locked user space pages) until the operation completes.
+ */
+static inline dma_cookie_t dma_async_memcpy_dma_to_pg(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off, dma_addr_t src,
+ size_t len)
+{
+ unsigned long flags = DMA_DEST_PAGE | DMA_SRC_DMA;
+ union dmaengine_addr dest_addr = { .pg = dest_pg };
+ union dmaengine_addr src_addr = { .dma = src };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+ per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_memcpy(chan, dest_addr, dest_off,
+ src_addr, 0, len, flags);
}

/**
@@ -373,13 +487,40 @@ static inline dma_cookie_t dma_async_xor
struct page *dest_pg, unsigned int dest_off, struct page **src_pgs,
unsigned int src_cnt, unsigned int src_off, size_t len)
{
+ unsigned long flags = DMA_DEST_PAGE | DMA_SRC_PAGES;
+ union dmaengine_addr dest_addr = { .pg = dest_pg };
+ union dmaengine_addr src_addr = { .pgs = src_pgs };
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_xor += len * src_cnt;
+ per_cpu_ptr(chan->local, cpu)->xor_count++;
+ put_cpu();
+
+ return chan->device->device_do_dma_xor(chan, dest_addr, dest_off,
+ src_addr, src_cnt, src_off, len, flags);
+}
+
+/**
+ * dma_async_xor_dma_list_to_dma - offloaded xor of dma blocks
+ * @chan: DMA channel to offload xor to
+ * @dest: destination already mapped and consistent
+ * @src_list: array of sources already mapped and consistent
+ * @src_cnt: number of sources
+ * @len: length
+ */
+static inline dma_cookie_t dma_async_xor_dma_list_to_dma(struct dma_chan *chan,
+ dma_addr_t dest, dma_addr_t *src_list, unsigned int src_cnt,
+ size_t len)
+{
+ unsigned long flags = DMA_DEST_DMA | DMA_SRC_DMA_LIST;
+ union dmaengine_addr dest_addr = { .dma = dest };
+ union dmaengine_addr src_addr = { .dma_list = src_list };
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_xor += len * src_cnt;
per_cpu_ptr(chan->local, cpu)->xor_count++;
put_cpu();

- return chan->device->device_xor_pgs_to_pg(chan, dest_pg, dest_off,
- src_pgs, src_cnt, src_off, len);
+ return chan->device->device_do_dma_xor(chan, dest_addr, 0,
+ src_addr, src_cnt, 0, len, flags);
}

/**
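
For illustration only (not part of the patch): a client that has already
mapped its buffers might use the new dma-address variants roughly as
follows.  'chan' is assumed to come from a prior
dma_async_client_chan_request(), 'dev' is the device behind the channel,
and dest_buf/src_buf/len belong to the caller; error handling is elided.

	dma_addr_t dest, src;
	dma_cookie_t cookie;

	dest = dma_map_single(dev, dest_buf, len, DMA_FROM_DEVICE);
	src = dma_map_single(dev, src_buf, len, DMA_TO_DEVICE);

	cookie = dma_async_memcpy_dma_to_dma(chan, dest, src, len);
	dma_async_issue_pending(chan);

	while (dma_async_operation_complete(chan, cookie, NULL, NULL) ==
			DMA_IN_PROGRESS)
		cpu_relax();

	dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
	dma_unmap_single(dev, dest, len, DMA_FROM_DEVICE);

The same pattern applies to dma_async_xor_dma_list_to_dma() with an array
of pre-mapped source addresses in place of 'src'.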

2006-09-11 23:20:38

by Dan Williams

[permalink] [raw]
Subject: [PATCH 18/19] iop3xx: Give Linux control over PCI (ATU) initialization

From: Dan Williams <[email protected]>

Currently the iop3xx platform support code assumes that RedBoot is the
bootloader and has already initialized the ATU. Linux should handle this
initialization for three reasons:

1/ The memory map that RedBoot sets up is not optimal (page_to_dma and
virt_to_phys return different addresses). The effect of this is that using
the dma mapping API for the internal bus dma units generates pci bus
addresses that are incorrect for the internal bus.

2/ Not all iop platforms use RedBoot.

3/ If the ATU is already initialized it indicates that the iop is an add-in
card in another host; it does not own the PCI bus and should not be
re-initialized.

Signed-off-by: Dan Williams <[email protected]>
---
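Note for testers: the Kconfig default can be overridden at boot with the new
iop3xx_init_atu= parameter, e.g. a kernel built with CONFIG_IOP3XX_ATU=y can
still be brought up as an add-in card with:

	iop3xx_init_atu=n

When the parameter is absent the Kconfig selection decides whether the ATU
is initialized.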

arch/arm/mach-iop32x/Kconfig | 8 ++
arch/arm/mach-iop32x/ep80219.c | 4 +
arch/arm/mach-iop32x/iq31244.c | 5 +
arch/arm/mach-iop32x/iq80321.c | 5 +
arch/arm/mach-iop33x/Kconfig | 8 ++
arch/arm/mach-iop33x/iq80331.c | 5 +
arch/arm/mach-iop33x/iq80332.c | 4 +
arch/arm/plat-iop/pci.c | 140 ++++++++++++++++++++++++++++++++++
include/asm-arm/arch-iop32x/iop32x.h | 9 ++
include/asm-arm/arch-iop32x/memory.h | 4 -
include/asm-arm/arch-iop33x/iop33x.h | 10 ++
include/asm-arm/arch-iop33x/memory.h | 4 -
include/asm-arm/hardware/iop3xx.h | 20 ++++-
13 files changed, 214 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mach-iop32x/Kconfig b/arch/arm/mach-iop32x/Kconfig
index 05549a5..b2788e3 100644
--- a/arch/arm/mach-iop32x/Kconfig
+++ b/arch/arm/mach-iop32x/Kconfig
@@ -22,6 +22,14 @@ config ARCH_IQ80321
Say Y here if you want to run your kernel on the Intel IQ80321
evaluation kit for the IOP321 processor.

+config IOP3XX_ATU
+ bool "Enable the PCI Controller"
+ default y
+ help
+ Say Y here if you want the IOP to initialize its PCI Controller.
+ Say N if the IOP is an add-in card; in that case the host system
+ owns the PCI bus.
+
endmenu

endif
diff --git a/arch/arm/mach-iop32x/ep80219.c b/arch/arm/mach-iop32x/ep80219.c
index f616d3e..1a5c586 100644
--- a/arch/arm/mach-iop32x/ep80219.c
+++ b/arch/arm/mach-iop32x/ep80219.c
@@ -100,7 +100,7 @@ ep80219_pci_map_irq(struct pci_dev *dev,

static struct hw_pci ep80219_pci __initdata = {
.swizzle = pci_std_swizzle,
- .nr_controllers = 1,
+ .nr_controllers = 0,
.setup = iop3xx_pci_setup,
.preinit = iop3xx_pci_preinit,
.scan = iop3xx_pci_scan_bus,
@@ -109,6 +109,8 @@ static struct hw_pci ep80219_pci __initd

static int __init ep80219_pci_init(void)
{
+ if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
+ ep80219_pci.nr_controllers = 1;
#if 0
if (machine_is_ep80219())
pci_common_init(&ep80219_pci);
diff --git a/arch/arm/mach-iop32x/iq31244.c b/arch/arm/mach-iop32x/iq31244.c
index 967a696..25d5d62 100644
--- a/arch/arm/mach-iop32x/iq31244.c
+++ b/arch/arm/mach-iop32x/iq31244.c
@@ -97,7 +97,7 @@ iq31244_pci_map_irq(struct pci_dev *dev,

static struct hw_pci iq31244_pci __initdata = {
.swizzle = pci_std_swizzle,
- .nr_controllers = 1,
+ .nr_controllers = 0,
.setup = iop3xx_pci_setup,
.preinit = iop3xx_pci_preinit,
.scan = iop3xx_pci_scan_bus,
@@ -106,6 +106,9 @@ static struct hw_pci iq31244_pci __initd

static int __init iq31244_pci_init(void)
{
+ if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
+ iq31244_pci.nr_controllers = 1;
+
if (machine_is_iq31244())
pci_common_init(&iq31244_pci);

diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c
index ef4388c..cdd2265 100644
--- a/arch/arm/mach-iop32x/iq80321.c
+++ b/arch/arm/mach-iop32x/iq80321.c
@@ -97,7 +97,7 @@ iq80321_pci_map_irq(struct pci_dev *dev,

static struct hw_pci iq80321_pci __initdata = {
.swizzle = pci_std_swizzle,
- .nr_controllers = 1,
+ .nr_controllers = 0,
.setup = iop3xx_pci_setup,
.preinit = iop3xx_pci_preinit,
.scan = iop3xx_pci_scan_bus,
@@ -106,6 +106,9 @@ static struct hw_pci iq80321_pci __initd

static int __init iq80321_pci_init(void)
{
+ if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
+ iq80321_pci.nr_controllers = 1;
+
if (machine_is_iq80321())
pci_common_init(&iq80321_pci);

diff --git a/arch/arm/mach-iop33x/Kconfig b/arch/arm/mach-iop33x/Kconfig
index 9aa016b..45598e0 100644
--- a/arch/arm/mach-iop33x/Kconfig
+++ b/arch/arm/mach-iop33x/Kconfig
@@ -16,6 +16,14 @@ config MACH_IQ80332
Say Y here if you want to run your kernel on the Intel IQ80332
evaluation kit for the IOP332 chipset.

+config IOP3XX_ATU
+ bool "Enable the PCI Controller"
+ default y
+ help
+ Say Y here if you want the IOP to initialize its PCI Controller.
+ Say N if the IOP is an add-in card; in that case the host system
+ owns the PCI bus.
+
endmenu

endif
diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c
index 7714c94..3807000 100644
--- a/arch/arm/mach-iop33x/iq80331.c
+++ b/arch/arm/mach-iop33x/iq80331.c
@@ -78,7 +78,7 @@ iq80331_pci_map_irq(struct pci_dev *dev,

static struct hw_pci iq80331_pci __initdata = {
.swizzle = pci_std_swizzle,
- .nr_controllers = 1,
+ .nr_controllers = 0,
.setup = iop3xx_pci_setup,
.preinit = iop3xx_pci_preinit,
.scan = iop3xx_pci_scan_bus,
@@ -87,6 +87,9 @@ static struct hw_pci iq80331_pci __initd

static int __init iq80331_pci_init(void)
{
+ if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
+ iq80331_pci.nr_controllers = 1;
+
if (machine_is_iq80331())
pci_common_init(&iq80331_pci);

diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c
index a3fa7f8..8780d55 100644
--- a/arch/arm/mach-iop33x/iq80332.c
+++ b/arch/arm/mach-iop33x/iq80332.c
@@ -93,6 +93,10 @@ static struct hw_pci iq80332_pci __initd

static int __init iq80332_pci_init(void)
{
+
+ if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
+ iq80332_pci.nr_controllers = 1;
+
if (machine_is_iq80332())
pci_common_init(&iq80332_pci);

diff --git a/arch/arm/plat-iop/pci.c b/arch/arm/plat-iop/pci.c
index e647812..19aace9 100644
--- a/arch/arm/plat-iop/pci.c
+++ b/arch/arm/plat-iop/pci.c
@@ -55,7 +55,7 @@ static u32 iop3xx_cfg_address(struct pci
* This routine checks the status of the last configuration cycle. If an error
* was detected it returns a 1, else it returns a 0. The errors being checked
* are parity, master abort, target abort (master and target). These types of
- * errors occure during a config cycle where there is no device, like during
+ * errors occur during a config cycle where there is no device, like during
* the discovery stage.
*/
static int iop3xx_pci_status(void)
@@ -223,8 +223,111 @@ struct pci_bus *iop3xx_pci_scan_bus(int
return pci_scan_bus(sys->busnr, &iop3xx_ops, sys);
}

+void __init iop3xx_atu_setup(void)
+{
+ /* BAR 0 ( Disabled ) */
+ *IOP3XX_IAUBAR0 = 0x0;
+ *IOP3XX_IABAR0 = 0x0;
+ *IOP3XX_IATVR0 = 0x0;
+ *IOP3XX_IALR0 = 0x0;
+
+ /* BAR 1 ( Disabled ) */
+ *IOP3XX_IAUBAR1 = 0x0;
+ *IOP3XX_IABAR1 = 0x0;
+ *IOP3XX_IALR1 = 0x0;
+
+ /* BAR 2 (1:1 mapping with Physical RAM) */
+ /* Set limit and enable */
+ *IOP3XX_IALR2 = ~((u32)IOP3XX_MAX_RAM_SIZE - 1) & ~0x1;
+ *IOP3XX_IAUBAR2 = 0x0;
+
+ /* Align the inbound bar with the base of memory */
+ *IOP3XX_IABAR2 = PHYS_OFFSET |
+ PCI_BASE_ADDRESS_MEM_TYPE_64 |
+ PCI_BASE_ADDRESS_MEM_PREFETCH;
+
+ *IOP3XX_IATVR2 = PHYS_OFFSET;
+
+ /* Outbound window 0 */
+ *IOP3XX_OMWTVR0 = IOP3XX_PCI_LOWER_MEM_PA;
+ *IOP3XX_OUMWTVR0 = 0;
+
+ /* Outbound window 1 */
+ *IOP3XX_OMWTVR1 = IOP3XX_PCI_LOWER_MEM_PA + IOP3XX_PCI_MEM_WINDOW_SIZE;
+ *IOP3XX_OUMWTVR1 = 0;
+
+ /* BAR 3 ( Disabled ) */
+ *IOP3XX_IAUBAR3 = 0x0;
+ *IOP3XX_IABAR3 = 0x0;
+ *IOP3XX_IATVR3 = 0x0;
+ *IOP3XX_IALR3 = 0x0;
+
+ /* Setup the I/O Bar
+ */
+ *IOP3XX_OIOWTVR = IOP3XX_PCI_LOWER_IO_PA;
+
+ /* Enable inbound and outbound cycles
+ */
+ *IOP3XX_ATUCMD |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
+ PCI_COMMAND_PARITY | PCI_COMMAND_SERR;
+ *IOP3XX_ATUCR |= IOP3XX_ATUCR_OUT_EN;
+}
+
+void __init iop3xx_atu_disable(void)
+{
+ *IOP3XX_ATUCMD = 0;
+ *IOP3XX_ATUCR = 0;
+
+ /* wait for cycles to quiesce */
+ while (*IOP3XX_PCSR & (IOP3XX_PCSR_OUT_Q_BUSY |
+ IOP3XX_PCSR_IN_Q_BUSY))
+ cpu_relax();
+
+ /* BAR 0 ( Disabled ) */
+ *IOP3XX_IAUBAR0 = 0x0;
+ *IOP3XX_IABAR0 = 0x0;
+ *IOP3XX_IATVR0 = 0x0;
+ *IOP3XX_IALR0 = 0x0;
+
+ /* BAR 1 ( Disabled ) */
+ *IOP3XX_IAUBAR1 = 0x0;
+ *IOP3XX_IABAR1 = 0x0;
+ *IOP3XX_IALR1 = 0x0;
+
+ /* BAR 2 ( Disabled ) */
+ *IOP3XX_IAUBAR2 = 0x0;
+ *IOP3XX_IABAR2 = 0x0;
+ *IOP3XX_IATVR2 = 0x0;
+ *IOP3XX_IALR2 = 0x0;
+
+ /* BAR 3 ( Disabled ) */
+ *IOP3XX_IAUBAR3 = 0x0;
+ *IOP3XX_IABAR3 = 0x0;
+ *IOP3XX_IATVR3 = 0x0;
+ *IOP3XX_IALR3 = 0x0;
+
+ /* Clear the outbound windows */
+ *IOP3XX_OIOWTVR = 0;
+
+ /* Outbound window 0 */
+ *IOP3XX_OMWTVR0 = 0;
+ *IOP3XX_OUMWTVR0 = 0;
+
+ /* Outbound window 1 */
+ *IOP3XX_OMWTVR1 = 0;
+ *IOP3XX_OUMWTVR1 = 0;
+}
+
+/* Flag to determine whether the ATU is initialized and the PCI bus scanned */
+int init_atu;
+
void iop3xx_pci_preinit(void)
{
+ if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE) {
+ iop3xx_atu_disable();
+ iop3xx_atu_setup();
+ }
+
DBG("PCI: Intel 803xx PCI init code.\n");
DBG("ATU: IOP3XX_ATUCMD=0x%04x\n", *IOP3XX_ATUCMD);
DBG("ATU: IOP3XX_OMWTVR0=0x%04x, IOP3XX_OIOWTVR=0x%04x\n",
@@ -245,3 +348,38 @@ void iop3xx_pci_preinit(void)

hook_fault_code(16+6, iop3xx_pci_abort, SIGBUS, "imprecise external abort");
}
+
+/* allow init_atu to be user overridden */
+static int __init iop3xx_init_atu_setup(char *str)
+{
+ init_atu = IOP3XX_INIT_ATU_DEFAULT;
+ if (str) {
+ while (*str != '\0') {
+ switch (*str) {
+ case 'y':
+ case 'Y':
+ init_atu = IOP3XX_INIT_ATU_ENABLE;
+ break;
+ case 'n':
+ case 'N':
+ init_atu = IOP3XX_INIT_ATU_DISABLE;
+ break;
+ case ',':
+ case '=':
+ break;
+ default:
+ printk(KERN_DEBUG "\"%s\" malformed at "
+ "character: \'%c\'",
+ __FUNCTION__,
+ *str);
+ *(str + 1) = '\0';
+ }
+ str++;
+ }
+ }
+
+ return 1;
+}
+
+__setup("iop3xx_init_atu", iop3xx_init_atu_setup);
+
diff --git a/include/asm-arm/arch-iop32x/iop32x.h b/include/asm-arm/arch-iop32x/iop32x.h
index 904a14d..93209c7 100644
--- a/include/asm-arm/arch-iop32x/iop32x.h
+++ b/include/asm-arm/arch-iop32x/iop32x.h
@@ -32,5 +32,14 @@ #define IOP32X_INTSTR IOP3XX_REG_ADDR32
#define IOP32X_IINTSRC IOP3XX_REG_ADDR32(0x07d8)
#define IOP32X_FINTSRC IOP3XX_REG_ADDR32(0x07dc)

+/* ATU Parameters
+ * set up a 1:1 bus to physical ram relationship
+ * w/ physical ram on top of pci in the memory map
+ */
+#define IOP32X_MAX_RAM_SIZE 0x40000000UL
+#define IOP3XX_MAX_RAM_SIZE IOP32X_MAX_RAM_SIZE
+#define IOP3XX_PCI_LOWER_MEM_BA 0x80000000
+#define IOP32X_PCI_MEM_WINDOW_SIZE 0x04000000
+#define IOP3XX_PCI_MEM_WINDOW_SIZE IOP32X_PCI_MEM_WINDOW_SIZE

#endif
diff --git a/include/asm-arm/arch-iop32x/memory.h b/include/asm-arm/arch-iop32x/memory.h
index 764cd3f..c51072a 100644
--- a/include/asm-arm/arch-iop32x/memory.h
+++ b/include/asm-arm/arch-iop32x/memory.h
@@ -19,8 +19,8 @@ #define PHYS_OFFSET UL(0xa0000000)
* bus_to_virt: Used to convert an address for DMA operations
* to an address that the kernel can use.
*/
-#define __virt_to_bus(x) (((__virt_to_phys(x)) & ~(*IOP3XX_IATVR2)) | ((*IOP3XX_IABAR2) & 0xfffffff0))
-#define __bus_to_virt(x) (__phys_to_virt(((x) & ~(*IOP3XX_IALR2)) | ( *IOP3XX_IATVR2)))
+#define __virt_to_bus(x) (__virt_to_phys(x))
+#define __bus_to_virt(x) (__phys_to_virt(x))


#endif
diff --git a/include/asm-arm/arch-iop33x/iop33x.h b/include/asm-arm/arch-iop33x/iop33x.h
index c171383..e106b80 100644
--- a/include/asm-arm/arch-iop33x/iop33x.h
+++ b/include/asm-arm/arch-iop33x/iop33x.h
@@ -49,5 +49,15 @@ #define IOP33X_UART0_VIRT (IOP3XX_PERIPH
#define IOP33X_UART1_PHYS (IOP3XX_PERIPHERAL_PHYS_BASE + 0x1740)
#define IOP33X_UART1_VIRT (IOP3XX_PERIPHERAL_VIRT_BASE + 0x1740)

+/* ATU Parameters
+ * set up a 1:1 bus to physical ram relationship
+ * w/ pci on top of physical ram in memory map
+ */
+#define IOP33X_MAX_RAM_SIZE 0x80000000UL
+#define IOP3XX_MAX_RAM_SIZE IOP33X_MAX_RAM_SIZE
+#define IOP3XX_PCI_LOWER_MEM_BA (PHYS_OFFSET + IOP33X_MAX_RAM_SIZE)
+#define IOP33X_PCI_MEM_WINDOW_SIZE 0x08000000
+#define IOP3XX_PCI_MEM_WINDOW_SIZE IOP33X_PCI_MEM_WINDOW_SIZE
+

#endif
diff --git a/include/asm-arm/arch-iop33x/memory.h b/include/asm-arm/arch-iop33x/memory.h
index 0d39139..c874912 100644
--- a/include/asm-arm/arch-iop33x/memory.h
+++ b/include/asm-arm/arch-iop33x/memory.h
@@ -19,8 +19,8 @@ #define PHYS_OFFSET UL(0x00000000)
* bus_to_virt: Used to convert an address for DMA operations
* to an address that the kernel can use.
*/
-#define __virt_to_bus(x) (((__virt_to_phys(x)) & ~(*IOP3XX_IATVR2)) | ((*IOP3XX_IABAR2) & 0xfffffff0))
-#define __bus_to_virt(x) (__phys_to_virt(((x) & ~(*IOP3XX_IALR2)) | ( *IOP3XX_IATVR2)))
+#define __virt_to_bus(x) (__virt_to_phys(x))
+#define __bus_to_virt(x) (__phys_to_virt(x))


#endif
diff --git a/include/asm-arm/hardware/iop3xx.h b/include/asm-arm/hardware/iop3xx.h
index 295789a..5a084c8 100644
--- a/include/asm-arm/hardware/iop3xx.h
+++ b/include/asm-arm/hardware/iop3xx.h
@@ -28,6 +28,7 @@ #ifndef __ASSEMBLY__
extern void gpio_line_config(int line, int direction);
extern int gpio_line_get(int line);
extern void gpio_line_set(int line, int value);
+extern int init_atu;
#endif


@@ -98,6 +99,21 @@ #define IOP3XX_PCIXNEXT IOP3XX_REG_ADDR8
#define IOP3XX_PCIXCMD IOP3XX_REG_ADDR16(0x01e2)
#define IOP3XX_PCIXSR IOP3XX_REG_ADDR32(0x01e4)
#define IOP3XX_PCIIRSR IOP3XX_REG_ADDR32(0x01ec)
+#define IOP3XX_PCSR_OUT_Q_BUSY (1 << 15)
+#define IOP3XX_PCSR_IN_Q_BUSY (1 << 14)
+#define IOP3XX_ATUCR_OUT_EN (1 << 1)
+
+#define IOP3XX_INIT_ATU_DEFAULT 0
+#define IOP3XX_INIT_ATU_DISABLE -1
+#define IOP3XX_INIT_ATU_ENABLE 1
+
+#ifdef CONFIG_IOP3XX_ATU
+#define iop3xx_get_init_atu(x) (init_atu == IOP3XX_INIT_ATU_DEFAULT ?\
+ IOP3XX_INIT_ATU_ENABLE : init_atu)
+#else
+#define iop3xx_get_init_atu(x) (init_atu == IOP3XX_INIT_ATU_DEFAULT ?\
+ IOP3XX_INIT_ATU_DISABLE : init_atu)
+#endif

/* Messaging Unit */
#define IOP3XX_IMR0 IOP3XX_REG_ADDR32(0x0310)
@@ -219,14 +235,12 @@ #define IOP3XX_IBMR1 IOP3XX_REG_ADDR32(
/*
* IOP3XX I/O and Mem space regions for PCI autoconfiguration
*/
-#define IOP3XX_PCI_MEM_WINDOW_SIZE 0x04000000
#define IOP3XX_PCI_LOWER_MEM_PA 0x80000000
-#define IOP3XX_PCI_LOWER_MEM_BA (*IOP3XX_OMWTVR0)

#define IOP3XX_PCI_IO_WINDOW_SIZE 0x00010000
#define IOP3XX_PCI_LOWER_IO_PA 0x90000000
#define IOP3XX_PCI_LOWER_IO_VA 0xfe000000
-#define IOP3XX_PCI_LOWER_IO_BA (*IOP3XX_OIOWTVR)
+#define IOP3XX_PCI_LOWER_IO_BA 0x90000000


#ifndef __ASSEMBLY__

2006-09-11 23:21:56

by Dan Williams

[permalink] [raw]
Subject: [PATCH 06/19] raid5: move the reconstruct write expansion operation to a workqueue

From: Dan Williams <[email protected]>

Enable handle_stripe5 to use the reconstruct write operations capability
for expansion operations.

However, this does not move the copy operation associated with an expand to
the workqueue. First, it was difficult to find a clean way to pass the
parameters of this operation to the queue. Second, this section of code is
a good candidate for performing the copies with inline calls to the dma
routines.

Signed-off-by: Dan Williams <[email protected]>
---
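As noted above, the expand copy is a candidate for inline calls to the dma
routines rather than a trip through the workqueue.  A rough sketch of what
that could look like (not part of this patch; it assumes a memcpy-capable
channel has already been allocated to the raid5 client, and completion
polling is left to the caller):

	/* hypothetical helper for the expansion copy */
	static void raid5_expand_copy(struct dma_chan *chan,
			struct stripe_head *sh2, int dd_idx,
			struct stripe_head *sh, int i)
	{
		if (chan) {
			dma_async_memcpy_pg_to_pg(chan,
					sh2->dev[dd_idx].page, 0,
					sh->dev[i].page, 0, STRIPE_SIZE);
			dma_async_issue_pending(chan);
			/* poll for completion before setting R5_UPTODATE
			 * on sh2->dev[dd_idx]
			 */
		} else
			memcpy(page_address(sh2->dev[dd_idx].page),
			       page_address(sh->dev[i].page),
			       STRIPE_SIZE);
	}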

drivers/md/raid5.c | 36 +++++++++++++++++++++++++++---------
1 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1a8dfd2..a07b52b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2053,6 +2053,7 @@ #endif
* completed
*/
if (test_bit(STRIPE_OP_RCW, &sh->state) &&
+ !test_bit(STRIPE_OP_RCW_Expand, &sh->ops.state) &&
test_bit(STRIPE_OP_RCW_Done, &sh->ops.state)) {
clear_bit(STRIPE_OP_RCW, &sh->state);
clear_bit(STRIPE_OP_RCW_Done, &sh->ops.state);
@@ -2226,6 +2227,7 @@ #endif
}
}
}
+
if (test_bit(STRIPE_OP_COMPUTE_Done, &sh->ops.state) &&
test_bit(STRIPE_OP_COMPUTE_Recover_pd, &sh->ops.state)) {
clear_bit(STRIPE_OP_COMPUTE, &sh->state);
@@ -2282,18 +2284,28 @@ #endif
}
}

- if (expanded && test_bit(STRIPE_EXPANDING, &sh->state)) {
+ /* Finish 'rcw' operations initiated by the expansion
+ * process
+ */
+ if (test_bit(STRIPE_OP_RCW, &sh->state) &&
+ test_bit(STRIPE_OP_RCW_Expand, &sh->ops.state) &&
+ test_bit(STRIPE_OP_RCW_Done, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_RCW, &sh->state);
+ clear_bit(STRIPE_OP_RCW_Done, &sh->ops.state);
+ clear_bit(STRIPE_OP_RCW_Expand, &sh->ops.state);
+ clear_bit(STRIPE_EXPANDING, &sh->state);
+ for (i= conf->raid_disks; i--;)
+ set_bit(R5_Wantwrite, &sh->dev[i].flags);
+ }
+
+ if (expanded && test_bit(STRIPE_EXPANDING, &sh->state) &&
+ !test_bit(STRIPE_OP_RCW, &sh->state)) {
/* Need to write out all blocks after computing parity */
sh->disks = conf->raid_disks;
sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks);
- compute_parity5(sh, RECONSTRUCT_WRITE);
- for (i= conf->raid_disks; i--;) {
- set_bit(R5_LOCKED, &sh->dev[i].flags);
- locked++;
- set_bit(R5_Wantwrite, &sh->dev[i].flags);
- }
- clear_bit(STRIPE_EXPANDING, &sh->state);
- } else if (expanded) {
+ set_bit(STRIPE_OP_RCW_Expand, &sh->ops.state);
+ locked += handle_write_operations5(sh, 0);
+ } else if (expanded && !test_bit(STRIPE_OP_RCW, &sh->state)) {
clear_bit(STRIPE_EXPAND_READY, &sh->state);
atomic_dec(&conf->reshape_stripes);
wake_up(&conf->wait_for_overlap);
@@ -2327,9 +2339,15 @@ #endif
release_stripe(sh2);
continue;
}
+ /* to do: perform these operations with a dma engine
+ * inline (rather than pushing to the workqueue)
+ */
+ /*#ifdef CONFIG_RAID5_DMA*/
+ /*#else*/
memcpy(page_address(sh2->dev[dd_idx].page),
page_address(sh->dev[i].page),
STRIPE_SIZE);
+ /*#endif*/
set_bit(R5_Expanded, &sh2->dev[dd_idx].flags);
set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
for (j=0; j<conf->raid_disks; j++)

2006-09-11 23:22:35

by Dan Williams

[permalink] [raw]
Subject: [PATCH 08/19] dmaengine: enable multiple clients and operations

From: Dan Williams <[email protected]>

Enable the dmaengine interface to allow multiple clients to share a
channel, and enable clients to request channels based on an operations
capability mask. This prepares the interface for use with the RAID5 client
and the future RAID6 client.

Multi-client support is achieved by modifying channels to maintain a list
of peer clients.

Multi-operation support is achieved by modifying clients to maintain lists
of channel references. Channel references in a given request list satisfy
a client-specified capability mask.

Changelog:
* make the dmaengine api EXPORT_SYMBOL_GPL
* zero sum support should be standalone, not integrated into xor

Signed-off-by: Dan Williams <[email protected]>
---
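For reference, a minimal sketch of how a client uses the per-capability
request interface (function and flag names as introduced below; error
handling is abbreviated):

	static struct dma_client *example_client;

	static void example_event(struct dma_client *client,
			struct dma_chan *chan, enum dma_event event)
	{
		/* track DMA_RESOURCE_ADDED / DMA_RESOURCE_REMOVED here */
	}

	static int __init example_init(void)
	{
		example_client = dma_async_client_register(example_event);
		if (!example_client)
			return -ENOMEM;

		/* ask for one channel that can do both memcpy and xor */
		if (!dma_async_client_chan_request(example_client, 1,
					DMA_MEMCPY | DMA_XOR)) {
			dma_async_client_unregister(example_client);
			return -EBUSY;
		}

		return 0;
	}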

drivers/dma/dmaengine.c | 357 ++++++++++++++++++++++++++++++++++++---------
drivers/dma/ioatdma.c | 12 +-
include/linux/dmaengine.h | 164 ++++++++++++++++++---
net/core/dev.c | 21 +--
net/ipv4/tcp.c | 4 -
5 files changed, 443 insertions(+), 115 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 1527804..e10f19d 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -37,8 +37,13 @@
* Each device has a channels list, which runs unlocked but is never modified
* once the device is registered, it's just setup by the driver.
*
- * Each client has a channels list, it's only modified under the client->lock
- * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ * Each client has 'n' lists of channel references where
+ * n == DMA_MAX_CHAN_TYPE_REQ. These lists are only modified under the
+ * client->lock and in an RCU callback, so they are safe to read under
+ * rcu_read_lock().
+ *
+ * Each channel has a list of peer clients; it is only modified under the
+ * chan->lock. This allows a channel to be shared amongst several clients.
*
* Each device has a kref, which is initialized to 1 when the device is
* registered. A kref_put is done for each class_device registered. When the
@@ -85,6 +90,18 @@ static ssize_t show_memcpy_count(struct
return sprintf(buf, "%lu\n", count);
}

+static ssize_t show_xor_count(struct class_device *cd, char *buf)
+{
+ struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+ unsigned long count = 0;
+ int i;
+
+ for_each_possible_cpu(i)
+ count += per_cpu_ptr(chan->local, i)->xor_count;
+
+ return sprintf(buf, "%lu\n", count);
+}
+
static ssize_t show_bytes_transferred(struct class_device *cd, char *buf)
{
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
@@ -97,16 +114,37 @@ static ssize_t show_bytes_transferred(st
return sprintf(buf, "%lu\n", count);
}

+static ssize_t show_bytes_xor(struct class_device *cd, char *buf)
+{
+ struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+ unsigned long count = 0;
+ int i;
+
+ for_each_possible_cpu(i)
+ count += per_cpu_ptr(chan->local, i)->bytes_xor;
+
+ return sprintf(buf, "%lu\n", count);
+}
+
static ssize_t show_in_use(struct class_device *cd, char *buf)
{
+ unsigned int clients = 0;
+ struct list_head *peer;
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);

- return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
+ rcu_read_lock();
+ list_for_each_rcu(peer, &chan->peers)
+ clients++;
+ rcu_read_unlock();
+
+ return sprintf(buf, "%d\n", clients);
}

static struct class_device_attribute dma_class_attrs[] = {
__ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(xor_count, S_IRUGO, show_xor_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
+ __ATTR(bytes_xor, S_IRUGO, show_bytes_xor, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
};
@@ -130,34 +168,79 @@ static struct class dma_devclass = {
/**
* dma_client_chan_alloc - try to allocate a channel to a client
* @client: &dma_client
+ * @req: request descriptor
*
* Called with dma_list_mutex held.
*/
-static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
+static struct dma_chan *dma_client_chan_alloc(struct dma_client *client,
+ struct dma_req *req)
{
struct dma_device *device;
struct dma_chan *chan;
+ struct dma_client_chan_peer *peer;
+ struct dma_chan_client_ref *chan_ref;
unsigned long flags;
int desc; /* allocated descriptor count */
+ int allocated; /* flag re-allocations */

- /* Find a channel, any DMA engine will do */
+ /* Find a channel */
list_for_each_entry(device, &dma_device_list, global_node) {
+ if ((req->cap_mask & device->capabilities)
+ != req->cap_mask)
+ continue;
list_for_each_entry(chan, &device->channels, device_node) {
- if (chan->client)
+ allocated = 0;
+ rcu_read_lock();
+ list_for_each_entry_rcu(chan_ref, &req->channels, req_node) {
+ if (chan_ref->chan == chan) {
+ allocated = 1;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ if (allocated)
continue;

+ /* can the channel be shared between multiple clients */
+ if ((req->exclusive && !list_empty(&chan->peers)) ||
+ chan->exclusive)
+ continue;
+
+ chan_ref = kmalloc(sizeof(*chan_ref), GFP_KERNEL);
+ if (!chan_ref)
+ continue;
+
+ peer = kmalloc(sizeof(*peer), GFP_KERNEL);
+ if (!peer) {
+ kfree(chan_ref);
+ continue;
+ }
+
desc = chan->device->device_alloc_chan_resources(chan);
- if (desc >= 0) {
+ if (desc) {
kref_get(&device->refcount);
- kref_init(&chan->refcount);
- chan->slow_ref = 0;
- INIT_RCU_HEAD(&chan->rcu);
- chan->client = client;
+ kref_get(&chan->refcount);
+ INIT_RCU_HEAD(&peer->rcu);
+ INIT_RCU_HEAD(&chan_ref->rcu);
+ INIT_LIST_HEAD(&peer->peer_node);
+ INIT_LIST_HEAD(&chan_ref->req_node);
+ peer->client = client;
+ chan_ref->chan = chan;
+
+ spin_lock_irqsave(&chan->lock, flags);
+ list_add_tail_rcu(&peer->peer_node, &chan->peers);
+ spin_unlock_irqrestore(&chan->lock, flags);
+
spin_lock_irqsave(&client->lock, flags);
- list_add_tail_rcu(&chan->client_node,
- &client->channels);
+ chan->exclusive = req->exclusive ? client : NULL;
+ list_add_tail_rcu(&chan_ref->req_node,
+ &req->channels);
spin_unlock_irqrestore(&client->lock, flags);
return chan;
+ } else {
+ kfree(peer);
+ kfree(chan_ref);
}
}
}
@@ -173,7 +256,6 @@ void dma_chan_cleanup(struct kref *kref)
{
struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
chan->device->device_free_chan_resources(chan);
- chan->client = NULL;
kref_put(&chan->device->refcount, dma_async_device_cleanup);
}

@@ -186,51 +268,93 @@ static void dma_chan_free_rcu(struct rcu
bias -= local_read(&per_cpu_ptr(chan->local, i)->refcount);
atomic_sub(bias, &chan->refcount.refcount);
kref_put(&chan->refcount, dma_chan_cleanup);
+ kref_put(&chan->device->refcount, dma_async_device_cleanup);
+}
+
+static void dma_peer_free_rcu(struct rcu_head *rcu)
+{
+ struct dma_client_chan_peer *peer =
+ container_of(rcu, struct dma_client_chan_peer, rcu);
+
+ kfree(peer);
+}
+
+static void dma_chan_ref_free_rcu(struct rcu_head *rcu)
+{
+ struct dma_chan_client_ref *chan_ref =
+ container_of(rcu, struct dma_chan_client_ref, rcu);
+
+ kfree(chan_ref);
}

-static void dma_client_chan_free(struct dma_chan *chan)
+static void dma_client_chan_free(struct dma_client *client,
+ struct dma_chan_client_ref *chan_ref)
{
+ struct dma_client_chan_peer *peer;
+ struct dma_chan *chan = chan_ref->chan;
atomic_add(0x7FFFFFFF, &chan->refcount.refcount);
chan->slow_ref = 1;
- call_rcu(&chan->rcu, dma_chan_free_rcu);
+ rcu_read_lock();
+ list_for_each_entry_rcu(peer, &chan->peers, peer_node)
+ if (peer->client == client) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&chan->lock, flags);
+ list_del_rcu(&peer->peer_node);
+ if (list_empty(&chan->peers))
+ chan->exclusive = NULL;
+ spin_unlock_irqrestore(&chan->lock, flags);
+ call_rcu(&peer->rcu, dma_peer_free_rcu);
+ call_rcu(&chan_ref->rcu, dma_chan_ref_free_rcu);
+ call_rcu(&chan->rcu, dma_chan_free_rcu);
+ break;
+ }
+ rcu_read_unlock();
}

/**
* dma_chans_rebalance - reallocate channels to clients
*
- * When the number of DMA channel in the system changes,
- * channels need to be rebalanced among clients.
+ * When the number of DMA channels in the system changes,
+ * channels need to be rebalanced among clients.
*/
static void dma_chans_rebalance(void)
{
struct dma_client *client;
struct dma_chan *chan;
+ struct dma_chan_client_ref *chan_ref;
+
unsigned long flags;
+ int i;

mutex_lock(&dma_list_mutex);

list_for_each_entry(client, &dma_client_list, global_node) {
- while (client->chans_desired > client->chan_count) {
- chan = dma_client_chan_alloc(client);
- if (!chan)
- break;
- client->chan_count++;
- client->event_callback(client,
- chan,
- DMA_RESOURCE_ADDED);
- }
- while (client->chans_desired < client->chan_count) {
- spin_lock_irqsave(&client->lock, flags);
- chan = list_entry(client->channels.next,
- struct dma_chan,
- client_node);
- list_del_rcu(&chan->client_node);
- spin_unlock_irqrestore(&client->lock, flags);
- client->chan_count--;
- client->event_callback(client,
- chan,
- DMA_RESOURCE_REMOVED);
- dma_client_chan_free(chan);
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++) {
+ struct dma_req *req = &client->req[i];
+ while (req->chans_desired > atomic_read(&req->chan_count)) {
+ chan = dma_client_chan_alloc(client, req);
+ if (!chan)
+ break;
+ atomic_inc(&req->chan_count);
+ client->event_callback(client,
+ chan,
+ DMA_RESOURCE_ADDED);
+ }
+ while (req->chans_desired < atomic_read(&req->chan_count)) {
+ spin_lock_irqsave(&client->lock, flags);
+ chan_ref = list_entry(req->channels.next,
+ struct dma_chan_client_ref,
+ req_node);
+ list_del_rcu(&chan_ref->req_node);
+ spin_unlock_irqrestore(&client->lock, flags);
+ atomic_dec(&req->chan_count);
+
+ client->event_callback(client,
+ chan_ref->chan,
+ DMA_RESOURCE_REMOVED);
+ dma_client_chan_free(client, chan_ref);
+ }
}
}

@@ -244,15 +368,18 @@ static void dma_chans_rebalance(void)
struct dma_client *dma_async_client_register(dma_event_callback event_callback)
{
struct dma_client *client;
+ int i;

client = kzalloc(sizeof(*client), GFP_KERNEL);
if (!client)
return NULL;

- INIT_LIST_HEAD(&client->channels);
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++) {
+ INIT_LIST_HEAD(&client->req[i].channels);
+ atomic_set(&client->req[i].chan_count, 0);
+ }
+
spin_lock_init(&client->lock);
- client->chans_desired = 0;
- client->chan_count = 0;
client->event_callback = event_callback;

mutex_lock(&dma_list_mutex);
@@ -270,14 +397,16 @@ struct dma_client *dma_async_client_regi
*/
void dma_async_client_unregister(struct dma_client *client)
{
- struct dma_chan *chan;
+ struct dma_chan_client_ref *chan_ref;
+ int i;

if (!client)
return;

rcu_read_lock();
- list_for_each_entry_rcu(chan, &client->channels, client_node)
- dma_client_chan_free(chan);
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++)
+ list_for_each_entry_rcu(chan_ref, &client->req[i].channels, req_node)
+ dma_client_chan_free(client, chan_ref);
rcu_read_unlock();

mutex_lock(&dma_list_mutex);
@@ -292,17 +421,46 @@ void dma_async_client_unregister(struct
* dma_async_client_chan_request - request DMA channels
* @client: &dma_client
* @number: count of DMA channels requested
+ * @mask: limits the DMA channels returned to those that
+ * have the requisite capabilities
*
* Clients call dma_async_client_chan_request() to specify how many
* DMA channels they need, 0 to free all currently allocated.
* The resulting allocations/frees are indicated to the client via the
- * event callback.
+ * event callback. If the client has exhausted the number of distinct
+ * requests allowed (DMA_MAX_CHAN_TYPE_REQ) this function will return 0.
*/
-void dma_async_client_chan_request(struct dma_client *client,
- unsigned int number)
+int dma_async_client_chan_request(struct dma_client *client,
+ unsigned int number, unsigned int mask)
{
- client->chans_desired = number;
- dma_chans_rebalance();
+ int request_slot_found = 0, i;
+
+ /* adjust an outstanding request */
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++) {
+ struct dma_req *req = &client->req[i];
+ if (req->cap_mask == mask) {
+ req->chans_desired = number;
+ request_slot_found = 1;
+ break;
+ }
+ }
+
+ /* start a new request */
+ if (!request_slot_found)
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++) {
+ struct dma_req *req = &client->req[i];
+ if (!req->chans_desired) {
+ req->chans_desired = number;
+ req->cap_mask = mask;
+ request_slot_found = 1;
+ break;
+ }
+ }
+
+ if (request_slot_found)
+ dma_chans_rebalance();
+
+ return request_slot_found;
}

/**
@@ -335,6 +493,7 @@ int dma_async_device_register(struct dma
device->dev_id, chan->chan_id);

kref_get(&device->refcount);
+ kref_init(&chan->refcount);
class_device_register(&chan->class_dev);
}

@@ -348,6 +507,20 @@ int dma_async_device_register(struct dma
}

/**
+ * dma_async_chan_init - common channel initialization
+ * @chan: &dma_chan
+ * @device: &dma_device
+ */
+void dma_async_chan_init(struct dma_chan *chan, struct dma_device *device)
+{
+ INIT_LIST_HEAD(&chan->peers);
+ INIT_RCU_HEAD(&chan->rcu);
+ spin_lock_init(&chan->lock);
+ chan->device = device;
+ list_add_tail(&chan->device_node, &device->channels);
+}
+
+/**
* dma_async_device_cleanup - function called when all references are released
* @kref: kernel reference object
*/
@@ -366,31 +539,70 @@ static void dma_async_device_cleanup(str
void dma_async_device_unregister(struct dma_device *device)
{
struct dma_chan *chan;
+ struct dma_client_chan_peer *peer;
+ struct dma_req *req;
+ struct dma_chan_client_ref *chan_ref;
+ struct dma_client *client;
+ int i;
unsigned long flags;

mutex_lock(&dma_list_mutex);
list_del(&device->global_node);
mutex_unlock(&dma_list_mutex);

+ /* look up and free each reference to a channel
+ * note: a channel can be allocated to a client once per
+ * request type (DMA_MAX_CHAN_TYPE_REQ)
+ */
list_for_each_entry(chan, &device->channels, device_node) {
- if (chan->client) {
- spin_lock_irqsave(&chan->client->lock, flags);
- list_del(&chan->client_node);
- chan->client->chan_count--;
- spin_unlock_irqrestore(&chan->client->lock, flags);
- chan->client->event_callback(chan->client,
- chan,
- DMA_RESOURCE_REMOVED);
- dma_client_chan_free(chan);
+ rcu_read_lock();
+ list_for_each_entry_rcu(peer, &chan->peers, peer_node) {
+ client = peer->client;
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++) {
+ req = &client->req[i];
+ list_for_each_entry_rcu(chan_ref,
+ &req->channels,
+ req_node) {
+ if (chan_ref->chan == chan) {
+ spin_lock_irqsave(&client->lock, flags);
+ list_del_rcu(&chan_ref->req_node);
+ spin_unlock_irqrestore(&client->lock, flags);
+ atomic_dec(&req->chan_count);
+ client->event_callback(
+ client,
+ chan,
+ DMA_RESOURCE_REMOVED);
+ dma_client_chan_free(client,
+ chan_ref);
+ break;
+ }
+ }
+ }
}
- class_device_unregister(&chan->class_dev);
+ rcu_read_unlock();
+ kref_put(&chan->refcount, dma_chan_cleanup);
+ kref_put(&device->refcount, dma_async_device_cleanup);
}
+
+ class_device_unregister(&chan->class_dev);
+
dma_chans_rebalance();

kref_put(&device->refcount, dma_async_device_cleanup);
wait_for_completion(&device->done);
}

+/**
+ * dma_async_xor_pgs_to_pg_err - default function for dma devices that
+ * do not support xor
+ */
+dma_cookie_t dma_async_xor_pgs_to_pg_err(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off, struct page *src_pgs,
+ unsigned int src_cnt, unsigned int src_off, size_t len)
+{
+ return -ENXIO;
+}
+
static int __init dma_bus_init(void)
{
mutex_init(&dma_list_mutex);
@@ -399,14 +611,17 @@ static int __init dma_bus_init(void)

subsys_initcall(dma_bus_init);

-EXPORT_SYMBOL(dma_async_client_register);
-EXPORT_SYMBOL(dma_async_client_unregister);
-EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
-EXPORT_SYMBOL(dma_async_device_register);
-EXPORT_SYMBOL(dma_async_device_unregister);
-EXPORT_SYMBOL(dma_chan_cleanup);
+EXPORT_SYMBOL_GPL(dma_async_client_register);
+EXPORT_SYMBOL_GPL(dma_async_client_unregister);
+EXPORT_SYMBOL_GPL(dma_async_client_chan_request);
+EXPORT_SYMBOL_GPL(dma_async_memcpy_buf_to_buf);
+EXPORT_SYMBOL_GPL(dma_async_memcpy_buf_to_pg);
+EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to_pg);
+EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg);
+EXPORT_SYMBOL_GPL(dma_async_operation_complete);
+EXPORT_SYMBOL_GPL(dma_async_issue_pending);
+EXPORT_SYMBOL_GPL(dma_async_device_register);
+EXPORT_SYMBOL_GPL(dma_async_device_unregister);
+EXPORT_SYMBOL_GPL(dma_chan_cleanup);
+EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg_err);
+EXPORT_SYMBOL_GPL(dma_async_chan_init);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index dbd4d6c..415de03 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -69,11 +69,7 @@ static int enumerate_dma_channels(struct
spin_lock_init(&ioat_chan->desc_lock);
INIT_LIST_HEAD(&ioat_chan->free_desc);
INIT_LIST_HEAD(&ioat_chan->used_desc);
- /* This should be made common somewhere in dmaengine.c */
- ioat_chan->common.device = &device->common;
- ioat_chan->common.client = NULL;
- list_add_tail(&ioat_chan->common.device_node,
- &device->common.channels);
+ dma_async_chan_init(&ioat_chan->common, &device->common);
}
return device->common.chancnt;
}
@@ -759,8 +755,10 @@ #endif
device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.device_operation_complete = ioat_dma_is_complete;
+ device->common.device_xor_pgs_to_pg = dma_async_xor_pgs_to_pg_err;
+ device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.capabilities = DMA_MEMCPY;
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index c94d8f1..3599472 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -20,7 +20,7 @@
*/
#ifndef DMAENGINE_H
#define DMAENGINE_H
-
+#include <linux/config.h>
#ifdef CONFIG_DMA_ENGINE

#include <linux/device.h>
@@ -65,6 +65,27 @@ enum dma_status {
};

/**
+ * enum dma_capabilities - DMA operational capabilities
+ * @DMA_MEMCPY: src to dest copy
+ * @DMA_XOR: src*n to dest xor
+ * @DMA_DUAL_XOR: src*n to dest_diag and dest_horiz xor
+ * @DMA_PQ_XOR: src*n to dest_q and dest_p gf/xor
+ * @DMA_PQ_UPDATE: P+Q parity update
+ * @DMA_ZERO_SUM: xor of src*n checked against zero
+ * @DMA_PQ_ZERO_SUM: P+Q parity checked against zero
+ * @DMA_MEMSET: fill dest with a value
+ * @DMA_MEMCPY_CRC32C: src to dest copy and crc-32c sum
+ */
+enum dma_capabilities {
+ DMA_MEMCPY = 0x1,
+ DMA_XOR = 0x2,
+ DMA_PQ_XOR = 0x4,
+ DMA_DUAL_XOR = 0x8,
+ DMA_PQ_UPDATE = 0x10,
+ DMA_ZERO_SUM = 0x20,
+ DMA_PQ_ZERO_SUM = 0x40,
+ DMA_MEMSET = 0x80,
+ DMA_MEMCPY_CRC32C = 0x100,
+};
+
+/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
* @memcpy_count: transaction counter
@@ -75,27 +96,32 @@ struct dma_chan_percpu {
local_t refcount;
/* stats */
unsigned long memcpy_count;
+ unsigned long xor_count;
unsigned long bytes_transferred;
+ unsigned long bytes_xor;
};

/**
* struct dma_chan - devices supply DMA channels, clients use them
- * @client: ptr to the client user of this chan, will be %NULL when unused
+ * @peers: list of the clients of this chan, will be 'empty' when unused
* @device: ptr to the dma device who supplies this channel, always !%NULL
* @cookie: last cookie value returned to client
+ * @exclusive: ptr to the client that is exclusively using this channel
+ * @lock: protects access to the peer list
* @chan_id: channel ID for sysfs
* @class_dev: class device for sysfs
* @refcount: kref, used in "bigref" slow-mode
* @slow_ref: indicates that the DMA channel is free
* @rcu: the DMA channel's RCU head
- * @client_node: used to add this to the client chan list
* @device_node: used to add this to the device chan list
* @local: per-cpu pointer to a struct dma_chan_percpu
*/
struct dma_chan {
- struct dma_client *client;
+ struct list_head peers;
struct dma_device *device;
dma_cookie_t cookie;
+ struct dma_client *exclusive;
+ spinlock_t lock;

/* sysfs */
int chan_id;
@@ -105,7 +131,6 @@ struct dma_chan {
int slow_ref;
struct rcu_head rcu;

- struct list_head client_node;
struct list_head device_node;
struct dma_chan_percpu *local;
};
@@ -139,29 +164,66 @@ typedef void (*dma_event_callback) (stru
struct dma_chan *chan, enum dma_event event);

/**
- * struct dma_client - info on the entity making use of DMA services
- * @event_callback: func ptr to call when something happens
+ * struct dma_req - info on the type and number of channels allocated to a client
* @chan_count: number of chans allocated
* @chans_desired: number of chans requested. Can be +/- chan_count
+ * @cap_mask: DMA capabilities required to satisfy this request
+ * @exclusive: Whether this client would like exclusive use of the channel(s)
+ * @channels: list of channel references satisfying this request
+ */
+struct dma_req {
+ atomic_t chan_count;
+ unsigned int chans_desired;
+ unsigned int cap_mask;
+ int exclusive;
+ struct list_head channels;
+};
+
+/**
+ * struct dma_client - info on the entity making use of DMA services
+ * @event_callback: func ptr to call when something happens
+ * @req: tracks client channel requests per capability mask
* @lock: protects access to the channels list
* @channels: the list of DMA channels allocated
* @global_node: list_head for global dma_client_list
*/
+#define DMA_MAX_CHAN_TYPE_REQ 2
struct dma_client {
dma_event_callback event_callback;
- unsigned int chan_count;
- unsigned int chans_desired;
-
+ struct dma_req req[DMA_MAX_CHAN_TYPE_REQ];
spinlock_t lock;
- struct list_head channels;
struct list_head global_node;
};

/**
+ * struct dma_client_chan_peer - info on the entities sharing a DMA channel
+ * @client: &dma_client
+ * @peer_node: node in the channel's list of peer clients
+ * @rcu: rcu head for the peer object
+ */
+struct dma_client_chan_peer {
+ struct dma_client *client;
+ struct list_head peer_node;
+ struct rcu_head rcu;
+};
+
+/**
+ * struct dma_chan_client_ref - reference object for clients to track channels
+ * @chan: channel reference
+ * @req_node: node in the request's list of channel references
+ * @rcu: rcu head for the chan_ref object
+ */
+struct dma_chan_client_ref {
+ struct dma_chan *chan;
+ struct list_head req_node;
+ struct rcu_head rcu;
+};
+
+/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
* @channels: the list of struct dma_chan
* @global_node: list_head for global dma_device_list
+ * @capabilities: channel operations capabilities
* @refcount: reference count
* @done: IO completion struct
* @dev_id: unique device ID
@@ -179,6 +241,7 @@ struct dma_device {
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
+ unsigned long capabilities;

struct kref refcount;
struct completion done;
@@ -195,18 +258,26 @@ struct dma_device {
dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
struct page *dest_pg, unsigned int dest_off,
struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
+ dma_cookie_t (*device_xor_pgs_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page **src_pgs, unsigned int src_cnt,
+ unsigned int src_off, size_t len);
+ enum dma_status (*device_operation_complete)(struct dma_chan *chan,
dma_cookie_t cookie, dma_cookie_t *last,
dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
+ void (*device_issue_pending)(struct dma_chan *chan);
};

/* --- public DMA engine API --- */

struct dma_client *dma_async_client_register(dma_event_callback event_callback);
void dma_async_client_unregister(struct dma_client *client);
-void dma_async_client_chan_request(struct dma_client *client,
- unsigned int number);
+int dma_async_client_chan_request(struct dma_client *client,
+ unsigned int number, unsigned int mask);
+void dma_async_chan_init(struct dma_chan *chan, struct dma_device *device);
+dma_cookie_t dma_async_xor_pgs_to_pg_err(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off, struct page **src_pgs,
+ unsigned int src_cnt, unsigned int src_off, size_t len);

/**
* dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
@@ -284,19 +355,65 @@ static inline dma_cookie_t dma_async_mem
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_xor_pgs_to_pg - offloaded xor from pages to page
+ * @chan: DMA channel to offload xor to
+ * @dest_pg: destination page
+ * @dest_off: offset in page to xor to
+ * @src_pgs: array of source pages
+ * @src_cnt: number of source pages
+ * @src_off: offset in pages to xor from
+ * @len: length
+ *
+ * Both @dest_pg/@dest_off and each of @src_pgs/@src_off must be mappable to
+ * a bus address according to the DMA mapping API rules for streaming
+ * mappings.  All of the pages must stay memory resident (kernel memory or
+ * locked user space pages) until the operation completes.
+ */
+static inline dma_cookie_t dma_async_xor_pgs_to_pg(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off, struct page **src_pgs,
+ unsigned int src_cnt, unsigned int src_off, size_t len)
+{
+ int cpu = get_cpu();
+ per_cpu_ptr(chan->local, cpu)->bytes_xor += len * src_cnt;
+ per_cpu_ptr(chan->local, cpu)->xor_count++;
+ put_cpu();
+
+ return chan->device->device_xor_pgs_to_pg(chan, dest_pg, dest_off,
+ src_pgs, src_cnt, src_off, len);
+}
+
+/**
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
- * This allows drivers to push copies to HW in batches,
+ * This allows drivers to push operations to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(struct dma_chan *chan)
+{
+ return chan->device->device_issue_pending(chan);
+}
+
+/**
+ * dma_async_issue_all - call dma_async_issue_pending on all channels
+ * @client: &dma_client
+ */
+static inline void dma_async_issue_all(struct dma_client *client)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ int i;
+ struct dma_chan_client_ref *chan_ref;
+ struct dma_req *req;
+ for (i = 0; i < DMA_MAX_CHAN_TYPE_REQ; i++) {
+ req = &client->req[i];
+ rcu_read_lock();
+ list_for_each_entry_rcu(chan_ref, &req->channels, req_node)
+ dma_async_issue_pending(chan_ref->chan);
+ rcu_read_unlock();
+ }
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_operation_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +423,10 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
+static inline enum dma_status dma_async_operation_complete(struct dma_chan *chan,
dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ return chan->device->device_operation_complete(chan, cookie, last, used);
}

/**
@@ -318,7 +435,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_operation_complete()
* the test logic is separated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
@@ -334,7 +451,6 @@ static inline enum dma_status dma_async_
return DMA_IN_PROGRESS;
}

-
/* --- DMA device --- */

int dma_async_device_register(struct dma_device *device);
diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..9447f94 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1941,13 +1941,8 @@ #ifdef CONFIG_NET_DMA
* There may not be any more sk_buffs coming right now, so push
* any pending DMA copies to hardware
*/
- if (net_dma_client) {
- struct dma_chan *chan;
- rcu_read_lock();
- list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
- rcu_read_unlock();
- }
+ if (net_dma_client)
+ dma_async_issue_all(net_dma_client);
#endif
local_irq_enable();
return;
@@ -3410,7 +3405,8 @@ #ifdef CONFIG_NET_DMA
static void net_dma_rebalance(void)
{
unsigned int cpu, i, n;
- struct dma_chan *chan;
+ struct dma_chan_client_ref *chan_ref;
+ struct dma_req *req;

if (net_dma_count == 0) {
for_each_online_cpu(cpu)
@@ -3421,13 +3417,16 @@ static void net_dma_rebalance(void)
i = 0;
cpu = first_cpu(cpu_online_map);

+ /* NET_DMA only requests one type of dma channel (memcpy) */
+ req = &net_dma_client->req[0];
+
rcu_read_lock();
- list_for_each_entry(chan, &net_dma_client->channels, client_node) {
+ list_for_each_entry(chan_ref, &req->channels, req_node) {
n = ((num_online_cpus() / net_dma_count)
+ (i < (num_online_cpus() % net_dma_count) ? 1 : 0));

while(n) {
- per_cpu(softnet_data, cpu).net_dma = chan;
+ per_cpu(softnet_data, cpu).net_dma = chan_ref->chan;
cpu = next_cpu(cpu, cpu_online_map);
n--;
}
@@ -3471,7 +3470,7 @@ static int __init netdev_dma_register(vo
if (net_dma_client == NULL)
return -ENOMEM;

- dma_async_client_chan_request(net_dma_client, num_online_cpus());
+ dma_async_client_chan_request(net_dma_client, num_online_cpus(), DMA_MEMCPY);
return 0;
}

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 934396b..cd8ad41 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1431,9 +1431,9 @@ #ifdef CONFIG_NET_DMA
struct sk_buff *skb;
dma_cookie_t done, used;

- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_issue_pending(tp->ucopy.dma_chan);

- while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
+ while (dma_async_operation_complete(tp->ucopy.dma_chan,
tp->ucopy.dma_cookie, &done,
&used) == DMA_IN_PROGRESS) {
/* do partial cleanup of sk_async_wait_queue */

2006-09-11 23:23:53

by Dan Williams

[permalink] [raw]
Subject: [PATCH 03/19] raid5: move check parity operations to a workqueue

From: Dan Williams <[email protected]>

Enable handle_stripe5 to pass off check parity operations to
raid5_do_soft_block_ops formerly handled by compute_parity5.

Changelog:
* removed handle_check_operations5. All logic moved into handle_stripe5 so
that we do not need to go through the initiation logic to end the
operation.
* clear the uptodate bit on the parity block
* hold off check operations if a parity-dependent operation is in flight,
such as a write

Signed-off-by: Dan Williams <[email protected]>
---
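In outline, the check path in handle_stripe5 now behaves as follows
(condensed from the diff below, not a literal excerpt; 'parity_check_due'
stands in for the existing syncing && locked == 0 && !STRIPE_INSYNC test):

	if (parity_check_due || test_bit(STRIPE_OP_CHECK, &sh->state)) {
		if (!test_bit(STRIPE_OP_CHECK, &sh->state)) {
			/* initiate: queue a check and invalidate the
			 * cached parity block
			 */
			set_bit(STRIPE_OP_CHECK, &sh->state);
			set_bit(STRIPE_OP_CHECK_Gen, &sh->ops.state);
			clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
		} else if (test_and_clear_bit(STRIPE_OP_CHECK_Done,
						&sh->ops.state)) {
			/* conclude: STRIPE_OP_CHECK_IsZero set means the
			 * parity was correct, otherwise count a mismatch
			 * and repair unless MD_RECOVERY_CHECK is set
			 */
		}
	}
	/* write-back of a repaired block waits until STRIPE_OP_CHECK clears */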

drivers/md/raid5.c | 60 ++++++++++++++++++++++++++++++++++++----------------
1 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e39d248..24ed4d8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2121,35 +2121,59 @@ #endif
locked += handle_write_operations5(sh, rcw);
}

- /* maybe we need to check and possibly fix the parity for this stripe
- * Any reads will already have been scheduled, so we just see if enough data
- * is available
+ /* 1/ Maybe we need to check and possibly fix the parity for this stripe.
+ * Any reads will already have been scheduled, so we just see if enough data
+ * is available.
+ * 2/ Hold off parity checks while parity dependent operations are in flight
+ * (RCW and RMW are protected by 'locked')
*/
- if (syncing && locked == 0 &&
- !test_bit(STRIPE_INSYNC, &sh->state)) {
+ if ((syncing && locked == 0 &&
+ !test_bit(STRIPE_INSYNC, &sh->state)) ||
+ test_bit(STRIPE_OP_CHECK, &sh->state)) {
+
set_bit(STRIPE_HANDLE, &sh->state);
+ /* Take one of the following actions:
+ * 1/ start a check parity operation if (uptodate == disks)
+ * 2/ finish a check parity operation and act on the result
+ */
if (failed == 0) {
- BUG_ON(uptodate != disks);
- compute_parity5(sh, CHECK_PARITY);
- uptodate--;
- if (page_is_zero(sh->dev[sh->pd_idx].page)) {
- /* parity is correct (on disc, not in buffer any more) */
- set_bit(STRIPE_INSYNC, &sh->state);
- } else {
- conf->mddev->resync_mismatches += STRIPE_SECTORS;
- if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
- /* don't try to repair!! */
+ if (!test_bit(STRIPE_OP_CHECK, &sh->state)) {
+ BUG_ON(uptodate != disks);
+ set_bit(STRIPE_OP_CHECK, &sh->state);
+ set_bit(STRIPE_OP_CHECK_Gen, &sh->ops.state);
+ clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+ sh->ops.pending++;
+ uptodate--;
+ } else if (test_and_clear_bit(STRIPE_OP_CHECK_Done, &sh->ops.state)) {
+ clear_bit(STRIPE_OP_CHECK, &sh->state);
+
+ if (test_and_clear_bit(STRIPE_OP_CHECK_IsZero,
+ &sh->ops.state))
+ /* parity is correct (on disc, not in buffer any more) */
set_bit(STRIPE_INSYNC, &sh->state);
else {
- compute_block(sh, sh->pd_idx);
- uptodate++;
+ conf->mddev->resync_mismatches += STRIPE_SECTORS;
+ if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
+ /* don't try to repair!! */
+ set_bit(STRIPE_INSYNC, &sh->state);
+ else {
+ compute_block(sh, sh->pd_idx);
+ uptodate++;
+ }
}
}
}
- if (!test_bit(STRIPE_INSYNC, &sh->state)) {
+
+ /* Wait for check parity operations to complete
+ * before write-back
+ */
+ if (!test_bit(STRIPE_INSYNC, &sh->state) &&
+ !test_bit(STRIPE_OP_CHECK, &sh->state)) {
+
/* either failed parity check, or recovery is happening */
if (failed==0)
failed_num = sh->pd_idx;
+
dev = &sh->dev[failed_num];
BUG_ON(!test_bit(R5_UPTODATE, &dev->flags));
BUG_ON(uptodate != disks);

2006-09-11 23:34:13

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 01/19] raid5: raid5_do_soft_block_ops

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> raid5_do_soft_block_ops consolidates all the stripe cache maintenance
> operations into a single routine. The stripe operations are:
> * copying data between the stripe cache and user application buffers
> * computing blocks to save a disk access, or to recover a missing block
> * updating the parity on a write operation (reconstruct write and
> read-modify-write)
> * checking parity correctness
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> drivers/md/raid5.c | 289 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/raid/raid5.h | 129 +++++++++++++++++++-
> 2 files changed, 415 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4500660..8fde62b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -1362,6 +1362,295 @@ static int stripe_to_pdidx(sector_t stri
> return pd_idx;
> }
>
> +/*
> + * raid5_do_soft_block_ops - perform block memory operations on stripe data
> + * outside the spin lock.
> + */
> +static void raid5_do_soft_block_ops(void *stripe_head_ref)

This function absolutely must be broken up into multiple functions,
presumably one per operation.

Jeff



2006-09-11 23:36:55

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 02/19] raid5: move write operations to a workqueue

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> Enable handle_stripe5 to pass off write operations to
> raid5_do_soft_blocks_ops (which can be run as a workqueue). The operations
> moved are reconstruct-writes and read-modify-writes formerly handled by
> compute_parity5.
>
> Changelog:
> * moved raid5_do_soft_block_ops changes into a separate patch
> * changed handle_write_operations5 to only initiate write operations, which
> prevents new writes from being requested while the current one is in flight
> * all blocks undergoing a write are now marked locked and !uptodate at the
> beginning of the write operation
> * blocks undergoing a read-modify-write need a request flag to distinguish
> them from blocks that are locked for reading. Reconstruct-writes still use
> the R5_LOCKED bit to select blocks for the operation
> * integrated the work queue Kconfig option
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> drivers/md/Kconfig | 21 +++++
> drivers/md/raid5.c | 192 ++++++++++++++++++++++++++++++++++++++------
> include/linux/raid/raid5.h | 3 +
> 3 files changed, 190 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index bf869ed..2a16b3b 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -162,6 +162,27 @@ config MD_RAID5_RESHAPE
> There should be enough spares already present to make the new
> array workable.
>
> +config MD_RAID456_WORKQUEUE
> + depends on MD_RAID456
> + bool "Offload raid work to a workqueue from raid5d"
> + ---help---
> + This option enables raid work (block copy and xor operations)
> + to run in a workqueue. If your platform has a high context
> + switch penalty say N. If you are using hardware offload or
> + are running on an SMP platform say Y.
> +
> + If unsure say, Y.
> +
> +config MD_RAID456_WORKQUEUE_MULTITHREAD
> + depends on MD_RAID456_WORKQUEUE && SMP
> + bool "Enable multi-threaded raid processing"
> + default y
> + ---help---
> + This option controls whether the raid workqueue will be multi-
> + threaded or single threaded.
> +
> + If unsure say, Y.

In the final patch that gets merged, these configuration options should
go away. We are very anti-#ifdef in Linux, for a variety of reasons.
In this particular instance, code complexity increases and
maintainability decreases as the #ifdef forest grows.

Jeff



2006-09-11 23:38:10

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

Dan Williams wrote:
> Neil,
>
> The following patches implement hardware accelerated raid5 for the Intel
> XscaleĀ® series of I/O Processors. The MD changes allow stripe
> operations to run outside the spin lock in a work queue. Hardware
> acceleration is achieved by using a dma-engine-aware work queue routine
> instead of the default software only routine.
>
> Since the last release of the raid5 changes many bug fixes and other
> improvements have been made as a result of stress testing. See the per
> patch change logs for more information about what was fixed. This
> release is the first release of the full dma implementation.
>
> The patches touch 3 areas, the md-raid5 driver, the generic dmaengine
> interface, and a platform device driver for IOPs. The raid5 changes
> follow your comments concerning making the acceleration implementation
> similar to how the stripe cache handles I/O requests. The dmaengine
> changes are the second release of this code. They expand the interface
> to handle more than memcpy operations, and add a generic raid5-dma
> client. The iop-adma driver supports dma memcpy, xor, xor zero sum, and
> memset across all IOP architectures (32x, 33x, and 13xx).
>
> Concerning the context switching performance concerns raised at the
> previous release, I have observed the following. For the hardware
> accelerated case it appears that performance is always better with the
> work queue than without since it allows multiple stripes to be operated
> on simultaneously. I expect the same for an SMP platform, but so far my
> testing has been limited to IOPs. For a single-processor
> non-accelerated configuration I have not observed performance
> degradation with work queue support enabled, but in the Kconfig option
> help text I recommend disabling it (CONFIG_MD_RAID456_WORKQUEUE).
>
> Please consider the patches for -mm.
>
> -Dan
>
> [PATCH 01/19] raid5: raid5_do_soft_block_ops
> [PATCH 02/19] raid5: move write operations to a workqueue
> [PATCH 03/19] raid5: move check parity operations to a workqueue
> [PATCH 04/19] raid5: move compute block operations to a workqueue
> [PATCH 05/19] raid5: move read completion copies to a workqueue
> [PATCH 06/19] raid5: move the reconstruct write expansion operation to a workqueue
> [PATCH 07/19] raid5: remove compute_block and compute_parity5
> [PATCH 08/19] dmaengine: enable multiple clients and operations
> [PATCH 09/19] dmaengine: reduce backend address permutations
> [PATCH 10/19] dmaengine: expose per channel dma mapping characteristics to clients
> [PATCH 11/19] dmaengine: add memset as an asynchronous dma operation
> [PATCH 12/19] dmaengine: dma_async_memcpy_err for DMA engines that do not support memcpy
> [PATCH 13/19] dmaengine: add support for dma xor zero sum operations
> [PATCH 14/19] dmaengine: add dma_sync_wait
> [PATCH 15/19] dmaengine: raid5 dma client
> [PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines
> [PATCH 17/19] iop3xx: define IOP3XX_REG_ADDR[32|16|8] and clean up DMA/AAU defs
> [PATCH 18/19] iop3xx: Give Linux control over PCI (ATU) initialization
> [PATCH 19/19] iop3xx: IOP 32x and 33x support for the iop-adma driver

Can devices like drivers/scsi/sata_sx4.c or drivers/scsi/sata_promise.c
take advantage of this? Promise silicon supports RAID5 XOR offload.

If so, how? If not, why not? :)

Jeff



2006-09-11 23:44:21

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

Dan Williams wrote:
> @@ -759,8 +755,10 @@ #endif
> device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
> device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
> device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
> - device->common.device_memcpy_complete = ioat_dma_is_complete;
> - device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
> + device->common.device_operation_complete = ioat_dma_is_complete;
> + device->common.device_xor_pgs_to_pg = dma_async_xor_pgs_to_pg_err;
> + device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
> + device->common.capabilities = DMA_MEMCPY;


Are we really going to add a set of hooks for each DMA engine whizbang
feature?

That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
aes, and a dozen other transforms.


> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index c94d8f1..3599472 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -20,7 +20,7 @@
> */
> #ifndef DMAENGINE_H
> #define DMAENGINE_H
> -
> +#include <linux/config.h>
> #ifdef CONFIG_DMA_ENGINE
>
> #include <linux/device.h>
> @@ -65,6 +65,27 @@ enum dma_status {
> };
>
> /**
> + * enum dma_capabilities - DMA operational capabilities
> + * @DMA_MEMCPY: src to dest copy
> + * @DMA_XOR: src*n to dest xor
> + * @DMA_DUAL_XOR: src*n to dest_diag and dest_horiz xor
> + * @DMA_PQ_XOR: src*n to dest_q and dest_p gf/xor
> + * @DMA_MEMCPY_CRC32C: src to dest copy and crc-32c sum
> + * @DMA_SHARE: multiple clients can use this channel
> + */
> +enum dma_capabilities {
> + DMA_MEMCPY = 0x1,
> + DMA_XOR = 0x2,
> + DMA_PQ_XOR = 0x4,
> + DMA_DUAL_XOR = 0x8,
> + DMA_PQ_UPDATE = 0x10,
> + DMA_ZERO_SUM = 0x20,
> + DMA_PQ_ZERO_SUM = 0x40,
> + DMA_MEMSET = 0x80,
> + DMA_MEMCPY_CRC32C = 0x100,

Please use the more readable style that explicitly lists bits:

DMA_MEMCPY = (1 << 0),
DMA_XOR = (1 << 1),
...
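
Spelled out for the enum above, with the values unchanged, that would read:

enum dma_capabilities {
        DMA_MEMCPY        = (1 << 0),
        DMA_XOR           = (1 << 1),
        DMA_PQ_XOR        = (1 << 2),
        DMA_DUAL_XOR      = (1 << 3),
        DMA_PQ_UPDATE     = (1 << 4),
        DMA_ZERO_SUM      = (1 << 5),
        DMA_PQ_ZERO_SUM   = (1 << 6),
        DMA_MEMSET        = (1 << 7),
        DMA_MEMCPY_CRC32C = (1 << 8),
};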


> +/**
> * struct dma_chan_percpu - the per-CPU part of struct dma_chan
> * @refcount: local_t used for open-coded "bigref" counting
> * @memcpy_count: transaction counter
> @@ -75,27 +96,32 @@ struct dma_chan_percpu {
> local_t refcount;
> /* stats */
> unsigned long memcpy_count;
> + unsigned long xor_count;
> unsigned long bytes_transferred;
> + unsigned long bytes_xor;

Clearly, each operation needs to be more compartmentalized.

This just isn't scalable, when you consider all the possible transforms.

Jeff


2006-09-11 23:50:13

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 11/19] dmaengine: add memset as an asynchronous dma operation

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> Changelog:
> * make the dmaengine api EXPORT_SYMBOL_GPL
> * zero sum support should be standalone, not integrated into xor
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> drivers/dma/dmaengine.c | 15 ++++++++++
> drivers/dma/ioatdma.c | 5 +++
> include/linux/dmaengine.h | 68 +++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 88 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index e78ce89..fe62237 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -604,6 +604,17 @@ dma_cookie_t dma_async_do_xor_err(struct
> return -ENXIO;
> }
>
> +/**
> + * dma_async_do_memset_err - default function for dma devices that
> + * do not support memset
> + */
> +dma_cookie_t dma_async_do_memset_err(struct dma_chan *chan,
> + union dmaengine_addr dest, unsigned int dest_off,
> + int val, size_t len, unsigned long flags)
> +{
> + return -ENXIO;
> +}
> +
> static int __init dma_bus_init(void)
> {
> mutex_init(&dma_list_mutex);
> @@ -621,6 +632,9 @@ EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to
> EXPORT_SYMBOL_GPL(dma_async_memcpy_dma_to_dma);
> EXPORT_SYMBOL_GPL(dma_async_memcpy_pg_to_dma);
> EXPORT_SYMBOL_GPL(dma_async_memcpy_dma_to_pg);
> +EXPORT_SYMBOL_GPL(dma_async_memset_buf);
> +EXPORT_SYMBOL_GPL(dma_async_memset_page);
> +EXPORT_SYMBOL_GPL(dma_async_memset_dma);
> EXPORT_SYMBOL_GPL(dma_async_xor_pgs_to_pg);
> EXPORT_SYMBOL_GPL(dma_async_xor_dma_list_to_dma);
> EXPORT_SYMBOL_GPL(dma_async_operation_complete);
> @@ -629,6 +643,7 @@ EXPORT_SYMBOL_GPL(dma_async_device_regis
> EXPORT_SYMBOL_GPL(dma_async_device_unregister);
> EXPORT_SYMBOL_GPL(dma_chan_cleanup);
> EXPORT_SYMBOL_GPL(dma_async_do_xor_err);
> +EXPORT_SYMBOL_GPL(dma_async_do_memset_err);
> EXPORT_SYMBOL_GPL(dma_async_chan_init);
> EXPORT_SYMBOL_GPL(dma_async_map_page);
> EXPORT_SYMBOL_GPL(dma_async_map_single);
> diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
> index 0159d14..231247c 100644
> --- a/drivers/dma/ioatdma.c
> +++ b/drivers/dma/ioatdma.c
> @@ -637,6 +637,10 @@ extern dma_cookie_t dma_async_do_xor_err
> union dmaengine_addr src, unsigned int src_cnt,
> unsigned int src_off, size_t len, unsigned long flags);
>
> +extern dma_cookie_t dma_async_do_memset_err(struct dma_chan *chan,
> + union dmaengine_addr dest, unsigned int dest_off,
> + int val, size_t size, unsigned long flags);
> +
> static dma_addr_t ioat_map_page(struct dma_chan *chan, struct page *page,
> unsigned long offset, size_t size,
> int direction)
> @@ -748,6 +752,7 @@ #endif
> device->common.capabilities = DMA_MEMCPY;
> device->common.device_do_dma_memcpy = do_ioat_dma_memcpy;
> device->common.device_do_dma_xor = dma_async_do_xor_err;
> + device->common.device_do_dma_memset = dma_async_do_memset_err;
> device->common.map_page = ioat_map_page;
> device->common.map_single = ioat_map_single;
> device->common.unmap_page = ioat_unmap_page;
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index cb4cfcf..8d53b08 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -260,6 +260,7 @@ struct dma_chan_client_ref {
> * @device_issue_pending: push appended descriptors to hardware
> * @device_do_dma_memcpy: perform memcpy with a dma engine
> * @device_do_dma_xor: perform block xor with a dma engine
> + * @device_do_dma_memset: perform block fill with a dma engine
> */
> struct dma_device {
>
> @@ -284,6 +285,9 @@ struct dma_device {
> union dmaengine_addr src, unsigned int src_cnt,
> unsigned int src_off, size_t len,
> unsigned long flags);
> + dma_cookie_t (*device_do_dma_memset)(struct dma_chan *chan,
> + union dmaengine_addr dest, unsigned int dest_off,
> + int value, size_t len, unsigned long flags);

Same comment as for XOR: adding operations in this way just isn't scalable.

Operations need to be more compartmentalized.

Maybe a client could do:

struct adma_transaction adma_xact;

/* fill in hooks with XOR-specific info */
init_XScale_xor(adma_device, &adma_xact, my_completion_func);

/* initiate transaction */
adma_go(&adma_xact);

/* callback signals completion asynchronously */

2006-09-11 23:51:16

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 12/19] dmaengine: dma_async_memcpy_err for DMA engines that do not support memcpy

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> Default virtual function that returns an error if the user attempts a
> memcpy operation. An XOR engine is an example of a DMA engine that does
> not support memcpy.
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> drivers/dma/dmaengine.c | 13 +++++++++++++
> 1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index fe62237..33ad690 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -593,6 +593,18 @@ void dma_async_device_unregister(struct
> }
>
> /**
> + * dma_async_do_memcpy_err - default function for dma devices that
> + * do not support memcpy
> + */
> +dma_cookie_t dma_async_do_memcpy_err(struct dma_chan *chan,
> + union dmaengine_addr dest, unsigned int dest_off,
> + union dmaengine_addr src, unsigned int src_off,
> + size_t len, unsigned long flags)
> +{
> + return -ENXIO;
> +}

Further illustration of how this API growth is going wrong. You should
create an API such that it is impossible for an XOR transform to ever
call non-XOR-transform hooks.

Jeff



2006-09-11 23:52:12

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 14/19] dmaengine: add dma_sync_wait

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> dma_sync_wait is a common routine to live wait for a dma operation to
> complete.
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> include/linux/dmaengine.h | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> index 9fd6cbd..0a70c9e 100644
> --- a/include/linux/dmaengine.h
> +++ b/include/linux/dmaengine.h
> @@ -750,6 +750,18 @@ static inline void dma_async_unmap_singl
> chan->device->unmap_single(chan, handle, size, direction);
> }
>
> +static inline enum dma_status dma_sync_wait(struct dma_chan *chan,
> + dma_cookie_t cookie)
> +{
> + enum dma_status status;
> + dma_async_issue_pending(chan);
> + do {
> + status = dma_async_operation_complete(chan, cookie, NULL, NULL);
> + } while (status == DMA_IN_PROGRESS);
> +
> + return status;

Where are the timeouts, etc.? Looks like an infinite loop to me, in the
worst case.

Jeff
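
A bounded variant might look like the sketch below; the five-second cap and
the use of DMA_ERROR as the timeout result are assumptions, not part of the
posted interface:

static inline enum dma_status dma_sync_wait(struct dma_chan *chan,
                                            dma_cookie_t cookie)
{
        unsigned long timeout = jiffies + msecs_to_jiffies(5000);
        enum dma_status status;

        dma_async_issue_pending(chan);
        do {
                status = dma_async_operation_complete(chan, cookie, NULL, NULL);
                if (time_after(jiffies, timeout)) {
                        WARN_ON(1);     /* channel appears wedged */
                        return DMA_ERROR;       /* assumed status value */
                }
                cpu_relax();
        } while (status == DMA_IN_PROGRESS);

        return status;
}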



2006-09-11 23:53:07

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On 9/11/06, Jeff Garzik <[email protected]> wrote:
> Dan Williams wrote:
> > Neil,
> >
> > The following patches implement hardware accelerated raid5 for the Intel
> > Xscale(r) series of I/O Processors. The MD changes allow stripe
> > operations to run outside the spin lock in a work queue. Hardware
> > acceleration is achieved by using a dma-engine-aware work queue routine
> > instead of the default software only routine.
> >
> > Since the last release of the raid5 changes many bug fixes and other
> > improvements have been made as a result of stress testing. See the per
> > patch change logs for more information about what was fixed. This
> > release is the first release of the full dma implementation.
> >
> > The patches touch 3 areas, the md-raid5 driver, the generic dmaengine
> > interface, and a platform device driver for IOPs. The raid5 changes
> > follow your comments concerning making the acceleration implementation
> > similar to how the stripe cache handles I/O requests. The dmaengine
> > changes are the second release of this code. They expand the interface
> > to handle more than memcpy operations, and add a generic raid5-dma
> > client. The iop-adma driver supports dma memcpy, xor, xor zero sum, and
> > memset across all IOP architectures (32x, 33x, and 13xx).
> >
> > Concerning the context switching performance concerns raised at the
> > previous release, I have observed the following. For the hardware
> > accelerated case it appears that performance is always better with the
> > work queue than without since it allows multiple stripes to be operated
> > on simultaneously. I expect the same for an SMP platform, but so far my
> > testing has been limited to IOPs. For a single-processor
> > non-accelerated configuration I have not observed performance
> > degradation with work queue support enabled, but in the Kconfig option
> > help text I recommend disabling it (CONFIG_MD_RAID456_WORKQUEUE).
> >
> > Please consider the patches for -mm.
> >
> > -Dan
> >
> > [PATCH 01/19] raid5: raid5_do_soft_block_ops
> > [PATCH 02/19] raid5: move write operations to a workqueue
> > [PATCH 03/19] raid5: move check parity operations to a workqueue
> > [PATCH 04/19] raid5: move compute block operations to a workqueue
> > [PATCH 05/19] raid5: move read completion copies to a workqueue
> > [PATCH 06/19] raid5: move the reconstruct write expansion operation to a workqueue
> > [PATCH 07/19] raid5: remove compute_block and compute_parity5
> > [PATCH 08/19] dmaengine: enable multiple clients and operations
> > [PATCH 09/19] dmaengine: reduce backend address permutations
> > [PATCH 10/19] dmaengine: expose per channel dma mapping characteristics to clients
> > [PATCH 11/19] dmaengine: add memset as an asynchronous dma operation
> > [PATCH 12/19] dmaengine: dma_async_memcpy_err for DMA engines that do not support memcpy
> > [PATCH 13/19] dmaengine: add support for dma xor zero sum operations
> > [PATCH 14/19] dmaengine: add dma_sync_wait
> > [PATCH 15/19] dmaengine: raid5 dma client
> > [PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines
> > [PATCH 17/19] iop3xx: define IOP3XX_REG_ADDR[32|16|8] and clean up DMA/AAU defs
> > [PATCH 18/19] iop3xx: Give Linux control over PCI (ATU) initialization
> > [PATCH 19/19] iop3xx: IOP 32x and 33x support for the iop-adma driver
>
> Can devices like drivers/scsi/sata_sx4.c or drivers/scsi/sata_promise.c
> take advantage of this? Promise silicon supports RAID5 XOR offload.
>
> If so, how? If not, why not? :)
This is a frequently asked question; Alan Cox had the same one at OLS.
The answer is "probably." The only complication I currently see is
where/how the stripe cache is maintained. With the IOPs it's easy
because the DMA engines operate directly on kernel memory. With the
Promise card I believe they have memory on the card, and it's not clear
to me whether the XOR engines on the card can deal with host memory. Also,
MD would need to be modified to handle a stripe cache located on a
device, or to somehow synchronize its local cache with the card in a manner
that is still able to beat software-only MD.

> Jeff

Dan

2006-09-11 23:54:16

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 15/19] dmaengine: raid5 dma client

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> Adds a dmaengine client that is the hardware accelerated version of
> raid5_do_soft_block_ops. It utilizes the raid5 workqueue implementation to
> operate on multiple stripes simultaneously. See the iop-adma.c driver for
> an example of a driver that enables hardware accelerated raid5.
>
> Changelog:
> * mark operations as _Dma rather than _Done until all outstanding
> operations have completed. Once all operations have completed update the
> state and return it to the handle list
> * add a helper routine to retrieve the last used cookie
> * use dma_async_zero_sum_dma_list for checking parity which optionally
> allows parity check operations to not dirty the parity block in the cache
> (if 'disks' is less than 'MAX_ADMA_XOR_SOURCES')
> * remove dependencies on iop13xx
> * take into account the fact that dma engines have a staging buffer so we
> can perform 1 less block operation compared to software xor
> * added __arch_raid5_dma_chan_request __arch_raid5_dma_next_channel and
> __arch_raid5_dma_check_channel to make the driver architecture independent
> * added channel switching capability for architectures that implement
> different operations (i.e. copy & xor) on individual channels
> * added initial support for "non-blocking" channel switching
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> drivers/dma/Kconfig | 9 +
> drivers/dma/Makefile | 1
> drivers/dma/raid5-dma.c | 730 ++++++++++++++++++++++++++++++++++++++++++++
> drivers/md/Kconfig | 11 +
> drivers/md/raid5.c | 66 ++++
> include/linux/dmaengine.h | 5
> include/linux/raid/raid5.h | 24 +
> 7 files changed, 839 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 30d021d..fced8c3 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -22,6 +22,15 @@ config NET_DMA
> Since this is the main user of the DMA engine, it should be enabled;
> say Y here.
>
> +config RAID5_DMA
> + tristate "MD raid5: block operations offload"
> + depends on INTEL_IOP_ADMA && MD_RAID456
> + default y
> + ---help---
> + This enables the use of DMA engines in the MD-RAID5 driver to
> + offload stripe cache operations, freeing CPU cycles.
> + say Y here
> +
> comment "DMA Devices"
>
> config INTEL_IOATDMA
> diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
> index bdcfdbd..4e36d6e 100644
> --- a/drivers/dma/Makefile
> +++ b/drivers/dma/Makefile
> @@ -1,3 +1,4 @@
> obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
> obj-$(CONFIG_NET_DMA) += iovlock.o
> +obj-$(CONFIG_RAID5_DMA) += raid5-dma.o
> obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
> diff --git a/drivers/dma/raid5-dma.c b/drivers/dma/raid5-dma.c
> new file mode 100644
> index 0000000..04a1790
> --- /dev/null
> +++ b/drivers/dma/raid5-dma.c
> @@ -0,0 +1,730 @@
> +/*
> + * Offload raid5 operations to hardware RAID engines
> + * Copyright(c) 2006 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; if not, write to the Free Software Foundation, Inc., 59
> + * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * The full GNU General Public License is included in this distribution in the
> + * file called COPYING.
> + */
> +
> +#include <linux/raid/raid5.h>
> +#include <linux/dmaengine.h>
> +
> +static struct dma_client *raid5_dma_client;
> +static atomic_t raid5_count;
> +extern void release_stripe(struct stripe_head *sh);
> +extern void __arch_raid5_dma_chan_request(struct dma_client *client);
> +extern struct dma_chan *__arch_raid5_dma_next_channel(struct dma_client *client);
> +
> +#define MAX_HW_XOR_SRCS 16
> +
> +#ifndef STRIPE_SIZE
> +#define STRIPE_SIZE PAGE_SIZE
> +#endif
> +
> +#ifndef STRIPE_SECTORS
> +#define STRIPE_SECTORS (STRIPE_SIZE>>9)
> +#endif
> +
> +#ifndef r5_next_bio
> +#define r5_next_bio(bio, sect) ( ( (bio)->bi_sector + ((bio)->bi_size>>9) < sect + STRIPE_SECTORS) ? (bio)->bi_next : NULL)
> +#endif
> +
> +#define DMA_RAID5_DEBUG 0
> +#define PRINTK(x...) ((void)(DMA_RAID5_DEBUG && printk(x)))
> +
> +/*
> + * Copy data between a page in the stripe cache, and one or more bion
> + * The page could align with the middle of the bio, or there could be
> + * several bion, each with several bio_vecs, which cover part of the page
> + * Multiple bion are linked together on bi_next. There may be extras
> + * at the end of this list. We ignore them.
> + */
> +static dma_cookie_t dma_raid_copy_data(int frombio, struct bio *bio,
> + dma_addr_t dma, sector_t sector, struct dma_chan *chan,
> + dma_cookie_t cookie)
> +{
> + struct bio_vec *bvl;
> + struct page *bio_page;
> + int i;
> + int dma_offset;
> + dma_cookie_t last_cookie = cookie;
> +
> + if (bio->bi_sector >= sector)
> + dma_offset = (signed)(bio->bi_sector - sector) * 512;
> + else
> + dma_offset = (signed)(sector - bio->bi_sector) * -512;
> + bio_for_each_segment(bvl, bio, i) {
> + int len = bio_iovec_idx(bio,i)->bv_len;
> + int clen;
> + int b_offset = 0;
> +
> + if (dma_offset < 0) {
> + b_offset = -dma_offset;
> + dma_offset += b_offset;
> + len -= b_offset;
> + }
> +
> + if (len > 0 && dma_offset + len > STRIPE_SIZE)
> + clen = STRIPE_SIZE - dma_offset;
> + else clen = len;
> +
> + if (clen > 0) {
> + b_offset += bio_iovec_idx(bio,i)->bv_offset;
> + bio_page = bio_iovec_idx(bio,i)->bv_page;
> + if (frombio)
> + do {
> + cookie = dma_async_memcpy_pg_to_dma(chan,
> + dma + dma_offset,
> + bio_page,
> + b_offset,
> + clen);
> + if (cookie == -ENOMEM)
> + dma_sync_wait(chan, last_cookie);
> + else
> + WARN_ON(cookie <= 0);
> + } while (cookie == -ENOMEM);
> + else
> + do {
> + cookie = dma_async_memcpy_dma_to_pg(chan,
> + bio_page,
> + b_offset,
> + dma + dma_offset,
> + clen);
> + if (cookie == -ENOMEM)
> + dma_sync_wait(chan, last_cookie);
> + else
> + WARN_ON(cookie <= 0);
> + } while (cookie == -ENOMEM);
> + }
> + last_cookie = cookie;
> + if (clen < len) /* hit end of page */
> + break;
> + dma_offset += len;
> + }
> +
> + return last_cookie;
> +}
> +
> +#define issue_xor() do { \
> + do { \
> + cookie = dma_async_xor_dma_list_to_dma( \
> + sh->ops.dma_chan, \
> + xor_destination_addr, \
> + dma, \
> + count, \
> + STRIPE_SIZE); \
> + if (cookie == -ENOMEM) \
> + dma_sync_wait(sh->ops.dma_chan, \
> + sh->ops.dma_cookie); \
> + else \
> + WARN_ON(cookie <= 0); \
> + } while (cookie == -ENOMEM); \
> + sh->ops.dma_cookie = cookie; \
> + dma[0] = xor_destination_addr; \
> + count = 1; \
> + } while(0)
> +#define check_xor() do { \
> + if (count == MAX_HW_XOR_SRCS) \
> + issue_xor(); \
> + } while (0)
> +
> +#ifdef CONFIG_RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH
> +extern struct dma_chan *__arch_raid5_dma_check_channel(struct dma_chan *chan,
> + dma_cookie_t cookie,
> + struct dma_client *client,
> + unsigned long capabilities);
> +
> +#ifdef CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE
> +#define check_channel(cap, bookmark) do { \
> +bookmark: \
> + next_chan = __arch_raid5_dma_check_channel(sh->ops.dma_chan, \
> + sh->ops.dma_cookie, \
> + raid5_dma_client, \
> + (cap)); \
> + if (!next_chan) { \
> + BUG_ON(sh->ops.ops_bookmark); \
> + sh->ops.ops_bookmark = &&bookmark; \
> + goto raid5_dma_retry; \
> + } else { \
> + sh->ops.dma_chan = next_chan; \
> + sh->ops.dma_cookie = dma_async_get_last_cookie( \
> + next_chan); \
> + sh->ops.ops_bookmark = NULL; \
> + } \
> +} while (0)
> +#else
> +#define check_channel(cap, bookmark) do { \
> +bookmark: \
> + next_chan = __arch_raid5_dma_check_channel(sh->ops.dma_chan, \
> + sh->ops.dma_cookie, \
> + raid5_dma_client, \
> + (cap)); \
> + if (!next_chan) { \
> + dma_sync_wait(sh->ops.dma_chan, sh->ops.dma_cookie); \
> + goto bookmark; \
> + } else { \
> + sh->ops.dma_chan = next_chan; \
> + sh->ops.dma_cookie = dma_async_get_last_cookie( \
> + next_chan); \
> + } \
> +} while (0)
> +#endif /* CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE */
> +#else
> +#define check_channel(cap, bookmark) do { } while (0)
> +#endif /* CONFIG_RAID5_DMA_ARCH_NEEDS_CHAN_SWITCH */

The above seems a bit questionable and overengineered.

Linux mantra: Do What You Must, And No More.

In this case, just code what you need and note that it's IOP-specific. Don't
bother to support cases that don't exist yet.
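
As a rough idea of what the simpler, IOP-only version could look like, the
macros above folded into one plain helper built from calls already present in
the patch (the function name is made up):

static struct dma_chan *raid5_dma_next_op_channel(struct stripe_head *sh,
                                                  unsigned long capabilities)
{
        struct dma_chan *next;

        /* live-wait until the previous channel is idle, then switch */
        for (;;) {
                next = __arch_raid5_dma_check_channel(sh->ops.dma_chan,
                                                      sh->ops.dma_cookie,
                                                      raid5_dma_client,
                                                      capabilities);
                if (next)
                        break;
                dma_sync_wait(sh->ops.dma_chan, sh->ops.dma_cookie);
        }

        sh->ops.dma_chan = next;
        sh->ops.dma_cookie = dma_async_get_last_cookie(next);
        return next;
}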


> + * dma_do_raid5_block_ops - perform block memory operations on stripe data
> + * outside the spin lock with dma engines
> + *
> + * A note about the need for __arch_raid5_dma_check_channel:
> + * This function is only needed to support architectures where a single raid
> + * operation spans multiple hardware channels. For example on a reconstruct
> + * write, memory copy operations are submitted to a memcpy channel and then
> + * the routine must switch to the xor channel to complete the raid operation.
> + * __arch_raid5_dma_check_channel makes sure the previous operation has
> + * completed before returning the new channel.
> + * Some efficiency can be gained by putting the stripe back on the work
> + * queue rather than spin waiting. This code is a work in progress and is
> + * available via the 'broken' option CONFIG_RAID5_DMA_WAIT_VIA_REQUEUE.
> + * If 'wait via requeue' is not defined the check_channel macro live waits
> + * for the next channel.
> + */
> +static void dma_do_raid5_block_ops(void *stripe_head_ref)
> +{

Another way-too-big function that should be split up.


2006-09-11 23:55:13

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 17/19] iop3xx: define IOP3XX_REG_ADDR[32|16|8] and clean up DMA/AAU defs

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> Also brings the iop3xx registers in line with the format of the iop13xx
> register definitions.
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> include/asm-arm/arch-iop32x/entry-macro.S | 2
> include/asm-arm/arch-iop32x/iop32x.h | 14 +
> include/asm-arm/arch-iop33x/entry-macro.S | 2
> include/asm-arm/arch-iop33x/iop33x.h | 38 ++-
> include/asm-arm/hardware/iop3xx.h | 347 +++++++++++++----------------
> 5 files changed, 188 insertions(+), 215 deletions(-)

Another Linux mantra: "volatile" == hiding a bug. Avoid, please.

Jeff



2006-09-11 23:56:21

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 18/19] iop3xx: Give Linux control over PCI (ATU) initialization

Dan Williams wrote:
> From: Dan Williams <[email protected]>
>
> Currently the iop3xx platform support code assumes that RedBoot is the
> bootloader and has already initialized the ATU. Linux should handle this
> initialization for three reasons:
>
> 1/ The memory map that RedBoot sets up is not optimal (page_to_dma and
> virt_to_phys return different addresses). The effect of this is that using
> the dma mapping API for the internal bus dma units generates pci bus
> addresses that are incorrect for the internal bus.
>
> 2/ Not all iop platforms use RedBoot
>
> 3/ If the ATU is already initialized it indicates that the iop is an add-in
> card in another host, it does not own the PCI bus, and should not be
> re-initialized.
>
> Signed-off-by: Dan Williams <[email protected]>
> ---
>
> arch/arm/mach-iop32x/Kconfig | 8 ++
> arch/arm/mach-iop32x/ep80219.c | 4 +
> arch/arm/mach-iop32x/iq31244.c | 5 +
> arch/arm/mach-iop32x/iq80321.c | 5 +
> arch/arm/mach-iop33x/Kconfig | 8 ++
> arch/arm/mach-iop33x/iq80331.c | 5 +
> arch/arm/mach-iop33x/iq80332.c | 4 +
> arch/arm/plat-iop/pci.c | 140 ++++++++++++++++++++++++++++++++++
> include/asm-arm/arch-iop32x/iop32x.h | 9 ++
> include/asm-arm/arch-iop32x/memory.h | 4 -
> include/asm-arm/arch-iop33x/iop33x.h | 10 ++
> include/asm-arm/arch-iop33x/memory.h | 4 -
> include/asm-arm/hardware/iop3xx.h | 20 ++++-
> 13 files changed, 214 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm/mach-iop32x/Kconfig b/arch/arm/mach-iop32x/Kconfig
> index 05549a5..b2788e3 100644
> --- a/arch/arm/mach-iop32x/Kconfig
> +++ b/arch/arm/mach-iop32x/Kconfig
> @@ -22,6 +22,14 @@ config ARCH_IQ80321
> Say Y here if you want to run your kernel on the Intel IQ80321
> evaluation kit for the IOP321 processor.
>
> +config IOP3XX_ATU
> + bool "Enable the PCI Controller"
> + default y
> + help
> + Say Y here if you want the IOP to initialize its PCI Controller.
> + Say N if the IOP is an add in card, the host system owns the PCI
> + bus in this case.
> +
> endmenu
>
> endif
> diff --git a/arch/arm/mach-iop32x/ep80219.c b/arch/arm/mach-iop32x/ep80219.c
> index f616d3e..1a5c586 100644
> --- a/arch/arm/mach-iop32x/ep80219.c
> +++ b/arch/arm/mach-iop32x/ep80219.c
> @@ -100,7 +100,7 @@ ep80219_pci_map_irq(struct pci_dev *dev,
>
> static struct hw_pci ep80219_pci __initdata = {
> .swizzle = pci_std_swizzle,
> - .nr_controllers = 1,
> + .nr_controllers = 0,
> .setup = iop3xx_pci_setup,
> .preinit = iop3xx_pci_preinit,
> .scan = iop3xx_pci_scan_bus,
> @@ -109,6 +109,8 @@ static struct hw_pci ep80219_pci __initd
>
> static int __init ep80219_pci_init(void)
> {
> + if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
> + ep80219_pci.nr_controllers = 1;
> #if 0
> if (machine_is_ep80219())
> pci_common_init(&ep80219_pci);
> diff --git a/arch/arm/mach-iop32x/iq31244.c b/arch/arm/mach-iop32x/iq31244.c
> index 967a696..25d5d62 100644
> --- a/arch/arm/mach-iop32x/iq31244.c
> +++ b/arch/arm/mach-iop32x/iq31244.c
> @@ -97,7 +97,7 @@ iq31244_pci_map_irq(struct pci_dev *dev,
>
> static struct hw_pci iq31244_pci __initdata = {
> .swizzle = pci_std_swizzle,
> - .nr_controllers = 1,
> + .nr_controllers = 0,
> .setup = iop3xx_pci_setup,
> .preinit = iop3xx_pci_preinit,
> .scan = iop3xx_pci_scan_bus,
> @@ -106,6 +106,9 @@ static struct hw_pci iq31244_pci __initd
>
> static int __init iq31244_pci_init(void)
> {
> + if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
> + iq31244_pci.nr_controllers = 1;
> +
> if (machine_is_iq31244())
> pci_common_init(&iq31244_pci);
>
> diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c
> index ef4388c..cdd2265 100644
> --- a/arch/arm/mach-iop32x/iq80321.c
> +++ b/arch/arm/mach-iop32x/iq80321.c
> @@ -97,7 +97,7 @@ iq80321_pci_map_irq(struct pci_dev *dev,
>
> static struct hw_pci iq80321_pci __initdata = {
> .swizzle = pci_std_swizzle,
> - .nr_controllers = 1,
> + .nr_controllers = 0,
> .setup = iop3xx_pci_setup,
> .preinit = iop3xx_pci_preinit,
> .scan = iop3xx_pci_scan_bus,
> @@ -106,6 +106,9 @@ static struct hw_pci iq80321_pci __initd
>
> static int __init iq80321_pci_init(void)
> {
> + if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
> + iq80321_pci.nr_controllers = 1;
> +
> if (machine_is_iq80321())
> pci_common_init(&iq80321_pci);
>
> diff --git a/arch/arm/mach-iop33x/Kconfig b/arch/arm/mach-iop33x/Kconfig
> index 9aa016b..45598e0 100644
> --- a/arch/arm/mach-iop33x/Kconfig
> +++ b/arch/arm/mach-iop33x/Kconfig
> @@ -16,6 +16,14 @@ config MACH_IQ80332
> Say Y here if you want to run your kernel on the Intel IQ80332
> evaluation kit for the IOP332 chipset.
>
> +config IOP3XX_ATU
> + bool "Enable the PCI Controller"
> + default y
> + help
> + Say Y here if you want the IOP to initialize its PCI Controller.
> + Say N if the IOP is an add in card, the host system owns the PCI
> + bus in this case.
> +
> endmenu
>
> endif
> diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c
> index 7714c94..3807000 100644
> --- a/arch/arm/mach-iop33x/iq80331.c
> +++ b/arch/arm/mach-iop33x/iq80331.c
> @@ -78,7 +78,7 @@ iq80331_pci_map_irq(struct pci_dev *dev,
>
> static struct hw_pci iq80331_pci __initdata = {
> .swizzle = pci_std_swizzle,
> - .nr_controllers = 1,
> + .nr_controllers = 0,
> .setup = iop3xx_pci_setup,
> .preinit = iop3xx_pci_preinit,
> .scan = iop3xx_pci_scan_bus,
> @@ -87,6 +87,9 @@ static struct hw_pci iq80331_pci __initd
>
> static int __init iq80331_pci_init(void)
> {
> + if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
> + iq80331_pci.nr_controllers = 1;
> +
> if (machine_is_iq80331())
> pci_common_init(&iq80331_pci);
>
> diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c
> index a3fa7f8..8780d55 100644
> --- a/arch/arm/mach-iop33x/iq80332.c
> +++ b/arch/arm/mach-iop33x/iq80332.c
> @@ -93,6 +93,10 @@ static struct hw_pci iq80332_pci __initd
>
> static int __init iq80332_pci_init(void)
> {
> +
> + if (iop3xx_get_init_atu() == IOP3XX_INIT_ATU_ENABLE)
> + iq80332_pci.nr_controllers = 1;
> +
> if (machine_is_iq80332())
> pci_common_init(&iq80332_pci);
>
> diff --git a/arch/arm/plat-iop/pci.c b/arch/arm/plat-iop/pci.c
> index e647812..19aace9 100644
> --- a/arch/arm/plat-iop/pci.c
> +++ b/arch/arm/plat-iop/pci.c
> @@ -55,7 +55,7 @@ static u32 iop3xx_cfg_address(struct pci
> * This routine checks the status of the last configuration cycle. If an error
> * was detected it returns a 1, else it returns a 0. The errors being checked
> * are parity, master abort, target abort (master and target). These types of
> - * errors occure during a config cycle where there is no device, like during
> + * errors occur during a config cycle where there is no device, like during
> * the discovery stage.
> */
> static int iop3xx_pci_status(void)
> @@ -223,8 +223,111 @@ struct pci_bus *iop3xx_pci_scan_bus(int
> return pci_scan_bus(sys->busnr, &iop3xx_ops, sys);
> }
>
> +void __init iop3xx_atu_setup(void)
> +{
> + /* BAR 0 ( Disabled ) */
> + *IOP3XX_IAUBAR0 = 0x0;
> + *IOP3XX_IABAR0 = 0x0;
> + *IOP3XX_IATVR0 = 0x0;
> + *IOP3XX_IALR0 = 0x0;
> +
> + /* BAR 1 ( Disabled ) */
> + *IOP3XX_IAUBAR1 = 0x0;
> + *IOP3XX_IABAR1 = 0x0;
> + *IOP3XX_IALR1 = 0x0;
> +
> + /* BAR 2 (1:1 mapping with Physical RAM) */
> + /* Set limit and enable */
> + *IOP3XX_IALR2 = ~((u32)IOP3XX_MAX_RAM_SIZE - 1) & ~0x1;
> + *IOP3XX_IAUBAR2 = 0x0;
> +
> + /* Align the inbound bar with the base of memory */
> + *IOP3XX_IABAR2 = PHYS_OFFSET |
> + PCI_BASE_ADDRESS_MEM_TYPE_64 |
> + PCI_BASE_ADDRESS_MEM_PREFETCH;
> +
> + *IOP3XX_IATVR2 = PHYS_OFFSET;
> +
> + /* Outbound window 0 */
> + *IOP3XX_OMWTVR0 = IOP3XX_PCI_LOWER_MEM_PA;
> + *IOP3XX_OUMWTVR0 = 0;
> +
> + /* Outbound window 1 */
> + *IOP3XX_OMWTVR1 = IOP3XX_PCI_LOWER_MEM_PA + IOP3XX_PCI_MEM_WINDOW_SIZE;
> + *IOP3XX_OUMWTVR1 = 0;
> +
> + /* BAR 3 ( Disabled ) */
> + *IOP3XX_IAUBAR3 = 0x0;
> + *IOP3XX_IABAR3 = 0x0;
> + *IOP3XX_IATVR3 = 0x0;
> + *IOP3XX_IALR3 = 0x0;
> +
> + /* Setup the I/O Bar
> + */
> + *IOP3XX_OIOWTVR = IOP3XX_PCI_LOWER_IO_PA;;
> +
> + /* Enable inbound and outbound cycles
> + */
> + *IOP3XX_ATUCMD |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
> + PCI_COMMAND_PARITY | PCI_COMMAND_SERR;
> + *IOP3XX_ATUCR |= IOP3XX_ATUCR_OUT_EN;
> +}
> +
> +void __init iop3xx_atu_disable(void)
> +{
> + *IOP3XX_ATUCMD = 0;
> + *IOP3XX_ATUCR = 0;
> +
> + /* wait for cycles to quiesce */
> + while (*IOP3XX_PCSR & (IOP3XX_PCSR_OUT_Q_BUSY |
> + IOP3XX_PCSR_IN_Q_BUSY))
> + cpu_relax();
> +
> + /* BAR 0 ( Disabled ) */
> + *IOP3XX_IAUBAR0 = 0x0;
> + *IOP3XX_IABAR0 = 0x0;
> + *IOP3XX_IATVR0 = 0x0;
> + *IOP3XX_IALR0 = 0x0;
> +
> + /* BAR 1 ( Disabled ) */
> + *IOP3XX_IAUBAR1 = 0x0;
> + *IOP3XX_IABAR1 = 0x0;
> + *IOP3XX_IALR1 = 0x0;
> +
> + /* BAR 2 ( Disabled ) */
> + *IOP3XX_IAUBAR2 = 0x0;
> + *IOP3XX_IABAR2 = 0x0;
> + *IOP3XX_IATVR2 = 0x0;
> + *IOP3XX_IALR2 = 0x0;
> +
> + /* BAR 3 ( Disabled ) */
> + *IOP3XX_IAUBAR3 = 0x0;
> + *IOP3XX_IABAR3 = 0x0;
> + *IOP3XX_IATVR3 = 0x0;
> + *IOP3XX_IALR3 = 0x0;
> +
> + /* Clear the outbound windows */
> + *IOP3XX_OIOWTVR = 0;
> +
> + /* Outbound window 0 */
> + *IOP3XX_OMWTVR0 = 0;
> + *IOP3XX_OUMWTVR0 = 0;
> +
> + /* Outbound window 1 */
> + *IOP3XX_OMWTVR1 = 0;
> + *IOP3XX_OUMWTVR1 = 0;

You should be using readl(), writel() variants rather than writing C
code that appears to be normal, but in reality has hardware side-effects.

Jeff
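
For example, the BAR 2 portion of iop3xx_atu_setup() expressed with the I/O
accessors; this sketch assumes the IOP3XX_* definitions are reworked into
plain register addresses rather than volatile pointer dereferences:

static void __init iop3xx_atu_setup_bar2(void)
{
        /* BAR 2: 1:1 mapping with physical RAM, set limit and enable */
        writel(~((u32)IOP3XX_MAX_RAM_SIZE - 1) & ~0x1, IOP3XX_IALR2);
        writel(0, IOP3XX_IAUBAR2);
        writel(PHYS_OFFSET | PCI_BASE_ADDRESS_MEM_TYPE_64 |
               PCI_BASE_ADDRESS_MEM_PREFETCH, IOP3XX_IABAR2);
        writel(PHYS_OFFSET, IOP3XX_IATVR2);

        /* enable inbound and outbound cycles */
        writel(readl(IOP3XX_ATUCMD) | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
               PCI_COMMAND_PARITY | PCI_COMMAND_SERR, IOP3XX_ATUCMD);
        writel(readl(IOP3XX_ATUCR) | IOP3XX_ATUCR_OUT_EN, IOP3XX_ATUCR);
}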



2006-09-12 00:14:47

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

On 9/11/06, Jeff Garzik <[email protected]> wrote:
> Dan Williams wrote:
> > @@ -759,8 +755,10 @@ #endif
> > device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
> > device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
> > device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
> > - device->common.device_memcpy_complete = ioat_dma_is_complete;
> > - device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
> > + device->common.device_operation_complete = ioat_dma_is_complete;
> > + device->common.device_xor_pgs_to_pg = dma_async_xor_pgs_to_pg_err;
> > + device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
> > + device->common.capabilities = DMA_MEMCPY;
>
>
> Are we really going to add a set of hooks for each DMA engine whizbang
> feature?

What's the alternative? But, also see patch 9 "dmaengine: reduce
backend address permutations" it relieves some of this pain.

>
> That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> aes, and a dozen other transforms.
>
>
> > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
> > index c94d8f1..3599472 100644
> > --- a/include/linux/dmaengine.h
> > +++ b/include/linux/dmaengine.h
> > @@ -20,7 +20,7 @@
> > */
> > #ifndef DMAENGINE_H
> > #define DMAENGINE_H
> > -
> > +#include <linux/config.h>
> > #ifdef CONFIG_DMA_ENGINE
> >
> > #include <linux/device.h>
> > @@ -65,6 +65,27 @@ enum dma_status {
> > };
> >
> > /**
> > + * enum dma_capabilities - DMA operational capabilities
> > + * @DMA_MEMCPY: src to dest copy
> > + * @DMA_XOR: src*n to dest xor
> > + * @DMA_DUAL_XOR: src*n to dest_diag and dest_horiz xor
> > + * @DMA_PQ_XOR: src*n to dest_q and dest_p gf/xor
> > + * @DMA_MEMCPY_CRC32C: src to dest copy and crc-32c sum
> > + * @DMA_SHARE: multiple clients can use this channel
> > + */
> > +enum dma_capabilities {
> > + DMA_MEMCPY = 0x1,
> > + DMA_XOR = 0x2,
> > + DMA_PQ_XOR = 0x4,
> > + DMA_DUAL_XOR = 0x8,
> > + DMA_PQ_UPDATE = 0x10,
> > + DMA_ZERO_SUM = 0x20,
> > + DMA_PQ_ZERO_SUM = 0x40,
> > + DMA_MEMSET = 0x80,
> > + DMA_MEMCPY_CRC32C = 0x100,
>
> Please use the more readable style that explicitly lists bits:
>
> DMA_MEMCPY = (1 << 0),
> DMA_XOR = (1 << 1),
> ...
I prefer this as well, although at one point I was told (not by you)
that the absolute number was preferred, when I was making changes to
drivers/scsi/sata_vsc.c. In any event, I'll change it...

>
> > +/**
> > * struct dma_chan_percpu - the per-CPU part of struct dma_chan
> > * @refcount: local_t used for open-coded "bigref" counting
> > * @memcpy_count: transaction counter
> > @@ -75,27 +96,32 @@ struct dma_chan_percpu {
> > local_t refcount;
> > /* stats */
> > unsigned long memcpy_count;
> > + unsigned long xor_count;
> > unsigned long bytes_transferred;
> > + unsigned long bytes_xor;
>
> Clearly, each operation needs to be more compartmentalized.
>
> This just isn't scalable, when you consider all the possible transforms.
Ok, one set of counters per op is probably overkill; what about lumping
operations into groups and just tracking at the group level (a rough
struct sketch follows the list)? i.e.

memcpy, memset -> string_count, string_bytes_transferred
crc, sha1, aes -> hash_count, hash_transferred
xor, pq_xor -> sum_count, sum_transferred
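
In struct form that grouping might look something like the following, based
on the per-CPU struct quoted above (field layout is only illustrative):

struct dma_chan_percpu {
        local_t refcount;
        /* stats, grouped by operation class */
        unsigned long string_count;             /* memcpy, memset */
        unsigned long string_bytes_transferred;
        unsigned long hash_count;               /* crc, sha1, aes */
        unsigned long hash_transferred;
        unsigned long sum_count;                /* xor, pq_xor */
        unsigned long sum_transferred;
};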

>
> Jeff

Dan

2006-09-12 00:52:54

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

Jeff> Are we really going to add a set of hooks for each DMA
Jeff> engine whizbang feature?

Dan> What's the alternative? But, also see patch 9 "dmaengine:
Dan> reduce backend address permutations" it relieves some of this
Dan> pain.

I guess you can pass an opcode into a common "start operation" function.

With all the memcpy / xor / crypto / etc. hardware out there already,
we definitely have to get this interface right.

- R.
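
A rough sketch of that direction, reusing the union dmaengine_addr type
already in the series; the rest of the names are hypothetical:

enum dma_op {
        DMA_OP_MEMCPY,
        DMA_OP_XOR,
        DMA_OP_MEMSET,
        /* ... */
};

struct dma_request {
        enum dma_op             op;
        union dmaengine_addr    dest;
        unsigned int            dest_off;
        union dmaengine_addr    src;
        unsigned int            src_off;
        unsigned int            src_cnt;        /* xor only */
        int                     value;          /* memset only */
        size_t                  len;
        unsigned long           flags;
};

/* one entry point; a channel whose capabilities lack req->op returns -ENXIO */
dma_cookie_t dma_async_submit(struct dma_chan *chan, struct dma_request *req);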

2006-09-12 02:41:44

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

Dan Williams wrote:
> This is a frequently asked question, Alan Cox had the same one at OLS.
> The answer is "probably." The only complication I currently see is
> where/how the stripe cache is maintained. With the IOPs its easy
> because the DMA engines operate directly on kernel memory. With the
> Promise card I believe they have memory on the card and it's not clear
> to me if the XOR engines on the card can deal with host memory. Also,
> MD would need to be modified to handle a stripe cache located on a
> device, or somehow synchronize its local cache with card in a manner
> that is still able to beat software only MD.

sata_sx4 operates through [standard PC] memory on the card, and you use
a DMA engine to copy memory to/from the card.

[select chipsets supported by] sata_promise operates directly on host
memory.

So, while sata_sx4 is farther away from your direct-host-memory model,
it also has much more potential for RAID acceleration: ideally, RAID1
just copies data to the card once, then copies the data to multiple
drives from there. Similarly with RAID5, you can eliminate copies and
offload XOR, presuming the drives are all connected to the same card.

Jeff


2006-09-12 05:47:26

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On 9/11/06, Jeff Garzik <[email protected]> wrote:
> Dan Williams wrote:
> > This is a frequently asked question, Alan Cox had the same one at OLS.
> > The answer is "probably." The only complication I currently see is
> > where/how the stripe cache is maintained. With the IOPs its easy
> > because the DMA engines operate directly on kernel memory. With the
> > Promise card I believe they have memory on the card and it's not clear
> > to me if the XOR engines on the card can deal with host memory. Also,
> > MD would need to be modified to handle a stripe cache located on a
> > device, or somehow synchronize its local cache with card in a manner
> > that is still able to beat software only MD.
>
> sata_sx4 operates through [standard PC] memory on the card, and you use
> a DMA engine to copy memory to/from the card.
>
> [select chipsets supported by] sata_promise operates directly on host
> memory.
>
> So, while sata_sx4 is farther away from your direct-host-memory model,
> it also has much more potential for RAID acceleration: ideally, RAID1
> just copies data to the card once, then copies the data to multiple
> drives from there. Similarly with RAID5, you can eliminate copies and
> offload XOR, presuming the drives are all connected to the same card.
In the sata_promise case it's straightforward: all that is needed is
dmaengine drivers for the xor and memcpy engines. This would be
similar to the current I/OAT model where dma resources are provided by
a PCI function. The sata_sx4 case would need a different flavor of
the dma_do_raid5_block_ops routine, one that understands where the
cache is located. MD would also need the capability to bypass the
block layer, since the data will have already been transferred to the
card by a stripe cache operation.

The RAID1 case gives me pause because it seems any work along these
lines requires that the implementation work for both MD and DM, which
then eventually leads to being tasked with merging the two.

> Jeff

Dan

2006-09-12 06:19:07

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

On 9/11/06, Roland Dreier <[email protected]> wrote:
> Jeff> Are we really going to add a set of hooks for each DMA
> Jeff> engine whizbang feature?
...ok, but at some level we are going to need a file that has:
EXPORT_SYMBOL_GPL(dma_whizbang_op1)
. . .
EXPORT_SYMBOL_GPL(dma_whizbang_opX)
correct?


> Dan> What's the alternative? But, also see patch 9 "dmaengine:
> Dan> reduce backend address permutations" it relieves some of this
> Dan> pain.
>
> I guess you can pass an opcode into a common "start operation" function.
But then we still have the problem of being able to request a memory
copy operation from a channel that only understands xor, a la Jeff's
comment on patch 12:

"Further illustration of how this API growth is going wrong. You should
create an API such that it is impossible for an XOR transform to ever
call non-XOR-transform hooks."

> With all the memcpy / xor / crypto / etc. hardware out there already,
> we definitely have to get this interface right.
>
> - R.

I understand what you are saying, Jeff: the implementation can be made
better. But something I think is valuable is the ability to write
clients once, like NET_DMA and RAID5_DMA, and have them run without
modification on any platform that can provide the engine interface,
rather than needing a client per architecture
(IOP_RAID5_DMA...FOO_X_RAID5_DMA).

Or is this an example of where "Do What You Must, And No More"
comes in, i.e. don't worry about making a generic RAID5_DMA while
only one implementation exists?

I also want to pose the question of whether the dmaengine interface
should handle cryptographic transforms. We already have Acrypto:
http://tservice.net.ru/~s0mbre/blog/devel/acrypto/index.html. At the
same time, since IOPs can do Galois Field multiplication and XOR, it
would be nice to take advantage of that for crypto acceleration, but
this does not fit the model of a device that Acrypto supports.

Dan

2006-09-12 09:17:07

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

On Mon, Sep 11, 2006 at 11:18:59PM -0700, Dan Williams ([email protected]) wrote:
> Or is this an example of the where "Do What You Must, And No More"
> comes in, i.e. don't worry about making a generic RAID5_DMA while
> there is only one implementation existence?
>
> I also want to pose the question of whether the dmaengine interface
> should handle cryptographic transforms? We already have Acrypto:
> http://tservice.net.ru/~s0mbre/blog/devel/acrypto/index.html. At the
> same time since IOPs can do Galois Field multiplication and XOR it
> would be nice to take advantage of that for crypto acceleration, but
> this does not fit the model of a device that Acrypto supports.

Each acrypto crypto device provides a set of capabilities it supports, and
when a user requests some operation, the acrypto core selects the device with
the maximum speed for the given capabilities, so one can easily add GF
multiplication devices there. Acrypto supports a "sync" mode too, in case your
hardware is synchronous (i.e. it does not provide an interrupt or other
async event when an operation is completed).

P.S. acrypto homepage with some design notes and supported features
can be found here:
http://tservice.net.ru/~s0mbre/old/?section=projects&item=acrypto

> Dan

--
Evgeniy Polyakov

2006-09-13 04:04:22

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

Dan Williams wrote:
> On 9/11/06, Roland Dreier <[email protected]> wrote:
>> Jeff> Are we really going to add a set of hooks for each DMA
>> Jeff> engine whizbang feature?
> ...ok, but at some level we are going to need a file that has:
> EXPORT_SYMBOL_GPL(dma_whizbang_op1)
> . . .
> EXPORT_SYMBOL_GPL(dma_whizbang_opX)
> correct?

If properly modularized, you'll have multiple files with such exports.

Or perhaps you won't have such exports at all, if it is hidden inside a
module-specific struct-of-hooks.


> I understand what you are saying Jeff, the implementation can be made
> better, but something I think is valuable is the ability to write
> clients once like NET_DMA and RAID5_DMA and have them run without
> modification on any platform that can provide the engine interface
> rather than needing a client per architecture
> IOP_RAID5_DMA...FOO_X_RAID5_DMA.

It depends on the situation.

The hardware capabilities exported by each platform [or device] vary
greatly, not only in the raw capabilities provided, but also in the
level of offload.

In general, we don't want to see hardware-specific stuff in generic
code, though...


> Or is this an example of the where "Do What You Must, And No More"
> comes in, i.e. don't worry about making a generic RAID5_DMA while
> there is only one implementation existence?

> I also want to pose the question of whether the dmaengine interface
> should handle cryptographic transforms? We already have Acrypto:
> http://tservice.net.ru/~s0mbre/blog/devel/acrypto/index.html. At the
> same time since IOPs can do Galois Field multiplication and XOR it
> would be nice to take advantage of that for crypto acceleration, but
> this does not fit the model of a device that Acrypto supports.

It would be quite interesting to see where the synergies are between the
two, at the very least. "async [transform|sum]" is a superset of "async
crypto" after all.

Jeff


2006-09-13 04:05:42

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

Dan Williams wrote:
> On 9/11/06, Jeff Garzik <[email protected]> wrote:
>> Dan Williams wrote:
>> > This is a frequently asked question, Alan Cox had the same one at OLS.
>> > The answer is "probably." The only complication I currently see is
>> > where/how the stripe cache is maintained. With the IOPs its easy
>> > because the DMA engines operate directly on kernel memory. With the
>> > Promise card I believe they have memory on the card and it's not clear
>> > to me if the XOR engines on the card can deal with host memory. Also,
>> > MD would need to be modified to handle a stripe cache located on a
>> > device, or somehow synchronize its local cache with card in a manner
>> > that is still able to beat software only MD.
>>
>> sata_sx4 operates through [standard PC] memory on the card, and you use
>> a DMA engine to copy memory to/from the card.
>>
>> [select chipsets supported by] sata_promise operates directly on host
>> memory.
>>
>> So, while sata_sx4 is farther away from your direct-host-memory model,
>> it also has much more potential for RAID acceleration: ideally, RAID1
>> just copies data to the card once, then copies the data to multiple
>> drives from there. Similarly with RAID5, you can eliminate copies and
>> offload XOR, presuming the drives are all connected to the same card.
> In the sata_promise case its straight forward, all that is needed is
> dmaengine drivers for the xor and memcpy engines. This would be
> similar to the current I/OAT model where dma resources are provided by
> a PCI function. The sata_sx4 case would need a different flavor of
> the dma_do_raid5_block_ops routine, one that understands where the
> cache is located. MD would also need the capability to bypass the
> block layer since the data will have already been transferred to the
> card by a stripe cache operation
>
> The RAID1 case give me pause because it seems any work along these
> lines requires that the implementation work for both MD and DM, which
> then eventually leads to being tasked with merging the two.

RAID5 has similar properties. If all devices in a RAID5 array are
attached to a single SX4 card, then a high level write to the RAID5
array is passed directly to the card, which then performs XOR, striping,
etc.

Jeff



2006-09-13 07:15:12

by Jakob Oestergaard

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On Mon, Sep 11, 2006 at 04:00:32PM -0700, Dan Williams wrote:
> Neil,
>
...
>
> Concerning the context switching performance concerns raised at the
> previous release, I have observed the following. For the hardware
> accelerated case it appears that performance is always better with the
> work queue than without since it allows multiple stripes to be operated
> on simultaneously. I expect the same for an SMP platform, but so far my
> testing has been limited to IOPs. For a single-processor
> non-accelerated configuration I have not observed performance
> degradation with work queue support enabled, but in the Kconfig option
> help text I recommend disabling it (CONFIG_MD_RAID456_WORKQUEUE).

Out of curiosity; how does accelerated compare to non-accelerated?

--

/ jakob

2006-09-13 19:18:00

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On 9/13/06, Jakob Oestergaard <[email protected]> wrote:
> On Mon, Sep 11, 2006 at 04:00:32PM -0700, Dan Williams wrote:
> > Neil,
> >
> ...
> >
> > Concerning the context switching performance concerns raised at the
> > previous release, I have observed the following. For the hardware
> > accelerated case it appears that performance is always better with the
> > work queue than without since it allows multiple stripes to be operated
> > on simultaneously. I expect the same for an SMP platform, but so far my
> > testing has been limited to IOPs. For a single-processor
> > non-accelerated configuration I have not observed performance
> > degradation with work queue support enabled, but in the Kconfig option
> > help text I recommend disabling it (CONFIG_MD_RAID456_WORKQUEUE).
>
> Out of curiosity; how does accelerated compare to non-accelerated?

One quick example:
4-disk SATA array rebuild on iop321 without acceleration - 'top'
reports md0_resync and md0_raid5 dueling for the CPU each at ~50%
utilization.

With acceleration - 'top' reports md0_resync cpu utilization at ~90%
with the rest split between md0_raid5 and md0_raid5_ops.

The sync speed reported by /proc/mdstat is ~40% higher in the accelerated case.

That being said, array resync is a special case, so your mileage may
vary with other applications.

I will put together some data from bonnie++, iozone, maybe contest,
and post it on SourceForge.

> / jakob

Dan

2006-09-14 07:42:46

by Jakob Oestergaard

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On Wed, Sep 13, 2006 at 12:17:55PM -0700, Dan Williams wrote:
...
> >Out of curiosity; how does accelerated compare to non-accelerated?
>
> One quick example:
> 4-disk SATA array rebuild on iop321 without acceleration - 'top'
> reports md0_resync and md0_raid5 dueling for the CPU each at ~50%
> utilization.
>
> With acceleration - 'top' reports md0_resync cpu utilization at ~90%
> with the rest split between md0_raid5 and md0_raid5_ops.
>
> The sync speed reported by /proc/mdstat is ~40% higher in the accelerated
> case.

Ok, nice :)

>
> That being said, array resync is a special case, so your mileage may
> vary with other applications.

Every-day usage I/O performance data would be nice indeed :)

> I will put together some data from bonnie++, iozone, maybe contest,
> and post it on SourceForge.

Great!

--

/ jakob

2006-09-15 14:48:10

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH 09/19] dmaengine: reduce backend address permutations

Hi,

On Mon, 11 Sep 2006 16:18:23 -0700 Dan Williams <[email protected]> wrote:

> From: Dan Williams <[email protected]>
>
> Change the backend dma driver API to accept a 'union dmaengine_addr'. The
> intent is to be able to support a wide range of frontend address type
> permutations without needing an equal number of function type permutations
> on the backend.

Please do the cleanup of existing code before you add new functionality.
Earlier patches in this series added code that you're modifying here.
If you modify the existing code first, it's less churn for everyone to
review.


Thanks,

Olof

2006-09-15 14:59:12

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines

Hi,

On Mon, 11 Sep 2006 16:19:00 -0700 Dan Williams <[email protected]> wrote:

> From: Dan Williams <[email protected]>
>
> This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor,
> pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy
> operations.

You implement a bunch of different functions here. I agree with Jeff's
feedback about the lack of scalability in the way the API is going
right now.

Another example of this is that the driver does its own self-test
of the functions. This means that every backend driver will need to
duplicate this code. Wouldn't it be easier for everyone if the common
infrastructure did a test call at the time of registration of a
function instead, and returned failure if it doesn't pass?
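
A rough sketch of what such a registration-time self-test could look
like, written against the existing memcpy entry points only; the helper
name and the exact call sequence are assumptions for illustration, not
code from the posted patches, and it presumes the channel's resources
have already been allocated:

static int dma_device_selftest_memcpy(struct dma_device *device,
				      struct dma_chan *chan)
{
	void *src, *dest;
	dma_cookie_t cookie;
	int err = 0;

	src = kmalloc(PAGE_SIZE, GFP_KERNEL);
	dest = kzalloc(PAGE_SIZE, GFP_KERNEL);
	if (!src || !dest) {
		err = -ENOMEM;
		goto out;
	}

	/* push a known pattern through the engine and check the result */
	memset(src, 0xa5, PAGE_SIZE);
	cookie = device->device_memcpy_buf_to_buf(chan, dest, src, PAGE_SIZE);
	if (cookie < 0) {
		err = cookie;
		goto out;
	}
	device->device_memcpy_issue_pending(chan);

	while (device->device_memcpy_complete(chan, cookie, NULL, NULL)
			== DMA_IN_PROGRESS)
		cpu_relax();

	if (memcmp(src, dest, PAGE_SIZE))
		err = -ENODEV;	/* reject engines that corrupt data */
out:
	kfree(src);
	kfree(dest);
	return err;
}

The core would call something like this once per registered function
type from dma_async_device_register() and refuse the device on failure.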

> drivers/dma/Kconfig | 27 +
> drivers/dma/Makefile | 1
> drivers/dma/iop-adma.c | 1501 +++++++++++++++++++++++++++++++++++
> include/asm-arm/hardware/iop_adma.h | 98 ++

ioatdma.h is currently under drivers/dma/. If the contents of the new
header are strictly device-related, please add it under drivers/dma as
well.


-Olof

2006-09-15 16:43:38

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH 08/19] dmaengine: enable multiple clients and operations

On Mon, 11 Sep 2006 19:44:16 -0400 Jeff Garzik <[email protected]> wrote:

> Dan Williams wrote:
> > @@ -759,8 +755,10 @@ #endif
> > device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
> > device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
> > device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
> > - device->common.device_memcpy_complete = ioat_dma_is_complete;
> > - device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
> > + device->common.device_operation_complete = ioat_dma_is_complete;
> > + device->common.device_xor_pgs_to_pg = dma_async_xor_pgs_to_pg_err;
> > + device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
> > + device->common.capabilities = DMA_MEMCPY;
>
>
> Are we really going to add a set of hooks for each DMA engine whizbang
> feature?
>
> That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> aes, and a dozen other transforms.


Yes, it will be unmaintainable. We need some sort of multiplexing with
per-function registrations.

Here's a first cut at it, just very quick. It could be improved further
but it shows that we could exorcise most of the hardcoded things pretty
easily.

Dan, would this fit with your added XOR stuff as well? If so, would you
mind rebasing on top of something like this (with your further cleanups
going in before the added functionality, please. :-)

(Build tested only, since I lack Intel hardware).


It would be nice if we could move the type specification to only be
needed in the channel allocation. I don't know how well that fits the
model for some of the hardware platforms though, since a single channel
might be shared for different types of functions. Maybe we need a
different level of abstraction there instead, i.e. divorce the hardware
channel and software channel model and have several software channels
map onto a hardware one.
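
One way that split could look, as a rough sketch only: the dma_hw_chan /
dma_sw_chan names and the submit helper below are made up for
illustration, and it simply reuses the funcs[] table from the patch
further down.

struct dma_hw_chan {
	spinlock_t		lock;		/* serializes submissions */
	struct dma_chan		*chan;		/* the physical channel */
	struct list_head	sw_chans;	/* software channels on top */
};

struct dma_sw_chan {
	enum dma_function_type	type;		/* MEMCPY, XOR, ... */
	struct dma_client	*client;
	struct dma_hw_chan	*hw;
	struct list_head	node;		/* on hw->sw_chans */
};

/* every software channel funnels into the one hardware channel */
static dma_cookie_t dma_sw_chan_buf_to_buf(struct dma_sw_chan *sw,
					   void *dest, void *src, size_t len)
{
	struct dma_hw_chan *hw = sw->hw;
	struct dma_device *device = hw->chan->device;
	dma_cookie_t cookie;

	spin_lock_bh(&hw->lock);
	cookie = device->funcs[sw->type]->buf_to_buf(hw->chan, dest, src, len);
	spin_unlock_bh(&hw->lock);

	return cookie;
}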





Clean up the DMA API a bit, allowing each engine to register an array
of supported functions instead of allocating static names for each possible
function.


Signed-off-by: Olof Johansson <[email protected]>


diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 1527804..282ce85 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -80,7 +80,7 @@ static ssize_t show_memcpy_count(struct
int i;

for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += per_cpu_ptr(chan->local, i)->count;

return sprintf(buf, "%lu\n", count);
}
@@ -105,7 +105,7 @@ static ssize_t show_in_use(struct class_
}

static struct class_device_attribute dma_class_attrs[] = {
- __ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(count, S_IRUGO, show_memcpy_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
@@ -402,11 +402,11 @@ subsys_initcall(dma_bus_init);
EXPORT_SYMBOL(dma_async_client_register);
EXPORT_SYMBOL(dma_async_client_unregister);
EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_buf_to_buf);
+EXPORT_SYMBOL(dma_async_buf_to_pg);
+EXPORT_SYMBOL(dma_async_pg_to_pg);
+EXPORT_SYMBOL(dma_async_complete);
+EXPORT_SYMBOL(dma_async_issue_pending);
EXPORT_SYMBOL(dma_async_device_register);
EXPORT_SYMBOL(dma_async_device_unregister);
EXPORT_SYMBOL(dma_chan_cleanup);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index dbd4d6c..6cbed42 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -40,6 +40,7 @@
#define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
#define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)

+
/* internal functions */
static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void __devexit ioat_remove(struct pci_dev *pdev);
@@ -681,6 +682,14 @@ out:
return err;
}

+struct dma_function ioat_memcpy_functions = {
+ .buf_to_buf = ioat_dma_memcpy_buf_to_buf,
+ .buf_to_pg = ioat_dma_memcpy_buf_to_pg,
+ .pg_to_pg = ioat_dma_memcpy_pg_to_pg,
+ .complete = ioat_dma_is_complete,
+ .issue_pending = ioat_dma_memcpy_issue_pending,
+};
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -756,11 +765,8 @@ static int __devinit ioat_probe(struct p

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.funcs[DMAFUNC_MEMCPY] = &ioat_memcpy_functions;
+
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
index d637555..8a2f642 100644
--- a/drivers/dma/iovlock.c
+++ b/drivers/dma/iovlock.c
@@ -151,11 +151,8 @@ static dma_cookie_t dma_memcpy_to_kernel
while (len > 0) {
if (iov->iov_len) {
int copy = min_t(unsigned int, iov->iov_len, len);
- dma_cookie = dma_async_memcpy_buf_to_buf(
- chan,
- iov->iov_base,
- kdata,
- copy);
+ dma_cookie = dma_async_buf_to_buf(DMAFUNC_MEMCPY, chan,
+ iov->iov_base, kdata, copy);
kdata += copy;
len -= copy;
iov->iov_len -= copy;
@@ -210,7 +207,7 @@ dma_cookie_t dma_memcpy_to_iovec(struct
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_buf_to_pg(chan,
+ dma_cookie = dma_async_buf_to_pg(DMAFUNC_MEMCPY, chan,
page_list->pages[page_idx],
iov_byte_offset,
kdata,
@@ -274,7 +271,7 @@ dma_cookie_t dma_memcpy_pg_to_iovec(stru
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_pg_to_pg(chan,
+ dma_cookie = dma_async_pg_to_pg(DMAFUNC_MEMCPY, chan,
page_list->pages[page_idx],
iov_byte_offset,
page,
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index c94d8f1..317a7f2 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -67,14 +67,14 @@ enum dma_status {
/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
- * @memcpy_count: transaction counter
+ * @count: transaction counter
* @bytes_transferred: byte counter
*/

struct dma_chan_percpu {
local_t refcount;
/* stats */
- unsigned long memcpy_count;
+ unsigned long count;
unsigned long bytes_transferred;
};

@@ -157,6 +157,34 @@ struct dma_client {
struct list_head global_node;
};

+enum dma_function_type {
+ DMAFUNC_MEMCPY = 0,
+ DMAFUNC_XOR,
+ DMAFUNC_MAX
+};
+
+/* struct dma_function
+ * @buf_to_pg: buf pointer to struct page
+ * @pg_to_pg: struct page/offset to struct page/offset
+ * @complete: poll the status of a DMA transaction
+ * @issue_pending: push appended descriptors to hardware
+ */
+struct dma_function {
+ dma_cookie_t (*buf_to_buf)(struct dma_chan *chan,
+ void *dest, void *src, size_t len);
+ dma_cookie_t (*buf_to_pg)(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len);
+ dma_cookie_t (*pg_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off,
+ size_t len);
+ enum dma_status (*complete)(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used);
+ void (*issue_pending)(struct dma_chan *chan);
+};
+
/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
@@ -168,14 +196,8 @@ struct dma_client {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
*/
struct dma_device {
-
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
@@ -185,20 +207,10 @@ struct dma_device {

int dev_id;

+ struct dma_function *funcs[DMAFUNC_MAX];
+
int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last,
- dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
};

/* --- public DMA engine API --- */
@@ -209,7 +221,7 @@ void dma_async_client_chan_request(struc
unsigned int number);

/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * dma_async_buf_to_buf - offloaded copy between virtual addresses
* @chan: DMA channel to offload copy to
* @dest: destination address (virtual)
* @src: source address (virtual)
@@ -220,19 +232,24 @@ void dma_async_client_chan_request(struc
* Both @dest and @src must stay memory resident (kernel memory or locked
* user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len)
+static inline dma_cookie_t dma_async_buf_to_buf(enum dma_function_type type,
+ struct dma_chan *chan, void *dest, void *src, size_t len)
{
- int cpu = get_cpu();
+ int cpu;
+
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+
+ cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return chan->device->funcs[type]->buf_to_buf(chan, dest, src, len);
}

/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * dma_async_buf_to_pg - offloaded copy from address to page
* @chan: DMA channel to offload copy to
* @page: destination page
* @offset: offset in page to copy to
@@ -244,20 +261,26 @@ static inline dma_cookie_t dma_async_mem
* Both @page/@offset and @kdata must stay memory resident (kernel memory or
* locked user space pages)
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len)
+static inline dma_cookie_t dma_async_buf_to_pg(enum dma_function_type type,
+ struct dma_chan *chan, struct page *page, unsigned int offset,
+ void *kdata, size_t len)
{
- int cpu = get_cpu();
+ int cpu;
+
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+
+ cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return chan->device->funcs[type]->buf_to_pg(chan, page, offset,
+ kdata, len);
}

/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * dma_async_pg_to_pg - offloaded copy from page to page
* @chan: DMA channel to offload copy to
* @dest_pg: destination page
* @dest_off: offset in page to copy to
@@ -270,33 +293,40 @@ static inline dma_cookie_t dma_async_mem
* Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
* (kernel memory or locked user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len)
+static inline dma_cookie_t dma_async_pg_to_pg(enum dma_function_type type,
+ struct dma_chan *chan, struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off, size_t len)
{
- int cpu = get_cpu();
+ int cpu;
+
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+
+ cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return chan->device->funcs[type]->pg_to_pg(chan, dest_pg, dest_off,
+ src_pg, src_off, len);
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
* This allows drivers to push copies to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(enum dma_function_type type,
+ struct dma_chan *chan)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ if (chan->device->funcs[type])
+ return chan->device->funcs[type]->issue_pending(chan);
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +336,14 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+static inline enum dma_status dma_async_complete(enum dma_function_type type,
+ struct dma_chan *chan, dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ if (!chan->device->funcs[type])
+ return -ENXIO;
+ else
+ return chan->device->funcs[type]->complete(chan, cookie, last, used);
}

/**
@@ -318,7 +352,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_complete()
* the test logic is seperated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..e8a8ee9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1945,7 +1945,7 @@ out:
struct dma_chan *chan;
rcu_read_lock();
list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
+ dma_async_issue_pending(DMAFUNC_MEMCPY, chan);
rcu_read_unlock();
}
#endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 934396b..c270837 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1431,9 +1431,9 @@ skip_copy:
struct sk_buff *skb;
dma_cookie_t done, used;

- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_issue_pending(DMAFUNC_MEMCPY, tp->ucopy.dma_chan);

- while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
+ while (dma_async_complete(DMAFUNC_MEMCPY, tp->ucopy.dma_chan,
tp->ucopy.dma_cookie, &done,
&used) == DMA_IN_PROGRESS) {
/* do partial cleanup of sk_async_wait_queue */


2006-09-15 19:46:11

by Olof Johansson

[permalink] [raw]
Subject: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

On Fri, 15 Sep 2006 11:38:17 -0500 Olof Johansson <[email protected]> wrote:

> On Mon, 11 Sep 2006 19:44:16 -0400 Jeff Garzik <[email protected]> wrote:

> > Are we really going to add a set of hooks for each DMA engine whizbang
> > feature?
> >
> > That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> > aes, and a dozen other transforms.
>
>
> Yes, it will be unmaintainable. We need some sort of multiplexing with
> per-function registrations.
>
> Here's a first cut at it, just very quick. It could be improved further
> but it shows that we could exorcise most of the hardcoded things pretty
> easily.

Ok, that was obviously a naive and not so nice first attempt, but I
figured it was worth it to show how it can be done.

This is a little more proper: specify at client registration time which
function the client will use, and make the channel use it. This way most
of the per-call error checking can be removed too.

Chris/Dan: Please consider picking this up as a base for the added
functionality and cleanups.





Clean up dmaengine a bit. Make the client registration specify which
channel functions ("type") the client will use. Also, make devices
register which functions they will provide.

Also exorcise most of the memcpy-specific references from the generic
dma engine code. There's still some left in the iov stuff.


Signed-off-by: Olof Johansson <[email protected]>

Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c
+++ linux-2.6/drivers/dma/dmaengine.c
@@ -73,14 +73,14 @@ static LIST_HEAD(dma_client_list);

/* --- sysfs implementation --- */

-static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
+static ssize_t show_count(struct class_device *cd, char *buf)
{
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
unsigned long count = 0;
int i;

for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += per_cpu_ptr(chan->local, i)->count;

return sprintf(buf, "%lu\n", count);
}
@@ -105,7 +105,7 @@ static ssize_t show_in_use(struct class_
}

static struct class_device_attribute dma_class_attrs[] = {
- __ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(count, S_IRUGO, show_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
@@ -142,6 +142,10 @@ static struct dma_chan *dma_client_chan_

/* Find a channel, any DMA engine will do */
list_for_each_entry(device, &dma_device_list, global_node) {
+ /* Skip devices that don't provide the right function */
+ if (!device->funcs[client->type])
+ continue;
+
list_for_each_entry(chan, &device->channels, device_node) {
if (chan->client)
continue;
@@ -241,7 +245,8 @@ static void dma_chans_rebalance(void)
* dma_async_client_register - allocate and register a &dma_client
* @event_callback: callback for notification of channel addition/removal
*/
-struct dma_client *dma_async_client_register(dma_event_callback event_callback)
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback)
{
struct dma_client *client;

@@ -254,6 +259,7 @@ struct dma_client *dma_async_client_regi
client->chans_desired = 0;
client->chan_count = 0;
client->event_callback = event_callback;
+ client->type = type;

mutex_lock(&dma_list_mutex);
list_add_tail(&client->global_node, &dma_client_list);
@@ -402,11 +408,11 @@ subsys_initcall(dma_bus_init);
EXPORT_SYMBOL(dma_async_client_register);
EXPORT_SYMBOL(dma_async_client_unregister);
EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_buf_to_buf);
+EXPORT_SYMBOL(dma_async_buf_to_pg);
+EXPORT_SYMBOL(dma_async_pg_to_pg);
+EXPORT_SYMBOL(dma_async_complete);
+EXPORT_SYMBOL(dma_async_issue_pending);
EXPORT_SYMBOL(dma_async_device_register);
EXPORT_SYMBOL(dma_async_device_unregister);
EXPORT_SYMBOL(dma_chan_cleanup);
Index: linux-2.6/drivers/dma/ioatdma.c
===================================================================
--- linux-2.6.orig/drivers/dma/ioatdma.c
+++ linux-2.6/drivers/dma/ioatdma.c
@@ -681,6 +682,14 @@ out:
return err;
}

+struct dma_function ioat_memcpy_functions = {
+ .buf_to_buf = ioat_dma_memcpy_buf_to_buf,
+ .buf_to_pg = ioat_dma_memcpy_buf_to_pg,
+ .pg_to_pg = ioat_dma_memcpy_pg_to_pg,
+ .complete = ioat_dma_is_complete,
+ .issue_pending = ioat_dma_memcpy_issue_pending,
+};
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -756,11 +765,8 @@ static int __devinit ioat_probe(struct p

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.funcs[DMAFUNC_MEMCPY] = &ioat_memcpy_functions;
+
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

Index: linux-2.6/include/linux/dmaengine.h
===================================================================
--- linux-2.6.orig/include/linux/dmaengine.h
+++ linux-2.6/include/linux/dmaengine.h
@@ -67,14 +67,14 @@ enum dma_status {
/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
- * @memcpy_count: transaction counter
+ * @count: transaction counter
* @bytes_transferred: byte counter
*/

struct dma_chan_percpu {
local_t refcount;
/* stats */
- unsigned long memcpy_count;
+ unsigned long count;
unsigned long bytes_transferred;
};

@@ -138,6 +138,15 @@ static inline void dma_chan_put(struct d
typedef void (*dma_event_callback) (struct dma_client *client,
struct dma_chan *chan, enum dma_event event);

+/*
+ * dma_function_type - one entry for every possible function type provided
+ */
+enum dma_function_type {
+ DMAFUNC_MEMCPY = 0,
+ DMAFUNC_XOR,
+ DMAFUNC_MAX
+};
+
/**
* struct dma_client - info on the entity making use of DMA services
* @event_callback: func ptr to call when something happens
@@ -152,11 +161,35 @@ struct dma_client {
unsigned int chan_count;
unsigned int chans_desired;

+ enum dma_function_type type;
+
spinlock_t lock;
struct list_head channels;
struct list_head global_node;
};

+/* struct dma_function
+ * @buf_to_pg: buf pointer to struct page
+ * @pg_to_pg: struct page/offset to struct page/offset
+ * @complete: poll the status of a DMA transaction
+ * @issue_pending: push appended descriptors to hardware
+ */
+struct dma_function {
+ dma_cookie_t (*buf_to_buf)(struct dma_chan *chan,
+ void *dest, void *src, size_t len);
+ dma_cookie_t (*buf_to_pg)(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len);
+ dma_cookie_t (*pg_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off,
+ size_t len);
+ enum dma_status (*complete)(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used);
+ void (*issue_pending)(struct dma_chan *chan);
+};
+
/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
@@ -168,14 +201,8 @@ struct dma_client {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
*/
struct dma_device {
-
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
@@ -185,31 +212,24 @@ struct dma_device {

int dev_id;

+ struct dma_function *funcs[DMAFUNC_MAX];
+
int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last,
- dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
};

+#define CHAN2FUNCS(chan) (chan->device->funcs[chan->client->type])
+
/* --- public DMA engine API --- */

-struct dma_client *dma_async_client_register(dma_event_callback event_callback);
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback);
void dma_async_client_unregister(struct dma_client *client);
void dma_async_client_chan_request(struct dma_client *client,
unsigned int number);

/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * dma_async_buf_to_buf - offloaded copy between virtual addresses
* @chan: DMA channel to offload copy to
* @dest: destination address (virtual)
* @src: source address (virtual)
@@ -220,19 +240,19 @@ void dma_async_client_chan_request(struc
* Both @dest and @src must stay memory resident (kernel memory or locked
* user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len)
+static inline dma_cookie_t dma_async_buf_to_buf(struct dma_chan *chan,
+ void *dest, void *src, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return CHAN2FUNCS(chan)->buf_to_buf(chan, dest, src, len);
}

/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * dma_async_buf_to_pg - offloaded copy from address to page
* @chan: DMA channel to offload copy to
* @page: destination page
* @offset: offset in page to copy to
@@ -244,20 +264,21 @@ static inline dma_cookie_t dma_async_mem
* Both @page/@offset and @kdata must stay memory resident (kernel memory or
* locked user space pages)
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len)
+static inline dma_cookie_t dma_async_buf_to_pg(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return CHAN2FUNCS(chan)->buf_to_pg(chan, page, offset,
+ kdata, len);
}

/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * dma_async_pg_to_pg - offloaded copy from page to page
* @chan: DMA channel to offload copy to
* @dest_pg: destination page
* @dest_off: offset in page to copy to
@@ -270,33 +291,33 @@ static inline dma_cookie_t dma_async_mem
* Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
* (kernel memory or locked user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len)
+static inline dma_cookie_t dma_async_pg_to_pg( struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return CHAN2FUNCS(chan)->pg_to_pg(chan, dest_pg, dest_off,
+ src_pg, src_off, len);
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
* This allows drivers to push copies to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(struct dma_chan *chan)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ return CHAN2FUNCS(chan)->issue_pending(chan);
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +327,11 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+static inline enum dma_status dma_async_complete(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ return CHAN2FUNCS(chan)->complete(chan, cookie, last, used);
}

/**
@@ -318,7 +340,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_complete()
* the test logic is seperated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -1945,7 +1945,7 @@ out:
struct dma_chan *chan;
rcu_read_lock();
list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
+ dma_async_issue_pending(chan);
rcu_read_unlock();
}
#endif
@@ -3467,7 +3467,7 @@ static void netdev_dma_event(struct dma_
static int __init netdev_dma_register(void)
{
spin_lock_init(&net_dma_event_lock);
- net_dma_client = dma_async_client_register(netdev_dma_event);
+ net_dma_client = dma_async_client_register(DMAFUNC_MEMCPY, netdev_dma_event);
if (net_dma_client == NULL)
return -ENOMEM;

2006-09-15 20:04:16

by Olof Johansson

[permalink] [raw]
Subject: [PATCH] [v2] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

[Bad day, forgot a quilt refresh.]




Clean up dmaengine a bit. Make the client registration specify which
channel functions ("type") the client will use. Also, make devices
register which functions they will provide.

Also exorcise most of the memcpy-specific references from the generic
dma engine code. There's still some left in the iov stuff.


Signed-off-by: Olof Johansson <[email protected]>

Index: linux-2.6/drivers/dma/dmaengine.c
===================================================================
--- linux-2.6.orig/drivers/dma/dmaengine.c
+++ linux-2.6/drivers/dma/dmaengine.c
@@ -73,14 +73,14 @@ static LIST_HEAD(dma_client_list);

/* --- sysfs implementation --- */

-static ssize_t show_memcpy_count(struct class_device *cd, char *buf)
+static ssize_t show_count(struct class_device *cd, char *buf)
{
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
unsigned long count = 0;
int i;

for_each_possible_cpu(i)
- count += per_cpu_ptr(chan->local, i)->memcpy_count;
+ count += per_cpu_ptr(chan->local, i)->count;

return sprintf(buf, "%lu\n", count);
}
@@ -105,7 +105,7 @@ static ssize_t show_in_use(struct class_
}

static struct class_device_attribute dma_class_attrs[] = {
- __ATTR(memcpy_count, S_IRUGO, show_memcpy_count, NULL),
+ __ATTR(count, S_IRUGO, show_count, NULL),
__ATTR(bytes_transferred, S_IRUGO, show_bytes_transferred, NULL),
__ATTR(in_use, S_IRUGO, show_in_use, NULL),
__ATTR_NULL
@@ -142,6 +142,10 @@ static struct dma_chan *dma_client_chan_

/* Find a channel, any DMA engine will do */
list_for_each_entry(device, &dma_device_list, global_node) {
+ /* Skip devices that don't provide the right function */
+ if (!device->funcs[client->type])
+ continue;
+
list_for_each_entry(chan, &device->channels, device_node) {
if (chan->client)
continue;
@@ -241,7 +245,8 @@ static void dma_chans_rebalance(void)
* dma_async_client_register - allocate and register a &dma_client
* @event_callback: callback for notification of channel addition/removal
*/
-struct dma_client *dma_async_client_register(dma_event_callback event_callback)
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback)
{
struct dma_client *client;

@@ -254,6 +259,7 @@ struct dma_client *dma_async_client_regi
client->chans_desired = 0;
client->chan_count = 0;
client->event_callback = event_callback;
+ client->type = type;

mutex_lock(&dma_list_mutex);
list_add_tail(&client->global_node, &dma_client_list);
@@ -402,11 +408,11 @@ subsys_initcall(dma_bus_init);
EXPORT_SYMBOL(dma_async_client_register);
EXPORT_SYMBOL(dma_async_client_unregister);
EXPORT_SYMBOL(dma_async_client_chan_request);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-EXPORT_SYMBOL(dma_async_memcpy_complete);
-EXPORT_SYMBOL(dma_async_memcpy_issue_pending);
+EXPORT_SYMBOL(dma_async_buf_to_buf);
+EXPORT_SYMBOL(dma_async_buf_to_pg);
+EXPORT_SYMBOL(dma_async_pg_to_pg);
+EXPORT_SYMBOL(dma_async_complete);
+EXPORT_SYMBOL(dma_async_issue_pending);
EXPORT_SYMBOL(dma_async_device_register);
EXPORT_SYMBOL(dma_async_device_unregister);
EXPORT_SYMBOL(dma_chan_cleanup);
Index: linux-2.6/drivers/dma/ioatdma.c
===================================================================
--- linux-2.6.orig/drivers/dma/ioatdma.c
+++ linux-2.6/drivers/dma/ioatdma.c
@@ -40,6 +40,7 @@
#define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
#define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)

+
/* internal functions */
static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void __devexit ioat_remove(struct pci_dev *pdev);
@@ -681,6 +682,14 @@ out:
return err;
}

+struct dma_function ioat_memcpy_functions = {
+ .buf_to_buf = ioat_dma_memcpy_buf_to_buf,
+ .buf_to_pg = ioat_dma_memcpy_buf_to_pg,
+ .pg_to_pg = ioat_dma_memcpy_pg_to_pg,
+ .complete = ioat_dma_is_complete,
+ .issue_pending = ioat_dma_memcpy_issue_pending,
+};
+
static int __devinit ioat_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
@@ -756,11 +765,8 @@ static int __devinit ioat_probe(struct p

device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
- device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
- device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
- device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
- device->common.device_memcpy_complete = ioat_dma_is_complete;
- device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+ device->common.funcs[DMAFUNC_MEMCPY] = &ioat_memcpy_functions;
+
printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
device->common.chancnt);

Index: linux-2.6/include/linux/dmaengine.h
===================================================================
--- linux-2.6.orig/include/linux/dmaengine.h
+++ linux-2.6/include/linux/dmaengine.h
@@ -67,14 +67,14 @@ enum dma_status {
/**
* struct dma_chan_percpu - the per-CPU part of struct dma_chan
* @refcount: local_t used for open-coded "bigref" counting
- * @memcpy_count: transaction counter
+ * @count: transaction counter
* @bytes_transferred: byte counter
*/

struct dma_chan_percpu {
local_t refcount;
/* stats */
- unsigned long memcpy_count;
+ unsigned long count;
unsigned long bytes_transferred;
};

@@ -138,6 +138,15 @@ static inline void dma_chan_put(struct d
typedef void (*dma_event_callback) (struct dma_client *client,
struct dma_chan *chan, enum dma_event event);

+/*
+ * dma_function_type - one entry for every possible function type provided
+ */
+enum dma_function_type {
+ DMAFUNC_MEMCPY = 0,
+ DMAFUNC_XOR,
+ DMAFUNC_MAX
+};
+
/**
* struct dma_client - info on the entity making use of DMA services
* @event_callback: func ptr to call when something happens
@@ -152,11 +161,35 @@ struct dma_client {
unsigned int chan_count;
unsigned int chans_desired;

+ enum dma_function_type type;
+
spinlock_t lock;
struct list_head channels;
struct list_head global_node;
};

+/* struct dma_function
+ * @buf_to_pg: buf pointer to struct page
+ * @pg_to_pg: struct page/offset to struct page/offset
+ * @complete: poll the status of a DMA transaction
+ * @issue_pending: push appended descriptors to hardware
+ */
+struct dma_function {
+ dma_cookie_t (*buf_to_buf)(struct dma_chan *chan,
+ void *dest, void *src, size_t len);
+ dma_cookie_t (*buf_to_pg)(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len);
+ dma_cookie_t (*pg_to_pg)(struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off,
+ size_t len);
+ enum dma_status (*complete)(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used);
+ void (*issue_pending)(struct dma_chan *chan);
+};
+
/**
* struct dma_device - info on the entity supplying DMA services
* @chancnt: how many DMA channels are supported
@@ -168,14 +201,8 @@ struct dma_client {
* @device_alloc_chan_resources: allocate resources and return the
* number of allocated descriptors
* @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
*/
struct dma_device {
-
unsigned int chancnt;
struct list_head channels;
struct list_head global_node;
@@ -185,31 +212,24 @@ struct dma_device {

int dev_id;

+ struct dma_function *funcs[DMAFUNC_MAX];
+
int (*device_alloc_chan_resources)(struct dma_chan *chan);
void (*device_free_chan_resources)(struct dma_chan *chan);
- dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
- void *dest, void *src, size_t len);
- dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata,
- size_t len);
- dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off,
- struct page *src_pg, unsigned int src_off, size_t len);
- enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last,
- dma_cookie_t *used);
- void (*device_memcpy_issue_pending)(struct dma_chan *chan);
};

+#define CHAN2FUNCS(chan) (chan->device->funcs[chan->client->type])
+
/* --- public DMA engine API --- */

-struct dma_client *dma_async_client_register(dma_event_callback event_callback);
+struct dma_client *dma_async_client_register(enum dma_function_type type,
+ dma_event_callback event_callback);
void dma_async_client_unregister(struct dma_client *client);
void dma_async_client_chan_request(struct dma_client *client,
unsigned int number);

/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * dma_async_buf_to_buf - offloaded copy between virtual addresses
* @chan: DMA channel to offload copy to
* @dest: destination address (virtual)
* @src: source address (virtual)
@@ -220,19 +240,19 @@ void dma_async_client_chan_request(struc
* Both @dest and @src must stay memory resident (kernel memory or locked
* user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len)
+static inline dma_cookie_t dma_async_buf_to_buf(struct dma_chan *chan,
+ void *dest, void *src, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
+ return CHAN2FUNCS(chan)->buf_to_buf(chan, dest, src, len);
}

/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * dma_async_buf_to_pg - offloaded copy from address to page
* @chan: DMA channel to offload copy to
* @page: destination page
* @offset: offset in page to copy to
@@ -244,20 +264,21 @@ static inline dma_cookie_t dma_async_mem
* Both @page/@offset and @kdata must stay memory resident (kernel memory or
* locked user space pages)
*/
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len)
+static inline dma_cookie_t dma_async_buf_to_pg(struct dma_chan *chan,
+ struct page *page, unsigned int offset,
+ void *kdata, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
- kdata, len);
+ return CHAN2FUNCS(chan)->buf_to_pg(chan, page, offset,
+ kdata, len);
}

/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * dma_async_pg_to_pg - offloaded copy from page to page
* @chan: DMA channel to offload copy to
* @dest_pg: destination page
* @dest_off: offset in page to copy to
@@ -270,33 +291,33 @@ static inline dma_cookie_t dma_async_mem
* Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
* (kernel memory or locked user space pages).
*/
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len)
+static inline dma_cookie_t dma_async_pg_to_pg( struct dma_chan *chan,
+ struct page *dest_pg, unsigned int dest_off,
+ struct page *src_pg, unsigned int src_off, size_t len)
{
int cpu = get_cpu();
per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
- per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+ per_cpu_ptr(chan->local, cpu)->count++;
put_cpu();

- return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
- src_pg, src_off, len);
+ return CHAN2FUNCS(chan)->pg_to_pg(chan, dest_pg, dest_off,
+ src_pg, src_off, len);
}

/**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending copies to HW
* @chan: target DMA channel
*
* This allows drivers to push copies to HW in batches,
* reducing MMIO writes where possible.
*/
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(struct dma_chan *chan)
{
- return chan->device->device_memcpy_issue_pending(chan);
+ return CHAN2FUNCS(chan)->issue_pending(chan);
}

/**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_complete - poll for transaction completion
* @chan: DMA channel
* @cookie: transaction identifier to check status of
* @last: returns last completed cookie, can be NULL
@@ -306,10 +327,11 @@ static inline void dma_async_memcpy_issu
* internal state and can be used with dma_async_is_complete() to check
* the status of multiple cookies without re-checking hardware state.
*/
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
- dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
+static inline enum dma_status dma_async_complete(struct dma_chan *chan,
+ dma_cookie_t cookie, dma_cookie_t *last,
+ dma_cookie_t *used)
{
- return chan->device->device_memcpy_complete(chan, cookie, last, used);
+ return CHAN2FUNCS(chan)->complete(chan, cookie, last, used);
}

/**
@@ -318,7 +340,7 @@ static inline enum dma_status dma_async_
* @last_complete: last know completed transaction
* @last_used: last cookie value handed out
*
- * dma_async_is_complete() is used in dma_async_memcpy_complete()
+ * dma_async_is_complete() is used in dma_async_complete()
* the test logic is seperated for lightweight testing of multiple cookies
*/
static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -1945,7 +1945,7 @@ out:
struct dma_chan *chan;
rcu_read_lock();
list_for_each_entry_rcu(chan, &net_dma_client->channels, client_node)
- dma_async_memcpy_issue_pending(chan);
+ dma_async_issue_pending(chan);
rcu_read_unlock();
}
#endif
@@ -3467,7 +3467,7 @@ static void netdev_dma_event(struct dma_
static int __init netdev_dma_register(void)
{
spin_lock_init(&net_dma_event_lock);
- net_dma_client = dma_async_client_register(netdev_dma_event);
+ net_dma_client = dma_async_client_register(DMAFUNC_MEMCPY, netdev_dma_event);
if (net_dma_client == NULL)
return -ENOMEM;

Index: linux-2.6/drivers/dma/iovlock.c
===================================================================
--- linux-2.6.orig/drivers/dma/iovlock.c
+++ linux-2.6/drivers/dma/iovlock.c
@@ -151,7 +151,7 @@ static dma_cookie_t dma_memcpy_to_kernel
while (len > 0) {
if (iov->iov_len) {
int copy = min_t(unsigned int, iov->iov_len, len);
- dma_cookie = dma_async_memcpy_buf_to_buf(
+ dma_cookie = dma_async_buf_to_buf(
chan,
iov->iov_base,
kdata,
@@ -210,7 +210,7 @@ dma_cookie_t dma_memcpy_to_iovec(struct
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_buf_to_pg(chan,
+ dma_cookie = dma_async_buf_to_pg(chan,
page_list->pages[page_idx],
iov_byte_offset,
kdata,
@@ -274,7 +274,7 @@ dma_cookie_t dma_memcpy_pg_to_iovec(stru
copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
copy = min_t(int, copy, iov[iovec_idx].iov_len);

- dma_cookie = dma_async_memcpy_pg_to_pg(chan,
+ dma_cookie = dma_async_pg_to_pg(chan,
page_list->pages[page_idx],
iov_byte_offset,
page,
Index: linux-2.6/net/ipv4/tcp.c
===================================================================
--- linux-2.6.orig/net/ipv4/tcp.c
+++ linux-2.6/net/ipv4/tcp.c
@@ -1431,11 +1431,11 @@ skip_copy:
struct sk_buff *skb;
dma_cookie_t done, used;

- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_issue_pending(tp->ucopy.dma_chan);

- while (dma_async_memcpy_complete(tp->ucopy.dma_chan,
- tp->ucopy.dma_cookie, &done,
- &used) == DMA_IN_PROGRESS) {
+ while (dma_async_complete(tp->ucopy.dma_chan,
+ tp->ucopy.dma_cookie, &done,
+ &used) == DMA_IN_PROGRESS) {
/* do partial cleanup of sk_async_wait_queue */
while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
(dma_async_is_complete(skb->dma_cookie, done,

2006-09-18 22:56:40

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

On 9/15/06, Olof Johansson <[email protected]> wrote:
> On Fri, 15 Sep 2006 11:38:17 -0500 Olof Johansson <[email protected]> wrote:
>
> > On Mon, 11 Sep 2006 19:44:16 -0400 Jeff Garzik <[email protected]> wrote:
>
> > > Are we really going to add a set of hooks for each DMA engine whizbang
> > > feature?
> > >
> > > That will get ugly when DMA engines support memcpy, xor, crc32, sha1,
> > > aes, and a dozen other transforms.
> >
> >
> > Yes, it will be unmaintainable. We need some sort of multiplexing with
> > per-function registrations.
> >
> > Here's a first cut at it, just very quick. It could be improved further
> > but it shows that we could exorcise most of the hardcoded things pretty
> > easily.
>
> Ok, that was obviously a naive and not so nice first attempt, but I
> figured it was worth it to show how it can be done.
>
> This is a little more proper: Specify at client registration time what
> the function the client will use is, and make the channel use it. This
> way most of the error checking per call can be removed too.
>
> Chris/Dan: Please consider picking this up as a base for the added
> functionality and cleanups.
>
Thanks for this Olof, it has sparked some ideas about how to redo
support for multiple operations.

>
>
>
>
> Clean up dmaengine a bit. Make the client registration specify which
> channel functions ("type") the client will use. Also, make devices
> register which functions they will provide.
>
> Also exorcise most of the memcpy-specific references from the generic
> dma engine code. There's still some left in the iov stuff.
I think we should keep the operation type in the function name but
drop all the [buf|pg|dma]_to_[buf|pg|dma] permutations. The buffer
type can be handled generically across all operation types. Something
like the following for a pg_to_buf memcpy.

struct dma_async_op_memcpy *op;
struct page *pg;
void *buf;
size_t len;

dma_async_op_init_src_pg(op, pg);
dma_async_op_init_dest_buf(op, buf);
dma_async_memcpy(chan, op, len);

-Dan

2006-09-19 01:08:45

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

On Mon, 18 Sep 2006 15:56:37 -0700 "Dan Williams" <[email protected]> wrote:

> On 9/15/06, Olof Johansson <[email protected]> wrote:
> > On Fri, 15 Sep 2006 11:38:17 -0500 Olof Johansson <[email protected]> wrote:

> > Chris/Dan: Please consider picking this up as a base for the added
> > functionality and cleanups.
> >
> Thanks for this Olof it has sparked some ideas about how to redo
> support for multiple operations.

Good. :)

> I think we should keep the operation type in the function name but
> drop all the [buf|pg|dma]_to_[buf|pg|dma] permutations. The buffer
> type can be handled generically across all operation types. Something
> like the following for a pg_to_buf memcpy.
>
> struct dma_async_op_memcpy *op;
> struct page *pg;
> void *buf;
> size_t len;
>
> dma_async_op_init_src_pg(op, pg);
> dma_async_op_init_dest_buf(op, buf);
> dma_async_memcpy(chan, op, len);

I'm generally for a more generic interface, especially in the address
permutation cases like above. However, I think it'll be a mistake to
keep the association between the API and the function names and types
so close.

What's the benefit of keeping a memcpy-specific dma_async_memcpy()
instead of a more generic dma_async_commit() (or similar)? We'll know
based on how the client/channel was allocated what kind of function is
requested, won't we?

Same goes for the dma_async_op_memcpy. Make it a union that has a type
field if you need per-operation settings. But as before, we'll know
what kind of op structure gets passed in since we'll know what kind of
operation is to be performed on it.

Finally, yet again the same goes for the op_init settings. I would even
prefer it to not be function-based, instead just direct union/struct
assignments.

struct dma_async_op op;
...

op.src_type = PG; op.src = pg;
op.dest_type = BUF; op.dest = buf;
op.len = len;
dma_async_commit(chan, op);

op might have to be dynamically allocated, since it'll outlive the
scope of this function. But the idea would be the same.
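
To make that concrete, a hypothetical layout for such an op descriptor
is sketched below; none of these names exist in the posted patches, the
structure just fleshes out the src/dest/len fields used in the snippet
above plus a callback for the asynchronous case.

enum dma_addr_type { DMA_ADDR_BUF, DMA_ADDR_PG, DMA_ADDR_DMA };

struct dma_async_addr {
	enum dma_addr_type type;
	union {
		void *buf;
		struct page *pg;
		dma_addr_t dma;
	};
	unsigned int offset;			/* only used with PG */
};

struct dma_async_op {
	struct dma_async_addr	src;
	struct dma_async_addr	dest;
	size_t			len;
	void			(*callback)(void *dev_id);
	void			*dev_id;	/* passed to the callback */
};

	/* usage, error handling omitted */
	struct dma_async_op *op;

	/* dynamically allocated, since it outlives the submitting function */
	op = kzalloc(sizeof(*op), GFP_ATOMIC);
	op->src.type = DMA_ADDR_PG;   op->src.pg = pg;
	op->dest.type = DMA_ADDR_BUF; op->dest.buf = buf;
	op->len = len;
	dma_async_commit(chan, op);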


-Olof

2006-09-19 10:57:27

by Alan

[permalink] [raw]
Subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

On Mon, 2006-09-18 at 20:05 -0500, Olof Johansson wrote:
> On Mon, 18 Sep 2006 15:56:37 -0700 "Dan Williams" <[email protected]> wrote:

> op.src_type = PG; op.src = pg;
> op.dest_type = BUF; op.dest = buf;
> op.len = len;
> dma_async_commit(chan, op);

At OLS Linus suggested it should distinguish between sync and async
events for locking reasons.

if (dma_async_commit(foo) == SYNC_COMPLETE) {
	finalise_stuff();
} else {
	/* will call foo->callback(foo->dev_id) */
}

because otherwise you have locking complexities - the callback wants to
take locks to guard the object it works on, but if it is called
synchronously - e.g. if the hardware is busy and we fall back - it might
deadlock with the caller of dma_async_foo() who also needs to hold the
lock.
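
A minimal illustration of that convention follows; dma_async_commit(),
SYNC_COMPLETE and the stripe fields are hypothetical names matching the
snippet above, not anything in the posted patches.

static void stripe_op_done(void *dev_id)
{
	struct stripe_head *sh = dev_id;

	/* asynchronous completion: the submitter dropped its locks long
	 * ago, so taking sh->lock here is safe */
	spin_lock(&sh->lock);
	set_bit(STRIPE_OP_DONE, &sh->state);
	spin_unlock(&sh->lock);
}

/* caller already holds sh->lock */
static void submit_stripe_op(struct stripe_head *sh, struct dma_async_op *op)
{
	op->callback = stripe_op_done;
	op->dev_id = sh;

	if (dma_async_commit(sh->chan, op) == SYNC_COMPLETE)
		/* synchronous fallback: finish under the lock we already
		 * hold instead of re-taking it in the callback */
		set_bit(STRIPE_OP_DONE, &sh->state);
	/* otherwise stripe_op_done() runs later from the engine */
}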

Alan

2006-09-19 16:35:59

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH] dmaengine: clean up and abstract function types (was Re: [PATCH 08/19] dmaengine: enable multiple clients and operations)

On Tue, 19 Sep 2006 12:20:09 +0100 Alan Cox <[email protected]> wrote:

> On Mon, 2006-09-18 at 20:05 -0500, Olof Johansson wrote:
> > On Mon, 18 Sep 2006 15:56:37 -0700 "Dan Williams" <[email protected]> wrote:
>
> > op.src_type = PG; op.src = pg;
> > op.dest_type = BUF; op.dest = buf;
> > op.len = len;
> > dma_async_commit(chan, op);
>
> At OLS Linus suggested it should distinguish between sync and async
> events for locking reasons.
>
> if(dma_async_commit(foo) == SYNC_COMPLETE) {
> finalise_stuff();
> }
>
> else /* will call foo->callback(foo->dev_id) */
>
> because otherwise you have locking complexities - the callback wants to
> take locks to guard the object it works on, but if it is called
> synchronously - e.g. if hardware is busy and we fall back - it might
> deadlock with the caller of dma_async_foo(), who also needs to hold the
> lock.

Good point, sounds very reasonable to me.


-Olof

2006-10-09 00:31:28

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction



On Monday September 11, [email protected] wrote:
> Neil,
>
> The following patches implement hardware accelerated raid5 for the Intel
> Xscale® series of I/O Processors. The MD changes allow stripe
> operations to run outside the spin lock in a work queue. Hardware
> acceleration is achieved by using a dma-engine-aware work queue routine
> instead of the default software only routine.

Hi Dan,
Sorry for the delay in replying.
I've looked through these patches at last (mostly the raid-specific
bits) and while there is clearly a lot of good stuff here, it doesn't
'feel' right - it just seems too complex.

The particular issues that stand out to me are:
- 33 new STRIPE_OP_* flags. I'm sure there doesn't need to be that
many new flags.
- the "raid5 dma client" patch moves far too much internal
knowledge about raid5 into drivers/dma.

Clearly there are some complex issues being dealt with and some
complexity is to be expected, but I feel there must be room for some
serious simplification.

Let me try to describe how I envisage it might work.

As you know, the theory-of-operation of handle_stripe is that it
assesses the state of a stripe deciding what actions to perform and
then performs them. Synchronous actions (e.g. current parity calcs)
are performed 'in-line'. Async actions (reads, writes) and actions
that cannot be performed under a spinlock (->bi_end_io) are recorded
as being needed and then are initiated at the end of handle_stripe
outside of the sh->lock.

The proposal is to bring the parity and other bulk-memory operations
out of the spinlock and make them optionally asynchronous.

The set of tasks that might be needed to be performed on a stripe
are:
Clear a target cache block
pre-xor various cache blocks into a target
copy data out of bios into cache blocks. (drain)
post-xor various cache blocks into a target
copy data into bios out of cache blocks (fill)
test if a cache block is all zeros
start a read on a cache block
start a write on a cache block

(There is also a memcpy when expanding raid5. I think I would try to
simply avoid that copy and move pointers around instead).

Some of these steps require sequencing. e.g.
clear, pre-xor, copy, post-xor, write
for an rmw cycle.
We could require handle_stripe to be called again for each step.
i.e. first call just clears the target and flags it as clear. Next
call initiates the pre-xor and flags that as done. Etc. However I
think that would make the non-offloaded case too slow, or at least
too clumsy.

So instead we set flags to say what needs to be done and have a
workqueue system that does it.

(so far this is all quite similar to what you have done.)

So handle_stripe would set various flags and other things (like
identify which block was the 'target' block) and run the following
in a workqueue:

static void raid5_do_stuff(struct stripe_head *sh)
{
	raid5_conf_t *conf = sh->raid_conf;
	int rv;

	if (test_bit(CLEAR_TARGET, &sh->ops.pending)) {
		struct page *p = sh->dev[sh->ops.target].page;

		rv = async_memset(p, 0, 0, PAGE_SIZE, ops_done, sh);
		if (rv != BUSY)
			clear_bit(CLEAR_TARGET, &sh->ops.pending);
		if (rv != COMPLETE)
			goto out;
	}

	while (test_bit(PRE_XOR, &sh->ops.pending)) {
		struct page *plist[XOR_MAX];
		int offset[XOR_MAX];
		int pos = 0;
		int d;

		for (d = sh->ops.nextdev;
		     d < conf->raid_disks && pos < XOR_MAX;
		     d++) {
			if (d == sh->ops.target)
				continue;
			if (!test_bit(R5_WantPreXor, &sh->dev[d].flags))
				continue;
			plist[pos] = sh->dev[d].page;
			offset[pos++] = 0;
		}
		if (pos) {
			struct page *p = sh->dev[sh->ops.target].page;

			rv = async_xor(p, 0, plist, offset, pos, PAGE_SIZE,
				       ops_done, sh);
			if (rv != BUSY)
				sh->ops.nextdev = d;
			if (rv != COMPLETE)
				goto out;
		} else {
			clear_bit(PRE_XOR, &sh->ops.pending);
			sh->ops.nextdev = 0;
		}
	}

	while (test_bit(COPY_IN, &sh->ops.pending)) {
		...
	}
	....

	if (test_bit(START_IO, &sh->ops.pending)) {
		int d;
		for (d = 0; d < conf->raid_disks; d++) {
			/* all that code from the end of handle_stripe */
		}
	}

	release_stripe(conf, sh);
	return;

 out:
	if (rv == BUSY) {
		/* wait on something and try again ??? */
	}
	return;
}

static void ops_done(struct stripe_head *sh)
{
	queue_work(....whatever..);
}


Things to note:
- we keep track of where we are up to in sh->ops.
.pending is flags saying what is left to be done
.nextdev is the next device to process for operations that
work on several devices
.next_bio, .next_iov will be needed for copy operations that
cross multiple bios and iovecs.

- Each sh->dev has R5_Want flags reflecting which multi-device
operations are wanted on each device.

- async bulk-memory operations take pages, offsets, and lengths,
and can return COMPLETE (if the operation was performed
synchronously) IN_PROGRESS (if it has been started, or at least
queued) or BUSY if it couldn't even be queued. Exactly what to do
in that case I'm not sure. Probably we need a waitqueue to wait
on.

- The interface between the client and the ADMA hardware is a
collection of async_ functions. async_memcpy, async_xor,
async_memset etc.

I gather there needs to be some understanding
about whether the pages are already appropriately mapped for DMA or
whether a mapping is needed. Maybe an extra flag argument should
be passed.

I imagine that any piece of ADMA hardware would register with the
'async_*' subsystem, and a call to async_X would be routed as
appropriate, or be run in-line.
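
For concreteness, the kind of prototype I am imagining for the
bulk-memory calls (only a sketch - the names and argument order are
chosen to line up with the pseudo-code above, nothing here exists yet):

/* Only a sketch - none of this exists yet. */
enum async_status { COMPLETE, IN_PROGRESS, BUSY };

enum async_status async_xor(struct page *dest, unsigned int dest_offset,
			    struct page **src_list, int *src_offset,
			    int src_cnt, size_t len,
			    void (*done)(void *arg), void *arg);

/*
 * Routing idea: an ADMA driver registers its channels with the async_*
 * layer; if a channel can queue the request the call returns IN_PROGRESS
 * (or BUSY if it cannot even be queued), otherwise the operation is
 * performed in-line by the existing software xor code and COMPLETE is
 * returned, so the caller knows 'done' will never be invoked for it.
 */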

This approach introduces 8 flags for sh->ops.pending and maybe two or
three new R5_Want* flags. It also keeps the raid5 knowledge firmly in
the raid5 code base. So it seems to keep the complexity under control.

Would this approach make sense to you? Is there something really
important I have missed?

(I'll try and be more responsive next time).

Thanks,
NeilBrown

2006-10-10 18:23:10

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On 10/8/06, Neil Brown <[email protected]> wrote:
>
>
> On Monday September 11, [email protected] wrote:
> > Neil,
> >
> > The following patches implement hardware accelerated raid5 for the Intel
> > Xscale(r) series of I/O Processors. The MD changes allow stripe
> > operations to run outside the spin lock in a work queue. Hardware
> > acceleration is achieved by using a dma-engine-aware work queue routine
> > instead of the default software only routine.
>
> Hi Dan,
> Sorry for the delay in replying.
> I've looked through these patches at last (mostly the raid-specific
> bits) and while there is clearly a lot of good stuff here, it doesn't
> 'feel' right - it just seems too complex.
>
> The particular issues that stand out to me are:
> - 33 new STRIPE_OP_* flags. I'm sure there doesn't need to be that
> many new flags.
> - the "raid5 dma client" patch moves far too much internal
> knowledge about raid5 into drivers/dma.
>
> Clearly there are some complex issues being dealt with and some
> complexity is to be expected, but I feel there must be room for some
> serious simplification.
A valid criticism. There was definitely a push to just get it
functional, so I can now see how the complexity crept into the
implementation. The primary cause was the choice to explicitly handle
channel switching in raid5-dma. However, relieving "client" code from
this responsibility is something I am taking care of in the async API
changes.

>
> Let me try to describe how I envisage it might work.
>
> As you know, the theory-of-operation of handle_stripe is that it
> assesses the state of a stripe deciding what actions to perform and
> then performs them. Synchronous actions (e.g. current parity calcs)
> are performed 'in-line'. Async actions (reads, writes) and actions
> that cannot be performed under a spinlock (->bi_end_io) are recorded
> as being needed and then are initiated at the end of handle_stripe
> outside of the sh->lock.
>
> The proposal is to bring the parity and other bulk-memory operations
> out of the spinlock and make them optionally asynchronous.
>
> The set of tasks that might be needed to be performed on a stripe
> are:
> Clear a target cache block
> pre-xor various cache blocks into a target
> copy data out of bios into cache blocks. (drain)
> post-xor various cache blocks into a target
> copy data into bios out of cache blocks (fill)
> test if a cache block is all zeros
> start a read on a cache block
> start a write on a cache block
>
> (There is also a memcpy when expanding raid5. I think I would try to
> simply avoid that copy and move pointers around instead).
>
> Some of these steps require sequencing. e.g.
> clear, pre-xor, copy, post-xor, write
> for an rmw cycle.
> We could require handle_stripe to be called again for each step.
> i.e. first call just clears the target and flags it as clear. Next
> call initiates the pre-xor and flags that as done. Etc. However I
> think that would make the non-offloaded case too slow, or at least
> too clumsy.
>
> So instead we set flags to say what needs to be done and have a
> workqueue system that does it.
>
> (so far this is all quite similar to what you have done.)
>
> So handle_stripe would set various flags and other things (like
> identify which block was the 'target' block) and run the following
> in a workqueue:
>
> static void raid5_do_stuff(struct stripe_head *sh)
> {
> 	raid5_conf_t *conf = sh->raid_conf;
> 	int rv;
>
> 	if (test_bit(CLEAR_TARGET, &sh->ops.pending)) {
> 		struct page *p = sh->dev[sh->ops.target].page;
>
> 		rv = async_memset(p, 0, 0, PAGE_SIZE, ops_done, sh);
> 		if (rv != BUSY)
> 			clear_bit(CLEAR_TARGET, &sh->ops.pending);
> 		if (rv != COMPLETE)
> 			goto out;
> 	}
>
> 	while (test_bit(PRE_XOR, &sh->ops.pending)) {
> 		struct page *plist[XOR_MAX];
> 		int offset[XOR_MAX];
> 		int pos = 0;
> 		int d;
>
> 		for (d = sh->ops.nextdev;
> 		     d < conf->raid_disks && pos < XOR_MAX;
> 		     d++) {
> 			if (d == sh->ops.target)
> 				continue;
> 			if (!test_bit(R5_WantPreXor, &sh->dev[d].flags))
> 				continue;
> 			plist[pos] = sh->dev[d].page;
> 			offset[pos++] = 0;
> 		}
> 		if (pos) {
> 			struct page *p = sh->dev[sh->ops.target].page;
>
> 			rv = async_xor(p, 0, plist, offset, pos, PAGE_SIZE,
> 				       ops_done, sh);
> 			if (rv != BUSY)
> 				sh->ops.nextdev = d;
> 			if (rv != COMPLETE)
> 				goto out;
> 		} else {
> 			clear_bit(PRE_XOR, &sh->ops.pending);
> 			sh->ops.nextdev = 0;
> 		}
> 	}
>
> 	while (test_bit(COPY_IN, &sh->ops.pending)) {
> 		...
> 	}
> 	....
>
> 	if (test_bit(START_IO, &sh->ops.pending)) {
> 		int d;
> 		for (d = 0; d < conf->raid_disks; d++) {
> 			/* all that code from the end of handle_stripe */
> 		}
> 	}
>
> 	release_stripe(conf, sh);
> 	return;
>
>  out:
> 	if (rv == BUSY) {
> 		/* wait on something and try again ??? */
> 	}
> 	return;
> }
>
> static void ops_done(struct stripe_head *sh)
> {
> 	queue_work(....whatever..);
> }
>
>
> Things to note:
> - we keep track of where we are up to in sh->ops.
> .pending is flags saying what is left to be done
> .nextdev is the next device to process for operations that
> work on several devices
> .next_bio, .next_iov will be needed for copy operations that
> cross multiple bios and iovecs.
>
> - Each sh->dev has R5_Want flags reflecting which multi-device
> operations are wanted on each device.
>
> - async bulk-memory operations take pages, offsets, and lengths,
> and can return COMPLETE (if the operation was performed
> synchronously) IN_PROGRESS (if it has been started, or at least
> queued) or BUSY if it couldn't even be queued. Exactly what to do
> in that case I'm not sure. Probably we need a waitqueue to wait
> on.
>
> - The interface between the client and the ADMA hardware is a
> collection of async_ functions. async_memcpy, async_xor,
> async_memset etc.
>
> I gather there needs to be some understanding
> about whether the pages are already appropriately mapped for DMA or
> whether a mapping is needed. Maybe an extra flag argument should
> be passed.
>
> I imagine that any piece of ADMA hardware would register with the
> 'async_*' subsystem, and a call to async_X would be routed as
> appropriate, or be run in-line.
>
> This approach introduces 8 flags for sh->ops.pending and maybe two or
> three new R5_Want* flags. It also keeps the raid5 knowledge firmly in
> the raid5 code base. So it seems to keep the complexity under control.
>
> Would this approach make sense to you?
Definitely.

> Is there something really important I have missed?
No, nothing important jumps out. Just a follow up question/note about
the details.

You imply that the async path and the sync path are unified in this
implementation. I think it is doable but it will add some complexity
since the sync case is not a distinct subset of the async case. For
example "Clear a target cache block" is required for the sync case,
but it can go away when using hardware engines. Engines typically
have their own accumulator buffer to store the temporary result,
whereas software only operates on memory.

What do you think of adding async tests for these situations?
test_bit(XOR, &conf->async)

where the flag is set if calls to async_<operation> may be routed to a
hardware engine? Otherwise skip any async-specific details.
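
i.e. something along these lines (just a sketch - CLEAR_TARGET, PRE_XOR
and sh->ops.pending are from your pseudo-code; conf->async and
schedule_parity_ops() are made up):

/* Sketch only: queue the explicit clear of the target block just for the
 * software path; engines with an internal accumulator don't need it.
 */
static void schedule_parity_ops(raid5_conf_t *conf, struct stripe_head *sh)
{
	if (!test_bit(XOR, &conf->async))
		set_bit(CLEAR_TARGET, &sh->ops.pending);
	set_bit(PRE_XOR, &sh->ops.pending);
}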

>
> (I'll try and be more responsive next time).
Thanks for shepherding this along.

>
> Thanks,
> NeilBrown

Regards,
Dan

2006-10-11 01:46:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

On 9/14/06, Jakob Oestergaard <[email protected]> wrote:
> On Wed, Sep 13, 2006 at 12:17:55PM -0700, Dan Williams wrote:
> ...
> > >Out of curiosity; how does accelerated compare to non-accelerated?
> >
> > One quick example:
> > 4-disk SATA array rebuild on iop321 without acceleration - 'top'
> > reports md0_resync and md0_raid5 dueling for the CPU each at ~50%
> > utilization.
> >
> > With acceleration - 'top' reports md0_resync cpu utilization at ~90%
> > with the rest split between md0_raid5 and md0_raid5_ops.
> >
> > The sync speed reported by /proc/mdstat is ~40% higher in the accelerated
> > case.
>
> Ok, nice :)
>
> >
> > That being said, array resync is a special case, so your mileage may
> > vary with other applications.
>
> Every-day usage I/O performance data would be nice indeed :)
>
> > I will put together some data from bonnie++, iozone, maybe contest,
> > and post it on SourceForge.
>
> Great!
>
I have posted some Iozone data and graphs showing the performance
impact of the patches across the three IOP processors: iop321, iop331,
and iop341. The general takeaway from the data is that using the DMA
engines extends the region that Iozone calls the "buffer cache
effect". Write performance benefited the most, as expected, but read
performance showed some modest gains as well. There are some regions
(smaller file sizes and record lengths) that show a performance
disadvantage, but it is typically less than 5%.

The graphs map the relative performance multiplier that the raid
patches generate ('2.6.18-rc6 performance' x 'performance multiplier'
= '2.6.18-rc6-raid performance'). A value of '1' designates equal
performance. The large cliff that drops to zero is a "not measured"
region, i.e. the record length is larger than the file size. Iozone
outputs to Excel, but I have also made PDFs of the graphs available.
Note: OpenOffice Calc can view the data but it does not support the 3D
surface graphs that Iozone uses.

Excel:
http://prdownloads.sourceforge.net/xscaleiop/iozone_raid_accel.xls?download

PDF Graphs:
http://prdownloads.sourceforge.net/xscaleiop/iop-iozone-graphs-20061010.tar.bz2?download

Regards,
Dan

2006-10-11 02:45:01

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 00/19] Hardware Accelerated MD RAID5: Introduction

[dropped akpm from the Cc: as current discussion isn't directly
relevant to him]
On Tuesday October 10, [email protected] wrote:
> On 10/8/06, Neil Brown <[email protected]> wrote:
>
> > Is there something really important I have missed?
> No, nothing important jumps out. Just a follow up question/note about
> the details.
>
> You imply that the async path and the sync path are unified in this
> implementation. I think it is doable but it will add some complexity
> since the sync case is not a distinct subset of the async case. For
> example "Clear a target cache block" is required for the sync case,
> but it can go away when using hardware engines. Engines typically
> have their own accumulator buffer to store the temporary result,
> whereas software only operates on memory.
>
> What do you think of adding async tests for these situations?
> test_bit(XOR, &conf->async)
>
> where the flag is set if calls to async_<operation> may be routed to a
> hardware engine? Otherwise skip any async-specific details.

I'd rather try to come up with an interface that was equally
appropriate to both offload and inline. I appreciate that it might
not be possible to get an interface that gets best performance out of
both, but I'd like to explore that direction first.

I'd guess from what you say that the dma engine is given a bunch of
sources and a destination and it xor's all the sources together into
an accumulation buffer, and then writes the accum buffer to the
destination. Would that be right? Can you use the destination as one
of the sources?

That can obviously be done inline too with some changes to the xor
code, and avoiding the initial memset might be good for performance
too.

So I would suggest we drop the memset idea, and define the async_xor
interface to xor a number of sources into a destination, where the
destination is allowed to be the same as the first source, but
doesn't need to be.
Then the inline version could use a memset followed by the current xor
operations, or could use newly written xor operations, and the offload
version could equally do whatever is appropriate.
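
e.g. the inline fallback under that definition might look roughly like
this (a sketch only - the function name and layout are made up):

/* Sketch of an inline fallback where the destination may (but need not)
 * be the same page as the first source.
 */
static void xor_into(struct page *dest, struct page **srcs, int src_cnt,
		     size_t len)
{
	unsigned long *d = page_address(dest);
	int i = 0;
	size_t w;

	if (src_cnt && page_address(srcs[0]) == (void *)d)
		i = 1;			/* dest already holds the first source */
	else
		memset(d, 0, len);	/* otherwise seed the accumulator */

	for (; i < src_cnt; i++) {
		unsigned long *s = page_address(srcs[i]);

		for (w = 0; w < len / sizeof(*d); w++)
			d[w] ^= s[w];
	}
}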

Another place where combining operations might make sense is copy-in
and post-xor. In some cases it might be more efficient to only read
the source once, and both write it to the destination and xor it into
the target. Would your DMA engine be able to optimise this
combination? I think current processors could certainly do better if
the two were combined.
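
For the inline case that combined step could be as simple as this
(again, only a sketch with made-up names):

/* Sketch: drain the bio data into the stripe cache block and xor it into
 * the parity target in one pass, so the source is only read once.
 */
static void copy_and_xor(void *cache_blk, void *parity,
			 const void *bio_data, size_t len)
{
	unsigned long *c = cache_blk, *p = parity;
	const unsigned long *b = bio_data;
	size_t w;

	for (w = 0; w < len / sizeof(*b); w++) {
		unsigned long v = b[w];	/* single read of the source */
		c[w] = v;		/* copy-in (drain) */
		p[w] ^= v;		/* post-xor */
	}
}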

So there is definitely room to move, but I would rather avoid flags if
I could.

NeilBrown