2012-05-14 13:48:55

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 0/7] dmaengine: enhance dmaengine to support DMA device hotplug

From: Jiang Liu <[email protected]>

This patch set enhances the dmaengine core and its clients to support
hot-removal of DMA devices at runtime, especially for IOAT devices.

Intel IOAT (Crystal Beach) devices are often built into the PCIe root complex.
When hot-plugging a PCI root complex (IOH) on Intel Nehalem/Westmere
platforms, all IOAT devices built into the IOH must be handled first.
For future Intel processors with an Integrated IOH (IIO), IOAT devices will
be involved even when hot-plugging physical processors.

The dmaengine core already supports hot-add of IOAT devices, but hot-removal
of IOAT devices is still unsupported due to a design limitation in the
dmaengine core.

Currently the dmaengine core assumes that DMA devices can only be
deregistered when no clients are using them, so dma_async_device_unregister()
is designed to be called only from a DMA device driver's module_exit routine.
The ioatdma driver doesn't conform to that rule: it calls
dma_async_device_unregister() from its driver detach routine instead of its
module_exit routine.

This patch set enhances the dmaengine core to support DMA device hotplug,
so that dma_async_device_unregister() can be called from a DMA driver's
detach routine to hot-remove DMA devices at runtime. It also tries to
optimize the DMA channel allocation policy according to NUMA affinity.

v2: use per-CPU counters for the channel reference count to avoid polluting
globally shared cachelines
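
For reference, here is a condensed sketch of that per-CPU scheme, using the
names introduced by patch 3 of this series (an illustration only, not the
literal code; dmaengine_total_channel_refs() is a made-up helper, the real
series open-codes the summation inside dma_channel_quiesce()):

	/* Hot path: only a CPU-local counter is touched. */
	DEFINE_PER_CPU(long, dmaengine_chan_ref_count);

	struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
	{
		struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);

		this_cpu_inc(dmaengine_chan_ref_count);
		return chan;
	}

	static inline void dma_put_channel(struct dma_chan *chan)
	{
		this_cpu_dec(dmaengine_chan_ref_count);
	}

	/* Slow path (device hot-removal): sum the counters across CPUs and
	 * wait until no CPU still holds a reference. */
	static long dmaengine_total_channel_refs(void)
	{
		long refs = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			refs += per_cpu(dmaengine_chan_ref_count, cpu);
		return refs;
	}

The log below, taken from unbinding an ioatdma device through sysfs,
illustrates the problem being addressed: dma_async_device_unregister() fires
its reference count warning and the descriptor pools are destroyed while
still busy.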

echo 0000:80:16.7 > /sys/bus/pci/drivers/ioatdma/unbind
ioatdma 0000:80:16.7: Removing dma and dca services
------------[ cut here ]------------
WARNING: at drivers/dma/dmaengine.c:831 dma_async_device_unregister+0xd5/0xf0() (Tainted: G ---------------- T)
Hardware name: System x3850 X5 -[7143O3G]-
dma_async_device_unregister called while 17 clients hold a reference
Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat vhost_net macvtap macvlan tun kvm_intel kvm uinput microcode sg serio_raw cdc_ether usbnet mii be2net i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp i7core_edac edac_core ioatdma igb dca e1000e bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif qla2xxx pmcraid pata_acpi ata_generic ata_piix bfa(T) scsi_transport_fc scsi_tgt megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 5143, comm: bash Tainted: G ---------------- T 2.6.32-220.el6.x86_64 #1
Call Trace:
[<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff81069c66>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff813fe465>] ? dma_async_device_unregister+0xd5/0xf0
[<ffffffffa00ec5e9>] ? ioat_dma_remove+0x28/0x4a [ioatdma]
[<ffffffffa00ec5b9>] ? ioat_remove+0x82/0x8a [ioatdma]
[<ffffffff8128ab37>] ? pci_device_remove+0x37/0x70
[<ffffffff8134623f>] ? __device_release_driver+0x6f/0xe0
[<ffffffff813463ad>] ? device_release_driver+0x2d/0x40
[<ffffffff81345711>] ? driver_unbind+0xa1/0xc0
[<ffffffff81344b7c>] ? drv_attr_store+0x2c/0x30
[<ffffffff811eb445>] ? sysfs_write_file+0xe5/0x170
[<ffffffff811765d8>] ? vfs_write+0xb8/0x1a0
[<ffffffff810d46e2>] ? audit_syscall_entry+0x272/0x2a0
[<ffffffff81176fe1>] ? sys_write+0x51/0x90
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
---[ end trace 436e184dbc830d94 ]---
ioatdma 0000:80:16.7: dma_pool_destroy dma_desc_pool, ffff881073536000 busy
ioatdma 0000:80:16.7: dma_pool_destroy dma_desc_pool, ffff881073533000 busy
ioatdma 0000:80:16.7: dma_pool_destroy dma_desc_pool, ffff88107352f000 busy
ioatdma 0000:80:16.7: dma_pool_destroy dma_desc_pool, ffff88107352c000 busy
ioatdma 0000:80:16.7: dma_pool_destroy dma_desc_pool, ffff881073529000 busy
ioatdma 0000:80:16.7: dma_pool_destroy completion_pool, ffff881073527000 busy

Jiang Liu (7):
dmaengine: enhance DMA channel reference count management
dmaengine: rebalance DMA channels when CPU hotplug happens
dmaengine: enhance dmaengine to support DMA device hotplug
dmaengine: enhance network subsystem to support DMA device hotplug
dmaengine: enhance ASYNC_TX subsystem to support DMA device hotplug
dmaengine: introduce CONFIG_DMA_ENGINE_HOTPLUG for DMA device hotplug
dmaengine: assign DMA channel to CPU according to NUMA affinity

crypto/async_tx/async_memcpy.c | 2 +
crypto/async_tx/async_memset.c | 2 +
crypto/async_tx/async_pq.c | 10 +-
crypto/async_tx/async_raid6_recov.c | 8 +-
crypto/async_tx/async_tx.c | 6 +-
crypto/async_tx/async_xor.c | 13 +-
drivers/dma/Kconfig | 6 +
drivers/dma/dmaengine.c | 360 ++++++++++++++++++++++-------------
include/linux/async_tx.h | 13 ++
include/linux/dmaengine.h | 43 ++++-
include/net/netdma.h | 26 +++
net/ipv4/tcp.c | 10 +-
net/ipv4/tcp_input.c | 5 +-
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv6/tcp_ipv6.c | 4 +-
15 files changed, 350 insertions(+), 162 deletions(-)

--
1.7.9.5


2012-05-14 13:49:38

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 1/7] dmaengine: enhance DMA channel reference count management

From: Jiang Liu <[email protected]>

Intel IOAT (Crystal Beach) devices are often built into the PCIe root complex.
When hot-plugging a PCI root complex (IOH) on Intel Nehalem/Westmere
platforms, all IOAT devices built into the IOH must be handled first.
For future Intel processors with an Integrated IOH (IIO), IOAT devices will
be involved even when hot-plugging physical processors.

The dmaengine core already supports hot-add of IOAT devices, but hot-removal
of IOAT devices is still unsupported due to a design limitation in the
dmaengine core.

Currently the dmaengine core assumes that DMA devices can only be
deregistered when no clients are using them, so dma_async_device_unregister()
is designed to be called only from a DMA device driver's module_exit routine.
The ioatdma driver doesn't conform to that rule: it calls
dma_async_device_unregister() from its driver detach routine instead of its
module_exit routine.

This patch set enhances the dmaengine core to support DMA device hotplug,
so that dma_async_device_unregister() can be called from a DMA driver's
detach routine to hot-remove DMA devices at runtime.

With the current implementation, the variable dmaengine_ref_count tracks how
many clients are making use of the dmaengine. There is also a per-channel
reference count named client_count in struct dma_chan. For successfully
initialized channels, dma_chan->client_count is kept equal to
dmaengine_ref_count, which means no channel can be released or removed while
any client remains.

To support DMA device hotplug, this patch changes dma_chan->client_count to
track the actual references held on the DMA channel instead of mirroring the
global client count.
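
Condensed, the change looks roughly like this (both fragments are lightly
simplified from the diff below and shown together only to contrast the two
schemes):

	/* Old scheme: every public channel is pushed up to the global count,
	 * so no channel can ever drop to zero while any client exists. */
	static void balance_ref_count(struct dma_chan *chan)
	{
		while (chan->client_count < dmaengine_ref_count) {
			__module_get(dma_chan_to_owner(chan));
			chan->client_count++;
		}
	}

	/* New scheme: client_count counts only the real users of this
	 * channel, so the last dma_chan_put() frees its resources at once,
	 * which is what hot-removal needs. */
	static void dma_chan_put(struct dma_chan *chan)
	{
		BUG_ON(chan->client_count <= 0);
		if (--chan->client_count == 0)
			chan->device->device_free_chan_resources(chan);
		module_put(dma_chan_to_owner(chan));
	}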

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/dma/dmaengine.c | 193 ++++++++++++++++-------------------------------
1 file changed, 67 insertions(+), 126 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 2397f6f..58bc45c 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -64,7 +64,7 @@
static DEFINE_MUTEX(dma_list_mutex);
static DEFINE_IDR(dma_idr);
static LIST_HEAD(dma_device_list);
-static long dmaengine_ref_count;
+static long dmaengine_client_count;

/* --- sysfs implementation --- */

@@ -168,8 +168,6 @@ static struct class dma_devclass = {

/* --- client and device registration --- */

-#define dma_device_satisfies_mask(device, mask) \
- __dma_device_satisfies_mask((device), &(mask))
static int
__dma_device_satisfies_mask(struct dma_device *device, dma_cap_mask_t *want)
{
@@ -186,22 +184,6 @@ static struct module *dma_chan_to_owner(struct dma_chan *chan)
}

/**
- * balance_ref_count - catch up the channel reference count
- * @chan - channel to balance ->client_count versus dmaengine_ref_count
- *
- * balance_ref_count must be called under dma_list_mutex
- */
-static void balance_ref_count(struct dma_chan *chan)
-{
- struct module *owner = dma_chan_to_owner(chan);
-
- while (chan->client_count < dmaengine_ref_count) {
- __module_get(owner);
- chan->client_count++;
- }
-}
-
-/**
* dma_chan_get - try to grab a dma channel's parent driver module
* @chan - channel to grab
*
@@ -209,28 +191,22 @@ static void balance_ref_count(struct dma_chan *chan)
*/
static int dma_chan_get(struct dma_chan *chan)
{
- int err = -ENODEV;
+ int err = 0;
struct module *owner = dma_chan_to_owner(chan);

- if (chan->client_count) {
- __module_get(owner);
- err = 0;
- } else if (try_module_get(owner))
- err = 0;
-
- if (err == 0)
- chan->client_count++;
+ /* Device driver module has been unloaded */
+ if (!try_module_get(owner))
+ return -ENODEV;

/* allocate upon first client reference */
- if (chan->client_count == 1 && err == 0) {
+ if (++chan->client_count == 1) {
int desc_cnt = chan->device->device_alloc_chan_resources(chan);

if (desc_cnt < 0) {
err = desc_cnt;
chan->client_count = 0;
module_put(owner);
- } else if (!dma_has_cap(DMA_PRIVATE, chan->device->cap_mask))
- balance_ref_count(chan);
+ }
}

return err;
@@ -244,12 +220,10 @@ static int dma_chan_get(struct dma_chan *chan)
*/
static void dma_chan_put(struct dma_chan *chan)
{
- if (!chan->client_count)
- return; /* this channel failed alloc_chan_resources */
- chan->client_count--;
- module_put(dma_chan_to_owner(chan));
- if (chan->client_count == 0)
+ BUG_ON(chan->client_count <= 0);
+ if (--chan->client_count == 0)
chan->device->device_free_chan_resources(chan);
+ module_put(dma_chan_to_owner(chan));
}

enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
@@ -278,9 +252,11 @@ static dma_cap_mask_t dma_cap_mask_all;
/**
* dma_chan_tbl_ent - tracks channel allocations per core/operation
* @chan - associated channel for this entry
+ * @prev_chan - previous associated channel for this entry
*/
struct dma_chan_tbl_ent {
struct dma_chan *chan;
+ struct dma_chan *prev_chan;
};

/**
@@ -367,7 +343,7 @@ void dma_issue_pending_all(void)
EXPORT_SYMBOL(dma_issue_pending_all);

/**
- * nth_chan - returns the nth channel of the given capability
+ * nth_chan - grab a reference to the nth channel of the given capability
* @cap: capability to match
* @n: nth channel desired
*
@@ -387,17 +363,19 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
dma_has_cap(DMA_PRIVATE, device->cap_mask))
continue;
list_for_each_entry(chan, &device->channels, device_node) {
- if (!chan->client_count)
+ if (dma_chan_get(chan))
continue;
- if (!min)
- min = chan;
- else if (chan->table_count < min->table_count)
- min = chan;
-
if (n-- == 0) {
ret = chan;
break; /* done */
}
+ if (!min)
+ min = chan;
+ else if (chan->table_count < min->table_count) {
+ dma_chan_put(min);
+ min = chan;
+ } else
+ dma_chan_put(chan);
}
if (ret)
break; /* done */
@@ -405,6 +383,8 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)

if (!ret)
ret = min;
+ else if (min)
+ dma_chan_put(min);

if (ret)
ret->table_count++;
@@ -423,37 +403,39 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
static void dma_channel_rebalance(void)
{
struct dma_chan *chan;
- struct dma_device *device;
+ struct dma_chan_tbl_ent *entry;
int cpu;
int cap;
- int n;
+ int n = 0;

- /* undo the last distribution */
+ /* save the last distribution */
for_each_dma_cap_mask(cap, dma_cap_mask_all)
- for_each_possible_cpu(cpu)
- per_cpu_ptr(channel_table[cap], cpu)->chan = NULL;
-
- list_for_each_entry(device, &dma_device_list, global_node) {
- if (dma_has_cap(DMA_PRIVATE, device->cap_mask))
- continue;
- list_for_each_entry(chan, &device->channels, device_node)
- chan->table_count = 0;
- }
+ for_each_possible_cpu(cpu) {
+ entry = per_cpu_ptr(channel_table[cap], cpu);
+ entry->prev_chan = entry->chan;
+ entry->chan = NULL;
+ if (entry->prev_chan)
+ entry->prev_chan->table_count--;
+ }

- /* don't populate the channel_table if no clients are available */
- if (!dmaengine_ref_count)
- return;
+ /* redistribute available channels if there are clients */
+ if (dmaengine_client_count)
+ for_each_dma_cap_mask(cap, dma_cap_mask_all)
+ for_each_online_cpu(cpu) {
+ if (num_possible_cpus() > 1)
+ chan = nth_chan(cap, n++);
+ else
+ chan = nth_chan(cap, -1);
+ entry = per_cpu_ptr(channel_table[cap], cpu);
+ entry->chan = chan;
+ }

- /* redistribute available channels */
- n = 0;
+ /* undo the last distribution */
for_each_dma_cap_mask(cap, dma_cap_mask_all)
- for_each_online_cpu(cpu) {
- if (num_possible_cpus() > 1)
- chan = nth_chan(cap, n++);
- else
- chan = nth_chan(cap, -1);
-
- per_cpu_ptr(channel_table[cap], cpu)->chan = chan;
+ for_each_possible_cpu(cpu) {
+ chan = per_cpu(channel_table[cap]->prev_chan, cpu);
+ if (chan)
+ dma_chan_put(chan);
}
}

@@ -511,19 +493,16 @@ struct dma_chan *__dma_request_channel(dma_cap_mask_t *mask, dma_filter_fn fn, v
chan = private_candidate(mask, device, fn, fn_param);
if (chan) {
/* Found a suitable channel, try to grab, prep, and
- * return it. We first set DMA_PRIVATE to disable
- * balance_ref_count as this channel will not be
- * published in the general-purpose allocator
+ * return it.
*/
dma_cap_set(DMA_PRIVATE, device->cap_mask);
device->privatecnt++;
err = dma_chan_get(chan);

- if (err == -ENODEV) {
+ if (err == -ENODEV)
pr_debug("%s: %s module removed\n", __func__,
dma_chan_name(chan));
- list_del_rcu(&device->global_node);
- } else if (err)
+ else if (err)
pr_debug("%s: failed to get %s: (%d)\n",
__func__, dma_chan_name(chan), err);
else
@@ -560,57 +539,26 @@ EXPORT_SYMBOL_GPL(dma_release_channel);
*/
void dmaengine_get(void)
{
- struct dma_device *device, *_d;
- struct dma_chan *chan;
- int err;
-
mutex_lock(&dma_list_mutex);
- dmaengine_ref_count++;
-
- /* try to grab channels */
- list_for_each_entry_safe(device, _d, &dma_device_list, global_node) {
- if (dma_has_cap(DMA_PRIVATE, device->cap_mask))
- continue;
- list_for_each_entry(chan, &device->channels, device_node) {
- err = dma_chan_get(chan);
- if (err == -ENODEV) {
- /* module removed before we could use it */
- list_del_rcu(&device->global_node);
- break;
- } else if (err)
- pr_err("%s: failed to get %s: (%d)\n",
- __func__, dma_chan_name(chan), err);
- }
- }
-
/* if this is the first reference and there were channels
* waiting we need to rebalance to get those channels
* incorporated into the channel table
*/
- if (dmaengine_ref_count == 1)
+ if (++dmaengine_client_count == 1)
dma_channel_rebalance();
mutex_unlock(&dma_list_mutex);
}
EXPORT_SYMBOL(dmaengine_get);

/**
- * dmaengine_put - let dma drivers be removed when ref_count == 0
+ * dmaengine_put - deregister interest in dma_channels
*/
void dmaengine_put(void)
{
- struct dma_device *device;
- struct dma_chan *chan;
-
mutex_lock(&dma_list_mutex);
- dmaengine_ref_count--;
- BUG_ON(dmaengine_ref_count < 0);
- /* drop channel references */
- list_for_each_entry(device, &dma_device_list, global_node) {
- if (dma_has_cap(DMA_PRIVATE, device->cap_mask))
- continue;
- list_for_each_entry(chan, &device->channels, device_node)
- dma_chan_put(chan);
- }
+ BUG_ON(dmaengine_client_count <= 0);
+ if (--dmaengine_client_count == 0)
+ dma_channel_rebalance();
mutex_unlock(&dma_list_mutex);
}
EXPORT_SYMBOL(dmaengine_put);
@@ -773,26 +721,11 @@ int dma_async_device_register(struct dma_device *device)
device->chancnt = chancnt;

mutex_lock(&dma_list_mutex);
- /* take references on public channels */
- if (dmaengine_ref_count && !dma_has_cap(DMA_PRIVATE, device->cap_mask))
- list_for_each_entry(chan, &device->channels, device_node) {
- /* if clients are already waiting for channels we need
- * to take references on their behalf
- */
- if (dma_chan_get(chan) == -ENODEV) {
- /* note we can only get here for the first
- * channel as the remaining channels are
- * guaranteed to get a reference
- */
- rc = -ENODEV;
- mutex_unlock(&dma_list_mutex);
- goto err_out;
- }
- }
list_add_tail_rcu(&device->global_node, &dma_device_list);
if (dma_has_cap(DMA_PRIVATE, device->cap_mask))
device->privatecnt++; /* Always private */
- dma_channel_rebalance();
+ else
+ dma_channel_rebalance();
mutex_unlock(&dma_list_mutex);

return 0;
@@ -836,6 +769,14 @@ void dma_async_device_unregister(struct dma_device *device)
dma_channel_rebalance();
mutex_unlock(&dma_list_mutex);

+ /* Check whether it's called from module exit function. */
+ if (try_module_get(device->dev->driver->owner)) {
+ dev_warn(device->dev,
+ "%s isn't called from module exit function.\n",
+ __func__);
+ module_put(device->dev->driver->owner);
+ }
+
list_for_each_entry(chan, &device->channels, device_node) {
WARN_ONCE(chan->client_count,
"%s called while %d clients hold a reference\n",
--
1.7.9.5

2012-05-14 13:50:04

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 2/7] dmaengine: rebalance DMA channels when CPU hotplug happens

From: Jiang Liu <[email protected]>

Rebalance DMA channels when CPU hotplug happens so that DMA channel
reference counts are updated correctly.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/dma/dmaengine.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 58bc45c..d3b1c48 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -49,6 +49,7 @@
#include <linux/init.h>
#include <linux/module.h>
#include <linux/mm.h>
+#include <linux/cpu.h>
#include <linux/device.h>
#include <linux/dmaengine.h>
#include <linux/hardirq.h>
@@ -1003,8 +1004,29 @@ void dma_run_dependencies(struct dma_async_tx_descriptor *tx)
}
EXPORT_SYMBOL_GPL(dma_run_dependencies);

+static int __cpuinit
+hotcpu_rebalance(struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_POST_DEAD:
+ mutex_lock(&dma_list_mutex);
+ dma_channel_rebalance();
+ mutex_unlock(&dma_list_mutex);
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __refdata dma_cpu_notifier = {
+ .notifier_call = hotcpu_rebalance,
+};
+
+
static int __init dma_bus_init(void)
{
+ register_hotcpu_notifier(&dma_cpu_notifier);
return class_register(&dma_devclass);
}
arch_initcall(dma_bus_init);
--
1.7.9.5

2012-05-14 13:50:18

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA device hotplug

From: Jiang Liu <[email protected]>

To support DMA device hotplug, the dmaengine core must correctly manage
reference counts for DMA channels. On the other hand, the DMA hot path is
performance critical, so the reference count management code must avoid
polluting globally shared cachelines; otherwise it may cause a heavy
performance penalty.

This patch introduces a lightweight DMA channel reference count management
mechanism using per-CPU counters. When DMA device hotplug is disabled,
there is no performance penalty. When hotplug is enabled, a
dma_find_channel()/dma_put_channel() pair adds only two CPU-local memory
accesses to the hot path.
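
A hedged sketch of the client-side pairing this implies is shown below; the
rules themselves are spelled out in the dmaengine.h comment added by this
patch, and example_hot_path() is just an illustrative name:

	/* Sketch: issuing work on the per-CPU channel under the new rules. */
	static void example_hot_path(void)
	{
		struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);

		if (chan) {
			/* ...submit a DMA_MEMCPY descriptor on chan... */
		}
		/* Rule 1: always pair with dma_put_channel(), even when
		 * chan is NULL, because the per-CPU count was bumped. */
		dma_put_channel(chan);

		/* Rule 2: net_dma_find_channel() drops the reference itself
		 * on failure, so pair it with dma_put_channel() only when it
		 * returned a valid channel. */
	}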

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/dma/dmaengine.c | 112 +++++++++++++++++++++++++++++++++++++++++----
include/linux/dmaengine.h | 29 +++++++++++-
2 files changed, 129 insertions(+), 12 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index d3b1c48..eca45c0 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -61,12 +61,20 @@
#include <linux/rculist.h>
#include <linux/idr.h>
#include <linux/slab.h>
+#include <linux/sched.h>

static DEFINE_MUTEX(dma_list_mutex);
static DEFINE_IDR(dma_idr);
static LIST_HEAD(dma_device_list);
static long dmaengine_client_count;

+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+static atomic_t dmaengine_dirty;
+static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
+static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
+DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
+#endif /* CONFIG_DMA_ENGINE_HOTPLUG */
+
/* --- sysfs implementation --- */

/**
@@ -305,10 +313,40 @@ arch_initcall(dma_channel_table_init);
*/
struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
{
- return this_cpu_read(channel_table[tx_type]->chan);
+ struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);
+
+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+ this_cpu_inc(dmaengine_chan_ref_count);
+ if (static_key_false(&dmaengine_quiesce))
+ chan = NULL;
+#endif
+
+ return chan;
}
EXPORT_SYMBOL(dma_find_channel);

+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+struct dma_chan *dma_get_channel(struct dma_chan *chan)
+{
+ if (static_key_false(&dmaengine_quiesce))
+ atomic_inc(&dmaengine_dirty);
+ this_cpu_inc(dmaengine_chan_ref_count);
+
+ return chan;
+}
+EXPORT_SYMBOL(dma_get_channel);
+#endif
+
+/**
+ * dma_has_capability - check whether any channel supports tx_type
+ * @tx_type: transaction type
+ */
+bool dma_has_capability(enum dma_transaction_type tx_type)
+{
+ return !!this_cpu_read(channel_table[tx_type]->chan);
+}
+EXPORT_SYMBOL(dma_has_capability);
+
/*
* net_dma_find_channel - find a channel for net_dma
* net_dma has alignment requirements
@@ -316,10 +354,15 @@ EXPORT_SYMBOL(dma_find_channel);
struct dma_chan *net_dma_find_channel(void)
{
struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);
- if (chan && !is_dma_copy_aligned(chan->device, 1, 1, 1))
- return NULL;

- return chan;
+ if (chan && is_dma_copy_aligned(chan->device, 1, 1, 1))
+ return chan;
+
+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+ this_cpu_dec(dmaengine_chan_ref_count);
+#endif
+
+ return NULL;
}
EXPORT_SYMBOL(net_dma_find_channel);

@@ -393,6 +436,37 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
return ret;
}

+static void dma_channel_quiesce(void)
+{
+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+ int cpu;
+ long ref_count;
+ int quiesce_count = 0;
+
+ static_key_slow_inc(&dmaengine_quiesce);
+
+ do {
+ atomic_set(&dmaengine_dirty, 0);
+ schedule_timeout(1);
+
+ if (atomic_read(&dmaengine_dirty)) {
+ quiesce_count = 0;
+ continue;
+ }
+
+ ref_count = 0;
+ for_each_possible_cpu(cpu)
+ ref_count += per_cpu(dmaengine_chan_ref_count, cpu);
+ if (ref_count == 0)
+ quiesce_count++;
+ else
+ quiesce_count = 0;
+ } while (quiesce_count < 10);
+
+ static_key_slow_dec(&dmaengine_quiesce);
+#endif
+}
+
/**
* dma_channel_rebalance - redistribute the available channels
*
@@ -401,7 +475,7 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
* multi-tasking channels) in the non-SMP case. Must be called under
* dma_list_mutex.
*/
-static void dma_channel_rebalance(void)
+static void dma_channel_rebalance(bool quiesce)
{
struct dma_chan *chan;
struct dma_chan_tbl_ent *entry;
@@ -431,6 +505,9 @@ static void dma_channel_rebalance(void)
entry->chan = chan;
}

+ if (quiesce)
+ dma_channel_quiesce();
+
/* undo the last distribution */
for_each_dma_cap_mask(cap, dma_cap_mask_all)
for_each_possible_cpu(cpu) {
@@ -532,6 +609,10 @@ void dma_release_channel(struct dma_chan *chan)
if (--chan->device->privatecnt == 0)
dma_cap_clear(DMA_PRIVATE, chan->device->cap_mask);
mutex_unlock(&dma_list_mutex);
+
+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+ wake_up_all(&dmaengine_wait_queue);
+#endif
}
EXPORT_SYMBOL_GPL(dma_release_channel);

@@ -546,7 +627,7 @@ void dmaengine_get(void)
* incorporated into the channel table
*/
if (++dmaengine_client_count == 1)
- dma_channel_rebalance();
+ dma_channel_rebalance(false);
mutex_unlock(&dma_list_mutex);
}
EXPORT_SYMBOL(dmaengine_get);
@@ -559,7 +640,7 @@ void dmaengine_put(void)
mutex_lock(&dma_list_mutex);
BUG_ON(dmaengine_client_count <= 0);
if (--dmaengine_client_count == 0)
- dma_channel_rebalance();
+ dma_channel_rebalance(false);
mutex_unlock(&dma_list_mutex);
}
EXPORT_SYMBOL(dmaengine_put);
@@ -726,7 +807,7 @@ int dma_async_device_register(struct dma_device *device)
if (dma_has_cap(DMA_PRIVATE, device->cap_mask))
device->privatecnt++; /* Always private */
else
- dma_channel_rebalance();
+ dma_channel_rebalance(false);
mutex_unlock(&dma_list_mutex);

return 0;
@@ -767,14 +848,22 @@ void dma_async_device_unregister(struct dma_device *device)

mutex_lock(&dma_list_mutex);
list_del_rcu(&device->global_node);
- dma_channel_rebalance();
+ dma_channel_rebalance(true);
mutex_unlock(&dma_list_mutex);

/* Check whether it's called from module exit function. */
if (try_module_get(device->dev->driver->owner)) {
+#ifndef CONFIG_DMA_ENGINE_HOTPLUG
dev_warn(device->dev,
"%s isn't called from module exit function.\n",
__func__);
+#else
+ list_for_each_entry(chan, &device->channels, device_node) {
+ /* TODO: notify clients to release channels*/
+ wait_event(dmaengine_wait_queue,
+ chan->client_count == 0);
+ }
+#endif
module_put(device->dev->driver->owner);
}

@@ -788,6 +877,9 @@ void dma_async_device_unregister(struct dma_device *device)
device_unregister(&chan->dev->device);
free_percpu(chan->local);
}
+
+ /* Synchronize with dma_issue_pending_all() */
+ synchronize_rcu();
}
EXPORT_SYMBOL(dma_async_device_unregister);

@@ -1011,7 +1103,7 @@ hotcpu_rebalance(struct notifier_block *nfb, unsigned long action, void *hcpu)
case CPU_ONLINE:
case CPU_POST_DEAD:
mutex_lock(&dma_list_mutex);
- dma_channel_rebalance();
+ dma_channel_rebalance(false);
mutex_unlock(&dma_list_mutex);
break;
}
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index f9a2e5e..e099b28 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -973,10 +973,35 @@ static inline void dma_release_channel(struct dma_chan *chan)
int dma_async_device_register(struct dma_device *device);
void dma_async_device_unregister(struct dma_device *device);
void dma_run_dependencies(struct dma_async_tx_descriptor *tx);
-struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type);
-struct dma_chan *net_dma_find_channel(void);
#define dma_request_channel(mask, x, y) __dma_request_channel(&(mask), x, y)

+/*
+ * Special DMA channel reference count management scheme has been adopted
+ * to support DMA device hotplug without introducing big performance penalty.
+ * Please follow rules below to correctly manage channel reference count.
+ * 1) dma_find_channel()/dma_get_channel() must be paired with
+ * dma_put_channel(), even if dma_find_channel() returns NULL.
+ * 2) Only call dma_put_channel() if net_dma_find_channel() returns valid
+ * channel pointer.
+ */
+bool dma_has_capability(enum dma_transaction_type tx_type);
+struct dma_chan *net_dma_find_channel(void);
+struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type);
+#ifdef CONFIG_DMA_ENGINE_HOTPLUG
+DECLARE_PER_CPU(long, dmaengine_chan_ref_count);
+static inline void dma_put_channel(struct dma_chan *chan)
+{
+ this_cpu_dec(dmaengine_chan_ref_count);
+}
+
+struct dma_chan *dma_get_channel(struct dma_chan *chan);
+#else /* CONFIG_DMA_ENGINE_HOTPLUG */
+static inline void dma_put_channel(struct dma_chan *chan) { }
+static inline struct dma_chan *dma_get_channel(struct dma_chan *chan)
+{ return chan; }
+#endif /* CONFIG_DMA_ENGINE_HOTPLUG */
+
+
/* --- Helper iov-locking functions --- */

struct dma_page_list {
--
1.7.9.5

2012-05-14 13:50:35

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 4/7] dmaengine: enhance network subsystem to support DMA device hotplug

From: Jiang Liu <[email protected]>

Enhance the network subsystem to correctly update DMA channel reference
counts, so that it doesn't break the DMA device hotplug logic.

Signed-off-by: Jiang Liu <[email protected]>
---
include/net/netdma.h | 26 ++++++++++++++++++++++++++
net/ipv4/tcp.c | 10 +++-------
net/ipv4/tcp_input.c | 5 +----
net/ipv4/tcp_ipv4.c | 4 +---
net/ipv6/tcp_ipv6.c | 4 +---
5 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/include/net/netdma.h b/include/net/netdma.h
index 8ba8ce2..6d71724 100644
--- a/include/net/netdma.h
+++ b/include/net/netdma.h
@@ -24,6 +24,32 @@
#include <linux/dmaengine.h>
#include <linux/skbuff.h>

+static inline bool
+net_dma_capable(void)
+{
+ struct dma_chan *chan = net_dma_find_channel();
+ dma_put_channel(chan);
+
+ return !!chan;
+}
+
+static inline struct dma_chan *
+net_dma_get_channel(struct tcp_sock *tp)
+{
+ if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
+ tp->ucopy.dma_chan = net_dma_find_channel();
+ return tp->ucopy.dma_chan;
+}
+
+static inline void
+net_dma_put_channel(struct tcp_sock *tp)
+{
+ if (tp->ucopy.dma_chan) {
+ dma_put_channel(tp->ucopy.dma_chan);
+ tp->ucopy.dma_chan = NULL;
+ }
+}
+
int dma_skb_copy_datagram_iovec(struct dma_chan* chan,
struct sk_buff *skb, int offset, struct iovec *to,
size_t len, struct dma_pinned_list *pinned_list);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8bb6ade..aea4032 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1451,8 +1451,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
available = TCP_SKB_CB(skb)->seq + skb->len - (*seq);
if ((available < target) &&
(len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
- !sysctl_tcp_low_latency &&
- net_dma_find_channel()) {
+ !sysctl_tcp_low_latency && net_dma_capable()) {
preempt_enable_no_resched();
tp->ucopy.pinned_list =
dma_pin_iovec_pages(msg->msg_iov, len);
@@ -1666,10 +1665,7 @@ do_prequeue:

if (!(flags & MSG_TRUNC)) {
#ifdef CONFIG_NET_DMA
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
-
- if (tp->ucopy.dma_chan) {
+ if (net_dma_get_channel(tp)) {
tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
tp->ucopy.dma_chan, skb, offset,
msg->msg_iov, used,
@@ -1758,7 +1754,7 @@ skip_copy:

#ifdef CONFIG_NET_DMA
tcp_service_net_dma(sk, true); /* Wait for queue to drain */
- tp->ucopy.dma_chan = NULL;
+ net_dma_put_channel(tp);

if (tp->ucopy.pinned_list) {
dma_unpin_iovec_pages(tp->ucopy.pinned_list);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9944c1d..3878916 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5227,10 +5227,7 @@ static int tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb,
if (tp->ucopy.wakeup)
return 0;

- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
-
- if (tp->ucopy.dma_chan && skb_csum_unnecessary(skb)) {
+ if (net_dma_get_channel(tp) && skb_csum_unnecessary(skb)) {

dma_cookie = dma_skb_copy_datagram_iovec(tp->ucopy.dma_chan,
skb, hlen,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 0cb86ce..90ea1c0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1729,9 +1729,7 @@ process:
if (!sock_owned_by_user(sk)) {
#ifdef CONFIG_NET_DMA
struct tcp_sock *tp = tcp_sk(sk);
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
- if (tp->ucopy.dma_chan)
+ if (net_dma_get_channel(tp))
ret = tcp_v4_do_rcv(sk, skb);
else
#endif
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 86cfe60..fb81bbd 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1644,9 +1644,7 @@ process:
if (!sock_owned_by_user(sk)) {
#ifdef CONFIG_NET_DMA
struct tcp_sock *tp = tcp_sk(sk);
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
- if (tp->ucopy.dma_chan)
+ if (net_dma_get_channel(tp))
ret = tcp_v6_do_rcv(sk, skb);
else
#endif
--
1.7.9.5

2012-05-14 13:50:59

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 5/7] dmaengine: enhance ASYNC_TX subsystem to support DMA device hotplug

From: Jiang Liu <[email protected]>

Enhance the ASYNC_TX subsystem to correctly update DMA channel reference
counts, so that it doesn't break the DMA device hotplug logic.

Signed-off-by: Jiang Liu <[email protected]>
---
crypto/async_tx/async_memcpy.c | 2 ++
crypto/async_tx/async_memset.c | 2 ++
crypto/async_tx/async_pq.c | 10 ++++++++--
crypto/async_tx/async_raid6_recov.c | 8 ++++++--
crypto/async_tx/async_tx.c | 6 +++---
crypto/async_tx/async_xor.c | 13 +++++++++----
include/linux/async_tx.h | 13 +++++++++++++
include/linux/dmaengine.h | 14 ++++++++++++++
8 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/crypto/async_tx/async_memcpy.c b/crypto/async_tx/async_memcpy.c
index 361b5e8..0cbd90e 100644
--- a/crypto/async_tx/async_memcpy.c
+++ b/crypto/async_tx/async_memcpy.c
@@ -90,6 +90,8 @@ async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
async_tx_sync_epilog(submit);
}

+ async_tx_put_channel(chan);
+
return tx;
}
EXPORT_SYMBOL_GPL(async_memcpy);
diff --git a/crypto/async_tx/async_memset.c b/crypto/async_tx/async_memset.c
index 58e4a87..ec568bb 100644
--- a/crypto/async_tx/async_memset.c
+++ b/crypto/async_tx/async_memset.c
@@ -79,6 +79,8 @@ async_memset(struct page *dest, int val, unsigned int offset, size_t len,
async_tx_sync_epilog(submit);
}

+ async_tx_put_channel(chan);
+
return tx;
}
EXPORT_SYMBOL_GPL(async_memset);
diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 91d5d38..ae2070c 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -203,6 +203,7 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
blocks, src_cnt, len);
struct dma_device *device = chan ? chan->device : NULL;
dma_addr_t *dma_src = NULL;
+ struct dma_async_tx_descriptor *tx;

BUG_ON(disks > 255 || !(P(blocks, disks) || Q(blocks, disks)));

@@ -218,12 +219,15 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
/* run the p+q asynchronously */
pr_debug("%s: (async) disks: %d len: %zu\n",
__func__, disks, len);
- return do_async_gen_syndrome(chan, blocks, raid6_gfexp, offset,
- disks, len, dma_src, submit);
+ tx = do_async_gen_syndrome(chan, blocks, raid6_gfexp, offset,
+ disks, len, dma_src, submit);
+ async_tx_put_channel(chan);
+ return tx;
}

/* run the pq synchronously */
pr_debug("%s: (sync) disks: %d len: %zu\n", __func__, disks, len);
+ async_tx_put_channel(chan);

/* wait for any prerequisite operations */
async_tx_quiesce(&submit->depend_tx);
@@ -331,6 +335,7 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
dma_async_issue_pending(chan);
}
async_tx_submit(chan, tx, submit);
+ async_tx_put_channel(chan);

return tx;
} else {
@@ -344,6 +349,7 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,

pr_debug("%s: (sync) disks: %d len: %zu\n",
__func__, disks, len);
+ async_tx_put_channel(chan);

/* caller must provide a temporary result buffer and
* allow the input parameters to be preserved
diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index a9f08a6..0f54d7c 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -54,6 +54,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
len, dma_flags);
if (tx) {
async_tx_submit(chan, tx, submit);
+ async_tx_put_channel(chan);
return tx;
}

@@ -66,6 +67,7 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
}

/* run the operation synchronously */
+ async_tx_put_channel(chan);
async_tx_quiesce(&submit->depend_tx);
amul = raid6_gfmul[coef[0]];
bmul = raid6_gfmul[coef[1]];
@@ -107,6 +109,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
len, dma_flags);
if (tx) {
async_tx_submit(chan, tx, submit);
+ async_tx_put_channel(chan);
return tx;
}

@@ -120,6 +123,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
/* no channel available, or failed to allocate a descriptor, so
* perform the operation synchronously
*/
+ async_tx_put_channel(chan);
async_tx_quiesce(&submit->depend_tx);
qmul = raid6_gfmul[coef];
d = page_address(dest);
@@ -339,7 +343,7 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
* available' case be sure to use the scribble buffer to
* preserve the content of 'blocks' as the caller intended.
*/
- if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+ if (!async_tx_has_capability(DMA_PQ) || !scribble) {
void **ptrs = scribble ? scribble : (void **) blocks;

async_tx_quiesce(&submit->depend_tx);
@@ -415,7 +419,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
* available' case be sure to use the scribble buffer to
* preserve the content of 'blocks' as the caller intended.
*/
- if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+ if (!async_tx_has_capability(DMA_PQ) || !scribble) {
void **ptrs = scribble ? scribble : (void **) blocks;

async_tx_quiesce(&submit->depend_tx);
diff --git a/crypto/async_tx/async_tx.c b/crypto/async_tx/async_tx.c
index 8421209..6fe6561 100644
--- a/crypto/async_tx/async_tx.c
+++ b/crypto/async_tx/async_tx.c
@@ -47,8 +47,8 @@ module_init(async_tx_init);
module_exit(async_tx_exit);

/**
- * __async_tx_find_channel - find a channel to carry out the operation or let
- * the transaction execute synchronously
+ * __async_tx_find_channel - find and hold a channel to carry out the
+ * operation or let the transaction execute synchronously
* @submit: transaction dependency and submission modifiers
* @tx_type: transaction type
*/
@@ -61,7 +61,7 @@ __async_tx_find_channel(struct async_submit_ctl *submit,
/* see if we can keep the chain on one channel */
if (depend_tx &&
dma_has_cap(tx_type, depend_tx->chan->device->cap_mask))
- return depend_tx->chan;
+ return async_dma_get_channel(depend_tx->chan);
return async_dma_find_channel(tx_type);
}
EXPORT_SYMBOL_GPL(__async_tx_find_channel);
diff --git a/crypto/async_tx/async_xor.c b/crypto/async_tx/async_xor.c
index 154cc84..056e248 100644
--- a/crypto/async_tx/async_xor.c
+++ b/crypto/async_tx/async_xor.c
@@ -190,6 +190,7 @@ async_xor(struct page *dest, struct page **src_list, unsigned int offset,
&dest, 1, src_list,
src_cnt, len);
dma_addr_t *dma_src = NULL;
+ struct dma_async_tx_descriptor *tx = NULL;

BUG_ON(src_cnt <= 1);

@@ -202,8 +203,8 @@ async_xor(struct page *dest, struct page **src_list, unsigned int offset,
/* run the xor asynchronously */
pr_debug("%s (async): len: %zu\n", __func__, len);

- return do_async_xor(chan, dest, src_list, offset, src_cnt, len,
- dma_src, submit);
+ tx = do_async_xor(chan, dest, src_list, offset, src_cnt, len,
+ dma_src, submit);
} else {
/* run the xor synchronously */
pr_debug("%s (sync): len: %zu\n", __func__, len);
@@ -222,9 +223,11 @@ async_xor(struct page *dest, struct page **src_list, unsigned int offset,
async_tx_quiesce(&submit->depend_tx);

do_sync_xor(dest, src_list, offset, src_cnt, len, submit);
-
- return NULL;
}
+
+ async_tx_put_channel(chan);
+
+ return tx;
}
EXPORT_SYMBOL_GPL(async_xor);

@@ -330,6 +333,8 @@ async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
submit->flags = flags_orig;
}

+ async_tx_put_channel(chan);
+
return tx;
}
EXPORT_SYMBOL_GPL(async_xor_val);
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
index a1c486a..35dea72 100644
--- a/include/linux/async_tx.h
+++ b/include/linux/async_tx.h
@@ -104,6 +104,10 @@ static inline void async_tx_issue_pending(struct dma_async_tx_descriptor *tx)
dma->device_issue_pending(chan);
}
}
+
+#define async_tx_put_channel(c) async_dma_put_channel(c)
+#define async_tx_has_capability(c) async_dma_has_capability(c)
+
#ifdef CONFIG_ARCH_HAS_ASYNC_TX_FIND_CHANNEL
#include <asm/async_tx.h>
#else
@@ -132,6 +136,15 @@ async_tx_find_channel(struct async_submit_ctl *submit,
{
return NULL;
}
+
+static inline void async_tx_put_channel(struct dma_chan *chan)
+{
+}
+
+static inline bool async_tx_has_capability(enum dma_transaction_type type)
+{
+ return false;
+}
#endif

/**
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index e099b28..197bb71 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -785,6 +785,9 @@ static inline void net_dmaengine_put(void)
#else
#define async_dma_find_channel(type) dma_find_channel(type)
#endif /* CONFIG_ASYNC_TX_ENABLE_CHANNEL_SWITCH */
+#define async_dma_get_channel(chan) dma_get_channel(chan)
+#define async_dma_put_channel(chan) dma_put_channel(chan)
+#define async_dma_has_capability(c) dma_has_capability(c)
#else
static inline void async_dmaengine_get(void)
{
@@ -797,6 +800,17 @@ async_dma_find_channel(enum dma_transaction_type type)
{
return NULL;
}
+static inline struct dma_chan *async_dma_get_channel(struct dma_chan *chan)
+{
+ return chan;
+}
+static inline void async_dma_put_channel(struct dma_chan *chan)
+{
+}
+static inline bool async_dma_has_capability(enum dma_transaction_type type)
+{
+ return false;
+}
#endif /* CONFIG_ASYNC_TX_DMA */

dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
--
1.7.9.5

2012-05-14 13:51:10

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 6/7] dmaengine: introduce CONFIG_DMA_ENGINE_HOTPLUG for DMA device hotplug

From: Jiang Liu <[email protected]>

Introduce CONFIG_DMA_ENGINE_HOTPLUG to enable/disable DMA device hotplug logic.

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/dma/Kconfig | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index cf9da36..aba92f0 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -62,6 +62,7 @@ config INTEL_IOATDMA
depends on PCI && X86
select DMA_ENGINE
select DCA
+ select DMA_ENGINE_HOTPLUG
select ASYNC_TX_DISABLE_PQ_VAL_DMA
select ASYNC_TX_DISABLE_XOR_VAL_DMA
help
@@ -263,6 +264,11 @@ config DMA_SA11X0
config DMA_ENGINE
bool

+config DMA_ENGINE_HOTPLUG
+ bool
+ default n
+ depends on DMA_ENGINE
+
comment "DMA Clients"
depends on DMA_ENGINE

--
1.7.9.5

2012-05-14 13:51:26

by Jiang Liu

[permalink] [raw]
Subject: [RFC PATCH v2 7/7] dmaengine: assign DMA channel to CPU according to NUMA affinity

From: Jiang Liu <[email protected]>

On systems with multiple CPUs and DMA devices, try to optimize DMA
performance by assigning DMA channels to CPUs according to their NUMA
affinity. This may help architectures with memory controllers and DMA
devices built into the same physical processor avoid unnecessary
cross-socket traffic.
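
The placement decision in dma_channel_set() below boils down to roughly the
check sketched here (chan_is_local_to_cpu() is an illustrative name, not part
of the patch):

	/* Simplified placement check drawn from dma_channel_set(). */
	static bool chan_is_local_to_cpu(struct dma_chan *chan, int cpu)
	{
		int node = dev_to_node(chan->device->dev);

		/* -1 means the device carries no NUMA information, so there
		 * is no point searching for a better CPU. */
		return node == -1 || node == cpu_to_node(cpu);
	}

When this check fails, dma_channel_set() scans the online CPUs for one that
already holds a channel from the wanted node and swaps the two table entries,
so that channels end up as close as possible to CPUs on their own node.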

Signed-off-by: Jiang Liu <[email protected]>
---
drivers/dma/dmaengine.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index eca45c0..8a41bdf 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -266,6 +266,7 @@ static dma_cap_mask_t dma_cap_mask_all;
struct dma_chan_tbl_ent {
struct dma_chan *chan;
struct dma_chan *prev_chan;
+ int node;
};

/**
@@ -467,6 +468,46 @@ static void dma_channel_quiesce(void)
#endif
}

+/* Assign DMA channels to CPUs according to NUMA affinity relationship */
+static void dma_channel_set(int cap, int cpu, struct dma_chan *chan)
+{
+ int node;
+ int src_cpu;
+ struct dma_chan *src_chan;
+ struct dma_chan_tbl_ent *entry;
+ struct dma_chan_tbl_ent *src_entry;
+
+ entry = per_cpu_ptr(channel_table[cap], cpu);
+ node = dev_to_node(chan->device->dev);
+
+ /* Try to optimize if CPU and DMA channel belong to different node. */
+ if (node != -1 && node != cpu_to_node(cpu)) {
+ for_each_online_cpu(src_cpu) {
+ src_entry = per_cpu_ptr(channel_table[cap], src_cpu);
+ src_chan = src_entry->chan;
+
+ /*
+ * CPU online map may change beneath us due to
+ * CPU hotplug operations.
+ */
+ if (src_chan == NULL)
+ continue;
+
+ if (src_entry->node == node ||
+ cpu_to_node(src_cpu) == node) {
+ entry->node = src_entry->node;
+ src_entry->node = node;
+ entry->chan = src_chan;
+ src_entry->chan = chan;
+ return;
+ }
+ }
+ }
+
+ entry->node = node;
+ entry->chan = chan;
+}
+
/**
* dma_channel_rebalance - redistribute the available channels
*
@@ -501,8 +542,8 @@ static void dma_channel_rebalance(bool quiesce)
chan = nth_chan(cap, n++);
else
chan = nth_chan(cap, -1);
- entry = per_cpu_ptr(channel_table[cap], cpu);
- entry->chan = chan;
+ if (chan)
+ dma_channel_set(cap, cpu, chan);
}

if (quiesce)
--
1.7.9.5

2012-05-25 00:49:27

by Izumi, Taku

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] dmaengine: enhance dmaengine to support DMA device hotplug

Hi Jiang,

Build failed with "Device Drivers/DMA Engine Support/Async_tx: Offload
support for the async_tx api" enabled because of undefined symbol.
Patch 5/7 needs a fix?

(snip)
CHK include/linux/version.h
CHK include/generated/utsrelease.h
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
CHK include/linux/version.h
make[2]: Nothing to be done for `all'.
Building modules, stage 2.
TEST posttest
MODPOST 2201 modules
Succeed: decoded and checked 1356146 instructions
TEST posttest
Success: decoded and checked 1000000 random instructions with 0 errors (seed:0x9432fb9e)
Kernel: arch/x86/boot/bzImage is ready (#13)
ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_xor.ko] undefined!
ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_raid6_recov.ko] undefined!
ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_pq.ko] undefined!
ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_memcpy.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Best regards,
Taku Izumi <[email protected]>

2012-05-26 16:38:53

by Jiang Liu

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] dmaengine: enhance dmaengine to support DMA device hotplug

Hi Taku,
Sorry, forgot to export the percpu variable. Could you please
help to test the patch below?
Thanks!
---
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8a41bdf..26316f6 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -73,6 +73,7 @@ static atomic_t dmaengine_dirty;
static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
+EXPORT_PER_CPU_SYMBOL(dmaengine_chan_ref_count);
#endif /* CONFIG_DMA_ENGINE_HOTPLUG */

/* --- sysfs implementation --- */

On 05/25/2012 08:49 AM, Taku Izumi wrote:
> Hi Jiang,
>
> Build failed with "Device Drivers/DMA Engine Support/Async_tx: Offload
> support for the async_tx api" enabled because of undefined symbol.
> Patch 5/7 needs a fix?
>
> (snip)
> CHK include/linux/version.h
> CHK include/generated/utsrelease.h
> CALL scripts/checksyscalls.sh
> CHK include/generated/compile.h
> CHK include/linux/version.h
> make[2]: Nothing to be done for `all'.
> Building modules, stage 2.
> TEST posttest
> MODPOST 2201 modules
> Succeed: decoded and checked 1356146 instructions
> TEST posttest
> Success: decoded and checked 1000000 random instructions with 0 errors (seed:0x9432fb9e)
> Kernel: arch/x86/boot/bzImage is ready (#13)
> ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_xor.ko] undefined!
> ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_raid6_recov.ko] undefined!
> ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_pq.ko] undefined!
> ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_memcpy.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2
>
> Best regards,
> Taku Izumi <[email protected]>
>

2012-05-29 08:49:51

by Izumi, Taku

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] dmaengine: enhance dmaengine to support DMA device hotplug



On Sun, 27 May 2012 00:38:35 +0800
Jiang Liu <[email protected]> wrote:

> Hi Taku,
> Sorry, forgot to export the percpu variable. Could you please
> help to test the patch below?
> Thanks!

Hi Jiang,

Thank you.
This patch can fix my build issue.

Best regards,
Taku Izumi


> ---
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> index 8a41bdf..26316f6 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -73,6 +73,7 @@ static atomic_t dmaengine_dirty;
> static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
> static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
> DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
> +EXPORT_PER_CPU_SYMBOL(dmaengine_chan_ref_count);
> #endif /* CONFIG_DMA_ENGINE_HOTPLUG */
>
> /* --- sysfs implementation --- */
>
> On 05/25/2012 08:49 AM, Taku Izumi wrote:
> > Hi Jiang,
> >
> > Build failed with "Device Drivers/DMA Engine Support/Async_tx: Offload
> > support for the async_tx api" enabled because of undefined symbol.
> > Patch 5/7 needs a fix?
> >
> > (snip)
> > CHK include/linux/version.h
> > CHK include/generated/utsrelease.h
> > CALL scripts/checksyscalls.sh
> > CHK include/generated/compile.h
> > CHK include/linux/version.h
> > make[2]: Nothing to be done for `all'.
> > Building modules, stage 2.
> > TEST posttest
> > MODPOST 2201 modules
> > Succeed: decoded and checked 1356146 instructions
> > TEST posttest
> > Success: decoded and checked 1000000 random instructions with 0 errors (seed:0x9432fb9e)
> > Kernel: arch/x86/boot/bzImage is ready (#13)
> > ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_xor.ko] undefined!
> > ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_raid6_recov.ko] undefined!
> > ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_pq.ko] undefined!
> > ERROR: "dmaengine_chan_ref_count" [crypto/async_tx/async_memcpy.ko] undefined!
> > make[1]: *** [__modpost] Error 1
> > make: *** [modules] Error 2
> >
> > Best regards,
> > Taku Izumi <[email protected]>
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2013-08-08 10:46:09

by Rui Wang

[permalink] [raw]
Subject: RE: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA device hotplug

Hi Jiang,
See my comments inline.

> -----Original Message-----
> From: Jiang Liu <[email protected]>
> Date: Mon, 14 May 2012 21:47:05 +0800
> Subject: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA
> device hotplug
> To: Dan Williams <[email protected]>, Maciej Sosnowski
> <[email protected]>, Vinod Koul <[email protected]>
> Cc: Jiang Liu <[email protected]>, Keping Chen <[email protected]>,
> [email protected], [email protected]
>
> From: Jiang Liu <[email protected]>
>
> From: Jiang Liu <[email protected]>
>
> To support DMA device hotplug, dmaengine must correctly manage reference
> count for DMA channels. On the other hand, DMA hot path is performance
> critical, reference count management code should avoid polluting global shared
> cachelines, otherwise it may cause heavy performance penalty.
>
> This patch introduces a lightweight DMA channel reference count management
> mechanism by using percpu counter. When DMA device hotplug is disabled,
> there's no performance penalty. When hotplug is enabled, a
> dma_find_channel()/dma_put_channel() pair adds two local memory accesses
> to the hot path.
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> drivers/dma/dmaengine.c | 112
> +++++++++++++++++++++++++++++++++++++++++----
> include/linux/dmaengine.h | 29 +++++++++++-
> 2 files changed, 129 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index
> d3b1c48..eca45c0 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -61,12 +61,20 @@
> #include <linux/rculist.h>
> #include <linux/idr.h>
> #include <linux/slab.h>
> +#include <linux/sched.h>
>
> static DEFINE_MUTEX(dma_list_mutex);
> static DEFINE_IDR(dma_idr);
> static LIST_HEAD(dma_device_list);
> static long dmaengine_client_count;
>
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> +static atomic_t dmaengine_dirty;
> +static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
> +static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
> +DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
> +#endif /* CONFIG_DMA_ENGINE_HOTPLUG */
> +
> /* --- sysfs implementation --- */
>
> /**
> @@ -305,10 +313,40 @@ arch_initcall(dma_channel_table_init);
> */
> struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
> {
> - return this_cpu_read(channel_table[tx_type]->chan);
> + struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);
> +
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> + this_cpu_inc(dmaengine_chan_ref_count);
> + if (static_key_false(&dmaengine_quiesce))
> + chan = NULL;
> +#endif
> +
> + return chan;
> }
> EXPORT_SYMBOL(dma_find_channel);
>
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> +struct dma_chan *dma_get_channel(struct dma_chan *chan) {
> + if (static_key_false(&dmaengine_quiesce))
> + atomic_inc(&dmaengine_dirty);
> + this_cpu_inc(dmaengine_chan_ref_count);
> +
> + return chan;
> +}
> +EXPORT_SYMBOL(dma_get_channel);
> +#endif
> +
> +/**
> + * dma_has_capability - check whether any channel supports tx_type
> + * @tx_type: transaction type
> + */
> +bool dma_has_capability(enum dma_transaction_type tx_type) {
> + return !!this_cpu_read(channel_table[tx_type]->chan);
> +}
> +EXPORT_SYMBOL(dma_has_capability);
> +
> /*
> * net_dma_find_channel - find a channel for net_dma
> * net_dma has alignment requirements
> @@ -316,10 +354,15 @@ EXPORT_SYMBOL(dma_find_channel); struct
> dma_chan *net_dma_find_channel(void) {
> struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);
> - if (chan && !is_dma_copy_aligned(chan->device, 1, 1, 1))
> - return NULL;
>
> - return chan;
> + if (chan && is_dma_copy_aligned(chan->device, 1, 1, 1))
> + return chan;
> +
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> + this_cpu_dec(dmaengine_chan_ref_count);
> +#endif
> +
> + return NULL;
> }
> EXPORT_SYMBOL(net_dma_find_channel);
>
> @@ -393,6 +436,37 @@ static struct dma_chan *nth_chan(enum
> dma_transaction_type cap, int n)
> return ret;
> }
>
> +static void dma_channel_quiesce(void)
> +{
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> + int cpu;
> + long ref_count;
> + int quiesce_count = 0;
> +
> + static_key_slow_inc(&dmaengine_quiesce);
> +
> + do {
> + atomic_set(&dmaengine_dirty, 0);
> + schedule_timeout(1);
> +
> + if (atomic_read(&dmaengine_dirty)) {
> + quiesce_count = 0;
> + continue;
> + }
> +
> + ref_count = 0;
> + for_each_possible_cpu(cpu)
> + ref_count += per_cpu(dmaengine_chan_ref_count, cpu);
> + if (ref_count == 0)
> + quiesce_count++;
> + else
> + quiesce_count = 0;
> + } while (quiesce_count < 10);
> +
> + static_key_slow_dec(&dmaengine_quiesce);
> +#endif
> +}
> +

The purpose of dma_channel_quiesce() is unclear to me. It waits until dmaengine_chan_ref_count is 0, but how do you prevent someone from using dma after it returns?

<snip>

> @@ -767,14 +848,22 @@ void dma_async_device_unregister(struct
> dma_device *device)
>
> mutex_lock(&dma_list_mutex);
> list_del_rcu(&device->global_node);
> - dma_channel_rebalance();
> + dma_channel_rebalance(true);
> mutex_unlock(&dma_list_mutex);
>
> /* Check whether it's called from module exit function. */
> if (try_module_get(device->dev->driver->owner)) {
> +#ifndef CONFIG_DMA_ENGINE_HOTPLUG
> dev_warn(device->dev,
> "%s isn't called from module exit function.\n",
> __func__);
> +#else
> + list_for_each_entry(chan, &device->channels, device_node) {
> + /* TODO: notify clients to release channels*/
> + wait_event(dmaengine_wait_queue,
> + chan->client_count == 0);
> + }

How do you notify clients to release the channels? Is there an updated version which handles this? There are resources (e.g. timers) that must be freed, and that can only happen once enough dma_chan_put() calls bring client_count to 0 and device_free_chan_resources() gets called.

Rui Wang

2013-08-08 10:59:19

by Rui Wang

[permalink] [raw]
Subject: RE: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA device hotplug

(resend adding cc list)

Hi Jiang,
See my comments inline.

> -----Original Message-----
> From: Jiang Liu <[email protected]>
> Date: Mon, 14 May 2012 21:47:05 +0800
> Subject: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA
> device hotplug
> To: Dan Williams <[email protected]>, Maciej Sosnowski
> <[email protected]>, Vinod Koul <[email protected]>
> Cc: Jiang Liu <[email protected]>, Keping Chen <[email protected]>,
> [email protected], [email protected]
>
> From: Jiang Liu <[email protected]>
>
> From: Jiang Liu <[email protected]>
>
> To support DMA device hotplug, dmaengine must correctly manage reference
> count for DMA channels. On the other hand, DMA hot path is performance
> critical, reference count management code should avoid polluting global shared
> cachelines, otherwise it may cause heavy performance penalty.
>
> This patch introduces a lightweight DMA channel reference count management
> mechanism by using percpu counter. When DMA device hotplug is disabled,
> there's no performance penalty. When hotplug is enabled, a
> dma_find_channel()/dma_put_channel() pair adds two local memory accesses
> to the hot path.
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> drivers/dma/dmaengine.c | 112
> +++++++++++++++++++++++++++++++++++++++++----
> include/linux/dmaengine.h | 29 +++++++++++-
> 2 files changed, 129 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index
> d3b1c48..eca45c0 100644
> --- a/drivers/dma/dmaengine.c
> +++ b/drivers/dma/dmaengine.c
> @@ -61,12 +61,20 @@
> #include <linux/rculist.h>
> #include <linux/idr.h>
> #include <linux/slab.h>
> +#include <linux/sched.h>
>
> static DEFINE_MUTEX(dma_list_mutex);
> static DEFINE_IDR(dma_idr);
> static LIST_HEAD(dma_device_list);
> static long dmaengine_client_count;
>
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> +static atomic_t dmaengine_dirty;
> +static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
> +static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
> +DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
> +#endif /* CONFIG_DMA_ENGINE_HOTPLUG */
> +
> /* --- sysfs implementation --- */
>
> /**
> @@ -305,10 +313,40 @@ arch_initcall(dma_channel_table_init);
> */
> struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
> {
> - return this_cpu_read(channel_table[tx_type]->chan);
> + struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);
> +
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> + this_cpu_inc(dmaengine_chan_ref_count);
> + if (static_key_false(&dmaengine_quiesce))
> + chan = NULL;
> +#endif
> +
> + return chan;
> }
> EXPORT_SYMBOL(dma_find_channel);
>
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> +struct dma_chan *dma_get_channel(struct dma_chan *chan)
> +{
> + if (static_key_false(&dmaengine_quiesce))
> + atomic_inc(&dmaengine_dirty);
> + this_cpu_inc(dmaengine_chan_ref_count);
> +
> + return chan;
> +}
> +EXPORT_SYMBOL(dma_get_channel);
> +#endif
> +
> +/**
> + * dma_has_capability - check whether any channel supports tx_type
> + * @tx_type: transaction type
> + */
> +bool dma_has_capability(enum dma_transaction_type tx_type)
> +{
> + return !!this_cpu_read(channel_table[tx_type]->chan);
> +}
> +EXPORT_SYMBOL(dma_has_capability);
> +
> /*
> * net_dma_find_channel - find a channel for net_dma
> * net_dma has alignment requirements
> @@ -316,10 +354,15 @@ EXPORT_SYMBOL(dma_find_channel);
> struct dma_chan *net_dma_find_channel(void)
> {
> struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);
> - if (chan && !is_dma_copy_aligned(chan->device, 1, 1, 1))
> - return NULL;
>
> - return chan;
> + if (chan && is_dma_copy_aligned(chan->device, 1, 1, 1))
> + return chan;
> +
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> + this_cpu_dec(dmaengine_chan_ref_count);
> +#endif
> +
> + return NULL;
> }
> EXPORT_SYMBOL(net_dma_find_channel);
>
> @@ -393,6 +436,37 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
> return ret;
> }
>
> +static void dma_channel_quiesce(void)
> +{
> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> + int cpu;
> + long ref_count;
> + int quiesce_count = 0;
> +
> + static_key_slow_inc(&dmaengine_quiesce);
> +
> + do {
> + atomic_set(&dmaengine_dirty, 0);
> + schedule_timeout(1);
> +
> + if (atomic_read(&dmaengine_dirty)) {
> + quiesce_count = 0;
> + continue;
> + }
> +
> + ref_count = 0;
> + for_each_possible_cpu(cpu)
> + ref_count += per_cpu(dmaengine_chan_ref_count, cpu);
> + if (ref_count == 0)
> + quiesce_count++;
> + else
> + quiesce_count = 0;
> + } while (quiesce_count < 10);
> +
> + static_key_slow_dec(&dmaengine_quiesce);
> +#endif
> +}
> +

The purpose of dma_channel_quiesce() is unclear to me. It waits until the sum of the per-cpu dmaengine_chan_ref_count counters reaches 0, but how do you prevent someone from using DMA after it returns?

<snip>

> @@ -767,14 +848,22 @@ void dma_async_device_unregister(struct dma_device *device)
>
> mutex_lock(&dma_list_mutex);
> list_del_rcu(&device->global_node);
> - dma_channel_rebalance();
> + dma_channel_rebalance(true);
> mutex_unlock(&dma_list_mutex);
>
> /* Check whether it's called from module exit function. */
> if (try_module_get(device->dev->driver->owner)) {
> +#ifndef CONFIG_DMA_ENGINE_HOTPLUG
> dev_warn(device->dev,
> "%s isn't called from module exit function.\n",
> __func__);
> +#else
> + list_for_each_entry(chan, &device->channels, device_node) {
> + /* TODO: notify clients to release channels*/
> + wait_event(dmaengine_wait_queue,
> + chan->client_count == 0);
> + }

How do you notify clients to release the channels? Is there an updated version which handles this? There are resources (e.g. timers) that must be freed, and that can only happen after dma_chan_put() has been called enough times for client_count to reach 0, at which point device_free_chan_resources() gets called.

Rui Wang

2013-08-09 00:03:55

by Jon Mason

[permalink] [raw]
Subject: Re: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA device hotplug

On Thu, Aug 8, 2013 at 3:59 AM, Wang, Rui Y <[email protected]> wrote:
> (resend adding cc list)

The e-mail you are responding to is over a year old, and the patch series
doesn't appear to have been accepted. I suppose late is better than never...

Adding Dan Williams new e-mail address and Dave Jiang.

Thanks,
Jon

>
> Hi Jiang,
> See my comments inline.
>
>> -----Original Message-----
>> From: Jiang Liu <[email protected]>
>> Date: Mon, 14 May 2012 21:47:05 +0800
>> Subject: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA
>> device hotplug
>> To: Dan Williams <[email protected]>, Maciej Sosnowski
>> <[email protected]>, Vinod Koul <[email protected]>
>> Cc: Jiang Liu <[email protected]>, Keping Chen <[email protected]>,
>> [email protected], [email protected]
>>
>> From: Jiang Liu <[email protected]>
>>
>> From: Jiang Liu <[email protected]>
>>
>> To support DMA device hotplug, dmaengine must correctly manage reference
>> count for DMA channels. On the other hand, DMA hot path is performance
>> critical, reference count management code should avoid polluting global shared
>> cachelines, otherwise it may cause heavy performance penalty.
>>
>> This patch introduces a lightweight DMA channel reference count management
>> mechanism by using percpu counter. When DMA device hotplug is disabled,
>> there's no performance penalty. When hotplug is enabled, a
>> dma_find_channel()/dma_put_channel() pair adds two local memory accesses
>> to the hot path.
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> ---
>> drivers/dma/dmaengine.c | 112 +++++++++++++++++++++++++++++++++++++++++----
>> include/linux/dmaengine.h | 29 +++++++++++-
>> 2 files changed, 129 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
>> index d3b1c48..eca45c0 100644
>> --- a/drivers/dma/dmaengine.c
>> +++ b/drivers/dma/dmaengine.c
>> @@ -61,12 +61,20 @@
>> #include <linux/rculist.h>
>> #include <linux/idr.h>
>> #include <linux/slab.h>
>> +#include <linux/sched.h>
>>
>> static DEFINE_MUTEX(dma_list_mutex);
>> static DEFINE_IDR(dma_idr);
>> static LIST_HEAD(dma_device_list);
>> static long dmaengine_client_count;
>>
>> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
>> +static atomic_t dmaengine_dirty;
>> +static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
>> +static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
>> +DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
>> +#endif /* CONFIG_DMA_ENGINE_HOTPLUG */
>> +
>> /* --- sysfs implementation --- */
>>
>> /**
>> @@ -305,10 +313,40 @@ arch_initcall(dma_channel_table_init);
>> */
>> struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
>> {
>> - return this_cpu_read(channel_table[tx_type]->chan);
>> + struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);
>> +
>> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
>> + this_cpu_inc(dmaengine_chan_ref_count);
>> + if (static_key_false(&dmaengine_quiesce))
>> + chan = NULL;
>> +#endif
>> +
>> + return chan;
>> }
>> EXPORT_SYMBOL(dma_find_channel);
>>
>> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
>> +struct dma_chan *dma_get_channel(struct dma_chan *chan)
>> +{
>> + if (static_key_false(&dmaengine_quiesce))
>> + atomic_inc(&dmaengine_dirty);
>> + this_cpu_inc(dmaengine_chan_ref_count);
>> +
>> + return chan;
>> +}
>> +EXPORT_SYMBOL(dma_get_channel);
>> +#endif
>> +
>> +/**
>> + * dma_has_capability - check whether any channel supports tx_type
>> + * @tx_type: transaction type
>> + */
>> +bool dma_has_capability(enum dma_transaction_type tx_type)
>> +{
>> + return !!this_cpu_read(channel_table[tx_type]->chan);
>> +}
>> +EXPORT_SYMBOL(dma_has_capability);
>> +
>> /*
>> * net_dma_find_channel - find a channel for net_dma
>> * net_dma has alignment requirements
>> @@ -316,10 +354,15 @@ EXPORT_SYMBOL(dma_find_channel);
>> struct dma_chan *net_dma_find_channel(void)
>> {
>> struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);
>> - if (chan && !is_dma_copy_aligned(chan->device, 1, 1, 1))
>> - return NULL;
>>
>> - return chan;
>> + if (chan && is_dma_copy_aligned(chan->device, 1, 1, 1))
>> + return chan;
>> +
>> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
>> + this_cpu_dec(dmaengine_chan_ref_count);
>> +#endif
>> +
>> + return NULL;
>> }
>> EXPORT_SYMBOL(net_dma_find_channel);
>>
>> @@ -393,6 +436,37 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
>> return ret;
>> }
>>
>> +static void dma_channel_quiesce(void)
>> +{
>> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
>> + int cpu;
>> + long ref_count;
>> + int quiesce_count = 0;
>> +
>> + static_key_slow_inc(&dmaengine_quiesce);
>> +
>> + do {
>> + atomic_set(&dmaengine_dirty, 0);
>> + schedule_timeout(1);
>> +
>> + if (atomic_read(&dmaengine_dirty)) {
>> + quiesce_count = 0;
>> + continue;
>> + }
>> +
>> + ref_count = 0;
>> + for_each_possible_cpu(cpu)
>> + ref_count += per_cpu(dmaengine_chan_ref_count, cpu);
>> + if (ref_count == 0)
>> + quiesce_count++;
>> + else
>> + quiesce_count = 0;
>> + } while (quiesce_count < 10);
>> +
>> + static_key_slow_dec(&dmaengine_quiesce);
>> +#endif
>> +}
>> +
>
> The purpose of dma_channel_quiesce() is unclear to me. It waits until dmaengine_chan_ref_count is 0, but how do you prevent someone from using dma after it returns?
>
> <snip>
>
>> @@ -767,14 +848,22 @@ void dma_async_device_unregister(struct dma_device *device)
>>
>> mutex_lock(&dma_list_mutex);
>> list_del_rcu(&device->global_node);
>> - dma_channel_rebalance();
>> + dma_channel_rebalance(true);
>> mutex_unlock(&dma_list_mutex);
>>
>> /* Check whether it's called from module exit function. */
>> if (try_module_get(device->dev->driver->owner)) {
>> +#ifndef CONFIG_DMA_ENGINE_HOTPLUG
>> dev_warn(device->dev,
>> "%s isn't called from module exit function.\n",
>> __func__);
>> +#else
>> + list_for_each_entry(chan, &device->channels, device_node) {
>> + /* TODO: notify clients to release channels*/
>> + wait_event(dmaengine_wait_queue,
>> + chan->client_count == 0);
>> + }
>
> How do you notify clients to release the channels? Is there an updated version which handles this? There're resources (e.g. timers) that must be free'd, which can only happen when someone calls dma_chan_put() until client_count reaches 0 and device_free_chan_resources() gets called.
>
> Rui Wang
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-08-09 02:27:38

by Rui Wang

[permalink] [raw]
Subject: RE: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support DMA device hotplug

> -----Original Message-----
> From: Jon Mason [mailto:[email protected]]
> Sent: Friday, August 09, 2013 8:04 AM
> To: Wang, Rui Y
> Cc: [email protected]; Sosnowski, Maciej; Koul, Vinod;
> [email protected]; [email protected];
> [email protected]; Luck, Tony; Guo, Chaohong; Dan Williams; Jiang,
> Dave
> Subject: Re: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support
> DMA device hotplug
>
> On Thu, Aug 8, 2013 at 3:59 AM, Wang, Rui Y <[email protected]> wrote:
> > (resend adding cc list)
>
> The e-mail you are responding to is over a year old, but doesn't appear to have
> been accepted. I suppose late is better than never...
>

Yes, agreed. We eventually have to fix it.

I recently encountered the same problem (dma_async_device_unregister() hung my machine). I was looking for people who cared about it and found this thread.

Thanks
Rui

> Adding Dan Williams new e-mail address and Dave Jiang.
>
> Thanks,
> Jon
>
> >
> > Hi Jiang,
> > See my comments inline.
> >
> >> -----Original Message-----
> >> From: Jiang Liu <[email protected]>
> >> Date: Mon, 14 May 2012 21:47:05 +0800
> >> Subject: [RFC PATCH v2 3/7] dmaengine: enhance dmaengine to support
> >> DMA device hotplug
> >> To: Dan Williams <[email protected]>, Maciej Sosnowski
> >> <[email protected]>, Vinod Koul <[email protected]>
> >> Cc: Jiang Liu <[email protected]>, Keping Chen
> >> <[email protected]>, [email protected],
> >> [email protected]
> >>
> >> From: Jiang Liu <[email protected]>
> >>
> >> From: Jiang Liu <[email protected]>
> >>
> >> To support DMA device hotplug, dmaengine must correctly manage
> >> reference count for DMA channels. On the other hand, DMA hot path is
> >> performance critical, reference count management code should avoid
> >> polluting global shared cachelines, otherwise it may cause heavy
> performance penalty.
> >>
> >> This patch introduces a lightweight DMA channel reference count
> >> management mechanism by using percpu counter. When DMA device
> hotplug
> >> is disabled, there's no performance penalty. When hotplug is enabled,
> >> a
> >> dma_find_channel()/dma_put_channel() pair adds two local memory
> >> accesses to the hot path.
> >>
> >> Signed-off-by: Jiang Liu <[email protected]>
> >> ---
> >> drivers/dma/dmaengine.c | 112 +++++++++++++++++++++++++++++++++++++++++----
> >> include/linux/dmaengine.h | 29 +++++++++++-
> >> 2 files changed, 129 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
> >> index d3b1c48..eca45c0 100644
> >> --- a/drivers/dma/dmaengine.c
> >> +++ b/drivers/dma/dmaengine.c
> >> @@ -61,12 +61,20 @@
> >> #include <linux/rculist.h>
> >> #include <linux/idr.h>
> >> #include <linux/slab.h>
> >> +#include <linux/sched.h>
> >>
> >> static DEFINE_MUTEX(dma_list_mutex);
> >> static DEFINE_IDR(dma_idr);
> >> static LIST_HEAD(dma_device_list);
> >> static long dmaengine_client_count;
> >>
> >> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> >> +static atomic_t dmaengine_dirty;
> >> +static struct static_key dmaengine_quiesce = STATIC_KEY_INIT_FALSE;
> >> +static DECLARE_WAIT_QUEUE_HEAD(dmaengine_wait_queue);
> >> +DEFINE_PER_CPU(long, dmaengine_chan_ref_count);
> >> +#endif /* CONFIG_DMA_ENGINE_HOTPLUG */
> >> +
> >> /* --- sysfs implementation --- */
> >>
> >> /**
> >> @@ -305,10 +313,40 @@ arch_initcall(dma_channel_table_init);
> >> */
> >> struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
> >> {
> >> - return this_cpu_read(channel_table[tx_type]->chan);
> >> + struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);
> >> +
> >> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> >> + this_cpu_inc(dmaengine_chan_ref_count);
> >> + if (static_key_false(&dmaengine_quiesce))
> >> + chan = NULL;
> >> +#endif
> >> +
> >> + return chan;
> >> }
> >> EXPORT_SYMBOL(dma_find_channel);
> >>
> >> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> >> +struct dma_chan *dma_get_channel(struct dma_chan *chan)
> >> +{
> >> + if (static_key_false(&dmaengine_quiesce))
> >> + atomic_inc(&dmaengine_dirty);
> >> + this_cpu_inc(dmaengine_chan_ref_count);
> >> +
> >> + return chan;
> >> +}
> >> +EXPORT_SYMBOL(dma_get_channel);
> >> +#endif
> >> +
> >> +/**
> >> + * dma_has_capability - check whether any channel supports tx_type
> >> + * @tx_type: transaction type
> >> + */
> >> +bool dma_has_capability(enum dma_transaction_type tx_type)
> >> +{
> >> + return !!this_cpu_read(channel_table[tx_type]->chan);
> >> +}
> >> +EXPORT_SYMBOL(dma_has_capability);
> >> +
> >> /*
> >> * net_dma_find_channel - find a channel for net_dma
> >> * net_dma has alignment requirements
> >> @@ -316,10 +354,15 @@ EXPORT_SYMBOL(dma_find_channel);
> >> struct dma_chan *net_dma_find_channel(void)
> >> {
> >> struct dma_chan *chan = dma_find_channel(DMA_MEMCPY);
> >> - if (chan && !is_dma_copy_aligned(chan->device, 1, 1, 1))
> >> - return NULL;
> >>
> >> - return chan;
> >> + if (chan && is_dma_copy_aligned(chan->device, 1, 1, 1))
> >> + return chan;
> >> +
> >> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> >> + this_cpu_dec(dmaengine_chan_ref_count);
> >> +#endif
> >> +
> >> + return NULL;
> >> }
> >> EXPORT_SYMBOL(net_dma_find_channel);
> >>
> >> @@ -393,6 +436,37 @@ static struct dma_chan *nth_chan(enum dma_transaction_type cap, int n)
> >> return ret;
> >> }
> >>
> >> +static void dma_channel_quiesce(void)
> >> +{
> >> +#ifdef CONFIG_DMA_ENGINE_HOTPLUG
> >> + int cpu;
> >> + long ref_count;
> >> + int quiesce_count = 0;
> >> +
> >> + static_key_slow_inc(&dmaengine_quiesce);
> >> +
> >> + do {
> >> + atomic_set(&dmaengine_dirty, 0);
> >> + schedule_timeout(1);
> >> +
> >> + if (atomic_read(&dmaengine_dirty)) {
> >> + quiesce_count = 0;
> >> + continue;
> >> + }
> >> +
> >> + ref_count = 0;
> >> + for_each_possible_cpu(cpu)
> >> + ref_count += per_cpu(dmaengine_chan_ref_count, cpu);
> >> + if (ref_count == 0)
> >> + quiesce_count++;
> >> + else
> >> + quiesce_count = 0;
> >> + } while (quiesce_count < 10);
> >> +
> >> + static_key_slow_dec(&dmaengine_quiesce);
> >> +#endif
> >> +}
> >> +
> >
> > The purpose of dma_channel_quiesce() is unclear to me. It waits until
> dmaengine_chan_ref_count is 0, but how do you prevent someone from using
> dma after it returns?
> >
> > <snip>
> >
> >> @@ -767,14 +848,22 @@ void dma_async_device_unregister(struct dma_device *device)
> >>
> >> mutex_lock(&dma_list_mutex);
> >> list_del_rcu(&device->global_node);
> >> - dma_channel_rebalance();
> >> + dma_channel_rebalance(true);
> >> mutex_unlock(&dma_list_mutex);
> >>
> >> /* Check whether it's called from module exit function. */
> >> if (try_module_get(device->dev->driver->owner)) {
> >> +#ifndef CONFIG_DMA_ENGINE_HOTPLUG
> >> dev_warn(device->dev,
> >> "%s isn't called from module exit function.\n",
> >> __func__);
> >> +#else
> >> + list_for_each_entry(chan, &device->channels, device_node) {
> >> + /* TODO: notify clients to release channels*/
> >> + wait_event(dmaengine_wait_queue,
> >> + chan->client_count == 0);
> >> + }
> >
> > How do you notify clients to release the channels? Is there an updated
> version which handles this? There're resources (e.g. timers) that must be
> free'd, which can only happen when someone calls dma_chan_put() until
> client_count reaches 0 and device_free_chan_resources() gets called.
> >
> > Rui Wang
> >
> >