2014-01-09 20:12:45

by Dan Williams

[permalink] [raw]
Subject: [PATCH v2 0/4] net_dma removal, and dma debug extension

Follow up patches to 77873803363c "net_dma: mark broken" to remove
net_dma bits and provide debug infrastructure to flag other
get_user_pages() vs dma instances that might violate the dma api.

Will takes this through the dmaengine tree once acked.

Changes since v1 [1]:

1/ net_dma removal patch has been expanded to revert other
net_dma induced changes.

2/ updated the debug_dma_assert_idle() api to be gated on
CONFIG_DMA_VS_CPU_DEBUG

---

Dan Williams (4):
net_dma: simple removal
net_dma: revert 'copied_early'
net: make tcp_cleanup_rbuf private
dma debug: introduce debug_dma_assert_idle()


Documentation/ABI/removed/net_dma | 8 +
Documentation/networking/ip-sysctl.txt | 6 -
drivers/dma/Kconfig | 12 -
drivers/dma/Makefile | 1
drivers/dma/dmaengine.c | 104 ------------
drivers/dma/ioat/dma.c | 1
drivers/dma/ioat/dma.h | 7 -
drivers/dma/ioat/dma_v2.c | 1
drivers/dma/ioat/dma_v3.c | 1
drivers/dma/iovlock.c | 280 --------------------------------
include/linux/dma-debug.h | 6 +
include/linux/dmaengine.h | 22 ---
include/linux/skbuff.h | 8 -
include/linux/tcp.h | 8 -
include/net/netdma.h | 32 ----
include/net/sock.h | 19 --
include/net/tcp.h | 9 -
kernel/sysctl_binary.c | 1
lib/Kconfig.debug | 13 +
lib/dma-debug.c | 130 +++++++++++++--
mm/memory.c | 3
net/core/Makefile | 1
net/core/dev.c | 10 -
net/core/sock.c | 6 -
net/core/user_dma.c | 131 ---------------
net/dccp/proto.c | 4
net/ipv4/sysctl_net_ipv4.c | 9 -
net/ipv4/tcp.c | 157 ++----------------
net/ipv4/tcp_input.c | 83 +--------
net/ipv4/tcp_ipv4.c | 18 --
net/ipv6/tcp_ipv6.c | 13 -
net/llc/af_llc.c | 10 +
32 files changed, 186 insertions(+), 928 deletions(-)
create mode 100644 Documentation/ABI/removed/net_dma
delete mode 100644 drivers/dma/iovlock.c
delete mode 100644 include/net/netdma.h
delete mode 100644 net/core/user_dma.c


2014-01-09 20:16:48

by Dan Williams

[permalink] [raw]
Subject: [PATCH v2 2/4] net_dma: revert 'copied_early'

Now that tcp_dma_try_early_copy() is gone nothing ever sets
copied_early.

Also reverts "53240c208776 tcp: Fix possible double-ack w/ user dma"
since it is no longer necessary.

Cc: Ali Saidi <[email protected]>
Cc: James Morris <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
New in v2.

net/ipv4/tcp_input.c | 22 ++++++++--------------
1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 33ef18e550c5..15911a280485 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5145,19 +5145,15 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
}
} else {
int eaten = 0;
- int copied_early = 0;
bool fragstolen = false;

- if (tp->copied_seq == tp->rcv_nxt &&
- len - tcp_header_len <= tp->ucopy.len) {
- if (tp->ucopy.task == current &&
- sock_owned_by_user(sk) && !copied_early) {
- __set_current_state(TASK_RUNNING);
+ if (tp->ucopy.task == current &&
+ tp->copied_seq == tp->rcv_nxt &&
+ len - tcp_header_len <= tp->ucopy.len &&
+ sock_owned_by_user(sk)) {
+ __set_current_state(TASK_RUNNING);

- if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
- eaten = 1;
- }
- if (eaten) {
+ if (!tcp_copy_to_iovec(sk, skb, tcp_header_len)) {
/* Predicted packet is in window by definition.
* seq == rcv_nxt and rcv_wup <= rcv_nxt.
* Hence, check seq<=rcv_wup reduces to:
@@ -5173,9 +5169,8 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
__skb_pull(skb, tcp_header_len);
tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
+ eaten = 1;
}
- if (copied_early)
- tcp_cleanup_rbuf(sk, skb->len);
}
if (!eaten) {
if (tcp_checksum_complete_user(sk, skb))
@@ -5212,8 +5207,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
goto no_ack;
}

- if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
- __tcp_ack_snd_check(sk, 0);
+ __tcp_ack_snd_check(sk, 0);
no_ack:
if (eaten)
kfree_skb_partial(skb, fragstolen);

2014-01-09 20:17:04

by Dan Williams

[permalink] [raw]
Subject: [PATCH v2 1/4] net_dma: simple removal

Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.

This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.

Cc: Dave Jiang <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: David Whipple <[email protected]>
Cc: Alexander Duyck <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Updated the changelog to foreshadow the other reverts, otherwise the
same straightforward removal as before.

Documentation/ABI/removed/net_dma | 8 +
Documentation/networking/ip-sysctl.txt | 6 -
drivers/dma/Kconfig | 12 -
drivers/dma/Makefile | 1
drivers/dma/dmaengine.c | 104 ------------
drivers/dma/ioat/dma.c | 1
drivers/dma/ioat/dma.h | 7 -
drivers/dma/ioat/dma_v2.c | 1
drivers/dma/ioat/dma_v3.c | 1
drivers/dma/iovlock.c | 280 --------------------------------
include/linux/dmaengine.h | 22 ---
include/linux/skbuff.h | 8 -
include/linux/tcp.h | 8 -
include/net/netdma.h | 32 ----
include/net/sock.h | 19 --
include/net/tcp.h | 8 -
kernel/sysctl_binary.c | 1
net/core/Makefile | 1
net/core/dev.c | 10 -
net/core/sock.c | 6 -
net/core/user_dma.c | 131 ---------------
net/dccp/proto.c | 4
net/ipv4/sysctl_net_ipv4.c | 9 -
net/ipv4/tcp.c | 147 ++---------------
net/ipv4/tcp_input.c | 61 -------
net/ipv4/tcp_ipv4.c | 18 --
net/ipv6/tcp_ipv6.c | 13 -
net/llc/af_llc.c | 10 +
28 files changed, 35 insertions(+), 894 deletions(-)
create mode 100644 Documentation/ABI/removed/net_dma
delete mode 100644 drivers/dma/iovlock.c
delete mode 100644 include/net/netdma.h
delete mode 100644 net/core/user_dma.c

diff --git a/Documentation/ABI/removed/net_dma b/Documentation/ABI/removed/net_dma
new file mode 100644
index 000000000000..a173aecc2f18
--- /dev/null
+++ b/Documentation/ABI/removed/net_dma
@@ -0,0 +1,8 @@
+What: tcp_dma_copybreak sysctl
+Date: Removed in kernel v3.13
+Contact: Dan Williams <[email protected]>
+Description:
+ Formerly the lower limit, in bytes, of the size of socket reads
+ that will be offloaded to a DMA copy engine. Removed due to
+ coherency issues of the cpu potentially touching the buffers
+ while dma is in flight.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 3c12d9a7ed00..bdd8a67f0be2 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -538,12 +538,6 @@ tcp_workaround_signed_windows - BOOLEAN
not receive a window scaling option from them.
Default: 0

-tcp_dma_copybreak - INTEGER
- Lower limit, in bytes, of the size of socket reads that will be
- offloaded to a DMA copy engine, if one is present in the system
- and CONFIG_NET_DMA is enabled.
- Default: 4096
-
tcp_thin_linear_timeouts - BOOLEAN
Enable dynamic triggering of linear timeouts for thin streams.
If set, a check is performed upon retransmission by timeout to
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index c823daaf9043..b24f13195272 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -351,18 +351,6 @@ config DMA_OF
comment "DMA Clients"
depends on DMA_ENGINE

-config NET_DMA
- bool "Network: TCP receive copy offload"
- depends on DMA_ENGINE && NET
- default (INTEL_IOATDMA || FSL_DMA)
- depends on BROKEN
- help
- This enables the use of DMA engines in the network stack to
- offload receive copy-to-user operations, freeing CPU cycles.
-
- Say Y here if you enabled INTEL_IOATDMA or FSL_DMA, otherwise
- say N.
-
config ASYNC_TX_DMA
bool "Async_tx: Offload support for the async_tx api"
depends on DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 0ce2da97e429..024b008a25de 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -6,7 +6,6 @@ obj-$(CONFIG_DMA_VIRTUAL_CHANNELS) += virt-dma.o
obj-$(CONFIG_DMA_ACPI) += acpi-dma.o
obj-$(CONFIG_DMA_OF) += of-dma.o

-obj-$(CONFIG_NET_DMA) += iovlock.o
obj-$(CONFIG_INTEL_MID_DMAC) += intel_mid_dma.o
obj-$(CONFIG_DMATEST) += dmatest.o
obj-$(CONFIG_INTEL_IOATDMA) += ioat/
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index ef63b9058f3c..d7f4f4e0d71f 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1029,110 +1029,6 @@ dmaengine_get_unmap_data(struct device *dev, int nr, gfp_t flags)
}
EXPORT_SYMBOL(dmaengine_get_unmap_data);

-/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
- * @chan: DMA channel to offload copy to
- * @dest_pg: destination page
- * @dest_off: offset in page to copy to
- * @src_pg: source page
- * @src_off: offset in page to copy from
- * @len: length
- *
- * Both @dest_page/@dest_off and @src_page/@src_off must be mappable to a bus
- * address according to the DMA mapping API rules for streaming mappings.
- * Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
- * (kernel memory or locked user space pages).
- */
-dma_cookie_t
-dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg,
- unsigned int dest_off, struct page *src_pg, unsigned int src_off,
- size_t len)
-{
- struct dma_device *dev = chan->device;
- struct dma_async_tx_descriptor *tx;
- struct dmaengine_unmap_data *unmap;
- dma_cookie_t cookie;
- unsigned long flags;
-
- unmap = dmaengine_get_unmap_data(dev->dev, 2, GFP_NOWAIT);
- if (!unmap)
- return -ENOMEM;
-
- unmap->to_cnt = 1;
- unmap->from_cnt = 1;
- unmap->addr[0] = dma_map_page(dev->dev, src_pg, src_off, len,
- DMA_TO_DEVICE);
- unmap->addr[1] = dma_map_page(dev->dev, dest_pg, dest_off, len,
- DMA_FROM_DEVICE);
- unmap->len = len;
- flags = DMA_CTRL_ACK;
- tx = dev->device_prep_dma_memcpy(chan, unmap->addr[1], unmap->addr[0],
- len, flags);
-
- if (!tx) {
- dmaengine_unmap_put(unmap);
- return -ENOMEM;
- }
-
- dma_set_unmap(tx, unmap);
- cookie = tx->tx_submit(tx);
- dmaengine_unmap_put(unmap);
-
- preempt_disable();
- __this_cpu_add(chan->local->bytes_transferred, len);
- __this_cpu_inc(chan->local->memcpy_count);
- preempt_enable();
-
- return cookie;
-}
-EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
-
-/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
- * @chan: DMA channel to offload copy to
- * @dest: destination address (virtual)
- * @src: source address (virtual)
- * @len: length
- *
- * Both @dest and @src must be mappable to a bus address according to the
- * DMA mapping API rules for streaming mappings.
- * Both @dest and @src must stay memory resident (kernel memory or locked
- * user space pages).
- */
-dma_cookie_t
-dma_async_memcpy_buf_to_buf(struct dma_chan *chan, void *dest,
- void *src, size_t len)
-{
- return dma_async_memcpy_pg_to_pg(chan, virt_to_page(dest),
- (unsigned long) dest & ~PAGE_MASK,
- virt_to_page(src),
- (unsigned long) src & ~PAGE_MASK, len);
-}
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
-
-/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
- * @chan: DMA channel to offload copy to
- * @page: destination page
- * @offset: offset in page to copy to
- * @kdata: source address (virtual)
- * @len: length
- *
- * Both @page/@offset and @kdata must be mappable to a bus address according
- * to the DMA mapping API rules for streaming mappings.
- * Both @page/@offset and @kdata must stay memory resident (kernel memory or
- * locked user space pages)
- */
-dma_cookie_t
-dma_async_memcpy_buf_to_pg(struct dma_chan *chan, struct page *page,
- unsigned int offset, void *kdata, size_t len)
-{
- return dma_async_memcpy_pg_to_pg(chan, page, offset,
- virt_to_page(kdata),
- (unsigned long) kdata & ~PAGE_MASK, len);
-}
-EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
-
void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
struct dma_chan *chan)
{
diff --git a/drivers/dma/ioat/dma.c b/drivers/dma/ioat/dma.c
index 1a49c777607c..97fa394ca855 100644
--- a/drivers/dma/ioat/dma.c
+++ b/drivers/dma/ioat/dma.c
@@ -1175,7 +1175,6 @@ int ioat1_dma_probe(struct ioatdma_device *device, int dca)
err = ioat_probe(device);
if (err)
return err;
- ioat_set_tcp_copy_break(4096);
err = ioat_register(device);
if (err)
return err;
diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 11fb877ddca9..664ec9cbd651 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -214,13 +214,6 @@ __dump_desc_dbg(struct ioat_chan_common *chan, struct ioat_dma_descriptor *hw,
#define dump_desc_dbg(c, d) \
({ if (d) __dump_desc_dbg(&c->base, d->hw, &d->txd, desc_id(d)); 0; })

-static inline void ioat_set_tcp_copy_break(unsigned long copybreak)
-{
- #ifdef CONFIG_NET_DMA
- sysctl_tcp_dma_copybreak = copybreak;
- #endif
-}
-
static inline struct ioat_chan_common *
ioat_chan_by_index(struct ioatdma_device *device, int index)
{
diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5d3affe7e976..31e8098e444f 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -900,7 +900,6 @@ int ioat2_dma_probe(struct ioatdma_device *device, int dca)
err = ioat_probe(device);
if (err)
return err;
- ioat_set_tcp_copy_break(2048);

list_for_each_entry(c, &dma->channels, device_node) {
chan = to_chan_common(c);
diff --git a/drivers/dma/ioat/dma_v3.c b/drivers/dma/ioat/dma_v3.c
index 820817e97e62..4bb81346bee2 100644
--- a/drivers/dma/ioat/dma_v3.c
+++ b/drivers/dma/ioat/dma_v3.c
@@ -1652,7 +1652,6 @@ int ioat3_dma_probe(struct ioatdma_device *device, int dca)
err = ioat_probe(device);
if (err)
return err;
- ioat_set_tcp_copy_break(262144);

list_for_each_entry(c, &dma->channels, device_node) {
chan = to_chan_common(c);
diff --git a/drivers/dma/iovlock.c b/drivers/dma/iovlock.c
deleted file mode 100644
index bb48a57c2fc1..000000000000
--- a/drivers/dma/iovlock.c
+++ /dev/null
@@ -1,280 +0,0 @@
-/*
- * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
- * Portions based on net/core/datagram.c and copyrighted by their authors.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * The full GNU General Public License is included in this distribution in the
- * file called COPYING.
- */
-
-/*
- * This code allows the net stack to make use of a DMA engine for
- * skb to iovec copies.
- */
-
-#include <linux/dmaengine.h>
-#include <linux/pagemap.h>
-#include <linux/slab.h>
-#include <net/tcp.h> /* for memcpy_toiovec */
-#include <asm/io.h>
-#include <asm/uaccess.h>
-
-static int num_pages_spanned(struct iovec *iov)
-{
- return
- ((PAGE_ALIGN((unsigned long)iov->iov_base + iov->iov_len) -
- ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT);
-}
-
-/*
- * Pin down all the iovec pages needed for len bytes.
- * Return a struct dma_pinned_list to keep track of pages pinned down.
- *
- * We are allocating a single chunk of memory, and then carving it up into
- * 3 sections, the latter 2 whose size depends on the number of iovecs and the
- * total number of pages, respectively.
- */
-struct dma_pinned_list *dma_pin_iovec_pages(struct iovec *iov, size_t len)
-{
- struct dma_pinned_list *local_list;
- struct page **pages;
- int i;
- int ret;
- int nr_iovecs = 0;
- int iovec_len_used = 0;
- int iovec_pages_used = 0;
-
- /* don't pin down non-user-based iovecs */
- if (segment_eq(get_fs(), KERNEL_DS))
- return NULL;
-
- /* determine how many iovecs/pages there are, up front */
- do {
- iovec_len_used += iov[nr_iovecs].iov_len;
- iovec_pages_used += num_pages_spanned(&iov[nr_iovecs]);
- nr_iovecs++;
- } while (iovec_len_used < len);
-
- /* single kmalloc for pinned list, page_list[], and the page arrays */
- local_list = kmalloc(sizeof(*local_list)
- + (nr_iovecs * sizeof (struct dma_page_list))
- + (iovec_pages_used * sizeof (struct page*)), GFP_KERNEL);
- if (!local_list)
- goto out;
-
- /* list of pages starts right after the page list array */
- pages = (struct page **) &local_list->page_list[nr_iovecs];
-
- local_list->nr_iovecs = 0;
-
- for (i = 0; i < nr_iovecs; i++) {
- struct dma_page_list *page_list = &local_list->page_list[i];
-
- len -= iov[i].iov_len;
-
- if (!access_ok(VERIFY_WRITE, iov[i].iov_base, iov[i].iov_len))
- goto unpin;
-
- page_list->nr_pages = num_pages_spanned(&iov[i]);
- page_list->base_address = iov[i].iov_base;
-
- page_list->pages = pages;
- pages += page_list->nr_pages;
-
- /* pin pages down */
- down_read(&current->mm->mmap_sem);
- ret = get_user_pages(
- current,
- current->mm,
- (unsigned long) iov[i].iov_base,
- page_list->nr_pages,
- 1, /* write */
- 0, /* force */
- page_list->pages,
- NULL);
- up_read(&current->mm->mmap_sem);
-
- if (ret != page_list->nr_pages)
- goto unpin;
-
- local_list->nr_iovecs = i + 1;
- }
-
- return local_list;
-
-unpin:
- dma_unpin_iovec_pages(local_list);
-out:
- return NULL;
-}
-
-void dma_unpin_iovec_pages(struct dma_pinned_list *pinned_list)
-{
- int i, j;
-
- if (!pinned_list)
- return;
-
- for (i = 0; i < pinned_list->nr_iovecs; i++) {
- struct dma_page_list *page_list = &pinned_list->page_list[i];
- for (j = 0; j < page_list->nr_pages; j++) {
- set_page_dirty_lock(page_list->pages[j]);
- page_cache_release(page_list->pages[j]);
- }
- }
-
- kfree(pinned_list);
-}
-
-
-/*
- * We have already pinned down the pages we will be using in the iovecs.
- * Each entry in iov array has corresponding entry in pinned_list->page_list.
- * Using array indexing to keep iov[] and page_list[] in sync.
- * Initial elements in iov array's iov->iov_len will be 0 if already copied into
- * by another call.
- * iov array length remaining guaranteed to be bigger than len.
- */
-dma_cookie_t dma_memcpy_to_iovec(struct dma_chan *chan, struct iovec *iov,
- struct dma_pinned_list *pinned_list, unsigned char *kdata, size_t len)
-{
- int iov_byte_offset;
- int copy;
- dma_cookie_t dma_cookie = 0;
- int iovec_idx;
- int page_idx;
-
- if (!chan)
- return memcpy_toiovec(iov, kdata, len);
-
- iovec_idx = 0;
- while (iovec_idx < pinned_list->nr_iovecs) {
- struct dma_page_list *page_list;
-
- /* skip already used-up iovecs */
- while (!iov[iovec_idx].iov_len)
- iovec_idx++;
-
- page_list = &pinned_list->page_list[iovec_idx];
-
- iov_byte_offset = ((unsigned long)iov[iovec_idx].iov_base & ~PAGE_MASK);
- page_idx = (((unsigned long)iov[iovec_idx].iov_base & PAGE_MASK)
- - ((unsigned long)page_list->base_address & PAGE_MASK)) >> PAGE_SHIFT;
-
- /* break up copies to not cross page boundary */
- while (iov[iovec_idx].iov_len) {
- copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
- copy = min_t(int, copy, iov[iovec_idx].iov_len);
-
- dma_cookie = dma_async_memcpy_buf_to_pg(chan,
- page_list->pages[page_idx],
- iov_byte_offset,
- kdata,
- copy);
- /* poll for a descriptor slot */
- if (unlikely(dma_cookie < 0)) {
- dma_async_issue_pending(chan);
- continue;
- }
-
- len -= copy;
- iov[iovec_idx].iov_len -= copy;
- iov[iovec_idx].iov_base += copy;
-
- if (!len)
- return dma_cookie;
-
- kdata += copy;
- iov_byte_offset = 0;
- page_idx++;
- }
- iovec_idx++;
- }
-
- /* really bad if we ever run out of iovecs */
- BUG();
- return -EFAULT;
-}
-
-dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
- struct dma_pinned_list *pinned_list, struct page *page,
- unsigned int offset, size_t len)
-{
- int iov_byte_offset;
- int copy;
- dma_cookie_t dma_cookie = 0;
- int iovec_idx;
- int page_idx;
- int err;
-
- /* this needs as-yet-unimplemented buf-to-buff, so punt. */
- /* TODO: use dma for this */
- if (!chan || !pinned_list) {
- u8 *vaddr = kmap(page);
- err = memcpy_toiovec(iov, vaddr + offset, len);
- kunmap(page);
- return err;
- }
-
- iovec_idx = 0;
- while (iovec_idx < pinned_list->nr_iovecs) {
- struct dma_page_list *page_list;
-
- /* skip already used-up iovecs */
- while (!iov[iovec_idx].iov_len)
- iovec_idx++;
-
- page_list = &pinned_list->page_list[iovec_idx];
-
- iov_byte_offset = ((unsigned long)iov[iovec_idx].iov_base & ~PAGE_MASK);
- page_idx = (((unsigned long)iov[iovec_idx].iov_base & PAGE_MASK)
- - ((unsigned long)page_list->base_address & PAGE_MASK)) >> PAGE_SHIFT;
-
- /* break up copies to not cross page boundary */
- while (iov[iovec_idx].iov_len) {
- copy = min_t(int, PAGE_SIZE - iov_byte_offset, len);
- copy = min_t(int, copy, iov[iovec_idx].iov_len);
-
- dma_cookie = dma_async_memcpy_pg_to_pg(chan,
- page_list->pages[page_idx],
- iov_byte_offset,
- page,
- offset,
- copy);
- /* poll for a descriptor slot */
- if (unlikely(dma_cookie < 0)) {
- dma_async_issue_pending(chan);
- continue;
- }
-
- len -= copy;
- iov[iovec_idx].iov_len -= copy;
- iov[iovec_idx].iov_base += copy;
-
- if (!len)
- return dma_cookie;
-
- offset += copy;
- iov_byte_offset = 0;
- page_idx++;
- }
- iovec_idx++;
- }
-
- /* really bad if we ever run out of iovecs */
- BUG();
- return -EFAULT;
-}
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 41cf0c399288..890545871af0 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -875,18 +875,6 @@ static inline void dmaengine_put(void)
}
#endif

-#ifdef CONFIG_NET_DMA
-#define net_dmaengine_get() dmaengine_get()
-#define net_dmaengine_put() dmaengine_put()
-#else
-static inline void net_dmaengine_get(void)
-{
-}
-static inline void net_dmaengine_put(void)
-{
-}
-#endif
-
#ifdef CONFIG_ASYNC_TX_DMA
#define async_dmaengine_get() dmaengine_get()
#define async_dmaengine_put() dmaengine_put()
@@ -908,16 +896,8 @@ async_dma_find_channel(enum dma_transaction_type type)
return NULL;
}
#endif /* CONFIG_ASYNC_TX_DMA */
-
-dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
- void *dest, void *src, size_t len);
-dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
- struct page *page, unsigned int offset, void *kdata, size_t len);
-dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
- struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
- unsigned int src_off, size_t len);
void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
- struct dma_chan *chan);
+ struct dma_chan *chan);

static inline void async_tx_ack(struct dma_async_tx_descriptor *tx)
{
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bec1cc7d5e3c..ac4f84dfa84b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -28,7 +28,6 @@
#include <linux/textsearch.h>
#include <net/checksum.h>
#include <linux/rcupdate.h>
-#include <linux/dmaengine.h>
#include <linux/hrtimer.h>
#include <linux/dma-mapping.h>
#include <linux/netdev_features.h>
@@ -496,11 +495,8 @@ struct sk_buff {
/* 6/8 bit hole (depending on ndisc_nodetype presence) */
kmemcheck_bitfield_end(flags2);

-#if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL
- union {
- unsigned int napi_id;
- dma_cookie_t dma_cookie;
- };
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ unsigned int napi_id;
#endif
#ifdef CONFIG_NETWORK_SECMARK
__u32 secmark;
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index d68633452d9b..26f16021ce1d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -19,7 +19,6 @@


#include <linux/skbuff.h>
-#include <linux/dmaengine.h>
#include <net/sock.h>
#include <net/inet_connection_sock.h>
#include <net/inet_timewait_sock.h>
@@ -169,13 +168,6 @@ struct tcp_sock {
struct iovec *iov;
int memory;
int len;
-#ifdef CONFIG_NET_DMA
- /* members for async copy */
- struct dma_chan *dma_chan;
- int wakeup;
- struct dma_pinned_list *pinned_list;
- dma_cookie_t dma_cookie;
-#endif
} ucopy;

u32 snd_wl1; /* Sequence for window update */
diff --git a/include/net/netdma.h b/include/net/netdma.h
deleted file mode 100644
index 8ba8ce284eeb..000000000000
--- a/include/net/netdma.h
+++ /dev/null
@@ -1,32 +0,0 @@
-/*
- * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * The full GNU General Public License is included in this distribution in the
- * file called COPYING.
- */
-#ifndef NETDMA_H
-#define NETDMA_H
-#ifdef CONFIG_NET_DMA
-#include <linux/dmaengine.h>
-#include <linux/skbuff.h>
-
-int dma_skb_copy_datagram_iovec(struct dma_chan* chan,
- struct sk_buff *skb, int offset, struct iovec *to,
- size_t len, struct dma_pinned_list *pinned_list);
-
-#endif /* CONFIG_NET_DMA */
-#endif /* NETDMA_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index e3a18ff0c38b..9d5f716e921e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -231,7 +231,6 @@ struct cg_proto;
* @sk_receive_queue: incoming packets
* @sk_wmem_alloc: transmit queue bytes committed
* @sk_write_queue: Packet sending queue
- * @sk_async_wait_queue: DMA copied packets
* @sk_omem_alloc: "o" is "option" or "other"
* @sk_wmem_queued: persistent queue size
* @sk_forward_alloc: space allocated forward
@@ -354,10 +353,6 @@ struct sock {
struct sk_filter __rcu *sk_filter;
struct socket_wq __rcu *sk_wq;

-#ifdef CONFIG_NET_DMA
- struct sk_buff_head sk_async_wait_queue;
-#endif
-
#ifdef CONFIG_XFRM
struct xfrm_policy *sk_policy[2];
#endif
@@ -2200,27 +2195,15 @@ void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags);
* sk_eat_skb - Release a skb if it is no longer needed
* @sk: socket to eat this skb from
* @skb: socket buffer to eat
- * @copied_early: flag indicating whether DMA operations copied this data early
*
* This routine must be called with interrupts disabled or with the socket
* locked so that the sk_buff queue operation is ok.
*/
-#ifdef CONFIG_NET_DMA
-static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, bool copied_early)
-{
- __skb_unlink(skb, &sk->sk_receive_queue);
- if (!copied_early)
- __kfree_skb(skb);
- else
- __skb_queue_tail(&sk->sk_async_wait_queue, skb);
-}
-#else
-static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb, bool copied_early)
+static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb)
{
__skb_unlink(skb, &sk->sk_receive_queue);
__kfree_skb(skb);
}
-#endif

static inline
struct net *sock_net(const struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70e55d200610..084c163e9d40 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -27,7 +27,6 @@
#include <linux/cache.h>
#include <linux/percpu.h>
#include <linux/skbuff.h>
-#include <linux/dmaengine.h>
#include <linux/crypto.h>
#include <linux/cryptohash.h>
#include <linux/kref.h>
@@ -267,7 +266,6 @@ extern int sysctl_tcp_adv_win_scale;
extern int sysctl_tcp_tw_reuse;
extern int sysctl_tcp_frto;
extern int sysctl_tcp_low_latency;
-extern int sysctl_tcp_dma_copybreak;
extern int sysctl_tcp_nometrics_save;
extern int sysctl_tcp_moderate_rcvbuf;
extern int sysctl_tcp_tso_win_divisor;
@@ -1032,12 +1030,6 @@ static inline void tcp_prequeue_init(struct tcp_sock *tp)
tp->ucopy.len = 0;
tp->ucopy.memory = 0;
skb_queue_head_init(&tp->ucopy.prequeue);
-#ifdef CONFIG_NET_DMA
- tp->ucopy.dma_chan = NULL;
- tp->ucopy.wakeup = 0;
- tp->ucopy.pinned_list = NULL;
- tp->ucopy.dma_cookie = 0;
-#endif
}

bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 653cbbd9e7ad..d457005acedf 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -390,7 +390,6 @@ static const struct bin_table bin_net_ipv4_table[] = {
{ CTL_INT, NET_TCP_MTU_PROBING, "tcp_mtu_probing" },
{ CTL_INT, NET_TCP_BASE_MSS, "tcp_base_mss" },
{ CTL_INT, NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS, "tcp_workaround_signed_windows" },
- { CTL_INT, NET_TCP_DMA_COPYBREAK, "tcp_dma_copybreak" },
{ CTL_INT, NET_TCP_SLOW_START_AFTER_IDLE, "tcp_slow_start_after_idle" },
{ CTL_INT, NET_CIPSOV4_CACHE_ENABLE, "cipso_cache_enable" },
{ CTL_INT, NET_CIPSOV4_CACHE_BUCKET_SIZE, "cipso_cache_bucket_size" },
diff --git a/net/core/Makefile b/net/core/Makefile
index b33b996f5dd6..5f98e5983bd3 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -16,7 +16,6 @@ obj-y += net-sysfs.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
obj-$(CONFIG_NETPOLL) += netpoll.o
-obj-$(CONFIG_NET_DMA) += user_dma.o
obj-$(CONFIG_FIB_RULES) += fib_rules.o
obj-$(CONFIG_TRACEPOINTS) += net-traces.o
obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
diff --git a/net/core/dev.c b/net/core/dev.c
index ba3b7ea5ebb3..677a5a4dcca7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1262,7 +1262,6 @@ static int __dev_open(struct net_device *dev)
clear_bit(__LINK_STATE_START, &dev->state);
else {
dev->flags |= IFF_UP;
- net_dmaengine_get();
dev_set_rx_mode(dev);
dev_activate(dev);
add_device_randomness(dev->dev_addr, dev->addr_len);
@@ -1338,7 +1337,6 @@ static int __dev_close_many(struct list_head *head)
ops->ndo_stop(dev);

dev->flags &= ~IFF_UP;
- net_dmaengine_put();
}

return 0;
@@ -4362,14 +4360,6 @@ static void net_rx_action(struct softirq_action *h)
out:
net_rps_action_and_irq_enable(sd);

-#ifdef CONFIG_NET_DMA
- /*
- * There may not be any more sk_buffs coming right now, so push
- * any pending DMA copies to hardware
- */
- dma_issue_pending_all();
-#endif
-
return;

softnet_break:
diff --git a/net/core/sock.c b/net/core/sock.c
index ab20ed9b0f31..411dab3a5726 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1461,9 +1461,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
atomic_set(&newsk->sk_omem_alloc, 0);
skb_queue_head_init(&newsk->sk_receive_queue);
skb_queue_head_init(&newsk->sk_write_queue);
-#ifdef CONFIG_NET_DMA
- skb_queue_head_init(&newsk->sk_async_wait_queue);
-#endif

spin_lock_init(&newsk->sk_dst_lock);
rwlock_init(&newsk->sk_callback_lock);
@@ -2290,9 +2287,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
skb_queue_head_init(&sk->sk_receive_queue);
skb_queue_head_init(&sk->sk_write_queue);
skb_queue_head_init(&sk->sk_error_queue);
-#ifdef CONFIG_NET_DMA
- skb_queue_head_init(&sk->sk_async_wait_queue);
-#endif

sk->sk_send_head = NULL;

diff --git a/net/core/user_dma.c b/net/core/user_dma.c
deleted file mode 100644
index 1b5fefdb8198..000000000000
--- a/net/core/user_dma.c
+++ /dev/null
@@ -1,131 +0,0 @@
-/*
- * Copyright(c) 2004 - 2006 Intel Corporation. All rights reserved.
- * Portions based on net/core/datagram.c and copyrighted by their authors.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the Free
- * Software Foundation; either version 2 of the License, or (at your option)
- * any later version.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59
- * Temple Place - Suite 330, Boston, MA 02111-1307, USA.
- *
- * The full GNU General Public License is included in this distribution in the
- * file called COPYING.
- */
-
-/*
- * This code allows the net stack to make use of a DMA engine for
- * skb to iovec copies.
- */
-
-#include <linux/dmaengine.h>
-#include <linux/socket.h>
-#include <linux/export.h>
-#include <net/tcp.h>
-#include <net/netdma.h>
-
-#define NET_DMA_DEFAULT_COPYBREAK 4096
-
-int sysctl_tcp_dma_copybreak = NET_DMA_DEFAULT_COPYBREAK;
-EXPORT_SYMBOL(sysctl_tcp_dma_copybreak);
-
-/**
- * dma_skb_copy_datagram_iovec - Copy a datagram to an iovec.
- * @skb - buffer to copy
- * @offset - offset in the buffer to start copying from
- * @iovec - io vector to copy to
- * @len - amount of data to copy from buffer to iovec
- * @pinned_list - locked iovec buffer data
- *
- * Note: the iovec is modified during the copy.
- */
-int dma_skb_copy_datagram_iovec(struct dma_chan *chan,
- struct sk_buff *skb, int offset, struct iovec *to,
- size_t len, struct dma_pinned_list *pinned_list)
-{
- int start = skb_headlen(skb);
- int i, copy = start - offset;
- struct sk_buff *frag_iter;
- dma_cookie_t cookie = 0;
-
- /* Copy header. */
- if (copy > 0) {
- if (copy > len)
- copy = len;
- cookie = dma_memcpy_to_iovec(chan, to, pinned_list,
- skb->data + offset, copy);
- if (cookie < 0)
- goto fault;
- len -= copy;
- if (len == 0)
- goto end;
- offset += copy;
- }
-
- /* Copy paged appendix. Hmm... why does this look so complicated? */
- for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
- int end;
- const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
- WARN_ON(start > offset + len);
-
- end = start + skb_frag_size(frag);
- copy = end - offset;
- if (copy > 0) {
- struct page *page = skb_frag_page(frag);
-
- if (copy > len)
- copy = len;
-
- cookie = dma_memcpy_pg_to_iovec(chan, to, pinned_list, page,
- frag->page_offset + offset - start, copy);
- if (cookie < 0)
- goto fault;
- len -= copy;
- if (len == 0)
- goto end;
- offset += copy;
- }
- start = end;
- }
-
- skb_walk_frags(skb, frag_iter) {
- int end;
-
- WARN_ON(start > offset + len);
-
- end = start + frag_iter->len;
- copy = end - offset;
- if (copy > 0) {
- if (copy > len)
- copy = len;
- cookie = dma_skb_copy_datagram_iovec(chan, frag_iter,
- offset - start,
- to, copy,
- pinned_list);
- if (cookie < 0)
- goto fault;
- len -= copy;
- if (len == 0)
- goto end;
- offset += copy;
- }
- start = end;
- }
-
-end:
- if (!len) {
- skb->dma_cookie = cookie;
- return cookie;
- }
-
-fault:
- return -EFAULT;
-}
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index eb892b4f4814..f9076f295b13 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -848,7 +848,7 @@ int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
default:
dccp_pr_debug("packet_type=%s\n",
dccp_packet_name(dh->dccph_type));
- sk_eat_skb(sk, skb, false);
+ sk_eat_skb(sk, skb);
}
verify_sock_status:
if (sock_flag(sk, SOCK_DONE)) {
@@ -905,7 +905,7 @@ verify_sock_status:
len = skb->len;
found_fin_ok:
if (!(flags & MSG_PEEK))
- sk_eat_skb(sk, skb, false);
+ sk_eat_skb(sk, skb);
break;
} while (1);
out:
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 3d69ec8dac57..79a90b92e12d 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -642,15 +642,6 @@ static struct ctl_table ipv4_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
-#ifdef CONFIG_NET_DMA
- {
- .procname = "tcp_dma_copybreak",
- .data = &sysctl_tcp_dma_copybreak,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec
- },
-#endif
{
.procname = "tcp_slow_start_after_idle",
.data = &sysctl_tcp_slow_start_after_idle,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c4638e6f0238..8dc913dfbaef 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -274,7 +274,6 @@
#include <net/tcp.h>
#include <net/xfrm.h>
#include <net/ip.h>
-#include <net/netdma.h>
#include <net/sock.h>

#include <asm/uaccess.h>
@@ -1409,39 +1408,6 @@ static void tcp_prequeue_process(struct sock *sk)
tp->ucopy.memory = 0;
}

-#ifdef CONFIG_NET_DMA
-static void tcp_service_net_dma(struct sock *sk, bool wait)
-{
- dma_cookie_t done, used;
- dma_cookie_t last_issued;
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (!tp->ucopy.dma_chan)
- return;
-
- last_issued = tp->ucopy.dma_cookie;
- dma_async_issue_pending(tp->ucopy.dma_chan);
-
- do {
- if (dma_async_is_tx_complete(tp->ucopy.dma_chan,
- last_issued, &done,
- &used) == DMA_COMPLETE) {
- /* Safe to free early-copied skbs now */
- __skb_queue_purge(&sk->sk_async_wait_queue);
- break;
- } else {
- struct sk_buff *skb;
- while ((skb = skb_peek(&sk->sk_async_wait_queue)) &&
- (dma_async_is_complete(skb->dma_cookie, done,
- used) == DMA_COMPLETE)) {
- __skb_dequeue(&sk->sk_async_wait_queue);
- kfree_skb(skb);
- }
- }
- } while (wait);
-}
-#endif
-
static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
{
struct sk_buff *skb;
@@ -1459,7 +1425,7 @@ static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
* splitted a fat GRO packet, while we released socket lock
* in skb_splice_bits()
*/
- sk_eat_skb(sk, skb, false);
+ sk_eat_skb(sk, skb);
}
return NULL;
}
@@ -1525,11 +1491,11 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
continue;
}
if (tcp_hdr(skb)->fin) {
- sk_eat_skb(sk, skb, false);
+ sk_eat_skb(sk, skb);
++seq;
break;
}
- sk_eat_skb(sk, skb, false);
+ sk_eat_skb(sk, skb);
if (!desc->count)
break;
tp->copied_seq = seq;
@@ -1567,7 +1533,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
int target; /* Read at least this many bytes */
long timeo;
struct task_struct *user_recv = NULL;
- bool copied_early = false;
struct sk_buff *skb;
u32 urg_hole = 0;

@@ -1610,28 +1575,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,

target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);

-#ifdef CONFIG_NET_DMA
- tp->ucopy.dma_chan = NULL;
- preempt_disable();
- skb = skb_peek_tail(&sk->sk_receive_queue);
- {
- int available = 0;
-
- if (skb)
- available = TCP_SKB_CB(skb)->seq + skb->len - (*seq);
- if ((available < target) &&
- (len > sysctl_tcp_dma_copybreak) && !(flags & MSG_PEEK) &&
- !sysctl_tcp_low_latency &&
- net_dma_find_channel()) {
- preempt_enable_no_resched();
- tp->ucopy.pinned_list =
- dma_pin_iovec_pages(msg->msg_iov, len);
- } else {
- preempt_enable_no_resched();
- }
- }
-#endif
-
do {
u32 offset;

@@ -1762,16 +1705,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
/* __ Set realtime policy in scheduler __ */
}

-#ifdef CONFIG_NET_DMA
- if (tp->ucopy.dma_chan) {
- if (tp->rcv_wnd == 0 &&
- !skb_queue_empty(&sk->sk_async_wait_queue)) {
- tcp_service_net_dma(sk, true);
- tcp_cleanup_rbuf(sk, copied);
- } else
- dma_async_issue_pending(tp->ucopy.dma_chan);
- }
-#endif
if (copied >= target) {
/* Do not sleep, just process backlog. */
release_sock(sk);
@@ -1779,11 +1712,6 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
} else
sk_wait_data(sk, &timeo);

-#ifdef CONFIG_NET_DMA
- tcp_service_net_dma(sk, false); /* Don't block */
- tp->ucopy.wakeup = 0;
-#endif
-
if (user_recv) {
int chunk;

@@ -1841,43 +1769,13 @@ do_prequeue:
}

if (!(flags & MSG_TRUNC)) {
-#ifdef CONFIG_NET_DMA
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
-
- if (tp->ucopy.dma_chan) {
- tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
- tp->ucopy.dma_chan, skb, offset,
- msg->msg_iov, used,
- tp->ucopy.pinned_list);
-
- if (tp->ucopy.dma_cookie < 0) {
-
- pr_alert("%s: dma_cookie < 0\n",
- __func__);
-
- /* Exception. Bailout! */
- if (!copied)
- copied = -EFAULT;
- break;
- }
-
- dma_async_issue_pending(tp->ucopy.dma_chan);
-
- if ((offset + used) == skb->len)
- copied_early = true;
-
- } else
-#endif
- {
- err = skb_copy_datagram_iovec(skb, offset,
- msg->msg_iov, used);
- if (err) {
- /* Exception. Bailout! */
- if (!copied)
- copied = -EFAULT;
- break;
- }
+ err = skb_copy_datagram_iovec(skb, offset,
+ msg->msg_iov, used);
+ if (err) {
+ /* Exception. Bailout! */
+ if (!copied)
+ copied = -EFAULT;
+ break;
}
}

@@ -1897,19 +1795,15 @@ skip_copy:

if (tcp_hdr(skb)->fin)
goto found_fin_ok;
- if (!(flags & MSG_PEEK)) {
- sk_eat_skb(sk, skb, copied_early);
- copied_early = false;
- }
+ if (!(flags & MSG_PEEK))
+ sk_eat_skb(sk, skb);
continue;

found_fin_ok:
/* Process the FIN. */
++*seq;
- if (!(flags & MSG_PEEK)) {
- sk_eat_skb(sk, skb, copied_early);
- copied_early = false;
- }
+ if (!(flags & MSG_PEEK))
+ sk_eat_skb(sk, skb);
break;
} while (len > 0);

@@ -1932,16 +1826,6 @@ skip_copy:
tp->ucopy.len = 0;
}

-#ifdef CONFIG_NET_DMA
- tcp_service_net_dma(sk, true); /* Wait for queue to drain */
- tp->ucopy.dma_chan = NULL;
-
- if (tp->ucopy.pinned_list) {
- dma_unpin_iovec_pages(tp->ucopy.pinned_list);
- tp->ucopy.pinned_list = NULL;
- }
-#endif
-
/* According to UNIX98, msg_name/msg_namelen are ignored
* on connected socket. I was just happy when found this 8) --ANK
*/
@@ -2285,9 +2169,6 @@ int tcp_disconnect(struct sock *sk, int flags)
__skb_queue_purge(&sk->sk_receive_queue);
tcp_write_queue_purge(sk);
__skb_queue_purge(&tp->out_of_order_queue);
-#ifdef CONFIG_NET_DMA
- __skb_queue_purge(&sk->sk_async_wait_queue);
-#endif

inet->inet_dport = 0;

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c53b7f35c51d..33ef18e550c5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -73,7 +73,6 @@
#include <net/inet_common.h>
#include <linux/ipsec.h>
#include <asm/unaligned.h>
-#include <net/netdma.h>

int sysctl_tcp_timestamps __read_mostly = 1;
int sysctl_tcp_window_scaling __read_mostly = 1;
@@ -4967,53 +4966,6 @@ static inline bool tcp_checksum_complete_user(struct sock *sk,
__tcp_checksum_complete_user(sk, skb);
}

-#ifdef CONFIG_NET_DMA
-static bool tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb,
- int hlen)
-{
- struct tcp_sock *tp = tcp_sk(sk);
- int chunk = skb->len - hlen;
- int dma_cookie;
- bool copied_early = false;
-
- if (tp->ucopy.wakeup)
- return false;
-
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
-
- if (tp->ucopy.dma_chan && skb_csum_unnecessary(skb)) {
-
- dma_cookie = dma_skb_copy_datagram_iovec(tp->ucopy.dma_chan,
- skb, hlen,
- tp->ucopy.iov, chunk,
- tp->ucopy.pinned_list);
-
- if (dma_cookie < 0)
- goto out;
-
- tp->ucopy.dma_cookie = dma_cookie;
- copied_early = true;
-
- tp->ucopy.len -= chunk;
- tp->copied_seq += chunk;
- tcp_rcv_space_adjust(sk);
-
- if ((tp->ucopy.len == 0) ||
- (tcp_flag_word(tcp_hdr(skb)) & TCP_FLAG_PSH) ||
- (atomic_read(&sk->sk_rmem_alloc) > (sk->sk_rcvbuf >> 1))) {
- tp->ucopy.wakeup = 1;
- sk->sk_data_ready(sk, 0);
- }
- } else if (chunk > 0) {
- tp->ucopy.wakeup = 1;
- sk->sk_data_ready(sk, 0);
- }
-out:
- return copied_early;
-}
-#endif /* CONFIG_NET_DMA */
-
/* Does PAWS and seqno based validation of an incoming segment, flags will
* play significant role here.
*/
@@ -5198,14 +5150,6 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,

if (tp->copied_seq == tp->rcv_nxt &&
len - tcp_header_len <= tp->ucopy.len) {
-#ifdef CONFIG_NET_DMA
- if (tp->ucopy.task == current &&
- sock_owned_by_user(sk) &&
- tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
- copied_early = 1;
- eaten = 1;
- }
-#endif
if (tp->ucopy.task == current &&
sock_owned_by_user(sk) && !copied_early) {
__set_current_state(TASK_RUNNING);
@@ -5271,11 +5215,6 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
__tcp_ack_snd_check(sk, 0);
no_ack:
-#ifdef CONFIG_NET_DMA
- if (copied_early)
- __skb_queue_tail(&sk->sk_async_wait_queue, skb);
- else
-#endif
if (eaten)
kfree_skb_partial(skb, fragstolen);
sk->sk_data_ready(sk, 0);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 59a6f8b90cd9..dc92ba9d0350 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -72,7 +72,6 @@
#include <net/inet_common.h>
#include <net/timewait_sock.h>
#include <net/xfrm.h>
-#include <net/netdma.h>
#include <net/secure_seq.h>
#include <net/tcp_memcontrol.h>
#include <net/busy_poll.h>
@@ -2000,18 +1999,8 @@ process:
bh_lock_sock_nested(sk);
ret = 0;
if (!sock_owned_by_user(sk)) {
-#ifdef CONFIG_NET_DMA
- struct tcp_sock *tp = tcp_sk(sk);
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
- if (tp->ucopy.dma_chan)
+ if (!tcp_prequeue(sk, skb))
ret = tcp_v4_do_rcv(sk, skb);
- else
-#endif
- {
- if (!tcp_prequeue(sk, skb))
- ret = tcp_v4_do_rcv(sk, skb);
- }
} else if (unlikely(sk_add_backlog(sk, skb,
sk->sk_rcvbuf + sk->sk_sndbuf))) {
bh_unlock_sock(sk);
@@ -2170,11 +2159,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
}
#endif

-#ifdef CONFIG_NET_DMA
- /* Cleans up our sk_async_wait_queue */
- __skb_queue_purge(&sk->sk_async_wait_queue);
-#endif
-
/* Clean prequeue, it must be empty really */
__skb_queue_purge(&tp->ucopy.prequeue);

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0740f93a114a..e27972590379 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -59,7 +59,6 @@
#include <net/snmp.h>
#include <net/dsfield.h>
#include <net/timewait_sock.h>
-#include <net/netdma.h>
#include <net/inet_common.h>
#include <net/secure_seq.h>
#include <net/tcp_memcontrol.h>
@@ -1504,18 +1503,8 @@ process:
bh_lock_sock_nested(sk);
ret = 0;
if (!sock_owned_by_user(sk)) {
-#ifdef CONFIG_NET_DMA
- struct tcp_sock *tp = tcp_sk(sk);
- if (!tp->ucopy.dma_chan && tp->ucopy.pinned_list)
- tp->ucopy.dma_chan = net_dma_find_channel();
- if (tp->ucopy.dma_chan)
+ if (!tcp_prequeue(sk, skb))
ret = tcp_v6_do_rcv(sk, skb);
- else
-#endif
- {
- if (!tcp_prequeue(sk, skb))
- ret = tcp_v6_do_rcv(sk, skb);
- }
} else if (unlikely(sk_add_backlog(sk, skb,
sk->sk_rcvbuf + sk->sk_sndbuf))) {
bh_unlock_sock(sk);
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 7b01b9f5846c..e1b46709f8d6 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -838,7 +838,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,

if (!(flags & MSG_PEEK)) {
spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
- sk_eat_skb(sk, skb, false);
+ sk_eat_skb(sk, skb);
spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
*seq = 0;
}
@@ -860,10 +860,10 @@ copy_uaddr:
llc_cmsg_rcv(msg, skb);

if (!(flags & MSG_PEEK)) {
- spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
- sk_eat_skb(sk, skb, false);
- spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
- *seq = 0;
+ spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
+ sk_eat_skb(sk, skb);
+ spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
+ *seq = 0;
}

goto out;

2014-01-09 20:17:15

by Dan Williams

[permalink] [raw]
Subject: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private

net_dma was the only external user so this can become local to tcp.c
again.

Cc: James Morris <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
New in v2
include/net/tcp.h | 1 -
net/ipv4/tcp.c | 10 +++++-----
2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 084c163e9d40..571036b3bead 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -370,7 +370,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
const struct tcphdr *th, unsigned int len);
void tcp_rcv_space_adjust(struct sock *sk);
-void tcp_cleanup_rbuf(struct sock *sk, int copied);
int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
void tcp_twsk_destructor(struct sock *sk);
ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8dc913dfbaef..70c905a7d2b8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1332,7 +1332,7 @@ static int tcp_peek_sndq(struct sock *sk, struct msghdr *msg, int len)
* calculation of whether or not we must ACK for the sake of
* a window update.
*/
-void tcp_cleanup_rbuf(struct sock *sk, int copied)
+static void cleanup_rbuf(struct sock *sk, int copied)
{
struct tcp_sock *tp = tcp_sk(sk);
bool time_to_ack = false;
@@ -1507,7 +1507,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
/* Clean up data we have read: This will do ACK frames. */
if (copied > 0) {
tcp_recv_skb(sk, seq, &offset);
- tcp_cleanup_rbuf(sk, copied);
+ cleanup_rbuf(sk, copied);
}
return copied;
}
@@ -1658,7 +1658,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
}
}

- tcp_cleanup_rbuf(sk, copied);
+ cleanup_rbuf(sk, copied);

if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
/* Install new reader */
@@ -1831,7 +1831,7 @@ skip_copy:
*/

/* Clean up data we have read: This will do ACK frames. */
- tcp_cleanup_rbuf(sk, copied);
+ cleanup_rbuf(sk, copied);

release_sock(sk);
return copied;
@@ -2494,7 +2494,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT) &&
inet_csk_ack_scheduled(sk)) {
icsk->icsk_ack.pending |= ICSK_ACK_PUSHED;
- tcp_cleanup_rbuf(sk, 1);
+ cleanup_rbuf(sk, 1);
if (!(val & 1))
icsk->icsk_ack.pingpong = 1;
}

2014-01-09 20:17:37

by Dan Williams

[permalink] [raw]
Subject: [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle()

Record actively mapped pages and provide an api for asserting a given
page is dma inactive before execution proceeds. Placing
debug_dma_assert_idle() in cow_user_page() flagged the violation of the
dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma:
mark broken").

Cc: Joerg Roedel <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Russell King <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
---
Added CONFIG_DMA_VS_CPU_DEBUG

include/linux/dma-debug.h | 6 ++
lib/Kconfig.debug | 13 ++++-
lib/dma-debug.c | 130 +++++++++++++++++++++++++++++++++++++++++----
mm/memory.c | 3 +
4 files changed, 138 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index fc0e34ce038f..27a029fb2ead 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -185,4 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)

#endif /* CONFIG_DMA_API_DEBUG */

+#ifdef CONFIG_DMA_VS_CPU_DEBUG
+extern void debug_dma_assert_idle(struct page *page);
+#else
+static inline void debug_dma_assert_idle(struct page *page) { }
+#endif /* CONFIG_DMA_VS_CPU_DEBUG */
+
#endif /* __DMA_DEBUG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index db25707aa41b..99751c47fa87 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1575,9 +1575,20 @@ config DMA_API_DEBUG
With this option you will be able to detect common bugs in device
drivers like double-freeing of DMA mappings or freeing mappings that
were never allocated.
- This option causes a performance degredation. Use only if you want
+ This option causes a performance degradation. Use only if you want
to debug device drivers. If unsure, say N.

+config DMA_VS_CPU_DEBUG
+ bool "Enable debugging CPU vs DMA ownership of pages"
+ depends on DMA_API_DEBUG
+ help
+ Enable this option to attempt to catch cases where a page owned by
+ DMA is being touched by the cpu in a way that could cause data
+ corruption. For example, this enables cow_user_page() to check
+ that the source page is not undergoing DMA. This option
+ further increases the overhead of DMA_API_DEBUG. Use only if
+ debugging device drivers. If unsure, say N.
+
source "samples/Kconfig"

source "lib/Kconfig.kgdb"
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index d87a17a819d0..f67ae111cd2f 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -57,7 +57,8 @@ struct dma_debug_entry {
struct list_head list;
struct device *dev;
int type;
- phys_addr_t paddr;
+ unsigned long pfn;
+ size_t offset;
u64 dev_addr;
u64 size;
int direction;
@@ -372,6 +373,11 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
list_del(&entry->list);
}

+static unsigned long long phys_addr(struct dma_debug_entry *entry)
+{
+ return page_to_phys(pfn_to_page(entry->pfn)) + entry->offset;
+}
+
/*
* Dump mapping entries for debugging purposes
*/
@@ -389,9 +395,9 @@ void debug_dma_dump_mappings(struct device *dev)
list_for_each_entry(entry, &bucket->list, list) {
if (!dev || dev == entry->dev) {
dev_info(entry->dev,
- "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
+ "%s idx %d P=%Lx N=%lx D=%Lx L=%Lx %s %s\n",
type2name[entry->type], idx,
- (unsigned long long)entry->paddr,
+ phys_addr(entry), entry->pfn,
entry->dev_addr, entry->size,
dir2name[entry->direction],
maperr2str[entry->map_err_type]);
@@ -403,6 +409,83 @@ void debug_dma_dump_mappings(struct device *dev)
}
EXPORT_SYMBOL(debug_dma_dump_mappings);

+/* memory usage is constrained by the maximum number of available
+ * dma-debug entries
+ */
+static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
+static DEFINE_SPINLOCK(radix_lock);
+
+static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
+{
+ unsigned long pfn = entry->pfn;
+ int i;
+
+ for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
+ if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
+ radix_tree_tag_set(&dma_active_pfn, pfn, i);
+ return;
+ }
+ pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
+ RADIX_TREE_MAX_TAGS, pfn);
+}
+
+static void __active_pfn_dec_overlap(struct dma_debug_entry *entry)
+{
+ unsigned long pfn = entry->pfn;
+ int i;
+
+ for (i = RADIX_TREE_MAX_TAGS - 1; i >= 0; i--)
+ if (radix_tree_tag_get(&dma_active_pfn, pfn, i)) {
+ radix_tree_tag_clear(&dma_active_pfn, pfn, i);
+ return;
+ }
+ radix_tree_delete(&dma_active_pfn, pfn);
+}
+
+static int active_pfn_insert(struct dma_debug_entry *entry)
+{
+ unsigned long flags;
+ int rc;
+
+ spin_lock_irqsave(&radix_lock, flags);
+ rc = radix_tree_insert(&dma_active_pfn, entry->pfn, entry);
+ if (rc == -EEXIST)
+ __active_pfn_inc_overlap(entry);
+ spin_unlock_irqrestore(&radix_lock, flags);
+
+ return rc;
+}
+
+static void active_pfn_remove(struct dma_debug_entry *entry)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&radix_lock, flags);
+ __active_pfn_dec_overlap(entry);
+ spin_unlock_irqrestore(&radix_lock, flags);
+}
+
+void debug_dma_assert_idle(struct page *page)
+{
+ unsigned long flags;
+ struct dma_debug_entry *entry;
+
+ if (!page)
+ return;
+
+ spin_lock_irqsave(&radix_lock, flags);
+ entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
+ spin_unlock_irqrestore(&radix_lock, flags);
+
+ if (!entry)
+ return;
+
+ err_printk(entry->dev, entry,
+ "DMA-API: cpu touching an active dma mapped page "
+ "[pfn=0x%lx]\n", entry->pfn);
+}
+EXPORT_SYMBOL_GPL(debug_dma_assert_idle);
+
/*
* Wrapper function for adding an entry to the hash.
* This function takes care of locking itself.
@@ -411,10 +494,22 @@ static void add_dma_entry(struct dma_debug_entry *entry)
{
struct hash_bucket *bucket;
unsigned long flags;
+ int rc;

bucket = get_hash_bucket(entry, &flags);
hash_bucket_add(bucket, entry);
put_hash_bucket(bucket, &flags);
+
+ rc = active_pfn_insert(entry);
+ if (rc == -ENOMEM) {
+ pr_err("DMA-API: pfn tracking out of memory - "
+ "disabling dma-debug\n");
+ global_disable = true;
+ }
+
+ /* TODO: report -EEXIST errors as overlapping mappings are not
+ * supported by the DMA API
+ */
}

static struct dma_debug_entry *__dma_entry_alloc(void)
@@ -469,6 +564,8 @@ static void dma_entry_free(struct dma_debug_entry *entry)
{
unsigned long flags;

+ active_pfn_remove(entry);
+
/*
* add to beginning of the list - this way the entries are
* more likely cache hot when they are reallocated.
@@ -895,15 +992,15 @@ static void check_unmap(struct dma_debug_entry *ref)
ref->dev_addr, ref->size,
type2name[entry->type], type2name[ref->type]);
} else if ((entry->type == dma_debug_coherent) &&
- (ref->paddr != entry->paddr)) {
+ (phys_addr(ref) != phys_addr(entry))) {
err_printk(ref->dev, entry, "DMA-API: device driver frees "
"DMA memory with different CPU address "
"[device address=0x%016llx] [size=%llu bytes] "
"[cpu alloc address=0x%016llx] "
"[cpu free address=0x%016llx]",
ref->dev_addr, ref->size,
- (unsigned long long)entry->paddr,
- (unsigned long long)ref->paddr);
+ phys_addr(entry),
+ phys_addr(ref));
}

if (ref->sg_call_ents && ref->type == dma_debug_sg &&
@@ -1052,7 +1149,8 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,

entry->dev = dev;
entry->type = dma_debug_page;
- entry->paddr = page_to_phys(page) + offset;
+ entry->pfn = page_to_pfn(page);
+ entry->offset = offset,
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
@@ -1148,7 +1246,8 @@ void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,

entry->type = dma_debug_sg;
entry->dev = dev;
- entry->paddr = sg_phys(s);
+ entry->pfn = page_to_pfn(sg_page(s));
+ entry->offset = s->offset,
entry->size = sg_dma_len(s);
entry->dev_addr = sg_dma_address(s);
entry->direction = direction;
@@ -1198,7 +1297,8 @@ void debug_dma_unmap_sg(struct device *dev, struct scatterlist *sglist,
struct dma_debug_entry ref = {
.type = dma_debug_sg,
.dev = dev,
- .paddr = sg_phys(s),
+ .pfn = page_to_pfn(sg_page(s)),
+ .offset = s->offset,
.dev_addr = sg_dma_address(s),
.size = sg_dma_len(s),
.direction = dir,
@@ -1233,7 +1333,8 @@ void debug_dma_alloc_coherent(struct device *dev, size_t size,

entry->type = dma_debug_coherent;
entry->dev = dev;
- entry->paddr = virt_to_phys(virt);
+ entry->pfn = page_to_pfn(virt_to_page(virt));
+ entry->offset = (size_t) virt & PAGE_MASK;
entry->size = size;
entry->dev_addr = dma_addr;
entry->direction = DMA_BIDIRECTIONAL;
@@ -1248,7 +1349,8 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
struct dma_debug_entry ref = {
.type = dma_debug_coherent,
.dev = dev,
- .paddr = virt_to_phys(virt),
+ .pfn = page_to_pfn(virt_to_page(virt)),
+ .offset = (size_t) virt & PAGE_MASK,
.dev_addr = addr,
.size = size,
.direction = DMA_BIDIRECTIONAL,
@@ -1356,7 +1458,8 @@ void debug_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
struct dma_debug_entry ref = {
.type = dma_debug_sg,
.dev = dev,
- .paddr = sg_phys(s),
+ .pfn = page_to_pfn(sg_page(s)),
+ .offset = s->offset,
.dev_addr = sg_dma_address(s),
.size = sg_dma_len(s),
.direction = direction,
@@ -1388,7 +1491,8 @@ void debug_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
struct dma_debug_entry ref = {
.type = dma_debug_sg,
.dev = dev,
- .paddr = sg_phys(s),
+ .pfn = page_to_pfn(sg_page(s)),
+ .offset = s->offset,
.dev_addr = sg_dma_address(s),
.size = sg_dma_len(s),
.direction = direction,
diff --git a/mm/memory.c b/mm/memory.c
index 5d9025f3b3e1..c89788436f81 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -59,6 +59,7 @@
#include <linux/gfp.h>
#include <linux/migrate.h>
#include <linux/string.h>
+#include <linux/dma-debug.h>

#include <asm/io.h>
#include <asm/pgalloc.h>
@@ -2559,6 +2560,8 @@ static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,

static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
{
+ debug_dma_assert_idle(src);
+
/*
* If the source page was a PFN mapping, we don't have
* a "struct page" for it. We do a best-effort copy by

2014-01-09 20:26:26

by Neal Cardwell

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private

On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <[email protected]> wrote:
> net_dma was the only external user so this can become local to tcp.c
> again.
...
> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
> +static void cleanup_rbuf(struct sock *sk, int copied)

I would vote to keep the tcp_ prefix. In the TCP code base that is the
more common idiom, even for internal/static TCP functions, and
personally I find it easier to read and work with in stack traces,
etc. My 2 cents.

neal

2014-01-09 20:33:41

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private

On Thu, Jan 9, 2014 at 12:26 PM, Neal Cardwell <[email protected]> wrote:
> On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <[email protected]> wrote:
>> net_dma was the only external user so this can become local to tcp.c
>> again.
> ...
>> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
>> +static void cleanup_rbuf(struct sock *sk, int copied)
>
> I would vote to keep the tcp_ prefix. In the TCP code base that is the
> more common idiom, even for internal/static TCP functions, and
> personally I find it easier to read and work with in stack traces,
> etc. My 2 cents.
>

Ok. It was cleanup_rbuf() in a former life, but one vote for leaving
the name as is is enough for me.

--
Dan

2014-01-09 20:42:47

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private

From: Dan Williams <[email protected]>
Date: Thu, 9 Jan 2014 12:33:29 -0800

> On Thu, Jan 9, 2014 at 12:26 PM, Neal Cardwell <[email protected]> wrote:
>> On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <[email protected]> wrote:
>>> net_dma was the only external user so this can become local to tcp.c
>>> again.
>> ...
>>> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
>>> +static void cleanup_rbuf(struct sock *sk, int copied)
>>
>> I would vote to keep the tcp_ prefix. In the TCP code base that is the
>> more common idiom, even for internal/static TCP functions, and
>> personally I find it easier to read and work with in stack traces,
>> etc. My 2 cents.
>>
>
> Ok. It was cleanup_rbuf() in a former life, but one vote for leaving
> the name as is is enough for me.

You can make that two votes :)

2014-01-10 00:38:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle()

On Thu, 09 Jan 2014 12:17:26 -0800 Dan Williams <[email protected]> wrote:

> Record actively mapped pages and provide an api for asserting a given
> page is dma inactive before execution proceeds. Placing
> debug_dma_assert_idle() in cow_user_page() flagged the violation of the
> dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma:
> mark broken").
>
> --- a/include/linux/dma-debug.h
> +++ b/include/linux/dma-debug.h
> @@ -185,4 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
>
> #endif /* CONFIG_DMA_API_DEBUG */
>
> +#ifdef CONFIG_DMA_VS_CPU_DEBUG
> +extern void debug_dma_assert_idle(struct page *page);
> +#else
> +static inline void debug_dma_assert_idle(struct page *page) { }

Surely this breaks the build when CONFIG_DMA_VS_CPU_DEBUG=n?
lib/dma-debug.c is missing the necessary "#ifdef
CONFIG_DMA_VS_CPU_DEBUG"s.

Do we really need this config setting anyway? What goes bad if we
permanently enable this subfeature when dma debugging is enabled?

>
> ...
>
> index d87a17a819d0..f67ae111cd2f 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -57,7 +57,8 @@ struct dma_debug_entry {
> struct list_head list;
> struct device *dev;
> int type;
> - phys_addr_t paddr;
> + unsigned long pfn;
> + size_t offset;

Some documentation for the fields would be nice. offset of what
relative to what, in what units?

> u64 dev_addr;
> u64 size;
> int direction;
> @@ -372,6 +373,11 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
> list_del(&entry->list);
> }
>
>
> ...
>
>
> +/* memory usage is constrained by the maximum number of available
> + * dma-debug entries
> + */

A brief design overview would be useful. What goes in tree, how is it
indexed, when and why do we add/remove/test items, etc.

> +static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
> +static DEFINE_SPINLOCK(radix_lock);
> +
> +static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
> +{
> + unsigned long pfn = entry->pfn;
> + int i;
> +
> + for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
> + if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
> + radix_tree_tag_set(&dma_active_pfn, pfn, i);
> + return;
> + }
> + pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
> + RADIX_TREE_MAX_TAGS, pfn);
> +}
> +
>
> ...
>
> +void debug_dma_assert_idle(struct page *page)
> +{
> + unsigned long flags;
> + struct dma_debug_entry *entry;
> +
> + if (!page)
> + return;
> +
> + spin_lock_irqsave(&radix_lock, flags);
> + entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
> + spin_unlock_irqrestore(&radix_lock, flags);
> +
> + if (!entry)
> + return;
> +
> + err_printk(entry->dev, entry,
> + "DMA-API: cpu touching an active dma mapped page "
> + "[pfn=0x%lx]\n", entry->pfn);
> +}
> +EXPORT_SYMBOL_GPL(debug_dma_assert_idle);

The export isn't needed for mm/memory.c

>
> ...
>

2014-01-10 10:40:37

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 3/4] net: make tcp_cleanup_rbuf private

From: David Miller
...
> > On Thu, Jan 9, 2014 at 12:26 PM, Neal Cardwell <[email protected]> wrote:
> >> On Thu, Jan 9, 2014 at 3:16 PM, Dan Williams <[email protected]> wrote:
> >>> net_dma was the only external user so this can become local to tcp.c
> >>> again.
> >> ...
> >>> -void tcp_cleanup_rbuf(struct sock *sk, int copied)
> >>> +static void cleanup_rbuf(struct sock *sk, int copied)
> >>
> >> I would vote to keep the tcp_ prefix. In the TCP code base that is the
> >> more common idiom, even for internal/static TCP functions, and
> >> personally I find it easier to read and work with in stack traces,
> >> etc. My 2 cents.
> >>
> >
> > Ok. It was cleanup_rbuf() in a former life, but one vote for leaving
> > the name as is is enough for me.
>
> You can make that two votes :)

I suspect DM adds 2000 votes :-)

Keeping the prefix makes it easier to grep for, and makes it more
obvious that it isn't a generic function (when being called).

David


2014-01-11 02:35:19

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma debug: introduce debug_dma_assert_idle()

On Thu, Jan 9, 2014 at 4:38 PM, Andrew Morton <[email protected]> wrote:
> On Thu, 09 Jan 2014 12:17:26 -0800 Dan Williams <[email protected]> wrote:
>
>> Record actively mapped pages and provide an api for asserting a given
>> page is dma inactive before execution proceeds. Placing
>> debug_dma_assert_idle() in cow_user_page() flagged the violation of the
>> dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma:
>> mark broken").
>>
>> --- a/include/linux/dma-debug.h
>> +++ b/include/linux/dma-debug.h
>> @@ -185,4 +185,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
>>
>> #endif /* CONFIG_DMA_API_DEBUG */
>>
>> +#ifdef CONFIG_DMA_VS_CPU_DEBUG
>> +extern void debug_dma_assert_idle(struct page *page);
>> +#else
>> +static inline void debug_dma_assert_idle(struct page *page) { }
>
> Surely this breaks the build when CONFIG_DMA_VS_CPU_DEBUG=n?
> lib/dma-debug.c is missing the necessary "#ifdef
> CONFIG_DMA_VS_CPU_DEBUG"s.

facepalm

> Do we really need this config setting anyway? What goes bad if we
> permanently enable this subfeature when dma debugging is enabled?

I did want to provide notification/description of this extra check,
but I'll go ahead and fold it into the DMA_API_DEBUG description.

The only thing that potentially goes bad is no longer having hard
expectation of memory consumption. Before the patch it's a simple
sizeof(struct dma_debug_entry) * PREALLOC_DMA_DEBUG_ENTRIES, after the
patch it's variable size of the radix tree based on sparseness and
variable based on the number of pages included in each dma_map_sg
call. The testing with NET_DMA did not involve dma_map_sg calls

>> ...
>>
>> index d87a17a819d0..f67ae111cd2f 100644
>> --- a/lib/dma-debug.c
>> +++ b/lib/dma-debug.c
>> @@ -57,7 +57,8 @@ struct dma_debug_entry {
>> struct list_head list;
>> struct device *dev;
>> int type;
>> - phys_addr_t paddr;
>> + unsigned long pfn;
>> + size_t offset;
>
> Some documentation for the fields would be nice. offset of what
> relative to what, in what units?

This is the same 'offset' passed to dma_map_page(), will document.

>
>> u64 dev_addr;
>> u64 size;
>> int direction;
>> @@ -372,6 +373,11 @@ static void hash_bucket_del(struct dma_debug_entry *entry)
>> list_del(&entry->list);
>> }
>>
>>
>> ...
>>
>>
>> +/* memory usage is constrained by the maximum number of available
>> + * dma-debug entries
>> + */
>
> A brief design overview would be useful. What goes in tree, how is it
> indexed, when and why do we add/remove/test items, etc.
>

...added this documentation to dma_active_pfn for the next revision.

/*
* For each page mapped (initial page in the case of
* dma_alloc_coherent/dma_map_{single|page}, or each page in a
* scatterlist) insert into this tree using the pfn as the key. At
* dma_unmap_{single|sg|page} or dma_free_coherent delete the entry. If
* the pfn already exists at insertion time add a tag as a reference
* count for the overlapping mappings. For now the overlap tracking
* just ensures that 'unmaps' balance 'maps' before marking the pfn
* idle, but we should also be flagging overlaps as an API violation.
*
* Memory usage is mostly constrained by the maximum number of available
* dma-debug entries. In the case of dma_map_{single|page} and
* dma_alloc_coherent there is only one dma_debug_entry and one pfn to
* track per each of these calls. dma_map_sg(), on the other hand,
* consumes a single dma_debug_entry, but inserts 'nents' entries into
* the tree.
*
* At any time debug_dma_assert_idle() can be called to trigger a
* warning if the given page is in the active set.
*/


>> +static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
>> +static DEFINE_SPINLOCK(radix_lock);
>> +
>> +static void __active_pfn_inc_overlap(struct dma_debug_entry *entry)
>> +{
>> + unsigned long pfn = entry->pfn;
>> + int i;
>> +
>> + for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
>> + if (radix_tree_tag_get(&dma_active_pfn, pfn, i) == 0) {
>> + radix_tree_tag_set(&dma_active_pfn, pfn, i);
>> + return;
>> + }
>> + pr_debug("DMA-API: max overlap count (%d) reached for pfn 0x%lx\n",
>> + RADIX_TREE_MAX_TAGS, pfn);
>> +}
>> +
>>
>> ...
>>
>> +void debug_dma_assert_idle(struct page *page)
>> +{
>> + unsigned long flags;
>> + struct dma_debug_entry *entry;
>> +
>> + if (!page)
>> + return;
>> +
>> + spin_lock_irqsave(&radix_lock, flags);
>> + entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
>> + spin_unlock_irqrestore(&radix_lock, flags);
>> +
>> + if (!entry)
>> + return;
>> +
>> + err_printk(entry->dev, entry,
>> + "DMA-API: cpu touching an active dma mapped page "
>> + "[pfn=0x%lx]\n", entry->pfn);
>> +}
>> +EXPORT_SYMBOL_GPL(debug_dma_assert_idle);
>
> The export isn't needed for mm/memory.c

True, it can wait until other call sites arise.

Thanks Andrew.