2020-09-15 16:34:53

by Christoph Hellwig

Subject: a saner API for allocating DMA addressable pages v3

Hi all,

this series replaces the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
with a separate new dma_alloc_pages API, which is available on all
platforms. In addition to cleaning up the convoluted code path, this
ensures that other drivers that have asked for better support for
non-coherent DMA to pages without incurring bounce buffering overhead can
finally be properly supported.

As a follow up I plan to move the implementation of the
DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
that it is also a fundamentally non-coherent allocation. The replacement
for that flag would then return a struct page, as it is allowed to
actually return pages without a kernel mapping as the name suggests
(although most of the time they will actually have a kernel mapping).

In addition to the conversions of the existing non-coherent DMA users,
I've also added a patch to convert the firewire ohci driver to use
the new dma_alloc_pages API.

The first patch is queued up for 5.9 in the media tree, but included here
for completeness.


A git tree is available here:

git://git.infradead.org/users/hch/misc.git dma_alloc_pages

Gitweb:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages


Changes since v2:
- fix up the patch reshuffle which wasn't quite correct
- fix up a few commit messages

Changes since v1:
- rebased on the latest dma-mapping tree, which merged many of the
  cleanups
- fix an argument passing typo in 53c700, caught by sparse
- rename a few macro arguments in 53c700
- pass the right device to the DMA API in the lib82596 drivers
- fix memory ownership transfers in sgiseeq
- better document what a page in the direct kernel mapping means
- split into dma_alloc_pages that returns a struct page and is in the
direct mapping vs dma_alloc_noncoherent that can be vmapped
- convert the firewire ohci driver to dma_alloc_pages

Diffstat:


2020-09-15 17:07:47

by Christoph Hellwig

Subject: [PATCH 12/18] sgiseeq: convert to dma_alloc_noncoherent

Use the new non-coherent DMA API including proper ownership transfers.
This includes adding additional calls to dma_sync_desc_dev as the
old syncing was rather ad-hoc.

Thanks to Thomas Bogendoerfer for debugging the ownership transfer
issues.
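
The sync helpers in this patch need a DMA address rather than a CPU pointer, which is why they go through VIRT_TO_DMA. A minimal userspace sketch of that kind of translation follows, assuming the macro amounts to "base DMA handle plus byte offset into the allocation". The names `toy_ring`, `toy_dma_addr_t` and `toy_virt_to_dma` are hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dma_addr_t in this userspace sketch. */
typedef uint64_t toy_dma_addr_t;

/* One allocation, seen from the CPU (cpu_base) and from the device (dma_base). */
struct toy_ring {
	void *cpu_base;
	toy_dma_addr_t dma_base;
	size_t size;
};

/*
 * Translate a CPU pointer inside the ring into the matching device address.
 * Returns 0 on success, -1 if the pointer lies outside the allocation.
 */
int toy_virt_to_dma(const struct toy_ring *r, const void *p,
		    toy_dma_addr_t *out)
{
	uintptr_t base = (uintptr_t)r->cpu_base;
	uintptr_t addr = (uintptr_t)p;

	if (addr < base || addr - base >= r->size)
		return -1;

	/* Same byte offset on both sides of the mapping. */
	*out = r->dma_base + (addr - base);
	return 0;
}
```

This only holds when the whole allocation is one contiguous range on both the CPU and the device side, which is the assumption made in the sketch.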

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/net/ethernet/seeq/sgiseeq.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c
index 8507ff2420143a..37ff25a84030eb 100644
--- a/drivers/net/ethernet/seeq/sgiseeq.c
+++ b/drivers/net/ethernet/seeq/sgiseeq.c
@@ -112,14 +112,18 @@ struct sgiseeq_private {

static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr)
{
- dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
- DMA_FROM_DEVICE);
+ struct sgiseeq_private *sp = netdev_priv(dev);
+
+ dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+ sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
}

static inline void dma_sync_desc_dev(struct net_device *dev, void *addr)
{
- dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc),
- DMA_TO_DEVICE);
+ struct sgiseeq_private *sp = netdev_priv(dev);
+
+ dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr),
+ sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL);
}

static inline void hpc3_eth_reset(struct hpc3_ethregs *hregs)
@@ -403,6 +407,8 @@ static inline void sgiseeq_rx(struct net_device *dev, struct sgiseeq_private *sp
rd = &sp->rx_desc[sp->rx_new];
dma_sync_desc_cpu(dev, rd);
}
+ dma_sync_desc_dev(dev, rd);
+
dma_sync_desc_cpu(dev, &sp->rx_desc[orig_end]);
sp->rx_desc[orig_end].rdma.cntinfo &= ~(HPCDMA_EOR);
dma_sync_desc_dev(dev, &sp->rx_desc[orig_end]);
@@ -443,6 +449,7 @@ static inline void kick_tx(struct net_device *dev,
dma_sync_desc_cpu(dev, td);
}
if (td->tdma.cntinfo & HPCDMA_XIU) {
+ dma_sync_desc_dev(dev, td);
hregs->tx_ndptr = VIRT_TO_DMA(sp, td);
hregs->tx_ctrl = HPC3_ETXCTRL_ACTIVE;
}
@@ -476,6 +483,7 @@ static inline void sgiseeq_tx(struct net_device *dev, struct sgiseeq_private *sp
if (!(td->tdma.cntinfo & (HPCDMA_XIU)))
break;
if (!(td->tdma.cntinfo & (HPCDMA_ETXD))) {
+ dma_sync_desc_dev(dev, td);
if (!(status & HPC3_ETXCTRL_ACTIVE)) {
hregs->tx_ndptr = VIRT_TO_DMA(sp, td);
hregs->tx_ctrl = HPC3_ETXCTRL_ACTIVE;
@@ -740,8 +748,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
sp = netdev_priv(dev);

/* Make private data page aligned */
- sr = dma_alloc_attrs(&pdev->dev, sizeof(*sp->srings), &sp->srings_dma,
- GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+ sr = dma_alloc_noncoherent(&pdev->dev, sizeof(*sp->srings),
+ &sp->srings_dma, DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!sr) {
printk(KERN_ERR "Sgiseeq: Page alloc failed, aborting.\n");
err = -ENOMEM;
@@ -802,8 +810,8 @@ static int sgiseeq_probe(struct platform_device *pdev)
return 0;

err_out_free_attrs:
- dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
- sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(&pdev->dev, sizeof(*sp->srings), sp->srings,
+ sp->srings_dma, DMA_BIDIRECTIONAL);
err_out_free_dev:
free_netdev(dev);

@@ -817,8 +825,8 @@ static int sgiseeq_remove(struct platform_device *pdev)
struct sgiseeq_private *sp = netdev_priv(dev);

unregister_netdev(dev);
- dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings,
- sp->srings_dma, DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(&pdev->dev, sizeof(*sp->srings), sp->srings,
+ sp->srings_dma, DMA_BIDIRECTIONAL);
free_netdev(dev);

return 0;
--
2.28.0

2020-09-15 22:14:52

by Christoph Hellwig

Subject: [PATCH 03/18] drm/exynos: stop setting DMA_ATTR_NON_CONSISTENT

DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS
configs, so don't set it in this ARM specific driver.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index efa476858db54b..07073222b8f691 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -42,8 +42,6 @@ static int exynos_drm_alloc_buf(struct exynos_drm_gem *exynos_gem, bool kvmap)
if (exynos_gem->flags & EXYNOS_BO_WC ||
!(exynos_gem->flags & EXYNOS_BO_CACHABLE))
attr |= DMA_ATTR_WRITE_COMBINE;
- else
- attr |= DMA_ATTR_NON_CONSISTENT;

/* FBDev emulation requires kernel mapping */
if (!kvmap)
--
2.28.0

2020-09-15 22:16:17

by Christoph Hellwig

Subject: [PATCH 05/18] net/au1000-eth: stop using DMA_ATTR_NON_CONSISTENT

The au1000-eth driver contains none of the manual cache synchronization
required for using DMA_ATTR_NON_CONSISTENT. From what I can tell it
can be used on both DMA-coherent and non-coherent platforms, but
I suspect it has been buggy on the non-coherent platforms all along.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/net/ethernet/amd/au1000_eth.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c
index 75dbd221dc594b..19e195420e2434 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1131,10 +1131,9 @@ static int au1000_probe(struct platform_device *pdev)
/* Allocate the data buffers
* Snooping works fine with eth on all au1xxx
*/
- aup->vaddr = (u32)dma_alloc_attrs(&pdev->dev, MAX_BUF_SIZE *
+ aup->vaddr = (u32)dma_alloc_coherent(&pdev->dev, MAX_BUF_SIZE *
(NUM_TX_BUFFS + NUM_RX_BUFFS),
- &aup->dma_addr, 0,
- DMA_ATTR_NON_CONSISTENT);
+ &aup->dma_addr, 0);
if (!aup->vaddr) {
dev_err(&pdev->dev, "failed to allocate data buffers\n");
err = -ENOMEM;
@@ -1310,9 +1309,8 @@ static int au1000_probe(struct platform_device *pdev)
err_remap2:
iounmap(aup->mac);
err_remap1:
- dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
- (void *)aup->vaddr, aup->dma_addr,
- DMA_ATTR_NON_CONSISTENT);
+ dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
+ (void *)aup->vaddr, aup->dma_addr);
err_vaddr:
free_netdev(dev);
err_alloc:
@@ -1344,9 +1342,8 @@ static int au1000_remove(struct platform_device *pdev)
if (aup->tx_db_inuse[i])
au1000_ReleaseDB(aup, aup->tx_db_inuse[i]);

- dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
- (void *)aup->vaddr, aup->dma_addr,
- DMA_ATTR_NON_CONSISTENT);
+ dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS),
+ (void *)aup->vaddr, aup->dma_addr);

iounmap(aup->macdma);
iounmap(aup->mac);
--
2.28.0

2020-09-15 22:18:20

by Christoph Hellwig

Subject: [PATCH 06/18] lib82596: move DMA allocation into the callers of i82596_probe

This allows us to get rid of the LIB82596_DMA_ATTR define and prepares
for untangling the coherent vs non-coherent DMA allocation API.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------
drivers/net/ethernet/i825xx/lib82596.c | 36 ++++++++----------------
drivers/net/ethernet/i825xx/sni_82596.c | 19 +++++++++----
3 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
index aec7e98bcc853a..a12218e940a2fa 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,8 +96,6 @@

#define OPT_SWAP_PORT 0x0001 /* Need to wordswp on the MPU port */

-#define LIB82596_DMA_ATTR DMA_ATTR_NON_CONSISTENT
-
#define DMA_WBACK(ndev, addr, len) \
do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)

@@ -155,7 +153,7 @@ lan_init_chip(struct parisc_device *dev)
{
struct net_device *netdevice;
struct i596_private *lp;
- int retval;
+ int retval = -ENOMEM;
int i;

if (!dev->irq) {
@@ -186,12 +184,22 @@ lan_init_chip(struct parisc_device *dev)

lp = netdev_priv(netdevice);
lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
+ lp->dma = dma_alloc_attrs(&dev->dev, sizeof(struct i596_dma),
+ &lp->dma_addr, GFP_KERNEL,
+ DMA_ATTR_NON_CONSISTENT);
+ if (!lp->dma)
+ goto out_free_netdev;

retval = i82596_probe(netdevice);
- if (retval) {
- free_netdev(netdevice);
- return -ENODEV;
- }
+ if (retval)
+ goto out_free_dma;
+ return 0;
+
+out_free_dma:
+ dma_free_attrs(&dev->dev, sizeof(struct i596_dma), lp->dma,
+ lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+out_free_netdev:
+ free_netdev(netdevice);
return retval;
}

@@ -202,7 +210,7 @@ static int __exit lan_remove_chip(struct parisc_device *pdev)

unregister_netdev (dev);
dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
- lp->dma_addr, LIB82596_DMA_ATTR);
+ lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
free_netdev (dev);
return 0;
}
diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c
index b03757e169e475..b4e4b3eb5758b5 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -1047,9 +1047,8 @@ static const struct net_device_ops i596_netdev_ops = {

static int i82596_probe(struct net_device *dev)
{
- int i;
struct i596_private *lp = netdev_priv(dev);
- struct i596_dma *dma;
+ int ret;

/* This lot is ensure things have been cache line aligned. */
BUILD_BUG_ON(sizeof(struct i596_rfd) != 32);
@@ -1063,41 +1062,28 @@ static int i82596_probe(struct net_device *dev)
if (!dev->base_addr || !dev->irq)
return -ENODEV;

- dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma),
- &lp->dma_addr, GFP_KERNEL,
- LIB82596_DMA_ATTR);
- if (!dma) {
- printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__);
- return -ENOMEM;
- }
-
dev->netdev_ops = &i596_netdev_ops;
dev->watchdog_timeo = TX_TIMEOUT;

- memset(dma, 0, sizeof(struct i596_dma));
- lp->dma = dma;
-
- dma->scb.command = 0;
- dma->scb.cmd = I596_NULL;
- dma->scb.rfd = I596_NULL;
+ memset(lp->dma, 0, sizeof(struct i596_dma));
+ lp->dma->scb.command = 0;
+ lp->dma->scb.cmd = I596_NULL;
+ lp->dma->scb.rfd = I596_NULL;
spin_lock_init(&lp->lock);

- DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+ DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));

- i = register_netdev(dev);
- if (i) {
- dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma),
- dma, lp->dma_addr, LIB82596_DMA_ATTR);
- return i;
- }
+ ret = register_netdev(dev);
+ if (ret)
+ return ret;

DEB(DEB_PROBE, printk(KERN_INFO "%s: 82596 at %#3lx, %pM IRQ %d.\n",
dev->name, dev->base_addr, dev->dev_addr,
dev->irq));
DEB(DEB_INIT, printk(KERN_INFO
"%s: dma at 0x%p (%d bytes), lp->scb at 0x%p\n",
- dev->name, dma, (int)sizeof(struct i596_dma),
- &dma->scb));
+ dev->name, lp->dma, (int)sizeof(struct i596_dma),
+ &lp->dma->scb));

return 0;
}
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
index 22f5887578b2bd..4b9ac0c6557731 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -24,8 +24,6 @@

static const char sni_82596_string[] = "snirm_82596";

-#define LIB82596_DMA_ATTR 0
-
#define DMA_WBACK(priv, addr, len) do { } while (0)
#define DMA_INV(priv, addr, len) do { } while (0)
#define DMA_WBACK_INV(priv, addr, len) do { } while (0)
@@ -134,10 +132,19 @@ static int sni_82596_probe(struct platform_device *dev)
lp->ca = ca_addr;
lp->mpu_port = mpu_addr;

+ lp->dma = dma_alloc_coherent(&dev->dev, sizeof(struct i596_dma),
+ &lp->dma_addr, GFP_KERNEL);
+ if (!lp->dma)
+ goto probe_failed;
+
retval = i82596_probe(netdevice);
- if (retval == 0)
- return 0;
+ if (retval)
+ goto probe_failed_free_dma;
+ return 0;

+probe_failed_free_dma:
+ dma_free_coherent(&dev->dev, sizeof(struct i596_dma), lp->dma,
+ lp->dma_addr);
probe_failed:
free_netdev(netdevice);
probe_failed_free_ca:
@@ -153,8 +160,8 @@ static int sni_82596_driver_remove(struct platform_device *pdev)
struct i596_private *lp = netdev_priv(dev);

unregister_netdev(dev);
- dma_free_attrs(dev->dev.parent, sizeof(struct i596_private), lp->dma,
- lp->dma_addr, LIB82596_DMA_ATTR);
+ dma_free_coherent(&pdev->dev, sizeof(struct i596_private), lp->dma,
+ lp->dma_addr);
iounmap(lp->ca);
iounmap(lp->mpu_port);
free_netdev (dev);
--
2.28.0

2020-09-15 22:20:02

by Christoph Hellwig

Subject: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

Implement the alloc_noncoherent method to provide memory that is neither
coherent nor contiguous.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/iommu/dma-iommu.c | 41 +++++++++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 00a5b49248e334..c12c1dc43d312e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -572,6 +572,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
* @size: Size of buffer in bytes
* @dma_handle: Out argument for allocated DMA handle
* @gfp: Allocation flags
+ * @prot: pgprot_t to use for the remapped mapping
* @attrs: DMA attributes for this allocation
*
* If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
@@ -580,14 +581,14 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
* Return: Mapped virtual address, or NULL on failure.
*/
static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+ dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
+ unsigned long attrs)
{
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
- pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
struct sg_table sgt;
@@ -1030,8 +1031,10 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
gfp |= __GFP_ZERO;

if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) &&
- !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
- return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
+ !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
+ return iommu_dma_alloc_remap(dev, size, handle, gfp,
+ dma_pgprot(dev, PAGE_KERNEL, attrs), attrs);
+ }

if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
@@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
return cpu_addr;
}

+#ifdef CONFIG_DMA_REMAP
+static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
+ dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
+{
+ if (!gfpflags_allow_blocking(gfp)) {
+ struct page *page;
+
+ page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
+ if (!page)
+ return NULL;
+ return page_address(page);
+ }
+
+ return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
+ PAGE_KERNEL, 0);
+}
+
+static void iommu_dma_free_noncoherent(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t handle, enum dma_data_direction dir)
+{
+ __iommu_dma_unmap(dev, handle, size);
+ __iommu_dma_free(dev, size, cpu_addr);
+}
+#else
+#define iommu_dma_alloc_noncoherent NULL
+#define iommu_dma_free_noncoherent NULL
+#endif /* CONFIG_DMA_REMAP */
+
static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
@@ -1122,6 +1153,8 @@ static const struct dma_map_ops iommu_dma_ops = {
.free = iommu_dma_free,
.alloc_pages = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
+ .alloc_noncoherent = iommu_dma_alloc_noncoherent,
+ .free_noncoherent = iommu_dma_free_noncoherent,
.mmap = iommu_dma_mmap,
.get_sgtable = iommu_dma_get_sgtable,
.map_page = iommu_dma_map_page,
--
2.28.0

2020-09-15 22:20:04

by Christoph Hellwig

Subject: [PATCH 18/18] firewire-ohci: use dma_alloc_pages

Use dma_alloc_pages to allocate DMAable pages instead of hoping that
the architecture either has GFP_DMA32 or has no more than 4G of memory.
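
The problem being avoided can be shown with a small addressability check. This is a userspace sketch only: `toy_dma_addressable` is a hypothetical helper, and the 32-bit mask used in the usage note is an assumption consistent with the old GFP_DMA32 hint, not something stated in this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the byte range [addr, addr + size) lies
 * entirely at or below the highest address the device can generate (mask).
 */
bool toy_dma_addressable(uint64_t addr, size_t size, uint64_t mask)
{
	if (size == 0)
		return false;

	/* Reject ranges that would wrap around the 64-bit address space. */
	if (addr > UINT64_MAX - (size - 1))
		return false;

	return addr + (size - 1) <= mask;
}
```

With a mask of 0xffffffff, a page at 0xfffff000 passes while a page at 0x100000000 does not. A plain page allocation gives no such guarantee on a machine with more than 4G of memory, whereas an allocator that takes the device as an argument can take its limits into account.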

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/firewire/ohci.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 020cb15a4d8fcc..9811c40956e54d 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -674,17 +674,16 @@ static void ar_context_link_page(struct ar_context *ctx, unsigned int index)

static void ar_context_release(struct ar_context *ctx)
{
+ struct device *dev = ctx->ohci->card.device;
unsigned int i;

vunmap(ctx->buffer);

- for (i = 0; i < AR_BUFFERS; i++)
- if (ctx->pages[i]) {
- dma_unmap_page(ctx->ohci->card.device,
- ar_buffer_bus(ctx, i),
- PAGE_SIZE, DMA_FROM_DEVICE);
- __free_page(ctx->pages[i]);
- }
+ for (i = 0; i < AR_BUFFERS; i++) {
+ if (ctx->pages[i])
+ dma_free_pages(dev, PAGE_SIZE, ctx->pages[i],
+ ar_buffer_bus(ctx, i), DMA_FROM_DEVICE);
+ }
}

static void ar_context_abort(struct ar_context *ctx, const char *error_msg)
@@ -970,6 +969,7 @@ static void ar_context_tasklet(unsigned long data)
static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci,
unsigned int descriptors_offset, u32 regs)
{
+ struct device *dev = ohci->card.device;
unsigned int i;
dma_addr_t dma_addr;
struct page *pages[AR_BUFFERS + AR_WRAPAROUND_PAGES];
@@ -980,17 +980,13 @@ static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci,
tasklet_init(&ctx->tasklet, ar_context_tasklet, (unsigned long)ctx);

for (i = 0; i < AR_BUFFERS; i++) {
- ctx->pages[i] = alloc_page(GFP_KERNEL | GFP_DMA32);
+ ctx->pages[i] = dma_alloc_pages(dev, PAGE_SIZE, &dma_addr,
+ DMA_FROM_DEVICE, GFP_KERNEL);
if (!ctx->pages[i])
goto out_of_memory;
- dma_addr = dma_map_page(ohci->card.device, ctx->pages[i],
- 0, PAGE_SIZE, DMA_FROM_DEVICE);
- if (dma_mapping_error(ohci->card.device, dma_addr)) {
- __free_page(ctx->pages[i]);
- ctx->pages[i] = NULL;
- goto out_of_memory;
- }
set_page_private(ctx->pages[i], dma_addr);
+ dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE,
+ DMA_FROM_DEVICE);
}

for (i = 0; i < AR_BUFFERS; i++)
--
2.28.0

2020-09-15 22:20:39

by Christoph Hellwig

Subject: [PATCH 16/18] dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods

This will allow IOMMU drivers to allocate non-contiguous memory and
return a vmapped virtual address.
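
The dispatch shape this patch adds (use the backend method when one is provided, otherwise fall back to the generic page allocator) can be modelled in userspace. The sketch below is an illustration only: `toy_ops`, `toy_alloc_pages`, `toy_iommu_alloc` and the `toy_last_*` bookkeeping variables are hypothetical, and malloc() merely stands in for the real allocators.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_ALIGN(x) (((x) + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1))

/* Hypothetical stand-in for struct dma_map_ops with one optional method. */
struct toy_ops {
	void *(*alloc_noncoherent)(size_t size);
};

/* Bookkeeping for the sketch: 0 = generic path, 1 = backend method. */
int toy_last_path;
size_t toy_last_size;

/* Generic fallback, playing the role of the dma_alloc_pages() path. */
void *toy_alloc_pages(size_t size)
{
	toy_last_path = 0;
	toy_last_size = size;
	return malloc(size);
}

/* Example backend method, such as one an IOMMU driver might plug in. */
void *toy_iommu_alloc(size_t size)
{
	toy_last_path = 1;
	toy_last_size = size;
	return malloc(size);
}

void *toy_alloc_noncoherent(const struct toy_ops *ops, size_t size)
{
	/* No ops table, or no method set: use the generic allocator. */
	if (!ops || !ops->alloc_noncoherent)
		return toy_alloc_pages(size);

	/* The backend method is handed a page-aligned size. */
	return ops->alloc_noncoherent(TOY_PAGE_ALIGN(size));
}
```

Keeping the method optional means existing dma_map_ops instances need no changes; only backends that can do better than contiguous pages have to fill it in.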

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/dma-mapping.h | 5 +++++
kernel/dma/mapping.c | 33 +++++++++++++++++++++++++++------
2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index bf592cf0db4acb..b4b5d75260d6dc 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -80,6 +80,11 @@ struct dma_map_ops {
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir);
+ void* (*alloc_noncoherent)(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir,
+ gfp_t gfp);
+ void (*free_noncoherent)(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, enum dma_data_direction dir);
int (*mmap)(struct device *, struct vm_area_struct *,
void *, dma_addr_t, size_t,
unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 6f86c925b8251d..8614d7d2ee59a9 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -502,19 +502,40 @@ EXPORT_SYMBOL_GPL(dma_free_pages);
void *dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
{
- struct page *page;
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ void *vaddr;

- page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
- if (!page)
- return NULL;
- return page_address(page);
+ if (!ops || !ops->alloc_noncoherent) {
+ struct page *page;
+
+ page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
+ if (!page)
+ return NULL;
+ return page_address(page);
+ }
+
+ size = PAGE_ALIGN(size);
+ vaddr = ops->alloc_noncoherent(dev, size, dma_handle, dir, gfp);
+ if (vaddr)
+ debug_dma_map_page(dev, virt_to_page(vaddr), 0, size, dir,
+ *dma_handle);
+ return vaddr;
}
EXPORT_SYMBOL_GPL(dma_alloc_noncoherent);

void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, enum dma_data_direction dir)
{
- dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ if (!ops || !ops->free_noncoherent) {
+ dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
+ return;
+ }
+
+ size = PAGE_ALIGN(size);
+ debug_dma_unmap_page(dev, dma_handle, size, dir);
+ ops->free_noncoherent(dev, size, vaddr, dma_handle, dir);
}
EXPORT_SYMBOL_GPL(dma_free_noncoherent);

--
2.28.0

2020-09-15 22:22:33

by Christoph Hellwig

Subject: [PATCH 08/18] dma-mapping: add a new dma_alloc_noncoherent API

Add a new API to allocate and free memory that is guaranteed to be
addressable by a device, but which potentially is not cache coherent
for DMA.

To transfer ownership to and from the device, the existing streaming
DMA API calls dma_sync_single_for_device and dma_sync_single_for_cpu
must be used.
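
The ownership rule can be illustrated with a small userspace model. This is a sketch, not kernel code: `struct toy_buf` and the `toy_*` functions are hypothetical, and the "cache" is just a second array standing in for CPU cache contents that the device cannot see.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 16

/* Hypothetical model: "ram" is what the device sees, "cache" what the CPU sees. */
struct toy_buf {
	unsigned char ram[TOY_BUF_SIZE];
	unsigned char cache[TOY_BUF_SIZE];
};

/* CPU to device handover: write CPU data back so the device can see it. */
void toy_sync_for_device(struct toy_buf *b)
{
	memcpy(b->ram, b->cache, TOY_BUF_SIZE);
}

/* Device to CPU handover: drop stale CPU data and refetch from memory. */
void toy_sync_for_cpu(struct toy_buf *b)
{
	memcpy(b->cache, b->ram, TOY_BUF_SIZE);
}

/* CPU accesses only touch the cache in this model (off < TOY_BUF_SIZE). */
void toy_cpu_write(struct toy_buf *b, size_t off, unsigned char val)
{
	b->cache[off] = val;
}

unsigned char toy_cpu_read(const struct toy_buf *b, size_t off)
{
	return b->cache[off];
}

/* Device accesses only touch memory in this model (off < TOY_BUF_SIZE). */
void toy_dev_write(struct toy_buf *b, size_t off, unsigned char val)
{
	b->ram[off] = val;
}

unsigned char toy_dev_read(const struct toy_buf *b, size_t off)
{
	return b->ram[off];
}
```

In this model a CPU write is invisible to the device until toy_sync_for_device() runs, and a device write is invisible to the CPU until toy_sync_for_cpu() runs. The real calls differ in detail (they take a direction and may be no-ops on coherent platforms), but the handover discipline is the same.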

For now the new calls are implemented on top of dma_alloc_attrs just
like the old non-coherent API, but once all drivers are switched to
the new API it will be replaced with a better working implementation
that is available on all architectures.

Signed-off-by: Christoph Hellwig <[email protected]>
---
Documentation/core-api/dma-api.rst | 75 ++++++++++++++----------------
include/linux/dma-mapping.h | 12 +++++
2 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 90239348b30f6f..ea0413276ddb70 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -516,48 +516,56 @@ routines, e.g.:::
}


-Part II - Advanced dma usage
-----------------------------
+Part II - Non-coherent DMA allocations
+--------------------------------------

-Warning: These pieces of the DMA API should not be used in the
-majority of cases, since they cater for unlikely corner cases that
-don't belong in usual drivers.
+These APIs allow to allocate pages in the kernel direct mapping that are
+guaranteed to be DMA addressable. This means that unlike dma_alloc_coherent,
+virt_to_page can be called on the resulting address, and the resulting
+struct page can be used for everything a struct page is suitable for.

-If you don't understand how cache line coherency works between a
-processor and an I/O device, you should not be using this part of the
-API at all.
+If you don't understand how cache line coherency works between a processor and
+an I/O device, you should not be using this part of the API.

::

void *
- dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
- gfp_t flag, unsigned long attrs)
+ dma_alloc_noncoherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir,
+ gfp_t gfp)

-Identical to dma_alloc_coherent() except that when the
-DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the
-platform will choose to return either consistent or non-consistent memory
-as it sees fit. By using this API, you are guaranteeing to the platform
-that you have all the correct and necessary sync points for this memory
-in the driver should it choose to return non-consistent memory.
+This routine allocates a region of <size> bytes of consistent memory. It
+returns a pointer to the allocated region (in the processor's virtual address
+space) or NULL if the allocation failed. The returned memory may or may not
+be in the kernels direct mapping. Drivers must not call virt_to_page on
+the returned memory region.

-Note: where the platform can return consistent memory, it will
-guarantee that the sync points become nops.
+It also returns a <dma_handle> which may be cast to an unsigned integer the
+same width as the bus and given to the device as the DMA address base of
+the region.

-Warning: Handling non-consistent memory is a real pain. You should
-only use this API if you positively know your driver will be
-required to work on one of the rare (usually non-PCI) architectures
-that simply cannot make consistent memory.
+The dir parameter specified if data is read and/or written by the device,
+see dma_map_single() for details.
+
+The gfp parameter allows the caller to specify the ``GFP_`` flags (see
+kmalloc()) for the allocation, but rejects flags used to specify a memory
+zone such as GFP_DMA or GFP_HIGHMEM.
+
+Before giving the memory to the device, dma_sync_single_for_device() needs
+to be called, and before reading memory written by the device,
+dma_sync_single_for_cpu(), just like for streaming DMA mappings that are
+reused.

::

void
- dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
- dma_addr_t dma_handle, unsigned long attrs)
+ dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
+ dma_addr_t dma_handle, enum dma_data_direction dir)

-Free memory allocated by the dma_alloc_attrs(). All common
-parameters must be identical to those otherwise passed to dma_free_coherent,
-and the attrs argument must be identical to the attrs passed to
-dma_alloc_attrs().
+Free a region of memory previously allocated using dma_alloc_noncoherent().
+dev, size and dma_handle and dir must all be the same as those passed into
+dma_alloc_noncoherent(). cpu_addr must be the virtual address returned by
+the dma_alloc_noncoherent().

::

@@ -575,17 +583,6 @@ memory or doing partial flushes.
into the width returned by this call. It will also always be a power
of two for easy alignment.

-::
-
- void
- dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction direction)
-
-Do a partial sync of memory that was allocated by dma_alloc_attrs() with
-the DMA_ATTR_NON_CONSISTENT flag starting at virtual address vaddr and
-continuing on for size. Again, you *must* observe the cache line
-boundaries when doing this.
-

Part III - Debug drivers use of the DMA-API
-------------------------------------------
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index df0bff2ea750e0..4e1de194b45cbf 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -389,6 +389,18 @@ static inline unsigned long dma_get_merge_boundary(struct device *dev)
}
#endif /* CONFIG_HAS_DMA */

+static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+ return dma_alloc_attrs(dev, size, dma_handle, gfp,
+ DMA_ATTR_NON_CONSISTENT);
+}
+static inline void dma_free_noncoherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+ dma_free_attrs(dev, size, vaddr, dma_handle, DMA_ATTR_NON_CONSISTENT);
+}
+
static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
--
2.28.0

2020-09-15 22:23:57

by Christoph Hellwig

Subject: [PATCH 15/18] dma-mapping: add a new dma_alloc_pages API

This API is the equivalent of alloc_pages, except that the returned memory
is guaranteed to be DMA addressable by the passed in device. The
implementation will also be used to provide a more sensible replacement
for the DMA_ATTR_NON_CONSISTENT flag.

Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages
as its backend.

Signed-off-by: Christoph Hellwig <[email protected]>
---
Documentation/core-api/dma-attributes.rst | 8 ---
arch/alpha/kernel/pci_iommu.c | 2 +
arch/arm/mm/dma-mapping-nommu.c | 2 +
arch/arm/mm/dma-mapping.c | 4 ++
arch/ia64/hp/common/sba_iommu.c | 2 +
arch/mips/jazz/jazzdma.c | 7 +--
arch/powerpc/kernel/dma-iommu.c | 2 +
arch/powerpc/platforms/ps3/system-bus.c | 4 ++
arch/powerpc/platforms/pseries/vio.c | 2 +
arch/s390/pci/pci_dma.c | 2 +
arch/x86/kernel/amd_gart_64.c | 2 +
drivers/iommu/dma-iommu.c | 2 +
drivers/iommu/intel/iommu.c | 4 ++
drivers/parisc/ccio-dma.c | 2 +
drivers/parisc/sba_iommu.c | 2 +
drivers/xen/swiotlb-xen.c | 2 +
include/linux/dma-direct.h | 5 ++
include/linux/dma-mapping.h | 34 ++++++------
include/linux/dma-noncoherent.h | 3 --
kernel/dma/direct.c | 52 ++++++++++++++++++-
kernel/dma/mapping.c | 63 +++++++++++++++++++++--
kernel/dma/ops_helpers.c | 35 +++++++++++++
kernel/dma/virt.c | 2 +
23 files changed, 206 insertions(+), 37 deletions(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 29dcbe8826e85e..1887d92e8e9269 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -25,14 +25,6 @@ Since it is optional for platforms to implement DMA_ATTR_WRITE_COMBINE,
those that do not will simply ignore the attribute and exhibit default
behavior.

-DMA_ATTR_NON_CONSISTENT
------------------------
-
-DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either
-consistent or non-consistent memory as it sees fit. By using this API,
-you are guaranteeing to the platform that you have all the correct and
-necessary sync points for this memory in the driver.
-
DMA_ATTR_NO_KERNEL_MAPPING
--------------------------

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 6f7de4f4e191e7..447e0fd0ed3895 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -952,5 +952,7 @@ const struct dma_map_ops alpha_pci_ops = {
.dma_supported = alpha_pci_supported,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};
EXPORT_SYMBOL(alpha_pci_ops);
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 287ef898a55e11..43c6d66b6e733a 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -176,6 +176,8 @@ static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist
const struct dma_map_ops arm_nommu_dma_ops = {
.alloc = arm_nommu_dma_alloc,
.free = arm_nommu_dma_free,
+ .alloc_pages = dma_direct_alloc_pages,
+ .free_pages = dma_direct_free_pages,
.mmap = arm_nommu_dma_mmap,
.map_page = arm_nommu_dma_map_page,
.unmap_page = arm_nommu_dma_unmap_page,
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a8949174b1c06..7738b4d23f692f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -199,6 +199,8 @@ static int arm_dma_supported(struct device *dev, u64 mask)
const struct dma_map_ops arm_dma_ops = {
.alloc = arm_dma_alloc,
.free = arm_dma_free,
+ .alloc_pages = dma_direct_alloc_pages,
+ .free_pages = dma_direct_free_pages,
.mmap = arm_dma_mmap,
.get_sgtable = arm_dma_get_sgtable,
.map_page = arm_dma_map_page,
@@ -226,6 +228,8 @@ static int arm_coherent_dma_mmap(struct device *dev, struct vm_area_struct *vma,
const struct dma_map_ops arm_coherent_dma_ops = {
.alloc = arm_coherent_dma_alloc,
.free = arm_coherent_dma_free,
+ .alloc_pages = dma_direct_alloc_pages,
+ .free_pages = dma_direct_free_pages,
.mmap = arm_coherent_dma_mmap,
.get_sgtable = arm_dma_get_sgtable,
.map_page = arm_coherent_dma_map_page,
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index b49b73a95067d2..cafbb848a34e4d 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2070,6 +2070,8 @@ static const struct dma_map_ops sba_dma_ops = {
.dma_supported = sba_dma_supported,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};

static int __init
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 2bf849caf507b1..f53bc043334c01 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -506,9 +506,6 @@ static void *jazz_dma_alloc(struct device *dev, size_t size,
*dma_handle = vdma_alloc(virt_to_phys(ret), size);
if (*dma_handle == DMA_MAPPING_ERROR)
goto out_free_pages;
-
- if (attrs & DMA_ATTR_NON_CONSISTENT)
- return ret;
arch_dma_prep_coherent(page, size);
return (void *)(UNCAC_BASE + __pa(ret));

@@ -521,8 +518,6 @@ static void jazz_dma_free(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs)
{
vdma_free(dma_handle);
- if (!(attrs & DMA_ATTR_NON_CONSISTENT))
- vaddr = __va(vaddr - UNCAC_BASE);
__free_pages(virt_to_page(vaddr), get_order(size));
}

@@ -622,5 +617,7 @@ const struct dma_map_ops jazz_dma_ops = {
.sync_sg_for_device = jazz_dma_sync_sg_for_device,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};
EXPORT_SYMBOL(jazz_dma_ops);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 569fecd7b5b234..d4e702d74b3393 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -137,4 +137,6 @@ const struct dma_map_ops dma_iommu_ops = {
.get_required_mask = dma_iommu_get_required_mask,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};
diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c
index 3542b7bd6a4689..7bc5f9be3e12d8 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -696,6 +696,8 @@ static const struct dma_map_ops ps3_sb_dma_ops = {
.unmap_page = ps3_unmap_page,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};

static const struct dma_map_ops ps3_ioc0_dma_ops = {
@@ -708,6 +710,8 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = {
.unmap_page = ps3_unmap_page,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};

/**
diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c
index 0487b26f6f1af3..98ed7b09b3fe50 100644
--- a/arch/powerpc/platforms/pseries/vio.c
+++ b/arch/powerpc/platforms/pseries/vio.c
@@ -608,6 +608,8 @@ static const struct dma_map_ops vio_dma_mapping_ops = {
.get_required_mask = dma_iommu_get_required_mask,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};

/**
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 4a37d8f4de9d9d..9291023e9469c2 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -668,6 +668,8 @@ const struct dma_map_ops s390_pci_dma_ops = {
.unmap_page = s390_dma_unmap_pages,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
/* dma_supported is unconditionally true without a callback */
};
EXPORT_SYMBOL_GPL(s390_pci_dma_ops);
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 153374b996a279..c96dcaa572ebd3 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -677,6 +677,8 @@ static const struct dma_map_ops gart_dma_ops = {
.get_sgtable = dma_common_get_sgtable,
.dma_supported = dma_direct_supported,
.get_required_mask = dma_direct_get_required_mask,
+ .alloc_pages = dma_direct_alloc_pages,
+ .free_pages = dma_direct_free_pages,
};

static void gart_iommu_shutdown(void)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5141d49a046baa..00a5b49248e334 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1120,6 +1120,8 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
static const struct dma_map_ops iommu_dma_ops = {
.alloc = iommu_dma_alloc,
.free = iommu_dma_free,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
.mmap = iommu_dma_mmap,
.get_sgtable = iommu_dma_get_sgtable,
.map_page = iommu_dma_map_page,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7983c13b9eef7d..26eb7aafa0bda6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3669,6 +3669,8 @@ static const struct dma_map_ops intel_dma_ops = {
.dma_supported = dma_direct_supported,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
.get_required_mask = intel_get_required_mask,
};

@@ -3922,6 +3924,8 @@ static const struct dma_map_ops bounce_dma_ops = {
.sync_sg_for_device = bounce_sync_sg_for_device,
.map_resource = bounce_map_resource,
.unmap_resource = bounce_unmap_resource,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
.dma_supported = dma_direct_supported,
};

diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index ba16b7f8f80612..8cf0b9c8bdf795 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1024,6 +1024,8 @@ static const struct dma_map_ops ccio_ops = {
.map_sg = ccio_map_sg,
.unmap_sg = ccio_unmap_sg,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};

#ifdef CONFIG_PROC_FS
diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index 959bda193b9603..6fcde7980358ae 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -1076,6 +1076,8 @@ static const struct dma_map_ops sba_ops = {
.map_sg = sba_map_sg,
.unmap_sg = sba_unmap_sg,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};


diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 39a0f2e0847c95..030a225624b060 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -578,4 +578,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.dma_supported = xen_swiotlb_dma_supported,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 805010ea5346f9..c11bb935fc7fe3 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -77,6 +77,11 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t dma_addr, unsigned long attrs);
+struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_direct_free_pages(struct device *dev, size_t size,
+ struct page *page, dma_addr_t dma_addr,
+ enum dma_data_direction dir);
int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 5b4e97b0846fd3..bf592cf0db4acb 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -27,11 +27,6 @@
* buffered to improve performance.
*/
#define DMA_ATTR_WRITE_COMBINE (1UL << 2)
-/*
- * DMA_ATTR_NON_CONSISTENT: Lets the platform to choose to return either
- * consistent or non-consistent memory as it sees fit.
- */
-#define DMA_ATTR_NON_CONSISTENT (1UL << 3)
/*
* DMA_ATTR_NO_KERNEL_MAPPING: Lets the platform to avoid creating a kernel
* virtual mapping for the allocated buffer.
@@ -80,6 +75,11 @@ struct dma_map_ops {
void (*free)(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle,
unsigned long attrs);
+ struct page *(*alloc_pages)(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir,
+ gfp_t gfp);
+ void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
+ dma_addr_t dma_handle, enum dma_data_direction dir);
int (*mmap)(struct device *, struct vm_area_struct *,
void *, dma_addr_t, size_t,
unsigned long attrs);
@@ -381,17 +381,14 @@ static inline unsigned long dma_get_merge_boundary(struct device *dev)
}
#endif /* CONFIG_HAS_DMA */

-static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
- dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
-{
- return dma_alloc_attrs(dev, size, dma_handle, gfp,
- DMA_ATTR_NON_CONSISTENT);
-}
-static inline void dma_free_noncoherent(struct device *dev, size_t size,
- void *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir)
-{
- dma_free_attrs(dev, size, vaddr, dma_handle, DMA_ATTR_NON_CONSISTENT);
-}
+struct page *dma_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_free_pages(struct device *dev, size_t size, struct page *page,
+ dma_addr_t dma_handle, enum dma_data_direction dir);
+void *dma_alloc_noncoherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, enum dma_data_direction dir);

static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
@@ -517,7 +514,10 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
-
+struct page *dma_common_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp);
+void dma_common_free_pages(struct device *dev, size_t size, struct page *vaddr,
+ dma_addr_t dma_handle, enum dma_data_direction dir);
struct page **dma_common_find_pages(void *cpu_addr);
void *dma_common_contiguous_remap(struct page *page, size_t size,
pgprot_t prot, const void *caller);
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 0888656369a45b..e61283e06576a8 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -31,9 +31,6 @@ static __always_inline bool dma_alloc_need_uncached(struct device *dev,
return false;
if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
return false;
- if (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
- (attrs & DMA_ATTR_NON_CONSISTENT))
- return false;
return true;
}

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 54db9cfdaecc6d..9ba320383b0d19 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Copyright (C) 2018 Christoph Hellwig.
+ * Copyright (C) 2018-2020 Christoph Hellwig.
*
* DMA operations that map physical memory directly without using an IOMMU.
*/
@@ -287,6 +287,56 @@ void dma_direct_free(struct device *dev, size_t size,
dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
}

+struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+ struct page *page;
+ void *ret;
+
+ if (dma_should_alloc_from_pool(dev, gfp, 0)) {
+ page = dma_alloc_from_pool(dev, size, &ret, gfp,
+ dma_coherent_ok);
+ if (!page)
+ return NULL;
+ goto done;
+ }
+
+ page = __dma_direct_alloc_pages(dev, size, gfp);
+ if (!page)
+ return NULL;
+ ret = page_address(page);
+ if (force_dma_unencrypted(dev)) {
+ if (set_memory_decrypted((unsigned long)ret,
+ 1 << get_order(size)))
+ goto out_free_pages;
+ }
+ memset(ret, 0, size);
+done:
+ *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+ return page;
+out_free_pages:
+ dma_free_contiguous(dev, page, size);
+ return NULL;
+}
+
+void dma_direct_free_pages(struct device *dev, size_t size,
+ struct page *page, dma_addr_t dma_addr,
+ enum dma_data_direction dir)
+{
+ unsigned int page_order = get_order(size);
+ void *vaddr = page_address(page);
+
+ /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
+ if (dma_should_free_from_pool(dev, 0) &&
+ dma_free_from_pool(dev, vaddr, size))
+ return;
+
+ if (force_dma_unencrypted(dev))
+ set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
+
+ dma_free_contiguous(dev, page, size);
+}
+
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
defined(CONFIG_SWIOTLB)
void dma_direct_sync_sg_for_device(struct device *dev,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e71abcec8d3913..6f86c925b8251d 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -330,9 +330,7 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
{
if (force_dma_unencrypted(dev))
prot = pgprot_decrypted(prot);
- if (dev_is_dma_coherent(dev) ||
- (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
- (attrs & DMA_ATTR_NON_CONSISTENT)))
+ if (dev_is_dma_coherent(dev))
return prot;
#ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE
if (attrs & DMA_ATTR_WRITE_COMBINE)
@@ -461,6 +459,65 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr,
}
EXPORT_SYMBOL(dma_free_attrs);

+struct page *dma_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ struct page *page;
+
+ if (WARN_ON_ONCE(!dev->coherent_dma_mask))
+ return NULL;
+ if (WARN_ON_ONCE(gfp & (__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM)))
+ return NULL;
+
+ size = PAGE_ALIGN(size);
+ if (dma_alloc_direct(dev, ops))
+ page = dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
+ else if (ops->alloc_pages)
+ page = ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+ else
+ return NULL;
+
+ debug_dma_map_page(dev, page, 0, size, dir, *dma_handle);
+
+ return page;
+}
+EXPORT_SYMBOL_GPL(dma_alloc_pages);
+
+void dma_free_pages(struct device *dev, size_t size, struct page *page,
+ dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ size = PAGE_ALIGN(size);
+ debug_dma_unmap_page(dev, dma_handle, size, dir);
+
+ if (dma_alloc_direct(dev, ops))
+ dma_direct_free_pages(dev, size, page, dma_handle, dir);
+ else if (ops->free_pages)
+ ops->free_pages(dev, size, page, dma_handle, dir);
+}
+EXPORT_SYMBOL_GPL(dma_free_pages);
+
+void *dma_alloc_noncoherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+ struct page *page;
+
+ page = dma_alloc_pages(dev, size, dma_handle, dir, gfp);
+ if (!page)
+ return NULL;
+ return page_address(page);
+}
+EXPORT_SYMBOL_GPL(dma_alloc_noncoherent);
+
+void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+ dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir);
+}
+EXPORT_SYMBOL_GPL(dma_free_noncoherent);
+
int dma_supported(struct device *dev, u64 mask)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index e443c69be4299f..5828e5e01b7913 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -3,6 +3,7 @@
* Helpers for DMA ops implementations. These generally rely on the fact that
* the allocated memory contains normal pages in the direct kernel mapping.
*/
+#include <linux/dma-contiguous.h>
#include <linux/dma-noncoherent.h>

/*
@@ -49,3 +50,37 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
return -ENXIO;
#endif /* CONFIG_MMU */
}
+
+struct page *dma_common_alloc_pages(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ struct page *page;
+
+ page = dma_alloc_contiguous(dev, size, gfp);
+ if (!page)
+ page = alloc_pages_node(dev_to_node(dev), gfp, get_order(size));
+ if (!page)
+ return NULL;
+
+ *dma_handle = ops->map_page(dev, page, 0, size, dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ if (*dma_handle == DMA_MAPPING_ERROR) {
+ dma_free_contiguous(dev, page, size);
+ return NULL;
+ }
+
+ memset(page_address(page), 0, size);
+ return page;
+}
+
+void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
+ dma_addr_t dma_handle, enum dma_data_direction dir)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ if (ops->unmap_page)
+ ops->unmap_page(dev, dma_handle, size, dir,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ dma_free_contiguous(dev, page, size);
+}
diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c
index ebe128833af7b5..6986bf1fd6689c 100644
--- a/kernel/dma/virt.c
+++ b/kernel/dma/virt.c
@@ -55,5 +55,7 @@ const struct dma_map_ops dma_virt_ops = {
.free = dma_virt_free,
.map_page = dma_virt_map_page,
.map_sg = dma_virt_map_sg,
+ .alloc_pages = dma_common_alloc_pages,
+ .free_pages = dma_common_free_pages,
};
EXPORT_SYMBOL(dma_virt_ops);
--
2.28.0
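
The three-way dispatch in dma_alloc_pages() above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: every toy_* name is hypothetical, a NULL ops pointer stands in for the dma_alloc_direct() case, and the page size is fixed at 4 KiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
/* Same rounding idea as the kernel's PAGE_ALIGN(), for a fixed 4 KiB page. */
#define TOY_PAGE_ALIGN(x) (((x) + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1))

/* Hypothetical stand-in for struct page; records how it was allocated. */
struct toy_page {
	size_t bytes;	/* size after page alignment */
	bool via_ops;	/* true if the ops method produced it */
};

struct toy_ops {
	struct toy_page *(*alloc_pages)(size_t size);	/* may be NULL */
};

struct toy_dev {
	const struct toy_ops *ops;	/* NULL models the dma-direct case */
	unsigned long long coherent_dma_mask;
};

static struct toy_page toy_storage;

static inline struct toy_page *toy_direct_alloc(size_t size)
{
	/* Backends only ever see page-aligned sizes. */
	assert(size % TOY_PAGE_SIZE == 0);
	toy_storage.bytes = size;
	toy_storage.via_ops = false;
	return &toy_storage;
}

static inline struct toy_page *toy_ops_alloc(size_t size)
{
	assert(size % TOY_PAGE_SIZE == 0);
	toy_storage.bytes = size;
	toy_storage.via_ops = true;
	return &toy_storage;
}

static inline struct toy_page *toy_dma_alloc_pages(const struct toy_dev *dev,
						   size_t size)
{
	if (!dev->coherent_dma_mask)
		return NULL;	/* the patch also warns once here */

	size = TOY_PAGE_ALIGN(size);
	if (!dev->ops)
		return toy_direct_alloc(size);
	if (dev->ops->alloc_pages)
		return dev->ops->alloc_pages(size);
	return NULL;		/* ops table without the method */
}

/* Example devices covering each branch. */
static const struct toy_ops toy_iommu_ops = { .alloc_pages = toy_ops_alloc };
static const struct toy_ops toy_legacy_ops = { .alloc_pages = NULL };
static const struct toy_dev toy_direct_dev = { NULL, ~0ULL };
static const struct toy_dev toy_iommu_dev = { &toy_iommu_ops, ~0ULL };
static const struct toy_dev toy_legacy_dev = { &toy_legacy_ops, ~0ULL };
static const struct toy_dev toy_nomask_dev = { NULL, 0 };
```

In this model a request for 4097 bytes on toy_iommu_dev is rounded up to two pages before it reaches the ops method, and both toy_legacy_dev and toy_nomask_dev return NULL, which mirrors the fallthrough and the coherent_dma_mask check in the patch.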

2020-09-15 22:26:13

by Christoph Hellwig

Subject: [PATCH 02/18] mm: turn alloc_pages into an inline function

To prevent a compiler error when a method called alloc_pages is
added (which I plan to do for the dma_map_ops).

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/gfp.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 67a0774e080b98..dd2577c5407112 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -550,8 +550,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true)
#else
-#define alloc_pages(gfp_mask, order) \
- alloc_pages_node(numa_node_id(), gfp_mask, order)
+static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+ return alloc_pages_node(numa_node_id(), gfp_mask, order);
+}
#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
alloc_pages(gfp_mask, order)
#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
--
2.28.0
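
The compiler error this patch avoids can be reproduced in userspace C. A function-like macro is expanded wherever its name is followed by an opening parenthesis, including a member call such as ops->alloc_pages(...), while an inline function does not clash with a struct member of the same name. The sketch below uses hypothetical toy types; only the name alloc_pages is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Toy stand-in for the kernel type; the order field is hypothetical. */
struct page {
	unsigned int order;
};

static struct page toy_page;

/*
 * If this were "#define alloc_pages(gfp_mask, order) ...", the member call
 * in toy_dma_alloc() below would be parsed as an invocation of that
 * two-argument macro with three arguments and would fail to preprocess.
 */
static inline struct page *alloc_pages(unsigned int gfp_mask,
				       unsigned int order)
{
	(void)gfp_mask;
	toy_page.order = order;
	return &toy_page;
}

struct toy_dma_ops {
	/* A method that shares its name with the function above. */
	struct page *(*alloc_pages)(size_t size, int dir, unsigned int gfp);
};

static inline struct page *toy_alloc_pages_op(size_t size, int dir,
					      unsigned int gfp)
{
	unsigned int order = 0;

	(void)dir;
	/* Smallest power-of-two number of pages that covers size. */
	while (((size_t)TOY_PAGE_SIZE << order) < size)
		order++;
	return alloc_pages(gfp, order);
}

static const struct toy_dma_ops toy_ops = {
	.alloc_pages = toy_alloc_pages_op,
};

static inline struct page *toy_dma_alloc(const struct toy_dma_ops *ops,
					 size_t size)
{
	assert(ops->alloc_pages != NULL);
	/* Member call: resolved through the struct, not the inline function. */
	return ops->alloc_pages(size, 0, 0);
}
```

Calling toy_dma_alloc(&toy_ops, 8192) goes through the method and ends up in the inline alloc_pages() with order 1. The designated initializer .alloc_pages = ... would compile in either case, because no parenthesis follows the name there.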

2020-09-15 22:26:13

by Christoph Hellwig

Subject: [PATCH 14/18] dma-mapping: remove dma_cache_sync

All users are gone now, remove the API.

Signed-off-by: Christoph Hellwig <[email protected]>
---
arch/mips/Kconfig | 1 -
arch/mips/jazz/jazzdma.c | 1 -
arch/mips/mm/dma-noncoherent.c | 6 ------
arch/parisc/Kconfig | 1 -
arch/parisc/kernel/pci-dma.c | 6 ------
include/linux/dma-mapping.h | 8 --------
include/linux/dma-noncoherent.h | 10 ----------
kernel/dma/Kconfig | 3 ---
kernel/dma/mapping.c | 14 --------------
9 files changed, 50 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c95fa3a2484cf0..1be91c5d666e61 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1134,7 +1134,6 @@ config DMA_NONCOHERENT
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_DMA_SET_UNCACHED
select DMA_NONCOHERENT_MMAP
- select DMA_NONCOHERENT_CACHE_SYNC
select NEED_DMA_MAP_STATE

config SYS_HAS_EARLY_PRINTK
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index dab4d058cea9b1..2bf849caf507b1 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -620,7 +620,6 @@ const struct dma_map_ops jazz_dma_ops = {
.sync_single_for_device = jazz_dma_sync_single_for_device,
.sync_sg_for_cpu = jazz_dma_sync_sg_for_cpu,
.sync_sg_for_device = jazz_dma_sync_sg_for_device,
- .cache_sync = arch_dma_cache_sync,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
};
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index 97a14adbafc99c..f34ad1f09799f1 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -137,12 +137,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
}
#endif

-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction direction)
-{
- dma_sync_virt_for_device(vaddr, size, direction);
-}
-
#ifdef CONFIG_DMA_PERDEV_COHERENT
void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 3b0f53dd70bc9b..ed15da1da174e0 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -195,7 +195,6 @@ config PA11
depends on PA7000 || PA7100LC || PA7200 || PA7300LC
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
- select DMA_NONCOHERENT_CACHE_SYNC

config PREFETCH
def_bool y
diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c
index 38c68e131bbe2a..ce38c0b9158125 100644
--- a/arch/parisc/kernel/pci-dma.c
+++ b/arch/parisc/kernel/pci-dma.c
@@ -454,9 +454,3 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
{
flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size);
}
-
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction direction)
-{
- flush_kernel_dcache_range((unsigned long)vaddr, size);
-}
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4e1de194b45cbf..5b4e97b0846fd3 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -123,8 +123,6 @@ struct dma_map_ops {
void (*sync_sg_for_device)(struct device *dev,
struct scatterlist *sg, int nents,
enum dma_data_direction dir);
- void (*cache_sync)(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction direction);
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
@@ -254,8 +252,6 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle);
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction dir);
int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs);
@@ -339,10 +335,6 @@ static inline void dmam_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t dma_handle)
{
}
-static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction dir)
-{
-}
static inline int dma_get_sgtable_attrs(struct device *dev,
struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr,
size_t size, unsigned long attrs)
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index b9bc6c557ea46f..0888656369a45b 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -62,16 +62,6 @@ static inline pgprot_t dma_pgprot(struct device *dev, pgprot_t prot,
}
#endif /* CONFIG_MMU */

-#ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction direction);
-#else
-static inline void arch_dma_cache_sync(struct device *dev, void *vaddr,
- size_t size, enum dma_data_direction direction)
-{
-}
-#endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */
-
#ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE
void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
enum dma_data_direction dir);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 281785feb874db..c5f717021f5654 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -75,9 +75,6 @@ config ARCH_HAS_DMA_PREP_COHERENT
config ARCH_HAS_FORCE_DMA_UNENCRYPTED
bool

-config DMA_NONCOHERENT_CACHE_SYNC
- bool
-
config DMA_VIRT_OPS
bool
depends on HAS_DMA
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 848c95c27d79ff..e71abcec8d3913 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -518,20 +518,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
EXPORT_SYMBOL(dma_set_coherent_mask);
#endif

-void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
- enum dma_data_direction dir)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
-
- if (dma_alloc_direct(dev, ops))
- arch_dma_cache_sync(dev, vaddr, size, dir);
- else if (ops->cache_sync)
- ops->cache_sync(dev, vaddr, size, dir);
-}
-EXPORT_SYMBOL(dma_cache_sync);
-
size_t dma_max_mapping_size(struct device *dev)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
--
2.28.0

2020-09-15 22:27:24

by Christoph Hellwig

Subject: [PATCH 04/18] drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT

DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS
configs, so don't set it in this ARM-specific part of the driver.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 985f2990ab0dda..13d4d7ac0697b4 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -594,8 +594,7 @@ gk20a_instmem_new(struct nvkm_device *device, int index,

nvkm_info(&imem->base.subdev, "using IOMMU\n");
} else {
- imem->attrs = DMA_ATTR_NON_CONSISTENT |
- DMA_ATTR_WEAK_ORDERING |
+ imem->attrs = DMA_ATTR_WEAK_ORDERING |
DMA_ATTR_WRITE_COMBINE;

nvkm_info(&imem->base.subdev, "using DMA API\n");
--
2.28.0

2020-09-15 22:29:40

by Christoph Hellwig

Subject: [PATCH 13/18] 53c700: convert to dma_alloc_noncoherent

Use the new non-coherent DMA API including proper ownership transfers.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/scsi/53c700.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index c59226d7e2f6b5..5117d90ccd9edf 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,18 +269,25 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
spi_period(SDp->sdev_target));
}

+static inline dma_addr_t virt_to_dma(struct NCR_700_Host_Parameters *h, void *p)
+{
+ return h->pScript + ((uintptr_t)p - (uintptr_t)h->script);
+}
+
static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
void *addr, size_t size)
{
if (h->noncoherent)
- dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+ dma_sync_single_for_device(h->dev, virt_to_dma(h, addr),
+ size, DMA_BIDIRECTIONAL);
}

static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
void *addr, size_t size)
{
if (h->noncoherent)
- dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+ dma_sync_single_for_device(h->dev, virt_to_dma(h, addr), size,
+ DMA_BIDIRECTIONAL);
}

struct Scsi_Host *
@@ -300,8 +307,8 @@ NCR_700_detect(struct scsi_host_template *tpnt,
memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
if (!memory) {
hostdata->noncoherent = 1;
- memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
- GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+ memory = dma_alloc_noncoherent(dev, TOTAL_MEM_SIZE, &pScript,
+ DMA_BIDIRECTIONAL, GFP_KERNEL);
}
if (!memory) {
printk(KERN_ERR "53c700: Failed to allocate memory for driver, detaching\n");
@@ -414,8 +421,9 @@ NCR_700_release(struct Scsi_Host *host)
(struct NCR_700_Host_Parameters *)host->hostdata[0];

if (hostdata->noncoherent)
- dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
- hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(hostdata->dev, TOTAL_MEM_SIZE,
+ hostdata->script, hostdata->pScript,
+ DMA_BIDIRECTIONAL);
else
dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
hostdata->script, hostdata->pScript);
--
2.28.0
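
The virt_to_dma() helper added above relies on the driver memory being a single contiguous allocation, so a CPU pointer into it and the matching device address share the same byte offset from their respective bases. A userspace sketch of that arithmetic follows. The toy_* names, the size field and the bounds assertion are illustrative additions and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* stand-in for the kernel typedef */

/* Hypothetical, reduced version of the driver's per-host state. */
struct toy_host {
	void *script;		/* CPU address of the allocation */
	dma_addr_t pScript;	/* device address of the same allocation */
	size_t size;		/* length of the allocation in bytes */
};

static inline dma_addr_t toy_virt_to_dma(const struct toy_host *h,
					 const void *p)
{
	uintptr_t off = (uintptr_t)p - (uintptr_t)h->script;

	/* The pointer must lie inside the allocation for the offset to hold. */
	assert(off < h->size);
	return h->pScript + off;
}

/* Example state: a 256-byte buffer that the device sees at 0x1000. */
static unsigned char toy_buf[256];
static const struct toy_host toy_example = {
	.script = toy_buf,
	.pScript = 0x1000,
	.size = sizeof(toy_buf),
};
```

With this state, a pointer 0x40 bytes into toy_buf translates to device address 0x1040. The result is the handle form that dma_sync_single_for_device() and dma_sync_single_for_cpu() take, which is why the patch needs the helper once dma_cache_sync() and its virtual-address argument go away.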

2020-09-15 22:29:45

by Christoph Hellwig

Subject: [PATCH 11/18] lib82596: convert to dma_alloc_noncoherent

Use the new non-coherent DMA API including proper ownership transfers.
This includes moving the DMA helpers to lib82596 based on an ifdef to
avoid include order problems.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/net/ethernet/i825xx/lasi_82596.c | 25 ++---
drivers/net/ethernet/i825xx/lib82596.c | 114 ++++++++++++++---------
drivers/net/ethernet/i825xx/sni_82596.c | 4 -
3 files changed, 80 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c
index a12218e940a2fa..96c6f4f36904ed 100644
--- a/drivers/net/ethernet/i825xx/lasi_82596.c
+++ b/drivers/net/ethernet/i825xx/lasi_82596.c
@@ -96,21 +96,14 @@

#define OPT_SWAP_PORT 0x0001 /* Need to wordswp on the MPU port */

-#define DMA_WBACK(ndev, addr, len) \
- do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
-
-#define DMA_INV(ndev, addr, len) \
- do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_FROM_DEVICE); } while (0)
-
-#define DMA_WBACK_INV(ndev, addr, len) \
- do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_BIDIRECTIONAL); } while (0)
-
#define SYSBUS 0x0000006c

/* big endian CPU, 82596 "big" endian mode */
#define SWAP32(x) (((u32)(x)<<16) | ((((u32)(x)))>>16))
#define SWAP16(x) (x)

+#define NONCOHERENT_DMA 1
+
#include "lib82596.c"

MODULE_AUTHOR("Richard Hirst");
@@ -184,9 +177,9 @@ lan_init_chip(struct parisc_device *dev)

lp = netdev_priv(netdevice);
lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0;
- lp->dma = dma_alloc_attrs(&dev->dev, sizeof(struct i596_dma),
- &lp->dma_addr, GFP_KERNEL,
- DMA_ATTR_NON_CONSISTENT);
+ lp->dma = dma_alloc_noncoherent(&dev->dev,
+ sizeof(struct i596_dma), &lp->dma_addr,
+ DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!lp->dma)
goto out_free_netdev;

@@ -196,8 +189,8 @@ lan_init_chip(struct parisc_device *dev)
return 0;

out_free_dma:
- dma_free_attrs(&dev->dev, sizeof(struct i596_dma), lp->dma,
- lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(&dev->dev, sizeof(struct i596_dma),
+ lp->dma, lp->dma_addr, DMA_BIDIRECTIONAL);
out_free_netdev:
free_netdev(netdevice);
return retval;
@@ -209,8 +202,8 @@ static int __exit lan_remove_chip(struct parisc_device *pdev)
struct i596_private *lp = netdev_priv(dev);

unregister_netdev (dev);
- dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma,
- lp->dma_addr, DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(&pdev->dev, sizeof(struct i596_private), lp->dma,
+ lp->dma_addr, DMA_BIDIRECTIONAL);
free_netdev (dev);
return 0;
}
diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c
index b4e4b3eb5758b5..ca2fb303fcc6f6 100644
--- a/drivers/net/ethernet/i825xx/lib82596.c
+++ b/drivers/net/ethernet/i825xx/lib82596.c
@@ -365,13 +365,44 @@ static int max_cmd_backlog = TX_RING_SIZE-1;
static void i596_poll_controller(struct net_device *dev);
#endif

+static inline dma_addr_t virt_to_dma(struct i596_private *lp, volatile void *v)
+{
+ return lp->dma_addr + ((unsigned long)v - (unsigned long)lp->dma);
+}
+
+#ifdef NONCOHERENT_DMA
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+ size_t len)
+{
+ dma_sync_single_for_device(ndev->dev.parent,
+ virt_to_dma(netdev_priv(ndev), addr), len,
+ DMA_BIDIRECTIONAL);
+}
+
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+ size_t len)
+{
+ dma_sync_single_for_cpu(ndev->dev.parent,
+ virt_to_dma(netdev_priv(ndev), addr), len,
+ DMA_BIDIRECTIONAL);
+}
+#else
+static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr,
+ size_t len)
+{
+}
+static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr,
+ size_t len)
+{
+}
+#endif /* NONCOHERENT_DMA */

static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int delcnt, char *str)
{
- DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+ dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
while (--delcnt && dma->iscp.stat) {
udelay(10);
- DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp));
+ dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp));
}
if (!delcnt) {
printk(KERN_ERR "%s: %s, iscp.stat %04x, didn't clear\n",
@@ -384,10 +415,10 @@ static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int d

static inline int wait_cmd(struct net_device *dev, struct i596_dma *dma, int delcnt, char *str)
{
- DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+ dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
while (--delcnt && dma->scb.command) {
udelay(10);
- DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb));
+ dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb));
}
if (!delcnt) {
printk(KERN_ERR "%s: %s, status %4.4x, cmd %4.4x.\n",
@@ -451,12 +482,9 @@ static void i596_display_data(struct net_device *dev)
SWAP32(rbd->b_data), SWAP16(rbd->size));
rbd = rbd->v_next;
} while (rbd != lp->rbd_head);
- DMA_INV(dev, dma, sizeof(struct i596_dma));
+ dma_sync_cpu(dev, dma, sizeof(struct i596_dma));
}

-
-#define virt_to_dma(lp, v) ((lp)->dma_addr + (dma_addr_t)((unsigned long)(v)-(unsigned long)((lp)->dma)))
-
static inline int init_rx_bufs(struct net_device *dev)
{
struct i596_private *lp = netdev_priv(dev);
@@ -508,7 +536,7 @@ static inline int init_rx_bufs(struct net_device *dev)
rfd->b_next = SWAP32(virt_to_dma(lp, dma->rfds));
rfd->cmd = SWAP16(CMD_EOL|CMD_FLEX);

- DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+ dma_sync_dev(dev, dma, sizeof(struct i596_dma));
return 0;
}

@@ -547,7 +575,7 @@ static void rebuild_rx_bufs(struct net_device *dev)
lp->rbd_head = dma->rbds;
dma->rfds[0].rbd = SWAP32(virt_to_dma(lp, dma->rbds));

- DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma));
+ dma_sync_dev(dev, dma, sizeof(struct i596_dma));
}


@@ -575,9 +603,9 @@ static int init_i596_mem(struct net_device *dev)

DEB(DEB_INIT, printk(KERN_DEBUG "%s: starting i82596.\n", dev->name));

- DMA_WBACK(dev, &(dma->scp), sizeof(struct i596_scp));
- DMA_WBACK(dev, &(dma->iscp), sizeof(struct i596_iscp));
- DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(dma->scp), sizeof(struct i596_scp));
+ dma_sync_dev(dev, &(dma->iscp), sizeof(struct i596_iscp));
+ dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));

mpu_port(dev, PORT_ALTSCP, virt_to_dma(lp, &dma->scp));
ca(dev);
@@ -596,24 +624,24 @@ static int init_i596_mem(struct net_device *dev)
rebuild_rx_bufs(dev);

dma->scb.command = 0;
- DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));

DEB(DEB_INIT, printk(KERN_DEBUG
"%s: queuing CmdConfigure\n", dev->name));
memcpy(dma->cf_cmd.i596_config, init_setup, 14);
dma->cf_cmd.cmd.command = SWAP16(CmdConfigure);
- DMA_WBACK(dev, &(dma->cf_cmd), sizeof(struct cf_cmd));
+ dma_sync_dev(dev, &(dma->cf_cmd), sizeof(struct cf_cmd));
i596_add_cmd(dev, &dma->cf_cmd.cmd);

DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdSASetup\n", dev->name));
memcpy(dma->sa_cmd.eth_addr, dev->dev_addr, ETH_ALEN);
dma->sa_cmd.cmd.command = SWAP16(CmdSASetup);
- DMA_WBACK(dev, &(dma->sa_cmd), sizeof(struct sa_cmd));
+ dma_sync_dev(dev, &(dma->sa_cmd), sizeof(struct sa_cmd));
i596_add_cmd(dev, &dma->sa_cmd.cmd);

DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdTDR\n", dev->name));
dma->tdr_cmd.cmd.command = SWAP16(CmdTDR);
- DMA_WBACK(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd));
+ dma_sync_dev(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd));
i596_add_cmd(dev, &dma->tdr_cmd.cmd);

spin_lock_irqsave (&lp->lock, flags);
@@ -625,7 +653,7 @@ static int init_i596_mem(struct net_device *dev)
DEB(DEB_INIT, printk(KERN_DEBUG "%s: Issuing RX_START\n", dev->name));
dma->scb.command = SWAP16(RX_START);
dma->scb.rfd = SWAP32(virt_to_dma(lp, dma->rfds));
- DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));

ca(dev);

@@ -659,13 +687,13 @@ static inline int i596_rx(struct net_device *dev)

rfd = lp->rfd_head; /* Ref next frame to check */

- DMA_INV(dev, rfd, sizeof(struct i596_rfd));
+ dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd));
while (rfd->stat & SWAP16(STAT_C)) { /* Loop while complete frames */
if (rfd->rbd == I596_NULL)
rbd = NULL;
else if (rfd->rbd == lp->rbd_head->b_addr) {
rbd = lp->rbd_head;
- DMA_INV(dev, rbd, sizeof(struct i596_rbd));
+ dma_sync_cpu(dev, rbd, sizeof(struct i596_rbd));
} else {
printk(KERN_ERR "%s: rbd chain broken!\n", dev->name);
/* XXX Now what? */
@@ -713,7 +741,7 @@ static inline int i596_rx(struct net_device *dev)
DMA_FROM_DEVICE);
rbd->v_data = newskb->data;
rbd->b_data = SWAP32(dma_addr);
- DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
+ dma_sync_dev(dev, rbd, sizeof(struct i596_rbd));
} else {
skb = netdev_alloc_skb_ip_align(dev, pkt_len);
}
@@ -765,7 +793,7 @@ static inline int i596_rx(struct net_device *dev)
if (rbd != NULL && (rbd->count & SWAP16(0x4000))) {
rbd->count = 0;
lp->rbd_head = rbd->v_next;
- DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd));
+ dma_sync_dev(dev, rbd, sizeof(struct i596_rbd));
}

/* Tidy the frame descriptor, marking it as end of list */
@@ -779,14 +807,14 @@ static inline int i596_rx(struct net_device *dev)

lp->dma->scb.rfd = rfd->b_next;
lp->rfd_head = rfd->v_next;
- DMA_WBACK_INV(dev, rfd, sizeof(struct i596_rfd));
+ dma_sync_dev(dev, rfd, sizeof(struct i596_rfd));

/* Remove end-of-list from old end descriptor */

rfd->v_prev->cmd = SWAP16(CMD_FLEX);
- DMA_WBACK_INV(dev, rfd->v_prev, sizeof(struct i596_rfd));
+ dma_sync_dev(dev, rfd->v_prev, sizeof(struct i596_rfd));
rfd = lp->rfd_head;
- DMA_INV(dev, rfd, sizeof(struct i596_rfd));
+ dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd));
}

DEB(DEB_RXFRAME, printk(KERN_DEBUG "frames %d\n", frames));
@@ -827,12 +855,12 @@ static inline void i596_cleanup_cmd(struct net_device *dev, struct i596_private
ptr->v_next = NULL;
ptr->b_next = I596_NULL;
}
- DMA_WBACK_INV(dev, ptr, sizeof(struct i596_cmd));
+ dma_sync_dev(dev, ptr, sizeof(struct i596_cmd));
}

wait_cmd(dev, lp->dma, 100, "i596_cleanup_cmd timed out");
lp->dma->scb.cmd = I596_NULL;
- DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
}


@@ -850,7 +878,7 @@ static inline void i596_reset(struct net_device *dev, struct i596_private *lp)

/* FIXME: this command might cause an lpmc */
lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT);
- DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
ca(dev);

/* wait for shutdown */
@@ -878,20 +906,20 @@ static void i596_add_cmd(struct net_device *dev, struct i596_cmd *cmd)
cmd->command |= SWAP16(CMD_EOL | CMD_INTR);
cmd->v_next = NULL;
cmd->b_next = I596_NULL;
- DMA_WBACK(dev, cmd, sizeof(struct i596_cmd));
+ dma_sync_dev(dev, cmd, sizeof(struct i596_cmd));

spin_lock_irqsave (&lp->lock, flags);

if (lp->cmd_head != NULL) {
lp->cmd_tail->v_next = cmd;
lp->cmd_tail->b_next = SWAP32(virt_to_dma(lp, &cmd->status));
- DMA_WBACK(dev, lp->cmd_tail, sizeof(struct i596_cmd));
+ dma_sync_dev(dev, lp->cmd_tail, sizeof(struct i596_cmd));
} else {
lp->cmd_head = cmd;
wait_cmd(dev, dma, 100, "i596_add_cmd timed out");
dma->scb.cmd = SWAP32(virt_to_dma(lp, &cmd->status));
dma->scb.command = SWAP16(CUC_START);
- DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
ca(dev);
}
lp->cmd_tail = cmd;
@@ -956,7 +984,7 @@ static void i596_tx_timeout (struct net_device *dev, unsigned int txqueue)
/* Issue a channel attention signal */
DEB(DEB_ERRORS, printk(KERN_DEBUG "Kicking board.\n"));
lp->dma->scb.command = SWAP16(CUC_START | RX_START);
- DMA_WBACK_INV(dev, &(lp->dma->scb), sizeof(struct i596_scb));
+ dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb));
ca (dev);
lp->last_restart = dev->stats.tx_packets;
}
@@ -1014,8 +1042,8 @@ static netdev_tx_t i596_start_xmit(struct sk_buff *skb, struct net_device *dev)
tbd->data = SWAP32(tx_cmd->dma_addr);

DEB(DEB_TXADDR, print_eth(skb->data, "tx-queued"));
- DMA_WBACK_INV(dev, tx_cmd, sizeof(struct tx_cmd));
- DMA_WBACK_INV(dev, tbd, sizeof(struct i596_tbd));
+ dma_sync_dev(dev, tx_cmd, sizeof(struct tx_cmd));
+ dma_sync_dev(dev, tbd, sizeof(struct i596_tbd));
i596_add_cmd(dev, &tx_cmd->cmd);

dev->stats.tx_packets++;
@@ -1071,7 +1099,7 @@ static int i82596_probe(struct net_device *dev)
lp->dma->scb.rfd = I596_NULL;
spin_lock_init(&lp->lock);

- DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
+ dma_sync_dev(dev, lp->dma, sizeof(struct i596_dma));

ret = register_netdev(dev);
if (ret)
@@ -1141,7 +1169,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
dev->name, status & 0x0700));

while (lp->cmd_head != NULL) {
- DMA_INV(dev, lp->cmd_head, sizeof(struct i596_cmd));
+ dma_sync_cpu(dev, lp->cmd_head, sizeof(struct i596_cmd));
if (!(lp->cmd_head->status & SWAP16(STAT_C)))
break;

@@ -1223,7 +1251,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
}
ptr->v_next = NULL;
ptr->b_next = I596_NULL;
- DMA_WBACK(dev, ptr, sizeof(struct i596_cmd));
+ dma_sync_dev(dev, ptr, sizeof(struct i596_cmd));
lp->last_cmd = jiffies;
}

@@ -1237,13 +1265,13 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)

ptr->command &= SWAP16(0x1fff);
ptr = ptr->v_next;
- DMA_WBACK_INV(dev, prev, sizeof(struct i596_cmd));
+ dma_sync_dev(dev, prev, sizeof(struct i596_cmd));
}

if (lp->cmd_head != NULL)
ack_cmd |= CUC_START;
dma->scb.cmd = SWAP32(virt_to_dma(lp, &lp->cmd_head->status));
- DMA_WBACK_INV(dev, &dma->scb, sizeof(struct i596_scb));
+ dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));
}
if ((status & 0x1000) || (status & 0x4000)) {
if ((status & 0x4000))
@@ -1268,7 +1296,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
}
wait_cmd(dev, dma, 100, "i596 interrupt, timeout");
dma->scb.command = SWAP16(ack_cmd);
- DMA_WBACK(dev, &dma->scb, sizeof(struct i596_scb));
+ dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));

/* DANGER: I suspect that some kind of interrupt
acknowledgement aside from acking the 82596 might be needed
@@ -1299,7 +1327,7 @@ static int i596_close(struct net_device *dev)

wait_cmd(dev, lp->dma, 100, "close1 timed out");
lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT);
- DMA_WBACK(dev, &lp->dma->scb, sizeof(struct i596_scb));
+ dma_sync_dev(dev, &lp->dma->scb, sizeof(struct i596_scb));

ca(dev);

@@ -1358,7 +1386,7 @@ static void set_multicast_list(struct net_device *dev)
dev->name);
else {
dma->cf_cmd.cmd.command = SWAP16(CmdConfigure);
- DMA_WBACK_INV(dev, &dma->cf_cmd, sizeof(struct cf_cmd));
+ dma_sync_dev(dev, &dma->cf_cmd, sizeof(struct cf_cmd));
i596_add_cmd(dev, &dma->cf_cmd.cmd);
}
}
@@ -1390,7 +1418,7 @@ static void set_multicast_list(struct net_device *dev)
dev->name, cp));
cp += ETH_ALEN;
}
- DMA_WBACK_INV(dev, &dma->mc_cmd, sizeof(struct mc_cmd));
+ dma_sync_dev(dev, &dma->mc_cmd, sizeof(struct mc_cmd));
i596_add_cmd(dev, &cmd->cmd);
}
}
diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c
index 4b9ac0c6557731..27937c5d795673 100644
--- a/drivers/net/ethernet/i825xx/sni_82596.c
+++ b/drivers/net/ethernet/i825xx/sni_82596.c
@@ -24,10 +24,6 @@

static const char sni_82596_string[] = "snirm_82596";

-#define DMA_WBACK(priv, addr, len) do { } while (0)
-#define DMA_INV(priv, addr, len) do { } while (0)
-#define DMA_WBACK_INV(priv, addr, len) do { } while (0)
-
#define SYSBUS 0x00004400

/* big endian CPU, 82596 little endian */
--
2.28.0

2020-09-15 22:32:11

by Christoph Hellwig

Subject: [PATCH 09/18] sgiwd93: convert to dma_alloc_noncoherent

Use the new non-coherent DMA API including proper ownership transfers.
This also means we can allocate the memory as DMA_TO_DEVICE instead
of bidirectional.
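
For readers unfamiliar with the length expression passed to the sync
call in fill_hpc_entries(), the following userspace sketch shows the
same arithmetic: the number of bytes from the start of the descriptor
block up to and including the last descriptor written. The fake_chunk
layout is made up for the example; the real descriptor layout is not
shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor, used only to have something with a size. */
struct fake_chunk {
	uint32_t pbuf;
	uint32_t cntinfo;
	uint32_t pnext;
	uint32_t pad;
};

/* Bytes spanned from the start of the block through *last, inclusive. */
static size_t fake_sync_len(const struct fake_chunk *base,
			    const struct fake_chunk *last)
{
	assert(last >= base);
	return (size_t)((uintptr_t)(last + 1) - (uintptr_t)base);
}
```

Syncing only this prefix rather than the whole allocation keeps the
cache maintenance proportional to the descriptors actually filled in.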

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/scsi/sgiwd93.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c
index 3bdf0deb8f1529..cf1030c9dda17f 100644
--- a/drivers/scsi/sgiwd93.c
+++ b/drivers/scsi/sgiwd93.c
@@ -95,7 +95,7 @@ void fill_hpc_entries(struct ip22_hostdata *hd, struct scsi_cmnd *cmd, int din)
*/
hcp->desc.pbuf = 0;
hcp->desc.cntinfo = HPCDMA_EOX;
- dma_cache_sync(hd->dev, hd->cpu,
+ dma_sync_single_for_device(hd->dev, hd->dma,
(unsigned long)(hcp + 1) - (unsigned long)hd->cpu,
DMA_TO_DEVICE);
}
@@ -234,8 +234,8 @@ static int sgiwd93_probe(struct platform_device *pdev)

hdata = host_to_hostdata(host);
hdata->dev = &pdev->dev;
- hdata->cpu = dma_alloc_attrs(&pdev->dev, HPC_DMA_SIZE, &hdata->dma,
- GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+ hdata->cpu = dma_alloc_noncoherent(&pdev->dev, HPC_DMA_SIZE,
+ &hdata->dma, DMA_TO_DEVICE, GFP_KERNEL);
if (!hdata->cpu) {
printk(KERN_WARNING "sgiwd93: Could not allocate memory for "
"host %d buffer.\n", unit);
@@ -274,8 +274,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
out_irq:
free_irq(irq, host);
out_free:
- dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
- DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+ DMA_TO_DEVICE);
out_put:
scsi_host_put(host);
out:
@@ -291,8 +291,8 @@ static int sgiwd93_remove(struct platform_device *pdev)

scsi_remove_host(host);
free_irq(pd->irq, host);
- dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
- DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma,
+ DMA_TO_DEVICE);
scsi_host_put(host);
return 0;
}
--
2.28.0

2020-09-15 22:33:10

by Christoph Hellwig

Subject: [PATCH 10/18] hal2: convert to dma_alloc_noncoherent

Use the new non-coherent DMA API including proper ownership transfers.
This also means we can allocate the buffer memory with the proper
direction instead of bidirectional.
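
A small userspace sketch of the two ideas in this conversion, under the
assumption that playback data only flows to the device and capture data
only flows from it: the buffer direction is chosen per stream, and the
sync calls take a bus address computed as the buffer's DMA base plus
the current hardware offset. All fake_* names are hypothetical and only
model the kernel's enum dma_data_direction and dma_addr_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for enum dma_data_direction. */
enum fake_dir { FAKE_BIDIRECTIONAL, FAKE_TO_DEVICE, FAKE_FROM_DEVICE };

/* Playback buffers are written by the CPU, capture buffers by the device. */
static enum fake_dir fake_buffer_dir(bool is_playback)
{
	return is_playback ? FAKE_TO_DEVICE : FAKE_FROM_DEVICE;
}

/* Bus address of the region to sync: DMA base plus offset into the buffer. */
static uint64_t fake_sync_addr(uint64_t buffer_dma, uint32_t hw_data,
			       uint32_t buf_size)
{
	assert(hw_data < buf_size); /* offset must stay inside the buffer */
	return buffer_dma + hw_data;
}
```

The descriptor ring stays bidirectional in the patch, so only the data
buffers use the per-stream direction.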

Signed-off-by: Christoph Hellwig <[email protected]>
---
sound/mips/hal2.c | 58 ++++++++++++++++++++++-------------------------
1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c
index ec84bc4c3a6e77..9ac9b58d7c8cdd 100644
--- a/sound/mips/hal2.c
+++ b/sound/mips/hal2.c
@@ -441,7 +441,8 @@ static inline void hal2_stop_adc(struct snd_hal2 *hal2)
hal2->adc.pbus.pbus->pbdma_ctrl = HPC3_PDMACTRL_LD;
}

-static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+ enum dma_data_direction buffer_dir)
{
struct device *dev = hal2->card->dev;
struct hal2_desc *desc;
@@ -449,15 +450,15 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
int count = H2_BUF_SIZE / H2_BLOCK_SIZE;
int i;

- codec->buffer = dma_alloc_attrs(dev, H2_BUF_SIZE, &buffer_dma,
- GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+ codec->buffer = dma_alloc_noncoherent(dev, H2_BUF_SIZE, &buffer_dma,
+ buffer_dir, GFP_KERNEL);
if (!codec->buffer)
return -ENOMEM;
- desc = dma_alloc_attrs(dev, count * sizeof(struct hal2_desc),
- &desc_dma, GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+ desc = dma_alloc_noncoherent(dev, count * sizeof(struct hal2_desc),
+ &desc_dma, DMA_BIDIRECTIONAL, GFP_KERNEL);
if (!desc) {
- dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
- DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(dev, H2_BUF_SIZE, codec->buffer, buffer_dma,
+ buffer_dir);
return -ENOMEM;
}
codec->buffer_dma = buffer_dma;
@@ -470,20 +471,22 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
desc_dma : desc_dma + (i + 1) * sizeof(struct hal2_desc);
desc++;
}
- dma_cache_sync(dev, codec->desc, count * sizeof(struct hal2_desc),
- DMA_TO_DEVICE);
+ dma_sync_single_for_device(dev, codec->desc_dma,
+ count * sizeof(struct hal2_desc),
+ DMA_BIDIRECTIONAL);
codec->desc_count = count;
return 0;
}

-static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec)
+static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec,
+ enum dma_data_direction buffer_dir)
{
struct device *dev = hal2->card->dev;

- dma_free_attrs(dev, codec->desc_count * sizeof(struct hal2_desc),
- codec->desc, codec->desc_dma, DMA_ATTR_NON_CONSISTENT);
- dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
- DMA_ATTR_NON_CONSISTENT);
+ dma_free_noncoherent(dev, codec->desc_count * sizeof(struct hal2_desc),
+ codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL);
+ dma_free_noncoherent(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma,
+ buffer_dir);
}

static const struct snd_pcm_hardware hal2_pcm_hw = {
@@ -509,21 +512,16 @@ static int hal2_playback_open(struct snd_pcm_substream *substream)
{
struct snd_pcm_runtime *runtime = substream->runtime;
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
- int err;

runtime->hw = hal2_pcm_hw;
-
- err = hal2_alloc_dmabuf(hal2, &hal2->dac);
- if (err)
- return err;
- return 0;
+ return hal2_alloc_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
}

static int hal2_playback_close(struct snd_pcm_substream *substream)
{
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);

- hal2_free_dmabuf(hal2, &hal2->dac);
+ hal2_free_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE);
return 0;
}

@@ -579,7 +577,9 @@ static void hal2_playback_transfer(struct snd_pcm_substream *substream,
unsigned char *buf = hal2->dac.buffer + rec->hw_data;

memcpy(buf, substream->runtime->dma_area + rec->sw_data, bytes);
- dma_cache_sync(hal2->card->dev, buf, bytes, DMA_TO_DEVICE);
+ dma_sync_single_for_device(hal2->card->dev,
+ hal2->dac.buffer_dma + rec->hw_data, bytes,
+ DMA_TO_DEVICE);

}

@@ -597,22 +597,16 @@ static int hal2_capture_open(struct snd_pcm_substream *substream)
{
struct snd_pcm_runtime *runtime = substream->runtime;
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
- struct hal2_codec *adc = &hal2->adc;
- int err;

runtime->hw = hal2_pcm_hw;
-
- err = hal2_alloc_dmabuf(hal2, adc);
- if (err)
- return err;
- return 0;
+ return hal2_alloc_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE);
}

static int hal2_capture_close(struct snd_pcm_substream *substream)
{
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);

- hal2_free_dmabuf(hal2, &hal2->adc);
+ hal2_free_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE);
return 0;
}

@@ -667,7 +661,9 @@ static void hal2_capture_transfer(struct snd_pcm_substream *substream,
struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
unsigned char *buf = hal2->adc.buffer + rec->hw_data;

- dma_cache_sync(hal2->card->dev, buf, bytes, DMA_FROM_DEVICE);
+ dma_sync_single_for_cpu(hal2->card->dev,
+ hal2->adc.buffer_dma + rec->hw_data, bytes,
+ DMA_FROM_DEVICE);
memcpy(substream->runtime->dma_area + rec->sw_data, buf, bytes);
}

--
2.28.0

2020-09-15 22:38:39

by Christoph Hellwig

Subject: [PATCH 07/18] 53c700: improve non-coherent DMA handling

Switch the 53c700 driver to only use non-coherent descriptor memory if it
really has to because dma_alloc_coherent fails. This doesn't matter for
any of the platforms it runs on currently, but that will change soon.

To help with this, two new helpers that transfer ownership to and from
the device are added; they abstract the syncing of the non-coherent memory.
The two current bidirectional cases are mapped to transfers to the
device, as that appears to be what they are used for. Note that for parisc,
which is the only architecture this driver needs to use non-coherent
memory on, the direction argument of dma_cache_sync is ignored, so this
will not change behavior in any way.
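
The control flow can be sketched in userspace as follows: try the
coherent allocation first, fall back to non-coherent memory only when
that fails, and make the sync helper a no-op unless the fallback was
taken. Everything named fake_* is a hypothetical stand-in (malloc
replaces the DMA allocators and a counter replaces the cache
maintenance), so this only models the logic and is not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant host parameters. */
struct fake_host {
	void *mem;
	bool noncoherent;   /* set only when the fallback path was taken */
	unsigned int syncs; /* counts simulated cache maintenance calls */
};

/* Simulated coherent allocator that can be told to fail. */
static void *fake_alloc_coherent(size_t size, bool available)
{
	return available ? malloc(size) : NULL;
}

/* Prefer coherent memory; use non-coherent memory only as a fallback. */
static int fake_host_alloc(struct fake_host *h, size_t size,
			   bool coherent_available)
{
	assert(h != NULL);
	h->syncs = 0;
	h->noncoherent = false;
	h->mem = fake_alloc_coherent(size, coherent_available);
	if (!h->mem) {
		h->noncoherent = true;
		h->mem = malloc(size); /* models the non-coherent allocation */
	}
	return h->mem ? 0 : -1;
}

/* Hand a region to the device; only non-coherent memory needs work. */
static void fake_sync_to_dev(struct fake_host *h, void *addr, size_t size)
{
	(void)addr;
	(void)size;
	if (h->noncoherent)
		h->syncs++;
}
```

Keeping the check inside the helper means the call sites stay identical
for both kinds of memory.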

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/scsi/53c700.c | 113 +++++++++++++++++++++++-------------------
drivers/scsi/53c700.h | 17 ++++---
2 files changed, 72 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 84b57a8f86bfa9..c59226d7e2f6b5 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -269,6 +269,20 @@ NCR_700_get_SXFER(struct scsi_device *SDp)
spi_period(SDp->sdev_target));
}

+static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h,
+ void *addr, size_t size)
+{
+ if (h->noncoherent)
+ dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE);
+}
+
+static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h,
+ void *addr, size_t size)
+{
+ if (h->noncoherent)
+ dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE);
+}
+
struct Scsi_Host *
NCR_700_detect(struct scsi_host_template *tpnt,
struct NCR_700_Host_Parameters *hostdata, struct device *dev)
@@ -283,9 +297,13 @@ NCR_700_detect(struct scsi_host_template *tpnt,
if(tpnt->sdev_attrs == NULL)
tpnt->sdev_attrs = NCR_700_dev_attrs;

- memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
- GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
- if(memory == NULL) {
+ memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL);
+ if (!memory) {
+ hostdata->noncoherent = 1;
+ memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript,
+ GFP_KERNEL, DMA_ATTR_NON_CONSISTENT);
+ }
+ if (!memory) {
printk(KERN_ERR "53c700: Failed to allocate memory for driver, detaching\n");
return NULL;
}
@@ -339,11 +357,11 @@ NCR_700_detect(struct scsi_host_template *tpnt,
for (j = 0; j < PATCHES; j++)
script[LABELPATCHES[j]] = bS_to_host(pScript + SCRIPT[LABELPATCHES[j]]);
/* now patch up fixed addresses. */
- script_patch_32(hostdata->dev, script, MessageLocation,
+ script_patch_32(hostdata, script, MessageLocation,
pScript + MSGOUT_OFFSET);
- script_patch_32(hostdata->dev, script, StatusAddress,
+ script_patch_32(hostdata, script, StatusAddress,
pScript + STATUS_OFFSET);
- script_patch_32(hostdata->dev, script, ReceiveMsgAddress,
+ script_patch_32(hostdata, script, ReceiveMsgAddress,
pScript + MSGIN_OFFSET);

hostdata->script = script;
@@ -395,8 +413,12 @@ NCR_700_release(struct Scsi_Host *host)
struct NCR_700_Host_Parameters *hostdata =
(struct NCR_700_Host_Parameters *)host->hostdata[0];

- dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
- hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+ if (hostdata->noncoherent)
+ dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script,
+ hostdata->pScript, DMA_ATTR_NON_CONSISTENT);
+ else
+ dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE,
+ hostdata->script, hostdata->pScript);
return 1;
}

@@ -804,8 +826,8 @@ process_extended_message(struct Scsi_Host *host,
shost_printk(KERN_WARNING, host,
"Unexpected SDTR msg\n");
hostdata->msgout[0] = A_REJECT_MSG;
- dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
- script_patch_16(hostdata->dev, hostdata->script,
+ dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+ script_patch_16(hostdata, hostdata->script,
MessageCount, 1);
/* SendMsgOut returns, so set up the return
* address */
@@ -817,9 +839,8 @@ process_extended_message(struct Scsi_Host *host,
printk(KERN_INFO "scsi%d: (%d:%d), Unsolicited WDTR after CMD, Rejecting\n",
host->host_no, pun, lun);
hostdata->msgout[0] = A_REJECT_MSG;
- dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
- script_patch_16(hostdata->dev, hostdata->script, MessageCount,
- 1);
+ dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+ script_patch_16(hostdata, hostdata->script, MessageCount, 1);
resume_offset = hostdata->pScript + Ent_SendMessageWithATN;

break;
@@ -832,9 +853,8 @@ process_extended_message(struct Scsi_Host *host,
printk("\n");
/* just reject it */
hostdata->msgout[0] = A_REJECT_MSG;
- dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
- script_patch_16(hostdata->dev, hostdata->script, MessageCount,
- 1);
+ dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+ script_patch_16(hostdata, hostdata->script, MessageCount, 1);
/* SendMsgOut returns, so set up the return
* address */
resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
@@ -917,9 +937,8 @@ process_message(struct Scsi_Host *host, struct NCR_700_Host_Parameters *hostdata
printk("\n");
/* just reject it */
hostdata->msgout[0] = A_REJECT_MSG;
- dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE);
- script_patch_16(hostdata->dev, hostdata->script, MessageCount,
- 1);
+ dma_sync_to_dev(hostdata, hostdata->msgout, 1);
+ script_patch_16(hostdata, hostdata->script, MessageCount, 1);
/* SendMsgOut returns, so set up the return
* address */
resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
@@ -928,7 +947,7 @@ process_message(struct Scsi_Host *host, struct NCR_700_Host_Parameters *hostdata
}
NCR_700_writel(temp, host, TEMP_REG);
/* set us up to receive another message */
- dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE, DMA_FROM_DEVICE);
+ dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
return resume_offset;
}

@@ -1008,8 +1027,8 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
slot->SG[1].ins = bS_to_host(SCRIPT_RETURN);
slot->SG[1].pAddr = 0;
slot->resume_offset = hostdata->pScript;
- dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG[0])*2, DMA_TO_DEVICE);
- dma_cache_sync(hostdata->dev, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE, DMA_FROM_DEVICE);
+ dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG[0])*2);
+ dma_sync_from_dev(hostdata, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE);

/* queue the command for reissue */
slot->state = NCR_700_SLOT_QUEUED;
@@ -1129,11 +1148,11 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
hostdata->cmd = slot->cmnd;

/* re-patch for this command */
- script_patch_32_abs(hostdata->dev, hostdata->script,
+ script_patch_32_abs(hostdata, hostdata->script,
CommandAddress, slot->pCmd);
- script_patch_16(hostdata->dev, hostdata->script,
+ script_patch_16(hostdata, hostdata->script,
CommandCount, slot->cmnd->cmd_len);
- script_patch_32_abs(hostdata->dev, hostdata->script,
+ script_patch_32_abs(hostdata, hostdata->script,
SGScriptStartAddress,
to32bit(&slot->pSG[0].ins));

@@ -1144,14 +1163,14 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
* should therefore always clear ACK */
NCR_700_writeb(NCR_700_get_SXFER(hostdata->cmd->device),
host, SXFER_REG);
- dma_cache_sync(hostdata->dev, hostdata->msgin,
- MSG_ARRAY_SIZE, DMA_FROM_DEVICE);
- dma_cache_sync(hostdata->dev, hostdata->msgout,
- MSG_ARRAY_SIZE, DMA_TO_DEVICE);
+ dma_sync_from_dev(hostdata, hostdata->msgin,
+ MSG_ARRAY_SIZE);
+ dma_sync_to_dev(hostdata, hostdata->msgout,
+ MSG_ARRAY_SIZE);
/* I'm just being paranoid here, the command should
* already have been flushed from the cache */
- dma_cache_sync(hostdata->dev, slot->cmnd->cmnd,
- slot->cmnd->cmd_len, DMA_TO_DEVICE);
+ dma_sync_to_dev(hostdata, slot->cmnd->cmnd,
+ slot->cmnd->cmd_len);



@@ -1214,8 +1233,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
hostdata->reselection_id = reselection_id;
/* just in case we have a stale simple tag message, clear it */
hostdata->msgin[1] = 0;
- dma_cache_sync(hostdata->dev, hostdata->msgin,
- MSG_ARRAY_SIZE, DMA_BIDIRECTIONAL);
+ dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
if(hostdata->tag_negotiated & (1<<reselection_id)) {
resume_offset = hostdata->pScript + Ent_GetReselectionWithTag;
} else {
@@ -1329,8 +1347,7 @@ process_selection(struct Scsi_Host *host, __u32 dsp)
hostdata->cmd = NULL;
/* clear any stale simple tag message */
hostdata->msgin[1] = 0;
- dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
- DMA_BIDIRECTIONAL);
+ dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);

if(id == 0xff) {
/* Selected as target, Ignore */
@@ -1427,30 +1444,26 @@ NCR_700_start_command(struct scsi_cmnd *SCp)
NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_SYNC_NEGOTIATION);
}

- script_patch_16(hostdata->dev, hostdata->script, MessageCount, count);
-
+ script_patch_16(hostdata, hostdata->script, MessageCount, count);

- script_patch_ID(hostdata->dev, hostdata->script,
- Device_ID, 1<<scmd_id(SCp));
+ script_patch_ID(hostdata, hostdata->script, Device_ID, 1<<scmd_id(SCp));

- script_patch_32_abs(hostdata->dev, hostdata->script, CommandAddress,
+ script_patch_32_abs(hostdata, hostdata->script, CommandAddress,
slot->pCmd);
- script_patch_16(hostdata->dev, hostdata->script, CommandCount,
- SCp->cmd_len);
+ script_patch_16(hostdata, hostdata->script, CommandCount, SCp->cmd_len);
/* finally plumb the beginning of the SG list into the script
* */
- script_patch_32_abs(hostdata->dev, hostdata->script,
+ script_patch_32_abs(hostdata, hostdata->script,
SGScriptStartAddress, to32bit(&slot->pSG[0].ins));
NCR_700_clear_fifo(SCp->device->host);

if(slot->resume_offset == 0)
slot->resume_offset = hostdata->pScript;
/* now perform all the writebacks and invalidates */
- dma_cache_sync(hostdata->dev, hostdata->msgout, count, DMA_TO_DEVICE);
- dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE,
- DMA_FROM_DEVICE);
- dma_cache_sync(hostdata->dev, SCp->cmnd, SCp->cmd_len, DMA_TO_DEVICE);
- dma_cache_sync(hostdata->dev, hostdata->status, 1, DMA_FROM_DEVICE);
+ dma_sync_to_dev(hostdata, hostdata->msgout, count);
+ dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
+ dma_sync_to_dev(hostdata, SCp->cmnd, SCp->cmd_len);
+ dma_sync_from_dev(hostdata, hostdata->status, 1);

/* set the synchronous period/offset */
NCR_700_writeb(NCR_700_get_SXFER(SCp->device),
@@ -1626,7 +1639,7 @@ NCR_700_intr(int irq, void *dev_id)
slot->SG[i].ins = bS_to_host(SCRIPT_NOP);
slot->SG[i].pAddr = 0;
}
- dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG), DMA_TO_DEVICE);
+ dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG));
/* and pretend we disconnected after
* the command phase */
resume_offset = hostdata->pScript + Ent_MsgInDuringData;
@@ -1878,7 +1891,7 @@ NCR_700_queuecommand_lck(struct scsi_cmnd *SCp, void (*done)(struct scsi_cmnd *)
}
slot->SG[i].ins = bS_to_host(SCRIPT_RETURN);
slot->SG[i].pAddr = 0;
- dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG), DMA_TO_DEVICE);
+ dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG));
DEBUG((" SETTING %p to %x\n",
(&slot->pSG[i].ins),
slot->SG[i].ins));
diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h
index 05fe439b66afe5..c9f8c497babb3d 100644
--- a/drivers/scsi/53c700.h
+++ b/drivers/scsi/53c700.h
@@ -209,6 +209,7 @@ struct NCR_700_Host_Parameters {
#endif
__u32 chip710:1; /* set if really a 710 not 700 */
__u32 burst_length:4; /* set to 0 to disable 710 bursting */
+ __u32 noncoherent:1; /* needs to use non-coherent DMA */

/* NOTHING BELOW HERE NEEDS ALTERING */
__u32 fast:1; /* if we can alter the SCSI bus clock
@@ -422,33 +423,33 @@ struct NCR_700_Host_Parameters {
#define NCR_710_MIN_XFERP 0
#define NCR_700_MIN_PERIOD 25 /* for SDTR message, 100ns */

-#define script_patch_32(dev, script, symbol, value) \
+#define script_patch_32(h, script, symbol, value) \
{ \
int i; \
dma_addr_t da = value; \
for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
__u32 val = bS_to_cpu((script)[A_##symbol##_used[i]]) + da; \
(script)[A_##symbol##_used[i]] = bS_to_host(val); \
- dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+ dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \
DEBUG((" script, patching %s at %d to %pad\n", \
#symbol, A_##symbol##_used[i], &da)); \
} \
}

-#define script_patch_32_abs(dev, script, symbol, value) \
+#define script_patch_32_abs(h, script, symbol, value) \
{ \
int i; \
dma_addr_t da = value; \
for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
(script)[A_##symbol##_used[i]] = bS_to_host(da); \
- dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+ dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \
DEBUG((" script, patching %s at %d to %pad\n", \
#symbol, A_##symbol##_used[i], &da)); \
} \
}

/* Used for patching the SCSI ID in the SELECT instruction */
-#define script_patch_ID(dev, script, symbol, value) \
+#define script_patch_ID(h, script, symbol, value) \
{ \
int i; \
for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
@@ -456,13 +457,13 @@ struct NCR_700_Host_Parameters {
val &= 0xff00ffff; \
val |= ((value) & 0xff) << 16; \
(script)[A_##symbol##_used[i]] = bS_to_host(val); \
- dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+ dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \
DEBUG((" script, patching ID field %s at %d to 0x%x\n", \
#symbol, A_##symbol##_used[i], val)); \
} \
}

-#define script_patch_16(dev, script, symbol, value) \
+#define script_patch_16(h, script, symbol, value) \
{ \
int i; \
for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \
@@ -470,7 +471,7 @@ struct NCR_700_Host_Parameters {
val &= 0xffff0000; \
val |= ((value) & 0xffff); \
(script)[A_##symbol##_used[i]] = bS_to_host(val); \
- dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \
+ dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \
DEBUG((" script, patching short field %s at %d to 0x%x\n", \
#symbol, A_##symbol##_used[i], val)); \
} \
--
2.28.0
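
For anyone who wants to check the masking done by script_patch_16 and
script_patch_ID outside the driver, here is a minimal userspace sketch.
The function names patch_16 and patch_id are made up for this example,
and the bS_to_cpu/bS_to_host byte swapping and the cache sync that the
real macros perform are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the bit manipulation in script_patch_16 and
 * script_patch_ID.  Byte swapping and cache syncing are not modelled.
 */

/* Replace the low 16 bits of a script word and keep the high 16 bits. */
static uint32_t patch_16(uint32_t insn, uint32_t value)
{
	insn &= 0xffff0000u;
	insn |= value & 0xffffu;
	return insn;
}

/* Replace bits 16..23 of a script word (the SCSI ID in SELECT). */
static uint32_t patch_id(uint32_t insn, uint32_t value)
{
	insn &= 0xff00ffffu;
	insn |= (value & 0xffu) << 16;
	return insn;
}
```

Values wider than the field are truncated by the mask, so a too-large
command length cannot spill into the neighbouring bits of the word.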

2020-09-15 22:43:44

by Christoph Hellwig

Subject: [PATCH 01/18] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT flag

From: Sergey Senozhatsky <[email protected]>

The patch partially reverts some of the UAPI bits of the buffer
cache management hints, namely the queue consistency (memory
coherency) user-space hint, because, as it turned out, the kernel
implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.

The patch reverts both the kernel and user space parts: it removes the
DMA consistency attr functions, rolls back the changes to the
v4l2_requestbuffers and v4l2_create_buffers structures and the
corresponding UAPI functions (plus the compat32 layer), and cleans up
the documentation.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
.../userspace-api/media/v4l/buffer.rst | 17 -------
.../media/v4l/vidioc-create-bufs.rst | 6 +--
.../media/v4l/vidioc-reqbufs.rst | 12 +----
.../media/common/videobuf2/videobuf2-core.c | 46 +++----------------
.../common/videobuf2/videobuf2-dma-contig.c | 19 --------
.../media/common/videobuf2/videobuf2-dma-sg.c | 3 +-
.../media/common/videobuf2/videobuf2-v4l2.c | 18 +-------
drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +---
drivers/media/v4l2-core/v4l2-ioctl.c | 5 +-
include/media/videobuf2-core.h | 7 +--
include/uapi/linux/videodev2.h | 13 +-----
11 files changed, 22 insertions(+), 134 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
index 57e752aaf414a7..2044ed13cd9d7d 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -701,23 +701,6 @@ Memory Consistency Flags
:stub-columns: 0
:widths: 3 1 4

- * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
-
- - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
- - 0x00000001
- - A buffer is allocated either in consistent (it will be automatically
- coherent between the CPU and the bus) or non-consistent memory. The
- latter can provide performance gains, for instance the CPU cache
- sync/flush operations can be avoided if the buffer is accessed by the
- corresponding device only and the CPU does not read/write to/from that
- buffer. However, this requires extra care from the driver -- it must
- guarantee memory consistency by issuing a cache flush/sync when
- consistency is needed. If this flag is set V4L2 will attempt to
- allocate the buffer in non-consistent memory. The flag takes effect
- only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
- queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
- <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
-
.. c:type:: v4l2_memory

enum v4l2_memory
diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
index f2a702870fadc1..12cf6b44f414f7 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
@@ -120,13 +120,9 @@ than the number requested.
If you want to just query the capabilities without making any
other changes, then set ``count`` to 0, ``memory`` to
``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type.
- * - __u32
- - ``flags``
- - Specifies additional buffer management attributes.
- See :ref:`memory-flags`.

* - __u32
- - ``reserved``\ [6]
+ - ``reserved``\ [7]
- A place holder for future extensions. Drivers and applications
must set the array to zero.

diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 75d894d9c36c42..0e3e2fde65e850 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -112,17 +112,10 @@ aborting or finishing any DMA in progress, an implicit
``V4L2_MEMORY_MMAP`` and ``type`` set to the buffer type. This will
free any previously allocated buffers, so this is typically something
that will be done at the start of the application.
- * - union {
- - (anonymous)
- * - __u32
- - ``flags``
- - Specifies additional buffer management attributes.
- See :ref:`memory-flags`.
* - __u32
- ``reserved``\ [1]
- - Kept for backwards compatibility. Use ``flags`` instead.
- * - }
- -
+ - A place holder for future extensions. Drivers and applications
+ must set the array to zero.

.. tabularcolumns:: |p{6.1cm}|p{2.2cm}|p{8.7cm}|

@@ -169,7 +162,6 @@ aborting or finishing any DMA in progress, an implicit
- This capability is set by the driver to indicate that the queue supports
cache and memory management hints. However, it's only valid when the
queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
- :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`,
:ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
:ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index f544d3393e9d6b..4eab6d81cce170 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
}
EXPORT_SYMBOL(vb2_verify_memory_type);

-static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
-{
- q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
-
- if (!vb2_queue_allows_cache_hints(q))
- return;
- if (!consistent_mem)
- q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
-}
-
-static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
-{
- bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
-
- if (consistent_mem != queue_is_consistent) {
- dprintk(q, 1, "memory consistency model mismatch\n");
- return false;
- }
- return true;
-}
-
int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
- unsigned int flags, unsigned int *count)
+ unsigned int *count)
{
unsigned int num_buffers, allocated_buffers, num_planes = 0;
unsigned plane_sizes[VB2_MAX_PLANES] = { };
- bool consistent_mem = true;
unsigned int i;
int ret;

- if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
- consistent_mem = false;
-
if (q->streaming) {
dprintk(q, 1, "streaming active\n");
return -EBUSY;
@@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
}

if (*count == 0 || q->num_buffers != 0 ||
- (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
- !verify_consistency_attr(q, consistent_mem)) {
+ (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
/*
* We already have buffers allocated, so first check if they
* are not in use and can be freed.
@@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
q->memory = memory;
- set_queue_consistency(q, consistent_mem);

/*
* Ask the driver how many buffers and planes per buffer it requires.
@@ -888,18 +861,14 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
EXPORT_SYMBOL_GPL(vb2_core_reqbufs);

int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
- unsigned int flags, unsigned int *count,
+ unsigned int *count,
unsigned int requested_planes,
const unsigned int requested_sizes[])
{
unsigned int num_planes = 0, num_buffers, allocated_buffers;
unsigned plane_sizes[VB2_MAX_PLANES] = { };
- bool consistent_mem = true;
int ret;

- if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT)
- consistent_mem = false;
-
if (q->num_buffers == VB2_MAX_FRAME) {
dprintk(q, 1, "maximum number of buffers already allocated\n");
return -ENOBUFS;
@@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
}
memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
q->memory = memory;
- set_queue_consistency(q, consistent_mem);
q->waiting_for_buffers = !q->is_output;
} else {
if (q->memory != memory) {
dprintk(q, 1, "memory model mismatch\n");
return -EINVAL;
}
- if (!verify_consistency_attr(q, consistent_mem))
- return -EINVAL;
}

num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
@@ -2581,7 +2547,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
fileio->memory = VB2_MEMORY_MMAP;
fileio->type = q->type;
q->fileio = fileio;
- ret = vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count);
+ ret = vb2_core_reqbufs(q, fileio->memory, &fileio->count);
if (ret)
goto err_kfree;

@@ -2638,7 +2604,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)

err_reqbufs:
fileio->count = 0;
- vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count);
+ vb2_core_reqbufs(q, fileio->memory, &fileio->count);

err_kfree:
q->fileio = NULL;
@@ -2658,7 +2624,7 @@ static int __vb2_cleanup_fileio(struct vb2_queue *q)
vb2_core_streamoff(q, q->type);
q->fileio = NULL;
fileio->count = 0;
- vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count);
+ vb2_core_reqbufs(q, fileio->memory, &fileio->count);
kfree(fileio);
dprintk(q, 3, "file io emulator closed\n");
}
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index ec3446cc45b8da..7b1b86ec942d7d 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -42,11 +42,6 @@ struct vb2_dc_buf {
struct dma_buf_attachment *db_attach;
};

-static inline bool vb2_dc_buffer_consistent(unsigned long attr)
-{
- return !(attr & DMA_ATTR_NON_CONSISTENT);
-}
-
/*********************************************/
/* scatterlist table functions */
/*********************************************/
@@ -341,13 +336,6 @@ static int
vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
enum dma_data_direction direction)
{
- struct vb2_dc_buf *buf = dbuf->priv;
- struct sg_table *sgt = buf->dma_sgt;
-
- if (vb2_dc_buffer_consistent(buf->attrs))
- return 0;
-
- dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
return 0;
}

@@ -355,13 +343,6 @@ static int
vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
enum dma_data_direction direction)
{
- struct vb2_dc_buf *buf = dbuf->priv;
- struct sg_table *sgt = buf->dma_sgt;
-
- if (vb2_dc_buffer_consistent(buf->attrs))
- return 0;
-
- dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
return 0;
}

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
/*
* NOTE: dma-sg allocates memory using the page allocator directly, so
* there is no memory consistency guarantee, hence dma-sg ignores DMA
- * attributes passed from the upper layer. That means that
- * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers.
+ * attributes passed from the upper layer.
*/
buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *),
GFP_KERNEL | __GFP_ZERO);
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index 30caad27281e1a..cfe197df970df2 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -722,22 +722,12 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
#endif
}

-static void clear_consistency_attr(struct vb2_queue *q,
- int memory,
- unsigned int *flags)
-{
- if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
- *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT;
-}
-
int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
{
int ret = vb2_verify_memory_type(q, req->memory, req->type);

fill_buf_caps(q, &req->capabilities);
- clear_consistency_attr(q, req->memory, &req->flags);
- return ret ? ret : vb2_core_reqbufs(q, req->memory,
- req->flags, &req->count);
+ return ret ? ret : vb2_core_reqbufs(q, req->memory, &req->count);
}
EXPORT_SYMBOL_GPL(vb2_reqbufs);

@@ -769,7 +759,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
unsigned i;

fill_buf_caps(q, &create->capabilities);
- clear_consistency_attr(q, create->memory, &create->flags);
create->index = q->num_buffers;
if (create->count == 0)
return ret != -EBUSY ? ret : 0;
@@ -813,7 +802,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
if (requested_sizes[i] == 0)
return -EINVAL;
return ret ? ret : vb2_core_create_bufs(q, create->memory,
- create->flags,
&create->count,
requested_planes,
requested_sizes);
@@ -998,12 +986,11 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);

fill_buf_caps(vdev->queue, &p->capabilities);
- clear_consistency_attr(vdev->queue, p->memory, &p->flags);
if (res)
return res;
if (vb2_queue_is_busy(vdev, file))
return -EBUSY;
- res = vb2_core_reqbufs(vdev->queue, p->memory, p->flags, &p->count);
+ res = vb2_core_reqbufs(vdev->queue, p->memory, &p->count);
/* If count == 0, then the owner has released all buffers and he
is no longer owner of the queue. Otherwise we have a new owner. */
if (res == 0)
@@ -1021,7 +1008,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,

p->index = vdev->queue->num_buffers;
fill_buf_caps(vdev->queue, &p->capabilities);
- clear_consistency_attr(vdev->queue, p->memory, &p->flags);
/*
* If count == 0, then just check if memory and type are valid.
* Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
index 593bcf6c373502..a99e82ec9ab60d 100644
--- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
+++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
@@ -246,9 +246,6 @@ struct v4l2_format32 {
* @memory: buffer memory type
* @format: frame format, for which buffers are requested
* @capabilities: capabilities of this buffer type.
- * @flags: additional buffer management attributes (ignored unless the
- * queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability and
- * configured for MMAP streaming I/O).
* @reserved: future extensions
*/
struct v4l2_create_buffers32 {
@@ -257,8 +254,7 @@ struct v4l2_create_buffers32 {
__u32 memory; /* enum v4l2_memory */
struct v4l2_format32 format;
__u32 capabilities;
- __u32 flags;
- __u32 reserved[6];
+ __u32 reserved[7];
};

static int __bufsize_v4l2_format(struct v4l2_format32 __user *p32, u32 *size)
@@ -359,8 +355,7 @@ static int get_v4l2_create32(struct v4l2_create_buffers __user *p64,
{
if (!access_ok(p32, sizeof(*p32)) ||
copy_in_user(p64, p32,
- offsetof(struct v4l2_create_buffers32, format)) ||
- assign_in_user(&p64->flags, &p32->flags))
+ offsetof(struct v4l2_create_buffers32, format)))
return -EFAULT;
return __get_v4l2_format32(&p64->format, &p32->format,
aux_buf, aux_space);
@@ -422,7 +417,6 @@ static int put_v4l2_create32(struct v4l2_create_buffers __user *p64,
copy_in_user(p32, p64,
offsetof(struct v4l2_create_buffers32, format)) ||
assign_in_user(&p32->capabilities, &p64->capabilities) ||
- assign_in_user(&p32->flags, &p64->flags) ||
copy_in_user(p32->reserved, p64->reserved, sizeof(p64->reserved)))
return -EFAULT;
return __put_v4l2_format32(&p64->format, &p32->format);
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 2a22e13a630346..e0520c85a3b725 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -2042,6 +2042,9 @@ static int v4l_reqbufs(const struct v4l2_ioctl_ops *ops,

if (ret)
return ret;
+
+ CLEAR_AFTER_FIELD(p, capabilities);
+
return ops->vidioc_reqbufs(file, fh, p);
}

@@ -2081,7 +2084,7 @@ static int v4l_create_bufs(const struct v4l2_ioctl_ops *ops,
if (ret)
return ret;

- CLEAR_AFTER_FIELD(create, flags);
+ CLEAR_AFTER_FIELD(create, capabilities);

v4l_sanitize_format(&create->format);

diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 52ef92049073e3..bbb3f26fbde978 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -744,8 +744,6 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
* vb2_core_reqbufs() - Initiate streaming.
* @q: pointer to &struct vb2_queue with videobuf2 queue.
* @memory: memory type, as defined by &enum vb2_memory.
- * @flags: auxiliary queue/buffer management flags. Currently, the only
- * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
* @count: requested buffer count.
*
* Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
@@ -770,13 +768,12 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
* Return: returns zero on success; an error code otherwise.
*/
int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
- unsigned int flags, unsigned int *count);
+ unsigned int *count);

/**
* vb2_core_create_bufs() - Allocate buffers and any required auxiliary structs
* @q: pointer to &struct vb2_queue with videobuf2 queue.
* @memory: memory type, as defined by &enum vb2_memory.
- * @flags: auxiliary queue/buffer management flags.
* @count: requested buffer count.
* @requested_planes: number of planes requested.
* @requested_sizes: array with the size of the planes.
@@ -794,7 +791,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
* Return: returns zero on success; an error code otherwise.
*/
int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
- unsigned int flags, unsigned int *count,
+ unsigned int *count,
unsigned int requested_planes,
const unsigned int requested_sizes[]);

diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index c7b70ff53bc1dd..235db7754606d6 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -191,8 +191,6 @@ enum v4l2_memory {
V4L2_MEMORY_DMABUF = 4,
};

-#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0)
-
/* see also http://vektor.theorem.ca/graphics/ycbcr/ */
enum v4l2_colorspace {
/*
@@ -949,10 +947,7 @@ struct v4l2_requestbuffers {
__u32 type; /* enum v4l2_buf_type */
__u32 memory; /* enum v4l2_memory */
__u32 capabilities;
- union {
- __u32 flags;
- __u32 reserved[1];
- };
+ __u32 reserved[1];
};

/* capabilities for struct v4l2_requestbuffers and v4l2_create_buffers */
@@ -2456,9 +2451,6 @@ struct v4l2_dbg_chip_info {
* @memory: enum v4l2_memory; buffer memory type
* @format: frame format, for which buffers are requested
* @capabilities: capabilities of this buffer type.
- * @flags: additional buffer management attributes (ignored unless the
- * queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability
- * and configured for MMAP streaming I/O).
* @reserved: future extensions
*/
struct v4l2_create_buffers {
@@ -2467,8 +2459,7 @@ struct v4l2_create_buffers {
__u32 memory;
struct v4l2_format format;
__u32 capabilities;
- __u32 flags;
- __u32 reserved[6];
+ __u32 reserved[7];
};

/*
--
2.28.0
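
As an illustration of why the UAPI part of this revert does not change
the structure layout, here is a small userspace sketch.  The two structs
are simplified copies written for this example (they are not the kernel
headers), and clear_after_capabilities is a hypothetical helper that
only mimics what CLEAR_AFTER_FIELD(p, capabilities) is assumed to do for
this one struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout with the flags field overlaid on reserved[0] (before the revert). */
struct reqbufs_with_flags {
	uint32_t count;
	uint32_t type;
	uint32_t memory;
	uint32_t capabilities;
	union {
		uint32_t flags;
		uint32_t reserved[1];
	};
};

/* Layout after the revert: the union is gone, reserved[1] stays. */
struct reqbufs_reverted {
	uint32_t count;
	uint32_t type;
	uint32_t memory;
	uint32_t capabilities;
	uint32_t reserved[1];
};

/* The revert must not change the size or the offset of the last word. */
_Static_assert(sizeof(struct reqbufs_with_flags) ==
	       sizeof(struct reqbufs_reverted), "size changed");
_Static_assert(offsetof(struct reqbufs_with_flags, flags) ==
	       offsetof(struct reqbufs_reverted, reserved), "offset changed");

/* Hypothetical helper: zero everything that follows ->capabilities. */
static void clear_after_capabilities(struct reqbufs_reverted *p)
{
	size_t start = offsetof(struct reqbufs_reverted, capabilities) +
		       sizeof(p->capabilities);

	memset((unsigned char *)p + start, 0, sizeof(*p) - start);
}
```

Under these assumptions, an application that still sets bit 0 in what
used to be flags would simply have that word zeroed before the driver
sees the request.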

2020-09-21 06:38:17

by Christoph Hellwig

Subject: Re: a saner API for allocating DMA addressable pages v3

Any comments?

Thomas: this should be identical to the git tree I gave you for mips
testing, can you add your Tested-by (and Reviewed-by tags where
applicable)?

Helge: for parisc this should effectively be the same as the first
version, but I've dropped the Tested-by tags due to the reshuffle.
Any chance you could retest it?

On Tue, Sep 15, 2020 at 05:51:04PM +0200, Christoph Hellwig wrote:
> Hi all,
>
> this series replaces the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs
> with a separate new dma_alloc_pages API, which is available on all
> platforms. In addition to cleaning up the convoluted code path, this
> ensures that other drivers that have asked for better support for
> non-coherent DMA to pages without incurring bounce buffering overhead can
> finally be properly supported.
>
> As a follow up I plan to move the implementation of the
> DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given
> that it also is a fundamentally non-coherent allocation. The replacement
> for that flag would then return a struct page, as it is allowed to
> actually return pages without a kernel mapping as the name suggests
> (although most of the time they will actually have a kernel mapping).
>
> In addition to the conversions of the existing non-coherent DMA users,
> I've also added a patch to convert the firewire ohci driver to use
> the new dma_alloc_pages API.
>
> The first patch is queued up for 5.9 in the media tree, but included here
> for completeness.
>
>
> A git tree is available here:
>
> git://git.infradead.org/users/hch/misc.git dma_alloc_pages
>
> Gitweb:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
>
>
> Changes since v2:
> - fix up the patch reshuffle which wasn't quite correct
> - fix up a few commit messages
>
> Changes since v1:
> - rebased on the latest dma-mapping tree, which merged many of the
> cleanups
> - fix an argument passing typo in 53c700, caught by sparse
> - rename a few macro arguments in 53c700
> - pass the right device to the DMA API in the lib82596 drivers
> - fix memory ownership transfers in sgiseeq
> - better document what a page in the direct kernel mapping means
> - split into dma_alloc_pages that returns a struct page and is in the
> direct mapping vs dma_alloc_noncoherent that can be vmapped
> - convert the firewire ohci driver to dma_alloc_pages
>
---end quoted text---

2020-09-22 10:57:39

by Thomas Bogendoerfer

Subject: Re: [PATCH 06/18] lib82596: move DMA allocation into the callers of i82596_probe

On Tue, Sep 15, 2020 at 05:51:10PM +0200, Christoph Hellwig wrote:
> This allows us to get rid of the LIB82596_DMA_ATTR define and prepare
> for untangling the coherent vs non-coherent DMA allocation API.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------
> drivers/net/ethernet/i825xx/lib82596.c | 36 ++++++++----------------
> drivers/net/ethernet/i825xx/sni_82596.c | 19 +++++++++----
> 3 files changed, 40 insertions(+), 39 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]> (SNI part)

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:57:42

by Thomas Bogendoerfer

Subject: Re: [PATCH 09/18] sgiwd93: convert to dma_alloc_noncoherent

On Tue, Sep 15, 2020 at 05:51:13PM +0200, Christoph Hellwig wrote:
> Use the new non-coherent DMA API including proper ownership transfers.
> This also means we can allocate the memory as DMA_TO_DEVICE instead
> of bidirectional.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/scsi/sgiwd93.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]>

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:57:59

by Thomas Bogendoerfer

Subject: Re: [PATCH 07/18] 53c700: improve non-coherent DMA handling

On Tue, Sep 15, 2020 at 05:51:11PM +0200, Christoph Hellwig wrote:
> Switch the 53c700 driver to only use non-coherent descriptor memory if it
> really has to because dma_alloc_coherent fails. This doesn't matter for
> any of the platforms it runs on currently, but that will change soon.
>
> To help with this, two new helpers that transfer ownership to and from
> the device are added; they abstract the syncing of the non-coherent
> memory. The two current bidirectional cases are mapped to transfers to
> the device, as that appears to be what they are used for. Note that for
> parisc, which is the only architecture this driver needs to use
> non-coherent memory on, the direction argument of dma_cache_sync is
> ignored, so this will not change behavior in any way.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/scsi/53c700.c | 113 +++++++++++++++++++++++-------------------
> drivers/scsi/53c700.h | 17 ++++---
> 2 files changed, 72 insertions(+), 58 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]>
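
For readers following along without the full patch, the two ownership
transfer helpers described in the commit message can be modelled in
userspace roughly as below.  This is a sketch under the assumption that
the helpers simply skip the sync when the descriptor memory is coherent.
struct host_params, model_cache_sync, sync_to_dev and sync_from_dev are
names invented for this example, and the counter replaces the real cache
maintenance call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sync_dir { SYNC_TO_DEVICE, SYNC_FROM_DEVICE };

/* Hypothetical, trimmed-down stand-in for the host parameters struct. */
struct host_params {
	bool noncoherent;	/* set when the coherent allocation failed */
	unsigned int nr_syncs;	/* counts modelled cache maintenance calls */
	enum sync_dir last_dir;	/* direction of the most recent sync */
};

/* Stand-in for the real cache maintenance: only records the call. */
static void model_cache_sync(struct host_params *h, const void *addr,
			     size_t size, enum sync_dir dir)
{
	(void)addr;
	(void)size;
	h->nr_syncs++;
	h->last_dir = dir;
}

/* Hand a buffer to the device; a no-op for coherent memory. */
static void sync_to_dev(struct host_params *h, const void *addr, size_t size)
{
	if (h->noncoherent)
		model_cache_sync(h, addr, size, SYNC_TO_DEVICE);
}

/* Take a buffer back from the device; a no-op for coherent memory. */
static void sync_from_dev(struct host_params *h, const void *addr, size_t size)
{
	if (h->noncoherent)
		model_cache_sync(h, addr, size, SYNC_FROM_DEVICE);
}
```

Keeping the coherency check inside the helpers means the call sites stay
identical whether or not the coherent allocation succeeded.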

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:57:59

by Thomas Bogendoerfer

Subject: Re: [PATCH 10/18] hal2: convert to dma_alloc_noncoherent

On Tue, Sep 15, 2020 at 05:51:14PM +0200, Christoph Hellwig wrote:
> Use the new non-coherent DMA API including proper ownership transfers.
> This also means we can allocate the buffer memory with the proper
> direction instead of bidirectional.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> sound/mips/hal2.c | 58 ++++++++++++++++++++++-------------------------
> 1 file changed, 27 insertions(+), 31 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]>

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:58:00

by Thomas Bogendoerfer

Subject: Re: [PATCH 15/18] dma-mapping: add a new dma_alloc_pages API

On Tue, Sep 15, 2020 at 05:51:19PM +0200, Christoph Hellwig wrote:
> This API is the equivalent of alloc_pages, except that the returned memory
> is guaranteed to be DMA addressable by the passed in device. The
> implementation will also be used to provide a more sensible replacement
> for the DMA_ATTR_NON_CONSISTENT flag.
>
> Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages
> as its backend.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> Documentation/core-api/dma-attributes.rst | 8 ---
> arch/alpha/kernel/pci_iommu.c | 2 +
> arch/arm/mm/dma-mapping-nommu.c | 2 +
> arch/arm/mm/dma-mapping.c | 4 ++
> arch/ia64/hp/common/sba_iommu.c | 2 +
> arch/mips/jazz/jazzdma.c | 7 +--
> arch/powerpc/kernel/dma-iommu.c | 2 +
> arch/powerpc/platforms/ps3/system-bus.c | 4 ++
> arch/powerpc/platforms/pseries/vio.c | 2 +
> arch/s390/pci/pci_dma.c | 2 +
> arch/x86/kernel/amd_gart_64.c | 2 +
> drivers/iommu/dma-iommu.c | 2 +
> drivers/iommu/intel/iommu.c | 4 ++
> drivers/parisc/ccio-dma.c | 2 +
> drivers/parisc/sba_iommu.c | 2 +
> drivers/xen/swiotlb-xen.c | 2 +
> include/linux/dma-direct.h | 5 ++
> include/linux/dma-mapping.h | 34 ++++++------
> include/linux/dma-noncoherent.h | 3 --
> kernel/dma/direct.c | 52 ++++++++++++++++++-
> kernel/dma/mapping.c | 63 +++++++++++++++++++++--
> kernel/dma/ops_helpers.c | 35 +++++++++++++
> kernel/dma/virt.c | 2 +
> 23 files changed, 206 insertions(+), 37 deletions(-)

Acked-by: Thomas Bogendoerfer <[email protected]> (MIPS part)

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:58:04

by Thomas Bogendoerfer

Subject: Re: [PATCH 11/18] lib82596: convert to dma_alloc_noncoherent

On Tue, Sep 15, 2020 at 05:51:15PM +0200, Christoph Hellwig wrote:
> Use the new non-coherent DMA API including proper ownership transfers.
> This includes moving the DMA helpers to lib82596 based off an ifdef to
> avoid include order problems.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/net/ethernet/i825xx/lasi_82596.c | 25 ++---
> drivers/net/ethernet/i825xx/lib82596.c | 114 ++++++++++++++---------
> drivers/net/ethernet/i825xx/sni_82596.c | 4 -
> 3 files changed, 80 insertions(+), 63 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]> (SNI part)

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:58:15

by Thomas Bogendoerfer

Subject: Re: [PATCH 12/18] sgiseeq: convert to dma_alloc_noncoherent

On Tue, Sep 15, 2020 at 05:51:16PM +0200, Christoph Hellwig wrote:
> Use the new non-coherent DMA API including proper ownership transfers.
> This includes adding additional calls to dma_sync_desc_dev as the
> old syncing was rather ad-hoc.
>
> Thanks to Thomas Bogendoerfer for debugging the ownership transfer
> issues.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/net/ethernet/seeq/sgiseeq.c | 28 ++++++++++++++++++----------
> 1 file changed, 18 insertions(+), 10 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]>

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-09-22 10:58:16

by Thomas Bogendoerfer

Subject: Re: [PATCH 13/18] 53c700: convert to dma_alloc_noncoherent

On Tue, Sep 15, 2020 at 05:51:17PM +0200, Christoph Hellwig wrote:
> Use the new non-coherent DMA API including proper ownership transfers.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/scsi/53c700.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)

Tested-by: Thomas Bogendoerfer <[email protected]>


2020-09-22 10:58:27

by Thomas Bogendoerfer

Subject: Re: [PATCH 14/18] dma-mapping: remove dma_cache_sync

On Tue, Sep 15, 2020 at 05:51:18PM +0200, Christoph Hellwig wrote:
> All users are gone now, remove the API.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> arch/mips/Kconfig | 1 -
> arch/mips/jazz/jazzdma.c | 1 -
> arch/mips/mm/dma-noncoherent.c | 6 ------
> arch/parisc/Kconfig | 1 -
> arch/parisc/kernel/pci-dma.c | 6 ------
> include/linux/dma-mapping.h | 8 --------
> include/linux/dma-noncoherent.h | 10 ----------
> kernel/dma/Kconfig | 3 ---
> kernel/dma/mapping.c | 14 --------------
> 9 files changed, 50 deletions(-)

Acked-by: Thomas Bogendoerfer <[email protected]> (MIPS part)


2020-09-25 04:22:53

by Christoph Hellwig

Subject: Re: a saner API for allocating DMA addressable pages v3

This is in dma-mapping for-next now.

2020-09-25 11:17:13

by Robin Murphy

Subject: Re: [PATCH 08/18] dma-mapping: add a new dma_alloc_noncoherent API

On 2020-09-15 16:51, Christoph Hellwig wrote:
[...]
> +These APIs allow to allocate pages in the kernel direct mapping that are
> +guaranteed to be DMA addressable. This means that unlike dma_alloc_coherent,
> +virt_to_page can be called on the resulting address, and the resulting

Nit: if we explicitly describe this as if it's a guarantee that can be
relied upon...

> +struct page can be used for everything a struct page is suitable for.

[...]
> +This routine allocates a region of <size> bytes of consistent memory. It
> +returns a pointer to the allocated region (in the processor's virtual address
> +space) or NULL if the allocation failed. The returned memory may or may not
> +be in the kernels direct mapping. Drivers must not call virt_to_page on
> +the returned memory region.

...then forbid this document's target audience from relying on it,
something seems off. At the very least it's unhelpfully unclear :/

Given patch #17, I suspect that the first paragraph is the one that's no
longer true.

Robin.

2020-09-25 16:21:01

by Christoph Hellwig

Subject: Re: [PATCH 08/18] dma-mapping: add a new dma_alloc_noncoherent API

On Fri, Sep 25, 2020 at 12:15:37PM +0100, Robin Murphy wrote:
> On 2020-09-15 16:51, Christoph Hellwig wrote:
> [...]
>> +These APIs allow to allocate pages in the kernel direct mapping that are
>> +guaranteed to be DMA addressable. This means that unlike dma_alloc_coherent,
>> +virt_to_page can be called on the resulting address, and the resulting
>
> Nit: if we explicitly describe this as if it's a guarantee that can be
> relied upon...
>
>> +struct page can be used for everything a struct page is suitable for.
>
> [...]
>> +This routine allocates a region of <size> bytes of consistent memory. It
>> +returns a pointer to the allocated region (in the processor's virtual address
>> +space) or NULL if the allocation failed. The returned memory may or may not
>> +be in the kernels direct mapping. Drivers must not call virt_to_page on
>> +the returned memory region.
>
> ...then forbid this document's target audience from relying on it,
> something seems off. At the very least it's unhelpfully unclear :/
>
> Given patch #17, I suspect that the first paragraph is the one that's no
> longer true.

Yes. dma_alloc_pages is the replacement for allocations that need the
direct mapping. I'll send a patch that documents dma_alloc_pages and
fixes this up.

2020-09-25 18:47:47

by Tomasz Figa

Subject: Re: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

Hi Christoph,

On Tue, Sep 15, 2020 at 05:51:21PM +0200, Christoph Hellwig wrote:
> Implement the alloc_noncoherent method to provide memory that is neither
> coherent nor contiguous.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/iommu/dma-iommu.c | 41 +++++++++++++++++++++++++++++++++++----
> 1 file changed, 37 insertions(+), 4 deletions(-)
>

Sorry for being late to the party and thanks a lot for the patch. Please see my
comments inline.

[snip]
> @@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
> return cpu_addr;
> }
>
> +#ifdef CONFIG_DMA_REMAP
> +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
> + dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
> +{
> + if (!gfpflags_allow_blocking(gfp)) {
> + struct page *page;
> +
> + page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
> + if (!page)
> + return NULL;
> + return page_address(page);
> + }
> +
> + return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
> + PAGE_KERNEL, 0);

iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute
to optimize the allocations for devices which don't care about how contiguous
the backing memory is. Do you think we could add an attrs argument to this
function and pass it there?

As ARM is being moved to the common iommu-dma layer as well, we'll probably
make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to
conserve the vmalloc area.

Best regards,
Tomasz
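
For readers following along, the request above can be sketched as a small
userspace model. This is not kernel code: every name in it
(sketch_alloc_noncoherent, struct sketch_result, the SKETCH_ATTR_* values) is
hypothetical and only mirrors the shape of the quoted function. It assumes the
proposal is simply to forward a caller-supplied attrs value to the remap path,
where the patch as posted passes 0.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's DMA_ATTR_* bits (values made up). */
#define SKETCH_ATTR_ALLOC_SINGLE_PAGES	(1UL << 0)
#define SKETCH_ATTR_NO_KERNEL_MAPPING	(1UL << 1)

/* Which of the two branches of the quoted function would be taken. */
enum sketch_path {
	SKETCH_PATH_CONTIGUOUS,	/* models the dma_common_alloc_pages() branch */
	SKETCH_PATH_REMAP,	/* models the iommu_dma_alloc_remap() branch */
};

struct sketch_result {
	enum sketch_path path;
	unsigned long remap_attrs;	/* attrs that would reach the remap path */
};

/*
 * Model of the proposed signature: same dispatch as in the quoted patch,
 * but with an extra attrs argument that is forwarded on the blocking path.
 */
static struct sketch_result sketch_alloc_noncoherent(bool gfp_allows_blocking,
						     unsigned long attrs)
{
	struct sketch_result res = { SKETCH_PATH_CONTIGUOUS, 0 };

	/* Non-blocking callers get physically contiguous pages; attrs unused. */
	if (!gfp_allows_blocking)
		return res;

	/* Blocking callers take the remap path and now carry their attrs. */
	res.path = SKETCH_PATH_REMAP;
	res.remap_attrs = attrs;
	return res;
}
```

In this model a blocking caller that passes SKETCH_ATTR_ALLOC_SINGLE_PAGES
sees that bit arrive at the remap path, while a non-blocking caller is
unaffected by whatever attrs it passes.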

2020-09-25 18:49:06

by Tomasz Figa

Subject: Re: [PATCH 01/18] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT flag

Hi Christoph,

On Tue, Sep 15, 2020 at 05:51:05PM +0200, Christoph Hellwig wrote:
> From: Sergey Senozhatsky <[email protected]>
>
> The patch partially reverts some of the UAPI bits of the buffer
> cache management hints. Namely, the queue consistency (memory
> coherency) user-space hint because, as it turned out, the kernel
> implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.
>
> The patch reverts both kernel and user space parts: removes the
> DMA consistency attr functions, rolls back changes to v4l2_requestbuffers,
> v4l2_create_buffers structures and corresponding UAPI functions
> (plus compat32 layer) and cleans up the documentation.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Sergey Senozhatsky <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> .../userspace-api/media/v4l/buffer.rst | 17 -------
> .../media/v4l/vidioc-create-bufs.rst | 6 +--
> .../media/v4l/vidioc-reqbufs.rst | 12 +----
> .../media/common/videobuf2/videobuf2-core.c | 46 +++----------------
> .../common/videobuf2/videobuf2-dma-contig.c | 19 --------
> .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +-
> .../media/common/videobuf2/videobuf2-v4l2.c | 18 +-------
> drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +---
> drivers/media/v4l2-core/v4l2-ioctl.c | 5 +-
> include/media/videobuf2-core.h | 7 +--
> include/uapi/linux/videodev2.h | 13 +-----
> 11 files changed, 22 insertions(+), 134 deletions(-)

Acked-by: Tomasz Figa <[email protected]>

Best regards,
Tomasz

2020-09-26 14:16:06

by Christoph Hellwig

Subject: Re: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

On Fri, Sep 25, 2020 at 06:46:22PM +0000, Tomasz Figa wrote:
> > +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
> > + dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
> > +{
> > + if (!gfpflags_allow_blocking(gfp)) {
> > + struct page *page;
> > +
> > + page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
> > + if (!page)
> > + return NULL;
> > + return page_address(page);
> > + }
> > +
> > + return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
> > + PAGE_KERNEL, 0);
>
> iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute
> to optimize the allocations for devices which don't care about how contiguous
> the backing memory is. Do you think we could add an attrs argument to this
> function and pass it there?
>
> As ARM is being moved to the common iommu-dma layer as well, we'll probably
> make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to
> conserve the vmalloc area.

We could probably add it. However I wonder why this is something the
drivers should care about. Isn't this really something that should
be a kernel-wide policy for a given system?

2020-09-26 15:38:26

by Tomasz Figa

Subject: Re: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

On Sat, Sep 26, 2020 at 4:14 PM Christoph Hellwig <[email protected]> wrote:
>
> On Fri, Sep 25, 2020 at 06:46:22PM +0000, Tomasz Figa wrote:
> > > +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
> > > + dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
> > > +{
> > > + if (!gfpflags_allow_blocking(gfp)) {
> > > + struct page *page;
> > > +
> > > + page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
> > > + if (!page)
> > > + return NULL;
> > > + return page_address(page);
> > > + }
> > > +
> > > + return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
> > > + PAGE_KERNEL, 0);
> >
> > iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute
> > to optimize the allocations for devices which don't care about how contiguous
> > the backing memory is. Do you think we could add an attrs argument to this
> > function and pass it there?
> >
> > As ARM is being moved to the common iommu-dma layer as well, we'll probably
> > make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to
> > conserve the vmalloc area.
>
> We could probably add it. However I wonder why this is something the
> drivers should care about. Isn't this really something that should
> be a kernel-wide policy for a given system?

There are IOMMUs out there which support huge pages and those can
benefit *some* hardware depending on what kind of accesses they
perform, possibly on a per-buffer basis. At the same time, order > 0
allocations can be expensive, significantly affecting allocation
latency, so for devices which don't care about huge pages anyone would
prefer simple single-page allocations. Currently the drivers know
best whether the hardware they drive would care. There are some
decision factors listed in the documentation [1].

I can imagine cases where drivers might not be best placed to decide
this - for example, the workload could vary depending on the
userspace or a product decision regarding performance vs
allocation latency, but we haven't seen such cases in practice yet.

[1] https://www.kernel.org/doc/html/latest/core-api/dma-attributes.html?highlight=dma_attr_alloc_single_pages#dma-attr-alloc-single-pages

Best regards,
Tomasz
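
The trade-off described above (fewer, larger chunks that an IOMMU with huge
page support can map efficiently, versus many cheap order-0 allocations) can
be illustrated with a small userspace model. It is a simplified sketch, not
the kernel's allocator: plan_allocation, struct alloc_plan and
MODEL_ATTR_SINGLE_PAGES are hypothetical names, and the model assumes a
greedy largest-order-first strategy in which every allocation succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DMA_ATTR_ALLOC_SINGLE_PAGES (value made up). */
#define MODEL_ATTR_SINGLE_PAGES	(1UL << 0)

struct alloc_plan {
	size_t chunks;		/* number of separate page allocations */
	unsigned int max_order;	/* largest allocation order used */
};

/*
 * Plan how a buffer of npages pages would be assembled. Without the
 * single-pages attribute the model greedily picks the largest order that
 * still fits, up to highest_order; with it, every chunk is order 0.
 */
static struct alloc_plan plan_allocation(size_t npages, unsigned long attrs,
					 unsigned int highest_order)
{
	struct alloc_plan plan = { 0, 0 };
	unsigned int order_cap;

	/* Keep the shifts below well defined. */
	assert(highest_order < 8 * sizeof(size_t));

	order_cap = (attrs & MODEL_ATTR_SINGLE_PAGES) ? 0 : highest_order;

	while (npages > 0) {
		unsigned int order = order_cap;

		/* Shrink the order until the chunk fits in what is left. */
		while (((size_t)1 << order) > npages)
			order--;

		if (order > plan.max_order)
			plan.max_order = order;
		npages -= (size_t)1 << order;
		plan.chunks++;
	}
	return plan;
}
```

For a 512-page buffer with highest_order 9, the model produces a single
order-9 chunk without the attribute and 512 order-0 chunks with it. The
first layout suits hardware that benefits from huge IOMMU mappings; the
second avoids the latency of high-order allocations.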