2023-09-29 22:13:09

by Chris Leech

[permalink] [raw]
Subject: [PATCH 0/3] UIO_MEM_DMA_COHERENT for cnic/bnx2/bnx2x

During bnx2i iSCSI testing we ran into page refcounting issues in the
uio mmaps exported from cnic to the iscsiuio process, and bisected back
to the removal of the __GFP_COMP flag from dma_alloc_coherent calls.

In order to fix these drivers to be able to mmap dma coherent memory via
a uio device, without resorting to hacks and working with an iommu
enabled, introduce a new uio mmap type backed by dma_mmap_coherent.

While converting the uio interface, I also noticed that not all of these
allocations were PAGE_SIZE aligned. Particularly the bnx2/bnx2x status
block mapping was much smaller than any architecture page size, and I
was concerned that it could be unintentionally exposing kernel memory.

Chris Leech (3):
uio: introduce UIO_DMA_COHERENT type
cnic. bnx2, bnx2x: page align uio mmap allocations
cnic, bnx2, bnx2x: use UIO_MEM_DMA_COHERENT

drivers/net/ethernet/broadcom/bnx2.c | 2 ++
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 10 +++---
drivers/net/ethernet/broadcom/cnic.c | 34 ++++++++++++-------
drivers/net/ethernet/broadcom/cnic.h | 1 +
drivers/net/ethernet/broadcom/cnic_if.h | 1 +
drivers/uio/uio.c | 34 +++++++++++++++++++
include/linux/uio_driver.h | 12 +++++--
7 files changed, 75 insertions(+), 19 deletions(-)

--
2.41.0


2023-09-29 23:51:29

by Chris Leech

[permalink] [raw]
Subject: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type

Add a UIO memtype specificially for sharing dma_alloc_coherent
memory with userspace, backed by dma_mmap_coherent.

Signed-off-by: Chris Leech <[email protected]>
---
drivers/uio/uio.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/uio_driver.h | 12 ++++++++++--
2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 62082d64ece0..f8f1f7ba6378 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -24,6 +24,7 @@
#include <linux/kobject.h>
#include <linux/cdev.h>
#include <linux/uio_driver.h>
+#include <linux/dma-mapping.h>

#define UIO_MAX_DEVICES (1U << MINORBITS)

@@ -759,6 +760,36 @@ static int uio_mmap_physical(struct vm_area_struct *vma)
vma->vm_page_prot);
}

+static int uio_mmap_dma_coherent(struct vm_area_struct *vma)
+{
+ struct uio_device *idev = vma->vm_private_data;
+ int mi = uio_find_mem_index(vma);
+ struct uio_mem *mem;
+ int rc;
+
+ if (mi < 0)
+ return -EINVAL;
+ mem = idev->info->mem + mi;
+
+ if (mem->dma_addr & ~PAGE_MASK)
+ return -ENODEV;
+ if (vma->vm_end - vma->vm_start > mem->size)
+ return -EINVAL;
+
+ /*
+ * UIO uses offset to index into the maps for a device.
+ * We need to clear vm_pgoff for dma_mmap_coherent.
+ */
+ vma->vm_pgoff = 0;
+ rc = dma_mmap_coherent(mem->dma_device,
+ vma,
+ mem->virtual_addr,
+ mem->dma_addr,
+ vma->vm_end - vma->vm_start);
+ vma->vm_pgoff = mi;
+ return rc;
+}
+
static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
{
struct uio_listener *listener = filep->private_data;
@@ -806,6 +837,9 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
case UIO_MEM_VIRTUAL:
ret = uio_mmap_logical(vma);
break;
+ case UIO_MEM_DMA_COHERENT:
+ ret = uio_mmap_dma_coherent(vma);
+ break;
default:
ret = -EINVAL;
}
diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
index 47c5962b876b..ede58e984658 100644
--- a/include/linux/uio_driver.h
+++ b/include/linux/uio_driver.h
@@ -36,11 +36,18 @@ struct uio_map;
*/
struct uio_mem {
const char *name;
- phys_addr_t addr;
+ union {
+ phys_addr_t addr;
+ dma_addr_t dma_addr;
+ };
unsigned long offs;
resource_size_t size;
int memtype;
- void __iomem *internal_addr;
+ union {
+ void __iomem *internal_addr;
+ void *virtual_addr;
+ };
+ struct device *dma_device;
struct uio_map *map;
};

@@ -158,6 +165,7 @@ extern int __must_check
#define UIO_MEM_LOGICAL 2
#define UIO_MEM_VIRTUAL 3
#define UIO_MEM_IOVA 4
+#define UIO_MEM_DMA_COHERENT 5

/* defines for uio_port->porttype */
#define UIO_PORT_NONE 0
--
2.41.0

2023-09-30 00:51:38

by Chris Leech

[permalink] [raw]
Subject: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
for dma_alloc_coherent buffers.

The cnic l2_ring and l2_buf mmaps have caused page refcount issues since
the misuse of the __GFP_COMP flag was removed from their
dma_alloc_coherent calls. Fix that by having the uio device use
dma_mmap_coherent.

The bnx2 and bnx2x status block allocations are also dma_alloc_coherent,
and should use dma_mmap_coherent. They didn't allocate multiple pages,
but also didn't seem to work correctly with an iommu enabled.

Fixes: bb73955c0b1d ("cnic: don't pass bogus GFP_ flags to dma_alloc_coherent")
Signed-off-by: Chris Leech <[email protected]>
---
drivers/net/ethernet/broadcom/bnx2.c | 1 +
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 ++
drivers/net/ethernet/broadcom/cnic.c | 25 ++++++++++++-------
drivers/net/ethernet/broadcom/cnic.h | 1 +
drivers/net/ethernet/broadcom/cnic_if.h | 1 +
5 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 84a04eec654a..490f88ad3bd2 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -367,6 +367,7 @@ static void bnx2_setup_cnic_irq_info(struct bnx2 *bp)
cp->irq_arr[0].status_blk = (void *)
((unsigned long) bnapi->status_blk.msi +
(BNX2_SBLK_MSIX_ALIGN_SIZE * sb_id));
+ cp->irq_arr[0].status_blk_map = bp->status_blk_mapping;
cp->irq_arr[0].status_blk_num = sb_id;
cp->num_irq = 1;
}
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2fcde42a05c1..dbaa90b7f889 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14912,9 +14912,11 @@ void bnx2x_setup_cnic_irq_info(struct bnx2x *bp)
else
cp->irq_arr[0].status_blk = (void *)bp->cnic_sb.e1x_sb;

+ cp->irq_arr[0].status_blk_map = bp->cnic_sb_mapping;
cp->irq_arr[0].status_blk_num = bnx2x_cnic_fw_sb_id(bp);
cp->irq_arr[0].status_blk_num2 = bnx2x_cnic_igu_sb_id(bp);
cp->irq_arr[1].status_blk = bp->def_status_blk;
+ cp->irq_arr[1].status_blk_map = bp->def_status_blk_mapping;
cp->irq_arr[1].status_blk_num = DEF_SB_ID;
cp->irq_arr[1].status_blk_num2 = DEF_SB_IGU_ID;

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 67ec397bd171..05e28e00e916 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1106,8 +1106,8 @@ static int cnic_init_uio(struct cnic_dev *dev)
if (test_bit(CNIC_F_BNX2_CLASS, &dev->flags)) {
uinfo->mem[0].size = MB_GET_CID_ADDR(TX_TSS_CID +
TX_MAX_TSS_RINGS + 1);
- uinfo->mem[1].addr = (unsigned long) cp->status_blk.gen &
- CNIC_PAGE_MASK;
+ uinfo->mem[1].dma_addr = cp->status_blk_map;
+ uinfo->mem[1].virtual_addr = cp->status_blk.gen;
if (cp->ethdev->drv_state & CNIC_DRV_STATE_USING_MSIX)
uinfo->mem[1].size = PAGE_ALIGN(BNX2_SBLK_MSIX_ALIGN_SIZE * 9);
else
@@ -1117,22 +1117,27 @@ static int cnic_init_uio(struct cnic_dev *dev)
} else if (test_bit(CNIC_F_BNX2X_CLASS, &dev->flags)) {
uinfo->mem[0].size = pci_resource_len(dev->pcidev, 0);

- uinfo->mem[1].addr = (unsigned long) cp->bnx2x_def_status_blk &
- CNIC_PAGE_MASK;
+ uinfo->mem[1].dma_addr = cp->status_blk_map;
+ uinfo->mem[1].virtual_addr = cp->bnx2x_def_status_blk;
uinfo->mem[1].size = PAGE_ALIGN(sizeof(*cp->bnx2x_def_status_blk));

uinfo->name = "bnx2x_cnic";
}

- uinfo->mem[1].memtype = UIO_MEM_LOGICAL;
+ uinfo->mem[1].dma_device = &dev->pcidev->dev;
+ uinfo->mem[1].memtype = UIO_MEM_DMA_COHERENT;

- uinfo->mem[2].addr = (unsigned long) udev->l2_ring;
+ uinfo->mem[2].dma_addr = udev->l2_ring_map;
+ uinfo->mem[2].virtual_addr = udev->l2_ring;
uinfo->mem[2].size = udev->l2_ring_size;
- uinfo->mem[2].memtype = UIO_MEM_LOGICAL;
+ uinfo->mem[2].dma_device = &dev->pcidev->dev;
+ uinfo->mem[2].memtype = UIO_MEM_DMA_COHERENT;

- uinfo->mem[3].addr = (unsigned long) udev->l2_buf;
+ uinfo->mem[3].dma_addr = udev->l2_buf_map;
+ uinfo->mem[3].virtual_addr = udev->l2_buf;
uinfo->mem[3].size = udev->l2_buf_size;
- uinfo->mem[3].memtype = UIO_MEM_LOGICAL;
+ uinfo->mem[3].dma_device = &dev->pcidev->dev;
+ uinfo->mem[3].memtype = UIO_MEM_DMA_COHERENT;

uinfo->version = CNIC_MODULE_VERSION;
uinfo->irq = UIO_IRQ_CUSTOM;
@@ -1314,6 +1319,7 @@ static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev)
return 0;

cp->bnx2x_def_status_blk = cp->ethdev->irq_arr[1].status_blk;
+ cp->status_blk_map = cp->ethdev->irq_arr[1].status_blk_map;

cp->l2_rx_ring_size = 15;

@@ -5324,6 +5330,7 @@ static int cnic_start_hw(struct cnic_dev *dev)
pci_dev_get(dev->pcidev);
cp->func = PCI_FUNC(dev->pcidev->devfn);
cp->status_blk.gen = ethdev->irq_arr[0].status_blk;
+ cp->status_blk_map = ethdev->irq_arr[0].status_blk_map;
cp->status_blk_num = ethdev->irq_arr[0].status_blk_num;

err = cp->alloc_resc(dev);
diff --git a/drivers/net/ethernet/broadcom/cnic.h b/drivers/net/ethernet/broadcom/cnic.h
index 4baea81bae7a..fedc84ada937 100644
--- a/drivers/net/ethernet/broadcom/cnic.h
+++ b/drivers/net/ethernet/broadcom/cnic.h
@@ -260,6 +260,7 @@ struct cnic_local {
#define SM_RX_ID 0
#define SM_TX_ID 1
} status_blk;
+ dma_addr_t status_blk_map;

struct host_sp_status_block *bnx2x_def_status_blk;

diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 789e5c7e9311..49a11ec80b36 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -190,6 +190,7 @@ struct cnic_ops {
struct cnic_irq {
unsigned int vector;
void *status_blk;
+ dma_addr_t status_blk_map;
u32 status_blk_num;
u32 status_blk_num2;
u32 irq_flags;
--
2.41.0

2023-09-30 12:15:53

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > for dma_alloc_coherent buffers.
>
> Why are ethernet drivers messing around with UIO devices? That's not
> what UIO is for, unless you are trying to do kernel bypass for these
> devices without anyone noticing?
>
> confused,
>
> greg k-h

Hi Greg,

It is for iscsi offload [1], and apparently cnic has been handing off
dma alloc'd memory to uio for this since 2009, but it has just been
handing off the address from dma_alloc_coherent to uio, and giving the
dma handle the bnx2x device. That managed to avoid being an issue
until changes last year rightly cleaned up allowing __GFP_COMP to be
passed to the dma_alloc* calls, which cnic was passing to
dma_alloc_coherent. Now iscsiuio goes to mmap through the uio device
and the result is a BUG and stack trace like below.

It was my suggestion that it either needs to use dma_mmap_coherent to
mmap the memory, which possibly is a mistaken understanding on my
part but what dma_mmap_coherent says it is for, or possibly look to
do something similar to what qed is doing for uio.

Regards,
Jerry


[ 129.196731] page:ffffea0004cacb40 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x132b2d
[ 129.207285] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 129.214650] page_type: 0xffffffff()
[ 129.218584] raw: 0017ffffc0000000 dead000000000100 dead000000000122 0000000000000000
[ 129.227264] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 129.235937] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
[ 129.246632] ------------[ cut here ]------------
[ 129.251817] kernel BUG at include/linux/mm.h:1441!
[ 129.257209] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[ 129.263440] CPU: 1 PID: 1930 Comm: iscsiuio Kdump: loaded Not tainted 6.6.0-0.rc3.26.eln130.x86_64+debug #1
[ 129.274323] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.4.2 01/29/2015
[ 129.282682] RIP: 0010:uio_vma_fault+0x40e/0x520 [uio]
[ 129.288345] Code: 49 83 ee 01 e9 db fe ff ff be 01 00 00 00 4c 89 f7 e8 96 7f ae e9 e9 2b ff ff ff 48 c7 c6 00 08 52 c1 4c 89 f7 e8 12 1b 8e e9 <0f> 0b e8 cb 7b a3 e9 e9 f6 fd ff ff bb 02 00 00 00 e9 2b ff ff ff
[ 129.309311] RSP: 0018:ffffc900022878b0 EFLAGS: 00010246
[ 129.315156] RAX: 000000000000005c RBX: ffffea0004cacb40 RCX: 0000000000000000
[ 129.323130] RDX: 0000000000000000 RSI: ffffffffad380f00 RDI: 0000000000000001
[ 129.331102] RBP: ffff8881a65da528 R08: 0000000000000001 R09: fffff52000450eca
[ 129.339074] R10: ffffc90002287657 R11: 0000000000000001 R12: ffffea0004cacb74
[ 129.347044] R13: ffffc900022879f8 R14: ffffea0004cacb40 R15: ffff8881946a0400
[ 129.355015] FS: 00007fc6dcafe640(0000) GS:ffff8885cb800000(0000) knlGS:0000000000000000
[ 129.364054] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 129.370474] CR2: 000055b5c9f091a0 CR3: 0000000162292001 CR4: 00000000001706e0
[ 129.378446] Call Trace:
[ 129.381183] <TASK>
[ 129.383532] ? die+0x36/0x90
[ 129.386765] ? do_trap+0x199/0x240
[ 129.390573] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.395560] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.400539] ? do_error_trap+0xa3/0x170
[ 129.404830] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.409815] ? handle_invalid_op+0x2c/0x40
[ 129.414395] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.419361] ? exc_invalid_op+0x2d/0x40
[ 129.423657] ? asm_exc_invalid_op+0x1a/0x20
[ 129.428332] ? uio_vma_fault+0x40e/0x520 [uio]
[ 129.433301] __do_fault+0xf2/0x5a0
[ 129.437103] do_fault+0x68e/0xc30
[ 129.440804] ? __pfx___pte_offset_map_lock+0x10/0x10
[ 129.446350] __handle_mm_fault+0x8e0/0xeb0
[ 129.450939] ? follow_page_pte+0x29a/0xda0
[ 129.455513] ? __pfx___handle_mm_fault+0x10/0x10
[ 129.460668] ? __pfx_follow_page_pte+0x10/0x10
[ 129.465626] ? follow_pmd_mask.isra.0+0x1d4/0xab0
[ 129.470877] ? count_memcg_event_mm.part.0+0xc7/0x1f0
[ 129.476515] ? rcu_is_watching+0x15/0xb0
[ 129.480896] handle_mm_fault+0x2f2/0x8d0
[ 129.485277] ? check_vma_flags+0x1c2/0x420
[ 129.489852] __get_user_pages+0x333/0xa20
[ 129.494331] ? __pfx___get_user_pages+0x10/0x10
[ 129.499389] ? __pfx_mt_find+0x10/0x10
[ 129.503568] ? __mm_populate+0xe7/0x360
[ 129.507850] ? rcu_is_watching+0x15/0xb0
[ 129.512231] populate_vma_page_range+0x1e9/0x2d0
[ 129.517388] ? __pfx_populate_vma_page_range+0x10/0x10
[ 129.523125] ? __pfx_mmap_region+0x10/0x10
[ 129.527693] __mm_populate+0x1ff/0x360
[ 129.531882] ? __pfx___mm_populate+0x10/0x10
[ 129.536649] ? do_mmap+0x61d/0xcd0
[ 129.540446] ? __up_write+0x1a5/0x500
[ 129.544536] vm_mmap_pgoff+0x276/0x360
[ 129.548722] ? rcu_is_watching+0x15/0xb0
[ 129.553093] ? __pfx_vm_mmap_pgoff+0x10/0x10
[ 129.557861] ? rcu_is_watching+0x15/0xb0
[ 129.562239] ? lock_release+0x25c/0x300
[ 129.566522] ? __fget_files+0x1e0/0x380
[ 129.570807] ksys_mmap_pgoff+0x2e4/0x430
[ 129.575189] do_syscall_64+0x60/0x90
[ 129.579180] ? do_syscall_64+0x6c/0x90
[ 129.583357] ? lockdep_hardirqs_on+0x7d/0x100
[ 129.588235] ? do_syscall_64+0x6c/0x90
[ 129.592420] ? lockdep_hardirqs_on+0x7d/0x100
[ 129.597283] entry_SYSCALL_64_after_hwframe+0x6e/0xd8



[1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README

2023-09-30 12:32:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> for dma_alloc_coherent buffers.

Why are ethernet drivers messing around with UIO devices? That's not
what UIO is for, unless you are trying to do kernel bypass for these
devices without anyone noticing?

confused,

greg k-h

2023-09-30 17:20:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type

On Fri, Sep 29, 2023 at 10:00:21AM -0700, Chris Leech wrote:
> Add a UIO memtype specificially for sharing dma_alloc_coherent
> memory with userspace, backed by dma_mmap_coherent.

Are you sure that you can share this type of memory with userspace
safely? And you are saying what you are doing here, but not why you
want to do it and who will use it.

What are the userspace implications for accessing this type of memory?

>
> Signed-off-by: Chris Leech <[email protected]>
> ---
> drivers/uio/uio.c | 34 ++++++++++++++++++++++++++++++++++
> include/linux/uio_driver.h | 12 ++++++++++--
> 2 files changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 62082d64ece0..f8f1f7ba6378 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -24,6 +24,7 @@
> #include <linux/kobject.h>
> #include <linux/cdev.h>
> #include <linux/uio_driver.h>
> +#include <linux/dma-mapping.h>
>
> #define UIO_MAX_DEVICES (1U << MINORBITS)
>
> @@ -759,6 +760,36 @@ static int uio_mmap_physical(struct vm_area_struct *vma)
> vma->vm_page_prot);
> }
>
> +static int uio_mmap_dma_coherent(struct vm_area_struct *vma)
> +{
> + struct uio_device *idev = vma->vm_private_data;
> + int mi = uio_find_mem_index(vma);
> + struct uio_mem *mem;
> + int rc;
> +
> + if (mi < 0)
> + return -EINVAL;
> + mem = idev->info->mem + mi;
> +
> + if (mem->dma_addr & ~PAGE_MASK)
> + return -ENODEV;
> + if (vma->vm_end - vma->vm_start > mem->size)
> + return -EINVAL;
> +
> + /*
> + * UIO uses offset to index into the maps for a device.
> + * We need to clear vm_pgoff for dma_mmap_coherent.
> + */
> + vma->vm_pgoff = 0;
> + rc = dma_mmap_coherent(mem->dma_device,
> + vma,
> + mem->virtual_addr,
> + mem->dma_addr,
> + vma->vm_end - vma->vm_start);
> + vma->vm_pgoff = mi;
> + return rc;
> +}
> +
> static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
> {
> struct uio_listener *listener = filep->private_data;
> @@ -806,6 +837,9 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
> case UIO_MEM_VIRTUAL:
> ret = uio_mmap_logical(vma);
> break;
> + case UIO_MEM_DMA_COHERENT:
> + ret = uio_mmap_dma_coherent(vma);
> + break;
> default:
> ret = -EINVAL;
> }
> diff --git a/include/linux/uio_driver.h b/include/linux/uio_driver.h
> index 47c5962b876b..ede58e984658 100644
> --- a/include/linux/uio_driver.h
> +++ b/include/linux/uio_driver.h
> @@ -36,11 +36,18 @@ struct uio_map;
> */
> struct uio_mem {
> const char *name;
> - phys_addr_t addr;
> + union {
> + phys_addr_t addr;
> + dma_addr_t dma_addr;
> + };
> unsigned long offs;
> resource_size_t size;
> int memtype;
> - void __iomem *internal_addr;
> + union {
> + void __iomem *internal_addr;
> + void *virtual_addr;
> + };
> + struct device *dma_device;

Why are you adding a new struct device here?

And why the unions? How are you going to verify that they are being
used correctly? What space savings are you attempting to do here and
why?

thanks,

greg k-h

2023-09-30 19:31:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sat, Sep 30, 2023 at 02:10:50AM -0700, Jerry Snitselaar wrote:
> [1] https://github.com/open-iscsi/open-iscsi/blob/master/iscsiuio/README

That's IP offload, not what UIO is supposed to be for at all (yes, I
know DPDK abuses this api as well, and I hate it.) But this is on a
network card, it shouldn't need UIO. Why is iscsi somehow "special" for
this?

still confused,

greg k-h

2023-09-30 20:02:18

by Chris Leech

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > for dma_alloc_coherent buffers.
>
> Why are ethernet drivers messing around with UIO devices? That's not
> what UIO is for, unless you are trying to do kernel bypass for these
> devices without anyone noticing?
>
> confused,

It's confusing. The bnx2 driver stack included a cnic (converged nic?)
module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
offload drivers (iscsi, fcoe, rdma).

The iscsi module (bnx2i) uses a passthrough interface from cnic to
handle some network configuration that the device firmware doesn't do.
It uses a uio device and a userspace component called iscsiuio to do
that.

Questions beyond that will probably need to be answer by one of the many
Marvell engineers copied on this thread.

- Chris

2023-10-01 02:55:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > for dma_alloc_coherent buffers.
> >
> > Why are ethernet drivers messing around with UIO devices? That's not
> > what UIO is for, unless you are trying to do kernel bypass for these
> > devices without anyone noticing?
> >
> > confused,
>
> It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> offload drivers (iscsi, fcoe, rdma).
>
> The iscsi module (bnx2i) uses a passthrough interface from cnic to
> handle some network configuration that the device firmware doesn't do.
> It uses a uio device and a userspace component called iscsiuio to do
> that.

That's horrible, and not what the UIO api is for at all. Configure the
device like any other normal kernel device, don't poke at raw memory
values directly, that way lies madness.

Have a pointer to the userspace tool anywhere? All I found looks like a
full IP stack in userspace under that name, and surely that's not what
this api is for...

thanks,

greg k-h

2023-10-01 03:00:44

by Chris Leech

[permalink] [raw]
Subject: Re: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type

On Sat, Sep 30, 2023 at 09:10:10AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Sep 29, 2023 at 10:00:21AM -0700, Chris Leech wrote:
> > Add a UIO memtype specificially for sharing dma_alloc_coherent
> > memory with userspace, backed by dma_mmap_coherent.
>
> Are you sure that you can share this type of memory with userspace
> safely? And you are saying what you are doing here, but not why you
> want to do it and who will use it.
>
> What are the userspace implications for accessing this type of memory?

Thanks for taking the time to look at this Greg.
I'm trying to help Marvell fix a regression with these drivers, by
figuring out what the right way to handle this type of mmap is.

The dma_mmap_coherent API exists for exactly this, so I thought making
the uio interface aware of it made sense. There are uio drivers sharing
dma_alloc_coherent memory (uio_dmem_genirq, uio_pruss) using
UIO_MEM_PHYS, but that falls apart in the face of an iommu.

> > struct uio_mem {
> > const char *name;
> > - phys_addr_t addr;
> > + union {
> > + phys_addr_t addr;
> > + dma_addr_t dma_addr;
> > + };
> > unsigned long offs;
> > resource_size_t size;
> > int memtype;
> > - void __iomem *internal_addr;
> > + union {
> > + void __iomem *internal_addr;
> > + void *virtual_addr;
> > + };
> > + struct device *dma_device;
>
> Why are you adding a new struct device here?

dma_mmap_coherent wants it.

> And why the unions? How are you going to verify that they are being
> used correctly? What space savings are you attempting to do here and
> why?

I should have expected that would be questioned, I was being paranoid
about mixing different pointer and address types. I can remove the
unions if putting a dma_addr_t in addr going to be OK.

- Chris

2023-10-01 11:50:47

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
>> On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
>>> On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
>>>> Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
>>>> for dma_alloc_coherent buffers.
>>>
>>> Why are ethernet drivers messing around with UIO devices? That's not
>>> what UIO is for, unless you are trying to do kernel bypass for these
>>> devices without anyone noticing?
>>>
>>> confused,
>>
>> It's confusing. The bnx2 driver stack included a cnic (converged nic?)
>> module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
>> offload drivers (iscsi, fcoe, rdma).
>>
>> The iscsi module (bnx2i) uses a passthrough interface from cnic to
>> handle some network configuration that the device firmware doesn't do.
>> It uses a uio device and a userspace component called iscsiuio to do
>> that.
>
> That's horrible, and not what the UIO api is for at all. Configure the
> device like any other normal kernel device, don't poke at raw memory
> values directly, that way lies madness.
>
> Have a pointer to the userspace tool anywhere? All I found looks like a
> full IP stack in userspace under that name, and surely that's not what
> this api is for...
>
But that's how the interface is used, in particular for the bnx2i
driver. Problem is that the bnx2i iSCSI offload is just that, an iSCSI
offload. Not a TCP offload. So if the iSCSI interface is configured to
acquire the IP address via DHCP, someone has to run the DHCP protocol.
But the iSCSI offload can't, and the bnx2i PCI device is not a network
device so that the normal network stack can't be used.
And so the architects of the bnx2i card decided to use UIO to pass
the network traffic to userspace, and used the userspace 'iscsiuio'
application to run DHCP in userspace.

But's been that way for several years now; so long, in fact, that
the card itself has been out of support from Marvell (since quite some
years, too, IIRC). And even the successor of that card (the qedi driver)
is nearing EOL. Mind you, the qedi driver is using the same interface
(by using UIO to run DHCP in userspace), so singling out the bnx2i for
bad design can be construed as being unfair :-)

I agree, though, that the design is a mess.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2023-10-01 18:05:34

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sun, Oct 01, 2023 at 01:57:25PM +0200, Greg Kroah-Hartman wrote:
> On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote:
> > On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> > > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> > > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > > > > for dma_alloc_coherent buffers.
> > > > >
> > > > > Why are ethernet drivers messing around with UIO devices? That's not
> > > > > what UIO is for, unless you are trying to do kernel bypass for these
> > > > > devices without anyone noticing?
> > > > >
> > > > > confused,
> > > >
> > > > It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> > > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> > > > offload drivers (iscsi, fcoe, rdma).
> > > >
> > > > The iscsi module (bnx2i) uses a passthrough interface from cnic to
> > > > handle some network configuration that the device firmware doesn't do.
> > > > It uses a uio device and a userspace component called iscsiuio to do
> > > > that.
> > >
> > > That's horrible, and not what the UIO api is for at all. Configure the
> > > device like any other normal kernel device, don't poke at raw memory
> > > values directly, that way lies madness.
> > >
> > > Have a pointer to the userspace tool anywhere? All I found looks like a
> > > full IP stack in userspace under that name, and surely that's not what
> > > this api is for...
> > >
> > But that's how the interface is used, in particular for the bnx2i driver.
> > Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not
> > a TCP offload. So if the iSCSI interface is configured to
> > acquire the IP address via DHCP, someone has to run the DHCP protocol.
> > But the iSCSI offload can't, and the bnx2i PCI device is not a network
> > device so that the normal network stack can't be used.
> > And so the architects of the bnx2i card decided to use UIO to pass
> > the network traffic to userspace, and used the userspace 'iscsiuio'
> > application to run DHCP in userspace.
> >
> > But's been that way for several years now; so long, in fact, that
> > the card itself has been out of support from Marvell (since quite some
> > years, too, IIRC). And even the successor of that card (the qedi driver)
> > is nearing EOL. Mind you, the qedi driver is using the same interface (by
> > using UIO to run DHCP in userspace), so singling out the bnx2i for bad
> > design can be construed as being unfair :-)
>
> Ok, let's say they are all horrible! :)
>
> > I agree, though, that the design is a mess.
>
> Ok, so why are we papering over it and continuing to allow it to exist?
>
> What "broke" to suddenly require this UIO change? If this has been
> around for a very long time, what has caused this to now require the UIO
> layer to change?
>
> thanks,
>
> greg k-h

Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
in particular these two (from the e529d3507a93 dma-mapping pull for
6.2):

ffcb75458460 dma-mapping: reject __GFP_COMP in dma_alloc_attrs | 2022-11-21 | (Christoph Hellwig)
bb73955c0b1d cnic: don't pass bogus GFP_ flags to dma_alloc_coherent | 2022-11-21 | (Christoph Hellwig)


Regards,
Jerry

2023-10-02 06:04:50

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> in particular these two (from the e529d3507a93 dma-mapping pull for
> 6.2):

That's complete BS. The driver was broken since day 1 and always
ignored the DMA API requirement to never try to grab the page from the
dma coherent allocation because you generally speaking can't. It just
happened to accidentally work the trivial dma coherent allocator that
is used on x86.

2023-10-02 06:09:47

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/3] uio: introduce UIO_DMA_COHERENT type

This is the right way to map dma coherent memory to userspace.
I have to say I agree with the doubts on what it is actually
trying to do, though.

2023-10-02 10:36:33

by Jerry Snitselaar

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > in particular these two (from the e529d3507a93 dma-mapping pull for
> > 6.2):
>
> That's complete BS. The driver was broken since day 1 and always
> ignored the DMA API requirement to never try to grab the page from the
> dma coherent allocation because you generally speaking can't. It just
> happened to accidentally work the trivial dma coherent allocator that
> is used on x86.
>

re-sending since gmail decided to not send plain text:

Yes, I agree that it has been broken and misusing the API. Greg's
question was what changed though, and it was the clean up of
__GFP_COMP in dma-mapping that brought the problem in the driver to
light.

I already said the other day that cnic has been doing this for 14
years. I'm not blaming you or your __GFP_COMP cleanup commits, they
just uncovered that cnic was doing something wrong. My apologies if
you took it that way.

Regards,
Jerry

2023-10-02 13:45:13

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On 10/2/23 10:46, Greg Kroah-Hartman wrote:
> On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
>> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
>>> On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
>>>> Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
>>>> in particular these two (from the e529d3507a93 dma-mapping pull for
>>>> 6.2):
>>>
>>> That's complete BS. The driver was broken since day 1 and always
>>> ignored the DMA API requirement to never try to grab the page from the
>>> dma coherent allocation because you generally speaking can't. It just
>>> happened to accidentally work the trivial dma coherent allocator that
>>> is used on x86.
>>>
>>
>> re-sending since gmail decided to not send plain text:
>>
>> Yes, I agree that it has been broken and misusing the API. Greg's
>> question was what changed though, and it was the clean up of
>> __GFP_COMP in dma-mapping that brought the problem in the driver to
>> light.
>>
>> I already said the other day that cnic has been doing this for 14
>> years. I'm not blaming you or your __GFP_COMP cleanup commits, they
>> just uncovered that cnic was doing something wrong. My apologies if
>> you took it that way.
>
> As these devices aren't being made anymore, and this api is really not a
> good idea in the first place, why don't we just leave it broken and see
> if anyone notices?
>
Guess what triggered this mail thread.
Some customers did notice.

Problem is that these devices were built as the network interface in
some bladecenter machines, so you can't just replace them with a
different Ethernet card.

Cheers,

Hannes

2023-10-02 15:42:18

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
> On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > > in particular these two (from the e529d3507a93 dma-mapping pull for
> > > 6.2):
> >
> > That's complete BS. The driver was broken since day 1 and always
> > ignored the DMA API requirement to never try to grab the page from the
> > dma coherent allocation because you generally speaking can't. It just
> > happened to accidentally work the trivial dma coherent allocator that
> > is used on x86.
> >
>
> re-sending since gmail decided to not send plain text:
>
> Yes, I agree that it has been broken and misusing the API. Greg's
> question was what changed though, and it was the clean up of
> __GFP_COMP in dma-mapping that brought the problem in the driver to
> light.
>
> I already said the other day that cnic has been doing this for 14
> years. I'm not blaming you or your __GFP_COMP cleanup commits, they
> just uncovered that cnic was doing something wrong. My apologies if
> you took it that way.

As these devices aren't being made anymore, and this api is really not a
good idea in the first place, why don't we just leave it broken and see
if anyone notices?

thanks,

greg k-h

2023-10-02 22:49:55

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Sun, Oct 01, 2023 at 12:44:05PM +0200, Hannes Reinecke wrote:
> On 9/30/23 20:28, Greg Kroah-Hartman wrote:
> > On Sat, Sep 30, 2023 at 11:19:20AM -0700, Chris Leech wrote:
> > > On Sat, Sep 30, 2023 at 09:06:51AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Sep 29, 2023 at 10:00:23AM -0700, Chris Leech wrote:
> > > > > Make use of the new UIO_MEM_DMA_COHERENT type to properly handle mmap
> > > > > for dma_alloc_coherent buffers.
> > > >
> > > > Why are ethernet drivers messing around with UIO devices? That's not
> > > > what UIO is for, unless you are trying to do kernel bypass for these
> > > > devices without anyone noticing?
> > > >
> > > > confused,
> > >
> > > It's confusing. The bnx2 driver stack included a cnic (converged nic?)
> > > module that sits between the ethernet drivers (bnx2, bnx2x) and protocol
> > > offload drivers (iscsi, fcoe, rdma).
> > >
> > > The iscsi module (bnx2i) uses a passthrough interface from cnic to
> > > handle some network configuration that the device firmware doesn't do.
> > > It uses a uio device and a userspace component called iscsiuio to do
> > > that.
> >
> > That's horrible, and not what the UIO api is for at all. Configure the
> > device like any other normal kernel device, don't poke at raw memory
> > values directly, that way lies madness.
> >
> > Have a pointer to the userspace tool anywhere? All I found looks like a
> > full IP stack in userspace under that name, and surely that's not what
> > this api is for...
> >
> But that's how the interface is used, in particular for the bnx2i driver.
> Problem is that the bnx2i iSCSI offload is just that, an iSCSI offload. Not
> a TCP offload. So if the iSCSI interface is configured to
> acquire the IP address via DHCP, someone has to run the DHCP protocol.
> But the iSCSI offload can't, and the bnx2i PCI device is not a network
> device so that the normal network stack can't be used.
> And so the architects of the bnx2i card decided to use UIO to pass
> the network traffic to userspace, and used the userspace 'iscsiuio'
> application to run DHCP in userspace.
>
> But's been that way for several years now; so long, in fact, that
> the card itself has been out of support from Marvell (since quite some
> years, too, IIRC). And even the successor of that card (the qedi driver)
> is nearing EOL. Mind you, the qedi driver is using the same interface (by
> using UIO to run DHCP in userspace), so singling out the bnx2i for bad
> design can be construed as being unfair :-)

Ok, let's say they are all horrible! :)

> I agree, though, that the design is a mess.

Ok, so why are we papering over it and continuing to allow it to exist?

What "broke" to suddenly require this UIO change? If this has been
around for a very long time, what has caused this to now require the UIO
layer to change?

thanks,

greg k-h

2023-10-05 15:13:51

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH 3/3] cnic,bnx2,bnx2x: use UIO_MEM_DMA_COHERENT

On Mon, 2023-10-02 at 10:59 +0200, Hannes Reinecke wrote:
> On 10/2/23 10:46, Greg Kroah-Hartman wrote:
> > On Mon, Oct 02, 2023 at 12:50:21AM -0700, Jerry Snitselaar wrote:
> > > On Mon, Oct 02, 2023 at 08:04:24AM +0200, Christoph Hellwig wrote:
> > > > On Sun, Oct 01, 2023 at 07:22:36AM -0700, Jerry Snitselaar wrote:
> > > > > Changes last year to the dma-mapping api to no longer allow __GFP_COMP,
> > > > > in particular these two (from the e529d3507a93 dma-mapping pull for
> > > > > 6.2):
> > > >
> > > > That's complete BS. The driver was broken since day 1 and always
> > > > ignored the DMA API requirement to never try to grab the page from the
> > > > dma coherent allocation because you generally speaking can't. It just
> > > > happened to accidentally work the trivial dma coherent allocator that
> > > > is used on x86.
> > > >
> > >
> > > re-sending since gmail decided to not send plain text:
> > >
> > > Yes, I agree that it has been broken and misusing the API. Greg's
> > > question was what changed though, and it was the clean up of
> > > __GFP_COMP in dma-mapping that brought the problem in the driver to
> > > light.
> > >
> > > I already said the other day that cnic has been doing this for 14
> > > years. I'm not blaming you or your __GFP_COMP cleanup commits, they
> > > just uncovered that cnic was doing something wrong. My apologies if
> > > you took it that way.
> >
> > As these devices aren't being made anymore, and this api is really not a
> > good idea in the first place, why don't we just leave it broken and see
> > if anyone notices?
> >
> Guess what triggered this mail thread.
> Some customers did notice.
>
> Problem is that these devices were built as the network interface in
> some bladecenter machines, so you can't just replace them with a
> different Ethernet card.

This route looks a no-go.

Out of sheer ignorance, would the iommu hack hinted in the cover letter
require similar controversial changes?

Cheers,

Paolo