Hi all,
After merging the dma-mapping tree, today's linux-next build (powerpc
pseries_le_defconfig) failed like this:
[ 1.829053][ T1] ------------[ cut here ]------------
[ 1.829629][ T1] kernel BUG at include/linux/iommu-helper.h:21!
[ 1.830182][ T1] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1.830302][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 1.830436][ T1] Modules linked in:
[ 1.830879][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1 #2
[ 1.831042][ T1] NIP: c0000000006f4944 LR: c0000000006f4924 CTR: c00000000004aa10
[ 1.831174][ T1] REGS: c00000007e3a31e0 TRAP: 0700 Not tainted (5.9.0-rc1)
[ 1.831243][ T1] MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44022422 XER: 20000000
[ 1.831574][ T1] CFAR: c0000000006b3084 IRQMASK: 1
[ 1.831574][ T1] GPR00: c0000000006f4924 c00000007e3a3470 c000000001289000 0000000000000001
[ 1.831574][ T1] GPR04: 0000000000000000 0000000000000003 0000000000000040 0000000000000000
[ 1.831574][ T1] GPR08: 0000000000000001 0000000000000000 fffffffffffffffe c00c000000000000
[ 1.831574][ T1] GPR12: 0000000024028420 c0000000014b0000 c00000007e9cd000 0000000000000001
[ 1.831574][ T1] GPR16: 0000000000000000 0000000000000000 c00000007e9cd100 c00000007e9cd118
[ 1.831574][ T1] GPR20: 00000000ffffffff 0000000000000000 0000000000000001 0000000000000000
[ 1.831574][ T1] GPR24: 0000000000000000 ffffffffffffffff c00000007eb20000 0000000000000000
[ 1.831574][ T1] GPR28: 0000000000000001 000000000000bfff 0000000000000000 0000000000000001
[ 1.833145][ T1] NIP [c0000000006f4944] iommu_area_alloc+0xa4/0x170
[ 1.833271][ T1] LR [c0000000006f4924] iommu_area_alloc+0x84/0x170
[ 1.833494][ T1] Call Trace:
[ 1.833686][ T1] [c00000007e3a3470] [c0000000006f4924] iommu_area_alloc+0x84/0x170 (unreliable)
[ 1.833961][ T1] [c00000007e3a34e0] [c00000000004b034] iommu_range_alloc+0x1a4/0x410
[ 1.834116][ T1] [c00000007e3a35a0] [c00000000004b650] iommu_alloc+0x60/0x130
[ 1.834248][ T1] [c00000007e3a35f0] [c00000000004c6c8] iommu_map_page+0xd8/0x210
[ 1.834381][ T1] [c00000007e3a3680] [c00000000004aa70] dma_iommu_map_page+0x60/0x80
[ 1.834502][ T1] [c00000007e3a36a0] [c0000000001cce30] dma_map_page_attrs+0x190/0x260
[ 1.834628][ T1] [c00000007e3a3750] [c00000000086195c] ibmvscsi_probe+0x12c/0xa2c
[ 1.834768][ T1] [c00000007e3a3830] [c0000000000e049c] vio_bus_probe+0x9c/0x460
[ 1.834880][ T1] [c00000007e3a38d0] [c0000000007f2cbc] really_probe+0x12c/0x4e0
[ 1.834993][ T1] [c00000007e3a3970] [c0000000007f3308] driver_probe_device+0x88/0x120
[ 1.835108][ T1] [c00000007e3a39a0] [c0000000007f36ec] device_driver_attach+0xcc/0xe0
[ 1.835220][ T1] [c00000007e3a39e0] [c0000000007f3780] __driver_attach+0x80/0x140
[ 1.835321][ T1] [c00000007e3a3a20] [c0000000007ef9a8] bus_for_each_dev+0xa8/0x130
[ 1.835429][ T1] [c00000007e3a3a80] [c0000000007f2394] driver_attach+0x34/0x50
[ 1.835534][ T1] [c00000007e3a3aa0] [c0000000007f1878] bus_add_driver+0x1e8/0x2b0
[ 1.835647][ T1] [c00000007e3a3b30] [c0000000007f47f8] driver_register+0x98/0x1a0
[ 1.835782][ T1] [c00000007e3a3ba0] [c0000000000df4bc] __vio_register_driver+0x4c/0x60
[ 1.835938][ T1] [c00000007e3a3bc0] [c000000000f8d924] ibmvscsi_module_init+0xa4/0xdc
[ 1.836056][ T1] [c00000007e3a3c00] [c000000000012430] do_one_initcall+0x60/0x2b0
[ 1.836175][ T1] [c00000007e3a3cd0] [c000000000f44740] kernel_init_freeable+0x2e0/0x378
[ 1.836287][ T1] [c00000007e3a3db0] [c000000000012a24] kernel_init+0x2c/0x158
[ 1.836509][ T1] [c00000007e3a3e20] [c00000000000d9d0] ret_from_kernel_thread+0x5c/0x6c
[ 1.836717][ T1] Instruction dump:
[ 1.836904][ T1] 2da90000 f8010010 f821ff91 4bfbe669 60000000 7c3d1840 7c7f1b78 40810074
[ 1.837082][ T1] 60000000 60000000 60000000 40920010 <0fe00000> 60000000 60000000 408efff4
[ 1.838497][ T1] ---[ end trace e9dbc52052087399 ]---
The BUG is
BUG_ON(!is_power_of_2(boundary_size));
in iommu_is_span_boundary()
Bisected to commit
04d324bf549d ("dma-mapping: set default segment_boundary_mask to ULONG_MAX")
I have reverted that commit for today.
--
Cheers,
Stephen Rothwell
Hi Stephen,
On Thu, Aug 20, 2020 at 03:51:12PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> After merging the dma-mapping tree, today's linux-next build (powerpc
> pseries_le_defconfig) failed like this:
>
> [ 1.829053][ T1] ------------[ cut here ]------------
> [ 1.829629][ T1] kernel BUG at include/linux/iommu-helper.h:21!
> [ 1.830182][ T1] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 1.830302][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 1.830436][ T1] Modules linked in:
> [ 1.830879][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1 #2
> [ 1.831042][ T1] NIP: c0000000006f4944 LR: c0000000006f4924 CTR: c00000000004aa10
> [ 1.831174][ T1] REGS: c00000007e3a31e0 TRAP: 0700 Not tainted (5.9.0-rc1)
> [ 1.831243][ T1] MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44022422 XER: 20000000
> [ 1.831574][ T1] CFAR: c0000000006b3084 IRQMASK: 1
> [ 1.831574][ T1] GPR00: c0000000006f4924 c00000007e3a3470 c000000001289000 0000000000000001
> [ 1.831574][ T1] GPR04: 0000000000000000 0000000000000003 0000000000000040 0000000000000000
> [ 1.831574][ T1] GPR08: 0000000000000001 0000000000000000 fffffffffffffffe c00c000000000000
> [ 1.831574][ T1] GPR12: 0000000024028420 c0000000014b0000 c00000007e9cd000 0000000000000001
> [ 1.831574][ T1] GPR16: 0000000000000000 0000000000000000 c00000007e9cd100 c00000007e9cd118
> [ 1.831574][ T1] GPR20: 00000000ffffffff 0000000000000000 0000000000000001 0000000000000000
> [ 1.831574][ T1] GPR24: 0000000000000000 ffffffffffffffff c00000007eb20000 0000000000000000
> [ 1.831574][ T1] GPR28: 0000000000000001 000000000000bfff 0000000000000000 0000000000000001
> [ 1.833145][ T1] NIP [c0000000006f4944] iommu_area_alloc+0xa4/0x170
> [ 1.833271][ T1] LR [c0000000006f4924] iommu_area_alloc+0x84/0x170
> [ 1.833494][ T1] Call Trace:
> [ 1.833686][ T1] [c00000007e3a3470] [c0000000006f4924] iommu_area_alloc+0x84/0x170 (unreliable)
> [ 1.833961][ T1] [c00000007e3a34e0] [c00000000004b034] iommu_range_alloc+0x1a4/0x410
> [ 1.834116][ T1] [c00000007e3a35a0] [c00000000004b650] iommu_alloc+0x60/0x130
> [ 1.834248][ T1] [c00000007e3a35f0] [c00000000004c6c8] iommu_map_page+0xd8/0x210
> [ 1.834381][ T1] [c00000007e3a3680] [c00000000004aa70] dma_iommu_map_page+0x60/0x80
> [ 1.834502][ T1] [c00000007e3a36a0] [c0000000001cce30] dma_map_page_attrs+0x190/0x260
> [ 1.834628][ T1] [c00000007e3a3750] [c00000000086195c] ibmvscsi_probe+0x12c/0xa2c
> [ 1.834768][ T1] [c00000007e3a3830] [c0000000000e049c] vio_bus_probe+0x9c/0x460
> [ 1.834880][ T1] [c00000007e3a38d0] [c0000000007f2cbc] really_probe+0x12c/0x4e0
> [ 1.834993][ T1] [c00000007e3a3970] [c0000000007f3308] driver_probe_device+0x88/0x120
> [ 1.835108][ T1] [c00000007e3a39a0] [c0000000007f36ec] device_driver_attach+0xcc/0xe0
> [ 1.835220][ T1] [c00000007e3a39e0] [c0000000007f3780] __driver_attach+0x80/0x140
> [ 1.835321][ T1] [c00000007e3a3a20] [c0000000007ef9a8] bus_for_each_dev+0xa8/0x130
> [ 1.835429][ T1] [c00000007e3a3a80] [c0000000007f2394] driver_attach+0x34/0x50
> [ 1.835534][ T1] [c00000007e3a3aa0] [c0000000007f1878] bus_add_driver+0x1e8/0x2b0
> [ 1.835647][ T1] [c00000007e3a3b30] [c0000000007f47f8] driver_register+0x98/0x1a0
> [ 1.835782][ T1] [c00000007e3a3ba0] [c0000000000df4bc] __vio_register_driver+0x4c/0x60
> [ 1.835938][ T1] [c00000007e3a3bc0] [c000000000f8d924] ibmvscsi_module_init+0xa4/0xdc
> [ 1.836056][ T1] [c00000007e3a3c00] [c000000000012430] do_one_initcall+0x60/0x2b0
> [ 1.836175][ T1] [c00000007e3a3cd0] [c000000000f44740] kernel_init_freeable+0x2e0/0x378
> [ 1.836287][ T1] [c00000007e3a3db0] [c000000000012a24] kernel_init+0x2c/0x158
> [ 1.836509][ T1] [c00000007e3a3e20] [c00000000000d9d0] ret_from_kernel_thread+0x5c/0x6c
> [ 1.836717][ T1] Instruction dump:
> [ 1.836904][ T1] 2da90000 f8010010 f821ff91 4bfbe669 60000000 7c3d1840 7c7f1b78 40810074
> [ 1.837082][ T1] 60000000 60000000 60000000 40920010 <0fe00000> 60000000 60000000 408efff4
> [ 1.838497][ T1] ---[ end trace e9dbc52052087399 ]---
>
> The BUG is
>
> BUG_ON(!is_power_of_2(boundary_size));
>
> in iommu_is_span_boundary()
Took a quick look -- the boundary_size is seemingly passed from
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/powerpc/kernel/iommu.c#n240
boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
1 << tbl->it_page_shift);
Looks like an overflow happens due to (ULONG_MAX + 1). Should
we fix here instead (or also)?
Thanks
Nic
On Thu, Aug 20, 2020 at 01:36:17AM -0700, Nicolin Chen wrote:
> Took a quick look -- the boundary_size is seemingly passed from
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/powerpc/kernel/iommu.c#n240
>
> boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> 1 << tbl->it_page_shift);
>
> Looks like an overflow happens due to (ULONG_MAX + 1). Should
> we fix here instead (or also)?
Yes, please. I'll drop the patch again for now, but once we've
got this sorted out I'll readd it.
On Thu, Aug 20, 2020 at 05:49:41PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 20, 2020 at 01:36:17AM -0700, Nicolin Chen wrote:
> > Took a quick look -- the boundary_size is seemingly passed from
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/powerpc/kernel/iommu.c#n240
> >
> > boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > 1 << tbl->it_page_shift);
> >
> > Looks like an overflow happens due to (ULONG_MAX + 1). Should
> > we fix here instead (or also)?
>
> Yes, please. I'll drop the patch again for now, but once we've
> got this sorted out I'll readd it.
I'll send a series of changes, as I found these...
1 145 arch/alpha/kernel/pci_iommu.c <<iommu_arena_find_pages>>
boundary_size = dma_get_seg_boundary(dev) + 1;
2 488 arch/ia64/hp/common/sba_iommu.c <<sba_search_bitmap>>
boundary_size = (unsigned long long )dma_get_seg_boundary(dev) + 1;
3 266 arch/s390/pci/pci_dma.c <<__dma_alloc_iommu>>
boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
4 170 arch/sparc/kernel/iommu-common.c <<iommu_tbl_range_alloc>>
boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
5 475 arch/sparc/kernel/iommu.c <<dma_4u_map_sg>>
seg_boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
6 511 arch/sparc/kernel/pci_sun4v.c <<dma_4v_map_sg>>
seg_boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
7 97 arch/x86/kernel/amd_gart_64.c <<alloc_iommu>>
base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev),
8 99 arch/x86/kernel/amd_gart_64.c <<alloc_iommu>>
boundary_size = ALIGN((u64)dma_get_seg_boundary(dev) + 1,
9 359 drivers/parisc/ccio-dma.c <<ccio_alloc_range>>
boundary_size = ALIGN((unsigned long long )dma_get_seg_boundary(dev) + 1,
10 110 drivers/parisc/iommu-helpers.h <<iommu_coalesce_chunks>>
unsigned int max_seg_boundary = dma_get_seg_boundary(dev) + 1;
11 345 drivers/parisc/sba_iommu.c <<sba_search_bitmap>>
boundary_size = ALIGN((unsigned long long )dma_get_seg_boundary(dev) + 1,