2020-08-20 07:43:16

by Stephen Rothwell

Subject: linux-next: boot failure after merge of the dma-mapping tree

Hi all,

After merging the dma-mapping tree, today's linux-next build (powerpc
pseries_le_defconfig) failed like this:

[ 1.829053][ T1] ------------[ cut here ]------------
[ 1.829629][ T1] kernel BUG at include/linux/iommu-helper.h:21!
[ 1.830182][ T1] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1.830302][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 1.830436][ T1] Modules linked in:
[ 1.830879][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1 #2
[ 1.831042][ T1] NIP: c0000000006f4944 LR: c0000000006f4924 CTR: c00000000004aa10
[ 1.831174][ T1] REGS: c00000007e3a31e0 TRAP: 0700 Not tainted (5.9.0-rc1)
[ 1.831243][ T1] MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44022422 XER: 20000000
[ 1.831574][ T1] CFAR: c0000000006b3084 IRQMASK: 1
[ 1.831574][ T1] GPR00: c0000000006f4924 c00000007e3a3470 c000000001289000 0000000000000001
[ 1.831574][ T1] GPR04: 0000000000000000 0000000000000003 0000000000000040 0000000000000000
[ 1.831574][ T1] GPR08: 0000000000000001 0000000000000000 fffffffffffffffe c00c000000000000
[ 1.831574][ T1] GPR12: 0000000024028420 c0000000014b0000 c00000007e9cd000 0000000000000001
[ 1.831574][ T1] GPR16: 0000000000000000 0000000000000000 c00000007e9cd100 c00000007e9cd118
[ 1.831574][ T1] GPR20: 00000000ffffffff 0000000000000000 0000000000000001 0000000000000000
[ 1.831574][ T1] GPR24: 0000000000000000 ffffffffffffffff c00000007eb20000 0000000000000000
[ 1.831574][ T1] GPR28: 0000000000000001 000000000000bfff 0000000000000000 0000000000000001
[ 1.833145][ T1] NIP [c0000000006f4944] iommu_area_alloc+0xa4/0x170
[ 1.833271][ T1] LR [c0000000006f4924] iommu_area_alloc+0x84/0x170
[ 1.833494][ T1] Call Trace:
[ 1.833686][ T1] [c00000007e3a3470] [c0000000006f4924] iommu_area_alloc+0x84/0x170 (unreliable)
[ 1.833961][ T1] [c00000007e3a34e0] [c00000000004b034] iommu_range_alloc+0x1a4/0x410
[ 1.834116][ T1] [c00000007e3a35a0] [c00000000004b650] iommu_alloc+0x60/0x130
[ 1.834248][ T1] [c00000007e3a35f0] [c00000000004c6c8] iommu_map_page+0xd8/0x210
[ 1.834381][ T1] [c00000007e3a3680] [c00000000004aa70] dma_iommu_map_page+0x60/0x80
[ 1.834502][ T1] [c00000007e3a36a0] [c0000000001cce30] dma_map_page_attrs+0x190/0x260
[ 1.834628][ T1] [c00000007e3a3750] [c00000000086195c] ibmvscsi_probe+0x12c/0xa2c
[ 1.834768][ T1] [c00000007e3a3830] [c0000000000e049c] vio_bus_probe+0x9c/0x460
[ 1.834880][ T1] [c00000007e3a38d0] [c0000000007f2cbc] really_probe+0x12c/0x4e0
[ 1.834993][ T1] [c00000007e3a3970] [c0000000007f3308] driver_probe_device+0x88/0x120
[ 1.835108][ T1] [c00000007e3a39a0] [c0000000007f36ec] device_driver_attach+0xcc/0xe0
[ 1.835220][ T1] [c00000007e3a39e0] [c0000000007f3780] __driver_attach+0x80/0x140
[ 1.835321][ T1] [c00000007e3a3a20] [c0000000007ef9a8] bus_for_each_dev+0xa8/0x130
[ 1.835429][ T1] [c00000007e3a3a80] [c0000000007f2394] driver_attach+0x34/0x50
[ 1.835534][ T1] [c00000007e3a3aa0] [c0000000007f1878] bus_add_driver+0x1e8/0x2b0
[ 1.835647][ T1] [c00000007e3a3b30] [c0000000007f47f8] driver_register+0x98/0x1a0
[ 1.835782][ T1] [c00000007e3a3ba0] [c0000000000df4bc] __vio_register_driver+0x4c/0x60
[ 1.835938][ T1] [c00000007e3a3bc0] [c000000000f8d924] ibmvscsi_module_init+0xa4/0xdc
[ 1.836056][ T1] [c00000007e3a3c00] [c000000000012430] do_one_initcall+0x60/0x2b0
[ 1.836175][ T1] [c00000007e3a3cd0] [c000000000f44740] kernel_init_freeable+0x2e0/0x378
[ 1.836287][ T1] [c00000007e3a3db0] [c000000000012a24] kernel_init+0x2c/0x158
[ 1.836509][ T1] [c00000007e3a3e20] [c00000000000d9d0] ret_from_kernel_thread+0x5c/0x6c
[ 1.836717][ T1] Instruction dump:
[ 1.836904][ T1] 2da90000 f8010010 f821ff91 4bfbe669 60000000 7c3d1840 7c7f1b78 40810074
[ 1.837082][ T1] 60000000 60000000 60000000 40920010 <0fe00000> 60000000 60000000 408efff4
[ 1.838497][ T1] ---[ end trace e9dbc52052087399 ]---

The BUG is

BUG_ON(!is_power_of_2(boundary_size));

in iommu_is_span_boundary()

Bisected to commit

04d324bf549d ("dma-mapping: set default segment_boundary_mask to ULONG_MAX")

I have reverted that commit for today.

--
Cheers,
Stephen Rothwell



2020-08-20 08:38:08

by Nicolin Chen

Subject: Re: linux-next: boot failure after merge of the dma-mapping tree

Hi Stephen,

On Thu, Aug 20, 2020 at 03:51:12PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> After merging the dma-mapping tree, today's linux-next build (powerpc
> pseries_le_defconfig) failed like this:
>
> [ 1.829053][ T1] ------------[ cut here ]------------
> [ 1.829629][ T1] kernel BUG at include/linux/iommu-helper.h:21!
> [ 1.830182][ T1] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 1.830302][ T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 1.830436][ T1] Modules linked in:
> [ 1.830879][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc1 #2
> [ 1.831042][ T1] NIP: c0000000006f4944 LR: c0000000006f4924 CTR: c00000000004aa10
> [ 1.831174][ T1] REGS: c00000007e3a31e0 TRAP: 0700 Not tainted (5.9.0-rc1)
> [ 1.831243][ T1] MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 44022422 XER: 20000000
> [ 1.831574][ T1] CFAR: c0000000006b3084 IRQMASK: 1
> [ 1.831574][ T1] GPR00: c0000000006f4924 c00000007e3a3470 c000000001289000 0000000000000001
> [ 1.831574][ T1] GPR04: 0000000000000000 0000000000000003 0000000000000040 0000000000000000
> [ 1.831574][ T1] GPR08: 0000000000000001 0000000000000000 fffffffffffffffe c00c000000000000
> [ 1.831574][ T1] GPR12: 0000000024028420 c0000000014b0000 c00000007e9cd000 0000000000000001
> [ 1.831574][ T1] GPR16: 0000000000000000 0000000000000000 c00000007e9cd100 c00000007e9cd118
> [ 1.831574][ T1] GPR20: 00000000ffffffff 0000000000000000 0000000000000001 0000000000000000
> [ 1.831574][ T1] GPR24: 0000000000000000 ffffffffffffffff c00000007eb20000 0000000000000000
> [ 1.831574][ T1] GPR28: 0000000000000001 000000000000bfff 0000000000000000 0000000000000001
> [ 1.833145][ T1] NIP [c0000000006f4944] iommu_area_alloc+0xa4/0x170
> [ 1.833271][ T1] LR [c0000000006f4924] iommu_area_alloc+0x84/0x170
> [ 1.833494][ T1] Call Trace:
> [ 1.833686][ T1] [c00000007e3a3470] [c0000000006f4924] iommu_area_alloc+0x84/0x170 (unreliable)
> [ 1.833961][ T1] [c00000007e3a34e0] [c00000000004b034] iommu_range_alloc+0x1a4/0x410
> [ 1.834116][ T1] [c00000007e3a35a0] [c00000000004b650] iommu_alloc+0x60/0x130
> [ 1.834248][ T1] [c00000007e3a35f0] [c00000000004c6c8] iommu_map_page+0xd8/0x210
> [ 1.834381][ T1] [c00000007e3a3680] [c00000000004aa70] dma_iommu_map_page+0x60/0x80
> [ 1.834502][ T1] [c00000007e3a36a0] [c0000000001cce30] dma_map_page_attrs+0x190/0x260
> [ 1.834628][ T1] [c00000007e3a3750] [c00000000086195c] ibmvscsi_probe+0x12c/0xa2c
> [ 1.834768][ T1] [c00000007e3a3830] [c0000000000e049c] vio_bus_probe+0x9c/0x460
> [ 1.834880][ T1] [c00000007e3a38d0] [c0000000007f2cbc] really_probe+0x12c/0x4e0
> [ 1.834993][ T1] [c00000007e3a3970] [c0000000007f3308] driver_probe_device+0x88/0x120
> [ 1.835108][ T1] [c00000007e3a39a0] [c0000000007f36ec] device_driver_attach+0xcc/0xe0
> [ 1.835220][ T1] [c00000007e3a39e0] [c0000000007f3780] __driver_attach+0x80/0x140
> [ 1.835321][ T1] [c00000007e3a3a20] [c0000000007ef9a8] bus_for_each_dev+0xa8/0x130
> [ 1.835429][ T1] [c00000007e3a3a80] [c0000000007f2394] driver_attach+0x34/0x50
> [ 1.835534][ T1] [c00000007e3a3aa0] [c0000000007f1878] bus_add_driver+0x1e8/0x2b0
> [ 1.835647][ T1] [c00000007e3a3b30] [c0000000007f47f8] driver_register+0x98/0x1a0
> [ 1.835782][ T1] [c00000007e3a3ba0] [c0000000000df4bc] __vio_register_driver+0x4c/0x60
> [ 1.835938][ T1] [c00000007e3a3bc0] [c000000000f8d924] ibmvscsi_module_init+0xa4/0xdc
> [ 1.836056][ T1] [c00000007e3a3c00] [c000000000012430] do_one_initcall+0x60/0x2b0
> [ 1.836175][ T1] [c00000007e3a3cd0] [c000000000f44740] kernel_init_freeable+0x2e0/0x378
> [ 1.836287][ T1] [c00000007e3a3db0] [c000000000012a24] kernel_init+0x2c/0x158
> [ 1.836509][ T1] [c00000007e3a3e20] [c00000000000d9d0] ret_from_kernel_thread+0x5c/0x6c
> [ 1.836717][ T1] Instruction dump:
> [ 1.836904][ T1] 2da90000 f8010010 f821ff91 4bfbe669 60000000 7c3d1840 7c7f1b78 40810074
> [ 1.837082][ T1] 60000000 60000000 60000000 40920010 <0fe00000> 60000000 60000000 408efff4
> [ 1.838497][ T1] ---[ end trace e9dbc52052087399 ]---
>
> The BUG is
>
> BUG_ON(!is_power_of_2(boundary_size));
>
> in iommu_is_span_boundary()

Took a quick look -- the boundary_size is seemingly passed from
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/powerpc/kernel/iommu.c#n240

	boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
			      1 << tbl->it_page_shift);

Looks like an overflow happens due to (ULONG_MAX + 1). Should
we fix here instead (or also)?
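
To make the suspected wrap-around concrete, here is a minimal userspace
sketch. It assumes the segment boundary mask is ULONG_MAX, as the bisected
commit title suggests. is_power_of_2(), ALIGN_UP() and boundary_size_bytes()
are local stand-ins written for this illustration, not the kernel's own
definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local stand-in for the kernel's is_power_of_2(): zero is not a power of 2. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Local stand-in for ALIGN(): round x up to a multiple of a (a power of 2). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Hypothetical helper mirroring the quoted expression:
 *   ALIGN(dma_get_seg_boundary(dev) + 1, 1 << tbl->it_page_shift)
 * Unsigned arithmetic wraps, so a mask of ULONG_MAX yields 0 here.
 */
static unsigned long boundary_size_bytes(unsigned long seg_boundary_mask,
					 unsigned int page_shift)
{
	return ALIGN_UP(seg_boundary_mask + 1, 1UL << page_shift);
}
```

With a mask of ULONG_MAX the helper returns 0, which fails the power-of-2
check behind the BUG_ON(). A smaller mask such as 0xffff gives 0x10000,
which passes.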

Thanks
Nic

2020-08-20 15:51:38

by Christoph Hellwig

Subject: Re: linux-next: boot failure after merge of the dma-mapping tree

On Thu, Aug 20, 2020 at 01:36:17AM -0700, Nicolin Chen wrote:
> Took a quick look -- the boundary_size is seemingly passed from
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/powerpc/kernel/iommu.c#n240
>
> 	boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> 			      1 << tbl->it_page_shift);
>
> Looks like an overflow happens due to (ULONG_MAX + 1). Should
> we fix here instead (or also)?

Yes, please. I'll drop the patch again for now, but once we've
got this sorted out I'll readd it.

2020-08-20 21:56:27

by Nicolin Chen

Subject: Re: linux-next: boot failure after merge of the dma-mapping tree

On Thu, Aug 20, 2020 at 05:49:41PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 20, 2020 at 01:36:17AM -0700, Nicolin Chen wrote:
> > Took a quick look -- the boundary_size is seemingly passed from
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/powerpc/kernel/iommu.c#n240
> >
> > 	boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > 			      1 << tbl->it_page_shift);
> >
> > Looks like an overflow happens due to (ULONG_MAX + 1). Should
> > we fix here instead (or also)?
>
> Yes, please. I'll drop the patch again for now, but once we've
> got this sorted out I'll readd it.

I'll send a series of changes, as I found these...

 1  145  arch/alpha/kernel/pci_iommu.c <<iommu_arena_find_pages>>
         boundary_size = dma_get_seg_boundary(dev) + 1;
 2  488  arch/ia64/hp/common/sba_iommu.c <<sba_search_bitmap>>
         boundary_size = (unsigned long long )dma_get_seg_boundary(dev) + 1;
 3  266  arch/s390/pci/pci_dma.c <<__dma_alloc_iommu>>
         boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
 4  170  arch/sparc/kernel/iommu-common.c <<iommu_tbl_range_alloc>>
         boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
 5  475  arch/sparc/kernel/iommu.c <<dma_4u_map_sg>>
         seg_boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
 6  511  arch/sparc/kernel/pci_sun4v.c <<dma_4v_map_sg>>
         seg_boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
 7   97  arch/x86/kernel/amd_gart_64.c <<alloc_iommu>>
         base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev),
 8   99  arch/x86/kernel/amd_gart_64.c <<alloc_iommu>>
         boundary_size = ALIGN((u64)dma_get_seg_boundary(dev) + 1,
 9  359  drivers/parisc/ccio-dma.c <<ccio_alloc_range>>
         boundary_size = ALIGN((unsigned long long )dma_get_seg_boundary(dev) + 1,
10  110  drivers/parisc/iommu-helpers.h <<iommu_coalesce_chunks>>
         unsigned int max_seg_boundary = dma_get_seg_boundary(dev) + 1;
11  345  drivers/parisc/sba_iommu.c <<sba_search_bitmap>>
         boundary_size = ALIGN((unsigned long long )dma_get_seg_boundary(dev) + 1,
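
One possible way to avoid the wrap at call sites like these, sketched in
userspace C: shift the mask down to IOMMU pages first and only then add 1.
This is an illustration, not the series itself. seg_boundary_nr_pages() is a
hypothetical name, and the sketch assumes the caller wants the boundary in
pages rather than bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local stand-in for the kernel's is_power_of_2(): zero is not a power of 2. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: segment boundary expressed in IOMMU pages.
 * Shifting before adding 1 keeps the intermediate value at or below
 * (ULONG_MAX >> page_shift), so the "+ 1" cannot wrap when page_shift > 0.
 */
static unsigned long seg_boundary_nr_pages(unsigned long seg_boundary_mask,
					   unsigned int page_shift)
{
	return (seg_boundary_mask >> page_shift) + 1;
}
```

For masks of the form 2^k - 1, the result matches the old byte-based value
shifted down by page_shift, and it stays a power of 2 even when the mask is
ULONG_MAX.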