2014-10-28 10:45:39

by Markos Chandras

Subject: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

Hi,

It seems I am unable to boot my Malta with EVA. The problem appeared in
the 3.18 merge window. I bisected the problem (between v3.17 and
v3.18-rc1) and I found the following commit responsible for the broken boot.

commit 12220dea07f1ac6ac717707104773d771c3f3077
Author: Joonsoo Kim <[email protected]>
Date: Thu Oct 9 15:26:24 2014 -0700

mm/slab: support slab merge


Reverting my tree back to the parent of that commit,
423c929cbbecc60e9c407f9048e58f5422f7995d ("mm/slab_common: commonize slab merge logic"),
restores the boot for me.

I don't quite understand the commit yet, so let me know if you need more
information to debug this problem.

Here is the kernel log of the failed boot.

Calibrating delay loop... 19.86 BogoMIPS (lpj=99328)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 4096 (order: 0, 16384 bytes)
Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes)
Kernel bug detected[#1]:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-05639-g12220dea07f1 #1631
task: 1f04f5d8 ti: 1f050000 task.ti: 1f050000
$ 0 : 00000000 806c0000 00000080 00000000
$ 4 : 1f048080 00000001 00000001 00000000
$ 8 : 1f04f5d8 00000001 fffffffc 00000000
$12 : 00000000 ffffffff fffef7b7 00000000
$16 : 1f048080 1f00ec00 1f048180 806ba998
$20 : 1f00ec00 80660000 1f03b780 806ad380
$24 : 00000000 80154d70
$28 : 1f050000 1f053d48 806ba8ec 80141184
Hi : 00000000
Lo : 0b532b80
epc : 80141190 alloc_unbound_pwq+0x234/0x304
Not tainted
ra : 80141184 alloc_unbound_pwq+0x228/0x304
Status: 1000dc03 KERNEL EXL IE
Cause : 00800034
PrId : 0001a82d (MIPS P5600)
Modules linked in:
Process swapper/0 (pid: 1, threadinfo=1f050000, task=1f04f5d8, tls=00000000)
Stack : 1f03b880 00000002 1f03b800 80140d90 1f048180 1f03b880 00000002 1f03b800
1f03bb80 801417a4 1f0481e0 0000000e 1f048180 00000200 1f048180 1f048190
00000002 1f048188 80660000 80660000 8065af94 80141dc0 0110d710 00000100
8065af94 806ad380 8065b200 8013ea70 1f048280 1f053e0c 8065af98 1f0481e0
00000000 00000004 80660000 80660000 80660000 80660000 80660000 80660000
...
Call Trace:
[<80141190>] alloc_unbound_pwq+0x234/0x304
[<801417a4>] apply_workqueue_attrs+0x11c/0x294
[<80141dc0>] __alloc_workqueue_key+0x23c/0x470
[<80683de4>] init_workqueues+0x320/0x400
[<8010058c>] do_one_initcall+0xe8/0x23c
[<8067cbec>] kernel_init_freeable+0x9c/0x224
[<80565fd8>] kernel_init+0x10/0x100
[<80104e38>] ret_from_kernel_thread+0x14/0x1c


Code: 10400032 00408021 320200ff <00020336> 00002821 02002021 0c0defb0 24060100 26020074
---[ end trace cb88537fdc8fa200 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

--
markos


2014-10-28 13:01:17

by Joonsoo Kim

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

2014-10-28 19:45 GMT+09:00 Markos Chandras <[email protected]>:
> Hi,
>
> It seems I am unable to boot my Malta with EVA. The problem appeared in
> the 3.18 merge window. I bisected the problem (between v3.17 and
> v3.18-rc1) and I found the following commit responsible for the broken boot.

Hello,

Did you start the bisect from v3.18-rc1?
I'd like to make sure this is a different bug from the one fixed by the following commit.

commit 85c9f4b04a08f6bc770b77530c22d04103468b8f
Author: Joonsoo Kim <[email protected]>
Date: Mon Oct 13 15:51:01 2014 -0700

mm/slab: fix unaligned access on sparc64

This fix was merged into v3.18-rc1 some time after
'support slab merge' was merged.

Thanks.


2014-10-28 13:19:21

by Markos Chandras

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

On 10/28/2014 01:01 PM, Joonsoo Kim wrote:
> 2014-10-28 19:45 GMT+09:00 Markos Chandras <[email protected]>:
>> Hi,
>>
>> It seems I am unable to boot my Malta with EVA. The problem appeared in
>> the 3.18 merge window. I bisected the problem (between v3.17 and
>> v3.18-rc1) and I found the following commit responsible for the broken boot.
>
> Hello,
>
> Did you start to bisect from v3.18-rc1?
> I'd like to be sure that this is another bug which is fixed by following commit.
>
> commit 85c9f4b04a08f6bc770b77530c22d04103468b8f
> Author: Joonsoo Kim <[email protected]>
> Date: Mon Oct 13 15:51:01 2014 -0700
>
> mm/slab: fix unaligned access on sparc64
>
> This fix is merged into v3.18-rc1 sometime later that
> 'support slab merge' is merged.
>
> Thanks.
>
Hi,

I bisected from v3.17 to v3.18-rc1. But v3.18-rc2 and the latest
mainline (f7e87a44ef60ad379e39b45437604141453bf0ec) still have the same
problem.

By the way, I did more tests and this is not EVA-specific. A maltaup_defconfig
build fails in the same way. I suspect all malta*_defconfigs will fail in a
similar way, which probably makes it easier for you to reproduce it on
QEMU.

--
markos

2014-10-28 13:24:51

by Markos Chandras

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

On 10/28/2014 01:19 PM, Markos Chandras wrote:
> Hi,
>
> I bisected from v3.17 until 3.18-rc1. But 3.18-rc2 and the latest
> mainline (f7e87a44ef60ad379e39b45437604141453bf0ec) still have the same
> problem
>
> btw i did more tests and this is not EVA specific. A maltaup_defconfig
> fails in the same way. I suspect all malta*_defconfigs will fail in a
> similar way which makes it probably easier for you to reproduce it on a
> QEMU.
>

Sorry, maltaup_defconfig does not fail; maltasmvp_defconfig does. So it
might be a problem similar to the one fixed in
85c9f4b04a08f6bc770b77530c22d04103468b8f.

--
markos

2014-10-28 13:48:58

by Joonsoo Kim

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

2014-10-28 22:24 GMT+09:00 Markos Chandras <[email protected]>:
> sorry maltaup_defconfig does not fail. maltasmvp_defconfig does. So it
> might be a similar problem like the one fixed in
> 85c9f4b04a08f6bc770b77530c22d04103468b8f

Oops, sorry. The above commit ('mm/slab: fix unaligned access on sparc64')
is unrelated to this problem.

Anyway, your problem is probably caused by merging with an incompatible slab cache.
The best way to debug it is to print the object size and alignment of the source
and target slab caches and look for the mismatch. I will try to reproduce it using QEMU.

Thanks.

2014-10-28 14:21:03

by Joonsoo Kim

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

2014-10-28 22:48 GMT+09:00 Joonsoo Kim <[email protected]>:
> Oops. Sorry. Above commit ('mm/slab: fix unaligned access on sparc64')
> is irrelevant to this problem.
>
> Anyway, your problem would be related to merging with incompatible slab cache.
> Best way to debug is printing source/target slab cache's object size and
> alignment and find the problem. I will try to reproduce it using QEMU.

I found that cross-compiling for MIPS isn't an easy job. :)
Could you help me debug the problem with the patch below?

Thanks.

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..b118a52 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2059,6 +2059,10 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,

 	cachep = find_mergeable(size, align, flags, name, ctor);
 	if (cachep) {
+		printk("%s: (%s %lu %lu) to (%s %lu %lu %lu)\n", __func__,
+			name, size, align,
+			cachep->name, cachep->size, cachep->align, cachep->object_size);
+
 		cachep->refcount++;

 		/*

2014-10-28 14:32:10

by Markos Chandras

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

On 10/28/2014 02:21 PM, Joonsoo Kim wrote:
> I found that cross compile for MIPS isn't easy job. :)

You could grab the following toolchain

https://sourcery.mentor.com/GNUToolchain/release2791

(get the IA32 linux tar)

unpack it somewhere (eg /tmp) and then

make ARCH=mips maltasmvp_defconfig
make ARCH=mips CROSS_COMPILE=/tmp/mips-2014.05/bin/mips-linux-gnu- -j8
or something :)

> Could you help me to debug the problem with below patch?

(there are a few build warnings with your patch
mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
unsigned int', but argument 5 has type 'size_t' [-Wformat=]
mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
unsigned int', but argument 7 has type 'unsigned int' [-Wformat=]
mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
unsigned int', but argument 8 has type 'int' [-Wformat=]
mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
unsigned int', but argument 9 has type 'int' [-Wformat=]
)

but here is the output from a QEMU boot right before the crash

CPU frequency 200.00 MHz
Calibrating delay loop... 1087.89 BogoMIPS (lpj=5439488)
pid_max: default: 32768 minimum: 301
__kmem_cache_alias: (cred_jar 92 0) to (kmalloc-128 128 128 128)
__kmem_cache_alias: (files_cache 256 0) to (kmalloc-256 256 128 256)
__kmem_cache_alias: (fs_cache 36 0) to (pid 64 64 44)
__kmem_cache_alias: (names_cache 4096 0) to (kmalloc-4096 4096 128 4096)
__kmem_cache_alias: (mnt_cache 160 0) to (filp 192 64 160)
Mount-cache hash table entries: 4096 (order: 0, 16384 bytes)
Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes)
__kmem_cache_alias: (pool_workqueue 256 256) to (kmalloc-256 256 128 256)
Kernel bug detected[#1]:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc2-00043-gf7e87a44ef60-dirty #1647
task: 8704b5d8 ti: 8704c000 task.ti: 8704c000


--
markos

2014-10-28 15:00:27

by Joonsoo Kim

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

2014-10-28 23:32 GMT+09:00 Markos Chandras <[email protected]>:
> You could grab the following toolchain
>
> https://sourcery.mentor.com/GNUToolchain/release2791
>
> (get the IA32 linux tar)
>
> unpack it somewhere (eg /tmp) and then
>
> make ARCH=mips maltasmvp_defconfig
> make ARCH=mips CROSS_COMPILE=/tmp/mips-2014.05/bin/mips-linux-gnu- -j8
> or something :)

Wow, thank you very much!
I will try it.

>> Could you help me to debug the problem with below patch?
>
> (there are a few build warnings with your patch
> mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
> unsigned int', but argument 5 has type 'size_t' [-Wformat=]
> mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
> unsigned int', but argument 7 has type 'unsigned int' [-Wformat=]
> mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
> unsigned int', but argument 8 has type 'int' [-Wformat=]
> mm/slab.c:2065:4: warning: format '%lu' expects argument of type 'long
> unsigned int', but argument 9 has type 'int' [-Wformat=]
> )
>
> but here is the output from a QEMU boot right before the crash
>
> CPU frequency 200.00 MHz
> Calibrating delay loop... 1087.89 BogoMIPS (lpj=5439488)
> pid_max: default: 32768 minimum: 301
> __kmem_cache_alias: (cred_jar 92 0) to (kmalloc-128 128 128 128)
> __kmem_cache_alias: (files_cache 256 0) to (kmalloc-256 256 128 256)
> __kmem_cache_alias: (fs_cache 36 0) to (pid 64 64 44)
> __kmem_cache_alias: (names_cache 4096 0) to (kmalloc-4096 4096 128 4096)
> __kmem_cache_alias: (mnt_cache 160 0) to (filp 192 64 160)
> Mount-cache hash table entries: 4096 (order: 0, 16384 bytes)
> Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes)
> __kmem_cache_alias: (pool_workqueue 256 256) to (kmalloc-256 256 128 256)

The alignment of pool_workqueue (256) does not match that of kmalloc-256 (128),
but the slab caches are merged anyway because they have the same object size.
So the slab cache for pool_workqueue probably returns 128-byte-aligned memory,
and the workqueue code cannot work with that.

A quick fix may be something like the patch below; I will also try it on QEMU MIPS.

Thanks.

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 3a6e0cf..d57b1a2 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -269,6 +269,9 @@ struct kmem_cache *find_mergeable(size_t size, size_t align,
 		if (s->size - size >= sizeof(void *))
 			continue;

+		if (align > s->align)
+			continue;
+
 		return s;
 	}
 	return NULL;

2014-10-28 15:45:26

by Markos Chandras

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

On 10/28/2014 03:00 PM, Joonsoo Kim wrote:
> alignment is mismatch between pool_workqueue and kmalloc-256,
> but, slab caches are merged, because they have same object size.
> Perhaps, slab cache for pool_workqueue returns 128 byte aligned memory
> and workqueue can't work well with it.
>
> Quick fix may be something like below, but, I will try it on QEMU MIPS.
>
> Thanks.
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 3a6e0cf..d57b1a2 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -269,6 +269,9 @@ struct kmem_cache *find_mergeable(size_t size, size_t align,
> if (s->size - size >= sizeof(void *))
> continue;
>
> + if (align > s->align)
> + continue;
> +
> return s;
> }
> return NULL;
>
Hi,

Yeah this makes the kernel boot again.

--
markos

2014-10-28 16:00:39

by Joonsoo Kim

Subject: Re: Boot problems on Malta with EVA (bisected to 12220dea07f1 "mm/slab: support slab merge")

2014-10-29 0:45 GMT+09:00 Markos Chandras <[email protected]>:
> Hi,
>
> Yeah this makes the kernel boot again.

(Resending because the cc list was missing.)

Thanks for testing!

I tried booting on QEMU MIPS and can reproduce the problem.
My assumption was correct:
128-byte-aligned memory is returned for pool_workqueue.

I need some time to think of a better way to fix this problem.
I will send a patch in a few days.

Thank you very much for your help.
Thanks.