2020-04-27 14:09:56

by kernel test robot

Subject: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h

Greetings,

FYI, we noticed the following commit (built with gcc-7):

commit: fa6726c1e7f015bb77f07fc81c32a97b33e4f6c4 ("mm/debug: add tests validating architecture page table helpers")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+-----------------------------------------------------------+------------+------------+
|                                                           | e3eec8dce1 | fa6726c1e7 |
+-----------------------------------------------------------+------------+------------+
| boot_successes                                            | 0          | 0          |
| boot_failures                                             | 16         | 20         |
| Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode= | 12         |            |
| BUG:kernel_hang_in_test_stage                             | 4          |            |
| kernel_BUG_at_include/linux/mm.h                          | 0          | 20         |
| invalid_opcode:#[##]                                      | 0          | 20         |
| EIP:__free_pages                                          | 0          | 20         |
| Kernel_panic-not_syncing:Fatal_exception                  | 0          | 20         |
+-----------------------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


[ 10.263354] kernel BUG at include/linux/mm.h:699!
[ 10.264320] invalid opcode: 0000 [#1] SMP
[ 10.264872] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc2-00230-gfa6726c1e7f01 #2
[ 10.265928] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 10.267074] EIP: __free_pages+0x4f/0x62
[ 10.267615] Code: 85 ff 74 0e 89 fa 89 f0 e8 83 ed ff ff 5b 5e 5f 5d c3 89 f0 e8 57 ff ff ff 5b 5e 5f 5d c3 ba fc 86 fc c1 89 f0 e8 ff 2e fe ff <0f> 0b 0f b6 cb ba ff ff ff ff 89 f0 e8 07 8f 01 00 eb bf 55 89 e5
[ 10.270098] EAX: 0000003e EBX: ee800000 ECX: 00000000 EDX: c0068000
[ 10.270925] ESI: eece0640 EDI: c016d020 EBP: c0071f10 ESP: c0071f04
[ 10.271786] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
[ 10.272724] CR0: 80050033 CR2: b7d6467d CR3: 023d0000 CR4: 000006b0
[ 10.273572] Call Trace:
[ 10.273912] free_pages+0x3d/0x43
[ 10.274367] pgd_free+0xea/0x11b
[ 10.274807] __mmdrop+0x3c/0xc7
[ 10.275237] ? __free_pages+0x3e/0x62
[ 10.275761] debug_vm_pgtable+0x411/0x419
[ 10.276305] ? rest_init+0x23c/0x23c
[ 10.276767] kernel_init+0x15/0xf4
[ 10.277208] ? schedule_tail_wrapper+0x9/0xc
[ 10.277756] ret_from_fork+0x2e/0x38
[ 10.278217] Modules linked in: stm_p_basic
[ 10.278776] ---[ end trace b838f89424113a3a ]---


To reproduce:

# build kernel
cd linux
cp config-5.7.0-rc2-00230-gfa6726c1e7f01 .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
(No filename) (3.44 kB)
config-5.7.0-rc2-00230-gfa6726c1e7f01 (135.43 kB)
job-script (4.52 kB)
dmesg.xz (12.58 kB)

2020-04-28 01:51:30

by Anshuman Khandual

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 04/27/2020 07:37 PM, kernel test robot wrote:
>
> [ 10.263354] kernel BUG at include/linux/mm.h:699!
> [ 10.264320] invalid opcode: 0000 [#1] SMP
> [ 10.264872] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc2-00230-gfa6726c1e7f01 #2
> [ 10.265928] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 10.267074] EIP: __free_pages+0x4f/0x62
> [ 10.267615] Code: 85 ff 74 0e 89 fa 89 f0 e8 83 ed ff ff 5b 5e 5f 5d c3 89 f0 e8 57 ff ff ff 5b 5e 5f 5d c3 ba fc 86 fc c1 89 f0 e8 ff 2e fe ff <0f> 0b 0f b6 cb ba ff ff ff ff 89 f0 e8 07 8f 01 00 eb bf 55 89 e5
> [ 10.270098] EAX: 0000003e EBX: ee800000 ECX: 00000000 EDX: c0068000
> [ 10.270925] ESI: eece0640 EDI: c016d020 EBP: c0071f10 ESP: c0071f04
> [ 10.271786] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
> [ 10.272724] CR0: 80050033 CR2: b7d6467d CR3: 023d0000 CR4: 000006b0
> [ 10.273572] Call Trace:
> [ 10.273912] free_pages+0x3d/0x43
> [ 10.274367] pgd_free+0xea/0x11b
> [ 10.274807] __mmdrop+0x3c/0xc7
> [ 10.275237] ? __free_pages+0x3e/0x62
> [ 10.275761] debug_vm_pgtable+0x411/0x419
> [ 10.276305] ? rest_init+0x23c/0x23c
> [ 10.276767] kernel_init+0x15/0xf4
> [ 10.277208] ? schedule_tail_wrapper+0x9/0xc
> [ 10.277756] ret_from_fork+0x2e/0x38
> [ 10.278217] Modules linked in: stm_p_basic
> [ 10.278776] ---[ end trace b838f89424113a3a ]---

This is an unsupported X86 platform (CONFIG_X86_PAE), where the test can only
be enabled via CONFIG_EXPERT, and it is known to fail. The latest (V17) patch
moved the test invocation into a late_initcall(), per Linus, thus pushing any
possible failures (like this one) to after early boot. Please ignore this report.

Apart from this X86_PAE based config, no other platform failures have been
reported so far. Assuming that this test robot does have good platform coverage,
the CONFIG_EXPERT method of enabling CONFIG_DEBUG_VM_PGTABLE should help in
getting more platform coverage for this test.

- Anshuman

2020-04-28 02:08:00

by Qian Cai

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



> On Apr 27, 2020, at 9:49 PM, Anshuman Khandual <[email protected]> wrote:
>
> This is an unsupported (enabled via CONFIG_EXPERT) X86 platform (CONFIG_X86_PAE)
> and is known to fail. The latest (V17) patch had moved the test invocation into
> a late_initcall() per Linus thus pushing down any possible failures (like this)
> after early boot. Please ignore this report.
>
> Apart from this X86_PAE based config, no other platform failures have reported
> so far. Assuming that this test robot does have a good platform coverage, the
> CONFIG_EXPERT method of enabling CONFIG_DEBUG_VM_PGTABLE should help in getting
> more platform coverage for this test.

This sounds really sloppy. Why can’t we make it impossible to select this combination if nobody is willing to fix it?

2020-04-28 02:37:37

by Anshuman Khandual

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 04/28/2020 07:35 AM, Qian Cai wrote:
>
>
>> On Apr 27, 2020, at 9:49 PM, Anshuman Khandual <[email protected]> wrote:
>>
>> This is an unsupported (enabled via CONFIG_EXPERT) X86 platform (CONFIG_X86_PAE)
>> and is known to fail. The latest (V17) patch had moved the test invocation into
>> a late_initcall() per Linus thus pushing down any possible failures (like this)
>> after early boot. Please ignore this report.
>>
>> Apart from this X86_PAE based config, no other platform failures have reported
>> so far. Assuming that this test robot does have a good platform coverage, the
>> CONFIG_EXPERT method of enabling CONFIG_DEBUG_VM_PGTABLE should help in getting
>> more platform coverage for this test.
>
> This sounds really sloppy. Why can’t we make it impossible to select this combination if nobody is willing to fix it?

Letting CONFIG_DEBUG_VM_PGTABLE be enabled via CONFIG_EXPERT on unsupported
platforms, i.e. those without ARCH_HAS_DEBUG_VM_PGTABLE, was a conscious
decision meant to expand its adaptability and coverage without requiring any
code (i.e. Kconfig) change. The easier it is to enable the test on unsupported
platforms right now, the more folks are likely to try it out, thus increasing
the probability of it getting fixed on those platforms. That is a valid enough
reason to have a CONFIG_EXPERT based enablement method, IMHO. Also, even with
CONFIG_EXPERT set, CONFIG_DEBUG_VM_PGTABLE does not get enabled by default.

2020-04-28 02:52:53

by Qian Cai

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



> On Apr 27, 2020, at 10:35 PM, Anshuman Khandual <[email protected]> wrote:
>
> Letting CONFIG_DEBUG_VM_PGTABLE enabled via CONFIG_EXPERT for unsupported
> platforms i.e without ARCH_HAS_DEBUG_VM_PGTABLE, was a conscious decision
> meant to expand it's adaptability and coverage without requiring any code
> (i.e Kconfig) change. The easier it is to enable the test on unsupported
> platforms right now, more folks are likely to try it out thus increasing
> it's probability to get fixed on those platforms. That is a valid enough
> reason to have CONFIG_EXPERT based enablement method, IMHO. Also even with
> CONFIG_EXPERT set, CONFIG_DEBUG_VM_PGTABLE does not get enabled by default
> automatically.

No, I am talking about PAE. There is a distinction between known broken that nobody cares about (like arm32) and in-progress/unknown status (like s390).

Also, it is not very nice to introduce regressions for robots testing PAE, because they always select CONFIG_EXPERT and CONFIG_DEBUG_VM.

2020-04-28 03:56:35

by Anshuman Khandual

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 04/28/2020 08:21 AM, Qian Cai wrote:
>
>
>> On Apr 27, 2020, at 10:35 PM, Anshuman Khandual <[email protected]> wrote:
>>
>> Letting CONFIG_DEBUG_VM_PGTABLE enabled via CONFIG_EXPERT for unsupported
>> platforms i.e without ARCH_HAS_DEBUG_VM_PGTABLE, was a conscious decision
>> meant to expand it's adaptability and coverage without requiring any code
>> (i.e Kconfig) change. The easier it is to enable the test on unsupported
>> platforms right now, more folks are likely to try it out thus increasing
>> it's probability to get fixed on those platforms. That is a valid enough
>> reason to have CONFIG_EXPERT based enablement method, IMHO. Also even with
>> CONFIG_EXPERT set, CONFIG_DEBUG_VM_PGTABLE does not get enabled by default
>> automatically.
>
> No, I am talking about PAE. There is a distinction between known broken that nobody cares (like arm32) and in-progress/unknown status (like s390).
>
> Also, it is not very nice to introduce regressions for robots when testing PAE because they always select CONFIG__EXPERT and CONFIG_DEBUG_VM.

Okay, will add X86_PAE to the explicitly disabled list along with
IA64 and ARM.

----
From: Anshuman Khandual <[email protected]>
Date: Tue, 28 Apr 2020 04:30:04 +0100
Subject: [PATCH 3/3] mm/debug/pgtable: Completely disable X86_PAE

Completely disable X86_PAE, even via CONFIG_EXPERT.

Signed-off-by: Anshuman Khandual <[email protected]>
---
lib/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 6a492e32579a..79e097a2285f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -697,7 +697,7 @@ config DEBUG_VM_PGFLAGS
 config DEBUG_VM_PGTABLE
 	bool "Debug arch page table for semantics compliance"
 	depends on MMU
-	depends on !IA64 && !ARM
+	depends on !IA64 && !ARM && !X86_PAE
 	depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
 	default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
 	help
--

Hello Andrew/Stephen,

Could you please fold the above patch into linux-next? Also, please do
let me know if I should respin the series as well. Thank you.

- Anshuman

2020-04-28 05:27:26

by Christophe Leroy

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 28/04/2020 at 04:51, Qian Cai wrote:
>
>
>> On Apr 27, 2020, at 10:35 PM, Anshuman Khandual <[email protected]> wrote:
>>
>> Letting CONFIG_DEBUG_VM_PGTABLE enabled via CONFIG_EXPERT for unsupported
>> platforms i.e without ARCH_HAS_DEBUG_VM_PGTABLE, was a conscious decision
>> meant to expand it's adaptability and coverage without requiring any code
>> (i.e Kconfig) change. The easier it is to enable the test on unsupported
>> platforms right now, more folks are likely to try it out thus increasing
>> it's probability to get fixed on those platforms. That is a valid enough
>> reason to have CONFIG_EXPERT based enablement method, IMHO. Also even with
>> CONFIG_EXPERT set, CONFIG_DEBUG_VM_PGTABLE does not get enabled by default
>> automatically.
>
> No, I am talking about PAE. There is a distinction between known broken that nobody cares (like arm32) and in-progress/unknown status (like s390).
>
> Also, it is not very nice to introduce regressions for robots when testing PAE because they always select CONFIG__EXPERT and CONFIG_DEBUG_VM.
>

Having CONFIG_EXPERT and CONFIG_DEBUG_VM is not enough to get
CONFIG_DEBUG_VM_PGTABLE set to yes.

By default, CONFIG_DEBUG_VM_PGTABLE is set to no when
ARCH_HAS_DEBUG_VM_PGTABLE is not set.

Christophe

2020-04-28 05:58:56

by Anshuman Khandual

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 04/28/2020 10:54 AM, Christophe Leroy wrote:
>
>
> On 28/04/2020 at 04:51, Qian Cai wrote:
>>
>>
>>> On Apr 27, 2020, at 10:35 PM, Anshuman Khandual <[email protected]> wrote:
>>>
>>> Letting CONFIG_DEBUG_VM_PGTABLE enabled via CONFIG_EXPERT for unsupported
>>> platforms i.e without ARCH_HAS_DEBUG_VM_PGTABLE, was a conscious decision
>>> meant to expand it's adaptability and coverage without requiring any code
>>> (i.e Kconfig) change. The easier it is to enable the test on unsupported
>>> platforms right now, more folks are likely to try it out thus increasing
>>> it's probability to get fixed on those platforms. That is a valid enough
>>> reason to have CONFIG_EXPERT based enablement method, IMHO. Also even with
>>> CONFIG_EXPERT set, CONFIG_DEBUG_VM_PGTABLE does not get enabled by default
>>> automatically.
>>
>> No, I am talking about PAE. There is a distinction between known broken that nobody cares (like arm32) and in-progress/unknown status (like s390).
>>
>> Also, it is not very nice to introduce regressions for robots when testing PAE because they always select CONFIG__EXPERT and CONFIG_DEBUG_VM.
>>
>
> Having CONFIG_EXPERT and CONFIG_DEBUG_VM is not enough to get CONFIG_DEBUG_VM_PGTABLE set to yes.

Not automatically, that is right. But it can be set if required. It seems the
testing robots can and will test with each and every config option, whether it
is enabled by default or not. So if we really need to prevent all possible
testing robot regressions, X86_PAE needs to be disabled completely.

>
> By default, CONFIG_DEBUG_VM_PGTABLE is set to no when ARCH_HAS_DEBUG_VM_PGTABLE is not set.

That is true. There is a slight change in the rules, making it an explicit yes
only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.

+config DEBUG_VM_PGTABLE
+	bool "Debug arch page table for semantics compliance"
+	depends on MMU
+	depends on !IA64 && !ARM
+	depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
+	default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
+	help

The default is really irrelevant as the config option can be set explicitly.
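
The effect of the rule above can be sketched in a few lines of Python. This is
only an illustrative model, not kernel or Kconfig code: the function name
debug_vm_pgtable and the use of a plain set of symbol names are assumptions
made for the example. It treats the "depends on" lines as the condition for
the option being selectable and the single "default y if ..." line as the
default.

```python
def debug_vm_pgtable(symbols):
    """Model the quoted DEBUG_VM_PGTABLE rule.

    `symbols` is a set of enabled Kconfig symbol names (no CONFIG_ prefix).
    Returns (selectable, default) as two booleans.
    """
    def on(name):
        return name in symbols

    # All "depends on" lines must hold for the option to be selectable.
    selectable = (
        on("MMU")
        and not on("IA64")
        and not on("ARM")
        and (on("ARCH_HAS_DEBUG_VM_PGTABLE") or on("EXPERT"))
    )
    # "default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM"
    default = selectable and on("ARCH_HAS_DEBUG_VM_PGTABLE") and on("DEBUG_VM")
    return selectable, default


# A configuration like the one in this report: PAE, no arch opt-in.
pae_robot = {"MMU", "EXPERT", "DEBUG_VM", "X86_PAE"}
print(debug_vm_pgtable(pae_robot))  # (True, False)
```

For a configuration like the reported one, the model says the option defaults
to off yet remains selectable by hand, which is why the default alone does not
keep a robot from enabling it.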

2020-04-28 06:14:48

by Christophe Leroy

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 28/04/2020 at 07:53, Anshuman Khandual wrote:
>
>
> On 04/28/2020 10:54 AM, Christophe Leroy wrote:
>>
>>
>> On 28/04/2020 at 04:51, Qian Cai wrote:
>>>
>>>
>>>> On Apr 27, 2020, at 10:35 PM, Anshuman Khandual <[email protected]> wrote:
>>>>
>>>> Letting CONFIG_DEBUG_VM_PGTABLE enabled via CONFIG_EXPERT for unsupported
>>>> platforms i.e without ARCH_HAS_DEBUG_VM_PGTABLE, was a conscious decision
>>>> meant to expand it's adaptability and coverage without requiring any code
>>>> (i.e Kconfig) change. The easier it is to enable the test on unsupported
>>>> platforms right now, more folks are likely to try it out thus increasing
>>>> it's probability to get fixed on those platforms. That is a valid enough
>>>> reason to have CONFIG_EXPERT based enablement method, IMHO. Also even with
>>>> CONFIG_EXPERT set, CONFIG_DEBUG_VM_PGTABLE does not get enabled by default
>>>> automatically.
>>>
>>> No, I am talking about PAE. There is a distinction between known broken that nobody cares (like arm32) and in-progress/unknown status (like s390).
>>>
>>> Also, it is not very nice to introduce regressions for robots when testing PAE because they always select CONFIG__EXPERT and CONFIG_DEBUG_VM.
>>>
>>
>> Having CONFIG_EXPERT and CONFIG_DEBUG_VM is not enough to get CONFIG_DEBUG_VM_PGTABLE set to yes.
>
> Not automatically, that is right. But it can be set if required. Seems like
> the testing robots can and will test with each and every config whether they
> are enabled by default or not. So if we really need to prevent all possible
> testing robot regressions, X86_PAE needs to be disabled completely.
>
>>
>> By default, CONFIG_DEBUG_VM_PGTABLE is set to no when ARCH_HAS_DEBUG_VM_PGTABLE is not set.
>
> That is true. There is a slight change in the rules, making it explicit yes
> only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
>
> +config DEBUG_VM_PGTABLE
> + bool "Debug arch page table for semantics compliance"
> + depends on MMU
> + depends on !IA64 && !ARM
> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
> + help
>
> The default is really irrelevant as the config option can be set explicitly.
>

Yes, but Qian was saying: "Also, it is not very nice to introduce
regressions for robots when testing PAE because they always select
CONFIG_EXPERT and CONFIG_DEBUG_VM"

Here we see that the said regression is not introduced because they
select CONFIG_EXPERT and CONFIG_DEBUG_VM. It is introduced because the
robots explicitly select DEBUG_VM_PGTABLE.

Christophe

2020-04-28 08:02:09

by Qian Cai

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



> On Apr 28, 2020, at 2:12 AM, Christophe Leroy <[email protected]> wrote:
>
> Yes but Qian was saying: "Also, it is not very nice to introduce regressions for robots when testing PAE because they always select CONFIG__EXPERT and CONFIG_DEBUG_VM"
>
> Here we see that the said regression is not introduced because they select CONFIG__EXPERT and CONFIG_DEBUG_VM. This is because the robots explicitely select DEBUG_VM_PGTABLE.

Right, the robot just tried to be helpful, but never figured out this was a minefield.

2020-04-28 08:43:46

by Qian Cai

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



> On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
>
> That is true. There is a slight change in the rules, making it explicit yes
> only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
>
> +config DEBUG_VM_PGTABLE
> + bool "Debug arch page table for semantics compliance"
> + depends on MMU
> + depends on !IA64 && !ARM
> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
> + help
>
> The default is really irrelevant as the config option can be set explicitly.

That could also explain it. Until not long ago, it was only “default y if DEBUG_VM”, which caused the robot to save a .config with DEBUG_VM_PGTABLE=y by default.

Even though you changed the rule recently, it has no effect, as the robot could run “make oldconfig” from the saved config for each linux-next tree execution, and the breakage will go on.
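
The carry-over described here can be sketched as a small Python model. It is a
deliberate simplification and an assumption about the general behaviour, not a
reimplementation of Kconfig: the function name oldconfig_value is made up for
the example, and a symbol missing from the old .config is simply given its
default (as "make olddefconfig" would do) instead of prompting.

```python
def oldconfig_value(saved, selectable, default):
    """Resolve one bool symbol when an old .config is reused.

    saved:      True/False if the old .config already answers the symbol
                (=y or "is not set"), None if the symbol is absent.
    selectable: whether the symbol's dependencies are currently met.
    default:    the value the current Kconfig rule would suggest.
    """
    if not selectable:
        return False      # unmet dependencies force the symbol off
    if saved is not None:
        return saved      # an existing answer wins; the default is ignored
    return default        # only a symbol new to the .config uses the default


# Saved DEBUG_VM_PGTABLE=y, still selectable via EXPERT, new default is n:
print(oldconfig_value(saved=True, selectable=True, default=False))  # True
```

Under this model, changing only the default leaves a previously saved "y" in
place; the value changes only once the option stops being selectable.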

2020-04-28 09:25:13

by Catalin Marinas

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h

On Tue, Apr 28, 2020 at 04:41:11AM -0400, Qian Cai wrote:
> On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
> > That is true. There is a slight change in the rules, making it explicit yes
> > only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
> >
> > +config DEBUG_VM_PGTABLE
> > + bool "Debug arch page table for semantics compliance"
> > + depends on MMU
> > + depends on !IA64 && !ARM
> > + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> > + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
> > + help
> >
> > The default is really irrelevant as the config option can be set explicitly.
>
> That could also explain. Since not long time ago, it was only “default
> y if DEBUG_VM”, that caused the robot saved a .config with
> DEBUG_VM_PGTABLE=y by default.
>
> Even though you changed the rule recently, it has no effect as the
> robot could “make oldconfig” from the saved config for each linux-next
> tree execution and the breakage will go on.

I'm not entirely sure that's the case. This report still points at the
old commit fa6726c1e7 which has:

+	depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
+	default n if !ARCH_HAS_DEBUG_VM_PGTABLE
+	default y if DEBUG_VM

In -next we now have commit 647d9a0de34c, subsequently modified by
commit 0a8646638865. So hopefully with the latest -next tree we won't
see this report.
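
As an aside, the two ordered "default" lines quoted from fa6726c1e7 can be
read with Kconfig's first-match rule: when several defaults apply, the first
one whose condition holds is used. The Python below is a hedged sketch of that
reading; first_matching_default and OLD_RULE are names invented for the
example.

```python
def first_matching_default(defaults, symbols):
    """Return the value of the first default whose condition holds.

    defaults: ordered list of (value, condition) pairs, where condition is a
              function taking the set of enabled symbol names.
    """
    for value, condition in defaults:
        if condition(symbols):
            return value
    return False  # a bool with no applicable default stays off


# The two default lines from the old commit, in their original order.
OLD_RULE = [
    (False, lambda s: "ARCH_HAS_DEBUG_VM_PGTABLE" not in s),  # default n if !ARCH_HAS_...
    (True, lambda s: "DEBUG_VM" in s),                        # default y if DEBUG_VM
]

print(first_matching_default(OLD_RULE, {"EXPERT", "DEBUG_VM"}))  # False
```

Read this way, even the old rule leaves the option off by default on an
architecture without ARCH_HAS_DEBUG_VM_PGTABLE, so a "y" in the robot's config
would have to come from somewhere other than this default.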

We could as well remove the 'depends on ... || EXPERT' part but I'd
rather leave this around with a default n (as in current -next) in case
others want to have a go. If that's still causing problems, we can
remove the '|| EXPERT' part, so there won't be any further regressions.

--
Catalin

2020-04-29 03:31:02

by Anshuman Khandual

Subject: Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 04/28/2020 02:51 PM, Catalin Marinas wrote:
> On Tue, Apr 28, 2020 at 04:41:11AM -0400, Qian Cai wrote:
>> On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
>>> That is true. There is a slight change in the rules, making it explicit yes
>>> only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
>>>
>>> +config DEBUG_VM_PGTABLE
>>> + bool "Debug arch page table for semantics compliance"
>>> + depends on MMU
>>> + depends on !IA64 && !ARM
>>> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>> + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
>>> + help
>>>
>>> The default is really irrelevant as the config option can be set explicitly.
>>
>> That could also explain. Since not long time ago, it was only “default
>> y if DEBUG_VM”, that caused the robot saved a .config with
>> DEBUG_VM_PGTABLE=y by default.
>>
>> Even though you changed the rule recently, it has no effect as the
>> robot could “make oldconfig” from the saved config for each linux-next
>> tree execution and the breakage will go on.
>
> I'm not entirely sure that's the case. This report still points at the
> old commit fa6726c1e7 which has:
>
> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> + default n if !ARCH_HAS_DEBUG_VM_PGTABLE
> + default y if DEBUG_VM
>
> In -next we now have commit 647d9a0de34c and subsequently modified by
> commit 0a8646638865. So hopefully with the latest -next tree we won't
> see this report.

Could someone from the LKP test framework please confirm whether this still
causes the above problem on the latest linux-next by default?

2020-04-29 12:54:25

by Chen, Rong A

Subject: Re: [LKP] Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 4/29/2020 11:28 AM, Anshuman Khandual wrote:
>
> On 04/28/2020 02:51 PM, Catalin Marinas wrote:
>> On Tue, Apr 28, 2020 at 04:41:11AM -0400, Qian Cai wrote:
>>> On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
>>>> That is true. There is a slight change in the rules, making it explicit yes
>>>> only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
>>>>
>>>> +config DEBUG_VM_PGTABLE
>>>> + bool "Debug arch page table for semantics compliance"
>>>> + depends on MMU
>>>> + depends on !IA64 && !ARM
>>>> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>>> + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
>>>> + help
>>>>
>>>> The default is really irrelevant as the config option can be set explicitly.
>>> That could also explain. Since not long time ago, it was only “default
>>> y if DEBUG_VM”, that caused the robot saved a .config with
>>> DEBUG_VM_PGTABLE=y by default.
>>>
>>> Even though you changed the rule recently, it has no effect as the
>>> robot could “make oldconfig” from the saved config for each linux-next
>>> tree execution and the breakage will go on.
>> I'm not entirely sure that's the case. This report still points at the
>> old commit fa6726c1e7 which has:
>>
>> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>> + default n if !ARCH_HAS_DEBUG_VM_PGTABLE
>> + default y if DEBUG_VM
>>
>> In -next we now have commit 647d9a0de34c and subsequently modified by
>> commit 0a8646638865. So hopefully with the latest -next tree we won't
>> see this report.
> Could some one from LKP test framework, please confirm if this still causes
> above problem on the latest linux-next by default ?

Hi,

The .config is a randconfig; the problem still exists if we run "make
oldconfig" on the config with commit 0a8646638865.

$ grep -e CONFIG_MMU= -e CONFIG_EXPERT= -e CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE= -e CONFIG_DEBUG_VM= .config
CONFIG_EXPERT=y
CONFIG_MMU=y
CONFIG_DEBUG_VM=y
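
The same check can be written as a short Python helper, which also makes the
missing symbol explicit instead of leaving it out of the output. This is just
a sketch: config_values is a hypothetical helper name, and the sample text
reproduces only the three lines printed by the grep above, not the full
.config.

```python
import re


def config_values(config_text, names):
    """Map each requested CONFIG_ name to its value, or None if it is absent."""
    found = {}
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$", line)
        if match:
            found[match.group(1)] = match.group(2)
    return {name: found.get(name) for name in names}


sample = """\
CONFIG_EXPERT=y
CONFIG_MMU=y
CONFIG_DEBUG_VM=y
"""

report = config_values(
    sample,
    [
        "CONFIG_MMU",
        "CONFIG_EXPERT",
        "CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE",
        "CONFIG_DEBUG_VM",
    ],
)
for name, value in report.items():
    print(name, value)
```

CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE comes back as None, matching its absence
from the grep output.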

Should we disable DEBUG_VM_PGTABLE by default?

Best Regards,
Rong Chen

2020-04-29 18:19:49

by Catalin Marinas

Subject: Re: [LKP] Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h

On Wed, Apr 29, 2020 at 08:52:25PM +0800, Chen, Rong A wrote:
> On 4/29/2020 11:28 AM, Anshuman Khandual wrote:
> > On 04/28/2020 02:51 PM, Catalin Marinas wrote:
> > > On Tue, Apr 28, 2020 at 04:41:11AM -0400, Qian Cai wrote:
> > > > On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
> > > > > That is true. There is a slight change in the rules, making it explicit yes
> > > > > only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
> > > > >
> > > > > +config DEBUG_VM_PGTABLE
> > > > > + bool "Debug arch page table for semantics compliance"
> > > > > + depends on MMU
> > > > > + depends on !IA64 && !ARM
> > > > > + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> > > > > + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
> > > > > + help
> > > > >
> > > > > The default is really irrelevant as the config option can be set explicitly.
> > > > That could also explain. Since not long time ago, it was only “default
> > > > y if DEBUG_VM”, that caused the robot saved a .config with
> > > > DEBUG_VM_PGTABLE=y by default.
> > > >
> > > > Even though you changed the rule recently, it has no effect as the
> > > > robot could “make oldconfig” from the saved config for each linux-next
> > > > tree execution and the breakage will go on.
> > > I'm not entirely sure that's the case. This report still points at the
> > > old commit fa6726c1e7 which has:
> > >
> > > + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
> > > + default n if !ARCH_HAS_DEBUG_VM_PGTABLE
> > > + default y if DEBUG_VM
> > >
> > > In -next we now have commit 647d9a0de34c and subsequently modified by
> > > commit 0a8646638865. So hopefully with the latest -next tree we won't
> > > see this report.
> > Could some one from LKP test framework, please confirm if this still causes
> > above problem on the latest linux-next by default ?
>
> The .config is a rand config, the problem is still exist if run "make
> oldconfig" for the config with commit 0a8646638865.

Is randconfig expected to boot? I don't think it is but I guess it
should not trigger a BUG_ON during boot.

> $ grep -e CONFIG_MMU= -e CONFIG_EXPERT= -e CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=
> -e CONFIG_DEBUG_VM= .config
> CONFIG_EXPERT=y
> CONFIG_MMU=y
> CONFIG_DEBUG_VM=y
>
> should we disable DEBUG_VM_PGTABLE by default?

If that's the only case where this fails in LKP, I'd rather remove the
EXPERT dependency so that it cannot be enabled. Architectures that want
to experiment with this feature will have to select
ARCH_HAS_DEBUG_VM_PGTABLE explicitly.

--
Catalin

2020-04-29 18:39:26

by Christophe Leroy

Subject: Re: [LKP] Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 29/04/2020 at 20:15, Catalin Marinas wrote:
> On Wed, Apr 29, 2020 at 08:52:25PM +0800, Chen, Rong A wrote:
>> On 4/29/2020 11:28 AM, Anshuman Khandual wrote:
>>> On 04/28/2020 02:51 PM, Catalin Marinas wrote:
>>>> On Tue, Apr 28, 2020 at 04:41:11AM -0400, Qian Cai wrote:
>>>>> On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
>>>>>> That is true. There is a slight change in the rules, making it explicit yes
>>>>>> only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
>>>>>>
>>>>>> +config DEBUG_VM_PGTABLE
>>>>>> + bool "Debug arch page table for semantics compliance"
>>>>>> + depends on MMU
>>>>>> + depends on !IA64 && !ARM
>>>>>> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>>>>> + default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
>>>>>> + help
>>>>>>
>>>>>> The default is really irrelevant as the config option can be set explicitly.
>>>>> That could also explain. Since not long time ago, it was only “default
>>>>> y if DEBUG_VM”, that caused the robot saved a .config with
>>>>> DEBUG_VM_PGTABLE=y by default.
>>>>>
>>>>> Even though you changed the rule recently, it has no effect as the
>>>>> robot could “make oldconfig” from the saved config for each linux-next
>>>>> tree execution and the breakage will go on.
>>>> I'm not entirely sure that's the case. This report still points at the
>>>> old commit fa6726c1e7 which has:
>>>>
>>>> + depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>>> + default n if !ARCH_HAS_DEBUG_VM_PGTABLE
>>>> + default y if DEBUG_VM
>>>>
>>>> In -next we now have commit 647d9a0de34c and subsequently modified by
>>>> commit 0a8646638865. So hopefully with the latest -next tree we won't
>>>> see this report.
>>> Could some one from LKP test framework, please confirm if this still causes
>>> above problem on the latest linux-next by default ?
>>
>> The .config is a rand config, the problem is still exist if run "make
>> oldconfig" for the config with commit 0a8646638865.
>
> Is randconfig expected to boot? I don't think it is but I guess it
> should not trigger a BUG_ON during boot.
>
>> $ grep -e CONFIG_MMU= -e CONFIG_EXPERT= -e CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=
>> -e CONFIG_DEBUG_VM= .config
>> CONFIG_EXPERT=y
>> CONFIG_MMU=y
>> CONFIG_DEBUG_VM=y
>>
>> should we disable DEBUG_VM_PGTABLE by default?
>
> If that's the only case where this fails in LKP, I'd rather remove the
> EXPERT dependency so that it cannot be enabled. Architectures that want
> to experiment with this feature will have to select
> ARCH_HAS_DEBUG_VM_PGTABLE explicitly.
>

But when something is not selectable, people won't even know it exists.

Why not try and fix the problems reported by the robots instead ?

2020-04-30 02:41:11

by Anshuman Khandual

Subject: Re: [LKP] Re: [mm/debug] fa6726c1e7: kernel_BUG_at_include/linux/mm.h



On 04/30/2020 12:04 AM, Christophe Leroy wrote:
>
>
> On 29/04/2020 at 20:15, Catalin Marinas wrote:
>> On Wed, Apr 29, 2020 at 08:52:25PM +0800, Chen, Rong A wrote:
>>> On 4/29/2020 11:28 AM, Anshuman Khandual wrote:
>>>> On 04/28/2020 02:51 PM, Catalin Marinas wrote:
>>>>> On Tue, Apr 28, 2020 at 04:41:11AM -0400, Qian Cai wrote:
>>>>>> On Apr 28, 2020, at 1:54 AM, Anshuman Khandual <[email protected]> wrote:
>>>>>>> That is true. There is a slight change in the rules, making it explicit yes
>>>>>>> only when both ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM are enabled.
>>>>>>>
>>>>>>> +config DEBUG_VM_PGTABLE
>>>>>>> +    bool "Debug arch page table for semantics compliance"
>>>>>>> +    depends on MMU
>>>>>>> +    depends on !IA64 && !ARM
>>>>>>> +    depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>>>>>> +    default y if ARCH_HAS_DEBUG_VM_PGTABLE && DEBUG_VM
>>>>>>> +    help
>>>>>>>
>>>>>>> The default is really irrelevant as the config option can be set explicitly.
>>>>>> That could also explain. Since not long time ago, it was only “default
>>>>>> y if DEBUG_VM”, that caused the robot saved a .config with
>>>>>> DEBUG_VM_PGTABLE=y by default.
>>>>>>
>>>>>> Even though you changed the rule recently, it has no effect as the
>>>>>> robot could “make oldconfig” from the saved config for each linux-next
>>>>>> tree execution and the breakage will go on.
>>>>> I'm not entirely sure that's the case. This report still points at the
>>>>> old commit fa6726c1e7 which has:
>>>>>
>>>>> +       depends on ARCH_HAS_DEBUG_VM_PGTABLE || EXPERT
>>>>> +       default n if !ARCH_HAS_DEBUG_VM_PGTABLE
>>>>> +       default y if DEBUG_VM
>>>>>
>>>>> In -next we now have commit 647d9a0de34c and subsequently modified by
>>>>> commit 0a8646638865. So hopefully with the latest -next tree we won't
>>>>> see this report.
>>>> Could some one from LKP test framework, please confirm if this still causes
>>>> above problem on the latest linux-next by default ?
>>>
>>> The .config is a rand config, the problem is still exist if run "make
>>> oldconfig" for the config with commit 0a8646638865.
>>
>> Is randconfig expected to boot? I don't think it is but I guess it
>> should not trigger a BUG_ON during boot.
>>
>>> $ grep -e CONFIG_MMU= -e CONFIG_EXPERT= -e CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=
>>> -e CONFIG_DEBUG_VM= .config
>>> CONFIG_EXPERT=y
>>> CONFIG_MMU=y
>>> CONFIG_DEBUG_VM=y
>>>
>>> should we disable DEBUG_VM_PGTABLE by default?
>>
>> If that's the only case where this fails in LKP, I'd rather remove the
>> EXPERT dependency so that it cannot be enabled. Architectures that want
>> to experiment with this feature will have to select
>> ARCH_HAS_DEBUG_VM_PGTABLE explicitly.
>>

Dropping the EXPERT dependency here seems like the best solution even
though I have always tried to avoid that (many times).

>
> But when something is not selectable, people won't even know it exists

We will probably revisit this option later but for now it is really not
an absolute necessity for the test.

>
> Why not try and fix the problems reported by the robots instead ?

Enabling all non-enabled platforms will be an ongoing process that will
require support and collaboration from the respective platform folks.
A similar collaborative approach earlier successfully enabled test runs
on the arc, s390, ppc32 and ppc64 platforms, after starting with just
arm64 and x86.