2022-11-17 12:30:07

by Naresh Kamboju

[permalink] [raw]
Subject: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

Kunit test cases failed and found warnings while booting Linux next
version 6.1.0-rc5-next-20221117 on qemu-x86_64 [1].

It was working on Linux next-20221116 tag.

[ 0.663761] WARNING: CPU: 0 PID: 0 at
arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
[ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234
kfence_protect+0x7d/0x120
[ 0.664465] kfence: kfence_init failed

Reported-by: Linux Kernel Functional Testing <[email protected]>

Regressions found on qemu_x86_64:[2]

- kunit/test_use_after_free_read
- kunit/test_invalid_access
- kunit/test_double_free-memcache
- kunit/test_krealloc
- kunit/test_shrink_memcache
- kunit/test_memcache_typesafe_by_rcu
- kunit/test_use_after_free_read-memcache
- kunit/test_invalid_addr_free-memcache
- kunit/test_out_of_bounds_read-memcache
- kunit/test_memcache_alloc_bulk
- kunit/test_out_of_bounds_read
- kunit/test_memcache_ctor
- kunit/test_corruption-memcache
- kunit/test_gfpzero
- kunit/test_out_of_bounds_write
- kunit/test_out_of_bounds_write-memcache
- kunit/test_kmalloc_aligned_oob_read
- kunit/test_free_bulk-memcache

[ 0.663758] ------------[ cut here ]------------
[ 0.663761] WARNING: CPU: 0 PID: 0 at
arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
[ 0.663782] Modules linked in:
[ 0.663788] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
6.1.0-rc5-next-20221117 #1
[ 0.663795] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.12.0-1 04/01/2014
[ 0.663801] RIP: 0010:kfence_protect+0x7b/0x120
[ 0.663811] Code: f1 f1 f1 f1 c7 40 04 04 f3 f3 f3 65 48 8b 04 25
28 00 00 00 48 89 45 d8 31 c0 e8 e0 0d ba ff 48 85 c0 74 06 83 7d a0
01 74 17 <0f> 0b 0f 0b c6 05 cb 97 1d 03 00 45 31 c0 c6 05 c0 97 1d 03
01 eb
[ 0.663819] RSP: 0000:ffffffff9c407d98 EFLAGS: 00010002
[ 0.663826] RAX: ffff8880adc01020 RBX: 00000000000001ff RCX: 80000001000001e3
[ 0.663830] RDX: dffffc0000000000 RSI: ffff88811ac00000 RDI: ffff8880adc01020
[ 0.663836] RBP: ffffffff9c407e18 R08: ffffffff997c8ef8 R09: ffffea00046b7f87
[ 0.663841] R10: fffff940008d6ff0 R11: 0000000000000001 R12: 1ffffffff3880fb3
[ 0.663845] R13: ffff88811ac00000 R14: ffffea00046b7fc0 R15: 0000000000000200
[ 0.663852] FS: 0000000000000000(0000) GS:ffff88811b400000(0000)
knlGS:0000000000000000
[ 0.663859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.663864] CR2: ffff88813ffff000 CR3: 00000000ac814000 CR4: 00000000000406b0
[ 0.663869] Call Trace:
[ 0.663871] <TASK>
[ 0.663876] ? __pfx_kfence_protect+0x10/0x10
[ 0.663886] ? __pfx_set_memory_4k+0x10/0x10
[ 0.663899] kfence_init_pool+0x1ea/0x350
[ 0.663909] ? __pfx_kfence_init_pool+0x10/0x10
[ 0.663919] ? random_init+0xe9/0x13b
[ 0.663930] ? __pfx_random_init+0x10/0x10
[ 0.663936] ? _find_next_bit+0x46/0xe0
[ 0.663947] kfence_init+0x42/0x1e8
[ 0.663959] start_kernel+0x1fd/0x3a6
[ 0.663970] x86_64_start_reservations+0x28/0x2e
[ 0.663978] x86_64_start_kernel+0x96/0xa0
[ 0.663986] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.664001] </TASK>
[ 0.664003] ---[ end trace 0000000000000000 ]---
[ 0.664032] ------------[ cut here ]------------
[ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234
kfence_protect+0x7d/0x120
[ 0.664045] Modules linked in:
[ 0.664049] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W
6.1.0-rc5-next-20221117 #1
[ 0.664055] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.12.0-1 04/01/2014
[ 0.664058] RIP: 0010:kfence_protect+0x7d/0x120
[ 0.664068] Code: f1 f1 c7 40 04 04 f3 f3 f3 65 48 8b 04 25 28 00
00 00 48 89 45 d8 31 c0 e8 e0 0d ba ff 48 85 c0 74 06 83 7d a0 01 74
17 0f 0b <0f> 0b c6 05 cb 97 1d 03 00 45 31 c0 c6 05 c0 97 1d 03 01 eb
4a 48
[ 0.664074] RSP: 0000:ffffffff9c407d98 EFLAGS: 00010002
[ 0.664080] RAX: ffff8880adc01020 RBX: 00000000000001ff RCX: 80000001000001e3
[ 0.664085] RDX: dffffc0000000000 RSI: ffff88811ac00000 RDI: ffff8880adc01020
[ 0.664090] RBP: ffffffff9c407e18 R08: ffffffff997c8ef8 R09: ffffea00046b7f87
[ 0.664095] R10: fffff940008d6ff0 R11: 0000000000000001 R12: 1ffffffff3880fb3
[ 0.664099] R13: ffff88811ac00000 R14: ffffea00046b7fc0 R15: 0000000000000200
[ 0.664105] FS: 0000000000000000(0000) GS:ffff88811b400000(0000)
knlGS:0000000000000000
[ 0.664112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.664116] CR2: ffff88813ffff000 CR3: 00000000ac814000 CR4: 00000000000406b0
[ 0.664121] Call Trace:
[ 0.664123] <TASK>
[ 0.664126] ? __pfx_kfence_protect+0x10/0x10
[ 0.664136] ? __pfx_set_memory_4k+0x10/0x10
[ 0.664146] kfence_init_pool+0x1ea/0x350
[ 0.664156] ? __pfx_kfence_init_pool+0x10/0x10
[ 0.664166] ? random_init+0xe9/0x13b
[ 0.664172] ? __pfx_random_init+0x10/0x10
[ 0.664179] ? _find_next_bit+0x46/0xe0
[ 0.664186] kfence_init+0x42/0x1e8
[ 0.664196] start_kernel+0x1fd/0x3a6
[ 0.664205] x86_64_start_reservations+0x28/0x2e
[ 0.664213] x86_64_start_kernel+0x96/0xa0
[ 0.664221] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.664232] </TASK>
[ 0.664235] ---[ end trace 0000000000000000 ]---
[ 0.664465] kfence: kfence_init failed

metadata:
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git_sha: af37ad1e01c72483c4ee8453d9d9bac95d35f023
git_describe: next-20221117
kernel_version: 6.1.0-rc5
kernel-config: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm/config
build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/697483979
artifact-location: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm
toolchain: gcc-11


[1] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20221117/testrun/13045566/suite/kunit/test/kfence/log
[2] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20221117/testrun/13045566/suite/kunit/tests/
[3] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20221117/testrun/13045566/suite/kunit/test/test_invalid_access/history/



--
Linaro LKFT
https://lkft.linaro.org


2022-11-17 14:10:56

by Marco Elver

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On Thu, Nov 17, 2022 at 05:01PM +0530, Naresh Kamboju wrote:
> Kunit test cases failed and found warnings while booting Linux next
> version 6.1.0-rc5-next-20221117 on qemu-x86_64 [1].
>
> It was working on Linux next-20221116 tag.
>
> [ 0.663761] WARNING: CPU: 0 PID: 0 at
> arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
> [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234
> kfence_protect+0x7d/0x120
> [ 0.664465] kfence: kfence_init failed
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
[...]
> [ 0.663758] ------------[ cut here ]------------
> [ 0.663761] WARNING: CPU: 0 PID: 0 at
> arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
[...]
> [ 0.664465] kfence: kfence_init failed
>
> metadata:
> git_ref: master
> git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
> git_sha: af37ad1e01c72483c4ee8453d9d9bac95d35f023
> git_describe: next-20221117
> kernel_version: 6.1.0-rc5
> kernel-config: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm/config
> build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/697483979
> artifact-location: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm
> toolchain: gcc-11

I bisected this to:

commit 127960a05548ea699a95791669e8112552eb2452
Author: Peter Zijlstra <[email protected]>
Date: Thu Nov 10 13:33:57 2022 +0100

x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()

There is a cludge in change_page_attr_set_clr() that inhibits
propagating NX changes to the aliases (directmap and highmap) -- this
is a cludge twofold:

- it also inhibits the primary checks in __change_page_attr();
- it hard depends on single bit changes.

The introduction of set_memory_rox() triggered this last issue for
clearing both _PAGE_RW and _PAGE_NX.

Explicitly ignore _PAGE_NX in cpa_process_alias() instead.

Fixes: b38994948567 ("x86/mm: Implement native set_memory_rox()")
Reported-by: kernel test robot <[email protected]>
Debugged-by: Dave Hansen <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/20221110125544.594991716%40infradead.org

A simple revert of this commit fixes the issue.

Since all this seems to be about set_memory_rox(), and this is a fix
commit, the fix itself missed something?

Thanks,
-- Marco

2022-11-17 15:21:29

by Dave Hansen

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On 11/17/22 05:58, Marco Elver wrote:
> [ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
> [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120
> [ 0.664465] kfence: kfence_init failed

Any chance you could add some debugging and figure out what actually
made kfence call over? Was it the pte or the level?

if (WARN_ON(!pte || level != PG_LEVEL_4K))
return false;

I can see how the thing you bisected to might lead to a page table not
being split, which could mess with the 'level' check.

Also, is there a reason this code is mucking with the page tables
directly? It seems, uh, rather wonky. This, for instance:

> if (protect)
> set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> else
> set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
>
> /*
> * Flush this CPU's TLB, assuming whoever did the allocation/free is
> * likely to continue running on this CPU.
> */
> preempt_disable();
> flush_tlb_one_kernel(addr);
> preempt_enable();

Seems rather broken. I assume the preempt_disable() is there to get rid
of some warnings. But, there is nothing I can see to *keep* the CPU
that did the free from being different from the one where the TLB flush
is performed until the preempt_disable(). That makes the
flush_tlb_one_kernel() mostly useless.

Is there a reason this code isn't using the existing page table
manipulation functions and tries to code its own? What prevents it from
using something like the attached patch?


Attachments:
kfence.patch (1.24 kB)

2022-11-18 00:00:44

by Dave Hansen

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On 11/17/22 15:23, Marco Elver wrote:
> Yes - it's the 'level != PG_LEVEL_4K'.

That plus the bisect made it pretty easy to find, thanks for the effort!

Could you double-check that the attached patch fixes it? It seemed to
for me.

The issue was that the new "No changes, easy!" check in the suspect
commit didn't check the cpa->force_split option. It didn't split down
to 4k and then all hell broke loose.

Oh, and I totally misread the kfence ability to tolerate partial TLB
flushes. Sorry for the noise there!


Attachments:
kfence.patch (505.00 B)

2022-11-18 00:03:22

by Marco Elver

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On Thu, Nov 17, 2022 at 06:34AM -0800, Dave Hansen wrote:
> On 11/17/22 05:58, Marco Elver wrote:
> > [ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
> > [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120
> > [ 0.664465] kfence: kfence_init failed
>
> Any chance you could add some debugging and figure out what actually
> made kfence call over? Was it the pte or the level?
>
> if (WARN_ON(!pte || level != PG_LEVEL_4K))
> return false;
>
> I can see how the thing you bisected to might lead to a page table not
> being split, which could mess with the 'level' check.

Yes - it's the 'level != PG_LEVEL_4K'.

We do actually try to split the pages in arch_kfence_init_pool() (above
this function) - so with "x86/mm: Inhibit _PAGE_NX changes from
cpa_process_alias()" this somehow fails...

> Also, is there a reason this code is mucking with the page tables
> directly? It seems, uh, rather wonky. This, for instance:
>
> > if (protect)
> > set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> > else
> > set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> >
> > /*
> > * Flush this CPU's TLB, assuming whoever did the allocation/free is
> > * likely to continue running on this CPU.
> > */
> > preempt_disable();
> > flush_tlb_one_kernel(addr);
> > preempt_enable();
>
> Seems rather broken. I assume the preempt_disable() is there to get rid
> of some warnings. But, there is nothing I can see to *keep* the CPU
> that did the free from being different from the one where the TLB flush
> is performed until the preempt_disable(). That makes the
> flush_tlb_one_kernel() mostly useless.
>
> Is there a reason this code isn't using the existing page table
> manipulation functions and tries to code its own? What prevents it from
> using something like the attached patch?

Yes, see the comment below - it's to avoid the IPIs and TLB shoot-downs,
because KFENCE _can_ tolerate the inaccuracy even if we hit the wrong
TLB or other CPUs' TLBs aren't immediately flushed - we trade a few
false negatives for minimizing performance impact.

> diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
> index ff5c7134a37a..5cdb3a1f3995 100644
> --- a/arch/x86/include/asm/kfence.h
> +++ b/arch/x86/include/asm/kfence.h
> @@ -37,34 +37,13 @@ static inline bool arch_kfence_init_pool(void)
> return true;
> }
>
> -/* Protect the given page and flush TLB. */
> static inline bool kfence_protect_page(unsigned long addr, bool protect)
> {
> - unsigned int level;
> - pte_t *pte = lookup_address(addr, &level);
> -
> - if (WARN_ON(!pte || level != PG_LEVEL_4K))
> - return false;
> -
> - /*
> - * We need to avoid IPIs, as we may get KFENCE allocations or faults
> - * with interrupts disabled. Therefore, the below is best-effort, and
> - * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
> - * lazy fault handling takes care of faults after the page is PRESENT.
> - */
> -

^^ See this comment. Additionally there's a real performance concern,
and the inaccuracy is something that we deliberately accept.

> if (protect)
> - set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> + set_memory_np(addr, addr + PAGE_SIZE);
> else
> - set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> + set_memory_p(addr, addr + PAGE_SIZE);

Isn't this going to do tons of IPIs and shoot down other CPU's TLBs?
KFENCE shouldn't incur this overhead on large machines with >100 CPUs if
we can avoid it.

What does "x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()" do
that suddenly makes all this fail?

What solution do you prefer that both fixes the issue and avoids the
IPIs?

Thanks,
-- Marco

2022-11-18 09:36:00

by Marco Elver

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On Thu, Nov 17, 2022 at 03:54PM -0800, Dave Hansen wrote:
> On 11/17/22 15:23, Marco Elver wrote:
> > Yes - it's the 'level != PG_LEVEL_4K'.
>
> That plus the bisect made it pretty easy to find, thanks for the effort!
>
> Could you double-check that the attached patch fixes it? It seemed to
> for me.

Yes, that works - thanks!

> The issue was that the new "No changes, easy!" check in the suspect
> commit didn't check the cpa->force_split option. It didn't split down
> to 4k and then all hell broke loose.
>
> Oh, and I totally misread the kfence ability to tolerate partial TLB
> flushes. Sorry for the noise there!

> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 220361ceb997..9b4e2ad957f6 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -1727,7 +1727,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
> /*
> * No changes, easy!
> */
> - if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr)))
> + if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr))
> + && !cpa->force_split)
> return ret;
>
> while (rempages) {

Tested-by: Marco Elver <[email protected]>

2022-11-18 11:05:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On Thu, Nov 17, 2022 at 03:54:21PM -0800, Dave Hansen wrote:
> On 11/17/22 15:23, Marco Elver wrote:
> > Yes - it's the 'level != PG_LEVEL_4K'.
>
> That plus the bisect made it pretty easy to find, thanks for the effort!
>
> Could you double-check that the attached patch fixes it? It seemed to
> for me.
>
> The issue was that the new "No changes, easy!" check in the suspect
> commit didn't check the cpa->force_split option. It didn't split down
> to 4k and then all hell broke loose.
>
> Oh, and I totally misread the kfence ability to tolerate partial TLB
> flushes. Sorry for the noise there!

> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 220361ceb997..9b4e2ad957f6 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -1727,7 +1727,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary)
> /*
> * No changes, easy!
> */
> - if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr)))
> + if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr))
> + && !cpa->force_split)

(operators go at the end of the previous line)

> return ret;
>
> while (rempages) {

Urgh.. sorry about that.


2022-11-21 06:00:57

by Naresh Kamboju

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On Thu, 17 Nov 2022 at 20:04, Dave Hansen <[email protected]> wrote:
>
> On 11/17/22 05:58, Marco Elver wrote:
> > [ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
> > [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120
> > [ 0.664465] kfence: kfence_init failed
>
> Any chance you could add some debugging and figure out what actually
> made kfence call over? Was it the pte or the level?
>
> if (WARN_ON(!pte || level != PG_LEVEL_4K))
> return false;
>
> I can see how the thing you bisected to might lead to a page table not
> being split, which could mess with the 'level' check.
>
> Also, is there a reason this code is mucking with the page tables
> directly? It seems, uh, rather wonky. This, for instance:
>
> > if (protect)
> > set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> > else
> > set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> >
> > /*
> > * Flush this CPU's TLB, assuming whoever did the allocation/free is
> > * likely to continue running on this CPU.
> > */
> > preempt_disable();
> > flush_tlb_one_kernel(addr);
> > preempt_enable();
>
> Seems rather broken. I assume the preempt_disable() is there to get rid
> of some warnings. But, there is nothing I can see to *keep* the CPU
> that did the free from being different from the one where the TLB flush
> is performed until the preempt_disable(). That makes the
> flush_tlb_one_kernel() mostly useless.
>
> Is there a reason this code isn't using the existing page table
> manipulation functions and tries to code its own? What prevents it from
> using something like the attached patch?

I have applied this patch and found build warnings / errors.

In file included from mm/kfence/core.c:34:
arch/x86/include/asm/kfence.h: In function 'kfence_protect_page':
arch/x86/include/asm/kfence.h:45:17: error: implicit declaration of
function 'set_memory_p'; did you mean 'set_memory_np'?
[-Werror=implicit-function-declaration]
45 | set_memory_p(addr, addr + PAGE_SIZE);
| ^~~~~~~~~~~~
| set_memory_np
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:250: mm/kfence/core.o] Error 1
In file included from mm/kfence/report.c:20:
arch/x86/include/asm/kfence.h: In function 'kfence_protect_page':
arch/x86/include/asm/kfence.h:45:17: error: implicit declaration of
function 'set_memory_p'; did you mean 'set_memory_np'?
[-Werror=implicit-function-declaration]
45 | set_memory_p(addr, addr + PAGE_SIZE);
| ^~~~~~~~~~~~
| set_memory_np
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:250: mm/kfence/report.o] Error 1
In file included from mm/kfence/kfence_test.c:26:
arch/x86/include/asm/kfence.h: In function 'kfence_protect_page':
arch/x86/include/asm/kfence.h:45:17: error: implicit declaration of
function 'set_memory_p'; did you mean 'set_memory_np'?
[-Werror=implicit-function-declaration]
45 | set_memory_p(addr, addr + PAGE_SIZE);
| ^~~~~~~~~~~~
| set_memory_np
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:250: mm/kfence/kfence_test.o] Error 1

ref:
https://builds.tuxbuild.com/2HqMWcweeInju7rqVgGdNge7gby/

- Naresh

2022-11-21 07:53:15

by Naresh Kamboju

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

Hi Dave,

On Fri, 18 Nov 2022 at 05:24, Dave Hansen <[email protected]> wrote:
>
> On 11/17/22 15:23, Marco Elver wrote:
> > Yes - it's the 'level != PG_LEVEL_4K'.
>
> That plus the bisect made it pretty easy to find, thanks for the effort!
>
> Could you double-check that the attached patch fixes it? It seemed to
> for me.

I have applied the attached patch on Linux next 20221118 and tested [1].
The reported issue has been fixed now.

Reported-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Linux Kernel Functional Testing <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>

OTOH,
I request you to walk through the boot and test log [1] (new see few failures).
not ok 7 - test_double_free
not ok 9 - test_invalid_addr_free
not ok 11 - test_corruption
not ok 18 - test_kmalloc_aligned_oob_write
# kfence: pass:19 fail:4 skip:2 total:25
# Totals: pass:19 fail:4 skip:2 total:25
not ok 6 - kfence

> The issue was that the new "No changes, easy!" check in the suspect
> commit didn't check the cpa->force_split option. It didn't split down
> to 4k and then all hell broke loose.
>
> Oh, and I totally misread the kfence ability to tolerate partial TLB
> flushes. Sorry for the noise there!

[1] https://lkft.validation.linaro.org/scheduler/job/5866612#L621
[2] https://tuxmake.org/
[3] https://qa-reports.linaro.org/lkft/

Steps to reproduce by tuxmake
---

# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.
# Original tuxmake command with fragments listed below.
# tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-11
--kconfig defconfig --kconfig-add
https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-kernel/linux/files/lkft.config
--kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-kernel/linux/files/lkft-crypto.config
--kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-kernel/linux/files/distro-overrides.config
--kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-kernel/linux/files/systemd.config
--kconfig-add https://raw.githubusercontent.com/Linaro/meta-lkft/kirkstone/meta/recipes-kernel/linux/files/virtio.config
--kconfig-add CONFIG_IGB=y --kconfig-add
CONFIG_UNWINDER_FRAME_POINTER=y --kconfig-add CONFIG_KASAN=y
--kconfig-add CONFIG_KUNIT=y --kconfig-add CONFIG_KUNIT_ALL_TESTS=y


echo 'Patch series used in this build is available from
https://builds.tuxbuild.com/2HqQsi6ygbnwoCIENR9NB1zm80E/people.linaro.org/~naresh.kamboju-0001-kfence-fixing-kunit-testing.patch'
echo '[WARNING] Build starting with assumption that the required
patches are already applied.'
echo 'Press any key to continue or Ctrl+C to abort.'
read var_name

tuxmake --runtime podman --target-arch x86_64 --toolchain gcc-11
--kconfig https://builds.tuxbuild.com/2HqQsi6ygbnwoCIENR9NB1zm80E/config

- Naresh


Attachments:
kfence.patch (505.00 B)
kfence-kunit-boot-test-log-on-qemu-x86-linux-next.log (478.92 kB)
Download all attachments

2022-11-21 08:54:36

by Marco Elver

[permalink] [raw]
Subject: Re: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect

On Mon, 21 Nov 2022 at 08:28, Naresh Kamboju <[email protected]> wrote:
>
> Hi Dave,
>
> On Fri, 18 Nov 2022 at 05:24, Dave Hansen <[email protected]> wrote:
> >
> > On 11/17/22 15:23, Marco Elver wrote:
> > > Yes - it's the 'level != PG_LEVEL_4K'.
> >
> > That plus the bisect made it pretty easy to find, thanks for the effort!
> >
> > Could you double-check that the attached patch fixes it? It seemed to
> > for me.
>
> I have applied the attached patch on Linux next 20221118 and tested [1].
> The reported issue has been fixed now.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
> Tested-by: Linux Kernel Functional Testing <[email protected]>
> Tested-by: Naresh Kamboju <[email protected]>
>
> OTOH,
> I request you to walk through the boot and test log [1] (new see few failures).
> not ok 7 - test_double_free
> not ok 9 - test_invalid_addr_free
> not ok 11 - test_corruption
> not ok 18 - test_kmalloc_aligned_oob_write
> # kfence: pass:19 fail:4 skip:2 total:25
> # Totals: pass:19 fail:4 skip:2 total:25
> not ok 6 - kfence

Fixed by https://lkml.kernel.org/r/[email protected]