2021-04-14 02:41:42

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.

For catching simple conditions like a variable having value 0, this
is efficient because it does the test and the trap at the same time.
But most conditions used with BUG_ON or WARN_ON are more complex and
forces GCC to format the condition into a 0 or 1 value in a register.
This will usually require 2 to 3 instructions.

The most efficient solution would be to use __builtin_trap() because
GCC is able to optimise the use of the different trap instructions
based on the requested condition, but this is complex if not
impossible for the following reasons:
- __builtin_trap() is a non-recoverable instruction, so it can't be
used for WARN_ON
- Knowing which line of code generated the trap would require the
analysis of DWARF information. This is not a feature we have today.

As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
the way WARN_ON() is implemented is suboptimal. That commit also
mentions an issue with 'long long' condition. It fixed it for
WARN_ON() but the same problem still exists today with BUG_ON() on
PPC32. It will be fixed by using the generic implementation.

By using the generic implementation, gcc will naturally generate a
branch to the unconditional trap generated by BUG().

As modern powerpc implement zero-cycle branch,
that's even more efficient.

And for the functions using WARN_ON() and its return, the test
on return from WARN_ON() is now also used for the WARN_ON() itself.

On PPC64 we don't want it because we want to be able to use CFAR
register to track how we entered the code that trapped. The CFAR
register would be clobbered by the branch.

A simple test function:

unsigned long test9w(unsigned long a, unsigned long b)
{
if (WARN_ON(!b))
return 0;
return a / b;
}

Before the patch:

0000046c <test9w>:
46c: 7c 89 00 34 cntlzw r9,r4
470: 55 29 d9 7e rlwinm r9,r9,27,5,31
474: 0f 09 00 00 twnei r9,0
478: 2c 04 00 00 cmpwi r4,0
47c: 41 82 00 0c beq 488 <test9w+0x1c>
480: 7c 63 23 96 divwu r3,r3,r4
484: 4e 80 00 20 blr

488: 38 60 00 00 li r3,0
48c: 4e 80 00 20 blr

After the patch:

00000468 <test9w>:
468: 2c 04 00 00 cmpwi r4,0
46c: 41 82 00 0c beq 478 <test9w+0x10>
470: 7c 63 23 96 divwu r3,r3,r4
474: 4e 80 00 20 blr

478: 0f e0 00 00 twui r0,0
47c: 38 60 00 00 li r3,0
480: 4e 80 00 20 blr

So we see before the patch we need 3 instructions on the likely path
to handle the WARN_ON(). With the patch the trap goes on the unlikely
path.

See below the difference at the entry of system_call_exception where
we have several BUG_ON(), allthough less impressing.

With the patch:

00000000 <system_call_exception>:
0: 81 6a 00 84 lwz r11,132(r10)
4: 90 6a 00 88 stw r3,136(r10)
8: 71 60 00 02 andi. r0,r11,2
c: 41 82 00 70 beq 7c <system_call_exception+0x7c>
10: 71 60 40 00 andi. r0,r11,16384
14: 41 82 00 6c beq 80 <system_call_exception+0x80>
18: 71 6b 80 00 andi. r11,r11,32768
1c: 41 82 00 68 beq 84 <system_call_exception+0x84>
20: 94 21 ff e0 stwu r1,-32(r1)
24: 93 e1 00 1c stw r31,28(r1)
28: 7d 8c 42 e6 mftb r12
...
7c: 0f e0 00 00 twui r0,0
80: 0f e0 00 00 twui r0,0
84: 0f e0 00 00 twui r0,0

Without the patch:

00000000 <system_call_exception>:
0: 94 21 ff e0 stwu r1,-32(r1)
4: 93 e1 00 1c stw r31,28(r1)
8: 90 6a 00 88 stw r3,136(r10)
c: 81 6a 00 84 lwz r11,132(r10)
10: 69 60 00 02 xori r0,r11,2
14: 54 00 ff fe rlwinm r0,r0,31,31,31
18: 0f 00 00 00 twnei r0,0
1c: 69 60 40 00 xori r0,r11,16384
20: 54 00 97 fe rlwinm r0,r0,18,31,31
24: 0f 00 00 00 twnei r0,0
28: 69 6b 80 00 xori r11,r11,32768
2c: 55 6b 8f fe rlwinm r11,r11,17,31,31
30: 0f 0b 00 00 twnei r11,0
34: 7d 8c 42 e6 mftb r12

Signed-off-by: Christophe Leroy <[email protected]>
---
arch/powerpc/include/asm/bug.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index d1635ffbb179..101dea4eec8d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -68,7 +68,11 @@
BUG_ENTRY("twi 31, 0, 0", 0); \
unreachable(); \
} while (0)
+#define HAVE_ARCH_BUG
+
+#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))

+#ifdef CONFIG_PPC64
#define BUG_ON(x) do { \
if (__builtin_constant_p(x)) { \
if (x) \
@@ -78,8 +82,6 @@
} \
} while (0)

-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
-
#define WARN_ON(x) ({ \
int __ret_warn_on = !!(x); \
if (__builtin_constant_p(__ret_warn_on)) { \
@@ -93,9 +95,10 @@
unlikely(__ret_warn_on); \
})

-#define HAVE_ARCH_BUG
#define HAVE_ARCH_BUG_ON
#define HAVE_ARCH_WARN_ON
+#endif
+
#endif /* __ASSEMBLY __ */
#else
#ifdef __ASSEMBLY__
--
2.25.0


2021-04-14 02:41:42

by Christophe Leroy

[permalink] [raw]
Subject: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.

Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

unsigned long test(struct pt_regs *regs)
{
int ret;

WARN_ON(regs->msr & MSR_PR);

return regs->gpr[3];
}

unsigned long test9w(unsigned long a, unsigned long b)
{
if (WARN_ON(!b))
return 0;
return a / b;
}

Before the patch:

000003a8 <test>:
3a8: 81 23 00 84 lwz r9,132(r3)
3ac: 71 29 40 00 andi. r9,r9,16384
3b0: 40 82 00 0c bne 3bc <test+0x14>
3b4: 80 63 00 0c lwz r3,12(r3)
3b8: 4e 80 00 20 blr

3bc: 0f e0 00 00 twui r0,0
3c0: 80 63 00 0c lwz r3,12(r3)
3c4: 4e 80 00 20 blr

0000000000000bf0 <.test9w>:
bf0: 7c 89 00 74 cntlzd r9,r4
bf4: 79 29 d1 82 rldicl r9,r9,58,6
bf8: 0b 09 00 00 tdnei r9,0
bfc: 2c 24 00 00 cmpdi r4,0
c00: 41 82 00 0c beq c0c <.test9w+0x1c>
c04: 7c 63 23 92 divdu r3,r3,r4
c08: 4e 80 00 20 blr

c0c: 38 60 00 00 li r3,0
c10: 4e 80 00 20 blr

After the patch:

000003a8 <test>:
3a8: 81 23 00 84 lwz r9,132(r3)
3ac: 71 29 40 00 andi. r9,r9,16384
3b0: 40 82 00 0c bne 3bc <test+0x14>
3b4: 80 63 00 0c lwz r3,12(r3)
3b8: 4e 80 00 20 blr

3bc: 0f e0 00 00 twui r0,0

0000000000000c50 <.test9w>:
c50: 7c 89 00 74 cntlzd r9,r4
c54: 79 29 d1 82 rldicl r9,r9,58,6
c58: 0b 09 00 00 tdnei r9,0
c5c: 7c 63 23 92 divdu r3,r3,r4
c60: 4e 80 00 20 blr

c70: 38 60 00 00 li r3,0
c74: 4e 80 00 20 blr

In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.

In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c

Signed-off-by: Christophe Leroy <[email protected]>
---
v2: Fix build failure when CONFIG_BUG is not selected.
---
arch/powerpc/include/asm/book3s/64/kup.h | 2 +-
arch/powerpc/include/asm/bug.h | 54 ++++++++++++++++++++----
arch/powerpc/include/asm/extable.h | 14 ++++++
arch/powerpc/include/asm/ppc_asm.h | 11 +----
arch/powerpc/kernel/entry_64.S | 2 +-
arch/powerpc/kernel/exceptions-64e.S | 2 +-
arch/powerpc/kernel/misc_32.S | 2 +-
arch/powerpc/kernel/traps.c | 9 +++-
scripts/mod/modpost.c | 2 +-
9 files changed, 72 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 9700da3a4093..a22839cba32e 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -90,7 +90,7 @@
/* Prevent access to userspace using any key values */
LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED)
999: tdne \gpr1, \gpr2
- EMIT_BUG_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
+ EMIT_WARN_ENTRY 999b, __FILE__, __LINE__, (BUGFLAG_WARNING | BUGFLAG_ONCE)
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_BOOK3S_KUAP, 67)
#endif
.endm
diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 101dea4eec8d..e22dc503fb2f 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -4,6 +4,7 @@
#ifdef __KERNEL__

#include <asm/asm-compat.h>
+#include <asm/extable.h>

#ifdef CONFIG_BUG

@@ -30,6 +31,11 @@
.endm
#endif /* verbose */

+.macro EMIT_WARN_ENTRY addr,file,line,flags
+ EX_TABLE(\addr,\addr+4)
+ EMIT_BUG_ENTRY \addr,\file,\line,\flags
+.endm
+
#else /* !__ASSEMBLY__ */
/* _EMIT_BUG_ENTRY expects args %0,%1,%2,%3 to be FILE, LINE, flags and
sizeof(struct bug_entry), respectively */
@@ -58,6 +64,16 @@
"i" (sizeof(struct bug_entry)), \
##__VA_ARGS__)

+#define WARN_ENTRY(insn, flags, label, ...) \
+ asm_volatile_goto( \
+ "1: " insn "\n" \
+ EX_TABLE(1b, %l[label]) \
+ _EMIT_BUG_ENTRY \
+ : : "i" (__FILE__), "i" (__LINE__), \
+ "i" (flags), \
+ "i" (sizeof(struct bug_entry)), \
+ ##__VA_ARGS__ : : label)
+
/*
* BUG_ON() and WARN_ON() do their best to cooperate with compile-time
* optimisations. However depending on the complexity of the condition
@@ -70,7 +86,15 @@
} while (0)
#define HAVE_ARCH_BUG

-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags))
+#define __WARN_FLAGS(flags) do { \
+ __label__ __label_warn_on; \
+ \
+ WARN_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | (flags), __label_warn_on); \
+ unreachable(); \
+ \
+__label_warn_on: \
+ break; \
+} while (0)

#ifdef CONFIG_PPC64
#define BUG_ON(x) do { \
@@ -83,15 +107,24 @@
} while (0)

#define WARN_ON(x) ({ \
- int __ret_warn_on = !!(x); \
- if (__builtin_constant_p(__ret_warn_on)) { \
- if (__ret_warn_on) \
+ bool __ret_warn_on = false; \
+ do { \
+ if (__builtin_constant_p((x))) { \
+ if (!(x)) \
+ break; \
__WARN(); \
- } else { \
- BUG_ENTRY(PPC_TLNEI " %4, 0", \
- BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
- "r" (__ret_warn_on)); \
- } \
+ __ret_warn_on = true; \
+ } else { \
+ __label__ __label_warn_on; \
+ \
+ WARN_ENTRY(PPC_TLNEI " %4, 0", \
+ BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
+ __label_warn_on, "r" (x)); \
+ break; \
+__label_warn_on: \
+ __ret_warn_on = true; \
+ } \
+ } while (0); \
unlikely(__ret_warn_on); \
})

@@ -104,8 +137,11 @@
#ifdef __ASSEMBLY__
.macro EMIT_BUG_ENTRY addr,file,line,flags
.endm
+.macro EMIT_WARN_ENTRY addr,file,line,flags
+.endm
#else /* !__ASSEMBLY__ */
#define _EMIT_BUG_ENTRY
+#define _EMIT_WARN_ENTRY
#endif
#endif /* CONFIG_BUG */

diff --git a/arch/powerpc/include/asm/extable.h b/arch/powerpc/include/asm/extable.h
index eb91b2d2935a..26ce2e5c0fa8 100644
--- a/arch/powerpc/include/asm/extable.h
+++ b/arch/powerpc/include/asm/extable.h
@@ -17,6 +17,8 @@

#define ARCH_HAS_RELATIVE_EXTABLE

+#ifndef __ASSEMBLY__
+
struct exception_table_entry {
int insn;
int fixup;
@@ -28,3 +30,15 @@ static inline unsigned long extable_fixup(const struct exception_table_entry *x)
}

#endif
+
+/*
+ * Helper macro for exception table entries
+ */
+#define EX_TABLE(_fault, _target) \
+ stringify_in_c(.section __ex_table,"a";)\
+ stringify_in_c(.balign 4;) \
+ stringify_in_c(.long (_fault) - . ;) \
+ stringify_in_c(.long (_target) - . ;) \
+ stringify_in_c(.previous)
+
+#endif
diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 8998122fc7e2..a74e1561535a 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -10,6 +10,7 @@
#include <asm/ppc-opcode.h>
#include <asm/firmware.h>
#include <asm/feature-fixups.h>
+#include <asm/extable.h>

#ifdef __ASSEMBLY__

@@ -772,16 +773,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)

#endif /* __ASSEMBLY__ */

-/*
- * Helper macro for exception table entries
- */
-#define EX_TABLE(_fault, _target) \
- stringify_in_c(.section __ex_table,"a";)\
- stringify_in_c(.balign 4;) \
- stringify_in_c(.long (_fault) - . ;) \
- stringify_in_c(.long (_target) - . ;) \
- stringify_in_c(.previous)
-
#ifdef CONFIG_PPC_FSL_BOOK3E
#define BTB_FLUSH(reg) \
lis reg,BUCSR_INIT@h; \
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6c4d9e276c4d..faa64cc90f02 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -835,7 +835,7 @@ _GLOBAL(enter_rtas)
*/
lbz r0,PACAIRQSOFTMASK(r13)
1: tdeqi r0,IRQS_ENABLED
- EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+ EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
#endif

/* Hard-disable interrupts */
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index e8eb9992a270..3f8c51390a04 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1249,7 +1249,7 @@ fast_exception_return:
/* The interrupt should not have soft enabled. */
lbz r7,PACAIRQSOFTMASK(r13)
1: tdeqi r7,IRQS_ENABLED
- EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
+ EMIT_WARN_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
#endif
b fast_exception_return

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index 6a076bef2932..21390f3119a9 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -237,7 +237,7 @@ _GLOBAL(copy_page)
addi r3,r3,-4

0: twnei r5, 0 /* WARN if r3 is not cache aligned */
- EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+ EMIT_WARN_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING

addi r4,r4,-4

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index efba99870691..ee40a637313d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1467,8 +1467,13 @@ static void do_program_check(struct pt_regs *regs)

if (!(regs->msr & MSR_PR) && /* not user-mode */
report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
- regs->nip += 4;
- return;
+ const struct exception_table_entry *entry;
+
+ entry = search_exception_tables(bugaddr);
+ if (entry) {
+ regs->nip = extable_fixup(entry) + regs->nip - bugaddr;
+ return;
+ }
}
_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
return;
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 24725e50c7b4..34745f239208 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf,
".kprobes.text", ".cpuidle.text", ".noinstr.text"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", ".text.*", \
- ".coldtext"
+ ".coldtext .softirqentry.text"

#define INIT_SECTIONS ".init.*"
#define MEM_INIT_SECTIONS ".meminit.*"
--
2.25.0

2021-08-13 06:50:11

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
>
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.

Nice idea. How much does it bloat the exception table?
How easy would it be to make a different exception table for
unimportant stuff like WARN_ON faults?

>
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
>
> unsigned long test(struct pt_regs *regs)
> {
> int ret;
>
> WARN_ON(regs->msr & MSR_PR);
>
> return regs->gpr[3];
> }
>
> unsigned long test9w(unsigned long a, unsigned long b)
> {
> if (WARN_ON(!b))
> return 0;
> return a / b;
> }
>
> Before the patch:
>
> 000003a8 <test>:
> 3a8: 81 23 00 84 lwz r9,132(r3)
> 3ac: 71 29 40 00 andi. r9,r9,16384
> 3b0: 40 82 00 0c bne 3bc <test+0x14>
> 3b4: 80 63 00 0c lwz r3,12(r3)
> 3b8: 4e 80 00 20 blr
>
> 3bc: 0f e0 00 00 twui r0,0
> 3c0: 80 63 00 0c lwz r3,12(r3)
> 3c4: 4e 80 00 20 blr
>
> 0000000000000bf0 <.test9w>:
> bf0: 7c 89 00 74 cntlzd r9,r4
> bf4: 79 29 d1 82 rldicl r9,r9,58,6
> bf8: 0b 09 00 00 tdnei r9,0
> bfc: 2c 24 00 00 cmpdi r4,0
> c00: 41 82 00 0c beq c0c <.test9w+0x1c>
> c04: 7c 63 23 92 divdu r3,r3,r4
> c08: 4e 80 00 20 blr
>
> c0c: 38 60 00 00 li r3,0
> c10: 4e 80 00 20 blr
>
> After the patch:
>
> 000003a8 <test>:
> 3a8: 81 23 00 84 lwz r9,132(r3)
> 3ac: 71 29 40 00 andi. r9,r9,16384
> 3b0: 40 82 00 0c bne 3bc <test+0x14>
> 3b4: 80 63 00 0c lwz r3,12(r3)
> 3b8: 4e 80 00 20 blr
>
> 3bc: 0f e0 00 00 twui r0,0
>
> 0000000000000c50 <.test9w>:
> c50: 7c 89 00 74 cntlzd r9,r4
> c54: 79 29 d1 82 rldicl r9,r9,58,6
> c58: 0b 09 00 00 tdnei r9,0
> c5c: 7c 63 23 92 divdu r3,r3,r4
> c60: 4e 80 00 20 blr
>
> c70: 38 60 00 00 li r3,0
> c74: 4e 80 00 20 blr

One thing that would be nice for WARN_ON_ONCE is to patch out the trap
after the program interrupt. I've seen sometimes the WARN_ON_ONCE starts
firing a lot and slows down the kernel. That gets harder to do if the
trap is now part of the control flow.

I guess that's less important case for performance though. And in theory
you might be able to replace the trap with an equivalent cmp and branch
(but that's probably going too over engineering it).

Thanks,
Nick

2021-08-13 06:55:31

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Excerpts from Christophe Leroy's message of April 14, 2021 2:38 am:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
>
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
>
> The most efficient solution would be to use __builtin_trap() because
> GCC is able to optimise the use of the different trap instructions
> based on the requested condition, but this is complex if not
> impossible for the following reasons:
> - __builtin_trap() is a non-recoverable instruction, so it can't be
> used for WARN_ON
> - Knowing which line of code generated the trap would require the
> analysis of DWARF information. This is not a feature we have today.
>
> As mentioned in commit 8d4fbcfbe0a4 ("Fix WARN_ON() on bitfield ops")
> the way WARN_ON() is implemented is suboptimal. That commit also
> mentions an issue with 'long long' condition. It fixed it for
> WARN_ON() but the same problem still exists today with BUG_ON() on
> PPC32. It will be fixed by using the generic implementation.
>
> By using the generic implementation, gcc will naturally generate a
> branch to the unconditional trap generated by BUG().
>
> As modern powerpc implement zero-cycle branch,
> that's even more efficient.
>
> And for the functions using WARN_ON() and its return, the test
> on return from WARN_ON() is now also used for the WARN_ON() itself.
>
> On PPC64 we don't want it because we want to be able to use CFAR
> register to track how we entered the code that trapped. The CFAR
> register would be clobbered by the branch.
>
> A simple test function:
>
> unsigned long test9w(unsigned long a, unsigned long b)
> {
> if (WARN_ON(!b))
> return 0;
> return a / b;
> }
>
> Before the patch:
>
> 0000046c <test9w>:
> 46c: 7c 89 00 34 cntlzw r9,r4
> 470: 55 29 d9 7e rlwinm r9,r9,27,5,31
> 474: 0f 09 00 00 twnei r9,0
> 478: 2c 04 00 00 cmpwi r4,0
> 47c: 41 82 00 0c beq 488 <test9w+0x1c>
> 480: 7c 63 23 96 divwu r3,r3,r4
> 484: 4e 80 00 20 blr
>
> 488: 38 60 00 00 li r3,0
> 48c: 4e 80 00 20 blr
>
> After the patch:
>
> 00000468 <test9w>:
> 468: 2c 04 00 00 cmpwi r4,0
> 46c: 41 82 00 0c beq 478 <test9w+0x10>
> 470: 7c 63 23 96 divwu r3,r3,r4
> 474: 4e 80 00 20 blr
>
> 478: 0f e0 00 00 twui r0,0
> 47c: 38 60 00 00 li r3,0
> 480: 4e 80 00 20 blr

That's clearly better because we have a branch anyway.

>
> So we see before the patch we need 3 instructions on the likely path
> to handle the WARN_ON(). With the patch the trap goes on the unlikely
> path.
>
> See below the difference at the entry of system_call_exception where
> we have several BUG_ON(), allthough less impressing.
>
> With the patch:
>
> 00000000 <system_call_exception>:
> 0: 81 6a 00 84 lwz r11,132(r10)
> 4: 90 6a 00 88 stw r3,136(r10)
> 8: 71 60 00 02 andi. r0,r11,2
> c: 41 82 00 70 beq 7c <system_call_exception+0x7c>
> 10: 71 60 40 00 andi. r0,r11,16384
> 14: 41 82 00 6c beq 80 <system_call_exception+0x80>
> 18: 71 6b 80 00 andi. r11,r11,32768
> 1c: 41 82 00 68 beq 84 <system_call_exception+0x84>
> 20: 94 21 ff e0 stwu r1,-32(r1)
> 24: 93 e1 00 1c stw r31,28(r1)
> 28: 7d 8c 42 e6 mftb r12
> ...
> 7c: 0f e0 00 00 twui r0,0
> 80: 0f e0 00 00 twui r0,0
> 84: 0f e0 00 00 twui r0,0
>
> Without the patch:
>
> 00000000 <system_call_exception>:
> 0: 94 21 ff e0 stwu r1,-32(r1)
> 4: 93 e1 00 1c stw r31,28(r1)
> 8: 90 6a 00 88 stw r3,136(r10)
> c: 81 6a 00 84 lwz r11,132(r10)
> 10: 69 60 00 02 xori r0,r11,2
> 14: 54 00 ff fe rlwinm r0,r0,31,31,31
> 18: 0f 00 00 00 twnei r0,0
> 1c: 69 60 40 00 xori r0,r11,16384
> 20: 54 00 97 fe rlwinm r0,r0,18,31,31
> 24: 0f 00 00 00 twnei r0,0
> 28: 69 6b 80 00 xori r11,r11,32768
> 2c: 55 6b 8f fe rlwinm r11,r11,17,31,31
> 30: 0f 0b 00 00 twnei r11,0
> 34: 7d 8c 42 e6 mftb r12

This one possibly the branches end up in predictors, whereas conditional
trap is always just speculated not to hit. Branches may also have a
throughput limit on execution whereas trap could be more (1 per cycle
vs 4 per cycle on POWER9).

On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
is the CFAR issue as well which makes it a problem for 64s. It would
have been nice if it could use the same code though.

Maybe one day gcc's __builtin_trap() will become smart enough around
conditional statements that it it generates better code and tries to
avoid branches.

Thanks,
Nick

2021-08-15 03:51:38

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Christophe Leroy <[email protected]> writes:
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 24725e50c7b4..34745f239208 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -926,7 +926,7 @@ static void check_section(const char *modname, struct elf_info *elf,
> ".kprobes.text", ".cpuidle.text", ".noinstr.text"
> #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
> ".fixup", ".entry.text", ".exception.text", ".text.*", \
> - ".coldtext"
> + ".coldtext .softirqentry.text"

This wasn't working, I updated it to:

".coldtext", ".softirqentry.text"

Which works.

cheers

2021-08-18 13:47:57

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

On Tue, 13 Apr 2021 16:38:09 +0000 (UTC), Christophe Leroy wrote:
> powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.
>
> For catching simple conditions like a variable having value 0, this
> is efficient because it does the test and the trap at the same time.
> But most conditions used with BUG_ON or WARN_ON are more complex and
> forces GCC to format the condition into a 0 or 1 value in a register.
> This will usually require 2 to 3 instructions.
>
> [...]

Applied to powerpc/next.

[1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
https://git.kernel.org/powerpc/c/db87a7199229b75c9996bf78117eceb81854fce2
[2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
https://git.kernel.org/powerpc/c/1e688dd2a3d6759d416616ff07afc4bb836c4213

cheers

2021-08-18 15:15:58

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> This one possibly the branches end up in predictors, whereas conditional
> trap is always just speculated not to hit. Branches may also have a
> throughput limit on execution whereas trap could be more (1 per cycle
> vs 4 per cycle on POWER9).

I thought only *taken* branches are just one per cycle? And those
branches are only taken for the exceptional condition (or the case where
we do not care about performance, anyway, if we do have an error most of
the time ;-) )

> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
> is the CFAR issue as well which makes it a problem for 64s. It would
> have been nice if it could use the same code though.

On 64-bit the code looks better for the no-error path as well.

> Maybe one day gcc's __builtin_trap() will become smart enough around
> conditional statements that it it generates better code and tries to
> avoid branches.

Internally *all* traps are conditional, in GCC. It also can optimise
them quite well. There must be something in the kernel macros that
prevents good optimisation.


Segher

2021-08-25 21:28:23

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Hi Christophe,

On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> flexibility to GCC.
>
> For that add an entry to the exception table so that
> program_check_exception() knowns where to resume execution
> after a WARNING.
>
> Here are two exemples. The first one is done on PPC32 (which
> benefits from the previous patch), the second is on PPC64.
>
> unsigned long test(struct pt_regs *regs)
> {
> int ret;
>
> WARN_ON(regs->msr & MSR_PR);
>
> return regs->gpr[3];
> }
>
> unsigned long test9w(unsigned long a, unsigned long b)
> {
> if (WARN_ON(!b))
> return 0;
> return a / b;
> }
>
> Before the patch:
>
> 000003a8 <test>:
> 3a8: 81 23 00 84 lwz r9,132(r3)
> 3ac: 71 29 40 00 andi. r9,r9,16384
> 3b0: 40 82 00 0c bne 3bc <test+0x14>
> 3b4: 80 63 00 0c lwz r3,12(r3)
> 3b8: 4e 80 00 20 blr
>
> 3bc: 0f e0 00 00 twui r0,0
> 3c0: 80 63 00 0c lwz r3,12(r3)
> 3c4: 4e 80 00 20 blr
>
> 0000000000000bf0 <.test9w>:
> bf0: 7c 89 00 74 cntlzd r9,r4
> bf4: 79 29 d1 82 rldicl r9,r9,58,6
> bf8: 0b 09 00 00 tdnei r9,0
> bfc: 2c 24 00 00 cmpdi r4,0
> c00: 41 82 00 0c beq c0c <.test9w+0x1c>
> c04: 7c 63 23 92 divdu r3,r3,r4
> c08: 4e 80 00 20 blr
>
> c0c: 38 60 00 00 li r3,0
> c10: 4e 80 00 20 blr
>
> After the patch:
>
> 000003a8 <test>:
> 3a8: 81 23 00 84 lwz r9,132(r3)
> 3ac: 71 29 40 00 andi. r9,r9,16384
> 3b0: 40 82 00 0c bne 3bc <test+0x14>
> 3b4: 80 63 00 0c lwz r3,12(r3)
> 3b8: 4e 80 00 20 blr
>
> 3bc: 0f e0 00 00 twui r0,0
>
> 0000000000000c50 <.test9w>:
> c50: 7c 89 00 74 cntlzd r9,r4
> c54: 79 29 d1 82 rldicl r9,r9,58,6
> c58: 0b 09 00 00 tdnei r9,0
> c5c: 7c 63 23 92 divdu r3,r3,r4
> c60: 4e 80 00 20 blr
>
> c70: 38 60 00 00 li r3,0
> c74: 4e 80 00 20 blr
>
> In the first exemple, we see GCC doesn't need to duplicate what
> happens after the trap.
>
> In the second exemple, we see that GCC doesn't need to emit a test
> and a branch in the likely path in addition to the trap.
>
> We've got some WARN_ON() in .softirqentry.text section so it needs
> to be added in the OTHER_TEXT_SECTIONS in modpost.c
>
> Signed-off-by: Christophe Leroy <[email protected]>

This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
klist_add_tail to trigger over and over on boot when compiling with
clang:

[ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
[ 2.177456][ T1] Modules linked in:
[ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1
[ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
[ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825)
[ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000
[ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0
[ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
[ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
[ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
[ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
[ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
[ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
[ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
[ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
[ 2.178088][ T1] Call Trace:
[ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
[ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
[ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
[ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
[ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
[ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
[ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
[ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
[ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
[ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
[ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
[ 2.178569][ T1] Instruction dump:
[ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
[ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
[ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]---

Is this a bug with clang or is there something wrong with the patch? The
vmlinux image is available at [1] if you want to inspect it and our QEMU
command and the warning at boot can be viewed at [2]. If there is any
other information I can provide, please let me know.

[1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
[2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs

Cheers,
Nathan

2021-08-26 03:24:34

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Nathan Chancellor <[email protected]> writes:
> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>> flexibility to GCC.
...
>
> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> klist_add_tail to trigger over and over on boot when compiling with
> clang:
>
> [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> [ 2.177456][ T1] Modules linked in:
> [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1
> [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825)
> [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000
> [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0
> [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> [ 2.178088][ T1] Call Trace:
> [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> [ 2.178569][ T1] Instruction dump:
> [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]---
>
> Is this a bug with clang or is there something wrong with the patch? The
> vmlinux image is available at [1] if you want to inspect it and our QEMU
> command and the warning at boot can be viewed at [2]. If there is any
> other information I can provide, please let me know.
>
> [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs

Thanks.

This is the generated assembly:

c0000000007ff600 <.klist_add_tail>:
c0000000007ff600: 7c 08 02 a6 mflr r0
c0000000007ff604: f8 01 00 10 std r0,16(r1)
c0000000007ff608: f8 21 ff 71 stdu r1,-144(r1) ^ prolog
c0000000007ff60c: fb a1 00 78 std r29,120(r1) save r29 to stack
c0000000007ff610: 7c 7d 1b 78 mr r29,r3 r29 = struct klist_node *n
c0000000007ff614: 38 60 00 01 li r3,1 r3 = 1
c0000000007ff618: fb 81 00 70 std r28,112(r1) save r28 to stack
c0000000007ff61c: 3b 9d 00 08 addi r28,r29,8 r28 = &n->n_node
c0000000007ff620: fb c1 00 80 std r30,128(r1) save r30 to stack
c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k
c0000000007ff628: 38 9d 00 18 addi r4,r29,24 r4 = &n->n_ref
c0000000007ff62c: fb 9d 00 08 std r28,8(r29) n->n_node.next = &n->n_node INIT_LIST_HEAD
c0000000007ff630: fb 9d 00 10 std r28,16(r29) n->n_node.prev = &n->n_node
c0000000007ff634: 90 64 00 00 stw r3,0(r4) kref_init(&n->n_ref)
c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k
c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero


From:

static void knode_set_klist(struct klist_node *knode, struct klist *klist)
{
knode->n_klist = klist;
/* no knode deserves to start its life dead */
WARN_ON(knode_dead(knode));
^^^^^^^^^^^^^^^^^
}

Which expands to:

static void knode_set_klist(struct klist_node *knode, struct klist *klist)
{
knode->n_klist = klist;

({
bool __ret_warn_on = false;
do {
...
} else {
__label__ __label_warn_on;
do {
asm goto(
"1: "
"tdnei"
"
" " % 4,
0 " "\n " ".section __ex_table,\"a\";"
" "
".balign 4;"
" "
".long (1b) - . ;"
" "
".long (%l[__label_warn_on]) - . ;"
" "
".previous"
" "
".section __bug_table,\"aw\"\n"
"2:\t.4byte 1b - 2b, %0 - 2b\n"
"\t.short %1, %2\n"
".org 2b+%3\n"
".previous\n"
:
: "i"("lib/klist.c"), "i"(62),
"i"((1 << 0) | ((9) << 8)),
"i"(sizeof(struct bug_entry)),
"r"(knode_dead(knode))
^^^^^^^^^^^^^^^^^^^^^

:
: __label_warn_on);
asm("");
} while (0);
break;
__label_warn_on:
__ret_warn_on = true;
}
} while (0);
__builtin_expect(!!(__ret_warn_on), 0);
});
}

And knode_dead is:

#define KNODE_DEAD 1LU

static bool knode_dead(struct klist_node *knode)
{
return (unsigned long)knode->n_klist & KNODE_DEAD;
}


So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.

But in the asm:

c0000000007ff600 <.klist_add_tail>:
...
c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k
...
c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k
c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero


It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".

In the GCC output you can see it:

c0000000008c8a30 <.klist_node_init>:
c0000000008c8a30: 39 24 00 08 addi r9,r4,8
c0000000008c8a34: 39 40 00 01 li r10,1
c0000000008c8a38: f9 24 00 08 std r9,8(r4)
c0000000008c8a3c: f9 24 00 10 std r9,16(r4)
c0000000008c8a40: 91 44 00 18 stw r10,24(r4)
c0000000008c8a44: f8 64 00 00 std r3,0(r4)
c0000000008c8a48: 54 69 07 fe clrlwi r9,r3,31
c0000000008c8a4c: 0b 09 00 00 tdnei r9,0

ie. the clrlwi is "clear left (word) immediate", and zeroes everything
except bit 0, which is equivalent to "& KNODE_DEAD".


So there seems to be some misunderstanding with clang, it doesn't like
us passing an expression to the inline asm.

AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
doesn't have to just be a variable name.

This patch seems to fix it. Not sure if that's just papering over it though.

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..75fcb4370d96 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on: \
\
WARN_ENTRY(PPC_TLNEI " %4, 0", \
BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
- __label_warn_on, "r" (x)); \
+ __label_warn_on, "r" (!!(x))); \
break; \
__label_warn_on: \
__ret_warn_on = true; \


Generates:

c0000000008e2ac0 <.klist_add_tail>:
c0000000008e2ac0: 7c 08 02 a6 mflr r0
c0000000008e2ac4: f8 01 00 10 std r0,16(r1)
c0000000008e2ac8: f8 21 ff 71 stdu r1,-144(r1)
c0000000008e2acc: fb a1 00 78 std r29,120(r1)
c0000000008e2ad0: 7c 7d 1b 78 mr r29,r3
c0000000008e2ad4: 38 60 00 01 li r3,1
c0000000008e2ad8: fb c1 00 80 std r30,128(r1)
c0000000008e2adc: 7c 9e 23 78 mr r30,r4
c0000000008e2ae0: 38 9d 00 18 addi r4,r29,24
c0000000008e2ae4: 57 c5 07 fe clrlwi r5,r30,31 <-
c0000000008e2ae8: fb 81 00 70 std r28,112(r1)
c0000000008e2aec: 3b 9d 00 08 addi r28,r29,8
c0000000008e2af0: fb 9d 00 08 std r28,8(r29)
c0000000008e2af4: fb 9d 00 10 std r28,16(r29)
c0000000008e2af8: 90 64 00 00 stw r3,0(r4)
c0000000008e2afc: fb dd 00 00 std r30,0(r29)
c0000000008e2b00: 0b 05 00 00 tdnei r5,0 <-


cheers

2021-08-26 03:27:31

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> This one possibly the branches end up in predictors, whereas conditional
>> trap is always just speculated not to hit. Branches may also have a
>> throughput limit on execution whereas trap could be more (1 per cycle
>> vs 4 per cycle on POWER9).
>
> I thought only *taken* branches are just one per cycle?

Taken branches are fetched by the front end at one per cycle (assuming
they hit the BTAC), but all branches have to be executed by BR at one
per cycle. On POWER9 BR even has to execute some other things like mflr
as well, but at least that's improved on POWER10.

Trap is executed at 4 per cycle and will never use branch table entries
or alias with a non-tagged predictor and mispredict.

> And those
> branches are only taken for the exceptional condition (or the case where
> we do not care about performance, anyway, if we do have an error most of
> the time ;-) )

It's not that big a deal, but trap is really the best instruction for
this.

>
>> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
>> is the CFAR issue as well which makes it a problem for 64s. It would
>> have been nice if it could use the same code though.
>
> On 64-bit the code looks better for the no-error path as well.
>
>> Maybe one day gcc's __builtin_trap() will become smart enough around
>> conditional statements that it it generates better code and tries to
>> avoid branches.
>
> Internally *all* traps are conditional, in GCC. It also can optimise
> them quite well. There must be something in the kernel macros that
> prevents good optimisation.

I did take a look at it at one point.

One problem is that the kernel needs the address of the trap instruction
to create the entry for it. The other problem is that __builtin_trap
does not return so it can't be used for WARN. LLVM at least seems to
have a __builtin_debugtrap which does return.

The first problem seems like the show stopper though. AFAIKS it would
need a special builtin support that does something to create the table
entry, or a guarantee that we could put an inline asm right after the
builtin as a recognized pattern and that would give us the instruction
following the trap.

Thanks,
Nick

2021-08-26 06:39:58

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto



Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
> Nathan Chancellor <[email protected]> writes:
>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>> flexibility to GCC.
> ...
>>
>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>> klist_add_tail to trigger over and over on boot when compiling with
>> clang:

...

>
> This patch seems to fix it. Not sure if that's just papering over it though.
>
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on: \
> \
> WARN_ENTRY(PPC_TLNEI " %4, 0", \
> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
> - __label_warn_on, "r" (x)); \
> + __label_warn_on, "r" (!!(x))); \
> break; \
> __label_warn_on: \
> __ret_warn_on = true; \
>
>

But for a simple WARN_ON() call:

void test(unsigned long b)
{
WARN_ON(b);
}

Without your change with GCC you get:

00000000000012d0 <.test>:
12d0: 0b 03 00 00 tdnei r3,0
12d4: 4e 80 00 20 blr


With the !! change you get:

00000000000012d0 <.test>:
12d0: 31 23 ff ff addic r9,r3,-1
12d4: 7d 29 19 10 subfe r9,r9,r3
12d8: 0b 09 00 00 tdnei r9,0
12dc: 4e 80 00 20 blr

Christophe

2021-08-26 12:57:36

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Hi!

On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> This one possibly the branches end up in predictors, whereas conditional
> >> trap is always just speculated not to hit. Branches may also have a
> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> vs 4 per cycle on POWER9).
> >
> > I thought only *taken* branches are just one per cycle?
>
> Taken branches are fetched by the front end at one per cycle (assuming
> they hit the BTAC), but all branches have to be executed by BR at one
> per cycle

This is not true. (Simple) predicted not-taken conditional branches are
just folded out, never hit the issue queues. And they are fetched as
many together as fit in a fetch group, can complete without limits as
well.

The BTAC is a frontend thing, used for target address prediction. It
does not limit execution.

Correctly predicted simple conditional branches just get their prediction
validated (and that is not done in the execution units). Incorrectly
predicted branches the same, but those cause a redirect and refetch.

> > Internally *all* traps are conditional, in GCC. It also can optimise
> > them quite well. There must be something in the kernel macros that
> > prevents good optimisation.
>
> I did take a look at it at one point.
>
> One problem is that the kernel needs the address of the trap instruction
> to create the entry for it. The other problem is that __builtin_trap
> does not return so it can't be used for WARN. LLVM at least seems to
> have a __builtin_debugtrap which does return.

This is <https://gcc.gnu.org/PR99299>.

> The first problem seems like the show stopper though. AFAIKS it would
> need a special builtin support that does something to create the table
> entry, or a guarantee that we could put an inline asm right after the
> builtin as a recognized pattern and that would give us the instruction
> following the trap.

I'm not quite sure what this means. Can't you always just put a

bla: asm("");

in there, and use the address of "bla"? If not, you need to say a lot
more about what you actually want to do :-/


Segher

2021-08-26 13:55:46

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

On Thu, Aug 26, 2021 at 08:37:09AM +0200, Christophe Leroy wrote:
> Le 26/08/2021 ? 05:21, Michael Ellerman a ?crit?:
> >This patch seems to fix it. Not sure if that's just papering over it
> >though.
> >
> >diff --git a/arch/powerpc/include/asm/bug.h
> >b/arch/powerpc/include/asm/bug.h
> >index 1ee0f22313ee..75fcb4370d96 100644
> >--- a/arch/powerpc/include/asm/bug.h
> >+++ b/arch/powerpc/include/asm/bug.h
> >@@ -119,7 +119,7 @@ __label_warn_on: \
> > \
> > WARN_ENTRY(PPC_TLNEI " %4, 0", \
> > BUGFLAG_WARNING |
> > BUGFLAG_TAINT(TAINT_WARN), \
> >- __label_warn_on, "r" (x)); \
> >+ __label_warn_on, "r" (!!(x))); \
> > break; \
> > __label_warn_on: \
> > __ret_warn_on = true; \
>
> But for a simple WARN_ON() call:
>
> void test(unsigned long b)
> {
> WARN_ON(b);
> }
>
> Without your change with GCC you get:
>
> 00000000000012d0 <.test>:
> 12d0: 0b 03 00 00 tdnei r3,0
> 12d4: 4e 80 00 20 blr
>
>
> With the !! change you get:
>
> 00000000000012d0 <.test>:
> 12d0: 31 23 ff ff addic r9,r3,-1
> 12d4: 7d 29 19 10 subfe r9,r9,r3
> 12d8: 0b 09 00 00 tdnei r9,0
> 12dc: 4e 80 00 20 blr

That is because the asm (unlike the builtin) cannot be optimised by the
compiler.


Segher

2021-08-26 14:00:28

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
>
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> >
>> > I thought only *taken* branches are just one per cycle?
>>
>> Taken branches are fetched by the front end at one per cycle (assuming
>> they hit the BTAC), but all branches have to be executed by BR at one
>> per cycle
>
> This is not true. (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues. And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.

No, they are all dispatched and issue to the BRU for execution. It's
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.

> The BTAC is a frontend thing, used for target address prediction. It
> does not limit execution.

I didn't say it did.

> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units). Incorrectly
> predicted branches the same, but those cause a redirect and refetch.

How could it validate prediction without issuing? It wouldn't know when
sources are ready.

>
>> > Internally *all* traps are conditional, in GCC. It also can optimise
>> > them quite well. There must be something in the kernel macros that
>> > prevents good optimisation.
>>
>> I did take a look at it at one point.
>>
>> One problem is that the kernel needs the address of the trap instruction
>> to create the entry for it. The other problem is that __builtin_trap
>> does not return so it can't be used for WARN. LLVM at least seems to
>> have a __builtin_debugtrap which does return.
>
> This is <https://gcc.gnu.org/PR99299>.

Aha.

>> The first problem seems like the show stopper though. AFAIKS it would
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
>
> I'm not quite sure what this means. Can't you always just put a
>
> bla: asm("");
>
> in there, and use the address of "bla"?

Not AFAIKS. Put it where?

> If not, you need to say a lot
> more about what you actually want to do :-/

We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:

__builtin_trap();
asm ("1: \n\t"
" .section __bug_table,\"aw\" \n\t"
"2: .4byte 1b - 2b - 4 \n\t"
" .previous");

Where the 1: label must follow immediately after the trap instruction,
and where the asm must be emitted even for the case of no-return traps
(but anything following the asm could be omitted).

Thanks,
Nick

2021-08-26 14:20:56

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
>
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.

It certainly is. That is the whole point of inline asm! This way, all
of the C code "around" the asm can be optimised.

> This patch seems to fix it. Not sure if that's just papering over it though.

It is, and it makes less optimised code (also on GCC), as Christophe
points out.


Segher

2021-08-26 14:45:19

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> >> This one possibly the branches end up in predictors, whereas conditional
> >> >> trap is always just speculated not to hit. Branches may also have a
> >> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> >> vs 4 per cycle on POWER9).
> >> >
> >> > I thought only *taken* branches are just one per cycle?
> >>
> >> Taken branches are fetched by the front end at one per cycle (assuming
> >> they hit the BTAC), but all branches have to be executed by BR at one
> >> per cycle
> >
> > This is not true. (Simple) predicted not-taken conditional branches are
> > just folded out, never hit the issue queues. And they are fetched as
> > many together as fit in a fetch group, can complete without limits as
> > well.
>
> No, they are all dispatched and issue to the BRU for execution. It's
> trivial to construct a test of a lot of not taken branches in a row
> and time a loop of it to see it executes at 1 cycle per branch.

(s/dispatched/issued/)

Huh, this was true on p8 already. Sorry for my confusion. In my
defence, this doesn't matter for performance on "real code".

> > Correctly predicted simple conditional branches just get their prediction
> > validated (and that is not done in the execution units). Incorrectly
> > predicted branches the same, but those cause a redirect and refetch.
>
> How could it validate prediction without issuing? It wouldn't know when
> sources are ready.

In the backend. But that is just how it worked on older cores :-/

> >> The first problem seems like the show stopper though. AFAIKS it would
> >> need a special builtin support that does something to create the table
> >> entry, or a guarantee that we could put an inline asm right after the
> >> builtin as a recognized pattern and that would give us the instruction
> >> following the trap.
> >
> > I'm not quite sure what this means. Can't you always just put a
> >
> > bla: asm("");
> >
> > in there, and use the address of "bla"?
>
> Not AFAIKS. Put it where?

After wherever you want to know the address after. You will have to
make sure they stay together somehow.

It is much easier to get the address of something, not the address after
it. If you know it is just one insn anyway, that isn't hard to handle
either (even if prefixed insns exist).

> > If not, you need to say a lot
> > more about what you actually want to do :-/
>
> We need to put the address (or relative offset) of the trap instruction
> into an entry in a different section. Basically this:
>
> __builtin_trap();
> asm ("1: \n\t"
> " .section __bug_table,\"aw\" \n\t"
> "2: .4byte 1b - 2b - 4 \n\t"
> " .previous");
>
> Where the 1: label must follow immediately after the trap instruction,
> and where the asm must be emitted even for the case of no-return traps
> (but anything following the asm could be omitted).

The address *after* the trap insn? That is much much harder than the
address *of* the trap insn.


Segher

2021-08-26 14:46:35

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Christophe Leroy <[email protected]> writes:
> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>> Nathan Chancellor <[email protected]> writes:
>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>> flexibility to GCC.
>> ...
>>>
>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>> klist_add_tail to trigger over and over on boot when compiling with
>>> clang:
>
> ...
>
>>
>> This patch seems to fix it. Not sure if that's just papering over it though.
>>
>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> index 1ee0f22313ee..75fcb4370d96 100644
>> --- a/arch/powerpc/include/asm/bug.h
>> +++ b/arch/powerpc/include/asm/bug.h
>> @@ -119,7 +119,7 @@ __label_warn_on: \
>> \
>> WARN_ENTRY(PPC_TLNEI " %4, 0", \
>> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
>> - __label_warn_on, "r" (x)); \
>> + __label_warn_on, "r" (!!(x))); \
>> break; \
>> __label_warn_on: \
>> __ret_warn_on = true; \
>
> But for a simple WARN_ON() call:
>
> void test(unsigned long b)
> {
> WARN_ON(b);
> }
>
> Without your change with GCC you get:
>
> 00000000000012d0 <.test>:
> 12d0: 0b 03 00 00 tdnei r3,0
> 12d4: 4e 80 00 20 blr
>
>
> With the !! change you get:
>
> 00000000000012d0 <.test>:
> 12d0: 31 23 ff ff addic r9,r3,-1
> 12d4: 7d 29 19 10 subfe r9,r9,r3
> 12d8: 0b 09 00 00 tdnei r9,0
> 12dc: 4e 80 00 20 blr

Yeah that's a pity.

We could do something like below, which is ugly, but would be better
than having to revert the whole thing.

Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

So possibly we need a CLANG ifdef around the whole thing, and use the
old style warn for clang.

cheers


diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..d978d9004d0d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -106,6 +106,12 @@ __label_warn_on: \
} \
} while (0)

+#ifdef CONFIG_CC_IS_CLANG
+#define __clang_warn_hack(x) (!!(x))
+#else
+#define __clang_warn_hack(x) (x)
+#endif
+
#define WARN_ON(x) ({ \
bool __ret_warn_on = false; \
do { \
@@ -119,7 +125,8 @@ __label_warn_on: \
\
WARN_ENTRY(PPC_TLNEI " %4, 0", \
BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
- __label_warn_on, "r" (x)); \
+ __label_warn_on, \
+ "r" __clang_warn_hack(x)); \
break; \
__label_warn_on: \
__ret_warn_on = true; \


2021-08-26 14:56:45

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto



Le 26/08/2021 à 16:45, Michael Ellerman a écrit :
> Christophe Leroy <[email protected]> writes:
>> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>>> Nathan Chancellor <[email protected]> writes:
>>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>>> flexibility to GCC.
>>> ...
>>>>
>>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>>> klist_add_tail to trigger over and over on boot when compiling with
>>>> clang:
>>
>> ...
>>
>>>
>>> This patch seems to fix it. Not sure if that's just papering over it though.
>>>
>>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>>> index 1ee0f22313ee..75fcb4370d96 100644
>>> --- a/arch/powerpc/include/asm/bug.h
>>> +++ b/arch/powerpc/include/asm/bug.h
>>> @@ -119,7 +119,7 @@ __label_warn_on: \
>>> \
>>> WARN_ENTRY(PPC_TLNEI " %4, 0", \
>>> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
>>> - __label_warn_on, "r" (x)); \
>>> + __label_warn_on, "r" (!!(x))); \
>>> break; \
>>> __label_warn_on: \
>>> __ret_warn_on = true; \
>>
>> But for a simple WARN_ON() call:
>>
>> void test(unsigned long b)
>> {
>> WARN_ON(b);
>> }
>>
>> Without your change with GCC you get:
>>
>> 00000000000012d0 <.test>:
>> 12d0: 0b 03 00 00 tdnei r3,0
>> 12d4: 4e 80 00 20 blr
>>
>>
>> With the !! change you get:
>>
>> 00000000000012d0 <.test>:
>> 12d0: 31 23 ff ff addic r9,r3,-1
>> 12d4: 7d 29 19 10 subfe r9,r9,r3
>> 12d8: 0b 09 00 00 tdnei r9,0
>> 12dc: 4e 80 00 20 blr
>
> Yeah that's a pity.
>
> We could do something like below, which is ugly, but would be better
> than having to revert the whole thing.

Yes looks nice, we already had that kind of stuff in the past, even more ugly.

>
> Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

What's the warning ?


>
> So possibly we need a CLANG ifdef around the whole thing, and use the
> old style warn for clang.
>

Christophe

2021-08-26 15:08:29

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
>> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> >> This one possibly the branches end up in predictors, whereas conditional
>> >> >> trap is always just speculated not to hit. Branches may also have a
>> >> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> >> vs 4 per cycle on POWER9).
>> >> >
>> >> > I thought only *taken* branches are just one per cycle?
>> >>
>> >> Taken branches are fetched by the front end at one per cycle (assuming
>> >> they hit the BTAC), but all branches have to be executed by BR at one
>> >> per cycle
>> >
>> > This is not true. (Simple) predicted not-taken conditional branches are
>> > just folded out, never hit the issue queues. And they are fetched as
>> > many together as fit in a fetch group, can complete without limits as
>> > well.
>>
>> No, they are all dispatched and issue to the BRU for execution. It's
>> trivial to construct a test of a lot of not taken branches in a row
>> and time a loop of it to see it executes at 1 cycle per branch.
>
> (s/dispatched/issued/)

?

>
> Huh, this was true on p8 already. Sorry for my confusion. In my
> defence, this doesn't matter for performance on "real code".
>
>> > Correctly predicted simple conditional branches just get their prediction
>> > validated (and that is not done in the execution units). Incorrectly
>> > predicted branches the same, but those cause a redirect and refetch.
>>
>> How could it validate prediction without issuing? It wouldn't know when
>> sources are ready.
>
> In the backend. But that is just how it worked on older cores :-/

Okay. I don't know about older cores than POWER9. Backend would normally
include execution though. Only other place you could do it if you don't
issue/exec would be after it goes back in order, like completion. But
that would be horrible for mispredict penalty.

>> >> The first problem seems like the show stopper though. AFAIKS it would
>> >> need a special builtin support that does something to create the table
>> >> entry, or a guarantee that we could put an inline asm right after the
>> >> builtin as a recognized pattern and that would give us the instruction
>> >> following the trap.
>> >
>> > I'm not quite sure what this means. Can't you always just put a
>> >
>> > bla: asm("");
>> >
>> > in there, and use the address of "bla"?
>>
>> Not AFAIKS. Put it where?
>
> After wherever you want to know the address after. You will have to
> make sure they stay together somehow.

I still don't follow.

> It is much easier to get the address of something, not the address after
> it. If you know it is just one insn anyway, that isn't hard to handle
> either (even if prefixed insns exist).
>
>> > If not, you need to say a lot
>> > more about what you actually want to do :-/
>>
>> We need to put the address (or relative offset) of the trap instruction
>> into an entry in a different section. Basically this:
>>
>> __builtin_trap();
>> asm ("1: \n\t"
>> " .section __bug_table,\"aw\" \n\t"
>> "2: .4byte 1b - 2b - 4 \n\t"
>> " .previous");
>>
>> Where the 1: label must follow immediately after the trap instruction,
>> and where the asm must be emitted even for the case of no-return traps
>> (but anything following the asm could be omitted).
>
> The address *after* the trap insn? That is much much harder than the
> address *of* the trap insn.

No the address of the trap instruction, hence the -4. I have the label
after because that is the semantics I have from inline asm.

If you could give a built in that put a label at the address of the trap
instruction that could be used later by inline asm then that could work
too:

__builtin_labeled_trap("1:");
asm (" .section __bug_table,\"aw\" \n\t"
"2: .4byte 1b - 2b \n\t"
" .previous");

2021-08-26 15:39:55

by Segher Boessenkool

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> >> No, they are all dispatched and issue to the BRU for execution. It's
> >> trivial to construct a test of a lot of not taken branches in a row
> >> and time a loop of it to see it executes at 1 cycle per branch.
> >
> > (s/dispatched/issued/)
>
> ?

Dispatch is from decode to the issue queues. Issue is from there to
execution units. Dispatch is in-order, issue is not.

> >> How could it validate prediction without issuing? It wouldn't know when
> >> sources are ready.
> >
> > In the backend. But that is just how it worked on older cores :-/
>
> Okay. I don't know about older cores than POWER9. Backend would normally
> include execution though.
> Only other place you could do it if you don't
> issue/exec would be after it goes back in order, like completion.

You do not have to do the verification in-order: the insn cannot finish
until it is no longer speculative, that takes care of all ordering
needed.

> But that would be horrible for mispredict penalty.

See the previous point. Also, any insn known to be mispredicted can be
flushed immediately anyway.

> >> >> The first problem seems like the show stopper though. AFAIKS it would
> >> >> need a special builtin support that does something to create the table
> >> >> entry, or a guarantee that we could put an inline asm right after the
> >> >> builtin as a recognized pattern and that would give us the instruction
> >> >> following the trap.
> >> >
> >> > I'm not quite sure what this means. Can't you always just put a
> >> >
> >> > bla: asm("");
> >> >
> >> > in there, and use the address of "bla"?
> >>
> >> Not AFAIKS. Put it where?
> >
> > After wherever you want to know the address after. You will have to
> > make sure they stay together somehow.
>
> I still don't follow.

some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
empty_asm_that_we_can_take_the_address_of_known_as_B;

You have to make sure the compiler keeps A and B together, does not
insert anything between them, does put them in the assembler output in
the same fragment, etc.

> If you could give a built in that put a label at the address of the trap
> instruction that could be used later by inline asm then that could work
> too:
>
> __builtin_labeled_trap("1:");
> asm (" .section __bug_table,\"aw\" \n\t"
> "2: .4byte 1b - 2b \n\t"
> " .previous");

How could a compiler do anything like that?!


Segher

2021-08-26 18:56:17

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> Nathan Chancellor <[email protected]> writes:
> > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> >> flexibility to GCC.
> ...
> >
> > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > klist_add_tail to trigger over and over on boot when compiling with
> > clang:
> >
> > [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > [ 2.177456][ T1] Modules linked in:
> > [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1
> > [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825)
> > [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000
> > [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0
> > [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > [ 2.178088][ T1] Call Trace:
> > [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > [ 2.178569][ T1] Instruction dump:
> > [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]---
> >
> > Is this a bug with clang or is there something wrong with the patch? The
> > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > command and the warning at boot can be viewed at [2]. If there is any
> > other information I can provide, please let me know.
> >
> > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
>
> Thanks.
>
> This is the generated assembly:
>
> c0000000007ff600 <.klist_add_tail>:
> c0000000007ff600: 7c 08 02 a6 mflr r0
> c0000000007ff604: f8 01 00 10 std r0,16(r1)
> c0000000007ff608: f8 21 ff 71 stdu r1,-144(r1) ^ prolog
> c0000000007ff60c: fb a1 00 78 std r29,120(r1) save r29 to stack
> c0000000007ff610: 7c 7d 1b 78 mr r29,r3 r29 = struct klist_node *n
> c0000000007ff614: 38 60 00 01 li r3,1 r3 = 1
> c0000000007ff618: fb 81 00 70 std r28,112(r1) save r28 to stack
> c0000000007ff61c: 3b 9d 00 08 addi r28,r29,8 r28 = &n->n_node
> c0000000007ff620: fb c1 00 80 std r30,128(r1) save r30 to stack
> c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k
> c0000000007ff628: 38 9d 00 18 addi r4,r29,24 r4 = &n->n_ref
> c0000000007ff62c: fb 9d 00 08 std r28,8(r29) n->n_node.next = &n->n_node INIT_LIST_HEAD
> c0000000007ff630: fb 9d 00 10 std r28,16(r29) n->n_node.prev = &n->n_node
> c0000000007ff634: 90 64 00 00 stw r3,0(r4) kref_init(&n->n_ref)
> c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k
> c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero
>
>
> From:
>
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> knode->n_klist = klist;
> /* no knode deserves to start its life dead */
> WARN_ON(knode_dead(knode));
> ^^^^^^^^^^^^^^^^^
> }
>
> Which expands to:
>
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> knode->n_klist = klist;
>
> ({
> bool __ret_warn_on = false;
> do {
> ...
> } else {
> __label__ __label_warn_on;
> do {
> asm goto(
> "1: "
> "tdnei"
> "
> " " % 4,
> 0 " "\n " ".section __ex_table,\"a\";"
> " "
> ".balign 4;"
> " "
> ".long (1b) - . ;"
> " "
> ".long (%l[__label_warn_on]) - . ;"
> " "
> ".previous"
> " "
> ".section __bug_table,\"aw\"\n"
> "2:\t.4byte 1b - 2b, %0 - 2b\n"
> "\t.short %1, %2\n"
> ".org 2b+%3\n"
> ".previous\n"
> :
> : "i"("lib/klist.c"), "i"(62),
> "i"((1 << 0) | ((9) << 8)),
> "i"(sizeof(struct bug_entry)),
> "r"(knode_dead(knode))
> ^^^^^^^^^^^^^^^^^^^^^
>
> :
> : __label_warn_on);
> asm("");
> } while (0);
> break;
> __label_warn_on:
> __ret_warn_on = true;
> }
> } while (0);
> __builtin_expect(!!(__ret_warn_on), 0);
> });
> }
>
> And knode_dead is:
>
> #define KNODE_DEAD 1LU
>
> static bool knode_dead(struct klist_node *knode)
> {
> return (unsigned long)knode->n_klist & KNODE_DEAD;
> }
>
>
> So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
>
> But in the asm:
>
> c0000000007ff600 <.klist_add_tail>:
> ...
> c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k
> ...
> c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k
> c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero
>
>
> It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
>
> In the GCC output you can see it:
>
> c0000000008c8a30 <.klist_node_init>:
> c0000000008c8a30: 39 24 00 08 addi r9,r4,8
> c0000000008c8a34: 39 40 00 01 li r10,1
> c0000000008c8a38: f9 24 00 08 std r9,8(r4)
> c0000000008c8a3c: f9 24 00 10 std r9,16(r4)
> c0000000008c8a40: 91 44 00 18 stw r10,24(r4)
> c0000000008c8a44: f8 64 00 00 std r3,0(r4)
> c0000000008c8a48: 54 69 07 fe clrlwi r9,r3,31
> c0000000008c8a4c: 0b 09 00 00 tdnei r9,0
>
> ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> except bit 0, which is equivalent to "& KNODE_DEAD".
>
>
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
>
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.
>
> This patch seems to fix it. Not sure if that's just papering over it though.
>
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on: \
> \
> WARN_ENTRY(PPC_TLNEI " %4, 0", \
> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
> - __label_warn_on, "r" (x)); \
> + __label_warn_on, "r" (!!(x))); \
> break; \
> __label_warn_on: \
> __ret_warn_on = true; \
>
>
> Generates:
>
> c0000000008e2ac0 <.klist_add_tail>:
> c0000000008e2ac0: 7c 08 02 a6 mflr r0
> c0000000008e2ac4: f8 01 00 10 std r0,16(r1)
> c0000000008e2ac8: f8 21 ff 71 stdu r1,-144(r1)
> c0000000008e2acc: fb a1 00 78 std r29,120(r1)
> c0000000008e2ad0: 7c 7d 1b 78 mr r29,r3
> c0000000008e2ad4: 38 60 00 01 li r3,1
> c0000000008e2ad8: fb c1 00 80 std r30,128(r1)
> c0000000008e2adc: 7c 9e 23 78 mr r30,r4
> c0000000008e2ae0: 38 9d 00 18 addi r4,r29,24
> c0000000008e2ae4: 57 c5 07 fe clrlwi r5,r30,31 <-
> c0000000008e2ae8: fb 81 00 70 std r28,112(r1)
> c0000000008e2aec: 3b 9d 00 08 addi r28,r29,8
> c0000000008e2af0: fb 9d 00 08 std r28,8(r29)
> c0000000008e2af4: fb 9d 00 10 std r28,16(r29)
> c0000000008e2af8: 90 64 00 00 stw r3,0(r4)
> c0000000008e2afc: fb dd 00 00 std r30,0(r29)
> c0000000008e2b00: 0b 05 00 00 tdnei r5,0 <-

Thank you for the in-depth explanation and triage! I have gone ahead and
created a standalone reproducer that shows this based on the
preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
so the LLVM developers can take a look.

Cheers,
Nathan

2021-08-26 23:58:04

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> > Nathan Chancellor <[email protected]> writes:
> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> > >> flexibility to GCC.
> > ...
> > >
> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > > klist_add_tail to trigger over and over on boot when compiling with
> > > clang:
> > >
> > > [ 2.177416][ T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > > [ 2.177456][ T1] Modules linked in:
> > > [ 2.177481][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 5.14.0-rc7-next-20210825 #1
> > > [ 2.177520][ T1] NIP: c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > > [ 2.177557][ T1] REGS: c0000000073c32a0 TRAP: 0700 Tainted: G W (5.14.0-rc7-next-20210825)
> > > [ 2.177593][ T1] MSR: 8000000002029032 <SF,VEC,EE,ME,IR,DR,RI> CR: 22000a40 XER: 00000000
> > > [ 2.177667][ T1] CFAR: c00000000090a034 IRQMASK: 0
> > > [ 2.177667][ T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > > [ 2.177667][ T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > > [ 2.177667][ T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > > [ 2.177667][ T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > > [ 2.177667][ T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [ 2.177667][ T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [ 2.177667][ T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > > [ 2.177667][ T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > > [ 2.178019][ T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > > [ 2.178058][ T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [ 2.178088][ T1] Call Trace:
> > > [ 2.178105][ T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > > [ 2.178150][ T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [ 2.178190][ T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > > [ 2.178234][ T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > > [ 2.178275][ T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > > [ 2.178314][ T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > > [ 2.178357][ T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > > [ 2.178403][ T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > > [ 2.178445][ T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > > [ 2.178491][ T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > > [ 2.178530][ T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > > [ 2.178569][ T1] Instruction dump:
> > > [ 2.178592][ T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > > [ 2.178662][ T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > > [ 2.178728][ T1] ---[ end trace 52ed3431f58f1847 ]---
> > >
> > > Is this a bug with clang or is there something wrong with the patch? The
> > > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > > command and the warning at boot can be viewed at [2]. If there is any
> > > other information I can provide, please let me know.
> > >
> > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> >
> > Thanks.
> >
> > This is the generated assembly:
> >
> > c0000000007ff600 <.klist_add_tail>:
> > c0000000007ff600: 7c 08 02 a6 mflr r0
> > c0000000007ff604: f8 01 00 10 std r0,16(r1)
> > c0000000007ff608: f8 21 ff 71 stdu r1,-144(r1) ^ prolog
> > c0000000007ff60c: fb a1 00 78 std r29,120(r1) save r29 to stack
> > c0000000007ff610: 7c 7d 1b 78 mr r29,r3 r29 = struct klist_node *n
> > c0000000007ff614: 38 60 00 01 li r3,1 r3 = 1
> > c0000000007ff618: fb 81 00 70 std r28,112(r1) save r28 to stack
> > c0000000007ff61c: 3b 9d 00 08 addi r28,r29,8 r28 = &n->n_node
> > c0000000007ff620: fb c1 00 80 std r30,128(r1) save r30 to stack
> > c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k
> > c0000000007ff628: 38 9d 00 18 addi r4,r29,24 r4 = &n->n_ref
> > c0000000007ff62c: fb 9d 00 08 std r28,8(r29) n->n_node.next = &n->n_node INIT_LIST_HEAD
> > c0000000007ff630: fb 9d 00 10 std r28,16(r29) n->n_node.prev = &n->n_node
> > c0000000007ff634: 90 64 00 00 stw r3,0(r4) kref_init(&n->n_ref)
> > c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k
> > c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero
> >
> >
> > From:
> >
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > knode->n_klist = klist;
> > /* no knode deserves to start its life dead */
> > WARN_ON(knode_dead(knode));
> > ^^^^^^^^^^^^^^^^^
> > }
> >
> > Which expands to:
> >
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > knode->n_klist = klist;
> >
> > ({
> > bool __ret_warn_on = false;
> > do {
> > ...
> > } else {
> > __label__ __label_warn_on;
> > do {
> > asm goto(
> > "1: "
> > "tdnei"
> > "
> > " " % 4,
> > 0 " "\n " ".section __ex_table,\"a\";"
> > " "
> > ".balign 4;"
> > " "
> > ".long (1b) - . ;"
> > " "
> > ".long (%l[__label_warn_on]) - . ;"
> > " "
> > ".previous"
> > " "
> > ".section __bug_table,\"aw\"\n"
> > "2:\t.4byte 1b - 2b, %0 - 2b\n"
> > "\t.short %1, %2\n"
> > ".org 2b+%3\n"
> > ".previous\n"
> > :
> > : "i"("lib/klist.c"), "i"(62),
> > "i"((1 << 0) | ((9) << 8)),
> > "i"(sizeof(struct bug_entry)),
> > "r"(knode_dead(knode))
> > ^^^^^^^^^^^^^^^^^^^^^
> >
> > :
> > : __label_warn_on);
> > asm("");
> > } while (0);
> > break;
> > __label_warn_on:
> > __ret_warn_on = true;
> > }
> > } while (0);
> > __builtin_expect(!!(__ret_warn_on), 0);
> > });
> > }
> >
> > And knode_dead is:
> >
> > #define KNODE_DEAD 1LU
> >
> > static bool knode_dead(struct klist_node *knode)
> > {
> > return (unsigned long)knode->n_klist & KNODE_DEAD;
> > }
> >
> >
> > So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> >
> > But in the asm:
> >
> > c0000000007ff600 <.klist_add_tail>:
> > ...
> > c0000000007ff624: 7c 9e 23 78 mr r30,r4 r30 = struct klist *k
> > ...
> > c0000000007ff638: fb dd 00 00 std r30,0(r29) n->n_klist = k
> > c0000000007ff63c: 0b 1e 00 00 tdnei r30,0 trap if r30 (k) is not zero
> >
> >
> > It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> >
> > In the GCC output you can see it:
> >
> > c0000000008c8a30 <.klist_node_init>:
> > c0000000008c8a30: 39 24 00 08 addi r9,r4,8
> > c0000000008c8a34: 39 40 00 01 li r10,1
> > c0000000008c8a38: f9 24 00 08 std r9,8(r4)
> > c0000000008c8a3c: f9 24 00 10 std r9,16(r4)
> > c0000000008c8a40: 91 44 00 18 stw r10,24(r4)
> > c0000000008c8a44: f8 64 00 00 std r3,0(r4)
> > c0000000008c8a48: 54 69 07 fe clrlwi r9,r3,31
> > c0000000008c8a4c: 0b 09 00 00 tdnei r9,0
> >
> > ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> > except bit 0, which is equivalent to "& KNODE_DEAD".
> >
> >
> > So there seems to be some misunderstanding with clang, it doesn't like
> > us passing an expression to the inline asm.
> >
> > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> > doesn't have to just be a variable name.
> >
> > This patch seems to fix it. Not sure if that's just papering over it though.
> >
> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> > index 1ee0f22313ee..75fcb4370d96 100644
> > --- a/arch/powerpc/include/asm/bug.h
> > +++ b/arch/powerpc/include/asm/bug.h
> > @@ -119,7 +119,7 @@ __label_warn_on: \
> > \
> > WARN_ENTRY(PPC_TLNEI " %4, 0", \
> > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
> > - __label_warn_on, "r" (x)); \
> > + __label_warn_on, "r" (!!(x))); \
> > break; \
> > __label_warn_on: \
> > __ret_warn_on = true; \
> >
> >
> > Generates:
> >
> > c0000000008e2ac0 <.klist_add_tail>:
> > c0000000008e2ac0: 7c 08 02 a6 mflr r0
> > c0000000008e2ac4: f8 01 00 10 std r0,16(r1)
> > c0000000008e2ac8: f8 21 ff 71 stdu r1,-144(r1)
> > c0000000008e2acc: fb a1 00 78 std r29,120(r1)
> > c0000000008e2ad0: 7c 7d 1b 78 mr r29,r3
> > c0000000008e2ad4: 38 60 00 01 li r3,1
> > c0000000008e2ad8: fb c1 00 80 std r30,128(r1)
> > c0000000008e2adc: 7c 9e 23 78 mr r30,r4
> > c0000000008e2ae0: 38 9d 00 18 addi r4,r29,24
> > c0000000008e2ae4: 57 c5 07 fe clrlwi r5,r30,31 <-
> > c0000000008e2ae8: fb 81 00 70 std r28,112(r1)
> > c0000000008e2aec: 3b 9d 00 08 addi r28,r29,8
> > c0000000008e2af0: fb 9d 00 08 std r28,8(r29)
> > c0000000008e2af4: fb 9d 00 10 std r28,16(r29)
> > c0000000008e2af8: 90 64 00 00 stw r3,0(r4)
> > c0000000008e2afc: fb dd 00 00 std r30,0(r29)
> > c0000000008e2b00: 0b 05 00 00 tdnei r5,0 <-
>
> Thank you for the in-depth explanation and triage! I have gone ahead and
> created a standalone reproducer that shows this based on the
> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
> so the LLVM developers can take a look.

Based on Eli Friedman's comment in the bug, would something like this
work and not regress GCC? I noticed that the BUG_ON macro does a cast as
well. Nick pointed out to me privately that we have run into what seems
like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
sign extension for RV64I").

Cheers,
Nathan

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..35022667f57d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on: \
\
WARN_ENTRY(PPC_TLNEI " %4, 0", \
BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
- __label_warn_on, "r" (x)); \
+ __label_warn_on, "r" ((__force long)(x))); \
break; \
__label_warn_on: \
__ret_warn_on = true; \

2021-08-27 01:30:22

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

Excerpts from Segher Boessenkool's message of August 27, 2021 1:30 am:
> On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
>> >> No, they are all dispatched and issue to the BRU for execution. It's
>> >> trivial to construct a test of a lot of not taken branches in a row
>> >> and time a loop of it to see it executes at 1 cycle per branch.
>> >
>> > (s/dispatched/issued/)
>>
>> ?
>
> Dispatch is from decode to the issue queues. Issue is from there to
> execution units. Dispatch is in-order, issue is not.

I know what those mean, I wonder what your s/dispatched/issued means.
I was saying they are dispatched in response to you saying they never
hit the issue queue.

>
>> >> How could it validate prediction without issuing? It wouldn't know when
>> >> sources are ready.
>> >
>> > In the backend. But that is just how it worked on older cores :-/
>>
>> Okay. I don't know about older cores than POWER9. Backend would normally
>> include execution though.
>> Only other place you could do it if you don't
>> issue/exec would be after it goes back in order, like completion.
>
> You do not have to do the verification in-order: the insn cannot finish
> until it is no longer speculative, that takes care of all ordering
> needed.

Branches *can* finish out of order and speculative as they do in P9 and
P10. Are you talking about these CPUs or something else which can
verify branches without issuing them?

>
>> But that would be horrible for mispredict penalty.
>
> See the previous point. Also, any insn known to be mispredicted can be
> flushed immediately anyway.

The point is it has to know sources (CR) to verify (aka execute) the
branch prediction was right, and if it needs sources then it needs to
either issue and execute in the out of order part, or it needs to wait
until completion which would seem to be prohibitively expensive. I am
interested to know how it works.

>
>> >> >> The first problem seems like the show stopper though. AFAIKS it would
>> >> >> need a special builtin support that does something to create the table
>> >> >> entry, or a guarantee that we could put an inline asm right after the
>> >> >> builtin as a recognized pattern and that would give us the instruction
>> >> >> following the trap.
>> >> >
>> >> > I'm not quite sure what this means. Can't you always just put a
>> >> >
>> >> > bla: asm("");
>> >> >
>> >> > in there, and use the address of "bla"?
>> >>
>> >> Not AFAIKS. Put it where?
>> >
>> > After wherever you want to know the address after. You will have to
>> > make sure they stay together somehow.
>>
>> I still don't follow.
>
> some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
> empty_asm_that_we_can_take_the_address_of_known_as_B;
>
> You have to make sure the compiler keeps A and B together, does not
> insert anything between them, does put them in the assembler output in
> the same fragment, etc.

How does all this help our problem of putting the address of the trap
into the table?

>
>> If you could give a built in that put a label at the address of the trap
>> instruction that could be used later by inline asm then that could work
>> too:
>>
>> __builtin_labeled_trap("1:");
>> asm (" .section __bug_table,\"aw\" \n\t"
>> "2: .4byte 1b - 2b \n\t"
>> " .previous");
>
> How could a compiler do anything like that?!

How could it add a label at the trap instruction it generates? It didn't
seem like an outlandish thing to do, but I'm not a compiler writer. It was
just a handwaving idea to show what we want to be able to do.

Thanks,
Nick

2021-08-27 07:55:12

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto

Nathan Chancellor <[email protected]> writes:
> On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
>> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
>> > Nathan Chancellor <[email protected]> writes:
>> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>> > >> flexibility to GCC.
>> > ...
>> > >
>> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>> > > klist_add_tail to trigger over and over on boot when compiling with
>> > > clang:
>> > >
...
>> >
>> > This patch seems to fix it. Not sure if that's just papering over it though.
>> >
>> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> > index 1ee0f22313ee..75fcb4370d96 100644
>> > --- a/arch/powerpc/include/asm/bug.h
>> > +++ b/arch/powerpc/include/asm/bug.h
>> > @@ -119,7 +119,7 @@ __label_warn_on: \
>> > \
>> > WARN_ENTRY(PPC_TLNEI " %4, 0", \
>> > BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
>> > - __label_warn_on, "r" (x)); \
>> > + __label_warn_on, "r" (!!(x))); \
>> > break; \
>> > __label_warn_on: \
>> > __ret_warn_on = true; \
>> >
>>
>> Thank you for the in-depth explanation and triage! I have gone ahead and
>> created a standalone reproducer that shows this based on the
>> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
>> so the LLVM developers can take a look.
>
> Based on Eli Friedman's comment in the bug, would something like this
> work and not regress GCC? I noticed that the BUG_ON macro does a cast as
> well. Nick pointed out to me privately that we have run into what seems
> like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
> sign extension for RV64I").

Yes, I read that comment this morning, and then landed at the same fix
via digging through the history of our bug code.

We in fact fixed a similar bug 16 years ago :}

32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels")

Which is when we first started adding the cast to long.


> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..35022667f57d 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on: \
> \
> WARN_ENTRY(PPC_TLNEI " %4, 0", \
> BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
> - __label_warn_on, "r" (x)); \
> + __label_warn_on, "r" ((__force long)(x))); \
> break; \
> __label_warn_on: \
> __ret_warn_on = true; \


Yeah that fixes the clang build for me.

For GCC it seems to generate the same code in the simple cases:

void test_warn_on_ulong(unsigned long b)
{
WARN_ON(b);
}

void test_warn_on_int(int b)
{
WARN_ON(b);
}

I get:

0000000000000020 <.test_warn_on_ulong>:
20: 0b 03 00 00 tdnei r3,0
24: 4e 80 00 20 blr
28: 60 00 00 00 nop
2c: 60 00 00 00 nop

0000000000000030 <.test_warn_on_int>:
30: 0b 03 00 00 tdnei r3,0
34: 4e 80 00 20 blr

Both before and after the change. So that's good.

For:

void test_warn_on_int_addition(int b)
{
WARN_ON(b+1);
}

Before I get:

0000000000000040 <.test_warn_on_int_addition>:
40: 38 63 00 01 addi r3,r3,1
44: 0b 03 00 00 tdnei r3,0
48: 4e 80 00 20 blr

vs after:

0000000000000040 <.test_warn_on_int_addition>:
40: 38 63 00 01 addi r3,r3,1
44: 7c 63 07 b4 extsw r3,r3
48: 0b 03 00 00 tdnei r3,0
4c: 4e 80 00 20 blr


So there's an extra sign extension we don't need, but I guess we can
probably live with that.

cheers