2019-03-21 22:04:35

by Andi Kleen

[permalink] [raw]
Subject: Fixes and cleanup from LTO tree

Here are a range of bug fixes and cleanups that have accumulated in my
gcc Link Time Optimization (LTO) branches; for issues found
by the compiler when doing global optimization and a few
other issues.

(https://github.com/andikleen/linux-misc lto-*)

IMNSHO they are all useful improvements even without LTO support.

About half of it is in x86 specific code, but the others are
random all over. I tried to always copy the respective maintainers,
but since it's (nearly) a tree sweep I'm also copying Andrew.

-Andi




2019-03-21 22:01:25

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 10/17] delta: Fix buffer overrun in delta_ipc_open

From: Andi Kleen <[email protected]>

delta_ipc_open is always called with a single constant string
as name, but it uses a longer memcpy to copy the string to
a different structure. The memcpy would read outside the bounds
of the string, potentially accessing unmapped memory.

Just use strcpy instead after clearing the area.

This fixes a build error with LTO, which can detect this.

Cc: [email protected]
Cc: [email protected]
Fixes: 91c83f395fbe [media] st-delta: rpmsg ipc support
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/media/platform/sti/delta/delta-ipc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/sti/delta/delta-ipc.c b/drivers/media/platform/sti/delta/delta-ipc.c
index a4603d573c34..bd1bbbeedec3 100644
--- a/drivers/media/platform/sti/delta/delta-ipc.c
+++ b/drivers/media/platform/sti/delta/delta-ipc.c
@@ -175,8 +175,8 @@ int delta_ipc_open(struct delta_ctx *pctx, const char *name,
msg.ipc_buf_size = ipc_buf_size;
msg.ipc_buf_paddr = ctx->ipc_buf->paddr;

- memcpy(msg.name, name, sizeof(msg.name));
- msg.name[sizeof(msg.name) - 1] = 0;
+ memset(msg.name, 0, sizeof(msg.name));
+ strcpy(msg.name, name);

msg.param_size = param->size;
memcpy(ctx->ipc_buf->vaddr, param->data, msg.param_size);
--
2.20.1


2019-03-21 22:01:24

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 15/17] x86/hyperv: Make hv_vcpu_is_preempted visible

From: Andi Kleen <[email protected]>

This function is referrenced from assembler, so need to be marked
visible for LTO.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 3a025de64bf8 x86/hyperv: Enable PV qspinlock for Hyper-V
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/hyperv/hv_spinlock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index a861b0456b1a..07f21a06392f 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -56,7 +56,7 @@ static void hv_qlock_wait(u8 *byte, u8 val)
/*
* Hyper-V does not support this so far.
*/
-bool hv_vcpu_is_preempted(int vcpu)
+__visible bool hv_vcpu_is_preempted(int vcpu)
{
return false;
}
--
2.20.1


2019-03-21 22:01:30

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 08/17] x86/speculation: Fix __initconst in bugs.c

From: Andi Kleen <[email protected]>

Fix some of the recently added const tables to use __initconst
for const data instead of __initdata which causes section attribute
conflicts.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 2da82eff0eb4..b91b3bfa5cfb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -275,7 +275,7 @@ static const struct {
const char *option;
enum spectre_v2_user_cmd cmd;
bool secure;
-} v2_user_options[] __initdata = {
+} v2_user_options[] __initconst = {
{ "auto", SPECTRE_V2_USER_CMD_AUTO, false },
{ "off", SPECTRE_V2_USER_CMD_NONE, false },
{ "on", SPECTRE_V2_USER_CMD_FORCE, true },
@@ -419,7 +419,7 @@ static const struct {
const char *option;
enum spectre_v2_mitigation_cmd cmd;
bool secure;
-} mitigation_options[] __initdata = {
+} mitigation_options[] __initconst = {
{ "off", SPECTRE_V2_CMD_NONE, false },
{ "on", SPECTRE_V2_CMD_FORCE, true },
{ "retpoline", SPECTRE_V2_CMD_RETPOLINE, false },
@@ -658,7 +658,7 @@ static const char * const ssb_strings[] = {
static const struct {
const char *option;
enum ssb_mitigation_cmd cmd;
-} ssb_mitigation_options[] __initdata = {
+} ssb_mitigation_options[] __initconst = {
{ "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */
{ "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */
{ "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
--
2.20.1


2019-03-21 22:01:59

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 13/17] ASoC: AMD: Fix incorrect extern

From: Andi Kleen <[email protected]>

When using bare externs outside include files that types should
at least match. This fixes a type confusion between bool
and int.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
sound/soc/amd/acp-da7219-max98357a.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/amd/acp-da7219-max98357a.c b/sound/soc/amd/acp-da7219-max98357a.c
index a5daad973ce5..f001aad0c70e 100644
--- a/sound/soc/amd/acp-da7219-max98357a.c
+++ b/sound/soc/amd/acp-da7219-max98357a.c
@@ -47,7 +47,7 @@

static struct snd_soc_jack cz_jack;
static struct clk *da7219_dai_clk;
-extern int bt_uart_enable;
+extern bool bt_uart_enable;

static int cz_da7219_init(struct snd_soc_pcm_runtime *rtd)
{
--
2.20.1


2019-03-21 22:02:09

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 09/17] x86/kvm: Make steal_time visible

From: Andi Kleen <[email protected]>

This per cpu variable is accessed from assembler code, so needs
to be visible.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5c93a65ee1e5..3f0cc828cc36 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -67,7 +67,7 @@ static int __init parse_no_stealacc(char *arg)
early_param("no-steal-acc", parse_no_stealacc);

static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64);
+DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64) __visible;
static int has_steal_clock = 0;

/*
--
2.20.1


2019-03-21 22:02:33

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 16/17] x86/xen: Mark xen_vcpu_stolen as __visible

From: Andi Kleen <[email protected]>

This function is referenced from assembler, so needs to be marked
__visible for LTO.

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/xen/time.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 0968859c29d0..04b59dab30f5 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -144,7 +144,7 @@ void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
}

/* return true when a vcpu could run but has no real cpu to run on */
-bool xen_vcpu_stolen(int vcpu)
+__visible bool xen_vcpu_stolen(int vcpu)
{
return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable;
}
--
2.20.1


2019-03-21 22:02:45

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

From: Andi Kleen <[email protected]>

With gcc 8 toplevel assembler statements that do not mark themselves
as .text may end up in other sections. I had boot crashes because
various assembler statements ended up in the middle of the initcall
section.

Always mark all the top level assembler statements as text
so that they switch to the right section.

For AMD "vide", which is only used on 32bit kernels, I also
marked it as 32bit only.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/paravirt_types.h | 2 +-
arch/x86/kernel/cpu/amd.c | 5 ++++-
arch/x86/kernel/kprobes/core.c | 1 +
arch/x86/lib/error-inject.c | 1 +
4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 2474e434a6f7..e246577a643b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -372,7 +372,7 @@ extern struct paravirt_patch_template pv_ops;

#define DEF_NATIVE(ops, name, code) \
__visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
- asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
+ asm(".text\n\t" NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))

unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
unsigned paravirt_patch_default(u8 type, void *insnbuf,
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 01004bfb1a1b..fb6a64bd765f 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -82,11 +82,14 @@ static inline int wrmsrl_amd_safe(unsigned msr, unsigned long long val)
* performance at the same time..
*/

+#ifdef CONFIG_X86_32
extern __visible void vide(void);
-__asm__(".globl vide\n"
+__asm__(".text\n"
+ ".globl vide\n"
".type vide, @function\n"
".align 4\n"
"vide: ret\n");
+#endif

static void init_amd_k5(struct cpuinfo_x86 *c)
{
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index a034cb808e7e..31ab91c9c4e9 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -715,6 +715,7 @@ NOKPROBE_SYMBOL(kprobe_int3_handler);
* calls trampoline_handler() runs, which calls the kretprobe's handler.
*/
asm(
+ ".text\n"
".global kretprobe_trampoline\n"
".type kretprobe_trampoline, @function\n"
"kretprobe_trampoline:\n"
diff --git a/arch/x86/lib/error-inject.c b/arch/x86/lib/error-inject.c
index 3cdf06128d13..be5b5fb1598b 100644
--- a/arch/x86/lib/error-inject.c
+++ b/arch/x86/lib/error-inject.c
@@ -6,6 +6,7 @@
asmlinkage void just_return_func(void);

asm(
+ ".text\n"
".type just_return_func, @function\n"
".globl just_return_func\n"
"just_return_func:\n"
--
2.20.1


2019-03-21 22:02:51

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 06/17] locking/core: Mark spinlocks noinline when inline spinlocks are disabled

From: Andi Kleen <[email protected]>

Otherwise LTO will inline them anyways and cause a large
kernel text increase.

Since the explicit intention here is to not inline them marking
them noinline is good documentation even for the non LTO case.

Signed-off-by: Andi Kleen <[email protected]>
---
kernel/locking/spinlock.c | 56 +++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 936f3d14dd6b..51031785e821 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -123,7 +123,7 @@ BUILD_LOCK_OPS(write, rwlock);
#endif

#ifndef CONFIG_INLINE_SPIN_TRYLOCK
-int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock(raw_spinlock_t *lock)
{
return __raw_spin_trylock(lock);
}
@@ -131,7 +131,7 @@ EXPORT_SYMBOL(_raw_spin_trylock);
#endif

#ifndef CONFIG_INLINE_SPIN_TRYLOCK_BH
-int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
+noinline int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)
{
return __raw_spin_trylock_bh(lock);
}
@@ -139,7 +139,7 @@ EXPORT_SYMBOL(_raw_spin_trylock_bh);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK
-void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
{
__raw_spin_lock(lock);
}
@@ -147,7 +147,7 @@ EXPORT_SYMBOL(_raw_spin_lock);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
+noinline unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
{
return __raw_spin_lock_irqsave(lock);
}
@@ -155,7 +155,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irqsave);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
-void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
{
__raw_spin_lock_irq(lock);
}
@@ -163,7 +163,7 @@ EXPORT_SYMBOL(_raw_spin_lock_irq);
#endif

#ifndef CONFIG_INLINE_SPIN_LOCK_BH
-void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
{
__raw_spin_lock_bh(lock);
}
@@ -171,7 +171,7 @@ EXPORT_SYMBOL(_raw_spin_lock_bh);
#endif

#ifdef CONFIG_UNINLINE_SPIN_UNLOCK
-void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
{
__raw_spin_unlock(lock);
}
@@ -179,7 +179,7 @@ EXPORT_SYMBOL(_raw_spin_unlock);
#endif

#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
-void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
{
__raw_spin_unlock_irqrestore(lock, flags);
}
@@ -187,7 +187,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irqrestore);
#endif

#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
-void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
{
__raw_spin_unlock_irq(lock);
}
@@ -195,7 +195,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_irq);
#endif

#ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
-void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
+noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
{
__raw_spin_unlock_bh(lock);
}
@@ -203,7 +203,7 @@ EXPORT_SYMBOL(_raw_spin_unlock_bh);
#endif

#ifndef CONFIG_INLINE_READ_TRYLOCK
-int __lockfunc _raw_read_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_read_trylock(rwlock_t *lock)
{
return __raw_read_trylock(lock);
}
@@ -211,7 +211,7 @@ EXPORT_SYMBOL(_raw_read_trylock);
#endif

#ifndef CONFIG_INLINE_READ_LOCK
-void __lockfunc _raw_read_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock(rwlock_t *lock)
{
__raw_read_lock(lock);
}
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(_raw_read_lock);
#endif

#ifndef CONFIG_INLINE_READ_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_read_lock_irqsave(rwlock_t *lock)
{
return __raw_read_lock_irqsave(lock);
}
@@ -227,7 +227,7 @@ EXPORT_SYMBOL(_raw_read_lock_irqsave);
#endif

#ifndef CONFIG_INLINE_READ_LOCK_IRQ
-void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_irq(rwlock_t *lock)
{
__raw_read_lock_irq(lock);
}
@@ -235,7 +235,7 @@ EXPORT_SYMBOL(_raw_read_lock_irq);
#endif

#ifndef CONFIG_INLINE_READ_LOCK_BH
-void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_lock_bh(rwlock_t *lock)
{
__raw_read_lock_bh(lock);
}
@@ -243,7 +243,7 @@ EXPORT_SYMBOL(_raw_read_lock_bh);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK
-void __lockfunc _raw_read_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock(rwlock_t *lock)
{
__raw_read_unlock(lock);
}
@@ -251,7 +251,7 @@ EXPORT_SYMBOL(_raw_read_unlock);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK_IRQRESTORE
-void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
{
__raw_read_unlock_irqrestore(lock, flags);
}
@@ -259,7 +259,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irqrestore);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK_IRQ
-void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_irq(rwlock_t *lock)
{
__raw_read_unlock_irq(lock);
}
@@ -267,7 +267,7 @@ EXPORT_SYMBOL(_raw_read_unlock_irq);
#endif

#ifndef CONFIG_INLINE_READ_UNLOCK_BH
-void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_read_unlock_bh(rwlock_t *lock)
{
__raw_read_unlock_bh(lock);
}
@@ -275,7 +275,7 @@ EXPORT_SYMBOL(_raw_read_unlock_bh);
#endif

#ifndef CONFIG_INLINE_WRITE_TRYLOCK
-int __lockfunc _raw_write_trylock(rwlock_t *lock)
+noinline int __lockfunc _raw_write_trylock(rwlock_t *lock)
{
return __raw_write_trylock(lock);
}
@@ -283,7 +283,7 @@ EXPORT_SYMBOL(_raw_write_trylock);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK
-void __lockfunc _raw_write_lock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock(rwlock_t *lock)
{
__raw_write_lock(lock);
}
@@ -291,7 +291,7 @@ EXPORT_SYMBOL(_raw_write_lock);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
-unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
+noinline unsigned long __lockfunc _raw_write_lock_irqsave(rwlock_t *lock)
{
return __raw_write_lock_irqsave(lock);
}
@@ -299,7 +299,7 @@ EXPORT_SYMBOL(_raw_write_lock_irqsave);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK_IRQ
-void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_irq(rwlock_t *lock)
{
__raw_write_lock_irq(lock);
}
@@ -307,7 +307,7 @@ EXPORT_SYMBOL(_raw_write_lock_irq);
#endif

#ifndef CONFIG_INLINE_WRITE_LOCK_BH
-void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_lock_bh(rwlock_t *lock)
{
__raw_write_lock_bh(lock);
}
@@ -315,7 +315,7 @@ EXPORT_SYMBOL(_raw_write_lock_bh);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK
-void __lockfunc _raw_write_unlock(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock(rwlock_t *lock)
{
__raw_write_unlock(lock);
}
@@ -323,7 +323,7 @@ EXPORT_SYMBOL(_raw_write_unlock);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE
-void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+noinline void __lockfunc _raw_write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
{
__raw_write_unlock_irqrestore(lock, flags);
}
@@ -331,7 +331,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irqrestore);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK_IRQ
-void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_irq(rwlock_t *lock)
{
__raw_write_unlock_irq(lock);
}
@@ -339,7 +339,7 @@ EXPORT_SYMBOL(_raw_write_unlock_irq);
#endif

#ifndef CONFIG_INLINE_WRITE_UNLOCK_BH
-void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
+noinline void __lockfunc _raw_write_unlock_bh(rwlock_t *lock)
{
__raw_write_unlock_bh(lock);
}
--
2.20.1


2019-03-21 22:03:06

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 11/17] x86/kprobes: Make trampoline_handler global and visible

From: Andi Kleen <[email protected]>

This function is referenced from assembler, so in LTO
it needs to be global and visible to not be optimized away.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 31ab91c9c4e9..1309a4eb3119 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -752,7 +752,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
/*
* Called from kretprobe_trampoline
*/
-static __used void *trampoline_handler(struct pt_regs *regs)
+__used __visible void *trampoline_handler(struct pt_regs *regs)
{
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
--
2.20.1


2019-03-21 22:03:09

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

From: Andi Kleen <[email protected]>

This warning is very noisy in a default build with gcc 9.
Move it into W=2 only.

Cc: [email protected]
Cc: Masahiro Yamada <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
Makefile | 1 +
scripts/Makefile.extrawarn | 1 +
2 files changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index 71c2c482e2a2..78efbb172991 100644
--- a/Makefile
+++ b/Makefile
@@ -676,6 +676,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning,frame-address,)
KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context)
+KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)

ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += $(call cc-option,-Oz,-Os)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 768306add591..ed974a0f4a47 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -43,6 +43,7 @@ warning-2 += $(call cc-option, -Wmissing-field-initializers)
warning-2 += $(call cc-option, -Wsign-compare)
warning-2 += $(call cc-option, -Wmaybe-uninitialized)
warning-2 += $(call cc-option, -Wunused-macros)
+warning-2 += $(call cc-option, -Waddress-of-packed-member)

warning-3 := -Wbad-function-cast
warning-3 += -Wcast-qual
--
2.20.1


2019-03-21 22:03:12

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 03/17] x86: Don't inline __const_udelay

From: Andi Kleen <[email protected]>

__const_udelay is marked inline, and LTO will happily inline it everywhere
Dropping the inline saves ~44k text in a LTO build.

13999560 1740864 1499136 17239560 1070e08 vmlinux-with-udelay-inline
13954764 1736768 1499136 17190668 1064f0c vmlinux-wo-udelay-inline

Even without LTO I believe marking it noinline documents it correctly.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/lib/delay.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index f5b7f1b3b6d7..b7375dc6898f 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -162,7 +162,7 @@ void __delay(unsigned long loops)
}
EXPORT_SYMBOL(__delay);

-void __const_udelay(unsigned long xloops)
+noinline void __const_udelay(unsigned long xloops)
{
unsigned long lpj = this_cpu_read(cpu_info.loops_per_jiffy) ? : loops_per_jiffy;
int d0;
--
2.20.1


2019-03-21 22:03:16

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 05/17] Use C version for SYSCALL_ALIAS

From: Andi Kleen <[email protected]>

LTO doesn't like the assembler aliasing used for SYSCALL_ALIAS.
Replace it with C aliasing. Also mark the only users visible.

This gives cleaner and nicer code, so is useful even without
LTO.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/syscall_wrapper.h | 1 +
include/linux/linkage.h | 6 ++----
kernel/time/posix-stubs.c | 2 +-
3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
index e046a405743d..6f7a73da8437 100644
--- a/arch/x86/include/asm/syscall_wrapper.h
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -52,6 +52,7 @@
SYSCALL_METADATA(_##sname, 0); \
asmlinkage long __x64_sys_##sname(void); \
ALLOW_ERROR_INJECTION(__x64_sys_##sname, ERRNO); \
+ asmlinkage long __ia32_sys_##sname(void); \
SYSCALL_ALIAS(__ia32_sys_##sname, __x64_sys_##sname); \
asmlinkage long __x64_sys_##sname(void)

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index 7e020782ade2..5fc9ecd55bda 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -30,10 +30,8 @@
#endif

#ifndef SYSCALL_ALIAS
-#define SYSCALL_ALIAS(alias, name) asm( \
- ".globl " __stringify(alias) "\n\t" \
- ".set " __stringify(alias) "," \
- __stringify(name))
+#define SYSCALL_ALIAS(a, name) \
+ __visible typeof(a) a __attribute__((alias(__stringify(name))))
#endif

#define __page_aligned_data __section(.data..page_aligned) __aligned(PAGE_SIZE)
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 67df65f887ac..4777892bee82 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -21,7 +21,7 @@
#include <asm/syscall_wrapper.h>
#endif

-asmlinkage long sys_ni_posix_timers(void)
+__visible asmlinkage long sys_ni_posix_timers(void)
{
pr_err_once("process %d (%s) attempted a POSIX timer syscall "
"while CONFIG_POSIX_TIMERS is not set\n",
--
2.20.1


2019-03-21 22:03:19

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 14/17] crypto: Don't mark AES tables const and cacheline_aligned

From: Andi Kleen <[email protected]>

cacheline_aligned is a special section. It cannot be const at the same
time because it's not read-only. It doesn't give any MMU protection.

If we wanted const cacheline aligned we would need to define
a new dedicated section. But I assume it's not really needed so the const
can be dropped.

This fixes LTO complaints about conflicting section types.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
crypto/aes_generic.c | 8 ++++----
include/crypto/aes.h | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
index 13df33aca463..851b35949737 100644
--- a/crypto/aes_generic.c
+++ b/crypto/aes_generic.c
@@ -64,7 +64,7 @@ static inline u8 byte(const u32 x, const unsigned n)
static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 };

/* cacheline-aligned to facilitate prefetching into cache */
-__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = {
+__visible u32 crypto_ft_tab[4][256] __cacheline_aligned = {
{
0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6,
0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591,
@@ -328,7 +328,7 @@ __visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = {
}
};

-__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = {
+__visible u32 crypto_fl_tab[4][256] __cacheline_aligned = {
{
0x00000063, 0x0000007c, 0x00000077, 0x0000007b,
0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5,
@@ -592,7 +592,7 @@ __visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = {
}
};

-__visible const u32 crypto_it_tab[4][256] __cacheline_aligned = {
+__visible u32 crypto_it_tab[4][256] __cacheline_aligned = {
{
0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a,
0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b,
@@ -856,7 +856,7 @@ __visible const u32 crypto_it_tab[4][256] __cacheline_aligned = {
}
};

-__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = {
+__visible u32 crypto_il_tab[4][256] __cacheline_aligned = {
{
0x00000052, 0x00000009, 0x0000006a, 0x000000d5,
0x00000030, 0x00000036, 0x000000a5, 0x00000038,
diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 852eaa9cd4db..a03139ec930f 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -28,10 +28,10 @@ struct crypto_aes_ctx {
u32 key_length;
};

-extern const u32 crypto_ft_tab[4][256];
-extern const u32 crypto_fl_tab[4][256];
-extern const u32 crypto_it_tab[4][256];
-extern const u32 crypto_il_tab[4][256];
+extern u32 crypto_ft_tab[4][256];
+extern u32 crypto_fl_tab[4][256];
+extern u32 crypto_it_tab[4][256];
+extern u32 crypto_il_tab[4][256];

int crypto_aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
unsigned int key_len);
--
2.20.1


2019-03-21 22:03:21

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 12/17] afs: Avoid section confusion in CM_NAME

From: Andi Kleen <[email protected]>

__tracepoint_str cannot be const because the tracepoint_str
section is not read-only. Remove the stray const.

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
fs/afs/cmservice.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 8ee5972893ed..2f8acb4c556d 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -34,7 +34,7 @@ static void SRXAFSCB_TellMeAboutYourself(struct work_struct *);
static int afs_deliver_yfs_cb_callback(struct afs_call *);

#define CM_NAME(name) \
- const char afs_SRXCB##name##_name[] __tracepoint_string = \
+ char afs_SRXCB##name##_name[] __tracepoint_string = \
"CB." #name

/*
--
2.20.1


2019-03-21 22:03:42

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 04/17] init: Add __noreorder and mark initcalls __noreorder

From: Andi Kleen <[email protected]>

gcc 5 has a new no_reorder attribute that prevents top level
reordering only for that symbol.

Kernels don't like any reordering of initcalls between files, as several
initcalls depend on each other.

Add a __noreorder wrapper for the no_reorder attribute and use
it for initcalls.

LTO previously needed to use -fno-toplevel-reordering to prevent boot
failures, but with noreorder this is not needed.

I believe the patch is useful even without LTO because it clearly
documents this dependency in the source.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/compiler_attributes.h | 11 +++++++++++
include/linux/init.h | 2 +-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index 6b318efd8a74..1c44ef9034bb 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -39,6 +39,7 @@
# define __GCC4_has_attribute___externally_visible__ 1
# define __GCC4_has_attribute___noclone__ 1
# define __GCC4_has_attribute___nonstring__ 0
+# define __GCC4_has_attribute___no_reorder__ 0
# define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
#endif

@@ -253,4 +254,14 @@
*/
#define __weak __attribute__((__weak__))

+/*
+ * https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
+ */
+
+#if __has_attribute(__no_reorder__)
+#define __noreorder __attribute__((no_reorder))
+#else
+#define __noreorder
+#endif
+
#endif /* __LINUX_COMPILER_ATTRIBUTES_H */
diff --git a/include/linux/init.h b/include/linux/init.h
index 5255069f5a9f..49d032916604 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -190,7 +190,7 @@ extern bool initcall_debug;
".previous \n");
#else
#define ___define_initcall(fn, id, __sec) \
- static initcall_t __initcall_##fn##id __used \
+ static initcall_t __initcall_##fn##id __used __noreorder \
__attribute__((__section__(#__sec ".init"))) = fn;
#endif

--
2.20.1


2019-03-21 22:04:33

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 07/17] amdkfd: Fix extern declaration

From: Andi Kleen <[email protected]>

When you use bare externs outside include files, need to at least specify
matching types. This fixes a type confusion between u16 and unsigned.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 01494752c36a..a8cba8d3714d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -341,8 +341,8 @@ int kfd_iommu_resume(struct kfd_dev *kfd)
}

extern bool amd_iommu_pc_supported(void);
-extern u8 amd_iommu_pc_get_max_banks(u16 devid);
-extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+extern u8 amd_iommu_pc_get_max_banks(unsigned devid);
+extern u8 amd_iommu_pc_get_max_counters(unsigned devid);

/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to topology
*/
--
2.20.1


2019-03-21 22:04:42

by Andi Kleen

[permalink] [raw]
Subject: [PATCH 17/17] dm: Fix const confusion in dm

From: Andi Kleen <[email protected]>

A non const pointer to const cannot be marked initconst.
Mark the array actually const.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Fixes: 6bbc923dfcf5 dm: add support to directly boot to a mapped device
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/md/dm-init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index b53f30f16b4d..4b76f84424c3 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -36,7 +36,7 @@ struct dm_device {
struct list_head list;
};

-const char *dm_allowed_targets[] __initconst = {
+const char * const dm_allowed_targets[] __initconst = {
"crypt",
"delay",
"linear",
--
2.20.1


2019-03-22 08:35:15

by David Howells

[permalink] [raw]
Subject: Re: [PATCH 12/17] afs: Avoid section confusion in CM_NAME

Andi Kleen <[email protected]> wrote:

> __tracepoint_str cannot be const because the tracepoint_str
> section is not read-only. Remove the stray const.
>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

Acked-by: David Howells <[email protected]>

2019-03-22 14:00:02

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

On Thu, Mar 21, 2019 at 11:00 PM Andi Kleen <[email protected]> wrote:
>
> From: Andi Kleen <[email protected]>
>
> This warning is very noisy in a default build with gcc 9.
> Move it into W=2 only.
>
> Cc: [email protected]
> Cc: Masahiro Yamada <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>

I think W=2 is too aggressive. On many architectures, this finds
real bugs and the false positives tend to be trivial to fix
(by removing the bogus __packed annotation), which improves
the generated code in the process.

Moving it to W=1 for the moment is probably fine, but ideally
I think we should fix the kernel to behave according to the
C standard.

Arnd

> ---
> Makefile | 1 +
> scripts/Makefile.extrawarn | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 71c2c482e2a2..78efbb172991 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -676,6 +676,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning,frame-address,)
> KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
> KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
> KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context)
> +KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
>
> ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> KBUILD_CFLAGS += $(call cc-option,-Oz,-Os)
> diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
> index 768306add591..ed974a0f4a47 100644
> --- a/scripts/Makefile.extrawarn
> +++ b/scripts/Makefile.extrawarn
> @@ -43,6 +43,7 @@ warning-2 += $(call cc-option, -Wmissing-field-initializers)
> warning-2 += $(call cc-option, -Wsign-compare)
> warning-2 += $(call cc-option, -Wmaybe-uninitialized)
> warning-2 += $(call cc-option, -Wunused-macros)
> +warning-2 += $(call cc-option, -Waddress-of-packed-member)
>
> warning-3 := -Wbad-function-cast
> warning-3 += -Wcast-qual
> --
> 2.20.1
>

2019-03-22 16:34:21

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

On Fri, Mar 22, 2019 at 02:58:51PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 21, 2019 at 11:00 PM Andi Kleen <[email protected]> wrote:
> >
> > From: Andi Kleen <[email protected]>
> >
> > This warning is very noisy in a default build with gcc 9.
> > Move it into W=2 only.
> >
> > Cc: [email protected]
> > Cc: Masahiro Yamada <[email protected]>
> > Signed-off-by: Andi Kleen <[email protected]>
>
> I think W=2 is too aggressive. On many architectures, this finds
> real bugs and the false positives tend to be trivial to fix
> (by removing the bogus __packed annotation), which improves
> the generated code in the process.

It might waste more memory or make the data structure incompatible
with something external.

>
> Moving it to W=1 for the moment is probably fine, but ideally
> I think we should fix the kernel to behave according to the
> C standard.

Okay, so is there anybody working on this?

AFAIK it's not an issue on x86 at all.

-Andi

2019-03-22 16:41:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

On Fri, Mar 22, 2019 at 02:58:51PM +0100, Arnd Bergmann wrote:
> On Thu, Mar 21, 2019 at 11:00 PM Andi Kleen <[email protected]> wrote:
> >
> > From: Andi Kleen <[email protected]>
> >
> > This warning is very noisy in a default build with gcc 9.
> > Move it into W=2 only.
> >
> > Cc: [email protected]
> > Cc: Masahiro Yamada <[email protected]>
> > Signed-off-by: Andi Kleen <[email protected]>
>
> I think W=2 is too aggressive. On many architectures, this finds
> real bugs and the false positives tend to be trivial to fix
> (by removing the bogus __packed annotation), which improves
> the generated code in the process.
>
> Moving it to W=1 for the moment is probably fine, but ideally
> I think we should fix the kernel to behave according to the
> C standard.

Lol... we're actively moving away from the C standard on many places.

Why does the silly compiler think it is a problem to take the address of
a member of a packed structure? That sounds like something that's
perfectly fine and useful.

2019-03-22 17:00:41

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

On Fri, Mar 22, 2019 at 5:40 PM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Mar 22, 2019 at 02:58:51PM +0100, Arnd Bergmann wrote:
> > On Thu, Mar 21, 2019 at 11:00 PM Andi Kleen <[email protected]> wrote:
> > >
> > > From: Andi Kleen <[email protected]>
> > >
> > > This warning is very noisy in a default build with gcc 9.
> > > Move it into W=2 only.
> > >
> > > Cc: [email protected]
> > > Cc: Masahiro Yamada <[email protected]>
> > > Signed-off-by: Andi Kleen <[email protected]>
> >
> > I think W=2 is too aggressive. On many architectures, this finds
> > real bugs and the false positives tend to be trivial to fix
> > (by removing the bogus __packed annotation), which improves
> > the generated code in the process.
> >
> > Moving it to W=1 for the moment is probably fine, but ideally
> > I think we should fix the kernel to behave according to the
> > C standard.
>
> Lol... we're actively moving away from the C standard on many places.
>
> Why does the silly compiler think it is a problem to take the address of
> a member of a packed structure? That sounds like something that's
> perfectly fine and useful.

The problem is casting a pointer of an unaligned variable to one
of higher alignment and passing that to a function that does not
expect unaligned data.

On CPUs that don't have unaligned load/store instructions, this
causes an alignment fault, which will have to be fixed up in
software by doing a bytewise access or crash the system.

Arnd

2019-03-22 17:00:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

> Lol... we're actively moving away from the C standard on many places.

Yes and also packed is not part of the C standard.

> Why does the silly compiler think it is a problem to take the address of
> a member of a packed structure? That sounds like something that's
> perfectly fine and useful.

Probably because a pointer reference doesn't do whatever magic
may be needed on architectures with poor misalignment handling.
In theory you would need memcpy().

In practice it's likely a lot of false positives, like
the architectures with poor misalignment handling are usually
in SOCs without PCI and they don't have the devices
with the problematic drivers.

-Andi

2019-03-23 09:46:47

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 11/17] x86/kprobes: Make trampoline_handler global and visible

On Thu, 21 Mar 2019 15:00:03 -0700
Andi Kleen <[email protected]> wrote:

> From: Andi Kleen <[email protected]>
>
> This function is referenced from assembler, so in LTO
> it needs to be global and visible to not be optimized away.
>

I got it.

Acked-by: Masami Hiramatsu <[email protected]>

Andi, out of curiousity, that is not only for x86, but
does all arch have to care it?

Thank you,

> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kernel/kprobes/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 31ab91c9c4e9..1309a4eb3119 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -752,7 +752,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> /*
> * Called from kretprobe_trampoline
> */
> -static __used void *trampoline_handler(struct pt_regs *regs)
> +__used __visible void *trampoline_handler(struct pt_regs *regs)
> {
> struct kretprobe_instance *ri = NULL;
> struct hlist_head *head, empty_rp;
> --
> 2.20.1
>


--
Masami Hiramatsu <[email protected]>

2019-03-23 14:37:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 11/17] x86/kprobes: Make trampoline_handler global and visible

On Sat, Mar 23, 2019 at 06:45:21PM +0900, Masami Hiramatsu wrote:
> On Thu, 21 Mar 2019 15:00:03 -0700
> Andi Kleen <[email protected]> wrote:
>
> > From: Andi Kleen <[email protected]>
> >
> > This function is referenced from assembler, so in LTO
> > it needs to be global and visible to not be optimized away.
> >
>
> I got it.
>
> Acked-by: Masami Hiramatsu <[email protected]>
>
> Andi, out of curiousity, that is not only for x86, but
> does all arch have to care it?

Yes all architectures need it for LTO, but I currently
only test x86. There are some other people who made it work
on ARM / MIPS / PowerPC before.

-Andi

2019-03-25 01:55:20

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 11/17] x86/kprobes: Make trampoline_handler global and visible

On Sat, 23 Mar 2019 07:35:18 -0700
Andi Kleen <[email protected]> wrote:

> On Sat, Mar 23, 2019 at 06:45:21PM +0900, Masami Hiramatsu wrote:
> > On Thu, 21 Mar 2019 15:00:03 -0700
> > Andi Kleen <[email protected]> wrote:
> >
> > > From: Andi Kleen <[email protected]>
> > >
> > > This function is referenced from assembler, so in LTO
> > > it needs to be global and visible to not be optimized away.
> > >
> >
> > I got it.
> >
> > Acked-by: Masami Hiramatsu <[email protected]>
> >
> > Andi, out of curiousity, that is not only for x86, but
> > does all arch have to care it?
>
> Yes all architectures need it for LTO, but I currently
> only test x86. There are some other people who made it work
> on ARM / MIPS / PowerPC before.

That's good to know! I worried that similar mistake I made
on another arch :)

Thank you,

>
> -Andi


--
Masami Hiramatsu <[email protected]>

2019-03-25 16:27:50

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

From: Andi Kleen
> Sent: 22 March 2019 16:58
> > Lol... we're actively moving away from the C standard on many places.
>
> Yes and also packed is not part of the C standard.
>
> > Why does the silly compiler think it is a problem to take the address of
> > a member of a packed structure? That sounds like something that's
> > perfectly fine and useful.
>
> Probably because a pointer reference doesn't do whatever magic
> may be needed on architectures with poor misalignment handling.
> In theory you would need memcpy().

And you can't use memcpy() on a misaligned pointer because the
C standard doesn't allow misaligned pointer to happen and the
compiler will optimise the memcpy() to aligned register moves.

> In practice it's likely a lot of false positives, like
> the architectures with poor misalignment handling are usually
> in SOCs without PCI and they don't have the devices
> with the problematic drivers.

A lot of it is code that has structures marked 'packed' when
they don't need to be at all.
Structures that map hardware registers probably want the
'error if any padding' check - but some people seem to think
they should be 'packed'.

For hardware structures that have fields on incorrect boundaries
you need to mark the field member as having a lesser alignment
rather than making the structure itself packed.
Do it right and you can get the compiler to do 16bit accesses
for a 32bit field that might be on a 4n+2 boundary.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2019-03-26 08:19:52

by Yi Sun

[permalink] [raw]
Subject: Re: [PATCH 15/17] x86/hyperv: Make hv_vcpu_is_preempted visible

On 19-03-21 15:00:07, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> This function is referrenced from assembler, so need to be marked
> visible for LTO.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 3a025de64bf8 x86/hyperv: Enable PV qspinlock for Hyper-V
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/hyperv/hv_spinlock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
> index a861b0456b1a..07f21a06392f 100644
> --- a/arch/x86/hyperv/hv_spinlock.c
> +++ b/arch/x86/hyperv/hv_spinlock.c
> @@ -56,7 +56,7 @@ static void hv_qlock_wait(u8 *byte, u8 val)
> /*
> * Hyper-V does not support this so far.
> */
> -bool hv_vcpu_is_preempted(int vcpu)
> +__visible bool hv_vcpu_is_preempted(int vcpu)
> {
> return false;
> }

Reviewed-by: Yi Sun <[email protected]>

> --
> 2.20.1

2019-03-26 08:44:34

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [PATCH 14/17] crypto: Don't mark AES tables const and cacheline_aligned

On 21/03/2019 23.00, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> cacheline_aligned is a special section. It cannot be const at the same
> time because it's not read-only. It doesn't give any MMU protection.

Urgh. Perhaps this instead just wanted to use the quadruple underscore
version, ____cacheline_aligned ? That would make these objects correctly
aligned and still live in .rodata, no?

Rasmus

2019-03-26 17:02:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Fixes and cleanup from LTO tree

Andi,

On Thu, 21 Mar 2019, Andi Kleen wrote:

> Here are a range of bug fixes and cleanups that have accumulated in my
> gcc Link Time Optimization (LTO) branches; for issues found
> by the compiler when doing global optimization and a few
> other issues.
>
> (https://github.com/andikleen/linux-misc lto-*)
>
> IMNSHO they are all useful improvements even without LTO support.
>
> About half of it is in x86 specific code, but the others are
> random all over. I tried to always copy the respective maintainers,
> but since it's (nearly) a tree sweep I'm also copying Andrew.

Can you please once and forever stop sending a random pile of patches which
are:

- fixes independent of LTO
- LTO required changes
- RFC material

It's very clear where x86 related patches go through and it's also clear
that fixes have to be separate from features and other material.

You complain about maintainers being inresponsive and slow, but you are not
even trying to make their work easier by following the general process.

Thanks,

tglx

2019-03-26 17:05:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

Andi,

On Thu, 21 Mar 2019, Andi Kleen wrote:

> With gcc 8 toplevel assembler statements that do not mark themselves
> as .text may end up in other sections.

Which is clearly a change in behaviour. Is that intended or just yet
another feature of GCC?

Your subject says: 'x86, lto:'

So is this a LTO related problem or is the section randomization
independent of LTO?

This wants to be clearly documented in the changelog.

Aside of that the proper Subject prefix is either:

x86/asm/lto:

or

x86/asm:

dependent on the nature. Like it or not, but this has been the prefix x86
uses for a very long time already.

> I had boot crashes because
> various assembler statements ended up in the middle of the initcall
> section.
>
> Always mark all the top level assembler statements as text
> so that they switch to the right section.
>
> For AMD "vide", which is only used on 32bit kernels, I also
> marked it as 32bit only.

Once more. See

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

"Describe your changes in imperative mood, e.g. “make xyzzy do frotz”
instead of “[This patch] makes xyzzy do frotz” or “[I] changed xyzzy to
do frotz”, as if you are giving orders to the codebase to change its
behaviour."

This is the last time, I'm asking for this.

Thanks,

tglx

2019-03-26 17:25:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 08/17] x86/speculation: Fix __initconst in bugs.c

On Thu, 21 Mar 2019, Andi Kleen wrote:

Finally found a fix in the pile of unrelated crap.

Subject: x86/speculation: Fix __initconst in bugs.c

So how does this fix __initconst?

Again from Documentation:

"The summary phrase in the email?s Subject should concisely describe the
patch which that email contains. The summary phrase should not be a
filename."

Also instead of having the file name in the subject line, which is
completely uninteresting the prefix should be 'x86/cpu/bugs:'

> Fix some of the recently added const tables to use __initconst
> for const data instead of __initdata which causes section attribute
> conflicts.
>
> Signed-off-by: Andi Kleen <[email protected]>

This lacks a 'Fixes:' tag.

Thanks,

tglx

2019-03-26 18:32:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 14/17] crypto: Don't mark AES tables const and cacheline_aligned

On Tue, Mar 26, 2019 at 09:42:13AM +0100, Rasmus Villemoes wrote:
> On 21/03/2019 23.00, Andi Kleen wrote:
> > From: Andi Kleen <[email protected]>
> >
> > cacheline_aligned is a special section. It cannot be const at the same
> > time because it's not read-only. It doesn't give any MMU protection.
>
> Urgh. Perhaps this instead just wanted to use the quadruple underscore
> version, ____cacheline_aligned ? That would make these objects correctly
> aligned and still live in .rodata, no?

Yes that seems to be the right fix.

.rodata is 4K aligned, so it supports 64 bytes too.

-Andi

2019-03-26 21:38:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Tue, Mar 26, 2019 at 06:03:59PM +0100, Thomas Gleixner wrote:
> Andi,
>
> On Thu, 21 Mar 2019, Andi Kleen wrote:
>
> > With gcc 8 toplevel assembler statements that do not mark themselves
> > as .text may end up in other sections.
>
> Which is clearly a change in behaviour. Is that intended or just yet
> another feature of GCC?

I'm not sure it's a new behavior, but I've seen it first
with gcc 8.

>
> Your subject says: 'x86, lto:'
>
> So is this a LTO related problem or is the section randomization
> independent of LTO?

The basic behavior is independent of LTO, but I've only seen
failures with LTO. But I believe in theory it could lead
to failures even without LTO.

-Andi

2019-03-26 22:26:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

Andi.

On Tue, 26 Mar 2019, Andi Kleen wrote:
> On Tue, Mar 26, 2019 at 06:03:59PM +0100, Thomas Gleixner wrote:
> > On Thu, 21 Mar 2019, Andi Kleen wrote:
> >
> > > With gcc 8 toplevel assembler statements that do not mark themselves
> > > as .text may end up in other sections.
> >
> > Which is clearly a change in behaviour. Is that intended or just yet
> > another feature of GCC?
>
> I'm not sure it's a new behavior, but I've seen it first
> with gcc 8.

Ok.

> > Your subject says: 'x86, lto:'
> >
> > So is this a LTO related problem or is the section randomization
> > independent of LTO?
>
> The basic behavior is independent of LTO, but I've only seen
> failures with LTO. But I believe in theory it could lead
> to failures even without LTO.

Well, we better should know the real reason for this wreckage. I mean, the
default section for text is suprisingly .text. I don't see a reason why
this would be any different for an assembly function implemented in a C
file.

So the question is whether GCC does something silly in general which gets
'repaired' silentely by the linker or whether it's just an LTO issue.

If it's the former, then we must backport those fixes.

Could you please verify with the GCC people as you seem to have a
reproducer of some sort.

Thanks,

tglx

2019-03-27 00:56:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

> Well, we better should know the real reason for this wreckage. I mean, the
> default section for text is suprisingly .text. I don't see a reason why
> this would be any different for an assembly function implemented in a C
> file.

What happens is that when the function before the asm changes
the section, gcc only changes it back for the next function/variable
with a different section. But it doesn't change it back for the asm.


e.g. here

__attribute__((section("foo"))) void func(void)
{
}

asm("foo:\n");

gives with gcc -S (might be different with optimization):

.section foo,"ax",@progbits <----------------- sets the section
.globl func
.type func, @function
func:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size func, .-func
<--------------------------- no section reset before the asm
#APP
foo:

.ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
.section .note.GNU-stack,"",@progbits


> So the question is whether GCC does something silly in general which gets
> 'repaired' silentely by the linker or whether it's just an LTO issue.

The linker has no idea about the boundaries between functions and toplevel
asms. So if gcc doesn't reset the section they stay in the same.

My LTO build was just unlucky I think because it ended up with
asm next to initcall function, which is likely not common.

But gcc reorders functions even without LTO inside files,
so it could eventually happen.

>
> If it's the former, then we must backport those fixes.
>
> Could you please verify with the GCC people as you seem to have a
> reproducer of some sort.

Okay. I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89839

I thought I had already done that, but it seems I didn't.

-Andi


2019-03-27 01:09:16

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

Following up to myself here...

> > If it's the former, then we must backport those fixes.
> >
> > Could you please verify with the GCC people as you seem to have a
> > reproducer of some sort.
>
> Okay. I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89839
>
> I thought I had already done that, but it seems I didn't.

Looks like they think the kernel should be fixed.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89839

Status|UNCONFIRMED |RESOLVED
Resolution|--- |INVALID


--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
No, toplevel inline-asm should never assume what section it is in.


-Andi


2019-03-27 14:21:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

Andi,

On Tue, 26 Mar 2019, Andi Kleen wrote:
> > Well, we better should know the real reason for this wreckage. I mean, the
> > default section for text is suprisingly .text. I don't see a reason why
> > this would be any different for an assembly function implemented in a C
> > file.
>
> What happens is that when the function before the asm changes
> the section, gcc only changes it back for the next function/variable
> with a different section. But it doesn't change it back for the asm.
>
>
> e.g. here
>
> __attribute__((section("foo"))) void func(void)
> {
> }
>
> asm("foo:\n");
>
> gives with gcc -S (might be different with optimization):
>
> .section foo,"ax",@progbits <----------------- sets the section
> .globl func
> .type func, @function

SNIP

> .LFE0:
> .size func, .-func
> <--------------------------- no section reset before the asm
> #APP
> foo:
>
> .ident "GCC: (GNU) 8.3.1 20190223 (Red Hat 8.3.1-2)"
> .section .note.GNU-stack,"",@progbits

Makes sense, but comes as a surprise when the thing is actually marked as a
function.

> But gcc reorders functions even without LTO inside files, so it could
> eventually happen.

Adding

+void __init foo(void)
+{
+ pr_info("foo\n");
+}

right before the kretprobe_trampoline and compiling it with GCC 6.

So one would assume that kretprobe_trampoline now ends up in
.init.text. But it ends up in the .text section because it's reordered and
ends up at the top of .text.

So clearly stuff gets reordered and those top level ASM constructs which
lack a section are just working by chance and we need the annotations and
backport them.

We also need a way to detect such wreckage automatically. This can happen
again and as the GCC behaviour is random there is no guarantee that it's
noticed right away. Josh, can objtool help here or do we need some other
form of checking that?

So independent of the LTO issue, this information needs to be in the
changelog.

For the patch itself. The kprobes/vide/error-inject parts are fine because
these are clearly functions: ".type $NAME, @function\n".

But this hunk not so much:

--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -372,7 +372,7 @@ extern struct paravirt_patch_template pv_ops;

> #define DEF_NATIVE(ops, name, code) \
> __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
> + asm(".text\n\t" NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))

Because it is NOT text.

That 'code' is never executed in place. It's a patch table, which is used
by the alternative code to patch in the native instructions so the pv_ops
indirection can be avoided on bare metal. It's only copied into a buffer
nothing else. So blindly slapping '.text' on it is just wrong.

But that's not the only thing which is wrong here. DEF_NATIVE is only used
in paravirt_patch_32/64.c and the resulting labels are not used outside of
this either. So why are these labels global and the c declaration __visible
extern?

global was already in the original paravirt code and should have never been
there in the first place.

But __visible? That was added via:

commit 9a55fdbe941e ("x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops")

with a completely empty changelog. Really helpful.

And further down the road it was again LTO "improved":

commit 824a2870098fa536 ("x86, asmlinkage, paravirt: Don't rely on local assembler labels")

with the following changelog in 2013:

"The paravirt patching code assumes that it can reference a
local assembler label between two different top level assembler
statements. This does not work with LTO
where the assembler code may end up in different assembler files.

Replace it with extern / global /asm linkage labels."

This clearly shows that it was never analyzed proper and even the current
patch lacks any form of proper root cause analysis as the "changelog"
clearly shows:

"With gcc 8 toplevel assembler statements that do not mark themselves
as .text may end up in other sections. I had boot crashes because
various assembler statements ended up in the middle of the initcall
section."

Admittedly it contains at least some information, which is progress over an
empty changelog. But it's clearly NOT a gcc8 problem and it has absolutely
nothing to do with LTO, which the subject suggests.

Is it really necessary, that I need to:

- urge you to talk with GCC people?

- ask about whether this needs to be backported?

- ask whether this is an LTO only problem?

- do your homework of analysing the root cause?

- do your homework of analysing the patched code?

- do your homework of fixing it proper?

And you ask why it takes ages to get your stuff merged? Yes, it takes ages
because patches based on 'works for me' engineering are simply not
acceptable. You have a proven track record of that and I'm trusting you and
your patches not at all. Done that, got burned often enough. Not going to
happen again. It's solely up to you to change that situation.

Proper fix below.

Thanks,

tglx

8<-------------------

--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -367,11 +367,15 @@ extern struct paravirt_patch_template pv
_paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")

/* Simple instruction patching code. */
-#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
+#define NATIVE_LABEL(a,x,b) "\n" a #x "_" #b ":\n\t"

#define DEF_NATIVE(ops, name, code) \
- __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
- asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
+ static const char start_##ops##_##name[], end_##ops##_##name[]; \
+ asm(".pushsection .rodata, \"a\"\n" \
+ NATIVE_LABEL("start_", ops, name) \
+ code \
+ NATIVE_LABEL("end_", ops, name) \
+ ".popsection\n")

unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
unsigned paravirt_patch_default(u8 type, void *insnbuf,




2019-03-27 14:40:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text


* Thomas Gleixner <[email protected]> wrote:

> Proper fix below.
>
> Thanks,
>
> tglx
>
> 8<-------------------
>
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -367,11 +367,15 @@ extern struct paravirt_patch_template pv
> _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
>
> /* Simple instruction patching code. */
> -#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
> +#define NATIVE_LABEL(a,x,b) "\n" a #x "_" #b ":\n\t"
>
> #define DEF_NATIVE(ops, name, code) \
> - __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
> + static const char start_##ops##_##name[], end_##ops##_##name[]; \
> + asm(".pushsection .rodata, \"a\"\n" \
> + NATIVE_LABEL("start_", ops, name) \
> + code \
> + NATIVE_LABEL("end_", ops, name) \
> + ".popsection\n")

Acked-by: Ingo Molnar <[email protected]>

Thanks,

Ingo

2019-03-27 15:00:41

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Wed, Mar 27, 2019 at 03:20:08PM +0100, Thomas Gleixner wrote:
> +void __init foo(void)
> +{
> + pr_info("foo\n");
> +}
>
> right before the kretprobe_trampoline and compiling it with GCC 6.
>
> So one would assume that kretprobe_trampoline now ends up in
> .init.text. But it ends up in the .text section because it's reordered and
> ends up at the top of .text.

You would see the breakage with -fno-toplevel-reorder

> We also need a way to detect such wreckage automatically. This can happen
> again and as the GCC behaviour is random there is no guarantee that it's
> noticed right away. Josh, can objtool help here or do we need some other
> form of checking that?

It would surprise me if objtool could do it generally because the toplevel
assembler could be anything and may not be distinguishable from
C code. I guess it could catch cases of code ending up in initdata,
but it probably wouldn't work for inittext, which could happen too.

Code review is enough hopefully? Just every top level asm needs
a section.

> Because it is NOT text.

Makes sense.

I guess module loading needs it, otherwise it could just be initdata.

> But that's not the only thing which is wrong here. DEF_NATIVE is only used
> in paravirt_patch_32/64.c and the resulting labels are not used outside of
> this either. So why are these labels global and the c declaration __visible
> extern?

LTO needs any C symbols that are referenced from assembler to be global
and visible because the asm statement could end up in a different
assembler file. This is a different issue from the section
problem.

> This clearly shows that it was never analyzed proper and even the current
> patch lacks any form of proper root cause analysis as the "changelog"
> clearly shows:

This wasn't because of the section problem, just the orthogonal
file reordering problem described above. Given the changelogs could have
been better. But the root cause is/was clear.

> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -367,11 +367,15 @@ extern struct paravirt_patch_template pv
> _paravirt_alt(insn_string, "%c[paravirt_typenum]", "%c[paravirt_clobber]")
>
> /* Simple instruction patching code. */
> -#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
> +#define NATIVE_LABEL(a,x,b) "\n" a #x "_" #b ":\n\t"
>
> #define DEF_NATIVE(ops, name, code) \
> - __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
> + static const char start_##ops##_##name[], end_##ops##_##name[]; \

Please don't apply the static/__visible removal hunk, I will just need to
revert it again for LTO.

> + asm(".pushsection .rodata, \"a\"\n" \
> + NATIVE_LABEL("start_", ops, name) \
> + code \
> + NATIVE_LABEL("end_", ops, name) \
> + ".popsection\n")

That part looks good.


-Andi


2019-03-27 15:09:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Wed, 27 Mar 2019, Andi Kleen wrote:
> On Wed, Mar 27, 2019 at 03:20:08PM +0100, Thomas Gleixner wrote:
> > /* Simple instruction patching code. */
> > -#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
> > +#define NATIVE_LABEL(a,x,b) "\n" a #x "_" #b ":\n\t"
> >
> > #define DEF_NATIVE(ops, name, code) \
> > - __visible extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> > - asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, name))
> > + static const char start_##ops##_##name[], end_##ops##_##name[]; \
>
> Please don't apply the static/__visible removal hunk, I will just need to
> revert it again for LTO.

Why on earth is this needed for LTO?

From the GCC manual:

"This attribute, attached to a global variable or function, nullifies the
effect of the -fw hole-program command-line option, so the object remains
visible outside the current compilation unit."

Neither the variable nor the data generated are global anymore. This data
is only used inside this compilation unit and I don't see why LTO needs a
reference outside of it. If so, then I really want to understand WHY
exactly.

Thanks,

tglx

2019-03-27 15:46:52

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

> Why on earth is this needed for LTO?
>
> From the GCC manual:
>
> "This attribute, attached to a global variable or function, nullifies the
> effect of the -fw hole-program command-line option, so the object remains
> visible outside the current compilation unit."
>
> Neither the variable nor the data generated are global anymore. This data
> is only used inside this compilation unit and I don't see why LTO needs a
> reference outside of it. If so, then I really want to understand WHY
> exactly.

The LTO code generation doesn't compile file by file, but reorders
all the functions and other top level statements into "partitions".
The partitions are ordered by call graph so that inlining and some
other optimizations work efficiently

Then it finally runs multiple copies of the gcc code generator that
generate code from these partitions, each generating an own assembler file.

The top level assembler is not part of the call graph that drives
the partitioning.

So it can happen (in fact it's likely) that the top level assembler
statement ends up in a different partition and final assembler file
than the other functions in the same source file.

The repartitioning handles all symbols that are visible to C
automatically.

But for top level asm() referencing C symbols gcc doesn't know what
it is referenced.

Normally in non LTO it just works because everything ends up
in the same assembler file and the assembler can resolve the static label.

But that won't work if it's in a different assembler file,
as in LTO.

Fixing it would probably require adding some new syntax to top
level asm to declare symbols it referenced, so that gcc understands
the references.

But if we make it __visible and global it works too because
the linker resolves it. That's simple enough, although can be a bit
unexpected.

More information on LTO internals:
https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html#LTO-Overview

-Andi


2019-03-27 19:10:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Wed, 27 Mar 2019, Andi Kleen wrote:
> > From the GCC manual:
> >
> > "This attribute, attached to a global variable or function, nullifies the
> > effect of the -fw hole-program command-line option, so the object remains
> > visible outside the current compilation unit."
> >
> > Neither the variable nor the data generated are global anymore. This data
> > is only used inside this compilation unit and I don't see why LTO needs a
> > reference outside of it. If so, then I really want to understand WHY
> > exactly.
>
> The LTO code generation doesn't compile file by file, but reorders
> all the functions and other top level statements into "partitions".
> The partitions are ordered by call graph so that inlining and some
> other optimizations work efficiently
>
> Then it finally runs multiple copies of the gcc code generator that
> generate code from these partitions, each generating an own assembler file.
>
> The top level assembler is not part of the call graph that drives
> the partitioning.

There is no top level assembly anymore. It's static data in the .rodata
section which is completely uninteresting for a call graph. That data is
accessed by the C function.

It's static so it's scope is within the file and whatever GCC does with
that C function it has to respect that it accesses static data. If that's
not true then this really needs to be fixed at the compiler side and not in
the kernel.

Thanks,

tglx

2019-03-27 20:42:04

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

> + asm(".pushsection .rodata, \"a\"\n" \
> + NATIVE_LABEL("start_", ops, name) \
> + code \
> + NATIVE_LABEL("end_", ops, name) \
> + ".popsection\n")

>
> It's static so it's scope is within the file and whatever GCC does with
> that C function it has to respect that it accesses static data. If that's
> not true then this really needs to be fixed at the compiler side and not in
> the kernel.

Ok so you did the statics with undefined size, so kind of an extern static.
That's a weird construct (not sure if it's even allowed in standard C), but
somehow it seems to work in gcc with the inline assembler.

I checked the code general and with the .globl in NATIVE_LABEL the
generated assembler looks like it should work even for LTO yes.

I guess it's an interesting alternative to making them all global.
Maybe that will work for more cases too.

Thanks.

-Andi

2019-03-27 21:56:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Wed, 27 Mar 2019, Andi Kleen wrote:
> >
> > It's static so it's scope is within the file and whatever GCC does with
> > that C function it has to respect that it accesses static data. If that's
> > not true then this really needs to be fixed at the compiler side and not in
> > the kernel.
>
> Ok so you did the statics with undefined size, so kind of an extern static.
> That's a weird construct (not sure if it's even allowed in standard C), but
> somehow it seems to work in gcc with the inline assembler.

Strict C89 does not allow that, but it does not allow a lot of other things
the kernel does.

> I checked the code general and with the .globl in NATIVE_LABEL the

With or without? I removed that as well.

> generated assembler looks like it should work even for LTO yes.
>
> I guess it's an interesting alternative to making them all global.
> Maybe that will work for more cases too.

Pretty much preferred over making static stuff global all over the place.

Thanks,

tglx

2019-03-27 22:32:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

> > I checked the code general and with the .globl in NATIVE_LABEL the
>
> With or without? I removed that as well.

With.

LTO would still need the .globls because the variable and the asm
statement can end up in different assembler files, and resolution
would rely on the linker.


-Andi


2019-03-27 22:51:54

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Wed, 27 Mar 2019, Andi Kleen wrote:

> > > I checked the code general and with the .globl in NATIVE_LABEL the
> >
> > With or without? I removed that as well.
>
> With.
>
> LTO would still need the .globls because the variable and the asm
> statement can end up in different assembler files, and resolution
> would rely on the linker.

This really sucks. As this is a constant source of trouble, we might better
bite the bullet and get rid of this fancy macro completely. It's not that
we add these patch patterns every other day.

So a good old:

/* xor %eax %ex */
static const unsigend char patch_qspinlock[] = { 0x31, 0xc0 };

and then in the patch code:

return paravirt_patch_insns(ibuf, len, patch_qspinlock,
patch_qspinlock + ARRAY_SIZE(patch_qspinlock));

might be less trouble than dealing with that clever inline assembly
forever.

Thanks,

tglx

2019-04-01 01:07:11

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [PATCH 01/17] kbuild: Disable -Waddress-of-packed-member for gcc 9

Hi,

On Fri, Mar 22, 2019 at 10:59 PM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Mar 21, 2019 at 11:00 PM Andi Kleen <[email protected]> wrote:
> >
> > From: Andi Kleen <[email protected]>
> >
> > This warning is very noisy in a default build with gcc 9.
> > Move it into W=2 only.
> >
> > Cc: [email protected]
> > Cc: Masahiro Yamada <[email protected]>
> > Signed-off-by: Andi Kleen <[email protected]>
>
> I think W=2 is too aggressive. On many architectures, this finds
> real bugs and the false positives tend to be trivial to fix
> (by removing the bogus __packed annotation), which improves
> the generated code in the process.
>
> Moving it to W=1 for the moment is probably fine, but ideally
> I think we should fix the kernel to behave according to the
> C standard.
>
> Arnd



I agree to disable this warning option
since we see a lot of instances.

The room of argument is W=1 or W=2?
I do not have a strong opinion either way.

Arnd most actively takes care of warnings like this.
If he looks into these warnings, W=1 is fine with me.


--
Best Regards
Masahiro Yamada

2019-04-01 13:41:30

by Hugues Fruchet

[permalink] [raw]
Subject: Re: [PATCH 10/17] delta: Fix buffer overrun in delta_ipc_open

Hi Andi,

We have already discussed about that here:
https://lore.kernel.org/patchwork/patch/866406/

Now that strscpy is largely deployed within kernel, could you retest
with the change I suggested ?

Best regards,
Hugues.

On 3/21/19 11:00 PM, Andi Kleen wrote:
> From: Andi Kleen <[email protected]>
>
> delta_ipc_open is always called with a single constant string
> as name, but it uses a longer memcpy to copy the string to
> a different structure. The memcpy would read outside the bounds
> of the string, potentially accessing unmapped memory.
>
> Just use strcpy instead after clearing the area.
>
> This fixes a build error with LTO, which can detect this.
>
> Cc: [email protected]
> Cc: [email protected]
> Fixes: 91c83f395fbe [media] st-delta: rpmsg ipc support
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> drivers/media/platform/sti/delta/delta-ipc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/platform/sti/delta/delta-ipc.c b/drivers/media/platform/sti/delta/delta-ipc.c
> index a4603d573c34..bd1bbbeedec3 100644
> --- a/drivers/media/platform/sti/delta/delta-ipc.c
> +++ b/drivers/media/platform/sti/delta/delta-ipc.c
> @@ -175,8 +175,8 @@ int delta_ipc_open(struct delta_ctx *pctx, const char *name,
> msg.ipc_buf_size = ipc_buf_size;
> msg.ipc_buf_paddr = ctx->ipc_buf->paddr;
>
> - memcpy(msg.name, name, sizeof(msg.name));
> - msg.name[sizeof(msg.name) - 1] = 0;
> + memset(msg.name, 0, sizeof(msg.name));
> + strcpy(msg.name, name);
>
> msg.param_size = param->size;
> memcpy(ctx->ipc_buf->vaddr, param->data, msg.param_size);
>

2019-04-01 16:55:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 10/17] delta: Fix buffer overrun in delta_ipc_open

On Mon, Apr 01, 2019 at 01:37:56PM +0000, Hugues FRUCHET wrote:
> Hi Andi,
>
> We have already discussed about that here:
> https://lore.kernel.org/patchwork/patch/866406/
>
> Now that strscpy is largely deployed within kernel, could you retest
> with the change I suggested ?

strscpy is not the correct fix because it leaks uninitialized memory
to the receiver. You need the memset.

-Andi

2019-04-02 09:29:38

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 02/17] x86, lto: Mark all top level asm statements as .text

On Wed, 27 Mar 2019, Thomas Gleixner wrote:
> On Wed, 27 Mar 2019, Andi Kleen wrote:
>
> > > > I checked the code general and with the .globl in NATIVE_LABEL the
> > >
> > > With or without? I removed that as well.
> >
> > With.
> >
> > LTO would still need the .globls because the variable and the asm
> > statement can end up in different assembler files, and resolution
> > would rely on the linker.
>
> This really sucks. As this is a constant source of trouble, we might better
> bite the bullet and get rid of this fancy macro completely. It's not that
> we add these patch patterns every other day.
>
> So a good old:
>
> /* xor %eax %ex */
> static const unsigend char patch_qspinlock[] = { 0x31, 0xc0 };
>
> and then in the patch code:
>
> return paravirt_patch_insns(ibuf, len, patch_qspinlock,
> patch_qspinlock + ARRAY_SIZE(patch_qspinlock));
>
> might be less trouble than dealing with that clever inline assembly
> forever.

Just for the record. GCC people have confirmed that all the constructs suck
in one way or the other and are just working by chance. So the above is the
sanest solution.

Thanks,

tglx


2019-04-02 10:00:03

by Hugues Fruchet

[permalink] [raw]
Subject: Re: [PATCH 10/17] delta: Fix buffer overrun in delta_ipc_open

Hi Andi,

So do both, memset then strscpy:

+ memset(msg.name, 0, sizeof(msg.name));
+ if (strscpy(msg.name, name, sizeof(msg.name)) <= 0)
+ goto err;

BR,
Hugues.

On 4/1/19 6:54 PM, Andi Kleen wrote:
> On Mon, Apr 01, 2019 at 01:37:56PM +0000, Hugues FRUCHET wrote:
>> Hi Andi,
>>
>> We have already discussed about that here:
>> https://lore.kernel.org/patchwork/patch/866406/
>>
>> Now that strscpy is largely deployed within kernel, could you retest
>> with the change I suggested ?
>
> strscpy is not the correct fix because it leaks uninitialized memory
> to the receiver. You need the memset.
>
> -Andi
>