2020-07-29 23:11:25

by Stephen Rothwell

Subject: linux-next: build failure after merge of the origin tree

Hi all,

After merging the origin tree, today's linux-next build (x86_64
allmodconfig) failed like this:

In file included from include/asm-generic/percpu.h:7,
                 from arch/x86/include/asm/percpu.h:556,
                 from arch/x86/include/asm/preempt.h:6,
                 from include/linux/preempt.h:78,
                 from include/linux/spinlock.h:51,
                 from include/linux/seqlock.h:36,
                 from include/linux/time.h:6,
                 from include/linux/stat.h:19,
                 from include/linux/module.h:13,
                 from arch/x86/crypto/glue_helper.c:13:
include/linux/random.h:123:24: error: variable 'net_rand_state' with 'latent_entropy' attribute must not be local
  123 | DECLARE_PER_CPU(struct rnd_state, net_rand_state) __latent_entropy;
      |                        ^~~~~~~~~
include/linux/percpu-defs.h:88:38: note: in definition of macro 'DECLARE_PER_CPU_SECTION'
   88 |  extern __PCPU_ATTRS(sec) __typeof__(type) name
      |                                      ^~~~
include/linux/random.h:123:1: note: in expansion of macro 'DECLARE_PER_CPU'
  123 | DECLARE_PER_CPU(struct rnd_state, net_rand_state) __latent_entropy;
      | ^~~~~~~~~~~~~~~

Caused by commit

f227e3ec3b5c ("random32: update the net random state on interrupt and activity")

I have reverted that commit for today.

In case it matters:

$ x86_64-linux-gnu-gcc --version
x86_64-linux-gnu-gcc (Debian 9.3.0-13) 9.3.0

--
Cheers,
Stephen Rothwell


2020-07-29 23:44:51

by Linus Torvalds

Subject: Re: linux-next: build failure after merge of the origin tree

On Wed, Jul 29, 2020 at 4:08 PM Stephen Rothwell <[email protected]> wrote:
>
> include/linux/random.h:123:24: error: variable 'net_rand_state' with 'latent_entropy' attribute must not be local
> 123 | DECLARE_PER_CPU(struct rnd_state, net_rand_state) __latent_entropy;

Hmm.

Ok, this shows a limitation of my allmodconfig testing (and all my
normal builds) - no plugins. So that problem wasn't as obvious as it
should have been.

That error isn't very helpful, in that I think it actually is very
wrong. The variable really isn't local at all.

I think what the plugin *means* by "local" is "automatic", and I think
it uses the wrong test for it. IOW, looking at the plugin, it does

	if (!TREE_STATIC(*node)) {
		*no_add_attrs = true;
		error("variable %qD with %qE attribute must not be local",
			*node, name);

and what I think it really wants is that it has a static address - so
a global variable is fine - as opposed to being an actual static
declaration.
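
In plain C terms (ordinary C, not plugin code), the distinction is between storage duration and the `static` keyword. Every object in the sketch below except `local` has static storage duration, i.e. one fixed address for the whole run, whether it is `static` at file scope, defined with external linkage, or `static` inside a function; only the automatic variable is "local" in the sense the plugin seems to mean. The `extern` line declares the same object that the next line defines, mirroring the DECLARE_PER_CPU()/DEFINE_PER_CPU() split. All names are made up for illustration. Our understanding (an assumption, not checked against the plugin source) is that GCC marks such an `extern` declaration DECL_EXTERNAL and leaves TREE_STATIC clear, which would be enough to trip the check quoted above.

```c
/* Declaration only, as a header would provide it (cf. DECLARE_PER_CPU). */
extern unsigned long demo_global_state;

/* Definition: external linkage, static storage duration (cf. DEFINE_PER_CPU). */
unsigned long demo_global_state = 1;

/* Internal linkage, static storage duration: the old "static" variant. */
static unsigned long demo_file_static_state = 2;

/* A block-scope static is still one object with a fixed address. */
const unsigned long *demo_block_static_addr(void)
{
	static unsigned long block_static = 3;

	return &block_static;
}

/* Compares a static-duration counter with an automatic one. */
void demo_count(unsigned long *static_count, unsigned long *auto_count)
{
	static unsigned long calls;	/* persists across calls, starts at 0 */
	unsigned long local = 0;	/* automatic: a new object on every call */

	calls++;
	local++;
	*static_count = calls;
	*auto_count = local;
}
```

Only `local` restarts at zero on each call; the other objects are exactly the kind of variable a static initializer can be attached to.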

Also looking at the plugin, I suspect it is going to be very unhappy
about the fact that the attribute is there both on a declaration and
on the actual definition. The code later seems to really only want to
work on the definition, since it's creating an initializer..

IOW, I get the feeling that the plugin is confused, and it so happened
that the only variables we'd marked for latent entropy were static
ones. But I haven't done gcc plugins, so...

Adding the gcc plugin people. Otherwise the only option seems to be to
just remove that __latent_entropy marker.

Linus

2020-07-30 00:13:20

by Linus Torvalds

Subject: Re: linux-next: build failure after merge of the origin tree

On Wed, Jul 29, 2020 at 4:43 PM Linus Torvalds
<[email protected]> wrote:
>
> Ok, this shows a limitation of my allmodconfig testing (and all my
> normal builds) - no plugins. So that problem wasn't as obvious as it
> should have been.

Ok, that was easy to install and get the coverage, and now I see the error.

Except I still don't know the gcc plugins well enough to fix it at the
plugin level. And the gcc docs only talk about TREE_STATIC() for
functions, not for variables. Apparently variables should use
DECL_THIS_EXTERN or DECL_THIS_STATIC according to the docs I find, but..

Removing the __latent_entropy marker obviously fixes things.

Linus

2020-07-30 02:15:36

by Linus Torvalds

Subject: Re: linux-next: build failure after merge of the origin tree

On Wed, Jul 29, 2020 at 5:09 PM Linus Torvalds
<[email protected]> wrote:
>
> Removing the __latent_entropy marker obviously fixes things.

Ok, I did that for now. I spent a few minutes looking at the gcc
plugin in case I'd be hit by some sudden stroke of genius, but that
didn't happen, so let's avoid the issue until somebody who knows the
gcc plugins better can come up with what the right solution is.

Linus

2020-07-30 02:31:28

by Willy Tarreau

Subject: Re: linux-next: build failure after merge of the origin tree

On Wed, Jul 29, 2020 at 07:12:58PM -0700, Linus Torvalds wrote:
> On Wed, Jul 29, 2020 at 5:09 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > Removing the __latent_entropy marker obviously fixes things.
>
> Ok, I did that for now. I spent a few minutes looking at the gcc
> plugin in case I'd be hit by some sudden stroke of genius, but that
> didn't happen, so let's avoid the issue until somebody who knows the
> gcc plugins better can come up with what the right solution is.

I've looked at whether we could work around this by declaring another
static variable with __latent_entropy and using it to initialize
net_rand_state early, for example in prandom_init(), but there we
already fill net_rand_state with random values, so I'm wondering whether
__latent_entropy is used before prandom_init() or whether its sole purpose
is to provide extra initial entropy to be combined with the entropy
prandom_init() will add.
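
A rough userspace sketch of that workaround, under loud assumptions: a separate `static` seed stands in for a variable the plugin would, as we understand it, give a random initializer at build time, and it is folded into each CPU's state at init time alongside the random words the existing seeding already produces. `NR_DEMO_CPUS`, `demo_rnd_state`, `demo_latent_seed`, `demo_prandom_init()` and `demo_state()` are hypothetical names, the per-CPU variable is modelled as a plain array, and XOR is just one illustrative way to combine the two. The real prandom code also enforces a minimum value for each state word, which this sketch ignores.

```c
#include <stdint.h>

#define NR_DEMO_CPUS 4

/* Hypothetical stand-in for struct rnd_state: four 32-bit words. */
struct demo_rnd_state {
	uint32_t s1, s2, s3, s4;
};

/* Models the per-CPU net_rand_state as one slot per CPU. */
static struct demo_rnd_state demo_net_rand_state[NR_DEMO_CPUS];

/*
 * Stand-in for a separate static variable marked __latent_entropy.
 * Here it is a fixed constant; the plugin would pick it at build time.
 */
static const uint32_t demo_latent_seed[4] = {
	0x9e3779b9u, 0x7f4a7c15u, 0x85ebca6bu, 0xc2b2ae35u
};

/*
 * Hypothetical init: XOR the static seed into the per-CPU random words
 * that the existing seeding path would already use (passed in here).
 */
void demo_prandom_init(uint32_t random_words[NR_DEMO_CPUS][4])
{
	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++) {
		struct demo_rnd_state *st = &demo_net_rand_state[cpu];

		st->s1 = random_words[cpu][0] ^ demo_latent_seed[0];
		st->s2 = random_words[cpu][1] ^ demo_latent_seed[1];
		st->s3 = random_words[cpu][2] ^ demo_latent_seed[2];
		st->s4 = random_words[cpu][3] ^ demo_latent_seed[3];
	}
}

/* Read-only access to one CPU's state, for inspection. */
const struct demo_rnd_state *demo_state(int cpu)
{
	return &demo_net_rand_state[cpu];
}
```

Whether this buys anything depends on exactly the question raised above: if nothing reads net_rand_state before prandom_init(), a build-time seed only adds to what prandom_init() already mixes in.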

Willy

2020-07-30 03:18:56

by Kees Cook

Subject: Re: linux-next: build failure after merge of the origin tree

On Wed, Jul 29, 2020 at 04:43:04PM -0700, Linus Torvalds wrote:
> On Wed, Jul 29, 2020 at 4:08 PM Stephen Rothwell <[email protected]> wrote:
> >
> > include/linux/random.h:123:24: error: variable 'net_rand_state' with 'latent_entropy' attribute must not be local
> > 123 | DECLARE_PER_CPU(struct rnd_state, net_rand_state) __latent_entropy;
>
> Hmm.
>
> Ok, this shows a limitation of my allmodconfig testing (and all my
> normal builds) - no plugins. So that problem wasn't as obvious as it
> should have been.

I'll look into this more tomorrow. (But yes, __latent_entropy is
absolutely used for globals already, as you found, but this is the first
percpu it was applied to...)

> Adding the gcc plugin people. Otherwise the only option seems to be to
> just remove that __latent_entropy marker.

And just another heads-up, the patch[1] (which was never sent to a public
list) also breaks arm64 (circular header needs?):

$ make CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64 defconfig
...
$ make -j$(getconf _NPROCESSORS_ONLN) CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64
...
In file included from ./arch/arm64/include/asm/smp.h:33,
                 from ./include/linux/smp.h:82,
                 from ./include/linux/percpu.h:7,
                 from ./include/linux/random.h:14,
                 from arch/arm64/kernel/pointer_auth.c:5:
./arch/arm64/include/asm/pointer_auth.h: In function ‘ptrauth_keys_init_user’:
./arch/arm64/include/asm/pointer_auth.h:40:3: error: implicit declaration of function ‘get_random_bytes’; did you mean ‘get_random_once’? [-Werror=implicit-function-declaration]
40 | get_random_bytes(&keys->apia, sizeof(keys->apia));
| ^~~~~
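
The trace reads bottom-up: pointer_auth.c includes linux/random.h, which since f227e3ec3b5c includes linux/percpu.h, which pulls in linux/smp.h, then asm/smp.h, then asm/pointer_auth.h. Assuming asm/pointer_auth.h itself includes linux/random.h (we believe it does, since it calls get_random_bytes(), but treat that edge as an assumption), random.h's include guard is already defined at that point, so the nested include expands to nothing and get_random_bytes() is still undeclared when ptrauth_keys_init_user() uses it. A small model of just those headers as a graph, with a depth-first search for a cycle; the enum names and `build_graph()`/`has_include_cycle()` are hypothetical:

```c
#include <stdbool.h>

enum header {
	RANDOM_H,		/* include/linux/random.h */
	PERCPU_H,		/* include/linux/percpu.h */
	SMP_H,			/* include/linux/smp.h */
	ASM_SMP_H,		/* arch/arm64/include/asm/smp.h */
	POINTER_AUTH_H,		/* arch/arm64/include/asm/pointer_auth.h */
	NR_HEADERS
};

/* includes[a][b] is true when header a #includes header b. */
static bool includes[NR_HEADERS][NR_HEADERS];

void build_graph(bool smp_includes_pointer_auth)
{
	for (int a = 0; a < NR_HEADERS; a++)
		for (int b = 0; b < NR_HEADERS; b++)
			includes[a][b] = false;

	includes[RANDOM_H][PERCPU_H] = true;	/* added by f227e3ec3b5c */
	includes[PERCPU_H][SMP_H] = true;
	includes[SMP_H][ASM_SMP_H] = true;
	includes[ASM_SMP_H][POINTER_AUTH_H] = smp_includes_pointer_auth;
	includes[POINTER_AUTH_H][RANDOM_H] = true;	/* assumed edge */
}

/* DFS states: unvisited, on the current path, fully explored. */
enum colour { WHITE, GREY, BLACK };

static bool dfs(int h, enum colour *state)
{
	state[h] = GREY;
	for (int next = 0; next < NR_HEADERS; next++) {
		if (!includes[h][next])
			continue;
		if (state[next] == GREY)
			return true;	/* back edge: include cycle */
		if (state[next] == WHITE && dfs(next, state))
			return true;
	}
	state[h] = BLACK;
	return false;
}

bool has_include_cycle(void)
{
	enum colour state[NR_HEADERS] = { WHITE };

	for (int h = 0; h < NR_HEADERS; h++)
		if (state[h] == WHITE && dfs(h, state))
			return true;
	return false;
}
```

With the asm/smp.h -> asm/pointer_auth.h edge present, `build_graph(true)` yields a cycle; `build_graph(false)`, which corresponds to the arm64 patch later in the thread that drops that include, does not. Willy's patch instead targets the random.h end of the loop by changing which percpu header it includes.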


[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f227e3ec3b5cad859ad15666874405e8c1bbc1d4

--
Kees Cook

2020-07-30 03:24:18

by Willy Tarreau

Subject: Re: linux-next: build failure after merge of the origin tree

Hi Kees,

On Wed, Jul 29, 2020 at 08:17:48PM -0700, Kees Cook wrote:
> And just another heads-up, the patch[1] (which was never sent to a public
> list) also breaks arm64 (circular header needs?):
(...)

Definitely, we've just got a report about this, I'll have a look once
I'm at the office. I'd like to check that we don't obviously break
another arch by just removing percpu. If at least shuffling them around
is sufficient that'd be nice. Otherwise we'll likely need a separate
header (which is not a bad thing for the long term).

Thanks!
Willy

2020-07-30 06:16:46

by Willy Tarreau

Subject: Re: linux-next: build failure after merge of the origin tree

On Thu, Jul 30, 2020 at 05:22:50AM +0200, Willy Tarreau wrote:
> On Wed, Jul 29, 2020 at 08:17:48PM -0700, Kees Cook wrote:
> > And just another heads-up, the patch[1] (which was never sent to a public
> > list) also breaks arm64 (circular header needs?):
> (...)
>
> Definitely, we've just got a report about this, I'll have a look once
> I'm at the office. I'd like to check that we don't obviously break
> another arch by just removing percpu. If at least shuffling them around
> is sufficient that'd be nice. Otherwise we'll likely need a separate
> header (which is not a bad thing for the long term).

So Linus proposed a clean solution which might be harder to backport
but looks better for 5.8. However the attached one addresses the issue
for me on arm64 and still works on x86_64, arm, mips. I think we should
go with this one first then apply Linus' one on top of it to be long
term proof, and backport only the first one. Linus ?

Willy


Attachments:
0001-random-fix-circular-include-dependency-on-arm64-afte.patch (1.70 kB)

2020-07-30 10:10:27

by Catalin Marinas

Subject: Re: linux-next: build failure after merge of the origin tree

On Thu, Jul 30, 2020 at 10:59:09AM +0100, Marc Zyngier wrote:
> From 33d819f4efa0a4474b5dc2e4bcaef1b886ca30c3 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <[email protected]>
> Date: Thu, 30 Jul 2020 10:53:05 +0100
> Subject: [PATCH] arm64: Drop unnecessary include from asm/smp.h
>
> asm/pointer_auth.h is not needed anymore in asm/smp.h, as 62a679cb2825
> ("arm64: simplify ptrauth initialization") removed the keys from the
> secondary_data structure.
>
> This also cures a compilation issue introduced by f227e3ec3b5c
> ("random32: update the net random state on interrupt and activity").
>
> Fixes: 62a679cb2825 ("arm64: simplify ptrauth initialization")
> Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/include/asm/smp.h | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index ea268d88b6f7..a0c8a0b65259 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -30,7 +30,6 @@
> #include <linux/threads.h>
> #include <linux/cpumask.h>
> #include <linux/thread_info.h>
> -#include <asm/pointer_auth.h>
>
> DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);

I think this arm64 patch makes sense irrespective of any other generic
fixes. If Will wants to take it as a fix:

Acked-by: Catalin Marinas <[email protected]>

(otherwise I'll queue it for 5.9)

2020-07-30 10:26:28

by Marc Zyngier

Subject: Re: linux-next: build failure after merge of the origin tree

On 2020-07-30 07:14, Willy Tarreau wrote:
> On Thu, Jul 30, 2020 at 05:22:50AM +0200, Willy Tarreau wrote:
>> On Wed, Jul 29, 2020 at 08:17:48PM -0700, Kees Cook wrote:
>> > And just another heads-up, the patch[1] (which was never sent to a public
>> > list) also breaks arm64 (circular header needs?):
>> (...)
>>
>> Definitely, we've just got a report about this, I'll have a look once
>> I'm at the office. I'd like to check that we don't obviously break
>> another arch by just removing percpu. If at least shuffling them around
>> is sufficient that'd be nice. Otherwise we'll likely need a separate
>> header (which is not a bad thing for the long term).
>
> So Linus proposed a clean solution which might be harder to backport
> but looks better for 5.8. However the attached one addresses the issue
> for me on arm64 and still works on x86_64, arm, mips. I think we should
> go with this one first then apply Linus' one on top of it to be long
> term proof, and backport only the first one. Linus ?

So for what it's worth, this patch fixes the arm64 compilation problem
for me:

Tested-by: Marc Zyngier <[email protected]>

I had come up with a different fix, only touching arm64 (see below).

Thanks,

M.

From 33d819f4efa0a4474b5dc2e4bcaef1b886ca30c3 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <[email protected]>
Date: Thu, 30 Jul 2020 10:53:05 +0100
Subject: [PATCH] arm64: Drop unnecessary include from asm/smp.h

asm/pointer_auth.h is not needed anymore in asm/smp.h, as 62a679cb2825
("arm64: simplify ptrauth initialization") removed the keys from the
secondary_data structure.

This also cures a compilation issue introduced by f227e3ec3b5c
("random32: update the net random state on interrupt and activity").

Fixes: 62a679cb2825 ("arm64: simplify ptrauth initialization")
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/smp.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index ea268d88b6f7..a0c8a0b65259 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -30,7 +30,6 @@
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/thread_info.h>
-#include <asm/pointer_auth.h>

DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);

--
2.27.0


--
Who you jivin' with that Cosmik Debris?

2020-07-30 15:03:50

by Will Deacon

Subject: Re: linux-next: build failure after merge of the origin tree

On Thu, Jul 30, 2020 at 11:09:23AM +0100, Catalin Marinas wrote:
> On Thu, Jul 30, 2020 at 10:59:09AM +0100, Marc Zyngier wrote:
> > From 33d819f4efa0a4474b5dc2e4bcaef1b886ca30c3 Mon Sep 17 00:00:00 2001
> > From: Marc Zyngier <[email protected]>
> > Date: Thu, 30 Jul 2020 10:53:05 +0100
> > Subject: [PATCH] arm64: Drop unnecessary include from asm/smp.h
> >
> > asm/pointer_auth.h is not needed anymore in asm/smp.h, as 62a679cb2825
> > ("arm64: simplify ptrauth initialization") removed the keys from the
> > secondary_data structure.
> >
> > This also cures a compilation issue introduced by f227e3ec3b5c
> > ("random32: update the net random state on interrupt and activity").
> >
> > Fixes: 62a679cb2825 ("arm64: simplify ptrauth initialization")
> > Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
> > Signed-off-by: Marc Zyngier <[email protected]>
> > ---
> > arch/arm64/include/asm/smp.h | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> > index ea268d88b6f7..a0c8a0b65259 100644
> > --- a/arch/arm64/include/asm/smp.h
> > +++ b/arch/arm64/include/asm/smp.h
> > @@ -30,7 +30,6 @@
> > #include <linux/threads.h>
> > #include <linux/cpumask.h>
> > #include <linux/thread_info.h>
> > -#include <asm/pointer_auth.h>
> >
> > DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
>
> I think this arm64 patch makes sense irrespective of any other generic
> fixes. If Will wants to take it as a fix:
>
> Acked-by: Catalin Marinas <[email protected]>
>
> (otherwise I'll queue it for 5.9)

Cheers, I'll pick this up asap.

Will

2020-07-30 17:50:25

by Kees Cook

Subject: Re: linux-next: build failure after merge of the origin tree

On Thu, Jul 30, 2020 at 08:14:07AM +0200, Willy Tarreau wrote:
> On Thu, Jul 30, 2020 at 05:22:50AM +0200, Willy Tarreau wrote:
> > On Wed, Jul 29, 2020 at 08:17:48PM -0700, Kees Cook wrote:
> > > And just another heads-up, the patch[1] (which was never sent to a public
> > > list) also breaks arm64 (circular header needs?):
> > (...)
> >
> > Definitely, we've just got a report about this, I'll have a look once
> > I'm at the office. I'd like to check that we don't obviously break
> > another arch by just removing percpu. If at least shuffling them around
> > is sufficient that'd be nice. Otherwise we'll likely need a separate
> > header (which is not a bad thing for the long term).
>
> So Linus proposed a clean solution which might be harder to backport
> but looks better for 5.8. However the attached one addresses the issue
> for me on arm64 and still works on x86_64, arm, mips. I think we should
> go with this one first then apply Linus' one on top of it to be long
> term proof, and backport only the first one. Linus ?
>
> Willy

> From 18fba9e2dfb16605a722e01f95d9e2d020efaa42 Mon Sep 17 00:00:00 2001
> From: Willy Tarreau <[email protected]>
> Date: Thu, 30 Jul 2020 07:59:24 +0200
> Subject: random: fix circular include dependency on arm64 after addition of
> percpu.h
> MIME-Version: 1.0
> Content-Type: text/plain; charset=latin1
> Content-Transfer-Encoding: 8bit
>
> Daniel Díaz and Kees Cook independently reported that commit f227e3ec3b5c
> ("random32: update the net random state on interrupt and activity") broke
> arm64 due to a circular dependency on include files since the addition of
> percpu.h in random.h.
>
> The correct fix would definitely be to move all the prandom32 stuff out
> of random.h but for backporting, a smaller solution is preferred. This
> one replaces linux/percpu.h with asm/percpu.h, and this fixes the problem
> on x86_64, arm64, arm, and mips. Note that moving percpu.h around didn't
> change anything and that removing it entirely broke differently. When
> backporting, such options might still be considered if this patch fails
> to help.
>
> Reported-by: Daniel Díaz <[email protected]>
> Reported-by: Kees Cook <[email protected]>

FWIW, I was only a messenger. Sami (in Cc) pointed it out to me right
before I got the email from Linus for the x86 plugin breakage. :)

But yes, thanks, this seems to work for me.

> Fixes: f227e3ec3b5c

nit:

Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
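
The form Kees asks for is the first 12 characters of the commit's SHA-1, then the commit's one-line subject in parentheses and double quotes, on a single line. The kernel's submitting-patches documentation suggests generating it with git (`core.abbrev = 12` plus a `pretty.fixes` format of `Fixes: %h ("%s")`). A minimal C sketch of the same formatting, with `format_fixes_tag()` as a purely hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes 'Fixes: <first 12 hex digits> ("<subject>")' into buf.
 * Returns what snprintf() returns, or -1 if the hash is shorter than 12.
 */
int format_fixes_tag(char *buf, size_t len, const char *sha1,
		     const char *subject)
{
	if (strlen(sha1) < 12)
		return -1;
	return snprintf(buf, len, "Fixes: %.12s (\"%s\")", sha1, subject);
}
```

Fed the full hash from the commit URL earlier in the thread and the commit's subject, this produces exactly the line above.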

-Kees

> Cc: Stephen Rothwell <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Signed-off-by: Willy Tarreau <[email protected]>
> ---
> include/linux/random.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/random.h b/include/linux/random.h
> index f310897f051d..9ab7443bd91b 100644
> --- a/include/linux/random.h
> +++ b/include/linux/random.h
> @@ -11,7 +11,7 @@
> #include <linux/kernel.h>
> #include <linux/list.h>
> #include <linux/once.h>
> -#include <linux/percpu.h>
> +#include <asm/percpu.h>
>
> #include <uapi/linux/random.h>
>
> --
> 2.20.1
>


--
Kees Cook

2020-07-30 18:28:37

by Linus Torvalds

Subject: Re: linux-next: build failure after merge of the origin tree

On Wed, Jul 29, 2020 at 8:17 PM Kees Cook <[email protected]> wrote:
>
> I'll look into this more tomorrow. (But yes, __latent_entropy is
> absolutely used for globals already, as you found, but this is the first
> percpu it was applied to...)

Note that it was always per-cpu.

The only thing that changed was that it was declared static in
lib/random.c vs being externally visible.

So it's not about the percpu part - although that then showed the
arm64 circular include file problem. It's literally that now the exact
same thing is declared in a header file and not marked "static".

Now, I don't think the __latent_entropy code ever really worked all
that well for per-cpu initializations. It ends up generating one
single initializer, which obviously isn't optimal. But I guess it's as
good as it gets.
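
A simplified userspace model of why a single initializer is not ideal for a per-CPU variable: as far as we know, each CPU's copy of the per-CPU section is set up from one initial image, so a value baked into the definition appears identically in every CPU's copy until something reseeds it. The names below (`demo_template`, `demo_setup_per_cpu_areas()`, `demo_all_cpus_identical()`, `demo_reseed_cpu()`) are hypothetical, and the real setup code is considerably more involved.

```c
#include <stdint.h>
#include <string.h>

#define NR_DEMO_CPUS 4

struct demo_rnd_state {
	uint32_t s1, s2, s3, s4;
};

/* The single compile-time initializer, like a __latent_entropy definition. */
static const struct demo_rnd_state demo_template = {
	0x12345678u, 0x9abcdef0u, 0x0badf00du, 0xdeadbeefu
};

static struct demo_rnd_state demo_per_cpu[NR_DEMO_CPUS];

/* Every CPU's copy starts as a byte-for-byte copy of the same template. */
void demo_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_DEMO_CPUS; cpu++)
		memcpy(&demo_per_cpu[cpu], &demo_template, sizeof(demo_template));
}

/* Returns 1 if all CPUs currently hold identical state, else 0. */
int demo_all_cpus_identical(void)
{
	for (int cpu = 1; cpu < NR_DEMO_CPUS; cpu++)
		if (memcmp(&demo_per_cpu[cpu], &demo_per_cpu[0],
			   sizeof(demo_per_cpu[0])) != 0)
			return 0;
	return 1;
}

/* Hypothetical per-CPU reseed: mixes extra data and the CPU number in. */
void demo_reseed_cpu(int cpu, uint32_t extra)
{
	demo_per_cpu[cpu].s1 ^= extra + (uint32_t)cpu;
}
```

Right after setup all copies match; only a per-CPU reseed step makes them diverge.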

Unrelated side note: I notice that the plugins could be simplified a
bit now that we require gcc 4.9 or later. There's a fair amount of
cruft for the earlier gcc versions.

I'm not sure how seriously the gcc plugins are actually maintained (no
offense) aside from just keeping them limping along. Does anybody
actually use them in production? I thought google had mostly moved on
to clang.

Linus

2020-07-30 18:48:05

by Kees Cook

Subject: Re: linux-next: build failure after merge of the origin tree

On Thu, Jul 30, 2020 at 11:24:44AM -0700, Linus Torvalds wrote:
> On Wed, Jul 29, 2020 at 8:17 PM Kees Cook <[email protected]> wrote:
> >
> > I'll look into this more tomorrow. (But yes, __latent_entropy is
> > absolutely used for globals already, as you found, but this is the first
> > percpu it was applied to...)
>
> Note that it was always per-cpu.
>
> The only thing that changed was that it was declared static in
> lib/random32.c vs being externally visible.

Yup, thanks. I realized that a bit after sending my email. :)

> Unrelated side note: I notice that the plugins could be simplified a
> bit now that we require gcc 4.9 or later. There's a fair amount of
> cruft for the earlier gcc versions.

Yup -- Masahiro keeps poking the build system, but I haven't cleaned up
the header file macros to keep up with the recent jumps. (It falls a bit
low on my TODO list since it's a bit of a mechanical cleanup. I'm open
to anyone that would like to send patches, though!)

> I'm not sure how seriously the gcc plugins are actually maintained (no
> offense) aside from just keeping them limping along. Does anybody
> actually use them in production? I thought google had mostly moved on
> to clang.

They're part of regular testing, and there is ongoing development
(e.g. see Alex Popov's recent series[1], which is in -next waiting for
the v5.9 merge window). I hear regularly from folks using randstruct,
stackleak, structleak, and latent_entropy. But yes, Google has moved
to Clang where we're using Clang's implementation of structleak
(auto-var-init) but there has been work to get randstruct ported (as
desired by at least one Android vendor), though it's currently stalled.

-Kees

[1] https://lore.kernel.org/lkml/[email protected]/

--
Kees Cook