2024-06-01 17:03:03

by Erick Archer

[permalink] [raw]
Subject: [PATCH v4 0/3] Hardening perf subsystem

Hi everyone,

This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].

In the first patch, the "struct amd_uncore_ctx" can be refactored to
use a flex array for the "events" member. This way, the allocation/
freeing of the memory can be simplified. Then, the struct_size()
helper can be used to do the arithmetic calculation for the memory
to be allocated.

In the second patch, as the "struct intel_uncore_box" ends in a
flexible array, the preferred way in the kernel is to use the
struct_size() helper to do the arithmetic instead of the calculation
"size + count * size" in the kzalloc_node() function.

In the third patch, as the "struct perf_buffer" also ends in a
flexible array, the preferred way in the kernel is to use the
struct_size() helper to do the arithmetic instead of the calculation
"size + count * size" in the kzalloc_node() functions. At the same
time, prepare for the coming implementation by GCC and Clang of the
__counted_by attribute.

This time, I have decided to send these three patches in the same serie
since all of them has been rejected by the maintainers. I have used
the v4 tag since it is the latest iteration in one of the patches.

The reason these patches were rejected is that Peter Zijlstra detest
the struct_size() helper [3][4]. However, Kees Cook and I consider that
the use of this helper improves readability. But we can all say that it
is a matter of preference.

Anyway, leaving aside personal preferences, these patches has the
following pros:

- Code robustness improvement (__counted_by coverage).
- Code robustness improvement (use of struct_size() to do calculations
on memory allocator functions).
- Fewer lines of code.
- Follow the recommendations made in "Deprecated Interfaces, Language
Features, Attributes, and Conventions" [5] as the preferred method
of doing things in the kernel.
- Widely used in the kernel.
- Widely accepted in the kernel.

There are also patches in this subsystem that use the struct_size()
helper:

6566f907bf31 ("Convert intel uncore to struct_size") by Matthew Wilcox
dfbc411e0a5e ("perf/x86/rapl: Prefer struct_size() over open coded arithmetic") by me

Therefore, I would like these patches to be applied this time.

Best regards,
Erick

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Link: https://lore.kernel.org/linux-hardening/[email protected] [3]
Link: https://lore.kernel.org/linux-hardening/[email protected] [4]
Link: https://docs.kernel.org/process/deprecated.html [5]
---
Changes in v4:

- Add the "Reviewed-by:" tag to the three patches.
- Rebase against linux next (tag next-20240531).

Previous versions:

perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237E4848B44A5226BD3551C8BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/

perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237F4D39BF6AA0FF40E91638B392@AS8PR02MB7237.eurprd02.prod.outlook.com/

perf/ring_buffer: Prefer struct_size over open coded arithmetic
v3 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72379B4807F3951A1B926BA58BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/
v2 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237569E4FBE0B26F62FDFDB8B1D2@AS8PR02MB7237.eurprd02.prod.outlook.com/
v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72372AB065EA8340D960CCC48B1B2@AS8PR02MB7237.eurprd02.prod.outlook.com/

Changes in v3:
- Refactor the logic, compared to the previous version, of the second
rb_alloc() function to gain __counted_by() coverage (Kees Cook and
Christophe JAILLET).

Changes in v2:
- Annotate "struct perf_buffer" with __counted_by() attribute (Kees Cook).
- Refactor the logic to gain __counted_by() coverage (Kees Cook).
---
Erick Archer (3):
perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
perf/ring_buffer: Prefer struct_size over open coded arithmetic

arch/x86/events/amd/uncore.c | 18 +++++-------------
arch/x86/events/intel/uncore.c | 7 +++----
kernel/events/internal.h | 2 +-
kernel/events/ring_buffer.c | 15 ++++-----------
4 files changed, 13 insertions(+), 29 deletions(-)

--
2.25.1



2024-06-08 08:51:13

by Erick Archer

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

Hi Andrew,

On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> Hi everyone,
>
> This is an effort to get rid of all multiplications from allocation
> functions in order to prevent integer overflows [1][2].
>
> In the first patch, the "struct amd_uncore_ctx" can be refactored to
> use a flex array for the "events" member. This way, the allocation/
> freeing of the memory can be simplified. Then, the struct_size()
> helper can be used to do the arithmetic calculation for the memory
> to be allocated.
>
> In the second patch, as the "struct intel_uncore_box" ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() function.
>
> In the third patch, as the "struct perf_buffer" also ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() functions. At the same
> time, prepare for the coming implementation by GCC and Clang of the
> __counted_by attribute.
>
> This time, I have decided to send these three patches in the same serie
> since all of them has been rejected by the maintainers. I have used
> the v4 tag since it is the latest iteration in one of the patches.
>
> The reason these patches were rejected is that Peter Zijlstra detest
> the struct_size() helper [3][4]. However, Kees Cook and I consider that
> the use of this helper improves readability. But we can all say that it
> is a matter of preference.
>
> Anyway, leaving aside personal preferences, these patches has the
> following pros:
>
> - Code robustness improvement (__counted_by coverage).
> - Code robustness improvement (use of struct_size() to do calculations
> on memory allocator functions).
> - Fewer lines of code.
> - Follow the recommendations made in "Deprecated Interfaces, Language
> Features, Attributes, and Conventions" [5] as the preferred method
> of doing things in the kernel.
> - Widely used in the kernel.
> - Widely accepted in the kernel.
>
> There are also patches in this subsystem that use the struct_size()
> helper:
>
> 6566f907bf31 ("Convert intel uncore to struct_size") by Matthew Wilcox
> dfbc411e0a5e ("perf/x86/rapl: Prefer struct_size() over open coded arithmetic") by me
>
> Therefore, I would like these patches to be applied this time.

This is my last attemp to get these patches applied. I have decided to
send this mail to try to unjam this situation. I have folowed all the
reviewers comments and have no response from the maintainers other than
"I detest the struct_size() helper".

Therefore, I would like to know your opinion and that of other people
about these patches. If the final consensus is that the code has no real
benefit, I will stop insisting on it ;)

Regards,
Erick
>
> Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
> Link: https://github.com/KSPP/linux/issues/160 [2]
> Link: https://lore.kernel.org/linux-hardening/[email protected] [3]
> Link: https://lore.kernel.org/linux-hardening/[email protected] [4]
> Link: https://docs.kernel.org/process/deprecated.html [5]
> ---
> Changes in v4:
>
> - Add the "Reviewed-by:" tag to the three patches.
> - Rebase against linux next (tag next-20240531).
>
> Previous versions:
>
> perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
> v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237E4848B44A5226BD3551C8BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/
>
> perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
> v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237F4D39BF6AA0FF40E91638B392@AS8PR02MB7237.eurprd02.prod.outlook.com/
>
> perf/ring_buffer: Prefer struct_size over open coded arithmetic
> v3 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72379B4807F3951A1B926BA58BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/
> v2 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237569E4FBE0B26F62FDFDB8B1D2@AS8PR02MB7237.eurprd02.prod.outlook.com/
> v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72372AB065EA8340D960CCC48B1B2@AS8PR02MB7237.eurprd02.prod.outlook.com/
>
> Changes in v3:
> - Refactor the logic, compared to the previous version, of the second
> rb_alloc() function to gain __counted_by() coverage (Kees Cook and
> Christophe JAILLET).
>
> Changes in v2:
> - Annotate "struct perf_buffer" with __counted_by() attribute (Kees Cook).
> - Refactor the logic to gain __counted_by() coverage (Kees Cook).
> ---
> Erick Archer (3):
> perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
> perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
> perf/ring_buffer: Prefer struct_size over open coded arithmetic
>
> arch/x86/events/amd/uncore.c | 18 +++++-------------
> arch/x86/events/intel/uncore.c | 7 +++----
> kernel/events/internal.h | 2 +-
> kernel/events/ring_buffer.c | 15 ++++-----------
> 4 files changed, 13 insertions(+), 29 deletions(-)
>
> --
> 2.25.1
>

2024-06-10 10:08:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Sat, Jun 08, 2024 at 10:50:44AM +0200, Erick Archer wrote:
> Hi Andrew,
>
> On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> > Hi everyone,
> >
> > This is an effort to get rid of all multiplications from allocation
> > functions in order to prevent integer overflows [1][2].
> >
> > In the first patch, the "struct amd_uncore_ctx" can be refactored to
> > use a flex array for the "events" member. This way, the allocation/
> > freeing of the memory can be simplified. Then, the struct_size()
> > helper can be used to do the arithmetic calculation for the memory
> > to be allocated.
> >
> > In the second patch, as the "struct intel_uncore_box" ends in a
> > flexible array, the preferred way in the kernel is to use the
> > struct_size() helper to do the arithmetic instead of the calculation
> > "size + count * size" in the kzalloc_node() function.
> >
> > In the third patch, as the "struct perf_buffer" also ends in a
> > flexible array, the preferred way in the kernel is to use the
> > struct_size() helper to do the arithmetic instead of the calculation
> > "size + count * size" in the kzalloc_node() functions. At the same
> > time, prepare for the coming implementation by GCC and Clang of the
> > __counted_by attribute.
> >
> > This time, I have decided to send these three patches in the same serie
> > since all of them has been rejected by the maintainers. I have used
> > the v4 tag since it is the latest iteration in one of the patches.
> >
> > The reason these patches were rejected is that Peter Zijlstra detest
> > the struct_size() helper [3][4]. However, Kees Cook and I consider that
> > the use of this helper improves readability. But we can all say that it
> > is a matter of preference.
> >
> > Anyway, leaving aside personal preferences, these patches has the
> > following pros:
> >
> > - Code robustness improvement (__counted_by coverage).
> > - Code robustness improvement (use of struct_size() to do calculations
> > on memory allocator functions).
> > - Fewer lines of code.
> > - Follow the recommendations made in "Deprecated Interfaces, Language
> > Features, Attributes, and Conventions" [5] as the preferred method
> > of doing things in the kernel.
> > - Widely used in the kernel.
> > - Widely accepted in the kernel.
> >
> > There are also patches in this subsystem that use the struct_size()
> > helper:
> >
> > 6566f907bf31 ("Convert intel uncore to struct_size") by Matthew Wilcox
> > dfbc411e0a5e ("perf/x86/rapl: Prefer struct_size() over open coded arithmetic") by me
> >
> > Therefore, I would like these patches to be applied this time.
>
> This is my last attemp to get these patches applied. I have decided to
> send this mail to try to unjam this situation. I have folowed all the
> reviewers comments and have no response from the maintainers other than
> "I detest the struct_size() helper".
>
> Therefore, I would like to know your opinion and that of other people
> about these patches. If the final consensus is that the code has no real
> benefit, I will stop insisting on it ;)

Seriously, I've got plenty patches to look at that actually do
something. This falls well within the 'random changes for changes sake'
and goes waaaaay down the priority list.

If you're addressing an actual issue, state so. Otherwise, go play
somewhere else.

2024-06-10 17:59:40

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> Hi everyone,
>
> This is an effort to get rid of all multiplications from allocation
> functions in order to prevent integer overflows [1][2].

I didn't actually see these 3 patches in this thread nor via lore.

> In the first patch, the "struct amd_uncore_ctx" can be refactored to
> use a flex array for the "events" member. This way, the allocation/
> freeing of the memory can be simplified. Then, the struct_size()
> helper can be used to do the arithmetic calculation for the memory
> to be allocated.

I like this patch because it reduces the allocation from 2 to 1. This
isn't what Peter might see as "churn": this is an improvement in resource
utilization.

I think it's here:
https://lore.kernel.org/linux-hardening/AS8PR02MB7237E4848B44A5226BD3551C8BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/

> In the second patch, as the "struct intel_uncore_box" ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() function.

This is
https://lore.kernel.org/linux-hardening/AS8PR02MB7237F4D39BF6AA0FF40E91638B392@AS8PR02MB7237.eurprd02.prod.outlook.com/

I prefer this style, as it makes things unambiguous ("this will never
wrap around") without having to check the associated types and doesn't make
the resulting binary code different in the "can never overflow" case.

In this particular case:

int size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);

"int numshared" comes from struct intel_uncore_type::num_shared_regs,
which is:

unsigned num_shared_regs:8;

And the struct sizes are:

$ pahole -C intel_uncore_box gcc-boot/vmlinux | grep size:
/* size: 488, cachelines: 8, members: 19 */
$ pahole -C intel_uncore_extra_reg gcc-boot/vmlinux | grep size:
/* size: 96, cachelines: 2, members: 5 */

So we have:

s32 size = 488 + u8 * 96

Max size here is 24968 so it can never overflow an s32, so I can see
why Peter views this as "churn".

I still think the patch is a coding style improvement, but okay.

> In the third patch, as the "struct perf_buffer" also ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() functions. At the same
> time, prepare for the coming implementation by GCC and Clang of the
> __counted_by attribute.

This is
https://lore.kernel.org/linux-hardening/AS8PR02MB72379B4807F3951A1B926BA58BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/

This provides __counted_by coverage, and I think this is important to
gain in ever place we can. Given that this is part of a ring buffer
implementation that is arbitrarily sized, this is exactly the kind of
place I'd like to see __counted_by used. This is a runtime robustness
improvement, so I don't see this a "churn" at all.


Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
put them in the hardening tree to keep them out of your way. It seems
clear you don't want patch 2 at all.

-Kees

--
Kees Cook

2024-06-10 20:06:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Mon, Jun 10, 2024 at 10:28:52AM -0700, Kees Cook wrote:
> On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> > Hi everyone,
> >
> > This is an effort to get rid of all multiplications from allocation
> > functions in order to prevent integer overflows [1][2].
>
> I didn't actually see these 3 patches in this thread nor via lore.

He managed to break threading between 0/n and the rest.

> > In the first patch, the "struct amd_uncore_ctx" can be refactored to
> > use a flex array for the "events" member. This way, the allocation/
> > freeing of the memory can be simplified. Then, the struct_size()
> > helper can be used to do the arithmetic calculation for the memory
> > to be allocated.
>
> I like this patch because it reduces the allocation from 2 to 1. This
> isn't what Peter might see as "churn": this is an improvement in resource
> utilization.

But then he went and used that struct_size() abomination :/

> I prefer this style, as it makes things unambiguous ("this will never
> wrap around") without having to check the associated types and doesn't make
> the resulting binary code different in the "can never overflow" case.
>
> In this particular case:
>
> int size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);
>
> "int numshared" comes from struct intel_uncore_type::num_shared_regs,
> which is:
>
> unsigned num_shared_regs:8;
>
> And the struct sizes are:
>
> $ pahole -C intel_uncore_box gcc-boot/vmlinux | grep size:
> /* size: 488, cachelines: 8, members: 19 */
> $ pahole -C intel_uncore_extra_reg gcc-boot/vmlinux | grep size:
> /* size: 96, cachelines: 2, members: 5 */
>
> So we have:
>
> s32 size = 488 + u8 * 96
>
> Max size here is 24968 so it can never overflow an s32, so I can see
> why Peter views this as "churn".
>
> I still think the patch is a coding style improvement, but okay.

I really detest this thing because it makes what was trivially readable
into something opaque. Get me that type qualifier that traps on overflow
and write plain C. All this __builtin_overflow garbage is just that,
unreadable nonsense.

> This provides __counted_by coverage, and I think this is important to
> gain in ever place we can. Given that this is part of a ring buffer
> implementation that is arbitrarily sized, this is exactly the kind of
> place I'd like to see __counted_by used. This is a runtime robustness
> improvement, so I don't see this a "churn" at all.

Again, mixed in with that other crap. Anyway, remind me wth this
__counted_by thing actually does?

> Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
> put them in the hardening tree to keep them out of your way. It seems
> clear you don't want patch 2 at all.

I prefer to not have struct_size() anywhere at all. Please just write
readable code.

Again, if you really care about that overflow muck, get a useable C type
qualifier or something so we can write readable code.

2024-06-10 21:46:18

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Mon, Jun 10, 2024 at 10:05:44PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 10:28:52AM -0700, Kees Cook wrote:
> > On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> > > Hi everyone,
> > >
> > > This is an effort to get rid of all multiplications from allocation
> > > functions in order to prevent integer overflows [1][2].
> >
> > I didn't actually see these 3 patches in this thread nor via lore.
>
> He managed to break threading between 0/n and the rest.
>
> > > In the first patch, the "struct amd_uncore_ctx" can be refactored to
> > > use a flex array for the "events" member. This way, the allocation/
> > > freeing of the memory can be simplified. Then, the struct_size()
> > > helper can be used to do the arithmetic calculation for the memory
> > > to be allocated.
> >
> > I like this patch because it reduces the allocation from 2 to 1. This
> > isn't what Peter might see as "churn": this is an improvement in resource
> > utilization.
>
> But then he went and used that struct_size() abomination :/
>
> > I prefer this style, as it makes things unambiguous ("this will never
> > wrap around") without having to check the associated types and doesn't make
> > the resulting binary code different in the "can never overflow" case.
> >
> > In this particular case:
> >
> > int size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);
> >
> > "int numshared" comes from struct intel_uncore_type::num_shared_regs,
> > which is:
> >
> > unsigned num_shared_regs:8;
> >
> > And the struct sizes are:
> >
> > $ pahole -C intel_uncore_box gcc-boot/vmlinux | grep size:
> > /* size: 488, cachelines: 8, members: 19 */
> > $ pahole -C intel_uncore_extra_reg gcc-boot/vmlinux | grep size:
> > /* size: 96, cachelines: 2, members: 5 */
> >
> > So we have:
> >
> > s32 size = 488 + u8 * 96
> >
> > Max size here is 24968 so it can never overflow an s32, so I can see
> > why Peter views this as "churn".
> >
> > I still think the patch is a coding style improvement, but okay.
>
> I really detest this thing because it makes what was trivially readable
> into something opaque. Get me that type qualifier that traps on overflow
> and write plain C. All this __builtin_overflow garbage is just that,
> unreadable nonsense.

It's more readable than container_of(), IMO. "give me the struct size
for variable VAR, which has a flexible array MEMBER, when we have COUNT
many of them": struct_size(VAR, MEMBER, COUNT). It's more readable, more
robust, and provides saturation in the face of potential wrap-around.

> > This provides __counted_by coverage, and I think this is important to
> > gain in ever place we can. Given that this is part of a ring buffer
> > implementation that is arbitrarily sized, this is exactly the kind of
> > place I'd like to see __counted_by used. This is a runtime robustness
> > improvement, so I don't see this a "churn" at all.
>
> Again, mixed in with that other crap. Anyway, remind me wth this
> __counted_by thing actually does?

It provides annotation for the compiler to perform run-time bounds
checking on dynamically sized arrays. i.e. CONFIG_FORTIFY_SOURCE and
CONFIG_UBSAN_BOUNDS can actually reason about annotated flexible arrays
instead of just saying "oh no a flexible array, I give up".

> > Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
> > put them in the hardening tree to keep them out of your way. It seems
> > clear you don't want patch 2 at all.
>
> I prefer to not have struct_size() anywhere at all. Please just write
> readable code.

That ship has sailed, and it has been keeping things at bay for a while
now. As we make progress on making the compiler able to do this more
naturally, we can work on replacing struct_size(), but it's in use
globally and it's useful both for catching runtime mistakes and for
catching compile-time mistakes (the flexible array has to match the
variable's struct).

-Kees

--
Kees Cook

2024-06-11 07:56:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Mon, Jun 10, 2024 at 02:46:09PM -0700, Kees Cook wrote:

> > I really detest this thing because it makes what was trivially readable
> > into something opaque. Get me that type qualifier that traps on overflow
> > and write plain C. All this __builtin_overflow garbage is just that,
> > unreadable nonsense.
>
> It's more readable than container_of(),

Yeah, no. container_of() is absolutely trivial and very readable.
container_of_const() a lot less so.

(one static_assert() removed)

#define container_of(ptr, type, member) ({ \
void *__mptr = (void *)(ptr); \
((type *)(__mptr - offsetof(type, member))); })

Which is very clear indeed in what it does. Compare with:

#define struct_size(p, member, count) \
__builtin_choose_expr(__is_constexpr(count), \
sizeof(*(p)) + flex_array_size(p, member, count), \
size_add(sizeof(*(p)), flex_array_size(p, member, count)))

And I still have no idea :-(

> IMO. "give me the struct size
> for variable VAR, which has a flexible array MEMBER, when we have COUNT
> many of them": struct_size(VAR, MEMBER, COUNT). It's more readable, more
> robust, and provides saturation in the face of potential wrap-around.

I'm sure you know what it does. Thing is, I don't care because I can
trivially write it myself and not have to care and I'll have forgotten
all about it the moment I sent this email.

It just doesn't make sense to wrap something as utterly trivial as:

size = sizeof(*p) + num*sizeof(p->foo);

We're going to have to agree to disagree on this.

Note how I naturally get the order wrong?

[[ There is the whole FMA angle to this, that is, fundamentally this is a
multiply-accumulate, but the problem there is the same that I noted,
there is no fixed order, a+b*c and a*b+c are both very common
definitions -- although I lean towards the latter being the correct one,
given the order in the naming. I suppose this is a long winded way of
saying that:

#define struct_size(p, member, num) \
mult_add_no_overflow(num, sizeof(p->member), sizeof(*p))

would be *FAR* more readable. And then I still think struct_size() is
less readable than its expansion. ]]

> > > This provides __counted_by coverage, and I think this is important to
> > > gain in ever place we can. Given that this is part of a ring buffer
> > > implementation that is arbitrarily sized, this is exactly the kind of
> > > place I'd like to see __counted_by used. This is a runtime robustness
> > > improvement, so I don't see this a "churn" at all.
> >
> > Again, mixed in with that other crap. Anyway, remind me wth this
> > __counted_by thing actually does?
>
> It provides annotation for the compiler to perform run-time bounds
> checking on dynamically sized arrays. i.e. CONFIG_FORTIFY_SOURCE and
> CONFIG_UBSAN_BOUNDS can actually reason about annotated flexible arrays
> instead of just saying "oh no a flexible array, I give up".

Some day I'll have to look at this FORTIFY_SOURCE and see what it
actually does I suppose :/

> > > Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
> > > put them in the hardening tree to keep them out of your way. It seems
> > > clear you don't want patch 2 at all.
> >
> > I prefer to not have struct_size() anywhere at all. Please just write
> > readable code.
>
> That ship has sailed, and it has been keeping things at bay for a while
> now. As we make progress on making the compiler able to do this more
> naturally, we can work on replacing struct_size(), but it's in use
> globally and it's useful both for catching runtime mistakes and for
> catching compile-time mistakes (the flexible array has to match the
> variable's struct).

I coulnd't quickly find a single instance in the code I care about. So
nothing is sailing afaict.


2024-06-12 19:01:29

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Tue, Jun 11, 2024 at 09:55:42AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 02:46:09PM -0700, Kees Cook wrote:
>
> > > I really detest this thing because it makes what was trivially readable
> > > into something opaque. Get me that type qualifier that traps on overflow
> > > and write plain C. All this __builtin_overflow garbage is just that,
> > > unreadable nonsense.
> >
> > It's more readable than container_of(),
>
> Yeah, no. container_of() is absolutely trivial and very readable.
> container_of_const() a lot less so.

I mean, we have complex macros in the kernel. This isn't uncommon. Look
at cleanup.h. ;)

But that's why we write kern-doc:

* struct_size() - Calculate size of structure with trailing flexible
* array.
* @p: Pointer to the structure.
* @member: Name of the array member.
* @count: Number of elements in the array.

> #define struct_size(p, member, num) \
> mult_add_no_overflow(num, sizeof(p->member), sizeof(*p))
>
> would be *FAR* more readable. And then I still think struct_size() is
> less readable than its expansion. ]]

I'm happy to take patches. And for this bikeshed, this would be better
named under the size_*() helpers which are trying to keep size_t
calculations from overflowing (by saturating). i.e.:

size_add_mult(sizeof(*p), sizeof(*p->member), num)

Generalized overflow handing (check_add/sub/mul_overflow()) helpers
already exist and requires a destination variable to determine type
sizes. It is not safe to allow C semantics to perform
promotion/truncation, so we have to be explicit.

> Some day I'll have to look at this FORTIFY_SOURCE and see what it
> actually does I suppose :/

LOL. It's basically doing compile-time (__builtin_object_size) and
run-time (__builtin_dynamic_object_size) bounds checking on destination
(and source) object sizes, mainly driven by the mentioned builtins:
https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html

> I coulnd't quickly find a single instance in the code I care about. So
> nothing is sailing afaict.

I have to care about all of Linux's code. :P We generally don't have to
defend the kernel from perf, so the hardening changes tend to end up in
core APIs along with compiler improvements.

Anyway! What about the patch that takes the 2 allocations down to 1?
That seems like an obvious improvement.

-Kees

--
Kees Cook

2024-06-12 22:10:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> On Tue, Jun 11, 2024 at 09:55:42AM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 10, 2024 at 02:46:09PM -0700, Kees Cook wrote:
> >
> > > > I really detest this thing because it makes what was trivially readable
> > > > into something opaque. Get me that type qualifier that traps on overflow
> > > > and write plain C. All this __builtin_overflow garbage is just that,
> > > > unreadable nonsense.
> > >
> > > It's more readable than container_of(),
> >
> > Yeah, no. container_of() is absolutely trivial and very readable.
> > container_of_const() a lot less so.
>
> I mean, we have complex macros in the kernel. This isn't uncommon. Look
> at cleanup.h. ;)

Yeah, but in those cases the macro actually saves a lot of typing. A
mult and add is something you shouldn't be needing a macro for.

> But that's why we write kern-doc:

Right, but reading comments is asking for trouble. That is, I really
don't even see comments -- I have a built-in pre-processor that filters
them out. I've been burned too many times by misleading or flat out
wrong comments.

> > #define struct_size(p, member, num) \
> > mult_add_no_overflow(num, sizeof(p->member), sizeof(*p))
> >
> > would be *FAR* more readable. And then I still think struct_size() is
> > less readable than its expansion. ]]
>
> I'm happy to take patches. And for this bikeshed, this would be better
> named under the size_*() helpers which are trying to keep size_t
> calculations from overflowing (by saturating). i.e.:
>
> size_add_mult(sizeof(*p), sizeof(*p->member), num)

Fine I suppose, but what if we want something not size_t? Are we waiting
for the type system extension?

Anyway, I'll take the patch with the above. That at least is readable.

The saturating thing is relying in the allocators never granting INT_MAX
(or whatever size_t actually is) bytes?

> Generalized overflow handing (check_add/sub/mul_overflow()) helpers
> already exist and requires a destination variable to determine type
> sizes. It is not safe to allow C semantics to perform
> promotion/truncation, so we have to be explicit.

Yeah, those are a sodding trainwreck, much like the
__builtin_*_overflow() garbage. Can't we simply have them trap on
overflow and do away with that most terrible calling convention?

If we want to be really fancy we can have a #UD instruction that encodes
a destination register to clear/saturate/whatever.

> > Some day I'll have to look at this FORTIFY_SOURCE and see what it
> > actually does I suppose :/
>
> LOL. It's basically doing compile-time (__builtin_object_size) and
> run-time (__builtin_dynamic_object_size) bounds checking on destination
> (and source) object sizes, mainly driven by the mentioned builtins:
> https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html

Right, I got that far. I also read most of:

https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854

But none of that is showing me generated asm for the various cases. As
such, I don't consider myself informed enough.

> Anyway! What about the patch that takes the 2 allocations down to 1?
> That seems like an obvious improvement.

Separate it from the struct_size() nonsense and Cc the author of that
code (Sandipan IIRC) and I might just apply it.

2024-06-12 23:24:13

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > I'm happy to take patches. And for this bikeshed, this would be better
> > named under the size_*() helpers which are trying to keep size_t
> > calculations from overflowing (by saturating). i.e.:
> >
> > size_add_mult(sizeof(*p), sizeof(*p->member), num)
>
> Fine I suppose, but what if we want something not size_t? Are we waiting
> for the type system extension?

Because of C's implicit promotion/truncation, we can't do anything
sanely with return values of arbitrary type size; we have to capture the
lvalue type somehow so the checking can happen without C doing silent
garbage.

> The saturating thing is relying in the allocators never granting INT_MAX
> (or whatever size_t actually is) bytes?

The max of size_t is ULONG_MAX, but yes, most of the allocators will
refuse >INT_MAX, but I think vmalloc() is higher, but certainly not
SIZE_MAX, which is the entire virtual memory space. ;)

The saturating thing is two-fold: that we never wrap around SIZE_MAX,
and that the allocator will refuse a SIZE_MAX allocation.

> > LOL. It's basically doing compile-time (__builtin_object_size) and
> > run-time (__builtin_dynamic_object_size) bounds checking on destination
> > (and source) object sizes, mainly driven by the mentioned builtins:
> > https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
>
> Right, I got that far. I also read most of:
>
> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854

Oh wow, that's serious extra credit. :) It'll also probably be a while
before most of that stuff is even landed in Clang, much less implemented
in GCC. What we _do_ have is the "counted_by" attribute. This was added
to Clang a little while ago and just landed last week in GCC for GCC 15.

> But none of that is showing me generated asm for the various cases. As
> such, I don't consider myself informed enough.

Gotcha. For the compile-time stuff it's all just looking at
known-at-compile-time sizes. So for something like this, we get a
__compiletime_warning() emitted:

const char src[] = "Hello there";
char dst[10];

strscpy(dst, src); /* Compiler yells since src is bigger than dst. */

For run-time checks it's basically just using the regular WARN()
infrastructure with __builtin_dynamic_object_size(). Here's a simplified
userspace example with assert():

https://godbolt.org/z/zMrKnMxn5

The kernel's FORTIFY_SOURCE is much more complex in how it does the
checking, how it does the reporting (for helping people figure out what's
gone weird), etc.

> > Anyway! What about the patch that takes the 2 allocations down to 1?
> > That seems like an obvious improvement.
>
> Separate it from the struct_size() nonsense and Cc the author of that
> code (Sandipan IIRC) and I might just apply it.

Okay, thanks!

-Kees

--
Kees Cook

2024-06-14 10:17:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > I'm happy to take patches. And for this bikeshed, this would be better
> > > named under the size_*() helpers which are trying to keep size_t
> > > calculations from overflowing (by saturating). i.e.:
> > >
> > > size_add_mult(sizeof(*p), sizeof(*p->member), num)
> >
> > Fine I suppose, but what if we want something not size_t? Are we waiting
> > for the type system extension?
>
> Because of C's implicit promotion/truncation, we can't do anything
> sanely with return values of arbitrary type size; we have to capture the
> lvalue type somehow so the checking can happen without C doing silent
> garbage.

So sizeof() returns the native (built-in) size_t, right? If that type
the nooverflow qualifier on, then:

sizeof(*p) + num*sizeof(p->foo[0])

should all get the nooverflow semantics right? Because size_t is
effectively 'nooverflow unsigned long' the multiplication should promote
'num' to some 'long'.

Now, I've re-read the rules and I don't see qualifiers mentioned, so
can't we state that the overflow/nooverflow qualifiers are to be
preserved on (implicit) promotion and when nooverflow and overflow are
combined the 'safe' nooverflow takes precedence?

I mean, when we're adding qualifiers we can make up rules about them
too, right?

If 'people' don't want to adorn the built-in size_t, we can always do
something like:

#define sizeof(x) ((nooverflow unsigned long)(sizeof(x)))

and 'fix' it ourselves.

> > But none of that is showing me generated asm for the various cases. As
> > such, I don't consider myself informed enough.
>
> Gotcha. For the compile-time stuff it's all just looking at
> known-at-compile-time sizes. So for something like this, we get a
> __compiletime_warning() emitted:
>
> const char src[] = "Hello there";
> char dst[10];
>
> strscpy(dst, src); /* Compiler yells since src is bigger than dst. */
>
> For run-time checks it's basically just using the regular WARN()
> infrastructure with __builtin_dynamic_object_size(). Here's a simplified
> userspace example with assert():
>
> https://godbolt.org/z/zMrKnMxn5
>
> The kernel's FORTIFY_SOURCE is much more complex in how it does the
> checking, how it does the reporting (for helping people figure out what's
> gone weird), etc.

Thanks, I'll go have a look at that.

2024-06-15 16:21:23

by Martin Uecker

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] Hardening perf subsystem

Am Freitag, dem 14.06.2024 um 12:17 +0200 schrieb Peter Zijlstra:
> On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> > On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > > I'm happy to take patches. And for this bikeshed, this would be better
> > > > named under the size_*() helpers which are trying to keep size_t
> > > > calculations from overflowing (by saturating). i.e.:
> > > >
> > > > size_add_mult(sizeof(*p), sizeof(*p->member), num)
> > >
> > > Fine I suppose, but what if we want something not size_t? Are we waiting
> > > for the type system extension?
> >
> > Because of C's implicit promotion/truncation, we can't do anything
> > sanely with return values of arbitrary type size; we have to capture the
> > lvalue type somehow so the checking can happen without C doing silent
> > garbage.

What is the specific problem here?

>
> So sizeof() returns the native (built-in) size_t, right? If that type
> the nooverflow qualifier on, then:
>
> sizeof(*p) + num*sizeof(p->foo[0])
>
> should all get the nooverflow semantics right? Because size_t is
> effectively 'nooverflow unsigned long' the multiplication should promote
> 'num' to some 'long'.
>
> Now, I've re-read the rules and I don't see qualifiers mentioned, so
> can't we state that the overflow/nooverflow qualifiers are to be
> preserved on (implicit) promotion and when nooverflow and overflow are
> combined the 'safe' nooverflow takes precedence?
>
> I mean, when we're adding qualifiers we can make up rules about them
> too, right?

It should probably be a type attribute.

>
> If 'people' don't want to adorn the built-in size_t, we can always do
> something like:
>
> #define sizeof(x) ((nooverflow unsigned long)(sizeof(x)))
>
> and 'fix' it ourselves.

This is likely a stupid question, but making it signed 
wouldn't work? Or is a signed size_t too small  
or some architectures? Or would this change break too much?


Martin

>
> > > But none of that is showing me generated asm for the various cases. As
> > > such, I don't consider myself informed enough.
> >
> > Gotcha. For the compile-time stuff it's all just looking at
> > known-at-compile-time sizes. So for something like this, we get a
> > __compiletime_warning() emitted:
> >
> > const char src[] = "Hello there";
> > char dst[10];
> >
> > strscpy(dst, src); /* Compiler yells since src is bigger than dst. */
> >
> > For run-time checks it's basically just using the regular WARN()
> > infrastructure with __builtin_dynamic_object_size(). Here's a simplified
> > userspace example with assert():
> >
> > https://godbolt.org/z/zMrKnMxn5
> >
> > The kernel's FORTIFY_SOURCE is much more complex in how it does the
> > checking, how it does the reporting (for helping people figure out what's
> > gone weird), etc.
>
> Thanks, I'll go have a look at that.