2024-02-05 10:01:28

by Alexey Dobriyan

Subject: [PATCH v4] ELF: AT_PAGE_SHIFT_MASK -- supply userspace with available page shifts

Report available page shifts in an arch-independent manner, so that
userspace developers won't have to parse /proc/cpuinfo hunting
for arch-specific strings.

The main intended users are libhugetlbfs-like libraries which try
to abstract huge mappings across multiple architectures. Regular code
which queries hugepage support before using it benefits too: it
doesn't have to deal with file descriptors or parse sysfs hierarchies,
and it gets the simplicity and speed of getauxval(3).
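
As an illustration of the userspace side, here is a minimal sketch of
how a program might consume the mask. The helper names are made up for
this example. The value 29 is the one proposed in this patch and is only
meaningful on a kernel carrying it; elsewhere the entry may be absent
(getauxval() returns 0) or the number may be assigned to something else.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Assumption: number proposed by this patch, not a released uapi value. */
#ifndef AT_PAGE_SHIFT_MASK
#define AT_PAGE_SHIFT_MASK 29
#endif

/* Nonzero if pages of size (1 << shift) are advertised in mask. */
int page_shift_supported(unsigned long mask, unsigned int shift)
{
	if (shift >= sizeof(mask) * 8)
		return 0;
	return (int)((mask >> shift) & 1UL);
}

/* Largest advertised page shift, or -1 if the mask is empty. */
int largest_page_shift(unsigned long mask)
{
	int shift = -1;

	while (mask) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Print every advertised page size, smallest first. */
void print_page_sizes(unsigned long mask)
{
	unsigned int shift;

	for (shift = 0; shift < sizeof(mask) * 8; shift++)
		if (page_shift_supported(mask, shift))
			printf("shift %u: %lu bytes\n", shift, 1UL << shift);
}

/* Returns 0 when the kernel does not supply the entry. */
unsigned long query_page_shift_mask(void)
{
	return getauxval(AT_PAGE_SHIFT_MASK);
}
```

A caller would do print_page_sizes(query_page_shift_mask()) and treat
a zero mask as "entry not supplied", falling back to AT_PAGESZ.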

Note 1!

This is strictly for userspace: if some page size is shut down due
to a kernel command line option or a CPU bug workaround, then it must
not be reported in the aux vector!

Note 2!

getauxval(AT_PAGE_SHIFT_MASK) output is a function of CPU capabilities
only; it is not changed by system memory size, hugepage availability at
any given moment, hugetlbfs being mounted, etc.

Example output:

x86_64 machine with 1 GiB pages:

00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00

x86_64 machine with 2 MiB pages only:

00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
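
To read the dumps above: each 16-byte row is one aux vector entry, a
64-bit little-endian type followed by a 64-bit little-endian value.
Type 0x06 is AT_PAGESZ (0x1000) and type 0x1d is 29, the proposed
AT_PAGE_SHIFT_MASK. A small sketch that decodes the first example
follows; the function and array names are made up for illustration,
and the 64-bit little-endian layout is an assumption that matches
x86_64.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit little-endian integer from 8 raw bytes. */
uint64_t le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Scan a raw aux vector made of 16-byte (type, value) entries.
 * Returns 1 and stores the value if the type is found, 0 otherwise.
 */
int auxv_lookup(const unsigned char *buf, size_t len,
		uint64_t type, uint64_t *val)
{
	size_t off;

	for (off = 0; off + 16 <= len; off += 16) {
		if (le64(buf + off) == type) {
			*val = le64(buf + off + 8);
			return 1;
		}
	}
	return 0;
}

/* The two rows from the "1 GiB pages" example, offsets 0x30 and 0x40. */
const unsigned char example_1g[32] = {
	0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x1d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x10, 0x20, 0x40, 0x00, 0x00, 0x00, 0x00,
};
```

Looking up type 29 in example_1g yields 0x40201000, i.e. bits 12, 21
and 30 set; the second dump decodes to 0x00201000, bits 12 and 21.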

AT_PAGESZ always reports only the smallest page size, which is not interesting.

Signed-off-by: Alexey Dobriyan <[email protected]>
---

changes since v3 -- even better changelog

arch/x86/include/asm/elf.h | 12 ++++++++++++
fs/binfmt_elf.c | 3 +++
include/uapi/linux/auxvec.h | 13 +++++++++++++
3 files changed, 28 insertions(+)

--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -358,6 +358,18 @@ else if (IS_ENABLED(CONFIG_IA32_EMULATION)) \

#define COMPAT_ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)

+#define ARCH_AT_PAGE_SHIFT_MASK \
+	do { \
+		u32 val = 1 << 12; \
+		if (boot_cpu_has(X86_FEATURE_PSE)) { \
+			val |= 1 << 21; \
+		} \
+		if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
+			val |= 1 << 30; \
+		} \
+		NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, val); \
+	} while (0)
+
#endif /* !CONFIG_X86_32 */

#define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso)
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -240,6 +240,9 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
#endif
NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
+#ifdef ARCH_AT_PAGE_SHIFT_MASK
+	ARCH_AT_PAGE_SHIFT_MASK;
+#endif
NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
NEW_AUX_ENT(AT_PHDR, phdr_addr);
NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
--- a/include/uapi/linux/auxvec.h
+++ b/include/uapi/linux/auxvec.h
@@ -33,6 +33,19 @@
#define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */
#define AT_RSEQ_ALIGN 28 /* rseq allocation alignment */

+/*
+ * All page sizes supported by CPU encoded as bitmask.
+ *
+ * Example: x86_64 system with pse, pdpe1gb /proc/cpuinfo flags
+ * reports 4 KiB, 2 MiB and 1 GiB page support.
+ *
+ * $ LD_SHOW_AUXV=1 $(which true) | grep -e AT_PAGE_SHIFT_MASK
+ * AT_PAGE_SHIFT_MASK: 0x40201000
+ *
+ * For 2^64 hugepage support please contact your Universe sales representative.
+ */
+#define AT_PAGE_SHIFT_MASK 29
+
#define AT_EXECFN 31 /* filename of program */

#ifndef AT_MINSIGSTKSZ


2024-02-05 12:54:27

by Kees Cook

Subject: Re: [PATCH v4] ELF: AT_PAGE_SHIFT_MASK -- supply userspace with available page shifts

On Mon, Feb 05, 2024 at 12:51:43PM +0300, Alexey Dobriyan wrote:
> +#define ARCH_AT_PAGE_SHIFT_MASK \
> + do { \
> + u32 val = 1 << 12; \
> + if (boot_cpu_has(X86_FEATURE_PSE)) { \
> + val |= 1 << 21; \
> + } \
> + if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> + val |= 1 << 30; \
> + } \

Can we use something besides literal "12", "21", and "30" values here?

-Kees

--
Kees Cook

2024-02-09 12:30:54

by Alexey Dobriyan

Subject: Re: [PATCH v4] ELF: AT_PAGE_SHIFT_MASK -- supply userspace with available page shifts

On Mon, Feb 05, 2024 at 04:48:08AM -0800, Kees Cook wrote:
> On Mon, Feb 05, 2024 at 12:51:43PM +0300, Alexey Dobriyan wrote:
> > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > + do { \
> > + u32 val = 1 << 12; \
> > + if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > + val |= 1 << 21; \
> > + } \
> > + if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > + val |= 1 << 30; \
> > + } \
>
> Can we use something besides literal "12", "21", and "30" values here?

Ehh, no, why? Inside x86_64 the page shifts are very specific numbers,
they won't change.

2024-02-10 00:51:23

by Kees Cook

Subject: Re: [PATCH v4] ELF: AT_PAGE_SHIFT_MASK -- supply userspace with available page shifts

On Fri, Feb 09, 2024 at 03:30:37PM +0300, Alexey Dobriyan wrote:
> On Mon, Feb 05, 2024 at 04:48:08AM -0800, Kees Cook wrote:
> > On Mon, Feb 05, 2024 at 12:51:43PM +0300, Alexey Dobriyan wrote:
> > > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > > + do { \
> > > + u32 val = 1 << 12; \
> > > + if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > > + val |= 1 << 21; \
> > > + } \
> > > + if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > > + val |= 1 << 30; \
> > > + } \
> >
> > Can we use something besides literal "12", "21", and "30" values here?
>
> Ehh, no, why? Inside x86_64 the page shifts are very specific numbers,
> they won't change.

Well, it's nicer to have meaningful words to describe these things. In
fact, PAGE_SHIFT already exists for 12, and HPAGE_SHIFT already exists
for 21. Please use those, and add another, perhaps GBPAGE_SHIFT, for 30.

-Kees

--
Kees Cook

2024-02-13 19:18:19

by Alexey Dobriyan

Subject: Re: [PATCH v4] ELF: AT_PAGE_SHIFT_MASK -- supply userspace with available page shifts

On Fri, Feb 09, 2024 at 04:41:36PM -0800, Kees Cook wrote:
> On Fri, Feb 09, 2024 at 03:30:37PM +0300, Alexey Dobriyan wrote:
> > On Mon, Feb 05, 2024 at 04:48:08AM -0800, Kees Cook wrote:
> > > On Mon, Feb 05, 2024 at 12:51:43PM +0300, Alexey Dobriyan wrote:
> > > > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > > > + do { \
> > > > + u32 val = 1 << 12; \
> > > > + if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > > > + val |= 1 << 21; \
> > > > + } \
> > > > + if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > > > + val |= 1 << 30; \
> > > > + } \
> > >
> > > Can we use something besides literal "12", "21", and "30" values here?
> >
> > Ehh, no, why? Inside x86_64 the page shifts are very specific numbers,
> > they won't change.
>
> Well, it's nicer to have meaningful words to describe these things.

Not really. Inside a specific arch the page shifts are fixed, so using
names is just more macros one needs to remember.

If I were to invent names (which I wouldn't), the best names are

PAGE_SHIFT
PAGE_SHIFT2
PAGE_SHIFT3
...

with PAGE_SHIFT2, PAGE_SHIFT3 being optional macros if arch doesn't support
multiple page sizes.

> In fact, PAGE_SHIFT already exists for 12, and HPAGE_SHIFT already exists
> for 21. Please use those, and add another, perhaps GBPAGE_SHIFT, for 30.

HPAGE_SHIFT is a bad name: H doesn't describe anything unless the arch
is known. "Hugepages" is a marketing name. If GBPAGE_SHIFT is a good
name, then HPAGE_SHIFT is a bad name; it should've been MBPAGE_SHIFT,
which is wrong because it is 2 MiB, not 1 MiB.

BTW parisc has REAL_HPAGE_SHIFT !

2024-02-13 19:40:16

by Kees Cook

Subject: Re: [PATCH v4] ELF: AT_PAGE_SHIFT_MASK -- supply userspace with available page shifts

On Tue, Feb 13, 2024 at 09:51:01PM +0300, Alexey Dobriyan wrote:
> On Fri, Feb 09, 2024 at 04:41:36PM -0800, Kees Cook wrote:
> > On Fri, Feb 09, 2024 at 03:30:37PM +0300, Alexey Dobriyan wrote:
> > > On Mon, Feb 05, 2024 at 04:48:08AM -0800, Kees Cook wrote:
> > > > On Mon, Feb 05, 2024 at 12:51:43PM +0300, Alexey Dobriyan wrote:
> > > > > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > > > > + do { \
> > > > > + u32 val = 1 << 12; \
> > > > > + if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > > > > + val |= 1 << 21; \
> > > > > + } \
> > > > > + if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > > > > + val |= 1 << 30; \
> > > > > + } \
> > > >
> > > > Can we use something besides literal "12", "21", and "30" values here?
> > >
> > > Ehh, no, why? Inside x86_64 the page shifts are very specific numbers,
> > > they won't change.
> >
> > Well, it's nicer to have meaningful words to describe these things.
>
> Not really. Inside a specific arch the page shifts are fixed, so using
> names is just more macros one needs to remember.
>
> If I were to invent names (which I wouldn't), the best names are
>
> PAGE_SHIFT
> PAGE_SHIFT2
> PAGE_SHIFT3
> ...
>
> with PAGE_SHIFT2, PAGE_SHIFT3 being optional macros if arch doesn't support
> multiple page sizes.
>
> > In fact, PAGE_SHIFT already exists for 12, and HPAGE_SHIFT already exists
> > for 21. Please use those, and add another, perhaps GBPAGE_SHIFT, for 30.
>
> HPAGE_SHIFT is a bad name: H doesn't describe anything unless the arch
> is known. "Hugepages" is a marketing name. If GBPAGE_SHIFT is a good
> name, then HPAGE_SHIFT is a bad name; it should've been MBPAGE_SHIFT,
> which is wrong because it is 2 MiB, not 1 MiB.
>
> BTW parisc has REAL_HPAGE_SHIFT !

Sure, I mean, we've got an x86-specific function here, so let's use the
x86-specific macros we already have for 12 and 21, and then add the
missing one for 30.

--
Kees Cook
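
For readers following the naming discussion, here is a rough userspace
sketch of what the named-constant variant might look like. It is not
code from the thread: the macros are defined locally for illustration,
GBPAGE_SHIFT is only the name floated above, and the two boolean
parameters stand in for the boot_cpu_has() checks in the kernel macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the x86 values discussed in the thread. */
#define PAGE_SHIFT	12	/* 4 KiB */
#define HPAGE_SHIFT	21	/* 2 MiB */
#define GBPAGE_SHIFT	30	/* 1 GiB, hypothetical name */

/*
 * Build the page shift mask from CPU feature flags.
 * has_pse and has_gbpages are hypothetical stand-ins for
 * boot_cpu_has(X86_FEATURE_PSE) and boot_cpu_has(X86_FEATURE_GBPAGES).
 */
uint32_t x86_page_shift_mask(bool has_pse, bool has_gbpages)
{
	uint32_t val = UINT32_C(1) << PAGE_SHIFT;

	if (has_pse)
		val |= UINT32_C(1) << HPAGE_SHIFT;
	if (has_gbpages)
		val |= UINT32_C(1) << GBPAGE_SHIFT;
	return val;
}
```

With both features present this produces 0x40201000, the same value as
the version with literal shifts; only the spelling of the constants
differs.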