The SHM_HUGE_* stuff was introduced in:
42d7395feb5 (mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB)
It unnecessarily adds another layer, specific to sysv shm, without
anything special about it: the macros are identical to the MAP_HUGE_*
stuff, which in turn does correctly describe the hugepage subsystem.
One example of the problems with extra layers what this patch fixes:
mmap_pgoff() should never be using SHM_HUGE_* logic. This was
introduced by:
091d0d55b28 (shm: fix null pointer deref when userspace specifies invalid hugepage size)
It is obviously harmless but lets just rip out the whole thing --
the shmget.2 manpage will need updating, as it should not be
describing kernel internals.
Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/shm.h | 13 -------------
ipc/shm.c | 6 +++---
mm/mmap.c | 2 +-
tools/testing/selftests/vm/thuge-gen.c | 8 +-------
4 files changed, 5 insertions(+), 24 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index 429c1995d756..98fc25f9db8a 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -31,19 +31,6 @@ struct shmid_kernel /* private to the kernel */
/* Bits [26:31] are reserved */
-/*
- * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
- */
-#define SHM_HUGE_SHIFT 26
-#define SHM_HUGE_MASK 0x3f
-#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
-#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
-
#ifdef CONFIG_SYSVIPC
long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr,
unsigned long shmlba);
diff --git a/ipc/shm.c b/ipc/shm.c
index 7e199fa1960f..f21a2338ee39 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -491,8 +491,8 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
sprintf (name, "SYSV%08x", key);
if (shmflg & SHM_HUGETLB) {
- struct hstate *hs = hstate_sizelog((shmflg >> SHM_HUGE_SHIFT)
- & SHM_HUGE_MASK);
+ struct hstate *hs = hstate_sizelog((shmflg >> MAP_HUGE_SHIFT)
+ & MAP_HUGE_MASK);
size_t hugesize;
if (!hs) {
@@ -506,7 +506,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
acctflag = VM_NORESERVE;
file = hugetlb_file_setup(name, hugesize, acctflag,
&shp->mlock_user, HUGETLB_SHMFS_INODE,
- (shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
+ (shmflg >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
} else {
/*
* Do not allow no accounting for OVERCOMMIT_NEVER, even
diff --git a/mm/mmap.c b/mm/mmap.c
index 0718c175db8f..a1c4cefc5a38 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1369,7 +1369,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
} else if (flags & MAP_HUGETLB) {
struct user_struct *user = NULL;
struct hstate *hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) &
- SHM_HUGE_MASK);
+ MAP_HUGE_MASK);
if (!hs)
return -EINVAL;
diff --git a/tools/testing/selftests/vm/thuge-gen.c b/tools/testing/selftests/vm/thuge-gen.c
index c87957295f74..4479015ec96a 100644
--- a/tools/testing/selftests/vm/thuge-gen.c
+++ b/tools/testing/selftests/vm/thuge-gen.c
@@ -32,12 +32,6 @@
#define MAP_HUGE_MASK 0x3f
#define MAP_HUGETLB 0x40000
-#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
-#define SHM_HUGE_SHIFT 26
-#define SHM_HUGE_MASK 0x3f
-#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
-#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
-
#define NUM_PAGESIZES 5
#define NUM_PAGES 4
@@ -243,7 +237,7 @@ int main(void)
for (i = 0; i < num_page_sizes; i++) {
unsigned long ps = page_sizes[i];
- int arg = ilog2(ps) << SHM_HUGE_SHIFT;
+ int arg = ilog2(ps) << MAP_HUGE_SHIFT;
printf("Testing %luMB shmget with shift %x\n", ps >> 20, arg);
test_shmget(ps, SHM_HUGETLB | arg);
}
--
2.6.6
> One example of the problems with extra layers what this patch fixes:
> mmap_pgoff() should never be using SHM_HUGE_* logic. This was
> introduced by:
>
> 091d0d55b28 (shm: fix null pointer deref when userspace specifies invalid hugepage size)
>
> It is obviously harmless but lets just rip out the whole thing --
> the shmget.2 manpage will need updating, as it should not be
> describing kernel internals.
The SHM_* defines were supposed to be exported to user space,
but somehow they didn't make it into uapi.
But something like this is useful, it's a much nicer
interface for users than to hard code the bit position
So I would rather if you move it to uapi instead of
removing. What the kernel uses internally doesn't
really matter.
-Andi
On 03/09/2017 01:09 AM, Andi Kleen wrote:
>> One example of the problems with extra layers what this patch fixes:
>> mmap_pgoff() should never be using SHM_HUGE_* logic. This was
>> introduced by:
>>
>> 091d0d55b28 (shm: fix null pointer deref when userspace specifies invalid hugepage size)
>>
>> It is obviously harmless but lets just rip out the whole thing --
>> the shmget.2 manpage will need updating, as it should not be
>> describing kernel internals.
>
> The SHM_* defines were supposed to be exported to user space,
> but somehow they didn't make it into uapi.
Yeah, its not part of UAPI which it should have been. Now we
need to ilog2(page_size) and shift it before using them in
the user space. BTW, mmap() interface also would want this
encoding should we choose to use non default HugeTLB page
sizes.
>
> But something like this is useful, it's a much nicer
> interface for users than to hard code the bit position
Right. But as we need this both for shm and mmap() interface,
we can only have one set of values exported to the UAPI. The
other set needs to be removed IMHO. BTW, we need to add the
encoding for other arch supported HugeTLB supported sizes as
well like 16MB, 16GB etc (on POWER).
>
> So I would rather if you move it to uapi instead of
> removing. What the kernel uses internally doesn't
> really matter.
Had a sent a clean up patch last year which unfortunately I
forgot to resend though it has got ACK from Michal Hocko
and Balbir Singh.
https://lkml.org/lkml/2016/4/7/43
I had also tried to add POWER HugeTLB size encoding in the
arch specific header files. Probably its time to move all
of them to generic header.
https://lkml.org/lkml/2016/4/7/48
On Wed 08-03-17 09:06:01, Davidlohr Bueso wrote:
> The SHM_HUGE_* stuff was introduced in:
>
> 42d7395feb5 (mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB)
>
> It unnecessarily adds another layer, specific to sysv shm, without
> anything special about it: the macros are identical to the MAP_HUGE_*
> stuff, which in turn does correctly describe the hugepage subsystem.
>
> One example of the problems with extra layers what this patch fixes:
> mmap_pgoff() should never be using SHM_HUGE_* logic. This was
> introduced by:
>
> 091d0d55b28 (shm: fix null pointer deref when userspace specifies invalid hugepage size)
>
> It is obviously harmless but lets just rip out the whole thing --
> the shmget.2 manpage will need updating, as it should not be
> describing kernel internals.
Yes, I agree the additional layer just adds confusion and as it turned
out it is error prone. As this has never been exported to the userspace
properly without anybody complaining I would strongly suspect it is not
really needed so just get rid of it.
> Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Thanks!
> ---
> include/linux/shm.h | 13 -------------
> ipc/shm.c | 6 +++---
> mm/mmap.c | 2 +-
> tools/testing/selftests/vm/thuge-gen.c | 8 +-------
> 4 files changed, 5 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index 429c1995d756..98fc25f9db8a 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -31,19 +31,6 @@ struct shmid_kernel /* private to the kernel */
>
> /* Bits [26:31] are reserved */
>
> -/*
> - * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> - */
> -#define SHM_HUGE_SHIFT 26
> -#define SHM_HUGE_MASK 0x3f
> -#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
> -#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
> -
> #ifdef CONFIG_SYSVIPC
> long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr,
> unsigned long shmlba);
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7e199fa1960f..f21a2338ee39 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -491,8 +491,8 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
>
> sprintf (name, "SYSV%08x", key);
> if (shmflg & SHM_HUGETLB) {
> - struct hstate *hs = hstate_sizelog((shmflg >> SHM_HUGE_SHIFT)
> - & SHM_HUGE_MASK);
> + struct hstate *hs = hstate_sizelog((shmflg >> MAP_HUGE_SHIFT)
> + & MAP_HUGE_MASK);
> size_t hugesize;
>
> if (!hs) {
> @@ -506,7 +506,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> acctflag = VM_NORESERVE;
> file = hugetlb_file_setup(name, hugesize, acctflag,
> &shp->mlock_user, HUGETLB_SHMFS_INODE,
> - (shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK);
> + (shmflg >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> } else {
> /*
> * Do not allow no accounting for OVERCOMMIT_NEVER, even
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0718c175db8f..a1c4cefc5a38 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1369,7 +1369,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
> } else if (flags & MAP_HUGETLB) {
> struct user_struct *user = NULL;
> struct hstate *hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) &
> - SHM_HUGE_MASK);
> + MAP_HUGE_MASK);
>
> if (!hs)
> return -EINVAL;
> diff --git a/tools/testing/selftests/vm/thuge-gen.c b/tools/testing/selftests/vm/thuge-gen.c
> index c87957295f74..4479015ec96a 100644
> --- a/tools/testing/selftests/vm/thuge-gen.c
> +++ b/tools/testing/selftests/vm/thuge-gen.c
> @@ -32,12 +32,6 @@
> #define MAP_HUGE_MASK 0x3f
> #define MAP_HUGETLB 0x40000
>
> -#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
> -#define SHM_HUGE_SHIFT 26
> -#define SHM_HUGE_MASK 0x3f
> -#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
> -#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
> -
> #define NUM_PAGESIZES 5
>
> #define NUM_PAGES 4
> @@ -243,7 +237,7 @@ int main(void)
>
> for (i = 0; i < num_page_sizes; i++) {
> unsigned long ps = page_sizes[i];
> - int arg = ilog2(ps) << SHM_HUGE_SHIFT;
> + int arg = ilog2(ps) << MAP_HUGE_SHIFT;
> printf("Testing %luMB shmget with shift %x\n", ps >> 20, arg);
> test_shmget(ps, SHM_HUGETLB | arg);
> }
> --
> 2.6.6
--
Michal Hocko
SUSE Labs
Do we have any consensus here? Keeping SHM_HUGE_* is currently
winning 2-1. If there are in fact users out there computing the
value manually, then I am ok with keeping it and properly exporting
it. Michal?
Thanks,
Davidlohr
Sorry, forgot to add Anshuman.
On Tue, 28 Mar 2017, Davidlohr Bueso wrote:
>Do we have any consensus here? Keeping SHM_HUGE_* is currently
>winning 2-1. If there are in fact users out there computing the
>value manually, then I am ok with keeping it and properly exporting
>it. Michal?
>
>Thanks,
>Davidlohr
On Tue, Mar 28, 2017 at 09:55:13AM -0700, Davidlohr Bueso wrote:
> Do we have any consensus here? Keeping SHM_HUGE_* is currently
> winning 2-1. If there are in fact users out there computing the
> value manually, then I am ok with keeping it and properly exporting
> it. Michal?
Well, let's see what it looks like to do that. I went down the rabbit
hole trying to understand why some of the SHM_ flags had the same value
as each other until I realised some of them were internal flags, some
were flags to shmat() and others were flags to shmget(). Hopefully I
disambiguated them nicely in this patch. I also added 8MB and 16GB sizes.
Any more architectures with a pet favourite huge/giant page size we
should add convenience defines for?
diff --git a/include/linux/shm.h b/include/linux/shm.h
index 04e881829625..cd95243efd1a 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -24,26 +24,13 @@ struct shmid_kernel /* private to the kernel */
struct list_head shm_clist; /* list by creator */
};
-/* shm_mode upper byte flags */
-#define SHM_DEST 01000 /* segment will be destroyed on last detach */
-#define SHM_LOCKED 02000 /* segment will not be swapped */
-#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
-#define SHM_NORESERVE 010000 /* don't check for reservations */
-
-/* Bits [26:31] are reserved */
-
/*
- * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
+ * These flags are used internally; they cannot be specified by the user.
+ * They are masked off in newseg(). These values are used by IPC_CREAT
+ * and IPC_EXCL when calling shmget().
*/
-#define SHM_HUGE_SHIFT 26
-#define SHM_HUGE_MASK 0x3f
-#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
-#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
+#define SHM_DEST 01000 /* segment will be destroyed on last detach */
+#define SHM_LOCKED 02000 /* segment will not be swapped */
#ifdef CONFIG_SYSVIPC
struct sysv_shm {
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 1fbf24ea37fd..44b36cb228d7 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -40,15 +40,34 @@ struct shmid_ds {
/* Include the definition of shmid64_ds and shminfo64 */
#include <asm/shmbuf.h>
-/* permission flag for shmget */
+/* shmget() shmflg values. */
+/* The bottom nine bits are the same as open(2) mode flags */
#define SHM_R 0400 /* or S_IRUGO from <linux/stat.h> */
#define SHM_W 0200 /* or S_IWUGO from <linux/stat.h> */
+/* Bits 9 & 10 are IPC_CREAT and IPC_EXCL */
+#define SHM_HUGETLB (1 << 11) /* segment will use huge TLB pages */
+#define SHM_NORESERVE (1 << 12) /* don't check for reservations */
-/* mode for attach */
-#define SHM_RDONLY 010000 /* read-only access */
-#define SHM_RND 020000 /* round attach address to SHMLBA boundary */
-#define SHM_REMAP 040000 /* take-over region on attach */
-#define SHM_EXEC 0100000 /* execution access */
+/*
+ * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
+ * This gives us 6 bits, which is enough until someone invents 128 bit address
+ * spaces. These match MAP_HUGE_SHIFT and MAP_HUGE_MASK.
+ *
+ * Assume these are all powers of two.
+ * When 0 use the default page size.
+ */
+#define SHM_HUGE_SHIFT 26
+#define SHM_HUGE_MASK 0x3f
+#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
+#define SHM_HUGE_8MB (23 << SHM_HUGE_SHIFT)
+#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
+#define SHM_HUGE_16GB (34 << SHM_HUGE_SHIFT)
+
+/* shmat() shmflg values */
+#define SHM_RDONLY (1 << 12) /* read-only access */
+#define SHM_RND (1 << 13) /* round attach address to SHMLBA boundary */
+#define SHM_REMAP (1 << 14) /* take-over region on attach */
+#define SHM_EXEC (1 << 15) /* execution access */
/* super user shmctl commands */
#define SHM_LOCK 11
diff --git a/mm/mmap.c b/mm/mmap.c
index 499b988b1639..40b29aca18c1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1479,7 +1479,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
struct user_struct *user = NULL;
struct hstate *hs;
- hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & SHM_HUGE_MASK);
+ hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
if (!hs)
return -EINVAL;
On Tue 28-03-17 10:54:08, Matthew Wilcox wrote:
> On Tue, Mar 28, 2017 at 09:55:13AM -0700, Davidlohr Bueso wrote:
> > Do we have any consensus here? Keeping SHM_HUGE_* is currently
> > winning 2-1. If there are in fact users out there computing the
> > value manually, then I am ok with keeping it and properly exporting
> > it. Michal?
>
> Well, let's see what it looks like to do that. I went down the rabbit
> hole trying to understand why some of the SHM_ flags had the same value
> as each other until I realised some of them were internal flags, some
> were flags to shmat() and others were flags to shmget(). Hopefully I
> disambiguated them nicely in this patch. I also added 8MB and 16GB sizes.
> Any more architectures with a pet favourite huge/giant page size we
> should add convenience defines for?
Do we actually have any users?
--
Michal Hocko
SUSE Labs
On Wed, Mar 29, 2017 at 10:06:25AM +0200, Michal Hocko wrote:
> On Tue 28-03-17 10:54:08, Matthew Wilcox wrote:
> > On Tue, Mar 28, 2017 at 09:55:13AM -0700, Davidlohr Bueso wrote:
> > > Do we have any consensus here? Keeping SHM_HUGE_* is currently
> > > winning 2-1. If there are in fact users out there computing the
> > > value manually, then I am ok with keeping it and properly exporting
> > > it. Michal?
> >
> > Well, let's see what it looks like to do that. I went down the rabbit
> > hole trying to understand why some of the SHM_ flags had the same value
> > as each other until I realised some of them were internal flags, some
> > were flags to shmat() and others were flags to shmget(). Hopefully I
> > disambiguated them nicely in this patch. I also added 8MB and 16GB sizes.
> > Any more architectures with a pet favourite huge/giant page size we
> > should add convenience defines for?
>
> Do we actually have any users?
Yes this feature is widely used.
-Andi
On Wed 29-03-17 10:45:14, Andi Kleen wrote:
> On Wed, Mar 29, 2017 at 10:06:25AM +0200, Michal Hocko wrote:
> > On Tue 28-03-17 10:54:08, Matthew Wilcox wrote:
> > > On Tue, Mar 28, 2017 at 09:55:13AM -0700, Davidlohr Bueso wrote:
> > > > Do we have any consensus here? Keeping SHM_HUGE_* is currently
> > > > winning 2-1. If there are in fact users out there computing the
> > > > value manually, then I am ok with keeping it and properly exporting
> > > > it. Michal?
> > >
> > > Well, let's see what it looks like to do that. I went down the rabbit
> > > hole trying to understand why some of the SHM_ flags had the same value
> > > as each other until I realised some of them were internal flags, some
> > > were flags to shmat() and others were flags to shmget(). Hopefully I
> > > disambiguated them nicely in this patch. I also added 8MB and 16GB sizes.
> > > Any more architectures with a pet favourite huge/giant page size we
> > > should add convenience defines for?
> >
> > Do we actually have any users?
>
> Yes this feature is widely used.
Considering that none of SHM_HUGE* has been exported to the userspace
headers all the users would have to use the this flag by the value and I
am quite skeptical that application actually do that. Could you point me
to some projects that use this?
--
Michal Hocko
SUSE Labs
On Thu, 30 Mar 2017, Michal Hocko wrote:
>On Wed 29-03-17 10:45:14, Andi Kleen wrote:
>> On Wed, Mar 29, 2017 at 10:06:25AM +0200, Michal Hocko wrote:
>> >
>> > Do we actually have any users?
>>
>> Yes this feature is widely used.
>
>Considering that none of SHM_HUGE* has been exported to the userspace
>headers all the users would have to use the this flag by the value and I
>am quite skeptical that application actually do that. Could you point me
>to some projects that use this?
Hmm Andrew, if there's not one example, could you please pick up this patch?
Thanks,
Davidlohr
On Wed, Apr 12, 2017 at 09:18:29AM -0700, Davidlohr Bueso wrote:
> On Thu, 30 Mar 2017, Michal Hocko wrote:
>
> > On Wed 29-03-17 10:45:14, Andi Kleen wrote:
> > > On Wed, Mar 29, 2017 at 10:06:25AM +0200, Michal Hocko wrote:
> > > >
> > > > Do we actually have any users?
> > >
> > > Yes this feature is widely used.
> >
> > Considering that none of SHM_HUGE* has been exported to the userspace
> > headers all the users would have to use the this flag by the value and I
> > am quite skeptical that application actually do that. Could you point me
> > to some projects that use this?
>
> Hmm Andrew, if there's not one example, could you please pick up this patch?
?!? We just don't break user ABIs this way in Linux!
Just because you don't like something you cannot simply remove it.
I don't know if there are open source users, but there are known closed
source users for which this was originally implemented.
-Andi
On Wed, Apr 12, 2017 at 09:18:29AM -0700, Davidlohr Bueso wrote:
> On Thu, 30 Mar 2017, Michal Hocko wrote:
>
> > On Wed 29-03-17 10:45:14, Andi Kleen wrote:
> > > On Wed, Mar 29, 2017 at 10:06:25AM +0200, Michal Hocko wrote:
> > > >
> > > > Do we actually have any users?
> > >
> > > Yes this feature is widely used.
> >
> > Considering that none of SHM_HUGE* has been exported to the userspace
> > headers all the users would have to use the this flag by the value and I
> > am quite skeptical that application actually do that. Could you point me
> > to some projects that use this?
>
> Hmm Andrew, if there's not one example, could you please pick up this patch?
Please comment on the replacement patch I suggested.
Matthew Wilcox <[email protected]> writes:
> On Tue, Mar 28, 2017 at 09:55:13AM -0700, Davidlohr Bueso wrote:
>> Do we have any consensus here? Keeping SHM_HUGE_* is currently
>> winning 2-1. If there are in fact users out there computing the
>> value manually, then I am ok with keeping it and properly exporting
>> it. Michal?
>
> Well, let's see what it looks like to do that. I went down the rabbit
> hole trying to understand why some of the SHM_ flags had the same value
> as each other until I realised some of them were internal flags, some
> were flags to shmat() and others were flags to shmget(). Hopefully I
> disambiguated them nicely in this patch. I also added 8MB and 16GB sizes.
> Any more architectures with a pet favourite huge/giant page size we
> should add convenience defines for?
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index 04e881829625..cd95243efd1a 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -24,26 +24,13 @@ struct shmid_kernel /* private to the kernel */
> struct list_head shm_clist; /* list by creator */
> };
>
> -/* shm_mode upper byte flags */
> -#define SHM_DEST 01000 /* segment will be destroyed on last detach */
> -#define SHM_LOCKED 02000 /* segment will not be swapped */
> -#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
> -#define SHM_NORESERVE 010000 /* don't check for reservations */
> -
> -/* Bits [26:31] are reserved */
> -
> /*
> - * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> + * These flags are used internally; they cannot be specified by the user.
> + * They are masked off in newseg(). These values are used by IPC_CREAT
> + * and IPC_EXCL when calling shmget().
> */
> -#define SHM_HUGE_SHIFT 26
> -#define SHM_HUGE_MASK 0x3f
> -#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
> -#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
> +#define SHM_DEST 01000 /* segment will be destroyed on last detach */
> +#define SHM_LOCKED 02000 /* segment will not be swapped */
>
> #ifdef CONFIG_SYSVIPC
> struct sysv_shm {
> diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
> index 1fbf24ea37fd..44b36cb228d7 100644
> --- a/include/uapi/linux/shm.h
> +++ b/include/uapi/linux/shm.h
> @@ -40,15 +40,34 @@ struct shmid_ds {
> /* Include the definition of shmid64_ds and shminfo64 */
> #include <asm/shmbuf.h>
>
> -/* permission flag for shmget */
> +/* shmget() shmflg values. */
> +/* The bottom nine bits are the same as open(2) mode flags */
> #define SHM_R 0400 /* or S_IRUGO from <linux/stat.h> */
> #define SHM_W 0200 /* or S_IWUGO from <linux/stat.h> */
> +/* Bits 9 & 10 are IPC_CREAT and IPC_EXCL */
> +#define SHM_HUGETLB (1 << 11) /* segment will use huge TLB pages */
> +#define SHM_NORESERVE (1 << 12) /* don't check for reservations */
>
> -/* mode for attach */
> -#define SHM_RDONLY 010000 /* read-only access */
> -#define SHM_RND 020000 /* round attach address to SHMLBA boundary */
> -#define SHM_REMAP 040000 /* take-over region on attach */
> -#define SHM_EXEC 0100000 /* execution access */
> +/*
> + * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> + * This gives us 6 bits, which is enough until someone invents 128 bit address
> + * spaces. These match MAP_HUGE_SHIFT and MAP_HUGE_MASK.
> + *
> + * Assume these are all powers of two.
> + * When 0 use the default page size.
> + */
> +#define SHM_HUGE_SHIFT 26
> +#define SHM_HUGE_MASK 0x3f
> +#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
> +#define SHM_HUGE_8MB (23 << SHM_HUGE_SHIFT)
> +#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
> +#define SHM_HUGE_16GB (34 << SHM_HUGE_SHIFT)
This should be in arch/uapi like MAP_HUGE_2M ? That will let arch add
#defines based on the hugepae size supported by them.
> +
> +/* shmat() shmflg values */
> +#define SHM_RDONLY (1 << 12) /* read-only access */
> +#define SHM_RND (1 << 13) /* round attach address to SHMLBA boundary */
> +#define SHM_REMAP (1 << 14) /* take-over region on attach */
> +#define SHM_EXEC (1 << 15) /* execution access */
>
-aneesh
On Thu, Apr 13, 2017 at 11:32:09AM +0530, Aneesh Kumar K.V wrote:
> > +#define SHM_HUGE_SHIFT 26
> > +#define SHM_HUGE_MASK 0x3f
> > +#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
> > +#define SHM_HUGE_8MB (23 << SHM_HUGE_SHIFT)
> > +#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
> > +#define SHM_HUGE_16GB (34 << SHM_HUGE_SHIFT)
>
> This should be in arch/uapi like MAP_HUGE_2M ? That will let arch add
> #defines based on the hugepae size supported by them.
Well, what do we want to happen if source code contains SHM_HUGE_2MB?
Do we want it to fail to compile on ppc, or do we want it to request 2MB
pages and get ... hmm, looks like it fails at runtime (size_to_hstate
ends up returning NULL). So, yeah, looks like a compile-time failure
would be better.
But speaking of MAP_HUGE_, the only definitions so far are in arch/x86.
Are you going to add the ppc versions?
Also, which header file? I'm reluctant to add a new header, but
asm/shmbuf.h doesn't seem like a great place to put it.
If hugetlb pages are requested in mmap or shmget system calls, a huge
page size other than default can be requested. This is accomplished by
encoding the log2 of the huge page size in the upper bits of the flag
argument. asm-generic and arch specific headers all define the same
values for these encodings.
Put common definitions in a single header file. arch specific code can
still override if desired.
Signed-off-by: Mike Kravetz <[email protected]>
---
include/uapi/asm-generic/hugetlb_encode.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
create mode 100644 include/uapi/asm-generic/hugetlb_encode.h
diff --git a/include/uapi/asm-generic/hugetlb_encode.h b/include/uapi/asm-generic/hugetlb_encode.h
new file mode 100644
index 0000000..aa09fc0
--- /dev/null
+++ b/include/uapi/asm-generic/hugetlb_encode.h
@@ -0,0 +1,30 @@
+#ifndef _ASM_GENERIC_HUGETLB_ENCODE_H_
+#define _ASM_GENERIC_HUGETLB_ENCODE_H_
+
+/*
+ * Several system calls take a flag to request "hugetlb" huge pages.
+ * Without further specification, these system calls will use the
+ * system's default huge page size. If a system supports multiple
+ * huge page sizes, the desired huge page size can be specified in
+ * bits [26:31] of the flag arguments. The value in these 6 bits
+ * will encode the log2 of the huge page size.
+ *
+ * The following definitions are associated with this huge page size
+ * encoding in flag arguments. System call specific header files
+ * that use this encoding should include this file. They can then
+ * provide definitions based on these with their own specific prefix.
+ * for example #define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT.
+ */
+
+#define HUGETLB_FLAG_ENCODE_SHIFT 26
+#define HUGETLB_FLAG_ENCODE_MASK 0x3f
+
+#define HUGETLB_FLAG_ENCODE_512KB (19 << MAP_HUGE_SHIFT
+#define HUGETLB_FLAG_ENCODE_1MB (20 << MAP_HUGE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_2MB (21 << MAP_HUGE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_8MB (23 << MAP_HUGE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_16MB (24 << MAP_HUGE_SHIFT)
+#define HUGETLB_FLAG_ENCODE_1GB (30 << MAP_HUGE_SHIFT)
+#define HUGETLB_FLAG_ENCODE__16GB (34 << MAP_HUGE_SHIFT)
+
+#endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
--
2.7.5
I hate to resurrect this thread, but I would like to add hugetlb support
to memfd_create. This is for JVM garbage collection as discussed in
this thread [1].
Adding hugetlb support to memfd_create, means that memfd_create will take
a flag something like MFD_HUGETLB. And, if a user wants hugetlb pages
they may want a huge page size different than the system default. So, it
make sense to use the same type of encoding used by mmap and shmget.
However, I would hate to copy/paste the same values used by mmap and shmget
and just give them different names. So, how about something like the
following:
1) Put all the log2 encoded huge page size definitions in a common header
file.
2) Arch specific code can use these values, or overwrite as needed.
3) All system calls using this encoding (mmap, shmget and memfd_create in
the future) will use these common values.
I have also put the shm user space definitions in the uapi file as
previously suggested by Matthew Wilcox. I did not (yet) move the
shm definitions to arch specific files as suggested by Aneesh Kumar.
[1] https://lkml.org/lkml/2017/7/6/564
Mike Kravetz (3):
mm:hugetlb: Define system call hugetlb size encodings in single file
mm: arch: Use new hugetlb size encoding definitions
mm: shm: Use new hugetlb size encoding definitions
arch/alpha/include/uapi/asm/mman.h | 14 ++++++--------
arch/mips/include/uapi/asm/mman.h | 14 ++++++--------
arch/parisc/include/uapi/asm/mman.h | 14 ++++++--------
arch/powerpc/include/uapi/asm/mman.h | 23 ++++++++++-------------
arch/x86/include/uapi/asm/mman.h | 10 ++++++++--
arch/xtensa/include/uapi/asm/mman.h | 14 ++++++--------
include/linux/shm.h | 17 -----------------
include/uapi/asm-generic/hugetlb_encode.h | 30 ++++++++++++++++++++++++++++++
include/uapi/asm-generic/mman-common.h | 6 ++++--
include/uapi/linux/shm.h | 23 +++++++++++++++++++++--
10 files changed, 97 insertions(+), 68 deletions(-)
create mode 100644 include/uapi/asm-generic/hugetlb_encode.h
--
2.7.5
Use the common definitions from hugetlb_encode.h header file for
encoding hugetlb size definitions in shmget system call flags. In
addition, move these definitions to the from the internal to user
(uapi) header file.
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
---
include/linux/shm.h | 17 -----------------
include/uapi/linux/shm.h | 23 +++++++++++++++++++++--
2 files changed, 21 insertions(+), 19 deletions(-)
diff --git a/include/linux/shm.h b/include/linux/shm.h
index 04e8818..d56285a 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -27,23 +27,6 @@ struct shmid_kernel /* private to the kernel */
/* shm_mode upper byte flags */
#define SHM_DEST 01000 /* segment will be destroyed on last detach */
#define SHM_LOCKED 02000 /* segment will not be swapped */
-#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
-#define SHM_NORESERVE 010000 /* don't check for reservations */
-
-/* Bits [26:31] are reserved */
-
-/*
- * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
- */
-#define SHM_HUGE_SHIFT 26
-#define SHM_HUGE_MASK 0x3f
-#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
-#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
#ifdef CONFIG_SYSVIPC
struct sysv_shm {
diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
index 1fbf24e..329bc17 100644
--- a/include/uapi/linux/shm.h
+++ b/include/uapi/linux/shm.h
@@ -3,6 +3,7 @@
#include <linux/ipc.h>
#include <linux/errno.h>
+#include <asm-generic/hugetlb_encode.h>
#ifndef __KERNEL__
#include <unistd.h>
#endif
@@ -40,11 +41,29 @@ struct shmid_ds {
/* Include the definition of shmid64_ds and shminfo64 */
#include <asm/shmbuf.h>
-/* permission flag for shmget */
+/* shmget() shmflg values. */
+/* The bottom nine bits are the same as open(2) mode flags */
#define SHM_R 0400 /* or S_IRUGO from <linux/stat.h> */
#define SHM_W 0200 /* or S_IWUGO from <linux/stat.h> */
+/* Bits 9 & 10 are IPC_CREAT and IPC_EXCL */
+#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
+#define SHM_NORESERVE 010000 /* don't check for reservations */
-/* mode for attach */
+/*
+ * Huge page size encoding when SHM_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
+ */
+#define SHM_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define SHM_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
+#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB
+#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB
+#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
+#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB
+#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB
+#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
+#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE__16GB
+
+/* shmat() shmflg values */
#define SHM_RDONLY 010000 /* read-only access */
#define SHM_RND 020000 /* round attach address to SHMLBA boundary */
#define SHM_REMAP 040000 /* take-over region on attach */
--
2.7.5
Use the common definitions from hugetlb_encode.h header file for
encoding hugetlb size definitions in mmap system call flags.
Signed-off-by: Mike Kravetz <[email protected]>
---
arch/alpha/include/uapi/asm/mman.h | 14 ++++++--------
arch/mips/include/uapi/asm/mman.h | 14 ++++++--------
arch/parisc/include/uapi/asm/mman.h | 14 ++++++--------
arch/powerpc/include/uapi/asm/mman.h | 23 ++++++++++-------------
arch/x86/include/uapi/asm/mman.h | 10 ++++++++--
arch/xtensa/include/uapi/asm/mman.h | 14 ++++++--------
include/uapi/asm-generic/mman-common.h | 6 ++++--
7 files changed, 46 insertions(+), 49 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 02760f6..bfa5828 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -1,6 +1,8 @@
#ifndef __ALPHA_MMAN_H__
#define __ALPHA_MMAN_H__
+#include <asm-generic/hugetlb_encode.h>
+
#define PROT_READ 0x1 /* page can be read */
#define PROT_WRITE 0x2 /* page can be written */
#define PROT_EXEC 0x4 /* page can be executed */
@@ -68,15 +70,11 @@
#define MAP_FILE 0
/*
- * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
*/
-#define MAP_HUGE_SHIFT 26
-#define MAP_HUGE_MASK 0x3f
+#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 655e2fb..9e55284 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -8,6 +8,8 @@
#ifndef _ASM_MMAN_H
#define _ASM_MMAN_H
+#include <asm-generic/hugetlb_encode.h>
+
/*
* Protections are chosen from these bits, OR'd together. The
* implementation does not necessarily support PROT_EXEC or PROT_WRITE
@@ -95,15 +97,11 @@
#define MAP_FILE 0
/*
- * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
*/
-#define MAP_HUGE_SHIFT 26
-#define MAP_HUGE_MASK 0x3f
+#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 5979745..11c6d86 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -1,6 +1,8 @@
#ifndef __PARISC_MMAN_H__
#define __PARISC_MMAN_H__
+#include <asm-generic/hugetlb_encode.h>
+
#define PROT_READ 0x1 /* page can be read */
#define PROT_WRITE 0x2 /* page can be written */
#define PROT_EXEC 0x4 /* page can be executed */
@@ -65,15 +67,11 @@
#define MAP_VARIABLE 0
/*
- * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
*/
-#define MAP_HUGE_SHIFT 26
-#define MAP_HUGE_MASK 0x3f
+#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index ab45cc2..80fd56c 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -8,6 +8,7 @@
#define _UAPI_ASM_POWERPC_MMAN_H
#include <asm-generic/mman-common.h>
+#include <asm-generic/hugetlb_encode.h>
#define PROT_SAO 0x10 /* Strong Access Ordering */
@@ -30,19 +31,15 @@
#define MAP_HUGETLB 0x40000 /* create a huge page mapping */
/*
- * When MAP_HUGETLB is set, bits [26:31] of the flags argument to mmap(2),
- * encode the log2 of the huge page size. A value of zero indicates that the
- * default huge page size should be used. To use a non-default huge page size,
- * one of these defines can be used, or the size can be encoded by hand. Note
- * that on most systems only a subset, or possibly none, of these sizes will be
- * available.
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
*/
-#define MAP_HUGE_512KB (19 << MAP_HUGE_SHIFT) /* 512KB HugeTLB Page */
-#define MAP_HUGE_1MB (20 << MAP_HUGE_SHIFT) /* 1MB HugeTLB Page */
-#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT) /* 2MB HugeTLB Page */
-#define MAP_HUGE_8MB (23 << MAP_HUGE_SHIFT) /* 8MB HugeTLB Page */
-#define MAP_HUGE_16MB (24 << MAP_HUGE_SHIFT) /* 16MB HugeTLB Page */
-#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT) /* 1GB HugeTLB Page */
-#define MAP_HUGE_16GB (34 << MAP_HUGE_SHIFT) /* 16GB HugeTLB Page */
+#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB /* 512KB HugeTLB Page */
+#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB /* 1MB HugeTLB Page */
+#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB /* 2MB HugeTLB Page */
+#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB /* 8MB HugeTLB Page */
+#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB /* 16MB HugeTLB Page */
+#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB /* 1GB HugeTLB Page */
+#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE__16GB /* 16GB HugeTLB Page */
#endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index 39bca7f..5a67293 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -1,10 +1,16 @@
#ifndef _ASM_X86_MMAN_H
#define _ASM_X86_MMAN_H
+#include <asm-generic/hugetlb_encode.h>
+
#define MAP_32BIT 0x40 /* only give out 32bit addresses */
-#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
-#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
+/*
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
+ */
+#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
+#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
/*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 24365b3..5deb5e4 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -14,6 +14,8 @@
#ifndef _XTENSA_MMAN_H
#define _XTENSA_MMAN_H
+#include <asm-generic/hugetlb_encode.h>
+
/*
* Protections are chosen from these bits, OR'd together. The
* implementation does not necessarily support PROT_EXEC or PROT_WRITE
@@ -107,15 +109,11 @@
#define MAP_FILE 0
/*
- * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
- * This gives us 6 bits, which is enough until someone invents 128 bit address
- * spaces.
- *
- * Assume these are all power of twos.
- * When 0 use the default page size.
+ * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
+ * size other than the default is desired. See hugetlb_encode.h
*/
-#define MAP_HUGE_SHIFT 26
-#define MAP_HUGE_MASK 0x3f
+#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 8c27db0..00d56b8 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -1,6 +1,8 @@
#ifndef __ASM_GENERIC_MMAN_COMMON_H
#define __ASM_GENERIC_MMAN_COMMON_H
+#include <asm-generic/hugetlb_encode.h>
+
/*
Author: Michael S. Tsirkin <[email protected]>, Mellanox Technologies Ltd.
Based on: asm-xxx/mman.h
@@ -69,8 +71,8 @@
* Assume these are all power of twos.
* When 0 use the default page size.
*/
-#define MAP_HUGE_SHIFT 26
-#define MAP_HUGE_MASK 0x3f
+#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
+#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE 0x2
--
2.7.5
On Mon, Jul 17, 2017 at 03:27:59PM -0700, Mike Kravetz wrote:
> +#define HUGETLB_FLAG_ENCODE_512KB (19 << MAP_HUGE_SHIFT
> +#define HUGETLB_FLAG_ENCODE_1MB (20 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_2MB (21 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_8MB (23 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_16MB (24 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_1GB (30 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE__16GB (34 << MAP_HUGE_SHIFT)
The __ seems like a mistake?
On Mon 17-07-17 15:27:59, Mike Kravetz wrote:
> If hugetlb pages are requested in mmap or shmget system calls, a huge
> page size other than default can be requested. This is accomplished by
> encoding the log2 of the huge page size in the upper bits of the flag
> argument. asm-generic and arch specific headers all define the same
> values for these encodings.
>
> Put common definitions in a single header file. arch specific code can
> still override if desired.
>
> Signed-off-by: Mike Kravetz <[email protected]>
with
s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
Acked-by: Michal Hocko <[email protected]>
> ---
> include/uapi/asm-generic/hugetlb_encode.h | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
> create mode 100644 include/uapi/asm-generic/hugetlb_encode.h
>
> diff --git a/include/uapi/asm-generic/hugetlb_encode.h b/include/uapi/asm-generic/hugetlb_encode.h
> new file mode 100644
> index 0000000..aa09fc0
> --- /dev/null
> +++ b/include/uapi/asm-generic/hugetlb_encode.h
> @@ -0,0 +1,30 @@
> +#ifndef _ASM_GENERIC_HUGETLB_ENCODE_H_
> +#define _ASM_GENERIC_HUGETLB_ENCODE_H_
> +
> +/*
> + * Several system calls take a flag to request "hugetlb" huge pages.
> + * Without further specification, these system calls will use the
> + * system's default huge page size. If a system supports multiple
> + * huge page sizes, the desired huge page size can be specified in
> + * bits [26:31] of the flag arguments. The value in these 6 bits
> + * will encode the log2 of the huge page size.
> + *
> + * The following definitions are associated with this huge page size
> + * encoding in flag arguments. System call specific header files
> + * that use this encoding should include this file. They can then
> + * provide definitions based on these with their own specific prefix.
> + * for example #define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT.
> + */
> +
> +#define HUGETLB_FLAG_ENCODE_SHIFT 26
> +#define HUGETLB_FLAG_ENCODE_MASK 0x3f
> +
> +#define HUGETLB_FLAG_ENCODE_512KB (19 << MAP_HUGE_SHIFT
> +#define HUGETLB_FLAG_ENCODE_1MB (20 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_2MB (21 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_8MB (23 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_16MB (24 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE_1GB (30 << MAP_HUGE_SHIFT)
> +#define HUGETLB_FLAG_ENCODE__16GB (34 << MAP_HUGE_SHIFT)
> +
> +#endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */
> --
> 2.7.5
>
--
Michal Hocko
SUSE Labs
On Mon 17-07-17 15:28:00, Mike Kravetz wrote:
> Use the common definitions from hugetlb_encode.h header file for
> encoding hugetlb size definitions in mmap system call flags.
>
> Signed-off-by: Mike Kravetz <[email protected]>
with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
Acked-by: Michal Hocko <[email protected]>
> ---
> arch/alpha/include/uapi/asm/mman.h | 14 ++++++--------
> arch/mips/include/uapi/asm/mman.h | 14 ++++++--------
> arch/parisc/include/uapi/asm/mman.h | 14 ++++++--------
> arch/powerpc/include/uapi/asm/mman.h | 23 ++++++++++-------------
> arch/x86/include/uapi/asm/mman.h | 10 ++++++++--
> arch/xtensa/include/uapi/asm/mman.h | 14 ++++++--------
> include/uapi/asm-generic/mman-common.h | 6 ++++--
> 7 files changed, 46 insertions(+), 49 deletions(-)
>
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 02760f6..bfa5828 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -1,6 +1,8 @@
> #ifndef __ALPHA_MMAN_H__
> #define __ALPHA_MMAN_H__
>
> +#include <asm-generic/hugetlb_encode.h>
> +
> #define PROT_READ 0x1 /* page can be read */
> #define PROT_WRITE 0x2 /* page can be written */
> #define PROT_EXEC 0x4 /* page can be executed */
> @@ -68,15 +70,11 @@
> #define MAP_FILE 0
>
> /*
> - * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> + * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> */
> -#define MAP_HUGE_SHIFT 26
> -#define MAP_HUGE_MASK 0x3f
> +#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
> +#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
>
> #define PKEY_DISABLE_ACCESS 0x1
> #define PKEY_DISABLE_WRITE 0x2
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index 655e2fb..9e55284 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -8,6 +8,8 @@
> #ifndef _ASM_MMAN_H
> #define _ASM_MMAN_H
>
> +#include <asm-generic/hugetlb_encode.h>
> +
> /*
> * Protections are chosen from these bits, OR'd together. The
> * implementation does not necessarily support PROT_EXEC or PROT_WRITE
> @@ -95,15 +97,11 @@
> #define MAP_FILE 0
>
> /*
> - * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> + * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> */
> -#define MAP_HUGE_SHIFT 26
> -#define MAP_HUGE_MASK 0x3f
> +#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
> +#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
>
> #define PKEY_DISABLE_ACCESS 0x1
> #define PKEY_DISABLE_WRITE 0x2
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index 5979745..11c6d86 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -1,6 +1,8 @@
> #ifndef __PARISC_MMAN_H__
> #define __PARISC_MMAN_H__
>
> +#include <asm-generic/hugetlb_encode.h>
> +
> #define PROT_READ 0x1 /* page can be read */
> #define PROT_WRITE 0x2 /* page can be written */
> #define PROT_EXEC 0x4 /* page can be executed */
> @@ -65,15 +67,11 @@
> #define MAP_VARIABLE 0
>
> /*
> - * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> + * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> */
> -#define MAP_HUGE_SHIFT 26
> -#define MAP_HUGE_MASK 0x3f
> +#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
> +#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
>
> #define PKEY_DISABLE_ACCESS 0x1
> #define PKEY_DISABLE_WRITE 0x2
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index ab45cc2..80fd56c 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -8,6 +8,7 @@
> #define _UAPI_ASM_POWERPC_MMAN_H
>
> #include <asm-generic/mman-common.h>
> +#include <asm-generic/hugetlb_encode.h>
>
>
> #define PROT_SAO 0x10 /* Strong Access Ordering */
> @@ -30,19 +31,15 @@
> #define MAP_HUGETLB 0x40000 /* create a huge page mapping */
>
> /*
> - * When MAP_HUGETLB is set, bits [26:31] of the flags argument to mmap(2),
> - * encode the log2 of the huge page size. A value of zero indicates that the
> - * default huge page size should be used. To use a non-default huge page size,
> - * one of these defines can be used, or the size can be encoded by hand. Note
> - * that on most systems only a subset, or possibly none, of these sizes will be
> - * available.
> + * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> */
> -#define MAP_HUGE_512KB (19 << MAP_HUGE_SHIFT) /* 512KB HugeTLB Page */
> -#define MAP_HUGE_1MB (20 << MAP_HUGE_SHIFT) /* 1MB HugeTLB Page */
> -#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT) /* 2MB HugeTLB Page */
> -#define MAP_HUGE_8MB (23 << MAP_HUGE_SHIFT) /* 8MB HugeTLB Page */
> -#define MAP_HUGE_16MB (24 << MAP_HUGE_SHIFT) /* 16MB HugeTLB Page */
> -#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT) /* 1GB HugeTLB Page */
> -#define MAP_HUGE_16GB (34 << MAP_HUGE_SHIFT) /* 16GB HugeTLB Page */
> +#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB /* 512KB HugeTLB Page */
> +#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB /* 1MB HugeTLB Page */
> +#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB /* 2MB HugeTLB Page */
> +#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB /* 8MB HugeTLB Page */
> +#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB /* 16MB HugeTLB Page */
> +#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB /* 1GB HugeTLB Page */
> +#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE__16GB /* 16GB HugeTLB Page */
>
> #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
> index 39bca7f..5a67293 100644
> --- a/arch/x86/include/uapi/asm/mman.h
> +++ b/arch/x86/include/uapi/asm/mman.h
> @@ -1,10 +1,16 @@
> #ifndef _ASM_X86_MMAN_H
> #define _ASM_X86_MMAN_H
>
> +#include <asm-generic/hugetlb_encode.h>
> +
> #define MAP_32BIT 0x40 /* only give out 32bit addresses */
>
> -#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
> -#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
> +/*
> + * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> + */
> +#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
> +#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> /*
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index 24365b3..5deb5e4 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -14,6 +14,8 @@
> #ifndef _XTENSA_MMAN_H
> #define _XTENSA_MMAN_H
>
> +#include <asm-generic/hugetlb_encode.h>
> +
> /*
> * Protections are chosen from these bits, OR'd together. The
> * implementation does not necessarily support PROT_EXEC or PROT_WRITE
> @@ -107,15 +109,11 @@
> #define MAP_FILE 0
>
> /*
> - * When MAP_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> + * Huge page size encoding when MAP_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> */
> -#define MAP_HUGE_SHIFT 26
> -#define MAP_HUGE_MASK 0x3f
> +#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
> +#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
>
> #define PKEY_DISABLE_ACCESS 0x1
> #define PKEY_DISABLE_WRITE 0x2
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index 8c27db0..00d56b8 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -1,6 +1,8 @@
> #ifndef __ASM_GENERIC_MMAN_COMMON_H
> #define __ASM_GENERIC_MMAN_COMMON_H
>
> +#include <asm-generic/hugetlb_encode.h>
> +
> /*
> Author: Michael S. Tsirkin <[email protected]>, Mellanox Technologies Ltd.
> Based on: asm-xxx/mman.h
> @@ -69,8 +71,8 @@
> * Assume these are all power of twos.
> * When 0 use the default page size.
> */
> -#define MAP_HUGE_SHIFT 26
> -#define MAP_HUGE_MASK 0x3f
> +#define MAP_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
> +#define MAP_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
>
> #define PKEY_DISABLE_ACCESS 0x1
> #define PKEY_DISABLE_WRITE 0x2
> --
> 2.7.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
--
Michal Hocko
SUSE Labs
On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
> Use the common definitions from hugetlb_encode.h header file for
> encoding hugetlb size definitions in shmget system call flags. In
> addition, move these definitions to the from the internal to user
> (uapi) header file.
s@to the from@from@
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Mike Kravetz <[email protected]>
with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
Acked-by: Michal Hocko <[email protected]>
> ---
> include/linux/shm.h | 17 -----------------
> include/uapi/linux/shm.h | 23 +++++++++++++++++++++--
> 2 files changed, 21 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/shm.h b/include/linux/shm.h
> index 04e8818..d56285a 100644
> --- a/include/linux/shm.h
> +++ b/include/linux/shm.h
> @@ -27,23 +27,6 @@ struct shmid_kernel /* private to the kernel */
> /* shm_mode upper byte flags */
> #define SHM_DEST 01000 /* segment will be destroyed on last detach */
> #define SHM_LOCKED 02000 /* segment will not be swapped */
> -#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
> -#define SHM_NORESERVE 010000 /* don't check for reservations */
> -
> -/* Bits [26:31] are reserved */
> -
> -/*
> - * When SHM_HUGETLB is set bits [26:31] encode the log2 of the huge page size.
> - * This gives us 6 bits, which is enough until someone invents 128 bit address
> - * spaces.
> - *
> - * Assume these are all power of twos.
> - * When 0 use the default page size.
> - */
> -#define SHM_HUGE_SHIFT 26
> -#define SHM_HUGE_MASK 0x3f
> -#define SHM_HUGE_2MB (21 << SHM_HUGE_SHIFT)
> -#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
>
> #ifdef CONFIG_SYSVIPC
> struct sysv_shm {
> diff --git a/include/uapi/linux/shm.h b/include/uapi/linux/shm.h
> index 1fbf24e..329bc17 100644
> --- a/include/uapi/linux/shm.h
> +++ b/include/uapi/linux/shm.h
> @@ -3,6 +3,7 @@
>
> #include <linux/ipc.h>
> #include <linux/errno.h>
> +#include <asm-generic/hugetlb_encode.h>
> #ifndef __KERNEL__
> #include <unistd.h>
> #endif
> @@ -40,11 +41,29 @@ struct shmid_ds {
> /* Include the definition of shmid64_ds and shminfo64 */
> #include <asm/shmbuf.h>
>
> -/* permission flag for shmget */
> +/* shmget() shmflg values. */
> +/* The bottom nine bits are the same as open(2) mode flags */
> #define SHM_R 0400 /* or S_IRUGO from <linux/stat.h> */
> #define SHM_W 0200 /* or S_IWUGO from <linux/stat.h> */
> +/* Bits 9 & 10 are IPC_CREAT and IPC_EXCL */
> +#define SHM_HUGETLB 04000 /* segment will use huge TLB pages */
> +#define SHM_NORESERVE 010000 /* don't check for reservations */
>
> -/* mode for attach */
> +/*
> + * Huge page size encoding when SHM_HUGETLB is specified, and a huge page
> + * size other than the default is desired. See hugetlb_encode.h
> + */
> +#define SHM_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
> +#define SHM_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK
> +#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB
> +#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB
> +#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
> +#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB
> +#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB
> +#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
> +#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE__16GB
> +
> +/* shmat() shmflg values */
> #define SHM_RDONLY 010000 /* read-only access */
> #define SHM_RND 020000 /* round attach address to SHMLBA boundary */
> #define SHM_REMAP 040000 /* take-over region on attach */
> --
> 2.7.5
>
--
Michal Hocko
SUSE Labs
On Wed 26-07-17 11:53:38, Michal Hocko wrote:
> On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
> > Use the common definitions from hugetlb_encode.h header file for
> > encoding hugetlb size definitions in shmget system call flags. In
> > addition, move these definitions to the from the internal to user
> > (uapi) header file.
>
> s@to the from@from@
>
> >
> > Suggested-by: Matthew Wilcox <[email protected]>
> > Signed-off-by: Mike Kravetz <[email protected]>
>
> with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
>
> Acked-by: Michal Hocko <[email protected]>
Btw. man page mentions only 2MB and 1GB, we should document others and
note that each arch might support only subset of them
> > +#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB
> > +#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB
> > +#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
> > +#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB
> > +#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB
> > +#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
> > +#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE__16GB
--
Michal Hocko
SUSE Labs
On 07/26/2017 03:07 AM, Michal Hocko wrote:
> On Wed 26-07-17 11:53:38, Michal Hocko wrote:
>> On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
>>> Use the common definitions from hugetlb_encode.h header file for
>>> encoding hugetlb size definitions in shmget system call flags. In
>>> addition, move these definitions to the from the internal to user
>>> (uapi) header file.
>>
>> s@to the from@from@
>>
>>>
>>> Suggested-by: Matthew Wilcox <[email protected]>
>>> Signed-off-by: Mike Kravetz <[email protected]>
>>
>> with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
>>
>> Acked-by: Michal Hocko <[email protected]>
>
> Btw. man page mentions only 2MB and 1GB, we should document others and
> note that each arch might support only subset of them
Thanks for looking at these Michal.
BTW, those definitions below are wrong. They should be SHM_HUGE_*. :(
In the overview of this RFC, I mentioned still needing to address the
comment from Aneesh about splitting SHM_HUGE_* definitions into arch
specific header files. This is how it is done for mmap. If an arch
supports multiple huge page sizes, the 'asm/mman.h' contains definitions
for those sizes. There will be a bit of churn (such as header file
renaming) to do this for shm as well. So, I keep going back and forth
asking myself 'is it worth it'? Some things to consider.
- We should be consistent between mmap and shm. Also remember, that I
will propose adding the same type of encoding to memfd_create. So,
three system calls will use the encoding. They should be consistent.
- Adding the arch specific definitions seems the 'most correct', as a
user can not use a definition not supported by the arch. However,
even if an arch supports a huge page size it does not mean that the
running kernel supports that size. Therefore, the folllowing is in
the man page.
"The range of huge page sizes that are supported by the system
can be discovered by listing the subdirectories in
/sys/kernel/mm/hugepages."
- Another alternative is to make all known huge page sizes available
to all users. This is 'easier' as the definitions can likely reside
in a common header file. The user will need to determine what
huge page sizes are supported by the running kernel as mentioned in
the man page.
Any thoughts/suggestions on these alternatives? I'll send out another
patch set based on comments. In any case, I think mmap and shm need to
be the same.
--
Mike Kravetz
>>> +#define MAP_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB
>>> +#define MAP_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB
>>> +#define MAP_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
>>> +#define MAP_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB
>>> +#define MAP_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB
>>> +#define MAP_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
>>> +#define MAP_HUGE_16GB HUGETLB_FLAG_ENCODE__16GB
On Wed, Jul 26, 2017 at 10:39:30AM -0700, Mike Kravetz wrote:
> In the overview of this RFC, I mentioned still needing to address the
> comment from Aneesh about splitting SHM_HUGE_* definitions into arch
> specific header files. This is how it is done for mmap. If an arch
> supports multiple huge page sizes, the 'asm/mman.h' contains definitions
> for those sizes. There will be a bit of churn (such as header file
> renaming) to do this for shm as well. So, I keep going back and forth
> asking myself 'is it worth it'? Some things to consider.
>
> - We should be consistent between mmap and shm. Also remember, that I
> will propose adding the same type of encoding to memfd_create. So,
> three system calls will use the encoding. They should be consistent.
I think mmap is wrong here. User programs are generally not architecture
specific, so they'll have to test with ifdefs or something awful.
For all we know, POWER 14 and whatever x86 CPU comes out in 2030 will
support (nearly) arbitrary page sizes like Itanium does, and a user
program compiled today should be able to take advantage of it.
> - Adding the arch specific definitions seems the 'most correct', as a
> user can not use a definition not supported by the arch. However,
> even if an arch supports a huge page size it does not mean that the
> running kernel supports that size. Therefore, the folllowing is in
> the man page.
> "The range of huge page sizes that are supported by the system
> can be discovered by listing the subdirectories in
> /sys/kernel/mm/hugepages."
> - Another alternative is to make all known huge page sizes available
> to all users. This is 'easier' as the definitions can likely reside
> in a common header file. The user will need to determine what
> huge page sizes are supported by the running kernel as mentioned in
> the man page.
That's my preference.
On Wed 26-07-17 10:39:30, Mike Kravetz wrote:
> On 07/26/2017 03:07 AM, Michal Hocko wrote:
> > On Wed 26-07-17 11:53:38, Michal Hocko wrote:
> >> On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
> >>> Use the common definitions from hugetlb_encode.h header file for
> >>> encoding hugetlb size definitions in shmget system call flags. In
> >>> addition, move these definitions to the from the internal to user
> >>> (uapi) header file.
> >>
> >> s@to the from@from@
> >>
> >>>
> >>> Suggested-by: Matthew Wilcox <[email protected]>
> >>> Signed-off-by: Mike Kravetz <[email protected]>
> >>
> >> with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
> >>
> >> Acked-by: Michal Hocko <[email protected]>
> >
> > Btw. man page mentions only 2MB and 1GB, we should document others and
> > note that each arch might support only subset of them
>
> Thanks for looking at these Michal.
> BTW, those definitions below are wrong. They should be SHM_HUGE_*. :(
Ups, and I completely missed that.
> In the overview of this RFC, I mentioned still needing to address the
> comment from Aneesh about splitting SHM_HUGE_* definitions into arch
> specific header files. This is how it is done for mmap. If an arch
> supports multiple huge page sizes, the 'asm/mman.h' contains definitions
> for those sizes. There will be a bit of churn (such as header file
> renaming) to do this for shm as well. So, I keep going back and forth
> asking myself 'is it worth it'?
Why cannot we use a generic header? Btw. I think it would be better for
MMAP definitions as well.
> Some things to consider.
>
> - We should be consistent between mmap and shm. Also remember, that I
> will propose adding the same type of encoding to memfd_create. So,
> three system calls will use the encoding. They should be consistent.
agreed
> - Adding the arch specific definitions seems the 'most correct', as a
> user can not use a definition not supported by the arch. However,
> even if an arch supports a huge page size it does not mean that the
> running kernel supports that size. Therefore, the folllowing is in
> the man page.
> "The range of huge page sizes that are supported by the system
> can be discovered by listing the subdirectories in
> /sys/kernel/mm/hugepages."
Doesn't the respective call return -EINVAL on the unsupported hugepage
size?
> - Another alternative is to make all known huge page sizes available
> to all users. This is 'easier' as the definitions can likely reside
> in a common header file. The user will need to determine what
> huge page sizes are supported by the running kernel as mentioned in
> the man page.
yes I think this makes more sense.
--
Michal Hocko
SUSE Labs
On 07/27/2017 12:50 AM, Michal Hocko wrote:
> On Wed 26-07-17 10:39:30, Mike Kravetz wrote:
>> On 07/26/2017 03:07 AM, Michal Hocko wrote:
>>> On Wed 26-07-17 11:53:38, Michal Hocko wrote:
>>>> On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
>>>>> Use the common definitions from hugetlb_encode.h header file for
>>>>> encoding hugetlb size definitions in shmget system call flags. In
>>>>> addition, move these definitions to the from the internal to user
>>>>> (uapi) header file.
>>>>
>>>> s@to the from@from@
>>>>
>>>>>
>>>>> Suggested-by: Matthew Wilcox <[email protected]>
>>>>> Signed-off-by: Mike Kravetz <[email protected]>
>>>>
>>>> with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
>>>>
>>>> Acked-by: Michal Hocko <[email protected]>
>>>
>>> Btw. man page mentions only 2MB and 1GB, we should document others and
>>> note that each arch might support only subset of them
>>
>> Thanks for looking at these Michal.
>> BTW, those definitions below are wrong. They should be SHM_HUGE_*. :(
>
> Ups, and I completely missed that.
>
>> In the overview of this RFC, I mentioned still needing to address the
>> comment from Aneesh about splitting SHM_HUGE_* definitions into arch
>> specific header files. This is how it is done for mmap. If an arch
>> supports multiple huge page sizes, the 'asm/mman.h' contains definitions
>> for those sizes. There will be a bit of churn (such as header file
>> renaming) to do this for shm as well. So, I keep going back and forth
>> asking myself 'is it worth it'?
>
> Why cannot we use a generic header? Btw. I think it would be better for
> MMAP definitions as well.
I assume you are asking about a uapi asm-generic header file? Currently
mmap has two such files: mman.h and mman-common.h. In order to get the
definitions in such files, arch specific header files must #include the
asm-generic headers. There are arch specific mmap headers today that do
not include either of the asm-generic headers. And, they have their own
definitions for MAP_HUGE_SHIFT. So, it seems we can not use one of the
existing mmap asm-generic header files. Rather, we would need to create
a new one and have that included by all arch specific files.
However, ALL the MAP_HUGE_* definitions in all the arch specific and
asm-generic header files are the same. It would be possible to just put
all those MAP_HUGE_* definitions in the primary uapi header file
(include/uapi/linux/mman.h). If there was ever a need for arch specific
values in the future, we could split them out at that time.
>> Some things to consider.
>>
>> - We should be consistent between mmap and shm. Also remember, that I
>> will propose adding the same type of encoding to memfd_create. So,
>> three system calls will use the encoding. They should be consistent.
>
> agreed
>
>> - Adding the arch specific definitions seems the 'most correct', as a
>> user can not use a definition not supported by the arch. However,
>> even if an arch supports a huge page size it does not mean that the
>> running kernel supports that size. Therefore, the folllowing is in
>> the man page.
>> "The range of huge page sizes that are supported by the system
>> can be discovered by listing the subdirectories in
>> /sys/kernel/mm/hugepages."
>
> Doesn't the respective call return -EINVAL on the unsupported hugepage
> size?
Yes, it does.
>> - Another alternative is to make all known huge page sizes available
>> to all users. This is 'easier' as the definitions can likely reside
>> in a common header file. The user will need to determine what
>> huge page sizes are supported by the running kernel as mentioned in
>> the man page.
>
> yes I think this makes more sense.
Ok, thanks.
The only remaining question is what kind of common header to use:
1) An asm-generic header file in case there may be arch specific differences
in the future.
2) Use the primary uapi header file in include/uapi/linux/mman|shm.h.
--
Mike Kravetz
On Thu 27-07-17 14:18:11, Mike Kravetz wrote:
> On 07/27/2017 12:50 AM, Michal Hocko wrote:
> > On Wed 26-07-17 10:39:30, Mike Kravetz wrote:
> >> On 07/26/2017 03:07 AM, Michal Hocko wrote:
> >>> On Wed 26-07-17 11:53:38, Michal Hocko wrote:
> >>>> On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
> >>>>> Use the common definitions from hugetlb_encode.h header file for
> >>>>> encoding hugetlb size definitions in shmget system call flags. In
> >>>>> addition, move these definitions to the from the internal to user
> >>>>> (uapi) header file.
> >>>>
> >>>> s@to the from@from@
> >>>>
> >>>>>
> >>>>> Suggested-by: Matthew Wilcox <[email protected]>
> >>>>> Signed-off-by: Mike Kravetz <[email protected]>
> >>>>
> >>>> with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
> >>>>
> >>>> Acked-by: Michal Hocko <[email protected]>
> >>>
> >>> Btw. man page mentions only 2MB and 1GB, we should document others and
> >>> note that each arch might support only subset of them
> >>
> >> Thanks for looking at these Michal.
> >> BTW, those definitions below are wrong. They should be SHM_HUGE_*. :(
> >
> > Ups, and I completely missed that.
> >
> >> In the overview of this RFC, I mentioned still needing to address the
> >> comment from Aneesh about splitting SHM_HUGE_* definitions into arch
> >> specific header files. This is how it is done for mmap. If an arch
> >> supports multiple huge page sizes, the 'asm/mman.h' contains definitions
> >> for those sizes. There will be a bit of churn (such as header file
> >> renaming) to do this for shm as well. So, I keep going back and forth
> >> asking myself 'is it worth it'?
> >
> > Why cannot we use a generic header? Btw. I think it would be better for
> > MMAP definitions as well.
>
> I assume you are asking about a uapi asm-generic header file? Currently
> mmap has two such files: mman.h and mman-common.h. In order to get the
> definitions in such files, arch specific header files must #include the
> asm-generic headers. There are arch specific mmap headers today that do
> not include either of the asm-generic headers. And, they have their own
> definitions for MAP_HUGE_SHIFT. So, it seems we can not use one of the
> existing mmap asm-generic header files. Rather, we would need to create
> a new one and have that included by all arch specific files.
yes, add a new one like you did in your first patch
> However, ALL the MAP_HUGE_* definitions in all the arch specific and
> asm-generic header files are the same. It would be possible to just put
> all those MAP_HUGE_* definitions in the primary uapi header file
> (include/uapi/linux/mman.h). If there was ever a need for arch specific
> values in the future, we could split them out at that time.
agreed
[...]
> >> - Another alternative is to make all known huge page sizes available
> >> to all users. This is 'easier' as the definitions can likely reside
> >> in a common header file. The user will need to determine what
> >> huge page sizes are supported by the running kernel as mentioned in
> >> the man page.
> >
> > yes I think this makes more sense.
>
> Ok, thanks.
>
> The only remaining question is what kind of common header to use:
> 1) An asm-generic header file in case there may be arch specific differences
> in the future.
> 2) Use the primary uapi header file in include/uapi/linux/mman|shm.h.
I would use the primary one and only got the arch specific if we ever
need to do arch specific thing.
--
Michal Hocko
SUSE Labs