2005-11-09 18:46:51

by Arun Sharma

[permalink] [raw]
Subject: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Allow shmctl to find out if a shmid corresponds to a HUGETLB segment

Signed-off-by: Arun Sharma <[email protected]>
Acked-by: Rohit Seth <[email protected]>

--- a/ipc/shm.c Tue Nov 8 20:58:38 2005
+++ b/ipc/shm.c Wed Nov 9 10:26:37 2005
@@ -197,7 +197,7 @@
return -ENOMEM;

shp->shm_perm.key = key;
- shp->shm_flags = (shmflg & S_IRWXUGO);
+ shp->shm_flags = (shmflg & (S_IRWXUGO | SHM_HUGETLB));
shp->mlock_user = NULL;

shp->shm_perm.security = NULL;


2005-11-10 06:22:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Arun Sharma <[email protected]> wrote:
>
> Allow shmctl to find out if a shmid corresponds to a HUGETLB segment
>
> Signed-off-by: Arun Sharma <[email protected]>
> Acked-by: Rohit Seth <[email protected]>
>
> --- a/ipc/shm.c Tue Nov 8 20:58:38 2005
> +++ b/ipc/shm.c Wed Nov 9 10:26:37 2005
> @@ -197,7 +197,7 @@
> return -ENOMEM;
>
> shp->shm_perm.key = key;
> - shp->shm_flags = (shmflg & S_IRWXUGO);
> + shp->shm_flags = (shmflg & (S_IRWXUGO | SHM_HUGETLB));
> shp->mlock_user = NULL;
>
> shp->shm_perm.security = NULL;

I dunno. The manpage says:

The highlighted fields in the member shm_perm can be set:

struct ipc_perm {
...
ushort mode; /* lower 9 bits of access modes */
...
};

So if an application used to do:

if (perm.mode == 0666)

it will now break, because we've gone and set bit 9 on hugetlb segments.

2005-11-10 18:36:12

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Andrew Morton wrote:

>> + shp->shm_flags = (shmflg & (S_IRWXUGO | SHM_HUGETLB));
> [...]
> I dunno. The manpage says:
>
> The highlighted fields in the member shm_perm can be set:
>
> struct ipc_perm {
> ...
> ushort mode; /* lower 9 bits of access modes */
> ...
> };
>
> So if an application used to do:
>
> if (perm.mode == 0666)
>
> it will now break, because we've gone and set bit 9 on hugetlb segments.

The man page on my system says:

unsigned short mode; /* Permissions + SHM_DEST and
SHM_LOCKED flags */

I looked for a precendent before sending the patch and thought that
SHM_LOCKED was a good one.

-Arun

2005-11-10 19:59:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Arun Sharma <[email protected]> wrote:
>
> Andrew Morton wrote:
>
> >> + shp->shm_flags = (shmflg & (S_IRWXUGO | SHM_HUGETLB));
> > [...]
> > I dunno. The manpage says:
> >
> > The highlighted fields in the member shm_perm can be set:
> >
> > struct ipc_perm {
> > ...
> > ushort mode; /* lower 9 bits of access modes */
> > ...
> > };
> >
> > So if an application used to do:
> >
> > if (perm.mode == 0666)
> >
> > it will now break, because we've gone and set bit 9 on hugetlb segments.
>
> The man page on my system says:
>
> unsigned short mode; /* Permissions + SHM_DEST and
> SHM_LOCKED flags */
>
> I looked for a precendent before sending the patch and thought that
> SHM_LOCKED was a good one.
>

hm, OK. But an app could still do

if (mode == 0666|SHM_LOCKED)


How important is this feature?

2005-11-10 21:41:51

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Andrew Morton wrote:

>>The man page on my system says:
>>
>> unsigned short mode; /* Permissions + SHM_DEST and
>> SHM_LOCKED flags */
>>
>>I looked for a precendent before sending the patch and thought that
>>SHM_LOCKED was a good one.
>>
>
>
> hm, OK. But an app could still do
>
> if (mode == 0666|SHM_LOCKED)
>

I'd argue that the app should really be doing (perm.mode & 0777 = 0666)

>
> How important is this feature?

Without this feature, an application has no way to figure out if a given
segment is hugetlb or not. Applications need to know this to be able to
handle alignment issues properly.

Also, if the flag is exported via ipcs, the system administrator would
have a better idea about how the hugetlb pages she configured on the
system are getting used.

-Arun

2005-11-10 22:06:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Arun Sharma <[email protected]> wrote:
>
> Andrew Morton wrote:
>
> >>The man page on my system says:
> >>
> >> unsigned short mode; /* Permissions + SHM_DEST and
> >> SHM_LOCKED flags */
> >>
> >>I looked for a precendent before sending the patch and thought that
> >>SHM_LOCKED was a good one.
> >>
> >
> >
> > hm, OK. But an app could still do
> >
> > if (mode == 0666|SHM_LOCKED)
> >
>
> I'd argue that the app should really be doing (perm.mode & 0777 = 0666)

I find your faith in your fellow programmers to be trluy inspirational ;)

> >
> > How important is this feature?
>
> Without this feature, an application has no way to figure out if a given
> segment is hugetlb or not. Applications need to know this to be able to
> handle alignment issues properly.
>
> Also, if the flag is exported via ipcs, the system administrator would
> have a better idea about how the hugetlb pages she configured on the
> system are getting used.
>

I'd suggest that any API which allows us to query the hugeness of a piece
of memory should also work for mmap(hugetld_fd...). IOW: this capability
shouldn't be restricted to sysv shm areas.

I can't think of any syscall which can be sanely overloaded for this.

The most general way of exposing this info would be to export it in
/proc/pid/maps in some back-compatible manner.

And once I've lost the "oh oh we'd need to write 100 lines of userspace
code for that" bunfight I'd say add a new syscall sys_query_pagesize(void
*addr) which returns the size of the page which backs `addr'.

But then again, if it was possible to write 100 lines of userspace code, we
wouldn't need this capability at all. I bet if the userspace guys tried a
bit harder they'd work out a way of teaching their applications to remember
what they did.

2005-11-11 02:50:15

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Andrew Morton wrote:

>>>How important is this feature?
>>
>>Without this feature, an application has no way to figure out if a given
>>segment is hugetlb or not. Applications need to know this to be able to
>>handle alignment issues properly.
>>
>>Also, if the flag is exported via ipcs, the system administrator would
>>have a better idea about how the hugetlb pages she configured on the
>>system are getting used.
>>
>
>
> I'd suggest that any API which allows us to query the hugeness of a piece
> of memory should also work for mmap(hugetld_fd...). IOW: this capability
> shouldn't be restricted to sysv shm areas.

The capability I was talking about was the ability to figure out where
the configured hugetlb pages are going (vs is this a hugetlb page?).

I suspect that one can use lsof+/proc/pid/maps and look for hugetlbfs
mount points to gather that data. But for shared memory hugepages, we
don't have a way.

> But then again, if it was possible to write 100 lines of userspace code, we
> wouldn't need this capability at all. I bet if the userspace guys tried a
> bit harder they'd work out a way of teaching their applications to remember
> what they did.

Why do we need shmctl(IPC_STAT) then? Applications should remember what
they did :)

-Arun

2005-11-11 03:13:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Arun Sharma <[email protected]> wrote:
>
> Andrew Morton wrote:
>
> >>>How important is this feature?
> >>
> >>Without this feature, an application has no way to figure out if a given
> >>segment is hugetlb or not. Applications need to know this to be able to
> >>handle alignment issues properly.
> >>
> >>Also, if the flag is exported via ipcs, the system administrator would
> >>have a better idea about how the hugetlb pages she configured on the
> >>system are getting used.
> >>
> >
> >
> > I'd suggest that any API which allows us to query the hugeness of a piece
> > of memory should also work for mmap(hugetld_fd...). IOW: this capability
> > shouldn't be restricted to sysv shm areas.
>
> The capability I was talking about was the ability to figure out where
> the configured hugetlb pages are going (vs is this a hugetlb page?).

Well, please figure out a way which has less risk of breaking userspace.

Bear in mind that the sort of apps we're talking about here are
dubiously-written monsters with long and costly upgrade cycles and we tend
to not get any reports until many many months after we made a kernel
change. It's very costly all round and we need to be cautious.

2005-11-11 07:33:03

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

Andrew Morton wrote:
>> But then again, if it was possible to write 100 lines of userspace code, we
>> wouldn't need this capability at all. I bet if the userspace guys tried a
>> bit harder they'd work out a way of teaching their applications to remember
>> what they did.

On Thu, Nov 10, 2005 at 06:49:56PM -0800, Arun Sharma wrote:
> Why do we need shmctl(IPC_STAT) then? Applications should remember what
> they did :)

atime, dtime, ctime, lpid, and nattch are not "rememberable" this way.


-- wli

2005-11-12 07:19:54

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

> Arun Sharma <[email protected]> wrote:
> >
> > Andrew Morton wrote:
> >
> > >>>How important is this feature?
> > >>
> > >>Without this feature, an application has no way to figure out if a given
> > >>segment is hugetlb or not. Applications need to know this to be able to
> > >>handle alignment issues properly.
> > >>
> > >>Also, if the flag is exported via ipcs, the system administrator would
> > >>have a better idea about how the hugetlb pages she configured on the
> > >>system are getting used.
> > >>
> > >
> > >
> > > I'd suggest that any API which allows us to query the hugeness of a piece
> > > of memory should also work for mmap(hugetld_fd...). IOW: this capability
> > > shouldn't be restricted to sysv shm areas.
> >
> > The capability I was talking about was the ability to figure out where
> > the configured hugetlb pages are going (vs is this a hugetlb page?).
>
> Well, please figure out a way which has less risk of breaking userspace.
>
> Bear in mind that the sort of apps we're talking about here are
> dubiously-written monsters with long and costly upgrade cycles and we tend
> to not get any reports until many many months after we made a kernel
> change. It's very costly all round and we need to be cautious.

Andrew,

I am late to this discussion, but for what it's worth, a
portable application really must use checks of the like
(perm.mode & 0777 = 0666), because many implementations
define additional read-only flags for perm.mode:

Tru64 5.1
#define SHM_LOCKED 01000 /* segment locked in memory */
#define SHM_REMOVED 02000 /* already removed */

Linux
#define SHM_DEST 01000 /* segment will be destroyed on last detach */
#define SHM_LOCKED 02000 /* segment will not be swapped */

HP-UX 11
# define SHM_CLEAR 01000 /* clear segment on next attach */
# define SHM_DEST 02000 /* destroy segment when # attached = 0 */
# define SHM_NOSWAP 010000 /* region for shared memory is memory locked */
/* (or should be when the region is allocated) */

AIX 5.1
#define SHM_DEST 02000 /* destroy segment when # attached = 0 */

So the chances are probably good that portable applications
wouldn't break with Arun's proposal. Of course applications
that were written just for Linux, and don't take care, might
also be at risk, but I think the risk is probably low.
A check of the form:

if (mode == 0666|SHM_LOCKED)

instead of:

if (mode & SHM_LOCKED)

is very obtuse.

This might not change your point of view (there is a theoretical risk
after all), but I thought it worth mentioning.

Cheers,

Michael

2005-11-12 07:38:44

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Expose SHM_HUGETLB in shmctl(id, IPC_STAT, ...)

"Michael Kerrisk" <[email protected]> wrote:
>
> > Bear in mind that the sort of apps we're talking about here are
> > dubiously-written monsters with long and costly upgrade cycles and we tend
> > to not get any reports until many many months after we made a kernel
> > change. It's very costly all round and we need to be cautious.
>
> Andrew,
>
> I am late to this discussion, but for what it's worth, a
> portable application really must use checks of the like
> (perm.mode & 0777 = 0666), because many implementations
> define additional read-only flags for perm.mode:
>
> Tru64 5.1
> #define SHM_LOCKED 01000 /* segment locked in memory */
> #define SHM_REMOVED 02000 /* already removed */
>
> Linux
> #define SHM_DEST 01000 /* segment will be destroyed on last detach */
> #define SHM_LOCKED 02000 /* segment will not be swapped */
>
> HP-UX 11
> # define SHM_CLEAR 01000 /* clear segment on next attach */
> # define SHM_DEST 02000 /* destroy segment when # attached = 0 */
> # define SHM_NOSWAP 010000 /* region for shared memory is memory locked */
> /* (or should be when the region is allocated) */
>
> AIX 5.1
> #define SHM_DEST 02000 /* destroy segment when # attached = 0 */
>
> So the chances are probably good that portable applications
> wouldn't break with Arun's proposal.

The chances of breakage I agree are very low. But non-zero. I'd still
like us to find a way which is completely safe.

> Of course applications
> that were written just for Linux, and don't take care, might
> also be at risk, but I think the risk is probably low.
> A check of the form:
>
> if (mode == 0666|SHM_LOCKED)
>
> instead of:
>
> if (mode & SHM_LOCKED)
>
> is very obtuse.

Yes, but

if ((mode & ~(SHM_LOCKED|SHM_REMOVED)) == 0666)

is pretty perverse, but more likely. Stranger things have
been seen ;)

> This might not change your point of view (there is a theoretical risk
> after all), but I thought it worth mentioning.

Thanks.