2007-06-06 16:17:50

by Badari Pulavarty

[permalink] [raw]
Subject: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Hi Eric,

Your recent cleanup to shm code, namely

[PATCH] shm: make sysv ipc shared memory use stacked files

took away one of the debugging feature for shm segments.
Originally, shmid were forced to be the inode numbers and
they show up in /proc/pid/maps for the process which mapped
this shared memory segments (vma listing). That way, its easy
to find out who all mapped this shared memory segment. Your
patchset, took away the inode# setting. So, we can't easily
match the shmem segments to /proc/pid/maps easily. (It was
really useful in tracking down a customer problem recently).
Is this done deliberately ? Anything wrong in setting this back ?

Comments ?

Thanks,
Badari

Without patch:
--------------

# ipcs -m

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 884737 db2inst1 767 33554432 13

# grep 884737 /proc/*/maps
#

With patch:
-----------

# ipcs -m

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 884737 db2inst1 767 33554432 13

# grep 884737 /proc/*/maps
/proc/11110/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11111/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11112/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11113/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11114/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11115/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11116/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11117/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11118/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11121/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11122/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11124/maps:4000389c000-4000589c000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)
/proc/11575/maps:40006724000-40008724000 rw-s 00000000 00:08 884737 /SYSV00000000 (deleted)



Here is the patch.

"ino#" in /proc/pid/maps used to match "ipcs -m" output for shared
memory (shmid). It was useful in debugging, but its changed recently.
This patch sets inode number to shared memory id to match /proc/pid/maps.

Signed-off-by: Badari Pulavarty <[email protected]>

Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.000000000 -0700
@@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace
shp->shm_nattch = 0;
shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
shp->shm_file = file;
+ file->f_dentry->d_inode->i_ino = shp->id;

ns->shm_tot += numpages;
shm_unlock(shp);



2007-06-06 17:04:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Badari Pulavarty <[email protected]> writes:

> Hi Eric,
>
> Your recent cleanup to shm code, namely
>
> [PATCH] shm: make sysv ipc shared memory use stacked files
>
> took away one of the debugging feature for shm segments.
> Originally, shmid were forced to be the inode numbers and
> they show up in /proc/pid/maps for the process which mapped
> this shared memory segments (vma listing). That way, its easy
> to find out who all mapped this shared memory segment. Your
> patchset, took away the inode# setting. So, we can't easily
> match the shmem segments to /proc/pid/maps easily. (It was
> really useful in tracking down a customer problem recently).
> Is this done deliberately ? Anything wrong in setting this back ?
>
> Comments ?
>
> Thanks,
> Badari
>
> Without patch:
> --------------
>
> # ipcs -m
>
> ------ Shared Memory Segments --------
> key shmid owner perms bytes nattch status
> 0x00000000 884737 db2inst1 767 33554432 13
>
> # grep 884737 /proc/*/maps
> #
>
> With patch:
> -----------
>
> # ipcs -m
>
> ------ Shared Memory Segments --------
> key shmid owner perms bytes nattch status
> 0x00000000 884737 db2inst1 767 33554432 13
>
> # grep 884737 /proc/*/maps
> /proc/11110/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11111/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11112/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11113/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11114/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11115/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11116/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11117/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11118/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11121/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11122/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11124/maps:4000389c000-4000589c000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
> /proc/11575/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> /SYSV00000000 (deleted)
>
>
>
> Here is the patch.
>
> "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared
> memory (shmid). It was useful in debugging, but its changed recently.
> This patch sets inode number to shared memory id to match /proc/pid/maps.

Theoretically it makes the stacked file concept more brittle, because
it means the lower layers can't care about their inode number.

We do need something to tie these things together.

So I suspect what makes most sense is to simply rename the dentry
SYSVID<segmentid>

That should give you the necessary information while not doing something
that is a long term maintenance problem.

Do you think you can cook up a patch to that effect?
Otherwise I will see if I can.

In practice I'm not really against your change, but I would prefer
to leave the code in a state where someone can reimplement hugetlbfs
or shmfs and we simply don't care.

Eric

2007-06-06 17:49:33

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Wed, 2007-06-06 at 11:02 -0600, Eric W. Biederman wrote:
> Badari Pulavarty <[email protected]> writes:
>
> > Hi Eric,
> >
> > Your recent cleanup to shm code, namely
> >
> > [PATCH] shm: make sysv ipc shared memory use stacked files
> >
> > took away one of the debugging feature for shm segments.
> > Originally, shmid were forced to be the inode numbers and
> > they show up in /proc/pid/maps for the process which mapped
> > this shared memory segments (vma listing). That way, its easy
> > to find out who all mapped this shared memory segment. Your
> > patchset, took away the inode# setting. So, we can't easily
> > match the shmem segments to /proc/pid/maps easily. (It was
> > really useful in tracking down a customer problem recently).
> > Is this done deliberately ? Anything wrong in setting this back ?
> >
> > Comments ?
> >
> > Thanks,
> > Badari
> >
> > Without patch:
> > --------------
> >
> > # ipcs -m
> >
> > ------ Shared Memory Segments --------
> > key shmid owner perms bytes nattch status
> > 0x00000000 884737 db2inst1 767 33554432 13
> >
> > # grep 884737 /proc/*/maps
> > #
> >
> > With patch:
> > -----------
> >
> > # ipcs -m
> >
> > ------ Shared Memory Segments --------
> > key shmid owner perms bytes nattch status
> > 0x00000000 884737 db2inst1 767 33554432 13
> >
> > # grep 884737 /proc/*/maps
> > /proc/11110/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11111/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11112/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11113/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11114/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11115/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11116/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11117/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11118/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11121/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11122/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11124/maps:4000389c000-4000589c000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> > /proc/11575/maps:40006724000-40008724000 rw-s 00000000 00:08 884737
> > /SYSV00000000 (deleted)
> >
> >
> >
> > Here is the patch.
> >
> > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared
> > memory (shmid). It was useful in debugging, but its changed recently.
> > This patch sets inode number to shared memory id to match /proc/pid/maps.
>
> Theoretically it makes the stacked file concept more brittle, because
> it means the lower layers can't care about their inode number.
>
> We do need something to tie these things together.
>
> So I suspect what makes most sense is to simply rename the dentry
> SYSVID<segmentid>

Yep. Currently, we use part of "key" as the dentry name. For example,

# ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x083d0d74 851968 db2inst1 767 33554432 13

# grep 83d0d74 /proc/*/maps
/proc/11110/maps:40004724000-40006724000 rw-s 00000000 00:08 851968 /SYSV083d0d74 (deleted)
/proc/11111/maps:40004724000-40006724000 rw-s 00000000 00:08 851968 /SYSV083d0d74 (deleted)
/proc/11112/maps:40004724000-40006724000 rw-s 00000000 00:08 851968 /SYSV083d0d74 (deleted)
/proc/11113/maps:40004724000-40006724000 rw-s 00000000 00:08 851968 /SYSV083d0d74 (deleted)
..

The issue is with the ones with key = 0x0000000, like following:

# ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 884737 db2inst1 767 33554432 13
0x00000000 950275 db2fenc1 701 23052288 13

There is no unique way to identify them easily :(

I guess, like you suggested, we can change the dentry name to use shmid
instead of the portions of the "key" to make it unique. I think, I can
work out a patch for this.


>
> That should give you the necessary information while not doing something
> that is a long term maintenance problem.
>
> Do you think you can cook up a patch to that effect?
> Otherwise I will see if I can.
>
> In practice I'm not really against your change, but I would prefer
> to leave the code in a state where someone can reimplement hugetlbfs
> or shmfs and we simply don't care.

Thanks for your suggestion.

Thanks,
Badari


2007-06-06 18:25:40

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Badari Pulavarty <[email protected]> writes:
>
> ------ Shared Memory Segments --------
> key shmid owner perms bytes nattch status
> 0x00000000 884737 db2inst1 767 33554432 13
> 0x00000000 950275 db2fenc1 701 23052288 13
>
> There is no unique way to identify them easily :(
>
> I guess, like you suggested, we can change the dentry name to use shmid
> instead of the portions of the "key" to make it unique. I think, I can
> work out a patch for this.

Thanks. That should be more robust.

Eric

2007-06-07 03:27:16

by Albert Cahalan

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Eric W. Biederman writes:
> Badari Pulavarty <[email protected]> writes:

>> Your recent cleanup to shm code, namely
>>
>> [PATCH] shm: make sysv ipc shared memory use stacked files
>>
>> took away one of the debugging feature for shm segments.
>> Originally, shmid were forced to be the inode numbers and
>> they show up in /proc/pid/maps for the process which mapped
>> this shared memory segments (vma listing). That way, its easy
>> to find out who all mapped this shared memory segment. Your
>> patchset, took away the inode# setting. So, we can't easily
>> match the shmem segments to /proc/pid/maps easily. (It was
>> really useful in tracking down a customer problem recently).
>> Is this done deliberately ? Anything wrong in setting this back ?
>
> Theoretically it makes the stacked file concept more brittle,
> because it means the lower layers can't care about their inode
> number.
>
> We do need something to tie these things together.
>
> So I suspect what makes most sense is to simply rename the
> dentry SYSVID<segmentid>

Please stop breaking things in /proc. The pmap command relys
on the old behavior. It's time to revert. Put back the segment ID
where it belongs, and leave the key where it belongs too.

Containers are NOT worth breaking our ABIs left and right.
We don't need to leap off that bridge just because Solaris did,
unless you can explain why complexity and bloat are desirable.
We already have SE Linux, chroot, KVM, and several more!

2007-06-07 03:45:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[email protected]> wrote:

> Eric W. Biederman writes:
> > Badari Pulavarty <[email protected]> writes:
>
> >> Your recent cleanup to shm code, namely
> >>
> >> [PATCH] shm: make sysv ipc shared memory use stacked files
> >>
> >> took away one of the debugging feature for shm segments.
> >> Originally, shmid were forced to be the inode numbers and
> >> they show up in /proc/pid/maps for the process which mapped
> >> this shared memory segments (vma listing). That way, its easy
> >> to find out who all mapped this shared memory segment. Your
> >> patchset, took away the inode# setting. So, we can't easily
> >> match the shmem segments to /proc/pid/maps easily. (It was
> >> really useful in tracking down a customer problem recently).
> >> Is this done deliberately ? Anything wrong in setting this back ?
> >
> > Theoretically it makes the stacked file concept more brittle,
> > because it means the lower layers can't care about their inode
> > number.
> >
> > We do need something to tie these things together.
> >
> > So I suspect what makes most sense is to simply rename the
> > dentry SYSVID<segmentid>
>
> Please stop breaking things in /proc. The pmap command relys
> on the old behavior.

What effect did this change have upon the pmap command? Details, please.

> It's time to revert.

Probably true, but we'd need to understand what the impact was.

2007-06-07 04:53:29

by Albert Cahalan

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On 6/6/07, Andrew Morton <[email protected]> wrote:
> On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[email protected]> wrote:
> > Eric W. Biederman writes:
> > > Badari Pulavarty <[email protected]> writes:
> >
> > >> Your recent cleanup to shm code, namely
> > >>
> > >> [PATCH] shm: make sysv ipc shared memory use stacked files
> > >>
> > >> took away one of the debugging feature for shm segments.
> > >> Originally, shmid were forced to be the inode numbers and
> > >> they show up in /proc/pid/maps for the process which mapped
> > >> this shared memory segments (vma listing). That way, its easy
> > >> to find out who all mapped this shared memory segment. Your
> > >> patchset, took away the inode# setting. So, we can't easily
> > >> match the shmem segments to /proc/pid/maps easily. (It was
> > >> really useful in tracking down a customer problem recently).
> > >> Is this done deliberately ? Anything wrong in setting this back ?
> > >
> > > Theoretically it makes the stacked file concept more brittle,
> > > because it means the lower layers can't care about their inode
> > > number.
> > >
> > > We do need something to tie these things together.
> > >
> > > So I suspect what makes most sense is to simply rename the
> > > dentry SYSVID<segmentid>
> >
> > Please stop breaking things in /proc. The pmap command relys
> > on the old behavior.
>
> What effect did this change have upon the pmap command? Details, please.
>
> > It's time to revert.
>
> Probably true, but we'd need to understand what the impact was.

Very simply, pmap reports the shmid.

albert 0 ~$ pmap `pidof X` | egrep -2 shmid
30050000 16384K rw-s- /dev/fb0
31050000 152K rw--- [ anon ]
31076000 384K rw-s- [ shmid=0x3f428000 ]
310d6000 384K rw-s- [ shmid=0x3f430001 ]
31136000 384K rw-s- [ shmid=0x3f438002 ]
31196000 384K rw-s- [ shmid=0x3f440003 ]
311f6000 384K rw-s- [ shmid=0x3f448004 ]
31256000 384K rw-s- [ shmid=0x3f450005 ]
312b6000 384K rw-s- [ shmid=0x3f460006 ]
31316000 384K rw-s- [ shmid=0x3f870007 ]
31491000 140K r---- /usr/share/fonts/type1/gsfonts/n021003l.pfb
3150e000 9496K rw--- [ anon ]

2007-06-07 16:22:18

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Thu, 2007-06-07 at 00:53 -0400, Albert Cahalan wrote:
> On 6/6/07, Andrew Morton <[email protected]> wrote:
> > On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[email protected]> wrote:
> > > Eric W. Biederman writes:
> > > > Badari Pulavarty <[email protected]> writes:
> > >
> > > >> Your recent cleanup to shm code, namely
> > > >>
> > > >> [PATCH] shm: make sysv ipc shared memory use stacked files
> > > >>
> > > >> took away one of the debugging feature for shm segments.
> > > >> Originally, shmid were forced to be the inode numbers and
> > > >> they show up in /proc/pid/maps for the process which mapped
> > > >> this shared memory segments (vma listing). That way, its easy
> > > >> to find out who all mapped this shared memory segment. Your
> > > >> patchset, took away the inode# setting. So, we can't easily
> > > >> match the shmem segments to /proc/pid/maps easily. (It was
> > > >> really useful in tracking down a customer problem recently).
> > > >> Is this done deliberately ? Anything wrong in setting this back ?
> > > >
> > > > Theoretically it makes the stacked file concept more brittle,
> > > > because it means the lower layers can't care about their inode
> > > > number.
> > > >
> > > > We do need something to tie these things together.
> > > >
> > > > So I suspect what makes most sense is to simply rename the
> > > > dentry SYSVID<segmentid>
> > >
> > > Please stop breaking things in /proc. The pmap command relys
> > > on the old behavior.
> >
> > What effect did this change have upon the pmap command? Details, please.
> >
> > > It's time to revert.
> >
> > Probably true, but we'd need to understand what the impact was.
>
> Very simply, pmap reports the shmid.
>
> albert 0 ~$ pmap `pidof X` | egrep -2 shmid
> 30050000 16384K rw-s- /dev/fb0
> 31050000 152K rw--- [ anon ]
> 31076000 384K rw-s- [ shmid=0x3f428000 ]
> 310d6000 384K rw-s- [ shmid=0x3f430001 ]
> 31136000 384K rw-s- [ shmid=0x3f438002 ]
> 31196000 384K rw-s- [ shmid=0x3f440003 ]
> 311f6000 384K rw-s- [ shmid=0x3f448004 ]
> 31256000 384K rw-s- [ shmid=0x3f450005 ]
> 312b6000 384K rw-s- [ shmid=0x3f460006 ]
> 31316000 384K rw-s- [ shmid=0x3f870007 ]
> 31491000 140K r---- /usr/share/fonts/type1/gsfonts/n021003l.pfb
> 3150e000 9496K rw--- [ anon ]

pmap seems to get shmid from "ino#" field of /proc/pid/map.
Its already broken in current mainline.

But, the breakage is not due to namespaces or container effort :(
Its due to noble effort from Eric to clean up the shm code,
take out the hacks to handle hugetlbfs and make the code
more streamlined and readable.

If we really really want old behaviour, we need my one line
patch to force shmid as inode# :(

BTW, I agree with Eric that its would be nice to use shmid as part
of name instead of forcing to be as inode number. It should be
possible for pmap to workout shmid from "key" or name. Isn't it ?

Andrew/Linus, its up to you to figure out if its worth breaking.
Here is the patch to base dentry-name on shmid - so we don't
need to use ino# to identify shmid.

Thanks,
Badari

Instead of basing dentry name on the shm "key", base it on
"shmid" - so it shows up clearly in /proc/pid/maps. Earlier
we were forcing ino# to match shmid.

Signed-off-by: Badari Pulavarty <[email protected]>
Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 13:43:36.000000000 -0700
@@ -364,6 +364,14 @@ static int newseg (struct ipc_namespace
return error;
}

+ error = -ENOSPC;
+ id = shm_addid(ns, shp);
+ if(id == -1)
+ goto no_id;
+
+ /* Build an id, so we can use it for filename */
+ shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
+
if (shmflg & SHM_HUGETLB) {
/* hugetlb_zero_setup takes care of mlock user accounting */
file = hugetlb_zero_setup(size);
@@ -377,34 +385,28 @@ static int newseg (struct ipc_namespace
if ((shmflg & SHM_NORESERVE) &&
sysctl_overcommit_memory != OVERCOMMIT_NEVER)
acctflag = 0;
- sprintf (name, "SYSV%08x", key);
+ sprintf (name, "SYSVID%d", shp->id);
file = shmem_file_setup(name, size, acctflag);
}
error = PTR_ERR(file);
if (IS_ERR(file))
goto no_file;

- error = -ENOSPC;
- id = shm_addid(ns, shp);
- if(id == -1)
- goto no_id;
-
shp->shm_cprid = current->tgid;
shp->shm_lprid = 0;
shp->shm_atim = shp->shm_dtim = 0;
shp->shm_ctim = get_seconds();
shp->shm_segsz = size;
shp->shm_nattch = 0;
- shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
shp->shm_file = file;

ns->shm_tot += numpages;
shm_unlock(shp);
return shp->id;

-no_id:
- fput(file);
no_file:
+ shm_rmid(ns, shp->id);
+no_id:
security_shm_free(shp);
ipc_rcu_putref(shp);
return error;


2007-06-07 16:43:51

by Albert Cahalan

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On 6/7/07, Badari Pulavarty <[email protected]> wrote:

> BTW, I agree with Eric that its would be nice to use shmid as part
> of name instead of forcing to be as inode number. It should be
> possible for pmap to workout shmid from "key" or name. Isn't it ?

It is not at all nice.

1. it's incompatible ABI breakage
2. where will you put the key then, in the inode? :-)

Changing to "SYSVID%d" is no good either. Look, people
are ***parsing*** this stuff in /proc. The /proc filesystem
is not some random sandbox to be playing in.

Before you go messing with it, note that the device number
also matters. (it's per-boot dynamic, but that's OK)
That's how one knows that /SYSV00000000 is not just
a regular file; sadly these didn't get a non-/ prefix.
(and no you can't fix that now; it's way too late)

Next time you feel like breaking an ABI, mind putting
"LET'S BREAK AN ABI!" in the subject of your email?

BTW, I suspect this kind of thing also breaks:
a. fuser, lsof, and other resource usage display tools
b. various obscure emulators (similar to valgrind)

2007-06-07 17:05:43

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> On 6/7/07, Badari Pulavarty <[email protected]> wrote:
>
> > BTW, I agree with Eric that its would be nice to use shmid as part
> > of name instead of forcing to be as inode number. It should be
> > possible for pmap to workout shmid from "key" or name. Isn't it ?
>
> It is not at all nice.
>
> 1. it's incompatible ABI breakage
> 2. where will you put the key then, in the inode? :-)

Nope. Currently "key" is part of the name (but its not unique).

>
> Changing to "SYSVID%d" is no good either. Look, people
> are ***parsing*** this stuff in /proc. The /proc filesystem
> is not some random sandbox to be playing in.
>
> Before you go messing with it, note that the device number
> also matters. (it's per-boot dynamic, but that's OK)
> That's how one knows that /SYSV00000000 is not just
> a regular file; sadly these didn't get a non-/ prefix.
> (and no you can't fix that now; it's way too late)
>
> Next time you feel like breaking an ABI, mind putting
> "LET'S BREAK AN ABI!" in the subject of your email?

I am not breaking ABI. Its already broken in the current
mainline. I am trying to fix it by putting back the ino#
as shmid. Eric had a suggestion that, instead of depending
on the inode# to be shmid, we could embed shmid into name
(instead of "key" which is currently not unique).

> BTW, I suspect this kind of thing also breaks:
> a. fuser, lsof, and other resource usage display tools
> b. various obscure emulators (similar to valgrind)

If you strongly feel that "old" behaviour needs to be retained,
here is the patch I originally suggested.

Thanks,
Badari

"ino#" in /proc/pid/maps used to match "ipcs -m" output for shared
memory (shmid). It was useful in debugging, but its changed recently.
This patch sets inode number to shared memory id to match /proc/pid/maps.

Signed-off-by: Badari Pulavarty <[email protected]>

Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.000000000 -0700
@@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace
shp->shm_nattch = 0;
shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
shp->shm_file = file;
+ file->f_dentry->d_inode->i_ino = shp->id;

ns->shm_tot += numpages;
shm_unlock(shp);



2007-06-07 17:14:53

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Quoting Albert Cahalan ([email protected]):
> On 6/6/07, Andrew Morton <[email protected]> wrote:
> >On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[email protected]>
> >wrote:
> >> Eric W. Biederman writes:
> >> > Badari Pulavarty <[email protected]> writes:
> >>
> >> >> Your recent cleanup to shm code, namely
> >> >>
> >> >> [PATCH] shm: make sysv ipc shared memory use stacked files
> >> >>
> >> >> took away one of the debugging feature for shm segments.
> >> >> Originally, shmid were forced to be the inode numbers and
> >> >> they show up in /proc/pid/maps for the process which mapped
> >> >> this shared memory segments (vma listing). That way, its easy
> >> >> to find out who all mapped this shared memory segment. Your
> >> >> patchset, took away the inode# setting. So, we can't easily
> >> >> match the shmem segments to /proc/pid/maps easily. (It was
> >> >> really useful in tracking down a customer problem recently).
> >> >> Is this done deliberately ? Anything wrong in setting this back ?
> >> >
> >> > Theoretically it makes the stacked file concept more brittle,
> >> > because it means the lower layers can't care about their inode
> >> > number.
> >> >
> >> > We do need something to tie these things together.
> >> >
> >> > So I suspect what makes most sense is to simply rename the
> >> > dentry SYSVID<segmentid>
> >>
> >> Please stop breaking things in /proc. The pmap command relys
> >> on the old behavior.
> >
> >What effect did this change have upon the pmap command? Details, please.
> >
> >> It's time to revert.
> >
> >Probably true, but we'd need to understand what the impact was.
>
> Very simply, pmap reports the shmid.
>
> albert 0 ~$ pmap `pidof X` | egrep -2 shmid
> 30050000 16384K rw-s- /dev/fb0
> 31050000 152K rw--- [ anon ]
> 31076000 384K rw-s- [ shmid=0x3f428000 ]
> 310d6000 384K rw-s- [ shmid=0x3f430001 ]
> 31136000 384K rw-s- [ shmid=0x3f438002 ]
> 31196000 384K rw-s- [ shmid=0x3f440003 ]
> 311f6000 384K rw-s- [ shmid=0x3f448004 ]
> 31256000 384K rw-s- [ shmid=0x3f450005 ]
> 312b6000 384K rw-s- [ shmid=0x3f460006 ]
> 31316000 384K rw-s- [ shmid=0x3f870007 ]
> 31491000 140K r---- /usr/share/fonts/type1/gsfonts/n021003l.pfb
> 3150e000 9496K rw--- [ anon ]

Ok, so IIUC the problem was that inode->i_ino was being set to the id,
and the id can be the same for different things in two namespaces.

So aside from not using the id as inode->i_ino, an alternative is to use
a separate superblock, spearate mqeueue fs, for each ipc ns.

I haven't looked at that enough to see whether it's feasible, i.e. I
don't know what else mqueue fs is used for. Eric, does that sound
reasonable to you?

thanks,
-serge

2007-06-07 19:49:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Thu, 07 Jun 2007 10:06:37 -0700
Badari Pulavarty <[email protected]> wrote:

> On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> >
> > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > of name instead of forcing to be as inode number. It should be
> > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> >
> > It is not at all nice.
> >
> > 1. it's incompatible ABI breakage
> > 2. where will you put the key then, in the inode? :-)
>
> Nope. Currently "key" is part of the name (but its not unique).
>
> >
> > Changing to "SYSVID%d" is no good either. Look, people
> > are ***parsing*** this stuff in /proc. The /proc filesystem
> > is not some random sandbox to be playing in.
> >
> > Before you go messing with it, note that the device number
> > also matters. (it's per-boot dynamic, but that's OK)
> > That's how one knows that /SYSV00000000 is not just
> > a regular file; sadly these didn't get a non-/ prefix.
> > (and no you can't fix that now; it's way too late)
> >
> > Next time you feel like breaking an ABI, mind putting
> > "LET'S BREAK AN ABI!" in the subject of your email?
>
> I am not breaking ABI. Its already broken in the current
> mainline. I am trying to fix it by putting back the ino#
> as shmid. Eric had a suggestion that, instead of depending
> on the inode# to be shmid, we could embed shmid into name
> (instead of "key" which is currently not unique).
>
> > BTW, I suspect this kind of thing also breaks:
> > a. fuser, lsof, and other resource usage display tools
> > b. various obscure emulators (similar to valgrind)
>
> If you strongly feel that "old" behaviour needs to be retained,

yup, we should put it back. The change was, afaik, accidental.

> here is the patch I originally suggested.

Confused. Will this one-liner fix all the userspace breakage to which
Albert refers?

> Thanks,
> Badari
>
> "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared
> memory (shmid). It was useful in debugging, but its changed recently.
> This patch sets inode number to shared memory id to match /proc/pid/maps.
>
> Signed-off-by: Badari Pulavarty <[email protected]>
>
> Index: linux-2.6.22-rc4/ipc/shm.c
> ===================================================================
> --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.000000000 -0700
> +++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.000000000 -0700
> @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace
> shp->shm_nattch = 0;
> shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
> shp->shm_file = file;
> + file->f_dentry->d_inode->i_ino = shp->id;
>
> ns->shm_tot += numpages;
> shm_unlock(shp);
>
>

2007-06-07 19:58:24

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> On Thu, 07 Jun 2007 10:06:37 -0700
> Badari Pulavarty <[email protected]> wrote:
>
> > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> > >
> > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > of name instead of forcing to be as inode number. It should be
> > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > >
> > > It is not at all nice.
> > >
> > > 1. it's incompatible ABI breakage
> > > 2. where will you put the key then, in the inode? :-)
> >
> > Nope. Currently "key" is part of the name (but its not unique).
> >
> > >
> > > Changing to "SYSVID%d" is no good either. Look, people
> > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > is not some random sandbox to be playing in.
> > >
> > > Before you go messing with it, note that the device number
> > > also matters. (it's per-boot dynamic, but that's OK)
> > > That's how one knows that /SYSV00000000 is not just
> > > a regular file; sadly these didn't get a non-/ prefix.
> > > (and no you can't fix that now; it's way too late)
> > >
> > > Next time you feel like breaking an ABI, mind putting
> > > "LET'S BREAK AN ABI!" in the subject of your email?
> >
> > I am not breaking ABI. Its already broken in the current
> > mainline. I am trying to fix it by putting back the ino#
> > as shmid. Eric had a suggestion that, instead of depending
> > on the inode# to be shmid, we could embed shmid into name
> > (instead of "key" which is currently not unique).
> >
> > > BTW, I suspect this kind of thing also breaks:
> > > a. fuser, lsof, and other resource usage display tools
> > > b. various obscure emulators (similar to valgrind)
> >
> > If you strongly feel that "old" behaviour needs to be retained,
>
> yup, we should put it back. The change was, afaik, accidental.
>
> > here is the patch I originally suggested.
>
> Confused. Will this one-liner fix all the userspace breakage to which
> Albert refers?

Yes. Albert, please correct me if I am wrong.

Thanks,
Badari


> > "ino#" in /proc/pid/maps used to match "ipcs -m" output for shared
> > memory (shmid). It was useful in debugging, but its changed recently.
> > This patch sets inode number to shared memory id to match /proc/pid/maps.
> >
> > Signed-off-by: Badari Pulavarty <[email protected]>
> >
> > Index: linux-2.6.22-rc4/ipc/shm.c
> > ===================================================================
> > --- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-04 17:57:25.000000000 -0700
> > +++ linux-2.6.22-rc4/ipc/shm.c 2007-06-06 08:23:57.000000000 -0700
> > @@ -397,6 +397,7 @@ static int newseg (struct ipc_namespace
> > shp->shm_nattch = 0;
> > shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
> > shp->shm_file = file;
> > + file->f_dentry->d_inode->i_ino = shp->id;
> >
> > ns->shm_tot += numpages;
> > shm_unlock(shp);
> >
> >

2007-06-07 20:38:17

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Quoting Badari Pulavarty ([email protected]):
> On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > On Thu, 07 Jun 2007 10:06:37 -0700
> > Badari Pulavarty <[email protected]> wrote:
> >
> > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> > > >
> > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > of name instead of forcing to be as inode number. It should be
> > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > >
> > > > It is not at all nice.
> > > >
> > > > 1. it's incompatible ABI breakage
> > > > 2. where will you put the key then, in the inode? :-)
> > >
> > > Nope. Currently "key" is part of the name (but its not unique).
> > >
> > > >
> > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > is not some random sandbox to be playing in.
> > > >
> > > > Before you go messing with it, note that the device number
> > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > That's how one knows that /SYSV00000000 is not just
> > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > (and no you can't fix that now; it's way too late)
> > > >
> > > > Next time you feel like breaking an ABI, mind putting
> > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > >
> > > I am not breaking ABI. Its already broken in the current
> > > mainline. I am trying to fix it by putting back the ino#
> > > as shmid. Eric had a suggestion that, instead of depending
> > > on the inode# to be shmid, we could embed shmid into name
> > > (instead of "key" which is currently not unique).
> > >
> > > > BTW, I suspect this kind of thing also breaks:
> > > > a. fuser, lsof, and other resource usage display tools
> > > > b. various obscure emulators (similar to valgrind)
> > >
> > > If you strongly feel that "old" behaviour needs to be retained,
> >
> > yup, we should put it back. The change was, afaik, accidental.
> >
> > > here is the patch I originally suggested.
> >
> > Confused. Will this one-liner fix all the userspace breakage to which
> > Albert refers?
>
> Yes. Albert, please correct me if I am wrong.

It will, but could lead to two different inodes with the same i_ino,
right?

thanks,
-serge

2007-06-07 21:15:33

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> Quoting Badari Pulavarty ([email protected]):
> > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > Badari Pulavarty <[email protected]> wrote:
> > >
> > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> > > > >
> > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > >
> > > > > It is not at all nice.
> > > > >
> > > > > 1. it's incompatible ABI breakage
> > > > > 2. where will you put the key then, in the inode? :-)
> > > >
> > > > Nope. Currently "key" is part of the name (but its not unique).
> > > >
> > > > >
> > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > is not some random sandbox to be playing in.
> > > > >
> > > > > Before you go messing with it, note that the device number
> > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > That's how one knows that /SYSV00000000 is not just
> > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > (and no you can't fix that now; it's way too late)
> > > > >
> > > > > Next time you feel like breaking an ABI, mind putting
> > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > >
> > > > I am not breaking ABI. Its already broken in the current
> > > > mainline. I am trying to fix it by putting back the ino#
> > > > as shmid. Eric had a suggestion that, instead of depending
> > > > on the inode# to be shmid, we could embed shmid into name
> > > > (instead of "key" which is currently not unique).
> > > >
> > > > > BTW, I suspect this kind of thing also breaks:
> > > > > a. fuser, lsof, and other resource usage display tools
> > > > > b. various obscure emulators (similar to valgrind)
> > > >
> > > > If you strongly feel that "old" behaviour needs to be retained,
> > >
> > > yup, we should put it back. The change was, afaik, accidental.
> > >
> > > > here is the patch I originally suggested.
> > >
> > > Confused. Will this one-liner fix all the userspace breakage to which
> > > Albert refers?
> >
> > Yes. Albert, please correct me if I am wrong.
>
> It will, but could lead to two different inodes with the same i_ino,
> right?

Only if we generate same ID in two different namespaces. Is it currently
possible ?

Thanks,
Badari


2007-06-07 22:10:08

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Quoting Badari Pulavarty ([email protected]):
> On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> > Quoting Badari Pulavarty ([email protected]):
> > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > > Badari Pulavarty <[email protected]> wrote:
> > > >
> > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> > > > > >
> > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > >
> > > > > > It is not at all nice.
> > > > > >
> > > > > > 1. it's incompatible ABI breakage
> > > > > > 2. where will you put the key then, in the inode? :-)
> > > > >
> > > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > >
> > > > > >
> > > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > > is not some random sandbox to be playing in.
> > > > > >
> > > > > > Before you go messing with it, note that the device number
> > > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > > That's how one knows that /SYSV00000000 is not just
> > > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > > (and no you can't fix that now; it's way too late)
> > > > > >
> > > > > > Next time you feel like breaking an ABI, mind putting
> > > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > >
> > > > > I am not breaking ABI. Its already broken in the current
> > > > > mainline. I am trying to fix it by putting back the ino#
> > > > > as shmid. Eric had a suggestion that, instead of depending
> > > > > on the inode# to be shmid, we could embed shmid into name
> > > > > (instead of "key" which is currently not unique).
> > > > >
> > > > > > BTW, I suspect this kind of thing also breaks:
> > > > > > a. fuser, lsof, and other resource usage display tools
> > > > > > b. various obscure emulators (similar to valgrind)
> > > > >
> > > > > If you strongly feel that "old" behaviour needs to be retained,
> > > >
> > > > yup, we should put it back. The change was, afaik, accidental.
> > > >
> > > > > here is the patch I originally suggested.
> > > >
> > > > Confused. Will this one-liner fix all the userspace breakage to which
> > > > Albert refers?
> > >
> > > Yes. Albert, please correct me if I am wrong.
> >
> > It will, but could lead to two different inodes with the same i_ino,
> > right?
>
> Only if we generate same ID in two different namespaces. Is it currently
> possible ?

Should be nothing stopping it.

But like I say we never find the inode based on i_ino, and don't hash
the inode, so it might be ok.

-serge

2007-06-07 22:15:00

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Quoting Serge E. Hallyn ([email protected]):
> Quoting Badari Pulavarty ([email protected]):
> > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > Badari Pulavarty <[email protected]> wrote:
> > >
> > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> > > > >
> > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > >
> > > > > It is not at all nice.
> > > > >
> > > > > 1. it's incompatible ABI breakage
> > > > > 2. where will you put the key then, in the inode? :-)
> > > >
> > > > Nope. Currently "key" is part of the name (but its not unique).
> > > >
> > > > >
> > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > is not some random sandbox to be playing in.
> > > > >
> > > > > Before you go messing with it, note that the device number
> > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > That's how one knows that /SYSV00000000 is not just
> > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > (and no you can't fix that now; it's way too late)
> > > > >
> > > > > Next time you feel like breaking an ABI, mind putting
> > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > >
> > > > I am not breaking ABI. Its already broken in the current
> > > > mainline. I am trying to fix it by putting back the ino#
> > > > as shmid. Eric had a suggestion that, instead of depending
> > > > on the inode# to be shmid, we could embed shmid into name
> > > > (instead of "key" which is currently not unique).
> > > >
> > > > > BTW, I suspect this kind of thing also breaks:
> > > > > a. fuser, lsof, and other resource usage display tools
> > > > > b. various obscure emulators (similar to valgrind)
> > > >
> > > > If you strongly feel that "old" behaviour needs to be retained,
> > >
> > > yup, we should put it back. The change was, afaik, accidental.
> > >
> > > > here is the patch I originally suggested.
> > >
> > > Confused. Will this one-liner fix all the userspace breakage to which
> > > Albert refers?
> >
> > Yes. Albert, please correct me if I am wrong.
>
> It will, but could lead to two different inodes with the same i_ino,
> right?

Well I guess it's not *technically* a problem since these inodes are
never hashed.

-serge

2007-06-07 22:21:45

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid



Serge E. Hallyn wrote:

>Quoting Badari Pulavarty ([email protected]):
>
>>On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
>>
>>>Quoting Badari Pulavarty ([email protected]):
>>>
>>>>On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
>>>>
>>>>>On Thu, 07 Jun 2007 10:06:37 -0700
>>>>>Badari Pulavarty <[email protected]> wrote:
>>>>>
>>>>>>On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
>>>>>>
>>>>>>>On 6/7/07, Badari Pulavarty <[email protected]> wrote:
>>>>>>>
>>>>>>>>BTW, I agree with Eric that its would be nice to use shmid as part
>>>>>>>>of name instead of forcing to be as inode number. It should be
>>>>>>>>possible for pmap to workout shmid from "key" or name. Isn't it ?
>>>>>>>>
>>>>>>>It is not at all nice.
>>>>>>>
>>>>>>>1. it's incompatible ABI breakage
>>>>>>>2. where will you put the key then, in the inode? :-)
>>>>>>>
>>>>>>Nope. Currently "key" is part of the name (but its not unique).
>>>>>>
>>>>>>>Changing to "SYSVID%d" is no good either. Look, people
>>>>>>>are ***parsing*** this stuff in /proc. The /proc filesystem
>>>>>>>is not some random sandbox to be playing in.
>>>>>>>
>>>>>>>Before you go messing with it, note that the device number
>>>>>>>also matters. (it's per-boot dynamic, but that's OK)
>>>>>>>That's how one knows that /SYSV00000000 is not just
>>>>>>>a regular file; sadly these didn't get a non-/ prefix.
>>>>>>>(and no you can't fix that now; it's way too late)
>>>>>>>
>>>>>>>Next time you feel like breaking an ABI, mind putting
>>>>>>>"LET'S BREAK AN ABI!" in the subject of your email?
>>>>>>>
>>>>>>I am not breaking ABI. Its already broken in the current
>>>>>>mainline. I am trying to fix it by putting back the ino#
>>>>>>as shmid. Eric had a suggestion that, instead of depending
>>>>>>on the inode# to be shmid, we could embed shmid into name
>>>>>>(instead of "key" which is currently not unique).
>>>>>>
>>>>>>>BTW, I suspect this kind of thing also breaks:
>>>>>>>a. fuser, lsof, and other resource usage display tools
>>>>>>>b. various obscure emulators (similar to valgrind)
>>>>>>>
>>>>>>If you strongly feel that "old" behaviour needs to be retained,
>>>>>>
>>>>>yup, we should put it back. The change was, afaik, accidental.
>>>>>
>>>>>>here is the patch I originally suggested.
>>>>>>
>>>>>Confused. Will this one-liner fix all the userspace breakage to which
>>>>>Albert refers?
>>>>>
>>>>Yes. Albert, please correct me if I am wrong.
>>>>
>>>It will, but could lead to two different inodes with the same i_ino,
>>>right?
>>>
>>Only if we generate same ID in two different namespaces. Is it currently
>>possible ?
>>
>
>Should be nothing stopping it.
>
>But like I say we never find the inode based on i_ino, and don't hash
>the inode, so it might be ok.
>
Correct. We might end up with same shmid - which mean same inode# shows
up in /proc/pid/maps.
If we don't unshare pid namespace or look from parent namespace - we
will end up seeing same
shmid/inode# in different /proc/pid/maps, even though they are
different. But I guess its okay..

Thanks,
Badari



2007-06-07 23:44:51

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

Quoting Serge E. Hallyn ([email protected]):
> Quoting Badari Pulavarty ([email protected]):
> > On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
> > > Quoting Badari Pulavarty ([email protected]):
> > > > On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
> > > > > On Thu, 07 Jun 2007 10:06:37 -0700
> > > > > Badari Pulavarty <[email protected]> wrote:
> > > > >
> > > > > > On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
> > > > > > > On 6/7/07, Badari Pulavarty <[email protected]> wrote:
> > > > > > >
> > > > > > > > BTW, I agree with Eric that its would be nice to use shmid as part
> > > > > > > > of name instead of forcing to be as inode number. It should be
> > > > > > > > possible for pmap to workout shmid from "key" or name. Isn't it ?
> > > > > > >
> > > > > > > It is not at all nice.
> > > > > > >
> > > > > > > 1. it's incompatible ABI breakage
> > > > > > > 2. where will you put the key then, in the inode? :-)
> > > > > >
> > > > > > Nope. Currently "key" is part of the name (but its not unique).
> > > > > >
> > > > > > >
> > > > > > > Changing to "SYSVID%d" is no good either. Look, people
> > > > > > > are ***parsing*** this stuff in /proc. The /proc filesystem
> > > > > > > is not some random sandbox to be playing in.
> > > > > > >
> > > > > > > Before you go messing with it, note that the device number
> > > > > > > also matters. (it's per-boot dynamic, but that's OK)
> > > > > > > That's how one knows that /SYSV00000000 is not just
> > > > > > > a regular file; sadly these didn't get a non-/ prefix.
> > > > > > > (and no you can't fix that now; it's way too late)
> > > > > > >
> > > > > > > Next time you feel like breaking an ABI, mind putting
> > > > > > > "LET'S BREAK AN ABI!" in the subject of your email?
> > > > > >
> > > > > > I am not breaking ABI. Its already broken in the current
> > > > > > mainline. I am trying to fix it by putting back the ino#
> > > > > > as shmid. Eric had a suggestion that, instead of depending
> > > > > > on the inode# to be shmid, we could embed shmid into name
> > > > > > (instead of "key" which is currently not unique).
> > > > > >
> > > > > > > BTW, I suspect this kind of thing also breaks:
> > > > > > > a. fuser, lsof, and other resource usage display tools
> > > > > > > b. various obscure emulators (similar to valgrind)
> > > > > >
> > > > > > If you strongly feel that "old" behaviour needs to be retained,
> > > > >
> > > > > yup, we should put it back. The change was, afaik, accidental.
> > > > >
> > > > > > here is the patch I originally suggested.
> > > > >
> > > > > Confused. Will this one-liner fix all the userspace breakage to which
> > > > > Albert refers?
> > > >
> > > > Yes. Albert, please correct me if I am wrong.
> > >
> > > It will, but could lead to two different inodes with the same i_ino,
> > > right?
> >
> > Only if we generate same ID in two different namespaces. Is it currently
> > possible ?
>
> Should be nothing stopping it.

(just to be more certain, a quick test showed I can get id 0 for
different keys, and different ids for the same key 0xff, in different
ipc namespaces)

-serge

2007-06-07 23:57:17

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid



Serge E. Hallyn wrote:

>Quoting Serge E. Hallyn ([email protected]):
>
>>Quoting Badari Pulavarty ([email protected]):
>>
>>>On Thu, 2007-06-07 at 15:37 -0500, Serge E. Hallyn wrote:
>>>
>>>>Quoting Badari Pulavarty ([email protected]):
>>>>
>>>>>On Thu, 2007-06-07 at 12:48 -0700, Andrew Morton wrote:
>>>>>
>>>>>>On Thu, 07 Jun 2007 10:06:37 -0700
>>>>>>Badari Pulavarty <[email protected]> wrote:
>>>>>>
>>>>>>>On Thu, 2007-06-07 at 12:43 -0400, Albert Cahalan wrote:
>>>>>>>
>>>>>>>>On 6/7/07, Badari Pulavarty <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>BTW, I agree with Eric that its would be nice to use shmid as part
>>>>>>>>>of name instead of forcing to be as inode number. It should be
>>>>>>>>>possible for pmap to workout shmid from "key" or name. Isn't it ?
>>>>>>>>>
>>>>>>>>It is not at all nice.
>>>>>>>>
>>>>>>>>1. it's incompatible ABI breakage
>>>>>>>>2. where will you put the key then, in the inode? :-)
>>>>>>>>
>>>>>>>Nope. Currently "key" is part of the name (but its not unique).
>>>>>>>
>>>>>>>>Changing to "SYSVID%d" is no good either. Look, people
>>>>>>>>are ***parsing*** this stuff in /proc. The /proc filesystem
>>>>>>>>is not some random sandbox to be playing in.
>>>>>>>>
>>>>>>>>Before you go messing with it, note that the device number
>>>>>>>>also matters. (it's per-boot dynamic, but that's OK)
>>>>>>>>That's how one knows that /SYSV00000000 is not just
>>>>>>>>a regular file; sadly these didn't get a non-/ prefix.
>>>>>>>>(and no you can't fix that now; it's way too late)
>>>>>>>>
>>>>>>>>Next time you feel like breaking an ABI, mind putting
>>>>>>>>"LET'S BREAK AN ABI!" in the subject of your email?
>>>>>>>>
>>>>>>>I am not breaking ABI. Its already broken in the current
>>>>>>>mainline. I am trying to fix it by putting back the ino#
>>>>>>>as shmid. Eric had a suggestion that, instead of depending
>>>>>>>on the inode# to be shmid, we could embed shmid into name
>>>>>>>(instead of "key" which is currently not unique).
>>>>>>>
>>>>>>>>BTW, I suspect this kind of thing also breaks:
>>>>>>>>a. fuser, lsof, and other resource usage display tools
>>>>>>>>b. various obscure emulators (similar to valgrind)
>>>>>>>>
>>>>>>>If you strongly feel that "old" behaviour needs to be retained,
>>>>>>>
>>>>>>yup, we should put it back. The change was, afaik, accidental.
>>>>>>
>>>>>>>here is the patch I originally suggested.
>>>>>>>
>>>>>>Confused. Will this one-liner fix all the userspace breakage to which
>>>>>>Albert refers?
>>>>>>
>>>>>Yes. Albert, please correct me if I am wrong.
>>>>>
>>>>It will, but could lead to two different inodes with the same i_ino,
>>>>right?
>>>>
>>>Only if we generate same ID in two different namespaces. Is it currently
>>>possible ?
>>>
>>Should be nothing stopping it.
>>
>
>(just to be more certain, a quick test showed I can get id 0 for
>different keys, and different ids for the same key 0xff, in different
>ipc namespaces)
>
Funny. I played with it and decided that it can happen :)

Thanks,
Badari



2007-06-08 03:47:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

"Serge E. Hallyn" <[email protected]> writes:

> Ok, so IIUC the problem was that inode->i_ino was being set to the id,
> and the id can be the same for different things in two namespaces.

There is nothing preventing inode number collisions in this code even
without multiple namespaces, and even when it was functioning
correctly. However as it does not seem possible to find these files
through normal filesystem operations that does not seem to be a problem.

> So aside from not using the id as inode->i_ino, an alternative is to use
> a separate superblock, spearate mqeueue fs, for each ipc ns.
>
> I haven't looked at that enough to see whether it's feasible, i.e. I
> don't know what else mqueue fs is used for. Eric, does that sound
> reasonable to you?

At this point given that we actually have a small user space dependency
and the fact that after I have reviewed the code it looks harmless to
change the inode number of those inodes, in both cases they are just
anonymous inodes generated with new_inode, and anything that we wrap
is likely to be equally so.

So it looks to me like we need to do three things:
- Fix the inode number
- Fix the name on the hugetlbfs dentry to hold the key
- Add a big fat comment that user space programs depend on this
behavior of both the dentry name and the inode number.

So Badari it looks like your original patch plus a little bit is
what we need.

Eric

2007-06-08 04:41:53

by Albert Cahalan

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On 6/7/07, Eric W. Biederman <[email protected]> wrote:

> So it looks to me like we need to do three things:
> - Fix the inode number
> - Fix the name on the hugetlbfs dentry to hold the key
> - Add a big fat comment that user space programs depend on this
> behavior of both the dentry name and the inode number.

Assuming that this proposed fix goes in:

Since the inode number is the shmid, and this is a number
that the kernel randomly chooses AFAIK, there should be
no need to have different shm segments sharing the same
inode number.

The situation with the key is a bit more disturbing, though
we already hit that anyway when IPC_PRIVATE is used.
(why anybody would NOT use IPC_PRIVATE is a mystery)
So having the key in the name doesn't make things worse.

I have some concern about the device minor number.
This should be the same for all shm mappings; I do not
know if the behavior changed.

2007-06-08 05:57:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

"Albert Cahalan" <[email protected]> writes:

> On 6/7/07, Eric W. Biederman <[email protected]> wrote:
>
>> So it looks to me like we need to do three things:
>> - Fix the inode number
>> - Fix the name on the hugetlbfs dentry to hold the key
>> - Add a big fat comment that user space programs depend on this
>> behavior of both the dentry name and the inode number.
>
> Assuming that this proposed fix goes in:
>
> Since the inode number is the shmid, and this is a number
> that the kernel randomly chooses AFAIK, there should be
> no need to have different shm segments sharing the same
> inode number.

Where we run into inode number confusion is that all of
these shm segments are actually files on a tmpfs filesystem
somewhere, and by making the inode number the shmid we loose
the tmpfs inode number. So it is possible we get tmpfs inode
number conflicts. However the inode number is not used for
anything, and the files are not visible in any other way except
as shm segments so it doesn't matter.

There is another case with ipc namespaces where we ultimately need
to support duplicate shmids on the same machine (so migration
is a possibility). However by and large the user space
processes with duplicate ids should be invisible to each other.

> The situation with the key is a bit more disturbing, though
> we already hit that anyway when IPC_PRIVATE is used.
> (why anybody would NOT use IPC_PRIVATE is a mystery)
> So having the key in the name doesn't make things worse.

Having "SYSV" in the name appears mandatory. Otherwise you
don't even know it is a shm file. Although I may be confused.

> I have some concern about the device minor number.
> This should be the same for all shm mappings; I do not
> know if the behavior changed.

So I haven't changed anything here. But I haven't really
looked either.

I don't have a clue if hugetlbfs files use the same device minor
number as tmpfs files.

Hmm. Thinking about this I have just realized that we may want
to approach this a little differently. Currently I am reusing
the dentry and inode structure that hugetlbfs and tmpfs return
me, and simply have a distinct struct file for each shm mapping.

There is a little more cost but it may actually make sense to have
a dentry and inode that is specific to shm.c so we can do whatever
we need to without adding requirements to the normal tmpfs or hugtlb
code.

Eric

2007-06-08 06:52:13

by Albert Cahalan

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

On 6/8/07, Eric W. Biederman <[email protected]> wrote:
> "Albert Cahalan" <[email protected]> writes:
> > On 6/7/07, Eric W. Biederman <[email protected]> wrote:

> >> So it looks to me like we need to do three things:
> >> - Fix the inode number
> >> - Fix the name on the hugetlbfs dentry to hold the key
> >> - Add a big fat comment that user space programs depend on this
> >> behavior of both the dentry name and the inode number.
> >
> > Assuming that this proposed fix goes in:
> >
> > Since the inode number is the shmid, and this is a number
> > that the kernel randomly chooses AFAIK, there should be
> > no need to have different shm segments sharing the same
> > inode number.
>
> Where we run into inode number confusion is that all of
> these shm segments are actually files on a tmpfs filesystem
> somewhere, and by making the inode number the shmid we loose
> the tmpfs inode number. So it is possible we get tmpfs inode
> number conflicts. However the inode number is not used for
> anything, and the files are not visible in any other way except
> as shm segments so it doesn't matter.

Eh, the kernel choses both shmid and tmpfs inode number.
You could set a high bit in one or the other.

> There is another case with ipc namespaces where we ultimately need
> to support duplicate shmids on the same machine (so migration
> is a possibility). However by and large the user space
> processes with duplicate ids should be invisible to each other.

On the bright side, this only screws up people who get the
crazy idea that processes can be migrated.

> > The situation with the key is a bit more disturbing, though
> > we already hit that anyway when IPC_PRIVATE is used.
> > (why anybody would NOT use IPC_PRIVATE is a mystery)
> > So having the key in the name doesn't make things worse.
>
> Having "SYSV" in the name appears mandatory. Otherwise you
> don't even know it is a shm file. Although I may be confused.

It's mandatory for a different reason: to satisfy parsers.

It is nearly useless for identifying shm files. Look what I can do:
touch /SYSV00000000
touch '/SYSV00000000 (deleted)'

(so pmap creates a shm, looks for the address in /proc/self/maps,
determines the device major/minor in use, and then uses that)

> Hmm. Thinking about this I have just realized that we may want
> to approach this a little differently. Currently I am reusing
> the dentry and inode structure that hugetlbfs and tmpfs return
> me, and simply have a distinct struct file for each shm mapping.
>
> There is a little more cost but it may actually make sense to have
> a dentry and inode that is specific to shm.c so we can do whatever
> we need to without adding requirements to the normal tmpfs or hugtlb
> code.

Piggybacking on tmpfs has always seemed a bit dirty to me.

2007-06-08 16:08:18

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid



Eric W. Biederman wrote:

>
>At this point given that we actually have a small user space dependency
>and the fact that after I have reviewed the code it looks harmless to
>change the inode number of those inodes, in both cases they are just
>anonymous inodes generated with new_inode, and anything that we wrap
>is likely to be equally so.
>
>So it looks to me like we need to do three things:
>- Fix the inode number
>
Okay. its already done.

>
>- Fix the name on the hugetlbfs dentry to hold the key
>
I don't see need for doing this for hugetlbfs inodes. Currently, they
don't base their
name on "key" + basing on the "key" is kind of useless anyway (its not
unique).

>
>- Add a big fat comment that user space programs depend on this
> behavior of both the dentry name and the inode number.
>
I don't think, the user-space can depend on the dentry-name. It can only
depend
on inode# to match shmid. (since key is not unique esp. for key=0x00000000).

BTW, I agree that shmid is not unique even without namespaces as its
based on
seq# and we wrap seq#.

Thanks,
Badari




2007-06-08 22:31:24

by Badari Pulavarty

[permalink] [raw]
Subject: [PATCH] Restore shmid as inode# to fix /proc/pid/maps ABI breakage

Andrew,

Can you include this in -mm ?

Thanks,
Badari

shmid used to be stored as inode# for shared memory segments. Some of
the proc-ps tools use this from /proc/pid/maps. Recent cleanups
to newseg() changed it. This patch sets inode number back to shared
memory id to fix breakage.

Signed-off-by: Badari Pulavarty <[email protected]>

Index: linux-2.6.22-rc4/ipc/shm.c
===================================================================
--- linux-2.6.22-rc4.orig/ipc/shm.c 2007-06-08 15:17:20.000000000 -0700
+++ linux-2.6.22-rc4/ipc/shm.c 2007-06-08 15:19:38.000000000 -0700
@@ -397,6 +397,11 @@ static int newseg (struct ipc_namespace
shp->shm_nattch = 0;
shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
shp->shm_file = file;
+ /*
+ * shmid gets reported as "inode#" in /proc/pid/maps.
+ * proc-ps tools use this. Changing this will break them.
+ */
+ file->f_dentry->d_inode->i_ino = shp->id;

ns->shm_tot += numpages;
shm_unlock(shp);


2007-06-08 23:45:19

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] shm: Fix the filename of hugetlb sysv shared memory


Some user space tools need to identify SYSV shared memory when
examining /proc/<pid>/maps. To do so they look for a block device
with major zero, a dentry named SYSV<sysv key>, and having the minor of
the internal sysv shared memory kernel mount.

To help these tools and to make it easier for people just browsing
/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to
use the SYSV<key> dentry naming convention.

User space tools will still have to be aware that hugetlb sysv
shared memory lives on a different internal kernel mount and so
has a different block device minor number from the rest of sysv
shared memory.

Signed-off-by: Eric W. Biederman <[email protected]>
---
fs/hugetlbfs/inode.c | 7 ++-----
include/linux/hugetlb.h | 4 ++--
ipc/shm.c | 6 +++---
3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index aa083dd..e6b46b3 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -736,15 +736,13 @@ static int can_do_hugetlb_shm(void)
can_do_mlock());
}

-struct file *hugetlb_zero_setup(size_t size)
+struct file *hugetlb_file_setup(const char *name, size_t size)
{
int error = -ENOMEM;
struct file *file;
struct inode *inode;
struct dentry *dentry, *root;
struct qstr quick_string;
- char buf[16];
- static atomic_t counter;

if (!hugetlbfs_vfsmount)
return ERR_PTR(-ENOENT);
@@ -756,8 +754,7 @@ struct file *hugetlb_zero_setup(size_t size)
return ERR_PTR(-ENOMEM);

root = hugetlbfs_vfsmount->mnt_root;
- snprintf(buf, 16, "%u", atomic_inc_return(&counter));
- quick_string.name = buf;
+ quick_string.name = name;
quick_string.len = strlen(quick_string.name);
quick_string.hash = 0;
dentry = d_alloc(root, &quick_string);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b4570b6..2c13715 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -163,7 +163,7 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)

extern const struct file_operations hugetlbfs_file_operations;
extern struct vm_operations_struct hugetlb_vm_ops;
-struct file *hugetlb_zero_setup(size_t);
+struct file *hugetlb_file_setup(const char *name, size_t);
int hugetlb_get_quota(struct address_space *mapping);
void hugetlb_put_quota(struct address_space *mapping);

@@ -185,7 +185,7 @@ static inline void set_file_hugepages(struct file *file)

#define is_file_hugepages(file) 0
#define set_file_hugepages(file) BUG()
-#define hugetlb_zero_setup(size) ERR_PTR(-ENOSYS)
+#define hugetlb_file_setup(name,size) ERR_PTR(-ENOSYS)

#endif /* !CONFIG_HUGETLBFS */

diff --git a/ipc/shm.c b/ipc/shm.c
index 4fefbad..c31f743 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -364,9 +364,10 @@ static int newseg (struct ipc_namespace *ns, key_t key, int shmflg, size_t size)
return error;
}

+ sprintf (name, "SYSV%08x", key);
if (shmflg & SHM_HUGETLB) {
- /* hugetlb_zero_setup takes care of mlock user accounting */
- file = hugetlb_zero_setup(size);
+ /* hugetlb_file_setup takes care of mlock user accounting */
+ file = hugetlb_file_setup(name, size);
shp->mlock_user = current->user;
} else {
int acctflag = VM_ACCOUNT;
@@ -377,7 +378,6 @@ static int newseg (struct ipc_namespace *ns, key_t key, int shmflg, size_t size)
if ((shmflg & SHM_NORESERVE) &&
sysctl_overcommit_memory != OVERCOMMIT_NEVER)
acctflag = 0;
- sprintf (name, "SYSV%08x", key);
file = shmem_file_setup(name, size, acctflag);
}
error = PTR_ERR(file);
--
1.5.1.1.181.g2de0

2007-06-08 23:55:38

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

On Fri, 08 Jun 2007 17:43:34 -0600
[email protected] (Eric W. Biederman) wrote:

> Some user space tools need to identify SYSV shared memory when
> examining /proc/<pid>/maps. To do so they look for a block device
> with major zero, a dentry named SYSV<sysv key>, and having the minor of
> the internal sysv shared memory kernel mount.
>
> To help these tools and to make it easier for people just browsing
> /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to
> use the SYSV<key> dentry naming convention.
>
> User space tools will still have to be aware that hugetlb sysv
> shared memory lives on a different internal kernel mount and so
> has a different block device minor number from the rest of sysv
> shared memory.

I assume this fix is preferred over Badari's? If so, why?



From: Badari Pulavarty <[email protected]>

shmid used to be stored as inode# for shared memory segments. Some of
the proc-ps tools use this from /proc/pid/maps. Recent cleanups
to newseg() changed it. This patch sets inode number back to shared
memory id to fix breakage.

Signed-off-by: Badari Pulavarty <[email protected]>
Cc: "Albert Cahalan" <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

ipc/shm.c | 5 +++++
1 files changed, 5 insertions(+)

diff -puN ipc/shm.c~restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage ipc/shm.c
--- a/ipc/shm.c~restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage
+++ a/ipc/shm.c
@@ -397,6 +397,11 @@ static int newseg (struct ipc_namespace
shp->shm_nattch = 0;
shp->id = shm_buildid(ns, id, shp->shm_perm.seq);
shp->shm_file = file;
+ /*
+ * shmid gets reported as "inode#" in /proc/pid/maps.
+ * proc-ps tools use this. Changing this will break them.
+ */
+ file->f_dentry->d_inode->i_ino = shp->id;

ns->shm_tot += numpages;
shm_unlock(shp);
_

2007-06-09 04:32:19

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory



Andrew Morton wrote:

>On Fri, 08 Jun 2007 17:43:34 -0600
>[email protected] (Eric W. Biederman) wrote:
>
>>Some user space tools need to identify SYSV shared memory when
>>examining /proc/<pid>/maps. To do so they look for a block device
>>with major zero, a dentry named SYSV<sysv key>, and having the minor of
>>the internal sysv shared memory kernel mount.
>>
>>To help these tools and to make it easier for people just browsing
>>/proc/<pid>/maps this patch modifies hugetlb sysv shared memory to
>>use the SYSV<key> dentry naming convention.
>>
>>User space tools will still have to be aware that hugetlb sysv
>>shared memory lives on a different internal kernel mount and so
>>has a different block device minor number from the rest of sysv
>>shared memory.
>>
>
>I assume this fix is preferred over Badari's? If so, why?
>
No. You still need my patch to fix the current breakage.

This patch makes hugetlbfs also use same naming convention as regular
shmem for its
name. This is not absolutely needed, its a nice to have. Currently, user
space tools
can't depend on the filename alone, since its not unique (based on kry).

Thanks,
Badari

>


2007-06-09 08:03:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

Badari Pulavarty <[email protected]> writes:

> No. You still need my patch to fix the current breakage.

Agreed.

> This patch makes hugetlbfs also use same naming convention as regular shmem for
> its
> name. This is not absolutely needed, its a nice to have. Currently, user space
> tools
> can't depend on the filename alone, since its not unique (based on kry).

Exactly. My patch is an additional fix/cleanup to bring the hugetlbfs
shm segments as close to their normal counterparts as I can.

pmap still won't recognize them as shm segments (different block device
minor number) but otherwise they are now presented identically with
normal shm segments.

Eric

2007-06-11 18:11:43

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

On Fri, 08 Jun 2007 17:43:34 -0600
[email protected] (Eric W. Biederman) wrote:

> Some user space tools need to identify SYSV shared memory when
> examining /proc/<pid>/maps. To do so they look for a block device
> with major zero, a dentry named SYSV<sysv key>, and having the minor of
> the internal sysv shared memory kernel mount.
>
> To help these tools and to make it easier for people just browsing
> /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to
> use the SYSV<key> dentry naming convention.
>
> User space tools will still have to be aware that hugetlb sysv
> shared memory lives on a different internal kernel mount and so
> has a different block device minor number from the rest of sysv
> shared memory.

So.. I am sitting here believing that this patch and Badari's
restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage.patch are both
needed in 2.6.22 and that they will fix all these issues up.

If that is untrue, someone please let us know..

2007-06-11 19:01:17

by Adam Litke

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

On 6/8/07, Eric W. Biederman <[email protected]> wrote:
> -struct file *hugetlb_zero_setup(size_t size)
> +struct file *hugetlb_file_setup(const char *name, size_t size)

The bulk of this patch seems to handle renaming this function. Is
that really necessary?

--
Adam Litke ( agl at us.ibm.com )
IBM Linux Technology Center

2007-06-11 19:55:20

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

On Mon, 2007-06-11 at 11:11 -0700, Andrew Morton wrote:
> On Fri, 08 Jun 2007 17:43:34 -0600
> [email protected] (Eric W. Biederman) wrote:
>
> > Some user space tools need to identify SYSV shared memory when
> > examining /proc/<pid>/maps. To do so they look for a block device
> > with major zero, a dentry named SYSV<sysv key>, and having the minor of
> > the internal sysv shared memory kernel mount.
> >
> > To help these tools and to make it easier for people just browsing
> > /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to
> > use the SYSV<key> dentry naming convention.
> >
> > User space tools will still have to be aware that hugetlb sysv
> > shared memory lives on a different internal kernel mount and so
> > has a different block device minor number from the rest of sysv
> > shared memory.
>
> So.. I am sitting here believing that this patch and Badari's
> restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage.patch are both
> needed in 2.6.22 and that they will fix all these issues up.
>
> If that is untrue, someone please let us know..

Andrew,

My restore-shmid-as-inode-to-fix-proc-pid-maps-abi-breakage.patch is
definitely needed for 2.6.22 to fix ABI issue.

Eric's patch goes beyond and provides same naming convention for
hugetlbfs backed shm segs (which we never did in the past). So,
its not absolutely need for 2.6.22. You can queue up for next
release, unless Albert really wants to extend proc-ps utils for
hugetlbfs segments too.

But, its very simple patch - you might as well push this too.

Thanks,
Badari

2007-06-11 20:54:19

by Ken Chen

[permalink] [raw]
Subject: Re: [PATCH] shm: Fix the filename of hugetlb sysv shared memory

On 6/11/07, Adam Litke <[email protected]> wrote:
> On 6/8/07, Eric W. Biederman <[email protected]> wrote:
> > -struct file *hugetlb_zero_setup(size_t size)
> > +struct file *hugetlb_file_setup(const char *name, size_t size)
>
> The bulk of this patch seems to handle renaming this function. Is
> that really necessary?

It looks OK to me though, because the argument list to that function
is changed. Avoid change the function name isn't going to reduce the
patch size either. So we might just change the name as well to match
the function name shmem_file_setup().