2008-11-23 23:06:49

by Chris Smowton

Subject: Robust shared memory for unrelated processes

Hello all,

First of all, my apologies if this is the wrong list for this
suggestion; I haven't posted here before so I might accidentally break
some local conventions :)

With that said, my question/suggestion relates to sharing memory between
processes which are *not* in a parent-child relationship.

Suppose for simplicity's sake that one wishes to share a sizable piece
of memory with a single other process with which one is already in
contact (say, by a Unix domain socket).

It seems to me that one cannot do this without introducing the risk of
the shared section not being deallocated until the system next boots,
for the following reasons:

1. Suppose I use SysV style SHM. Then I must find a free key, create a
section with that key, and communicate that key to my partner process so
that he can also open the section. I cannot issue an IPC_RMID during
this time, as that will render the key immediately unavailable. If I am
SIGKILL'd at any time between creating the section and receiving
confirmation that my partner has opened it, the section will persist
until reboot. This is a large window of opportunity and a very bad thing.

2. Suppose I use POSIX shared memory (i.e. shm_open and its brethren).
Then the same problem exists, only keys are replaced by friendlier
names. The situation is as bad as with SysV SHM.

3. Suppose now I get a bit cleverer; I use POSIX SHM, but I create and
then immediately unlink my section, before sending the file descriptor
over a Unix domain socket to my partner (using the ancillary control
channel). This works, and does mean that I am able to create a shared
section then immediately unlink it, whilst retaining the ability to
allow processes to open the effectively anonymous shared section by
sending them its file descriptor. This nearly accomplishes my goal of
ensuring the shared section does get tidied up if its users are all
SIGKILL'd; however, the section's creator must still issue two separate
calls: shm_open("/mysection", ...) followed by shm_unlink("/mysection").
The pair is not atomic, so a window still exists in which the section
can leak if I am killed between the two calls.

This option would also work with a regular file residing in a tmpfs,
since creating such a file is all that Linux's implementation of
shm_open does.

4. Alright, so what if I get still a little cleverer? I will try to use
BSD-style shared memory, as those sections are anonymous and certainly
cleaned up when the referring processes die. I open /dev/zero and mmap
it appropriately, before sending its associated FD to my partner.
Unfortunately this fails; my partner ends up with a private, zeroed
block of memory and nothing is shared. Curiously, I can dup() the
dev-zero file descriptor and share memory with my child processes, and
sendmsg's documentation declares that it will effectively dup() a file
descriptor which is passed across a unix domain socket, but this does
not seem to hold for /dev/zero in particular.
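
For concreteness, option 3 above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names (create_unlinked_section, send_fd, recv_fd, map_section, share_demo) and the "/shm-sketch-<pid>" object name are made up for this example, and the partner is simulated with fork() purely so the sketch is self-contained. The fork happens before the section is created, so the partner obtains the descriptor only through the socket, as an unrelated process would.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define SECTION_SIZE 4096

union fd_ctl {                      /* aligned buffer for one SCM_RIGHTS fd */
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Create a POSIX shm object and unlink its name straight away. The gap
 * between the two calls is the non-atomic window discussed above. */
static int create_unlinked_section(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);               /* a SIGKILL before this leaks the name */
    if (ftruncate(fd, (off_t)size) == 0)
        return fd;
    close(fd);
    return -1;
}

/* Send one descriptor as ancillary data alongside a single dummy byte. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf };
    memset(&ctl, 0, sizeof ctl);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns -1 if none arrived. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf };
    int fd;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

static char *map_section(int fd)    /* NULL on failure, including fd < 0 */
{
    void *p = mmap(NULL, SECTION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if the creator sees data the partner wrote via the passed fd. */
int share_demo(void)
{
    int sv[2], status = -1;
    char name[64];
    pid_t pid;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || (pid = fork()) < 0)
        return -1;
    if (pid == 0) {                 /* partner: receive, map, write */
        close(sv[0]);
        char *q = map_section(recv_fd(sv[1]));
        if (!q)
            _exit(1);
        strcpy(q, "hello from partner");
        _exit(0);
    }
    close(sv[1]);
    snprintf(name, sizeof name, "/shm-sketch-%ld", (long)getpid());
    int fd = create_unlinked_section(name, SECTION_SIZE);
    char *p = map_section(fd);
    int sent = p ? send_fd(sv[0], fd) : -1;
    close(sv[0]);                   /* unblocks the partner if nothing was sent */
    waitpid(pid, &status, 0);
    return (sent == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
            strcmp(p, "hello from partner") == 0) ? 0 : -1;
}
```

Calling share_demo() from a main() should return 0 when the write made by the partner is visible through the creator's mapping. On older glibc versions the program may need to be linked with -lrt for shm_open. Note that the sketch does nothing to close the window between shm_open and shm_unlink; it only shows that the name is not needed once the descriptor has been passed.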

Therefore, it seems that in order to permit sharing of memory with a
process with which I do not have a parent-child relationship, one of the
following needs to be the case:

1. It needs to be possible to atomically shm_open and shm_unlink, or
2. It needs to be possible to pass handles to /dev/zero over sockets
like one can regular files and POSIX section handles (which are just
files in a tmpfs), or
3. It needs to be possible for a general file to be atomically created
and registered for deletion on closure of its last handle.

Does this seem valid? Or is there a means to achieve SHM between
unrelated processes without the risk of leaking the memory?

I'm reading the mailing list online rather than getting it delivered at
the moment, so I'd appreciate any comments CC'd to [email protected] :)

Thanks in advance to anyone willing to advise!

Chris


2008-11-23 23:42:49

by Serge E. Hallyn

Subject: Re: Robust shared memory for unrelated processes

Quoting Chris Smowton ([email protected]):
> Hello all,
>
> First of all, my apologies if this is the wrong list for this suggestion; I
> haven't posted here before so I might accidentally break some local
> conventions :)
>
> With that said, my question/suggestion relates to sharing memory between
> processes which are *not* in a parent-child relationship.
>
> Suppose for simplicity's sake that one wishes to share a sizable piece of
> memory with a single other process with which one is already in contact
> (say, by a Unix domain socket).
>
> It seems to me that one cannot do this without introducing the risk of the
> shared section not being deallocated until the system next boots, for the
> following reasons:
>
> 1. Suppose I use SysV style SHM. Then I must find a free key, create a
> section with that key, and communicate that key to my partner process so
> that he can also open the section. I cannot issue an IPC_RMID during this
> time, as that will render the key immediately unavailable. If I am
> SIGKILL'd at any time between creating the section and receiving
> confirmation that my partner has opened it, the section will persist until
> reboot. This is a large window of opportunity and a very bad thing.

Ah, but if your app is started in a new IPC namespace, then when the
app dies, the namespace will be released and the section will be freed.

Now both of the processes will need to be started in the same child
ipc namespace, as you currently can't enter an existing ipcns. If
that is a problem, I'm sure it can be addressed somehow.
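
A rough sketch of the idea, under stated assumptions: the function name ipcns_demo is made up for illustration, and the "app" is a forked child that calls unshare(CLONE_NEWIPC) itself rather than being started in the namespace by a launcher. Creating an IPC namespace normally requires CAP_SYS_ADMIN, so the sketch reports that case separately instead of treating it as an error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if a SysV segment was created inside a fresh IPC namespace,
 * 1 if the namespace could not be created (for example, no CAP_SYS_ADMIN),
 * and -1 on any other failure. */
int ipcns_demo(void)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Move this process into a new, empty IPC namespace. */
        if (unshare(CLONE_NEWIPC) < 0)
            _exit(1);
        /* Create and attach a segment that only exists in that namespace. */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id < 0)
            _exit(2);
        if (shmat(id, NULL, 0) == (void *)-1)
            _exit(3);
        /* Deliberately no IPC_RMID: once the last process in the namespace
         * exits, the namespace and its segments are released together. */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 0;
    return WEXITSTATUS(status) == 1 ? 1 : -1;
}
```

After ipcns_demo() returns 0, running ipcs -m in the parent's namespace should show no leftover segment from the child, since the segment never existed there.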

-serge

2008-11-24 01:19:52

by Bodo Eggert

Subject: Re: Robust shared memory for unrelated processes

Chris Smowton <[email protected]> wrote:

> Suppose for simplicity's sake that one wishes to share a sizable piece
> of memory with a single other process with which one is already in
> contact (say, by a Unix domain socket).
>
> It seems to me that one cannot do this without introducing the risk of
> the shared section not being deallocated until the system next boots,
[...]

> 3. It needs to be possible for a general file to be atomically created
> and registered for deletion on closure of its last handle.

The patch below gives you an autounlink mount option for tmpfs, which
gets rid of your shared file as soon as the last process closes it.
Unfortunately I don't remember whether link()ing the file will make it
stay.

diff -X dontdiff -dpruN linux-2.6.24.pure/include/linux/shmem_fs.h linux-2.6.24.autounlink/include/linux/shmem_fs.h
--- linux-2.6.24.pure/include/linux/shmem_fs.h 2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.24.autounlink/include/linux/shmem_fs.h 2008-02-14 15:35:01.000000000 +0100
@@ -30,11 +30,14 @@ struct shmem_sb_info {
 	unsigned long free_blocks; /* How many are left for allocation */
 	unsigned long max_inodes; /* How many inodes are allowed */
 	unsigned long free_inodes; /* How many are left for allocation */
-	int policy; /* Default NUMA memory alloc policy */
-	nodemask_t policy_nodes; /* nodemask for preferred and bind */
+	unsigned int flags;
+	int policy; /* Default NUMA memory alloc policy */
+	nodemask_t policy_nodes; /* nodemask for preferred and bind */
 	spinlock_t stat_lock;
 };

+#define TMPFS_FL_AUTOREMOVE 1
+
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
diff -X dontdiff -dpruN linux-2.6.24.pure/mm/shmem.c linux-2.6.24.autounlink/mm/shmem.c
--- linux-2.6.24.pure/mm/shmem.c 2008-01-25 15:09:39.000000000 +0100
+++ linux-2.6.24.autounlink/mm/shmem.c 2008-02-14 18:00:54.000000000 +0100
@@ -1747,31 +1747,41 @@ static int
 shmem_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
 {
 	struct inode *inode = shmem_get_inode(dir->i_sb, mode, dev);
+	struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
 	int error = -ENOSPC;

-	if (inode) {
-		error = security_inode_init_security(inode, dir, NULL, NULL,
-						     NULL);
-		if (error) {
-			if (error != -EOPNOTSUPP) {
-				iput(inode);
-				return error;
-			}
-		}
-		error = shmem_acl_init(inode, dir);
-		if (error) {
+	if (!inode)
+		return error;
+
+	error = security_inode_init_security(inode, dir, NULL, NULL,
+					     NULL);
+	if (error) {
+		if (error != -EOPNOTSUPP) {
 			iput(inode);
 			return error;
 		}
-		if (dir->i_mode & S_ISGID) {
-			inode->i_gid = dir->i_gid;
-			if (S_ISDIR(mode))
-				inode->i_mode |= S_ISGID;
-		}
-		dir->i_size += BOGO_DIRENT_SIZE;
-		dir->i_ctime = dir->i_mtime = CURRENT_TIME;
-		d_instantiate(dentry, inode);
+	}
+	error = shmem_acl_init(inode, dir);
+	if (error) {
+		iput(inode);
+		return error;
+	}
+	if (dir->i_mode & S_ISGID) {
+		inode->i_gid = dir->i_gid;
+		if (S_ISDIR(mode))
+			inode->i_mode |= S_ISGID;
+	}
+
+	dir->i_size += BOGO_DIRENT_SIZE;
+	dir->i_ctime = dir->i_mtime = CURRENT_TIME;
+	d_instantiate(dentry, inode);
+	if ( S_ISDIR(mode)
+	|| !(sbinfo->flags & TMPFS_FL_AUTOREMOVE))
+	{
 		dget(dentry); /* Extra count - pin the dentry in core */
+	} else {
+		dir->i_size -= BOGO_DIRENT_SIZE;
+		drop_nlink(inode);
 	}
 	return error;
 }
@@ -1800,6 +1810,11 @@ static int shmem_link(struct dentry *old
 	struct inode *inode = old_dentry->d_inode;
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);

+	/* In auto-unlink mode, the newly created link would be unlinked
+	   immediately. We don't need to do anything here. */
+	if (sbinfo->flags & TMPFS_FL_AUTOREMOVE)
+		return 0;
+
 	/*
 	 * No ordinary (disk based) filesystem counts links as inodes;
 	 * but each new link needs a new dentry, pinning lowmem, and
@@ -2095,6 +2110,7 @@ static const struct export_operations sh

 static int shmem_parse_options(char *options, int *mode, uid_t *uid,
 	gid_t *gid, unsigned long *blocks, unsigned long *inodes,
+	unsigned int * flags,
 	int *policy, nodemask_t *policy_nodes)
 {
 	char *this_char, *value, *rest;
@@ -2120,8 +2136,18 @@ static int shmem_parse_options(char *opt
 			continue;
 		if ((value = strchr(this_char,'=')) != NULL) {
 			*value++ = 0;
+
+		/* These options don't take arguments: */
+		} else if (!strcmp(this_char,"autounlink")) {
+			*flags |= TMPFS_FL_AUTOREMOVE;
+			continue;
+		} else if (!strcmp(this_char,"noautounlink")) {
+			*flags &= ~TMPFS_FL_AUTOREMOVE;
+			continue;
+
+		/* All other options need an argument */
 		} else {
-			printk(KERN_ERR
+			printk(KERN_ERR
 			    "tmpfs: No value for mount option '%s'\n",
 			    this_char);
 			return 1;
@@ -2192,10 +2218,12 @@ static int shmem_remount_fs(struct super
 	nodemask_t policy_nodes = sbinfo->policy_nodes;
 	unsigned long blocks;
 	unsigned long inodes;
+	unsigned int sbflags;
 	int error = -EINVAL;

+	sbflags = sbinfo->flags;
 	if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks,
-				&max_inodes, &policy, &policy_nodes))
+				&max_inodes, &sbflags, &policy, &policy_nodes))
 		return error;

 	spin_lock(&sbinfo->stat_lock);
@@ -2221,6 +2249,7 @@ static int shmem_remount_fs(struct super
 	sbinfo->free_blocks = max_blocks - blocks;
 	sbinfo->max_inodes = max_inodes;
 	sbinfo->free_inodes = max_inodes - inodes;
+	sbinfo->flags = sbflags;
 	sbinfo->policy = policy;
 	sbinfo->policy_nodes = policy_nodes;
 out:
@@ -2247,6 +2276,7 @@ static int shmem_fill_super(struct super
 	struct shmem_sb_info *sbinfo;
 	unsigned long blocks = 0;
 	unsigned long inodes = 0;
+	unsigned int flags = 0;
 	int policy = MPOL_DEFAULT;
 	nodemask_t policy_nodes = node_states[N_HIGH_MEMORY];

@@ -2262,7 +2292,7 @@ static int shmem_fill_super(struct super
 		if (inodes > blocks)
 			inodes = blocks;
 		if (shmem_parse_options(data, &mode, &uid, &gid, &blocks,
-					&inodes, &policy, &policy_nodes))
+					&inodes, &flags, &policy, &policy_nodes))
 			return -EINVAL;
 	}
 	sb->s_export_op = &shmem_export_ops;
@@ -2281,6 +2311,7 @@ static int shmem_fill_super(struct super
 	sbinfo->free_blocks = blocks;
 	sbinfo->max_inodes = inodes;
 	sbinfo->free_inodes = inodes;
+	sbinfo->flags = flags;
 	sbinfo->policy = policy;
 	sbinfo->policy_nodes = policy_nodes;