This patchset deals with the selinux and rmid races Manfred found on
the ipc scaling work that has been going on. It specifically addresses
shared mem and msg queues. While semaphores still need to be updated, I want
to make sure these are correct first. Also, Manfred had already sent out
a patchset that deals with a race in sem complex operations. So any changes
should be on top of his.
Patches 1 and 2 deal with shared memory.
Patches 3 and 4 deal with msg queues.
Specific details about each race and its fix are in the corresponding
patches.
Note that Linus suggested a good alternative to patches 1 and 3: use
kfree_rcu() and delay the freeing of the security structure. I would
much prefer that approach to doing security checks with the lock held,
but I want to have the patches out there and ready in case we go with the
latter solution.
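For reference, a minimal sketch of what that alternative could look like --
this is only an illustration, and it assumes ipc_security_struct grows a
struct rcu_head member named rcu (an actual patch along these lines shows
up later in this thread):

static void ipc_free_security(struct kern_ipc_perm *perm)
{
	struct ipc_security_struct *isec = perm->security;

	perm->security = NULL;
	/* free only after a grace period, so concurrent RCU readers
	 * can't hit a use-after-free */
	kfree_rcu(isec, rcu);
}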
I have tested these patches with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server. In both cases, voluntary
and forced preemption have passed the tests -- note that I was not able
to reproduce any of these bugs in the first place, so these tests were
mostly to make sure I didn't introduce any further issues.
Manfred, could you please give me your feedback on these, as I'd like
to make sure I'm not doing anything stupid.
Davidlohr Bueso (4):
ipc,shm: fix race with selinux
ipc,shm: prevent race with rmid in shmat(2)
ipc,msg: fix race with selinux
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/msg.c | 27 +++++++++++++++++++++------
ipc/shm.c | 31 ++++++++++++++++++++++---------
2 files changed, 43 insertions(+), 15 deletions(-)
--
1.7.11.7
Currently, we check msg security only under RCU. Since selinux
can free the security structure through selinux_msg_queue_free_security(),
we can run into a use-after-free condition. This bug only affects
msgctl syscalls, as msgsnd and msgrcv have already been updated.
The fix is obvious: make sure we hold the kern_ipc_perm.lock while
performing these security checks.
Reported-by: Manfred Spraul <[email protected]>
Cc: [email protected] # for 3.11
Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/ipc/msg.c b/ipc/msg.c
index b0d541d4..06e8aae 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -414,13 +414,13 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
msq = container_of(ipcp, struct msg_queue, q_perm);
+ ipc_lock_object(&msq->q_perm);
err = security_msg_queue_msgctl(msq, cmd);
if (err)
- goto out_unlock1;
+ goto out_unlock0;
switch (cmd) {
case IPC_RMID:
- ipc_lock_object(&msq->q_perm);
/* freeque unlocks the ipc object and rcu */
freeque(ns, ipcp);
goto out_up;
@@ -428,10 +428,9 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
!capable(CAP_SYS_RESOURCE)) {
err = -EPERM;
- goto out_unlock1;
+ goto out_unlock0;
}
- ipc_lock_object(&msq->q_perm);
err = ipc_update_perm(&msqid64.msg_perm, ipcp);
if (err)
goto out_unlock0;
@@ -450,7 +449,6 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
break;
default:
err = -EINVAL;
- goto out_unlock1;
}
out_unlock0:
@@ -542,9 +540,12 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
if (ipcperms(ns, &msq->q_perm, S_IRUGO))
goto out_unlock;
+ ipc_lock_object(&msq->q_perm);
err = security_msg_queue_msgctl(msq, cmd);
- if (err)
+ if (err) {
+ ipc_unlock_object(&msq->q_perm);
goto out_unlock;
+ }
kernel_to_ipc64_perm(&msq->q_perm, &tbuf.msg_perm);
tbuf.msg_stime = msq->q_stime;
@@ -556,6 +557,7 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
tbuf.msg_lspid = msq->q_lspid;
tbuf.msg_lrpid = msq->q_lrpid;
rcu_read_unlock();
+ ipc_unlock_object(&msq->q_perm);
if (copy_msqid_to_user(buf, &tbuf, version))
return -EFAULT;
--
1.7.11.7
This fixes a race in both msgrcv() and msgsnd() between finding the msg queue and
actually dealing with it, as another thread can delete the msqid
underneath us if we are preempted before acquiring the kern_ipc_perm.lock.
Manfred illustrates this nicely:
Assume a preemptible kernel that is preempted just after
> msq = msq_obtain_object_check(ns, msqid)
in do_msgrcv().
The only lock that is held is rcu_read_lock().
Now the other thread processes IPC_RMID.
When the first task is resumed, then it will happily wait for messages on a
deleted queue.
Fix this by checking whether the queue has been deleted after taking the lock.
Reported-by: Manfred Spraul <[email protected]>
Cc: [email protected] # for 3.11
Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/ipc/msg.c b/ipc/msg.c
index 06e8aae..4e7d95a 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -690,6 +690,12 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
if (ipcperms(ns, &msq->q_perm, S_IWUGO))
goto out_unlock0;
+ /* raced with RMID? */
+ if (msq->q_perm.deleted) {
+ err = -EIDRM;
+ goto out_unlock0;
+ }
+
err = security_msg_queue_msgsnd(msq, msg, msgflg);
if (err)
goto out_unlock0;
@@ -896,6 +902,13 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgfl
goto out_unlock1;
ipc_lock_object(&msq->q_perm);
+
+ /* raced with RMID? */
+ if (msq->q_perm.deleted) {
+ msg = ERR_PTR(-EIDRM);
+ goto out_unlock0;
+ }
+
msg = find_msg(msq, &msgtyp, mode);
if (!IS_ERR(msg)) {
/*
--
1.7.11.7
Currently, we check shm security only under RCU. Since selinux
can free the security structure through selinux_shm_free_security(),
we can run into a use-after-free condition. This bug affects both
shmctl and shmat syscalls.
The fix is obvious: make sure we hold the kern_ipc_perm.lock while
performing these security checks.
Reported-by: Manfred Spraul <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/shm.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/ipc/shm.c b/ipc/shm.c
index 2821cdf..bc3e897 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -781,18 +781,17 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+ ipc_lock_object(&shp->shm_perm);
err = security_shm_shmctl(shp, cmd);
if (err)
- goto out_unlock1;
+ goto out_unlock0;
switch (cmd) {
case IPC_RMID:
- ipc_lock_object(&shp->shm_perm);
/* do_shm_rmid unlocks the ipc object and rcu */
do_shm_rmid(ns, ipcp);
goto out_up;
case IPC_SET:
- ipc_lock_object(&shp->shm_perm);
err = ipc_update_perm(&shmid64.shm_perm, ipcp);
if (err)
goto out_unlock0;
@@ -800,7 +799,6 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
break;
default:
err = -EINVAL;
- goto out_unlock1;
}
out_unlock0:
@@ -895,9 +893,12 @@ static int shmctl_nolock(struct ipc_namespace *ns, int shmid,
if (ipcperms(ns, &shp->shm_perm, S_IRUGO))
goto out_unlock;
+ ipc_lock_object(&shp->shm_perm);
err = security_shm_shmctl(shp, cmd);
- if (err)
+ if (err) {
+ ipc_unlock_object(&shp->shm_perm);
goto out_unlock;
+ }
memset(&tbuf, 0, sizeof(tbuf));
kernel_to_ipc64_perm(&shp->shm_perm, &tbuf.shm_perm);
@@ -909,6 +910,7 @@ static int shmctl_nolock(struct ipc_namespace *ns, int shmid,
tbuf.shm_lpid = shp->shm_lprid;
tbuf.shm_nattch = shp->shm_nattch;
rcu_read_unlock();
+ ipc_unlock_object(&shp->shm_perm);
if (copy_shmid_to_user(buf, &tbuf, version))
err = -EFAULT;
@@ -960,11 +962,12 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
}
audit_ipc_obj(&(shp->shm_perm));
+
+ ipc_lock_object(&shp->shm_perm);
err = security_shm_shmctl(shp, cmd);
if (err)
- goto out_unlock1;
+ goto out_unlock0;
- ipc_lock_object(&shp->shm_perm);
if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
kuid_t euid = current_euid();
err = -EPERM;
@@ -1089,11 +1092,13 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
if (ipcperms(ns, &shp->shm_perm, acc_mode))
goto out_unlock;
+ ipc_lock_object(&shp->shm_perm);
err = security_shm_shmat(shp, shmaddr, shmflg);
- if (err)
+ if (err) {
+ ipc_unlock_object(&shp->shm_perm);
goto out_unlock;
+ }
- ipc_lock_object(&shp->shm_perm);
path = shp->shm_file->f_path;
path_get(&path);
shp->shm_nattch++;
--
1.7.11.7
This fixes a race in shmat() between finding the segment and
actually attaching it, as another thread can delete the shmid
underneath us if we are preempted before acquiring the kern_ipc_perm.lock.
Reported-by: Manfred Spraul <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/shm.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/ipc/shm.c b/ipc/shm.c
index bc3e897..1afde7e 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1093,6 +1093,14 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
goto out_unlock;
ipc_lock_object(&shp->shm_perm);
+
+ /* have we raced with RMID? */
+ if (shp->shm_perm.deleted) {
+ err = -EIDRM;
+ ipc_unlock_object(&shp->shm_perm);
+ goto out_unlock;
+ }
+
err = security_shm_shmat(shp, shmaddr, shmflg);
if (err) {
ipc_unlock_object(&shp->shm_perm);
--
1.7.11.7
On 09/16/2013 05:04 AM, Davidlohr Bueso wrote:
> Currently, we check shm security only under RCU. Since selinux
> can free the security structure through selinux_shm_free_security(),
> we can run into a use-after-free condition. This bug affects both
> shmctl and shmat syscalls.
>
> The fix is obvious: make sure we hold the kern_ipc_perm.lock while
> performing these security checks.
Actually: either the kern_ipc_perm lock or down_xx(&shm_ids(ns).rwsem) is sufficient.
>
> Reported-by: Manfred Spraul <[email protected]>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> ipc/shm.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 2821cdf..bc3e897 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -781,18 +781,17 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
>
> shp = container_of(ipcp, struct shmid_kernel, shm_perm);
>
> + ipc_lock_object(&shp->shm_perm);
> err = security_shm_shmctl(shp, cmd);
> if (err)
> - goto out_unlock1;
> + goto out_unlock0;
This change is not necessary: down_write(shm_ids(ns).rwsem) already
synchronizes against another IPC_RMID.
But it doesn't hurt.
> @@ -960,11 +962,12 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
> }
>
> audit_ipc_obj(&(shp->shm_perm));
> +
> + ipc_lock_object(&shp->shm_perm);
> err = security_shm_shmctl(shp, cmd);
What about audit_ipc_obj()?
    audit_ipc_obj() calls __audit_ipc_obj(),
    which calls security_ipc_getsecid(),
    which calls security_ops->ipc_getsecid, which can be selinux_ipc_getsecid().
    selinux_ipc_getsecid() dereferences ipcp->security.
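Roughly, from security/selinux/hooks.c (paraphrased; exact code may
differ by kernel version):

static void selinux_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid)
{
	struct ipc_security_struct *isec = ipcp->security;

	*secid = isec->sid;	/* use-after-free if isec was already freed */
}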
Please: Restart from 3.0.9 (i.e. before the scalability improvement
project was started)
Every function that is moved from "synchronization with ipc_lock()" to
"only rcu_read_lock() held" must be checked.
--
Manfred
On Sun, 2013-09-15 at 20:04 -0700, Davidlohr Bueso wrote:
> This patchset deals with the selinux and rmid races Manfred found on
> the ipc scaling work that has been going on. It specifically addresses
> shared mem and msg queues. While semaphores still need to be updated, I want
> to make sure these are correct first. Also, Manfred had already sent out
> a patchset that deals with a race in sem complex operations. So any changes
> should be on top of his.
>
> Patches 1 and 2 deal with shared memory.
> Patches 3 and 4 deal with msg queues.
> Specific details about each race and its fix are in the corresponding
> patches.
>
> Note that Linus suggested a good alternative to patches 1 and 3: use
> kfree_rcu() and delay the freeing of the security structure. I would
> much prefer that approach to doing security checks with the lock held,
> but I want to have the patches out there and ready in case we go with the
> latter solution.
Cc'ing selinux/security people - folks, could you please confirm that
this change is ok and won't break anything related to security?
Thanks!
8<-------------------------------------------
From: Davidlohr Bueso <[email protected]>
Date: Thu, 19 Sep 2013 09:32:05 -0700
Subject: [PATCH] selinux: rely on rcu to free ipc isec
Currently, IPC mechanisms do security and auditing related checks under
RCU. However, since selinux can free the security structure, through
selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure
is freed before other tasks are done with it, creating a use-after-free
condition. Manfred illustrates this nicely, for example with shared mem:
--> do_shmat calls rcu_read_lock()
--> do_shmat calls shm_object_check().
Checks that the object is still valid - but doesn't acquire any locks.
Then it returns.
--> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
--> selinux_shm_shmat calls ipc_has_perm()
--> ipc_has_perm accesses ipc_perms->security
shm_close()
--> shm_close acquires rw_mutex & shm_lock
--> shm_close calls shm_destroy
--> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
--> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
--> ipc_free_security calls kfree(ipc_perms->security)
There are two solutions to this: (i) perform the security checking and whatever
IPC operation(s) atomically (that is, with the kern_ipc_perm.lock held), or
(ii) delay the freeing of the isec structure after all RCU readers are done.
This patch takes the second approach, which, at least from a performance perspective,
is more practical than the first as it keeps the IPC critical regions smaller.
I have tested this patch with IPC testcases from LTP on both my quad-core
laptop and on a 64 core NUMA server. In both cases selinux is enabled, and
tests pass for both voluntary and forced preemption models.
Signed-off-by: Davidlohr Bueso <[email protected]>
---
security/selinux/hooks.c | 9 ++++++++-
security/selinux/include/objsec.h | 5 +++--
2 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a5091ec..179ffe9 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4892,8 +4892,15 @@ static int ipc_alloc_security(struct task_struct *task,
static void ipc_free_security(struct kern_ipc_perm *perm)
{
struct ipc_security_struct *isec = perm->security;
+
+ /*
+ * All IPC mechanisms do security and audit
+ * checking under RCU: wait a grace period before
+ * freeing isec, otherwise we can run into a
+ * use-after-free scenario.
+ */
+ kfree_rcu(isec, rcu);
perm->security = NULL;
- kfree(isec);
}
static int msg_msg_alloc_security(struct msg_msg *msg)
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index aa47bca..38d17b7 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -70,8 +70,9 @@ struct msg_security_struct {
};
struct ipc_security_struct {
- u16 sclass; /* security class of this object */
- u32 sid; /* SID of IPC resource */
+ u16 sclass; /* security class of this object */
+ u32 sid; /* SID of IPC resource */
+ struct rcu_head rcu; /* rcu struct for selinux based IPC security */
};
struct netif_security_struct {
--
1.7.11.7
On Thu, 2013-09-19 at 14:22 -0700, Davidlohr Bueso wrote:
> On Sun, 2013-09-15 at 20:04 -0700, Davidlohr Bueso wrote:
> > This patchset deals with the selinux and rmid races Manfred found on
> > the ipc scaling work that has been going on. It specifically addresses
> > shared mem and msg queues. While semaphores still need to be updated, I want
> > to make sure these are correct first. Also, Manfred had already sent out
> > a patchset that deals with a race in sem complex operations. So any changes
> > should be on top of his.
> >
> > Patches 1 and 2 deal with shared memory.
> > Patches 3 and 4 deal with msg queues.
> > Specific details about each race and its fix are in the corresponding
> > patches.
> >
> > Note that Linus suggested a good alternative to patches 1 and 3: use
> > kfree_rcu() and delay the freeing of the security structure. I would
> > much prefer that approach to doing security checks with the lock held,
> > but I want to have the patches out there and ready in case we go with the
> > latter solution.
>
> Cc'ing selinux/security people - folks, could you please confirm that
> this change is ok and won't break anything related to security?
I agree with the approach to delay the freeing and it does not appear to
be a problem from an SELinux PoV, but I don't necessarily agree with the
location of the delay. Why should every LSM be required to know the
details of that object lifetime? It seems to me like we might want to
move this delay up to shm_destroy. I know that's going to be more code
than using kfree_rcu(), but it seems like a much better abstraction to
hide that knowledge away from the LSM.
Possibly place the rcu_head in struct kern_ipc_perm? Then use
call_rcu() to call the security destroy hook? You'll have to rewrite
the LSM hook to take an rcu_head, etc., but that shouldn't be a big
problem.
Doing it this way also means you won't break the security model of
SMACK. It looks like you'd still have the exact same race with SMACK,
except there it can't be fixed with kfree_rcu(): SMACK just sets a
pointer to NULL, which could then be dereferenced later - it doesn't
actually do any allocation/freeing.
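Roughly, from security/smack/smack_lsm.c (paraphrased; details may
differ by kernel version):

static int smack_shm_alloc_security(struct shmid_kernel *shp)
{
	/* store the current task's label - nothing is allocated */
	shp->shm_perm.security = smk_of_current();
	return 0;
}

static void smack_shm_free_security(struct shmid_kernel *shp)
{
	/* just clears the pointer; there is nothing for kfree_rcu() to free */
	shp->shm_perm.security = NULL;
}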
So ACK on the RCU delay, but NACK to making the hooks do the delay,
instead do it in the IPC code.
>
> Thanks!
>
>
> 8<-------------------------------------------
>
> From: Davidlohr Bueso <[email protected]>
> Date: Thu, 19 Sep 2013 09:32:05 -0700
> Subject: [PATCH] selinux: rely on rcu to free ipc isec
>
> Currently, IPC mechanisms do security and auditing related checks under
> RCU. However, since selinux can free the security structure, through
> selinux_[sem,msg_queue,shm]_free_security(), we can race if the structure
> is freed before other tasks are done with it, creating a use-after-free
> condition. Manfred illustrates this nicely, for example with shared mem:
>
> --> do_shmat calls rcu_read_lock()
> --> do_shmat calls shm_object_check().
> Checks that the object is still valid - but doesn't acquire any locks.
> Then it returns.
> --> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
> --> selinux_shm_shmat calls ipc_has_perm()
> --> ipc_has_perm accesses ipc_perms->security
>
> shm_close()
> --> shm_close acquires rw_mutex & shm_lock
> --> shm_close calls shm_destroy
> --> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
> --> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
> --> ipc_free_security calls kfree(ipc_perms->security)
>
> There are two solutions to this: (i) perform the security checking and whatever
> IPC operation(s) atomically (that is, with the kern_ipc_perm.lock held), or
> (ii) delay the freeing of the isec structure after all RCU readers are done.
> This patch takes the second approach, which, at least from a performance perspective,
> is more practical than the first as it keeps the IPC critical regions smaller.
>
> I have tested this patch with IPC testcases from LTP on both my quad-core
> laptop and on a 64 core NUMA server. In both cases selinux is enabled, and
> tests pass for both voluntary and forced preemption models.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> security/selinux/hooks.c | 9 ++++++++-
> security/selinux/include/objsec.h | 5 +++--
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index a5091ec..179ffe9 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -4892,8 +4892,15 @@ static int ipc_alloc_security(struct task_struct *task,
> static void ipc_free_security(struct kern_ipc_perm *perm)
> {
> struct ipc_security_struct *isec = perm->security;
> +
> + /*
> + * All IPC mechanisms do security and audit
> + * checking under RCU: wait a grace period before
> + * freeing isec, otherwise we can run into a
> + * use-after-free scenario.
> + */
> + kfree_rcu(isec, rcu);
> perm->security = NULL;
> - kfree(isec);
> }
>
> static int msg_msg_alloc_security(struct msg_msg *msg)
> diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
> index aa47bca..38d17b7 100644
> --- a/security/selinux/include/objsec.h
> +++ b/security/selinux/include/objsec.h
> @@ -70,8 +70,9 @@ struct msg_security_struct {
> };
>
> struct ipc_security_struct {
> - u16 sclass; /* security class of this object */
> - u32 sid; /* SID of IPC resource */
> + u16 sclass; /* security class of this object */
> + u32 sid; /* SID of IPC resource */
> + struct rcu_head rcu; /* rcu struct for selinux based IPC security */
> };
>
> struct netif_security_struct {
Hi Eric,
On Fri, 2013-09-20 at 14:08 -0400, Eric Paris wrote:
> > > Note that Linus suggested a good alternative to patches 1 and 3: use
> > > kfree_rcu() and delay the freeing of the security structure. I would
> > > much prefer that approach to doing security checks with the lock held,
> > > but I want to have the patches out there and ready in case we go with the
> > > latter solution.
> >
> > Cc'ing selinux/security people - folks, could you please confirm that
> > this change is ok and won't break anything related to security?
>
> I agree with the approach to delay the freeing and it does not appear to
> be a problem from an SELinux PoV, but I don't necessarily agree with the
> location of the delay. Why should every LSM be required to know the
> details of that object lifetime? It seems to me like we might want to
> move this delay up to shm_destroy. I know that's going to be more code
> than using kfree_rcu(), but it seems like a much better abstraction to
> hide that knowledge away from the LSM.
This patch should only affect selinux, not all LSMs - selinux is the
only user of struct ipc_security_struct (whose name IMO is too generic
and misleading). Furthermore, the hooks that are changed are all under
selinux.
>
> Possibly place the rcu_head in struct kern_ipc_perm? Then use
> call_rcu() to call the security destroy hook? You'll have to rewrite
> the LSM hook to take an rcu_head, etc., but that shouldn't be a big
> problem.
>
> Doing it this way also means you won't break the security model of
> SMACK. It looks like you'd still have the exact same race with SMACK,
> except there it can't be fixed with kfree_rcu(): SMACK just sets a
> pointer to NULL, which could then be dereferenced later - it doesn't
> actually do any allocation/freeing.
Yep, I recently noticed that, which makes this patch's approach invalid. I
was considering adding the rcu head to each ipc mechanism's kernel-private
data structure (shmid_kernel, sem_array, msg_queue) and then, like you
suggest, delaying the security freeing at the ipc level. But I think
you're right that it makes life easier to just do it at the struct
kern_ipc_perm level.
IPC uses security_xxx_free() at two levels: for freeing the structure
(ie: shm_destroy()) and cleaning up upon error when creating the
structure (ie: newseg()). For both I believe we can actually use RCU.
What do you think of the change below? It is specific to shm, and we'd
need an equivalent for msq and sems.
Thanks.
diff --git a/include/linux/ipc.h b/include/linux/ipc.h
index 8d861b2..8b98376 100644
--- a/include/linux/ipc.h
+++ b/include/linux/ipc.h
@@ -21,6 +21,7 @@ struct kern_ipc_perm
umode_t mode;
unsigned long seq;
void *security;
+ struct rcu_head rcu;
};
#endif /* _LINUX_IPC_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index 9d37e2b..b537feb 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1860,7 +1860,7 @@ int security_msg_queue_msgsnd(struct msg_queue *msq,
int security_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg,
struct task_struct *target, long type, int mode);
int security_shm_alloc(struct shmid_kernel *shp);
-void security_shm_free(struct shmid_kernel *shp);
+void security_shm_free(struct rcu_head *rcu);
int security_shm_associate(struct shmid_kernel *shp, int shmflg);
int security_shm_shmctl(struct shmid_kernel *shp, int cmd);
int security_shm_shmat(struct shmid_kernel *shp, char __user *shmaddr, int shmflg);
@@ -2504,7 +2504,7 @@ static inline int security_shm_alloc(struct shmid_kernel *shp)
return 0;
}
-static inline void security_shm_free(struct shmid_kernel *shp)
+static inline void security_shm_free(struct rcu_head *rcu)
{ }
static inline int security_shm_associate(struct shmid_kernel *shp,
diff --git a/ipc/shm.c b/ipc/shm.c
index 2821cdf..208e490 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -208,8 +208,8 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
user_shm_unlock(file_inode(shp->shm_file)->i_size,
shp->mlock_user);
fput (shp->shm_file);
- security_shm_free(shp);
ipc_rcu_putref(shp);
+ call_rcu(&(shp)->shm_perm.rcu, security_shm_free);
}
/*
@@ -566,8 +566,8 @@ no_id:
user_shm_unlock(size, shp->mlock_user);
fput(file);
no_file:
- security_shm_free(shp);
ipc_rcu_putref(shp);
+ call_rcu(&(shp)->shm_perm.rcu, security_shm_free);
return error;
}
diff --git a/security/security.c b/security/security.c
index 4dc31f4..1568b02 100644
--- a/security/security.c
+++ b/security/security.c
@@ -25,6 +25,7 @@
#include <linux/mount.h>
#include <linux/personality.h>
#include <linux/backing-dev.h>
+#include <linux/shm.h>
#include <net/flow.h>
#define MAX_LSM_EVM_XATTR 2
@@ -990,8 +991,11 @@ int security_shm_alloc(struct shmid_kernel *shp)
return security_ops->shm_alloc_security(shp);
}
-void security_shm_free(struct shmid_kernel *shp)
+void security_shm_free(struct rcu_head *rcu)
{
+ struct kern_ipc_perm *ipcp = container_of(rcu, struct kern_ipc_perm, rcu);
+ struct shmid_kernel *shp = container_of(ipcp, struct shmid_kernel, shm_perm);
+
security_ops->shm_free_security(shp);
}
On Sat, Sep 21, 2013 at 11:30 AM, Davidlohr Bueso <[email protected]> wrote:
>
> IPC uses security_xxx_free() at two levels: for freeing the structure
> (ie: shm_destroy()) and cleaning up upon error when creating the
> structure (ie: newseg()). For both I believe we can actually use RCU.
> What do you think of the below change, it is specific for shm, and we'd
> need an equivalent for msq and sems.
Ugh.
This code already has rcu-delaying, using the existing "rcu" list
entry. I hate how you add a *new* rcu list entry, and we basically
cause two callbacks.
More importantly, it's wrong. You do the call_rcu() unconditionally,
but it might not be the last use! You need to do it with the same
logic as ipc_rcu_putref(), namely at the dropping of the last reference.
So how about just making ipc_rcu_putref() have a RCU callback
argument, and make the code look something like
ipc_rcu_putref(shp, shm_rcu_free);
and then shm_rcu_free() just does
#define ipc_rcu_to_struct(p) ((void *)(p+1))
void shm_rcu_free(struct rcu_head *head)
{
struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
struct shmid_kernel *shp = ipc_rcu_to_struct(p);
security_shm_free(shp);
ipc_rcu_free(head);
}
(that "ipc_rcu_free()" would do the current vfree-vs-kfree, just not
rcu-delayed, so it would look something like
void ipc_rcu_free(struct rcu_head *head)
{
struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
if (is_vmalloc_addr(p))
vfree(p);
else
kfree(p);
}
Other users would then just use
ipc_rcu_putref(shp, ipc_rcu_free);
until they too decide that they want to do something extra at RCU freeing time..
Hmm?
Linus
On Sat, 2013-09-21 at 11:58 -0700, Linus Torvalds wrote:
> On Sat, Sep 21, 2013 at 11:30 AM, Davidlohr Bueso <[email protected]> wrote:
> >
> > IPC uses security_xxx_free() at two levels: for freeing the structure
> > (ie: shm_destroy()) and cleaning up upon error when creating the
> > structure (ie: newseg()). For both I believe we can actually use RCU.
> > What do you think of the below change, it is specific for shm, and we'd
> > need an equivalent for msq and sems.
>
> Ugh.
>
> This code already has rcu-delaying, using the existing "rcu" list
> entry. I hate how you add a *new* rcu list entry, and we basically
> cause two callbacks.
>
> More importantly, it's wrong. You do the call_rcu() unconditionally,
> but it might not be the last use! You need to do it with the same
> logic as ipc_rcu_putref(), namely at the dropping of the last reference.
This is the way IPC has handled things for a long time, no? Security
does not depend on the reference counter, as we unconditionally free
security structs.
>
> So how about just making ipc_rcu_putref() have a RCU callback
> argument, and make the code look something like
>
> ipc_rcu_putref(shp, shm_rcu_free);
What you're suggesting is (i) freeing security will now depend on the
refcount (wouldn't this cause cases where we actually never end up
freeing?) and (ii) in the scenarios we actually need to free the
security, delay it along with freeing the actual ipc_rcu stuff.
>
> and then shm_rcu_free() just does
>
> #define ipc_rcu_to_struct(p) ((void *)(p+1))
>
> void shm_rcu_free(struct rcu_head *head)
> {
> struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
> struct shmid_kernel *shp = ipc_rcu_to_struct(p);
>
> security_shm_free(shp);
> ipc_rcu_free(head);
> }
If I understand correctly, then we'd have:
void ipc_rcu_putref(void *ptr, void (*func)(struct rcu_head *head))
{
struct ipc_rcu *p = ((struct ipc_rcu *)ptr) - 1;
if (!atomic_dec_and_test(&p->refcount))
return;
call_rcu(&p->rcu, func);
}
>
> (that "ipc_rcu_free()" would do the current vfree-vs-kfree, just not
> rcu-delayed, so it would look something like
The vfree/kfree would still be rcu-delayed via whatever callback was
passed to ipc_rcu_putref(); but yes, the function itself wouldn't
explicitly be calling call_rcu()/kfree_rcu().
>
> void ipc_rcu_free(struct rcu_head *head)
> {
> struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
> if (is_vmalloc_addr(p))
> vfree(p);
> else
> kfree(p);
> }
>
> Other users would then just use
>
> ipc_rcu_putref(shp, ipc_rcu_free);
> until they too decide that they want to do something extra at RCU freeing time..
I like this flexibility.
Thanks,
Davidlohr
On Sun, Sep 22, 2013 at 11:42 PM, Davidlohr Bueso <[email protected]> wrote:
>>
>> More importantly, it's wrong. You do the call_rcu() unconditionally,
>> but it might not be the last use! You need to do it with the same
>> logic as ipc_rcu_putref(), namely at the dropping of the last reference.
>
> This is the way IPC has handled things for a long time, no? Security
> does not depend on the reference counter, as we unconditionally free
> security structs.
Yes, but that was ok back when the logic was idempotent and you could
call it multiple times. Modulo races (I didn't check if we held a
lock).
You can't do "call_rcu()" more than once, because you'll corrupt the
rcu list if you do another call_rcu() while the first one is still
active (and that's a pretty big race window to hit).
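Illustrative sketch only (p is an ipc_rcu object as above, free_cb a
hypothetical callback name):

	call_rcu(&p->rcu, free_cb);	/* first queueing: fine */
	/* ... before free_cb has run: */
	call_rcu(&p->rcu, free_cb);	/* same rcu_head queued twice while
					 * still pending: corrupts RCU's
					 * callback list */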
That said, the old behavior was suspect for another reason too: having
the security blob go away from under a user sounds like it could cause
various other problems anyway, so I think the old code was at least
_prone_ to bugs even if it didn't have catastrophic behavior.
(In reality, I suspect the reference count is never elevated in
practice, so there is only one case that calls the security freeing
thing, so this may all be pretty much theoretical, but at least from a
logic standpoint the code clearly makes a big deal about the whole
refcount and "last user turns off the lights").
> What you're suggesting is (i) freeing security will now depend on the
> refcount (wouldn't this cause cases where we actually never end up
> freeing?)
The security layer better not have any refcounted backpointers to the
shm, so I don't see why that would be a new issue.
> and (ii) in the scenarios we actually need to free the
> security, delay it along with freeing the actual ipc_rcu stuff.
Well, that's the whole point. The security blob should have the same
lifetime as the ipc blob it is associated with.
Getting rid of the security blob before the thing it is supposed to
protect sounds like a bug to me. In fact, it's the bug that this
whole thread has been about. No?
> If I understand correctly, then we'd have:
>
> void ipc_rcu_putref(void *ptr, void (*func)(struct rcu_head *head))
> {
> struct ipc_rcu *p = ((struct ipc_rcu *)ptr) - 1;
>
> if (!atomic_dec_and_test(&p->refcount))
> return;
>
> call_rcu(&p->rcu, func);
> }
Exactly.
Linus
On Mon, 2013-09-23 at 09:54 -0700, Linus Torvalds wrote:
> On Sun, Sep 22, 2013 at 11:42 PM, Davidlohr Bueso <[email protected]> wrote:
> >>
> >> More importantly, it's wrong. You do the call_rcu() unconditionally,
> >> but it might not be the last use! You need to do it with the same
> >> logic as ipc_rcu_putref(), namely at the dropping of the last reference.
> >
> > This is the way IPC has handled things for a long time, no? Security
> > does not depend on the reference counter, as we unconditionally free
> > security structs.
>
> Yes, but that was ok back when the logic was idempotent and you could
> call it multiple times. Modulo races (I didn't check if we held a
> lock).
>
> You can't do "call_rcu()" more than once, because you'll corrupt the
> rcu list if you do another call_rcu() while the first one is still
> active (and that's a pretty big race window to hit).
Ah, ok understood.
>
> That said, the old behavior was suspect for another reason too: having
> the security blob go away from under a user sounds like it could cause
> various other problems anyway, so I think the old code was at least
> _prone_ to bugs even if it didn't have catastrophic behavior.
Agreed. This makes a lot of sense; I just wanted to verify that we're
all on the same page with this, as it does change current and
well-tested logic. Eric, Manfred, please shout if you see a problem,
otherwise I'd appreciate your ack/reviewed-by.
>
> (In reality, I suspect the reference count is never elevated in
> practice, so there is only one case that calls the security freeing
> thing, so this may all be pretty much theoretical, but at least from a
> logic standpoint the code clearly makes a big deal about the whole
> refcount and "last user turns off the lights").
Right, this would/should have come up years ago if it were actually
occurring in practice.
>
> > What you're suggesting is (i) freeing security will now depend on the
> > refcount (wouldn't this cause cases where we actually never end up
> > freeing?)
>
> The security layer better not have any refcounted backpointers to the
> shm, so I don't see why that would be a new issue.
>
> > and (ii) in the scenarios we actually need to free the
> > security, delay it along with freeing the actual ipc_rcu stuff.
>
> Well, that's the whole point. The security blob should have the same
> lifetime as the ipc blob it is associated with.
>
> Getting rid of the security blob before the thing it is supposed to
> protect sounds like a bug to me. In fact, it's the bug that this
> whole thread has been about. No?
>
> > If I understand correctly, then we'd have:
> >
> > void ipc_rcu_putref(void *ptr, void (*func)(struct rcu_head *head))
> > {
> > struct ipc_rcu *p = ((struct ipc_rcu *)ptr) - 1;
> >
> > if (!atomic_dec_and_test(&p->refcount))
> > return;
> >
> > call_rcu(&p->rcu, func);
> > }
>
> Exactly.
Ok, so here's the code - again, I've tested it with LTP on the resources
I have. While I have yet to hit the races we are trying to address, I
just wanted to make sure it doesn't affect anything else. The changes
are pretty monotonous and affect all forms of IPC - it doesn't make
sense to split them up since we're changing the global putref logic. All
in all, it changes the following pattern (e.g. for sems):
- ipc_rcu_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
and
- security_sem_free(sma);
- ipc_rcu_putref(sma);
+ ipc_rcu_putref(sma, sem_rcu_free);
Note that for ipc/sem.c, it also gets rid of the useless sem_putref(),
which is a straight call to ipc_rcu_putref().
If taken, this should fix shm + msq + sem and I'll backport it to 3.11
for msq + sem and 3.10 for sems.
Thanks!
8<-----------------------------------------
From: Davidlohr Bueso <[email protected]>
Subject: [PATCH] ipc: fix race with LSMs
Currently, IPC mechanisms do security and auditing related checks under
RCU. However, since security modules can free the security structure, for
example, through selinux_[sem,msg_queue,shm]_free_security(), we can race
if the structure is freed before other tasks are done with it, creating a
use-after-free condition. Manfred illustrates this nicely, for instance with
shared mem and selinux:
--> do_shmat calls rcu_read_lock()
--> do_shmat calls shm_object_check().
Checks that the object is still valid - but doesn't acquire any locks.
Then it returns.
--> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
--> selinux_shm_shmat calls ipc_has_perm()
--> ipc_has_perm accesses ipc_perms->security
shm_close()
--> shm_close acquires rw_mutex & shm_lock
--> shm_close calls shm_destroy
--> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
--> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
--> ipc_free_security calls kfree(ipc_perms->security)
This patch delays the freeing of the security structures until all RCU readers
are done. Furthermore, it aligns the security life cycle with that of the rest of
IPC - freeing them based on the reference counter. For situations where we need
not free security, the current behavior is kept. Linus states:
"... the old behavior was suspect for another reason too: having
the security blob go away from under a user sounds like it could cause
various other problems anyway, so I think the old code was at least
_prone_ to bugs even if it didn't have catastrophic behavior."
I have tested this patch with IPC testcases from LTP on both my quad-core
laptop and on a 64 core NUMA server. In both cases selinux is enabled, and
tests pass for both voluntary and forced preemption models. While the mentioned
races are theoretical (at least no one has reported them), I wanted to make
sure that this new logic doesn't break anything we weren't aware of.
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
---
ipc/msg.c | 19 +++++++++++++------
ipc/sem.c | 34 ++++++++++++++++++----------------
ipc/shm.c | 17 ++++++++++++-----
ipc/util.c | 32 ++++++++++++--------------------
ipc/util.h | 10 +++++++++-
5 files changed, 64 insertions(+), 48 deletions(-)
diff --git a/ipc/msg.c b/ipc/msg.c
index b0d541d4..9e4310c 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -165,6 +165,15 @@ static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s)
ipc_rmid(&msg_ids(ns), &s->q_perm);
}
+static void msg_rcu_free(struct rcu_head *head)
+{
+ struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
+ struct msg_queue *msq = ipc_rcu_to_struct(p);
+
+ security_msg_queue_free(msq);
+ ipc_rcu_free(head);
+}
+
/**
* newque - Create a new msg queue
* @ns: namespace
@@ -189,15 +198,14 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
msq->q_perm.security = NULL;
retval = security_msg_queue_alloc(msq);
if (retval) {
- ipc_rcu_putref(msq);
+ ipc_rcu_putref(msq, ipc_rcu_free);
return retval;
}
/* ipc_addid() locks msq upon success. */
id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
if (id < 0) {
- security_msg_queue_free(msq);
- ipc_rcu_putref(msq);
+ ipc_rcu_putref(msq, msg_rcu_free);
return id;
}
@@ -276,8 +284,7 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
free_msg(msg);
}
atomic_sub(msq->q_cbytes, &ns->msg_bytes);
- security_msg_queue_free(msq);
- ipc_rcu_putref(msq);
+ ipc_rcu_putref(msq, msg_rcu_free);
}
/*
@@ -717,7 +724,7 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
rcu_read_lock();
ipc_lock_object(&msq->q_perm);
- ipc_rcu_putref(msq);
+ ipc_rcu_putref(msq, ipc_rcu_free);
if (msq->q_perm.deleted) {
err = -EIDRM;
goto out_unlock0;
diff --git a/ipc/sem.c b/ipc/sem.c
index 69b6a21..19c8b98 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -243,6 +243,15 @@ static void merge_queues(struct sem_array *sma)
}
}
+static void sem_rcu_free(struct rcu_head *head)
+{
+ struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
+ struct sem_array *sma = ipc_rcu_to_struct(p);
+
+ security_sem_free(sma);
+ ipc_rcu_free(head);
+}
+
/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
@@ -374,12 +383,7 @@ static inline struct sem_array *sem_obtain_object_check(struct ipc_namespace *ns
static inline void sem_lock_and_putref(struct sem_array *sma)
{
sem_lock(sma, NULL, -1);
- ipc_rcu_putref(sma);
-}
-
-static inline void sem_putref(struct sem_array *sma)
-{
- ipc_rcu_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
}
static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
@@ -458,14 +462,13 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
sma->sem_perm.security = NULL;
retval = security_sem_alloc(sma);
if (retval) {
- ipc_rcu_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
return retval;
}
id = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni);
if (id < 0) {
- security_sem_free(sma);
- ipc_rcu_putref(sma);
+ ipc_rcu_putref(sma, sem_rcu_free);
return id;
}
ns->used_sems += nsems;
@@ -1047,8 +1050,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
wake_up_sem_queue_do(&tasks);
ns->used_sems -= sma->sem_nsems;
- security_sem_free(sma);
- ipc_rcu_putref(sma);
+ ipc_rcu_putref(sma, sem_rcu_free);
}
static unsigned long copy_semid_to_user(void __user *buf, struct semid64_ds *in, int version)
@@ -1292,7 +1294,7 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
rcu_read_unlock();
sem_io = ipc_alloc(sizeof(ushort)*nsems);
if(sem_io == NULL) {
- sem_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
return -ENOMEM;
}
@@ -1328,20 +1330,20 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
if(nsems > SEMMSL_FAST) {
sem_io = ipc_alloc(sizeof(ushort)*nsems);
if(sem_io == NULL) {
- sem_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
return -ENOMEM;
}
}
if (copy_from_user (sem_io, p, nsems*sizeof(ushort))) {
- sem_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
err = -EFAULT;
goto out_free;
}
for (i = 0; i < nsems; i++) {
if (sem_io[i] > SEMVMX) {
- sem_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
err = -ERANGE;
goto out_free;
}
@@ -1629,7 +1631,7 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
/* step 2: allocate new undo structure */
new = kzalloc(sizeof(struct sem_undo) + sizeof(short)*nsems, GFP_KERNEL);
if (!new) {
- sem_putref(sma);
+ ipc_rcu_putref(sma, ipc_rcu_free);
return ERR_PTR(-ENOMEM);
}
diff --git a/ipc/shm.c b/ipc/shm.c
index 2821cdf..d697396 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -167,6 +167,15 @@ static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
ipc_lock_object(&ipcp->shm_perm);
}
+static void shm_rcu_free(struct rcu_head *head)
+{
+ struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
+ struct shmid_kernel *shp = ipc_rcu_to_struct(p);
+
+ security_shm_free(shp);
+ ipc_rcu_free(head);
+}
+
static inline void shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *s)
{
ipc_rmid(&shm_ids(ns), &s->shm_perm);
@@ -208,8 +217,7 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
user_shm_unlock(file_inode(shp->shm_file)->i_size,
shp->mlock_user);
fput (shp->shm_file);
- security_shm_free(shp);
- ipc_rcu_putref(shp);
+ ipc_rcu_putref(shp, shm_rcu_free);
}
/*
@@ -497,7 +505,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
shp->shm_perm.security = NULL;
error = security_shm_alloc(shp);
if (error) {
- ipc_rcu_putref(shp);
+ ipc_rcu_putref(shp, ipc_rcu_free);
return error;
}
@@ -566,8 +574,7 @@ no_id:
user_shm_unlock(size, shp->mlock_user);
fput(file);
no_file:
- security_shm_free(shp);
- ipc_rcu_putref(shp);
+ ipc_rcu_putref(shp, shm_rcu_free);
return error;
}
diff --git a/ipc/util.c b/ipc/util.c
index e829da9..fdb8ae7 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -474,11 +474,6 @@ void ipc_free(void* ptr, int size)
kfree(ptr);
}
-struct ipc_rcu {
- struct rcu_head rcu;
- atomic_t refcount;
-} ____cacheline_aligned_in_smp;
-
/**
* ipc_rcu_alloc - allocate ipc and rcu space
* @size: size desired
@@ -505,27 +500,24 @@ int ipc_rcu_getref(void *ptr)
return atomic_inc_not_zero(&p->refcount);
}
-/**
- * ipc_schedule_free - free ipc + rcu space
- * @head: RCU callback structure for queued work
- */
-static void ipc_schedule_free(struct rcu_head *head)
-{
- vfree(container_of(head, struct ipc_rcu, rcu));
-}
-
-void ipc_rcu_putref(void *ptr)
+void ipc_rcu_putref(void *ptr, void (*func)(struct rcu_head *head))
{
struct ipc_rcu *p = ((struct ipc_rcu *)ptr) - 1;
if (!atomic_dec_and_test(&p->refcount))
return;
- if (is_vmalloc_addr(ptr)) {
- call_rcu(&p->rcu, ipc_schedule_free);
- } else {
- kfree_rcu(p, rcu);
- }
+ call_rcu(&p->rcu, func);
+}
+
+void ipc_rcu_free(struct rcu_head *head)
+{
+ struct ipc_rcu *p = container_of(head, struct ipc_rcu, rcu);
+
+ if (is_vmalloc_addr(p))
+ vfree(p);
+ else
+ kfree(p);
}
/**
diff --git a/ipc/util.h b/ipc/util.h
index c5f3338b..f2f5036 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -47,6 +47,13 @@ static inline void msg_exit_ns(struct ipc_namespace *ns) { }
static inline void shm_exit_ns(struct ipc_namespace *ns) { }
#endif
+struct ipc_rcu {
+ struct rcu_head rcu;
+ atomic_t refcount;
+} ____cacheline_aligned_in_smp;
+
+#define ipc_rcu_to_struct(p) ((void *)(p+1))
+
/*
* Structure that holds the parameters needed by the ipc operations
* (see after)
@@ -120,7 +127,8 @@ void ipc_free(void* ptr, int size);
*/
void* ipc_rcu_alloc(int size);
int ipc_rcu_getref(void *ptr);
-void ipc_rcu_putref(void *ptr);
+void ipc_rcu_putref(void *ptr, void (*func)(struct rcu_head *head));
+void ipc_rcu_free(struct rcu_head *head);
struct kern_ipc_perm *ipc_lock(struct ipc_ids *, int);
struct kern_ipc_perm *ipc_obtain_object(struct ipc_ids *ids, int id);
--
1.7.11.7
On Mon, Sep 23, 2013 at 5:04 PM, Davidlohr Bueso <[email protected]> wrote:
>
> Ok, so here's the code - again I've tested it with LTP on the resources
> I have.
This looks good to me.
Manfred, mind giving this a look-over and see if this resolves your
race concerns too?
Linus
Hi Linus,
On 09/24/2013 03:22 AM, Linus Torvalds wrote:
> On Mon, Sep 23, 2013 at 5:04 PM, Davidlohr Bueso <[email protected]> wrote:
>> Ok, so here's the code - again I've tested it with LTP on the resources
>> I have.
> This looks good to me.
>
> Manfred, mind giving this a look-over and see if this resolves your
> race concerns too?
All race concerns with regard to code outside ipc are resolved.
My current list of open issues:
https://bugzilla.kernel.org/show_bug.cgi?id=61351
Fix is in mm tree (ipc-semc-fix-race-in-sem_lock.patch)
https://bugzilla.kernel.org/show_bug.cgi?id=61321
https://bugzilla.kernel.org/show_bug.cgi?id=61331
https://bugzilla.kernel.org/show_bug.cgi?id=61341
All 3 are fixed by Davidlohr's patch
https://bugzilla.kernel.org/show_bug.cgi?id=61361
https://bugzilla.kernel.org/show_bug.cgi?id=61371
Both still open. The fix is trivial:
Sprinkle a fair amount of "if (perm.deleted) return -EIDRM;" after
ipc_lock.
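A hedged sketch of that pattern for sems (names as in the 3.11 ipc/sem.c
code; the exact call sites are what the two bugs above are about):

	sma = sem_obtain_object_check(ns, semid);
	/* ... permission checks under rcu_read_lock() ... */
	sem_lock(sma, NULL, -1);
	if (sma->sem_perm.deleted) {
		sem_unlock(sma, -1);
		rcu_read_unlock();
		return -EIDRM;
	}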
And now new:
1) ipc/namespace.c:
free_ipcs() still assumes the "old style" free callbacks, i.e. that
rcu_read_lock and the ipc lock are dropped within the callback.
freeary() was converted - but free_ipcs() was not updated.
Thus: closing a namespace with sem arrays, while threads are waiting on
an array with semtimedop(), can with bad timing deadlock the semtimedop
thread (i.e. spin_lock() waiting forever).
2) ipc/sem.c:
The proc interface calls ipc_lock() directly - thus the exclusion
of simple semops is missing in sysvipc_sem_proc_show().
A "sem_wait_array()" might be added as the first line of
sysvipc_sem_proc_show().
It's more a correctness thing: Nothing breaks if get_semotime() is
called in parallel with simple ops.
3) The missing update of sem_otime for simple ops that Jia He found
http://marc.info/?l=linux-kernel&m=137981594522009&w=2
--
Manfred
Hi all,
On 09/24/2013 02:04 AM, Davidlohr Bueso wrote:
>> (In reality, I suspect the reference count is never elevated in
>> practice, so there is only one case that calls the security freeing
>> thing, so this may all be pretty much theoretical, but at least from a
>> logic standpoint the code clearly makes a big deal about the whole
>> refcount and "last user turns off the lights").
> Right, this would/should have come up years ago if it were actually
> occurring in practice.
As far as I see, the only requirement is "last user does kfree()":
spin_lock must be possible and perm.deleted must be valid.
e.g. from msg.c:
> rcu_read_lock();
> ipc_lock_object(&msq->q_perm);
>
> ipc_rcu_putref(msq);
> if (msq->q_perm.deleted) {
> err = -EIDRM;
> goto out_unlock0;
> }
> 8<-----------------------------------------
> From: Davidlohr Bueso <[email protected]>
> Subject: [PATCH] ipc: fix race with LSMs
>
> Currently, IPC mechanisms do security and auditing related checks under
> RCU. However, since security modules can free the security structure, for
> example, through selinux_[sem,msg_queue,shm]_free_security(), we can race
> if the structure is freed before other tasks are done with it, creating a
> use-after-free condition. Manfred illustrates this nicely, for instance with
> shared mem and selinux:
>
> --> do_shmat calls rcu_read_lock()
> --> do_shmat calls shm_object_check().
> Checks that the object is still valid - but doesn't acquire any locks.
> Then it returns.
> --> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
> --> selinux_shm_shmat calls ipc_has_perm()
> --> ipc_has_perm accesses ipc_perms->security
>
> shm_close()
> --> shm_close acquires rw_mutex & shm_lock
> --> shm_close calls shm_destroy
> --> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
> --> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
> --> ipc_free_security calls kfree(ipc_perms->security)
>
> This patch delays the freeing of the security structures until all RCU readers
> are done. Furthermore, it aligns the security life cycle with that of the rest of
> IPC - freeing them based on the reference counter. For situations where we need
> not free security, the current behavior is kept. Linus states:
>
> "... the old behavior was suspect for another reason too: having
> the security blob go away from under a user sounds like it could cause
> various other problems anyway, so I think the old code was at least
> _prone_ to bugs even if it didn't have catastrophic behavior."
>
> I have tested this patch with IPC testcases from LTP on both my quad-core
> laptop and on a 64 core NUMA server. In both cases selinux is enabled, and
> tests pass for both voluntary and forced preemption models. While the mentioned
> races are theoretical (at least no one has reported them), I wanted to make
> sure that this new logic doesn't break anything we weren't aware of.
>
> Suggested-by: Linus Torvalds <[email protected]>
> Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Manfred Spraul <[email protected]>
Hi Davidlohr,
On 09/16/2013 05:04 AM, Davidlohr Bueso wrote:
> This fixes a race in shmat() between finding the segment and
> actually attaching it, as another thread can delete the shmid
> underneath us if we are preempted before acquiring the kern_ipc_perm.lock.
According to the man page, Linux supports attaching to deleted shm
segments:
http://linux.die.net/man/2/shmat
>
> On Linux, it is possible to attach a shared memory segment even if it
> is already marked to be deleted. However, POSIX.1-2001 does not
> specify this behavior and many other implementations do not support it.
>
Does your patch change that?
Unfortunately, I have neither experience with ipc/shm nor any test
cases.
And:
As far as I can see it's not a problem if we are attaching to a deleted
segment: shm_close cleans up everything.
--
Manfred
Hi Andrew,
Could you include the patch in -mm and push it towards Linus?
On 09/16/2013 05:04 AM, Davidlohr Bueso wrote:
> This fixes a race in both msgrcv() and msgsnd() between finding the msg queue and
> actually dealing with it, as another thread can delete the msqid
> underneath us if we are preempted before acquiring the kern_ipc_perm.lock.
>
> Manfred illustrates this nicely:
>
> Assume a preemptible kernel that is preempted just after
>> msq = msq_obtain_object_check(ns, msqid)
> in do_msgrcv().
> The only lock that is held is rcu_read_lock().
>
> Now the other thread processes IPC_RMID.
> When the first task is resumed, then it will happily wait for messages on a
> deleted queue.
>
> Fix this by checking whether the queue has been deleted after taking the lock.
>
> Reported-by: Manfred Spraul <[email protected]>
> Cc: [email protected] # for 3.11
> Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Manfred Spraul <[email protected]>
Hi Manfred,
On Fri, 2013-09-27 at 07:45 +0200, Manfred Spraul wrote:
> Hi Davidlohr,
>
> On 09/16/2013 05:04 AM, Davidlohr Bueso wrote:
> > This fixes a race in shmat() between finding the segment and
> > actually attaching it, as another thread can delete the shmid
> > underneath us if we are preempted before acquiring the kern_ipc_perm.lock.
> According the the man page, Linux supports attaching to deleted shm
> segments:
>
> http://linux.die.net/man/2/shmat
> >
> > On Linux, it is possible to attach a shared memory segment even if it
> > is already marked to be deleted. However, POSIX.1-2001 does not
> > specify this behavior and many other implementations do not support it.
> >
Good catch!
> Does your patch change that?
Yes, it should, and furthermore it affects the following property:
shm_nattch is decremented by one; if it becomes 0 and the segment is
marked for deletion, the segment is deleted.
> Unfortunately, I have neither any experience with ipc/shm nor any test
> cases.
>
> And:
> As far as I can see it's not a problem if we are attaching to a deleted
> segment: shm_close cleans up everything.
Agreed, please disregard this patch.
Thanks,
Davidlohr