2007-05-08 19:16:06

by Serge E. Hallyn

Subject: [PATCH 0/2] file capabilities: Introduction

Following are two patches which have been sitting for some time in -mm.
The first implements file capabilities, the second changes the format a
bit to accommodate potential future 64-bit capabilities.

We are hoping to get a few more eyes on the code before deciding whether
this is safe to finally push up. There are no real objections to the
code at the moment, but the lack of serious review, especially by
filesystems experts, is somewhat worrying. If you have some time,
please do take a look.

Appended to this email are two programs which can be used for testing.
One is the actual test program, and one is a victim who gets his file
capabilities set by, and executed by, the main test program.

Compile using
gcc -o testfscaps testfscaps.c -lcap
gcc -o print_caps print_caps.c -lcap

then run
./testfscaps 0
./testfscaps 1 eff
./testfscaps 1 perm
./testfscaps 1 inh
./testfscaps 2

Test 0 makes sure that non-root can't write file capability xattrs.
Test 1 checks various edge cases of xattr lengths and values.
Test 2 checks valid xattr values and makes sure the binary with
those values runs with the expected caps. Compare the value which
testfscaps says it set on print_caps with the values printed by
print_caps.
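
For reference, the 16-byte value that the tests write into the
security.capability xattr can be sketched as below. This is only an
illustration, not part of the test programs: the helper names are
hypothetical, and FSCAPS_VERSION is assumed to equal the kernel's
_LINUX_CAPABILITY_VERSION (0x19980330).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value of _LINUX_CAPABILITY_VERSION from <linux/capability.h>. */
#define FSCAPS_VERSION 0x19980330u

/* Store a 32-bit value in little-endian byte order on any host. */
static void put_le32(unsigned char *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/*
 * Fill a 16-byte buffer with the on-disk layout used by patch 1/2:
 * version, effective, permitted, inheritable, each as __le32.
 * Returns the number of bytes written.
 */
static size_t build_fscaps_xattr(unsigned char out[16], uint32_t eff,
				 uint32_t perm, uint32_t inh)
{
	put_le32(out, FSCAPS_VERSION);
	put_le32(out + 4, eff);
	put_le32(out + 8, perm);
	put_le32(out + 12, inh);
	return 16;
}
```

The resulting buffer is what a privileged caller would pass to
setxattr(path, "security.capability", buf, 16, 0).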

thanks,
-serge

=================================================================
begin print_caps.c
=================================================================
/*
* Copyright (C) IBM Corporation, 2007
* Author: Serge Hallyn <[email protected]>
*
* Prints out the capabilities with which it is running.
*/
#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <sys/capability.h>

int main(int argc, char *argv[])
{
cap_t cap = cap_get_proc();

if (!cap) {
perror("print_caps - cap_get_proc");
exit(1);
}

printf("%s: running with caps %s\n", argv[0], cap_to_text(cap, NULL));

cap_free(cap);

return 0;
}
=================================================================

=================================================================
begin testfscaps.c
=================================================================
/*
* Copyright (C) IBM Corporation, 2007
* Author: Serge Hallyn <[email protected]>
*
* Perform several tests of file capabilities:
* 1. try setting caps without CAP_SYS_ADMIN
* 2. try setting wrongly-sized sets of caps
* for eff, inh, perm, or all of the above
* Then run the executable
* 3. try setting valid caps, drop rights, and run the executable,
* make sure we get the rights
*/
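#define _GNU_SOURCE	/* for setresuid() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <endian.h>
#include <byteswap.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <attr/xattr.h>
#include <errno.h>
#include <sys/capability.h>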

void usage(char *me)
{
printf("Usage: %s <0|1|2> [arg]\n", me);
printf(" 0: set file caps without CAP_SYS_ADMIN\n");
printf(" 1: set bogus file caps\n");
printf(" arg=eff: for effective caps\n");
printf(" arg=inh: for inheritable caps\n");
printf(" arg=perm: for permitted caps\n");
printf(" 2: test that file caps are set correctly on exec\n");
exit(1);
}

int drop_root()
{
int ret;
ret = setresuid(1000, 1000, 1000);
if (ret) {
perror("setresuid");
exit(4);
}
return 1;
}

#if BYTE_ORDER == LITTLE_ENDIAN
#define le32_to_cpu(x) x
#define le16_to_cpu(x) x
#define cpu_to_le32(x) x
#define cpu_to_le16(x) x
#else
#define le32_to_cpu(x) bswap_32(x)
#define le16_to_cpu(x) bswap_16(x)
#define cpu_to_le32(x) bswap_32(x)
#define cpu_to_le16(x) bswap_16(x)
#endif

#define TSTPATH "./print_caps"
#define CAPNAME "security.capability"
#ifndef __CAP_BITS
#define __CAP_BITS 31
#endif

int perms_test(void)
{
int ret;
unsigned int value[4];

drop_root();
value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);
value[1] = 1;
value[2] = 1;
value[3] = 1;
ret = setxattr(TSTPATH, CAPNAME, value, 4*sizeof(unsigned int), 0);
if (ret) {
perror("setxattr");
printf("PASS: could not set capabilities as non-root\n");
ret = 0;
} else {
printf("FAIL: could set capabilities as non-root\n");
ret = 1;
}

return ret;
}

static inline int getcapflag(int w)
{
switch (w) {
case 0: return CAP_EFFECTIVE;
case 1: return CAP_PERMITTED;
case 2: return CAP_INHERITABLE;
default: exit(10);
}
}

int fork_drop_and_exec(void)
{
	int pid = fork();
	int ret, status;

	if (pid < 0) {
		perror("fork");
		exit(1);
	}
	if (pid == 0) {
		drop_root();
		/* The execlp() argument list must be NULL-terminated. */
		ret = execlp(TSTPATH, TSTPATH, (char *)NULL);
		perror("execlp");
		exit(1);
	} else {
		waitpid(pid, &status, 0);
		ret = status;
	}
	return ret;
}

int run_boundary_test(char *how)
{
int whichcap = 0;
unsigned int value[7];
int i, ret;

printf("trying unused cap bits within 32 bits\n");
if (strcmp(how, "eff") == 0)
whichcap = 0;
else if (strcmp(how, "perm") == 0)
whichcap = 1;
else if (strcmp(how, "inh") == 0)
whichcap = 2;

memset(value, 0, 7*sizeof(unsigned int));
value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);
/* Walk the unused bits up to bit 31; value[0] holds the version word. */
for (i=__CAP_BITS; i<8*sizeof(unsigned int); i++) {
value[whichcap+1] = cpu_to_le32(1U<<i);
ret = setxattr(TSTPATH, CAPNAME, value, 4*sizeof(unsigned int), 0);
if (ret)
printf("%s test 1: error setting cap at %d %d\n",
__FUNCTION__, whichcap, i);
ret = fork_drop_and_exec();
if (ret) {
printf("%s test 1: error execing at %d %d\n",
__FUNCTION__, whichcap, i);
}
}

printf("trying cap with wrong version\n");
memset(value, 0, 7*sizeof(unsigned int));
value[0] = cpu_to_le32(0xFFF);
ret = setxattr(TSTPATH, CAPNAME, value, 4*sizeof(unsigned int), 0);
if (ret)
printf("%s test 2: error setting cap at\n", __FUNCTION__);
ret = fork_drop_and_exec();
if (ret)
printf("%s test 2: PASS (error execing)\n", __FUNCTION__);
else
printf("%s test 2: FAIL (succeeded execing)\n", __FUNCTION__);

printf("trying cap which is too small\n");
memset(value, 0, 7*sizeof(unsigned int));
value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);
ret = setxattr(TSTPATH, CAPNAME, value, 3*sizeof(unsigned int), 0);
if (ret)
printf("%s test 3: error setting cap at\n", __FUNCTION__);
ret = fork_drop_and_exec();
if (ret)
printf("%s test 3: PASS (error execing)\n", __FUNCTION__);
else
printf("%s test 3: FAIL (succeeded execing)\n", __FUNCTION__);

printf("trying cap with 64 bits of eff, 32 each of perm and inh\n");
memset(value, 0, 7*sizeof(unsigned int));
value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);
ret = setxattr(TSTPATH, CAPNAME, value, 5*sizeof(unsigned int), 0);
if (ret)
printf("%s test 4: error setting cap at\n", __FUNCTION__);
ret = fork_drop_and_exec();
if (ret)
printf("%s test 4: PASS (error execing)\n", __FUNCTION__);
else
printf("%s test 4: FAIL (succeeded execing)\n", __FUNCTION__);

printf("trying full 64 bit caps, extra caps unset\n");
memset(value, 0, 7*sizeof(unsigned int));
value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);
ret = setxattr(TSTPATH, CAPNAME, value, 7*sizeof(unsigned int), 0);
if (ret)
printf("%s test 5: error setting cap at\n", __FUNCTION__);
ret = fork_drop_and_exec();
if (ret)
printf("%s test 5: FAIL (error execing)\n", __FUNCTION__);
else
printf("%s test 5: PASS (succeeded execing)\n", __FUNCTION__);

printf("trying full 64 bit caps, all bits set\n");
memset(value, 0, 7*sizeof(unsigned int));
value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);
value[1] = cpu_to_le32(-1);
value[2] = cpu_to_le32(-1);
value[3] = cpu_to_le32(-1);
value[4] = cpu_to_le32(-1);
value[5] = cpu_to_le32(-1);
value[6] = cpu_to_le32(-1);
ret = setxattr(TSTPATH, CAPNAME, value, 7*sizeof(unsigned int), 0);
if (ret)
printf("%s test 6: error setting cap at\n", __FUNCTION__);
ret = fork_drop_and_exec();
if (ret)
printf("%s test 6: PASS (error execing)\n", __FUNCTION__);
else
printf("%s test 6: FAIL (succeeded execing)\n", __FUNCTION__);

return 0;
}

int caps_actually_set_test(void)
{
int whichset, whichcap, ret;
cap_t cap;
unsigned int value[4];
cap_flag_t capflag;
cap_value_t capvalue[1];

value[0] = cpu_to_le32(_LINUX_CAPABILITY_VERSION);

cap = cap_init();
if (!cap) {
perror("cap_init");
exit(2);
}

for (whichset=0; whichset<3; whichset++) {
capflag = getcapflag(whichset);
value[1] = value[2] = value[3] = cpu_to_le32(0);
for (whichcap=0; whichcap < __CAP_BITS; whichcap++) {
cap_clear(cap);
capvalue[0] = whichcap;
cap_set_flag(cap, capflag, 1, capvalue, CAP_SET);
value[whichset+1] = cpu_to_le32(1 << whichcap);
ret = setxattr(TSTPATH, CAPNAME, value, 4*sizeof(unsigned int), 0);
if (ret) {
printf("%d %d\n", whichset, whichcap);
perror("setxattr");
}
printf("execing with %s\n", cap_to_text(cap, NULL));
ret = fork_drop_and_exec();
if (ret) {
printf("Error execing at %d %d\n", whichset, whichcap);
}
}
}

cap_free(cap);
return 0;
}

int main(int argc, char *argv[])
{
if (argc < 2)
usage(argv[0]);

switch(atoi(argv[1])) {
case 0:
return perms_test();
break;
case 1:
if (argc<3)
usage(argv[0]);
return run_boundary_test(argv[2]);
break;
case 2:
return caps_actually_set_test();
break;
default: usage(argv[0]);
}
}
=================================================================


2007-05-08 19:16:53

by Serge E. Hallyn

Subject: [PATCH 1/2] file capabilities: implement file capabilities

From: Serge E. Hallyn <[email protected]>
Subject: [PATCH 1/2] file capabilities: implement file capabilities

Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.

This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.

Changelog:
Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.

Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.

Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.

Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.

Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.

Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().

Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(TASK_KILL) to end of cap_task_kill to reduce
audit messages.

Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.

Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.

Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.

One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?

It is a semantic change, as without fscaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.

task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit

Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.

Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.

Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.
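
As an illustration of the Aug 15 checks (version match and no bits
beyond the known maximum capability), here is a minimal userspace
sketch. It is not the kernel code: the struct and function names are
hypothetical, the version constant is assumed to equal
_LINUX_CAPABILITY_VERSION, and the three per-set loops of
check_cap_sanity() are folded into a single mask test.

```c
#include <errno.h>
#include <stdint.h>

#define FSCAPS_VERSION 0x19980330u	/* assumed _LINUX_CAPABILITY_VERSION */
#define FSCAPS_NUMCAPS 31		/* CAP_NUMCAPS in this patch */

/* Capability words already converted to host byte order. */
struct fscaps_v1 {
	uint32_t version;
	uint32_t effective;
	uint32_t permitted;
	uint32_t inheritable;
};

/* Return 0 if the capability words are acceptable, -EPERM otherwise. */
static int fscaps_v1_check(const struct fscaps_v1 *c)
{
	/* Bits FSCAPS_NUMCAPS..31 name capabilities the kernel does not know. */
	const uint32_t unknown = ~(uint32_t)0 << FSCAPS_NUMCAPS;

	if (c->version != FSCAPS_VERSION)
		return -EPERM;
	if ((c->effective | c->permitted | c->inheritable) & unknown)
		return -EPERM;
	return 0;
}
```

With CAP_NUMCAPS at 31, only bit 31 of each set is rejected; bits 0
through 30 pass.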

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Stephen Smalley <[email protected]>
Cc: James Morris <[email protected]>
Cc: Chris Wright <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

include/linux/capability.h | 20 +++
include/linux/security.h | 12 +-
security/Kconfig | 10 +
security/capability.c | 4
security/commoncap.c | 194 +++++++++++++++++++++++++++++++++--
security/selinux/hooks.c | 12 ++
6 files changed, 241 insertions(+), 11 deletions(-)

diff -puN include/linux/capability.h~implement-file-posix-capabilities include/linux/capability.h
--- a/include/linux/capability.h~implement-file-posix-capabilities
+++ a/include/linux/capability.h
@@ -40,11 +40,29 @@ typedef struct __user_cap_data_struct {
__u32 inheritable;
} __user *cap_user_data_t;

+
+
+#define XATTR_CAPS_SUFFIX "capability"
+#define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX
+struct vfs_cap_data_disk {
+ __le32 version;
+ __le32 effective;
+ __le32 permitted;
+ __le32 inheritable;
+};
+
#ifdef __KERNEL__

#include <linux/spinlock.h>
#include <asm/current.h>

+struct vfs_cap_data {
+ __u32 version;
+ __u32 effective;
+ __u32 permitted;
+ __u32 inheritable;
+};
+
/* #define STRICT_CAP_T_TYPECHECKS */

#ifdef STRICT_CAP_T_TYPECHECKS
@@ -288,6 +306,8 @@ typedef __u32 kernel_cap_t;

#define CAP_AUDIT_CONTROL 30

+#define CAP_NUMCAPS 31
+
#ifdef __KERNEL__
/*
* Bounding set
diff -puN include/linux/security.h~implement-file-posix-capabilities include/linux/security.h
--- a/include/linux/security.h~implement-file-posix-capabilities
+++ a/include/linux/security.h
@@ -53,6 +53,10 @@ extern int cap_inode_setxattr(struct den
extern int cap_inode_removexattr(struct dentry *dentry, char *name);
extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags);
extern void cap_task_reparent_to_init (struct task_struct *p);
+extern int cap_task_kill(struct task_struct *p, struct siginfo *info, int sig, u32 secid);
+extern int cap_task_setscheduler (struct task_struct *p, int policy, struct sched_param *lp);
+extern int cap_task_setioprio (struct task_struct *p, int ioprio);
+extern int cap_task_setnice (struct task_struct *p, int nice);
extern int cap_syslog (int type);
extern int cap_vm_enough_memory (long pages);

@@ -2585,12 +2589,12 @@ static inline int security_task_setgroup

static inline int security_task_setnice (struct task_struct *p, int nice)
{
- return 0;
+ return cap_task_setnice(p, nice);
}

static inline int security_task_setioprio (struct task_struct *p, int ioprio)
{
- return 0;
+ return cap_task_setioprio(p, ioprio);
}

static inline int security_task_getioprio (struct task_struct *p)
@@ -2608,7 +2612,7 @@ static inline int security_task_setsched
int policy,
struct sched_param *lp)
{
- return 0;
+ return cap_task_setscheduler(p, policy, lp);
}

static inline int security_task_getscheduler (struct task_struct *p)
@@ -2625,7 +2629,7 @@ static inline int security_task_kill (st
struct siginfo *info, int sig,
u32 secid)
{
- return 0;
+ return cap_task_kill(p, info, sig, secid);
}

static inline int security_task_wait (struct task_struct *p)
diff -puN security/Kconfig~implement-file-posix-capabilities security/Kconfig
--- a/security/Kconfig~implement-file-posix-capabilities
+++ a/security/Kconfig
@@ -80,6 +80,16 @@ config SECURITY_CAPABILITIES
This enables the "default" Linux capabilities functionality.
If you are unsure how to answer this question, answer Y.

+config SECURITY_FS_CAPABILITIES
+ bool "File POSIX Capabilities"
+ depends on SECURITY=n || SECURITY_CAPABILITIES=y
+ default n
+ help
+ This enables filesystem capabilities, allowing you to give
+ binaries a subset of root's powers without using setuid 0.
+
+ If in doubt, answer N.
+
config SECURITY_ROOTPLUG
tristate "Root Plug Support"
depends on USB && SECURITY
diff -puN security/capability.c~implement-file-posix-capabilities security/capability.c
--- a/security/capability.c~implement-file-posix-capabilities
+++ a/security/capability.c
@@ -40,6 +40,10 @@ static struct security_operations capabi
.inode_setxattr = cap_inode_setxattr,
.inode_removexattr = cap_inode_removexattr,

+ .task_kill = cap_task_kill,
+ .task_setscheduler = cap_task_setscheduler,
+ .task_setioprio = cap_task_setioprio,
+ .task_setnice = cap_task_setnice,
.task_post_setuid = cap_task_post_setuid,
.task_reparent_to_init = cap_task_reparent_to_init,

diff -puN security/commoncap.c~implement-file-posix-capabilities security/commoncap.c
--- a/security/commoncap.c~implement-file-posix-capabilities
+++ a/security/commoncap.c
@@ -23,6 +23,7 @@
#include <linux/ptrace.h>
#include <linux/xattr.h>
#include <linux/hugetlb.h>
+#include <linux/mount.h>

int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
{
@@ -109,15 +110,108 @@ void cap_capset_set (struct task_struct
target->cap_permitted = *permitted;
}

+#ifdef CONFIG_SECURITY_FS_CAPABILITIES
+static inline void cap_from_disk(struct vfs_cap_data_disk *dcap,
+ struct vfs_cap_data *cap)
+{
+ cap->version = le32_to_cpu(dcap->version);
+ cap->effective = le32_to_cpu(dcap->effective);
+ cap->permitted = le32_to_cpu(dcap->permitted);
+ cap->inheritable = le32_to_cpu(dcap->inheritable);
+}
+
+static int check_cap_sanity(struct vfs_cap_data *cap)
+{
+ int i;
+
+ if (cap->version != _LINUX_CAPABILITY_VERSION)
+ return -EPERM;
+
+ for (i = CAP_NUMCAPS; i < 8*sizeof(cap->effective); i++) {
+ if (cap->effective & CAP_TO_MASK(i))
+ return -EPERM;
+ }
+ for (i = CAP_NUMCAPS; i < 8*sizeof(cap->permitted); i++) {
+ if (cap->permitted & CAP_TO_MASK(i))
+ return -EPERM;
+ }
+ for (i = CAP_NUMCAPS; i < 8*sizeof(cap->inheritable); i++) {
+ if (cap->inheritable & CAP_TO_MASK(i))
+ return -EPERM;
+ }
+
+ return 0;
+}
+
+/* Locate any VFS capabilities: */
+static int set_file_caps(struct linux_binprm *bprm)
+{
+ struct dentry *dentry;
+ ssize_t rc;
+ struct vfs_cap_data_disk dcaps;
+ struct vfs_cap_data caps;
+ struct inode *inode;
+ int err;
+
+ if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
+ return 0;
+
+ dentry = dget(bprm->file->f_dentry);
+ inode = dentry->d_inode;
+ if (!inode->i_op || !inode->i_op->getxattr) {
+ dput(dentry);
+ return 0;
+ }
+
+ rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, &dcaps,
+ sizeof(dcaps));
+ dput(dentry);
+
+ if (rc == -ENODATA)
+ return 0;
+
+ if (rc < 0) {
+ printk(KERN_NOTICE "%s: Error (%zd) getting xattr\n",
+ __FUNCTION__, rc);
+ return rc;
+ }
+
+ if (rc != sizeof(dcaps)) {
+ printk(KERN_NOTICE "%s: got wrong size for getxattr (%zd)\n",
+ __FUNCTION__, rc);
+ return -EPERM;
+ }
+
+ cap_from_disk(&dcaps, &caps);
+ err = check_cap_sanity(&caps);
+ if (err)
+ return err;
+
+ bprm->cap_effective = caps.effective;
+ bprm->cap_permitted = caps.permitted;
+ bprm->cap_inheritable = caps.inheritable;
+
+ return 0;
+}
+#else
+static inline int set_file_caps(struct linux_binprm *bprm)
+{
+ return 0;
+}
+#endif
+
int cap_bprm_set_security (struct linux_binprm *bprm)
{
+ int ret;
+
/* Copied from fs/exec.c:prepare_binprm. */

- /* We don't have VFS support for capabilities yet */
cap_clear (bprm->cap_inheritable);
cap_clear (bprm->cap_permitted);
cap_clear (bprm->cap_effective);

+ ret = set_file_caps(bprm);
+
/* To support inheritance of root-permissions and suid-root
* executables under compatibility mode, we raise all three
* capability sets for the file.
@@ -134,7 +228,8 @@ int cap_bprm_set_security (struct linux_
if (bprm->e_uid == 0)
cap_set_full (bprm->cap_effective);
}
- return 0;
+
+ return ret;
}

void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
@@ -182,11 +277,15 @@ void cap_bprm_apply_creds (struct linux_

int cap_bprm_secureexec (struct linux_binprm *bprm)
{
- /* If/when this module is enhanced to incorporate capability
- bits on files, the test below should be extended to also perform a
- test between the old and new capability sets. For now,
- it simply preserves the legacy decision algorithm used by
- the old userland. */
+ if (current->uid != 0) {
+ if (!cap_isclear(bprm->cap_effective))
+ return 1;
+ if (!cap_isclear(bprm->cap_permitted))
+ return 1;
+ if (!cap_isclear(bprm->cap_inheritable))
+ return 1;
+ }
+
return (current->euid != current->uid ||
current->egid != current->gid);
}
@@ -300,6 +399,83 @@ int cap_task_post_setuid (uid_t old_ruid
return 0;
}

+#ifdef CONFIG_SECURITY_FS_CAPABILITIES
+/*
+ * Rationale: code calling task_setscheduler, task_setioprio, and
+ * task_setnice, assumes that
+ * . if capable(cap_sys_nice), then those actions should be allowed
+ * . if not capable(cap_sys_nice), but acting on your own processes,
+ * then those actions should be allowed
+ * This is insufficient now since you can call code without suid, but
+ * yet with increased caps.
+ * So we check for increased caps on the target process.
+ */
+static inline int cap_safe_nice(struct task_struct *p)
+{
+ if (!cap_issubset(p->cap_permitted, current->cap_permitted) &&
+ !__capable(current, CAP_SYS_NICE))
+ return -EPERM;
+ return 0;
+}
+
+int cap_task_setscheduler (struct task_struct *p, int policy,
+ struct sched_param *lp)
+{
+ return cap_safe_nice(p);
+}
+
+int cap_task_setioprio (struct task_struct *p, int ioprio)
+{
+ return cap_safe_nice(p);
+}
+
+int cap_task_setnice (struct task_struct *p, int nice)
+{
+ return cap_safe_nice(p);
+}
+
+int cap_task_kill(struct task_struct *p, struct siginfo *info,
+ int sig, u32 secid)
+{
+ if (info != SEND_SIG_NOINFO && (is_si_special(info) || SI_FROMKERNEL(info)))
+ return 0;
+
+ if (secid)
+ /*
+ * Signal sent as a particular user.
+ * Capabilities are ignored. May be wrong, but it's the
+ * only thing we can do at the moment.
+ * Used only by usb drivers?
+ */
+ return 0;
+ if (cap_issubset(p->cap_permitted, current->cap_permitted))
+ return 0;
+ if (capable(CAP_KILL))
+ return 0;
+
+ return -EPERM;
+}
+#else
+int cap_task_setscheduler (struct task_struct *p, int policy,
+ struct sched_param *lp)
+{
+ return 0;
+}
+int cap_task_setioprio (struct task_struct *p, int ioprio)
+{
+ return 0;
+}
+int cap_task_setnice (struct task_struct *p, int nice)
+{
+ return 0;
+}
+int cap_task_kill(struct task_struct *p, struct siginfo *info,
+ int sig, u32 secid)
+{
+ return 0;
+}
+#endif
+
void cap_task_reparent_to_init (struct task_struct *p)
{
p->cap_effective = CAP_INIT_EFF_SET;
@@ -337,6 +513,10 @@ EXPORT_SYMBOL(cap_bprm_secureexec);
EXPORT_SYMBOL(cap_inode_setxattr);
EXPORT_SYMBOL(cap_inode_removexattr);
EXPORT_SYMBOL(cap_task_post_setuid);
+EXPORT_SYMBOL(cap_task_kill);
+EXPORT_SYMBOL(cap_task_setscheduler);
+EXPORT_SYMBOL(cap_task_setioprio);
+EXPORT_SYMBOL(cap_task_setnice);
EXPORT_SYMBOL(cap_task_reparent_to_init);
EXPORT_SYMBOL(cap_syslog);
EXPORT_SYMBOL(cap_vm_enough_memory);
diff -puN security/selinux/hooks.c~implement-file-posix-capabilities security/selinux/hooks.c
--- a/security/selinux/hooks.c~implement-file-posix-capabilities
+++ a/security/selinux/hooks.c
@@ -2825,6 +2825,12 @@ static int selinux_task_setnice(struct t

static int selinux_task_setioprio(struct task_struct *p, int ioprio)
{
+ int rc;
+
+ rc = secondary_ops->task_setioprio(p, ioprio);
+ if (rc)
+ return rc;
+
return task_has_perm(current, p, PROCESS__SETSCHED);
}

@@ -2854,6 +2860,12 @@ static int selinux_task_setrlimit(unsign

static int selinux_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp)
{
+ int rc;
+
+ rc = secondary_ops->task_setscheduler(p, policy, lp);
+ if (rc)
+ return rc;
+
return task_has_perm(current, p, PROCESS__SETSCHED);
}

_

2007-05-08 19:17:30

by Serge E. Hallyn

Subject: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

From: Serge E. Hallyn <[email protected]>
Subject: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

(Changelog: fixed syntax error in dummy version of check_cap_sanity())

As the capability set changes and distributions start tagging
binaries with capabilities, we would like for running an older
kernel to not necessarily make those binaries unusable.

(0. Enable the CONFIG_SECURITY_FS_CAPABILITIES option
when CONFIG_SECURITY=n.)
(1. Rename CONFIG_SECURITY_FS_CAPABILITIES to
CONFIG_SECURITY_FILE_CAPABILITIES)
2. Introduce CONFIG_SECURITY_FILE_CAPABILITIES_STRICTXATTR
which, when set, prevents loading binaries with capabilities
set which the kernel doesn't know about. When not set,
such capabilities run, ignoring the unknown caps.
3. To accommodate 64-bit caps, specify that capabilities are
stored as
u32 version; u32 eff0; u32 perm0; u32 inh0;
u32 eff1; u32 perm1; u32 inh1; (etc)
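
The interleaved layout in item 3 can be sketched as follows. This is a
rough userspace illustration with hypothetical names, not the patch
itself; it is also slightly stricter than cap_from_disk() below, in
that it rejects sizes that are not a multiple of four bytes.

```c
#include <stddef.h>

enum fscaps_set { FSCAPS_EFF = 0, FSCAPS_PERM = 1, FSCAPS_INH = 2 };

/*
 * Number of 32-bit words stored per capability set, or -1 if the xattr
 * size cannot be one version word followed by whole (eff, perm, inh)
 * triples.
 */
static int fscaps_words_per_set(size_t size_bytes)
{
	size_t nwords;

	if (size_bytes % 4 != 0)
		return -1;
	nwords = size_bytes / 4;
	if (nwords < 4 || (nwords - 1) % 3 != 0)
		return -1;
	return (int)((nwords - 1) / 3);
}

/*
 * Index, counted in 32-bit words from the start of the xattr (the
 * version word is index 0), of word number `word` of the given set.
 */
static size_t fscaps_word_index(enum fscaps_set set, size_t word)
{
	return 1 + 3 * word + (size_t)set;
}
```

Under this scheme a 16-byte value carries one word per set and a
28-byte value carries two, while a 20-byte value (the "64 bits of eff,
32 each of perm and inh" case in the test program) is rejected.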

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Stephen Smalley <[email protected]>
Cc: James Morris <[email protected]>
Cc: Chris Wright <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

include/linux/capability.h | 23 ++++-
security/Kconfig | 14 ++-
security/commoncap.c | 157 ++++++++++++++++++++++-------------
3 files changed, 132 insertions(+), 62 deletions(-)

diff -puN include/linux/capability.h~file-capabilities-accomodate-future-64-bit-caps include/linux/capability.h
--- a/include/linux/capability.h~file-capabilities-accomodate-future-64-bit-caps
+++ a/include/linux/capability.h
@@ -44,11 +44,28 @@ typedef struct __user_cap_data_struct {

#define XATTR_CAPS_SUFFIX "capability"
#define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX
+
+/* size of caps that we work with */
+#define XATTR_CAPS_SZ (4*sizeof(__le32))
+
+/*
+ * data[] is organized as:
+ * effective[0]
+ * permitted[0]
+ * inheritable[0]
+ * effective[1]
+ * ...
+ * this way we can just read as much of the on-disk capability as
+ * we know should exist and know we'll get the data we'll need.
+ */
struct vfs_cap_data_disk {
__le32 version;
- __le32 effective;
- __le32 permitted;
- __le32 inheritable;
+ __le32 data[]; /* eff[0], perm[0], inh[0], eff[1], ... */
+};
+
+struct vfs_cap_data_disk_v1 {
+ __le32 version;
+ __le32 data[3]; /* eff[0], perm[0], inh[0] */
};

#ifdef __KERNEL__
diff -puN security/commoncap.c~file-capabilities-accomodate-future-64-bit-caps security/commoncap.c
--- a/security/commoncap.c~file-capabilities-accomodate-future-64-bit-caps
+++ a/security/commoncap.c
@@ -110,36 +110,73 @@ void cap_capset_set (struct task_struct
target->cap_permitted = *permitted;
}

-#ifdef CONFIG_SECURITY_FS_CAPABILITIES
-static inline void cap_from_disk(struct vfs_cap_data_disk *dcap,
- struct vfs_cap_data *cap)
+#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
+
+#ifdef CONFIG_SECURITY_FILE_CAPABILITIES_STRICTXATTR
+static int check_cap_sanity(struct vfs_cap_data_disk *dcap, int size)
{
- cap->version = le32_to_cpu(dcap->version);
- cap->effective = le32_to_cpu(dcap->effective);
- cap->permitted = le32_to_cpu(dcap->permitted);
- cap->inheritable = le32_to_cpu(dcap->inheritable);
+ int word, bit;
+ u32 eff, inh, perm;
+ int sz = (size-1)/3;
+
+ word = CAP_NUMCAPS / 32;
+ bit = CAP_NUMCAPS % 32;
+
+ eff = le32_to_cpu(dcap->data[3*word]);
+ perm = le32_to_cpu(dcap->data[3*word+1]);
+ inh = le32_to_cpu(dcap->data[3*word+2]);
+
+ while (word < sz) {
+ if (bit == 32) {
+ bit = 0;
+ word++;
+ if (word >= sz)
+ break;
+ eff = le32_to_cpu(dcap->data[3*word]);
+ perm = le32_to_cpu(dcap->data[3*word+1]);
+ inh = le32_to_cpu(dcap->data[3*word+2]);
+ continue;
+ }
+ if (eff & CAP_TO_MASK(bit))
+ return -EINVAL;
+ if (inh & CAP_TO_MASK(bit))
+ return -EINVAL;
+ if (perm & CAP_TO_MASK(bit))
+ return -EINVAL;
+ bit++;
+ }
+
+ return 0;
}
+#else
+static int check_cap_sanity(struct vfs_cap_data_disk *dcap, int sz)
+{ return 0; }
+#endif

-static int check_cap_sanity(struct vfs_cap_data *cap)
+static inline int cap_from_disk(struct vfs_cap_data_disk *dcap,
+ struct linux_binprm *bprm, int size)
{
- int i;
+ int rc, version;

- if (cap->version != _LINUX_CAPABILITY_VERSION)
- return -EPERM;
+ version = le32_to_cpu(dcap->version);
+ if (version != _LINUX_CAPABILITY_VERSION)
+ return -EINVAL;

- for (i = CAP_NUMCAPS; i < 8*sizeof(cap->effective); i++) {
- if (cap->effective & CAP_TO_MASK(i))
- return -EPERM;
- }
- for (i = CAP_NUMCAPS; i < 8*sizeof(cap->permitted); i++) {
- if (cap->permitted & CAP_TO_MASK(i))
- return -EPERM;
- }
- for (i = CAP_NUMCAPS; i < 8*sizeof(cap->inheritable); i++) {
- if (cap->inheritable & CAP_TO_MASK(i))
- return -EPERM;
+ size /= sizeof(u32);
+ if ((size-1)%3) {
+ printk(KERN_WARNING "%s: size is an invalid size (%d)\n",
+ __FUNCTION__, size);
+ return -EINVAL;
}

+ rc = check_cap_sanity(dcap, size);
+ if (rc)
+ return rc;
+
+ bprm->cap_effective = le32_to_cpu(dcap->data[0]);
+ bprm->cap_permitted = le32_to_cpu(dcap->data[1]);
+ bprm->cap_inheritable = le32_to_cpu(dcap->data[2]);
+
return 0;
}

@@ -147,52 +184,58 @@ static int check_cap_sanity(struct vfs_c
static int set_file_caps(struct linux_binprm *bprm)
{
struct dentry *dentry;
- ssize_t rc;
- struct vfs_cap_data_disk dcaps;
- struct vfs_cap_data caps;
+ int rc;
+ struct vfs_cap_data_disk_v1 v1caps;
+ struct vfs_cap_data_disk *dcaps;
struct inode *inode;
- int err;

+ dcaps = (struct vfs_cap_data_disk *)&v1caps;
if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
return 0;

dentry = dget(bprm->file->f_dentry);
inode = dentry->d_inode;
- if (!inode->i_op || !inode->i_op->getxattr) {
- dput(dentry);
- return 0;
- }
-
- rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, &dcaps,
- sizeof(dcaps));
- dput(dentry);
-
- if (rc == -ENODATA)
- return 0;
-
- if (rc < 0) {
- printk(KERN_NOTICE "%s: Error (%zd) getting xattr\n",
- __FUNCTION__, rc);
- return rc;
+ rc = 0;
+ if (!inode->i_op || !inode->i_op->getxattr)
+ goto out;
+
+ rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, dcaps,
+ XATTR_CAPS_SZ);
+ if (rc == -ENODATA || rc == -EOPNOTSUPP) {
+ rc = 0;
+ goto out;
+ }
+ if (rc == -ERANGE) {
+ int size;
+ size = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0);
+ if (size <= 0) { /* shouldn't ever happen */
+ rc = -EINVAL;
+ goto out;
+ }
+ dcaps = kmalloc(size, GFP_KERNEL);
+ if (!dcaps) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, dcaps,
+ size);
}
-
- if (rc != sizeof(dcaps)) {
- printk(KERN_NOTICE "%s: got wrong size for getxattr (%zd)\n",
- __FUNCTION__, rc);
- return -EPERM;
+ if (rc < 0)
+ goto out;
+ if (rc < sizeof(struct vfs_cap_data_disk_v1)) {
+ rc = -EINVAL;
+ goto out;
}

- cap_from_disk(&dcaps, &caps);
- err = check_cap_sanity(&caps);
- if (err)
- return err;
-
- bprm->cap_effective = caps.effective;
- bprm->cap_permitted = caps.permitted;
- bprm->cap_inheritable = caps.inheritable;
+ rc = cap_from_disk(dcaps, bprm, rc);

- return 0;
+out:
+ dput(dentry);
+ if ((void *)dcaps != (void *)&v1caps)
+ kfree(dcaps);
+ return rc;
}
+
#else
static inline int set_file_caps(struct linux_binprm *bprm)
{
@@ -399,7 +442,7 @@ int cap_task_post_setuid (uid_t old_ruid
return 0;
}

-#ifdef CONFIG_SECURITY_FS_CAPABILITIES
+#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
/*
* Rationale: code calling task_setscheduler, task_setioprio, and
* task_setnice, assumes that
diff -puN security/Kconfig~file-capabilities-accomodate-future-64-bit-caps security/Kconfig
--- a/security/Kconfig~file-capabilities-accomodate-future-64-bit-caps
+++ a/security/Kconfig
@@ -80,9 +80,9 @@ config SECURITY_CAPABILITIES
This enables the "default" Linux capabilities functionality.
If you are unsure how to answer this question, answer Y.

-config SECURITY_FS_CAPABILITIES
+config SECURITY_FILE_CAPABILITIES
bool "File POSIX Capabilities"
- depends on SECURITY=n || SECURITY_CAPABILITIES=y
+ depends on SECURITY=n || SECURITY_CAPABILITIES!=n
default n
help
This enables filesystem capabilities, allowing you to give
@@ -90,6 +90,16 @@ config SECURITY_FS_CAPABILITIES

If in doubt, answer N.

+config SECURITY_FILE_CAPABILITIES_STRICTXATTR
+ bool "Refuse to run files with unknown caps"
+ depends on SECURITY_FILE_CAPABILITIES
+ default y
+ help
+ Refuse to run files which have unknown capabilities set
+ in the security.capability xattr. This could prevent
+ running important binaries from an updated distribution
+ on an older kernel.
+
config SECURITY_ROOTPLUG
tristate "Root Plug Support"
depends on USB && SECURITY
_

2007-05-08 20:06:41

by Andrew Morton

Subject: Re: [PATCH 0/2] file capabilities: Introduction

On Tue, 8 May 2007 14:15:48 -0500
"Serge E. Hallyn" <[email protected]> wrote:

> Following are two patches which have been sitting for some time in -mm.

Where "some time" == "nearly six months".

We need help considering, reviewing and testing this code, please.

2007-05-08 20:58:37

by Andreas Dilger

Subject: Re: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

On May 08, 2007 14:17 -0500, Serge E. Hallyn wrote:
> As the capability set changes and distributions start tagging
> binaries with capabilities, we would like for running an older
> kernel to not necessarily make those binaries unusable.
>
> (0. Enable the CONFIG_SECURITY_FS_CAPABILITIES option
> when CONFIG_SECURITY=n.)
> (1. Rename CONFIG_SECURITY_FS_CAPABILITIES to
> CONFIG_SECURITY_FILE_CAPABILITIES)
> 2. Introduce CONFIG_SECURITY_FILE_CAPABILITIES_STRICTXATTR
> which, when set, prevents loading binaries with capabilities
> set which the kernel doesn't know about. When not set,
> such binaries run, ignoring the unknown caps.
> 3. To accommodate 64-bit caps, specify that capabilities are
> stored as
> u32 version; u32 eff0; u32 perm0; u32 inh0;
> u32 eff1; u32 perm1; u32 inh1; (etc)

Have you considered how such capabilities will be used in the future?
One of the important use cases I can see today is the ability to
split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine
grained attributes.

What we definitely do NOT want to happen is an application that needs
privileged access (e.g. e2fsck, mount) to stop running because the
new capabilities _would_ have been granted by the new kernel and are
not by the old kernel and STRICTXATTR is used.

To me it would seem that having extra capabilities on an old kernel
is relatively harmless if the old kernel doesn't know what they are.
It's like having a key to a door that you don't know where it is.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-05-08 21:49:20

by Serge E. Hallyn

Subject: Re: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

Quoting Andreas Dilger ([email protected]):
> On May 08, 2007 14:17 -0500, Serge E. Hallyn wrote:
> > As the capability set changes and distributions start tagging
> > binaries with capabilities, we would like for running an older
> > kernel to not necessarily make those binaries unusable.
> >
> > (0. Enable the CONFIG_SECURITY_FS_CAPABILITIES option
> > when CONFIG_SECURITY=n.)
> > (1. Rename CONFIG_SECURITY_FS_CAPABILITIES to
> > CONFIG_SECURITY_FILE_CAPABILITIES)
> > 2. Introduce CONFIG_SECURITY_FILE_CAPABILITIES_STRICTXATTR
> > which, when set, prevents loading binaries with capabilities
> > set which the kernel doesn't know about. When not set,
> > such binaries run, ignoring the unknown caps.
> > 3. To accommodate 64-bit caps, specify that capabilities are
> > stored as
> > u32 version; u32 eff0; u32 perm0; u32 inh0;
> > u32 eff1; u32 perm1; u32 inh1; (etc)
>
> Have you considered how such capabilities will be used in the future?

There have been all sorts of suggestions, including very fine-grained
breakdowns of existing capabilities as well as capabilities for
non-privileged operations.

Other candidates for upcoming capabilities will be to satisfy
containers/vserver/openvz, where a distinction needs to be made between
CAP_DAC_OVERRIDE inside the user namespace, and the global
CAP_DAC_OVERRIDE. The path I've been pursuing, though (for which I
should really send out the prelim patches I've been sitting on),
follows David Howells' and Eric Biederman's suggestion of using
keyrings to store capabilities to other user namespaces. Still, new
capabilities may be desirable to guard CLONE_NEW_NS etc (rather than
CAP_SYS_ADMIN).

> One of the important use cases I can see today is the ability to
> split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine
> grained attributes.

Sounds plausible, though it suffers from both making capabilities far
more cumbersome (i.e. finding the right capability for what you wanted
to do) and backward compatibility. Perhaps at that point we should
introduce security.capabilityv2 xattrs. A binary can then carry
security.capability=CAP_SYS_ADMIN=p, and
security.capabilityv2=cap_may_clone_mntns=p.

> What we definitely do NOT want to happen is an application that needs
> privileged access (e.g. e2fsck, mount) to stop running because the
> new capabilities _would_ have been granted by the new kernel and are
> not by the old kernel and STRICTXATTR is used.
>
> To me it would seem that having extra capabilities on an old kernel
> is relatively harmless if the old kernel doesn't know what they are.
> It's like having a key to a door that you don't know where it is.

If we ditch the STRICTXATTR option do the semantics seem sane to you?
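
To make the two behaviours concrete, here is a minimal userspace sketch, not code from the patch. It assumes a kernel that understands only the first (eff, perm, inh) triple of the xattr data, and a hypothetical `strict` flag standing in for STRICTXATTR. The function name load_caps() and the -EPERM return for the strict case are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * data[] holds the words after the version field:
 * eff[0], perm[0], inh[0], eff[1], perm[1], inh[1], ...
 * This hypothetical kernel only knows the first triple.
 *
 * strict != 0: refuse the file if any later word has a bit set.
 * strict == 0: ignore the later words and use the first triple.
 */
static int load_caps(const uint32_t *data, size_t nwords, int strict,
		     uint32_t out[3])
{
	size_t i;

	assert(data != NULL && out != NULL);

	/* Need at least one full triple, and only whole triples. */
	if (nwords < 3 || nwords % 3)
		return -EINVAL;

	if (strict) {
		for (i = 3; i < nwords; i++)
			if (data[i])
				return -EPERM;	/* unknown capability set */
	}

	out[0] = data[0];	/* effective */
	out[1] = data[1];	/* permitted */
	out[2] = data[2];	/* inheritable */
	return 0;
}
```

With the strict flag gone, only the second behaviour remains: a binary tagged with capabilities from a newer kernel still runs, with the caps this kernel knows about.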

thanks,
-serge

2007-05-10 20:01:42

by Andreas Dilger

Subject: Re: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

On May 08, 2007 16:49 -0500, Serge E. Hallyn wrote:
> Quoting Andreas Dilger ([email protected]):
> > One of the important use cases I can see today is the ability to
> > split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine
> > grained attributes.
>
> Sounds plausible, though it suffers from both making capabilities far
> more cumbersome (i.e. finding the right capability for what you wanted
> to do) and backward compatibility. Perhaps at that point we should
> introduce security.capabilityv2 xattrs. A binary can then carry
> security.capability=CAP_SYS_ADMIN=p, and
> security.capabilityv2=cap_may_clone_mntns=p.

Well, the overhead of each EA is non-trivial (16 bytes/EA) for storing
12 bytes worth of data, so it is probably just better to keep extending
the original capability fields as was in the proposal.
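
As a rough illustration of that trade-off (a sketch under stated assumptions, not a measurement): take the 16 bytes/EA figure above as a fixed per-attribute cost, ignore storage for the attribute name, and assume that each separate xattr would carry its own version word. The macro and function names are made up for this example.

```c
#include <stddef.h>

#define EA_OVERHEAD 16	/* per-EA cost quoted above; filesystem specific */
#define WORD_BYTES   4	/* one __le32 */

/* One xattr: a version word followed by n (eff, perm, inh) triples. */
static size_t single_xattr_bytes(size_t n)
{
	return EA_OVERHEAD + WORD_BYTES + n * 3 * WORD_BYTES;
}

/* n separate xattrs, each with its own version word and one triple. */
static size_t split_xattr_bytes(size_t n)
{
	return n * (EA_OVERHEAD + WORD_BYTES + 3 * WORD_BYTES);
}
```

For a single triple the two layouts cost the same 32 bytes. For 64-bit caps (n = 2) the extended xattr comes to 44 bytes against 64 for two separate attributes, under these assumptions.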

> > What we definitely do NOT want to happen is an application that needs
> > privileged access (e.g. e2fsck, mount) to stop running because the
> > new capabilities _would_ have been granted by the new kernel and are
> > not by the old kernel and STRICTXATTR is used.
> >
> > To me it would seem that having extra capabilities on an old kernel
> > is relatively harmless if the old kernel doesn't know what they are.
> > It's like having a key to a door that you don't know where it is.
>
> If we ditch the STRICTXATTR option do the semantics seem sane to you?

Seems reasonable.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-05-11 06:07:54

by Suparna Bhattacharya

Subject: Re: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

On Thu, May 10, 2007 at 01:01:27PM -0700, Andreas Dilger wrote:
> On May 08, 2007 16:49 -0500, Serge E. Hallyn wrote:
> > Quoting Andreas Dilger ([email protected]):
> > > One of the important use cases I can see today is the ability to
> > > split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine
> > > grained attributes.
> >
> > Sounds plausible, though it suffers from both making capabilities far
> > more cumbersome (i.e. finding the right capability for what you wanted
> > to do) and backward compatibility. Perhaps at that point we should
> > introduce security.capabilityv2 xattrs. A binary can then carry
> > security.capability=CAP_SYS_ADMIN=p, and
> > security.capabilityv2=cap_may_clone_mntns=p.
>
> Well, the overhead of each EA is non-trivial (16 bytes/EA) for storing
> 12 bytes worth of data, so it is probably just better to keep extending
> the original capability fields as was in the proposal.
>
> > > What we definitely do NOT want to happen is an application that needs
> > > privileged access (e.g. e2fsck, mount) to stop running because the
> > > new capabilities _would_ have been granted by the new kernel and are
> > > not by the old kernel and STRICTXATTR is used.
> > >
> > > To me it would seem that having extra capabilities on an old kernel
> > > is relatively harmless if the old kernel doesn't know what they are.
> > > It's like having a key to a door that you don't know where it is.
> >
> > If we ditch the STRICTXATTR option do the semantics seem sane to you?
>
> Seems reasonable.

It would simplify the code as well, which is good.

This does mean no sanity checking of fcaps, am not sure if that matters,
I'm guessing it should be similar to the case for other security attributes.

Regards
Suparna

>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2007-05-14 17:26:26

by Serge E. Hallyn

Subject: Re: [PATCH 2/2] file capabilities: accomodate >32 bit capabilities

Quoting Suparna Bhattacharya ([email protected]):
> On Thu, May 10, 2007 at 01:01:27PM -0700, Andreas Dilger wrote:
> > On May 08, 2007 16:49 -0500, Serge E. Hallyn wrote:
> > > Quoting Andreas Dilger ([email protected]):
> > > > One of the important use cases I can see today is the ability to
> > > > split the heavily-overloaded e.g. CAP_SYS_ADMIN into much more fine
> > > > grained attributes.
> > >
> > > Sounds plausible, though it suffers from both making capabilities far
> > > more cumbersome (i.e. finding the right capability for what you wanted
> > > to do) and backward compatibility. Perhaps at that point we should
> > > introduce security.capabilityv2 xattrs. A binary can then carry
> > > security.capability=CAP_SYS_ADMIN=p, and
> > > security.capabilityv2=cap_may_clone_mntns=p.
> >
> > Well, the overhead of each EA is non-trivial (16 bytes/EA) for storing
> > 12 bytes worth of data, so it is probably just better to keep extending
> > the original capability fields as was in the proposal.
> >
> > > > What we definitely do NOT want to happen is an application that needs
> > > > privileged access (e.g. e2fsck, mount) to stop running because the
> > > > new capabilities _would_ have been granted by the new kernel and are
> > > > not by the old kernel and STRICTXATTR is used.
> > > >
> > > > To me it would seem that having extra capabilities on an old kernel
> > > > is relatively harmless if the old kernel doesn't know what they are.
> > > > It's like having a key to a door that you don't know where it is.
> > >
> > > If we ditch the STRICTXATTR option do the semantics seem sane to you?
> >
> > Seems reasonable.
>
> It would simplify the code as well, which is good.
>
> This does mean no sanity checking of fcaps, am not sure if that matters,
> I'm guessing it should be similar to the case for other security attributes.

which is to trust the xattr...

So here is a new consolidated patch without the STRICTXATTR config
option.

-serge

From: Serge E. Hallyn <[email protected]>
Subject: [PATCH] Implement file posix capabilities

Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.

This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.

Changelog:
May 14:
Remove STRICTXATTR support which could make newer binaries
unusable on older kernels, and combine the two patches
into one.

[recent]:
1. Enable the CONFIG_SECURITY_FS_CAPABILITIES option
when CONFIG_SECURITY=n.
2. Rename CONFIG_SECURITY_FS_CAPABILITIES to
CONFIG_SECURITY_FILE_CAPABILITIES
3. To accommodate 64-bit caps, specify that capabilities are
stored as
u32 version; u32 eff0; u32 perm0; u32 inh0;
u32 eff1; u32 perm1; u32 inh1; (etc)

Nov 27:
Incorporate fixes from Andrew Morton
(security-introduce-file-caps-tweaks and
security-introduce-file-caps-warning-fix)
Fix Kconfig dependency.
Fix change signaling behavior when file caps are not compiled in.

Nov 13:
Integrate comments from Alexey: Remove CONFIG_ ifdef from
capability.h, and use %zd for printing a size_t.

Nov 13:
Fix endianness warnings by sparse as suggested by Alexey
Dobriyan.

Nov 09:
Address warnings of unused variables at cap_bprm_set_security
when file capabilities are disabled, and simultaneously clean
up the code a little, by pulling the new code into a helper
function.

Nov 08:
For pointers to required userspace tools and how to use
them, see http://www.friedhoff.org/fscaps.html.

Nov 07:
Fix the calculation of the highest bit checked in
check_cap_sanity().

Nov 07:
Allow file caps to be enabled without CONFIG_SECURITY, since
capabilities are the default.
Hook cap_task_setscheduler when !CONFIG_SECURITY.
Move capable(CAP_KILL) to end of cap_task_kill to reduce
audit messages.

Nov 05:
Add secondary calls in selinux/hooks.c to task_setioprio and
task_setscheduler so that selinux and capabilities with file
cap support can be stacked.

Sep 05:
As Seth Arnold points out, uid checks are out of place
for capability code.

Sep 01:
Define task_setscheduler, task_setioprio, cap_task_kill, and
task_setnice to make sure a user cannot affect a process in which
they called a program with some fscaps.

One remaining question is the note under task_setscheduler: are we
ok with CAP_SYS_NICE being sufficient to confine a process to a
cpuset?

It is a semantic change, as without fscaps, attach_task doesn't
allow CAP_SYS_NICE to override the uid equivalence check. But since
it uses security_task_setscheduler, which elsewhere is used where
CAP_SYS_NICE can be used to override the uid equivalence check,
fixing it might be tough.

task_setscheduler
note: this also controls cpuset:attach_task. Are we ok with
CAP_SYS_NICE being used to confine to a cpuset?
task_setioprio
task_setnice
sys_setpriority uses this (through set_one_prio) for another
process. Need same checks as setrlimit

Aug 21:
Updated secureexec implementation to reflect the fact that
euid and uid might be the same and nonzero, but the process
might still have elevated caps.

Aug 15:
Handle endianness of xattrs.
Enforce capability version match between kernel and disk.
Enforce that no bits beyond the known max capability are
set, else return -EPERM.
With this extra processing, it may be worth reconsidering
doing all the work at bprm_set_security rather than
d_instantiate.

Aug 10:
Always call getxattr at bprm_set_security, rather than
caching it at d_instantiate.

Signed-off-by: Serge E. Hallyn <[email protected]>

---

include/linux/capability.h | 37 ++++++++
include/linux/security.h | 12 ++-
security/Kconfig | 10 ++
security/capability.c | 4 +
security/commoncap.c | 192 ++++++++++++++++++++++++++++++++++++++++++--
security/selinux/hooks.c | 12 +++
6 files changed, 256 insertions(+), 11 deletions(-)

833d9a6478d4f7fb915d76b25be1bb71e56b29f1
diff --git a/include/linux/capability.h b/include/linux/capability.h
index 6548b35..4dbfef3 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -40,11 +40,46 @@ typedef struct __user_cap_data_struct {
__u32 inheritable;
} __user *cap_user_data_t;

+
+
+#define XATTR_CAPS_SUFFIX "capability"
+#define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX
+
+/* size of caps that we work with */
+#define XATTR_CAPS_SZ (4*sizeof(__le32))
+
+/*
+ * data[] is organized as:
+ * effective[0]
+ * permitted[0]
+ * inheritable[0]
+ * effective[1]
+ * ...
+ * this way we can just read as much of the on-disk capability as
+ * we know should exist and know we'll get the data we'll need.
+ */
+struct vfs_cap_data_disk {
+ __le32 version;
+ __le32 data[]; /* eff[0], perm[0], inh[0], eff[1], ... */
+};
+
+struct vfs_cap_data_disk_v1 {
+ __le32 version;
+ __le32 data[3]; /* eff[0], perm[0], inh[0] */
+};
+
#ifdef __KERNEL__

#include <linux/spinlock.h>
#include <asm/current.h>

+struct vfs_cap_data {
+ __u32 version;
+ __u32 effective;
+ __u32 permitted;
+ __u32 inheritable;
+};
+
/* #define STRICT_CAP_T_TYPECHECKS */

#ifdef STRICT_CAP_T_TYPECHECKS
@@ -288,6 +323,8 @@ typedef __u32 kernel_cap_t;

#define CAP_AUDIT_CONTROL 30

+#define CAP_NUMCAPS 31
+
#ifdef __KERNEL__
/*
* Bounding set
diff --git a/include/linux/security.h b/include/linux/security.h
index 9eb9e0f..1a362c8 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -53,6 +53,10 @@ extern int cap_inode_setxattr(struct den
extern int cap_inode_removexattr(struct dentry *dentry, char *name);
extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags);
extern void cap_task_reparent_to_init (struct task_struct *p);
+extern int cap_task_kill(struct task_struct *p, struct siginfo *info, int sig, u32 secid);
+extern int cap_task_setscheduler (struct task_struct *p, int policy, struct sched_param *lp);
+extern int cap_task_setioprio (struct task_struct *p, int ioprio);
+extern int cap_task_setnice (struct task_struct *p, int nice);
extern int cap_syslog (int type);
extern int cap_vm_enough_memory (long pages);

@@ -2585,12 +2589,12 @@ static inline int security_task_setgroup

static inline int security_task_setnice (struct task_struct *p, int nice)
{
- return 0;
+ return cap_task_setnice(p, nice);
}

static inline int security_task_setioprio (struct task_struct *p, int ioprio)
{
- return 0;
+ return cap_task_setioprio(p, ioprio);
}

static inline int security_task_getioprio (struct task_struct *p)
@@ -2608,7 +2612,7 @@ static inline int security_task_setsched
int policy,
struct sched_param *lp)
{
- return 0;
+ return cap_task_setscheduler(p, policy, lp);
}

static inline int security_task_getscheduler (struct task_struct *p)
@@ -2625,7 +2629,7 @@ static inline int security_task_kill (st
struct siginfo *info, int sig,
u32 secid)
{
- return 0;
+ return cap_task_kill(p, info, sig, secid);
}

static inline int security_task_wait (struct task_struct *p)
diff --git a/security/Kconfig b/security/Kconfig
index 460e5c9..7c941d9 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -80,6 +80,16 @@ config SECURITY_CAPABILITIES
This enables the "default" Linux capabilities functionality.
If you are unsure how to answer this question, answer Y.

+config SECURITY_FILE_CAPABILITIES
+ bool "File POSIX Capabilities"
+ depends on SECURITY=n || SECURITY_CAPABILITIES!=n
+ default n
+ help
+ This enables filesystem capabilities, allowing you to give
+ binaries a subset of root's powers without using setuid 0.
+
+ If in doubt, answer N.
+
config SECURITY_ROOTPLUG
tristate "Root Plug Support"
depends on USB && SECURITY
diff --git a/security/capability.c b/security/capability.c
index 38296a0..0db5fda 100644
--- a/security/capability.c
+++ b/security/capability.c
@@ -39,6 +39,10 @@ static struct security_operations capabi
.inode_setxattr = cap_inode_setxattr,
.inode_removexattr = cap_inode_removexattr,

+ .task_kill = cap_task_kill,
+ .task_setscheduler = cap_task_setscheduler,
+ .task_setioprio = cap_task_setioprio,
+ .task_setnice = cap_task_setnice,
.task_post_setuid = cap_task_post_setuid,
.task_reparent_to_init = cap_task_reparent_to_init,

diff --git a/security/commoncap.c b/security/commoncap.c
index 384379e..5c86fec 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -22,6 +22,7 @@
#include <linux/ptrace.h>
#include <linux/xattr.h>
#include <linux/hugetlb.h>
+#include <linux/mount.h>

int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
{
@@ -108,15 +109,106 @@ void cap_capset_set (struct task_struct
target->cap_permitted = *permitted;
}

+#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
+
+static inline int cap_from_disk(struct vfs_cap_data_disk *dcap,
+ struct linux_binprm *bprm, int size)
+{
+ int version;
+
+ version = le32_to_cpu(dcap->version);
+ if (version != _LINUX_CAPABILITY_VERSION)
+ return -EINVAL;
+
+ size /= sizeof(u32);
+ if ((size-1)%3) {
+ printk(KERN_WARNING "%s: size is an invalid size %d for %s\n",
+ __FUNCTION__, size, bprm->filename);
+ return -EINVAL;
+ }
+
+ bprm->cap_effective = le32_to_cpu(dcap->data[0]);
+ bprm->cap_permitted = le32_to_cpu(dcap->data[1]);
+ bprm->cap_inheritable = le32_to_cpu(dcap->data[2]);
+
+ return 0;
+}
+
+/* Locate any VFS capabilities: */
+static int set_file_caps(struct linux_binprm *bprm)
+{
+ struct dentry *dentry;
+ int rc;
+ struct vfs_cap_data_disk_v1 v1caps;
+ struct vfs_cap_data_disk *dcaps;
+ struct inode *inode;
+
+ dcaps = (struct vfs_cap_data_disk *)&v1caps;
+ if (bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)
+ return 0;
+
+ dentry = dget(bprm->file->f_dentry);
+ inode = dentry->d_inode;
+ rc = 0;
+ if (!inode->i_op || !inode->i_op->getxattr)
+ goto out;
+
+ rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, dcaps,
+ XATTR_CAPS_SZ);
+ if (rc == -ENODATA || rc == -EOPNOTSUPP) {
+ rc = 0;
+ goto out;
+ }
+ if (rc == -ERANGE) {
+ int size;
+ size = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0);
+ if (size <= 0) { /* shouldn't ever happen */
+ rc = -EINVAL;
+ goto out;
+ }
+ dcaps = kmalloc(size, GFP_KERNEL);
+ if (!dcaps) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ rc = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, dcaps,
+ size);
+ }
+ if (rc < 0)
+ goto out;
+ if (rc < sizeof(struct vfs_cap_data_disk_v1)) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+ rc = cap_from_disk(dcaps, bprm, rc);
+
+out:
+ dput(dentry);
+ if ((void *)dcaps != (void *)&v1caps)
+ kfree(dcaps);
+ return rc;
+}
+
+#else
+static inline int set_file_caps(struct linux_binprm *bprm)
+{
+ return 0;
+}
+#endif
+
int cap_bprm_set_security (struct linux_binprm *bprm)
{
+ int ret;
+
/* Copied from fs/exec.c:prepare_binprm. */

- /* We don't have VFS support for capabilities yet */
cap_clear (bprm->cap_inheritable);
cap_clear (bprm->cap_permitted);
cap_clear (bprm->cap_effective);

+ ret = set_file_caps(bprm);
+
/* To support inheritance of root-permissions and suid-root
* executables under compatibility mode, we raise all three
* capability sets for the file.
@@ -133,7 +225,8 @@ int cap_bprm_set_security (struct linux_
if (bprm->e_uid == 0)
cap_set_full (bprm->cap_effective);
}
- return 0;
+
+ return ret;
}

void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
@@ -181,11 +274,15 @@ void cap_bprm_apply_creds (struct linux_

int cap_bprm_secureexec (struct linux_binprm *bprm)
{
- /* If/when this module is enhanced to incorporate capability
- bits on files, the test below should be extended to also perform a
- test between the old and new capability sets. For now,
- it simply preserves the legacy decision algorithm used by
- the old userland. */
+ if (current->uid != 0) {
+ if (!cap_isclear(bprm->cap_effective))
+ return 1;
+ if (!cap_isclear(bprm->cap_permitted))
+ return 1;
+ if (!cap_isclear(bprm->cap_inheritable))
+ return 1;
+ }
+
return (current->euid != current->uid ||
current->egid != current->gid);
}
@@ -299,6 +396,83 @@ int cap_task_post_setuid (uid_t old_ruid
return 0;
}

+#ifdef CONFIG_SECURITY_FILE_CAPABILITIES
+/*
+ * Rationale: code calling task_setscheduler, task_setioprio, and
+ * task_setnice, assumes that
+ * . if capable(cap_sys_nice), then those actions should be allowed
+ * . if not capable(cap_sys_nice), but acting on your own processes,
+ * then those actions should be allowed
+ * This is insufficient now since you can call code without suid, but
+ * yet with increased caps.
+ * So we check for increased caps on the target process.
+ */
+static inline int cap_safe_nice(struct task_struct *p)
+{
+ if (!cap_issubset(p->cap_permitted, current->cap_permitted) &&
+ !__capable(current, CAP_SYS_NICE))
+ return -EPERM;
+ return 0;
+}
+
+int cap_task_setscheduler (struct task_struct *p, int policy,
+ struct sched_param *lp)
+{
+ return cap_safe_nice(p);
+}
+
+int cap_task_setioprio (struct task_struct *p, int ioprio)
+{
+ return cap_safe_nice(p);
+}
+
+int cap_task_setnice (struct task_struct *p, int nice)
+{
+ return cap_safe_nice(p);
+}
+
+int cap_task_kill(struct task_struct *p, struct siginfo *info,
+ int sig, u32 secid)
+{
+ if (info != SEND_SIG_NOINFO && (is_si_special(info) || SI_FROMKERNEL(info)))
+ return 0;
+
+ if (secid)
+ /*
+ * Signal sent as a particular user.
+ * Capabilities are ignored. May be wrong, but it's the
+ * only thing we can do at the moment.
+ * Used only by usb drivers?
+ */
+ return 0;
+ if (cap_issubset(p->cap_permitted, current->cap_permitted))
+ return 0;
+ if (capable(CAP_KILL))
+ return 0;
+
+ return -EPERM;
+}
+#else
+int cap_task_setscheduler (struct task_struct *p, int policy,
+ struct sched_param *lp)
+{
+ return 0;
+}
+int cap_task_setioprio (struct task_struct *p, int ioprio)
+{
+ return 0;
+}
+int cap_task_setnice (struct task_struct *p, int nice)
+{
+ return 0;
+}
+int cap_task_kill(struct task_struct *p, struct siginfo *info,
+ int sig, u32 secid)
+{
+ return 0;
+}
+#endif
+
void cap_task_reparent_to_init (struct task_struct *p)
{
p->cap_effective = CAP_INIT_EFF_SET;
@@ -336,6 +510,10 @@ EXPORT_SYMBOL(cap_bprm_secureexec);
EXPORT_SYMBOL(cap_inode_setxattr);
EXPORT_SYMBOL(cap_inode_removexattr);
EXPORT_SYMBOL(cap_task_post_setuid);
+EXPORT_SYMBOL(cap_task_kill);
+EXPORT_SYMBOL(cap_task_setscheduler);
+EXPORT_SYMBOL(cap_task_setioprio);
+EXPORT_SYMBOL(cap_task_setnice);
EXPORT_SYMBOL(cap_task_reparent_to_init);
EXPORT_SYMBOL(cap_syslog);
EXPORT_SYMBOL(cap_vm_enough_memory);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ad8dd4e..af42820 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2823,6 +2823,12 @@ static int selinux_task_setnice(struct t

static int selinux_task_setioprio(struct task_struct *p, int ioprio)
{
+ int rc;
+
+ rc = secondary_ops->task_setioprio(p, ioprio);
+ if (rc)
+ return rc;
+
return task_has_perm(current, p, PROCESS__SETSCHED);
}

@@ -2852,6 +2858,12 @@ static int selinux_task_setrlimit(unsign

static int selinux_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp)
{
+ int rc;
+
+ rc = secondary_ops->task_setscheduler(p, policy, lp);
+ if (rc)
+ return rc;
+
return task_has_perm(current, p, PROCESS__SETSCHED);
}

--
1.1.6
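
For anyone wanting to poke at the on-disk format from userspace, the checks in cap_from_disk() above can be mirrored roughly as follows. This is a sketch, not part of the patch: decode_file_caps(), get_le32() and struct file_caps are names invented here, and FCAP_VERSION assumes that _LINUX_CAPABILITY_VERSION is 0x19980330, as in kernel headers of this era.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of _LINUX_CAPABILITY_VERSION. */
#define FCAP_VERSION 0x19980330u

struct file_caps {
	uint32_t effective;
	uint32_t permitted;
	uint32_t inheritable;
};

/* Read a little-endian 32-bit word regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * buf/len: raw value of the security.capability xattr.
 * Layout: version, eff[0], perm[0], inh[0], eff[1], ...
 * Only the first triple is used; later triples are ignored.
 */
static int decode_file_caps(const unsigned char *buf, size_t len,
			    struct file_caps *out)
{
	size_t words;

	assert(buf != NULL && out != NULL);

	if (len < 16)			/* version + one triple */
		return -EINVAL;
	if (get_le32(buf) != FCAP_VERSION)
		return -EINVAL;

	words = len / 4;
	if ((words - 1) % 3)		/* must be whole triples */
		return -EINVAL;

	out->effective   = get_le32(buf + 4);
	out->permitted   = get_le32(buf + 8);
	out->inheritable = get_le32(buf + 12);
	return 0;
}
```

As in the kernel code, a longer xattr with additional whole triples is accepted and the extra words are simply not read.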

2007-05-14 20:02:36

by Pavel Machek

Subject: Re: [PATCH 0/2] file capabilities: Introduction

Hi!

> "Serge E. Hallyn" <[email protected]> wrote:
>
> > Following are two patches which have been sitting for some time in -mm.
>
> Where "some time" == "nearly six months".
>
> We need help considering, reviewing and testing this code, please.

I did a quick scan, and it looks ok. Plus, it means we can finally start
using that old capabilities subsystem... so I think we should do it.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-05-17 05:56:28

by Suparna Bhattacharya

Subject: Re: [PATCH 0/2] file capabilities: Introduction

On Mon, May 14, 2007 at 08:00:11PM +0000, Pavel Machek wrote:
> Hi!
>
> > "Serge E. Hallyn" <[email protected]> wrote:
> >
> > > Following are two patches which have been sitting for some time in -mm.
> >
> > Where "some time" == "nearly six months".
> >
> > We need help considering, reviewing and testing this code, please.
>
> I did a quick scan, and it looks ok. Plus, it means we can finally start
> using that old capabilities subsystem... so I think we should do it.

FWIW, I looked through it recently as well, and it looked reasonable enough
to me, though I'm not a security expert. I did have a question about
testing corner cases etc, which Serge has tried to address.

Serge, are you planning to post an update without STRICTXATTR ? That should
simplify the second patch.

Regards
Suparna

>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2007-05-17 12:51:47

by Serge E. Hallyn

Subject: Re: [PATCH 0/2] file capabilities: Introduction

Quoting Suparna Bhattacharya ([email protected]):
> On Mon, May 14, 2007 at 08:00:11PM +0000, Pavel Machek wrote:
> > Hi!
> >
> > > "Serge E. Hallyn" <[email protected]> wrote:
> > >
> > > > Following are two patches which have been sitting for some time in -mm.
> > >
> > > Where "some time" == "nearly six months".
> > >
> > > We need help considering, reviewing and testing this code, please.
> >
> > I did a quick scan, and it looks ok. Plus, it means we can finally start
> > using that old capabilities subsystem... so I think we should do it.
>
> FWIW, I looked through it recently as well, and it looked reasonable enough
> to me, though I'm not a security expert. I did have a question about
> testing corner cases etc, which Serge has tried to address.
>
> Serge, are you planning to post an update without STRICTXATTR ? That should
> simplify the second patch.

Sorry, I did but I guess I didn't cc: you on that reply.

It is at http://lkml.org/lkml/2007/5/14/276

thanks,
-serge