LinuxLists.cc - disable-cap-mlock

2004-04-01 13:59:21

by Andrea Arcangeli

Subject: disable-cap-mlock

Oracle needs this sysctl; I designed it and Ken Chen implemented it. I
guess Google won't dislike it either.

This is a lot simpler than the mlock rlimit, and it is what people really
need (not the rlimit). The rlimit can still be applied on top of
this. This should also be more efficient (besides being simpler).

Can you apply it to mainline?

http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc3-aa1/disable-cap-mlock-1

2004-04-01 14:12:42

by Martin Zwickel

Subject: Re: disable-cap-mlock

On Thu, 1 Apr 2004 15:59:20 +0200
Andrea Arcangeli bubbled:

> Oracle needs this sysctl; I designed it and Ken Chen implemented it. I
> guess Google won't dislike it either.
>
> This is a lot simpler than the mlock rlimit, and it is what people really
> need (not the rlimit). The rlimit can still be applied on top of
> this. This should also be more efficient (besides being simpler).
>
> Can you apply it to mainline?
>
> http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc3-aa1/disable-cap-mlock-1

this is the correct link:
http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc3/disable-cap-mlock-1

--
MyExcuse:
Processes running slowly due to weak power supply

Martin Zwickel
Research & Development

TechnoTrend AG <http://www.technotrend.de>

2004-04-01 16:48:55

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 03:59:20PM +0200, Andrea Arcangeli wrote:
> Oracle needs this sysctl; I designed it and Ken Chen implemented it. I
> guess Google won't dislike it either.
> This is a lot simpler than the mlock rlimit, and it is what people really
> need (not the rlimit). The rlimit can still be applied on top of
> this. This should also be more efficient (besides being simpler).
> Can you apply it to mainline?
> http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc3-aa1/disable-cap-mlock-1

Something like this would have the minor advantage of zero core impact.
Testbooted only. vs. 2.6.5-rc3-mm4

-- wli

$ diffstat -p1 patches/capable_sysctl
security/Kconfig | 6 +
security/Makefile | 1
security/sysctl_capable.c | 205 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 212 insertions(+)

Index: mm4-2.6.5-rc3/security/sysctl_capable.c
===================================================================
--- mm4-2.6.5-rc3.orig/security/sysctl_capable.c 2004-02-07 18:26:35.000000000 -0800
+++ mm4-2.6.5-rc3/security/sysctl_capable.c 2004-04-01 08:41:08.000000000 -0800
@@ -0,0 +1,205 @@
+#include <linux/config.h>
+#include <linux/sysctl.h>
+#include <linux/capability.h>
+#include <linux/security.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+/*
+ * apparently only 0-28 are used
+ * states:
+ * 0: checks enabled
+ * 1: checks disabled
+ * 2: root-only
+ * 3: no access whatsoever
+ */
+#define CAP_SYSCTL_CHOWN (1 + CAP_CHOWN)
+#define CAP_SYSCTL_DAC_OVERRIDE (1 + CAP_DAC_OVERRIDE)
+#define CAP_SYSCTL_DAC_READ_SEARCH (1 + CAP_DAC_READ_SEARCH)
+#define CAP_SYSCTL_FOWNER (1 + CAP_FOWNER)
+#define CAP_SYSCTL_FSETID (1 + CAP_FSETID)
+#define CAP_SYSCTL_KILL (1 + CAP_KILL)
+#define CAP_SYSCTL_SETGID (1 + CAP_SETGID)
+#define CAP_SYSCTL_SETUID (1 + CAP_SETUID)
+#define CAP_SYSCTL_SETPCAP (1 + CAP_SETPCAP)
+#define CAP_SYSCTL_LINUX_IMMUTABLE (1 + CAP_LINUX_IMMUTABLE)
+#define CAP_SYSCTL_NET_BIND_SERVICE (1 + CAP_NET_BIND_SERVICE)
+#define CAP_SYSCTL_NET_BROADCAST (1 + CAP_NET_BROADCAST)
+#define CAP_SYSCTL_NET_ADMIN (1 + CAP_NET_ADMIN)
+#define CAP_SYSCTL_NET_RAW (1 + CAP_NET_RAW)
+#define CAP_SYSCTL_IPC_LOCK (1 + CAP_IPC_LOCK)
+#define CAP_SYSCTL_IPC_OWNER (1 + CAP_IPC_OWNER)
+#define CAP_SYSCTL_SYS_MODULE (1 + CAP_SYS_MODULE)
+#define CAP_SYSCTL_SYS_RAWIO (1 + CAP_SYS_RAWIO)
+#define CAP_SYSCTL_SYS_CHROOT (1 + CAP_SYS_CHROOT)
+#define CAP_SYSCTL_SYS_PTRACE (1 + CAP_SYS_PTRACE)
+#define CAP_SYSCTL_SYS_PACCT (1 + CAP_SYS_PACCT)
+#define CAP_SYSCTL_SYS_ADMIN (1 + CAP_SYS_ADMIN)
+#define CAP_SYSCTL_SYS_BOOT (1 + CAP_SYS_BOOT)
+#define CAP_SYSCTL_SYS_NICE (1 + CAP_SYS_NICE)
+#define CAP_SYSCTL_SYS_RESOURCE (1 + CAP_SYS_RESOURCE)
+#define CAP_SYSCTL_SYS_TIME (1 + CAP_SYS_TIME)
+#define CAP_SYSCTL_SYS_TTY_CONFIG (1 + CAP_SYS_TTY_CONFIG)
+#define CAP_SYSCTL_MKNOD (1 + CAP_MKNOD)
+#define CAP_SYSCTL_LEASE (1 + CAP_LEASE)
+#define MAX_CAPABILITY CAP_SYSCTL_LEASE
+
+#define CAPABILITY_SYSCTL_ENABLED 0
+#define CAPABILITY_SYSCTL_DISABLED 1
+#define CAPABILITY_SYSCTL_ROOT 2
+#define CAPABILITY_SYSCTL_NONE 3
+
+
+/* you've got to be kidding me */
+#define MKCTL(x, y) \
+ { \
+ .ctl_name = CAP_SYSCTL_##x, \
+ .procname = #y , \
+ .extra1 = (void *)&capability_sysctl_zero, \
+ .extra2 = (void *)&capability_sysctl_one, \
+ .data = &capability_sysctl_state[CAP_SYSCTL_##x],\
+ .mode = 0644, \
+ .strategy = sysctl_intvec, \
+ .proc_handler = proc_dointvec_minmax, \
+ .maxlen = sizeof(int), \
+ },
+
+static int capability_sysctl_state[MAX_CAPABILITY];
+static const int capability_sysctl_zero = 0;
+static const int capability_sysctl_one = 1;
+static int secondary;
+static struct ctl_table_header *capability_sysctl_table_header;
+
+static struct ctl_table capability_sysctl_table[] = {
+ MKCTL(CHOWN, chown)
+ MKCTL(DAC_OVERRIDE, dac_override)
+ MKCTL(DAC_READ_SEARCH, dac_read_search)
+ MKCTL(FOWNER, fowner)
+ MKCTL(FSETID, fsetid)
+ MKCTL(KILL, kill)
+ MKCTL(SETGID, setgid)
+ MKCTL(SETUID, setuid)
+ MKCTL(SETPCAP, setpcap)
+ MKCTL(LINUX_IMMUTABLE, immutable)
+ MKCTL(NET_BIND_SERVICE, bind)
+ MKCTL(NET_BROADCAST, broadcast)
+ MKCTL(NET_ADMIN, net_admin)
+ MKCTL(NET_RAW, net_raw)
+ MKCTL(IPC_LOCK, ipc_lock)
+ MKCTL(IPC_OWNER, ipc_owner)
+ MKCTL(SYS_MODULE, module)
+ MKCTL(SYS_RAWIO, rawio)
+ MKCTL(SYS_CHROOT, chroot)
+ MKCTL(SYS_PTRACE, ptrace)
+ MKCTL(SYS_PACCT, pacct)
+ MKCTL(SYS_ADMIN, sys_admin)
+ MKCTL(SYS_BOOT, boot)
+ MKCTL(SYS_NICE, nice)
+ MKCTL(SYS_RESOURCE, resource)
+ MKCTL(SYS_TIME, time)
+ MKCTL(SYS_TTY_CONFIG, tty_config)
+ MKCTL(MKNOD, mknod)
+ MKCTL(LEASE, lease)
+ {
+ .ctl_name = 0,
+ },
+};
+
+static int capability_sysctl_capable(task_t *, int);
+
+static struct ctl_table capability_sysctl_root_table[] = {
+ {
+ .ctl_name = CTL_KERN,
+ .procname = "capability",
+ .mode = 0644,
+ .child = capability_sysctl_table,
+ },
+ {
+ .ctl_name = 0,
+ },
+};
+
+static struct security_operations capability_sysctl_ops = {
+ .ptrace = cap_ptrace,
+ .capget = cap_capget,
+ .capset_check = cap_capset_check,
+ .capset_set = cap_capset_set,
+ .capable = capability_sysctl_capable,
+ .netlink_send = cap_netlink_send,
+ .netlink_recv = cap_netlink_recv,
+ .bprm_compute_creds = cap_bprm_compute_creds,
+ .bprm_set_security = cap_bprm_set_security,
+ .bprm_secureexec = cap_bprm_secureexec,
+ .inode_setxattr = cap_inode_setxattr,
+ .inode_removexattr = cap_inode_removexattr,
+ .task_post_setuid = cap_task_post_setuid,
+ .task_reparent_to_init = cap_task_reparent_to_init,
+ .syslog = cap_syslog,
+ .vm_enough_memory = cap_vm_enough_memory,
+};
+
+
+static int capability_sysctl_capable(task_t *task, int cap)
+{
+ if (cap < 0 || cap >= MAX_CAPABILITY)
+ return -EINVAL;
+ switch (capability_sysctl_state[cap-1]) {
+ case CAPABILITY_SYSCTL_ROOT:
+ if (current->uid == 0)
+ return 0;
+ /* fall through */
+ case CAPABILITY_SYSCTL_ENABLED:
+ if (cap_raised(task->cap_effective, cap))
+ return 0;
+ else
+ return -EPERM;
+ break;
+ case CAPABILITY_SYSCTL_DISABLED:
+ return 0;
+ break;
+ case CAPABILITY_SYSCTL_NONE:
+ return -EPERM;
+ break;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int capability_sysctl_proc_init(void)
+{
+ capability_sysctl_table_header =
+ register_sysctl_table(capability_sysctl_root_table, 0);
+ if (!capability_sysctl_table_header)
+ return -ENOMEM;
+ else
+ return 0;
+}
+
+static int __init capability_sysctl_init(void)
+{
+ if (!register_security(&capability_sysctl_ops)) {
+ secondary = 0;
+ return 0;
+ }
+ if (!mod_reg_security("capability_sysctl", &capability_sysctl_ops)) {
+ secondary = 1;
+ return 0;
+ }
+ printk(KERN_INFO "failure registering sysctl capability disablement\n");
+ return -EINVAL;
+}
+
+static void __exit capability_sysctl_fini(void)
+{
+ if (secondary)
+ mod_unreg_security("capability_sysctl", &capability_sysctl_ops);
+ else
+ unregister_security(&capability_sysctl_ops);
+ if (capability_sysctl_table_header)
+ unregister_sysctl_table(capability_sysctl_table_header);
+}
+security_initcall(capability_sysctl_init);
+module_init(capability_sysctl_proc_init);
+module_exit(capability_sysctl_fini);
+MODULE_DESCRIPTION("Sysctl-based capability check disablement");
+MODULE_LICENSE("GPL");
Index: mm4-2.6.5-rc3/security/Makefile
===================================================================
--- mm4-2.6.5-rc3.orig/security/Makefile 2004-03-29 19:26:54.000000000 -0800
+++ mm4-2.6.5-rc3/security/Makefile 2004-04-01 07:37:41.000000000 -0800
@@ -15,3 +15,4 @@
obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o
obj-$(CONFIG_SECURITY_CAPABILITIES) += commoncap.o capability.o
obj-$(CONFIG_SECURITY_ROOTPLUG) += commoncap.o root_plug.o
+obj-$(CONFIG_SECURITY_CAPABILITY_SYSCTL) += commoncap.o sysctl_capable.o
Index: mm4-2.6.5-rc3/security/Kconfig
===================================================================
--- mm4-2.6.5-rc3.orig/security/Kconfig 2004-03-29 19:26:47.000000000 -0800
+++ mm4-2.6.5-rc3/security/Kconfig 2004-04-01 07:38:49.000000000 -0800
@@ -44,6 +44,12 @@

If you are unsure how to answer this question, answer N.

+config SECURITY_CAPABILITY_SYSCTL
+ bool "Disable capabilities via sysctl"
+ depends on SECURITY!=n
+ help
+ This allows you to disable capabilities with sysctls.
+
source security/selinux/Kconfig

endmenu

2004-04-01 17:01:32

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 08:48:25AM -0800, William Lee Irwin III wrote:
> On Thu, Apr 01, 2004 at 03:59:20PM +0200, Andrea Arcangeli wrote:
> > Oracle needs this sysctl; I designed it and Ken Chen implemented it. I
> > guess Google won't dislike it either.
> > This is a lot simpler than the mlock rlimit, and it is what people really
> > need (not the rlimit). The rlimit can still be applied on top of
> > this. This should also be more efficient (besides being simpler).
> > Can you apply it to mainline?
> > http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc3-aa1/disable-cap-mlock-1
>
> Something like this would have the minor advantage of zero core impact.
> Testbooted only. vs. 2.6.5-rc3-mm4

I certainly like this too (it's more complicated, but it might spare us
from having to add further sysctls in the future). Andrew, which do
you prefer to merge? I don't mind either way.

2004-04-01 17:10:45

by Marc-Christian Petersen

Subject: Re: disable-cap-mlock

On Thursday 01 April 2004 18:59, Andrea Arcangeli wrote:

Hi Andrea, Bill,

> > Something like this would have the minor advantage of zero core impact.
> > Testbooted only. vs. 2.6.5-rc3-mm4

Cool!

> I certainly like this too (it's more complicated, but it might spare us
> from having to add further sysctls in the future). Andrew, which do
> you prefer to merge? I don't mind either way.

I'd vote for caps via sysctl.

ciao, Marc

2004-04-01 17:16:51

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 08:48:25AM -0800, William Lee Irwin III wrote:
>> Something like this would have the minor advantage of zero core impact.
>> Testbooted only. vs. 2.6.5-rc3-mm4

On Thu, Apr 01, 2004 at 06:59:52PM +0200, Andrea Arcangeli wrote:
> I certainly like this too (it's more complicated, but it might spare us
> from having to add further sysctls in the future). Andrew, which do
> you prefer to merge? I don't mind either way.

There are a couple of off-by-ones in there; fixes are below.

I had it in mind less as a pet patch than as an example of the
general idea of using the security infrastructure to isolate the
default/core mechanisms from these specialized needs.

My personal preference is actually resolving userspace API issues (e.g.
making pam_cap work) and/or outstanding implementation issues (e.g. the
RLIMIT_MEMLOCK bits), but since there appears to be such a loud outcry
and/or high demand for this sort of thing, here it is in its full
generality. Updated patch (runtime tested) below.

-- wli

Index: mm4-2.6.5-rc3/security/sysctl_capable.c
===================================================================
--- mm4-2.6.5-rc3.orig/security/sysctl_capable.c 2004-02-07 18:26:35.000000000 -0800
+++ mm4-2.6.5-rc3/security/sysctl_capable.c 2004-04-01 09:07:36.000000000 -0800
@@ -0,0 +1,205 @@
+#include <linux/config.h>
+#include <linux/sysctl.h>
+#include <linux/capability.h>
+#include <linux/security.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+/*
+ * apparently only 0-28 are used
+ * states:
+ * 0: checks enabled
+ * 1: checks disabled
+ * 2: root-only
+ * 3: no access whatsoever
+ */
+#define CAP_SYSCTL_CHOWN (1 + CAP_CHOWN)
+#define CAP_SYSCTL_DAC_OVERRIDE (1 + CAP_DAC_OVERRIDE)
+#define CAP_SYSCTL_DAC_READ_SEARCH (1 + CAP_DAC_READ_SEARCH)
+#define CAP_SYSCTL_FOWNER (1 + CAP_FOWNER)
+#define CAP_SYSCTL_FSETID (1 + CAP_FSETID)
+#define CAP_SYSCTL_KILL (1 + CAP_KILL)
+#define CAP_SYSCTL_SETGID (1 + CAP_SETGID)
+#define CAP_SYSCTL_SETUID (1 + CAP_SETUID)
+#define CAP_SYSCTL_SETPCAP (1 + CAP_SETPCAP)
+#define CAP_SYSCTL_LINUX_IMMUTABLE (1 + CAP_LINUX_IMMUTABLE)
+#define CAP_SYSCTL_NET_BIND_SERVICE (1 + CAP_NET_BIND_SERVICE)
+#define CAP_SYSCTL_NET_BROADCAST (1 + CAP_NET_BROADCAST)
+#define CAP_SYSCTL_NET_ADMIN (1 + CAP_NET_ADMIN)
+#define CAP_SYSCTL_NET_RAW (1 + CAP_NET_RAW)
+#define CAP_SYSCTL_IPC_LOCK (1 + CAP_IPC_LOCK)
+#define CAP_SYSCTL_IPC_OWNER (1 + CAP_IPC_OWNER)
+#define CAP_SYSCTL_SYS_MODULE (1 + CAP_SYS_MODULE)
+#define CAP_SYSCTL_SYS_RAWIO (1 + CAP_SYS_RAWIO)
+#define CAP_SYSCTL_SYS_CHROOT (1 + CAP_SYS_CHROOT)
+#define CAP_SYSCTL_SYS_PTRACE (1 + CAP_SYS_PTRACE)
+#define CAP_SYSCTL_SYS_PACCT (1 + CAP_SYS_PACCT)
+#define CAP_SYSCTL_SYS_ADMIN (1 + CAP_SYS_ADMIN)
+#define CAP_SYSCTL_SYS_BOOT (1 + CAP_SYS_BOOT)
+#define CAP_SYSCTL_SYS_NICE (1 + CAP_SYS_NICE)
+#define CAP_SYSCTL_SYS_RESOURCE (1 + CAP_SYS_RESOURCE)
+#define CAP_SYSCTL_SYS_TIME (1 + CAP_SYS_TIME)
+#define CAP_SYSCTL_SYS_TTY_CONFIG (1 + CAP_SYS_TTY_CONFIG)
+#define CAP_SYSCTL_MKNOD (1 + CAP_MKNOD)
+#define CAP_SYSCTL_LEASE (1 + CAP_LEASE)
+#define MAX_CAPABILITY CAP_SYSCTL_LEASE
+
+#define CAPABILITY_SYSCTL_ENABLED 0
+#define CAPABILITY_SYSCTL_DISABLED 1
+#define CAPABILITY_SYSCTL_ROOT 2
+#define CAPABILITY_SYSCTL_NONE 3
+
+
+/* you've got to be kidding me */
+#define MKCTL(x, y) \
+ { \
+ .ctl_name = CAP_SYSCTL_##x, \
+ .procname = #y , \
+ .extra1 = (void *)&capability_sysctl_zero, \
+ .extra2 = (void *)&capability_sysctl_one, \
+ .data = &capability_sysctl_state[CAP_##x], \
+ .mode = 0644, \
+ .strategy = sysctl_intvec, \
+ .proc_handler = proc_dointvec_minmax, \
+ .maxlen = sizeof(int), \
+ },
+
+static int capability_sysctl_state[MAX_CAPABILITY];
+static const int capability_sysctl_zero = 0;
+static const int capability_sysctl_one = 1;
+static int secondary;
+static struct ctl_table_header *capability_sysctl_table_header;
+
+static struct ctl_table capability_sysctl_table[] = {
+ MKCTL(CHOWN, chown)
+ MKCTL(DAC_OVERRIDE, dac_override)
+ MKCTL(DAC_READ_SEARCH, dac_read_search)
+ MKCTL(FOWNER, fowner)
+ MKCTL(FSETID, fsetid)
+ MKCTL(KILL, kill)
+ MKCTL(SETGID, setgid)
+ MKCTL(SETUID, setuid)
+ MKCTL(SETPCAP, setpcap)
+ MKCTL(LINUX_IMMUTABLE, immutable)
+ MKCTL(NET_BIND_SERVICE, bind)
+ MKCTL(NET_BROADCAST, broadcast)
+ MKCTL(NET_ADMIN, net_admin)
+ MKCTL(NET_RAW, net_raw)
+ MKCTL(IPC_LOCK, ipc_lock)
+ MKCTL(IPC_OWNER, ipc_owner)
+ MKCTL(SYS_MODULE, module)
+ MKCTL(SYS_RAWIO, rawio)
+ MKCTL(SYS_CHROOT, chroot)
+ MKCTL(SYS_PTRACE, ptrace)
+ MKCTL(SYS_PACCT, pacct)
+ MKCTL(SYS_ADMIN, sys_admin)
+ MKCTL(SYS_BOOT, boot)
+ MKCTL(SYS_NICE, nice)
+ MKCTL(SYS_RESOURCE, resource)
+ MKCTL(SYS_TIME, time)
+ MKCTL(SYS_TTY_CONFIG, tty_config)
+ MKCTL(MKNOD, mknod)
+ MKCTL(LEASE, lease)
+ {
+ .ctl_name = 0,
+ },
+};
+
+static int capability_sysctl_capable(task_t *, int);
+
+static struct ctl_table capability_sysctl_root_table[] = {
+ {
+ .ctl_name = CTL_KERN,
+ .procname = "capability",
+ .mode = 0644,
+ .child = capability_sysctl_table,
+ },
+ {
+ .ctl_name = 0,
+ },
+};
+
+static struct security_operations capability_sysctl_ops = {
+ .ptrace = cap_ptrace,
+ .capget = cap_capget,
+ .capset_check = cap_capset_check,
+ .capset_set = cap_capset_set,
+ .capable = capability_sysctl_capable,
+ .netlink_send = cap_netlink_send,
+ .netlink_recv = cap_netlink_recv,
+ .bprm_compute_creds = cap_bprm_compute_creds,
+ .bprm_set_security = cap_bprm_set_security,
+ .bprm_secureexec = cap_bprm_secureexec,
+ .inode_setxattr = cap_inode_setxattr,
+ .inode_removexattr = cap_inode_removexattr,
+ .task_post_setuid = cap_task_post_setuid,
+ .task_reparent_to_init = cap_task_reparent_to_init,
+ .syslog = cap_syslog,
+ .vm_enough_memory = cap_vm_enough_memory,
+};
+
+
+static int capability_sysctl_capable(task_t *task, int cap)
+{
+ if (cap < 0 || cap >= ARRAY_SIZE(capability_sysctl_state))
+ return -EINVAL;
+ switch (capability_sysctl_state[cap]) {
+ case CAPABILITY_SYSCTL_ROOT:
+ if (current->uid == 0)
+ return 0;
+ /* fall through */
+ case CAPABILITY_SYSCTL_ENABLED:
+ if (cap_raised(task->cap_effective, cap))
+ return 0;
+ else
+ return -EPERM;
+ break;
+ case CAPABILITY_SYSCTL_DISABLED:
+ return 0;
+ break;
+ case CAPABILITY_SYSCTL_NONE:
+ return -EPERM;
+ break;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int capability_sysctl_proc_init(void)
+{
+ capability_sysctl_table_header =
+ register_sysctl_table(capability_sysctl_root_table, 0);
+ if (!capability_sysctl_table_header)
+ return -ENOMEM;
+ else
+ return 0;
+}
+
+static int __init capability_sysctl_init(void)
+{
+ if (!register_security(&capability_sysctl_ops)) {
+ secondary = 0;
+ return 0;
+ }
+ if (!mod_reg_security("capability_sysctl", &capability_sysctl_ops)) {
+ secondary = 1;
+ return 0;
+ }
+ printk(KERN_INFO "failure registering sysctl capability disablement\n");
+ return -EINVAL;
+}
+
+static void __exit capability_sysctl_fini(void)
+{
+ if (secondary)
+ mod_unreg_security("capability_sysctl", &capability_sysctl_ops);
+ else
+ unregister_security(&capability_sysctl_ops);
+ if (capability_sysctl_table_header)
+ unregister_sysctl_table(capability_sysctl_table_header);
+}
+security_initcall(capability_sysctl_init);
+module_init(capability_sysctl_proc_init);
+module_exit(capability_sysctl_fini);
+MODULE_DESCRIPTION("Sysctl-based capability check disablement");
+MODULE_LICENSE("GPL");
Index: mm4-2.6.5-rc3/security/Makefile
===================================================================
--- mm4-2.6.5-rc3.orig/security/Makefile 2004-03-29 19:26:54.000000000 -0800
+++ mm4-2.6.5-rc3/security/Makefile 2004-04-01 07:37:41.000000000 -0800
@@ -15,3 +15,4 @@
obj-$(CONFIG_SECURITY_SELINUX) += selinux/built-in.o
obj-$(CONFIG_SECURITY_CAPABILITIES) += commoncap.o capability.o
obj-$(CONFIG_SECURITY_ROOTPLUG) += commoncap.o root_plug.o
+obj-$(CONFIG_SECURITY_CAPABILITY_SYSCTL) += commoncap.o sysctl_capable.o
Index: mm4-2.6.5-rc3/security/Kconfig
===================================================================
--- mm4-2.6.5-rc3.orig/security/Kconfig 2004-03-29 19:26:47.000000000 -0800
+++ mm4-2.6.5-rc3/security/Kconfig 2004-04-01 07:38:49.000000000 -0800
@@ -44,6 +44,12 @@

If you are unsure how to answer this question, answer N.

+config SECURITY_CAPABILITY_SYSCTL
+ bool "Disable capabilities via sysctl"
+ depends on SECURITY!=n
+ help
+ This allows you to disable capabilities with sysctls.
+
source security/selinux/Kconfig

endmenu

2004-04-01 17:34:20

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

it's not compiling:

security/sysctl_capable.c:273: error: redefinition of `capability_sysctl_zero'
security/sysctl_capable.c:68: error: `capability_sysctl_zero' previously defined here
security/sysctl_capable.c:274: error: redefinition of `capability_sysctl_one'
security/sysctl_capable.c:69: error: `capability_sysctl_one' previously defined here
security/sysctl_capable.c:278: error: redefinition of `capability_sysctl_table'
security/sysctl_capable.c:73: error: `capability_sysctl_table' previously defined here
security/sysctl_capable.c:315: error: redefinition of `capability_sysctl_root_table'
security/sysctl_capable.c:110: error: `capability_sysctl_root_table' previously defined here
security/sysctl_capable.c:327: error: redefinition of `capability_sysctl_ops'
security/sysctl_capable.c:122: error: `capability_sysctl_ops' previously defined here
security/sysctl_capable.c:348: error: redefinition of `capability_sysctl_capable'
security/sysctl_capable.c:143: error: `capability_sysctl_capable' previously defined here
security/sysctl_capable.c:374: error: redefinition of `capability_sysctl_proc_init'
security/sysctl_capable.c:169: error: `capability_sysctl_proc_init' previously defined here
security/sysctl_capable.c:384: error: redefinition of `capability_sysctl_init'
security/sysctl_capable.c:179: error: `capability_sysctl_init' previously defined here
security/sysctl_capable.c:398: error: redefinition of `capability_sysctl_fini'
security/sysctl_capable.c:193: error: `capability_sysctl_fini' previously defined here
security/sysctl_capable.c:406: error: redefinition of `__initcall_capability_sysctl_init'
security/sysctl_capable.c:201: error: `__initcall_capability_sysctl_init' previously defined here
security/sysctl_capable.c:407: error: redefinition of `__initcall_capability_sysctl_proc_init'
security/sysctl_capable.c:202: error: `__initcall_capability_sysctl_proc_init' previously defined here
security/sysctl_capable.c:408: error: redefinition of `__exitcall_capability_sysctl_fini'
security/sysctl_capable.c:203: error: `__exitcall_capability_sysctl_fini' previously defined here
security/sysctl_capable.c:348: warning: `capability_sysctl_capable' defined but not used
make[1]: *** [security/sysctl_capable.o] Error 1
make: *** [security] Error 2

I'm still dealing with the swapsuspend crashes, so I cannot look into the
above right now. Swapsuspend now crashes in a different place, but I
believe that with my last patch the VM is OK now (there is no longer any
sign of crashes in radix-tree or pagecache operations).

2004-04-01 17:38:49

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 07:34:17PM +0200, Andrea Arcangeli wrote:
> it's not compiling:
> security/sysctl_capable.c:273: error: redefinition of `capability_sysctl_zero'
> security/sysctl_capable.c:68: error: `capability_sysctl_zero' previously defined here
> security/sysctl_capable.c:274: error: redefinition of `capability_sysctl_one'
> security/sysctl_capable.c:69: error: `capability_sysctl_one' previously defined here
> security/sysctl_capable.c:278: error: redefinition of `capability_sysctl_table'

Hmm, there aren't 270+ lines in the file; it looks like I may have posted
a full replacement instead of an incremental diff.

-- wli

2004-04-01 17:40:11

by Stephen Smalley

Subject: Re: disable-cap-mlock

On Thu, 2004-04-01 at 12:16, William Lee Irwin III wrote:
> +static int capability_sysctl_capable(task_t *task, int cap)
> +{
> + if (cap < 0 || cap >= ARRAY_SIZE(capability_sysctl_state))
> + return -EINVAL;
> + switch (capability_sysctl_state[cap]) {
> + case CAPABILITY_SYSCTL_ROOT:
> + if (current->uid == 0)
> + return 0;
> + /* fall through */

See dummy_capable for the root logic, i.e.:
	if (cap_is_fs_cap (cap) ? task->fsuid == 0 : task->euid == 0)
		return 0;

Note that you shouldn't assume that task == current. The intent is to
support capability checks against other processes as well, e.g. the old
OOM killer code performed such checks as part of deciding which process
to kill.

Why fall through as opposed to just returning -EPERM?

What prevents any uid 0 process from changing these sysctl settings
(aside from SELinux, if you happen to use it and configure the policy
accordingly)?

--
Stephen Smalley
National Security Agency

2004-04-01 17:42:46

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 09:38:43AM -0800, William Lee Irwin III wrote:
> On Thu, Apr 01, 2004 at 07:34:17PM +0200, Andrea Arcangeli wrote:
> > it's not compiling:
> > security/sysctl_capable.c:273: error: redefinition of `capability_sysctl_zero'
> > security/sysctl_capable.c:68: error: `capability_sysctl_zero' previously defined here
> > security/sysctl_capable.c:274: error: redefinition of `capability_sysctl_one'
> > security/sysctl_capable.c:69: error: `capability_sysctl_one' previously defined here
> > security/sysctl_capable.c:278: error: redefinition of `capability_sysctl_table'
>
> Hmm, there aren't 270+ lines in the file; it looks like I may have posted
> a full replacement instead of an incremental diff.

patch silently screwed up while applying it (I did -R and then reapplied
it twice). Patch should bomb on stuff like that, anyway. I'll try again
now.

2004-04-01 17:44:33

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 12:37:51PM -0500, Stephen Smalley wrote:
> See dummy_capable for the root logic, i.e.:
> 	if (cap_is_fs_cap (cap) ? task->fsuid == 0 : task->euid == 0)
> 		return 0;
> Note that you shouldn't assume that task == current. The intent is to
> support capability checks against other processes as well, e.g. the old
> OOM killer code performed such checks as part of deciding which process
> to kill.

That's a bogon; thanks for checking.

On Thu, Apr 01, 2004 at 12:37:51PM -0500, Stephen Smalley wrote:
> Why fall through as opposed to just returning -EPERM?

It's a made-up thing, so the semantics are totally contrived. I had
in mind a "root bypasses all capability checks" thing. Maybe it should
die.

On Thu, Apr 01, 2004 at 12:37:51PM -0500, Stephen Smalley wrote:
> What prevents any uid 0 process from changing these sysctl settings
> (aside from SELinux, if you happen to use it and configure the policy
> accordingly)?

I'm aware it does some very unintelligent things to the security model,
e.g. anyone with fs-level access to these things can basically escalate
their capabilities to "everything". Maybe some kind of big fat warning
is in order.

-- wli

Index: mm4-2.6.5-rc3/security/sysctl_capable.c
===================================================================
--- mm4-2.6.5-rc3.orig/security/sysctl_capable.c 2004-04-01 09:07:36.000000000 -0800
+++ mm4-2.6.5-rc3/security/sysctl_capable.c 2004-04-01 09:41:41.000000000 -0800
@@ -145,7 +145,7 @@
return -EINVAL;
switch (capability_sysctl_state[cap]) {
case CAPABILITY_SYSCTL_ROOT:
- if (current->uid == 0)
+ if (task->uid == 0)
return 0;
/* fall through */
case CAPABILITY_SYSCTL_ENABLED:

2004-04-01 17:49:12

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 09:44:05AM -0800, William Lee Irwin III wrote:
> I'm aware it does some very unintelligent things to the security model,
> e.g. anyone with fs-level access to these things can basically escalate
> their capabilities to "everything". Maybe some kind of big fat warning
> is in order.

A similar issue exists with disable-cap-mlock, but it grants only
mlock to whoever has fs-level (fsuid 0) access, so the security
implications are much smaller: there is no way to do anything really bad
with just mlock access. DoS is the very worst scenario, and as far as
security is concerned it's not very different from swapoff -a, or from
filling the swap entirely.

2004-04-01 17:51:36

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 12:37:51PM -0500, Stephen Smalley wrote:
>> What prevents any uid 0 process from changing these sysctl settings
>> (aside from SELinux, if you happen to use it and configure the policy
>> accordingly)?

On Thu, Apr 01, 2004 at 09:44:05AM -0800, William Lee Irwin III wrote:
> I'm aware it does some very unintelligent things to the security model,
> e.g. anyone with fs-level access to these things can basically escalate
> their capabilities to "everything". Maybe some kind of big fat warning
> is in order.

Index: mm4-2.6.5-rc3/security/Kconfig
===================================================================
--- mm4-2.6.5-rc3.orig/security/Kconfig 2004-04-01 07:38:49.000000000 -0800
+++ mm4-2.6.5-rc3/security/Kconfig 2004-04-01 09:49:43.000000000 -0800
@@ -49,6 +49,13 @@
depends on SECURITY!=n
help
This allows you to disable capabilities with sysctls.
+ It effectively breaks the kernel's security model so that
+ any user with fs-level access to /proc/sys/capability/*
+ can escalate their privileges to "able to do anything",
+ but some users have special-case needs for these things.
+ Don't use this on any system with untrusted local users.
+ It's probably best to firewall the living daylights out
+ of anything using this also.

source security/selinux/Kconfig

2004-04-01 17:54:49

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thursday 01 April 2004 19:44, William Lee Irwin III wrote:
>> I'm aware it does some very unintelligent things to the security model,
>> e.g. anyone with fs-level access to these things can basically escalate
>> their capabilities to "everything". Maybe some kind of big fat warning
>> is in order.

On Thu, Apr 01, 2004 at 07:52:29PM +0200, Marc-Christian Petersen wrote:
> hmm, maybe a /proc/sys/capability/lock: once it is set to 1, none of the
> sysctl variables can be changed, and not even root should be allowed
> to change lock back until you reboot. Practical?
> ciao, Marc

Feasible, though it's an open question as to how many hoops we should
jump through to prevent people from shooting themselves in the foot.

Maybe Stephen could recommend adjustments and/or give some idea as to
whether the above would be useful.

-- wli

2004-04-01 17:53:39

by Marc-Christian Petersen

Subject: Re: disable-cap-mlock

On Thursday 01 April 2004 19:44, William Lee Irwin III wrote:

Hi,

> > What prevents any uid 0 process from changing these sysctl settings
> > (aside from SELinux, if you happen to use it and configure the policy
> > accordingly)?

> I'm aware it does some very unintelligent things to the security model,
> e.g. anyone with fs-level access to these things can basically escalate
> their capabilities to "everything". Maybe some kind of big fat warning
> is in order.

hmm, maybe a /proc/sys/capability/lock: once it is set to 1, none of the
sysctl variables can be changed, and not even root should be allowed to
change lock back until you reboot. Practical?

ciao, Marc

2004-04-01 18:13:05

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 09:44:05AM -0800, William Lee Irwin III wrote:
>> I'm aware it does some very unintelligent things to the security model,
>> e.g. anyone with fs-level access to these things can basically escalate
>> their capabilities to "everything". Maybe some kind of big fat warning
>> is in order.

And maybe proper directory permissions, too.

Index: mm4-2.6.5-rc3/security/sysctl_capable.c
===================================================================
--- mm4-2.6.5-rc3.orig/security/sysctl_capable.c 2004-04-01 09:41:41.000000000 -0800
+++ mm4-2.6.5-rc3/security/sysctl_capable.c 2004-04-01 10:11:53.000000000 -0800
@@ -111,7 +111,7 @@
{
.ctl_name = CTL_KERN,
.procname = "capability",
- .mode = 0644,
+ .mode = 0555,
.child = capability_sysctl_table,
},
{

2004-04-01 18:34:38

by Andrew Morton

Subject: Re: disable-cap-mlock

William Lee Irwin III <[email protected]> wrote:
>
> On Thu, Apr 01, 2004 at 08:48:25AM -0800, William Lee Irwin III wrote:
> >> Something like this would have the minor advantage of zero core impact.
> >> Testbooted only. vs. 2.6.5-rc3-mm4
>
> On Thu, Apr 01, 2004 at 06:59:52PM +0200, Andrea Arcangeli wrote:
> > I certainly like this too (despite it's more complicated but it might
> > avoid us to have to add further sysctl in the future), Andrew what do
> > you prefer to merge? I don't mind either ways.

What is the Oracle requirement in detail?

If it's for access to hugetlbfs then there are the uid= and gid= mount
options.

If it's for access to SHM_HUGETLB then there was some discussion about
extending the uid= thing to shm, but nothing happened. This could be
resurrected.

If it's just generally for the ability to mlock lots of memory then
RLIMIT_MEMLOCK would be preferable. I don't see why we'd need the sysctl
when `ulimit -m' is available? (Where is that patch btw?)
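A minimal sketch of the RLIMIT_MEMLOCK route, assuming a Linux/glibc userspace: it lowers the calling process's soft locked-memory limit (which any unprivileged process may do, as long as it stays at or below the hard limit) and reads it back. The helper names are hypothetical. Note that in bash the locked-memory limit is set with `ulimit -l`; `ulimit -m` is the resident-set-size limit. RLIMIT_MEMLOCK is expressed in bytes.

```c
#define _GNU_SOURCE
#include <sys/resource.h>

/* Hypothetical helper: lower the soft RLIMIT_MEMLOCK to at most `bytes`,
 * clamped to the hard limit. Returns 0 on success, -1 with errno set. */
static int cap_memlock_soft(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    /* An unprivileged process may never exceed its hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}

/* Hypothetical helper: current soft limit in bytes (RLIM_INFINITY if
 * unlimited), or 0 if the limit cannot be read. */
static rlim_t memlock_soft(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}
```

A login path or launcher would do the opposite (raise the limit while still privileged) before dropping to the database user.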

> There are a couple of off-by-ones in there I've got fixes for below.

Using the security framework is neat. There are currently large spinlock
contention problems in avc_has_perm_noaudit() which I suspect will make
SELinux problematic in some server environments. But I trust it is
possible to disable SELinux in config while using Bill's security module?

I guess we could live with sysctl which simply nukes CAP_IPC_LOCK, but it
has to be the when-all-else-failed option, yes?

2004-04-01 18:48:14

by Stephen Smalley

Subject: Re: disable-cap-mlock

On Thu, 2004-04-01 at 12:54, William Lee Irwin III wrote:
> On Thu, Apr 01, 2004 at 07:52:29PM +0200, Marc-Christian Petersen wrote:
> > hmm, maybe a /proc/sys/capability/lock and if set to 1 you can't
> > change any of the sysctl variables, even root should not be allowed
> > to change lock back, until you do a reboot. Practical?
> > ciao, Marc
>
> Feasible, though it's an open question as to how many hoops we should
> jump through to prevent people from shooting themselves in the foot.
>
> Maybe Stephen could recommend adjustments and/or give some idea as to
> whether the above would be useful.

Some form of control over changing the sysctl settings (beyond just the
mode) should be provided; otherwise, the module is too unsafe by itself
for real use, and you can't assume that people will only use it stacked
with SELinux (which could control such changes). Allowing the settings
to be locked as mcp suggested sounds simple and sufficient for the
proposed use; they can disable their desired capability and then lock in
/sbin/init. For greater generality, I'd suggest adding a new capability
to control the ability to set the capability sysctls, but then we are in
a vicious cycle...

--
Stephen Smalley <[email protected]>
National Security Agency

2004-04-01 18:50:05

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 10:34:25AM -0800, Andrew Morton wrote:
> William Lee Irwin III <[email protected]> wrote:
> >
> > On Thu, Apr 01, 2004 at 08:48:25AM -0800, William Lee Irwin III wrote:
> > >> Something like this would have the minor advantage of zero core impact.
> > >> Testbooted only. vs. 2.6.5-rc3-mm4
> >
> > On Thu, Apr 01, 2004 at 06:59:52PM +0200, Andrea Arcangeli wrote:
> > > I certainly like this too (despite it's more complicated but it might
> > > avoid us to have to add further sysctl in the future), Andrew what do
> > > you prefer to merge? I don't mind either ways.
>
> What is the Oracle requirement in detail?

Being able to call shmget(SHM_HUGETLB) as a normal user. However, I
cannot just drop the CAP_IPC_LOCK check from hugetlbfs/inode.c, since
that would break local security w.r.t. mlock.

If I have to disable that single check in hugetlbfs/inode.c, then I
prefer to disable all the CAP_IPC_LOCK checks, so they can use mlock as
well.
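For reference, the call in question looks roughly like this sketch (glibc headers assumed; `try_hugetlb_shm` is a hypothetical helper). On the 2.6.5 kernels discussed here, hugetlbfs checks capable(CAP_IPC_LOCK) before creating the segment, so a normal user gets EPERM; on a machine with no huge pages reserved the call can also fail with ENOMEM. The segment size must be a multiple of the huge page size; 2 MiB assumes x86.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: try to create a private SysV segment backed by
 * huge pages. Returns 0 on success (the segment is removed again right
 * away), otherwise the errno reported by shmget(). */
static int try_hugetlb_shm(size_t bytes)
{
    int id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | SHM_HUGETLB | 0600);

    if (id < 0)
        return errno;
    shmctl(id, IPC_RMID, NULL);  /* mark for removal; nothing is attached */
    return 0;
}
```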

> If it's for access to hugetlbfs then there are the uid= and gid= mount
> options.

Sure, we know; that's not the problem. Bypassing the mlock checks is
easy with hugetlbfs by giving the mountpoint mode 777.

> If it's for access to SHM_HUGETLB then there was some discussion about
> extending the uid= thing to shm, but nothing happened. This could be
> resurrected.

Never heard of those discussions, sorry. I'm not completely sure that a
single uid would work; 777 permissions certainly would, but how do you
give 777 permissions to something that is not a filesystem? I mean, the
sysctl is an order of magnitude simpler, and it also covers the mlock
and shmctl(SHM_LOCK/UNLOCK) usages, which the rlimit cannot cover
either.

>
> If it's just generally for the ability to mlock lots of memory then
> RLIMIT_MEMLOCK would be preferable. I don't see why we'd need the sysctl
> when `ulimit -m' is available? (Where is that patch btw?)

The real reason is shmget(SHM_HUGETLB), but if I disable that specific
CAP_IPC_LOCK check I may as well disable them all, and then nobody will
have to use the rlimit for this anymore.

Also consider that the same user can shmget multiple segments, and
their total can exceed the 4G limit of the rlimit on 32-bit, since that
is not address space but physical memory (either RAM or swap).

> I guess we could live with sysctl which simply nukes CAP_IPC_LOCK, but it
> has to be the when-all-else-failed option, yes?

Probably the main advantage is that it doesn't give away all rights to
the user with fsuid == 0; it only gives away mlock. I can imagine some
users may not want to give mlock to fsuid == 0 either, so even that may
need to be a config option; OTOH, overcommit and other VM knobs (again,
things that can't be used to do really bad stuff) are already available
to fsuid == 0. So I think sysctl-cap-mlock is a reasonable feature, at
least until there's a more fine-grained way to grant the
shmget(SHM_HUGETLB)/shmctl(SHM_LOCK/UNLOCK) rights.

2004-04-01 18:53:36

by Chen, Kenneth W

Subject: RE: disable-cap-mlock

>>>>> Andrew Morton wrote on Thursday, April 01, 2004 10:34 AM
>
> If it's for access to SHM_HUGETLB

This is the main reason.

> then there was some discussion about
> extending the uid= thing to shm, but nothing happened. This could be
> resurrected.

We have tried doing that; in fact, I have worked on it on and off for a
while, but none of the solutions we came up with were clean enough.

> I guess we could live with sysctl which simply nukes CAP_IPC_LOCK, but
> it has to be the when-all-else-failed option, yes?

Very much agreed. I also very much agree with wli that the user-level
tools need major improvement. These CAP_* hooks have been in the kernel
for ages (since 2.4?), but the userland tools seem fossilized. The last
time I tried libcap (about two weeks ago), it segfaulted on me.

- Ken

2004-04-01 18:59:45

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 10:34:25AM -0800, Andrew Morton wrote:
> What is the Oracle requirement in detail?
> If it's for access to hugetlbfs then there are the uid= and gid= mount
> options.
> If it's for access to SHM_HUGETLB then there was some discussion about
> extending the uid= thing to shm, but nothing happened. This could be
> resurrected.
> If it's just generally for the ability to mlock lots of memory then
> RLIMIT_MEMLOCK would be preferable. I don't see why we'd need the sysctl
> when `ulimit -m' is available? (Where is that patch btw?)

I don't speak for Oracle (obviously), but it's basically for non-root
users to get at the stuff. There's an issue with a few pieces of
userspace that drive the kernel's capability bits being nonstandard
and/or broken (out-of-date? I can't even find the stuff).

DB2 gets away with using the C capability libraries directly because
its launcher scripts are basically setuid, then it arranges to avoid
dropping the capabilities. This is actually not ideal even for DB2, and
other databases don't use analogous launching scripts able to do this.

The "right" way from the userspace angle is basically either pam_cap or
the mlock rlimit, so when you log in as the database user, you get the
capabilities and/or rlimits. I don't appear to be able to decipher
what's going on with pam_cap and I'm not entirely sure anyone else has
either. The mlock rlimits appear to have a more coherent userspace
support story, and are "supposed" to be there anyway. The implementation
just seems to be missing pieces.

William Lee Irwin III <[email protected]> wrote:
>> There are a couple of off-by-ones in there I've got fixes for below.

On Thu, Apr 01, 2004 at 10:34:25AM -0800, Andrew Morton wrote:
> Using the security framework is neat. There are currently large spinlock
> contention problems in avc_has_perm_noaudit() which I suspect will make
> SELinux problematic in some server environments. But I trust it is
> possible to disable SELinux in config while using Bill's security module?
> I guess we could live with sysctl which simply nukes CAP_IPC_LOCK, but it
> has to be the when-all-else-failed option, yes?

The module I wrote acts as one of a number of different alternative
security policies (choosable at compile-time, and even at runtime if
I'd figured out how to actually do loadable modules properly). The
entire callback infrastructure configures out, and choices of security
models configure each other out in turn. It's somewhat more general
than it has to be, and amounts to what's basically a semi-open security
model with the monotonic etc. properties of capabilities removed since
r/w fs access to /proc/sys/capabilities/* entails all others. In theory,
this could be used for other things (CAP_SYS_NICE and CAP_SYS_RAWIO
come to mind), though I'm not aware of any outcry for things like this
for any other capabilities but CAP_IPC_LOCK.

I guess I can say that I'm not actually very wild about the security
module I wrote myself (it was more of an isolation-from-the-core effort
than a thing I wanted in and of itself). Some pieces may be made safer
for users with Stephen's suggestions.

-- wli

2004-04-01 19:27:56

by James Morris

Subject: Re: disable-cap-mlock

On Thu, 1 Apr 2004, Andrew Morton wrote:

> Using the security framework is neat. There are currently large spinlock
> contention problems in avc_has_perm_noaudit() which I suspect will make
> SELinux problematic in some server environments.

This issue will be addressed soon.

- James
--
James Morris
<[email protected]>

2004-04-01 19:26:42

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 01:47:18PM -0500, Stephen Smalley wrote:
> Some form of control over changing the sysctl settings (beyond just the
> mode) should be provided; otherwise, the module is too unsafe by itself
> for real use, and you can't assume that people will only use it stacked
> with SELinux (which could control such changes). Allowing the settings
> to be locked as mcp suggested sounds simple and sufficient for the
> proposed use; they can disable their desired capability and then lock in
> /sbin/init. For greater generality, I'd suggest adding a new capability
> to control the ability to set the capability sysctls, but then we are in
> a vicious cycle...

Okay, done.

Misc fix thrown in: the policies beyond enabled/disabled were wrongly
set up in minmax' args, so this throws the real max in the table.

-- wli

Index: mm4-2.6.5-rc3/security/sysctl_capable.c
===================================================================
--- mm4-2.6.5-rc3.orig/security/sysctl_capable.c 2004-04-01 10:11:53.000000000 -0800
+++ mm4-2.6.5-rc3/security/sysctl_capable.c 2004-04-01 11:24:44.000000000 -0800
@@ -43,6 +43,7 @@
#define CAP_SYSCTL_MKNOD (1 + CAP_MKNOD)
#define CAP_SYSCTL_LEASE (1 + CAP_LEASE)
#define MAX_CAPABILITY CAP_SYSCTL_LEASE
+#define CAP_SYSCTL_LOCKDOWN (1 + MAX_CAPABILITY)

#define CAPABILITY_SYSCTL_ENABLED 0
#define CAPABILITY_SYSCTL_DISABLED 1
@@ -56,19 +57,22 @@
.ctl_name = CAP_SYSCTL_##x, \
.procname = #y , \
.extra1 = (void *)&capability_sysctl_zero, \
- .extra2 = (void *)&capability_sysctl_one, \
+ .extra2 = (void *)&capability_sysctl_three, \
.data = &capability_sysctl_state[CAP_##x], \
.mode = 0644, \
.strategy = sysctl_intvec, \
- .proc_handler = proc_dointvec_minmax, \
+ .proc_handler = capability_sysctl_handler, \
.maxlen = sizeof(int), \
},

static int capability_sysctl_state[MAX_CAPABILITY];
static const int capability_sysctl_zero = 0;
static const int capability_sysctl_one = 1;
-static int secondary;
+static const int capability_sysctl_three = 3;
+static int secondary, lockdown;
static struct ctl_table_header *capability_sysctl_table_header;
+static int capability_sysctl_handler(struct ctl_table *, int,
+ struct file *, void __user *, size_t *);

static struct ctl_table capability_sysctl_table[] = {
MKCTL(CHOWN, chown)
@@ -101,6 +105,17 @@
MKCTL(MKNOD, mknod)
MKCTL(LEASE, lease)
{
+ .ctl_name = CAP_SYSCTL_LOCKDOWN,
+ .procname = "lockdown",
+ .extra1 = (void *)&capability_sysctl_zero,
+ .extra2 = (void *)&capability_sysctl_one,
+ .data = &lockdown,
+ .mode = 0644,
+ .strategy = sysctl_intvec,
+ .proc_handler = capability_sysctl_handler,
+ .maxlen = sizeof(int),
+ },
+ {
.ctl_name = 0,
},
};
@@ -138,6 +153,14 @@
.vm_enough_memory = cap_vm_enough_memory,
};

+static int capability_sysctl_handler(struct ctl_table *table,
+ int write, struct file *file, void __user *buf, size_t *length)
+{
+ if (lockdown && write)
+ return -EINVAL;
+ else
+ return proc_dointvec_minmax(table, write, file, buf, length);
+}

static int capability_sysctl_capable(task_t *task, int cap)
{

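To make the lockdown semantics explicit, here is a small userspace model of capability_sysctl_handler (all names hypothetical; the real proc_dointvec_minmax range handling is simplified here to an -EINVAL return, and reads, which the real handler always allows, are not modelled). Because the "lockdown" entry goes through the same handler, once it is set to 1 it cannot be cleared again, which is the one-way behaviour Marc asked for.

```c
#include <errno.h>

/* Model of one ctl_table entry: .data plus the .extra1/.extra2 bounds. */
struct cap_sysctl {
    int *data;
    int min, max;
};

static int lockdown;        /* mirrors the patch's `lockdown` */
static int ipc_lock_state;  /* hypothetical stand-in for one capability */

static const struct cap_sysctl ipc_lock_ctl = { &ipc_lock_state, 0, 3 };
static const struct cap_sysctl lockdown_ctl = { &lockdown, 0, 1 };

/* Returns 0 on success or -EINVAL, like the handler in the patch. */
static int cap_sysctl_write(const struct cap_sysctl *ctl, int newval)
{
    if (lockdown)
        return -EINVAL;              /* locked: no writes, not even to lockdown */
    if (newval < ctl->min || newval > ctl->max)
        return -EINVAL;              /* simplified minmax check */
    *ctl->data = newval;
    return 0;
}
```

Writing lockdown last, e.g. from /sbin/init after the capability entries are configured, gives the boot-time pattern Stephen describes.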
2004-04-01 19:45:15

by Rik van Riel

Subject: Re: disable-cap-mlock

On Thu, 1 Apr 2004, Andrea Arcangeli wrote:

> This is a lot simpler than the mlock rlimit and this is people really
> need (not the rlimit). The rlimit thing can still be applied on top of
> this. This should be more efficient too (besides its simplicity).

What use is this patch?

One of the main reasons for the mlock rlimit is so that
security conscious people can let normal users' gpg
mlock a few pages.

This patch isn't usable for that at all, since switching
the sysctl on would just open up the system to an easy
deadlock by any user. Definately not something any
security conscious admin would want to enable ...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-04-01 19:53:06

by Andrew Morton

Subject: Re: disable-cap-mlock

Rik van Riel <[email protected]> wrote:
>
> One of the main reasons for the mlock rlimit is so that
> security conscious people can let normal users' gpg
> mlock a few pages.

Could you please refresh-n-send the RLIMIT_MEMLOCK patch?

2004-04-01 20:22:48

by Marc-Christian Petersen

Subject: Re: disable-cap-mlock

On Thursday 01 April 2004 21:26, William Lee Irwin III wrote:

Hi Bill,

> Okay, done.
> Misc fix thrown in: the policies beyond enabled/disabled were wrongly
> set up in minmax' args, so this throws the real max in the table.

Great, it works :) ... Probably the attached one on top as well.

ciao, Marc

Attachments:

(No filename) (294.00 B)
sysctl_capable-Kconfig-update.patch (684.00 B)

2004-04-01 21:32:56

by Marc-Christian Petersen

Subject: Re: disable-cap-mlock

On Thursday 01 April 2004 23:13, William Lee Irwin III wrote:

Hey Bill,

> I folded that into my little series, but they'll probably all get
> globbed together for archival in the end anyway.
> Not sure where this is all going. I guess if someone's got a use for it
> or otherwise it's a useful example of how to do a security module,
> maybe writing it did some good after all.

Well, I really like your patch, I also have a use for it, and I _bet_
many other people do too. One beer? :)

ciao, Marc

2004-04-01 21:57:27

by William Lee Irwin III

Subject: Re: disable-cap-mlock

On Thursday 01 April 2004 21:26, William Lee Irwin III wrote:
>> Okay, done.
>> Misc fix thrown in: the policies beyond enabled/disabled were wrongly
>> set up in minmax' args, so this throws the real max in the table.

On Thu, Apr 01, 2004 at 10:23:07PM +0200, Marc-Christian Petersen wrote:
> Great, it works :) ... Probably the attached one on top as well.
> ciao, Marc

I folded that into my little series, but they'll probably all get
globbed together for archival in the end anyway.

Not sure where this is all going. I guess if someone's got a use for it
or otherwise it's a useful example of how to do a security module,
maybe writing it did some good after all.

-- wli

2004-04-01 22:30:09

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 02:44:50PM -0500, Rik van Riel wrote:
> On Thu, 1 Apr 2004, Andrea Arcangeli wrote:
>
> > This is a lot simpler than the mlock rlimit and this is people really
> > need (not the rlimit). The rlimit thing can still be applied on top of
> > this. This should be more efficient too (besides its simplicity).
>
> What use is this patch ?
>
> One of the main reasons for the mlock rlimit is so that
> security conscious people can let normal users' gpg
> mlock a few pages.
>
> This patch isn't usable for that at all, since switching
> the sysctl on would just open up the system to an easy
> deadlock by any user. Definately not something any
> security conscious admin would want to enable ...

There's no way the rlimit patch can cover shmget(SHM_HUGETLB) and
shmctl(SHM_LOCK). That's what this patch is for.

Plus it removes the need to set the rlimit for apps like databases.

The rlimit patch remains useful for the multiuser system you're talking
about (assuming you also limit the number of tasks per user
accordingly).

2004-04-01 22:36:30

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 11:52:52AM -0800, Andrew Morton wrote:
> Rik van Riel <[email protected]> wrote:
> >
> > One of the main reasons for the mlock rlimit is so that
> > security conscious people can let normal users' gpg
> > mlock a few pages.
>
> Could you please refresh-n-send the RLIMIT_MEMLOCK patch?

I asked Rik for it too, but he pointed me at some rpm. Luckily,
Marc-Christian extracted it and posted it on l-k a few weeks ago, so
you can just check l-k (From: Marc-Christian) and you'll find it. It's
against 2.4, however. The problem is that it's absolutely useless for
the problem I had to solve, or I would already be using it instead.

2004-04-01 22:42:51

by Marc-Christian Petersen

Subject: Re: disable-cap-mlock

On Friday 02 April 2004 00:36, Andrea Arcangeli wrote:

Hey Andrea,

> > Could you please refresh-n-send the RLIMIT_MEMLOCK patch?

> I asked it to Rik too but he redirected me at some rpm, but luckily
> Marc-Christian extracted it and he posted it on l-k some week ago, so
> you can just check l-k (From: Marc-Christian) and you'll find it. It's
> against 2.4 however. Problem is that it's absolutely useless for the
> problem I had to solve, or I would be using it already instead.

hehe. Well. I have this flying around on my hard disk.

root@codeman:[/usr/src/patches/distribution] # du -skh .
1702M .

almost every interesting distribution kernel is there in extracted
form :)

It's actually this one:
http://marc.theaimsgroup.com/?l=linux-kernel&m=107980096115231&w=2

(won't post the patch again to save lkml space ;)

ciao, Marc

2004-04-01 23:08:45

by Rik van Riel

Subject: Re: disable-cap-mlock

On Fri, 2 Apr 2004, Andrea Arcangeli wrote:

> Marc-Christian extracted it and he posted it on l-k some week ago, so
> you can just check l-k (From: Marc-Christian) and you'll find it. It's
> against 2.4 however. Problem is that it's absolutely useless for the
> problem I had to solve, or I would be using it already instead.

Oracle seems to be using it just fine in a certain 2.4
based kernel, so why exactly do you think it would be
useless for the problem you want to solve ?

Also, what would need to be fixed in order for it to
not be useless ? ;)

2004-04-01 23:26:07

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 06:08:18PM -0500, Rik van Riel wrote:
> Oracle seems to be using it just fine in a certain 2.4
> based kernel, so why exactly do you think it would be
> useless for the problem you want to solve ?
>
> Also, what would need to be fixed in order for it to
> not be useless ? ;)

tell me how to call shmget(SHM_HUGETLB) without having the CAP_IPC_LOCK
with the rlimit patch.

2004-04-02 00:59:58

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrea Arcangeli ([email protected]) wrote:
> On Thu, Apr 01, 2004 at 06:08:18PM -0500, Rik van Riel wrote:
> > Oracle seems to be using it just fine in a certain 2.4
> > based kernel, so why exactly do you think it would be
> > useless for the problem you want to solve ?
> >
> > Also, what would need to be fixed in order for it to
> > not be useless ? ;)
>
> tell me how to call shmget(SHM_HUGETLB) without having the CAP_IPC_LOCK
> with the rlimit patch.

Account for the equivalent "locked" huge pages on shmget. I did something
like this when porting the mlock patch to 2.6 a month or so ago. I also
recall finding a couple problems along the way, but it's been a while.
I'll dig up what I have and send it in.
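The accounting idea can be modelled in isolation. This is a hypothetical sketch of the check, not the posted patch: a SHM_HUGETLB request is charged against RLIMIT_MEMLOCK as if its huge pages were mlocked, and is refused unless the total fits the limit (or the caller holds CAP_IPC_LOCK) and the huge page pool has room. mm->locked_vm is already a page count (the patch itself adds HPAGE_SIZE / PAGE_SIZE to it per huge page), so the sketch adds it without any further shift; the rlimit is in bytes and is converted to pages.

```c
#include <stdbool.h>

/* Hypothetical inputs to the check. */
struct hugetlb_acct {
    unsigned long locked_vm;      /* base pages already locked by this mm */
    unsigned long memlock_bytes;  /* RLIMIT_MEMLOCK soft limit, in bytes */
    unsigned long free_hpages;    /* free huge pages in the pool */
    bool cap_ipc_lock;            /* caller holds CAP_IPC_LOCK */
};

static bool hugepage_request_ok(const struct hugetlb_acct *a,
                                unsigned long size,
                                unsigned int page_shift,
                                unsigned int hpage_shift)
{
    unsigned long hpage_size = 1UL << hpage_shift;
    /* Round up to whole huge pages, like (size + ~HPAGE_MASK) / HPAGE_SIZE. */
    unsigned long nr_hpages = (size + hpage_size - 1) / hpage_size;
    /* Charge each huge page as the equivalent number of base pages. */
    unsigned long locked = a->locked_vm +
                           (nr_hpages << (hpage_shift - page_shift));
    unsigned long limit = a->memlock_bytes >> page_shift;

    return (locked <= limit || a->cap_ipc_lock) &&
           nr_hpages <= a->free_hpages;
}
```

With 4 KiB base pages and 2 MiB huge pages, one huge page costs 512 pages of the limit, so a 2 MiB RLIMIT_MEMLOCK admits exactly one huge page.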

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-04-02 01:07:15

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrea Arcangeli ([email protected]) wrote:
> Oracle needs this sysctl, I designed it and Ken Chen implemented it. I
> guess google also won't dislike it.
>
> This is a lot simpler than the mlock rlimit and this is people really
> need (not the rlimit). The rlimit thing can still be applied on top of
> this. This should be more efficient too (besides its simplicity).
>
> can you apply to mainline?

This patch seems like the wrong hack to work around missing mlock rlimit
functionality. Wouldn't it be better to fix the core problem, and leave
this patch out of mainline? I agree with Rik, such a fix (mlock/rlimit)
will make all the gpg users feel warm and fuzzy ;-)

thanks,
-chris

2004-04-02 01:18:58

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 05:07:05PM -0800, Chris Wright wrote:
> * Andrea Arcangeli ([email protected]) wrote:
> > Oracle needs this sysctl, I designed it and Ken Chen implemented it. I
> > guess google also won't dislike it.
> >
> > This is a lot simpler than the mlock rlimit and this is people really
> > need (not the rlimit). The rlimit thing can still be applied on top of
> > this. This should be more efficient too (besides its simplicity).
> >
> > can you apply to mainline?
>
> This patch seems like the wrong hack to work around missing mlock rlimit
> functionality. Wouldn't it be better to fix the core problem, and leave
> this patch out of mainline? I agree with Rik, such a fix (mlock/rlimit)
> will make all the gpg users feel warm and fuzzy ;-)

Please elaborate on how you can account for shmget(SHM_HUGETLB) with
the rlimit. The rlimit is just about the mlocked _address_space_; there's no
way to account for something _outside_ the address space with the rlimit,
period. If you attempt doing that, _that_ will be THE true hack(tm) ;).

2004-04-02 01:28:27

by Andrew Morton

Subject: Re: disable-cap-mlock

Chris Wright <[email protected]> wrote:
>
> * Andrea Arcangeli ([email protected]) wrote:
> > Oracle needs this sysctl, I designed it and Ken Chen implemented it. I
> > guess google also won't dislike it.
> >
> > This is a lot simpler than the mlock rlimit and this is people really
> > need (not the rlimit). The rlimit thing can still be applied on top of
> > this. This should be more efficient too (besides its simplicity).
> >
> > can you apply to mainline?
>
> This patch seems like the wrong hack to work around missing mlock rlimit
> functionality. Wouldn't it be better to fix the core problem, and leave
> this patch out of mainline? I agree with Rik, such a fix (mlock/rlimit)
> will make all the gpg users feel warm and fuzzy ;-)

Rumour has it that the more exasperated among us are brewing up a patch to
login.c which will allow capabilities to be retained after the setuid. So
you do

echo "oracle CAP_IPC_LOCK" > /etc/logincap.conf

And that's it.

See any reason why this won't work?
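The kernel primitive such a login change would most likely build on is PR_SET_KEEPCAPS, which tells the kernel to keep the permitted capability set across a setuid() from uid 0 to a non-zero uid. The /etc/logincap.conf file above is from a rumoured patch, so its format is not something the sketch relies on; `enable_keepcaps` is a hypothetical helper. After the setuid() the effective set is still cleared, so the login program would then have to raise CAP_IPC_LOCK again with capset() (not shown).

```c
#define _GNU_SOURCE
#include <sys/prctl.h>

/* Hypothetical helper: set the keep-capabilities flag for the calling
 * process and read it back. Returns 1 if the flag is set, -1 on error. */
static int enable_keepcaps(void)
{
    if (prctl(PR_SET_KEEPCAPS, 1L, 0L, 0L, 0L) != 0)
        return -1;
    return prctl(PR_GET_KEEPCAPS, 0L, 0L, 0L, 0L);
}
```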

2004-04-02 01:31:57

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrea Arcangeli ([email protected]) wrote:
> please elaborate how can you account for shmget(SHM_HUGETLB) with the
> rlimit. The rlimit is just about the _address_space_ mlocked, there's no
> way to account for something _outside_ the address space with the rlimit,
> period. If you attempt doing that, _that_ will be THE true hack(tm) ;).

Heh ;-) OK, here's the patch. When you setup the vmas for the huge pages
account for them, when you tear them down, account for that as well.
It's very possible that I've missed the obvious, but it at least passes
simple tests with SHM_HUGETLB, and it also allows gpg to mlock when I set
the user's mlock rlimit to 8 pages.

Now I recall the problem I had: it's with normal pages and SHM_LOCK.
With this method of locking, it's trivial to mess up the accounting for
mm->locked_vm.

Patch below is from around 2.6.3, but still seems to apply ok.

thanks,
-chris

===== arch/i386/mm/hugetlbpage.c 1.21 vs edited =====
--- 1.21/arch/i386/mm/hugetlbpage.c Tue Dec 30 12:49:10 2003
+++ edited/arch/i386/mm/hugetlbpage.c Fri Feb 27 18:41:35 2004
@@ -106,6 +106,7 @@
pte_t entry;

mm->rss += (HPAGE_SIZE / PAGE_SIZE);
+ mm->locked_vm += (HPAGE_SIZE / PAGE_SIZE);
if (write_access) {
entry =
pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
@@ -316,6 +317,7 @@
pte_clear(pte);
}
mm->rss -= (end - start) >> PAGE_SHIFT;
+ mm->locked_vm -= (end -start) >> PAGE_SHIFT;
flush_tlb_range(vma, start, end);
}

@@ -524,7 +526,16 @@

int is_hugepage_mem_enough(size_t size)
{
- return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpagemem;
+ unsigned long lock_limit, locked;
+ struct mm_struct *mm = current->mm;
+ long htlbpagesize = (size + ~HPAGE_MASK)/HPAGE_SIZE;
+
+ locked = mm->locked_vm >> PAGE_SHIFT;
+ locked += htlbpagesize << (HPAGE_SHIFT - PAGE_SHIFT);
+ lock_limit = current->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
+
+ return ((locked <= lock_limit || capable(CAP_IPC_LOCK)) &&
+ (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpagemem);
}

/*
===== fs/hugetlbfs/inode.c 1.24 vs edited =====
--- 1.24/fs/hugetlbfs/inode.c Fri Feb 6 19:23:17 2004
+++ edited/fs/hugetlbfs/inode.c Fri Feb 27 18:49:17 2004
@@ -694,7 +697,7 @@
struct qstr quick_string;
char buf[16];

- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_mlock())
return ERR_PTR(-EPERM);

if (!is_hugepage_mem_enough(size))
===== include/asm-alpha/resource.h 1.1 vs edited =====
--- 1.1/include/asm-alpha/resource.h Thu Feb 15 13:25:56 2001
+++ edited/include/asm-alpha/resource.h Thu Feb 26 16:45:32 2004
@@ -39,7 +39,7 @@
{INR_OPEN, INR_OPEN}, /* RLIMIT_NOFILE */ \
{LONG_MAX, LONG_MAX}, /* RLIMIT_AS */ \
{LONG_MAX, LONG_MAX}, /* RLIMIT_NPROC */ \
- {LONG_MAX, LONG_MAX}, /* RLIMIT_MEMLOCK */ \
+ {PAGE_SIZE,PAGE_SIZE}, /* RLIMIT_MEMLOCK */ \
{LONG_MAX, LONG_MAX}, /* RLIMIT_LOCKS */ \
}

===== include/asm-arm/resource.h 1.1 vs edited =====
--- 1.1/include/asm-arm/resource.h Thu Feb 15 13:26:06 2001
+++ edited/include/asm-arm/resource.h Thu Feb 26 16:45:33 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-cris/resource.h 1.1 vs edited =====
--- 1.1/include/asm-cris/resource.h Thu Feb 22 09:58:11 2001
+++ edited/include/asm-cris/resource.h Thu Feb 26 16:45:33 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-i386/resource.h 1.1 vs edited =====
--- 1.1/include/asm-i386/resource.h Thu Feb 15 13:25:53 2001
+++ edited/include/asm-i386/resource.h Thu Feb 26 16:45:34 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-ia64/resource.h 1.3 vs edited =====
--- 1.3/include/asm-ia64/resource.h Fri Jan 30 18:49:24 2004
+++ edited/include/asm-ia64/resource.h Thu Feb 26 16:45:34 2004
@@ -44,7 +44,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-m68k/resource.h 1.2 vs edited =====
--- 1.2/include/asm-m68k/resource.h Thu May 9 08:21:01 2002
+++ edited/include/asm-m68k/resource.h Thu Feb 26 16:45:35 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-mips/resource.h 1.3 vs edited =====
--- 1.3/include/asm-mips/resource.h Fri Aug 8 14:44:42 2003
+++ edited/include/asm-mips/resource.h Thu Feb 26 16:45:35 2004
@@ -52,7 +52,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}

===== include/asm-parisc/resource.h 1.1 vs edited =====
--- 1.1/include/asm-parisc/resource.h Thu Feb 15 13:26:11 2001
+++ edited/include/asm-parisc/resource.h Thu Feb 26 16:46:33 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-ppc/resource.h 1.3 vs edited =====
--- 1.3/include/asm-ppc/resource.h Fri Sep 20 01:20:44 2002
+++ edited/include/asm-ppc/resource.h Thu Feb 26 16:46:34 2004
@@ -34,7 +34,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-ppc64/resource.h 1.1 vs edited =====
--- 1.1/include/asm-ppc64/resource.h Wed Feb 20 00:14:56 2002
+++ edited/include/asm-ppc64/resource.h Thu Feb 26 16:47:52 2004
@@ -43,7 +43,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-s390/resource.h 1.2 vs edited =====
--- 1.2/include/asm-s390/resource.h Tue Feb 13 06:13:44 2001
+++ edited/include/asm-s390/resource.h Thu Feb 26 16:47:53 2004
@@ -45,7 +45,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-sh/resource.h 1.1 vs edited =====
--- 1.1/include/asm-sh/resource.h Thu Feb 15 13:26:07 2001
+++ edited/include/asm-sh/resource.h Thu Feb 26 16:48:15 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/asm-sparc/resource.h 1.1 vs edited =====
--- 1.1/include/asm-sparc/resource.h Thu Feb 15 13:25:58 2001
+++ edited/include/asm-sparc/resource.h Thu Feb 26 16:48:16 2004
@@ -42,7 +42,7 @@
{ 0, RLIM_INFINITY}, \
{RLIM_INFINITY, RLIM_INFINITY}, \
{INR_OPEN, INR_OPEN}, {0, 0}, \
- {RLIM_INFINITY, RLIM_INFINITY}, \
+ {PAGE_SIZE, PAGE_SIZE }, \
{RLIM_INFINITY, RLIM_INFINITY}, \
{RLIM_INFINITY, RLIM_INFINITY} \
}
===== include/asm-sparc64/resource.h 1.1 vs edited =====
--- 1.1/include/asm-sparc64/resource.h Thu Feb 15 13:26:02 2001
+++ edited/include/asm-sparc64/resource.h Thu Feb 26 16:48:17 2004
@@ -41,7 +41,7 @@
{ 0, RLIM_INFINITY}, \
{RLIM_INFINITY, RLIM_INFINITY}, \
{INR_OPEN, INR_OPEN}, {0, 0}, \
- {RLIM_INFINITY, RLIM_INFINITY}, \
+ {PAGE_SIZE, PAGE_SIZE }, \
{RLIM_INFINITY, RLIM_INFINITY}, \
{RLIM_INFINITY, RLIM_INFINITY} \
}
===== include/asm-x86_64/resource.h 1.1 vs edited =====
--- 1.1/include/asm-x86_64/resource.h Wed Feb 13 16:05:42 2002
+++ edited/include/asm-x86_64/resource.h Thu Feb 26 16:48:19 2004
@@ -37,7 +37,7 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ 0, 0 }, \
{ INR_OPEN, INR_OPEN }, \
- { RLIM_INFINITY, RLIM_INFINITY }, \
+ { PAGE_SIZE, PAGE_SIZE }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ RLIM_INFINITY, RLIM_INFINITY }, \
}
===== include/linux/mm.h 1.64 vs edited =====
--- 1.64/include/linux/mm.h Fri Feb 6 10:08:10 2004
+++ edited/include/linux/mm.h Thu Feb 26 16:51:18 2004
@@ -427,7 +427,7 @@
struct page *shmem_nopage(struct vm_area_struct * vma,
unsigned long address, int *type);
struct file *shmem_file_setup(char * name, loff_t size, unsigned long flags);
-void shmem_lock(struct file * file, int lock);
+int shmem_lock(struct file * file, int lock);
int shmem_zero_setup(struct vm_area_struct *);

void zap_page_range(struct vm_area_struct *vma, unsigned long address,
@@ -575,6 +575,17 @@
if (!vma->vm_file && vma->vm_flags == vm_flags)
return 1;
#endif
+ return 0;
+}
+
+/* mlock can just return an instant EPERM if the caller has no
+ permission to do any memory locking. */
+static inline int can_do_mlock(void)
+{
+ if (capable(CAP_IPC_LOCK))
+ return 1;
+ if (current->rlim[RLIMIT_MEMLOCK].rlim_cur != 0)
+ return 1;
return 0;
}

===== ipc/shm.c 1.40 vs edited =====
--- 1.40/ipc/shm.c Mon Sep 8 15:08:14 2003
+++ edited/ipc/shm.c Wed Mar 3 11:55:17 2004
@@ -502,10 +502,8 @@
case SHM_LOCK:
case SHM_UNLOCK:
{
-/* Allow superuser to lock segment in memory */
-/* Should the pages be faulted in here or leave it to user? */
-/* need to determine interaction with current->swappable */
- if (!capable(CAP_IPC_LOCK)) {
+ /* Allow superuser to lock segment in memory */
+ if (!can_do_mlock()) {
err = -EPERM;
goto out;
}
@@ -524,8 +522,11 @@
goto out_unlock;

if(cmd==SHM_LOCK) {
- if (!is_file_hugepages(shp->shm_file))
- shmem_lock(shp->shm_file, 1);
+ if (!is_file_hugepages(shp->shm_file)) {
+ err = shmem_lock(shp->shm_file, 1);
+ if (err)
+ goto out_unlock;
+ }
shp->shm_flags |= SHM_LOCKED;
} else {
if (!is_file_hugepages(shp->shm_file))
===== mm/mlock.c 1.7 vs edited =====
--- 1.7/mm/mlock.c Fri Oct 17 07:43:50 2003
+++ edited/mm/mlock.c Fri Feb 27 15:10:33 2004
@@ -57,7 +57,7 @@
struct vm_area_struct * vma, * next;
int error;

- if (on && !capable(CAP_IPC_LOCK))
+ if (on && !can_do_mlock())
return -EPERM;
len = PAGE_ALIGN(len);
end = start + len;
@@ -115,9 +115,9 @@
lock_limit >>= PAGE_SHIFT;

/* check against resource limits */
- if (locked <= lock_limit)
+ if (locked <= lock_limit || capable(CAP_IPC_LOCK))
error = do_mlock(start, len, 1);
up_write(&current->mm->mmap_sem);
return error;
}

@@ -139,7 +141,7 @@
unsigned int def_flags;
struct vm_area_struct * vma;

- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_mlock())
return -EPERM;

def_flags = 0;
@@ -174,7 +176,7 @@
lock_limit >>= PAGE_SHIFT;

ret = -ENOMEM;
- if (current->mm->total_vm <= lock_limit)
+ if (current->mm->total_vm <= lock_limit || capable(CAP_IPC_LOCK))
ret = do_mlockall(flags);
out:
up_write(&current->mm->mmap_sem);
===== mm/mmap.c 1.64 vs edited =====
--- 1.64/mm/mmap.c Mon Feb 16 11:49:56 2004
+++ edited/mm/mmap.c Thu Feb 26 17:45:14 2004
@@ -512,15 +512,17 @@
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

if (flags & MAP_LOCKED) {
- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_mlock())
return -EPERM;
vm_flags |= VM_LOCKED;
}
/* mlock MCL_FUTURE? */
if (vm_flags & VM_LOCKED) {
- unsigned long locked = mm->locked_vm << PAGE_SHIFT;
+ unsigned long locked, lock_limit;
+ locked = mm->locked_vm << PAGE_SHIFT;
+ lock_limit = current->rlim[RLIMIT_MEMLOCK].rlim_cur;
locked += len;
- if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
+ if (locked > lock_limit && !capable(CAP_IPC_LOCK))
return -EAGAIN;
}

@@ -1331,13 +1333,18 @@
if ((addr + len) > TASK_SIZE || (addr + len) < addr)
return -EINVAL;

+ if ((addr + len) > TASK_SIZE || (addr + len) < addr)
+ return -EINVAL;
+
/*
* mlock MCL_FUTURE?
*/
if (mm->def_flags & VM_LOCKED) {
- unsigned long locked = mm->locked_vm << PAGE_SHIFT;
+ unsigned long locked, lock_limit;
+ locked = mm->locked_vm << PAGE_SHIFT;
+ lock_limit = current->rlim[RLIMIT_MEMLOCK].rlim_cur;
locked += len;
- if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
+ if (locked > lock_limit && !capable(CAP_IPC_LOCK))
return -EAGAIN;
}

===== mm/mremap.c 1.31 vs edited =====
--- 1.31/mm/mremap.c Tue Feb 17 23:14:50 2004
+++ edited/mm/mremap.c Thu Feb 26 16:48:24 2004
@@ -387,10 +387,12 @@
goto out;
}
if (vma->vm_flags & VM_LOCKED) {
- unsigned long locked = current->mm->locked_vm << PAGE_SHIFT;
+ unsigned long locked, lock_limit;
+ locked = current->mm->locked_vm << PAGE_SHIFT;
+ lock_limit = current->rlim[RLIMIT_MEMLOCK].rlim_cur;
locked += new_len - old_len;
ret = -EAGAIN;
- if (locked > current->rlim[RLIMIT_MEMLOCK].rlim_cur)
+ if (locked > lock_limit && !capable(CAP_IPC_LOCK))
goto out;
}
ret = -ENOMEM;
===== mm/shmem.c 1.63 vs edited =====
--- 1.63/mm/shmem.c Thu Feb 5 16:08:09 2004
+++ edited/mm/shmem.c Fri Feb 27 19:40:42 2004
@@ -1046,17 +1046,38 @@
return 0;
}

-void shmem_lock(struct file *file, int lock)
+int shmem_lock(struct file *file, int lock)
{
struct inode *inode = file->f_dentry->d_inode;
struct shmem_inode_info *info = SHMEM_I(inode);
+ struct mm_struct *mm = current->mm;
+ unsigned long lock_limit, locked;
+ int retval = -ENOMEM;

spin_lock(&info->lock);
- if (lock)
+ if (lock) {
+ if (!(info->flags & VM_LOCKED)) {
+ locked = inode->i_size >> PAGE_SHIFT;
+ locked += mm->locked_vm;
+ lock_limit = current->rlim[RLIMIT_MEMLOCK].rlim_cur;
+ lock_limit >>= PAGE_SHIFT;
+ if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+ goto out_nomem;
+ mm->locked_vm = locked;
+ }
info->flags |= VM_LOCKED;
- else
+ }
+ if (!lock) {
+ if ((info->flags & VM_LOCKED) && mm) {
+ locked = inode->i_size >> PAGE_SHIFT;
+ mm->locked_vm -= locked;
+ }
info->flags &= ~VM_LOCKED;
+ }
+ retval = 0;
+out_nomem:
spin_unlock(&info->lock);
+ return retval;
}

static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
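The rlimit-based patch above boils down to two questions per request: may this task lock anything at all (can_do_mlock()), and does the new total stay under RLIMIT_MEMLOCK unless the task holds CAP_IPC_LOCK. A minimal userspace model of the MAP_LOCKED path from the mm/mmap.c hunk, assuming 4 KiB pages and a page-aligned len; the model_* names are hypothetical and this is not kernel code:

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed 4 KiB pages; the kernel uses the real PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for the task state the patch consults. */
struct model_task {
    bool cap_ipc_lock;          /* capable(CAP_IPC_LOCK) */
    unsigned long memlock_rlim; /* RLIMIT_MEMLOCK rlim_cur, in bytes */
    unsigned long locked_vm;    /* pages already locked (mm->locked_vm) */
};

/* Mirrors the patched can_do_mlock(): without the capability, a zero
 * rlimit means the caller can never lock anything. */
bool model_can_do_mlock(const struct model_task *t)
{
    return t->cap_ipc_lock || t->memlock_rlim != 0;
}

/* Returns 0 if mapping len more bytes with MAP_LOCKED is allowed,
 * else a negative errno, in the order the mmap.c hunk checks them. */
int model_map_locked_check(const struct model_task *t, unsigned long len)
{
    unsigned long locked, lock_limit;

    if (!model_can_do_mlock(t))
        return -EPERM;                            /* instant refusal */

    locked = t->locked_vm << MODEL_PAGE_SHIFT;    /* bytes already locked */
    lock_limit = t->memlock_rlim;
    locked += len;

    /* CAP_IPC_LOCK holders bypass the rlimit, as in the patch */
    if (locked > lock_limit && !t->cap_ipc_lock)
        return -EAGAIN;
    return 0;
}
```

With a zero limit and no capability the request fails up front with -EPERM, which is what the comment above can_do_mlock() in the mm.h hunk is after; note that the patched sys_mlock() computes the limit first, so that path reports -ENOMEM instead.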

2004-04-02 01:38:02

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 05:30:14PM -0800, Chris Wright wrote:
> * Andrea Arcangeli ([email protected]) wrote:
> > please elaborate how can you account for shmget(SHM_HUGETLB) with the
> > rlimit. The rlimit is just about the _address_space_ mlocked, there's no
> > way to account for something _outside_ the address space with the rlimit,
> > period. If you attempt doing that, _that_ will be THE true hack(tm) ;).
>
> Heh ;-) OK, here's the patch. When you set up the vmas for the huge pages,
> account for them; when you tear them down, account for that as well.
> It's very possible that I've missed the obvious, but it at least passes

what you missed is that after you do locked_vm -= you don't free anything,
you only unmap the pages from the address space, which changes nothing in
terms of the amount of pinned ram.

so the patch is broken and insecure as far as I can tell.

2004-04-02 01:59:19

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrew Morton ([email protected]) wrote:
> Rumour has it that the more exasperated among us are brewing up a patch to
> login.c which will allow capabilities to be retained after the setuid. So
> you do
>
> echo "oracle CAP_IPC_LOCK" > /etc/logincap.conf
>
> And that's it.
>
> See any reason why this won't work?

Looks ok, and sounds very similar to what pam_cap does.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
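Andrew's /etc/logincap.conf is only a rumoured format, so the following is a hypothetical sketch: it assumes one "user CAP_NAME ..." entry per line, as in his echo example, and turns a matching line into a capability bitmask that login or su could then try to keep across setuid(). Only a few capability names are mapped; the bit numbers are the ones in <linux/capability.h>, and nothing here is an existing login or pam_cap API:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name table; bit numbers follow <linux/capability.h>. */
struct cap_name {
    const char *name;
    int bit;
};

static const struct cap_name cap_table[] = {
    { "CAP_NET_BIND_SERVICE", 10 },
    { "CAP_IPC_LOCK",         14 },
    { "CAP_IPC_OWNER",        15 },
    { "CAP_SYS_NICE",         23 },
};

/* Returns the capability mask one config line grants to `user`, or 0
 * if the line is empty or names someone else. Unknown names are ignored. */
uint32_t logincap_line_mask(const char *line, const char *user)
{
    char buf[256];
    char *tok;
    uint32_t mask = 0;

    /* strtok() writes into its argument, so work on a bounded copy */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    tok = strtok(buf, " \t\n");
    if (tok == NULL || strcmp(tok, user) != 0)
        return 0;

    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        for (size_t i = 0; i < sizeof(cap_table) / sizeof(cap_table[0]); i++)
            if (strcmp(tok, cap_table[i].name) == 0)
                mask |= UINT32_C(1) << cap_table[i].bit;
    }
    return mask;
}
```

The parsing is the easy part; as the rest of the thread shows, the hard part is getting the kernel to let the resulting mask survive setuid() and execve().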

2004-04-02 02:04:47

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrea Arcangeli ([email protected]) wrote:
> what you missed is that after you do locked_vm -= you don't free anything,
> you only unmap the pages from the address space, which changes nothing in
> terms of the amount of pinned ram.

doesn't it free the huge page right there? each page gets
huge_page_released, right?

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-04-02 02:09:23

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 05:59:14PM -0800, Chris Wright wrote:
> * Andrew Morton ([email protected]) wrote:
> > Rumour has it that the more exasperated among us are brewing up a patch to
> > login.c which will allow capabilities to be retained after the setuid. So
> > you do
> >
> > echo "oracle CAP_IPC_LOCK" > /etc/logincap.conf
> >
> > And that's it.
> >
> > See any reason why this won't work?
>
> Looks ok, and sounds very similar to what pam_cap does.

just curious, how does this work through 'su'? Does su check
logincap.conf too?

I certainly agree this can be fully solved in userspace, though it won't
be a few-line change in userspace, and in the short term there's not
much time left to change userspace. For the long term, if we want to go
with the userspace solution, that's fine with me; I definitely agree
with that. For the very short term I'm not sure, but I certainly
cannot object if nothing is changed in the mainline kernel for this.

2004-04-02 02:13:26

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 06:04:41PM -0800, Chris Wright wrote:
> * Andrea Arcangeli ([email protected]) wrote:
> > what you missed is that after you do locked_vm -= you don't free anything,
> > you only unmap the pages from the address space, which changes nothing in
> > terms of the amount of pinned ram.
>
> doesn't it free the huge page right there? each page gets
> huge_page_released, right?

that has nothing to do with freeing the page: that's just releasing one
refcount because you dropped the pte mapping. The page is still there,
healthy in the pagecache, ready for somebody else to shmat. If you were
right, a shmdt+shmat would corrupt the SGA.

Your patch breaks local security, and it's trivial to DoS a machine with
it applied as far as I can tell.

2004-04-02 02:21:55

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrea Arcangeli ([email protected]) wrote:
> that has nothing to do with freeing the page: that's just releasing one
> refcount because you dropped the pte mapping. The page is still there,
> healthy in the pagecache, ready for somebody else to shmat. If you were
> right, a shmdt+shmat would corrupt the SGA.

Ah, yes I see what you are saying. This is the same issue with normal
pages and SHM_LOCK that I mentioned earlier, I believe. I don't see the
best solution, because once you detach w/out any destroy, there could be
nobody to assign the accounting to. Do you agree?

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-04-02 02:30:35

by Andrew Morton

Subject: Re: disable-cap-mlock

Andrea Arcangeli <[email protected]> wrote:
>
> On Thu, Apr 01, 2004 at 05:59:14PM -0800, Chris Wright wrote:
> > * Andrew Morton ([email protected]) wrote:
> > > Rumour has it that the more exasperated among us are brewing up a patch to
> > > login.c which will allow capabilities to be retained after the setuid. So
> > > you do
> > >
> > > echo "oracle CAP_IPC_LOCK" > /etc/logincap.conf
> > >
> > > And that's it.
> > >
> > > See any reason why this won't work?
> >
> > Looks ok, and sounds very similar to what pam_cap does.
>
> just curious, how does this work through 'su'? Does su check
> logincap.conf too?

I guess so.

> I certainly agree this can be fully solved in userspace, though it won't
> be a few-line change in userspace, and in the short term there's not
> much time left to change userspace. For the long term, if we want to go
> with the userspace solution, that's fine with me; I definitely agree
> with that. For the very short term I'm not sure, but I certainly
> cannot object if nothing is changed in the mainline kernel for this.

Well you have a local short-term solution...

One thing I was wondering was whether /proc/sys/vm/disable_cap_mlock should
hold a GID rather than a boolean. So you do

echo groupof oracle > /proc/sys/vm/disable_cap_mlock
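A rough sketch of the GID variant, assuming the knob holds a single gid_t and that (gid_t)-1 means "disabled", so that group 0 is not granted by accident. model_in_group_p() stands in for the kernel's in_group_p(), which checks the fsgid and the supplementary groups; all names here are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumed sentinel: the knob is off unless a real group is written. */
#define MLOCK_GROUP_DISABLED ((gid_t)-1)

/* Hypothetical credentials of the calling task. */
struct model_cred {
    bool cap_ipc_lock;      /* capable(CAP_IPC_LOCK) */
    gid_t fsgid;            /* group used for permission checks */
    const gid_t *groups;    /* supplementary groups */
    size_t ngroups;
};

/* Rough analogue of in_group_p(): fsgid or any supplementary group. */
bool model_in_group_p(const struct model_cred *c, gid_t gid)
{
    if (c->fsgid == gid)
        return true;
    for (size_t i = 0; i < c->ngroups; i++)
        if (c->groups[i] == gid)
            return true;
    return false;
}

/* can_do_mlock() with a GID-valued sysctl instead of a boolean. */
bool model_can_do_mlock(const struct model_cred *c, gid_t sysctl_mlock_group)
{
    if (c->cap_ipc_lock)
        return true;
    return sysctl_mlock_group != MLOCK_GROUP_DISABLED &&
           model_in_group_p(c, sysctl_mlock_group);
}
```

The extra cost over the boolean is one group lookup, and only on the path where the task lacks CAP_IPC_LOCK.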

2004-04-02 02:33:20

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrew Morton ([email protected]) wrote:
> Andrea Arcangeli <[email protected]> wrote:
> > just curious, how does this work through 'su'? Does su check
> > logincap.conf too?
>
> I guess so.

Or let pam_cap do it, so you don't have to modify all the apps, just the
pam confs.

> Well you have a local short-term solution...
>
> One thing I was wondering was whether /proc/sys/vm/disable_cap_mlock should
> hold a GID rather than a boolean. So you do
>
> echo groupof oracle > /proc/sys/vm/disable_cap_mlock

Heh, was just thinking the same.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-04-02 02:38:21

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 06:21:27PM -0800, Chris Wright wrote:
> Ah, yes I see what you are saying. This is the same issue with normal
> pages and SHM_LOCK that I mentioned earlier, I believe. I don't see the
> best solution, because once you detach w/out any destroy, there could be
> nobody to assign the accounting to. Do you agree?

yes, rlimit just can't account for shmget(SHM_HUGETLB) and
shmctl(SHM_LOCK) either, because it can only account the stuff that you
temporarily have in the address space.

the exploit is simply to shmget tons of 2M hugepage segments and to
shmat/shmdt all of them; then you'll have pinned N of those 2M
largepages, and they will not be accounted anywhere, allowing anybody
to pin as much memory as they want.

Neither shmctl(SHM_LOCK) nor shmget(SHM_HUGETLB) can be allowed on the
basis of any rlimit check; a system-wide sysctl (as we implemented) or
some other method (it can be implemented in userspace too of course, as
Andrew suggested) is needed for that. Using the rlimit for that is
broken and in turn insecure.

the rlimit however works fine for _mlock_.

the fundamental difference between mlock and SHM_LOCK/SHM_HUGETLB is
that mlock is about locking pages in the address space: after the
address space is unmapped the mlock is gone too, so when the rlimit is
ok with it, you can mlock more ram. SHM_LOCK/SHM_HUGETLB is about
allocating physical pages; the mapping in the address space has no
effect on those, and those pages are not released when the mapping
is gone. So the rlimit can't help here.
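The argument can be made concrete with a toy model of the accounting, assuming 4 KiB base pages (so one 2M huge page is 512 of them) and an rlimit scheme that charges the mm at shmat() and uncharges it at shmdt(). The names are hypothetical and the allocation point is simplified to "first attach":

```c
#include <stdbool.h>

/* Toy model, not kernel code. Counts are in assumed 4 KiB base pages. */
struct model_mm {
    unsigned long locked_vm;      /* what RLIMIT_MEMLOCK is checked against */
};

struct model_seg {
    unsigned long pages;
    bool populated;               /* huge pages already allocated? */
};

static unsigned long pinned_pages; /* RAM nobody can reclaim */

/* mlock()/munlock(): the charge and the pinning come and go together. */
void model_mlock(struct model_mm *mm, unsigned long pages)
{
    mm->locked_vm += pages;
    pinned_pages += pages;
}

void model_munlock(struct model_mm *mm, unsigned long pages)
{
    mm->locked_vm -= pages;
    pinned_pages -= pages;
}

/* shmat() of a SHM_HUGETLB segment: the first attach fills the segment's
 * pagecache with huge pages; the rlimit scheme charges the mm. */
void model_shmat(struct model_mm *mm, struct model_seg *seg)
{
    if (!seg->populated) {
        pinned_pages += seg->pages;
        seg->populated = true;
    }
    mm->locked_vm += seg->pages;
}

/* shmdt(): the mapping and its charge go away, but the pages stay in the
 * segment until it is destroyed, so pinned_pages does not drop. */
void model_shmdt(struct model_mm *mm, struct model_seg *seg)
{
    mm->locked_vm -= seg->pages;
}
```

After an attach/detach loop over N segments the mm has nothing charged, so every rlimit check passes, while N huge pages stay pinned; more segments pin more.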

2004-04-02 02:41:08

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 06:30:26PM -0800, Andrew Morton wrote:
> I guess so.

sounds very good then, I'll send a notice.

> Well you have a local short-term solution...

yep...

> One thing I was wondering was whether /proc/sys/vm/disable_cap_mlock should
> hold a GID rather than a boolean. So you do
>
> echo groupof oracle > /proc/sys/vm/disable_cap_mlock

that's probably optimal, OTOH it would complicate the code. I prefer an
obviously safe !disable_cap_mlock; if we want to go complicated we can
probably wait for the userspace solution ;)

2004-04-02 02:45:17

by Andrew Morton

Subject: Re: disable-cap-mlock

Chris Wright <[email protected]> wrote:
>
> * Andrew Morton ([email protected]) wrote:
> > Andrea Arcangeli <[email protected]> wrote:
> > > just curious, how does this work through 'su'? Does su check
> > > logincap.conf too?
> >
> > I guess so.
>
> Or let pam_cap do it, so you don't have to modify all the apps, just the
> pam confs.

Well the message I'm receiving is that the userspace capability
infrastructure is a decrepit mess which nobody is fixing or maintaining.

Certainly, if we could arrange for pam_cap to be fixed and proselytized
that would be even better than bolting new workalike code into login and
su.

2004-04-02 02:48:26

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrea Arcangeli ([email protected]) wrote:
> On Thu, Apr 01, 2004 at 06:21:27PM -0800, Chris Wright wrote:
> > Ah, yes I see what you are saying. This is the same issue with normal
> > pages and SHM_LOCK that I mentioned earlier, I believe. I don't see the
> > best solution, because once you detach w/out any destroy, there could be
> > nobody to assign the accounting to. Do you agree?
>
> yes, rlimit just can't account for shmget(SHM_HUGETLB) and
> shmctl(SHM_LOCK) either, because it can only account the stuff that you
> temporarily have in the address space.
>
> the exploit is simply to shmget tons of 2M hugepage segments and to
> shmat/shmdt all of them; then you'll have pinned N of those 2M
> largepages, and they will not be accounted anywhere, allowing anybody
> to pin as much memory as they want.

Yup. I had an earlier patch against 2.4 that created a max count for
pages lockable by unprivileged users. So the accounting was done
against a global pool, and mitigated the DoS damage to those trying to
share this pool. I think it was more of a hack, though.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net
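Chris's 2.4 patch is not shown in the thread. A minimal sketch of a global pool of the kind he describes, with hypothetical names and no locking (a kernel version would need a spinlock or an atomic counter around used_pages):

```c
#include <stdbool.h>

/* Hypothetical system-wide budget for pages locked by unprivileged users. */
struct lock_pool {
    unsigned long max_pages;
    unsigned long used_pages;     /* invariant: used_pages <= max_pages */
};

/* Try to charge `pages` to the pool; privileged callers are not charged. */
bool pool_charge(struct lock_pool *p, unsigned long pages, bool privileged)
{
    if (privileged)
        return true;
    /* written as a subtraction so it cannot overflow */
    if (pages > p->max_pages - p->used_pages)
        return false;
    p->used_pages += pages;
    return true;
}

/* Return pages to the pool when the lock is dropped or the object destroyed. */
void pool_uncharge(struct lock_pool *p, unsigned long pages, bool privileged)
{
    if (!privileged)
        p->used_pages -= pages;
}
```

The pool bounds the total damage at max_pages, but one unprivileged user can still drain it and starve the others, which is the residual DoS Chris mentions.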

2004-04-02 02:49:23

by Andrew Morton

Subject: Re: disable-cap-mlock

Andrea Arcangeli <[email protected]> wrote:
>
> > One thing I was wondering was whether /proc/sys/vm/disable_cap_mlock should
> > hold a GID rather than a boolean. So you do
> >
> > echo groupof oracle > /proc/sys/vm/disable_cap_mlock
>
> that's probably optimal, OTOH it would complicate the code. I prefer an
> obviously safe !disable_cap_mlock; if we want to go complicated we can
> probably wait for the userspace solution ;)

That depends on how you structure the code. If you do it the below way,
it's a one-liner.

(Will the compiler propagate `unlikeliness' out of an inline function?)

25-akpm/fs/hugetlbfs/inode.c | 2 +-
25-akpm/include/linux/sched.h | 6 ++++++
25-akpm/include/linux/sysctl.h | 1 +
25-akpm/ipc/shm.c | 2 +-
25-akpm/kernel/capability.c | 1 +
25-akpm/kernel/sysctl.c | 8 ++++++++
25-akpm/mm/mlock.c | 4 ++--
25-akpm/mm/mmap.c | 2 +-
8 files changed, 21 insertions(+), 5 deletions(-)

diff -puN fs/hugetlbfs/inode.c~disable-cap-mlock-2 fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/fs/hugetlbfs/inode.c Thu Apr 1 15:45:20 2004
@@ -707,7 +707,7 @@ struct file *hugetlb_zero_setup(size_t s
struct qstr quick_string;
char buf[16];

- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_mlock())
return ERR_PTR(-EPERM);

if (!is_hugepage_mem_enough(size))
diff -puN include/linux/sched.h~disable-cap-mlock-2 include/linux/sched.h
--- 25/include/linux/sched.h~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/include/linux/sched.h Thu Apr 1 15:45:20 2004
@@ -690,6 +690,12 @@ static inline int capable(int cap)
}
#endif

+extern int sysctl_disable_cap_mlock;
+static inline int can_do_mlock(void)
+{
+ return unlikely(sysctl_disable_cap_mlock || capable(CAP_IPC_LOCK));
+}
+
/*
* Routines for handling mm_structs
*/
diff -puN include/linux/sysctl.h~disable-cap-mlock-2 include/linux/sysctl.h
--- 25/include/linux/sysctl.h~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/include/linux/sysctl.h Thu Apr 1 15:45:48 2004
@@ -159,6 +159,7 @@ enum
VM_LOWER_ZONE_PROTECTION=20,/* Amount of protection of lower zones */
VM_MIN_FREE_KBYTES=21, /* Minimum free kilobytes to maintain */
VM_MAX_MAP_COUNT=22, /* int: Maximum number of mmaps/address-space */
+ VM_DISABLE_CAP_MLOCK=23,/* disable CAP_IPC_LOCK checking */
};

diff -puN ipc/shm.c~disable-cap-mlock-2 ipc/shm.c
--- 25/ipc/shm.c~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/ipc/shm.c Thu Apr 1 15:45:20 2004
@@ -505,7 +505,7 @@ asmlinkage long sys_shmctl (int shmid, i
/* Allow superuser to lock segment in memory */
/* Should the pages be faulted in here or leave it to user? */
/* need to determine interaction with current->swappable */
- if (!capable(CAP_IPC_LOCK)) {
+ if (!can_do_mlock()) {
err = -EPERM;
goto out;
}
diff -puN kernel/capability.c~disable-cap-mlock-2 kernel/capability.c
--- 25/kernel/capability.c~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/kernel/capability.c Thu Apr 1 15:45:20 2004
@@ -14,6 +14,7 @@

unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
kernel_cap_t cap_bset = CAP_INIT_EFF_SET;
+int sysctl_disable_cap_mlock = 0;

EXPORT_SYMBOL(securebits);
EXPORT_SYMBOL(cap_bset);
diff -puN kernel/sysctl.c~disable-cap-mlock-2 kernel/sysctl.c
--- 25/kernel/sysctl.c~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/kernel/sysctl.c Thu Apr 1 15:45:20 2004
@@ -744,6 +744,14 @@ static ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec
},
+ {
+ .ctl_name = VM_DISABLE_CAP_MLOCK,
+ .procname = "disable_cap_mlock",
+ .data = &sysctl_disable_cap_mlock,
+ .maxlen = sizeof(sysctl_disable_cap_mlock),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
{ .ctl_name = 0 }
};

diff -puN mm/mlock.c~disable-cap-mlock-2 mm/mlock.c
--- 25/mm/mlock.c~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/mm/mlock.c Thu Apr 1 15:45:20 2004
@@ -57,7 +57,7 @@ static int do_mlock(unsigned long start,
struct vm_area_struct * vma, * next;
int error;

- if (on && !capable(CAP_IPC_LOCK))
+ if (on && !can_do_mlock())
return -EPERM;
len = PAGE_ALIGN(len);
end = start + len;
@@ -139,7 +139,7 @@ static int do_mlockall(int flags)
unsigned int def_flags;
struct vm_area_struct * vma;

- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_mlock())
return -EPERM;

def_flags = 0;
diff -puN mm/mmap.c~disable-cap-mlock-2 mm/mmap.c
--- 25/mm/mmap.c~disable-cap-mlock-2 Thu Apr 1 15:45:20 2004
+++ 25-akpm/mm/mmap.c Thu Apr 1 15:45:20 2004
@@ -536,7 +536,7 @@ unsigned long do_mmap_pgoff(struct file
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;

if (flags & MAP_LOCKED) {
- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_mlock())
return -EPERM;
vm_flags |= VM_LOCKED;
}

_

2004-04-02 02:51:56

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrew Morton ([email protected]) wrote:
> Well the message I'm receiving is that the userspace capability
> infrastructure is a decrepit mess which nobody is fixing or maintaining.
>
> Certainly, if we could arrange for pam_cap to be fixed and proselytized
> that would be even better than bolting new workalike code into login and
> su.

Very true. I don't know who's maintaining that code, and the libcap
maintainer is not really touching that code either. /me looks about for
some spare round tuits or volunteers.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-04-02 03:07:27

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Thu, Apr 01, 2004 at 06:49:07PM -0800, Andrew Morton wrote:
> Andrea Arcangeli <[email protected]> wrote:
> >
> > > One thing I was wondering was whether /proc/sys/vm/disable_cap_mlock should
> > > hold a GID rather than a boolean. So you do
> > >
> > > echo groupof oracle > /proc/sys/vm/disable_cap_mlock
> >
> > that's probably optimal, OTOH it would complicate the code. I prefer an
> > obviously safe !disable_cap_mlock; if we want to go complicated we can
> > probably wait for the userspace solution ;)
>
> That depends on how you structure the code. If you do it the below way,
> it's a one-liner.

after you did this cleanup effort I'll have to merge your version ;)

>
> (Will the compiler propagate `unlikeliness' out of an inline function?)

it should, both need_resched and signal_pending depend on it. But I
don't think unlikely is correct, it's likely in fact: optimizing for
applications that get -EPERM doesn't sound worthwhile, so I'll change
it to "likely".

thanks.
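The disagreement is only about which way to hint the branch. A compilable sketch, assuming GCC or Clang for __builtin_expect; the likely()/unlikely() macros match the definitions in modern <linux/compiler.h>, while sysctl_disable_cap_mlock and task_has_cap_ipc_lock are plain variables standing in for the sysctl and capable(CAP_IPC_LOCK):

```c
#include <errno.h>
#include <stdbool.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int sysctl_disable_cap_mlock;  /* stands in for /proc/sys/vm/disable_cap_mlock */
static bool task_has_cap_ipc_lock;    /* stands in for capable(CAP_IPC_LOCK) */

/* Andrea's variant: callers that get this far are expected to succeed. */
static inline int can_do_mlock(void)
{
    return likely(sysctl_disable_cap_mlock || task_has_cap_ipc_lock);
}

/* Caller shaped like do_mlock(): the -EPERM exit becomes the cold path. */
int model_do_mlock(void)
{
    if (!can_do_mlock())
        return -EPERM;
    return 0;
}
```

The hint only changes code layout, not behaviour, so either spelling is correct; the question is just which path the compiler should keep in line.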

2004-04-02 03:22:10

by William Lee Irwin III

Subject: Re: disable-cap-mlock

* Andrew Morton ([email protected]) wrote:
>> Well the message I'm receiving is that the userspace capability
>> infrastructure is a decrepit mess which nobody is fixing or maintaining.
>> Certainly, if we could arrange for pam_cap to be fixed and proselytized
>> that would be even better than bolting new workalike code into login and
>> su.

On Thu, Apr 01, 2004 at 06:51:48PM -0800, Chris Wright wrote:
> Very true. I don't know who's maintaining that code, and the libcap
> maintainer is not really touching that code either. /me looks about for
> some spare round tuits or volunteers.

pam_cap is small... it might be debuggable. Entire thing attached.

-- wli

Attachments:

(No filename) (666.00 B)
pam_cap.c (6.85 kB)

2004-04-02 10:39:35

by Pavel Machek

Subject: Re: disable-cap-mlock

Hi!

> > Oracle needs this sysctl, I designed it and Ken Chen implemented it. I
> > guess google also won't dislike it.
> > This is a lot simpler than the mlock rlimit and this is people really
> > need (not the rlimit). The rlimit thing can still be applied on top of
> > this. This should be more efficient too (besides its simplicity).
> > can you apply to mainline?
> > http://www.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.5-rc3-aa1/disable-cap-mlock-1
>
> Something like this would have the minor advantage of zero core impact.
> Testbooted only. vs. 2.6.5-rc3-mm4

I thought this is what setpcap in init is for?
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-04-02 21:37:17

by Andrew Morton

Subject: Re: disable-cap-mlock

Andrew Morton <[email protected]> wrote:
>
> Rumour has it that the more exasperated among us are brewing up a patch to
> login.c which will allow capabilities to be retained after the setuid.

So I spent a few hours getting pam_cap to work, and indeed it is now doing the
right thing. But the kernel is not.

It turns out that the whole "drop capabilities and then run something"
thing does not work in either 2.4 or 2.6. And hasn't done since forever.
What we have in there is no more useful than suser().

You can do prctl(PR_SET_KEEPCAPS, 1) so that permitted caps are retained
across setuid(). And after the setuid() you can raise effective caps
again. So that's workable, although pretty sad - it requires that su and
login be patched to run the prctl and to re-raise effective caps.
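The userspace half of that sequence, as a minimal sketch assuming a current glibc and the version-3 capset(2) ABI (capset has no glibc wrapper, so the raw syscall is used; the 2004 kernels used an older header version). The function names are hypothetical, the setuid step must start as root, and on the kernels discussed here the re-raised CAP_IPC_LOCK would still be lost at the next execve() by a non-root task:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/capability.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1: keep the permitted set across setuid(). Needs no privilege. */
int keep_caps_across_setuid(void)
{
    return prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
}

/* Steps 2 and 3: become uid/gid, then shrink the permitted set to
 * CAP_IPC_LOCK and raise it in the effective set. Returns 0 or -errno. */
int setuid_keeping_ipc_lock(uid_t uid, gid_t gid)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                      /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[2] = { { 0 } };

    if (keep_caps_across_setuid() != 0)
        return -errno;
    /* group first: after setuid() we could no longer change it */
    if (setgid(gid) != 0 || setuid(uid) != 0)
        return -errno;

    /* setuid() cleared the effective set; permitted survived */
    data[CAP_TO_INDEX(CAP_IPC_LOCK)].permitted = CAP_TO_MASK(CAP_IPC_LOCK);
    data[CAP_TO_INDEX(CAP_IPC_LOCK)].effective = CAP_TO_MASK(CAP_IPC_LOCK);
    if (syscall(SYS_capset, &hdr, data) != 0)
        return -errno;
    return 0;
}
```

A real login or su would also set the supplementary groups (for example with initgroups()) before calling setuid().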

But the two showstoppers are:

1) capabilities are unconditionally nuked across execve() unless you're
root (cap_bprm_set_security())

2) the kernel unconditionally removes CAP_SETPCAP in dummy_capget() so
it is not possible for even a root-owned, otherwise-fully-capable task
to raise capabilities on another task. Period.

I must say that I'm fairly disappointed that we developed and merged all
that fancy security stuff but nobody ever bothered to fix up the existing
simple capability code.

Particularly as, apparently, the new security stuff STILL cannot solve the
extremely simple Oracle-wants-CAP_IPC_LOCK requirement.

Chris has proposed a little patch which will enable the retention of caps
across execve. I'd be interested in knowing why we _ever_ dropped caps
across execve? I think we should run with Chris's patch - but the new
functionality should of course only be enabled by some admin-settable knob.

I'm looking at securebits.h and wondering why that exists - there's no code
in-kernel to set the thing, although it is exported to modules. Perhaps
securebits should be exposed in /proc and used to enable
retain-caps-across-execve.

2004-04-02 22:36:48

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrew Morton ([email protected]) wrote:
> So I spent a few hours getting pam_cap to work, and indeed it is now doing the
> right thing. But the kernel is not.

Do you have a patch? Seems it could be useful to get this and libcap back
up to date.

> It turns out that the whole "drop capabilities and then run something"
> thing does not work in either 2.4 or 2.6. And hasn't done since forever.
> What we have in there is no more useful than suser().

Indeed. This is often how I refer to it. There's one exception.
Without the use of execve(), a resident daemon can drop its privs as
needed.

> You can do prctl(PR_SET_KEEPCAPS, 1) so that permitted caps are retained
> across setuid(). And after the setuid() you can raise effective caps
> again. So that's workable, although pretty sad - it requires that su and
> login be patched to run the prctl and to re-raise effective caps.
>
> But the two showstoppers are:
>
> 1) capabilities are unconditionally nuked across execve() unless you're
> root (cap_bprm_set_security())

Or exec'ing a setuid root program. And in either of those cases they
get raised to full sets, which may not be nice.

> 2) the kernel unconditionally removes CAP_SETPCAP in dummy_capget() so
> it is not possible for even a root-owned, otherwise-fully-capable task
> to raise capabilities on another task. Period.

This is how the kernel was before the security stuff went in.

> I must say that I'm fairly disappointed that we developed and merged all
> that fancy security stuff but nobody ever bothered to fix up the existing
> simple capability code.

Our goal was actually to keep it compatible. All of its limitations
predate the security stuff.

> Particularly as, apparently, the new security stuff STILL cannot solve the
> extremely simple Oracle-wants-CAP_IPC_LOCK requirement.
>
> Chris has proposed a little patch which will enable the retention of caps
> across execve. I'd be interested in knowing why we _ever_ dropped caps
> across execve? I think we should run with Chris's patch - but the new
> functionality should of course only be enabled by some admin-settable knob.

I'm not sure, but it likely has to do with anticipating having the fs
bits of capabilities to do proper setting at execve(). I think basically
nobody really uses capabilities except in either simple root drops a
few privs ways (no exec), or within larger security models running as
kernel modules.

> I'm looking at securebits.h and wondering why that exists - there's no code
> in-kernel to set the thing, although it is exported to modules. Perhaps
> securebits should be exposed in /proc and used to enable
> retain-caps-across-execve.

IIRC, changing those (existing) securebits settings creates an unusable
machine. Again, I think there was some anticipation of the fs bits
going in later. Perhaps those securebits pieces could just be removed?

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

2004-04-02 22:56:42

by Andrea Arcangeli

Subject: Re: disable-cap-mlock

On Fri, Apr 02, 2004 at 02:36:39PM -0800, Chris Wright wrote:
> [..] I think basically
> nobody really uses capabilities except in either simple root drops a
> few privs ways (no exec), [..]

yep, at least sendmail does that (I remember because there was a kernel
bug at some point not dropping those).

If I understand correctly, the basic problem is that there's no way to
retain a single capability forever, through execve and everything else.
We'd need a way to tell the kernel that a certain capability must never
go away no matter what syscall is being run. Of course one would need a
special capability to use this functionality (CAP_SYS_ADMIN or similar),
and login/su would then be able to give CAP_IPC_LOCK to a user. I think
something like this was discussed at some point, back when there were
still discussions about putting the capabilities into the fs (or the
elf header or whatever).

2004-04-02 22:59:55

by Andrew Morton

Subject: Re: disable-cap-mlock

Chris Wright <[email protected]> wrote:
>
> * Andrew Morton ([email protected]) wrote:
> > So I spent a few hours getting pam_cap to work, and indeed it is now doing the
> > right thing. But the kernel is not.
>
> Do you have a patch? Seems it could be useful to get this and libcap back
> up-to-date.

http://www.zip.com.au/~akpm/linux/patches/stuff/pam_cap-akpm.tar.gz

> > 2) the kernel unconditionally removes CAP_SETPCAP in dummy_capget() so
> > it is not possible for even a root-owned, otherwise-fully-capable task
> > to raise capabilities on another task. Period.
>
> This is how the kernel was before the security stuff went in.

That's my point, Chris. "the feature is bollixed, so let's write a ton of
new parallel stuff but not fix the original code". This is how cruft
accumulates.

> > I must say that I'm fairly disappointed that we developed and merged all
> > that fancy security stuff but nobody ever bothered to fix up the existing
> > simple capability code.
>
> Our goal was actually to keep it compatible. All of its limitations
> predate the security stuff.

Either the fine-grained capabilities are fixable, or they should be deleted
and we go back to suser(). One of those things should have happened before
adding more code, surely?

> I'm not sure, but it likely has to do with anticipating having the fs
> bits of capabilities to do proper setting at execve(). I think basically
> nobody really uses capabilities except either in the simple "root drops a
> few privs" way (no exec), or within larger security models running as
> kernel modules.

Yup, we've talked about how you can drop caps in this way for *years* but I
don't think many people realised that this emperor is unclad.

> > I'm looking at securebits.h and wondering why that exists - there's no code
> > in-kernel to set the thing, although it is exported to modules. Perhaps
> > securebits should be exposed in /proc and used to enable
> > retain-caps-across-execve.
>
> IIRC, changing those (existing) securebits settings creates an unusable
> machine. Again, I think there was some anticipation of the fs bits
> going in later. Perhaps those securebits pieces could just be removed.

OK. Do you have time to do the honours?

2004-04-02 23:18:32

by Chris Wright

Subject: Re: disable-cap-mlock

* Andrew Morton ([email protected]) wrote:
> http://www.zip.com.au/~akpm/linux/patches/stuff/pam_cap-akpm.tar.gz

Cool, thanks.

> That's my point, Chris. "the feature is bollixed, so let's write a ton of
> new parallel stuff but not fix the original code". This is how cruft
> accumulates.

Yes, OK, point well-taken.

> > Our goal was actually to keep it compatible. All of its limitations
> > predate the security stuff.
>
> Either the fine-grained capabilities are fixable, or they should be deleted
> and we go back to suser(). One of those things should have happened before
> adding more code, surely?

I s'pose we rather viewed the behaviour as legacy... stuck with, don't
muck with... Making it usable is certainly better than going back to
suser(), so let's proceed that way and fix the mistake.

> > IIRC, changing those (existing) securebits settings creates an unusable
> > machine. Again, I think there was some anticipation of the fs bits
> > going in later. Perhaps those securebits pieces could just be removed.
>
> OK. Do you have time to do the honours?

Sure thing.

thanks,
-chris

2004-04-02 23:44:18

by William Lee Irwin III

Subject: Re: disable-cap-mlock

At some point in the past, I wrote:
>> Something like this would have the minor advantage of zero core impact.
>> Testbooted only. vs. 2.6.5-rc3-mm4

On Fri, Apr 02, 2004 at 12:39:23PM +0200, Pavel Machek wrote:
> I thought this is what setpcap in init is for?

Yes, that would be a better answer to this issue. I was largely looking
to produce an alternative implementation of the same thing with less
core impact. It looks like it may have been too powerful for its own
good, which is okay, since I didn't really like the sysctl idea anyway
(though apparently the thing looks attractive to other people for other
uses, which I don't really know much about, and am not really pursuing).

There's a push to fix up the capability issues ongoing that I'm getting
involved in instead.

-- wli

2004-04-05 12:14:35

by Stephen Smalley

Subject: Re: disable-cap-mlock

On Fri, 2004-04-02 at 16:35, Andrew Morton wrote:
> Particularly as, apparently, the new security stuff STILL cannot solve the
> extremely simple Oracle-wants-CAP_IPC_LOCK requirement.

Actually, it can. With SELinux enabled, you run oracle as uid 0 in a TE
domain that is allowed to use CAP_IPC_LOCK (e.g. allow oracle_t
self:capability ipc_lock;) and no other capabilities, and you are done.
Naturally, you would need to define a domain for oracle. uid 0 has no
special significance to SELinux; it is only required to satisfy the
secondary module you stack with SELinux, i.e. dummy or capabilities, and
the ability to use capabilities is controlled by the TE policy.

Or, if you want to drop the need to use uid 0 entirely, you unhook the
secondary_ops from SELinux so that SELinux alone makes the capability
decisions. But that will require finer tuning of the policy
configuration.

None of this is to argue against fixing the base capability logic, just
to note that SELinux can control capability usage.

--
Stephen Smalley <[email protected]>
National Security Agency