Hi.
First, I hope you are fine and the same for your relatives.
Capabilities are used to check if a thread can perform a given action [1].
For example, a thread with CAP_BPF set can use the bpf() syscall.
Capabilities are used in the container world.
In terms of code, several container-related projects maintain their own copy of
the capability list found in include/uapi/linux/capability.h [2][3][4][5].
These projects must update their codebase each time a new capability is
added to the kernel.
Some other projects rely on <sys/capability.h> [6].
In this case, the header file must reflect the capabilities offered by the
kernel.
The delay between a new capability being added to the kernel and this
capability being usable by users of "container stack" software can be long.
For instance, CAP_BPF was added in commit a17b53c4a4b5, which was part of v5.8,
released in August 2020.
Almost two years later, none of the "container stack" software allows using
this capability in its latest stable release.
The only way to use CAP_BPF with moby is to use the v22.06.0-beta.0 release, which
contains a commit enabling CAP_BPF, CAP_PERFMON and CAP_CHECKPOINT_RESTORE [7].
This situation is easily explained by the dependency chain:
1. moby depends on containerd, which in turn depends on runc.
2. runc depends on github.com/syndtr/gocapability, which is a Go package to
deal with capabilities.
This long chain of dependencies explains the delay and the large amount of human
work needed to add support for a new capability to the "container stack" software.
A solution to this problem could be to add a way for userspace to ask the
kernel about the capabilities it offers.
So, in this series, I added a new file to securityfs:
/sys/kernel/security/capabilities.
The goal of this file is to let "container world" software learn the kernel's
capabilities at run time instead of compile time.
The file is read-only; each line contains a capability number followed by the
capability name:
root@vm-amd64:~# cat /sys/kernel/security/capabilities
0 CAP_CHOWN
1 CAP_DAC_OVERRIDE
...
40 CAP_CHECKPOINT_RESTORE
root@vm-amd64:~# wc -c /sys/kernel/security/capabilities
698 /sys/kernel/security/capabilities
So, the "container stack" software only has to read this file to know whether it
can use the capabilities the user asked for.
For example, if the user asks for CAP_BPF on kernel 5.8, this capability will
be present in the file, so it can be used.
However, if the underlying kernel is 5.4, this capability will not be
present, so it cannot be used.
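The lookup described above can be sketched in user space as follows. This is
only an illustration: the function name cap_number_from_listing() is
hypothetical, and it assumes the text of the file has already been read into
memory, with each line made of a number, whitespace and a name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up a capability name in text shaped like the proposed file
 * ("<number><whitespace><name>\n" on each line).
 * Returns the capability number, or -1 if the name is not listed.
 */
int cap_number_from_listing(const char *listing, const char *wanted)
{
	const char *line = listing;

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[96];
		char name[64];
		int number;

		/* Copy one line so sscanf() cannot read past its end. */
		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (sscanf(buf, "%d %63s", &number, name) == 2 &&
			    strcmp(name, wanted) == 0)
				return number;
		}
		line = eol ? eol + 1 : line + len;
	}
	return -1;
}
```

A caller would pass the content of /sys/kernel/security/capabilities as the
first argument and refuse the user's request when the result is -1.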
The kernel already exposes the last capability number under:
/proc/sys/kernel/cap_last_cap
So, I think there should not be any issue with exposing all the capabilities it
offers.
If there is one, please share it, as I do not want to introduce any issue with this
series.
Also, the data exchanged with userspace is less than 700 bytes long, which
represents 17% of a 4 KiB PAGE_SIZE.
Note that I am open to any better way for userspace to ask the kernel for
known capabilities.
And if you see any way to improve this series, please share it, as it would
improve the quality of this contribution.
Changes since:
v3:
* Use securityfs_create_file() to create securityfs file.
v2:
* Use a char * for cap_string instead of an array; each line of this string
contains the capability number and its name.
* Move the file under /sys/kernel/security instead of /sys/kernel.
Francis Laniel (2):
capability: Add cap_string.
security/inode.c: Add capabilities file.
include/uapi/linux/capability.h | 1 +
kernel/capability.c | 45 +++++++++++++++++++++++++++++++++
security/inode.c | 16 ++++++++++++
3 files changed, 62 insertions(+)
Best regards and thank you in advance for your reviews.
---
[1] man capabilities
[2] https://github.com/containerd/containerd/blob/1a078e6893d07fec10a4940a5664fab21d6f7d1e/pkg/cap/cap_linux.go#L135
[3] https://github.com/moby/moby/commit/485cf38d48e7111b3d1f584d5e9eab46a902aabc#diff-2e04625b209932e74c617de96682ed72fbd1bb0d0cb9fb7c709cf47a86b6f9c1
moby relies on containerd code.
[4] https://github.com/syndtr/gocapability/blob/42c35b4376354fd554efc7ad35e0b7f94e3a0ffb/capability/enum.go#L47
[5] https://github.com/opencontainers/runc/blob/00f56786bb220b55b41748231880ba0e6380519a/libcontainer/capabilities/capabilities.go#L12
runc relies on syndtr package.
[6] https://github.com/containers/crun/blob/fafb556f09e6ffd4690c452ff51856b880c089f1/src/libcrun/linux.c#L35
[7] https://github.com/moby/moby/commit/c1c973e81b0ff36c697fbeabeb5ea7d09566ddc0
--
2.25.1
This new read-only file prints the capability values with their names:
cat /sys/kernel/security/capabilities
0 CAP_CHOWN
1 CAP_DAC_OVERRIDE
...
40 CAP_CHECKPOINT_RESTORE
Acked-by: Casey Schaufler <[email protected]>
Signed-off-by: Francis Laniel <[email protected]>
---
security/inode.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/security/inode.c b/security/inode.c
index 6c326939750d..6da87ae5a8d6 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -21,6 +21,7 @@
#include <linux/security.h>
#include <linux/lsm_hooks.h>
#include <linux/magic.h>
+#include <linux/capability.h>
static struct vfsmount *mount;
static int mount_count;
@@ -328,6 +329,19 @@ static const struct file_operations lsm_ops = {
};
#endif
+static struct dentry *capabilities_dentry;
+static ssize_t capabilities_read(struct file *unused, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ return simple_read_from_buffer(buf, count, ppos, cap_string,
+ strlen(cap_string));
+}
+
+static const struct file_operations capabilities_ops = {
+ .read = capabilities_read,
+ .llseek = generic_file_llseek,
+};
+
static int __init securityfs_init(void)
{
int retval;
@@ -345,6 +359,8 @@ static int __init securityfs_init(void)
lsm_dentry = securityfs_create_file("lsm", 0444, NULL, NULL,
&lsm_ops);
#endif
+ capabilities_dentry = securityfs_create_file("capabilities", 0444, NULL,
+ NULL, &capabilities_ops);
return 0;
}
core_initcall(securityfs_init);
--
2.25.1
Each line of this string contains a capability number followed by its
name.
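The construction can be reproduced in user space with a two-level stringify
macro. The sketch below is an illustration only: the DEMO_* names are
hypothetical stand-ins for the kernel's __stringify() and for the CAP_* macros,
with the two numeric values copied from include/uapi/linux/capability.h.

```c
#include <stdio.h>

/*
 * Two-level expansion: the outer macro expands its argument first, so the
 * inner '#' operator sees the numeric value rather than the macro name.
 */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1(x)

/* Stand-ins for CAP_CHOWN and CAP_DAC_OVERRIDE. */
#define DEMO_CAP_CHOWN 0
#define DEMO_CAP_DAC_OVERRIDE 1

/* Adjacent string literals are concatenated at compile time. */
const char *demo_cap_string =
	DEMO_STRINGIFY(DEMO_CAP_CHOWN) "\tCAP_CHOWN\n"
	DEMO_STRINGIFY(DEMO_CAP_DAC_OVERRIDE) "\tCAP_DAC_OVERRIDE\n";
```

With a single-level macro, the '#' operator would produce the macro name
("DEMO_CAP_CHOWN") instead of its value.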
For example, the first line is:
__stringify(CAP_CHOWN) "\tCAP_CHOWN\n"
which the preprocessor will replace with:
"0\tCAP_CHOWN\n"
Acked-by: Casey Schaufler <[email protected]>
Signed-off-by: Francis Laniel <[email protected]>
---
include/uapi/linux/capability.h | 1 +
kernel/capability.c | 45 +++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 463d1ba2232a..115f4fef00da 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -428,5 +428,6 @@ struct vfs_ns_cap_data {
#define CAP_TO_INDEX(x) ((x) >> 5) /* 1 << 5 == bits in __u32 */
#define CAP_TO_MASK(x) (1 << ((x) & 31)) /* mask for indexed __u32 */
+extern const char *cap_string;
#endif /* _UAPI_LINUX_CAPABILITY_H */
diff --git a/kernel/capability.c b/kernel/capability.c
index 765194f5d678..4cd0ce07458b 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -15,6 +15,7 @@
#include <linux/mm.h>
#include <linux/export.h>
#include <linux/security.h>
+#include <linux/stringify.h>
#include <linux/syscalls.h>
#include <linux/pid_namespace.h>
#include <linux/user_namespace.h>
@@ -27,6 +28,50 @@
const kernel_cap_t __cap_empty_set = CAP_EMPTY_SET;
EXPORT_SYMBOL(__cap_empty_set);
+const char *cap_string =
+ __stringify(CAP_CHOWN) "\tCAP_CHOWN\n"
+ __stringify(CAP_DAC_OVERRIDE) "\tCAP_DAC_OVERRIDE\n"
+ __stringify(CAP_DAC_READ_SEARCH) "\tCAP_DAC_READ_SEARCH\n"
+ __stringify(CAP_FOWNER) "\tCAP_FOWNER\n"
+ __stringify(CAP_FSETID) "\tCAP_FSETID\n"
+ __stringify(CAP_KILL) "\tCAP_KILL\n"
+ __stringify(CAP_SETGID) "\tCAP_SETGID\n"
+ __stringify(CAP_SETUID) "\tCAP_SETUID\n"
+ __stringify(CAP_SETPCAP) "\tCAP_SETPCAP\n"
+ __stringify(CAP_LINUX_IMMUTABLE) "\tCAP_LINUX_IMMUTABLE\n"
+ __stringify(CAP_NET_BIND_SERVICE) "\tCAP_NET_BIND_SERVICE\n"
+ __stringify(CAP_NET_BROADCAST) "\tCAP_NET_BROADCAST\n"
+ __stringify(CAP_NET_ADMIN) "\tCAP_NET_ADMIN\n"
+ __stringify(CAP_NET_RAW) "\tCAP_NET_RAW\n"
+ __stringify(CAP_IPC_LOCK) "\tCAP_IPC_LOCK\n"
+ __stringify(CAP_IPC_OWNER) "\tCAP_IPC_OWNER\n"
+ __stringify(CAP_SYS_MODULE) "\tCAP_SYS_MODULE\n"
+ __stringify(CAP_SYS_RAWIO) "\tCAP_SYS_RAWIO\n"
+ __stringify(CAP_SYS_CHROOT) "\tCAP_SYS_CHROOT\n"
+ __stringify(CAP_SYS_PTRACE) "\tCAP_SYS_PTRACE\n"
+ __stringify(CAP_SYS_PACCT) "\tCAP_SYS_PACCT\n"
+ __stringify(CAP_SYS_ADMIN) "\tCAP_SYS_ADMIN\n"
+ __stringify(CAP_SYS_BOOT) "\tCAP_SYS_BOOT\n"
+ __stringify(CAP_SYS_NICE) "\tCAP_SYS_NICE\n"
+ __stringify(CAP_SYS_RESOURCE) "\tCAP_SYS_RESOURCE\n"
+ __stringify(CAP_SYS_TIME) "\tCAP_SYS_TIME\n"
+ __stringify(CAP_SYS_TTY_CONFIG) "\tCAP_SYS_TTY_CONFIG\n"
+ __stringify(CAP_MKNOD) "\tCAP_MKNOD\n"
+ __stringify(CAP_LEASE) "\tCAP_LEASE\n"
+ __stringify(CAP_AUDIT_WRITE) "\tCAP_AUDIT_WRITE\n"
+ __stringify(CAP_AUDIT_CONTROL) "\tCAP_AUDIT_CONTROL\n"
+ __stringify(CAP_SETFCAP) "\tCAP_SETFCAP\n"
+ __stringify(CAP_MAC_OVERRIDE) "\tCAP_MAC_OVERRIDE\n"
+ __stringify(CAP_MAC_ADMIN) "\tCAP_MAC_ADMIN\n"
+ __stringify(CAP_SYSLOG) "\tCAP_SYSLOG\n"
+ __stringify(CAP_WAKE_ALARM) "\tCAP_WAKE_ALARM\n"
+ __stringify(CAP_BLOCK_SUSPEND) "\tCAP_BLOCK_SUSPEND\n"
+ __stringify(CAP_AUDIT_READ) "\tCAP_AUDIT_READ\n"
+ __stringify(CAP_PERFMON) "\tCAP_PERFMON\n"
+ __stringify(CAP_BPF) "\tCAP_BPF\n"
+ __stringify(CAP_CHECKPOINT_RESTORE) "\tCAP_CHECKPOINT_RESTORE\n"
+;
+
int file_caps_enabled = 1;
static int __init file_caps_disable(char *str)
--
2.25.1
On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
<[email protected]> wrote:
> Hi.
>
> First, I hope you are fine and the same for your relatives.
Hi Francis :)
> A solution to this problem could be to add a way for the userspace to ask the
> kernel about the capabilities it offers.
> So, in this series, I added a new file to securityfs:
> /sys/kernel/security/capabilities.
> The goal of this file is to be used by "container world" software to know kernel
> capabilities at run time instead of compile time.
...
> The kernel already exposes the last capability number under:
> /proc/sys/kernel/cap_last_cap
I'm not clear on why this patchset is needed, why can't the
application simply read from "cap_last_cap" to determine what
capabilities the kernel supports?
--
paul-moore.com
Hi.
Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
>
> <[email protected]> wrote:
> > Hi.
> >
> > First, I hope you are fine and the same for your relatives.
>
> Hi Francis :)
>
> > A solution to this problem could be to add a way for the userspace to ask
> > the kernel about the capabilities it offers.
> > So, in this series, I added a new file to securityfs:
> > /sys/kernel/security/capabilities.
> > The goal of this file is to be used by "container world" software to know
> > kernel capabilities at run time instead of compile time.
>
> ...
>
> > The kernel already exposes the last capability number under:
> > /proc/sys/kernel/cap_last_cap
>
> I'm not clear on why this patchset is needed, why can't the
> application simply read from "cap_last_cap" to determine what
> capabilities the kernel supports?
When you use capabilities with, for example, docker, you specify them
like this:
docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
Sadly, each time a new capability is added to the kernel, "container
stack" software has to add a new string corresponding to the number of that
capability [1].
The solution I propose would let "container stack" software get rid of
such an array and test at runtime whether the name provided by the user on the
command line matches the name of a capability known by the kernel.
If it does, "container stack" code can take the number associated with the
capability and use it as an argument to the capset() system call.
The advantage of this solution is that it would reduce the time between
a new capability being added to the kernel (e.g. CAP_BPF) and users being able
to use it.
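Once the number is known, turning it into the bits that capset() expects is
mechanical. The sketch below is an illustration under stated assumptions: the
demo_* helpers are hypothetical local mirrors of the CAP_TO_INDEX() and
CAP_TO_MASK() macros from include/uapi/linux/capability.h, and the two-word
array stands for one capability set (for example the effective set) split
across two 32-bit words.

```c
#include <stdint.h>

/* Local mirror of CAP_TO_INDEX(): which 32-bit word holds the bit. */
static unsigned int demo_cap_to_index(unsigned int cap)
{
	return cap >> 5;
}

/* Local mirror of CAP_TO_MASK(): the bit inside that word. */
static uint32_t demo_cap_to_mask(unsigned int cap)
{
	return UINT32_C(1) << (cap & 31);
}

/*
 * Set capability number 'cap' in a two-word bit set.
 * Returns 0 on success, -1 if the number does not fit in 64 bits.
 */
int demo_cap_set_bit(uint32_t words[2], unsigned int cap)
{
	if (cap >= 64)
		return -1;
	words[demo_cap_to_index(cap)] |= demo_cap_to_mask(cap);
	return 0;
}
```

For CAP_SYS_ADMIN (21) this sets bit 21 of the first word; for CAP_BPF (39) it
sets bit 7 of the second word.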
More generally, a solution to this problem would be a way for the kernel to
expose the capabilities it knows.
Do not hesitate to ask for clarification if I was not clear.
Best regards.
---
[1] https://github.com/containerd/containerd/blob/1a078e6893d07fec10a4940a5664fab21d6f7d1e/pkg/cap/cap_linux.go#L135
On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
<[email protected]> wrote:
> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
> > On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
> >
> > <[email protected]> wrote:
> > > Hi.
> > >
> > > First, I hope you are fine and the same for your relatives.
> >
> > Hi Francis :)
> >
> > > A solution to this problem could be to add a way for the userspace to ask
> > > the kernel about the capabilities it offers.
> > > So, in this series, I added a new file to securityfs:
> > > /sys/kernel/security/capabilities.
> > > The goal of this file is to be used by "container world" software to know
> > > kernel capabilities at run time instead of compile time.
> >
> > ...
> >
> > > The kernel already exposes the last capability number under:
> > > /proc/sys/kernel/cap_last_cap
> >
> > I'm not clear on why this patchset is needed, why can't the
> > application simply read from "cap_last_cap" to determine what
> > capabilities the kernel supports?
>
> When you capabilities with, for example, docker, you will fill capabilities
> like this:
> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
>
> Sadly, each time a new capability is added to the kernel, it means "container
> stack" software should add a new string corresponding to the number of the
> capabilities [1].
Thanks for clarifying things, I thought you were more concerned about
detecting what capabilities the running kernel supported, I didn't
realize it was getting a string literal for each supported capability.
Unless there is a significant show of support for this - and I'm
guessing there isn't due to the lack of comments - I don't think this
is something we want to add to the kernel, especially since the kernel
doesn't really care about the capabilities' names, it's the number
that matters.
--
paul-moore.com
On 8/17/2022 7:52 AM, Paul Moore wrote:
> On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
> <[email protected]> wrote:
>> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
>>> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
>>>
>>> <[email protected]> wrote:
>>>> Hi.
>>>>
>>>> First, I hope you are fine and the same for your relatives.
>>> Hi Francis :)
>>>
>>>> A solution to this problem could be to add a way for the userspace to ask
>>>> the kernel about the capabilities it offers.
>>>> So, in this series, I added a new file to securityfs:
>>>> /sys/kernel/security/capabilities.
>>>> The goal of this file is to be used by "container world" software to know
>>>> kernel capabilities at run time instead of compile time.
>>> ...
>>>
>>>> The kernel already exposes the last capability number under:
>>>> /proc/sys/kernel/cap_last_cap
>>> I'm not clear on why this patchset is needed, why can't the
>>> application simply read from "cap_last_cap" to determine what
>>> capabilities the kernel supports?
>> When you capabilities with, for example, docker, you will fill capabilities
>> like this:
>> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
>> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
>>
>> Sadly, each time a new capability is added to the kernel, it means "container
>> stack" software should add a new string corresponding to the number of the
>> capabilities [1].
> Thanks for clarifying things, I thought you were more concerned about
> detecting what capabilities the running kernel supported, I didn't
> realize it was getting a string literal for each supported capability.
> Unless there is a significant show of support for this
I believe this could be a significant help in encouraging the use of
capabilities. An application that has to know the list of capabilities
at compile time but is expected to run unmodified for decades isn't
going to be satisfied with cap_last_cap. The best it can do with that
is abort, not being able to ask an admin what to do in the presence of
a capability that wasn't around before because the name isn't known.
On the other hand, it's possible that capabilities are a failure,
and that any effort to make them easier to use is pointless.
> - and I'm
> guessing there isn't due to the lack of comments - I don't think this
> is something we want to add to the kernel, especially since the kernel
> doesn't really care about the capabilities' names, it's the number
> that matters.
>
On Wed, Aug 17, 2022 at 12:19 PM Serge E. Hallyn <[email protected]> wrote:
> On Wed, Aug 17, 2022 at 12:10:25PM -0400, Paul Moore wrote:
> > On Wed, Aug 17, 2022 at 11:50 AM Casey Schaufler <[email protected]> wrote:
> > > On 8/17/2022 7:52 AM, Paul Moore wrote:
> > > > On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
> > > > <[email protected]> wrote:
> > > >> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
> > > >>> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
> > > >>>
> > > >>> <[email protected]> wrote:
> > > >>>> Hi.
> > > >>>>
> > > >>>> First, I hope you are fine and the same for your relatives.
> > > >>> Hi Francis :)
> > > >>>
> > > >>>> A solution to this problem could be to add a way for the userspace to ask
> > > >>>> the kernel about the capabilities it offers.
> > > >>>> So, in this series, I added a new file to securityfs:
> > > >>>> /sys/kernel/security/capabilities.
> > > >>>> The goal of this file is to be used by "container world" software to know
> > > >>>> kernel capabilities at run time instead of compile time.
> > > >>> ...
> > > >>>
> > > >>>> The kernel already exposes the last capability number under:
> > > >>>> /proc/sys/kernel/cap_last_cap
> > > >>> I'm not clear on why this patchset is needed, why can't the
> > > >>> application simply read from "cap_last_cap" to determine what
> > > >>> capabilities the kernel supports?
> > > >> When you capabilities with, for example, docker, you will fill capabilities
> > > >> like this:
> > > >> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
> > > >> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
> > > >>
> > > >> Sadly, each time a new capability is added to the kernel, it means "container
> > > >> stack" software should add a new string corresponding to the number of the
> > > >> capabilities [1].
> > > > Thanks for clarifying things, I thought you were more concerned about
> > > > detecting what capabilities the running kernel supported, I didn't
> > > > realize it was getting a string literal for each supported capability.
> > > > Unless there is a significant show of support for this
> > >
> > > I believe this could be a significant help in encouraging the use of
> > > capabilities. An application that has to know the list of capabilities
> > > at compile time but is expected to run unmodified for decades isn't
> > > going to be satisfied with cap_last_cap. The best it can do with that
> > > is abort, not being able to ask an admin what to do in the presence of
> > > a capability that wasn't around before because the name isn't known.
> >
> > An application isn't going to be able to deduce the semantic value of
> > a capability based solely on a string value, an integer is just as
> > meaningful in that regard. What might be useful is if the application
>
> Maybe it's important to point out that an integer value capability in
> kernel will NEVER change its string value (or semantic meaning).
>
> The libcap tools like capsh accept integer capabilities, other tools
> probably should as well. (see man 3 cap_from_text)
Seems like a reasonable thing to me, I would much prefer that than the
approach in this patchset.
--
paul-moore.com
On Wed, Aug 17, 2022 at 11:50 AM Casey Schaufler <[email protected]> wrote:
> On 8/17/2022 7:52 AM, Paul Moore wrote:
> > On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
> > <[email protected]> wrote:
> >> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
> >>> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
> >>>
> >>> <[email protected]> wrote:
> >>>> Hi.
> >>>>
> >>>> First, I hope you are fine and the same for your relatives.
> >>> Hi Francis :)
> >>>
> >>>> A solution to this problem could be to add a way for the userspace to ask
> >>>> the kernel about the capabilities it offers.
> >>>> So, in this series, I added a new file to securityfs:
> >>>> /sys/kernel/security/capabilities.
> >>>> The goal of this file is to be used by "container world" software to know
> >>>> kernel capabilities at run time instead of compile time.
> >>> ...
> >>>
> >>>> The kernel already exposes the last capability number under:
> >>>> /proc/sys/kernel/cap_last_cap
> >>> I'm not clear on why this patchset is needed, why can't the
> >>> application simply read from "cap_last_cap" to determine what
> >>> capabilities the kernel supports?
> >> When you capabilities with, for example, docker, you will fill capabilities
> >> like this:
> >> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
> >> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
> >>
> >> Sadly, each time a new capability is added to the kernel, it means "container
> >> stack" software should add a new string corresponding to the number of the
> >> capabilities [1].
> > Thanks for clarifying things, I thought you were more concerned about
> > detecting what capabilities the running kernel supported, I didn't
> > realize it was getting a string literal for each supported capability.
> > Unless there is a significant show of support for this
>
> I believe this could be a significant help in encouraging the use of
> capabilities. An application that has to know the list of capabilities
> at compile time but is expected to run unmodified for decades isn't
> going to be satisfied with cap_last_cap. The best it can do with that
> is abort, not being able to ask an admin what to do in the presence of
> a capability that wasn't around before because the name isn't known.
An application isn't going to be able to deduce the semantic value of
a capability based solely on a string value, an integer is just as
meaningful in that regard. What might be useful is if the application
simply accepts a set of capabilities from the user and then checks
those against the maximum supported by the kernel, but once again that
doesn't require a string value, it just requires the application
taking a set of integers and passing those into the kernel when a
capability set is required. I still don't see how adding the
capability string names to the kernel is useful here.
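A minimal sketch of that integer-only check, for illustration: the function
name demo_checked_cap() is hypothetical, and it assumes the caller has already
read the text of /proc/sys/kernel/cap_last_cap into a string.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal capability number supplied by the user and check it
 * against the highest number reported by the running kernel.
 * 'last_cap_text' is the content of /proc/sys/kernel/cap_last_cap and may
 * end with a newline.
 * Returns the number, or -1 if either input is malformed or out of range.
 */
int demo_checked_cap(const char *user_text, const char *last_cap_text)
{
	char *end;
	long last, cap;

	errno = 0;
	last = strtol(last_cap_text, &end, 10);
	if (errno || end == last_cap_text ||
	    (*end != '\0' && *end != '\n') || last < 0 || last > INT_MAX)
		return -1;

	errno = 0;
	cap = strtol(user_text, &end, 10);
	if (errno || end == user_text || *end != '\0' ||
	    cap < 0 || cap > last)
		return -1;

	return (int)cap;
}
```

The returned number can be used directly when building a capability set; no
name table is involved.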
--
paul-moore.com
On 8/17/2022 9:10 AM, Paul Moore wrote:
> On Wed, Aug 17, 2022 at 11:50 AM Casey Schaufler <[email protected]> wrote:
>> On 8/17/2022 7:52 AM, Paul Moore wrote:
>>> On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
>>> <[email protected]> wrote:
>>>> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
>>>>> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
>>>>>
>>>>> <[email protected]> wrote:
>>>>>> Hi.
>>>>>>
>>>>>> First, I hope you are fine and the same for your relatives.
>>>>> Hi Francis :)
>>>>>
>>>>>> A solution to this problem could be to add a way for the userspace to ask
>>>>>> the kernel about the capabilities it offers.
>>>>>> So, in this series, I added a new file to securityfs:
>>>>>> /sys/kernel/security/capabilities.
>>>>>> The goal of this file is to be used by "container world" software to know
>>>>>> kernel capabilities at run time instead of compile time.
>>>>> ...
>>>>>
>>>>>> The kernel already exposes the last capability number under:
>>>>>> /proc/sys/kernel/cap_last_cap
>>>>> I'm not clear on why this patchset is needed, why can't the
>>>>> application simply read from "cap_last_cap" to determine what
>>>>> capabilities the kernel supports?
>>>> When you capabilities with, for example, docker, you will fill capabilities
>>>> like this:
>>>> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
>>>> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
>>>>
>>>> Sadly, each time a new capability is added to the kernel, it means "container
>>>> stack" software should add a new string corresponding to the number of the
>>>> capabilities [1].
>>> Thanks for clarifying things, I thought you were more concerned about
>>> detecting what capabilities the running kernel supported, I didn't
>>> realize it was getting a string literal for each supported capability.
>>> Unless there is a significant show of support for this
>> I believe this could be a significant help in encouraging the use of
>> capabilities. An application that has to know the list of capabilities
>> at compile time but is expected to run unmodified for decades isn't
>> going to be satisfied with cap_last_cap. The best it can do with that
>> is abort, not being able to ask an admin what to do in the presence of
>> a capability that wasn't around before because the name isn't known.
> An application isn't going to be able to deduce the semantic value of
> a capability based solely on a string value,
True, but it can ask someone what to do, and in that case a string is
much better than a number:
thwonkd: Unknown capability 42 - update thwonkd.conf policy section
thwonkd: Unknown capability butter_toast - update thwonkd.conf policy section
The thwonkd configuration could be updated to use that capability correctly.
Yes, you could look capability 42 up in the system header files, but only
if they're installed and there's no guarantee that the header files match
the running kernel. That said, I can't think of a case where this would be
useful in real life except for systemd and chcap. I can't speak to the
container manager proposed, as I don't see containers being deployed with
finer granularity than "privileged" or "unprivileged".
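The two diagnostics above could come from a helper like the following sketch.
It is an illustration only: demo_unknown_cap_message() is a hypothetical name,
and the caller is assumed to pass a NULL name when no name could be found for
the number.

```c
#include <stdio.h>

/*
 * Format an "unknown capability" diagnostic into 'out'.
 * If a name is available it is reported; otherwise only the number is.
 * Returns the value returned by snprintf().
 */
int demo_unknown_cap_message(char *out, size_t out_len, int number,
			     const char *name)
{
	if (name != NULL)
		return snprintf(out, out_len,
				"thwonkd: Unknown capability %s - update thwonkd.conf policy section",
				name);
	return snprintf(out, out_len,
			"thwonkd: Unknown capability %d - update thwonkd.conf policy section",
			number);
}
```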
> an integer is just as
> meaningful in that regard. What might be useful is if the application
> simply accepts a set of capabilities from the user and then checks
> those against the maximum supported by the kernel, but once again that
> doesn't require a string value, it just requires the application
> taking a set of integers and passing those into the kernel when a
> capability set is required. I still don't see how adding the
> capability string names to the kernel is useful here.
>
On Wed, Aug 17, 2022 at 12:10:25PM -0400, Paul Moore wrote:
> On Wed, Aug 17, 2022 at 11:50 AM Casey Schaufler <[email protected]> wrote:
> > On 8/17/2022 7:52 AM, Paul Moore wrote:
> > > On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
> > > <[email protected]> wrote:
> > >> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
> > >>> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
> > >>>
> > >>> <[email protected]> wrote:
> > >>>> Hi.
> > >>>>
> > >>>> First, I hope you are fine and the same for your relatives.
> > >>> Hi Francis :)
> > >>>
> > >>>> A solution to this problem could be to add a way for the userspace to ask
> > >>>> the kernel about the capabilities it offers.
> > >>>> So, in this series, I added a new file to securityfs:
> > >>>> /sys/kernel/security/capabilities.
> > >>>> The goal of this file is to be used by "container world" software to know
> > >>>> kernel capabilities at run time instead of compile time.
> > >>> ...
> > >>>
> > >>>> The kernel already exposes the last capability number under:
> > >>>> /proc/sys/kernel/cap_last_cap
> > >>> I'm not clear on why this patchset is needed, why can't the
> > >>> application simply read from "cap_last_cap" to determine what
> > >>> capabilities the kernel supports?
> > >> When you capabilities with, for example, docker, you will fill capabilities
> > >> like this:
> > >> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
> > >> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
> > >>
> > >> Sadly, each time a new capability is added to the kernel, it means "container
> > >> stack" software should add a new string corresponding to the number of the
> > >> capabilities [1].
> > > Thanks for clarifying things, I thought you were more concerned about
> > > detecting what capabilities the running kernel supported, I didn't
> > > realize it was getting a string literal for each supported capability.
> > > Unless there is a significant show of support for this
> >
> > I believe this could be a significant help in encouraging the use of
> > capabilities. An application that has to know the list of capabilities
> > at compile time but is expected to run unmodified for decades isn't
> > going to be satisfied with cap_last_cap. The best it can do with that
> > is abort, not being able to ask an admin what to do in the presence of
> > a capability that wasn't around before because the name isn't known.
>
> An application isn't going to be able to deduce the semantic value of
> a capability based solely on a string value, an integer is just as
> meaningful in that regard. What might be useful is if the application
Maybe it's important to point out that an integer value capability in
kernel will NEVER change its string value (or semantic meaning).
The libcap tools like capsh accept integer capabilities, other tools
probably should as well. (see man 3 cap_from_text)
> simply accepts a set of capabilities from the user and then checks
> those against the maximum supported by the kernel, but once again that
> doesn't require a string value, it just requires the application
> taking a set of integers and passing those into the kernel when a
> capability set is required. I still don't see how adding the
> capability string names to the kernel is useful here.
>
> --
> paul-moore.com
On Wed, Aug 17, 2022 at 12:49 PM Casey Schaufler <[email protected]> wrote:
> On 8/17/2022 9:10 AM, Paul Moore wrote:
> > On Wed, Aug 17, 2022 at 11:50 AM Casey Schaufler <[email protected]> wrote:
> >> On 8/17/2022 7:52 AM, Paul Moore wrote:
> >>> On Wed, Aug 17, 2022 at 7:53 AM Francis Laniel
> >>> <[email protected]> wrote:
> >>>> Le mardi 16 août 2022, 23:59:41 CEST Paul Moore a écrit :
> >>>>> On Mon, Jul 25, 2022 at 8:42 AM Francis Laniel
> >>>>>
> >>>>> <[email protected]> wrote:
> >>>>>> Hi.
> >>>>>>
> >>>>>> First, I hope you are fine and the same for your relatives.
> >>>>> Hi Francis :)
> >>>>>
> >>>>>> A solution to this problem could be to add a way for the userspace to ask
> >>>>>> the kernel about the capabilities it offers.
> >>>>>> So, in this series, I added a new file to securityfs:
> >>>>>> /sys/kernel/security/capabilities.
> >>>>>> The goal of this file is to be used by "container world" software to know
> >>>>>> kernel capabilities at run time instead of compile time.
> >>>>> ...
> >>>>>
> >>>>>> The kernel already exposes the last capability number under:
> >>>>>> /proc/sys/kernel/cap_last_cap
> >>>>> I'm not clear on why this patchset is needed, why can't the
> >>>>> application simply read from "cap_last_cap" to determine what
> >>>>> capabilities the kernel supports?
> >>>> When you capabilities with, for example, docker, you will fill capabilities
> >>>> like this:
> >>>> docker run --rm --cap-add SYS_ADMIN debian:latest echo foo
> >>>> As a consequence, the "echo foo" will be run with CAP_SYS_ADMIN set.
> >>>>
> >>>> Sadly, each time a new capability is added to the kernel, it means "container
> >>>> stack" software should add a new string corresponding to the number of the
> >>>> capabilities [1].
> >>> Thanks for clarifying things, I thought you were more concerned about
> >>> detecting what capabilities the running kernel supported, I didn't
> >>> realize it was getting a string literal for each supported capability.
> >>> Unless there is a significant show of support for this
> >> I believe this could be a significant help in encouraging the use of
> >> capabilities. An application that has to know the list of capabilities
> >> at compile time but is expected to run unmodified for decades isn't
> >> going to be satisfied with cap_last_cap. The best it can do with that
> >> is abort, not being able to ask an admin what to do in the presence of
> >> a capability that wasn't around before because the name isn't known.
> > An application isn't going to be able to deduce the semantic value of
> > a capability based solely on a string value,
>
> True, but it can ask someone what to do, and in that case a string is
> much better than a number ...
If you are asking a user what to do, that user can just as easily look
up the capability list to translate numbers to intent. If your
security approach requires a user knowing all of the subtle details
around a capability based on a 10~15 character string, I wish you the
best of luck :)
--
paul-moore.com