2022-01-21 22:30:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/2] Add capabilities file to sysfs

Francis Laniel <[email protected]> writes:

> Hi.
>
>
> First, I hope you are fine and the same for your relatives.
>
>
> Capabilities are used to check if a thread has the right to perform a given
> action [1].
> For example, a thread with CAP_BPF set can use the bpf() syscall.
>
> Capabilities are used in the container world.
> In terms of code, several projects related to container maintain code where the
> capabilities are written alike include/uapi/linux/capability.h [2][3][4][5].
> For these projects, their codebase should be updated when a new capability is
> added to the kernel.
> Some other projects rely on <sys/capability.h> [6].
> In this case, this header file should reflect the capabilities offered by the
> kernel.
>
> So, in this series, I added a new file to sysfs:
> /sys/kernel/security/capabilities.

Actually that is a file in securityfs. Which is related but slightly
different. For sysfs this would be immediately unacceptable as it
breaks the one value per file rule. I don't know what the rules
are for securityfs but I do know files that contain many many lines
and get very large tend to be problematic in both their kernel
implementation and in userspace parsing speed.

So I am looking for what the advantage of this file that justifies the
cost of maintaining it.

> The goal of this file is to be used by "container world" software to know kernel
> capabilities at run time instead of compile time.

I don't understand the problem you are trying to solve. If the software
needs to updated what benefit is there for all of the information to be
available at runtime?

>
> The "file" is read-only and its content is the capability number associated with
> the capability name:
> root@vm-amd64:~# cat /sys/kernel/security/capabilities
> 0 CAP_CHOWN
> 1 CAP_DAC_OVERRIDE
> ...
> 40 CAP_CHECKPOINT_RESTORE
>

> The kernel already exposes the last capability number under:
> /proc/sys/kernel/cap_last_cap
> So, I think there should not be any issue exposing all the capabilities it
> offers.
> If there is any, please share it as I do not want to introduce issue with this
> series.

The mapping between capabilities and numbers should never change it is
ABI. I seem to remember a version number in the file capability so that
if the mappings do change that number can be changed in a way that
existing software is not confused.

What is the advantage in printing all of the mappings?
>
> Also, if you see any way to improve this series please share it as it would
> increase this contribution quality.
>
> Change since v2:
> * Use a char * for cap_string instead of an array, each line of this char *
> contains the capability number and its name.
> * Move the file under /sys/kernel/security instead of /sys/kernel.
>
> Francis Laniel (2):
> capability: Add cap_string.
> security/inode.c: Add capabilities file.
>
> include/uapi/linux/capability.h | 1 +
> kernel/capability.c | 45 +++++++++++++++++++++++++++++++++
> security/inode.c | 16 ++++++++++++
> 3 files changed, 62 insertions(+)
>
>
> Best regards and thank you in advance for your reviews.
> ---
> [1] man capabilities
> [2] https://github.com/containerd/containerd/blob/1a078e6893d07fec10a4940a5664fab21d6f7d1e/pkg/cap/cap_linux.go#L135
> [3] https://github.com/moby/moby/commit/485cf38d48e7111b3d1f584d5e9eab46a902aabc#diff-2e04625b209932e74c617de96682ed72fbd1bb0d0cb9fb7c709cf47a86b6f9c1
> moby relies on containerd code.
> [4] https://github.com/syndtr/gocapability/blob/42c35b4376354fd554efc7ad35e0b7f94e3a0ffb/capability/enum.go#L47
> [5] https://github.com/opencontainers/runc/blob/00f56786bb220b55b41748231880ba0e6380519a/libcontainer/capabilities/capabilities.go#L12
> runc relies on syndtr package.
> [6] https://github.com/containers/crun/blob/fafb556f09e6ffd4690c452ff51856b880c089f1/src/libcrun/linux.c#L35

Eric


2022-01-21 22:32:26

by Francis Laniel

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/2] Add capabilities file to sysfs

Hi.

Le jeudi 20 janvier 2022, 19:20:15 CET Eric W. Biederman a ?crit :
> Francis Laniel <[email protected]> writes:
> > Hi.
> >
> >
> > First, I hope you are fine and the same for your relatives.
> >
> >
> > Capabilities are used to check if a thread has the right to perform a
> > given
> > action [1].
> > For example, a thread with CAP_BPF set can use the bpf() syscall.
> >
> > Capabilities are used in the container world.
> > In terms of code, several projects related to container maintain code
> > where the capabilities are written alike include/uapi/linux/capability.h
> > [2][3][4][5]. For these projects, their codebase should be updated when a
> > new capability is added to the kernel.
> > Some other projects rely on <sys/capability.h> [6].
> > In this case, this header file should reflect the capabilities offered by
> > the kernel.
> >
> > So, in this series, I added a new file to sysfs:
> > /sys/kernel/security/capabilities.
>
> Actually that is a file in securityfs. Which is related but slightly
> different. For sysfs this would be immediately unacceptable as it
> breaks the one value per file rule. I don't know what the rules
> are for securityfs but I do know files that contain many many lines
> and get very large tend to be problematic in both their kernel
> implementation and in userspace parsing speed.

I was not aware of the sysfs rule, thank you for sharing it to me, I will add
it to my "kernel knowledge" and will make use of it in the future!

> So I am looking for what the advantage of this file that justifies the
> cost of maintaining it.
>
> > The goal of this file is to be used by "container world" software to know
> > kernel capabilities at run time instead of compile time.
>
> I don't understand the problem you are trying to solve. If the software
> needs to updated what benefit is there for all of the information to be
> available at runtime?

Actually, software like containerd hardcodes all the capabilities the kernel
knows based on a per-version approach [1].
So if a new capabilities, say CAP_NEW, is added to the kernel, containerd code
would become something like this:
// caps59 is the caps of kernel 5.9 (41 entries)
caps59 = append(caps58, "CAP_CHECKPOINT_RESTORE")
// caps59 is the caps of kernel 5.17 (42 entries)
cap517 = append(caps59, "CAP_NEW")
capsLatest = caps517
This approach has two drawbacks:
1. A user which wants to use CAP_NEW would need to use kernel 5.17 but also a
version of containerd which knows about CAP_NEW. So, this user would need to
wait for containerd code to be updated (or to modify it by him/herself and
compiles it).
2. Containerd maintainers would need to change their code to add CAP_NEW.

It would be easier for everyone if the kernel exposes its capabilities, thus
containerd code would become something like this:
caps, err := readCapsFromFile("/sys/kernel/security/capabilities")
if err {
caps = capsLatest
}
This approach addresses the first drawback I quoted above and partly the second
one:
1. If user kernel was compiled with CONFIG_SECURITYFS, he/she does not need to
wait the last version of containerd to use CAP_NEW, as containerd would read
the kernel capabilities from "/sys/kernel/security/capabilities".
2. Containerd maintainers would not need to quickly update their code to
reflect last kernel change.
I agree they would still need to update their code in case /sys/kernel/
security/capabilities does not exist.

To conclude, this series adds a bit of code in the kernel to make userland
life easier while not doing nasty things inside the kernel.
What do you think about it?

> > The "file" is read-only and its content is the capability number
> > associated with the capability name:
> > root@vm-amd64:~# cat /sys/kernel/security/capabilities
> > 0 CAP_CHOWN
> > 1 CAP_DAC_OVERRIDE
> > ...
> > 40 CAP_CHECKPOINT_RESTORE
> >
> >
> > The kernel already exposes the last capability number under:
> > /proc/sys/kernel/cap_last_cap
> > So, I think there should not be any issue exposing all the capabilities it
> > offers.
> > If there is any, please share it as I do not want to introduce issue with
> > this series.
>
> The mapping between capabilities and numbers should never change it is
> ABI. I seem to remember a version number in the file capability so that
> if the mappings do change that number can be changed in a way that
> existing software is not confused.
>
> What is the advantage in printing all of the mappings?

The printing of all the mappings can be used by containerd code to know about
the capabilities the kernel knows.
For example, the above mentioned function readCapsFromFile() could be
implemented like this:
func readCapsFromFile(path string) (string[], err) {
var capabilities string[];
// Open the file.
re := regexp.MustCompile(`(\d+)\t(CAP_\w+)`)
for line /*...*/ {
matches := re.FindStringSubmatch(line)
// To simplify, I will make the hypothesis the regex matched and
// everything is OK.
// matches[0] contains the whole line expect final '\n'.
id := strconv.Atoi(matches[1])
name := matches[2]
capabilities[id] = name
}
// Close the file.
return capabilities, nil
}
It will mainly parse the file output to create the capabilities array.

>
> > Also, if you see any way to improve this series please share it as it
> > would
> > increase this contribution quality.
> >
> > Change since v2:
> > * Use a char * for cap_string instead of an array, each line of this char
> > *
> > contains the capability number and its name.
> > * Move the file under /sys/kernel/security instead of /sys/kernel.
> >
> > Francis Laniel (2):
> > capability: Add cap_string.
> > security/inode.c: Add capabilities file.
> >
> > include/uapi/linux/capability.h | 1 +
> > kernel/capability.c | 45 +++++++++++++++++++++++++++++++++
> > security/inode.c | 16 ++++++++++++
> > 3 files changed, 62 insertions(+)
> >
> > Best regards and thank you in advance for your reviews.
> > ---
> > [1] man capabilities
> > [2]
> > https://github.com/containerd/containerd/blob/1a078e6893d07fec10a4940a566
> > 4fab21d6f7d1e/pkg/cap/cap_linux.go#L135 [3]
> > https://github.com/moby/moby/commit/485cf38d48e7111b3d1f584d5e9eab46a902a
> > abc#diff-2e04625b209932e74c617de96682ed72fbd1bb0d0cb9fb7c709cf47a86b6f9c1
> > moby relies on containerd code.
> > [4]
> > https://github.com/syndtr/gocapability/blob/42c35b4376354fd554efc7ad35e0b
> > 7f94e3a0ffb/capability/enum.go#L47 [5]
> > https://github.com/opencontainers/runc/blob/00f56786bb220b55b41748231880b
> > a0e6380519a/libcontainer/capabilities/capabilities.go#L12 runc relies on
> > syndtr package.
> > [6]
> > https://github.com/containers/crun/blob/fafb556f09e6ffd4690c452ff51856b88
> > 0c089f1/src/libcrun/linux.c#L35
> Eric

Best regards.
---
[1] https://github.com/containerd/containerd/blob/
1a078e6893d07fec10a4940a5664fab21d6f7d1e/pkg/cap/cap_linux.go#L181