2023-06-22 14:59:20

by Roberto Sassu

Subject: [QUESTION] Full user space process isolation?

Hi everyone

I briefly discussed this topic at LSS NA 2023, but I wanted to have an
opinion from a broader audience.


In short:

I wanted to execute some kernel workloads in a fully isolated user
space process, started from a binary statically linked with klibc,
connected to the kernel only through a pipe.

I also wanted tampering with that process to be as hard for the root
user as if the same code ran in kernel space.

I would use the fully isolated process to parse and convert unsupported
data formats to a supported one, after the kernel verified the
authenticity of the original format (that already exists and cannot
change).

Preventing tampering of the process ensures that the conversion goes as
expected. Also, the integrity of the binary needs to be verified.


List of desired data formats:

PGP: verify the authenticity of RPM/DEB/... headers
RPM/DEB/... headers: extract reference file checksums for
                     (kernel-based) file integrity check (e.g. with IMA)


Alternative #1:

Write the parsers to run in kernel space. That was rejected due to
security and scalability concerns. If that changed, please let me know.


Alternative #2:

Linux distributions could provide what the kernel supports. However,
from personal experience, the effort seems orders of magnitude higher
than just writing a tiny component to support the original format. And
there is no guarantee that all Linux distributions will do it.


Full process isolation could be achieved in this way:

process -> outside: set seccomp strict profile at process creation
                    so that the process can only read/write/close the
                    pipe and exit, no other system calls are allowed

outside -> process: deny ptrace/kill with the process as target

Anything else?


The only risk I see is that a new feature allowing interaction with
another process is added to the kernel without requiring the ptrace
permission check.

With the restrictions above, can we say that the code inside the
process is as safe (against tampering) to execute as if it runs in
kernel space?

Thanks

Roberto



2023-06-29 02:20:59

by Serge E. Hallyn

Subject: Re: [QUESTION] Full user space process isolation?

On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
> Hi everyone
>
> I briefly discussed this topic at LSS NA 2023, but I wanted to have an
> opinion from a broader audience.
>
>
> In short:
>
> I wanted to execute some kernel workloads in a fully isolated user
> space process, started from a binary statically linked with klibc,
> connected to the kernel only through a pipe.
>
> I also wanted that, for the root user, tampering with that process is
> as hard as if the same code runs in kernel space.
>
> I would use the fully isolated process to parse and convert unsupported
> data formats to a supported one, after the kernel verified the

Can you give some examples here of supported and unsupported data
formats? ext2 is supported, but we sadly don't trust the sb parser
to read an ext2fs coming from an unknown source. So I'm not quite
clear what problem you're trying to solve.

> authenticity of the original format (that already exists and cannot
> change).
>
> Preventing tampering of the process ensures that the conversion goes as
> expected. Also, the integrity of the binary needs to be verified.
>
>
> List of wished data formats:
>
> PGP: verify the authenticity of RPM/DEB/... headers
> RPM/DEB/... headers: extract reference file checksums for
> (kernel-based) file integrity check (e.g. with IMA)
>
>
> Alternative #1:
>
> Write the parsers to run in kernel space. That was rejected due to
> security and scalability concerns. If that changed, please let me know.
>
>
> Alternative #2:
>
> Linux distributions could provide what the kernel supports. However,
> from personal experience, the effort seems orders of magnitude higher
> than just writing a tiny component to support the original format. And
> there is no guarantee that all Linux distributions will do it.
>
>
> Full process isolation could be achieved in this way:
>
> process -> outside: set seccomp strict profile at process creation
> so that the process can only read/write/close the
> pipe and exit, no other system calls are allowed
>
> outside -> process: deny ptrace/kill with the process as target
>
> Anything else?
>
>
> The only risk I see is that a new feature allowing to interact with
> another process is added to the kernel, without the ptrace permission
> being asked.
>
> With the restrictions above, can we say that the code inside the
> process is as safe (against tampering) to execute as if it runs in
> kernel space?
>
> Thanks
>
> Roberto

2023-06-29 09:00:41

by Roberto Sassu

Subject: Re: [QUESTION] Full user space process isolation?

On Wed, 2023-06-28 at 21:10 -0500, Serge E. Hallyn wrote:
> On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
> > Hi everyone
> >
> > I briefly discussed this topic at LSS NA 2023, but I wanted to have an
> > opinion from a broader audience.
> >
> >
> > In short:
> >
> > I wanted to execute some kernel workloads in a fully isolated user
> > space process, started from a binary statically linked with klibc,
> > connected to the kernel only through a pipe.
> >
> > I also wanted that, for the root user, tampering with that process is
> > as hard as if the same code runs in kernel space.
> >
> > I would use the fully isolated process to parse and convert unsupported
> > data formats to a supported one, after the kernel verified the
>
> Can you give some examples here of supported and unsupported data
> formats? ext2 is supported, but we sadly don't trust the sb parser
> to read a an ext2fs coming from unknown source. So I'm not quite
> clear what problem you're trying to solve.

+ eBPF guys (as I'm talking about eBPF)

File digests are distributed in a multitude of formats: RPM packages
provide them in RPMTAG_FILEDIGESTS, DEB packages in md5sum (now
sha256sum), and containers, I guess, have a manifest.

File digests would be used as reference values for IMA Appraisal and
IMA Measurement (to have a predictable PCR).

If we manage to write a tiny parser for RPM headers (I'm talking about
something like 200 lines) to extract the file digests, basically we
have all the information we need to do IMA Appraisal on current, past
and future RPM-based Linux distributions.

If we do the same for DEB (ok, well, when they switch from MD5 to
SHA256), we can support all DEB-based Linux distributions too.

On the opposite side, yes, we can change the RPM format, ask the
distributions to sign per-file. If it was that easy, we would not be
(still) waiting for the first distribution (Fedora 38) to support file
signatures since 2016, when the feature was proposed (sorry if it looks
like diminishing what Mimi and IBM proposed, I'm trying to convey the
idea of how difficult it is to realize that architecture).

What about other formats? How long could it take for them to be adapted
to store file signatures?

Immediate support for IMA Appraisal is only as far away as looping over
the RPM header sections to obtain the data offset of RPMTAG_FILEDIGESTS
and converting the digests at that offset from hex to binary (we also
need to verify the PGP signature of the RPM header, but I would leave
that for later).
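
The loop described above could look roughly like the sketch below. It
assumes the commonly documented RPM header layout (magic 8e ad e8 01,
a 16-byte intro carrying the index count and data size, 16-byte
big-endian index entries of tag/type/offset/count, then the data
store) and a bare header blob rather than a whole .rpm file. The tag
value 1035 and the string-array type 8 are taken from rpm's
definitions as I recall them; the function names are hypothetical, and
PGP verification is left out as in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RPMTAG_FILEDIGESTS	1035	/* assumed tag number */
#define RPM_STRING_ARRAY_TYPE	8	/* assumed type number */

static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | p[3];
}

static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Returns a pointer to the first digest string, storing the number of
 * strings in *count and the bytes remaining in the data store in *avail.
 * Returns NULL if the header is malformed or the tag is absent.
 */
static const char *find_file_digests(const uint8_t *hdr, size_t len,
				     uint32_t *count, size_t *avail)
{
	static const uint8_t magic[4] = { 0x8e, 0xad, 0xe8, 0x01 };
	const uint8_t *entry, *data;
	uint32_t nindex, hsize, i;

	assert(hdr && count && avail);
	if (len < 16 || memcmp(hdr, magic, sizeof(magic)) != 0)
		return NULL;
	nindex = be32(hdr + 8);
	hsize = be32(hdr + 12);
	/* Bounds checks written to avoid integer overflow. */
	if (nindex > (len - 16) / 16 ||
	    hsize > len - 16 - (size_t)nindex * 16)
		return NULL;
	data = hdr + 16 + (size_t)nindex * 16;
	for (i = 0, entry = hdr + 16; i < nindex; i++, entry += 16) {
		uint32_t offset = be32(entry + 8);

		if (be32(entry) != RPMTAG_FILEDIGESTS)
			continue;
		if (be32(entry + 4) != RPM_STRING_ARRAY_TYPE || offset >= hsize)
			return NULL;
		*count = be32(entry + 12);
		*avail = hsize - offset;
		return (const char *)data + offset;
	}
	return NULL;
}

/*
 * Converts the NUL-terminated hex string at s (at most avail bytes) to
 * binary. Returns the digest length in bytes, or -1 on error.
 */
static int hex2bin(const char *s, size_t avail, uint8_t *out, size_t out_len)
{
	const char *end = memchr(s, '\0', avail);
	size_t n, i;

	if (!end)
		return -1;
	n = (size_t)(end - s);
	if (n % 2 != 0 || n / 2 > out_len)
		return -1;
	for (i = 0; i < n / 2; i++) {
		int hi = hex_val(s[2 * i]), lo = hex_val(s[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)((hi << 4) | lo);
	}
	return (int)(n / 2);
}
```

A caller would invoke hex2bin() once per string, advancing past each
terminating NUL and reducing avail accordingly, up to count strings.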

You probably also see how close we are to accomplishing the goal. I'm open
to suggestions: if this idea of an isolated process doing the parsing
is not suitable, I could implement it in a different way.

So far I proposed:

- In-kernel parser (dangerous, not scalable)
https://lore.kernel.org/linux-integrity/[email protected]/
- eBPF (no program signatures, not compliant with the LSM conventions,
unsolved security issues)
https://github.com/robertosassu/diglim-ebpf/commit/693745cb388bca7354233cadae1fe2d23d47ff9d
- Isolated user space process (not enough isolation guarantees)
https://lore.kernel.org/linux-kernel//[email protected]/
https://lore.kernel.org/linux-kernel//[email protected]/

Thanks

Roberto

> > authenticity of the original format (that already exists and cannot
> > change).
> >
> > Preventing tampering of the process ensures that the conversion goes as
> > expected. Also, the integrity of the binary needs to be verified.
> >
> >
> > List of wished data formats:
> >
> > PGP: verify the authenticity of RPM/DEB/... headers
> > RPM/DEB/... headers: extract reference file checksums for
> > (kernel-based) file integrity check (e.g. with IMA)
> >
> >
> > Alternative #1:
> >
> > Write the parsers to run in kernel space. That was rejected due to
> > security and scalability concerns. If that changed, please let me know.
> >
> >
> > Alternative #2:
> >
> > Linux distributions could provide what the kernel supports. However,
> > from personal experience, the effort seems orders of magnitude higher
> > than just writing a tiny component to support the original format. And
> > there is no guarantee that all Linux distributions will do it.
> >
> >
> > Full process isolation could be achieved in this way:
> >
> > process -> outside: set seccomp strict profile at process creation
> > so that the process can only read/write/close the
> > pipe and exit, no other system calls are allowed
> >
> > outside -> process: deny ptrace/kill with the process as target
> >
> > Anything else?
> >
> >
> > The only risk I see is that a new feature allowing to interact with
> > another process is added to the kernel, without the ptrace permission
> > being asked.
> >
> > With the restrictions above, can we say that the code inside the
> > process is as safe (against tampering) to execute as if it runs in
> > kernel space?
> >
> > Thanks
> >
> > Roberto


2023-07-02 21:03:47

by Dr. Greg

Subject: Re: [QUESTION] Full user space process isolation?

On Thu, Jun 29, 2023 at 10:11:26AM +0200, Roberto Sassu wrote:

Good morning, I hope the weekend is going well for everyone, greetings
to Roberto and everyone copied.

> On Wed, 2023-06-28 at 21:10 -0500, Serge E. Hallyn wrote:
> > On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
> > > Hi everyone
> > >
> > > I briefly discussed this topic at LSS NA 2023, but I wanted to have an
> > > opinion from a broader audience.
> > >
> > > In short:
> > >
> > > I wanted to execute some kernel workloads in a fully isolated user
> > > space process, started from a binary statically linked with klibc,
> > > connected to the kernel only through a pipe.
> > >
> > > I also wanted that, for the root user, tampering with that process is
> > > as hard as if the same code runs in kernel space.
> > >
> > > I would use the fully isolated process to parse and convert unsupported
> > > data formats to a supported one, after the kernel verified the
> >
> > Can you give some examples here of supported and unsupported data
> > formats? ext2 is supported, but we sadly don't trust the sb parser
> > to read a an ext2fs coming from unknown source. So I'm not quite
> > clear what problem you're trying to solve.
>
> + eBPF guys (as I'm talking about eBPF)

If the week goes well, we will be submitting the second version of our
TSEM LSM for review. It incorporates a significant number of changes
and enhancements, based on both initial review comments, and
importantly, feedback from our collaborators in the critical
infrastructure community.

Just as a levelset. TSEM provides kernel infrastructure to implement
security controls based on either deterministic or machine learning
models. Quixote is the userspace infrastructure that enables use of
the TSEM kernel infrastructure.

Based on your description Roberto, TSEM may be of assistance in
addressing your issues at two different levels.

First with respect to protection of an isolated workload.

TSEM is inherently workload based, given that it is based on an
architecture that implements security modeling namespaces that a
process hierarchy can be placed into. This reduces model complexity
and provides the implementation of very specific and targeted security
controls based on the needs of a proposed workload.

The security controls are prospective rather than retrospective,
i.e. TSEM will proactively block any security behaviors that are not
in a security model that has been defined for the workload.

For example, with respect to the concerns you had previously mentioned
about ptrace. If the security model definition does not include a
security state coefficient for a ptrace_traceme security event, it
will be disallowed, regardless of what goes on with respect to kernel
development, modulo of course the ptrace_traceme LSM hook being
discontinued.

Cross-model signaling is blocked by default, which further restricts
what can happen in a workload to the workload itself. The security
event coefficients are generated based on a concept known as a 'task
identity'. The documentation has extensive details on this, but the
effect is to make the security behaviors specific to the identity of
the executable code that requests modeling of security events.

So what one ends up with are security controls that constrain the
behavior of the workload to be as consistent as possible with what the
workload has been unit tested to do.

Decision making for the security behavior is embodied in an entity
known as a Trust Modeling Agent (TMA). The architecture is designed
to be flexible with respect to where the TMA is run. The sample
userspace provides TMAs that run in the kernel, a userspace process,
an SGX enclave, a Xen based stubdomain and several micro-controller
implementations.

Lots more details, but I will not drone on further, the TSEM
implementation comes with about 25 pages of documentation discussing
the architecture.

> File digests are distributed in a multitude of formats: RPM packages
> provide them in RPMTAG_FILEDIGESTS, DEB packages in md5sum (now
> sha256sum), containers I guess they have a manifest.
>
> File digests would be used as reference values for IMA Appraisal and
> IMA Measurement (to have a predictable PCR).
>
> If we manage to write a tiny parser for RPM headers (I'm talking about
> something like 200 lines) to extract the file digests, basically we
> have all the information we need to do IMA Appraisal on current, past
> and future RPM-based Linux distributions.
>
> If we do the same for DEB (ok, well, when they switch from MD5 to
> SHA256), we can support all DEB-based Linux distributions too.
>
> On the opposite side, yes, we can change the RPM format, ask the
> distributions to sign per-file. If it was that easy, we would not be
> (still) waiting for the first distribution (Fedora 38) to support file
> signatures since 2016, when the feature was proposed (sorry if it looks
> like diminishing what Mimi and IBM proposed, I'm trying to convey the
> idea of how difficult is to realize that architecture).
>
> What about other formats? How long it could take for them to be adapted
> to store file signatures?
>
> Immediate support for IMA Appraisal is as far as looping into the RPM
> header sections to obtain the data offset of RPMTAG_FILEDIGESTS and
> converting the digests at that offset from hex to bin (we need also to
> verify the PGP signature of the RPM header, but I would leave it for
> later).
>
> You probably also see how close we are to accomplish the goal. I'm open
> to suggestions: if this idea of an isolated process doing the parsing
> is not suitable, I could implement it in a different way.
>
> So far I proposed:
>
> - In-kernel parser (dangerous, not scalable)
> https://lore.kernel.org/linux-integrity/[email protected]/
> - eBPF (no program signatures, not compliant with the LSM conventions,
> unsolved security issues)
> https://github.com/robertosassu/diglim-ebpf/commit/693745cb388bca7354233cadae1fe2d23d47ff9d
> - Isolated user space process (not enough isolation guarantees)
> https://lore.kernel.org/linux-kernel//[email protected]/
> https://lore.kernel.org/linux-kernel//[email protected]/

Quixote/TSEM is the product of a number of years of our team building
secure systems based on IMA, TPMs, TBOOT and SGX. Its design was
largely driven by the challenges that we ran into when trying to put
these implementations into field operation.

If you want to look at options beyond classic IMA and appraisal,
Quixote/TSEM may be of utility beyond just the protection of your
proposed single-purpose parsing process.

Given the issues posed, in particular by SMP systems, we designed
Quixote/TSEM from the ground up to deal with the challenges of time
dependent linear extension measurements, i.e. the TPM PCR model. The
Quixote deterministic and quasi-deterministic models are designed to
provide an invariant functional 'state' value for the security status
of a workload, a value that is scheduling independent.

One of the basis parameters for TSEM modeling is the digest value of
either an executable file or memory that has been marked as
executable, the same model used by IMA. In essence you get file
appraisal for 'free', as a result of modeling the security behavior of
a workload.

The bearer's token for the security of a workload is a signed security
model generated by unit testing of the workload. If the file digests
generated by workload execution are inconsistent with what has been
unit tested for the workload, the security state goes out of model
definition and is detected and/or blocked.

The only thing that needs to know about keys and signatures is the
Quixote trust orchestrator and the security model definition, all of
which further reduces the distribution complexity of the model.

So Quixote/TSEM provides an alternate architecture where the desired
security state of a workload is provided in some type of workload or
container manifest, not just in the form of file digests. This
architecture should be of interest to Lennart and the systemd group as
well as the container people.

One of the open questions is the notion of establishing a definition
for trust. For example, the 'BPF token' thread has talked about the
notion of what is a 'trusted' process; discussions on the lockdown LSM
focused on that as well. In TSEM, a workload is 'trusted' by virtue
of not having generated a security state coefficient outside the
bounds of a pre-defined model for a workload. Once it does so, it
will be treated as untrustworthy.

Lots and lots of additional details that I can speak to but this has
grown long already and I want to get out fishing. I can speak in
probably boring detail on any additional questions that you may
have... :-)

> Thanks
>
> Roberto

Have a good week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity

2023-07-03 08:01:59

by Roberto Sassu

Subject: Re: [QUESTION] Full user space process isolation?

On Sun, 2023-07-02 at 12:55 -0500, Dr. Greg wrote:
> On Thu, Jun 29, 2023 at 10:11:26AM +0200, Roberto Sassu wrote:
>
> Good morning, I hope the weekend is going well for everyone, greetings
> to Roberto and everyone copied.
>
> > On Wed, 2023-06-28 at 21:10 -0500, Serge E. Hallyn wrote:
> > > On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
> > > > Hi everyone
> > > >
> > > > I briefly discussed this topic at LSS NA 2023, but I wanted to have an
> > > > opinion from a broader audience.
> > > >
> > > > In short:
> > > >
> > > > I wanted to execute some kernel workloads in a fully isolated user
> > > > space process, started from a binary statically linked with klibc,
> > > > connected to the kernel only through a pipe.
> > > >
> > > > I also wanted that, for the root user, tampering with that process is
> > > > as hard as if the same code runs in kernel space.
> > > >
> > > > I would use the fully isolated process to parse and convert unsupported
> > > > data formats to a supported one, after the kernel verified the
> > >
> > > Can you give some examples here of supported and unsupported data
> > > formats? ext2 is supported, but we sadly don't trust the sb parser
> > > to read a an ext2fs coming from unknown source. So I'm not quite
> > > clear what problem you're trying to solve.
> >
> > + eBPF guys (as I'm talking about eBPF)
>
> If the week goes well, we will be submitting the second version of our
> TSEM LSM for review. It incorporates a significant number of changes
> and enhancements, based on both initial review comments, and
> importantly, feedback from our collaborators in the critical
> infrastructure community.
>
> Just as a levelset. TSEM provides kernel infrastructure to implement
> security controls based on either deterministic or machine learning
> models. Quixote is the userspace infrastructure that enables use of
> the TSEM kernel infrastructure.
>
> Based on your description Roberto, TSEM may be of assistance in
> addressesing your issues at two different levels.
>
> First with respect to protection of an isolated workload.
>
> TSEM is inherently workload based, given that it is based on an
> architecture that implements security modeling namespaces that a
> process heirarchy can be placed into. This reduces model complexity
> and provides the implementation of very specific and targeted security
> controls based on the needs of a proposed workload.
>
> The security controls are prospective rather than retrospective,
> ie. TSEM will pro-actively block any security behaviors that are not
> in a security model that has been defined for the workload.
>
> For example, with respect to the concerns you had previously mentioned
> about ptrace. If the security model definition does not include a
> security state coefficient for a ptrace_traceme security event, it
> will be disallowed, regardless of what goes on with respect to kernel
> development, modulo of course the ptrace_traceme LSM hook being
> discontinued.

Hi Greg

thanks for your insights.

The policy is quite simple:


     r/w  ^                             kernel space
----------|-----------------------------------------
          v (pipe)                      user space
 +-----------------+       +-----------------------+
 | trustworthy UMD |---X---| rest of the processes |
 +-----------------+       +-----------------------+

The question was more, is the LSM infrastructure complete enough that
the X can be really enforced?

Could there be other implicit information flows that the LSM
infrastructure is not able/does not yet mediate, that could break the
policy above?

I guess TSEM could be for more elaborate security models, but in this
case the policy is quite straightforward. Also, your TSEM would be as
limited as mine by the LSM hooks available.

Thanks

Roberto


2023-07-03 14:50:00

by Casey Schaufler

Subject: Re: [QUESTION] Full user space process isolation?

On 7/3/2023 12:57 AM, Roberto Sassu wrote:
> On Sun, 2023-07-02 at 12:55 -0500, Dr. Greg wrote:
>> On Thu, Jun 29, 2023 at 10:11:26AM +0200, Roberto Sassu wrote:
>>
>> Good morning, I hope the weekend is going well for everyone, greetings
>> to Roberto and everyone copied.
>>
>>> On Wed, 2023-06-28 at 21:10 -0500, Serge E. Hallyn wrote:
>>>> On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
>>>>> Hi everyone
>>>>>
>>>>> I briefly discussed this topic at LSS NA 2023, but I wanted to have an
>>>>> opinion from a broader audience.
>>>>>
>>>>> In short:
>>>>>
>>>>> I wanted to execute some kernel workloads in a fully isolated user
>>>>> space process, started from a binary statically linked with klibc,
>>>>> connected to the kernel only through a pipe.
>>>>>
>>>>> I also wanted that, for the root user, tampering with that process is
>>>>> as hard as if the same code runs in kernel space.
>>>>>
>>>>> I would use the fully isolated process to parse and convert unsupported
>>>>> data formats to a supported one, after the kernel verified the
>>>> Can you give some examples here of supported and unsupported data
>>>> formats? ext2 is supported, but we sadly don't trust the sb parser
>>>> to read a an ext2fs coming from unknown source. So I'm not quite
>>>> clear what problem you're trying to solve.
>>> + eBPF guys (as I'm talking about eBPF)
>> If the week goes well, we will be submitting the second version of our
>> TSEM LSM for review. It incorporates a significant number of changes
>> and enhancements, based on both initial review comments, and
>> importantly, feedback from our collaborators in the critical
>> infrastructure community.
>>
>> Just as a levelset. TSEM provides kernel infrastructure to implement
>> security controls based on either deterministic or machine learning
>> models. Quixote is the userspace infrastructure that enables use of
>> the TSEM kernel infrastructure.
>>
>> Based on your description Roberto, TSEM may be of assistance in
>> addressesing your issues at two different levels.
>>
>> First with respect to protection of an isolated workload.
>>
>> TSEM is inherently workload based, given that it is based on an
>> architecture that implements security modeling namespaces that a
>> process heirarchy can be placed into. This reduces model complexity
>> and provides the implementation of very specific and targeted security
>> controls based on the needs of a proposed workload.
>>
>> The security controls are prospective rather than retrospective,
>> ie. TSEM will pro-actively block any security behaviors that are not
>> in a security model that has been defined for the workload.
>>
>> For example, with respect to the concerns you had previously mentioned
>> about ptrace. If the security model definition does not include a
>> security state coefficient for a ptrace_traceme security event, it
>> will be disallowed, regardless of what goes on with respect to kernel
>> development, modulo of course the ptrace_traceme LSM hook being
>> discontinued.
> Hi Greg
>
> thanks for your insights.
>
> The policy is quite simple:
>
>
> r/w ^ kernel space
> ----------|-----------------------------------------
> v (pipe) user space
> +-----------------+ +-----------------------+
> | trustworthy UMD |---X---| rest of the processes |
> +-----------------+ +-----------------------+
>
> The question was more, is the LSM infrastructure complete enough that
> the X can be really enforced?

I believe that it is. SELinux and Smack, users of the LSM infrastructure,
enforce "X". They also require netlabel for IP communications, and Smack
falls short on newer protocols, but that's not the fault of LSM.

>
> Could there be other implicit information flows that the LSM
> infrastructure is not able/does not yet mediate, that could break the
> policy above?

Sure. Every so often something pops into the kernel (e.g. io_uring)
without proper LSM integration. We try to discourage that, and correct
it when we find it.

>
> I guess TSEM could be for more elaborated security models, but in this
> case the policy is quite straithforward. Also, your TSEM would be as
> limited as mine by the LSM hooks available.
>
> Thanks
>
> Roberto
>

2023-07-03 15:30:11

by Jann Horn

Subject: Re: [QUESTION] Full user space process isolation?

On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
<[email protected]> wrote:
> I wanted to execute some kernel workloads in a fully isolated user
> space process, started from a binary statically linked with klibc,
> connected to the kernel only through a pipe.

FWIW, the kernel has some infrastructure for this already, see
CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
example in net/bpfilter/.

> I also wanted that, for the root user, tampering with that process is
> as hard as if the same code runs in kernel space.

I believe that actually making it that hard would probably mean that
you'd have to ensure that the process doesn't use swap (in other
words, it would have to run with all memory locked), because root can
choose where swapped pages are stored. Other than that, if you mark it
as a kthread so that no ptrace access is allowed, you can probably get
pretty close. But if you do anything like that, please leave some way
(like a kernel build config option or such) to enable debugging for
these processes.
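
As a rough illustration of the "all memory locked" point, a user space
helper could pin its own pages with mlockall() before handling any
input. This is only a sketch: the function name lock_all_memory is
hypothetical, the call typically needs CAP_IPC_LOCK or a sufficient
RLIMIT_MEMLOCK, and a process locking itself does not by itself stop a
privileged user from influencing it in other ways.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Locks all current and future pages of the calling process into RAM so
 * that they are not written to swap. Returns 0 on success, or a negative
 * errno value on failure (commonly -ENOMEM or -EPERM when the caller
 * lacks CAP_IPC_LOCK and RLIMIT_MEMLOCK is too low).
 */
static int lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -errno;
	return 0;
}
```

A helper started by the kernel would presumably treat a failure here as
fatal rather than continue with swappable memory.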

But I'm not convinced that it makes sense to try to draw a security
boundary between fully-privileged root (with the ability to mount
things and configure swap and so on) and the kernel - my understanding
is that some kernel subsystems don't treat root-to-kernel privilege
escalation issues as security bugs that have to be fixed.

2023-07-03 16:07:35

by Roberto Sassu

Subject: Re: [QUESTION] Full user space process isolation?

On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
> On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
> <[email protected]> wrote:
> > I wanted to execute some kernel workloads in a fully isolated user
> > space process, started from a binary statically linked with klibc,
> > connected to the kernel only through a pipe.
>
> FWIW, the kernel has some infrastructure for this already, see
> CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
> example in net/bpfilter/.

Thanks, I actually took that code to make a generic UMD management
library, that can be used by all use cases:

https://lore.kernel.org/linux-kernel/[email protected]/

> > I also wanted that, for the root user, tampering with that process is
> > as hard as if the same code runs in kernel space.
>
> I believe that actually making it that hard would probably mean that
> you'd have to ensure that the process doesn't use swap (in other
> words, it would have to run with all memory locked), because root can
> choose where swapped pages are stored. Other than that, if you mark it
> as a kthread so that no ptrace access is allowed, you can probably get
> pretty close. But if you do anything like that, please leave some way
> (like a kernel build config option or such) to enable debugging for
> these processes.

I didn't think about the swapping part... thanks!

Ok to enable debugging with a config option.

> But I'm not convinced that it makes sense to try to draw a security
> boundary between fully-privileged root (with the ability to mount
> things and configure swap and so on) and the kernel - my understanding
> is that some kernel subsystems don't treat root-to-kernel privilege
> escalation issues as security bugs that have to be fixed.

Yes, that is unfortunately true, and in that case the trustworthy UMD
would not make things worse. On the other hand, on systems where that
separation is defined, the advantage would be to run more exploitable
code in user space, leaving the kernel safe.

I'm thinking about all the cases where the code had to be included in
the kernel to run at the same privilege level, but would not use any of
the kernel facilities (e.g. parsers).

If the boundary is extended to user space, some of these components
could be moved away from the kernel, and the functionality would be the
same without decreasing the security.

Or, new features that are too complex can be partially implemented in
kernel space, partially in user space, increasing their chances to be
upstreamed.

Roberto


2023-07-03 16:20:23

by Roberto Sassu

Subject: Re: [QUESTION] Full user space process isolation?

On Mon, 2023-07-03 at 07:43 -0700, Casey Schaufler wrote:
> On 7/3/2023 12:57 AM, Roberto Sassu wrote:
> > On Sun, 2023-07-02 at 12:55 -0500, Dr. Greg wrote:
> > > On Thu, Jun 29, 2023 at 10:11:26AM +0200, Roberto Sassu wrote:
> > >
> > > Good morning, I hope the weekend is going well for everyone, greetings
> > > to Roberto and everyone copied.
> > >
> > > > On Wed, 2023-06-28 at 21:10 -0500, Serge E. Hallyn wrote:
> > > > > On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
> > > > > > Hi everyone
> > > > > >
> > > > > > I briefly discussed this topic at LSS NA 2023, but I wanted to have an
> > > > > > opinion from a broader audience.
> > > > > >
> > > > > > In short:
> > > > > >
> > > > > > I wanted to execute some kernel workloads in a fully isolated user
> > > > > > space process, started from a binary statically linked with klibc,
> > > > > > connected to the kernel only through a pipe.
> > > > > >
> > > > > > I also wanted that, for the root user, tampering with that process is
> > > > > > as hard as if the same code runs in kernel space.
> > > > > >
> > > > > > I would use the fully isolated process to parse and convert unsupported
> > > > > > data formats to a supported one, after the kernel verified the
> > > > > Can you give some examples here of supported and unsupported data
> > > > > formats? ext2 is supported, but we sadly don't trust the sb parser
> > > > > > to read an ext2fs coming from unknown source. So I'm not quite
> > > > > clear what problem you're trying to solve.
> > > > + eBPF guys (as I'm talking about eBPF)
> > > If the week goes well, we will be submitting the second version of our
> > > TSEM LSM for review. It incorporates a significant number of changes
> > > and enhancements, based on both initial review comments, and
> > > importantly, feedback from our collaborators in the critical
> > > infrastructure community.
> > >
> > > Just as a levelset. TSEM provides kernel infrastructure to implement
> > > security controls based on either deterministic or machine learning
> > > models. Quixote is the userspace infrastructure that enables use of
> > > the TSEM kernel infrastructure.
> > >
> > > Based on your description Roberto, TSEM may be of assistance in
> > > addressing your issues at two different levels.
> > >
> > > First with respect to protection of an isolated workload.
> > >
> > > TSEM is inherently workload based, given that it is based on an
> > > architecture that implements security modeling namespaces that a
> > > process hierarchy can be placed into. This reduces model complexity
> > > and provides the implementation of very specific and targeted security
> > > controls based on the needs of a proposed workload.
> > >
> > > The security controls are prospective rather than retrospective,
> > > ie. TSEM will pro-actively block any security behaviors that are not
> > > in a security model that has been defined for the workload.
> > >
> > > For example, with respect to the concerns you had previously mentioned
> > > about ptrace. If the security model definition does not include a
> > > security state coefficient for a ptrace_traceme security event, it
> > > will be disallowed, regardless of what goes on with respect to kernel
> > > development, modulo of course the ptrace_traceme LSM hook being
> > > discontinued.
> > Hi Greg
> >
> > thanks for your insights.
> >
> > The policy is quite simple:
> >
> >
> >       r/w ^                kernel space
> > ----------|-----------------------------------------
> >           v (pipe)         user space
> > +-----------------+       +-----------------------+
> > | trustworthy UMD |---X---| rest of the processes |
> > +-----------------+       +-----------------------+
> >
> > The question was more, is the LSM infrastructure complete enough that
> > the X can be really enforced?
>
> I believe that it is. SELinux and Smack, users of the LSM infrastructure,
> enforce "X". They also require netlabel for IP communications, and Smack
> falls short on newer protocols, but that's not the fault of LSM.
>
> >
> > Could there be other implicit information flows that the LSM
> > infrastructure is not able/does not yet mediate, that could break the
> > policy above?
>
> Sure. Every so often something pops into the kernel (e.g. io_uring)
> without proper LSM integration. We try to discourage that, and correct
> it when we find it.

Well, ok. I guess Paul's point was that it is better to write code in
the kernel to be sure than to run this kind of risk. Maybe for
certain workloads, it is a much better choice.

For example, if the trustworthy UMD had the task of extracting the crypto
material from X.509 certificates and PKCS#7 signatures, and passing it to
the kernel, breaking the isolation would almost certainly mean that the
kernel accepts more kernel modules than it should.

The question would be, if we restrict the scope of data processed by
trustworthy UMDs, would that make the solution more acceptable?

An idea for example would be: if we do appraisal with the traditional
methods (signature in the xattr, HMAC, etc.) the trustworthy UMD would
not have any impact.

Maybe it is ok only if the IMA policy explicitly allows appraisal based
on what the trustworthy UMD provides? (Mimi?)

Thanks

Roberto

> >
> > I guess TSEM could be for more elaborated security models, but in this
> > case the policy is quite straightforward. Also, your TSEM would be as
> > limited as mine by the LSM hooks available.
> >
> > Thanks
> >
> > Roberto
> >


2023-07-03 18:57:22

by Kees Cook

Subject: Re: [QUESTION] Full user space process isolation?

On Mon, Jul 03, 2023 at 05:06:42PM +0200, Jann Horn wrote:
> But I'm not convinced that it makes sense to try to draw a security
> boundary between fully-privileged root (with the ability to mount
> things and configure swap and so on) and the kernel - my understanding
> is that some kernel subsystems don't treat root-to-kernel privilege
> escalation issues as security bugs that have to be fixed.

There are certainly arguments to be made about this, but efforts continue
to provide a separation between full-cap uid 0 and kernel memory. LSMs
like Lockdown, IMA, and LoadPin, for example, seek to close these gaps,
and systems are designed with this bright line existing between kernel
and root (e.g. Chrome OS). I'm sure there are gaps in attack surface
coverage, but since work continues on this kind of hardening, I'd hate
to knowingly create new attack surface. Providing uid 0 with kernel
memory access should continue to be mediated by at least Lockdown, and
if there are gaps in coverage, let's get them recorded[1] to be fixed.

-Kees

[1] https://github.com/KSPP/linux/issues

--
Kees Cook

2023-07-04 17:37:16

by Dr. Greg

Subject: Re: [QUESTION] Full user space process isolation?

On Mon, Jul 03, 2023 at 09:57:53AM +0200, Roberto Sassu wrote:

> On Sun, 2023-07-02 at 12:55 -0500, Dr. Greg wrote:
> > On Thu, Jun 29, 2023 at 10:11:26AM +0200, Roberto Sassu wrote:
> >
> > Good morning, I hope the weekend is going well for everyone, greetings
> > to Roberto and everyone copied.
> >
> > > On Wed, 2023-06-28 at 21:10 -0500, Serge E. Hallyn wrote:
> > > > On Thu, Jun 22, 2023 at 04:42:37PM +0200, Roberto Sassu wrote:
> > > > > Hi everyone
> > > > >
> > > > > I briefly discussed this topic at LSS NA 2023, but I wanted to have an
> > > > > opinion from a broader audience.
> > > > >
> > > > > In short:
> > > > >
> > > > > I wanted to execute some kernel workloads in a fully isolated user
> > > > > space process, started from a binary statically linked with klibc,
> > > > > connected to the kernel only through a pipe.
> > > > >
> > > > > I also wanted that, for the root user, tampering with that process is
> > > > > as hard as if the same code runs in kernel space.
> > > > >
> > > > > I would use the fully isolated process to parse and convert unsupported
> > > > > data formats to a supported one, after the kernel verified the
> > > >
> > > > Can you give some examples here of supported and unsupported data
> > > > formats? ext2 is supported, but we sadly don't trust the sb parser
> > > > to read an ext2fs coming from unknown source. So I'm not quite
> > > > clear what problem you're trying to solve.
> > >
> > > + eBPF guys (as I'm talking about eBPF)
> >
> > If the week goes well, we will be submitting the second version of our
> > TSEM LSM for review. It incorporates a significant number of changes
> > and enhancements, based on both initial review comments, and
> > importantly, feedback from our collaborators in the critical
> > infrastructure community.
> >
> > Just as a levelset. TSEM provides kernel infrastructure to implement
> > security controls based on either deterministic or machine learning
> > models. Quixote is the userspace infrastructure that enables use of
> > the TSEM kernel infrastructure.
> >
> > Based on your description Roberto, TSEM may be of assistance in
> > addressing your issues at two different levels.
> >
> > First with respect to protection of an isolated workload.
> >
> > TSEM is inherently workload based, given that it is based on an
> > architecture that implements security modeling namespaces that a
> > process hierarchy can be placed into. This reduces model complexity
> > and provides the implementation of very specific and targeted security
> > controls based on the needs of a proposed workload.
> >
> > The security controls are prospective rather than retrospective,
> > ie. TSEM will pro-actively block any security behaviors that are not
> > in a security model that has been defined for the workload.
> >
> > For example, with respect to the concerns you had previously mentioned
> > about ptrace. If the security model definition does not include a
> > security state coefficient for a ptrace_traceme security event, it
> > will be disallowed, regardless of what goes on with respect to kernel
> > development, modulo of course the ptrace_traceme LSM hook being
> > discontinued.

> Hi Greg

Hi, I hope your day is going well.

> thanks for your insights.

Such as they were, the price was right... :-)

> The policy is quite simple:
>
>
>       r/w ^                kernel space
> ----------|-----------------------------------------
>           v (pipe)         user space
> +-----------------+       +-----------------------+
> | trustworthy UMD |---X---| rest of the processes |
> +-----------------+       +-----------------------+
>
> The question was more, is the LSM infrastructure complete enough that
> the X can be really enforced?
>
> Could there be other implicit information flows that the LSM
> infrastructure is not able/does not yet mediate, that could break the
> policy above?

When we initiated the Quixote project, to bring security modeling and
machine learning based security policy to the kernel, the predicate
assumed was that the LSM hooks represented the complete basis set of
information that was required to define the security state of a
system.

If the current LSM hooks are insufficient in number, or are not
fully descriptive in character, the LSM, by definition, cannot fully
protect a platform.

I see that Casey replied downthread and indicated he thought the LSM
hooks were sufficient to model the necessary security threats,
obviously good news for both your work and ours.

Just as a note of clarification.

Casey indicated that the LSM-supported labeled networking would be of
assistance in your model, but my assumption from your diagram is that
the dashed line with the X in it implies that there is to be NO
information flow allowed between the sandboxed UMD process and the
rest of the processes running on the system.

This would be in contrast to the line representing some type of
limited network or pipe connectivity, with appropriate security
controls or labeling on the traffic. Is this a correct assumption?

It would seem like there would need to be two classes of security
guarantees in place for your model. First, the fact that the
trustworthy UMD cannot be forced to commit some action that was not
intended for it, and second, that the surrounding system can be
trusted to not try and exert nefarious influence on the UMD.

Wouldn't the second requirement necessitate that the UMD operate with
some form of attestation as to the character of the surrounding
system?

Other than the fact that Intel chose to not make the technology
sufficiently ubiquitous, it would seem that SGX would be tailor made
for this.

> I guess TSEM could be for more elaborated security models, but in
> this case the policy is quite straightforward. Also, your TSEM would
> be as limited as mine by the LSM hooks available.

TSEM isn't about elaborate, it is about defining the notion of
workload specific security models. A TMA running a security model,
that acts only on file digests accessed by files with uid=0, would
roughly implement IMA, with appraisal essentially for 'free'.

With respect to this discussion, one of the points that I was trying
to make is that if you make the need to parse file digests from .rpms
and .debs go away, the need for the highly protected UMD goes away as
well.

TSEM, with a signed security model processed by a trust orchestrator,
implements that model, along with an invariant representation of the
state of the system.

In fact, we have micro-controller based TMAs that pull their security
models over a CAT1.M network connection, completely external to the OS
being modeled, which is probably as much isolation as is
possible... :-)

> Thanks
>
> Roberto

Have a good day.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity

2023-07-06 11:00:33

by Dr. Greg

Subject: Re: [QUESTION] Full user space process isolation?

On Tue, Jul 04, 2023 at 05:18:43PM +0200, Petr Tesarik wrote:

Good morning, I hope the week is going well for everyone.

> On 7/3/2023 5:28 PM, Roberto Sassu wrote:
> > On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
> >> On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
> >> <[email protected]> wrote:
> >>> I wanted to execute some kernel workloads in a fully isolated user
> >>> space process, started from a binary statically linked with klibc,
> >>> connected to the kernel only through a pipe.
> >>
> >> FWIW, the kernel has some infrastructure for this already, see
> >> CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
> >> example in net/bpfilter/.
> >
> > Thanks, I actually took that code to make a generic UMD management
> > library, that can be used by all use cases:
> >
> > https://lore.kernel.org/linux-kernel/[email protected]/
> >
> >>> I also wanted that, for the root user, tampering with that process is
> >>> as hard as if the same code runs in kernel space.
> >>
> >> I believe that actually making it that hard would probably mean that
> >> you'd have to ensure that the process doesn't use swap (in other
> >> words, it would have to run with all memory locked), because root can
> >> choose where swapped pages are stored. Other than that, if you mark it
> >> as a kthread so that no ptrace access is allowed, you can probably get
> >> pretty close. But if you do anything like that, please leave some way
> >> (like a kernel build config option or such) to enable debugging for
> >> these processes.
> >
> > I didn't think about the swapping part... thanks!
> >
> > Ok to enable debugging with a config option.
> >
> >> But I'm not convinced that it makes sense to try to draw a security
> >> boundary between fully-privileged root (with the ability to mount
> >> things and configure swap and so on) and the kernel - my understanding
> >> is that some kernel subsystems don't treat root-to-kernel privilege
> >> escalation issues as security bugs that have to be fixed.
> >
> > Yes, that is unfortunately true, and in that case the trustworthy UMD
> > would not make things worse. On the other hand, on systems where that
> > separation is defined, the advantage would be to run more exploitable
> > code in user space, leaving the kernel safe.
> >
> > I'm thinking about all the cases where the code had to be included in
> > the kernel to run at the same privilege level, but would not use any of
> > the kernel facilities (e.g. parsers).
>
> Thanks for reminding me of kexec-tools. The complete image for booting a
> new kernel was originally prepared in user space. With kernel lockdown,
> all this code had to move into the kernel, adding a new syscall and lots
> of complexity to build purgatory code, etc. Yet, this new implementation
> in the kernel does not offer all features of kexec-tools, so both code
> bases continue to exist and are happily diverging...
>
> > If the boundary is extended to user space, some of these components
> > could be moved away from the kernel, and the functionality would be the
> > same without decreasing the security.

> All right, AFAICS your idea is limited to relatively simple cases
> for now. I mean, allowing kexec-tools to run in user space is not
> easily possible when UID 0 is not trusted, because kexec needs to
> open various files and make various other syscalls, which would
> require a complex LSM policy. It looks technically possible to write
> one, but then the big question is if it would be simpler to review
> and maintain than adding more kexec-tools features to the kernel.

You either need to develop and maintain a complex system-wide LSM
policy or you need a security model that is specifically tuned and
then scoped to the needs of the workload running on behalf of the
kernel as a UID=0 userspace process.

As I noted in my e-mail to Roberto, our TSEM LSM brings forward the
ability to do both, as a useful side effect of the need to limit model
complexity when the objective is to have a single functional
description of the security state of a system.

> Anyway, I can sense a general desire to run less code in the most
> privileged system environment. Roberto's proposal is one of few that
> go in this direction. What are the alternatives?

As I noted above, TSEM brings the ability to provide highly specific
and narrowly scoped security policy to a process hierarchy,
i.e. a workload.

However, regardless of the technology applied, in order to pursue
Roberto's UMD model of having a uid=0 process run tasks on behalf of
the kernel, there would seem to be a need to define what the security
objectives are.

From the outside looking in, there would seem to be a need to address
two primary issues:

1: Trust/constrain what the UMD process can do.

2: Constrain what the system at large can do to the UMD process.

As we have seen before, requirement 1 implies a definition of what it
means for a process to be 'trusted'.

In the absence of formal verification, which appears to be a
non-starter in practice, this would seem to imply defining a standard
for the allowed security behavior of the UMD workload.

From our perspective, with TSEM, we define 'trusted' for a workload to
mean that it has not requested a security behavior inconsistent with
what the workload has been unit tested to. If a process does this, its
ability to execute additional security behaviors is curtailed.

With respect to requirement two.

Here is the ASCII art diagram of Roberto's proposed system:

      r/w ^                kernel space
----------|-----------------------------------------
          v (pipe)         user space
+-----------------+       +-----------------------+
| trustworthy UMD |---X---| rest of the processes |
+-----------------+       +-----------------------+

Casey noted that he believed the Linux LSM had sufficient coverage to
provide the necessary security controls for this model. He
specifically mentioned that it had support for network traffic
controls and labeling.

I haven't seen a reply from Roberto to my e-mail questioning what the
following means:

---X---

But I get the sense that it means that any other process in userspace
couldn't have any impact, or I assume visibility, into what the UID=0
process is doing on behalf of the kernel. I don't think it means that
there is supposed to be some type of highly controlled traffic between
the UMD and other processes.

We will see what comments Roberto has on this.

This arguably may be the most difficult requirement to meet if our
interpretation of this requirement is correct, particularly so if this
involves a confidentiality requirement, perhaps a bit less so if there
is only a requirement of integrity of execution.

As I mentioned in a previous e-mail, depending on the requirements,
issue 2 starts to look a lot like protected enclave technologies such
as SGX. As history has shown, providing a protected execution
environment, against the rest of the system, is a somewhat formidable
undertaking, with probably a requirement for hardware support if SGX
and/or TDX are any examples.

So, I believe that TSEM brings useful technology to the table, but
regardless of technology, it would seem there is a need to
specifically define the security requirements for the UMD model.

> Petr T

Have a good day.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity

2023-07-06 11:46:30

by Roberto Sassu

Subject: Re: [QUESTION] Full user space process isolation?

On Thu, 2023-07-06 at 05:53 -0500, Dr. Greg wrote:
> On Tue, Jul 04, 2023 at 05:18:43PM +0200, Petr Tesarik wrote:
>
> Good morning, I hope the week is going well for everyone.
>
> > On 7/3/2023 5:28 PM, Roberto Sassu wrote:
> > > On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
> > > > On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
> > > > <[email protected]> wrote:
> > > > > I wanted to execute some kernel workloads in a fully isolated user
> > > > > space process, started from a binary statically linked with klibc,
> > > > > connected to the kernel only through a pipe.
> > > >
> > > > FWIW, the kernel has some infrastructure for this already, see
> > > > CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
> > > > example in net/bpfilter/.
> > >
> > > Thanks, I actually took that code to make a generic UMD management
> > > library, that can be used by all use cases:
> > >
> > > https://lore.kernel.org/linux-kernel/[email protected]/
> > >
> > > > > I also wanted that, for the root user, tampering with that process is
> > > > > as hard as if the same code runs in kernel space.
> > > >
> > > > I believe that actually making it that hard would probably mean that
> > > > you'd have to ensure that the process doesn't use swap (in other
> > > > words, it would have to run with all memory locked), because root can
> > > > choose where swapped pages are stored. Other than that, if you mark it
> > > > as a kthread so that no ptrace access is allowed, you can probably get
> > > > pretty close. But if you do anything like that, please leave some way
> > > > (like a kernel build config option or such) to enable debugging for
> > > > these processes.
> > >
> > > I didn't think about the swapping part... thanks!
> > >
> > > Ok to enable debugging with a config option.
> > >
> > > > But I'm not convinced that it makes sense to try to draw a security
> > > > boundary between fully-privileged root (with the ability to mount
> > > > things and configure swap and so on) and the kernel - my understanding
> > > > is that some kernel subsystems don't treat root-to-kernel privilege
> > > > escalation issues as security bugs that have to be fixed.
> > >
> > > Yes, that is unfortunately true, and in that case the trustworthy UMD
> > > would not make things worse. On the other hand, on systems where that
> > > separation is defined, the advantage would be to run more exploitable
> > > code in user space, leaving the kernel safe.
> > >
> > > I'm thinking about all the cases where the code had to be included in
> > > the kernel to run at the same privilege level, but would not use any of
> > > the kernel facilities (e.g. parsers).
> >
> > Thanks for reminding me of kexec-tools. The complete image for booting a
> > new kernel was originally prepared in user space. With kernel lockdown,
> > all this code had to move into the kernel, adding a new syscall and lots
> > of complexity to build purgatory code, etc. Yet, this new implementation
> > in the kernel does not offer all features of kexec-tools, so both code
> > bases continue to exist and are happily diverging...
> >
> > > If the boundary is extended to user space, some of these components
> > > could be moved away from the kernel, and the functionality would be the
> > > same without decreasing the security.
>
> > All right, AFAICS your idea is limited to relatively simple cases
> > for now. I mean, allowing kexec-tools to run in user space is not
> > easily possible when UID 0 is not trusted, because kexec needs to
> > open various files and make various other syscalls, which would
> > require a complex LSM policy. It looks technically possible to write
> > one, but then the big question is if it would be simpler to review
> > and maintain than adding more kexec-tools features to the kernel.
>
> You either need to develop and maintain a complex system-wide LSM
> policy or you need a security model that is specifically tuned and
> then scoped to the needs of the workload running on behalf of the
> kernel as a UID=0 userspace process.
>
> As I noted in my e-mail to Roberto, our TSEM LSM brings forward the
> ability to do both, as a useful side effect of the need to limit model
> complexity when the objective is to have a single functional
> description of the security state of a system.
>
> > Anyway, I can sense a general desire to run less code in the most
> > privileged system environment. Roberto's proposal is one of few that
> > go in this direction. What are the alternatives?
>
> As I noted above, TSEM brings the ability to provide highly specific
> and narrowly scoped security policy to a process hierarchy,
> i.e. a workload.
>
> However, regardless of the technology applied, in order to pursue
> Roberto's UMD model of having a uid=0 process run tasks on behalf of
> the kernel, there would seem to be a need to define what the security
> objectives are.
>
> From the outside looking in, there would seem to be a need to address
> two primary issues:
>
> 1: Trust/constrain what the UMD process can do.

Very simple:

read from a kernel-opened fd, write to another kernel-opened fd, close
the fds and exit.

With the seccomp strict profile, a process cannot call any other system
call, and it gets killed if it does.

I tried to write a BPF filter, to see how far I can go, and that seems
sufficient to constrain what the UMD process can do.
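
A minimal sketch of the kind of seccomp BPF filter described above, not
the actual code from this thread. It assumes a user space stand-in for
the kernel side (the parent process creates the pipes that the kernel
would open), and the names umd_lock_down(), umd_convert() and
umd_violation_signal(), as well as the upper-casing "conversion", are
hypothetical. The filter allows only read, write, close, exit and
exit_group; it does not check seccomp_data.arch, which a production
filter would have to do.

```c
#include <assert.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allow read, write, close, exit and exit_group; kill on anything else.
 * Simplification (assumption): seccomp_data.arch is not verified. */
static int umd_lock_down(void)
{
    struct sock_filter prog[] = {
        /* Load the system call number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Each match jumps forward to the final ALLOW instruction. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 5, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 4, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_close, 3, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog fprog = {
        .len = (unsigned short)(sizeof(prog) / sizeof(prog[0])),
        .filter = prog,
    };

    /* Required for an unprivileged caller to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
}

/* Hypothetical UMD body: read, "convert" (upper-case), write, exit. */
static void umd_child(int in_fd, int out_fd)
{
    char buf[256];
    ssize_t n, i;

    if (umd_lock_down() != 0)
        _exit(2);
    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
        for (i = 0; i < n; i++)
            if (buf[i] >= 'a' && buf[i] <= 'z')
                buf[i] = (char)(buf[i] - 'a' + 'A');
        if (write(out_fd, buf, (size_t)n) != n)
            _exit(3);
    }
    close(in_fd);
    close(out_fd);
    _exit(n == 0 ? 0 : 4);
}

/* Parent stands in for the kernel. Assumes the input fits in one pipe
 * buffer. Returns the number of converted bytes, or -1 on error. */
static ssize_t umd_convert(const char *in, size_t len, char *out, size_t cap)
{
    int to_child[2], from_child[2], status;
    size_t got = 0;
    ssize_t n, w;
    pid_t pid;

    if (pipe(to_child) != 0 || pipe(from_child) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(to_child[1]);
        close(from_child[0]);
        umd_child(to_child[0], from_child[1]); /* never returns */
    }
    close(to_child[0]);
    close(from_child[1]);
    w = write(to_child[1], in, len);
    close(to_child[1]); /* signals end of input to the child */
    while (got < cap && (n = read(from_child[0], out + got, cap - got)) > 0)
        got += (size_t)n;
    close(from_child[0]);
    if (waitpid(pid, &status, 0) != pid || w != (ssize_t)len)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)got;
}

/* Returns the signal that ends a locked-down child calling getpid(). */
static int umd_violation_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (umd_lock_down() != 0)
            _exit(2);
        syscall(SYS_getpid); /* not on the allow list */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Under these assumptions, umd_convert("abc1", 4, out, sizeof(out)) should
return 4 and leave "ABC1" in out, and umd_violation_signal() should
report SIGSYS, which is how the kill on a forbidden system call becomes
visible to the parent.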

Please note that the UMD process setup is done by the kernel, before
any user space code has the chance to run. The kernel is responsible for
properly establishing the communication with the UMD process.

> 2: Constrain what the system at large can do to the UMD process.

If someone outside can influence the behavior of the UMD process,
meaning altering the result, that would be unacceptable.

I found that denying ptrace with the UMD process as the target more or
less covers everything, even attempts to read or write /proc/<pid>/fd/<N>.

There might be something more subtle, like what Jann pointed out:
avoiding swapping of the UMD process, as there is no integrity check
when the page comes back.
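
The swap concern can be approximated from user space with mlockall(). In
the design discussed here the kernel would set this up itself, so the
following is only an illustration, and umd_pin_memory() is a
hypothetical helper name. The call can fail with ENOMEM or EPERM when
RLIMIT_MEMLOCK is low and the caller lacks CAP_IPC_LOCK, and it would
have to happen before a seccomp profile forbids it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/*
 * Lock all current and future pages of the calling process into RAM so
 * that they are not written to swap. Returns 0 on success, or the errno
 * value reported by mlockall() on failure.
 */
static int umd_pin_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;
    return 0;
}
```

A caller that needs the guarantee would treat any non-zero return as
fatal rather than continue with pageable memory.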

Other than that, I was also restricting kill; maybe we have to do
something similar with io_uring (but we would know if the UMD process
uses it). With that in place, the UMD process seems pretty much isolated.

I would definitely not complicate things more than that; it seems that
this problem is already difficult enough to solve.

Since the goal is very specific, I think writing a very small LSM would
make sense. With SELinux or Smack, you could also do it, but you have
to take care of loading a policy, enforcing it, etc.

The main question is if the kernel is able to enforce isolation on the
UMD process as it would do for itself.

I'm not considering confidentiality for now, just integrity, and only
the simplest case of the UMD process communicating exclusively with the
kernel (sufficient for my use cases).

> As we have seen before, requirement 1 implies a definition of what it
> means for a process to be 'trusted'.
>
> In the absence of formal verification, which appears to be a
> non-starter in practice, this would seem to imply defining a standard
> for the allowed security behavior of the UMD workload.
>
> From our perspective, with TSEM, we define 'trusted' for a workload to
> mean that it has not requested a security behavior inconsistent with
> what the workload has been unit tested to. If a process does this, its
> ability to execute additional security behaviors is curtailed.
>
> With respect to requirement two.
>
> Here is the ASCII art diagram of Roberto's proposed system:
>
>       r/w ^                kernel space
> ----------|-----------------------------------------
>           v (pipe)         user space
> +-----------------+       +-----------------------+
> | trustworthy UMD |---X---| rest of the processes |
> +-----------------+       +-----------------------+
>
> Casey noted that he believed the Linux LSM had sufficient coverage to
> provide the necessary security controls for this model. He
> specifically mentioned that it had support for network traffic
> controls and labeling.
>
> I haven't seen a reply from Roberto to my e-mail questioning what the
> following means:
>
> ---X---

That means no communication.

Thanks

Roberto

> But I get the sense that it means that any other process in userspace
> couldn't have any impact, or I assume visibility, into what the UID=0
> process is doing on behalf of the kernel. I don't think it means that
> there is supposed to be some type of highly controlled traffic between
> the UMD and other processes.
>
> We will see what comments Roberto has on this.
>
> This arguably may be the most difficult requirement to meet if our
> interpretation of this requirement is correct, particularly so if this
> involves a confidentiality requirement, perhaps a bit less so if there
> is only a requirement of integrity of execution.
>
> As I mentioned in a previous e-mail, depending on the requirements,
> issue 2 starts to look a lot like protected enclave technologies such
> as SGX. As history has shown, providing a protected execution
> environment, against the rest of the system, is a somewhat formidable
> undertaking, with probably a requirement for hardware support if SGX
> and/or TDX are any examples.
>
> So, I believe that TSEM brings useful technology to the table, but
> regardless of technology, it would seem there is a need to
> specifically define the security requirements for the UMD model.
>
> > Petr T
>
> Have a good day.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity


2023-07-06 15:48:00

by Roberto Sassu

Subject: Re: [QUESTION] Full user space process isolation?

On Thu, 2023-07-06 at 13:35 +0200, Roberto Sassu wrote:
> On Thu, 2023-07-06 at 05:53 -0500, Dr. Greg wrote:
> > On Tue, Jul 04, 2023 at 05:18:43PM +0200, Petr Tesarik wrote:
> >
> > Good morning, I hope the week is going well for everyone.
> >
> > > On 7/3/2023 5:28 PM, Roberto Sassu wrote:
> > > > On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
> > > > > On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
> > > > > <[email protected]> wrote:
> > > > > > I wanted to execute some kernel workloads in a fully isolated user
> > > > > > space process, started from a binary statically linked with klibc,
> > > > > > connected to the kernel only through a pipe.
> > > > >
> > > > > FWIW, the kernel has some infrastructure for this already, see
> > > > > CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
> > > > > example in net/bpfilter/.
> > > >
> > > > Thanks, I actually took that code to make a generic UMD management
> > > > library, that can be used by all use cases:
> > > >
> > > > https://lore.kernel.org/linux-kernel/[email protected]/
> > > >
> > > > > > I also wanted that, for the root user, tampering with that process is
> > > > > > as hard as if the same code runs in kernel space.
> > > > >
> > > > > I believe that actually making it that hard would probably mean that
> > > > > you'd have to ensure that the process doesn't use swap (in other
> > > > > words, it would have to run with all memory locked), because root can
> > > > > choose where swapped pages are stored. Other than that, if you mark it
> > > > > as a kthread so that no ptrace access is allowed, you can probably get
> > > > > pretty close. But if you do anything like that, please leave some way
> > > > > (like a kernel build config option or such) to enable debugging for
> > > > > these processes.
> > > >
> > > > I didn't think about the swapping part... thanks!
> > > >
> > > > Ok to enable debugging with a config option.
> > > >
> > > > > But I'm not convinced that it makes sense to try to draw a security
> > > > > boundary between fully-privileged root (with the ability to mount
> > > > > things and configure swap and so on) and the kernel - my understanding
> > > > > is that some kernel subsystems don't treat root-to-kernel privilege
> > > > > escalation issues as security bugs that have to be fixed.
> > > >
> > > > Yes, that is unfortunately true, and in that case the trustworthy UMD
> > > > would not make things worse. On the other hand, on systems where that
> > > > separation is defined, the advantage would be to run more exploitable
> > > > code in user space, leaving the kernel safe.
> > > >
> > > > I'm thinking about all the cases where the code had to be included in
> > > > the kernel to run at the same privilege level, but would not use any of
> > > > the kernel facilities (e.g. parsers).
> > >
> > > Thanks for reminding me of kexec-tools. The complete image for booting a
> > > new kernel was originally prepared in user space. With kernel lockdown,
> > > all this code had to move into the kernel, adding a new syscall and lots
> > > of complexity to build purgatory code, etc. Yet, this new implementation
> > > in the kernel does not offer all features of kexec-tools, so both code
> > > bases continue to exist and are happily diverging...
> > >
> > > > If the boundary is extended to user space, some of these components
> > > > could be moved away from the kernel, and the functionality would be the
> > > > same without decreasing the security.
> >
> > > All right, AFAICS your idea is limited to relatively simple cases
> > > for now. I mean, allowing kexec-tools to run in user space is not
> > > easily possible when UID 0 is not trusted, because kexec needs to
> > > open various files and make various other syscalls, which would
> > > require a complex LSM policy. It looks technically possible to write
> > > one, but then the big question is if it would be simpler to review
> > > and maintain than adding more kexec-tools features to the kernel.
> >
> > You either need to develop and maintain a complex system-wide LSM
> > policy or you need a security model that is specifically tuned and
> > then scoped to the needs of the workload running on behalf of the
> > kernel as a UID=0 userspace process.
> >
> > As I noted in my e-mail to Roberto, our TSEM LSM brings forward the
> > ability to do both, as a useful side effect of the need to limit model
> > complexity when the objective is to have a single functional
> > description of the security state of a system.
> >
> > > Anyway, I can sense a general desire to run less code in the most
> > > privileged system environment. Roberto's proposal is one of the
> > > few that go in this direction. What are the alternatives?
> >
> > As I noted above, TSEM brings the ability to provide highly specific
> > and narrowly scoped security policy to a process hierarchy,
> > i.e. a workload.
> >
> > However, regardless of the technology applied, in order to pursue
> > Roberto's UMD model of having a uid=0 process run tasks on behalf of
> > the kernel, there would seem to be a need to define what the security
> > objectives are.
> >
> > From the outside looking in, there would seem to be a need to address
> > two primary issues:
> >
> > 1: Trust/constrain what the UMD process can do.
>
> Very simple:
>
> read from a kernel-opened fd, write to another kernel-opened fd, close
> the fds and exit.
>
> With the seccomp strict profile, a process cannot call any other system
> call, and it gets killed if it does.
>
> I tried to write a BPF filter, to see how far I can go, and that seems
> sufficient to constrain what the UMD process can do.
>
> Please note that the UMD process setup is done by the kernel, before
> any user space code has the chance to run. The kernel is responsible
> for properly establishing the communication with the UMD process.
>
> > 2: Constrain what the system at large can do to the UMD process.
>
> If someone outside can influence the behavior of the UMD process,
> meaning altering the result, that would be unacceptable.
>
> I found that denying ptrace on the UMD process as target, more or less
> covers everything, even trying to read or write /proc/<pid>/fd/<N>.
>
> There might be something more subtle, like what Jann pointed out:
> avoiding swapping of the UMD process, as there is no integrity check
> when the page comes back.
>
> Other than that, I was limiting the kill, maybe we have to do something
> similar with io_uring (but we would know if the UMD process uses it).
> With that in place, the UMD process seems pretty much isolated.
>
> I would definitely not complicate things more than that, seems that
> this problem is already difficult enough to solve.
>
> Since the goal is very specific, I think writing a very small LSM would
> make sense. With SELinux or Smack, you could also do it, but you have
> to care about loading a policy, enforcing, etc.
>
> The main question is if the kernel is able to enforce isolation on the
> UMD process as it would do for itself.
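
The confinement described in the quoted point 1 can be sketched from user
space with a seccomp BPF filter. This is only an illustration under stated
assumptions: in the proposal the kernel sets up the restriction before any
user space code runs, whereas here a forked child installs the filter on
itself. Note that seccomp strict mode on its own permits only read, write,
_exit and sigreturn, so also allowing close needs a filter like this one.
The names install_umd_filter() and run_confined_child() are hypothetical,
and only x86_64 and aarch64 are covered.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define UMD_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define UMD_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* If the loaded syscall number equals nr, allow it; otherwise fall through. */
#define ALLOW(nr) \
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Hypothetical helper: allow read/write/close/exit, kill on anything else. */
static int install_umd_filter(void)
{
	struct sock_filter filter[] = {
		/* Refuse syscalls made through an unexpected ABI. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, UMD_AUDIT_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
		/* Load the syscall number and compare against the allowlist. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		ALLOW(__NR_read),
		ALLOW(__NR_write),
		ALLOW(__NR_close),
		ALLOW(__NR_exit),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
	};
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Needed so that an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/*
 * Hypothetical demo: the child upper-cases what it reads from one pipe and
 * writes it to another. Returns the child's wait status, or -1 on error.
 */
static int run_confined_child(int try_forbidden, char *out, size_t outlen)
{
	static const char msg[] = "hello";
	int to_child[2], from_child[2], status;
	ssize_t n, i;
	pid_t pid;

	assert(outlen > 0);
	if (pipe(to_child) || pipe(from_child))
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		char buf[64];

		close(to_child[1]);
		close(from_child[0]);
		if (install_umd_filter())
			_exit(2);	/* filter not active yet */
		n = read(to_child[0], buf, sizeof(buf));
		for (i = 0; i < n; i++)
			if (buf[i] >= 'a' && buf[i] <= 'z')
				buf[i] -= 'a' - 'A';
		if (n > 0 && write(from_child[1], buf, n) != n)
			syscall(SYS_exit, 3);
		if (try_forbidden)
			syscall(SYS_getpid);	/* not allowed: SIGSYS */
		syscall(SYS_exit, 0);	/* exit_group is not allowed */
	}
	close(to_child[0]);
	close(from_child[1]);
	if (write(to_child[1], msg, strlen(msg)) < 0)
		status = -1;
	close(to_child[1]);
	n = read(from_child[0], out, outlen - 1);
	out[n > 0 ? n : 0] = '\0';
	close(from_child[0]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}
```

Calling run_confined_child(0, ...) should let the child finish normally,
while run_confined_child(1, ...) should end with the child being killed by
SIGSYS. This covers only the "process -> outside" direction; denying
ptrace/kill and avoiding swap are separate problems not shown here.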

For those who didn't receive the patch set I just sent, I worked around
the first problem of supporting PGP for verifying the authenticity of
RPM headers and using them with IMA Appraisal.

I introduced a new TLV-based key format in the kernel, and plan to let
Linux distribution vendors convert PGP keys to this new format in their
(trusted) build infrastructure. The converted keys are embedded in the
kernel image.
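
To illustrate why a TLV (type-length-value) layout is easy to parse
defensively, here is a minimal sketch of a bounds-checked TLV walker. The
layout is an assumption made for this example only (2-byte big-endian type,
4-byte big-endian length, then the value); it is not the format defined in
the patch set, and tlv_next()/tlv_count() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_HDR_LEN 6	/* assumed: 2-byte type + 4-byte length */

struct tlv_entry {
	uint16_t type;
	uint32_t len;
	const uint8_t *value;	/* points into the caller's buffer */
};

/*
 * Parse the entry starting at buf[*off] and advance *off past it.
 * Returns 0 on success, -1 if the header or the value is truncated.
 */
static int tlv_next(const uint8_t *buf, size_t size, size_t *off,
		    struct tlv_entry *e)
{
	size_t pos = *off;

	if (pos > size || size - pos < TLV_HDR_LEN)
		return -1;
	e->type = (uint16_t)((buf[pos] << 8) | buf[pos + 1]);
	e->len = ((uint32_t)buf[pos + 2] << 24) |
		 ((uint32_t)buf[pos + 3] << 16) |
		 ((uint32_t)buf[pos + 4] << 8) |
		 (uint32_t)buf[pos + 5];
	pos += TLV_HDR_LEN;
	/* Compare against the remaining bytes to avoid integer overflow. */
	if (e->len > size - pos)
		return -1;
	e->value = buf + pos;
	*off = pos + e->len;
	return 0;
}

/* Count the entries in a buffer; returns -1 if any entry is malformed. */
static int tlv_count(const uint8_t *buf, size_t size)
{
	struct tlv_entry e;
	size_t off = 0;
	int count = 0;

	while (off < size) {
		if (tlv_next(buf, size, &off, &e))
			return -1;
		count++;
	}
	return count;
}
```

Every length is checked against the bytes remaining before the value is
touched, which is the main property one wants from a parser running on
data whose format has to be trusted only after verification.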

Signatures can be converted in user space at run-time, since altering
them would make signature verification fail.

You can find the patch set here:

https://lore.kernel.org/linux-integrity/[email protected]/

Thanks

Roberto