From: Petr Tesarik <[email protected]>
The ultimate goal of SandBox Mode is to execute native kernel code
in an environment which permits memory access only to predefined
addresses, so potential vulnerabilities cannot be exploited or will
have no impact on the rest of the kernel.
This patch series adds the API and arch-independent infrastructure
of SandBox Mode to the kernel. It runs the target function on a
vmalloc()'ed copy of all input and output data. This alone prevents
some out-of-bounds accesses thanks to guard pages.
Patch 4/5 adds KUnit tests. It is also a good starting point to
understand how SandBox Mode is supposed to be used.
Detailed description of SandBox Mode goals, usage and future plans
can be found in patch 5/5 of this series and is not repeated in
this cover letter.
Petr Tesarik (5):
sbm: SandBox Mode core data types and functions
sbm: sandbox input and output buffers
sbm: call helpers and thunks
sbm: SandBox Mode KUnit test suite
sbm: SandBox Mode documentation
Documentation/security/index.rst | 1 +
Documentation/security/sandbox-mode.rst | 180 ++++++
include/linux/sbm.h | 516 +++++++++++++++++
init/Kconfig | 2 +
kernel/Kconfig.sbm | 43 ++
kernel/Makefile | 2 +
kernel/sbm.c | 133 +++++
kernel/sbm_test.c | 735 ++++++++++++++++++++++++
8 files changed, 1612 insertions(+)
create mode 100644 Documentation/security/sandbox-mode.rst
create mode 100644 include/linux/sbm.h
create mode 100644 kernel/Kconfig.sbm
create mode 100644 kernel/sbm.c
create mode 100644 kernel/sbm_test.c
--
2.34.1
From: Petr Tesarik <[email protected]>
Add a SandBox Mode document under Documentation/security. Describe the
concept, usage and known limitations.
Signed-off-by: Petr Tesarik <[email protected]>
---
Documentation/security/index.rst | 1 +
Documentation/security/sandbox-mode.rst | 180 ++++++++++++++++++++++++
2 files changed, 181 insertions(+)
create mode 100644 Documentation/security/sandbox-mode.rst
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 59f8fc106cb0..680a0b8bf28b 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -14,6 +14,7 @@ Security Documentation
sak
SCTP
self-protection
+ sandbox-mode
siphash
tpm/index
digsig
diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst
new file mode 100644
index 000000000000..4405b8858c4a
--- /dev/null
+++ b/Documentation/security/sandbox-mode.rst
@@ -0,0 +1,180 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+SandBox Mode
+============
+
+Introduction
+============
+
+The primary goal of SandBox Mode (SBM) is to reduce the impact of potential
+memory safety bugs in kernel code by decomposing the kernel. The SBM API
+allows to run each component inside an isolated execution environment. In
+particular, memory areas used as input and/or output are isolated from the
+rest of the kernel and surrounded by guard pages. Without arch hooks, this
+common base provides *weak isolation*.
+
+On architectures which implement the necessary arch hooks, SandBox Mode
+leverages hardware paging facilities and CPU privilege levels to enforce the
+use of only these predefined memory areas. With arch support, SBM can also
+recover from protection violations. This means that SBM forcibly terminates
+the sandbox and returns an error code (e.g. ``-EFAULT``) to the caller, so
+execution can continue. Such implementation provides *strong isolation*.
+
+A target function in a sandbox communicates with the rest of the kernel
+through a caller-defined interface, comprising read-only buffers (input),
+read-write buffers (output) and the return value. The caller can explicitly
+share other data with the sandbox, but doing so may reduce isolation strength.
+
+Protection of sensitive kernel data is currently out of scope. SandBox Mode is
+meant to run kernel code which would otherwise have full access to all system
+resources. SBM allows to impose a scoped access control policy on which
+resources are available to the sandbox. That said, protection of sensitive
+data is foreseen as a future goal, and that's why the API is designed to
+control not only memory writes but also memory reads.
+
+The expected use case for SandBox Mode is parsing data from untrusted sources,
+especially if the parsing cannot be reasonably done by a user mode helper.
+Keep in mind that a sandbox doesn't guarantee that the output data is correct.
+The result may be corrupt (e.g. as a result of an exploited bug) and where
+applicable, it should be sanitized before further use.
+
+Using SandBox Mode
+==================
+
+SandBox Mode is an optional feature, enabled with ``CONFIG_SANDBOX_MODE``.
+However, the SBM API is always defined regardless of the kernel configuration.
+It will call a function with the best available isolation, which is:
+
+* *strong isolation* if both ``CONFIG_SANDBOX_MODE`` and
+ ``CONFIG_ARCH_HAVE_SBM`` are set,
+* *weak isolation* if ``CONFIG_SANDBOX_MODE`` is set, but
+ ``CONFIG_ARCH_HAVE_SBM`` is unset,
+* *no isolation* if ``CONFIG_SANDBOX_MODE`` is unset.
+
+Code which cannot safely run with no isolation should depend on the relevant
+config option(s).
+
+The API can be used like this:
+
+.. code-block:: c
+
+ #include <linux/sbm.h>
+
+ /* Function to be executed in a sandbox. */
+ static SBM_DEFINE_FUNC(my_func, const struct my_input *, in,
+ struct my_output *, out)
+ {
+ /* Read from in, write to out. */
+ return 0;
+ }
+
+ int caller(...)
+ {
+ /* Declare a SBM instance. */
+ struct sbm sbm;
+
+ /* Initialize SBM instance. */
+ sbm_init(&sbm);
+
+ /* Execute my_func() using the SBM instance. */
+ err = sbm_call(&sbm, my_func,
+ SBM_COPY_IN(&sbm, input, in_size),
+ SBM_COPY_OUT(&sbm, output, out_size));
+
+ /* Clean up. */
+ sbm_destroy(&sbm);
+
+The return type of a sandbox mode function is always ``int``. The return value
+is zero on success and negative on error. That's because the SBM helpers
+return an error code (such as ``-ENOMEM``) if the call cannot be performed.
+
+If sbm_call() returns an error, you can use sbm_error() to decide whether the
+error was returned by the target function or because sandbox mode was aborted
+(or failed to run entirely).
+
+Public API
+----------
+
+.. kernel-doc:: include/linux/sbm.h
+ :identifiers: sbm sbm_init sbm_destroy sbm_exec sbm_error
+ SBM_COPY_IN SBM_COPY_OUT SBM_COPY_INOUT
+ SBM_DEFINE_CALL SBM_DEFINE_THUNK SBM_DEFINE_FUNC
+ sbm_call
+
+Arch Hooks
+----------
+
+These hooks must be implemented to select HAVE_ARCH_SBM.
+
+.. kernel-doc:: include/linux/sbm.h
+ :identifiers: arch_sbm_init arch_sbm_destroy arch_sbm_exec
+ arch_sbm_map_readonly arch_sbm_map_writable
+
+Current Limitations
+===================
+
+This section lists know limitations of the current SBM implementation, which
+are planned to be removed in the future.
+
+Stack
+-----
+
+There is no generic kernel API to run a function on an alternate stack, so SBM
+runs on the normal kernel stack by default. The kernel already offers
+self-protection against stack overflows and underflows as well as against
+overwriting on-stack data outside the current frame, but violations are
+usually fatal.
+
+This limitation can be solved for specific targets. Arch hooks can set up a
+separate stack and recover from stack frame overruns.
+
+Inherent Limitations
+====================
+
+This section lists limitations which are inherent to the concept.
+
+Explicit Code
+-------------
+
+The main idea behind SandBox Mode is decomposition of one big program (the
+Linux kernel) into multiple smaller programs that can be sandboxed. AFAIK
+there is no way to automate this task for an existing code base in C.
+
+Given the performance impact of running code in a sandbox, this limitation may
+be perceived as a benefit. It is expected that sandbox mode is introduced only
+knowingly and only where safety is more important than performance.
+
+Complex Data
+------------
+
+Although data structures are not serialized and deserialized between kernel
+mode and sandbox mode, all directly and indirectly referenced data structures
+must be explicitly mapped into the sandbox, which requires some manual effort.
+
+Copying of input/output buffers also incurs some runtime overhead. This
+overhead can be reduced by sharing data directly with the sandbox, but the
+resulting isolation is weaker, so it may or may not be acceptable, depending
+on the overall safety requirements.
+
+Page Granularity
+----------------
+
+Since paging is used to enforce memory safety, page size is the smallest unit.
+Objects mapped into the sandbox must be aligned to a page boundary, and buffer
+overflows may not be detected if they fit into the same page.
+
+On the other hand, even though such writes are not detected, they do not
+corrupt kernel data, because only the output buffer is copied back to kernel
+mode, and the (corrupted) rest of the page is ignored.
+
+Transitions
+-----------
+
+Transitions between kernel mode and sandbox mode are synchronous. That is,
+whenever entering or leaving sandbox mode, the currently running CPU executes
+the instructions necessary to save/restore its kernel-mode state. The API is
+generic enough to allow asynchronous transitions, e.g. to pass data to another
+CPU which is already running in sandbox mode. However, to see the benefits, a
+hypothetical implementation would require far-reaching changes in the kernel
+scheduler. This is (currently) out of scope.
--
2.34.1
On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <[email protected]> wrote:
> +Although data structures are not serialized and deserialized between kernel
> +mode and sandbox mode, all directly and indirectly referenced data structures
> +must be explicitly mapped into the sandbox, which requires some manual effort.
Maybe I'm missing something here, but...
The requirement that the sandboxed function only ever touch two linear
blocks of memory (yes?) seems a tremendous limitation. I mean, how can
the sandboxed function call kmalloc()? How can it call any useful
kernel functions? They'll all touch memory which lies outside the
sandbox areas?
Perhaps a simple but real-world example would help clarify.
On Wed, Feb 14, 2024 at 05:30:53AM -0800, Andrew Morton wrote:
> On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <[email protected]> wrote:
>
> > +Although data structures are not serialized and deserialized between kernel
> > +mode and sandbox mode, all directly and indirectly referenced data structures
> > +must be explicitly mapped into the sandbox, which requires some manual effort.
>
> Maybe I'm missing something here, but...
>
> The requirement that the sandboxed function only ever touch two linear
> blocks of memory (yes?) seems a tremendous limitation. I mean, how can
> the sandboxed function call kmalloc()? How can it call any useful
> kernel functions? They'll all touch memory which lies outside the
> sandbox areas?
>
> Perhaps a simple but real-world example would help clarify.
I agree, this looks like an "interesting" framework, but we don't add
code to the kernel without a real, in-kernel user for it.
Without such a thing, we can't even consider it for inclusion as we
don't know how it will actually work and how any subsystem would use it.
Petr, do you have an user for this today?
thanks,
greg k-h
On Wed, 14 Feb 2024 15:01:25 +0100
Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Feb 14, 2024 at 05:30:53AM -0800, Andrew Morton wrote:
> > On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <[email protected]> wrote:
> >
> > > +Although data structures are not serialized and deserialized between kernel
> > > +mode and sandbox mode, all directly and indirectly referenced data structures
> > > +must be explicitly mapped into the sandbox, which requires some manual effort.
> >
> > Maybe I'm missing something here, but...
> >
> > The requirement that the sandboxed function only ever touch two linear
> > blocks of memory (yes?) seems a tremendous limitation. I mean, how can
> > the sandboxed function call kmalloc()? How can it call any useful
> > kernel functions? They'll all touch memory which lies outside the
> > sandbox areas?
> >
> > Perhaps a simple but real-world example would help clarify.
>
> I agree, this looks like an "interesting" framework, but we don't add
> code to the kernel without a real, in-kernel user for it.
>
> Without such a thing, we can't even consider it for inclusion as we
> don't know how it will actually work and how any subsystem would use it.
>
> Petr, do you have an user for this today?
Hi Greg & Andrew,
your observations is correct. In this form, the framework is quite
limited, and exactly this objections was expected. You have even
spotted one of the first enhancements I tested on top of this framework
(dynamic memory allocation).
The intended use case is code that processes untrusted data that is not
properly sanitized, but where performance is not critical. Some
examples include decompressing initramfs, loading a kernel module. Or
decoding a boot logo; I think I've noticed a vulnerability in another
project recently... ;-)
Of course, even decompression needs dynamic memory. My plan is to
extend the mechanism. Right now I'm mapping all of kernel text into the
sandbox. Later, I'd like to decompose the text section too. The pages
which contain sandboxed code should be mapped, but rest of the kernel
should not. If the sandbox tries to call kmalloc(), vmalloc(), or
schedule(), the attempt will generate a page fault. Sandbox page faults
are already intercepted, so handle_sbm_call() can decide if the call
should be allowed or not. If the sandbox policy says ALLOW, the page
fault handler will perform the API call on behalf of the sandboxed code
and return results, possibly with some post-call action, e.g. map some
more pages to the address space.
The fact that all communication with the rest of the kernel happens
through CPU exceptions is the reason this mechanism is unsuitable for
performance-critical applications.
OK, so why didn't I send the whole thing?
Decomposition of the kernel requires many more changes, e.g. in linker
scripts. Some of them depend on this patch series. Before I go and
clean up my code into something that can be submitted, I want to get
feedback from guys like you, to know if the whole idea would be even
considered, aka "Fail Fast".
Petr T
On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> OK, so why didn't I send the whole thing?
>
> Decomposition of the kernel requires many more changes, e.g. in linker
> scripts. Some of them depend on this patch series. Before I go and
> clean up my code into something that can be submitted, I want to get
> feedback from guys like you, to know if the whole idea would be even
> considered, aka "Fail Fast".
We can't honestly consider this portion without seeing how it would
work, as we don't even see a working implementation that uses it to
verify it at all.
The joy of adding new frameworks is that you need a user before anyone
can spend the time to review it, sorry.
thanks,
greg k-h
On Wed, 14 Feb 2024 16:11:05 +0100
Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > OK, so why didn't I send the whole thing?
> >
> > Decomposition of the kernel requires many more changes, e.g. in linker
> > scripts. Some of them depend on this patch series. Before I go and
> > clean up my code into something that can be submitted, I want to get
> > feedback from guys like you, to know if the whole idea would be even
> > considered, aka "Fail Fast".
>
> We can't honestly consider this portion without seeing how it would
> work, as we don't even see a working implementation that uses it to
> verify it at all.
>
> The joy of adding new frameworks is that you need a user before anyone
> can spend the time to review it, sorry.
Thank your for a quick assessment. Will it be sufficient if I send some
code for illustration (with some quick&dirty hacks to bridge the gaps),
or do you need clean and nice kernel code?
Petr T
On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> On Wed, 14 Feb 2024 16:11:05 +0100
> Greg Kroah-Hartman <[email protected]> wrote:
>
> > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > OK, so why didn't I send the whole thing?
> > >
> > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > scripts. Some of them depend on this patch series. Before I go and
> > > clean up my code into something that can be submitted, I want to get
> > > feedback from guys like you, to know if the whole idea would be even
> > > considered, aka "Fail Fast".
> >
> > We can't honestly consider this portion without seeing how it would
> > work, as we don't even see a working implementation that uses it to
> > verify it at all.
> >
> > The joy of adding new frameworks is that you need a user before anyone
> > can spend the time to review it, sorry.
>
> Thank your for a quick assessment. Will it be sufficient if I send some
> code for illustration (with some quick&dirty hacks to bridge the gaps),
> or do you need clean and nice kernel code?
We need a real user in the kernel, otherwise why would we even consider
it? Would you want to review a new subsystem that does nothing and has
no real users? If not, why would you want us to? :)
thanks,
greg k-h
On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> On Wed, 14 Feb 2024 16:11:05 +0100
> Greg Kroah-Hartman <[email protected]> wrote:
>
> > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > OK, so why didn't I send the whole thing?
> > >
> > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > scripts. Some of them depend on this patch series. Before I go and
> > > clean up my code into something that can be submitted, I want to get
> > > feedback from guys like you, to know if the whole idea would be even
> > > considered, aka "Fail Fast".
> >
> > We can't honestly consider this portion without seeing how it would
> > work, as we don't even see a working implementation that uses it to
> > verify it at all.
> >
> > The joy of adding new frameworks is that you need a user before anyone
> > can spend the time to review it, sorry.
>
> Thank your for a quick assessment. Will it be sufficient if I send some
> code for illustration (with some quick&dirty hacks to bridge the gaps),
> or do you need clean and nice kernel code?
Given that code is going to need a rewrite to make use of this anyways -
why not just do the rewrite in Rust?
Then you get memory safety, which seems to be what you're trying to
achieve here.
Or, you say this is for when performance isn't critical - why not a user
mode helper?
On Wed, 14 Feb 2024 19:48:52 +0100
Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > On Wed, 14 Feb 2024 16:11:05 +0100
> > Greg Kroah-Hartman <[email protected]> wrote:
> >
> > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > OK, so why didn't I send the whole thing?
> > > >
> > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > scripts. Some of them depend on this patch series. Before I go and
> > > > clean up my code into something that can be submitted, I want to get
> > > > feedback from guys like you, to know if the whole idea would be even
> > > > considered, aka "Fail Fast".
> > >
> > > We can't honestly consider this portion without seeing how it would
> > > work, as we don't even see a working implementation that uses it to
> > > verify it at all.
> > >
> > > The joy of adding new frameworks is that you need a user before anyone
> > > can spend the time to review it, sorry.
> >
> > Thank your for a quick assessment. Will it be sufficient if I send some
> > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > or do you need clean and nice kernel code?
>
> We need a real user in the kernel, otherwise why would we even consider
> it? Would you want to review a new subsystem that does nothing and has
> no real users? If not, why would you want us to? :)
Greg, please enlighten me on the process. How is something like this
supposed to get in?
Subsystem maintainers will not review code that depends on core features
not yet reviewed by the respective maintainers. If I add only the API
and a stub implementation, then it brings no benefit and attempts to
introduce the API will be dismissed. I would certainly do just that if
I was a maintainer...
I could try to pack everything (base infrastructure, arch
implementations, API users) into one big patch with pretty much
everybody on the Cc list, but how is that ever going to get reviewed?
Should I just go and maintain an out-of-tree repo for a few years,
hoping that it gets merged one day, like bcachefs? Is this the way?
Petr T
On Wed, 14 Feb 2024 13:54:54 -0500
Kent Overstreet <[email protected]> wrote:
> On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > On Wed, 14 Feb 2024 16:11:05 +0100
> > Greg Kroah-Hartman <[email protected]> wrote:
> >
> > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > OK, so why didn't I send the whole thing?
> > > >
> > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > scripts. Some of them depend on this patch series. Before I go and
> > > > clean up my code into something that can be submitted, I want to get
> > > > feedback from guys like you, to know if the whole idea would be even
> > > > considered, aka "Fail Fast".
> > >
> > > We can't honestly consider this portion without seeing how it would
> > > work, as we don't even see a working implementation that uses it to
> > > verify it at all.
> > >
> > > The joy of adding new frameworks is that you need a user before anyone
> > > can spend the time to review it, sorry.
> >
> > Thank your for a quick assessment. Will it be sufficient if I send some
> > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > or do you need clean and nice kernel code?
>
> Given that code is going to need a rewrite to make use of this anyways -
> why not just do the rewrite in Rust?
Thank you for this question! I concur that rewriting the whole kernel
in Rust would be a better option. I see two differences:
1. amount of work
2. regressions
Rewriting something in Rust means pretty much writing it from scratch.
Doing that necessarily introduces regressions. Old code has been used.
It has been tested. In many corner cases. Lots of bugs have been found,
and they’ve been fixed. If you write code from scratch, you lose much
of the accumulated knowledge.
It may still pay off in the long run.
More importantly, sandbox mode can be viewed as a tool that enforces
decomposition of kernel code. This decomposition is the main benefit.
It requires understanding the dependencies among different parts of the
kernel (both code flow and data flow), and that will in turn promote
better design.
> Then you get memory safety, which seems to be what you're trying to
> achieve here.
>
> Or, you say this is for when performance isn't critical - why not a user
> mode helper?
Processes in user mode are susceptible to all kinds of attacks you may
want to avoid. Sandbox mode can be used in situations where user mode
does not exist, e.g. to display a boot logo or to unpack initramfs.
Sandbox mode is part of the kernel, hence signed, which may be relevant
if the kernel is locked down, so you can use it e.g. to parse a key
from the bootloader and put it on the trusted keyring.
Regarding performance, sandbox overhead is somewhere between kernel
mode and UMH. It is not suitable for time-critical code (like handling
NIC interrupts), but it's still much faster than UMH.
Petr T
On Wed, Feb 14, 2024 at 09:09:37PM +0100, Petr Tesařík wrote:
> On Wed, 14 Feb 2024 13:54:54 -0500
> Kent Overstreet <[email protected]> wrote:
>
> > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > > On Wed, 14 Feb 2024 16:11:05 +0100
> > > Greg Kroah-Hartman <[email protected]> wrote:
> > >
> > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > > OK, so why didn't I send the whole thing?
> > > > >
> > > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > > scripts. Some of them depend on this patch series. Before I go and
> > > > > clean up my code into something that can be submitted, I want to get
> > > > > feedback from guys like you, to know if the whole idea would be even
> > > > > considered, aka "Fail Fast".
> > > >
> > > > We can't honestly consider this portion without seeing how it would
> > > > work, as we don't even see a working implementation that uses it to
> > > > verify it at all.
> > > >
> > > > The joy of adding new frameworks is that you need a user before anyone
> > > > can spend the time to review it, sorry.
> > >
> > > Thank your for a quick assessment. Will it be sufficient if I send some
> > > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > > or do you need clean and nice kernel code?
> >
> > Given that code is going to need a rewrite to make use of this anyways -
> > why not just do the rewrite in Rust?
>
> Thank you for this question! I concur that rewriting the whole kernel
> in Rust would be a better option. I see two differences:
>
> 1. amount of work
> 2. regressions
>
> Rewriting something in Rust means pretty much writing it from scratch.
> Doing that necessarily introduces regressions. Old code has been used.
> It has been tested. In many corner cases. Lots of bugs have been found,
> and they’ve been fixed. If you write code from scratch, you lose much
> of the accumulated knowledge.
But it's work that already has some growing momentum behind it,
especially in the area you cited - decompression algorithms.
> More importantly, sandbox mode can be viewed as a tool that enforces
> decomposition of kernel code. This decomposition is the main benefit.
> It requires understanding the dependencies among different parts of the
> kernel (both code flow and data flow), and that will in turn promote
> better design.
You see this as a tool for general purpose code...?
Typical kernel code tends to be quite pointer heavy.
> > Then you get memory safety, which seems to be what you're trying to
> > achieve here.
> >
> > Or, you say this is for when performance isn't critical - why not a user
> > mode helper?
>
> Processes in user mode are susceptible to all kinds of attacks you may
> want to avoid. Sandbox mode can be used in situations where user mode
> does not exist, e.g. to display a boot logo or to unpack initramfs.
[citation needed]
Running code in the kernel does not make it more secure from attack, and
that's a pretty strange idea. One of the central jobs of the kernel is
to provide isolation between different users.
> Sandbox mode is part of the kernel, hence signed, which may be relevant
> if the kernel is locked down, so you can use it e.g. to parse a key
> from the bootloader and put it on the trusted keyring.
>
> Regarding performance, sandbox overhead is somewhere between kernel
> mode and UMH. It is not suitable for time-critical code (like handling
> NIC interrupts), but it's still much faster than UMH.
yeah, this doesn't seem like a worthwhile direction to go in.
On Wed, 14 Feb 2024 15:19:04 -0500
Kent Overstreet <[email protected]> wrote:
> On Wed, Feb 14, 2024 at 09:09:37PM +0100, Petr Tesařík wrote:
> > On Wed, 14 Feb 2024 13:54:54 -0500
> > Kent Overstreet <[email protected]> wrote:
> >
> > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > > > On Wed, 14 Feb 2024 16:11:05 +0100
> > > > Greg Kroah-Hartman <[email protected]> wrote:
> > > >
> > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > > > OK, so why didn't I send the whole thing?
> > > > > >
> > > > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > > > scripts. Some of them depend on this patch series. Before I go and
> > > > > > clean up my code into something that can be submitted, I want to get
> > > > > > feedback from guys like you, to know if the whole idea would be even
> > > > > > considered, aka "Fail Fast".
> > > > >
> > > > > We can't honestly consider this portion without seeing how it would
> > > > > work, as we don't even see a working implementation that uses it to
> > > > > verify it at all.
> > > > >
> > > > > The joy of adding new frameworks is that you need a user before anyone
> > > > > can spend the time to review it, sorry.
> > > >
> > > > Thank your for a quick assessment. Will it be sufficient if I send some
> > > > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > > > or do you need clean and nice kernel code?
> > >
> > > Given that code is going to need a rewrite to make use of this anyways -
> > > why not just do the rewrite in Rust?
> >
> > Thank you for this question! I concur that rewriting the whole kernel
> > in Rust would be a better option. I see two differences:
> >
> > 1. amount of work
> > 2. regressions
> >
> > Rewriting something in Rust means pretty much writing it from scratch.
> > Doing that necessarily introduces regressions. Old code has been used.
> > It has been tested. In many corner cases. Lots of bugs have been found,
> > and they’ve been fixed. If you write code from scratch, you lose much
> > of the accumulated knowledge.
>
> But it's work that already has some growing momentum behind it,
> especially in the area you cited - decompression algorithms.
Fair enough, this is indeed going for a better solution.
> > More importantly, sandbox mode can be viewed as a tool that enforces
> > decomposition of kernel code. This decomposition is the main benefit.
> > It requires understanding the dependencies among different parts of the
> > kernel (both code flow and data flow), and that will in turn promote
> > better design.
>
> You see this as a tool for general purpose code...?
>
> Typical kernel code tends to be quite pointer heavy.
Yes. I believe this fact contributes to the difficulty of ensuring
memory safety in the kernel. With so much code potentially depnding on
any other kernel data structure, it does not help much that you protect
it as a whole. A finer-grained protection would make more sense.
> > > Then you get memory safety, which seems to be what you're trying to
> > > achieve here.
> > >
> > > Or, you say this is for when performance isn't critical - why not a user
> > > mode helper?
> >
> > Processes in user mode are susceptible to all kinds of attacks you may
> > want to avoid. Sandbox mode can be used in situations where user mode
> > does not exist, e.g. to display a boot logo or to unpack initramfs.
>
> [citation needed]
I assume you mean citation for the kinds of attacks, not for the
unavailability of user space before initramfs is unpacked. ;-)
Here you go:
On Wed, 2023-03-22 at 15:27 -0700, Alexei Starovoitov wrote:
> On Wed, Mar 22, 2023 at 5:08 AM Roberto Sassu
[...]
> <[email protected]> wrote:
> > possible use case. The main goal is to move something that was running
> > in the kernel to user space, with the same isolation guarantees as if
> > the code was executed in the kernel.
>
> They are not the same guarantees.
> UMD is exactly equivalent to root process running in user space.
> Meaning it can be killed, ptraced, priority inverted, etc
https://lore.kernel.org/lkml/CAADnVQJC0h7rtuntt0tqS5BbxWsmyWs3ZSbboZMmUKetMG2VhA@mail.gmail.com/
Petr T
On Wed, 2024-02-14 at 15:19 -0500, Kent Overstreet wrote:
> On Wed, Feb 14, 2024 at 09:09:37PM +0100, Petr Tesařík wrote:
> > On Wed, 14 Feb 2024 13:54:54 -0500
> > Kent Overstreet <[email protected]> wrote:
> >
> > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > > > On Wed, 14 Feb 2024 16:11:05 +0100
> > > > Greg Kroah-Hartman <[email protected]> wrote:
> > > >
> > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > > > OK, so why didn't I send the whole thing?
> > > > > >
> > > > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > > > scripts. Some of them depend on this patch series. Before I go and
> > > > > > clean up my code into something that can be submitted, I want to get
> > > > > > feedback from guys like you, to know if the whole idea would be even
> > > > > > considered, aka "Fail Fast".
> > > > >
> > > > > We can't honestly consider this portion without seeing how it would
> > > > > work, as we don't even see a working implementation that uses it to
> > > > > verify it at all.
> > > > >
> > > > > The joy of adding new frameworks is that you need a user before anyone
> > > > > can spend the time to review it, sorry.
> > > >
> > > > Thank your for a quick assessment. Will it be sufficient if I send some
> > > > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > > > or do you need clean and nice kernel code?
> > >
> > > Given that code is going to need a rewrite to make use of this anyways -
> > > why not just do the rewrite in Rust?
> >
> > Thank you for this question! I concur that rewriting the whole kernel
> > in Rust would be a better option. I see two differences:
> >
> > 1. amount of work
> > 2. regressions
> >
> > Rewriting something in Rust means pretty much writing it from scratch.
> > Doing that necessarily introduces regressions. Old code has been used.
> > It has been tested. In many corner cases. Lots of bugs have been found,
> > and they’ve been fixed. If you write code from scratch, you lose much
> > of the accumulated knowledge.
>
> But it's work that already has some growing momentum behind it,
> especially in the area you cited - decompression algorithms.
>
> > More importantly, sandbox mode can be viewed as a tool that enforces
> > decomposition of kernel code. This decomposition is the main benefit.
> > It requires understanding the dependencies among different parts of the
> > kernel (both code flow and data flow), and that will in turn promote
> > better design.
>
> You see this as a tool for general purpose code...?
>
> Typical kernel code tends to be quite pointer heavy.
>
> > > Then you get memory safety, which seems to be what you're trying to
> > > achieve here.
> > >
> > > Or, you say this is for when performance isn't critical - why not a user
> > > mode helper?
> >
> > Processes in user mode are susceptible to all kinds of attacks you may
> > want to avoid. Sandbox mode can be used in situations where user mode
> > does not exist, e.g. to display a boot logo or to unpack initramfs.
>
> [citation needed]
>
> Running code in the kernel does not make it more secure from attack, and
> that's a pretty strange idea. One of the central jobs of the kernel is
> to provide isolation between different users.
+ linux-security-module, keyrings
It is not exactly about being more secure, but more privileged.
I had a question in the past:
https://lore.kernel.org/linux-integrity/[email protected]/
I basically need to parse PGP keys, to verify RPM package headers,
extract the file digests from them, and use those digests as reference
values for the kernel to decide whether or not files can be accessed
depending on their integrity.
It is very important that, in a locked-down system, even root is
subject to integrity checks. So, the kernel has higher privileges than
root.
Can the kernel be trusted to provide strong enough process isolation,
even against root, in a way that it can offload a workload to a user
space process (PGP parsing), and yet maintain its privilege level
(which would drop to root, if isolation cannot be guaranteed)?
So, since we got no as an answer, we thought about something in the
middle, we still run the code in the kernel, to keep the higher
privilege, but at the same time we mitigate the risk of kernel memory
corruption.
Roberto
On Wed, Feb 14, 2024 at 08:42:54PM +0100, Petr Tesařík wrote:
> On Wed, 14 Feb 2024 19:48:52 +0100
> Greg Kroah-Hartman <[email protected]> wrote:
>
> > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > > On Wed, 14 Feb 2024 16:11:05 +0100
> > > Greg Kroah-Hartman <[email protected]> wrote:
> > >
> > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > > OK, so why didn't I send the whole thing?
> > > > >
> > > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > > scripts. Some of them depend on this patch series. Before I go and
> > > > > clean up my code into something that can be submitted, I want to get
> > > > > feedback from guys like you, to know if the whole idea would be even
> > > > > considered, aka "Fail Fast".
> > > >
> > > > We can't honestly consider this portion without seeing how it would
> > > > work, as we don't even see a working implementation that uses it to
> > > > verify it at all.
> > > >
> > > > The joy of adding new frameworks is that you need a user before anyone
> > > > can spend the time to review it, sorry.
> > >
> > > Thank your for a quick assessment. Will it be sufficient if I send some
> > > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > > or do you need clean and nice kernel code?
> >
> > We need a real user in the kernel, otherwise why would we even consider
> > it? Would you want to review a new subsystem that does nothing and has
> > no real users? If not, why would you want us to? :)
>
> Greg, please enlighten me on the process. How is something like this
> supposed to get in?
If you were in our shoes, what would you want to see in order to be able
to properly review and judge if a new subsystem was ok to accept?
> Subsystem maintainers will not review code that depends on core features
> not yet reviewed by the respective maintainers. If I add only the API
> and a stub implementation, then it brings no benefit and attempts to
> introduce the API will be dismissed. I would certainly do just that if
> I was a maintainer...
Exactly, you need a real user.
> I could try to pack everything (base infrastructure, arch
> implementations, API users) into one big patch with pretty much
> everybody on the Cc list, but how is that ever going to get reviewed?
How are we supposed to know if any of this even works at all if you
don't show that it actually works and is useful? Has any of that work
even been done yet? I'm guessing it has (otherwise you wouldn't have
posted this), but you are expecting us to just "trust us, stuff in the
future is going to use this and need it" here.
Again, we can not add new infrastructure for things that have no users,
nor do you want us to. Ideally you will have at least 3 different
users, as that seems to be the "magic number" that shows that the
api/interface will actually work well, and is flexible enough. Just
one user is great for proof-of-concept, but that usually isn't good
enough to determine if it will work for others (and so it wouldn't need
to be infrastructure at all, but rather just part of that one feature on
its own.)
> Should I just go and maintain an out-of-tree repo for a few years,
> hoping that it gets merged one day, like bcachefs? Is this the way?
No, show us how this is going to be used.
Again, think about what you would want if you had to review this.
thanks,
greg k-h
On Thu, 15 Feb 2024 10:11:05 +0100
Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Feb 14, 2024 at 08:42:54PM +0100, Petr Tesařík wrote:
> > On Wed, 14 Feb 2024 19:48:52 +0100
> > Greg Kroah-Hartman <[email protected]> wrote:
> >
> > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote:
> > > > On Wed, 14 Feb 2024 16:11:05 +0100
> > > > Greg Kroah-Hartman <[email protected]> wrote:
> > > >
> > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote:
> > > > > > OK, so why didn't I send the whole thing?
> > > > > >
> > > > > > Decomposition of the kernel requires many more changes, e.g. in linker
> > > > > > scripts. Some of them depend on this patch series. Before I go and
> > > > > > clean up my code into something that can be submitted, I want to get
> > > > > > feedback from guys like you, to know if the whole idea would be even
> > > > > > considered, aka "Fail Fast".
> > > > >
> > > > > We can't honestly consider this portion without seeing how it would
> > > > > work, as we don't even see a working implementation that uses it to
> > > > > verify it at all.
> > > > >
> > > > > The joy of adding new frameworks is that you need a user before anyone
> > > > > can spend the time to review it, sorry.
> > > >
> > > > Thank your for a quick assessment. Will it be sufficient if I send some
> > > > code for illustration (with some quick&dirty hacks to bridge the gaps),
> > > > or do you need clean and nice kernel code?
> > >
> > > We need a real user in the kernel, otherwise why would we even consider
> > > it? Would you want to review a new subsystem that does nothing and has
> > > no real users? If not, why would you want us to? :)
> >
> > Greg, please enlighten me on the process. How is something like this
> > supposed to get in?
>
> If you were in our shoes, what would you want to see in order to be able
> to properly review and judge if a new subsystem was ok to accept?
>
> > Subsystem maintainers will not review code that depends on core features
> > not yet reviewed by the respective maintainers. If I add only the API
> > and a stub implementation, then it brings no benefit and attempts to
> > introduce the API will be dismissed. I would certainly do just that if
> > I was a maintainer...
>
> Exactly, you need a real user.
Er, what I was trying to say was rather: You need a real implementation
of a core feature before a subsystem maintainer considers using it for
their subsystem.
But I get your point. I need *both*.
> > I could try to pack everything (base infrastructure, arch
> > implementations, API users) into one big patch with pretty much
> > everybody on the Cc list, but how is that ever going to get reviewed?
>
> How are we supposed to know if any of this even works at all if you
> don't show that it actually works and is useful? Has any of that work
> even been done yet? I'm guessing it has (otherwise you wouldn't have
> posted this), but you are expecting us to just "trust us, stuff in the
> future is going to use this and need it" here.
Understood.
> Again, we can not add new infrastructure for things that have no users,
> nor do you want us to. Ideally you will have at least 3 different
> users, as that seems to be the "magic number" that shows that the
> api/interface will actually work well, and is flexible enough. Just
> one user is great for proof-of-concept, but that usually isn't good
> enough to determine if it will work for others (and so it wouldn't need
> to be infrastructure at all, but rather just part of that one feature on
> its own.)
>
> > Should I just go and maintain an out-of-tree repo for a few years,
> > hoping that it gets merged one day, like bcachefs? Is this the way?
>
> No, show us how this is going to be used.
OK, working on it.
> Again, think about what you would want if you had to review this.
Review, or merge? For a review, I would want enough information to
understand what it is *and* where it is going.
As a matter of fact, hpa does not like the x86 implementation. For
reasons that I do not fully understand (yet), but if the concept turns
out to be impractical, then my submission will serve a purpose, as I
can save myself (and anybody else with a similar idea) a lot of work by
failing fast.
Is this a valid way to get early feedback?
Thanks,
Petr T
On Thu, Feb 15, 2024 at 10:45:15AM +0100, Petr Tesařík wrote:
> As a matter of fact, hpa does not like the x86 implementation. For
> reasons that I do not fully understand (yet), but if the concept turns
> out to be impractical, then my submission will serve a purpose, as I
> can save myself (and anybody else with a similar idea) a lot of work by
> failing fast.
>
> Is this a valid way to get early feedback?
Asking for feedback is one thing (that's what RFC is for, right?), but
submitting something and expecting us to review and accept it as-is like
this, just doesn't work well.
thanks,
greg k-h