2024-03-11 15:04:09

by Vegard Nossum

[permalink] [raw]
Subject: [RFC PATCH 2/2] doc: distros: new document about assessing security vulnerabilities

On February 13, kernel.org became a CVE Numbering Authority (CNA):

https://www.cve.org/Media/News/item/news/2024/02/13/kernel-org-Added-as-CNA

The kernel.org CNA/CVE team does not provide any kind of assessment of
the allocated CVEs or patches. However, this is something that many
distributions want and need.

Provide a new document that can be used as a guide when assessing
vulnerabilities. The hope is to have a common point of reference that
can standardize or harmonize the process and hopefully enable more
cross-distribution collaboration when it comes to assessing bugfixes.

We deliberately emphasize the difficulty of assessing security impact
in the wide variety of configurations and deployments.

Since what most distros probably ultimately want is a type of CVSS score,
the guide is written with that in mind. CVSS provides its own "contextual"
modifiers, but these are not accurate or nuanced enough to capture the
wide variety of kernel configurations and deployments. We therefore focus
on practical evaluation under different sets of assumptions.

Create a new top-level (admittedly rather thin) "book" for information
for distros and place the document there as this document is not meant
for either developers or users.

See the rendered document at:

https://vegard.github.io/linux/2024-03-11/security-assessment.html

Cc: Kees Cook <[email protected]>
Cc: Konstantin Ryabitsev <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Lukas Bulwahn <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Lee Jones <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: John Haxby <[email protected]>
Cc: Marcus Meissner <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Roxana Bradescu <[email protected]>
Cc: Solar Designer <[email protected]>
Cc: Matt Wilson <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
---
Documentation/distros/index.rst | 8 +
Documentation/distros/security-assessment.rst | 496 ++++++++++++++++++
2 files changed, 504 insertions(+)
create mode 100644 Documentation/distros/index.rst
create mode 100644 Documentation/distros/security-assessment.rst

diff --git a/Documentation/distros/index.rst b/Documentation/distros/index.rst
new file mode 100644
index 000000000000..871f1e99042b
--- /dev/null
+++ b/Documentation/distros/index.rst
@@ -0,0 +1,8 @@
+===============================
+Documentation for distributions
+===============================
+
+.. toctree::
+ :maxdepth: 1
+
+ security-assessment
diff --git a/Documentation/distros/security-assessment.rst b/Documentation/distros/security-assessment.rst
new file mode 100644
index 000000000000..2e046c9f5f29
--- /dev/null
+++ b/Documentation/distros/security-assessment.rst
@@ -0,0 +1,496 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Assessing security vulnerabilities
+==================================
+
+:Author: Vegard Nossum <[email protected]>
+
+This document is intended for distributions and others who want to assess
+the severity of the bugs fixed by Linux kernel patches.
+
+We could consider *everything* a security issue until proven otherwise, or we
+could consider *nothing* a security issue until proven otherwise. Neither of
+these approaches are useful in practice. Security assessment is therefore
+partly an exercise in finding the middle ground. This guide attempts to make
+reasonable tradeoffs in order to maximize the utility of security assessments.
+
+Exploitability also depends highly on the context, so we make a distinction
+between a general worst-case assessment and context-specific assessments.
+
+This is NOT an exploitation guide.
+
+.. contents:: Table of contents
+ :local:
+ :backlinks: none
+
+
+What is a vulnerability?
+========================
+
+For the purposes of this document, we consider all bugfixes to be
+potential vulnerabilities. This is because, as stated in
+Documentation/process/cve.rst, whether a bug is exploitable or not
+depends highly on the context (threat model, platform/hardware,
+kernel configuration, boot parameters, runtime configuration,
+connected peripherals, etc.).
+
+The severity of a bugfix can thus range from being completely benign
+to enabling an attacker to gain complete control over a machine --
+these two conclusions can be reached for the exact same patch depending
+on context it is being considered in.
+
+Another aspect of this is that a bug may itself not be very serious, but
+serve as a step in a chain of exploits. A common example of this is when
+a kernel pointer is accidentally revealed to userspace, which gives an
+attacker the ability to bypass `KASLR protections`_.
+
+.. _KASLR protections: https://lwn.net/Articles/569635/
+
+Common assessment scenarios
+---------------------------
+
+In order to share common ground when discussing different attack
+scenarios, we define these four contexts which range from general
+to specific:
+
+1. **Worst-case scenario** with a minimum of mitigating factors and
+ assumptions: driver is enabled, in use, mitigations are disabled or
+ ineffective, etc.
+
+2. **Common configurations**: assuming kernel defaults, taking into
+ account hardware prevalence, etc.
+
+3. **Distro-specific configuration** and defaults: This assessment of a
+ bugfix takes into account a specific kernel configuration and the
+ distro's own assumptions about how the system is configured and
+ intended to be used.
+
+4. **Specific use case** for a single user or deployment: Here we can make
+ precise assumptions about what kernel features are in use and whether
+ physical access is possible.
+
+The most interesting scenarios are probably 1 and 3; the worst-case
+scenario is the best we can do for an assessment that is supposed to be
+valid for the entire kernel community, while individual distributions will
+probably want to take their specific kernel configuration and use cases
+into account.
+
+Latent vulnerabilities
+----------------------
+
+It is worth mentioning the possibility of latent vulnerabilities:
+These are code "defects" which technically cannot be exploited on any
+current hardware, configuration, or scenario, but which should nevertheless
+be fixed since they represent a possible future vulnerability if other
+parts of the code change.
+
+An example of latent vulnerabilities is the failure to check the return
+value of kmalloc() for small memory allocations: as of early 2024, these
+are `well-known to never fail in practice <https://lwn.net/Articles/627419/>`_
+and are thus not exploitable and not technically vulnerabilities. If this
+rule were to change (because of changes to the memory allocator), then these
+would become true vulnerabilities.
+
+We recommend that a "worst-case scenario" assessment don't consider latent
+vulnerabilities as actual vulnerabilities since this is a slippery slope
+where eventually all changes can be considered a vulnerability in some sense
+or another; in that case, we've thrown the baby out with the bath water and
+rendered assessment completely useless.
+
+Exploitability and unknowns
+---------------------------
+
+Beyond build and runtime configuration, there are many factors that play
+into whether a bug is exploitable in practice. Some of these are very
+subtle and it may be incredibly difficult to prove that a bug is, in fact,
+exploitable -- or reliably exploitable.
+
+Similarly to latent vulnerabilities, we need to be careful not to throw
+the baby out with the bath water: requiring a full exploit in order to
+deem a bug exploitable is a too high burden of proof and would discount
+many bugs that could be exploitable if enough time and energy was poured
+into exploring it fully.
+
+Many exploits rely on precise memory layouts of structs and things like
+slab merging (or more generally that certain memory allocation patterns are
+possible) or extremely tight races that look impossible to achieve.
+
+In the absence of other specific information, we recommend considering the
+bug from a "worst case" point of view, i.e. that it is indeed exploitable.
+
+
+Types of bugs
+=============
+
+There are many ways to classify types of bugs into broad categories. Two
+ways that we'll cover here are in terms of the outcome (i.e. what an
+attacker could do in the worst case) and in terms of the source defect.
+
+In terms of outcome:
+
+- **local DoS** (Denial of Service): a local user is able to crash the
+ machine or make it unusable in some way
+
+- **remote DoS**: another machine on the network is able to crash the
+ machine or make it unusable in some way
+
+- **local privilege escalation**: a local user is able to become root,
+ gain capabilities, or more generally become another user with more
+ privileges than the original user
+
+- **kernel code execution**: the attacker is able to run arbitrary code
+ in a kernel context; this is largely equivalent to a privilege escalation
+ to root
+
+- **information leak**: the attacker is able to obtain sensitive information
+ (secret keys, access to another user's memory or processes, etc.)
+
+- **kernel address leak**: a subset of information leaks; this can lead to
+ KASLR bypass, usually just one step as part of an exploit chain.
+
+In terms of source defect:
+
+- **use after free**: can be exploited by allocating another object in the
+ spot that was freed and partially or fully controlling the contents of
+ that object; some otherwise illegal values may let you "control" what the
+ kernel does (e.g. pointers to other objects or function pointers)
+
+- **heap buffer overflow**: similarly may allow you to overwrite values in
+ other objects or read values from unrelated objects
+
+- **stack buffer overflow**: may allow you to overwrite local variables or
+ return addresses on the stack; this is pretty well mitigated by the
+ compiler these days (CFI, stackprotector, etc.) and may not be as huge a
+ concern as it used to be
+
+- **double free**: this may result in general memory corruption, corruption
+ of structures used by the memory allocator, or further use after free
+
+- **memory leak**: typically results in DoS if you can make the machine
+ completely run out of memory, or potentially trigger some error handling
+ code paths
+
+- **reference counting bug**: can result in either a memory leak or a use
+ after free (with quite different outcomes in terms of severity)
+
+- **NULL pointer dereference**: this used to be a severe vulnerability
+ because `userspace could mmap() something at address 0 and trick the
+ kernel into dereferencing it <https://lwn.net/Articles/342330/>`_; this
+ is typically relatively harmless these days because the kernel does not
+ allow this anymore. Nowadays, a NULL pointer dereference typically just
+ results in the kernel killing the offending process; however, in some
+ cases the program that crashed may be holding on to various resources
+ (such as locks, references) which causes DoS or potentially `overflow
+ things like reference counts <https://lwn.net/Articles/914878/>`_.
+
+- **incorrect error handling**: not really a precise bug category by itself
+ but can often lead to a number of the other errors listed above (resource
+ leaks, double frees, etc.)
+
+- **not checking kmalloc() return values**: a subtype of error handling
+ bugs; it is widely believed that `"small" SLAB allocation requests cannot
+ fail <https://lwn.net/Articles/627419/>`_ and thus do not actually
+ represent a security vulnerability
+
+- **logic error**: this is a very general category of errors and potentially
+ includes such things as not checking or respecting permissions, bounds,
+ or other values supplied by userspace
+
+- **missing or incorrect bounds check**: a subtype of logic errors, this
+ often leads to out-of-bounds memory accesses
+
+- **races**: missing or incorrect use of locking, atomics, and memory
+ barriers can manifest in many different ways, including many of the
+ categories listed above.
+
+A useful rule of thumb is that anything that can cause invalid memory
+dereferences is a potential privilege escalation bug.
+
+These categories are not perfect and not exhaustive; it is meant to be a
+rough guide covering the most common cases. There is always room to treat
+special cases specially.
+
+
+Mitigations and hardening
+=========================
+
+The kernel comes with a number of options for hardening and mitigating
+the effects of certain types of attacks:
+
+- `Address-space layout randomization <https://lwn.net/Articles/569635/>`_
+- `Stack protection <https://lwn.net/Articles/584225/>`_
+- `Hardened usercopy <https://lwn.net/Articles/695991/>`_
+- `Reference count hardening <https://lwn.net/Articles/693038/>`_
+- `Bounds checking (fortify) <https://lwn.net/Articles/864521/>`_
+
+The purpose of this section is not to recommend certain options be enabled or
+disabled (for that, see the `Kernel Self Protection Project's recommended
+settings <http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings>`_),
+but to document the fact that many of these options can affect how a
+vulnerability is scored.
+
+
+CVEs and CVSS scores for the kernel
+===================================
+
+CVSS (`Common Vulnerability Scoring System <https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System>`_)
+is an open standard for vulnerability scoring and the system which is
+commonly used by Linux distributions and various industry and government
+bodies.
+
+We won't go into the details of CVSS here, except to give a guide on how
+it could be applied most effectively in the context of the kernel.
+
+**Attack Vector** (Local, Network, Physical):
+
+ The attack vector for most bugfixes will probably be **Local**, i.e.
+ you need a local user account (or more generally user access, e.g. a
+ shell account) in order to trigger the buggy code.
+
+ If a networking protocol or device is involved, the attack
+ vector may be **Network**. However, many bugs in networking code may
+ actually only be locally exploitable, for example bugs that would
+ trigger when passing invalid values to a socket-related system call
+ (e.g. setsockopt()) or when making system calls in a specific sequence.
+
+ The attack vector **Physical** is used when physical access
+ to a machine is required, for example USB device driver bugs that can
+ only be exploited by physically inserting a device into a USB port.
+
+**Attack Complexity** (Low, High):
+
+ This metric represents roughly how difficult it would be to work around
+ security measures like ASLR and how much an exploit would need to be
+ tailored to a specific target in order to be successful.
+
+ As a rule of thumb, the less severe outcomes like DoS or information
+ leaks often have complexity **Low**, while outcomes like privilege
+ escalations and kernel code execution often have complexity **High**.
+
+**Privileges Required** (None, Low, High):
+
+ This metric refers to what kind of privileges the attacker must have in
+ order to exploit the bug.
+
+ We propose that unauthenticated remote attacks have the value **None**;
+ if you can trigger the bug as a local user without any specific
+ additional privileges this would be **Low**, and if additional privileges
+ are required for the user (such as e.g. ``CAP_NET_RAW``) then this would
+ be **High**.
+
+**User Interaction** (None, Required):
+
+ This will usually have the value **None** unless a successful attack
+ requires interaction from another, legitimate user. In that case the
+ value will be **Required**; this could be the case e.g. for filesystem
+ bugs that require the user to run a command (in this case ``mount``)
+ in order to trigger the bug.
+
+**Scope** (Changed, Unchanged):
+
+ For bugs where privilege escalation or kernel code execution is a
+ possible outcome, this will typically be **Changed** (since the kernel
+ has access to the whole system), whereas for outcomes like DoS the
+ scope will be **Unchanged**. Information leaks can have either value
+ depending on whether the information pertains to the original user
+ (**Unchanged**) or other users (**Changed**).
+
+**Confidentiality** (None, Low, High):
+
+ For privilege escalation or kernel code execution bugs, this will
+ typically be **High**; for information leaks this will be **Low**,
+ and for DoS this will be **None**.
+
+**Integrity** (None, Low, High):
+
+ For privilege escalation or kernel code execution bugs, this will
+ typically be **High**; for information leaks this will be **Low**,
+ and for DoS this will be **None**.
+
+**Availability** (None, Low, High):
+
+ For information leaks this will be **None**, whereas almost all other
+ outcomes have a **High** availability impact (a DoS is a loss of
+ availability by definition, whereas privilege escalations typically
+ allows the attacker to reduce the availability as well).
+
+The temporal metrics are optional, but may be useful for kernel patches:
+
+**Exploit Code Maturity**:
+
+ **Unreported** where no reproducer of any kind is known (suitable e.g.
+ for fixes based on reports from static checkers) or **POC** if the issue
+ has been demonstrated
+
+**Remediation Level**:
+
+ This will typically always be **Official Fix** since kernel.org CVEs are
+ only assigned once a patch has been published.
+
+**Report Confidence**:
+
+ This will be **Unknown** for theoretical issues or issues reported by
+ static checkers or **Reasonable** for issues that have been triggered
+ by the reporter or author of the patch (as indicated perhaps by a
+ stack trace or other error message reproduced in the changelog for the
+ patch).
+
+To calculate a final CVSS score (value from 0 to 10), use a calculator
+such as `<https://www.first.org/cvss/calculator/>`_ (also includes detailed
+explanations of each metric and its possible values).
+
+A distro may wish to start by checking whether the file(s) being patched
+are even compiled into their kernel; if not, congrats! You're not vulnerable
+and don't really need to carry out a more detailed analysis.
+
+For things like loadable modules (e.g. device drivers for obscure hardware)
+and runtime parameters you might have a large segment of users who are not
+vulnerable by default.
+
+
+Reachability analysis
+=====================
+
+One of the most frequent tasks for evaluating a security issue will be to
+figure out how the buggy code can be triggered. Usually this will mean
+starting with the function(s) being patched and working backwards through
+callers to figure out where the code is ultimately called from. Sometimes
+this will be a system call, but may also be timer callbacks, workqueue
+items, interrupt handlers, etc. Tools like `cscope <https://en.wikipedia.org/wiki/Cscope>`_
+(or just plain ``git grep``) can be used to help untangle these callchains.
+
+Special care will need to be taken to track function calls made through
+function pointers, especially those stored somewhere on the heap (e.g.
+timer callbacks or pointers to "ops" structs like ``struct super_operations``).
+
+While unnesting the call stack, it may be useful to keep track of any
+conditions that need to be satisfied in order to reach the buggy code,
+perhaps especially calls to ``capable()`` and other capabilities-related
+functions, which may place restrictions on what sort of privileges you
+need to reach the code.
+
+User namespaces
+---------------
+
+By design, `user namespaces <https://lwn.net/Articles/528078/>`_ allow
+non-root processes to behave as if they are root for an isolated part
+of the system. Some important things to be aware of in this respect are:
+
+- User namespaces (in combination with mount namespaces) allow a
+ regular user to mount certain filesystems (proc, sys, and others);
+ see https://man7.org/linux/man-pages/man7/user_namespaces.7.html
+ for more information.
+
+- User namespaces (perhaps in combination with network namespaces)
+ allow a regular user to create sockets for network protocols they
+ would otherwise not be able to access; see
+ https://lwn.net/Articles/740455/ for more information.
+
+- ``capable()`` checks capabilities in the initial user namespace,
+ whereas ``ns_capable()`` checks capabilities in the current user
+ namespace.
+
+
+Examples
+========
+
+In the following examples, we give scores from a "worst case" context,
+i.e. assuming the hardware/platform is in use, the driver is compiled
+in, mitigations are disabled or otherwise ineffective, etc.
+
+**Commit 72d9b9747e78 ("ACPI: extlog: fix NULL pointer dereference check")**:
+
+ The first thing to notice is that the code here is in an ``__exit``
+ function, meaning that it can only run when the module is unloaded
+ (the ``mod->exit()`` call in the delete_module() system call) --
+ inspecting this function reveals that it is restricted to processes
+ with the ``CAP_SYS_MODULE`` capability, meaning you already need
+ quite high privileges to trigger the bug.
+
+ The bug itself is that a pointer is dereferenced before it has been
+ checked to be non-NULL. Without deeper analysis we can't really know
+ whether it is even possible for the pointer to be NULL at this point,
+ although the presence of a check is a good indication that it may be.
+ By grepping for ``extlog_l1_addr``, we see that it is assigned in the
+ corresponding module_init() function and moreover that the only way
+ it can be NULL is if the module failed to load in the first place.
+ Since module_exit() functions are not called on module_init() failure
+ we can conclude that this is not a vulnerability.
+
+**Commit 27e56f59bab5 ("UBSAN: array-index-out-of-bounds in dtSplitRoot")**:
+
+ Right away we notice that this is a filesystem bug in jfs. There is a
+ stack trace showing that the code is coming from the mkdirat() system
+ call. This means you can likely trigger this as a regular user, given
+ that a suitable jfs filesystem has been mounted. Since this is a bug
+ found by syzkaller, we can follow the link in the changelog and find
+ the reproducer. By looking at the reproducer we can see that it almost
+ certainly mounts a corrupted filesystem image.
+
+ When filesystems are involved, the most common scenario is probably
+ when a user has privileges to mount filesystem images in the context
+ of a desktop environment that allows the logged-in user to mount
+ attached USB drives, for example. In this case, physical access would
+ also be necessary, which would make this Attack Vector **Physical**
+ and User Interaction **Required**.
+
+ Another scenario is where a malicious filesystem image is passed to a
+ legitimate user who then unwittingly mounts it and runs
+ mkdir()/mkdirat() to trigger the bug. This would clearly be User
+ Interaction **Required**, but it's not so clear what the Attack Vector
+ would be -- let's call it **Physical**, which is the least severe of
+ the options given to us by CVSS, even though it's not a true physical
+ attack.
+
+ This is an out-of-bounds memory access, so without doing a much deeper
+ analysis we should assume it could potentially lead to privilege
+ escalation, so Scope **Changed**, Confidentiality **High**, Integrity
+ **High**, and Availability **High**.
+
+ Since regular users can't normally mount arbitrary filesystems, we can
+ set Attack Complexity **High** and Privileges **Required**.
+
+ If we also set Exploit Code Maturity **Unproven**, we end up with the
+ following CVSSv3.1 vector:
+
+ - CVSS:3.1/AV:P/AC:H/PR:H/UI:R/S:C/C:H/I:H/A:H/E:U (6.2 - Medium)
+
+ If this score seems high, keep in mind that this is a worst case
+ scenario. In a more specific scenario, jfs might be disabled in the
+ kernel config or there is no way for non-root users to mount any
+ filesystem.
+
+**Commit b988b1bb0053 ("KVM: s390: fix setting of fpc register")**:
+
+ From the changelog: "corruption of the fpc register of the host process"
+ and "the host process will incorrectly continue to run with the value
+ that was supposed to be used for a guest cpu".
+
+ This makes it clear that a guest can partially take control of the
+ host process (presumably the host process running the KVM), which would
+ be a privilege escalation of sorts -- however, since this is corruption
+ of floating-point registers and not a memory error, it is highly
+ unlikely to be exploitable beyond DoS in practice (even then, it is
+ questionable whether the DoS impacts anything beyond the KVM process
+ itself).
+
+ Because an attack would be difficult to pull off, we propose Attack
+ Complexity **High**, and because there isn't a clear or likely path to
+ anything beyond DoS, we'll select Confidentiality **None**, Integrity
+ **Low** and Availability **Low**.
+
+ We suggest the following CVSSv3.1 vector:
+
+ - CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:U/C:N/I:L/A:L/E:U (3.7 - Low)
+
+
+Futher reading
+==============
+
+Different vendors have other/different rating and classification systems
+for vulnerabilities and severities:
+
+- Microsoft: https://www.microsoft.com/en-us/msrc/security-update-severity-rating-system
+- Red Hat: https://access.redhat.com/security/updates/classification
+- Google: https://cloud.google.com/security-command-center/docs/finding-severity-classifications#severity_classifications
--
2.34.1



2024-03-11 18:00:34

by Matt Wilson

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] doc: distros: new document about assessing security vulnerabilities

On Mon, Mar 11, 2024 at 04:00:54PM +0100, Vegard Nossum wrote:
> On February 13, kernel.org became a CVE Numbering Authority (CNA):
>
> https://www.cve.org/Media/News/item/news/2024/02/13/kernel-org-Added-as-CNA
>
> The kernel.org CNA/CVE team does not provide any kind of assessment of
> the allocated CVEs or patches. However, this is something that many
> distributions want and need.
>
> Provide a new document that can be used as a guide when assessing
> vulnerabilities. The hope is to have a common point of reference that
> can standardize or harmonize the process and hopefully enable more
> cross-distribution collaboration when it comes to assessing bugfixes.
>
> We deliberately emphasize the difficulty of assessing security impact
> in the wide variety of configurations and deployments.
>
> Since what most distros probably ultimately want is a type of CVSS score,
> the guide is written with that in mind. CVSS provides its own "contextual"
> modifiers, but these are not accurate or nuanced enough to capture the
> wide variety of kernel configurations and deployments. We therefore focus
> on practical evaluation under different sets of assumptions.

(sending from my [email protected] account to emphasize that I am speaking
only for myself, not my current employer.)

I'm not sure that Linux distributions particularly *want* a CVSS base
score for kernel CVEs. It is something that downstream _users_ of
software have come to expect, especially those that operate under
compliance regimes that suggest or require the use of CVSS in an
enterprise's vulnerability management function.

Those compliance regimes often suggest using CVSS scores as found in
the NVD in search of an objective third party assessment of a
vulnerability. Unfortunately the text of these regulations suggests
that the base scores generated by the CVSS system, and found in the
NVD, are a measure of "risk" rather than a contextless measure of
"impact".

There have been occurrences where a CVSSv3.1 score produced by a
vendor of software are ignored when the score in the NVD is higher
(often 9.8 due to NIST's standard practice in producing CVSS scores
from "Incomplete Data" [1]). I don't know that harmonizing the
practice of producing CVSSv3.1 base scores across Linux vendors will
address the problem unless scores that are made available in the NVD
match.

But, stepping back for a moment I want to make sure that we are
putting energy into a system that is fit for the Linux community's
needs. CVSS lacks a strong scientific and statistical basis as an
information capture and conveyance system. A study of the distribution
of CVSSv3.1 base scores historically generated [2] shows that while
the system was designed to resemble a normal distribution, in practice
it is anything but.

A guide that helps a practitioner evaluate the legitimate risks that
may be present in a given version, configuration, and use case for the
Linux kernel could be a very helpful thing. This guide is an excellent
start for one! But as you rightly call out, CVSS is not a system that
has an ability to capture all the nuance and context of software the
likes of the Linux kernel, therefore the focus should be on practical
evaluation under common use cases.

> Create a new top-level (admittedly rather thin) "book" for information
> for distros and place the document there as this document is not meant
> for either developers or users.
>
> See the rendered document at:
>
> https://vegard.github.io/linux/2024-03-11/security-assessment.html

[...]

> +
> +CVEs and CVSS scores for the kernel
> +===================================
> +
> +CVSS (`Common Vulnerability Scoring System <https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System>`_)
> +is an open standard for vulnerability scoring and the system which is
> +commonly used by Linux distributions and various industry and government
> +bodies.
> +
> +We won't go into the details of CVSS here, except to give a guide on how
> +it could be applied most effectively in the context of the kernel.

If the guide has something to say about CVSS, I (speaking only for
myself) would like for it to call out the hazards that the system
presents. I am not convinced that CVSS can be applied effectively in
the context of the kernel, and would rather this section call out all
the reasons why it's a fool's errand to try.

--msw

[1] https://nvd.nist.gov/vuln-metrics/cvss
[2] https://theoryof.predictable.software/articles/a-closer-look-at-cvss-scores/#the-distribution-of-actual-scores

2024-03-12 22:59:20

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] doc: distros: new document about assessing security vulnerabilities

On Mon, Mar 11, 2024 at 04:00:54PM +0100, Vegard Nossum wrote:
> On February 13, kernel.org became a CVE Numbering Authority (CNA):
>
> https://www.cve.org/Media/News/item/news/2024/02/13/kernel-org-Added-as-CNA
>
> The kernel.org CNA/CVE team does not provide any kind of assessment of
> the allocated CVEs or patches. However, this is something that many
> distributions want and need.
>
> Provide a new document that can be used as a guide when assessing
> vulnerabilities. The hope is to have a common point of reference that
> can standardize or harmonize the process and hopefully enable more
> cross-distribution collaboration when it comes to assessing bugfixes.
>
> We deliberately emphasize the difficulty of assessing security impact
> in the wide variety of configurations and deployments.
>
> Since what most distros probably ultimately want is a type of CVSS score,
> the guide is written with that in mind. CVSS provides its own "contextual"
> modifiers, but these are not accurate or nuanced enough to capture the
> wide variety of kernel configurations and deployments. We therefore focus
> on practical evaluation under different sets of assumptions.
>
> Create a new top-level (admittedly rather thin) "book" for information
> for distros and place the document there as this document is not meant
> for either developers or users.
>
> See the rendered document at:
>
> https://vegard.github.io/linux/2024-03-11/security-assessment.html
>
> Cc: Kees Cook <[email protected]>
> Cc: Konstantin Ryabitsev <[email protected]>
> Cc: Krzysztof Kozlowski <[email protected]>
> Cc: Lukas Bulwahn <[email protected]>
> Cc: Sasha Levin <[email protected]>
> Cc: Lee Jones <[email protected]>
> Cc: Pavel Machek <[email protected]>
> Cc: John Haxby <[email protected]>
> Cc: Marcus Meissner <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Roxana Bradescu <[email protected]>
> Cc: Solar Designer <[email protected]>
> Cc: Matt Wilson <[email protected]>
> Signed-off-by: Vegard Nossum <[email protected]>
> ---
> Documentation/distros/index.rst | 8 +
> Documentation/distros/security-assessment.rst | 496 ++++++++++++++++++
> 2 files changed, 504 insertions(+)
> create mode 100644 Documentation/distros/index.rst
> create mode 100644 Documentation/distros/security-assessment.rst
>
> diff --git a/Documentation/distros/index.rst b/Documentation/distros/index.rst
> new file mode 100644
> index 000000000000..871f1e99042b
> --- /dev/null
> +++ b/Documentation/distros/index.rst
> @@ -0,0 +1,8 @@
> +===============================
> +Documentation for distributions
> +===============================
> +
> +.. toctree::
> + :maxdepth: 1
> +
> + security-assessment
> diff --git a/Documentation/distros/security-assessment.rst b/Documentation/distros/security-assessment.rst
> new file mode 100644
> index 000000000000..2e046c9f5f29
> --- /dev/null
> +++ b/Documentation/distros/security-assessment.rst
> @@ -0,0 +1,496 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================================
> +Assessing security vulnerabilities
> +==================================
> +
> +:Author: Vegard Nossum <[email protected]>
> +
> +This document is intended for distributions and others who want to assess
> +the severity of the bugs fixed by Linux kernel patches.

Perhaps add, "... when it is infeasible to track a stable Linux
release."

> +We could consider *everything* a security issue until proven otherwise, or we

Who is "we" here (and through-out)?

> +could consider *nothing* a security issue until proven otherwise. Neither of
> +these approaches are useful in practice. Security assessment is therefore
> +partly an exercise in finding the middle ground. This guide attempts to make
> +reasonable tradeoffs in order to maximize the utility of security assessments.
> +
> +Exploitability also depends highly on the context, so we make a distinction
> +between a general worst-case assessment and context-specific assessments.
> +
> +This is NOT an exploitation guide.
> +
> +.. contents:: Table of contents
> + :local:
> + :backlinks: none
> +
> +
> +What is a vulnerability?
> +========================
> +
> +For the purposes of this document, we consider all bugfixes to be
> +potential vulnerabilities. This is because, as stated in

The CVE definition makes a distinction here, instead calling a
software flaw with security considerations a "weakness" rather than
"vulnerability". I find "weakness" more in line with people's thinking
about attack chains.

> +Documentation/process/cve.rst, whether a bug is exploitable or not
> +depends highly on the context (threat model, platform/hardware,
> +kernel configuration, boot parameters, runtime configuration,
> +connected peripherals, etc.).

Exploitability is an even higher bar, and tends to be unable to
disprove.

> +The severity of a bugfix can thus range from being completely benign
> +to enabling an attacker to gain complete control over a machine --
> +these two conclusions can be reached for the exact same patch depending
> +on context it is being considered in.

Yes, very much. :)

> +Another aspect of this is that a bug may itself not be very serious, but
> +serve as a step in a chain of exploits. A common example of this is when
> +a kernel pointer is accidentally revealed to userspace, which gives an
> +attacker the ability to bypass `KASLR protections`_.
> +
> +.. _KASLR protections: https://lwn.net/Articles/569635/
> +
> +Common assessment scenarios
> +---------------------------
> +
> +In order to share common ground when discussing different attack
> +scenarios, we define these four contexts which range from general
> +to specific:
> +
> +1. **Worst-case scenario** with a minimum of mitigating factors and
> + assumptions: driver is enabled, in use, mitigations are disabled or
> + ineffective, etc.
> +
> +2. **Common configurations**: assuming kernel defaults, taking into
> + account hardware prevalence, etc.

I'm not sure I'd call this "Common", I'd say "Kernel default configuration"

> +3. **Distro-specific configuration** and defaults: This assessment of a
> + bugfix takes into account a specific kernel configuration and the
> + distro's own assumptions about how the system is configured and
> + intended to be used.

And this just "Distro default configuration".

> +4. **Specific use case** for a single user or deployment: Here we can make
> + precise assumptions about what kernel features are in use and whether
> + physical access is possible.

i.e. a configuration that differs from distro default.

> +The most interesting scenarios are probably 1 and 3; the worst-case
> +scenario is the best we can do for an assessment that is supposed to be
> +valid for the entire kernel community, while individual distributions will
> +probably want to take their specific kernel configuration and use cases
> +into account.
> +
> +Latent vulnerabilities
> +----------------------
> +
> +It is worth mentioning the possibility of latent vulnerabilities:
> +These are code "defects" which technically cannot be exploited on any
> +current hardware, configuration, or scenario, but which should nevertheless
> +be fixed since they represent a possible future vulnerability if other
> +parts of the code change.

I take pedantic issue with "cannot be exploited". Again, "exploit" is a
high bar.

Also, why should hardware limit this? If a "latent vulnerability"
becomes part of an attack chain on some future hardware, and we saw it
was a weakness at the time it landed it stable, it should have gotten
a CVE then, yes?

> +
> +An example of latent vulnerabilities is the failure to check the return
> +value of kmalloc() for small memory allocations: as of early 2024, these
> +are `well-known to never fail in practice <https://lwn.net/Articles/627419/>`_
> +and are thus not exploitable and not technically vulnerabilities. If this
> +rule were to change (because of changes to the memory allocator), then these
> +would become true vulnerabilities.

But for kernel prior to that, it IS an issue, yes? And what does "in
practice" mean? Does that include a system under attack that is being
actively manipulated?

> +We recommend that a "worst-case scenario" assessment don't consider latent
> +vulnerabilities as actual vulnerabilities since this is a slippery slope

I wouldn't use the language "actual", but rather reword this from the
perspective of severity. Triage of severity is what is at issue, yes?

> +where eventually all changes can be considered a vulnerability in some sense
> +or another; in that case, we've thrown the baby out with the bath water and
> +rendered assessment completely useless.

I don't find this to be true at all. Distro triage of kernel bug fixes
isn't binary: it'll always be severity based. Many will be 0, yes, but
it is up to the specific deployment to figure out where their cut line
is if they're not just taking all fixes.

> +
> +Exploitability and unknowns
> +---------------------------
> +
> +Beyond build and runtime configuration, there are many factors that play
> +into whether a bug is exploitable in practice. Some of these are very
> +subtle and it may be incredibly difficult to prove that a bug is, in fact,
> +exploitable -- or reliably exploitable.
> +
> +Similarly to latent vulnerabilities, we need to be careful not to throw
> +the baby out with the bath water: requiring a full exploit in order to
> +deem a bug exploitable is a too high burden of proof and would discount
> +many bugs that could be exploitable if enough time and energy was poured
> +into exploring it fully.
> +
> +Many exploits rely on precise memory layouts of structs and things like
> +slab merging (or more generally that certain memory allocation patterns are
> +possible) or extremely tight races that look impossible to achieve.
> +
> +In the absence of other specific information, we recommend considering the
> +bug from a "worst case" point of view, i.e. that it is indeed exploitable.

I agree with this section. :)

> +
> +
> +Types of bugs
> +=============
> +
> +There are many ways to classify types of bugs into broad categories. Two
> +ways that we'll cover here are in terms of the outcome (i.e. what an
> +attacker could do in the worst case) and in terms of the source defect.

Before breaking this down into examples, I would start with a summary of
the more basic security impact categories: Confidentiality, Integrity,
and Availability, as mapping example back to these can be useful in
understanding what a bug is, or can be expanded to.

> +
> +In terms of outcome:
> +
> +- **local DoS** (Denial of Service): a local user is able to crash the
> + machine or make it unusable in some way
> +
> +- **remote DoS**: another machine on the network is able to crash the
> + machine or make it unusable in some way
> +
> +- **local privilege escalation**: a local user is able to become root,
> + gain capabilities, or more generally become another user with more
> + privileges than the original user
> +
> +- **kernel code execution**: the attacker is able to run arbitrary code
> + in a kernel context; this is largely equivalent to a privilege escalation
> + to root

Yes, uid 0 and kernel context are distinct. I don't think I'd say
"largely equivalent" though. Perhaps "Note that root access in many
configurations is equivalent to kernel code execution".

> +- **information leak**: the attacker is able to obtain sensitive information

Instead of "leak", please use the less ambiguous word for this, which is
"exposure". The word "leak" is often confused with resource leaks. This
is especially true for language like "memory leak" (... is this content
exposure or resource drain?)

> + (secret keys, access to another user's memory or processes, etc.)
> +
> +- **kernel address leak**: a subset of information leaks; this can lead to
> + KASLR bypass, usually just one step as part of an exploit chain.

Again, "exposure".

> +
> +In terms of source defect:

These are also very specific. Perhaps a summary of higher level issues:
Spatial safety, temporal safety, arithmetic safety, logic errors, etc.

> +- **use after free**: can be exploited by allocating another object in the
> + spot that was freed and partially or fully controlling the contents of
> + that object; some otherwise illegal values may let you "control" what the
> + kernel does (e.g. pointers to other objects or function pointers)
> +
> +- **heap buffer overflow**: similarly may allow you to overwrite values in
> + other objects or read values from unrelated objects
> +
> +- **stack buffer overflow**: may allow you to overwrite local variables or
> + return addresses on the stack; this is pretty well mitigated by the
> + compiler these days (CFI, stackprotector, etc.) and may not be as huge a
> + concern as it used to be
> +
> +- **double free**: this may result in general memory corruption, corruption
> + of structures used by the memory allocator, or further use after free
> +
> +- **memory leak**: typically results in DoS if you can make the machine
> + completely run out of memory, or potentially trigger some error handling
> + code paths
> +
> +- **reference counting bug**: can result in either a memory leak or a use
> + after free (with quite different outcomes in terms of severity)
> +
> +- **NULL pointer dereference**: this used to be a severe vulnerability
> + because `userspace could mmap() something at address 0 and trick the
> + kernel into dereferencing it <https://lwn.net/Articles/342330/>`_; this
> + is typically relatively harmless these days because the kernel does not
> + allow this anymore. Nowadays, a NULL pointer dereference typically just
> + results in the kernel killing the offending process; however, in some
> + cases the program that crashed may be holding on to various resources
> + (such as locks, references) which causes DoS or potentially `overflow
> + things like reference counts <https://lwn.net/Articles/914878/>`_.
> +
> +- **incorrect error handling**: not really a precise bug category by itself
> + but can often lead to a number of the other errors listed above (resource
> + leaks, double frees, etc.)
> +
> +- **not checking kmalloc() return values**: a subtype of error handling
> + bugs; it is widely believed that `"small" SLAB allocation requests cannot
> + fail <https://lwn.net/Articles/627419/>`_ and thus do not actually
> + represent a security vulnerability
> +
> +- **logic error**: this is a very general category of errors and potentially
> + includes such things as not checking or respecting permissions, bounds,
> + or other values supplied by userspace
> +
> +- **missing or incorrect bounds check**: a subtype of logic errors, this
> + often leads to out-of-bounds memory accesses
> +
> +- **races**: missing or incorrect use of locking, atomics, and memory
> + barriers can manifest in many different ways, including many of the
> + categories listed above.
> +
> +A useful rule of thumb is that anything that can cause invalid memory
> +dereferences is a potential privilege escalation bug.

Even an "unexpected" dereference. :)

> +
> +These categories are not perfect and not exhaustive; it is meant to be a
> +rough guide covering the most common cases. There is always room to treat
> +special cases specially.
> +
> +
> +Mitigations and hardening
> +=========================
> +
> +The kernel comes with a number of options for hardening and mitigating
> +the effects of certain types of attacks:
> +
> +- `Address-space layout randomization <https://lwn.net/Articles/569635/>`_
> +- `Stack protection <https://lwn.net/Articles/584225/>`_
> +- `Hardened usercopy <https://lwn.net/Articles/695991/>`_
> +- `Reference count hardening <https://lwn.net/Articles/693038/>`_
> +- `Bounds checking (fortify) <https://lwn.net/Articles/864521/>`_
> +
> +The purpose of this section is not to recommend certain options be enabled or
> +disabled (for that, see the `Kernel Self Protection Project's recommended
> +settings <http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings>`_),
> +but to document the fact that many of these options can affect how a
> +vulnerability is scored.
> +
> +
> +CVEs and CVSS scores for the kernel
> +===================================
> +
> +CVSS (`Common Vulnerability Scoring System <https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System>`_)
> +is an open standard for vulnerability scoring and the system which is
> +commonly used by Linux distributions and various industry and government
> +bodies.
> +
> +We won't go into the details of CVSS here, except to give a guide on how
> +it could be applied most effectively in the context of the kernel.
> +
> +**Attack Vector** (Local, Network, Physical):
> +
> + The attack vector for most bugfixes will probably be **Local**, i.e.
> + you need a local user account (or more generally user access, e.g. a
> + shell account) in order to trigger the buggy code.
> +
> + If a networking protocol or device is involved, the attack
> + vector may be **Network**. However, many bugs in networking code may
> + actually only be locally exploitable, for example bugs that would
> + trigger when passing invalid values to a socket-related system call
> + (e.g. setsockopt()) or when making system calls in a specific sequence.
> +
> + The attack vector **Physical** is used when physical access
> + to a machine is required, for example USB device driver bugs that can
> + only be exploited by physically inserting a device into a USB port.
> +
> +**Attack Complexity** (Low, High):
> +
> + This metric represents roughly how difficult it would be to work around
> + security measures like ASLR and how much an exploit would need to be
> + tailored to a specific target in order to be successful.
> +
> + As a rule of thumb, the less severe outcomes like DoS or information
> + leaks often have complexity **Low**, while outcomes like privilege
> + escalations and kernel code execution often have complexity **High**.
> +
> +**Privileges Required** (None, Low, High):
> +
> + This metric refers to what kind of privileges the attacker must have in
> + order to exploit the bug.
> +
> + We propose that unauthenticated remote attacks have the value **None**;
> + if you can trigger the bug as a local user without any specific
> + additional privileges this would be **Low**, and if additional privileges
> + are required for the user (such as e.g. ``CAP_NET_RAW``) then this would
> + be **High**.
> +
> +**User Interaction** (None, Required):
> +
> + This will usually have the value **None** unless a successful attack
> + requires interaction from another, legitimate user. In that case the
> + value will be **Required**; this could be the case e.g. for filesystem
> + bugs that require the user to run a command (in this case ``mount``)
> + in order to trigger the bug.
> +
> +**Scope** (Changed, Unchanged):
> +
> + For bugs where privilege escalation or kernel code execution is a
> + possible outcome, this will typically be **Changed** (since the kernel
> + has access to the whole system), whereas for outcomes like DoS the
> + scope will be **Unchanged**. Information leaks can have either value
> + depending on whether the information pertains to the original user
> + (**Unchanged**) or other users (**Changed**).
> +
> +**Confidentiality** (None, Low, High):
> +
> + For privilege escalation or kernel code execution bugs, this will
> + typically be **High**; for information leaks this will be **Low**,
> + and for DoS this will be **None**.
> +
> +**Integrity** (None, Low, High):
> +
> + For privilege escalation or kernel code execution bugs, this will
> + typically be **High**; for information leaks this will be **Low**,
> + and for DoS this will be **None**.
> +
> +**Availability** (None, Low, High):
> +
> + For information leaks this will be **None**, whereas almost all other
> + outcomes have a **High** availability impact (a DoS is a loss of
> + availability by definition, whereas privilege escalations typically
> + allows the attacker to reduce the availability as well).
> +
> +The temporal metrics are optional, but may be useful for kernel patches:
> +
> +**Exploit Code Maturity**:
> +
> + **Unreported** where no reproducer of any kind is known (suitable e.g.
> + for fixes based on reports from static checkers) or **POC** if the issue
> + has been demonstrated
> +
> +**Remediation Level**:
> +
> + This will typically always be **Official Fix** since kernel.org CVEs are
> + only assigned once a patch has been published.
> +
> +**Report Confidence**:
> +
> + This will be **Unknown** for theoretical issues or issues reported by
> + static checkers or **Reasonable** for issues that have been triggered
> + by the reporter or author of the patch (as indicated perhaps by a
> + stack trace or other error message reproduced in the changelog for the
> + patch).
> +
> +To calculate a final CVSS score (value from 0 to 10), use a calculator
> +such as `<https://www.first.org/cvss/calculator/>`_ (also includes detailed
> +explanations of each metric and its possible values).

Why not NIST's website directly?

> +A distro may wish to start by checking whether the file(s) being patched
> +are even compiled into their kernel; if not, congrats! You're not vulnerable
> +and don't really need to carry out a more detailed analysis.
> +
> +For things like loadable modules (e.g. device drivers for obscure hardware)
> +and runtime parameters you might have a large segment of users who are not
> +vulnerable by default.

These 2 paragraphs seem more suited for the Reachability section?

> +Reachability analysis
> +=====================
> +
> +One of the most frequent tasks for evaluating a security issue will be to
> +figure out how the buggy code can be triggered. Usually this will mean
> +starting with the function(s) being patched and working backwards through
> +callers to figure out where the code is ultimately called from. Sometimes
> +this will be a system call, but may also be timer callbacks, workqueue
> +items, interrupt handlers, etc. Tools like `cscope <https://en.wikipedia.org/wiki/Cscope>`_
> +(or just plain ``git grep``) can be used to help untangle these callchains.


Before even this, is just simply looking at whether it was built,
whether it was shipped, if a CONFIG exposed the feature, etc.

> +Special care will need to be taken to track function calls made through
> +function pointers, especially those stored somewhere on the heap (e.g.
> +timer callbacks or pointers to "ops" structs like ``struct super_operations``).
> +
> +While unnesting the call stack, it may be useful to keep track of any
> +conditions that need to be satisfied in order to reach the buggy code,
> +perhaps especially calls to ``capable()`` and other capabilities-related
> +functions, which may place restrictions on what sort of privileges you
> +need to reach the code.
> +
> +User namespaces
> +---------------
> +
> +By design, `user namespaces <https://lwn.net/Articles/528078/>`_ allow
> +non-root processes to behave as if they are root for an isolated part
> +of the system. Some important things to be aware of in this respect are:
> +
> +- User namespaces (in combination with mount namespaces) allow a
> + regular user to mount certain filesystems (proc, sys, and others);
> + see https://man7.org/linux/man-pages/man7/user_namespaces.7.html
> + for more information.
> +
> +- User namespaces (perhaps in combination with network namespaces)
> + allow a regular user to create sockets for network protocols they
> + would otherwise not be able to access; see
> + https://lwn.net/Articles/740455/ for more information.
> +
> +- ``capable()`` checks capabilities in the initial user namespace,
> + whereas ``ns_capable()`` checks capabilities in the current user
> + namespace.
> +
> +
> +Examples
> +========
> +
> +In the following examples, we give scores from a "worst case" context,

..for an generic distro...

> +i.e. assuming the hardware/platform is in use, the driver is compiled
> +in, mitigations are disabled or otherwise ineffective, etc.
> +
> +**Commit 72d9b9747e78 ("ACPI: extlog: fix NULL pointer dereference check")**:
> +
> + The first thing to notice is that the code here is in an ``__exit``
> + function, meaning that it can only run when the module is unloaded
> + (the ``mod->exit()`` call in the delete_module() system call) --
> + inspecting this function reveals that it is restricted to processes
> + with the ``CAP_SYS_MODULE`` capability, meaning you already need
> + quite high privileges to trigger the bug.
> +
> + The bug itself is that a pointer is dereferenced before it has been
> + checked to be non-NULL. Without deeper analysis we can't really know
> + whether it is even possible for the pointer to be NULL at this point,
> + although the presence of a check is a good indication that it may be.
> + By grepping for ``extlog_l1_addr``, we see that it is assigned in the
> + corresponding module_init() function and moreover that the only way
> + it can be NULL is if the module failed to load in the first place.
> + Since module_exit() functions are not called on module_init() failure
> + we can conclude that this is not a vulnerability.

Sounds right.

> +
> +**Commit 27e56f59bab5 ("UBSAN: array-index-out-of-bounds in dtSplitRoot")**:
> +
> + Right away we notice that this is a filesystem bug in jfs. There is a
> + stack trace showing that the code is coming from the mkdirat() system
> + call. This means you can likely trigger this as a regular user, given
> + that a suitable jfs filesystem has been mounted. Since this is a bug
> + found by syzkaller, we can follow the link in the changelog and find
> + the reproducer. By looking at the reproducer we can see that it almost
> + certainly mounts a corrupted filesystem image.
> +
> + When filesystems are involved, the most common scenario is probably
> + when a user has privileges to mount filesystem images in the context
> + of a desktop environment that allows the logged-in user to mount
> + attached USB drives, for example. In this case, physical access would
> + also be necessary, which would make this Attack Vector **Physical**
> + and User Interaction **Required**.
> +
> + Another scenario is where a malicious filesystem image is passed to a
> + legitimate user who then unwittingly mounts it and runs
> + mkdir()/mkdirat() to trigger the bug. This would clearly be User
> + Interaction **Required**, but it's not so clear what the Attack Vector
> + would be -- let's call it **Physical**, which is the least severe of
> + the options given to us by CVSS, even though it's not a true physical
> + attack.

"let's call it" -> "For a distro that doesn't have tools that will mount
filesystem images"... I'm not sure if "Physical" is "worst case" :)

> +
> + This is an out-of-bounds memory access, so without doing a much deeper
> + analysis we should assume it could potentially lead to privilege
> + escalation, so Scope **Changed**, Confidentiality **High**, Integrity
> + **High**, and Availability **High**.
> +
> + Since regular users can't normally mount arbitrary filesystems, we can
> + set Attack Complexity **High** and Privileges **Required**.

Why not? Many distros ship without automounters for inserted media. Some
docker tooling will mount filesystem images.

> +
> + If we also set Exploit Code Maturity **Unproven**, we end up with the
> + following CVSSv3.1 vector:
> +
> + - CVSS:3.1/AV:P/AC:H/PR:H/UI:R/S:C/C:H/I:H/A:H/E:U (6.2 - Medium)
> +
> + If this score seems high, keep in mind that this is a worst case
> + scenario. In a more specific scenario, jfs might be disabled in the
> + kernel config or there is no way for non-root users to mount any
> + filesystem.

Your worst and mine are very different. ;)

> +
> +**Commit b988b1bb0053 ("KVM: s390: fix setting of fpc register")**:
> +
> + From the changelog: "corruption of the fpc register of the host process"
> + and "the host process will incorrectly continue to run with the value
> + that was supposed to be used for a guest cpu".
> +
> + This makes it clear that a guest can partially take control of the
> + host process (presumably the host process running the KVM), which would
> + be a privilege escalation of sorts -- however, since this is corruption
> + of floating-point registers and not a memory error, it is highly
> + unlikely to be exploitable beyond DoS in practice (even then, it is
> + questionable whether the DoS impacts anything beyond the KVM process
> + itself).
> +
> + Because an attack would be difficult to pull off, we propose Attack
> + Complexity **High**, and because there isn't a clear or likely path to
> + anything beyond DoS, we'll select Confidentiality **None**, Integrity
> + **Low** and Availability **Low**.
> +
> + We suggest the following CVSSv3.1 vector:
> +
> + - CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:U/C:N/I:L/A:L/E:U (3.7 - Low)

Though for many distros this issue will be a non-issue unless they ship
s390...

> +
> +
> +Futher reading
> +==============
> +
> +Different vendors have other/different rating and classification systems
> +for vulnerabilities and severities:
> +
> +- Microsoft: https://www.microsoft.com/en-us/msrc/security-update-severity-rating-system
> +- Red Hat: https://access.redhat.com/security/updates/classification
> +- Google: https://cloud.google.com/security-command-center/docs/finding-severity-classifications#severity_classifications
> --
> 2.34.1
>

--
Kees Cook

2024-03-13 13:11:54

by Vegard Nossum

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] doc: distros: new document about assessing security vulnerabilities


On 11/03/2024 18:59, Matt Wilson wrote:
> On Mon, Mar 11, 2024 at 04:00:54PM +0100, Vegard Nossum wrote:
>> Since what most distros probably ultimately want is a type of CVSS score,
>> the guide is written with that in mind. CVSS provides its own "contextual"
>> modifiers, but these are not accurate or nuanced enough to capture the
>> wide variety of kernel configurations and deployments. We therefore focus
>> on practical evaluation under different sets of assumptions.
>
> (sending from my [email protected] account to emphasize that I am speaking
> only for myself, not my current employer.)
>
> I'm not sure that Linux distributions particularly *want* a CVSS base
> score for kernel CVEs. It is something that downstream _users_ of
> software have come to expect, especially those that operate under
> compliance regimes that suggest or require the use of CVSS in an
> enterprise's vulnerability management function.

Very true.

> Those compliance regimes often suggest using CVSS scores as found in
> the NVD in search of an objective third party assessment of a
> vulnerability. Unfortunately the text of these regulations suggests
> that the base scores generated by the CVSS system, and found in the
> NVD, are a measure of "risk" rather than a contextless measure of
> "impact".
>
> There have been occurrences where a CVSSv3.1 score produced by a
> vendor of software are ignored when the score in the NVD is higher
> (often 9.8 due to NIST's standard practice in producing CVSS scores
> from "Incomplete Data" [1]). I don't know that harmonizing the
> practice of producing CVSSv3.1 base scores across Linux vendors will
> address the problem unless scores that are made available in the NVD
> match.

That link actually says they would use 10.0 for CVEs without enough
detail provided by the filer/CNA (as I understood it).

I wonder what their strategy would be for all of these new kernel CVEs
-- should we expect to see 10.0 or 9.8 for all of them, do you know? I
assume they do NOT have people to evaluate all these patches in detail.

> But, stepping back for a moment I want to make sure that we are
> putting energy into a system that is fit for the Linux community's
> needs. CVSS lacks a strong scientific and statistical basis as an
> information capture and conveyance system. A study of the distribution
> of CVSSv3.1 base scores historically generated [2] shows that while
> the system was designed to resemble a normal distribution, in practice
> it is anything but.

Yes, agreed.

The article was interesting; thanks for that!

>> +CVEs and CVSS scores for the kernel
>> +===================================
>> +
>> +CVSS (`Common Vulnerability Scoring System <https://en.wikipedia.org/wiki/Common_Vulnerability_Scoring_System>`_)
>> +is an open standard for vulnerability scoring and the system which is
>> +commonly used by Linux distributions and various industry and government
>> +bodies.
>> +
>> +We won't go into the details of CVSS here, except to give a guide on how
>> +it could be applied most effectively in the context of the kernel.
>
> If the guide has something to say about CVSS, I (speaking only for
> myself) would like for it to call out the hazards that the system
> presents. I am not convinced that CVSS can be applied effectively in
> the context of the kernel, and would rather this section call out all
> the reasons why it's a fool's errand to try.

I also heard this concern privately from somebody else.

I am considering replacing the CVSS part with something else. To be
honest, the part that really matters to reduce duplicated work for
distros is the reachability analysis (including the necessary conditions
to trigger the bug) and the potential outcomes of triggering the bug.
Once you have those, scoring for impact, risk, etc. can be done fairly
easily (at least more easily) in different systems and taking
distro-specific constraints (configuration, mitigations, etc.) into account.


Vegard

2024-03-13 13:16:17

by Vegard Nossum

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] doc: distros: new document about assessing security vulnerabilities


On 12/03/2024 23:58, Kees Cook wrote:
> On Mon, Mar 11, 2024 at 04:00:54PM +0100, Vegard Nossum wrote:
>> +==================================
>> +Assessing security vulnerabilities
>> +==================================
>> +
>> +:Author: Vegard Nossum <[email protected]>
>> +
>> +This document is intended for distributions and others who want to assess
>> +the severity of the bugs fixed by Linux kernel patches.
>
> Perhaps add, "... when it is infeasible to track a stable Linux
> release."
>
>> +We could consider *everything* a security issue until proven otherwise, or we
>
> Who is "we" here (and through-out)?

In this case a "general we", as in "One could consider...".

But in general the document was intended to represent a consensus of
Linux distributors who need (for a variety of good and bad reasons) to
know the security impact of what they are shipping. (Note: I'm aware
that there isn't a consensus yet -- that's why this is an RFC and I'm
actively trying to _build_ that consensus :-))

>> +What is a vulnerability?
>> +========================
>> +
>> +For the purposes of this document, we consider all bugfixes to be
>> +potential vulnerabilities. This is because, as stated in

(A kind of "academic we" here, though more referring to the general
community of distributors who might stand behind such a document.)

>
> The CVE definition makes a distinction here, instead calling a
> software flaw with security considerations a "weakness" rather than
> "vulnerability". I find "weakness" more in line with people's thinking
> about attack chains.
>
>> +Documentation/process/cve.rst, whether a bug is exploitable or not
>> +depends highly on the context (threat model, platform/hardware,
>> +kernel configuration, boot parameters, runtime configuration,
>> +connected peripherals, etc.).
>
> Exploitability is an even higher bar, and tends to be unable to
> disprove.

Agreed.

>> +2. **Common configurations**: assuming kernel defaults, taking into
>> + account hardware prevalence, etc.
>
> I'm not sure I'd call this "Common", I'd say "Kernel default configuration"
>
>> +3. **Distro-specific configuration** and defaults: This assessment of a
>> + bugfix takes into account a specific kernel configuration and the
>> + distro's own assumptions about how the system is configured and
>> + intended to be used.
>
> And this just "Distro default configuration".
>
>> +4. **Specific use case** for a single user or deployment: Here we can make
>> + precise assumptions about what kernel features are in use and whether
>> + physical access is possible.
>
> i.e. a configuration that differs from distro default.

Will change these.

>> +Latent vulnerabilities
>> +----------------------
>> +
>> +It is worth mentioning the possibility of latent vulnerabilities:
>> +These are code "defects" which technically cannot be exploited on any
>> +current hardware, configuration, or scenario, but which should nevertheless
>> +be fixed since they represent a possible future vulnerability if other
>> +parts of the code change.
>
> I take pedantic issue with "cannot be exploited". Again, "exploit" is a
> high bar.

I am here specifically talking about patches that aren't really fixing
anything because the condition is provably never true; by definition,
they cannot be exploited.

Take the first example from the text below -- there is a NULL pointer
check in the module_exit() that generates a compiler warning because the
check happens after a dereference, but we really do know that the
pointer can never be NULL in the first place by looking at the rest of
the code.

It is only a latent vulnerability because the rest of the code could
conceivably be changed (in the future) in such a way that the NULL check
actually matters, but this is not the case for the specific kernel
version where the compiler warning was fixed.

We cannot take all future potential changes into account when
considering something a vulnerability, that would be madness. If I
started submitting patches to add NULL checks everywhere in the kernel
on account of "some other code may change to make this NULL check useful
in the future" nobody would take that seriously. Those patches would not
even get into the kernel and should not have CVEs assigned to them.

Another example might be a patch changing a sprintf() into snprintf()
(which is a nice cleanup, don't get me wrong!) even when we can clearly
see that the buffer _is_ large enough to contain all possible printed
strings.

> Also, why should hardware limit this? If a "latent vulnerability"
> becomes part of an attack chain on some future hardware, and we saw it
> was a weakness at the time it landed it stable, it should have gotten
> a CVE then, yes?

Maybe...

I think what I had in mind when writing this would not satisfy your "we
saw it was a weakness" criterion.

How about this as an example: We're reading a value out from some device
register and using that to index an array. There isn't an explicit
bounds check because the hardware never returns out-of-bounds values.
Meanwhile, we also sometimes do lookups in the same array for values
coming from userspace. A cleanup patch adds a helper function that does
the lookup, with a check, and replaces both lookups with a call to the
new helper function.

The check is added as a byproduct of a cleanup, but we have no reason to
believe future hardware is going to start returning invalid values.

I don't think this should be considered a vulnerability, but I'm open to
hearing more arguments.

Note that I'm not saying distros should not take the patch (they
should!), just that we should make a clear distinction between latent
vulnerabilities and actual vulnerabilities.

>> +An example of latent vulnerabilities is the failure to check the return
>> +value of kmalloc() for small memory allocations: as of early 2024, these
>> +are `well-known to never fail in practice <https://lwn.net/Articles/627419/>`_
>> +and are thus not exploitable and not technically vulnerabilities. If this
>> +rule were to change (because of changes to the memory allocator), then these
>> +would become true vulnerabilities.
>
> But for kernel prior to that, it IS an issue, yes? And what does "in
> practice" mean? Does that include a system under attack that is being
> actively manipulated?

Depends on what you mean by "issue". I think we should apply patches
adding missing NULL checks for all kmalloc()s.

"in practice" means that we take into account the way the memory
allocator is currently functioning (i.e.: small allocation requests
cannot fail) as opposed to the common and more general assumption that
any memory allocation request can fail at any time.

With "system under attack", are you alluding to these requests possibly
failing while under memory pressure? If that were the case then I agree
this would not be a latent vulnerability but a real vulnerability. But
that does not seem to be the case, see the linked LWN article for the
discussion.

>> +We recommend that a "worst-case scenario" assessment don't consider latent
>> +vulnerabilities as actual vulnerabilities since this is a slippery slope
>
> I wouldn't use the language "actual", but rather reword this from the
> perspective of severity. Triage of severity is what is at issue, yes?

By the definition of latent vulnerabilities given in this section, I
would personally like to see CVE requests for such patches rejected as
they are (again, by definition) not vulnerabilities.

>> +where eventually all changes can be considered a vulnerability in some sense
>> +or another; in that case, we've thrown the baby out with the bath water and
>> +rendered assessment completely useless.
>
> I don't find this to be true at all. Distro triage of kernel bug fixes
> isn't binary: it'll always be severity based. Many will be 0, yes, but
> it is up to the specific deployment to figure out where their cut line
> is if they're not just taking all fixes.

I don't think any distro wants to have CVEs for non-vulnerabilities as
every CVE adds extra work -- and I'm not referring to triage work here.
The triage needs to happen anyway; that's why the parts about latent
vulnerabilities is here in the first place: to clearly distinguish these
types of patches from actual vulnerabilities that are plausibly
exploitable in some known configuration.

Think of this whole section of the document as a shortcut to saying
"this is not a vulnerability in current mainline or stable, no need for
further analysis".

Perhaps I didn't outline the next step of this process well enough, but
what I had in mind was a shared list or repository of analyses where the
individuals performing triage for distros could share their notes and
cut down some of the duplicate work that is likely happening right now
across distributions. We don't always have to agree (and we definitely
don't need to have the same threshold for which fixes get deployed in
practice), but it would help to try to make the determination as
objective as possible, i.e. independent of factors like configuration.
And analyses (such as information about reachability or assumed
capabilities) are typically easier to verify than they are to perform in
the first place, which is why cross-distro collaboration on this would
be so useful, regardless of what the final impact/risk would be for any
particular distro or user.

I think it makes sense to draw a line between latent and actual
vulnerabilities where that is possible.

>> +Types of bugs
>> +=============
>> +
>> +There are many ways to classify types of bugs into broad categories. Two
>> +ways that we'll cover here are in terms of the outcome (i.e. what an
>> +attacker could do in the worst case) and in terms of the source defect.
>
> Before breaking this down into examples, I would start with a summary of
> the more basic security impact categories: Confidentiality, Integrity,
> and Availability, as mapping example back to these can be useful in
> understanding what a bug is, or can be expanded to.

I have tried to avoid going too deeply into "vulnerability theory" and
keep it more practical/down-to-earth (a bit more about this below...)

>> +
>> +In terms of outcome:
>> +
>> +- **local DoS** (Denial of Service): a local user is able to crash the
>> + machine or make it unusable in some way
>> +
>> +- **remote DoS**: another machine on the network is able to crash the
>> + machine or make it unusable in some way
>> +
>> +- **local privilege escalation**: a local user is able to become root,
>> + gain capabilities, or more generally become another user with more
>> + privileges than the original user
>> +
>> +- **kernel code execution**: the attacker is able to run arbitrary code
>> + in a kernel context; this is largely equivalent to a privilege escalation
>> + to root
>
> Yes, uid 0 and kernel context are distinct. I don't think I'd say
> "largely equivalent" though. Perhaps "Note that root access in many
> configurations is equivalent to kernel code execution".

Fair enough.

I guess in my mind, uid 0 means you can load modules (however, not true
with lockdown enabled) or otherwise reach a bunch of code that usually
isn't that well protected (including potential lockdown bypasses), which
implicitly gives you kernel execution. I guess in a worst case scenario,
lockdown is not enabled, which would make them the same.

(And if you have kernel execution, uid 0 is trivial.)

But I can use your phrasing instead.

Do you think it's worth including the bit about lockdown or is that
going into too much detail at this point? We could make it a separate
section next to user namespaces (further down) as there are probably
more details around it that would be interesting in the context of
assessing security vulnerabilities.

>> +- **information leak**: the attacker is able to obtain sensitive information
>
> Instead of "leak", please use the less ambiguous word for this, which is
> "exposure". The word "leak" is often confused with resource leaks. This
> is especially true for language like "memory leak" (... is this content
> exposure or resource drain?)

Agreed, although "info leak" is VERY commonly used and in some sense
probably less confusing for the intended audience of this document.
Especially also given the definition that is given right after: "[...]
obtain sensitive information [...]".

What if I just changed it to: **information leak (exposure)** ?

>> + (secret keys, access to another user's memory or processes, etc.)
>> +
>> +- **kernel address leak**: a subset of information leaks; this can lead to
>> + KASLR bypass, usually just one step as part of an exploit chain.
>
> Again, "exposure".
>
>> +
>> +In terms of source defect:
>
> These are also very specific. Perhaps a summary of higher level issues:
> Spatial safety, temporal safety, arithmetic safety, logic errors, etc.

Again, I feel like this is going too much into a "technical-theoretical"
direction and using the terms as they are commonly used in kernel
development (and the kernel development community) helps us make the
connection between what a patch describes and how the vulnerability is
assessed.

At least for me, as a non-native speaker of English, I have real trouble
with the terms spatial and temporal; they don't have much to do with
programming. I know what they mean, but I always have to stop and think
about it. Maybe that's just me, though!

That said, I was considering grouping some of these into a "memory
safety" category, however I decided against it for a few reasons: merely
reading uninitialized memory is a memory safety issue but depending on
the context and how the value is used this could be an info leak or
snowball into something worse (I mean, it is UB but we typically have an
idea of the possible consequences by reading the code). Also, many
people believe memory leaks are a memory safety issue, which they are not.

What if we add something at the end, maybe with links to more complete
reference material? (Do you have any?)

>> +A useful rule of thumb is that anything that can cause invalid memory
>> +dereferences is a potential privilege escalation bug.
>
> Even an "unexpected" dereference. :)

True.

>> +To calculate a final CVSS score (value from 0 to 10), use a calculator
>> +such as `<https://www.first.org/cvss/calculator/>`_ (also includes detailed
>> +explanations of each metric and its possible values).
>
> Why not NIST's website directly?

The CVSS specifications are owned and managed by FIRST.Org, Inc. and the
calculators were designed and developed by them.

NIST operates NVD which merely makes use of CVSS. At least that's my
understanding.

>> +A distro may wish to start by checking whether the file(s) being patched
>> +are even compiled into their kernel; if not, congrats! You're not vulnerable
>> +and don't really need to carry out a more detailed analysis.
>> +
>> +For things like loadable modules (e.g. device drivers for obscure hardware)
>> +and runtime parameters you might have a large segment of users who are not
>> +vulnerable by default.
>
> These 2 paragraphs seem more suited for the Reachability section?

Good catch, that's where they came from :-)

>> +Reachability analysis
>> +=====================
>> +
>> +One of the most frequent tasks for evaluating a security issue will be to
>> +figure out how the buggy code can be triggered. Usually this will mean
>> +starting with the function(s) being patched and working backwards through
>> +callers to figure out where the code is ultimately called from. Sometimes
>> +this will be a system call, but may also be timer callbacks, workqueue
>> +items, interrupt handlers, etc. Tools like `cscope <https://en.wikipedia.org/wiki/Cscope>`_
>> +(or just plain ``git grep``) can be used to help untangle these callchains.
>
>
> Before even this, is just simply looking at whether it was built,
> whether it was shipped, if a CONFIG exposed the feature, etc.

Yes.

>> +Examples
>> +========
>> +
>> +In the following examples, we give scores from a "worst case" context,
>
> ...for an generic distro...
>
>> +i.e. assuming the hardware/platform is in use, the driver is compiled
>> +in, mitigations are disabled or otherwise ineffective, etc.
>> +
>> +**Commit 72d9b9747e78 ("ACPI: extlog: fix NULL pointer dereference check")**:
>> +
>> + The first thing to notice is that the code here is in an ``__exit``
>> + function, meaning that it can only run when the module is unloaded
>> + (the ``mod->exit()`` call in the delete_module() system call) --
>> + inspecting this function reveals that it is restricted to processes
>> + with the ``CAP_SYS_MODULE`` capability, meaning you already need
>> + quite high privileges to trigger the bug.
>> +
>> + The bug itself is that a pointer is dereferenced before it has been
>> + checked to be non-NULL. Without deeper analysis we can't really know
>> + whether it is even possible for the pointer to be NULL at this point,
>> + although the presence of a check is a good indication that it may be.
>> + By grepping for ``extlog_l1_addr``, we see that it is assigned in the
>> + corresponding module_init() function and moreover that the only way
>> + it can be NULL is if the module failed to load in the first place.
>> + Since module_exit() functions are not called on module_init() failure
>> + we can conclude that this is not a vulnerability.
>
> Sounds right.

Right, so I came across this while looking for examples, as this was
assigned CVE-2023-52605:

https://lore.kernel.org/all/2024030647-CVE-2023-52605-292a@gregkh/

This is where I think the distinction latent/actual vulnerability is
useful: this is a latent vulnerability in the sense that if somebody
were to change module_init() in such a way that extlog_l1_addr could be
NULL, then the dereference/check ordering could be a local DOS by users
with CAP_SYS_MODULE (yes -- admittedly an extremely contrived scenario,
but admissible under worst case assumptions).

>> +**Commit 27e56f59bab5 ("UBSAN: array-index-out-of-bounds in dtSplitRoot")**:
>> +
>> + Right away we notice that this is a filesystem bug in jfs. There is a
>> + stack trace showing that the code is coming from the mkdirat() system
>> + call. This means you can likely trigger this as a regular user, given
>> + that a suitable jfs filesystem has been mounted. Since this is a bug
>> + found by syzkaller, we can follow the link in the changelog and find
>> + the reproducer. By looking at the reproducer we can see that it almost
>> + certainly mounts a corrupted filesystem image.
>> +
>> + When filesystems are involved, the most common scenario is probably
>> + when a user has privileges to mount filesystem images in the context
>> + of a desktop environment that allows the logged-in user to mount
>> + attached USB drives, for example. In this case, physical access would
>> + also be necessary, which would make this Attack Vector **Physical**
>> + and User Interaction **Required**.
>> +
>> + Another scenario is where a malicious filesystem image is passed to a
>> + legitimate user who then unwittingly mounts it and runs
>> + mkdir()/mkdirat() to trigger the bug. This would clearly be User
>> + Interaction **Required**, but it's not so clear what the Attack Vector
>> + would be -- let's call it **Physical**, which is the least severe of
>> + the options given to us by CVSS, even though it's not a true physical
>> + attack.
>
> "let's call it" -> "For a distro that doesn't have tools that will mount
> filesystem images"... I'm not sure if "Physical" is "worst case" :)

Not sure I get this -- there is some kind of inversion of best/worst
case here I think.

The worst case analysis does not make distro-specific assumptions. What
we know is that two system calls are necessary: mount() and mkdir() (or
possibly some other filesystem-related system calls that would end up in
the same vulnerable code). The question is: how do those calls get made?

Anyway, I would welcome more thoughts on this specific bug because the
lines are very blurred (at least in my mind) between what triggers the
bug vs. what the attacker can do and actually does.

If somebody gives you a USB stick with a shell script on it that removes
your home directory when you run it, is that a vulnerability? At which
point is it simply the user's fault for running unknown code? Is it a
physical attack vector because the user's action was something physical,
or was it the attacker's action of handing over the USB stick that was
physical? Does auto-mounting play a part in the analysis?

I'm sure you could model it mathematically with an adversarial model but
we shouldn't have to invoke that whole machinery to do a best effort
evaluation of a patch (though it could be interesting to include a
summary of the results of such an extensive analysis for specific
examples to gain a better intuition).

>> + This is an out-of-bounds memory access, so without doing a much deeper
>> + analysis we should assume it could potentially lead to privilege
>> + escalation, so Scope **Changed**, Confidentiality **High**, Integrity
>> + **High**, and Availability **High**.
>> +
>> + Since regular users can't normally mount arbitrary filesystems, we can
>> + set Attack Complexity **High** and Privileges **Required**.
>
> Why not? Many distros ship without automounters for inserted media. Some
> docker tooling will mount filesystem images.

You mean a regular, unprivileged user can get docker to call mount() on
behalf of the user for JFS filesystems? That's a bit of unfortunate
attack surface given that most filesystems are deliberately restricted
to root because (as I'm sure you're aware) their implementations are not
as hardened as one would hope for (latest LWN article on this topic for
reference: https://lwn.net/Articles/951846/). I would honestly consider
this a vulnerability in Docker and/or auto-mounters if the kernel is
well known to not provide a guarantee against crafted filesystem images.
Maybe it just needs to be documented explicitly?

>> + If we also set Exploit Code Maturity **Unproven**, we end up with the
>> + following CVSSv3.1 vector:
>> +
>> + - CVSS:3.1/AV:P/AC:H/PR:H/UI:R/S:C/C:H/I:H/A:H/E:U (6.2 - Medium)
>> +
>> + If this score seems high, keep in mind that this is a worst case
>> + scenario. In a more specific scenario, jfs might be disabled in the
>> + kernel config or there is no way for non-root users to mount any
>> + filesystem.
>
> Your worst and mine are very different. ;)

What is your worst? (Honest question! Learning from each other is very
much also a goal here.)

>> +**Commit b988b1bb0053 ("KVM: s390: fix setting of fpc register")**:
>> +
>> + From the changelog: "corruption of the fpc register of the host process"
>> + and "the host process will incorrectly continue to run with the value
>> + that was supposed to be used for a guest cpu".
>> +
>> + This makes it clear that a guest can partially take control of the
>> + host process (presumably the host process running the KVM), which would
>> + be a privilege escalation of sorts -- however, since this is corruption
>> + of floating-point registers and not a memory error, it is highly
>> + unlikely to be exploitable beyond DoS in practice (even then, it is
>> + questionable whether the DoS impacts anything beyond the KVM process
>> + itself).
>> +
>> + Because an attack would be difficult to pull off, we propose Attack
>> + Complexity **High**, and because there isn't a clear or likely path to
>> + anything beyond DoS, we'll select Confidentiality **None**, Integrity
>> + **Low** and Availability **Low**.
>> +
>> + We suggest the following CVSSv3.1 vector:
>> +
>> + - CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:U/C:N/I:L/A:L/E:U (3.7 - Low)
>
> Though for many distros this issue will be a non-issue unless they ship
> s390...

True, I just didn't want to repeat the same line over and over again. Do
you think it's necessary to repeat it again for this patch?

Finally, I'll ask this very straightforwardly, as this seems to be your
main objection: What is your reason for wanting to assign CVEs to
non-vulnerabilities? I don't really understand the motivation for this.
Some people still believe the whole CNA thing is an attempt to "burn
down" the CVE ecosystem; maybe I'm naïve but I don't think that's
actually the case. I also think you care enough about real security to
see that this (burning it down by swamping it with non-issues) would be
a mistake.

My own reason for NOT wanting to assign CVEs to non-vulnerabilities is
the fact that this makes a lot of people's jobs harder for no good
reason. It also does not improve security. Distros will need to ship and
evaluate these patches anyway; I see it more as a collaboration between
the CNA/CVE team and distros to get the best possible quality and
outcome of being a CNA as opposed to working against each other.

Just to be clear: This document is not meant for CNA/CVE team to follow,
although I would be happy if they found it useful too.

In any case, thanks for the review. I will incorporate at least some of
your feedback for a v2.


Vegard

2024-03-13 22:41:19

by Matt Wilson

[permalink] [raw]
Subject: Re: [RFC PATCH 2/2] doc: distros: new document about assessing security vulnerabilities

On Wed, Mar 13, 2024 at 02:11:00PM +0100, Vegard Nossum wrote:
>
> On 11/03/2024 18:59, Matt Wilson wrote:
> > There have been occurrences where a CVSSv3.1 score produced by a
> > vendor of software are ignored when the score in the NVD is higher
> > (often 9.8 due to NIST's standard practice in producing CVSS scores
> > from "Incomplete Data" [1]). I don't know that harmonizing the
> > practice of producing CVSSv3.1 base scores across Linux vendors will
> > address the problem unless scores that are made available in the NVD
> > match.
>
> That link actually says they would use 10.0 for CVEs without enough
> detail provided by the filer/CNA (as I understood it).

Indeed, the web page says that it would be 10.0 in cases where there
is no detail about the weakness. In practice, the score tends to come
out as 9.8 because the base score vectors are more often
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
and not
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H
(which would be a 10.0 score)

What's the key difference between 9.8 and 10.0 in the CVSSv3.1 system?
Scope:Unchanged. In CVSSv4 such a weakness would likely be scored
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N

With a CVSS-B of 9.3 (Critical). What does any of this information
tell a practitioner about what actions are warranted in light of the
presence of a software weakness in their environment? Not much, from
my personal perspective.

> I wonder what their strategy would be for all of these new kernel CVEs
> -- should we expect to see 10.0 or 9.8 for all of them, do you know? I
> assume they do NOT have people to evaluate all these patches in detail.

At present, and since mid-February, NIST is not enriching new CVEs
that have been allocated with CVSS base scores or other additional
data. Their website displays the following banner text:

NIST is currently working to establish a consortium to address
challenges in the NVD program and develop improved tools and
methods. You will temporarily see delays in analysis efforts
during this transition. We apologize for the inconvenience and ask
for your patience as we work to improve the NVD program.

I expect the path forward will be a topic of discussion among
attendees at the upcoming CVE/FIRST VulnCon 2024 & Annual CNA Summit [1].

> > If the guide has something to say about CVSS, I (speaking only for
> > myself) would like for it to call out the hazards that the system
> > presents. I am not convinced that CVSS can be applied effectively in
> > the context of the kernel, and would rather this section call out all
> > the reasons why it's a fool's errand to try.
>
> I also heard this concern privately from somebody else.
>
> I am considering replacing the CVSS part with something else. To be
> honest, the part that really matters to reduce duplicated work for
> distros is the reachability analysis (including the necessary conditions
> to trigger the bug) and the potential outcomes of triggering the bug.
> Once you have those, scoring for impact, risk, etc. can be done fairly
> easily (at least more easily) in different systems and taking
> distro-specific constraints (configuration, mitigations, etc.) into account.

Distros are not the only downstream consumer of Linux with this
need. Arguably the need is even greater for some consumer electronics
applications that may not have the same over-the-air update
capabilities as something like an Android phone. This is a frequently,
and increasingly, discussed topic in Embedded Linux conferences. See,
for example [2, 3].

I think that one coarse-grained "reachability" analysis is CONFIG_*
based matching [4, 5], and that's something that not necessarily
directly reusable across distros or other downstream users of Linux
(as their Kconfigs aren't necessarily the same). But perhaps some
community maintained tooling to automate that analysis would be
useful.

Many in the security community are rightly skeptical about
"reachability analysis" given the possibility of constructing "weird
machines" [6] from executable code that is present but not normally
reached. But if you can confidently attest that the weakness is not
present in a produced binary, you can safely say that the weakness is
not a factor, and poses no legitimate security risk.

Your current draft security assessment guide says:
> A distro may wish to start by checking whether the file(s) being
> patched are even compiled into their kernel; if not, congrats!
> You're not vulnerable and don't really need to carry out a more
> detailed analysis.

One research group [7] found that in a study of 127 router firmware
images 68% of all na?ve version based CVE matches were false-positives
that could be filtered out, mainly through determining that the code
that contains a weakness was never compiled.

I think this low hanging fruit is ripe for picking, and deserves some
more content in a weakness assessment guide.

(P.S., "weakness" is an intentional word choice)

--msw

[1] https://www.first.org/conference/vulncon2024/
[2] https://elinux.org/images/0/0a/Open-Source-CVE-Monitoring-and-Management-V3.pdf
[3] https://www.timesys.com/security/evaluating-vulnerability-tools-embedded-linux-devices/
[4] https://ossjapan2022.sched.com/event/1D14m/config-based-cve-matching-for-linux-kernel-takuma-kawai-miraxia-edge-technology-corporation
[5] https://www.miraxia.com/en/engineers-blog/config-based-cve-matching-for-linux-kernel/
[6] https://ieeexplore.ieee.org/document/8226852
[7] https://arxiv.org/pdf/2209.05217.pdf