Attacks against vulnerable userspace applications that aim to break
ASLR or bypass stack canaries traditionally use some level of brute
force, aided by the fork system call. This is possible because a process
created with fork has the same memory contents as its parent (the
process that called the fork system call). So, an attacker can probe the
memory an unlimited number of times to find the correct memory values or
the correct memory addresses without worrying about crashing the
application.
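To make the mechanics concrete, here is a toy forking service (a
hypothetical illustration, not part of this series): every child
inherits the parent's memory layout, so each one prints the same stack
address, and a crashed child costs the attacker nothing because the
parent survives with its layout intact.

    /* Toy illustration (not part of this series). */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void handle_request(void)
    {
            char buf[16];

            /* Imagine a vulnerable copy into buf here: a wrong guess of
             * a canary or address kills only this child. */
            printf("child %d: buf at %p\n", (int)getpid(), (void *)buf);
    }

    int main(void)
    {
            int i;

            for (i = 0; i < 4; i++) {
                    pid_t pid = fork();

                    if (pid == 0) {         /* child: identical layout */
                            handle_request();
                            _exit(0);
                    }
                    waitpid(pid, NULL, 0);  /* parent: survives crashes */
            }
            return 0;
    }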
Based on the above scenario it would be desirable to detect and mitigate
such attacks, and that is the goal of this patch series. Specifically,
the following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is obtained (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is obtained (e.g. what CTFs do for a
simple network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the privilege boundaries listed above.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
With all this in mind, the individual patches are as follows:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection by taking the privilege boundary
crossing into account.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch series is a task of the KSPP [1] and can also be accessed
from my github tree [2] in the "brute_v6" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/[email protected]/
Version 2
https://lore.kernel.org/kernel-hardening/[email protected]/
Version 3
https://lore.kernel.org/lkml/[email protected]/
Version 4
https://lore.kernel.org/lkml/[email protected]/
Version 5
https://lore.kernel.org/kernel-hardening/[email protected]/
Changelog RFC -> v2
-------------------
- Give this feature a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an ongoing basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect a slow brute force attack (Randy Dunlap).
- Fine-tune the detection, taking into account privilege boundary crossing
  (Kees Cook).
- Take into account only fatal signals delivered by the kernel (Kees
  Cook).
- Remove the sysctl attributes used to fine-tune the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Changelog v5 -> v6
------------------
- Fix a reported deadlock (kernel test robot).
- Add high level details to the documentation (Andi Kleen).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
security/brute: Detect a brute force attack
security/brute: Fine-tune the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 278 ++++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1107 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2219 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
Add a security hook that allows an LSM to be notified when a task gets
a fatal signal. This patch is a preparatory step toward computing the
task crash period in the "brute" LSM (a Linux security module that
detects and mitigates fork brute force attacks against vulnerable
userspace processes).
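As a usage sketch (a hypothetical "demo" module, for illustration only),
an LSM consumes the new hook the same way the "brute" LSM does later in
this series:

    /* Hypothetical consumer of the new task_fatal_signal hook. */
    static void demo_task_fatal_signal(const kernel_siginfo_t *siginfo)
    {
            pr_info("demo: %s received fatal signal %d\n",
                    current->comm, siginfo->si_signo);
    }

    static struct security_hook_list demo_hooks[] __lsm_ro_after_init = {
            LSM_HOOK_INIT(task_fatal_signal, demo_task_fatal_signal),
    };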
Signed-off-by: John Wood <[email protected]>
---
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 ++++
include/linux/security.h | 4 ++++
kernel/signal.c | 1 +
security/security.c | 5 +++++
5 files changed, 15 insertions(+)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 477a597db013..0208df0955fa 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -220,6 +220,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5)
LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
struct inode *inode)
+LSM_HOOK(void, LSM_RET_VOID, task_fatal_signal, const kernel_siginfo_t *siginfo)
LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp,
u32 *secid)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index fb7f3193753d..beedaa6ee745 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -784,6 +784,10 @@
* security attributes, e.g. for /proc/pid inodes.
* @p contains the task_struct for the task.
* @inode contains the inode structure for the inode.
+ * @task_fatal_signal:
+ * This hook allows security modules to be notified when a task gets a
+ * fatal signal.
+ * @siginfo contains the signal information.
*
* Security hooks for Netlink messaging.
*
diff --git a/include/linux/security.h b/include/linux/security.h
index 8aeebd6646dc..e4025a13630f 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -430,6 +430,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
void security_task_to_inode(struct task_struct *p, struct inode *inode);
+void security_task_fatal_signal(const kernel_siginfo_t *siginfo);
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid);
int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1165,6 +1166,9 @@ static inline int security_task_prctl(int option, unsigned long arg2,
static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
{ }
+static inline void security_task_fatal_signal(const kernel_siginfo_t *siginfo)
+{ }
+
static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
short flag)
{
diff --git a/kernel/signal.c b/kernel/signal.c
index ba4d1ef39a9e..d279df338f45 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2750,6 +2750,7 @@ bool get_signal(struct ksignal *ksig)
/*
* Anything else is fatal, maybe with a core dump.
*/
+ security_task_fatal_signal(&ksig->info);
current->flags |= PF_SIGNALED;
if (sig_kernel_coredump(signr)) {
diff --git a/security/security.c b/security/security.c
index 5ac96b16f8fa..d9cf653a4e70 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1840,6 +1840,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode)
call_void_hook(task_to_inode, p, inode);
}
+void security_task_fatal_signal(const kernel_siginfo_t *siginfo)
+{
+ call_void_hook(task_fatal_signal, siginfo);
+}
+
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag)
{
return call_int_hook(ipc_permission, 0, ipcp, flag);
--
2.25.1
To detect a brute force attack it is necessary that the statistics
shared by all the fork hierarchy processes be updated on every fatal
crash, and the most important value to update is the application crash
period. To do so, use the new "task_fatal_signal" LSM hook added in a
previous step.
The application crash period must be a value that is not prone to
change due to spurious data yet still follows the real crash period.
So, to compute it, an exponential moving average (EMA) is used.
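As a sketch of the arithmetic (the 7/10 weight and the
mul_u64_u64_div_u64() helper are defined in the diff below; plain
multiplication is used here for brevity):

    /* Simplified integer EMA update (weight = 7/10). */
    static inline u64 ema_update(u64 period_ema, u64 new_period)
    {
            period_ema -= (period_ema * 7) / 10;    /* period_ema * weight */
            period_ema += (new_period * 7) / 10;    /* period * weight */
            return period_ema;
    }

    /*
     * Worked example against the 30000 ms fast-attack threshold:
     *   ema_update(120000, 1000) == 36700  -> one fast crash: no detection
     *   ema_update(36700, 1000)  == 11710  -> a second one: below threshold
     */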
There are two types of brute force attacks that need to be detected. The
first one is an attack that happens through the fork system call and the
second one is an attack that happens through the execve system call. The
first type uses the statistics shared by all the fork hierarchy
processes, but the second type cannot use this statistical data because
these statistics disappear when the involved tasks finish. In this last
scenario the attack info should be tracked by the statistics of a higher
fork hierarchy (the hierarchy that contains the process that forks
before the execve system call). For example, if a shell forks a child
that then calls execve on a vulnerable binary, that binary's crashes are
accounted to the shell's fork hierarchy statistics.
Moreover, these two attack types have two variants: a slow brute force
attack, detected when the maximum number of faults per fork hierarchy is
reached, and a fast brute force attack, detected when the application
crash period falls below a certain threshold.
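Condensed, the decision implemented by brute_attack_running() in the
diff below is:

    /*
     * faults < BRUTE_MIN_FAULTS (5)      -> no detection (not enough data)
     * faults >= BRUTE_MAX_FAULTS (200)   -> slow brute force attack
     * otherwise, EMA period < 30000 ms   -> fast brute force attack
     */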
Also, this patch adds locking to protect the statistics pointer held by
every process.
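The resulting lock ordering is the following (sketch; interrupts are
disabled before taking the outer lock):

    /*
     * read/write_lock(&brute_stats_ptr_lock)  protects the per-task pointer
     *   spin_lock(&stats->lock)               protects the stats contents
     */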
Signed-off-by: John Wood <[email protected]>
---
security/brute/brute.c | 498 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 479 insertions(+), 19 deletions(-)
diff --git a/security/brute/brute.c b/security/brute/brute.c
index 99d099e45112..870db55332d4 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -11,9 +11,14 @@
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/lsm_hooks.h>
+#include <linux/math64.h>
#include <linux/printk.h>
#include <linux/refcount.h>
+#include <linux/rwlock.h>
+#include <linux/rwlock_types.h>
#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/task.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>
@@ -37,6 +42,11 @@ struct brute_stats {
u64 period;
};
+/*
+ * brute_stats_ptr_lock - Lock to protect the brute_stats structure pointer.
+ */
+static DEFINE_RWLOCK(brute_stats_ptr_lock);
+
/*
* brute_blob_sizes - LSM blob sizes.
*
@@ -74,7 +84,7 @@ static struct brute_stats *brute_new_stats(void)
{
struct brute_stats *stats;
- stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
+ stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
if (!stats)
return NULL;
@@ -99,16 +109,17 @@ static struct brute_stats *brute_new_stats(void)
* It's mandatory to disable interrupts before acquiring the brute_stats::lock
* since the task_free hook can be called from an IRQ context during the
* execution of the task_alloc hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
*/
static void brute_share_stats(struct brute_stats *src,
struct brute_stats **dst)
{
- unsigned long flags;
-
- spin_lock_irqsave(&src->lock, flags);
+ spin_lock(&src->lock);
refcount_inc(&src->refc);
*dst = src;
- spin_unlock_irqrestore(&src->lock, flags);
+ spin_unlock(&src->lock);
}
/**
@@ -126,26 +137,36 @@ static void brute_share_stats(struct brute_stats *src,
* this task and the new one being allocated. Otherwise, share the statistics
* that the current task already has.
*
+ * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
+ * and brute_stats::lock since the task_free hook can be called from an IRQ
+ * context during the execution of the task_alloc hook.
+ *
* Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
* otherwise.
*/
static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
{
struct brute_stats **stats, **p_stats;
+ unsigned long flags;
stats = brute_stats_ptr(task);
p_stats = brute_stats_ptr(current);
+ write_lock_irqsave(&brute_stats_ptr_lock, flags);
if (likely(*p_stats)) {
brute_share_stats(*p_stats, stats);
+ write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return 0;
}
*stats = brute_new_stats();
- if (!*stats)
+ if (!*stats) {
+ write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return -ENOMEM;
+ }
brute_share_stats(*stats, p_stats);
+ write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return 0;
}
@@ -167,9 +188,9 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
* only one task (the task that calls the execve function) points to the data.
* In this case, the previous allocation is used but the statistics are reset.
*
- * It's mandatory to disable interrupts before acquiring the brute_stats::lock
- * since the task_free hook can be called from an IRQ context during the
- * execution of the bprm_committing_creds hook.
+ * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
+ * and brute_stats::lock since the task_free hook can be called from an IRQ
+ * context during the execution of the bprm_committing_creds hook.
*/
static void brute_task_execve(struct linux_binprm *bprm)
{
@@ -177,24 +198,33 @@ static void brute_task_execve(struct linux_binprm *bprm)
unsigned long flags;
stats = brute_stats_ptr(current);
- if (WARN(!*stats, "No statistical data\n"))
+ read_lock_irqsave(&brute_stats_ptr_lock, flags);
+
+ if (WARN(!*stats, "No statistical data\n")) {
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return;
+ }
- spin_lock_irqsave(&(*stats)->lock, flags);
+ spin_lock(&(*stats)->lock);
if (!refcount_dec_not_one(&(*stats)->refc)) {
/* execve call after an execve call */
(*stats)->faults = 0;
(*stats)->jiffies = get_jiffies_64();
(*stats)->period = 0;
- spin_unlock_irqrestore(&(*stats)->lock, flags);
+ spin_unlock(&(*stats)->lock);
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return;
}
/* execve call after a fork call */
- spin_unlock_irqrestore(&(*stats)->lock, flags);
+ spin_unlock(&(*stats)->lock);
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
+
+ write_lock_irqsave(&brute_stats_ptr_lock, flags);
*stats = brute_new_stats();
WARN(!*stats, "Cannot allocate statistical data\n");
+ write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
}
/**
@@ -204,9 +234,9 @@ static void brute_task_execve(struct linux_binprm *bprm)
* The statistical data that is shared between all the fork hierarchy processes
* needs to be freed when this hierarchy disappears.
*
- * It's mandatory to disable interrupts before acquiring the brute_stats::lock
- * since the task_free hook can be called from an IRQ context during the
- * execution of the task_free hook.
+ * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
+ * and brute_stats::lock since the task_free hook can be called from an IRQ
+ * context during the execution of the task_free hook.
*/
static void brute_task_free(struct task_struct *task)
{
@@ -215,17 +245,446 @@ static void brute_task_free(struct task_struct *task)
bool refc_is_zero;
stats = brute_stats_ptr(task);
- if (WARN(!*stats, "No statistical data\n"))
+ read_lock_irqsave(&brute_stats_ptr_lock, flags);
+
+ if (WARN(!*stats, "No statistical data\n")) {
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return;
+ }
- spin_lock_irqsave(&(*stats)->lock, flags);
+ spin_lock(&(*stats)->lock);
refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
- spin_unlock_irqrestore(&(*stats)->lock, flags);
+ spin_unlock(&(*stats)->lock);
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
if (refc_is_zero) {
+ write_lock_irqsave(&brute_stats_ptr_lock, flags);
kfree(*stats);
*stats = NULL;
+ write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
+ }
+}
+
+/*
+ * BRUTE_EMA_WEIGHT_NUMERATOR - Weight's numerator of EMA.
+ */
+static const u64 BRUTE_EMA_WEIGHT_NUMERATOR = 7;
+
+/*
+ * BRUTE_EMA_WEIGHT_DENOMINATOR - Weight's denominator of EMA.
+ */
+static const u64 BRUTE_EMA_WEIGHT_DENOMINATOR = 10;
+
+/**
+ * brute_mul_by_ema_weight() - Multiply by EMA weight.
+ * @value: Value to multiply by EMA weight.
+ *
+ * Return: The result of the multiplication operation.
+ */
+static inline u64 brute_mul_by_ema_weight(u64 value)
+{
+ return mul_u64_u64_div_u64(value, BRUTE_EMA_WEIGHT_NUMERATOR,
+ BRUTE_EMA_WEIGHT_DENOMINATOR);
+}
+
+/*
+ * BRUTE_MAX_FAULTS - Maximum number of faults.
+ *
+ * If a brute force attack is running slowly for a long time, the application
+ * crash period's EMA is not suitable for the detection. This type of attack
+ * must be detected using a maximum number of faults.
+ */
+static const unsigned char BRUTE_MAX_FAULTS = 200;
+
+/**
+ * brute_update_crash_period() - Update the application crash period.
+ * @stats: Statistics that hold the application crash period to update.
+ * @now: The current timestamp in jiffies.
+ *
+ * The application crash period must be a value that is not prone to change
+ * due to spurious data yet still follows the real crash period. So, to compute
+ * it, the exponential moving average (EMA) is used.
+ *
+ * This kind of average defines a weight (between 0 and 1) for the new value to
+ * add and applies the remainder of the weight to the current average value.
+ * This way, some spurious data will not excessively modify the average, and
+ * the moving average will tend towards new values only if they are persistent.
+ *
+ * Mathematically the application crash period's EMA can be expressed as
+ * follows:
+ *
+ * period_ema = period * weight + period_ema * (1 - weight)
+ *
+ * If the operations are applied:
+ *
+ * period_ema = period * weight + period_ema - period_ema * weight
+ *
+ * If the operands are ordered:
+ *
+ * period_ema = period_ema - period_ema * weight + period * weight
+ *
+ * Finally, this formula can be written as follows:
+ *
+ * period_ema -= period_ema * weight;
+ * period_ema += period * weight;
+ *
+ * The statistics that hold the application crash period to update cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
+ * Return: The last crash timestamp before updating it.
+ */
+static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
+{
+ u64 current_period;
+ u64 last_crash_timestamp;
+
+ spin_lock(&stats->lock);
+ current_period = now - stats->jiffies;
+ last_crash_timestamp = stats->jiffies;
+ stats->jiffies = now;
+
+ stats->period -= brute_mul_by_ema_weight(stats->period);
+ stats->period += brute_mul_by_ema_weight(current_period);
+
+ if (stats->faults < BRUTE_MAX_FAULTS)
+ stats->faults += 1;
+
+ spin_unlock(&stats->lock);
+ return last_crash_timestamp;
+}
+
+/*
+ * BRUTE_MIN_FAULTS - Minimum number of faults.
+ *
+ * The application crash period's EMA cannot be used until a minimum number of
+ * data points have been applied to it. This constraint makes it possible to
+ * establish a trend when this moving average is used. Moreover, it avoids the
+ * scenario where an application fails quickly from the execve system call due
+ * to reasons unrelated to a real attack.
+ */
+static const unsigned char BRUTE_MIN_FAULTS = 5;
+
+/*
+ * BRUTE_CRASH_PERIOD_THRESHOLD - Application crash period threshold.
+ *
+ * The units are expressed in milliseconds.
+ *
+ * A fast brute force attack is detected when the application crash period falls
+ * below this threshold.
+ */
+static const u64 BRUTE_CRASH_PERIOD_THRESHOLD = 30000;
+
+/**
+ * brute_attack_running() - Test if a brute force attack is happening.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * The decision whether a brute force attack is running is based on the
+ * statistical data shared by all the fork hierarchy processes. These
+ * statistics cannot be NULL.
+ *
+ * There are two types of brute force attacks that can be detected using the
+ * statistical data. The first one is a slow brute force attack that is detected
+ * if the maximum number of faults per fork hierarchy is reached. The second
+ * type is a fast brute force attack that is detected if the application crash
+ * period falls below a certain threshold.
+ *
+ * Moreover, it is important to note that no attacks will be detected until a
+ * minimum number of faults have occurred. This makes it possible to establish
+ * a trend in the crash period when the EMA is used and also avoids the
+ * scenario where an application fails quickly from the execve system call due
+ * to reasons unrelated to a real attack.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
+ * Return: True if a brute force attack is happening. False otherwise.
+ */
+static bool brute_attack_running(struct brute_stats *stats)
+{
+ u64 crash_period;
+
+ spin_lock(&stats->lock);
+ if (stats->faults < BRUTE_MIN_FAULTS) {
+ spin_unlock(&stats->lock);
+ return false;
+ }
+
+ if (stats->faults >= BRUTE_MAX_FAULTS) {
+ spin_unlock(&stats->lock);
+ return true;
+ }
+
+ crash_period = jiffies64_to_msecs(stats->period);
+ spin_unlock(&stats->lock);
+
+ return crash_period < BRUTE_CRASH_PERIOD_THRESHOLD;
+}
+
+/**
+ * print_fork_attack_running() - Warn about a fork brute force attack.
+ */
+static inline void print_fork_attack_running(void)
+{
+ pr_warn("Fork brute force attack detected [%s]\n", current->comm);
+}
+
+/**
+ * brute_manage_fork_attack() - Manage a fork brute force attack.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ * @now: The current timestamp in jiffies.
+ *
+ * For a correct management of a fork brute force attack it is only necessary to
+ * update the statistics and test if an attack is happening based on these data.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
+ * Return: The last crash timestamp before updating it.
+ */
+static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
+{
+ u64 last_fork_crash;
+
+ last_fork_crash = brute_update_crash_period(stats, now);
+ if (brute_attack_running(stats))
+ print_fork_attack_running();
+
+ return last_fork_crash;
+}
+
+/**
+ * brute_get_exec_stats() - Get the exec statistics.
+ * @stats: When this function is called, this parameter must point to the
+ * current process' statistical data. When this function returns, this
+ * parameter points to the statistics of the higher fork hierarchy that
+ * holds the current process' statistics.
+ *
+ * To manage a brute force attack that happens through the execve system call it
+ * is not possible to use the statistical data held by this process since these
+ * statistics disappear when this task finishes. In this scenario this data
+ * should be tracked by the statistics of a higher fork hierarchy (the hierarchy
+ * that contains the process that forks before the execve system call).
+ *
+ * To find these statistics the current fork hierarchy must be traversed up
+ * until new statistics are found.
+ *
+ * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
+ */
+static void brute_get_exec_stats(struct brute_stats **stats)
+{
+ const struct task_struct *task = current;
+ struct brute_stats **p_stats;
+
+ do {
+ if (!task->real_parent) {
+ *stats = NULL;
+ return;
+ }
+
+ p_stats = brute_stats_ptr(task->real_parent);
+ task = task->real_parent;
+ } while (*stats == *p_stats);
+
+ *stats = *p_stats;
+}
+
+/**
+ * brute_update_exec_crash_period() - Update the exec crash period.
+ * @stats: When this function is called, this parameter must point to the
+ * current process' statistical data. When this function returns, this
+ * parameter points to the updated statistics (statistics that track the
+ * info to manage a brute force attack that happens through the execve
+ * system call).
+ * @now: The current timestamp in jiffies.
+ * @last_fork_crash: The last fork crash timestamp before updating it.
+ *
+ * If this is the first update of the statistics used to manage a brute force
+ * attack that happens through the execve system call, its last crash timestamp
+ * (the timestamp that shows when the execve was called) cannot be used to
+ * compute the crash period's EMA. Instead, the last fork crash timestamp should
+ * be used (the last crash timestamp of the child fork hierarchy before updating
+ * the crash period). This ensures that in a brute force attack that happens
+ * through the fork system call, the exec and fork statistics are the same. In
+ * this situation, the mitigation method will act only on the processes that
+ * are sharing the fork statistics. This way, the process that forked before
+ * the execve system call will not be involved in the mitigation method. In
+ * this scenario, the parent is not responsible for the child's behaviour.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and tasklist_lock and
+ * brute_stats_ptr_lock held.
+ * Return: -EFAULT if there are no exec statistics. Zero otherwise.
+ */
+static int brute_update_exec_crash_period(struct brute_stats **stats,
+ u64 now, u64 last_fork_crash)
+{
+ brute_get_exec_stats(stats);
+ if (!*stats)
+ return -EFAULT;
+
+ spin_lock(&(*stats)->lock);
+ if (!(*stats)->faults)
+ (*stats)->jiffies = last_fork_crash;
+ spin_unlock(&(*stats)->lock);
+
+ brute_update_crash_period(*stats, now);
+ return 0;
+}
+
+/**
+ * brute_get_crash_period() - Get the application crash period.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
+ * Return: The application crash period.
+ */
+static u64 brute_get_crash_period(struct brute_stats *stats)
+{
+ u64 crash_period;
+
+ spin_lock(&stats->lock);
+ crash_period = stats->period;
+ spin_unlock(&stats->lock);
+
+ return crash_period;
+}
+
+/**
+ * print_exec_attack_running() - Warn about an exec brute force attack.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * Before showing the process name it is mandatory to find a process that holds
+ * a pointer to the exec statistics.
+ *
+ * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
+ */
+static void print_exec_attack_running(const struct brute_stats *stats)
+{
+ struct task_struct *p;
+ struct brute_stats **p_stats;
+ bool found = false;
+
+ for_each_process(p) {
+ p_stats = brute_stats_ptr(p);
+ if (*p_stats == stats) {
+ found = true;
+ break;
+ }
+ }
+
+ if (WARN(!found, "No exec process\n"))
+ return;
+
+ pr_warn("Exec brute force attack detected [%s]\n", p->comm);
+}
+
+/**
+ * brute_manage_exec_attack() - Manage an exec brute force attack.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ * @now: The current timestamp in jiffies.
+ * @last_fork_crash: The last fork crash timestamp before updating it.
+ *
+ * For a correct management of an exec brute force attack it is only necessary
+ * to update the exec statistics and test if an attack is happening based on
+ * these data.
+ *
+ * It is important to note that if the fork and exec crash periods are the same,
+ * the attack test is avoided. This ensures that in a brute force attack that
+ * happens through the fork system call, the mitigation method does not act on
+ * the parent process of the fork hierarchy.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and tasklist_lock and
+ * brute_stats_ptr_lock held.
+ */
+static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
+ u64 last_fork_crash)
+{
+ int ret;
+ struct brute_stats *exec_stats = stats;
+ u64 fork_period;
+ u64 exec_period;
+
+ ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
+ if (WARN(ret, "No exec statistical data\n"))
+ return;
+
+ fork_period = brute_get_crash_period(stats);
+ exec_period = brute_get_crash_period(exec_stats);
+ if (fork_period == exec_period)
+ return;
+
+ if (brute_attack_running(exec_stats))
+ print_exec_attack_running(exec_stats);
+}
+
+/**
+ * brute_task_fatal_signal() - Target for the task_fatal_signal hook.
+ * @siginfo: Contains the signal information.
+ *
+ * To detect a brute force attack it is necessary to update the fork and exec
+ * statistics in every fatal crash and act based on these data.
+ *
+ * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
+ * and brute_stats::lock since the task_free hook can be called from an IRQ
+ * context during the execution of the task_fatal_signal hook.
+ */
+static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
+{
+ struct brute_stats **stats;
+ unsigned long flags;
+ u64 last_fork_crash;
+ u64 now = get_jiffies_64();
+
+ stats = brute_stats_ptr(current);
+ read_lock(&tasklist_lock);
+ read_lock_irqsave(&brute_stats_ptr_lock, flags);
+
+ if (WARN(!*stats, "No statistical data\n")) {
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
+ read_unlock(&tasklist_lock);
+ return;
}
+
+ last_fork_crash = brute_manage_fork_attack(*stats, now);
+ brute_manage_exec_attack(*stats, now, last_fork_crash);
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
+ read_unlock(&tasklist_lock);
}
/*
@@ -235,6 +694,7 @@ static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(task_alloc, brute_task_alloc),
LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
LSM_HOOK_INIT(task_free, brute_task_free),
+ LSM_HOOK_INIT(task_fatal_signal, brute_task_fatal_signal),
};
/**
--
2.25.1
Add a new Kconfig file to define a menu entry under "Security options"
to enable the "Fork brute force attack detection and mitigation"
feature.
For a correct management of a fork brute force attack it is necessary
that all the tasks hold statistical data. The same statistical data
needs to be shared between all the tasks that hold the same memory
contents, or in other words, between all the tasks that have been forked
without any execve call. So, define a statistical data structure to hold
all the necessary information shared by all the fork hierarchy
processes. This info is basically the number of crashes, the last crash
timestamp and the crash period's moving average.
When a forked task calls the execve system call, the memory contents are
set with new values. So, in this scenario the parent's statistical data
does not need to be shared. Instead, a new statistical data structure
must be allocated to start a new hierarchy.
The statistical data that is shared between all the fork hierarchy
processes needs to be freed when this hierarchy disappears.
So, based on all the previous information, define an LSM with three
hooks to manage all the commented cases. These hooks are "task_alloc"
to do the fork management, "bprm_committing_creds" to do the execve
management and "task_free" to release the resources.
Also, add to the task_struct's security blob the pointer to the
statistical data. This way, all the tasks will have access to this
information.
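Summarizing the brute_stats::refc lifecycle that these three hooks
implement (illustrative):

    task_alloc (fork):                 refc n -> n + 1 (parent/child share)
    bprm_committing_creds, refc == 1:  stats reset in place (execve after
                                       an execve)
    bprm_committing_creds, refc > 1:   old stats refc n -> n - 1; the
                                       caller gets a fresh stats, refc == 1
    task_free:                         refc n -> n - 1; kfree() at zero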
Signed-off-by: John Wood <[email protected]>
---
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 12 ++
security/brute/Makefile | 2 +
security/brute/brute.c | 257 ++++++++++++++++++++++++++++++++++++++++
5 files changed, 281 insertions(+), 5 deletions(-)
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..204bb311b1f1 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -240,6 +240,7 @@ source "security/safesetid/Kconfig"
source "security/lockdown/Kconfig"
source "security/integrity/Kconfig"
+source "security/brute/Kconfig"
choice
prompt "First legacy 'major LSM' to be initialized"
@@ -277,11 +278,11 @@ endchoice
config LSM
string "Ordered list of enabled LSMs"
- default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
- default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
- default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
- default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
- default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
help
A comma-separated list of LSMs, in initialization order.
Any LSMs left off this list will be ignored. This can be
diff --git a/security/Makefile b/security/Makefile
index 3baf435de541..1236864876da 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM) += bpf/
# Object integrity file lists
subdir-$(CONFIG_INTEGRITY) += integrity
obj-$(CONFIG_INTEGRITY) += integrity/
+
+# Object brute file lists
+subdir-$(CONFIG_SECURITY_FORK_BRUTE) += brute
+obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute/
diff --git a/security/brute/Kconfig b/security/brute/Kconfig
new file mode 100644
index 000000000000..1bd2df1e2dec
--- /dev/null
+++ b/security/brute/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+config SECURITY_FORK_BRUTE
+ bool "Fork brute force attack detection and mitigation"
+ depends on SECURITY
+ help
+ This is an LSM that stops any fork brute force attack against
+ vulnerable userspace processes. The detection method is based on
+ the application crash period, and as a mitigation procedure all the
+ offending tasks are killed. Like capabilities, this security module
+ stacks with other LSMs.
+
+ If you are unsure how to answer this question, answer N.
diff --git a/security/brute/Makefile b/security/brute/Makefile
new file mode 100644
index 000000000000..d3f233a132a9
--- /dev/null
+++ b/security/brute/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute.o
diff --git a/security/brute/brute.c b/security/brute/brute.c
new file mode 100644
index 000000000000..99d099e45112
--- /dev/null
+++ b/security/brute/brute.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <asm/current.h>
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/lsm_hooks.h>
+#include <linux/printk.h>
+#include <linux/refcount.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+/**
+ * struct brute_stats - Fork brute force attack statistics.
+ * @lock: Lock to protect the brute_stats structure.
+ * @refc: Reference counter.
+ * @faults: Number of crashes.
+ * @jiffies: Last crash timestamp.
+ * @period: Crash period's moving average.
+ *
+ * This structure holds the statistical data shared by all the fork hierarchy
+ * processes.
+ */
+struct brute_stats {
+ spinlock_t lock;
+ refcount_t refc;
+ unsigned char faults;
+ u64 jiffies;
+ u64 period;
+};
+
+/*
+ * brute_blob_sizes - LSM blob sizes.
+ *
+ * To share statistical data among all the fork hierarchy processes, define a
+ * pointer to the brute_stats structure as a part of the task_struct's security
+ * blob.
+ */
+static struct lsm_blob_sizes brute_blob_sizes __lsm_ro_after_init = {
+ .lbs_task = sizeof(struct brute_stats *),
+};
+
+/**
+ * brute_stats_ptr() - Get the pointer to the brute_stats structure.
+ * @task: Task that holds the statistical data.
+ *
+ * Return: A pointer to a pointer to the brute_stats structure.
+ */
+static inline struct brute_stats **brute_stats_ptr(struct task_struct *task)
+{
+ return task->security + brute_blob_sizes.lbs_task;
+}
+
+/**
+ * brute_new_stats() - Allocate a new statistics structure.
+ *
+ * If the allocation is successful the reference counter is set to one to
+ * indicate that there will be one task that points to this structure. Also, the
+ * last crash timestamp is set to now. This way, it is possible to compute the
+ * application crash period at the first fault.
+ *
+ * Return: NULL if the allocation fails. A pointer to the newly allocated
+ * statistics structure if it succeeds.
+ */
+static struct brute_stats *brute_new_stats(void)
+{
+ struct brute_stats *stats;
+
+ stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
+ if (!stats)
+ return NULL;
+
+ spin_lock_init(&stats->lock);
+ refcount_set(&stats->refc, 1);
+ stats->faults = 0;
+ stats->jiffies = get_jiffies_64();
+ stats->period = 0;
+
+ return stats;
+}
+
+/**
+ * brute_share_stats() - Share the statistical data between processes.
+ * @src: Source of statistics to be shared.
+ * @dst: Destination of statistics to be shared.
+ *
+ * Copy the src's pointer to the statistical data structure to the dst's pointer
+ * to the same structure. Since there is a new process that shares the same
+ * data, increase the reference counter. The src's pointer cannot be NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_alloc hook.
+ */
+static void brute_share_stats(struct brute_stats *src,
+ struct brute_stats **dst)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&src->lock, flags);
+ refcount_inc(&src->refc);
+ *dst = src;
+ spin_unlock_irqrestore(&src->lock, flags);
+}
+
+/**
+ * brute_task_alloc() - Target for the task_alloc hook.
+ * @task: Task being allocated.
+ * @clone_flags: Contains the flags indicating what should be shared.
+ *
+ * For a correct management of a fork brute force attack it is necessary that
+ * all the tasks hold statistical data. The same statistical data needs to be
+ * shared between all the tasks that hold the same memory contents or in other
+ * words, between all the tasks that have been forked without any execve call.
+ *
+ * To ensure this, if the current task doesn't have statistical data when it forks,
+ * it is mandatory to allocate a new statistics structure and share it between
+ * this task and the new one being allocated. Otherwise, share the statistics
+ * that the current task already has.
+ *
+ * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
+ * otherwise.
+ */
+static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
+{
+ struct brute_stats **stats, **p_stats;
+
+ stats = brute_stats_ptr(task);
+ p_stats = brute_stats_ptr(current);
+
+ if (likely(*p_stats)) {
+ brute_share_stats(*p_stats, stats);
+ return 0;
+ }
+
+ *stats = brute_new_stats();
+ if (!*stats)
+ return -ENOMEM;
+
+ brute_share_stats(*stats, p_stats);
+ return 0;
+}
+
+/**
+ * brute_task_execve() - Target for the bprm_committing_creds hook.
+ * @bprm: Points to the linux_binprm structure.
+ *
+ * When a forked task calls the execve system call, the memory contents are set
+ * with new values. So, in this scenario the parent's statistical data does not
+ * need to be shared. Instead, a new statistical data structure must be allocated
+ * to start a new hierarchy. This condition is detected when the statistics
+ * reference counter holds a value greater than or equal to two (a fork always
+ * sets the statistics reference counter to a minimum of two since the parent
+ * and the child task are sharing the same data).
+ *
+ * However, if the execve function is called immediately after another execve
+ * call, although the memory contents are reset, there is no need to allocate
+ * a new statistical data structure. This is possible because at this moment
+ * only one task (the task that calls the execve function) points to the data.
+ * In this case, the previous allocation is used but the statistics are reset.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the bprm_committing_creds hook.
+ */
+static void brute_task_execve(struct linux_binprm *bprm)
+{
+ struct brute_stats **stats;
+ unsigned long flags;
+
+ stats = brute_stats_ptr(current);
+ if (WARN(!*stats, "No statistical data\n"))
+ return;
+
+ spin_lock_irqsave(&(*stats)->lock, flags);
+
+ if (!refcount_dec_not_one(&(*stats)->refc)) {
+ /* execve call after an execve call */
+ (*stats)->faults = 0;
+ (*stats)->jiffies = get_jiffies_64();
+ (*stats)->period = 0;
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+ return;
+ }
+
+ /* execve call after a fork call */
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+ *stats = brute_new_stats();
+ WARN(!*stats, "Cannot allocate statistical data\n");
+}
+
+/**
+ * brute_task_free() - Target for the task_free hook.
+ * @task: Task about to be freed.
+ *
+ * The statistical data that is shared between all the fork hierarchy processes
+ * needs to be freed when this hierarchy disappears.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_free hook.
+ */
+static void brute_task_free(struct task_struct *task)
+{
+ struct brute_stats **stats;
+ unsigned long flags;
+ bool refc_is_zero;
+
+ stats = brute_stats_ptr(task);
+ if (WARN(!*stats, "No statistical data\n"))
+ return;
+
+ spin_lock_irqsave(&(*stats)->lock, flags);
+ refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+
+ if (refc_is_zero) {
+ kfree(*stats);
+ *stats = NULL;
+ }
+}
+
+/*
+ * brute_hooks - Targets for the LSM's hooks.
+ */
+static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
+ LSM_HOOK_INIT(task_alloc, brute_task_alloc),
+ LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
+ LSM_HOOK_INIT(task_free, brute_task_free),
+};
+
+/**
+ * brute_init() - Initialize the brute LSM.
+ *
+ * Return: Always returns zero.
+ */
+static int __init brute_init(void)
+{
+ pr_info("Brute initialized\n");
+ security_add_hooks(brute_hooks, ARRAY_SIZE(brute_hooks),
+ KBUILD_MODNAME);
+ return 0;
+}
+
+DEFINE_LSM(brute) = {
+ .name = KBUILD_MODNAME,
+ .init = brute_init,
+ .blobs = &brute_blob_sizes,
+};
--
2.25.1
To avoid false positives during the attack detection it is necessary to
narrow the possible cases. Only the following scenarios are taken into
account:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is obtained (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is obtained (e.g. what CTFs do for a
simple network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, this patch checks whether any of these privilege boundaries have
been crossed before computing the application crash period.
Also, on every fatal crash only the signals delivered by the kernel are
taken into account, with the exception of the SIGABRT signal, since the
latter is used by glibc for stack canary, malloc, etc. failures, which
may indicate that a mitigation has been triggered.
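In terms of the flags added by this patch, the gating reduces to the
following sketch (names as in the diff below):

    /* Only crashes that crossed a privilege boundary feed the stats. */
    bounds_crossed = network_to_local || is_setid ||
                     brute_priv_have_changed(stats);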
Signed-off-by: John Wood <[email protected]>
---
security/brute/brute.c | 293 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 280 insertions(+), 13 deletions(-)
diff --git a/security/brute/brute.c b/security/brute/brute.c
index 870db55332d4..38e5e050964a 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -3,15 +3,25 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <asm/current.h>
+#include <asm/rwonce.h>
+#include <asm/siginfo.h>
+#include <asm/signal.h>
+#include <linux/binfmts.h>
#include <linux/bug.h>
#include <linux/compiler.h>
+#include <linux/cred.h>
+#include <linux/dcache.h>
#include <linux/errno.h>
+#include <linux/fs.h>
#include <linux/gfp.h>
+#include <linux/if.h>
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/lsm_hooks.h>
#include <linux/math64.h>
+#include <linux/netdevice.h>
+#include <linux/path.h>
#include <linux/printk.h>
#include <linux/refcount.h>
#include <linux/rwlock.h>
@@ -19,9 +29,35 @@
#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/sched/task.h>
+#include <linux/signal.h>
+#include <linux/skbuff.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
+#include <linux/stat.h>
#include <linux/types.h>
+#include <linux/uidgid.h>
+
+/**
+ * struct brute_cred - Saved credentials.
+ * @uid: Real UID of the task.
+ * @gid: Real GID of the task.
+ * @suid: Saved UID of the task.
+ * @sgid: Saved GID of the task.
+ * @euid: Effective UID of the task.
+ * @egid: Effective GID of the task.
+ * @fsuid: UID for VFS ops.
+ * @fsgid: GID for VFS ops.
+ */
+struct brute_cred {
+ kuid_t uid;
+ kgid_t gid;
+ kuid_t suid;
+ kgid_t sgid;
+ kuid_t euid;
+ kgid_t egid;
+ kuid_t fsuid;
+ kgid_t fsgid;
+};
/**
* struct brute_stats - Fork brute force attack statistics.
@@ -30,6 +66,9 @@
* @faults: Number of crashes.
* @jiffies: Last crash timestamp.
* @period: Crash period's moving average.
+ * @saved_cred: Saved credentials.
+ * @network: Network activity flag.
+ * @bounds_crossed: Privilege bounds crossed flag.
*
* This structure holds the statistical data shared by all the fork hierarchy
* processes.
@@ -40,6 +79,9 @@ struct brute_stats {
unsigned char faults;
u64 jiffies;
u64 period;
+ struct brute_cred saved_cred;
+ unsigned char network : 1;
+ unsigned char bounds_crossed : 1;
};
/*
@@ -71,18 +113,25 @@ static inline struct brute_stats **brute_stats_ptr(struct task_struct *task)
/**
* brute_new_stats() - Allocate a new statistics structure.
+ * @network_to_local: Network activity followed by a fork or execve system call.
+ * @is_setid: The executable file has the setid flags set.
*
* If the allocation is successful the reference counter is set to one to
* indicate that there will be one task that points to this structure. Also, the
* last crash timestamp is set to now. This way, it is possible to compute the
* application crash period at the first fault.
*
+ * Moreover, the credentials of the current task are saved. Also, the network
+ * and bounds_crossed flags are set based on the network_to_local and is_setid
+ * parameters.
+ *
* Return: NULL if the allocation fails. A pointer to the newly allocated
* statistics structure if it succeeds.
*/
-static struct brute_stats *brute_new_stats(void)
+static struct brute_stats *brute_new_stats(bool network_to_local, bool is_setid)
{
struct brute_stats *stats;
+ const struct cred *cred = current_cred();
stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
if (!stats)
@@ -93,6 +142,16 @@ static struct brute_stats *brute_new_stats(void)
stats->faults = 0;
stats->jiffies = get_jiffies_64();
stats->period = 0;
+ stats->saved_cred.uid = cred->uid;
+ stats->saved_cred.gid = cred->gid;
+ stats->saved_cred.suid = cred->suid;
+ stats->saved_cred.sgid = cred->sgid;
+ stats->saved_cred.euid = cred->euid;
+ stats->saved_cred.egid = cred->egid;
+ stats->saved_cred.fsuid = cred->fsuid;
+ stats->saved_cred.fsgid = cred->fsgid;
+ stats->network = network_to_local;
+ stats->bounds_crossed = network_to_local || is_setid;
return stats;
}
@@ -137,6 +196,10 @@ static void brute_share_stats(struct brute_stats *src,
* this task and the new one being allocated. Otherwise, share the statistics
* that the current task already has.
*
+ * Also, if the shared statistics indicate a previous network activity, the
+ * bounds_crossed flag must be set to show that a network-to-local privilege
+ * boundary has been crossed.
+ *
* It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
* and brute_stats::lock since the task_free hook can be called from an IRQ
* context during the execution of the task_alloc hook.
@@ -155,11 +218,14 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
if (likely(*p_stats)) {
brute_share_stats(*p_stats, stats);
+ spin_lock(&(*stats)->lock);
+ (*stats)->bounds_crossed |= (*stats)->network;
+ spin_unlock(&(*stats)->lock);
write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return 0;
}
- *stats = brute_new_stats();
+ *stats = brute_new_stats(false, false);
if (!*stats) {
write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return -ENOMEM;
@@ -170,6 +236,61 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
return 0;
}
+/**
+ * brute_is_setid() - Test if the executable file has the setid flags set.
+ * @bprm: Points to the linux_binprm structure.
+ *
+ * Return: True if the executable file has the setid flags set. False otherwise.
+ */
+static bool brute_is_setid(const struct linux_binprm *bprm)
+{
+ struct file *file = bprm->file;
+ struct inode *inode;
+ umode_t mode;
+
+ if (!file)
+ return false;
+
+ inode = file->f_path.dentry->d_inode;
+ mode = inode->i_mode;
+
+ return !!(mode & (S_ISUID | S_ISGID));
+}
+
+/**
+ * brute_reset_stats() - Reset the statistical data.
+ * @stats: Statistics to be reset.
+ * @is_setid: The executable file has the setid flags set.
+ *
+ * Reset the faults and period and set the last crash timestamp to now. This
+ * way, it is possible to compute the application crash period at the next
+ * fault. Also, save the credentials of the current task and update the
+ * bounds_crossed flag based on a previous network activity and the is_setid
+ * parameter.
+ *
+ * The statistics to be reset cannot be NULL.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * and brute_stats::lock held.
+ */
+static void brute_reset_stats(struct brute_stats *stats, bool is_setid)
+{
+ const struct cred *cred = current_cred();
+
+ stats->faults = 0;
+ stats->jiffies = get_jiffies_64();
+ stats->period = 0;
+ stats->saved_cred.uid = cred->uid;
+ stats->saved_cred.gid = cred->gid;
+ stats->saved_cred.suid = cred->suid;
+ stats->saved_cred.sgid = cred->sgid;
+ stats->saved_cred.euid = cred->euid;
+ stats->saved_cred.egid = cred->egid;
+ stats->saved_cred.fsuid = cred->fsuid;
+ stats->saved_cred.fsgid = cred->fsgid;
+ stats->bounds_crossed = stats->network || is_setid;
+}
+
/**
* brute_task_execve() - Target for the bprm_committing_creds hook.
* @bprm: Points to the linux_binprm structure.
@@ -188,6 +309,11 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
* only one task (the task that calls the execve function) points to the data.
* In this case, the previous allocation is used but the statistics are reset.
*
+ * Also, if the statistics of the process that calls the execve system call
+ * indicate a previous network activity or the executable file has the setid
+ * flags set, the bounds_crossed flag must be set to show that a network-to-local
+ * privilege boundary or a setid boundary has been crossed, respectively.
+ *
* It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
* and brute_stats::lock since the task_free hook can be called from an IRQ
* context during the execution of the bprm_committing_creds hook.
@@ -196,6 +322,8 @@ static void brute_task_execve(struct linux_binprm *bprm)
{
struct brute_stats **stats;
unsigned long flags;
+ bool network_to_local;
+ bool is_setid = false;
stats = brute_stats_ptr(current);
read_lock_irqsave(&brute_stats_ptr_lock, flags);
@@ -206,12 +334,18 @@ static void brute_task_execve(struct linux_binprm *bprm)
}
spin_lock(&(*stats)->lock);
+ network_to_local = (*stats)->network;
+
+ /*
+ * A network_to_local flag equal to true will set the bounds_crossed
+ * flag. So, in this scenario the "is setid" test can be avoided.
+ */
+ if (!network_to_local)
+ is_setid = brute_is_setid(bprm);
if (!refcount_dec_not_one(&(*stats)->refc)) {
/* execve call after an execve call */
- (*stats)->faults = 0;
- (*stats)->jiffies = get_jiffies_64();
- (*stats)->period = 0;
+ brute_reset_stats(*stats, is_setid);
spin_unlock(&(*stats)->lock);
read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
return;
@@ -222,7 +356,7 @@ static void brute_task_execve(struct linux_binprm *bprm)
read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
write_lock_irqsave(&brute_stats_ptr_lock, flags);
- *stats = brute_new_stats();
+ *stats = brute_new_stats(network_to_local, is_setid);
WARN(!*stats, "Cannot allocate statistical data\n");
write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
}
@@ -653,12 +787,103 @@ static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
print_exec_attack_running(exec_stats);
}
+/**
+ * brute_priv_have_changed() - Test if the privileges have changed.
+ * @stats: Statistics that hold the saved credentials.
+ *
+ * The privileges have changed if the credentials of the current task are
+ * different from the credentials saved in the statistics structure.
+ *
+ * The statistics that hold the saved credentials cannot be NULL.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * and brute_stats::lock held.
+ * Return: True if the privileges have changed. False otherwise.
+ */
+static bool brute_priv_have_changed(struct brute_stats *stats)
+{
+ const struct cred *cred = current_cred();
+ bool priv_have_changed;
+
+ priv_have_changed = !uid_eq(stats->saved_cred.uid, cred->uid) ||
+ !gid_eq(stats->saved_cred.gid, cred->gid) ||
+ !uid_eq(stats->saved_cred.suid, cred->suid) ||
+ !gid_eq(stats->saved_cred.sgid, cred->sgid) ||
+ !uid_eq(stats->saved_cred.euid, cred->euid) ||
+ !gid_eq(stats->saved_cred.egid, cred->egid) ||
+ !uid_eq(stats->saved_cred.fsuid, cred->fsuid) ||
+ !gid_eq(stats->saved_cred.fsgid, cred->fsgid);
+
+ return priv_have_changed;
+}
+
+/**
+ * brute_threat_model_supported() - Test if the threat model is supported.
+ * @siginfo: Contains the signal information.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * To avoid false positives during the attack detection it is necessary to
+ * narrow the possible cases. Only the following scenarios are taken into
+ * account:
+ *
+ * 1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
+ * desirable memory layout is achieved (e.g. Stack Clash).
+ * 2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until
+ * a desirable memory layout is achieved (e.g. what CTFs do for simple
+ * network service).
+ * 3.- Launching processes without exec() (e.g. Android Zygote) and exposing
+ * state to attack a sibling.
+ * 4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
+ * the previously shared memory layout of all the other children is exposed
+ * (e.g. kind of related to HeartBleed).
+ *
+ * In each case, a privilege boundary has been crossed:
+ *
+ * Case 1: setuid/setgid process
+ * Case 2: network to local
+ * Case 3: privilege changes
+ * Case 4: network to local
+ *
+ * Also, only the signals delivered by the kernel are taken into account, with
+ * the exception of the SIGABRT signal, since the latter is used by glibc for
+ * stack canary, malloc, etc. failures, which may indicate that a mitigation
+ * has been triggered.
+ *
+ * The signal information and the statistical data shared by all the fork
+ * hierarchy processes cannot be NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
+ * Return: True if the threat model is supported. False otherwise.
+ */
+static bool brute_threat_model_supported(const kernel_siginfo_t *siginfo,
+ struct brute_stats *stats)
+{
+ bool bounds_crossed;
+
+ if (siginfo->si_code <= 0 && siginfo->si_signo != SIGABRT)
+ return false;
+
+ spin_lock(&stats->lock);
+ bounds_crossed = stats->bounds_crossed;
+ bounds_crossed = bounds_crossed || brute_priv_have_changed(stats);
+ stats->bounds_crossed = bounds_crossed;
+ spin_unlock(&stats->lock);
+
+ return bounds_crossed;
+}
+
/**
* brute_task_fatal_signal() - Target for the task_fatal_signal hook.
* @siginfo: Contains the signal information.
*
- * To detect a brute force attack is necessary to update the fork and exec
- * statistics in every fatal crash and act based on these data.
+ * To detect a brute force attack it is necessary, as a first step, to test in
+ * every fatal crash whether the threat model is supported. If so, update the fork
+ * and exec statistics and act based on these data.
*
* It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
* and brute_stats::lock since the task_free hook can be called from an IRQ
@@ -675,18 +900,59 @@ static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
read_lock(&tasklist_lock);
read_lock_irqsave(&brute_stats_ptr_lock, flags);
- if (WARN(!*stats, "No statistical data\n")) {
- read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
- read_unlock(&tasklist_lock);
- return;
- }
+ if (WARN(!*stats, "No statistical data\n"))
+ goto unlock;
+
+ if (!brute_threat_model_supported(siginfo, *stats))
+ goto unlock;
last_fork_crash = brute_manage_fork_attack(*stats, now);
brute_manage_exec_attack(*stats, now, last_fork_crash);
+unlock:
read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
read_unlock(&tasklist_lock);
}
+/**
+ * brute_network() - Target for the socket_sock_rcv_skb hook.
+ * @sk: Contains the sock (not socket) associated with the incoming sk_buff.
+ * @skb: Contains the incoming network data.
+ *
+ * A first step to detect that a network to local boundary has been crossed
+ * is to detect network activity. To do this, it is only necessary to check
+ * whether data packets have been received from a network device other than
+ * the loopback device.
+ *
+ * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
+ * and brute_stats::lock since the task_free hook can be called from an IRQ
+ * context during the execution of the socket_sock_rcv_skb hook.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. Zero
+ * otherwise.
+ */
+static int brute_network(struct sock *sk, struct sk_buff *skb)
+{
+ struct brute_stats **stats;
+ unsigned long flags;
+
+ if (!skb->dev || (skb->dev->flags & IFF_LOOPBACK))
+ return 0;
+
+ stats = brute_stats_ptr(current);
+ read_lock_irqsave(&brute_stats_ptr_lock, flags);
+
+ if (!*stats) {
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
+ return -EFAULT;
+ }
+
+ spin_lock(&(*stats)->lock);
+ (*stats)->network = true;
+ spin_unlock(&(*stats)->lock);
+ read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
+ return 0;
+}
+
/*
* brute_hooks - Targets for the LSM's hooks.
*/
@@ -695,6 +961,7 @@ static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
LSM_HOOK_INIT(task_free, brute_task_free),
LSM_HOOK_INIT(task_fatal_signal, brute_task_fatal_signal),
+ LSM_HOOK_INIT(socket_sock_rcv_skb, brute_network),
};
/**
--
2.25.1
In order to mitigate a brute force attack, all the offending tasks involved in
the attack must be killed. In other words, it is necessary to kill all the
tasks that share the fork and/or exec statistical data related to the attack.
Moreover, if the attack happens through the fork system call, the processes
that have the same group_leader as the current task (the task that has
crashed) must be skipped since they are already on the path to being killed.
When the SIGKILL signal is sent to the offending tasks, the function
"brute_kill_offending_tasks" will be called recursively from the
task_fatal_signal LSM hook due to the resulting small crash period. So, to
avoid killing the same tasks again from a recursive call of this function, it
is necessary to disable the attack detection for the involved hierarchies.
To disable the attack detection, set the last crash timestamp to zero and skip
the computation of the application crash period in this case.
Signed-off-by: John Wood <[email protected]>
---
security/brute/brute.c | 141 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 132 insertions(+), 9 deletions(-)
diff --git a/security/brute/brute.c b/security/brute/brute.c
index 38e5e050964a..36a3286a02dd 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -22,6 +22,7 @@
#include <linux/math64.h>
#include <linux/netdevice.h>
#include <linux/path.h>
+#include <linux/pid.h>
#include <linux/printk.h>
#include <linux/refcount.h>
#include <linux/rwlock.h>
@@ -64,7 +65,7 @@ struct brute_cred {
* @lock: Lock to protect the brute_stats structure.
* @refc: Reference counter.
* @faults: Number of crashes.
- * @jiffies: Last crash timestamp.
+ * @jiffies: Last crash timestamp. If zero, the attack detection is disabled.
* @period: Crash period's moving average.
* @saved_cred: Saved credentials.
* @network: Network activity flag.
@@ -571,6 +572,125 @@ static inline void print_fork_attack_running(void)
pr_warn("Fork brute force attack detected [%s]\n", current->comm);
}
+/**
+ * brute_disabled() - Test if the brute force attack detection is disabled.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * The brute force attack detection enabling/disabling is based on the last
+ * crash timestamp. A zero timestamp indicates that this feature is disabled. A
+ * timestamp greater than zero indicates that the attack detection is enabled.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * held.
+ * Return: True if the brute force attack detection is disabled. False
+ * otherwise.
+ */
+static bool brute_disabled(struct brute_stats *stats)
+{
+ bool disabled;
+
+ spin_lock(&stats->lock);
+ disabled = !stats->jiffies;
+ spin_unlock(&stats->lock);
+
+ return disabled;
+}
+
+/**
+ * brute_disable() - Disable the brute force attack detection.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * To disable the brute force attack detection it is only necessary to set the
+ * last crash timestamp to zero. A zero timestamp indicates that this feature is
+ * disabled. A timestamp greater than zero indicates that the attack detection
+ * is enabled.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
+ * and brute_stats::lock held.
+ */
+static inline void brute_disable(struct brute_stats *stats)
+{
+ stats->jiffies = 0;
+}
+
+/**
+ * enum brute_attack_type - Brute force attack type.
+ * @BRUTE_ATTACK_TYPE_FORK: Attack that happens through the fork system call.
+ * @BRUTE_ATTACK_TYPE_EXEC: Attack that happens through the execve system call.
+ */
+enum brute_attack_type {
+ BRUTE_ATTACK_TYPE_FORK,
+ BRUTE_ATTACK_TYPE_EXEC,
+};
+
+/**
+ * brute_kill_offending_tasks() - Kill the offending tasks.
+ * @attack_type: Brute force attack type.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * When a brute force attack is detected all the offending tasks involved in the
+ * attack must be killed. In other words, it is necessary to kill all the tasks
+ * that share the same statistical data. Moreover, if the attack happens through
+ * the fork system call, the processes that have the same group_leader as the
+ * current task must be skipped since they are already on the path to being
+ * killed.
+ *
+ * When the SIGKILL signal is sent to the offending tasks, this function will be
+ * called again from the task_fatal_signal hook due to a small crash period. So,
+ * to avoid killing the same tasks again due to a recursive call of this function,
+ * it is necessary to disable the attack detection for this fork hierarchy.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the brute_stats::lock
+ * since the task_free hook can be called from an IRQ context during the
+ * execution of the task_fatal_signal hook.
+ *
+ * Context: Must be called with interrupts disabled and tasklist_lock and
+ * brute_stats_ptr_lock held.
+ */
+static void brute_kill_offending_tasks(enum brute_attack_type attack_type,
+ struct brute_stats *stats)
+{
+ struct task_struct *p;
+ struct brute_stats **p_stats;
+
+ spin_lock(&stats->lock);
+
+ if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
+ refcount_read(&stats->refc) == 1) {
+ spin_unlock(&stats->lock);
+ return;
+ }
+
+ brute_disable(stats);
+ spin_unlock(&stats->lock);
+
+ for_each_process(p) {
+ if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
+ p->group_leader == current->group_leader)
+ continue;
+
+ p_stats = brute_stats_ptr(p);
+ if (*p_stats != stats)
+ continue;
+
+ do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
+ pr_warn_ratelimited("Offending process %d [%s] killed\n",
+ p->pid, p->comm);
+ }
+}
+
/**
* brute_manage_fork_attack() - Manage a fork brute force attack.
* @stats: Statistical data shared by all the fork hierarchy processes.
@@ -586,8 +706,8 @@ static inline void print_fork_attack_running(void)
* since the task_free hook can be called from an IRQ context during the
* execution of the task_fatal_signal hook.
*
- * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
- * held.
+ * Context: Must be called with interrupts disabled and tasklist_lock and
+ * brute_stats_ptr_lock held.
* Return: The last crash timestamp before updating it.
*/
static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
@@ -595,8 +715,10 @@ static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
u64 last_fork_crash;
last_fork_crash = brute_update_crash_period(stats, now);
- if (brute_attack_running(stats))
+ if (brute_attack_running(stats)) {
print_fork_attack_running();
+ brute_kill_offending_tasks(BRUTE_ATTACK_TYPE_FORK, stats);
+ }
return last_fork_crash;
}
@@ -783,8 +905,10 @@ static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
if (fork_period == exec_period)
return;
- if (brute_attack_running(exec_stats))
+ if (brute_attack_running(exec_stats)) {
print_exec_attack_running(exec_stats);
+ brute_kill_offending_tasks(BRUTE_ATTACK_TYPE_EXEC, exec_stats);
+ }
}
/**
@@ -900,10 +1024,9 @@ static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
read_lock(&tasklist_lock);
read_lock_irqsave(&brute_stats_ptr_lock, flags);
- if (WARN(!*stats, "No statistical data\n"))
- goto unlock;
-
- if (!brute_threat_model_supported(siginfo, *stats))
+ if (WARN(!*stats, "No statistical data\n") ||
+ brute_disabled(*stats) ||
+ !brute_threat_model_supported(siginfo, *stats))
goto unlock;
last_fork_crash = brute_manage_fork_attack(*stats, now);
--
2.25.1
Add tests to check the brute LSM functionality and cover fork/exec brute
force attacks crossing the following privilege boundaries:
1.- setuid process
2.- privilege changes
3.- network to local
Also, as a first step, check that fork/exec brute force attacks that do not
cross any of the privilege boundaries listed above don't trigger the
detection and mitigation stage.
All the fork brute force attacks are carried out via the "exec" app to avoid
triggering the "brute" LSM on the shell script that runs the tests.
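For example, the fork brute force attack used by these tests (kernel signals,
no privilege boundary crossed) can be reproduced by hand with:

  ./exec test -m crash -c fork -s kernel -n 20

where 20 is the number of crashes that the fork hierarchy will accumulate.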
Signed-off-by: John Wood <[email protected]>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 ++
tools/testing/selftests/brute/test.c | 507 +++++++++++++++++++++++
tools/testing/selftests/brute/test.sh | 226 ++++++++++
7 files changed, 786 insertions(+)
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6c575cf34a71..d4cf9e1c0a6d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -2,6 +2,7 @@
TARGETS = arm64
TARGETS += bpf
TARGETS += breakpoints
+TARGETS += brute
TARGETS += capabilities
TARGETS += cgroup
TARGETS += clone3
diff --git a/tools/testing/selftests/brute/.gitignore b/tools/testing/selftests/brute/.gitignore
new file mode 100644
index 000000000000..1ccc45251a1b
--- /dev/null
+++ b/tools/testing/selftests/brute/.gitignore
@@ -0,0 +1,2 @@
+exec
+test
diff --git a/tools/testing/selftests/brute/Makefile b/tools/testing/selftests/brute/Makefile
new file mode 100644
index 000000000000..52662d0b484c
--- /dev/null
+++ b/tools/testing/selftests/brute/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -Wall -O2
+TEST_PROGS := test.sh
+TEST_GEN_FILES := exec test
+include ../lib.mk
diff --git a/tools/testing/selftests/brute/config b/tools/testing/selftests/brute/config
new file mode 100644
index 000000000000..3587b7bf6c23
--- /dev/null
+++ b/tools/testing/selftests/brute/config
@@ -0,0 +1 @@
+CONFIG_SECURITY_FORK_BRUTE=y
diff --git a/tools/testing/selftests/brute/exec.c b/tools/testing/selftests/brute/exec.c
new file mode 100644
index 000000000000..1bbe72f6e4bd
--- /dev/null
+++ b/tools/testing/selftests/brute/exec.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+static __attribute__((noreturn)) void error_failure(const char *message)
+{
+ perror(message);
+ exit(EXIT_FAILURE);
+}
+
+#define PROG_NAME basename(argv[0])
+
+int main(int argc, char **argv)
+{
+ pid_t pid;
+ int status;
+
+ if (argc < 2) {
+ printf("Usage: %s <EXECUTABLE>\n", PROG_NAME);
+ exit(EXIT_FAILURE);
+ }
+
+ pid = fork();
+ if (pid < 0)
+ error_failure("fork");
+
+ /* Child process */
+ if (!pid) {
+ execve(argv[1], &argv[1], NULL);
+ error_failure("execve");
+ }
+
+ /* Parent process */
+ pid = waitpid(pid, &status, 0);
+ if (pid < 0)
+ error_failure("waitpid");
+
+ return EXIT_SUCCESS;
+}
diff --git a/tools/testing/selftests/brute/test.c b/tools/testing/selftests/brute/test.c
new file mode 100644
index 000000000000..44c32f446dca
--- /dev/null
+++ b/tools/testing/selftests/brute/test.c
@@ -0,0 +1,507 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <arpa/inet.h>
+#include <errno.h>
+#include <libgen.h>
+#include <pwd.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+static const char *message = "message";
+
+enum mode {
+ MODE_NONE,
+ MODE_CRASH,
+ MODE_SERVER_CRASH,
+ MODE_CLIENT,
+};
+
+enum crash_after {
+ CRASH_AFTER_NONE,
+ CRASH_AFTER_FORK,
+ CRASH_AFTER_EXEC,
+};
+
+enum signal_from {
+ SIGNAL_FROM_NONE,
+ SIGNAL_FROM_USER,
+ SIGNAL_FROM_KERNEL,
+};
+
+struct args {
+ uint32_t ip;
+ uint16_t port;
+ int counter;
+ long timeout;
+ enum mode mode;
+ enum crash_after crash_after;
+ enum signal_from signal_from;
+ unsigned char has_counter : 1;
+ unsigned char has_change_priv : 1;
+ unsigned char has_ip : 1;
+ unsigned char has_port : 1;
+ unsigned char has_timeout : 1;
+};
+
+#define OPT_STRING "hm:c:s:n:Ca:p:t:"
+
+static void usage(const char *prog)
+{
+ printf("Usage: %s <OPTIONS>\n", prog);
+ printf("OPTIONS:\n");
+ printf(" -h: Show this help and exit. Optional.\n");
+ printf(" -m (crash | server_crash | client): Mode. Required.\n");
+ printf("Options for crash mode:\n");
+ printf(" -c (fork | exec): Crash after. Optional.\n");
+ printf(" -s (user | kernel): Signal from. Required.\n");
+ printf(" -n counter: Number of crashes.\n");
+ printf(" Required if the option -c is used.\n");
+ printf(" Not used without the option -c.\n");
+ printf(" Range from 1 to INT_MAX.\n");
+ printf(" -C: Change privileges before crash. Optional.\n");
+ printf("Options for server_crash mode:\n");
+ printf(" -a ip: Ip v4 address to accept. Required.\n");
+ printf(" -p port: Port number. Required.\n");
+ printf(" Range from 1 to UINT16_MAX.\n");
+ printf(" -t secs: Accept timeout. Required.\n");
+ printf(" Range from 1 to LONG_MAX.\n");
+ printf(" -c (fork | exec): Crash after. Required.\n");
+ printf(" -s (user | kernel): Signal from. Required.\n");
+ printf(" -n counter: Number of crashes. Required.\n");
+ printf(" Range from 1 to INT_MAX.\n");
+ printf("Options for client mode:\n");
+ printf(" -a ip: Ip v4 address to connect. Required.\n");
+ printf(" -p port: Port number. Required.\n");
+ printf(" Range from 1 to UINT16_MAX.\n");
+ printf(" -t secs: Connect timeout. Required.\n");
+ printf(" Range from 1 to LONG_MAX.\n");
+}
+
+static __attribute__((noreturn)) void info_failure(const char *message,
+ const char *prog)
+{
+ printf("%s\n", message);
+ usage(prog);
+ exit(EXIT_FAILURE);
+}
+
+static enum mode get_mode(const char *text, const char *prog)
+{
+ if (!strcmp(text, "crash"))
+ return MODE_CRASH;
+
+ if (!strcmp(text, "server_crash"))
+ return MODE_SERVER_CRASH;
+
+ if (!strcmp(text, "client"))
+ return MODE_CLIENT;
+
+ info_failure("Invalid mode option [-m].", prog);
+}
+
+static enum crash_after get_crash_after(const char *text, const char *prog)
+{
+ if (!strcmp(text, "fork"))
+ return CRASH_AFTER_FORK;
+
+ if (!strcmp(text, "exec"))
+ return CRASH_AFTER_EXEC;
+
+ info_failure("Invalid crash after option [-c].", prog);
+}
+
+static enum signal_from get_signal_from(const char *text, const char *prog)
+{
+ if (!strcmp(text, "user"))
+ return SIGNAL_FROM_USER;
+
+ if (!strcmp(text, "kernel"))
+ return SIGNAL_FROM_KERNEL;
+
+ info_failure("Invalid signal from option [-s]", prog);
+}
+
+static int get_counter(const char *text, const char *prog)
+{
+ int counter;
+
+ counter = atoi(text);
+ if (counter > 0)
+ return counter;
+
+ info_failure("Invalid counter option [-n].", prog);
+}
+
+static __attribute__((noreturn)) void error_failure(const char *message)
+{
+ perror(message);
+ exit(EXIT_FAILURE);
+}
+
+static uint32_t get_ip(const char *text, const char *prog)
+{
+ int ret;
+ uint32_t ip;
+
+ ret = inet_pton(AF_INET, text, &ip);
+ if (!ret)
+ info_failure("Invalid ip option [-a].", prog);
+ else if (ret < 0)
+ error_failure("inet_pton");
+
+ return ip;
+}
+
+static uint16_t get_port(const char *text, const char *prog)
+{
+ long port;
+
+ port = atol(text);
+ if ((port > 0) && (port <= UINT16_MAX))
+ return htons(port);
+
+ info_failure("Invalid port option [-p].", prog);
+}
+
+static long get_timeout(const char *text, const char *prog)
+{
+ long timeout;
+
+ timeout = atol(text);
+ if (timeout > 0)
+ return timeout;
+
+ info_failure("Invalid timeout option [-t].", prog);
+}
+
+static void check_args(const struct args *args, const char *prog)
+{
+ if (args->mode == MODE_CRASH && args->crash_after != CRASH_AFTER_NONE &&
+ args->signal_from != SIGNAL_FROM_NONE && args->has_counter &&
+ !args->has_ip && !args->has_port && !args->has_timeout)
+ return;
+
+ if (args->mode == MODE_CRASH && args->signal_from != SIGNAL_FROM_NONE &&
+ args->crash_after == CRASH_AFTER_NONE && !args->has_counter &&
+ !args->has_ip && !args->has_port && !args->has_timeout)
+ return;
+
+ if (args->mode == MODE_SERVER_CRASH && args->has_ip && args->has_port &&
+ args->has_timeout && args->crash_after != CRASH_AFTER_NONE &&
+ args->signal_from != SIGNAL_FROM_NONE && args->has_counter &&
+ !args->has_change_priv)
+ return;
+
+ if (args->mode == MODE_CLIENT && args->has_ip && args->has_port &&
+ args->has_timeout && args->crash_after == CRASH_AFTER_NONE &&
+ args->signal_from == SIGNAL_FROM_NONE && !args->has_counter &&
+ !args->has_change_priv)
+ return;
+
+ info_failure("Invalid use of options.", prog);
+}
+
+static uid_t get_non_root_uid(void)
+{
+ struct passwd *pwent;
+ uid_t uid;
+
+ while (true) {
+ errno = 0;
+ pwent = getpwent();
+ if (!pwent) {
+ if (errno) {
+ perror("getpwent");
+ endpwent();
+ exit(EXIT_FAILURE);
+ }
+ break;
+ }
+
+ if (pwent->pw_uid) {
+ uid = pwent->pw_uid;
+ endpwent();
+ return uid;
+ }
+ }
+
+ endpwent();
+ printf("A user different of root is needed.\n");
+ exit(EXIT_FAILURE);
+}
+
+static inline void do_sigsegv(void)
+{
+ int *p = NULL;
+ *p = 0;
+}
+
+static void do_sigkill(void)
+{
+ int ret;
+
+ ret = kill(getpid(), SIGKILL);
+ if (ret)
+ error_failure("kill");
+}
+
+static void crash(enum signal_from signal_from, bool change_priv)
+{
+ int ret;
+
+ if (change_priv) {
+ ret = setuid(get_non_root_uid());
+ if (ret)
+ error_failure("setuid");
+ }
+
+ if (signal_from == SIGNAL_FROM_KERNEL)
+ do_sigsegv();
+
+ do_sigkill();
+}
+
+static void execve_crash(char *const argv[])
+{
+ execve(argv[0], argv, NULL);
+ error_failure("execve");
+}
+
+static void exec_crash_user(void)
+{
+ char *const argv[] = {
+ "./test", "-m", "crash", "-s", "user", NULL,
+ };
+
+ execve_crash(argv);
+}
+
+static void exec_crash_user_change_priv(void)
+{
+ char *const argv[] = {
+ "./test", "-m", "crash", "-s", "user", "-C", NULL,
+ };
+
+ execve_crash(argv);
+}
+
+static void exec_crash_kernel(void)
+{
+ char *const argv[] = {
+ "./test", "-m", "crash", "-s", "kernel", NULL,
+ };
+
+ execve_crash(argv);
+}
+
+static void exec_crash_kernel_change_priv(void)
+{
+ char *const argv[] = {
+ "./test", "-m", "crash", "-s", "kernel", "-C", NULL,
+ };
+
+ execve_crash(argv);
+}
+
+static void exec_crash(enum signal_from signal_from, bool change_priv)
+{
+ if (signal_from == SIGNAL_FROM_USER && !change_priv)
+ exec_crash_user();
+ if (signal_from == SIGNAL_FROM_USER && change_priv)
+ exec_crash_user_change_priv();
+ if (signal_from == SIGNAL_FROM_KERNEL && !change_priv)
+ exec_crash_kernel();
+ if (signal_from == SIGNAL_FROM_KERNEL && change_priv)
+ exec_crash_kernel_change_priv();
+}
+
+static void do_crash(enum crash_after crash_after, enum signal_from signal_from,
+ int counter, bool change_priv)
+{
+ pid_t pid;
+ int status;
+
+ if (crash_after == CRASH_AFTER_NONE)
+ crash(signal_from, change_priv);
+
+ while (counter > 0) {
+ pid = fork();
+ if (pid < 0)
+ error_failure("fork");
+
+ /* Child process */
+ if (!pid) {
+ if (crash_after == CRASH_AFTER_FORK)
+ crash(signal_from, change_priv);
+
+ exec_crash(signal_from, change_priv);
+ }
+
+ /* Parent process */
+ counter -= 1;
+ pid = waitpid(pid, &status, 0);
+ if (pid < 0)
+ error_failure("waitpid");
+ }
+}
+
+static __attribute__((noreturn)) void error_close_failure(const char *message,
+ int fd)
+{
+ perror(message);
+ close(fd);
+ exit(EXIT_FAILURE);
+}
+
+static void do_server(uint32_t ip, uint16_t port, long accept_timeout)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_in address;
+ struct timeval timeout;
+ int newsockfd;
+
+ sockfd = socket(AF_INET, SOCK_STREAM, 0);
+ if (sockfd < 0)
+ error_failure("socket");
+
+ address.sin_family = AF_INET;
+ address.sin_addr.s_addr = ip;
+ address.sin_port = port;
+
+ ret = bind(sockfd, (const struct sockaddr *)&address, sizeof(address));
+ if (ret)
+ error_close_failure("bind", sockfd);
+
+ ret = listen(sockfd, 1);
+ if (ret)
+ error_close_failure("listen", sockfd);
+
+ timeout.tv_sec = accept_timeout;
+ timeout.tv_usec = 0;
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO,
+ (const struct timeval *)&timeout, sizeof(timeout));
+ if (ret)
+ error_close_failure("setsockopt", sockfd);
+
+ newsockfd = accept(sockfd, NULL, NULL);
+ if (newsockfd < 0)
+ error_close_failure("accept", sockfd);
+
+ close(sockfd);
+ close(newsockfd);
+}
+
+static void do_client(uint32_t ip, uint16_t port, long connect_timeout)
+{
+ int sockfd;
+ int ret;
+ struct timeval timeout;
+ struct sockaddr_in address;
+
+ sockfd = socket(AF_INET, SOCK_STREAM, 0);
+ if (sockfd < 0)
+ error_failure("socket");
+
+ timeout.tv_sec = connect_timeout;
+ timeout.tv_usec = 0;
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO,
+ (const struct timeval *)&timeout, sizeof(timeout));
+ if (ret)
+ error_close_failure("setsockopt", sockfd);
+
+ address.sin_family = AF_INET;
+ address.sin_addr.s_addr = ip;
+ address.sin_port = port;
+
+ ret = connect(sockfd, (const struct sockaddr *)&address,
+ sizeof(address));
+ if (ret)
+ error_close_failure("connect", sockfd);
+
+ ret = write(sockfd, message, strlen(message));
+ if (ret < 0)
+ error_close_failure("write", sockfd);
+
+ close(sockfd);
+}
+
+#define PROG_NAME basename(argv[0])
+
+int main(int argc, char **argv)
+{
+ int opt;
+ struct args args = {
+ .mode = MODE_NONE,
+ .crash_after = CRASH_AFTER_NONE,
+ .signal_from = SIGNAL_FROM_NONE,
+ .has_counter = false,
+ .has_change_priv = false,
+ .has_ip = false,
+ .has_port = false,
+ .has_timeout = false,
+ };
+
+ while ((opt = getopt(argc, argv, OPT_STRING)) != -1) {
+ switch (opt) {
+ case 'h':
+ usage(PROG_NAME);
+ return EXIT_SUCCESS;
+ case 'm':
+ args.mode = get_mode(optarg, PROG_NAME);
+ break;
+ case 'c':
+ args.crash_after = get_crash_after(optarg, PROG_NAME);
+ break;
+ case 's':
+ args.signal_from = get_signal_from(optarg, PROG_NAME);
+ break;
+ case 'n':
+ args.counter = get_counter(optarg, PROG_NAME);
+ args.has_counter = true;
+ break;
+ case 'C':
+ args.has_change_priv = true;
+ break;
+ case 'a':
+ args.ip = get_ip(optarg, PROG_NAME);
+ args.has_ip = true;
+ break;
+ case 'p':
+ args.port = get_port(optarg, PROG_NAME);
+ args.has_port = true;
+ break;
+ case 't':
+ args.timeout = get_timeout(optarg, PROG_NAME);
+ args.has_timeout = true;
+ break;
+ default:
+ usage(PROG_NAME);
+ return EXIT_FAILURE;
+ }
+ }
+
+ check_args(&args, PROG_NAME);
+
+ if (args.mode == MODE_CRASH) {
+ do_crash(args.crash_after, args.signal_from, args.counter,
+ args.has_change_priv);
+ } else if (args.mode == MODE_SERVER_CRASH) {
+ do_server(args.ip, args.port, args.timeout);
+ do_crash(args.crash_after, args.signal_from, args.counter,
+ false);
+ } else if (args.mode == MODE_CLIENT) {
+ do_client(args.ip, args.port, args.timeout);
+ }
+
+ return EXIT_SUCCESS;
+}
diff --git a/tools/testing/selftests/brute/test.sh b/tools/testing/selftests/brute/test.sh
new file mode 100755
index 000000000000..f53f26ae5b96
--- /dev/null
+++ b/tools/testing/selftests/brute/test.sh
@@ -0,0 +1,226 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+TCID="test.sh"
+
+KSFT_PASS=0
+KSFT_FAIL=1
+KSFT_SKIP=4
+
+errno=$KSFT_PASS
+
+check_root()
+{
+ local uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo $TCID: must be run as root >&2
+ exit $KSFT_SKIP
+ fi
+}
+
+count_fork_matches()
+{
+ dmesg | grep "brute: Fork brute force attack detected" | wc -l
+}
+
+assert_equal()
+{
+ local val1=$1
+ local val2=$2
+
+ if [ $val1 -eq $val2 ]; then
+ echo "$TCID: $message [PASS]"
+ else
+ echo "$TCID: $message [FAIL]"
+ errno=$KSFT_FAIL
+ fi
+}
+
+test_fork_user()
+{
+ COUNTER=20
+
+ old_count=$(count_fork_matches)
+ ./exec test -m crash -c fork -s user -n $COUNTER
+ new_count=$(count_fork_matches)
+
+ message="Fork attack (user signals, no bounds crossed)"
+ assert_equal $old_count $new_count
+}
+
+test_fork_kernel()
+{
+ old_count=$(count_fork_matches)
+ ./exec test -m crash -c fork -s kernel -n $COUNTER
+ new_count=$(count_fork_matches)
+
+ message="Fork attack (kernel signals, no bounds crossed)"
+ assert_equal $old_count $new_count
+}
+
+count_exec_matches()
+{
+ dmesg | grep "brute: Exec brute force attack detected" | wc -l
+}
+
+test_exec_user()
+{
+ old_count=$(count_exec_matches)
+ ./test -m crash -c exec -s user -n $COUNTER
+ new_count=$(count_exec_matches)
+
+ message="Exec attack (user signals, no bounds crossed)"
+ assert_equal $old_count $new_count
+}
+
+test_exec_kernel()
+{
+ old_count=$(count_exec_matches)
+ ./test -m crash -c exec -s kernel -n $COUNTER
+ new_count=$(count_exec_matches)
+
+ message="Exec attack (kernel signals, no bounds crossed)"
+ assert_equal $old_count $new_count
+}
+
+assert_not_equal()
+{
+ local val1=$1
+ local val2=$2
+
+ if [ $val1 -ne $val2 ]; then
+ echo "$TCID: $message [PASS]"
+ else
+ echo "$TCID: $message [FAIL]"
+ errno=$KSFT_FAIL
+ fi
+}
+
+test_fork_kernel_setuid()
+{
+ old_count=$(count_fork_matches)
+ chmod u+s test
+ ./exec test -m crash -c fork -s kernel -n $COUNTER
+ chmod u-s test
+ new_count=$(count_fork_matches)
+
+ message="Fork attack (kernel signals, setuid binary)"
+ assert_not_equal $old_count $new_count
+}
+
+test_exec_kernel_setuid()
+{
+ old_count=$(count_exec_matches)
+ chmod u+s test
+ ./test -m crash -c exec -s kernel -n $COUNTER
+ chmod u-s test
+ new_count=$(count_exec_matches)
+
+ message="Exec attack (kernel signals, setuid binary)"
+ assert_not_equal $old_count $new_count
+}
+
+test_fork_kernel_change_priv()
+{
+ old_count=$(count_fork_matches)
+ ./exec test -m crash -c fork -s kernel -n $COUNTER -C
+ new_count=$(count_fork_matches)
+
+ message="Fork attack (kernel signals, change privileges)"
+ assert_not_equal $old_count $new_count
+}
+
+test_exec_kernel_change_priv()
+{
+ old_count=$(count_exec_matches)
+ ./test -m crash -c exec -s kernel -n $COUNTER -C
+ new_count=$(count_exec_matches)
+
+ message="Exec attack (kernel signals, change privileges)"
+ assert_not_equal $old_count $new_count
+}
+
+network_ns_setup()
+{
+ local vnet_name=$1
+ local veth_name=$2
+ local ip_src=$3
+ local ip_dst=$4
+
+ ip netns add $vnet_name
+ ip link set $veth_name netns $vnet_name
+ ip -n $vnet_name addr add $ip_src/24 dev $veth_name
+ ip -n $vnet_name link set $veth_name up
+ ip -n $vnet_name route add $ip_dst/24 dev $veth_name
+}
+
+network_setup()
+{
+ VETH0_NAME=veth0
+ VNET0_NAME=vnet0
+ VNET0_IP=10.0.1.0
+ VETH1_NAME=veth1
+ VNET1_NAME=vnet1
+ VNET1_IP=10.0.2.0
+
+ ip link add $VETH0_NAME type veth peer name $VETH1_NAME
+ network_ns_setup $VNET0_NAME $VETH0_NAME $VNET0_IP $VNET1_IP
+ network_ns_setup $VNET1_NAME $VETH1_NAME $VNET1_IP $VNET0_IP
+}
+
+test_fork_kernel_network_to_local()
+{
+ INADDR_ANY=0.0.0.0
+ PORT=65535
+ TIMEOUT=5
+
+ old_count=$(count_fork_matches)
+ ip netns exec $VNET0_NAME ./exec test -m server_crash -a $INADDR_ANY \
+ -p $PORT -t $TIMEOUT -c fork -s kernel -n $COUNTER &
+ sleep 1
+ ip netns exec $VNET1_NAME ./test -m client -a $VNET0_IP -p $PORT \
+ -t $TIMEOUT
+ sleep 1
+ new_count=$(count_fork_matches)
+
+ message="Fork attack (kernel signals, network to local)"
+ assert_not_equal $old_count $new_count
+}
+
+test_exec_kernel_network_to_local()
+{
+ old_count=$(count_exec_matches)
+ ip netns exec $VNET0_NAME ./test -m server_crash -a $INADDR_ANY \
+ -p $PORT -t $TIMEOUT -c exec -s kernel -n $COUNTER &
+ sleep 1
+ ip netns exec $VNET1_NAME ./test -m client -a $VNET0_IP -p $PORT \
+ -t $TIMEOUT
+ sleep 1
+ new_count=$(count_exec_matches)
+
+ message="Exec attack (kernel signals, network to local)"
+ assert_not_equal $old_count $new_count
+}
+
+network_cleanup()
+{
+ ip netns del $VNET0_NAME >/dev/null 2>&1
+ ip netns del $VNET1_NAME >/dev/null 2>&1
+ ip link delete $VETH0_NAME >/dev/null 2>&1
+ ip link delete $VETH1_NAME >/dev/null 2>&1
+}
+
+check_root
+test_fork_user
+test_fork_kernel
+test_exec_user
+test_exec_kernel
+test_fork_kernel_setuid
+test_exec_kernel_setuid
+test_fork_kernel_change_priv
+test_exec_kernel_change_priv
+network_setup
+test_fork_kernel_network_to_local
+test_exec_kernel_network_to_local
+network_cleanup
+exit $errno
--
2.25.1
Add some info detailing what the Brute LSM is, its motivation, the weak
points of existing implementations, the proposed solutions, and how to
enable, disable and self-test it.
Signed-off-by: John Wood <[email protected]>
---
Documentation/admin-guide/LSM/Brute.rst | 278 ++++++++++++++++++++++++
Documentation/admin-guide/LSM/index.rst | 1 +
security/brute/Kconfig | 3 +-
3 files changed, 281 insertions(+), 1 deletion(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
diff --git a/Documentation/admin-guide/LSM/Brute.rst b/Documentation/admin-guide/LSM/Brute.rst
new file mode 100644
index 000000000000..ca80aef9aa67
--- /dev/null
+++ b/Documentation/admin-guide/LSM/Brute.rst
@@ -0,0 +1,278 @@
+.. SPDX-License-Identifier: GPL-2.0
+===========================================================
+Brute: Fork brute force attack detection and mitigation LSM
+===========================================================
+
+Attacks against vulnerable userspace applications with the purpose to break ASLR
+or bypass canaries traditionally use some level of brute force with the help of
+the fork system call. This is possible since when creating a new process using
+fork its memory contents are the same as those of the parent process (the
+process that called the fork system call). So, the attacker can test the memory
+infinite times to find the correct memory values or the correct memory addresses
+without worrying about crashing the application.
+
+Based on the above scenario it would be nice to have this detected and
+mitigated, and this is the goal of this implementation. Specifically the
+following attacks are expected to be detected:
+
+1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
+ desirable memory layout is achieved (e.g. Stack Clash).
+2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a
+ desirable memory layout is achieved (e.g. what CTFs do for simple network
+ service).
+3.- Launching processes without exec() (e.g. Android Zygote) and exposing state
+ to attack a sibling.
+4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the
+ previously shared memory layout of all the other children is exposed (e.g.
+ kind of related to HeartBleed).
+
+In each case, a privilege boundary has been crossed:
+
+Case 1: setuid/setgid process
+Case 2: network to local
+Case 3: privilege changes
+Case 4: network to local
+
+So, what really needs to be detected are fork/exec brute force attacks that
+cross any of the bounds listed above.
+
+
+Other implementations
+=====================
+
+In summary, the public version of grsecurity is based on the idea of delaying
+the fork system call if a child died due to some fatal signal (SIGSEGV, SIGBUS,
+SIGKILL or SIGILL). This approach has some issues:
+
+Bad practices
+-------------
+
+Adding delays to the kernel is, in general, a bad idea.
+
+Scenarios not detected (false negatives)
+----------------------------------------
+
+This protection acts only when the fork system call is called after a child has
+crashed. So, it would still be possible for an attacker to fork a large number
+of children (on the order of thousands), then probe all of them, and finally
+wait out the protection time before repeating these steps.
+
+Moreover, this method is based on the idea that the protection doesn't act if
+the parent crashes. So, it would still be possible for an attacker to fork a
+process and probe itself; then fork the child process and probe itself again.
+This way, these steps can be repeated indefinitely without any mitigation.
+
+Scenarios detected (false positives)
+------------------------------------
+
+Scenarios where an application rarely fails for reasons unrelated to a real
+attack.
+
+
+This implementation
+===================
+
+The main idea behind this implementation is to improve on the existing ones by
+focusing on the weak points noted above. Basically, the adopted solution is to
+detect a fast crash rate instead of a single crash, and to detect crashes of
+both parent and child processes. Also, the detection is fine-tuned by focusing
+on privilege boundary crossing. And finally, as a mitigation method, all the
+offending tasks involved in the attack are killed instead of using delays.
+
+To achieve this goal, and going into more detail, this implementation is based
+on the use of statistical data shared across all the processes that can have
+the same memory contents. Or in other words, statistical data shared between
+all the fork hierarchy processes after an execve system call.
+
+The purpose of these statistics is, basically, to collect all the necessary info
+to compute the application crash period in order to detect an attack. This crash
+period is the time between the execve system call and the first fault or the
+time between two consecutive faults, but this has a drawback. If an application
+crashes twice in a short period of time for some reason unrelated to a real
+attack, a false positive will be triggered. To avoid this scenario the
+exponential moving average (EMA) is used. This way, the application crash period
+will be a value that is not prone to change due to spurious data and follows the
+real crash period.
+
+To detect a brute force attack it is necessary that the statistics shared by all
+the fork hierarchy processes be updated in every fatal crash and the most
+important data to update is the application crash period.
+
+These statistics are held by the brute_stats struct::
+
+    struct brute_cred {
+        kuid_t uid;
+        kgid_t gid;
+        kuid_t suid;
+        kgid_t sgid;
+        kuid_t euid;
+        kgid_t egid;
+        kuid_t fsuid;
+        kgid_t fsgid;
+    };
+
+    struct brute_stats {
+        spinlock_t lock;
+        refcount_t refc;
+        unsigned char faults;
+        u64 jiffies;
+        u64 period;
+        struct brute_cred saved_cred;
+        unsigned char network : 1;
+        unsigned char bounds_crossed : 1;
+    };
+
+This is a fixed-size struct, so the memory usage is based on the current number
+of processes that have called exec(). This holds because in every fork system
+call the parent's statistics are shared with the child process, while in every
+execve system call a new brute_stats struct is allocated. So, only one
+brute_stats struct is used for each fork hierarchy (the hierarchy of processes
+descending from an execve system call).
+
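+For example, one possible lifetime of the statistics (illustrative)::
+
+    execve()          -> allocates brute_stats A (refc = 1)
+        fork()        -> shares brute_stats A (refc = 2)
+        fork()        -> shares brute_stats A (refc = 3)
+            execve()  -> allocates a new brute_stats B for the new hierarchy
+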
+There are two types of brute force attacks that need to be detected. The first
+one is an attack that happens through the fork system call and the second one
+is an attack that happens through the execve system call. The first type uses
+the statistics shared by all the fork hierarchy processes, but the second type
+cannot use this statistical data because these statistics disappear when the
+involved tasks finish. In this last scenario the attack info is tracked by the
+statistics of a higher fork hierarchy (the hierarchy that contains the process
+that forked before the execve system call).
+
+Moreover, each of these two attack types has two variants: a slow brute force
+attack, detected if a maximum number of faults per fork hierarchy is reached,
+and a fast brute force attack, detected if the application crash period falls
+below a certain threshold.
+
+Once an attack has been detected, it is mitigated by killing all the offending
+tasks involved. Or in other words, once an attack has been detected, it is
+mitigated by killing all the processes that share the same statistics (the
+stats that show a slow or fast brute force attack).
+
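+As an illustration, a detection plus mitigation looks roughly like this in the
+kernel log (the PIDs and task names are, of course, scenario dependent)::
+
+    brute: Fork brute force attack detected [test]
+    brute: Offending process 386 [test] killed
+    brute: Offending process 387 [test] killed
+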
+Fine tuning the attack detection
+--------------------------------
+
+To avoid false positives during the attack detection it is necessary to narrow
+the possible cases. To do so, and based on the threat scenarios that we want to
+detect, this implementation also focuses on the crossing of privilege bounds.
+
+To be precise, only the following privilege bounds are taken into account:
+
+1.- setuid/setgid process
+2.- network to local
+3.- privilege changes
+
+Moreover, only the fatal signals delivered by the kernel are taken into
+account, ignoring the fatal signals sent by userspace applications (with the
+exception of the SIGABRT user signal, since this one is used by glibc for stack
+canary, malloc, etc. failures, which may indicate that a mitigation has been
+triggered).
+
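+For example, a SIGSEGV caused by an invalid memory access is delivered by the
+kernel and is therefore taken into account, while a SIGKILL sent with the
+kill() system call comes from userspace and is therefore ignored.
+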
+Exponential moving average (EMA)
+--------------------------------
+
+This kind of average defines a weight (between 0 and 1) for the new value to
+add and applies the remainder of the weight to the current average value. This
+way, spurious data will not excessively modify the average, and only if the new
+values are persistent will the moving average tend towards them.
+
+Mathematically the application crash period's EMA can be expressed as
+follows::
+
+    period_ema = period * weight + period_ema * (1 - weight)
+
+Related to the attack detection, the EMA must guarantee that not many crashes
+are needed. To demonstrate this, the scenario where an application has been
+running without any crashes for a month will be used.
+
+The period's EMA can now be written as::
+
+    period_ema[i] = period[i] * weight + period_ema[i - 1] * (1 - weight)
+
+If the new crash periods have insignificant values relative to the first crash
+period (a month in this case), the formula can be rewritten as::
+
+    period_ema[i] = period_ema[i - 1] * (1 - weight)
+
+And by extension::
+
+    period_ema[i - 1] = period_ema[i - 2] * (1 - weight)
+    period_ema[i - 2] = period_ema[i - 3] * (1 - weight)
+    period_ema[i - 3] = period_ema[i - 4] * (1 - weight)
+
+So, if the substitution is made::
+
+    period_ema[i] = period_ema[i - 1] * (1 - weight)
+    period_ema[i] = period_ema[i - 2] * pow((1 - weight), 2)
+    period_ema[i] = period_ema[i - 3] * pow((1 - weight), 3)
+    period_ema[i] = period_ema[i - 4] * pow((1 - weight), 4)
+
+And in a more generic form::
+
+    period_ema[i] = period_ema[i - n] * pow((1 - weight), n)
+
+Where n represents the number of iterations to obtain an EMA value. Or in other
+words, the number of crashes to detect an attack.
+
+So, if we isolate the number of crashes::
+
+    period_ema[i] / period_ema[i - n] = pow((1 - weight), n)
+    log(period_ema[i] / period_ema[i - n]) = log(pow((1 - weight), n))
+    log(period_ema[i] / period_ema[i - n]) = n * log(1 - weight)
+    n = log(period_ema[i] / period_ema[i - n]) / log(1 - weight)
+
+Then, in the commented scenario (an application that has been running without
+any crashes for a month), the approximate number of crashes needed to detect
+an attack (using the implementation values for the weight and the crash period
+threshold) is::
+
+    weight = 7 / 10
+    crash_period_threshold = 30 seconds
+
+    n = log(crash_period_threshold / seconds_per_month) / log(1 - weight)
+    n = log(30 / (30 * 24 * 3600)) / log(1 - 0.7)
+    n = 9.44
+
+So, with 10 crashes for this scenario an attack will be detected. If these
+steps are repeated for different scenarios and the results are collected::
+
+    1 month without any crashes ----> 9.44 crashes to detect an attack
+    1 year without any crashes -----> 11.50 crashes to detect an attack
+    10 years without any crashes ---> 13.42 crashes to detect an attack
+
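+As an illustrative aside, the following userspace C sketch (not part of the
+kernel code) reproduces these numbers from the generic formula above::
+
+    #include <math.h>
+    #include <stdio.h>
+
+    int main(void)
+    {
+        const double weight = 0.7;      /* EMA weight: 7 / 10 */
+        const double threshold = 30.0;  /* crash period threshold (seconds) */
+        /* First crash period: 1 month, 1 year and 10 years (in seconds) */
+        const double first_period[] = {
+            30.0 * 24 * 3600, 365.0 * 24 * 3600, 3650.0 * 24 * 3600,
+        };
+        int i;
+
+        for (i = 0; i < 3; i++) {
+            /* n = log(period_ema[i] / period_ema[i - n]) / log(1 - weight) */
+            double n = log(threshold / first_period[i]) / log(1.0 - weight);
+
+            printf("%.2f crashes to detect an attack\n", n);
+        }
+        return 0;
+    }
+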
+However, this computation has a drawback: the first values added to the EMA do
+not yet yield a real average showing a trend. So the solution is simple: the
+EMA needs a minimum number of data points before it can be interpreted. This
+way, the case where the first few faults occur fast enough but are not
+followed by any more crashes is avoided.
+
+Per system enabling/disabling
+-----------------------------
+
+This feature can be enabled at build time using the CONFIG_SECURITY_FORK_BRUTE
+option, or using the kernel configuration menus under the following entry::
+
+    Security options ---> Fork brute force attack detection and mitigation
+
+Also, this feature can be disabled at boot time by omitting "brute" from the
+"lsm=" kernel boot parameter.
+
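+For example (assuming the default LSM list from security/Kconfig), booting
+with::
+
+    lsm=lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf
+
+(i.e. with "brute" omitted) leaves this feature disabled.
+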
+Kernel selftests
+----------------
+
+To validate all the expectations about this implementation, there is a set of
+selftests. These tests cover fork/exec brute force attacks crossing the following
+privilege boundaries:
+
+1.- setuid process
+2.- privilege changes
+3.- network to local
+
+Also, there are some tests to check that fork/exec brute force attacks that do
+not cross any of the privilege boundaries listed above don't trigger the
+detection and mitigation stage.
+
+To build the tests::
+
+    make -C tools/testing/selftests/ TARGETS=brute
+
+To run the tests::
+
+    make -C tools/testing/selftests TARGETS=brute run_tests
+
+To package the tests::
+
+    make -C tools/testing/selftests TARGETS=brute gen_tar
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index a6ba95fbaa9f..1f68982bb330 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -41,6 +41,7 @@ subdirectories.
:maxdepth: 1
apparmor
+ Brute
LoadPin
SELinux
Smack
diff --git a/security/brute/Kconfig b/security/brute/Kconfig
index 1bd2df1e2dec..334d7e88d27f 100644
--- a/security/brute/Kconfig
+++ b/security/brute/Kconfig
@@ -7,6 +7,7 @@ config SECURITY_FORK_BRUTE
vulnerable userspace processes. The detection method is based on
the application crash period and as a mitigation procedure all the
offending tasks are killed. Like capabilities, this security module
- stacks with other LSMs.
+ stacks with other LSMs. Further information can be found in
+ Documentation/admin-guide/LSM/Brute.rst.
If you are unsure how to answer this question, answer N.
--
2.25.1
In order to maintain the code for the Brute LSM, add a new entry to the
maintainers list.
Signed-off-by: John Wood <[email protected]>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index d92f85ca831d..0b88b7a99991 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3764,6 +3764,13 @@ L: [email protected]
S: Supported
F: drivers/net/ethernet/brocade/bna/
+BRUTE SECURITY MODULE
+M: John Wood <[email protected]>
+S: Maintained
+F: Documentation/admin-guide/LSM/Brute.rst
+F: security/brute/
+F: tools/testing/selftests/brute/
+
BSG (block layer generic sg v4 driver)
M: FUJITA Tomonori <[email protected]>
L: [email protected]
--
2.25.1
On Sun, Mar 07, 2021 at 12:30:24PM +0100, John Wood wrote:
> Add a security hook that allows a LSM to be notified when a task gets a
> fatal signal. This patch is a previous step on the way to compute the
> task crash period by the "brute" LSM (linux security module to detect
> and mitigate fork brute force attack against vulnerable userspace
> processes).
>
> Signed-off-by: John Wood <[email protected]>
I continue to really like that this entire thing can be done from an LSM
with just this one extra hook. :)
Reviewed-by: Kees Cook <[email protected]>
--
Kees Cook
On Sun, Mar 07, 2021 at 12:30:25PM +0100, John Wood wrote:
> Add a new Kconfig file to define a menu entry under "Security options"
> to enable the "Fork brute force attack detection and mitigation"
> feature.
>
> For a correct management of a fork brute force attack it is necessary
> that all the tasks hold statistical data. The same statistical data
> needs to be shared between all the tasks that hold the same memory
> contents or in other words, between all the tasks that have been forked
> without any execve call. So, define a statistical data structure to hold
> all the necessary information shared by all the fork hierarchy
> processes. This info is basically the number of crashes, the last crash
> timestamp and the crash period's moving average.
>
> When a forked task calls the execve system call, the memory contents are
> set with new values. So, in this scenario the parent's statistical data
> does not need to be shared. Instead, a new statistical data structure must be
> allocated to start a new hierarchy.
>
> The statistical data that is shared between all the fork hierarchy
> processes needs to be freed when this hierarchy disappears.
>
> So, based on all the previous information, define an LSM with three hooks
> to manage all the commented cases. These hooks are "task_alloc" to do
> the fork management, "bprm_committing_creds" to do the execve management
> and "task_free" to release the resources.
>
> Also, add to the task_struct's security blob the pointer to the
> statistical data. This way, all the tasks will have access to this
> information.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> security/Kconfig | 11 +-
> security/Makefile | 4 +
> security/brute/Kconfig | 12 ++
> security/brute/Makefile | 2 +
> security/brute/brute.c | 257 ++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 281 insertions(+), 5 deletions(-)
> create mode 100644 security/brute/Kconfig
> create mode 100644 security/brute/Makefile
> create mode 100644 security/brute/brute.c
>
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..204bb311b1f1 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -240,6 +240,7 @@ source "security/safesetid/Kconfig"
> source "security/lockdown/Kconfig"
>
> source "security/integrity/Kconfig"
> +source "security/brute/Kconfig"
>
> choice
> prompt "First legacy 'major LSM' to be initialized"
> @@ -277,11 +278,11 @@ endchoice
>
> config LSM
> string "Ordered list of enabled LSMs"
> - default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> - default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> - default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> - default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> - default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
> + default "brute,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> + default "brute,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> + default "brute,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> + default "brute,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> + default "brute,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
It probably doesn't matter much, but I think brute should be added
between lockdown and yama.
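For example, the DAC line would become:

	default "lockdown,brute,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC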
> help
> A comma-separated list of LSMs, in initialization order.
> Any LSMs left off this list will be ignored. This can be
> diff --git a/security/Makefile b/security/Makefile
> index 3baf435de541..1236864876da 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM) += bpf/
> # Object integrity file lists
> subdir-$(CONFIG_INTEGRITY) += integrity
> obj-$(CONFIG_INTEGRITY) += integrity/
> +
> +# Object brute file lists
> +subdir-$(CONFIG_SECURITY_FORK_BRUTE) += brute
> +obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute/
I don't think subdir is needed here? I think you can use obj-... like
loadpin, etc.
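i.e. just the single line:

	obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute/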
> diff --git a/security/brute/Kconfig b/security/brute/Kconfig
> new file mode 100644
> index 000000000000..1bd2df1e2dec
> --- /dev/null
> +++ b/security/brute/Kconfig
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0
> +config SECURITY_FORK_BRUTE
> + bool "Fork brute force attack detection and mitigation"
> + depends on SECURITY
> + help
> + This is an LSM that stops any fork brute force attack against
> + vulnerable userspace processes. The detection method is based on
> + the application crash period and as a mitigation procedure all the
> + offending tasks are killed. Like capabilities, this security module
> + stacks with other LSMs.
I'm not sure the stacking needs mentioning, but okay. :)
> +
> + If you are unsure how to answer this question, answer N.
> diff --git a/security/brute/Makefile b/security/brute/Makefile
> new file mode 100644
> index 000000000000..d3f233a132a9
> --- /dev/null
> +++ b/security/brute/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute.o
> diff --git a/security/brute/brute.c b/security/brute/brute.c
> new file mode 100644
> index 000000000000..99d099e45112
> --- /dev/null
> +++ b/security/brute/brute.c
> @@ -0,0 +1,257 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <asm/current.h>
Why is this needed?
> +#include <linux/bug.h>
> +#include <linux/compiler.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/init.h>
> +#include <linux/jiffies.h>
> +#include <linux/kernel.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/printk.h>
> +#include <linux/refcount.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +/**
> + * struct brute_stats - Fork brute force attack statistics.
> + * @lock: Lock to protect the brute_stats structure.
> + * @refc: Reference counter.
> + * @faults: Number of crashes.
> + * @jiffies: Last crash timestamp.
> + * @period: Crash period's moving average.
> + *
> + * This structure holds the statistical data shared by all the fork hierarchy
> + * processes.
> + */
> +struct brute_stats {
> + spinlock_t lock;
> + refcount_t refc;
> + unsigned char faults;
> + u64 jiffies;
> + u64 period;
> +};
I assume the max-255 "faults" will be explained... why is this so small?
> +
> +/*
> + * brute_blob_sizes - LSM blob sizes.
> + *
> + * To share statistical data among all the fork hierarchy processes, define a
> + * pointer to the brute_stats structure as a part of the task_struct's security
> + * blob.
> + */
> +static struct lsm_blob_sizes brute_blob_sizes __lsm_ro_after_init = {
> + .lbs_task = sizeof(struct brute_stats *),
> +};
> +
> +/**
> + * brute_stats_ptr() - Get the pointer to the brute_stats structure.
> + * @task: Task that holds the statistical data.
> + *
> + * Return: A pointer to a pointer to the brute_stats structure.
> + */
> +static inline struct brute_stats **brute_stats_ptr(struct task_struct *task)
> +{
> + return task->security + brute_blob_sizes.lbs_task;
> +}
> +
> +/**
> + * brute_new_stats() - Allocate a new statistics structure.
> + *
> + * If the allocation is successful the reference counter is set to one to
> + * indicate that there will be one task that points to this structure. Also, the
> + * last crash timestamp is set to now. This way, it is possible to compute the
> + * application crash period at the first fault.
> + *
> + * Return: NULL if the allocation fails. A pointer to the newly allocated
> + * statistics structure on success.
> + */
> +static struct brute_stats *brute_new_stats(void)
> +{
> + struct brute_stats *stats;
> +
> + stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> + if (!stats)
> + return NULL;
Since this is tied to process creation, I think it might make sense to
have a dedicated kmem cache for this (instead of using the "generic"
kmalloc). See kmem_cache_{create,*alloc,free}
> +
> + spin_lock_init(&stats->lock);
> + refcount_set(&stats->refc, 1);
> + stats->faults = 0;
> + stats->jiffies = get_jiffies_64();
> + stats->period = 0;
And either way, I'd recommend using the "z" variant of the allocator
(kmem_cache_zalloc, kzalloc) to pre-zero everything (and then you can
drop the "= 0" lines here).
> +
> + return stats;
> +}
> +
> +/**
> + * brute_share_stats() - Share the statistical data between processes.
> + * @src: Source of statistics to be shared.
> + * @dst: Destination of statistics to be shared.
> + *
> + * Copy the src's pointer to the statistical data structure to the dst's pointer
> + * to the same structure. Since there is a new process that shares the same
> + * data, increase the reference counter. The src's pointer cannot be NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_alloc hook.
> + */
> +static void brute_share_stats(struct brute_stats *src,
> + struct brute_stats **dst)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&src->lock, flags);
> + refcount_inc(&src->refc);
> + *dst = src;
> + spin_unlock_irqrestore(&src->lock, flags);
> +}
> +
> +/**
> + * brute_task_alloc() - Target for the task_alloc hook.
> + * @task: Task being allocated.
> + * @clone_flags: Contains the flags indicating what should be shared.
> + *
> + * For a correct management of a fork brute force attack it is necessary that
> + * all the tasks hold statistical data. The same statistical data needs to be
> + * shared between all the tasks that hold the same memory contents or in other
> + * words, between all the tasks that have been forked without any execve call.
> + *
> + * To ensure this, if the current task doesn't have statistical data when it forks,
> + * it is mandatory to allocate a new statistics structure and share it between
> + * this task and the new one being allocated. Otherwise, share the statistics
> + * that the current task already has.
> + *
> + * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> + * otherwise.
> + */
> +static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> +{
> + struct brute_stats **stats, **p_stats;
> +
> + stats = brute_stats_ptr(task);
> + p_stats = brute_stats_ptr(current);
> +
> + if (likely(*p_stats)) {
> + brute_share_stats(*p_stats, stats);
> + return 0;
> + }
> +
> + *stats = brute_new_stats();
> + if (!*stats)
> + return -ENOMEM;
> +
> + brute_share_stats(*stats, p_stats);
> + return 0;
> +}
During the task_alloc hook, aren't both "current" and "task" already
immutable (in the sense that no lock needs to be held for
brute_share_stats())?
And what is the case where brute_stats_ptr(current) returns NULL?
> +
> +/**
> + * brute_task_execve() - Target for the bprm_committing_creds hook.
> + * @bprm: Points to the linux_binprm structure.
> + *
> + * When a forked task calls the execve system call, the memory contents are set
> + * with new values. So, in this scenario the parent's statistical data does not
> + * need to be shared. Instead, a new statistical data structure must be allocated to
> + * start a new hierarchy. This condition is detected when the statistics
> + * reference counter holds a value greater than or equal to two (a fork always
> + * sets the statistics reference counter to a minimum of two since the parent
> + * and the child task are sharing the same data).
> + *
> + * However, if the execve function is called immediately after another execve
> + * call, although the memory contents are reset, there is no need to allocate
> + * a new statistical data structure. This is possible because at this moment
> + * only one task (the task that calls the execve function) points to the data.
> + * In this case, the previous allocation is used but the statistics are reset.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the bprm_committing_creds hook.
> + */
> +static void brute_task_execve(struct linux_binprm *bprm)
> +{
> + struct brute_stats **stats;
> + unsigned long flags;
> +
> + stats = brute_stats_ptr(current);
> + if (WARN(!*stats, "No statistical data\n"))
> + return;
> +
> + spin_lock_irqsave(&(*stats)->lock, flags);
> +
> + if (!refcount_dec_not_one(&(*stats)->refc)) {
> + /* execve call after an execve call */
> + (*stats)->faults = 0;
> + (*stats)->jiffies = get_jiffies_64();
> + (*stats)->period = 0;
> + spin_unlock_irqrestore(&(*stats)->lock, flags);
> + return;
> + }
> +
> + /* execve call after a fork call */
> + spin_unlock_irqrestore(&(*stats)->lock, flags);
> + *stats = brute_new_stats();
> + WARN(!*stats, "Cannot allocate statistical data\n");
> +}
I don't think any of this locking is needed -- you're always operating
on "current", so its brute_stats will always be valid.
> +
> +/**
> + * brute_task_free() - Target for the task_free hook.
> + * @task: Task about to be freed.
> + *
> + * The statistical data that is shared between all the fork hierarchy processes
> + * needs to be freed when this hierarchy disappears.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_free hook.
> + */
> +static void brute_task_free(struct task_struct *task)
> +{
> + struct brute_stats **stats;
> + unsigned long flags;
> + bool refc_is_zero;
> +
> + stats = brute_stats_ptr(task);
> + if (WARN(!*stats, "No statistical data\n"))
> + return;
> +
> + spin_lock_irqsave(&(*stats)->lock, flags);
> + refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> + spin_unlock_irqrestore(&(*stats)->lock, flags);
> +
> + if (refc_is_zero) {
> + kfree(*stats);
> + *stats = NULL;
> + }
> +}
Same thing -- this is what dec_and_test is for: it's atomic, so no
locking needed.
> +
> +/*
> + * brute_hooks - Targets for the LSM's hooks.
> + */
> +static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
> + LSM_HOOK_INIT(task_alloc, brute_task_alloc),
> + LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
> + LSM_HOOK_INIT(task_free, brute_task_free),
> +};
> +
> +/**
> + * brute_init() - Initialize the brute LSM.
> + *
> + * Return: Always returns zero.
> + */
> +static int __init brute_init(void)
> +{
> + pr_info("Brute initialized\n");
> + security_add_hooks(brute_hooks, ARRAY_SIZE(brute_hooks),
> + KBUILD_MODNAME);
> + return 0;
> +}
> +
> +DEFINE_LSM(brute) = {
> + .name = KBUILD_MODNAME,
> + .init = brute_init,
> + .blobs = &brute_blob_sizes,
> +};
> --
> 2.25.1
>
--
Kees Cook
On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> To detect a brute force attack it is necessary that the statistics
> shared by all the fork hierarchy processes be updated in every fatal
> crash and the most important data to update is the application crash
> period. To do so, use the new "task_fatal_signal" LSM hook added in a
> previous step.
>
> The application crash period must be a value that is not prone to change
> due to spurious data and follows the real crash period. So, to compute
> it, the exponential moving average (EMA) is used.
>
> There are two types of brute force attacks that need to be detected. The
> first one is an attack that happens through the fork system call and the
> second one is an attack that happens through the execve system call. The
> first type uses the statistics shared by all the fork hierarchy
> processes, but the second type cannot use this statistical data because
> these statistics disappear when the involved tasks finish. In this
> last scenario the attack info should be tracked by the statistics of a
> higher fork hierarchy (the hierarchy that contains the process that
> forks before the execve system call).
>
> Moreover, these two attack types have two variants. A slow brute force
> attack that is detected if the maximum number of faults per fork
> hierarchy is reached and a fast brute force attack that is detected if
> the application crash period falls below a certain threshold.
>
> Also, this patch adds locking to protect the statistics pointer held by
> every process.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> security/brute/brute.c | 498 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 479 insertions(+), 19 deletions(-)
>
> diff --git a/security/brute/brute.c b/security/brute/brute.c
> index 99d099e45112..870db55332d4 100644
> --- a/security/brute/brute.c
> +++ b/security/brute/brute.c
> @@ -11,9 +11,14 @@
> #include <linux/jiffies.h>
> #include <linux/kernel.h>
> #include <linux/lsm_hooks.h>
> +#include <linux/math64.h>
> #include <linux/printk.h>
> #include <linux/refcount.h>
> +#include <linux/rwlock.h>
> +#include <linux/rwlock_types.h>
> #include <linux/sched.h>
> +#include <linux/sched/signal.h>
> +#include <linux/sched/task.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> #include <linux/types.h>
> @@ -37,6 +42,11 @@ struct brute_stats {
> u64 period;
> };
>
> +/*
> + * brute_stats_ptr_lock - Lock to protect the brute_stats structure pointer.
> + */
> +static DEFINE_RWLOCK(brute_stats_ptr_lock);
Yeow, you've switched from an (unneeded in prior patch) per-stats lock
to a global lock? I think this isn't needed...
> +
> /*
> * brute_blob_sizes - LSM blob sizes.
> *
> @@ -74,7 +84,7 @@ static struct brute_stats *brute_new_stats(void)
> {
> struct brute_stats *stats;
>
> - stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> + stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
Why change this here? I'd just start with this in the patch that
introduces it.
> if (!stats)
> return NULL;
>
> @@ -99,16 +109,17 @@ static struct brute_stats *brute_new_stats(void)
> * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> * since the task_free hook can be called from an IRQ context during the
> * execution of the task_alloc hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> */
> static void brute_share_stats(struct brute_stats *src,
> struct brute_stats **dst)
> {
> - unsigned long flags;
> -
> - spin_lock_irqsave(&src->lock, flags);
> + spin_lock(&src->lock);
> refcount_inc(&src->refc);
> *dst = src;
> - spin_unlock_irqrestore(&src->lock, flags);
> + spin_unlock(&src->lock);
> }
I still don't think any locking is needed here; the whole function can
go away, IMO.
>
> /**
> @@ -126,26 +137,36 @@ static void brute_share_stats(struct brute_stats *src,
> * this task and the new one being allocated. Otherwise, share the statistics
> * that the current task already has.
> *
> + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> + * and brute_stats::lock since the task_free hook can be called from an IRQ
> + * context during the execution of the task_alloc hook.
> + *
> * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> * otherwise.
> */
> static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> {
> struct brute_stats **stats, **p_stats;
> + unsigned long flags;
>
> stats = brute_stats_ptr(task);
> p_stats = brute_stats_ptr(current);
> + write_lock_irqsave(&brute_stats_ptr_lock, flags);
>
> if (likely(*p_stats)) {
> brute_share_stats(*p_stats, stats);
> + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return 0;
> }
>
> *stats = brute_new_stats();
> - if (!*stats)
> + if (!*stats) {
> + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return -ENOMEM;
> + }
>
> brute_share_stats(*stats, p_stats);
> + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return 0;
> }
I'd much prefer that whatever locking is needed be introduced in the
initial patch: this transformation just doubles the work to review. :)
>
> @@ -167,9 +188,9 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> * only one task (the task that calls the execve function) points to the data.
> * In this case, the previous allocation is used but the statistics are reset.
> *
> - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> - * since the task_free hook can be called from an IRQ context during the
> - * execution of the bprm_committing_creds hook.
> + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> + * and brute_stats::lock since the task_free hook can be called from an IRQ
> + * context during the execution of the bprm_committing_creds hook.
> */
> static void brute_task_execve(struct linux_binprm *bprm)
> {
> @@ -177,24 +198,33 @@ static void brute_task_execve(struct linux_binprm *bprm)
> unsigned long flags;
>
> stats = brute_stats_ptr(current);
> - if (WARN(!*stats, "No statistical data\n"))
> + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> +
> + if (WARN(!*stats, "No statistical data\n")) {
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return;
> + }
>
> - spin_lock_irqsave(&(*stats)->lock, flags);
> + spin_lock(&(*stats)->lock);
>
> if (!refcount_dec_not_one(&(*stats)->refc)) {
> /* execve call after an execve call */
> (*stats)->faults = 0;
> (*stats)->jiffies = get_jiffies_64();
> (*stats)->period = 0;
> - spin_unlock_irqrestore(&(*stats)->lock, flags);
> + spin_unlock(&(*stats)->lock);
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return;
> }
>
> /* execve call after a fork call */
> - spin_unlock_irqrestore(&(*stats)->lock, flags);
> + spin_unlock(&(*stats)->lock);
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> +
> + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> *stats = brute_new_stats();
> WARN(!*stats, "Cannot allocate statistical data\n");
> + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> }
Again, I don't see a need for locking -- this is just managing the
lifetime which is entirely handled by the implicit locking of "current"
and the refcount_t.
>
> /**
> @@ -204,9 +234,9 @@ static void brute_task_execve(struct linux_binprm *bprm)
> * The statistical data that is shared between all the fork hierarchy processes
> * needs to be freed when this hierarchy disappears.
> *
> - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> - * since the task_free hook can be called from an IRQ context during the
> - * execution of the task_free hook.
> + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> + * and brute_stats::lock since the task_free hook can be called from an IRQ
> + * context during the execution of the task_free hook.
> */
> static void brute_task_free(struct task_struct *task)
> {
> @@ -215,17 +245,446 @@ static void brute_task_free(struct task_struct *task)
> bool refc_is_zero;
>
> stats = brute_stats_ptr(task);
> - if (WARN(!*stats, "No statistical data\n"))
> + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> +
> + if (WARN(!*stats, "No statistical data\n")) {
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return;
> + }
>
> - spin_lock_irqsave(&(*stats)->lock, flags);
> + spin_lock(&(*stats)->lock);
> refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> - spin_unlock_irqrestore(&(*stats)->lock, flags);
> + spin_unlock(&(*stats)->lock);
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
>
> if (refc_is_zero) {
> + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> kfree(*stats);
> *stats = NULL;
> + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> + }
> +}
Same; I would expect this to be simply:
	stats = brute_stats_ptr(task);
	if (WARN_ON_ONCE(!*stats))
		return;

	if (refcount_dec_and_test(&(*stats)->refc)) {
		kfree(*stats);
		*stats = NULL;
	}
> +
> +/*
> + * BRUTE_EMA_WEIGHT_NUMERATOR - Weight's numerator of EMA.
> + */
> +static const u64 BRUTE_EMA_WEIGHT_NUMERATOR = 7;
> +
> +/*
> + * BRUTE_EMA_WEIGHT_DENOMINATOR - Weight's denominator of EMA.
> + */
> +static const u64 BRUTE_EMA_WEIGHT_DENOMINATOR = 10;
Should these be externally configurable (via sysfs)?
> +
> +/**
> + * brute_mul_by_ema_weight() - Multiply by EMA weight.
> + * @value: Value to multiply by EMA weight.
> + *
> + * Return: The result of the multiplication operation.
> + */
> +static inline u64 brute_mul_by_ema_weight(u64 value)
> +{
> + return mul_u64_u64_div_u64(value, BRUTE_EMA_WEIGHT_NUMERATOR,
> + BRUTE_EMA_WEIGHT_DENOMINATOR);
> +}
> +
> +/*
> + * BRUTE_MAX_FAULTS - Maximum number of faults.
> + *
> + * If a brute force attack is running slowly for a long time, the application
> + * crash period's EMA is not suitable for the detection. This type of attack
> + * must be detected using a maximum number of faults.
> + */
> +static const unsigned char BRUTE_MAX_FAULTS = 200;
Same.
> +
> +/**
> + * brute_update_crash_period() - Update the application crash period.
> + * @stats: Statistics that hold the application crash period to update.
> + * @now: The current timestamp in jiffies.
> + *
> + * The application crash period must be a value that is not prone to change due
> + * to spurious data and follows the real crash period. So, to compute it, the
> + * exponential moving average (EMA) is used.
> + *
> + * This kind of average defines a weight (between 0 and 1) for the new value to
> + * add and applies the remainder of the weight to the current average value.
> + * This way, some spurious data will not excessively modify the average and only
> + * if the new values are persistent, the moving average will tend towards them.
> + *
> + * Mathematically the application crash period's EMA can be expressed as
> + * follows:
> + *
> + * period_ema = period * weight + period_ema * (1 - weight)
> + *
> + * If the operations are applied:
> + *
> + * period_ema = period * weight + period_ema - period_ema * weight
> + *
> + * If the operands are ordered:
> + *
> + * period_ema = period_ema - period_ema * weight + period * weight
> + *
> + * Finally, this formula can be written as follows:
> + *
> + * period_ema -= period_ema * weight;
> + * period_ema += period * weight;
> + *
> + * The statistics that hold the application crash period to update cannot be
> + * NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> + * Return: The last crash timestamp before updating it.
> + */
> +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> +{
> + u64 current_period;
> + u64 last_crash_timestamp;
> +
> + spin_lock(&stats->lock);
> + current_period = now - stats->jiffies;
> + last_crash_timestamp = stats->jiffies;
> + stats->jiffies = now;
> +
> + stats->period -= brute_mul_by_ema_weight(stats->period);
> + stats->period += brute_mul_by_ema_weight(current_period);
> +
> + if (stats->faults < BRUTE_MAX_FAULTS)
> + stats->faults += 1;
> +
> + spin_unlock(&stats->lock);
> + return last_crash_timestamp;
> +}
Now *here* locking makes sense, and it only needs to be per-stat, not
global, since multiple processes may be operating on the same stat
struct. To make this more no-reader-locking-friendly, I'd also update
everything at the end, and use WRITE_ONCE():
	u64 current_period;
	u64 last_crash_timestamp;

	spin_lock(&stats->lock);
	current_period = now - stats->jiffies;
	last_crash_timestamp = stats->jiffies;

	WRITE_ONCE(stats->period,
		   stats->period - brute_mul_by_ema_weight(stats->period) +
		   brute_mul_by_ema_weight(current_period));
	if (stats->faults < BRUTE_MAX_FAULTS)
		WRITE_ONCE(stats->faults, stats->faults + 1);
	WRITE_ONCE(stats->jiffies, now);
	spin_unlock(&stats->lock);

	return last_crash_timestamp;
That way readers can (IIUC) safely use READ_ONCE() on jiffies and faults
without needing to hold the &stats->lock (unless they need perfectly matching
jiffies, period, and faults).
> +
> +/*
> + * BRUTE_MIN_FAULTS - Minimum number of faults.
> + *
> + * The application crash period's EMA cannot be used until a minimum number of
> + * data points has been applied to it. This constraint makes it possible to get
> + * a trend when this moving average is used. Moreover, it avoids the scenario
> + * where an application fails quickly from the execve system call for reasons
> + * unrelated to a real attack.
> + */
> +static const unsigned char BRUTE_MIN_FAULTS = 5;
> +
> +/*
> + * BRUTE_CRASH_PERIOD_THRESHOLD - Application crash period threshold.
> + *
> + * The units are expressed in milliseconds.
> + *
> + * A fast brute force attack is detected when the application crash period falls
> + * below this threshold.
> + */
> +static const u64 BRUTE_CRASH_PERIOD_THRESHOLD = 30000;
These could all be sysctls (see yama's use of sysctl).
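Untested sketch, modeled on Yama (all names here are invented):

	static unsigned int brute_crash_period_threshold = 30000;

	static struct ctl_path brute_sysctl_path[] = {
		{ .procname = "kernel", },
		{ .procname = "brute", },
		{ }
	};

	static struct ctl_table brute_sysctl_table[] = {
		{
			.procname	= "crash_period_threshold_ms",
			.data		= &brute_crash_period_threshold,
			.maxlen		= sizeof(unsigned int),
			.mode		= 0644,
			.proc_handler	= proc_douintvec,
		},
		{ }
	};

	static void __init brute_init_sysctl(void)
	{
		if (!register_sysctl_paths(brute_sysctl_path, brute_sysctl_table))
			panic("brute: sysctl registration failed\n");
	}

min/max faults would get entries in the same table, and the detection
code would then read the variables instead of the constants.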
> +
> +/**
> + * brute_attack_running() - Test if a brute force attack is happening.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * The decision if a brute force attack is running is based on the statistical
> + * data shared by all the fork hierarchy processes. This statistics cannot be
> + * NULL.
> + *
> + * There are two types of brute force attacks that can be detected using the
> + * statistical data. The first one is a slow brute force attack that is detected
> + * if the maximum number of faults per fork hierarchy is reached. The second
> + * type is a fast brute force attack that is detected if the application crash
> + * period falls below a certain threshold.
> + *
> + * Moreover, it is important to note that no attacks will be detected until a
> + * minimum number of faults have occurred. This makes it possible to get a
> + * trend in the crash period when the EMA is used and also avoids the scenario
> + * where an application fails quickly from the execve system call for reasons
> + * unrelated to a real attack.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> + * Return: True if a brute force attack is happening. False otherwise.
> + */
> +static bool brute_attack_running(struct brute_stats *stats)
> +{
> + u64 crash_period;
> +
> + spin_lock(&stats->lock);
> + if (stats->faults < BRUTE_MIN_FAULTS) {
> + spin_unlock(&stats->lock);
> + return false;
> + }
If I'm reading this correctly, you're performing two tests, so there
isn't a strict relationship between faults and period for this test,
and I think it could be done without locking with READ_ONCE():
	u64 faults;
	u64 crash_period;

	faults = READ_ONCE(stats->faults);
	if (faults < BRUTE_MIN_FAULTS)
		return false;
	if (faults >= BRUTE_MAX_FAULTS)
		return true;

	crash_period = jiffies64_to_msecs(READ_ONCE(stats->period));
	return crash_period < BRUTE_CRASH_PERIOD_THRESHOLD;
> +
> + if (stats->faults >= BRUTE_MAX_FAULTS) {
> + spin_unlock(&stats->lock);
> + return true;
> + }
> +
> + crash_period = jiffies64_to_msecs(stats->period);
> + spin_unlock(&stats->lock);
> +
> + return crash_period < BRUTE_CRASH_PERIOD_THRESHOLD;
> +}
> +
> +/**
> + * print_fork_attack_running() - Warn about a fork brute force attack.
> + */
> +static inline void print_fork_attack_running(void)
> +{
> + pr_warn("Fork brute force attack detected [%s]\n", current->comm);
> +}
I think pid should be part of this...
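e.g. something like:

	pr_warn("Fork brute force attack detected [pid %d: %s]\n",
		task_pid_nr(current), current->comm);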
> +
> +/**
> + * brute_manage_fork_attack() - Manage a fork brute force attack.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + * @now: The current timestamp in jiffies.
> + *
> + * For a correct management of a fork brute force attack it is only necessary to
> + * update the statistics and test if an attack is happening based on these data.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> + * Return: The last crash timestamp before updating it.
> + */
> +static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> +{
> + u64 last_fork_crash;
> +
> + last_fork_crash = brute_update_crash_period(stats, now);
> + if (brute_attack_running(stats))
> + print_fork_attack_running();
> +
> + return last_fork_crash;
> +}
> +
> +/**
> + * brute_get_exec_stats() - Get the exec statistics.
> + * @stats: When this function is called, this parameter must point to the
> + * current process' statistical data. When this function returns, this
> + * parameter points to the parent process' statistics of the fork
> + * hierarchy that hold the current process' statistics.
> + *
> + * To manage a brute force attack that happens through the execve system call it
> + * is not possible to use the statistical data held by this process because
> + * these statistics disappear when this task finishes. In this scenario this data
> + * should be tracked by the statistics of a higher fork hierarchy (the hierarchy
> + * that contains the process that forks before the execve system call).
> + *
> + * To find these statistics the current fork hierarchy must be traversed up
> + * until new statistics are found.
> + *
> + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> + */
> +static void brute_get_exec_stats(struct brute_stats **stats)
> +{
> + const struct task_struct *task = current;
> + struct brute_stats **p_stats;
> +
> + do {
> + if (!task->real_parent) {
> + *stats = NULL;
> + return;
> + }
> +
> + p_stats = brute_stats_ptr(task->real_parent);
> + task = task->real_parent;
> + } while (*stats == *p_stats);
> +
> + *stats = *p_stats;
> +}
See Yama's task_is_descendant() for how to walk up the process tree
(and I think the process group stuff will save some steps too); you
don't need tasklist_lock held, just rcu_read_lock held, AIUI:
Documentation/RCU/listRCU.rst
And since you're passing this stats struct back up, where it would live outside the RCU read lock, you'd want to do a "get" on it first:
	rcu_read_lock();
	loop {
		...
	}
	refcount_inc_not_zero(&(*p_stats)->refc);
	rcu_read_unlock();
	*stats = *p_stats;
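Putting that together, an untested sketch (this assumes the stats
lifetime is managed purely by the refcount, as discussed above, and
returns a pointer instead of writing through the argument):

	static struct brute_stats *brute_get_exec_stats(void)
	{
		const struct task_struct *task = current;
		struct brute_stats *stats = *brute_stats_ptr(current);
		struct brute_stats **p_stats;

		rcu_read_lock();
		do {
			if (!task->real_parent) {
				rcu_read_unlock();
				return NULL;
			}
			p_stats = brute_stats_ptr(task->real_parent);
			task = task->real_parent;
		} while (stats == *p_stats);

		/* Pin the parent's stats before leaving the RCU section. */
		if (!refcount_inc_not_zero(&(*p_stats)->refc)) {
			rcu_read_unlock();
			return NULL;
		}
		rcu_read_unlock();

		return *p_stats;
	}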
> +
> +/**
> + * brute_update_exec_crash_period() - Update the exec crash period.
> + * @stats: When this function is called, this parameter must point to the
> + * current process' statistical data. When this function returns, this
> + * parameter points to the updated statistics (statistics that track the
> + * info to manage a brute force attack that happens through the execve
> + * system call).
> + * @now: The current timestamp in jiffies.
> + * @last_fork_crash: The last fork crash timestamp before updating it.
> + *
> + * If this is the first update of the statistics used to manage a brute force
> + * attack that happens through the execve system call, its last crash timestamp
> + * (the timestamp that shows when the execve was called) cannot be used to
> + * compute the crash period's EMA. Instead, the last fork crash timestamp should
> + * be used (the last crash timestamp of the child fork hierarchy before updating
> + * the crash period). This ensures that, in a brute force attack that happens
> + * through the fork system call, the exec and fork statistics are the same. In
> + * this situation, the mitigation method will act only on the processes that are
> + * sharing the fork statistics. This way, the process that forked before the
> + * execve system call will not be involved in the mitigation method. In this
> + * scenario, the parent is not responsible for the child's behaviour.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and tasklist_lock and
> + * brute_stats_ptr_lock held.
> + * Return: -EFAULT if there are no exec statistics. Zero otherwise.
> + */
> +static int brute_update_exec_crash_period(struct brute_stats **stats,
> + u64 now, u64 last_fork_crash)
> +{
> + brute_get_exec_stats(stats);
> + if (!*stats)
> + return -EFAULT;
This isn't EFAULT (userspace memory fault), but rather more EINVAL or
ESRCH.
> +
> + spin_lock(&(*stats)->lock);
> + if (!(*stats)->faults)
> + (*stats)->jiffies = last_fork_crash;
> + spin_unlock(&(*stats)->lock);
> +
> + brute_update_crash_period(*stats, now);
and then you can add:
	if (refcount_dec_and_test(&(*stats)->refc))
		kfree(*stats);
(or better yet, make that a helper) named something like
"put_brute_stats".
> + return 0;
> +}
I find the re-writing of **stats confusing here -- I think you should
leave that unmodified, and instead return a pointer (instead of "int"),
and for errors, use ERR_PTR(-ESRCH)
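Roughly (untested; this builds on the refcounted brute_get_exec_stats()
sketch above):

	static struct brute_stats *brute_update_exec_crash_period(u64 now,
								  u64 last_fork_crash)
	{
		struct brute_stats *stats;

		stats = brute_get_exec_stats();
		if (!stats)
			return ERR_PTR(-ESRCH);

		spin_lock(&stats->lock);
		if (!stats->faults)
			stats->jiffies = last_fork_crash;
		spin_unlock(&stats->lock);

		brute_update_crash_period(stats, now);
		return stats;
	}

and the caller checks IS_ERR() instead of a magic errno.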
> +
> +/**
> + * brute_get_crash_period() - Get the application crash period.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> + * Return: The application crash period.
> + */
> +static u64 brute_get_crash_period(struct brute_stats *stats)
> +{
> + u64 crash_period;
> +
> + spin_lock(&stats->lock);
> + crash_period = stats->period;
> + spin_unlock(&stats->lock);
> +
> + return crash_period;
> +}
	return READ_ONCE(stats->period);
> +
> +/**
> + * print_exec_attack_running() - Warn about an exec brute force attack.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * Before showing the process name it is mandatory to find a process that holds
> + * a pointer to the exec statistics.
> + *
> + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> + */
> +static void print_exec_attack_running(const struct brute_stats *stats)
> +{
> + struct task_struct *p;
> + struct brute_stats **p_stats;
> + bool found = false;
> +
> + for_each_process(p) {
> + p_stats = brute_stats_ptr(p);
> + if (*p_stats == stats) {
> + found = true;
> + break;
> + }
> + }
> +
> + if (WARN(!found, "No exec process\n"))
> + return;
> +
> + pr_warn("Exec brute force attack detected [%s]\n", p->comm);
> +}
Same logic to change here as above for walking the process list. (IIUC, since
you're only reading, you don't need tasklist_lock, just rcu_read_lock.)
But, if I'm reading this right, you only ever call this with "current".
It seems like it would be way more efficient to just use "current"
instead?
> +
> +/**
> + * brute_manage_exec_attack() - Manage an exec brute force attack.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + * @now: The current timestamp in jiffies.
> + * @last_fork_crash: The last fork crash timestamp before updating it.
> + *
> + * For a correct management of an exec brute force attack it is only necessary
> + * to update the exec statistics and test if an attack is happening based on
> + * these data.
> + *
> + * It is important to note that if the fork and exec crash periods are the same,
> + * the attack test is avoided. This ensures that, in a brute force attack that
> + * happens through the fork system call, the mitigation method does not act on
> + * the parent process of the fork hierarchy.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and tasklist_lock and
> + * brute_stats_ptr_lock held.
> + */
> +static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> + u64 last_fork_crash)
> +{
> + int ret;
> + struct brute_stats *exec_stats = stats;
> + u64 fork_period;
> + u64 exec_period;
> +
> + ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
> + if (WARN(ret, "No exec statistical data\n"))
> + return;
I think this should fail closed: if there's a stats processing error,
treat it as an attack.
> +
> + fork_period = brute_get_crash_period(stats);
> + exec_period = brute_get_crash_period(exec_stats);
> + if (fork_period == exec_period)
> + return;
> +
> + if (brute_attack_running(exec_stats))
> + print_exec_attack_running(exec_stats);
> +}
> +
> +/**
> + * brute_task_fatal_signal() - Target for the task_fatal_signal hook.
> + * @siginfo: Contains the signal information.
> + *
> + * To detect a brute force attack is necessary to update the fork and exec
> + * statistics in every fatal crash and act based on these data.
> + *
> + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> + * and brute_stats::lock since the task_free hook can be called from an IRQ
> + * context during the execution of the task_fatal_signal hook.
> + */
> +static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
> +{
> + struct brute_stats **stats;
> + unsigned long flags;
> + u64 last_fork_crash;
> + u64 now = get_jiffies_64();
> +
> + stats = brute_stats_ptr(current);
> + read_lock(&tasklist_lock);
> + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> +
> + if (WARN(!*stats, "No statistical data\n")) {
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> + read_unlock(&tasklist_lock);
> + return;
> }
> +
> + last_fork_crash = brute_manage_fork_attack(*stats, now);
> + brute_manage_exec_attack(*stats, now, last_fork_crash);
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> + read_unlock(&tasklist_lock);
> }
>
> /*
> @@ -235,6 +694,7 @@ static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
> LSM_HOOK_INIT(task_alloc, brute_task_alloc),
> LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
> LSM_HOOK_INIT(task_free, brute_task_free),
> + LSM_HOOK_INIT(task_fatal_signal, brute_task_fatal_signal),
> };
>
> /**
> --
> 2.25.1
>
I think this is very close!
--
Kees Cook
On Sun, Mar 07, 2021 at 12:30:27PM +0100, John Wood wrote:
> To avoid false positives during the attack detection it is necessary to
> narrow the possible cases. Only the following scenarios are taken into
> account:
>
> 1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
> desirable memory layout is got (e.g. Stack Clash).
> 2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
> until a desirable memory layout is got (e.g. what CTFs do for simple
> network service).
> 3.- Launching processes without exec() (e.g. Android Zygote) and exposing
> state to attack a sibling.
> 4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
> the previously shared memory layout of all the other children is
> exposed (e.g. kind of related to HeartBleed).
>
> In each case, a privilege boundary has been crossed:
>
> Case 1: setuid/setgid process
> Case 2: network to local
> Case 3: privilege changes
> Case 4: network to local
>
> So, this patch checks if any of these privilege boundaries have been
> crossed before to compute the application crash period.
>
> Also, in every fatal crash only the signals delivered by the kernel are
> taken into account with the exception of the SIGABRT signal since the
> latter is used by glibc for stack canary, malloc, etc failures, which may
> indicate that a mitigation has been triggered.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> security/brute/brute.c | 293 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 280 insertions(+), 13 deletions(-)
>
> diff --git a/security/brute/brute.c b/security/brute/brute.c
> index 870db55332d4..38e5e050964a 100644
> --- a/security/brute/brute.c
> +++ b/security/brute/brute.c
> @@ -3,15 +3,25 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> #include <asm/current.h>
> +#include <asm/rwonce.h>
> +#include <asm/siginfo.h>
> +#include <asm/signal.h>
> +#include <linux/binfmts.h>
> #include <linux/bug.h>
> #include <linux/compiler.h>
> +#include <linux/cred.h>
> +#include <linux/dcache.h>
> #include <linux/errno.h>
> +#include <linux/fs.h>
> #include <linux/gfp.h>
> +#include <linux/if.h>
> #include <linux/init.h>
> #include <linux/jiffies.h>
> #include <linux/kernel.h>
> #include <linux/lsm_hooks.h>
> #include <linux/math64.h>
> +#include <linux/netdevice.h>
> +#include <linux/path.h>
> #include <linux/printk.h>
> #include <linux/refcount.h>
> #include <linux/rwlock.h>
> @@ -19,9 +29,35 @@
> #include <linux/sched.h>
> #include <linux/sched/signal.h>
> #include <linux/sched/task.h>
> +#include <linux/signal.h>
> +#include <linux/skbuff.h>
> #include <linux/slab.h>
> #include <linux/spinlock.h>
> +#include <linux/stat.h>
> #include <linux/types.h>
> +#include <linux/uidgid.h>
This is really a LOT of includes. Are you sure all of these are
explicitly needed?
> +
> +/**
> + * struct brute_cred - Saved credentials.
> + * @uid: Real UID of the task.
> + * @gid: Real GID of the task.
> + * @suid: Saved UID of the task.
> + * @sgid: Saved GID of the task.
> + * @euid: Effective UID of the task.
> + * @egid: Effective GID of the task.
> + * @fsuid: UID for VFS ops.
> + * @fsgid: GID for VFS ops.
> + */
> +struct brute_cred {
> + kuid_t uid;
> + kgid_t gid;
> + kuid_t suid;
> + kgid_t sgid;
> + kuid_t euid;
> + kgid_t egid;
> + kuid_t fsuid;
> + kgid_t fsgid;
> +};
>
> /**
> * struct brute_stats - Fork brute force attack statistics.
> @@ -30,6 +66,9 @@
> * @faults: Number of crashes.
> * @jiffies: Last crash timestamp.
> * @period: Crash period's moving average.
> + * @saved_cred: Saved credentials.
> + * @network: Network activity flag.
> + * @bounds_crossed: Privilege bounds crossed flag.
> *
> * This structure holds the statistical data shared by all the fork hierarchy
> * processes.
> @@ -40,6 +79,9 @@ struct brute_stats {
> unsigned char faults;
> u64 jiffies;
> u64 period;
> + struct brute_cred saved_cred;
> + unsigned char network : 1;
> + unsigned char bounds_crossed : 1;
If you really want to keep faults a "char", I would move these bools
after "faults" to avoid adding more padding.
> };
>
> /*
> @@ -71,18 +113,25 @@ static inline struct brute_stats **brute_stats_ptr(struct task_struct *task)
>
> /**
> * brute_new_stats() - Allocate a new statistics structure.
> + * @network_to_local: Network activity followed by a fork or execve system call.
> + * @is_setid: The executable file has the setid flags set.
> *
> * If the allocation is successful the reference counter is set to one to
> * indicate that there will be one task that points to this structure. Also, the
> * last crash timestamp is set to now. This way, it is possible to compute the
> * application crash period at the first fault.
> *
> + * Moreover, the credentials of the current task are saved. Also, the network
> + * and bounds_crossed flags are set based on the network_to_local and is_setid
> + * parameters.
> + *
> * Return: NULL if the allocation fails. A pointer to the newly allocated
> * statistics structure if it succeeds.
> */
> -static struct brute_stats *brute_new_stats(void)
> +static struct brute_stats *brute_new_stats(bool network_to_local, bool is_setid)
> {
> struct brute_stats *stats;
> + const struct cred *cred = current_cred();
>
> stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
> if (!stats)
> @@ -93,6 +142,16 @@ static struct brute_stats *brute_new_stats(void)
> stats->faults = 0;
> stats->jiffies = get_jiffies_64();
> stats->period = 0;
> + stats->saved_cred.uid = cred->uid;
> + stats->saved_cred.gid = cred->gid;
> + stats->saved_cred.suid = cred->suid;
> + stats->saved_cred.sgid = cred->sgid;
> + stats->saved_cred.euid = cred->euid;
> + stats->saved_cred.egid = cred->egid;
> + stats->saved_cred.fsuid = cred->fsuid;
> + stats->saved_cred.fsgid = cred->fsgid;
Hm, there's more than just uids to check for perms, but I'll go read
more...
> + stats->network = network_to_local;
> + stats->bounds_crossed = network_to_local || is_setid;
>
> return stats;
> }
> @@ -137,6 +196,10 @@ static void brute_share_stats(struct brute_stats *src,
> * this task and the new one being allocated. Otherwise, share the statistics
> * that the current task already has.
> *
> + * Also, if the shared statistics indicate a previous network activity, the
> + * bounds_crossed flag must be set to show that a network-to-local privilege
> + * boundary has been crossed.
> + *
> * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> * and brute_stats::lock since the task_free hook can be called from an IRQ
> * context during the execution of the task_alloc hook.
> @@ -155,11 +218,14 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
>
> if (likely(*p_stats)) {
> brute_share_stats(*p_stats, stats);
> + spin_lock(&(*stats)->lock);
> + (*stats)->bounds_crossed |= (*stats)->network;
> + spin_unlock(&(*stats)->lock);
> write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return 0;
> }
>
> - *stats = brute_new_stats();
> + *stats = brute_new_stats(false, false);
> if (!*stats) {
> write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return -ENOMEM;
> @@ -170,6 +236,61 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> return 0;
> }
>
> +/**
> + * brute_is_setid() - Test if the executable file has the setid flags set.
> + * @bprm: Points to the linux_binprm structure.
> + *
> + * Return: True if the executable file has the setid flags set. False otherwise.
> + */
> +static bool brute_is_setid(const struct linux_binprm *bprm)
> +{
> + struct file *file = bprm->file;
> + struct inode *inode;
> + umode_t mode;
> +
> + if (!file)
> + return false;
> +
> + inode = file->f_path.dentry->d_inode;
> + mode = inode->i_mode;
> +
> + return !!(mode & (S_ISUID | S_ISGID));
> +}
Oh, er, no, this should not reinvent the wheel. You just want to know if
creds got elevated, so you want bprm->secureexec; this gets correctly
checked in cap_bprm_creds_from_file().
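i.e. brute_is_setid() can go away completely and the call site becomes
just (sketch):

	if (!network_to_local)
		is_setid = bprm->secureexec;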
> +
> +/**
> + * brute_reset_stats() - Reset the statistical data.
> + * @stats: Statistics to be reset.
> + * @is_setid: The executable file has the setid flags set.
> + *
> + * Reset the faults and period and set the last crash timestamp to now. This
> + * way, it is possible to compute the application crash period at the next
> + * fault. Also, save the credentials of the current task and update the
> + * bounds_crossed flag based on a previous network activity and the is_setid
> + * parameter.
> + *
> + * The statistics to be reset cannot be NULL.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * and brute_stats::lock held.
> + */
> +static void brute_reset_stats(struct brute_stats *stats, bool is_setid)
> +{
> + const struct cred *cred = current_cred();
> +
> + stats->faults = 0;
> + stats->jiffies = get_jiffies_64();
> + stats->period = 0;
> + stats->saved_cred.uid = cred->uid;
> + stats->saved_cred.gid = cred->gid;
> + stats->saved_cred.suid = cred->suid;
> + stats->saved_cred.sgid = cred->sgid;
> + stats->saved_cred.euid = cred->euid;
> + stats->saved_cred.egid = cred->egid;
> + stats->saved_cred.fsuid = cred->fsuid;
> + stats->saved_cred.fsgid = cred->fsgid;
> + stats->bounds_crossed = stats->network || is_setid;
> +}
I would include brute_reset_stats() in the first patch (and add to it as
needed). To that end, it can start with a memset(stats, 0, sizeof(*stats));
> +
> /**
> * brute_task_execve() - Target for the bprm_committing_creds hook.
> * @bprm: Points to the linux_binprm structure.
> @@ -188,6 +309,11 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> * only one task (the task that calls the execve function) points to the data.
> * In this case, the previous allocation is used but the statistics are reset.
> *
> + * Also, if the statistics of the process that calls the execve system call
> + * indicate a previous network activity or the executable file has the setid
> + * flags set, the bounds_crossed flag must be set to show that a network to
> + * local privilege boundary or setid boundary has been crossed respectively.
> + *
> * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> * and brute_stats::lock since the task_free hook can be called from an IRQ
> * context during the execution of the bprm_committing_creds hook.
> @@ -196,6 +322,8 @@ static void brute_task_execve(struct linux_binprm *bprm)
> {
> struct brute_stats **stats;
> unsigned long flags;
> + bool network_to_local;
> + bool is_setid = false;
>
> stats = brute_stats_ptr(current);
> read_lock_irqsave(&brute_stats_ptr_lock, flags);
> @@ -206,12 +334,18 @@ static void brute_task_execve(struct linux_binprm *bprm)
> }
>
> spin_lock(&(*stats)->lock);
> + network_to_local = (*stats)->network;
> +
> + /*
> + * A network_to_local flag equal to true will set the bounds_crossed
> + * flag. So, in this scenario the "is setid" test can be avoided.
> + */
> + if (!network_to_local)
> + is_setid = brute_is_setid(bprm);
>
> if (!refcount_dec_not_one(&(*stats)->refc)) {
> /* execve call after an execve call */
> - (*stats)->faults = 0;
> - (*stats)->jiffies = get_jiffies_64();
> - (*stats)->period = 0;
> + brute_reset_stats(*stats, is_setid);
> spin_unlock(&(*stats)->lock);
> read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> return;
> @@ -222,7 +356,7 @@ static void brute_task_execve(struct linux_binprm *bprm)
> read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
>
> write_lock_irqsave(&brute_stats_ptr_lock, flags);
> - *stats = brute_new_stats();
> + *stats = brute_new_stats(network_to_local, is_setid);
> WARN(!*stats, "Cannot allocate statistical data\n");
> write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> }
> @@ -653,12 +787,103 @@ static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> print_exec_attack_running(exec_stats);
> }
>
> +/**
> + * brute_priv_have_changed() - Test if the privileges have changed.
> + * @stats: Statistics that hold the saved credentials.
> + *
> + * The privileges have changed if the credentials of the current task are
> + * different from the credentials saved in the statistics structure.
> + *
> + * The statistics that hold the saved credentials cannot be NULL.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * and brute_stats::lock held.
> + * Return: True if the privileges have changed. False otherwise.
> + */
> +static bool brute_priv_have_changed(struct brute_stats *stats)
> +{
> + const struct cred *cred = current_cred();
> + bool priv_have_changed;
> +
> + priv_have_changed = !uid_eq(stats->saved_cred.uid, cred->uid) ||
> + !gid_eq(stats->saved_cred.gid, cred->gid) ||
> + !uid_eq(stats->saved_cred.suid, cred->suid) ||
> + !gid_eq(stats->saved_cred.sgid, cred->sgid) ||
> + !uid_eq(stats->saved_cred.euid, cred->euid) ||
> + !gid_eq(stats->saved_cred.egid, cred->egid) ||
> + !uid_eq(stats->saved_cred.fsuid, cred->fsuid) ||
> + !gid_eq(stats->saved_cred.fsgid, cred->fsgid);
> +
> + return priv_have_changed;
> +}
This should just be checked from bprm->secureexec, which is valid by the
time you get to the bprm_committing_creds hook. You can just save the
value to your stats struct instead of re-interrogating current_cred,
etc.
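Sketch of what I mean (field name invented):

	/* In struct brute_stats, replacing the struct brute_cred copy: */
	bool secureexec;

	/* Saved once in brute_task_execve(): */
	(*stats)->secureexec = bprm->secureexec;

	/* So the signal-time check reduces to: */
	static bool brute_priv_have_changed(const struct brute_stats *stats)
	{
		return stats->secureexec;
	}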
> +
> +/**
> + * brute_threat_model_supported() - Test if the threat model is supported.
> + * @siginfo: Contains the signal information.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * To avoid false positives during the attack detection it is necessary to
> + * narrow the possible cases. Only the following scenarios are taken into
> + * account:
> + *
> + * 1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
> + * desirable memory layout is got (e.g. Stack Clash).
> + * 2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until
> + * a desirable memory layout is got (e.g. what CTFs do for simple network
> + * service).
> + * 3.- Launching processes without exec() (e.g. Android Zygote) and exposing
> + * state to attack a sibling.
> + * 4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
> + * the previously shared memory layout of all the other children is exposed
> + * (e.g. kind of related to HeartBleed).
> + *
> + * In each case, a privilege boundary has been crossed:
> + *
> + * Case 1: setuid/setgid process
> + * Case 2: network to local
> + * Case 3: privilege changes
> + * Case 4: network to local
> + *
> + * Also, only the signals delivered by the kernel are taken into account with
> + * the exception of the SIGABRT signal since the latter is used by glibc for
> + * stack canary, malloc, etc failures, which may indicate that a mitigation has
> + * been triggered.
> + *
> + * The signal information and the statistical data shared by all the fork
> + * hierarchy processes cannot be NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> + * Return: True if the threat model is supported. False otherwise.
> + */
> +static bool brute_threat_model_supported(const kernel_siginfo_t *siginfo,
> + struct brute_stats *stats)
> +{
> + bool bounds_crossed;
> +
> + if (siginfo->si_signo == SIGKILL && siginfo->si_code != SIGABRT)
> + return false;
> +
> + spin_lock(&stats->lock);
> + bounds_crossed = stats->bounds_crossed;
> + bounds_crossed = bounds_crossed || brute_priv_have_changed(stats);
> + stats->bounds_crossed = bounds_crossed;
> + spin_unlock(&stats->lock);
> +
> + return bounds_crossed;
> +}
I think this logic can be done with READ_ONCE()s and moved directly into
brute_task_fatal_signal().
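i.e. something like this (untested; note READ_ONCE() can't be applied to
a bitfield member, so bounds_crossed would need to become a plain bool):

	if (siginfo->si_signo == SIGKILL && siginfo->si_code != SIGABRT)
		return;

	if (!READ_ONCE((*stats)->bounds_crossed) &&
	    !brute_priv_have_changed(*stats))
		return;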
> +
> /**
> * brute_task_fatal_signal() - Target for the task_fatal_signal hook.
> * @siginfo: Contains the signal information.
> *
> - * To detect a brute force attack is necessary to update the fork and exec
> - * statistics in every fatal crash and act based on these data.
> + * To detect a brute force attack it is necessary, as a first step, to test in
> + * every fatal crash if the threat model is supported. If so, update the fork
> + * and exec statistics and act based on these data.
> *
> * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> * and brute_stats::lock since the task_free hook can be called from an IRQ
> @@ -675,18 +900,59 @@ static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
> read_lock(&tasklist_lock);
> read_lock_irqsave(&brute_stats_ptr_lock, flags);
>
> - if (WARN(!*stats, "No statistical data\n")) {
> - read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> - read_unlock(&tasklist_lock);
> - return;
> - }
> + if (WARN(!*stats, "No statistical data\n"))
> + goto unlock;
> +
> + if (!brute_threat_model_supported(siginfo, *stats))
> + goto unlock;
>
> last_fork_crash = brute_manage_fork_attack(*stats, now);
> brute_manage_exec_attack(*stats, now, last_fork_crash);
> +unlock:
> read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> read_unlock(&tasklist_lock);
> }
>
> +/**
> + * brute_network() - Target for the socket_sock_rcv_skb hook.
> + * @sk: Contains the sock (not socket) associated with the incoming sk_buff.
> + * @skb: Contains the incoming network data.
> + *
> + * A previous step to detect that a network to local boundary has been crossed
> + * is to detect if there is network activity. To do this, it is only necessary
> + * to check if there are data packets received from a network device other than
> + * loopback.
> + *
> + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> + * and brute_stats::lock since the task_free hook can be called from an IRQ
> + * context during the execution of the socket_sock_rcv_skb hook.
> + *
> + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> + * otherwise.
> + */
> +static int brute_network(struct sock *sk, struct sk_buff *skb)
> +{
> + struct brute_stats **stats;
> + unsigned long flags;
> +
> + if (!skb->dev || (skb->dev->flags & IFF_LOOPBACK))
> + return 0;
> +
> + stats = brute_stats_ptr(current);
Uhh, is "current" valid here? I actually don't know this hook very well.
> + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> +
> + if (!*stats) {
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> + return -EFAULT;
> + }
> +
> + spin_lock(&(*stats)->lock);
> + (*stats)->network = true;
> + spin_unlock(&(*stats)->lock);
> + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> + return 0;
> +}
> +
> /*
> * brute_hooks - Targets for the LSM's hooks.
> */
> @@ -695,6 +961,7 @@ static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
> LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
> LSM_HOOK_INIT(task_free, brute_task_free),
> LSM_HOOK_INIT(task_fatal_signal, brute_task_fatal_signal),
> + LSM_HOOK_INIT(socket_sock_rcv_skb, brute_network),
> };
>
> /**
> --
> 2.25.1
>
--
Kees Cook
On Sun, Mar 07, 2021 at 12:30:28PM +0100, John Wood wrote:
> In order to mitigate a brute force attack all the offending tasks involved
> in the attack must be killed. In other words, it is necessary to kill all
> the tasks that share the fork and/or exec statistical data related to the
> attack. Moreover, if the attack happens through the fork system call, the
> processes that have the same group_leader as the current task (the task
> that has crashed) must be avoided since they are in the path to be killed.
>
> When the SIGKILL signal is sent to the offending tasks, the function
> "brute_kill_offending_tasks" will be called in a recursive way from the
> task_fatal_signal LSM hook due to a small crash period. So, to avoid killing
> the same tasks again due to a recursive call of this function, it is
> necessary to disable the attack detection for the involved hierarchies.
>
> To disable the attack detection, set the last crash timestamp to zero and
> avoid computing the application crash period in this case.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> security/brute/brute.c | 141 ++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 132 insertions(+), 9 deletions(-)
>
> diff --git a/security/brute/brute.c b/security/brute/brute.c
> index 38e5e050964a..36a3286a02dd 100644
> --- a/security/brute/brute.c
> +++ b/security/brute/brute.c
> @@ -22,6 +22,7 @@
> #include <linux/math64.h>
> #include <linux/netdevice.h>
> #include <linux/path.h>
> +#include <linux/pid.h>
> #include <linux/printk.h>
> #include <linux/refcount.h>
> #include <linux/rwlock.h>
> @@ -64,7 +65,7 @@ struct brute_cred {
> * @lock: Lock to protect the brute_stats structure.
> * @refc: Reference counter.
> * @faults: Number of crashes.
> - * @jiffies: Last crash timestamp.
> + * @jiffies: Last crash timestamp. If zero, the attack detection is disabled.
> * @period: Crash period's moving average.
> * @saved_cred: Saved credentials.
> * @network: Network activity flag.
> @@ -571,6 +572,125 @@ static inline void print_fork_attack_running(void)
> pr_warn("Fork brute force attack detected [%s]\n", current->comm);
> }
>
> +/**
> + * brute_disabled() - Test if the brute force attack detection is disabled.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * The brute force attack detection enabling/disabling is based on the last
> + * crash timestamp. A zero timestamp indicates that this feature is disabled. A
> + * timestamp greater than zero indicates that the attack detection is enabled.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * held.
> + * Return: True if the brute force attack detection is disabled. False
> + * otherwise.
> + */
> +static bool brute_disabled(struct brute_stats *stats)
> +{
> + bool disabled;
> +
> + spin_lock(&stats->lock);
> + disabled = !stats->jiffies;
> + spin_unlock(&stats->lock);
> +
> + return disabled;
> +}
> +
> +/**
> + * brute_disable() - Disable the brute force attack detection.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * To disable the brute force attack detection it is only necessary to set the
> + * last crash timestamp to zero. A zero timestamp indicates that this feature is
> + * disabled. A timestamp greater than zero indicates that the attack detection
> + * is enabled.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> + * and brute_stats::lock held.
> + */
> +static inline void brute_disable(struct brute_stats *stats)
> +{
> + stats->jiffies = 0;
> +}
> +
> +/**
> + * enum brute_attack_type - Brute force attack type.
> + * @BRUTE_ATTACK_TYPE_FORK: Attack that happens through the fork system call.
> + * @BRUTE_ATTACK_TYPE_EXEC: Attack that happens through the execve system call.
> + */
> +enum brute_attack_type {
> + BRUTE_ATTACK_TYPE_FORK,
> + BRUTE_ATTACK_TYPE_EXEC,
> +};
> +
> +/**
> + * brute_kill_offending_tasks() - Kill the offending tasks.
> + * @attack_type: Brute force attack type.
> + * @stats: Statistical data shared by all the fork hierarchy processes.
> + *
> + * When a brute force attack is detected all the offending tasks involved in the
> + * attack must be killed. In other words, it is necessary to kill all the tasks
> + * that share the same statistical data. Moreover, if the attack happens through
> + * the fork system call, the processes that have the same group_leader as the
> + * current task must be skipped since they are already on the way to being
> + * killed.
> + *
> + * When the SIGKILL signal is sent to the offending tasks, this function will be
> + * called again from the task_fatal_signal hook due to a small crash period. So,
> + * to avoid killing the same tasks again through a recursive call of this
> + * function, it is necessary to disable the attack detection for this fork
> + * hierarchy.
Hah. Interesting. I wonder if there is a better way to handle this. Hmm.
> + *
> + * The statistical data shared by all the fork hierarchy processes cannot be
> + * NULL.
> + *
> + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> + * since the task_free hook can be called from an IRQ context during the
> + * execution of the task_fatal_signal hook.
> + *
> + * Context: Must be called with interrupts disabled and tasklist_lock and
> + * brute_stats_ptr_lock held.
> + */
> +static void brute_kill_offending_tasks(enum brute_attack_type attack_type,
> + struct brute_stats *stats)
> +{
> + struct task_struct *p;
> + struct brute_stats **p_stats;
> +
> + spin_lock(&stats->lock);
> +
> + if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
> + refcount_read(&stats->refc) == 1) {
> + spin_unlock(&stats->lock);
> + return;
> + }
refcount_read() isn't a safe way to check that there is only 1
reference. What's this trying to do?
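For illustration, the race window looks something like this (a hypothetical
interleaving):

    CPU0                                  CPU1
    refcount_read(&stats->refc) == 1
                                          fork() -> brute_share_stats():
                                              refcount_inc(&stats->refc)
    /* acts on the now-stale "single user" result */

The read and the decision based on it are two separate operations, so
nothing prevents the count from changing in between.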
> +
> + brute_disable(stats);
> + spin_unlock(&stats->lock);
> +
> + for_each_process(p) {
> + if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
> + p->group_leader == current->group_leader)
> + continue;
> +
> + p_stats = brute_stats_ptr(p);
> + if (*p_stats != stats)
> + continue;
> +
> + do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> + pr_warn_ratelimited("Offending process %d [%s] killed\n",
> + p->pid, p->comm);
> + }
> +}
> +
> /**
> * brute_manage_fork_attack() - Manage a fork brute force attack.
> * @stats: Statistical data shared by all the fork hierarchy processes.
> @@ -586,8 +706,8 @@ static inline void print_fork_attack_running(void)
> * since the task_free hook can be called from an IRQ context during the
> * execution of the task_fatal_signal hook.
> *
> - * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> - * held.
> + * Context: Must be called with interrupts disabled and tasklist_lock and
> + * brute_stats_ptr_lock held.
> * Return: The last crash timestamp before updating it.
> */
> static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> @@ -595,8 +715,10 @@ static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> u64 last_fork_crash;
>
> last_fork_crash = brute_update_crash_period(stats, now);
> - if (brute_attack_running(stats))
> + if (brute_attack_running(stats)) {
> print_fork_attack_running();
> + brute_kill_offending_tasks(BRUTE_ATTACK_TYPE_FORK, stats);
> + }
>
> return last_fork_crash;
> }
> @@ -783,8 +905,10 @@ static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> if (fork_period == exec_period)
> return;
>
> - if (brute_attack_running(exec_stats))
> + if (brute_attack_running(exec_stats)) {
> print_exec_attack_running(exec_stats);
> + brute_kill_offending_tasks(BRUTE_ATTACK_TYPE_EXEC, exec_stats);
> + }
> }
>
> /**
> @@ -900,10 +1024,9 @@ static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
> read_lock(&tasklist_lock);
> read_lock_irqsave(&brute_stats_ptr_lock, flags);
>
> - if (WARN(!*stats, "No statistical data\n"))
> - goto unlock;
> -
> - if (!brute_threat_model_supported(siginfo, *stats))
> + if (WARN(!*stats, "No statistical data\n") ||
> + brute_disabled(*stats) ||
> + !brute_threat_model_supported(siginfo, *stats))
> goto unlock;
>
> last_fork_crash = brute_manage_fork_attack(*stats, now);
> --
> 2.25.1
>
--
Kees Cook
On Sun, Mar 07, 2021 at 12:30:30PM +0100, John Wood wrote:
> Add some info detailing what is the Brute LSM, its motivation, weak
> points of existing implementations, proposed solutions, enabling,
> disabling and self-tests.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> Documentation/admin-guide/LSM/Brute.rst | 278 ++++++++++++++++++++++++
> Documentation/admin-guide/LSM/index.rst | 1 +
> security/brute/Kconfig | 3 +-
> 3 files changed, 281 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/admin-guide/LSM/Brute.rst
>
> diff --git a/Documentation/admin-guide/LSM/Brute.rst b/Documentation/admin-guide/LSM/Brute.rst
> new file mode 100644
> index 000000000000..ca80aef9aa67
> --- /dev/null
> +++ b/Documentation/admin-guide/LSM/Brute.rst
> @@ -0,0 +1,278 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +===========================================================
> +Brute: Fork brute force attack detection and mitigation LSM
> +===========================================================
> +
> +Attacks against vulnerable userspace applications with the purpose of breaking
> +ASLR or bypassing canaries traditionally use some level of brute force with the
> +help of the fork system call. This is possible since when creating a new
> +process using fork, its memory contents are the same as those of the parent
> +process (the process that called the fork system call). So, the attacker can
> +test the memory an unlimited number of times to find the correct memory values
> +or the correct memory addresses without worrying about crashing the
> +application.
> +
> +Based on the above scenario it would be nice to have this detected and
> +mitigated, and this is the goal of this implementation. Specifically the
> +following attacks are expected to be detected:
> +
> +1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
> +    desirable memory layout is achieved (e.g. Stack Clash).
> +2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a
> +    desirable memory layout is achieved (e.g. what CTFs do for simple network
> +    service).
> +3.- Launching processes without exec() (e.g. Android Zygote) and exposing state
> + to attack a sibling.
> +4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the
> + previously shared memory layout of all the other children is exposed (e.g.
> + kind of related to HeartBleed).
> +
> +In each case, a privilege boundary has been crossed:
> +
> +Case 1: setuid/setgid process
> +Case 2: network to local
> +Case 3: privilege changes
> +Case 4: network to local
> +
> +So, what really needs to be detected are fork/exec brute force attacks that
> +cross any of the privilege bounds noted above.
> +
> +
> +Other implementations
> +=====================
> +
> +In summary, the public version of grsecurity is based on the idea of delaying
> +the fork system call if a child died due to some fatal signal (SIGSEGV, SIGBUS,
> +SIGKILL or SIGILL). This has some issues:
> +
> +Bad practices
> +-------------
> +
> +Adding delays to the kernel is, in general, a bad idea.
> +
> +Scenarios not detected (false negatives)
> +----------------------------------------
> +
> +This protection acts only when the fork system call is called after a child has
> +crashed. So, it would still be possible for an attacker to fork a large number
> +of children (on the order of thousands), then probe all of them, and finally
> +wait out the protection time before repeating the steps.
> +
> +Moreover, this method is based on the idea that the protection doesn't act if
> +the parent crashes. So, it would still be possible for an attacker to fork a
> +process and probe itself, then fork the child process and probe itself again.
> +This way, these steps can be repeated indefinitely without any mitigation.
> +
> +Scenarios detected (false positives)
> +------------------------------------
> +
> +Scenarios where an application fails occasionally for reasons unrelated to a
> +real attack.
> +
> +
> +This implementation
> +===================
> +
> +The main idea behind this implementation is to improve on the existing ones by
> +focusing on the weak points noted above. Basically, the adopted solution is to
> +detect a fast crash rate instead of only one simple crash and to detect crashes
> +of both parent and child processes. The detection is also fine-tuned to focus
> +on privilege boundary crossing. And finally, as a mitigation method, all the
> +offending tasks involved in the attack are killed instead of introducing delays.
> +
> +To achieve this goal, and going into more detail, this implementation is based
> +on the use of statistical data shared across all the processes that can have
> +the same memory contents. Or in other words, statistical data shared between
> +all the fork hierarchy processes after an execve system call.
> +
> +The purpose of these statistics is, basically, to collect all the necessary
> +info to compute the application crash period in order to detect an attack. This crash
> +period is the time between the execve system call and the first fault or the
> +time between two consecutive faults, but this has a drawback. If an application
> +crashes twice in a short period of time for some reason unrelated to a real
> +attack, a false positive will be triggered. To avoid this scenario the
> +exponential moving average (EMA) is used. This way, the application crash period
> +will be a value that is not prone to change due to spurious data and follows the
> +real crash period.
> +
> +To detect a brute force attack it is necessary that the statistics shared by
> +all the fork hierarchy processes be updated on every fatal crash, and the most
> +important data to update is the application crash period.
> +
> +These statistics are held by the brute_stats struct.
> +
> +struct brute_cred {
> + kuid_t uid;
> + kgid_t gid;
> + kuid_t suid;
> + kgid_t sgid;
> + kuid_t euid;
> + kgid_t egid;
> + kuid_t fsuid;
> + kgid_t fsgid;
> +};
> +
> +struct brute_stats {
> + spinlock_t lock;
> + refcount_t refc;
> + unsigned char faults;
> + u64 jiffies;
> + u64 period;
> + struct brute_cred saved_cred;
> + unsigned char network : 1;
> + unsigned char bounds_crossed : 1;
> +};
Instead of open-coding this, just use the kernel-doc references you've
already built in the .c files:
.. kernel-doc:: security/brute/brute.c
> +
> +This is a fixed-size struct, so the memory usage will be based on the current
> +number of processes exec()ing. This holds since in every fork system call the
> +parent's statistics are shared with the child process and in every execve
> +system call a new brute_stats struct is allocated. So, only one brute_stats
> +struct is used for every fork hierarchy (the hierarchy of processes descending
> +from the execve system call).
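As an illustration of this sharing (a hypothetical sequence of events; A and
B are just labels for two allocations):

    execve()          -> allocate stats A              (refc(A) = 1)
    fork()            -> child shares stats A          (refc(A) = 2)
    child: execve()   -> child drops stats A           (refc(A) = 1)
                         and allocates stats B         (refc(B) = 1)
    fork()            -> new child shares stats A      (refc(A) = 2)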
> +
> +There are two types of brute force attacks that need to be detected. The first
> +one is an attack that happens through the fork system call and the second one is
> +an attack that happens through the execve system call. The first type uses the
> +statistics shared by all the fork hierarchy processes, but the second type
> +cannot use this statistical data since these statistics disappear when the
> +involved tasks finish. In this last scenario the attack info should be tracked
> +by the statistics of a higher fork hierarchy (the hierarchy that contains the
> +process that forks before the execve system call).
> +
> +Moreover, these two attack types have two variants: a slow brute force attack,
> +detected if a maximum number of faults per fork hierarchy is reached, and a
> +fast brute force attack, detected if the application crash period falls below
> +a certain threshold.
> +
> +Once an attack has been detected, it is mitigated by killing all the offending
> +tasks involved. Or in other words, it is mitigated by killing all the processes
> +that share the same statistics (the stats that show a slow or fast brute force
> +attack).
> +
> +Fine tuning the attack detection
> +--------------------------------
> +
> +To avoid false positives during the attack detection it is necessary to narrow
> +the possible cases. To do so, and based on the threat scenarios that we want to
> +detect, this implementation also focuses on the crossing of privilege bounds.
> +
> +To be precise, only the following privilege bounds are taken into account:
> +
> +1.- setuid/setgid process
> +2.- network to local
> +3.- privilege changes
> +
> +Moreover, only the fatal signals delivered by the kernel are taken into account
> +avoiding the fatal signals sent by userspace applications (with the exception of
> +the SIGABRT user signal since this is used by glibc for stack canary, malloc,
> +etc. failures, which may indicate that a mitigation has been triggered).
> +
> +Exponential moving average (EMA)
> +--------------------------------
> +
> +This kind of average defines a weight (between 0 and 1) for the new value to add
> +and applies the remainder of the weight to the current average value. This way,
> +some spurious data will not excessively modify the average, and only if the new
> +values persist will the moving average tend towards them.
> +
> +Mathematically the application crash period's EMA can be expressed as follows:
> +
> +period_ema = period * weight + period_ema * (1 - weight)
> +
> +Related to the attack detection, the EMA must guarantee that not many crashes
> +are needed. To demonstrate this, the scenario where an application has been
> +running without any crashes for a month will be used.
> +
> +The period's EMA can be written now as:
> +
> +period_ema[i] = period[i] * weight + period_ema[i - 1] * (1 - weight)
> +
> +If the new crash periods have insignificant values relative to the first crash
> +period (a month in this case), the formula can be rewritten as:
> +
> +period_ema[i] = period_ema[i - 1] * (1 - weight)
> +
> +And by extension:
> +
> +period_ema[i - 1] = period_ema[i - 2] * (1 - weight)
> +period_ema[i - 2] = period_ema[i - 3] * (1 - weight)
> +period_ema[i - 3] = period_ema[i - 4] * (1 - weight)
> +
> +So, if the substitution is made:
> +
> +period_ema[i] = period_ema[i - 1] * (1 - weight)
> +period_ema[i] = period_ema[i - 2] * pow((1 - weight) , 2)
> +period_ema[i] = period_ema[i - 3] * pow((1 - weight) , 3)
> +period_ema[i] = period_ema[i - 4] * pow((1 - weight) , 4)
> +
> +And in a more generic form:
> +
> +period_ema[i] = period_ema[i - n] * pow((1 - weight) , n)
> +
> +Where n represents the number of iterations needed to obtain an EMA value or,
> +in other words, the number of crashes needed to detect an attack.
> +
> +So, if we isolate the number of crashes:
> +
> +period_ema[i] / period_ema[i - n] = pow((1 - weight), n)
> +log(period_ema[i] / period_ema[i - n]) = log(pow((1 - weight), n))
> +log(period_ema[i] / period_ema[i - n]) = n * log(1 - weight)
> +n = log(period_ema[i] / period_ema[i - n]) / log(1 - weight)
> +
> +Then, in the commented scenario (an application has been running without any
> +crashes for a month), the approximate number of crashes to detect an attack
> +(using the implementation values for the weight and the crash period threshold)
> +is:
> +
> +weight = 7 / 10
> +crash_period_threshold = 30 seconds
> +
> +n = log(crash_period_threshold / seconds_per_month) / log(1 - weight)
> +n = log(30 / (30 * 24 * 3600)) / log(1 - 0.7)
> +n = 9.44
> +
> +So, with 10 crashes for this scenario an attack will be detected. If these steps
> +are repeated for different scenarios and the results are collected:
> +
> +1 month without any crashes ----> 9.44 crashes to detect an attack
> +1 year without any crashes -----> 11.50 crashes to detect an attack
> +10 years without any crashes ---> 13.42 crashes to detect an attack
> +
> +However, this computation has a drawback: the first data points added to the
> +EMA do not yet yield a real average showing a trend. The solution is simple:
> +the EMA needs a minimum amount of data before it can be interpreted. This way,
> +the case where the first few faults come fast enough but are followed by no
> +crashes is avoided.
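Stepping back to the update rule itself, the EMA computation with the 7/10
weight could look roughly like this (a sketch with a hypothetical helper
name; not necessarily what the in-tree code does):

    static u64 brute_ema_update(u64 period_ema, u64 new_period)
    {
            /* weight = 7/10: the new sample contributes 70%, history 30% */
            return div_u64(new_period * 7 + period_ema * 3, 10);
    }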
> +
> +Per system enabling/disabling
> +-----------------------------
> +
> +This feature can be enabled at build time using the CONFIG_SECURITY_FORK_BRUTE
> +option or using the visual config application under the following menu:
> +
> +Security options ---> Fork brute force attack detection and mitigation
> +
> +Also, this feature can be disabled at boot time by changing the "lsm=" boot
> +parameter.
> +
> +Kernel selftests
> +----------------
> +
> +To validate all the expectations about this implementation, there is a set of
> +selftests. These tests cover fork/exec brute force attacks crossing the
> +following privilege boundaries:
> +
> +1.- setuid process
> +2.- privilege changes
> +3.- network to local
> +
> +Also, there are some tests to check that fork/exec brute force attacks that do
> +not cross any of the privilege boundaries listed above don't trigger the
> +detection and mitigation stage.
> +
> +To build the tests:
> +make -C tools/testing/selftests/ TARGETS=brute
> +
> +To run the tests:
> +make -C tools/testing/selftests TARGETS=brute run_tests
> +
> +To package the tests:
> +make -C tools/testing/selftests TARGETS=brute gen_tar
> diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
> index a6ba95fbaa9f..1f68982bb330 100644
> --- a/Documentation/admin-guide/LSM/index.rst
> +++ b/Documentation/admin-guide/LSM/index.rst
> @@ -41,6 +41,7 @@ subdirectories.
> :maxdepth: 1
>
> apparmor
> + Brute
> LoadPin
> SELinux
> Smack
> diff --git a/security/brute/Kconfig b/security/brute/Kconfig
> index 1bd2df1e2dec..334d7e88d27f 100644
> --- a/security/brute/Kconfig
> +++ b/security/brute/Kconfig
> @@ -7,6 +7,7 @@ config SECURITY_FORK_BRUTE
> vulnerable userspace processes. The detection method is based on
> the application crash period and as a mitigation procedure all the
> offending tasks are killed. Like capabilities, this security module
> - stacks with other LSMs.
> + stacks with other LSMs. Further information can be found in
> + Documentation/admin-guide/LSM/Brute.rst.
>
> If you are unsure how to answer this question, answer N.
> --
> 2.25.1
>
--
Kees Cook
On Sun, Mar 07, 2021 at 12:30:29PM +0100, John Wood wrote:
> Add tests to check the brute LSM functionality and cover fork/exec brute
> force attacks crossing the following privilege boundaries:
>
> 1.- setuid process
> 2.- privilege changes
> 3.- network to local
>
> Also, as a first step, check that fork/exec brute force attacks that do not
> cross any of the privilege boundaries listed above don't trigger the
> detection and mitigation stage.
>
> All the fork brute force attacks are carried out via the "exec" app to
> avoid triggering the "brute" LSM on the shell script running the
> tests.
>
> Signed-off-by: John Wood <[email protected]>
Yay tests!
> ---
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/brute/.gitignore | 2 +
> tools/testing/selftests/brute/Makefile | 5 +
> tools/testing/selftests/brute/config | 1 +
> tools/testing/selftests/brute/exec.c | 44 ++
> tools/testing/selftests/brute/test.c | 507 +++++++++++++++++++++++
> tools/testing/selftests/brute/test.sh | 226 ++++++++++
> 7 files changed, 786 insertions(+)
> create mode 100644 tools/testing/selftests/brute/.gitignore
> create mode 100644 tools/testing/selftests/brute/Makefile
> create mode 100644 tools/testing/selftests/brute/config
> create mode 100644 tools/testing/selftests/brute/exec.c
> create mode 100644 tools/testing/selftests/brute/test.c
> create mode 100755 tools/testing/selftests/brute/test.sh
>
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index 6c575cf34a71..d4cf9e1c0a6d 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -2,6 +2,7 @@
> TARGETS = arm64
> TARGETS += bpf
> TARGETS += breakpoints
> +TARGETS += brute
> TARGETS += capabilities
> TARGETS += cgroup
> TARGETS += clone3
> diff --git a/tools/testing/selftests/brute/.gitignore b/tools/testing/selftests/brute/.gitignore
> new file mode 100644
> index 000000000000..1ccc45251a1b
> --- /dev/null
> +++ b/tools/testing/selftests/brute/.gitignore
> @@ -0,0 +1,2 @@
> +exec
> +test
> diff --git a/tools/testing/selftests/brute/Makefile b/tools/testing/selftests/brute/Makefile
> new file mode 100644
> index 000000000000..52662d0b484c
> --- /dev/null
> +++ b/tools/testing/selftests/brute/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +CFLAGS += -Wall -O2
> +TEST_PROGS := test.sh
> +TEST_GEN_FILES := exec test
> +include ../lib.mk
> diff --git a/tools/testing/selftests/brute/config b/tools/testing/selftests/brute/config
> new file mode 100644
> index 000000000000..3587b7bf6c23
> --- /dev/null
> +++ b/tools/testing/selftests/brute/config
> @@ -0,0 +1 @@
> +CONFIG_SECURITY_FORK_BRUTE=y
> diff --git a/tools/testing/selftests/brute/exec.c b/tools/testing/selftests/brute/exec.c
> new file mode 100644
> index 000000000000..1bbe72f6e4bd
> --- /dev/null
> +++ b/tools/testing/selftests/brute/exec.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <libgen.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +static __attribute__((noreturn)) void error_failure(const char *message)
> +{
> + perror(message);
> + exit(EXIT_FAILURE);
> +}
> +
> +#define PROG_NAME basename(argv[0])
> +
> +int main(int argc, char **argv)
> +{
> + pid_t pid;
> + int status;
> +
> + if (argc < 2) {
> + printf("Usage: %s <EXECUTABLE>\n", PROG_NAME);
> + exit(EXIT_FAILURE);
> + }
> +
> + pid = fork();
> + if (pid < 0)
> + error_failure("fork");
> +
> + /* Child process */
> + if (!pid) {
> + execve(argv[1], &argv[1], NULL);
> + error_failure("execve");
> + }
> +
> + /* Parent process */
> + pid = waitpid(pid, &status, 0);
> + if (pid < 0)
> + error_failure("waitpid");
> +
> + return EXIT_SUCCESS;
> +}
> diff --git a/tools/testing/selftests/brute/test.c b/tools/testing/selftests/brute/test.c
> new file mode 100644
> index 000000000000..44c32f446dca
> --- /dev/null
> +++ b/tools/testing/selftests/brute/test.c
> @@ -0,0 +1,507 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <libgen.h>
> +#include <pwd.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/socket.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +#include <unistd.h>
> +
> +static const char *message = "message";
> +
> +enum mode {
> + MODE_NONE,
> + MODE_CRASH,
> + MODE_SERVER_CRASH,
> + MODE_CLIENT,
> +};
> +
> +enum crash_after {
> + CRASH_AFTER_NONE,
> + CRASH_AFTER_FORK,
> + CRASH_AFTER_EXEC,
> +};
> +
> +enum signal_from {
> + SIGNAL_FROM_NONE,
> + SIGNAL_FROM_USER,
> + SIGNAL_FROM_KERNEL,
> +};
> +
> +struct args {
> + uint32_t ip;
> + uint16_t port;
> + int counter;
> + long timeout;
> + enum mode mode;
> + enum crash_after crash_after;
> + enum signal_from signal_from;
> + unsigned char has_counter : 1;
> + unsigned char has_change_priv : 1;
> + unsigned char has_ip : 1;
> + unsigned char has_port : 1;
> + unsigned char has_timeout : 1;
> +};
> +
> +#define OPT_STRING "hm:c:s:n:Ca:p:t:"
> +
> +static void usage(const char *prog)
> +{
> + printf("Usage: %s <OPTIONS>\n", prog);
> + printf("OPTIONS:\n");
> + printf(" -h: Show this help and exit. Optional.\n");
> + printf(" -m (crash | server_crash | client): Mode. Required.\n");
> + printf("Options for crash mode:\n");
> + printf(" -c (fork | exec): Crash after. Optional.\n");
> + printf(" -s (user | kernel): Signal from. Required.\n");
> + printf(" -n counter: Number of crashes.\n");
> + printf(" Required if the option -c is used.\n");
> + printf(" Not used without the option -c.\n");
> + printf(" Range from 1 to INT_MAX.\n");
> + printf(" -C: Change privileges before crash. Optional.\n");
> + printf("Options for server_crash mode:\n");
> + printf(" -a ip: IPv4 address to accept. Required.\n");
> + printf(" -p port: Port number. Required.\n");
> + printf(" Range from 1 to UINT16_MAX.\n");
> + printf(" -t secs: Accept timeout. Required.\n");
> + printf(" Range from 1 to LONG_MAX.\n");
> + printf(" -c (fork | exec): Crash after. Required.\n");
> + printf(" -s (user | kernel): Signal from. Required.\n");
> + printf(" -n counter: Number of crashes. Required.\n");
> + printf(" Range from 1 to INT_MAX.\n");
> + printf("Options for client mode:\n");
> + printf(" -a ip: IPv4 address to connect. Required.\n");
> + printf(" -p port: Port number. Required.\n");
> + printf(" Range from 1 to UINT16_MAX.\n");
> + printf(" -t secs: Connect timeout. Required.\n");
> + printf(" Range from 1 to LONG_MAX.\n");
> +}
> +
> +static __attribute__((noreturn)) void info_failure(const char *message,
> + const char *prog)
> +{
> + printf("%s\n", message);
> + usage(prog);
> + exit(EXIT_FAILURE);
> +}
> +
> +static enum mode get_mode(const char *text, const char *prog)
> +{
> + if (!strcmp(text, "crash"))
> + return MODE_CRASH;
> +
> + if (!strcmp(text, "server_crash"))
> + return MODE_SERVER_CRASH;
> +
> + if (!strcmp(text, "client"))
> + return MODE_CLIENT;
> +
> + info_failure("Invalid mode option [-m].", prog);
> +}
> +
> +static enum crash_after get_crash_after(const char *text, const char *prog)
> +{
> + if (!strcmp(text, "fork"))
> + return CRASH_AFTER_FORK;
> +
> + if (!strcmp(text, "exec"))
> + return CRASH_AFTER_EXEC;
> +
> + info_failure("Invalid crash after option [-c].", prog);
> +}
> +
> +static enum signal_from get_signal_from(const char *text, const char *prog)
> +{
> + if (!strcmp(text, "user"))
> + return SIGNAL_FROM_USER;
> +
> + if (!strcmp(text, "kernel"))
> + return SIGNAL_FROM_KERNEL;
> +
> + info_failure("Invalid signal from option [-s]", prog);
> +}
> +
> +static int get_counter(const char *text, const char *prog)
> +{
> + int counter;
> +
> + counter = atoi(text);
> + if (counter > 0)
> + return counter;
> +
> + info_failure("Invalid counter option [-n].", prog);
> +}
> +
> +static __attribute__((noreturn)) void error_failure(const char *message)
> +{
> + perror(message);
> + exit(EXIT_FAILURE);
> +}
> +
> +static uint32_t get_ip(const char *text, const char *prog)
> +{
> + int ret;
> + uint32_t ip;
> +
> + ret = inet_pton(AF_INET, text, &ip);
> + if (!ret)
> + info_failure("Invalid ip option [-a].", prog);
> + else if (ret < 0)
> + error_failure("inet_pton");
> +
> + return ip;
> +}
> +
> +static uint16_t get_port(const char *text, const char *prog)
> +{
> + long port;
> +
> + port = atol(text);
> + if ((port > 0) && (port <= UINT16_MAX))
> + return htons(port);
> +
> + info_failure("Invalid port option [-p].", prog);
> +}
> +
> +static long get_timeout(const char *text, const char *prog)
> +{
> + long timeout;
> +
> + timeout = atol(text);
> + if (timeout > 0)
> + return timeout;
> +
> + info_failure("Invalid timeout option [-t].", prog);
> +}
> +
> +static void check_args(const struct args *args, const char *prog)
> +{
> + if (args->mode == MODE_CRASH && args->crash_after != CRASH_AFTER_NONE &&
> + args->signal_from != SIGNAL_FROM_NONE && args->has_counter &&
> + !args->has_ip && !args->has_port && !args->has_timeout)
> + return;
> +
> + if (args->mode == MODE_CRASH && args->signal_from != SIGNAL_FROM_NONE &&
> + args->crash_after == CRASH_AFTER_NONE && !args->has_counter &&
> + !args->has_ip && !args->has_port && !args->has_timeout)
> + return;
> +
> + if (args->mode == MODE_SERVER_CRASH && args->has_ip && args->has_port &&
> + args->has_timeout && args->crash_after != CRASH_AFTER_NONE &&
> + args->signal_from != SIGNAL_FROM_NONE && args->has_counter &&
> + !args->has_change_priv)
> + return;
> +
> + if (args->mode == MODE_CLIENT && args->has_ip && args->has_port &&
> + args->has_timeout && args->crash_after == CRASH_AFTER_NONE &&
> + args->signal_from == SIGNAL_FROM_NONE && !args->has_counter &&
> + !args->has_change_priv)
> + return;
> +
> + info_failure("Invalid use of options.", prog);
> +}
> +
> +static uid_t get_non_root_uid(void)
> +{
> + struct passwd *pwent;
> + uid_t uid;
> +
> + while (true) {
> + errno = 0;
> + pwent = getpwent();
> + if (!pwent) {
> + if (errno) {
> + perror("getpwent");
> + endpwent();
> + exit(EXIT_FAILURE);
> + }
> + break;
> + }
> +
> + if (pwent->pw_uid) {
> + uid = pwent->pw_uid;
> + endpwent();
> + return uid;
> + }
> + }
> +
> + endpwent();
> + printf("A user different of root is needed.\n");
> + exit(EXIT_FAILURE);
> +}
> +
> +static inline void do_sigsegv(void)
> +{
> + int *p = NULL;
> + *p = 0;
> +}
> +
> +static void do_sigkill(void)
> +{
> + int ret;
> +
> + ret = kill(getpid(), SIGKILL);
> + if (ret)
> + error_failure("kill");
> +}
> +
> +static void crash(enum signal_from signal_from, bool change_priv)
> +{
> + int ret;
> +
> + if (change_priv) {
> + ret = setuid(get_non_root_uid());
> + if (ret)
> + error_failure("setuid");
> + }
> +
> + if (signal_from == SIGNAL_FROM_KERNEL)
> + do_sigsegv();
> +
> + do_sigkill();
> +}
> +
> +static void execve_crash(char *const argv[])
> +{
> + execve(argv[0], argv, NULL);
> + error_failure("execve");
> +}
> +
> +static void exec_crash_user(void)
> +{
> + char *const argv[] = {
> + "./test", "-m", "crash", "-s", "user", NULL,
> + };
> +
> + execve_crash(argv);
> +}
> +
> +static void exec_crash_user_change_priv(void)
> +{
> + char *const argv[] = {
> + "./test", "-m", "crash", "-s", "user", "-C", NULL,
> + };
> +
> + execve_crash(argv);
> +}
> +
> +static void exec_crash_kernel(void)
> +{
> + char *const argv[] = {
> + "./test", "-m", "crash", "-s", "kernel", NULL,
> + };
> +
> + execve_crash(argv);
> +}
> +
> +static void exec_crash_kernel_change_priv(void)
> +{
> + char *const argv[] = {
> + "./test", "-m", "crash", "-s", "kernel", "-C", NULL,
> + };
> +
> + execve_crash(argv);
> +}
> +
> +static void exec_crash(enum signal_from signal_from, bool change_priv)
> +{
> + if (signal_from == SIGNAL_FROM_USER && !change_priv)
> + exec_crash_user();
> + if (signal_from == SIGNAL_FROM_USER && change_priv)
> + exec_crash_user_change_priv();
> + if (signal_from == SIGNAL_FROM_KERNEL && !change_priv)
> + exec_crash_kernel();
> + if (signal_from == SIGNAL_FROM_KERNEL && change_priv)
> + exec_crash_kernel_change_priv();
> +}
> +
> +static void do_crash(enum crash_after crash_after, enum signal_from signal_from,
> + int counter, bool change_priv)
> +{
> + pid_t pid;
> + int status;
> +
> + if (crash_after == CRASH_AFTER_NONE)
> + crash(signal_from, change_priv);
> +
> + while (counter > 0) {
> + pid = fork();
> + if (pid < 0)
> + error_failure("fork");
> +
> + /* Child process */
> + if (!pid) {
> + if (crash_after == CRASH_AFTER_FORK)
> + crash(signal_from, change_priv);
> +
> + exec_crash(signal_from, change_priv);
> + }
> +
> + /* Parent process */
> + counter -= 1;
> + pid = waitpid(pid, &status, 0);
> + if (pid < 0)
> + error_failure("waitpid");
> + }
> +}
> +
> +static __attribute__((noreturn)) void error_close_failure(const char *message,
> + int fd)
> +{
> + perror(message);
> + close(fd);
> + exit(EXIT_FAILURE);
> +}
> +
> +static void do_server(uint32_t ip, uint16_t port, long accept_timeout)
> +{
> + int sockfd;
> + int ret;
> + struct sockaddr_in address;
> + struct timeval timeout;
> + int newsockfd;
> +
> + sockfd = socket(AF_INET, SOCK_STREAM, 0);
> + if (sockfd < 0)
> + error_failure("socket");
> +
> + address.sin_family = AF_INET;
> + address.sin_addr.s_addr = ip;
> + address.sin_port = port;
> +
> + ret = bind(sockfd, (const struct sockaddr *)&address, sizeof(address));
> + if (ret)
> + error_close_failure("bind", sockfd);
> +
> + ret = listen(sockfd, 1);
> + if (ret)
> + error_close_failure("listen", sockfd);
> +
> + timeout.tv_sec = accept_timeout;
> + timeout.tv_usec = 0;
> + ret = setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO,
> + (const struct timeval *)&timeout, sizeof(timeout));
> + if (ret)
> + error_close_failure("setsockopt", sockfd);
> +
> + newsockfd = accept(sockfd, NULL, NULL);
> + if (newsockfd < 0)
> + error_close_failure("accept", sockfd);
> +
> + close(sockfd);
> + close(newsockfd);
> +}
> +
> +static void do_client(uint32_t ip, uint16_t port, long connect_timeout)
> +{
> + int sockfd;
> + int ret;
> + struct timeval timeout;
> + struct sockaddr_in address;
> +
> + sockfd = socket(AF_INET, SOCK_STREAM, 0);
> + if (sockfd < 0)
> + error_failure("socket");
> +
> + timeout.tv_sec = connect_timeout;
> + timeout.tv_usec = 0;
> + ret = setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO,
> + (const struct timeval *)&timeout, sizeof(timeout));
> + if (ret)
> + error_close_failure("setsockopt", sockfd);
> +
> + address.sin_family = AF_INET;
> + address.sin_addr.s_addr = ip;
> + address.sin_port = port;
> +
> + ret = connect(sockfd, (const struct sockaddr *)&address,
> + sizeof(address));
> + if (ret)
> + error_close_failure("connect", sockfd);
> +
> + ret = write(sockfd, message, strlen(message));
> + if (ret < 0)
> + error_close_failure("write", sockfd);
> +
> + close(sockfd);
> +}
> +
> +#define PROG_NAME basename(argv[0])
> +
> +int main(int argc, char **argv)
> +{
> + int opt;
> + struct args args = {
> + .mode = MODE_NONE,
> + .crash_after = CRASH_AFTER_NONE,
> + .signal_from = SIGNAL_FROM_NONE,
> + .has_counter = false,
> + .has_change_priv = false,
> + .has_ip = false,
> + .has_port = false,
> + .has_timeout = false,
> + };
> +
> + while ((opt = getopt(argc, argv, OPT_STRING)) != -1) {
> + switch (opt) {
> + case 'h':
> + usage(PROG_NAME);
> + return EXIT_SUCCESS;
> + case 'm':
> + args.mode = get_mode(optarg, PROG_NAME);
> + break;
> + case 'c':
> + args.crash_after = get_crash_after(optarg, PROG_NAME);
> + break;
> + case 's':
> + args.signal_from = get_signal_from(optarg, PROG_NAME);
> + break;
> + case 'n':
> + args.counter = get_counter(optarg, PROG_NAME);
> + args.has_counter = true;
> + break;
> + case 'C':
> + args.has_change_priv = true;
> + break;
> + case 'a':
> + args.ip = get_ip(optarg, PROG_NAME);
> + args.has_ip = true;
> + break;
> + case 'p':
> + args.port = get_port(optarg, PROG_NAME);
> + args.has_port = true;
> + break;
> + case 't':
> + args.timeout = get_timeout(optarg, PROG_NAME);
> + args.has_timeout = true;
> + break;
> + default:
> + usage(PROG_NAME);
> + return EXIT_FAILURE;
> + }
> + }
> +
> + check_args(&args, PROG_NAME);
> +
> + if (args.mode == MODE_CRASH) {
> + do_crash(args.crash_after, args.signal_from, args.counter,
> + args.has_change_priv);
> + } else if (args.mode == MODE_SERVER_CRASH) {
> + do_server(args.ip, args.port, args.timeout);
> + do_crash(args.crash_after, args.signal_from, args.counter,
> + false);
> + } else if (args.mode == MODE_CLIENT) {
> + do_client(args.ip, args.port, args.timeout);
> + }
> +
> + return EXIT_SUCCESS;
> +}
> diff --git a/tools/testing/selftests/brute/test.sh b/tools/testing/selftests/brute/test.sh
> new file mode 100755
> index 000000000000..f53f26ae5b96
> --- /dev/null
> +++ b/tools/testing/selftests/brute/test.sh
> @@ -0,0 +1,226 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TCID="test.sh"
> +
> +KSFT_PASS=0
> +KSFT_FAIL=1
> +KSFT_SKIP=4
> +
> +errno=$KSFT_PASS
> +
> +check_root()
> +{
> + local uid=$(id -u)
> + if [ $uid -ne 0 ]; then
> + echo $TCID: must be run as root >&2
> + exit $KSFT_SKIP
> + fi
> +}
> +
> +count_fork_matches()
> +{
> + dmesg | grep "brute: Fork brute force attack detected" | wc -l
This may be unstable if the dmesg scrolls past, etc. See how
lkdtm/run.sh handles this with a temp file and "comm".
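For reference, that approach snapshots dmesg before the test and diffs
afterwards, roughly like this (an untested sketch; DMESG_BEFORE is a
hypothetical temp file):

    dmesg > "$DMESG_BEFORE"
    ./exec test -m crash -c fork -s kernel -n $COUNTER
    dmesg | comm --nocheck-order -13 "$DMESG_BEFORE" - | \
            grep -c "brute: Fork brute force attack detected"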
> +}
> +
> +assert_equal()
> +{
> + local val1=$1
> + local val2=$2
> +
> + if [ $val1 -eq $val2 ]; then
> + echo "$TCID: $message [PASS]"
> + else
> + echo "$TCID: $message [FAIL]"
> + errno=$KSFT_FAIL
> + fi
> +}
> +
> +test_fork_user()
> +{
> + COUNTER=20
> +
> + old_count=$(count_fork_matches)
> + ./exec test -m crash -c fork -s user -n $COUNTER
> + new_count=$(count_fork_matches)
> +
> + message="Fork attack (user signals, no bounds crossed)"
> + assert_equal $old_count $new_count
> +}
> +
> +test_fork_kernel()
> +{
> + old_count=$(count_fork_matches)
> + ./exec test -m crash -c fork -s kernel -n $COUNTER
> + new_count=$(count_fork_matches)
> +
> + message="Fork attack (kernel signals, no bounds crossed)"
> + assert_equal $old_count $new_count
> +}
> +
> +count_exec_matches()
> +{
> + dmesg | grep "brute: Exec brute force attack detected" | wc -l
> +}
> +
> +test_exec_user()
> +{
> + old_count=$(count_exec_matches)
> + ./test -m crash -c exec -s user -n $COUNTER
> + new_count=$(count_exec_matches)
> +
> + message="Exec attack (user signals, no bounds crossed)"
> + assert_equal $old_count $new_count
> +}
> +
> +test_exec_kernel()
> +{
> + old_count=$(count_exec_matches)
> + ./test -m crash -c exec -s kernel -n $COUNTER
> + new_count=$(count_exec_matches)
> +
> + message="Exec attack (kernel signals, no bounds crossed)"
> + assert_equal $old_count $new_count
> +}
> +
> +assert_not_equal()
> +{
> + local val1=$1
> + local val2=$2
> +
> + if [ $val1 -ne $val2 ]; then
> + echo $TCID: $message [PASS]
> + else
> + echo $TCID: $message [FAIL]
> + errno=$KSFT_FAIL
> + fi
> +}
> +
> +test_fork_kernel_setuid()
> +{
> + old_count=$(count_fork_matches)
> + chmod u+s test
> + ./exec test -m crash -c fork -s kernel -n $COUNTER
> + chmod u-s test
> + new_count=$(count_fork_matches)
> +
> + message="Fork attack (kernel signals, setuid binary)"
> + assert_not_equal $old_count $new_count
> +}
> +
> +test_exec_kernel_setuid()
> +{
> + old_count=$(count_exec_matches)
> + chmod u+s test
> + ./test -m crash -c exec -s kernel -n $COUNTER
> + chmod u-s test
> + new_count=$(count_exec_matches)
> +
> + message="Exec attack (kernel signals, setuid binary)"
> + assert_not_equal $old_count $new_count
> +}
> +
> +test_fork_kernel_change_priv()
> +{
> + old_count=$(count_fork_matches)
> + ./exec test -m crash -c fork -s kernel -n $COUNTER -C
> + new_count=$(count_fork_matches)
> +
> + message="Fork attack (kernel signals, change privileges)"
> + assert_not_equal $old_count $new_count
> +}
> +
> +test_exec_kernel_change_priv()
> +{
> + old_count=$(count_exec_matches)
> + ./test -m crash -c exec -s kernel -n $COUNTER -C
> + new_count=$(count_exec_matches)
> +
> + message="Exec attack (kernel signals, change privileges)"
> + assert_not_equal $old_count $new_count
> +}
> +
> +network_ns_setup()
> +{
> + local vnet_name=$1
> + local veth_name=$2
> + local ip_src=$3
> + local ip_dst=$4
> +
> + ip netns add $vnet_name
> + ip link set $veth_name netns $vnet_name
> + ip -n $vnet_name addr add $ip_src/24 dev $veth_name
> + ip -n $vnet_name link set $veth_name up
> + ip -n $vnet_name route add $ip_dst/24 dev $veth_name
> +}
> +
> +network_setup()
> +{
> + VETH0_NAME=veth0
> + VNET0_NAME=vnet0
> + VNET0_IP=10.0.1.0
> + VETH1_NAME=veth1
> + VNET1_NAME=vnet1
> + VNET1_IP=10.0.2.0
> +
> + ip link add $VETH0_NAME type veth peer name $VETH1_NAME
> + network_ns_setup $VNET0_NAME $VETH0_NAME $VNET0_IP $VNET1_IP
> + network_ns_setup $VNET1_NAME $VETH1_NAME $VNET1_IP $VNET0_IP
> +}
> +
> +test_fork_kernel_network_to_local()
> +{
> + INADDR_ANY=0.0.0.0
> + PORT=65535
> + TIMEOUT=5
> +
> + old_count=$(count_fork_matches)
> + ip netns exec $VNET0_NAME ./exec test -m server_crash -a $INADDR_ANY \
> + -p $PORT -t $TIMEOUT -c fork -s kernel -n $COUNTER &
> + sleep 1
> + ip netns exec $VNET1_NAME ./test -m client -a $VNET0_IP -p $PORT \
> + -t $TIMEOUT
> + sleep 1
> + new_count=$(count_fork_matches)
> +
> + message="Fork attack (kernel signals, network to local)"
> + assert_not_equal $old_count $new_count
> +}
> +
> +test_exec_kernel_network_to_local()
> +{
> + old_count=$(count_exec_matches)
> + ip netns exec $VNET0_NAME ./test -m server_crash -a $INADDR_ANY \
> + -p $PORT -t $TIMEOUT -c exec -s kernel -n $COUNTER &
> + sleep 1
> + ip netns exec $VNET1_NAME ./test -m client -a $VNET0_IP -p $PORT \
> + -t $TIMEOUT
> + sleep 1
> + new_count=$(count_exec_matches)
> +
> + message="Exec attack (kernel signals, network to local)"
> + assert_not_equal $old_count $new_count
> +}
> +
> +network_cleanup()
> +{
> + ip netns del $VNET0_NAME >/dev/null 2>&1
> + ip netns del $VNET1_NAME >/dev/null 2>&1
> + ip link delete $VETH0_NAME >/dev/null 2>&1
> + ip link delete $VETH1_NAME >/dev/null 2>&1
> +}
> +
> +check_root
> +test_fork_user
> +test_fork_kernel
> +test_exec_user
> +test_exec_kernel
> +test_fork_kernel_setuid
> +test_exec_kernel_setuid
> +test_fork_kernel_change_priv
> +test_exec_kernel_change_priv
> +network_setup
> +test_fork_kernel_network_to_local
> +test_exec_kernel_network_to_local
> +network_cleanup
> +exit $errno
> --
> 2.25.1
>
--
Kees Cook
Hi,
First of all thanks for the review. More info and questions inline.
On Wed, Mar 17, 2021 at 07:00:56PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:25PM +0100, John Wood wrote:
> >
> > config LSM
> > string "Ordered list of enabled LSMs"
> > - default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> > - default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> > - default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> > - default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> > - default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
> > + default "brute,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> > + default "brute,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> > + default "brute,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> > + default "brute,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> > + default "brute,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
>
> It probably doesn't matter much, but I think brute should be added
> between lockdown and yama.
What is the rationale for the stacking order (in relation to brute and
lockdown)?
> > diff --git a/security/Makefile b/security/Makefile
> > index 3baf435de541..1236864876da 100644
> > --- a/security/Makefile
> > +++ b/security/Makefile
> > @@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM) += bpf/
> > # Object integrity file lists
> > subdir-$(CONFIG_INTEGRITY) += integrity
> > obj-$(CONFIG_INTEGRITY) += integrity/
> > +
> > +# Object brute file lists
> > +subdir-$(CONFIG_SECURITY_FORK_BRUTE) += brute
> > +obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute/
>
> I don't think subdir is needed here? I think you can use obj-... like
> loadpin, etc.
loadpin also uses subdir, just like selinux, smack, tomoyo, etc. So, why
is it not necessary for brute?
> > +#include <asm/current.h>
>
> Why is this needed?
IIUC, the "current" macro is defined in this header. I try to include the
appropiate header for every macro and function used.
> > +/**
> > + * struct brute_stats - Fork brute force attack statistics.
> > + * @lock: Lock to protect the brute_stats structure.
> > + * @refc: Reference counter.
> > + * @faults: Number of crashes.
> > + * @jiffies: Last crash timestamp.
> > + * @period: Crash period's moving average.
> > + *
> > + * This structure holds the statistical data shared by all the fork hierarchy
> > + * processes.
> > + */
> > +struct brute_stats {
> > + spinlock_t lock;
> > + refcount_t refc;
> > + unsigned char faults;
> > + u64 jiffies;
> > + u64 period;
> > +};
>
> I assume the max-255 "faults" will be explained... why is this so small?
If a brute force attack is running slowly for a long time, the application
crash period's EMA is not suitable for the detection. This type of attack
must be detected using a maximum number of faults. In this case, the
BRUTE_MAX_FAULTS is defined as 200.
> > [...]
> > +static struct brute_stats *brute_new_stats(void)
> > +{
> > + struct brute_stats *stats;
> > +
> > + stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> > + if (!stats)
> > + return NULL;
>
> Since this is tied to process creation, I think it might make sense to
> have a dedicated kmem cache for this (instead of using the "generic"
> kmalloc). See kmem_cache_{create,*alloc,free}
Thanks, I will work on it for the next version.
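For reference, a minimal sketch of what the dedicated cache could look like
(hypothetical names, untested):

    static struct kmem_cache *brute_stats_cachep __ro_after_init;

    /* Called once from the LSM init path. */
    brute_stats_cachep = KMEM_CACHE(brute_stats, 0);

    /* Allocation site: a pre-zeroed object from the dedicated cache. */
    stats = kmem_cache_zalloc(brute_stats_cachep, GFP_KERNEL);

    /* Free path, replacing kfree(). */
    kmem_cache_free(brute_stats_cachep, stats);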
>
> > +
> > + spin_lock_init(&stats->lock);
> > + refcount_set(&stats->refc, 1);
> > + stats->faults = 0;
> > + stats->jiffies = get_jiffies_64();
> > + stats->period = 0;
>
> And either way, I'd recommend using the "z" variant of the allocator
> (kmem_cache_zalloc, kzalloc) to pre-zero everything (and then you can
> drop the "= 0" lines here).
Understood.
>
> > +
> > + return stats;
> > +}
> > +
> > +/**
> > + * brute_share_stats() - Share the statistical data between processes.
> > + * @src: Source of statistics to be shared.
> > + * @dst: Destination of statistics to be shared.
> > + *
> > + * Copy the src's pointer to the statistical data structure to the dst's pointer
> > + * to the same structure. Since there is a new process that shares the same
> > + * data, increase the reference counter. The src's pointer cannot be NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_alloc hook.
> > + */
> > +static void brute_share_stats(struct brute_stats *src,
> > + struct brute_stats **dst)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&src->lock, flags);
> > + refcount_inc(&src->refc);
> > + *dst = src;
> > + spin_unlock_irqrestore(&src->lock, flags);
> > +}
> > +
> > +/**
> > + * brute_task_alloc() - Target for the task_alloc hook.
> > + * @task: Task being allocated.
> > + * @clone_flags: Contains the flags indicating what should be shared.
> > + *
> > + * For a correct management of a fork brute force attack it is necessary that
> > + * all the tasks hold statistical data. The same statistical data needs to be
> > + * shared between all the tasks that hold the same memory contents or in other
> > + * words, between all the tasks that have been forked without any execve call.
> > + *
> > + * To ensure this, if the current task doesn't have statistical data when forks,
> > + * it is mandatory to allocate a new statistics structure and share it between
> > + * this task and the new one being allocated. Otherwise, share the statistics
> > + * that the current task already has.
> > + *
> > + * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> > + * otherwise.
> > + */
> > +static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > +{
> > + struct brute_stats **stats, **p_stats;
> > +
> > + stats = brute_stats_ptr(task);
> > + p_stats = brute_stats_ptr(current);
> > +
> > + if (likely(*p_stats)) {
> > + brute_share_stats(*p_stats, stats);
> > + return 0;
> > + }
> > +
> > + *stats = brute_new_stats();
> > + if (!*stats)
> > + return -ENOMEM;
> > +
> > + brute_share_stats(*stats, p_stats);
> > + return 0;
> > +}
>
> During the task_alloc hook, aren't both "current" and "task" already
> immutable (in the sense that no lock needs to be held for
> brute_share_stats())?
I will work on it.
> And what is the case where brute_stats_ptr(current) returns NULL?
Sorry, but I don't understand what you are trying to explain to me.
brute_stats_ptr(current) returns a pointer to a pointer. So, I think
your question is: What's the purpose of the "if (likely(*p_stats))"
check? If that is the case, this check is to guarantee that all the tasks
have statistical data. If a task was allocated prior to the brute
LSM initialization, that task doesn't have stats. So, with this check,
all the tasks that fork have stats.
> > +
> > +/**
> > + * brute_task_execve() - Target for the bprm_committing_creds hook.
> > + * @bprm: Points to the linux_binprm structure.
> > + *
> > + * When a forked task calls the execve system call, the memory contents are set
> > + * with new values. So, in this scenario the parent's statistical data no need
> > + * to be shared. Instead, a new statistical data structure must be allocated to
> > + * start a new hierarchy. This condition is detected when the statistics
> > + * reference counter holds a value greater than or equal to two (a fork always
> > + * sets the statistics reference counter to a minimum of two since the parent
> > + * and the child task are sharing the same data).
> > + *
> > + * However, if the execve function is called immediately after another execve
> > + * call, althought the memory contents are reset, there is no need to allocate
> > + * a new statistical data structure. This is possible because at this moment
> > + * only one task (the task that calls the execve function) points to the data.
> > + * In this case, the previous allocation is used but the statistics are reset.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the bprm_committing_creds hook.
> > + */
> > +static void brute_task_execve(struct linux_binprm *bprm)
> > +{
> > + struct brute_stats **stats;
> > + unsigned long flags;
> > +
> > + stats = brute_stats_ptr(current);
> > + if (WARN(!*stats, "No statistical data\n"))
> > + return;
> > +
> > + spin_lock_irqsave(&(*stats)->lock, flags);
> > +
> > + if (!refcount_dec_not_one(&(*stats)->refc)) {
> > + /* execve call after an execve call */
> > + (*stats)->faults = 0;
> > + (*stats)->jiffies = get_jiffies_64();
> > + (*stats)->period = 0;
> > + spin_unlock_irqrestore(&(*stats)->lock, flags);
> > + return;
> > + }
> > +
> > + /* execve call after a fork call */
> > + spin_unlock_irqrestore(&(*stats)->lock, flags);
> > + *stats = brute_new_stats();
> > + WARN(!*stats, "Cannot allocate statistical data\n");
> > +}
>
> I don't think any of this locking is needed -- you're always operating
> on "current", so its brute_stats will always be valid.
But another process (that shares the same stats) could be modifying them
concurrently.
Scenario 1: cpu 1 writes stats and cpu 2 writes stats.
Scenario 2: cpu 1 writes stats, then an IRQ on the same cpu writes stats.
I think this is possible. So AFAIK we need locking. Sorry if I am wrong.
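Scenario 1 as a timeline, for illustration (a hypothetical interleaving of
the hooks quoted above):

    CPU0: brute_task_execve()        CPU1: brute_task_execve()
          (*stats)->faults = 0             (*stats)->jiffies = get_jiffies_64()
          (*stats)->period = 0             ...

Both CPUs write the same brute_stats fields, so without brute_stats::lock
the writes can interleave.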
> > +
> > +/**
> > + * brute_task_free() - Target for the task_free hook.
> > + * @task: Task about to be freed.
> > + *
> > + * The statistical data that is shared between all the fork hierarchy processes
> > + * needs to be freed when this hierarchy disappears.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_free hook.
> > + */
> > +static void brute_task_free(struct task_struct *task)
> > +{
> > + struct brute_stats **stats;
> > + unsigned long flags;
> > + bool refc_is_zero;
> > +
> > + stats = brute_stats_ptr(task);
> > + if (WARN(!*stats, "No statistical data\n"))
> > + return;
> > +
> > + spin_lock_irqsave(&(*stats)->lock, flags);
> > + refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> > + spin_unlock_irqrestore(&(*stats)->lock, flags);
> > +
> > + if (refc_is_zero) {
> > + kfree(*stats);
> > + *stats = NULL;
> > + }
> > +}
>
> Same thing -- this is what dec_and_test is for: it's atomic, so no
> locking needed.
Ok, in this case I can see that the locking is not necessary since
stats::refc is atomic. But in the previous case, faults, jiffies and
period are not atomic. So I think the lock is necessary. If not, what am
I missing?
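For clarity, the lock-free shape I understand you are suggesting is roughly
this (a sketch; whether faults, jiffies and period still need
brute_stats::lock elsewhere is the open question):

    static void brute_task_free(struct task_struct *task)
    {
            struct brute_stats **stats = brute_stats_ptr(task);

            if (WARN(!*stats, "No statistical data\n"))
                    return;

            /* refcount_dec_and_test() is atomic on its own. */
            if (refcount_dec_and_test(&(*stats)->refc)) {
                    kfree(*stats);
                    *stats = NULL;
            }
    }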
Thanks,
John Wood
On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > @@ -74,7 +84,7 @@ static struct brute_stats *brute_new_stats(void)
> > {
> > struct brute_stats *stats;
> >
> > - stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> > + stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
>
> Why change this here? I'd just start with this in the patch that
> introduces it.
To be coherent with the previous patch. In the previous patch the kmalloc
could use GFP_KERNEL since the call was made outside of an atomic context.
Now, with the new lock, it needs GFP_ATOMIC. So the question:
If it finally needs to use GFP_ATOMIC, does the first patch need to use it
even if it is not necessary at that point?
> > if (!stats)
> > return NULL;
> >
> > @@ -99,16 +109,17 @@ static struct brute_stats *brute_new_stats(void)
> > * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > * since the task_free hook can be called from an IRQ context during the
> > * execution of the task_alloc hook.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * held.
> > */
> > static void brute_share_stats(struct brute_stats *src,
> > struct brute_stats **dst)
> > {
> > - unsigned long flags;
> > -
> > - spin_lock_irqsave(&src->lock, flags);
> > + spin_lock(&src->lock);
> > refcount_inc(&src->refc);
> > *dst = src;
> > - spin_unlock_irqrestore(&src->lock, flags);
> > + spin_unlock(&src->lock);
> > }
>
> I still don't think any locking is needed here; the whole function can
> go away, IMO.
In this case I think this is possible:
Scenario 1: cpu 1 writes the stats pointer while cpu 2 is navigating the
process tree reading the same stats pointer.
Scenario 2: cpu 1 is navigating the process tree reading the stats
pointer and in an IRQ the same stats pointer is written.
So, we need locking. Am I wrong?
> > /**
> > @@ -126,26 +137,36 @@ static void brute_share_stats(struct brute_stats *src,
> > * this task and the new one being allocated. Otherwise, share the statistics
> > * that the current task already has.
> > *
> > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > + * context during the execution of the task_alloc hook.
> > + *
> > * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> > * otherwise.
> > */
> > static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > {
> > struct brute_stats **stats, **p_stats;
> > + unsigned long flags;
> >
> > stats = brute_stats_ptr(task);
> > p_stats = brute_stats_ptr(current);
> > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> >
> > if (likely(*p_stats)) {
> > brute_share_stats(*p_stats, stats);
> > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > return 0;
> > }
> >
> > *stats = brute_new_stats();
> > - if (!*stats)
> > + if (!*stats) {
> > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > return -ENOMEM;
> > + }
> >
> > brute_share_stats(*stats, p_stats);
> > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > return 0;
> > }
>
> I'd much prefer that whatever locking is needed be introduced in the
> initial patch: this transformation just double the work to review. :)
So, IIUC I need to introduce all the locks in the initial patch even if
they are not necessary. Am I right?
> >
> > @@ -167,9 +188,9 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > * only one task (the task that calls the execve function) points to the data.
> > * In this case, the previous allocation is used but the statistics are reset.
> > *
> > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > - * since the task_free hook can be called from an IRQ context during the
> > - * execution of the bprm_committing_creds hook.
> > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > + * context during the execution of the bprm_committing_creds hook.
> > */
> > static void brute_task_execve(struct linux_binprm *bprm)
> > {
> > @@ -177,24 +198,33 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > unsigned long flags;
> >
> > stats = brute_stats_ptr(current);
> > - if (WARN(!*stats, "No statistical data\n"))
> > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > +
> > + if (WARN(!*stats, "No statistical data\n")) {
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > return;
> > + }
> >
> > - spin_lock_irqsave(&(*stats)->lock, flags);
> > + spin_lock(&(*stats)->lock);
> >
> > if (!refcount_dec_not_one(&(*stats)->refc)) {
> > /* execve call after an execve call */
> > (*stats)->faults = 0;
> > (*stats)->jiffies = get_jiffies_64();
> > (*stats)->period = 0;
> > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > + spin_unlock(&(*stats)->lock);
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > return;
> > }
> >
> > /* execve call after a fork call */
> > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > + spin_unlock(&(*stats)->lock);
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > +
> > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > *stats = brute_new_stats();
> > WARN(!*stats, "Cannot allocate statistical data\n");
> > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > }
>
> Again, I don't see a need for locking -- this is just managing the
> lifetime which is entirely handled by the implicit locking of "current"
> and the refcount_t.
Here I can see the same two scenarios noted before. So I think the locking
is needed. Am I right?
> > /**
> > @@ -204,9 +234,9 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > * The statistical data that is shared between all the fork hierarchy processes
> > * needs to be freed when this hierarchy disappears.
> > *
> > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > - * since the task_free hook can be called from an IRQ context during the
> > - * execution of the task_free hook.
> > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > + * context during the execution of the task_free hook.
> > */
> > static void brute_task_free(struct task_struct *task)
> > {
> > @@ -215,17 +245,446 @@ static void brute_task_free(struct task_struct *task)
> > bool refc_is_zero;
> >
> > stats = brute_stats_ptr(task);
> > - if (WARN(!*stats, "No statistical data\n"))
> > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > +
> > + if (WARN(!*stats, "No statistical data\n")) {
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > return;
> > + }
> >
> > - spin_lock_irqsave(&(*stats)->lock, flags);
> > + spin_lock(&(*stats)->lock);
> > refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > + spin_unlock(&(*stats)->lock);
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> >
> > if (refc_is_zero) {
> > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > kfree(*stats);
> > *stats = NULL;
> > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > + }
> > +}
>
> Same; I would expect this to be simply:
No comment. I think I am missing something. I need to clarify the previous
cases before working on the next ones. Sorry and thanks for the guidance.
> stats = brute_stats_ptr(task);
> if (WARN_ON_ONCE(!*stats))
> return;
> if (refcount_dec_and_test(&(*stats)->refc)) {
> kfree(*stats);
> *stats = NULL;
> }
>
> > +
> > +/*
> > + * BRUTE_EMA_WEIGHT_NUMERATOR - Weight's numerator of EMA.
> > + */
> > +static const u64 BRUTE_EMA_WEIGHT_NUMERATOR = 7;
> > +
> > +/*
> > + * BRUTE_EMA_WEIGHT_DENOMINATOR - Weight's denominator of EMA.
> > + */
> > +static const u64 BRUTE_EMA_WEIGHT_DENOMINATOR = 10;
>
> Should these be externally configurable (via sysfs)?
No problem. I think this is easier than locking :)
>
> > +
> > +/**
> > + * brute_mul_by_ema_weight() - Multiply by EMA weight.
> > + * @value: Value to multiply by EMA weight.
> > + *
> > + * Return: The result of the multiplication operation.
> > + */
> > +static inline u64 brute_mul_by_ema_weight(u64 value)
> > +{
> > + return mul_u64_u64_div_u64(value, BRUTE_EMA_WEIGHT_NUMERATOR,
> > + BRUTE_EMA_WEIGHT_DENOMINATOR);
> > +}
> > +
> > +/*
> > + * BRUTE_MAX_FAULTS - Maximum number of faults.
> > + *
> > + * If a brute force attack is running slowly for a long time, the application
> > + * crash period's EMA is not suitable for the detection. This type of attack
> > + * must be detected using a maximum number of faults.
> > + */
> > +static const unsigned char BRUTE_MAX_FAULTS = 200;
>
> Same.
Ok, understood.
>
> > +
> > +/**
> > + * brute_update_crash_period() - Update the application crash period.
> > + * @stats: Statistics that hold the application crash period to update.
> > + * @now: The current timestamp in jiffies.
> > + *
> > + * The application crash period must be a value that is not prone to change due
> > + * to spurious data and follows the real crash period. So, to compute it, the
> > + * exponential moving average (EMA) is used.
> > + *
> > + * This kind of average defines a weight (between 0 and 1) for the new value to
> > + * add and applies the remainder of the weight to the current average value.
> > + * This way, some spurious data will not excessively modify the average and only
> > + * if the new values are persistent, the moving average will tend towards them.
> > + *
> > + * Mathematically the application crash period's EMA can be expressed as
> > + * follows:
> > + *
> > + * period_ema = period * weight + period_ema * (1 - weight)
> > + *
> > + * If the operations are applied:
> > + *
> > + * period_ema = period * weight + period_ema - period_ema * weight
> > + *
> > + * If the operands are ordered:
> > + *
> > + * period_ema = period_ema - period_ema * weight + period * weight
> > + *
> > + * Finally, this formula can be written as follows:
> > + *
> > + * period_ema -= period_ema * weight;
> > + * period_ema += period * weight;
> > + *
> > + * The statistics that hold the application crash period to update cannot be
> > + * NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * held.
> > + * Return: The last crash timestamp before updating it.
> > + */
> > +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> > +{
> > + u64 current_period;
> > + u64 last_crash_timestamp;
> > +
> > + spin_lock(&stats->lock);
> > + current_period = now - stats->jiffies;
> > + last_crash_timestamp = stats->jiffies;
> > + stats->jiffies = now;
> > +
> > + stats->period -= brute_mul_by_ema_weight(stats->period);
> > + stats->period += brute_mul_by_ema_weight(current_period);
> > +
> > + if (stats->faults < BRUTE_MAX_FAULTS)
> > + stats->faults += 1;
> > +
> > + spin_unlock(&stats->lock);
> > + return last_crash_timestamp;
> > +}
>
> Now *here* locking makes sense, and it only needs to be per-stat, not
> global, since multiple processes may be operating on the same stat
> struct. To make this more no-reader-locking-friendly, I'd also update
> everything at the end, and use WRITE_ONCE():
>
> u64 current_period, period;
> u64 last_crash_timestamp;
> u64 faults;
>
> spin_lock(&stats->lock);
> current_period = now - stats->jiffies;
> last_crash_timestamp = stats->jiffies;
>
> WRITE_ONCE(stats->period,
> stats->period - brute_mul_by_ema_weight(stats->period) +
> brute_mul_by_ema_weight(current_period));
>
> if (stats->faults < BRUTE_MAX_FAULTS)
> WRITE_ONCE(stats->faults, stats->faults + 1);
>
> WRITE_ONCE(stats->jiffies, now);
>
> spin_unlock(&stats->lock);
> return last_crash_timestamp;
>
> That way readers can (IIUC) safely use READ_ONCE() on jiffies and faults
> without needing to hold the &stats->lock (unless they need perfectly matching
> jiffies, period, and faults).
Thanks for the refactoring. I will work on it (if I can understand the locking). :(
> > +
> > +/*
> > + * BRUTE_MIN_FAULTS - Minimum number of faults.
> > + *
> > + * The application crash period's EMA cannot be used until a minimum number of
> > + * data has been applied to it. This constraint allows getting a trend when this
> > + * moving average is used. Moreover, it avoids the scenario where an application
> > + * fails quickly from the execve system call due to reasons unrelated to a real
> > + * attack.
> > + */
> > +static const unsigned char BRUTE_MIN_FAULTS = 5;
> > +
> > +/*
> > + * BRUTE_CRASH_PERIOD_THRESHOLD - Application crash period threshold.
> > + *
> > + * The units are expressed in milliseconds.
> > + *
> > + * A fast brute force attack is detected when the application crash period falls
> > + * below this threshold.
> > + */
> > +static const u64 BRUTE_CRASH_PERIOD_THRESHOLD = 30000;
>
> These could all be sysctls (see yama's use of sysctl).
Ok
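For the next version I will look at Yama's sysctl code. Just to check my
understanding, a minimal sketch modeled on yama_init_sysctl() (the
"kernel/brute" path and the variable names below are made up) would be
something like:

#ifdef CONFIG_SYSCTL
static unsigned int brute_max_faults = 200;

static struct ctl_path brute_sysctl_path[] = {
        { .procname = "kernel", },
        { .procname = "brute", },
        { }
};

static struct ctl_table brute_sysctl_table[] = {
        {
                .procname       = "max_faults",
                .data           = &brute_max_faults,
                .maxlen         = sizeof(unsigned int),
                .mode           = 0644,
                .proc_handler   = proc_douintvec,
        },
        { }
};

static void __init brute_init_sysctl(void)
{
        if (!register_sysctl_paths(brute_sysctl_path, brute_sysctl_table))
                panic("brute: sysctl registration failed\n");
}
#endif /* CONFIG_SYSCTL */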
> > +
> > +/**
> > + * brute_attack_running() - Test if a brute force attack is happening.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + *
> > + * The decision about whether a brute force attack is running is based on the
> > + * statistical data shared by all the fork hierarchy processes. These
> > + * statistics cannot be NULL.
> > + *
> > + * There are two types of brute force attacks that can be detected using the
> > + * statistical data. The first one is a slow brute force attack that is detected
> > + * if the maximum number of faults per fork hierarchy is reached. The second
> > + * type is a fast brute force attack that is detected if the application crash
> > + * period falls below a certain threshold.
> > + *
> > + * Moreover, it is important to note that no attacks will be detected until a
> > + * minimum number of faults have occurred. This allows a trend to form in the
> > + * crash period when the EMA is used and also avoids the scenario where an
> > + * application fails quickly from the execve system call due to reasons
> > + * unrelated to a real attack.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * held.
> > + * Return: True if a brute force attack is happening. False otherwise.
> > + */
> > +static bool brute_attack_running(struct brute_stats *stats)
> > +{
> > + u64 crash_period;
> > +
> > + spin_lock(&stats->lock);
> > + if (stats->faults < BRUTE_MIN_FAULTS) {
> > + spin_unlock(&stats->lock);
> > + return false;
> > + }
>
> If I'm reading this correctly, you're performing two tests, so there
> isn't a strict relationship between faults and period for this test,
> and I think it could be done without locking with READ_ONCE():
>
> u64 faults;
> u64 crash_period;
>
> faults = READ_ONCE(stats->faults);
> if (faults < BRUTE_MIN_FAULTS)
> return false;
> if (faults >= BRUTE_MAX_FAULTS)
> return true;
>
> crash_period = jiffies64_to_msecs(READ_ONCE(stats->period));
> return crash_period < BRUTE_CRASH_PERIOD_THRESHOLD;
Thanks, I will work on it for the next version.
> > +
> > + if (stats->faults >= BRUTE_MAX_FAULTS) {
> > + spin_unlock(&stats->lock);
> > + return true;
> > + }
> > +
> > + crash_period = jiffies64_to_msecs(stats->period);
> > + spin_unlock(&stats->lock);
> > +
> > + return crash_period < BRUTE_CRASH_PERIOD_THRESHOLD;
> > +}
> > +
> > +/**
> > + * print_fork_attack_running() - Warn about a fork brute force attack.
> > + */
> > +static inline void print_fork_attack_running(void)
> > +{
> > + pr_warn("Fork brute force attack detected [%s]\n", current->comm);
> > +}
>
> I think pid should be part of this...
No problem.
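Something like this, then:

static inline void print_fork_attack_running(void)
{
        pr_warn("Fork brute force attack detected [pid %d: %s]\n",
                current->pid, current->comm);
}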
> > +
> > +/**
> > + * brute_manage_fork_attack() - Manage a fork brute force attack.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + * @now: The current timestamp in jiffies.
> > + *
> > + * For a correct management of a fork brute force attack it is only necessary to
> > + * update the statistics and test if an attack is happening based on these data.
> > + *
> > + * The statistical data shared by all the fork hierarchy processes cannot be
> > + * NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * held.
> > + * Return: The last crash timestamp before updating it.
> > + */
> > +static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> > +{
> > + u64 last_fork_crash;
> > +
> > + last_fork_crash = brute_update_crash_period(stats, now);
> > + if (brute_attack_running(stats))
> > + print_fork_attack_running();
> > +
> > + return last_fork_crash;
> > +}
> > +
> > +/**
> > + * brute_get_exec_stats() - Get the exec statistics.
> > + * @stats: When this function is called, this parameter must point to the
> > + * current process' statistical data. When this function returns, this
> > + * parameter points to the parent process' statistics of the fork
> > + * hierarchy that hold the current process' statistics.
> > + *
> > + * To manage a brute force attack that happens through the execve system call it
> > + * is not possible to use the statistical data held by this process, because
> > + * these statistics disappear when this task finishes. In this scenario this
> > + * data should be tracked by the statistics of a higher fork hierarchy (the
> > + * hierarchy that contains the process that forks before the execve system
> > + * call).
> > + *
> > + * To find these statistics the current fork hierarchy must be traversed up
> > + * until new statistics are found.
> > + *
> > + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> > + */
> > +static void brute_get_exec_stats(struct brute_stats **stats)
> > +{
> > + const struct task_struct *task = current;
> > + struct brute_stats **p_stats;
> > +
> > + do {
> > + if (!task->real_parent) {
> > + *stats = NULL;
> > + return;
> > + }
> > +
> > + p_stats = brute_stats_ptr(task->real_parent);
> > + task = task->real_parent;
> > + } while (*stats == *p_stats);
> > +
> > + *stats = *p_stats;
> > +}
>
> See Yama's task_is_descendant() for how to walk up the process tree
> (and I think the process group stuff will save some steps too); you
> don't need tasklist_lock held, just rcu_read_lock held, AIUI:
> Documentation/RCU/listRCU.rst
>
> And since you're passing this stats struct back up, and it would be outside of rcu read lock, you'd want to do a "get" on it first:
>
> rcu_read_lock();
> loop {
> ...
> }
> refcount_inc_not_zero(&(*p_stats)->refc);
> rcu_read_unlock();
>
> *stats = *p_stats
Thanks for the suggestions. I will work on it for the next version.
Anyway, in the first version Kees Cook and Jann Horn noted that some tasks
could escape the rcu read lock and that alternate locking was needed.
Extract from the RFC:
[Kees Cook]
Can't newly created processes escape this RCU read lock? I think this
needs alternate locking, or something in the task_alloc hook that will
block any new process from being created within the stats group.
[Jann Horn]
Good point; the proper way to deal with this would probably be to take
the tasklist_lock in read mode around this loop (with
read_lock(&tasklist_lock) / read_unlock(&tasklist_lock)), which pairs
with the write_lock_irq(&tasklist_lock) in copy_process(). Thanks to
the fatal_signal_pending() check while holding the lock in
copy_process(), that would be race-free - any fork() that has not yet
inserted the new task into the global task list would wait for us to
drop the tasklist_lock, then bail out at the fatal_signal_pending()
check.
I think that this scenario is still possible. So the tasklist_lock is
necessary. Am I right?
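Just to check that I follow the suggestion, a rough sketch (leaving aside
the tasklist_lock question above, and reworked to return a pointer as you
suggest later) could be:

static struct brute_stats *brute_get_exec_stats(struct brute_stats *stats)
{
        struct task_struct *task = current;
        struct task_struct *parent;
        struct brute_stats **p_stats;

        rcu_read_lock();
        do {
                parent = rcu_dereference(task->real_parent);
                if (!parent) {
                        rcu_read_unlock();
                        return NULL;
                }
                p_stats = brute_stats_ptr(parent);
                task = parent;
        } while (stats == *p_stats);

        /* Grab a reference before leaving the RCU read section. */
        if (!refcount_inc_not_zero(&(*p_stats)->refc)) {
                rcu_read_unlock();
                return NULL;
        }
        rcu_read_unlock();
        return *p_stats;
}

Is this the idea?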
> > +
> > +/**
> > + * brute_update_exec_crash_period() - Update the exec crash period.
> > + * @stats: When this function is called, this parameter must point to the
> > + * current process' statistical data. When this function returns, this
> > + * parameter points to the updated statistics (statistics that track the
> > + * info to manage a brute force attack that happens through the execve
> > + * system call).
> > + * @now: The current timestamp in jiffies.
> > + * @last_fork_crash: The last fork crash timestamp before updating it.
> > + *
> > + * If this is the first update of the statistics used to manage a brute force
> > + * attack that happens through the execve system call, its last crash timestamp
> > + * (the timestamp that shows when the execve was called) cannot be used to
> > + * compute the crash period's EMA. Instead, the last fork crash timestamp should
> > + * be used (the last crash timestamp of the child fork hierarchy before updating
> > + * the crash period). This ensures that in a brute force attack that happens
> > + * through the fork system call, the exec and fork statistics are the same. In
> > + * this situation, the mitigation method will act only on the processes that are
> > + * sharing the fork statistics. This way, the process that forked before the
> > + * execve system call will not be involved in the mitigation method. In this
> > + * scenario, the parent is not responsible for the child's behaviour.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > + * brute_stats_ptr_lock held.
> > + * Return: -EFAULT if there are no exec statistics. Zero otherwise.
> > + */
> > +static int brute_update_exec_crash_period(struct brute_stats **stats,
> > + u64 now, u64 last_fork_crash)
> > +{
> > + brute_get_exec_stats(stats);
> > + if (!*stats)
> > + return -EFAULT;
>
> This isn't EFAULT (userspace memory fault), but rather more EINVAL or
> ESRCH.
Ok.
> > +
> > + spin_lock(&(*stats)->lock);
> > + if (!(*stats)->faults)
> > + (*stats)->jiffies = last_fork_crash;
> > + spin_unlock(&(*stats)->lock);
> > +
> > + brute_update_crash_period(*stats, now);
>
> and then you can add:
>
> if (refcount_dec_and_test(&(*stats)->refc))
> kfree(*stats);
>
> (or better yet, make that a helper) named something like
> "put_brute_stats".
Sorry, but I don't understand why we need to free the stats here.
What is the rationale behind this change?
> > + return 0;
> > +}
>
> I find the re-writing of **stats confusing here -- I think you should
> leave that unmodified, and instead return a pointer (instead of "int"),
> and for errors, use ERR_PTR(-ESRCH)
Ok, thanks.
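If I understand correctly, something along these lines (assuming
brute_get_exec_stats() is also reworked to return a pointer, as in the
sketch above), with the caller checking IS_ERR():

static struct brute_stats *
brute_update_exec_crash_period(struct brute_stats *stats, u64 now,
                               u64 last_fork_crash)
{
        struct brute_stats *exec_stats;

        exec_stats = brute_get_exec_stats(stats);
        if (!exec_stats)
                return ERR_PTR(-ESRCH);

        spin_lock(&exec_stats->lock);
        if (!exec_stats->faults)
                exec_stats->jiffies = last_fork_crash;
        spin_unlock(&exec_stats->lock);

        brute_update_crash_period(exec_stats, now);
        return exec_stats;
}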
> > +
> > +/**
> > + * brute_get_crash_period() - Get the application crash period.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + *
> > + * The statistical data shared by all the fork hierarchy processes cannot be
> > + * NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * held.
> > + * Return: The application crash period.
> > + */
> > +static u64 brute_get_crash_period(struct brute_stats *stats)
> > +{
> > + u64 crash_period;
> > +
> > + spin_lock(&stats->lock);
> > + crash_period = stats->period;
> > + spin_unlock(&stats->lock);
> > +
> > + return crash_period;
> > +}
>
> return READ_ONCE(stats->period);
Ok, thanks.
> > +
> > +/**
> > + * print_exec_attack_running() - Warn about an exec brute force attack.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + *
> > + * The statistical data shared by all the fork hierarchy processes cannot be
> > + * NULL.
> > + *
> > + * Before showing the process name it is mandatory to find a process that holds
> > + * a pointer to the exec statistics.
> > + *
> > + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> > + */
> > +static void print_exec_attack_running(const struct brute_stats *stats)
> > +{
> > + struct task_struct *p;
> > + struct brute_stats **p_stats;
> > + bool found = false;
> > +
> > + for_each_process(p) {
> > + p_stats = brute_stats_ptr(p);
> > + if (*p_stats == stats) {
> > + found = true;
> > + break;
> > + }
> > + }
> > +
> > + if (WARN(!found, "No exec process\n"))
> > + return;
> > +
> > + pr_warn("Exec brute force attack detected [%s]\n", p->comm);
> > +}
>
> Same logic to change here as above for walking the process list. (IIUC, since
> you're only reading, you don't need tasklist_lock, just rcu_read_lock.)
> But, if I'm reading this right, you only ever call this with "current".
> It seems like it would be way more efficient to just use "current"
> instead?
Ok, I will work on it. Thanks.
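If it is always "current", then the process walk can go away entirely and
this becomes just:

pr_warn("Exec brute force attack detected [pid %d: %s]\n",
        current->pid, current->comm);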
> > +
> > +/**
> > + * brute_manage_exec_attack() - Manage an exec brute force attack.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + * @now: The current timestamp in jiffies.
> > + * @last_fork_crash: The last fork crash timestamp before updating it.
> > + *
> > + * For a correct management of an exec brute force attack it is only necessary
> > + * to update the exec statistics and test if an attack is happening based on
> > + * these data.
> > + *
> > + * It is important to note that if the fork and exec crash periods are the same,
> > + * the attack test is avoided. This ensures that in a brute force attack that
> > + * happens through the fork system call, the mitigation method does not act on
> > + * the parent process of the fork hierarchy.
> > + *
> > + * The statistical data shared by all the fork hierarchy processes cannot be
> > + * NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > + * brute_stats_ptr_lock held.
> > + */
> > +static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> > + u64 last_fork_crash)
> > +{
> > + int ret;
> > + struct brute_stats *exec_stats = stats;
> > + u64 fork_period;
> > + u64 exec_period;
> > +
> > + ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
> > + if (WARN(ret, "No exec statistical data\n"))
> > + return;
>
> I think this should fail closed: if there's a stats processing error,
> treat it as an attack.
Do you mean to trigger the brute force attack mitigation on this task?
So, IIUC, you suggest that instead of generating warnings when there is no
statistical data, we should trigger the mitigation? Should this also be
applied to the case where the allocation of a brute_stats structure fails?
> > +
> > + fork_period = brute_get_crash_period(stats);
> > + exec_period = brute_get_crash_period(exec_stats);
> > + if (fork_period == exec_period)
> > + return;
> > +
> > + if (brute_attack_running(exec_stats))
> > + print_exec_attack_running(exec_stats);
> > +}
> > +
>
> I think this is very close!
Thank you very much for the comments and guidance.
John Wood
On Wed, Mar 17, 2021 at 09:00:51PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:27PM +0100, John Wood wrote:
> > #include <asm/current.h>
> > +#include <asm/rwonce.h>
> > +#include <asm/siginfo.h>
> > +#include <asm/signal.h>
> > +#include <linux/binfmts.h>
> > #include <linux/bug.h>
> > #include <linux/compiler.h>
> > +#include <linux/cred.h>
> > +#include <linux/dcache.h>
> > #include <linux/errno.h>
> > +#include <linux/fs.h>
> > #include <linux/gfp.h>
> > +#include <linux/if.h>
> > #include <linux/init.h>
> > #include <linux/jiffies.h>
> > #include <linux/kernel.h>
> > #include <linux/lsm_hooks.h>
> > #include <linux/math64.h>
> > +#include <linux/netdevice.h>
> > +#include <linux/path.h>
> > #include <linux/printk.h>
> > #include <linux/refcount.h>
> > #include <linux/rwlock.h>
> > @@ -19,9 +29,35 @@
> > #include <linux/sched.h>
> > #include <linux/sched/signal.h>
> > #include <linux/sched/task.h>
> > +#include <linux/signal.h>
> > +#include <linux/skbuff.h>
> > #include <linux/slab.h>
> > #include <linux/spinlock.h>
> > +#include <linux/stat.h>
> > #include <linux/types.h>
> > +#include <linux/uidgid.h>
>
> This is really a LOT of includes. Are you sure all of these are
> explicitly needed?
I tried to add the needed header for every macro and function used. If
there is a better method, I will apply it. Thanks.
> > /**
> > * struct brute_stats - Fork brute force attack statistics.
> > @@ -30,6 +66,9 @@
> > * @faults: Number of crashes.
> > * @jiffies: Last crash timestamp.
> > * @period: Crash period's moving average.
> > + * @saved_cred: Saved credentials.
> > + * @network: Network activity flag.
> > + * @bounds_crossed: Privilege bounds crossed flag.
> > *
> > * This structure holds the statistical data shared by all the fork hierarchy
> > * processes.
> > @@ -40,6 +79,9 @@ struct brute_stats {
> > unsigned char faults;
> > u64 jiffies;
> > u64 period;
> > + struct brute_cred saved_cred;
> > + unsigned char network : 1;
> > + unsigned char bounds_crossed : 1;
>
> If you really want to keep faults a "char", I would move these bools
> after "faults" to avoid adding more padding.
Understood. Thanks.
> > +/**
> > + * brute_is_setid() - Test if the executable file has the setid flags set.
> > + * @bprm: Points to the linux_binprm structure.
> > + *
> > + * Return: True if the executable file has the setid flags set. False otherwise.
> > + */
> > +static bool brute_is_setid(const struct linux_binprm *bprm)
> > +{
> > + struct file *file = bprm->file;
> > + struct inode *inode;
> > + umode_t mode;
> > +
> > + if (!file)
> > + return false;
> > +
> > + inode = file->f_path.dentry->d_inode;
> > + mode = inode->i_mode;
> > +
> > + return !!(mode & (S_ISUID | S_ISGID));
> > +}
>
> Oh, er, no, this should not reinvent the wheel. You just want to know if
> creds got elevated, so you want bprm->secureexec; this gets correctly
> checked in cap_bprm_creds_from_file().
Ok, I will work on it for the next version.
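If I understand correctly, the whole helper can then rely on what
cap_bprm_creds_from_file() already computed. A sketch:

static bool brute_is_setid(const struct linux_binprm *bprm)
{
        /*
         * secureexec is set when the exec results in elevated privileges
         * (setuid/setgid, file capabilities, ...), so there is no need
         * to inspect the inode mode by hand.
         */
        return bprm->secureexec;
}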
> > +
> > +/**
> > + * brute_reset_stats() - Reset the statistical data.
> > + * @stats: Statistics to be reset.
> > + * @is_setid: The executable file has the setid flags set.
> > + *
> > + * Reset the faults and period and set the last crash timestamp to now. This
> > + * way, it is possible to compute the application crash period at the next
> > + * fault. Also, save the credentials of the current task and update the
> > + * bounds_crossed flag based on a previous network activity and the is_setid
> > + * parameter.
> > + *
> > + * The statistics to be reset cannot be NULL.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * and brute_stats::lock held.
> > + */
> > +static void brute_reset_stats(struct brute_stats *stats, bool is_setid)
> > +{
> > + const struct cred *cred = current_cred();
> > +
> > + stats->faults = 0;
> > + stats->jiffies = get_jiffies_64();
> > + stats->period = 0;
> > + stats->saved_cred.uid = cred->uid;
> > + stats->saved_cred.gid = cred->gid;
> > + stats->saved_cred.suid = cred->suid;
> > + stats->saved_cred.sgid = cred->sgid;
> > + stats->saved_cred.euid = cred->euid;
> > + stats->saved_cred.egid = cred->egid;
> > + stats->saved_cred.fsuid = cred->fsuid;
> > + stats->saved_cred.fsgid = cred->fsgid;
> > + stats->bounds_crossed = stats->network || is_setid;
> > +}
>
> I would include brute_reset_stats() in the first patch (and add to it as
> needed). To that end, it can start with a memset(stats, 0, sizeof(*stats));
So, do all the struct fields need to be introduced in the initial patch,
even if they are not used there yet? I'm confused.
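In case it helps the discussion, my first thought for an initial-patch
version would be to reset only the fields that exist at that point, since
a full memset() would also wipe the lock and the reference counter set up
at allocation time:

static void brute_reset_stats(struct brute_stats *stats)
{
        stats->faults = 0;
        stats->jiffies = get_jiffies_64();
        stats->period = 0;
}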
> > +/**
> > + * brute_priv_have_changed() - Test if the privileges have changed.
> > + * @stats: Statistics that hold the saved credentials.
> > + *
> > + * The privileges have changed if the credentials of the current task are
> > + * different from the credentials saved in the statistics structure.
> > + *
> > + * The statistics that hold the saved credentials cannot be NULL.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * and brute_stats::lock held.
> > + * Return: True if the privileges have changed. False otherwise.
> > + */
> > +static bool brute_priv_have_changed(struct brute_stats *stats)
> > +{
> > + const struct cred *cred = current_cred();
> > + bool priv_have_changed;
> > +
> > + priv_have_changed = !uid_eq(stats->saved_cred.uid, cred->uid) ||
> > + !gid_eq(stats->saved_cred.gid, cred->gid) ||
> > + !uid_eq(stats->saved_cred.suid, cred->suid) ||
> > + !gid_eq(stats->saved_cred.sgid, cred->sgid) ||
> > + !uid_eq(stats->saved_cred.euid, cred->euid) ||
> > + !gid_eq(stats->saved_cred.egid, cred->egid) ||
> > + !uid_eq(stats->saved_cred.fsuid, cred->fsuid) ||
> > + !gid_eq(stats->saved_cred.fsgid, cred->fsgid);
> > +
> > + return priv_have_changed;
> > +}
>
> This should just be checked from bprm->secureexec, which is valid by the
> time you get to the bprm_committing_creds hook. You can just save the
> value to your stats struct instead of re-interrogating current_cred,
> etc.
Ok. Thanks.
> > +
> > +/**
> > + * brute_threat_model_supported() - Test if the threat model is supported.
> > + * @siginfo: Contains the signal information.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + *
> > + * To avoid false positives during the attack detection it is necessary to
> > + * narrow the possible cases. Only the following scenarios are taken into
> > + * account:
> > + *
> > + * 1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
> > + * desirable memory layout is got (e.g. Stack Clash).
> > + * 2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until
> > + * a desirable memory layout is got (e.g. what CTFs do for simple network
> > + * service).
> > + * 3.- Launching processes without exec() (e.g. Android Zygote) and exposing
> > + * state to attack a sibling.
> > + * 4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
> > + * the previously shared memory layout of all the other children is exposed
> > + * (e.g. kind of related to HeartBleed).
> > + *
> > + * In each case, a privilege boundary has been crossed:
> > + *
> > + * Case 1: setuid/setgid process
> > + * Case 2: network to local
> > + * Case 3: privilege changes
> > + * Case 4: network to local
> > + *
> > + * Also, only the signals delivered by the kernel are taken into account with
> > + * the exception of the SIGABRT signal since the latter is used by glibc for
> > + * stack canary, malloc, etc failures, which may indicate that a mitigation has
> > + * been triggered.
> > + *
> > + * The signal information and the statistical data shared by all the fork
> > + * hierarchy processes cannot be NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > + * held.
> > + * Return: True if the threat model is supported. False otherwise.
> > + */
> > +static bool brute_threat_model_supported(const kernel_siginfo_t *siginfo,
> > + struct brute_stats *stats)
> > +{
> > + bool bounds_crossed;
> > +
> > + if (siginfo->si_signo == SIGKILL && siginfo->si_code != SIGABRT)
> > + return false;
> > +
> > + spin_lock(&stats->lock);
> > + bounds_crossed = stats->bounds_crossed;
> > + bounds_crossed = bounds_crossed || brute_priv_have_changed(stats);
> > + stats->bounds_crossed = bounds_crossed;
> > + spin_unlock(&stats->lock);
> > +
> > + return bounds_crossed;
> > +}
>
> I think this logic can be done with READ_ONCE()s and moved directly into
> brute_task_fatal_signal().
Thanks. I will work on locking.
> >
> > +/**
> > + * brute_network() - Target for the socket_sock_rcv_skb hook.
> > + * @sk: Contains the sock (not socket) associated with the incoming sk_buff.
> > + * @skb: Contains the incoming network data.
> > + *
> > + * A previous step to detect that a network to local boundary has been crossed
> > + * is to detect if there is network activity. To do this, it is only necessary
> > + * to check if there are data packets received from a network device other than
> > + * loopback.
> > + *
> > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > + * context during the execution of the socket_sock_rcv_skb hook.
> > + *
> > + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> > + * otherwise.
> > + */
> > +static int brute_network(struct sock *sk, struct sk_buff *skb)
> > +{
> > + struct brute_stats **stats;
> > + unsigned long flags;
> > +
> > + if (!skb->dev || (skb->dev->flags & IFF_LOOPBACK))
> > + return 0;
> > +
> > + stats = brute_stats_ptr(current);
>
> Uhh, is "current" valid here? I actually don't know this hook very well.
I think so, but I will try to study it. Thanks for noting this.
> > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > +
> > + if (!*stats) {
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > + return -EFAULT;
> > + }
> > +
> > + spin_lock(&(*stats)->lock);
> > + (*stats)->network = true;
> > + spin_unlock(&(*stats)->lock);
> > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > + return 0;
> > +}
Thanks,
John Wood
On Wed, Mar 17, 2021 at 09:04:15PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:28PM +0100, John Wood wrote:
> > +/**
> > + * brute_kill_offending_tasks() - Kill the offending tasks.
> > + * @attack_type: Brute force attack type.
> > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > + *
> > + * When a brute force attack is detected all the offending tasks involved in the
> > + * attack must be killed. In other words, it is necessary to kill all the tasks
> > + * that share the same statistical data. Moreover, if the attack happens through
> > + * the fork system call, the processes that have the same group_leader as the
> > + * current task must be skipped since they are already on the path to be killed.
> > + *
> > + * When the SIGKILL signal is sent to the offending tasks, this function will be
> > + * called again from the task_fatal_signal hook due to a small crash period. So,
> > + * to avoid killing the same tasks again due to a recursive call of this function,
> > + * it is necessary to disable the attack detection for this fork hierarchy.
>
> Hah. Interesting. I wonder if there is a better way to handle this. Hmm.
If your comment is related to disabling the detection:
I think it's not problematic to disable the attack detection for this fork
hierarchy since all of its tasks will be removed. Also, I think that the
disable mark can help on the path towards using the wait*() functions to
notify userspace that a task has been killed by the brute mitigation. That
is a work in progress now.
If your comment is related to killing all the tasks:
In the previous version I had a useful discussion with Andi Kleen where a
proposal to block the fork system call for a period of time was made. He
explained the cons of this method to me and proposed that, if the mitigation
keeps working as it does now, we can use the wait*() functions to notify
userspace that the tasks have been killed by the brute mitigation. This way,
other problems related to supervisors and respawned processes could be
handled.
Anyway, new points of view are also welcome.
> > + *
> > + * The statistical data shared by all the fork hierarchy processes cannot be
> > + * NULL.
> > + *
> > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > + * since the task_free hook can be called from an IRQ context during the
> > + * execution of the task_fatal_signal hook.
> > + *
> > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > + * brute_stats_ptr_lock held.
> > + */
> > +static void brute_kill_offending_tasks(enum brute_attack_type attack_type,
> > + struct brute_stats *stats)
> > +{
> > + struct task_struct *p;
> > + struct brute_stats **p_stats;
> > +
> > + spin_lock(&stats->lock);
> > +
> > + if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
> > + refcount_read(&stats->refc) == 1) {
> > + spin_unlock(&stats->lock);
> > + return;
> > + }
>
> refcount_read() isn't a safe way to check that there is only 1
> reference. What's this trying to do?
If a fork brute force attack has been detected, it is due to a new fatal
crash. In this scenario, if there is only one reference to these stats, it
is not necessary to kill any other tasks since the stats are not shared with
another process. Moreover, if this task has failed in a fatal way, it is
already on the path to be killed. So, no action is required.
How can I make this check in a safe way?
> > +
> > + brute_disable(stats);
> > + spin_unlock(&stats->lock);
> > +
> > + for_each_process(p) {
> > + if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
> > + p->group_leader == current->group_leader)
> > + continue;
> > +
> > + p_stats = brute_stats_ptr(p);
> > + if (*p_stats != stats)
> > + continue;
> > +
> > + do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> > + pr_warn_ratelimited("Offending process %d [%s] killed\n",
> > + p->pid, p->comm);
> > + }
> > +}
Thanks,
John Wood
On Wed, Mar 17, 2021 at 09:08:17PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:29PM +0100, John Wood wrote:
> > +
> > +count_fork_matches()
> > +{
> > + dmesg | grep "brute: Fork brute force attack detected" | wc -l
>
> This may be unstable if the dmesg scrolls past, etc. See how
> lkdtm/run.sh handles this with a temp file and "comm".
Thanks, I will correct this for the next version.
John Wood
On Wed, Mar 17, 2021 at 09:10:05PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:30PM +0100, John Wood wrote:
> > +These statistics are held by the brute_stats struct.
> > +
> > +struct brute_cred {
> > + kuid_t uid;
> > + kgid_t gid;
> > + kuid_t suid;
> > + kgid_t sgid;
> > + kuid_t euid;
> > + kgid_t egid;
> > + kuid_t fsuid;
> > + kgid_t fsgid;
> > +};
> > +
> > +struct brute_stats {
> > + spinlock_t lock;
> > + refcount_t refc;
> > + unsigned char faults;
> > + u64 jiffies;
> > + u64 period;
> > + struct brute_cred saved_cred;
> > + unsigned char network : 1;
> > + unsigned char bounds_crossed : 1;
> > +};
>
> Instead of open-coding this, just use the kernel-doc references you've
> already built in the .c files:
>
> .. kernel-doc:: security/brute/brute.c
>
Ok, thanks.
John Wood
On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> > +{
> > + u64 current_period;
> > + u64 last_crash_timestamp;
> > +
> > + spin_lock(&stats->lock);
> > + current_period = now - stats->jiffies;
> > + last_crash_timestamp = stats->jiffies;
> > + stats->jiffies = now;
> > +
> > + stats->period -= brute_mul_by_ema_weight(stats->period);
> > + stats->period += brute_mul_by_ema_weight(current_period);
> > +
> > + if (stats->faults < BRUTE_MAX_FAULTS)
> > + stats->faults += 1;
> > +
> > + spin_unlock(&stats->lock);
> > + return last_crash_timestamp;
> > +}
>
> Now *here* locking makes sense, and it only needs to be per-stat, not
> global, since multiple processes may be operating on the same stat
> struct. To make this more no-reader-locking-friendly, I'd also update
> everything at the end, and use WRITE_ONCE():
>
> u64 current_period, period;
> u64 last_crash_timestamp;
> u64 faults;
>
> spin_lock(&stats->lock);
> current_period = now - stats->jiffies;
> last_crash_timestamp = stats->jiffies;
>
> WRITE_ONCE(stats->period,
> stats->period - brute_mul_by_ema_weight(stats->period) +
> brute_mul_by_ema_weight(current_period));
>
> if (stats->faults < BRUTE_MAX_FAULTS)
> WRITE_ONCE(stats->faults, stats->faults + 1);
>
> WRITE_ONCE(stats->jiffies, now);
>
> spin_unlock(&stats->lock);
> return last_crash_timestamp;
>
> That way readers can (IIUC) safely use READ_ONCE() on jiffies and faults
> without needing to hold the &stats->lock (unless they need perfectly matching
> jiffies, period, and faults).
Sorry, but I have tried to understand how to use locking properly, without luck.
I have read (and tried to understand):
tools/memory-model/Documentation/simple.txt
tools/memory-model/Documentation/ordering.txt
tools/memory-model/Documentation/recipes.txt
Documentation/memory-barriers.txt
And I don't find the answers that I need. I'm not saying they aren't
there, but I don't see them. So my questions:
If it makes sense to use locking in the above function, and it is called
from the brute_task_fatal_signal hook, then all the functions that are
called from this hook need locking (more than one process can access stats
at the same time).
So, as you point out, how is it possible and safe to read jiffies and faults
(and I think period, even though you don't mention it) using READ_ONCE() but
without holding brute_stats::lock? I'm very confused.
IIUC (from reading the documentation), READ_ONCE and WRITE_ONCE only
guarantee that a variable stored with WRITE_ONCE can be read safely with
READ_ONCE, avoiding tearing, etc. So, I see these functions as a way to
guarantee atomicity of accesses to a variable.
Another question. Is it also safe to use WRITE_ONCE without holding the lock?
Or is this only applicable to read operations?
Any light on this will help me do the best job in the next patches. If
somebody can point me in the right direction it would be greatly appreciated.
Is there any documentation on this topic for newbies? I'm stuck.
I have also read the documentation about spinlocks, semaphores, mutexes,
etc., but nothing clears up these concepts for me.
Apologies if this question has been answered in the past, but my search of
the mailing list has not turned anything up.
Thanks for your time and patience.
John Wood
On Sat, Mar 20, 2021 at 04:01:53PM +0100, John Wood wrote:
> Hi,
> First of all thanks for the review. More info and questions inline.
>
> On Wed, Mar 17, 2021 at 07:00:56PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:25PM +0100, John Wood wrote:
> > >
> > > config LSM
> > > string "Ordered list of enabled LSMs"
> > > - default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> > > - default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> > > - default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> > > - default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> > > - default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
> > > + default "brute,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
> > > + default "brute,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
> > > + default "brute,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
> > > + default "brute,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
> > > + default "brute,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
> >
> > It probably doesn't matter much, but I think brute should be added
> > between lockdown and yama.
>
> What is the rationale for the stacking order (in relation to brute and
> lockdown)?
lockdown has some very early hooks, so leaving it at the front seems
organizationally correct to me. It doesn't really matter, though, so
perhaps we should just alphabetize them, but that's for another day.
>
> > > diff --git a/security/Makefile b/security/Makefile
> > > index 3baf435de541..1236864876da 100644
> > > --- a/security/Makefile
> > > +++ b/security/Makefile
> > > @@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM) += bpf/
> > > # Object integrity file lists
> > > subdir-$(CONFIG_INTEGRITY) += integrity
> > > obj-$(CONFIG_INTEGRITY) += integrity/
> > > +
> > > +# Object brute file lists
> > > +subdir-$(CONFIG_SECURITY_FORK_BRUTE) += brute
> > > +obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute/
> >
> > I don't think subdir is needed here? I think you can use obj-... like
> > loadpin, etc.
>
> loadpin also uses subdir just like selinux, smack, tomoyo, etc. So, why
> is it not necessary for brute?
Oops, yes, my mistake. I didn't look at the Makefile as a whole. I will
adjust my suggestion as: please split subdir and obj as done by the
other LSMs (integrity should be fixed to do the same, but that doesn't
need to be part of this series).
>
> > > +#include <asm/current.h>
> >
> > Why is this needed?
>
> IIUC, the "current" macro is defined in this header. I tried to include the
> appropriate header for every macro and function used.
The common approach is actually to minimize the number of explicit
headers so that if header file includes need to be changed, they only
need to be changed internally instead of everywhere in the kernel.
Please find an appropriately minimal set of headers to include.
>
> > > +/**
> > > + * struct brute_stats - Fork brute force attack statistics.
> > > + * @lock: Lock to protect the brute_stats structure.
> > > + * @refc: Reference counter.
> > > + * @faults: Number of crashes.
> > > + * @jiffies: Last crash timestamp.
> > > + * @period: Crash period's moving average.
> > > + *
> > > + * This structure holds the statistical data shared by all the fork hierarchy
> > > + * processes.
> > > + */
> > > +struct brute_stats {
> > > + spinlock_t lock;
> > > + refcount_t refc;
> > > + unsigned char faults;
> > > + u64 jiffies;
> > > + u64 period;
> > > +};
> >
> > I assume the max-255 "faults" will be explained... why is this so small?
>
> If a brute force attack is running slowly for a long time, the application
> crash period's EMA is not suitable for the detection. This type of attack
> must be detected using a maximum number of faults. In this case, the
> BRUTE_MAX_FAULTS is defined as 200.
Okay, so given the choice of BRUTE_MAX_FAULTS, you limited the storage
size? I guess I worry about this somehow wrapping around easily. Given
the struct has padding due to the u8 storage, it seems like just using
int would be fine too.
> > > +static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > > +{
> > > + struct brute_stats **stats, **p_stats;
> > > +
> > > + stats = brute_stats_ptr(task);
> > > + p_stats = brute_stats_ptr(current);
> > > +
> > > + if (likely(*p_stats)) {
> > > + brute_share_stats(*p_stats, stats);
> > > + return 0;
> > > + }
> > > +
> > > + *stats = brute_new_stats();
> > > + if (!*stats)
> > > + return -ENOMEM;
> > > +
> > > + brute_share_stats(*stats, p_stats);
> > > + return 0;
> > > +}
> >
> > During the task_alloc hook, aren't both "current" and "task" already
> > immutable (in the sense that no lock needs to be held for
> > brute_share_stats())?
>
> I will work on it.
>
> > And what is the case where brute_stats_ptr(current) returns NULL?
>
> Sorry, but I don't understand what you are trying to explain to me.
> brute_stats_ptr(current) returns a pointer to a pointer. So, I think
> your question is: What's the purpose of the "if (likely(*p_stats))"
> check? If that is the case, this check is to guarantee that all the tasks
> have statistical data. If a task was allocated prior to the brute
> LSM initialization, it doesn't have stats. So, with this check
> all the tasks that fork have stats.
Thank you for figuring out my poorly-worded question. :) Yes, I was
curious about the "if (likely(*p_stats))". It seems like it shouldn't be
possible for a process to lack a stats allocation: the LSMs get
initialized before processes. If you wanted to be defensive, I would
have expected:
if (WARN_ON_ONCE(!*p_stats))
return -ENOMEM;
or something (brute should be able to count on the kernel internals
behaving here: you're not expecting any path where this could happen).
>
> > > +
> > > +/**
> > > + * brute_task_execve() - Target for the bprm_committing_creds hook.
> > > + * @bprm: Points to the linux_binprm structure.
> > > + *
> > > + * When a forked task calls the execve system call, the memory contents are set
> > > + * with new values. So, in this scenario the parent's statistical data does
> > > + * not need to be shared. Instead, a new statistical data structure must be
> > > + * allocated to
> > > + * start a new hierarchy. This condition is detected when the statistics
> > > + * reference counter holds a value greater than or equal to two (a fork always
> > > + * sets the statistics reference counter to a minimum of two since the parent
> > > + * and the child task are sharing the same data).
> > > + *
> > > + * However, if the execve function is called immediately after another execve
> > > + * call, although the memory contents are reset, there is no need to allocate
> > > + * a new statistical data structure. This is possible because at this moment
> > > + * only one task (the task that calls the execve function) points to the data.
> > > + * In this case, the previous allocation is used but the statistics are reset.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the bprm_committing_creds hook.
> > > + */
> > > +static void brute_task_execve(struct linux_binprm *bprm)
> > > +{
> > > + struct brute_stats **stats;
> > > + unsigned long flags;
> > > +
> > > + stats = brute_stats_ptr(current);
> > > + if (WARN(!*stats, "No statistical data\n"))
> > > + return;
> > > +
> > > + spin_lock_irqsave(&(*stats)->lock, flags);
> > > +
> > > + if (!refcount_dec_not_one(&(*stats)->refc)) {
> > > + /* execve call after an execve call */
> > > + (*stats)->faults = 0;
> > > + (*stats)->jiffies = get_jiffies_64();
> > > + (*stats)->period = 0;
> > > + spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + return;
> > > + }
> > > +
> > > + /* execve call after a fork call */
> > > + spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + *stats = brute_new_stats();
> > > + WARN(!*stats, "Cannot allocate statistical data\n");
> > > +}
> >
> > I don't think any of this locking is needed -- you're always operating
> > on "current", so its brute_stats will always be valid.
>
> But another process (that shares the same stats) could be modifying this
> concurrently.
>
> Scenario 1: cpu 1 writes stats and cpu 2 writes stats.
> Scenario 2: cpu 1 writes stats, then IRQ on the same cpu writes stats.
>
> I think it is possible. So AFAIK we need locking. Sorry if I am wrong.
Maybe I'm misunderstanding, but even your comments on the function say
that the zeroing path is there to avoid a new allocation, since only 1
thread has access to that "stats". (i.e. no locking needed), and in the
other path, a new stats is allocated (no locking needed). What are the
kernel execution paths you see where you'd need locking here?
> > > +/**
> > > + * brute_task_free() - Target for the task_free hook.
> > > + * @task: Task about to be freed.
> > > + *
> > > + * The statistical data that is shared between all the fork hierarchy processes
> > > + * needs to be freed when this hierarchy disappears.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_free hook.
> > > + */
> > > +static void brute_task_free(struct task_struct *task)
> > > +{
> > > + struct brute_stats **stats;
> > > + unsigned long flags;
> > > + bool refc_is_zero;
> > > +
> > > + stats = brute_stats_ptr(task);
> > > + if (WARN(!*stats, "No statistical data\n"))
> > > + return;
> > > +
> > > + spin_lock_irqsave(&(*stats)->lock, flags);
> > > + refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> > > + spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > +
> > > + if (refc_is_zero) {
> > > + kfree(*stats);
> > > + *stats = NULL;
> > > + }
> > > +}
> >
> > Same thing -- this is what dec_and_test is for: it's atomic, so no
> > locking needed.
>
> Ok, in this case I can see that the locking is not necessary since
> stats::refc is atomic. But in the previous case, faults, jiffies and
> period are not atomic, so I think the lock is necessary there. If not,
> what am I missing?
I thought the code had established that there could only be a single
stats holder for that code, so no locking. Maybe I misunderstood?
--
Kees Cook
On Sun, Mar 21, 2021 at 04:01:18PM +0100, John Wood wrote:
> On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > > +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> > > +{
> > > + u64 current_period;
> > > + u64 last_crash_timestamp;
> > > +
> > > + spin_lock(&stats->lock);
> > > + current_period = now - stats->jiffies;
> > > + last_crash_timestamp = stats->jiffies;
> > > + stats->jiffies = now;
> > > +
> > > + stats->period -= brute_mul_by_ema_weight(stats->period);
> > > + stats->period += brute_mul_by_ema_weight(current_period);
> > > +
> > > + if (stats->faults < BRUTE_MAX_FAULTS)
> > > + stats->faults += 1;
> > > +
> > > + spin_unlock(&stats->lock);
> > > + return last_crash_timestamp;
> > > +}
> >
> > Now *here* locking makes sense, and it only needs to be per-stat, not
> > global, since multiple processes may be operating on the same stat
> > struct. To make this more no-reader-locking-friendly, I'd also update
> > everything at the end, and use WRITE_ONCE():
> >
> > u64 current_period, period;
> > u64 last_crash_timestamp;
> > u64 faults;
> >
> > spin_lock(&stats->lock);
> > current_period = now - stats->jiffies;
> > last_crash_timestamp = stats->jiffies;
> >
> > WRITE_ONCE(stats->period,
> > stats->period - brute_mul_by_ema_weight(stats->period) +
> > brute_mul_by_ema_weight(current_period));
> >
> > if (stats->faults < BRUTE_MAX_FAULTS)
> > WRITE_ONCE(stats->faults, stats->faults + 1);
> >
> > WRITE_ONCE(stats->jiffies, now);
> >
> > spin_unlock(&stats->lock);
> > return last_crash_timestamp;
> >
> > That way readers can (IIUC) safely use READ_ONCE() on jiffies and faults
> > without needing to hold the &stats->lock (unless they need perfectly matching
> > jiffies, period, and faults).
>
> Sorry, but I have tried to understand how to use locking properly, without luck.
>
> I have read (and tried to understand):
> tools/memory-model/Documentation/simple.txt
> tools/memory-model/Documentation/ordering.txt
> tools/memory-model/Documentation/recipes.txt
> Documentation/memory-barriers.txt
>
> And I don't find the answers that I need. I'm not saying they aren't
> there, but I don't see them. So my questions:
>
> If it makes sense to use locking in the above function, and it is called
> from the brute_task_fatal_signal hook, then all the functions that are
> called from this hook need locking (more than one process can access stats
> at the same time).
>
> So, as you point out, how is it possible and safe to read jiffies and faults
> (and I think period, even though you don't mention it) using READ_ONCE() but
> without holding brute_stats::lock? I'm very confused.
There are, I think, 3 considerations:
- is "stats", itself, a valid allocation in kernel memory? This is the
"lifetime" management of the structure: it will only stay allocated as
long as there is a task still alive that is attached to it. The use of
refcount_t on task creation/death should entirely solve this issue, so
that all the other places where you access "stats", the memory will be
valid. AFAICT, this one is fine: you're doing all the correct lifetime
management.
- changing a task's stats pointer: this is related to lifetime
management, but it, I think, entirely solved by the existing
refcounting. (And isn't helped by holding stats->lock since this is
about stats itself being a valid pointer.) Again, I think this is all
correct already in your existing code (due to the implicit locking of
"current"). Perhaps I've missed something here, but I guess we'll see!
- are the values in stats getting written by multiple writers, or read
during a write, etc?
This last one is the core of what I think could be improved here:
To keep the writes serialized, you (correctly) perform locking in the
writers. This is fine.
There is also locking in the readers, which I think is not needed.
AFAICT, READ_ONCE() (with WRITE_ONCE() in the writers) is sufficient for
the readers here.
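To illustrate the split being described, a minimal sketch of the reader
side (the writer side is the locked WRITE_ONCE() refactoring quoted above;
the helper name and the u64 type for faults are assumptions):

	/*
	 * Reader: no stats->lock needed. READ_ONCE() pairs with the
	 * WRITE_ONCE() stores that the writers perform under the lock.
	 */
	static u64 brute_read_faults(const struct brute_stats *stats)
	{
		return READ_ONCE(stats->faults);
	}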
> IIUC (from reading the documentation) READ_ONCE and WRITE_ONCE only
> guarantee that a variable stored with WRITE_ONCE can be read safely with
> READ_ONCE, avoiding tearing, etc. So, I see these functions as a way to
> guarantee atomicity of accesses to individual variables.
Right -- from what I can see about how you're reading the statistics, I
don't see a way to have the values get confused (assuming locked writes
and READ/WRITE_ONCE()).
> Another question. Is it also safe to use WRITE_ONCE without holding the lock?
> Or is this only applicable to read operations?
No -- you'll still want the writer locked since you update multiple fields
in stats during a write, so you could miss increments, or interleave
count vs jiffies writes, etc. But the WRITE_ONCE() makes sure that the
READ_ONCE() readers will see a stable value (as I understand it), and
in the order they were written.
> Any light on this will help me do the best job in the next patches. If
> somebody can point me in the right direction it would be greatly appreciated.
>
> Is there any documentation for newbies regarding this topic? I'm stuck.
> I have also read the documentation about spinlocks, semaphores, mutexes,
> etc., but nothing clarifies the concepts in question for me.
>
> Apologies if this question has been answered in the past, but searching
> the mailing list has not turned up anything.
It's a complex subject! Here are some other docs that might help:
tools/memory-model/Documentation/explanation.txt
Documentation/core-api/refcount-vs-atomic.rst
or they may melt your brain further! :) I know mine is always mushy
after reading them.
> Thanks for your time and patience.
You're welcome; and thank you for your work on this! I've wanted a robust
brute force mitigation in the kernel for a long time. :)
-Kees
--
Kees Cook
On Sat, Mar 20, 2021 at 04:46:48PM +0100, John Wood wrote:
> On Wed, Mar 17, 2021 at 09:00:51PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:27PM +0100, John Wood wrote:
> > > +/**
> > > + * brute_reset_stats() - Reset the statistical data.
> > > + * @stats: Statistics to be reset.
> > > + * @is_setid: The executable file has the setid flags set.
> > > + *
> > > + * Reset the faults and period and set the last crash timestamp to now. This
> > > + * way, it is possible to compute the application crash period at the next
> > > + * fault. Also, save the credentials of the current task and update the
> > > + * bounds_crossed flag based on previous network activity and the is_setid
> > > + * parameter.
> > > + *
> > > + * The statistics to be reset cannot be NULL.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * and brute_stats::lock held.
> > > + */
> > > +static void brute_reset_stats(struct brute_stats *stats, bool is_setid)
> > > +{
> > > + const struct cred *cred = current_cred();
> > > +
> > > + stats->faults = 0;
> > > + stats->jiffies = get_jiffies_64();
> > > + stats->period = 0;
> > > + stats->saved_cred.uid = cred->uid;
> > > + stats->saved_cred.gid = cred->gid;
> > > + stats->saved_cred.suid = cred->suid;
> > > + stats->saved_cred.sgid = cred->sgid;
> > > + stats->saved_cred.euid = cred->euid;
> > > + stats->saved_cred.egid = cred->egid;
> > > + stats->saved_cred.fsuid = cred->fsuid;
> > > + stats->saved_cred.fsgid = cred->fsgid;
> > > + stats->bounds_crossed = stats->network || is_setid;
> > > +}
> >
> > I would include brute_reset_stats() in the first patch (and add to it as
> > needed). To that end, it can start with a memset(stats, 0, sizeof(*stats));
>
> So, do all the struct fields need to be introduced in the initial patch,
> even if they are not used there yet? I'm confused.
No, I meant try to introduce as much infrastructure as possible early in
the series. In this case, I was suggesting having introduced
brute_reset_stats() at the start, so that in this patch you'd just be
adding the new fields to the function. (Instead of both adding new
fields and changing the execution pattern.)
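That is, a first-patch version of the helper could simply wrap the three
assignments that currently appear inline in brute_task_execve(), with later
patches adding fields to it (a sketch):

	/* Called with brute_stats::lock held, as elsewhere in the patch. */
	static void brute_reset_stats(struct brute_stats *stats)
	{
		stats->faults = 0;
		stats->jiffies = get_jiffies_64();
		stats->period = 0;
	}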
> > > +/**
> > > + * brute_network() - Target for the socket_sock_rcv_skb hook.
> > > + * @sk: Contains the sock (not socket) associated with the incoming sk_buff.
> > > + * @skb: Contains the incoming network data.
> > > + *
> > > + * A first step in detecting that a network to local boundary has been crossed
> > > + * is to detect whether there is network activity. To do this, it is only necessary
> > > + * to check if there are data packets received from a network device other than
> > > + * loopback.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the socket_sock_rcv_skb hook.
> > > + *
> > > + * Return: -EFAULT if the current task doesn't have statistical data. Zero
> > > + * otherwise.
> > > + */
> > > +static int brute_network(struct sock *sk, struct sk_buff *skb)
> > > +{
> > > + struct brute_stats **stats;
> > > + unsigned long flags;
> > > +
> > > + if (!skb->dev || (skb->dev->flags & IFF_LOOPBACK))
> > > + return 0;
I wonder if you need to also ignore netlink, unix sockets, etc, or does
the IFF_LOOPBACK cover those too?
> > > +
> > > + stats = brute_stats_ptr(current);
> >
> > Uhh, is "current" valid here? I actually don't know this hook very well.
>
> I think so, but I will try to study it. Thanks for noting this.
I think you might need to track the mapping of task to sock via
security_socket_post_create(), security_socket_accept(),
and/or security_socket_connect()?
Perhaps just mark it once with security_socket_post_create(), instead of
running a hook on every incoming network packet, too?
-Kees
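For reference, a rough sketch of that alternative. socket_post_create() is
an existing LSM hook; the family filter and the unlocked WRITE_ONCE() are
illustrative assumptions, and the per-packet loopback test from
brute_network() has no direct equivalent here:

	static int brute_socket_post_create(struct socket *sock, int family,
					    int type, int protocol, int kern)
	{
		struct brute_stats **stats = brute_stats_ptr(current);

		/* Mark network activity once, at socket creation time,
		 * instead of on every received packet.
		 */
		if (family != AF_INET && family != AF_INET6)
			return 0;

		if (*stats)
			WRITE_ONCE((*stats)->network, true);
		return 0;
	}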
> > > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > + if (!*stats) {
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + spin_lock(&(*stats)->lock);
> > > + (*stats)->network = true;
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > + return 0;
> > > +}
>
> Thanks,
> John Wood
--
Kees Cook
On Sat, Mar 20, 2021 at 04:48:47PM +0100, John Wood wrote:
> On Wed, Mar 17, 2021 at 09:04:15PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:28PM +0100, John Wood wrote:
> > > +/**
> > > + * brute_kill_offending_tasks() - Kill the offending tasks.
> > > + * @attack_type: Brute force attack type.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + *
> > > + * When a brute force attack is detected all the offending tasks involved in the
> > > + * attack must be killed. In other words, it is necessary to kill all the tasks
> > > + * that share the same statistical data. Moreover, if the attack happens through
> > > + * the fork system call, the processes that have the same group_leader as the
> > > + * current task must be skipped since they are already in the path to be killed.
> > > + *
> > > + * When the SIGKILL signal is sent to the offending tasks, this function will be
> > > + * called again from the task_fatal_signal hook due to a small crash period. So,
> > > + * to avoid killing the same tasks again due to a recursive call of this function,
> > > + * it is necessary to disable the attack detection for this fork hierarchy.
> >
> > Hah. Interesting. I wonder if there is a better way to handle this. Hmm.
>
> If your comment is related to disabling the detection:
>
> I think it's not problematic to disable the attack detection for this fork
> hierarchy since all its tasks will be removed. Also, I think that the disable
> mark can help on the path to using the wait*() functions to notify userspace
> that a task has been killed by the brute mitigation. This is a work in
> progress now.
>
> If your comment is related to killing all the tasks:
>
> In the previous version I had a useful discussion with Andi Kleen where a
> proposal to block the fork system call for some time was made. He explained
> to me the cons of this method and proposed that, if the mitigation works as
> it does now, we can use the wait*() functions to notify userspace that the
> tasks have been killed by the brute mitigation. This way other problems
> related to supervisors and respawned processes could be handled.
>
> Anyway, new points of view are also welcome.
I was just amused by my realizing that the brute mitigation could
trigger itself. I was just glad you had a comment about the
situation -- I hadn't thought about that case yet. :)
>
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > > + * brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_kill_offending_tasks(enum brute_attack_type attack_type,
> > > + struct brute_stats *stats)
> > > +{
> > > + struct task_struct *p;
> > > + struct brute_stats **p_stats;
> > > +
> > > + spin_lock(&stats->lock);
> > > +
> > > + if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
> > > + refcount_read(&stats->refc) == 1) {
> > > + spin_unlock(&stats->lock);
> > > + return;
> > > + }
> >
> > refcount_read() isn't a safe way to check that there is only 1
> > reference. What's this trying to do?
>
> If a fork brute force attack has been detected, it is due to a new fatal
> crash. Under this scenario, if there is only one reference to these stats,
> it is not necessary to kill any other tasks since the stats are not shared
> with another process. Moreover, if this task has failed in a fatal way, it
> is already in the path to be killed. So, no action is required.
>
> How can I make this check in a safe way?
I think you can just skip the optimization -- killing off threads isn't
going to be a fast path.
-Kees
>
> > > +
> > > + brute_disable(stats);
> > > + spin_unlock(&stats->lock);
> > > +
> > > + for_each_process(p) {
> > > + if (attack_type == BRUTE_ATTACK_TYPE_FORK &&
> > > + p->group_leader == current->group_leader)
> > > + continue;
> > > +
> > > + p_stats = brute_stats_ptr(p);
> > > + if (*p_stats != stats)
> > > + continue;
> > > +
> > > + do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
> > > + pr_warn_ratelimited("Offending process %d [%s] killed\n",
> > > + p->pid, p->comm);
> > > + }
> > > +}
>
> Thanks,
> John Wood
--
Kees Cook
On Sat, Mar 20, 2021 at 04:34:06PM +0100, John Wood wrote:
> On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > > @@ -74,7 +84,7 @@ static struct brute_stats *brute_new_stats(void)
> > > {
> > > struct brute_stats *stats;
> > >
> > > - stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> > > + stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
> >
> > Why change this here? I'd just start with this in the patch that
> > introduces it.
>
> To be coherent with the previous patch. In the previous patch the kmalloc
> could use GFP_KERNEL since the call was made outside of an atomic context.
> Now, with the new lock, it needs GFP_ATOMIC. So the question:
>
> If it finally needs to use GFP_ATOMIC, does the first patch need to use it
> even if it is not necessary there?
It's probably not a big deal, but for me, I'd just do GFP_ATOMIC from
the start, maybe add a comment that says "some LSM hooks are from atomic
context" or something.
> > > if (!stats)
> > > return NULL;
> > >
> > > @@ -99,16 +109,17 @@ static struct brute_stats *brute_new_stats(void)
> > > * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > * since the task_free hook can be called from an IRQ context during the
> > > * execution of the task_alloc hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * held.
> > > */
> > > static void brute_share_stats(struct brute_stats *src,
> > > struct brute_stats **dst)
> > > {
> > > - unsigned long flags;
> > > -
> > > - spin_lock_irqsave(&src->lock, flags);
> > > + spin_lock(&src->lock);
> > > refcount_inc(&src->refc);
> > > *dst = src;
> > > - spin_unlock_irqrestore(&src->lock, flags);
> > > + spin_unlock(&src->lock);
> > > }
> >
> > I still don't think any locking is needed here; the whole function can
> > go away, IMO.
>
> In this case I think this is possible:
>
> Scenario 1: cpu 1 writes the stats pointer while cpu 2 is navigating the
> process tree reading the same stats pointer.
>
> Scenario 2: cpu 1 is navigating the process tree reading the stats
> pointer while, in an IRQ, the same stats pointer is written.
>
> So, we need locking. Am I wrong?
But only the refcount is being incremented, yes? That doesn't need a
lock because it's already an atomic.
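Under that reading, brute_share_stats() would reduce to something like this
(keeping the patch's names):

	static void brute_share_stats(struct brute_stats *src,
				      struct brute_stats **dst)
	{
		refcount_inc(&src->refc);	/* already atomic, no lock */
		*dst = src;
	}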
>
> > > /**
> > > @@ -126,26 +137,36 @@ static void brute_share_stats(struct brute_stats *src,
> > > * this task and the new one being allocated. Otherwise, share the statistics
> > > * that the current task already has.
> > > *
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the task_alloc hook.
> > > + *
> > > * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> > > * otherwise.
> > > */
> > > static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > > {
> > > struct brute_stats **stats, **p_stats;
> > > + unsigned long flags;
> > >
> > > stats = brute_stats_ptr(task);
> > > p_stats = brute_stats_ptr(current);
> > > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > >
> > > if (likely(*p_stats)) {
> > > brute_share_stats(*p_stats, stats);
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return 0;
> > > }
> > >
> > > *stats = brute_new_stats();
> > > - if (!*stats)
> > > + if (!*stats) {
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return -ENOMEM;
> > > + }
> > >
> > > brute_share_stats(*stats, p_stats);
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return 0;
> > > }
> >
> > I'd much prefer that whatever locking is needed be introduced in the
> > initial patch: this transformation just double the work to review. :)
>
> So, IIUC I need to introduce all the locks in the initial patch even if
> they are not necessary there yet. Am I right?
I would find it easier to follow. Perhaps other reviewers would have a
different opinion.
>
> > >
> > > @@ -167,9 +188,9 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > > * only one task (the task that calls the execve function) points to the data.
> > > * In this case, the previous allocation is used but the statistics are reset.
> > > *
> > > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > - * since the task_free hook can be called from an IRQ context during the
> > > - * execution of the bprm_committing_creds hook.
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the bprm_committing_creds hook.
> > > */
> > > static void brute_task_execve(struct linux_binprm *bprm)
> > > {
> > > @@ -177,24 +198,33 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > > unsigned long flags;
> > >
> > > stats = brute_stats_ptr(current);
> > > - if (WARN(!*stats, "No statistical data\n"))
> > > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > + if (WARN(!*stats, "No statistical data\n")) {
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return;
> > > + }
> > >
> > > - spin_lock_irqsave(&(*stats)->lock, flags);
> > > + spin_lock(&(*stats)->lock);
> > >
> > > if (!refcount_dec_not_one(&(*stats)->refc)) {
> > > /* execve call after an execve call */
> > > (*stats)->faults = 0;
> > > (*stats)->jiffies = get_jiffies_64();
> > > (*stats)->period = 0;
> > > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return;
> > > }
> > >
> > > /* execve call after a fork call */
> > > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > +
> > > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > *stats = brute_new_stats();
> > > WARN(!*stats, "Cannot allocate statistical data\n");
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > }
> >
> > Again, I don't see a need for locking -- this is just managing the
> > lifetime which is entirely handled by the implicit locking of "current"
> > and the refcount_t.
>
> Here I can see the same two scenarios noted before. So I think the locking
> is needed. Am I right?
>
> > > /**
> > > @@ -204,9 +234,9 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > > * The statistical data that is shared between all the fork hierarchy processes
> > > * needs to be freed when this hierarchy disappears.
> > > *
> > > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > - * since the task_free hook can be called from an IRQ context during the
> > > - * execution of the task_free hook.
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the task_free hook.
> > > */
> > > static void brute_task_free(struct task_struct *task)
> > > {
> > > @@ -215,17 +245,446 @@ static void brute_task_free(struct task_struct *task)
> > > bool refc_is_zero;
> > >
> > > stats = brute_stats_ptr(task);
> > > - if (WARN(!*stats, "No statistical data\n"))
> > > + read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > + if (WARN(!*stats, "No statistical data\n")) {
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > return;
> > > + }
> > >
> > > - spin_lock_irqsave(&(*stats)->lock, flags);
> > > + spin_lock(&(*stats)->lock);
> > > refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> > > - spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > + spin_unlock(&(*stats)->lock);
> > > + read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >
> > > if (refc_is_zero) {
> > > + write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > kfree(*stats);
> > > *stats = NULL;
> > > + write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > + }
> > > +}
> >
> > Same; I would expect this to be simply:
>
> No comment. I think I am missing something. I need to clarify the previous
> cases before working on the next ones. Sorry and thanks for the guidance.
Right -- so, there are a few concurrency cases you need to worry about,
AIUI:
1- stats lifetime (based on creation/death of tasks)
2- stats value being written vs read
3- stats values being written/read vs stats lifetime
Using refcount_t in the standard pattern (as you're doing) should
entirely cover "1".
Since the values read from stats are mostly independent, it should be
possible to use READ_ONCE() in the readers and WRITE_ONCE() under a lock
in the writers (this is case "2").
For "3", I think the implicit locking of "current" keeps you safe (as
in, the stats can't go away because "current" will always have a
reference on it).
I see two places where stats are written. One appears to be the
brute_task_execve() case where only 1 thread exists, so there's no
lock needed, and the other case is brute_update_crash_period(), which
makes sense to me to lock: two tasks might be sharing a stats as they
crash.
Of course, I could easily be missing something here, but it looks like
much less locking is needed.
>
> > stats = brute_stats_ptr(task);
> > if (WARN_ON_ONCE(!*stats))
> > return;
> > if (refcount_dec_and_test(&(*stats)->refc)) {
> > kfree(*stats);
> > *stats = NULL;
> > }
> >
> > > +
> > > +/*
> > > + * BRUTE_EMA_WEIGHT_NUMERATOR - Weight's numerator of EMA.
> > > + */
> > > +static const u64 BRUTE_EMA_WEIGHT_NUMERATOR = 7;
> > > +
> > > +/*
> > > + * BRUTE_EMA_WEIGHT_DENOMINATOR - Weight's denominator of EMA.
> > > + */
> > > +static const u64 BRUTE_EMA_WEIGHT_DENOMINATOR = 10;
> >
> > Should these be externally configurable (via sysfs)?
>
> No problem. I think this is easier than locking :)
Heh, for the most part, yes. ;) Though I have my own nightmares[1] about
sysfs.
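One possible shape for such a knob, as a sketch only (module parameters of
built-in code appear under /sys/module/; the names and permissions are
hypothetical and no validation is shown):

	static unsigned int brute_ema_weight_numerator = 7;
	module_param_named(ema_weight_numerator, brute_ema_weight_numerator,
			   uint, 0644);
	MODULE_PARM_DESC(ema_weight_numerator,
			 "Numerator of the EMA weight (denominator is 10)");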
> > > +/**
> > > + * brute_update_crash_period() - Update the application crash period.
> > > + * @stats: Statistics that hold the application crash period to update.
> > > + * @now: The current timestamp in jiffies.
> > > + *
> > > + * The application crash period must be a value that is not prone to change due
> > > + * to spurious data and follows the real crash period. So, to compute it, the
> > > + * exponential moving average (EMA) is used.
> > > + *
> > > + * This kind of average defines a weight (between 0 and 1) for the new value to
> > > + * add and applies the remainder of the weight to the current average value.
> > > + * This way, some spurious data will not excessively modify the average and only
> > > + * if the new values are persistent, the moving average will tend towards them.
> > > + *
> > > + * Mathematically the application crash period's EMA can be expressed as
> > > + * follows:
> > > + *
> > > + * period_ema = period * weight + period_ema * (1 - weight)
> > > + *
> > > + * If the operations are applied:
> > > + *
> > > + * period_ema = period * weight + period_ema - period_ema * weight
> > > + *
> > > + * If the operands are ordered:
> > > + *
> > > + * period_ema = period_ema - period_ema * weight + period * weight
> > > + *
> > > + * Finally, this formula can be written as follows:
> > > + *
> > > + * period_ema -= period_ema * weight;
> > > + * period_ema += period * weight;
> > > + *
> > > + * The statistics that hold the application crash period to update cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * held.
> > > + * Return: The last crash timestamp before updating it.
> > > + */
> > > +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> > > +{
> > > + u64 current_period;
> > > + u64 last_crash_timestamp;
> > > +
> > > + spin_lock(&stats->lock);
> > > + current_period = now - stats->jiffies;
> > > + last_crash_timestamp = stats->jiffies;
> > > + stats->jiffies = now;
> > > +
> > > + stats->period -= brute_mul_by_ema_weight(stats->period);
> > > + stats->period += brute_mul_by_ema_weight(current_period);
> > > +
> > > + if (stats->faults < BRUTE_MAX_FAULTS)
> > > + stats->faults += 1;
> > > +
> > > + spin_unlock(&stats->lock);
> > > + return last_crash_timestamp;
> > > +}
> >
> > Now *here* locking makes sense, and it only needs to be per-stat, not
> > global, since multiple processes may be operating on the same stat
> > struct. To make this more no-reader-locking-friendly, I'd also update
> > everything at the end, and use WRITE_ONCE():
> >
> > u64 current_period, period;
> > u64 last_crash_timestamp;
> > u64 faults;
> >
> > spin_lock(&stats->lock);
> > current_period = now - stats->jiffies;
> > last_crash_timestamp = stats->jiffies;
> >
> > WRITE_ONCE(stats->period,
> > stats->period - brute_mul_by_ema_weight(stats->period) +
> > brute_mul_by_ema_weight(current_period));
> >
> > if (stats->faults < BRUTE_MAX_FAULTS)
> > WRITE_ONCE(stats->faults, stats->faults + 1);
> >
> > WRITE_ONCE(stats->jiffies, now);
> >
> > spin_unlock(&stats->lock);
> > return last_crash_timestamp;
> >
> > That way readers can (IIUC) safely use READ_ONCE() on jiffies and faults
> > without needing to hold the &stats->lock (unless they need perfectly matching
> > jiffies, period, and faults).
>
> Thanks for the refactoring. I will work on it (if I can understand locking). :(
It may be worth reading Documentation/memory-barriers.txt which has some
more details.
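To make the EMA above concrete, a toy userspace illustration using the 7/10
weight from the patch; it assumes brute_mul_by_ema_weight(x) computes
x * 7 / 10 with integer arithmetic, and ema_weight() is just a stand-in name:

	#include <stdint.h>
	#include <stdio.h>

	static uint64_t ema_weight(uint64_t v)
	{
		return v * 7 / 10;	/* BRUTE_EMA_WEIGHT_* values */
	}

	int main(void)
	{
		uint64_t period_ema = 1000;	/* current average (jiffies) */
		uint64_t current_period = 100;	/* latest crash period */

		period_ema -= ema_weight(period_ema);	  /* 1000 - 700 = 300 */
		period_ema += ema_weight(current_period); /* 300 + 70 = 370 */

		/* Persistent fast crashes pull the average down quickly,
		 * but a single outlier cannot drag it all the way to 100.
		 */
		printf("new period_ema = %llu\n",
		       (unsigned long long)period_ema);
		return 0;
	}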
> > > +/**
> > > + * brute_manage_fork_attack() - Manage a fork brute force attack.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + * @now: The current timestamp in jiffies.
> > > + *
> > > + * For a correct management of a fork brute force attack it is only necessary to
> > > + * update the statistics and test if an attack is happening based on these data.
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + * held.
> > > + * Return: The last crash timestamp before updating it.
> > > + */
> > > +static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> > > +{
> > > + u64 last_fork_crash;
> > > +
> > > + last_fork_crash = brute_update_crash_period(stats, now);
> > > + if (brute_attack_running(stats))
> > > + print_fork_attack_running();
> > > +
> > > + return last_fork_crash;
> > > +}
> > > +
> > > +/**
> > > + * brute_get_exec_stats() - Get the exec statistics.
> > > + * @stats: When this function is called, this parameter must point to the
> > > + * current process' statistical data. When this function returns, this
> > > + * parameter points to the parent process' statistics of the fork
> > > + * hierarchy that holds the current process' statistics.
> > > + *
> > > + * To manage a brute force attack that happens through the execve system call it
> > > + * is not possible to use the statistical data held by this process since these
> > > + * statistics disappear when this task is finished. In this scenario this data
> > > + * should be tracked by the statistics of a higher fork hierarchy (the hierarchy
> > > + * that contains the process that forks before the execve system call).
> > > + *
> > > + * To find these statistics the current fork hierarchy must be traversed up
> > > + * until new statistics are found.
> > > + *
> > > + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_get_exec_stats(struct brute_stats **stats)
> > > +{
> > > + const struct task_struct *task = current;
> > > + struct brute_stats **p_stats;
> > > +
> > > + do {
> > > + if (!task->real_parent) {
> > > + *stats = NULL;
> > > + return;
> > > + }
> > > +
> > > + p_stats = brute_stats_ptr(task->real_parent);
> > > + task = task->real_parent;
> > > + } while (*stats == *p_stats);
> > > +
> > > + *stats = *p_stats;
> > > +}
> >
> > See Yama's task_is_descendant() for how to walk up the process tree
> > (and I think the process group stuff will save some steps too); you
> > don't need tasklist_lock held, just rcu_read_lock held, AIUI:
> > Documentation/RCU/listRCU.rst
> >
> > And since you're passing this stats struct back up, and it would be outside of rcu read lock, you'd want to do a "get" on it first:
> >
> > rcu_read_lock();
> > loop {
> > ...
> > }
> > refcount_inc_not_zero(&(*p_stats)->refc);
> > rcu_read_unlock();
> >
> > *stats = *p_stats
>
> Thanks for the suggestions. I will work on it for the next version.
> Anyway, in the first version Kees Cook and Jann Horn noted that some tasks
> could escape the rcu read lock and that alternate locking was needed.
>
> Extract from the RFC:
>
> [Kees Cook]
> Can't newly created processes escape this RCU read lock? I think this
> need alternate locking, or something in the task_alloc hook that will
> block any new process from being created within the stats group.
>
> [Jann Horn]
> Good point; the proper way to deal with this would probably be to take
> the tasklist_lock in read mode around this loop (with
> read_lock(&tasklist_lock) / read_unlock(&tasklist_lock)), which pairs
> with the write_lock_irq(&tasklist_lock) in copy_process(). Thanks to
> the fatal_signal_pending() check while holding the lock in
> copy_process(), that would be race-free - any fork() that has not yet
> inserted the new task into the global task list would wait for us to
> drop the tasklist_lock, then bail out at the fatal_signal_pending()
> check.
>
> I think that this scenario is still possible. So the tasklist_lock is
> necessary. Am I right?
Oops, yeah, best to listen to Jann and past-me. :) Were these comments
about finding the parent or killing offenders?
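Putting the two suggestions together, a hedged sketch of the walk under
tasklist_lock that takes a reference before returning (the names come from
the patch; the reshaped signature and the refcount check are assumptions):

	static struct brute_stats *brute_get_exec_stats(struct brute_stats *stats)
	{
		const struct task_struct *task = current;
		struct brute_stats **p_stats;
		struct brute_stats *found;

		read_lock(&tasklist_lock);
		do {
			if (!task->real_parent) {
				read_unlock(&tasklist_lock);
				return NULL;
			}
			p_stats = brute_stats_ptr(task->real_parent);
			task = task->real_parent;
		} while (stats == *p_stats);

		/* Pin the parent's stats so they stay valid after the
		 * lock is dropped; the caller must drop this reference.
		 */
		found = *p_stats;
		if (!refcount_inc_not_zero(&found->refc))
			found = NULL;

		read_unlock(&tasklist_lock);
		return found;
	}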
> > > +
> > > + spin_lock(&(*stats)->lock);
> > > + if (!(*stats)->faults)
> > > + (*stats)->jiffies = last_fork_crash;
> > > + spin_unlock(&(*stats)->lock);
> > > +
> > > + brute_update_crash_period(*stats, now);
> >
> > and then you can add:
> >
> > if (refcount_dec_and_test(&(*stats)->refc))
> > kfree(*stats);
> >
> > (or better yet, make that a helper) named something like
> > "put_brute_stats".
>
> Sorry, but I don't understand why we need to free the stats here.
> What is the rationale behind this change?
Err, I think I may have quoted the wrong chunk of your patch! Sorry; I
was talking about the place where you did a free, I think? Disregard
this for now. :)
> > > +/**
> > > + * brute_manage_exec_attack() - Manage an exec brute force attack.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + * @now: The current timestamp in jiffies.
> > > + * @last_fork_crash: The last fork crash timestamp before updating it.
> > > + *
> > > + * For a correct management of an exec brute force attack it is only necessary
> > > + * to update the exec statistics and test if an attack is happening based on
> > > + * these data.
> > > + *
> > > + * It is important to note that if the fork and exec crash periods are the same,
> > > + * the attack test is avoided. This ensures that in a brute force attack that
> > > + * happens through the fork system call, the mitigation method does not act on
> > > + * the parent process of the fork hierarchy.
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > > + * brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> > > + u64 last_fork_crash)
> > > +{
> > > + int ret;
> > > + struct brute_stats *exec_stats = stats;
> > > + u64 fork_period;
> > > + u64 exec_period;
> > > +
> > > + ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
> > > + if (WARN(ret, "No exec statistical data\n"))
> > > + return;
> >
> > I think this should fail closed: if there's a static processing error,
> > treat it as an attack.
>
> Do you mean to trigger the mitigation of a brute force attack on this task?
> So, IIUC, you suggest that instead of generating warnings if there isn't
> statistical data, we need to trigger the mitigation? Can this be applied to
> the case where the allocation of a brute_stats structure fails?
Right -- it should be an impossible scenario that the stats are
_missing_. There is not an expected execution path in the kernel where
that could happen, so if you're testing for it (and correctly generating
a WARN), it should _also_ fail closed: an impossible case has been
found, so assume userspace is under attack. (Otherwise it could serve as
a bypass for an attacker who has found a way to navigate a process into
this state.)
-Kees
--
Kees Cook
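A sketch of the fail-closed variant being described, reusing names from the
quoted function (BRUTE_ATTACK_TYPE_EXEC, by analogy with
BRUTE_ATTACK_TYPE_FORK, and the use of brute_kill_offending_tasks() as the
mitigation entry point are assumptions):

	ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
	if (WARN(ret, "No exec statistical data\n")) {
		/* Fail closed: missing stats should be impossible, so
		 * treat this as an attack rather than just warning, to
		 * avoid giving attackers a detection bypass.
		 */
		brute_kill_offending_tasks(BRUTE_ATTACK_TYPE_EXEC, stats);
		return;
	}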
John Wood <[email protected]> writes:
> Add some info detailing what the Brute LSM is, its motivation, weak
> points of existing implementations, proposed solutions, enabling,
> disabling and self-tests.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> Documentation/admin-guide/LSM/Brute.rst | 278 ++++++++++++++++++++++++
> Documentation/admin-guide/LSM/index.rst | 1 +
> security/brute/Kconfig | 3 +-
> 3 files changed, 281 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/admin-guide/LSM/Brute.rst
Thanks for including documentation with the patch!
As you get closer to merging this, though, you'll want to take a minute
(OK, a few minutes) to build the docs and look at the result; there are
a number of places where you're not going to get what you expect. Just
as an example:
[...]
> +Based on the above scenario it would be nice to have this detected and
> +mitigated, and this is the goal of this implementation. Specifically the
> +following attacks are expected to be detected:
> +
> +1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
> + desirable memory layout is got (e.g. Stack Clash).
> +2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a
> + desirable memory layout is got (e.g. what CTFs do for simple network
> + service).
> +3.- Launching processes without exec() (e.g. Android Zygote) and exposing state
> + to attack a sibling.
> +4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the
> + previously shared memory layout of all the other children is exposed (e.g.
> + kind of related to HeartBleed).
Sphinx will try to recognize your enumerated list, but that may be a bit
more punctuation than it is prepared to deal with; I'd take the hyphens
out, if nothing else.
[...]
> +These statistics are held by the brute_stats struct.
> +
> +struct brute_cred {
> + kuid_t uid;
> + kgid_t gid;
> + kuid_t suid;
> + kgid_t sgid;
> + kuid_t euid;
> + kgid_t egid;
> + kuid_t fsuid;
> + kgid_t fsgid;
> +};
That will certainly not render the way you want. What you need here is
a literal block:
These statistics are held by the brute_stats struct::
struct brute_cred {
kuid_t uid;
kgid_t gid;
kuid_t suid;
kgid_t sgid;
kuid_t euid;
kgid_t egid;
kuid_t fsuid;
kgid_t fsgid;
};
The "::" causes all of the indented text following to be formatted
literally.
Thanks,
jon
Hi,
On Sun, Mar 21, 2021 at 11:45:59AM -0700, Kees Cook wrote:
> On Sun, Mar 21, 2021 at 04:01:18PM +0100, John Wood wrote:
> > On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> > > On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > Sorry, but I am trying to understand how to use locking properly, without luck.
> >
> > I have read (and tried to understand):
> > tools/memory-model/Documentation/simple.txt
> > tools/memory-model/Documentation/ordering.txt
> > tools/memory-model/Documentation/recipes.txt
> > Documentation/memory-barriers.txt
> >
> > And I can't find the answers that I need. I'm not saying they aren't
> > there, but I don't see them. So my questions:
> >
> > If it makes sense to use locking in the above function, and it is called
> > from the brute_task_fatal_signal hook, then all the functions that are
> > called from this hook need locking (more than one process can access stats
> > at the same time).
> >
> > So, as you point out, how is it possible and safe to read jiffies and
> > faults (and I think period too, even though you don't mention it) using
> > READ_ONCE() but without holding brute_stats::lock? I'm very confused.
>
> There are, I think, 3 considerations:
>
> - is "stats", itself, a valid allocation in kernel memory? This is the
> "lifetime" management of the structure: it will only stay allocated as
> long as there is a task still alive that is attached to it. The use of
> refcount_t on task creation/death should entirely solve this issue, so
> that all the other places where you access "stats", the memory will be
> valid. AFAICT, this one is fine: you're doing all the correct lifetime
> management.
>
> - changing a task's stats pointer: this is related to lifetime
> management, but it, I think, entirely solved by the existing
> refcounting. (And isn't helped by holding stats->lock since this is
> about stats itself being a valid pointer.) Again, I think this is all
> correct already in your existing code (due to the implicit locking of
> "current"). Perhaps I've missed something here, but I guess we'll see!
My only concern now is the following case:
One process crashes with a fatal signal. Then, its stats are updated. Then
we get the exec stats (the stats of the task that calls exec). At the same
time another CPU frees these same stats. Now, if the first process writes
to the exec stats we get a "use after free" bug.
If this scenario is possible, we would need to protect the whole section
inside the task_fatal_signal hook that deals with the exec stats. I think
a global lock is necessary here, and the write of the pointer to the stats
struct in the task_free hook also needs to be protected.
Moreover, I can see another scenario:
The first CPU gets the exec stats when a task fails with a fatal signal.
The second CPU does an execve() after an execve() on the same task from
which the first CPU got the exec stats. This second CPU resets the stats at
the same time that the first CPU updates the same stats. I think we also
need a lock here.
Am I right? Are these paths possible?
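The first scenario, drawn as a timeline (illustrative; the
refcount_inc_not_zero() suggested earlier would close this window by
keeping the stats pinned while they are in use):

	CPU 0 (task_fatal_signal hook)      CPU 1 (task_free hook)
	------------------------------      ----------------------
	exec_stats = parent's stats
	                                    refcount_dec_and_test() -> true
	                                    kfree(*stats); *stats = NULL
	brute_update_crash_period(exec_stats, ...)   <-- use after free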
>
> - are the values in stats getting written by multiple writers, or read
> during a write, etc?
>
> This last one is the core of what I think could be improved here:
>
> To keep the writes serialized, you (correctly) perform locking in the
> writers. This is fine.
>
> There is also locking in the readers, which I think is not needed.
> AFAICT, READ_ONCE() (with WRITE_ONCE() in the writers) is sufficient for
> the readers here.
>
> > IIUC (from reading the documentation) READ_ONCE and WRITE_ONCE only
> > guarantee that a variable stored with WRITE_ONCE can be read safely with
> > READ_ONCE, avoiding tearing, etc. So, I see these functions as a way to
> > guarantee atomicity of accesses to individual variables.
>
> Right -- from what I can see about how you're reading the statistics, I
> don't see a way to have the values get confused (assuming locked writes
> and READ/WRITE_ONCE()).
>
> > Another question. Is it also safe to use WRITE_ONCE without holding the lock?
> > Or is this only applicable to read operations?
>
> No -- you'll still want the writer locked since you update multiple fields
> in stats during a write, so you could miss increments, or interleave
> count vs jiffies writes, etc. But the WRITE_ONCE() makes sure that the
> READ_ONCE() readers will see a stable value (as I understand it), and
> in the order they were written.
>
> > Any light on this will help me do the best job in the next patches. If
> > somebody can point me in the right direction it would be greatly appreciated.
> >
> > Is there any documentation for newbies regarding this topic? I'm stuck.
> > I have also read the documentation about spinlocks, semaphores, mutexes,
> > etc., but nothing clarifies the concepts in question for me.
> >
> > Apologies if this question has been answered in the past, but searching
> > the mailing list has not turned up anything.
>
> It's a complex subject! Here are some other docs that might help:
>
> tools/memory-model/Documentation/explanation.txt
> Documentation/core-api/refcount-vs-atomic.rst
>
> or they may melt your brain further! :) I know mine is always mushy
> after reading them.
>
> > Thanks for your time and patience.
>
> You're welcome; and thank you for your work on this! I've wanted a robust
> brute force mitigation in the kernel for a long time. :)
>
Thank you very much for this great explanation and mentorship. Now this
subject is much clearer to me. It's a pleasure for me to work on this.
Again, thanks for your help.
John Wood
On Sun, Mar 21, 2021 at 12:50:47PM -0600, Jonathan Corbet wrote:
> John Wood <[email protected]> writes:
>
> > Add some info detailing what the Brute LSM is, its motivation, weak
> > points of existing implementations, proposed solutions, enabling,
> > disabling and self-tests.
> >
> > Signed-off-by: John Wood <[email protected]>
> > ---
> > Documentation/admin-guide/LSM/Brute.rst | 278 ++++++++++++++++++++++++
> > Documentation/admin-guide/LSM/index.rst | 1 +
> > security/brute/Kconfig | 3 +-
> > 3 files changed, 281 insertions(+), 1 deletion(-)
> > create mode 100644 Documentation/admin-guide/LSM/Brute.rst
>
> Thanks for including documentation with the patch!
>
> As you get closer to merging this, though, you'll want to take a minute
> (OK, a few minutes) to build the docs and look at the result; there are
Thanks, I will do it.
> a number of places where you're not going to get what you expect. Just
> as an example:
>
> [...]
>
> > +Based on the above scenario it would be nice to have this detected and
> > +mitigated, and this is the goal of this implementation. Specifically the
> > +following attacks are expected to be detected:
> > +
> > +1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
> > + desirable memory layout is got (e.g. Stack Clash).
> > +2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly until a
> > + desirable memory layout is got (e.g. what CTFs do for simple network
> > + service).
> > +3.- Launching processes without exec() (e.g. Android Zygote) and exposing state
> > + to attack a sibling.
> > +4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until the
> > + previously shared memory layout of all the other children is exposed (e.g.
> > + kind of related to HeartBleed).
>
> Sphinx will try to recognize your enumerated list, but that may be a bit
> more punctuation than it is prepared to deal with; I'd take the hyphens
> out, if nothing else.
Thanks. I will fix this for the next version.
> > +These statistics are held by the brute_stats struct.
> > +
> > +struct brute_cred {
> > + kuid_t uid;
> > + kgid_t gid;
> > + kuid_t suid;
> > + kgid_t sgid;
> > + kuid_t euid;
> > + kgid_t egid;
> > + kuid_t fsuid;
> > + kgid_t fsgid;
> > +};
>
> That will certainly not render the way you want. What you need here is
> a literal block:
>
> These statistics are held by the brute_stats struct::
>
> struct brute_cred {
> kuid_t uid;
> kgid_t gid;
> kuid_t suid;
> kgid_t sgid;
> kuid_t euid;
> kgid_t egid;
> kuid_t fsuid;
> kgid_t fsgid;
> };
>
> The "::" causes all of the indented text following to be formatted
> literally.
Thanks a lot for your comments and guidance. I will build the docs and
check if the output is as I want.
> Thanks,
>
> jon
Regards,
John Wood