2020-10-25 15:13:29

by John Wood

[permalink] [raw]
Subject: [PATCH v2 0/8] Fork brute force attack mitigation

Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionaly use some level of brute force with the
help of the fork system call. This is possible since when creating a new
process using fork its memory contents are the same as those of the parent
process (the process that called the fork system call). So, the attacker
can test the memory infinite times to find the correct memory values or the
correct memory addresses without worrying about crashing the application.

Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie.

Other implementations
---------------------

The public version of grsecurity, as a summary, is based on the idea of
delay the fork system call if a child died due to a fatal error. This has
some issues:

1.- Bad practices: Add delays to the kernel is, in general, a bad idea.

2.- Weak points: This protection can be bypassed using two different
methods since it acts only when the fork is called after a child has
crashed.

2.1.- Bypass 1: So, it would still be possible for an attacker to fork
a big amount of children (in the order of thousands), then probe
all of them, and finally wait the protection time before repeat
the steps.

2.2.- Bypass 2: This method is based on the idea that the protection
doesn't act if the parent crashes. So, it would still be possible
for an attacker to fork a process and probe itself. Then, fork
the child process and probe itself again. This way, these steps
can be repeated infinite times without any mitigation.

This implementation
-------------------

The main idea behind this implementation is to improve the existing ones
focusing on the weak points annotated before. The solution for the first
bypass method is to detect a fast crash rate instead of only one simple
crash. For the second bypass method the solution is to detect both the
crash of parent and child processes. Moreover, as a mitigation method it is
better to kill all the offending tasks involve in the attack instead of use
delays.

So, the solution to the two bypass methods previously commented is to use
some statistical data shared across all the processes that can have the
same memory contents. Or in other words, a statistical data shared between
all the fork hierarchy processes after an execve system call.

The purpose of these statistics is to compute the application crash period
in order to detect an attack. This crash period is the time between the
execve system call and the first fault or the time between two consecutives
faults, but this has a drawback. If an application crashes once quickly
from the execve system call or crashes twice in a short period of time for
some reason, a false positive attack will be triggered. To avoid this
scenario the shared statistical data holds a list of the i last crashes
timestamps and the application crash period is computed as follows:

crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;

This ways, the size of the last crashes timestamps list allows to fine
tuning the detection sensibility.

When this crash period falls under a certain threshold there is a clear
signal that something malicious is happening. Once detected, the mitigation
only kills the processes that share the same statistical data and so, all
the tasks that can have the same memory contents. This way, an attack is
rejected.

1.- Per system enabling: This feature can be enabled at build time using
the CONFIG_SECURITY_FORK_BRUTE option or using the visual config
application under the following menu:

Security options ---> Fork brute force attack detection and mitigation

2.- Per process enabling/disabling: To allow that specific applications can
turn off or turn on the detection and mitigation of a fork brute force
attack when required, there are two new prctls.

prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0)
prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0)

3.- Fine tuning: To customize the detection's sensibility there are two new
sysctl attributes that allow to set the last crashes timestamps list
size and the application crash period threshold (in milliseconds). Both
are accessible through the following files respectively.

/proc/sys/kernel/brute/timestamps_list_size
/proc/sys/kernel/brute/crash_period_threshold

The list size allows to avoid false positives due to crashes unrelated
with a real attack. The period threshold sets the time limit to detect
an attack. And, since a fork brute force attack will be detected if the
application crash period falls under this threshold, the higher this
value, the more sensitive the detection will be.

So, knowing all this information I will explain now the different patches:

The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.

The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.

The 3/8 patch adds the sysctl attributes to fine tuning the detection.

Patchs 4/8 and 5/8 detect and mitigate a fork brute force attack.

Patch 6/8 adds the prctls to allow per process enabling/disabling.

Patch 7/8 adds the documentation to explain this implementation.

Patch 8/8 updates the maintainers file.

This patch series is a task of the KSPP [1] and can also be accessed from
my github tree [2] in the "brute_v2" branch.

[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/

The first version can be found in:

https://lore.kernel.org/kernel-hardening/[email protected]/

Changelog RFC -> v2
-------------------
- Rename this feature with a more appropiate name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).

John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
security/brute: Add sysctl attributes to allow detection fine tuning
security/brute: Detect a fork brute force attack
security/brute: Mitigate a fork brute force attack
security/brute: Add prctls to enable/disable the fork attack detection
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM

Documentation/admin-guide/LSM/Brute.rst | 118 ++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/brute/brute.h | 16 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
include/uapi/linux/prctl.h | 4 +
kernel/signal.c | 1 +
kernel/sys.c | 8 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 749 ++++++++++++++++++++++++
security/security.c | 5 +
16 files changed, 943 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 include/brute/brute.h
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c

--
2.25.1


2020-10-25 17:50:28

by John Wood

[permalink] [raw]
Subject: [PATCH v2 1/8] security: Add LSM hook at the point where a task gets a fatal signal

Add a security hook that allows a LSM to be notified when a task gets a
fatal signal. This patch is a previous step on the way to compute the
task crash period by the "brute" LSM (linux security module to detect
and mitigate fork brute force attack against vulnerable userspace
processes).

Signed-off-by: John Wood <[email protected]>
---
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 ++++
include/linux/security.h | 4 ++++
kernel/signal.c | 1 +
security/security.c | 5 +++++
5 files changed, 15 insertions(+)

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2a8c74d99015..8ecbb6849555 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -213,6 +213,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5)
LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
struct inode *inode)
+LSM_HOOK(void, LSM_RET_VOID, task_fatal_signal, const kernel_siginfo_t *siginfo)
LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag)
LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp,
u32 *secid)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9e2e3e63719d..0a8b0fab0212 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -761,6 +761,10 @@
* security attributes, e.g. for /proc/pid inodes.
* @p contains the task_struct for the task.
* @inode contains the inode structure for the inode.
+ * @task_fatal_signal:
+ * This hook allows security modules to be notified when a task gets a
+ * fatal signal.
+ * @siginfo contains the signal information.
*
* Security hooks for Netlink messaging.
*
diff --git a/include/linux/security.h b/include/linux/security.h
index 0a0a03b36a3b..4bc000bb8685 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -413,6 +413,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
void security_task_to_inode(struct task_struct *p, struct inode *inode);
+void security_task_fatal_signal(const kernel_siginfo_t *siginfo);
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag);
void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid);
int security_msg_msg_alloc(struct msg_msg *msg);
@@ -1127,6 +1128,9 @@ static inline int security_task_prctl(int option, unsigned long arg2,
static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
{ }

+static inline void security_task_fatal_signal(const kernel_siginfo_t *siginfo)
+{ }
+
static inline int security_ipc_permission(struct kern_ipc_perm *ipcp,
short flag)
{
diff --git a/kernel/signal.c b/kernel/signal.c
index a38b3edc6851..a0866d6b2c06 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2734,6 +2734,7 @@ bool get_signal(struct ksignal *ksig)
/*
* Anything else is fatal, maybe with a core dump.
*/
+ security_task_fatal_signal(&ksig->info);
current->flags |= PF_SIGNALED;

if (sig_kernel_coredump(signr)) {
diff --git a/security/security.c b/security/security.c
index 70a7ad357bc6..e8c7978b515c 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1810,6 +1810,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode)
call_void_hook(task_to_inode, p, inode);
}

+void security_task_fatal_signal(const kernel_siginfo_t *siginfo)
+{
+ call_void_hook(task_fatal_signal, siginfo);
+}
+
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag)
{
return call_int_hook(ipc_permission, 0, ipcp, flag);
--
2.25.1

2020-10-25 17:50:28

by John Wood

[permalink] [raw]
Subject: [PATCH v2 2/8] security/brute: Define a LSM and manage statistical data

Add a new Kconfig file to define a menu entry under "Security options"
to enable the "Fork brute force attack detection and mitigation"
feature.

For a correct management of a fork brute force attack it is necessary
that all the tasks hold statistical data. The same statistical data
needs to be shared between all the tasks that hold the same memory
contents or in other words, between all the tasks that have been forked
without any execve call. So, define a statistical data structure to hold
all the necessary information shared by all the fork hierarchy
processes. This info is basically a list of the last crashes timestamps
and a reference counter.

When a forked task calls the execve system call, the memory contents are
set with new values. So, in this scenario the parent's statistical data
no need to be shared. Instead, a new statistical data structure must be
allocated to start a new hierarchy.

The statistical data that is shared between all the fork hierarchy
processes needs to be freed when this hierarchy disappears.

So, based in all the previous information define a LSM with three hooks
to manage all the commented cases. These hooks are "task_alloc" to do
the fork management, "bprm_committing_creds" for do the execve
management and "task_free" to release the resources.

Also, add to the task_struct's security blob the pointer to the
statistical data. This way, all the tasks will have access to this
information.

Signed-off-by: John Wood <[email protected]>
---
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 12 ++
security/brute/Makefile | 2 +
security/brute/brute.c | 339 ++++++++++++++++++++++++++++++++++++++++
5 files changed, 363 insertions(+), 5 deletions(-)
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c

diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..204bb311b1f1 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -240,6 +240,7 @@ source "security/safesetid/Kconfig"
source "security/lockdown/Kconfig"

source "security/integrity/Kconfig"
+source "security/brute/Kconfig"

choice
prompt "First legacy 'major LSM' to be initialized"
@@ -277,11 +278,11 @@ endchoice

config LSM
string "Ordered list of enabled LSMs"
- default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
- default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
- default "lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
- default "lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
- default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC
+ default "brute,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf"
help
A comma-separated list of LSMs, in initialization order.
Any LSMs left off this list will be ignored. This can be
diff --git a/security/Makefile b/security/Makefile
index 3baf435de541..1236864876da 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -36,3 +36,7 @@ obj-$(CONFIG_BPF_LSM) += bpf/
# Object integrity file lists
subdir-$(CONFIG_INTEGRITY) += integrity
obj-$(CONFIG_INTEGRITY) += integrity/
+
+# Object brute file lists
+subdir-$(CONFIG_SECURITY_FORK_BRUTE) += brute
+obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute/
diff --git a/security/brute/Kconfig b/security/brute/Kconfig
new file mode 100644
index 000000000000..1bd2df1e2dec
--- /dev/null
+++ b/security/brute/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+config SECURITY_FORK_BRUTE
+ bool "Fork brute force attack detection and mitigation"
+ depends on SECURITY
+ help
+ This is an LSM that stops any fork brute force attack against
+ vulnerable userspace processes. The detection method is based on
+ the application crash period and as a mitigation procedure all the
+ offending tasks are killed. Like capabilities, this security module
+ stacks with other LSMs.
+
+ If you are unsure how to answer this question, answer N.
diff --git a/security/brute/Makefile b/security/brute/Makefile
new file mode 100644
index 000000000000..d3f233a132a9
--- /dev/null
+++ b/security/brute/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_SECURITY_FORK_BRUTE) += brute.o
diff --git a/security/brute/brute.c b/security/brute/brute.c
new file mode 100644
index 000000000000..307d07bf9d98
--- /dev/null
+++ b/security/brute/brute.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <asm/current.h>
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/lsm_hooks.h>
+#include <linux/printk.h>
+#include <linux/refcount.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/**
+ * struct brute_stats - Fork brute force attack statistics.
+ * @lock: Lock to protect the brute_stats structure.
+ * @refc: Reference counter.
+ * @timestamps: Last crashes timestamps list.
+ * @timestamps_size: Last crashes timestamps list size.
+ *
+ * This structure holds the statistical data shared by all the fork hierarchy
+ * processes.
+ */
+struct brute_stats {
+ spinlock_t lock;
+ refcount_t refc;
+ struct list_head timestamps;
+ unsigned char timestamps_size;
+};
+
+/**
+ * struct brute_timestamp - Last crashes timestamps list entry.
+ * @jiffies: Crash timestamp.
+ * @node: Entry list head.
+ *
+ * This structure holds a crash timestamp.
+ */
+struct brute_timestamp {
+ u64 jiffies;
+ struct list_head node;
+};
+
+/**
+ * brute_blob_sizes - LSM blob sizes.
+ *
+ * To share statistical data among all the fork hierarchy processes, define a
+ * pointer to the brute_stats structure as a part of the task_struct's security
+ * blob.
+ */
+static struct lsm_blob_sizes brute_blob_sizes __lsm_ro_after_init = {
+ .lbs_task = sizeof(struct brute_stats *),
+};
+
+/**
+ * brute_stats_ptr() - Get the pointer to the brute_stats structure.
+ * @task: Task that holds the statistical data.
+ *
+ * Return: A pointer to a pointer to the brute_stats structure.
+ */
+static inline struct brute_stats **brute_stats_ptr(struct task_struct *task)
+{
+ return task->security + brute_blob_sizes.lbs_task;
+}
+
+/**
+ * brute_new_timestamp() - Allocate a new timestamp structure.
+ *
+ * If the allocation is successful the timestamp is set to now.
+ *
+ * Return: NULL if the allocation fails. A pointer to the new allocated
+ * timestamp structure if it success.
+ */
+static struct brute_timestamp *brute_new_timestamp(void)
+{
+ struct brute_timestamp *timestamp;
+
+ timestamp = kmalloc(sizeof(struct brute_timestamp), GFP_KERNEL);
+ if (timestamp)
+ timestamp->jiffies = get_jiffies_64();
+
+ return timestamp;
+}
+
+/**
+ * brute_new_stats() - Allocate a new statistics structure.
+ *
+ * If the allocation is successful the reference counter is set to one to
+ * indicate that there will be one task that points to this structure. The last
+ * crashes timestamps list is initialized with one entry set to now. This way,
+ * its possible to compute the application crash period at the first fault.
+ *
+ * Return: NULL if the allocation fails. A pointer to the new allocated
+ * statistics structure if it success.
+ */
+static struct brute_stats *brute_new_stats(void)
+{
+ struct brute_stats *stats;
+ struct brute_timestamp *timestamp;
+
+ stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
+ if (!stats)
+ return NULL;
+
+ timestamp = brute_new_timestamp();
+ if (!timestamp) {
+ kfree(stats);
+ return NULL;
+ }
+
+ spin_lock_init(&stats->lock);
+ refcount_set(&stats->refc, 1);
+ INIT_LIST_HEAD(&stats->timestamps);
+ list_add_tail(&timestamp->node, &stats->timestamps);
+ stats->timestamps_size = 1;
+
+ return stats;
+}
+
+/**
+ * brute_share_stats() - Share the statistical data between processes.
+ * @src: Source of statistics to be shared.
+ * @dst: Destination of statistics to be shared.
+ *
+ * Copy the src's pointer to the statistical data structure to the dst's pointer
+ * to the same structure. Since there is a new process that shares the same
+ * data, increase the reference counter. The src's pointer cannot be NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the lock since the
+ * task_free hook can be called from an IRQ context during the execution of the
+ * task_alloc hook.
+ */
+static void brute_share_stats(struct brute_stats **src,
+ struct brute_stats **dst)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&(*src)->lock, flags);
+ refcount_inc(&(*src)->refc);
+ *dst = *src;
+ spin_unlock_irqrestore(&(*src)->lock, flags);
+}
+
+/**
+ * brute_task_alloc() - Target for the task_alloc hook.
+ * @task: Task being allocated.
+ * @clone_flags: Contains the flags indicating what should be shared.
+ *
+ * For a correct management of a fork brute force attack it is necessary that
+ * all the tasks hold statistical data. The same statistical data needs to be
+ * shared between all the tasks that hold the same memory contents or in other
+ * words, between all the tasks that have been forked without any execve call.
+ *
+ * To ensure this, if the current task doesn't have statistical data when forks,
+ * it is mandatory to allocate a new statistics structure and share it between
+ * this task and the new one being allocated. Otherwise, share the statistics
+ * that the current task already has.
+ *
+ * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
+ * otherwise.
+ */
+static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
+{
+ struct brute_stats **stats, **p_stats;
+
+ stats = brute_stats_ptr(task);
+ p_stats = brute_stats_ptr(current);
+
+ if (likely(*p_stats)) {
+ brute_share_stats(p_stats, stats);
+ return 0;
+ }
+
+ *stats = brute_new_stats();
+ if (!*stats)
+ return -ENOMEM;
+
+ brute_share_stats(stats, p_stats);
+ return 0;
+}
+
+/**
+ * brute_reset_stats() - Reset the statistical data.
+ * @stats: Statistics to be reset.
+ *
+ * Ensure that the last crashes timestamps list holds only one entry and set
+ * this timestamp to now. This way, its possible to compute the application
+ * crash period at the next fault. The statistics to be reset cannot be NULL.
+ *
+ * Context: Must be called with stats->lock held.
+ */
+static void brute_reset_stats(struct brute_stats *stats)
+{
+ unsigned char entries_to_delete;
+ struct brute_timestamp *timestamp, *next;
+
+ if (WARN(!stats->timestamps_size, "No last timestamps\n"))
+ return;
+
+ entries_to_delete = stats->timestamps_size - 1;
+ stats->timestamps_size = 1;
+
+ list_for_each_entry_safe(timestamp, next, &stats->timestamps, node) {
+ if (unlikely(!entries_to_delete)) {
+ timestamp->jiffies = get_jiffies_64();
+ break;
+ }
+
+ list_del(&timestamp->node);
+ kfree(timestamp);
+ entries_to_delete -= 1;
+ }
+}
+
+/**
+ * brute_task_execve() - Target for the bprm_committing_creds hook.
+ * @bprm: Points to the linux_binprm structure.
+ *
+ * When a forked task calls the execve system call, the memory contents are set
+ * with new values. So, in this scenario the parent's statistical data no need
+ * to be shared. Instead, a new statistical data structure must be allocated to
+ * start a new hierarchy. This condition is detected when the statistics
+ * reference counter holds a value greater than or equal to two (a fork always
+ * sets the statistics reference counter to a minimum of two since the parent
+ * and the child task are sharing the same data).
+ *
+ * However, if the execve function is called immediately after another execve
+ * call, althought the memory contents are reset, there is no need to allocate
+ * a new statistical data structure. This is possible because at this moment
+ * only one task (the task that calls the execve function) points to the data.
+ * In this case, the previous allocation is used but the statistics are reset.
+ *
+ * It's mandatory to disable interrupts before acquiring the lock since the
+ * task_free hook can be called from an IRQ context during the execution of the
+ * bprm_committing_creds hook.
+ */
+static void brute_task_execve(struct linux_binprm *bprm)
+{
+ struct brute_stats **stats;
+ unsigned long flags;
+
+ stats = brute_stats_ptr(current);
+ if (WARN(!*stats, "No statistical data\n"))
+ return;
+
+ spin_lock_irqsave(&(*stats)->lock, flags);
+
+ if (!refcount_dec_not_one(&(*stats)->refc)) {
+ /* execve call after an execve call */
+ brute_reset_stats(*stats);
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+ return;
+ }
+
+ /* execve call after a fork call */
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+ *stats = brute_new_stats();
+ WARN(!*stats, "Cannot allocate statistical data\n");
+}
+
+/**
+ * brute_stats_free() - Deallocate a statistics structure.
+ * @stats: Statistics to be freed.
+ *
+ * Deallocate all the last crashes timestamps list entries and then the
+ * statistics structure. The statistics to be freed cannot be NULL.
+ *
+ * Context: Must be called with stats->lock held and this function releases it.
+ */
+static void brute_stats_free(struct brute_stats *stats)
+{
+ struct brute_timestamp *timestamp, *next;
+
+ list_for_each_entry_safe(timestamp, next, &stats->timestamps, node) {
+ list_del(&timestamp->node);
+ kfree(timestamp);
+ }
+
+ spin_unlock(&stats->lock);
+ kfree(stats);
+}
+
+/**
+ * brute_task_free() - Target for the task_free hook.
+ * @task: Task about to be freed.
+ *
+ * The statistical data that is shared between all the fork hierarchy processes
+ * needs to be freed when this hierarchy disappears.
+ */
+static void brute_task_free(struct task_struct *task)
+{
+ struct brute_stats **stats;
+
+ stats = brute_stats_ptr(task);
+ if (WARN(!*stats, "No statistical data\n"))
+ return;
+
+ spin_lock(&(*stats)->lock);
+
+ if (refcount_dec_and_test(&(*stats)->refc))
+ brute_stats_free(*stats);
+ else
+ spin_unlock(&(*stats)->lock);
+}
+
+/**
+ * brute_hooks - Targets for the LSM's hooks.
+ */
+static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
+ LSM_HOOK_INIT(task_alloc, brute_task_alloc),
+ LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
+ LSM_HOOK_INIT(task_free, brute_task_free),
+};
+
+/**
+ * brute_init() - Initialize the brute LSM.
+ *
+ * Return: Always returns zero.
+ */
+static int __init brute_init(void)
+{
+ pr_info("Brute initialized\n");
+ security_add_hooks(brute_hooks, ARRAY_SIZE(brute_hooks),
+ KBUILD_MODNAME);
+ return 0;
+}
+
+DEFINE_LSM(brute) = {
+ .name = KBUILD_MODNAME,
+ .init = brute_init,
+ .blobs = &brute_blob_sizes,
+};
+
--
2.25.1

2020-10-25 17:51:36

by John Wood

[permalink] [raw]
Subject: [PATCH v2 3/8] security/brute: Add sysctl attributes to allow detection fine tuning

This is a previous step to add the detection feature.

A fork brute force attack will be detected when an application crashes
quickly. Since, the application crash period is the time between the
execve system call and the first fault or the time between two
consecutives faults add a new sysctl attribute to control the crash
period threshold.

But a detection method based only on this computation has a drawback. If
an application crashes once quickly from the execve system call or
crashes twice in a short period of time for some reason, a false
positive attack will be triggered. To avoid this scenario use a list of
the i last crashes timestamps and compute the application crash period
as follows:

crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;

So, also add a new sysctl attribute to control the size of this list.

This way, each system can tune the detection's sensibility adjusting the
application crash period threshold and the size of the last crashes
timestamps list.

Signed-off-by: John Wood <[email protected]>
---
security/brute/brute.c | 83 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)

diff --git a/security/brute/brute.c b/security/brute/brute.c
index 307d07bf9d98..29835fe2f141 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -4,12 +4,14 @@

#include <asm/current.h>
#include <linux/bug.h>
+#include <linux/cache.h>
#include <linux/compiler.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
+#include <linux/limits.h>
#include <linux/list.h>
#include <linux/lsm_hooks.h>
#include <linux/printk.h>
@@ -17,6 +19,34 @@
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
+#include <linux/sysctl.h>
+
+/**
+ * brute_timestamps_list_size - Last crashes timestamps list size.
+ *
+ * The application crash period is the time between the execve system call and
+ * the first fault or the time between two consecutives faults, but this has a
+ * drawback. If an application crashes once quickly from the execve system call
+ * or crashes twice in a short period of time for some reason, a false positive
+ * attack will be triggered. To avoid this scenario use a list of the i last
+ * crashes timestamps and compute the application crash period as follows:
+ *
+ * crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;
+ *
+ * The brute_timestamps_list_size variable sets the size of this list.
+ */
+static unsigned int brute_timestamps_list_size __read_mostly = 5;
+
+/**
+ * brute_crash_period_threshold - Application crash period threshold.
+ *
+ * The units are expressed in milliseconds.
+ *
+ * A fork brute force attack will be detected if the application crash period
+ * falls under this threshold. So, the higher this value, the more sensitive the
+ * detection will be.
+ */
+static unsigned int brute_crash_period_threshold __read_mostly = 30000;

/**
* struct brute_stats - Fork brute force attack statistics.
@@ -318,6 +348,58 @@ static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(task_free, brute_task_free),
};

+#ifdef CONFIG_SYSCTL
+static unsigned int uint_one = 1;
+static unsigned int uint_max = UINT_MAX;
+static unsigned int max_brute_timestamps_list_size = 10;
+
+/**
+ * brute_sysctl_path - Sysctl attributes path.
+ */
+static struct ctl_path brute_sysctl_path[] = {
+ { .procname = "kernel", },
+ { .procname = "brute", },
+ { }
+};
+
+/**
+ * brute_sysctl_table - Sysctl attributes.
+ */
+static struct ctl_table brute_sysctl_table[] = {
+ {
+ .procname = "timestamps_list_size",
+ .data = &brute_timestamps_list_size,
+ .maxlen = sizeof(brute_timestamps_list_size),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = &uint_one,
+ .extra2 = &max_brute_timestamps_list_size,
+ },
+ {
+ .procname = "crash_period_threshold",
+ .data = &brute_crash_period_threshold,
+ .maxlen = sizeof(brute_crash_period_threshold),
+ .mode = 0644,
+ .proc_handler = proc_douintvec_minmax,
+ .extra1 = &uint_one,
+ .extra2 = &uint_max,
+ },
+ { }
+};
+
+/**
+ * brute_init_sysctl() - Initialize the sysctl interface.
+ */
+static void __init brute_init_sysctl(void)
+{
+ if (!register_sysctl_paths(brute_sysctl_path, brute_sysctl_table))
+ panic("Cannot register the sysctl interface\n");
+}
+
+#else
+static inline void brute_init_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
+
/**
* brute_init() - Initialize the brute LSM.
*
@@ -328,6 +410,7 @@ static int __init brute_init(void)
pr_info("Brute initialized\n");
security_add_hooks(brute_hooks, ARRAY_SIZE(brute_hooks),
KBUILD_MODNAME);
+ brute_init_sysctl();
return 0;
}

--
2.25.1

2020-10-25 17:52:06

by John Wood

[permalink] [raw]
Subject: [PATCH v2 4/8] security/brute: Detect a fork brute force attack

To detect a fork brute force attack is necessary that the list that
holds the last crashes timestamps be updated in every fatal crash. To do
so, use the "task_fatal_signal" LSM hook added in a previous step.

Then, an only when this list is large enough, the application crash
period can be computed as the difference between the newest crash
timestamp and the oldest one divided by the size of the list. This way,
the scenario where an application crashes few times in a short period of
time due to reasons unrelated to a real attack is avoided.

Finally, an application crash period that falls under the defined
threshold shows that the application is crashing quickly and there is a
clear signal that an attack is happening.

Signed-off-by: John Wood <[email protected]>
---
security/brute/brute.c | 130 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 130 insertions(+)

diff --git a/security/brute/brute.c b/security/brute/brute.c
index 29835fe2f141..223a18c2084a 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -339,6 +339,135 @@ static void brute_task_free(struct task_struct *task)
spin_unlock(&(*stats)->lock);
}

+/**
+ * brute_add_timestamp() - Add a new entry to the last crashes timestamps list.
+ * @stats: Statistics that hold the last crashes timestamps list.
+ * @new_entry: New timestamp to add to the list.
+ *
+ * The statistics that hold the last crashes timestamps list cannot be NULL. The
+ * new timestamp to add to the list cannot be NULL.
+ *
+ * Context: Must be called with stats->lock held.
+ */
+static void brute_add_timestamp(struct brute_stats *stats,
+ struct brute_timestamp *new_entry)
+{
+ list_add_tail(&new_entry->node, &stats->timestamps);
+ stats->timestamps_size += 1;
+}
+
+/**
+ * brute_old_timestamp_entry() - Get the oldest timestamp entry.
+ * @head: Last crashes timestamps list.
+ *
+ * Context: Must be called with stats->lock held.
+ * Return: The oldest entry added to the last crashes timestamps list.
+ */
+#define brute_old_timestamp_entry(head) \
+ list_first_entry(head, struct brute_timestamp, node)
+
+/**
+ * brute_update_timestamps_list() - Update the last crashes timestamps list.
+ * @stats: Statistics that hold the last crashes timestamps list.
+ * @new_entry: New timestamp to update the list.
+ *
+ * Add a new timestamp structure to the list if this one has not reached the
+ * maximum size yet. Replace the oldest timestamp entry otherwise.
+ *
+ * The statistics that hold the last crashes timestamps list cannot be NULL. The
+ * new timestamp to update the list cannot be NULL.
+ *
+ * Context: Must be called with stats->lock held.
+ * Return: The oldest timestamp that has been replaced. NULL otherwise.
+ */
+static struct brute_timestamp *
+brute_update_timestamps_list(struct brute_stats *stats,
+ struct brute_timestamp *new_entry)
+{
+ unsigned int list_size;
+ struct brute_timestamp *old_entry;
+
+ list_size = (unsigned int)stats->timestamps_size;
+ if (list_size < brute_timestamps_list_size) {
+ brute_add_timestamp(stats, new_entry);
+ return NULL;
+ }
+
+ old_entry = brute_old_timestamp_entry(&stats->timestamps);
+ list_replace(&old_entry->node, &new_entry->node);
+ list_rotate_left(&stats->timestamps);
+
+ return old_entry;
+}
+
+/**
+ * brute_get_crash_period() - Get the application crash period.
+ * @new_entry: New timestamp added to the last crashes timestamps list.
+ * @old_entry: Old timestamp replaced in the last crashes timestamps list.
+ *
+ * The application crash period is computed as the difference between the newest
+ * crash timestamp and the oldest one divided by the size of the list. This way,
+ * the scenario where an application crashes few times in a short period of time
+ * due to reasons unrelated to a real attack is avoided.
+ *
+ * The new and old timestamp cannot be NULL.
+ *
+ * Context: Must be called with stats->lock held.
+ * Return: The application crash period in milliseconds.
+ */
+static u64 brute_get_crash_period(struct brute_timestamp *new_entry,
+ struct brute_timestamp *old_entry)
+{
+ u64 jiffies;
+
+ jiffies = new_entry->jiffies - old_entry->jiffies;
+ jiffies /= (u64)brute_timestamps_list_size;
+
+ return jiffies64_to_msecs(jiffies);
+}
+
+/**
+ * brute_task_fatal_signal() - Target for the task_fatal_signal hook.
+ * @siginfo: Contains the signal information.
+ *
+ * To detect a fork brute force attack is necessary that the list that holds the
+ * last crashes timestamps be updated in every fatal crash. Then, an only when
+ * this list is large enough, the application crash period can be computed an
+ * compared with the defined threshold.
+ *
+ * It's mandatory to disable interrupts before acquiring the lock since the
+ * task_free hook can be called from an IRQ context during the execution of the
+ * task_fatal_signal hook.
+ */
+static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
+{
+ struct brute_stats **stats;
+ struct brute_timestamp *new_entry, *old_entry;
+ unsigned long flags;
+ u64 crash_period;
+
+ stats = brute_stats_ptr(current);
+ if (WARN(!*stats, "No statistical data\n"))
+ return;
+
+ new_entry = brute_new_timestamp();
+ if (WARN(!new_entry, "Cannot allocate last crash timestamp\n"))
+ return;
+
+ spin_lock_irqsave(&(*stats)->lock, flags);
+ old_entry = brute_update_timestamps_list(*stats, new_entry);
+
+ if (old_entry) {
+ crash_period = brute_get_crash_period(new_entry, old_entry);
+ kfree(old_entry);
+
+ if (crash_period < (u64)brute_crash_period_threshold)
+ pr_warn("Fork brute force attack detected\n");
+ }
+
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+}
+
/**
* brute_hooks - Targets for the LSM's hooks.
*/
@@ -346,6 +475,7 @@ static struct security_hook_list brute_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(task_alloc, brute_task_alloc),
LSM_HOOK_INIT(bprm_committing_creds, brute_task_execve),
LSM_HOOK_INIT(task_free, brute_task_free),
+ LSM_HOOK_INIT(task_fatal_signal, brute_task_fatal_signal),
};

#ifdef CONFIG_SYSCTL
--
2.25.1

2020-10-25 17:54:11

by John Wood

[permalink] [raw]
Subject: [PATCH v2 5/8] security/brute: Mitigate a fork brute force attack

In order to mitigate a fork brute force attack it is necessary to kill
all the offending tasks. This tasks are all the ones that share the
statistical data with the current task (the task that has crashed).

Since the attack detection is done in the task_fatal_signal LSM hook
only is needed to kill the other tasks that share the same statistical
data, not the ones that have the same group_leader that the current task
since the latter are in the path to be killed.

When the SIGKILL signal is sent to the offending tasks, the
brute_kill_offending_tasks function will be called in a recursive way
from the task_fatal_signal LSM hook due to a small crash period. So, to
avoid kill again the same tasks due to a recursive call of this
function, it is necessary to disable the attack detection for this fork
hierarchy.

To disable this attack detection, empty the last crashes timestamps list
and avoid to compute the application crash period if the size of this
list is zero.

Signed-off-by: John Wood <[email protected]>
---
security/brute/brute.c | 144 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 135 insertions(+), 9 deletions(-)

diff --git a/security/brute/brute.c b/security/brute/brute.c
index 223a18c2084a..a1bdf25ffcf9 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -3,6 +3,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <asm/current.h>
+#include <asm/rwonce.h>
#include <linux/bug.h>
#include <linux/cache.h>
#include <linux/compiler.h>
@@ -14,9 +15,14 @@
#include <linux/limits.h>
#include <linux/list.h>
#include <linux/lsm_hooks.h>
+#include <linux/pid.h>
#include <linux/printk.h>
#include <linux/refcount.h>
+#include <linux/rwlock.h>
#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/task.h>
+#include <linux/signal.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/sysctl.h>
@@ -295,23 +301,39 @@ static void brute_task_execve(struct linux_binprm *bprm)
}

/**
- * brute_stats_free() - Deallocate a statistics structure.
- * @stats: Statistics to be freed.
+ * brute_timestamps_free() - Empty a last crashes timestamp list.
+ * @timestamps: Last crashes timestamps list to be emptied.
*
- * Deallocate all the last crashes timestamps list entries and then the
- * statistics structure. The statistics to be freed cannot be NULL.
+ * Empty the last crashes timestamps list and deallocate all the entries. This
+ * list cannot be NULL.
*
- * Context: Must be called with stats->lock held and this function releases it.
+ * Context: Must be called with stats->lock held.
*/
-static void brute_stats_free(struct brute_stats *stats)
+static void brute_timestamps_free(struct list_head *timestamps)
{
struct brute_timestamp *timestamp, *next;

- list_for_each_entry_safe(timestamp, next, &stats->timestamps, node) {
+ if (list_empty(timestamps))
+ return;
+
+ list_for_each_entry_safe(timestamp, next, timestamps, node) {
list_del(&timestamp->node);
kfree(timestamp);
}
+}

+/**
+ * brute_stats_free() - Deallocate a statistics structure.
+ * @stats: Statistics to be freed.
+ *
+ * Deallocate all the last crashes timestamps list entries and then the
+ * statistics structure. The statistics to be freed cannot be NULL.
+ *
+ * Context: Must be called with stats->lock held and this function releases it.
+ */
+static void brute_stats_free(struct brute_stats *stats)
+{
+ brute_timestamps_free(&stats->timestamps);
spin_unlock(&stats->lock);
kfree(stats);
}
@@ -426,6 +448,104 @@ static u64 brute_get_crash_period(struct brute_timestamp *new_entry,
return jiffies64_to_msecs(jiffies);
}

+/**
+ * brute_disabled() - Test the fork brute force attack detection disabling.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * The fork brute force attack detection enabling / disabling is based on the
+ * last crashes timestamps list current size. A size of zero indicates that this
+ * feature is disabled. A size greater than zero indicates that this attack
+ * detection is enabled.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * It's mandatory to disable interrupts before acquiring the lock since the
+ * task_free hook can be called from an IRQ context during the execution of the
+ * task_fatal_signal hook.
+ *
+ * Return: True if the fork brute force attack detection is disabled. False
+ * otherwise.
+ */
+static bool brute_disabled(struct brute_stats *stats)
+{
+ unsigned long flags;
+ bool disabled;
+
+ spin_lock_irqsave(&stats->lock, flags);
+ disabled = !stats->timestamps_size;
+ spin_unlock_irqrestore(&stats->lock, flags);
+
+ return disabled;
+}
+
+/**
+ * brute_disable() - Disable the fork brute force attack detection.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * To disable the fork brute force attack detection it's only necessary to empty
+ * the last crashes timestamps list. So, a list size of zero indicates that this
+ * feature is disabled and a list size greater than zero indicates that this
+ * attack detection is enabled.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * Context: Must be called with stats->lock held.
+ */
+static void brute_disable(struct brute_stats *stats)
+{
+ brute_timestamps_free(&stats->timestamps);
+ stats->timestamps_size = 0;
+}
+
+/**
+ * brute_kill_offending_tasks() - Kill the offending tasks.
+ * @stats: Statistical data shared by all the fork hierarchy processes.
+ *
+ * When a fork brute force attack is detected it is necessary to kill all the
+ * offending tasks involved in the attack. In other words, it is necessary to
+ * kill all the tasks that share the same statistical data but not the ones that
+ * have the same group_leader that the current task since the latter are in the
+ * path to be killed.
+ *
+ * When the SIGKILL signal is sent to the offending tasks, this function will be
+ * called again from the task_fatal_signal hook due to a small crash period. So,
+ * to avoid kill again the same tasks due to a recursive call of this function,
+ * it is necessary to disable the attack detection for this fork hierarchy.
+ *
+ * The statistical data shared by all the fork hierarchy processes cannot be
+ * NULL.
+ *
+ * Context: Must be called with stats->lock held.
+ */
+static void brute_kill_offending_tasks(struct brute_stats *stats)
+{
+ struct task_struct *p;
+ struct brute_stats **p_stats;
+
+ if (refcount_read(&stats->refc) == 1)
+ return;
+
+ brute_disable(stats);
+ read_lock(&tasklist_lock);
+
+ for_each_process(p) {
+ if (p->group_leader == current->group_leader)
+ continue;
+
+ p_stats = brute_stats_ptr(p);
+ if (READ_ONCE(*p_stats) != stats)
+ continue;
+
+ do_send_sig_info(SIGKILL, SEND_SIG_PRIV, p, PIDTYPE_PID);
+ pr_warn_ratelimited("Offending process %d (%s) killed\n",
+ p->pid, p->comm);
+ }
+
+ read_unlock(&tasklist_lock);
+}
+
/**
* brute_task_fatal_signal() - Target for the task_fatal_signal hook.
* @siginfo: Contains the signal information.
@@ -433,7 +553,8 @@ static u64 brute_get_crash_period(struct brute_timestamp *new_entry,
* To detect a fork brute force attack is necessary that the list that holds the
* last crashes timestamps be updated in every fatal crash. Then, an only when
* this list is large enough, the application crash period can be computed an
- * compared with the defined threshold.
+ * compared with the defined threshold. If at this moment an attack is detected,
+ * all the offending tasks must be killed.
*
* It's mandatory to disable interrupts before acquiring the lock since the
* task_free hook can be called from an IRQ context during the execution of the
@@ -450,6 +571,9 @@ static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
if (WARN(!*stats, "No statistical data\n"))
return;

+ if (brute_disabled(*stats))
+ return;
+
new_entry = brute_new_timestamp();
if (WARN(!new_entry, "Cannot allocate last crash timestamp\n"))
return;
@@ -461,8 +585,10 @@ static void brute_task_fatal_signal(const kernel_siginfo_t *siginfo)
crash_period = brute_get_crash_period(new_entry, old_entry);
kfree(old_entry);

- if (crash_period < (u64)brute_crash_period_threshold)
+ if (crash_period < (u64)brute_crash_period_threshold) {
pr_warn("Fork brute force attack detected\n");
+ brute_kill_offending_tasks(*stats);
+ }
}

spin_unlock_irqrestore(&(*stats)->lock, flags);
--
2.25.1

2020-10-25 18:00:12

by John Wood

[permalink] [raw]
Subject: [PATCH v2 6/8] security/brute: Add prctls to enable/disable the fork attack detection

To allow that a process can turn off or turn on the detection and
mitigation of a fork brute force attack when required, add two new
defines to the prctl interface.

All the arguments passed to the prctl system call are ignored for the
two new cases.

To enable the attack detection make the last crashes timestamps list not
empty. To disable the detection use the already created brute_disable()
function.

Signed-off-by: John Wood <[email protected]>
---
include/brute/brute.h | 16 +++++++++
include/uapi/linux/prctl.h | 4 +++
kernel/sys.c | 8 +++++
security/brute/brute.c | 71 ++++++++++++++++++++++++++++++++++++++
4 files changed, 99 insertions(+)
create mode 100644 include/brute/brute.h

diff --git a/include/brute/brute.h b/include/brute/brute.h
new file mode 100644
index 000000000000..da6fca04f16b
--- /dev/null
+++ b/include/brute/brute.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BRUTE_H_
+#define _BRUTE_H_
+
+#include <linux/errno.h>
+
+#ifdef CONFIG_SECURITY_FORK_BRUTE
+int brute_prctl_enable(void);
+int brute_prctl_disable(void);
+#else
+static inline int brute_prctl_enable(void) { return -EINVAL; }
+static inline int brute_prctl_disable(void) { return -EINVAL; }
+#endif
+
+#endif /* _BRUTE_H_ */
+
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 07b4f8131e36..01f5033210d0 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -238,4 +238,8 @@ struct prctl_mm_map {
#define PR_SET_IO_FLUSHER 57
#define PR_GET_IO_FLUSHER 58

+/* Enable/disable the detection and mitigation of a fork brute force attack */
+#define PR_SECURITY_FORK_BRUTE_ENABLE 59
+#define PR_SECURITY_FORK_BRUTE_DISABLE 60
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index ab6c409b1159..35dae4e2f59a 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -72,6 +72,8 @@
#include <asm/io.h>
#include <asm/unistd.h>

+#include <brute/brute.h>
+
#include "uid16.h"

#ifndef SET_UNALIGN_CTL
@@ -2530,6 +2532,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,

error = (current->flags & PR_IO_FLUSHER) == PR_IO_FLUSHER;
break;
+ case PR_SECURITY_FORK_BRUTE_ENABLE:
+ error = brute_prctl_enable();
+ break;
+ case PR_SECURITY_FORK_BRUTE_DISABLE:
+ error = brute_prctl_disable();
+ break;
default:
error = -EINVAL;
break;
diff --git a/security/brute/brute.c b/security/brute/brute.c
index a1bdf25ffcf9..6f85e137553c 100644
--- a/security/brute/brute.c
+++ b/security/brute/brute.c
@@ -676,3 +676,74 @@ DEFINE_LSM(brute) = {
.blobs = &brute_blob_sizes,
};

+/**
+ * brute_prctl_enable() - Enable the fork brute force attack detection.
+ *
+ * To enable the fork brute force attack detection the last crashes timestamps
+ * list must not be empty. So, if this list already contains entries nothing
+ * needs to be done. Otherwise, initialize the last crashes timestamps list with
+ * one entry set to now. This way, the application crash period can be computed
+ * at the next fault.
+ *
+ * It's mandatory to disable interrupts before acquiring the lock since the
+ * task_free hook can be called from an IRQ context during the execution of the
+ * prctl syscall.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. -ENOMEM if
+ * the allocation of the new timestamp structure fails. Zero otherwise.
+ */
+int brute_prctl_enable(void)
+{
+ struct brute_stats **stats;
+ struct brute_timestamp *timestamp;
+ unsigned long flags;
+
+ stats = brute_stats_ptr(current);
+ if (!*stats)
+ return -EFAULT;
+
+ timestamp = brute_new_timestamp();
+ if (!timestamp)
+ return -ENOMEM;
+
+ spin_lock_irqsave(&(*stats)->lock, flags);
+
+ if (!list_empty(&(*stats)->timestamps)) {
+ kfree(timestamp);
+ goto unlock;
+ }
+
+ list_add_tail(&timestamp->node, &(*stats)->timestamps);
+ (*stats)->timestamps_size = 1;
+
+unlock:
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+ return 0;
+}
+
+/**
+ * brute_prctl_disable() - Disable the fork brute force attack detection.
+ *
+ * It's mandatory to disable interrupts before acquiring the lock since the
+ * task_free hook can be called from an IRQ context during the execution of the
+ * prctl syscall.
+ *
+ * Return: -EFAULT if the current task doesn't have statistical data. Zero
+ * otherwise.
+ */
+int brute_prctl_disable(void)
+{
+ struct brute_stats **stats;
+ unsigned long flags;
+
+ stats = brute_stats_ptr(current);
+ if (!*stats)
+ return -EFAULT;
+
+ spin_lock_irqsave(&(*stats)->lock, flags);
+ brute_disable(*stats);
+ spin_unlock_irqrestore(&(*stats)->lock, flags);
+
+ return 0;
+}
+
--
2.25.1

2020-10-25 19:11:23

by John Wood

[permalink] [raw]
Subject: [PATCH v2 8/8] MAINTAINERS: Add a new entry for the Brute LSM

In order to maintain the code for the Brute LSM add a new entry to the
maintainers list.

Signed-off-by: John Wood <[email protected]>
---
MAINTAINERS | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 867157311dc8..3d3b34f87913 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3734,6 +3734,13 @@ L: [email protected]
S: Supported
F: drivers/net/ethernet/brocade/bna/

+BRUTE SECURITY MODULE
+M: John Wood <[email protected]>
+S: Maintained
+F: Documentation/admin-guide/LSM/Brute.rst
+F: include/brute/
+F: security/brute/
+
BSG (block layer generic sg v4 driver)
M: FUJITA Tomonori <[email protected]>
L: [email protected]
--
2.25.1

2020-10-25 22:33:41

by John Wood

[permalink] [raw]
Subject: [PATCH v2 7/8] Documentation: Add documentation for the Brute LSM

Add some info detailing what is the Brute LSM, its motivation, weak
points of existing implementations, proposed solutions, enabling,
disabling and fine tuning.

Signed-off-by: John Wood <[email protected]>
---
Documentation/admin-guide/LSM/Brute.rst | 118 ++++++++++++++++++++++++
Documentation/admin-guide/LSM/index.rst | 1 +
security/brute/Kconfig | 3 +-
3 files changed, 121 insertions(+), 1 deletion(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst

diff --git a/Documentation/admin-guide/LSM/Brute.rst b/Documentation/admin-guide/LSM/Brute.rst
new file mode 100644
index 000000000000..20c6ccbd625d
--- /dev/null
+++ b/Documentation/admin-guide/LSM/Brute.rst
@@ -0,0 +1,118 @@
+.. SPDX-License-Identifier: GPL-2.0
+===========================================================
+Brute: Fork brute force attack detection and mitigation LSM
+===========================================================
+
+Attacks against vulnerable userspace applications with the purpose to break ASLR
+or bypass canaries traditionaly use some level of brute force with the help of
+the fork system call. This is possible since when creating a new process using
+fork its memory contents are the same as those of the parent process (the
+process that called the fork system call). So, the attacker can test the memory
+infinite times to find the correct memory values or the correct memory addresses
+without worrying about crashing the application.
+
+Based on the above scenario it would be nice to have this detected and
+mitigated, and this is the goal of this implementation.
+
+
+Other implementations
+=====================
+
+The public version of grsecurity, as a summary, is based on the idea of delay
+the fork system call if a child died due to a fatal error. This has some issues:
+
+Bad practices
+-------------
+
+Add delays to the kernel is, in general, a bad idea.
+
+Weak points
+-----------
+
+This protection can be bypassed using two different methods since it acts only
+when the fork is called after a child has crashed.
+
+Bypass 1
+~~~~~~~~
+
+So, it would still be possible for an attacker to fork a big amount of children
+(in the order of thousands), then probe all of them, and finally wait the
+protection time before repeat the steps.
+
+Bypass 2
+~~~~~~~~
+
+This method is based on the idea that the protection doesn't act if the parent
+crashes. So, it would still be possible for an attacker to fork a process and
+probe itself. Then, fork the child process and probe itself again. This way,
+these steps can be repeated infinite times without any mitigation.
+
+
+This implementation
+===================
+
+The main idea behind this implementation is to improve the existing ones
+focusing on the weak points annotated before. The solution for the first bypass
+method is to detect a fast crash rate instead of only one simple crash. For the
+second bypass method the solution is to detect both the crash of parent and
+child processes. Moreover, as a mitigation method it is better to kill all the
+offending tasks involve in the attack instead of use delays.
+
+So, the solution to the two bypass methods previously commented is to use some
+statistical data shared across all the processes that can have the same memory
+contents. Or in other words, a statistical data shared between all the fork
+hierarchy processes after an execve system call.
+
+The purpose of these statistics is to compute the application crash period in
+order to detect an attack. This crash period is the time between the execve
+system call and the first fault or the time between two consecutives faults, but
+this has a drawback. If an application crashes once quickly from the execve
+system call or crashes twice in a short period of time for some reason, a false
+positive attack will be triggered. To avoid this scenario the shared statistical
+data holds a list of the i last crashes timestamps and the application crash
+period is computed as follows:
+
+crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;
+
+This ways, the size of the last crashes timestamps list allows to fine tuning
+the detection sensibility.
+
+When this crash period falls under a certain threshold there is a clear signal
+that something malicious is happening. Once detected, the mitigation only kills
+the processes that share the same statistical data and so, all the tasks that
+can have the same memory contents. This way, an attack is rejected.
+
+Per system enabling
+-------------------
+
+This feature can be enabled at build time using the CONFIG_SECURITY_FORK_BRUTE
+option or using the visual config application under the following menu:
+
+Security options ---> Fork brute force attack detection and mitigation
+
+Per process enabling/disabling
+------------------------------
+
+To allow that specific applications can turn off or turn on the detection and
+mitigation of a fork brute force attack when required, there are two new prctls.
+
+prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0) -> To enable the feature
+prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0) -> To disable the feature
+
+Fine tuning
+-----------
+
+To customize the detection's sensibility there are two new sysctl attributes
+that allow to set the last crashes timestamps list size and the application
+crash period threshold (in milliseconds). Both are accessible through the
+following files respectively.
+
+/proc/sys/kernel/brute/timestamps_list_size
+/proc/sys/kernel/brute/crash_period_threshold
+
+The list size allows to avoid false positives due to crashes unrelated with a
+real attack. The period threshold sets the time limit to detect an attack. And,
+since a fork brute force attack will be detected if the application crash period
+falls under this threshold, the higher this value, the more sensitive the
+detection will be.
+
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index a6ba95fbaa9f..1f68982bb330 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -41,6 +41,7 @@ subdirectories.
:maxdepth: 1

apparmor
+ Brute
LoadPin
SELinux
Smack
diff --git a/security/brute/Kconfig b/security/brute/Kconfig
index 1bd2df1e2dec..334d7e88d27f 100644
--- a/security/brute/Kconfig
+++ b/security/brute/Kconfig
@@ -7,6 +7,7 @@ config SECURITY_FORK_BRUTE
vulnerable userspace processes. The detection method is based on
the application crash period and as a mitigation procedure all the
offending tasks are killed. Like capabilities, this security module
- stacks with other LSMs.
+ stacks with other LSMs. Further information can be found in
+ Documentation/admin-guide/LSM/Brute.rst.

If you are unsure how to answer this question, answer N.
--
2.25.1

2020-11-06 15:56:58

by John Wood

[permalink] [raw]
Subject: Re: [PATCH v2 0/8] Fork brute force attack mitigation

Hi,

Has anyone had time to review this patch serie? Any comments on this?

Regards.

On Sun, Oct 25, 2020 at 02:45:32PM +0100, John Wood wrote:
> Attacks against vulnerable userspace applications with the purpose to break
> ASLR or bypass canaries traditionaly use some level of brute force with the
> help of the fork system call. This is possible since when creating a new
> process using fork its memory contents are the same as those of the parent
> process (the process that called the fork system call). So, the attacker
> can test the memory infinite times to find the correct memory values or the
> correct memory addresses without worrying about crashing the application.
>
> Based on the above scenario it would be nice to have this detected and
> mitigated, and this is the goal of this patch serie.
>
> Other implementations
> ---------------------
>
> The public version of grsecurity, as a summary, is based on the idea of
> delay the fork system call if a child died due to a fatal error. This has
> some issues:
>
> 1.- Bad practices: Add delays to the kernel is, in general, a bad idea.
>
> 2.- Weak points: This protection can be bypassed using two different
> methods since it acts only when the fork is called after a child has
> crashed.
>
> 2.1.- Bypass 1: So, it would still be possible for an attacker to fork
> a big amount of children (in the order of thousands), then probe
> all of them, and finally wait the protection time before repeat
> the steps.
>
> 2.2.- Bypass 2: This method is based on the idea that the protection
> doesn't act if the parent crashes. So, it would still be possible
> for an attacker to fork a process and probe itself. Then, fork
> the child process and probe itself again. This way, these steps
> can be repeated infinite times without any mitigation.
>
> This implementation
> -------------------
>
> The main idea behind this implementation is to improve the existing ones
> focusing on the weak points annotated before. The solution for the first
> bypass method is to detect a fast crash rate instead of only one simple
> crash. For the second bypass method the solution is to detect both the
> crash of parent and child processes. Moreover, as a mitigation method it is
> better to kill all the offending tasks involve in the attack instead of use
> delays.
>
> So, the solution to the two bypass methods previously commented is to use
> some statistical data shared across all the processes that can have the
> same memory contents. Or in other words, a statistical data shared between
> all the fork hierarchy processes after an execve system call.
>
> The purpose of these statistics is to compute the application crash period
> in order to detect an attack. This crash period is the time between the
> execve system call and the first fault or the time between two consecutives
> faults, but this has a drawback. If an application crashes once quickly
> from the execve system call or crashes twice in a short period of time for
> some reason, a false positive attack will be triggered. To avoid this
> scenario the shared statistical data holds a list of the i last crashes
> timestamps and the application crash period is computed as follows:
>
> crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;
>
> This ways, the size of the last crashes timestamps list allows to fine
> tuning the detection sensibility.
>
> When this crash period falls under a certain threshold there is a clear
> signal that something malicious is happening. Once detected, the mitigation
> only kills the processes that share the same statistical data and so, all
> the tasks that can have the same memory contents. This way, an attack is
> rejected.
>
> 1.- Per system enabling: This feature can be enabled at build time using
> the CONFIG_SECURITY_FORK_BRUTE option or using the visual config
> application under the following menu:
>
> Security options ---> Fork brute force attack detection and mitigation
>
> 2.- Per process enabling/disabling: To allow that specific applications can
> turn off or turn on the detection and mitigation of a fork brute force
> attack when required, there are two new prctls.
>
> prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0)
> prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0)
>
> 3.- Fine tuning: To customize the detection's sensibility there are two new
> sysctl attributes that allow to set the last crashes timestamps list
> size and the application crash period threshold (in milliseconds). Both
> are accessible through the following files respectively.
>
> /proc/sys/kernel/brute/timestamps_list_size
> /proc/sys/kernel/brute/crash_period_threshold
>
> The list size allows to avoid false positives due to crashes unrelated
> with a real attack. The period threshold sets the time limit to detect
> an attack. And, since a fork brute force attack will be detected if the
> application crash period falls under this threshold, the higher this
> value, the more sensitive the detection will be.
>
> So, knowing all this information I will explain now the different patches:
>
> The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
> This will be useful during the attack detection phase.
>
> The 2/8 patch defines a new LSM and manages the statistical data shared by
> all the fork hierarchy processes.
>
> The 3/8 patch adds the sysctl attributes to fine tuning the detection.
>
> Patchs 4/8 and 5/8 detect and mitigate a fork brute force attack.
>
> Patch 6/8 adds the prctls to allow per process enabling/disabling.
>
> Patch 7/8 adds the documentation to explain this implementation.
>
> Patch 8/8 updates the maintainers file.
>
> This patch series is a task of the KSPP [1] and can also be accessed from
> my github tree [2] in the "brute_v2" branch.
>
> [1] https://github.com/KSPP/linux/issues/39
> [2] https://github.com/johwood/linux/
>
> The first version can be found in:
>
> https://lore.kernel.org/kernel-hardening/[email protected]/
>
> Changelog RFC -> v2
> -------------------
> - Rename this feature with a more appropiate name (Jann Horn, Kees Cook).
> - Convert the code to an LSM (Kees Cook).
> - Add locking to avoid data races (Jann Horn).
> - Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
> Cook).
> - Add the last crashes timestamps list to avoid false positives in the
> attack detection (Jann Horn).
> - Use "period" instead of "rate" (Jann Horn).
> - Other minor changes suggested (Jann Horn, Kees Cook).
>
> John Wood (8):
> security: Add LSM hook at the point where a task gets a fatal signal
> security/brute: Define a LSM and manage statistical data
> security/brute: Add sysctl attributes to allow detection fine tuning
> security/brute: Detect a fork brute force attack
> security/brute: Mitigate a fork brute force attack
> security/brute: Add prctls to enable/disable the fork attack detection
> Documentation: Add documentation for the Brute LSM
> MAINTAINERS: Add a new entry for the Brute LSM
>
> Documentation/admin-guide/LSM/Brute.rst | 118 ++++
> Documentation/admin-guide/LSM/index.rst | 1 +
> MAINTAINERS | 7 +
> include/brute/brute.h | 16 +
> include/linux/lsm_hook_defs.h | 1 +
> include/linux/lsm_hooks.h | 4 +
> include/linux/security.h | 4 +
> include/uapi/linux/prctl.h | 4 +
> kernel/signal.c | 1 +
> kernel/sys.c | 8 +
> security/Kconfig | 11 +-
> security/Makefile | 4 +
> security/brute/Kconfig | 13 +
> security/brute/Makefile | 2 +
> security/brute/brute.c | 749 ++++++++++++++++++++++++
> security/security.c | 5 +
> 16 files changed, 943 insertions(+), 5 deletions(-)
> create mode 100644 Documentation/admin-guide/LSM/Brute.rst
> create mode 100644 include/brute/brute.h
> create mode 100644 security/brute/Kconfig
> create mode 100644 security/brute/Makefile
> create mode 100644 security/brute/brute.c
>
> --
> 2.25.1
>

2020-11-09 04:34:03

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] Documentation: Add documentation for the Brute LSM

On 10/25/20 6:45 AM, John Wood wrote:
> Add some info detailing what is the Brute LSM, its motivation, weak
> points of existing implementations, proposed solutions, enabling,
> disabling and fine tuning.
>
> Signed-off-by: John Wood <[email protected]>
> ---
> Documentation/admin-guide/LSM/Brute.rst | 118 ++++++++++++++++++++++++
> Documentation/admin-guide/LSM/index.rst | 1 +
> security/brute/Kconfig | 3 +-
> 3 files changed, 121 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/admin-guide/LSM/Brute.rst
>
> diff --git a/Documentation/admin-guide/LSM/Brute.rst b/Documentation/admin-guide/LSM/Brute.rst
> new file mode 100644
> index 000000000000..20c6ccbd625d
> --- /dev/null
> +++ b/Documentation/admin-guide/LSM/Brute.rst
> @@ -0,0 +1,118 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +===========================================================
> +Brute: Fork brute force attack detection and mitigation LSM
> +===========================================================
> +
> +Attacks against vulnerable userspace applications with the purpose to break ASLR
> +or bypass canaries traditionaly use some level of brute force with the help of

traditionally

> +the fork system call. This is possible since when creating a new process using
> +fork its memory contents are the same as those of the parent process (the
> +process that called the fork system call). So, the attacker can test the memory
> +infinite times to find the correct memory values or the correct memory addresses
> +without worrying about crashing the application.
> +
> +Based on the above scenario it would be nice to have this detected and
> +mitigated, and this is the goal of this implementation.
> +
> +
> +Other implementations
> +=====================
> +
> +The public version of grsecurity, as a summary, is based on the idea of delay

delaying

> +the fork system call if a child died due to a fatal error. This has some issues:
> +
> +Bad practices
> +-------------
> +
> +Add delays to the kernel is, in general, a bad idea.

Adding

> +
> +Weak points
> +-----------
> +
> +This protection can be bypassed using two different methods since it acts only
> +when the fork is called after a child has crashed.
> +
> +Bypass 1
> +~~~~~~~~
> +
> +So, it would still be possible for an attacker to fork a big amount of children
> +(in the order of thousands), then probe all of them, and finally wait the
> +protection time before repeat the steps.

repeating

> +
> +Bypass 2
> +~~~~~~~~
> +
> +This method is based on the idea that the protection doesn't act if the parent
> +crashes. So, it would still be possible for an attacker to fork a process and
> +probe itself. Then, fork the child process and probe itself again. This way,
> +these steps can be repeated infinite times without any mitigation.
> +
> +
> +This implementation
> +===================
> +
> +The main idea behind this implementation is to improve the existing ones
> +focusing on the weak points annotated before. The solution for the first bypass
> +method is to detect a fast crash rate instead of only one simple crash. For the
> +second bypass method the solution is to detect both the crash of parent and
> +child processes. Moreover, as a mitigation method it is better to kill all the
> +offending tasks involve in the attack instead of use delays.

involved using

> +
> +So, the solution to the two bypass methods previously commented is to use some
> +statistical data shared across all the processes that can have the same memory
> +contents. Or in other words, a statistical data shared between all the fork
> +hierarchy processes after an execve system call.
> +
> +The purpose of these statistics is to compute the application crash period in
> +order to detect an attack. This crash period is the time between the execve
> +system call and the first fault or the time between two consecutives faults, but

consecutive

> +this has a drawback. If an application crashes once quickly from the execve
> +system call or crashes twice in a short period of time for some reason, a false
> +positive attack will be triggered. To avoid this scenario the shared statistical
> +data holds a list of the i last crashes timestamps and the application crash
> +period is computed as follows:
> +
> +crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;
> +
> +This ways, the size of the last crashes timestamps list allows to fine tuning

way tune

> +the detection sensibility.
> +
> +When this crash period falls under a certain threshold there is a clear signal
> +that something malicious is happening. Once detected, the mitigation only kills
> +the processes that share the same statistical data and so, all the tasks that
> +can have the same memory contents. This way, an attack is rejected.
> +
> +Per system enabling
> +-------------------
> +
> +This feature can be enabled at build time using the CONFIG_SECURITY_FORK_BRUTE
> +option or using the visual config application under the following menu:
> +
> +Security options ---> Fork brute force attack detection and mitigation
> +
> +Per process enabling/disabling
> +------------------------------
> +
> +To allow that specific applications can turn off or turn on the detection and
> +mitigation of a fork brute force attack when required, there are two new prctls.
> +
> +prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0) -> To enable the feature
> +prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0) -> To disable the feature
> +
> +Fine tuning
> +-----------
> +
> +To customize the detection's sensibility there are two new sysctl attributes
> +that allow to set the last crashes timestamps list size and the application
> +crash period threshold (in milliseconds). Both are accessible through the
> +following files respectively.
> +
> +/proc/sys/kernel/brute/timestamps_list_size
> +/proc/sys/kernel/brute/crash_period_threshold
> +
> +The list size allows to avoid false positives due to crashes unrelated with a
> +real attack. The period threshold sets the time limit to detect an attack. And,
> +since a fork brute force attack will be detected if the application crash period
> +falls under this threshold, the higher this value, the more sensitive the
> +detection will be.
> +

So an app could read crash_period_threshold and just do a new fork every
threshold + 1 time units, right? and not be caught?

thanks for the documentation.
--
~Randy

2020-11-09 18:27:56

by John Wood

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] Documentation: Add documentation for the Brute LSM

Hi,
Thanks for the typos corrections. Will be corrected in the next patch
version.

On Sun, Nov 08, 2020 at 08:31:13PM -0800, Randy Dunlap wrote:
>
> So an app could read crash_period_threshold and just do a new fork every
> threshold + 1 time units, right? and not be caught?

Yes, you are right. But we must set a crash_period_threshold that does not
make an attack feasible. For example, with the default value of 30000 ms,
an attacker can break the app only once every 30 seconds. So, to guess
canaries or break ASLR, the attack needs a big amount of time. But it is
possible.

So, I think that to avoid this scenario we can add a maximum number of
faults per fork hierarchy. Then, the mitigation will be triggered if the
application crash period falls under the period threshold or if the number
of faults exceed the maximum commented.

This way, if an attack is of long duration, it will also be detected and
mitigated.

What do you think?

>
> thanks for the documentation.
> --
> ~Randy
>

Thanks,
John Wood

2020-11-10 00:12:20

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] Documentation: Add documentation for the Brute LSM

On 11/9/20 10:23 AM, John Wood wrote:
> Hi,
> Thanks for the typos corrections. Will be corrected in the next patch
> version.
>
> On Sun, Nov 08, 2020 at 08:31:13PM -0800, Randy Dunlap wrote:
>>
>> So an app could read crash_period_threshold and just do a new fork every
>> threshold + 1 time units, right? and not be caught?
>
> Yes, you are right. But we must set a crash_period_threshold that does not
> make an attack feasible. For example, with the default value of 30000 ms,
> an attacker can break the app only once every 30 seconds. So, to guess
> canaries or break ASLR, the attack needs a big amount of time. But it is
> possible.
>
> So, I think that to avoid this scenario we can add a maximum number of
> faults per fork hierarchy. Then, the mitigation will be triggered if the
> application crash period falls under the period threshold or if the number
> of faults exceed the maximum commented.
>
> This way, if an attack is of long duration, it will also be detected and
> mitigated.
>
> What do you think?

Hi,
That sounds reasonable to me.

thanks.
--
~Randy

2020-11-11 00:14:23

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 0/8] Fork brute force attack mitigation

On Sun, Oct 25, 2020 at 02:45:32PM +0100, John Wood wrote:
> Attacks against vulnerable userspace applications with the purpose to break
> ASLR or bypass canaries traditionaly use some level of brute force with the
> help of the fork system call. This is possible since when creating a new
> process using fork its memory contents are the same as those of the parent
> process (the process that called the fork system call). So, the attacker
> can test the memory infinite times to find the correct memory values or the
> correct memory addresses without worrying about crashing the application.
>
> Based on the above scenario it would be nice to have this detected and
> mitigated, and this is the goal of this patch serie.

Thanks for preparing the v2! I spent some time looking at this today,
and I really like how it has been rearranged into an LSM. This feels
much more natural.

Various notes:

The locking isn't right; it'll trip with CONFIG_PROVE_LOCKING=y.
Here's the giant splat:


[ 8.205146] brute: Fork brute force attack detected
[ 8.206821]
[ 8.207317] =====================================================
[ 8.209392] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[ 8.211852] 5.10.0-rc3 #2 Not tainted
[ 8.213215] -----------------------------------------------------
[ 8.215213] runc/2505 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 8.217387] ffffffff97206098 (tasklist_lock){.+.+}-{2:2}, at: brute_task_fatal_signal.cold+0x49/0xe0
[ 8.219554]
[ 8.219554] and this task is already holding:
[ 8.220891] ffff9a6a10eb8318 (&stats->lock){+.-.}-{2:2}, at: brute_task_fatal_signal+0x97/0x210
[ 8.222856] which would create a new lock dependency:
[ 8.223943] (&stats->lock){+.-.}-{2:2} -> (tasklist_lock){.+.+}-{2:2}
[ 8.225360]
[ 8.225360] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 8.227081] (&stats->lock){+.-.}-{2:2}
[ 8.227084]
[ 8.227084] ... which became SOFTIRQ-irq-safe at:
[ 8.228732] lock_acquire+0x13f/0x3f0
[ 8.229381] _raw_spin_lock+0x2c/0x40
[ 8.230024] brute_task_free+0x22/0x90
[ 8.230619] security_task_free+0x22/0x50
[ 8.231260] __put_task_struct+0x58/0x140
[ 8.231924] rcu_core+0x2bb/0x620
[ 8.232492] __do_softirq+0x156/0x4d9
[ 8.233080] asm_call_irq_on_stack+0x12/0x20
[ 8.233759] do_softirq_own_stack+0x5b/0x70
[ 8.234425] irq_exit_rcu+0x9f/0xe0
[ 8.235221] sysvec_apic_timer_interrupt+0x43/0xa0
[ 8.235976] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 8.236794] _raw_write_unlock_irq+0x2c/0x40
[ 8.237465] copy_process+0x15e6/0x1cb0
[ 8.237966] kernel_clone+0x9b/0x3f0
[ 8.238413] kernel_thread+0x55/0x70
[ 8.238861] call_usermodehelper_exec_work+0x77/0xb0
[ 8.239466] process_one_work+0x23e/0x580
[ 8.239969] worker_thread+0x55/0x3c0
[ 8.240425] kthread+0x141/0x160
[ 8.240833] ret_from_fork+0x22/0x30
[ 8.241276]
[ 8.241276] to a SOFTIRQ-irq-unsafe lock:
[ 8.241939] (tasklist_lock){.+.+}-{2:2}
[ 8.241940]
[ 8.241940] ... which became SOFTIRQ-irq-unsafe at:
[ 8.243180] ...
[ 8.243182] lock_acquire+0x13f/0x3f0
[ 8.243853] _raw_read_lock+0x5d/0x70
[ 8.244307] do_wait+0xd2/0x2e0
[ 8.244706] kernel_wait+0x49/0x90
[ 8.245126] call_usermodehelper_exec_work+0x61/0xb0
[ 8.245748] process_one_work+0x23e/0x580
[ 8.246238] worker_thread+0x55/0x3c0
[ 8.246685] kthread+0x141/0x160
[ 8.247086] ret_from_fork+0x22/0x30
[ 8.247522]
[ 8.247522] other info that might help us debug this:
[ 8.247522]
[ 8.248480] Possible interrupt unsafe locking scenario:
[ 8.248480]
[ 8.249289] CPU0 CPU1
[ 8.249839] ---- ----
[ 8.250386] lock(tasklist_lock);
[ 8.250811] local_irq_disable();
[ 8.251525] lock(&stats->lock);
[ 8.252227] lock(tasklist_lock);
[ 8.252938] <Interrupt>
[ 8.253249] lock(&stats->lock);
[ 8.253663]
[ 8.253663] *** DEADLOCK ***
[ 8.253663]
[ 8.254362] 1 lock held by runc/2505:
[ 8.254800] #0: ffff9a6a10eb8318 (&stats->lock){+.-.}-{2:2}, at: brute_task_fatal_signal+0x97/0x210
[ 8.255872]
[ 8.255872] the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
[ 8.256931] -> (&stats->lock){+.-.}-{2:2} {
[ 8.257432] HARDIRQ-ON-W at:
[ 8.257809] lock_acquire+0x13f/0x3f0
[ 8.258442] _raw_spin_lock+0x2c/0x40
[ 8.259069] brute_task_free+0x22/0x90
[ 8.259710] security_task_free+0x22/0x50
[ 8.260384] __put_task_struct+0x58/0x140
[ 8.261078] rcu_core+0x2bb/0x620
[ 8.261677] __do_softirq+0x156/0x4d9
[ 8.262304] asm_call_irq_on_stack+0x12/0x20
[ 8.263010] do_softirq_own_stack+0x5b/0x70
[ 8.263699] irq_exit_rcu+0x9f/0xe0
[ 8.264315] sysvec_apic_timer_interrupt+0x43/0xa0
[ 8.265085] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 8.265886] _raw_write_unlock_irq+0x2c/0x40
[ 8.266580] copy_process+0x15e6/0x1cb0
[ 8.267218] kernel_clone+0x9b/0x3f0
[ 8.267832] kernel_thread+0x55/0x70
[ 8.268444] call_usermodehelper_exec_work+0x77/0xb0
[ 8.269234] process_one_work+0x23e/0x580
[ 8.269912] worker_thread+0x55/0x3c0
[ 8.270546] kthread+0x141/0x160
[ 8.271144] ret_from_fork+0x22/0x30
[ 8.271771] IN-SOFTIRQ-W at:
[ 8.272146] lock_acquire+0x13f/0x3f0
[ 8.272780] _raw_spin_lock+0x2c/0x40
[ 8.273398] brute_task_free+0x22/0x90
[ 8.274022] security_task_free+0x22/0x50
[ 8.274690] __put_task_struct+0x58/0x140
[ 8.275358] rcu_core+0x2bb/0x620
[ 8.275937] __do_softirq+0x156/0x4d9
[ 8.276559] asm_call_irq_on_stack+0x12/0x20
[ 8.277262] do_softirq_own_stack+0x5b/0x70
[ 8.277948] irq_exit_rcu+0x9f/0xe0
[ 8.278561] sysvec_apic_timer_interrupt+0x43/0xa0
[ 8.279324] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 8.280121] _raw_write_unlock_irq+0x2c/0x40
[ 8.280845] copy_process+0x15e6/0x1cb0
[ 8.281491] kernel_clone+0x9b/0x3f0
[ 8.282117] kernel_thread+0x55/0x70
[ 8.282734] call_usermodehelper_exec_work+0x77/0xb0
[ 8.283514] process_one_work+0x23e/0x580
[ 8.284181] worker_thread+0x55/0x3c0
[ 8.284802] kthread+0x141/0x160
[ 8.285381] ret_from_fork+0x22/0x30
[ 8.285998] INITIAL USE at:
[ 8.286365] lock_acquire+0x13f/0x3f0
[ 8.286975] _raw_spin_lock_irqsave+0x3b/0x60
[ 8.287665] brute_share_stats+0x17/0x70
[ 8.288311] brute_task_alloc+0x65/0x70
[ 8.288948] security_task_alloc+0x44/0xd0
[ 8.289605] copy_process+0x789/0x1cb0
[ 8.290223] kernel_clone+0x9b/0x3f0
[ 8.290819] kernel_thread+0x55/0x70
[ 8.291421] rest_init+0x21/0x258
[ 8.291995] start_kernel+0x566/0x587
[ 8.292611] secondary_startup_64_no_verify+0xc2/0xcb
[ 8.293384] }
[ 8.293586] ... key at: [<ffffffff984ce3a0>] __key.0+0x0/0x10
[ 8.294324] ... acquired at:
[ 8.294679] lock_acquire+0x13f/0x3f0
[ 8.295133] _raw_read_lock+0x5d/0x70
[ 8.295589] brute_task_fatal_signal.cold+0x49/0xe0
[ 8.296185] security_task_fatal_signal+0x22/0x30
[ 8.296772] get_signal+0x3e0/0xcf0
[ 8.297212] arch_do_signal+0x30/0x880
[ 8.297682] exit_to_user_mode_prepare+0xfc/0x170
[ 8.298263] syscall_exit_to_user_mode+0x38/0x240
[ 8.298844] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.299464]
[ 8.299648]
[ 8.299648] the dependencies between the lock to be acquired
[ 8.299648] and SOFTIRQ-irq-unsafe lock:
[ 8.300969] -> (tasklist_lock){.+.+}-{2:2} {
[ 8.301481] HARDIRQ-ON-R at:
[ 8.301857] lock_acquire+0x13f/0x3f0
[ 8.302482] _raw_read_lock+0x5d/0x70
[ 8.303110] do_wait+0xd2/0x2e0
[ 8.303677] kernel_wait+0x49/0x90
[ 8.304266] call_usermodehelper_exec_work+0x61/0xb0
[ 8.305038] process_one_work+0x23e/0x580
[ 8.305700] worker_thread+0x55/0x3c0
[ 8.306320] kthread+0x141/0x160
[ 8.306895] ret_from_fork+0x22/0x30
[ 8.307506] SOFTIRQ-ON-R at:
[ 8.307880] lock_acquire+0x13f/0x3f0
[ 8.308505] _raw_read_lock+0x5d/0x70
[ 8.309125] do_wait+0xd2/0x2e0
[ 8.309689] kernel_wait+0x49/0x90
[ 8.310272] call_usermodehelper_exec_work+0x61/0xb0
[ 8.311044] process_one_work+0x23e/0x580
[ 8.311709] worker_thread+0x55/0x3c0
[ 8.312335] kthread+0x141/0x160
[ 8.312910] ret_from_fork+0x22/0x30
[ 8.313529] INITIAL USE at:
[ 8.313897] lock_acquire+0x13f/0x3f0
[ 8.314515] _raw_write_lock_irq+0x34/0x50
[ 8.315187] copy_process+0x11d5/0x1cb0
[ 8.315834] kernel_clone+0x9b/0x3f0
[ 8.316446] kernel_thread+0x55/0x70
[ 8.317057] rest_init+0x21/0x258
[ 8.317636] start_kernel+0x566/0x587
[ 8.318250] secondary_startup_64_no_verify+0xc2/0xcb
[ 8.319033] INITIAL READ USE at:
[ 8.319452] lock_acquire+0x13f/0x3f0
[ 8.320117] _raw_read_lock+0x5d/0x70
[ 8.320777] do_wait+0xd2/0x2e0
[ 8.321387] kernel_wait+0x49/0x90
[ 8.322031] call_usermodehelper_exec_work+0x61/0xb0
[ 8.322852] process_one_work+0x23e/0x580
[ 8.323557] worker_thread+0x55/0x3c0
[ 8.324220] kthread+0x141/0x160
[ 8.324837] ret_from_fork+0x22/0x30
[ 8.325490] }
[ 8.325694] ... key at: [<ffffffff97206098>] tasklist_lock+0x18/0x40
[ 8.326509] ... acquired at:
[ 8.326871] lock_acquire+0x13f/0x3f0
[ 8.327330] _raw_read_lock+0x5d/0x70
[ 8.327790] brute_task_fatal_signal.cold+0x49/0xe0
[ 8.328392] security_task_fatal_signal+0x22/0x30
[ 8.328958] get_signal+0x3e0/0xcf0
[ 8.329388] arch_do_signal+0x30/0x880
[ 8.329855] exit_to_user_mode_prepare+0xfc/0x170
[ 8.330434] syscall_exit_to_user_mode+0x38/0x240
[ 8.331010] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.331622]
[ 8.331801]
[ 8.331801] stack backtrace:
[ 8.332322] CPU: 2 PID: 2505 Comm: runc Not tainted 5.10.0-rc3 #2
[ 8.333037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 8.334088] Call Trace:
[ 8.334385] dump_stack+0x77/0x97
[ 8.334779] check_irq_usage.cold+0x279/0x2ec
[ 8.335292] ? check_noncircular+0x75/0x110
[ 8.335783] __lock_acquire+0x12e0/0x2590
[ 8.336256] lock_acquire+0x13f/0x3f0
[ 8.336696] ? brute_task_fatal_signal.cold+0x49/0xe0
[ 8.337284] _raw_read_lock+0x5d/0x70
[ 8.337721] ? brute_task_fatal_signal.cold+0x49/0xe0
[ 8.338315] brute_task_fatal_signal.cold+0x49/0xe0
[ 8.338885] security_task_fatal_signal+0x22/0x30
[ 8.339435] get_signal+0x3e0/0xcf0
[ 8.339849] arch_do_signal+0x30/0x880
[ 8.340290] ? rcu_read_lock_sched_held+0x3f/0x70
[ 8.340853] ? kfree+0x25d/0x2c0
[ 8.341241] exit_to_user_mode_prepare+0xfc/0x170
[ 8.341796] syscall_exit_to_user_mode+0x38/0x240
[ 8.342350] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.342945] RIP: 0033:0x461923
[ 8.343310] Code: 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14 4c 8b 54 24 18 4c 8b 44 24 20 44 8b 4c 24 28 b8 ca 00 00 00 0f 05 <89> 44 24 30 c3 cc cc cc cc cc cc cc cc 8b 7c 24 08 48 8b 74 24 10
[ 8.345488] RSP: 002b:00007fd62affcda8 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
[ 8.346385] RAX: fffffffffffffe00 RBX: 000000c000032e00 RCX: 0000000000461923
[ 8.347225] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000b3b898
[ 8.348071] RBP: 00007fd62affcdf0 R08: 0000000000000000 R09: 0000000000000000
[ 8.349006] R10: 0000000000000000 R11: 0000000000000286 R12: 000000c000001680
[ 8.349838] R13: 00007ffd85b25fcf R14: 00007ffd85b260d0 R15: 00007fd62affcfc0

I think it should be possible to using existing task locking semantics
to manage the statistics, but I'll need to take a closer look.

> Other implementations
> ---------------------
>
> The public version of grsecurity, as a summary, is based on the idea of
> delay the fork system call if a child died due to a fatal error. This has
> some issues:
>
> 1.- Bad practices: Add delays to the kernel is, in general, a bad idea.
>
> 2.- Weak points: This protection can be bypassed using two different
> methods since it acts only when the fork is called after a child has
> crashed.
>
> 2.1.- Bypass 1: So, it would still be possible for an attacker to fork
> a big amount of children (in the order of thousands), then probe
> all of them, and finally wait the protection time before repeat
> the steps.
>
> 2.2.- Bypass 2: This method is based on the idea that the protection
> doesn't act if the parent crashes. So, it would still be possible
> for an attacker to fork a process and probe itself. Then, fork
> the child process and probe itself again. This way, these steps
> can be repeated infinite times without any mitigation.

It's good to clarify what the expected behaviors should be; however,
while working with the resulting system, it wasn't clear what the threat
model was for this defense. I think we need two things: clear
descriptions of what is expected to be detected (and what is not), and a
set of self-tests that can be used to validate those expectations.

Also, what, specifically, does "fatal error" cover? Is it strictly fatal
signals? (i.e. "error" might refer to exit code, for example.)

>
> This implementation
> -------------------
>
> The main idea behind this implementation is to improve the existing ones
> focusing on the weak points annotated before. The solution for the first
> bypass method is to detect a fast crash rate instead of only one simple
> crash. For the second bypass method the solution is to detect both the
> crash of parent and child processes. Moreover, as a mitigation method it is
> better to kill all the offending tasks involve in the attack instead of use
> delays.
>
> So, the solution to the two bypass methods previously commented is to use
> some statistical data shared across all the processes that can have the
> same memory contents. Or in other words, a statistical data shared between
> all the fork hierarchy processes after an execve system call.

Hm, is this already tracked in some other way? i.e. the family hierarchy
of the mm struct? They're only shared on clone, but get totally copied on
fork(). Should that be the place where this is tracked instead? (i.e. I
could fork and totally rearrange my VMAs.)

>
> The purpose of these statistics is to compute the application crash period
> in order to detect an attack. This crash period is the time between the
> execve system call and the first fault or the time between two consecutives
> faults, but this has a drawback. If an application crashes once quickly
> from the execve system call or crashes twice in a short period of time for
> some reason, a false positive attack will be triggered. To avoid this
> scenario the shared statistical data holds a list of the i last crashes
> timestamps and the application crash period is computed as follows:
>
> crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;
>
> This ways, the size of the last crashes timestamps list allows to fine
> tuning the detection sensibility.

Instead of a list, can't the rate just be calculated on an on-going
basis?

> When this crash period falls under a certain threshold there is a clear
> signal that something malicious is happening. Once detected, the mitigation
> only kills the processes that share the same statistical data and so, all
> the tasks that can have the same memory contents. This way, an attack is
> rejected.

Here's where I think the threat model needs some more work. The above
describes what I think is a less common situation. I expect the common
attack to hold still with a single value, and let the fork/exec spin
until the value lines up. (i.e. a fork is required.)

Here are the threat scenarios that come to mind for me:

1- launching (fork/exec) a setuid process repeatedly until you get a
desirable memory layout (e.g. what Stack Clash[1] did).

2- connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until you get a desirable memory layout (e.g. what CTFs do for simple
network service[2]).

3- launching processes _without_ exec (e.g. Android Zygote[3]), and
exposing state to attack a sibling.

4- connecting to a fork()ing network daemon (e.g. apache) repeatedly
until you expose the previously-shared memory layout of all the other
children (e.g. kind of related to HeartBleed[3], though that was a
direct exposure not a crasher).

In each case, a privilege boundary is being crossed (setuid in the first,
priv-changes in the 2nd, and network-to-local in the latter two), so I
suspect that kind of detail will need to play a part in the design to
help avoid false positives.

Regardless, when I tested this series, 1 and 3 isn't detected, since
they pass through an execve(), and I think that needs to be covered as
well.

[1] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[2] https://github.com/BSidesPDX/CTF-2017/blob/master/pwn/200-leek/src/leek.service
[3] https://link.springer.com/article/10.1007/s10207-018-00425-8
[4] https://heartbleed.com/

>
> 1.- Per system enabling: This feature can be enabled at build time using
> the CONFIG_SECURITY_FORK_BRUTE option or using the visual config
> application under the following menu:
>
> Security options ---> Fork brute force attack detection and mitigation

(there is a built-in boot time disabling too, by changing the lsm=
bootparam)

>
> 2.- Per process enabling/disabling: To allow that specific applications can
> turn off or turn on the detection and mitigation of a fork brute force
> attack when required, there are two new prctls.
>
> prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0)
> prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0)

How do you see this being used?

> 3.- Fine tuning: To customize the detection's sensibility there are two new
> sysctl attributes that allow to set the last crashes timestamps list
> size and the application crash period threshold (in milliseconds). Both
> are accessible through the following files respectively.
>
> /proc/sys/kernel/brute/timestamps_list_size
> /proc/sys/kernel/brute/crash_period_threshold
>
> The list size allows to avoid false positives due to crashes unrelated
> with a real attack. The period threshold sets the time limit to detect
> an attack. And, since a fork brute force attack will be detected if the
> application crash period falls under this threshold, the higher this
> value, the more sensitive the detection will be.

I wonder if these will be needed as we narrow in on the specific threat
model (i.e. there will be enough additional signal to obviate this
tuning).

And in testing I found another false positive that I haven't fully
diagnosed. I found that at boot, with Docker installed, "runc" would
immediately trip the mitigation. With some debugging added, I looks like
runc had several forked processes that got SIGKILLed in quick succession,
and then the entire group got killed by Brute. I haven't narrowed down
what runc is doing here, but it makes me wonder if there might need
to be an exception for user-space delivered signals, as opposed to
kernel-delivered signals...

Thanks again for the work! I'm liking the idea of getting a solid
protection for this. It's been a long-standing hole in upstream. :)

-Kees

--
Kees Cook

2020-11-15 16:07:03

by John Wood

[permalink] [raw]
Subject: Re: [PATCH v2 0/8] Fork brute force attack mitigation

On Tue, Nov 10, 2020 at 04:10:49PM -0800, Kees Cook wrote:
> On Sun, Oct 25, 2020 at 02:45:32PM +0100, John Wood wrote:
> > Attacks against vulnerable userspace applications with the purpose to break
> > ASLR or bypass canaries traditionaly use some level of brute force with the
> > help of the fork system call. This is possible since when creating a new
> > process using fork its memory contents are the same as those of the parent
> > process (the process that called the fork system call). So, the attacker
> > can test the memory infinite times to find the correct memory values or the
> > correct memory addresses without worrying about crashing the application.
> >
> > Based on the above scenario it would be nice to have this detected and
> > mitigated, and this is the goal of this patch serie.
>
> Thanks for preparing the v2! I spent some time looking at this today,
> and I really like how it has been rearranged into an LSM. This feels
> much more natural.

Thank you very much for reviewing this work.

> Various notes:
>
> The locking isn't right; it'll trip with CONFIG_PROVE_LOCKING=y.
> Here's the giant splat:
>
> [ 8.205146] brute: Fork brute force attack detected
> [ 8.206821]
> [ 8.207317] =====================================================
> [ 8.209392] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> [ 8.211852] 5.10.0-rc3 #2 Not tainted
> [ 8.213215] -----------------------------------------------------
> [...]

Interesting... My ".config" also have this option "on" but this warning
never appeared. Under what circumstances did it happen? I tested this
patchset with two custom test programs with no problem. Maybe my tests
are not good enough :(

> I think it should be possible to using existing task locking semantics
> to manage the statistics, but I'll need to take a closer look.
>
> > Other implementations
> > ---------------------
> >
> > The public version of grsecurity, as a summary, is based on the idea of
> > delay the fork system call if a child died due to a fatal error. This has
> > some issues:
> >
> > 1.- Bad practices: Add delays to the kernel is, in general, a bad idea.
> >
> > 2.- Weak points: This protection can be bypassed using two different
> > methods since it acts only when the fork is called after a child has
> > crashed.
> >
> > 2.1.- Bypass 1: So, it would still be possible for an attacker to fork
> > a big amount of children (in the order of thousands), then probe
> > all of them, and finally wait the protection time before repeat
> > the steps.
> >
> > 2.2.- Bypass 2: This method is based on the idea that the protection
> > doesn't act if the parent crashes. So, it would still be possible
> > for an attacker to fork a process and probe itself. Then, fork
> > the child process and probe itself again. This way, these steps
> > can be repeated infinite times without any mitigation.
>
> It's good to clarify what the expected behaviors should be; however,
> while working with the resulting system, it wasn't clear what the threat
> model was for this defense. I think we need two things: clear
> descriptions of what is expected to be detected (and what is not), and a
> set of self-tests that can be used to validate those expectations.

Ok, I will make the necessary changes in order to clarify the commented
points. I will also add the self-tests. Thanks for the suggestion.

> Also, what, specifically, does "fatal error" cover? Is it strictly fatal
> signals? (i.e. "error" might refer to exit code, for example.)

Ok, understood.

> >
> > This implementation
> > -------------------
> >
> > The main idea behind this implementation is to improve the existing ones
> > focusing on the weak points annotated before. The solution for the first
> > bypass method is to detect a fast crash rate instead of only one simple
> > crash. For the second bypass method the solution is to detect both the
> > crash of parent and child processes. Moreover, as a mitigation method it is
> > better to kill all the offending tasks involve in the attack instead of use
> > delays.
> >
> > So, the solution to the two bypass methods previously commented is to use
> > some statistical data shared across all the processes that can have the
> > same memory contents. Or in other words, a statistical data shared between
> > all the fork hierarchy processes after an execve system call.
>
> Hm, is this already tracked in some other way? i.e. the family hierarchy
> of the mm struct? They're only shared on clone, but get totally copied on
> fork(). Should that be the place where this is tracked instead? (i.e. I
> could fork and totally rearrange my VMAs.)

Great, I will give it a try and study this hierarchy. If this is feasible,
the correct path shoud be to create a new security blob hold by the mm
struct? Also, I think that new LSM hooks will be needed to notify the
allocation, release and copy of this structure. What do you think, am I in
the right direction?

> > The purpose of these statistics is to compute the application crash period
> > in order to detect an attack. This crash period is the time between the
> > execve system call and the first fault or the time between two consecutives
> > faults, but this has a drawback. If an application crashes once quickly
> > from the execve system call or crashes twice in a short period of time for
> > some reason, a false positive attack will be triggered. To avoid this
> > scenario the shared statistical data holds a list of the i last crashes
> > timestamps and the application crash period is computed as follows:
> >
> > crash_period = (n_last_timestamp - n_minus_i_timestamp) / i;
> >
> > This ways, the size of the last crashes timestamps list allows to fine
> > tuning the detection sensibility.
>
> Instead of a list, can't the rate just be calculated on an on-going
> basis?

The first version computed the application crash period on the fly
(without a list) using the accumulated time and the number of faults.

period = time_since_execve / faults_since_execve;

But this method has two major drawbacks. The first (noted by Jann Horn),
is that if an application runs for a long time (say, a month) without any
crash, and the detection threshold is set to 30 s (default value), an
attacker can break the app approximately:

30*24*60*60/30 = 86400 times

before the detection is triggered.

The second drawback is that if the first fault is fast (from the execve
system call) the detection will be triggered instantly. This is a problem
if an application fails quickly for reasons that are not related to a real
attack.

I think to avoid the first drawback we can compute the crash period using
a kind of weighted average (setting more weight for the last time between
faults). But the second drawback does not go away.

However, with the list we don't have any of the commented drawbacks (only a
larger memory footprint and more calculation time).

Any ideas to remove the list are welcome ;)

> > When this crash period falls under a certain threshold there is a clear
> > signal that something malicious is happening. Once detected, the mitigation
> > only kills the processes that share the same statistical data and so, all
> > the tasks that can have the same memory contents. This way, an attack is
> > rejected.
>
> Here's where I think the threat model needs some more work. The above
> describes what I think is a less common situation. I expect the common
> attack to hold still with a single value, and let the fork/exec spin
> until the value lines up. (i.e. a fork is required.)
>
> Here are the threat scenarios that come to mind for me:
>
> 1- launching (fork/exec) a setuid process repeatedly until you get a
> desirable memory layout (e.g. what Stack Clash[1] did).
>
> 2- connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
> until you get a desirable memory layout (e.g. what CTFs do for simple
> network service[2]).
>
> 3- launching processes _without_ exec (e.g. Android Zygote[3]), and
> exposing state to attack a sibling.
>
> 4- connecting to a fork()ing network daemon (e.g. apache) repeatedly
> until you expose the previously-shared memory layout of all the other
> children (e.g. kind of related to HeartBleed[3], though that was a
> direct exposure not a crasher).
>
> In each case, a privilege boundary is being crossed (setuid in the first,
> priv-changes in the 2nd, and network-to-local in the latter two), so I
> suspect that kind of detail will need to play a part in the design to
> help avoid false positives.

Ok, I will try to play with privilege boundary crossing in the next
version.

> Regardless, when I tested this series, 1 and 3 isn't detected, since
> they pass through an execve(), and I think that needs to be covered as
> well.

Thanks, I will work on it and review the execve case. Also, thanks for
showing me real examples of fork brute force attacks ;)

> [1] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
> [2] https://github.com/BSidesPDX/CTF-2017/blob/master/pwn/200-leek/src/leek.service
> [3] https://link.springer.com/article/10.1007/s10207-018-00425-8
> [4] https://heartbleed.com/
>
> >
> > 1.- Per system enabling: This feature can be enabled at build time using
> > the CONFIG_SECURITY_FORK_BRUTE option or using the visual config
> > application under the following menu:
> >
> > Security options ---> Fork brute force attack detection and mitigation
>
> (there is a built-in boot time disabling too, by changing the lsm=
> bootparam)

Thanks, I will add it to the documentation.

> > 2.- Per process enabling/disabling: To allow that specific applications can
> > turn off or turn on the detection and mitigation of a fork brute force
> > attack when required, there are two new prctls.
> >
> > prctl(PR_SECURITY_FORK_BRUTE_ENABLE, 0, 0, 0, 0)
> > prctl(PR_SECURITY_FORK_BRUTE_DISABLE, 0, 0, 0, 0)
>
> How do you see this being used?

Maybe the PR_SECURITY_FORK_BRUTE_DISABLE prctl can be used by applications
like "runc" (commented below) if we cannot exclude this false positive. I
mean during phases that may need fork and kill in a quick way. Then, during
other phases, the protection could be enabled again. Makes sense?

> > 3.- Fine tuning: To customize the detection's sensibility there are two new
> > sysctl attributes that allow to set the last crashes timestamps list
> > size and the application crash period threshold (in milliseconds). Both
> > are accessible through the following files respectively.
> >
> > /proc/sys/kernel/brute/timestamps_list_size
> > /proc/sys/kernel/brute/crash_period_threshold
> >
> > The list size allows to avoid false positives due to crashes unrelated
> > with a real attack. The period threshold sets the time limit to detect
> > an attack. And, since a fork brute force attack will be detected if the
> > application crash period falls under this threshold, the higher this
> > value, the more sensitive the detection will be.
>
> I wonder if these will be needed as we narrow in on the specific threat
> model (i.e. there will be enough additional signal to obviate this
> tuning).

I agree. If possible, I will remove as much configuration as possible.
Thanks.

> And in testing I found another false positive that I haven't fully
> diagnosed. I found that at boot, with Docker installed, "runc" would
> immediately trip the mitigation. With some debugging added, I looks like
> runc had several forked processes that got SIGKILLed in quick succession,
> and then the entire group got killed by Brute. I haven't narrowed down
> what runc is doing here, but it makes me wonder if there might need
> to be an exception for user-space delivered signals, as opposed to
> kernel-delivered signals...

I will try to work on this as well.

> Thanks again for the work! I'm liking the idea of getting a solid
> protection for this. It's been a long-standing hole in upstream. :)

It's a pleasure to work on this protection. This is the perfect task to
learn a lot.

Thank you very much for the review and all the comments. I know this
review was streamed on twitch but, due to my main language is not english,
I cannot follow speech the way I want. Will this video be uploaded to
youtube? There I can turn on the subtitles. This way I can learn and
improve more things. Thanks again.

>
> -Kees
>
> --
> Kees Cook

Regards,
John Wood