2024-05-24 19:30:00

by Adrian Ratiu

Subject: [PATCH v4 1/2] proc: pass file instead of inode to proc_mem_open

The file struct is required in proc_mem_open() so its
f_mode can be checked when deciding whether to allow or
deny /proc/*/mem open requests via the new read/write
and foll_force restriction mechanism.

Thus instead of directly passing the inode to the function,
we pass the file and get the inode inside it.

Cc: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Christian Brauner <[email protected]>
Signed-off-by: Adrian Ratiu <[email protected]>
---
* New in v4
---
fs/proc/base.c | 6 +++---
fs/proc/internal.h | 2 +-
fs/proc/task_mmu.c | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 18550c071d71..6faf1b3a4117 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -794,9 +794,9 @@ static const struct file_operations proc_single_file_operations = {
};


-struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
+struct mm_struct *proc_mem_open(struct file *file, unsigned int mode)
{
- struct task_struct *task = get_proc_task(inode);
+ struct task_struct *task = get_proc_task(file->f_inode);
struct mm_struct *mm = ERR_PTR(-ESRCH);

if (task) {
@@ -816,7 +816,7 @@ struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)

static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
{
- struct mm_struct *mm = proc_mem_open(inode, mode);
+ struct mm_struct *mm = proc_mem_open(file, mode);

if (IS_ERR(mm))
return PTR_ERR(mm);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index a71ac5379584..d38b2eea40d1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -295,7 +295,7 @@ struct proc_maps_private {
#endif
} __randomize_layout;

-struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
+struct mm_struct *proc_mem_open(struct file *file, unsigned int mode);

extern const struct file_operations proc_pid_maps_operations;
extern const struct file_operations proc_pid_numa_maps_operations;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e5a5f015ff03..dc9abbf662be 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -210,7 +210,7 @@ static int proc_maps_open(struct inode *inode, struct file *file,
return -ENOMEM;

priv->inode = inode;
- priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+ priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
if (IS_ERR(priv->mm)) {
int err = PTR_ERR(priv->mm);

@@ -1025,7 +1025,7 @@ static int smaps_rollup_open(struct inode *inode, struct file *file)
goto out_free;

priv->inode = inode;
- priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+ priv->mm = proc_mem_open(file, PTRACE_MODE_READ);
if (IS_ERR(priv->mm)) {
ret = PTR_ERR(priv->mm);

@@ -1749,7 +1749,7 @@ static int pagemap_open(struct inode *inode, struct file *file)
{
struct mm_struct *mm;

- mm = proc_mem_open(inode, PTRACE_MODE_READ);
+ mm = proc_mem_open(file, PTRACE_MODE_READ);
if (IS_ERR(mm))
return PTR_ERR(mm);
file->private_data = mm;
--
2.44.1
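
The motivation for the signature change can be illustrated with a small user-space sketch. Everything below is a hypothetical stand-in (the `mock_*` types, the `MOCK_FMODE_*` values and both helpers are assumptions made for illustration, not the kernel definitions): a callee that only receives the inode cannot see how the file was opened, while a callee that receives the file can read the open mode and still reach the inode.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; values are illustrative. */
#define MOCK_FMODE_READ  0x1u
#define MOCK_FMODE_WRITE 0x2u

struct mock_inode {
	int pid;                    /* which task this /proc entry refers to */
};

struct mock_file {
	unsigned int f_mode;        /* how the file was opened */
	struct mock_inode *f_inode; /* the inode is reachable from the file */
};

/* Old shape: only the inode is visible, the open mode is not. */
int mock_open_by_inode(const struct mock_inode *inode)
{
	return inode->pid;
}

/* New shape: the file gives access to both the open mode and the inode. */
int mock_open_by_file(const struct mock_file *file, bool *wants_write)
{
	*wants_write = (file->f_mode & MOCK_FMODE_WRITE) != 0;
	return mock_open_by_inode(file->f_inode);
}
```

In this model a caller that used to pass `inode` simply passes `file` instead, which mirrors the one-line changes in the hunks above.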



2024-05-24 19:37:43

by Adrian Ratiu

Subject: [PATCH v4 2/2] proc: restrict /proc/pid/mem

Prior to v2.6.39 write access to /proc/<pid>/mem was restricted,
after which it got allowed in commit 198214a7ee50 ("proc: enable
writing to /proc/pid/mem"). Famous last words from that patch:
"no longer a security hazard". :)

Afterwards exploits started causing drama like [1]. The exploits
using /proc/*/mem can be rather sophisticated like [2] which
installed an arbitrary payload from noexec storage into a running
process then exec'd it, which itself could include an ELF loader
to run arbitrary code off noexec storage.

One of the well-known problems with /proc/*/mem writes is they
ignore page permissions via FOLL_FORCE, as opposed to writes via
process_vm_writev which respect page permissions. These writes can
also be used to bypass mode bits.

To harden against these types of attacks, distributions might want
to restrict /proc/pid/mem accesses, either entirely or partially,
e.g. to restrict FOLL_FORCE usage.

Known valid use-cases which still need these accesses are:

* Debuggers which also have ptrace permissions, so they can access
memory anyway via PTRACE_POKEDATA & co. Some debuggers like GDB
are designed to write /proc/pid/mem for basic functionality.

* Container supervisors using the seccomp notifier to intercept
syscalls and rewrite memory of calling processes by passing
around /proc/pid/mem file descriptors.

There might be more, that's why these params default to disabled.

Regarding other mechanisms which can block these accesses:

* seccomp filters can be used to block mmap/mprotect calls with W|X
perms, but they often can't block open calls as daemons want to
read/write their runtime state and seccomp filters cannot check
file paths, so plain write calls can't be easily blocked.

* Since the mem file is part of the dynamic /proc/<pid>/ space, we
can't run chmod once at boot to restrict it (and trying to react
to every process and run chmod doesn't scale, and the kernel no
longer allows chmod on any of these paths).

* SELinux could be used with a rule to cover all /proc/*/mem files,
but even then having multiple ways to deny an attack is useful in
case one layer fails.

Thus we introduce four kernel parameters to restrict /proc/*/mem
access: open-read, open-write, write and foll_force. All these can
be independently set to the following values:

all => restrict all access unconditionally.
ptracer => restrict all access except for ptracer processes.

If left unset, the existing behaviour is preserved, i.e. access
is governed by basic file permissions.
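
As a rough user-space model of the semantics described above (a sketch only: `struct restrict_keys`, `restrict_check()` and `open_check()` are hypothetical names, and the real patch uses static keys rather than plain booleans), each parameter can be seen as two independent switches that are consulted in order:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one parameter: two independent switches. */
struct restrict_keys {
	bool all;     /* "all": deny everyone */
	bool ptracer; /* "ptracer": deny everyone except the ptracer */
};

/* Returns 0 if the access may proceed, -EACCES if it is restricted. */
int restrict_check(const struct restrict_keys *keys, bool is_ptracer)
{
	if (keys->all)
		return -EACCES;
	if (keys->ptracer && !is_ptracer)
		return -EACCES;
	return 0; /* unset: the existing permission checks still apply */
}

/* Model of the open() check: write mode is tested first, then read mode. */
int open_check(bool want_read, bool want_write,
	       const struct restrict_keys *open_read,
	       const struct restrict_keys *open_write,
	       bool is_ptracer)
{
	int ret;

	if (want_write) {
		ret = restrict_check(open_write, is_ptracer);
		if (ret)
			return ret;
	}
	if (want_read) {
		ret = restrict_check(open_read, is_ptracer);
		if (ret)
			return ret;
	}
	return 0;
}
```

A return value of 0 here only means that this layer does not object; in the patch the usual mm_access() check still runs afterwards.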

Examples which can be passed by bootloaders:

proc_mem.restrict_foll_force=all
proc_mem.restrict_open_write=ptracer
proc_mem.restrict_open_read=ptracer
proc_mem.restrict_write=all
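
How such a value is interpreted at boot can be modelled in user space roughly as follows. This is a sketch under assumptions: `struct restrict_keys` and `parse_restrict_param()` are hypothetical names, and plain booleans stand in for the static keys that the patch flips in its early_param handler.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the pair of static keys behind one parameter. */
struct restrict_keys {
	bool all;
	bool ptracer;
};

/*
 * Model of the value handling: a missing value is an error, "all" and
 * "ptracer" enable the matching switch, and any other string is accepted
 * without changing anything.
 */
int parse_restrict_param(const char *buf, struct restrict_keys *keys)
{
	if (!buf)
		return -EINVAL;

	if (strcmp(buf, "all") == 0)
		keys->all = true;
	else if (strcmp(buf, "ptracer") == 0)
		keys->ptracer = true;

	return 0;
}
```

One consequence worth noting in this model is that an unrecognised value such as a misspelled "ptracer" leaves the restriction disabled without reporting an error.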

These knobs can also be enabled via Kconfig, e.g.:

CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT=y
CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT=y

Each distribution needs to decide what restrictions to apply,
depending on its use-cases. Embedded systems might want to do
more, while general-purpose distros might want a more relaxed
policy, because e.g. foll_force=all and write=all both break
GDB, so it might be a bit excessive.

Based on an initial patch by Mike Frysinger <[email protected]>.

Link: https://lwn.net/Articles/476947/ [1]
Link: https://issues.chromium.org/issues/40089045 [2]
Cc: Guenter Roeck <[email protected]>
Cc: Doug Anderson <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Christian Brauner <[email protected]>
Co-developed-by: Mike Frysinger <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Adrian Ratiu <[email protected]>
---
Changes in v4:
* Renamed parameters to use a fake namespace and respect
subject-verb-object pattern (e.g. proc_mem.restrict_read)
* Replaced static key array with individual definitions.
Still need 6 key definitions because we need to store 3
states for each parameter, eg read all/ptrace/DAC states,
so we need 2 keys for each parameter -- they will not fit
into just 1 static key.
* Replaced strncmp -> strcmp and dropped redundant helper,
significantly simplified DEFINE_EARLY_PROC_MEM_RESTRICT
macro.
* Dropped else from __mem_open_check_access_restriction()
* Moved ptracer check to proc_mem_open to avoid ToCToU
* Added extra mm_access() check for the mem_rw() case
* Found a use case for blocking just writes independent
of open restrictions, so added a new param
* Added *_DEFAULT Kconfigs
---
.../admin-guide/kernel-parameters.txt | 38 ++++++
fs/proc/base.c | 124 +++++++++++++++++-
security/Kconfig | 68 ++++++++++
3 files changed, 229 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 500cfa776225..3fdfeaefccf2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4792,6 +4792,44 @@
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)

+ proc_mem.restrict_foll_force= [KNL]
+ Format: {all | ptracer}
+ Restricts the use of the FOLL_FORCE flag for /proc/*/mem access.
+ If restricted, the FOLL_FORCE flag will not be added to vm accesses.
+ Can be one of:
+ - 'all' restricts all access unconditionally.
+ - 'ptracer' allows access only for ptracer processes.
+ If not specified, FOLL_FORCE is always used.
+
+ proc_mem.restrict_open_read= [KNL]
+ Format: {all | ptracer}
+ Allows restricting read access to /proc/*/mem files during open().
+ Depending on restriction level, open for reads return -EACCES.
+ Can be one of:
+ - 'all' restricts all access unconditionally.
+ - 'ptracer' allows access only for ptracer processes.
+ If not specified, then basic file permissions continue to apply.
+
+ proc_mem.restrict_open_write= [KNL]
+ Format: {all | ptracer}
+ Allows restricting write access to /proc/*/mem files during open().
+ Depending on restriction level, open for writes return -EACCES.
+ Can be one of:
+ - 'all' restricts all access unconditionally.
+ - 'ptracer' allows access only for ptracer processes.
+ If not specified, then basic file permissions continue to apply.
+
+ proc_mem.restrict_write= [KNL]
+ Format: {all | ptracer}
+ Allows restricting write access to /proc/*/mem after the files
+ have been opened, during the actual write calls. This is useful for
+ systems which can't block writes earlier during open().
+ Depending on restriction level, writes will return -EACCES.
+ Can be one of:
+ - 'all' restricts all access unconditionally.
+ - 'ptracer' allows access only for ptracer processes.
+ If not specified, then basic file permissions continue to apply.
+
processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6faf1b3a4117..9223eaaf055b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -152,6 +152,30 @@ struct pid_entry {
NULL, &proc_pid_attr_operations, \
{ .lsmid = LSMID })

+#define DEFINE_EARLY_PROC_MEM_RESTRICT(CFG, name) \
+DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_PROC_MEM_RESTRICT_##CFG##_DEFAULT, \
+ proc_mem_restrict_##name##_all); \
+DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_PROC_MEM_RESTRICT_##CFG##_PTRACE_DEFAULT, \
+ proc_mem_restrict_##name##_ptracer); \
+ \
+static int __init early_proc_mem_restrict_##name(char *buf) \
+{ \
+ if (!buf) \
+ return -EINVAL; \
+ \
+ if (strcmp(buf, "all") == 0) \
+ static_key_slow_inc(&proc_mem_restrict_##name##_all.key); \
+ else if (strcmp(buf, "ptracer") == 0) \
+ static_key_slow_inc(&proc_mem_restrict_##name##_ptracer.key); \
+ return 0; \
+} \
+early_param("proc_mem.restrict_" #name, early_proc_mem_restrict_##name)
+
+DEFINE_EARLY_PROC_MEM_RESTRICT(OPEN_READ, open_read);
+DEFINE_EARLY_PROC_MEM_RESTRICT(OPEN_WRITE, open_write);
+DEFINE_EARLY_PROC_MEM_RESTRICT(WRITE, write);
+DEFINE_EARLY_PROC_MEM_RESTRICT(FOLL_FORCE, foll_force);
+
/*
* Count the number of hardlinks for the pid_entry table, excluding the .
* and .. links.
@@ -794,12 +818,56 @@ static const struct file_operations proc_single_file_operations = {
};


+static int __mem_open_access_permitted(struct file *file, struct task_struct *task)
+{
+ bool is_ptracer;
+
+ rcu_read_lock();
+ is_ptracer = current == ptrace_parent(task);
+ rcu_read_unlock();
+
+ if (file->f_mode & FMODE_WRITE) {
+ /* Deny if writes are unconditionally disabled via param */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT,
+ &proc_mem_restrict_open_write_all))
+ return -EACCES;
+
+ /* Deny if writes are allowed only for ptracers via param */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT,
+ &proc_mem_restrict_open_write_ptracer) &&
+ !is_ptracer)
+ return -EACCES;
+ }
+
+ if (file->f_mode & FMODE_READ) {
+ /* Deny if reads are unconditionally disabled via param */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_DEFAULT,
+ &proc_mem_restrict_open_read_all))
+ return -EACCES;
+
+ /* Deny if reads are allowed only for ptracers via param */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT,
+ &proc_mem_restrict_open_read_ptracer) &&
+ !is_ptracer)
+ return -EACCES;
+ }
+
+ return 0; /* R/W are not restricted */
+}
+
struct mm_struct *proc_mem_open(struct file *file, unsigned int mode)
{
struct task_struct *task = get_proc_task(file->f_inode);
struct mm_struct *mm = ERR_PTR(-ESRCH);
+ int ret;

if (task) {
+ ret = __mem_open_access_permitted(file, task);
+ if (ret) {
+ put_task_struct(task);
+ return ERR_PTR(ret);
+ }
+
mm = mm_access(task, mode | PTRACE_MODE_FSCREDS);
put_task_struct(task);

@@ -835,6 +903,56 @@ static int mem_open(struct inode *inode, struct file *file)
return ret;
}

+static bool __mem_rw_current_is_ptracer(struct file *file)
+{
+ struct inode *inode = file_inode(file);
+ struct task_struct *task = get_proc_task(inode);
+ int is_ptracer = false, has_mm_access = false;
+
+ if (task) {
+ rcu_read_lock();
+ is_ptracer = current == ptrace_parent(task);
+ rcu_read_unlock();
+
+ has_mm_access = file->private_data == mm_access(task, PTRACE_MODE_READ_FSCREDS);
+ put_task_struct(task);
+ }
+
+ return is_ptracer && has_mm_access;
+}
+
+static unsigned int __mem_rw_get_foll_force_flag(struct file *file)
+{
+ /* Deny if FOLL_FORCE is disabled via param */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT,
+ &proc_mem_restrict_foll_force_all))
+ return 0;
+
+ /* Deny if FOLL_FORCE is allowed only for ptracers via param */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT,
+ &proc_mem_restrict_foll_force_ptracer) &&
+ !__mem_rw_current_is_ptracer(file))
+ return 0;
+
+ return FOLL_FORCE;
+}
+
+static bool __mem_rw_block_writes(struct file *file)
+{
+ /* Block if writes are disabled via param proc_mem.restrict_write=all */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_DEFAULT,
+ &proc_mem_restrict_write_all))
+ return true;
+
+ /* Block with an exception only for ptracers */
+ if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT,
+ &proc_mem_restrict_write_ptracer) &&
+ !__mem_rw_current_is_ptracer(file))
+ return true;
+
+ return false;
+}
+
static ssize_t mem_rw(struct file *file, char __user *buf,
size_t count, loff_t *ppos, int write)
{
@@ -847,6 +965,9 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
if (!mm)
return 0;

+ if (write && __mem_rw_block_writes(file))
+ return -EACCES;
+
page = (char *)__get_free_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
@@ -855,7 +976,8 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
if (!mmget_not_zero(mm))
goto free;

- flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
+ flags = (write ? FOLL_WRITE : 0);
+ flags |= __mem_rw_get_foll_force_flag(file);

while (count > 0) {
size_t this_len = min_t(size_t, count, PAGE_SIZE);
diff --git a/security/Kconfig b/security/Kconfig
index 412e76f1575d..0cd73f848b5a 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -183,6 +183,74 @@ config STATIC_USERMODEHELPER_PATH
If you wish for all usermode helper programs to be disabled,
specify an empty string here (i.e. "").

+menu "Procfs mem restriction options"
+
+config PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT
+ bool "Restrict all FOLL_FORCE flag usage"
+ default n
+ help
+ Restrict all FOLL_FORCE usage during /proc/*/mem RW.
+ Debuggerg like GDB require using FOLL_FORCE for basic
+ functionality.
+
+config PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT
+ bool "Restrict FOLL_FORCE usage except for ptracers"
+ default n
+ help
+ Restrict FOLL_FORCE usage during /proc/*/mem RW, except
+ for ptracer processes. Debuggerg like GDB require using
+ FOLL_FORCE for basic functionality.
+
+config PROC_MEM_RESTRICT_OPEN_READ_DEFAULT
+ bool "Restrict all open() read access"
+ default n
+ help
+ Restrict all open() read access to /proc/*/mem files.
+ Use with caution: this can break init systems, debuggers,
+ container supervisors and other tasks using /proc/*/mem.
+
+config PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT
+ bool "Restrict open() for reads except for ptracers"
+ default n
+ help
+ Restrict open() read access except for ptracer processes.
+ Use with caution: this can break init systems, debuggers,
+ container supervisors and other non-ptrace capable tasks
+ using /proc/*/mem.
+
+config PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT
+ bool "Restrict all open() write access"
+ default n
+ help
+ Restrict all open() write access to /proc/*/mem files.
+ Debuggers like GDB and some container supervisors tasks
+ require opening as RW and may break.
+
+config PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT
+ bool "Restrict open() for writes except for ptracers"
+ default n
+ help
+ Restrict open() write access except for ptracer processes,
+ usually debuggers.
+
+config PROC_MEM_RESTRICT_WRITE_DEFAULT
+ bool "Restrict all write() calls"
+ default n
+ help
+ Restrict all /proc/*/mem direct write calls.
+ Open calls with RW modes are still allowed, this blocks
+ just the write() calls.
+
+config PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT
+ bool "Restrict write() calls except for ptracers"
+ default n
+ help
+ Restrict /proc/*/mem direct write calls except for ptracer processes.
+ Open calls with RW modes are still allowed, this blocks just
+ the write() calls.
+
+endmenu
+
source "security/selinux/Kconfig"
source "security/smack/Kconfig"
source "security/tomoyo/Kconfig"
--
2.44.1
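
The read/write path in this patch can be summarised with a small user-space model. It is only a sketch under assumptions: the `MOCK_FOLL_*` values, `struct restrict_keys`, `mem_rw_flags()` and `mem_rw_block_write()` are hypothetical, and the single `is_ptracer` argument stands for the combined "current is the ptracer and mm_access() matches" condition computed by the patch.

```c
#include <stdbool.h>

/* Hypothetical flag values; the kernel's FOLL_* constants differ. */
#define MOCK_FOLL_WRITE 0x01u
#define MOCK_FOLL_FORCE 0x10u

/* Hypothetical stand-in for the pair of static keys behind one parameter. */
struct restrict_keys {
	bool all;
	bool ptracer;
};

/* FOLL_FORCE is added only when no foll_force restriction applies. */
unsigned int mem_rw_flags(bool write, const struct restrict_keys *foll_force,
			  bool is_ptracer)
{
	unsigned int flags = write ? MOCK_FOLL_WRITE : 0;

	if (foll_force->all)
		return flags;
	if (foll_force->ptracer && !is_ptracer)
		return flags;

	return flags | MOCK_FOLL_FORCE;
}

/* Writes are refused after open() when a write restriction applies. */
bool mem_rw_block_write(const struct restrict_keys *write_keys,
			bool is_ptracer)
{
	if (write_keys->all)
		return true;

	return write_keys->ptracer && !is_ptracer;
}
```

In this model a restricted foll_force does not fail the call; the access simply proceeds without the force flag, so page permissions are respected again.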


2024-05-24 21:06:31

by kernel test robot

Subject: Re: [PATCH v4 1/2] proc: pass file instead of inode to proc_mem_open

Hi Adrian,

kernel test robot noticed the following build errors:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on kees/for-next/kspp linus/master v6.9 next-20240523]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Adrian-Ratiu/proc-restrict-proc-pid-mem/20240525-033201
base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
patch link: https://lore.kernel.org/r/20240524192858.3206-1-adrian.ratiu%40collabora.com
patch subject: [PATCH v4 1/2] proc: pass file instead of inode to proc_mem_open
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20240525/[email protected]/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 7aa382fd7257d9bd4f7fc50bb7078a3c26a1628c)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240525/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

In file included from fs/proc/task_nommu.c:3:
In file included from include/linux/mm.h:2208:
include/linux/vmstat.h:522:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
522 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
>> fs/proc/task_nommu.c:262:27: error: incompatible pointer types passing 'struct inode *' to parameter of type 'struct file *' [-Werror,-Wincompatible-pointer-types]
262 | priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
| ^~~~~
fs/proc/internal.h:298:46: note: passing argument to parameter 'file' here
298 | struct mm_struct *proc_mem_open(struct file *file, unsigned int mode);
| ^
1 warning and 1 error generated.


vim +262 fs/proc/task_nommu.c

b76437579d1344 Siddhesh Poyarekar 2012-03-21 251
b76437579d1344 Siddhesh Poyarekar 2012-03-21 252 static int maps_open(struct inode *inode, struct file *file,
b76437579d1344 Siddhesh Poyarekar 2012-03-21 253 const struct seq_operations *ops)
662795deb854b3 Eric W. Biederman 2006-06-26 254 {
dbf8685c8e2140 David Howells 2006-09-27 255 struct proc_maps_private *priv;
dbf8685c8e2140 David Howells 2006-09-27 256
27692cd56e2aa6 Oleg Nesterov 2014-10-09 257 priv = __seq_open_private(file, ops, sizeof(*priv));
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 258 if (!priv)
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 259 return -ENOMEM;
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 260
2c03376d2db005 Oleg Nesterov 2014-10-09 261 priv->inode = inode;
27692cd56e2aa6 Oleg Nesterov 2014-10-09 @262 priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
27692cd56e2aa6 Oleg Nesterov 2014-10-09 263 if (IS_ERR(priv->mm)) {
27692cd56e2aa6 Oleg Nesterov 2014-10-09 264 int err = PTR_ERR(priv->mm);
27692cd56e2aa6 Oleg Nesterov 2014-10-09 265
27692cd56e2aa6 Oleg Nesterov 2014-10-09 266 seq_release_private(inode, file);
27692cd56e2aa6 Oleg Nesterov 2014-10-09 267 return err;
27692cd56e2aa6 Oleg Nesterov 2014-10-09 268 }
27692cd56e2aa6 Oleg Nesterov 2014-10-09 269
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 270 return 0;
662795deb854b3 Eric W. Biederman 2006-06-26 271 }
662795deb854b3 Eric W. Biederman 2006-06-26 272

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-05-24 21:58:41

by kernel test robot

Subject: Re: [PATCH v4 1/2] proc: pass file instead of inode to proc_mem_open

Hi Adrian,

kernel test robot noticed the following build errors:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on kees/for-next/kspp linus/master v6.9 next-20240523]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Adrian-Ratiu/proc-restrict-proc-pid-mem/20240525-033201
base: https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
patch link: https://lore.kernel.org/r/20240524192858.3206-1-adrian.ratiu%40collabora.com
patch subject: [PATCH v4 1/2] proc: pass file instead of inode to proc_mem_open
config: m68k-allnoconfig (https://download.01.org/0day-ci/archive/20240525/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240525/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

fs/proc/task_nommu.c: In function 'maps_open':
>> fs/proc/task_nommu.c:262:34: error: passing argument 1 of 'proc_mem_open' from incompatible pointer type [-Werror=incompatible-pointer-types]
262 | priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
| ^~~~~
| |
| struct inode *
In file included from fs/proc/task_nommu.c:13:
fs/proc/internal.h:298:46: note: expected 'struct file *' but argument is of type 'struct inode *'
298 | struct mm_struct *proc_mem_open(struct file *file, unsigned int mode);
| ~~~~~~~~~~~~~^~~~
cc1: some warnings being treated as errors


vim +/proc_mem_open +262 fs/proc/task_nommu.c

b76437579d1344 Siddhesh Poyarekar 2012-03-21 251
b76437579d1344 Siddhesh Poyarekar 2012-03-21 252 static int maps_open(struct inode *inode, struct file *file,
b76437579d1344 Siddhesh Poyarekar 2012-03-21 253 const struct seq_operations *ops)
662795deb854b3 Eric W. Biederman 2006-06-26 254 {
dbf8685c8e2140 David Howells 2006-09-27 255 struct proc_maps_private *priv;
dbf8685c8e2140 David Howells 2006-09-27 256
27692cd56e2aa6 Oleg Nesterov 2014-10-09 257 priv = __seq_open_private(file, ops, sizeof(*priv));
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 258 if (!priv)
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 259 return -ENOMEM;
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 260
2c03376d2db005 Oleg Nesterov 2014-10-09 261 priv->inode = inode;
27692cd56e2aa6 Oleg Nesterov 2014-10-09 @262 priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
27692cd56e2aa6 Oleg Nesterov 2014-10-09 263 if (IS_ERR(priv->mm)) {
27692cd56e2aa6 Oleg Nesterov 2014-10-09 264 int err = PTR_ERR(priv->mm);
27692cd56e2aa6 Oleg Nesterov 2014-10-09 265
27692cd56e2aa6 Oleg Nesterov 2014-10-09 266 seq_release_private(inode, file);
27692cd56e2aa6 Oleg Nesterov 2014-10-09 267 return err;
27692cd56e2aa6 Oleg Nesterov 2014-10-09 268 }
27692cd56e2aa6 Oleg Nesterov 2014-10-09 269
ce34fddb5bafb4 Oleg Nesterov 2014-10-09 270 return 0;
662795deb854b3 Eric W. Biederman 2006-06-26 271 }
662795deb854b3 Eric W. Biederman 2006-06-26 272

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-05-25 10:37:08

by Randy Dunlap

Subject: Re: [PATCH v4 2/2] proc: restrict /proc/pid/mem

Hi--

On 5/24/24 12:28 PM, Adrian Ratiu wrote:
> diff --git a/security/Kconfig b/security/Kconfig
> index 412e76f1575d..0cd73f848b5a 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -183,6 +183,74 @@ config STATIC_USERMODEHELPER_PATH
> If you wish for all usermode helper programs to be disabled,
> specify an empty string here (i.e. "").
>
> +menu "Procfs mem restriction options"
> +
> +config PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT
> + bool "Restrict all FOLL_FORCE flag usage"
> + default n
> + help
> + Restrict all FOLL_FORCE usage during /proc/*/mem RW.
> + Debuggerg like GDB require using FOLL_FORCE for basic

Debuggers

> + functionality.
> +
> +config PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT
> + bool "Restrict FOLL_FORCE usage except for ptracers"
> + default n
> + help
> + Restrict FOLL_FORCE usage during /proc/*/mem RW, except
> + for ptracer processes. Debuggerg like GDB require using

Debuggers

> + FOLL_FORCE for basic functionality.
> +
> +config PROC_MEM_RESTRICT_OPEN_READ_DEFAULT
> + bool "Restrict all open() read access"
> + default n
> + help
> + Restrict all open() read access to /proc/*/mem files.
> + Use with caution: this can break init systems, debuggers,
> + container supervisors and other tasks using /proc/*/mem.
> +
> +config PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT
> + bool "Restrict open() for reads except for ptracers"
> + default n
> + help
> + Restrict open() read access except for ptracer processes.
> + Use with caution: this can break init systems, debuggers,
> + container supervisors and other non-ptrace capable tasks
> + using /proc/*/mem.
> +
> +config PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT
> + bool "Restrict all open() write access"
> + default n
> + help
> + Restrict all open() write access to /proc/*/mem files.
> + Debuggers like GDB and some container supervisors tasks
> + require opening as RW and may break.
> +
> +config PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT
> + bool "Restrict open() for writes except for ptracers"
> + default n
> + help
> + Restrict open() write access except for ptracer processes,
> + usually debuggers.
> +
> +config PROC_MEM_RESTRICT_WRITE_DEFAULT
> + bool "Restrict all write() calls"
> + default n
> + help
> + Restrict all /proc/*/mem direct write calls.
> + Open calls with RW modes are still allowed, this blocks
> + just the write() calls.
> +
> +config PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT
> + bool "Restrict write() calls except for ptracers"
> + default n
> + help
> + Restrict /proc/*/mem direct write calls except for ptracer processes.
> + Open calls with RW modes are still allowed, this blocks just
> + the write() calls.
> +
> +endmenu

--
#Randy
https://people.kernel.org/tglx/notes-about-netiquette
https://subspace.kernel.org/etiquette.html

2024-05-27 11:21:54

by Adrian Ratiu

Subject: Re: [PATCH v4 2/2] proc: restrict /proc/pid/mem

On Saturday, May 25, 2024 08:49 EEST, Randy Dunlap <[email protected]> wrote:

> Hi--
>
> On 5/24/24 12:28 PM, Adrian Ratiu wrote:
> > diff --git a/security/Kconfig b/security/Kconfig
> > index 412e76f1575d..0cd73f848b5a 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -183,6 +183,74 @@ config STATIC_USERMODEHELPER_PATH
> > If you wish for all usermode helper programs to be disabled,
> > specify an empty string here (i.e. "").
> >
> > +menu "Procfs mem restriction options"
> > +
> > +config PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT
> > + bool "Restrict all FOLL_FORCE flag usage"
> > + default n
> > + help
> > + Restrict all FOLL_FORCE usage during /proc/*/mem RW.
> > + Debuggerg like GDB require using FOLL_FORCE for basic
>
> Debuggers

Hello and thank you for the feedback!

I'll fix these typos in a v5 together with the kernel test robot failures.

I'll give v4 a bit more time in case other people have more feedback,
so I can address them all in one go.


2024-05-31 21:16:48

by Kees Cook

Subject: Re: [PATCH v4 1/2] proc: pass file instead of inode to proc_mem_open

On Fri, May 24, 2024 at 10:28:57PM +0300, Adrian Ratiu wrote:
> The file struct is required in proc_mem_open() so its
> f_mode can be checked when deciding whether to allow or
> deny /proc/*/mem open requests via the new read/write
> and foll_force restriction mechanism.
>
> Thus instead of directly passing the inode to the function,
> we pass the file and get the inode inside it.
>
> Cc: Jann Horn <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Christian Brauner <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>

With the nommu errors pointed out by 0day fixed:

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2024-05-31 21:29:47

by Kees Cook

Subject: Re: [PATCH v4 2/2] proc: restrict /proc/pid/mem

On Fri, May 24, 2024 at 10:28:58PM +0300, Adrian Ratiu wrote:
> Prior to v2.6.39 write access to /proc/<pid>/mem was restricted,
> after which it got allowed in commit 198214a7ee50 ("proc: enable
> writing to /proc/pid/mem"). Famous last words from that patch:
> "no longer a security hazard". :)
>
> Afterwards exploits started causing drama like [1]. The exploits
> using /proc/*/mem can be rather sophisticated like [2] which
> installed an arbitrary payload from noexec storage into a running
> process then exec'd it, which itself could include an ELF loader
> to run arbitrary code off noexec storage.
>
> One of the well-known problems with /proc/*/mem writes is they
> ignore page permissions via FOLL_FORCE, as opposed to writes via
> process_vm_writev which respect page permissions. These writes can
> also be used to bypass mode bits.
>
> To harden against these types of attacks, distributions might want
> to restrict /proc/pid/mem accesses, either entirely or partially,
> e.g. to restrict FOLL_FORCE usage.
>
> Known valid use-cases which still need these accesses are:
>
> * Debuggers which also have ptrace permissions, so they can access
> memory anyway via PTRACE_POKEDATA & co. Some debuggers like GDB
> are designed to write /proc/pid/mem for basic functionality.
>
> * Container supervisors using the seccomp notifier to intercept
> syscalls and rewrite memory of calling processes by passing
> around /proc/pid/mem file descriptors.
>
> There might be more, that's why these params default to disabled.
>
> Regarding other mechanisms which can block these accesses:
>
> * seccomp filters can be used to block mmap/mprotect calls with W|X
> perms, but they often can't block open calls as daemons want to
> read/write their runtime state and seccomp filters cannot check
> file paths, so plain write calls can't be easily blocked.
>
> * Since the mem file is part of the dynamic /proc/<pid>/ space, we
> can't run chmod once at boot to restrict it (and trying to react
> to every process and run chmod doesn't scale, and the kernel no
> longer allows chmod on any of these paths).
>
> * SELinux could be used with a rule to cover all /proc/*/mem files,
> but even then having multiple ways to deny an attack is useful in
> case one layer fails.
>
> Thus we introduce four kernel parameters to restrict /proc/*/mem
> access: open-read, open-write, write and foll_force. All these can
> be independently set to the following values:
>
> all => restrict all access unconditionally.
> ptracer => restrict all access except for ptracer processes.
>
> If left unset, the existing behaviour is preserved, i.e. access
> is governed by basic file permissions.
>
> Examples which can be passed by bootloaders:
>
> proc_mem.restrict_foll_force=all
> proc_mem.restrict_open_write=ptracer
> proc_mem.restrict_open_read=ptracer
> proc_mem.restrict_write=all
>
> These knobs can also be enabled via Kconfig, e.g.:
>
> CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT=y
> CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT=y
>
> Each distribution needs to decide what restrictions to apply,
> depending on its use-cases. Embedded systems might want to do
> more, while general-purpose distros might want a more relaxed
> policy, because e.g. foll_force=all and write=all both break
> GDB, so it might be a bit excessive.
>
> Based on an initial patch by Mike Frysinger <[email protected]>.
>
> Link: https://lwn.net/Articles/476947/ [1]
> Link: https://issues.chromium.org/issues/40089045 [2]
> Cc: Guenter Roeck <[email protected]>
> Cc: Doug Anderson <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Jann Horn <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: Christian Brauner <[email protected]>
> Co-developed-by: Mike Frysinger <[email protected]>
> Signed-off-by: Mike Frysinger <[email protected]>
> Signed-off-by: Adrian Ratiu <[email protected]>
> ---
> Changes in v4:
> * Renamed parameters to use a fake namespace and respect
> subject-verb-object pattern (eg proc_mem.restrict_read)
> * Replaced static key array with individual definitions.
> Still need 6 key definitions because we need to store 3
> states for each parameter, eg read all/ptrace/DAC states,
> so we need 2 keys for each parameter -- they will not fit
> into just 1 static key.
> * Replaced strncmp -> strcmp and dropped redundant helper,
> significantly simplified DEFINE_EARLY_PROC_MEM_RESTRICT
> macro.
> * Dropped else from __mem_open_check_access_restriction()
> * Moved ptracer check to proc_mem_open to avoid ToCToU
> * Added extra mm_access() check for the mem_rw() case
> * Found a use case for blocking just writes independent
> of open restrictions, so added a new param
> * Added *_DEFAULT Kconfigs
> ---
> .../admin-guide/kernel-parameters.txt | 38 ++++++
> fs/proc/base.c | 124 +++++++++++++++++-
> security/Kconfig | 68 ++++++++++
> 3 files changed, 229 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 500cfa776225..3fdfeaefccf2 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4792,6 +4792,44 @@
> printk.time= Show timing data prefixed to each printk message line
> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>
> + proc_mem.restrict_foll_force= [KNL]
> + Format: {all | ptracer}
> + Restricts the use of the FOLL_FORCE flag for /proc/*/mem access.
> + If restricted, the FOLL_FORCE flag will not be added to vm accesses.
> + Can be one of:
> + - 'all' restricts all access unconditionally.
> + - 'ptracer' allows access only for ptracer processes.
> + If not specified, FOLL_FORCE is always used.
> +
> + proc_mem.restrict_open_read= [KNL]
> + Format: {all | ptracer}
> + Allows restricting read access to /proc/*/mem files during open().
> + Depending on restriction level, open for reads return -EACCES.
> + Can be one of:
> + - 'all' restricts all access unconditionally.
> + - 'ptracer' allows access only for ptracer processes.
> + If not specified, then basic file permissions continue to apply.
> +
> + proc_mem.restrict_open_write= [KNL]
> + Format: {all | ptracer}
> + Allows restricting write access to /proc/*/mem files during open().
> + Depending on restriction level, open for writes return -EACCES.
> + Can be one of:
> + - 'all' restricts all access unconditionally.
> + - 'ptracer' allows access only for ptracer processes.
> + If not specified, then basic file permissions continue to apply.
> +
> + proc_mem.restrict_write= [KNL]
> + Format: {all | ptracer}
> + Allows restricting write access to /proc/*/mem after the files
> + have been opened, during the actual write calls. This is useful for
> + systems which can't block writes earlier during open().
> + Depending on restriction level, writes will return -EACCES.
> + Can be one of:
> + - 'all' restricts all access unconditionally.
> + - 'ptracer' allows access only for ptracer processes.
> + If not specified, then basic file permissions continue to apply.
> +
> processor.max_cstate= [HW,ACPI]
> Limit processor to maximum C-state
> max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 6faf1b3a4117..9223eaaf055b 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -152,6 +152,30 @@ struct pid_entry {
> NULL, &proc_pid_attr_operations, \
> { .lsmid = LSMID })
>
> +#define DEFINE_EARLY_PROC_MEM_RESTRICT(CFG, name) \
> +DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_PROC_MEM_RESTRICT_##CFG##_DEFAULT, \
> + proc_mem_restrict_##name##_all); \
> +DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_PROC_MEM_RESTRICT_##CFG##_PTRACE_DEFAULT, \
> + proc_mem_restrict_##name##_ptracer); \
> + \
> +static int __init early_proc_mem_restrict_##name(char *buf) \
> +{ \
> + if (!buf) \
> + return -EINVAL; \
> + \
> + if (strcmp(buf, "all") == 0) \
> + static_key_slow_inc(&proc_mem_restrict_##name##_all.key); \
> + else if (strcmp(buf, "ptracer") == 0) \
> + static_key_slow_inc(&proc_mem_restrict_##name##_ptracer.key); \
> + return 0; \
> +} \
> +early_param("proc_mem.restrict_" #name, early_proc_mem_restrict_##name)
> +
> +DEFINE_EARLY_PROC_MEM_RESTRICT(OPEN_READ, open_read);
> +DEFINE_EARLY_PROC_MEM_RESTRICT(OPEN_WRITE, open_write);
> +DEFINE_EARLY_PROC_MEM_RESTRICT(WRITE, write);
> +DEFINE_EARLY_PROC_MEM_RESTRICT(FOLL_FORCE, foll_force);
> +
> /*
> * Count the number of hardlinks for the pid_entry table, excluding the .
> * and .. links.
> @@ -794,12 +818,56 @@ static const struct file_operations proc_single_file_operations = {
> };
>
>
> +static int __mem_open_access_permitted(struct file *file, struct task_struct *task)
> +{
> + bool is_ptracer;
> +
> + rcu_read_lock();
> + is_ptracer = current == ptrace_parent(task);
> + rcu_read_unlock();
> +
> + if (file->f_mode & FMODE_WRITE) {
> + /* Deny if writes are unconditionally disabled via param */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT,
> + &proc_mem_restrict_open_write_all))
> + return -EACCES;
> +
> + /* Deny if writes are allowed only for ptracers via param */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT,
> + &proc_mem_restrict_open_write_ptracer) &&
> + !is_ptracer)
> + return -EACCES;
> + }
> +
> + if (file->f_mode & FMODE_READ) {
> + /* Deny if reads are unconditionally disabled via param */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_DEFAULT,
> + &proc_mem_restrict_open_read_all))
> + return -EACCES;
> +
> + /* Deny if reads are allowed only for ptracers via param */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT,
> + &proc_mem_restrict_open_read_ptracer) &&
> + !is_ptracer)
> + return -EACCES;
> + }
> +
> + return 0; /* R/W are not restricted */
> +}
> +
> struct mm_struct *proc_mem_open(struct file *file, unsigned int mode)
> {
> struct task_struct *task = get_proc_task(file->f_inode);
> struct mm_struct *mm = ERR_PTR(-ESRCH);
> + int ret;
>
> if (task) {
> + ret = __mem_open_access_permitted(file, task);
> + if (ret) {
> + put_task_struct(task);
> + return ERR_PTR(ret);
> + }
> +
> mm = mm_access(task, mode | PTRACE_MODE_FSCREDS);
> put_task_struct(task);
>
> @@ -835,6 +903,56 @@ static int mem_open(struct inode *inode, struct file *file)
> return ret;
> }
>
> +static bool __mem_rw_current_is_ptracer(struct file *file)
> +{
> + struct inode *inode = file_inode(file);
> + struct task_struct *task = get_proc_task(inode);
> + int is_ptracer = false, has_mm_access = false;
> +
> + if (task) {
> + rcu_read_lock();
> + is_ptracer = current == ptrace_parent(task);
> + rcu_read_unlock();
> +
> + has_mm_access = file->private_data == mm_access(task, PTRACE_MODE_READ_FSCREDS);
> + put_task_struct(task);
> + }
> +
> + return is_ptracer && has_mm_access;
> +}

This is much improved; thanks!

One resource leak is here, though: mm_access() takes a reference count
on the mm, so you'll need something like:


	...
	if (task) {
		struct mm_struct *mm;

		rcu_read_lock();
		is_ptracer = current == ptrace_parent(task);
		rcu_read_unlock();

		mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
		if (mm && file->private_data == mm) {
			has_mm_access = true;
			mmput(mm);
		}
		put_task_struct(task);
	}
	...
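
The reference-counting concern above can be modelled in plain userspace C.
This is only a sketch under stated assumptions, not kernel code: struct
mm_model, mm_access_model() and mmput_model() are hypothetical stand-ins
for struct mm_struct, mm_access() and mmput(), and the model folds every
failure into NULL, whereas the real mm_access() can also return an
ERR_PTR() value that a caller would need to test with IS_ERR_OR_NULL().
The variation shown drops the temporary reference on every path where one
was taken, including the case where the looked-up mm does not match the
one stored at open time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct: only a reference count. */
struct mm_model {
	int users;
};

/*
 * Hypothetical model of mm_access(): on success it returns the mm with
 * one extra reference held; on failure it returns NULL.
 */
static struct mm_model *mm_access_model(struct mm_model *mm, bool allowed)
{
	if (!mm || !allowed)
		return NULL;
	mm->users++;
	return mm;
}

/* Hypothetical model of mmput(): drops one reference. */
static void mmput_model(struct mm_model *mm)
{
	mm->users--;
}

/*
 * Compare the freshly looked-up mm with the one saved at open time
 * (file->private_data in the patch), then release the temporary
 * reference whether or not the two match.
 */
static bool has_mm_access_model(struct mm_model *opened,
				struct mm_model *task_mm, bool allowed)
{
	struct mm_model *mm = mm_access_model(task_mm, allowed);
	bool match;

	if (!mm)
		return false;

	match = (mm == opened);
	mmput_model(mm);	/* balance the reference taken above */
	return match;
}
```

In this model the reference count of the looked-up mm is the same before
and after each call, which is the property the review comment is asking
for.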


> +
> +static unsigned int __mem_rw_get_foll_force_flag(struct file *file)
> +{
> + /* Deny if FOLL_FORCE is disabled via param */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT,
> + &proc_mem_restrict_foll_force_all))
> + return 0;
> +
> + /* Deny if FOLL_FORCE is allowed only for ptracers via param */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT,
> + &proc_mem_restrict_foll_force_ptracer) &&
> + !__mem_rw_current_is_ptracer(file))
> + return 0;
> +
> + return FOLL_FORCE;
> +}
> +
> +static bool __mem_rw_block_writes(struct file *file)
> +{
> + /* Block if writes are disabled via param proc_mem.restrict_write=all */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_DEFAULT,
> + &proc_mem_restrict_write_all))
> + return true;
> +
> + /* Block with an exception only for ptracers */
> + if (static_branch_maybe(CONFIG_PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT,
> + &proc_mem_restrict_write_ptracer) &&
> + !__mem_rw_current_is_ptracer(file))
> + return true;
> +
> + return false;
> +}
> +
> static ssize_t mem_rw(struct file *file, char __user *buf,
> size_t count, loff_t *ppos, int write)
> {
> @@ -847,6 +965,9 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
> if (!mm)
> return 0;
>
> + if (write && __mem_rw_block_writes(file))
> + return -EACCES;
> +
> page = (char *)__get_free_page(GFP_KERNEL);
> if (!page)
> return -ENOMEM;
> @@ -855,7 +976,8 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
> if (!mmget_not_zero(mm))
> goto free;
>
> - flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
> + flags = (write ? FOLL_WRITE : 0);
> + flags |= __mem_rw_get_foll_force_flag(file);
>
> while (count > 0) {
> size_t this_len = min_t(size_t, count, PAGE_SIZE);
> diff --git a/security/Kconfig b/security/Kconfig
> index 412e76f1575d..0cd73f848b5a 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -183,6 +183,74 @@ config STATIC_USERMODEHELPER_PATH
> If you wish for all usermode helper programs to be disabled,
> specify an empty string here (i.e. "").
>
> +menu "Procfs mem restriction options"
> +
> +config PROC_MEM_RESTRICT_FOLL_FORCE_DEFAULT
> + bool "Restrict all FOLL_FORCE flag usage"
> + default n
> + help
> + Restrict all FOLL_FORCE usage during /proc/*/mem RW.
> + Debuggers like GDB require using FOLL_FORCE for basic
> + functionality.
> +
> +config PROC_MEM_RESTRICT_FOLL_FORCE_PTRACE_DEFAULT
> + bool "Restrict FOLL_FORCE usage except for ptracers"
> + default n
> + help
> + Restrict FOLL_FORCE usage during /proc/*/mem RW, except
> + for ptracer processes. Debuggers like GDB require using
> + FOLL_FORCE for basic functionality.
> +
> +config PROC_MEM_RESTRICT_OPEN_READ_DEFAULT
> + bool "Restrict all open() read access"
> + default n
> + help
> + Restrict all open() read access to /proc/*/mem files.
> + Use with caution: this can break init systems, debuggers,
> + container supervisors and other tasks using /proc/*/mem.
> +
> +config PROC_MEM_RESTRICT_OPEN_READ_PTRACE_DEFAULT
> + bool "Restrict open() for reads except for ptracers"
> + default n
> + help
> + Restrict open() read access except for ptracer processes.
> + Use with caution: this can break init systems, debuggers,
> + container supervisors and other non-ptrace capable tasks
> + using /proc/*/mem.
> +
> +config PROC_MEM_RESTRICT_OPEN_WRITE_DEFAULT
> + bool "Restrict all open() write access"
> + default n
> + help
> + Restrict all open() write access to /proc/*/mem files.
> + Debuggers like GDB and some container supervisor tasks
> + require opening as RW and may break.
> +
> +config PROC_MEM_RESTRICT_OPEN_WRITE_PTRACE_DEFAULT
> + bool "Restrict open() for writes except for ptracers"
> + default n
> + help
> + Restrict open() write access except for ptracer processes,
> + usually debuggers.
> +
> +config PROC_MEM_RESTRICT_WRITE_DEFAULT
> + bool "Restrict all write() calls"
> + default n
> + help
> + Restrict all /proc/*/mem direct write calls.
> + Open calls with RW modes are still allowed, this blocks
> + just the write() calls.
> +
> +config PROC_MEM_RESTRICT_WRITE_PTRACE_DEFAULT
> + bool "Restrict write() calls except for ptracers"
> + default n
> + help
> + Restrict /proc/*/mem direct write calls except for ptracer processes.
> + Open calls with RW modes are still allowed, this blocks just
> + the write() calls.
> +
> +endmenu
> +
> source "security/selinux/Kconfig"
> source "security/smack/Kconfig"
> source "security/tomoyo/Kconfig"
> --
> 2.44.1

I think this looks really close.

--
Kees Cook