2009-10-09 04:51:00

by KOSAKI Motohiro

[permalink] [raw]
Subject: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

ok, all lkml reviewer ack this patch.
then, I've switched to linux-api review.

Any comments?



==================================================
From: Timo Sirainen <[email protected]>

Currently glibc2 doesn't have setproctitle(3), so several userland
daemons attempt to emulate it by doing some brutal stack modifications.
This works most of the time, but it has problems. For example:

% ps -ef |grep avahi-daemon
avahi 1679 1 0 09:20 ? 00:00:00 avahi-daemon: running [kosadesk.local]

# cat /proc/1679/cmdline
avahi-daemon: running [kosadesk.local]

This looks good, but the process has also overwritten its environment
area and made the environ file useless:

# cat /proc/1679/environ
adesk.local]

Another problem is that the process title length is limited by the size of
the environment. Security conscious people try to avoid potential information
leaks by clearing most of the environment before running a daemon:

# env - MINIMUM_NEEDED_VAR=foo /path/to/daemon

The resulting environment size may be too small to fit the wanted process
titles.

This patch makes it possible for userspace to implement setproctitle()
cleanly. It adds a new PR_SET_PROCTITLE_AREA option for prctl(), which
updates task's mm_struct->arg_start and arg_end to the given area.

test_setproctitle.c
================================================
#define ERR(str) (perror(str), exit(1))

void settitle(char* title){
int err;

err = prctl(34, title, strlen(title)+1);
if (err < 0)
ERR("prctl ");
}

void main(void){
long i;
char buf[1024];

for (i = 0; i < 10000000000LL; i++){
sprintf(buf, "loooooooooooooooooooooooong string %d",i);
settitle(buf);
}
}
==================================================

Cc: Bryan Donlan <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Timo Sirainen <[email protected]>
---
fs/proc/base.c | 57 +++++++++++++++++++++++++++++-----------------
include/linux/mm_types.h | 2 +
include/linux/prctl.h | 4 +++
kernel/fork.c | 1 +
kernel/sys.c | 23 ++++++++++++++++++
5 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6f742f6..2f48440 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -255,32 +255,47 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer)
int res = 0;
unsigned int len;
struct mm_struct *mm = get_task_mm(task);
+ unsigned seq;
+
if (!mm)
goto out;
+
+ /* The process was not constructed yet? */
if (!mm->arg_end)
- goto out_mm; /* Shh! No looking before we're done */
+ goto out_mm;

- len = mm->arg_end - mm->arg_start;
-
- if (len > PAGE_SIZE)
- len = PAGE_SIZE;
-
- res = access_process_vm(task, mm->arg_start, buffer, len, 0);
-
- // If the nul at the end of args has been overwritten, then
- // assume application is using setproctitle(3).
- if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
- len = strnlen(buffer, res);
- if (len < res) {
- res = len;
- } else {
- len = mm->env_end - mm->env_start;
- if (len > PAGE_SIZE - res)
- len = PAGE_SIZE - res;
- res += access_process_vm(task, mm->env_start, buffer+res, len, 0);
+ do {
+ seq = read_seqbegin(&mm->arg_lock);
+
+ len = mm->arg_end - mm->arg_start;
+ if (len > PAGE_SIZE)
+ len = PAGE_SIZE;
+
+ res = access_process_vm(task, mm->arg_start, buffer, len, 0);
+
+ if (mm->arg_end != mm->env_start)
+ /* PR_SET_PROCTITLE_AREA used */
res = strnlen(buffer, res);
+ else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
+ /*
+ * If the nul at the end of args has been overwritten,
+ * then assume application is using sendmail's
+ * SPT_REUSEARGV style argv override.
+ */
+ len = strnlen(buffer, res);
+ if (len < res) {
+ res = len;
+ } else {
+ len = mm->env_end - mm->env_start;
+ if (len > PAGE_SIZE - res)
+ len = PAGE_SIZE - res;
+ res += access_process_vm(task, mm->env_start,
+ buffer+res, len, 0);
+ res = strnlen(buffer, res);
+ }
}
- }
+ } while (read_seqretry(&mm->arg_lock, seq));
+
out_mm:
mmput(mm);
out:
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0042090..9daa1fa 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -12,6 +12,7 @@
#include <linux/completion.h>
#include <linux/cpumask.h>
#include <linux/page-debug-flags.h>
+#include <linux/seqlock.h>
#include <asm/page.h>
#include <asm/mmu.h>

@@ -236,6 +237,7 @@ struct mm_struct {
unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
+ seqlock_t arg_lock;
unsigned long arg_start, arg_end, env_start, env_end;

unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index b00df4c..feffb17 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -88,4 +88,8 @@
#define PR_TASK_PERF_COUNTERS_DISABLE 31
#define PR_TASK_PERF_COUNTERS_ENABLE 32

+
+/* Set process title memory area for setproctitle() */
+#define PR_SET_PROCTITLE_AREA 34
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index bfee931..9eaa1cb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -435,6 +435,7 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->cached_hole_size = ~0UL;
mm_init_owner(mm, p);
+ seqlock_init(&mm->arg_lock);

if (likely(!mm_alloc_pgd(mm))) {
mm->def_flags = 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index b3f1097..e26c687 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1528,6 +1528,29 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
current->timer_slack_ns = arg2;
error = 0;
break;
+ case PR_SET_PROCTITLE_AREA: {
+ struct mm_struct *mm = current->mm;
+ unsigned long addr = arg2;
+ unsigned long len = arg3;
+ unsigned long end = arg2 + arg3;
+
+ if (len > PAGE_SIZE)
+ return -EINVAL;
+
+ if (addr >= end)
+ return -EINVAL;
+
+ if (!access_ok(VERIFY_READ, addr, len)) {
+ return -EFAULT;
+ }
+
+ write_seqlock(&mm->arg_lock);
+ mm->arg_start = addr;
+ mm->arg_end = addr + len;
+ write_sequnlock(&mm->arg_lock);
+
+ return 0;
+ }
default:
error = -EINVAL;
break;
--
1.6.2.5




2009-10-10 00:14:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Fri, 9 Oct 2009 13:50:17 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> ok, all lkml reviewer ack this patch.
> then, I've switched to linux-api review.
>
> Any comments?
>
>
>
> ==================================================
> From: Timo Sirainen <[email protected]>
>
> Currently glibc2 doesn't have setproctitle(3), so several userland
> daemons attempt to emulate it by doing some brutal stack modifications.
> This works most of the time, but it has problems. For example:
>
> % ps -ef |grep avahi-daemon
> avahi 1679 1 0 09:20 ? 00:00:00 avahi-daemon: running [kosadesk.local]
>
> # cat /proc/1679/cmdline
> avahi-daemon: running [kosadesk.local]
>
> This looks good, but the process has also overwritten its environment
> area and made the environ file useless:
>
> # cat /proc/1679/environ
> adesk.local]
>
> Another problem is that the process title length is limited by the size of
> the environment. Security conscious people try to avoid potential information
> leaks by clearing most of the environment before running a daemon:
>
> # env - MINIMUM_NEEDED_VAR=foo /path/to/daemon
>
> The resulting environment size may be too small to fit the wanted process
> titles.
>
> This patch makes it possible for userspace to implement setproctitle()
> cleanly. It adds a new PR_SET_PROCTITLE_AREA option for prctl(), which
> updates task's mm_struct->arg_start and arg_end to the given area.
>
> test_setproctitle.c
> ================================================
> #define ERR(str) (perror(str), exit(1))
>
> void settitle(char* title){
> int err;
>
> err = prctl(34, title, strlen(title)+1);
> if (err < 0)
> ERR("prctl ");
> }
>
> void main(void){
> long i;
> char buf[1024];
>
> for (i = 0; i < 10000000000LL; i++){
> sprintf(buf, "loooooooooooooooooooooooong string %d",i);
> settitle(buf);
> }
> }

gee that's a good changelog.

> ...
>
> @@ -255,32 +255,47 @@ static int proc_pid_cmdline(struct task_struct *task, char * buffer)
> int res = 0;
> unsigned int len;
> struct mm_struct *mm = get_task_mm(task);
> + unsigned seq;
> +
> if (!mm)
> goto out;
> +
> + /* The process was not constructed yet? */
> if (!mm->arg_end)
> - goto out_mm; /* Shh! No looking before we're done */
> + goto out_mm;
>
> - len = mm->arg_end - mm->arg_start;
> -
> - if (len > PAGE_SIZE)
> - len = PAGE_SIZE;
> -
> - res = access_process_vm(task, mm->arg_start, buffer, len, 0);
> -
> - // If the nul at the end of args has been overwritten, then
> - // assume application is using setproctitle(3).
> - if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
> - len = strnlen(buffer, res);
> - if (len < res) {
> - res = len;
> - } else {
> - len = mm->env_end - mm->env_start;
> - if (len > PAGE_SIZE - res)
> - len = PAGE_SIZE - res;
> - res += access_process_vm(task, mm->env_start, buffer+res, len, 0);
> + do {
> + seq = read_seqbegin(&mm->arg_lock);
> +
> + len = mm->arg_end - mm->arg_start;
> + if (len > PAGE_SIZE)
> + len = PAGE_SIZE;
> +
> + res = access_process_vm(task, mm->arg_start, buffer, len, 0);
> +
> + if (mm->arg_end != mm->env_start)
> + /* PR_SET_PROCTITLE_AREA used */
> res = strnlen(buffer, res);

Hang on.

If PR_SET_PROCTITLE_AREA installed a bad address then
access_process_vm() will return -EFAULT and nothing was written to
`buffer'?

Also, concurrent PR_SET_PROCTITLE_AREA could cause arg_start and
arg_end to have inconsistent/incoherent values. This might result in a
fault but it also might result in a read of garbage userspace memory.

I guess all of these issues could be written off as "userspace bugs",
but our behaviour here isn't nice. We should at least return that
-EFAULT!.

> + else if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
> + /*
> + * If the nul at the end of args has been overwritten,
> + * then assume application is using sendmail's
> + * SPT_REUSEARGV style argv override.
> + */
> + len = strnlen(buffer, res);
> + if (len < res) {
> + res = len;
> + } else {
> + len = mm->env_end - mm->env_start;
> + if (len > PAGE_SIZE - res)
> + len = PAGE_SIZE - res;
> + res += access_process_vm(task, mm->env_start,
> + buffer+res, len, 0);
> + res = strnlen(buffer, res);
> + }

And if access_process_vm() returns a -ve errno here, the code simply
flies off into the weeds.

> }
> - }
> + } while (read_seqretry(&mm->arg_lock, seq));
> +
> out_mm:
> mmput(mm);
>
> ...
>
> @@ -236,6 +237,7 @@ struct mm_struct {
> unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
> unsigned long start_code, end_code, start_data, end_data;
> unsigned long start_brk, brk, start_stack;
> + seqlock_t arg_lock;
> unsigned long arg_start, arg_end, env_start, env_end;
>
> unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */

seqlocks are nice but I have to wonder if they made this patch
unnecessarily complex. Couldn't we just do:

PR_SET_PROCTITLE_AREA:

spin_lock(mm->lock);
mm->arg_start = addr;
mm->arg_end = addr + len;
spin_unlock(mm->lock);

and

proc_pid_cmdline()
{
unsigned long addr;
unsigned long end;

spin_lock(mm->lock);
addr = mm->arg_start;
end = mm->arg_end;
spin_unlock(mm->lock);

<use `addr' and `len'>
}

?

2009-10-10 02:23:40

by Bryan Donlan

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Fri, Oct 9, 2009 at 8:13 PM, Andrew Morton <[email protected]> wrote:

>> + ? ? ? ? ? ? res = access_process_vm(task, mm->arg_start, buffer, len, 0);
>> +
>> + ? ? ? ? ? ? if (mm->arg_end != mm->env_start)
>> + ? ? ? ? ? ? ? ? ? ? /* PR_SET_PROCTITLE_AREA used */
>> ? ? ? ? ? ? ? ? ? ? ? res = strnlen(buffer, res);
>
> Hang on.
>
> If PR_SET_PROCTITLE_AREA installed a bad address then
> access_process_vm() will return -EFAULT and nothing was written to
> `buffer'?
>
> Also, concurrent PR_SET_PROCTITLE_AREA could cause arg_start and
> arg_end to have inconsistent/incoherent values. ?This might result in a
> fault but it also might result in a read of garbage userspace memory.
>
> I guess all of these issues could be written off as "userspace bugs",
> but our behaviour here isn't nice. ?We should at least return that
> -EFAULT!.

access_process_vm() does not return an error code - just the number of
bytes successfully transferred. If there's a fault, we just get 0 (or
some intermediate value) back (as discussed on lkml).

>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? res += access_process_vm(task, mm->env_start,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?buffer+res, len, 0);
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? res = strnlen(buffer, res);
>> + ? ? ? ? ? ? ? ? ? ? }
>
> And if access_process_vm() returns a -ve errno here, the code simply
> flies off into the weeds.

This code was originally there - it's just been lifted into an if :)
and since access_process_vm will never return a negative errno, it's
not a problem.

Now, there might be a case for arguing that we should try harder to
get an error code (-EIO if we don't read the number of bytes we asked
for), but /proc/pid/cmdline never has before, and that would then make
this a visible change for consumers of /proc/pid/cmdline. Can ps
handle reading cmdline returning -EIO?

> seqlocks are nice but I have to wonder if they made this patch
> unnecessarily complex. ?Couldn't we just do:
>
> ? ? ? ? PR_SET_PROCTITLE_AREA:
>
> ? ? ? ? ? ? ? ?spin_lock(mm->lock);
> ? ? ? ? ? ? ? ?mm->arg_start = addr;
> ? ? ? ? ? ? ? ?mm->arg_end = addr + len;
> ? ? ? ? ? ? ? ?spin_unlock(mm->lock);
>
> and
>
> ? ? ? ?proc_pid_cmdline()
> ? ? ? ?{
> ? ? ? ? ? ? ? ?unsigned long addr;
> ? ? ? ? ? ? ? ?unsigned long end;
>
> ? ? ? ? ? ? ? ?spin_lock(mm->lock);
> ? ? ? ? ? ? ? ?addr = mm->arg_start;
> ? ? ? ? ? ? ? ?end = mm->arg_end;
> ? ? ? ? ? ? ? ?spin_unlock(mm->lock);
>
> ? ? ? ? ? ? ? ?<use `addr' and `len'>
> ? ? ? ?}
>
> ?

As discussed on the lkml thread, this opens up a nasty race:

Process A: calls PR_SET_PROCTITLE_AREA
Process B: read cmdline:
Process B: spin_lock
Process B: read mm->arg_*
Process B: unlock
Process A: mm->arg_* = ...
Process A: free(old_cmdline_area)
Process A: secret_password = malloc(...)
Process A: strcpy(secret_password, stuff);
Process B: access_process_vm

If secret_password == arg_start, then process B just read secret
information from process A's address space. Since process A has no
idea when or even if process B will complete its read, it has no way
of protecting itself from this, save from never reusing any memory
which has ever been assigned to a proctitle area.

The solution is to use the seqlock to detect this, and prevent the
secret information from ever making it back to process B's userspace.
Note that it's not enough to just recheck arg_start, as process A may
reassign the proctitle area back to its original position after having
it somewhere else for a while.

2009-10-10 02:43:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Fri, 9 Oct 2009 22:22:10 -0400 Bryan Donlan <[email protected]> wrote:

> On Fri, Oct 9, 2009 at 8:13 PM, Andrew Morton <[email protected]> wrote:
>
> >> + __ __ __ __ __ __ res = access_process_vm(task, mm->arg_start, buffer, len, 0);
> >> +
> >> + __ __ __ __ __ __ if (mm->arg_end != mm->env_start)
> >> + __ __ __ __ __ __ __ __ __ __ /* PR_SET_PROCTITLE_AREA used */
> >> __ __ __ __ __ __ __ __ __ __ __ res = strnlen(buffer, res);
> >
> > Hang on.
> >
> > If PR_SET_PROCTITLE_AREA installed a bad address then
> > access_process_vm() will return -EFAULT and nothing was written to
> > `buffer'?
> >
> > Also, concurrent PR_SET_PROCTITLE_AREA could cause arg_start and
> > arg_end to have inconsistent/incoherent values. __This might result in a
> > fault but it also might result in a read of garbage userspace memory.
> >
> > I guess all of these issues could be written off as "userspace bugs",
> > but our behaviour here isn't nice. __We should at least return that
> > -EFAULT!.
>
> access_process_vm() does not return an error code - just the number of
> bytes successfully transferred. If there's a fault, we just get 0 (or
> some intermediate value) back

OK.

> (as discussed on lkml).

That's not code documentation :(

> >> + __ __ __ __ __ __ __ __ __ __ __ __ __ __ res += access_process_vm(task, mm->env_start,

Your email client is converting tabs to non-ascii crap. gmail. Sigh.

> > __ __ __ __ __ __ __ __mm->arg_start = addr;
> > __ __ __ __ __ __ __ __mm->arg_end = addr + len;
> > __ __ __ __ __ __ __ __spin_unlock(mm->lock);
> >
> > and
> >
> > __ __ __ __proc_pid_cmdline()
> > __ __ __ __{
> > __ __ __ __ __ __ __ __unsigned long addr;
> > __ __ __ __ __ __ __ __unsigned long end;
> >
> > __ __ __ __ __ __ __ __spin_lock(mm->lock);
> > __ __ __ __ __ __ __ __addr = mm->arg_start;
> > __ __ __ __ __ __ __ __end = mm->arg_end;
> > __ __ __ __ __ __ __ __spin_unlock(mm->lock);
> >
> > __ __ __ __ __ __ __ __<use `addr' and `len'>
> > __ __ __ __}
> >
> > ?
>
> As discussed on the lkml thread, this opens up a nasty race:
>
> Process A: calls PR_SET_PROCTITLE_AREA
> Process B: read cmdline:
> Process B: spin_lock
> Process B: read mm->arg_*
> Process B: unlock
> Process A: mm->arg_* = ...
> Process A: free(old_cmdline_area)
> Process A: secret_password = malloc(...)
> Process A: strcpy(secret_password, stuff);
> Process B: access_process_vm
>
> If secret_password == arg_start, then process B just read secret
> information from process A's address space. Since process A has no
> idea when or even if process B will complete its read, it has no way
> of protecting itself from this, save from never reusing any memory
> which has ever been assigned to a proctitle area.

OK.

But there's no way in which the reader of either the patch or the
resulting code can discover this subtlety.

> The solution is to use the seqlock to detect this, and prevent the
> secret information from ever making it back to process B's userspace.
> Note that it's not enough to just recheck arg_start, as process A may
> reassign the proctitle area back to its original position after having
> it somewhere else for a while.

Well seqlock is _a_ solution. Another is to use a mutex or an rwsem
around the whole operation.

With the code as you propose it, what happens if a process sits in a
tight loop running setproctitle? Do other processes running `ps' get
stuck in a livelock until the offending process gets scheduled out?

2009-10-10 02:58:14

by Bryan Donlan

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Fri, Oct 9, 2009 at 10:42 PM, Andrew Morton
<[email protected]> wrote:

>> >> + __ __ __ __ __ __ __ __ __ __ __ __ __ __ res += access_process_vm(task, mm->env_start,
>
> Your email client is converting tabs to non-ascii crap. ?gmail. ?Sigh.

Weird ... I'll have to see if I can do something about that :/

> OK.
>
> But there's no way in which the reader of either the patch or the
> resulting code can discover this subtlety.

I didn't write the log message or the code - I just mentioned these
same issues back in the lkml thread :) But yes, this should be
mentioned somewhere.

>> The solution is to use the seqlock to detect this, and prevent the
>> secret information from ever making it back to process B's userspace.
>> Note that it's not enough to just recheck arg_start, as process A may
>> reassign the proctitle area back to its original position after having
>> it somewhere else for a while.
>
> Well seqlock is _a_ solution. ?Another is to use a mutex or an rwsem
> around the whole operation.
>
> With the code as you propose it, what happens if a process sits in a
> tight loop running setproctitle? ?Do other processes running `ps' get
> stuck in a livelock until the offending process gets scheduled out?

It does seem like a maximum spin count should be put in there - and
maybe a timeout as well (since with FUSE etc it's possible to engineer
page faults that take arbitrarily long).
Also, it occurs to me that:

> + do {
> + seq = read_seqbegin(&mm->arg_lock);
> +
> + len = mm->arg_end - mm->arg_start;
> + if (len > PAGE_SIZE)
> + len = PAGE_SIZE;

If arg_end or arg_start are modified after this, is it truly safe to
assume that len will remain <= PAGE_SIZE without a memory barrier
before the conditional?

2009-10-10 06:33:14

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

(Doh, I've forgot cc to original author. sorry. cc to timo.)

Hi

very interesting discussion. great.


2009/10/10 Bryan Donlan <[email protected]>:
> On Fri, Oct 9, 2009 at 10:42 PM, Andrew Morton
> <[email protected]> wrote:
>
>>> >> + __ __ __ __ __ __ __ __ __ __ __ __ __ __ res += access_process_vm(task, mm->env_start,
>>
>> Your email client is converting tabs to non-ascii crap. ?gmail. ?Sigh.
>
> Weird ... I'll have to see if I can do something about that :/
>
>> OK.
>>
>> But there's no way in which the reader of either the patch or the
>> resulting code can discover this subtlety.
>
> I didn't write the log message or the code - I just mentioned these
> same issues back in the lkml thread :) But yes, this should be
> mentioned somewhere.
>

This is documented in access_process_vm.

3224/*
3225 * Access another process' address space.
3226 * Source/target buffer must be kernel space,
3227 * Do not walk the page table directly, use get_user_pages
3228 */
3229int access_process_vm(struct task_struct *tsk, unsigned long addr,
void *buf, int len, int write)
3230{
3231 struct mm_struct *mm;
3232 struct vm_area_struct *vma;
3233 void *old_buf = buf;
3234
3235 mm = get_task_mm(tsk);
3236 if (!mm)
3237 return 0;
3238
3239 down_read(&mm->mmap_sem);
3240 /* ignore errors, just check how much was successfully
transferred */
3241 while (len) {

However, yes, it is pretty bad place ;)
/* ignore errors, just check how much was successfully transferred */
comment should
be placed in function header comment.

I think separate another patch is better.


>>> The solution is to use the seqlock to detect this, and prevent the
>>> secret information from ever making it back to process B's userspace.
>>> Note that it's not enough to just recheck arg_start, as process A may
>>> reassign the proctitle area back to its original position after having
>>> it somewhere else for a while.
>>
>> Well seqlock is _a_ solution. ?Another is to use a mutex or an rwsem
>> around the whole operation.
>>
>> With the code as you propose it, what happens if a process sits in a
>> tight loop running setproctitle? ?Do other processes running `ps' get
>> stuck in a livelock until the offending process gets scheduled out?
>
> It does seem like a maximum spin count should be put in there - and
> maybe a timeout as well (since with FUSE etc it's possible to engineer
> page faults that take arbitrarily long).
> Also, it occurs to me that:

makes sense.
I like maximum spin rather than timeout.


>> + ? ? do {
>> + ? ? ? ? ? ? seq = read_seqbegin(&mm->arg_lock);
>> +
>> + ? ? ? ? ? ? len = mm->arg_end - mm->arg_start;
>> + ? ? ? ? ? ? if (len > PAGE_SIZE)
>> + ? ? ? ? ? ? ? ? ? ? len = PAGE_SIZE;
>
> If arg_end or arg_start are modified after this, is it truly safe to
> assume that len will remain <= PAGE_SIZE without a memory barrier
> before the conditional?

1) access_process_vm() doesn't return error value.
2) read_seqretry(&mm->arg_lock, seq)) check seq, not mm->arg_start or len.

then, if arg_{start,end} is modified, access_process_vm() may return 0
and strnlen
makes bad calculation, but read_seqretry() can detect its modify
rightly. I think.

2009-10-10 06:40:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Sat, 10 Oct 2009 15:32:35 +0900 KOSAKI Motohiro <[email protected]> wrote:

> >>> The solution is to use the seqlock to detect this, and prevent the
> >>> secret information from ever making it back to process B's userspace.
> >>> Note that it's not enough to just recheck arg_start, as process A may
> >>> reassign the proctitle area back to its original position after having
> >>> it somewhere else for a while.
> >>
> >> Well seqlock is _a_ solution. __Another is to use a mutex or an rwsem
> >> around the whole operation.
> >>
> >> With the code as you propose it, what happens if a process sits in a
> >> tight loop running setproctitle? __Do other processes running `ps' get
> >> stuck in a livelock until the offending process gets scheduled out?
> >
> > It does seem like a maximum spin count should be put in there - and
> > maybe a timeout as well (since with FUSE etc it's possible to engineer
> > page faults that take arbitrarily long).
> > Also, it occurs to me that:
>
> makes sense.
> I like maximum spin rather than timeout.

Start simple. What's wrong with mutex_lock() on the reader and writer
sides? rwsems might be OK too.

In both cases we should think about whether persistent readers can
block the writer excessively though.

2009-10-10 07:13:21

by Bryan Donlan

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Sat, Oct 10, 2009 at 2:32 AM, KOSAKI Motohiro
<[email protected]> wrote:

>> It does seem like a maximum spin count should be put in there - and
>> maybe a timeout as well (since with FUSE etc it's possible to engineer
>> page faults that take arbitrarily long).
>> Also, it occurs to me that:
>
> makes sense.
> I like maximum spin rather than timeout.

I'm worried about the scenario where process A sets its cmdline buffer
to point to a page which will take a _VERY_ long time to pagein (maybe
forever), and then process B goes to try to read its cmdline. What
happens now?
Process A can arrange for this to happen by using a FUSE filesystem
that sits on a read forever. And since the first thing the admin's
likely to do to track down the problem is 'ps awux', this is liable to
be a rather nasty DoS...

Of course, this is no worse than it is now - it's already possible to
replace the page in question. But we should think about ways this
could be fixed for good...

>
>>> + ? ? do {
>>> + ? ? ? ? ? ? seq = read_seqbegin(&mm->arg_lock);
>>> +
>>> + ? ? ? ? ? ? len = mm->arg_end - mm->arg_start;
>>> + ? ? ? ? ? ? if (len > PAGE_SIZE)
>>> + ? ? ? ? ? ? ? ? ? ? len = PAGE_SIZE;
>>
>> If arg_end or arg_start are modified after this, is it truly safe to
>> assume that len will remain <= PAGE_SIZE without a memory barrier
>> before the conditional?
>
> 1) access_process_vm() doesn't return error value.
> 2) read_seqretry(&mm->arg_lock, seq)) check seq, not mm->arg_start or len.
>
> then, if arg_{start,end} is modified, access_process_vm() may return 0
> and strnlen
> makes bad calculation, but read_seqretry() can detect its modify
> rightly. I think.

No, I'm worried about what if the compiler decides to rewrite like so:
if (mm->arg_end - mm->arg_start > PAGE_SIZE)
len = PAGE_SIZE;
else /* here we reload arg_end/arg_start! */
len = mm->arg_end - mm->arg_start;

Now we might write into buffer more than PAGE_SIZE bytes, which is
probably a buffer overrun into kernel space...

2009-10-12 19:05:30

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

Hi

Sorry for the delaying.

> On Sat, Oct 10, 2009 at 2:32 AM, KOSAKI Motohiro
> <[email protected]> wrote:
>
> >> It does seem like a maximum spin count should be put in there - and
> >> maybe a timeout as well (since with FUSE etc it's possible to engineer
> >> page faults that take arbitrarily long).
> >> Also, it occurs to me that:
> >
> > makes sense.
> > I like maximum spin rather than timeout.
>
> I'm worried about the scenario where process A sets its cmdline buffer
> to point to a page which will take a _VERY_ long time to pagein (maybe
> forever), and then process B goes to try to read its cmdline. What
> happens now?

Honestly, I don't worry about so much. if attacker want DoS attack, fork bomb is
efficient than this way. then, attacker never use this.


> Process A can arrange for this to happen by using a FUSE filesystem
> that sits on a read forever. And since the first thing the admin's
> likely to do to track down the problem is 'ps awux', this is liable to
> be a rather nasty DoS...

Probably, I haven't understand this paragraph. Why is this FUSE related issue?

> Of course, this is no worse than it is now - it's already possible to
> replace the page in question. But we should think about ways this
> could be fixed for good...

Plus, please look my mesurement data as another post. seqlock implementation is very fast
although contention occured.



> >>> + ? ? do {
> >>> + ? ? ? ? ? ? seq = read_seqbegin(&mm->arg_lock);
> >>> +
> >>> + ? ? ? ? ? ? len = mm->arg_end - mm->arg_start;
> >>> + ? ? ? ? ? ? if (len > PAGE_SIZE)
> >>> + ? ? ? ? ? ? ? ? ? ? len = PAGE_SIZE;
> >>
> >> If arg_end or arg_start are modified after this, is it truly safe to
> >> assume that len will remain <= PAGE_SIZE without a memory barrier
> >> before the conditional?
> >
> > 1) access_process_vm() doesn't return error value.
> > 2) read_seqretry(&mm->arg_lock, seq)) check seq, not mm->arg_start or len.
> >
> > then, if arg_{start,end} is modified, access_process_vm() may return 0
> > and strnlen
> > makes bad calculation, but read_seqretry() can detect its modify
> > rightly. I think.
>
> No, I'm worried about what if the compiler decides to rewrite like so:
> if (mm->arg_end - mm->arg_start > PAGE_SIZE)
> len = PAGE_SIZE;
> else /* here we reload arg_end/arg_start! */
> len = mm->arg_end - mm->arg_start;
>
> Now we might write into buffer more than PAGE_SIZE bytes, which is
> probably a buffer overrun into kernel space...

Rgiht. I'll fix this issue at next spin.
Thank you.


2009-10-12 19:05:35

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

Hi

> > >>> The solution is to use the seqlock to detect this, and prevent the
> > >>> secret information from ever making it back to process B's userspace.
> > >>> Note that it's not enough to just recheck arg_start, as process A may
> > >>> reassign the proctitle area back to its original position after having
> > >>> it somewhere else for a while.
> > >>
> > >> Well seqlock is _a_ solution. __Another is to use a mutex or an rwsem
> > >> around the whole operation.
> > >>
> > >> With the code as you propose it, what happens if a process sits in a
> > >> tight loop running setproctitle? __Do other processes running `ps' get
> > >> stuck in a livelock until the offending process gets scheduled out?
> > >
> > > It does seem like a maximum spin count should be put in there - and
> > > maybe a timeout as well (since with FUSE etc it's possible to engineer
> > > page faults that take arbitrarily long).
> > > Also, it occurs to me that:
> >
> > makes sense.
> > I like maximum spin rather than timeout.
>
> Start simple. What's wrong with mutex_lock() on the reader and writer
> sides? rwsems might be OK too.
>
> In both cases we should think about whether persistent readers can
> block the writer excessively though.

I thought your mention seems reasonable. then I mesured various locking
performance.

no-contention read-read contetion read-write contention
w/o patch 4627 ms 7575 ms N/A
mutex 5717 ms 33872 ms (!) 14793 ms
rw-semaphoe 6846 ms 10734 ms 36156 ms (!)
seqlock 4754 ms 7558 ms 9373 ms

Umm, seqlock is significantly better than other.

<testcase>
readtitle.c read proctitle 1,000,000 times
setproctitle.c infinite loop of setproctitle()

no-contention:
./readtitle 1

read-read contention:
./readtitle 1 &; ./readtitle 1&; wait

read-write contention
./setproctitle
[switch other terminal]
./readtitle `pidof setproctitle`


I agree this testcase is too pessimistic. ps doesn't read /proc/{pid}/cmdline
so frequently. however if we need to concern DoS attack, we need to mesure
pessimistic scenario.

Plus, this result indicate setproctitle-seqlock doesn't need timeout nor
max spin.


Attachments:
setproctitle-mutex.patch (3.67 kB)
setproctitle-rwmutex.patch (3.30 kB)
setproctitle-seqlock.patch (4.04 kB)
readtitle.c (672.00 B)
setproctitle.c (571.00 B)
Download all attachments

2009-10-12 19:24:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Tue, 13 Oct 2009 04:03:45 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:

> > Start simple. What's wrong with mutex_lock() on the reader and writer
> > sides? rwsems might be OK too.
> >
> > In both cases we should think about whether persistent readers can
> > block the writer excessively though.
>
> I thought your mention seems reasonable. then I mesured various locking
> performance.
>
> no-contention read-read contetion read-write contention
> w/o patch 4627 ms 7575 ms N/A
> mutex 5717 ms 33872 ms (!) 14793 ms
> rw-semaphoe 6846 ms 10734 ms 36156 ms (!)
> seqlock 4754 ms 7558 ms 9373 ms
>
> Umm, seqlock is significantly better than other.

Sure, but even the worst case there is 1,000,000 operations in 34
seconds (yes?). 33 microseconds for a /proc read while under a specific
local DoS attack is OK!

If so then all implementations are acceptable and we should choose the
simplest, most-obviously-correct one.

2009-10-12 19:34:10

by Bryan Donlan

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

On Mon, Oct 12, 2009 at 3:03 PM, KOSAKI Motohiro
<[email protected]> wrote:
> Hi
>
> Sorry for the delaying.
>
>> On Sat, Oct 10, 2009 at 2:32 AM, KOSAKI Motohiro
>> <[email protected]> wrote:
>>
>> >> It does seem like a maximum spin count should be put in there - and
>> >> maybe a timeout as well (since with FUSE etc it's possible to engineer
>> >> page faults that take arbitrarily long).
>> >> Also, it occurs to me that:
>> >
>> > makes sense.
>> > I like maximum spin rather than timeout.
>>
>> I'm worried about the scenario where process A sets its cmdline buffer
>> to point to a page which will take a _VERY_ long time to pagein (maybe
>> forever), and then process B goes to try to read its cmdline. What
>> happens now?
>
> Honestly, I don't worry about so much. if attacker want DoS attack, fork bomb is
> efficient than this way. then, attacker never use this.

Fork bombs and etc can be mitigated by resource limits; but if the
command line is placed on a page that will take a very long time to
fault, then that cannot be mitigated... But again, this DoS already
exists and isn't any easier with this patch, so I think it's a
separate issue.

>> Process A can arrange for this to happen by using a FUSE filesystem
>> that sits on a read forever. And since the first thing the admin's
>> likely to do to track down the problem is 'ps awux', this is liable to
>> be a rather nasty DoS...
>
> Probably, I haven't understand this paragraph. Why is this FUSE related issue?

Just an example of how one can create a page that will take a very
long time to fault in.

2009-10-13 00:04:04

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [resend][PATCH] Added PR_SET_PROCTITLE_AREA option for prctl()

> On Tue, 13 Oct 2009 04:03:45 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
> > > Start simple. What's wrong with mutex_lock() on the reader and writer
> > > sides? rwsems might be OK too.
> > >
> > > In both cases we should think about whether persistent readers can
> > > block the writer excessively though.
> >
> > I thought your mention seems reasonable. then I mesured various locking
> > performance.
> >
> > no-contention read-read contetion read-write contention
> > w/o patch 4627 ms 7575 ms N/A
> > mutex 5717 ms 33872 ms (!) 14793 ms
> > rw-semaphoe 6846 ms 10734 ms 36156 ms (!)
> > seqlock 4754 ms 7558 ms 9373 ms
> >
> > Umm, seqlock is significantly better than other.
>
> Sure, but even the worst case there is 1,000,000 operations in 34
> seconds (yes?). 33 microseconds for a /proc read while under a specific
> local DoS attack is OK!
>
> If so then all implementations are acceptable and we should choose the
> simplest, most-obviously-correct one.

Hm, ok!

I had guessed you don't accept this slowness. but my guess was wrong.
I have no objection to use rw-semaphoe if you accept it.