2011-05-30 16:25:04

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] init: use KERNEL_DS when trying to start init process

We use kernel_execve() to transfer control of the init procces from
kernel to userland. If the program to start as init process isn't given
on the kernel command line or fails to start we use a few hardcoded
fallbacks. This fallback mechanism does not work when we encounter a
file that is executable but fails to start, e.g. due to a missing
library dependency or by having an unsupported file format.

The bug is, that search_binary_handler() sets the address limit to
USER_DS but doesn't reset it on error which will make all further
attempts fail with -EFAULT because argv[0] is a pointer to kernel
memory, not userland.

The bug can easily be reproduced by starting a 32 bit kernel with a 64
bit executable as /init and a 32 bit version as /sbin/init within an
initramfs. The hardcoded defaults should make /init fail because of the
unsupported file format but should make /sbin/init succeed. This doesn't
happen because the string "/sbin/init" lives in kernel memory and is no
longer allowed because of the modified address limit to USER_DS after
the failed execution attempt of /init.

Fixing the only user of kernel_execve that needs this tweaking was far
more easy than changing the implementation for all architectures. This
also makes backporting far more easy as this bug is in there from the
very beginning -- at least it's in v2.6.12, too.

Signed-off-by: Mathias Krause <[email protected]>
CC: [email protected]
---
init/main.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/init/main.c b/init/main.c
index cafba67..4ee893a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,6 +731,9 @@ static void __init do_pre_smp_initcalls(void)

static void run_init_process(const char *init_filename)
{
+ /* Ensure we can access in-kernel filenames -- previous exec attempts
+ * might have set the address limit to USER_DS */
+ set_fs(KERNEL_DS);
argv_init[0] = init_filename;
kernel_execve(init_filename, argv_init, envp_init);
}
--
1.5.6.5


2011-06-06 23:13:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Mon, 30 May 2011 18:17:08 +0200
Mathias Krause <[email protected]> wrote:

> We use kernel_execve() to transfer control of the init procces from
> kernel to userland. If the program to start as init process isn't given
> on the kernel command line or fails to start we use a few hardcoded
> fallbacks. This fallback mechanism does not work when we encounter a
> file that is executable but fails to start, e.g. due to a missing
> library dependency or by having an unsupported file format.
>
> The bug is, that search_binary_handler() sets the address limit to
> USER_DS but doesn't reset it on error which will make all further
> attempts fail with -EFAULT because argv[0] is a pointer to kernel
> memory, not userland.
>
> The bug can easily be reproduced by starting a 32 bit kernel with a 64
> bit executable as /init and a 32 bit version as /sbin/init within an
> initramfs. The hardcoded defaults should make /init fail because of the
> unsupported file format but should make /sbin/init succeed. This doesn't
> happen because the string "/sbin/init" lives in kernel memory and is no
> longer allowed because of the modified address limit to USER_DS after
> the failed execution attempt of /init.
>
> Fixing the only user of kernel_execve that needs this tweaking was far
> more easy than changing the implementation for all architectures. This
> also makes backporting far more easy as this bug is in there from the
> very beginning -- at least it's in v2.6.12, too.
>
> Signed-off-by: Mathias Krause <[email protected]>
> CC: [email protected]
> ---
> init/main.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index cafba67..4ee893a 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -731,6 +731,9 @@ static void __init do_pre_smp_initcalls(void)
>
> static void run_init_process(const char *init_filename)
> {
> + /* Ensure we can access in-kernel filenames -- previous exec attempts
> + * might have set the address limit to USER_DS */
> + set_fs(KERNEL_DS);
> argv_init[0] = init_filename;
> kernel_execve(init_filename, argv_init, envp_init);
> }

Geeze, you're kicking over some ancient rocks there.

Possibly the bug was added by

commit 473ae30bc7b1dda5c5791c773f95e9424ddfead9
Author: Al Viro <[email protected]>
AuthorDate: Wed Apr 26 14:04:08 2006 -0400
Commit: Al Viro <[email protected]>
CommitDate: Tue Jun 20 05:25:21 2006 -0400

[PATCH] execve argument logging


and will be fixed with

--- a/fs/exec.c~a
+++ a/fs/exec.c
@@ -1357,14 +1357,14 @@ int search_binary_handler(struct linux_b
if (retval)
return retval;

- /* kernel module loader fixup */
- /* so we don't try to load run modprobe in kernel space. */
- set_fs(USER_DS);
-
retval = audit_bprm(bprm);
if (retval)
return retval;

+ /* kernel module loader fixup */
+ /* so we don't try to load run modprobe in kernel space. */
+ set_fs(USER_DS);
+
retval = -ENOENT;
for (try=0; try<2; try++) {
read_lock(&binfmt_lock);
_

but I'm finding lots of mysterious things in there.

Like, what does this comment:

/* so we don't try to load run modprobe in kernel space. */
set_fs(USER_DS);

mean?

It's all truly ancient code and I suspect the set_fs() simply isn't
needed any more - the calling process doesn't parent modprobe. And
request_module() should take care of the mm_segment, not its callers.



Also, search_binary_handler() appears to *always* return with USER_DS?
Is that a secret part of its interface? Or should it be
unconditionally restoring KERNEL_DS?


I tried to work out how that set_fs() got there, in the historical git
tree but it's part of 14592fa9:

73 files changed, 963 insertions(+), 798 deletions(-)

which is pretty useless (what's up with that?)


So I dunno, I'm stumped. I'm suspecting that the right fix here is to
just remove that call to set_fs(USER_DS) but I'm having trouble working
out what all this cruft is trying to do.

2011-06-07 06:49:59

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On 07.06.2011, 01:12 Andrew Morton wrote:
> On Mon, 30 May 2011 18:17:08 +0200
> Mathias Krause <[email protected]> wrote:
>
>> The bug is, that search_binary_handler() sets the address limit to
>> USER_DS but doesn't reset it on error which will make all further
>> attempts fail with -EFAULT because argv[0] is a pointer to kernel
>> memory, not userland.
>>
>> The bug can easily be reproduced by starting a 32 bit kernel with a 64
>> bit executable as /init and a 32 bit version as /sbin/init within an
>> initramfs. The hardcoded defaults should make /init fail because of the
>> unsupported file format but should make /sbin/init succeed. This doesn't
>> happen because the string "/sbin/init" lives in kernel memory and is no
>> longer allowed because of the modified address limit to USER_DS after
>
> Geeze, you're kicking over some ancient rocks there.
>
> Possibly the bug was added by
>
> commit 473ae30bc7b1dda5c5791c773f95e9424ddfead9
> Author: Al Viro <[email protected]>
> AuthorDate: Wed Apr 26 14:04:08 2006 -0400
> Commit: Al Viro <[email protected]>
> CommitDate: Tue Jun 20 05:25:21 2006 -0400
>
> [PATCH] execve argument logging
>
>
> and will be fixed with
>
> --- a/fs/exec.c~a
> +++ a/fs/exec.c
> @@ -1357,14 +1357,14 @@ int search_binary_handler(struct linux_b
> if (retval)
> return retval;
>
> - /* kernel module loader fixup */
> - /* so we don't try to load run modprobe in kernel space. */
> - set_fs(USER_DS);
> -
> retval = audit_bprm(bprm);
> if (retval)
> return retval;
>
> + /* kernel module loader fixup */
> + /* so we don't try to load run modprobe in kernel space. */
> + set_fs(USER_DS);
> +
> retval = -ENOENT;
> for (try=0; try<2; try++) {
> read_lock(&binfmt_lock);
> _

This fixes nothing because search_binary_handler() won't return that early in
my case but will try the binfmt list to find an appropriate handler. That for,
removing the set_fs() *might* be yet another solution to the problem but I
wasn't brave enough to do that change because I could not foresee all related
consequences -- didn't want to exchange a bug fix for a security hole.

Just changing the address limit in run_init_process() was the straight forward
fix with the least possible implications to the rest of the execve path.

> but I'm finding lots of mysterious things in there.
>
> Like, what does this comment:
>
> /* so we don't try to load run modprobe in kernel space. */
> set_fs(USER_DS);
>
> mean?

I found this comment confusing, too. But since the usermode helper is started
as separate kernel thread I thought this might be a security measure to prevent
leaking kernel memory to userland?!

> It's all truly ancient code and I suspect the set_fs() simply isn't
> needed any more - the calling process doesn't parent modprobe. And
> request_module() should take care of the mm_segment, not its callers.
>
> Also, search_binary_handler() appears to *always* return with USER_DS?
> Is that a secret part of its interface? Or should it be
> unconditionally restoring KERNEL_DS?

Wouldn't that introduce a security bug, when a userland triggered execve()
fails to execute and returns to userland? Having that process run with
KERNEL_DS afterwards isn't wanted, is it? Or is the address limit restored
by some other means?

> I tried to work out how that set_fs() got there, in the historical git
> tree but it's part of 14592fa9:
>
> 73 files changed, 963 insertions(+), 798 deletions(-)
>
> which is pretty useless (what's up with that?)
>
>
> So I dunno, I'm stumped. I'm suspecting that the right fix here is to
> just remove that call to set_fs(USER_DS) but I'm having trouble working
> out what all this cruft is trying to do.

Me is having trouble too, that for the simple solution with run_init_process().


Mathias-

2011-06-08 02:01:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Mon, Jun 6, 2011 at 4:12 PM, Andrew Morton <[email protected]> wrote:
>
> I tried to work out how that set_fs() got there, in the historical git
> tree but it's part of 14592fa9:
>
> ? ? ? ?73 files changed, 963 insertions(+), 798 deletions(-)
>
> which is pretty useless (what's up with that?)

Use tglx's more complete linux-history tree:

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=summary

instead of the bkcvs import tree.

That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
predates the "real" BK history too: it's part of the (limited) 2.4.x
history that was imported from the release patches into BK at the
beginning of the use of BK. So at that point we didn't do indivual
commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.

But yeah, it's old and crufty. And I agree that usually the correct
fix is to remove the set_fs() calls entirely.

Linus

2011-06-08 08:23:07

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Wed, Jun 8, 2011 at 4:00 AM, Linus Torvalds <[email protected]> wrote:
> That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
> predates the "real" BK history too: it's part of the (limited) 2.4.x
> history that was imported from the release patches into BK at the
> beginning of the use of BK. So at that point we didn't do indivual
> commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.
>
> But yeah, it's old and crufty. And I agree that usually the correct
> fix is to remove the set_fs() calls entirely.

So here it is. Solves the problem for me.

Mathias

-- >8 --
Subject: [PATCH] exec: keep address limit on exec errors

Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us using
kernel memory on repeated calls to this function. This, in fact, breaks
the fallback of hard coded paths to the init program from being ever
successful if the first candidate fails to load.

With this patch applied it is possible to have a multi-arch rootfs
having one arch specific init binary for each of the (hard coded) probed
paths.

Signed-off-by: Mathias Krause <[email protected]>
---
fs/exec.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index ea5f748..31df75f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1357,10 +1357,6 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
if (retval)
return retval;

- /* kernel module loader fixup */
- /* so we don't try to load run modprobe in kernel space. */
- set_fs(USER_DS);
-
retval = audit_bprm(bprm);
if (retval)
return retval;
--
1.5.6.5

2011-06-08 10:47:32

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Tue, Jun 07, 2011 at 07:00:24PM -0700, Linus Torvalds wrote:

> That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
> predates the "real" BK history too: it's part of the (limited) 2.4.x
> history that was imported from the release patches into BK at the
> beginning of the use of BK. So at that point we didn't do indivual
> commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.
>
> But yeah, it's old and crufty. And I agree that usually the correct
> fix is to remove the set_fs() calls entirely.

I think these days its job is done by start_thread(), which is where we
switch to USER_DS; it's called by ->load_binary() when it decides it's past
the point of no return. However, it would be a good idea to verify that
all architectures do it there properly and we are not exposing a hole by
removal of this set_fs()...

2011-06-08 12:14:58

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Wed, Jun 8, 2011 at 12:47 PM, Al Viro <[email protected]> wrote:
> On Tue, Jun 07, 2011 at 07:00:24PM -0700, Linus Torvalds wrote:
>
>> That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
>> predates the "real" BK history too: it's part of the (limited) 2.4.x
>> history that was imported from the release patches into BK at the
>> beginning of the use of BK. So at that point we didn't do indivual
>> commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.
>>
>> But yeah, it's old and crufty. And I agree that usually the correct
>> fix is to remove the set_fs() calls entirely.
>
> I think these days its job is done by start_thread(), which is where we
> switch to USER_DS; it's called by ->load_binary() when it decides it's past
> the point of no return. ?However, it would be a good idea to verify that
> all architectures do it there properly and we are not exposing a hole by
> removal of this set_fs()...

I've checked all implementations of start_thread() and found some candidates:

SPARC, TILE and Xtensa don't call set_fs(USER_DS), albeit have
different definitions for USER_DS and KERNEL_DS. So those might need
fixing. I'm not familiar with those architectures, so someone else has
to answer this.

Score does not call set_fs(USER_DS) either but that's no problem
because USER_DS has the same value as KERNEL_DS on this architecture.

All others call set_fs(USER_DS) as almost the very first instruction
in start_thread(), or, as for MIPS, do it by setting addr_limit
directly.

Generally, I think, we should get Acks for the questionable arch
maintainers before commiting the patch that removes the call to
set_fs() in search_binary_handler().

I've also checked all binary format handlers if they all call
start_thread() and found a few that do not (binfmt_em86, binfmt_misc
and binfmt_script). But those are just interpreter warppers, i.e. call
search_binary_handler() in the end so should be safe.

Mathias

2011-06-08 14:03:49

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Wed, Jun 08, 2011 at 02:14:56PM +0200, Mathias Krause wrote:

> SPARC, TILE and Xtensa don't call set_fs(USER_DS), albeit have
> different definitions for USER_DS and KERNEL_DS. So those might need
> fixing. I'm not familiar with those architectures, so someone else has
> to answer this.

sparc (both sparc32 and sparc64) does that in flush_thread() (i.e. triggered
by flush_old_exec()); the only difference is that sparc64 is trying to avoid
writing to %asi if we already had USER_DS. Any failure exit past the call
of flush_old_exec() will send us a SIGKILL (and will not return -ENOEXEC,
so no further handlers will be called anyway).

No idea about tile and xtensa - asking on linux-arch might be a good idea.

FWIW, looking at the ->load_binary() instances... binfmt_som does not
bother with SIGKILL, which is Not Nice(tm) - there's nowhere to return
from sys_execve() at that point. binfmt_elf_fdpic.c has a couple of
bogosities - it sends SIGSEGV instead of SIGKILL (which is probably OK,
since signal handlers are already switched to default, and SIGSEGV would
kill just as well as SIGKILL; the only question is whether the state of
process is suitable for coredump at that point) *and* we have one case
where both SIGKILL and SIGSEGV are sent (setup_arg_pages() failure).
And binfmt_flat looks just plain weird wrt failure exits...

2011-06-08 20:20:18

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On 6/8/2011 8:14 AM, Mathias Krause wrote:
> On Wed, Jun 8, 2011 at 12:47 PM, Al Viro <[email protected]> wrote:
>> On Tue, Jun 07, 2011 at 07:00:24PM -0700, Linus Torvalds wrote:
>>> That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
>>> predates the "real" BK history too: it's part of the (limited) 2.4.x
>>> history that was imported from the release patches into BK at the
>>> beginning of the use of BK. So at that point we didn't do indivual
>>> commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.
>>>
>>> But yeah, it's old and crufty. And I agree that usually the correct
>>> fix is to remove the set_fs() calls entirely.
>> I think these days its job is done by start_thread(), which is where we
>> switch to USER_DS; it's called by ->load_binary() when it decides it's past
>> the point of no return. However, it would be a good idea to verify that
>> all architectures do it there properly and we are not exposing a hole by
>> removal of this set_fs()...
> I've checked all implementations of start_thread() and found some candidates:
>
> SPARC, TILE and Xtensa don't call set_fs(USER_DS), albeit have
> different definitions for USER_DS and KERNEL_DS. So those might need
> fixing. I'm not familiar with those architectures, so someone else has
> to answer this.

TILE relies on the set_fs() in search_binary_handler(), but adding
set_fs(USER_DS) in in start_thread() should be a valid change if the
set_fs() is removed from search_binary_handler(). I'm happy to ack the
obvious change for tile, or I can put the change to tile's start_thread()
in my tree for inclusion in 3.1, either way.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

2011-06-09 08:14:08

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Wed, Jun 8, 2011 at 10:20 PM, Chris Metcalf <[email protected]> wrote:
> On 6/8/2011 8:14 AM, Mathias Krause wrote:
>> On Wed, Jun 8, 2011 at 12:47 PM, Al Viro <[email protected]> wrote:
>>> On Tue, Jun 07, 2011 at 07:00:24PM -0700, Linus Torvalds wrote:
>>>> That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
>>>> predates the "real" BK history too: it's part of the (limited) 2.4.x
>>>> history that was imported from the release patches into BK at the
>>>> beginning of the use of BK. So at that point we didn't do indivual
>>>> commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.
>>>>
>>>> But yeah, it's old and crufty. And I agree that usually the correct
>>>> fix is to remove the set_fs() calls entirely.
>>> I think these days its job is done by start_thread(), which is where we
>>> switch to USER_DS; it's called by ->load_binary() when it decides it's past
>>> the point of no return. However, it would be a good idea to verify that
>>> all architectures do it there properly and we are not exposing a hole by
>>> removal of this set_fs()...
>> I've checked all implementations of start_thread() and found some candidates:
>>
>> SPARC, TILE and Xtensa don't call set_fs(USER_DS), albeit have
>> different definitions for USER_DS and KERNEL_DS. So those might need
>> fixing. I'm not familiar with those architectures, so someone else has
>> to answer this.
>
> TILE relies on the set_fs() in search_binary_handler(), but adding
> set_fs(USER_DS) in in start_thread() should be a valid change if the
> set_fs() is removed from search_binary_handler().

To shortcut the decision for the other architectures I've adapted the patch
and added a set_fs() call to the start_thread() implementations in question.
They where running under USER_DS before so calling set_fs() in
start_thread() is safe by any means.

> I'm happy to ack the
> obvious change for tile, or I can put the change to tile's start_thread()
> in my tree for inclusion in 3.1, either way.

Since this patch contains changes for multiple architectures I guess, this
one has to go in by Linus directly.

Mathias

-- >8 --
Subject: [PATCH v3] exec: keep address limit on exec errors

Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.

With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Chris Zankel <[email protected]>
---
v1 was actually the alternative solution in run_init_process()
v2 was missing the set_fs() calls for SPARC, TILE and Xtensa

arch/sparc/include/asm/processor_64.h | 2 ++
arch/tile/include/asm/processor.h | 1 +
arch/xtensa/include/asm/processor.h | 1 +
arch/xtensa/kernel/signal.c | 7 +------
fs/exec.c | 4 ----
5 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 59fcebb..eb6b334 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -105,6 +105,7 @@ extern unsigned long thread_saved_pc(struct task_struct *);
#define start_thread(regs, pc, sp) \
do { \
unsigned long __asi = ASI_PNF; \
+ set_fs(USER_DS); \
regs->tstate = (regs->tstate & (TSTATE_CWP)) | (TSTATE_INITIAL_MM|TSTATE_IE) | (__asi << 24UL); \
regs->tpc = ((pc & (~3)) - 4); \
regs->tnpc = regs->tpc + 4; \
@@ -143,6 +144,7 @@ do { \
#define start_thread32(regs, pc, sp) \
do { \
unsigned long __asi = ASI_PNF; \
+ set_fs(USER_DS); \
pc &= 0x00000000ffffffffUL; \
sp &= 0x00000000ffffffffUL; \
regs->tstate = (regs->tstate & (TSTATE_CWP))|(TSTATE_INITIAL_MM|TSTATE_IE|TSTATE_AM) | (__asi << 24UL); \
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 34c1e01..0890524 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -200,6 +200,7 @@ DECLARE_PER_CPU(unsigned long, boot_pc);
static inline void start_thread(struct pt_regs *regs,
unsigned long pc, unsigned long usp)
{
+ set_fs(USER_DS);
regs->pc = pc;
regs->sp = usp;
}
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index 3acb26e..d87a1ee 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -152,6 +152,7 @@ struct thread_struct {

/* Clearing a0 terminates the backtrace. */
#define start_thread(regs, new_pc, new_sp) \
+ set_fs(USER_DS); \
regs->pc = new_pc; \
regs->ps = USER_PS_VALUE; \
regs->areg[1] = new_sp; \
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index f2220b5..8a41e69 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -400,7 +400,7 @@ static void setup_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
* Return context not modified until this point.
*/

- /* Set up registers for signal handler */
+ /* Set up registers and access mode for signal handler */
start_thread(regs, (unsigned long) ka->sa.sa_handler,
(unsigned long) frame);

@@ -412,11 +412,6 @@ static void setup_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
regs->areg[7] = (unsigned long) &frame->info;
regs->areg[8] = (unsigned long) &frame->uc;

- /* Set access mode to USER_DS. Nomenclature is outdated, but
- * functionality is used in uaccess.h
- */
- set_fs(USER_DS);
-
#if DEBUG_SIG
printk("SIG rt deliver (%s:%d): signal=%d sp=%p pc=%08x\n",
current->comm, current->pid, signal, frame, regs->pc);
diff --git a/fs/exec.c b/fs/exec.c
index ea5f748..31df75f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1357,10 +1357,6 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
if (retval)
return retval;

- /* kernel module loader fixup */
- /* so we don't try to load run modprobe in kernel space. */
- set_fs(USER_DS);
-
retval = audit_bprm(bprm);
if (retval)
return retval;
--
1.5.6.5

2011-06-09 10:40:28

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Thu, Jun 09, 2011 at 10:14:03AM +0200, Mathias Krause wrote:

> v1 was actually the alternative solution in run_init_process()
> v2 was missing the set_fs() calls for SPARC, TILE and Xtensa

sparc does not need it...

2011-06-09 12:06:45

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Thu, Jun 9, 2011 at 12:40 PM, Al Viro <[email protected]> wrote:
> On Thu, Jun 09, 2011 at 10:14:03AM +0200, Mathias Krause wrote:
>
>> v1 was actually the alternative solution in run_init_process()
>> v2 was missing the set_fs() calls for SPARC, TILE and Xtensa
>
> sparc does not need it...

So do FRV, M68k (MMU and NOMMU) and PA-RISC. But they all call
set_fs(USER_DS) in flush_thread() and additionally in start_thread().
As those architectures aren't that visible for the average user, I
guess this is just an oversight that has no measurable performance
impact anyway.

For SPARC we might not want this duplicated work so the call to
set_fs() in flush_thread() should be removed to equalize the semantics
of that function between the different architectures -- call
set_fs(USER_DS) only in start_thread() (with the above exceptions).
...Albeit by looking closer at the implementation of flush_old_exec()
I think we should just move the set_fs() call there and remove it from
the architecture dependent implementations of flush_thread() and
start_thread(). flush_old_exec() is the real point of no return and
this way we get it consistent between all architectures.

What do you think?

Mathias

2011-06-09 15:56:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Thu, Jun 9, 2011 at 5:06 AM, Mathias Krause <[email protected]> wrote:
>
> ...Albeit by looking closer at the implementation of flush_old_exec()
> I think we should just move the set_fs() call there and remove it from
> the architecture dependent implementations of flush_thread() and
> start_thread().

Yeah, that sounds like the sane (and safe - it won't matter if there
are some architectures that get overlooked and then have a duplicate
set_fs() somewhere) approach.

Linus

2011-06-09 16:40:09

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On 09.06.2011, 17:56 Linus Torvalds wrote:

> On Thu, Jun 9, 2011 at 5:06 AM, Mathias Krause <[email protected]> wrote:
>>
>> ...Albeit by looking closer at the implementation of flush_old_exec()
>> I think we should just move the set_fs() call there and remove it from
>> the architecture dependent implementations of flush_thread() and
>> start_thread().
>
> Yeah, that sounds like the sane (and safe - it won't matter if there
> are some architectures that get overlooked and then have a duplicate
> set_fs() somewhere) approach.

So the only question left: Should it be one patch moving the set_fs() call to
flush_old_exec() and also removing the redundant calls in flush_thread() and
start_thread() or should that be split into one for the set_fs() move and
multiple ones for the arch specific set_fs() remove?

Mathias-

2011-06-09 17:04:05

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Thu, Jun 9, 2011 at 9:40 AM, Mathias Krause <[email protected]> wrote:
>
> So the only question left: Should it be one patch moving the set_fs() call to
> flush_old_exec() and also removing the redundant calls in flush_thread() and
> start_thread() or should that be split into one for the set_fs() move and
> multiple ones for the arch specific set_fs() remove?

I'd suggest one patch that moves the set_fs(), and then possibly
removes the ones from architectures that who-ever wrote the patch can
actively test.

Doing random other architectures is not worth the effort or confusion, imho.

Linus

2011-06-09 18:05:22

by Mathias Krause

[permalink] [raw]
Subject:

On Thu, Jun 9, 2011 at 7:03 PM, Linus Torvalds <[email protected]> wrote:
> On Thu, Jun 9, 2011 at 9:40 AM, Mathias Krause <[email protected]> wrote:
>>
>> So the only question left: Should it be one patch moving the set_fs() call to
>> flush_old_exec() and also removing the redundant calls in flush_thread() and
>> start_thread() or should that be split into one for the set_fs() move and
>> multiple ones for the arch specific set_fs() remove?
>
> I'd suggest one patch that moves the set_fs(), and then possibly
> removes the ones from architectures that who-ever wrote the patch can
> actively test.
>
> Doing random other architectures is not worth the effort or confusion, imho.

Successfully tested on i386 and x86_64.

Mathias

-- >8 --
Subject: [PATCH] exec: delay address limit change until point of no return

Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.

With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Al Viro <[email protected]>
Cc: [email protected]
---
arch/x86/kernel/process_32.c | 1 -
arch/x86/kernel/process_64.c | 1 -
fs/exec.c | 5 +----
3 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..a3d0dc5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -245,7 +245,6 @@ start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
{
set_user_gs(regs, 0);
regs->fs = 0;
- set_fs(USER_DS);
regs->ds = __USER_DS;
regs->es = __USER_DS;
regs->ss = __USER_DS;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6c9dd92..ca6f7ab 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -338,7 +338,6 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
regs->cs = _cs;
regs->ss = _ss;
regs->flags = X86_EFLAGS_IF;
- set_fs(USER_DS);
/*
* Free the old FP and other extended state
*/
diff --git a/fs/exec.c b/fs/exec.c
index ea5f748..97e0d52 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1093,6 +1093,7 @@ int flush_old_exec(struct linux_binprm * bprm)

bprm->mm = NULL; /* We're using it now */

+ set_fs(USER_DS);
current->flags &= ~(PF_RANDOMIZE | PF_KTHREAD);
flush_thread();
current->personality &= ~bprm->per_clear;
@@ -1357,10 +1358,6 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
if (retval)
return retval;

- /* kernel module loader fixup */
- /* so we don't try to load run modprobe in kernel space. */
- set_fs(USER_DS);
-
retval = audit_bprm(bprm);
if (retval)
return retval;
--
1.5.6.5

2011-06-09 22:57:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Thu, 9 Jun 2011 20:05:18 +0200
Mathias Krause <[email protected]> wrote:

> Subject: [PATCH] exec: delay address limit change until point of no return
>
> Unconditionally changing the address limit to USER_DS and not restoring
> it to its old value in the error path is wrong because it prevents us
> using kernel memory on repeated calls to this function. This, in fact,
> breaks the fallback of hard coded paths to the init program from being
> ever successful if the first candidate fails to load.
>
> With this patch applied switching to USER_DS is delayed until the point
> of no return is reached which makes it possible to have a multi-arch
> rootfs with one arch specific init binary for each of the (hard coded)
> probed paths.
>
> Since the address limit is already set to USER_DS when start_thread()
> will be invoked, this redundancy can be safely removed.

A couple of things here, please.

The description doesn't describe the user-visible symptoms of the bug.
This makes it hard for the -stable maintainers to work out whether they
should accept the patch and it makes it hard for random distro
maintainers to determine whether your patch might fix a user bug report
which they're working on.

Secondly, I understand that we have identified changes which other arch
maintainers should make and test. Please describe those changes to
make it easy for them and please also describe a way in which they can
test that change.

Both these things could be addressed using a description of some
testcase.

2011-06-10 08:11:44

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Fri, Jun 10, 2011 at 12:56 AM, Andrew Morton
<[email protected]> wrote:
> On Thu, ?9 Jun 2011 20:05:18 +0200
> Mathias Krause <[email protected]> wrote:
>
>> Subject: [PATCH] exec: delay address limit change until point of no return
>>
>> Unconditionally changing the address limit to USER_DS and not restoring
>> it to its old value in the error path is wrong because it prevents us
>> using kernel memory on repeated calls to this function. This, in fact,
>> breaks the fallback of hard coded paths to the init program from being
>> ever successful if the first candidate fails to load.
>>
>> With this patch applied switching to USER_DS is delayed until the point
>> of no return is reached which makes it possible to have a multi-arch
>> rootfs with one arch specific init binary for each of the (hard coded)
>> probed paths.
>>
>> Since the address limit is already set to USER_DS when start_thread()
>> will be invoked, this redundancy can be safely removed.
>
> A couple of things here, please.
>
> The description doesn't describe the user-visible symptoms of the bug.

For current users there is no visible symptom. Current kernels will
just fail to start init when you try to rely on the fallback mechanism
of the kernel searching for init in different paths when building a
multi-arch rootfs. It just won't work if the first init binary fails
to start.

Maybe related, the bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=36692

> This makes it hard for the -stable maintainers to work out whether they
> should accept the patch and it makes it hard for random distro
> maintainers to determine whether your patch might fix a user bug report
> which they're working on.

I think the first paragraph should have made clear what is wrong with
current kernels and the second one should have pointed out the
possibilities one has with this bug being fixed. It's broken since the
very beginning (Linus pointed to some 2.4.3 kernel) and personally I
would like to have it in the .32 longterm kernel, too.

> Secondly, I understand that we have identified changes which other arch
> maintainers should make and test. ?Please describe those changes to
> make it easy for them and please also describe a way in which they can
> test that change.

The changes for the other architectures are more of a cleanup than a
bug fix. They're calling set_fs(USER_DS) either in flush_thread() or
start_thread() which is unnecessary because they're already running
under USER_DS. But they did so even before my patch, so no "visible
changes" here. I've some follow up patches in the making to remove
those set_fs() calls but fall asleep last night so didn't finish them,
yet. Maybe later the day.

> Both these things could be addressed using a description of some
> testcase.

That's true -- I owe you a test case, so here it is:

Assume you want to build a multi-arch rootfs, e.g. combined 32 and 64
bit x86 (without CONFIG_IA32_EMULATION) or x86 + ARM + SPARC to have
one rootfs usable on a couple of different machines just by using a
different kernel. You could do this by providing a statically linked
init binary for each architecture and place them under /sbin/init,
/etc/init, /bin/init and /bin/sh. The kernel will fail to start
binaries not made for its native architecture but it should be able to
start the init binary build for its architecture. This is what is
currently broken and can be verified with the following test (assuming
a x86-64 environment):

cat<<EOT >hello.c
#include <unistd.h>
#include <stdio.h>

int main(void) {
printf("Hello %s world!\n", sizeof(int) == sizeof(long) ? "32" : "64");
pause();

return 0;
}
EOT

mkdir -p rootfs/etc rootfs/bin
gcc -static -m64 hello.c -o rootfs/etc/init
gcc -static -m32 hello.c -o rootfs/bin/init

cat<<EOT | gen_init_cpio - | gzip > initfs.gz
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init /dev/null 0644 0 0
dir /sbin 0755 0 0
file /sbin/init /dev/null 0755 0 0
dir /etc 0755 0 0
file /etc/init rootfs/etc/init 0755 0 0
dir /bin 0755 0 0
file /bin/init rootfs/bin/init 0755 0 0
EOT

This generates an initramfs that won't boot on a current kernel
(3.0-rc2) but will, with the above patch applied.

Just some notes regarding the rootfs layout:
* The dummy /init file is needed, so the kernel won't call
prepare_namespace(). Otherwise it won't accept the initramfs as its
rootfs.
* The empty (but executable!) file /sbin/init should be the binary for
the "wrong" architecture, so won't execute. This should make the
kernel go ahead and try /etc/init.
* If you're running a 64 bit kernel /etc/init should start and print
out "Hello 64 bit world!", otherwise the kernel should fail to start
this binary and go ahead to /bin/init.
* Finally, if /etc/init failed, /bin/init should start and print out
"Hello 32 bit world!".

To use this test case for other architectures then x86 just skip the
etc/init file and remove the -m32 compiler switch. The dummy file
/sbin/init should make current kernels fail and kernels with the patch
applied succeed.

Mathias

2011-06-10 13:08:51

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] alpha, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
---
arch/alpha/kernel/process.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 838eac1..89bbe5b 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -200,7 +200,6 @@ show_regs(struct pt_regs *regs)
void
start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)
{
- set_fs(USER_DS);
regs->pc = pc;
regs->ps = 8;
wrusp(sp);
--
1.5.6.5

2011-06-10 13:09:00

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] arm, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/arm/include/asm/processor.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index b2d9df5..3962caf 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -55,7 +55,6 @@ struct thread_struct {
#define start_thread(regs,pc,sp) \
({ \
unsigned long *stack = (unsigned long *)sp; \
- set_fs(USER_DS); \
memset(regs->uregs, 0, sizeof(regs->uregs)); \
if (current->personality & ADDR_LIMIT_32BIT) \
regs->ARM_cpsr = USR_MODE; \
--
1.5.6.5

2011-06-10 13:09:16

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] avr32, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/avr32/include/asm/processor.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 49a88f5..108502b 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -131,7 +131,6 @@ struct thread_struct {
*/
#define start_thread(regs, new_pc, new_sp) \
do { \
- set_fs(USER_DS); \
memset(regs, 0, sizeof(*regs)); \
regs->sr = MODE_USER; \
regs->pc = new_pc & ~1; \
--
1.5.6.5

2011-06-10 13:13:01

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] blackfin, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/blackfin/kernel/process.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a660fa..6a80a9e 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -140,7 +140,6 @@ EXPORT_SYMBOL(kernel_thread);
*/
void start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
{
- set_fs(USER_DS);
regs->pc = new_ip;
if (current->mm)
regs->p5 = current->mm->start_data;
--
1.5.6.5

2011-06-10 13:09:24

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] cris, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Jesper Nilsson <[email protected]>
---
arch/cris/include/arch-v10/arch/processor.h | 1 -
arch/cris/include/arch-v32/arch/processor.h | 1 -
2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/cris/include/arch-v10/arch/processor.h b/arch/cris/include/arch-v10/arch/processor.h
index cc692c7..93feb2a 100644
--- a/arch/cris/include/arch-v10/arch/processor.h
+++ b/arch/cris/include/arch-v10/arch/processor.h
@@ -53,7 +53,6 @@ struct thread_struct {
*/

#define start_thread(regs, ip, usp) do { \
- set_fs(USER_DS); \
regs->irp = ip; \
regs->dccr |= 1 << U_DCCR_BITNR; \
wrusp(usp); \
diff --git a/arch/cris/include/arch-v32/arch/processor.h b/arch/cris/include/arch-v32/arch/processor.h
index f80b477..9603c90 100644
--- a/arch/cris/include/arch-v32/arch/processor.h
+++ b/arch/cris/include/arch-v32/arch/processor.h
@@ -47,7 +47,6 @@ struct thread_struct {
*/
#define start_thread(regs, ip, usp) \
do { \
- set_fs(USER_DS); \
regs->erp = ip; \
regs->ccs |= 1 << (U_CCS_BITNR + CCS_SHIFT); \
wrusp(usp); \
--
1.5.6.5

2011-06-10 13:09:33

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] frv, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Also removed the dead code in flush_thread().

Signed-off-by: Mathias Krause <[email protected]>
---
arch/frv/include/asm/processor.h | 1 -
arch/frv/kernel/process.c | 5 +----
2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 4b789ab..81c2e27 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -97,7 +97,6 @@ extern struct task_struct *__kernel_current_task;
*/
#define start_thread(_regs, _pc, _usp) \
do { \
- set_fs(USER_DS); /* reads from user space */ \
__frame = __kernel_frame0_ptr; \
__frame->pc = (_pc); \
__frame->psr &= ~PSR_S; \
diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index 9d35975..3901df1 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -143,10 +143,7 @@ void machine_power_off(void)

void flush_thread(void)
{
-#if 0 //ndef NO_FPU
- unsigned long zero = 0;
-#endif
- set_fs(USER_DS);
+ /* nothing */
}

inline unsigned long user_stack(const struct pt_regs *regs)
--
1.5.6.5

2011-06-10 13:09:38

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] h8300, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/h8300/include/asm/processor.h | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 69e8a34..e834b60 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -81,7 +81,6 @@ struct thread_struct {
#if defined(__H8300H__)
#define start_thread(_regs, _pc, _usp) \
do { \
- set_fs(USER_DS); /* reads from user space */ \
(_regs)->pc = (_pc); \
(_regs)->ccr = 0x00; /* clear all flags */ \
(_regs)->er5 = current->mm->start_data; /* GOT base */ \
@@ -91,7 +90,6 @@ do { \
#if defined(__H8300S__)
#define start_thread(_regs, _pc, _usp) \
do { \
- set_fs(USER_DS); /* reads from user space */ \
(_regs)->pc = (_pc); \
(_regs)->ccr = 0x00; /* clear kernel flag */ \
(_regs)->exr = 0x78; /* enable all interrupts */ \
--
1.5.6.5

2011-06-10 13:09:44

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] ia64, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Fenghua Yu <[email protected]>
---
arch/ia64/include/asm/processor.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 03afe79..b2d6051 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -309,7 +309,6 @@ struct thread_struct {
}

#define start_thread(regs,new_ip,new_sp) do { \
- set_fs(USER_DS); \
regs->cr_ipsr = ((regs->cr_ipsr | (IA64_PSR_BITS_TO_SET | IA64_PSR_CPL)) \
& ~(IA64_PSR_BITS_TO_CLEAR | IA64_PSR_RI | IA64_PSR_IS)); \
regs->cr_iip = new_ip; \
--
1.5.6.5

2011-06-10 13:09:53

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] m32r, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/m32r/include/asm/processor.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index 8397c24..e1f46d7 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -106,7 +106,6 @@ struct thread_struct {

#define start_thread(regs, new_pc, new_spu) \
do { \
- set_fs(USER_DS); \
regs->psw = (regs->psw | USERPS_BPSW) & 0x0000FFFFUL; \
regs->bpc = new_pc; \
regs->spu = new_spu; \
--
1.5.6.5

2011-06-10 13:09:58

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] m68k, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Greg Ungerer <[email protected]>
---

Note: I'm not sure about the assignment to current->thread.fs in
flush_thread() -- shouldn't this be done in set_fs() itself?

arch/m68k/include/asm/processor.h | 4 ----
arch/m68k/kernel/process_mm.c | 2 +-
arch/m68k/kernel/process_no.c | 2 +-
3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index f111b02..d8ef53a 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -105,9 +105,6 @@ struct thread_struct {
static inline void start_thread(struct pt_regs * regs, unsigned long pc,
unsigned long usp)
{
- /* reads from user space */
- set_fs(USER_DS);
-
regs->pc = pc;
regs->sr &= ~0x2000;
wrusp(usp);
@@ -129,7 +126,6 @@ extern int handle_kernel_fault(struct pt_regs *regs);

#define start_thread(_regs, _pc, _usp) \
do { \
- set_fs(USER_DS); /* reads from user space */ \
(_regs)->pc = (_pc); \
((struct switch_stack *)(_regs))[-1].a6 = 0; \
reformat(_regs); \
diff --git a/arch/m68k/kernel/process_mm.c b/arch/m68k/kernel/process_mm.c
index c2a1fc2..1bc223a 100644
--- a/arch/m68k/kernel/process_mm.c
+++ b/arch/m68k/kernel/process_mm.c
@@ -185,7 +185,7 @@ EXPORT_SYMBOL(kernel_thread);
void flush_thread(void)
{
unsigned long zero = 0;
- set_fs(USER_DS);
+
current->thread.fs = __USER_DS;
if (!FPU_IS_EMU)
asm volatile (".chip 68k/68881\n\t"
diff --git a/arch/m68k/kernel/process_no.c b/arch/m68k/kernel/process_no.c
index 9b86ad1..69c1803 100644
--- a/arch/m68k/kernel/process_no.c
+++ b/arch/m68k/kernel/process_no.c
@@ -158,7 +158,7 @@ void flush_thread(void)
#ifdef CONFIG_FPU
unsigned long zero = 0;
#endif
- set_fs(USER_DS);
+
current->thread.fs = __USER_DS;
#ifdef CONFIG_FPU
if (!FPU_IS_EMU)
--
1.5.6.5

2011-06-10 13:10:16

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] microblaze, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/microblaze/kernel/process.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 968648a..dbb8124 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -237,7 +237,6 @@ unsigned long get_wchan(struct task_struct *p)
/* Set up a thread for executing a new program */
void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long usp)
{
- set_fs(USER_DS);
regs->pc = pc;
regs->r1 = usp;
regs->pt_mode = 0;
--
1.5.6.5

2011-06-10 13:10:11

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] mips, exec: remove redundant addr_limit assignment

The address limit is already set in flush_old_exec() via set_fs(USER_DS)
so this assignment is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/mips/kernel/process.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d2112d3..a8d53e5 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -103,7 +103,6 @@ void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp)
__init_dsp();
regs->cp0_epc = pc;
regs->regs[29] = sp;
- current_thread_info()->addr_limit = USER_DS;
}

void exit_thread(void)
--
1.5.6.5

2011-06-10 13:10:18

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] mn10300, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Koichi Yasutake <[email protected]>
---
arch/mn10300/include/asm/processor.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index 4c1b5cc..f7b3c9a 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -127,7 +127,6 @@ static inline void start_thread(struct pt_regs *regs,
{
struct thread_info *ti = current_thread_info();
struct pt_regs *frame0;
- set_fs(USER_DS);

frame0 = thread_info_to_uregs(ti);
frame0->epsw = EPSW_nSL | EPSW_IE | EPSW_IM;
--
1.5.6.5

2011-06-10 13:11:56

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] parisc, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Helge Deller <[email protected]>
---
arch/parisc/include/asm/processor.h | 2 --
arch/parisc/kernel/process.c | 1 -
2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index 9ce66e9..7213ec9 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -196,7 +196,6 @@ typedef unsigned int elf_caddr_t;
/* offset pc for priv. level */ \
pc |= 3; \
\
- set_fs(USER_DS); \
regs->iasq[0] = spaceid; \
regs->iasq[1] = spaceid; \
regs->iaoq[0] = pc; \
@@ -299,7 +298,6 @@ on downward growing arches, it looks like this:
elf_addr_t pc = (elf_addr_t)new_pc | 3; \
elf_caddr_t *argv = (elf_caddr_t *)bprm->exec + 1; \
\
- set_fs(USER_DS); \
regs->iasq[0] = spaceid; \
regs->iasq[1] = spaceid; \
regs->iaoq[0] = pc; \
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 4b4b918..62c60b8 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -192,7 +192,6 @@ void flush_thread(void)
/* Only needs to handle fpu stuff or perf monitors.
** REVISIT: several arches implement a "lazy fpu state".
*/
- set_fs(USER_DS);
}

void release_thread(struct task_struct *dead_task)
--
1.5.6.5

2011-06-10 13:10:26

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] ppc, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/powerpc/kernel/process.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 91e52df..885a2dd 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -831,8 +831,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
#endif

- set_fs(USER_DS);
-
/*
* If we exec out of a kernel thread then thread.regs will not be
* set. Do it now.
--
1.5.6.5

2011-06-10 13:10:33

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] s390, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/s390/include/asm/processor.h | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1300c30..4164533 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -118,14 +118,12 @@ struct stack_frame {
* Do necessary setup to start up a new thread.
*/
#define start_thread(regs, new_psw, new_stackp) do { \
- set_fs(USER_DS); \
regs->psw.mask = psw_user_bits; \
regs->psw.addr = new_psw | PSW_ADDR_AMODE; \
regs->gprs[15] = new_stackp; \
} while (0)

#define start_thread31(regs, new_psw, new_stackp) do { \
- set_fs(USER_DS); \
regs->psw.mask = psw_user32_bits; \
regs->psw.addr = new_psw | PSW_ADDR_AMODE; \
regs->gprs[15] = new_stackp; \
--
1.5.6.5

2011-06-10 13:10:53

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] sh, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/sh/include/asm/processor_64.h | 1 -
arch/sh/kernel/process_32.c | 2 --
2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/sh/include/asm/processor_64.h b/arch/sh/include/asm/processor_64.h
index 2a541dd..e25c4c7 100644
--- a/arch/sh/include/asm/processor_64.h
+++ b/arch/sh/include/asm/processor_64.h
@@ -150,7 +150,6 @@ struct thread_struct {
#define SR_USER (SR_MMU | SR_FD)

#define start_thread(_regs, new_pc, new_sp) \
- set_fs(USER_DS); \
_regs->sr = SR_USER; /* User mode. */ \
_regs->pc = new_pc - 4; /* Compensate syscall exit */ \
_regs->pc |= 1; /* Set SHmedia ! */ \
diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
index b473f0c..aaf6d59 100644
--- a/arch/sh/kernel/process_32.c
+++ b/arch/sh/kernel/process_32.c
@@ -102,8 +102,6 @@ EXPORT_SYMBOL(kernel_thread);
void start_thread(struct pt_regs *regs, unsigned long new_pc,
unsigned long new_sp)
{
- set_fs(USER_DS);
-
regs->pr = 0;
regs->sr = SR_FD;
regs->pc = new_pc;
--
1.5.6.5

2011-06-10 13:10:57

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] sparc, exec: remove redundant addr_limit assignment

The address limit is already set in flush_old_exec() so this
assignment of USER_DS is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/sparc/kernel/process_32.c | 3 +--
arch/sparc/kernel/process_64.c | 3 ---
2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
index c8cc461..f793742 100644
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -380,8 +380,7 @@ void flush_thread(void)
#endif
}

- /* Now, this task is no longer a kernel thread. */
- current->thread.current_ds = USER_DS;
+ /* This task is no longer a kernel thread. */
if (current->thread.flags & SPARC_FLAG_KTHREAD) {
current->thread.flags &= ~SPARC_FLAG_KTHREAD;

diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..d959cd0 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -368,9 +368,6 @@ void flush_thread(void)

/* Clear FPU register state. */
t->fpsaved[0] = 0;
-
- if (get_thread_current_ds() != ASI_AIUS)
- set_fs(USER_DS);
}

/* It's a bit more tricky when 64-bit tasks are involved... */
--
1.5.6.5

2011-06-10 13:11:05

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] um, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Richard Weinberger <[email protected]>
---
arch/um/kernel/exec.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index 09bd7b5..939a4a6 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -38,7 +38,6 @@ void flush_thread(void)

void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
{
- set_fs(USER_DS);
PT_REGS_IP(regs) = eip;
PT_REGS_SP(regs) = esp;
}
--
1.5.6.5

2011-06-10 13:11:13

by Mathias Krause

[permalink] [raw]
Subject: [PATCH] unicore32, exec: remove redundant set_fs(USER_DS)

The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.

Signed-off-by: Mathias Krause <[email protected]>
---
arch/unicore32/include/asm/processor.h | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index e11cb07..f0d780a 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -53,7 +53,6 @@ struct thread_struct {
#define start_thread(regs, pc, sp) \
({ \
unsigned long *stack = (unsigned long *)sp; \
- set_fs(USER_DS); \
memset(regs->uregs, 0, sizeof(regs->uregs)); \
regs->UCreg_asr = USER_MODE; \
regs->UCreg_pc = pc & ~1; /* pc */ \
--
1.5.6.5

2011-06-10 13:48:46

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH] arm, exec: remove redundant set_fs(USER_DS)

On Fri, Jun 10, 2011 at 03:08:57PM +0200, Mathias Krause wrote:
> The address limit is already set in flush_old_exec() so this
> set_fs(USER_DS) is redundant.

Please show where and how this is done. I've looked and can't see
any equivalent call to set_fs() in flush_old_exec().

2011-06-10 13:53:27

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] arm, exec: remove redundant set_fs(USER_DS)

On Fri, Jun 10, 2011 at 3:48 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Fri, Jun 10, 2011 at 03:08:57PM +0200, Mathias Krause wrote:
>> The address limit is already set in flush_old_exec() so this
>> set_fs(USER_DS) is redundant.
>
> Please show where and how this is done. ?I've looked and can't see
> any equivalent call to set_fs() in flush_old_exec().

Before dac853a (exec: delay address limit change until point of no
return) it was done in search_binary_handler(), now it is done in
flush_old_exec(). Either way set_fs(USER_DS) was/gets called before
start_thread() so the call there is redundant.

Mathias

2011-06-10 14:17:47

by Mike Frysinger

[permalink] [raw]
Subject: Re: [PATCH] blackfin, exec: remove redundant set_fs(USER_DS)

On Fri, Jun 10, 2011 at 09:09, Mathias Krause wrote:
> The address limit is already set in flush_old_exec() so this
> set_fs(USER_DS) is redundant.

thanks, ive merged this into my Blackfin tree
-mike

2011-06-10 15:52:27

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Fri, 10 Jun 2011 10:11:39 +0200 Mathias Krause wrote:

> cat<<EOT >hello.c
> #include <unistd.h>
> #include <stdio.h>
>
> int main(void) {
> printf("Hello %s world!\n", sizeof(int) == sizeof(long) ? "32" : "64");

"Hello %s bit world!\n"

:)

> pause();
>
> return 0;
> }
> EOT


> * If you're running a 64 bit kernel /etc/init should start and print
> out "Hello 64 bit world!", otherwise the kernel should fail to start
> this binary and go ahead to /bin/init.
> * Finally, if /etc/init failed, /bin/init should start and print out
> "Hello 32 bit world!".


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

2011-06-10 20:00:58

by Richard Weinberger

[permalink] [raw]
Subject: Re: [PATCH] um, exec: remove redundant set_fs(USER_DS)

Am Freitag 10 Juni 2011, 15:10:57 schrieb Mathias Krause:
> The address limit is already set in flush_old_exec() so this
> set_fs(USER_DS) is redundant.
>
> Signed-off-by: Mathias Krause <[email protected]>
> Cc: Richard Weinberger <[email protected]>

Applied.

Thanks,
//richard

2011-06-11 23:08:49

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

From: Mathias Krause <[email protected]>
Date: Fri, 10 Jun 2011 15:10:53 +0200

> The address limit is already set in flush_old_exec() so this
> assignment of USER_DS is redundant.
>
> Signed-off-by: Mathias Krause <[email protected]>
...
> @@ -368,9 +368,6 @@ void flush_thread(void)
>
> /* Clear FPU register state. */
> t->fpsaved[0] = 0;
> -
> - if (get_thread_current_ds() != ASI_AIUS)
> - set_fs(USER_DS);
> }

Yeah but now you're doing it unconditionally, the guard is here
because the %asi register write which set_fs() does is extremely
expensive on sparc64 and %99.99999 of the time we can avoid it.

2011-06-11 23:44:24

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

On Sat, Jun 11, 2011 at 04:08:42PM -0700, David Miller wrote:
> From: Mathias Krause <[email protected]>
> Date: Fri, 10 Jun 2011 15:10:53 +0200
>
> > The address limit is already set in flush_old_exec() so this
> > assignment of USER_DS is redundant.
> >
> > Signed-off-by: Mathias Krause <[email protected]>
> ...
> > @@ -368,9 +368,6 @@ void flush_thread(void)
> >
> > /* Clear FPU register state. */
> > t->fpsaved[0] = 0;
> > -
> > - if (get_thread_current_ds() != ASI_AIUS)
> > - set_fs(USER_DS);
> > }
>
> Yeah but now you're doing it unconditionally, the guard is here
> because the %asi register write which set_fs() does is extremely
> expensive on sparc64 and %99.99999 of the time we can avoid it.

OTOH, get_thread_current_ds() is cheap and moving that into set_fs()
itself wouldn't be particulary bad idea...

2011-06-12 00:58:43

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

From: Al Viro <[email protected]>
Date: Sun, 12 Jun 2011 00:44:15 +0100

> On Sat, Jun 11, 2011 at 04:08:42PM -0700, David Miller wrote:
>> From: Mathias Krause <[email protected]>
>> Date: Fri, 10 Jun 2011 15:10:53 +0200
>>
>> > @@ -368,9 +368,6 @@ void flush_thread(void)
>> >
>> > /* Clear FPU register state. */
>> > t->fpsaved[0] = 0;
>> > -
>> > - if (get_thread_current_ds() != ASI_AIUS)
>> > - set_fs(USER_DS);
>> > }
>>
>> Yeah but now you're doing it unconditionally, the guard is here
>> because the %asi register write which set_fs() does is extremely
>> expensive on sparc64 and %99.99999 of the time we can avoid it.
>
> OTOH, get_thread_current_ds() is cheap and moving that into set_fs()
> itself wouldn't be particulary bad idea...

The reason the test is only in this specific spot, is that's the only
place where the optimization makes any sense.

The rest of the time it's in compat layer code or similar, where we
know the set_fs() is actually necessary.

Therefore, expanding the test into every set_fs() call would be
wasteful.

2011-06-12 01:01:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

On Sat, Jun 11, 2011 at 5:58 PM, David Miller <[email protected]> wrote:
>
> The reason the test is only in this specific spot, is that's the only
> place where the optimization makes any sense.

Well, considering that it didn't make any sense there *either*, that's
kind of pointless.

We always had the unconditional "set_fs()" in the exec path. It only
got moved, and as part of that conscious effort, the pointless
architecture churn is getting cleaned up.

Linus

2011-06-12 01:04:43

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

From: Linus Torvalds <[email protected]>
Date: Sat, 11 Jun 2011 18:01:06 -0700

> We always had the unconditional "set_fs()" in the exec path. It only
> got moved, and as part of that conscious effort, the pointless
> architecture churn is getting cleaned up.

Sure.

2011-06-13 09:19:48

by Xuetao Guan

[permalink] [raw]
Subject: Re: [PATCH] unicore32, exec: remove redundant set_fs(USER_DS)

On Fri, 2011-06-10 at 15:11 +0200, Mathias Krause wrote:
> The address limit is already set in flush_old_exec() so this
> set_fs(USER_DS) is redundant.
>
> Signed-off-by: Mathias Krause <[email protected]>
> ---
> arch/unicore32/include/asm/processor.h | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
> index e11cb07..f0d780a 100644
> --- a/arch/unicore32/include/asm/processor.h
> +++ b/arch/unicore32/include/asm/processor.h
> @@ -53,7 +53,6 @@ struct thread_struct {
> #define start_thread(regs, pc, sp) \
> ({ \
> unsigned long *stack = (unsigned long *)sp; \
> - set_fs(USER_DS); \
> memset(regs->uregs, 0, sizeof(regs->uregs)); \
> regs->UCreg_asr = USER_MODE; \
> regs->UCreg_pc = pc & ~1; /* pc */ \

Hi Mathias,
I searched for the code in flush_old_exec(), but I can't find the code
you mentioned. Could you make it more clear?
And, if all fs codes (not only elf and aout) have the similar
implementations, perhaps all arch-specific codes should be manipulated
in the meanwhile.

Also Cc: Arnd Bergmann <[email protected]>

Thanks & Regards.

Guan Xuetao

2011-06-13 16:03:10

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] unicore32, exec: remove redundant set_fs(USER_DS)

On 13.06.2011, 11:19 Guan Xuetao wrote:
> On Fri, 2011-06-10 at 15:11 +0200, Mathias Krause wrote:
>> The address limit is already set in flush_old_exec() so this
>> set_fs(USER_DS) is redundant.
>>
>> Signed-off-by: Mathias Krause <[email protected]>
>> ---
>> arch/unicore32/include/asm/processor.h | 1 -
>> 1 files changed, 0 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
>> index e11cb07..f0d780a 100644
>> --- a/arch/unicore32/include/asm/processor.h
>> +++ b/arch/unicore32/include/asm/processor.h
>> @@ -53,7 +53,6 @@ struct thread_struct {
>> #define start_thread(regs, pc, sp) \
>> ({ \
>> unsigned long *stack = (unsigned long *)sp; \
>> - set_fs(USER_DS); \
>> memset(regs->uregs, 0, sizeof(regs->uregs)); \
>> regs->UCreg_asr = USER_MODE; \
>> regs->UCreg_pc = pc & ~1; /* pc */ \
>
> Hi Mathias,
> I searched for the code in flush_old_exec(), but I can't find the code
> you mentioned. Could you make it more clear?

Sorry, this statement is based on a commit post v3.0-rc2. Before dac853a (exec:
delay address limit change until point of no return) it was done in
search_binary_handler(), now it is done in flush_old_exec(). Either way
set_fs(USER_DS) gets called before start_thread() so the call there is
redundant.

> And, if all fs codes (not only elf and aout) have the similar
> implementations,

I've checked that all binary format handler call flush_old_exec() before
start_thread(). So: yes.

> perhaps all arch-specific codes should be manipulated
> in the meanwhile.

That's what this LKML thread is for: https://lkml.org/lkml/2011/6/10/65

Thanks,
Mathias-

2011-06-13 20:36:45

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

On 12.06.2011, at 01:08 David Miller wrote:
> From: Mathias Krause <[email protected]>
> Date: Fri, 10 Jun 2011 15:10:53 +0200
>
>> The address limit is already set in flush_old_exec() so this
>> assignment of USER_DS is redundant.
>>
>> Signed-off-by: Mathias Krause <[email protected]>
> ...
>> @@ -368,9 +368,6 @@ void flush_thread(void)
>>
>> /* Clear FPU register state. */
>> t->fpsaved[0] = 0;
>> -
>> - if (get_thread_current_ds() != ASI_AIUS)
>> - set_fs(USER_DS);
>> }
>
> Yeah but now you're doing it unconditionally, the guard is here
> because the %asi register write which set_fs() does is extremely
> expensive on sparc64 and %99.99999 of the time we can avoid it.

As Linus already pointed out, that set_fs() was never called because
we already had (and still have) an unconditional set_fs() in the arch
independent code. So this patch just removes some dead code.

Mathias

2011-06-14 06:33:44

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH] sh, exec: remove redundant set_fs(USER_DS)

On Fri, Jun 10, 2011 at 03:10:48PM +0200, Mathias Krause wrote:
> The address limit is already set in flush_old_exec() so those calls to
> set_fs(USER_DS) are redundant.
>
> Signed-off-by: Mathias Krause <[email protected]>

Applied, thanks.

2011-06-14 07:03:23

by Xuetao Guan

[permalink] [raw]
Subject: Re: [PATCH] unicore32, exec: remove redundant set_fs(USER_DS)

On Mon, 2011-06-13 at 18:02 +0200, Mathias Krause wrote:
> On 13.06.2011, 11:19 Guan Xuetao wrote:
> > On Fri, 2011-06-10 at 15:11 +0200, Mathias Krause wrote:
> >> The address limit is already set in flush_old_exec() so this
> >> set_fs(USER_DS) is redundant.
> >>
> >> Signed-off-by: Mathias Krause <[email protected]>
> >> ---
> >> arch/unicore32/include/asm/processor.h | 1 -
> >> 1 files changed, 0 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
> >> index e11cb07..f0d780a 100644
> >> --- a/arch/unicore32/include/asm/processor.h
> >> +++ b/arch/unicore32/include/asm/processor.h
> >> @@ -53,7 +53,6 @@ struct thread_struct {
> >> #define start_thread(regs, pc, sp) \
> >> ({ \
> >> unsigned long *stack = (unsigned long *)sp; \
> >> - set_fs(USER_DS); \
> >> memset(regs->uregs, 0, sizeof(regs->uregs)); \
> >> regs->UCreg_asr = USER_MODE; \
> >> regs->UCreg_pc = pc & ~1; /* pc */ \
> >
> > Hi Mathias,
> > I searched for the code in flush_old_exec(), but I can't find the code
> > you mentioned. Could you make it more clear?
>
> Sorry, this statement is based on a commit post v3.0-rc2. Before dac853a (exec:
> delay address limit change until point of no return) it was done in
> search_binary_handler(), now it is done in flush_old_exec(). Either way
> set_fs(USER_DS) gets called before start_thread() so the call there is
> redundant.
>
> > And, if all fs codes (not only elf and aout) have the similar
> > implementations,
>
> I've checked that all binary format handler call flush_old_exec() before
> start_thread(). So: yes.
>
> > perhaps all arch-specific codes should be manipulated
> > in the meanwhile.
>
> That's what this LKML thread is for: https://lkml.org/lkml/2011/6/10/65
>
> Thanks,
> Mathias
Thanks for your explanations.
The patch looks good to me.

Guan Xuetao

2011-06-14 11:28:45

by Hans-Christian Egtvedt

[permalink] [raw]
Subject: Re: [PATCH] avr32, exec: remove redundant set_fs(USER_DS)

On Fri, 2011-06-10 at 15:09 +0200, Mathias Krause wrote:
> The address limit is already set in flush_old_exec() so this
> set_fs(USER_DS) is redundant.
>
> Signed-off-by: Mathias Krause <[email protected]>

Thanks, after searching up the history before this patch I see the whole
point as well. Please drop a line/link to parent patches that explains
the whole picture in the future.

I'll apply this to the AVR32 tree.

--
Hans-Christian Egtvedt

2011-06-15 14:40:32

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH] m68k, exec: remove redundant set_fs(USER_DS)

On Fri, Jun 10, 2011 at 15:09, Mathias Krause <[email protected]> wrote:
> The address limit is already set in flush_old_exec() so those calls to
> set_fs(USER_DS) are redundant.
>
> Signed-off-by: Mathias Krause <[email protected]>
> Cc: Greg Ungerer <[email protected]>
> ---
>
> Note: I'm not sure about the assignment to current->thread.fs in
> flush_thread() -- shouldn't this be done in set_fs() itself?

set_fs() is used to temporary set the address space to be used from the
kernel.
current->thread.fs is the address space that will be used when the
thread returns
to userspace.
So I think it's correct.

For nommu, thread.fs is set, but not really used.

>  arch/m68k/include/asm/processor.h |    4 ----
>  arch/m68k/kernel/process_mm.c     |    2 +-
>  arch/m68k/kernel/process_no.c     |    2 +-
>  3 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
> index f111b02..d8ef53a 100644
> --- a/arch/m68k/include/asm/processor.h
> +++ b/arch/m68k/include/asm/processor.h
> @@ -105,9 +105,6 @@ struct thread_struct {
>  static inline void start_thread(struct pt_regs * regs, unsigned long pc,
>                                unsigned long usp)
>  {
> -       /* reads from user space */
> -       set_fs(USER_DS);
> -
>        regs->pc = pc;
>        regs->sr &= ~0x2000;
>        wrusp(usp);
> @@ -129,7 +126,6 @@ extern int handle_kernel_fault(struct pt_regs *regs);
>
>  #define start_thread(_regs, _pc, _usp)                  \
>  do {                                                    \
> -       set_fs(USER_DS); /* reads from user space */    \
>        (_regs)->pc = (_pc);                            \
>        ((struct switch_stack *)(_regs))[-1].a6 = 0;    \
>        reformat(_regs);                                \
> diff --git a/arch/m68k/kernel/process_mm.c b/arch/m68k/kernel/process_mm.c
> index c2a1fc2..1bc223a 100644
> --- a/arch/m68k/kernel/process_mm.c
> +++ b/arch/m68k/kernel/process_mm.c
> @@ -185,7 +185,7 @@ EXPORT_SYMBOL(kernel_thread);
>  void flush_thread(void)
>  {
>        unsigned long zero = 0;
> -       set_fs(USER_DS);
> +
>        current->thread.fs = __USER_DS;
>        if (!FPU_IS_EMU)
>                asm volatile (".chip 68k/68881\n\t"
> diff --git a/arch/m68k/kernel/process_no.c b/arch/m68k/kernel/process_no.c
> index 9b86ad1..69c1803 100644
> --- a/arch/m68k/kernel/process_no.c
> +++ b/arch/m68k/kernel/process_no.c
> @@ -158,7 +158,7 @@ void flush_thread(void)
>  #ifdef CONFIG_FPU
>        unsigned long zero = 0;
>  #endif
> -       set_fs(USER_DS);
> +
>        current->thread.fs = __USER_DS;
>  #ifdef CONFIG_FPU
>        if (!FPU_IS_EMU)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

2011-06-15 15:49:41

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] m68k, exec: remove redundant set_fs(USER_DS)

On Wed, Jun 15, 2011 at 4:40 PM, Geert Uytterhoeven
<[email protected]> wrote:
> On Fri, Jun 10, 2011 at 15:09, Mathias Krause <[email protected]> wrote:
>> The address limit is already set in flush_old_exec() so those calls to
>> set_fs(USER_DS) are redundant.
>>
>> Signed-off-by: Mathias Krause <[email protected]>
>> Cc: Greg Ungerer <[email protected]>
>> ---
>>
>> Note: I'm not sure about the assignment to current->thread.fs in
>> flush_thread() -- shouldn't this be done in set_fs() itself?
>
> set_fs() is used to temporary set the address space to be used from the
> kernel.
> current->thread.fs is the address space that will be used when the
> thread returns
> to userspace.
> So I think it's correct.
>
> For nommu, thread.fs is set, but not really used.

Thanks for the explanation. So only the set_fs() is redundant.

Mathias

2011-06-17 18:45:54

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] sparc, exec: remove redundant addr_limit assignment

On 12.06.2011, 01:08 David Miller wrote:

> From: Mathias Krause <[email protected]>
> Date: Fri, 10 Jun 2011 15:10:53 +0200
>
>> The address limit is already set in flush_old_exec() so this
>> assignment of USER_DS is redundant.
>>
>> Signed-off-by: Mathias Krause <[email protected]>
> ...
>> @@ -368,9 +368,6 @@ void flush_thread(void)
>>
>> /* Clear FPU register state. */
>> t->fpsaved[0] = 0;
>> -
>> - if (get_thread_current_ds() != ASI_AIUS)
>> - set_fs(USER_DS);
>> }
>
> Yeah but now you're doing it unconditionally, the guard is here
> because the %asi register write which set_fs() does is extremely
> expensive on sparc64 and %99.99999 of the time we can avoid it.


David, can you please use the attached test program to give us some
numbers on how expensive the write to the %asi register is, relative to
the complexity of the whole exec() path. Please test it w/ and w/o the
attached patch. If the difference is significant it might be worth the
pain to push the set_fs() down to the arch specific code, e.g. into
flush_thread() as on SPARC, and remove it from flush_old_exec().

For the test something like:
$ for i in 1 2 3; do echo "run #$i:"; time exec_test 5000 /bin/true; done
or better:
$ perf stat -r3 exec_test 5000 /bin/true
should give some reasonable numbers.

I've no SPARC machine, so can't test myself.


Thanks,
Mathias


Attachments:
exec_test.c (999.00 B)
no_unconditional_set_fs.diff (346.00 B)
Download all attachments

2011-06-27 04:29:38

by Mathias Krause

[permalink] [raw]
Subject: Re: [PATCH] arm, exec: remove redundant set_fs(USER_DS)

On Fri, Jun 10, 2011 at 3:53 PM, Mathias Krause <[email protected]> wrote:
> On Fri, Jun 10, 2011 at 3:48 PM, Russell King - ARM Linux
> <[email protected]> wrote:
>> On Fri, Jun 10, 2011 at 03:08:57PM +0200, Mathias Krause wrote:
>>> The address limit is already set in flush_old_exec() so this
>>> set_fs(USER_DS) is redundant.
>>
>> Please show where and how this is done. ?I've looked and can't see
>> any equivalent call to set_fs() in flush_old_exec().
>
> Before dac853a (exec: delay address limit change until point of no
> return) it was done in search_binary_handler(), now it is done in
> flush_old_exec(). Either way set_fs(USER_DS) was/gets called before
> start_thread() so the call there is redundant.

Russell, any new opinion on this?

Mathias