2003-01-18 06:01:09

by Anton Blanchard

[permalink] [raw]
Subject: recent change to exit_mmap


Hi,

On ppc64 a 32bit task has 4GB and a 64bit task has 2TB of address space.

We use a bit in the thread struct to decide which limit to apply against
TASK_SIZE:

#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_USER32 : TASK_SIZE_USER64)

The TIF_32BIT flag gets set in the arch specific SET_PERSONALITY hook
in load_elf_binary.

After the recent changes in mm/mmap.c, the following sequence of events
happens:

1. a 64bit task tries to exec a 32bit one
2. we reach load_elf_binary
3. call SET_PERSONALITY which sets TIF_32BIT to true
4. call flush_old_exec->exec_mmap->mmput->exit_mmap
5. call unmap_vmas(,,,,TASK_SIZE,) which only flushes mappings below 4GB
6. BUG_ON in exit_mmap

It seems with the TIF_32BIT scheme we have a window between
SET_PERSONALITY until exec returns where TASK_SIZE might be considered
incorrect (although at which exact point to you decide that, yes we are
now in the new context).

Any ideas on how to fix this? Maybe we can move SET_PERSONALITY down
after flush_old_exec.

Anton


2003-01-18 06:34:28

by Andrew Morton

[permalink] [raw]
Subject: Re: recent change to exit_mmap

Anton Blanchard <[email protected]> wrote:
>
>
> Hi,
>
> On ppc64 a 32bit task has 4GB and a 64bit task has 2TB of address space.
>
> We use a bit in the thread struct to decide which limit to apply against
> TASK_SIZE:
>
> #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
> TASK_SIZE_USER32 : TASK_SIZE_USER64)
>
> The TIF_32BIT flag gets set in the arch specific SET_PERSONALITY hook
> in load_elf_binary.
>
> After the recent changes in mm/mmap.c, the following sequence of events
> happens:
>
> 1. a 64bit task tries to exec a 32bit one
> 2. we reach load_elf_binary
> 3. call SET_PERSONALITY which sets TIF_32BIT to true
> 4. call flush_old_exec->exec_mmap->mmput->exit_mmap
> 5. call unmap_vmas(,,,,TASK_SIZE,) which only flushes mappings below 4GB
> 6. BUG_ON in exit_mmap
>
> It seems with the TIF_32BIT scheme we have a window between
> SET_PERSONALITY until exec returns where TASK_SIZE might be considered
> incorrect (although at which exact point to you decide that, yes we are
> now in the new context).
>
> Any ideas on how to fix this? Maybe we can move SET_PERSONALITY down
> after flush_old_exec.

(Does this only apply to elf?)


If the old process has 64-bit virtual address space and the new process has
32-bit, you want to unmap the 64-bit space, yes?

And if the old process is 32-bit and the new process is 64-bit (possible?)
you want to unmap the 32-bit space. But unmapping 64-bit space here is
harmless, yes?

Yes, it would be more logical to still be running with the personality of the
old process when we try to expunge all its mappings. However I suspect we
need all those mappings to grab the exec arguments.

How about we make exit_mmap() be independent of the personality again by
doing something like this?

Looks like ia64 needs work, too...



asm-ppc64/processor.h | 1 +
mmap.c | 18 +++++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)

diff -puN mm/mmap.c~exit_mmap-fix mm/mmap.c
--- 25/mm/mmap.c~exit_mmap-fix 2003-01-17 22:35:13.000000000 -0800
+++ 25-akpm/mm/mmap.c 2003-01-17 22:41:10.000000000 -0800
@@ -1379,6 +1379,22 @@ void build_mmap_rb(struct mm_struct * mm
}
}

+/*
+ * During exit_mmap, TASK_SIZE is not a reliable indication of the virtual
+ * size of the mm which is being torn down. Because on the exec() path, this
+ * process may have switched its personality from 64-bit to 32-bit prior to
+ * calling exit_mmap(). So TASK_SIZE returns a value suitable for a 32-bit
+ * process, and not the 64-bit process whose mm we need to invalidate.
+ *
+ * So what we do is to always unmap the largest virtual address space which
+ * the architecture supports. unmap_vmas() will then unmap every VMA in the
+ * mm, which is what we want to happen here.
+ */
+
+#ifndef TASK_SIZE_MAX
+#define TASK_SIZE_MAX TASK_SIZE
+#endif
+
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
@@ -1395,7 +1411,7 @@ void exit_mmap(struct mm_struct *mm)
tlb = tlb_gather_mmu(mm, 1);
flush_cache_mm(mm);
mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
- TASK_SIZE, &nr_accounted);
+ TASK_SIZE_MAX, &nr_accounted);
vm_unacct_memory(nr_accounted);
BUG_ON(mm->map_count); /* This is just debugging */
clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
diff -puN include/asm-ppc64/processor.h~exit_mmap-fix include/asm-ppc64/processor.h
--- 25/include/asm-ppc64/processor.h~exit_mmap-fix 2003-01-17 22:41:32.000000000 -0800
+++ 25-akpm/include/asm-ppc64/processor.h 2003-01-17 22:42:03.000000000 -0800
@@ -630,6 +630,7 @@ extern struct task_struct *last_task_use

#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_USER32 : TASK_SIZE_USER64)
+#define TASK_SIZE_MAX TASK_SIZE_USER64
#endif /* __KERNEL__ */



_

2003-01-18 06:49:57

by Andrew Morton

[permalink] [raw]
Subject: Re: recent change to exit_mmap

Anton Blanchard <[email protected]> wrote:
>
> It seems with the TIF_32BIT scheme we have a window between
> SET_PERSONALITY until exec returns where TASK_SIZE might be considered
> incorrect (although at which exact point to you decide that, yes we are
> now in the new context).
>
> Any ideas on how to fix this? Maybe we can move SET_PERSONALITY down
> after flush_old_exec.
>

On seconds thoughts...

Are the first two SET_PERSONALITY() calls in there actually doing anything?
Perhaps you're right, and we only need the one at line 615, whcih is in the
right place?

2003-01-18 07:14:53

by Anton Blanchard

[permalink] [raw]
Subject: Re: recent change to exit_mmap


> On seconds thoughts...
>
> Are the first two SET_PERSONALITY() calls in there actually doing anything?
> Perhaps you're right, and we only need the one at line 615, whcih is in the
> right place?

Sounds good. The following patch tests out OK.

Anton

===== fs/binfmt_elf.c 1.34 vs edited =====
--- 1.34/fs/binfmt_elf.c Mon Nov 25 19:59:02 2002
+++ edited/fs/binfmt_elf.c Sat Jan 18 18:16:52 2003
@@ -536,8 +536,6 @@
strcmp(elf_interpreter,"/usr/lib/ld.so.1") == 0)
ibcs2_interpreter = 1;

- SET_PERSONALITY(elf_ex, ibcs2_interpreter);
-
interpreter = open_exec(elf_interpreter);
retval = PTR_ERR(interpreter);
if (IS_ERR(interpreter))
@@ -578,9 +576,6 @@
// printk(KERN_WARNING "ELF: Ambiguous type, using ELF\n");
interpreter_type = INTERPRETER_ELF;
}
- } else {
- /* Executables without an interpreter also need a personality */
- SET_PERSONALITY(elf_ex, ibcs2_interpreter);
}

/* OK, we are done with that, now set up the arg stuff,

2003-01-18 07:33:57

by David Mosberger

[permalink] [raw]
Subject: Re: recent change to exit_mmap

>>>>> On Fri, 17 Jan 2003 22:44:44 -0800, Andrew Morton <[email protected]> said:

Andrew> Looks like ia64 needs work, too...

Yes, should be the same problem there. The fix looks fine to me.
(Let's just hope I remember it when Linus puts it in his tree...).

Thanks,

--david

2003-01-18 07:42:59

by Andrew Morton

[permalink] [raw]
Subject: Re: recent change to exit_mmap

David Mosberger <[email protected]> wrote:
>
> >>>>> On Fri, 17 Jan 2003 22:44:44 -0800, Andrew Morton <[email protected]> said:
>
> Andrew> Looks like ia64 needs work, too...
>
> Yes, should be the same problem there. The fix looks fine to me.
> (Let's just hope I remember it when Linus puts it in his tree...).
>

I've updated that patch to cover ia64, but I think we'll run with the other
approach - just remove those calls to SET_PERSONALITY().

They just seem illogical anyway - why are we switching into the new image's
personality prior to unmapping the old image's memory?

2003-01-18 07:49:22

by David Mosberger

[permalink] [raw]
Subject: Re: recent change to exit_mmap

>>>>> On Fri, 17 Jan 2003 23:53:17 -0800, Andrew Morton <[email protected]> said:

Andrew> David Mosberger <[email protected]> wrote:
>> >>>>> On Fri, 17 Jan 2003 22:44:44 -0800, Andrew Morton
>> <[email protected]> said:

Andrew> Looks like ia64 needs work, too...
>> Yes, should be the same problem there. The fix looks fine to
>> me. (Let's just hope I remember it when Linus puts it in his
>> tree...).

Andrew> I've updated that patch to cover ia64, but I think we'll run
Andrew> with the other approach - just remove those calls to
Andrew> SET_PERSONALITY().

Andrew> They just seem illogical anyway - why are we switching into
Andrew> the new image's personality prior to unmapping the old
Andrew> image's memory?

I don't know why SET_PERSONALITY() came to be where it is now, but it
does make some sense to me. One thing that comes to mind: on ia64, we
normally don't map data segments with execute permission but for
backwards-compatibility, we need to do that for x86 binaries. I think
there might be a problem with that if SET_PERSONALITY() was done too
late. Certainly something that could be fixed, but I suspect a
similar ordering issue (perhaps on SPARC?) might have triggered the
current placement of SET_PERSONALITY().

--david

2003-01-18 08:05:30

by Andrew Morton

[permalink] [raw]
Subject: Re: recent change to exit_mmap

David Mosberger <[email protected]> wrote:
>
> I don't know why SET_PERSONALITY() came to be where it is now, but it
> does make some sense to me. One thing that comes to mind: on ia64, we
> normally don't map data segments with execute permission but for
> backwards-compatibility, we need to do that for x86 binaries. I think
> there might be a problem with that if SET_PERSONALITY() was done too
> late. Certainly something that could be fixed, but I suspect a
> similar ordering issue (perhaps on SPARC?) might have triggered the
> current placement of SET_PERSONALITY().
>

hmm. Seems that all the activities between the two first SET_PERSONALITY()
calls and the flush_old_exec() are pretty innocuous. And no mappings could
be set up there, because flush_old_exec() would remove them again.

I'll ask Dave about it.


2003-01-18 08:07:38

by David Mosberger

[permalink] [raw]
Subject: Re: recent change to exit_mmap

>>>>> On Sat, 18 Jan 2003 00:15:46 -0800, Andrew Morton <[email protected]> said:

Andrew> hmm. Seems that all the activities between the two first
Andrew> SET_PERSONALITY() calls and the flush_old_exec() are pretty
Andrew> innocuous. And no mappings could be set up there, because
Andrew> flush_old_exec() would remove them again.

That's true. Perhaps it is just cruft.

--david

2003-01-18 10:34:37

by Andrew Morton

[permalink] [raw]
Subject: Re: recent change to exit_mmap

Anton Blanchard <[email protected]> wrote:
>
>
> > On seconds thoughts...
> >
> > Are the first two SET_PERSONALITY() calls in there actually doing anything?
> > Perhaps you're right, and we only need the one at line 615, whcih is in the
> > right place?
>
> Sounds good. The following patch tests out OK.
>
> Anton
>
> ===== fs/binfmt_elf.c 1.34 vs edited =====
> --- 1.34/fs/binfmt_elf.c Mon Nov 25 19:59:02 2002
> +++ edited/fs/binfmt_elf.c Sat Jan 18 18:16:52 2003
> @@ -536,8 +536,6 @@
> strcmp(elf_interpreter,"/usr/lib/ld.so.1") == 0)
> ibcs2_interpreter = 1;
>
> - SET_PERSONALITY(elf_ex, ibcs2_interpreter);
> -
> interpreter = open_exec(elf_interpreter);
> retval = PTR_ERR(interpreter);
> if (IS_ERR(interpreter))
> @@ -578,9 +576,6 @@
> // printk(KERN_WARNING "ELF: Ambiguous type, using ELF\n");
> interpreter_type = INTERPRETER_ELF;
> }
> - } else {
> - /* Executables without an interpreter also need a personality */
> - SET_PERSONALITY(elf_ex, ibcs2_interpreter);
> }

Unfortunately it isn't right. See, we set the peronality prior to looking up
the elf interpreter because different types of executables can have different
filesystem namespaces. open_exec() needs it. The IBCS stuff needs this, as
well as (thanks davem):

"When we have a personality like "SunOS" or whatever, we allow the usage
of an alternate '/' for file lookups, it trumps whatever would be found in
the normal /, so we could put the SunOS ld.so under /emul/SunOS/lib for
example using an emulation alternative-root of /emul/SunOS"

And we can't dump the currect exec image until we've determined that the
interpreter is available.

So it's all a bit of a stew. We're back to the original patch. Could you
please test it? (This includes the ia64 bit)


diff -puN mm/mmap.c~exit_mmap-fix mm/mmap.c
--- 25/mm/mmap.c~exit_mmap-fix 2003-01-17 22:35:13.000000000 -0800
+++ 25-akpm/mm/mmap.c 2003-01-17 22:41:10.000000000 -0800
@@ -1379,6 +1379,22 @@ void build_mmap_rb(struct mm_struct * mm
}
}

+/*
+ * During exit_mmap, TASK_SIZE is not a reliable indication of the virtual
+ * size of the mm which is being torn down. Because on the exec() path, this
+ * process may have switched its personality from 64-bit to 32-bit prior to
+ * calling exit_mmap(). So TASK_SIZE returns a value suitable for a 32-bit
+ * process, and not the 64-bit process whose mm we need to empty.
+ *
+ * So what we do is to always unmap the largest virtual address space which
+ * the architecture supports. unmap_vmas() will then unmap every VMA in the
+ * mm, which is what we want to happen here.
+ */
+
+#ifndef TASK_SIZE_MAX
+#define TASK_SIZE_MAX TASK_SIZE
+#endif
+
/* Release all mmaps. */
void exit_mmap(struct mm_struct *mm)
{
@@ -1395,7 +1411,7 @@ void exit_mmap(struct mm_struct *mm)
tlb = tlb_gather_mmu(mm, 1);
flush_cache_mm(mm);
mm->map_count -= unmap_vmas(&tlb, mm, mm->mmap, 0,
- TASK_SIZE, &nr_accounted);
+ TASK_SIZE_MAX, &nr_accounted);
vm_unacct_memory(nr_accounted);
BUG_ON(mm->map_count); /* This is just debugging */
clear_page_tables(tlb, FIRST_USER_PGD_NR, USER_PTRS_PER_PGD);
diff -puN include/asm-ppc64/processor.h~exit_mmap-fix include/asm-ppc64/processor.h
--- 25/include/asm-ppc64/processor.h~exit_mmap-fix 2003-01-17 22:41:32.000000000 -0800
+++ 25-akpm/include/asm-ppc64/processor.h 2003-01-17 22:42:03.000000000 -0800
@@ -630,6 +630,7 @@ extern struct task_struct *last_task_use

#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
TASK_SIZE_USER32 : TASK_SIZE_USER64)
+#define TASK_SIZE_MAX TASK_SIZE_USER64
#endif /* __KERNEL__ */


diff -puN include/asm-ia64/processor.h~exit_mmap-fix include/asm-ia64/processor.h
--- 25/include/asm-ia64/processor.h~exit_mmap-fix 2003-01-17 22:46:08.000000000 -0800
+++ 25-akpm/include/asm-ia64/processor.h 2003-01-17 22:47:19.000000000 -0800
@@ -39,6 +39,12 @@
#define TASK_SIZE (current->thread.task_size)

/*
+ * This determines the virtual address space which must be torn down
+ * during exec.
+ *
+#define TASK_SIZE_MAX DEFAULT_TASK_SIZE
+
+/*
* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/

_