2002-10-15 12:54:58

by Ingo Molnar

Subject: [patch] mmap-speedup-2.5.42-C3


the attached patch (against BK-curr) adds three new, threading related
improvements to the VM.

the first one is an mmap inefficiency that was reported by Saurabh Desai.
The test_str02 NPTL test-utility does the following: it tests the maximum
number of threads by creating a new thread, which thread creates a new
thread itself, etc. It basically creates thousands of parallel threads,
which means thousands of thread stacks.

NPTL uses mmap() to allocate new default thread stacks - and POSIX
requires us to install a 'guard page' as well, which is done via
mprotect(PROT_NONE) on the first page of the stack. This means that tons
of NPTL threads result in twice as many vmas per MM, all allocated in a
forward fashion starting at the virtual address of 1 GB (TASK_UNMAPPED_BASE).
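
A minimal userspace sketch of the allocation pattern described above: one
mmap() for the stack, then mprotect(PROT_NONE) on its lowest page. The helper
name alloc_stack_with_guard() is hypothetical and this is not NPTL's actual
code; it only illustrates why each stack ends up as two vmas.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from NPTL): map `size` bytes for a
 * thread stack and make the lowest page an inaccessible guard page.
 * Returns the base of the mapping, or NULL on failure.
 */
void *alloc_stack_with_guard(size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	void *base;

	/* need room for the guard page plus at least one usable page */
	if (page <= 0 || size < 2 * (size_t)page)
		return NULL;

	base = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/*
	 * Changing the protection of the first page splits the mapping:
	 * one vma for the guard page, one for the usable stack.
	 */
	if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
		munmap(base, size);
		return NULL;
	}
	return base;
}
```

The caller releases the whole range, guard page included, with
munmap(base, size).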

Saurabh reported a slowdown after the first couple of thousands of
threads, which i can reproduce as well. The reason for this slowdown is
the get_unmapped_area() implementation, which tries to achieve the most
compact virtual memory allocation, by searching for the vma at
TASK_UNMAPPED_BASE, and then linearly searching for a hole. With thousands
of linearly allocated vmas this is an increasingly painful thing to do ...

obviously, high-performance threaded applications will create stacks
without the guard page, which triggers the anon-vma merging code so we end
up with one large vma, not tons of small vmas.

it's also possible for userspace to be smarter by setting aside a stack
space and keeping a bitmap of allocated stacks and using MAP_FIXED (this
also enables it to do the guard page not via mprotect() but by keeping the
stacks apart by 1 page - ie. half the number of vmas) - but this also
decreases flexibility.
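
A rough sketch of that userspace alternative, under stated assumptions: the
names (stack_arena, arena_init, arena_alloc, arena_free), the 64-slot limit
and the 256 KB stack size are all made up for illustration. Unlike the scheme
described above, this variant first reserves the whole arena as PROT_NONE so
that MAP_FIXED cannot clobber unrelated mappings; the one-page gaps therefore
stay part of the reservation instead of being truly unmapped.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define NSLOTS     64u              /* illustrative: one bit per slot */
#define STACK_SIZE (256u * 1024u)   /* illustrative fixed stack size */

struct stack_arena {
	char *base;      /* start of the reserved range */
	size_t gap;      /* one page kept inaccessible below each stack */
	uint64_t used;   /* bit i set = slot i is allocated */
};

/* Reserve address space for all slots; nothing is accessible yet. */
int arena_init(struct stack_arena *a)
{
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0)
		return -1;
	a->gap = (size_t)page;
	a->used = 0;
	a->base = mmap(NULL, NSLOTS * (STACK_SIZE + a->gap), PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return a->base == MAP_FAILED ? -1 : 0;
}

/* Take the first free slot and map its stack with MAP_FIXED. */
void *arena_alloc(struct stack_arena *a)
{
	for (unsigned i = 0; i < NSLOTS; i++) {
		char *slot;
		void *p;

		if (a->used & (UINT64_C(1) << i))
			continue;
		slot = a->base + (size_t)i * (STACK_SIZE + a->gap);
		/* the first `gap` bytes of the slot stay PROT_NONE */
		p = mmap(slot + a->gap, STACK_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (p == MAP_FAILED)
			return NULL;
		a->used |= UINT64_C(1) << i;
		return p;
	}
	return NULL;
}

/* Make the stack inaccessible again and clear its bit. */
void arena_free(struct stack_arena *a, void *stack)
{
	size_t i = (size_t)((char *)stack - a->base) / (STACK_SIZE + a->gap);

	if (mmap(stack, STACK_SIZE, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) != MAP_FAILED)
		a->used &= ~(UINT64_C(1) << i);
}
```

The fixed slot size and slot count are exactly the loss of flexibility
mentioned above.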

So i think that the default behavior nevertheless makes sense as well, so
IMO we should optimize it in the kernel.

there are various solutions to this problem, none of which solve the
problem in a 100% sufficient way, so i went for the simplest approach: i
added code to cache the 'last known hole' address in mm->free_area_cache,
which is used as a hint to get_unmapped_area().

this fixed the test_str02 testcase wonderfully, thread creation
performance for this testcase is O(1) again, but this simpler solution
obviously has a number of weak spots, and the (unlikely but possible)
worst-case is quite close to the current situation. In any case, this
approach does not sacrifice the perfect VM compactness our mmap()
implementation achieves, so it's a performance optimization with no
externally visible consequences.
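
The search and the hint can be modelled in userspace roughly as follows. This
is only a sketch: struct area, struct space and find_hole() are made-up names,
a sorted array stands in for the vma list, and a linear skip stands in for
find_vma(). The step counter exists purely to make the cost visible.

```c
#include <stddef.h>

/* One mapped range [start, end); the array is sorted and non-overlapping. */
struct area {
	unsigned long start, end;
};

struct space {
	const struct area *areas;
	size_t n;
	unsigned long limit;       /* plays the role of TASK_SIZE */
	unsigned long free_cache;  /* plays the role of mm->free_area_cache */
};

/*
 * First-fit search for `len` bytes starting at the cached hint.
 * Returns the address found, or 0 if nothing fits below the limit.
 * *steps counts how many areas had to be walked past.
 */
unsigned long find_hole(struct space *s, unsigned long len, size_t *steps)
{
	unsigned long addr = s->free_cache;
	int found_hole = 0;
	size_t i = 0;

	*steps = 0;
	/* stand-in for find_vma(): first area that ends above addr */
	while (i < s->n && s->areas[i].end <= addr)
		i++;

	for (;; i++) {
		const struct area *a = i < s->n ? &s->areas[i] : NULL;

		if (len > s->limit || s->limit - len < addr)
			return 0;
		/* remember the first hole seen, whatever its size */
		if (!found_hole && (!a || addr < a->start)) {
			s->free_cache = addr;
			found_hole = 1;
		}
		if (!a || addr + len <= a->start)
			return addr;
		addr = a->end;
		(*steps)++;
	}
}
```

With densely packed areas the first call walks every area and later calls
start right at the end of the packed region. The weak spot shows up when a
hole smaller than the requested length sits low in the address space: the
hint stays pinned to that hole and every larger request walks all areas above
it again. The model leaves out the munmap() side, which lowers the hint when
a hole opens below it.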

The most generic and still perfectly-compact VM allocation solution would
be to have a vma tree for the 'inverse virtual memory space', ie. a tree
of free virtual memory ranges, which could be searched and iterated like
the space of allocated vmas. I think we could do this by extending vmas,
but the drawback is larger vmas. This does not save us from having to scan
vmas linearly still, because the size constraint is still present, but at
least most of the anon-mmap activities are constant sized. (both malloc()
and the thread-stack allocator use mostly fixed sizes.)

plus the patch improves the OOM-kill mechanism with two new items,
triggered by test_str02 as well:

- performance optimization: do not kill threads in the same thread group
as the OOM-ing thread. (it's still necessary to scan over every thread
though, as it's possible to have CLONE_VM threads in a different thread
group - we do not want those to escape the OOM-kill.)

- do not let newly created child threads slip out of the group-kill. Note
that the 2.4 kernel's OOM handler has the same problem, and it could be
the reason why forkbombs occasionally slip out of the OOM kill.
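
The first item can be modelled in userspace like this. It is a sketch only:
struct task, kill_task() and oom_kill_sharers() are made-up names, an integer
id stands in for the ->mm pointer, and the model assumes that killing the
victim takes the rest of its thread group down with it, which is why
same-group threads are skipped.

```c
#include <stddef.h>

/* Simplified task: `mm` is an id standing in for the ->mm pointer. */
struct task {
	int pid;
	int tgid;
	int mm;
	int killed;
};

/* stand-in for oom_kill_task() */
static void kill_task(struct task *t)
{
	t->killed = 1;
}

/*
 * Kill the victim once, then every task that shares its mm but lives
 * in a different thread group. Returns the number of kill calls made.
 */
size_t oom_kill_sharers(struct task *tasks, size_t n, struct task *victim)
{
	size_t calls = 1;

	kill_task(victim);
	/* every task still has to be scanned, as noted above */
	for (size_t i = 0; i < n; i++) {
		struct task *q = &tasks[i];

		if (q->mm == victim->mm && q->tgid != victim->tgid) {
			kill_task(q);
			calls++;
		}
	}
	return calls;
}
```

For a group of N threads sharing one mm this makes a single kill call instead
of N.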

the patch was tested on x86 SMP and UP. Saurabh, can you confirm that this
patch fixes the performance problem you saw in test_str02?

Ingo

--- linux/include/linux/sched.h.orig 2002-10-15 12:51:21.000000000 +0200
+++ linux/include/linux/sched.h 2002-10-15 13:55:24.000000000 +0200
@@ -183,6 +183,7 @@
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
struct vm_area_struct * mmap_cache; /* last find_vma result */
+ unsigned long free_area_cache; /* first hole */
pgd_t * pgd;
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
--- linux/include/linux/init_task.h.orig 2002-10-15 13:08:15.000000000 +0200
+++ linux/include/linux/init_task.h 2002-10-15 13:57:57.000000000 +0200
@@ -41,6 +41,7 @@
.page_table_lock = SPIN_LOCK_UNLOCKED, \
.mmlist = LIST_HEAD_INIT(name.mmlist), \
.default_kioctx = INIT_KIOCTX(name.default_kioctx, name), \
+ .free_area_cache= TASK_UNMAPPED_BASE, \
}

#define INIT_SIGNALS(sig) { \
--- linux/include/asm-i386/processor.h.orig 2002-10-15 14:30:14.000000000 +0200
+++ linux/include/asm-i386/processor.h 2002-10-15 14:30:25.000000000 +0200
@@ -270,7 +270,7 @@
/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
-#define TASK_UNMAPPED_BASE (TASK_SIZE / 3)
+#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))

/*
* Size of io_bitmap in longwords: 32 is ports 0-0x3ff.
--- linux/kernel/fork.c.orig 2002-10-15 12:52:07.000000000 +0200
+++ linux/kernel/fork.c 2002-10-15 14:18:53.000000000 +0200
@@ -215,6 +215,7 @@
mm->locked_vm = 0;
mm->mmap = NULL;
mm->mmap_cache = NULL;
+ mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->map_count = 0;
mm->rss = 0;
mm->cpu_vm_mask = 0;
@@ -308,6 +309,8 @@
mm->page_table_lock = SPIN_LOCK_UNLOCKED;
mm->ioctx_list_lock = RW_LOCK_UNLOCKED;
mm->default_kioctx = (struct kioctx)INIT_KIOCTX(mm->default_kioctx, *mm);
+ mm->free_area_cache = TASK_UNMAPPED_BASE;
+
mm->pgd = pgd_alloc(mm);
if (mm->pgd)
return mm;
@@ -863,6 +866,14 @@

/* Need tasklist lock for parent etc handling! */
write_lock_irq(&tasklist_lock);
+ /*
+ * Check for pending SIGKILL! The new thread should not be allowed
+ * to slip out of an OOM kill. (or normal SIGKILL.)
+ */
+ if (sigismember(&current->pending.signal, SIGKILL)) {
+ write_unlock_irq(&tasklist_lock);
+ goto bad_fork_cleanup_namespace;
+ }

/* CLONE_PARENT re-uses the old parent */
if (clone_flags & CLONE_PARENT)
--- linux/mm/mmap.c.orig 2002-10-15 12:53:32.000000000 +0200
+++ linux/mm/mmap.c 2002-10-15 14:11:22.000000000 +0200
@@ -633,24 +633,33 @@
#ifndef HAVE_ARCH_UNMAPPED_AREA
static inline unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
+ struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+ int found_hole = 0;

if (len > TASK_SIZE)
return -ENOMEM;

if (addr) {
addr = PAGE_ALIGN(addr);
- vma = find_vma(current->mm, addr);
+ vma = find_vma(mm, addr);
if (TASK_SIZE - len >= addr &&
(!vma || addr + len <= vma->vm_start))
return addr;
}
- addr = PAGE_ALIGN(TASK_UNMAPPED_BASE);
+ addr = mm->free_area_cache;

- for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) {
+ for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
/* At this point: (!vma || addr < vma->vm_end). */
if (TASK_SIZE - len < addr)
return -ENOMEM;
+ /*
+ * Record the first available hole.
+ */
+ if (!found_hole && (!vma || addr < vma->vm_start)) {
+ mm->free_area_cache = addr;
+ found_hole = 1;
+ }
if (!vma || addr + len <= vma->vm_start)
return addr;
addr = vma->vm_end;
@@ -941,6 +950,12 @@
area->vm_mm->total_vm -= len >> PAGE_SHIFT;
if (area->vm_flags & VM_LOCKED)
area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
+ /*
+ * Is this a new hole at the lowest possible address?
+ */
+ if (area->vm_start >= TASK_UNMAPPED_BASE &&
+ area->vm_start < area->vm_mm->free_area_cache)
+ area->vm_mm->free_area_cache = area->vm_start;

remove_shared_vm_struct(area);

--- linux/mm/oom_kill.c.orig 2002-10-15 13:59:43.000000000 +0200
+++ linux/mm/oom_kill.c 2002-10-15 14:00:05.000000000 +0200
@@ -175,9 +175,13 @@
if (p == NULL)
panic("Out of memory and no killable processes...\n");

- /* kill all processes that share the ->mm (i.e. all threads) */
+ oom_kill_task(p);
+ /*
+ * kill all processes that share the ->mm (i.e. all threads),
+ * but are in a different thread group
+ */
do_each_thread(g, q)
- if (q->mm == p->mm)
+ if (q->mm == p->mm && q->tgid != p->tgid)
oom_kill_task(q);
while_each_thread(g, q);



2002-10-15 18:04:20

by Andrew Morton

Subject: Re: [patch] mmap-speedup-2.5.42-C3

Ingo Molnar wrote:
>
> ...
>
> Saurabh reported a slowdown after the first couple of thousands of
> threads, which i can reproduce as well. The reason for this slowdown is
> the get_unmapped_area() implementation, which tries to achieve the most
> compact virtual memory allocation, by searching for the vma at
> TASK_UNMAPPED_BASE, and then linearly searching for a hole. With thousands
> of linearly allocated vmas this is an increasingly painful thing to do ...

We've had reports of problems with that linear search before - for
a single-threaded application which was mapping a lot of little windows
into a huge file.

> ...
>
> there are various solutions to this problem, none of which solve the
> problem in a 100% sufficient way, so i went for the simplest approach: i
> added code to cache the 'last known hole' address in mm->free_area_cache,
> which is used as a hint to get_unmapped_area().

This will have no effect on current kernel behaviour other than speeding
it up. Looks good.

> ...
> The most generic and still perfectly-compact VM allocation solution would
> be to have a vma tree for the 'inverse virtual memory space', ie. a tree
> of free virtual memory ranges, which could be searched and iterated like
> the space of allocated vmas. I think we could do this by extending vmas,
> but the drawback is larger vmas. This does not save us from having to scan
> vmas linearly still, because the size constraint is still present, but at
> least most of the anon-mmap activities are constant sized. (both malloc()
> and the thread-stack allocator use mostly fixed sizes.)

Yup. We'd need to be able to perform a search based on "size of hole"
rather than virtual address. That really needs a whole new data structure
and supporting search code, I think... It also may have side effects
to do with fragmentation of the virtual address space.
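
A small sketch of what a search keyed on hole size could look like, assuming
the free ranges are kept sorted by size. struct hole and best_fit() are
made-up names; nothing like this is in the patch.

```c
#include <stddef.h>

/* A free range of the address space. */
struct hole {
	unsigned long start;
	unsigned long size;
};

/*
 * holes[] must be sorted by ascending size. Returns the index of the
 * smallest hole that can hold `len` bytes, or n if none is big enough.
 */
size_t best_fit(const struct hole *holes, size_t n, unsigned long len)
{
	size_t lo = 0, hi = n;

	/* binary search for the first entry with size >= len */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (holes[mid].size < len)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

The hole picked this way is the tightest fit, not the lowest-addressed one,
so the resulting layout can differ from what the address-ordered first-fit
search produces today.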

2002-10-15 21:24:56

by Andi Kleen

Subject: Re: [patch] mmap-speedup-2.5.42-C3

Andrew Morton writes:

> Yup. We'd need to be able to perform a search based on "size of hole"
> rather than virtual address. That really needs a whole new data structure
> and supporting search code, I think... It also may have side effects
> to do with fragmentation of the virtual address space.

When you oprofile KDE startup you notice that a lot of time is spent in
get_unmapped_area too. The reason is that every KDE process links with
10-20 libraries and ends up with a 40-50 entry /proc/<pid>/maps.

Optimizing this case would likely be useful too, although I suspect
Ingo's last hit cache would already help somewhat.

When you add a funky data structure please trigger it on the number
of mappings at least. e.g. I bet a micro optimized (= uses prefetch)
singly linked list or even array will always be best for <= 10 entries,
which is still not that uncommon in the non KDE world.

Array would be attractive because you can trivially prefetch it,
but would eat more space per mm_struct. Assuming each process has at
least 5 mappings the cost should be rather small though.

-Andi

2002-10-16 01:08:21

by Saurabh Desai

Subject: Re: [patch] mmap-speedup-2.5.42-C3

Ingo Molnar wrote:
>
> the attached patch (against BK-curr) adds three new, threading related
> improvements to the VM.
>
> the first one is an mmap inefficiency that was reported by Saurabh Desai.
> The test_str02 NPTL test-utility does the following: it tests the maximum
> number of threads by creating a new thread, which thread creates a new
> thread itself, etc. It basically creates thousands of parallel threads,
> which means thousands of thread stacks.

I'd like to point out that test_str02 is an NGPT test program, not NPTL.


> the patch was tested on x86 SMP and UP. Saurabh, can you confirm that this
> patch fixes the performance problem you saw in test_str02?
>

Yes, the test_str02 performance improved a lot using NPTL.
However, as a side effect, I noticed that my current telnet session was
randomly logged out after running this test. Not sure why.
I applied your patch to the 2.5.42 kernel and am running glibc-2.3.1pre2.

2002-10-16 07:48:28

by Ingo Molnar

Subject: Re: [patch] mmap-speedup-2.5.42-C3


On 15 Oct 2002, Andi Kleen wrote:

> When you oprofile KDE startup you notice that a lot of time is spent in
> get_unmapped_area too. The reason is that every KDE process links with
> 10-20 libraries and ends up with a 40-50 entry /proc/<pid>/maps.

actually, library mappings alone should not cause a slowdown, since we
start the search at TASK_UNMAPPED_BASE and most library mappings are below
1GB. But if those libraries use mmap()-ed anonymous RAM that has different
protections then the anonymous areas do not get merged and the scanning
overhead goes up.

> Optimizing this case would likely be useful too, although I suspect
> Ingo's last hit cache would already help somewhat.

well, could you check how much of an impact it has on KDE's kernel
profile? For the threaded test it was a more than 10x application
speedup, and in the kernel profile get_unmapped_area() was like 90% of the
hits - after the change it was like 1% of the hits. (but, this test is the
best-case for the search cache, so ...)

if this simpler approach solves two different problem categories
sufficiently then i cannot see any reason to go for the much more complex
(and still not 100% scanning-less) approach.

Ingo


2002-10-16 07:57:57

by Ingo Molnar

Subject: Re: [patch] mmap-speedup-2.5.42-C3


On Tue, 15 Oct 2002, Saurabh Desai wrote:

> Yes, the test_str02 performance improved a lot using NPTL.
> However, as a side effect, I noticed that my current telnet session was
> randomly logged out after running this test. Not sure why.

i think it should be unrelated to the mmap patch. In any case, Andrew
added the mmap-speedup patch to 2.5.43-mm1, so we'll hear about this
pretty soon.

Ingo

2002-10-16 08:02:38

by Jakub Jelinek

Subject: Re: [patch] mmap-speedup-2.5.42-C3

On Wed, Oct 16, 2002 at 10:03:52AM +0200, Ingo Molnar wrote:
>
> On 15 Oct 2002, Andi Kleen wrote:
>
> > When you oprofile KDE startup you notice that a lot of time is spent in
> > get_unmapped_area too. The reason is that every KDE process links with
> > 10-20 libraries and ends up with a 40-50 entry /proc/<pid>/maps.
>
> actually, library mappings alone should not cause a slowdown, since we
> start the search at TASK_UNMAPPED_BASE and most library mappings are below
> 1GB. But if those libraries use mmap()-ed anonymous RAM that has different
> protections then the anonymous areas do not get merged and the scanning
> overhead goes up.

Libraries mapped by dynamic linker are mapped without MAP_FIXED and unless
you use prelinking, with 0 virtual address, ie. they all end up above 1GB.
And 99% of libraries uses different protections, for the read-only and
read-write segment.

Jakub

2002-10-16 08:10:28

by Ingo Molnar

Subject: Re: [patch] mmap-speedup-2.5.42-C3


On Wed, 16 Oct 2002, Jakub Jelinek wrote:

> Libraries mapped by dynamic linker are mapped without MAP_FIXED and
> unless you use prelinking, with 0 virtual address, ie. they all end up
> above 1GB. And 99% of libraries uses different protections, for the
> read-only and read-write segment.

right - only the bss (brk-allocated) ones are below 1GB it appears. I did
a quick check on a KDE app and 3 mappings were below 1GB, and 116(!)
mappings were above 1GB. And even if it wasnt for the different
protections, they use different files to map to so they have to be in
different vmas, no matter what.
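
A check like that can be scripted along these lines. The sketch assumes the
usual "start-end perms ..." layout of /proc/<pid>/maps lines; struct
map_counts and count_mappings() are made-up names, and the caller is expected
to have read the file into a NUL-terminated buffer first.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct map_counts {
	unsigned below;   /* mappings starting below the boundary */
	unsigned above;   /* mappings starting at or above it */
};

/*
 * Count the mappings in `text` (the contents of a maps file) by
 * comparing each line's start address against `boundary`.
 */
struct map_counts count_mappings(const char *text, unsigned long long boundary)
{
	struct map_counts c = {0, 0};

	while (*text) {
		unsigned long long start, end;

		/* only lines that begin with a hex address are counted */
		if (isxdigit((unsigned char)*text) &&
		    sscanf(text, "%llx-%llx", &start, &end) == 2) {
			if (start < boundary)
				c.below++;
			else
				c.above++;
		}
		/* advance to the start of the next line */
		text = strchr(text, '\n');
		if (!text)
			break;
		text++;
	}
	return c;
}
```

Calling count_mappings(buf, 0x40000000ULL) splits the mappings at the 1GB
mark.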

i'm wondering about prelinking though - wont that reduce the number of
mappings radically?

in any case, doing a test of KDE's profile with and without the patch
applied sounds like a good idea.

Ingo

2002-10-16 08:17:26

by Jakub Jelinek

Subject: Re: [patch] mmap-speedup-2.5.42-C3

On Wed, Oct 16, 2002 at 10:27:07AM +0200, Ingo Molnar wrote:
>
> On Wed, 16 Oct 2002, Jakub Jelinek wrote:
>
> > Libraries mapped by dynamic linker are mapped without MAP_FIXED and
> > unless you use prelinking, with 0 virtual address, ie. they all end up
> > above 1GB. And 99% of libraries uses different protections, for the
> > read-only and read-write segment.
>
> right - only the bss (brk-allocated) ones are below 1GB it appears. I did
> a quick check on a KDE app and 3 mappings were below 1GB, and 116(!)
> mappings were above 1GB. And even if it wasnt for the different
> protections, they use different files to map to so they have to be in
> different vmas, no matter what.
>
> i'm wondering about prelinking though - wont that reduce the number of
> mappings radically?

It won't, the number of mappings will be exactly the same. It still needs
to mmap all the libraries and honour the protections.
But you might have holes in between the mappings if prelinking, while
you usually don't have many if not prelinking.
That's because prelink assigns a separate VA slot for each library (well,
with --conserve-memory two libraries might get the same VA slot if they
never appear together in any program).

Jakub

2002-10-16 09:11:17

by Andi Kleen

Subject: Re: [patch] mmap-speedup-2.5.42-C3

Jakub Jelinek writes:

You can argue against it, but it doesn't change the fact that
get_unmapped_area is a significant user of CPU on a KDE startup. You
can do the oprofile yourself if you don't believe me. And where else should
it come from other than from mapping shared libraries?

This includes X server startup, but at least my X has a much shorter
/proc/*/maps than a KDE program, so I don't think X is a significant
consumer of vmas.

-Andi

2002-10-16 09:30:06

by Ingo Molnar

Subject: Re: [patch] mmap-speedup-2.5.42-C3


On 16 Oct 2002, Andi Kleen wrote:

> You can argue against it, but it doesn't change the fact that
> get_unmapped_area is a significant user of CPU on a KDE startup. [...]

i dont think anyone argued against anything - i'm trying to understand
KDE's vma layout, and i dont think it's "wrong" in any way. It uses a
reasonable layout, and the kernel should really be able to handle mmap()
mappings when there are 100+ already existing mappings. Would you mind
checking KDE under 2.5.43 with the attached patch, to see whether it changes
the get_unmapped_area() cost?

Ingo

--- linux/include/linux/sched.h.orig 2002-10-15 12:51:21.000000000 +0200
+++ linux/include/linux/sched.h 2002-10-15 13:55:24.000000000 +0200
@@ -183,6 +183,7 @@
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
struct vm_area_struct * mmap_cache; /* last find_vma result */
+ unsigned long free_area_cache; /* first hole */
pgd_t * pgd;
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
--- linux/include/linux/init_task.h.orig 2002-10-15 13:08:15.000000000 +0200
+++ linux/include/linux/init_task.h 2002-10-15 13:57:57.000000000 +0200
@@ -41,6 +41,7 @@
.page_table_lock = SPIN_LOCK_UNLOCKED, \
.mmlist = LIST_HEAD_INIT(name.mmlist), \
.default_kioctx = INIT_KIOCTX(name.default_kioctx, name), \
+ .free_area_cache= TASK_UNMAPPED_BASE, \
}

#define INIT_SIGNALS(sig) { \
--- linux/include/asm-i386/processor.h.orig 2002-10-15 14:30:14.000000000 +0200
+++ linux/include/asm-i386/processor.h 2002-10-15 14:30:25.000000000 +0200
@@ -270,7 +270,7 @@
/* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
-#define TASK_UNMAPPED_BASE (TASK_SIZE / 3)
+#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))

/*
* Size of io_bitmap in longwords: 32 is ports 0-0x3ff.
--- linux/kernel/fork.c.orig 2002-10-15 12:52:07.000000000 +0200
+++ linux/kernel/fork.c 2002-10-15 14:18:53.000000000 +0200
@@ -215,6 +215,7 @@
mm->locked_vm = 0;
mm->mmap = NULL;
mm->mmap_cache = NULL;
+ mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->map_count = 0;
mm->rss = 0;
mm->cpu_vm_mask = 0;
@@ -308,6 +309,8 @@
mm->page_table_lock = SPIN_LOCK_UNLOCKED;
mm->ioctx_list_lock = RW_LOCK_UNLOCKED;
mm->default_kioctx = (struct kioctx)INIT_KIOCTX(mm->default_kioctx, *mm);
+ mm->free_area_cache = TASK_UNMAPPED_BASE;
+
mm->pgd = pgd_alloc(mm);
if (mm->pgd)
return mm;
@@ -863,6 +866,14 @@

/* Need tasklist lock for parent etc handling! */
write_lock_irq(&tasklist_lock);
+ /*
+ * Check for pending SIGKILL! The new thread should not be allowed
+ * to slip out of an OOM kill. (or normal SIGKILL.)
+ */
+ if (sigismember(&current->pending.signal, SIGKILL)) {
+ write_unlock_irq(&tasklist_lock);
+ goto bad_fork_cleanup_namespace;
+ }

/* CLONE_PARENT re-uses the old parent */
if (clone_flags & CLONE_PARENT)
--- linux/mm/mmap.c.orig 2002-10-15 12:53:32.000000000 +0200
+++ linux/mm/mmap.c 2002-10-15 14:11:22.000000000 +0200
@@ -633,24 +633,33 @@
#ifndef HAVE_ARCH_UNMAPPED_AREA
static inline unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
+ struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+ int found_hole = 0;

if (len > TASK_SIZE)
return -ENOMEM;

if (addr) {
addr = PAGE_ALIGN(addr);
- vma = find_vma(current->mm, addr);
+ vma = find_vma(mm, addr);
if (TASK_SIZE - len >= addr &&
(!vma || addr + len <= vma->vm_start))
return addr;
}
- addr = PAGE_ALIGN(TASK_UNMAPPED_BASE);
+ addr = mm->free_area_cache;

- for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) {
+ for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
/* At this point: (!vma || addr < vma->vm_end). */
if (TASK_SIZE - len < addr)
return -ENOMEM;
+ /*
+ * Record the first available hole.
+ */
+ if (!found_hole && (!vma || addr < vma->vm_start)) {
+ mm->free_area_cache = addr;
+ found_hole = 1;
+ }
if (!vma || addr + len <= vma->vm_start)
return addr;
addr = vma->vm_end;
@@ -941,6 +950,12 @@
area->vm_mm->total_vm -= len >> PAGE_SHIFT;
if (area->vm_flags & VM_LOCKED)
area->vm_mm->locked_vm -= len >> PAGE_SHIFT;
+ /*
+ * Is this a new hole at the lowest possible address?
+ */
+ if (area->vm_start >= TASK_UNMAPPED_BASE &&
+ area->vm_start < area->vm_mm->free_area_cache)
+ area->vm_mm->free_area_cache = area->vm_start;

remove_shared_vm_struct(area);

--- linux/mm/oom_kill.c.orig 2002-10-15 13:59:43.000000000 +0200
+++ linux/mm/oom_kill.c 2002-10-15 14:00:05.000000000 +0200
@@ -175,9 +175,13 @@
if (p == NULL)
panic("Out of memory and no killable processes...\n");

- /* kill all processes that share the ->mm (i.e. all threads) */
+ oom_kill_task(p);
+ /*
+ * kill all processes that share the ->mm (i.e. all threads),
+ * but are in a different thread group
+ */
do_each_thread(g, q)
- if (q->mm == p->mm)
+ if (q->mm == p->mm && q->tgid != p->tgid)
oom_kill_task(q);
while_each_thread(g, q);


2002-10-16 12:03:13

by Jakub Jelinek

Subject: Re: [patch] mmap-speedup-2.5.42-C3

On Wed, Oct 16, 2002 at 11:47:32AM +0200, Ingo Molnar wrote:
>
> On 16 Oct 2002, Andi Kleen wrote:
>
> > You can argue against it, but it doesn't change the fact that
> > get_unmapped_area is a significant user of CPU on a KDE startup. [...]
>
> i dont think anyone argued against anything - i'm trying to understand
> KDE's vma layout, and i dont think it's "wrong" in any way. It uses a
> reasonable layout, and the kernel should really be able to handle mmap()
> mappings when there are 100+ already existing mappings. Would you mind
> checking KDE under 2.5.43 with the attached patch, to see whether it changes
> the get_unmapped_area() cost?

Here is /proc/pid/maps from running konqueror on fully prelinked
distribution (prelink -avvm) on IA-32. As you can see, there is a bunch of
holes with different sizes. When ld.so loads the application up, it will
mmap a few anon pages at TASK_UNMAPPED_BASE and then all the libs linked
into the binary from PRELINK_BASE (0x41000000 on IA-32) up, but with holes
in between (see e.g. the hole at 0x41443000-0x4145c000 etc.).
Then dlopened libs and other mmaps come close to TASK_UNMAPPED_BASE, usually
without too many holes.
Now it depends on which get_unmapped_area calls Andi is seeing in oprofile.
mmaping prelinked libraries should not be as expensive in the common case,
since mmap is called there with non-zero addr which is likely not used yet.
The next calls to mmap behave as if prelinking was not used at all, until
you fill up the TASK_UNMAPPED_BASE..PRELINK_BASE area, at which point unless
you're mmaping very small areas it is very likely there will be at least
one really small hole between some prelinked libs, so free_area_cache
would point to that hole all the time and get_unmapped_area would have
to walk all the vmas above it.

08048000-08049000 r-xp 00000000 03:05 1240393 /usr/bin/konqueror
08049000-08050000 rw-p 00000000 03:05 1240393 /usr/bin/konqueror
08050000-081c2000 rwxp 00000000 00:00 0
40000000-40003000 rw-p 00000000 00:00 0
40003000-40004000 r--p 0092f000 03:05 886177 /usr/lib/locale/locale-archive
40004000-40006000 r-xp 00000000 03:05 837382 /usr/X11R6/lib/X11/locale/common/xlcDef.so.2
40006000-40007000 rw-p 00001000 03:05 837382 /usr/X11R6/lib/X11/locale/common/xlcDef.so.2
40007000-40008000 rw-p 00000000 00:00 0
40008000-4000e000 r--s 00000000 03:05 1131670 /usr/lib/gconv/gconv-modules.cache
4000e000-40010000 r-xp 00000000 03:05 1131518 /usr/lib/gconv/ISO8859-1.so
40010000-40011000 rw-p 00001000 03:05 1131518 /usr/lib/gconv/ISO8859-1.so
40011000-4001a000 r-xp 00000000 03:05 1234808 /lib/libnss_files-2.2.93.so
4001a000-4001b000 rw-p 00008000 03:05 1234808 /lib/libnss_files-2.2.93.so
4001f000-40023000 rw-p 00000000 00:00 0
40023000-40223000 r--p 00000000 03:05 886177 /usr/lib/locale/locale-archive
40223000-40255000 r--p 008e1000 03:05 886177 /usr/lib/locale/locale-archive
40255000-40271000 r-xp 00000000 03:05 837381 /usr/X11R6/lib/X11/locale/common/ximcp.so.2
40271000-40273000 rw-p 0001b000 03:05 837381 /usr/X11R6/lib/X11/locale/common/ximcp.so.2
40273000-40283000 r-xp 00000000 03:05 837685 /usr/lib/qt-3.0.5/plugins/styles/bluecurve.so
40283000-40284000 rw-p 00010000 03:05 837685 /usr/lib/qt-3.0.5/plugins/styles/bluecurve.so
40284000-40296000 r--p 00000000 03:05 2772981 /usr/X11R6/lib/X11/fonts/Type1/l048013t.pfa
40296000-402fa000 r--s 00000000 03:05 1117838 /tmp/kde-root/ksycoca
402fa000-40310000 r-xp 00000000 03:05 3818307 /usr/lib/kde3/konq_iconview.so
40310000-40312000 rw-p 00016000 03:05 3818307 /usr/lib/kde3/konq_iconview.so
40312000-40324000 r--p 00000000 03:05 2772983 /usr/X11R6/lib/X11/fonts/Type1/l048016t.pfa
40324000-40330000 r-xp 00000000 03:05 3818525 /usr/lib/kde3/libdirfilterplugin.so
40330000-40331000 rw-p 0000c000 03:05 3818525 /usr/lib/kde3/libdirfilterplugin.so
40331000-40346000 r-xp 00000000 03:05 3818533 /usr/lib/kde3/libkimgallery.so
40346000-40347000 rw-p 00015000 03:05 3818533 /usr/lib/kde3/libkimgallery.so
40347000-40362000 r-xp 00000000 03:05 1366244 /usr/lib/libkjava.so.1.0.0
40362000-40364000 rw-p 0001a000 03:05 1366244 /usr/lib/libkjava.so.1.0.0
40364000-403f6000 r-xp 00000000 03:05 1364978 /usr/lib/libkdeprint.so.4.0.0
403f6000-403fd000 rw-p 00091000 03:05 1364978 /usr/lib/libkdeprint.so.4.0.0
403fd000-4040b000 r-xp 00000000 03:05 3818311 /usr/lib/kde3/konq_shellcmdplugin.so
4040b000-4040c000 rw-p 0000e000 03:05 3818311 /usr/lib/kde3/konq_shellcmdplugin.so
4040c000-40415000 r-xp 00000000 03:05 837386 /usr/X11R6/lib/X11/locale/common/xomGeneric.so.2
40415000-40416000 rw-p 00008000 03:05 837386 /usr/X11R6/lib/X11/locale/common/xomGeneric.so.2
41000000-41012000 r-xp 00000000 03:05 1241025 /lib/ld-2.2.93.so
41012000-41013000 rw-p 00012000 03:05 1241025 /lib/ld-2.2.93.so
41015000-4113b000 r-xp 00000000 03:05 1134855 /lib/i686/libc-2.2.93.so
4113b000-41140000 rw-p 00126000 03:05 1134855 /lib/i686/libc-2.2.93.so
41140000-41144000 rw-p 00000000 00:00 0
41146000-41167000 r-xp 00000000 03:05 1134856 /lib/i686/libm-2.2.93.so
41167000-41168000 rw-p 00021000 03:05 1134856 /lib/i686/libm-2.2.93.so
4116a000-4116c000 r-xp 00000000 03:05 1241026 /lib/libdl-2.2.93.so
4116c000-4116d000 rw-p 00001000 03:05 1241026 /lib/libdl-2.2.93.so
4116f000-4124a000 r-xp 00000000 03:05 1368552 /usr/X11R6/lib/libX11.so.6.2
4124a000-4124d000 rw-p 000da000 03:05 1368552 /usr/X11R6/lib/libX11.so.6.2
4124f000-4125c000 r-xp 00000000 03:05 1364772 /usr/X11R6/lib/libXext.so.6.4
4125c000-4125d000 rw-p 0000c000 03:05 1364772 /usr/X11R6/lib/libXext.so.6.4
4125f000-4126b000 r-xp 00000000 03:05 1364770 /usr/lib/libz.so.1.1.4
4126b000-4126d000 rw-p 0000b000 03:05 1364770 /usr/lib/libz.so.1.1.4
4126f000-41283000 r-xp 00000000 03:05 1364780 /usr/X11R6/lib/libICE.so.6.3
41283000-41284000 rw-p 00013000 03:05 1364780 /usr/X11R6/lib/libICE.so.6.3
41284000-41286000 rw-p 00000000 00:00 0
41288000-41290000 r-xp 00000000 03:05 1364766 /usr/X11R6/lib/libSM.so.6.0
41290000-41291000 rw-p 00007000 03:05 1364766 /usr/X11R6/lib/libSM.so.6.0
41293000-4135e000 r-xp 00000000 03:05 1231259 /lib/libcrypto.so.0.9.6b
4135e000-4136a000 rw-p 000cb000 03:05 1231259 /lib/libcrypto.so.0.9.6b
4136a000-4136d000 rw-p 00000000 00:00 0
4136e000-4137b000 r-xp 00000000 03:05 1130891 /lib/i686/libpthread-0.10.so
4137b000-4137e000 rw-p 0000d000 03:05 1130891 /lib/i686/libpthread-0.10.so
4137e000-4139e000 rw-p 00000000 00:00 0
413a0000-413cd000 r-xp 00000000 03:05 1233592 /lib/libssl.so.0.9.6b
413cd000-413d0000 rw-p 0002d000 03:05 1233592 /lib/libssl.so.0.9.6b
413d2000-413e1000 r-xp 00000000 03:05 1241029 /lib/libresolv-2.2.93.so
413e1000-413e2000 rw-p 0000e000 03:05 1241029 /lib/libresolv-2.2.93.so
413e2000-413e4000 rw-p 00000000 00:00 0
413e6000-41435000 r-xp 00000000 03:05 1364857 /usr/X11R6/lib/libXt.so.6.0
41435000-41439000 rw-p 0004e000 03:05 1364857 /usr/X11R6/lib/libXt.so.6.0
4143b000-41442000 r-xp 00000000 03:05 1241027 /lib/libgcc_s-3.2-20020903.so.1
41442000-41443000 rw-p 00007000 03:05 1241027 /lib/libgcc_s-3.2-20020903.so.1
4145c000-41471000 r-xp 00000000 03:05 1364800 /usr/X11R6/lib/libXmu.so.6.2
41471000-41472000 rw-p 00015000 03:05 1364800 /usr/X11R6/lib/libXmu.so.6.2
41474000-414b9000 r-xp 00000000 03:05 1364798 /usr/lib/libfreetype.so.6.3.1
414b9000-414bd000 rw-p 00045000 03:05 1364798 /usr/lib/libfreetype.so.6.3.1
414bf000-41559000 r-xp 00000000 03:05 1349720 /usr/lib/libstdc++.so.5.0.1
41559000-4156e000 rw-p 0009a000 03:05 1349720 /usr/lib/libstdc++.so.5.0.1
4156e000-41573000 rw-p 00000000 00:00 0
41575000-41591000 r-xp 00000000 03:05 1349581 /usr/lib/libexpat.so.0.3.0
41591000-41595000 rw-p 0001b000 03:05 1349581 /usr/lib/libexpat.so.0.3.0
41597000-415b8000 r-xp 00000000 03:05 1364878 /usr/lib/libfontconfig.so.1.0
415b8000-415bb000 rw-p 00021000 03:05 1364878 /usr/lib/libfontconfig.so.1.0
415bb000-415bc000 rw-p 00000000 00:00 0
415be000-415c2000 r-xp 00000000 03:05 1364788 /usr/X11R6/lib/libXrender.so.1.1
415c2000-415c3000 rw-p 00004000 03:05 1364788 /usr/X11R6/lib/libXrender.so.1.1
415c5000-415e7000 r-xp 00000000 03:05 1349436 /usr/lib/libpng12.so.0.1.2.2
415e7000-415e8000 rw-p 00022000 03:05 1349436 /usr/lib/libpng12.so.0.1.2.2
415ea000-41607000 r-xp 00000000 03:05 1349442 /usr/lib/libjpeg.so.62.0.0
41607000-41608000 rw-p 0001c000 03:05 1349442 /usr/lib/libjpeg.so.62.0.0
4160a000-4161b000 r-xp 00000000 03:05 1349559 /usr/lib/libXft.so.2.0
4161b000-4161c000 rw-p 00010000 03:05 1349559 /usr/lib/libXft.so.2.0
41657000-4166e000 r-xp 00000000 03:05 1365503 /usr/lib/libcups.so.2
4166e000-41670000 rw-p 00017000 03:05 1365503 /usr/lib/libcups.so.2
41670000-41671000 rw-p 00000000 00:00 0
41673000-416eb000 r-xp 00000000 03:05 1349596 /usr/X11R6/lib/libGL.so.1.2
416eb000-416ef000 rw-p 00077000 03:05 1349596 /usr/X11R6/lib/libGL.so.1.2
416ef000-416f2000 rw-p 00000000 00:00 0
416f4000-41736000 r-xp 00000000 03:05 1349497 /usr/lib/libmng.so.1.0.0
41736000-41738000 rw-p 00041000 03:05 1349497 /usr/lib/libmng.so.1.0.0
4173a000-41d4a000 r-xp 00000000 03:05 2527554 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5
41d4a000-41d89000 rw-p 00610000 03:05 2527554 /usr/lib/qt-3.0.5/lib/libqt-mt.so.3.0.5
41d89000-41d90000 rw-p 00000000 00:00 0
41d92000-41d94000 r-xp 00000000 03:05 1237073 /lib/libutil-2.2.93.so
41d94000-41d95000 rw-p 00001000 03:05 1237073 /lib/libutil-2.2.93.so
41d97000-41dc4000 r-xp 00000000 03:05 1349438 /usr/lib/libDCOP.so.4.0.0
41dc4000-41dc6000 rw-p 0002d000 03:05 1349438 /usr/lib/libDCOP.so.4.0.0
41dc6000-41dc7000 rw-p 00000000 00:00 0
41dc9000-41f17000 r-xp 00000000 03:05 1364904 /usr/lib/libkdecore.so.4.0.0
41f17000-41f20000 rw-p 0014e000 03:05 1364904 /usr/lib/libkdecore.so.4.0.0
41f20000-41f22000 rw-p 00000000 00:00 0
41f24000-42133000 r-xp 00000000 03:05 1364934 /usr/lib/libkdeui.so.4.0.0
42133000-42153000 rw-p 0020e000 03:05 1364934 /usr/lib/libkdeui.so.4.0.0
42153000-42154000 rw-p 00000000 00:00 0
42156000-4217b000 r-xp 00000000 03:05 1364926 /usr/lib/libkdefx.so.4.0.0
4217b000-4217d000 rw-p 00024000 03:05 1364926 /usr/lib/libkdefx.so.4.0.0
421a5000-421bf000 r-xp 00000000 03:05 1357557 /usr/lib/libkdesu.so.4.0.0
421bf000-421c0000 rw-p 00019000 03:05 1357557 /usr/lib/libkdesu.so.4.0.0
421c2000-42426000 r-xp 00000000 03:05 1364950 /usr/lib/libkio.so.4.0.0
42426000-4243d000 rw-p 00263000 03:05 1364950 /usr/lib/libkio.so.4.0.0
4243f000-42476000 r-xp 00000000 03:05 1357912 /usr/lib/libkparts.so.2.0.0
42476000-4247b000 rw-p 00036000 03:05 1357912 /usr/lib/libkparts.so.2.0.0
4247d000-424e5000 r-xp 00000000 03:05 1357666 /usr/lib/libkonq.so.4.0.0
424e5000-424ea000 rw-p 00067000 03:05 1357666 /usr/lib/libkonq.so.4.0.0
424ec000-42585000 r-xp 00000000 03:05 1366032 /usr/lib/konqueror.so
42585000-4258c000 rw-p 00098000 03:05 1366032 /usr/lib/konqueror.so
425c8000-427b6000 r-xp 00000000 03:05 1364966 /usr/lib/libkhtml.so.4.0.0
427b6000-427dc000 rw-p 001ee000 03:05 1364966 /usr/lib/libkhtml.so.4.0.0
427dc000-427dd000 rw-p 00000000 00:00 0
bffeb000-c0000000 rwxp fffec000 00:00 0


Jakub

2002-10-16 14:46:07

by Linus Torvalds

Subject: Re: [patch] mmap-speedup-2.5.42-C3


On Wed, 16 Oct 2002, Ingo Molnar wrote:

>
> On Tue, 15 Oct 2002, Saurabh Desai wrote:
>
> > Yes, the test_str02 performance improved a lot using NPTL.
> > However, as a side effect, I noticed that my current telnet session was
> > randomly logged out after running this test. Not sure why.
>
> i think it should be unrelated to the mmap patch. In any case, Andrew
> added the mmap-speedup patch to 2.5.43-mm1, so we'll hear about this
> pretty soon.

There's at least one Oops-report on linux-kernel on 2.5.43-mm1, where the
oops traceback was somewhere in munmap().

Sounds like there are bugs there.

Linus

2002-10-16 15:42:09

by Arjan van de Ven

Subject: Re: [patch] mmap-speedup-2.5.42-C3

On Wed, 2002-10-16 at 16:52, Linus Torvalds wrote:
> > i think it should be unrelated to the mmap patch. In any case, Andrew
> > added the mmap-speedup patch to 2.5.43-mm1, so we'll hear about this
> > pretty soon.
>
> There's at least one Oops-report on linux-kernel on 2.5.43-mm1, where the
> oops traceback was somewhere in munmap().
>
> Sounds like there are bugs there.

could be the shared pagetable stuff just as well ;(


2002-10-16 16:05:09

by Andrew Morton

Subject: Re: [patch] mmap-speedup-2.5.42-C3

Arjan van de Ven wrote:
>
> On Wed, 2002-10-16 at 16:52, Linus Torvalds wrote:
> > > i think it should be unrelated to the mmap patch. In any case, Andrew
> > > added the mmap-speedup patch to 2.5.43-mm1, so we'll hear about this
> > > pretty soon.
> >
> > There's at least one Oops-report on linux-kernel on 2.5.43-mm1, where the
> > oops traceback was somewhere in munmap().
> >
> > Sounds like there are bugs there.
>
> could be the shared pagetable stuff just as well ;(
>

Yes, Matt had shared pagetables enabled. That code is not stable yet.