2010-06-07 11:21:35

by Peter Zijlstra

Subject: [PATCH 00/28] mm: preemptibility -v3

This patch-set makes part of the mm a lot more preemptible. It converts
i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
preemptible.

The main motivation was making mm_take_all_locks() preemptible, since it
appears people are nesting hundreds of spinlocks there.

A side-effect is that we can finally make mmu_gather preemptible,
something which lots of people have wanted to do for a long time.

It also gets us anon_vma refcounting, which seems to result in a nice
cleanup of the anon_vma lifetime rules wrt KSM and compaction.

This patch-set is build and boot-tested on x86_64 (a previous version was
also tested on Dave's Niagara2 machines, and I suppose s390 was too when
Martin provided the conversion patch for his arch).

There are no known architectures left unconverted, although some arch code
never did see a compiler (superh and um come to mind).

Can we move this work forwards, or is there anything people would want to
see done?

[ The series includes Rik's latest patches to the same area for
convenience ]
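
The nesting concern behind mm_take_all_locks() can be pictured with a small userspace analogy. The sketch below is not the kernel code: it uses pthread mutexes in place of the kernel's mutexes, and struct object, take_all_locks(), drop_all_locks() and all_locks_mutex are hypothetical names chosen for illustration. It assumes every other thread holds at most one of the per-object locks at a time, so a single global mutex is enough to keep two whole-set lockers from deadlocking against each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an object that carries its own sleeping lock. */
struct object {
    pthread_mutex_t lock;
    int value;
};

/* Serializes threads that want the whole set, so the order in which the
 * per-object locks are taken does not matter between two such threads. */
static pthread_mutex_t all_locks_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take every per-object lock. Each pthread_mutex_lock() may block, so the
 * caller can sleep while it already holds many of the locks. */
static void take_all_locks(struct object *objs, size_t n)
{
    assert(objs != NULL || n == 0);
    pthread_mutex_lock(&all_locks_mutex);
    for (size_t i = 0; i < n; i++)
        pthread_mutex_lock(&objs[i].lock);
}

/* Release the per-object locks in reverse order, then the global mutex. */
static void drop_all_locks(struct object *objs, size_t n)
{
    assert(objs != NULL || n == 0);
    while (n-- > 0)
        pthread_mutex_unlock(&objs[n].lock);
    pthread_mutex_unlock(&all_locks_mutex);
}
```

In the kernel, the equivalent walk over spinlocks runs with preemption disabled from the first lock to the last unlock, which is the latency problem the cover letter describes; with sleeping locks the walk can be preempted.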


2010-06-07 13:58:00

by Sam Ravnborg

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

>
> There are no known architectures left unconverted, although some arch code
> never did see a compiler (superh and um come to mind).

um is easy.
On your x86 box you just do "make ARCH=um"

Sam

2010-06-07 15:02:17

by Peter Zijlstra

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Mon, 2010-06-07 at 15:57 +0200, Sam Ravnborg wrote:
> >
> > There are no known architectures left unconverted, although some arch code
> > never did see a compiler (superh and um come to mind).
>
> um is easy.
> On your x86 box you just do "make ARCH=um"

Right, seems to build.

2010-06-07 16:36:14

by Andi Kleen

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

Peter Zijlstra <[email protected]> writes:

> This patch-set makes part of the mm a lot more preemptible. It converts
> i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> preemptible.

How about performance measurements? mutexes still behave quite
differently from spinlocks, especially under contention.

-Andi

--
[email protected] -- Speaking for myself only.

2010-06-07 16:39:39

by Peter Zijlstra

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Mon, 2010-06-07 at 18:36 +0200, Andi Kleen wrote:
> Peter Zijlstra <[email protected]> writes:
>
> > This patch-set makes part of the mm a lot more preemptible. It converts
> > i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> > preemptible.
>
> How about performance measurements? mutexes still behave quite
> differently from spinlocks, especially under contention.

What's your favourite benchmark to stress i_mmap_mutex/anon_vma->lock?

A cache-hot kernel build didn't really show a difference.
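
For a first feel of how a spinning lock and a sleeping lock differ under contention, a userspace analogy can be timed. This is only a sketch: pthread spinlocks and mutexes are not the kernel's spinlock_t and struct mutex, the names struct bench, worker() and run_contended() and the MAX_THREADS limit are made up for illustration, and the result says nothing direct about i_mmap_lock or anon_vma->lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MAX_THREADS 64

struct bench {
    bool use_mutex;            /* true: pthread mutex, false: pthread spinlock */
    long iters;                /* lock/unlock pairs per thread */
    long counter;              /* shared data protected by the lock */
    pthread_spinlock_t spin;
    pthread_mutex_t mutex;
};

/* Each worker increments the shared counter under the selected lock. */
static void *worker(void *arg)
{
    struct bench *b = arg;

    for (long i = 0; i < b->iters; i++) {
        if (b->use_mutex) {
            pthread_mutex_lock(&b->mutex);
            b->counter++;
            pthread_mutex_unlock(&b->mutex);
        } else {
            pthread_spin_lock(&b->spin);
            b->counter++;
            pthread_spin_unlock(&b->spin);
        }
    }
    return NULL;
}

/* Run nthreads workers against one lock. Returns the final counter and,
 * if secs is non-NULL, stores the elapsed wall-clock time in seconds. */
static long run_contended(bool use_mutex, int nthreads, long iters, double *secs)
{
    struct bench b = { .use_mutex = use_mutex, .iters = iters, .counter = 0 };
    pthread_t tid[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_spin_init(&b.spin, PTHREAD_PROCESS_PRIVATE);
    pthread_mutex_init(&b.mutex, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[started], NULL, worker, &b) != 0)
            break;             /* only join the threads that really started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (secs)
        *secs = (double)(t1.tv_sec - t0.tv_sec) +
                (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    pthread_spin_destroy(&b.spin);
    pthread_mutex_destroy(&b.mutex);
    return b.counter;
}
```

Calling run_contended() once with use_mutex false and once with it true, with more threads than CPUs, gives two timings to compare; the counter check only confirms that both locks protected the data.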

2010-06-10 01:43:07

by Yanmin Zhang

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Mon, 2010-06-07 at 13:06 +0200, Peter Zijlstra wrote:
> This patch-set makes part of the mm a lot more preemptible. It converts
> i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> preemptible.
I applied it against 2.6.35-rc2 on an x86_64 machine, but the kernel panics during boot.

Pid: 1, comm: init Not tainted 2.6.35-rc2-petermm #1 X7DW3/X7DW3
RIP: 0010:[<ffffffff8160e055>] [<ffffffff8160e055>] mutex_unlock+0x0/0x13
RSP: 0018:ffff88022fc61a78 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88022ebd5780 RCX: 0000000000000020
RDX: 0000000000000000 RSI: 0000000000100173 RDI: 8000000000000025
RBP: 0000000000000020 R08: 00007fffffffe000 R09: 0000000000000001
R10: 0000000000000002 R11: ffffffff8128c275 R12: ffff88022e76b000
R13: 0000000000021000 R14: ffff88022fc68000 R15: 00007fff4c0e6000
FS: 0000000000000000(0000) GS:ffff8800021c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000022e76c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process init (pid: 1, threadinfo ffff88022fc60000, task ffff88022fc58000)
Stack:
ffffffff810aabac 00000000ffffefff 0000000000000000 00007fff4c107000
<0> ffff88022e76b000 00007fffffffe000 00007ffffffff000 00007fff4c106000
<0> ffffffff810c5e5a ffff88022ebd5780 ffff88022ebd5780 ffff88022ebd5780
Call Trace:
[<ffffffff810aabac>] ? expand_downwards+0x149/0x15a
[<ffffffff810c5e5a>] ? setup_arg_pages+0x333/0x361
[<ffffffff810c8955>] ? inode_permission+0x76/0x95
[<ffffffff810f8a19>] ? load_elf_binary+0x0/0x16c9
[<ffffffff810f8e7e>] ? load_elf_binary+0x465/0x16c9
[<ffffffff810c488a>] ? get_arg_page+0x4b/0xa4
[<ffffffff810f8a19>] ? load_elf_binary+0x0/0x16c9
[<ffffffff810c4ce4>] ? search_binary_handler+0xd4/0x26f
[<ffffffff810f7a1c>] ? load_script+0x0/0x1e4
[<ffffffff810f7bea>] ? load_script+0x1ce/0x1e4
[<ffffffff810c488a>] ? get_arg_page+0x4b/0xa4
[<ffffffff810c4ce4>] ? search_binary_handler+0xd4/0x26f
[<ffffffff810c6228>] ? do_execve+0x1e5/0x2b6
[<ffffffff81008d60>] ? sys_execve+0x35/0x53
[<ffffffff81003648>] ? kernel_execve+0x68/0xd0
[<ffffffff81000342>] ? init_post+0x5a/0xd4
[<ffffffff81cdf97d>] ? kernel_init+0x1e5/0x1ec
[<ffffffff810035d4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81cdf798>] ? kernel_init+0x0/0x1ec
[<ffffffff810035d0>] ? kernel_thread_helper+0x0/0x10
Code: 1c 24 44 89 64 24 08 48 c7 44 24 20 07 fd 04 81 48 89 44 24 28 48 89 44 24 30 e8 26 ff ff ff 48 83 c4 48 5b 41 5c 41 5d 41 5e c3 <48> c7 47 18 00 00 00 00 f0 ff 07 7f 05 e8 01 00 00 00 c3 53 48
RIP [<ffffffff8160e055>] mutex_unlock+0x0/0x13
RSP <ffff88022fc61a78>
---[ end trace dc724d36e0cd4a32 ]---
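
One hedged reading of the register dump above: mutex_unlock() faults at offset +0x0, and the marked bytes <48> c7 47 18 00 00 00 00 look like a 64-bit store of zero to 0x18(%rdi). RDI holds 8000000000000025, which is not a canonical address if the machine uses 48-bit virtual addresses (an assumption; the trace does not say). The sketch below only checks that arithmetic. is_canonical() and VA_BITS are illustrative names, not kernel interfaces, and the check does not explain how the bad pointer got there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed virtual address width; 48 matches 4-level paging on x86_64. */
#define VA_BITS 48
static_assert(VA_BITS > 1 && VA_BITS < 64, "VA_BITS must be between 2 and 63");

/* An address is canonical when bits 63..(VA_BITS-1) are all zero or all one. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (VA_BITS - 1);                       /* upper 17 bits for VA_BITS == 48 */
    uint64_t all_ones = (UINT64_C(1) << (64 - VA_BITS + 1)) - 1;

    return top == 0 || top == all_ones;
}
```

With VA_BITS set to 48, is_canonical(0x8000000000000025) is false, while the RBX value ffff88022ebd5780 from the same dump passes the check.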


2010-06-10 06:53:06

by Peter Zijlstra

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Thu, 2010-06-10 at 09:45 +0800, Zhang, Yanmin wrote:
> On Mon, 2010-06-07 at 13:06 +0200, Peter Zijlstra wrote:
> > This patch-set makes part of the mm a lot more preemptible. It converts
> > i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> > preemptible.
> I applied it against 2.6.35-rc2 on an x86_64 machine, but the kernel panics during boot.
>
> <snip panic>

Oi, that's no good. Do you happen to have your .config handy? I didn't
actually see my machine do that.

2010-06-10 06:57:55

by Yanmin Zhang

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Thu, 2010-06-10 at 08:52 +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 09:45 +0800, Zhang, Yanmin wrote:
> > On Mon, 2010-06-07 at 13:06 +0200, Peter Zijlstra wrote:
> > > This patch-set makes part of the mm a lot more preemptible. It converts
> > > i_mmap_lock and anon_vma->lock to mutexes and makes mmu_gather fully
> > > preemptible.
> > I applied it against 2.6.35-rc2 on an x86_64 machine, but the kernel panics during boot.
> >
> > <snip panic>
>
> Oi, that's no good. Do you happen to have your .config handy? I didn't
> actually see my machine do that.
See the attachment.

Yanmin


Attachments:
config_mm (60.78 kB)

2010-06-21 10:23:07

by Peter Zijlstra

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Thu, 2010-06-10 at 14:59 +0800, Zhang, Yanmin wrote:

> > > I applied it against 2.6.35-rc2 on an x86_64 machine, but the kernel panics during boot.

<snip panic>

> > Oi, that's no good. Do you happen to have your .config handy? I didn't
> > actually see my machine do that.

> See the attachment.

That .config built and booted on my x86_64 machine, no funnies there.

I pushed a git tree out to kernel.org but somehow it's refusing to show
up on git.kernel.org.

2010-06-24 09:56:18

by Peter Zijlstra

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Mon, 2010-06-21 at 12:21 +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-10 at 14:59 +0800, Zhang, Yanmin wrote:
>
> > > > I applied it against 2.6.35-rc2 on an x86_64 machine, but the kernel panics during boot.
>
> <snip panic>
>
> > > Oi, that's no good. Do you happen to have your .config handy? I didn't
> > > actually see my machine do that.
>
> > See the attachment.
>
> That .config built and booted on my x86_64 machine, no funnies there.
>
> I pushed a git tree out to kernel.org but somehow it's refusing to show
> up on git.kernel.org.

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-mmu_preempt.git mmu_preempt

2010-06-29 07:40:04

by Yanmin Zhang

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Thu, 2010-06-24 at 11:55 +0200, Peter Zijlstra wrote:
> On Mon, 2010-06-21 at 12:21 +0200, Peter Zijlstra wrote:
> > On Thu, 2010-06-10 at 14:59 +0800, Zhang, Yanmin wrote:
> >
> > > > > I applied it against 2.6.35-rc2 on an x86_64 machine, but the kernel panics during boot.
> >
> > <snip panic>
> >
> > > > Oi, that's no good. Do you happen to have your .config handy? I didn't
> > > > actually see my machine do that.
> >
> > > See the attachment.
> >
> > That .config built and booted on my x86_64 machine, no funnies there.
> >
> > I pushed a git tree out to kernel.org but somehow it's refusing to show
> > up on git.kernel.org.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-mmu_preempt.git mmu_preempt
We tested the tree on many machines with many benchmarks. Compared with a vanilla 2.6.35-rc3 kernel,
there is no clear performance regression or improvement. We didn't run into the panic again.

Yanmin

2010-06-29 07:49:14

by Peter Zijlstra

Subject: Re: [PATCH 00/28] mm: preemptibility -v3

On Tue, 2010-06-29 at 15:40 +0800, Zhang, Yanmin wrote:
> On Thu, 2010-06-24 at 11:55 +0200, Peter Zijlstra wrote:

> > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-mmu_preempt.git mmu_preempt

> We tested the tree on many machines with many benchmarks. Compared with a vanilla 2.6.35-rc3 kernel,
> there is no clear performance regression or improvement. We didn't run into the panic again.

Most awesome, thanks!