2013-03-07 13:28:29

by Fengguang Wu

[permalink] [raw]
Subject: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

Greetings,

I got the below oops and the first bad commit is

commit 98a271e459b8088fdc42a4a11c08570d2539cae0
Author: Rakib Mullick <[email protected]>
Date: Thu Mar 7 14:52:20 2013 +0600

nsproxy: Fix ->nsproxy counting problem in copy_namespace.

In copy_namespace(), get_nsproxy() (which increments nsproxy->count)
is called before checking namespace related flags. Therefore, task's
nsproxy->count could have improper value, which could lead to calling
free_nsproxy unnecessarily. Also, incrementing nsproxy->count is an
atomic operation (which is expensive than normal increment operation),
so before doing it - it's better we make sure namespace related flags
are set.

Cc: [email protected]
Reviewed-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Rakib Mullick <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>

[ 26.782766] Scanning for low memory corruption every 60 seconds
[ 26.814208] cryptomgr_test (18) used greatest stack depth: 6208 bytes left
[ 26.839604] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
[ 26.841460] IP: [<ffffffff81196e77>] ida_remove+0x97/0xe0
[ 26.841460] PGD 0
[ 26.841460] Oops: 0000 [#1] SMP
[ 26.841460] CPU 0
[ 26.841460] Pid: 18, comm: cryptomgr_test Not tainted 3.9.0-rc1-00004-g98a271e #301 Bochs Bochs
[ 26.841460] RIP: 0010:[<ffffffff81196e77>] [<ffffffff81196e77>] ida_remove+0x97/0xe0
[ 26.841460] RSP: 0000:ffff88000d221cb8 EFLAGS: 00000046
[ 26.841460] RAX: 00000000000000ff RBX: 0000000000000000 RCX: 0000000000000044
[ 26.841460] RDX: 0000000000000044 RSI: 00000000ffffffff RDI: ffff88000d09b188
[ 26.841460] RBP: ffff88000d221cc8 R08: ffff88000d09b180 R09: ffffffffffffffff
[ 26.841460] R10: 0000000044444444 R11: 0000000000000000 R12: ffffffff81acec00
[ 26.841460] R13: 0000000000000000 R14: ffff88000d221b58 R15: ffff88000d19c2d0
[ 26.841460] FS: 0000000000000000(0000) GS:ffff88000de00000(0000) knlGS:0000000000000000
[ 26.841460] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 26.841460] CR2: 0000000000000024 CR3: 0000000001aa3000 CR4: 00000000000006f0
[ 26.841460] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 26.841460] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 26.841460] Process cryptomgr_test (pid: 18, threadinfo ffff88000d220000, task ffff88000d19c2e0)
[ 26.841460] Stack:
[ 26.841460] 00000000efffffff 0000000000000282 ffff88000d221ce8 ffffffff81147eb3
[ 26.841460] ffffffff81ad0260 ffff88000d19c2e0 ffff88000d221d08 ffffffff81159794
[ 26.841460] ffffffff81ac73e0 ffffffff81ac73e0 ffff88000d221d28 ffffffff81080b48
[ 26.841460] Call Trace:
[ 26.841460] [<ffffffff81147eb3>] proc_free_inum+0x33/0x50
[ 26.841460] [<ffffffff81159794>] put_ipc_ns+0x64/0x80
[ 26.841460] [<ffffffff81080b48>] free_nsproxy+0x28/0x50
[ 26.841460] [<ffffffff81080dae>] switch_task_namespaces+0x5e/0x70
[ 26.841460] [<ffffffff81080dcb>] exit_task_namespaces+0xb/0x10
[ 26.841460] [<ffffffff8106468c>] do_exit+0x52c/0x9e0
[ 26.841460] [<ffffffff8176620d>] ? __schedule+0x39d/0x760
[ 26.841460] [<ffffffff8116e640>] ? cryptomgr_probe+0xb0/0xb0
[ 26.841460] [<ffffffff8116e670>] cryptomgr_test+0x30/0x50
[ 26.841460] [<ffffffff8107c4e6>] kthread+0xd6/0xe0
[ 26.841460] [<ffffffff811b1ddd>] ? do_raw_spin_unlock+0x5d/0xb0
[ 26.841460] [<ffffffff81087453>] ? complete+0x23/0x60
[ 26.841460] [<ffffffff8107c410>] ? insert_kthread_work+0x80/0x80
[ 26.841460] [<ffffffff817696bc>] ret_from_fork+0x7c/0xb0
[ 26.841460] [<ffffffff8107c410>] ? insert_kthread_work+0x80/0x80
[ 26.841460] Code: 48 08 4d 63 c9 83 e9 08 4f 8b 44 c8 28 4d 85 c0 75 da 4d 85 c0 74 4b 0f b6 d2 49 8d 78 08 41 0f b3 50 08 48 63 ca 49 8b 5c c8 28 <0f> a3 43 08 19 c9 85 c9 74 2d 0f b3 43 08 48 83 2b 01 74 05 5b
[ 26.841460] RIP [<ffffffff81196e77>] ida_remove+0x97/0xe0
[ 26.841460] RSP <ffff88000d221cb8>
[ 26.841460] CR2: 0000000000000024
[ 26.841460] ---[ end trace f89a34cf0d9f599e ]---

git bisect start 98a271e459b8088fdc42a4a11c08570d2539cae0 6dbe51c251a327e012439c4772097a13df43c5b8 --
git bisect good 7f78e0351394052e1a6293e175825eb5c7869507 # 10 2013-03-07 20:26:53 fs: Limit sys_mount to only request filesystem modules.
git bisect good 9141770548d529b9d32d5b08d59b65ee65afe0d4 # 11 2013-03-07 20:38:47 fs: Limit sys_mount to only request filesystem modules (Part 2).
git bisect good 9141770548d529b9d32d5b08d59b65ee65afe0d4 # 31 2013-03-07 20:50:11 fs: Limit sys_mount to only request filesystem modules (Part 2).
git bisect bad 98a271e459b8088fdc42a4a11c08570d2539cae0 # 0 2013-03-07 20:51:48 nsproxy: Fix ->nsproxy counting problem in copy_namespace.
git bisect good 687c18a83e1a31c15cd1fb93eab4a4b250d533cd # 35 2013-03-07 20:55:35 Revert "nsproxy: Fix ->nsproxy counting problem in copy_namespace."
git bisect good 9edbffb58ae00067e264ef70d5141c1d85049029 # 31 2013-03-07 21:19:44 Add linux-next specific files for 20130307

Thanks,
Fengguang


Attachments:
(No filename) (5.00 kB)
dmesg-kvm-ant-6810-2013-03-07-18-37-02-3.9.0-rc1-00004-g98a271e-301 (30.64 kB)
98a271e459b8088fdc42a4a11c08570d2539cae0-bisect.log (3.66 kB)
.config-bisect (66.95 kB)
Download all attachments

2013-03-07 17:23:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

Fengguang Wu <[email protected]> writes:

> Greetings,
>
> I got the below oops and the first bad commit is

Doh! On a second look that change is totally wrong. Of course we need
to up the ref-count every time we create a new process. Especially if
we don't do anything with namespaces.

I was looking at it from the wrong angle last night. I should have
known better.

Patch dropped.

Thanks,
Eric

2013-03-08 11:38:55

by Rakib Mullick

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

On 3/7/13, Eric W. Biederman <[email protected]> wrote:
> Fengguang Wu <[email protected]> writes:
>
>> Greetings,
>>
>> I got the below oops and the first bad commit is
>
> Doh! On a second look that change is totally wrong. Of course we need
> to up the ref-count every time we create a new process. Especially if
> we don't do anything with namespaces.
>
> I was looking at it from the wrong angle last night. I should have
> known better.
>
> Patch dropped.
>

Sad to know :( . From the debug messages, it's kmemcheck report. I
can't related the problem specified with the patch I've proposed.

It seems at task exit path, at switch_task_namespaces() - after my
patch atomic_dec_and_test(&ns->count) becomes true (-1), thus
free_nsproxy() gets called. But, free_nsproxy() shouldn't get called
here.

Am I right? Or there's something else?

Thanks,
Rakib

2013-03-08 16:01:25

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

Rakib Mullick <[email protected]> writes:

> On 3/7/13, Eric W. Biederman <[email protected]> wrote:
>> Fengguang Wu <[email protected]> writes:
>>
>>> Greetings,
>>>
>>> I got the below oops and the first bad commit is
>>
>> Doh! On a second look that change is totally wrong. Of course we need
>> to up the ref-count every time we create a new process. Especially if
>> we don't do anything with namespaces.
>>
>> I was looking at it from the wrong angle last night. I should have
>> known better.
>>
>> Patch dropped.
>>
>
> Sad to know :( . From the debug messages, it's kmemcheck report. I
> can't related the problem specified with the patch I've proposed.
>
> It seems at task exit path, at switch_task_namespaces() - after my
> patch atomic_dec_and_test(&ns->count) becomes true (-1), thus
> free_nsproxy() gets called. But, free_nsproxy() shouldn't get called
> here.
>
> Am I right? Or there's something else?

When a new task is created one of two things needs to happen.
A) A reference count needs to be added to the current nsproxy.
B) B a new nsproxy needs to be created.

The way that code works today is far from a shiny example of totally
clear code but it is not incorrect.

By moving get_nsproxy down below the first return 0, you removed taking
the reference count in the one case it is important.

Arguably we should apply the patch below for clarity, and I just might
queue it up for 3.10.

Eric

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index afc0456..11b8b3f 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -125,22 +125,16 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
struct nsproxy *old_ns = tsk->nsproxy;
struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
struct nsproxy *new_ns;
- int err = 0;
-
- if (!old_ns)
- return 0;
-
- get_nsproxy(old_ns);

if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
- CLONE_NEWPID | CLONE_NEWNET)))
+ CLONE_NEWPID | CLONE_NEWNET))) {
+ get_nsproxy(old_ns);
return 0;
-
- if (!ns_capable(user_ns, CAP_SYS_ADMIN)) {
- err = -EPERM;
- goto out;
}

+ if (!ns_capable(user_ns, CAP_SYS_ADMIN))
+ return -EPERM;
+
/*
* CLONE_NEWIPC must detach from the undolist: after switching
* to a new ipc namespace, the semaphore arrays from the old
@@ -148,22 +142,15 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
* means share undolist with parent, so we must forbid using
* it along with CLONE_NEWIPC.
*/
- if ((flags & CLONE_NEWIPC) && (flags & CLONE_SYSVSEM)) {
- err = -EINVAL;
- goto out;
- }
+ if ((flags & CLONE_NEWIPC) && (flags & CLONE_SYSVSEM))
+ return -EINVAL;

new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
- if (IS_ERR(new_ns)) {
- err = PTR_ERR(new_ns);
- goto out;
- }
+ if (IS_ERR(new_ns))
+ return PTR_ERR(new_ns);

tsk->nsproxy = new_ns;
-
-out:
- put_nsproxy(old_ns);
- return err;
+ return 0;
}

void free_nsproxy(struct nsproxy *ns)

2013-03-09 03:54:47

by Rakib Mullick

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

On Fri, Mar 8, 2013 at 10:01 PM, Eric W. Biederman
<[email protected]> wrote:
>
> When a new task is created one of two things needs to happen.
> A) A reference count needs to be added to the current nsproxy.
> B) B a new nsproxy needs to be created.
>
> The way that code works today is far from a shiny example of totally
> clear code but it is not incorrect.
>
> By moving get_nsproxy down below the first return 0, you removed taking
> the reference count in the one case it is important.
>
> Arguably we should apply the patch below for clarity, and I just might
> queue it up for 3.10.
>
This one is much more cleaner. One thing regarding this patch, can we
check the namespace related flags at copy_namespace() call time at
copy_process(), also get_nsproxy()? I think this will reduce some
extra function call overhead and as you've mentioned get_nsproxy() is
needed at every process creation.

Thanks,
Rakib

2013-03-09 08:33:10

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

Rakib Mullick <[email protected]> writes:

> On Fri, Mar 8, 2013 at 10:01 PM, Eric W. Biederman
> <[email protected]> wrote:
>>
>> When a new task is created one of two things needs to happen.
>> A) A reference count needs to be added to the current nsproxy.
>> B) B a new nsproxy needs to be created.
>>
>> The way that code works today is far from a shiny example of totally
>> clear code but it is not incorrect.
>>
>> By moving get_nsproxy down below the first return 0, you removed taking
>> the reference count in the one case it is important.
>>
>> Arguably we should apply the patch below for clarity, and I just might
>> queue it up for 3.10.
>>
> This one is much more cleaner. One thing regarding this patch, can we
> check the namespace related flags at copy_namespace() call time at
> copy_process(), also get_nsproxy()? I think this will reduce some
> extra function call overhead and as you've mentioned get_nsproxy() is
> needed at every process creation.

If you can write a benchmark that can tell the difference, and the code
continues to be readable. It would be worth making the change.

My gut says you are proposing an optimization that won't have
a measurable impact.

Eric

2013-03-09 16:48:45

by Rakib Mullick

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

On Sat, Mar 9, 2013 at 2:33 PM, Eric W. Biederman <[email protected]> wrote:
> Rakib Mullick <[email protected]> writes:
>
>> On Fri, Mar 8, 2013 at 10:01 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>>
>>> When a new task is created one of two things needs to happen.
>>> A) A reference count needs to be added to the current nsproxy.
>>> B) B a new nsproxy needs to be created.
>>>
>>> The way that code works today is far from a shiny example of totally
>>> clear code but it is not incorrect.
>>>
>>> By moving get_nsproxy down below the first return 0, you removed taking
>>> the reference count in the one case it is important.
>>>
>>> Arguably we should apply the patch below for clarity, and I just might
>>> queue it up for 3.10.
>>>
>> This one is much more cleaner. One thing regarding this patch, can we
>> check the namespace related flags at copy_namespace() call time at
>> copy_process(), also get_nsproxy()? I think this will reduce some
>> extra function call overhead and as you've mentioned get_nsproxy() is
>> needed at every process creation.
>
> If you can write a benchmark that can tell the difference, and the code
> continues to be readable. It would be worth making the change.
>
> My gut says you are proposing an optimization that won't have
> a measurable impact.
>
Yes, it'll be hard to measure these sorts of optimization, though I'll
try and will tell you if the impact is measurable :).

Thanks,
Rakib.

2013-03-11 08:12:12

by Rakib Mullick

[permalink] [raw]
Subject: Re: [nsproxy] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024

On Sat, Mar 9, 2013 at 10:48 PM, Rakib Mullick <[email protected]> wrote:
> On Sat, Mar 9, 2013 at 2:33 PM, Eric W. Biederman <[email protected]> wrote:
>> Rakib Mullick <[email protected]> writes:
>>
>>> On Fri, Mar 8, 2013 at 10:01 PM, Eric W. Biederman
>>> <[email protected]> wrote:
>>>>
>>>> When a new task is created one of two things needs to happen.
>>>> A) A reference count needs to be added to the current nsproxy.
>>>> B) B a new nsproxy needs to be created.
>>>>
>>>> The way that code works today is far from a shiny example of totally
>>>> clear code but it is not incorrect.
>>>>
>>>> By moving get_nsproxy down below the first return 0, you removed taking
>>>> the reference count in the one case it is important.
>>>>
>>>> Arguably we should apply the patch below for clarity, and I just might
>>>> queue it up for 3.10.
>>>>
>>> This one is much more cleaner. One thing regarding this patch, can we
>>> check the namespace related flags at copy_namespace() call time at
>>> copy_process(), also get_nsproxy()? I think this will reduce some
>>> extra function call overhead and as you've mentioned get_nsproxy() is
>>> needed at every process creation.
>>
>> If you can write a benchmark that can tell the difference, and the code
>> continues to be readable. It would be worth making the change.
>>
>> My gut says you are proposing an optimization that won't have
>> a measurable impact.
>>
> Yes, it'll be hard to measure these sorts of optimization, though I'll
> try and will tell you if the impact is measurable :).
>
Well, it's quite certain that - the changes I've proposed that don't
have much noticable impact. From this tiny
changes we can't expect too much impact. The tinyest possible impact
could be on at every task creation path by
reducing some latency. So, to find it out I took the help of ftrace -
I used it's function_graph option with set_graph_function set to
do_fork.

So currently, at every task creation copy_namespaces() gets called and
it costs some like below (few snippet from ftrace log):

3) 0.025 us | try_module_get();
3) ! 887.805 us | } /* dup_mm */
3) 0.278 us | copy_namespaces(); <------
3) 0.094 us | copy_thread();

3) 0.264 us | copy_namespaces(); <-----
3) 0.091 us | copy_thread();

3) 0.306 us | copy_namespaces(); <-----
3) 0.086 us | copy_thread();

etc. copy_namespaces() just returns here.

On the otherhand, when namespace related flags are moved to
kernel/fork.c::copy_process(), copy_namespaces() never gets called
until flags are set.

So, this is what I did manage to get. If you're not convinced, then
you can drop it. But, we can't expect much than this for this simple
changes.

Thanks,
Rakib.