2012-07-17 16:01:43

by David Miller

[permalink] [raw]
Subject: That's pretty much it for 3.5.0


Linus was _extremely_ generous and took in all the stuff that was
pending in the net tree just now.

Besides very serious issues, I'm not willing to consider any more bug
fixes for the 'net' tree at this time.

Only one pending known bug qualifies, and that's the CIPSO ip option
processing OOPS'er. And I'll work on that myself if Paul Moore
doesn't show a sign of life in the next day.

Thanks.


2012-07-17 17:41:56

by Rustad, Mark D

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On Jul 17, 2012, at 9:01 AM, David Miller wrote:

> Linus was _extremely_ generous and took in all the stuff that was
> pending in the net tree just now.

Maybe *too* generous. :-) I just updated and when I boot I get an early crash in update_netdev_tables which is in netprio_cgroup.c.

> Besides very serious issues, I'm not willing to consider any more bug
> fixes for the 'net' tree at this time.

I think the above issue will have to be fixed, as it completely prevents booting for any kernel that includes the netprio_cgroup option.

> Only one pending known bug qualifies, and that's the CIPSO ip option
> processing OOPS'er. And I'll work on that myself if Paul Moore
> doesn't show a sign of life in the next day.
>
> Thanks.


I can start taking a look at this if you like, but I see that Gao feng has two patches in the last set of patches that may be related.

To give you an idea how early the crash is, here are a few log messages leading up to it:

[ 0.003455] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[ 0.005550] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.007165] Mount-cache hash table entries: 256
[ 0.010289] Initializing cgroup subsys net_cls
[ 0.010947] Initializing cgroup subsys net_prio
[ 0.011039] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
[ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0

--
Mark Rustad, LAN Access Division, Intel Corporation


2012-07-18 17:55:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On Wed, 2012-07-18 at 17:36 +0000, Rustad, Mark D wrote:
> On Jul 18, 2012, at 6:04 AM, Neil Horman wrote:
>
> > John, can you post the backtrace you got for this? I replied to the patch that
> > you posted for this fix. the cgroup subsystem has an early_init flag thats
> > supposed to prevent the initialization of cgroups that don't need initialization
> > until later (like via module_init() calls).
>
> Here is the backtrace that I get and below a patch that fixes it:
>
> [ 0.010958] Initializing cgroup subsys net_prio
> [ 0.011040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
> [ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> [ 0.011998] PGD 0
> [ 0.011998] Oops: 0000 [#1] SMP
> [ 0.011998] CPU 0
> [ 0.011998] Modules linked in:
> [ 0.011998]
> [ 0.011998] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7-mdrlinux+ #10 Bochs Bochs
> [ 0.011998] RIP: 0010:[<ffffffff814202c8>] [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> [ 0.011998] RSP: 0000:ffffffff81a01e68 EFLAGS: 00010246
> [ 0.011998] RAX: 0000000000000000 RBX: fffffffffffffed0 RCX: 0000000000000000
> [ 0.011998] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 2222222222222222
> [ 0.011998] RBP: ffffffff81a01e88 R08: 2222222222222222 R09: 2222222222222222
> [ 0.011998] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
> [ 0.011998] R13: 0000000000000000 R14: ffff88007ff608c0 R15: 00000000000143d0
> [ 0.011998] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [ 0.011998] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 0.011998] CR2: 0000000000000828 CR3: 0000000001a0b000 CR4: 00000000000006b0
> [ 0.011998] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.011998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 0.011998] Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a13420)
> [ 0.011998] Stack:
> [ 0.011998] ffffffff81a88020 0000000000000000 ffff88007d3a38f0 fffffffffffffff4
> [ 0.011998] ffffffff81a01ec8 ffffffff814203cd ffff88007ff608c0 ffffffff817d8e9a
> [ 0.011998] ffffffff81a88cd8 ffffffff81a88020 ffffffff81a88020 ffffffff81b010a0
> [ 0.011998] Call Trace:
> [ 0.011998] [<ffffffff814203cd>] cgrp_create+0x8d/0xc0
> [ 0.011998] [<ffffffff81ade14a>] cgroup_init_subsys+0x80/0x126
> [ 0.011998] [<ffffffff81ade380>] cgroup_init+0x36/0x117
> [ 0.011998] [<ffffffff81acab71>] start_kernel+0x32e/0x34f
> [ 0.011998] [<ffffffff81aca6d5>] ? repair_env_string+0x5a/0x5a
> [ 0.011998] [<ffffffff81aca346>] x86_64_start_reservations+0x101/0x105
> [ 0.011998] [<ffffffff81aca120>] ? early_idt_handlers+0x120/0x120
> [ 0.011998] [<ffffffff81aca417>] x86_64_start_kernel+0xcd/0xdc
> [ 0.011998] Code: 0f 1f 00 48 8b 83 30 01 00 00 48 8d 98 d0 fe ff ff 48 3d a8 e8 52 82 74 3a e8 25 db c3 ff 85 c0 74 09 80 3d bb 3d 68 00 00 74 40 <48> 8b 83 58 09 00 00 48 85 c0 74 cc 44 3b 60 10 76 c6 44 89 e6
> [ 0.011998] RIP [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
> [ 0.011998] RSP <ffffffff81a01e68>
> [ 0.011998] CR2: 0000000000000828
> [ 0.012009] ---[ end trace a7919e7f17c0a725 ]---
> [ 0.012601] Kernel panic - not syncing: Attempted to kill the idle task!
>
> The following change simply statically initializes init_net.dev_base_head. I copied and pasted it into the email, so this rendering may not work, but I can send it if this approach looks reasonable. I have verified that it resolves the issue above.
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0f28a9e..db1ba61 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6283,8 +6283,6 @@ static struct hlist_head *netdev_create_hash(void)
> /* Initialize per network namespace state */
> static int __net_init netdev_init(struct net *net)
> {
> - INIT_LIST_HEAD(&net->dev_base_head);
> -

if (net != &init_net)
INIT_LIST_HEAD(&net->dev_base_head);

> net->dev_name_head = netdev_create_hash();
> if (net->dev_name_head == NULL)
> goto err_name;
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index dddbacb..42f1e1c 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
> LIST_HEAD(net_namespace_list);
> EXPORT_SYMBOL_GPL(net_namespace_list);
>
> -struct net init_net;
> +struct net init_net = {
> + .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
> +};
> EXPORT_SYMBOL(init_net);
>
> #define INITIAL_NET_GEN_PTRS 13 /* +1 for len +2 for rcu_head */
>



2012-07-17 20:50:17

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 12:24 PM, David Miller wrote:
> From: John Fastabend <[email protected]>
> Date: Tue, 17 Jul 2012 12:09:53 -0700
>
>> although we don't have an early_init hook for netprio_cgroup so this
>> is probably not correct.
>
> The dependency is actually on net_dev_init (a subsys_initcall) rather
> than a pure_initcall.
>
> net_dev_init is what registers the netdev_net_ops, which in turn
> initializes the netdev list in namespaces such as &init_net
>

Ah right thanks sorry for the thrash. I guess we need to check if the
netdev list in the init_net namespace is initialized.

2012-07-17 19:09:54

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 12:00 PM, John Fastabend wrote:
> On 7/17/2012 11:48 AM, Rustad, Mark D wrote:
>> On Jul 17, 2012, at 10:41 AM, Rustad, Mark D wrote:
>>
>>> On Jul 17, 2012, at 9:01 AM, David Miller wrote:
>>>
>>>> Linus was _extremely_ generous and took in all the stuff that was
>>>> pending in the net tree just now.
>>>
>>> Maybe *too* generous. :-) I just updated and when I boot I get an
>>> early crash in update_netdev_tables which is in netprio_cgroup.c.
>>>
>>>> Besides very serious issues, I'm not willing to consider any more bug
>>>> fixes for the 'net' tree at this time.
>>>
>>> I think the above issue will have to be fixed, as it completely
>>> prevents booting for any kernel that includes the netprio_cgroup option.
>>>
>>>> Only one pending known bug qualifies, and that's the CIPSO ip option
>>>> processing OOPS'er. And I'll work on that myself if Paul Moore
>>>> doesn't show a sign of life in the next day.
>>>>
>>>> Thanks.
>>>
>>>
>>> I can start taking a look at this if you like, but I see that Gao
>>> feng has two patches in the last set of patches that may be related.
>>>
>>> To give you an idea how early the crash is, here are a few log
>>> messages leading up to it:
>>>
>>> [ 0.003455] Dentry cache hash table entries: 262144 (order: 9,
>>> 2097152 bytes)
>>> [ 0.005550] Inode-cache hash table entries: 131072 (order: 8,
>>> 1048576 bytes)
>>> [ 0.007165] Mount-cache hash table entries: 256
>>> [ 0.010289] Initializing cgroup subsys net_cls
>>> [ 0.010947] Initializing cgroup subsys net_prio
>>> [ 0.011039] BUG: unable to handle kernel NULL pointer dereference
>>> at 0000000000000828
>>> [ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
>>
>>
>> I found that I can avoid the crash by configuring the netprio_cgroup
>> as a module. I don't need to have it built in, I just happened to.
>> This finding may lower the temperature of this issue a lot from what I
>> had been feeling.
>>
>
> hmm looks like we access init_net here,
>
> static void update_netdev_tables(void)
> {
> struct net_device *dev;
> u32 max_len = atomic_read(&max_prioidx) + 1;
> struct netprio_map *map;
>
> rtnl_lock();
> for_each_netdev(&init_net, dev) {
> map = rtnl_dereference(dev->priomap);
> if ((!map) ||
> (map->priomap_len < max_len))
> extend_netdev_table(dev, max_len);
> }
> rtnl_unlock();
> }
>
> but inet_net is initialized by pure_initcall(net_ns_init) and I
> gather pure_initcall's should not have any dependencies but it
> looks like we created one here with cgroup_init_early() in
> start_kernel().
>
> I'll poke around some more. Also had some off list help from
> Mark.
>
> .John
>

although we don't have an early_init hook for netprio_cgroup so this
is probably not correct.

2012-07-18 13:04:47

by Neil Horman

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On Tue, Jul 17, 2012 at 04:27:53PM -0700, John Fastabend wrote:
> On 7/17/2012 3:18 PM, David Miller wrote:
> >From: John Fastabend <[email protected]>
> >Date: Tue, 17 Jul 2012 15:13:36 -0700
> >
> >>Perhaps the easiest way is to check net->count this should be zero
> >>until setup_net is called.
> >>
> >>if (!atomic_read(&init_net.count))
> >> return ret;
> >>
> >
> >Won't work, setup_net() runs via a pure_initcall().
> >
>
> Why not must have missed something? cgroup_init() and
> cgroup_early_init() both run before _initcall() routines are
> called via kernel_init() so this will stop the update in
> netprio from occurring.
>
> And I don't see any race elsewhere for this.
John, can you post the backtrace you got for this? I replied to the patch that
you posted for this fix. the cgroup subsystem has an early_init flag thats
supposed to prevent the initialization of cgroups that don't need initialization
until later (like via module_init() calls).
Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2012-07-18 18:33:27

by David Miller

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

From: "Rustad, Mark D" <[email protected]>
Date: Wed, 18 Jul 2012 18:31:31 +0000

> If this looks like a good change, I can send the patch. Is there any
> concern about init_net going from bss to data?

There is no such concern, I like your change a lot.

2012-07-17 23:27:55

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 3:18 PM, David Miller wrote:
> From: John Fastabend <[email protected]>
> Date: Tue, 17 Jul 2012 15:13:36 -0700
>
>> Perhaps the easiest way is to check net->count this should be zero
>> until setup_net is called.
>>
>> if (!atomic_read(&init_net.count))
>> return ret;
>>
>
> Won't work, setup_net() runs via a pure_initcall().
>

Why not must have missed something? cgroup_init() and
cgroup_early_init() both run before _initcall() routines are
called via kernel_init() so this will stop the update in
netprio from occurring.

And I don't see any race elsewhere for this.

2012-07-17 19:25:02

by David Miller

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

From: John Fastabend <[email protected]>
Date: Tue, 17 Jul 2012 12:09:53 -0700

> although we don't have an early_init hook for netprio_cgroup so this
> is probably not correct.

The dependency is actually on net_dev_init (a subsys_initcall) rather
than a pure_initcall.

net_dev_init is what registers the netdev_net_ops, which in turn
initializes the netdev list in namespaces such as &init_net

2012-07-18 18:31:33

by Rustad, Mark D

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0


On Jul 18, 2012, at 10:55 AM, Eric Dumazet wrote:

> On Wed, 2012-07-18 at 17:36 +0000, Rustad, Mark D wrote:
>>
>> The following change simply statically initializes init_net.dev_base_head. I copied and pasted it into the email, so this rendering may not work, but I can send it if this approach looks reasonable. I have verified that it resolves the issue above.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 0f28a9e..db1ba61 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -6283,8 +6283,6 @@ static struct hlist_head *netdev_create_hash(void)
>> /* Initialize per network namespace state */
>> static int __net_init netdev_init(struct net *net)
>> {
>> - INIT_LIST_HEAD(&net->dev_base_head);
>> -
>
> if (net != &init_net)
> INIT_LIST_HEAD(&net->dev_base_head);

Ooooh. Good catch.

>> net->dev_name_head = netdev_create_hash();
>> if (net->dev_name_head == NULL)
>> goto err_name;
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index dddbacb..42f1e1c 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
>> LIST_HEAD(net_namespace_list);
>> EXPORT_SYMBOL_GPL(net_namespace_list);
>>
>> -struct net init_net;
>> +struct net init_net = {
>> + .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
>> +};
>> EXPORT_SYMBOL(init_net);
>>
>> #define INITIAL_NET_GEN_PTRS 13 /* +1 for len +2 for rcu_head */


If this looks like a good change, I can send the patch. Is there any concern about init_net going from bss to data?

--
Mark Rustad, LAN Access Division, Intel Corporation


2012-07-17 19:17:33

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 12:09 PM, John Fastabend wrote:
> On 7/17/2012 12:00 PM, John Fastabend wrote:
>> On 7/17/2012 11:48 AM, Rustad, Mark D wrote:
>>> On Jul 17, 2012, at 10:41 AM, Rustad, Mark D wrote:
>>>
>>>> On Jul 17, 2012, at 9:01 AM, David Miller wrote:
>>>>
>>>>> Linus was _extremely_ generous and took in all the stuff that was
>>>>> pending in the net tree just now.
>>>>
>>>> Maybe *too* generous. :-) I just updated and when I boot I get an
>>>> early crash in update_netdev_tables which is in netprio_cgroup.c.
>>>>
>>>>> Besides very serious issues, I'm not willing to consider any more bug
>>>>> fixes for the 'net' tree at this time.
>>>>
>>>> I think the above issue will have to be fixed, as it completely
>>>> prevents booting for any kernel that includes the netprio_cgroup
>>>> option.
>>>>
>>>>> Only one pending known bug qualifies, and that's the CIPSO ip option
>>>>> processing OOPS'er. And I'll work on that myself if Paul Moore
>>>>> doesn't show a sign of life in the next day.
>>>>>
>>>>> Thanks.
>>>>
>>>>
>>>> I can start taking a look at this if you like, but I see that Gao
>>>> feng has two patches in the last set of patches that may be related.
>>>>
>>>> To give you an idea how early the crash is, here are a few log
>>>> messages leading up to it:
>>>>
>>>> [ 0.003455] Dentry cache hash table entries: 262144 (order: 9,
>>>> 2097152 bytes)
>>>> [ 0.005550] Inode-cache hash table entries: 131072 (order: 8,
>>>> 1048576 bytes)
>>>> [ 0.007165] Mount-cache hash table entries: 256
>>>> [ 0.010289] Initializing cgroup subsys net_cls
>>>> [ 0.010947] Initializing cgroup subsys net_prio
>>>> [ 0.011039] BUG: unable to handle kernel NULL pointer dereference
>>>> at 0000000000000828
>>>> [ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
>>>
>>>
>>> I found that I can avoid the crash by configuring the netprio_cgroup
>>> as a module. I don't need to have it built in, I just happened to.
>>> This finding may lower the temperature of this issue a lot from what I
>>> had been feeling.
>>>
>>
>> hmm looks like we access init_net here,
>>
>> static void update_netdev_tables(void)
>> {
>> struct net_device *dev;
>> u32 max_len = atomic_read(&max_prioidx) + 1;
>> struct netprio_map *map;
>>
>> rtnl_lock();
>> for_each_netdev(&init_net, dev) {
>> map = rtnl_dereference(dev->priomap);
>> if ((!map) ||
>> (map->priomap_len < max_len))
>> extend_netdev_table(dev, max_len);
>> }
>> rtnl_unlock();
>> }
>>
>> but inet_net is initialized by pure_initcall(net_ns_init) and I
>> gather pure_initcall's should not have any dependencies but it
>> looks like we created one here with cgroup_init_early() in
>> start_kernel().
>>
>> I'll poke around some more. Also had some off list help from
>> Mark.
>>
>> .John
>>
>
> although we don't have an early_init hook for netprio_cgroup so this
> is probably not correct.

Hey Mark,

you have better timing then me (I can't make this fail). Can you try
cgroup_init below rest_init() in start_kernel(). That's in init/main.c

.John


2012-07-17 19:00:20

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 11:48 AM, Rustad, Mark D wrote:
> On Jul 17, 2012, at 10:41 AM, Rustad, Mark D wrote:
>
>> On Jul 17, 2012, at 9:01 AM, David Miller wrote:
>>
>>> Linus was _extremely_ generous and took in all the stuff that was
>>> pending in the net tree just now.
>>
>> Maybe *too* generous. :-) I just updated and when I boot I get an early crash in update_netdev_tables which is in netprio_cgroup.c.
>>
>>> Besides very serious issues, I'm not willing to consider any more bug
>>> fixes for the 'net' tree at this time.
>>
>> I think the above issue will have to be fixed, as it completely prevents booting for any kernel that includes the netprio_cgroup option.
>>
>>> Only one pending known bug qualifies, and that's the CIPSO ip option
>>> processing OOPS'er. And I'll work on that myself if Paul Moore
>>> doesn't show a sign of life in the next day.
>>>
>>> Thanks.
>>
>>
>> I can start taking a look at this if you like, but I see that Gao feng has two patches in the last set of patches that may be related.
>>
>> To give you an idea how early the crash is, here are a few log messages leading up to it:
>>
>> [ 0.003455] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
>> [ 0.005550] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
>> [ 0.007165] Mount-cache hash table entries: 256
>> [ 0.010289] Initializing cgroup subsys net_cls
>> [ 0.010947] Initializing cgroup subsys net_prio
>> [ 0.011039] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
>> [ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
>
>
> I found that I can avoid the crash by configuring the netprio_cgroup as a module. I don't need to have it built in, I just happened to. This finding may lower the temperature of this issue a lot from what I had been feeling.
>

hmm looks like we access init_net here,

static void update_netdev_tables(void)
{
struct net_device *dev;
u32 max_len = atomic_read(&max_prioidx) + 1;
struct netprio_map *map;

rtnl_lock();
for_each_netdev(&init_net, dev) {
map = rtnl_dereference(dev->priomap);
if ((!map) ||
(map->priomap_len < max_len))
extend_netdev_table(dev, max_len);
}
rtnl_unlock();
}

but inet_net is initialized by pure_initcall(net_ns_init) and I
gather pure_initcall's should not have any dependencies but it
looks like we created one here with cgroup_init_early() in
start_kernel().

I'll poke around some more. Also had some off list help from
Mark.

.John


2012-07-17 19:26:26

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 12:17 PM, John Fastabend wrote:
> On 7/17/2012 12:09 PM, John Fastabend wrote:
>> On 7/17/2012 12:00 PM, John Fastabend wrote:
>>> On 7/17/2012 11:48 AM, Rustad, Mark D wrote:
>>>> On Jul 17, 2012, at 10:41 AM, Rustad, Mark D wrote:
>>>>
>>>>> On Jul 17, 2012, at 9:01 AM, David Miller wrote:
>>>>>
>>>>>> Linus was _extremely_ generous and took in all the stuff that was
>>>>>> pending in the net tree just now.
>>>>>
>>>>> Maybe *too* generous. :-) I just updated and when I boot I get an
>>>>> early crash in update_netdev_tables which is in netprio_cgroup.c.
>>>>>
>>>>>> Besides very serious issues, I'm not willing to consider any more bug
>>>>>> fixes for the 'net' tree at this time.
>>>>>
>>>>> I think the above issue will have to be fixed, as it completely
>>>>> prevents booting for any kernel that includes the netprio_cgroup
>>>>> option.
>>>>>
>>>>>> Only one pending known bug qualifies, and that's the CIPSO ip option
>>>>>> processing OOPS'er. And I'll work on that myself if Paul Moore
>>>>>> doesn't show a sign of life in the next day.
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>>
>>>>> I can start taking a look at this if you like, but I see that Gao
>>>>> feng has two patches in the last set of patches that may be related.
>>>>>
>>>>> To give you an idea how early the crash is, here are a few log
>>>>> messages leading up to it:
>>>>>
>>>>> [ 0.003455] Dentry cache hash table entries: 262144 (order: 9,
>>>>> 2097152 bytes)
>>>>> [ 0.005550] Inode-cache hash table entries: 131072 (order: 8,
>>>>> 1048576 bytes)
>>>>> [ 0.007165] Mount-cache hash table entries: 256
>>>>> [ 0.010289] Initializing cgroup subsys net_cls
>>>>> [ 0.010947] Initializing cgroup subsys net_prio
>>>>> [ 0.011039] BUG: unable to handle kernel NULL pointer dereference
>>>>> at 0000000000000828
>>>>> [ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
>>>>
>>>>
>>>> I found that I can avoid the crash by configuring the netprio_cgroup
>>>> as a module. I don't need to have it built in, I just happened to.
>>>> This finding may lower the temperature of this issue a lot from what I
>>>> had been feeling.
>>>>
>>>
>>> hmm looks like we access init_net here,
>>>
>>> static void update_netdev_tables(void)
>>> {
>>> struct net_device *dev;
>>> u32 max_len = atomic_read(&max_prioidx) + 1;
>>> struct netprio_map *map;
>>>
>>> rtnl_lock();
>>> for_each_netdev(&init_net, dev) {
>>> map = rtnl_dereference(dev->priomap);
>>> if ((!map) ||
>>> (map->priomap_len < max_len))
>>> extend_netdev_table(dev, max_len);
>>> }
>>> rtnl_unlock();
>>> }
>>>
>>> but inet_net is initialized by pure_initcall(net_ns_init) and I
>>> gather pure_initcall's should not have any dependencies but it
>>> looks like we created one here with cgroup_init_early() in
>>> start_kernel().
>>>
>>> I'll poke around some more. Also had some off list help from
>>> Mark.
>>>
>>> .John
>>>
>>
>> although we don't have an early_init hook for netprio_cgroup so this
>> is probably not correct.
>
> Hey Mark,
>
> you have better timing then me (I can't make this fail). Can you try
> cgroup_init below rest_init() in start_kernel(). That's in init/main.c
>
> .John
>

ugh nevermind that was stupid... I'm going to stop hitting the lists
with useless noise and be back with a fix in awhile.

2012-07-18 17:36:37

by Rustad, Mark D

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On Jul 18, 2012, at 6:04 AM, Neil Horman wrote:

> John, can you post the backtrace you got for this? I replied to the patch that
> you posted for this fix. the cgroup subsystem has an early_init flag thats
> supposed to prevent the initialization of cgroups that don't need initialization
> until later (like via module_init() calls).

Here is the backtrace that I get and below a patch that fixes it:

[ 0.010958] Initializing cgroup subsys net_prio
[ 0.011040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
[ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
[ 0.011998] PGD 0
[ 0.011998] Oops: 0000 [#1] SMP
[ 0.011998] CPU 0
[ 0.011998] Modules linked in:
[ 0.011998]
[ 0.011998] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7-mdrlinux+ #10 Bochs Bochs
[ 0.011998] RIP: 0010:[<ffffffff814202c8>] [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
[ 0.011998] RSP: 0000:ffffffff81a01e68 EFLAGS: 00010246
[ 0.011998] RAX: 0000000000000000 RBX: fffffffffffffed0 RCX: 0000000000000000
[ 0.011998] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 2222222222222222
[ 0.011998] RBP: ffffffff81a01e88 R08: 2222222222222222 R09: 2222222222222222
[ 0.011998] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[ 0.011998] R13: 0000000000000000 R14: ffff88007ff608c0 R15: 00000000000143d0
[ 0.011998] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 0.011998] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.011998] CR2: 0000000000000828 CR3: 0000000001a0b000 CR4: 00000000000006b0
[ 0.011998] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.011998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.011998] Process swapper/0 (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a13420)
[ 0.011998] Stack:
[ 0.011998] ffffffff81a88020 0000000000000000 ffff88007d3a38f0 fffffffffffffff4
[ 0.011998] ffffffff81a01ec8 ffffffff814203cd ffff88007ff608c0 ffffffff817d8e9a
[ 0.011998] ffffffff81a88cd8 ffffffff81a88020 ffffffff81a88020 ffffffff81b010a0
[ 0.011998] Call Trace:
[ 0.011998] [<ffffffff814203cd>] cgrp_create+0x8d/0xc0
[ 0.011998] [<ffffffff81ade14a>] cgroup_init_subsys+0x80/0x126
[ 0.011998] [<ffffffff81ade380>] cgroup_init+0x36/0x117
[ 0.011998] [<ffffffff81acab71>] start_kernel+0x32e/0x34f
[ 0.011998] [<ffffffff81aca6d5>] ? repair_env_string+0x5a/0x5a
[ 0.011998] [<ffffffff81aca346>] x86_64_start_reservations+0x101/0x105
[ 0.011998] [<ffffffff81aca120>] ? early_idt_handlers+0x120/0x120
[ 0.011998] [<ffffffff81aca417>] x86_64_start_kernel+0xcd/0xdc
[ 0.011998] Code: 0f 1f 00 48 8b 83 30 01 00 00 48 8d 98 d0 fe ff ff 48 3d a8 e8 52 82 74 3a e8 25 db c3 ff 85 c0 74 09 80 3d bb 3d 68 00 00 74 40 <48> 8b 83 58 09 00 00 48 85 c0 74 cc 44 3b 60 10 76 c6 44 89 e6
[ 0.011998] RIP [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0
[ 0.011998] RSP <ffffffff81a01e68>
[ 0.011998] CR2: 0000000000000828
[ 0.012009] ---[ end trace a7919e7f17c0a725 ]---
[ 0.012601] Kernel panic - not syncing: Attempted to kill the idle task!

The following change simply statically initializes init_net.dev_base_head. I copied and pasted it into the email, so this rendering may not work, but I can send it if this approach looks reasonable. I have verified that it resolves the issue above.

diff --git a/net/core/dev.c b/net/core/dev.c
index 0f28a9e..db1ba61 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6283,8 +6283,6 @@ static struct hlist_head *netdev_create_hash(void)
/* Initialize per network namespace state */
static int __net_init netdev_init(struct net *net)
{
- INIT_LIST_HEAD(&net->dev_base_head);
-
net->dev_name_head = netdev_create_hash();
if (net->dev_name_head == NULL)
goto err_name;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index dddbacb..42f1e1c 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
LIST_HEAD(net_namespace_list);
EXPORT_SYMBOL_GPL(net_namespace_list);

-struct net init_net;
+struct net init_net = {
+ .dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
+};
EXPORT_SYMBOL(init_net);

#define INITIAL_NET_GEN_PTRS 13 /* +1 for len +2 for rcu_head */

--
Mark Rustad, LAN Access Division, Intel Corporation


2012-07-17 22:13:37

by John Fastabend

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On 7/17/2012 2:02 PM, David Miller wrote:
> From: John Fastabend <[email protected]>
> Date: Tue, 17 Jul 2012 13:50:16 -0700
>
>> On 7/17/2012 12:24 PM, David Miller wrote:
>>> From: John Fastabend <[email protected]>
>>> Date: Tue, 17 Jul 2012 12:09:53 -0700
>>>
>>>> although we don't have an early_init hook for netprio_cgroup so this
>>>> is probably not correct.
>>>
>>> The dependency is actually on net_dev_init (a subsys_initcall) rather
>>> than a pure_initcall.
>>>
>>> net_dev_init is what registers the netdev_net_ops, which in turn
>>> initializes the netdev list in namespaces such as &init_net
>>>
>>
>> Ah right thanks sorry for the thrash. I guess we need to check if the
>> netdev list in the init_net namespace is initialized.
>
> It's a hack, but we could export and then test dev_boot_phase == 0,
> and if that test is true then skip the init_net device walk in the
> cgroup code.
>
> But I don't like that very much.
>
> The things this code cares about can't even be an issue until
> net_dev_init() runs.
>
> There is a comment warning not to do this in linux/init.h, but we
> could change the module_init() in netprio_cgroup.c to some level which
> runs after subsys_inticall(). When built as a module, linux/init.h
> will translate this into module_init() which is basically the behavior
> we want.
>

Perhaps the easiest way is to check net->count this should be zero
until setup_net is called.

if (!atomic_read(&init_net.count))
return ret;

2012-07-17 22:18:33

by David Miller

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

From: John Fastabend <[email protected]>
Date: Tue, 17 Jul 2012 15:13:36 -0700

> Perhaps the easiest way is to check net->count this should be zero
> until setup_net is called.
>
> if (!atomic_read(&init_net.count))
> return ret;
>

Won't work, setup_net() runs via a pure_initcall().

2012-07-17 19:40:31

by John W. Linville

[permalink] [raw]
Subject: wireless.git frozen -- Re: That's pretty much it for 3.5.0

On Tue, Jul 17, 2012 at 09:01:42AM -0700, David Miller wrote:
>
> Linus was _extremely_ generous and took in all the stuff that was
> pending in the net tree just now.
>
> Besides very serious issues, I'm not willing to consider any more bug
> fixes for the 'net' tree at this time.
>
> Only one pending known bug qualifies, and that's the CIPSO ip option
> processing OOPS'er. And I'll work on that myself if Paul Moore
> doesn't show a sign of life in the next day.
>
> Thanks.

Now only fixes for truly "show stopper" bugs will be accepted for
the 3.5 stream. I don't believe that any of the handful of fixes
currently in wireless.git (but not yet in net.git) are sufficiently
important to make the cut.

I will pull the current wireless.git tree into the wireless-next.git
tree, and then wireless.git will remain frozen until 3.6-rc1 is
released. If you have a wireless fix that you believe is sufficiently
important to merit being in 3.5, then please post it to the netdev
list (and Cc: [email protected] when you do so).

Thanks,

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2012-07-17 18:48:58

by Rustad, Mark D

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

On Jul 17, 2012, at 10:41 AM, Rustad, Mark D wrote:

> On Jul 17, 2012, at 9:01 AM, David Miller wrote:
>
>> Linus was _extremely_ generous and took in all the stuff that was
>> pending in the net tree just now.
>
> Maybe *too* generous. :-) I just updated and when I boot I get an early crash in update_netdev_tables which is in netprio_cgroup.c.
>
>> Besides very serious issues, I'm not willing to consider any more bug
>> fixes for the 'net' tree at this time.
>
> I think the above issue will have to be fixed, as it completely prevents booting for any kernel that includes the netprio_cgroup option.
>
>> Only one pending known bug qualifies, and that's the CIPSO ip option
>> processing OOPS'er. And I'll work on that myself if Paul Moore
>> doesn't show a sign of life in the next day.
>>
>> Thanks.
>
>
> I can start taking a look at this if you like, but I see that Gao feng has two patches in the last set of patches that may be related.
>
> To give you an idea how early the crash is, here are a few log messages leading up to it:
>
> [ 0.003455] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
> [ 0.005550] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
> [ 0.007165] Mount-cache hash table entries: 256
> [ 0.010289] Initializing cgroup subsys net_cls
> [ 0.010947] Initializing cgroup subsys net_prio
> [ 0.011039] BUG: unable to handle kernel NULL pointer dereference at 0000000000000828
> [ 0.011998] IP: [<ffffffff814202c8>] update_netdev_tables+0x68/0xe0


I found that I can avoid the crash by configuring the netprio_cgroup as a module. I don't need to have it built in, I just happened to. This finding may lower the temperature of this issue a lot from what I had been feeling.

--
Mark Rustad, LAN Access Division, Intel Corporation


2012-07-17 21:02:44

by David Miller

[permalink] [raw]
Subject: Re: That's pretty much it for 3.5.0

From: John Fastabend <[email protected]>
Date: Tue, 17 Jul 2012 13:50:16 -0700

> On 7/17/2012 12:24 PM, David Miller wrote:
>> From: John Fastabend <[email protected]>
>> Date: Tue, 17 Jul 2012 12:09:53 -0700
>>
>>> although we don't have an early_init hook for netprio_cgroup so this
>>> is probably not correct.
>>
>> The dependency is actually on net_dev_init (a subsys_initcall) rather
>> than a pure_initcall.
>>
>> net_dev_init is what registers the netdev_net_ops, which in turn
>> initializes the netdev list in namespaces such as &init_net
>>
>
> Ah right thanks sorry for the thrash. I guess we need to check if the
> netdev list in the init_net namespace is initialized.

It's a hack, but we could export and then test dev_boot_phase == 0,
and if that test is true then skip the init_net device walk in the
cgroup code.

But I don't like that very much.

The things this code cares about can't even be an issue until
net_dev_init() runs.

There is a comment warning not to do this in linux/init.h, but we
could change the module_init() in netprio_cgroup.c to some level which
runs after subsys_inticall(). When built as a module, linux/init.h
will translate this into module_init() which is basically the behavior
we want.