2006-12-18 08:53:46

by Santiago Garcia Mantinan

Subject: ebtables problems on 2.6.19.1

Hi!

When trying to upgrade a machine from 2.6.18 to 2.6.19.1 I found that it
crashed when loading the ebtables rules on startup.

This is an example of the crash I get:

BUG: unable to handle kernel paging request at virtual address e081e004
printing eip:
c0283da0
*pde = 1fbcb067
*pte = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c0283da0>] Not tainted VLI
EFLAGS: 00010282 (2.6.19.1 #1)
EIP is at translate_table+0x600/0xe90
eax: e081df98 ebx: 0000000e ecx: e081df98 edx: e081df98
esi: 00000028 edi: dfb37cec ebp: e081d000 esp: dfb37c30
ds: 007b es: 007b ss: 0068
Process ebtables (pid: 609, ti=dfb36000 task=c14d1550 task.ti=dfb36000)
Stack: e081df4c 00000024 e081b000 dfb37cec 00000020 00000000 00000010 00000000
00000000 00000fc8 00000f98 e081df98 00000044 00000110 00000001 00000001
00000110 00000138 00000000 00000000 e081d000 e081df98 00000005 00000010
Call Trace:
[<c0284daf>] do_ebt_set_ctl+0x28f/0x6b0
[<c013132f>] __alloc_pages+0x4f/0x2e0
[<c023ef9d>] nf_sockopt+0xad/0x100
[<c023f03e>] nf_setsockopt+0x1e/0x30
[<c024aeac>] ip_setsockopt+0x12c/0xc50
[<c010cb20>] do_page_fault+0x0/0x640
[<c028a7e9>] error_code+0x39/0x40
[<c0115888>] current_fs_time+0x48/0x60
[<c01564ad>] touch_atime+0x5d/0xb0
[<c012d535>] do_generic_mapping_read+0x385/0x490
[<c012c9b0>] file_read_actor+0x0/0x100
[<c012f4d0>] generic_file_aio_read+0xf0/0x220
[<c013132f>] __alloc_pages+0x4f/0x2e0
[<c012f1dd>] filemap_nopage+0x14d/0x350
[<c013788d>] unmap_vmas+0x29d/0x480
[<c013850e>] __handle_mm_fault+0x53e/0x630
[<c0139205>] free_pgtables+0x85/0xb0
[<c0226db3>] sock_common_setsockopt+0x23/0x30
[<c0224f0f>] sys_setsockopt+0x5f/0xb0
[<c02267e9>] sys_socketcall+0x209/0x280
[<c010cb20>] do_page_fault+0x0/0x640
[<c0102c8f>] syscall_call+0x7/0xb
[<c028007b>] br_stp_change_bridge_id+0xb/0x1a0
=======================
Code: 17 0f 83 a2 03 00 00 8b 4c 24 08 8b 5c 24 28 8b 7c 24 0c 8b 69 24 01 eb 89
5c 24 2c 8b 44 24 2c 8b 54 24 2c 8b 5f 20 8b 4c 24 2c <8b> 40 6c 89 44 24 44 8b
52 68 89 54 24 40 8b 01 85 c0 0f 84 3a
EIP: [<c0283da0>] translate_table+0x600/0xe90 SS:ESP 0068:dfb37c30
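As a quick cross-check of the oops above (a minimal sketch, not a diagnosis): on 32-bit x86 the marked bytes in the Code: line, `<8b> 40 6c`, decode as `mov 0x6c(%eax),%eax`, so the faulting read should be at eax + 0x6c. The helper name `effective_address` is made up for illustration, and it only handles this one addressing form. The register value is copied from the dump.

```python
# Decode the ModRM byte of an "8b /r" instruction (mov r/m32 -> r32) that uses
# a register base plus an 8-bit displacement, then compute the address it reads.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def effective_address(insn, regs):
    """Return (base register name, address read) for `8b modrm disp8`."""
    opcode, modrm, disp8 = insn
    if opcode != 0x8B:
        raise ValueError("only opcode 8b (mov r32, r/m32) is handled")
    mod, rm = modrm >> 6, modrm & 0x7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    # The 8-bit displacement is sign-extended.
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8
    base = REGS[rm]
    return base, (regs[base] + disp) & 0xFFFFFFFF

# Values from the 2.6.19.1 oops: eax = e081df98, marked bytes <8b> 40 6c.
base, addr = effective_address(bytes.fromhex("8b406c"), {"eax": 0xE081DF98})
print(base, hex(addr))  # eax 0xe081e004
```

The result, e081e004, is the reported fault address. The pointer in eax sits near the end of one page and the read lands just past the e081e000 boundary, where `*pte = 00000000` says nothing is mapped.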

I've tried to find a subset of the rules that triggers this, but that has
proved very difficult: it only fails when I load the ebtables rules at boot
time. If I load them after the machine has completely booted, it works OK.
2.6.18 still works OK. Both kernels have the "same" config where possible,
and neither is SMP.

The machine that originally failed is a PIII 1GHz. I have copied the
filesystem to a PIV 1.6GHz, where it also fails and where I can run tests
and access the console via serial port.

The machine is not being used as a brouter, only as a bridge firewall: it
has some ebtables rules to drop non-IP traffic and then does all the work at
the iptables level.

I don't know what other info to add here. Tell me if you need anything else
to diagnose this, or any testing done here.

Regards...
--
Santiago García Mantiñán


2006-12-20 09:24:10

by Patrick McHardy

Subject: Re: ebtables problems on 2.6.19.1

Santiago Garcia Mantinan wrote:
> Hi!
>
> When trying to upgrade a machine from 2.6.18 to 2.6.19.1 I found that it
> crashed when loading the ebtables rules on startup.
>
> This is an example of the crash I get:
>
> BUG: unable to handle kernel paging request at virtual address e081e004
> printing eip:
> c0283da0
> *pde = 1fbcb067
> *pte = 00000000
> Oops: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c0283da0>] Not tainted VLI
> EFLAGS: 00010282 (2.6.19.1 #1)
> EIP is at translate_table+0x600/0xe90
>
> [..]
>
> I've tried to find a subset of the rules that triggers this, but that has
> proved very difficult: it only fails when I load the ebtables rules at boot
> time. If I load them after the machine has completely booted, it works OK.
> 2.6.18 still works OK. Both kernels have the "same" config where possible,
> and neither is SMP.

At what point during boot time do you load your rules? Is networking
already up?

> The machine that originally failed is a PIII 1GHz. I have copied the
> filesystem to a PIV 1.6GHz, where it also fails and where I can run tests
> and access the console via serial port.
>
> The machine is not being used as a brouter, only as a bridge firewall: it
> has some ebtables rules to drop non-IP traffic and then does all the work at
> the iptables level.
>
> I don't know what other info to add here. Tell me if you need anything else
> to diagnose this, or any testing done here.

I'm trying to reproduce this (without success so far). Please send your
kernel config and your ebtables script.

You could check whether 2.6.19 works; there were some ebtables changes in
2.6.19.1 that touched this code.

2006-12-24 03:15:17

by Christopher S. Aker

Subject: Re: ebtables problems on 2.6.19.1 *and* 2.6.16.36

Patrick McHardy wrote:
> I'm trying to reproduce this (without success so far). Please send your
> kernel config and your ebtables script.
>
> You could check whether 2.6.19 works; there were some ebtables changes in
> 2.6.19.1 that touched this code.

We're hitting this too, on both 2.6.16.36 and 2.6.19.1.

BUG: unable to handle kernel paging request at virtual address f8cec008
printing eip:
c0462272
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: e1000
CPU: 1
EIP: 0060:[<c0462272>] Not tainted VLI
EFLAGS: 00010286 (2.6.19.1-1-bigmem #1)
EIP is at translate_table+0x2b3/0xddf
eax: f8ce2000 ebx: 00000004 ecx: f6d53e90 edx: f8ce2000
esi: f8cebfa0 edi: 0000000e ebp: 00000000 esp: f6d53e08
ds: 007b es: 007b ss: 0068
Process ebtables (pid: 4788, ti=f6d52000 task=f6d51550 task.ti=f6d52000)
Stack: f6d53e40 c0540440 00000007 f6d53ebc 00000001 00000028 00000000 00000000
00000004 00000fa0 00000fd0 f8d38000 f8ce2000 f6d53e90 00000000 80000000
00000000 00000000 00000000 00000004 00000014 00000000 00000014 00000600
Call Trace:
[<c0462f5f>] do_replace+0x113/0x6da
[<c0142267>] get_page_from_freelist+0x8c/0xa8
[<c0463f4c>] do_ebt_set_ctl+0x2d/0x2e
[<c03efbc2>] nf_sockopt+0xfa/0xfc
[<c03efbe7>] nf_setsockopt+0x23/0x2b
[<c03fac35>] ip_setsockopt+0x86/0x91
[<c03d54ef>] sock_common_setsockopt+0x23/0x2f
[<c03d2d69>] sys_setsockopt+0x61/0xac
[<c03d33f3>] sys_socketcall+0x1e9/0x249
[<c0114348>] do_page_fault+0x0/0x664
[<c0102bc5>] sysenter_past_esp+0x56/0x79
[<c047007b>] svc_recv+0x9c/0x3f5
=======================
Code: 30 3b 28 0f 83 5c 02 00 00 8b 54 24 30 8b 74 24 24 8b 4c 24 34 8b
5c 24 4c 03 72 24 8b 79 20 89 5c 24 20 c7 44 24 1c 00 00 00 00 <8b> 56
68 8b 46 6c 29 d0 31 d2 89 44 24 14 8b 06 85 c0 0f 84 f7
EIP: [<c0462272>] translate_table+0x2b3/0xddf SS:ESP 0068:f6d53e08


Unable to handle kernel paging request at virtual address f8a3b00c
printing eip:
c03cce45
*pde = 00000000
Oops: 0000 [#13]
SMP
Modules linked in: e1000
CPU: 1
EIP: 0060:[<c03cce45>] Not tainted VLI
EFLAGS: 00010246 (2.6.16.36-1-bigmem #1)
EIP is at translate_table+0x47b/0xfc2
eax: d8fbbc3c ebx: 00000098 ecx: c049b780 edx: 00000000
esi: f8a3afa0 edi: 0000000e ebp: 00000001 esp: d8fbbb7c
ds: 007b es: 007b ss: 0068
Process ebtables (pid: 7917, threadinfo=d8fba000 task=e7892550)
Stack: <0>c049b75c f8a3af78 c04468f8 d8fbbbcc c049b740 00000007 d8fbbc68 d30f4260
000000d2 d8fba000 d30f4240 d8fba000 00000028 00000004 00000000 00000004
00000000 00000fa0 00000fd0 f8a8e000 00000000 f8a38000 00000000 00000000
Call Trace:
[<c03cdbd0>] do_replace+0x16b/0x887
[<c03ced74>] copy_everything_to_user+0x21a/0x35c
[<c03ceef6>] do_ebt_set_ctl+0x40/0x42
[<c0354ee0>] nf_sockopt+0x11f/0x121
[<c0354f19>] nf_setsockopt+0x37/0x3b
[<c0360b14>] ip_setsockopt+0x3f9/0xb0e
[<c0354e6e>] nf_sockopt+0xad/0x121
[<c0354f54>] nf_getsockopt+0x37/0x3b
[<c03617e6>] ip_getsockopt+0x5bd/0x62b
[<c012360e>] current_fs_time+0x5d/0x78
[<c0178813>] touch_atime+0x7d/0xcd
[<c014b366>] zap_pte_range+0xf1/0x316
[<c014b68e>] unmap_page_range+0x103/0x174
[<c02228a7>] prio_tree_remove+0x77/0xe7
[<c014358c>] buffered_rmqueue+0x155/0x209
[<c014358c>] buffered_rmqueue+0x155/0x209
[<c014376e>] get_page_from_freelist+0x8c/0xa6
[<c014376e>] get_page_from_freelist+0x8c/0xa6
[<c01437de>] __alloc_pages+0x56/0x309
[<c015274c>] page_add_file_rmap+0x2a/0x2c
[<c014d48d>] do_anonymous_page+0x122/0x22a
[<c014dabd>] __handle_mm_fault+0x138/0x326
[<c03391e6>] sock_common_setsockopt+0x33/0x37
[<c0336c88>] sys_setsockopt+0x6c/0xb2
[<c033739a>] sys_socketcall+0x1f4/0x254
[<c01160e5>] do_page_fault+0x0/0x630
[<c0102c7f>] sysenter_past_esp+0x54/0x75
Code: 24 8b bc 24 8c 00 00 00 8b 84 24 88 00 00 00 8b 54 24 64 8b 74 24
44 03 77 24 8b 78 20 c7 44 24 38 00 00 00 00 89 54 24 3c 31 d2 <8b> 4e
6c 8b 5e 68 29 d9 89 4c 24 30 8b 06 85 c0 0f 84 14 02 00
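The two oopses above share a pattern that can be checked with a little arithmetic. This is a sketch under stated assumptions, not a root-cause analysis. In both dumps esi holds the pointer being dereferenced, and the marked bytes (`<8b> 56 68` and `<8b> 4e 6c`) decode as reads at 0x68(%esi) and 0x6c(%esi). Assuming 4 KiB pages (the i386 default), the snippet checks whether each read matches the reported fault address and whether it lands in the page after the one esi points into. The function name `crosses_page` is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the i386 default

def crosses_page(base, disp):
    """Return (address read, True if it lies in a later page than base)."""
    addr = base + disp
    return addr, addr // PAGE_SIZE > base // PAGE_SIZE

# (kernel, esi, displacement of the marked instruction, reported fault address)
oopses = [
    ("2.6.19.1", 0xF8CEBFA0, 0x68, 0xF8CEC008),
    ("2.6.16.36", 0xF8A3AFA0, 0x6C, 0xF8A3B00C),
]

results = []
for kernel, esi, disp, fault in oopses:
    addr, crossed = crosses_page(esi, disp)
    results.append((kernel, addr == fault, crossed))
    print(f"{kernel}: reads {addr:#x}, matches fault: {addr == fault}, "
          f"crosses page boundary: {crossed}")
```

Both reads match the fault addresses and both step over a page boundary. That is consistent with the code reading an entry that extends past the end of a mapped area, but it does not by itself say why the entry is there.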


It seems to happen when flushing a user-defined ebtable, or removing a
rule -- but not every time. It leaves the ebtables userspace process in D
state on 2.6.19.1 but not on 2.6.16.36 (?).
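For anyone wanting to confirm the D state (uninterruptible sleep) on their own machine, one option is to read the state field from /proc/<pid>/stat. The following is a minimal sketch; the function names are made up for illustration, and the sample line is hypothetical (it only reuses the pid from the oops above), not output captured from the affected systems.

```python
import os

def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]

def d_state_processes(proc="/proc"):
    """Yield (pid, comm) for processes in uninterruptible sleep (state D)."""
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm

# Hypothetical sample line, truncated after the first few fields.
sample = "4788 (ebtables) D 1 4788 4788 0 -1 4194560"
print(parse_stat(sample))  # (4788, 'ebtables', 'D')
```

Calling `list(d_state_processes())` on a live Linux system lists whatever is currently stuck in D state; `ps` reports the same information in its STAT column.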

Considering I've never had these problems before, and that both stable
(2.6.16.36) and current (2.6.19.1) exhibit this issue, I'd venture to
guess that it's something that went into both of them very recently.

-Chris

2006-12-25 01:09:10

by Christopher S. Aker

Subject: Re: ebtables problems on 2.6.19.1 *and* 2.6.16.36

Christopher S. Aker wrote:
> Patrick McHardy wrote:
>> I'm trying to reproduce this (without success so far). Please send your
>> kernel config and your ebtables script.
>>
>> You could check whether 2.6.19 works; there were some ebtables changes in
>> 2.6.19.1 that touched this code.
>
> We're hitting this too, on both 2.6.16.36 and 2.6.19.1.
>
> It seems to happen when flushing a user-defined ebtable, or removing a
> rule -- but not every time. It leaves the ebtables userspace process in D
> state on 2.6.19.1 but not on 2.6.16.36 (?).
>
> Considering I've never had these problems before, and that both stable
> (2.6.16.36) and current (2.6.19.1) exhibit this issue, I'd venture to
> guess that it's something that went into both of them very recently.

Just a follow-up -- this doesn't happen with 2.6.19.

-Chris