Hi list,
[16623.095403] BUG: unable to handle kernel paging request at 00000000010600d0
[16623.095445] IP: [<ffffffff81547767>] xfrm_selector_match+0x25/0x2f6
[16623.095480] PGD aeaea067 PUD 85d95067 PMD 0
[16623.095513] Oops: 0000 [#1] SMP
[16623.095543] Modules linked in: netconsole xt_nat xt_multiport veth ip_vs_rr
nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry iptable_mangle xt_mark
nf_conntrack_netlink nfnetlink ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_tcpudp iptable_filter ip_tables
cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace
ocfs2_stack_o2cb ocfs2_dlm bridge stp llc bonding fuse nf_conntrack_ftp 8021q
openvswitch gre vxlan xt_conntrack x_tables ocfs2_dlmfs dlm sctp ocfs2
ocfs2_nodemanager ocfs2_stackglue configfs rbd kvm_intel kvm coretemp ip_vs_ftp
ip_vs nf_nat nf_conntrack ctr twofish_generic twofish_x86_64 twofish_common
camellia_generic serpent_generic blowfish_generic blowfish_common cast5_generic
cast_common xcbc sha512_generic crypto_null af_key xfrm_algo psmouse serio_raw
i2c_i801 lpc_ich mfd_core evdev btrfs lzo_decompress lzo_compress
[16623.096062] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.33 #1
[16623.096091] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 1.1a 09/28/2011
[16623.096137] task: ffffffff81804450 ti: ffffffff817f4000 task.ti: ffffffff817f4000
[16623.096182] RIP: 0010:[<ffffffff81547767>] [<ffffffff81547767>] xfrm_selector_match+0x25/0x2f6
[16623.096233] RSP: 0018:ffff88083fc03900 EFLAGS: 00010246
[16623.096261] RAX: 0000000000000001 RBX: ffff88083fc03a20 RCX: ffff880787fb1200
[16623.096292] RDX: 0000000000000002 RSI: ffff88083fc03a20 RDI: 00000000010600a6
[16623.096323] RBP: 00000000010600a6 R08: 0000000000000000 R09: ffff88083fc039a0
[16623.096353] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88083fc03a20
[16623.096383] R13: 0000000000000001 R14: ffffffff818a9700 R15: ffffffffa01c73e0
[16623.096414] FS: 0000000000000000(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000
[16623.096469] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16623.096498] CR2: 00000000010600d0 CR3: 0000000085f0b000 CR4: 00000000000407f0
[16623.096528] Stack:
[16623.096550] 0000000000000000 0000000001060002 ffff880787fb1200 ffff88083fc03a20
[16623.096602] 0000000000000001 ffffffff81547a7c 0000000000000000 ffff8800baad5480
[16623.096655] ffffffff81804450 ffffffff818a9700 000000003c9041bc ffffffff81547ef7
[16623.096721] Call Trace:
[16623.096744] <IRQ>
[16623.096749] [<ffffffff81547a7c>] ? xfrm_sk_policy_lookup+0x44/0x9b
[16623.096802] [<ffffffff81547ef7>] ? xfrm_lookup+0x91/0x446
[16623.096832] [<ffffffff81541316>] ? ip_route_me_harder+0x150/0x1b0
[16623.096865] [<ffffffffa01b6457>] ? ip_vs_route_me_harder+0x86/0x91 [ip_vs]
[16623.096899] [<ffffffffa01b797a>] ? ip_vs_out+0x2d3/0x5bc [ip_vs]
[16623.096930] [<ffffffff81501420>] ? ip_rcv_finish+0x2b8/0x2b8
[16623.096959] [<ffffffff814fc2d3>] ? nf_iterate+0x42/0x80
[16623.096989] [<ffffffff814fc37a>] ? nf_hook_slow+0x69/0xff
[16623.097017] [<ffffffff81501420>] ? ip_rcv_finish+0x2b8/0x2b8
[16623.097047] [<ffffffff81501744>] ? ip_local_deliver+0x6f/0x7e
[16623.097078] [<ffffffff814d82a6>] ? __netif_receive_skb_core+0x5f0/0x66c
[16623.097108] [<ffffffff814d84b7>] ? process_backlog+0x13e/0x13e
[16623.097140] [<ffffffffa04b4e09>] ? br_handle_frame_finish+0x382/0x382 [bridge]
[16623.097187] [<ffffffff814d8503>] ? netif_receive_skb+0x4c/0x7d
[16623.097218] [<ffffffffa04b4d95>] ? br_handle_frame_finish+0x30e/0x382 [bridge]
[16623.097265] [<ffffffffa04b4fda>] ? br_handle_frame+0x1d1/0x217 [bridge]
[16623.097297] [<ffffffffa048e4aa>] ? bond_handle_frame+0x86/0x1bd [bonding]
[16623.097328] [<ffffffff814d8155>] ? __netif_receive_skb_core+0x49f/0x66c
[16623.097358] [<ffffffff814d8503>] ? netif_receive_skb+0x4c/0x7d
[16623.097388] [<ffffffff814d9091>] ? napi_gro_receive+0x35/0x76
[16623.097418] [<ffffffff813eae17>] ? e1000_clean_rx_irq+0x26d/0x2f5
[16623.097448] [<ffffffff813ef48e>] ? e1000e_poll+0x65/0x23d
[16623.097477] [<ffffffff814d8709>] ? net_rx_action+0xa2/0x1c0
[16623.097507] [<ffffffff8136895c>] ? credit_entropy_bits.part.7+0x127/0x168
[16623.097540] [<ffffffff8103cd9c>] ? __do_softirq+0xe8/0x201
[16623.097569] [<ffffffff815ae61c>] ? call_softirq+0x1c/0x30
[16623.097599] [<ffffffff810042ad>] ? do_softirq+0x2c/0x5f
[16623.097627] [<ffffffff8103cf7a>] ? irq_exit+0x3b/0x7f
[16623.097655] [<ffffffff81003d90>] ? do_IRQ+0x91/0xa8
[16623.097684] [<ffffffff815ac7ea>] ? common_interrupt+0x6a/0x6a
[16623.097713] <EOI>
[16623.097718] [<ffffffff8105549a>] ? enqueue_hrtimer+0x36/0x6d
[16623.097771] [<ffffffff814a904d>] ? cpuidle_enter_state+0x43/0xa6
[16623.097801] [<ffffffff814a9046>] ? cpuidle_enter_state+0x3c/0xa6
[16623.097831] [<ffffffff814a91a2>] ? cpuidle_idle_call+0xf2/0x19e
[16623.097862] [<ffffffff8100a2b8>] ? arch_cpu_idle+0x6/0x17
[16623.097892] [<ffffffff81071715>] ? cpu_startup_entry+0x119/0x1a8
[16623.097921] [<ffffffff818f6cf3>] ? start_kernel+0x3ca/0x3d5
[16623.097951] [<ffffffff818f673f>] ? repair_env_string+0x57/0x57
[16623.097979] Code: 5d 41 5e 41 5f c3 41 55 66 83 fa 02 41 54 55 48 89 fd 53 48
89 f3 41 50 74 11 31 c0 66 83 fa 0a 0f 85 ce 02 00 00 e9 fd 00 00 00 <0f> b6 47
2a 8b 17 8b 76 18 84 c0 74 1a b9 20 00 00 00 31 f2 29
[16623.098244] RIP [<ffffffff81547767>] xfrm_selector_match+0x25/0x2f6
[16623.098459] RSP <ffff88083fc03900>
[16623.098484] CR2: 00000000010600d0
[16623.098903] ---[ end trace 36545dfc8f7672ee ]---
[16623.098996] Kernel panic - not syncing: Fatal exception in interrupt
[16623.099085] Rebooting in 10 seconds..
[16633.158496] ACPI MEMORY or I/O RESET_REG.
This happens again and again with 3.12.33.
See also: http://www.spinics.net/lists/netdev/msg306283.html
Has this already been fixed?
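A quick sanity check of the register dump, assuming the x86-64 calling convention (first argument `sel` in RDI, `fl` in RSI, `family` in RDX) and the uapi layout of `struct xfrm_selector`: the bytes at the trapping IP, `0f b6 47 2a`, decode to `movzbl 0x2a(%rdi),%eax`, and CR2 is exactly RDI + 0x2a. The ctypes mirror below is my own, not kernel code; it shows that 0x2a is the offset of `prefixlen_d`, which suggests the selector pointer handed to xfrm_selector_match() (0x10600a6, not a kernel address) was already bogus. RDX = 2 is consistent with `family == AF_INET`.

```python
import ctypes


# Userspace mirror of xfrm_address_t / struct xfrm_selector, assumed to
# match include/uapi/linux/xfrm.h in 3.12 (16-byte address union).
class XfrmAddress(ctypes.Union):
    _fields_ = [("a4", ctypes.c_uint32),
                ("a6", ctypes.c_uint32 * 4)]


class XfrmSelector(ctypes.Structure):
    _fields_ = [("daddr", XfrmAddress),
                ("saddr", XfrmAddress),
                ("dport", ctypes.c_uint16),
                ("dport_mask", ctypes.c_uint16),
                ("sport", ctypes.c_uint16),
                ("sport_mask", ctypes.c_uint16),
                ("family", ctypes.c_uint16),
                ("prefixlen_d", ctypes.c_uint8),
                ("prefixlen_s", ctypes.c_uint8),
                ("proto", ctypes.c_uint8),
                ("ifindex", ctypes.c_int),
                ("user", ctypes.c_uint32)]


RDI = 0x00000000010600A6  # sel, as printed in the register dump
CR2 = 0x00000000010600D0  # address the CPU failed to read
DISP = 0x2A               # displacement in movzbl 0x2a(%rdi),%eax

field_offset = XfrmSelector.prefixlen_d.offset
print("offset of prefixlen_d:", hex(field_offset))  # 0x2a
print("RDI + disp == CR2:", RDI + DISP == CR2)       # True
```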
--
Kind regards,
Florian Wiessner
Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila
phone: +49 9282 9638 200
fax: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0.99 EUR/min*
http://www.smart-weblications.de
--
Registered office: Naila
Managing director: Florian Wiessner
Commercial register no.: HRB 3840, Amtsgericht (local court) Hof
*from German landlines; prices from mobile networks may differ
On Wed, Dec 03, 2014 at 03:55:30PM +0100, Smart Weblications GmbH - Florian Wiessner wrote:
> Hi list,
>
> [16623.095403] BUG: unable to handle kernel paging request at 00000000010600d0
> [16623.095445] IP: [<ffffffff81547767>] xfrm_selector_match+0x25/0x2f6
> [...]
> [16623.096721] Call Trace:
> [16623.096744] <IRQ>
> [16623.096749] [<ffffffff81547a7c>] ? xfrm_sk_policy_lookup+0x44/0x9b
> [16623.096802] [<ffffffff81547ef7>] ? xfrm_lookup+0x91/0x446
> [16623.096832] [<ffffffff81541316>] ? ip_route_me_harder+0x150/0x1b0
> [16623.096865] [<ffffffffa01b6457>] ? ip_vs_route_me_harder+0x86/0x91 [ip_vs]
> [16623.096899] [<ffffffffa01b797a>] ? ip_vs_out+0x2d3/0x5bc [ip_vs]
> [16623.096930] [<ffffffff81501420>] ? ip_rcv_finish+0x2b8/0x2b8
I really wonder why the xfrm_sk_policy_lookup codepath is taken here.
It looks like this is the processing of an inbound ipv4 packet that
is going to be rerouted to the output path by ipvs, so this packet
should not have socket context at all.
xfrm_sk_policy_lookup is called only if the packet has socket context
and the socket has an IPsec output policy configured. Do you use IPsec
socket policies?
>
> This happens again and again with 3.12.33
>
>
> see also: http://www.spinics.net/lists/netdev/msg306283.html
>
> is this already fixed somehow?
>
You mentioned in the thread above that it does not happen with
3.17.4, so it seems to have been fixed somewhere, but I have no idea
which change fixed it.
On 12/03/2014, 03:55 PM, Smart Weblications GmbH - Florian Wiessner wrote:
> [16623.095403] BUG: unable to handle kernel paging request at 00000000010600d0
> [16623.095445] IP: [<ffffffff81547767>] xfrm_selector_match+0x25/0x2f6
...
> This happens again and again with 3.12.33
>
>
> see also: http://www.spinics.net/lists/netdev/msg306283.html
>
> is this already fixed somehow?
Hi, I doubt it, as you are the first one letting me know. Could you
check whether 3.12.30 works OK for you?
thanks,
--
js
suse labs
Hi,
On 04.12.2014 08:56, Steffen Klassert wrote:
> On Wed, Dec 03, 2014 at 03:55:30PM +0100, Smart Weblications GmbH - Florian Wiessner wrote:
>> Hi list,
>>
>> [16623.095403] BUG: unable to handle kernel paging request at 00000000010600d0
>> [...]
>> [16623.096721] Call Trace:
>> [16623.096744] <IRQ>
>> [16623.096749] [<ffffffff81547a7c>] ? xfrm_sk_policy_lookup+0x44/0x9b
>> [16623.096802] [<ffffffff81547ef7>] ? xfrm_lookup+0x91/0x446
>> [16623.096832] [<ffffffff81541316>] ? ip_route_me_harder+0x150/0x1b0
>> [16623.096865] [<ffffffffa01b6457>] ? ip_vs_route_me_harder+0x86/0x91 [ip_vs]
>> [16623.096899] [<ffffffffa01b797a>] ? ip_vs_out+0x2d3/0x5bc [ip_vs]
>> [16623.096930] [<ffffffff81501420>] ? ip_rcv_finish+0x2b8/0x2b8
>
> I really wonder why the xfrm_sk_policy_lookup codepath is taken here.
> It looks like this is the processing of an inbound ipv4 packet that
> is going to be rerouted to the output path by ipvs, so this packet
> should not have socket context at all.
>
> xfrm_sk_policy_lookup is called just if the packet has socket context
> and the socket has an IPsec output policy configured. Do you use IPsec
> socket policies?
>
Yes, it is crazy. I do not know why this happens either, and I do not
have IPsec configured. Yesterday I tried with only
CONFIG_XFRM=y
CONFIG_XFRM_ALGO=m
and all other XFRM modules disabled: same problem.
I have now compiled a kernel without XFRM to check whether the problem is somewhere else.
I have seen that on this box (Debian squeeze) the racoon tool inserts xfrm
policies like these:
ip xfrm policy show
src ::/0 dst ::/0
dir 4 priority 0 ptype main
src ::/0 dst ::/0
dir 3 priority 0 ptype main
src ::/0 dst ::/0
dir 4 priority 0 ptype main
src ::/0 dst ::/0
dir 3 priority 0 ptype main
src ::/0 dst ::/0
...
I also tried without racoon running and with the IPsec userspace tools disabled,
but the problem persists.
Perhaps interesting: the longer a node runs and the more interfaces are
added to a bridge, the more policies accumulate. Here is an overview of other nodes,
which do not run IPVS:
Executing ip xfrm policy show | wc -l on node02
92
Executing ip xfrm policy show | wc -l on node03
92
Executing ip xfrm policy show | wc -l on node04
68
Executing ip xfrm policy show | wc -l on node05
104
Executing ip xfrm policy show | wc -l on node06
160
Executing uptime on node02
17:30:35 up 4 days, 22:56, 0 users, load average: 1,45, 1,36, 1,25
Executing uptime on node03
17:30:35 up 4 days, 22:48, 1 user, load average: 1,50, 1,18, 1,12
Executing uptime on node04
17:30:36 up 4 days, 22:41, 5 users, load average: 1,07, 0,86, 0,80
Executing uptime on node05
17:30:36 up 3 days, 3:24, 1 user, load average: 1.66, 1.73, 1.82
Executing uptime on node06
17:30:36 up 3 days, 3:15, 1 user, load average: 1.38, 1.26, 1.30
We have a bridge configured on all nodes, so it seems to me that when a device is
added to a bridge, xfrm policies are somehow created, but when the device is
removed from the bridge, the policies stay.
Executing brctl show on node02
bridge name bridge id STP enabled interfaces
br0 8000.00259052bbf6 no bond0
veth31gl4f
vethhVTC6u
vnet0
vnet10
vnet12
vnet2
vnet3
vnet4
vnet5
vnet7
vnet8
vnet9
Executing brctl show on node03
bridge name bridge id STP enabled interfaces
br0 8000.00259052bbee no bond0
vethb9trsN
vethlFKktL
vnet0
vnet1
vnet10
vnet2
vnet3
vnet4
vnet5
vnet6
vnet7
vnet8
virbr0 8000.000000000000 yes
Executing brctl show on node04
bridge name bridge id STP enabled interfaces
br0 8000.00259052bba8 no bond0
veth2z6JHJ
vethD7kF0Z
vethZ8UGHJ
vetho6hc1N
vethwnIRTH
virbr0 8000.000000000000 yes
Executing brctl show on node05
bridge name bridge id STP enabled interfaces
br0 8000.00199976d512 no bond0
vnet0
vnet1
vnet10
vnet11
vnet12
vnet14
vnet15
vnet2
vnet4
vnet5
vnet6
vnet7
vnet9
Executing brctl show on node06
bridge name bridge id STP enabled interfaces
br0 8000.00199976d560 no bond0
vnet0
vnet10
vnet12
vnet13
vnet14
vnet15
vnet16
vnet17
vnet18
vnet2
vnet20
vnet21
vnet23
vnet25
vnet26
vnet28
vnet29
vnet3
vnet30
vnet5
vnet6
vnet7
vnet8
vnet9
...
I noticed the racoon userspace daemon running wild as corosync moved configured
IP addresses to and from the node, so this could be related. Could it be that
the policies are never cleaned up?
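Steffen's question about socket policies may be answerable from this output. As far as I understand the kernel's xfrm code, per-socket policies are linked at index XFRM_POLICY_MAX + dir (XFRM_POLICY_MAX is 3), so `ip xfrm policy show` prints them as `dir 3` (socket, in) and `dir 4` (socket, out), while system-wide policies appear as `dir in`, `dir out` or `dir fwd`. If that reading is right, the accumulating entries are socket policies, plausibly installed by racoon on its own sockets (an assumption). A minimal sketch for tallying directions in a saved dump; the labels are my own naming, not something iproute2 prints:

```python
import re
from collections import Counter

# Assumed mapping: directions 0-2 are printed by name; socket policies
# appear as XFRM_POLICY_MAX (3) + dir, i.e. "dir 3" (in) and "dir 4" (out).
DIR_LABELS = {
    "in": "global in",
    "out": "global out",
    "fwd": "global fwd",
    "3": "socket in",
    "4": "socket out",
}

DIR_RE = re.compile(r"^\s*dir (\S+)")


def tally_dirs(dump: str) -> Counter:
    """Count policies per direction in `ip xfrm policy show` output."""
    counts = Counter()
    for line in dump.splitlines():
        match = DIR_RE.match(line)
        if match:
            raw = match.group(1)
            counts[DIR_LABELS.get(raw, "dir " + raw)] += 1
    return counts


# The first entries from the output above.
sample = """\
src ::/0 dst ::/0
dir 4 priority 0 ptype main
src ::/0 dst ::/0
dir 3 priority 0 ptype main
src ::/0 dst ::/0
dir 4 priority 0 ptype main
src ::/0 dst ::/0
dir 3 priority 0 ptype main
"""

print(tally_dirs(sample))
```

On a live node, the text of `ip xfrm policy show` can be passed in directly; comparing the socket counts across uptime would show whether only socket policies grow.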
Hi Jiri,
On 04.12.2014 10:44, Jiri Slaby wrote:
> On 12/03/2014, 03:55 PM, Smart Weblications GmbH - Florian Wiessner wrote:
>> [16623.095403] BUG: unable to handle kernel paging request at 00000000010600d0
[...]
>> [16623.096721] Call Trace:
[...]
>> This happens again and again with 3.12.33
>>
>>
>> see also: http://www.spinics.net/lists/netdev/msg306283.html
>>
>> is this already fixed somehow?
>
> Hi, I doubt so, as you are the first one letting me know. Could you
> check whether 3.12.30 works OK for you?
>
I can try that, but it will take some time, as I am currently trying a kernel
without xfrm support to calm down my users.
Could you please look at my other mail before I downgrade the kernel?
Hello,
On Thu, 4 Dec 2014, Steffen Klassert wrote:
> > [16623.096721] Call Trace:
> > [16623.096744] <IRQ>
> > [16623.096749] [<ffffffff81547a7c>] ? xfrm_sk_policy_lookup+0x44/0x9b
> > [16623.096802] [<ffffffff81547ef7>] ? xfrm_lookup+0x91/0x446
> > [16623.096832] [<ffffffff81541316>] ? ip_route_me_harder+0x150/0x1b0
> > [16623.096865] [<ffffffffa01b6457>] ? ip_vs_route_me_harder+0x86/0x91 [ip_vs]
> > [16623.096899] [<ffffffffa01b797a>] ? ip_vs_out+0x2d3/0x5bc [ip_vs]
> > [16623.096930] [<ffffffff81501420>] ? ip_rcv_finish+0x2b8/0x2b8
>
> I really wonder why the xfrm_sk_policy_lookup codepath is taken here.
> It looks like this is the processing of an inbound ipv4 packet that
> is going to be rerouted to the output path by ipvs, so this packet
> should not have socket context at all.
The above trace looks like IPVS-NAT is used between a
local client and some real server. IPVS handles this skb
at LOCAL_IN and calls ip_vs_route_me_harder(). If we have
skb->sk at LOCAL_IN, my first thought is early demux.
If I remember correctly, commit f5a41847acc535e2
("ipvs: move ip_route_me_harder for ICMP"), which introduced
this rerouting (2.6.37), needed it because at that time TCP
used rt_src from the received skb to select daddr in ip_send_reply().
As packets to the server are DNAT-ed and packets to the client are
SNAT-ed, we used rerouting to fill rt_src with the correct IP
after SNAT.
Now that the routing cache has been removed (3.6) and
tcp_v4_send_reset() has been changed to use ip_hdr(skb)->saddr
instead of rt_src, it should be safe to remove this rerouting:
it is enough that ip_hdr(skb)->saddr is updated by IPVS-SNAT at
LOCAL_IN. In fact, rt_src was already removed in 3.0 with
commit 0a5ebb8000c5362 ("ipv4: Pass explicit daddr arg to
ip_send_reply().").
This only explains the above stack. I am not sure whether
the problem is related to early demux, but commits like
this one look interesting:
- commit 6b8dbcf2c44fd7a ("bridge: netfilter: orphan skb before invoking
ip netfilter hooks")
Also, it would be good to know which 3.x kernel between
3.13 and 3.17 fixes the problem; that would narrow the search.
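The narrowing asked for here is a binary search over releases: 3.12.33 is known bad and 3.17.4 was reported good, and each probe means building and booting one kernel and trying to reproduce, so halving the range keeps the number of reboots small. A minimal sketch, assuming the releases are ordered oldest to newest and that once the problem is fixed it stays fixed; `crashes` is a hypothetical stand-in for a manual test run. Within a single release range, `git bisect` does the same over individual commits.

```python
from typing import Callable, Optional, Sequence


def first_fixed(versions: Sequence[str],
                crashes: Callable[[str], bool]) -> Optional[str]:
    """Return the oldest version for which crashes() is False.

    Assumes `versions` is ordered oldest to newest and that the results
    look like [True, ..., True, False, ..., False]. Returns None if every
    version still crashes.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(versions[mid]):
            lo = mid + 1  # still broken: the fix is newer
        else:
            hi = mid      # works: the fix is here or older
    return versions[lo] if lo < len(versions) else None


# Hypothetical outcome, for illustration only: pretend 3.13 and 3.14 crash.
candidates = ["3.13", "3.14", "3.15", "3.16", "3.17"]
print(first_fixed(candidates, lambda v: v in {"3.13", "3.14"}))  # 3.15
```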
Regards
--
Julian Anastasov <[email protected]>
Hi,
On 05.12.2014 00:15, Julian Anastasov wrote:
> [...]
I tried 3.12.33 without any XFRM and now got this one (which is reproducible):
[ 233.956012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[ 233.956218] IP: [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 233.956371] PGD 0
[ 233.956493] Oops: 0000 [#1] SMP
[ 233.956680] Modules linked in: netconsole xt_nat xt_multiport veth iptable_mangle
xt_mark nf_conntrack_netlink nfnetlink ip_vs_rr ipt_MASQUERADE iptable_nat
nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_tcpudp iptable_filter
ip_tables cpufreq_ondemand cpufreq_powersave cpufreq_conservative
cpufreq_userspace ocfs2_stack_o2cb ocfs2_dlm bridge stp llc bonding fuse
nf_conntrack_ftp 8021q openvswitch gre vxlan xt_conntrack x_tables ocfs2_dlmfs
dlm sctp ocfs2 ocfs2_nodemanager ocfs2_stackglue configfs rbd kvm_intel kvm
coretemp ip_vs_ftp ip_vs nf_nat nf_conntrack psmouse i2c_i801 serio_raw lpc_ich
mfd_core evdev btrfs lzo_decompress lzo_compress
[ 233.960221] CPU: 2 PID: 29996 Comm: vsftpd Not tainted 3.12.33 #4
[ 233.960298] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 1.1a 09/28/2011
[ 233.960395] task: ffff88075e87a2c0 ti: ffff8806a7444000 task.ti: ffff8806a7444000
[ 233.960486] RIP: 0010:[<ffffffffa013a470>] [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 233.960632] RSP: 0018:ffff88083fc83998 EFLAGS: 00010206
[ 233.960709] RAX: 000000000000000c RBX: ffff8806cab452cc RCX: 0000000000000003
[ 233.960791] RDX: 0000000000000029 RSI: 0000000000000003 RDI: ffff8806cab452cc
[ 233.960875] RBP: 00000000ee38035a R08: ffff8807e2b1edc0 R09: ffff88083fc839a8
[ 233.960957] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[ 233.961041] R13: 0000000000000000 R14: 0000000000000003 R15: ffff8806a75a50bc
[ 233.961124] FS: 00007ff22daec700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
[ 233.961226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 233.961303] CR2: 0000000000000014 CR3: 00000006b3259000 CR4: 00000000000407e0
[ 233.961384] Stack:
[ 233.961460] ffff880815612b60 0000000000000012 0000000000000014 ffff8806cab452c8
[ 233.961776] ffff8806a75a5001 ffffffffa014f681 0000000000000000 ffffffff00000045
[ 233.962095] ffff880800000048 0000001b00000003 ffff88083fc83a70 ffff880815612b60
[ 233.962411] Call Trace:
[ 233.962482] <IRQ>
[ 233.962538] [<ffffffffa014f681>] ? __nf_nat_mangle_tcp_packet+0x109/0x120 [nf_nat]
[ 233.962762] [<ffffffffa017749e>] ? ip_vs_ftp_out.part.8+0x2b2/0x338 [ip_vs_ftp]
[ 233.962866] [<ffffffff814cb8c0>] ? __domain_mapping+0x25d/0x2a3
[ 233.962949] [<ffffffff8154140c>] ? fib_table_lookup+0xe4/0x255
[ 233.963032] [<ffffffffa015f858>] ? ip_vs_app_pkt_out+0x105/0x18b [ip_vs]
[ 233.963110] [<ffffffffa0162ffc>] ? tcp_snat_handler+0x6b/0x320 [ip_vs]
[ 233.963189] [<ffffffffa0155d3d>] ? ip_vs_conn_out_get_proto+0x1c/0x25 [ip_vs]
[ 233.963284] [<ffffffffa0158937>] ? ip_vs_out+0x290/0x5bc [ip_vs]
[ 233.963362] [<ffffffff8150f544>] ? ip_frag_mem+0x2a/0x2a
[ 233.963442] [<ffffffff81508e1f>] ? nf_iterate+0x42/0x80
[ 233.963519] [<ffffffff81508ec6>] ? nf_hook_slow+0x69/0xff
[ 233.963595] [<ffffffff8150f544>] ? ip_frag_mem+0x2a/0x2a
[ 233.963667] [<ffffffff8150f8ae>] ? ip_forward+0x22d/0x2cf
[ 233.963744] [<ffffffff814e57ce>] ? __netif_receive_skb_core+0x5f0/0x66c
[ 233.963826] [<ffffffff814e59df>] ? process_backlog+0x13e/0x13e
[ 233.963911] [<ffffffffa0455e09>] ? br_handle_frame_finish+0x382/0x382 [bridge]
[ 233.964008] [<ffffffff814e5a2b>] ? netif_receive_skb+0x4c/0x7d
[ 233.964090] [<ffffffffa0455d95>] ? br_handle_frame_finish+0x30e/0x382 [bridge]
[ 233.964186] [<ffffffffa0455fda>] ? br_handle_frame+0x1d1/0x217 [bridge]
[ 233.964267] [<ffffffff814e567d>] ? __netif_receive_skb_core+0x49f/0x66c
[ 233.964350] [<ffffffff814e592b>] ? process_backlog+0x8a/0x13e
[ 233.964429] [<ffffffff814e5c31>] ? net_rx_action+0xa2/0x1c0
[ 233.964508] [<ffffffff81047e2e>] ? __do_softirq+0xf6/0x24f
[ 233.964588] [<ffffffff8106cbfd>] ? account_system_time+0x10f/0x169
[ 233.964669] [<ffffffff815ad7dc>] ? call_softirq+0x1c/0x30
[ 233.964743] <EOI>
[ 233.964801] [<ffffffff8100464d>] ? do_softirq+0x2c/0x5f
[ 233.965013] [<ffffffff81047ca1>] ? local_bh_enable+0x67/0x85
[ 233.965088] [<ffffffff81511689>] ? ip_finish_output+0x2c9/0x322
[ 233.965165] [<ffffffff8151240a>] ? ip_queue_xmit+0x2b7/0x2f0
[ 233.965239] [<ffffffff81524772>] ? tcp_transmit_skb+0x6ef/0x755
[ 233.965316] [<ffffffff815250e8>] ? tcp_write_xmit+0x886/0x9cb
[ 233.965391] [<ffffffff8152527a>] ? __tcp_push_pending_frames+0x24/0x7e
[ 233.965473] [<ffffffff8151a33c>] ? tcp_sendmsg+0xa4c/0xbfc
[ 233.965550] [<ffffffff814d3477>] ? sock_aio_write+0xe3/0xfd
[ 233.965631] [<ffffffff81122f4d>] ? do_sync_write+0x59/0x79
[ 233.965709] [<ffffffff811239e3>] ? vfs_write+0xc4/0x182
[ 233.965786] [<ffffffff81123daf>] ? SyS_write+0x45/0x7c
[ 233.965864] [<ffffffff815ac35b>] ? tracesys+0xdd/0xe2
[ 233.965940] Code: 68 14 4d 01 c5 45 85 e4 74 46 f0 80 4f 78 40 48 8d 5f 04 48
89 df e8 00 12 47 e1 31 c0 41 83 fe 02 0f 97 c0 48 6b c0 0c 4c 01 e8 <8b> 70 08
39 70 04 74 08 89 ea 0f ca 39 10 79 0d 89 70 04 44 01
[ 233.969602] RIP [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 233.969746] RSP <ffff88083fc83998>
[ 233.969816] CR2: 0000000000000014
[ 233.969919] ---[ end trace c6faf7aa989b11c2 ]---
[ 233.969999] Kernel panic - not syncing: Fatal exception in interrupt
[ 233.970081] Rebooting in 10 seconds..
[ 244.029931] ACPI MEMORY or I/O RESET_REG.
node01:/ocfs2/usr/src/linux-3.12.33/scripts# ./decodecode < /tmp/oops-ipvsftp.txt
[ 233.965940] Code: 68 14 4d 01 c5 45 85 e4 74 46 f0 80 4f 78 40 48 8d 5f 04 48
89 df e8 00 12 47 e1 31 c0 41 83 fe 02 0f 97 c0 48 6b c0 0c 4c 01 e8 <8b> 70 08
39 70 04 74 08 89 ea 0f ca 39 10 79 0d 89 70 04 44 01
All code
========
0: 68 14 4d 01 c5 pushq $0xffffffffc5014d14
5: 45 85 e4 test %r12d,%r12d
8: 74 46 je 0x50
a: f0 80 4f 78 40 lock orb $0x40,0x78(%rdi)
f: 48 8d 5f 04 lea 0x4(%rdi),%rbx
13: 48 89 df mov %rbx,%rdi
16: e8 00 12 47 e1 callq 0xffffffffe147121b
1b: 31 c0 xor %eax,%eax
1d: 41 83 fe 02 cmp $0x2,%r14d
21: 0f 97 c0 seta %al
24: 48 6b c0 0c imul $0xc,%rax,%rax
28: 4c 01 e8 add %r13,%rax
2b:* 8b 70 08 mov 0x8(%rax),%esi <-- trapping instruction
2e: 39 70 04 cmp %esi,0x4(%rax)
31: 74 08 je 0x3b
33: 89 ea mov %ebp,%edx
35: 0f ca bswap %edx
37: 39 10 cmp %edx,(%rax)
39: 79 0d jns 0x48
3b: 89 70 04 mov %esi,0x4(%rax)
3e: 44 rex.R
3f: 01 .byte 0x1
Code starting with the faulting instruction
===========================================
0: 8b 70 08 mov 0x8(%rax),%esi
3: 39 70 04 cmp %esi,0x4(%rax)
6: 74 08 je 0x10
8: 89 ea mov %ebp,%edx
a: 0f ca bswap %edx
c: 39 10 cmp %edx,(%rax)
e: 79 0d jns 0x1d
10: 89 70 04 mov %esi,0x4(%rax)
13: 44 rex.R
14: 01 .byte 0x1
The setup is like this:
#virtual=<myVIP>:21
# real=10.10.1.20:21 masq
# real=10.10.1.21:21 masq
# real=10.10.1.22:21 masq
# real=10.10.1.23:21 masq
# persistent=600
# service=ftp
# scheduler=rr
# protocol=tcp
# checktype=connect
(I commented it out to prevent further crashes.)
When ip_vs_ftp is loaded and someone tries to make an FTP connection, the system
panics instantly.
10.10.1.20 - 10.10.1.23 are LXC containers, connected via veth to the bridge,
running on 4 different nodes. The node running ldirectord/ipvsadm also runs one of
those containers (I don't know if that matters).
brctl show
bridge name bridge id STP enabled interfaces
br0 8000.00259052bbf4 no bond0
vethMKELUc
vethXdWGqf
vethgJMmEb
vethmKNqFc
I disabled the FTP server LXC container on the node doing ip_vs, so that the
endpoint of the connection is not on the same node, and tried again, but with the
same result.
Unfortunately I cannot test with kernels newer than 3.12, because ocfs2 is
somehow broken in >= 3.14.
--
Kind regards,
Florian Wiessner
Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila
fon.: +49 9282 9638 200
fax.: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0,99 EUR/Min*
http://www.smart-weblications.de
--
Registered office: Naila
Managing Director: Florian Wiessner
Commercial register no.: HRB 3840, Amtsgericht Hof
*from German landlines; prices from mobile networks may differ
Hi,
On 05.12.2014 10:55, Julian Anastasov wrote:
>
> On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
>
>> I tried with 3.12.33 without any XFRM and now got this one (which is reproducible):
>>
>> [ 233.956012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
>> [ 233.956218] IP: [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack
>
> It seems the fix from 3.13 was not sent to 3.12 stable:
>
> commit b25adce1606427fd8 ("ipvs: correct usage/allocation of seqadj ext in ipvs")
>
> There was a related change, but it is not needed
> for stable kernels:
>
> commit db12cf27435356017e ("netfilter: WARN about wrong usage of sequence number adjustments")
>
> Simon, can we try commit b25adce1606427fd8 for 3.12?
>> The setup is like this:
>>
>>
>> #virtual=<myVIP>:21
>> # real=10.10.1.20:21 masq
[...]
>> # service=ftp
>> # scheduler=rr
>> # protocol=tcp
>> # checktype=connect
>>
>> (I commented it out to prevent further crashes.)
>>
>> When ip_vs_ftp is loaded and someone tries to make an FTP connection, the system
>> panics instantly.
>>
>> 10.10.1.20 - 10.10.1.23 are LXC containers, connected via veth to the bridge,
>> running on 4 different nodes. The node running ldirectord/ipvsadm also runs one of
>> those containers (I don't know if that matters).
>
> It is always good to know the setup. Do you access the VIP
> from local clients (from the director)?
>
Not for FTP, but we also have mail in the same setup, and yes, there we do
access it from a local client.
>> brctl show
>> bridge name bridge id STP enabled interfaces
>> br0 8000.00259052bbf4 no bond0
>> vethMKELUc
[...]
> Before I create a patch to avoid rerouting for
> LOCAL_IN, you can try setting the IPVS sysctl var "snat_reroute" to 0,
> or even change the ip_vs_route_me_harder() function to just return 0.
> snat_reroute=1 (the default value) is needed if you have
> multiple links to clients and use ip rules to select the
> correct route by source IP (after SNAT). If you have a single
> uplink, snat_reroute can be 0.
>
ip rule show
0: from all lookup local
32765: from all to 10.10.0.0/16 lookup 200
I use ip rules, but they match on destination, not source. I need this so that
clients from the local net can connect to some VIPs and get their correct
route back.
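The rule "32765: from all to 10.10.0.0/16 lookup 200" selects routing table 200 by destination prefix alone. A minimal C sketch of the prefix test that such a "to PREFIX/LEN" selector performs is below; it is a simplification for illustration, not the kernel's FIB rule code, and in_prefix() is a hypothetical name. Because the rule matches "from all", its outcome does not depend on the source address, which is the address IPVS rewrites in the snat_reroute case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if IPv4 address 'addr' lies inside 'prefix'/'len'.
 * Both addresses are in host byte order. This models the "to PREFIX/LEN"
 * selector of an ip rule; it is not the kernel's implementation. */
static bool in_prefix(uint32_t addr, uint32_t prefix, unsigned int len)
{
	assert(len <= 32);
	/* Shifting a 32-bit value by 32 is undefined, so /0 is special-cased. */
	uint32_t mask = (len == 0) ? 0 : ~(uint32_t)0 << (32 - len);
	return (addr & mask) == (prefix & mask);
}
```

For example, in_prefix(0x0A0A0117, 0x0A0A0000, 16) (10.10.1.23 against 10.10.0.0/16) is true, while 192.168.10.62 (0xC0A80A3E), the address the FTP client connects to later in the thread, does not match.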
I had also seen "b25adce1606427fd8 ipvs: correct usage/allocation of seqadj ext
in ipvs" online while googling, but I thought it would be included in
3.12.33, as the patch is over a year old, and since this kernel is marked as
stable I did not expect any issues.
Maybe I would not have stumbled across this if the ocfs2 devs were as fast as
the netdev devs! I still have no reply to my ocfs2 issue/bug to this day. So
thank you for the fast responses! I would like to test any patch for 3.12.
If I understand correctly, I set:
echo 0 > /proc/sys/net/ipv4/vs/snat_reroute
modprobe ip_vs_ftp
and re-enable the FTP IPVS service?
It does not crash, but FTP works with neither PASV nor PORT:
[14:47:42] [R] Connecting to 192.168.10.62 -> IP=192.168.10.62 PORT=21
[14:47:42] [R] Connected to 192.168.10.62
[14:47:43] [R] 220 (vsFTPd 3.0.2)
[14:47:43] [R] USER (hidden)
[14:47:43] [R] 331 Please specify the password.
[14:47:43] [R] PASS (hidden)
[14:47:43] [R] 230 Login successful.
[14:47:43] [R] SYST
[14:47:43] [R] 215 UNIX Type: L8
[14:47:43] [R] FEAT
[14:47:43] [R] 211-Features:
[14:47:43] [R] EPRT
[14:47:43] [R] EPSV
[14:47:43] [R] MDTM
[14:47:43] [R] PASV
[14:47:43] [R] REST STREAM
[14:47:43] [R] SIZE
[14:47:43] [R] TVFS
[14:47:43] [R] UTF8
[14:47:43] [R] 211 End
[14:47:43] [R] PWD
[14:47:43] [R] 257 "/"
[14:47:43] [R] CWD /
[14:47:43] [R] 250 Directory successfully changed.
[14:47:43] [R] PWD
[14:47:43] [R] 257 "/"
[14:47:43] [R] TYPE A
[14:47:43] [R] 200 Switching to ASCII mode.
[14:47:43] [R] PASV
[14:47:43] [R] 227 Entering Passive Mode (10,10,1,23,251,6).
[14:47:43] [R] Opening data channel IP: 192.168.10.62 PORT: 64262
[14:47:44] [R] Data socket error: Connection refused
[14:47:44] [R] List error
[14:47:44] [R] PASV
[14:47:44] [R] 227 Entering Passive Mode (10,10,1,23,250,144).
[14:47:44] [R] Opening data channel IP: 192.168.10.62 PORT: 64144
[14:47:45] [R] Data socket error: Connection refused
[14:47:45] [R] List error
[14:47:45] [R] PASV mode failed, trying PORT mode...
[14:47:45] [R] Waiting on PORT: 62505, expecting connection.
[14:47:45] [R] PORT 192,168,200,13,244,41
[14:47:45] [R] 500 Illegal PORT command.
[14:47:45] [R] List error
[14:48:14] [R] QUIT
[14:48:14] [R] 221 Goodbye.
[14:48:14] [R] Logged out: 192.168.10.62
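Both the 227 replies and the PORT command encode an address and a port as six decimal numbers h1,h2,h3,h4,p1,p2 (RFC 959), with port = p1 * 256 + p2. The small C sketch below decodes them; ftp_hostport_parse() is a hypothetical helper, not part of any FTP client. It shows that the client's port numbers are consistent with the replies: (10,10,1,23,251,6) is 10.10.1.23 port 64262, (10,10,1,23,250,144) is port 64144, and 192,168,200,13,244,41 is 192.168.200.13 port 62505. It also makes visible that the 227 replies advertise the real server address 10.10.1.23, while the client opens its data connection to 192.168.10.62.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* An FTP host-port tuple "h1,h2,h3,h4,p1,p2" decoded into an IPv4
 * address (four octets) and a TCP port. */
struct ftp_hostport {
	unsigned int ip[4];
	unsigned int port;
};

/* Parse the tuple from a 227 PASV reply or a PORT command argument.
 * If the text contains '(', parsing starts after it (227 replies);
 * otherwise it starts at the first digit (PORT arguments).
 * Returns 0 on success, -1 if the six numbers are missing or out of range. */
static int ftp_hostport_parse(const char *s, struct ftp_hostport *out)
{
	unsigned int v[6];
	const char *p;

	assert(s != NULL && out != NULL);
	p = strchr(s, '(');
	p = p ? p + 1 : s + strcspn(s, "0123456789");
	if (sscanf(p, "%u,%u,%u,%u,%u,%u",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]) != 6)
		return -1;
	for (int i = 0; i < 6; i++)
		if (v[i] > 255)
			return -1;
	for (int i = 0; i < 4; i++)
		out->ip[i] = v[i];
	out->port = v[4] * 256 + v[5];   /* p1 is the high byte, p2 the low byte */
	return 0;
}
```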
--
Kind regards,
Florian Wiessner
On Fri, Dec 05, 2014 at 01:15:51AM +0200, Julian Anastasov wrote:
>
> Hello,
>
> On Thu, 4 Dec 2014, Steffen Klassert wrote:
>
> > > [16623.096721] Call Trace:
> > > [16623.096744] <IRQ>
> > > [16623.096749] [<ffffffff81547a7c>] ? xfrm_sk_policy_lookup+0x44/0x9b
> > > [16623.096802] [<ffffffff81547ef7>] ? xfrm_lookup+0x91/0x446
> > > [16623.096832] [<ffffffff81541316>] ? ip_route_me_harder+0x150/0x1b0
> > > [16623.096865] [<ffffffffa01b6457>] ? ip_vs_route_me_harder+0x86/0x91 [ip_vs]
> > > [16623.096899] [<ffffffffa01b797a>] ? ip_vs_out+0x2d3/0x5bc [ip_vs]
> > > [16623.096930] [<ffffffff81501420>] ? ip_rcv_finish+0x2b8/0x2b8
> >
> > I really wonder why the xfrm_sk_policy_lookup codepath is taken here.
> > It looks like this is the processing of an inbound ipv4 packet that
> > is going to be rerouted to the output path by ipvs, so this packet
> > should not have socket context at all.
>
> In the above trace it looks like IPVS-NAT is used between a
> local client and some real server. IPVS handles this skb
> at LOCAL_IN and calls ip_vs_route_me_harder(). If we have
> skb->sk at LOCAL_IN, my first thought is early demux.
Yes, that's possible. This can be checked by disabling early demux:
echo 0 > /proc/sys/net/ipv4/ip_early_demux
If I look at what it tries to dereference when the crash happens,
it does not look like a pointer. But sk->sk_policy[dir]
should be either a pointer to kernel memory or NULL. So I
think that the skb->sk pointer is already bogus.
On Thu, Dec 04, 2014 at 05:36:27PM +0100, Smart Weblications GmbH - Florian Wiessner wrote:
> Hi,
>
> On 04.12.2014 08:56, Steffen Klassert wrote:
> >
> > I really wonder why the xfrm_sk_policy_lookup codepath is taken here.
> > It looks like this is the processing of an inbound ipv4 packet that
> > is going to be rerouted to the output path by ipvs, so this packet
> > should not have socket context at all.
> >
> > xfrm_sk_policy_lookup is called only if the packet has socket context
> > and the socket has an IPsec output policy configured. Do you use IPsec
> > socket policies?
> >
>
> Yes, it is insane. I do not know why this happens and I wonder as well; I do not
> have IPsec configured. I tried yesterday with only
>
> CONFIG_XFRM=y
> CONFIG_XFRM_ALGO=m
>
> and all other XFRM modules disabled, same problem.
>
> I have now compiled a kernel without XFRM to check whether the problem is somewhere else.
>
> I have seen that on this box (Debian squeeze) the racoon tool inserts xfrm
> policies like this:
>
> ip xfrm policy show
> src ::/0 dst ::/0
> dir 4 priority 0 ptype main
> src ::/0 dst ::/0
> dir 3 priority 0 ptype main
> src ::/0 dst ::/0
> dir 4 priority 0 ptype main
> src ::/0 dst ::/0
> dir 3 priority 0 ptype main
> src ::/0 dst ::/0
> ...
Well, these are socket policies. The IKE daemon uses them
for SA negotiation.
>
> I tried without racoon running and with the IPsec userspace tools disabled, but the
> problem still exists.
Does this mean that it still happens if you have no IPsec policies
in the system?
>
> What may be interesting is that the longer the node runs and the more interfaces
> are added to a bridge, the more policies accumulate. Here is an overview of other
> nodes, but without ipvs running:
It would be interesting to see them.
Hello,
Adding Simon to CC...
On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> I tried with 3.12.33 without any XFRM and now got this one (which is reproducible):
>
> [ 233.956012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
> [ 233.956218] IP: [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack
It seems the fix from 3.13 was not sent to 3.12 stable:
commit b25adce1606427fd8 ("ipvs: correct usage/allocation of seqadj ext in ipvs")
There was a related change, but it is not needed
for stable kernels:
commit db12cf27435356017e ("netfilter: WARN about wrong usage of sequence number adjustments")
Simon, can we try commit b25adce1606427fd8 for 3.12?
> The setup is like this:
>
>
> #virtual=<myVIP>:21
> # real=10.10.1.20:21 masq
> # real=10.10.1.21:21 masq
> # real=10.10.1.22:21 masq
> # real=10.10.1.23:21 masq
> # persistent=600
> # service=ftp
> # scheduler=rr
> # protocol=tcp
> # checktype=connect
>
> (I commented it out to prevent further crashes.)
>
> When ip_vs_ftp is loaded and someone tries to make an FTP connection, the system
> panics instantly.
>
> 10.10.1.20 - 10.10.1.23 are LXC containers, connected via veth to the bridge,
> running on 4 different nodes. The node running ldirectord/ipvsadm also runs one of
> those containers (I don't know if that matters).
It is always good to know the setup. Do you access the VIP
from local clients (from the director)?
> brctl show
> bridge name bridge id STP enabled interfaces
> br0 8000.00259052bbf4 no bond0
> vethMKELUc
> vethXdWGqf
> vethgJMmEb
> vethmKNqFc
>
>
> I disabled the FTP server LXC container on the node doing ip_vs, so that the
> endpoint of the connection is not on the same node, and tried again, but with the
> same result.
>
> Unfortunately I cannot test with kernels newer than 3.12, because ocfs2 is
> somehow broken in >= 3.14.
Before I create a patch to avoid rerouting for
LOCAL_IN, you can try setting the IPVS sysctl var "snat_reroute" to 0,
or even change the ip_vs_route_me_harder() function to just return 0.
snat_reroute=1 (the default value) is needed if you have
multiple links to clients and use ip rules to select the
correct route by source IP (after SNAT). If you have a single
uplink, snat_reroute can be 0.
Regards
--
Julian Anastasov <[email protected]>
Hello,
On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> thank you for the fast responses! I would like to test any patch for 3.12.
I hope I'll have time this weekend...
> If I understand correctly, I set:
>
> echo 0 > /proc/sys/net/ipv4/vs/snat_reroute
The flag works per packet; there is no need to reload any modules.
But it does not help in the case of a local client, where
the problem with sockets occurs. That is why you can keep
ip_vs_route_me_harder() empty (return 0) until a patch is
created.
> modprobe ip_vs_ftp
>
> and re-enable the FTP IPVS service?
>
> It does not crash, but FTP works with neither PASV nor PORT:
>
>
> [14:47:42] [R] Connecting to 192.168.10.62 -> IP=192.168.10.62 PORT=21
> [14:47:42] [R] Connected to 192.168.10.62
> [14:47:43] [R] 220 (vsFTPd 3.0.2)
> [14:47:43] [R] USER (hidden)
> [14:47:43] [R] 331 Please specify the password.
> [14:47:43] [R] PASS (hidden)
> [14:47:43] [R] 230 Login successful.
> [14:47:43] [R] SYST
> [14:47:43] [R] 215 UNIX Type: L8
> [14:47:43] [R] FEAT
> [14:47:43] [R] 211-Features:
> [14:47:43] [R] EPRT
> [14:47:43] [R] EPSV
> [14:47:43] [R] MDTM
> [14:47:43] [R] PASV
> [14:47:43] [R] REST STREAM
> [14:47:43] [R] SIZE
> [14:47:43] [R] TVFS
> [14:47:43] [R] UTF8
> [14:47:43] [R] 211 End
> [14:47:43] [R] PWD
> [14:47:43] [R] 257 "/"
> [14:47:43] [R] CWD /
> [14:47:43] [R] 250 Directory successfully changed.
> [14:47:43] [R] PWD
> [14:47:43] [R] 257 "/"
> [14:47:43] [R] TYPE A
> [14:47:43] [R] 200 Switching to ASCII mode.
> [14:47:43] [R] PASV
> [14:47:43] [R] 227 Entering Passive Mode (10,10,1,23,251,6).
> [14:47:43] [R] Opening data channel IP: 192.168.10.62 PORT: 64262
> [14:47:44] [R] Data socket error: Connection refused
> [14:47:44] [R] List error
> [14:47:44] [R] PASV
> [14:47:44] [R] 227 Entering Passive Mode (10,10,1,23,250,144).
> [14:47:44] [R] Opening data channel IP: 192.168.10.62 PORT: 64144
> [14:47:45] [R] Data socket error: Connection refused
> [14:47:45] [R] List error
> [14:47:45] [R] PASV mode failed, trying PORT mode...
> [14:47:45] [R] Waiting on PORT: 62505, expecting connection.
> [14:47:45] [R] PORT 192,168,200,13,244,41
> [14:47:45] [R] 500 Illegal PORT command.
Who is 192.168.200.13? From vsftpd-3.0.2/postlogin.c,
handle_port():
/* SECURITY:
* 1) Reject requests not connecting to the control socket IP
* 2) Reject connects to privileged ports
*/
It looks like the PORT command provides a different IP.
IIRC, IPVS does not mangle the PORT command, and vsftpd expects to
connect to the same client IP. There is a config option you can
try to set (port_promiscuous), but only while testing.
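The two checks in that comment can be modelled in a few lines. The sketch below is a simplification under stated assumptions, not vsftpd's code: port_cmd_allowed() and ipv4() are hypothetical helpers, addresses are plain host-order IPv4 values, and the promiscuous flag stands in for the port_promiscuous option (assumed here to skip both checks). It illustrates why a client behind NAT that puts its internal address into PORT is rejected: that address differs from the peer address the server sees on the control connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a dotted-quad IPv4 address into a host-order 32-bit value. */
static uint32_t ipv4(unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
	assert(a < 256 && b < 256 && c < 256 && d < 256);
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | (uint32_t)d;
}

/* Simplified model of the PORT security checks (not vsftpd code).
 * control_peer: client address as seen on the control connection
 * port_addr/port: address and port requested by the PORT command
 * promiscuous: stands in for vsftpd's port_promiscuous option */
static bool port_cmd_allowed(uint32_t control_peer, uint32_t port_addr,
			     unsigned int port, bool promiscuous)
{
	if (promiscuous)
		return true;            /* checks disabled: testing only */
	if (port_addr != control_peer)
		return false;           /* 1) not the control socket IP */
	if (port < 1024)
		return false;           /* 2) privileged port */
	return true;
}
```

For example, with a hypothetical NAT address 203.0.113.5 on the control connection, port_cmd_allowed(ipv4(203,0,113,5), ipv4(192,168,200,13), 62505, false) returns false, which matches the "500 Illegal PORT command." reply in the log.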
> [14:47:45] [R] List error
> [14:48:14] [R] QUIT
> [14:48:14] [R] 221 Goodbye.
> [14:48:14] [R] Logged out: 192.168.10.62
Regards
--
Julian Anastasov <[email protected]>
Hello,
On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> thank you for the fast responses! I would like to test any patch for 3.12.
I'm attaching a patch that avoids rerouting in
IPVS for LOCAL_IN. Please test it in your setup. My tests
were with NAT on today's net tree. I checked that it
compiles for 3.12.33. You can use the default snat_reroute=1.
Regards
--
Julian Anastasov <[email protected]>
Hi,
On 05.12.2014 22:32, Julian Anastasov wrote:
>
> Hello,
>
> On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
>
>> thank you for the fast responses! I would like to test any patch for 3.12.
>
> I hope I'll have time this weekend...
>
>> If I understand correctly, I set:
>>
>> echo 0 > /proc/sys/net/ipv4/vs/snat_reroute
>
> The flag works per packet; there is no need to reload any modules.
> But it does not help in the case of a local client, where
> the problem with sockets occurs. That is why you can keep
> ip_vs_route_me_harder() empty (return 0) until a patch is
> created.
>
>> modprobe ip_vs_ftp
>>
>> and re-enable the FTP IPVS service?
>>
>> It does not crash, but FTP works with neither PASV nor PORT:
>>
>>
>> [14:47:42] [R] Connecting to 192.168.10.62 -> IP=192.168.10.62 PORT=21
>> [14:47:42] [R] Connected to 192.168.10.62
>> [14:47:43] [R] 220 (vsFTPd 3.0.2)
>> [14:47:43] [R] USER (hidden)
>> [14:47:43] [R] 331 Please specify the password.
>> [14:47:43] [R] PASS (hidden)
>> [14:47:43] [R] 230 Login successful.
>> [14:47:43] [R] SYST
>> [14:47:43] [R] 215 UNIX Type: L8
>> [14:47:43] [R] FEAT
>> [14:47:43] [R] 211-Features:
>> [14:47:43] [R] EPRT
>> [14:47:43] [R] EPSV
>> [14:47:43] [R] MDTM
>> [14:47:43] [R] PASV
>> [14:47:43] [R] REST STREAM
>> [14:47:43] [R] SIZE
>> [14:47:43] [R] TVFS
>> [14:47:43] [R] UTF8
>> [14:47:43] [R] 211 End
>> [14:47:43] [R] PWD
>> [14:47:43] [R] 257 "/"
>> [14:47:43] [R] CWD /
>> [14:47:43] [R] 250 Directory successfully changed.
>> [14:47:43] [R] PWD
>> [14:47:43] [R] 257 "/"
>> [14:47:43] [R] TYPE A
>> [14:47:43] [R] 200 Switching to ASCII mode.
>> [14:47:43] [R] PASV
>> [14:47:43] [R] 227 Entering Passive Mode (10,10,1,23,251,6).
>> [14:47:43] [R] Opening data channel IP: 192.168.10.62 PORT: 64262
>> [14:47:44] [R] Data socket error: Connection refused
>> [14:47:44] [R] List error
>> [14:47:44] [R] PASV
>> [14:47:44] [R] 227 Entering Passive Mode (10,10,1,23,250,144).
>> [14:47:44] [R] Opening data channel IP: 192.168.10.62 PORT: 64144
>> [14:47:45] [R] Data socket error: Connection refused
>> [14:47:45] [R] List error
>> [14:47:45] [R] PASV mode failed, trying PORT mode...
>> [14:47:45] [R] Waiting on PORT: 62505, expecting connection.
>> [14:47:45] [R] PORT 192,168,200,13,244,41
>> [14:47:45] [R] 500 Illegal PORT command.
>
> Who is 192.168.200.13? From vsftpd-3.0.2/postlogin.c,
> handle_port():
>
192.168.200.13 was the FTP client. As this client was also NATed, PORT mode
will fail here because the client provided its internal IP, but I had disabled PORT
anyway before and only re-enabled it for testing...
> /* SECURITY:
> * 1) Reject requests not connecting to the control socket IP
> * 2) Reject connects to privileged ports
> */
>
> It looks like the PORT command provides a different IP.
> IIRC, IPVS does not mangle the PORT command, and vsftpd expects to
> connect to the same client IP. There is a config option you can
> try to set (port_promiscuous), but only while testing.
>
While this is true, PASV should have worked anyhow, right?
--
Kind regards,
Florian Wiessner
Hi Julian,
On 07.12.2014 19:27, Julian Anastasov wrote:
> Hello,
>
> On Fri, 5 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
>
>> thank you for the fast responses! I would like to test any patch for 3.12.
>
> I'm attaching a patch that avoids rerouting in
> IPVS for LOCAL_IN. Please test it in your setup. My tests
> were with NAT on today's net tree. I checked that it
> compiles for 3.12.33. You can use the default snat_reroute=1.
>
I'm sorry to tell you that your patch does not fix the problem. The BUG happens
as soon as the client sends PASV; the FTP server does not return "Entering
Passive Mode":
[ 91.862502] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[ 91.862735] IP: [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 91.862889] PGD 0
[ 91.863026] Oops: 0000 [#1] SMP
[ 91.863235] Modules linked in: netconsole xt_nat xt_multiport ip_vs_rr veth
iptable_mangle xt_mark nf_conntrack_netlink nfnetlink ipt_MASQUERADE iptable_nat
nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_tcpudp iptable_filter
ip_tables cpufreq_ondemand cpufreq_powersave cpufreq_conservative
cpufreq_userspace ocfs2_stack_o2cb ocfs2_dlm bridge stp llc bonding fuse
nf_conntrack_ftp 8021q openvswitch gre vxlan xt_conntrack x_tables ocfs2_dlmfs
dlm sctp ocfs2 ocfs2_nodemanager ocfs2_stackglue configfs rbd kvm_intel kvm
coretemp ip_vs_ftp ip_vs nf_nat nf_conntrack i2c_i801 psmouse serio_raw lpc_ich
mfd_core evdev btrfs lzo_decompress lzo_compress
[ 91.866846] CPU: 1 PID: 18895 Comm: vsftpd Not tainted 3.12.33 #5
[ 91.866927] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 1.1a 09/28/2011
[ 91.867023] task: ffff8807b9360540 ti: ffff8807afe90000 task.ti: ffff8807afe90000
[ 91.867116] RIP: 0010:[<ffffffffa013a470>] [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 91.867268] RSP: 0018:ffff88083fc43988 EFLAGS: 00010206
[ 91.867346] RAX: 000000000000000c RBX: ffff88079aeb006c RCX: 0000000000000003
[ 91.867428] RDX: 000000000000002a RSI: 0000000000000003 RDI: ffff88079aeb006c
[ 91.867509] RBP: 00000000ce63f6dd R08: ffff8807b2eed780 R09: ffff88083fc43998
[ 91.867598] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[ 91.867679] R13: 0000000000000000 R14: 0000000000000003 R15: ffff880815d948bc
[ 91.867761] FS: 00007f1a8aad5700(0000) GS:ffff88083fc40000(0000) knlGS:0000000000000000
[ 91.867855] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 91.867926] CR2: 0000000000000014 CR3: 00000007a386a000 CR4: 00000000000407e0
[ 91.868008] Stack:
[ 91.868073] ffff88081690d220 0000000000000012 0000000000000014 ffff88079aeb0068
[ 91.868383] ffff880815d94801 ffffffffa014f681 0000000000000000 ffffffff00000045
[ 91.868694] ffff880800000048 0000001b00000003 ffff88083fc43a60 ffff88081690d220
[ 91.869003] Call Trace:
[ 91.869077] <IRQ>
[ 91.869136] [<ffffffffa014f681>] ? __nf_nat_mangle_tcp_packet+0x109/0x120 [nf_nat]
[ 91.869356] [<ffffffffa017749e>] ? ip_vs_ftp_out.part.8+0x2b2/0x338 [ip_vs_ftp]
[ 91.869460] [<ffffffffa015f884>] ? ip_vs_app_pkt_out+0x105/0x18b [ip_vs]
[ 91.869539] [<ffffffffa0163028>] ? tcp_snat_handler+0x6b/0x320 [ip_vs]
[ 91.869622] [<ffffffffa0155d3d>] ? ip_vs_conn_out_get_proto+0x1c/0x25 [ip_vs]
[ 91.869736] [<ffffffffa015893c>] ? ip_vs_out+0x2a5/0x5f6 [ip_vs]
[ 91.869826] [<ffffffff8150f544>] ? ip_frag_mem+0x2a/0x2a
[ 91.869906] [<ffffffff81508e1f>] ? nf_iterate+0x42/0x80
[ 91.869996] [<ffffffff81508ec6>] ? nf_hook_slow+0x69/0xff
[ 91.870073] [<ffffffff8150f544>] ? ip_frag_mem+0x2a/0x2a
[ 91.870153] [<ffffffff8150f8ae>] ? ip_forward+0x22d/0x2cf
[ 91.870230] [<ffffffff814e57ce>] ? __netif_receive_skb_core+0x5f0/0x66c
[ 91.870311] [<ffffffff814e59df>] ? process_backlog+0x13e/0x13e
[ 91.870389] [<ffffffffa0455e09>] ? br_handle_frame_finish+0x382/0x382 [bridge]
[ 91.870482] [<ffffffff814e5a2b>] ? netif_receive_skb+0x4c/0x7d
[ 91.870561] [<ffffffffa0455d95>] ? br_handle_frame_finish+0x30e/0x382 [bridge]
[ 91.870652] [<ffffffffa0455fda>] ? br_handle_frame+0x1d1/0x217 [bridge]
[ 91.870733] [<ffffffff814e567d>] ? __netif_receive_skb_core+0x49f/0x66c
[ 91.870817] [<ffffffff8104daa3>] ? call_timer_fn+0x4b/0xf6
[ 91.870893] [<ffffffff814e592b>] ? process_backlog+0x8a/0x13e
[ 91.870972] [<ffffffff814e5c31>] ? net_rx_action+0xa2/0x1c0
[ 91.871051] [<ffffffff81047e2e>] ? __do_softirq+0xf6/0x24f
[ 91.871132] [<ffffffff815ad7dc>] ? call_softirq+0x1c/0x30
[ 91.871203] <EOI>
[ 91.871260] [<ffffffff8100464d>] ? do_softirq+0x2c/0x5f
[ 91.871470] [<ffffffff81047ca1>] ? local_bh_enable+0x67/0x85
[ 91.871545] [<ffffffff81511689>] ? ip_finish_output+0x2c9/0x322
[ 91.871628] [<ffffffff8151240a>] ? ip_queue_xmit+0x2b7/0x2f0
[ 91.871714] [<ffffffff81524772>] ? tcp_transmit_skb+0x6ef/0x755
[ 91.871792] [<ffffffff815250e8>] ? tcp_write_xmit+0x886/0x9cb
[ 91.871872] [<ffffffff8152527a>] ? __tcp_push_pending_frames+0x24/0x7e
[ 91.871951] [<ffffffff8151a33c>] ? tcp_sendmsg+0xa4c/0xbfc
[ 91.872036] [<ffffffff814d3477>] ? sock_aio_write+0xe3/0xfd
[ 91.872129] [<ffffffff81122f4d>] ? do_sync_write+0x59/0x79
[ 91.872215] [<ffffffff811239e3>] ? vfs_write+0xc4/0x182
[ 91.872298] [<ffffffff81123daf>] ? SyS_write+0x45/0x7c
[ 91.872382] [<ffffffff815ac35b>] ? tracesys+0xdd/0xe2
[ 91.872461] Code: 68 14 4d 01 c5 45 85 e4 74 46 f0 80 4f 78 40 48 8d 5f 04 48
89 df e8 00 12 47 e1 31 c0 41 83 fe 02 0f 97 c0 48 6b c0 0c 4c 01 e8 <8b> 70 08
39 70 04 74 08 89 ea 0f ca 39 10 79 0d 89 70 04 44 01
[ 91.876166] RIP [<ffffffffa013a470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 91.876327] RSP <ffff88083fc43988>
[ 91.876400] CR2: 0000000000000014
[ 91.876497] ---[ end trace 2c6d9f405db2170c ]---
[ 91.876578] Kernel panic - not syncing: Fatal exception in interrupt
[ 91.876666] Rebooting in 10 seconds..
[ 101.935360] ACPI MEMORY or I/O RESET_REG.
node01:/ocfs2/usr/src/linux-3.12.33/scripts# ./decodecode < /tmp/node01-kernel-ipvs.log
[ 91.872461] Code: 68 14 4d 01 c5 45 85 e4 74 46 f0 80 4f 78 40 48 8d 5f 04 48
89 df e8 00 12 47 e1 31 c0 41 83 fe 02 0f 97 c0 48 6b c0 0c 4c 01 e8 <8b> 70 08
39 70 04 74 08 89 ea 0f ca 39 10 79 0d 89 70 04 44 01
All code
========
0: 68 14 4d 01 c5 pushq $0xffffffffc5014d14
5: 45 85 e4 test %r12d,%r12d
8: 74 46 je 0x50
a: f0 80 4f 78 40 lock orb $0x40,0x78(%rdi)
f: 48 8d 5f 04 lea 0x4(%rdi),%rbx
13: 48 89 df mov %rbx,%rdi
16: e8 00 12 47 e1 callq 0xffffffffe147121b
1b: 31 c0 xor %eax,%eax
1d: 41 83 fe 02 cmp $0x2,%r14d
21: 0f 97 c0 seta %al
24: 48 6b c0 0c imul $0xc,%rax,%rax
28: 4c 01 e8 add %r13,%rax
2b:* 8b 70 08 mov 0x8(%rax),%esi <-- trapping instruction
2e: 39 70 04 cmp %esi,0x4(%rax)
31: 74 08 je 0x3b
33: 89 ea mov %ebp,%edx
35: 0f ca bswap %edx
37: 39 10 cmp %edx,(%rax)
39: 79 0d jns 0x48
3b: 89 70 04 mov %esi,0x4(%rax)
3e: 44 rex.R
3f: 01 .byte 0x1
Code starting with the faulting instruction
===========================================
0: 8b 70 08 mov 0x8(%rax),%esi
3: 39 70 04 cmp %esi,0x4(%rax)
6: 74 08 je 0x10
8: 89 ea mov %ebp,%edx
a: 0f ca bswap %edx
c: 39 10 cmp %edx,(%rax)
e: 79 0d jns 0x1d
10: 89 70 04 mov %esi,0x4(%rax)
13: 44 rex.R
14: 01 .byte 0x1
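The decoded instructions can be mapped onto the sequence-adjustment update in nf_ct_seqadj_set(): "cmp $0x2,%r14d; seta %al" derives the direction from ctinfo, "imul $0xc" scales it by the size of one per-direction entry, "add %r13,%rax" adds the base pointer of the seqadj extension, and the trapping "mov 0x8(%rax),%esi" reads the field at offset 8. The C sketch below replays that arithmetic with the register values from the oops above (R13 = 0, R14 = 3). It assumes the 3.12 layout of struct nf_ct_seqadj (three 32-bit fields: correction_pos, offset_before, offset_after); seqadj_entry and seqadj_fault_addr() are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of struct nf_ct_seqadj in 3.12: 12 bytes per direction. */
struct seqadj_entry {
	uint32_t correction_pos;
	int32_t  offset_before;
	int32_t  offset_after;
};

/* Replay the address computation from the decoded Code: line.
 *   base   = R13, base address of the per-direction seqadj array
 *   ctinfo = R14, conntrack info value
 * Returns the address read by the trapping "mov 0x8(%rax),%esi". */
static uint64_t seqadj_fault_addr(uint64_t base, unsigned int ctinfo)
{
	assert(sizeof(struct seqadj_entry) == 0xc);
	/* cmp $0x2,%r14d; seta %al: direction is 1 (reply) when ctinfo > 2 */
	uint64_t dir = (ctinfo > 2) ? 1 : 0;
	/* imul $0xc,%rax,%rax; add %r13,%rax: address of entry [dir] */
	uint64_t entry = base + dir * sizeof(struct seqadj_entry);
	/* mov 0x8(%rax),%esi: load offset_after of that entry */
	return entry + offsetof(struct seqadj_entry, offset_after);
}
```

seqadj_fault_addr(0, 3) evaluates to 0 + 12 + 8 = 0x14, the CR2 value reported in both oopses, and RAX = 0xc in the register dump is the entry address before the +8 offset. This is consistent with the seqadj extension pointer being NULL for this connection, which fits the title of commit b25adce1606427fd8 ("ipvs: correct usage/allocation of seqadj ext in ipvs").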
--
Kind regards,
Florian Wiessner
Hello,
On Mon, 8 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> On 07.12.2014 19:27, Julian Anastasov wrote:
> >
> > I'm attaching a patch that avoids rerouting in
> > IPVS for LOCAL_IN. Please test it in your setup. My tests
> > were with NAT on today's net tree. I checked that it
> > compiles for 3.12.33. You can use the default snat_reroute=1.
> >
>
> I'm sorry to tell you that your patch does not fix the problem. The BUG happens
> as soon as the client sends PASV; the FTP server does not return "Entering
> Passive Mode":
The patch is meant to avoid the xfrm_selector_match crash,
which may be caused when using a local client (mail?).
For the nf_ct_seqadj_set crash you have to use commit b25adce16064
("ipvs: correct usage/allocation of seqadj ext in ipvs").
I'll send it to you privately...
Regards
--
Julian Anastasov <[email protected]>
Hi Julian,
On 08.12.2014 21:40, Julian Anastasov wrote:
>
> Hello,
>
> On Mon, 8 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
>
>> On 07.12.2014 19:27, Julian Anastasov wrote:
>>>
>>> I'm attaching a patch that avoids rerouting in
>>> IPVS for LOCAL_IN. Please test it in your setup. My tests
>>> were with NAT on today's net tree. I checked that it
>>> compiles for 3.12.33. You can use the default snat_reroute=1.
>>>
>>
>> I'm sorry to tell you that your patch does not fix the problem. The BUG happens
>> as soon as the client sends PASV; the FTP server does not return "Entering
>> Passive Mode":
>
> The patch is meant to avoid the xfrm_selector_match crash,
> which may be caused when using a local client (mail?).
> For the nf_ct_seqadj_set crash you have to use commit b25adce16064
> ("ipvs: correct usage/allocation of seqadj ext in ipvs").
> I'll send it to you privately...
>
I rebuilt everything with the two provided patches and still get:
[ 512.475449] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[ 512.481277] IP: [<ffffffffa013d470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 512.481442] PGD 0
[ 512.481572] Oops: 0000 [#1] SMP
[ 512.481750] Modules linked in: ip_vs_rr netconsole xt_nat xt_multiport veth
iptable_mangle xt_mark nf_conntrack_netlink nfnetlink ipt_MASQUERADE iptable_nat
nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_tcpudp iptable_filter
ip_tables cpufreq_ondemand cpufreq_powersave cpufreq_conservative
cpufreq_userspace ocfs2_stack_o2cb ocfs2_dlm bridge stp llc bonding fuse
nf_conntrack_ftp 8021q openvswitch gre vxlan xt_conntrack x_tables ocfs2_dlmfs
dlm sctp ocfs2 ocfs2_nodemanager ocfs2_stackglue configfs rbd kvm_intel kvm
coretemp ip_vs_ftp ip_vs nf_nat nf_conntrack psmouse serio_raw i2c_i801 lpc_ich
mfd_core evdev btrfs lzo_decompress lzo_compress
[ 512.485323] CPU: 4 PID: 28142 Comm: vsftpd Not tainted 3.12.33 #5
[ 512.485405] Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 1.1a 09/28/2011
[ 512.485497] task: ffff880703f1c500 ti: ffff8805cab2e000 task.ti: ffff8805cab2e000
[ 512.485594] RIP: 0010:[<ffffffffa013d470>] [<ffffffffa013d470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 512.485751] RSP: 0018:ffff88083fd03988 EFLAGS: 00010206
[ 512.485829] RAX: 000000000000000c RBX: ffff8805cb314b1c RCX: 0000000000000003
[ 512.485916] RDX: 0000000000000026 RSI: 0000000000000003 RDI: ffff8805cb314b1c
[ 512.486007] RBP: 00000000030a6079 R08: ffff88079d058c80 R09: ffff88083fd03998
[ 512.486084] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003
[ 512.486162] R13: 0000000000000000 R14: 0000000000000003 R15: ffff8808170150bc
[ 512.486240] FS: 00007f0497645700(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[ 512.486351] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 512.486431] CR2: 0000000000000014 CR3: 00000007457f4000 CR4: 00000000000407e0
[ 512.486512] Stack:
[ 512.486583] ffff88077b389460 0000000000000012 0000000000000014 ffff8805cb314b18
[ 512.486886] ffff880817015001 ffffffffa0152681 0000000000000000 ffffffff00000045
[ 512.487195] ffff880800000048 0000001b00000003 ffff88083fd03a60 ffff88077b389460
[ 512.487501] Call Trace:
[ 512.487574] <IRQ>
[ 512.487634] [<ffffffffa0152681>] ? __nf_nat_mangle_tcp_packet+0x109/0x120 [nf_nat]
[ 512.487859] [<ffffffffa017a49e>] ? ip_vs_ftp_out.part.8+0x2b2/0x338 [ip_vs_ftp]
[ 512.487957] [<ffffffffa0162884>] ? ip_vs_app_pkt_out+0x105/0x18b [ip_vs]
[ 512.488038] [<ffffffffa0166028>] ? tcp_snat_handler+0x6b/0x320 [ip_vs]
[ 512.488123] [<ffffffffa0158d3d>] ? ip_vs_conn_out_get_proto+0x1c/0x25 [ip_vs]
[ 512.488222] [<ffffffffa015b93c>] ? ip_vs_out+0x2a5/0x5f6 [ip_vs]
[ 512.488325] [<ffffffff8150f544>] ? ip_frag_mem+0x2a/0x2a
[ 512.488405] [<ffffffff81508e1f>] ? nf_iterate+0x42/0x80
[ 512.488486] [<ffffffff81508ec6>] ? nf_hook_slow+0x69/0xff
[ 512.488565] [<ffffffff8150f544>] ? ip_frag_mem+0x2a/0x2a
[ 512.488645] [<ffffffff8150f8ae>] ? ip_forward+0x22d/0x2cf
[ 512.488729] [<ffffffff814e57ce>] ? __netif_receive_skb_core+0x5f0/0x66c
[ 512.488810] [<ffffffff814e59df>] ? process_backlog+0x13e/0x13e
[ 512.488893] [<ffffffffa0458e09>] ? br_handle_frame_finish+0x382/0x382 [bridge]
[ 512.488987] [<ffffffff814e5a2b>] ? netif_receive_skb+0x4c/0x7d
[ 512.489068] [<ffffffffa0458d95>] ? br_handle_frame_finish+0x30e/0x382 [bridge]
[ 512.489166] [<ffffffffa0458fda>] ? br_handle_frame+0x1d1/0x217 [bridge]
[ 512.489247] [<ffffffff814e567d>] ? __netif_receive_skb_core+0x49f/0x66c
[ 512.489338] [<ffffffff814e592b>] ? process_backlog+0x8a/0x13e
[ 512.489415] [<ffffffff814e5c31>] ? net_rx_action+0xa2/0x1c0
[ 512.489493] [<ffffffff81047e2e>] ? __do_softirq+0xf6/0x24f
[ 512.489578] [<ffffffff815ad7dc>] ? call_softirq+0x1c/0x30
[ 512.489655] <EOI>
[ 512.489721] [<ffffffff8100464d>] ? do_softirq+0x2c/0x5f
[ 512.489920] [<ffffffff81047ca1>] ? local_bh_enable+0x67/0x85
[ 512.489996] [<ffffffff81511689>] ? ip_finish_output+0x2c9/0x322
[ 512.490076] [<ffffffff8151240a>] ? ip_queue_xmit+0x2b7/0x2f0
[ 512.490156] [<ffffffff81524772>] ? tcp_transmit_skb+0x6ef/0x755
[ 512.490235] [<ffffffff815250e8>] ? tcp_write_xmit+0x886/0x9cb
[ 512.490311] [<ffffffff8152527a>] ? __tcp_push_pending_frames+0x24/0x7e
[ 512.490392] [<ffffffff8151a33c>] ? tcp_sendmsg+0xa4c/0xbfc
[ 512.490466] [<ffffffff814d3477>] ? sock_aio_write+0xe3/0xfd
[ 512.490545] [<ffffffff81122f4d>] ? do_sync_write+0x59/0x79
[ 512.490623] [<ffffffff811239e3>] ? vfs_write+0xc4/0x182
[ 512.490703] [<ffffffff81123daf>] ? SyS_write+0x45/0x7c
[ 512.490781] [<ffffffff815ac35b>] ? tracesys+0xdd/0xe2
[ 512.490859] Code: 68 14 4d 01 c5 45 85 e4 74 46 f0 80 4f 78 40 48 8d 5f 04 48 89 df e8 00 e2 46 e1 31 c0 41 83 fe 02 0f 97 c0 48 6b c0 0c 4c 01 e8 <8b> 70 08 39 70 04 74 08 89 ea 0f ca 39 10 79 0d 89 70 04 44 01
[ 512.494558] RIP [<ffffffffa013d470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
[ 512.494714] RSP <ffff88083fd03988>
[ 512.494785] CR2: 0000000000000014
[ 512.494871] ---[ end trace 8a6e753cba1ccec2 ]---
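One way to read the Code: bytes above, assuming the 3.12 layout of struct nf_ct_seqadj (three 32-bit fields per direction, 12 bytes each): the bytes just before the faulting <8b> decode to xor %eax,%eax; cmp $0x2,%r14d; seta %al; imul $0xc,%rax,%rax; add %r13,%rax, and the faulting instruction itself is mov 0x8(%rax),%esi. With R14 = 3 (a reply-direction ctinfo) RAX becomes 1 * 12 = 0xc, and R13 = 0, so the base pointer is NULL; the load at RAX + 8 then touches 0x14, which is exactly the CR2 value. That pattern fits &seqadj->seq[dir] being computed from a NULL seqadj extension, but this is an interpretation of the dump, not something confirmed in the thread. A small C sketch of the arithmetic (the struct and helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors struct nf_ct_seqadj in the 3.12 sources,
 * three 32-bit fields, so one direction occupies 12 bytes. */
struct seqadj_way {
	uint32_t correction_pos;
	int32_t  offset_before;
	int32_t  offset_after;
};

/* The "imul $0xc,%rax,%rax" in the Code: bytes implies a 12-byte stride. */
static_assert(sizeof(struct seqadj_way) == 12, "expected 12-byte stride");

/* Hypothetical helper: the address read by "mov 0x8(%rax),%esi",
 * i.e. &seqadj->seq[dir].offset_after for a given extension base. */
uintptr_t seqadj_fault_addr(uintptr_t seqadj_base, unsigned int dir)
{
	return seqadj_base + dir * sizeof(struct seqadj_way) +
	       offsetof(struct seqadj_way, offset_after);
}
```

With the values from the register dump (base 0, dir 1), seqadj_fault_addr(0, 1) evaluates to 0x14.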
--
Kind regards,
Florian Wiessner
Smart Weblications GmbH
Martinsberger Str. 1
D-95119 Naila
phone: +49 9282 9638 200
fax: +49 9282 9638 205
24/7: +49 900 144 000 00 - 0.99 EUR/min*
http://www.smart-weblications.de
--
Registered office: Naila
Managing Director: Florian Wiessner
Commercial register: HRB 3840, Amtsgericht Hof
*from the German landline network; prices from mobile networks may differ
Hello,
On Tue, 9 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> I rebuilt everything with the two provided patches and still get:
>
> [ 512.475449] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
> [ 512.481277] IP: [<ffffffffa013d470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
Same place, hm...
> [ 512.485323] CPU: 4 PID: 28142 Comm: vsftpd Not tainted 3.12.33 #5
The "#5" above is the same as in the previous oops, which means the
kernel was not updated. Or did you update only the IPVS modules after
applying both patches?
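The "#5" is the build counter in the kernel version string, which the kernel build increments when it relinks vmlinux; if it is identical in two oopses, the running image was not rebuilt in between. A minimal sketch, using a hypothetical helper name oops_build_number, that extracts that counter from an oops "CPU:" line so two reports can be compared:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the build counter after the last '#' in an
 * oops "CPU:" line ("... Not tainted 3.12.33 #5" -> 5), or -1 if absent. */
long oops_build_number(const char *line)
{
	const char *hash = strrchr(line, '#');
	char *end;
	long n;

	if (hash == NULL)
		return -1;
	n = strtol(hash + 1, &end, 10);
	/* No digits after '#' means there is no usable counter. */
	return end == hash + 1 ? -1 : n;
}
```

Applied to the quoted CPU: line it returns 5; a freshly relinked kernel would report a higher number.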
You can also test without FTP to see whether there
are oopses in xfrm, so that we can close this topic and
continue with the FTP problem on the IPVS lists without
bothering non-IPVS people.
Regards
--
Julian Anastasov
Hi,
On 10.12.2014 22:41, Julian Anastasov wrote:
> Hello,
>
> On Tue, 9 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
>
>> I rebuilt everything with the two provided patches and still get:
>>
>> [ 512.475449] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
>> [ 512.481277] IP: [<ffffffffa013d470>] nf_ct_seqadj_set+0x60/0x90 [nf_conntrack]
>
> Same place, hm...
>
>> [ 512.485323] CPU: 4 PID: 28142 Comm: vsftpd Not tainted 3.12.33 #5
>
> The "#5" above is the same as in the previous oops, which means the
> kernel was not updated. Or did you update only the IPVS modules after
> applying both patches?
Correct, I did it with make-kpkg --initrd linux_image, which only rebuilt
the modules. I can retry with make clean before building the package.
>
> You can also test without FTP to see whether there
> are oopses in xfrm, so that we can close this topic and
> continue with the FTP problem on the IPVS lists without
> bothering non-IPVS people.
>
Yeah, it seems that the xfrm issue is gone.
--
Kind regards,
Florian Wiessner
Hello,
On Thu, 11 Dec 2014, Smart Weblications GmbH - Florian Wiessner wrote:
> >> [ 512.485323] CPU: 4 PID: 28142 Comm: vsftpd Not tainted 3.12.33 #5
> >
> > The "#5" above is the same as in the previous oops, which means the
> > kernel was not updated. Or did you update only the IPVS modules after
> > applying both patches?
>
> Correct, I did it with make-kpkg --initrd linux_image, which only rebuilt
> the modules. I can retry with make clean before building the package.
I just tested PASV and PORT with 3.12.33 including
both patches (seq adj fix + ip_route_me_harder fix) and I do not
see any crashes in nf_ct_seqadj_set. If you still have problems
with FTP, send me more info off-list.
> > You can also test without FTP to see whether there
> > are oopses in xfrm, so that we can close this topic and
> > continue with the FTP problem on the IPVS lists without
> > bothering non-IPVS people.
> >
>
> Yeah, it seems that the xfrm issue is gone.
Thanks for the confirmation!
Regards
--
Julian Anastasov