LinuxLists.cc - ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 08:47:12

Subject: ip_rcv_finish() NULL pointer and possibly related Oopses

Please CC me in any responses, thanks.

Testing our app at scale on Google?s GCE, running ~1000 CoreOS hosts: over
approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
below and then reboot (I?m not sure if the different oopses are related to
each other).

The app is Project Calico, which is a datacenter networking fabric.
calico-felix, the process named below, is our per-host agent. The
per-host agent is responsible for reading the network information from a
central server and applying "ip route? and "iptables" updates to the
kernel. We?re running on CoreOS, with about 100 docker containers/veths
pairs running on each host. calico-felix is running inside one of those
containers. We also run the BIRD BGP stack to redistribute routes around
the datacenter. The errors happen more frequently while Calico is under
load.

I?m not sure where to go from here. I can reproduce these issues easily
at that scale but I haven?t managed to boil it down to a small-scale repro
scenario for further investigation (yet).

-Shaun

[ 4513.711475] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 4513.712008] IP: [< (null)>] (null)
[ 4513.712008] PGD 2b9c067 PUD 2e2d067 PMD 0
[ 4513.712008] Oops: 0010 [#1] SMP
[ 4513.712008] Modules linked in: xt_mac xt_mark veth ip_set_hash_net
nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set
nfnetlink ipip tunnel4 ip_tunnel ip6table_filter ip6_tables xt_conntrack
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat
nf_conntrack bridge stp llc overlay nls_ascii nls_cp437 vfat fat ext4
crc16 mbcache jbd2 sd_mod crc32c_intel virtio_scsi scsi_mod mousedev
virtio_net aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper
cryptd psmouse microcode firmware_class i2c_piix4 i2c_core virtio_pci
acpi_cpufreq virtio_ring virtio button evdev sch_fq_codel ip_tables autofs4
[ 4513.712008] CPU: 1 PID: 1192 Comm: calico-felix Not tainted
4.1.4-coreos-r1 #2
[ 4513.712008] Hardware name: Google Google, BIOS Google 01/01/2011
[ 4513.712008] task: ffff88021234e480 ti: ffff88021184c000 task.ti:
ffff88021184c000
[ 4513.712008] RIP: 0010:[<0000000000000000>] [< (null)>]
(null)
[ 4513.712008] RSP: 0000:ffff88021fd03c30 EFLAGS: 00010293
[ 4513.712008] RAX: ffff880213ec5a00 RBX: ffff880213ec5200 RCX:
0000000000000001
[ 4513.712008] RDX: 0000000000000000 RSI: 0000000000000002 RDI:
ffff880213ec5200
[ 4513.712008] RBP: ffff88021fd03c58 R08: 0000000000007430 R09:
000000000000d9bf
[ 4513.712008] R10: 0000000000000003 R11: ffff8800022915e0 R12:
ffff8801d98cbe4e
[ 4513.712008] R13: ffff880216a77000 R14: ffff880216a77000 R15:
0000000000000008
[ 4513.712008] FS: 00007f405264b740(0000) GS:ffff88021fd00000(0000)
knlGS:0000000000000000
[ 4513.712008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4513.712008] CR2: 0000000000000000 CR3: 00000000b94ad000 CR4:
00000000001407e0
[ 4513.712008] Stack:
[ 4513.712008] ffffffff81486751 ffff88021fd03c48 ffff8801d98cbe4e
ffff880213ec5200
[ 4513.712008] ffff880216a77000 ffff88021fd03cb8 ffffffff814870e4
8000000000000000
[ 4513.712008] ffff880213ec5202 ffff880216a77000 0000000000000000
0000000000000000
[ 4513.712008] Call Trace:
[ 4513.712008] <IRQ>
[ 4513.712008] [<ffffffff81486751>] ? ip_rcv_finish+0x81/0x360
[ 4513.712008] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
[ 4513.712008] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
[ 4513.712008] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
[ 4513.712008] [<ffffffff8143b667>] ? build_skb+0x17/0x90
[ 4513.712008] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
[ 4513.712008] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
[ 4513.712008] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
[ 4513.712008] [<ffffffffa00f772b>] 0xffffffffa00f772b
[ 4513.712008] [<ffffffff814491b3>] ? __netif_receive_skb_core+0x6c3/0x9a0
[ 4513.712008] [<ffffffffa00f7d81>] 0xffffffffa00f7d81
[ 4513.712008] [<ffffffff81449979>] net_rx_action+0x159/0x340
[ 4513.712008] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
[ 4513.712008] [<ffffffff810719fd>] irq_exit+0xad/0xc0
[ 4513.712008] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
[ 4513.712008] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
[ 4513.712008] <EOI>
[ 4513.712008] Code: Bad RIP value.
[ 4513.712008] RIP [< (null)>] (null)
[ 4513.712008] RSP <ffff88021fd03c30>
[ 4513.712008] CR2: 0000000000000000
[ 4513.712008] ---[ end trace 241c3b115a347e7d ]---
[ 4513.712008] Kernel panic - not syncing: Fatal exception in interrupt
[ 4513.712008] Kernel Offset: disabled
[ 4513.712008] Rebooting in 60 seconds..
[ 4513.712008] ACPI MEMORY or I/O RESET_REG.

[ 3903.658037] general protection fault: 0000 [#1] SMP
[ 3903.659014] Modules linked in: xt_mac xt_mark veth ip_set_hash_net
nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set
nfnetlink ipip tunnel4 ip_tunnel ip6table_filter ip6_tables xt_conntrack
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat
nf_conntrack bridge stp llc overlay nls_ascii nls_cp437 vfat fat ext4
crc16 mbcache jbd2 sd_mod virtio_scsi scsi_mod virtio_net mousedev
crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper
cryptd microcode firmware_class psmouse virtio_pci virtio_ring virtio
i2c_piix4 i2c_core acpi_cpufreq button evdev sch_fq_codel ip_tables autofs4
[ 3903.659014] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.4-coreos-r1 #2
[ 3903.659014] Hardware name: Google Google, BIOS Google 01/01/2011
[ 3903.659014] task: ffffffff81a154c0 ti: ffffffff81a00000 task.ti:
ffffffff81a00000
[ 3903.659014] RIP: 0010:[<ffffffff814818df>] [<ffffffff814818df>]
ipv4_dst_destroy+0x3f/0x80
[ 3903.659014] RSP: 0018:ffff88021fc03e58 EFLAGS: 00010246
[ 3903.659014] RAX: dead000000200200 RBX: ffff8801a8e70600 RCX:
0000000000000020
[ 3903.659014] RDX: dead000000100100 RSI: 00000000fffffe01 RDI:
ffff88021fc17180
[ 3903.659014] RBP: ffff88021fc03e68 R08: ffff88021fc16d60 R09:
ffff8801a8e70600
[ 3903.659014] R10: ffff88021fc16d80 R11: 0000000000000005 R12:
ffff88021fc17180
[ 3903.659014] R13: 0000000000000000 R14: 0000000000000002 R15:
ffff88021fc16d80
[ 3903.659014] FS: 0000000000000000(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[ 3903.659014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3903.659014] CR2: 00007f23b24be024 CR3: 00000000badef000 CR4:
00000000001407f0
[ 3903.659014] Stack:
[ 3903.659014] ffff8801a8e70600 0000000000000000 ffff88021fc03e88
ffffffff81452102
[ 3903.659014] ffffffff81a50f80 000000000000000a ffff88021fc03e98
ffffffff814523de
[ 3903.659014] ffff88021fc03f08 ffffffff810d0476 ffffffff8359ac78
ffff8801a8e70600
[ 3903.659014] Call Trace:
[ 3903.659014] <IRQ>
[ 3903.659014] [<ffffffff81452102>] dst_destroy+0x32/0xe0
[ 3903.659014] [<ffffffff814523de>] dst_destroy_rcu+0xe/0x20
[ 3903.659014] [<ffffffff810d0476>] rcu_process_callbacks+0x226/0x5d0
[ 3903.659014] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
[ 3903.659014] [<ffffffff810719fd>] irq_exit+0xad/0xc0
[ 3903.659014] [<ffffffff8155299a>] smp_apic_timer_interrupt+0x4a/0x60
[ 3903.659014] [<ffffffff81550a5e>] apic_timer_interrupt+0x6e/0x80
[ 3903.659014] <EOI>
[ 3903.659014] [<ffffffff81059bd6>] ? native_safe_halt+0x6/0x10
[ 3903.659014] [<ffffffff8101f0de>] default_idle+0x1e/0xc0
[ 3903.659014] [<ffffffff8101fbbf>] arch_cpu_idle+0xf/0x20
[ 3903.659014] [<ffffffff810b0a74>] cpu_startup_entry+0x314/0x3e0
[ 3903.659014] [<ffffffff8153bcec>] rest_init+0x7c/0x80
[ 3903.659014] [<ffffffff81b130e0>] start_kernel+0x483/0x490
[ 3903.659014] [<ffffffff81b12a4d>] ? set_init_arg+0x55/0x55
[ 3903.659014] [<ffffffff81b12120>] ? early_idt_handler_array+0x120/0x120
[ 3903.659014] [<ffffffff81b125ee>] x86_64_start_reservations+0x2a/0x2c
[ 3903.659014] [<ffffffff81b12728>] x86_64_start_kernel+0x138/0x147
[ 3903.659014] Code: 39 87 b0 00 00 00 48 89 fb 74 4e 4c 8b a7 c0 00 00 00
4c 89 e7 e8 f2 e0 0c 00 48 8b 83 b8 00 00 00 48 8b 93 b0 00 00 00 4c 89 e7
<48> 89 42 08 48 89 10 48 b8 00 01 10 00 00 00 ad de 48 89 83 b0
[ 3903.659014] RIP [<ffffffff814818df>] ipv4_dst_destroy+0x3f/0x80
[ 3903.659014] RSP <ffff88021fc03e58>
[ 3903.763206] ---[ end trace c08a46f2a9fe400e ]---
[ 3903.764325] Kernel panic - not syncing: Fatal exception in interrupt
[ 3903.765318] Kernel Offset: disabled
[ 3903.765318] Rebooting in 60 seconds..

[ 4122.936024] ------------[ cut here ]------------
[ 4122.936987] WARNING: CPU: 0 PID: 0 at
/build/amd64-usr/usr/src/linux-4.1.4-coreos-r1/kernel/rcu/tree.c:2652
rcu_process_callbacks+0x58e/0x5d0()
[ 4122.938250] Modules linked in: xt_mac xt_mark veth ip_set_hash_net
nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set
nfnetlink ipip tunnel4 ip_tunnel ip6table_filter ip6_tables xt_conntrack
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat
nf_conntrack bridge stp llc overlay nls_ascii nls_cp437 vfat fat ext4
crc16 mbcache jbd2 sd_mod mousedev virtio_net crc32c_intel virtio_scsi
scsi_mod aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper
cryptd microcode firmware_class psmouse i2c_piix4 i2c_core virtio_pci
virtio_ring virtio acpi_cpufreq button evdev sch_fq_codel ip_tables autofs4
[ 4122.938250] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.4-coreos-r1 #2
[ 4122.938250] Hardware name: Google Google, BIOS Google 01/01/2011
[ 4122.938250] ffffffff817af688 ffff88021fc03e48 ffffffff815490ec
0000000000000000
[ 4122.938250] 0000000000000000 ffff88021fc03e88 ffffffff8106d35a
ffff880214884410
[ 4122.938250] ffffffff81a50f80 0000000000000246 0000000000000000
0000000000000002
[ 4122.938250] Call Trace:
[ 4122.938250] <IRQ> [<ffffffff815490ec>] dump_stack+0x45/0x57
[ 4122.938250] [<ffffffff8106d35a>] warn_slowpath_common+0x8a/0xc0
[ 4122.938250] [<ffffffff8106d44a>] warn_slowpath_null+0x1a/0x20
[ 4122.938250] [<ffffffff810d07de>] rcu_process_callbacks+0x58e/0x5d0
[ 4122.938250] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
[ 4122.938250] [<ffffffff810719fd>] irq_exit+0xad/0xc0
[ 4122.938250] [<ffffffff8155299a>] smp_apic_timer_interrupt+0x4a/0x60
[ 4122.938250] [<ffffffff81550a5e>] apic_timer_interrupt+0x6e/0x80
[ 4122.938250] <EOI> [<ffffffff81059bd6>] ? native_safe_halt+0x6/0x10
[ 4122.938250] [<ffffffff8101f0de>] default_idle+0x1e/0xc0
[ 4122.938250] [<ffffffff8101fbbf>] arch_cpu_idle+0xf/0x20
[ 4122.938250] [<ffffffff810b0a74>] cpu_startup_entry+0x314/0x3e0
[ 4122.938250] [<ffffffff8153bcec>] rest_init+0x7c/0x80
[ 4122.938250] [<ffffffff81b130e0>] start_kernel+0x483/0x490
[ 4122.938250] [<ffffffff81b12a4d>] ? set_init_arg+0x55/0x55
[ 4122.938250] [<ffffffff81b12120>] ? early_idt_handler_array+0x120/0x120
[ 4122.938250] [<ffffffff81b125ee>] x86_64_start_reservations+0x2a/0x2c
[ 4122.938250] [<ffffffff81b12728>] x86_64_start_kernel+0x138/0x147
[ 4122.938250] ---[ end trace 715eb5397f59c214 ]---
[ 4123.002136] BUG: unable to handle kernel NULL pointer dereference at
0000000000000340
[ 4123.003006] IP: [<ffffffffa0233027>] 0xffffffffa0233027
[ 4123.003006] PGD 212918067 PUD 214e33067 PMD 0
[ 4123.003006] Oops: 0000 [#1] SMP
[ 4123.003006] Modules linked in: xt_mac xt_mark veth ip_set_hash_net
nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set
nfnetlink ipip tunnel4 ip_tunnel ip6table_filter ip6_tables xt_conntrack
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat
nf_conntrack bridge stp llc overlay nls_ascii nls_cp437 vfat fat ext4
crc16 mbcache jbd2 sd_mod mousedev virtio_net crc32c_intel virtio_scsi
scsi_mod aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper
cryptd microcode firmware_class psmouse i2c_piix4 i2c_core virtio_pci
virtio_ring virtio acpi_cpufreq button evdev sch_fq_codel ip_tables autofs4
[ 4123.003006] CPU: 0 PID: 1165 Comm: calico-felix Tainted: G W
4.1.4-coreos-r1 #2
[ 4123.003006] Hardware name: Google Google, BIOS Google 01/01/2011
[ 4123.003006] task: ffff880215dce480 ti: ffff8800b9954000 task.ti:
ffff8800b9954000
[ 4123.003006] RIP: 0010:[<ffffffffa0233027>] [<ffffffffa0233027>]
0xffffffffa0233027
[ 4123.003006] RSP: 0018:ffff88021fc03b58 EFLAGS: 00010282
[ 4123.003006] RAX: 0000000000000000 RBX: ffff88021fc03bb0 RCX:
0000000000000000
[ 4123.003006] RDX: ffff88021fc03be8 RSI: 0000000000000001 RDI:
ffff880215960c00
[ 4123.003006] RBP: ffff88021fc03b58 R08: 0000000000000001 R09:
ffff880215960c00
[ 4123.003006] R10: ffff88000316e000 R11: 0000000000000001 R12:
ffff88021fc03be8
[ 4123.003006] R13: ffff880215960c00 R14: ffffffff81af46d0 R15:
0000000000000008
[ 4123.003006] FS: 00007f566ef6d740(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[ 4123.003006] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4123.003006] CR2: 0000000000000340 CR3: 0000000211758000 CR4:
00000000001407f0
[ 4123.003006] Stack:
[ 4123.003006] ffff88021fc03b98 ffffffff8147d477 ffff88021fc03c28
ffff8802148156c0
[ 4123.003006] ffff880215960c00 ffff88021fc03be8 ffff880215960c00
ffff880216a72000
[ 4123.003006] ffff88021fc03bd8 ffffffff8147d537 bbf0f00a9b42f00a
ffff8802148156c0
[ 4123.003006] Call Trace:
[ 4123.003006] <IRQ>
[ 4123.003006] [<ffffffff8147d477>] nf_iterate+0x57/0x80
[ 4123.003006] [<ffffffff8147d537>] nf_hook_slow+0x97/0x100
[ 4123.003006] [<ffffffff81486e32>] ip_local_deliver+0x92/0xa0
[ 4123.003006] [<ffffffff81486a30>] ? ip_rcv_finish+0x360/0x360
[ 4123.003006] [<ffffffff81486751>] ip_rcv_finish+0x81/0x360
[ 4123.003006] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
[ 4123.003006] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
[ 4123.003006] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
[ 4123.003006] [<ffffffff8143b667>] ? build_skb+0x17/0x90
[ 4123.003006] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
[ 4123.003006] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
[ 4123.003006] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
[ 4123.003006] [<ffffffffa00d472b>] 0xffffffffa00d472b
[ 4123.003006] [<ffffffffa00d4d81>] 0xffffffffa00d4d81
[ 4123.003006] [<ffffffff81449979>] net_rx_action+0x159/0x340
[ 4123.003006] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
[ 4123.003006] [<ffffffff810719fd>] irq_exit+0xad/0xc0
[ 4123.003006] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
[ 4123.003006] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
[ 4123.003006] <EOI>
[ 4123.003006] [<ffffffff81483a3d>] ? __ip_route_output_key+0x31d/0x860
[ 4123.003006] [<ffffffff814e2e95>] ? xfrm_lookup_route+0x5/0x70
[ 4123.003006] [<ffffffff81484224>] ? ip_route_output_flow+0x54/0x60
[ 4123.003006] [<ffffffff8148ca6a>] ip_queue_xmit+0x36a/0x3d0
[ 4123.003006] [<ffffffff814a4799>] tcp_transmit_skb+0x4b9/0x990
[ 4123.003006] [<ffffffff814a4d85>] tcp_write_xmit+0x115/0xe90
[ 4123.003006] [<ffffffff814a5d72>] __tcp_push_pending_frames+0x32/0xd0
[ 4123.003006] [<ffffffff8149443f>] tcp_push+0xef/0x120
[ 4123.003006] [<ffffffff81497cb5>] tcp_sendmsg+0xc5/0xb20
[ 4123.003006] [<ffffffff810d74c9>] ? lock_hrtimer_base.isra.22+0x29/0x50
[ 4123.003006] [<ffffffff814c2d04>] inet_sendmsg+0x64/0xa0
[ 4123.003006] [<ffffffff811e94b5>] ? __fget_light+0x25/0x70
[ 4123.003006] [<ffffffff8142d74d>] sock_sendmsg+0x3d/0x50
[ 4123.003006] [<ffffffff8142dc12>] SYSC_sendto+0x102/0x1a0
[ 4123.003006] [<ffffffff8110f864>] ? __audit_syscall_entry+0xb4/0x110
[ 4123.003006] [<ffffffff810224fc>] ? do_audit_syscall_entry+0x6c/0x70
[ 4123.003006] [<ffffffff81023cf3>] ?
syscall_trace_enter_phase1+0x103/0x160
[ 4123.003006] [<ffffffff8142e75e>] SyS_sendto+0xe/0x10
[ 4123.003006] [<ffffffff8154fc6e>] system_call_fastpath+0x12/0x71
[ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
41 83 b9 80 00
[ 4123.003006] RIP [<ffffffffa0233027>] 0xffffffffa0233027
[ 4123.003006] RSP <ffff88021fc03b58>
[ 4123.003006] CR2: 0000000000000340
[ 4123.003006] ---[ end trace 715eb5397f59c215 ]---
[ 4123.003006] Kernel panic - not syncing: Fatal exception in interrupt
[ 4123.003006] Kernel Offset: disabled
[ 4123.003006] Rebooting in 60 seconds..
[ 4123.003006] ACPI MEMORY or I/O RESET_REG.

2015-08-26 11:50:07

by Chuck Ebbert

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Wed, 26 Aug 2015 08:46:59 +0000
Shaun Crampton <[email protected]> wrote:

> Testing our app at scale on Google¹s GCE, running ~1000 CoreOS hosts: over
> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
> below and then reboot (I¹m not sure if the different oopses are related to
> each other).
>
> The app is Project Calico, which is a datacenter networking fabric.
> calico-felix, the process named below, is our per-host agent. The
> per-host agent is responsible for reading the network information from a
> central server and applying "ip route² and "iptables" updates to the
> kernel. We¹re running on CoreOS, with about 100 docker containers/veths
> pairs running on each host. calico-felix is running inside one of those
> containers. We also run the BIRD BGP stack to redistribute routes around
> the datacenter. The errors happen more frequently while Calico is under
> load.
>
> I¹m not sure where to go from here. I can reproduce these issues easily
> at that scale but I haven¹t managed to boil it down to a small-scale repro
> scenario for further investigation (yet).
>

What in the world is going on with those call traces? E.g.:

> [ 4513.712008] <IRQ>
> [ 4513.712008] [<ffffffff81486751>] ? ip_rcv_finish+0x81/0x360
> [ 4513.712008] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
> [ 4513.712008] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
> [ 4513.712008] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
> [ 4513.712008] [<ffffffff8143b667>] ? build_skb+0x17/0x90
> [ 4513.712008] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
> [ 4513.712008] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
> [ 4513.712008] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
> [ 4513.712008] [<ffffffffa00f772b>] 0xffffffffa00f772b
> [ 4513.712008] [<ffffffff814491b3>] ? __netif_receive_skb_core+0x6c3/0x9a0
> [ 4513.712008] [<ffffffffa00f7d81>] 0xffffffffa00f7d81
> [ 4513.712008] [<ffffffff81449979>] net_rx_action+0x159/0x340
> [ 4513.712008] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
> [ 4513.712008] [<ffffffff810719fd>] irq_exit+0xad/0xc0
> [ 4513.712008] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
> [ 4513.712008] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
> [ 4513.712008] <EOI>

There are two functions in the call trace that the kernel knows
nothing about. How did they get in there?

And there is really executable code in there, as can be seen from a
later trace:

> [ 4123.003006] <IRQ>
> [ 4123.003006] [<ffffffff8147d477>] nf_iterate+0x57/0x80
> [ 4123.003006] [<ffffffff8147d537>] nf_hook_slow+0x97/0x100
> [ 4123.003006] [<ffffffff81486e32>] ip_local_deliver+0x92/0xa0
> [ 4123.003006] [<ffffffff81486a30>] ? ip_rcv_finish+0x360/0x360
> [ 4123.003006] [<ffffffff81486751>] ip_rcv_finish+0x81/0x360
> [ 4123.003006] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
> [ 4123.003006] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
> [ 4123.003006] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
> [ 4123.003006] [<ffffffff8143b667>] ? build_skb+0x17/0x90
> [ 4123.003006] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
> [ 4123.003006] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
> [ 4123.003006] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
> [ 4123.003006] [<ffffffffa00d472b>] 0xffffffffa00d472b
> [ 4123.003006] [<ffffffffa00d4d81>] 0xffffffffa00d4d81
> [ 4123.003006] [<ffffffff81449979>] net_rx_action+0x159/0x340
> [ 4123.003006] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
> [ 4123.003006] [<ffffffff810719fd>] irq_exit+0xad/0xc0
> [ 4123.003006] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
> [ 4123.003006] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
> [ 4123.003006] <EOI>
> [ 4123.003006] [<ffffffff81483a3d>] ? __ip_route_output_key+0x31d/0x860
> [ 4123.003006] [<ffffffff814e2e95>] ? xfrm_lookup_route+0x5/0x70
> [ 4123.003006] [<ffffffff81484224>] ? ip_route_output_flow+0x54/0x60
> [ 4123.003006] [<ffffffff8148ca6a>] ip_queue_xmit+0x36a/0x3d0
> [ 4123.003006] [<ffffffff814a4799>] tcp_transmit_skb+0x4b9/0x990
> [ 4123.003006] [<ffffffff814a4d85>] tcp_write_xmit+0x115/0xe90
> [ 4123.003006] [<ffffffff814a5d72>] __tcp_push_pending_frames+0x32/0xd0
> [ 4123.003006] [<ffffffff8149443f>] tcp_push+0xef/0x120
> [ 4123.003006] [<ffffffff81497cb5>] tcp_sendmsg+0xc5/0xb20
> [ 4123.003006] [<ffffffff810d74c9>] ? lock_hrtimer_base.isra.22+0x29/0x50
> [ 4123.003006] [<ffffffff814c2d04>] inet_sendmsg+0x64/0xa0
> [ 4123.003006] [<ffffffff811e94b5>] ? __fget_light+0x25/0x70
> [ 4123.003006] [<ffffffff8142d74d>] sock_sendmsg+0x3d/0x50
> [ 4123.003006] [<ffffffff8142dc12>] SYSC_sendto+0x102/0x1a0
> [ 4123.003006] [<ffffffff8110f864>] ? __audit_syscall_entry+0xb4/0x110
> [ 4123.003006] [<ffffffff810224fc>] ? do_audit_syscall_entry+0x6c/0x70
> [ 4123.003006] [<ffffffff81023cf3>] ?
> syscall_trace_enter_phase1+0x103/0x160
> [ 4123.003006] [<ffffffff8142e75e>] SyS_sendto+0xe/0x10
> [ 4123.003006] [<ffffffff8154fc6e>] system_call_fastpath+0x12/0x71
> [ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
> 41 83 b9 80 00
> [ 4123.003006] RIP [<ffffffffa0233027>] 0xffffffffa0233027
> [ 4123.003006] RSP <ffff88021fc03b58>

Presumably the same two functions as before (loaded at a different
base address but same offsets, 0xd81 and 0x72b). And then nf_iterate
call into another unknown function, and there really is code there
and it's consistent with the oops. And the kernel thinks it's
outside of any normal text section, so it does not try to dump any
code from before the instruction pointer.

0: 48 8b 88 40 03 00 00 mov 0x340(%rax),%rcx
7: e8 1d dd dd ff callq 0xffffffffffdddd29
c: 5d pop %rbp
d: c3 retq

Did you write your own module loader or something?

2015-08-26 13:01:20

by Shaun Crampton

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

>And the kernel thinks it's
>outside of any normal text section, so it does not try to dump any
>code from before the instruction pointer.
>
> 0: 48 8b 88 40 03 00 00 mov 0x340(%rax),%rcx
> 7: e8 1d dd dd ff callq 0xffffffffffdddd29
> c: 5d pop %rbp
> d: c3 retq
>
>Did you write your own module loader or something?

We certainly didn't but CoreOS may have. I've asked CoreOS if they know
what's going on.

Are there any extra diagnostics I can gather from a CoreOS system to help
figure out what's going on there? Is there anything I can do to get more
useful diagnostics when one of these failures occur? As noted, I can
reproduce the issue but it's expensive, requiring hundreds of VMs to
hammer away for an hour or so.

2015-08-26 20:54:37

by Michael Marineau

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Wed, Aug 26, 2015 at 4:49 AM, Chuck Ebbert <[email protected]> wrote:
> On Wed, 26 Aug 2015 08:46:59 +0000
> Shaun Crampton <[email protected]> wrote:
>
>> Testing our app at scale on Google¹s GCE, running ~1000 CoreOS hosts: over
>> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
>> below and then reboot (I¹m not sure if the different oopses are related to
>> each other).
>>
>> The app is Project Calico, which is a datacenter networking fabric.
>> calico-felix, the process named below, is our per-host agent. The
>> per-host agent is responsible for reading the network information from a
>> central server and applying "ip route² and "iptables" updates to the
>> kernel. We¹re running on CoreOS, with about 100 docker containers/veths
>> pairs running on each host. calico-felix is running inside one of those
>> containers. We also run the BIRD BGP stack to redistribute routes around
>> the datacenter. The errors happen more frequently while Calico is under
>> load.
>>
>> I¹m not sure where to go from here. I can reproduce these issues easily
>> at that scale but I haven¹t managed to boil it down to a small-scale repro
>> scenario for further investigation (yet).
>>
>
> What in the world is going on with those call traces? E.g.:
>
>> [ 4513.712008] <IRQ>
>> [ 4513.712008] [<ffffffff81486751>] ? ip_rcv_finish+0x81/0x360
>> [ 4513.712008] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
>> [ 4513.712008] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
>> [ 4513.712008] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
>> [ 4513.712008] [<ffffffff8143b667>] ? build_skb+0x17/0x90
>> [ 4513.712008] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
>> [ 4513.712008] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
>> [ 4513.712008] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
>> [ 4513.712008] [<ffffffffa00f772b>] 0xffffffffa00f772b
>> [ 4513.712008] [<ffffffff814491b3>] ? __netif_receive_skb_core+0x6c3/0x9a0
>> [ 4513.712008] [<ffffffffa00f7d81>] 0xffffffffa00f7d81
>> [ 4513.712008] [<ffffffff81449979>] net_rx_action+0x159/0x340
>> [ 4513.712008] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
>> [ 4513.712008] [<ffffffff810719fd>] irq_exit+0xad/0xc0
>> [ 4513.712008] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
>> [ 4513.712008] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
>> [ 4513.712008] <EOI>
>
> There are two functions in the call trace that the kernel knows
> nothing about. How did they get in there?
>
> And there is really executable code in there, as can be seen from a
> later trace:
>
>> [ 4123.003006] <IRQ>
>> [ 4123.003006] [<ffffffff8147d477>] nf_iterate+0x57/0x80
>> [ 4123.003006] [<ffffffff8147d537>] nf_hook_slow+0x97/0x100
>> [ 4123.003006] [<ffffffff81486e32>] ip_local_deliver+0x92/0xa0
>> [ 4123.003006] [<ffffffff81486a30>] ? ip_rcv_finish+0x360/0x360
>> [ 4123.003006] [<ffffffff81486751>] ip_rcv_finish+0x81/0x360
>> [ 4123.003006] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
>> [ 4123.003006] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
>> [ 4123.003006] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
>> [ 4123.003006] [<ffffffff8143b667>] ? build_skb+0x17/0x90
>> [ 4123.003006] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
>> [ 4123.003006] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
>> [ 4123.003006] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
>> [ 4123.003006] [<ffffffffa00d472b>] 0xffffffffa00d472b
>> [ 4123.003006] [<ffffffffa00d4d81>] 0xffffffffa00d4d81
>> [ 4123.003006] [<ffffffff81449979>] net_rx_action+0x159/0x340
>> [ 4123.003006] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
>> [ 4123.003006] [<ffffffff810719fd>] irq_exit+0xad/0xc0
>> [ 4123.003006] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
>> [ 4123.003006] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
>> [ 4123.003006] <EOI>
>> [ 4123.003006] [<ffffffff81483a3d>] ? __ip_route_output_key+0x31d/0x860
>> [ 4123.003006] [<ffffffff814e2e95>] ? xfrm_lookup_route+0x5/0x70
>> [ 4123.003006] [<ffffffff81484224>] ? ip_route_output_flow+0x54/0x60
>> [ 4123.003006] [<ffffffff8148ca6a>] ip_queue_xmit+0x36a/0x3d0
>> [ 4123.003006] [<ffffffff814a4799>] tcp_transmit_skb+0x4b9/0x990
>> [ 4123.003006] [<ffffffff814a4d85>] tcp_write_xmit+0x115/0xe90
>> [ 4123.003006] [<ffffffff814a5d72>] __tcp_push_pending_frames+0x32/0xd0
>> [ 4123.003006] [<ffffffff8149443f>] tcp_push+0xef/0x120
>> [ 4123.003006] [<ffffffff81497cb5>] tcp_sendmsg+0xc5/0xb20
>> [ 4123.003006] [<ffffffff810d74c9>] ? lock_hrtimer_base.isra.22+0x29/0x50
>> [ 4123.003006] [<ffffffff814c2d04>] inet_sendmsg+0x64/0xa0
>> [ 4123.003006] [<ffffffff811e94b5>] ? __fget_light+0x25/0x70
>> [ 4123.003006] [<ffffffff8142d74d>] sock_sendmsg+0x3d/0x50
>> [ 4123.003006] [<ffffffff8142dc12>] SYSC_sendto+0x102/0x1a0
>> [ 4123.003006] [<ffffffff8110f864>] ? __audit_syscall_entry+0xb4/0x110
>> [ 4123.003006] [<ffffffff810224fc>] ? do_audit_syscall_entry+0x6c/0x70
>> [ 4123.003006] [<ffffffff81023cf3>] ?
>> syscall_trace_enter_phase1+0x103/0x160
>> [ 4123.003006] [<ffffffff8142e75e>] SyS_sendto+0xe/0x10
>> [ 4123.003006] [<ffffffff8154fc6e>] system_call_fastpath+0x12/0x71
>> [ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
>> 41 83 b9 80 00
>> [ 4123.003006] RIP [<ffffffffa0233027>] 0xffffffffa0233027
>> [ 4123.003006] RSP <ffff88021fc03b58>
>
> Presumably the same two functions as before (loaded at a different
> base address but same offsets, 0xd81 and 0x72b). And then nf_iterate
> call into another unknown function, and there really is code there
> and it's consistent with the oops. And the kernel thinks it's
> outside of any normal text section, so it does not try to dump any
> code from before the instruction pointer.
>
> 0: 48 8b 88 40 03 00 00 mov 0x340(%rax),%rcx
> 7: e8 1d dd dd ff callq 0xffffffffffdddd29
> c: 5d pop %rbp
> d: c3 retq
>
> Did you write your own module loader or something?

These are stock kernels, with the exception that we include the secure
boot patch set:
https://github.com/coreos/coreos-overlay/tree/master/sys-kernel/coreos-sources/files/4.1
Been a while since kmod got updated so CoreOS is currently shipping
with kmod-15 but beyond being a bit old there isn't anything special
about the module loader.

So nothing particularly magical going on here that I know of.

For reference the original bug report includes a few more varieties of
stack traces: https://github.com/coreos/bugs/issues/435

2015-08-27 13:00:12

by Eric Dumazet

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Wed, 2015-08-26 at 13:54 -0700, Michael Marineau wrote:
> On Wed, Aug 26, 2015 at 4:49 AM, Chuck Ebbert <[email protected]> wrote:
> > On Wed, 26 Aug 2015 08:46:59 +0000
> > Shaun Crampton <[email protected]> wrote:
> >
> >> Testing our app at scale on Google¹s GCE, running ~1000 CoreOS hosts: over
> >> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
> >> below and then reboot (I¹m not sure if the different oopses are related to
> >> each other).
> >>
> >> The app is Project Calico, which is a datacenter networking fabric.
> >> calico-felix, the process named below, is our per-host agent. The
> >> per-host agent is responsible for reading the network information from a
> >> central server and applying "ip route² and "iptables" updates to the
> >> kernel. We¹re running on CoreOS, with about 100 docker containers/veths
> >> pairs running on each host. calico-felix is running inside one of those
> >> containers. We also run the BIRD BGP stack to redistribute routes around
> >> the datacenter. The errors happen more frequently while Calico is under
> >> load.
> >>
> >> I¹m not sure where to go from here. I can reproduce these issues easily
> >> at that scale but I haven¹t managed to boil it down to a small-scale repro
> >> scenario for further investigation (yet).
> >>
> >
> > What in the world is going on with those call traces? E.g.:
> >
> >> [ 4513.712008] <IRQ>
> >> [ 4513.712008] [<ffffffff81486751>] ? ip_rcv_finish+0x81/0x360
> >> [ 4513.712008] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
> >> [ 4513.712008] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
> >> [ 4513.712008] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
> >> [ 4513.712008] [<ffffffff8143b667>] ? build_skb+0x17/0x90
> >> [ 4513.712008] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
> >> [ 4513.712008] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
> >> [ 4513.712008] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
> >> [ 4513.712008] [<ffffffffa00f772b>] 0xffffffffa00f772b
> >> [ 4513.712008] [<ffffffff814491b3>] ? __netif_receive_skb_core+0x6c3/0x9a0
> >> [ 4513.712008] [<ffffffffa00f7d81>] 0xffffffffa00f7d81
> >> [ 4513.712008] [<ffffffff81449979>] net_rx_action+0x159/0x340
> >> [ 4513.712008] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
> >> [ 4513.712008] [<ffffffff810719fd>] irq_exit+0xad/0xc0
> >> [ 4513.712008] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
> >> [ 4513.712008] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
> >> [ 4513.712008] <EOI>
> >
> > There are two functions in the call trace that the kernel knows
> > nothing about. How did they get in there?
> >
> > And there is really executable code in there, as can be seen from a
> > later trace:
> >
> >> [ 4123.003006] <IRQ>
> >> [ 4123.003006] [<ffffffff8147d477>] nf_iterate+0x57/0x80
> >> [ 4123.003006] [<ffffffff8147d537>] nf_hook_slow+0x97/0x100
> >> [ 4123.003006] [<ffffffff81486e32>] ip_local_deliver+0x92/0xa0
> >> [ 4123.003006] [<ffffffff81486a30>] ? ip_rcv_finish+0x360/0x360
> >> [ 4123.003006] [<ffffffff81486751>] ip_rcv_finish+0x81/0x360
> >> [ 4123.003006] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
> >> [ 4123.003006] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
> >> [ 4123.003006] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
> >> [ 4123.003006] [<ffffffff8143b667>] ? build_skb+0x17/0x90
> >> [ 4123.003006] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
> >> [ 4123.003006] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
> >> [ 4123.003006] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
> >> [ 4123.003006] [<ffffffffa00d472b>] 0xffffffffa00d472b
> >> [ 4123.003006] [<ffffffffa00d4d81>] 0xffffffffa00d4d81
> >> [ 4123.003006] [<ffffffff81449979>] net_rx_action+0x159/0x340
> >> [ 4123.003006] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
> >> [ 4123.003006] [<ffffffff810719fd>] irq_exit+0xad/0xc0
> >> [ 4123.003006] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
> >> [ 4123.003006] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
> >> [ 4123.003006] <EOI>
> >> [ 4123.003006] [<ffffffff81483a3d>] ? __ip_route_output_key+0x31d/0x860
> >> [ 4123.003006] [<ffffffff814e2e95>] ? xfrm_lookup_route+0x5/0x70
> >> [ 4123.003006] [<ffffffff81484224>] ? ip_route_output_flow+0x54/0x60
> >> [ 4123.003006] [<ffffffff8148ca6a>] ip_queue_xmit+0x36a/0x3d0
> >> [ 4123.003006] [<ffffffff814a4799>] tcp_transmit_skb+0x4b9/0x990
> >> [ 4123.003006] [<ffffffff814a4d85>] tcp_write_xmit+0x115/0xe90
> >> [ 4123.003006] [<ffffffff814a5d72>] __tcp_push_pending_frames+0x32/0xd0
> >> [ 4123.003006] [<ffffffff8149443f>] tcp_push+0xef/0x120
> >> [ 4123.003006] [<ffffffff81497cb5>] tcp_sendmsg+0xc5/0xb20
> >> [ 4123.003006] [<ffffffff810d74c9>] ? lock_hrtimer_base.isra.22+0x29/0x50
> >> [ 4123.003006] [<ffffffff814c2d04>] inet_sendmsg+0x64/0xa0
> >> [ 4123.003006] [<ffffffff811e94b5>] ? __fget_light+0x25/0x70
> >> [ 4123.003006] [<ffffffff8142d74d>] sock_sendmsg+0x3d/0x50
> >> [ 4123.003006] [<ffffffff8142dc12>] SYSC_sendto+0x102/0x1a0
> >> [ 4123.003006] [<ffffffff8110f864>] ? __audit_syscall_entry+0xb4/0x110
> >> [ 4123.003006] [<ffffffff810224fc>] ? do_audit_syscall_entry+0x6c/0x70
> >> [ 4123.003006] [<ffffffff81023cf3>] ?
> >> syscall_trace_enter_phase1+0x103/0x160
> >> [ 4123.003006] [<ffffffff8142e75e>] SyS_sendto+0xe/0x10
> >> [ 4123.003006] [<ffffffff8154fc6e>] system_call_fastpath+0x12/0x71
> >> [ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
> >> 41 83 b9 80 00
> >> [ 4123.003006] RIP [<ffffffffa0233027>] 0xffffffffa0233027
> >> [ 4123.003006] RSP <ffff88021fc03b58>
> >
> > Presumably the same two functions as before (loaded at a different
> > base address but same offsets, 0xd81 and 0x72b). And then nf_iterate
> > call into another unknown function, and there really is code there
> > and it's consistent with the oops. And the kernel thinks it's
> > outside of any normal text section, so it does not try to dump any
> > code from before the instruction pointer.
> >
> > 0: 48 8b 88 40 03 00 00 mov 0x340(%rax),%rcx
> > 7: e8 1d dd dd ff callq 0xffffffffffdddd29
> > c: 5d pop %rbp
> > d: c3 retq
> >
> > Did you write your own module loader or something?
>
> These are stock kernels, with the exception that we include the secure
> boot patch set:
> https://github.com/coreos/coreos-overlay/tree/master/sys-kernel/coreos-sources/files/4.1
> Been a while since kmod got updated so CoreOS is currently shipping
> with kmod-15 but beyond being a bit old there isn't anything special
> about the module loader.
>
> So nothing particularly magical going on here that I know of.
>
> For reference the original bug report includes a few more varieties of
> stack traces: https://github.com/coreos/bugs/issues/435

One of these traces mentions ipv4_dst_destroy()

Make sure you backported commit
10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a
("udp: fix dst races with multicast early demux")

2015-08-27 16:16:10

by Michael Marineau

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Thu, Aug 27, 2015 at 6:00 AM, Eric Dumazet <[email protected]> wrote:
> On Wed, 2015-08-26 at 13:54 -0700, Michael Marineau wrote:
>> On Wed, Aug 26, 2015 at 4:49 AM, Chuck Ebbert <[email protected]> wrote:
>> > On Wed, 26 Aug 2015 08:46:59 +0000
>> > Shaun Crampton <[email protected]> wrote:
>> >
>> >> Testing our app at scale on Google¹s GCE, running ~1000 CoreOS hosts: over
>> >> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
>> >> below and then reboot (I¹m not sure if the different oopses are related to
>> >> each other).
>> >>
>> >> The app is Project Calico, which is a datacenter networking fabric.
>> >> calico-felix, the process named below, is our per-host agent. The
>> >> per-host agent is responsible for reading the network information from a
>> >> central server and applying "ip route² and "iptables" updates to the
>> >> kernel. We¹re running on CoreOS, with about 100 docker containers/veths
>> >> pairs running on each host. calico-felix is running inside one of those
>> >> containers. We also run the BIRD BGP stack to redistribute routes around
>> >> the datacenter. The errors happen more frequently while Calico is under
>> >> load.
>> >>
>> >> I¹m not sure where to go from here. I can reproduce these issues easily
>> >> at that scale but I haven¹t managed to boil it down to a small-scale repro
>> >> scenario for further investigation (yet).
>> >>
>> >
>> > What in the world is going on with those call traces? E.g.:
>> >
>> >> [ 4513.712008] <IRQ>
>> >> [ 4513.712008] [<ffffffff81486751>] ? ip_rcv_finish+0x81/0x360
>> >> [ 4513.712008] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
>> >> [ 4513.712008] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
>> >> [ 4513.712008] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
>> >> [ 4513.712008] [<ffffffff8143b667>] ? build_skb+0x17/0x90
>> >> [ 4513.712008] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
>> >> [ 4513.712008] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
>> >> [ 4513.712008] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
>> >> [ 4513.712008] [<ffffffffa00f772b>] 0xffffffffa00f772b
>> >> [ 4513.712008] [<ffffffff814491b3>] ? __netif_receive_skb_core+0x6c3/0x9a0
>> >> [ 4513.712008] [<ffffffffa00f7d81>] 0xffffffffa00f7d81
>> >> [ 4513.712008] [<ffffffff81449979>] net_rx_action+0x159/0x340
>> >> [ 4513.712008] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
>> >> [ 4513.712008] [<ffffffff810719fd>] irq_exit+0xad/0xc0
>> >> [ 4513.712008] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
>> >> [ 4513.712008] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
>> >> [ 4513.712008] <EOI>
>> >
>> > There are two functions in the call trace that the kernel knows
>> > nothing about. How did they get in there?
>> >
>> > And there is really executable code in there, as can be seen from a
>> > later trace:
>> >
>> >> [ 4123.003006] <IRQ>
>> >> [ 4123.003006] [<ffffffff8147d477>] nf_iterate+0x57/0x80
>> >> [ 4123.003006] [<ffffffff8147d537>] nf_hook_slow+0x97/0x100
>> >> [ 4123.003006] [<ffffffff81486e32>] ip_local_deliver+0x92/0xa0
>> >> [ 4123.003006] [<ffffffff81486a30>] ? ip_rcv_finish+0x360/0x360
>> >> [ 4123.003006] [<ffffffff81486751>] ip_rcv_finish+0x81/0x360
>> >> [ 4123.003006] [<ffffffff814870e4>] ip_rcv+0x2a4/0x400
>> >> [ 4123.003006] [<ffffffff814866d0>] ? inet_del_offload+0x40/0x40
>> >> [ 4123.003006] [<ffffffff814491b3>] __netif_receive_skb_core+0x6c3/0x9a0
>> >> [ 4123.003006] [<ffffffff8143b667>] ? build_skb+0x17/0x90
>> >> [ 4123.003006] [<ffffffff814494a8>] __netif_receive_skb+0x18/0x60
>> >> [ 4123.003006] [<ffffffff81449523>] netif_receive_skb_internal+0x33/0xa0
>> >> [ 4123.003006] [<ffffffff814495ac>] netif_receive_skb_sk+0x1c/0x70
>> >> [ 4123.003006] [<ffffffffa00d472b>] 0xffffffffa00d472b
>> >> [ 4123.003006] [<ffffffffa00d4d81>] 0xffffffffa00d4d81
>> >> [ 4123.003006] [<ffffffff81449979>] net_rx_action+0x159/0x340
>> >> [ 4123.003006] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
>> >> [ 4123.003006] [<ffffffff810719fd>] irq_exit+0xad/0xc0
>> >> [ 4123.003006] [<ffffffff815528ba>] do_IRQ+0x5a/0xf0
>> >> [ 4123.003006] [<ffffffff815507ae>] common_interrupt+0x6e/0x6e
>> >> [ 4123.003006] <EOI>
>> >> [ 4123.003006] [<ffffffff81483a3d>] ? __ip_route_output_key+0x31d/0x860
>> >> [ 4123.003006] [<ffffffff814e2e95>] ? xfrm_lookup_route+0x5/0x70
>> >> [ 4123.003006] [<ffffffff81484224>] ? ip_route_output_flow+0x54/0x60
>> >> [ 4123.003006] [<ffffffff8148ca6a>] ip_queue_xmit+0x36a/0x3d0
>> >> [ 4123.003006] [<ffffffff814a4799>] tcp_transmit_skb+0x4b9/0x990
>> >> [ 4123.003006] [<ffffffff814a4d85>] tcp_write_xmit+0x115/0xe90
>> >> [ 4123.003006] [<ffffffff814a5d72>] __tcp_push_pending_frames+0x32/0xd0
>> >> [ 4123.003006] [<ffffffff8149443f>] tcp_push+0xef/0x120
>> >> [ 4123.003006] [<ffffffff81497cb5>] tcp_sendmsg+0xc5/0xb20
>> >> [ 4123.003006] [<ffffffff810d74c9>] ? lock_hrtimer_base.isra.22+0x29/0x50
>> >> [ 4123.003006] [<ffffffff814c2d04>] inet_sendmsg+0x64/0xa0
>> >> [ 4123.003006] [<ffffffff811e94b5>] ? __fget_light+0x25/0x70
>> >> [ 4123.003006] [<ffffffff8142d74d>] sock_sendmsg+0x3d/0x50
>> >> [ 4123.003006] [<ffffffff8142dc12>] SYSC_sendto+0x102/0x1a0
>> >> [ 4123.003006] [<ffffffff8110f864>] ? __audit_syscall_entry+0xb4/0x110
>> >> [ 4123.003006] [<ffffffff810224fc>] ? do_audit_syscall_entry+0x6c/0x70
>> >> [ 4123.003006] [<ffffffff81023cf3>] ?
>> >> syscall_trace_enter_phase1+0x103/0x160
>> >> [ 4123.003006] [<ffffffff8142e75e>] SyS_sendto+0xe/0x10
>> >> [ 4123.003006] [<ffffffff8154fc6e>] system_call_fastpath+0x12/0x71
>> >> [ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
>> >> 41 83 b9 80 00
>> >> [ 4123.003006] RIP [<ffffffffa0233027>] 0xffffffffa0233027
>> >> [ 4123.003006] RSP <ffff88021fc03b58>
>> >
>> > Presumably the same two functions as before (loaded at a different
>> > base address but same offsets, 0xd81 and 0x72b). And then nf_iterate
>> > call into another unknown function, and there really is code there
>> > and it's consistent with the oops. And the kernel thinks it's
>> > outside of any normal text section, so it does not try to dump any
>> > code from before the instruction pointer.
>> >
>> > 0: 48 8b 88 40 03 00 00 mov 0x340(%rax),%rcx
>> > 7: e8 1d dd dd ff callq 0xffffffffffdddd29
>> > c: 5d pop %rbp
>> > d: c3 retq
>> >
>> > Did you write your own module loader or something?
>>
>> These are stock kernels, with the exception that we include the secure
>> boot patch set:
>> https://github.com/coreos/coreos-overlay/tree/master/sys-kernel/coreos-sources/files/4.1
>> Been a while since kmod got updated so CoreOS is currently shipping
>> with kmod-15 but beyond being a bit old there isn't anything special
>> about the module loader.
>>
>> So nothing particularly magical going on here that I know of.
>>
>> For reference the original bug report includes a few more varieties of
>> stack traces: https://github.com/coreos/bugs/issues/435
>
> One of these traces mentions ipv4_dst_destroy()
>
> Make sure you backported commit
> 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a
> ("udp: fix dst races with multicast early demux")

Oh, interesting. Looks like that patch didn't get CC'd to stable
though, is there a reason for that or just oversight?

2015-08-27 16:30:22

by Eric Dumazet

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Thu, 2015-08-27 at 09:16 -0700, Michael Marineau wrote:

>
> Oh, interesting. Looks like that patch didn't get CC'd to stable
> though, is there a reason for that or just oversight?

We never CC stable for networking patches.

David Miller prefers to take care of this himself.

( this is in Documentation/networking/netdev-FAQ.txt )

Q: How can I tell what patches are queued up for backporting to the
various stable releases?

A: Normally Greg Kroah-Hartman collects stable commits himself, but
for networking, Dave collects up patches he deems critical for the
networking subsystem, and then hands them off to Greg.

There is a patchworks queue that you can see here:
http://patchwork.ozlabs.org/bundle/davem/stable/?state=*

It contains the patches which Dave has selected, but not yet handed
off to Greg. If Greg already has the patch, then it will be here:
http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git

A quick way to find whether the patch is in this stable-queue is
to simply clone the repo, and then git grep the mainline commit ID, e.g.

stable-queue$ git grep -l 284041ef21fdf2e
releases/3.0.84/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
releases/3.4.51/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
releases/3.9.8/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
stable/stable-queue$

2015-08-27 16:32:40

by Michael Marineau

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Thu, Aug 27, 2015 at 9:30 AM, Eric Dumazet <[email protected]> wrote:
> On Thu, 2015-08-27 at 09:16 -0700, Michael Marineau wrote:
>
>>
>> Oh, interesting. Looks like that patch didn't get CC'd to stable
>> though, is there a reason for that or just oversight?
>
> We never CC stable for networking patches.
>
> David Miller prefers to take care of this himself.

Ah, right, sorry. forgot about that. :)

>
> ( this is in Documentation/networking/netdev-FAQ.txt )
>
> Q: How can I tell what patches are queued up for backporting to the
> various stable releases?
>
> A: Normally Greg Kroah-Hartman collects stable commits himself, but
> for networking, Dave collects up patches he deems critical for the
> networking subsystem, and then hands them off to Greg.
>
> There is a patchworks queue that you can see here:
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>
> It contains the patches which Dave has selected, but not yet handed
> off to Greg. If Greg already has the patch, then it will be here:
> http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git
>
> A quick way to find whether the patch is in this stable-queue is
> to simply clone the repo, and then git grep the mainline commit ID, e.g.
>
> stable-queue$ git grep -l 284041ef21fdf2e
> releases/3.0.84/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
> releases/3.4.51/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
> releases/3.9.8/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
> stable/stable-queue$
>
>
>

2015-08-27 16:40:15

by David Miller

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

From: Michael Marineau <[email protected]>
Date: Thu, 27 Aug 2015 09:16:06 -0700

> On Thu, Aug 27, 2015 at 6:00 AM, Eric Dumazet <[email protected]> wrote:
>> Make sure you backported commit
>> 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a
>> ("udp: fix dst races with multicast early demux")
>
> Oh, interesting. Looks like that patch didn't get CC'd to stable
> though, is there a reason for that or just oversight?

All networking bug fixes are submitted to -stable by hand by me at a
time of my choosing. We do not use the "CC: stable" facility, as I
feel it pushes patches into -stable way too quickly and before the
change gets sufficient exposure for regressions in Linus's tree.

The patch in question got submitted last night.

2015-08-27 16:47:50

by Michael Marineau

[permalink] [raw]

Subject: Re: ip_rcv_finish() NULL pointer and possibly related Oopses

On Thu, Aug 27, 2015 at 9:40 AM, David Miller <[email protected]> wrote:
> From: Michael Marineau <[email protected]>
> Date: Thu, 27 Aug 2015 09:16:06 -0700
>
>> On Thu, Aug 27, 2015 at 6:00 AM, Eric Dumazet <[email protected]> wrote:
>>> Make sure you backported commit
>>> 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a
>>> ("udp: fix dst races with multicast early demux")
>>
>> Oh, interesting. Looks like that patch didn't get CC'd to stable
>> though, is there a reason for that or just oversight?
>
> All networking bug fixes are submitted to -stable by hand by me at a
> time of my choosing. We do not use the "CC: stable" facility, as I
> feel it pushes patches into -stable way too quickly and before the
> change gets sufficient exposure for regressions in Linus's tree.
>
> The patch in question got submitted last night.

Great, thank you!