Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751592AbaK0NzJ (ORCPT ); Thu, 27 Nov 2014 08:55:09 -0500
Received: from mx1.redhat.com ([209.132.183.28]:50737 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751422AbaK0NzH (ORCPT ); Thu, 27 Nov 2014 08:55:07 -0500
Message-ID: <54772D25.3010208@redhat.com>
Date: Thu, 27 Nov 2014 14:54:45 +0100
From: Daniel Borkmann
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: =?UTF-8?B?0JTQtdC90LjRgdC60LAt0YDQtdC00LjRgdC60LA=?=
CC: linux-net@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, minipli@googlemail.com, fw@strlen.de
Subject: Re: bug in networking code causes GPF
References: <1417095336.547728a8852ee@mail.inbox.lv>
In-Reply-To: <1417095336.547728a8852ee@mail.inbox.lv>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/27/2014 02:35 PM, Дениска-редиска wrote:
> hello,
>
> i run ipvs DR on 2 servers under heavy load - up to 1Gbps of traffic.
> Time to time the server where ipvs runs master IP (VIP) get general protection fault. Switching master to another server make no difference - after some time GPF come. So I assume it is not hardware issue.
>
> There are logs from both servers with different kernels (i run kernel with grsecurity patch set from Gentoo hardened portage tree):

Hmm, looks pretty much like ...

   http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/54903

... which was a bug in the grsec patch set.
Does your grsec kernel have:

commit 0fa213cce614ad25a79acbd06f37f1e9022134d9
Author: Brad Spengler
Date:   Fri Oct 31 17:29:20 2014 -0400

     From: Mathias Krause
     To: PaX Team
     Cc: Brad Spengler, Mathias Krause
     Subject: [PATCH] pax: don't sanitize RCU slab caches

     We cannot sanitize SLAB_DESTROY_BY_RCU slab caches in
     kmem_cache_free() as there might be readers in this RCU period,
     wanting to access the object. Fix this, for now, by marking those
     with SLAB_NO_SANITIZE. Hopefully we can have a real fix later on.
     But this should fix the RCU stalls and netfilter conntrack related
     problems.

     This patch should go on top of the previous patch.

     Signed-off-by: Mathias Krause

> [354497.931834] general protection fault: 0000 [#1] SMP
> [354497.931903] CPU: 14 PID: 0 Comm: swapper/14 Not tainted 3.13.10-hardened.standart.20140515 #1
> [354497.931993] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.5 11/25/2013
> [354497.932082] task: ffff88021e4b2ca0 ti: ffff88021e4b3100 task.ti: ffff88021e4b3100
> [354497.932167] RIP: 0010:[] [] ffffffff81653ca2
> [354497.932278] RSP: 0000:ffff88021fd03b98 EFLAGS: 00010246
> [354497.932330] RAX: 0000000000013ba0 RBX: fefefefefefefefe RCX: 000000000001bc30
> [354497.932413] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [354497.932497] RBP: ffff88021fd03c40 R08: 00000000cacb7f0b R09: ffff88021fd03c58
> [354497.932580] R10: ffffffffffffffff R11: ffff88041de33280 R12: 8000000000000000
> [354497.932663] R13: 0000000000003786 R14: ffffffff81a82540 R15: 0000000000000000
> [354497.932749] FS: 000003853a8a7740(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
> [354497.932836] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [354497.932891] CR2: 000003d8a933b2d0 CR3: 000000000174a000 CR4: 00000000000407f0
> [354497.932973] Stack:
> [354497.933013] 0000000000000000 ffffffff81a82540 00000000de1b1efe 0000000000000000
> [354497.933110] ffff88021fd03c40 ffffffff81653f6d ffffffff81a92cc0 ffffffff81a82540
> [354497.933206] ffff88041d70c500 0000000000000000 00000000de1b1efe ffffffff81654f6c
> [354497.933304] Call Trace:
> [354497.933347]
> [354497.933357] [] ? __nf_conntrack_find_get+0x28/0x13b
> [354497.933484] [] ? nf_conntrack_in+0x253/0x73e
> [354497.933544] [] ? nf_iterate+0x40/0x7d
> [354497.933601] [] ? inet_del_offload+0x39/0x39
> [354497.933658] [] ? nf_hook_slow+0x6c/0x104
> [354497.933714] [] ? inet_del_offload+0x39/0x39
> [354497.933770] [] ? ip_rcv+0x313/0x35f
> [354497.933824] [] ? ip_local_deliver_finish+0xb8/0x11f
> [354497.933885] [] ? __netif_receive_skb_core+0x44d/0x4e2
> [354497.933944] [] ? netif_receive_skb+0x4c/0x81
> [354497.934000] [] ? napi_gro_receive+0x35/0x7a
> [354497.934058] [] ? igb_poll+0xa49/0xd13
> [354497.934115] [] ? __wake_up+0x38/0x49
> [354497.934169] [] ? net_rx_action+0xa6/0x172
> [354497.934225] [] ? __do_softirq+0xb9/0x1ae
> [354497.934280] [] ? irq_exit+0x37/0x7a
> [354497.934335] [] ? do_IRQ+0x96/0xb0
> [354497.934389] [] ? common_interrupt+0x97/0x97
> [354497.934441]
> [354497.934451] [] ? update_ts_time_stats+0x30/0x76
> [354497.934548] [] ? arch_remove_reservations+0x6a/0x6a
> [354497.934607] [] ? default_idle+0x3/0x9
> [354497.934676] [] ? arch_cpu_idle+0x6/0x1e
> [354497.934732] [] ? arch_remove_reservations+0x6a/0x6a
> [354497.934791] [] ? cpu_startup_entry+0xe9/0x15b
> [354497.934850] [] ? start_secondary+0x2f9/0x32c
> [354497.934903] Code: c2 85 d2 49 8b 86 d0 04 00 00 74 14 66 45 85 ff 75 0e 65 ff 40 04 e8 85 f6 a4 ff 48 89 d8 eb 69 65 ff 00 48 8b 1b f6 c3 01 75 0f <8b> 43 10 39 45 00 b8 00 00 00 00 74 83 eb 9d 48 d1 eb 4c 39 eb
> [354497.935402] RIP [] ffffffff81653ca2
> [354497.935456] RSP
> [354497.935965] ---[ end trace 7d6f660245b2d541 ]---
> [354497.936080] Kernel panic - not syncing: Fatal exception in interrupt
> [354498.016801] Rebooting in 10 seconds.
>
>
> [674944.621564] general protection fault: 0000 [#1] SMP
> [674944.621637] CPU: 12 PID: 17984 Comm: nginx Not tainted 3.15.10-hardened-r1.standart.20140925 #1
> [674944.621728] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.5 11/25/2013
> [674944.621817] task: ffff88021e1d7700 ti: ffff88021e1d7c68 task.ti: ffff88021e1d7c68
> [674944.621903] RIP: 0010:[] [] ffffffff816f2be8
> [674944.621990] RSP: 0000:ffff88021fc03ce8 EFLAGS: 00010246
> [674944.622057] RAX: ffffc90011901000 RBX: 822098c2102098c2 RCX: 000000005823edca
> [674944.622143] RDX: fefefefefefefefe RSI: 000000009e90f1ad RDI: ffffffff81a8ad40
> [674944.622226] RBP: 000000000050abb3 R08: 000000000050abb3 R09: 000000000001f106
> [674944.622310] R10: ffffea00100cbd80 R11: ffffea00100cbd80 R12: 8000000000000000
> [674944.622394] R13: ffffffff81a8ad40 R14: 0000000049c3f106 R15: ffffc900119f9830
> [674944.622479] FS: 0000029d6fd04740(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
> [674944.622566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [674944.622619] CR2: ffffffffff600400 CR3: 0000000001787000 CR4: 00000000000407f0
> [674944.622701] Stack:
> [674944.622741] ffffffff816e9360 ffffffff00000050 ffffffff822098c2 abb3000280000000
> [674944.622839] ffff88006e9c2b00 ffff88011cbd1bce ffff88021e0c0000 0000000000000008
> [674944.622935] ffffffff81a955b0 ffffffff8170920a ffff880100000003 0000000000000008
> [674944.623031] Call Trace:
> [674944.623077]
> [674944.623087] [] ? inet_del_offload+0x39/0x39
> [674944.623192] [] ? tcp_v4_early_demux+0x14c/0x1bd
> [674944.623250] [] ? ip_rcv_finish+0x50/0x2c1
> [674944.623326] [] ? __netif_receive_skb_core+0x3c8/0x456
> [674944.623386] [] ? netif_receive_skb_internal+0x4c/0x81
> [674944.623447] [] ? napi_gro_receive+0x36/0x7c
> [674944.623511] [] ? igb_poll+0xa8b/0xd5b
> [674944.623572] [] ? __note_gp_changes+0x31/0x61
> [674944.623630] [] ? net_rx_action+0xa6/0x172
> [674944.623688] [] ? __do_softirq+0xf6/0x1fb
> [674944.623744] [] ? irq_exit+0x38/0x7c
> [674944.623798] [] ? do_IRQ+0xb3/0xce
> [674944.623853] [] ? common_interrupt+0x97/0x97
> [674944.623906]
> [674944.623917] Code: 6a d4 75 0e 48 39 5a c8 74 51 eb 06 3b 44 24 50 74 50 4c 89 4c 24 08 e8 e8 fe ff ff 4c 8b 4c 24 08 eb 83 48 8b 12 f6 c2 01 75 0b <44> 39 72 d0 75 f2 e9 75 ff ff ff 48 d1 ea 4c 39 ca 0f 85 64 ff
> [674944.624456] RIP [] ffffffff816f2be8
> [674944.624536] RSP
> [674944.625020] ---[ end trace 8035e2b5322bab00 ]---
> [674944.625126] Kernel panic - not syncing: Fatal exception in interrupt
> [674944.706563] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> [674944.706711] Rebooting in 10 seconds.
>
>
> [7523332.314991] general protection fault: 0000 [#1] SMP
> [7523332.315078] CPU: 4 PID: 25432 Comm: nginx Not tainted 3.15.8-hardened.standart.20140901 #1
> [7523332.315172] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0 09/10/2012
> [7523332.315266] task: ffff88041eb98000 ti: ffff88041eb98568 task.ti: ffff88041eb98568
> [7523332.315355] RIP: 0010:[] [] ffffffff8168db79
> [7523332.315446] RSP: 0018:ffff88021fa03bf8 EFLAGS: 00010246
> [7523332.316983] RAX: 00000000000149c0 RBX: ffffffff81a8ac80 RCX: 00000000000011d5
> [7523332.317070] RDX: 0000000000000000 RSI: 0000000000008ea8 RDI: ffffffff81a8acfe
> [7523332.317187] RBP: ffff88021fa03c5c R08: 00000000b96542ae R09: ffff88021fa03c74
> [7523332.317274] R10: 0000000000000002 R11: ffff880238b8ce00 R12: 8000000000000000
> [7523332.317360] R13: fefefefefefefefe R14: 0000000000000000 R15: 0000000047567b68
> [7523332.317448] FS: 0000031d200c5740(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
> [7523332.317538] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [7523332.317594] CR2: 000004373dcef000 CR3: 0000000001779000 CR4: 00000000000007f0
> [7523332.317679] Stack:
> [7523332.317722] 0000000000000000 ffffffff81a8ac80 ffff880003e08200 0000000000000000
> [7523332.317824] ffffffff81a9bf60 ffffffff8168ef87 ffffffff81a9bf60 ffffffff81a96970
> [7523332.317925] 0000000047567b68 ffffffff81a96970 0000000281a90002 0000000000000014
> [7523332.318026] Call Trace:
> [7523332.318072]
> [7523332.318085] [] ? nf_conntrack_in+0x2c1/0x846
> [7523332.318199] [] ? nf_iterate+0x41/0x81
> [7523332.318259] [] ? inet_del_offload+0x39/0x39
> [7523332.318321] [] ? nf_hook_slow+0x76/0x111
> [7523332.318393] [] ? inet_del_offload+0x39/0x39
> [7523332.318453] [] ? ip_rcv+0x2f4/0x356
> [7523332.318512] [] ? __netif_receive_skb_core+0x3d9/0x410
> [7523332.318575] [] ? netif_receive_skb_internal+0x6d/0x77
> [7523332.318640] [] ? napi_gro_receive+0x36/0x7c
> [7523332.318702] [] ? igb_poll+0xa46/0xd09
> [7523332.318762] [] ? __list_add+0x1b/0x37
> [7523332.318820] [] ? net_rx_action+0xa0/0x171
> [7523332.318882] [] ? __do_softirq+0xf7/0x1fa
> [7523332.318943] [] ? do_softirq_own_stack+0x1c/0x30
> [7523332.318999]
> [7523332.319013] [] ? do_softirq+0x24/0x2c
> [7523332.319112] [] ? __local_bh_enable_ip+0x66/0x74
> [7523332.319174] [] ? ipt_do_table+0x5c6/0x5f0
> [7523332.319235] [] ? nf_iterate+0x41/0x81
> [7523332.319293] [] ? ip_options_rcv_srr+0x1c7/0x1c7
> [7523332.319354] [] ? nf_hook_slow+0x76/0x111
> [7523332.319412] [] ? ip_options_rcv_srr+0x1c7/0x1c7
> [7523332.319473] [] ? __ip_local_out+0x64/0x6e
> [7523332.319533] [] ? __sk_dst_check+0x34/0x63
> [7523332.319617] [] ? ip_local_out_sk+0x12/0x39
> [7523332.319676] [] ? ip_queue_xmit+0x2ab/0x2db
> [7523332.319739] [] ? tcp_transmit_skb+0x6eb/0x735
> [7523332.319801] [] ? tcp_write_xmit+0x82e/0x969
> [7523332.319861] [] ? tcp_sendpage+0x50b/0x5e4
> [7523332.319923] [] ? direct_splice_actor+0x49/0x49
> [7523332.319986] [] ? inet_sendpage+0xbc/0xe0
> [7523332.320045] [] ? kernel_sendpage+0x49/0x59
> [7523332.320104] [] ? sock_sendpage+0x47/0x53
> [7523332.320163] [] ? pipe_to_sendpage+0x6f/0x7c
> [7523332.320223] [] ? splice_from_pipe_feed+0x7f/0x10e
> [7523332.320285] [] ? direct_splice_actor+0x49/0x49
> [7523332.320347] [] ? __splice_from_pipe+0x3a/0x6b
> [7523332.320408] [] ? splice_from_pipe+0x66/0x87
> [7523332.320468] [] ? direct_splice_actor+0x49/0x49
> [7523332.320533] [] ? direct_splice_actor+0x3f/0x49
> [7523332.320599] [] ? splice_direct_to_actor+0xd3/0x18d
> [7523332.320661] [] ? generic_pipe_buf_nosteal+0xc/0xc
> [7523332.320723] [] ? do_splice_direct+0x9a/0xb6
> [7523332.320783] [] ? do_sendfile+0x182/0x32a
> [7523332.320856] [] ? SyS_sendfile64+0x137/0x1bc
> [7523332.320916] [] ? system_call_fastpath+0x16/0x1b
> [7523332.320972] Code: 00 02 00 00 48 c7 c7 4d db 68 81 65 ff 40 04 e8 71 f1 a2 ff 4d 85 ed 75 58 e9 94 01 00 00 65 ff 00 4d 8b 6d 00 41 f6 c5 01 75 18 <41> 8b 55 10 31 c0 39 55 00 41 8a 7d 37 0f 85 14 ff ff ff e9 e7
> [7523332.321522] RIP [] ffffffff8168db79
> [7523332.321579] RSP
> [7523332.322094] ---[ end trace 0e21b79561002306 ]---
> [7523332.322210] Kernel panic - not syncing: Fatal exception in interrupt
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/