Date: Fri, 29 Jun 2012 12:04:55 +0300 (EEST)
From: Julian Anastasov
To: Xiaotian Feng
cc: netdev@vger.kernel.org, lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org, coreteam@netfilter.org, linux-kernel@vger.kernel.org, Xiaotian Feng, Wensong Zhang, Simon Horman, Pablo Neira Ayuso, Patrick McHardy, "David S. Miller"
Subject: Re: [RFC PATCH net-next] ipvs: add missing lock in ip_vs_ftp_init_conn()
References: <1340890587-8169-1-git-send-email-xtfeng@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

	Hello,

On Fri, 29 Jun 2012, Xiaotian Feng wrote:

> > On Thu, 28 Jun 2012, Xiaotian Feng wrote:
> >
> >> We met a kernel panic in 2.6.32.43 kernel:
> >>
> >> [2680191.848044] IPVS: ip_vs_conn_hash(): request for already hashed, called from run_timer_softirq+0x175/0x1d0
> >>
> >> [2680311.849009] general protection fault: 0000 [#1] SMP

	What we see here is 120 seconds between 2680191 and
2680311. It can mean 2 things:

- some state timeout, it depends on your forwarding method.
What is it? NAT? DR?
- 60 seconds for ip_vs_conn_expire retries

> >> After code review, the only chance that kernel change connection flag without protection is
> >> in ip_vs_ftp_init_conn().
> >
> >        Hm, ip_vs_ftp_init_conn is called before 1st hashing,
> > from ip_vs_bind_app() in ip_vs_conn_new() before
> > ip_vs_conn_hash(). It should be another problem with
> > the flags. How different is IPVS in 2.6.32.43 compared to
> > recent kernels? If commit aea9d711 is present, I'm not
> > aware of other similar problems.
>
> ip_vs_bind_app() is also called by ip_vs_try_bind_dest(), which can be
> traced to ip_vs_proc_conn().
> I've checked the changes in upstream, but nothing helps since aea9d711
> has been taken into 2.6.32.28 kernel.

	OK, this fix should make it safe for master-backup
sync and it should be applied, but I suspect you are not using
sync, right? In that case this fix will not solve the oops.

	There are not many places that rehash a conn:

ip_vs_conn_fill_cport:
	- used for FTP

ip_vs_check_template:
	- do you have persistence configured?

	After you provide details about the forwarding method,
persistence and sync in use, we should think about how such races
with rehashing can lead to a double hlist_del. Maybe you can
modify the debug message in ip_vs_conn_hash, so that we can see
cp->flags and ntohs of cp->cport, cp->dport and cp->vport when
the oops happens again.

Regards

--
Julian Anastasov
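
	For illustration only, the extended debug message suggested
above could look roughly like the userspace sketch below. It is not
a kernel patch: struct conn_sketch, SKETCH_F_HASHED and
format_conn_debug() are hypothetical stand-ins, the flag value is
merely an example, and the "called from" part of the original
message is left out. Only the field names (flags, cport, dport,
vport) and the use of ntohs come from the text above.

```c
#include <arpa/inet.h>  /* ntohs */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bit used only by this sketch. */
#define SKETCH_F_HASHED 0x0040u

/* Hypothetical stand-in for the connection fields we want to log. */
struct conn_sketch {
	unsigned int flags;  /* connection flags */
	uint16_t cport;      /* client port, network byte order */
	uint16_t dport;      /* destination port, network byte order */
	uint16_t vport;      /* virtual port, network byte order */
};

/*
 * Format the "already hashed" message with flags and ports appended.
 * Ports are converted to host byte order with ntohs before printing.
 * Returns what snprintf returns.
 */
int format_conn_debug(char *buf, size_t len, const struct conn_sketch *cp)
{
	return snprintf(buf, len,
			"IPVS: ip_vs_conn_hash(): request for already hashed, "
			"flags=0x%04x cport=%u dport=%u vport=%u",
			cp->flags,
			(unsigned int)ntohs(cp->cport),
			(unsigned int)ntohs(cp->dport),
			(unsigned int)ntohs(cp->vport));
}
```

	In the kernel the same format string and arguments would go
into the existing error message in ip_vs_conn_hash, so the flags and
ports show up in the log next to the oops.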