Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030371AbWCUNpy (ORCPT ); Tue, 21 Mar 2006 08:45:54 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751718AbWCUNpy (ORCPT ); Tue, 21 Mar 2006 08:45:54 -0500
Received: from mx1.slu.se ([130.238.96.70]:64917 "EHLO mx1.slu.se")
	by vger.kernel.org with ESMTP id S1751713AbWCUNpx (ORCPT );
	Tue, 21 Mar 2006 08:45:53 -0500
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <17439.65413.214470.194287@robur.slu.se>
Date: Tue, 21 Mar 2006 14:28:37 +0100
To: Jesper Dangaard Brouer
Cc: Robert Olsson , jens.laas@data.slu.se, hans.liss@its.uu.se,
	linux-net@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Kernel panic: Route cache, RCU, possibly FIB trie.
In-Reply-To:
References:
X-Mailer: VM 7.19 under Emacs 21.4.1
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2123
Lines: 47

Jesper Dangaard Brouer writes:

 > I have tried to track down the problem, and I think I have narrowed it
 > a bit down. My theory is that it is related to the route cache
 > (ip_dst_cache) or FIB, which cannot deallocate route cache slab
 > elements (maybe RCU related). (I have seen my route cache increase to
 > around 520k entries using rtstat, before dying).
 >
 > I'm using the FIB trie system/algorithm (CONFIG_IP_FIB_TRIE). Think
 > that the error might be caused by the "fib_trie" code. See the syslog,
 > output below.
 > Syslog#1 (indicating a problem with the fib trie)
 > --------
 > Mar 20 18:00:04 hostname kernel: Debug: sleeping function called from invalid context at mm/slab.c:2472
 > Mar 20 18:00:04 hostname kernel: in_atomic():1, irqs_disabled():0
 > Mar 20 18:00:04 hostname kernel: [] dump_stack+0x1e/0x22
 > Mar 20 18:00:04 hostname kernel: [] __might_sleep+0xa6/0xae
 > Mar 20 18:00:04 hostname kernel: [] __kmalloc+0xd9/0xf3
 > Mar 20 18:00:04 hostname kernel: [] kzalloc+0x23/0x50
 > Mar 20 18:00:04 hostname kernel: [] tnode_alloc+0x3c/0x82
 > Mar 20 18:00:04 hostname kernel: [] tnode_new+0x26/0x91
 > Mar 20 18:00:04 hostname kernel: [] halve+0x43/0x31d
 > Mar 20 18:00:04 hostname kernel: [] resize+0x118/0x27e

Hello!

Out of memory? Are you running BGP with full routing, and a large number
of flows? What's your normal number of entries in the route cache? And how
much memory do you have?

From your report, the problem seems to be related to flushing, either
rt_cache_flush or fib_flush (was there a dev_close() before that?), so all
associated entries should be freed. All the entries are freed via RCU,
which, because of the deferred delete, can give very high transient memory
pressure.

If we believe it's a memory problem we can try something out...

Cheers.
						--ro
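
The transient memory pressure from deferred frees can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: the function name, the "tick" time unit, the allocation rate and the grace-period length are all made up for illustration. Only the 520k starting figure comes from the report above. Flushed entries keep their memory until the grace period ends, while traffic keeps allocating new entries.

```python
def peak_entries(flushed, alloc_per_tick, grace_ticks, ticks):
    """Toy model of a cache flush (hypothetical, not kernel code).

    flushed        -- entries unlinked by the flush at tick 0
    alloc_per_tick -- new entries created by traffic in each tick (assumed)
    grace_ticks    -- ticks until the deferred frees actually run (assumed);
                      0 models freeing immediately
    ticks          -- how long to simulate
    Returns the peak number of entries holding memory at the same time.
    """
    in_cache = 0              # new entries created after the flush
    awaiting_free = flushed   # unlinked, but memory not yet returned
    peak = 0
    for tick in range(1, ticks + 1):
        if tick > grace_ticks:
            awaiting_free = 0           # grace period over, frees have run
        in_cache += alloc_per_tick      # traffic keeps adding entries
        peak = max(peak, in_cache + awaiting_free)
    return peak


if __name__ == "__main__":
    # 520k flushed entries as in the report; the other numbers are invented.
    print("deferred free :", peak_entries(520_000, 10_000, 5, 10))
    print("immediate free:", peak_entries(520_000, 10_000, 0, 10))
```

In this model the deferred case peaks at the old entries plus everything allocated during the grace period, which is the kind of transient spike described above. A real system is of course more complicated (per-CPU callbacks, garbage collection thresholds and so on).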