Date: Thu, 23 Mar 2006 16:44:56 +0100 (CET)
From: Jesper Dangaard Brouer
To: "David S. Miller"
Cc: dipankar@in.ibm.com, Robert Olsson, jens.laas@data.slu.se, hans.liss@its.uu.se, linux-net@vger.kernel.org, linux-kernel@vger.kernel.org, Eric Dumazet, mike.stroyan@hp.com, Suresh Bhogavilli
Subject: Re: Kernel panic: Route cache, RCU, possibly FIB trie.

On Thu, 23 Mar 2006, Jesper Dangaard Brouer wrote:

> On Tue, 21 Mar 2006, David S. Miller wrote:
>
>> From: Jesper Dangaard Brouer
>> Date: Tue, 21 Mar 2006 15:51:34 +0100 (CET)
>>
>>> You guessed right... I did enable IP_ROUTE_MULTIPATH_CACHED. I have
>>> now disabled it, and equal-cost multipath routing in general
>>> (CONFIG_IP_ROUTE_MULTIPATH).
>>
>> It is almost certainly the cause of your crashes; that code
>> is still extremely raw, and that's why it is listed as "EXPERIMENTAL".
>
> It seems you are right :-) (and I'll take more care before using
> experimental code in production again). The machine has now been running
> for 34 hours without crashing. The strange thing is that I'm running the
> same kernel on 30 other (similar) machines, which have not crashed. (I
> suspect the specific traffic load pattern influences this.)

Argh!! -- nemesis!!!
The machine just died again... It did not crash; it just ran out of memory
and killed too many important processes. I had to power-cycle it... :-(((
I could ping it...

I can see that the traffic pattern has changed and the route cache is
growing rapidly...

> BUT, I do think I have noticed another problem in the garbage collection
> code (route.c) that causes the garbage collector (almost) never to
> garbage collect.
>
> This is caused by the value "ip_rt_max_size"
> (/proc/sys/net/ipv4/route/max_size) being set too large. It is set to 16
> times the gc_thresh value (which itself depends on the memory size). In
> the garbage collection function (rt_garbage_collect), garbage collection
> is skipped (gc_ignored) if the number of entries is below
> "ip_rt_max_size".
>
> With 1 GB of memory, gc_thresh is 65536, and 65536 times 16 is 1048576.
> This means that we only start to garbage collect when there are more than
> 1 million entries. This seems wrong... (the reason the cache does not grow
> this large is the periodic flush every 600 seconds).
>
>
> Hilsen
> Jesper Brouer
>
> --
> -------------------------------------------------------------------
> Cand. scient datalog
> Dept. of Computer Science, University of Copenhagen
> -------------------------------------------------------------------
>
>
> grep . /proc/sys/net/ipv4/route/*
> /proc/sys/net/ipv4/route/error_burst:5000
> /proc/sys/net/ipv4/route/error_cost:1000
> grep: /proc/sys/net/ipv4/route/flush: Operation not permitted
> /proc/sys/net/ipv4/route/gc_elasticity:8
> /proc/sys/net/ipv4/route/gc_interval:60
> /proc/sys/net/ipv4/route/gc_min_interval:0
> /proc/sys/net/ipv4/route/gc_min_interval_ms:500
> /proc/sys/net/ipv4/route/gc_thresh:65536
> /proc/sys/net/ipv4/route/gc_timeout:300
> /proc/sys/net/ipv4/route/max_delay:10
> /proc/sys/net/ipv4/route/max_size:1048576
> /proc/sys/net/ipv4/route/min_adv_mss:256
> /proc/sys/net/ipv4/route/min_delay:2
> /proc/sys/net/ipv4/route/min_pmtu:552
> /proc/sys/net/ipv4/route/mtu_expires:600
> /proc/sys/net/ipv4/route/redirect_load:20
> /proc/sys/net/ipv4/route/redirect_number:9
> /proc/sys/net/ipv4/route/redirect_silence:20480
> /proc/sys/net/ipv4/route/secret_interval:600

Hilsen
Jesper Brouer