2001-07-29 01:23:40

by Steve Snyder

Subject: What does "Neighbour table overflow" message indicate?

I just got this sequence of messages in my system log:

Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
Jul 28 19:47:44 sunburn last message repeated 9 times
Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
Jul 28 19:48:09 sunburn last message repeated 3 times
Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.

This is on a RedHat v7.1 + SMP kernel v2.4.7 system. What is the kernel
trying to tell me here?

Please cc me as I am not a subscriber to this list.

Thanks.


2001-07-29 01:54:25

by Steve Snyder

Subject: Re: What does "Neighbour table overflow" message indicate?

No, and no errors are shown for it either:

# ifconfig lo
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:196907 errors:0 dropped:0 overruns:0 frame:0
TX packets:196907 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

All *seems* well. Just that 30-second period of messages and then silence.

Thanks for the response.


On Saturday 28 July 2001 08:38 pm, you wrote:
> is lo down?
>
> --cw
>
> On Sat, Jul 28, 2001 at 08:23:14PM -0500, Steve Snyder wrote:
> I just got this sequence of messages in my system log:
>
> Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
> Jul 28 19:47:44 sunburn last message repeated 9 times
> Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
> Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
> Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
> Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
> Jul 28 19:48:09 sunburn last message repeated 3 times
> Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
> Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.
>
> This is on a RedHat v7.1 + SMP kernel v2.4.7 system. What is the
> kernel trying to tell me here?
>
> Please cc me as I am not a subscriber to this list.
>
> Thanks.
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2001-07-29 01:57:12

by Chris Wedgwood

Subject: Re: What does "Neighbour table overflow" message indicate?

On Sat, Jul 28, 2001 at 08:53:48PM -0500, Steve Snyder wrote:

> No, and no errors are shown for it either:
>
> # ifconfig lo
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:196907 errors:0 dropped:0 overruns:0 frame:0
> TX packets:196907 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
>
> All *seems* well. Just that 30-second period of messages and then
> silence.


What is the machine doing? What kind of network is it attached to and
with how many hosts on it?



--cw

2001-07-29 02:15:37

by Steve Snyder

Subject: Re: What does "Neighbour table overflow" message indicate?

On Saturday 28 July 2001 08:57 pm, Chris Wedgwood wrote:
> On Sat, Jul 28, 2001 at 08:53:48PM -0500, Steve Snyder wrote:
>
> No, and no errors are shown for it either:
>
> # ifconfig lo
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:196907 errors:0 dropped:0 overruns:0 frame:0
> TX packets:196907 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
>
> All *seems* well. Just that 30-second period of messages and then
> silence.
>
>
> What is the machine doing? What kind of network is it attached to and
> with how many hosts on it?

It is a server for a small LAN. Interfaces: eth0=LAN, eth1=cable modem. I
believe that I was playing Quake3 (multi-player across internet) on one of
the LAN's client machines when the messages were logged. No corresponding
messages are seen in the client's (another RHL v7.1 box) system log, but
then, it's not running iptables.

Further snooping shows the error msg text in file linux/net/ipv4/route.c:

	if (net_ratelimit())
		printk("Neighbour table overflow.\n");

The reference to "net_ratelimit" makes me wonder if it is related to
iptables. I am using iptables, and have since kernel 2.4.1, but I've seen
these messages before. Hmmm.

2001-07-29 05:42:02

by Riley Williams

Subject: Re: What does "Neighbour table overflow" message indicate?

Hi Steve.

> I just got this sequence of messages in my system log:
>
> Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
> Jul 28 19:47:44 sunburn last message repeated 9 times
> Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
> Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
> Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
> Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
> Jul 28 19:48:09 sunburn last message repeated 3 times
> Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
> Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.
>
> This is on a RedHat v7.1 + SMP kernel v2.4.7 system. What is
> the kernel trying to tell me here?
>
> Please cc me as I am not a subscriber to this list.

This could be on completely the wrong track, but here's one of the
entries from the 2.4.5 kernel's Configure.help file (I don't yet have
2.4.7 on my system):

Q> ARP daemon support (EXPERIMENTAL)
Q> CONFIG_ARPD
Q> Normally, the kernel maintains an internal cache which maps IP
Q> addresses to hardware addresses on the local network, so that
Q> Ethernet/Token Ring/ etc. frames are sent to the proper address
Q> on the physical networking layer. For small networks having a
Q> few hundred directly connected hosts or less, keeping this
Q> address resolution (ARP) cache inside the kernel works well.
Q>
Q> However, maintaining an internal ARP cache does not work well
Q> for very large switched networks, and will use a lot of kernel
Q> memory if TCP/IP connections are made to many machines on the
Q> network.
Q>
Q> If you say Y here, the kernel's internal ARP cache will never
Q> grow to more than 256 entries (the oldest entries are expired
Q> in a LIFO manner) and communication will be attempted with the
Q> user space ARP daemon arpd. Arpd then answers the address
Q> resolution request either from its own cache or by asking the
Q> net.
Q>
Q> This code is experimental and also obsolete. If you want to
Q> use it, you need to find a version of the daemon arpd on the
Q> net somewhere, and you should also say Y to "Kernel/User
Q> network link driver", below. If unsure, say N.

The text in there looks suspiciously related to your problem to me.

Best wishes from Riley.

2001-07-29 09:14:49

by Eric W. Biederman

Subject: Re: What does "Neighbour table overflow" message indicate?

> Further snooping shows the error msg text in file linux/net/ipv4/route.c:
>
> if (net_ratelimit())
> printk("Neighbour table overflow.\n");

>
> The reference to "net_ratelimit" makes me wonder if it is related to
> iptables. I am using iptables, and have since kernel 2.4.1, but I've seen
> these messages before. Hmmm.

In my experience, this message occurs when a machine starts ARPing for
a non-existent IP address. I suspect net_ratelimit triggers when there
are too many ARPs.

Run tcpdump -n -i eth0 (assuming your network is on eth0) and check
for an ARP request that never gets answered.

Eric

2001-07-29 09:47:06

by Kurt Roeckx

Subject: Re: What does "Neighbour table overflow" message indicate?

On Sat, Jul 28, 2001 at 09:15:11PM -0500, Steve Snyder wrote:
>
> Further snooping shows the error msg text in file linux/net/ipv4/route.c:
>
> if (net_ratelimit())
> printk("Neighbour table overflow.\n");
>
> The reference to "net_ratelimit" makes me wonder if it is related to
> iptables. I am using iptables, and have since kernel 2.4.1, but I've seen
> these messages before. Hmmm.

net_ratelimit() is there to only log something every 5 seconds,
so your logs don't get flooded. It should be used for every
printk that has to do with net.

See core/utils.c
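
To make the idea concrete, here is a small userspace sketch of a
token-bucket log limiter. It is an illustration only, not the kernel's
code: the struct, the function names, and the parameters (one message
costs 5 seconds of tokens, the bucket holds 10 messages' worth, HZ is
100) are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of a token-bucket log limiter.
 * Time is counted in abstract ticks; HZ ticks make one second. */
#define HZ 100

struct ratelimit {
	unsigned long cost;    /* tokens charged per logged message */
	unsigned long burst;   /* bucket capacity */
	unsigned long tokens;  /* tokens currently available */
	unsigned long last;    /* tick of the previous call */
	unsigned int missed;   /* messages suppressed since the last log */
};

static void ratelimit_init(struct ratelimit *rl, unsigned long now)
{
	rl->cost = 5 * HZ;         /* assumed: one message per 5 seconds */
	rl->burst = 10 * 5 * HZ;   /* assumed: bursts of up to 10 messages */
	rl->tokens = rl->burst;
	rl->last = now;
	rl->missed = 0;
}

/* Returns 1 if the caller may log now, 0 if the message is suppressed. */
static int ratelimit_allow(struct ratelimit *rl, unsigned long now)
{
	assert(now >= rl->last);

	/* Earn one token per elapsed tick, capped at the bucket size. */
	rl->tokens += now - rl->last;
	rl->last = now;
	if (rl->tokens > rl->burst)
		rl->tokens = rl->burst;

	if (rl->tokens >= rl->cost) {
		if (rl->missed)
			printf("NET: %u messages suppressed.\n", rl->missed);
		rl->missed = 0;
		rl->tokens -= rl->cost;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

Under these assumed parameters, 63 calls in the same instant log 10
messages and suppress 53, and the next call 5 seconds later reports the
53 suppressed ones. That is at least consistent with the pattern in the
log that started this thread.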


Kurt

2001-07-29 13:52:28

by jeff millar

Subject: Re: What does "Neighbour table overflow" message indicate?

We used to get this from an embedded PowerPC processor under 2.2.x when the
hardware to device driver interface got screwed up.

jeff

----- Original Message -----
From: "Riley Williams" <[email protected]>
To: "Steve Snyder" <[email protected]>
Cc: "Linux Kernel" <[email protected]>
Sent: Sunday, July 29, 2001 1:41 AM
Subject: Re: What does "Neighbour table overflow" message indicate?


> Hi Steve.
>
> > I just got this sequence of messages in my system log:
> >
> > Jul 28 19:47:44 sunburn kernel: Neighbour table overflow.
> > Jul 28 19:47:44 sunburn last message repeated 9 times
> > Jul 28 19:47:49 sunburn kernel: NET: 53 messages suppressed.
> > Jul 28 19:47:49 sunburn kernel: Neighbour table overflow.
> > Jul 28 19:48:07 sunburn kernel: NET: 21 messages suppressed.
> > Jul 28 19:48:07 sunburn kernel: Neighbour table overflow.
> > Jul 28 19:48:09 sunburn last message repeated 3 times
> > Jul 28 19:48:14 sunburn kernel: NET: 4 messages suppressed.
> > Jul 28 19:48:14 sunburn kernel: Neighbour table overflow.
> >
> > This is on a RedHat v7.1 + SMP kernel v2.4.7 system. What is
> > the kernel trying to tell me here?
> >
> > Please cc me as I am not a subscriber to this list.
>
> This could be on completely the wrong track, but here's one of the
> entries from the 2.4.5 kernel's Configure.help file (I don't yet have
> 2.4.7 on my system):
>
> Q> ARP daemon support (EXPERIMENTAL)
> Q> CONFIG_ARPD
> Q> Normally, the kernel maintains an internal cache which maps IP
> Q> addresses to hardware addresses on the local network, so that
> Q> Ethernet/Token Ring/ etc. frames are sent to the proper address
> Q> on the physical networking layer. For small networks having a
> Q> few hundred directly connected hosts or less, keeping this
> Q> address resolution (ARP) cache inside the kernel works well.
> Q>
> Q> However, maintaining an internal ARP cache does not work well
> Q> for very large switched networks, and will use a lot of kernel
> Q> memory if TCP/IP connections are made to many machines on the
> Q> network.
> Q>
> Q> If you say Y here, the kernel's internal ARP cache will never
> Q> grow to more than 256 entries (the oldest entries are expired
> Q> in a LIFO manner) and communication will be attempted with the
> Q> user space ARP daemon arpd. Arpd then answers the address
> Q> resolution request either from its own cache or by asking the
> Q> net.
> Q>
> Q> This code is experimental and also obsolete. If you want to
> Q> use it, you need to find a version of the daemon arpd on the
> Q> net somewhere, and you should also say Y to "Kernel/User
> Q> network link driver", below. If unsure, say N.
>
> The text in there looks suspiciously related to your problem to me.
>
> Best wishes from Riley.

2001-07-29 13:55:38

by Bernd Eckenfels

Subject: Re: What does "Neighbour table overflow" message indicate?

In article <[email protected]> you wrote:
> if (net_ratelimit())
> printk("Neighbour table overflow.\n");

> The reference to "net_ratelimit" makes me wonder if it is related to
> iptables. I am using iptables, and have since kernel 2.4.1, but I've seen
> these messages before. Hmmm.

net_ratelimit is used to limit the rate of messages or actions done by the
network module. In this case it only ensures that the printk message is not
printed too often. The actual condition that causes the message to be
printed is above this if.

Greetings
Bernd

2001-07-30 12:38:25

by Carlos O'Donell

Subject: Re: What does "Neighbour table overflow" message indicate?


> network module. In this case it only ensures, that the printk message is not
> printed too often. The actual condition why the message is printed is above
> this if.
>
> Greetings
> Bernd
> -

Snyder,

Just by looking at your email, @home, I can guess that your
cable modem is connected to an HFC Cable network segment.

In general these segments are extremely large and due to the
nature of the users, can cause large amounts of arp broadcast
traffic during peak times.

The message you are seeing is directly related to your arp cache
overflowing.

I've seen this message during high traffic hours on my 2.2.x
firewall.

Things to check:

- Is your netmask set correctly?
- How many hosts are on your segment?
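
As a rough aid for the two checks above, this sketch computes how many
host addresses a netmask allows and compares that with a table limit.
The function names are hypothetical, the mask is assumed to be
contiguous and given in host byte order, and the limit passed in is
whatever hard limit your neighbour table uses.

```c
#include <stdint.h>

/* Count the usable host addresses implied by an IPv4 netmask given in
 * host byte order, e.g. 0xFFFFFF00 for 255.255.255.0. Assumes a
 * contiguous mask; network and broadcast addresses are excluded. */
static uint32_t hosts_in_netmask(uint32_t mask)
{
	uint32_t host_bits = ~mask;   /* e.g. 0x000000FF for a /24 */

	if (host_bits < 2)
		return 0;             /* /31 and /32: no ordinary hosts */
	return host_bits - 1;         /* 2^n addresses minus 2 */
}

/* Hypothetical check: could a fully populated segment hold more
 * neighbours than the table's hard limit? */
static int segment_can_overflow(uint32_t mask, uint32_t hard_limit)
{
	return hosts_in_netmask(mask) > hard_limit;
}
```

For example, a 255.255.255.0 mask allows 254 hosts, which is well under
a limit of 1024, while a 255.255.248.0 mask allows 2046 and could
exceed it if enough of those hosts are seen at once.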

======================================================
Why the kernel spat what it spat : blow by blow
======================================================

N.B. Using 2.4.7 Kernel Source.

I think the critical point is:

In route.c:

639: int err = arp_bind_neighbour(&rt->u.dst);
640: if (err) {
... [snip]

Which means that if the binding of an arp neighbour fails, we head
down the path towards the printk that has caused us so much
distress.

In arp.c, we look for "arp_bind_neighbour" and find it on line 429:

Right off the bat, we hope that:

434: if (dev == NULL)
435: return -EINVAL;

Isn't the case :)

Unless it's already bound, the next line is the case...

436: if (n == NULL) {

And the only return that is non-zero is from:

440: n = __neigh_lookup_errno(
441:#ifdef CONFIG_ATM_CLIP
442: dev->type == ARPHRD_ATM ? &clip_tbl :
443:#endif
444: &arp_tbl, &nexthop, dev);
445: if (IS_ERR(n))
446: return PTR_ERR(n);

So __neigh_lookup_errno is the culprit...

In ./include/net/neighbour.h we have the function defined:

266:static inline struct neighbour *
267:__neigh_lookup_errno(struct neigh_table *tbl, const void *pkey,
268:struct net_device *dev)
...
275: return neigh_create(tbl, pkey, dev);

is the interesting point. Since our table is overflowing, we
need to find the point where the entry is created :)

Off we go to line 288 in ./net/core/neighbour.c:
(I love to trace source!)

296: n = neigh_alloc(tbl);
297: if (n == NULL)
298: return ERR_PTR(-ENOBUFS);

Hrm... -ENOBUFS :)
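
For anyone unfamiliar with how an errno travels back up through a
pointer return value, here is a userspace re-creation of the idiom. The
three helpers reuse the kernel's names, but the definitions and the
MAX_ERRNO cutoff are my own illustration rather than the kernel
headers, and entry_create() and bind_entry() are hypothetical stand-ins
for the create and bind steps traced above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cutoff: the top MAX_ERRNO addresses are treated as errors. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for errors. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: fails with -ENOBUFS when the table is full. */
static void *entry_create(int table_full)
{
	static int slot;

	if (table_full)
		return ERR_PTR(-ENOBUFS);
	return &slot;
}

/* Caller pattern: 0 on success, negative errno on failure. */
static int bind_entry(int table_full)
{
	void *n = entry_create(table_full);

	if (IS_ERR(n))
		return (int)PTR_ERR(n);
	return 0;
}
```

Here bind_entry(1) returns -ENOBUFS, which is the non-zero "err" that
the caller in route.c would then act on.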

In neigh_alloc, same file:

235: if (tbl->entries > tbl->gc_thresh3 ||
236: (tbl->entries > tbl->gc_thresh2 &&
237: now - tbl->last_flush > 5*HZ)) {
238: if (neigh_forced_gc(tbl) == 0 &&
239: tbl->entries > tbl->gc_thresh3)
240: return NULL;
241: }

Which leads us to note that if the cache grows faster than the
garbage collection (ref counting code) can reclaim entries, and
the entry count exceeds gc_thresh3, the allocation fails and we
trigger a table overflow.
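
The decision can be modelled in userspace like this. It is a simplified
sketch, not the kernel function: struct table_model, forced_gc() and
can_alloc() are hypothetical names, the "reclaimable" field stands in
for whatever the real garbage collector would find unreferenced, and HZ
is assumed to be 100.

```c
#define HZ 100   /* assumed tick rate for the model */

struct table_model {
	int entries;              /* neighbours currently in the table */
	int gc_thresh2;           /* soft limit */
	int gc_thresh3;           /* hard limit */
	unsigned long last_flush; /* tick of the last forced GC */
	int reclaimable;          /* entries a forced GC could free */
};

/* Model of a forced GC pass: frees what it can, returns the count. */
static int forced_gc(struct table_model *t, unsigned long now)
{
	int freed = t->reclaimable;

	t->entries -= freed;
	t->reclaimable = 0;
	t->last_flush = now;
	return freed;
}

/* Returns 1 if a new entry is admitted, 0 for "table overflow". */
static int can_alloc(struct table_model *t, unsigned long now)
{
	if (t->entries > t->gc_thresh3 ||
	    (t->entries > t->gc_thresh2 &&
	     now - t->last_flush > 5 * HZ)) {
		/* Fail only if GC freed nothing AND we are still
		 * above the hard limit. */
		if (forced_gc(t, now) == 0 &&
		    t->entries > t->gc_thresh3)
			return 0;
	}
	t->entries++;
	return 1;
}
```

With 1025 entries, thresholds of 512 and 1024, and nothing reclaimable,
can_alloc() returns 0. The same table with 100 reclaimable entries
admits the new one.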

Can you make the tables bigger?
What kind of impact would that have?
Should we be asking @Home to make segments smaller?
(Probably not possible)

In ./net/ipv4/arp.c you could change the GC parameters...
I'm not sure how they were tuned.

Line 187:
gc_interval: 30 * HZ,
gc_thresh1: 128,
gc_thresh2: 512,
gc_thresh3: 1024,

Hrm... just pondering.
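
If your kernel exposes these tunables through sysctl, you may be able
to inspect them without touching the source. The sketch below just
reads one integer from a file. The path in the comment,
/proc/sys/net/ipv4/neigh/default/gc_thresh3, is an assumption about how
the running kernel is configured, and read_int_tunable() is a
hypothetical helper name.

```c
#include <stdio.h>

/* Read one integer tunable from a file, for example (assumed path)
 * /proc/sys/net/ipv4/neigh/default/gc_thresh3.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
static int read_int_tunable(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A return of -1 means the file is missing or does not hold a number, in
which case the compiled-in values shown above are all you have to go
on.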


=================================================================

Cheers,
Carlos O'Donell Jr.