2007-05-03 20:26:01

by Øyvind Vågen Jægtnes

Subject: Routing 600+ vlan's via linux problems (looks like arp problems)

Hi,

We have a one-gigabit internet connection that is normally
routed by a hardware Juniper router. The drive in it has failed,
and we need to use a Linux machine (Pentium D, 3 GHz) as a
temporary router.
Setting up all 600 VLANs and assigning IP addresses
is no problem. We tested everything with a laptop, setting up
600 VLAN interfaces on it and running dhcpclient on all of them.
This worked just fine; all the interfaces got an address.

Now for the real setup.
We cloned the MAC of the Juniper to the network card that
would be connected to the internal LAN, set up the interfaces,
and swapped cables. This worked fine for approximately 100
of the connected computers, but the rest would not
get an IP address. The 100 connected computers were routed just fine.

We think the problem is that the ARP cache on the
Linux router looks strange. It can resolve the MAC for the
100 clients that actually got through.
For the rest, all we see in the ARP cache is (incomplete).

Here is some of the listing for arp -n:
193.239.155.118  ether  00:0A:E4:59:75:66  C  eth1.1087
193.239.154.74   (incomplete)                 eth1.1016
193.239.155.7    ether  00:11:95:D2:3F:FD  C  eth1.2002
83.143.114.222   (incomplete)                 eth1.1305
83.143.113.246   ether  00:0B:5D:4B:B8:77  C  eth1.1247
83.143.116.126   (incomplete)                 eth1.1409
83.143.118.114   (incomplete)                 eth1.1534
193.239.154.210  ether  00:03:0D:2F:1B:7F  C  eth1.1050
169.254.69.247   ether  00:15:C5:C2:31:6C  C  eth1.1262
83.143.112.38    (incomplete)                 eth1.1131
83.143.118.18    (incomplete)                 eth1.1510
83.143.112.118   ether  00:11:95:CE:BF:72  C  eth1.1151
192.168.1.2      ether  00:0D:88:78:C0:00  C  eth1.2050
83.143.117.138   (incomplete)                 eth1.1476
83.143.116.18    (incomplete)                 eth1.1382
83.143.118.26    (incomplete)                 eth1.1512
83.143.112.6     (incomplete)                 eth1.1123
193.239.155.62   (incomplete)                 eth1.1073

`arp -n|wc -l` returns around 350, which matches the number of active ports
on the edge switches (confirmed via SNMP).
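
A total from `wc -l` does not show how many entries are resolved and how many are stuck at (incomplete). The following is a minimal sketch, not a tested tool, that splits that count; it assumes `arp -n` style text like the listing above, and the function name `count_arp_entries` is made up for illustration.

```python
def count_arp_entries(arp_output: str) -> dict:
    """Count resolved and unresolved entries in `arp -n` style text.

    A line carrying the "(incomplete)" token is an unresolved entry;
    a line carrying the "ether" hardware type is a resolved one.
    Lines holding only an interface name (wrapped output) are ignored.
    """
    counts = {"resolved": 0, "incomplete": 0}
    for line in arp_output.splitlines():
        fields = line.split()
        if "(incomplete)" in fields:
            counts["incomplete"] += 1
        elif "ether" in fields:
            counts["resolved"] += 1
    return counts


# Three entries taken from the listing above, in the wrapped form.
sample = """\
193.239.155.118 ether 00:0A:E4:59:75:66 C
eth1.1087
193.239.154.74 (incomplete)
eth1.1016
83.143.113.246 ether 00:0B:5D:4B:B8:77 C
eth1.1247
"""

print(count_arp_entries(sample))  # {'resolved': 2, 'incomplete': 1}
```

On the router itself the input could come from `arp -n` piped into the script; that wiring is left out here.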

I have looked through the source of arp.c but I can't see any immediate
problems. There are no messages in dmesg, kern.log or messages (except for
eth1.vlanid up * 600).

If anyone knows what the problem could be, whether this is a bug, or whether
it is PSBKC, I would much appreciate it.

regards
Øyvind Vågen Jægtnes
+47 96 22 03 08
[email protected]


2007-05-03 20:35:11

by Jan Engelhardt

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)


On May 3 2007 22:25, Øyvind Vågen Jægtnes wrote:
>
> Now for the real setup.
> We cloned the MAC of the Juniper to the network card that
> would be connected to the internal LAN, set up the interfaces,
> and swapped cables. This worked fine for approximately 100
> of the computers that are connected, but the rest would not
> get IP. The connected 100 computers were routed just fine.

Try tcpdump. See if dhcpd actually hands out leases, and
run another tcpdump on a dhcp client machine, to see if it
even arrives. Furthermore, you could

> 193.239.155.118 ether 00:0A:E4:59:75:66 C
> eth1.1087
> 193.239.154.74 (incomplete)
> eth1.1016

check if eth1.1016 even "works", by
ping -bf 255.255.255.255 -I eth1.1016
if it lights up in the correct places, packets must be flowing.


Jan
--

2007-05-03 20:52:54

by Sam Ravnborg

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Thu, May 03, 2007 at 10:25:48PM +0200, Øyvind Vågen Jægtnes wrote:
> Hi,
Hi Øyvind.

Forwarding your mail to netdev where the networking people are
hanging out. Maybe they can help you.

Sam

>
> We have a one gigabit internet connection that is normally
> routed by a hardware juniper router. The drive in this is down
> and we need to use a linux machine (Pentium D 3 ghz) as a
> temporary router.
> Now setting up all the 600 vlans and assigning ip addresses
> is no problem. We have testet all by using a laptop, setting up
> 600 vlan interfaces on this and running dhcpclient on all.
> This worked just fine, all the interfaces got address.
>
> Now for the real setup.
> We cloned the MAC of the Juniper to the network card that
> would be connected to the internal LAN, set up the interfaces,
> and swapped cables. This worked fine for approximately 100
> of the computers that are connected, but the rest would not
> get IP. The connected 100 computers were routed just fine.
>
> What we think the problem is, is that the arp cache on the
> linux router seems strange. It can resolve the MAC for the
> 100 clients that actually got through.
> For the rest all we see in the arp cache is (incomplete)
>
> Here is some of the listing for arp -n:
> 193.239.155.118  ether  00:0A:E4:59:75:66  C  eth1.1087
> 193.239.154.74   (incomplete)                 eth1.1016
> 193.239.155.7    ether  00:11:95:D2:3F:FD  C  eth1.2002
> 83.143.114.222   (incomplete)                 eth1.1305
> 83.143.113.246   ether  00:0B:5D:4B:B8:77  C  eth1.1247
> 83.143.116.126   (incomplete)                 eth1.1409
> 83.143.118.114   (incomplete)                 eth1.1534
> 193.239.154.210  ether  00:03:0D:2F:1B:7F  C  eth1.1050
> 169.254.69.247   ether  00:15:C5:C2:31:6C  C  eth1.1262
> 83.143.112.38    (incomplete)                 eth1.1131
> 83.143.118.18    (incomplete)                 eth1.1510
> 83.143.112.118   ether  00:11:95:CE:BF:72  C  eth1.1151
> 192.168.1.2      ether  00:0D:88:78:C0:00  C  eth1.2050
> 83.143.117.138   (incomplete)                 eth1.1476
> 83.143.116.18    (incomplete)                 eth1.1382
> 83.143.118.26    (incomplete)                 eth1.1512
> 83.143.112.6     (incomplete)                 eth1.1123
> 193.239.155.62   (incomplete)                 eth1.1073
>
> `arp -n|wc -l` returns around 350, which is the number of active ports
> on the edge switches...
> this number is confirmed by snmp
>
> I have looked through the source for arp.c but i can't see any immediate
> problems. There is no messages in dmesg, kern.log og messages (except for
> eth1.vlanid up * 600).
>
> If anyone know what the problem can be, if this is a bug, or if PSBKC
> i would much appreciate it.
>
> regards
> Øyvind Vågen Jægtnes
> +47 96 22 03 08
> [email protected]

2007-05-03 20:53:45

by Willy Tarreau

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Thu, May 03, 2007 at 10:25:48PM +0200, Øyvind Vågen Jægtnes wrote:
> Hi,
>
> We have a one gigabit internet connection that is normally
> routed by a hardware juniper router. The drive in this is down
> and we need to use a linux machine (Pentium D 3 ghz) as a
> temporary router.
> Now setting up all the 600 vlans and assigning ip addresses
> is no problem. We have testet all by using a laptop, setting up
> 600 vlan interfaces on this and running dhcpclient on all.
> This worked just fine, all the interfaces got address.
>
> Now for the real setup.
> We cloned the MAC of the Juniper to the network card that
> would be connected to the internal LAN, set up the interfaces,
> and swapped cables. This worked fine for approximately 100
> of the computers that are connected, but the rest would not
> get IP. The connected 100 computers were routed just fine.
>
> What we think the problem is, is that the arp cache on the
> linux router seems strange. It can resolve the MAC for the
> 100 clients that actually got through.
> For the rest all we see in the arp cache is (incomplete)

I suspect that your arp cache is full (128 entries by default).
Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
set it as high as gc_thresh2 (512 for me), and I don't know what
happens above.
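
A minimal sketch for reading the current thresholds. It assumes they are exposed as plain integer files under `/proc/sys/net/ipv4/neigh/default/` (note the `default` path component) and that `gc_thresh3` sits alongside the other two. The helper name `read_gc_thresholds` is invented for illustration.

```python
from pathlib import Path


def read_gc_thresholds(base="/proc/sys/net/ipv4/neigh/default"):
    """Return the neighbour-table GC thresholds found under `base`.

    Files that are missing (for example on a non-Linux system) are
    skipped, so the result may have fewer than three keys.
    """
    thresholds = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        path = Path(base) / name
        if path.exists():
            thresholds[name] = int(path.read_text().strip())
    return thresholds


if __name__ == "__main__":
    # The values vary per system; nothing is assumed about them here.
    print(read_gc_thresholds())
```

Comparing these values with the number of entries in the ARP cache gives a quick check of whether the table is anywhere near its limits.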

Hoping this helps,
Willy

2007-05-03 20:55:27

by Willy Tarreau

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Thu, May 03, 2007 at 10:53:41PM +0200, Willy Tarreau wrote:
> On Thu, May 03, 2007 at 10:25:48PM +0200, Øyvind Vågen Jægtnes wrote:
> > Hi,
> >
> > We have a one gigabit internet connection that is normally
> > routed by a hardware juniper router. The drive in this is down
> > and we need to use a linux machine (Pentium D 3 ghz) as a
> > temporary router.
> > Now setting up all the 600 vlans and assigning ip addresses
> > is no problem. We have testet all by using a laptop, setting up
> > 600 vlan interfaces on this and running dhcpclient on all.
> > This worked just fine, all the interfaces got address.
> >
> > Now for the real setup.
> > We cloned the MAC of the Juniper to the network card that
> > would be connected to the internal LAN, set up the interfaces,
> > and swapped cables. This worked fine for approximately 100
> > of the computers that are connected, but the rest would not
> > get IP. The connected 100 computers were routed just fine.
> >
> > What we think the problem is, is that the arp cache on the
> > linux router seems strange. It can resolve the MAC for the
> > 100 clients that actually got through.
> > For the rest all we see in the arp cache is (incomplete)
>
> I suspect that your arp cache is full (128 entries by default).
> Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
                                 ^ insert /default/ here.

> set it as high as gc_thresh2 (512 for me), and I don't know what
> happens above.

Willy

2007-05-03 20:56:37

by Jan Engelhardt

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)


On May 3 2007 22:53, Willy Tarreau wrote:
>> For the rest all we see in the arp cache is (incomplete)
>
>I suspect that your arp cache is full (128 entries by default).
>Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
>set it as high as gc_thresh2 (512 for me), and I don't know what
>happens above.

Above, you will perhaps need the not-so-elegant userspace arpd :-/


Jan
--

2007-05-03 21:12:13

by Øyvind Vågen Jægtnes

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On 5/3/07, Jan Engelhardt <[email protected]> wrote:
>
> On May 3 2007 22:53, Willy Tarreau wrote:
> >> For the rest all we see in the arp cache is (incomplete)
> >
> >I suspect that your arp cache is full (128 entries by default).
> >Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
> >set it as high as gc_thresh2 (512 for me), and I don't know what
> >happens above.
>
> Above, you will perhaps need the not-so-elegant userspace arpd :-/

Yes, i was suspecting that the arp cache got full, but i will try
increasing it :)
Would there be any huge bugs if i change these lines in arp.c:

.gc_thresh1 = 128,
.gc_thresh2 = 512,

to

.gc_thresh1 = 700,
.gc_thresh2 = 700,

under the definition for struct arp_tbl?
This setup will only run for about 1-2 hours while we fix the hardware
router (it is running now, but only on a backup flash card solution;
the hard drive in it died ;)

I have been looking at arpd, but I quickly discarded it as an option
since it's marked both experimental and obsolete ;)

regards
Øyvind Vågen Jægtnes
+47 96 22 03 08
[email protected]

2007-05-03 21:38:49

by Stephen Hemminger

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Thu, 3 May 2007 22:53:46 +0200
Sam Ravnborg <[email protected]> wrote:

> On Thu, May 03, 2007 at 10:25:48PM +0200, Øyvind Vågen Jægtnes wrote:
> > Hi,
> Hi Øyvind.
>
> Forwarding your mail to netdev where the networking people are
> hanging out. Maybe they can help you.
>
> Sam
>
> >
> > We have a one gigabit internet connection that is normally
> > routed by a hardware juniper router. The drive in this is down
> > and we need to use a linux machine (Pentium D 3 ghz) as a
> > temporary router.
> > Now setting up all the 600 vlans and assigning ip addresses
> > is no problem. We have testet all by using a laptop, setting up
> > 600 vlan interfaces on this and running dhcpclient on all.
> > This worked just fine, all the interfaces got address.
> >
> > Now for the real setup.
> > We cloned the MAC of the Juniper to the network card that
> > would be connected to the internal LAN, set up the interfaces,
> > and swapped cables. This worked fine for approximately 100
> > of the computers that are connected, but the rest would not
> > get IP. The connected 100 computers were routed just fine.
> >
> > What we think the problem is, is that the arp cache on the
> > linux router seems strange. It can resolve the MAC for the
> > 100 clients that actually got through.
> > For the rest all we see in the arp cache is (incomplete)
> >
> > Here is some of the listing for arp -n:
> > 193.239.155.118  ether  00:0A:E4:59:75:66  C  eth1.1087
> > 193.239.154.74   (incomplete)                 eth1.1016
> > 193.239.155.7    ether  00:11:95:D2:3F:FD  C  eth1.2002
> > 83.143.114.222   (incomplete)                 eth1.1305
> > 83.143.113.246   ether  00:0B:5D:4B:B8:77  C  eth1.1247
> > 83.143.116.126   (incomplete)                 eth1.1409
> > 83.143.118.114   (incomplete)                 eth1.1534
> > 193.239.154.210  ether  00:03:0D:2F:1B:7F  C  eth1.1050
> > 169.254.69.247   ether  00:15:C5:C2:31:6C  C  eth1.1262
> > 83.143.112.38    (incomplete)                 eth1.1131
> > 83.143.118.18    (incomplete)                 eth1.1510
> > 83.143.112.118   ether  00:11:95:CE:BF:72  C  eth1.1151
> > 192.168.1.2      ether  00:0D:88:78:C0:00  C  eth1.2050
> > 83.143.117.138   (incomplete)                 eth1.1476
> > 83.143.116.18    (incomplete)                 eth1.1382
> > 83.143.118.26    (incomplete)                 eth1.1512
> > 83.143.112.6     (incomplete)                 eth1.1123
> > 193.239.155.62   (incomplete)                 eth1.1073
> >
> > `arp -n|wc -l` returns around 350, which is the number of active ports
> > on the edge switches...
> > this number is confirmed by snmp
> >
> > I have looked through the source for arp.c but i can't see any immediate
> > problems. There is no messages in dmesg, kern.log og messages (except for
> > eth1.vlanid up * 600).
> >
> > If anyone know what the problem can be, if this is a bug, or if PSBKC
> > i would much appreciate it.
> >
> > regards
> > Øyvind Vågen Jægtnes
> > +47 96 22 03 08
> > [email protected]

What kernel version? Are you on a recent 2.6 kernel or stuck on some
old "vendor stable" 2.4 kernel?

--
Stephen Hemminger <[email protected]>

2007-05-03 22:23:46

by Willy Tarreau

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Thu, May 03, 2007 at 11:12:09PM +0200, Øyvind Vågen Jægtnes wrote:
> On 5/3/07, Jan Engelhardt <[email protected]> wrote:
> >
> >On May 3 2007 22:53, Willy Tarreau wrote:
> >>> For the rest all we see in the arp cache is (incomplete)
> >>
> >>I suspect that your arp cache is full (128 entries by default).
> >>Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
> >>set it as high as gc_thresh2 (512 for me), and I don't know what
> >>happens above.
> >
> >Above, you will perhaps need the not-so-elegant userspace arpd :-/
>
> Yes, i was suspecting that the arp cache got full, but i will try
> increasing it :)
> Would there be any huge bugs if i change these lines in arp.c:
>
> .gc_thresh1 = 128,
> .gc_thresh2 = 512,
>
> to
>
> .gc_thresh1 = 700,
> .gc_thresh2 = 700,
>
> under the definition for struct arp_tbl?

I don't think it could cause a problem, but network people will surely
correct me if I'm wrong.

> This setup will only run for about 1-2 hours while we fix the hardware
> router (it is running now, but only on a backup flash card solution.
> the harddrive in it died ;)

Huhhh! Please tell us exactly what make and model of ROUTER you are using
which embeds a HARD DRIVE, so that we remember never to buy it! Having
seen uptimes of 5 years on moderately big access routers, I would have
found it awful to see them die multiple times in that timeframe because
of a crappy IDE drive inside!

> I have been looking at arpd, but i quickly discarded it as an option
> since its marked both experimental and obsolete ;)

I never dared to try it either, and since 512 has always been enough
for me, anything above is unknown area to me :-)

Regards,
Willy

2007-05-03 22:51:11

by Jan Engelhardt

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)


On May 4 2007 00:23, Willy Tarreau wrote:
>
>> This setup will only run for about 1-2 hours while we fix the hardware
>> router (it is running now, but only on a backup flash card solution.
>> the harddrive in it died ;)
>
>Huhhh! Please tell us exactly what make and model of ROUTER you are using
>which embeds a HARD DRIVE, so that we recall never to buy that ! Having
>seen uptimes of 5 years on moderately big access routers, I would have
>find it awful to see them die multiple times in that timeframe because
>of a crappy IDE drive inside !

Haha. Would you be happy if it ran on a CF card instead? :>


Jan
--

2007-05-04 00:11:55

by Willy Tarreau

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Fri, May 04, 2007 at 12:50:17AM +0200, Jan Engelhardt wrote:
>
> On May 4 2007 00:23, Willy Tarreau wrote:
> >
> >> This setup will only run for about 1-2 hours while we fix the hardware
> >> router (it is running now, but only on a backup flash card solution.
> >> the harddrive in it died ;)
> >
> >Huhhh! Please tell us exactly what make and model of ROUTER you are using
> >which embeds a HARD DRIVE, so that we recall never to buy that ! Having
> >seen uptimes of 5 years on moderately big access routers, I would have
> >find it awful to see them die multiple times in that timeframe because
> >of a crappy IDE drive inside !
>
> Haha. Would you be happy if it ran on a CF card instead? :>

Yes, because at least when you design a system to run on a CF card, you
ensure never to write on it because you know that would kill it. Then
since you never write on it, it does not wear out and has no problem
running for years (unless you bought cheap end-user CF of course). But
industrial-grade CF *is* reliable for such usages. People having problems
with CF are dumb asses who install a full standard system on those
(sometimes even with swap) then complain it dies after one year.

A hard disk simply fails after some time even if you never use it at all.
A head flying 10 microns above a platter passing at 33 m/s obviously likes
to caress it sometimes, with a polite "oops sorry" excuse that you hear
meters away.

That's a pretty bad design to put such a SPOF in some equipment which IMHO
has no real justification for embedding one, really.

Cheers,
Willy

2007-05-04 03:48:22

by Øyvind Vågen Jægtnes

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

Hi again :)

On 5/4/07, Willy Tarreau <[email protected]> wrote:
> On Thu, May 03, 2007 at 11:12:09PM +0200, Øyvind Vågen Jægtnes wrote:
> > On 5/3/07, Jan Engelhardt <[email protected]> wrote:
> > >
> > >On May 3 2007 22:53, Willy Tarreau wrote:
> > >>> For the rest all we see in the arp cache is (incomplete)
> > >>
> > >>I suspect that your arp cache is full (128 entries by default).
> > >>Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
> > >>set it as high as gc_thresh2 (512 for me), and I don't know what
> > >>happens above.
> > >
> > >Above, you will perhaps need the not-so-elegant userspace arpd :-/
> >
> > Yes, i was suspecting that the arp cache got full, but i will try
> > increasing it :)
> > Would there be any huge bugs if i change these lines in arp.c:
> >
> > .gc_thresh1 = 128,
> > .gc_thresh2 = 512,
> >
> > to
> >
> > .gc_thresh1 = 700,
> > .gc_thresh2 = 700,
> >
> > under the definition for struct arp_tbl?
>
> I don't think it could cause a problem, but network people will surely
> correct me if I'm wrong.

The system is up and running perfectly now. With the above changes to
arp.c it is routing everything at about 200 Mbit/s with only 5% load
average.

So the real question now is, why is this number so low by default?
It would probably be much better if this could be handled dynamically
in the kernel.

> > This setup will only run for about 1-2 hours while we fix the hardware
> > router (it is running now, but only on a backup flash card solution.
> > the harddrive in it died ;)
>
> Huhhh! Please tell us exactly what make and model of ROUTER you are using
> which embeds a HARD DRIVE, so that we recall never to buy that ! Having
> seen uptimes of 5 years on moderately big access routers, I would have
> find it awful to see them die multiple times in that timeframe because
> of a crappy IDE drive inside !

It's a Juniper M7i.
It comes by default with a 5400 rpm 2.5" laptop hard drive, but we have now
bought a more robust "server" 2.5" hard drive. It still barfs on the OS
install, so the Linux box is doing all the work for now. We will get a
Juniper guy to come and fix it :)

As a side note, i'm starting to wonder if it was worth the $20k when i
could just have a linux machine to do the job with a clone for backup
;)

regards
Øyvind Vågen Jægtnes
+47 96 22 03 08
[email protected]

2007-05-04 05:30:32

by Willy Tarreau

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

On Fri, May 04, 2007 at 05:48:18AM +0200, Øyvind Vågen Jægtnes wrote:
> Hi again :)
>
> On 5/4/07, Willy Tarreau <[email protected]> wrote:
> >On Thu, May 03, 2007 at 11:12:09PM +0200, Øyvind Vågen Jægtnes wrote:
> >> On 5/3/07, Jan Engelhardt <[email protected]> wrote:
> >> >
> >> >On May 3 2007 22:53, Willy Tarreau wrote:
> >> >>> For the rest all we see in the arp cache is (incomplete)
> >> >>
> >> >>I suspect that your arp cache is full (128 entries by default).
> >> >>Check /proc/sys/net/ipv4/neigh/gc_thresh1 (128 for me). You can
> >> >>set it as high as gc_thresh2 (512 for me), and I don't know what
> >> >>happens above.
> >> >
> >> >Above, you will perhaps need the not-so-elegant userspace arpd :-/
> >>
> >> Yes, i was suspecting that the arp cache got full, but i will try
> >> increasing it :)
> >> Would there be any huge bugs if i change these lines in arp.c:
> >>
> >> .gc_thresh1 = 128,
> >> .gc_thresh2 = 512,
> >>
> >> to
> >>
> >> .gc_thresh1 = 700,
> >> .gc_thresh2 = 700,
> >>
> >> under the definition for struct arp_tbl?
> >
> >I don't think it could cause a problem, but network people will surely
> >correct me if I'm wrong.
>
> System is up and running perfectly now, it is routing everything at
> about 200 mbps now with only 5% load avg with the above changes to
> arp.c
>
> So the real question now is, why is this number so low by default?
> It would probably be much better if this could be handled dynamically
> in the kernel.

I remember reading an argument against this a long time ago, but I
don't remember where. I think it was an arbitrary decision that
people using more than X ARP entries will need arpd. Most probably
the code path in the ARP updates is/was not much optimized to handle
a large number of entries. Think about cable operators who may have
10,000-20,000 entries!

> Its a Juniper M7i
> It comes default with a 5400 rpm laptop 2.5" harddrive but now we
> bought a more robust "server" 2.5" harddrive.

The "server" ones are not necessarily more robust, often they are faster.

> It still barfs on the OS
> install, so the linux is doing all the job now. Will get a juniper guy
> to come and fix :)
>
> As a side note, i'm starting to wonder if it was worth the $20k when i
> could just have a linux machine to do the job with a clone for backup
> ;)

That's often how linux penetrates the enterprise ;-)

Willy

2007-05-04 07:08:15

by Miquel van Smoorenburg

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

In article <[email protected]> you write:
>Its a Juniper M7i
>It comes default with a 5400 rpm laptop 2.5" harddrive but now we
>bought a more robust "server" 2.5" harddrive. It still barfs on the OS
>install, so the linux is doing all the job now. Will get a juniper guy
>to come and fix :)
>
>As a side note, i'm starting to wonder if it was worth the $20k when i
>could just have a linux machine to do the job with a clone for backup
>;)

Well, the features and esp. the JunOS cli are worth a lot. And
if you need to route more than say 3 gbit/s, PC hardware just
won't cut it.

Then again, if you like the CLI, don't need to route more
than 1 gbit/s, and don't need to do fancy stuff like MPLS,
QoS or shaping, there's always solutions like http://www.vyatta.com/

Mike.

2007-05-04 07:31:13

by Andi Kleen

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

"Miquel van Smoorenburg" <[email protected]> writes:

> And
> if you need to route more than say 3 gbit/s, PC hardware just
> won't cut it.

Each new x86 hardware generation normally can route more than the previous
generation. If you give out such a dubious number you would always need to give
it a (short) expiry date.

-Andi

2007-05-04 07:54:22

by Jan Engelhardt

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)


On May 4 2007 02:11, Willy Tarreau wrote:
>>
>> Haha. Would you be happy if it ran on a CF card instead? :>
>
>Yes, because at least when you design a system to run on a CF card, you
>ensure never to write on it because you know that would kill it. Then
>since you never write on it, it does not wear out and has no problem
>running for years (unless you bought cheap end-user CF of course).

Funny, I just installed a 'full' Linux distro on a CF, like with a
regular harddisk, and it runs in full rw mode. Packing it up in a
squashfs and running the thing with aufs did not seem worth
the hassle of setting up a specialized initrd. And then, when you
need to make one change (firewall), it's faster than recreating the
sqfs image.
Will see how long that lasts.


Jan
--

2007-05-04 08:06:44

by Jan Engelhardt

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)


On May 4 2007 05:48, Øyvind Vågen Jægtnes wrote:
>
> As a side note, i'm starting to wonder if it was worth the $20k when i
> could just have a linux machine to do the job with a clone for backup
> ;)

Most often not. The big bosses (who make most decisions yet are not
always the most clueful wrt. tech) look for "certification", the
"enterprise" sticker, and the correct blame policy (if it breaks, you
can kill <Vendor>; if your linux box breaks, you have to fix it
yourself). And here's an example showing that this is not always optimal.

A $2k core router once died of the HD (sounds similar eh?). It took
the vendor 27h to replace it (and their office is just 500m away),
while if we could have swapped the faulty disk ourselves, it would have
only taken as long as the disk copy takes (38 minutes for 80 GB at a
transfer rate of 35MB/s).


Jan
--

2007-05-04 14:27:14

by Paul Slootman

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

Øyvind Vågen Jægtnes <[email protected]> wrote:
>
>Yes, i was suspecting that the arp cache got full, but i will try
>increasing it :)
>Would there be any huge bugs if i change these lines in arp.c:
>
> .gc_thresh1 = 128,
> .gc_thresh2 = 512,
>
>to
>
> .gc_thresh1 = 700,
> .gc_thresh2 = 700,
>
>under the definition for struct arp_tbl?

Why not simply update the /proc/sys values?
No need to recompile the kernel.....
We have this in the /etc/sysctl.conf for our firewall:

net/ipv4/neigh/default/gc_thresh1=32768
net/ipv4/neigh/default/gc_thresh2=65536
net/ipv4/neigh/default/gc_thresh3=262144
net/ipv4/route/gc_elasticity=8
net/ipv4/route/gc_interval=30
net/ipv4/route/gc_min_interval=2
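
Each slash-separated key above corresponds to a file of the same relative path under /proc/sys. The sketch below makes that mapping explicit for a sysctl.conf fragment like this one. It handles only slash-form keys such as those shown here (dotted keys are not translated), and the function name `sysctl_conf_to_proc` is invented for illustration. It only computes paths and writes nothing.

```python
def sysctl_conf_to_proc(conf_text, root="/proc/sys"):
    """Map slash-form sysctl.conf entries to their /proc/sys paths.

    Blank lines and comment lines (starting with '#' or ';') are
    skipped, as are lines without '='. Values are kept as strings.
    """
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[f"{root}/{key.strip()}"] = value.strip()
    return settings


conf = """\
net/ipv4/neigh/default/gc_thresh1=32768
net/ipv4/neigh/default/gc_thresh2=65536
net/ipv4/neigh/default/gc_thresh3=262144
"""

for path, value in sysctl_conf_to_proc(conf).items():
    print(path, "<-", value)
```

On a live system the same values can be applied without a reboot by writing them to those files as root, or by reloading the file with `sysctl -p`.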


Paul Slootman

2007-05-04 20:27:15

by Willy Tarreau

Subject: Re: Routing 600+ vlan's via linux problems (looks like arp problems)

Hi Jan,

On Fri, May 04, 2007 at 09:53:31AM +0200, Jan Engelhardt wrote:
>
> On May 4 2007 02:11, Willy Tarreau wrote:
> >>
> >> Haha. Would you be happy if it ran on a CF card instead? :>
> >
> >Yes, because at least when you design a system to run on a CF card, you
> >ensure never to write on it because you know that would kill it. Then
> >since you never write on it, it does not wear out and has no problem
> >running for years (unless you bought cheap end-user CF of course).
>
> Funny, I just installed a 'full' Linux distro on a CF, like with a
> regular harddisk, and it runs in full rw mode. Packing it up in a
> squashfs and running the thing with aufs did not seem worth
> the hassle of setting up a specialized initrd. And then, when you
> need to make one change (firewall), it's faster than recreating the
> sqfs image.
> Will see how it long that lasts.

This is acceptable for a single machine. I do have something similar
(though mostly read-only and with home dirs on NFS) at home as an
always-on browser. But managing a lot of remote machines that way is
often difficult work, and the risk of losing remote machines increases
with the number of machines, the frequency of updates and the write
rates on the flash.

Packaging an easily upgradable and remotely testable system really
is worth it even for a few tens of machines.

Cheers,
Willy