2004-01-05 15:02:56

by Martin Knoblauch

Subject: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ?

>Hi,
>
>besides wishing everybody a Happy new Year 2004, I have one question.
>Have there been any changes in the multicast handling between 2.4.20
>and 2.4.22/23? Maybe specific to the "tg3" driver?
>
>Reason for my question is that the Ganglia monitoring toolkit stopped
>working with 2.4.22/23 kernels. Apparently multicasts get sent, but
>nothing is received.
>
>Any ideas?
>
>
>Thanks
>Martin

Hi,

I just realized how massive the changes between 2.4.21 and 2.4.22 were.
At least that part is answered :-) The question remains why no multicast
packets arrive in 2.4.22 and later (checked with tcpdump). Is there
anything that one has to enable when running a newer kernel?

One difference is the output of "cat /proc/net/igmp". On my 2.4.20
kernel it looks like:

[qx29340@lpsdm16 ~]$ cat /proc/net/igmp
Idx Device    : Count Querier  Group     Users Timer       Reporter
1   lo        :     0      V2
                               010000E0      1 0:F63C1223         0
2   eth0      :     1      V2
                               010000E0      1 0:F63C1225         0
3   eth1      :     2      V2
                               470B02EF      1 0:F63C2428         1
                               010000E0      1 0:F63C1225         0

While on 2.4.22/23 it looks like:

[qx29340@lpsdm20 linux-2.4.23-1-msc]$ cat /proc/net/igmp
Idx Device    : Count Querier  Group     Users Timer       Reporter
1   lo        :     0      V2
                               010000E0      1 0:FFFE5D5D         0
2   eth0      :     1      V2
                               010000E0      1 0:FFFE5D5D         0
3   eth1      :     2      V2
                               470B02EF      1 0:FFFF8D5D         0
                               010000E0      1 0:FFFE5D5D         0


The difference seems to be the "Reporter" flag for the 470B02EF
multicast group, which is exactly the address Ganglia uses.
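
For anyone comparing the Group column with the dotted address in the Ganglia configuration: the value is the 32-bit group address printed as a single hex number, so on a little-endian machine the bytes read in reverse. Here is a minimal Python sketch of the conversion, assuming little-endian hosts (which these x86 boxes are). The helper name igmp_group_to_ip is made up for illustration.

```python
import socket
import struct

def igmp_group_to_ip(hex_group: str) -> str:
    """Convert a Group value from /proc/net/igmp to dotted-quad notation.

    Assumption: the kernel printed the network-order address as a native
    integer on a little-endian CPU, so the bytes appear reversed.
    """
    value = int(hex_group, 16)
    # Re-pack as little-endian to recover the original network-order bytes.
    return socket.inet_ntoa(struct.pack("<I", value))

print(igmp_group_to_ip("470B02EF"))  # 239.2.11.71
print(igmp_group_to_ip("010000E0"))  # 224.0.0.1
```

So 470B02EF is 239.2.11.71 and 010000E0 is the all-hosts group 224.0.0.1.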

Any ideas?
Cheers
Martin
PS: Moved to linux-net

=====
------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de


2004-01-06 02:22:15

by Martin Knoblauch

Subject: Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ?

--- David Stevens wrote:

Hi David,

> Martin,
> If you have other hosts on that network that have sent IGMP reports,
> then the reporter flag will be cleared-- only one member on a network
> needs to send reports. That doesn't prevent the host from receiving
> multicasts-- presence in /proc/net/igmp indicates it did join the
> group.

OK. I just wondered, because all hosts running the 2.4.20-18.7smp
kernel (RH7.3 errata) show the "1" in the reporter field, while all
hosts running 2.4.22/2.4.23 (vanilla plus NFS fixes from Trond) show
"0".
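
To compare the Reporter column across all 21 hosts without reading the files by eye, something like the following Python sketch could be used. It assumes the 2.4 layout shown in my earlier mail (a device line followed by one line per group); the function name parse_igmp and the embedded sample are for illustration only. On a real host the text would come from reading /proc/net/igmp.

```python
def parse_igmp(text):
    """Return (device, group_hex, reporter) tuples from /proc/net/igmp text."""
    entries = []
    device = None
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 4 and ":" in fields[2]:
            # Group line: Group, Users, Timer, Reporter
            entries.append((device, fields[0], int(fields[3])))
        elif len(fields) >= 2:
            # Device line: Idx, Device, ':', Count, Querier
            device = fields[1].rstrip(":")
    return entries

# Sample copied from the 2.4.20 host earlier in this thread.
SAMPLE = """\
Idx Device : Count Querier Group Users Timer Reporter
1 lo : 0 V2
010000E0 1 0:F63C1223 0
2 eth0 : 1 V2
010000E0 1 0:F63C1225 0
3 eth1 : 2 V2
470B02EF 1 0:F63C2428 1
010000E0 1 0:F63C1225 0
"""

for device, group, reporter in parse_igmp(SAMPLE):
    print(device, group, "reporter" if reporter else "not reporter")
```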

> Please send me some details about your set-up and I may be able to
> help.

We are running 21 HP/DL380G3. Each has two internal Broadcom NICs. The
second one (eth1) is used for the Ganglia multicast.

The kernels are 2.4.22 and 2.4.23 (now .24) with some NFS patches. In
the case of 2.4.24 those are:

01-posix_race
02-fix_commit
03-fix_osx
04-fix_lockd3
06-fix_unlink
07_seekdir

from http://www.fys.uio.no/~trondmy/src/Linux-2.4.x/2.4.23-rc1

None of those looks like it does anything to multicast. In the worst
case I could try to run with plain 2.4.22/23, but that would have to
wait until Wednesday.

The kernel-config file is included.

> First, are the sender and receiver on the same network, or are you
> using a multicast router?

No multicast router. All systems are on the same network.

> Second, can you send me a tcpdump-format packet trace (preferably
> only the multicast traffic, and as small as you can make it)?

What exactly do you want to see? On which box should I run tcpdump?
Which options (I'm not that deep into network debugging :-)?


> You mentioned tg3-- have you tried this with other hardware that
> worked?

The tg3 works with the 2.4.20-18.7smp (and earlier) kernels. It just
does not work with 2.4.22/23 (did not check 2.4.21).

Unfortunately I have no other boxes to test with. I can only say that
Ganglia never failed in this particular way on any setup.

Thanks
Martin



Attachments:
config-2.4.24-1-msc (32.09 kB)

2004-01-06 11:11:51

by Martin Knoblauch

Subject: Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ?

>> Second, can you send me a tcpdump-format packet trace (preferably
>> only the multicast traffic, and as small as you can make it)?
>
>What exactly do you want to see? On which box should I run tcpdump?
>Which options (I'm not that deep into network debugging :-)?

So, I have attached two traces from "tcpdump -i eth1 multicast". "good"
is from the 2.4.20-18.7smp kernel, "bad" is from 2.4.22.

What is interesting at first glance is that the 2.4.22 kernel seems to
drop or miss packets from other hosts that go to a non-privileged port
on the multicast group. Ganglia uses port 8649. In the "bad" case,
packets for "239.2.11.71.8649" only come from the local box. Does this
ring any bells for "tg3" or the networking code?
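
To take Ganglia itself out of the picture, a small stand-alone listener on the same group and port could be run on a "bad" host. The Python sketch below is a rough test aid, not anything Ganglia ships. The names build_mreq and listen are hypothetical, and the interface address has to be supplied by whoever runs it.

```python
import socket

GROUP = "239.2.11.71"  # Ganglia multicast group seen in the traces
PORT = 8649            # Ganglia port seen in the traces

def build_mreq(group: str, iface_ip: str) -> bytes:
    """Build an ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)

def listen(iface_ip: str, count: int = 5) -> None:
    """Join GROUP on the interface that owns iface_ip and print `count` senders."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow binding even if gmond already holds the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(GROUP, iface_ip))
    try:
        for _ in range(count):
            data, (src, sport) = sock.recvfrom(65535)
            print(f"{len(data)} bytes from {src}:{sport}")
    finally:
        sock.close()
```

Calling listen() with the IP address of eth1 should print a line per datagram. If only the local address ever shows up, or nothing at all, the loss happens below the application, which would match what tcpdump shows.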

Cheers
Martin



Attachments:
bad.tcpdump (9.37 kB)
good.tcpdump (8.41 kB)

2004-01-06 12:59:31

by Lawrence MacIntyre

Subject: Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ?

Can you look on the adjacent multicast router(s) and see if your host is
listed as a group member? Or is this done with just a flat IP network
(one subnet)?

--
Lawrence MacIntyre 865.574.8696
Oak Ridge National Laboratory
High Performance Information Infrastructure Technology Group



2004-01-06 13:17:58

by Martin Knoblauch

Subject: Re: Any changes in Multicast code between 2.4.20 and 2.4.22/23 ?

>Can you look on the adjacent multicast router(s) and see if your host
>is listed as a group member? Or is this done with just a flat IP
>network (one subnet)?

Lawrence,

all in one subnet.

Martin

