2014-01-03 14:38:09

by Andreas Hartmann

Subject: Strange problem with vxlan!

The network looks as follows: a virtual bridge br0 is connected to a remote Ethernet switch through a vxlan tunnel that runs over WLAN:



host [br0: tap0,vxlan0]
| ||
| ===========
| ||
| ||
VM (WLAN access point) [br0: eth0, wlan0] ||
| ||
| ||
------------- ||
| ||
STA [wlan0, br0: eth0, vxlan0]
|
|
|----------------------------------|
Switch
|
----------
|
notebook [eth0]



The configuration of the vxlan is:

host: route add -net 224.0.0.0 netmask 240.0.0.0 dev br0
      ip li add vxlan0 type vxlan id 1 group 239.1.1.1 dev br0

STA:  route add -net 224.0.0.0 netmask 240.0.0.0 dev wlan0
      ip li add vxlan0 type vxlan id 1 group 239.1.1.1 dev wlan0

This means: the endpoints of the vxlan tunnel are br0 (on the host) and wlan0 (on the STA).
Between them sits the WLAN AP (a VM running on the host).
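
The same setup can be written with iproute2 only. This is a rough sketch, not a tested recipe: the helper name vxlan_setup_cmds is made up for illustration, and the final "ip link set vxlan0 up" is an assumption, since only the creation of the device is shown above. The script prints the commands instead of running them, so they can be checked first.

```shell
#!/bin/sh
# Print (not run) the commands that create a multicast VXLAN endpoint.
# usage: vxlan_setup_cmds <underlay-device>
vxlan_setup_cmds() {
    dev="$1"
    # Route all multicast traffic (224.0.0.0/4) via the underlay device;
    # same as "route add -net 224.0.0.0 netmask 240.0.0.0 dev <dev>".
    echo "ip route add 224.0.0.0/4 dev $dev"
    # VXLAN segment with VNI 1, using multicast group 239.1.1.1.
    echo "ip link add vxlan0 type vxlan id 1 group 239.1.1.1 dev $dev"
    # Assumption: the device still has to be brought up afterwards.
    echo "ip link set vxlan0 up"
}

vxlan_setup_cmds br0     # host
vxlan_setup_cmds wlan0   # STA
```

Once the output looks right, it can be piped to sh as root on the respective machine.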


Now the problem:

If the VM (=AP) runs e.g. Linux 3.4.x, everything works as expected.
If the VM runs 3.12.x or even 3.10.x, the tunnel works for a few minutes after creation and then breaks.

Broken means:
A "dhcpcd eth0" on the notebook, for example, times out and no longer works. Traces show:
The UDP tunnel packets sent by the STA through vxlan0 can be seen on the host on tap0, but they cannot be seen on vxlan0 (when the tunnel works, they can be seen on the vxlan0 device, too).
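
A comparison like this can be made with two tcpdump captures on the host. This is only a sketch: the helper name trace_cmds is hypothetical, and UDP port 8472 (the Linux kernel default for vxlan) is assumed because the configuration above sets no port. The script only prints the commands.

```shell
#!/bin/sh
# Print tcpdump invocations for comparing outer and inner traffic on the host.
# Assumption: the Linux default VXLAN UDP port 8472 is in use.
VXLAN_PORT="${VXLAN_PORT:-8472}"

trace_cmds() {
    # Outer, encapsulated packets as they arrive on the tap device.
    echo "tcpdump -n -i tap0 udp port $VXLAN_PORT"
    # Inner, decapsulated DHCP traffic on the vxlan device.
    echo "tcpdump -n -e -i vxlan0 udp port 67 or udp port 68"
}

trace_cmds
```

If the first capture shows packets while the second stays silent during a "dhcpcd eth0" run on the notebook, the packets are lost between tap0 and vxlan0.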

The host runs Linux 3.10.x, the STA 3.11.6.


Any idea why vxlan is broken w/ Linux 3.12.x or 3.10.x on the VM (AP)?



Thanks in advance for any hint,
regards,
Andreas


2014-01-05 17:08:37

by Andreas Hartmann

Subject: Re: Strange problem with vxlan!

On Fri, 3 Jan 2014 15:27:19 +0100
Andreas Hartmann <[email protected]> wrote:

[...]

> Now the problem:
>
> If the VM (=AP) runs e.g. Linux 3.4.x, all is working fine as expected.
> If the VM runs 3.12.x or even 3.10.x, the tunnel works fine a few minutes after creation. Afterwards it is broken.
>
> Broken means:
> A "dhcpcd eth0" e.g. on the notebook times out, doesn't work any more. Traces show:
> The udp-tunnel-packages sent by the STA through vxlan0 can be seen on the host / tap0, but they can't be seen on vxlan0 (if it works, they can be seen on the vxlan0 device, too).
>
> On the host runs Linux 3.10.x, on the STA 3.11.6.

Some more findings:

- The problem also shows up with Linux 3.7 on the AP (VM).
- *The problem disappears* if the bridge device br0 on the host is set to
  promiscuous mode.
- Sometimes the warning
  "notebook dhcpcd[2784]: eth0: bad UDP checksum, ignoring"
  appears when starting dhcpcd on the notebook while br0 on the host is in
  promiscuous mode (dhcpcd nevertheless worked fine). I never saw this
  warning before.
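
The promiscuous-mode workaround from the second finding can be sketched as follows. The helper name promisc_cmds is made up for illustration; it only prints the iproute2 commands and does not change anything by itself.

```shell
#!/bin/sh
# Print the commands for toggling promiscuous mode on a device.
# usage: promisc_cmds <device> <on|off>
promisc_cmds() {
    dev="$1"; state="$2"
    case "$state" in
        on|off) ;;
        *) echo "state must be on or off" >&2; return 1 ;;
    esac
    echo "ip link set dev $dev promisc $state"
    # The flags in the output should list PROMISC after switching it on.
    echo "ip link show dev $dev"
}

promisc_cmds br0 on
```

Calling it with "off" prints the commands that revert the setting.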


Any idea how to fix the problem w/o running the bridge br0 on the host
in promiscuous mode?



Thanks for any hint,
Andreas

2014-01-09 06:50:47

by Andreas Hartmann

Subject: Re: Strange problem with vxlan!

Hi!

For anyone else having problems w/ broken multicast:

See the solution here:
http://article.gmane.org/gmane.linux.kernel/1625590


Regards,
Andreas