2014-02-15 02:59:50

by Luis Chamberlain

Subject: [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends

From: "Luis R. Rodriguez" <[email protected]>

This v2 series changes the approach from my original virtualization
multicast patch series [0] by abandoning completely the multicast
issues and instead generalizing an approach for virtualization
backends. There are two things in common with virtualization
backends:

0) they should not become the root bridge
1) they don't need ipv4 / ipv6 interfaces

Both qemu's usage of TAP interfaces and the xen-netback driver
avoid having their interfaces become the root bridge by
using a high MAC address. Let's just generalize the solution
by making this a flag.

Skipping IPv4 / IPv6 interfaces is an optimization I found
possible while studying xen-netback in a shared physical
bridge environment. I haven't been able to test the NAT
environment, so I'd appreciate it if someone could test these
patches for that case in case I don't get to it.

The same flags can be adopted by TAP interfaces when needed;
I tested this with the following temporary patch:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 44c4db8..19b967e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -940,6 +940,7 @@ static void tun_net_init(struct net_device *dev)
ether_setup(dev);
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+ dev->priv_flags |= IFF_BRIDGE_NON_ROOT | IFF_SKIP_IP;

eth_hw_addr_random(dev);


A proper follow-up would be to specify the flags during open(), or
in any case prior to register_netdevice(). Before that is done we'd
need to evaluate all qemu use cases of TAP interfaces, both
for the xen HVM case (which tests fine for me) and for KVM's
use cases in both shared physical and NAT setups. That is,
test the above patch and this series for all KVM / xen use cases.

[0] http://marc.info/?l=linux-netdev&m=139207142110536&w=2

Luis R. Rodriguez (4):
bridge: enable interfaces to opt out from becoming the root bridge
net: enables interface option to skip IP
xen-netback: use a random MAC address
xen-netback: skip IPv4 and IPv6 interfaces

drivers/net/xen-netback/interface.c | 14 +++++---------
include/uapi/linux/if.h | 2 ++
net/bridge/br_if.c | 2 ++
net/bridge/br_private.h | 1 +
net/bridge/br_stp_if.c | 2 ++
net/ipv4/devinet.c | 3 +++
net/ipv6/addrconf.c | 6 ++++++
7 files changed, 21 insertions(+), 9 deletions(-)

--
1.8.5.2


2014-02-15 03:00:39

by Luis Chamberlain

Subject: [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

From: "Luis R. Rodriguez" <[email protected]>

It doesn't make sense for some interfaces to become a root bridge
at any point in time. One example is virtual backend interfaces
which rely on other entities on the bridge for actual physical
connectivity. They only provide virtual access.

Device drivers that know they should never become part of the
root bridge have been using a trick of setting their MAC address
to a numerically high, non-broadcast MAC address such as
FE:FF:FF:FF:FF:FF. Instead of using these hacks, let the interfaces
annotate their intent and generalize a solution for multiple drivers,
while letting the drivers use a random MAC address or one prefixed
with a proper OUI.
This sort of hack is used by both qemu and xen for their backend
interfaces.

Cc: Stephen Hemminger <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/uapi/linux/if.h | 1 +
net/bridge/br_if.c | 2 ++
net/bridge/br_private.h | 1 +
net/bridge/br_stp_if.c | 2 ++
4 files changed, 6 insertions(+)

diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index d758163..8d10382 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -84,6 +84,7 @@
#define IFF_LIVE_ADDR_CHANGE 0x100000 /* device supports hardware address
* change when it's running */
#define IFF_MACVLAN 0x200000 /* Macvlan device */
+#define IFF_BRIDGE_NON_ROOT 0x400000 /* Don't consider for root bridge */


#define IF_GET_IFACE 0x0001 /* for querying only */
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 4bf02ad..a745415 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -228,6 +228,8 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
br_init_port(p);
p->state = BR_STATE_DISABLED;
br_stp_port_timer_init(p);
+ if (dev->priv_flags & IFF_BRIDGE_NON_ROOT)
+ p->flags |= BR_DONT_ROOT;
br_multicast_add_port(p);

return p;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 045d56e..a89e8ad 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -173,6 +173,7 @@ struct net_bridge_port
#define BR_ADMIN_COST 0x00000010
#define BR_LEARNING 0x00000020
#define BR_FLOOD 0x00000040
+#define BR_DONT_ROOT 0x00000080

#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
struct bridge_mcast_query ip4_query;
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 656a6f3..12fd848 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -228,6 +228,8 @@ bool br_stp_recalculate_bridge_id(struct net_bridge *br)
return false;

list_for_each_entry(p, &br->port_list, list) {
+ if (p->flags & BR_DONT_ROOT)
+ continue;
if (addr == br_mac_zero ||
memcmp(p->dev->dev_addr, addr, ETH_ALEN) < 0)
addr = p->dev->dev_addr;
--
1.8.5.2
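
The selection rule this patch hooks into can be modeled in userspace: the bridge takes the numerically lowest port MAC address as its own address, and a port carrying the opt-out flag is simply skipped. The sketch below is a minimal model of the loop in br_stp_recalculate_bridge_id(), assuming a stripped-down port type; struct port and PORT_DONT_ROOT are hypothetical stand-ins for struct net_bridge_port and BR_DONT_ROOT, not kernel code.

```c
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6
#define PORT_DONT_ROOT 0x1u /* hypothetical stand-in for BR_DONT_ROOT */

/* Hypothetical, stripped-down stand-in for struct net_bridge_port. */
struct port {
	unsigned char addr[ETH_ALEN];
	unsigned int flags;
};

/*
 * Model of the loop in br_stp_recalculate_bridge_id(): return the
 * numerically lowest port MAC address, skipping ports that opted out.
 * NULL plays the role of the kernel's br_mac_zero "nothing chosen yet".
 */
static const unsigned char *pick_bridge_addr(const struct port *ports,
					     size_t n)
{
	const unsigned char *addr = NULL;

	for (size_t i = 0; i < n; i++) {
		if (ports[i].flags & PORT_DONT_ROOT)
			continue;
		if (addr == NULL || memcmp(ports[i].addr, addr, ETH_ALEN) < 0)
			addr = ports[i].addr;
	}
	return addr;
}
```

With a physical port at 52:54:00:12:34:56 and a backend port whose random address starts with 0x02, the backend wins the comparison unless it carries the opt-out flag. FE:FF:FF:FF:FF:FF loses against almost any other unicast address, which is why the old hack worked.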

2014-02-15 03:00:51

by Luis Chamberlain

Subject: [RFC v2 3/4] xen-netback: use a random MAC address

From: "Luis R. Rodriguez" <[email protected]>

The purpose of using a static MAC address of FE:FF:FF:FF:FF:FF
was to prevent the bridge from using our backend interfaces'
address and nominating our interface as the root bridge. This
works because the bridge code uses the lowest port MAC address
whenever a new interface gets added to the bridge. The bridge
code now has a generic feature that lets interfaces opt out of
root bridge nominations; use that instead.

Cc: Paul Durrant <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/net/xen-netback/interface.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index fff8cdd..d380e3f 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -42,6 +42,8 @@
#define XENVIF_QUEUE_LENGTH 32
#define XENVIF_NAPI_WEIGHT 64

+static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };
+
int xenvif_schedulable(struct xenvif *vif)
{
return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
@@ -347,15 +349,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
for (i = 0; i < MAX_PENDING_REQS; i++)
vif->mmap_pages[i] = NULL;

- /*
- * Initialise a dummy MAC address. We choose the numerically
- * largest non-broadcast address to prevent the address getting
- * stolen by an Ethernet bridge for STP purposes.
- * (FE:FF:FF:FF:FF:FF)
- */
- memset(dev->dev_addr, 0xFF, ETH_ALEN);
- dev->dev_addr[0] &= ~0x01;
-
+ eth_hw_addr_random(dev);
+ memcpy(dev->dev_addr, xen_oui, 3);
+ dev->priv_flags |= IFF_BRIDGE_NON_ROOT;
netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);

netif_carrier_off(dev);
--
1.8.5.2

2014-02-15 03:00:53

by Luis Chamberlain

Subject: [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces

From: "Luis R. Rodriguez" <[email protected]>

The xen-netback driver is used only to provide a backend
interface for the frontend. The link is the only thing we
use, and it is used internally to let us know when
xen-netfront is ready, that is, when it switches to XenbusStateConnected.

Note that only when both xen-netfront and xen-netback
are in state XenbusStateConnected will xen-netback allow
userspace on the host (backend) to bring up the interface. Enabling
and disabling the interface will simply enable or disable NAPI,
respectively, which is used for IRQ communication set up with
the xen event channels.

Cc: Paul Durrant <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/net/xen-netback/interface.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index d380e3f..07e6fd2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -351,7 +351,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,

eth_hw_addr_random(dev);
memcpy(dev->dev_addr, xen_oui, 3);
- dev->priv_flags |= IFF_BRIDGE_NON_ROOT;
+ dev->priv_flags |= IFF_BRIDGE_NON_ROOT | IFF_SKIP_IP;
netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);

netif_carrier_off(dev);
--
1.8.5.2

2014-02-15 03:00:46

by Luis Chamberlain

Subject: [RFC v2 2/4] net: enables interface option to skip IP

From: "Luis R. Rodriguez" <[email protected]>

Some interfaces do not need to have any IPv4 or IPv6
addresses, so enable an option to specify this. One
example is virtualization backend interfaces, which
just use the net_device constructs to help with their
respective frontends.

This should reduce boot time and complexity in
virtualization environments for each backend interface
while also avoiding triggering SLAAC and DAD, which is
simply pointless for these types of interfaces.

Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: James Morris <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
include/uapi/linux/if.h | 1 +
net/ipv4/devinet.c | 3 +++
net/ipv6/addrconf.c | 6 ++++++
3 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 8d10382..566d856 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -85,6 +85,7 @@
* change when it's running */
#define IFF_MACVLAN 0x200000 /* Macvlan device */
#define IFF_BRIDGE_NON_ROOT 0x400000 /* Don't consider for root bridge */
+#define IFF_SKIP_IP 0x800000 /* Skip IPv4, IPv6 */


#define IF_GET_IFACE 0x0001 /* for querying only */
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a1b5bcb..8e9ef07 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1342,6 +1342,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,

ASSERT_RTNL();

+ if (dev->priv_flags & IFF_SKIP_IP)
+ goto out;
+
if (!in_dev) {
if (event == NETDEV_REGISTER) {
in_dev = inetdev_init(dev);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4b6b720..57f58e3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -314,6 +314,9 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)

ASSERT_RTNL();

+ if (dev->priv_flags & IFF_SKIP_IP)
+ return NULL;
+
if (dev->mtu < IPV6_MIN_MTU)
return NULL;

@@ -2749,6 +2752,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
int run_pending = 0;
int err;

+ if (dev->priv_flags & IFF_SKIP_IP)
+ return NOTIFY_OK;
+
switch (event) {
case NETDEV_REGISTER:
if (!idev && dev->mtu >= IPV6_MIN_MTU) {
--
1.8.5.2
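
The control flow added by the three hunks above can be modeled in userspace as below. Only the flag values come from this patch; struct fake_netdev and fake_register_event() are hypothetical names, and this is a sketch of the early-exit logic, not of the real notifier chain.

```c
#include <stdbool.h>

/* Flag values as proposed in this RFC series. */
#define IFF_BRIDGE_NON_ROOT 0x400000u
#define IFF_SKIP_IP         0x800000u

/* Hypothetical, minimal stand-in for struct net_device. */
struct fake_netdev {
	unsigned int priv_flags;
	bool has_ip_state;	/* stands in for in_dev / inet6_dev */
};

/*
 * Model of the early exits this patch adds to inetdev_event() and
 * addrconf_notify(): a device carrying IFF_SKIP_IP is ignored on
 * NETDEV_REGISTER, so no per-protocol state is ever allocated.
 */
static void fake_register_event(struct fake_netdev *dev)
{
	if (dev->priv_flags & IFF_SKIP_IP)
		return;			/* mirrors "goto out" / NOTIFY_OK */
	dev->has_ip_state = true;	/* mirrors inetdev_init() / ipv6_add_dev() */
}
```

Because the check sits before any allocation, the flag has to be set before register_netdevice() for it to have any effect, which matches the cover letter's note about setting flags early.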

2014-02-16 18:57:08

by Ben Hutchings

Subject: Re: [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> It doesn't make sense for some interfaces to become a root bridge

I think you mean 'root port'.

> at any point in time. One example is virtual backend interfaces
> which rely on other entities on the bridge for actual physical
> connectivity. They only provide virtual access.
>
> Device drivers that know they should never become part of the
> root bridge have been using a trick of setting their MAC address
> to a numerically high, non-broadcast MAC address such as
> FE:FF:FF:FF:FF:FF. Instead of using these hacks, let the interfaces
> annotate their intent and generalize a solution for multiple drivers,
> while letting the drivers use a random MAC address or one prefixed
> with a proper OUI.
> This sort of hack is used by both qemu and xen for their backend
> interfaces.
>
> Cc: Stephen Hemminger <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> include/uapi/linux/if.h | 1 +
> net/bridge/br_if.c | 2 ++
> net/bridge/br_private.h | 1 +
> net/bridge/br_stp_if.c | 2 ++
> 4 files changed, 6 insertions(+)
>
> diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
> index d758163..8d10382 100644
> --- a/include/uapi/linux/if.h
> +++ b/include/uapi/linux/if.h
> @@ -84,6 +84,7 @@
> #define IFF_LIVE_ADDR_CHANGE 0x100000 /* device supports hardware address
> * change when it's running */
> #define IFF_MACVLAN 0x200000 /* Macvlan device */
> +#define IFF_BRIDGE_NON_ROOT 0x400000 /* Don't consider for root bridge */
[...]

Does it really make sense to add a flag that says exactly which special
behaviour you want, or would it be better to define the flag as a
passive property, which other drivers/protocols then use as a condition
for special behaviour?

The fact that you also define the IFF_SKIP_IP flag, and set it on
exactly the same devices, makes me think that they should actually be a
single flag. I don't know how that flag should be named or described,
though.

Ben.

--
Ben Hutchings
Any sufficiently advanced bug is indistinguishable from a feature.
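
Ben's alternative can be sketched as a single flag that describes what the device is, with each subsystem deriving its own policy from it. Everything below is hypothetical: the flag name, its value, and the helper functions are invented for illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical single "passive property" flag: it states what the
 * device is, not which special behaviour to apply.
 */
#define IFF_HYPOTHETICAL_VIRT_BACKEND 0x400000u

struct fake_netdev {
	unsigned int priv_flags;
};

static bool is_virt_backend(const struct fake_netdev *dev)
{
	return dev->priv_flags & IFF_HYPOTHETICAL_VIRT_BACKEND;
}

/* The bridge code decides on its own what the property implies ... */
static bool bridge_may_take_addr(const struct fake_netdev *dev)
{
	return !is_virt_backend(dev);
}

/* ... and so does the IP layer, independently. */
static bool ip_should_attach(const struct fake_netdev *dev)
{
	return !is_virt_backend(dev);
}
```

The trade-off: policy lives in each consumer, so a subsystem can refine what the property means without a new flag, but a driver can no longer opt into just one of the behaviours.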



2014-02-16 18:57:59

by Stephen Hemminger

Subject: Re: [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Fri, 14 Feb 2014 18:59:37 -0800
"Luis R. Rodriguez" <[email protected]> wrote:

> From: "Luis R. Rodriguez" <[email protected]>
>
> It doesn't make sense for some interfaces to become a root bridge
> at any point in time. One example is virtual backend interfaces
> which rely on other entities on the bridge for actual physical
> connectivity. They only provide virtual access.
>
> Device drivers that know they should never become part of the
> root bridge have been using a trick of setting their MAC address
> to a numerically high, non-broadcast MAC address such as
> FE:FF:FF:FF:FF:FF. Instead of using these hacks, let the interfaces
> annotate their intent and generalize a solution for multiple drivers,
> while letting the drivers use a random MAC address or one prefixed
> with a proper OUI.
> This sort of hack is used by both qemu and xen for their backend
> interfaces.
>
> Cc: Stephen Hemminger <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>

This is already supported in a more standard way via the root
block flag.

2014-02-17 10:27:40

by David Vrabel

Subject: Re: [Xen-devel] [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends

On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> This v2 series changes the approach from my original virtualization
> multicast patch series [0] by abandoning completely the multicast
> issues and instead generalizing an approach for virtualization
> backends. There are two things in common with virtualization
> backends:
>
> 0) they should not become the root bridge
> 1) they don't need ipv4 / ipv6 interfaces

Why? There's no real difference between a backend network device and a
physical device (from the point of view of the backend domain). I do
not think these are intrinsic properties of backend devices.

I can see these being useful knobs for administrators (or management
toolstacks) to turn on, on a per-device basis.

David

2014-02-17 10:29:46

by David Vrabel

Subject: Re: [Xen-devel] [RFC v2 3/4] xen-netback: use a random MAC address

On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> The purpose of using a static MAC address of FE:FF:FF:FF:FF:FF
> was to prevent the bridge from using our backend interfaces'
> address and nominating our interface as the root bridge. This
> works because the bridge code uses the lowest port MAC address
> whenever a new interface gets added to the bridge. The bridge
> code now has a generic feature that lets interfaces opt out of
> root bridge nominations; use that instead.
[...]
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -42,6 +42,8 @@
> #define XENVIF_QUEUE_LENGTH 32
> #define XENVIF_NAPI_WEIGHT 64
>
> +static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };

You shouldn't use a vendor prefix with a random MAC address. You should
set the locally administered bit and clear the multicast/unicast bit and
randomize the remaining 46 bits.

(If existing VIF scripts are doing something similar, they also need to
be fixed.)

David
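
The rule David describes amounts to two bit operations on the first octet of otherwise random bytes. A minimal sketch follows; the helper names are hypothetical, and the caller is assumed to supply the random bytes (for example from get_random_bytes() in the kernel). Linux's eth_random_addr() applies the same two masks.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Turn ETH_ALEN random bytes into a locally administered unicast MAC:
 * clear the I/G (multicast) bit and set the U/L (locally administered)
 * bit in the first octet, leaving the other 46 bits random.
 */
static void make_local_unicast(uint8_t addr[ETH_ALEN])
{
	addr[0] &= 0xfe;	/* clear multicast bit */
	addr[0] |= 0x02;	/* set locally administered bit */
}

static bool mac_is_multicast(const uint8_t addr[ETH_ALEN])
{
	return addr[0] & 0x01;
}

static bool mac_is_local(const uint8_t addr[ETH_ALEN])
{
	return addr[0] & 0x02;
}
```

Note that FE:FF:FF:FF:FF:FF already passes both checks: the old address was itself locally administered unicast. Its problem was being fixed and shared, not its bit pattern.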

2014-02-17 14:37:23

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces

There is a valid scenario to put IP addresses on the backend VIFs:

http://wiki.xen.org/wiki/Xen_Networking#Routing

Also, the backend is not necessarily Dom0; you can connect two guests
with backend/frontend pairs.

Zoli

On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> The xen-netback driver is used only to provide a backend
> interface for the frontend. The link is the only thing we
> use, and it is used internally to let us know when
> xen-netfront is ready, that is, when it switches to XenbusStateConnected.
>
> Note that only when both xen-netfront and xen-netback
> are in state XenbusStateConnected will xen-netback allow
> userspace on the host (backend) to bring up the interface. Enabling
> and disabling the interface will simply enable or disable NAPI,
> respectively, which is used for IRQ communication set up with
> the xen event channels.
>
> Cc: Paul Durrant <[email protected]>
> Cc: Ian Campbell <[email protected]>
> Cc: Wei Liu <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> drivers/net/xen-netback/interface.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index d380e3f..07e6fd2 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -351,7 +351,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>
> eth_hw_addr_random(dev);
> memcpy(dev->dev_addr, xen_oui, 3);
> - dev->priv_flags |= IFF_BRIDGE_NON_ROOT;
> + dev->priv_flags |= IFF_BRIDGE_NON_ROOT | IFF_SKIP_IP;
> netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
>
> netif_carrier_off(dev);
>

2014-02-17 17:52:51

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> It doesn't make sense for some interfaces to become a root bridge
> at any point in time. One example is virtual backend interfaces
> which rely on other entities on the bridge for actual physical
> connectivity. They only provide virtual access.

It is possible for a guest to bridge two VIFs together, either from the same
Dom0 bridge or from different ones. In that case using STP on VIFs sounds
sensible to me.

Zoli

2014-02-17 20:23:29

by Dan Williams

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> Some interfaces do not need to have any IPv4 or IPv6
> addresses, so enable an option to specify this. One
> example is virtualization backend interfaces, which
> just use the net_device constructs to help with their
> respective frontends.
>
> This should reduce boot time and complexity in
> virtualization environments for each backend interface
> while also avoiding triggering SLAAC and DAD, which is
> simply pointless for these types of interfaces.

Would it not be better/cleaner to use disable_ipv6 and then add a
disable_ipv4 sysctl, then use those with that interface? The
IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
already doing.

Dan

> Cc: "David S. Miller" <[email protected]>
> Cc: Alexey Kuznetsov <[email protected]>
> Cc: James Morris <[email protected]>
> Cc: Hideaki YOSHIFUJI <[email protected]>
> Cc: Patrick McHardy <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> include/uapi/linux/if.h | 1 +
> net/ipv4/devinet.c | 3 +++
> net/ipv6/addrconf.c | 6 ++++++
> 3 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
> index 8d10382..566d856 100644
> --- a/include/uapi/linux/if.h
> +++ b/include/uapi/linux/if.h
> @@ -85,6 +85,7 @@
> * change when it's running */
> #define IFF_MACVLAN 0x200000 /* Macvlan device */
> #define IFF_BRIDGE_NON_ROOT 0x400000 /* Don't consider for root bridge */
> +#define IFF_SKIP_IP 0x800000 /* Skip IPv4, IPv6 */
>
>
> #define IF_GET_IFACE 0x0001 /* for querying only */
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index a1b5bcb..8e9ef07 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -1342,6 +1342,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
>
> ASSERT_RTNL();
>
> + if (dev->priv_flags & IFF_SKIP_IP)
> + goto out;
> +
> if (!in_dev) {
> if (event == NETDEV_REGISTER) {
> in_dev = inetdev_init(dev);
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..57f58e3 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -314,6 +314,9 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
>
> ASSERT_RTNL();
>
> + if (dev->priv_flags & IFF_SKIP_IP)
> + return NULL;
> +
> if (dev->mtu < IPV6_MIN_MTU)
> return NULL;
>
> @@ -2749,6 +2752,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
> int run_pending = 0;
> int err;
>
> + if (dev->priv_flags & IFF_SKIP_IP)
> + return NOTIFY_OK;
> +
> switch (event) {
> case NETDEV_REGISTER:
> if (!idev && dev->mtu >= IPV6_MIN_MTU) {
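
Dan's alternative can be sketched from userspace: besides the global knobs, disable_ipv6 also exists per interface under /proc/sys/net/ipv6/conf/<ifname>/. As far as I can tell that entry only appears once the device is registered and its IPv6 state has been created, so it cannot avoid the initial setup IFF_SKIP_IP targets, but it does disable IPv6 on that interface afterwards. The helper names below are hypothetical.

```c
#include <stdio.h>

/*
 * Build the per-interface sysctl path, e.g.
 * /proc/sys/net/ipv6/conf/vif3.0/disable_ipv6.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int ipv6_disable_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/disable_ipv6",
			 ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write 0 or 1 to an existing sysctl file. Returns 0 on success. */
static int write_sysctl_bool(const char *path, int on)
{
	FILE *f = fopen(path, "r+");	/* "r+" refuses to create the file */
	if (!f)
		return -1;
	int ok = fprintf(f, "%d\n", on ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```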

2014-02-18 11:23:01

by Ian Campbell

Subject: Re: [Xen-devel] [RFC v2 3/4] xen-netback: use a random MAC address

On Mon, 2014-02-17 at 10:29 +0000, David Vrabel wrote:
> On 15/02/14 02:59, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <[email protected]>
> >
> > The purpose of using a static MAC address of FE:FF:FF:FF:FF:FF
> > was to prevent the bridge from using our backend interfaces'
> > address and nominating our interface as the root bridge. This
> > works because the bridge code uses the lowest port MAC address
> > whenever a new interface gets added to the bridge. The bridge
> > code now has a generic feature that lets interfaces opt out of
> > root bridge nominations; use that instead.
> [...]
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -42,6 +42,8 @@
> > #define XENVIF_QUEUE_LENGTH 32
> > #define XENVIF_NAPI_WEIGHT 64
> >
> > +static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };
>
> You shouldn't use a vendor prefix with a random MAC address. You should
> set the locally administered bit and clear the multicast/unicast bit and
> randomize the remaining 46 bits.

I'd have thought that eth_hw_addr_random would get this right, *checks*
yes it does. And then this patch tramples over the top three bytes.

Might there be any requirement to have a specific MAC on the vif device?
IOW, do we need to figure out a way to plumb this through the Xen tools
(perhaps having the vif script sort it out)?

Speaking of which -- do the Xen tools not overwrite this random MAC from
xen-network-common.sh:_setup_bridge_port? What is the plan to change
that (in a forwards/backwards compatible manner)?

Ian.
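
The effect Ian describes can be checked with a small model of the two lines in the patch. After the memcpy(), the first octet is 0x00, so the locally administered bit set by the randomizer is gone and only the low 24 bits remain random, inside the 00:16:3e OUI. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

static const uint8_t xen_oui[3] = { 0x00, 0x16, 0x3e };

/*
 * Mirrors the patch: the caller passes an address such as
 * eth_hw_addr_random() would produce, and the Xen OUI is then
 * copied over its top three bytes.
 */
static void overwrite_with_xen_oui(uint8_t addr[ETH_ALEN])
{
	memcpy(addr, xen_oui, sizeof(xen_oui));
}
```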

2014-02-18 19:44:24

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends

On Mon, Feb 17, 2014 at 2:27 AM, David Vrabel <[email protected]> wrote:
> On 15/02/14 02:59, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <[email protected]>
>>
>> This v2 series changes the approach from my original virtualization
>> multicast patch series [0] by abandoning completely the multicast
>> issues and instead generalizing an approach for virtualization
>> backends. There are two things in common with virtualization
>> backends:
>>
>> 0) they should not become the root bridge
>> 1) they don't need ipv4 / ipv6 interfaces
>
> Why? There's no real difference between a backend network device and a
> physical device (from the point of view of the backend domain). I do
> not think these are intrinsic properties of backend devices.

Let me clarify the original motivation as that can likely help explain
how I ended up with this patch series.

SUSE has had reports, going back to 2006, of xen backend interfaces
generating duplicate address notifications that fill up the logs on
systems with a series of guests. This was root-caused to DAD on IPv6
interfaces, and a workaround was implemented to disable DAD [0] on
multicast links. Even though this workaround should no longer be
applicable, given that xen-netback has used ether_setup() (which
enables the multicast flag) since it was upstreamed in 2.6.39, we
should try to ensure the issue doesn't creep up again. As per the
IPv6 RFCs and the Linux IPv6 implementation, DAD should be triggered
even in the case of manual IP configuration and when the link goes
up; as such, SLAAC will always take place on IPv6 interfaces.
Although this was not documented, upon review I determined the
original issue could also be attributed to the corner case documented
in Appendix A of RFC 4862 [1], and this could be more prevalent for
xen-netback given that we stuck to the same MAC address for all
xen-netback interfaces.

I first tried to generalize the workaround by addressing the multicast
requirement for IPv6 [2] and explicitly disabling multicast on
xen-netback. Although this approach could likely be generalized
further to account for NBMA links by checking dev->type, I determined
we didn't need IPv6 interfaces at all on the xen-netback interfaces.
This led me to further review whether we even needed IPv4 interfaces
as well, and it turns out we do not.
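
To see why a single shared MAC address matters here: SLAAC builds the link-local address from the MAC using a modified EUI-64 interface identifier (RFC 4291, Appendix A), so every vif using FE:FF:FF:FF:FF:FF ends up with the same fe80::fcff:ffff:feff:ffff. When several such vifs sit on one bridge, that is the kind of duplicate DAD would report. A minimal sketch of the derivation; the function name is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format the IPv6 link-local address derived from a MAC address via a
 * modified EUI-64 interface identifier: insert FF:FE between the two
 * halves and invert the universal/local bit (0x02) of the first octet.
 */
static void mac_to_link_local(const uint8_t mac[6], char *out, size_t len)
{
	const uint8_t iid[8] = {
		(uint8_t)(mac[0] ^ 0x02), mac[1], mac[2], 0xff,
		0xfe, mac[3], mac[4], mac[5],
	};

	snprintf(out, len, "fe80::%x:%x:%x:%x",
		 (unsigned)(iid[0] << 8 | iid[1]), (unsigned)(iid[2] << 8 | iid[3]),
		 (unsigned)(iid[4] << 8 | iid[5]), (unsigned)(iid[6] << 8 | iid[7]));
}
```

For example, 00:16:3e:12:34:56 maps to fe80::216:3eff:fe12:3456, so a per-device random or OUI-based address yields distinct link-local addresses.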

New motivation: removing IPv4 and IPv6 from the backend interfaces can
save a lot of boilerplate run-time code, prevent triggers from ever
taking place, and simplify the backend interfaces. If there is no use
for IPv4 and IPv6 interfaces, why do we have them? Note: I have yet to
test the NAT case.

> I can see these being useful knobs for administrators (or management
> toolstacks) to turn on, on a per-device basis.

Agreed, but these knobs don't even exist for drivers yet, let alone for
system administrators. I can certainly shoot for another series to let
administrators configure this as a preference, but if we know a
driver won't need IPv4 and IPv6 interfaces, why not just allow drivers
to disable them altogether? Consider the simplification of the
interfaces on the host.

[0] https://gitorious.org/opensuse/kernel-source/source/8e16582178a29b03e850468004a47e7be5ed3005:patches.xen/ipv6-no-autoconf
[1] http://tools.ietf.org/html/rfc4862
[2] http://marc.info/?l=linux-netdev&m=139207142110536&w=2

Luis

2014-02-18 20:17:00

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces

On Mon, Feb 17, 2014 at 6:36 AM, Zoltan Kiss <[email protected]> wrote:
> There is a valid scenario to put IP addresses on the backend VIFs:
>
> http://wiki.xen.org/wiki/Xen_Networking#Routing

This is useful thanks!

> Also, the backend is not necessarily Dom0, you can connect two guests with
> backend/frontend pairs.

Can you elaborate a bit more on this type of setup?

Luis

2014-02-18 21:02:53

by Luis Chamberlain

Subject: Re: [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
<[email protected]> wrote:
> On Fri, 14 Feb 2014 18:59:37 -0800
> "Luis R. Rodriguez" <[email protected]> wrote:
>
>> From: "Luis R. Rodriguez" <[email protected]>
>>
>> It doesn't make sense for some interfaces to become a root bridge
>> at any point in time. One example is virtual backend interfaces
>> which rely on other entities on the bridge for actual physical
>> connectivity. They only provide virtual access.
>>
>> Device drivers that know they should never become part of the
>> root bridge have been using a trick of setting their MAC address
>> to a numerically high, non-broadcast MAC address such as
>> FE:FF:FF:FF:FF:FF. Instead of using these hacks, let the interfaces
>> annotate their intent and generalize a solution for multiple drivers,
>> while letting the drivers use a random MAC address or one prefixed
>> with a proper OUI.
>> This sort of hack is used by both qemu and xen for their backend
>> interfaces.
>>
>> Cc: Stephen Hemminger <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>
> This is already supported in a more standard way via the root
> block flag.

Great! For documentation purposes: the root_block flag is a sysfs
attribute added in 3.8 through commit 1007dd1a. The corresponding
netlink attribute is IFLA_BRPORT_PROTECT, and it can be set via the
iproute2 bridge utility or through sysfs:

mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
/sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0/brport/root_block
/sys/devices/vif-3-0/net/vif3.0/brport/root_block
/sys/devices/virtual/net/vif3.0-emu/brport/root_block

mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat /sys/devices/vif-3-0/net/vif3.0/brport/root_block
0
mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set dev vif3.0 root_block on
mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat /sys/devices/vif-3-0/net/vif3.0/brport/root_block
1

So to avoid the MAC address hack for skipping the root port,
userspace would simply need to be updated to set this attribute
after adding the device to the bridge. Based on Zoltan's feedback,
there seem to be use cases where this should not be enabled for all
xen-netback interfaces, so we can just punt this to userspace for
the topologies that require it.
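
A toolstack could do the equivalent of "bridge link set dev vif3.0 root_block on" directly through sysfs right after enslaving the vif. A minimal sketch follows; the helper name is hypothetical, and treating a missing attribute as "keep the old MAC workaround" is an assumption about how a toolstack might react, not existing Xen tools behaviour.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Set or clear root_block on a bridge port through sysfs. With
 * sysfs_net = "/sys/class/net" and ifname = "vif3.0" this writes
 * /sys/class/net/vif3.0/brport/root_block. Returns 0 on success,
 * -ENOENT if the attribute is missing (kernel older than 3.8, or the
 * device is not a bridge port yet), or another negative errno value.
 */
static int set_root_block(const char *sysfs_net, const char *ifname, int on)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/%s/brport/root_block",
			 sysfs_net, ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	FILE *f = fopen(path, "r+");	/* must already exist; never create */
	if (!f)
		return -errno;

	int err = 0;
	if (fprintf(f, "%d\n", on ? 1 : 0) < 0)
		err = -EIO;
	if (fclose(f) != 0 && err == 0)
		err = -errno;	/* sysfs store errors surface on flush */
	return err;
}
```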

The original motivation for this series was to avoid the IPv6
duplicate addresses incurred by the MAC address hack for avoiding the
root bridge. Given that Zoltan also noted a use case whereby IPv4 and
IPv6 addresses can be assigned to the backend interfaces, we should be
able to avoid the duplicate address situation for IPv6 by using a
proper random MAC address *once* userspace has also been updated to
use IFLA_BRPORT_PROTECT. New userspace can't and won't need to set
this flag on older kernels (older than 3.8), as root_block is not
implemented on those kernels and the MAC address hack would still be
used there. This strategy does, however, require new kernels to be
paired with new userspace, as otherwise the MAC address workaround
would not be in place and root_block would not take effect.
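
The version split described above could be expressed as a check on the running kernel's release string, for example the release field filled in by uname(2). This is a hypothetical helper; probing for the attribute itself is usually more robust than comparing versions, because distribution kernels may backport features.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Decide from a kernel release string (as found in struct
 * utsname.release) whether brport/root_block should exist,
 * i.e. whether the kernel is 3.8 or newer.
 */
static bool release_has_root_block(const char *release)
{
	int major = 0, minor = 0;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return false;
	return major > 3 || (major == 3 && minor >= 8);
}
```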

Luis

2014-02-18 21:19:40

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams <[email protected]> wrote:
> On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <[email protected]>
>>
>> Some interfaces do not need to have any IPv4 or IPv6
>> addresses, so enable an option to specify this. One
>> example where this is observed is virtualization
>> backend interfaces which just use the net_device
>> constructs to help with their respective frontends.
>>
>> This should optimize boot time and complexity on
>> virtualization environments for each backend interface
>> while also avoiding triggering SLAAC and DAD, which is
>> simply pointless for these types of interfaces.
>
> Would it not be better/cleaner to use disable_ipv6 and then add a
> disable_ipv4 sysctl, then use those with that interface?

Sure, but note that both the disable_ipv6 and accept_dad sysctl
parameters are global. IPv4 and IPv6 interfaces are created upon
NETDEV_REGISTER, which gets triggered when a driver calls
register_netdev(). The goal of this patch was to enable an early
optimization for drivers that never have any need for IPv4 or IPv6
interfaces.

Zoltan has noted some use cases of IPv4 or IPv6 addresses on
backends, though, so this is no longer applicable as a
requirement. The ipv4 sysctl still seems like a reasonable
approach to enable network optimizations in topologies where
it's known we won't need them, but we'd need to consider a much more
granular solution, not just a global one as it is now for disable_ipv6,
and we'd also have to figure out a clean way to do this that does not
incur the cost of early address interface addition upon register_netdev().

Given that we have a use case for IPv4 and IPv6 addresses on
xen-netback, we no longer have an immediate use case for such early
optimization primitives, so I'll drop this.

> The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
> already doing.

disable_ipv6 is global; the goal was to make this granular and skip
the cost upon early boot, but it's been clarified we don't need this.

Luis

2014-02-18 21:30:29

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 3/4] xen-netback: use a random MAC address

On Tue, Feb 18, 2014 at 3:22 AM, Ian Campbell <[email protected]> wrote:
> On Mon, 2014-02-17 at 10:29 +0000, David Vrabel wrote:
>> On 15/02/14 02:59, Luis R. Rodriguez wrote:
>> > From: "Luis R. Rodriguez" <[email protected]>
>> >
>> > The purpose of using a static MAC address of FE:FF:FF:FF:FF:FF
>> > was to prevent our backend interfaces from being used by the
>> > bridge and nominating our interface as a root bridge. This was
>> > possible given that the bridge code will use the lowest MAC
>> > address for a port once a new interface gets added to the bridge.
>> > The bridge code has a generic feature now to allow interfaces
>> > to opt out from root bridge nominations; use that instead.
>> [...]
>> > --- a/drivers/net/xen-netback/interface.c
>> > +++ b/drivers/net/xen-netback/interface.c
>> > @@ -42,6 +42,8 @@
>> > #define XENVIF_QUEUE_LENGTH 32
>> > #define XENVIF_NAPI_WEIGHT 64
>> >
>> > +static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };
>>
>> You shouldn't use a vendor prefix with a random MAC address. You should
>> set the locally administered bit and clear the multicast/unicast bit and
>> randomize the remaining 46 bits.
>
> I'd have thought that eth_hw_addr_random would get this right, *checks*
> yes it does. And then this patch tramples over the top three bytes.
>
> Might there be any requirement to have a specific MAC on the vif device?
> IOW do we need to figure out a way to plumb this through the Xen tools
> (perhaps having the vif script sort it out)?

Based on Stephen's feedback we should be setting IFLA_BRPORT_PROTECT
on the xen-netback and TAP interfaces in topologies where it makes
sense, once they have been added to the bridge. Userspace can surely
deal with the MAC address, but I believe removing the static MAC
address would be good once we get userspace to use the
IFLA_BRPORT_PROTECT flag, to avoid the IPv6 duplication issue incurred
by the current static MAC address. The MAC address consideration
remains, given that as per Zoltan there are topologies where the
xen-netback interfaces can make use of either an IPv4 or IPv6 address.
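David's recipe quoted above (set the locally administered bit, clear the multicast bit, randomize the other 46 bits) is the same fixup the kernel's `eth_random_addr()` applies for `eth_hw_addr_random()`. A userspace hotplug tool assigning the address itself could do likewise; this C sketch assumes the caller supplies six random bytes, and `mac_from_random()`, `mac_is_local_unicast()` and `read_random()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define ETH_ALEN 6

/* Turn six random bytes into a locally administered unicast MAC.
 * Bit 0 of the first octet is the multicast (I/G) bit and must be
 * clear; bit 1 is the locally administered (U/L) bit and must be set.
 * The remaining 46 bits keep their random values. */
static void mac_from_random(unsigned char mac[ETH_ALEN],
                            const unsigned char rnd[ETH_ALEN])
{
    int i;

    assert(mac != NULL && rnd != NULL);
    for (i = 0; i < ETH_ALEN; i++)
        mac[i] = rnd[i];
    mac[0] &= 0xfe;     /* clear multicast bit */
    mac[0] |= 0x02;     /* set locally administered bit */
}

static bool mac_is_local_unicast(const unsigned char mac[ETH_ALEN])
{
    return (mac[0] & 0x01) == 0 && (mac[0] & 0x02) != 0;
}

/* Fill rnd from /dev/urandom; returns 0 on success, -1 otherwise. */
static int read_random(unsigned char rnd[ETH_ALEN])
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(rnd, 1, ETH_ALEN, f);
    fclose(f);
    return got == ETH_ALEN ? 0 : -1;
}
```

Feeding it six 0xff bytes yields fe:ff:ff:ff:ff:ff, which is itself a valid locally administered unicast address; the trouble with the static hack is not validity but that every vif gets the same address, hence the duplicate IPv6 address discussed in this thread.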

> Speaking of which -- do the Xen tools not overwrite this random mac from
> xen-network-common.sh:_setup_bridge_port? What is the plan to change
> that (in a forwards/backwards compatible manner)?

I'm not seeing that happen now?

Luis

2014-02-18 21:43:05

by Stephen Hemminger

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Tue, 18 Feb 2014 13:19:15 -0800
"Luis R. Rodriguez" <[email protected]> wrote:

> Sure, but note that both the disable_ipv6 and accept_dad sysctl
> parameters are global. IPv4 and IPv6 interfaces are created upon
> NETDEV_REGISTER, which gets triggered when a driver calls
> register_netdev(). The goal of this patch was to enable an early
> optimization for drivers that never have any need for IPv4 or IPv6
> interfaces.

The trick with ipv6 is to register the device, then have userspace
do the ipv6 sysctl before bringing the device up.
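Stephen's ordering can be sketched in C: once `register_netdev()` has created the device, write the per-interface knob under `/proc/sys/net/ipv6/conf/<ifname>/` and only then bring the interface up. The path layout follows the `/proc/sys/net/ipv6/conf/enp0s25/disable_ipv6` example Dan gives in this thread; the helper names are hypothetical, and writing these files normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /proc/sys/net/ipv6/conf/<ifname>/<knob>; 0 on success. */
static int ipv6_conf_path(char *buf, size_t len, const char *ifname,
                          const char *knob)
{
    int n;

    assert(buf != NULL && ifname != NULL && knob != NULL);
    n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/%s", ifname, knob);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write an integer to a per-interface IPv6 knob such as disable_ipv6
 * or accept_dad. Meant to run after the device is registered (so its
 * conf directory exists) but before it is brought up. 0 on success. */
static int set_ipv6_conf(const char *ifname, const char *knob, int value)
{
    char path[256];
    FILE *f;
    int ret = 0;

    if (ipv6_conf_path(path, sizeof(path), ifname, knob) < 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* no such interface, or no IPv6 */
    if (fprintf(f, "%d\n", value) < 0)
        ret = -1;
    if (fclose(f) == EOF)
        ret = -1;
    return ret;
}
```

For example, `set_ipv6_conf("vif3.0", "disable_ipv6", 1)` would run between the vif appearing and `ip link set vif3.0 up`; passing `"accept_dad"` and 0 instead would only turn off DAD.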

2014-02-19 09:47:28

by Ian Campbell

Subject: Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces

On Tue, 2014-02-18 at 12:16 -0800, Luis R. Rodriguez wrote:
> On Mon, Feb 17, 2014 at 6:36 AM, Zoltan Kiss <[email protected]> wrote:
> > Also, the backend is not necessarily Dom0, you can connect two guests with
> > backend/frontend pairs.
>
> Can you elaborate a bit more on this type of setup?

The domain providing backend networking services is not necessarily
dom0, it might be a driver domain:
http://wiki.xen.org/wiki/Driver_Domain

I think from your PoV here it probably doesn't matter whether the driver
domain is dom0 or some other domain.

Ian.

2014-02-19 09:48:32

by Ian Campbell

Subject: Re: [Xen-devel] [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends

On Tue, 2014-02-18 at 11:43 -0800, Luis R. Rodriguez wrote:
>
> New motivation: removing IPv4 and IPv6 from the backend interfaces can
> save a lot of boilerplate runtime code, keep triggers from ever taking
> place, and simplify the backend interfaces. If there is no use for
> IPv4 and IPv6 interfaces why do we have them? Note: I have yet to test
> the NAT case.

I think you need to do that test before you can unequivocally state
that there is no use for IPv4/6 interfaces here.

Ian.

2014-02-19 09:53:06

by Ian Campbell

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Tue, 2014-02-18 at 13:02 -0800, Luis R. Rodriguez wrote:
> On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
> <[email protected]> wrote:
> > On Fri, 14 Feb 2014 18:59:37 -0800
> > "Luis R. Rodriguez" <[email protected]> wrote:
> >
> >> From: "Luis R. Rodriguez" <[email protected]>
> >>
> >> It doesn't make sense for some interfaces to become a root bridge
> >> at any point in time. One example is virtual backend interfaces
> >> which rely on other entities on the bridge for actual physical
> >> connectivity. They only provide virtual access.
> >>
> >> Device drivers that know they should never become part of the
> >> root bridge have been using a trick of setting their MAC address
> >> to a high MAC address such as FE:FF:FF:FF:FF:FF. Instead
> >> of using these hacks, let the interfaces annotate their intent and
> >> generalize a solution for multiple drivers, while letting the
> >> drivers use a random MAC address or one prefixed with a proper OUI.
> >> This sort of hack is used by both qemu and xen for their backend
> >> interfaces.
> >>
> >> Cc: Stephen Hemminger <[email protected]>
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Cc: [email protected]
> >> Signed-off-by: Luis R. Rodriguez <[email protected]>
> >
> > This is already supported in a more standard way via the root
> > block flag.
>
> Great! For documentation purposes the root_block flag is a sysfs
> attribute, added in 3.8 through commit 1007dd1a. The respective
> interface flag is IFLA_BRPORT_PROTECT and can be set via the iproute2
> bridge utility or through sysfs:
>
> mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
> /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0/brport/root_block
> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
> /sys/devices/virtual/net/vif3.0-emu/brport/root_block
>
> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat /sys/devices/vif-3-0/net/vif3.0/brport/root_block
> 0
> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set dev vif3.0 root_block on
> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat /sys/devices/vif-3-0/net/vif3.0/brport/root_block
> 1
>
> So if we want to avoid the MAC address hack used to keep a port from
> becoming the root port, userspace would need to be updated to simply
> set this attribute after adding the device to the bridge. Based on
> Zoltan's feedback there seem to be use cases where this should not be
> enabled for all xen-netback interfaces, so we can just punt this to
> userspace for the topologies that require it.
>
> The original motivation for this series was to avoid the duplicate
> IPv6 address caused by the MAC address hack for avoiding the root
> bridge. Given that Zoltan also noted a use case whereby IPv4 and IPv6
> addresses can be assigned to the backend interfaces, we should be able
> to avoid the duplicate address situation for IPv6 by using a proper
> random MAC address *once* userspace has also been updated to use
> IFLA_BRPORT_PROTECT. New userspace can't and won't need to set this
> flag on kernels older than 3.8, as root_block is not implemented on
> those kernels and the MAC address hack would still be used there. This
> strategy does, however, require new kernels to be paired with new
> userspace, as otherwise the MAC address workaround would not be in
> place and root_block would not take effect.

Can't we arrange things in the Xen hotplug scripts such that if the
root_block stuff isn't available/doesn't work we fall back to the
existing fe:ff:ff:ff:ff:ff usage?

That would avoid concerns about forward/backwards compat I think. It
wouldn't solve the issue you are targeting on old systems, but it also
doesn't regress them any further.
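The fe:ff:ff:ff:ff:ff fallback relies on the behaviour described in the xen-netback patch earlier in the thread: the bridge takes the lowest port MAC address. A small C sketch of that comparison (octet by octet, in `memcmp()` order) illustrates it; `lowest_mac_index()` is a hypothetical helper, and this is not the bridge code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Return the index of the numerically lowest MAC among n port
 * addresses, comparing octet by octet like memcmp(); -1 if n == 0. */
static int lowest_mac_index(const unsigned char (*macs)[ETH_ALEN], size_t n)
{
    size_t i, best = 0;

    if (n == 0)
        return -1;
    assert(macs != NULL);
    for (i = 1; i < n; i++)
        if (memcmp(macs[i], macs[best], ETH_ALEN) < 0)
            best = i;
    return (int)best;
}
```

A unicast address must have the low bit of its first octet clear, so fe:ff:ff:ff:ff:ff is the highest unicast address there is and only ends up lowest if every port carries it. A random locally administered address such as 02:aa:bb:cc:dd:ee, by contrast, can beat a NIC whose first octet is higher, which is why the thread pairs a random address with root_block instead of relying on the address alone.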

Ian.

2014-02-19 14:35:49

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On 19/02/14 09:52, Ian Campbell wrote:
> On Tue, 2014-02-18 at 13:02 -0800, Luis R. Rodriguez wrote:
>> On Sun, Feb 16, 2014 at 10:57 AM, Stephen Hemminger
>> <[email protected]> wrote:
>>> On Fri, 14 Feb 2014 18:59:37 -0800
>>> "Luis R. Rodriguez" <[email protected]> wrote:
>>>
>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>
>>>> It doesn't make sense for some interfaces to become a root bridge
>>>> at any point in time. One example is virtual backend interfaces
>>>> which rely on other entities on the bridge for actual physical
>>>> connectivity. They only provide virtual access.
>>>>
>>>> Device drivers that know they should never become part of the
>>>> root bridge have been using a trick of setting their MAC address
>>>> to a high MAC address such as FE:FF:FF:FF:FF:FF. Instead
>>>> of using these hacks, let the interfaces annotate their intent and
>>>> generalize a solution for multiple drivers, while letting the
>>>> drivers use a random MAC address or one prefixed with a proper OUI.
>>>> This sort of hack is used by both qemu and xen for their backend
>>>> interfaces.
>>>>
>>>> Cc: Stephen Hemminger <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>>>
>>> This is already supported in a more standard way via the root
>>> block flag.
>>
>> Great! For documentation purposes the root_block flag is a sysfs
>> attribute, added in 3.8 through commit 1007dd1a. The respective
>> interface flag is IFLA_BRPORT_PROTECT and can be set via the iproute2
>> bridge utility or through sysfs:
>>
>> mcgrof@garbanzo ~/linux (git::master)$ find /sys/ -name root_block
>> /sys/devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0/brport/root_block
>> /sys/devices/vif-3-0/net/vif3.0/brport/root_block
>> /sys/devices/virtual/net/vif3.0-emu/brport/root_block
>>
>> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat /sys/devices/vif-3-0/net/vif3.0/brport/root_block
>> 0
>> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ sudo bridge link set dev vif3.0 root_block on
>> mcgrof@garbanzo ~/devel/iproute2 (git::master)$ cat /sys/devices/vif-3-0/net/vif3.0/brport/root_block
>> 1
>>
>> So if we want to avoid the MAC address hack used to keep a port from
>> becoming the root port, userspace would need to be updated to simply
>> set this attribute after adding the device to the bridge. Based on
>> Zoltan's feedback there seem to be use cases where this should not be
>> enabled for all xen-netback interfaces, so we can just punt this to
>> userspace for the topologies that require it.
>>
>> The original motivation for this series was to avoid the duplicate
>> IPv6 address caused by the MAC address hack for avoiding the root
>> bridge. Given that Zoltan also noted a use case whereby IPv4 and IPv6
>> addresses can be assigned to the backend interfaces, we should be able
>> to avoid the duplicate address situation for IPv6 by using a proper
>> random MAC address *once* userspace has also been updated to use
>> IFLA_BRPORT_PROTECT. New userspace can't and won't need to set this
>> flag on kernels older than 3.8, as root_block is not implemented on
>> those kernels and the MAC address hack would still be used there. This
>> strategy does, however, require new kernels to be paired with new
>> userspace, as otherwise the MAC address workaround would not be in
>> place and root_block would not take effect.
>
> Can't we arrange things in the Xen hotplug scripts such that if the
> root_block stuff isn't available/doesn't work we fall back to the
> existing fe:ff:ff:ff:ff:ff usage?
>
> That would avoid concerns about forward/backwards compat I think. It
> wouldn't solve the issue you are targeting on old systems, but it also
> doesn't regress them any further.

I agree, I think this problem could be better handled from userspace: if
it can set root_block then change the default MAC to a random one, if it
can't, then stay with the default one. Or if someone doesn't care about
STP but DAD is still important, userspace can have a force_random_mac
option somewhere to change to a random MAC regardless of root_block
presence.
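Ian's fallback and Zoltan's refinement amount to a small decision table. This C sketch assumes the hotplug logic first adds the vif to the bridge and tries root_block, then picks the MAC policy; `force_random_mac` is Zoltan's hypothetical option, and `root_block_available()` and `choose_mac_policy()` are illustrative names, not existing Xen tools code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* MAC policy for a newly created backend vif. */
enum vif_mac_policy {
    VIF_MAC_STATIC_FEFF,   /* keep the legacy fe:ff:ff:ff:ff:ff */
    VIF_MAC_RANDOM         /* use a random locally administered MAC */
};

/* True if ifname exposes a writable brport/root_block attribute, i.e.
 * it is already a bridge port on a kernel that has root_block (>= 3.8).
 * access() only checks permissions; a real script should attempt the
 * write itself and check that it succeeded. */
static bool root_block_available(const char *ifname)
{
    char path[256];
    int n;

    assert(ifname != NULL);
    n = snprintf(path, sizeof(path), "/sys/class/net/%s/brport/root_block",
                 ifname);
    if (n < 0 || (size_t)n >= sizeof(path))
        return false;
    return access(path, W_OK) == 0;
}

/* Ian's fallback plus the hypothetical force_random_mac override:
 * go random when root_block took effect, or when the admin asked for it
 * regardless of STP; otherwise keep the legacy address. */
static enum vif_mac_policy choose_mac_policy(bool root_block_set,
                                             bool force_random_mac)
{
    if (root_block_set || force_random_mac)
        return VIF_MAC_RANDOM;
    return VIF_MAC_STATIC_FEFF;
}
```

`choose_mac_policy(root_block_available("vif3.0"), false)` keeps fe:ff:ff:ff:ff:ff on pre-3.8 kernels and switches to a random address where root_block exists, which is the forward/backward compatible behaviour Ian asks for.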

Zoli

2014-02-19 16:46:15

by Dan Williams

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:
> On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams <[email protected]> wrote:
> > On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> >> From: "Luis R. Rodriguez" <[email protected]>
> >>
> >> Some interfaces do not need to have any IPv4 or IPv6
> >> addresses, so enable an option to specify this. One
> >> example where this is observed is virtualization
> >> backend interfaces which just use the net_device
> >> constructs to help with their respective frontends.
> >>
> >> This should optimize boot time and complexity on
> >> virtualization environments for each backend interface
> >> while also avoiding triggering SLAAC and DAD, which is
> >> simply pointless for these types of interfaces.
> >
> > Would it not be better/cleaner to use disable_ipv6 and then add a
> > disable_ipv4 sysctl, then use those with that interface?
>
> Sure, but note that both the disable_ipv6 and accept_dad sysctl
> parameters are global. IPv4 and IPv6 interfaces are created upon
> NETDEV_REGISTER, which gets triggered when a driver calls
> register_netdev(). The goal of this patch was to enable an early
> optimization for drivers that never have any need for IPv4 or IPv6
> interfaces.

Each interface gets override sysctls too though, eg:

/proc/sys/net/ipv6/conf/enp0s25/disable_ipv6

which is the one I meant; you're obviously right that the global ones
aren't what you want here. But the specific ones should be suitable?
If you set that on a per-interface basis, then you'll get EPERM or
something whenever you try to add IPv6 addresses or do IPv6 routing.

> Zoltan has noted some use cases of IPv4 or IPv6 addresses on
> backends, though, so this is no longer applicable as a
> requirement. The ipv4 sysctl still seems like a reasonable
> approach to enable network optimizations in topologies where
> it's known we won't need them, but we'd need to consider a much more
> granular solution, not just a global one as it is now for disable_ipv6,
> and we'd also have to figure out a clean way to do this that does not
> incur the cost of early address interface addition upon register_netdev().
>
> Given that we have a use case for IPv4 and IPv6 addresses on
> xen-netback, we no longer have an immediate use case for such early
> optimization primitives, so I'll drop this.
>
> > The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
> > already doing.
>
> disable_ipv6 is global; the goal was to make this granular and skip
> the cost upon early boot, but it's been clarified we don't need this.

Like Stephen says, you need to make sure you set them before IFF_UP, but
beyond that, wouldn't the interface-specific sysctls work?
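Dan's caveat, that the per-interface sysctls must be set before IFF_UP, can be checked by a tool before it writes them. A minimal C sketch using the standard `SIOCGIFFLAGS` ioctl; `iface_is_up()` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if ifname has IFF_UP set, 0 if it is down, -1 on error
 * (bad name, no such interface, socket unavailable). */
static int iface_is_up(const char *ifname)
{
    struct ifreq ifr;
    size_t len;
    int fd, ret;

    assert(ifname != NULL);
    len = strlen(ifname);
    if (len == 0 || len >= IFNAMSIZ)
        return -1;
    /* An ordinary datagram socket is enough for this ioctl. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof(ifr));
    memcpy(ifr.ifr_name, ifname, len + 1);
    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
        ret = -1;
    else
        ret = (ifr.ifr_flags & IFF_UP) ? 1 : 0;
    close(fd);
    return ret;
}
```

A caller would write `disable_ipv6` only when this returns 0 and bring the interface up afterwards.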

Dan

2014-02-19 16:46:22

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Mon, Feb 17, 2014 at 9:52 AM, Zoltan Kiss <[email protected]> wrote:
> On 15/02/14 02:59, Luis R. Rodriguez wrote:
>>
>> From: "Luis R. Rodriguez" <[email protected]>
>>
>> It doesn't make sense for some interfaces to become a root bridge
>> at any point in time. One example is virtual backend interfaces
>> which rely on other entities on the bridge for actual physical
>> connectivity. They only provide virtual access.
>
> It is possible that a guest bridges two VIFs together, either from the
> same Dom0 bridge or from different ones. In that case using STP on VIFs
> sounds sensible to me.

You seem to describe a case whereby it can make sense for xen-netback
interfaces to end up becoming the root port of a bridge. Can you
elaborate a little more on that, as the use case was unclear to me?

Additionally, if such cases exist, then under the current upstream
implementation one would simply need to change the MAC address in
order to enable a vif to become the root port. Stephen noted there is
a way to avoid nominating an interface for a root port through the
root block flag. We should use that instead of the MAC address hacks.
Let's keep in mind that part of the motivation for this series is to
avoid a duplicate IPv6 address left in place by use cases whereby the
MAC address of the backend vif was left static. The use case you are
explaining likely describes the more prevalent case where address
conflicts can occur, perhaps when administrators forgot to change the
backend MAC address. If we embrace a random MAC address we'd avoid
that issue, but we'd need to update userspace to use the root
block on topologies where desired.

Luis

2014-02-19 17:02:30

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Wed, Feb 19, 2014 at 6:35 AM, Zoltan Kiss <[email protected]> wrote:
> On 19/02/14 09:52, Ian Campbell wrote:
>> Can't we arrange things in the Xen hotplug scripts such that if the
>> root_block stuff isn't available/doesn't work we fall back to the
>> existing fe:ff:ff:ff:ff:ff usage?
>>
>> That would avoid concerns about forward/backwards compat I think. It
>> wouldn't solve the issue you are targeting on old systems, but it also
>> doesn't regress them any further.
>
> I agree, I think this problem could be better handled from userspace: if it
> can set root_block then change the default MAC to a random one, if it can't,
> then stay with the default one. Or if someone doesn't care about STP but DAD
> is still important, userspace can have a force_random_mac option somewhere
> to change to a random MAC regardless of root_block presence.

Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
relabel it to IFF_ROOT_BLOCK_DEF) flag as a default driver preference
upon initialization, so that root block will be used once the device
gets added to a bridge? The purpose would be to keep drivers from
using the high MAC address hack and to streamline on a random MAC
address, thereby avoiding the possible duplicate address situation for
IPv6. In the STP use case for these interfaces we'd just require
userspace to unset the root block. I'd consider the STP use case the
most odd of all. The caveat to this approach is that 3.8 would be needed
(or the root block patches cherry-picked) for base kernels older
than 3.8.

Stephen?

Luis

2014-02-19 17:09:02

by Stephen Hemminger

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Wed, 19 Feb 2014 09:02:06 -0800
"Luis R. Rodriguez" <[email protected]> wrote:

> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
> relabel it to IFF_ROOT_BLOCK_DEF) flag as a default driver preference
> upon initialization, so that root block will be used once the device
> gets added to a bridge? The purpose would be to keep drivers from
> using the high MAC address hack and to streamline on a random MAC
> address, thereby avoiding the possible duplicate address situation for
> IPv6. In the STP use case for these interfaces we'd just require
> userspace to unset the root block. I'd consider the STP use case the
> most odd of all. The caveat to this approach is that 3.8 would be needed
> (or the root block patches cherry-picked) for base kernels older
> than 3.8.
>
> Stephen?
>
> Luis

Don't add IFF_ flags that add yet another API hook into bridge.
Please only use the netlink/sysfs flags fields that already exist
for new features.
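Stephen points at the existing netlink flags. To my understanding, the root_block knob travels as `IFLA_BRPORT_PROTECT` nested inside `IFLA_PROTINFO` of an `RTM_SETLINK` request with `ifi_family = AF_BRIDGE`, roughly what iproute2's `bridge link set ... root_block on` sends. The C sketch below only builds that request; opening a `NETLINK_ROUTE` socket, sending it and reading the ack are left out, and `build_root_block_req()` / `add_attr()` are hypothetical names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>        /* AF_BRIDGE */
#include <linux/netlink.h>     /* struct nlmsghdr, NLM_F_REQUEST, NLA_F_NESTED */
#include <linux/rtnetlink.h>   /* RTM_SETLINK, struct ifinfomsg, struct rtattr */
#include <linux/if_link.h>     /* IFLA_PROTINFO, IFLA_BRPORT_PROTECT */

struct brport_req {
    struct nlmsghdr n;
    struct ifinfomsg ifi;
    char buf[64];              /* room for the nested attributes */
};

/* Append one attribute at the (aligned) tail of the message.
 * Returns a pointer to it, or NULL if it would not fit in maxlen. */
static struct rtattr *add_attr(struct nlmsghdr *n, size_t maxlen,
                               unsigned short type, const void *data,
                               size_t alen)
{
    size_t len = RTA_LENGTH(alen);
    struct rtattr *rta;

    if (NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len) > maxlen)
        return NULL;
    rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = (unsigned short)len;
    if (alen > 0)
        memcpy(RTA_DATA(rta), data, alen);
    n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
    return rta;
}

/* Build an AF_BRIDGE RTM_SETLINK request that sets (on != 0) or clears
 * root_block on the bridge port with the given ifindex. 0 on success. */
static int build_root_block_req(struct brport_req *req, int ifindex, int on)
{
    unsigned char val = on ? 1 : 0;
    struct rtattr *nest;

    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->n.nlmsg_type = RTM_SETLINK;
    req->n.nlmsg_flags = NLM_F_REQUEST;  /* add NLM_F_ACK for an error report */
    req->ifi.ifi_family = AF_BRIDGE;
    req->ifi.ifi_index = ifindex;

    nest = add_attr(&req->n, sizeof(*req), IFLA_PROTINFO | NLA_F_NESTED,
                    NULL, 0);
    if (nest == NULL)
        return -1;
    if (add_attr(&req->n, sizeof(*req), IFLA_BRPORT_PROTECT,
                 &val, sizeof(val)) == NULL)
        return -1;
    /* Close the nest: its length spans everything appended after it. */
    nest->rta_len = (unsigned short)((char *)&req->n + req->n.nlmsg_len -
                                     (char *)nest);
    return 0;
}
```

Sending it means writing `req->n.nlmsg_len` bytes of the request to a `NETLINK_ROUTE` socket; since iproute2's `bridge` tool already does this, a hotplug script would more likely just call it.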

2014-02-19 17:10:28

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends

On Wed, Feb 19, 2014 at 1:48 AM, Ian Campbell <[email protected]> wrote:
> On Tue, 2014-02-18 at 11:43 -0800, Luis R. Rodriguez wrote:
>>
>> New motivation: removing IPv4 and IPv6 from the backend interfaces can
>> save a lot of boilerplate runtime code, keep triggers from ever taking
>> place, and simplify the backend interfaces. If there is no use for
>> IPv4 and IPv6 interfaces why do we have them? Note: I have yet to test
>> the NAT case.
>
> I think you need to do that test before you can unequivocally state
> that there is no use for IPv4/6 interfaces here.

Agreed, but note that Zoltan stated that in the routing case IPv4 or
IPv6 addresses can be used on the backends, so that already rules that
out. Unless, of course, we want to enable this by default (for
simplicity) and have userspace poke to get IPv4 / IPv6 back if by
default no interfaces were enabled. Even though backend interfaces
would stand to gain from this simplicity in the average case, I
don't think the userspace requirements are worth it. Someone with
hundreds of guests (that don't do routing on the backend, as clarified
by Zoltan) may want to test my patch, though, to see if there are any
reasonable cuts in the time it takes to get these guests up and running.

Anyone itching for the above?

Luis

2014-02-19 17:13:36

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Tue, Feb 18, 2014 at 1:42 PM, Stephen Hemminger
<[email protected]> wrote:
> On Tue, 18 Feb 2014 13:19:15 -0800
> "Luis R. Rodriguez" <[email protected]> wrote:
>
>> Sure, but note that both the disable_ipv6 and accept_dad sysctl
>> parameters are global. IPv4 and IPv6 interfaces are created upon
>> NETDEV_REGISTER, which gets triggered when a driver calls
>> register_netdev(). The goal of this patch was to enable an early
>> optimization for drivers that never have any need for IPv4 or IPv6
>> interfaces.
>
> The trick with ipv6 is to register the device, then have userspace
> do the ipv6 sysctl before bringing the device up.

Nice, thanks!

Luis

2014-02-19 17:20:36

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Wed, Feb 19, 2014 at 8:45 AM, Dan Williams <[email protected]> wrote:
> On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:
>> On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams <[email protected]> wrote:
>> > On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
>> >> From: "Luis R. Rodriguez" <[email protected]>
>> >>
>> >> Some interfaces do not need to have any IPv4 or IPv6
>> >> addresses, so enable an option to specify this. One
>> >> example where this is observed is virtualization
>> >> backend interfaces which just use the net_device
>> >> constructs to help with their respective frontends.
>> >>
>> >> This should optimize boot time and complexity on
>> >> virtualization environments for each backend interface
>> >> while also avoiding triggering SLAAC and DAD, which is
>> >> simply pointless for these types of interfaces.
>> >
>> > Would it not be better/cleaner to use disable_ipv6 and then add a
>> > disable_ipv4 sysctl, then use those with that interface?
>>
>> Sure, but note that both the disable_ipv6 and accept_dad sysctl
>> parameters are global. IPv4 and IPv6 interfaces are created upon
>> NETDEV_REGISTER, which gets triggered when a driver calls
>> register_netdev(). The goal of this patch was to enable an early
>> optimization for drivers that never have any need for IPv4 or IPv6
>> interfaces.
>
> Each interface gets override sysctls too though, eg:
>
> /proc/sys/net/ipv6/conf/enp0s25/disable_ipv6

I hadn't seen those, thanks!

> which is the one I meant; you're obviously right that the global ones
> aren't what you want here. But the specific ones should be suitable?

Under the approach Stephen mentioned, by first ensuring the interface
is down, yes. There's one use case I can consider to still want the
patch though, more on that below.

> If you set that on a per-interface basis, then you'll get EPERM or
> something whenever you try to add IPv6 addresses or do IPv6 routing.

Neat, thanks.

>> Zoltan has noted some use cases of IPv4 or IPv6 addresses on
>> backends, though, so this is no longer applicable as a
>> requirement. The ipv4 sysctl still seems like a reasonable
>> approach to enable network optimizations in topologies where
>> it's known we won't need them, but we'd need to consider a much more
>> granular solution, not just a global one as it is now for disable_ipv6,
>> and we'd also have to figure out a clean way to do this that does not
>> incur the cost of early address interface addition upon register_netdev().
>>
>> Given that we have a use case for IPv4 and IPv6 addresses on
>> xen-netback, we no longer have an immediate use case for such early
>> optimization primitives, so I'll drop this.
>>
>> > The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
>> > already doing.
>>
>> disable_ipv6 is global; the goal was to make this granular and skip
>> the cost upon early boot, but it's been clarified we don't need this.
>
> Like Stephen says, you need to make sure you set them before IFF_UP, but
> beyond that, wouldn't the interface-specific sysctls work?

Yeah, that'll do it, unless there is a measurable runtime benefit to
never even adding these in the first place. Consider a host with tons
of guests; I'm not sure how many is 'a lot' these days. One would have to
measure how much this reduces the time it takes to boot these
up. As discussed in the other threads, though, there *are* some use cases
for assigning IPv4 or IPv6 addresses to the backend interfaces:
routing them (although it's unclear to me whether iptables can be used
instead, Zoltan?). So at least for now there is no clear requirement to
remove these interfaces or not have them at all. The boot time cost
savings should be considered, though, if this is ultimately desirable. I
saw tons of timers and events that'd get triggered with any IPv4 or
IPv6 interface lying around.

Luis

2014-02-19 17:59:57

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger
<[email protected]> wrote:
> On Wed, 19 Feb 2014 09:02:06 -0800
> "Luis R. Rodriguez" <[email protected]> wrote:
>
>> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
>> relabel it to IFF_ROOT_BLOCK_DEF) flag as a default driver preference
>> upon initialization, so that root block will be used once the device
>> gets added to a bridge? The purpose would be to keep drivers from
>> using the high MAC address hack and to streamline on a random MAC
>> address, thereby avoiding the possible duplicate address situation for
>> IPv6. In the STP use case for these interfaces we'd just require
>> userspace to unset the root block. I'd consider the STP use case the
>> most odd of all. The caveat to this approach is that 3.8 would be needed
>> (or the root block patches cherry-picked) for base kernels older
>> than 3.8.
>>
>> Stephen?
>>
>> Luis
>
> Don't add IFF_ flags that add yet another API hook into bridge.

The goal was not to add a userspace API, but rather to consider a
driver initialization preference.

> Please only use the netlink/sysfs flags fields that already exist
> for new features.

Sure, but what if we know a driver in most cases wants the root block
and we'd want to make it the default, thereby only requiring userspace
for toggling it off?

Luis

2014-02-19 19:13:15

by Zoltan Kiss

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On 19/02/14 17:20, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 8:45 AM, Dan Williams <[email protected]> wrote:
>> On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:
>>> On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams <[email protected]> wrote:
>>>> On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>
>>>>> Some interfaces do not need to have any IPv4 or IPv6
>>>>> addresses, so enable an option to specify this. One
>>>>> example where this is observed is virtualization
>>>>> backend interfaces which just use the net_device
>>>>> constructs to help with their respective frontends.
>>>>>
>>>>> This should optimize boot time and complexity on
>>>>> virtualization environments for each backend interface
>>>>> while also avoiding triggering SLAAC and DAD, which is
>>>>> simply pointless for these types of interfaces.
>>>>
>>>> Would it not be better/cleaner to use disable_ipv6 and then add a
>>>> disable_ipv4 sysctl, then use those with that interface?
>>>
>>> Sure, but note that both the disable_ipv6 and accept_dad sysctl
>>> parameters are global. IPv4 and IPv6 interfaces are created upon
>>> NETDEV_REGISTER, which gets triggered when a driver calls
>>> register_netdev(). The goal of this patch was to enable an early
>>> optimization for drivers that never have any need for IPv4 or IPv6
>>> interfaces.
>>
>> Each interface gets override sysctls too though, eg:
>>
>> /proc/sys/net/ipv6/conf/enp0s25/disable_ipv6
>
> I hadn't seen those, thanks!
>
>> which is the one I meant; you're obviously right that the global ones
>> aren't what you want here. But the specific ones should be suitable?
>
> Under the approach Stephen mentioned, by first ensuring the interface
> is down, yes. There's one use case I can consider to still want the
> patch though, more on that below.
>
>> If you set that on a per-interface basis, then you'll get EPERM or
>> something whenever you try to add IPv6 addresses or do IPv6 routing.
>
> Neat, thanks.
>
>>> Zoltan has noted some use cases of IPv4 or IPv6 addresses on
>>> backends though; as such this is no longer applicable as a
>>> requirement. The ipv4 sysctl however still seems like a reasonable
>>> approach to enable optimizations of the network in topologies where
>>> it's known we won't need them but -- we'd need to consider a much more
>>> granular solution, not just global as it is now for disable_ipv6, and
>>> we'd also have to figure out a clean way to do this to not incur the
>>> cost of early address interface addition upon register_netdev().
>>>
>>> Given that we have a use case for ipv4 and ipv6 addresses on
>>> xen-netback we no longer have an immediate use case for such early
>>> optimization primitives though, so I'll drop this.
>>>
>>>> The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
>>>> already doing.
>>>
>>> disable_ipv6 is global; the goal was to make this granular and skip
>>> the cost upon early boot, but it's been clarified we don't need this.
>>
>> Like Stephen says, you need to make sure you set them before IFF_UP, but
>> beyond that, wouldn't the interface-specific sysctls work?
>
> Yeah that'll do it, unless there is a measurable run-time benefit
> to never even adding these in the first place. Consider a host with tons
> of guests, not sure how many is 'a lot' these days. One would have to
> measure how much this reduces the time it takes to boot these
> up. As discussed in the other threads though there *are* some use cases
> of assigning IPv4 or IPv6 addresses to the backend interfaces:
> routing them (although it's unclear to me if iptables can be used
> instead, Zoltan?).

Not with OVS, it steals the packet before netfilter hooks.

Zoli
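For reference, the per-interface override Dan points to lives under /proc/sys/net/ipv6/conf/<ifname>/. The sketch below is a plain userspace helper, not anything from the patch series. It builds that path and writes a value to it.

- Per the discussion, the write would need to happen before the interface is brought up (IFF_UP).
- The helper names are illustrative only.
- No equivalent disable_ipv4 knob exists at this point in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a per-interface IPv6 sysctl path such as
 * /proc/sys/net/ipv6/conf/vif1.0/disable_ipv6.
 * Returns 0 on success, -1 if the buffer is too small. */
static int ipv6_conf_path(char *buf, size_t len, const char *ifname,
			  const char *knob)
{
	int n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/%s", ifname, knob);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value (e.g. "1\n") to a sysctl file; needs root on a real host.
 * Per the thread, do this before the interface is set IFF_UP. */
static int write_knob(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

On a real host, calling `write_knob()` with the path for `vif1.0` and `"1\n"` requires root. Nothing here decides when the backend interface is created.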

2014-02-20 00:55:51

by Dan Williams

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Wed, 2014-02-19 at 09:20 -0800, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 8:45 AM, Dan Williams <[email protected]> wrote:
> > On Tue, 2014-02-18 at 13:19 -0800, Luis R. Rodriguez wrote:
> >> On Mon, Feb 17, 2014 at 12:23 PM, Dan Williams <[email protected]> wrote:
> >> > On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> >> >> From: "Luis R. Rodriguez" <[email protected]>
> >> >>
> >> >> Some interfaces do not need to have any IPv4 or IPv6
> >> >> addresses, so enable an option to specify this. One
> >> >> example where this is observed are virtualization
> >> >> backend interfaces which just use the net_device
> >> >> constructs to help with their respective frontends.
> >> >>
> >> >> This should optimize boot time and complexity on
> >> >> virtualization environments for each backend interface
> >> >> while also avoiding triggering SLAAC and DAD, which is
> >> >> simply pointless for these type of interfaces.
> >> >
> >> > Would it not be better/cleaner to use disable_ipv6 and then add a
> >> > disable_ipv4 sysctl, then use those with that interface?
> >>
> >> Sure, but note that both the disable_ipv6 and accept_dad sysctl
> >> parameters are global. ipv4 and ipv6 interfaces are created upon
> >> NETDEV_REGISTER, which will get triggered when a driver calls
> >> register_netdev(). The goal of this patch was to enable an early
> >> optimization for drivers that have no need ever for ipv4 or ipv6
> >> interfaces.
> >
> > Each interface gets override sysctls too though, eg:
> >
> > /proc/sys/net/ipv6/conf/enp0s25/disable_ipv6
>
> I hadn't seen those, thanks!

Note that there isn't yet a disable_ipv4 knob though, I was
perhaps-too-subtly trying to get you to send a patch for it, since I can
use it too :)

Dan

> > which is the one I meant; you're obviously right that the global ones
> > aren't what you want here. But the specific ones should be suitable?
>
> Under the approach Stephen mentioned by first ensuring the interface
> is down yes. There's one use case I can consider to still want the
> patch though, more on that below.
>
> > If you set that on a per-interface basis, then you'll get EPERM or
> > something whenever you try to add IPv6 addresses or do IPv6 routing.
>
> Neat, thanks.
>
> >> Zoltan has noted some use cases of IPv4 or IPv6 addresses on
> >> backends though; as such this is no longer applicable as a
> >> requirement. The ipv4 sysctl however still seems like a reasonable
> >> approach to enable optimizations of the network in topologies where
> >> it's known we won't need them but -- we'd need to consider a much more
> >> granular solution, not just global as it is now for disable_ipv6, and
> >> we'd also have to figure out a clean way to do this to not incur the
> >> cost of early address interface addition upon register_netdev().
> >>
> >> Given that we have a use case for ipv4 and ipv6 addresses on
> >> xen-netback we no longer have an immediate use case for such early
> >> optimization primitives though, so I'll drop this.
> >>
> >> > The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
> >> > already doing.
> >>
> >> disable_ipv6 is global; the goal was to make this granular and skip
> >> the cost upon early boot, but it's been clarified we don't need this.
> >
> > Like Stephen says, you need to make sure you set them before IFF_UP, but
> > beyond that, wouldn't the interface-specific sysctls work?
>
> Yeah that'll do it, unless there is a measurable run-time benefit
> to never even adding these in the first place. Consider a host with tons
> of guests, not sure how many is 'a lot' these days. One would have to
> measure how much this reduces the time it takes to boot these
> up. As discussed in the other threads though there *are* some use cases
> of assigning IPv4 or IPv6 addresses to the backend interfaces:
> routing them (although it's unclear to me if iptables can be used
> instead, Zoltan?). So at least for now there is no clear requirement to
> remove these interfaces or not have them at all. The boot time cost
> savings should be considered though if this is ultimately desirable. I
> saw tons of timers and events that'd get triggered with any IPv4 or
> IPv6 interface lying around.
>
> Luis

2014-02-20 00:58:37

by Hannes Frederic Sowa

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Wed, Feb 19, 2014 at 06:56:17PM -0600, Dan Williams wrote:
> Note that there isn't yet a disable_ipv4 knob though, I was
> perhaps-too-subtly trying to get you to send a patch for it, since I can
> use it too :)

Do you plan to implement
<http://datatracker.ietf.org/doc/draft-ietf-sunset4-noipv4/>?

;)

2014-02-20 01:01:42

by Dan Williams

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Thu, 2014-02-20 at 01:58 +0100, Hannes Frederic Sowa wrote:
> On Wed, Feb 19, 2014 at 06:56:17PM -0600, Dan Williams wrote:
> > Note that there isn't yet a disable_ipv4 knob though, I was
> > perhaps-too-subtly trying to get you to send a patch for it, since I can
> > use it too :)
>
> Do you plan to implement
> <http://datatracker.ietf.org/doc/draft-ietf-sunset4-noipv4/>?
>
> ;)

Well, not specifically, but with NetworkManager we do have a "disable
IPv4" method for IPv4, which now just doesn't do any kind of IPv4, but
obviously doesn't disable IPv4 entirely because that's not possible. I
was only thinking that it would be nice to actually guarantee that IPv4
was disabled, just like disable_ipv6 does.

But we could certainly implement that draft if a patch shows up or if it
bubbled up the priority stack :)

Dan

2014-02-20 13:19:28

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On 19/02/14 17:02, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 6:35 AM, Zoltan Kiss <[email protected]> wrote:
>> On 19/02/14 09:52, Ian Campbell wrote:
>>> Can't we arrange things in the Xen hotplug scripts such that if the
>>> root_block stuff isn't available/doesn't work we fall back to the
>>> existing fe:ff:ff:ff:ff:ff usage?
>>>
>>> That would avoid concerns about forward/backwards compat I think. It
>>> wouldn't solve the issue you are targeting on old systems, but it also
>>> doesn't regress them any further.
>>
>> I agree, I think this problem could be better handled from userspace: if it
>> can set root_block then change the default MAC to a random one, if it can't,
>> then stay with the default one. Or if someone doesn't care about STP but DAD
>> is still important, userspace can have a force_random_mac option somewhere
>> to change to a random MAC regardless of root_block presence.
>
> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
> relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
> upon initialization so that root block will be used once the device
> gets added to a bridge. The purpose would be to keep drivers from
> using the high MAC address hack and to standardize on a random MAC
> address, thereby avoiding the possible duplicate address situation for
> IPv6. In the STP use case for these interfaces we'd just require
> userspace to unset the root block. I'd consider the STP use case the
> most odd of all. The caveat to this approach is that 3.8 would be needed
> (or the root block patches cherry-picked) for base kernels older
> than 3.8.

How about this: netback sets the root_block flag and a random MAC by
default. So the default behaviour won't change, DAD will be happy, and
userspace doesn't have to do anything unless it's using netback for STP
root bridge (I don't think there are too many toolstacks doing that), in
which case it has to remove the root_block flag instead of setting a
random MAC.

Zoli
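Zoltan's proposal has netback pick a random MAC instead of the static fe:ff:ff:ff:ff:ff. In the kernel this is the job of eth_hw_addr_random(), which calls eth_random_addr() and marks the address as NET_ADDR_RANDOM.

The userspace sketch below mirrors only the address rules: clear the multicast bit and set the locally administered bit. It uses rand() purely as a placeholder entropy source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Userspace sketch of the eth_random_addr() rules. rand() is only a
 * placeholder; the kernel uses a proper random source. */
static void random_mac(uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear multicast bit: unicast address */
	addr[0] |= 0x02;	/* set locally administered bit */
}

/* True for a unicast, locally administered address. */
static int is_unicast_local(const uint8_t addr[ETH_ALEN])
{
	return !(addr[0] & 0x01) && (addr[0] & 0x02);
}
```

The legacy fe:ff:ff:ff:ff:ff also passes is_unicast_local(). The difference is only that it is identical on every backend, while a random address is not.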

2014-02-20 14:47:13

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On 19/02/14 16:45, Luis R. Rodriguez wrote:
> On Mon, Feb 17, 2014 at 9:52 AM, Zoltan Kiss <[email protected]> wrote:
>> On 15/02/14 02:59, Luis R. Rodriguez wrote:
>>>
>>> From: "Luis R. Rodriguez" <[email protected]>
>>>
>>> It doesn't make sense for some interfaces to become a root bridge
>>> at any point in time. One example is virtual backend interfaces
>>> which rely on other entities on the bridge for actual physical
>>> connectivity. They only provide virtual access.
>>
>> It is possible that a guest bridges two VIFs together, either from the same
>> Dom0 bridge or from different ones. In that case using STP on VIFs sounds
>> sensible to me.
>
> You seem to describe a case whereby it can make sense for xen-netback
> interfaces to end up becoming the root port of a bridge. Can you
> elaborate a little more on that as it was unclear the use case.
Well, I might be wrong on that, but the scenario I was thinking of: a guest
(let's say domain 1) can have multiple interfaces on different Dom0 (or
driver domain) bridges, let's say vif1.0 is plugged into xenbr0 and
vif1.1 is in xenbr1. If the guest wants to make a bridge of these two,
then using STP makes sense. I wanted to bring up CloudStack's virtual
router as an example, but then I realized it probably doesn't do such a
thing. However I don't think we should hardcode that a netback interface
can't be RP ever.

>
> Additionally if such cases exist then under the current upstream
> implementation one would simply need to change the MAC address in
> order to enable a vif to become the root port. Stephen noted there is
> a way to avoid nominating an interface for a root port through the
> root block flag. We should use that instead of the MAC address hacks.
> Let's keep in mind that part of the motivation for this series is to
> avoid a duplicate IPv6 address left in place by use cases whereby the
> MAC address of the backend vif was left static. The use case you are
> explaining likely describes the more prevalent use case where address
> conflicts can occur, perhaps when administrators forgot to change the
> backend MAC address. If we embrace a random MAC address we'd avoid
> that issue, but we'd need to update userspace to use the root
> block on topologies where desired.
If I understand you correctly, this is the same as what I suggested in my
other email sent 1.5 hours ago.

Zoli

2014-02-20 17:20:07

by Stephen Hemminger

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Wed, 19 Feb 2014 09:59:33 -0800
"Luis R. Rodriguez" <[email protected]> wrote:

> On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger
> <[email protected]> wrote:
> > On Wed, 19 Feb 2014 09:02:06 -0800
> > "Luis R. Rodriguez" <[email protected]> wrote:
> >
> >> Folks, what if I repurpose my patch to use the IFF_BRIDGE_NON_ROOT (or
> >> relabel to IFF_ROOT_BLOCK_DEF) flag for a default driver preference
> >> upon initialization so that root block will be used once the device
> >> gets added to a bridge. The purpose would be to keep drivers from
> >> using the high MAC address hack and to standardize on a random MAC
> >> address, thereby avoiding the possible duplicate address situation for
> >> IPv6. In the STP use case for these interfaces we'd just require
> >> userspace to unset the root block. I'd consider the STP use case the
> >> most odd of all. The caveat to this approach is that 3.8 would be needed
> >> (or the root block patches cherry-picked) for base kernels older
> >> than 3.8.
> >>
> >> Stephen?
> >>
> >> Luis
> >
> > Don't add IFF_ flags that adds yet another API hook into bridge.
>
> The goal was not to add a userspace API, but rather consider a driver
> initialization preference.
>
> > Please only use the netlink/sysfs flags fields that already exist
> > for new features.
>
> Sure, but what if we know a driver in most cases wants the root block
> and we'd want to make it the default, thereby only requiring userspace
> for toggling it off.
>
> Luis

Something in userspace has to put the device into the bridge.
Fix the port setup in that tool via the netlink or sysfs flags in
the bridge. It should not have to be handled in the bridge looking
at magic flags in the device.
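Following Stephen's suggestion, the tool that adds the port would set the existing per-port flag itself. A minimal sketch, assuming the sysfs layout /sys/class/net/<port>/brport/root_block. That file is present on kernels with root_block support, and only while the port is attached to a bridge.

The helper returns -1 when the knob is missing. That is the point where a toolstack could fall back to the static high MAC, as Ian suggested. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /sys/class/net/<port>/brport/<attr>; the brport/ directory only
 * exists while <port> is attached to a bridge. */
static int brport_path(char *buf, size_t len, const char *port,
		       const char *attr)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/brport/%s", port, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Set or clear root_block on a bridge port. Returns -1 if the knob is not
 * there (kernel without root_block support, or port not on a bridge), so
 * the caller can fall back to the legacy static fe:ff:ff:ff:ff:ff MAC. */
static int set_root_block(const char *port, int on)
{
	char path[256];
	FILE *f;
	int ok;

	if (brport_path(path, sizeof(path), port, "root_block") != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", on ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With iproute2, `bridge link set dev vif1.0 root_block on` reaches the same per-port flag over netlink instead of sysfs.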

2014-02-20 20:01:50

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss <[email protected]> wrote:
> How about this: netback sets the root_block flag and a random MAC by
> default. So the default behaviour won't change, DAD will be happy, and
> userspace doesn't have to do anything unless it's using netback for STP root
> bridge (I don't think there are too many toolstacks doing that), in which
> case it has to remove the root_block flag instead of setting a random MAC.

:D that's exactly what I ended up proposing too. I mentioned how
xen-netback could do this as well: we'd keep or rename the flag I
added, and then the bridge code would look at it and enable the root
block if the flag is set. Stephen however does not like having the
bridge code look at magic flags for this behavior and would prefer for
us to get the tools to ask for the root block. Let's follow up more on
that thread.

Luis

2014-02-20 20:24:31

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Thu, Feb 20, 2014 at 9:19 AM, Stephen Hemminger
<[email protected]> wrote:
> On Wed, 19 Feb 2014 09:59:33 -0800 "Luis R. Rodriguez" <[email protected]> wrote:
>> On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger <[email protected]> wrote:
>> >
>> > Please only use the netlink/sysfs flags fields that already exist
>> > for new features.
>>
>> Sure, but what if we know a driver in most cases wants the root block
>> and we'd want to make it the default, thereby only requiring userspace
>> for toggling it off.
>
> Something in userspace has to put the device into the bridge.
> Fix the port setup in that tool via the netlink or sysfs flags in
> the bridge. It should not have to be handled in the bridge looking
> at magic flags in the device.

Agreed that's the best strategy and I'll work on sending patches to
brctl to enable the root_block preference. This approach however also
requires a userspace upgrade. I'm trying to see if we can get an
old-nasty-cryptic-hack practice removed from the kernel and also try
to prevent future drivers from using it -- without requiring a userspace
upgrade. In this case the bad practice is using a high static MAC
address to mimic a root block default preference. In order to
remove that *without* requiring a userspace upgrade the dev->priv_flags
approach is the only thing I can think of. If this went in we'd
replace the high static MAC address with a random MAC address to
prevent IPv6 SLAAC / DAD conflicts. I'd document this flag and
indicate a preference for userspace to be the one tuning these
knobs.

Without this we'd have to keep the high static MAC address on upstream
drivers and let userspace do the randomization once it confirms that the
userspace knob to turn on the root block flag is available. Is the
priv_flags approach worth the compromise to remove the root block hack
practice?

Luis
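The SLAAC / DAD concern can be made concrete. With EUI-64 based address generation (assumed here), the link-local address is derived directly from the MAC. Every backend carrying the same static fe:ff:ff:ff:ff:ff therefore ends up with the same fe80:: address.

A small sketch of the modified EUI-64 mapping described in RFC 4291, Appendix A:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Modified EUI-64 interface identifier from a 48-bit MAC:
 * insert ff:fe in the middle and flip the universal/local bit. */
static void mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
	eui[0] = mac[0] ^ 0x02;
	eui[1] = mac[1];
	eui[2] = mac[2];
	eui[3] = 0xff;
	eui[4] = 0xfe;
	eui[5] = mac[3];
	eui[6] = mac[4];
	eui[7] = mac[5];
}

/* Format the fe80::/64 link-local address (leading zeros dropped per
 * group, no "::" compression inside the interface identifier). */
static void link_local(const uint8_t mac[6], char *out, size_t len)
{
	uint8_t e[8];

	mac_to_eui64(mac, e);
	snprintf(out, len, "fe80::%x:%x:%x:%x",
		 (unsigned)((e[0] << 8) | e[1]), (unsigned)((e[2] << 8) | e[3]),
		 (unsigned)((e[4] << 8) | e[5]), (unsigned)((e[6] << 8) | e[7]));
}
```

For fe:ff:ff:ff:ff:ff this yields fe80::fcff:ffff:feff:ffff on every backend that keeps the static address. A random per-vif MAC gives each backend its own identifier.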

2014-02-20 20:29:01

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Thu, Feb 20, 2014 at 6:47 AM, Zoltan Kiss <[email protected]> wrote:
> On 19/02/14 16:45, Luis R. Rodriguez wrote:
>
>> You seem to describe a case whereby it can make sense for xen-netback
>> interfaces to end up becoming the root port of a bridge. Can you
>> elaborate a little more on that as it was unclear the use case.
>
> Well, I might be wrong on that, but the scenario I was thinking of: a guest
> (let's say domain 1) can have multiple interfaces on different Dom0 (or
> driver domain) bridges, let's say vif1.0 is plugged into xenbr0 and vif1.1
> is in xenbr1. If the guest wants to make a bridge of these two, then using
> STP makes sense.

The bridging would happen on the front end in that case no?

> I wanted to bring up CloudStack's virtual router as an
> example, but then I realized it probably doesn't do such a thing. However I
> don't think we should hardcode that a netback interface can't be RP ever.

My patch did allow for this but the root block flag that Stephen
mentioned can always be lifted.

Luis

2014-02-20 20:31:56

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Wed, Feb 19, 2014 at 4:56 PM, Dan Williams <[email protected]> wrote:
> Note that there isn't yet a disable_ipv4 knob though, I was
> perhaps-too-subtly trying to get you to send a patch for it, since I can
> use it too :)

Sure, can you describe the use case a little better? I could use
that for the commit log. My only current use case was the xen-netback
case but Zoltan has noted a few cases where an IPv4 or IPv6 address
*could* be used on the backend interfaces (which I'll still poke at, as
it's unclear to me why they have 'em).

Luis

2014-02-20 20:39:53

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Wed, Feb 19, 2014 at 11:13 AM, Zoltan Kiss <[email protected]> wrote:
> On 19/02/14 17:20, Luis R. Rodriguez wrote:
>>>> On 19/02/14 17:20, Luis R. Rodriguez also wrote:
>>>> Zoltan has noted though some use cases of IPv4 or IPv6 addresses on
>>>> backends though <...>
>>
>> As discussed in the other threads though there *are* some use cases
>> of assigning IPv4 or IPv6 addresses to the backend interfaces:
>> routing them (although it's unclear to me if iptables can be used
>> instead, Zoltan?).
>
> Not with OVS, it steals the packet before netfilter hooks.

Got it, thanks! Can't the route be added using a front-end IP address
on the host instead, though? I just tried that on a Xen system and it
seems to work. Perhaps I'm not understanding the exact topology in the
routing case. In my case I have the backend without any IPv4 or
IPv6 interfaces, the guest has IPv4 and IPv6 addresses and even a TUN for
VPN, and I can create routes on the host to the front end by not using
the backend device name but instead using the front-end target IP.

Luis

2014-02-21 13:03:10

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On 20/02/14 20:24, Luis R. Rodriguez wrote:
> On Thu, Feb 20, 2014 at 9:19 AM, Stephen Hemminger
> <[email protected]> wrote:
>> On Wed, 19 Feb 2014 09:59:33 -0800 "Luis R. Rodriguez" <[email protected]> wrote:
>>> On Wed, Feb 19, 2014 at 9:08 AM, Stephen Hemminger <[email protected]> wrote:
>>>>
>>>> Please only use the netlink/sysfs flags fields that already exist
>>>> for new features.
>>>
>>> Sure, but what if we know a driver in most cases wants the root block
>>> and we'd want to make it the default, thereby only requiring userspace
>>> for toggling it off.
>>
>> Something in userspace has to put the device into the bridge.
>> Fix the port setup in that tool via the netlink or sysfs flags in
>> the bridge. It should not have to be handled in the bridge looking
>> at magic flags in the device.
>
> Agreed that's the best strategy and I'll work on sending patches to
> brctl to enable the root_block preference. This approach however also
I don't think brctl should deal with any Xen specific stuff. I assume
there is a misunderstanding in this thread: when I (and possibly other
Xen folks) talk about "userspace" or "toolstack" here, I mean Xen
specific tools which use e.g. brctl to set up bridges. Not brctl itself.
> requires a userspace upgrade. I'm trying to see if we can get an
> old-nasty-cryptic-hack practice removed from the kernel and also try
> to prevent future drivers from using it -- without requiring a userspace
> upgrade. In this case the bad practice is using a high static MAC
> address to mimic a root block default preference. In order to
> remove that *without* requiring a userspace upgrade the dev->priv_flags
> approach is the only thing I can think of. If this went in we'd
> replace the high static MAC address with a random MAC address to
> prevent IPv6 SLAAC / DAD conflicts. I'd document this flag and
> indicate a preference for userspace to be the one tuning these
> knobs.
>
> Without this we'd have to keep the high static MAC address on upstream
> drivers and let userspace do the randomization once it confirms that the
> userspace knob to turn on the root block flag is available. Is the
> priv_flags approach worth the compromise to remove the root block hack
> practice?
>
> Luis
>

2014-02-21 13:03:16

by Zoltan Kiss

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On 20/02/14 20:39, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 11:13 AM, Zoltan Kiss <[email protected]> wrote:
>> On 19/02/14 17:20, Luis R. Rodriguez wrote:
>>>>> On 19/02/14 17:20, Luis R. Rodriguez also wrote:
>>>>> Zoltan has noted though some use cases of IPv4 or IPv6 addresses on
>>>>> backends though <...>
>>>
>>> As discussed in the other threads though there *are* some use cases
>>> of assigning IPv4 or IPv6 addresses to the backend interfaces:
>>> routing them (although it's unclear to me if iptables can be used
>>> instead, Zoltan?).
>>
>> Not with OVS, it steals the packet before netfilter hooks.
>
> Got it, thanks! Can't the route be added using a front-end IP address
> on the host instead, though? I just tried that on a Xen system and it
> seems to work. Perhaps I'm not understanding the exact topology in the
> routing case. In my case I have the backend without any IPv4 or
> IPv6 interfaces, the guest has IPv4 and IPv6 addresses and even a TUN for
> VPN, and I can create routes on the host to the front end by not using
> the backend device name but instead using the front-end target IP.
Check how the current Xen scripts do routed networking:

http://wiki.xen.org/wiki/Xen_Networking#Associating_routes_with_virtual_devices

Note, there are no bridges involved here! As the above page says, the
backend has to have an IP address, though maybe that's not true anymore.
I'm not too familiar with this setup either; I've used it only once.

Zoli

2014-02-21 13:04:12

by Zoltan Kiss

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On 20/02/14 20:01, Luis R. Rodriguez wrote:
> On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss <[email protected]> wrote:
>> How about this: netback sets the root_block flag and a random MAC by
>> default. So the default behaviour won't change, DAD will be happy, and
>> userspace doesn't have to do anything unless it's using netback for STP root
>> bridge (I don't think there are too many toolstacks doing that), in which
>> case it has to remove the root_block flag instead of setting a random MAC.
>
> :D that's exactly what I ended up proposing too. I mentioned how
> xen-netback could do this as well: we'd keep or rename the flag I
> added, and then the bridge code would look at it and enable the root
> block if the flag is set. Stephen however does not like having the
> bridge code look at magic flags for this behavior and would prefer for
> us to get the tools to ask for the root block. Let's follow up more on
> that thread
We don't need that new flag, just forget about it. Set that root_block
flag from netback device init, around the time you generate the random
MAC, or at the earliest possible time. Nothing else has to be done from
the kernel side. If someone wants netback to be a root port, then remove
root_block from their tools, instead of changing the MAC address, as
happens now.

Another problem with the random addresses, pointed out by Ian earlier,
is that when adding/removing interfaces, the bridge recalculates its
MAC address and chooses the lowest one. In the general use case I think
that's normal, but in the case of Xen networking, we would like to keep the
bridge using the physical interface's MAC, because the local port of the
bridge is used for Dom0 network traffic; therefore changing the bridge
MAC when a netback device has a lower MAC breaks that traffic. I think the
best option is to address this from userspace: if it sets the MAC of the bridge
explicitly, dev_set_mac_address() sets dev->addr_assign_type =
NET_ADDR_SET, so br_stp_recalculate_bridge_id() will exit before
changing anything.

And when I say userspace, I mean Xen specific tools which do
networking configuration, e.g. xapi in XenServer's case. Not brctl; it
doesn't have to know whether this is a xenbrX device or a bridge used
for other purposes.

Zoli
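The bridge MAC behaviour described above can be modelled in a few lines. This is a simplified userspace rendering of the logic in br_stp_recalculate_bridge_id(), not the kernel code itself:

- an explicitly set bridge address (NET_ADDR_SET) is kept;
- otherwise the numerically lowest port MAC wins.

The types and enum names are stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Stand-ins for the kernel's addr_assign_type values. */
enum model_addr_type { MODEL_ADDR_PERM, MODEL_ADDR_RANDOM, MODEL_ADDR_SET };

struct model_bridge {
	enum model_addr_type addr_assign_type;
	uint8_t addr[ETH_ALEN];
};

/* Simplified model: keep a user-set address, otherwise adopt the lowest
 * port MAC. Returns true if the bridge address changed. (Unlike the kernel,
 * an empty port list simply leaves the address alone.) */
static bool model_recalc_bridge_id(struct model_bridge *br,
				   const uint8_t ports[][ETH_ALEN], size_t n)
{
	const uint8_t *lowest = NULL;

	if (br->addr_assign_type == MODEL_ADDR_SET)
		return false;	/* user has chosen a value, keep it */
	for (size_t i = 0; i < n; i++)
		if (!lowest || memcmp(ports[i], lowest, ETH_ALEN) < 0)
			lowest = ports[i];
	if (!lowest || memcmp(br->addr, lowest, ETH_ALEN) == 0)
		return false;
	memcpy(br->addr, lowest, ETH_ALEN);
	return true;
}
```

Suppose eth0 is 3c:97:0e:11:22:33. A vif with the static fe:ff:ff:ff:ff:ff never takes over the bridge MAC. A vif with a random 02:... address does take it over. Pinning the bridge address avoids the takeover in both cases.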

2014-02-21 15:59:47

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <[email protected]> wrote:
> On 20/02/14 20:01, Luis R. Rodriguez wrote:
>>
>> On Thu, Feb 20, 2014 at 5:19 AM, Zoltan Kiss <[email protected]>
>> wrote:
>>>
>>> How about this: netback sets the root_block flag and a random MAC by
>>> default. So the default behaviour won't change, DAD will be happy, and
>>> userspace doesn't have to do anything unless it's using netback for STP
>>> root
>>> bridge (I don't think there are too many toolstacks doing that), in which
>>> case it has to remove the root_block flag instead of setting a random
>>> MAC.
>>
>>
>> :D that's exactly what I ended up proposing too. I mentioned how
>> xen-netback could do this as well: we'd keep or rename the flag I
>> added, and then the bridge code would look at it and enable the root
>> block if the flag is set. Stephen however does not like having the
>> bridge code look at magic flags for this behavior and would prefer for
>> us to get the tools to ask for the root block. Let's follow up more on
>> that thread
>
> We don't need that new flag, just forget about it.

Unless I'm missing something the root_block flag is a bridge port
primitive. This means we can't set it *until* the interface gets added
to a bridge, and even then, it's a knob that would be available only to
the bridge.

> Another problem with the random addresses, pointed out by Ian earlier, is that
> when adding/removing interfaces, the bridge recalculates its MAC
> address and chooses the lowest one. In the general use case I think that's
> normal, but in the case of Xen networking, we would like to keep the bridge
> using the physical interface's MAC, because the local port of the bridge is
> used for Dom0 network traffic; therefore changing the bridge MAC when a
> netback device has a lower MAC breaks that traffic.

This is a good reason then to actually have a general, interface-specific
knob to signal to the bridge that we'd prefer root_block
by default. The alternative, as you point out below, is to have the xen
/ kvm utils set the bridge MAC address statically, but that'll
require a userspace upgrade. I'm looking for a kernel solution that
is backwards compatible with old userspace.

> I think the best option is to
> address this from userspace: if it sets the MAC of the bridge explicitly,
> dev_set_mac_address() sets dev->addr_assign_type = NET_ADDR_SET, so
> br_stp_recalculate_bridge_id() will exit before changing anything.

That will certainly work for new xen / kvm util userspace.

Luis

2014-02-21 16:01:36

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <[email protected]> wrote:
>> Agreed that's the best strategy and I'll work on sending patches to
>> brctl to enable the root_block preference. This approach however also
>
> I don't think brctl should deal with any Xen specific stuff. I assume there
> is a misunderstanding in this thread: when I (and possibly other Xen folks)
> talk about "userspace" or "toolstack" here, I mean Xen specific tools which
> use e.g. brctl to set up bridges. Not brctl itself.

I did mean brctl, but as I looked at the code it doesn't use
rtnl_open() and I'm not sure if Stephen would want that. Additionally, even
if it did handle root_block, the other issue with this strategy is that,
as you noted, upon initialization the bridge, without a static MAC
address, could end up setting the backend as the root port until you
let userspace turn the root_block knob.

Luis

2014-02-22 01:39:12

by Luis Chamberlain

Subject: Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

On Fri, Feb 21, 2014 at 8:01 AM, Luis R. Rodriguez
<[email protected]> wrote:
> On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <[email protected]> wrote:
>>> Agreed that's the best strategy and I'll work on sending patches to
>>> brctl to enable the root_block preference. This approach however also
>>
>> I don't think brctl should deal with any Xen specific stuff. I assume there
>> is a misunderstanding in this thread: when I (and possibly other Xen folks)
>> talk about "userspace" or "toolstack" here, I mean Xen specific tools which
>> use e.g. brctl to set up bridges. Not brctl itself.
>
> I did mean brctl, but as I looked at the code it doesn't use
> rtnl_open() and I'm not sure if Stephen would want that.

Actually that'd be the wrong tool to extend; iproute2 would be the
new way, with:

ip link add dev xenbr0 type bridge
ip link set dev eth0 master xenbr0
ip link set dev vif1.0 master xenbr0 <root_block>

where root_block would be the new argument. This would use rtnetlink
RTM_SETLINK + IFLA_MASTER, which in turn kicks off the bridge's
ndo_add_slave(). Still, it seems this requires the eth0 device to
actually exist, so from what I can tell we can't set the root_block
preference until *after* the addition onto the bridge, which means the
bridge could still take the vif1.0 MAC address momentarily. This is of
course only an issue if the link was up during the additions. This makes
me think perhaps nothing is needed then, and scripts could just use:

bridge link set dev vif1.0 root_block on
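The same per-port flag is also exposed through sysfs as brport/root_block. A minimal C sketch, assuming sysfs is mounted at /sys and the helper runs as root, of what a toolstack helper could do instead of shelling out to `bridge`; the helper names are illustrative, not from any existing tool. It shares the ordering limitation described above: the brport/ directory only appears once the interface has been added to the bridge.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Helper names are illustrative, not from any existing tool. */

/*
 * Build the sysfs path of a bridge-port attribute, for example
 * /sys/class/net/vif1.0/brport/root_block. Returns 0 on success and
 * -1 if the name is unusable or the path does not fit in buf.
 */
static int brport_attr_path(char *buf, size_t len,
                            const char *ifname, const char *attr)
{
    assert(buf != NULL && ifname != NULL && attr != NULL);
    if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
        return -1; /* interface names can never contain '/' */
    int n = snprintf(buf, len, "/sys/class/net/%s/brport/%s", ifname, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Same effect as `bridge link set dev IFNAME root_block on|off`, but via
 * sysfs. brport/ only exists after IFNAME has been added to a bridge, so
 * this cannot run any earlier than the iproute2 variant.
 * Needs root. Returns 0 on success, -1 on failure.
 */
static int set_root_block(const char *ifname, int on)
{
    char path[256];
    if (brport_attr_path(path, sizeof(path), ifname, "root_block") != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1; /* not a bridge port, or not enough privileges */
    int ok = fputs(on ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0) /* buffered sysfs write errors surface on flush */
        ok = 0;
    return ok ? 0 : -1;
}
```

Writing through sysfs avoids a netlink dependency in the helper, at the cost of requiring the port to be enslaved first.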

I also just noticed, and verified, that if the port that was the
bridge's root port gets root_block toggled on, we don't kick off the
newly blocked port. Removing the interface from the bridge does, however,
reset the bridge with a proper new root port:

ip link set dev vif1.0 nomaster

For old userspace with brctl and no iproute2 we're shit out of luck:
we can't use root_block there (xen-netback was added in v2.6.39).

Stephen, given all this, can we add the priv_flags flag to help out as
proposed? I'd make it just toggle the new root_block flag, which would
let drivers use it from initialization. Let me know if you have other
suggestions or if there is anything I may have missed.
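A userspace model (not kernel code) of the proposal in the previous paragraph: a driver-set device flag that merely seeds the per-port root_block flag when the device joins a bridge, while userspace can still flip it later. All names and bit values below are hypothetical stand-ins, not the identifiers used in the RFC patches or in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only for this model. */
#define MODEL_IFF_BRIDGE_NON_ROOT (1u << 0) /* driver-set device flag */
#define MODEL_BR_ROOT_BLOCK       (1u << 0) /* per-port bridge flag  */

struct model_netdev {
    const char *name;
    unsigned int priv_flags; /* set by the driver at init time */
};

struct model_port {
    const struct model_netdev *dev;
    unsigned int flags; /* what root_block toggles */
};

/* Enslaving a device seeds root_block from the driver's preference, so
 * the port is blocked from the moment it exists, with no userspace step. */
static void model_add_port(struct model_port *p, const struct model_netdev *dev)
{
    assert(p != NULL && dev != NULL);
    p->dev = dev;
    p->flags = 0;
    if (dev->priv_flags & MODEL_IFF_BRIDGE_NON_ROOT)
        p->flags |= MODEL_BR_ROOT_BLOCK;
}

/* Userspace keeps the final say, like `bridge link set ... root_block`. */
static void model_set_root_block(struct model_port *p, bool on)
{
    if (on)
        p->flags |= MODEL_BR_ROOT_BLOCK;
    else
        p->flags &= ~MODEL_BR_ROOT_BLOCK;
}

static bool model_root_block(const struct model_port *p)
{
    return (p->flags & MODEL_BR_ROOT_BLOCK) != 0;
}
```

In the model the flag is in place as soon as the port exists, which is exactly the window a userspace-only approach leaves open.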

Luis

2014-02-22 01:41:14

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Fri, Feb 21, 2014 at 5:02 AM, Zoltan Kiss <[email protected]> wrote:
> Check how the current Xen scripts do routed networking:
>
> http://wiki.xen.org/wiki/Xen_Networking#Associating_routes_with_virtual_devices
>
> Note, there are no bridges involved here! As the above page says, the
> backend has to have an IP address; maybe that's not true anymore. I'm not
> too familiar with this setup either, I've used it only once.

Thanks. In that case I think adding a bridge, adding the backend
interface to it, and then adding a route to the frontend IP would
suffice to cover that case, but I'm pretty limited on test devices, so
I'd appreciate it if someone with a setup like that could test it as an
alternative. Please recall that the possible gains in simplification
here should be pretty significant. And of course, I still haven't had
the time or systems to test the NAT case...

Luis

2014-02-24 18:21:16

by Dan Williams

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Thu, 2014-02-20 at 12:31 -0800, Luis R. Rodriguez wrote:
> On Wed, Feb 19, 2014 at 4:56 PM, Dan Williams <[email protected]> wrote:
> > Note that there isn't yet a disable_ipv4 knob though, I was
> > perhaps-too-subtly trying to get you to send a patch for it, since I can
> > use it too :)
>
> Sure, can you describe a little better the use case, as I could use
> that for the commit log. My only current use case was the xen-netback
> case but Zoltan has noted a few cases where an IPv4 or IPv6 address
> *could* be used on the backend interfaces (which I'll still poke as
> it's unclear to me why they have 'em).

My use-case would simply be to have an analogue for the disable_ipv6
case. In the future I expect more people will want to disable IPv4 as
they move to IPv6. If you don't have something like disable_ipv4, then
there's no way to ensure that some random program or something doesn't
set up IPv4 stuff that you don't want.

Same thing for IPv6; some people really don't want IPv6 enabled on an
interface no matter what; they don't want an IPv6LL address assigned,
they don't want kernel SLAAC, they want to ensure that *nothing*
IPv6-related gets done for that interface. The same can be true for
IPv4, but we don't have a way of doing that right now.
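For reference, the existing knob Dan wants an IPv4 analogue of is the per-interface sysctl net.ipv6.conf.<iface>.disable_ipv6. A minimal sketch, assuming procfs is mounted at /proc and root privileges, of setting it from C; the helper names are made up for illustration. There is no matching net/ipv4/conf/<iface>/disable_ipv4 file, which is the gap being discussed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Helper names are illustrative only. */

/* Path of a per-interface IPv6 sysctl such as
 * /proc/sys/net/ipv6/conf/eth0/disable_ipv6 ("all" and "default" are
 * also valid in place of an interface name). */
static int ipv6_conf_path(char *buf, size_t len,
                          const char *ifname, const char *knob)
{
    assert(buf != NULL && ifname != NULL && knob != NULL);
    if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
        return -1;
    int n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/%s", ifname, knob);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same as writing 1 to that file by hand; needs root.
 * Returns 0 on success, -1 on failure. */
static int disable_ipv6_on(const char *ifname)
{
    char path[256];
    if (ipv6_conf_path(path, sizeof(path), ifname, "disable_ipv6") != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1; /* no such interface, no IPv6, or no privileges */
    int ok = fputs("1\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```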

Dan

2014-02-24 20:33:59

by Luis Chamberlain

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Mon, Feb 24, 2014 at 10:22 AM, Dan Williams <[email protected]> wrote:
> My use-case would simply be to have an analogue for the disable_ipv6
> case. In the future I expect more people will want to disable IPv4 as
> they move to IPv6. If you don't have something like disable_ipv4, then
> there's no way to ensure that some random program or something doesn't
> set up IPv4 stuff that you don't want.
>
> Same thing for IPv6; some people really don't want IPv6 enabled on an
> interface no matter what; they don't want an IPv6LL address assigned,
> they don't want kernel SLAAC, they want to ensure that *nothing*
> IPv6-related gets done for that interface. The same can be true for
> IPv4, but we don't have a way of doing that right now.

I'll add this to my queue.

Luis

2014-02-24 23:04:32

by David Miller

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

From: Dan Williams <[email protected]>
Date: Mon, 24 Feb 2014 12:22:00 -0600

> In the future I expect more people will want to disable IPv4 as
> they move to IPv6.

I definitely don't.

I've been lightly following this conversation and I have to say
a few things.

disable_ipv6 was added because people wanted to make sure their
machines didn't generate any ipv6 traffic because "ipv6 is not
mature", "we don't have our firewalls configured to handle that
kind of traffic" etc.

None of these things apply to ipv4.

And if you think people will go to ipv6 only, you are dreaming.

Name a provider of a major web site who will go to strictly only
providing an ipv6 facing site?

Only an idiot who wanted to lose significant numbers of page views
and traffic would do that, so ipv4 based connectivity will be
universally necessary forever.

I think disable_ipv4 is absolutely a non-starter.

2014-02-25 00:02:33

by Ben Hutchings

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Mon, 2014-02-24 at 18:04 -0500, David Miller wrote:
> From: Dan Williams <[email protected]>
> Date: Mon, 24 Feb 2014 12:22:00 -0600
>
> > In the future I expect more people will want to disable IPv4 as
> > they move to IPv6.
>
> I definitely don't.
>
> I've been lightly following this conversation and I have to say
> a few things.
>
> disable_ipv6 was added because people wanted to make sure their
> machines didn't generate any ipv6 traffic because "ipv6 is not
> mature", "we don't have our firewalls configured to handle that
> kind of traffic" etc.
>
> None of these things apply to ipv4.
>
> And if you think people will go to ipv6 only, you are dreaming.
>
> Name a provider of a major web site who will go to strictly only
> providing an ipv6 facing site?
>
> Only an idiot who wanted to lose significant numbers of page views
> and traffic would do that,

That's obviously true for public-facing servers, but that doesn't mean
it's not useful to anyone.

> so ipv4 based connectivity will be universally necessary forever.

You can run an internal network, or access network, as v6-only with
NAT64 and DNS64 at the border. I believe some mobile networks are doing
this; it was also done on the main FOSDEM wireless network this year.
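What NAT64/DNS64 means at the address level: when a v6-only client resolves a name that only has an A record, DNS64 returns a synthesized AAAA record with the IPv4 address embedded in a /96 prefix, and the NAT64 gateway translates traffic for that prefix back to IPv4. A minimal C sketch using the RFC 6052 well-known prefix 64:ff9b::/96 (operators can also use a network-specific prefix); the function names are illustrative.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* RFC 6052 well-known prefix 64:ff9b::/96, as its first 12 bytes. */
static const unsigned char NAT64_WKP[12] = {
    0x00, 0x64, 0xff, 0x9b, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Embed an IPv4 address in the low 32 bits of the /96 prefix, which is
 * what DNS64 does when it synthesizes an AAAA record from an A record.
 * Function names are illustrative only. */
static void nat64_synthesize(struct in6_addr *out, const struct in_addr *v4)
{
    assert(out != NULL && v4 != NULL);
    memcpy(out->s6_addr, NAT64_WKP, sizeof(NAT64_WKP));
    /* s_addr is already in network byte order, so copy it verbatim. */
    memcpy(out->s6_addr + 12, &v4->s_addr, 4);
}

/* Dotted-quad string in, synthesized IPv6 address out.
 * Returns 0 on success, -1 if the input is not a valid IPv4 address. */
static int nat64_from_string(const char *v4str, struct in6_addr *out)
{
    struct in_addr v4;
    if (inet_pton(AF_INET, v4str, &v4) != 1)
        return -1;
    nat64_synthesize(out, &v4);
    return 0;
}
```

For example, 192.0.2.33 maps to 64:ff9b::c000:221, which can also be written 64:ff9b::192.0.2.33.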

Ben.

> I think disable_ipv4 is absolutely a non-starter.


--
Ben Hutchings
Beware of bugs in the above code;
I have only proved it correct, not tried it. - Donald Knuth



2014-02-25 00:12:42

by David Miller

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

From: Ben Hutchings <[email protected]>
Date: Tue, 25 Feb 2014 00:02:00 +0000

> You can run an internal network, or access network, as v6-only with
> NAT64 and DNS64 at the border. I believe some mobile networks are doing
> this; it was also done on the main FOSDEM wireless network this year.

This seems to be bloating up the networking headers of the internal
network, for what purpose?

For mobile that's doubly inadvisable.

2014-02-25 02:02:22

by Ben Hutchings

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Mon, 2014-02-24 at 19:12 -0500, David Miller wrote:
> From: Ben Hutchings <[email protected]>
> Date: Tue, 25 Feb 2014 00:02:00 +0000
>
> > You can run an internal network, or access network, as v6-only with
> > NAT64 and DNS64 at the border. I believe some mobile networks are doing
> > this; it was also done on the main FOSDEM wireless network this year.
>
> This seems to be bloating up the networking headers of the internal
> network, for what purpose?
>
> For mobile that's doubly inadvisable.

I don't know what the reasoning is for the mobile network operators.
They're forced to do NAT for v4 somewhere, and maybe v6-only makes the
access network easier to manage.

I doubt the extra header length hurts that much on a 3G or 4G network.

Ben.

--
Ben Hutchings
Everything should be made as simple as possible, but not simpler.
- Albert Einstein



2014-02-25 02:23:13

by Hannes Frederic Sowa

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Tue, Feb 25, 2014 at 02:01:59AM +0000, Ben Hutchings wrote:
> On Mon, 2014-02-24 at 19:12 -0500, David Miller wrote:
> > From: Ben Hutchings <[email protected]>
> > Date: Tue, 25 Feb 2014 00:02:00 +0000
> >
> > > You can run an internal network, or access network, as v6-only with
> > > NAT64 and DNS64 at the border. I believe some mobile networks are doing
> > > this; it was also done on the main FOSDEM wireless network this year.
> >
> > This seems to be bloating up the networking headers of the internal
> > network, for what purpose?
> >
> > For mobile that's doubly inadvisable.
>
> I don't know what the reasoning is for the mobile network operators.
> They're forced to do NAT for v4 somewhere, and maybe v6-only makes the
> access network easier to manage.

Yes, it seems the way to go:
<http://www.dslreports.com/shownews/TMobile-Goes-IPv6-Only-on-Android-44-Devices-126506>

I can't comment much on 464xlat because I haven't looked at an
implementation yet, but it may well be the case that it still needs IPv4
on the outgoing interface; I don't know (from the spec's point of view it
doesn't look like it).

Greetings,

Hannes

2014-02-25 19:51:01

by Paul Marks

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Mon, Feb 24, 2014 at 4:12 PM, David Miller <[email protected]> wrote:
> From: Ben Hutchings <[email protected]>
> Date: Tue, 25 Feb 2014 00:02:00 +0000
>
>> You can run an internal network, or access network, as v6-only with
>> NAT64 and DNS64 at the border. I believe some mobile networks are doing
>> this; it was also done on the main FOSDEM wireless network this year.
>
> This seems to be bloating up the networking headers of the internal
> network, for what purpose?

The primary purpose of IPv6 is to bloat up network headers, because
the IPv4 headers were too small to address all the endpoints.

NAT64 is an intriguing solution to the problem of "I have too many
customers for 10.0.0.0/8". Here are some slides on the topic from
this week's APNIC conference:
https://conference.apnic.net/data/37/464xlat-apricot-2014_1393236641.pdf

A kernel with disable_ipv4 would be fairly usable on such a network
today, as long as you avoid AF_INET-specific apps.

2014-02-25 21:06:16

by Dan Williams

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Mon, 2014-02-24 at 18:04 -0500, David Miller wrote:
> From: Dan Williams <[email protected]>
> Date: Mon, 24 Feb 2014 12:22:00 -0600
>
> > In the future I expect more people will want to disable IPv4 as
> > they move to IPv6.
>
> I definitely don't.
>
> I've been lightly following this conversation and I have to say
> a few things.
>
> disable_ipv6 was added because people wanted to make sure their
> machines didn't generate any ipv6 traffic because "ipv6 is not
> mature", "we don't have our firewalls configured to handle that
> kind of traffic" etc.
>
> None of these things apply to ipv4.
>
> And if you think people will go to ipv6 only, you are dreaming.
>
> Name a provider of a major web site who will go to strictly only
> providing an ipv6 facing site?
>
> Only an idiot who wanted to lose significant numbers of page views
> and traffic would do that, so ipv4 based connectivity will be
> universally necessary forever.
>
> I think disable_ipv4 is absolutely a non-starter.

Also, disable_ipv4 signals *intent*, which is distinct from current
state.

Does an interface without an IPv4 address mean that the user wished it
not to have one?

Or does it mean that DHCP hasn't started yet (but is supposed to), or
failed, or something hasn't gotten around to assigning an address yet?

disable_ipv4 lets you distinguish between these two cases, the same way
disable_ipv6 does.
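Dan's distinction, sketched as the decision a userspace network manager has to make when it sees an interface without an IPv4 address. Everything here is hypothetical: `disabled` stands in for the proposed disable_ipv4 knob, which does not exist in this kernel, and the type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* What can be concluded about an interface's IPv4 state (illustrative). */
enum v4_state {
    V4_CONFIGURED,        /* an IPv4 address is present */
    V4_DISABLED,          /* deliberately unused: no address is wanted */
    V4_PENDING_OR_FAILED, /* no address yet: DHCP not started, running, or failed */
};

/* `disabled` models the hypothetical per-interface disable_ipv4 knob.
 * Without such a knob callers can only pass false, and "unwanted" and
 * "not configured yet" collapse into the same answer. */
static enum v4_state classify_v4(bool disabled, bool has_v4_addr)
{
    if (disabled)
        return V4_DISABLED;
    return has_v4_addr ? V4_CONFIGURED : V4_PENDING_OR_FAILED;
}

static const char *v4_state_name(enum v4_state s)
{
    switch (s) {
    case V4_CONFIGURED:        return "configured";
    case V4_DISABLED:          return "disabled";
    case V4_PENDING_OR_FAILED: return "pending-or-failed";
    }
    assert(0 && "unknown v4_state");
    return "?";
}
```

The same intent could equally be recorded in a userspace configuration store rather than a kernel knob; the sketch only shows which cases become distinguishable once intent is recorded somewhere.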

Dan

2014-02-25 21:18:24

by David Miller

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

From: Dan Williams <[email protected]>
Date: Tue, 25 Feb 2014 15:07:00 -0600

> Also, disable_ipv4 signals *intent*, which is distinct from current
> state.
>
> Does an interface without an IPv4 address mean that the user wished it
> not to have one?
>
> Or does it mean that DHCP hasn't started yet (but is supposed to), or
> failed, or something hasn't gotten around to assigning an address yet?
>
> disable_ipv4 lets you distinguish between these two cases, the same way
> disable_ipv6 does.

Intent only matters on the kernel side if the kernel automatically
assigns addresses to interfaces which have been brought up like ipv6
does.

Since it does not do this for ipv4, this can be handled entirely in
userspace.

It is not a valid argument to say that a rogue dhcp might run on
the machine and configure an ipv4 address. That's the admin's
responsibility, and still a user side problem. A "rogue" program
could just as equally turn the theoretical disable_ipv4 off too.

2014-02-26 01:29:38

by Hannes Frederic Sowa

Subject: Re: [RFC v2 2/4] net: enables interface option to skip IP

On Tue, Feb 25, 2014 at 04:18:17PM -0500, David Miller wrote:
> From: Dan Williams <[email protected]>
> Date: Tue, 25 Feb 2014 15:07:00 -0600
>
> > Also, disable_ipv4 signals *intent*, which is distinct from current
> > state.
> >
> > Does an interface without an IPv4 address mean that the user wished it
> > not to have one?
> >
> > Or does it mean that DHCP hasn't started yet (but is supposed to), or
> > failed, or something hasn't gotten around to assigning an address yet?
> >
> > disable_ipv4 lets you distinguish between these two cases, the same way
> > disable_ipv6 does.
>
> Intent only matters on the kernel side if the kernel automatically
> assigns addresses to interfaces which have been brought up like ipv6
> does.
>
> Since it does not do this for ipv4, this can be handled entirely in
> userspace.
>
> It is not a valid argument to say that a rogue dhcp might run on
> the machine and configure an ipv4 address. That's the admin's
> responsibility, and still a user side problem. A "rogue" program
> could just as equally turn the theoretical disable_ipv4 off too.

Weak end model strikes again. :)

Currently one would need to set arp_filter and arp_ignore and have no
ip address on the interface to isolate it from the ipv4 network.
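A C sketch of the sysctl half of that recipe, assuming procfs at /proc and root privileges: it writes the two per-interface ARP knobs. The helper names are illustrative, and picking arp_ignore=8 ("do not reply for all local addresses") is this sketch's assumption about what isolation requires, not something stated above. Removing the interface's IPv4 addresses (for example with `ip -4 addr flush dev IFNAME`) is a separate step not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Knob/value pairs to write; helper and table names are illustrative. */
struct conf_setting {
    const char *knob;
    const char *value;
};

static const struct conf_setting arp_quiet[] = {
    { "arp_filter", "1" }, /* reply only if routing would use this interface */
    { "arp_ignore", "8" }, /* 8: do not reply for any local address */
};

static int ipv4_conf_path(char *buf, size_t len,
                          const char *ifname, const char *knob)
{
    assert(buf != NULL && ifname != NULL && knob != NULL);
    if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
        return -1;
    int n = snprintf(buf, len, "/proc/sys/net/ipv4/conf/%s/%s", ifname, knob);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write every setting; returns how many were written successfully. */
static int apply_arp_quiet(const char *ifname)
{
    char path[256];
    int done = 0;
    for (size_t i = 0; i < sizeof(arp_quiet) / sizeof(arp_quiet[0]); i++) {
        if (ipv4_conf_path(path, sizeof(path), ifname, arp_quiet[i].knob) != 0)
            continue;
        FILE *f = fopen(path, "w");
        if (f == NULL)
            continue; /* no such interface or not enough privileges */
        int ok = fputs(arp_quiet[i].value, f) >= 0;
        if (fclose(f) == 0 && ok)
            done++;
    }
    return done;
}
```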

IFF_NOARP is of no use here as it also disables neighbour discovery.

I am not sure we completely tear down IGMP processing on that interface
if no IP address is available. Maybe there are some special cases with
forwarding, too.

Such a "silent" mode could come in handy for intrusion detection systems,
where one would want to ensure that no IP processing takes place, but it
could also be realized with nftables/netfilter/arpfilter, I think.

Bye,

Hannes