2002-08-28 21:41:17

by David Stevens

Subject: [PATCH] anycast support for IPv6, linux-2.5.31


Below is a patch relative to the mainline 2.5.31 code for an
implementation of anycast support for IPv6. This code was submitted and accepted
in the USAGI tree last Fall. Below is a high-level description of the
implementation:

1) The API
Although the RFC's liken anycasting to ordinary unicasting, I think
it's more appropriate to tie it closely to particular applications, so I've
chosen an API similar to multicasting. So, rather than having a permanent
anycast address associated with the machine, particular applications
that use anycasting can join or leave "anycast groups," and the machine will
recognize the anycast addresses as its own when one or more applications have
joined the group.
So, for example, someone using anycasting for DNS high availability
can add a join to the anycast group in the server and as long as the DNS server
is running, the machine will answer to that anycast address. But the machine
will not respond to anycasts when the service that's using it isn't available,
so a broken server application that has exited won't deny that service if
there are other working members of the anycast group on other hosts.
I don't know if that's controversial or not-- the RFC's are written
more from the external context, but seem to imply a model along the lines of
using "ifconfig" to add anycast addresses. I think that model doesn't fit the
best uses of anycasting, but I'd like to hear your thoughts on it.
The application interface for joining and leaving anycast groups is 2
new setsockopt() calls: IPV6_JOIN_ANYCAST and IPV6_LEAVE_ANYCAST. The arguments
are the same as the corresponding multicast operations. The kernel keeps a
reference count of members; when that goes to zero, the anycast address is not
recognized as a local address. While nonzero, the host listens on the solicited
node for that address, sends advertisements in response to solicitations (with
override=0) and delivers packets sent to the anycast address to upper layers.
There's also an in-kernel interface described below, which is used by
IPv6 mobility, for example.
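
As a rough illustration of the join/leave calls described above, a user-level
program might look like the sketch below. It assumes the argument is a
struct ipv6_mreq, as for multicast joins. The option numbers are an assumption
(the values later Linux headers use), and the helper names anycast_mreq_init(),
anycast_join() and anycast_leave() are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Assumption: fall back to the values later Linux headers define.
 * The header shipped with the patch is authoritative. */
#ifndef IPV6_JOIN_ANYCAST
#define IPV6_JOIN_ANYCAST 27
#endif
#ifndef IPV6_LEAVE_ANYCAST
#define IPV6_LEAVE_ANYCAST 28
#endif

/* Hypothetical helper: fill the request the same way a multicast join
 * would. Returns 0 on success, -1 if addr_text is not an IPv6 address. */
int anycast_mreq_init(struct ipv6_mreq *mreq, const char *addr_text,
                      unsigned int ifindex)
{
    assert(mreq != NULL && addr_text != NULL);
    memset(mreq, 0, sizeof(*mreq));
    if (inet_pton(AF_INET6, addr_text, &mreq->ipv6mr_multiaddr) != 1)
        return -1;
    mreq->ipv6mr_interface = ifindex;
    return 0;
}

/* Join: the kernel takes a reference on the anycast address. */
int anycast_join(int fd, const struct ipv6_mreq *mreq)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_ANYCAST,
                      mreq, sizeof(*mreq));
}

/* Leave: drops the reference taken by the matching join. */
int anycast_leave(int fd, const struct ipv6_mreq *mreq)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_LEAVE_ANYCAST,
                      mreq, sizeof(*mreq));
}
```

A server would call anycast_join() on one of its sockets at startup and
anycast_leave() on shutdown.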

2) Security Model
RFC 2373 states:
"
o An anycast address must not be assigned to an IPv6 host, that is, it may be
assigned to an IPv6 router only."

This patch violates this in 1 special case, and I'll explain why.

a) The restriction on host use of anycast is to avoid carrying individual host
routes for anycast addresses spread out among multiple physical
networks. I think the initial applications are exactly things that
won't be on off-the-shelf routers (high-availability servers for DNS, HTTP,
etc., and mobile IPv6), and those particular cases don't have the problem of
requiring host routes or participation in the routing system. They use
anycast addresses with a prefix common to a unicast address on the
system, so ordinary routing gets you to the right network anyway, and
there's no external penalty on the routing system for using those types
of anycast addresses. For that reason, I allow anycast addresses that
match an existing unicast prefix even on hosts.

Finally (for security considerations), I had to choose whether anycast
should require root privilege or not. Multicasting does not, but it'd obviously
be a spoofing issue if an application joined an "anycast" that was actually the
unicast address of another machine on that network. On the other hand, it's
handy for non-root users to be able to make use of anycasting where that use
doesn't pose any security risks.
The code below allows non-root users to join anycast groups whose
prefixes match existing unicast addresses (so no special route propagation
is required), and requires root (really "CAP_NET_ADMIN") and a router for
off-link anycasts (which are disallowed completely on hosts). I think that
should be extended to require CAP_NET_ADMIN for any anycasts (even on-link
ones) that are not well-known anycasts (to avoid the spoofing of on-link
unicast addresses).
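
The permission rule described above can be modelled as a small decision
function. This is only a sketch of the policy as stated in the text, not the
code from the patch: struct join_ctx and anycast_join_allowed() are
hypothetical names, and the three inputs are assumed to have been computed
elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known when a join is requested. */
struct join_ctx {
    bool prefix_matches_unicast; /* shares a prefix with a local unicast address */
    bool is_router;              /* the machine forwards packets */
    bool has_net_admin;          /* the caller holds CAP_NET_ADMIN */
};

/* Returns true if the join would be permitted under the stated policy. */
bool anycast_join_allowed(const struct join_ctx *c)
{
    assert(c != NULL);

    /* On-link anycast needs no special routes: anyone may join. */
    if (c->prefix_matches_unicast)
        return true;

    /* Off-link anycast: routers only, and only with CAP_NET_ADMIN. */
    return c->is_router && c->has_net_admin;
}
```

The suggested extension (CAP_NET_ADMIN for any anycast that is not well-known)
would add one more input and tighten the first branch.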

3) The Implementation
The code maintains a list of anycast addresses that are in use for
a given interface. The code is a modified version of the existing multicast
code, with some things cleaned up, and operations on the anycast list instead
of the multicast list. Because the anycast address list is separate from the
ordinary address list, anycast addresses in general won't be selected as a
source address, or available for inappropriate uses. Protocols (like ICMP ECHO)
that respond by swapping the source and destination address have a separate
check for anycasts and set the source to zero in that case, which allows IPv6
to choose the outbound source address.
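
The source-address handling for swap-style replies can be sketched in user
space as follows. The function name echo_reply_source() and the
dst_is_anycast flag are hypothetical; in the kernel the flag would come from
a per-device anycast lookup rather than being passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

/* Choose the source address for a reply to a request sent to req_dst.
 * Normally the reply is sourced from the address the request was sent to.
 * If that address was anycast, the source is left unspecified (all zeros)
 * so that normal source selection picks a unicast address instead. */
void echo_reply_source(struct in6_addr *src, const struct in6_addr *req_dst,
                       bool dst_is_anycast)
{
    assert(src != NULL && req_dst != NULL);

    if (dst_is_anycast)
        *src = in6addr_any;
    else
        memcpy(src, req_dst, sizeof(*src));
}
```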
The code has the setsockopt() interface for joining and leaving anycast
groups, but does not yet have changes needed for UDP and TCP to work with them.
TCP is problematic, because the PCB lookup mechanism relies on the destination
address which must change-- it should be disallowed initially. UDP may work
with an INADDR_ANY-bound listener, but I haven't made changes to support it
yet. It will probably use the anycast address as the source, so it'll need a
modification similar to what I've done with ICMP, but should be straightforward.
Ultimately, I think we want to allow binding to anycast addresses as well.
Our immediate application is mobile IPv6, so this patch doesn't include
any of the upper-layer changes that may be needed for general application
support.
For in-kernel use, applications (like mobile IPv6) can call join and
drop functions for anycast addresses, and a function that checks if a device
is in an anycast group (if dev == 0, checks if any device is in that group).
They are (similar to multicast functions):

int ipv6_dev_ac_inc(struct net_device *dev, struct in6_addr *addr)
- add "addr" as an anycast address on "dev"
int ipv6_dev_ac_dec(struct net_device *dev, struct in6_addr *addr)
- remove "addr" as an anycast address on "dev"

These use reference counts, so only the first call to "inc" for a particular
address will add a new address, and only when all references are removed via
"dec" will the address be removed as a local address.

The function:

int ipv6_chk_acast_addr(struct net_device *dev, struct in6_addr *addr)

returns true if "addr" is an anycast address on "dev", false otherwise. If
"dev" is 0, it searches all devices for "addr".

Those 3 functions provide the in-kernel interface.
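
To make the reference-counting behavior concrete, here is a small user-space
model of the three operations on a single device. It is not the kernel code:
struct ac_dev, struct ac_entry, ac_inc(), ac_dec() and ac_chk() are
hypothetical stand-ins, there is no locking, and the "dev == 0 searches all
devices" case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>

struct ac_entry {
    struct in6_addr addr;
    int users;                 /* number of outstanding joins */
    struct ac_entry *next;
};

struct ac_dev {
    struct ac_entry *ac_list;  /* anycast addresses on this device */
};

/* Add a reference; only the first call creates the list entry. */
int ac_inc(struct ac_dev *dev, const struct in6_addr *addr)
{
    struct ac_entry *e;

    assert(dev != NULL && addr != NULL);
    for (e = dev->ac_list; e != NULL; e = e->next) {
        if (memcmp(&e->addr, addr, sizeof(*addr)) == 0) {
            e->users++;
            return 0;
        }
    }
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->addr = *addr;
    e->users = 1;
    e->next = dev->ac_list;
    dev->ac_list = e;
    return 0;
}

/* Drop a reference; the entry goes away when the last one is dropped. */
int ac_dec(struct ac_dev *dev, const struct in6_addr *addr)
{
    struct ac_entry **pp, *e;

    assert(dev != NULL && addr != NULL);
    for (pp = &dev->ac_list; (e = *pp) != NULL; pp = &e->next) {
        if (memcmp(&e->addr, addr, sizeof(*addr)) == 0) {
            if (--e->users == 0) {
                *pp = e->next;
                free(e);
            }
            return 0;
        }
    }
    return -1;                 /* address was never joined */
}

/* True while at least one reference to addr is held on dev. */
bool ac_chk(const struct ac_dev *dev, const struct in6_addr *addr)
{
    const struct ac_entry *e;

    assert(dev != NULL && addr != NULL);
    for (e = dev->ac_list; e != NULL; e = e->next)
        if (memcmp(&e->addr, addr, sizeof(*addr)) == 0)
            return true;
    return false;
}
```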

4) Things of Note
I think we want ipv6_addr_type() to check *only* the well-known
anycasts, since it seems inappropriate to me that that function should be
searching linked lists of anycast addresses. It would also need a "dev"
argument it doesn't have now, since anycast addresses, like unicast and
multicast addresses, in this implementation are associated with particular
devices. Use of those addresses on other devices should not return type
ANYCAST, but should for the device that has the anycast address. So, in most
cases, ipv6_chk_acast_addr() and not ipv6_addr_type() will be more appropriate.
ipv6_addr_type(), with modifications included for reserved anycast
addresses, will still be useful for cases where the address is known to
*always* be an anycast (for example, disallowing reserved anycasts through
"ifconfig" being set as an ordinary address), but for the lower-level code,
it'll usually need a per-device check. So, I recommend we keep both, and use
ipv6_chk_acast_addr() to answer if it is a configured anycast address, use
ipv6_addr_type() to answer if the address is reserved for anycast (whether
configured or not).
That's what this code does.

5) Testing
I wrote programs to join and leave anycast groups and I checked through
the /proc/net interface (file "anycast6") the presence of the groups. I've
used network sniffers to watch the neighbor discovery sequence and verify the
override bit is cleared, and I've tested with multiple hosts in the anycast
group talking to an unmodified host that pings the anycast address. I also
verified that the existing code handles "override=0" correctly (it does).
In addition, our mobile IPv6 team has used the code to test the use of
anycasting for Dynamic Home Agent address discovery, with several different
topologies and configurations.
We've done tests with uniprocessor and SMP kernels on multiprocessor
machines.
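
When watching the neighbor discovery exchange with a sniffer, it helps to know
which multicast group the solicitations for a given anycast address are sent
to. The sketch below computes the standard solicited-node multicast address
(ff02::1:ffXX:XXXX, built from the low 24 bits of the target). The function
names are hypothetical and the code is independent of the patch.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Build ff02:0:0:0:0:1:ffXX:XXXX from the low 24 bits of addr. */
void solicited_node_mcast(const struct in6_addr *addr, struct in6_addr *out)
{
    assert(addr != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->s6_addr[0] = 0xff;
    out->s6_addr[1] = 0x02;
    out->s6_addr[11] = 0x01;
    out->s6_addr[12] = 0xff;
    memcpy(&out->s6_addr[13], &addr->s6_addr[13], 3);
}

/* Text-to-text wrapper; returns 0 on success, -1 on error. */
int solicited_node_text(const char *addr_text, char *buf, socklen_t len)
{
    struct in6_addr a, m;

    assert(addr_text != NULL && buf != NULL);
    if (inet_pton(AF_INET6, addr_text, &a) != 1)
        return -1;
    solicited_node_mcast(&a, &m);
    return inet_ntop(AF_INET6, &m, buf, len) != NULL ? 0 : -1;
}
```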

6) TODO
I think the next steps are to flesh out the UDP part so ordinary
user-level applications can make full use of anycasting.

+-DLS
(See attached file: anycast-2.5.31.patch)


Attachments:
anycast-2.5.31.patch (32.71 kB)

2002-08-28 22:17:50

by Christoph Hellwig

Subject: Re: [PATCH] anycast support for IPv6, linux-2.5.31

On Wed, Aug 28, 2002 at 03:44:57PM -0600, David Stevens wrote:
>
> Below is a patch relative to the mainline 2.5.31 code for an

I think it would make sense to Cc the netdev list at oss.sgi.com..

(and please inline the patch, makes it much easier to respond..)

diff -urN linux-2.5.31/net/ipv6/anycast.c linux-2.5.31AC/net/ipv6/anycast.c
--- linux-2.5.31/net/ipv6/anycast.c Wed Dec 31 16:00:00 1969
+++ linux-2.5.31AC/net/ipv6/anycast.c Wed Aug 21 14:24:41 2002
@@ -0,0 +1,508 @@
+/* $Header$ */
+
+/*
+ * Anycast support for IPv6
+ * Linux INET6 implementation
+ *
+ * Authors:
+ * David L Stevens ([email protected])
+ *
+ * $Id$
+ *
+ * based heavily on net/ipv6/mcast.c
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/* Changes:
+ *
+ */

Umm, in the kernel tree $Header$ and $Id$ will never be filled out.
Also empty changes comments are pretty useless.

+
+#define __NO_VERSION__

Not needed in 2.4/2.5

+#ifdef CONFIG_IPV6_MLD6_DEBUG
+#include <linux/inet.h>
+#endif

This patch doesn't reference CONFIG_IPV6_MLD6_DEBUG anywhere else..

+
+void ipv6_ac_init_dev(struct inet6_dev *idev)
+{
+}

I can't see this actually being used anywhere..

+#ifdef CONFIG_PROC_FS
+int anycast6_get_info(char *buffer, char **start, off_t offset, int length)
+{
+ off_t pos=0, begin=0;
+ struct ifacaddr6 *im;
+ int len=0;
+ struct net_device *dev;
+
+ read_lock(&dev_base_lock);
+ for (dev = dev_base; dev; dev = dev->next) {
+ struct inet6_dev *idev;
+
+ if ((idev = in6_dev_get(dev)) == NULL)
+ continue;
+
+ read_lock_bh(&idev->lock);
+ for (im = idev->ac_list; im; im = im->aca_next) {
+ int i;


This function would really benefit from use of the seq_file API..

2002-08-30 06:33:20

by Pekka Savola

Subject: Re: [PATCH] anycast support for IPv6, linux-2.5.31

On Wed, 28 Aug 2002, David Stevens wrote:
> 1) The API
> Although the RFC's liken anycasting to ordinary unicasting, I think
> it's more appropriate to tie it closely to particular applications, so I've
> chosen an API similar to multicasting. So, rather than having a permanent
> anycast address associated with the machine, particular applications
> that use anycasting can join or leave "anycast groups," and the machine will
> recognize the anycast addresses as its own when one or more applications have
> joined the group.
> So, for example, someone using anycasting for DNS high availability
> can add a join to the anycast group in the server and as long as the DNS server
> is running, the machine will answer to that anycast address. But the machine
> will not respond to anycasts when the service that's using it isn't available,
> so a broken server application that has exited won't deny that service if
> there are other working members of the anycast group on other hosts.
> I don't know if that's controversial or not-- the RFC's are written
> more from the external context, but seem to imply a model along the lines of
> using "ifconfig" to add anycast addresses. I think that model doesn't fit the
> best uses of anycasting, but I'd like to hear your thoughts on it.
> The application interface for joining and leaving anycast groups is 2
> new setsockopt() calls: IPV6_JOIN_ANYCAST and IPV6_LEAVE_ANYCAST. The arguments
> are the same as the corresponding multicast operations. The kernel keeps a
> reference count of members; when that goes to zero, the anycast address is not
> recognized as a local address. While nonzero, the host listens on the solicited
> node for that address, sends advertisements in response to solicitations (with
> override=0) and delivers packets sent to the anycast address to upper layers.
> There's also an in-kernel interface described below, which is used by
> IPv6 mobility, for example.

Before going too much down this path, I think one should write an Internet
Draft about the proposed API (should be quite short & simple) and see what
kind of response it has in the relevant working groups.

--
Pekka Savola "Tell me of difficulties surmounted,
Netcore Oy not those you stumble over and fall"
Systems. Networks. Security. -- Robert Jordan: A Crown of Swords


2002-08-30 08:12:43

by David Stevens

Subject: Re: [PATCH] anycast support for IPv6, linux-2.5.31


Pekka,

You wrote:

>Before going too much down this path, I think one should write an Internet
>Draft about the proposed API (should be quite short & simple) and see what
>kind of response it has in the relevant working groups.

I don't disagree with that, for informational purposes, but it doesn't
conflict with the RFC's, which of course don't cover API's, and don't
specify any interface for anycasting.

However, my primary goal is to get anycasting support with an in-kernel
interface in 2.5 before the freeze. :-) I used the setsockopt() API for
testing, and left it in the patch for others to do the same. Though I think
it's the right approach, for the reasons I mentioned, I'd rather see that
portion pulled from the patch if it's controversial, than have the in-kernel
interface and anycasting proper delayed over that.

The one use of anycast I'm aware of right now is for IPv6 mobility, which
needs the in-kernel interface. The user-level interface is important for
future applications, and a reference-counted setsockopt() interface doesn't
mean we can't also have an ip/ifconfig interface for permanent anycast
addresses, too (the required anycast addresses in this patch are permanent,
for example). So I don't see it as committing to one choice, but having
in-kernel anycast support (soon) I think is the more important first step.

+-DLS