Hi!
I am working on IP over InfiniBand net device support.
The existing code in the mainline kernel supports only UD (unreliable
datagram) mode of operation, with a maximum MTU of 2 Kbytes.
I'm looking into supporting UC (unreliable connected) mode of operation,
which can support an MTU with a theoretical limit of up to 2 Gbytes.
As was discussed on the openib list, one of the difficulties with
IP over IB support for UC mode is that the same device
has to send both UC (max MTU 2 Gbytes) and UD (max MTU 2 Kbytes)
packets, depending on the packet's link address.
I propose the following simple patch to let the netdevice override
the path MTU per dst entry. The patch was tested by modifying the
existing IPoIB code to use an MTU of 1K for some addresses and 2K for
others.
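To illustrate the kind of callback a driver might supply, here is a minimal
userspace sketch, not the actual IPoIB change. The struct definitions are
stand-ins for the kernel types, and link_mode, ud_mtu, uc_mtu and
example_get_mtu are hypothetical names invented for this example. Only the
get_mtu signature follows the patch below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
enum link_mode { LINK_UD, LINK_UC };	/* hypothetical per-neighbour mode */

struct neighbour {
	enum link_mode mode;
};

struct net_device {
	uint32_t ud_mtu;	/* hypothetical: limit for datagram peers */
	uint32_t uc_mtu;	/* hypothetical: limit for connected peers */
	uint32_t (*get_mtu)(struct net_device *dev,
			    struct neighbour *neigh,
			    int path_mtu);
};

/* Clamp the routing-layer path MTU to what the link to this neighbour allows. */
static uint32_t example_get_mtu(struct net_device *dev,
				struct neighbour *neigh,
				int path_mtu)
{
	uint32_t limit;

	assert(dev != NULL);
	/* With no resolved neighbour, fall back to the smaller UD limit. */
	limit = (neigh && neigh->mode == LINK_UC) ? dev->uc_mtu : dev->ud_mtu;
	return (uint32_t)path_mtu < limit ? (uint32_t)path_mtu : limit;
}
```

The callback only ever lowers the value it is handed, so a smaller path MTU
learned by the routing layer still wins.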
Please comment on this approach: does it make sense to you guys?
Please Cc me directly, I'm not on the list.
Thanks a bunch,
MST
---
Make it possible for a network device to support more than one MTU value at a
time (depending on packet link address, or other criteria).
Signed-off-by: Michael S. Tsirkin <[email protected]>
Index: linux-2.6.12.5/include/linux/netdevice.h
===================================================================
--- linux-2.6.12.5.orig/include/linux/netdevice.h
+++ linux-2.6.12.5/include/linux/netdevice.h
@@ -454,6 +454,10 @@ struct net_device
 #define HAVE_CHANGE_MTU
 	int			(*change_mtu)(struct net_device *dev, int new_mtu);
+#define HAVE_GET_MTU
+	u32			(*get_mtu)(struct net_device *dev,
+					   struct neighbour *neigh,
+					   int path_mtu);
 #define HAVE_TX_TIMEOUT
 	void			(*tx_timeout) (struct net_device *dev);
Index: linux-2.6.12.5/include/net/dst.h
===================================================================
--- linux-2.6.12.5.orig/include/net/dst.h
+++ linux-2.6.12.5/include/net/dst.h
@@ -111,7 +111,12 @@ dst_metric(const struct dst_entry *dst,
 static inline u32 dst_mtu(const struct dst_entry *dst)
 {
-	u32 mtu = dst_metric(dst, RTAX_MTU);
+	u32 mtu;
+	if (dst->dev && dst->dev->get_mtu)
+		mtu = dst->dev->get_mtu(dst->dev, dst->neighbour,
+					dst_metric(dst, RTAX_MTU));
+	else
+		mtu = dst_metric(dst, RTAX_MTU);
 	/*
 	 * Alexey put it here, so ask him about it :)
 	 */
--
MST
Michael S. Tsirkin <[email protected]> wrote:
>
> Please comment on this approach: does it make sense to you guys?
> Please Cc me directly, I'm not on the list.
Sorry, this doesn't make sense.
>  static inline u32 dst_mtu(const struct dst_entry *dst)
>  {
> -	u32 mtu = dst_metric(dst, RTAX_MTU);
> +	u32 mtu;
> +	if (dst->dev && dst->dev->get_mtu)
> +		mtu = dst->dev->get_mtu(dst->dev, dst->neighbour,
> +					dst_metric(dst, RTAX_MTU));
> +	else
> +		mtu = dst_metric(dst, RTAX_MTU);
From this I gather that for a given dst the MTU is actually constant.
That is, it only varies across different dst's.
In this case you should calculate the correct MTU when the dst is
created rather than here.
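As a rough illustration of that suggestion, here is a userspace sketch, not
kernel code. Every model_* name and the MODEL_UC_MTU value are assumptions
made up for the example. Only the 2048-byte UD figure comes from this thread.
The link limit is folded into the stored value once, at setup time, so the
accessor remains a plain read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; they only mimic the roles of neighbour and dst. */
struct model_neigh {
	int connected;			/* 1 = UC peer, 0 = UD peer */
};

struct model_dst {
	uint32_t mtu;			/* stands in for the RTAX_MTU metric */
	const struct model_neigh *neigh;
};

/* Hypothetical per-link limits; the UC value is arbitrary. */
enum { MODEL_UD_MTU = 2048, MODEL_UC_MTU = 65536 };

/* Done once, when the dst is set up: clamp the route MTU to the link limit. */
static void model_dst_init(struct model_dst *dst,
			   const struct model_neigh *neigh,
			   uint32_t route_mtu)
{
	uint32_t limit;

	assert(dst != NULL);
	limit = (neigh && neigh->connected) ? MODEL_UC_MTU : MODEL_UD_MTU;
	dst->neigh = neigh;
	dst->mtu = route_mtu < limit ? route_mtu : limit;
}

/* The per-packet accessor does no per-call work beyond reading the value. */
static inline uint32_t model_dst_mtu(const struct model_dst *dst)
{
	return dst->mtu;
}
```

Where the equivalent of model_dst_init() would live in the real routing code
is left open here.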
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt