Hi folks,
If I'm replying to a UDP packet, I generally want to use a source
address that's the same as the destination address of the packet to
which I'm replying. For example:
Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
Peer B replies with: src = 10.0.0.3, dst = 10.0.0.1
But let's complicate things. Let's say Peer B has multiple IPs on an
interface: 10.0.0.2, 10.0.0.3. The default route uses 10.0.0.2. In
this case what do you think should happen?
Case 1:
Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
Peer B replies with: src = 10.0.0.2, dst = 10.0.0.1
Case 2:
Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
Peer B replies with: src = 10.0.0.3, dst = 10.0.0.1
Intuition tells me the answer is "Case 2". If you agree, keep reading.
If you disagree, stop reading here, and instead correct my poor
intuition.
So, assuming "Case 2", when Peer B receives the first packet, he notes
that packet's destination address, so that he can use it as a source
address next. When replying, Peer B sets the stored source address and
calls the routing function:
struct flowi4 fl = {
.saddr = from_daddr_of_previous_packet,
.daddr = from_saddr_of_previous_packet,
};
rt = ip_route_output_flow(sock_net(sock), &fl, sock);
What if, however, by the time Peer B chooses to reply, his interface
no longer has that source address? No problem, because
ip_route_output_flow will return -EINVAL in that case. So, we can do
this:
struct flowi4 fl = {
.saddr = from_daddr_of_previous_packet,
.daddr = from_saddr_of_previous_packet,
};
rt = ip_route_output_flow(sock_net(sock), &fl, sock);
if (unlikely(IS_ERR(rt))) {
fl.saddr = 0;
rt = ip_route_output_flow(sock_net(sock), &fl, sock);
}
And then all is good in the neighborhood. This solution works. Done.
But what about IPv6? That's where we get into trouble:
struct flowi6 fl = {
.saddr = from_daddr_of_previous_packet,
.daddr = from_saddr_of_previous_packet,
};
ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
In this case, IPv6 returns a valid dst, when no interface has the
source address anymore! So, there's no way to know whether or not the
source address for replying has gone stale. We don't have a means of
falling back to inaddr_any for the source address.
Primary question: is this behavior a bug? Or is this some consequence
of a fundamental IPv6 difference with v4? Or is something else
happening here?
Thanks,
Jason
On 11/11/16 12:29 PM, Jason A. Donenfeld wrote:
> Hi folks,
>
> If I'm replying to a UDP packet, I generally want to use a source
> address that's the same as the destination address of the packet to
> which I'm replying. For example:
>
> Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
> Peer B replies with: src = 10.0.0.3, dst = 10.0.0.1
>
> But let's complicate things. Let's say Peer B has multiple IPs on an
> interface: 10.0.0.2, 10.0.0.3. The default route uses 10.0.0.2. In
> this case what do you think should happen?
>
> Case 1:
> Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
> Peer B replies with: src = 10.0.0.2, dst = 10.0.0.1
>
> Case 2:
> Peer A sends packet: src = 10.0.0.1, dst = 10.0.0.3
> Peer B replies with: src = 10.0.0.3, dst = 10.0.0.1
>
> Intuition tells me the answer is "Case 2". If you agree, keep reading.
> If you disagree, stop reading here, and instead correct my poor
> intuition.
>
> So, assuming "Case 2", when Peer B receives the first packet, he notes
> that packet's destination address, so that he can use it as a source
> address next. When replying, Peer B sets the stored source address and
> calls the routing function:
>
> struct flowi4 fl = {
> .saddr = from_daddr_of_previous_packet,
> .daddr = from_saddr_of_previous_packet,
> };
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
>
> What if, however, by the time Peer B chooses to reply, his interface
> no longer has that source address? No problem, because
> ip_route_output_flow will return -EINVAL in that case. So, we can do
> this:
>
> struct flowi4 fl = {
> .saddr = from_daddr_of_previous_packet,
> .daddr = from_saddr_of_previous_packet,
> };
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
> if (unlikely(IS_ERR(rt))) {
> fl.saddr = 0;
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
> }
>
> And then all is good in the neighborhood. This solution works. Done.
>
> But what about IPv6? That's where we get into trouble:
>
> struct flowi6 fl = {
> .saddr = from_daddr_of_previous_packet,
> .daddr = from_saddr_of_previous_packet,
> };
> ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
>
> In this case, IPv6 returns a valid dst, when no interface has the
> source address anymore! So, there's no way to know whether or not the
> source address for replying has gone stale. We don't have a means of
> falling back to inaddr_any for the source address.
What do you mean by 'valid dst'? ipv6 returns net->ipv6.ip6_null_entry on lookup failures so yes dst is non-NULL but that does not mean the lookup succeeded.
For example take a look at ip6_dst_lookup_tail():
if (!*dst)
*dst = ip6_route_output_flags(net, sk, fl6, flags);
err = (*dst)->error;
if (err)
goto out_err_release;
perhaps I should add dst->error to the fib tracepoints ...
>
> Primary question: is this behavior a bug? Or is this some consequence
> of a fundamental IPv6 difference with v4? Or is something else
> happening here?
>
> Thanks,
> Jason
>
Hi David,
On Fri, Nov 11, 2016 at 11:14 PM, David Ahern <[email protected]> wrote:
> What do you mean by 'valid dst'? ipv6 returns net->ipv6.ip6_null_entry on lookup failures so yes dst is non-NULL but that does not mean the lookup succeeded.
What I mean is that it returns an ordinary dst, as if that souce
address _hadn't_ been removed from the interface, even though I just
removed it. Is this buggy behavior? If so, let me know and I'll try to
track it down. The expected behavior, as far as I can see, would be
the same that ip_route_output_flow has -- returning -EINVAL when the
saddr isn't valid. At the moment, when the saddr is invalid,
ipv6_stub->ipv6_dst_lookup returns 0 and &dst contains a real entry.
Regards,
Jason
Hi again,
I've done some pretty in depth debugging now to determine exactly what
the behavior of ipv6_stub->ipv6_dst_lookup is. First I'll start with
ip_route_output_flow, which I believe to be well behaved, and then
I'll show ipv6_stub->ipv6_dst_lookup, which seems ill-behaved:
Userspace:
ip addr add 192.168.1.2/24 dev eth0
Kernelspace:
struct flowi4 fl = {
.saddr = 192.168.1.2,
.daddr = 192.168.1.99,
};
rt = ip_route_output_flow(sock_net(sock), &fl, sock);
// rt returns valid rt for routing to 192.168.1.99 from
192.168.1.2 using eth0
Userspace:
ip addr add 192.168.1.3/24 dev eth0
ip addr del 192.168.1.2/24 dev eth0
Kernelspace:
struct flowi4 fl = {
.saddr = 192.168.1.2,
.daddr = 192.168.1.99,
};
rt = ip_route_output_flow(sock_net(sock), &fl, sock);
// PTR_ERR(rt) == -EINVAL
This seems correct behavior to me, since no interface has 192.168.1.2
as a source address.
Now for the incorrect IPv6 behavior:
Userspace:
ip -6 addr add abcd::2/96 dev eth0
Kernelspace:
struct flowi6 fl = {
.saddr = abcd::2,
.daddr = abcd::99,
};
ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
// ret is 0, and dst is a non-null dst routing to abcd::99 from
abcd::2 using eth0
Userspace:
ip -6 addr add abcd::3/96 dev eth0
ip -6 addr del abcd::2/96 dev eth0
Kernelspace:
struct flowi6 fl = {
.saddr = abcd::2,
.daddr = abcd::99,
};
ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
// ret is 0, and dst is a non-null dst routing to abcd::99 from
abcd::2 using eth0 **INCORRECT BEHAVIOR!**
This seems *INCORRECT* behavior to me, since no interface has abcd::2
as a source address.
So, to summarize, the problem is that ipv6_dst_lookup will happily
return a dst even though the source IP has been removed from the
interface.
I hope this clarifies things. I await your response.
Regards,
Jason
On 11/12/16 8:40 AM, Jason A. Donenfeld wrote:
> Hi again,
>
> I've done some pretty in depth debugging now to determine exactly what
> the behavior of ipv6_stub->ipv6_dst_lookup is. First I'll start with
> ip_route_output_flow, which I believe to be well behaved, and then
> I'll show ipv6_stub->ipv6_dst_lookup, which seems ill-behaved:
>
> Userspace:
> ip addr add 192.168.1.2/24 dev eth0
> Kernelspace:
> struct flowi4 fl = {
> .saddr = 192.168.1.2,
> .daddr = 192.168.1.99,
> };
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
> // rt returns valid rt for routing to 192.168.1.99 from
> 192.168.1.2 using eth0
> Userspace:
> ip addr add 192.168.1.3/24 dev eth0
> ip addr del 192.168.1.2/24 dev eth0
> Kernelspace:
> struct flowi4 fl = {
> .saddr = 192.168.1.2,
> .daddr = 192.168.1.99,
> };
> rt = ip_route_output_flow(sock_net(sock), &fl, sock);
> // PTR_ERR(rt) == -EINVAL
I believe that is coming from __ip_route_output_key_hash(), line 2232 with __ip_dev_find not finding a device with that address.
Not applicable for your use case, but __ip_dev_find does not have any checks on which L3 domain the device belongs to so the check does not handle VRF for example. I'll take a look at fixing this next week.
>
> This seems correct behavior to me, since no interface has 192.168.1.2
> as a source address.
>
> Now for the incorrect IPv6 behavior:
>
> Userspace:
> ip -6 addr add abcd::2/96 dev eth0
> Kernelspace:
> struct flowi6 fl = {
> .saddr = abcd::2,
> .daddr = abcd::99,
> };
> ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
> // ret is 0, and dst is a non-null dst routing to abcd::99 from
> abcd::2 using eth0
> Userspace:
> ip -6 addr add abcd::3/96 dev eth0
> ip -6 addr del abcd::2/96 dev eth0
> Kernelspace:
> struct flowi6 fl = {
> .saddr = abcd::2,
> .daddr = abcd::99,
> };
> ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, &fl);
> // ret is 0, and dst is a non-null dst routing to abcd::99 from
> abcd::2 using eth0 **INCORRECT BEHAVIOR!**
>
> This seems *INCORRECT* behavior to me, since no interface has abcd::2
> as a source address.
Gotcha. I don't see any checks that the saddr is valid similar to what IPv4 does.
I think the right place to add a check is in ip6_dst_lookup_tail():
if (!ipv6_addr_any(&fl6->saddr)) {
// saddr is valid for L3 domain
}
Hi David,
On Sat, Nov 12, 2016 at 7:14 PM, David Ahern <[email protected]> wrote:
> I believe that is coming from __ip_route_output_key_hash(), line 2232 with __ip_dev_find not finding a device with that address.
It's possible we simply are looking at different source trees, but I
have the -EINVAL return in 4.8 route.c sources happening due to the
assignment on line 2175 and the jump on line 2220.
> Not applicable for your use case, but __ip_dev_find does not have any checks on which L3 domain the device belongs to so the check does not handle VRF for example. I'll take a look at fixing this next week.
Interesting.
>
> Gotcha. I don't see any checks that the saddr is valid similar to what IPv4 does.
>
> I think the right place to add a check is in ip6_dst_lookup_tail():
> if (!ipv6_addr_any(&fl6->saddr)) {
> // saddr is valid for L3 domain
> }
Right. It should probably do the check here, and return
ERR_PTR(-EINVAL), the same as the v4 version, so that ret codes can be
checked consistently.
Thanks for looking into this. If you're backed up and would like me to
submit a patch, just let me know, and I'll give it my best shot.
Regards,
Jason
On Sat, Nov 12, 2016 at 8:08 PM, Jason A. Donenfeld <[email protected]> wrote:
>> Gotcha. I don't see any checks that the saddr is valid similar to what IPv4 does.
>>
>> I think the right place to add a check is in ip6_dst_lookup_tail():
>> if (!ipv6_addr_any(&fl6->saddr)) {
>> // saddr is valid for L3 domain
>> }
>
> Right. It should probably do the check here, and return
> ERR_PTR(-EINVAL), the same as the v4 version, so that ret codes can be
> checked consistently.
>
> Thanks for looking into this. If you're backed up and would like me to
> submit a patch, just let me know, and I'll give it my best shot.
In perusing through the v6 FIB code, I don't even see an analog of
__ip_dev_find... Hm?
On Sun, Nov 13, 2016, at 01:43, Jason A. Donenfeld wrote:
> On Sat, Nov 12, 2016 at 8:08 PM, Jason A. Donenfeld <[email protected]>
> wrote:
> >> Gotcha. I don't see any checks that the saddr is valid similar to what IPv4 does.
> >>
> >> I think the right place to add a check is in ip6_dst_lookup_tail():
> >> if (!ipv6_addr_any(&fl6->saddr)) {
> >> // saddr is valid for L3 domain
> >> }
> >
> > Right. It should probably do the check here, and return
> > ERR_PTR(-EINVAL), the same as the v4 version, so that ret codes can be
> > checked consistently.
> >
> > Thanks for looking into this. If you're backed up and would like me to
> > submit a patch, just let me know, and I'll give it my best shot.
>
> In perusing through the v6 FIB code, I don't even see an analog of
> __ip_dev_find... Hm?
You probably need some combination of ipv6_chk_addr and/or
ipv6_check_addr_and_flags (where dev can also be NULL). Be careful if a
IFA_HOST or IFA_LINK address switches from one interface to another.
Bye,
Hannes
On Sun, Nov 13, 2016 at 1:43 AM, Jason A. Donenfeld <[email protected]> wrote:
> In perusing through the v6 FIB code, I don't even see an analog of
> __ip_dev_find... Hm?
Of all places, the iscsi code actually has a nice side-by-side
comparison. So far as I can see, the other protocols just omit this
check in the v6 case, which I believe to be errant behavior. For
example, grep for ip_dev_find in the sctp v4 code. The equivalent v6
code is missing the dev check. Ugly! Here's the block I found in
cxgbit_cm.c:
static struct net_device *cxgbit_ipv4_netdev(__be32 saddr)
{
struct net_device *ndev;
ndev = __ip_dev_find(&init_net, saddr, false);
if (!ndev)
return NULL;
return cxgbit_get_real_dev(ndev);
}
static struct net_device *cxgbit_ipv6_netdev(struct in6_addr *addr6)
{
struct net_device *ndev = NULL;
bool found = false;
if (IS_ENABLED(CONFIG_IPV6)) {
for_each_netdev_rcu(&init_net, ndev)
if (ipv6_chk_addr(&init_net, addr6, ndev, 1)) {
found = true;
break;
}
}
if (!found)
return NULL;
return cxgbit_get_real_dev(ndev);
}
It seems like __ip6_dev_find could be made out of that inner loop.
Then existing uses like that iscsi code can be replaced with that
helper function, and the existing ip6 route tail function can be
augmented in the manner you recommended. Seem like a decent
implementation strategy?
I might submit some patches, unless you beat me to it.
Jason
Hi Hannes,
On Sun, Nov 13, 2016 at 1:51 AM, Hannes Frederic Sowa
<[email protected]> wrote:
> You probably need some combination of ipv6_chk_addr and/or
> ipv6_check_addr_and_flags (where dev can also be NULL). Be careful if a
> IFA_HOST or IFA_LINK address switches from one interface to another.
I can confirm this trick works beautifully:
https://git.zx2c4.com/WireGuard/commit/?id=eb65810fc6350c50b42abedd1291b12337d3dc3d
I'll see if I can fold this into the routing function so that it
behaves the same as v4, unless David gets there first.
Regards,
Jason
This puts the IPv6 routing functions in parity with the IPv4 routing
functions. Namely, we now check in v6 that if a flowi6 requests an
saddr, the returned dst actually corresponds to a net device that has
that saddr. This mirrors the v4 logic with __ip_dev_find in
__ip_route_output_key_hash. In the event that the returned dst is not
for a dst with a dev that has the saddr, we return -EINVAL, just like
v4; this makes it easy to use the same error handlers for both cases.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: David Ahern <[email protected]>
---
net/ipv6/ip6_output.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..a834129 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1011,6 +1011,11 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
}
}
#endif
+ if (!ipv6_addr_any(&fl6->saddr) &&
+ !ipv6_chk_addr(net, &fl6->saddr, (*dst)->dev, 1)) {
+ err = -EINVAL;
+ goto out_err_release;
+ }
return 0;
--
2.10.2
On 11/13/16 6:23 AM, Jason A. Donenfeld wrote:
> This puts the IPv6 routing functions in parity with the IPv4 routing
> functions. Namely, we now check in v6 that if a flowi6 requests an
> saddr, the returned dst actually corresponds to a net device that has
> that saddr. This mirrors the v4 logic with __ip_dev_find in
> __ip_route_output_key_hash. In the event that the returned dst is not
> for a dst with a dev that has the saddr, we return -EINVAL, just like
> v4; this makes it easy to use the same error handlers for both cases.
>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> Cc: David Ahern <[email protected]>
> ---
> net/ipv6/ip6_output.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..a834129 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1011,6 +1011,11 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
> }
> }
> #endif
> + if (!ipv6_addr_any(&fl6->saddr) &&
> + !ipv6_chk_addr(net, &fl6->saddr, (*dst)->dev, 1)) {
> + err = -EINVAL;
> + goto out_err_release;
> + }
>
> return 0;
You can't require the address to be on the dst device. e.g., it can be an address from the loopback/vrf device.
This block needs to be done at function entry, and pass dev as NULL to mean is the address assigned to any interface. That gets you the equivalency of the IPv4 check.
This puts the IPv6 routing functions in parity with the IPv4 routing
functions. Namely, we now check in v6 that if a flowi6 requests an
saddr, the returned dst actually corresponds to a net device that has
that saddr. This mirrors the v4 logic with __ip_dev_find in
__ip_route_output_key_hash. In the event that the returned dst is not
for a dst with a dev that has the saddr, we return -EINVAL, just like
v4; this makes it easy to use the same error handlers for both cases.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: David Ahern <[email protected]>
---
Changes from v1:
This moves the check to the top and now sees if it's a valid address
on _any_ device, not just the one in dst.
include/net/ipv6.h | 2 ++
net/ipv6/ip6_output.c | 28 ++++++++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 8fed1cd..e5dc14f 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -914,6 +914,8 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
const struct in6_addr *final_dst);
struct dst_entry *ip6_blackhole_route(struct net *net,
struct dst_entry *orig_dst);
+struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr,
+ bool devref);
/*
* skb processing functions
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..371170b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -916,6 +916,30 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
return dst;
}
+/**
+ * __ip6_dev_find - find the first device with a given source address.
+ * @net: the net namespace
+ * @addr: the source address
+ * @devref: if true, take a reference on the found device
+ *
+ * If a caller uses devref=false, it should be protected by RCU, or RTNL
+ */
+struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr, bool devref)
+{
+ struct net_device *result;
+
+ rcu_read_lock();
+ for_each_netdev_rcu(net, result) {
+ if (ipv6_chk_addr(net, addr, result, 1))
+ break;
+ }
+ if (result && devref)
+ dev_hold(result);
+ rcu_read_unlock();
+ return result;
+}
+EXPORT_SYMBOL(__ip6_dev_find);
+
static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
struct dst_entry **dst, struct flowi6 *fl6)
{
@@ -926,6 +950,10 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
int err;
int flags = 0;
+ if (!ipv6_addr_any(&fl6->saddr) &&
+ !__ip6_dev_find(net, &fl6->saddr, false))
+ return -EINVAL;
+
/* The correct way to handle this would be to do
* ip6_route_get_saddr, and then ip6_route_output; however,
* the route-specific preferred source forces the
--
2.10.2
Hi David,
On Sun, Nov 13, 2016 at 5:30 PM, David Ahern <[email protected]> wrote:
> You can't require the address to be on the dst device. e.g., it can be an address from the loopback/vrf device.
>
> This block needs to be done at function entry, and pass dev as NULL to mean is the address assigned to any interface. That gets you the equivalency of the IPv4 check.
I gave v2 my best shot. Hopefully it's adequate, but I have a feeling
it might be best for you to just code up what you have in mind.
Regards,
Jason
On 11/13/16 1:19 PM, Jason A. Donenfeld wrote:
> I gave v2 my best shot. Hopefully it's adequate, but I have a feeling
> it might be best for you to just code up what you have in mind.
nah, you are doing fine. one more comment on v2.
On 11/13/16 12:02 PM, Jason A. Donenfeld wrote:
> This puts the IPv6 routing functions in parity with the IPv4 routing
> functions. Namely, we now check in v6 that if a flowi6 requests an
> saddr, the returned dst actually corresponds to a net device that has
> that saddr. This mirrors the v4 logic with __ip_dev_find in
> __ip_route_output_key_hash. In the event that the returned dst is not
> for a dst with a dev that has the saddr, we return -EINVAL, just like
> v4; this makes it easy to use the same error handlers for both cases.
>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> Cc: David Ahern <[email protected]>
> ---
> Changes from v1:
> This moves the check to the top and now sees if it's a valid address
> on _any_ device, not just the one in dst.
>
> include/net/ipv6.h | 2 ++
> net/ipv6/ip6_output.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 30 insertions(+)
>
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 8fed1cd..e5dc14f 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -914,6 +914,8 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
> const struct in6_addr *final_dst);
> struct dst_entry *ip6_blackhole_route(struct net *net,
> struct dst_entry *orig_dst);
> +struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr,
> + bool devref);
>
> /*
> * skb processing functions
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..371170b 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -916,6 +916,30 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
> return dst;
> }
>
> +/**
> + * __ip6_dev_find - find the first device with a given source address.
> + * @net: the net namespace
> + * @addr: the source address
> + * @devref: if true, take a reference on the found device
> + *
> + * If a caller uses devref=false, it should be protected by RCU, or RTNL
> + */
> +struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr, bool devref)
> +{
> + struct net_device *result;
> +
> + rcu_read_lock();
> + for_each_netdev_rcu(net, result) {
> + if (ipv6_chk_addr(net, addr, result, 1))
> + break;
> + }
> + if (result && devref)
> + dev_hold(result);
> + rcu_read_unlock();
> + return result;
> +}
> +EXPORT_SYMBOL(__ip6_dev_find);
You don't need a new function to walk all interfaces; just use ipv6_chk_addr with a dev arg of NULL. IPv6 has a hash table with all unicast addresses -- inet6_addr_lst. ipv6_chk_addr is checking that list for the address in question. The actual device is not relevant for verifying the address is a valid local one (though the device can be returned from ifp->idev->dev if ever needed).
So drop the above ...
> +
> static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
> struct dst_entry **dst, struct flowi6 *fl6)
> {
> @@ -926,6 +950,10 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
> int err;
> int flags = 0;
>
> + if (!ipv6_addr_any(&fl6->saddr) &&
> + !__ip6_dev_find(net, &fl6->saddr, false))
... and just use ipv6_chk_addr here.
> + return -EINVAL;
> +
> /* The correct way to handle this would be to do
> * ip6_route_get_saddr, and then ip6_route_output; however,
> * the route-specific preferred source forces the
>
This puts the IPv6 routing functions in parity with the IPv4 routing
functions. Namely, we now check in v6 that if a flowi6 requests an
saddr, the returned dst actually corresponds to a net device that has
that saddr. This mirrors the v4 logic with __ip_dev_find in
__ip_route_output_key_hash. In the event that the returned dst is not
for a dst with a dev that has the saddr, we return -EINVAL, just like
v4; this makes it easy to use the same error handlers for both cases.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: David Ahern <[email protected]>
---
Changes from v2:
It turns out ipv6_chk_addr already has the device enumeration
logic that we need by simply passing NULL.
net/ipv6/ip6_output.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..b3b5cb6 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
int err;
int flags = 0;
+ if (!ipv6_addr_any(&fl6->saddr) &&
+ !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
+ return -EINVAL;
+
/* The correct way to handle this would be to do
* ip6_route_get_saddr, and then ip6_route_output; however,
* the route-specific preferred source forces the
--
2.10.2
On 11/13/16 4:28 PM, Jason A. Donenfeld wrote:
> This puts the IPv6 routing functions in parity with the IPv4 routing
> functions. Namely, we now check in v6 that if a flowi6 requests an
> saddr, the returned dst actually corresponds to a net device that has
> that saddr. This mirrors the v4 logic with __ip_dev_find in
> __ip_route_output_key_hash. In the event that the returned dst is not
> for a dst with a dev that has the saddr, we return -EINVAL, just like
> v4; this makes it easy to use the same error handlers for both cases.
>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> Cc: David Ahern <[email protected]>
> ---
> Changes from v2:
> It turns out ipv6_chk_addr already has the device enumeration
> logic that we need by simply passing NULL.
>
> net/ipv6/ip6_output.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..b3b5cb6 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
> int err;
> int flags = 0;
>
> + if (!ipv6_addr_any(&fl6->saddr) &&
> + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
> + return -EINVAL;
> +
> /* The correct way to handle this would be to do
> * ip6_route_get_saddr, and then ip6_route_output; however,
> * the route-specific preferred source forces the
>
LGTM
Acked-by: David Ahern <[email protected]>
On Mon, Nov 14, 2016, at 00:28, Jason A. Donenfeld wrote:
> This puts the IPv6 routing functions in parity with the IPv4 routing
> functions. Namely, we now check in v6 that if a flowi6 requests an
> saddr, the returned dst actually corresponds to a net device that has
> that saddr. This mirrors the v4 logic with __ip_dev_find in
> __ip_route_output_key_hash. In the event that the returned dst is not
> for a dst with a dev that has the saddr, we return -EINVAL, just like
> v4; this makes it easy to use the same error handlers for both cases.
>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> Cc: David Ahern <[email protected]>
> ---
> Changes from v2:
> It turns out ipv6_chk_addr already has the device enumeration
> logic that we need by simply passing NULL.
>
> net/ipv6/ip6_output.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..b3b5cb6 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net,
> const struct sock *sk,
> int err;
> int flags = 0;
>
> + if (!ipv6_addr_any(&fl6->saddr) &&
> + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
> + return -EINVAL;
Hmm, this check is too permissive, no?
E.g. what happens if you move a link local address from one interface to
another? In this case this code would still allow the saddr to be used.
I just also quickly read up on the history (sorry was travelling last
week) and wonder if you ever saw a user space facing bug or if this is
basically some difference you saw while writing out of tree code?
Thanks,
Hannes
On 11/14/16 9:44 AM, Hannes Frederic Sowa wrote:
> On Mon, Nov 14, 2016, at 00:28, Jason A. Donenfeld wrote:
>> This puts the IPv6 routing functions in parity with the IPv4 routing
>> functions. Namely, we now check in v6 that if a flowi6 requests an
>> saddr, the returned dst actually corresponds to a net device that has
>> that saddr. This mirrors the v4 logic with __ip_dev_find in
>> __ip_route_output_key_hash. In the event that the returned dst is not
>> for a dst with a dev that has the saddr, we return -EINVAL, just like
>> v4; this makes it easy to use the same error handlers for both cases.
>>
>> Signed-off-by: Jason A. Donenfeld <[email protected]>
>> Cc: David Ahern <[email protected]>
>> ---
>> Changes from v2:
>> It turns out ipv6_chk_addr already has the device enumeration
>> logic that we need by simply passing NULL.
>>
>> net/ipv6/ip6_output.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 6001e78..b3b5cb6 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net,
>> const struct sock *sk,
>> int err;
>> int flags = 0;
>>
>> + if (!ipv6_addr_any(&fl6->saddr) &&
>> + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
>> + return -EINVAL;
>
> Hmm, this check is too permissive, no?
>
> E.g. what happens if you move a link local address from one interface to
> another? In this case this code would still allow the saddr to be used.
This check -- like the ipv4 variant -- only verifies the saddr is locally assigned. If the address moves interfaces it should be fine.
>
> I just also quickly read up on the history (sorry was travelling last
> week) and wonder if you ever saw a user space facing bug or if this is
> basically some difference you saw while writing out of tree code?
I checked the userspace API this morning. bind and cmsg for example check that the address is valid with calls to ipv6_chk_addr.
On 14.11.2016 17:55, David Ahern wrote:
> On 11/14/16 9:44 AM, Hannes Frederic Sowa wrote:
>> On Mon, Nov 14, 2016, at 00:28, Jason A. Donenfeld wrote:
>>> This puts the IPv6 routing functions in parity with the IPv4 routing
>>> functions. Namely, we now check in v6 that if a flowi6 requests an
>>> saddr, the returned dst actually corresponds to a net device that has
>>> that saddr. This mirrors the v4 logic with __ip_dev_find in
>>> __ip_route_output_key_hash. In the event that the returned dst is not
>>> for a dst with a dev that has the saddr, we return -EINVAL, just like
>>> v4; this makes it easy to use the same error handlers for both cases.
>>>
>>> Signed-off-by: Jason A. Donenfeld <[email protected]>
>>> Cc: David Ahern <[email protected]>
>>> ---
>>> Changes from v2:
>>> It turns out ipv6_chk_addr already has the device enumeration
>>> logic that we need by simply passing NULL.
>>>
>>> net/ipv6/ip6_output.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>>> index 6001e78..b3b5cb6 100644
>>> --- a/net/ipv6/ip6_output.c
>>> +++ b/net/ipv6/ip6_output.c
>>> @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net,
>>> const struct sock *sk,
>>> int err;
>>> int flags = 0;
>>>
>>> + if (!ipv6_addr_any(&fl6->saddr) &&
>>> + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
>>> + return -EINVAL;
>>
>> Hmm, this check is too permissive, no?
>>
>> E.g. what happens if you move a link local address from one interface to
>> another? In this case this code would still allow the saddr to be used.
>
> This check -- like the ipv4 variant -- only verifies the saddr is locally assigned. If the address moves interfaces it should be fine.
But in this case we should actually bail out, no?
Let's say, user assumes we are on ifindex eth0 with LL address from
eth0. Suddenly the LL address from eth0 is moved to eth1, we can't
accept this source address anymore and need to return -EINVAL, too.
>> I just also quickly read up on the history (sorry was travelling last
>> week) and wonder if you ever saw a user space facing bug or if this is
>> basically some difference you saw while writing out of tree code?
>
> I checked the userspace API this morning. bind and cmsg for example check that the address is valid with calls to ipv6_chk_addr.
Hmm, so it fixes no real bug.
Because of translations of flowi6_oif we actually can't do a correct
check of source address for cases like the one I outlined above? Hmm,
maybe we should simply depend on user space checks.
Bye,
Hannes
On 11/14/16 10:04 AM, Hannes Frederic Sowa wrote:
> On 14.11.2016 17:55, David Ahern wrote:
>> On 11/14/16 9:44 AM, Hannes Frederic Sowa wrote:
>>> On Mon, Nov 14, 2016, at 00:28, Jason A. Donenfeld wrote:
>>>> This puts the IPv6 routing functions in parity with the IPv4 routing
>>>> functions. Namely, we now check in v6 that if a flowi6 requests an
>>>> saddr, the returned dst actually corresponds to a net device that has
>>>> that saddr. This mirrors the v4 logic with __ip_dev_find in
>>>> __ip_route_output_key_hash. In the event that the returned dst is not
>>>> for a dst with a dev that has the saddr, we return -EINVAL, just like
>>>> v4; this makes it easy to use the same error handlers for both cases.
>>>>
>>>> Signed-off-by: Jason A. Donenfeld <[email protected]>
>>>> Cc: David Ahern <[email protected]>
>>>> ---
>>>> Changes from v2:
>>>> It turns out ipv6_chk_addr already has the device enumeration
>>>> logic that we need by simply passing NULL.
>>>>
>>>> net/ipv6/ip6_output.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>>>> index 6001e78..b3b5cb6 100644
>>>> --- a/net/ipv6/ip6_output.c
>>>> +++ b/net/ipv6/ip6_output.c
>>>> @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net,
>>>> const struct sock *sk,
>>>> int err;
>>>> int flags = 0;
>>>>
>>>> + if (!ipv6_addr_any(&fl6->saddr) &&
>>>> + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
>>>> + return -EINVAL;
>>>
>>> Hmm, this check is too permissive, no?
>>>
>>> E.g. what happens if you move a link local address from one interface to
>>> another? In this case this code would still allow the saddr to be used.
>>
>> This check -- like the ipv4 variant -- only verifies the saddr is locally assigned. If the address moves interfaces it should be fine.
>
> But in this case we should actually bail out, no?
>
> Let's say, user assumes we are on ifindex eth0 with LL address from
> eth0. Suddenly the LL address from eth0 is moved to eth1, we can't
> accept this source address anymore and need to return -EINVAL, too.
so you mean if rt6_need_strict(&fl6->saddr) then the dev needs to be considered.
>
>>> I just also quickly read up on the history (sorry was travelling last
>>> week) and wonder if you ever saw a user space facing bug or if this is
>>> basically some difference you saw while writing out of tree code?
>>
>> I checked the userspace API this morning. bind and cmsg for example check that the address is valid with calls to ipv6_chk_addr.
>
> Hmm, so it fixes no real bug.
>
> Because of translations of flowi6_oif we actually can't do a correct
> check of source address for cases like the one I outlined above? Hmm,
> maybe we should simply depend on user space checks.
I believe Jason's case is forwarding path and the ipv6_stub->ipv6_dst_lookup API.
On 14.11.2016 18:17, David Ahern wrote:
> On 11/14/16 10:04 AM, Hannes Frederic Sowa wrote:
>> On 14.11.2016 17:55, David Ahern wrote:
>>> On 11/14/16 9:44 AM, Hannes Frederic Sowa wrote:
>>>> On Mon, Nov 14, 2016, at 00:28, Jason A. Donenfeld wrote:
>>>>> This puts the IPv6 routing functions in parity with the IPv4 routing
>>>>> functions. Namely, we now check in v6 that if a flowi6 requests an
>>>>> saddr, the returned dst actually corresponds to a net device that has
>>>>> that saddr. This mirrors the v4 logic with __ip_dev_find in
>>>>> __ip_route_output_key_hash. In the event that the returned dst is not
>>>>> for a dst with a dev that has the saddr, we return -EINVAL, just like
>>>>> v4; this makes it easy to use the same error handlers for both cases.
>>>>>
>>>>> Signed-off-by: Jason A. Donenfeld <[email protected]>
>>>>> Cc: David Ahern <[email protected]>
>>>>> ---
>>>>> Changes from v2:
>>>>> It turns out ipv6_chk_addr already has the device enumeration
>>>>> logic that we need by simply passing NULL.
>>>>>
>>>>> net/ipv6/ip6_output.c | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>>>>> index 6001e78..b3b5cb6 100644
>>>>> --- a/net/ipv6/ip6_output.c
>>>>> +++ b/net/ipv6/ip6_output.c
>>>>> @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net,
>>>>> const struct sock *sk,
>>>>> int err;
>>>>> int flags = 0;
>>>>>
>>>>> + if (!ipv6_addr_any(&fl6->saddr) &&
>>>>> + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
>>>>> + return -EINVAL;
>>>>
>>>> Hmm, this check is too permissive, no?
>>>>
>>>> E.g. what happens if you move a link local address from one interface to
>>>> another? In this case this code would still allow the saddr to be used.
>>>
>>> This check -- like the ipv4 variant -- only verifies the saddr is locally assigned. If the address moves interfaces it should be fine.
>>
>> But in this case we should actually bail out, no?
>>
>> Let's say, user assumes we are on ifindex eth0 with LL address from
>> eth0. Suddenly the LL address from eth0 is moved to eth1, we can't
>> accept this source address anymore and need to return -EINVAL, too.
>
> so you mean if rt6_need_strict(&fl6->saddr) then the dev needs to be considered.
Exactly, like we do in the user space facing APIs.
>>>> I just also quickly read up on the history (sorry was travelling last
>>>> week) and wonder if you ever saw a user space facing bug or if this is
>>>> basically some difference you saw while writing out of tree code?
>>>
>>> I checked the userspace API this morning. bind and cmsg for example check that the address is valid with calls to ipv6_chk_addr.
>>
>> Hmm, so it fixes no real bug.
>>
>> Because of translations of flowi6_oif we actually can't do a correct
>> check of source address for cases like the one I outlined above? Hmm,
>> maybe we should simply depend on user space checks.
>
> I believe Jason's case is forwarding path and the ipv6_stub->ipv6_dst_lookup API.
It is not a kernel API, because we don't support something like that for
external kernel modules. We basically exported ipv6_dst_lookup to allow
some IPv4 code to do ipv6 stunts when the IPv6 module is loaded. ;)
Bye,
Hannes
On 11/14/16 10:33 AM, Hannes Frederic Sowa wrote:
>>>>> I just also quickly read up on the history (sorry was travelling last
>>>>> week) and wonder if you ever saw a user space facing bug or if this is
>>>>> basically some difference you saw while writing out of tree code?
>>>>
>>>> I checked the userspace API this morning. bind and cmsg for example check that the address is valid with calls to ipv6_chk_addr.
>>>
>>> Hmm, so it fixes no real bug.
>>>
>>> Because of translations of flowi6_oif we actually can't do a correct
>>> check of source address for cases like the one I outlined above? Hmm,
>>> maybe we should simply depend on user space checks.
>>
>> I believe Jason's case is forwarding path and the ipv6_stub->ipv6_dst_lookup API.
>
> It is not a kernel API, because we don't support something like that for
> external kernel modules. We basically exported ipv6_dst_lookup to allow
> some IPv4 code to do ipv6 stunts when the IPv6 module is loaded. ;)
???
ipv6_stub is exported for modules (EXPORT_SYMBOL_GPL(ipv6_stub)).
ipv6_stub->ipv6_dst_lookup is used by several modules -- geneve, tipc, vxlan, mpls -- for IPv6 lookups, not IPv4 code do IPv6 stunts.
So how do you say that is not an exported kernel API?
On Mon, Nov 14, 2016, at 18:48, David Ahern wrote:
> On 11/14/16 10:33 AM, Hannes Frederic Sowa wrote:
> >>>>> I just also quickly read up on the history (sorry was travelling last
> >>>>> week) and wonder if you ever saw a user space facing bug or if this is
> >>>>> basically some difference you saw while writing out of tree code?
> >>>>
> >>>> I checked the userspace API this morning. bind and cmsg for example check that the address is valid with calls to ipv6_chk_addr.
> >>>
> >>> Hmm, so it fixes no real bug.
> >>>
> >>> Because of translations of flowi6_oif we actually can't do a correct
> >>> check of source address for cases like the one I outlined above? Hmm,
> >>> maybe we should simply depend on user space checks.
> >>
> >> I believe Jason's case is forwarding path and the ipv6_stub->ipv6_dst_lookup API.
> >
> > It is not a kernel API, because we don't support something like that for
> > external kernel modules. We basically exported ipv6_dst_lookup to allow
> > some IPv4 code to do ipv6 stunts when the IPv6 module is loaded. ;)
>
> ???
>
> ipv6_stub is exported for modules (EXPORT_SYMBOL_GPL(ipv6_stub)).
>
> ipv6_stub->ipv6_dst_lookup is used by several modules -- geneve, tipc,
> vxlan, mpls -- for IPv6 lookups, not IPv4 code do IPv6 stunts.
>
> So how do you say that is not an exported kernel API?
Sorry, yes, I noticed I wrote it in a confusing way.
I meant to say, we don't require the IPv6 "API" to behave in a similar
way like the IPv4 one. We do this function pointer trick to allow
_in-kernel_ tree modules to use the function dynamically, even the
kernel ipv6 module would be available but is not loaded but don't
guarante any "API like IPv4" to outside tree modules.
I tried to make the point, that it is still something internal to the
kernel if compared to out-of-tree function users. And that different
behavior by itself doesn't count as a bug.
We could as well require the users of this function to check for the
source address before or require to check the source address after the
ipv6_dst_lookup call.
vxlan currently seems wrong and would impacted by this patch in a better
way, so I am all in for such a change, but I think we need to check if
we are also correct scope-wise and not just match for the address on its
own.
Thanks,
Hannes
Hey Hannes, David,
On Mon, Nov 14, 2016 at 7:33 PM, Hannes Frederic Sowa
<[email protected]> wrote:
> I meant to say, we don't require the IPv6 "API" to behave in a similar
> way like the IPv4 one. We do this function pointer trick to allow
> _in-kernel_ tree modules to use the function dynamically, even the
> kernel ipv6 module would be available but is not loaded but don't
> guarante any "API like IPv4" to outside tree modules.
Ultimately I intend to submit my own use case for mainline inclusion.
I have no intention of maintaining my work as exclusively out of tree
and would be very pleased to see this well integrated. Hopefully I'll
meet some of you in person at upcoming conferences and events for
discussion about this.
Were I only concerned about out of tree status, I'd have no hesitation
about just including these particular checks in my own code and
leaving the current in tree code to rot. However, with my final
intentions in mind, it's certainly in my interest to "do things
right", hence this thread. When I noticed that the ipv6_stub routing
function was insufficient for my own project, I examined its usage in
a few other places. And indeed vxlan and geneve seem to also benefit
from this patch. I'd be happy to audit the small handful of other
cases in the kernel that use this API; I suspect they'll also be
improved in a positive way.
> I tried to make the point, that it is still something internal to the
> kernel if compared to out-of-tree function users. And that different
> behavior by itself doesn't count as a bug.
To the extent that other in-tree code uses this function and doesn't
get the saddr check, it seems like a bug. To the extent that this
function becomes more correct, it seems like an improvement. But
whether a bug or a mere improvement, it seems like a net positive.
> We could as well require the users of this function to check for the
> source address before or require to check the source address after the
> ipv6_dst_lookup call.
As said above, I have no qualms about doing this check in my own code.
I was thinking, though, that a handful of other places in the kernel
that use this function would benefit from adding that check too. In
this case, it's usually common practice to move the check into the
shared function, rather than require a flurry of copy-and-paste.
> vxlan currently seems wrong and would impacted by this patch in a better
> way, so I am all in for such a change, but I think we need to check if
> we are also correct scope-wise and not just match for the address on its
> own.
I'll have a better look at this. Perhaps this should be modeled on
what we currently do for userspace, which might amount to something
more or less like:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..0721915 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -925,6 +925,7 @@ static int ip6_dst_lookup_tail(struct net *net,
const struct sock *sk,
#endif
int err;
int flags = 0;
+ int addr_type, bind_to_dev;
/* The correct way to handle this would be to do
* ip6_route_get_saddr, and then ip6_route_output; however,
@@ -1012,6 +1013,16 @@ static int ip6_dst_lookup_tail(struct net *net,
const struct sock *sk,
}
#endif
+ addr_type = ipv6_addr_type(&fl6->saddr);
+ if (addr_type == IPv6_ADDR_ANY)
+ return 0;
+
+ err = -EINVAL;
+ bind_to_dev = __ipv6_addr_src_scope(addr_type) <=
IPV6_ADDR_SCOPE_LINKLOCAL;
+ if (!ipv6_chk_addr(net, &fl6->saddr, bind_to_dev ?
(*dst)->dev : NULL, 0) &&
+ !ipv6_chk_acast_addr_src(net, (*dst)->dev, &fl6->saddr))
+ goto out_err_release;
+
return 0;
out_err_release:
Hey Jason,
On 15.11.2016 01:45, Jason A. Donenfeld wrote:
> I'll have a better look at this. Perhaps this should be modeled on
> what we currently do for userspace, which might amount to something
> more or less like:
Cool, thanks!
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..0721915 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -925,6 +925,7 @@ static int ip6_dst_lookup_tail(struct net *net,
> const struct sock *sk,
> #endif
> int err;
> int flags = 0;
> + int addr_type, bind_to_dev;
>
> /* The correct way to handle this would be to do
> * ip6_route_get_saddr, and then ip6_route_output; however,
> @@ -1012,6 +1013,16 @@ static int ip6_dst_lookup_tail(struct net *net,
> const struct sock *sk,
> }
> #endif
>
> + addr_type = ipv6_addr_type(&fl6->saddr);
> + if (addr_type == IPv6_ADDR_ANY)
> + return 0;
> +
> + err = -EINVAL;
> + bind_to_dev = __ipv6_addr_src_scope(addr_type) <=
> IPV6_ADDR_SCOPE_LINKLOCAL;
> + if (!ipv6_chk_addr(net, &fl6->saddr, bind_to_dev ?
> (*dst)->dev : NULL, 0) &&
> + !ipv6_chk_acast_addr_src(net, (*dst)->dev, &fl6->saddr))
> + goto out_err_release;
> +
> return 0;
>
> out_err_release:
>
We should not use (*dst)->dev, as this is the resulting device after the
lookup and not necessarily corresponds to the device the user asked for.
Thus you need to pass in fl6.flowi6_oif. Thus to kill the necessary
ifindex->net_device lookup, I would suggest to move
ipv6_chk_addr_and_flags to use ifindex instead of net_device (0
corresponds to the net_device == NULL case). It seems to me this would
make the code easier. ipv6_chk_addr can simply pass down dev->ifindex to
ipv6_chk_addr.
Probably for checking anycast address you need to look up the
net_device, thus use dev_get_by_index_rcu. But probably the unicast
filter will already hit thus the whole traversing of anycast addresses
won't happen in normal cases. This could be separated to its own function.
In the non-strict case we don't necessarily need bind_to_dev?
Bye,
Hannes
On 11/15/16 7:45 AM, Hannes Frederic Sowa wrote:
>> @@ -1012,6 +1013,16 @@ static int ip6_dst_lookup_tail(struct net *net,
>> const struct sock *sk,
>> }
>> #endif
>>
>> + addr_type = ipv6_addr_type(&fl6->saddr);
>> + if (addr_type == IPv6_ADDR_ANY)
>> + return 0;
>> +
>> + err = -EINVAL;
>> + bind_to_dev = __ipv6_addr_src_scope(addr_type) <=
>> IPV6_ADDR_SCOPE_LINKLOCAL;
>> + if (!ipv6_chk_addr(net, &fl6->saddr, bind_to_dev ?
>> (*dst)->dev : NULL, 0) &&
>> + !ipv6_chk_acast_addr_src(net, (*dst)->dev, &fl6->saddr))
>> + goto out_err_release;
>> +
>> return 0;
>>
>> out_err_release:
>>
>
> We should not use (*dst)->dev, as this is the resulting device after the
> lookup and not necessarily corresponds to the device the user asked for.
To be consistent with IPv4 the saddr check is done before the lookup and dst and flow oif should not be used. Handling LL addresses are trickier and perhaps this is not the right place to enforce that check since it requires a specific device which is only really known after lookup. Why not add the if saddr is LL verification as part of the route selection? e.g, add something like rt6_device_match to ip6_pol_route (the device match call is only used for ip6_pol_route_lookup and not ip6_pol_route - why is that?).