2011-04-19 16:53:02

by Alexander Hoogerhuis

Subject: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")

I hope you (or anyone else) can spare half a minute to have a quick look
at a patch you wrote a few years ago:

> http://lkml.org/lkml/2007/6/8/124

I've been tracking down a case of ICMP redirects originating from the
wrong IPs, and as far as I can tell, your patch is the last to touch this
code (net/ipv4/icmp.c:507):

> if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>         dev = dev_get_by_index_rcu(net, rt->fl.iif);
>
> if (dev)
>         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> else
>         saddr = 0;

In the simple case this works, but I have come across a case that it
does not seem to handle.

I have two machines set up with VRRP to act as routers out of a subnet,
and they have IPs x.x.x.13/28 and x.x.x.14/28, with VRRP holding on to
x.x.x.1/28.

If a node in x.x.x.0/28 needs to get an ICMP redirect from x.x.x.1/28 (to
reach another subnet behind a different gateway in x.x.x.0/28), then
the source IP on the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived on.

As far as I can tell from the RFCs and from colleagues, this is fine for
most things once you have been routed one hop or more, but in the case
of an ICMP redirect it means that the client does not honor the
redirect, as it receives it from x.x.x.13/28, not x.x.x.1/28.

inet_select_addr seems to be explicitly looking for the primary IP in
all cases (./net/ipv4/devinet.c:875), and in the case of sending ICMP
redirects in a VRRP setup, that does not work well. It should try
to match the actual inbound IP.
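
To make the selection behaviour described above concrete, here is a
minimal user-space sketch in C. It is a simplified model, not the kernel
code: the struct, the MODEL_* names and model_select_addr() are all
hypothetical, the fallback to other devices is left out, and it assumes
the VRRP address was added as a secondary address because it shares the
subnet with the interface's own address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space model; every name here is hypothetical. */
#define MODEL_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Scope values mirror RT_SCOPE_UNIVERSE (0) and RT_SCOPE_LINK (253). */
enum { MODEL_SCOPE_UNIVERSE = 0, MODEL_SCOPE_LINK = 253 };

struct model_ifaddr {
	uint32_t local;     /* address, host byte order */
	int      secondary; /* non-zero for a secondary address */
	int      scope;     /* smaller value means wider scope */
};

/*
 * Return the first primary address whose scope is acceptable, or 0 if
 * there is none. Secondary addresses are never considered, which is
 * why a VRRP address configured as a secondary is never picked.
 */
static uint32_t model_select_addr(const struct model_ifaddr *addrs,
				  size_t n, int scope)
{
	assert(addrs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (addrs[i].secondary)
			continue;
		if (addrs[i].scope > scope)
			continue;
		return addrs[i].local;
	}
	return 0;
}
```

With an interface holding 192.0.2.13 as primary and 192.0.2.1 as a
secondary (documentation addresses standing in for x.x.x.13 and x.x.x.1),
the model returns 192.0.2.13, matching the source address seen on the
redirects.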

Judging by the comments from your patch I am not sure if the source IP
that triggers the ICMP redirect is available at this point any more.

The way I understand it, the address should be picked this way:

> if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>         dev = dev_get_by_index_rcu(net, rt->fl.iif);
>
> if (dev == fl.iif)
>         saddr = iph->daddr;
>
> if (dev != fl.iif)
>         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> else
>         saddr = 0;

I.e. if we are replying to something that is from a local network
segment, then iph->daddr would be a more correct source. My C skill is
prehistoric so what I've written likely is far from correct, but the
general gist is that there is a special case for replying to something
local.

As it stands today (I'm on 2.6.35.11), ICMP redirects when using VRRP
are broken, and I'm hoping I may have found out why. :)

mvh,
A
--
Alexander Hoogerhuis | http://no.linkedin.com/in/alexh
Boxed Solutions AS | +47 908 21 485 - [email protected]
"Given enough eyeballs, all bugs are shallow." -Eric S. Raymond


2011-04-19 16:54:23

by Chris Wright

Subject: Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")

* Alexander Hoogerhuis ([email protected]) wrote:
> I hope you (or anyone else) can spare half a minute to have a quick
> look at a patch you wrote a few years ago:
>
> >http://lkml.org/lkml/2007/6/8/124

I actually did not write that patch, rather added it to the -stable tree.
Patrick (CCd) wrote it.


2011-04-20 08:24:12

by Patrick McHardy

Subject: Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")

On 19.04.2011 18:54, Chris Wright wrote:
> * Alexander Hoogerhuis ([email protected]) wrote:
>> I hope you (or anyone else) can spare half a minute to have a quick
>> look at a patch you wrote a few years ago:
>>
>>> http://lkml.org/lkml/2007/6/8/124
>
> I actually did not write that patch, rather added it to the -stable tree.
> Patrick (CCd) wrote it.

I actually only fixed it; it was added in 1c2fb7f9 by J. Simonetti
<[email protected]>. Anyways ...

>> I've been tracking down a case of ICMP redirects originating from
>> the wrong IPs, and as far as I can tell, your patch is the last to
>> touch this code (net/ipv4/icmp.c:507):
>>
>>> if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>>>         dev = dev_get_by_index_rcu(net, rt->fl.iif);
>>>
>>> if (dev)
>>>         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>>> else
>>>         saddr = 0;
>>
>> In a plain world this would work, but I have come across a case that
>> seems to be not handled by this.
>>
>> I have two machines set up with VRRP to act as routers out of a
>> subnet, and they have IPs x.x.x.13/28 and x.x.x.14/28, with VRRP
>> holding on to x.x.x.1/28.
>>
>> If a node in x.x.x.0/28 needs to get an ICMP redirect from x.x.x.1/28
>> (to reach another subnet behind a different gateway in x.x.x.0/28),
>> then the source IP on the ICMP redirect is chosen as the primary IP
>> on the interface that the packet arrived at.
>>
>> This is as far as I can tell from RFCs and colleagues fine for most
>> things after you're routed one hop or more, but in the case of ICMP
>> redirect it means that the redirect is not adhered to by the client,
>> as it will get the redirect from x.x.x.13/28, not x.x.x.1/28.
>>
>> inet_select_addr seems to be explicitly looking for the primary IP
>> in all cases (./net/ipv4/devinet.c:875), and in the case of sending
>> ICMP redirects in a VRRP setup, that would not work well. It
>> should try to match the actual inbound IP.

From what I understand, it's explicitly meant to behave this way.
This is what the original commit stated:

The new behaviour (when the sysctl variable is toggled on), it will send
the message with the ip of the interface that received the packet that
caused the icmp error. This is the behaviour network administrators will
expect from a router. It makes debugging complicated network layouts
much easier. Also, all 'vendor routers' I know of have the later
behaviour.

>> Judging by the comments from your patch I am not sure if the source
>> IP that triggers the ICMP redirect is available at this point any
>> more.
>>
>> The way I understand it, the address should be picked this way:
>>
>>> if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>>>         dev = dev_get_by_index_rcu(net, rt->fl.iif);
>>>
>>> if (dev == fl.iif)
>>>         saddr = iph->daddr;
>>>
>>> if (dev != fl.iif)
>>>         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>>> else
>>>         saddr = 0;
>>
>> I.e. if we are replying to something that is from a local network
>> segment, then iph->daddr would be a more correct source. My C skill
>> is prehistoric so what I've written likely is far from correct, but
>> the general gist is that there is a special case for replying to
>> something local.

That might be a possibility to fix this for your case. But I'm
wondering why you're turning this on at all instead of letting routing
decide the correct source address?

2011-04-20 08:38:54

by Alexander Hoogerhuis

Subject: Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")

On 20.04.2011 10:24, Patrick McHardy wrote:
>
> That might be a possibility to fix this for your case. But I'm
> wondering why you're turning this on at all and not have routing
> decide the correct source address?

There is not a whole lot of tuning involved; I am trying to figure out
why this does not work the way any other VRRP implementation works on
other routers/OSes.

My case seems to be a general problem for ICMP errors, as the IP stack
tends to want to listen more to advice coming back with the source IP of
the gateway, not a third party.

If you have two machines (A and B) running VRRP and sharing an IP (C),
then any ICMP redirect should have the VRRP IP (C) as its source, and the
way it works today (with or without sysctl_icmp_errors_use_inbound_ifaddr)
is that the source is set to the primary IP of the source interface.

I suspect this holds for any other ICMP message sent back to hosts in
the connected network as well, such as PMTU-related issues, etc.

In my case nodes in the connected subnet would get ICMP redirects from
the primary IPs, and thus not listen to them, as they arrive from
nodes not listed in the list of known gateways.

It would make more sense if, when returning ICMP messages, the source IP
were the actual IP the packet was received on, not the primary IP of the
interface.

mvh,
A
--
Alexander Hoogerhuis | http://no.linkedin.com/in/alexh
Boxed Solutions AS | +47 908 21 485 - [email protected]
"Given enough eyeballs, all bugs are shallow." -Eric S. Raymond

2011-04-20 09:12:03

by Alexander Hoogerhuis

Subject: Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")

Also, in a quick followup on myself, this is ambiguous, as it does not
state that it will select the primary IP. I've

The second part is that RFC 5798 (VRRP v3) lists in section 8.1.1:

> The IPv4 source address of an ICMP redirect should be the address
> that the end-host used when making its next-hop routing decision.

I.e. the behaviour in Linux, with or without the sysctl flag set, would
run counter to this.
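
As an illustration of how a router could find "the address that the
end-host used", the same section of the RFC mentions examining the
destination MAC address of the packet that triggered the redirect. The
sketch below is a user-space model in C under that assumption: it relies
on the VRRP IPv4 virtual MAC format 00-00-5E-00-01-{VRID}, and the struct
and function names are hypothetical, not an existing kernel or VRRP
daemon API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: one virtual router this host is master for. */
struct model_vrouter {
	uint8_t  vrid; /* virtual router ID */
	uint32_t vip;  /* virtual IPv4 address, host byte order */
};

/* Return 1 and store the VRID if mac is a VRRP IPv4 virtual MAC. */
static int model_vrid_from_mac(const uint8_t mac[6], uint8_t *vrid)
{
	static const uint8_t prefix[5] = { 0x00, 0x00, 0x5e, 0x00, 0x01 };

	if (memcmp(mac, prefix, sizeof(prefix)) != 0)
		return 0;
	*vrid = mac[5];
	return 1;
}

/*
 * Pick the redirect source: the virtual IP of the virtual router the
 * triggering frame was addressed to, else the interface's primary IP.
 */
static uint32_t model_redirect_saddr(const uint8_t dst_mac[6],
				     const struct model_vrouter *vr,
				     size_t n, uint32_t primary)
{
	uint8_t vrid;

	assert(vr != NULL || n == 0);

	if (model_vrid_from_mac(dst_mac, &vrid)) {
		for (size_t i = 0; i < n; i++) {
			if (vr[i].vrid == vrid)
				return vr[i].vip;
		}
	}
	return primary;
}
```

This only helps when the virtual router actually answers on the virtual
MAC; if the shared IP is bound to the interface's real MAC, the
destination MAC carries no such hint and the model falls back to the
primary address.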

mvh,
A
--
Alexander Hoogerhuis | http://no.linkedin.com/in/alexh
Boxed Solutions AS | +47 908 21 485 - [email protected]
"Given enough eyeballs, all bugs are shallow." -Eric S. Raymond

2011-04-22 14:47:59

by Jan Ceuleers

Subject: Re: A patch you wrote some time ago (aka: "[patch 41/54] ICMP: Fix icmp_errors_use_inbound_ifaddr sysctl")

Excuse the top post; copying netdev where the right people will see this.

On 19/04/11 18:43, Alexander Hoogerhuis wrote:
> I hope you (or anyone else) can spare half a minute to have a quick look
> at a patch you wrote a few years ago:
>
>> http://lkml.org/lkml/2007/6/8/124
>
> I've been tracking down a case of ICMP redirects originating from the
> wrong IPs, and as far as I can tell, your patch is the last to touch
> this code (net/ipv4/icmp.c:507):
>
>> if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
>>         dev = dev_get_by_index_rcu(net, rt->fl.iif);
>>
>> if (dev)
>>         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
>> else
>>         saddr = 0;
>
> In a plain world this would work, but I have come across a case that
> seems to be not handled by this.
>
> I have two machines set up with VRRP to act as routers out of a subnet,
> and they have IPs x.x.x.13/28 and x.x.x.14/28, with VRRP holding on to
> x.x.x.1/28.
>
> If a node in x.x.x.0/28 needs to get an ICMP redirect from x.x.x.1/28 (to
> reach another subnet behind a different gateway in x.x.x.0/28), then the
> source IP on the ICMP redirect is chosen as the primary IP on the
> interface that the packet arrived at.
>
> This is as far as I can tell from RFCs and colleagues fine for most
> things after you're routed one hop or more, but in the case of ICMP
> redirect it means that the redirect is not adhered to by the client, as
> it will get the redirect from x.x.x.13/28, not x.x.x.1/28.
>
> inet_select_addr seems to be explicitly looking for the primary IP in
> all cases (./net/ipv4/devinet.c:875), and in the case of sending ICMP
> redirects in a VRRP setup, that would not work well. It should try
> to match the actual inbound IP.
>
> Judging by the comments from your patch I am not sure if the source IP
> that triggers the ICMP redirect is available at this point any more.
>
> The way I understand it, the address should be picked this way:
>
> > if (rt->fl.iif && net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr)
> >         dev = dev_get_by_index_rcu(net, rt->fl.iif);
> >
> > if (dev == fl.iif)
> >         saddr = iph->daddr;
> >
> > if (dev != fl.iif)
> >         saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
> > else
> >         saddr = 0;
>
> I.e. if we are replying to something that is from a local network
> segment, then iph->daddr would be a more correct source. My C skill is
> prehistoric so what I've written likely is far from correct, but the
> general gist is that there is a special case for replying to something
> local.
>
> As it stands today (I'm on 2.6.35.11), ICMP redirects when using VRRP
> are broken, and I'm hoping I may have found out why. :)
>
> mvh,
> A