Hi,
This is a resend of this patch with more details. I'd
like it can be accepted this time.
The problem: In icmp_reply and ip_send_reply function,
we now use rt->rt_src as destination to send out packets.
For packets received in loopback device, this is wrong
sometimes.
Here is an example (NOTE: eth0 address is set to 10.10.10.1/24):
#tcpdump -n -i lo icmp &
[1] 3155
...
# hping3 --icmp --spoof 10.10.10.3 10.10.10.1
...
09:53:49.508449 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 0
09:53:49.508482 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 0
09:53:50.525560 IP 10.10.10.3 > 10.10.10.1: icmp 8: echo request seq 256
09:53:50.525589 IP 10.10.10.1 > 10.10.10.1: icmp 8: echo reply seq 256
The same thing will happend for tcp:
# hping3 --syn --destport 1234 --spoof 10.10.10.3 10.10.10.1
(NOTE: there is no service to listen on port 1234)
...
10:02:59.395715 IP 10.10.10.3.2787 > 10.10.10.1.1234: S
72057069:72057069(0) win 512 <mss 1414>
10:02:59.395746 IP 10.10.10.1.1234 > 10.10.10.1.2787: R 0:0(0) ack
72057070 win 0
As you can see, all destination address is wrong.
This problem comes from the fact that the route selection for packetes
travle on loopback device only happend once. When we send out packets
from loopback device, the skb->dst is assigned in ip_route_output. It
won't be reassinged in packetes recveive path. So the rt->rt_src don't
equal to ip_hdr(skb)->saddr.
I don't know why we must use rt->rt_src as destionation address. at least
for icmp reply packets, I thin we should use ip_hdr(skb)->saddr as
destionation address. this is according to RFC792:
...
Addresses
The address of the source in an echo message will be the
destination of the echo reply message. To form an echo reply
message, the source and destination addresses are simply reversed,
the type code changed to 0, and the checksum recomputed.
A possible fix is to do ip_route_input in ip_rcv_finish for packtes
received in loopback device. But I think just to use ip_hdr(skb)->saddr
instead of rt->rt_src as destination to reply packetes is a more simple fix.
Thanks Kenan Kalajdzic <[email protected]> for help me with more details
about this problem.
Signed-off-by: Lepton Wu <[email protected]>
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/icmp.c linux-2.6.22.6-lepton/net/ipv4/icmp.c
--- linux-2.6.22.6/net/ipv4/icmp.c 2007-09-14 17:41:18.000000000 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/icmp.c 2007-09-18 09:57:30.000000000 +0800
@@ -382,6 +382,7 @@ static void icmp_reply(struct icmp_bxm *
struct ipcm_cookie ipc;
struct rtable *rt = (struct rtable *)skb->dst;
__be32 daddr;
+ struct iphdr *ip = ip_hdr(skb);
if (ip_options_echo(&icmp_param->replyopts, skb))
return;
@@ -393,7 +394,7 @@ static void icmp_reply(struct icmp_bxm *
icmp_out_count(icmp_param->data.icmph.type);
inet->tos = ip_hdr(skb)->tos;
- daddr = ipc.addr = rt->rt_src;
+ daddr = ipc.addr = ip->saddr;
ipc.opt = NULL;
if (icmp_param->replyopts.optlen) {
ipc.opt = &icmp_param->replyopts;
diff -X linux-2.6.22.6/Documentation/dontdiff -pru linux-2.6.22.6/net/ipv4/ip_output.c linux-2.6.22.6-lepton/net/ipv4/ip_output.c
--- linux-2.6.22.6/net/ipv4/ip_output.c 2007-09-14 17:41:18.000000000 +0800
+++ linux-2.6.22.6-lepton/net/ipv4/ip_output.c 2007-09-18 09:57:13.000000000 +0800
@@ -1337,11 +1337,12 @@ void ip_send_reply(struct sock *sk, stru
struct ipcm_cookie ipc;
__be32 daddr;
struct rtable *rt = (struct rtable*)skb->dst;
+ struct iphdr *ip = ip_hdr(skb);
if (ip_options_echo(&replyopts.opt, skb))
return;
- daddr = ipc.addr = rt->rt_src;
+ daddr = ipc.addr = ip->saddr;
ipc.opt = NULL;
if (replyopts.opt.optlen) {
I'm not sure why it's using rt_src here, but there are relevant cases that
your description doesn't cover. For example, what happens if the source
is not set in the original packet? Does NAT affect this?
You quote RFC text for ICMP echo and the case where the receiving machine
is the final destination, but you're modifying code that is used for all
ICMP
types and used for ICMP errors generated when acting as an intermediate
router.
In ordinary cases, and certainly with ICMP echo when the source is set in
the original packet and no rewriting is going on (and the address is not
spoofed),
using the original source as the destination is fine. But have you tested
or
considered the other cases?
+-DLS
Now icmp_reply is only called by icmp_echo and icmp_timestamp
ip_send_reply is only called by tcp_v4_send_reset and tcp_v4_send_ack
I think in all situations the ip_hdr(skb)->saddr is set and should
be the destination of reply packets.
If using rt->rt_src as destination is correct in some situation,
can anyone give me a example?
I think perhaps it is a copy and paste from code like
ip_build_and_send_pkt, but reply packets in these situations
(icmp_echo and icmp_timestamp and tcp_v4_send_ack and tcp_v4_send_reset)
is diffrent, I think we can just use ip_hdr(skb)->saddr as
destination address.
On Thu, Sep 20, 2007 at 09:35:09PM -0700, David Stevens wrote:
> I'm not sure why it's using rt_src here, but there are relevant cases that
> your description doesn't cover. For example, what happens if the source
> is not set in the original packet? Does NAT affect this?
>
> You quote RFC text for ICMP echo and the case where the receiving machine
> is the final destination, but you're modifying code that is used for all
> ICMP
> types and used for ICMP errors generated when acting as an intermediate
> router.
>
> In ordinary cases, and certainly with ICMP echo when the source is set in
> the original packet and no rewriting is going on (and the address is not
> spoofed),
> using the original source as the destination is fine. But have you tested
> or
> considered the other cases?
>
> +-DLS
>