2001-10-28 16:15:52

by Rolf Fokkens

Subject: iptables and tcpdump

Hi!

I've been "tcpdumping" traffic that passes through a NAT box based on
netfilter. Everything works wonderfully, but tcpdump presents confusing data.
With the help of google I found out that tcpdump sees the data right after
the NF_IP_PRE_ROUTING and the NF_IP_POST_ROUTING hooks. This explains it all,
but results in a new question: why does tcpdump "see" the data after the
NF_IP_PRE_ROUTING hook instead of before, which more accurately reflects the
data that's on the wire?

I can imagine this has been explained before, but I haven't found the full
explanation. Could someone enlighten me?

Another thing is /proc/net/ip_conntrack. It also shows some confusing
information, like this:

icmp 1 29 src=145.66.17.200 dst=10.13.92.231 ... [UNREPLIED]
src=130.130.92.231 dst=145.66.17.200 ...

One half shows an unNATted dst, the second half shows the NATted src.
Logically speaking they should match but now they don't.

So everything works fine, but it's presented in a confusing way (tcpdump,
ip_conntrack). This may be intentional, but it seems a little accidental
to me.

Rolf

-------------------------------------------------------


2001-10-30 05:26:13

by Rusty Russell

Subject: Re: iptables and tcpdump

On Sun, 28 Oct 2001 17:10:41 -0800
Rolf Fokkens <[email protected]> wrote:

> Hi!
>
> I've been "tcpdumping" traffic that passes through a NAT box based on
> netfilter. Everything works wonderfully, but tcpdump presents confusing data.
> With the help of google I found out that tcpdump sees the data right after
> the NF_IP_PRE_ROUTING and the NF_IP_POST_ROUTING hooks. This explains it all,
> but results in a new question: why does tcpdump "see" the data after the
> NF_IP_PRE_ROUTING hook instead of before, which more accurately reflects the
> data that's on the wire?

It should see the packets on the wire (they are grabbed by tcpdump before
IP processing), but IIRC they are cloned (not copied) for tcpdump's use.

Alexey, should the NAT layer be doing skb_unshare() before altering the packet?

> icmp 1 29 src=145.66.17.200 dst=10.13.92.231 ... [UNREPLIED]
> src=130.130.92.231 dst=145.66.17.200 ...
>
> One half shows an unNATted dst, the second half shows the NATted src.
> Logically speaking they should match but now they don't.

No, that's what the connection tracking will actually see. If there is
no NAT, they will match.

Hope that clarifies,
Rusty.
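
The conntrack answer above can be illustrated with a small userspace
sketch. This is not kernel code: struct tuple, set_addr(), invert() and
expected_reply() are hypothetical names, and the sketch assumes (as the
entry suggests) that the NAT rule rewrites the destination 10.13.92.231
to 130.130.92.231. The second half of a conntrack entry is the reply the
box expects to see, which is the inverse of the packet after NAT.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified tuple. Real conntrack tuples also carry the
   protocol and port (or ICMP id) fields; only addresses matter here. */
struct tuple {
	char src[16];	/* dotted-quad IPv4 address, NUL-terminated */
	char dst[16];
};

/* Copy a dotted-quad address into a tuple field. */
static void set_addr(char field[16], const char *addr)
{
	assert(strlen(addr) < 16);
	strcpy(field, addr);
}

/* Build the reply-direction tuple: the same endpoints, swapped. */
static struct tuple invert(const struct tuple *t)
{
	struct tuple r;

	set_addr(r.src, t->dst);
	set_addr(r.dst, t->src);
	return r;
}

/* Expected reply for a connection whose destination is rewritten to
   nat_dst. Pass NULL for "no NAT". */
static struct tuple expected_reply(const struct tuple *orig,
				   const char *nat_dst)
{
	struct tuple after_nat = *orig;

	if (nat_dst)
		set_addr(after_nat.dst, nat_dst);
	return invert(&after_nat);
}
```

With orig set to src=145.66.17.200 dst=10.13.92.231, calling
expected_reply(&orig, "130.130.92.231") gives src=130.130.92.231
dst=145.66.17.200, the mismatched pair from the entry. Calling
expected_reply(&orig, NULL) gives the plain inverse, which is the
"no NAT, they will match" case.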

2001-10-30 05:31:43

by David Miller

Subject: Re: iptables and tcpdump

> From: Rusty Russell <[email protected]>
> Date: Tue, 30 Oct 2001 15:28:12 +1100
>
> should the NAT layer be doing skb_unshare() before altering the packet?

I think it should.

Look, if you are messing with packets before they go back out, and
tcpdump could have sniffed it on the way in, you can't change its
contents blindly.

Franks a lot,
David S. Miller
[email protected]

2001-10-30 17:32:03

by Alexey Kuznetsov

Subject: Re: iptables and tcpdump

Hello!

> Alexey, should the NAT layer be doing skb_unshare() before altering the packet?

MUST. Cloned skbs are read-only.

I did not expect such a question from you. :-)

Alexey
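
As an illustration of why a cloned skb must be unshared before it is
written to, here is a minimal userspace model. It is a toy, not the
kernel API: struct toy_skb, toy_alloc(), toy_clone(), toy_unshare() and
toy_free() are hypothetical names. The model assumes what the thread
describes: a clone gets its own header but shares the packet data, and
unsharing hands the writer a private copy when the data is shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy packet data, shared between headers by reference count. */
struct toy_data {
	int users;		/* number of toy_skb headers pointing here */
	size_t len;
	unsigned char bytes[64];
};

struct toy_skb {
	struct toy_data *data;
};

/* Allocate a header plus private data holding a copy of payload. */
static struct toy_skb *toy_alloc(const void *payload, size_t len)
{
	struct toy_skb *skb = malloc(sizeof *skb);
	struct toy_data *d = malloc(sizeof *d);

	if (!skb || !d || len > sizeof d->bytes) {
		free(skb);
		free(d);
		return NULL;
	}
	d->users = 1;
	d->len = len;
	memcpy(d->bytes, payload, len);
	skb->data = d;
	return skb;
}

/* Drop one header; the data goes away with its last user. */
static void toy_free(struct toy_skb *skb)
{
	if (!skb)
		return;
	if (--skb->data->users == 0)
		free(skb->data);
	free(skb);
}

/* Clone: new header, SAME data. Writing through either header would
   change what the other one sees, so the data is read-only. */
static struct toy_skb *toy_clone(struct toy_skb *skb)
{
	struct toy_skb *c;

	assert(skb != NULL);
	c = malloc(sizeof *c);
	if (!c)
		return NULL;
	c->data = skb->data;
	c->data->users++;
	return c;
}

/* Unshare: if the data is shared, return a header with a private copy
   and release the caller's reference to the shared one. Returns NULL
   when the copy cannot be allocated; the input is released either
   way, so the caller should simply drop the packet. */
static struct toy_skb *toy_unshare(struct toy_skb *skb)
{
	struct toy_skb *priv;

	assert(skb != NULL);
	if (skb->data->users == 1)
		return skb;	/* already private, safe to write */
	priv = toy_alloc(skb->data->bytes, skb->data->len);
	toy_free(skb);
	return priv;
}
```

In this model the sniffer holds the result of toy_clone(), and the NAT
path calls toy_unshare() on its own header before rewriting any byte.
The sniffer's data then keeps the on-the-wire contents, and a NULL
return maps to dropping the packet.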

2001-10-30 20:51:00

by Rolf Fokkens

Subject: Re: iptables and tcpdump

Hi!

I may have missed something, but I'm not on the mailing lists, which would
explain why. And the archives don't contain the email messages (yet) between
my initial question and this part of the discussion.

Apparently my question triggered a discussion about some deep NAT details at
the skb level. As far as I understand it, something goes wrong with the skb
cloning in the NAT layer: NAT modifies cloned skbs, which are read-only.

Is this the cause of the weird data that shows up with tcpdump?

Or in other words: does tcpdump show something buggy?

Rolf

On Tuesday 30 October 2001 09:31, you wrote:
> Hello!
>
> > Alexey, should the NAT layer be doing skb_unshare() before altering the
> > packet?
>
> MUST. Cloned skbs are read-only.
>
> I did not expect such a question from you. :-)
>
> Alexey

2001-10-31 06:24:49

by Rusty Russell

Subject: Re: iptables and tcpdump

On Mon, 29 Oct 2001 21:31:57 -0800 (PST)
"David S. Miller" <[email protected]> wrote:

> From: Rusty Russell <[email protected]>
> Date: Tue, 30 Oct 2001 15:28:12 +1100
>
> should the NAT layer be doing skb_unshare() before altering the packet?
>
> I think it should.

Agreed. The 2.2 masq code didn't do this, and hence the "don't tcpdump on masq host"
recommendation.

Please try this patch (compiles at least),
Rusty.

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.13-official/net/ipv4/netfilter/ip_fw_compat.c working-2.4.13-nfunshare/net/ipv4/netfilter/ip_fw_compat.c
--- linux-2.4.13-official/net/ipv4/netfilter/ip_fw_compat.c Sat Apr 28 07:15:01 2001
+++ working-2.4.13-nfunshare/net/ipv4/netfilter/ip_fw_compat.c Wed Oct 31 17:05:53 2001
@@ -78,11 +78,19 @@
 {
 	int ret = FW_BLOCK;
 	u_int16_t redirpt;
+	struct sk_buff *nskb;

 	/* Assume worse case: any hook could change packet */
 	(*pskb)->nfcache |= NFC_UNKNOWN | NFC_ALTERED;
 	if ((*pskb)->ip_summed == CHECKSUM_HW)
 		(*pskb)->ip_summed = CHECKSUM_NONE;
+
+	/* Firewall rules can alter TOS: raw socket may have clone of
+	   skb: don't disturb it --RR */
+	nskb = skb_unshare(*pskb, GFP_ATOMIC);
+	if (!nskb)
+		return NF_DROP;
+	*pskb = nskb;

 	switch (hooknum) {
 	case NF_IP_PRE_ROUTING:
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.13-official/net/ipv4/netfilter/ip_nat_core.c working-2.4.13-nfunshare/net/ipv4/netfilter/ip_nat_core.c
--- linux-2.4.13-official/net/ipv4/netfilter/ip_nat_core.c Thu May 17 03:31:27 2001
+++ working-2.4.13-nfunshare/net/ipv4/netfilter/ip_nat_core.c Wed Oct 31 16:52:06 2001
@@ -734,6 +734,15 @@
 	   synchronize_bh()) can vanish. */
 	READ_LOCK(&ip_nat_lock);
 	for (i = 0; i < info->num_manips; i++) {
+		struct sk_buff *nskb;
+		/* raw socket may have clone of skb: don't disturb it --RR */
+		nskb = skb_unshare(*pskb, GFP_ATOMIC);
+		if (!nskb) {
+			READ_UNLOCK(&ip_nat_lock);
+			return NF_DROP;
+		}
+		*pskb = nskb;
+
 		if (info->manips[i].direction == dir
 		    && info->manips[i].hooknum == hooknum) {
 			DEBUGP("Mangling %p: %s to %u.%u.%u.%u %u\n",
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.13-official/net/ipv4/netfilter/ipt_TCPMSS.c working-2.4.13-nfunshare/net/ipv4/netfilter/ipt_TCPMSS.c
--- linux-2.4.13-official/net/ipv4/netfilter/ipt_TCPMSS.c Mon Oct 1 05:26:08 2001
+++ working-2.4.13-nfunshare/net/ipv4/netfilter/ipt_TCPMSS.c Wed Oct 31 17:00:42 2001
@@ -48,6 +48,13 @@
 	u_int16_t tcplen, newtotlen, oldval, newmss;
 	unsigned int i;
 	u_int8_t *opt;
+	struct sk_buff *nskb;
+
+	/* raw socket may have clone of skb: don't disturb it --RR */
+	nskb = skb_unshare(*pskb, GFP_ATOMIC);
+	if (!nskb)
+		return NF_DROP;
+	*pskb = nskb;

 	tcplen = (*pskb)->len - iph->ihl*4;

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.13-official/net/ipv4/netfilter/ipt_TOS.c working-2.4.13-nfunshare/net/ipv4/netfilter/ipt_TOS.c
--- linux-2.4.13-official/net/ipv4/netfilter/ipt_TOS.c Mon Oct 1 05:26:08 2001
+++ working-2.4.13-nfunshare/net/ipv4/netfilter/ipt_TOS.c Wed Oct 31 17:03:11 2001
@@ -19,7 +19,14 @@
 	const struct ipt_tos_target_info *tosinfo = targinfo;

 	if ((iph->tos & IPTOS_TOS_MASK) != tosinfo->tos) {
+		struct sk_buff *nskb;
 		u_int16_t diffs[2];
+
+		/* raw socket may have clone of skb: don't disturb it --RR */
+		nskb = skb_unshare(*pskb, GFP_ATOMIC);
+		if (!nskb)
+			return NF_DROP;
+		*pskb = nskb;

 		diffs[0] = htons(iph->tos) ^ 0xFFFF;
 		iph->tos = (iph->tos & IPTOS_PREC_MASK) | tosinfo->tos;

2001-10-31 13:35:16

by Alexey Kuznetsov

Subject: Re: iptables and tcpdump

Hello!

> Agreed. The 2.2 masq code didn't do this, and hence the "don't tcpdump on masq host"
> recommendation.

Paul, it is very possible that I smoked/drank something wrong
and saw this in a dream, but I really remember that this bug
was fixed in some 2.1.x. :-)

Only the function is different: at that time skb_unshare() did some
unintelligible thing and was used only by AX.25 for an unknown purpose.
So the function that does the work was called skb_cow().

Alexey

2001-11-06 23:44:01

by David Miller

Subject: Re: iptables and tcpdump

> From: Rusty Russell <[email protected]>
> Date: Wed, 31 Oct 2001 17:28:35 +1100
>
> On Mon, 29 Oct 2001 21:31:57 -0800 (PST)
> "David S. Miller" <[email protected]> wrote:
>
> > From: Rusty Russell <[email protected]>
> > Date: Tue, 30 Oct 2001 15:28:12 +1100
> >
> > should the NAT layer be doing skb_unshare() before altering the packet?
> >
> > I think it should.
>
> Agreed. The 2.2 masq code didn't do this, and hence the "don't
> tcpdump on masq host" recommendation.
>
> Please try this patch (compiles at least),

Applied to my sources...

Franks a lot,
David S. Miller
[email protected]