2002-09-11 22:28:08

by Simon Kirby

Subject: 802.1q + device removal causing hang

Hi,

In 2.4.20-pre6, 2.4.20-pre4, 2.4.19, and 2.4.18 at least, I have been
having problems with vlan shutdown when removing tg3.o. rmmod hangs on
rtnl_sem, call trace:

rtnetlink_event
unregister_netdev
vlan_device_event
notifier_call_chain
unregister_netdevice
unregister_netdev
[tg3 module]
[tg3 module]
pci_unregister_driver
...

The hang is happening at rmmod time. All other operation appears to be
fine. If I manually make sure all vlans are removed first, everything
works, so it's probably something to do with the NETDEV_UNREGISTER case
in vlan_device_event() in vlan.c.

Steps to reproduce:

modprobe tg3
ifconfig eth0 up
vconfig add eth0 10
ifconfig eth0.10 1.2.3.4
ifconfig eth0.10 down
ifconfig eth0 down
rmmod tg3

Note that the IP address stays assigned even with eth0.10 set down.
Everything works if the IP assignment is not made. Tested on
2.4.20-pre6.

Does anybody see what would be causing this (perhaps unregister_netdev
reentering)?

(Also, why is the "ifconfig eth0 up" necessary for vconfig? It allows
"ifconfig eth0 down" _after_ the vlan is created, and vlans can still be
removed (but not created) at that point.)

Simon-

[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ [email protected] ][ [email protected] ]
[ Opinions expressed are not necessarily those of my employers. ]


2002-09-11 22:34:47

by David Miller

Subject: Re: 802.1q + device removal causing hang


Try this:

--- net/8021q/vlan.c.~1~ Wed Sep 11 15:34:49 2002
+++ net/8021q/vlan.c Wed Sep 11 15:34:59 2002
@@ -626,7 +626,7 @@
ret = unregister_vlan_dev(dev,
VLAN_DEV_INFO(vlandev)->vlan_id);

- unregister_netdev(vlandev);
+ unregister_netdevice(vlandev);

/* Group was destroyed? */
if (ret == 1)

2002-09-12 06:30:28

by Bart De Schuymer

Subject: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34

Hello David, list,

Ebtables is a project similar to iptables, but working on the bridge netfilter
hooks. It allows for a basic transparent firewall, making a brouter and doing
MAC source address and destination address manipulation. The firewall part
has currently modules for basic IP filtering, 802.1q filtering, ARP
filtering, logging and a mark match/target.
Ebtables has been under development for over 1.5 years and has more than 100
users, I think.

The patch is 3662 lines long, so I won't list it in this mail. It is available
at:
http://users.pandora.be/bart.de.schuymer/ebtables/v2.0/ebtables-v2.0_vs_2.5.34.diff
or, gzipped:
http://users.pandora.be/bart.de.schuymer/ebtables/v2.0/ebtables-v2.0_vs_2.5.34.diff.gz

It is vs 2.5.34, I can make a patch vs 2.4.x when the time is right.
Comments/questions are appreciated.

For more information, see
http://users.pandora.be/bart.de.schuymer/ebtables/
There is an ebtables hacking howto, some basic examples and some real life
examples from users. And of course the userspace program.

--
cheers,
Bart

2002-09-12 23:08:14

by David Miller

Subject: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34

From: Bart De Schuymer <[email protected]>
Date: Thu, 12 Sep 2002 08:36:52 +0200

> ARP filtering

People should use ARP tables for arp filtering, that is why I wrote
it. ARP filtering should not need to be bridge specific.

Next, has Lennert Buytenhek, the bridging maintainer, approved of your
changes to the bridging layer APIs?

2002-09-12 23:44:43

by Simon Kirby

Subject: Re: 802.1q + device removal causing hang

On Wed, Sep 11, 2002 at 03:31:32PM -0700, David S. Miller wrote:

> Try this:
>
> --- net/8021q/vlan.c.~1~ Wed Sep 11 15:34:49 2002
> +++ net/8021q/vlan.c Wed Sep 11 15:34:59 2002
> @@ -626,7 +626,7 @@
> ret = unregister_vlan_dev(dev,
> VLAN_DEV_INFO(vlandev)->vlan_id);
>
> - unregister_netdev(vlandev);
> + unregister_netdevice(vlandev);
>
> /* Group was destroyed? */
> if (ret == 1)

Woops, sorry about the erik.ca domain.

Yup, this fixed it!

Simon-

[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ [email protected] ][ [email protected] ]
[ Opinions expressed are not necessarily those of my employers. ]

2002-09-12 23:57:15

by David Miller

Subject: Re: 802.1q + device removal causing hang

From: Simon Kirby <[email protected]>
Date: Thu, 12 Sep 2002 16:49:22 -0700

> On Wed, Sep 11, 2002 at 03:31:32PM -0700, David S. Miller wrote:
>
> > Try this:
> >
> > --- net/8021q/vlan.c.~1~ Wed Sep 11 15:34:49 2002
> > +++ net/8021q/vlan.c Wed Sep 11 15:34:59 2002
>
> Yup, this fixed it!

Thanks for testing, I'll push this to Marcelo later tonight.

2002-09-13 03:14:15

by Bart De Schuymer

Subject: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34

Hello David, Lennert, list,

> ARP filtering
>
> People should use ARP tables for arp filtering, that is why I wrote
> it. ARP filtering should not need to be bridge specific.

Well, a bridge can also just _bridge_ ARP packets between two sides of the
bridge. The ARP module can filter out those packets. These packets will not
pass through the ARP code of the Linux kernel. Of course, the ebtables ARP
module can be easily adjusted for arptables, I will do this later if nobody
beats me to it... For the same reason, basic ebtables IP filtering is not
redundant.

> Next, has Lennert Buytenhek, the bridging maintainer, approved of your
> changes to the bridging layer APIs?

OK. This is to Lennert:
Could you please have a look at the ebtables patch located at

http://users.pandora.be/bart.de.schuymer/ebtables/v2.0/ebtables-v2.0_vs_2.5.34.diff

and approve the changes made to the bridging layer API? They are necessary to
make a brouter and to deal with bogus NETFILTER_DEBUG warnings if the option
is compiled in the kernel. Any questions will be gladly answered... Note that
the brouting facility has been working for at least three months and it has
already been used in real-life situations, there's an example usage on the
ebtables homepage. Dealing with NETFILTER_DEBUG warnings consists of setting
nf_debug to zero when the netfilter hooks change from bridge hooks to some
other stack's hooks and vice versa. See the patch.

--
cheers,
Bart

2002-09-13 04:33:30

by David Miller

Subject: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34

From: Bart De Schuymer <[email protected]>
Date: Fri, 13 Sep 2002 05:20:41 +0200

> Well, a bridge can also just _bridge_ ARP packets between two sides of the
> bridge. The ARP module can filter out those packets. These packets will not
> pass through the ARP code of the Linux kernel. Of course, the ebtables ARP
> module can be easily adjusted for arptables

No, I think I understand the difference and why your problem
space does not intersect what arptables handles.

It may not be nice that we can't immediately just reuse ipv4/netfilter
handlers for bridging, but I'm not going to require that you make that
work before I'll accept your patch.

Once you work things out with Lennert and he approves the changes,
I'll apply your patch.

2002-09-13 06:05:59

by Bart De Schuymer

Subject: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34

> It may not be nice that we can't immediately just reuse ipv4/netfilter
> handlers for bridging, but I'm not going to require that you make that
> work before I'll accept your patch.

If you mean using iptables on bridged packets, this is possible with the br-nf
patch. It is not trivial however, 2 new fields to the sk_buff need to be
added, a small change in the IP fragment code and a small change in
ip_tables.c, a change to netfilter.h and netfilter.c. The br-nf patch has
been under development for over 1.5 years, most work done by Lennert with me
helping now and then...
So, if you would accept br-nf, that would be great. I think Lennert and I
would need some time to agree on a few things before submitting anything,
however...

> Once you work things out with Lennert and he approves the changes,
> I'll apply your patch.

Cool, looking forward to his comments.

--
cheers,
Bart

2002-09-13 06:12:47

by David Miller

Subject: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34

From: Bart De Schuymer <[email protected]>
Date: Fri, 13 Sep 2002 08:12:27 +0200

> It is not trivial however, 2 new fields to the sk_buff need to be
> added, a small change in the IP fragment code and a small change in
> ip_tables.c, a change to netfilter.h and netfilter.c.

I've seen these changes, they are very buggy. The IPv4 copies added
are just ugly and are buggy too, they potentially copy past the end
of the packet buffer.

> So, if you would accept br-nf, that would be great.

You need to remove the IPv4 bits, that copy of the MAC has to happen
at a different layer, it does not belong in IPv4. At best, everyone
shouldn't eat that header copy.

Franks a lot,
David S. Miller
[email protected]

2002-09-13 12:40:32

by Lennert Buytenhek

Subject: bridge-netfilter patch (was: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.34)


On Thu, Sep 12, 2002 at 11:09:16PM -0700, David S. Miller wrote:

> From: Bart De Schuymer <[email protected]>
> Date: Fri, 13 Sep 2002 08:12:27 +0200
>
> It is not trivial however, 2 new fields to the sk_buff need to be
> added, a small change in the IP fragment code and a small change in
> ip_tables.c, a change to netfilter.h and netfilter.c.
>
> I've seen these changes, they are very buggy. The IPv4 copies added
> are just ugly and are buggy too, they potentially copy past the end
> of the packet buffer.

You mean this part? This is the only copy added to generic code
that I can find.


--- linux-2.4.19/net/ipv4/ip_output.c 2002-08-03 02:39:46.000000000 +0200
+++ linux-2.4.19-brnf0.0.7/net/ipv4/ip_output.c 2002-09-11 17:40:25.000000000 +0200
@@ -883,6 +885,7 @@
iph->tot_len = htons(len + hlen);

ip_send_check(iph);
+ memcpy(skb2->data - 16, skb->data - 16, 16);

err = output(skb2);
if (err)


If this code is buggy, isn't the following bit from ip_output.c buggy
too? (around line 170)

if (hh) {
read_lock_bh(&hh->hh_lock);
>>> memcpy(skb->data - 16, hh->hh_data, 16);
read_unlock_bh(&hh->hh_lock);
skb_push(skb, hh->hh_len);
return hh->hh_output(skb);
} else if (dst->neighbour)


> So, if you would accept br-nf, that would be great.
>
> You need to remove the IPv4 bits, that copy of the MAC has to happen
> at a different layer, it does not belong in IPv4. At best, everyone
> shouldn't eat that header copy.

What if I make the memcpy conditional on "if (skb->physindev != NULL)"?


cheers,
Lennert

2002-09-13 18:26:11

by David Miller

Subject: Re: bridge-netfilter patch

From: Lennert Buytenhek <[email protected]>
Date: Fri, 13 Sep 2002 14:45:18 +0200

> > You need to remove the IPv4 bits, that copy of the MAC has to happen
> > at a different layer, it does not belong in IPv4. At best, everyone
> > shouldn't eat that header copy.
>
> What if I make the memcpy conditional on "if (skb->physindev != NULL)"?

First explain to me why the copy is needed for.

2002-09-14 06:59:10

by Bart De Schuymer

Subject: Re: bridge-netfilter patch

On Friday 13 September 2002 20:22, David S. Miller wrote:
> From: Lennert Buytenhek <[email protected]>
> Date: Fri, 13 Sep 2002 14:45:18 +0200
>
> > You need to remove the IPv4 bits, that copy of the MAC has to happen
> > at a different layer, it does not belong in IPv4. At best, everyone
> > shouldn't eat that header copy.
>
> What if I make the memcpy conditional on "if (skb->physindev != NULL)"?
>
> First explain to me why the copy is needed for.

memcpy(skb2->data - 16, skb->data - 16, 16);

This is for purely bridged packets.
IP connection tracking first gathers all fragments, before the bridged IP
packets are sent out the packet is fragmented. However, since this
fragmenting actually happens while the IP packet "is in the bridge code", no
existing code makes sure the fragments' Ethernet headers are correctly
filled. Now, AFAIK an Ethernet header has length 14 bytes, so don't ask me
why the magic number 16 is used (Lennert got IP fragmenting fixed).
A nice comment in front of that copy is certainly needed...

--
cheers,
Bart

2002-09-15 21:22:13

by Lennert Buytenhek

Subject: Re: bridge-netfilter patch


On Fri, Sep 13, 2002 at 11:22:35AM -0700, David S. Miller wrote:

> > You need to remove the IPv4 bits, that copy of the MAC has to happen
> > at a different layer, it does not belong in IPv4. At best, everyone
> > shouldn't eat that header copy.
>
> What if I make the memcpy conditional on "if (skb->physindev != NULL)"?
>
> First explain to me why the copy is needed for.

This is just to elaborate upon what Bart said earlier.

In the "L2 switched frame" case, we have a bit of a nasty problem
with IP fragmentation. And in the "L3 'switched' frame" case
(brouted frame), we have an ordering problem with IP fragmentation
and neighbor resolution.

This is what the call stack looks like when we have a purely
bridged frame (that needs to be netfiltered):

net_rx_action
-> br_handle_frame
-> PF_BRIDGE/PRE_ROUTING
-> br_handle_frame_finish
-> br_forward
-> PF_BRIDGE/FORWARD
-> __br_forward_finish
-> PF_BRIDGE/POST_ROUTING
-> dev_queue_xmit


This case is easy to see. With ip_conntrack enabled, packets
are reassembled in PRE_ROUTING and refragmented in POST_ROUTING.
This refragmenting messes up the hardware header, so the fragments
will leave the box with incorrect HW headers.


The broute case is a bit harder to see. If L3 (routed) packets
are destined for a bridge device, we don't know what subdevice
(slave port) they will go to until the bridge layer's br_dev_xmit
has its way. However, we would like to be able to use the real
outgoing interface (physoutdev) in FORWARD and POST_ROUTING.

To be able to do this, we postpone calling IPv4/FORWARD and
IPv4/POST_ROUTING until after PF_BRIDGE/POST_ROUTING has happened,
because at that point we know physoutdev so we can feed it to
said IPv4 hooks.

But. Packet refragmentation normally happens in IPv4/POST_ROUTING.
We don't want to do it there though, because that would cause the
eventual call to IPv4/FORWARD and IPv4/POST_ROUTING to see all
fragments instead of one packet (which goes against the idea of
conntrack).

So if we postpone FORWARD and POST_ROUTING until after br_dev_xmit,
we effectively reverse refragmentation and neighbor resolution.
But refragmentation messes up the hardware header.


The 16byte hardware header copy fixes this by copying to each
fragment the hardware header that was tacked onto or was already
present on the bigger packet. It's ugly, I admit. There's
currently no better way though.

(And Bart, I chose 16 because 16-byte aligned 16-byte copies
should be cheaper than 2-byte aligned 14-byte copies, and there
should be at least 16 bytes before skb->data at this point
anyway. That is, if I understood the code correctly.)


cheers,
Lennert

2002-09-16 03:39:26

by David Miller

Subject: Re: bridge-netfilter patch

From: Bart De Schuymer <[email protected]>
Date: Sat, 14 Sep 2002 09:05:40 +0200

> On Friday 13 September 2002 20:22, David S. Miller wrote:
> > First explain to me why the copy is needed for.
>
> memcpy(skb2->data - 16, skb->data - 16, 16);
>
> This is for purely bridged packets.

Why is it being added, therefore, to ip_queue_xmit() which is only
ever invoked by TCP output processing?

If the patch adds the call somewhere else, please correct me, but
I specifically remember it being added to ip_queue_xmit() which is
why I barfed when seeing it :-)


2002-09-16 06:43:44

by Bart De Schuymer

Subject: [PATCH] ebtables - Ethernet bridge tables, for 2.5.35

Hello David,

The following link points to the ebtables patch approved by Lennert:
http://users.pandora.be/bart.de.schuymer/ebtables/v2.0/ebtables-v2.0_vs_2.5.35-try3.diff

Changes:

cleanup brouter interface in the bridge code + brouter bugfix.

--
cheers,
Bart

2002-09-16 21:34:40

by Bart De Schuymer

Subject: Re: bridge-netfilter patch

> This is for purely bridged packets.
>
> Why is it being added, therefore, to ip_queue_xmit() which is only
> ever invoked by TCP output processing?
>
> If the patch adds the call somewhere else, please correct me, but
> I specifically remember it being added to ip_queue_xmit() which is
> why I barfed when seeing it :-)

I've never seen this in the patch. It sure isn't in it now.

To be more precise:
net/ipv4/netfilter/ip_conntrack_standalone.c:ip_refrag() is (or can be)
attached to the NF_IP_POST_ROUTING hook. This function calls:
net/ipv4/ip_output.c:ip_fragment()
In this function the copy of the Ethernet frame is added for each fragment (by
the br-nf patch).
The bridge-netfilter patch lets IP packets/frames passing the
NF_BR_POST_ROUTING hook go through the NF_IP_POST_ROUTING hook, so the
ip_fragment() code is executed while the IP packet/frame is really in the
bridge code. After this, the fragments get queued:
net/bridge/br_forward.c:br_dev_queue_push_xmit() calls dev_queue_xmit()

Lennert's previous mail says in which cases and why this header copy has to be
explicitly done.

The following document might be useful to know what we are doing:
http://users.pandora.be/bart.de.schuymer/ebtables/br_fw_ia/br_fw_ia.html

--
cheers,
Bart

2002-09-16 23:05:46

by David Miller

Subject: Re: [PATCH] ebtables - Ethernet bridge tables, for 2.5.35

From: Bart De Schuymer <[email protected]>
Date: Mon, 16 Sep 2002 08:50:16 +0200

> The following link points to the ebtables patch approved by Lennert:
> http://users.pandora.be/bart.de.schuymer/ebtables/v2.0/ebtables-v2.0_vs_2.5.35-try3.diff

I'm applying this, but please in the future do not put these
"*** deal with blah blah" separators in one big patch file.

Give me one bug patch I can just do a clean "patch -p1 <your_patch"
with, thanks.

Sure it's just noise and patch perhaps knows how to skip over it, but
lots of other patch reading tools might not be able to. (for example
diffstat gives a lot of different paths as the patch base directory
because you've done the patch this way).

2002-09-16 23:25:22

by David Miller

Subject: Re: bridge-netfilter patch

From: Bart De Schuymer <[email protected]>
Date: Mon, 16 Sep 2002 23:41:17 +0200

> net/ipv4/ip_output.c:ip_fragment()
> In this function the copy of the Ethernet frame is added for each fragment (by
> the br-nf patch).

'output' callback arg to ip_fragment() must generate correct hardware
headers when necessary. This hack usage of it via netfilter, in this
weird bridging case, is violating this requirement.

Normally ip_finish_output2() is going to make this.

If it can't do the job properly, pass instead a routine that can do
what netfilter needs.

Lennert says:

> So if we postpone FORWARD and POST_ROUTING until after br_dev_xmit,
> we effectively reverse refragmentation and neighbor resolution.
> But refragmentation messes up the hardware header.
>
> The 16byte hardware header copy fixes this by copying to each
> fragment the hardware header that was tacked onto or was already
> present on the bigger packet. It's ugly, I admit. There's
> currently no better way though.

I don't understand why you can't add on the hardware header some other
way.

If ip_finish_output doesn't put the right hardware header on there,
you have to use as 'okfn' (what netfilter sends down as 'output' to
ip_fragment) some routine which will do it correctly.

2002-09-17 19:03:32

by Bart De Schuymer

Subject: Re: bridge-netfilter patch

> net/ipv4/ip_output.c:ip_fragment()
> In this function the copy of the Ethernet frame is added for each
> fragment (by the br-nf patch).
>
> 'output' callback arg to ip_fragment() must generate correct hardware
> headers when necessary. This hack usage of it via netfilter, in this
> weird bridging case, is violating this requirement.
>
> Normally ip_finish_output2() is going to make this.
>
> If it can't do the job properly, pass instead a routine that can do
> what netfilter needs.

Aha. In our case, the output function is
net/bridge/br_forward.c:__dev_queue_push_xmit(). This is because
__br_forward_finish() (same file) uses this as okfn. Remember the IP hooks
are "faked" on the bridge hooks, so functions attached to NF_IP_POST_ROUTING
are called when the IP packet/frame passes the NF_BR_POST_ROUTING hook. They
are not called earlier. All of this assuming that the destination device
according to the routing table is a (logical) bridge device. If not, the
packets go through the IP code and netfilter hooks normally.

So, what if we were to add the following code to the start of
__dev_queue_push_xmit():

	if (skb->protocol == __constant_htons(ETH_P_IP)) {
		struct dst_entry *dst = skb->dst;
		/* cached hardware header, if the route has one */
		struct hh_cache *hh = dst ? dst->hh : NULL;

		if (hh) {
			read_lock_bh(&hh->hh_lock);
			memcpy(skb->data - 16, hh->hh_data, 16);
			read_unlock_bh(&hh->hh_lock);
		}
	}

hh being NULL for an unfragmented IP packet and else non-NULL? Do realize that
I (I can't speak for Lennert of course) am not very familiar with the workings
of the IP code.

Then we can remove the memcpy from ip_fragment(). Does that make sense?

--
cheers,
Bart

2002-09-17 19:39:36

by David Miller

Subject: Re: bridge-netfilter patch

From: Bart De Schuymer <[email protected]>
Date: Tue, 17 Sep 2002 21:10:06 +0200

> Then we can remove the memcpy from ip_fragment(). Does that make sense?

That sounds like the kind of solution I'd like used.

2002-10-14 17:57:52

by Bart De Schuymer

Subject: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42

Hello David, list,

David, you asked for a patch for mapping the IPv4 hooks onto the bridge hooks,
this is the patch.

Comments and suggestions for better solutions are welcome.

I don't know how to (easily) move the copy of the Ethernet header out of
ip_output.c::ip_fragment(). The patch adds 3 members to the skbuff...

There is a little text file explaining the source code more in-depth here:
http://users.pandora.be/bart.de.schuymer/ebtables/br-nf/bridge-nf-0.0.10-dev-pre1-against-2.5.39-comments.txt
A high-level explanation of what we're doing is here:
http://users.pandora.be/bart.de.schuymer/ebtables/br_fw_ia/br_fw_ia.html
The patch is also available at:
http://users.pandora.be/bart.de.schuymer/ebtables/br-nf/bridge-nf-0.0.10-dev-pre1-against-2.5.42.diff

Note that Lennert has not yet responded to my private RFC to him, but
Halloween is coming, so we should at least try to get this into 2.5...

cheers,
Bart

Here's the patch:

--- linux-2.5.42/include/linux/netfilter.h Sat Oct 12 06:22:08 2002
+++ linux-2.5.42-brnf/include/linux/netfilter.h Sun Oct 13 11:45:19 2002
@@ -117,17 +117,23 @@
/* This is gross, but inline doesn't cut it for avoiding the function
call in fast path: gcc doesn't inline (needs value tracking?). --RR */
#ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
+ nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
+#define NF_HOOK_THRESH nf_hook_slow
#else
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
(list_empty(&nf_hooks[(pf)][(hook)]) \
? (okfn)(skb) \
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn)))
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh) \
+(list_empty(&nf_hooks[(pf)][(hook)]) \
+ ? (okfn)(skb) \
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
#endif

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
struct net_device *indev, struct net_device *outdev,
- int (*okfn)(struct sk_buff *));
+ int (*okfn)(struct sk_buff *), int thresh);

/* Call setsockopt() */
int nf_setsockopt(struct sock *sk, int pf, int optval, char *opt,
--- linux-2.5.42/include/linux/netfilter_ipv4.h Sat Oct 12 06:22:18 2002
+++ linux-2.5.42-brnf/include/linux/netfilter_ipv4.h Sun Oct 13 11:45:19 2002
@@ -52,8 +52,10 @@
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
NF_IP_PRI_CONNTRACK = -200,
+ NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
+ NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
NF_IP_PRI_FILTER = 0,
NF_IP_PRI_NAT_SRC = 100,
NF_IP_PRI_LAST = INT_MAX,
--- linux-2.5.42/include/linux/netfilter_bridge.h Sat Oct 12 06:22:09 2002
+++ linux-2.5.42-brnf/include/linux/netfilter_bridge.h Sun Oct 13 11:45:19 2002
@@ -22,14 +22,27 @@
#define NF_BR_BROUTING 5
#define NF_BR_NUMHOOKS 6

+/* Masks for skb->brnfmask */
+#define BRNF_PKT_TYPE 0x01
+#define BRNF_BRIDGED_DNAT 0x02
+#define BRNF_COPY_HEADER 0x04
+#define BRNF_DONT_TAKE_PARENT 0x08
+
enum nf_br_hook_priorities {
NF_BR_PRI_FIRST = INT_MIN,
- NF_BR_PRI_FILTER_BRIDGED = -200,
- NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_DST_BRIDGED = -300,
+ NF_BR_PRI_FILTER_BRIDGED = -200,
+ NF_BR_PRI_BRNF = 0,
NF_BR_PRI_NAT_DST_OTHER = 100,
+ NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_SRC = 300,
NF_BR_PRI_LAST = INT_MAX,
};

+/* Used in br_netfilter.c */
+struct bridge_skb_cb {
+ union {
+ __u32 ipv4;
+ } daddr;
+};
#endif
--- linux-2.5.42/include/linux/skbuff.h Sat Oct 12 06:22:09 2002
+++ linux-2.5.42-brnf/include/linux/skbuff.h Sun Oct 13 11:45:19 2002
@@ -140,6 +140,8 @@
* @sk: Socket we are owned by
* @stamp: Time we arrived
* @dev: Device we arrived on/are leaving by
+ * @physindev: Physical device we arrived on - see br_netfilter.c
+ * @physoutdev: Physical device we will leave by - see br_netfilter.c
* @h: Transport layer header
* @nh: Network layer header
* @mac: Link layer header
@@ -166,6 +168,7 @@
* @nfcache: Cache info
* @nfct: Associated connection, if any
* @nf_debug: Netfilter debugging
+ * @brnfmask: Info about a bridged frame - see br_netfilter.c
* @tc_index: Traffic control index
*/

@@ -178,6 +181,8 @@
struct sock *sk;
struct timeval stamp;
struct net_device *dev;
+ struct net_device *physindev;
+ struct net_device *physoutdev;

union {
struct tcphdr *th;
@@ -236,6 +241,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
unsigned int nf_debug;
#endif
+ unsigned int brnfmask;
#endif /* CONFIG_NETFILTER */
#if defined(CONFIG_HIPPI)
union {
--- linux-2.5.42/net/bridge/br.c Sat Oct 12 06:21:34 2002
+++ linux-2.5.42-brnf/net/bridge/br.c Sun Oct 13 11:45:19 2002
@@ -45,6 +45,8 @@
{
printk(KERN_INFO "NET4: Ethernet Bridge 008 for NET4.0\n");

+ if (br_netfilter_init())
+ return 1;
br_handle_frame_hook = br_handle_frame;
br_ioctl_hook = br_ioctl_deviceless_stub;
#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
@@ -63,6 +65,7 @@

static void __exit br_deinit(void)
{
+ br_netfilter_fini();
unregister_netdevice_notifier(&br_device_notifier);
br_call_ioctl_atomic(__br_clear_ioctl_hook);

--- linux-2.5.42/net/bridge/br_forward.c Sat Oct 12 06:21:37 2002
+++ linux-2.5.42-brnf/net/bridge/br_forward.c Sun Oct 13 11:45:19 2002
@@ -30,7 +30,7 @@
return 1;
}

-static int __dev_queue_push_xmit(struct sk_buff *skb)
+int br_dev_queue_push_xmit(struct sk_buff *skb)
{
skb_push(skb, ETH_HLEN);
dev_queue_xmit(skb);
@@ -38,10 +38,10 @@
return 0;
}

-static int __br_forward_finish(struct sk_buff *skb)
+int br_forward_finish(struct sk_buff *skb)
{
NF_HOOK(PF_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev,
- __dev_queue_push_xmit);
+ br_dev_queue_push_xmit);

return 0;
}
@@ -53,7 +53,7 @@
skb->nf_debug = 0;
#endif
NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
- __br_forward_finish);
+ br_forward_finish);
}

static void __br_forward(struct net_bridge_port *to, struct sk_buff *skb)
@@ -64,7 +64,7 @@
skb->dev = to->dev;

NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev,
- __br_forward_finish);
+ br_forward_finish);
}

/* called under bridge lock */
--- linux-2.5.42/net/bridge/br_input.c Sat Oct 12 06:21:35 2002
+++ linux-2.5.42-brnf/net/bridge/br_input.c Sun Oct 13 11:45:19 2002
@@ -49,7 +49,7 @@
br_pass_frame_up_finish);
}

-static int br_handle_frame_finish(struct sk_buff *skb)
+int br_handle_frame_finish(struct sk_buff *skb)
{
struct net_bridge *br;
unsigned char *dest;
--- linux-2.5.42/net/bridge/br_private.h Sat Oct 12 06:21:35 2002
+++ linux-2.5.42-brnf/net/bridge/br_private.h Sun Oct 13 11:45:19 2002
@@ -144,8 +144,10 @@
/* br_forward.c */
extern void br_deliver(struct net_bridge_port *to,
struct sk_buff *skb);
+extern int br_dev_queue_push_xmit(struct sk_buff *skb);
extern void br_forward(struct net_bridge_port *to,
struct sk_buff *skb);
+extern int br_forward_finish(struct sk_buff *skb);
extern void br_flood_deliver(struct net_bridge *br,
struct sk_buff *skb,
int clone);
@@ -166,6 +168,7 @@
int *ifindices);

/* br_input.c */
+extern int br_handle_frame_finish(struct sk_buff *skb);
extern int br_handle_frame(struct sk_buff *skb);

/* br_ioctl.c */
@@ -176,6 +179,10 @@
unsigned long arg1,
unsigned long arg2);
extern int br_ioctl_deviceless_stub(unsigned long arg);
+
+/* br_netfilter.c */
+extern int br_netfilter_init(void);
+extern void br_netfilter_fini(void);

/* br_stp.c */
extern int br_is_root_bridge(struct net_bridge *br);
--- linux-2.5.42/net/bridge/Makefile Sat Oct 12 06:22:45 2002
+++ linux-2.5.42-brnf/net/bridge/Makefile Sun Oct 13 11:45:19 2002
@@ -9,6 +9,11 @@
bridge-objs := br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
br_ioctl.o br_notify.o br_stp.o br_stp_bpdu.o \
br_stp_if.o br_stp_timer.o
+
+ifeq ($(CONFIG_NETFILTER),y)
+bridge-objs += br_netfilter.o
+endif
+
obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/

include $(TOPDIR)/Rules.make
--- linux-2.5.42/net/core/netfilter.c Sat Oct 12 06:22:07 2002
+++ linux-2.5.42-brnf/net/core/netfilter.c Sun Oct 13 11:45:19 2002
@@ -342,10 +342,15 @@
const struct net_device *indev,
const struct net_device *outdev,
struct list_head **i,
- int (*okfn)(struct sk_buff *))
+ int (*okfn)(struct sk_buff *),
+ int hook_thresh)
{
for (*i = (*i)->next; *i != head; *i = (*i)->next) {
struct nf_hook_ops *elem = (struct nf_hook_ops *)*i;
+
+ if (hook_thresh > elem->priority)
+ continue;
+
switch (elem->hook(hook, skb, indev, outdev, okfn)) {
case NF_QUEUE:
return NF_QUEUE;
@@ -413,6 +418,8 @@
{
int status;
struct nf_info *info;
+ struct net_device *physindev;
+ struct net_device *physoutdev;

if (!queue_handler[pf].outfn) {
kfree_skb(skb);
@@ -435,11 +442,16 @@
if (indev) dev_hold(indev);
if (outdev) dev_hold(outdev);

+ if ((physindev = skb->physindev)) dev_hold(physindev);
+ if ((physoutdev = skb->physoutdev)) dev_hold(physoutdev);
+
status = queue_handler[pf].outfn(skb, info, queue_handler[pf].data);
if (status < 0) {
/* James M doesn't say fuck enough. */
if (indev) dev_put(indev);
if (outdev) dev_put(outdev);
+ if (physindev) dev_put(physindev);
+ if (physoutdev) dev_put(physoutdev);
kfree(info);
kfree_skb(skb);
return;
@@ -449,7 +461,8 @@
int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
struct net_device *indev,
struct net_device *outdev,
- int (*okfn)(struct sk_buff *))
+ int (*okfn)(struct sk_buff *),
+ int hook_thresh)
{
struct list_head *elem;
unsigned int verdict;
@@ -481,7 +494,7 @@

elem = &nf_hooks[pf][hook];
verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
- outdev, &elem, okfn);
+ outdev, &elem, okfn, hook_thresh);
if (verdict == NF_QUEUE) {
NFDEBUG("nf_hook: Verdict = QUEUE.\n");
nf_queue(skb, elem, pf, hook, indev, outdev, okfn);
@@ -530,7 +543,7 @@
verdict = nf_iterate(&nf_hooks[info->pf][info->hook],
&skb, info->hook,
info->indev, info->outdev, &elem,
- info->okfn);
+ info->okfn, INT_MIN);
}

switch (verdict) {
--- linux-2.5.42/net/core/skbuff.c Sat Oct 12 06:21:34 2002
+++ linux-2.5.42-brnf/net/core/skbuff.c Sun Oct 13 11:45:19 2002
@@ -234,6 +234,8 @@
skb->sk = NULL;
skb->stamp.tv_sec = 0; /* No idea about time */
skb->dev = NULL;
+ skb->physindev = NULL;
+ skb->physoutdev = NULL;
skb->dst = NULL;
memset(skb->cb, 0, sizeof(skb->cb));
skb->pkt_type = PACKET_HOST; /* Default type */
@@ -248,6 +250,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
skb->nf_debug = 0;
#endif
+ skb->brnfmask = 0;
#endif
#ifdef CONFIG_NET_SCHED
skb->tc_index = 0;
@@ -363,6 +366,8 @@
n->sk = NULL;
C(stamp);
C(dev);
+ C(physindev);
+ C(physoutdev);
C(h);
C(nh);
C(mac);
@@ -392,6 +397,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
C(nf_debug);
#endif
+ C(brnfmask);
#endif /*CONFIG_NETFILTER*/
#if defined(CONFIG_HIPPI)
C(private);
@@ -418,6 +424,8 @@
new->list = NULL;
new->sk = NULL;
new->dev = old->dev;
+ new->physindev = old->physindev;
+ new->physoutdev = old->physoutdev;
new->priority = old->priority;
new->protocol = old->protocol;
new->dst = dst_clone(old->dst);
@@ -438,6 +446,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
new->nf_debug = old->nf_debug;
#endif
+ new->brnfmask = old->brnfmask;
#endif
#ifdef CONFIG_NET_SCHED
new->tc_index = old->tc_index;
--- linux-2.5.42/net/ipv4/ip_output.c Sat Oct 12 06:22:45 2002
+++ linux-2.5.42-brnf/net/ipv4/ip_output.c Sun Oct 13 11:45:19 2002
@@ -75,6 +75,7 @@
#include <net/inetpeer.h>
#include <linux/igmp.h>
#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_bridge.h>
#include <linux/mroute.h>
#include <linux/netlink.h>

@@ -908,6 +909,18 @@
iph->tot_len = htons(len + hlen);

ip_send_check(iph);
+
+ /*
+ * Fragments with a bridge device destination need
+ * to get the Ethernet header copied here, as
+ * br_dev_queue_push_xmit() can't do this.
+ * See net/bridge/br_netfilter.c
+ */
+
+#ifdef CONFIG_NETFILTER
+ if (skb->brnfmask & BRNF_COPY_HEADER)
+ memcpy(skb2->data - 16, skb->data - 16, 16);
+#endif

err = output(skb2);
if (err)
--- linux-2.5.42/net/ipv4/netfilter/ip_tables.c Sat Oct 12 06:21:35 2002
+++ linux-2.5.42-brnf/net/ipv4/netfilter/ip_tables.c Sun Oct 13 11:45:19 2002
@@ -121,12 +121,14 @@
static inline int
ip_packet_match(const struct iphdr *ip,
const char *indev,
+ const char *physindev,
const char *outdev,
+ const char *physoutdev,
const struct ipt_ip *ipinfo,
int isfrag)
{
size_t i;
- unsigned long ret;
+ unsigned long ret, ret2;

#define FWINV(bool,invflg) ((bool) ^ !!(ipinfo->invflags & invflg))

@@ -156,7 +158,13 @@
& ((const unsigned long *)ipinfo->iniface_mask)[i];
}

- if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
+ for (i = 0, ret2 = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret2 |= (((const unsigned long *)physindev)[i]
+ ^ ((const unsigned long *)ipinfo->iniface)[i])
+ & ((const unsigned long *)ipinfo->iniface_mask)[i];
+ }
+
+ if (FWINV(ret != 0 && ret2 != 0, IPT_INV_VIA_IN)) {
dprintf("VIA in mismatch (%s vs %s).%s\n",
indev, ipinfo->iniface,
ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
@@ -169,7 +177,13 @@
& ((const unsigned long *)ipinfo->outiface_mask)[i];
}

- if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
+ for (i = 0, ret2 = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret2 |= (((const unsigned long *)physoutdev)[i]
+ ^ ((const unsigned long *)ipinfo->outiface)[i])
+ & ((const unsigned long *)ipinfo->outiface_mask)[i];
+ }
+
+ if (FWINV(ret != 0 && ret2 != 0, IPT_INV_VIA_OUT)) {
dprintf("VIA out mismatch (%s vs %s).%s\n",
outdev, ipinfo->outiface,
ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
@@ -268,6 +282,7 @@
/* Initializing verdict to NF_DROP keeps gcc happy. */
unsigned int verdict = NF_DROP;
const char *indev, *outdev;
+ const char *physindev, *physoutdev;
void *table_base;
struct ipt_entry *e, *back;

@@ -277,6 +292,9 @@
datalen = (*pskb)->len - ip->ihl * 4;
indev = in ? in->name : nulldevname;
outdev = out ? out->name : nulldevname;
+ physindev = (*pskb)->physindev ? (*pskb)->physindev->name : nulldevname;
+ physoutdev = (*pskb)->physoutdev ? (*pskb)->physoutdev->name : nulldevname;
+
/* We handle fragments by dealing with the first fragment as
* if it was a normal packet. All other fragments are treated
* normally, except that they will NEVER match rules that ask
@@ -311,7 +329,8 @@
IP_NF_ASSERT(e);
IP_NF_ASSERT(back);
(*pskb)->nfcache |= e->nfcache;
- if (ip_packet_match(ip, indev, outdev, &e->ip, offset)) {
+ if (ip_packet_match(ip, indev, physindev, outdev, physoutdev,
+ &e->ip, offset)) {
struct ipt_entry_target *t;

if (IPT_MATCH_ITERATE(e, do_match,
--- linux-2.5.42/net/ipv4/netfilter/ipt_LOG.c Sat Oct 12 06:21:38 2002
+++ linux-2.5.42-brnf/net/ipv4/netfilter/ipt_LOG.c Sun Oct 13 11:45:19 2002
@@ -285,10 +285,13 @@
level_string[1] = '0' + (loginfo->level % 8);
spin_lock_bh(&log_lock);
printk(level_string);
- printk("%sIN=%s OUT=%s ",
- loginfo->prefix,
- in ? in->name : "",
- out ? out->name : "");
+ printk("%sIN=%s ", loginfo->prefix, in ? in->name : "");
+ if ((*pskb)->physindev && in != (*pskb)->physindev)
+ printk("PHYSIN=%s ", (*pskb)->physindev->name);
+ printk("OUT=%s ", out ? out->name : "");
+ if ((*pskb)->physoutdev && out != (*pskb)->physoutdev)
+ printk("PHYSOUT=%s ", (*pskb)->physoutdev->name);
+
if (in && !out) {
/* MAC logging for input chain only. */
printk("MAC=");
--- /dev/null Thu Aug 24 11:00:32 2000
+++ linux-2.5.42-brnf/net/bridge/br_netfilter.c Sun Oct 13 11:45:19 2002
@@ -0,0 +1,602 @@
+/*
+ * Handle firewalling
+ * Linux ethernet bridge
+ *
+ * Authors:
+ * Lennert Buytenhek <[email protected]>
+ * Bart De Schuymer <[email protected]>
+ *
+ * $Id: br_netfilter.c,v 1.2 2002/09/11 19:35:44 bdschuym Exp $
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Lennert dedicates this file to Kerstin Wurdinger.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/ip.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/if_ether.h>
+#include <linux/netfilter_bridge.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/in_route.h>
+#include <net/ip.h>
+#include <asm/uaccess.h>
+#include <asm/checksum.h>
+#include "br_private.h"
+
+
+#define skb_origaddr(skb) (((struct bridge_skb_cb *) \
+ (skb->cb))->daddr.ipv4)
+#define store_orig_dstaddr(skb) (skb_origaddr(skb) = (skb)->nh.iph->daddr)
+#define dnat_took_place(skb) (skb_origaddr(skb) != (skb)->nh.iph->daddr)
+#define clear_cb(skb) (memset(&skb_origaddr(skb), 0, \
+ sizeof(struct bridge_skb_cb)))
+
+#define has_bridge_parent(device) ((device)->br_port != NULL)
+#define bridge_parent(device) (&((device)->br_port->br->dev))
+
+/* We need these fake structures to make netfilter happy --
+ * lots of places assume that skb->dst != NULL, which isn't
+ * all that unreasonable.
+ *
+ * Currently, we fill in the PMTU entry because netfilter
+ * refragmentation needs it, and the rt_flags entry because
+ * ipt_REJECT needs it. Future netfilter modules might
+ * require us to fill additional fields.
+ */
+static struct net_device __fake_net_device = {
+ hard_header_len: ETH_HLEN
+};
+
+static struct rtable __fake_rtable = {
+ u: {
+ dst: {
+ __refcnt: ATOMIC_INIT(1),
+ dev: &__fake_net_device,
+ pmtu: 1500
+ }
+ },
+
+ rt_flags: 0
+};
+
+
+/* PF_BRIDGE/PRE_ROUTING *********************************************/
+static void __br_dnat_complain(void)
+{
+ static unsigned long last_complaint = 0;
+
+ if (jiffies - last_complaint >= 5 * HZ) {
+ printk(KERN_WARNING "Performing cross-bridge DNAT requires IP "
+ "forwarding to be enabled\n");
+ last_complaint = jiffies;
+ }
+}
+
+
+/* This requires some explaining. If DNAT has taken place,
+ * we will need to fix up the destination Ethernet address,
+ * and this is a tricky process.
+ *
+ * There are two cases to consider:
+ * 1. The packet was DNAT'ed to a device in the same bridge
+ * port group as it was received on. We can still bridge
+ * the packet.
+ * 2. The packet was DNAT'ed to a different device, either
+ * a non-bridged device or another bridge port group.
+ * The packet will need to be routed.
+ *
+ * The correct way of distinguishing between these two cases is to
+ * call ip_route_input() and to look at skb->dst->dev, which is
+ * changed to the destination device if ip_route_input() succeeds.
+ *
+ * Let us first consider the case that ip_route_input() succeeds:
+ *
+ * If skb->dst->dev equals the logical bridge device the packet
+ * came in on, we can consider this bridging. We then call
+ * skb->dst->output() which will make the packet enter br_nf_local_out()
+ * not much later. In that function it is assured that the iptables
+ * FORWARD chain is traversed for the packet.
+ *
+ * Otherwise, the packet is considered to be routed and we just
+ * change the destination MAC address so that the packet will
+ * later be passed up to the IP stack to be routed.
+ *
+ * Let us now consider the case that ip_route_input() fails:
+ *
+ * After a "echo '0' > /proc/sys/net/ipv4/ip_forward" ip_route_input()
+ * will fail, while ip_route_output() will return success. The source
+ * address for ip_route_output() is set to zero, so ip_route_output()
+ * thinks we're handling a locally generated packet and won't care
+ * if IP forwarding is allowed. We send a warning message to the user's
+ * log telling her to put IP forwarding on.
+ *
+ * ip_route_input() will also fail if there is no route available.
+ * In that case we just drop the packet.
+ *
+ * --Lennert, 20020411
+ * --Bart, 20020416 (updated)
+ * --Bart, 20021007 (updated)
+ */
+
+static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug |= (1 << NF_BR_PRE_ROUTING) | (1 << NF_BR_FORWARD);
+#endif
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ skb->brnfmask |= BRNF_PKT_TYPE;
+ }
+
+ skb->dev = bridge_parent(skb->dev);
+ skb->dst->output(skb);
+ return 0;
+}
+
+static int br_nf_pre_routing_finish(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct iphdr *iph = skb->nh.iph;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_PRE_ROUTING);
+#endif
+
+ if (skb->brnfmask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ skb->brnfmask ^= BRNF_PKT_TYPE;
+ }
+
+ if (dnat_took_place(skb)) {
+ if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
+ dev)) {
+ struct rtable *rt;
+
+ if (!ip_route_output(&rt, iph->daddr, 0, iph->tos, 0)) {
+ /* Bridged-and-DNAT'ed traffic doesn't
+ * require ip_forwarding.
+ */
+ if (((struct dst_entry *)rt)->dev == dev) {
+ skb->dst = (struct dst_entry *)rt;
+ goto bridged_dnat;
+ }
+ __br_dnat_complain();
+ dst_release((struct dst_entry *)rt);
+ }
+ kfree_skb(skb);
+ return 0;
+ } else {
+ if (skb->dst->dev == dev) {
+bridged_dnat:
+ /* Tell br_nf_local_out this is a
+ * bridged frame
+ */
+ skb->brnfmask |= BRNF_BRIDGED_DNAT;
+ skb->dev = skb->physindev;
+ clear_cb(skb);
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING,
+ skb, skb->dev, NULL,
+ br_nf_pre_routing_finish_bridge,
+ 1);
+ return 0;
+ }
+ memcpy(skb->mac.ethernet->h_dest, dev->dev_addr,
+ ETH_ALEN);
+ }
+ } else {
+ skb->dst = (struct dst_entry *)&__fake_rtable;
+ dst_hold(skb->dst);
+ }
+
+ clear_cb(skb);
+ skb->dev = skb->physindev;
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
+ br_handle_frame_finish, 1);
+
+ return 0;
+}
+
+/* Replicate the checks that IPv4 does on packet reception.
+ * Set skb->dev to the bridge device (i.e. parent of the
+ * receiving device) to make netfilter happy, the REDIRECT
+ * target in particular. Save the original destination IP
+ * address to be able to detect DNAT afterwards.
+ */
+static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct iphdr *iph;
+ __u32 len;
+ struct sk_buff *skb;
+
+ if ((*pskb)->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if ((skb = skb_share_check(*pskb, GFP_ATOMIC)) == NULL)
+ goto out;
+
+ if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+ goto inhdr_error;
+
+ iph = skb->nh.iph;
+ if (iph->ihl < 5 || iph->version != 4)
+ goto inhdr_error;
+
+ if (!pskb_may_pull(skb, 4*iph->ihl))
+ goto inhdr_error;
+
+ iph = skb->nh.iph;
+ if (ip_fast_csum((__u8 *)iph, iph->ihl) != 0)
+ goto inhdr_error;
+
+ len = ntohs(iph->tot_len);
+ if (skb->len < len || len < 4*iph->ihl)
+ goto inhdr_error;
+
+ if (skb->len > len) {
+ __pskb_trim(skb, len);
+ if (skb->ip_summed == CHECKSUM_HW)
+ skb->ip_summed = CHECKSUM_NONE;
+ }
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_PRE_ROUTING);
+#endif
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ skb->brnfmask |= BRNF_PKT_TYPE;
+ }
+
+ skb->physindev = skb->dev;
+ skb->dev = bridge_parent(skb->dev);
+ store_orig_dstaddr(skb);
+
+ NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+ br_nf_pre_routing_finish);
+
+ return NF_STOLEN;
+
+inhdr_error:
+// IP_INC_STATS_BH(IpInHdrErrors);
+out:
+ return NF_DROP;
+}
+
+
+/* PF_BRIDGE/LOCAL_IN ************************************************/
+/* The packet is locally destined, which requires a real
+ * dst_entry, so detach the fake one. On the way up, the
+ * packet would pass through PRE_ROUTING again (which already
+ * took place when the packet entered the bridge), but we
+ * register an IPv4 PRE_ROUTING 'sabotage' hook that will
+ * prevent this from happening.
+ */
+static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if (skb->dst == (struct dst_entry *)&__fake_rtable) {
+ dst_release(skb->dst);
+ skb->dst = NULL;
+ }
+
+ return NF_ACCEPT;
+}
+
+
+/* PF_BRIDGE/FORWARD *************************************************/
+static int br_nf_forward_finish(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_FORWARD);
+#endif
+
+ if (skb->brnfmask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ skb->brnfmask ^= BRNF_PKT_TYPE;
+ }
+
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_FORWARD, skb, skb->physindev,
+ skb->dev, br_forward_finish, 1);
+
+ return 0;
+}
+
+/* This is the 'purely bridged' case. We pass the packet to
+ * netfilter with indev and outdev set to the bridge device,
+ * but we are still able to filter on the 'real' indev/outdev
+ * because another bit of the bridge-nf patch overloads the
+ * '-i' and '-o' iptables interface checks to take
+ * skb->phys{in,out}dev into account as well (so both the real
+ * device and the bridge device will match).
+ */
+static unsigned int br_nf_forward(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_FORWARD);
+#endif
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ skb->brnfmask |= BRNF_PKT_TYPE;
+ }
+
+ skb->physoutdev = skb->dev;
+
+ NF_HOOK(PF_INET, NF_IP_FORWARD, skb, bridge_parent(skb->physindev),
+ bridge_parent(skb->dev), br_nf_forward_finish);
+
+ return NF_STOLEN;
+}
+
+
+/* PF_BRIDGE/LOCAL_OUT ***********************************************/
+static int br_nf_local_out_finish(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug &= ~(1 << NF_BR_LOCAL_OUT);
+#endif
+
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+ br_forward_finish, NF_BR_PRI_FIRST + 1);
+
+ return 0;
+}
+
+
+/* This function sees both locally originated IP packets and forwarded
+ * IP packets (in both cases the destination device is a bridge
+ * device). It also sees bridged-and-DNAT'ed packets.
+ * For the sake of interface transparency (i.e. properly
+ * overloading the '-o' option), we steal packets destined to
+ * a bridge device away from the PF_INET/FORWARD and PF_INET/OUTPUT hook
+ * functions, and give them back later, when we have determined the real
+ * output device. This is done in here.
+ *
+ * If (skb->brnfmask & BRNF_BRIDGED_DNAT) then the packet is bridged
+ * and we fake the PF_BRIDGE/FORWARD hook. The function br_nf_forward()
+ * will then fake the PF_INET/FORWARD hook. br_nf_local_out() has priority
+ * NF_BR_PRI_FIRST, so no relevant PF_BRIDGE/INPUT functions have been nor
+ * will be executed.
+ * Otherwise, if skb->physindev is NULL, the bridge-nf code never touched
+ * this packet before, and so the packet was locally originated. We fake
+ * the PF_INET/LOCAL_OUT hook.
+ * Finally, if skb->physindev isn't NULL, then the packet was IP routed,
+ * so we fake the PF_INET/FORWARD hook. ipv4_sabotage_out() makes sure
+ * even routed packets that didn't arrive on a bridge interface have their
+ * skb->physindev set.
+ */
+
+static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*_okfn)(struct sk_buff *))
+{
+ int (*okfn)(struct sk_buff *skb);
+ struct net_device *realindev;
+ struct sk_buff *skb = *pskb;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ /* Sometimes we get packets with NULL ->dst here (for example,
+ * running a dhcp client daemon triggers this).
+ */
+ if (skb->dst == NULL)
+ return NF_ACCEPT;
+
+ skb->physoutdev = skb->dev;
+
+ realindev = skb->physindev;
+
+ /* Bridged, take PF_BRIDGE/FORWARD.
+ * (see big note in front of br_nf_pre_routing_finish)
+ */
+ if (skb->brnfmask & BRNF_BRIDGED_DNAT) {
+ okfn = br_forward_finish;
+
+ if (skb->brnfmask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ skb->brnfmask ^= BRNF_PKT_TYPE;
+ }
+
+ NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
+ skb->dev, okfn);
+ } else {
+ okfn = br_nf_local_out_finish;
+ /* IP forwarded traffic has a physindev, locally
+ * generated traffic hasn't.
+ */
+ if (realindev != NULL) {
+ if (((skb->brnfmask & BRNF_DONT_TAKE_PARENT) == 0) &&
+ has_bridge_parent(realindev))
+ realindev = bridge_parent(realindev);
+
+ NF_HOOK_THRESH(PF_INET, NF_IP_FORWARD, skb, realindev,
+ bridge_parent(skb->dev), okfn,
+ NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD + 1);
+ } else {
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_LOCAL_OUT);
+#endif
+
+ NF_HOOK_THRESH(PF_INET, NF_IP_LOCAL_OUT, skb, realindev,
+ bridge_parent(skb->dev), okfn,
+ NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT + 1);
+ }
+ }
+
+ return NF_STOLEN;
+}
+
+
+/* PF_BRIDGE/POST_ROUTING ********************************************/
+static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+
+ /* Be very paranoid. */
+ if (skb->mac.raw < skb->head || skb->mac.raw + ETH_HLEN > skb->data) {
+ printk(KERN_CRIT "br_netfilter: Argh!! br_nf_post_routing: "
+ "bad mac.raw pointer.");
+ if (skb->dev != NULL) {
+ printk("[%s]", skb->dev->name);
+ if (has_bridge_parent(skb->dev))
+ printk("[%s]", bridge_parent(skb->dev)->name);
+ }
+ printk("\n");
+ return NF_ACCEPT;
+ }
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ /* Sometimes we get packets with NULL ->dst here (for example,
+ * running a dhcp client daemon triggers this).
+ */
+ if (skb->dst == NULL)
+ return NF_ACCEPT;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_POST_ROUTING);
+#endif
+
+ /* We assume any code from br_dev_queue_push_xmit onwards doesn't care
+ * about the value of skb->pkt_type.
+ */
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ skb->brnfmask |= BRNF_PKT_TYPE;
+ }
+
+ /* Fragmented packets need a good Ethernet header, tell this to
+ * ip_output.c::ip_fragment().
+ */
+ skb->brnfmask |= BRNF_COPY_HEADER;
+
+ NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+ bridge_parent(skb->dev), br_dev_queue_push_xmit);
+
+ return NF_STOLEN;
+}
+
+
+/* IPv4/SABOTAGE *****************************************************/
+
+/* Don't hand locally destined packets to PF_INET/PRE_ROUTING
+ * for the second time.
+ */
+static unsigned int ipv4_sabotage_in(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ if (in->hard_start_xmit == br_dev_xmit &&
+ okfn != br_nf_pre_routing_finish) {
+ okfn(*pskb);
+ return NF_STOLEN;
+ }
+
+ return NF_ACCEPT;
+}
+
+/* Postpone execution of PF_INET/FORWARD, PF_INET/LOCAL_OUT
+ * and PF_INET/POST_ROUTING until we have done the forwarding
+ * decision in the bridge code and have determined skb->physoutdev.
+ */
+static unsigned int ipv4_sabotage_out(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ if (out->hard_start_xmit == br_dev_xmit &&
+ okfn != br_nf_forward_finish &&
+ okfn != br_nf_local_out_finish &&
+ okfn != br_dev_queue_push_xmit) {
+ struct sk_buff *skb = *pskb;
+
+ /* This frame will arrive on PF_BRIDGE/LOCAL_OUT and we
+ * will need the indev then. For a brouter, the real indev
+ * can be a bridge port, so we make sure br_nf_local_out()
+ * doesn't use the bridge parent of the indev by using
+ * the BRNF_DONT_TAKE_PARENT mask.
+ */
+ if (hook == NF_IP_FORWARD && skb->physindev == NULL) {
+ skb->brnfmask |= BRNF_DONT_TAKE_PARENT;
+ skb->physindev = (struct net_device *)in;
+ }
+ okfn(skb);
+ return NF_STOLEN;
+ }
+
+ return NF_ACCEPT;
+}
+
+/* For br_nf_local_out we need (prio = NF_BR_PRI_FIRST), to ensure that
+ * innocent PF_BRIDGE/NF_BR_LOCAL_OUT functions don't get bridged traffic
+ * as input.
+ * For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
+ * ip_refrag() can return NF_STOLEN.
+ */
+static struct nf_hook_ops br_nf_ops[] = {
+ { { NULL, NULL }, br_nf_pre_routing, PF_BRIDGE, NF_BR_PRE_ROUTING, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_local_in, PF_BRIDGE, NF_BR_LOCAL_IN, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_forward, PF_BRIDGE, NF_BR_FORWARD, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_local_out, PF_BRIDGE, NF_BR_LOCAL_OUT, NF_BR_PRI_FIRST },
+ { { NULL, NULL }, br_nf_post_routing, PF_BRIDGE, NF_BR_POST_ROUTING, NF_BR_PRI_LAST },
+ { { NULL, NULL }, ipv4_sabotage_in, PF_INET, NF_IP_PRE_ROUTING, NF_IP_PRI_FIRST },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_FORWARD, NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_FIRST }
+};
+
+#define NUMHOOKS (sizeof(br_nf_ops)/sizeof(br_nf_ops[0]))
+
+int br_netfilter_init(void)
+{
+ int i;
+
+ for (i = 0; i < NUMHOOKS; i++) {
+ int ret;
+
+ if ((ret = nf_register_hook(&br_nf_ops[i])) >= 0)
+ continue;
+
+ while (i--)
+ nf_unregister_hook(&br_nf_ops[i]);
+
+ return ret;
+ }
+
+ printk(KERN_NOTICE "Bridge firewalling registered\n");
+
+ return 0;
+}
+
+void br_netfilter_fini(void)
+{
+ int i;
+
+ for (i = NUMHOOKS - 1; i >= 0; i--)
+ nf_unregister_hook(&br_nf_ops[i]);
+}
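
The priority threshold that this patch threads through nf_hook_slow() and nf_iterate() can be illustrated outside the kernel. What follows is a simplified user-space model, not the kernel code: struct hook, iterate_hooks() and the tag_* callbacks are hypothetical names, and it assumes that hooks whose priority is below the threshold are simply skipped.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's verdicts and hook list. */
enum verdict { V_ACCEPT, V_DROP };

struct hook {
    int priority;                  /* the array is kept sorted, lowest first */
    enum verdict (*fn)(int *pkt);
};

/* Example callbacks: each one just leaves a mark on the "packet". */
static enum verdict tag_first(int *pkt)  { *pkt |= 1; return V_ACCEPT; }
static enum verdict tag_second(int *pkt) { *pkt |= 2; return V_ACCEPT; }
static enum verdict tag_third(int *pkt)  { *pkt |= 4; return V_ACCEPT; }

/* Run every hook whose priority is >= thresh, in list order.
 * Hooks below the threshold are skipped, so a caller that stole the
 * packet at priority P can resume traversal by passing P + 1. */
static enum verdict iterate_hooks(const struct hook *hooks, size_t n,
                                  int *pkt, int thresh)
{
    for (size_t i = 0; i < n; i++) {
        if (hooks[i].priority < thresh)
            continue;
        if (hooks[i].fn(pkt) == V_DROP)
            return V_DROP;
    }
    return V_ACCEPT;
}
```

With a threshold of INT_MIN every hook runs, which is why the patch passes INT_MIN when no hook should be skipped. A threshold of "priority of the stealing hook + 1" skips that hook and everything before it.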

2002-10-14 18:03:08

by David Miller

Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42


These changes cannot go in:

1) There is no reason the 'okfn' you use cannot be the
function doing the MAC header copy.

This is how this is supposed to work.

I explained in that long thread a few weeks ago how
this copy may not be placed in the generic IP code.
This is final, you must find a way to make this copy
without touching ipv4/*.c

2) The netfilter changes need to be approved by the netfilter
team.

I suspect, like myself, they will barf at the phys{in,out}dev
additions to sk_buff. We already have enough junk sitting
in sk_buff making it larger than it needs to be.

Perhaps you can hang this off the nf_conntrack pointer and
specify a destructor.

3) The bridging layer changes need to be approved by Lennert.
But I'd suggest working out #1 and #2 first.

Thanks.
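
Point 1 can be sketched in user space. This is only an illustration under stated assumptions, not the bridge code: struct pkt, generic_output(), bridge_okfn() and xmit() are hypothetical names, and a 14-byte header is assumed (the posted patch copies 16 bytes). The idea is that the generic path only ever calls the okfn it was handed, and the bridge-owned okfn is where the Ethernet header gets restored.

```c
#include <string.h>

#define ETH_HLEN 14   /* assumed header length for this sketch */

/* Hypothetical, much-simplified packet: `data` points past the link
 * header inside `buf`, loosely mirroring skb->data with headroom. */
struct pkt {
    unsigned char buf[64];
    unsigned char *data;
    unsigned char saved_hdr[ETH_HLEN];   /* header stashed by the bridge code */
};

typedef int (*okfn_t)(struct pkt *);

/* Stand-in for the final transmit step. */
static int xmit(struct pkt *p)
{
    (void)p;
    return 0;
}

/* Bridge-owned continuation: put the saved header back in front of
 * the payload, then transmit. */
static int bridge_okfn(struct pkt *p)
{
    memcpy(p->data - ETH_HLEN, p->saved_hdr, ETH_HLEN);
    return xmit(p);
}

/* Generic output path: knows nothing about bridging, it only calls
 * whatever continuation the caller supplied. */
static int generic_output(struct pkt *p, okfn_t okfn)
{
    return okfn(p);
}
```

The generic function stays free of bridge-specific code; everything bridge-specific lives behind the function pointer.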

2002-10-14 18:26:15

by bert hubert

Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42

On Mon, Oct 14, 2002 at 11:01:59AM -0700, David S. Miller wrote:

> 3) The bridging layer changes need to be approved by Lennert.
> But I'd suggest working out #1 and #2 first.

Lennert appears to have dropped off the net.

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-10-14 18:50:58

by Bart De Schuymer

Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42

On Monday 14 October 2002 20:01, David S. Miller wrote:
Hello,

These are probably stupid questions to you, but here it goes.

> These changes cannot go in:
>
> 1) There is no reason the 'okfn' you use cannot be the
> function doing the MAC header copy.
>
> This is how this is supposed to work.
>
> I explained in that long thread a few weeks ago how
> this copy may not be placed in the generic IP code.
> This is final, you must find a way to make this copy
> without touching ipv4/*.c

I've checked the skb->dst->hh field and it (or skb->dst itself) was NULL for
purely bridged packets. So we'd have to fill this in ourselves. Can the
bridge code go fill in a skb->dst and skb->dst->hh? Is this considered clean?

> 2) The netfilter changes need to be approved by the netfilter
> team.
>
> I suspect, like myself, they will barf at the phys{in,out}dev
> additions to sk_buff. We already have enough junk sitting
> in sk_buff making it larger than it needs to be.

I added a third member as well... It's needed too, in my opinion.
A pointer to a struct containing these three values (and a copied Ethernet
header) could of course be added instead. Then we go from 3 extra members to 1...
Anyway, it's not like Lennert and I like adding new members, but we need to
save those things somewhere...

> Perhaps you can hang this off the nf_conntrack pointer and
> specify a destructor.
>
> 3) The bridging layer changes need to be approved by Lennert.
> But I'd suggest working out #1 and #2 first.

So if I change
struct nf_conntrack {
atomic_t use;
void (*destroy)(struct nf_conntrack *);
};

into this:

struct nf_conntrack {
atomic_t use;
void (*destroy)(struct nf_conntrack *);
struct brnf_data *brnf;
};

I can keep the copy of the Ethernet header in the struct brnf_data too (then I
don't have to touch skbuff->dst).
The skbuff->nfct field can already be in use by an IP connection tracker (or
something), so I can't use my own destroy function.
So I'd have to go do something in
net/ipv4/netfilter/ip_conntrack_core.c::destroy_conntrack() and I don't know
that stuff.
I sure don't like this solution more than the current situation.

Anyway, mapping the IPv4 hooks onto the bridge hooks is in my opinion by
definition a hack. But a very useful hack. So if you want this in the kernel
you'll have to be forgiving. Or present a nice solution, because I and
probably Lennert really don't see a nice(r) solution.

So, the best solution I can think of is adding a skbuff->brnf pointer to a
struct brnf_data. This will get rid of the copy in ip_output.c. Is that
enough? This will uglify the ip_tables.c patch however.

--
cheers,
Bart
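
For reference, the ip_tables.c change mentioned above makes an interface rule match when either the logical device or the physical bridge port matches. A minimal user-space sketch of that check, assuming byte-wise comparison instead of the word-wise loop in the patch and leaving out the FWINV inversion; name_mismatch() and iface_matches() are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ 16

/* Nonzero when `dev` differs from `want` in any byte selected by `mask`. */
static int name_mismatch(const char dev[IFNAMSIZ],
                         const char want[IFNAMSIZ],
                         const unsigned char mask[IFNAMSIZ])
{
    unsigned char diff = 0;

    for (size_t i = 0; i < IFNAMSIZ; i++)
        diff |= (unsigned char)((dev[i] ^ want[i]) & mask[i]);
    return diff != 0;
}

/* The rule matches if either the logical device (e.g. the bridge) or
 * the physical device (the bridge port) matches the wanted name. */
static int iface_matches(const char logical[IFNAMSIZ],
                         const char physical[IFNAMSIZ],
                         const char want[IFNAMSIZ],
                         const unsigned char mask[IFNAMSIZ])
{
    return !(name_mismatch(logical, want, mask) &&
             name_mismatch(physical, want, mask));
}
```

A mask of all 0xff bytes asks for an exact name; a mask covering only the first few bytes behaves like a prefix match.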

2002-10-14 19:03:09

by David Miller

Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42

From: Bart De Schuymer <[email protected]>
Date: Mon, 14 Oct 2002 20:58:53 +0200

Can the bridge code go fill in a skb->dst and skb->dst->hh? Is this
considered clean?

If it is a properly formed 'dst' entry, it will get cleaned up
at SKB free and there will be no problems.

> 3) The bridging layer changes need to be approved by Lennert.
> But I'd suggest working out #1 and #2 first.

So if I change
struct nf_conntrack {

You shouldn't be touching nf_conntrack, you should perhaps
instead do something like:

struct nf_ct_info {
union {
struct nf_conntrack *master;
struct nf_bridge_info *brinfo;
} u;
};

But again, you need to get these sorts of extensions and core
changes approved by the netfilter team.

I'm the wrong person to ask about how they would prefer this
stuff be done.

So if you want this in the kernel you'll have to be forgiving. Or
present a nice solution, because I and probably Lennert really
don't see a nice(r) solution.

It is my job to show you why a piece of code isn't going
to go in. It is not my job to help you dream up a better
solution.

Because, frankly I don't care about bridge netfiltering.

I do care about keeping the code as clean as possible so I don't
run into road blocks when trying to rework input/output processing
just because I let some bogon hack into the tree I must continue to
support.

You do care about bridge netfiltering, so you are going to be the
one to find the clean solution that doesn't touch net/ipv4/*.c :-)

This is life in the kernel hacking community :-)

So, the best solution I can think of is adding a skbuff->brnf pointer to a
struct brnf_data. This will get rid of the copy in ip_output.c. Is that
enough? This will uglify the ip_tables.c patch however.

That could work too, I think you'll need to specify a separate
destructor in that case, and all this stuff ifdef'd on whether
bridge netfiltering is enabled or not.

Again, talk to the netfilter folks. They may even have ideas
for you that you haven't dreamt of yet.

Franks a lot,
David S. Miller
[email protected]

2002-10-14 19:22:03

by Bart De Schuymer

Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42

> It is my job to show you why a piece of code isn't going
> to go in. It is not my job to help you dream up a better
> solution.
>
> Because, frankly I don't care about bridge netfiltering.

You were the one who asked for that patch.

> I do care about keeping the code as clean as possible so I don't
> run into road blocks when trying to rework input/output processing
> just because I let some bogon hack into the tree I must continue to
> support.

Ack.

> You do care about bridge netfiltering, so you are going to be the
> one to find the clean solution that doesn't touch net/ipv4/*.c :-)

I care about Linux. I absolutely don't need a bridging firewall for anything.
I just happen to know something about it.

> That could work too, I think you'll need to specify a separate
> destructor in that case, and all this stuff ifdef'd on whether
> bridge netfiltering is enabled or not.

This brings me to another question: I've been told it is the general consensus
that this bridge firewall should be compiled into the kernel if
CONFIG_NETFILTER=y. Or should it be a user option? It is predicted that making
it a user option will lead to a lot of questions about the bridge firewall not
working.

> Again, talk to the netfilter folks. They may even have ideas
> for you that you haven't dreamt of yet.

Will do.

--
cheers,
Bart

2002-10-14 19:27:40

by David Miller

Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.42

From: Bart De Schuymer <[email protected]>
Date: Mon, 14 Oct 2002 21:29:56 +0200

You were the one who asked for that patch.

I asked for the patch to be cleaned up to eliminate
the net/ipv4/*.c hacking. :)

This brings me to another question: I've been told it is the
general consensus that this bridge firewall should be compiled in
the kernel if CONFIG_NETFILTER=y.

I don't have any strong opinion here.

2002-10-20 22:12:33

by Bart De Schuymer

Subject: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.44

Hello,

This is a follow-up from the previous RFC for the bridge-nf patch.
The new patch adds one member to the skbuff, a pointer to a struct
nf_bridge_info. There is still a need to change ip_output.c, but the change
is analogous to what is done for the skbuff->nfct pointer field. So, for me
this is a clean solution. The copy of the Ethernet header is no longer done
in ip_fragment().
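
The general shape of such a pointer, shared between packet copies and freed by its own release function, can be sketched in user space. This is an assumption-laden illustration, not the layout used by the patch: struct bridge_info, struct pkt and the helper names are hypothetical, and a plain int stands in for the kernel's atomic reference count.

```c
#include <stdlib.h>

/* Hypothetical side structure hung off the packet by a single pointer. */
struct bridge_info {
    int use;                   /* reference count (atomic in the kernel) */
    const char *physindev;     /* name of the real input port */
    const char *physoutdev;    /* name of the real output port */
    unsigned int mask;         /* per-packet flags */
};

/* Hypothetical packet carrying only the one extra pointer. */
struct pkt {
    struct bridge_info *bridge;
};

/* Allocate a zeroed structure holding one reference. */
static struct bridge_info *bridge_info_alloc(void)
{
    struct bridge_info *bi = calloc(1, sizeof(*bi));

    if (bi)
        bi->use = 1;
    return bi;
}

static void bridge_info_get(struct bridge_info *bi)
{
    if (bi)
        bi->use++;
}

/* Drop one reference; returns 1 when the structure was freed. */
static int bridge_info_put(struct bridge_info *bi)
{
    if (bi && --bi->use == 0) {
        free(bi);
        return 1;
    }
    return 0;
}

/* Copying a packet shares the side structure instead of duplicating it. */
static void pkt_clone(struct pkt *dst, const struct pkt *src)
{
    dst->bridge = src->bridge;
    bridge_info_get(dst->bridge);
}
```

Packets that never pass through the bridge code keep a NULL pointer and pay only for the pointer itself.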

The patch is available at:
http://users.pandora.be/bart.de.schuymer/ebtables/br-nf/bridge-nf-0.0.10-dev-pre2-against-2.5.44.diff
An incremental diff, for 2.5.42, against the previous patch is here:
http://users.pandora.be/bart.de.schuymer/ebtables/br-nf/bridge-nf-0.0.10-dev-pre2.001-against-2.5.42.diff

David, are you happy with this solution?
Other comments?

Here's the patch.


--- linux-2.5.44/include/linux/netfilter.h Sat Oct 19 06:01:54 2002
+++ linux-2.5.44-brnf/include/linux/netfilter.h Sun Oct 20 21:57:52 2002
@@ -117,17 +117,23 @@
/* This is gross, but inline doesn't cut it for avoiding the function
call in fast path: gcc doesn't inline (needs value tracking?). --RR */
#ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
+ nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
+#define NF_HOOK_THRESH nf_hook_slow
#else
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
(list_empty(&nf_hooks[(pf)][(hook)]) \
? (okfn)(skb) \
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn)))
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh) \
+(list_empty(&nf_hooks[(pf)][(hook)]) \
+ ? (okfn)(skb) \
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
#endif

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
struct net_device *indev, struct net_device *outdev,
- int (*okfn)(struct sk_buff *));
+ int (*okfn)(struct sk_buff *), int thresh);

/* Call setsockopt() */
int nf_setsockopt(struct sock *sk, int pf, int optval, char *opt,
--- linux-2.5.44/include/linux/netfilter_ipv4.h Sat Oct 19 06:02:28 2002
+++ linux-2.5.44-brnf/include/linux/netfilter_ipv4.h Sun Oct 20 21:57:52 2002
@@ -52,8 +52,10 @@
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
NF_IP_PRI_CONNTRACK = -200,
+ NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
+ NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
NF_IP_PRI_FILTER = 0,
NF_IP_PRI_NAT_SRC = 100,
NF_IP_PRI_LAST = INT_MAX,
--- linux-2.5.44/include/linux/netfilter_bridge.h Sat Oct 19 06:01:57 2002
+++ linux-2.5.44-brnf/include/linux/netfilter_bridge.h Sun Oct 20 21:57:52 2002
@@ -6,6 +6,7 @@

#include <linux/config.h>
#include <linux/netfilter.h>
+#include <asm/atomic.h>

/* Bridge Hooks */
/* After promisc drops, checksum checks. */
@@ -22,14 +23,39 @@
#define NF_BR_BROUTING 5
#define NF_BR_NUMHOOKS 6

+#define BRNF_PKT_TYPE 0x01
+#define BRNF_BRIDGED_DNAT 0x02
+#define BRNF_DONT_TAKE_PARENT 0x04
+
enum nf_br_hook_priorities {
NF_BR_PRI_FIRST = INT_MIN,
- NF_BR_PRI_FILTER_BRIDGED = -200,
- NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_DST_BRIDGED = -300,
+ NF_BR_PRI_FILTER_BRIDGED = -200,
+ NF_BR_PRI_BRNF = 0,
NF_BR_PRI_NAT_DST_OTHER = 100,
+ NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_SRC = 300,
NF_BR_PRI_LAST = INT_MAX,
+};
+
+static inline
+struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
+{
+ struct nf_bridge_info **nf_bridge = &(skb->nf_bridge);
+
+ if ((*nf_bridge = kmalloc(sizeof(**nf_bridge), GFP_ATOMIC)) != NULL) {
+ atomic_set(&(*nf_bridge)->use, 1);
+ (*nf_bridge)->mask = 0;
+ (*nf_bridge)->physindev = (*nf_bridge)->physoutdev = NULL;
+ }
+
+ return *nf_bridge;
+}
+
+struct bridge_skb_cb {
+ union {
+ __u32 ipv4;
+ } daddr;
};

#endif
--- linux-2.5.44/include/linux/skbuff.h Sat Oct 19 06:01:58 2002
+++ linux-2.5.44-brnf/include/linux/skbuff.h Sun Oct 20 21:57:52 2002
@@ -96,6 +96,14 @@
struct nf_ct_info {
struct nf_conntrack *master;
};
+
+struct nf_bridge_info {
+ atomic_t use;
+ struct net_device *physindev;
+ struct net_device *physoutdev;
+ unsigned int mask;
+ unsigned long hh[16 / sizeof(unsigned long)];
+};
#endif

struct sk_buff_head {
@@ -166,6 +174,7 @@
* @nfcache: Cache info
* @nfct: Associated connection, if any
* @nf_debug: Netfilter debugging
+ * @nf_bridge: Saved data about a bridged frame - see br_netfilter.c
* @tc_index: Traffic control index
*/

@@ -236,6 +245,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
unsigned int nf_debug;
#endif
+ struct nf_bridge_info *nf_bridge;
#endif /* CONFIG_NETFILTER */
#if defined(CONFIG_HIPPI)
union {
@@ -1145,6 +1155,17 @@
{
if (nfct)
atomic_inc(&nfct->master->use);
+}
+
+static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
+{
+ if (nf_bridge && atomic_dec_and_test(&nf_bridge->use))
+ kfree(nf_bridge);
+}
+static inline void nf_bridge_get(struct nf_bridge_info *nf_bridge)
+{
+ if (nf_bridge)
+ atomic_inc(&nf_bridge->use);
}
#endif

--- linux-2.5.44/net/bridge/br.c Sat Oct 19 06:01:15 2002
+++ linux-2.5.44-brnf/net/bridge/br.c Sun Oct 20 21:57:52 2002
@@ -45,6 +45,10 @@
{
printk(KERN_INFO "NET4: Ethernet Bridge 008 for NET4.0\n");

+#ifdef CONFIG_NETFILTER
+ if (br_netfilter_init())
+ return 1;
+#endif
br_handle_frame_hook = br_handle_frame;
br_ioctl_hook = br_ioctl_deviceless_stub;
#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
@@ -63,6 +67,9 @@

static void __exit br_deinit(void)
{
+#ifdef CONFIG_NETFILTER
+ br_netfilter_fini();
+#endif
unregister_netdevice_notifier(&br_device_notifier);
br_call_ioctl_atomic(__br_clear_ioctl_hook);

--- linux-2.5.44/net/bridge/br_forward.c Sat Oct 19 06:01:20 2002
+++ linux-2.5.44-brnf/net/bridge/br_forward.c Sun Oct 20 21:57:52 2002
@@ -30,18 +30,23 @@
return 1;
}

-static int __dev_queue_push_xmit(struct sk_buff *skb)
+int br_dev_queue_push_xmit(struct sk_buff *skb)
{
+#ifdef CONFIG_NETFILTER
+ if (skb->nf_bridge)
+ memcpy(skb->data - 16, skb->nf_bridge->hh, 16);
+#endif
skb_push(skb, ETH_HLEN);
+
dev_queue_xmit(skb);

return 0;
}

-static int __br_forward_finish(struct sk_buff *skb)
+int br_forward_finish(struct sk_buff *skb)
{
NF_HOOK(PF_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev,
- __dev_queue_push_xmit);
+ br_dev_queue_push_xmit);

return 0;
}
@@ -53,7 +58,7 @@
skb->nf_debug = 0;
#endif
NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
- __br_forward_finish);
+ br_forward_finish);
}

static void __br_forward(struct net_bridge_port *to, struct sk_buff *skb)
@@ -64,7 +69,7 @@
skb->dev = to->dev;

NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev,
- __br_forward_finish);
+ br_forward_finish);
}

/* called under bridge lock */
--- linux-2.5.44/net/bridge/br_input.c Sat Oct 19 06:01:18 2002
+++ linux-2.5.44-brnf/net/bridge/br_input.c Sun Oct 20 21:57:52 2002
@@ -49,7 +49,7 @@
br_pass_frame_up_finish);
}

-static int br_handle_frame_finish(struct sk_buff *skb)
+int br_handle_frame_finish(struct sk_buff *skb)
{
struct net_bridge *br;
unsigned char *dest;
--- linux-2.5.44/net/bridge/br_private.h Sat Oct 19 06:01:18 2002
+++ linux-2.5.44-brnf/net/bridge/br_private.h Sun Oct 20 21:57:52 2002
@@ -144,8 +144,10 @@
/* br_forward.c */
extern void br_deliver(struct net_bridge_port *to,
struct sk_buff *skb);
+extern int br_dev_queue_push_xmit(struct sk_buff *skb);
extern void br_forward(struct net_bridge_port *to,
struct sk_buff *skb);
+extern int br_forward_finish(struct sk_buff *skb);
extern void br_flood_deliver(struct net_bridge *br,
struct sk_buff *skb,
int clone);
@@ -166,6 +168,7 @@
int *ifindices);

/* br_input.c */
+extern int br_handle_frame_finish(struct sk_buff *skb);
extern int br_handle_frame(struct sk_buff *skb);

/* br_ioctl.c */
@@ -176,6 +179,10 @@
unsigned long arg1,
unsigned long arg2);
extern int br_ioctl_deviceless_stub(unsigned long arg);
+
+/* br_netfilter.c */
+extern int br_netfilter_init(void);
+extern void br_netfilter_fini(void);

/* br_stp.c */
extern int br_is_root_bridge(struct net_bridge *br);
--- linux-2.5.44/net/bridge/Makefile Sat Oct 19 06:02:32 2002
+++ linux-2.5.44-brnf/net/bridge/Makefile Sun Oct 20 21:57:52 2002
@@ -9,6 +9,11 @@
bridge-objs := br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
br_ioctl.o br_notify.o br_stp.o br_stp_bpdu.o \
br_stp_if.o br_stp_timer.o
+
+ifeq ($(CONFIG_NETFILTER),y)
+bridge-objs += br_netfilter.o
+endif
+
obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/

include $(TOPDIR)/Rules.make
--- linux-2.5.44/net/core/netfilter.c Sat Oct 19 06:01:53 2002
+++ linux-2.5.44-brnf/net/core/netfilter.c Sun Oct 20 21:57:52 2002
@@ -342,10 +342,15 @@
const struct net_device *indev,
const struct net_device *outdev,
struct list_head **i,
- int (*okfn)(struct sk_buff *))
+ int (*okfn)(struct sk_buff *),
+ int hook_thresh)
{
for (*i = (*i)->next; *i != head; *i = (*i)->next) {
struct nf_hook_ops *elem = (struct nf_hook_ops *)*i;
+
+ if (hook_thresh > elem->priority)
+ continue;
+
switch (elem->hook(hook, skb, indev, outdev, okfn)) {
case NF_QUEUE:
return NF_QUEUE;
@@ -413,6 +418,8 @@
{
int status;
struct nf_info *info;
+ struct net_device *physindev = NULL;
+ struct net_device *physoutdev = NULL;

if (!queue_handler[pf].outfn) {
kfree_skb(skb);
@@ -435,11 +442,20 @@
if (indev) dev_hold(indev);
if (outdev) dev_hold(outdev);

+ if (skb->nf_bridge) {
+ physindev = skb->nf_bridge->physindev;
+ if (physindev) dev_hold(physindev);
+ physoutdev = skb->nf_bridge->physoutdev;
+ if (physoutdev) dev_hold(physoutdev);
+ }
+
status = queue_handler[pf].outfn(skb, info, queue_handler[pf].data);
if (status < 0) {
/* James M doesn't say fuck enough. */
if (indev) dev_put(indev);
if (outdev) dev_put(outdev);
+ if (physindev) dev_put(physindev);
+ if (physoutdev) dev_put(physoutdev);
kfree(info);
kfree_skb(skb);
return;
@@ -449,7 +465,8 @@
int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
struct net_device *indev,
struct net_device *outdev,
- int (*okfn)(struct sk_buff *))
+ int (*okfn)(struct sk_buff *),
+ int hook_thresh)
{
struct list_head *elem;
unsigned int verdict;
@@ -481,7 +498,7 @@

elem = &nf_hooks[pf][hook];
verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
- outdev, &elem, okfn);
+ outdev, &elem, okfn, hook_thresh);
if (verdict == NF_QUEUE) {
NFDEBUG("nf_hook: Verdict = QUEUE.\n");
nf_queue(skb, elem, pf, hook, indev, outdev, okfn);
@@ -530,7 +547,7 @@
verdict = nf_iterate(&nf_hooks[info->pf][info->hook],
&skb, info->hook,
info->indev, info->outdev, &elem,
- info->okfn);
+ info->okfn, INT_MIN);
}

switch (verdict) {
--- linux-2.5.44/net/core/skbuff.c Sat Oct 19 06:01:17 2002
+++ linux-2.5.44-brnf/net/core/skbuff.c Sun Oct 20 21:57:52 2002
@@ -248,6 +248,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
skb->nf_debug = 0;
#endif
+ skb->nf_bridge = NULL;
#endif
#ifdef CONFIG_NET_SCHED
skb->tc_index = 0;
@@ -327,6 +328,7 @@
}
#ifdef CONFIG_NETFILTER
nf_conntrack_put(skb->nfct);
+ nf_bridge_put(skb->nf_bridge);
#endif
skb_headerinit(skb, NULL, 0); /* clean state */
kfree_skbmem(skb);
@@ -392,6 +394,7 @@
#ifdef CONFIG_NETFILTER_DEBUG
C(nf_debug);
#endif
+ C(nf_bridge);
#endif /*CONFIG_NETFILTER*/
#if defined(CONFIG_HIPPI)
C(private);
@@ -404,6 +407,7 @@
skb->cloned = 1;
#ifdef CONFIG_NETFILTER
nf_conntrack_get(skb->nfct);
+ nf_bridge_get(skb->nf_bridge);
#endif
return n;
}
@@ -438,6 +442,8 @@
#ifdef CONFIG_NETFILTER_DEBUG
new->nf_debug = old->nf_debug;
#endif
+ new->nf_bridge = old->nf_bridge;
+ nf_bridge_get(new->nf_bridge);
#endif
#ifdef CONFIG_NET_SCHED
new->tc_index = old->tc_index;
--- linux-2.5.44/net/ipv4/ip_output.c Sat Oct 19 06:02:34 2002
+++ linux-2.5.44-brnf/net/ipv4/ip_output.c Sun Oct 20 22:00:22 2002
@@ -396,6 +396,8 @@
/* Connection association is same as pre-frag packet */
to->nfct = from->nfct;
nf_conntrack_get(to->nfct);
+ to->nf_bridge = from->nf_bridge;
+ nf_bridge_get(to->nf_bridge);
#ifdef CONFIG_NETFILTER_DEBUG
to->nf_debug = from->nf_debug;
#endif
--- linux-2.5.44/net/ipv4/netfilter/ip_tables.c Sat Oct 19 06:01:18 2002
+++ linux-2.5.44-brnf/net/ipv4/netfilter/ip_tables.c Sun Oct 20 21:57:52 2002
@@ -121,12 +121,14 @@
static inline int
ip_packet_match(const struct iphdr *ip,
const char *indev,
+ const char *physindev,
const char *outdev,
+ const char *physoutdev,
const struct ipt_ip *ipinfo,
int isfrag)
{
size_t i;
- unsigned long ret;
+ unsigned long ret, ret2;

#define FWINV(bool,invflg) ((bool) ^ !!(ipinfo->invflags & invflg))

@@ -156,7 +158,13 @@
& ((const unsigned long *)ipinfo->iniface_mask)[i];
}

- if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
+ for (i = 0, ret2 = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret2 |= (((const unsigned long *)physindev)[i]
+ ^ ((const unsigned long *)ipinfo->iniface)[i])
+ & ((const unsigned long *)ipinfo->iniface_mask)[i];
+ }
+
+ if (FWINV(ret != 0 && ret2 != 0, IPT_INV_VIA_IN)) {
dprintf("VIA in mismatch (%s vs %s).%s\n",
indev, ipinfo->iniface,
ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
@@ -169,7 +177,13 @@
& ((const unsigned long *)ipinfo->outiface_mask)[i];
}

- if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
+ for (i = 0, ret2 = 0; i < IFNAMSIZ/sizeof(unsigned long); i++) {
+ ret2 |= (((const unsigned long *)physoutdev)[i]
+ ^ ((const unsigned long *)ipinfo->outiface)[i])
+ & ((const unsigned long *)ipinfo->outiface_mask)[i];
+ }
+
+ if (FWINV(ret != 0 && ret2 != 0, IPT_INV_VIA_OUT)) {
dprintf("VIA out mismatch (%s vs %s).%s\n",
outdev, ipinfo->outiface,
ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
@@ -268,6 +282,7 @@
/* Initializing verdict to NF_DROP keeps gcc happy. */
unsigned int verdict = NF_DROP;
const char *indev, *outdev;
+ const char *physindev, *physoutdev;
void *table_base;
struct ipt_entry *e, *back;

@@ -277,6 +292,16 @@
datalen = (*pskb)->len - ip->ihl * 4;
indev = in ? in->name : nulldevname;
outdev = out ? out->name : nulldevname;
+ if ((*pskb)->nf_bridge) {
+ physindev = (*pskb)->nf_bridge->physindev ?
+ (*pskb)->nf_bridge->physindev->name : nulldevname;
+ physoutdev = (*pskb)->nf_bridge->physoutdev ?
+ (*pskb)->nf_bridge->physoutdev->name : nulldevname;
+ } else {
+ physindev = nulldevname;
+ physoutdev = nulldevname;
+ }
+
/* We handle fragments by dealing with the first fragment as
* if it was a normal packet. All other fragments are treated
* normally, except that they will NEVER match rules that ask
@@ -311,7 +336,8 @@
IP_NF_ASSERT(e);
IP_NF_ASSERT(back);
(*pskb)->nfcache |= e->nfcache;
- if (ip_packet_match(ip, indev, outdev, &e->ip, offset)) {
+ if (ip_packet_match(ip, indev, physindev, outdev, physoutdev,
+ &e->ip, offset)) {
struct ipt_entry_target *t;

if (IPT_MATCH_ITERATE(e, do_match,
--- linux-2.5.44/net/ipv4/netfilter/ipt_LOG.c Sat Oct 19 06:01:21 2002
+++ linux-2.5.44-brnf/net/ipv4/netfilter/ipt_LOG.c Sun Oct 20 21:57:52 2002
@@ -285,10 +285,18 @@
level_string[1] = '0' + (loginfo->level % 8);
spin_lock_bh(&log_lock);
printk(level_string);
- printk("%sIN=%s OUT=%s ",
- loginfo->prefix,
- in ? in->name : "",
- out ? out->name : "");
+ printk("%sIN=%s ", loginfo->prefix, in ? in->name : "");
+ if ((*pskb)->nf_bridge) {
+ struct net_device *physindev = (*pskb)->nf_bridge->physindev;
+ struct net_device *physoutdev = (*pskb)->nf_bridge->physoutdev;
+
+ if (physindev && in != physindev)
+ printk("PHYSIN=%s ", physindev->name);
+ printk("OUT=%s ", out ? out->name : "");
+ if (physoutdev && out != physoutdev)
+ printk("PHYSOUT=%s ", physoutdev->name);
+ }
+
if (in && !out) {
/* MAC logging for input chain only. */
printk("MAC=");
--- /dev/null Thu Aug 24 11:00:32 2000
+++ linux-2.5.44-brnf/net/bridge/br_netfilter.c Sun Oct 20 21:57:52 2002
@@ -0,0 +1,616 @@
+/*
+ * Handle firewalling
+ * Linux ethernet bridge
+ *
+ * Authors:
+ * Lennert Buytenhek <[email protected]>
+ * Bart De Schuymer <[email protected]>
+ *
+ * $Id: bridge-nf-0.0.10-dev-pre2-against-2.5.42.diff,v 1.1 2002/10/19 10:46:51 bdschuym Exp $
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Lennert dedicates this file to Kerstin Wurdinger.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/ip.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/if_ether.h>
+#include <linux/netfilter_bridge.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/in_route.h>
+#include <net/ip.h>
+#include <asm/uaccess.h>
+#include <asm/checksum.h>
+#include "br_private.h"
+
+
+#define skb_origaddr(skb) (((struct bridge_skb_cb *) \
+ (skb->cb))->daddr.ipv4)
+#define store_orig_dstaddr(skb) (skb_origaddr(skb) = (skb)->nh.iph->daddr)
+#define dnat_took_place(skb) (skb_origaddr(skb) != (skb)->nh.iph->daddr)
+#define clear_cb(skb) (memset(&skb_origaddr(skb), 0, \
+ sizeof(struct bridge_skb_cb)))
+
+#define has_bridge_parent(device) ((device)->br_port != NULL)
+#define bridge_parent(device) (&((device)->br_port->br->dev))
+
+/* We need these fake structures to make netfilter happy --
+ * lots of places assume that skb->dst != NULL, which isn't
+ * all that unreasonable.
+ *
+ * Currently, we fill in the PMTU entry because netfilter
+ * refragmentation needs it, and the rt_flags entry because
+ * ipt_REJECT needs it. Future netfilter modules might
+ * require us to fill additional fields.
+ */
+static struct net_device __fake_net_device = {
+ hard_header_len: ETH_HLEN
+};
+
+static struct rtable __fake_rtable = {
+ u: {
+ dst: {
+ __refcnt: ATOMIC_INIT(1),
+ dev: &__fake_net_device,
+ pmtu: 1500
+ }
+ },
+
+ rt_flags: 0
+};
+
+
+/* PF_BRIDGE/PRE_ROUTING *********************************************/
+static void __br_dnat_complain(void)
+{
+ static unsigned long last_complaint = 0;
+
+ if (jiffies - last_complaint >= 5 * HZ) {
+ printk(KERN_WARNING "Performing cross-bridge DNAT requires IP "
+ "forwarding to be enabled\n");
+ last_complaint = jiffies;
+ }
+}
+
+
+/* This requires some explaining. If DNAT has taken place,
+ * we will need to fix up the destination Ethernet address,
+ * and this is a tricky process.
+ *
+ * There are two cases to consider:
+ * 1. The packet was DNAT'ed to a device in the same bridge
+ * port group as it was received on. We can still bridge
+ * the packet.
+ * 2. The packet was DNAT'ed to a different device, either
+ * a non-bridged device or another bridge port group.
+ * The packet will need to be routed.
+ *
+ * The correct way of distinguishing between these two cases is to
+ * call ip_route_input() and to look at skb->dst->dev, which is
+ * changed to the destination device if ip_route_input() succeeds.
+ *
+ * Let us first consider the case that ip_route_input() succeeds:
+ *
+ * If skb->dst->dev equals the logical bridge device the packet
+ * came in on, we can consider this bridging. We then call
+ * skb->dst->output() which will make the packet enter br_nf_local_out()
+ * not much later. In that function it is assured that the iptables
+ * FORWARD chain is traversed for the packet.
+ *
+ * Otherwise, the packet is considered to be routed and we just
+ * change the destination MAC address so that the packet will
+ * later be passed up to the IP stack to be routed.
+ *
+ * Let us now consider the case that ip_route_input() fails:
+ *
+ * After a "echo '0' > /proc/sys/net/ipv4/ip_forward" ip_route_input()
+ * will fail, while ip_route_output() will return success. The source
+ * address for ip_route_output() is set to zero, so ip_route_output()
+ * thinks we're handling a locally generated packet and won't care
+ * if IP forwarding is allowed. We send a warning message to the user's
+ * log telling her to put IP forwarding on.
+ *
+ * ip_route_input() will also fail if there is no route available.
+ * In that case we just drop the packet.
+ *
+ * --Lennert, 20020411
+ * --Bart, 20020416 (updated)
+ * --Bart, 20021007 (updated)
+ */
+
+static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug |= (1 << NF_BR_PRE_ROUTING) | (1 << NF_BR_FORWARD);
+#endif
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ skb->nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ skb->dev = bridge_parent(skb->dev);
+ skb->dst->output(skb);
+ return 0;
+}
+
+static int br_nf_pre_routing_finish(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct iphdr *iph = skb->nh.iph;
+ struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_PRE_ROUTING);
+#endif
+
+ if (nf_bridge->mask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ nf_bridge->mask ^= BRNF_PKT_TYPE;
+ }
+
+ if (dnat_took_place(skb)) {
+ if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
+ dev)) {
+ struct rtable *rt;
+
+ if (!ip_route_output(&rt, iph->daddr, 0, iph->tos, 0)) {
+ /* Bridged-and-DNAT'ed traffic doesn't
+ * require ip_forwarding.
+ */
+ if (((struct dst_entry *)rt)->dev == dev) {
+ skb->dst = (struct dst_entry *)rt;
+ goto bridged_dnat;
+ }
+ __br_dnat_complain();
+ dst_release((struct dst_entry *)rt);
+ }
+ kfree_skb(skb);
+ return 0;
+ } else {
+ if (skb->dst->dev == dev) {
+bridged_dnat:
+ /* Tell br_nf_local_out this is a
+ * bridged frame
+ */
+ nf_bridge->mask |= BRNF_BRIDGED_DNAT;
+ skb->dev = nf_bridge->physindev;
+ clear_cb(skb);
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING,
+ skb, skb->dev, NULL,
+ br_nf_pre_routing_finish_bridge,
+ 1);
+ return 0;
+ }
+ memcpy(skb->mac.ethernet->h_dest, dev->dev_addr,
+ ETH_ALEN);
+ }
+ } else {
+ skb->dst = (struct dst_entry *)&__fake_rtable;
+ dst_hold(skb->dst);
+ }
+
+ clear_cb(skb);
+ skb->dev = nf_bridge->physindev;
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
+ br_handle_frame_finish, 1);
+
+ return 0;
+}
+
+/* Replicate the checks that IPv4 does on packet reception.
+ * Set skb->dev to the bridge device (i.e. parent of the
+ * receiving device) to make netfilter happy, the REDIRECT
+ * target in particular. Save the original destination IP
+ * address to be able to detect DNAT afterwards.
+ */
+static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct iphdr *iph;
+ __u32 len;
+ struct sk_buff *skb;
+ struct nf_bridge_info *nf_bridge;
+
+ if ((*pskb)->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if ((skb = skb_share_check(*pskb, GFP_ATOMIC)) == NULL)
+ goto out;
+
+ if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+ goto inhdr_error;
+
+ iph = skb->nh.iph;
+ if (iph->ihl < 5 || iph->version != 4)
+ goto inhdr_error;
+
+ if (!pskb_may_pull(skb, 4*iph->ihl))
+ goto inhdr_error;
+
+ iph = skb->nh.iph;
+ if (ip_fast_csum((__u8 *)iph, iph->ihl) != 0)
+ goto inhdr_error;
+
+ len = ntohs(iph->tot_len);
+ if (skb->len < len || len < 4*iph->ihl)
+ goto inhdr_error;
+
+ if (skb->len > len) {
+ __pskb_trim(skb, len);
+ if (skb->ip_summed == CHECKSUM_HW)
+ skb->ip_summed = CHECKSUM_NONE;
+ }
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_PRE_ROUTING);
+#endif
+ if ((nf_bridge = nf_bridge_alloc(skb)) == NULL)
+ return NF_DROP;
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ nf_bridge->physindev = skb->dev;
+ skb->dev = bridge_parent(skb->dev);
+ store_orig_dstaddr(skb);
+
+ NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+ br_nf_pre_routing_finish);
+
+ return NF_STOLEN;
+
+inhdr_error:
+// IP_INC_STATS_BH(IpInHdrErrors);
+out:
+ return NF_DROP;
+}
+
+
+/* PF_BRIDGE/LOCAL_IN ************************************************/
+/* The packet is locally destined, which requires a real
+ * dst_entry, so detach the fake one. On the way up, the
+ * packet would pass through PRE_ROUTING again (which already
+ * took place when the packet entered the bridge), but we
+ * register an IPv4 PRE_ROUTING 'sabotage' hook that will
+ * prevent this from happening.
+ */
+static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if (skb->dst == (struct dst_entry *)&__fake_rtable) {
+ dst_release(skb->dst);
+ skb->dst = NULL;
+ }
+
+ return NF_ACCEPT;
+}
+
+
+/* PF_BRIDGE/FORWARD *************************************************/
+static int br_nf_forward_finish(struct sk_buff *skb)
+{
+ struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_FORWARD);
+#endif
+
+ if (nf_bridge->mask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ nf_bridge->mask ^= BRNF_PKT_TYPE;
+ }
+
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_FORWARD, skb, nf_bridge->physindev,
+ skb->dev, br_forward_finish, 1);
+
+ return 0;
+}
+
+/* This is the 'purely bridged' case. We pass the packet to
+ * netfilter with indev and outdev set to the bridge device,
+ * but we are still able to filter on the 'real' indev/outdev
+ * because another bit of the bridge-nf patch overloads the
+ * '-i' and '-o' iptables interface checks to take
+ * skb->phys{in,out}dev into account as well (so both the real
+ * device and the bridge device will match).
+ */
+static unsigned int br_nf_forward(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_FORWARD);
+#endif
+
+ nf_bridge = skb->nf_bridge;
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ nf_bridge->physoutdev = skb->dev;
+
+ NF_HOOK(PF_INET, NF_IP_FORWARD, skb, bridge_parent(nf_bridge->physindev),
+ bridge_parent(skb->dev), br_nf_forward_finish);
+
+ return NF_STOLEN;
+}
+
+
+/* PF_BRIDGE/LOCAL_OUT ***********************************************/
+static int br_nf_local_out_finish(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug &= ~(1 << NF_BR_LOCAL_OUT);
+#endif
+
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+ br_forward_finish, NF_BR_PRI_FIRST + 1);
+
+ return 0;
+}
+
+
+/* This function sees both locally originated IP packets and forwarded
+ * IP packets (in both cases the destination device is a bridge
+ * device). It also sees bridged-and-DNAT'ed packets.
+ * For the sake of interface transparency (i.e. properly
+ * overloading the '-o' option), we steal packets destined to
+ * a bridge device away from the PF_INET/FORWARD and PF_INET/OUTPUT hook
+ * functions, and give them back later, when we have determined the real
+ * output device. This is done in here.
+ *
+ * If (nf_bridge->mask & BRNF_BRIDGED_DNAT) then the packet is bridged
+ * and we fake the PF_BRIDGE/FORWARD hook. The function br_nf_forward()
+ * will then fake the PF_INET/FORWARD hook. br_nf_local_out() has priority
+ * NF_BR_PRI_FIRST, so no relevant PF_BRIDGE/INPUT functions have been nor
+ * will be executed.
+ * Otherwise, if nf_bridge->physindev is NULL, the bridge-nf code never touched
+ * this packet before, and so the packet was locally originated. We fake
+ * the PF_INET/LOCAL_OUT hook.
+ * Finally, if nf_bridge->physindev isn't NULL, then the packet was IP routed,
+ * so we fake the PF_INET/FORWARD hook. ipv4_sabotage_out() makes sure
+ * even routed packets that didn't arrive on a bridge interface have their
+ * nf_bridge->physindev set.
+ */
+
+static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*_okfn)(struct sk_buff *))
+{
+ int (*okfn)(struct sk_buff *skb);
+ struct net_device *realindev;
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ /* Sometimes we get packets with NULL ->dst here (for example,
+ * running a dhcp client daemon triggers this).
+ */
+ if (skb->dst == NULL)
+ return NF_ACCEPT;
+
+ nf_bridge = skb->nf_bridge;
+ nf_bridge->physoutdev = skb->dev;
+
+ realindev = nf_bridge->physindev;
+
+ /* Bridged, take PF_BRIDGE/FORWARD.
+ * (see big note in front of br_nf_pre_routing_finish)
+ */
+ if (nf_bridge->mask & BRNF_BRIDGED_DNAT) {
+ okfn = br_forward_finish;
+
+ if (nf_bridge->mask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ nf_bridge->mask ^= BRNF_PKT_TYPE;
+ }
+
+ NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
+ skb->dev, okfn);
+ } else {
+ okfn = br_nf_local_out_finish;
+ /* IP forwarded traffic has a physindev, locally
+ * generated traffic hasn't.
+ */
+ if (realindev != NULL) {
+ if (((nf_bridge->mask & BRNF_DONT_TAKE_PARENT) == 0) &&
+ has_bridge_parent(realindev))
+ realindev = bridge_parent(realindev);
+
+ NF_HOOK_THRESH(PF_INET, NF_IP_FORWARD, skb, realindev,
+ bridge_parent(skb->dev), okfn,
+ NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD + 1);
+ } else {
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_LOCAL_OUT);
+#endif
+
+ NF_HOOK_THRESH(PF_INET, NF_IP_LOCAL_OUT, skb, realindev,
+ bridge_parent(skb->dev), okfn,
+ NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT + 1);
+ }
+ }
+
+ return NF_STOLEN;
+}
+
+
+/* PF_BRIDGE/POST_ROUTING ********************************************/
+static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge = (*pskb)->nf_bridge;
+
+ /* Be very paranoid. */
+ if (skb->mac.raw < skb->head || skb->mac.raw + ETH_HLEN > skb->data) {
+ printk(KERN_CRIT "br_netfilter: Argh!! br_nf_post_routing: "
+ "bad mac.raw pointer.");
+ if (skb->dev != NULL) {
+ printk("[%s]", skb->dev->name);
+ if (has_bridge_parent(skb->dev))
+ printk("[%s]", bridge_parent(skb->dev)->name);
+ }
+ printk("\n");
+ return NF_ACCEPT;
+ }
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ /* Sometimes we get packets with NULL ->dst here (for example,
+ * running a dhcp client daemon triggers this).
+ */
+ if (skb->dst == NULL)
+ return NF_ACCEPT;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_POST_ROUTING);
+#endif
+
+ /* We assume any code from br_dev_queue_push_xmit onwards doesn't care
+ * about the value of skb->pkt_type.
+ */
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ memcpy(nf_bridge->hh, skb->data - 16, 16);
+
+ NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+ bridge_parent(skb->dev), br_dev_queue_push_xmit);
+
+ return NF_STOLEN;
+}
+
+
+/* IPv4/SABOTAGE *****************************************************/
+
+/* Don't hand locally destined packets to PF_INET/PRE_ROUTING
+ * for the second time.
+ */
+static unsigned int ipv4_sabotage_in(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ if (in->hard_start_xmit == br_dev_xmit &&
+ okfn != br_nf_pre_routing_finish) {
+ okfn(*pskb);
+ return NF_STOLEN;
+ }
+
+ return NF_ACCEPT;
+}
+
+/* Postpone execution of PF_INET/FORWARD, PF_INET/LOCAL_OUT
+ * and PF_INET/POST_ROUTING until we have done the forwarding
+ * decision in the bridge code and have determined skb->physoutdev.
+ */
+static unsigned int ipv4_sabotage_out(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ if (out->hard_start_xmit == br_dev_xmit &&
+ okfn != br_nf_forward_finish &&
+ okfn != br_nf_local_out_finish &&
+ okfn != br_dev_queue_push_xmit) {
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge;
+
+ if (!skb->nf_bridge && !nf_bridge_alloc(skb))
+ return NF_DROP;
+
+ nf_bridge = skb->nf_bridge;
+
+ /* This frame will arrive on PF_BRIDGE/LOCAL_OUT and we
+ * will need the indev then. For a brouter, the real indev
+ * can be a bridge port, so we make sure br_nf_local_out()
+ * doesn't use the bridge parent of the indev by using
+ * the BRNF_DONT_TAKE_PARENT mask.
+ */
+ if (hook == NF_IP_FORWARD && nf_bridge->physindev == NULL) {
+ nf_bridge->mask |= BRNF_DONT_TAKE_PARENT;
+ nf_bridge->physindev = (struct net_device *)in;
+ }
+ okfn(skb);
+ return NF_STOLEN;
+ }
+
+ return NF_ACCEPT;
+}
+
+/* For br_nf_local_out we need (prio = NF_BR_PRI_FIRST), to ensure that innocent
+ * PF_BRIDGE/NF_BR_LOCAL_OUT functions don't get bridged traffic as input.
+ * For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
+ * ip_refrag() can return NF_STOLEN.
+ */
+static struct nf_hook_ops br_nf_ops[] = {
+ { { NULL, NULL }, br_nf_pre_routing, PF_BRIDGE, NF_BR_PRE_ROUTING, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_local_in, PF_BRIDGE, NF_BR_LOCAL_IN, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_forward, PF_BRIDGE, NF_BR_FORWARD, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_local_out, PF_BRIDGE, NF_BR_LOCAL_OUT, NF_BR_PRI_FIRST },
+ { { NULL, NULL }, br_nf_post_routing, PF_BRIDGE, NF_BR_POST_ROUTING, NF_BR_PRI_LAST },
+ { { NULL, NULL }, ipv4_sabotage_in, PF_INET, NF_IP_PRE_ROUTING, NF_IP_PRI_FIRST },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_FORWARD, NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_FIRST }
+};
+
+#define NUMHOOKS (sizeof(br_nf_ops)/sizeof(br_nf_ops[0]))
+
+int br_netfilter_init(void)
+{
+ int i;
+
+ for (i = 0; i < NUMHOOKS; i++) {
+ int ret;
+
+ if ((ret = nf_register_hook(&br_nf_ops[i])) >= 0)
+ continue;
+
+ while (i--)
+ nf_unregister_hook(&br_nf_ops[i]);
+
+ return ret;
+ }
+
+ printk(KERN_NOTICE "Bridge firewalling registered\n");
+
+ return 0;
+}
+
+void br_netfilter_fini(void)
+{
+ int i;
+
+ for (i = NUMHOOKS - 1; i >= 0; i--)
+ nf_unregister_hook(&br_nf_ops[i]);
+}
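
The init/fini pair above follows a register-all-or-roll-back pattern: if
registering hook i fails, hooks i-1 down to 0 are unregistered before the
error is returned. A minimal user-space sketch of that pattern follows. The
demo_* names and the struct are hypothetical stand-ins, not kernel API;
demo_register() only simulates what nf_register_hook() might return.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct nf_hook_ops; not the kernel type. */
struct demo_hook {
	int id;
	int registered;
	int fail;	/* set to make demo_register() fail for this entry */
};

/* Simulated registration: returns 0 on success, negative on failure. */
int demo_register(struct demo_hook *h)
{
	if (h->fail)
		return -1;
	h->registered = 1;
	return 0;
}

void demo_unregister(struct demo_hook *h)
{
	h->registered = 0;
}

/* Register hooks[0..n-1]; on failure undo the ones already registered. */
int demo_register_all(struct demo_hook *hooks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = demo_register(&hooks[i]);

		if (ret >= 0)
			continue;

		/* Roll back entries i-1 .. 0, newest first. */
		while (i--)
			demo_unregister(&hooks[i]);

		return ret;
	}

	return 0;
}
```

Unwinding in reverse order mirrors what br_netfilter_fini() does on module
unload, so a failed init leaves no hook behind.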

2002-10-20 22:21:05

by David Miller

[permalink] [raw]
Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.44

From: Bart De Schuymer <[email protected]>
Date: Mon, 21 Oct 2002 00:20:37 +0200

This is a follow-up from the previous RFC for the bridge-nf patch.
The new patch adds one member to the skbuff, a pointer to a struct
nf_bridge_info. There is still a need to change ip_output.c, but the change
is analogous to what is done for the skbuff->nfct pointer field. So, for me,
this is a clean solution. The copy of the Ethernet header is no longer done
in ip_fragment().

This definitely looks a lot better.

I still want the netfilter team to 'ACK' the core/ipv4 netfilter
changes before I apply this. :-)
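
For readers following along, the "analogous to skbuff->nfct" handling
amounts to sharing one reference-counted info structure between a packet and
its copies or fragments. A rough user-space sketch is below. All demo_*
names are hypothetical, and a plain int replaces the kernel's atomic_t, so
this is only an illustration of the idea, not the patch's code.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct nf_bridge_info and struct sk_buff. */
struct demo_info {
	int use;		/* reference count (atomic_t in the kernel) */
	unsigned int mask;
};

struct demo_skb {
	struct demo_info *info;
};

/* Allocate the info block with one reference held by skb. */
struct demo_info *demo_info_alloc(struct demo_skb *skb)
{
	skb->info = malloc(sizeof(*skb->info));
	if (skb->info != NULL) {
		skb->info->use = 1;
		skb->info->mask = 0;
	}
	return skb->info;
}

/* Take an extra reference; a NULL pointer is tolerated. */
void demo_info_get(struct demo_info *info)
{
	if (info)
		info->use++;
}

/* Drop a reference; returns 1 if this call freed the block. */
int demo_info_put(struct demo_info *info)
{
	if (info && --info->use == 0) {
		free(info);
		return 1;
	}
	return 0;
}

/* Copy step, as done when a fragment or clone inherits the pointer. */
void demo_copy(struct demo_skb *to, const struct demo_skb *from)
{
	to->info = from->info;
	demo_info_get(to->info);
}
```

Each holder calls the put function once when it is freed, and the block goes
away with the last holder.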

2002-10-22 23:31:58

by Bart De Schuymer

[permalink] [raw]
Subject: Re: [RFC] bridge-nf -- map IPv4 hooks onto bridge hooks, vs 2.5.44

Is [email protected] dead, or? Anyway, if you think the patch just
totally sucks, don't worry, I can take it. Just say so.

--
cheers,
Bart

2002-10-25 05:53:07

by Bart De Schuymer

[permalink] [raw]
Subject: [PATCH][RFC] bridge-nf -- map IPv4 hooks onto bridge hooks - try 3, vs 2.5.44

Hello David, Harald, others,

The following patch deals with the problems you still had with the earlier one.
Changes:
1. add #if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE) everywhere
2. don't touch ip_tables.c
3. no ipt_physdev.c file yet. I'll try to make it this weekend.

As this ipt_physdev.c module is not essential, I propose applying this patch already.
Harald, should I make this module for patch-o-matic or can I post it directly to David?

The patch is also here:
http://users.pandora.be/bart.de.schuymer/ebtables/br-nf/bridge-nf-0.0.10-dev-pre3-against-2.5.44.diff
An incremental patch vs the previous one is here:
http://users.pandora.be/bart.de.schuymer/ebtables/br-nf/bridge-nf-0.0.10-dev-pre3.001-against-2.5.44.diff

cheers,
Bart
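
A note on the threshold argument that the patch below adds to
nf_hook_slow() and nf_iterate(): hooks whose priority is lower than the
threshold are skipped, so a caller can re-enter a hook list part-way
through. Passing INT_MIN runs every hook. A minimal user-space sketch of that
check is below. The demo_* names are hypothetical, and a plain array
replaces the kernel's linked list of struct nf_hook_ops.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for one registered hook. */
struct demo_hook_entry {
	int priority;
	int calls;	/* how many times this hook has been run */
};

/* Run every hook with priority >= thresh; return how many ran. */
int demo_iterate(struct demo_hook_entry *hooks, size_t n, int thresh)
{
	size_t i;
	int ran = 0;

	for (i = 0; i < n; i++) {
		/* Same test as the patch: skip hooks below the threshold. */
		if (thresh > hooks[i].priority)
			continue;

		hooks[i].calls++;
		ran++;
	}

	return ran;
}
```

With hooks at priorities INT_MIN, -200, 0 and 100, a threshold of 1 runs
only the last one, which is how a caller registered at priority 0 can resume
the list just after itself.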

--- linux-2.5.44/include/linux/netfilter.h Sat Oct 19 06:01:54 2002
+++ linux-2.5.44-brnfpre3/include/linux/netfilter.h Thu Oct 24 18:32:34 2002
@@ -117,17 +117,23 @@
/* This is gross, but inline doesn't cut it for avoiding the function
call in fast path: gcc doesn't inline (needs value tracking?). --RR */
#ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
+ nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
+#define NF_HOOK_THRESH nf_hook_slow
#else
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
(list_empty(&nf_hooks[(pf)][(hook)]) \
? (okfn)(skb) \
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn)))
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh) \
+(list_empty(&nf_hooks[(pf)][(hook)]) \
+ ? (okfn)(skb) \
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
#endif

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
struct net_device *indev, struct net_device *outdev,
- int (*okfn)(struct sk_buff *));
+ int (*okfn)(struct sk_buff *), int thresh);

/* Call setsockopt() */
int nf_setsockopt(struct sock *sk, int pf, int optval, char *opt,
--- linux-2.5.44/include/linux/netfilter_ipv4.h Sat Oct 19 06:02:28 2002
+++ linux-2.5.44-brnfpre3/include/linux/netfilter_ipv4.h Thu Oct 24 18:32:34 2002
@@ -52,8 +52,10 @@
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
NF_IP_PRI_CONNTRACK = -200,
+ NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD = -175,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
+ NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT = -50,
NF_IP_PRI_FILTER = 0,
NF_IP_PRI_NAT_SRC = 100,
NF_IP_PRI_LAST = INT_MAX,
--- linux-2.5.44/include/linux/netfilter_bridge.h Sat Oct 19 06:01:57 2002
+++ linux-2.5.44-brnfpre3/include/linux/netfilter_bridge.h Thu Oct 24 18:32:34 2002
@@ -6,6 +6,7 @@

#include <linux/config.h>
#include <linux/netfilter.h>
+#include <asm/atomic.h>

/* Bridge Hooks */
/* After promisc drops, checksum checks. */
@@ -22,14 +23,39 @@
#define NF_BR_BROUTING 5
#define NF_BR_NUMHOOKS 6

+#define BRNF_PKT_TYPE 0x01
+#define BRNF_BRIDGED_DNAT 0x02
+#define BRNF_DONT_TAKE_PARENT 0x04
+
enum nf_br_hook_priorities {
NF_BR_PRI_FIRST = INT_MIN,
- NF_BR_PRI_FILTER_BRIDGED = -200,
- NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_DST_BRIDGED = -300,
+ NF_BR_PRI_FILTER_BRIDGED = -200,
+ NF_BR_PRI_BRNF = 0,
NF_BR_PRI_NAT_DST_OTHER = 100,
+ NF_BR_PRI_FILTER_OTHER = 200,
NF_BR_PRI_NAT_SRC = 300,
NF_BR_PRI_LAST = INT_MAX,
+};
+
+static inline
+struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
+{
+ struct nf_bridge_info **nf_bridge = &(skb->nf_bridge);
+
+ if ((*nf_bridge = kmalloc(sizeof(**nf_bridge), GFP_ATOMIC)) != NULL) {
+ atomic_set(&(*nf_bridge)->use, 1);
+ (*nf_bridge)->mask = 0;
+ (*nf_bridge)->physindev = (*nf_bridge)->physoutdev = NULL;
+ }
+
+ return *nf_bridge;
+}
+
+struct bridge_skb_cb {
+ union {
+ __u32 ipv4;
+ } daddr;
};

#endif
--- linux-2.5.44/include/linux/skbuff.h Sat Oct 19 06:01:58 2002
+++ linux-2.5.44-brnfpre3/include/linux/skbuff.h Thu Oct 24 18:32:34 2002
@@ -96,6 +96,17 @@
struct nf_ct_info {
struct nf_conntrack *master;
};
+
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+struct nf_bridge_info {
+ atomic_t use;
+ struct net_device *physindev;
+ struct net_device *physoutdev;
+ unsigned int mask;
+ unsigned long hh[16 / sizeof(unsigned long)];
+};
+#endif
+
#endif

struct sk_buff_head {
@@ -166,6 +177,7 @@
* @nfcache: Cache info
* @nfct: Associated connection, if any
* @nf_debug: Netfilter debugging
+ * @nf_bridge: Saved data about a bridged frame - see br_netfilter.c
* @tc_index: Traffic control index
*/

@@ -236,6 +248,9 @@
#ifdef CONFIG_NETFILTER_DEBUG
unsigned int nf_debug;
#endif
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ struct nf_bridge_info *nf_bridge;
+#endif
#endif /* CONFIG_NETFILTER */
#if defined(CONFIG_HIPPI)
union {
@@ -1146,6 +1161,20 @@
if (nfct)
atomic_inc(&nfct->master->use);
}
+
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
+{
+ if (nf_bridge && atomic_dec_and_test(&nf_bridge->use))
+ kfree(nf_bridge);
+}
+static inline void nf_bridge_get(struct nf_bridge_info *nf_bridge)
+{
+ if (nf_bridge)
+ atomic_inc(&nf_bridge->use);
+}
+#endif
+
#endif

#endif /* __KERNEL__ */
--- linux-2.5.44/net/bridge/br.c Sat Oct 19 06:01:15 2002
+++ linux-2.5.44-brnfpre3/net/bridge/br.c Thu Oct 24 18:32:34 2002
@@ -45,6 +45,10 @@
{
printk(KERN_INFO "NET4: Ethernet Bridge 008 for NET4.0\n");

+#ifdef CONFIG_NETFILTER
+ if (br_netfilter_init())
+ return 1;
+#endif
br_handle_frame_hook = br_handle_frame;
br_ioctl_hook = br_ioctl_deviceless_stub;
#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
@@ -63,6 +67,9 @@

static void __exit br_deinit(void)
{
+#ifdef CONFIG_NETFILTER
+ br_netfilter_fini();
+#endif
unregister_netdevice_notifier(&br_device_notifier);
br_call_ioctl_atomic(__br_clear_ioctl_hook);

--- linux-2.5.44/net/bridge/br_forward.c Sat Oct 19 06:01:20 2002
+++ linux-2.5.44-brnfpre3/net/bridge/br_forward.c Thu Oct 24 18:32:34 2002
@@ -30,18 +30,23 @@
return 1;
}

-static int __dev_queue_push_xmit(struct sk_buff *skb)
+int br_dev_queue_push_xmit(struct sk_buff *skb)
{
+#ifdef CONFIG_NETFILTER
+ if (skb->nf_bridge)
+ memcpy(skb->data - 16, skb->nf_bridge->hh, 16);
+#endif
skb_push(skb, ETH_HLEN);
+
dev_queue_xmit(skb);

return 0;
}

-static int __br_forward_finish(struct sk_buff *skb)
+int br_forward_finish(struct sk_buff *skb)
{
NF_HOOK(PF_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev,
- __dev_queue_push_xmit);
+ br_dev_queue_push_xmit);

return 0;
}
@@ -53,7 +58,7 @@
skb->nf_debug = 0;
#endif
NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
- __br_forward_finish);
+ br_forward_finish);
}

static void __br_forward(struct net_bridge_port *to, struct sk_buff *skb)
@@ -64,7 +69,7 @@
skb->dev = to->dev;

NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, indev, skb->dev,
- __br_forward_finish);
+ br_forward_finish);
}

/* called under bridge lock */
--- linux-2.5.44/net/bridge/br_input.c Sat Oct 19 06:01:18 2002
+++ linux-2.5.44-brnfpre3/net/bridge/br_input.c Thu Oct 24 18:32:34 2002
@@ -49,7 +49,7 @@
br_pass_frame_up_finish);
}

-static int br_handle_frame_finish(struct sk_buff *skb)
+int br_handle_frame_finish(struct sk_buff *skb)
{
struct net_bridge *br;
unsigned char *dest;
--- linux-2.5.44/net/bridge/br_private.h Sat Oct 19 06:01:18 2002
+++ linux-2.5.44-brnfpre3/net/bridge/br_private.h Thu Oct 24 18:32:34 2002
@@ -144,8 +144,10 @@
/* br_forward.c */
extern void br_deliver(struct net_bridge_port *to,
struct sk_buff *skb);
+extern int br_dev_queue_push_xmit(struct sk_buff *skb);
extern void br_forward(struct net_bridge_port *to,
struct sk_buff *skb);
+extern int br_forward_finish(struct sk_buff *skb);
extern void br_flood_deliver(struct net_bridge *br,
struct sk_buff *skb,
int clone);
@@ -166,6 +168,7 @@
int *ifindices);

/* br_input.c */
+extern int br_handle_frame_finish(struct sk_buff *skb);
extern int br_handle_frame(struct sk_buff *skb);

/* br_ioctl.c */
@@ -176,6 +179,10 @@
unsigned long arg1,
unsigned long arg2);
extern int br_ioctl_deviceless_stub(unsigned long arg);
+
+/* br_netfilter.c */
+extern int br_netfilter_init(void);
+extern void br_netfilter_fini(void);

/* br_stp.c */
extern int br_is_root_bridge(struct net_bridge *br);
--- linux-2.5.44/net/bridge/Makefile Sat Oct 19 06:02:32 2002
+++ linux-2.5.44-brnfpre3/net/bridge/Makefile Thu Oct 24 18:32:34 2002
@@ -9,6 +9,11 @@
bridge-objs := br.o br_device.o br_fdb.o br_forward.o br_if.o br_input.o \
br_ioctl.o br_notify.o br_stp.o br_stp_bpdu.o \
br_stp_if.o br_stp_timer.o
+
+ifeq ($(CONFIG_NETFILTER),y)
+bridge-objs += br_netfilter.o
+endif
+
obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/

include $(TOPDIR)/Rules.make
--- linux-2.5.44/net/core/netfilter.c Sat Oct 19 06:01:53 2002
+++ linux-2.5.44-brnfpre3/net/core/netfilter.c Thu Oct 24 18:32:34 2002
@@ -342,10 +342,15 @@
const struct net_device *indev,
const struct net_device *outdev,
struct list_head **i,
- int (*okfn)(struct sk_buff *))
+ int (*okfn)(struct sk_buff *),
+ int hook_thresh)
{
for (*i = (*i)->next; *i != head; *i = (*i)->next) {
struct nf_hook_ops *elem = (struct nf_hook_ops *)*i;
+
+ if (hook_thresh > elem->priority)
+ continue;
+
switch (elem->hook(hook, skb, indev, outdev, okfn)) {
case NF_QUEUE:
return NF_QUEUE;
@@ -413,6 +418,10 @@
{
int status;
struct nf_info *info;
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ struct net_device *physindev = NULL;
+ struct net_device *physoutdev = NULL;
+#endif

if (!queue_handler[pf].outfn) {
kfree_skb(skb);
@@ -435,11 +444,24 @@
if (indev) dev_hold(indev);
if (outdev) dev_hold(outdev);

+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ if (skb->nf_bridge) {
+ physindev = skb->nf_bridge->physindev;
+ if (physindev) dev_hold(physindev);
+ physoutdev = skb->nf_bridge->physoutdev;
+ if (physoutdev) dev_hold(physoutdev);
+ }
+#endif
+
status = queue_handler[pf].outfn(skb, info, queue_handler[pf].data);
if (status < 0) {
/* James M doesn't say fuck enough. */
if (indev) dev_put(indev);
if (outdev) dev_put(outdev);
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ if (physindev) dev_put(physindev);
+ if (physoutdev) dev_put(physoutdev);
+#endif
kfree(info);
kfree_skb(skb);
return;
@@ -449,7 +471,8 @@
int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
struct net_device *indev,
struct net_device *outdev,
- int (*okfn)(struct sk_buff *))
+ int (*okfn)(struct sk_buff *),
+ int hook_thresh)
{
struct list_head *elem;
unsigned int verdict;
@@ -481,7 +504,7 @@

elem = &nf_hooks[pf][hook];
verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
- outdev, &elem, okfn);
+ outdev, &elem, okfn, hook_thresh);
if (verdict == NF_QUEUE) {
NFDEBUG("nf_hook: Verdict = QUEUE.\n");
nf_queue(skb, elem, pf, hook, indev, outdev, okfn);
@@ -530,7 +553,7 @@
verdict = nf_iterate(&nf_hooks[info->pf][info->hook],
&skb, info->hook,
info->indev, info->outdev, &elem,
- info->okfn);
+ info->okfn, INT_MIN);
}

switch (verdict) {
--- linux-2.5.44/net/core/skbuff.c Sat Oct 19 06:01:17 2002
+++ linux-2.5.44-brnfpre3/net/core/skbuff.c Thu Oct 24 18:32:34 2002
@@ -248,6 +248,9 @@
#ifdef CONFIG_NETFILTER_DEBUG
skb->nf_debug = 0;
#endif
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ skb->nf_bridge = NULL;
+#endif
#endif
#ifdef CONFIG_NET_SCHED
skb->tc_index = 0;
@@ -327,6 +330,9 @@
}
#ifdef CONFIG_NETFILTER
nf_conntrack_put(skb->nfct);
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ nf_bridge_put(skb->nf_bridge);
+#endif
#endif
skb_headerinit(skb, NULL, 0); /* clean state */
kfree_skbmem(skb);
@@ -392,6 +398,9 @@
#ifdef CONFIG_NETFILTER_DEBUG
C(nf_debug);
#endif
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ C(nf_bridge);
+#endif
#endif /*CONFIG_NETFILTER*/
#if defined(CONFIG_HIPPI)
C(private);
@@ -404,6 +413,9 @@
skb->cloned = 1;
#ifdef CONFIG_NETFILTER
nf_conntrack_get(skb->nfct);
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ nf_bridge_get(skb->nf_bridge);
+#endif
#endif
return n;
}
@@ -437,6 +449,10 @@
nf_conntrack_get(new->nfct);
#ifdef CONFIG_NETFILTER_DEBUG
new->nf_debug = old->nf_debug;
+#endif
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ new->nf_bridge = old->nf_bridge;
+ nf_bridge_get(new->nf_bridge);
#endif
#endif
#ifdef CONFIG_NET_SCHED
--- linux-2.5.44/net/ipv4/ip_output.c Sat Oct 19 06:02:34 2002
+++ linux-2.5.44-brnfpre3/net/ipv4/ip_output.c Thu Oct 24 18:32:34 2002
@@ -396,6 +396,10 @@
/* Connection association is same as pre-frag packet */
to->nfct = from->nfct;
nf_conntrack_get(to->nfct);
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ to->nf_bridge = from->nf_bridge;
+ nf_bridge_get(to->nf_bridge);
+#endif
#ifdef CONFIG_NETFILTER_DEBUG
to->nf_debug = from->nf_debug;
#endif
--- linux-2.5.44/net/ipv4/netfilter/ipt_LOG.c Sat Oct 19 06:01:21 2002
+++ linux-2.5.44-brnfpre3/net/ipv4/netfilter/ipt_LOG.c Thu Oct 24 18:58:08 2002
@@ -289,6 +289,18 @@
loginfo->prefix,
in ? in->name : "",
out ? out->name : "");
+#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
+ if ((*pskb)->nf_bridge) {
+ struct net_device *physindev = (*pskb)->nf_bridge->physindev;
+ struct net_device *physoutdev = (*pskb)->nf_bridge->physoutdev;
+
+ if (physindev && in != physindev)
+ printk("PHYSIN=%s ", physindev->name);
+ if (physoutdev && out != physoutdev)
+ printk("PHYSOUT=%s ", physoutdev->name);
+ }
+#endif
+
if (in && !out) {
/* MAC logging for input chain only. */
printk("MAC=");
--- /dev/null Thu Aug 24 11:00:32 2000
+++ linux-2.5.44-brnfpre3/net/bridge/br_netfilter.c Thu Oct 24 18:32:34 2002
@@ -0,0 +1,614 @@
+/*
+ * Handle firewalling
+ * Linux ethernet bridge
+ *
+ * Authors:
+ * Lennert Buytenhek <[email protected]>
+ * Bart De Schuymer <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Lennert dedicates this file to Kerstin Wurdinger.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/ip.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/if_ether.h>
+#include <linux/netfilter_bridge.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/in_route.h>
+#include <net/ip.h>
+#include <asm/uaccess.h>
+#include <asm/checksum.h>
+#include "br_private.h"
+
+
+#define skb_origaddr(skb) (((struct bridge_skb_cb *) \
+ (skb->cb))->daddr.ipv4)
+#define store_orig_dstaddr(skb) (skb_origaddr(skb) = (skb)->nh.iph->daddr)
+#define dnat_took_place(skb) (skb_origaddr(skb) != (skb)->nh.iph->daddr)
+#define clear_cb(skb) (memset(&skb_origaddr(skb), 0, \
+ sizeof(struct bridge_skb_cb)))
+
+#define has_bridge_parent(device) ((device)->br_port != NULL)
+#define bridge_parent(device) (&((device)->br_port->br->dev))
+
+/* We need these fake structures to make netfilter happy --
+ * lots of places assume that skb->dst != NULL, which isn't
+ * all that unreasonable.
+ *
+ * Currently, we fill in the PMTU entry because netfilter
+ * refragmentation needs it, and the rt_flags entry because
+ * ipt_REJECT needs it. Future netfilter modules might
+ * require us to fill additional fields.
+ */
+static struct net_device __fake_net_device = {
+ hard_header_len: ETH_HLEN
+};
+
+static struct rtable __fake_rtable = {
+ u: {
+ dst: {
+ __refcnt: ATOMIC_INIT(1),
+ dev: &__fake_net_device,
+ pmtu: 1500
+ }
+ },
+
+ rt_flags: 0
+};
+
+
+/* PF_BRIDGE/PRE_ROUTING *********************************************/
+static void __br_dnat_complain(void)
+{
+ static unsigned long last_complaint = 0;
+
+ if (jiffies - last_complaint >= 5 * HZ) {
+ printk(KERN_WARNING "Performing cross-bridge DNAT requires IP "
+ "forwarding to be enabled\n");
+ last_complaint = jiffies;
+ }
+}
+
+
+/* This requires some explaining. If DNAT has taken place,
+ * we will need to fix up the destination Ethernet address,
+ * and this is a tricky process.
+ *
+ * There are two cases to consider:
+ * 1. The packet was DNAT'ed to a device in the same bridge
+ * port group as it was received on. We can still bridge
+ * the packet.
+ * 2. The packet was DNAT'ed to a different device, either
+ * a non-bridged device or another bridge port group.
+ * The packet will need to be routed.
+ *
+ * The correct way of distinguishing between these two cases is to
+ * call ip_route_input() and to look at skb->dst->dev, which is
+ * changed to the destination device if ip_route_input() succeeds.
+ *
+ * Let us first consider the case that ip_route_input() succeeds:
+ *
+ * If skb->dst->dev equals the logical bridge device the packet
+ * came in on, we can consider this bridging. We then call
+ * skb->dst->output() which will make the packet enter br_nf_local_out()
+ * not much later. In that function it is assured that the iptables
+ * FORWARD chain is traversed for the packet.
+ *
+ * Otherwise, the packet is considered to be routed and we just
+ * change the destination MAC address so that the packet will
+ * later be passed up to the IP stack to be routed.
+ *
+ * Let us now consider the case that ip_route_input() fails:
+ *
+ * After a "echo '0' > /proc/sys/net/ipv4/ip_forward" ip_route_input()
+ * will fail, while ip_route_output() will return success. The source
+ * address for ip_route_output() is set to zero, so ip_route_output()
+ * thinks we're handling a locally generated packet and won't care
+ * if IP forwarding is allowed. We send a warning message to the user's
+ * log telling her to put IP forwarding on.
+ *
+ * ip_route_input() will also fail if there is no route available.
+ * In that case we just drop the packet.
+ *
+ * --Lennert, 20020411
+ * --Bart, 20020416 (updated)
+ * --Bart, 20021007 (updated)
+ */
+
+static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug |= (1 << NF_BR_PRE_ROUTING) | (1 << NF_BR_FORWARD);
+#endif
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ skb->nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ skb->dev = bridge_parent(skb->dev);
+ skb->dst->output(skb);
+ return 0;
+}
+
+static int br_nf_pre_routing_finish(struct sk_buff *skb)
+{
+ struct net_device *dev = skb->dev;
+ struct iphdr *iph = skb->nh.iph;
+ struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_PRE_ROUTING);
+#endif
+
+ if (nf_bridge->mask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ nf_bridge->mask ^= BRNF_PKT_TYPE;
+ }
+
+ if (dnat_took_place(skb)) {
+ if (ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
+ dev)) {
+ struct rtable *rt;
+
+ if (!ip_route_output(&rt, iph->daddr, 0, iph->tos, 0)) {
+ /* Bridged-and-DNAT'ed traffic doesn't
+ * require ip_forwarding.
+ */
+ if (((struct dst_entry *)rt)->dev == dev) {
+ skb->dst = (struct dst_entry *)rt;
+ goto bridged_dnat;
+ }
+ __br_dnat_complain();
+ dst_release((struct dst_entry *)rt);
+ }
+ kfree_skb(skb);
+ return 0;
+ } else {
+ if (skb->dst->dev == dev) {
+bridged_dnat:
+ /* Tell br_nf_local_out this is a
+ * bridged frame
+ */
+ nf_bridge->mask |= BRNF_BRIDGED_DNAT;
+ skb->dev = nf_bridge->physindev;
+ clear_cb(skb);
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING,
+ skb, skb->dev, NULL,
+ br_nf_pre_routing_finish_bridge,
+ 1);
+ return 0;
+ }
+ memcpy(skb->mac.ethernet->h_dest, dev->dev_addr,
+ ETH_ALEN);
+ }
+ } else {
+ skb->dst = (struct dst_entry *)&__fake_rtable;
+ dst_hold(skb->dst);
+ }
+
+ clear_cb(skb);
+ skb->dev = nf_bridge->physindev;
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_PRE_ROUTING, skb, skb->dev, NULL,
+ br_handle_frame_finish, 1);
+
+ return 0;
+}
+
+/* Replicate the checks that IPv4 does on packet reception.
+ * Set skb->dev to the bridge device (i.e. parent of the
+ * receiving device) to make netfilter happy, the REDIRECT
+ * target in particular. Save the original destination IP
+ * address to be able to detect DNAT afterwards.
+ */
+static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct iphdr *iph;
+ __u32 len;
+ struct sk_buff *skb;
+ struct nf_bridge_info *nf_bridge;
+
+ if ((*pskb)->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if ((skb = skb_share_check(*pskb, GFP_ATOMIC)) == NULL)
+ goto out;
+
+ if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+ goto inhdr_error;
+
+ iph = skb->nh.iph;
+ if (iph->ihl < 5 || iph->version != 4)
+ goto inhdr_error;
+
+ if (!pskb_may_pull(skb, 4*iph->ihl))
+ goto inhdr_error;
+
+ iph = skb->nh.iph;
+ if (ip_fast_csum((__u8 *)iph, iph->ihl) != 0)
+ goto inhdr_error;
+
+ len = ntohs(iph->tot_len);
+ if (skb->len < len || len < 4*iph->ihl)
+ goto inhdr_error;
+
+ if (skb->len > len) {
+ __pskb_trim(skb, len);
+ if (skb->ip_summed == CHECKSUM_HW)
+ skb->ip_summed = CHECKSUM_NONE;
+ }
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_PRE_ROUTING);
+#endif
+ if ((nf_bridge = nf_bridge_alloc(skb)) == NULL)
+ return NF_DROP;
+
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ nf_bridge->physindev = skb->dev;
+ skb->dev = bridge_parent(skb->dev);
+ store_orig_dstaddr(skb);
+
+ NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+ br_nf_pre_routing_finish);
+
+ return NF_STOLEN;
+
+inhdr_error:
+// IP_INC_STATS_BH(IpInHdrErrors);
+out:
+ return NF_DROP;
+}
+
+
+/* PF_BRIDGE/LOCAL_IN ************************************************/
+/* The packet is locally destined, which requires a real
+ * dst_entry, so detach the fake one. On the way up, the
+ * packet would pass through PRE_ROUTING again (which already
+ * took place when the packet entered the bridge), but we
+ * register an IPv4 PRE_ROUTING 'sabotage' hook that will
+ * prevent this from happening.
+ */
+static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ if (skb->dst == (struct dst_entry *)&__fake_rtable) {
+ dst_release(skb->dst);
+ skb->dst = NULL;
+ }
+
+ return NF_ACCEPT;
+}
+
+
+/* PF_BRIDGE/FORWARD *************************************************/
+static int br_nf_forward_finish(struct sk_buff *skb)
+{
+ struct nf_bridge_info *nf_bridge = skb->nf_bridge;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_FORWARD);
+#endif
+
+ if (nf_bridge->mask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ nf_bridge->mask ^= BRNF_PKT_TYPE;
+ }
+
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_FORWARD, skb, nf_bridge->physindev,
+ skb->dev, br_forward_finish, 1);
+
+ return 0;
+}
+
+/* This is the 'purely bridged' case. We pass the packet to
+ * netfilter with indev and outdev set to the bridge device,
+ * but we are still able to filter on the 'real' indev/outdev
+ * because another bit of the bridge-nf patch overloads the
+ * '-i' and '-o' iptables interface checks to take
+ * skb->phys{in,out}dev into account as well (so both the real
+ * device and the bridge device will match).
+ */
+static unsigned int br_nf_forward(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_BR_FORWARD);
+#endif
+
+ nf_bridge = skb->nf_bridge;
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ nf_bridge->physoutdev = skb->dev;
+
+ NF_HOOK(PF_INET, NF_IP_FORWARD, skb, bridge_parent(nf_bridge->physindev),
+ bridge_parent(skb->dev), br_nf_forward_finish);
+
+ return NF_STOLEN;
+}
+
+
+/* PF_BRIDGE/LOCAL_OUT ***********************************************/
+static int br_nf_local_out_finish(struct sk_buff *skb)
+{
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug &= ~(1 << NF_BR_LOCAL_OUT);
+#endif
+
+ NF_HOOK_THRESH(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+ br_forward_finish, NF_BR_PRI_FIRST + 1);
+
+ return 0;
+}
+
+
+/* This function sees both locally originated IP packets and forwarded
+ * IP packets (in both cases the destination device is a bridge
+ * device). It also sees bridged-and-DNAT'ed packets.
+ * For the sake of interface transparency (i.e. properly
+ * overloading the '-o' option), we steal packets destined to
+ * a bridge device away from the PF_INET/FORWARD and PF_INET/OUTPUT hook
+ * functions, and give them back later, when we have determined the real
+ * output device. This is done in here.
+ *
+ * If (nf_bridge->mask & BRNF_BRIDGED_DNAT) then the packet is bridged
+ * and we fake the PF_BRIDGE/FORWARD hook. The function br_nf_forward()
+ * will then fake the PF_INET/FORWARD hook. br_nf_local_out() has priority
+ * NF_BR_PRI_FIRST, so no relevant PF_BRIDGE/INPUT functions have been nor
+ * will be executed.
+ * Otherwise, if nf_bridge->physindev is NULL, the bridge-nf code never touched
+ * this packet before, and so the packet was locally originated. We fake
+ * the PF_INET/LOCAL_OUT hook.
+ * Finally, if nf_bridge->physindev isn't NULL, then the packet was IP routed,
+ * so we fake the PF_INET/FORWARD hook. ipv4_sabotage_out() makes sure
+ * even routed packets that didn't arrive on a bridge interface have their
+ * nf_bridge->physindev set.
+ */
+
+static unsigned int br_nf_local_out(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*_okfn)(struct sk_buff *))
+{
+ int (*okfn)(struct sk_buff *skb);
+ struct net_device *realindev;
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge;
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ /* Sometimes we get packets with NULL ->dst here (for example,
+ * running a dhcp client daemon triggers this).
+ */
+ if (skb->dst == NULL)
+ return NF_ACCEPT;
+
+ nf_bridge = skb->nf_bridge;
+ nf_bridge->physoutdev = skb->dev;
+
+ realindev = nf_bridge->physindev;
+
+ /* Bridged, take PF_BRIDGE/FORWARD.
+ * (see big note in front of br_nf_pre_routing_finish)
+ */
+ if (nf_bridge->mask & BRNF_BRIDGED_DNAT) {
+ okfn = br_forward_finish;
+
+ if (nf_bridge->mask & BRNF_PKT_TYPE) {
+ skb->pkt_type = PACKET_OTHERHOST;
+ nf_bridge->mask ^= BRNF_PKT_TYPE;
+ }
+
+ NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
+ skb->dev, okfn);
+ } else {
+ okfn = br_nf_local_out_finish;
+ /* IP forwarded traffic has a physindev, locally
+ * generated traffic hasn't.
+ */
+ if (realindev != NULL) {
+ if (((nf_bridge->mask & BRNF_DONT_TAKE_PARENT) == 0) &&
+ has_bridge_parent(realindev))
+ realindev = bridge_parent(realindev);
+
+ NF_HOOK_THRESH(PF_INET, NF_IP_FORWARD, skb, realindev,
+ bridge_parent(skb->dev), okfn,
+ NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD + 1);
+ } else {
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_LOCAL_OUT);
+#endif
+
+ NF_HOOK_THRESH(PF_INET, NF_IP_LOCAL_OUT, skb, realindev,
+ bridge_parent(skb->dev), okfn,
+ NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT + 1);
+ }
+ }
+
+ return NF_STOLEN;
+}
+
+
+/* PF_BRIDGE/POST_ROUTING ********************************************/
+static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge = (*pskb)->nf_bridge;
+
+ /* Be very paranoid. */
+ if (skb->mac.raw < skb->head || skb->mac.raw + ETH_HLEN > skb->data) {
+ printk(KERN_CRIT "br_netfilter: Argh!! br_nf_post_routing: "
+ "bad mac.raw pointer.");
+ if (skb->dev != NULL) {
+ printk("[%s]", skb->dev->name);
+ if (has_bridge_parent(skb->dev))
+ printk("[%s]", bridge_parent(skb->dev)->name);
+ }
+ printk("\n");
+ return NF_ACCEPT;
+ }
+
+ if (skb->protocol != __constant_htons(ETH_P_IP))
+ return NF_ACCEPT;
+
+ /* Sometimes we get packets with NULL ->dst here (for example,
+ * running a dhcp client daemon triggers this).
+ */
+ if (skb->dst == NULL)
+ return NF_ACCEPT;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+ skb->nf_debug ^= (1 << NF_IP_POST_ROUTING);
+#endif
+
+ /* We assume any code from br_dev_queue_push_xmit onwards doesn't care
+ * about the value of skb->pkt_type.
+ */
+ if (skb->pkt_type == PACKET_OTHERHOST) {
+ skb->pkt_type = PACKET_HOST;
+ nf_bridge->mask |= BRNF_PKT_TYPE;
+ }
+
+ memcpy(nf_bridge->hh, skb->data - 16, 16);
+
+ NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+ bridge_parent(skb->dev), br_dev_queue_push_xmit);
+
+ return NF_STOLEN;
+}
+
+
+/* IPv4/SABOTAGE *****************************************************/
+
+/* Don't hand locally destined packets to PF_INET/PRE_ROUTING
+ * for the second time.
+ */
+static unsigned int ipv4_sabotage_in(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ if (in->hard_start_xmit == br_dev_xmit &&
+ okfn != br_nf_pre_routing_finish) {
+ okfn(*pskb);
+ return NF_STOLEN;
+ }
+
+ return NF_ACCEPT;
+}
+
+/* Postpone execution of PF_INET/FORWARD, PF_INET/LOCAL_OUT
+ * and PF_INET/POST_ROUTING until we have done the forwarding
+ * decision in the bridge code and have determined skb->physoutdev.
+ */
+static unsigned int ipv4_sabotage_out(unsigned int hook, struct sk_buff **pskb,
+ const struct net_device *in, const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ if (out->hard_start_xmit == br_dev_xmit &&
+ okfn != br_nf_forward_finish &&
+ okfn != br_nf_local_out_finish &&
+ okfn != br_dev_queue_push_xmit) {
+ struct sk_buff *skb = *pskb;
+ struct nf_bridge_info *nf_bridge;
+
+ if (!skb->nf_bridge && !nf_bridge_alloc(skb))
+ return NF_DROP;
+
+ nf_bridge = skb->nf_bridge;
+
+ /* This frame will arrive on PF_BRIDGE/LOCAL_OUT and we
+ * will need the indev then. For a brouter, the real indev
+ * can be a bridge port, so we make sure br_nf_local_out()
+ * doesn't use the bridge parent of the indev by using
+ * the BRNF_DONT_TAKE_PARENT mask.
+ */
+ if (hook == NF_IP_FORWARD && nf_bridge->physindev == NULL) {
+ nf_bridge->mask |= BRNF_DONT_TAKE_PARENT;
+ nf_bridge->physindev = (struct net_device *)in;
+ }
+ okfn(skb);
+ return NF_STOLEN;
+ }
+
+ return NF_ACCEPT;
+}
+
+/* For br_nf_local_out we need (prio = NF_BR_PRI_FIRST), to ensure that innocent
+ * PF_BRIDGE/NF_BR_LOCAL_OUT functions don't get bridged traffic as input.
+ * For br_nf_post_routing, we need (prio = NF_BR_PRI_LAST), because
+ * ip_refrag() can return NF_STOLEN.
+ */
+static struct nf_hook_ops br_nf_ops[] = {
+ { { NULL, NULL }, br_nf_pre_routing, PF_BRIDGE, NF_BR_PRE_ROUTING, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_local_in, PF_BRIDGE, NF_BR_LOCAL_IN, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_forward, PF_BRIDGE, NF_BR_FORWARD, NF_BR_PRI_BRNF },
+ { { NULL, NULL }, br_nf_local_out, PF_BRIDGE, NF_BR_LOCAL_OUT, NF_BR_PRI_FIRST },
+ { { NULL, NULL }, br_nf_post_routing, PF_BRIDGE, NF_BR_POST_ROUTING, NF_BR_PRI_LAST },
+ { { NULL, NULL }, ipv4_sabotage_in, PF_INET, NF_IP_PRE_ROUTING, NF_IP_PRI_FIRST },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_FORWARD, NF_IP_PRI_BRIDGE_SABOTAGE_FORWARD },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_BRIDGE_SABOTAGE_LOCAL_OUT },
+ { { NULL, NULL }, ipv4_sabotage_out, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_FIRST }
+};
+
+#define NUMHOOKS (sizeof(br_nf_ops)/sizeof(br_nf_ops[0]))
+
+int br_netfilter_init(void)
+{
+ int i;
+
+ for (i = 0; i < NUMHOOKS; i++) {
+ int ret;
+
+ if ((ret = nf_register_hook(&br_nf_ops[i])) >= 0)
+ continue;
+
+ while (i--)
+ nf_unregister_hook(&br_nf_ops[i]);
+
+ return ret;
+ }
+
+ printk(KERN_NOTICE "Bridge firewalling registered\n");
+
+ return 0;
+}
+
+void br_netfilter_fini(void)
+{
+ int i;
+
+ for (i = NUMHOOKS - 1; i >= 0; i--)
+ nf_unregister_hook(&br_nf_ops[i]);
+}
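
The register-then-unwind loop in br_netfilter_init() above can be illustrated outside the kernel. The following is only a minimal userspace sketch: demo_hook, demo_register(), demo_unregister() and demo_fail_at are hypothetical stand-ins for struct nf_hook_ops, nf_register_hook() and nf_unregister_hook(), and are not part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct nf_hook_ops (not from the patch). */
struct demo_hook {
    int id;
    int registered;
};

#define DEMO_NUMHOOKS 4

static struct demo_hook demo_hooks[DEMO_NUMHOOKS] = {
    { 0, 0 }, { 1, 0 }, { 2, 0 }, { 3, 0 }
};

/* Hook id whose registration should fail; -1 means never fail. */
static int demo_fail_at = -1;

/* Hypothetical stand-in for nf_register_hook(): negative on failure. */
static int demo_register(struct demo_hook *h)
{
    if (h->id == demo_fail_at)
        return -1;
    h->registered = 1;
    return 0;
}

/* Hypothetical stand-in for nf_unregister_hook(). */
static void demo_unregister(struct demo_hook *h)
{
    h->registered = 0;
}

/* Same shape as br_netfilter_init(): register in order and, on the
 * first failure, unregister everything registered so far.
 */
static int demo_init(void)
{
    int i;

    for (i = 0; i < DEMO_NUMHOOKS; i++) {
        int ret = demo_register(&demo_hooks[i]);

        if (ret >= 0)
            continue;

        /* i is the index that failed; walk back over 0..i-1. */
        while (i--)
            demo_unregister(&demo_hooks[i]);

        return ret;
    }

    return 0;
}

/* Helper for inspection: how many hooks are currently registered. */
static int demo_count_registered(void)
{
    int i, n = 0;

    for (i = 0; i < DEMO_NUMHOOKS; i++)
        n += demo_hooks[i].registered;
    return n;
}
```

With demo_fail_at set to 2, demo_init() returns the error and leaves no hook registered; with demo_fail_at left at -1, all four stay registered. The `while (i--)` idiom skips the failed entry because the post-decrement runs before the loop body uses i.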

2002-10-25 06:18:16

by Harald Welte

Subject: Re: [netfilter-core] [PATCH][RFC] bridge-nf -- map IPv4 hooks onto bridge hooks - try 3, vs 2.5.44

On Fri, Oct 25, 2002 at 08:01:16AM +0200, Bart De Schuymer wrote:
> As this ipt_physdev.c module is not essential I propose to already apply this
> patch.

I'm fine with this patch.

> Harald, should I make this module for patch-o-matic or can I post it directly
> to David?

Please send the patch to me for review. I will pass it on very quickly once
I have received it.

> cheers,
> Bart
--
Live long and prosper
- Harald Welte / [email protected] http://www.gnumonks.org/
============================================================================
"If this were a dictatorship, it'd be a heck of a lot easier, just so long
as I'm the dictator." -- George W. Bush Dec 18, 2000


2002-10-28 13:04:59

by David Miller

Subject: Re: [PATCH][RFC] bridge-nf -- map IPv4 hooks onto bridge hooks - try 3, vs 2.5.44

From: Bart De Schuymer <[email protected]>
Date: Fri, 25 Oct 2002 08:01:16 +0200

> The following patch deals with the problems you still had with the earlier one.
> Changes:
> 1. add #if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE) everywhere
> 2. don't touch ip_tables.c
> 3. no ipt_physdev.c file yet. I'll try to make it this weekend.

I've applied this to my tree, thanks.
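
For readers unfamiliar with the guard named in item 1 of the change list: a tristate option FOO defines CONFIG_FOO when built in and CONFIG_FOO_MODULE when built as a module, so code shared with the rest of the kernel has to test both. The following is a minimal sketch of that pattern only; demo_skb and demo_skb_has_bridge_info() are hypothetical names chosen for illustration and do not come from the patch.

```c
#include <stddef.h>

/* Pass -DCONFIG_BRIDGE or -DCONFIG_BRIDGE_MODULE to the compiler to
 * model a kernel with bridging built in or built as a module.
 */

/* Hypothetical cut-down packet structure (not the real sk_buff). */
struct demo_skb {
    unsigned int len;
#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
    void *nf_bridge;    /* exists only when bridging is configured */
#endif
};

/* Returns nonzero if the packet carries bridge state; compiles to a
 * constant 0 when bridging is configured out.
 */
static int demo_skb_has_bridge_info(const struct demo_skb *skb)
{
#if defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)
    return skb->nf_bridge != NULL;
#else
    (void)skb;          /* silence the unused-parameter warning */
    return 0;
#endif
}
```

Testing only CONFIG_BRIDGE would silently drop the field and the check when the bridge code is built as a module, which is presumably why the change list says to add both tests everywhere.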