2011-06-03 19:30:53

by Alexander Holler

[permalink] [raw]
Subject: bridge/netfilter: regression in 2.6.39.1

Hello,

I'm getting a oops in the bridge code in br_change_mtu() with 2.6.39.1.
The patch below seems to fix that.

I'm not sure about the usage of dst_cow_metrics_generic() in
fake_dst_ops, but after having a quick look at it seems to be ok to use
that here.

Regards,

Alexander

-----
From 3c1d5951af73389798afeea672ec224e195b8e8d Mon Sep 17 00:00:00 2001
From: Alexander Holler <[email protected]>
Date: Fri, 3 Jun 2011 20:43:06 +0200
Subject: [PATCH] bridge: add dst_cow_metrics_generic to fake_dst_ops

Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
should prevent, it introduces NULL a dereference.

The above commit uses dst_init_metrics() which sets the metrics as
read only. As result br_change_mtu() dies in dst_metric_set()
which calls dst_metrics_write_ptr() which calls
dst->ops->cow_metrics() if the metrics are read only.
---
net/bridge/br_netfilter.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 5f9c091..de982a1 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -107,6 +107,7 @@ static void fake_update_pmtu(struct dst_entry *dst,
u32 mtu)
static struct dst_ops fake_dst_ops = {
.family = AF_INET,
.protocol = cpu_to_be16(ETH_P_IP),
+ .cow_metrics = dst_cow_metrics_generic,
.update_pmtu = fake_update_pmtu,
};

--
1.7.3.4


2011-06-03 19:34:08

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le vendredi 03 juin 2011 à 21:21 +0200, Alexander Holler a écrit :
> Hello,
>
> I'm getting a oops in the bridge code in br_change_mtu() with 2.6.39.1.
> The patch below seems to fix that.
>
> I'm not sure about the usage of dst_cow_metrics_generic() in
> fake_dst_ops, but after having a quick look at it seems to be ok to use
> that here.
>
> Regards,
>
> Alexander
>
> -----
> From 3c1d5951af73389798afeea672ec224e195b8e8d Mon Sep 17 00:00:00 2001
> From: Alexander Holler <[email protected]>
> Date: Fri, 3 Jun 2011 20:43:06 +0200
> Subject: [PATCH] bridge: add dst_cow_metrics_generic to fake_dst_ops
>
> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
> should prevent, it introduces NULL a dereference.
>

I cant find this commit in known trees. Could you give the real commit
id and its title ?

> The above commit uses dst_init_metrics() which sets the metrics as
> read only. As result br_change_mtu() dies in dst_metric_set()
> which calls dst_metrics_write_ptr() which calls
> dst->ops->cow_metrics() if the metrics are read only.
> ---
> net/bridge/br_netfilter.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index 5f9c091..de982a1 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -107,6 +107,7 @@ static void fake_update_pmtu(struct dst_entry *dst,
> u32 mtu)
> static struct dst_ops fake_dst_ops = {
> .family = AF_INET,
> .protocol = cpu_to_be16(ETH_P_IP),
> + .cow_metrics = dst_cow_metrics_generic,
> .update_pmtu = fake_update_pmtu,
> };
>

Your patch is mangled (white spaces instead of tabulations)

Thanks

2011-06-03 19:43:08

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Am 03.06.2011 21:34, schrieb Eric Dumazet:
> Le vendredi 03 juin 2011 à 21:21 +0200, Alexander Holler a écrit :
>> Hello,
>>
>> I'm getting a oops in the bridge code in br_change_mtu() with 2.6.39.1.
>> The patch below seems to fix that.
>>
>> I'm not sure about the usage of dst_cow_metrics_generic() in
>> fake_dst_ops, but after having a quick look at it seems to be ok to use
>> that here.
>>
>> Regards,
>>
>> Alexander
>>
>> -----
>> From 3c1d5951af73389798afeea672ec224e195b8e8d Mon Sep 17 00:00:00 2001
>> From: Alexander Holler<[email protected]>
>> Date: Fri, 3 Jun 2011 20:43:06 +0200
>> Subject: [PATCH] bridge: add dst_cow_metrics_generic to fake_dst_ops
>>
>> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
>> should prevent, it introduces NULL a dereference.
>>
>
> I cant find this commit in known trees. Could you give the real commit
> id and its title ?
>
>> The above commit uses dst_init_metrics() which sets the metrics as
>> read only. As result br_change_mtu() dies in dst_metric_set()
>> which calls dst_metrics_write_ptr() which calls
>> dst->ops->cow_metrics() if the metrics are read only.
>> ---
>> net/bridge/br_netfilter.c | 1 +
>> 1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
>> index 5f9c091..de982a1 100644
>> --- a/net/bridge/br_netfilter.c
>> +++ b/net/bridge/br_netfilter.c
>> @@ -107,6 +107,7 @@ static void fake_update_pmtu(struct dst_entry *dst,
>> u32 mtu)
>> static struct dst_ops fake_dst_ops = {
>> .family = AF_INET,
>> .protocol = cpu_to_be16(ETH_P_IP),
>> + .cow_metrics = dst_cow_metrics_generic,
>> .update_pmtu = fake_update_pmtu,
>> };
>>
>
> Your patch is mangled (white spaces instead of tabulations)

The patch had a tab, so either c&p failed or something else removed the
tab. Maybe Thunderbird, don't know. Normally I'm using git send-email.

If someone gives a feedback about the content and not the style, I'm
willing to send a nice patch which too includes the forgotten Signed-off-by.

Regards,

Alexander

2011-06-03 19:51:23

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Hello,

Am 03.06.2011 21:34, schrieb Eric Dumazet:
>>
>> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
>> should prevent, it introduces NULL a dereference.
>>
>
> I cant find this commit in known trees. Could you give the real commit
> id and its title ?

Sorry, I haven't seen that question at first. This is the real commit id
as found in the stable tree for 2.6.39. The title of that commit is
"bridge: initialize fake_rtable metrics"

Regards,

Alexander

2011-06-03 19:55:19

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le vendredi 03 juin 2011 à 21:51 +0200, Alexander Holler a écrit :
> Hello,
>
> Am 03.06.2011 21:34, schrieb Eric Dumazet:
> >>
> >> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
> >> should prevent, it introduces NULL a dereference.
> >>
> >
> > I cant find this commit in known trees. Could you give the real commit
> > id and its title ?
>
> Sorry, I haven't seen that question at first. This is the real commit id
> as found in the stable tree for 2.6.39. The title of that commit is
> "bridge: initialize fake_rtable metrics"

OK, its commit id is 33eb9873a283a2076f2b5628813d5365ca420ea9

Please always use linux-2.6 tree commits, this saves a lot of time.

Thanks

2011-06-03 22:31:15

by Ben Hutchings

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

On Fri, 2011-06-03 at 21:34 +0200, Eric Dumazet wrote:
> Le vendredi 03 juin 2011 à 21:21 +0200, Alexander Holler a écrit :
> > Hello,
> >
> > I'm getting a oops in the bridge code in br_change_mtu() with 2.6.39.1.
> > The patch below seems to fix that.
> >
> > I'm not sure about the usage of dst_cow_metrics_generic() in
> > fake_dst_ops, but after having a quick look at it seems to be ok to use
> > that here.
> >
> > Regards,
> >
> > Alexander
> >
> > -----
> > From 3c1d5951af73389798afeea672ec224e195b8e8d Mon Sep 17 00:00:00 2001
> > From: Alexander Holler <[email protected]>
> > Date: Fri, 3 Jun 2011 20:43:06 +0200
> > Subject: [PATCH] bridge: add dst_cow_metrics_generic to fake_dst_ops
> >
> > Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
> > should prevent, it introduces NULL a dereference.
> >
>
> I cant find this commit in known trees. Could you give the real commit
> id and its title ?
[...]

It's your commit 33eb9873a283a2076f2b5628813d5365ca420ea9, applied to
2.6.39.

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

2011-06-04 12:05:21

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Am 03.06.2011 21:55, schrieb Eric Dumazet:
> Le vendredi 03 juin 2011 à 21:51 +0200, Alexander Holler a écrit :
>> Hello,
>>
>> Am 03.06.2011 21:34, schrieb Eric Dumazet:
>>>>
>>>> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
>>>> should prevent, it introduces NULL a dereference.
>>>>
>>>
>>> I cant find this commit in known trees. Could you give the real commit
>>> id and its title ?
>>
>> Sorry, I haven't seen that question at first. This is the real commit id
>> as found in the stable tree for 2.6.39. The title of that commit is
>> "bridge: initialize fake_rtable metrics"
>
> OK, its commit id is 33eb9873a283a2076f2b5628813d5365ca420ea9
>
> Please always use linux-2.6 tree commits, this saves a lot of time.

Sorry to have wasted your time with posting a full breakdown of the bug
and a possible solution. I didn't know that stable trees are unknown to
kernel developers and I did'nt know that the "upstream commit id" in
comments of commits in the stable tree are referencing to commit ids in
Linus tree.

Anyway, I've used the broken way of not using git send-email because I'm
unsure if the solution I've offered is the best one. I'm lacking the
knowledge about what cow_metrics() does, what it is used for, and what,
if any, pre- or post-requisites are needed to use dst_cow_metrics_generic().

I can only say that my solution seems to work here, the machine comes up
without an oops and I didn't have seen any other problems while using
the bridge (which surely doesn't say anything, e.g. I'm not sure if my
use of dst_cow_metrics_generic() may introduce a mem leak or similiar,
it doesn't look so, but who knows).

Regards,

Alexander

2011-06-06 06:58:15

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Hello,

Am 04.06.2011 14:04, schrieb Alexander Holler:

> Anyway, I've used the broken way of not using git send-email because I'm
> unsure if the solution I've offered is the best one. I'm lacking the
> knowledge about what cow_metrics() does, what it is used for, and what,
> if any, pre- or post-requisites are needed to use
> dst_cow_metrics_generic().
>
> I can only say that my solution seems to work here, the machine comes up
> without an oops and I didn't have seen any other problems while using
> the bridge (which surely doesn't say anything, e.g. I'm not sure if my
> use of dst_cow_metrics_generic() may introduce a mem leak or similiar,
> it doesn't look so, but who knows).

After having a second look at dst_cow_metrics_generic(), it seems my
patch does introduce a small mem leak which occurs when the bridge will
be build as a module and the module gets unloaded. Then the static
struct dst_ops fake_dst_ops disappears and with that the memory
allocated by dst_cow_metrics_generic() is lost. I'm not sure which of
the dst*-functions should be called before the module gets unloaded.
Maybe someone with a deeper knowledge of the that stuff could have a
look at that.
Or maybe just changing the true in the call to dst_cow_metrics_generic()
in br_netfilter_rtable_init() to false instead of adding
dst_cow_metrics_generic() to fake_dst_ops() is an alternative. As I
said, I don't know what that metric-stuff does and what it is used for.

Regards,

Alexander

2011-06-06 11:15:27

by Neil Horman

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
> Hello,
>
> I'm getting a oops in the bridge code in br_change_mtu() with
> 2.6.39.1. The patch below seems to fix that.
>
> I'm not sure about the usage of dst_cow_metrics_generic() in
> fake_dst_ops, but after having a quick look at it seems to be ok to
> use that here.
>
> Regards,
>
> Alexander
>
How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
into that clause in which we call cow_metrics when we call dst_metric_set. It
seems like that flag is set erroneously. perhaps we should just update
fake_rtable.dst to have the correct flags?
Neil

> -----
> From 3c1d5951af73389798afeea672ec224e195b8e8d Mon Sep 17 00:00:00 2001
> From: Alexander Holler <[email protected]>
> Date: Fri, 3 Jun 2011 20:43:06 +0200
> Subject: [PATCH] bridge: add dst_cow_metrics_generic to fake_dst_ops
>
> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
> should prevent, it introduces NULL a dereference.
>
> The above commit uses dst_init_metrics() which sets the metrics as
> read only. As result br_change_mtu() dies in dst_metric_set()
> which calls dst_metrics_write_ptr() which calls
> dst->ops->cow_metrics() if the metrics are read only.
> ---
> net/bridge/br_netfilter.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index 5f9c091..de982a1 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -107,6 +107,7 @@ static void fake_update_pmtu(struct dst_entry
> *dst, u32 mtu)
> static struct dst_ops fake_dst_ops = {
> .family = AF_INET,
> .protocol = cpu_to_be16(ETH_P_IP),
> + .cow_metrics = dst_cow_metrics_generic,
> .update_pmtu = fake_update_pmtu,
> };
>
> --
> 1.7.3.4
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-06-06 11:49:12

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Am 06.06.2011 13:15, schrieb Neil Horman:
> On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
>> Hello,
>>
>> I'm getting a oops in the bridge code in br_change_mtu() with
>> 2.6.39.1. The patch below seems to fix that.
>>
>> I'm not sure about the usage of dst_cow_metrics_generic() in
>> fake_dst_ops, but after having a quick look at it seems to be ok to
>> use that here.
>>
>> Regards,
>>
>> Alexander
>>
> How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
> wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
> into that clause in which we call cow_metrics when we call dst_metric_set. It
> seems like that flag is set erroneously. perhaps we should just update
> fake_rtable.dst to have the correct flags?
> Neil

It is set by that change:

--------
@@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
atomic_set(&rt->dst.__refcnt, 1);
rt->dst.dev = br->dev;
rt->dst.path = &rt->dst;
- dst_metric_set(&rt->dst, RTAX_MTU, 1500);
+ dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
rt->dst.flags = DST_NOXFRM;
rt->dst.ops = &fake_dst_ops;
}
--------

The true in dst_init_metrics() is responsible for that flag.

Regards,

Alexander

2011-06-06 12:12:44

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le lundi 06 juin 2011 à 13:48 +0200, Alexander Holler a écrit :
> Am 06.06.2011 13:15, schrieb Neil Horman:
> > On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
> >> Hello,
> >>
> >> I'm getting a oops in the bridge code in br_change_mtu() with
> >> 2.6.39.1. The patch below seems to fix that.
> >>
> >> I'm not sure about the usage of dst_cow_metrics_generic() in
> >> fake_dst_ops, but after having a quick look at it seems to be ok to
> >> use that here.
> >>
> >> Regards,
> >>
> >> Alexander
> >>
> > How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
> > wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
> > into that clause in which we call cow_metrics when we call dst_metric_set. It
> > seems like that flag is set erroneously. perhaps we should just update
> > fake_rtable.dst to have the correct flags?
> > Neil
>
> It is set by that change:
>
> --------
> @@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
> atomic_set(&rt->dst.__refcnt, 1);
> rt->dst.dev = br->dev;
> rt->dst.path = &rt->dst;
> - dst_metric_set(&rt->dst, RTAX_MTU, 1500);
> + dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
> rt->dst.flags = DST_NOXFRM;
> rt->dst.ops = &fake_dst_ops;
> }
> --------
>
> The true in dst_init_metrics() is responsible for that flag.
>

You are aware this change fixed an oops ?

read_only in this context means : In case this must be written, we make
a COW first
(allocate a piece of memory, copy the source in it before applying any
change)

It would be nice you send us the stack trace, so that we can have a clue
of whats going on.

Thanks

2011-06-06 12:49:31

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Am 06.06.2011 14:12, schrieb Eric Dumazet:
> Le lundi 06 juin 2011 à 13:48 +0200, Alexander Holler a écrit :
>> Am 06.06.2011 13:15, schrieb Neil Horman:
>>> On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
>>>> Hello,
>>>>
>>>> I'm getting a oops in the bridge code in br_change_mtu() with
>>>> 2.6.39.1. The patch below seems to fix that.
>>>>
>>>> I'm not sure about the usage of dst_cow_metrics_generic() in
>>>> fake_dst_ops, but after having a quick look at it seems to be ok to
>>>> use that here.
>>>>
>>>> Regards,
>>>>
>>>> Alexander
>>>>
>>> How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
>>> wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
>>> into that clause in which we call cow_metrics when we call dst_metric_set. It
>>> seems like that flag is set erroneously. perhaps we should just update
>>> fake_rtable.dst to have the correct flags?
>>> Neil
>>
>> It is set by that change:
>>
>> --------
>> @@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
>> atomic_set(&rt->dst.__refcnt, 1);
>> rt->dst.dev = br->dev;
>> rt->dst.path =&rt->dst;
>> - dst_metric_set(&rt->dst, RTAX_MTU, 1500);
>> + dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
>> rt->dst.flags = DST_NOXFRM;
>> rt->dst.ops =&fake_dst_ops;
>> }
>> --------
>>
>> The true in dst_init_metrics() is responsible for that flag.
>>
>
> You are aware this change fixed an oops ?

No, I'm not aware of this. I know almost nothing about what all that
stuff is doing. For me that change above just introduced an oops through
an immediate NULL pointer dereference in br_change_mtu().

> read_only in this context means : In case this must be written, we make
> a COW first
> (allocate a piece of memory, copy the source in it before applying any
> change)
>
> It would be nice you send us the stack trace, so that we can have a clue
> of whats going on.

Here is the text as found in my first mail:
----
Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
should prevent, it introduces NULL a dereference.

The above commit uses dst_init_metrics() which sets the metrics as
read only. As result br_change_mtu() dies in dst_metric_set()
which calls dst_metrics_write_ptr() which calls
dst->ops->cow_metrics() if the metrics are read only.

I don't have a system with 2.6.39.1 or 3.0-rcN ready which writes an
oops to somewhere else than the screen. Just set up a bridge and change
the MTU for the IF, that will trigger the oops.
----

Here is the oops:

----
[ 136.546023] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 136.546038] IP: [< (null)>] (null)
[ 136.546046] *pde = 00000000
[ 136.546052] Oops: 0000 [#1] SMP
[ 136.546059] last sysfs file: /sys/devices/virtual/net/mybridge/uevent
[ 136.546066] Modules linked in: sit tunnel4 tun iwlagn sch_sfq bridge
stp llc nfs lockd auth_rpcgss nfs_acl sunrpc ipt_LOG xt_recent xt_state
iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_addrtype xt_dscp xt_iprange xt_DSCP xt_set ip_set nfnetlink ip6t_LOG
xt_limit ip6table_filter xt_string xt_owner xt_multiport xt_hashlimit
xt_conntrack xt_NFQUEUE xt_mark xt_connmark nf_conntrack ip6_tables
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss uinput ipv6 dm_mod nvidiafb fb_ddc i2c_algo_bit vgastate
nvidia(P) arc4 uvcvideo videodev ecb snd_hda_codec_realtek iwl4965
snd_hda_intel iwl_legacy snd_hda_codec uhci_hcd firewire_ohci mac80211
sdhci_pci snd_hwdep snd_pcm sdhci ehci_hcd intel_agp firewire_core
cfg80211 snd_timer asus_laptop snd e1000e usbcore sparse_keymap rfkill
tpm_infineon crc_itu_t intel_gtt mmc_core iTCO_wdt tpm_tis ata_piix
agpgart tpm video soundcore sg tpm_bios joydev snd_page_alloc [last
unloaded: microcode]
[ 136.546235]
[ 136.546243] Pid: 8415, comm: ip Tainted: P
2.6.39.1-00006-g40545b7 #103 ASUSTeK Computer Inc. V1Sn
/V1Sn
[ 136.546256] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
[ 136.546268] EIP is at 0x0
[ 136.546273] EAX: f14a389c EBX: 000005d4 ECX: f80d32c0 EDX: f80d1da1
[ 136.546279] ESI: f14a3000 EDI: f255bf10 EBP: f15c3b54 ESP: f15c3b48
[ 136.546285] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 136.546293] Process ip (pid: 8415, ti=f15c2000 task=f4741f80
task.ti=f15c2000)
[ 136.546297] Stack:
[ 136.546301] f80c658f f14a3000 ffffffed f15c3b64 c12cb9c8 f80d1b80
ffffffa1 f15c3bbc
[ 136.546315] c12da347 c12d9c7d 00000000 f7670b00 00000000 f80d1b80
ffffffa6 f15c3be4
[ 136.546329] 00000004 f14a3000 f255bf20 00000008 f15c3bbc c11d6cae
00000000 00000000
[ 136.546343] Call Trace:
[ 136.546359] [<f80c658f>] ? br_change_mtu+0x5f/0x80 [bridge]
[ 136.546372] [<c12cb9c8>] dev_set_mtu+0x38/0x80
[ 136.546381] [<c12da347>] do_setlink+0x1a7/0x860
[ 136.546390] [<c12d9c7d>] ? rtnl_fill_ifinfo+0x9bd/0xc70
[ 136.546400] [<c11d6cae>] ? nla_parse+0x6e/0xb0
[ 136.546409] [<c12db931>] rtnl_newlink+0x361/0x510
[ 136.546420] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
[ 136.546429] [<c1362762>] ? error_code+0x5a/0x60
[ 136.546438] [<c12db5d0>] ? rtnl_configure_link+0x80/0x80
[ 136.546446] [<c12db27a>] rtnetlink_rcv_msg+0xfa/0x210
[ 136.546454] [<c12db180>] ? __rtnl_unlock+0x20/0x20
[ 136.546463] [<c12ee0fe>] netlink_rcv_skb+0x8e/0xb0
[ 136.546471] [<c12daf1c>] rtnetlink_rcv+0x1c/0x30
[ 136.546479] [<c12edafa>] netlink_unicast+0x23a/0x280
[ 136.546487] [<c12ede6b>] netlink_sendmsg+0x26b/0x2f0
[ 136.546497] [<c12bb828>] sock_sendmsg+0xc8/0x100
[ 136.546508] [<c10adf61>] ? __alloc_pages_nodemask+0xe1/0x750
[ 136.546517] [<c11d0602>] ? _copy_from_user+0x42/0x60
[ 136.546525] [<c12c5e4c>] ? verify_iovec+0x4c/0xc0
[ 136.546534] [<c12bd805>] sys_sendmsg+0x1c5/0x200
[ 136.546542] [<c10c2150>] ? __do_fault+0x310/0x410
[ 136.546549] [<c10c2c46>] ? do_wp_page+0x1d6/0x6b0
[ 136.546557] [<c10c47d1>] ? handle_pte_fault+0xe1/0x720
[ 136.546565] [<c12bd1af>] ? sys_getsockname+0x7f/0x90
[ 136.546574] [<c10c4ec1>] ? handle_mm_fault+0xb1/0x180
[ 136.546582] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
[ 136.546589] [<c10233b3>] ? do_page_fault+0x173/0x3d0
[ 136.546596] [<c12bd87b>] ? sys_recvmsg+0x3b/0x60
[ 136.546605] [<c12bdd83>] sys_socketcall+0x293/0x2d0
[ 136.546614] [<c13629d0>] sysenter_do_call+0x12/0x26
[ 136.546619] Code: Bad EIP value.
[ 136.546627] EIP: [<00000000>] 0x0 SS:ESP 0068:f15c3b48
[ 136.546645] CR2: 0000000000000000
[ 136.546652] ---[ end trace 6909b560e78934fa ]---
----

And if you want the oops on your machine (uisng 2.6.39.1 or 3.0-rcN):

----
brctl addbr mybridge
ip link set mybridge mtu 1234
oops
----

Regards,

Alexander

2011-06-06 13:09:28

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le lundi 06 juin 2011 à 14:12 +0200, Eric Dumazet a écrit :
> Le lundi 06 juin 2011 à 13:48 +0200, Alexander Holler a écrit :
> > Am 06.06.2011 13:15, schrieb Neil Horman:
> > > On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
> > >> Hello,
> > >>
> > >> I'm getting a oops in the bridge code in br_change_mtu() with
> > >> 2.6.39.1. The patch below seems to fix that.
> > >>
> > >> I'm not sure about the usage of dst_cow_metrics_generic() in
> > >> fake_dst_ops, but after having a quick look at it seems to be ok to
> > >> use that here.
> > >>
> > >> Regards,
> > >>
> > >> Alexander
> > >>
> > > How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
> > > wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
> > > into that clause in which we call cow_metrics when we call dst_metric_set. It
> > > seems like that flag is set erroneously. perhaps we should just update
> > > fake_rtable.dst to have the correct flags?
> > > Neil
> >
> > It is set by that change:
> >
> > --------
> > @@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
> > atomic_set(&rt->dst.__refcnt, 1);
> > rt->dst.dev = br->dev;
> > rt->dst.path = &rt->dst;
> > - dst_metric_set(&rt->dst, RTAX_MTU, 1500);
> > + dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
> > rt->dst.flags = DST_NOXFRM;
> > rt->dst.ops = &fake_dst_ops;
> > }
> > --------
> >
> > The true in dst_init_metrics() is responsible for that flag.
> >
>
> You are aware this change fixed an oops ?
>
> read_only in this context means : In case this must be written, we make
> a COW first
> (allocate a piece of memory, copy the source in it before applying any
> change)
>
> It would be nice you send us the stack trace, so that we can have a clue
> of whats going on.
>

Alexander, you should take a look at :

git show 0972ddb2

To get an idea of how to deal with this problem
(See how Held Bernhard included a backtrace to help us make a
diagnostic)

We dont want to even allocate a piece of memory to copy
br_dst_default_metric for a fake dst.

2011-06-06 13:14:10

by Neil Horman

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

On Mon, Jun 06, 2011 at 02:49:04PM +0200, Alexander Holler wrote:
> Am 06.06.2011 14:12, schrieb Eric Dumazet:
> >Le lundi 06 juin 2011 ? 13:48 +0200, Alexander Holler a ?crit :
> >>Am 06.06.2011 13:15, schrieb Neil Horman:
> >>>On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
> >>>>Hello,
> >>>>
> >>>>I'm getting a oops in the bridge code in br_change_mtu() with
> >>>>2.6.39.1. The patch below seems to fix that.
> >>>>
> >>>>I'm not sure about the usage of dst_cow_metrics_generic() in
> >>>>fake_dst_ops, but after having a quick look at it seems to be ok to
> >>>>use that here.
> >>>>
> >>>>Regards,
> >>>>
> >>>>Alexander
> >>>>
> >>>How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
> >>>wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
> >>>into that clause in which we call cow_metrics when we call dst_metric_set. It
> >>>seems like that flag is set erroneously. perhaps we should just update
> >>>fake_rtable.dst to have the correct flags?
> >>>Neil
> >>
> >>It is set by that change:
> >>
> >>--------
> >>@@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
> >> atomic_set(&rt->dst.__refcnt, 1);
> >> rt->dst.dev = br->dev;
> >> rt->dst.path =&rt->dst;
> >>- dst_metric_set(&rt->dst, RTAX_MTU, 1500);
> >>+ dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
> >> rt->dst.flags = DST_NOXFRM;
> >> rt->dst.ops =&fake_dst_ops;
> >> }
> >>--------
> >>
> >>The true in dst_init_metrics() is responsible for that flag.
> >>
> >
> >You are aware this change fixed an oops ?
>
> No, I'm not aware of this. I know almost nothing about what all that
> stuff is doing. For me that change above just introduced an oops
> through an immediate NULL pointer dereference in br_change_mtu().
>
> >read_only in this context means : In case this must be written, we make
> >a COW first
> >(allocate a piece of memory, copy the source in it before applying any
> >change)
> >
> >It would be nice you send us the stack trace, so that we can have a clue
> >of whats going on.
>
> Here is the text as found in my first mail:
> ----
> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
> should prevent, it introduces NULL a dereference.
>
> The above commit uses dst_init_metrics() which sets the metrics as
> read only. As result br_change_mtu() dies in dst_metric_set()
> which calls dst_metrics_write_ptr() which calls
> dst->ops->cow_metrics() if the metrics are read only.
>
> I don't have a system with 2.6.39.1 or 3.0-rcN ready which writes an
> oops to somewhere else than the screen. Just set up a bridge and
> change the MTU for the IF, that will trigger the oops.
> ----
>
> Here is the oops:
>
> ----
> [ 136.546023] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> [ 136.546038] IP: [< (null)>] (null)
> [ 136.546046] *pde = 00000000
> [ 136.546052] Oops: 0000 [#1] SMP
> [ 136.546059] last sysfs file: /sys/devices/virtual/net/mybridge/uevent
> [ 136.546066] Modules linked in: sit tunnel4 tun iwlagn sch_sfq
> bridge stp llc nfs lockd auth_rpcgss nfs_acl sunrpc ipt_LOG
> xt_recent xt_state iptable_filter iptable_nat nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_addrtype xt_dscp xt_iprange
> xt_DSCP xt_set ip_set nfnetlink ip6t_LOG xt_limit ip6table_filter
> xt_string xt_owner xt_multiport xt_hashlimit xt_conntrack xt_NFQUEUE
> xt_mark xt_connmark nf_conntrack ip6_tables snd_seq_oss
> snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
> uinput ipv6 dm_mod nvidiafb fb_ddc i2c_algo_bit vgastate nvidia(P)
> arc4 uvcvideo videodev ecb snd_hda_codec_realtek iwl4965
> snd_hda_intel iwl_legacy snd_hda_codec uhci_hcd firewire_ohci
> mac80211 sdhci_pci snd_hwdep snd_pcm sdhci ehci_hcd intel_agp
> firewire_core cfg80211 snd_timer asus_laptop snd e1000e usbcore
> sparse_keymap rfkill tpm_infineon crc_itu_t intel_gtt mmc_core
> iTCO_wdt tpm_tis ata_piix agpgart tpm video soundcore sg tpm_bios
> joydev snd_page_alloc [last unloaded: microcode]
> [ 136.546235]
> [ 136.546243] Pid: 8415, comm: ip Tainted: P
> 2.6.39.1-00006-g40545b7 #103 ASUSTeK Computer Inc. V1Sn
> /V1Sn
> [ 136.546256] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> [ 136.546268] EIP is at 0x0
> [ 136.546273] EAX: f14a389c EBX: 000005d4 ECX: f80d32c0 EDX: f80d1da1
> [ 136.546279] ESI: f14a3000 EDI: f255bf10 EBP: f15c3b54 ESP: f15c3b48
> [ 136.546285] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 136.546293] Process ip (pid: 8415, ti=f15c2000 task=f4741f80
> task.ti=f15c2000)
> [ 136.546297] Stack:
> [ 136.546301] f80c658f f14a3000 ffffffed f15c3b64 c12cb9c8
> f80d1b80 ffffffa1 f15c3bbc
> [ 136.546315] c12da347 c12d9c7d 00000000 f7670b00 00000000
> f80d1b80 ffffffa6 f15c3be4
> [ 136.546329] 00000004 f14a3000 f255bf20 00000008 f15c3bbc
> c11d6cae 00000000 00000000
> [ 136.546343] Call Trace:
> [ 136.546359] [<f80c658f>] ? br_change_mtu+0x5f/0x80 [bridge]
> [ 136.546372] [<c12cb9c8>] dev_set_mtu+0x38/0x80
> [ 136.546381] [<c12da347>] do_setlink+0x1a7/0x860
> [ 136.546390] [<c12d9c7d>] ? rtnl_fill_ifinfo+0x9bd/0xc70
> [ 136.546400] [<c11d6cae>] ? nla_parse+0x6e/0xb0
> [ 136.546409] [<c12db931>] rtnl_newlink+0x361/0x510
> [ 136.546420] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
> [ 136.546429] [<c1362762>] ? error_code+0x5a/0x60
> [ 136.546438] [<c12db5d0>] ? rtnl_configure_link+0x80/0x80
> [ 136.546446] [<c12db27a>] rtnetlink_rcv_msg+0xfa/0x210
> [ 136.546454] [<c12db180>] ? __rtnl_unlock+0x20/0x20
> [ 136.546463] [<c12ee0fe>] netlink_rcv_skb+0x8e/0xb0
> [ 136.546471] [<c12daf1c>] rtnetlink_rcv+0x1c/0x30
> [ 136.546479] [<c12edafa>] netlink_unicast+0x23a/0x280
> [ 136.546487] [<c12ede6b>] netlink_sendmsg+0x26b/0x2f0
> [ 136.546497] [<c12bb828>] sock_sendmsg+0xc8/0x100
> [ 136.546508] [<c10adf61>] ? __alloc_pages_nodemask+0xe1/0x750
> [ 136.546517] [<c11d0602>] ? _copy_from_user+0x42/0x60
> [ 136.546525] [<c12c5e4c>] ? verify_iovec+0x4c/0xc0
> [ 136.546534] [<c12bd805>] sys_sendmsg+0x1c5/0x200
> [ 136.546542] [<c10c2150>] ? __do_fault+0x310/0x410
> [ 136.546549] [<c10c2c46>] ? do_wp_page+0x1d6/0x6b0
> [ 136.546557] [<c10c47d1>] ? handle_pte_fault+0xe1/0x720
> [ 136.546565] [<c12bd1af>] ? sys_getsockname+0x7f/0x90
> [ 136.546574] [<c10c4ec1>] ? handle_mm_fault+0xb1/0x180
> [ 136.546582] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
> [ 136.546589] [<c10233b3>] ? do_page_fault+0x173/0x3d0
> [ 136.546596] [<c12bd87b>] ? sys_recvmsg+0x3b/0x60
> [ 136.546605] [<c12bdd83>] sys_socketcall+0x293/0x2d0
> [ 136.546614] [<c13629d0>] sysenter_do_call+0x12/0x26
> [ 136.546619] Code: Bad EIP value.
> [ 136.546627] EIP: [<00000000>] 0x0 SS:ESP 0068:f15c3b48
> [ 136.546645] CR2: 0000000000000000
> [ 136.546652] ---[ end trace 6909b560e78934fa ]---
> ----
>
> And if you want the oops on your machine (uisng 2.6.39.1 or 3.0-rcN):
>
> ----
> brctl addbr mybridge
> ip link set mybridge mtu 1234
> oops
> ----
>
Ok, this makes sense to me now, thanks. The change to the dst initalization to
mark our fake routing table as read only means we need a cow_metrics method to
copy the metrics before we we can compete the dst_metric_set in br_change_mtu.
Thats fine, but given that its really a fake routing table with only one dst
entry which (I think) is only written under the rtnl lock, why not just modify
the dst_init_metrics call so that its not marked as read-only?

Regards
Neil

> Regards,
>
> Alexander
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-06-06 13:16:13

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le lundi 06 juin 2011 à 14:49 +0200, Alexander Holler a écrit :
> Am 06.06.2011 14:12, schrieb Eric Dumazet:
> > Le lundi 06 juin 2011 à 13:48 +0200, Alexander Holler a écrit :
> >> Am 06.06.2011 13:15, schrieb Neil Horman:
> >>> On Fri, Jun 03, 2011 at 09:21:06PM +0200, Alexander Holler wrote:
> >>>> Hello,
> >>>>
> >>>> I'm getting a oops in the bridge code in br_change_mtu() with
> >>>> 2.6.39.1. The patch below seems to fix that.
> >>>>
> >>>> I'm not sure about the usage of dst_cow_metrics_generic() in
> >>>> fake_dst_ops, but after having a quick look at it seems to be ok to
> >>>> use that here.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Alexander
> >>>>
> >>> How did the flags of the dst entry on which we're callnig dst_entry_write_ptr
> >>> wind up getting the READ_ONLY flag set on it? I don't see how we'er falling
> >>> into that clause in which we call cow_metrics when we call dst_metric_set. It
> >>> seems like that flag is set erroneously. perhaps we should just update
> >>> fake_rtable.dst to have the correct flags?
> >>> Neil
> >>
> >> It is set by that change:
> >>
> >> --------
> >> @@ -124,7 +128,7 @@ void br_netfilter_rtable_init(struct net_bridge *br)
> >> atomic_set(&rt->dst.__refcnt, 1);
> >> rt->dst.dev = br->dev;
> >> rt->dst.path =&rt->dst;
> >> - dst_metric_set(&rt->dst, RTAX_MTU, 1500);
> >> + dst_init_metrics(&rt->dst, br_dst_default_metrics, true);
> >> rt->dst.flags = DST_NOXFRM;
> >> rt->dst.ops =&fake_dst_ops;
> >> }
> >> --------
> >>
> >> The true in dst_init_metrics() is responsible for that flag.
> >>
> >
> > You are aware this change fixed an oops ?
>
> No, I'm not aware of this. I know almost nothing about what all that
> stuff is doing. For me that change above just introduced an oops through
> an immediate NULL pointer dereference in br_change_mtu().
>
> > read_only in this context means : In case this must be written, we make
> > a COW first
> > (allocate a piece of memory, copy the source in it before applying any
> > change)
> >
> > It would be nice you send us the stack trace, so that we can have a clue
> > of whats going on.
>
> Here is the text as found in my first mail:
> ----
> Commit 42923465fb8d025a2b5153f2e7ab1e6e1058bf00 does here what it
> should prevent, it introduces NULL a dereference.
>

Hmm, this commit uncovers a previous bug, introduced in commit
62fa8a846d7d.

> The above commit uses dst_init_metrics() which sets the metrics as
> read only. As result br_change_mtu() dies in dst_metric_set()
> which calls dst_metrics_write_ptr() which calls
> dst->ops->cow_metrics() if the metrics are read only.
>
> I don't have a system with 2.6.39.1 or 3.0-rcN ready which writes an
> oops to somewhere else than the screen. Just set up a bridge and change
> the MTU for the IF, that will trigger the oops.
> ----
>
> Here is the oops:
>
> ----
> [ 136.546023] BUG: unable to handle kernel NULL pointer dereference at
> (null)
> [ 136.546038] IP: [< (null)>] (null)
> [ 136.546046] *pde = 00000000
> [ 136.546052] Oops: 0000 [#1] SMP
> [ 136.546059] last sysfs file: /sys/devices/virtual/net/mybridge/uevent
> [ 136.546066] Modules linked in: sit tunnel4 tun iwlagn sch_sfq bridge
> stp llc nfs lockd auth_rpcgss nfs_acl sunrpc ipt_LOG xt_recent xt_state
> iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_addrtype xt_dscp xt_iprange xt_DSCP xt_set ip_set nfnetlink ip6t_LOG
> xt_limit ip6table_filter xt_string xt_owner xt_multiport xt_hashlimit
> xt_conntrack xt_NFQUEUE xt_mark xt_connmark nf_conntrack ip6_tables
> snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
> snd_mixer_oss uinput ipv6 dm_mod nvidiafb fb_ddc i2c_algo_bit vgastate
> nvidia(P) arc4 uvcvideo videodev ecb snd_hda_codec_realtek iwl4965
> snd_hda_intel iwl_legacy snd_hda_codec uhci_hcd firewire_ohci mac80211
> sdhci_pci snd_hwdep snd_pcm sdhci ehci_hcd intel_agp firewire_core
> cfg80211 snd_timer asus_laptop snd e1000e usbcore sparse_keymap rfkill
> tpm_infineon crc_itu_t intel_gtt mmc_core iTCO_wdt tpm_tis ata_piix
> agpgart tpm video soundcore sg tpm_bios joydev snd_page_alloc [last
> unloaded: microcode]
> [ 136.546235]
> [ 136.546243] Pid: 8415, comm: ip Tainted: P
> 2.6.39.1-00006-g40545b7 #103 ASUSTeK Computer Inc. V1Sn
> /V1Sn
> [ 136.546256] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> [ 136.546268] EIP is at 0x0
> [ 136.546273] EAX: f14a389c EBX: 000005d4 ECX: f80d32c0 EDX: f80d1da1
> [ 136.546279] ESI: f14a3000 EDI: f255bf10 EBP: f15c3b54 ESP: f15c3b48
> [ 136.546285] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 136.546293] Process ip (pid: 8415, ti=f15c2000 task=f4741f80
> task.ti=f15c2000)
> [ 136.546297] Stack:
> [ 136.546301] f80c658f f14a3000 ffffffed f15c3b64 c12cb9c8 f80d1b80
> ffffffa1 f15c3bbc
> [ 136.546315] c12da347 c12d9c7d 00000000 f7670b00 00000000 f80d1b80
> ffffffa6 f15c3be4
> [ 136.546329] 00000004 f14a3000 f255bf20 00000008 f15c3bbc c11d6cae
> 00000000 00000000
> [ 136.546343] Call Trace:
> [ 136.546359] [<f80c658f>] ? br_change_mtu+0x5f/0x80 [bridge]
> [ 136.546372] [<c12cb9c8>] dev_set_mtu+0x38/0x80
> [ 136.546381] [<c12da347>] do_setlink+0x1a7/0x860
> [ 136.546390] [<c12d9c7d>] ? rtnl_fill_ifinfo+0x9bd/0xc70
> [ 136.546400] [<c11d6cae>] ? nla_parse+0x6e/0xb0
> [ 136.546409] [<c12db931>] rtnl_newlink+0x361/0x510
> [ 136.546420] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
> [ 136.546429] [<c1362762>] ? error_code+0x5a/0x60
> [ 136.546438] [<c12db5d0>] ? rtnl_configure_link+0x80/0x80
> [ 136.546446] [<c12db27a>] rtnetlink_rcv_msg+0xfa/0x210
> [ 136.546454] [<c12db180>] ? __rtnl_unlock+0x20/0x20
> [ 136.546463] [<c12ee0fe>] netlink_rcv_skb+0x8e/0xb0
> [ 136.546471] [<c12daf1c>] rtnetlink_rcv+0x1c/0x30
> [ 136.546479] [<c12edafa>] netlink_unicast+0x23a/0x280
> [ 136.546487] [<c12ede6b>] netlink_sendmsg+0x26b/0x2f0
> [ 136.546497] [<c12bb828>] sock_sendmsg+0xc8/0x100
> [ 136.546508] [<c10adf61>] ? __alloc_pages_nodemask+0xe1/0x750
> [ 136.546517] [<c11d0602>] ? _copy_from_user+0x42/0x60
> [ 136.546525] [<c12c5e4c>] ? verify_iovec+0x4c/0xc0
> [ 136.546534] [<c12bd805>] sys_sendmsg+0x1c5/0x200
> [ 136.546542] [<c10c2150>] ? __do_fault+0x310/0x410
> [ 136.546549] [<c10c2c46>] ? do_wp_page+0x1d6/0x6b0
> [ 136.546557] [<c10c47d1>] ? handle_pte_fault+0xe1/0x720
> [ 136.546565] [<c12bd1af>] ? sys_getsockname+0x7f/0x90
> [ 136.546574] [<c10c4ec1>] ? handle_mm_fault+0xb1/0x180
> [ 136.546582] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
> [ 136.546589] [<c10233b3>] ? do_page_fault+0x173/0x3d0
> [ 136.546596] [<c12bd87b>] ? sys_recvmsg+0x3b/0x60
> [ 136.546605] [<c12bdd83>] sys_socketcall+0x293/0x2d0
> [ 136.546614] [<c13629d0>] sysenter_do_call+0x12/0x26
> [ 136.546619] Code: Bad EIP value.
> [ 136.546627] EIP: [<00000000>] 0x0 SS:ESP 0068:f15c3b48
> [ 136.546645] CR2: 0000000000000000
> [ 136.546652] ---[ end trace 6909b560e78934fa ]---
> ----
>
> And if you want the oops on your machine (uisng 2.6.39.1 or 3.0-rcN):
>
> ----
> brctl addbr mybridge
> ip link set mybridge mtu 1234
> oops
> ----

Nice, now please submit a patch with 0972ddb237 as a guideline.

BTW, you could also check other struct dst_ops methods for
bridge/netfilter:

- What about ADVMSS or things like that, see commit 214f45c9


2011-06-06 13:18:20

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le lundi 06 juin 2011 à 09:13 -0400, Neil Horman a écrit :

> Ok, this makes sense to me now, thanks. The change to the dst initalization to
> mark our fake routing table as read only means we need a cow_metrics method to
> copy the metrics before we we can compete the dst_metric_set in br_change_mtu.
> Thats fine, but given that its really a fake routing table with only one dst
> entry which (I think) is only written under the rtnl lock, why not just modify
> the dst_init_metrics call so that its not marked as read-only?

It _is_ read-only, its even a "const", so if you do that you'll trap if
kernel const pages are RO

Please check commit 0972ddb237 for a proper way to deal with this.


2011-06-06 13:30:13

by Alexander Holler

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1


> Nice, now please submit a patch with 0972ddb237 as a guideline.
>
> BTW, you could also check other struct dst_ops methods for
> bridge/netfilter:

Sorry, but I prefer to submit patches I understand by myself and for
stuff I know something about. The patch in my first mail was just meant
as a quick fix (which seemed to work here).

So even if I might be able to construct a working patch using commit
0972ddb237, I don't think I should do that.

Regards,

Alexander

2011-06-06 14:26:10

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

From: Alexander Holler <[email protected]>

Le lundi 06 juin 2011 à 15:29 +0200, Alexander Holler a écrit :
> > Nice, now please submit a patch with 0972ddb237 as a guideline.
> >
> > BTW, you could also check other struct dst_ops methods for
> > bridge/netfilter:
>
> Sorry, but I prefer to submit patches I understand by myself and for
> stuff I know something about. The patch in my first mail was just meant
> as a quick fix (which seemed to work here).
>
> So even if I might be able to construct a working patch using commit
> 0972ddb237, I don't think I should do that.

OK, I'll do it for you then, I am surprised you dont understand my
review / suggestion.

One patch submitter is supposed to followup and send a new version to
take into account reviews/comments, not wait that eventually everybody
says "OK, lets take it as is"


[PATCH] bridge: provide a cow_metrics method for fake_ops

Like in commit 0972ddb237 (provide cow_metrics() methods to blackhole
dst_ops), we must provide a cow_metrics for bridges fake_dst_ops as
well.

This fixes a regression coming from commits 62fa8a846d7d (net: Implement
read-only protection and COW'ing of metrics.) and 33eb9873a28 (bridge:
initialize fake_rtable metrics)

ip link set mybridge mtu 1234
-->
[ 136.546243] Pid: 8415, comm: ip Tainted: P
2.6.39.1-00006-g40545b7 #103 ASUSTeK Computer Inc. V1Sn
/V1Sn
[ 136.546256] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
[ 136.546268] EIP is at 0x0
[ 136.546273] EAX: f14a389c EBX: 000005d4 ECX: f80d32c0 EDX: f80d1da1
[ 136.546279] ESI: f14a3000 EDI: f255bf10 EBP: f15c3b54 ESP: f15c3b48
[ 136.546285] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 136.546293] Process ip (pid: 8415, ti=f15c2000 task=f4741f80
task.ti=f15c2000)
[ 136.546297] Stack:
[ 136.546301] f80c658f f14a3000 ffffffed f15c3b64 c12cb9c8 f80d1b80
ffffffa1 f15c3bbc
[ 136.546315] c12da347 c12d9c7d 00000000 f7670b00 00000000 f80d1b80
ffffffa6 f15c3be4
[ 136.546329] 00000004 f14a3000 f255bf20 00000008 f15c3bbc c11d6cae
00000000 00000000
[ 136.546343] Call Trace:
[ 136.546359] [<f80c658f>] ? br_change_mtu+0x5f/0x80 [bridge]
[ 136.546372] [<c12cb9c8>] dev_set_mtu+0x38/0x80
[ 136.546381] [<c12da347>] do_setlink+0x1a7/0x860
[ 136.546390] [<c12d9c7d>] ? rtnl_fill_ifinfo+0x9bd/0xc70
[ 136.546400] [<c11d6cae>] ? nla_parse+0x6e/0xb0
[ 136.546409] [<c12db931>] rtnl_newlink+0x361/0x510
[ 136.546420] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
[ 136.546429] [<c1362762>] ? error_code+0x5a/0x60
[ 136.546438] [<c12db5d0>] ? rtnl_configure_link+0x80/0x80
[ 136.546446] [<c12db27a>] rtnetlink_rcv_msg+0xfa/0x210
[ 136.546454] [<c12db180>] ? __rtnl_unlock+0x20/0x20
[ 136.546463] [<c12ee0fe>] netlink_rcv_skb+0x8e/0xb0
[ 136.546471] [<c12daf1c>] rtnetlink_rcv+0x1c/0x30
[ 136.546479] [<c12edafa>] netlink_unicast+0x23a/0x280
[ 136.546487] [<c12ede6b>] netlink_sendmsg+0x26b/0x2f0
[ 136.546497] [<c12bb828>] sock_sendmsg+0xc8/0x100
[ 136.546508] [<c10adf61>] ? __alloc_pages_nodemask+0xe1/0x750
[ 136.546517] [<c11d0602>] ? _copy_from_user+0x42/0x60
[ 136.546525] [<c12c5e4c>] ? verify_iovec+0x4c/0xc0
[ 136.546534] [<c12bd805>] sys_sendmsg+0x1c5/0x200
[ 136.546542] [<c10c2150>] ? __do_fault+0x310/0x410
[ 136.546549] [<c10c2c46>] ? do_wp_page+0x1d6/0x6b0
[ 136.546557] [<c10c47d1>] ? handle_pte_fault+0xe1/0x720
[ 136.546565] [<c12bd1af>] ? sys_getsockname+0x7f/0x90
[ 136.546574] [<c10c4ec1>] ? handle_mm_fault+0xb1/0x180
[ 136.546582] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
[ 136.546589] [<c10233b3>] ? do_page_fault+0x173/0x3d0
[ 136.546596] [<c12bd87b>] ? sys_recvmsg+0x3b/0x60
[ 136.546605] [<c12bdd83>] sys_socketcall+0x293/0x2d0
[ 136.546614] [<c13629d0>] sysenter_do_call+0x12/0x26
[ 136.546619] Code: Bad EIP value.
[ 136.546627] EIP: [<00000000>] 0x0 SS:ESP 0068:f15c3b48
[ 136.546645] CR2: 0000000000000000
[ 136.546652] ---[ end trace 6909b560e78934fa ]---

Signed-off-by: Alexander Holler <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
CC: Neil Horman <[email protected]>
CC: Herbert Xu <[email protected]>
---
net/bridge/br_netfilter.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 3fa1231..23b43d2 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -104,10 +104,16 @@ static void fake_update_pmtu(struct dst_entry *dst, u32 mtu)
{
}

+static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old)
+{
+ return NULL;
+}
+
static struct dst_ops fake_dst_ops = {
.family = AF_INET,
.protocol = cpu_to_be16(ETH_P_IP),
.update_pmtu = fake_update_pmtu,
+ .cow_metrics = fake_cow_metrics,
};

/*

2011-06-06 15:33:10

by Neil Horman

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

On Mon, Jun 06, 2011 at 04:26:03PM +0200, Eric Dumazet wrote:
> From: Alexander Holler <[email protected]>
>
> Le lundi 06 juin 2011 ? 15:29 +0200, Alexander Holler a ?crit :
> > > Nice, now please submit a patch with 0972ddb237 as a guideline.
> > >
> > > BTW, you could also check other struct dst_ops methods for
> > > bridge/netfilter:
> >
> > Sorry, but I prefer to submit patches I understand by myself and for
> > stuff I know something about. The patch in my first mail was just meant
> > as a quick fix (which seemed to work here).
> >
> > So even if I might be able to construct a working patch using commit
> > 0972ddb237, I don't think I should do that.
>
> OK, I'll do it for you then, I am surprised you dont understand my
> review / suggestion.
>
> One patch submitter is supposed to followup and send a new version to
> take into account reviews/comments, not wait that eventually everybody
> says "OK, lets take it as is"
>
>
> [PATCH] bridge: provide a cow_metrics method for fake_ops
>
> Like in commit 0972ddb237 (provide cow_metrics() methods to blackhole
> dst_ops), we must provide a cow_metrics for bridges fake_dst_ops as
> well.
>
> This fixes a regression coming from commits 62fa8a846d7d (net: Implement
> read-only protection and COW'ing of metrics.) and 33eb9873a28 (bridge:
> initialize fake_rtable metrics)
>
> ip link set mybridge mtu 1234
> -->
> [ 136.546243] Pid: 8415, comm: ip Tainted: P
> 2.6.39.1-00006-g40545b7 #103 ASUSTeK Computer Inc. V1Sn
> /V1Sn
> [ 136.546256] EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
> [ 136.546268] EIP is at 0x0
> [ 136.546273] EAX: f14a389c EBX: 000005d4 ECX: f80d32c0 EDX: f80d1da1
> [ 136.546279] ESI: f14a3000 EDI: f255bf10 EBP: f15c3b54 ESP: f15c3b48
> [ 136.546285] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 136.546293] Process ip (pid: 8415, ti=f15c2000 task=f4741f80
> task.ti=f15c2000)
> [ 136.546297] Stack:
> [ 136.546301] f80c658f f14a3000 ffffffed f15c3b64 c12cb9c8 f80d1b80
> ffffffa1 f15c3bbc
> [ 136.546315] c12da347 c12d9c7d 00000000 f7670b00 00000000 f80d1b80
> ffffffa6 f15c3be4
> [ 136.546329] 00000004 f14a3000 f255bf20 00000008 f15c3bbc c11d6cae
> 00000000 00000000
> [ 136.546343] Call Trace:
> [ 136.546359] [<f80c658f>] ? br_change_mtu+0x5f/0x80 [bridge]
> [ 136.546372] [<c12cb9c8>] dev_set_mtu+0x38/0x80
> [ 136.546381] [<c12da347>] do_setlink+0x1a7/0x860
> [ 136.546390] [<c12d9c7d>] ? rtnl_fill_ifinfo+0x9bd/0xc70
> [ 136.546400] [<c11d6cae>] ? nla_parse+0x6e/0xb0
> [ 136.546409] [<c12db931>] rtnl_newlink+0x361/0x510
> [ 136.546420] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
> [ 136.546429] [<c1362762>] ? error_code+0x5a/0x60
> [ 136.546438] [<c12db5d0>] ? rtnl_configure_link+0x80/0x80
> [ 136.546446] [<c12db27a>] rtnetlink_rcv_msg+0xfa/0x210
> [ 136.546454] [<c12db180>] ? __rtnl_unlock+0x20/0x20
> [ 136.546463] [<c12ee0fe>] netlink_rcv_skb+0x8e/0xb0
> [ 136.546471] [<c12daf1c>] rtnetlink_rcv+0x1c/0x30
> [ 136.546479] [<c12edafa>] netlink_unicast+0x23a/0x280
> [ 136.546487] [<c12ede6b>] netlink_sendmsg+0x26b/0x2f0
> [ 136.546497] [<c12bb828>] sock_sendmsg+0xc8/0x100
> [ 136.546508] [<c10adf61>] ? __alloc_pages_nodemask+0xe1/0x750
> [ 136.546517] [<c11d0602>] ? _copy_from_user+0x42/0x60
> [ 136.546525] [<c12c5e4c>] ? verify_iovec+0x4c/0xc0
> [ 136.546534] [<c12bd805>] sys_sendmsg+0x1c5/0x200
> [ 136.546542] [<c10c2150>] ? __do_fault+0x310/0x410
> [ 136.546549] [<c10c2c46>] ? do_wp_page+0x1d6/0x6b0
> [ 136.546557] [<c10c47d1>] ? handle_pte_fault+0xe1/0x720
> [ 136.546565] [<c12bd1af>] ? sys_getsockname+0x7f/0x90
> [ 136.546574] [<c10c4ec1>] ? handle_mm_fault+0xb1/0x180
> [ 136.546582] [<c1023240>] ? vmalloc_sync_all+0x100/0x100
> [ 136.546589] [<c10233b3>] ? do_page_fault+0x173/0x3d0
> [ 136.546596] [<c12bd87b>] ? sys_recvmsg+0x3b/0x60
> [ 136.546605] [<c12bdd83>] sys_socketcall+0x293/0x2d0
> [ 136.546614] [<c13629d0>] sysenter_do_call+0x12/0x26
> [ 136.546619] Code: Bad EIP value.
> [ 136.546627] EIP: [<00000000>] 0x0 SS:ESP 0068:f15c3b48
> [ 136.546645] CR2: 0000000000000000
> [ 136.546652] ---[ end trace 6909b560e78934fa ]---
>
> Signed-off-by: Alexander Holler <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> CC: Neil Horman <[email protected]>
> CC: Herbert Xu <[email protected]>
> ---
> net/bridge/br_netfilter.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index 3fa1231..23b43d2 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -104,10 +104,16 @@ static void fake_update_pmtu(struct dst_entry *dst, u32 mtu)
> {
> }
>
> +static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old)
> +{
> + return NULL;
> +}
> +
> static struct dst_ops fake_dst_ops = {
> .family = AF_INET,
> .protocol = cpu_to_be16(ETH_P_IP),
> .update_pmtu = fake_update_pmtu,
> + .cow_metrics = fake_cow_metrics,
> };
>
> /*
>
>
Not to drag this out further, but since you illustrated the correct way to do
this with the blackhole_ops test, and this modification now gives us two
instances of that case, would it perhaps be better to just do this in
dst_metrics_write_ptr:

return dst->ops->cow_metrics ? return dst->ops->cow_metrics(dst, p) : NULL;

Then we could eliminate the two functions that do nothing be retun NULL (along
with their respective call instructions), and save any future users from having
to remember to include a dummy cow_metrics method if they happen to set the read
only flag on thier dst_ops?

Neil

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-06-06 16:11:54

by Eric Dumazet

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

Le lundi 06 juin 2011 à 11:32 -0400, Neil Horman a écrit :

> Not to drag this out further, but since you illustrated the correct way to do
> this with the blackhole_ops test, and this modification now gives us two
> instances of that case, would it perhaps be better to just do this in
> dst_metrics_write_ptr:
>
> return dst->ops->cow_metrics ? return dst->ops->cow_metrics(dst, p) : NULL;
>
> Then we could eliminate the two functions that do nothing be retun NULL (along
> with their respective call instructions), and save any future users from having
> to remember to include a dummy cow_metrics method if they happen to set the read
> only flag on thier dst_ops?

Well, I prefer how David coded the thing.
We can add selective traces where we want.

Having a default behavior might give much more work to find a bug in
this area. A NULL pointer access gives us an immediate indication.

Its a bit late to add an "if (dst->ops->cow_metrics)" test now that we
covered all call sites ;)

But we probably have more bugs elsewhere, because of many dst changes in
2.6.39


2011-06-06 17:08:45

by Neil Horman

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

On Mon, Jun 06, 2011 at 06:11:45PM +0200, Eric Dumazet wrote:
> Le lundi 06 juin 2011 ? 11:32 -0400, Neil Horman a ?crit :
>
> > Not to drag this out further, but since you illustrated the correct way to do
> > this with the blackhole_ops test, and this modification now gives us two
> > instances of that case, would it perhaps be better to just do this in
> > dst_metrics_write_ptr:
> >
> > return dst->ops->cow_metrics ? return dst->ops->cow_metrics(dst, p) : NULL;
> >
> > Then we could eliminate the two functions that do nothing be retun NULL (along
> > with their respective call instructions), and save any future users from having
> > to remember to include a dummy cow_metrics method if they happen to set the read
> > only flag on thier dst_ops?
>
> Well, I prefer how David coded the thing.
> We can add selective traces where we want.
>
> Having a default behavior might give much more work to find a bug in
> this area. A NULL pointer access gives us an immediate indication.
>
> Its a bit late to add an "if (dst->ops->cow_metrics)" test now that we
> covered all call sites ;)
>
> But we probably have more bugs elsewhere, because of many dst changes in
> 2.6.39
Ok, sounds reasonable to me.

Reviewed-by: Neil Horman <[email protected]>

>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-06-07 07:53:39

by David Miller

[permalink] [raw]
Subject: Re: bridge/netfilter: regression in 2.6.39.1

From: Eric Dumazet <[email protected]>
Date: Mon, 06 Jun 2011 16:26:03 +0200

> [PATCH] bridge: provide a cow_metrics method for fake_ops
>
> Like in commit 0972ddb237 (provide cow_metrics() methods to blackhole
> dst_ops), we must provide a cow_metrics for bridges fake_dst_ops as
> well.
>
> This fixes a regression coming from commits 62fa8a846d7d (net: Implement
> read-only protection and COW'ing of metrics.) and 33eb9873a28 (bridge:
> initialize fake_rtable metrics)
>
> ip link set mybridge mtu 1234
> -->
...
> Signed-off-by: Alexander Holler <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>


Applied, thanks everyone.