2017-09-01 06:30:41

by Greg Maitz

Subject: Incorrect mesh path seq num

Hi guys,

I'm seeing a problem when I work on the wireless mesh between two
Linux devices. The root node has 3.18 kernel while the next hop
station runs 2.6.37 kernel. I found the mpath->sn value is incorrect
most of the time on the device having 2.6.37 kernel. After examining
the code, in function hwmp_route_info_get [mesh_hwmp.c], after
mesh_path_lookup, the sequence number (i.e., mpath->sn) is incorrect.
For instance, I see mpath->sn having value 0x30950000. It should be
0x9530, while the orig_sn is having value 0x9531. This results in the
last hop metric becoming zero in function mesh_rx_path_sel_frame and
hwmp_preq_frame_process doesn't get called. Is this a known problem?
Can anyone provide suggestions to debug further?


2017-09-08 19:58:26

by Thomas Pedersen

Subject: Re: Incorrect mesh path seq num

On Mon, Sep 4, 2017 at 6:19 AM, Johannes Berg <[email protected]> wrote:
> On Fri, 2017-09-01 at 13:07 -0700, Thomas Pedersen wrote:
>> On Thu, Aug 31, 2017 at 11:30 PM, Greg Maitz <[email protected]>
>> wrote:
>> > Hi guys,
>> >
>> > I'm seeing a problem when I work on the wireless mesh between two
>> > Linux devices. The root node has 3.18 kernel while the next hop
>> > station runs 2.6.37 kernel. I found the mpath->sn value is
>> > incorrect
>> > most of the time on the device having 2.6.37 kernel. After
>> > examining
>> > the code, in function hwmp_route_info_get [mesh_hwmp.c], after
>> > mesh_path_lookup, the sequence number (i.e., mpath->sn) is
>> > incorrect.
>> > For instance, I see mpath->sn having value 0x30950000. It should be
>> > 0x9530, while the orig_sn is having value 0x9531.
>>
>> Looks like an endianness bug. Are you testing on two platforms of
>> different endianness?
>
> Even if that's the case, wouldn't it mean some kind of conversion is
> missing somewhere?

Yes. I looked for a missing conversion, but couldn't find it.

Greg, where / how are you printing mpath->sn? mpath dump or a printk you added?

--
thomas

2017-09-04 13:19:36

by Johannes Berg

Subject: Re: Incorrect mesh path seq num

On Fri, 2017-09-01 at 13:07 -0700, Thomas Pedersen wrote:
> On Thu, Aug 31, 2017 at 11:30 PM, Greg Maitz <[email protected]>
> wrote:
> > Hi guys,
> >
> > I'm seeing a problem when I work on the wireless mesh between two
> > Linux devices. The root node has 3.18 kernel while the next hop
> > station runs 2.6.37 kernel. I found the mpath->sn value is
> > incorrect
> > most of the time on the device having 2.6.37 kernel. After
> > examining
> > the code, in function hwmp_route_info_get [mesh_hwmp.c], after
> > mesh_path_lookup, the sequence number (i.e., mpath->sn) is
> > incorrect.
> > For instance, I see mpath->sn having value 0x30950000. It should be
> > 0x9530, while the orig_sn is having value 0x9531.
>
> Looks like an endianness bug. Are you testing on two platforms of
> different endianness?

Even if that's the case, wouldn't it mean some kind of conversion is
missing somewhere?

johannes

2017-09-01 20:07:42

by Thomas Pedersen

Subject: Re: Incorrect mesh path seq num

On Thu, Aug 31, 2017 at 11:30 PM, Greg Maitz <[email protected]> wrote:
> Hi guys,
>
> I'm seeing a problem when I work on the wireless mesh between two
> Linux devices. The root node has 3.18 kernel while the next hop
> station runs 2.6.37 kernel. I found the mpath->sn value is incorrect
> most of the time on the device having 2.6.37 kernel. After examining
> the code, in function hwmp_route_info_get [mesh_hwmp.c], after
> mesh_path_lookup, the sequence number (i.e., mpath->sn) is incorrect.
> For instance, I see mpath->sn having value 0x30950000. It should be
> 0x9530, while the orig_sn is having value 0x9531.

Looks like an endianness bug. Are you testing on two platforms of
different endianness?
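
The reported value is at least consistent with a byte swap: 0x9530
stored as a little-endian 32-bit field is the byte sequence
30 95 00 00, and reading those same bytes as big-endian gives
0x30950000. A minimal sketch in plain C, assuming the field is
little-endian on the air; get_le32 and get_be32 are hypothetical
helpers written for this illustration, not kernel functions (in the
kernel the conversion would normally be done with le32_to_cpu()):

```c
#include <stdint.h>

/* Hypothetical helper: decode four wire bytes as a little-endian
 * 32-bit value (the byte order assumed for the on-air field). */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: decode the same bytes as big-endian, which is
 * what a big-endian CPU would see if it loaded the field directly
 * without any conversion. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Sequence number 0x9530 laid out as a little-endian 32-bit field. */
static const uint8_t sn_wire[4] = { 0x30, 0x95, 0x00, 0x00 };
```

With these bytes, get_le32(sn_wire) yields 0x9530 and
get_be32(sn_wire) yields 0x30950000, matching the two values in the
report. This only shows the symptom is compatible with a missing
conversion; it does not say where such a conversion would be missing.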

> This results in the
> last hop metric becoming zero in function mesh_rx_path_sel_frame and
> hwmp_preq_frame_process doesn't get called. Is this a known problem?
> Can anyone provide suggestions to debug further?



--
thomas

2018-05-07 04:40:46

by Greg Maitz

Subject: Re: Incorrect mesh path seq num

Yes, I confirmed it to be due to mismatched definitions of struct
ieee80211_rann_ie between the two kernel versions. Issue resolved.
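
For anyone hitting something similar, here is a rough sketch of how a
structure mismatch corrupts parsed fields. The two layouts below are
hypothetical stand-ins (rann_old, rann_new and the "interval" field
are made up for illustration), not the actual 2.6.37 and 3.18
definitions of ieee80211_rann_ie. The point is only that one extra
field in the sender's layout shifts every later field for a receiver
that still uses the older layout. The packed attribute assumes GCC or
Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" element layout (illustration only). */
struct rann_old {
	uint8_t  flags;
	uint8_t  hopcount;
	uint8_t  ttl;
	uint8_t  addr[6];
	uint32_t seq;
	uint32_t metric;
} __attribute__((packed));

/* Hypothetical "new" layout: one extra field before metric. */
struct rann_new {
	uint8_t  flags;
	uint8_t  hopcount;
	uint8_t  ttl;
	uint8_t  addr[6];
	uint32_t seq;
	uint32_t interval;	/* assumed extra field */
	uint32_t metric;
} __attribute__((packed));

/* Interpret an element built with the new layout through the old
 * layout, as a peer compiled against the older header would. */
static uint32_t metric_seen_by_old(const struct rann_new *sent)
{
	struct rann_old seen;

	/* The old peer only knows about the first sizeof(seen) bytes. */
	memcpy(&seen, sent, sizeof(seen));
	return seen.metric;
}
```

If the sender fills in interval = 5000 and metric = 0x1234, the old
layout reports a metric of 5000, because offsetof(struct rann_old,
metric) equals offsetof(struct rann_new, interval). Comparing the
struct definitions and offsets in both trees is a quick way to spot
this kind of problem.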
