2017-02-28 23:58:55

by Alexis Green

[permalink] [raw]
Subject: [PATCH v2] mac80211: mesh - always do every discovery retry

Changes since v1: Make this behavior optional and configurable via
netlink.

Instead of stopping path discovery when a path is found continue
attempting to find paths until we hit the dot11MeshHWMPmaxPREQretries
limit.

This is important because path messages are not reliable and it is
relatively common to have a short bad path to the destination along with a
longer but better path. With the original code rather often a path message
along the long path would be lost so we would stick with the bad path.
With this change we have a greater chance to get messages over the longer
path allowing us to select the long path if it's better.

The standard doesn't seem to address this issue. All it says (13.10.8.5)
is that discovery should be limited to dot11MeshHWMPmaxPREQretries.

Signed-off-by: Alexis Green <[email protected]>
---
include/net/cfg80211.h | 3 ++
include/uapi/linux/nl80211.h | 3 ++
net/mac80211/cfg.c | 2 ++
net/mac80211/debugfs_netdev.c | 3 ++
net/mac80211/mesh_hwmp.c | 65 +++++++++++++++++++++++++++++--------------
net/wireless/mesh.c | 1 +
net/wireless/nl80211.c | 8 +++++-
net/wireless/trace.h | 5 +++-
8 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 86c12f8..fb32abf 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -1382,6 +1382,8 @@ struct bss_parameters {
* @plink_timeout: If no tx activity is seen from a STA we've established
* peering with for longer than this time (in seconds), then remove it
* from the STA's list of peers. Default is 30 minutes.
+ * @always_max_discoveries: whether to always perform number of discovery
+ * attemts equal to dot11MeshHWMPmaxPREQretries
*/
struct mesh_config {
u16 dot11MeshRetryTimeout;
@@ -1412,6 +1414,7 @@ struct mesh_config {
enum nl80211_mesh_power_mode power_mode;
u16 dot11MeshAwakeWindowDuration;
u32 plink_timeout;
+ bool always_max_discoveries;
};

/**
diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 9a499b1..58f8f0c 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -3449,6 +3449,8 @@ enum nl80211_mesh_power_mode {
* established peering with for longer than this time (in seconds), then
* remove it from the STA's list of peers. You may set this to 0 to disable
* the removal of the STA. Default is 30 minutes.
+ * @NL80211_MESHCONF_HWMP_ALWAYS_MAX_DISCOVERIES: whether to always perform
+ * number of discovery attempts equal to MaxPREQretries (default is FALSE)
*
* @__NL80211_MESHCONF_ATTR_AFTER_LAST: internal use
*/
@@ -3482,6 +3484,7 @@ enum nl80211_meshconf_params {
NL80211_MESHCONF_POWER_MODE,
NL80211_MESHCONF_AWAKE_WINDOW,
NL80211_MESHCONF_PLINK_TIMEOUT,
+ NL80211_MESHCONF_HWMP_ALWAYS_MAX_DISCOVERIES,

/* keep last */
__NL80211_MESHCONF_ATTR_AFTER_LAST,
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 9c7490c..9581873 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1956,6 +1956,8 @@ static int ieee80211_update_mesh_config(struct wiphy *wiphy,
nconf->dot11MeshAwakeWindowDuration;
if (_chg_mesh_attr(NL80211_MESHCONF_PLINK_TIMEOUT, mask))
conf->plink_timeout = nconf->plink_timeout;
+ if (_chg_mesh_attr(NL80211_MESHCONF_HWMP_ALWAYS_MAX_DISCOVERIES, mask))
+ conf->always_max_discoveries = nconf->always_max_discoveries;
ieee80211_mbss_info_change_notify(sdata, BSS_CHANGED_BEACON);
return 0;
}
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 8f5fff8..ceec6e8 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -642,6 +642,8 @@ IEEE80211_IF_FILE(dot11MeshHWMPconfirmationInterval,
IEEE80211_IF_FILE(power_mode, u.mesh.mshcfg.power_mode, DEC);
IEEE80211_IF_FILE(dot11MeshAwakeWindowDuration,
u.mesh.mshcfg.dot11MeshAwakeWindowDuration, DEC);
+IEEE80211_IF_FILE(always_max_discoveries,
+ u.mesh.mshcfg.always_max_discoveries, DEC);
#endif

#define DEBUGFS_ADD_MODE(name, mode) \
@@ -763,6 +765,7 @@ static void add_mesh_config(struct ieee80211_sub_if_data *sdata)
MESHPARAMS_ADD(dot11MeshHWMPconfirmationInterval);
MESHPARAMS_ADD(power_mode);
MESHPARAMS_ADD(dot11MeshAwakeWindowDuration);
+ MESHPARAMS_ADD(always_max_discoveries);
#undef MESHPARAMS_ADD
}
#endif
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index d07ee3c..7c019f9 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -1198,34 +1198,57 @@ void mesh_path_timer(unsigned long data)
{
struct mesh_path *mpath = (void *) data;
struct ieee80211_sub_if_data *sdata = mpath->sdata;
+ struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
+ bool multiple_discoveries = ifmsh->mshcfg.always_max_discoveries;
int ret;

if (sdata->local->quiescing)
return;

spin_lock_bh(&mpath->state_lock);
- if (mpath->flags & MESH_PATH_RESOLVED ||
- (!(mpath->flags & MESH_PATH_RESOLVING))) {
- mpath->flags &= ~(MESH_PATH_RESOLVING | MESH_PATH_RESOLVED);
- spin_unlock_bh(&mpath->state_lock);
- } else if (mpath->discovery_retries < max_preq_retries(sdata)) {
- ++mpath->discovery_retries;
- mpath->discovery_timeout *= 2;
- mpath->flags &= ~MESH_PATH_REQ_QUEUED;
- spin_unlock_bh(&mpath->state_lock);
- mesh_queue_preq(mpath, 0);
+ if (!multiple_discoveries) {
+ if (mpath->flags & MESH_PATH_RESOLVED ||
+ (!(mpath->flags & MESH_PATH_RESOLVING))) {
+ mpath->flags &= ~(MESH_PATH_RESOLVING |
+ MESH_PATH_RESOLVED);
+ spin_unlock_bh(&mpath->state_lock);
+ return;
+ } else if (mpath->discovery_retries < max_preq_retries(sdata)) {
+ ++mpath->discovery_retries;
+ mpath->discovery_timeout *= 2;
+ mpath->flags &= ~MESH_PATH_REQ_QUEUED;
+ spin_unlock_bh(&mpath->state_lock);
+ mesh_queue_preq(mpath, 0);
+ return;
+ }
} else {
- mpath->flags &= ~(MESH_PATH_RESOLVING |
- MESH_PATH_RESOLVED |
- MESH_PATH_REQ_QUEUED);
- mpath->exp_time = jiffies;
- spin_unlock_bh(&mpath->state_lock);
- if (!mpath->is_gate && mesh_gate_num(sdata) > 0) {
- ret = mesh_path_send_to_gates(mpath);
- if (ret)
- mhwmp_dbg(sdata, "no gate was reachable\n");
- } else
- mesh_path_flush_pending(mpath);
+ if (mpath->discovery_retries < max_preq_retries(sdata)) {
+ ++mpath->discovery_retries;
+ mpath->discovery_timeout *= 2;
+ mpath->flags &= ~MESH_PATH_REQ_QUEUED;
+ spin_unlock_bh(&mpath->state_lock);
+ mesh_queue_preq(mpath, 0);
+ return;
+ } else if (mpath->flags & MESH_PATH_RESOLVED ||
+ (!(mpath->flags & MESH_PATH_RESOLVING))) {
+ mpath->flags &= ~(MESH_PATH_RESOLVING |
+ MESH_PATH_RESOLVED);
+ spin_unlock_bh(&mpath->state_lock);
+ return;
+ }
+ }
+
+ mpath->flags &= ~(MESH_PATH_RESOLVING |
+ MESH_PATH_RESOLVED |
+ MESH_PATH_REQ_QUEUED);
+ mpath->exp_time = jiffies;
+ spin_unlock_bh(&mpath->state_lock);
+ if (!mpath->is_gate && mesh_gate_num(sdata) > 0) {
+ ret = mesh_path_send_to_gates(mpath);
+ if (ret)
+ mhwmp_dbg(sdata, "no gate was reachable\n");
+ } else {
+ mesh_path_flush_pending(mpath);
}
}

diff --git a/net/wireless/mesh.c b/net/wireless/mesh.c
index 2d8518a..f82fcd3 100644
--- a/net/wireless/mesh.c
+++ b/net/wireless/mesh.c
@@ -77,6 +77,7 @@ const struct mesh_config default_mesh_config = {
.power_mode = NL80211_MESH_POWER_ACTIVE,
.dot11MeshAwakeWindowDuration = MESH_DEFAULT_AWAKE_WINDOW,
.plink_timeout = MESH_DEFAULT_PLINK_TIMEOUT,
+ .always_max_discoveries = false,
};

const struct mesh_setup default_mesh_setup = {
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d516527..6e1181d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -5729,7 +5729,9 @@ static int nl80211_get_mesh_config(struct sk_buff *skb,
nla_put_u16(msg, NL80211_MESHCONF_AWAKE_WINDOW,
cur_params.dot11MeshAwakeWindowDuration) ||
nla_put_u32(msg, NL80211_MESHCONF_PLINK_TIMEOUT,
- cur_params.plink_timeout))
+ cur_params.plink_timeout) ||
+ nla_put_u8(msg, NL80211_MESHCONF_HWMP_ALWAYS_MAX_DISCOVERIES,
+ cur_params.always_max_discoveries))
goto nla_put_failure;
nla_nest_end(msg, pinfoattr);
genlmsg_end(msg, hdr);
@@ -5771,6 +5773,7 @@ static const struct nla_policy nl80211_meshconf_params_policy[NL80211_MESHCONF_A
[NL80211_MESHCONF_POWER_MODE] = { .type = NLA_U32 },
[NL80211_MESHCONF_AWAKE_WINDOW] = { .type = NLA_U16 },
[NL80211_MESHCONF_PLINK_TIMEOUT] = { .type = NLA_U32 },
+ [NL80211_MESHCONF_HWMP_ALWAYS_MAX_DISCOVERIES] = { .type = NLA_U8 },
};

static const struct nla_policy
@@ -5995,6 +5998,9 @@ do { \
FILL_IN_MESH_PARAM_IF_SET(tb, cfg, plink_timeout, 0, 0xffffffff,
mask, NL80211_MESHCONF_PLINK_TIMEOUT,
nl80211_check_u32);
+ FILL_IN_MESH_PARAM_IF_SET(tb, cfg, always_max_discoveries, 0, 1, mask,
+ NL80211_MESHCONF_HWMP_ALWAYS_MAX_DISCOVERIES,
+ nl80211_check_bool);
if (mask_out)
*mask_out = mask;

diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index fd55786..cc4a966 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -67,7 +67,8 @@
__field(u16, ht_opmode) \
__field(u32, dot11MeshHWMPactivePathToRootTimeout) \
__field(u16, dot11MeshHWMProotInterval) \
- __field(u16, dot11MeshHWMPconfirmationInterval)
+ __field(u16, dot11MeshHWMPconfirmationInterval) \
+ __field(bool, always_max_discoveries)
#define MESH_CFG_ASSIGN \
do { \
__entry->dot11MeshRetryTimeout = conf->dot11MeshRetryTimeout; \
@@ -108,6 +109,8 @@
conf->dot11MeshHWMProotInterval; \
__entry->dot11MeshHWMPconfirmationInterval = \
conf->dot11MeshHWMPconfirmationInterval; \
+ __entry->always_max_discoveries = \
+ conf->always_max_discoveries; \
} while (0)

#define CHAN_ENTRY __field(enum nl80211_band, band) \


2017-03-29 10:34:01

by Chun-Yeow Yeoh

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: mesh - always do every discovery retry

I would suggest the following modification of commit messages and
code. Let me know whether this is fine.

---------
I would suggest the following edition to the commit message:

Instead of stopping path discovery when a path is established, continue the
attempts to find alternative paths until we hit the dot11MeshHWMPmaxPREQretries
limit. However, this is not a standard behavior and may easily
increase the number of
broadcast PREQ frame in your network. So this feature is turned off by default.

and the remaining are removed due to misleading explanation.

Then, in the mesh_path_timer, I think that only the following is needed:

- if (mpath->flags & MESH_PATH_RESOLVED ||
- (!(mpath->flags & MESH_PATH_RESOLVING))) {
+ if (!multiple_discoveries &&
+ (mpath->flags & MESH_PATH_RESOLVED ||
+ (!(mpath->flags & MESH_PATH_RESOLVING)))) {

---
Chun-Yeow

On Wed, Mar 29, 2017 at 4:24 PM, Johannes Berg
<[email protected]> wrote:
> What's the outcome of this? I'm tempted to apply the patch since it's
> optional anyway ...
>
> johannes

2017-03-03 19:47:48

by Jesse Jones

[permalink] [raw]
Subject: RE: [PATCH v2] mac80211: mesh - always do every discovery retry

> > Yes, that is a real issue. We are planning on doing some further work
> > in this area to try to minimize the explosions that can be seen with
> > PREQs in larger networks while balancing the need for reliability.
> >
> > Path discovery
> >> should stop once the path is established. By attempting 2nd, 3rd or
> >> 4th doesn't guarantee the next path will be "good".
> >
> > It doesn't guarantee anything of course but it does raise the
> > probability that the right path will be found. For example take four
> > nodes in a ring where the A-B, B-C, C-D links are all good but the A-D
> > link is poor. Poor enough that the higher data rates are hosed for
> > that link but the basic rate used by management frames is relatively
> > unaffected. If we assume that the reliability of management frames is
> > 90% then in order for A to route to D it needs to get a PREQ to D and
> > a PREP back. It has two options 1) for A-D the reliability will be
> > 0.9^2 = 81% 2) for A-B-C-D the reliability will be 0.9^6 = 53%.
>
> What is round trip delay between A-D compared to A-B-C-D? 1 hop away the
> throughput is reduced by half.

Well A-D is going to have a much smaller RTT than A-B-C-D. And, yes, using
multiple hops is going to reduce throughput but I'd much rather use multiple
120 Mbps links than a link that only supports 12 Mbps.

>Also if you have more nodes using longer
> path, you consume more airtime leads to really bad network performance,
> especially when B, C, or even D wants to start initial data transmission.

That's why there are link and path metrics.

> One way of improving the path is either enhancing the airtime link metric
> by
> considering the factor that you mentioned in or else introducing vendor-
> specific path metric. Sending more broadcast frames over the network won't
> be a good solution.

Improving the link metric isn't going to help with PREQ reliability
problems. Sending more PREQs will.

-- Jesse

2017-03-03 13:10:55

by Chun-Yeow Yeoh

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: mesh - always do every discovery retry

>
> Yes, that is a real issue. We are planning on doing some further work in
> this area to try to minimize the explosions that can be seen with PREQs in
> larger networks while balancing the need for reliability.
>
> Path discovery
>> should stop once the path is established. By attempting 2nd, 3rd or 4th
>> doesn't guarantee the next path will be "good".
>
> It doesn't guarantee anything of course but it does raise the probability
> that the right path will be found. For example take four nodes in a ring
> where the A-B, B-C, C-D links are all good but the A-D link is poor. Poor
> enough that the higher data rates are hosed for that link but the basic rate
> used by management frames is relatively unaffected. If we assume that the
> reliability of management frames is 90% then in order for A to route to D it
> needs to get a PREQ to D and a PREP back. It has two options 1) for A-D the
> reliability will be 0.9^2 = 81% 2) for A-B-C-D the reliability will be 0.9^6
> = 53%.

What is round trip delay between A-D compared to A-B-C-D? 1 hop away
the throughput is reduced by half. Also if you have more nodes using
longer path, you consume more airtime leads to really bad network
performance, especially when B, C, or even D wants to start initial
data transmission.

> This isn't a good situation because it makes it much too easy for routing to
> pick a *really* bad path. And we have seen reliability improvements with
> this patch.
>
> We have already made changes to dial way back on the number of PREQs sent
> out so this patch made quite a bit of sense for us. In the default
> configuration where PREQs go out every 4s I don't think we have a good
> solution: picking a bad path, even for 4s, can be a horrible user experience

Reduce dot11MeshHWMPpreqMinInterval to send out more PREQ in your network.

> but PREQ volumes quickly start consuming too much airtime as network sizes
> increase.

One way of improving the path is either enhancing the airtime link
metric by considering the factor that you mentioned in or else
introducing vendor-specific path metric. Sending more broadcast frames
over the network won't be a good solution.

---
Chun-Yeow

2017-03-04 12:05:59

by Chun-Yeow Yeoh

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: mesh - always do every discovery retry

On Sat, Mar 4, 2017 at 12:01 PM, Chun-Yeow Yeoh <[email protected]> wrote:
>
>>
>> Well A-D is going to have a much smaller RTT than A-B-C-D. And, yes, using
>> multiple hops is going to reduce throughput but I'd much rather use
>> multiple
>> 120 Mbps links than a link that only supports 12 Mbps.
>>
>
> Airtime link metrics should choose the multiple links if it is good compared
> to direct link.
>
>> That's why there are link and path metrics.
>
> I don't think that you don't get this. Every nodes sends out multiple PREQs.
> Other nodes received it with rebroadcast again. Take note on this.
>
>> Improving the link metric isn't going to help with PREQ reliability
>> problems.
>
> PREQ realiabilit? It is broadcast management frame and usually using lowest
> transmission rate. So it is realiabe.
>
>> Sending more PREQs will.
>
> So reduce dot11MeshHWMPpreqMinInterval to send out more PREQ works for you?

Alright, I think that we have similar lengthy discussion last time.
http://comments.gmane.org/gmane.linux.kernel.wireless.general/139453

I would suggest the following edition to the commit message:

Instead of stopping path discovery when a path is established, continue the
attempts to find alternative paths until we hit the dot11MeshHWMPmaxPREQretries
limit. However, this is not a standard behavior and may easily
increase the number of
broadcast PREQ frame in your network. So this feature is turned off by default.

and the remaining are removed due to misleading explanation.

Then, in the mesh_path_timer, I think that only the following is needed:

- if (mpath->flags & MESH_PATH_RESOLVED ||
- (!(mpath->flags & MESH_PATH_RESOLVING))) {
+ if (!multiple_discoveries &&
+ (mpath->flags & MESH_PATH_RESOLVED ||
+ (!(mpath->flags & MESH_PATH_RESOLVING)))) {

Thanks

---
Chun-Yeow

2017-03-07 03:08:21

by Chun-Yeow Yeoh

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: mesh - always do every discovery retry

> They are only reliable when compared to individual rate controlled frames.
> But, in general, they are most certainly not reliable. Even in a shielded
> and conductive (i.e. with physical wires connecting antennas) environment
> lost PREQ frames are not hard to see.

Did you try to reduce dot11MeshHWMPpreqMinInterval to send out more
PREQ and see whether this works for you?

I think that we have similar lengthy discussion last time.
http://comments.gmane.org/gmane.linux.kernel.wireless.general/139453

I would suggest the following edition to the commit message:

Instead of stopping path discovery when a path is established, continue the
attempts to find alternative paths until we hit the dot11MeshHWMPmaxPREQretries
limit. However, this is not a standard behavior and may easily
increase the number of
broadcast PREQ frame in your network. So this feature is turned off by default.

and the remaining are removed due to misleading explanation.

Then, in the mesh_path_timer, I think that only the following is needed:

- if (mpath->flags & MESH_PATH_RESOLVED ||
- (!(mpath->flags & MESH_PATH_RESOLVING))) {
+ if (!multiple_discoveries &&
+ (mpath->flags & MESH_PATH_RESOLVED ||
+ (!(mpath->flags & MESH_PATH_RESOLVING)))) {

Thanks

---
Chun-Yeow

2017-03-06 18:25:54

by Jesse Jones

[permalink] [raw]
Subject: RE: [PATCH v2] mac80211: mesh - always do every discovery retry

> >> Well A-D is going to have a much smaller RTT than A-B-C-D. And, yes,
> >> using multiple hops is going to reduce throughput but I'd much rather
> >> use multiple
> >> 120 Mbps links than a link that only supports 12 Mbps.
> >>
> >
> > Airtime link metrics should choose the multiple links if it is good
> > compared to direct link.
> >
> >> That's why there are link and path metrics.
> >
> > I don't think that you don't get this. Every nodes sends out multiple
> > PREQs.
> > Other nodes received it with rebroadcast again. Take note on this.
> >
> >> Improving the link metric isn't going to help with PREQ reliability
> >> problems.
> >
> > PREQ realiabilit? It is broadcast management frame and usually using
> > lowest transmission rate. So it is realiabe.

They are only reliable when compared to individual rate controlled frames.
But, in general, they are most certainly not reliable. Even in a shielded
and conductive (i.e. with physical wires connecting antennas) environment
lost PREQ frames are not hard to see.

-- Jesse

2017-03-29 08:24:10

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: mesh - always do every discovery retry

What's the outcome of this? I'm tempted to apply the patch since it's
optional anyway ...

johannes

2017-05-17 13:59:29

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: mesh - always do every discovery retry

On Wed, 2017-03-29 at 18:33 +0800, Chun-Yeow Yeoh wrote:
> I would suggest the following modification of commit messages and
> code. Let me know whether this is fine.
>
> ---------
> I would suggest the following edition to the commit message:
>
> Instead of stopping path discovery when a path is established,
> continue the
> attempts to find alternative paths until we hit the
> dot11MeshHWMPmaxPREQretries
> limit. However, this is not a standard behavior and may easily
> increase the number of
> broadcast PREQ frame in your network. So this feature is turned off
> by default.
>
> and the remaining are removed due to misleading explanation.
>
> Then, in the mesh_path_timer, I think that only the following is
> needed:
>
> - if (mpath->flags & MESH_PATH_RESOLVED ||
> - (!(mpath->flags & MESH_PATH_RESOLVING))) {
> + if (!multiple_discoveries &&
> +    (mpath->flags & MESH_PATH_RESOLVED ||
> +     (!(mpath->flags & MESH_PATH_RESOLVING)))) {

Alexis?

johannes