From: Johannes Berg <[email protected]>
My prior race fix here broke CQM when ranges aren't used, as
the reporting worker now requires the cqm_config to be set in
the wdev, but isn't set when there's no range configured.
Rather than continuing to special-case the range version, set
the cqm_config always and configure accordingly, also tracking
if range was used or not to be able to clear the configuration
appropriately with the same API, which was actually not right
if both were implemented by a driver for some reason, as is
the case with mac80211 (though there the implementations are
equivalent so it doesn't matter.)
Also, the original multiple-RSSI commit lost checking for the
callback, so might have potentially crashed if a driver had
neither implementation, and userspace tried to use it despite
not being advertised as supported.
Cc: [email protected]
Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM")
Fixes: 37c20b2effe9 ("wifi: cfg80211: fix cqm_config access race")
Signed-off-by: Johannes Berg <[email protected]>
---
net/wireless/core.h | 1 +
net/wireless/nl80211.c | 50 ++++++++++++++++++++++++++----------------
2 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/net/wireless/core.h b/net/wireless/core.h
index 4c692c7faf30..cb61d33d4f1e 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -293,6 +293,7 @@ struct cfg80211_cqm_config {
u32 rssi_hyst;
s32 last_rssi_event_value;
enum nl80211_cqm_rssi_threshold_event last_rssi_event_type;
+ bool use_range_api;
int n_rssi_thresholds;
s32 rssi_thresholds[] __counted_by(n_rssi_thresholds);
};
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 569234bc2be6..dbfed5a2d7b6 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -12787,10 +12787,6 @@ static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
int i, n, low_index;
int err;
- /* RSSI reporting disabled? */
- if (!cqm_config)
- return rdev_set_cqm_rssi_range_config(rdev, dev, 0, 0);
-
/*
* Obtain current RSSI value if possible, if not and no RSSI threshold
* event has been received yet, we should receive an event after a
@@ -12865,23 +12861,25 @@ static int nl80211_set_cqm_rssi(struct genl_info *info,
wdev->iftype != NL80211_IFTYPE_P2P_CLIENT)
return -EOPNOTSUPP;
- if (n_thresholds <= 1 && rdev->ops->set_cqm_rssi_config) {
- if (n_thresholds == 0 || thresholds[0] == 0) /* Disabling */
- return rdev_set_cqm_rssi_config(rdev, dev, 0, 0);
-
- return rdev_set_cqm_rssi_config(rdev, dev,
- thresholds[0], hysteresis);
- }
-
- if (!wiphy_ext_feature_isset(&rdev->wiphy,
- NL80211_EXT_FEATURE_CQM_RSSI_LIST))
- return -EOPNOTSUPP;
-
if (n_thresholds == 1 && thresholds[0] == 0) /* Disabling */
n_thresholds = 0;
old = wiphy_dereference(wdev->wiphy, wdev->cqm_config);
+ /* if already disabled just succeed */
+ if (!n_thresholds && !old)
+ return 0;
+
+ if (n_thresholds > 1) {
+ if (!wiphy_ext_feature_isset(&rdev->wiphy,
+ NL80211_EXT_FEATURE_CQM_RSSI_LIST) ||
+ !rdev->ops->set_cqm_rssi_range_config)
+ return -EOPNOTSUPP;
+ } else {
+ if (!rdev->ops->set_cqm_rssi_config)
+ return -EOPNOTSUPP;
+ }
+
if (n_thresholds) {
cqm_config = kzalloc(struct_size(cqm_config, rssi_thresholds,
n_thresholds),
@@ -12894,13 +12892,26 @@ static int nl80211_set_cqm_rssi(struct genl_info *info,
memcpy(cqm_config->rssi_thresholds, thresholds,
flex_array_size(cqm_config, rssi_thresholds,
n_thresholds));
+ cqm_config->use_range_api = n_thresholds > 1 ||
+ !rdev->ops->set_cqm_rssi_config;
rcu_assign_pointer(wdev->cqm_config, cqm_config);
+
+ if (cqm_config->use_range_api)
+ err = cfg80211_cqm_rssi_update(rdev, dev, cqm_config);
+ else
+ err = rdev_set_cqm_rssi_config(rdev, dev,
+ thresholds[0],
+ hysteresis);
} else {
RCU_INIT_POINTER(wdev->cqm_config, NULL);
+ /* if enabled as range also disable via range */
+ if (old->use_range_api)
+ err = rdev_set_cqm_rssi_range_config(rdev, dev, 0, 0);
+ else
+ err = rdev_set_cqm_rssi_config(rdev, dev, 0, 0);
}
- err = cfg80211_cqm_rssi_update(rdev, dev, cqm_config);
if (err) {
rcu_assign_pointer(wdev->cqm_config, old);
kfree_rcu(cqm_config, rcu_head);
@@ -19009,10 +19020,11 @@ void cfg80211_cqm_rssi_notify_work(struct wiphy *wiphy, struct wiphy_work *work)
s32 rssi_level;
cqm_config = wiphy_dereference(wdev->wiphy, wdev->cqm_config);
- if (!wdev->cqm_config)
+ if (!cqm_config)
return;
- cfg80211_cqm_rssi_update(rdev, wdev->netdev, cqm_config);
+ if (cqm_config->use_range_api)
+ cfg80211_cqm_rssi_update(rdev, wdev->netdev, cqm_config);
rssi_level = cqm_config->last_rssi_event_value;
rssi_event = cqm_config->last_rssi_event_type;
--
2.41.0
Hi,
> net/wireless/nl80211.c: In function 'nl80211_set_cqm_rssi.isra':
> net/wireless/nl80211.c:12892:17: warning: 'memcpy' specified bound
> 18446744073709551615 exceeds maximum object size 9223372036854775807
> [-Wstringop-overflow=]
FWIW, I'm getting the same error with the current next (next-20231128).
-michael
On Tue, 2023-11-28 at 15:44 +0100, Michael Walle wrote:
> Hi,
>
> > net/wireless/nl80211.c: In function 'nl80211_set_cqm_rssi.isra':
> > net/wireless/nl80211.c:12892:17: warning: 'memcpy' specified bound
> > 18446744073709551615 exceeds maximum object size 9223372036854775807
> > [-Wstringop-overflow=]
>
> FWIW, I'm getting the same error with the current next (next-20231128).
>
I actually forgot about that, but does anyone actually know what this is
trying to tell me?
The code seems to be
if (n_thresholds) {
cqm_config = kzalloc(struct_size(cqm_config, rssi_thresholds,
n_thresholds),
GFP_KERNEL);
if (!cqm_config)
return -ENOMEM;
cqm_config->rssi_hyst = hysteresis;
cqm_config->n_rssi_thresholds = n_thresholds;
memcpy(cqm_config->rssi_thresholds, thresholds,
flex_array_size(cqm_config, rssi_thresholds,
n_thresholds));
Or does it just want to say n_thresholds shouldn't be a signed variable?
johannes
Dear Johannes,
You are probably aware of this already, but I believe this patch is quite crucial as your earlier patch may have effectively broken some roaming scenarios using wpa_supplicant.
The supplicant is using CQM to monitor for RSSI threshold changes (non-ranged) to switch between short (few seconds) and very long (1 *hour* in the default NetworkManager setting) background scan intervals. With a current lts kernel (6.1.63) on Arch I do not see RSSI change events at all and wpa_supplicant will almost never (1 hour timeout) scan for a better AP as long as the current AP is still in range.
I applied your patch against 6.1.63 with some additional `goto unlock`s and it resolves the issue for me. I've been running the kernel for 2 days now.
cheers,
Michael
On 11/28/2023 10:23 AM, Johannes Berg wrote:
> On Tue, 2023-11-28 at 15:44 +0100, Michael Walle wrote:
>> Hi,
>>
>>> net/wireless/nl80211.c: In function 'nl80211_set_cqm_rssi.isra':
>>> net/wireless/nl80211.c:12892:17: warning: 'memcpy' specified bound
>>> 18446744073709551615 exceeds maximum object size 9223372036854775807
>>> [-Wstringop-overflow=]
>>
>> FWIW, I'm getting the same error with the current next (next-20231128).
>>
>
> I actually forgot about that, but does anyone actually know what this is
> trying to tell me?
>
> The code seems to be
>
> if (n_thresholds) {
> cqm_config = kzalloc(struct_size(cqm_config, rssi_thresholds,
> n_thresholds),
> GFP_KERNEL);
> if (!cqm_config)
> return -ENOMEM;
>
> cqm_config->rssi_hyst = hysteresis;
> cqm_config->n_rssi_thresholds = n_thresholds;
> memcpy(cqm_config->rssi_thresholds, thresholds,
> flex_array_size(cqm_config, rssi_thresholds,
> n_thresholds));
>
>
> Or does it just want to say n_thresholds shouldn't be a signed variable?
+Kees for flex array education :)
On Tue, Nov 28, 2023 at 01:01:20PM -0800, Jeff Johnson wrote:
> On 11/28/2023 10:23 AM, Johannes Berg wrote:
> > On Tue, 2023-11-28 at 15:44 +0100, Michael Walle wrote:
> >> Hi,
> >>
> >>> net/wireless/nl80211.c: In function 'nl80211_set_cqm_rssi.isra':
> >>> net/wireless/nl80211.c:12892:17: warning: 'memcpy' specified bound
> >>> 18446744073709551615 exceeds maximum object size 9223372036854775807
> >>> [-Wstringop-overflow=]
> >>
> >> FWIW, I'm getting the same error with the current next (next-20231128).
> >>
> >
> > I actually forgot about that, but does anyone actually know what this is
> > trying to tell me?
> >
> > The code seems to be
> >
> > if (n_thresholds) {
> > cqm_config = kzalloc(struct_size(cqm_config, rssi_thresholds,
> > n_thresholds),
> > GFP_KERNEL);
> > if (!cqm_config)
> > return -ENOMEM;
> >
> > cqm_config->rssi_hyst = hysteresis;
> > cqm_config->n_rssi_thresholds = n_thresholds;
> > memcpy(cqm_config->rssi_thresholds, thresholds,
> > flex_array_size(cqm_config, rssi_thresholds,
> > n_thresholds));
> >
> >
> > Or does it just want to say n_thresholds shouldn't be a signed variable?
>
> +Kees for flex array education :)
Yeah, I would expect this to mean that there is a code path that
GCC found where the value could overflow. It does this when a variable
"value range" gets bounded (e.g. an int isn't the full -INT_MAX to INT_MAX
range).And flex_array_size() was designed to saturate at SIZE_MIX rather
than wrapping around to an unexpected small value, so these are playing
together it seems.
However, I would have expected the kzalloc() to blow up _first_.
Regardless, I suspect the addition of "if (n_thresholds > 1)" is what is
tripping GCC.
int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
...
return nl80211_set_cqm_rssi(info, thresholds, len / 4,
hysteresis);
Now it "knows" there is a path where n_threasholds could be [2,
INT_MAX].
Does this warning go away if "len" is made unsigned?
Does adding an upper bounds sanity check help as a work-around, like:
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d0f499227c29..2cb78ac44b6c 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -12855,6 +12855,9 @@ static int nl80211_set_cqm_rssi(struct genl_info *info,
s32 prev = S32_MIN;
int i, err;
+ if (n_thresholds > INT_MAX / sizeof(*thresholds))
+ return -EINVAL;
+
/* Check all values negative and sorted */
for (i = 0; i < n_thresholds; i++) {
if (thresholds[i] > 0 || thresholds[i] <= prev)
--
Kees Cook
On Thu, 2023-11-30 at 10:32 -0800, Kees Cook wrote:
> Yeah, I would expect this to mean that there is a code path that
> GCC found where the value could overflow. It does this when a variable
> "value range" gets bounded (e.g. an int isn't the full -INT_MAX to INT_MAX
> range).And flex_array_size() was designed to saturate at SIZE_MIX rather
> than wrapping around to an unexpected small value, so these are playing
> together it seems.
>
> However, I would have expected the kzalloc() to blow up _first_.
Hmm.
> Regardless, I suspect the addition of "if (n_thresholds > 1)" is what is
> tripping GCC.
>
> int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> ...
> return nl80211_set_cqm_rssi(info, thresholds, len / 4,
> hysteresis);
>
> Now it "knows" there is a path where n_threasholds could be [2,
> INT_MAX].
Yeah, it's not _really_ bounded, apart from the message length? But then
struct_size() should saturate and fail? But I guess it cannot know that,
and limits the object size to 1<<63 - 1 whereas the copy is 1<<64 - 1...
> Does this warning go away if "len" is made unsigned?
Thing is, neither Kalle nor I can even reproduce the warning locally, so
it's a bit hard to check ... not even with their config and gcc 12.2.0
(nix, rather than debian though.)
> Does adding an upper bounds sanity check help as a work-around, like:
So ... no idea!
I guess I can push something to a branch and see if the robot picks it
up ...
johannes
On Thu, Nov 30, 2023 at 07:40:26PM +0100, Johannes Berg wrote:
> On Thu, 2023-11-30 at 10:32 -0800, Kees Cook wrote:
> > Yeah, I would expect this to mean that there is a code path that
> > GCC found where the value could overflow. It does this when a variable
> > "value range" gets bounded (e.g. an int isn't the full -INT_MAX to INT_MAX
> > range).And flex_array_size() was designed to saturate at SIZE_MIX rather
> > than wrapping around to an unexpected small value, so these are playing
> > together it seems.
> >
> > However, I would have expected the kzalloc() to blow up _first_.
>
> Hmm.
>
> > Regardless, I suspect the addition of "if (n_thresholds > 1)" is what is
> > tripping GCC.
> >
> > int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> > ...
> > return nl80211_set_cqm_rssi(info, thresholds, len / 4,
> > hysteresis);
> >
> > Now it "knows" there is a path where n_threasholds could be [2,
> > INT_MAX].
>
> Yeah, it's not _really_ bounded, apart from the message length? But then
> struct_size() should saturate and fail? But I guess it cannot know that,
> and limits the object size to 1<<63 - 1 whereas the copy is 1<<64 - 1...
>
> > Does this warning go away if "len" is made unsigned?
>
> Thing is, neither Kalle nor I can even reproduce the warning locally, so
> it's a bit hard to check ... not even with their config and gcc 12.2.0
> (nix, rather than debian though.)
Ah! Hmm. Let me see if I can coax out the warning locally...
--
Kees Cook
On Thu, Nov 30, 2023 at 07:40:26PM +0100, Johannes Berg wrote:
> On Thu, 2023-11-30 at 10:32 -0800, Kees Cook wrote:
> > Yeah, I would expect this to mean that there is a code path that
> > GCC found where the value could overflow. It does this when a variable
> > "value range" gets bounded (e.g. an int isn't the full -INT_MAX to INT_MAX
> > range).And flex_array_size() was designed to saturate at SIZE_MIX rather
> > than wrapping around to an unexpected small value, so these are playing
> > together it seems.
> >
> > However, I would have expected the kzalloc() to blow up _first_.
>
> Hmm.
>
> > Regardless, I suspect the addition of "if (n_thresholds > 1)" is what is
> > tripping GCC.
> >
> > int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> > ...
> > return nl80211_set_cqm_rssi(info, thresholds, len / 4,
> > hysteresis);
> >
> > Now it "knows" there is a path where n_threasholds could be [2,
> > INT_MAX].
>
> Yeah, it's not _really_ bounded, apart from the message length? But then
> struct_size() should saturate and fail? But I guess it cannot know that,
> and limits the object size to 1<<63 - 1 whereas the copy is 1<<64 - 1...
>
> > Does this warning go away if "len" is made unsigned?
>
> Thing is, neither Kalle nor I can even reproduce the warning locally, so
> it's a bit hard to check ... not even with their config and gcc 12.2.0
> (nix, rather than debian though.)
I was able to see it with Ubuntu's GCC 12.3.0 and their config. This
fixed it for me:
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d0f499227c29..7735d178a393 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -12845,7 +12845,7 @@ static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
}
static int nl80211_set_cqm_rssi(struct genl_info *info,
- const s32 *thresholds, int n_thresholds,
+ const s32 *thresholds, u32 n_thresholds,
u32 hysteresis)
{
struct cfg80211_registered_device *rdev = info->user_ptr[0];
@@ -12948,7 +12948,7 @@ static int nl80211_set_cqm(struct sk_buff *skb, struct genl_info *info)
attrs[NL80211_ATTR_CQM_RSSI_HYST]) {
const s32 *thresholds =
nla_data(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
- int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
+ u32 len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
u32 hysteresis = nla_get_u32(attrs[NL80211_ATTR_CQM_RSSI_HYST]);
if (len % 4)
If that's sensible, I can send a proper patch?
(Oh, it looks like nla_len is actually u16 ... should I use that instead
of u32?)
--
Kees Cook
On Thu, 2023-11-30 at 10:46 -0800, Kees Cook wrote:
> >
> > > Does this warning go away if "len" is made unsigned?
> >
> > Thing is, neither Kalle nor I can even reproduce the warning locally, so
> > it's a bit hard to check ... not even with their config and gcc 12.2.0
> > (nix, rather than debian though.)
>
> Ah! Hmm. Let me see if I can coax out the warning locally...
>
Actually, I see it now with wireless-next + that patch ... I had put it
into wireless, not wireless-next :)
johannes
On Thu, 2023-11-30 at 10:52 -0800, Kees Cook wrote:
>
> I was able to see it with Ubuntu's GCC 12.3.0 and their config. This
> fixed it for me:
OK. I guess kernel tree also mattered somehow, and I got confused
because I'd applied the patch on wireless, where the robot did it on
wireless-next. Not sure how that's different, but OK.
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index d0f499227c29..7735d178a393 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -12845,7 +12845,7 @@ static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
> }
>
> static int nl80211_set_cqm_rssi(struct genl_info *info,
> - const s32 *thresholds, int n_thresholds,
> + const s32 *thresholds, u32 n_thresholds,
> u32 hysteresis)
> {
> struct cfg80211_registered_device *rdev = info->user_ptr[0];
> @@ -12948,7 +12948,7 @@ static int nl80211_set_cqm(struct sk_buff *skb, struct genl_info *info)
> attrs[NL80211_ATTR_CQM_RSSI_HYST]) {
> const s32 *thresholds =
> nla_data(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> - int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> + u32 len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> u32 hysteresis = nla_get_u32(attrs[NL80211_ATTR_CQM_RSSI_HYST]);
>
> if (len % 4)
>
>
> If that's sensible, I can send a proper patch?
Sure, that seems reasonable.
> (Oh, it looks like nla_len is actually u16 ... should I use that instead
> of u32?)
Yeah it's a 16-bit field in the message format. Doesn't really matter?
johannes
On Thu, Nov 30, 2023 at 07:40:26PM +0100, Johannes Berg wrote:
> On Thu, 2023-11-30 at 10:32 -0800, Kees Cook wrote:
> > Yeah, I would expect this to mean that there is a code path that
> > GCC found where the value could overflow. It does this when a variable
> > "value range" gets bounded (e.g. an int isn't the full -INT_MAX to INT_MAX
> > range).And flex_array_size() was designed to saturate at SIZE_MIX rather
> > than wrapping around to an unexpected small value, so these are playing
> > together it seems.
> >
> > However, I would have expected the kzalloc() to blow up _first_.
>
> Hmm.
>
> > Regardless, I suspect the addition of "if (n_thresholds > 1)" is what is
> > tripping GCC.
> >
> > int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> > ...
> > return nl80211_set_cqm_rssi(info, thresholds, len / 4,
> > hysteresis);
> >
> > Now it "knows" there is a path where n_threasholds could be [2,
> > INT_MAX].
>
> Yeah, it's not _really_ bounded, apart from the message length? But then
> struct_size() should saturate and fail? But I guess it cannot know that,
> and limits the object size to 1<<63 - 1 whereas the copy is 1<<64 - 1...
>
> > Does this warning go away if "len" is made unsigned?
Actually, this alone fixes it too:
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 167b91348e57..c59679524705 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1214,9 +1214,9 @@ static inline void *nla_data(const struct nlattr *nla)
* nla_len - length of payload
* @nla: netlink attribute
*/
-static inline int nla_len(const struct nlattr *nla)
+static inline u16 nla_len(const struct nlattr *nla)
{
- return nla->nla_len - NLA_HDRLEN;
+ return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
}
/**
--
Kees Cook
On Thu, 2023-11-30 at 10:55 -0800, Kees Cook wrote:
> On Thu, Nov 30, 2023 at 07:40:26PM +0100, Johannes Berg wrote:
> > On Thu, 2023-11-30 at 10:32 -0800, Kees Cook wrote:
> > > Yeah, I would expect this to mean that there is a code path that
> > > GCC found where the value could overflow. It does this when a variable
> > > "value range" gets bounded (e.g. an int isn't the full -INT_MAX to INT_MAX
> > > range).And flex_array_size() was designed to saturate at SIZE_MIX rather
> > > than wrapping around to an unexpected small value, so these are playing
> > > together it seems.
> > >
> > > However, I would have expected the kzalloc() to blow up _first_.
> >
> > Hmm.
> >
> > > Regardless, I suspect the addition of "if (n_thresholds > 1)" is what is
> > > tripping GCC.
> > >
> > > int len = nla_len(attrs[NL80211_ATTR_CQM_RSSI_THOLD]);
> > > ...
> > > return nl80211_set_cqm_rssi(info, thresholds, len / 4,
> > > hysteresis);
> > >
> > > Now it "knows" there is a path where n_threasholds could be [2,
> > > INT_MAX].
> >
> > Yeah, it's not _really_ bounded, apart from the message length? But then
> > struct_size() should saturate and fail? But I guess it cannot know that,
> > and limits the object size to 1<<63 - 1 whereas the copy is 1<<64 - 1...
> >
> > > Does this warning go away if "len" is made unsigned?
>
> Actually, this alone fixes it too:
>
> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 167b91348e57..c59679524705 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -1214,9 +1214,9 @@ static inline void *nla_data(const struct nlattr *nla)
> * nla_len - length of payload
> * @nla: netlink attribute
> */
> -static inline int nla_len(const struct nlattr *nla)
> +static inline u16 nla_len(const struct nlattr *nla)
> {
> - return nla->nla_len - NLA_HDRLEN;
> + return nla->nla_len > NLA_HDRLEN ? nla->nla_len - NLA_HDRLEN : 0;
> }
>
Heh. If you can sell that to Jakub I don't mind, but that might be a
harder sell than the int/u32 in our code...
johannes