2019-08-16 19:27:46

by Denis Kenzior

[permalink] [raw]
Subject: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

If a (legacy) client requested a wiphy dump but did not provide the
NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
message. At least this was the intent after commit:
3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")

However, in reality the non-split dumps were broken very shortly after.
Perhaps around commit:
fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")

The reason for the bug is a missing setting of split_start to 0 in the
case of a non-split dump.

Here is a sample non-split dump performed on kernel 4.19, some parts
were cut for brevity:
< Request: Get Wiphy (0x01) len 0 [ack,0x300]
> Result: New Wiphy (0x03) len 3496 [multi]
Wiphy: 0 (0x00000000)
Wiphy Name: phy0
Generation: 1 (0x00000001)
<snip>
> Result: New Wiphy (0x03) len 68 [multi]
Wiphy: 0 (0x00000000)
Wiphy Name: phy0
Generation: 1 (0x00000001)
Extended Capabilities: len 8
Capability: bit 2: Extended channel switching
Capability: bit 62: Opmode Notification
Extended Capabilities Mask: len 8
04 00 00 00 00 00 00 40 .......@
VHT Capability Mask: len 12
f0 1f 80 33 ff ff 00 00 ff ff 00 00 ...3........
> Result: New Wiphy (0x03) len 28 [multi]
Wiphy: 0 (0x00000000)
Wiphy Name: phy0
Generation: 1 (0x00000001)
> Result: New Wiphy (0x03) len 28 [multi]
Wiphy: 0 (0x00000000)
Wiphy Name: phy0
Generation: 1 (0x00000001)
> Result: New Wiphy (0x03) len 52 [multi]
Wiphy: 0 (0x00000000)
Wiphy Name: phy0
Generation: 1 (0x00000001)
Max CSA Counters: len 1
02 .
Scheduled Scan Maximum Requests: len 4
01 00 00 00 ....
Extended Features: len 4
02 02 00 04 ....
> Result: New Wiphy (0x03) len 36 [multi]
Wiphy: 0 (0x00000000)
Wiphy Name: phy0
Generation: 1 (0x00000001)
Reserved: len 4
00 00 00 00 ....
> Complete: Get Wiphy (0x01) len 4 [multi]
Status: 0

Signed-off-by: Denis Kenzior <[email protected]>
---
net/wireless/nl80211.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 1a107f29016b..b9b0199b5ec6 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2173,6 +2173,9 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
* but break unconditionally so unsplit data stops here.
*/
state->split_start++;
+
+ if (!state->split)
+ state->split_start = 0;
break;
case 9:
if (rdev->wiphy.extended_capabilities &&
--
2.21.0


2019-08-16 19:27:46

by Denis Kenzior

[permalink] [raw]
Subject: [RFCv2 2/4] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes due to userspace tools
using limited buffers. Once the sizes NEW_WIPHY messages exceeded these
sizes, split dumps were introduced. All any non-legacy data was added
only to messages using split-dumps (including filtered dumps).

However, split-dumping has quite a significant overhead. On cards
tested, split dumps generated message sizes 1.7-1.8x compared to
non-split dumps, while still comfortably fitting into an 8k buffer. The
kernel now expects userspace to provide 16k buffers by default, and 32k
buffers are possible.

Introduce a concept of a large message, so that if the kernel detects
that userspace has provided a buffer of sufficient size, a non-split
message could be generated.

Signed-off-by: Denis Kenzior <[email protected]>
---
net/wireless/nl80211.c | 51 ++++++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index b9b0199b5ec6..682a095415eb 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1839,6 +1839,7 @@ struct nl80211_dump_wiphy_state {
long start;
long split_start, band_start, chan_start, capa_start;
bool split;
+ bool large_message;
};

static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
@@ -2027,7 +2028,8 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,

if (nl80211_msg_put_channel(
msg, &rdev->wiphy, chan,
- state->split))
+ state->split ||
+ state->large_message))
goto nla_put_failure;

nla_nest_end(msg, nl_freq);
@@ -2072,7 +2074,7 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
i = nl80211_add_commands_unsplit(rdev, msg);
if (i < 0)
goto nla_put_failure;
- if (state->split) {
+ if (state->split || state->large_message) {
CMD(crit_proto_start, CRIT_PROTOCOL_START);
CMD(crit_proto_stop, CRIT_PROTOCOL_STOP);
if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH)
@@ -2111,7 +2113,8 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
/* fall through */
case 6:
#ifdef CONFIG_PM
- if (nl80211_send_wowlan(msg, rdev, state->split))
+ if (nl80211_send_wowlan(msg, rdev,
+ state->split || state->large_message))
goto nla_put_failure;
state->split_start++;
if (state->split)
@@ -2126,7 +2129,8 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
goto nla_put_failure;

if (nl80211_put_iface_combinations(&rdev->wiphy, msg,
- state->split))
+ state->split ||
+ state->large_message))
goto nla_put_failure;

state->split_start++;
@@ -2145,7 +2149,7 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
* dump is split, otherwise it makes it too big. Therefore
* only advertise it in that case.
*/
- if (state->split)
+ if (state->split || state->large_message)
features |= NL80211_FEATURE_ADVERTISE_CHAN_LIMITS;
if (nla_put_u32(msg, NL80211_ATTR_FEATURE_FLAGS, features))
goto nla_put_failure;
@@ -2170,13 +2174,20 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
*
* We still increment split_start so that in the split
* case we'll continue with more data in the next round,
- * but break unconditionally so unsplit data stops here.
+ * but break unless large_messages are requested, so
+ * legacy unsplit data stops here.
*/
state->split_start++;

- if (!state->split)
+ if (state->split)
+ break;
+
+ if (!state->large_message) {
state->split_start = 0;
- break;
+ break;
+ }
+
+ /* Fall through */
case 9:
if (rdev->wiphy.extended_capabilities &&
(nla_put(msg, NL80211_ATTR_EXT_CAPA,
@@ -2218,7 +2229,9 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
}

state->split_start++;
- break;
+ if (state->split)
+ break;
+ /* Fall through */
case 10:
if (nl80211_send_coalesce(msg, rdev))
goto nla_put_failure;
@@ -2234,7 +2247,9 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
goto nla_put_failure;

state->split_start++;
- break;
+ if (state->split)
+ break;
+ /* Fall through */
case 11:
if (rdev->wiphy.n_vendor_commands) {
const struct nl80211_vendor_cmd_info *info;
@@ -2270,7 +2285,9 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
nla_nest_end(msg, nested);
}
state->split_start++;
- break;
+ if (state->split)
+ break;
+ /* Fall through */
case 12:
if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH &&
nla_put_u8(msg, NL80211_ATTR_MAX_CSA_COUNTERS,
@@ -2312,7 +2329,9 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
}

state->split_start++;
- break;
+ if (state->split)
+ break;
+ /* Fall through */
case 13:
if (rdev->wiphy.num_iftype_ext_capab &&
rdev->wiphy.iftype_ext_capab) {
@@ -2380,13 +2399,17 @@ static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
}

state->split_start++;
- break;
+ if (state->split)
+ break;
+ /* Fall through */
case 14:
if (nl80211_send_pmsr_capa(rdev, msg))
goto nla_put_failure;

state->split_start++;
- break;
+ if (state->split)
+ break;
+ /* Fall through */
case 15:
if (rdev->wiphy.akm_suites &&
nla_put(msg, NL80211_ATTR_AKM_SUITES,
--
2.21.0

2019-08-16 19:29:39

by Denis Kenzior

[permalink] [raw]
Subject: [RFCv2 3/4] nl80211: Don't split-dump for clients with large buffers

Signed-off-by: Denis Kenzior <[email protected]>
---
net/wireless/nl80211.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 682a095415eb..24b67de99f3a 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2498,6 +2498,22 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
rtnl_unlock();
return ret;
}
+
+ /*
+ * auto-detect support for large buffer sizes: af_netlink
+ * will allocate skbufs larger than 4096 in cases where
+ * it detects that the client receive buffer (given to
+ * recvmsg) is bigger. In such cases we can assume that
+ * performing split dumps is wasteful since the client
+ * can likely safely consume the entire un-split wiphy
+ * message in one go without the extra message header
+ * overhead.
+ */
+ if (skb_tailroom(skb) > 4096) {
+ state->large_message = true;
+ state->split = false;
+ }
+
cb->args[0] = (long)state;
}

@@ -2531,6 +2547,7 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
* We can then retry with the larger buffer.
*/
if ((ret == -ENOBUFS || ret == -EMSGSIZE) &&
+ !state->large_message &&
!skb->len && !state->split &&
cb->min_dump_alloc < 4096) {
cb->min_dump_alloc = 4096;
--
2.21.0

2019-08-16 19:29:39

by Denis Kenzior

[permalink] [raw]
Subject: [RFCv2 4/4] nl80211: Send large new_wiphy events

Send large NEW_WIPHY events on a new multicast group so that clients
that can accept larger messages do not need to round-trip to the kernel
and perform extra filtered wiphy dumps.

A new multicast group is introduced and the large message is sent before
the legacy message. This way clients that listen on both multicast
groups can ignore duplicate legacy messages if needed.

Signed-off-by: Denis Kenzior <[email protected]>
---
include/uapi/linux/nl80211.h | 1 +
net/wireless/nl80211.c | 28 +++++++++++++++++++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 822851d369ab..b9c1cf29cf09 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -50,6 +50,7 @@
#define NL80211_MULTICAST_GROUP_MLME "mlme"
#define NL80211_MULTICAST_GROUP_VENDOR "vendor"
#define NL80211_MULTICAST_GROUP_NAN "nan"
+#define NL80211_MULTICAST_GROUP_CONFIG2 "config2"
#define NL80211_MULTICAST_GROUP_TESTMODE "testmode"

/**
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 24b67de99f3a..9ba9e1938d6b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -46,6 +46,7 @@ enum nl80211_multicast_groups {
NL80211_MCGRP_MLME,
NL80211_MCGRP_VENDOR,
NL80211_MCGRP_NAN,
+ NL80211_MCGRP_CONFIG2,
NL80211_MCGRP_TESTMODE /* keep last - ifdef! */
};

@@ -56,6 +57,7 @@ static const struct genl_multicast_group nl80211_mcgrps[] = {
[NL80211_MCGRP_MLME] = { .name = NL80211_MULTICAST_GROUP_MLME },
[NL80211_MCGRP_VENDOR] = { .name = NL80211_MULTICAST_GROUP_VENDOR },
[NL80211_MCGRP_NAN] = { .name = NL80211_MULTICAST_GROUP_NAN },
+ [NL80211_MCGRP_CONFIG2] = { .name = NL80211_MULTICAST_GROUP_CONFIG2 },
#ifdef CONFIG_NL80211_TESTMODE
[NL80211_MCGRP_TESTMODE] = { .name = NL80211_MULTICAST_GROUP_TESTMODE }
#endif
@@ -14730,12 +14732,34 @@ void nl80211_notify_wiphy(struct cfg80211_registered_device *rdev,
enum nl80211_commands cmd)
{
struct sk_buff *msg;
+ size_t alloc_size;
struct nl80211_dump_wiphy_state state = {};

WARN_ON(cmd != NL80211_CMD_NEW_WIPHY &&
cmd != NL80211_CMD_DEL_WIPHY);

- msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (cmd == NL80211_CMD_NEW_WIPHY) {
+ state.large_message = true;
+ alloc_size = 8192UL;
+ } else
+ alloc_size = NLMSG_DEFAULT_SIZE;
+
+ msg = nlmsg_new(alloc_size, GFP_KERNEL);
+ if (!msg)
+ goto legacy;
+
+ if (nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0) {
+ nlmsg_free(msg);
+ goto legacy;
+ }
+
+ genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
+ NL80211_MCGRP_CONFIG2, GFP_KERNEL);
+
+legacy:
+ state.large_message = false;
+ alloc_size = NLMSG_DEFAULT_SIZE;
+ msg = nlmsg_new(alloc_size, GFP_KERNEL);
if (!msg)
return;

@@ -14763,6 +14787,8 @@ void nl80211_notify_iface(struct cfg80211_registered_device *rdev,
return;
}

+ genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
+ NL80211_MCGRP_CONFIG2, GFP_KERNEL);
genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
NL80211_MCGRP_CONFIG, GFP_KERNEL);
}
--
2.21.0

2019-08-30 09:04:16

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
> If a (legacy) client requested a wiphy dump but did not provide the
> NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
> composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
> message. At least this was the intent after commit:
> 3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")
>
> However, in reality the non-split dumps were broken very shortly after.
> Perhaps around commit:
> fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")

Fun. I guess we updated all userspace quickly enough to not actually
have any issues there. As far as I remember, nobody ever complained, so
I guess people just updated their userspace.

Given that it's been 6+ years, maybe we're better off just removing the
whole non-split thing then, instead of fixing it. Seems even less likely
now that somebody would run a 6+yo supplicant (from before its commit
c30a4ab045ce ("nl80211: Fix mode settings with split wiphy dump")).

OTOH, this is a simple fix, would removing the non-split mode result in
any appreciable cleanups? Perhaps not, and we'd have to insert something
instead to reject non-split and log a warning, or whatnot.

johannes

2019-08-30 09:11:12

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

On Fri, 2019-08-30 at 11:03 +0200, Johannes Berg wrote:
> On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
> > If a (legacy) client requested a wiphy dump but did not provide the
> > NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
> > composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
> > message. At least this was the intent after commit:
> > 3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")
> >
> > However, in reality the non-split dumps were broken very shortly after.
> > Perhaps around commit:
> > fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")
>
> Fun. I guess we updated all userspace quickly enough to not actually
> have any issues there. As far as I remember, nobody ever complained, so
> I guess people just updated their userspace.

Actually, going back in time to the code there (e.g. iw and hostap), it
seems that it quite possibly never was a userspace issue, just an issue
with netlink allocating a 4k SKB by default for dumps.

Even then, libnl would've defaulted to a 16k recvmsg() buffer size, and
we didn't override that anywhere.

So more likely, this was all fixed by kernel 9063e21fb026 ("netlink:
autosize skb lengthes") about a year after we ran into the problem.

johannes

2019-08-30 09:37:07

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFCv2 2/4] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
> For historical reasons, NEW_WIPHY messages generated by dumps or
> GET_WIPHY commands were limited to 4096 bytes due to userspace tools
> using limited buffers.

I think now that I've figured out why, it'd be good to note that it
wasn't due to userspace tools, but rather due to the default netlink
dump skb allocation at the time, prior to commit 9063e21fb026
("netlink: autosize skb lengthes").

> Once the sizes NEW_WIPHY messages exceeded these
> sizes, split dumps were introduced. All any non-legacy data was added
> only to messages using split-dumps (including filtered dumps).
>
> However, split-dumping has quite a significant overhead. On cards
> tested, split dumps generated message sizes 1.7-1.8x compared to
> non-split dumps, while still comfortably fitting into an 8k buffer. The
> kernel now expects userspace to provide 16k buffers by default, and 32k
> buffers are possible.
>
> Introduce a concept of a large message, so that if the kernel detects
> that userspace has provided a buffer of sufficient size, a non-split
> message could be generated.

So, there's still a wrinkle with this. Larger SKB allocation can fail,
and instead of returning an error to userspace, the kernel will allocate
a smaller SKB instead.

With genetlink, we currently don't even have a way of controlling the
minimum allocation that's always required.

Since we already have basically all of the mechanics, I'd say perhaps a
better concept would be to "split when necessary", aborting if split
isn't supported.

IOW, do something like

... nl80211_send_wiphy(...)
{
[...]

switch (state->split_start) {
[...]
case <N>:
[...] // put stuff
state->split_start++;
state->skb_end = nlmsg_get_pos(skb);
/* fall through */
case <N+1>:
[...]
}

finish:
genlmsg_end(msg, hdr);
return 0;
nla_put_failure:
if (state->split_start < 9) {
genlmsg_cancel(msg, hdr);
return -EMSGSIZE;
}
nlmsg_trim(msg, state->skb_end);
goto finish;
}


That way, we fill each SKB as much as possible, up to 32k if userspace
provided big enough buffers *and* we could allocate the SKB.


Your userspace would still set the split flag, and thus be compatible
with all kinds of options:
* really old kernel not supporting split
* older kernel sending many messages
* kernel after this change packing more into one message
* even if allocating big SKBs failed

johannes

2019-08-30 09:40:26

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

On Fri, 2019-08-30 at 11:10 +0200, Johannes Berg wrote:
> On Fri, 2019-08-30 at 11:03 +0200, Johannes Berg wrote:
> > On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
> > > If a (legacy) client requested a wiphy dump but did not provide the
> > > NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
> > > composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
> > > message. At least this was the intent after commit:
> > > 3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")
> > >
> > > However, in reality the non-split dumps were broken very shortly after.
> > > Perhaps around commit:
> > > fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")
> >
> > Fun. I guess we updated all userspace quickly enough to not actually
> > have any issues there. As far as I remember, nobody ever complained, so
> > I guess people just updated their userspace.
>
> Actually, going back in time to the code there (e.g. iw and hostap), it
> seems that it quite possibly never was a userspace issue, just an issue
> with netlink allocating a 4k SKB by default for dumps.
>
> Even then, libnl would've defaulted to a 16k recvmsg() buffer size, and
> we didn't override that anywhere.

Ah, also not quite true, at the time it still had a 4k default, until
commit 807fddc4cd9e ("nl: Increase receive buffer size to 4 pages")
dated May 8, 2013.

johannes

2019-08-30 09:55:46

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

On Fri, 2019-08-30 at 11:40 +0200, Johannes Berg wrote:
> On Fri, 2019-08-30 at 11:10 +0200, Johannes Berg wrote:
> > On Fri, 2019-08-30 at 11:03 +0200, Johannes Berg wrote:
> > > On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
> > > > If a (legacy) client requested a wiphy dump but did not provide the
> > > > NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
> > > > composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
> > > > message. At least this was the intent after commit:
> > > > 3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")
> > > >
> > > > However, in reality the non-split dumps were broken very shortly after.
> > > > Perhaps around commit:
> > > > fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")
> > >
> > > Fun. I guess we updated all userspace quickly enough to not actually
> > > have any issues there. As far as I remember, nobody ever complained, so
> > > I guess people just updated their userspace.
> >
> > Actually, going back in time to the code there (e.g. iw and hostap), it
> > seems that it quite possibly never was a userspace issue, just an issue
> > with netlink allocating a 4k SKB by default for dumps.
> >
> > Even then, libnl would've defaulted to a 16k recvmsg() buffer size, and
> > we didn't override that anywhere.
>
> Ah, also not quite true, at the time it still had a 4k default, until
> commit 807fddc4cd9e ("nl: Increase receive buffer size to 4 pages")
> dated May 8, 2013.

However, even before that, it would have supported responding to
MSG_TRUNC by retrying the recvmsg(), but hostap/iw wouldn't have set
nl_socket_enable_msg_peek()... oh well.

johannes

2019-08-30 10:17:33

by Johannes Berg

[permalink] [raw]
Subject: Re: [RFCv2 4/4] nl80211: Send large new_wiphy events

On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
> Send large NEW_WIPHY events on a new multicast group so that clients
> that can accept larger messages do not need to round-trip to the kernel
> and perform extra filtered wiphy dumps.
>
> A new multicast group is introduced and the large message is sent before
> the legacy message. This way clients that listen on both multicast
> groups can ignore duplicate legacy messages if needed.

Since I just did the digging, it seems that this would affect (old)
applications with libnl up to 3.2.22, unless they changed the default
recvmsg() buffer size.

I think this is a pretty decent approach, but I'm slightly worried about
hitting the new limits (16k) eventually. It seems far off now, but who
knows what kind of data we'll add. HE is (and likely will be) adding
quite a bit since it has everything for each interface type - something
drivers have for the most part not implemented yet. That trend will only
continue, as complexity in the spec doesn't seem to be going down.

And I don't really want to see "config3" a couple of years down the
road...

So can we at least mandate (document) that "config2" basically has no
message limit, and you will use MSG_PEEK/handle MSG_TRUNC with it?

That way, we can later bump the 8192 even beyond 16k if needed, and not
run into problems.

> + if (cmd == NL80211_CMD_NEW_WIPHY) {
> + state.large_message = true;
> + alloc_size = 8192UL;
> + } else
> + alloc_size = NLMSG_DEFAULT_SIZE;
> +

nit: there should be braces on both branches

> + if (nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0) {
> + nlmsg_free(msg);
> + goto legacy;
> + }

I think that'd be worth a WARN_ON(), it should never happen that you
actually run out of space, it means that the above wasn't big enough.

Now, on the previous patches I actually thought that you could set
"state->split" (and you should) and not need "state->large_message" in
order to indicate that the sub-functions are allowed to create larger
data - just keep filling the SKBs as much as possible for the dump.

Here, it seems like we do need it. It might be possible to get away
without it (by setting split here, and then having some special code to
handle the case of it not getting to the end), but that doesn't seem
worth it.

> @@ -14763,6 +14787,8 @@ void nl80211_notify_iface(struct cfg80211_registered_device *rdev,
> return;
> }
>
> + genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
> + NL80211_MCGRP_CONFIG2, GFP_KERNEL);

Hmm. That seems only needed if you don't want to listen on "config" at
all, but in the patch description you explicitly said that you send it
on "config2" *before* "config" for compatibility reasons (which makes
sense) - so what is it?

I'm having a hard time seeing anyone get away with only listening on
config2 since that'd basically require very recent (as of now future)
kernel. Are you planning this for a world where you can ditch support
for kernel<5.4 (or so)?

johannes

2019-08-30 15:55:29

by Denis Kenzior

[permalink] [raw]
Subject: Re: [RFCv2 4/4] nl80211: Send large new_wiphy events

Hi Johannes,

On 8/30/19 5:14 AM, Johannes Berg wrote:
> On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
>> Send large NEW_WIPHY events on a new multicast group so that clients
>> that can accept larger messages do not need to round-trip to the kernel
>> and perform extra filtered wiphy dumps.
>>
>> A new multicast group is introduced and the large message is sent before
>> the legacy message. This way clients that listen on both multicast
>> groups can ignore duplicate legacy messages if needed.
>
> Since I just did the digging, it seems that this would affect (old)
> applications with libnl up to 3.2.22, unless they changed the default
> recvmsg() buffer size.
>

Sorry, I'm not sure I understand. Are you saying new clients would try
to use old libnl and subscribe to this new multicast group for large
messages? Legacy clients shouldn't even see messages on this multicast
group since they would never subscribe to it.

> I think this is a pretty decent approach, but I'm slightly worried about
> hitting the new limits (16k) eventually. It seems far off now, but who
> knows what kind of data we'll add. HE is (and likely will be) adding
> quite a bit since it has everything for each interface type - something
> drivers have for the most part not implemented yet. That trend will only
> continue, as complexity in the spec doesn't seem to be going down.
>

Right, but the kernel will go up to 32k buffers if userspace read buffer
is that large. So I think we have quite some room to grow. On the
other hand, we probably should be vigilant that any new stuff added
tries to minimize message sizes whenever possible.

> And I don't really want to see "config3" a couple of years down the
> road...
>

Agreed.

> So can we at least mandate (document) that "config2" basically has no
> message limit, and you will use MSG_PEEK/handle MSG_TRUNC with it?
>

Yes, I will take care of that in v3.

> That way, we can later bump the 8192 even beyond 16k if needed, and not
> run into problems.
>
>> + if (cmd == NL80211_CMD_NEW_WIPHY) {
>> + state.large_message = true;
>> + alloc_size = 8192UL;
>> + } else
>> + alloc_size = NLMSG_DEFAULT_SIZE;
>> +
>
> nit: there should be braces on both branches
>

will fix

>> + if (nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0) {
>> + nlmsg_free(msg);
>> + goto legacy;
>> + }
>
> I think that'd be worth a WARN_ON(), it should never happen that you
> actually run out of space, it means that the above wasn't big enough.
>

Yep

> Now, on the previous patches I actually thought that you could set
> "state->split" (and you should) and not need "state->large_message" in
> order to indicate that the sub-functions are allowed to create larger
> data - just keep filling the SKBs as much as possible for the dump.
>
> Here, it seems like we do need it. It might be possible to get away
> without it (by setting split here, and then having some special code to
> handle the case of it not getting to the end), but that doesn't seem
> worth it.
>
>> @@ -14763,6 +14787,8 @@ void nl80211_notify_iface(struct cfg80211_registered_device *rdev,
>> return;
>> }
>>
>> + genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
>> + NL80211_MCGRP_CONFIG2, GFP_KERNEL);
>
> Hmm. That seems only needed if you don't want to listen on "config" at
> all, but in the patch description you explicitly said that you send it
> on "config2" *before* "config" for compatibility reasons (which makes
> sense) - so what is it?

Well it can be both, depending on whether large messages can fail or
not. So one use case might be that a client detects whether the config2
multicast group exists. If so, then it only subscribes to it and that's it.

Another use case might be (if userspace is worried about losing large
messages) to subscribe to both groups. If it receives the large
message, it can ignore the one that comes on the legacy multicast group.

>
> I'm having a hard time seeing anyone get away with only listening on
> config2 since that'd basically require very recent (as of now future)
> kernel. Are you planning this for a world where you can ditch support
> for kernel<5.4 (or so)?

No, but there's nothing stopping the client in making the choice at
runtime depending on the genl family info it gets. E.g. by peeking into
CTRL_ATTR_MCAST_GROUPS.

Regards,
-Denis

2019-08-30 16:33:33

by Denis Kenzior

[permalink] [raw]
Subject: Re: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

Hi Johannes,

On 8/30/19 4:03 AM, Johannes Berg wrote:
> On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
>> If a (legacy) client requested a wiphy dump but did not provide the
>> NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
>> composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
>> message. At least this was the intent after commit:
>> 3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")
>>
>> However, in reality the non-split dumps were broken very shortly after.
>> Perhaps around commit:
>> fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")
>
> Fun. I guess we updated all userspace quickly enough to not actually
> have any issues there. As far as I remember, nobody ever complained, so
> I guess people just updated their userspace.
>
> Given that it's been 6+ years, maybe we're better off just removing the
> whole non-split thing then, instead of fixing it. Seems even less likely
> now that somebody would run a 6+yo supplicant (from before its commit
> c30a4ab045ce ("nl80211: Fix mode settings with split wiphy dump")).
>

That would be my vote, given that we're probably one of a handful of
people in this world that understand that code path.

But... How would we handle non-dump versions of GET_WIPHY? To this day
I have dhcpcd issuing fun stuff like:

< Request: Get Wiphy (0x01) len 8 [ack]
0.374832
Interface Index: 59 (0x0000003b)

> OTOH, this is a simple fix, would removing the non-split mode result in
> any appreciable cleanups? Perhaps not, and we'd have to insert something
> instead to reject non-split and log a warning, or whatnot.
>

Getting rid of the legacy non-split case would simplify things. We
could also be a-lot smarter about how we split up the messages in order
to utilize buffer space more efficiently. I think you cover this in
your other replies, but I haven't processed those yet.

Regards,
-Denis

2019-08-30 19:56:58

by Denis Kenzior

[permalink] [raw]
Subject: Re: [RFCv2 2/4] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

Hi Johannes,

On 8/30/19 4:36 AM, Johannes Berg wrote:
> On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:
>> For historical reasons, NEW_WIPHY messages generated by dumps or
>> GET_WIPHY commands were limited to 4096 bytes due to userspace tools
>> using limited buffers.
>
> I think now that I've figured out why, it'd be good to note that it
> wasn't due to userspace tools, but rather due to the default netlink
> dump skb allocation at the time, prior to commit 9063e21fb026
> ("netlink: autosize skb lengthes").
>

Sure, will take care of it.

>> Once the sizes NEW_WIPHY messages exceeded these
>> sizes, split dumps were introduced. All any non-legacy data was added
>> only to messages using split-dumps (including filtered dumps).
>>
>> However, split-dumping has quite a significant overhead. On cards
>> tested, split dumps generated message sizes 1.7-1.8x compared to
>> non-split dumps, while still comfortably fitting into an 8k buffer. The
>> kernel now expects userspace to provide 16k buffers by default, and 32k
>> buffers are possible.
>>
>> Introduce a concept of a large message, so that if the kernel detects
>> that userspace has provided a buffer of sufficient size, a non-split
>> message could be generated.
>
> So, there's still a wrinkle with this. Larger SKB allocation can fail,
> and instead of returning an error to userspace, the kernel will allocate
> a smaller SKB instead.
>
> With genetlink, we currently don't even have a way of controlling the
> minimum allocation that's always required.
>
> Since we already have basically all of the mechanics, I'd say perhaps a
> better concept would be to "split when necessary", aborting if split
> isn't supported.
>
> IOW, do something like
>
> ... nl80211_send_wiphy(...)
> {
> [...]
>
> switch (state->split_start) {
> [...]
> case <N>:
> [...] // put stuff
> state->split_start++;
> state->skb_end = nlmsg_get_pos(skb);
> /* fall through */
> case <N+1>:
> [...]
> }
>
> finish:
> genlmsg_end(msg, hdr);
> return 0;
> nla_put_failure:
> if (state->split_start < 9) {
> genlmsg_cancel(msg, hdr);
> return -EMSGSIZE;
> }
> nlmsg_trim(msg, state->skb_end);
> goto finish;
> }
>
>
> That way, we fill each SKB as much as possible, up to 32k if userspace
> provided big enough buffers *and* we could allocate the SKB.
>
>
> Your userspace would still set the split flag, and thus be compatible
> with all kinds of options:
> * really old kernel not supporting split
> * older kernel sending many messages
> * kernel after this change packing more into one message
> * even if allocating big SKBs failed
>

What I was thinking was to attempt to build a large message first and if
that fails to fail over to the old logic. But I think what you propose
is even better. I'll incorporate this feedback into the next version.

Regards,
-Denis