LinuxLists.cc - [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

2019-08-01 07:40:02

Subject: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

---
net/wireless/nl80211.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 6774072e836f..e1707cfd9076 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2496,6 +2496,22 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
rtnl_unlock();
return ret;
}
+
+ /*
+ * auto-detect support for large buffer sizes: af_netlink
+ * will allocate skbufs larger than 4096 in cases where
+ * it detects that the client receive buffer (given to
+ * recvmsg) is bigger. In such cases we can assume that
+ * performing split dumps is wasteful since the client
+ * can likely safely consume the entire un-split wiphy
+ * message in one go without the extra message header
+ * overhead.
+ */
+ if (skb_tailroom(skb) > 4096) {
+ state->large_message = true;
+ state->split = false;
+ }
+
cb->args[0] = (long)state;
}

@@ -2529,6 +2545,7 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct netlink_callback *cb)
* We can then retry with the larger buffer.
*/
if ((ret == -ENOBUFS || ret == -EMSGSIZE) &&
+ !state->large_message &&
!skb->len && !state->split &&
cb->min_dump_alloc < 4096) {
cb->min_dump_alloc = 4096;
--
2.21.0

2019-08-01 09:56:21

by Johannes Berg

[permalink] [raw]

Subject: Re: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

On Thu, 2019-08-01 at 02:14 -0500, Denis Kenzior wrote:

> + /*
> + * auto-detect support for large buffer sizes: af_netlink
> + * will allocate skbufs larger than 4096 in cases where
> + * it detects that the client receive buffer (given to
> + * recvmsg) is bigger. In such cases we can assume that
> + * performing split dumps is wasteful since the client
> + * can likely safely consume the entire un-split wiphy
> + * message in one go without the extra message header
> + * overhead.
> + */
> + if (skb_tailroom(skb) > 4096) {
> + state->large_message = true;
> + state->split = false;
> + }

Hmm. That's kinda a neat idea, but I don't think it's a good idea. Have
you checked how long the message is now?

Since we *did* in fact hit the previous limit, and have added a *lot* of
things since then (this was years ago, after all), I wouldn't be
surprised if we're reasonably close to the new limit you propose even
now already.

Also, keep in mind that there are some devices that just have an
*enormous* amount of channels, and that's only going to increase (right
now with 6/7 GHz, etc.)

So in general, given all the variable things we have here, all this
buffer size estimation doesn't seem very robust to me. You could have
any number of variable things in a message:
* channel list - which we alleviated somewhat by having a separate
channel dump, so not all data is included here (which I guess you'll
complain about next :P)
* nl80211_send_mgmt_stypes() things are also a bit variable, and we
keep adding interface types etc., and some devices may support lots
of frames (there's an upper bound, but it's not that small)
* interface combinations - only getting more complex with more complex
devices and more concurrency use cases
* vendor commands have no real limit
* I'm sure measurement use cases will only increases
* and generally of course we keep adding to everything

Also, I don't really buy the *need* for this since you're just removing
a few kernel/user roundtrips here when new devices are discovered, a
rare event. The parsing isn't really any more complicated for the
userspace side.

Regarding the other patch, I think most of the above also applies there.
I can sort of see how you think it's *nice* to have all the data right
there, but I really don't see why you're so hung up about having to
request the full information ... And I really don't want to see this hit
the wall again in the future, in some weird scenarios with devices that
have lots of <any of the above information>.

> It should be safe to assume that any users of these new unsolicited
> NEW_WIPHY events are non-legacy clients, which can use a
> larger receive buffer for netlink messages. Since older, legacy clients
> did not utilize NEW_WIPHY events (they did not exist), it is assumed
> that even if the client receives such a message (even if truncated), no
> harm would result and backwards-compatibility would be kept.

Interesting idea, but no, in general you cannot assume that. Older
clients might have added support for NEW_WIPHY without fixing the split
dumps first ...

Also, you mention in the code that messages are truncated, but I'm
pretty sure they're just dropped, not truncated.

And finally, I also see no reason to send out many KB of data for what
might in the end (e.g. in iw) just be a debug message.

But really I think the thing that kills this proposal is the fact that
it reintroduces a message size limit (even if higher now) that we're
somewhat likely to hit in the future.

johannes

2019-08-01 11:03:53

by Marcel Holtmann

[permalink] [raw]

Subject: Re: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

Hi Johannes,

>> + /*
>> + * auto-detect support for large buffer sizes: af_netlink
>> + * will allocate skbufs larger than 4096 in cases where
>> + * it detects that the client receive buffer (given to
>> + * recvmsg) is bigger. In such cases we can assume that
>> + * performing split dumps is wasteful since the client
>> + * can likely safely consume the entire un-split wiphy
>> + * message in one go without the extra message header
>> + * overhead.
>> + */
>> + if (skb_tailroom(skb) > 4096) {
>> + state->large_message = true;
>> + state->split = false;
>> + }
>
> Hmm. That's kinda a neat idea, but I don't think it's a good idea. Have
> you checked how long the message is now?
>
> Since we *did* in fact hit the previous limit, and have added a *lot* of
> things since then (this was years ago, after all), I wouldn't be
> surprised if we're reasonably close to the new limit you propose even
> now already.
>
> Also, keep in mind that there are some devices that just have an
> *enormous* amount of channels, and that's only going to increase (right
> now with 6/7 GHz, etc.)
>
> So in general, given all the variable things we have here, all this
> buffer size estimation doesn't seem very robust to me. You could have
> any number of variable things in a message:
> * channel list - which we alleviated somewhat by having a separate
> channel dump, so not all data is included here (which I guess you'll
> complain about next :P)
> * nl80211_send_mgmt_stypes() things are also a bit variable, and we
> keep adding interface types etc., and some devices may support lots
> of frames (there's an upper bound, but it's not that small)
> * interface combinations - only getting more complex with more complex
> devices and more concurrency use cases
> * vendor commands have no real limit
> * I'm sure measurement use cases will only increases
> * and generally of course we keep adding to everything
>
>
> Also, I don't really buy the *need* for this since you're just removing
> a few kernel/user roundtrips here when new devices are discovered, a
> rare event. The parsing isn't really any more complicated for the
> userspace side.

that is an argument that is coming to bite you. Forcing multiple roundtrips or even collecting multiple split message for some ancient legacy client behavior is just silly. If clients provide larger buffers, we should start using them.

I have proven a long time ago that round-trips are causing delays and creating visible user experience issues. Look up my DHCP presentation from either LinuxCon or PlumbersConf. One round-trip leads to another and at some point you end up with seconds wasted because you want to sit here and ignore efforts in improving the situation.

> Regarding the other patch, I think most of the above also applies there.
> I can sort of see how you think it's *nice* to have all the data right
> there, but I really don't see why you're so hung up about having to
> request the full information ... And I really don't want to see this hit
> the wall again in the future, in some weird scenarios with devices that
> have lots of <any of the above information>.
>
>
>> It should be safe to assume that any users of these new unsolicited
>> NEW_WIPHY events are non-legacy clients, which can use a
>> larger receive buffer for netlink messages. Since older, legacy clients
>> did not utilize NEW_WIPHY events (they did not exist), it is assumed
>> that even if the client receives such a message (even if truncated), no
>> harm would result and backwards-compatibility would be kept.
>
> Interesting idea, but no, in general you cannot assume that. Older
> clients might have added support for NEW_WIPHY without fixing the split
> dumps first ...
>
> Also, you mention in the code that messages are truncated, but I'm
> pretty sure they're just dropped, not truncated.
>
> And finally, I also see no reason to send out many KB of data for what
> might in the end (e.g. in iw) just be a debug message.

Actually iw is just a dev tool. It should not be run in production and so that is not an argument. Any proper client that cares about your WiFi connections will want this information.

> But really I think the thing that kills this proposal is the fact that
> it reintroduces a message size limit (even if higher now) that we're
> somewhat likely to hit in the future.

Maybe we need to accept that current nl80211 API is broken and start over. Or we should at least start deprecating commands and replacing them with new ones that are doing a better job for clients that actually behave properly.

Regards

Marcel

2019-08-01 11:10:36

by Johannes Berg

[permalink] [raw]

Subject: Re: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

Hi Marcel,

> > Also, I don't really buy the *need* for this since you're just removing
> > a few kernel/user roundtrips here when new devices are discovered, a
> > rare event. The parsing isn't really any more complicated for the
> > userspace side.
>
> that is an argument that is coming to bite you. Forcing multiple
> roundtrips or even collecting multiple split message for some ancient
> legacy client behavior is just silly. If clients provide larger
> buffers, we should start using them.

I'm not arguing legacy/old client behaviour.

> I have proven a long time ago that round-trips are causing delays and
> creating visible user experience issues. Look up my DHCP presentation
> from either LinuxCon or PlumbersConf. One round-trip leads to another
> and at some point you end up with seconds wasted because you want to
> sit here and ignore efforts in improving the situation.

Comparing network roundtrips to local kernel access isn't exactly a very
good comparison.

> > And finally, I also see no reason to send out many KB of data for what
> > might in the end (e.g. in iw) just be a debug message.
>
> Actually iw is just a dev tool. It should not be run in production and
> so that is not an argument. Any proper client that cares about your
> WiFi connections will want this information.

Again, this isn't an argument. I said wpa_s is an example. Any other
number of tools works, even wpa_s. Heck, probably even iwd, when
configured to not care about some devices (unless you can't even make it
ignore devices, which I'd consider a deficiency in its own right).

> > But really I think the thing that kills this proposal is the fact that
> > it reintroduces a message size limit (even if higher now) that we're
> > somewhat likely to hit in the future.
>
> Maybe we need to accept that current nl80211 API is broken and start
> over. Or we should at least start deprecating commands and replacing
> them with new ones that are doing a better job for clients that
> actually behave properly.

I know you love throwing things away and rewriting them, but you're not
going to solve the problem.

I suggest you re-read my email and actually reply to it, rather than
throwing out bullet points.

Frankly, I'm tired of having a discussion where all you do is accuse me
of not caring about the problem, but then you don't even respond to any
arguments.

johannes

2019-08-01 19:36:52

by Denis Kenzior

[permalink] [raw]

Subject: Re: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

Hi Johannes,

On 8/1/19 4:13 AM, Johannes Berg wrote:
> On Thu, 2019-08-01 at 02:14 -0500, Denis Kenzior wrote:
>
>> + /*
>> + * auto-detect support for large buffer sizes: af_netlink
>> + * will allocate skbufs larger than 4096 in cases where
>> + * it detects that the client receive buffer (given to
>> + * recvmsg) is bigger. In such cases we can assume that
>> + * performing split dumps is wasteful since the client
>> + * can likely safely consume the entire un-split wiphy
>> + * message in one go without the extra message header
>> + * overhead.
>> + */
>> + if (skb_tailroom(skb) > 4096) {
>> + state->large_message = true;
>> + state->split = false;
>> + }
>
> Hmm. That's kinda a neat idea, but I don't think it's a good idea. Have
> you checked how long the message is now?

I only have hwsim and a ath9k to test with. The hwsim comes out to be
4448 bytes with an updated version of my patch. ath9k is smaller. V1
did not address some extra gotcha code which didn't include some
attributes in the unsplit case. As far as I can tell all the attributes
are now identical with v2.

Note that the overhead for split dumps is actually quite big. The same
info using split-dumps is 7724 bytes. So there would definitely be an
advantage in not using such fragmentation if not needed...

>
> Since we *did* in fact hit the previous limit, and have added a *lot* of
> things since then (this was years ago, after all), I wouldn't be
> surprised if we're reasonably close to the new limit you propose even
> now already.

It seems not?

>
> Also, keep in mind that there are some devices that just have an
> *enormous* amount of channels, and that's only going to increase (right
> now with 6/7 GHz, etc.)

The 2.4 & 5 Ghz bands account for about 2k. So even if we add another
band, we're likely still within an 8k buffer. And really the kernel
recommends a 16k buffer to be used as a minimum...

Also, the way nl80211 encodes channel information is really quite
wasteful. Not sure if anything can be done about it now, but the flags
really, really, really add up. So there is significant savings to be
had here...

>
> So in general, given all the variable things we have here, all this
> buffer size estimation doesn't seem very robust to me. You could have
> any number of variable things in a message:
> * channel list - which we alleviated somewhat by having a separate
> channel dump, so not all data is included here (which I guess you'll
> complain about next :P)

Not sure I follow? I don't see a separate channel dump? Can you point
me in the right direction?

> * nl80211_send_mgmt_stypes() things are also a bit variable, and we
> keep adding interface types etc., and some devices may support lots
> of frames (there's an upper bound, but it's not that small)
> * interface combinations - only getting more complex with more complex
> devices and more concurrency use cases
> * vendor commands have no real limit
> * I'm sure measurement use cases will only increases
> * and generally of course we keep adding to everything
>
>
> Also, I don't really buy the *need* for this since you're just removing
> a few kernel/user roundtrips here when new devices are discovered, a
> rare event. The parsing isn't really any more complicated for the
> userspace side.
>

roundtrips to the kernel introduce races. The less potential for a
race, the less code we have to write and the less buggy it is. Pretty
simple...

>
> Regarding the other patch, I think most of the above also applies there.
> I can sort of see how you think it's *nice* to have all the data right
> there, but I really don't see why you're so hung up about having to
> request the full information ... And I really don't want to see this hit
> the wall again in the future, in some weird scenarios with devices that
> have lots of <any of the above information>.
>

See above...

>
>> It should be safe to assume that any users of these new unsolicited
>> NEW_WIPHY events are non-legacy clients, which can use a
>> larger receive buffer for netlink messages. Since older, legacy clients
>> did not utilize NEW_WIPHY events (they did not exist), it is assumed
>> that even if the client receives such a message (even if truncated), no
>> harm would result and backwards-compatibility would be kept.
>
> Interesting idea, but no, in general you cannot assume that. Older
> clients might have added support for NEW_WIPHY without fixing the split
> dumps first ...

The two commits are over a year apart, but okay, fair enough. Then
again, you sort of hinted that nobody used this anyhow.

But regardless, if this mythical legacy/broken client is truly a
concern, we can introduce a NEW_WIPHY_BIG or something.

>
> Also, you mention in the code that messages are truncated, but I'm
> pretty sure they're just dropped, not truncated.

Dropped by who though? The kernel still copies the data into the
usersace buffer, and sets the MSG_TRUNC flag. Userspace will most
likely drop it, yes. But anyway, this is academic.

>
> And finally, I also see no reason to send out many KB of data for what
> might in the end (e.g. in iw) just be a debug message.
>

iw is not the real world though. I mean seriously, who cares about what
iw sees or doesn't? We're sending 2x the data due to fragmentation
overhead right now.

>
> But really I think the thing that kills this proposal is the fact that
> it reintroduces a message size limit (even if higher now) that we're
> somewhat likely to hit in the future.
>

I'd like to see some actual numbers that we're close to this message
size limit?

Regards,
-Denis

2019-08-16 18:51:01

by Marcel Holtmann

[permalink] [raw]

Subject: Re: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

Hi Johannes,

>>> Also, I don't really buy the *need* for this since you're just removing
>>> a few kernel/user roundtrips here when new devices are discovered, a
>>> rare event. The parsing isn't really any more complicated for the
>>> userspace side.
>>
>> that is an argument that is coming to bite you. Forcing multiple
>> roundtrips or even collecting multiple split message for some ancient
>> legacy client behavior is just silly. If clients provide larger
>> buffers, we should start using them.
>
> I'm not arguing legacy/old client behaviour.
>
>> I have proven a long time ago that round-trips are causing delays and
>> creating visible user experience issues. Look up my DHCP presentation
>> from either LinuxCon or PlumbersConf. One round-trip leads to another
>> and at some point you end up with seconds wasted because you want to
>> sit here and ignore efforts in improving the situation.
>
> Comparing network roundtrips to local kernel access isn't exactly a very
> good comparison.

the speed up wasn’t in UDP packet roundtrips, it was in local roundtrips and system calls.

So yes, roundtrips to the kernel are adding up. And if we increase the message size by adding 75-80% extra bytes, then this is adding up over time. You mention yourself we will see new channels and people are adding 15+ cards to the same machine. Every channel gets its own netlink message and that just adds up in overall netlink messages and roundtrips.

If we keep telling people to talk directly to nl80211, because wpa_supplicant doesn’t provide certain information and also doesn’t want to provide them, then we have more users dumping the wiphy list and that adds up again. As far as I recall we are still holding the big RTNL lock for this operations.

I know you want to remove RTNL dependency, but maybe we should run iwmon first to see if there are not external users ending up dumping the wiphy list over and over again.

>>> And finally, I also see no reason to send out many KB of data for what
>>> might in the end (e.g. in iw) just be a debug message.
>>
>> Actually iw is just a dev tool. It should not be run in production and
>> so that is not an argument. Any proper client that cares about your
>> WiFi connections will want this information.
>
> Again, this isn't an argument. I said wpa_s is an example. Any other
> number of tools works, even wpa_s. Heck, probably even iwd, when
> configured to not care about some devices (unless you can't even make it
> ignore devices, which I'd consider a deficiency in its own right).

If you are a production end user device, you have the amount of devices in there that are needed. If you are development systems, then nobody cares about extra messages.

I don’t like this command line thinking approach to things. There is not always some human user behind it that can decide what device to use by calling other commands like ip / ifconfig / dmesg. You need to enumerate it to the human to decide what to use.

So I stand by my point, if you are daemon that tries to handle WiFi, then it will want all information. And this includes device enumeration. With that regard, wpa_supplicant is not an example since it does do any device enumeration. That is external to the calling application or user.

To note here, there is also code in glibc that calls RTM_GETLINK. So every program executions dumps the netdev list.

>>> But really I think the thing that kills this proposal is the fact that
>>> it reintroduces a message size limit (even if higher now) that we're
>>> somewhat likely to hit in the future.
>>
>> Maybe we need to accept that current nl80211 API is broken and start
>> over. Or we should at least start deprecating commands and replacing
>> them with new ones that are doing a better job for clients that
>> actually behave properly.
>
> I know you love throwing things away and rewriting them, but you're not
> going to solve the problem.
>
> I suggest you re-read my email and actually reply to it, rather than
> throwing out bullet points.
>
> Frankly, I'm tired of having a discussion where all you do is accuse me
> of not caring about the problem, but then you don't even respond to any
> arguments.

If stuff is broken and we can't figure out a way to fix it, then it needs to be thrown away. Sticking the head in the sand and just ignoring it is not helpful. I would like to see better proposals.

Have you lately tried to use NL80211_CMD_GET_WIPHY dumping without the NL80211_ATTR_SPLIT_WIPHY_DUMP flag set? While working on some new testing utility, I ended up getting split dumps without having asked for them. Since is looks like that is broken since 2013, I wonder where are all these legacy applications that we trying to support.

Regards

Marcel