LinuxLists.cc - netif_carrier

2023-09-19 11:47:02

Subject: netif_carrier_on() race

Hi,

So I've been spending a few days debug this issue, and I don't have a
really good solution, hence this question/description.

I'll note that a lot of this is written in the context of ARCH=um and
time-travel=infcpu mode, and that apparently between 6.4 and 6.6-rc some
scheduler and/or workqueue changes were made that seem to have made this
problem worse. However, I don't think there's actually anything that
_cannot_ happen in real systems (perhaps more likely in single-threaded
ones), even if it may be unlikely.

As background, in mac80211, we set the carrier on if we have a
connection that can actually start passing data, since that seemed like
the best equivalent at the time. This obviously happens by calling
netif_carrier_on(). Maybe we can revisit this, but it seems reasonable -
why pretend to have a carrier before you can start passing data.

Then, in netif_carrier_on(), we immediately set the carrier on bit, so
that you can actually immediately see this from userspace if you ask
rtnetlink, however, it's not actually immediately _functional_ - it
still needs to schedule and run the linkwatch work first, to call
dev_activate() to change the (TX queue) qdisc(s) away from noop.

Also, even though you can already query the carrier state and see it on,
the actual rtnetlink _event_ for this only happens from the linkwatch
work as well, via netdev_state_change().

All of this makes sense since you need to hold RTNL for all those state
changes/notifier chains, but it does lead to the first race/consistency
problem: if you query at just the right time you can see carrier being
on, however, if the carrier is actually removed again and the linkwatch
work didn't run yet, there might never be an event for the carrier on,
iow, you might have:

netif_carrier_on()
query from userspace and see carrier on
netif_carrier_off()
linkwatch work runs and sends only carrier off event

This is at least a bit confusing, but not really my main problem here,
though it did in fact happen to me as well, in a fashion.

Now going back to wifi, in mac80211 we also send an event (via nl80211)
that says "I'm associated now", and userspace may take that event as a
signal that it can send the first frame on the connection, which kind of
makes sense - however, given the async nature of the linkwatch work, if
userspace is fast enough those such a frame will actually get dropped.
This is the biggest problem I have now with all this.

Now can I fix this?

Possible solution #1:
---------------------
In theory at least dev_activate() is exported, so I could call it from
mac80211? However, that seems ... odd, and also it doesn't work because
I'd need to hold the RTNL and the locking means that's really awkward to
do, to the point where I'd need to defer to another async work for doing
it.

Haven't attempted this.

Possible solution #2:
---------------------
Another - more feasible - option might be to say OK, so the associated
event (and a few friends) are the problem, so we can queue those events
in cfg80211, and only release them on NETDEV_CHANGE notifier call.
That's from netdev_state_change() after dev_activate() in linkwatch
work. We'd want to pair it with netif_carrier_on() so we actually expect
the event to come, and maybe give netif_carrier_on() a return value
indicating that it scheduled - or else check using carrier_up_count
perhaps? We'd have to queue more events that come afterwards so they're
not sent to userspace out of order, but even that can be done.

I haven't tried this yet but I think it would work, but it does feel a
bit strange, and arguably just makes the consistency problem worse -
because again userspace could actually query and see the new state
before the event for said new state.

Possible solution #3:
---------------------

Try to intercept the events from userspace. I've tried this, but it
feels subject to races; we don't know that we will even get an event
with carrier=1, since if the kernel does netif_carrier_on() followed
quickly by netif_carrier_off(), there won't be a carrier=1 event. And
honestly, at that point, it feels like it's within the rights of the
kernel to not send any event at all - it's just an implementation
artifact that it doesn't stop the linkwatch work again when it's already
pending, so it sends a no-op event, which also feels a bit odd anyway.

Maybe there could be something done here with the carrier_up_count as
well, but I'm not sure this makes a lot of sense. At least that should
increase somewhere there, so I could query it before even trying to
associate, and then wait until at least it incremented? But ... there
might not be an event, again, unless we really decide, document (and not
change in the future) that there _must_ be at least one event for any
sequence of carrier on/off changes, even if that sequence ends with no
effective change in the carrier.

Possible solution #4:
---------------------

Maybe we could set the carrier on earlier in the wireless stack, but
then what if e.g. some DHCP client is just looking for that event? That
seems therefore also problematic - not attempted.

Possible solution #5:
---------------------

Change the wireless stack to hold RTNL _more_, so that we can call
dev_activate(). I feel pretty good about not holding the RTNL for almost
all of the wireless stack now, so wouldn't really want to do that. Also
feels like a step in the wrong direction, after all, RTNL is almost like
the new BKL.

Possible solution #6:
---------------------

Change the locking to not necessarily require RTNL, so the wireless
stack can more easily call dev_activate(). Pretty much infeasible, IMHO,
as much as I'd want to do/see locking improvements in general.

---------------------

So now I'm not really sure what to do. I'm tempted to implement #2 so at
least it's all visible in kernel space, or try to play with #3, but I'm
not so happy with #3 unless we guarantee that linkwatch will run even
for a sequence of events that results in no eventual changes. We do that
today, but it feels brittle.

(Oh, and yes, I also tried just sleeping a little bit in userspace but
that results in other issues...)

Anyone have thoughts on this?

johannes

2023-09-19 19:57:05

by Andrew Lunn

[permalink] [raw]

Subject: Re: netif_carrier_on() race

> All of this makes sense since you need to hold RTNL for all those state
> changes/notifier chains, but it does lead to the first race/consistency
> problem: if you query at just the right time you can see carrier being
> on, however, if the carrier is actually removed again and the linkwatch
> work didn't run yet, there might never be an event for the carrier on,
> iow, you might have:
>
> netif_carrier_on()
> query from userspace and see carrier on
> netif_carrier_off()
> linkwatch work runs and sends only carrier off event
>
> This is at least a bit confusing, but not really my main problem here,
> though it did in fact happen to me as well, in a fashion.

That is interesting. Copper Ethernet PHYs might have the opposite
problem. The status bit about link is latching low. This means that if
the link is lost and then very quickly comes back again, you always
get to see the lost and then restored link. So maybe we have:

netif_carrier_off()
query from userspace and see carrier off
netif_carrier_oon()
linkwatch work runs and sends only carrier on event

???

Maybe we want linkwatch to keep the old and the new state. If there is
a change reported while there is still work queued, which flips a bit
back to its old state, it needs to block until the work is actually
done?

> Possible solution #2:
> ---------------------
> Another - more feasible - option might be to say OK, so the associated
> event (and a few friends) are the problem, so we can queue those events
> in cfg80211, and only release them on NETDEV_CHANGE notifier call.
> That's from netdev_state_change() after dev_activate() in linkwatch
> work. We'd want to pair it with netif_carrier_on() so we actually expect
> the event to come, and maybe give netif_carrier_on() a return value
> indicating that it scheduled - or else check using carrier_up_count
> perhaps?

Probably not an issue with 802.11, but sometimes drivers do odd things
with the carrier because of the BMC. The BMC can have a side channel
into the hosts NIC, which allows it to share the hosts PHY and RJ45
socket. So that the BMC can send/receive frames while the host NIC is
admin down, the carrier might actually be up. And requests to down the
carrier are ignored.

As i said, probably irrelevant to 802.11, but if you try to make a
generic solution, you might need to watch out for this.

Andrew

2023-09-19 23:42:57

by Johannes Berg

[permalink] [raw]

Subject: Re: netif_carrier_on() race

On Tue, 2023-09-19 at 14:36 +0200, Andrew Lunn wrote:
> > All of this makes sense since you need to hold RTNL for all those state
> > changes/notifier chains, but it does lead to the first race/consistency
> > problem: if you query at just the right time you can see carrier being
> > on, however, if the carrier is actually removed again and the linkwatch
> > work didn't run yet, there might never be an event for the carrier on,
> > iow, you might have:
> >
> > netif_carrier_on()
> > query from userspace and see carrier on
> > netif_carrier_off()
> > linkwatch work runs and sends only carrier off event
> >
> > This is at least a bit confusing, but not really my main problem here,
> > though it did in fact happen to me as well, in a fashion.
>
> That is interesting. Copper Ethernet PHYs might have the opposite
> problem. The status bit about link is latching low. This means that if
> the link is lost and then very quickly comes back again, you always
> get to see the lost and then restored link. So maybe we have:
>
>   netif_carrier_off()
>   query from userspace and see carrier off
>   netif_carrier_oon()
>   linkwatch work runs and sends only carrier on event
>
> ???

Heh, interesting. But I agree, that looks like the same problem the
other way around.

Also in the event you actually get the counters that increased. So maybe
that still makes some sense, and we just guarantee that we send at least
one event for these cases? We do that in the code today, though it's not
really formalised IMHO.

> Maybe we want linkwatch to keep the old and the new state. If there is
> a change reported while there is still work queued, which flips a bit
> back to its old state, it needs to block until the work is actually
> done?

I think you fundamentally cannot block in netif_carrier_on/off, they're
called from all kinds of contexts.

> > Possible solution #2:
> > ---------------------
> > Another - more feasible - option might be to say OK, so the associated
> > event (and a few friends) are the problem, so we can queue those events
> > in cfg80211, and only release them on NETDEV_CHANGE notifier call.
> > That's from netdev_state_change() after dev_activate() in linkwatch
> > work. We'd want to pair it with netif_carrier_on() so we actually expect
> > the event to come, and maybe give netif_carrier_on() a return value
> > indicating that it scheduled - or else check using carrier_up_count
> > perhaps?
>
> Probably not an issue with 802.11, but sometimes drivers do odd things
> with the carrier because of the BMC. The BMC can have a side channel
> into the hosts NIC, which allows it to share the hosts PHY and RJ45
> socket. So that the BMC can send/receive frames while the host NIC is
> admin down, the carrier might actually be up. And requests to down the
> carrier are ignored.
>
> As i said, probably irrelevant to 802.11, but if you try to make a
> generic solution, you might need to watch out for this.

Yay... :)

Anyway, thanks for chiming in - but honestly at this point I don't even
really know where to start addressing this.

johannes

2023-09-26 12:58:29

by Johannes Berg

[permalink] [raw]

Subject: Re: netif_carrier_on() race

Focusing on this part for a moment, because it affects not just
wireless:

> Then, in netif_carrier_on(), we immediately set the carrier on bit, so
> that you can actually immediately see this from userspace if you ask
> rtnetlink, however, it's not actually immediately _functional_ - it
> still needs to schedule and run the linkwatch work first, to call
> dev_activate() to change the (TX queue) qdisc(s) away from noop.
>
> Also, even though you can already query the carrier state and see it on,
> the actual rtnetlink _event_ for this only happens from the linkwatch
> work as well, via netdev_state_change().
>
> All of this makes sense since you need to hold RTNL for all those state
> changes/notifier chains, but it does lead to the first race/consistency
> problem: if you query at just the right time you can see carrier being
> on, however, if the carrier is actually removed again and the linkwatch
> work didn't run yet, there might never be an event for the carrier on,
> iow, you might have:
>
> netif_carrier_on()
> query from userspace and see carrier on
> netif_carrier_off()
> linkwatch work runs and sends only carrier off event

and also because, as Andrew mentioned, you can have the exact opposite
problem...

It can actually happen that something _else_ sends an event, so even if
userspace does't query but waits for a carrier on event, you could end
up with:

* netif_carrier_on()

* something else triggers netdev_state_change(),
even userspace setting link alias

netdev_state_change()
-> sends an rtnetlink event saying carrier is on

* userspace transmits but frames are dropped

* linkwatch work runs and enables qdiscs only now

To address this issue, we could introduce a new state, say
__LINK_STATE_CARRIER_COMPLETE or something like that, which is used when
communicating carrier state to userspace, and only set/cleared in
dev_activate()/dev_deactivate(). That way, any events to userspace (or
userspace querying) wouldn't show the carrier state until it's actually
fully reflected in software (qdiscs) too.

This doesn't fully solve _my_ (wifi) problem, but perhaps lets me work
around it in userspace by querying for the carrier state, if it's
reflected correctly ("fully ready to transmit") then we can do that.
Right now, we can't even do that.

But would it break something else? There's also a way to query it via
ethtool, which perhaps should _not_ be converted to _COMPLETE since you
could argue the ethtool link state is about the physical link, and with
this change the carrier becomes about the logical link in a fashion?

johannes

2023-12-01 10:43:49

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wireless-next 0/3] netlink carrier race workaround

Hi again,

So I had put this aside for a while, but really got annoyed by all
the test failures now ... thinking about this again I basically now
arrived at a variant of solution #3 previously outlined, and I've
kind of convinced myself that userspace should always get an event
with a new carrier_up_count as it does today.

So I've implemented a new nl80211 attribute carrying the current
carrier_up_count at the time of the wireless event, so that we can
(in userspace) wait for the corresponding rtnetlink event if we
haven't seen it yet. Patches for wpa_supplicant to follow.

johannes

2023-12-01 10:44:01

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wireless-next 1/3] wifi: nl80211: refactor nl80211_send_mlme_event() arguments

From: Johannes Berg <[email protected]>

This function has so many arguments already, before adding
yet another one, refactor it to take a struct instead.

Signed-off-by: Johannes Berg <[email protected]>
---
net/wireless/nl80211.c | 98 +++++++++++++++++++++++++++++-------------
1 file changed, 67 insertions(+), 31 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 12b7bd92bb86..46a79ed1c97c 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -17736,21 +17736,29 @@ void nl80211_common_reg_change_event(enum nl80211_commands cmd_id,
nlmsg_free(msg);
}

+struct nl80211_mlme_event {
+ enum nl80211_commands cmd;
+ const u8 *buf;
+ size_t buf_len;
+ int uapsd_queues;
+ const u8 *req_ies;
+ size_t req_ies_len;
+ bool reconnect;
+};
+
static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev,
struct net_device *netdev,
- const u8 *buf, size_t len,
- enum nl80211_commands cmd, gfp_t gfp,
- int uapsd_queues, const u8 *req_ies,
- size_t req_ies_len, bool reconnect)
+ const struct nl80211_mlme_event *event,
+ gfp_t gfp)
{
struct sk_buff *msg;
void *hdr;

- msg = nlmsg_new(100 + len + req_ies_len, gfp);
+ msg = nlmsg_new(100 + event->buf_len + event->req_ies_len, gfp);
if (!msg)
return;

- hdr = nl80211hdr_put(msg, 0, 0, 0, cmd);
+ hdr = nl80211hdr_put(msg, 0, 0, 0, event->cmd);
if (!hdr) {
nlmsg_free(msg);
return;
@@ -17758,22 +17766,24 @@ static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev,

if (nla_put_u32(msg, NL80211_ATTR_WIPHY, rdev->wiphy_idx) ||
nla_put_u32(msg, NL80211_ATTR_IFINDEX, netdev->ifindex) ||
- nla_put(msg, NL80211_ATTR_FRAME, len, buf) ||
- (req_ies &&
- nla_put(msg, NL80211_ATTR_REQ_IE, req_ies_len, req_ies)))
+ nla_put(msg, NL80211_ATTR_FRAME, event->buf_len, event->buf) ||
+ (event->req_ies &&
+ nla_put(msg, NL80211_ATTR_REQ_IE, event->req_ies_len,
+ event->req_ies)))
goto nla_put_failure;

- if (reconnect && nla_put_flag(msg, NL80211_ATTR_RECONNECT_REQUESTED))
+ if (event->reconnect &&
+ nla_put_flag(msg, NL80211_ATTR_RECONNECT_REQUESTED))
goto nla_put_failure;

- if (uapsd_queues >= 0) {
+ if (event->uapsd_queues >= 0) {
struct nlattr *nla_wmm =
nla_nest_start_noflag(msg, NL80211_ATTR_STA_WME);
if (!nla_wmm)
goto nla_put_failure;

if (nla_put_u8(msg, NL80211_STA_WME_UAPSD_QUEUES,
- uapsd_queues))
+ event->uapsd_queues))
goto nla_put_failure;

nla_nest_end(msg, nla_wmm);
@@ -17793,37 +17803,60 @@ void nl80211_send_rx_auth(struct cfg80211_registered_device *rdev,
struct net_device *netdev, const u8 *buf,
size_t len, gfp_t gfp)
{
- nl80211_send_mlme_event(rdev, netdev, buf, len,
- NL80211_CMD_AUTHENTICATE, gfp, -1, NULL, 0,
- false);
+ struct nl80211_mlme_event event = {
+ .cmd = NL80211_CMD_AUTHENTICATE,
+ .buf = buf,
+ .buf_len = len,
+ .uapsd_queues = -1,
+ };
+
+ nl80211_send_mlme_event(rdev, netdev, &event, gfp);
}

void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev,
struct net_device *netdev,
struct cfg80211_rx_assoc_resp_data *data)
{
- nl80211_send_mlme_event(rdev, netdev, data->buf, data->len,
- NL80211_CMD_ASSOCIATE, GFP_KERNEL,
- data->uapsd_queues,
- data->req_ies, data->req_ies_len, false);
+ struct nl80211_mlme_event event = {
+ .cmd = NL80211_CMD_ASSOCIATE,
+ .buf = data->buf,
+ .buf_len = data->len,
+ .uapsd_queues = data->uapsd_queues,
+ .req_ies = data->req_ies,
+ .req_ies_len = data->req_ies_len,
+ };
+
+ nl80211_send_mlme_event(rdev, netdev, &event, GFP_KERNEL);
}

void nl80211_send_deauth(struct cfg80211_registered_device *rdev,
struct net_device *netdev, const u8 *buf,
size_t len, bool reconnect, gfp_t gfp)
{
- nl80211_send_mlme_event(rdev, netdev, buf, len,
- NL80211_CMD_DEAUTHENTICATE, gfp, -1, NULL, 0,
- reconnect);
+ struct nl80211_mlme_event event = {
+ .cmd = NL80211_CMD_DEAUTHENTICATE,
+ .buf = buf,
+ .buf_len = len,
+ .reconnect = reconnect,
+ .uapsd_queues = -1,
+ };
+
+ nl80211_send_mlme_event(rdev, netdev, &event, gfp);
}

void nl80211_send_disassoc(struct cfg80211_registered_device *rdev,
struct net_device *netdev, const u8 *buf,
size_t len, bool reconnect, gfp_t gfp)
{
- nl80211_send_mlme_event(rdev, netdev, buf, len,
- NL80211_CMD_DISASSOCIATE, gfp, -1, NULL, 0,
- reconnect);
+ struct nl80211_mlme_event event = {
+ .cmd = NL80211_CMD_DISASSOCIATE,
+ .buf = buf,
+ .buf_len = len,
+ .reconnect = reconnect,
+ .uapsd_queues = -1,
+ };
+
+ nl80211_send_mlme_event(rdev, netdev, &event, gfp);
}

void cfg80211_rx_unprot_mlme_mgmt(struct net_device *dev, const u8 *buf,
@@ -17833,28 +17866,31 @@ void cfg80211_rx_unprot_mlme_mgmt(struct net_device *dev, const u8 *buf,
struct wiphy *wiphy = wdev->wiphy;
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wiphy);
const struct ieee80211_mgmt *mgmt = (void *)buf;
- u32 cmd;
+ struct nl80211_mlme_event event = {
+ .buf = buf,
+ .buf_len = len,
+ .uapsd_queues = -1,
+ };

if (WARN_ON(len < 2))
return;

if (ieee80211_is_deauth(mgmt->frame_control)) {
- cmd = NL80211_CMD_UNPROT_DEAUTHENTICATE;
+ event.cmd = NL80211_CMD_UNPROT_DEAUTHENTICATE;
} else if (ieee80211_is_disassoc(mgmt->frame_control)) {
- cmd = NL80211_CMD_UNPROT_DISASSOCIATE;
+ event.cmd = NL80211_CMD_UNPROT_DISASSOCIATE;
} else if (ieee80211_is_beacon(mgmt->frame_control)) {
if (wdev->unprot_beacon_reported &&
elapsed_jiffies_msecs(wdev->unprot_beacon_reported) < 10000)
return;
- cmd = NL80211_CMD_UNPROT_BEACON;
+ event.cmd = NL80211_CMD_UNPROT_BEACON;
wdev->unprot_beacon_reported = jiffies;
} else {
return;
}

trace_cfg80211_rx_unprot_mlme_mgmt(dev, buf, len);
- nl80211_send_mlme_event(rdev, dev, buf, len, cmd, GFP_ATOMIC, -1,
- NULL, 0, false);
+ nl80211_send_mlme_event(rdev, dev, &event, GFP_ATOMIC);
}
EXPORT_SYMBOL(cfg80211_rx_unprot_mlme_mgmt);

--
2.43.0

2023-12-01 10:44:09

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wireless-next 2/3] wifi: cfg80211: make RX assoc data const

From: Johannes Berg <[email protected]>

This is just a collection of data and we only read it,
so make it const.

Signed-off-by: Johannes Berg <[email protected]>
---
include/net/cfg80211.h | 2 +-
net/wireless/mlme.c | 2 +-
net/wireless/nl80211.c | 2 +-
net/wireless/nl80211.h | 2 +-
net/wireless/trace.h | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index d36ad4cedf3b..d59669d86718 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -7312,7 +7312,7 @@ struct cfg80211_rx_assoc_resp_data {
* This function may sleep. The caller must hold the corresponding wdev's mutex.
*/
void cfg80211_rx_assoc_resp(struct net_device *dev,
- struct cfg80211_rx_assoc_resp_data *data);
+ const struct cfg80211_rx_assoc_resp_data *data);

/**
* struct cfg80211_assoc_failure - association failure data
diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index bad9e4fd842f..f635a8b6ca2e 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -22,7 +22,7 @@

void cfg80211_rx_assoc_resp(struct net_device *dev,
- struct cfg80211_rx_assoc_resp_data *data)
+ const struct cfg80211_rx_assoc_resp_data *data)
{
struct wireless_dev *wdev = dev->ieee80211_ptr;
struct wiphy *wiphy = wdev->wiphy;
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 46a79ed1c97c..403a4a38966a 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -17815,7 +17815,7 @@ void nl80211_send_rx_auth(struct cfg80211_registered_device *rdev,

void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev,
struct net_device *netdev,
- struct cfg80211_rx_assoc_resp_data *data)
+ const struct cfg80211_rx_assoc_resp_data *data)
{
struct nl80211_mlme_event event = {
.cmd = NL80211_CMD_ASSOCIATE,
diff --git a/net/wireless/nl80211.h b/net/wireless/nl80211.h
index aad40240d9cb..6376f3a87f8a 100644
--- a/net/wireless/nl80211.h
+++ b/net/wireless/nl80211.h
@@ -60,7 +60,7 @@ void nl80211_send_rx_auth(struct cfg80211_registered_device *rdev,
const u8 *buf, size_t len, gfp_t gfp);
void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev,
struct net_device *netdev,
- struct cfg80211_rx_assoc_resp_data *data);
+ const struct cfg80211_rx_assoc_resp_data *data);
void nl80211_send_deauth(struct cfg80211_registered_device *rdev,
struct net_device *netdev,
const u8 *buf, size_t len,
diff --git a/net/wireless/trace.h b/net/wireless/trace.h
index 30cd1bd58aac..4de710efa47e 100644
--- a/net/wireless/trace.h
+++ b/net/wireless/trace.h
@@ -2928,7 +2928,7 @@ DEFINE_EVENT(netdev_evt_only, cfg80211_send_rx_auth,

TRACE_EVENT(cfg80211_send_rx_assoc,
TP_PROTO(struct net_device *netdev,
- struct cfg80211_rx_assoc_resp_data *data),
+ const struct cfg80211_rx_assoc_resp_data *data),
TP_ARGS(netdev, data),
TP_STRUCT__entry(
NETDEV_ENTRY
--
2.43.0

2023-12-01 10:44:10

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wireless-next 3/3] wifi: nl80211: report carrier up count to userspace

From: Johannes Berg <[email protected]>

There's a race in the carrier change that happens if userspace
sees the RX association event via nl80211, but the driver (or
mac80211) just prior to that set the carrier on. The carrier
on event is actually only processed by the link watch work, so
userspace can (and we've seen this at least in simulation with
ARCH=um and time-travel) attempt to send a frame before that
has run, if it was just waiting for the association to finish
(only on open connections, of course, for encryption it has to
go through the 4-way-handshake first before sending data frames.)

There's really no completely good way to address this, I've
previously analyzed this here:
https://lore.kernel.org/netdev/[email protected]/

This new solution requires both kernel and userspace work, it
basically builds on #3 outlined in the email linked above, but
with the addition of letting userspace _know_ that it may need
to wait for the rtnetlink event.

So to solve it, with this change userspace can see the value of
the carrier_up_count at the association event, and if it hasn't
yet seen the same value via an rtnetlink event (it is imperative
that it doesn't query, it must wait for events) then it can wait
for that event before trying to send data frames.

For now, implement this for association and IBSS joined events.

Signed-off-by: Johannes Berg <[email protected]>
---
include/uapi/linux/nl80211.h | 16 ++++++++++++
net/wireless/nl80211.c | 47 ++++++++++++++++--------------------
2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 0cd1da2c2902..120936f81a28 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -2830,6 +2830,20 @@ enum nl80211_commands {
* @NL80211_ATTR_MLO_LINK_DISABLED: Flag attribute indicating that the link is
* disabled.
*
+ * @NL80211_ATTR_CARRIER_UP_COUNT: This u32 attribute is included in some
+ * events that indicate a successful connection (notably association),
+ * indicating the value of the netdev's carrier_up_count at the time
+ * of sending this event. Userspace can use this to fix a race: when
+ * the carrier is turned on, the actual handling thereof is done in
+ * an asynchronous manner in the kernel. Thus, if userspace attempts
+ * to send a frame immediately after receiving the event indicating
+ * successful connection over nl80211, that may not go through if the
+ * asynchronous processing in the kernel hasn't yet happened. To fix
+ * it then userspace should be listening to rtnetlink events, and if
+ * it didn't see the value of the carrier_up_count yet, it can wait
+ * for a further rtnetlink event with a value equal to or bigger than
+ * the value reported here, and only then transmit data.
+ *
* @NUM_NL80211_ATTR: total number of nl80211_attrs available
* @NL80211_ATTR_MAX: highest attribute number currently defined
* @__NL80211_ATTR_AFTER_LAST: internal use
@@ -3368,6 +3382,8 @@ enum nl80211_attrs {

NL80211_ATTR_MLO_LINK_DISABLED,

+ NL80211_ATTR_CARRIER_UP_COUNT,
+
/* add attributes here, update the policy in nl80211.c */

__NL80211_ATTR_AFTER_LAST,
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 403a4a38966a..d91a99f90aaa 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -818,6 +818,7 @@ static const struct nla_policy nl80211_policy[NUM_NL80211_ATTR] = {
[NL80211_ATTR_HW_TIMESTAMP_ENABLED] = { .type = NLA_FLAG },
[NL80211_ATTR_EMA_RNR_ELEMS] = { .type = NLA_NESTED },
[NL80211_ATTR_MLO_LINK_DISABLED] = { .type = NLA_FLAG },
+ [NL80211_ATTR_CARRIER_UP_COUNT] = { .type = NLA_REJECT },
};

/* policy for the key attributes */
@@ -17738,11 +17739,13 @@ void nl80211_common_reg_change_event(enum nl80211_commands cmd_id,

struct nl80211_mlme_event {
enum nl80211_commands cmd;
+ const u8 *mac_addr;
const u8 *buf;
size_t buf_len;
int uapsd_queues;
const u8 *req_ies;
size_t req_ies_len;
+ u32 carrier_up_count;
bool reconnect;
};

@@ -17766,12 +17769,17 @@ static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev,

if (nla_put_u32(msg, NL80211_ATTR_WIPHY, rdev->wiphy_idx) ||
nla_put_u32(msg, NL80211_ATTR_IFINDEX, netdev->ifindex) ||
- nla_put(msg, NL80211_ATTR_FRAME, event->buf_len, event->buf) ||
+ (event->buf &&
+ nla_put(msg, NL80211_ATTR_FRAME, event->buf_len, event->buf)) ||
(event->req_ies &&
nla_put(msg, NL80211_ATTR_REQ_IE, event->req_ies_len,
event->req_ies)))
goto nla_put_failure;

+ if (event->mac_addr &&
+ nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, event->mac_addr))
+ goto nla_put_failure;
+
if (event->reconnect &&
nla_put_flag(msg, NL80211_ATTR_RECONNECT_REQUESTED))
goto nla_put_failure;
@@ -17789,6 +17797,11 @@ static void nl80211_send_mlme_event(struct cfg80211_registered_device *rdev,
nla_nest_end(msg, nla_wmm);
}

+ if (event->carrier_up_count &&
+ nla_put_u32(msg, NL80211_ATTR_CARRIER_UP_COUNT,
+ event->carrier_up_count))
+ goto nla_put_failure;
+
genlmsg_end(msg, hdr);

genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
@@ -17824,6 +17837,7 @@ void nl80211_send_rx_assoc(struct cfg80211_registered_device *rdev,
.uapsd_queues = data->uapsd_queues,
.req_ies = data->req_ies,
.req_ies_len = data->req_ies_len,
+ .carrier_up_count = atomic_read(&netdev->carrier_up_count),
};

nl80211_send_mlme_event(rdev, netdev, &event, GFP_KERNEL);
@@ -18307,32 +18321,13 @@ void nl80211_send_ibss_bssid(struct cfg80211_registered_device *rdev,
struct net_device *netdev, const u8 *bssid,
gfp_t gfp)
{
- struct sk_buff *msg;
- void *hdr;
+ struct nl80211_mlme_event event = {
+ .cmd = NL80211_CMD_JOIN_IBSS,
+ .mac_addr = bssid,
+ .carrier_up_count = atomic_read(&netdev->carrier_up_count),
+ };

- msg = nlmsg_new(NLMSG_DEFAULT_SIZE, gfp);
- if (!msg)
- return;
-
- hdr = nl80211hdr_put(msg, 0, 0, 0, NL80211_CMD_JOIN_IBSS);
- if (!hdr) {
- nlmsg_free(msg);
- return;
- }
-
- if (nla_put_u32(msg, NL80211_ATTR_WIPHY, rdev->wiphy_idx) ||
- nla_put_u32(msg, NL80211_ATTR_IFINDEX, netdev->ifindex) ||
- nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, bssid))
- goto nla_put_failure;
-
- genlmsg_end(msg, hdr);
-
- genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
- NL80211_MCGRP_MLME, gfp);
- return;
-
- nla_put_failure:
- nlmsg_free(msg);
+ nl80211_send_mlme_event(rdev, netdev, &event, gfp);
}

void cfg80211_notify_new_peer_candidate(struct net_device *dev, const u8 *addr,
--
2.43.0

2023-12-01 10:50:19

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wpa_supplicant 0/2] wpa_supplicant: wait for carrier race

This is the userspace part, waiting for the carrier up count to reach
the value indicated in an event before processing it, so that after
the event processing we can be sure that everything is A-OK for TX.

johannes

2023-12-01 10:50:44

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wpa_suppplicant 2/2] driver_nl82011: wait for rtnetlink event with carrier_up_count

From: Johannes Berg <[email protected]>

There's a race (see the comment in the code) between the
kernel and userspace that can lead to dropping TX packets
after the associated event was already received.

If the kernel indicates the carrier_up_count, wait for
the corresponding rtnetlink event to reach that count.
This fixes the race since the rtnetlink event is sent
after the async processing that's needed in the kernel
before a frame can be transmitted.

Signed-off-by: Johannes Berg <[email protected]>
---
src/drivers/driver_nl80211.c | 7 ++++
src/drivers/driver_nl80211.h | 2 ++
src/drivers/driver_nl80211_event.c | 52 ++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+)

diff --git a/src/drivers/driver_nl80211.c b/src/drivers/driver_nl80211.c
index 03d54222bb52..b442dee1e710 100644
--- a/src/drivers/driver_nl80211.c
+++ b/src/drivers/driver_nl80211.c
@@ -1365,6 +1365,7 @@ static void wpa_driver_nl80211_event_rtm_newlink(void *ctx,
char ifname[IFNAMSIZ + 1];
char extra[100], *pos, *end;
int init_failed;
+ u32 carrier_up_count = 0;

extra[0] = '\0';
pos = extra;
@@ -1396,6 +1397,9 @@ static void wpa_driver_nl80211_event_rtm_newlink(void *ctx,
pos += os_snprintf(pos, end - pos, " linkmode=%u",
nla_get_u32((struct nlattr *) attr));
break;
+ case IFLA_CARRIER_UP_COUNT:
+ carrier_up_count = nla_get_u32((struct nlattr *) attr);
+ break;
}
attr = RTA_NEXT(attr, attrlen);
}
@@ -1415,6 +1419,9 @@ static void wpa_driver_nl80211_event_rtm_newlink(void *ctx,
if (init_failed)
return; /* do not update interface state */

+ if (carrier_up_count)
+ drv->carrier_up_count = carrier_up_count;
+
if (!drv->if_disabled && !(ifi->ifi_flags & IFF_UP)) {
namebuf[0] = '\0';
if (if_indextoname(ifi->ifi_index, namebuf) &&
diff --git a/src/drivers/driver_nl80211.h b/src/drivers/driver_nl80211.h
index f82f604e9017..874988715b21 100644
--- a/src/drivers/driver_nl80211.h
+++ b/src/drivers/driver_nl80211.h
@@ -201,6 +201,8 @@ struct wpa_driver_nl80211_data {
unsigned int puncturing:1;
unsigned int qca_ap_allowed_freqs:1;

+ u32 carrier_up_count;
+
u32 ignore_next_local_disconnect;
u32 ignore_next_local_deauth;

diff --git a/src/drivers/driver_nl80211_event.c b/src/drivers/driver_nl80211_event.c
index 60b4fb51fcd8..4ff25b57ceeb 100644
--- a/src/drivers/driver_nl80211_event.c
+++ b/src/drivers/driver_nl80211_event.c
@@ -19,6 +19,7 @@
#include "common/ieee802_11_defs.h"
#include "common/ieee802_11_common.h"
#include "driver_nl80211.h"
+#include "netlink.h"

static void
@@ -256,6 +257,49 @@ static void nl80211_parse_wmm_params(struct nlattr *wmm_attr,
}

+/*
+ * Wait for an RTM newlink event with corresponding carrier up count
+ *
+ * There's a race condition with mac80211 and the network stack, which
+ * mostly hits depending on scheduling in time-simulation (tests),
+ * which is that mac80211 will both indicate carrier on to the network
+ * stack and send the associated/... event to userspace. Now, the
+ * internal kernel function netif_carrier_ok() immediately returns
+ * true after this, however, the linkwatch work still needs to run
+ * and change the TX queue qdisc away from noop (which drops all TX
+ * packets).
+ *
+ * When the race happens, userspace (and in particular tests) can see
+ * the associated/... event and immediately try to send a frame, at a
+ * time that the linkwatch work hasn't run yet, causing the frame to
+ * be dropped.
+ *
+ * Thus, if the kernel indicated the current carrier_up_count in an
+ * event, wait here for an RTM newlink event for our interface, so in
+ * in addition to seeing the associated/... event, we also know the
+ * carrier state has actually changed sufficiently to send packets,
+ * if it was meant to change.
+ *
+ * This works because the event to userspace is also sent from the
+ * asynchronous linkwatch work.
+ */
+static void
+nl80211_wait_for_carrier_up_count(struct wpa_driver_nl80211_data *drv,
+ u32 carrier_up_count)
+{
+#define WRAPPED_U32_LESS(x, y) ((s32)(y) - (s32)(x) < 0)
+
+ if (WRAPPED_U32_LESS(drv->carrier_up_count, carrier_up_count))
+ netlink_process_one_event(drv->global->netlink, 100);
+
+ if (WRAPPED_U32_LESS(drv->carrier_up_count, carrier_up_count))
+ wpa_printf(MSG_ERROR,
+ "nl80211: %s: carrier up count %u not seen (got %u)\n",
+ drv->first_bss->ifname, carrier_up_count,
+ drv->carrier_up_count);
+}
+
+
static void mlme_event_assoc(struct wpa_driver_nl80211_data *drv,
const u8 *frame, size_t len, struct nlattr *wmm,
struct nlattr *req_ie)
@@ -3845,6 +3889,14 @@ static void do_process_drv_event(struct i802_bss *bss, int cmd,
cmd == NL80211_CMD_SCAN_ABORTED))
nl80211_restore_ap_mode(bss);

+ /* see comment above wpa_driver_nl80211_own_ifname() */
+ if (tb[NL80211_ATTR_CARRIER_UP_COUNT]) {
+ u32 carrier_up_count =
+ nla_get_u32(tb[NL80211_ATTR_CARRIER_UP_COUNT]);
+
+ nl80211_wait_for_carrier_up_count(drv, carrier_up_count);
+ }
+
switch (cmd) {
case NL80211_CMD_TRIGGER_SCAN:
wpa_dbg(drv->ctx, MSG_DEBUG, "nl80211: Scan trigger");
--
2.43.0

2023-12-01 10:51:19

by Johannes Berg

[permalink] [raw]

Subject: [PATCH wpa_suppplicant 1/2] netlink: add netlink_process_one_event()

From: Johannes Berg <[email protected]>

Add a new function to read and process a single netlink event
with a timeout, to be used in driver_nl80211.

Signed-off-by: Johannes Berg <[email protected]>
---
src/drivers/netlink.c | 43 +++++++++++++++++++++++++++++++++++++++----
src/drivers/netlink.h | 2 ++
2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/src/drivers/netlink.c b/src/drivers/netlink.c
index 7780479c3e91..bbfe86eee7a0 100644
--- a/src/drivers/netlink.c
+++ b/src/drivers/netlink.c
@@ -33,19 +33,20 @@ static void netlink_receive_link(struct netlink_data *netlink,
}

-static void netlink_receive(int sock, void *eloop_ctx, void *sock_ctx)
+static void _netlink_process_one_event(struct netlink_data *netlink,
+ int wait_single)
{
- struct netlink_data *netlink = eloop_ctx;
char buf[8192];
int left;
struct sockaddr_nl from;
socklen_t fromlen;
struct nlmsghdr *h;
- int max_events = 10;
+ int max_events = wait_single ? 1 : 10;

try_again:
fromlen = sizeof(from);
- left = recvfrom(sock, buf, sizeof(buf), MSG_DONTWAIT,
+ left = recvfrom(netlink->sock, buf, sizeof(buf),
+ wait_single ? 0 : MSG_DONTWAIT,
(struct sockaddr *) &from, &fromlen);
if (left < 0) {
if (errno != EINTR && errno != EAGAIN)
@@ -88,6 +89,40 @@ try_again:
}

+void netlink_process_one_event(struct netlink_data *netlink,
+ unsigned int timeout_ms)
+{
+ if (timeout_ms) {
+ struct timeval timeout = {
+ .tv_sec = timeout_ms / 1000,
+ .tv_usec = 1000 * (timeout_ms % 1000),
+ };
+ fd_set read_set;
+ int ret;
+
+ FD_ZERO(&read_set);
+ FD_SET(netlink->sock, &read_set);
+
+ ret = select(netlink->sock + 1, &read_set, NULL, NULL,
+ &timeout);
+ if (ret < 0) {
+ perror("select on netlink socket");
+ return;
+ }
+ if (ret == 0)
+ return;
+ }
+
+ _netlink_process_one_event(netlink, 1);
+}
+
+
+static void netlink_receive(int sock, void *eloop_ctx, void *sock_ctx)
+{
+ _netlink_process_one_event(eloop_ctx, 0);
+}
+
+
struct netlink_data * netlink_init(struct netlink_config *cfg)
{
struct netlink_data *netlink;
diff --git a/src/drivers/netlink.h b/src/drivers/netlink.h
index 3a7340e51534..faee28b722ea 100644
--- a/src/drivers/netlink.h
+++ b/src/drivers/netlink.h
@@ -21,6 +21,8 @@ struct netlink_config {
};

struct netlink_data * netlink_init(struct netlink_config *cfg);
+void netlink_process_one_event(struct netlink_data *netlink,
+ unsigned int timeout_ms);
void netlink_deinit(struct netlink_data *netlink);
int netlink_send_oper_ifla(struct netlink_data *netlink, int ifindex,
int linkmode, int operstate);
--
2.43.0

2023-12-02 00:28:56

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Fri, 1 Dec 2023 11:41:14 +0100 Johannes Berg wrote:
> So I had put this aside for a while, but really got annoyed by all
> the test failures now ... thinking about this again I basically now
> arrived at a variant of solution #3 previously outlined, and I've
> kind of convinced myself that userspace should always get an event
> with a new carrier_up_count as it does today.

Would it work if we exposed "linkwatch is pending" / "link is
transitioning" bit to user space?
Even crazier, would it help if we had rtnl_getlink() run
linkwatch for the target link if linkwatch is pending?

2023-12-02 10:06:51

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Fri, 2023-12-01 at 16:28 -0800, Jakub Kicinski wrote:
> On Fri, 1 Dec 2023 11:41:14 +0100 Johannes Berg wrote:
> > So I had put this aside for a while, but really got annoyed by all
> > the test failures now ... thinking about this again I basically now
> > arrived at a variant of solution #3 previously outlined, and I've
> > kind of convinced myself that userspace should always get an event
> > with a new carrier_up_count as it does today.
>
> Would it work if we exposed "linkwatch is pending" / "link is
> transitioning" bit to user space?

Not sure, not by much or more than what this did? It's basically the
same, I think: I exposed the carrier_up_count at the kernel time, so if
userspace hasn't seen an event with a value >= that it knows the link is
transitioning.

> Even crazier, would it help if we had rtnl_getlink() run
> linkwatch for the target link if linkwatch is pending?

Sure, if we were to just synchronize that at the right time (doesn't
even need to be rtnl_getlink, could be a new operation) that'd solve the
issue too, perhaps more easily.

johannes

2023-12-02 18:47:02

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Sat, 02 Dec 2023 11:06:36 +0100 Johannes Berg wrote:
> > Would it work if we exposed "linkwatch is pending" / "link is
> > transitioning" bit to user space?
>
> Not sure, not by much or more than what this did? It's basically the
> same, I think: I exposed the carrier_up_count at the kernel time, so if
> userspace hasn't seen an event with a value >= that it knows the link is
> transitioning.

The benefit being that it'd work for everyone, without having to add
the carrier count in random events?

> > Even crazier, would it help if we had rtnl_getlink() run
> > linkwatch for the target link if linkwatch is pending?
>
> Sure, if we were to just synchronize that at the right time (doesn't
> even need to be rtnl_getlink, could be a new operation) that'd solve the
> issue too, perhaps more easily.

I was wondering about the new op, too, but "synchronize things please"
op feels a little hacky. rtnl_getlink returns link state, so it feels
somewhat natural for it to do the sync, to make sure that what it
returns is in fact correct information. No strong feelings, tho.
rtnl_getlink does return a lot, so maybe a new rtnl_getcarrier op?
Or we can make reading sysfs "carrier" do the sync?

2023-12-03 18:51:41

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Sat, 2023-12-02 at 10:46 -0800, Jakub Kicinski wrote:
> On Sat, 02 Dec 2023 11:06:36 +0100 Johannes Berg wrote:
> > > Would it work if we exposed "linkwatch is pending" / "link is
> > > transitioning" bit to user space?
> >
> > Not sure, not by much or more than what this did? It's basically the
> > same, I think: I exposed the carrier_up_count at the kernel time, so if
> > userspace hasn't seen an event with a value >= that it knows the link is
> > transitioning.
>
> The benefit being that it'd work for everyone, without having to add
> the carrier count in random events?

Well, true. You'd still have to add random rtnl_getlink() calls to your
userspace, and then wait for an event if it's transitioning? Actually a
bit _more_ complicated since then we'd have to do rtnl_getlink() after
receiving the assoc event, and then wait if still transitioning. Or I
guess we could do it when sending a frame there in the tests, but it's
another call into the kernel vs. getting the information we need in the
event.

But yeah honestly I don't mind that either, and maybe it helps address
some other use cases like what Andrew had in mind in his reply to my
original thread.

> > > Even crazier, would it help if we had rtnl_getlink() run
> > > linkwatch for the target link if linkwatch is pending?
> >
> > Sure, if we were to just synchronize that at the right time (doesn't
> > even need to be rtnl_getlink, could be a new operation) that'd solve the
> > issue too, perhaps more easily.
>
> I was wondering about the new op, too, but "synchronize things please"
> op feels a little hacky.

Agree ... but then again it's all a bit hacky. You can even read
"carrier is on" when it's really not yet ready...

> rtnl_getlink returns link state, so it feels
> somewhat natural for it to do the sync, to make sure that what it
> returns is in fact correct information.

Yeah that's a good point that I just mentioned above though - today the
kernel will happily return a state that it's not actually willing to
honour yet, i.e. if you actively read the state, you'll see carrier on
before the kernel is actually willing to transmit packets on the link.

Fixing that _would_ be nice, but I'm somewhat worried that it will cause
performance regressions to always sync there? OTOH, it would hopefully
not actually have to wait most of the time since link_watch isn't always
pending...

> rtnl_getlink does return a lot, so maybe a new rtnl_getcarrier op?

Does it matter? Just another attribute ...

> Or we can make reading sysfs "carrier" do the sync?

I think I wouldn't mind now, and perhaps if we want to sync in netlink
we should also do this here so that it's consistent, but I'm not sure
I'd want this to be the only way to do it, I might imagine that someone
might want this in some kind of container that doesn't necessarily have
(full) access there? Dunno.

We _could_ also use an input attribute on the rtnl_getlink() call to
have userspace explicitly opt in to doing the sync before returning
information?

johannes

2023-12-04 16:24:04

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Sun, 03 Dec 2023 19:51:28 +0100 Johannes Berg wrote:
> I think I wouldn't mind now, and perhaps if we want to sync in netlink
> we should also do this here so that it's consistent, but I'm not sure
> I'd want this to be the only way to do it, I might imagine that someone
> might want this in some kind of container that doesn't necessarily have
> (full) access there? Dunno.

Also dunno :) We can add a "sync" version of netif_carrier_ok()
and then call if from whatever places we need.

> We _could_ also use an input attribute on the rtnl_getlink() call to
> have userspace explicitly opt in to doing the sync before returning
> information?

Yeah, maybe.. IMHO a more "Rusty Russell API levels" thing to do would
be to allow opting out, as those who set the magic flag "know what they
are doing" and returning unsync'ed carrier may be surprising.
Also a "don't sync flag" we can add later, once someone who actually
cares appears, avoiding uAPI growth ????️

Anyway, up to you :)

2023-12-04 19:14:22

by Johannes Berg

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Mon, 2023-12-04 at 08:23 -0800, Jakub Kicinski wrote:
> On Sun, 03 Dec 2023 19:51:28 +0100 Johannes Berg wrote:
> > I think I wouldn't mind now, and perhaps if we want to sync in netlink
> > we should also do this here so that it's consistent, but I'm not sure
> > I'd want this to be the only way to do it, I might imagine that someone
> > might want this in some kind of container that doesn't necessarily have
> > (full) access there? Dunno.
>
> Also dunno :) We can add a "sync" version of netif_carrier_ok()
> and then call if from whatever places we need.

[note: netif_carrier_ok(), not netif_carrier_on(), almost confused that]

Yeah I guess we can have a netif_carrier_ok_sync(), though it feels kind
of dubious to hide such an important detail in the middle of a netlink
message building:

if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
...
#ifdef CONFIG_RPS
nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
#endif
put_master_ifindex(skb, dev) ||
nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
...

Also, if we ever _do_ want to make it optional, then it's problematic,
do we do netif_carrier_ok_maybe_sync(dev, sync)? ;-)

> > We _could_ also use an input attribute on the rtnl_getlink() call to
> > have userspace explicitly opt in to doing the sync before returning
> > information?
>
> Yeah, maybe.. IMHO a more "Rusty Russell API levels" thing to do would
> be to allow opting out, as those who set the magic flag "know what they
> are doing" and returning unsync'ed carrier may be surprising.
> Also a "don't sync flag" we can add later, once someone who actually
> cares appears, avoiding uAPI growth ????️
>
> Anyway, up to you :)

Heh. But do I want to get blamed for the (perhaps inevitable?)
performance regression? I guess I'll try ...

Actually I could even still combine this with the netif carrier up count
in the wireless events, so we only have to do the rtnl_getlink if we
haven't seen an event yet, and - in the likely common case - save the
extra roundtrip? Though I guess it's not a huge problem, it's once per
connection basically.

johannes

2023-12-04 19:47:31

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCH wireless-next 0/3] netlink carrier race workaround

On Mon, 04 Dec 2023 20:14:10 +0100 Johannes Berg wrote:
> Heh. But do I want to get blamed for the (perhaps inevitable?)
> performance regression? I guess I'll try ...

I'd happily bet that nobody will notice. Feel free to add:

Suggested-by: Jakub Kicinski <[email protected]>

If that makes it better?

> Actually I could even still combine this with the netif carrier up count
> in the wireless events, so we only have to do the rtnl_getlink if we
> haven't seen an event yet, and - in the likely common case - save the
> extra roundtrip? Though I guess it's not a huge problem, it's once per
> connection basically.

No objections to merging your carrier count patches to wireless, if you
prefer to keep them. But it'd be nice to also have a generic mechanism.