Subject: Multi-client EAPOL key timeout when not having RTNL lock protection

Hi,

In a multi-client test scenario (64 or more clients on a single radio)
in AP mode using hostapd, we are seeing EAPOL key timeouts for random
clients.

wlan1: STA 00:41:c0:a8:03:10 WPA: received EAPOL-Key msg 4/4 in invalid
state (7) - dropped

This happens because association responses to retried association
requests from the client are transmitted late, so one of the association
requests is received while the EAPOL key exchange is in progress.
hostapd receives NL80211_CMD_NEW_STATION after EAPOL M3 has already been
transmitted, while it is waiting for EAPOL M4. Because hostapd received
NL80211_CMD_NEW_STATION, it restarts the handshake from M1, so the M4
that the client then sends is dropped with the above error.
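
To make the sequence concrete, here is a minimal toy model of the
ordering described above. It is only a sketch and not hostapd code: the
class, state and method names are hypothetical, and it assumes that a
new-station event simply restarts the handshake at M1.

```python
from enum import Enum, auto


class HsState(Enum):
    """Toy authenticator states (hypothetical, not hostapd's state names)."""
    IDLE = auto()
    WAIT_M2 = auto()   # M1 sent, waiting for M2
    WAIT_M4 = auto()   # M3 sent, waiting for M4
    DONE = auto()


class ToyAuthenticator:
    def __init__(self):
        self.state = HsState.IDLE
        self.log = []

    def new_station(self):
        # Assumption: a (re)association event restarts the handshake at M1.
        self.state = HsState.WAIT_M2
        self.log.append("tx M1")

    def rx_m2(self):
        if self.state is not HsState.WAIT_M2:
            self.log.append("M2 dropped")
            return False
        self.state = HsState.WAIT_M4
        self.log.append("tx M3")
        return True

    def rx_m4(self):
        # M4 is only valid while we are waiting for it.
        if self.state is not HsState.WAIT_M4:
            self.log.append("M4 dropped")
            return False
        self.state = HsState.DONE
        self.log.append("handshake complete")
        return True


def run(events):
    """Feed a list of method names to a fresh authenticator, in order."""
    auth = ToyAuthenticator()
    for ev in events:
        getattr(auth, ev)()
    return auth
```

With the order new_station, rx_m2, rx_m4 the model reaches DONE. With a
second new_station between rx_m2 and rx_m4, the M4 arrives in the wrong
state and is dropped, which is the pattern we see in the log.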

This delay is seen only after commit a05829a7222e ("cfg80211: avoid
holding the RTNL when calling the driver") and not before or without
it. We see delays in the processing of the nl80211_get_key,
nl80211_set_key, nl80211_new_key, nl80211_del_key and nl80211_tx_mgmt
commands.

The delay and the EAPOL key timeout are not seen when
NL80211_FLAG_NEED_RTNL is restored in the internal_flags of just the
nl80211_get_key, nl80211_set_key, nl80211_new_key, nl80211_del_key and
nl80211_tx_mgmt commands.

Please share your comments on this issue and on requiring the RTNL lock
for the key and mgmt nl80211 commands.

With regards,
Sathishkumar


2021-09-02 14:22:34

by Johannes Berg

Subject: Re: Multi-client EAPOL key timeout when not having RTNL lock protection

Hi Sathishkumar,

> In a multi-client test scenario (64 or more clients on a single radio)
> in AP mode using hostapd, we are seeing EAPOL key timeouts for random
> clients.
>
> wlan1: STA 00:41:c0:a8:03:10 WPA: received EAPOL-Key msg 4/4 in invalid
> state (7) - dropped

I think you'll probably have to share a more complete hostapd log, and
likely also send it to Jouni/the hostap list.

> This happens because association responses to retried association
> requests from the client are transmitted late, so one of the association
> requests is received while the EAPOL key exchange is in progress.
>

You're talking about the AP, which is transmitting the association
response. How could the AP possibly have sent an EAPOL msg 1/4 before
getting an ACK on the association response? Why are there retries for
this anyway?

> hostapd receives NL80211_CMD_NEW_STATION after EAPOL M3 has already been
> transmitted, while it is waiting for EAPOL M4.

Are you sure this is with an in-kernel driver?

Hostapd should be creating the stations?

> Because hostapd received NL80211_CMD_NEW_STATION, it restarts the
> handshake from M1, so the M4 that the client then sends is dropped with
> the above error.

Yeah, but ... why isn't hostapd doing that?

> This delay is seen only after commit a05829a7222e ("cfg80211: avoid
> holding the RTNL when calling the driver") and not before or without
> it. We see delays in the processing of the nl80211_get_key,
> nl80211_set_key, nl80211_new_key, nl80211_del_key and nl80211_tx_mgmt
> commands.
>
> The delay and the EAPOL key timeout are not seen when
> NL80211_FLAG_NEED_RTNL is restored in the internal_flags of just the
> nl80211_get_key, nl80211_set_key, nl80211_new_key, nl80211_del_key and
> nl80211_tx_mgmt commands.
>
> Please share your comments on this issue and on requiring the RTNL lock
> for the key and mgmt nl80211 commands.

Not going to happen, you need to find the real cause of this.

johannes