Subject: Re: [PATCH] mac80211: Fix kernel hang on ax200 firmware crash.
To: linux-wireless@vger.kernel.org
References: <20200610204017.4531-1-greearb@candelatech.com>
From: Ben Greear
Date: Mon, 15 Jun 2020 06:36:53 -0700
In-Reply-To: <20200610204017.4531-1-greearb@candelatech.com>
X-Mailing-List: linux-wireless@vger.kernel.org

On 06/10/2020 01:40 PM, greearb@candelatech.com wrote:
> From: Ben Greear
>
> I backported out-of-tree ax200 driver from backport-iwlwifi to my
> 5.4 kernel so that I could run ax200 beside other radios (backports
> mac80211 otherwise is incompatible and other drivers will crash).
>
> Always possible that upstream kernel doesn't suffer from exactly this
> case, but upstream ax200 is too unstable to even get this far, so...
>
> The ax200 firmware crash often causes the kernel to deadlock due to the
>     while (sta->sta_state == IEEE80211_STA_AUTHORIZED)
> loop in __sta_info_destroy_part2.  If sta_info_move_state does not
> make progress, then it will loop forever.  In my case, sta_info_move_state
> fails due to the sdata-in-driver check.

Hello Johannes,

Have any comment on this?
Thanks,
Ben

>
> Hung process looks like this:
>
> CPU: 7 PID: 23301 Comm: kworker/7:0 Tainted: G W 5.4.43+ #5
> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/19/2019
> Workqueue: events_freezable ieee80211_restart_work [mac80211]
> RIP: 0010:memcpy_erms+0x6/0x10
> Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 ce
> RSP: 0018:ffffc90006117728 EFLAGS: 00010002
> RAX: ffffffff837ca040 RBX: 0000000000000000 RCX: 0000000000000006
> RDX: 0000000000000046 RSI: ffffffff8380aa84 RDI: ffffffff837ca080
> RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000001697
> R10: 0000000000000007 R11: 0000000000000000 R12: ffffffff837ca040
> R13: 0000000000000046 R14: 0000000000000000 R15: ffffffff8380aa44
> FS:  0000000000000000(0000) GS:ffff88826ddc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000562e61e28f18 CR3: 00000002554f6006 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  msg_print_text+0x12a/0x1e0
>  console_unlock+0x160/0x600
>  vprintk_emit+0x146/0x2c0
>  printk+0x4d/0x69
>  ? lockdep_hardirqs_on+0xf1/0x190
>  __sdata_err+0x61/0x150 [mac80211]
>  drv_sta_state+0x433/0x8f0 [mac80211]
>  sta_info_move_state+0x28e/0x370 [mac80211]
>  __sta_info_destroy_part2+0x48/0x1d0 [mac80211]
>  __sta_info_flush+0xf6/0x180 [mac80211]
>  ieee80211_set_disassoc+0xc1/0x490 [mac80211]
>  ieee80211_mgd_deauth+0x291/0x420 [mac80211]
>  cfg80211_mlme_deauth+0xd2/0x330 [cfg80211]
>  cfg80211_mlme_down+0x7c/0xc0 [cfg80211]
>  cfg80211_disconnect+0x2b1/0x320 [cfg80211]
>  cfg80211_leave+0x23/0x30 [cfg80211]
>  cfg80211_netdev_notifier_call+0x3a5/0x680 [cfg80211]
>  ? lockdep_rtnl_is_held+0x11/0x20
>  ? addrconf_notify+0xb4/0xbb0 [ipv6]
>  ? packet_notifier+0xb8/0x2c0
>  notifier_call_chain+0x40/0x60
>  __dev_close_many+0x68/0x120
>  dev_close_many+0x83/0x130
>  dev_close.part.96+0x3f/0x70
>  cfg80211_shutdown_all_interfaces+0x3e/0xc0 [cfg80211]
>  ieee80211_reconfig+0x96/0x2180 [mac80211]
>  ? cond_synchronize_rcu+0x20/0x20
>  ieee80211_restart_work+0xb6/0xe0 [mac80211]
>  process_one_work+0x27c/0x640
>  worker_thread+0x47/0x3f0
>  ? process_one_work+0x640/0x640
>  kthread+0xfc/0x130
>  ? kthread_create_worker_on_cpu+0x70/0x70
>  ret_from_fork+0x24/0x30
>
> With this patch, there is safety code to bail out after 1000 tries of
> moving the sta state, and also I check for EIO which is returned by
> the sdata-in-driver failure case and treat that as success as far as
> changing sta state goes.
>
> Console logs look like this in the failure case, and aside from the ax200
> radio that went phantom, the rest of the system is usable:
>
> iwlwifi 0000:12:00.0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
> iwlwifi 0000:12:00.0: Firmware error during reconfiguration - reprobe!
> iwlwifi 0000:12:00.0: Failed to start RT ucode: -5
> wlan2: Failed check-sdata-in-driver check, flags: 0x0 count: 1
> wlan2: Failed check-sdata-in-driver check, flags: 0x0 count: 1
> wlan2: Failed check-sdata-in-driver check, flags: 0x0 count: 1
> iwlwifi 0000:12:00.0: Failed to trigger RX queues sync (-5)
> wlan2: Failed check-sdata-in-driver check, flags: 0x0 count: 1
> wlan2: drv_sta_state failed with EIO (sdata not in driver?), state: 4 new-state: 3
> wlan2: drv_sta_state failed with EIO (sdata not in driver?), state: 3 new-state: 2
> wlan2: drv_sta_state failed with EIO (sdata not in driver?), state: 2 new-state: 1
> wlan2: Failed check-sdata-in-driver check, flags: 0x0 count: 1
> iwlwifi 0000:12:00.0: iwl_trans_wait_txq_empty bad state = 0
> iwlwifi 0000:12:00.0: dma_pool_destroy iwlwifi:bc, 00000000d859bd4c busy
>
> Signed-off-by: Ben Greear
> ---
>  net/mac80211/sta_info.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index e2a04fc..31a3856 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -1092,6 +1092,7 @@ static void __sta_info_destroy_part2(struct sta_info *sta)
>  	struct ieee80211_sub_if_data *sdata = sta->sdata;
>  	struct station_info *sinfo;
>  	int ret;
> +	int count = 0;
>
>  	/*
>  	 * NOTE: This assumes at least synchronize_net() was done
> @@ -1104,6 +1105,13 @@ static void __sta_info_destroy_part2(struct sta_info *sta)
>  	while (sta->sta_state == IEEE80211_STA_AUTHORIZED) {
>  		ret = sta_info_move_state(sta, IEEE80211_STA_ASSOC);
>  		WARN_ON_ONCE(ret);
> +		if (++count > 1000) {
> +			/* WTF, bail out so that at least we don't hang the system. */
> +			sdata_err(sdata, "Could not move state after 1000 tries, ret: %d state: %d\n",
> +				  ret, sta->sta_state);
> +			WARN_ON_ONCE(1);
> +			break;
> +		}
>  	}
>
>  	/* now keys can no longer be reached */
> @@ -2017,8 +2025,19 @@ int sta_info_move_state(struct sta_info *sta,
>  	if (test_sta_flag(sta, WLAN_STA_INSERTED)) {
>  		int err = drv_sta_state(sta->local, sta->sdata, sta,
>  					sta->sta_state, new_state);
> -		if (err)
> -			return err;
> +		if (err == -EIO) {
> +			/* Sdata-not-in-driver, we are out of sync, but probably
> +			 * best to carry on instead of bailing here, at least maybe
> +			 * we can clean this up.
> +			 */
> +			sdata_err(sta->sdata, "drv_sta_state failed with EIO (sdata not in driver?), state: %d new-state: %d\n",
> +				  sta->sta_state, new_state);
> +			WARN_ON_ONCE(1);
> +		}
> +		else {
> +			if (err)
> +				return err;
> +		}
>  	}
>
>  	/* reflect the change in all state variables */
>

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com