Date: Fri, 18 Apr 2014 17:42:57 -0700
Subject: Re: [PATCH 2/2] mwifiex: don't clear cmd_sent flag in timeout handler
From: John Tobias
To: James Cameron
Cc: Bing Zhao, "linux-wireless@vger.kernel.org", "John W. Linville", Amitkumar Karwar, Avinash Patil, Maithili Hinge, Xinming Hu, Chris Ball
In-Reply-To: <20140419003410.GB14606@us.netrek.org>

Hi James,

May I know what processor you are using?

Thanks,

john

On Fri, Apr 18, 2014 at 5:34 PM, James Cameron wrote:
> On Fri, Apr 18, 2014 at 12:16:07PM -0700, Bing Zhao wrote:
>> Hi James,
>>
>> > > That "adapter->cmd_sent = false" was hoping the firmware is
>> > > still alive and can respond to a new command. The reality is
>> > > that the timeout usually indicates the firmware has already
>> > > hung. Sending another command won't recover it in this case.
>> >
>> > I'm dealing with a firmware hang when more than 13 nodes are in an
>> > ad-hoc IBSS, and I've just found out it isn't entirely a firmware
>> > hang; in that we can see beacons and probe responses from the
>> > card, using tcpdump and monitor mode.
>> >
>> > I'm interested to know if the "firmware hangs" that you experiment
>> > with prevent autonomous RF TX, or if RF TX typically proceeds.
>>
>> It depends. Even if firmware hangs the hardware is still alive.
>> So you could see beacons and probe responses from the card if
>> hardware has been programmed before firmware hangs.
>
> Thanks. I neglected to mention the time period; beacons and probe
> responses are seen for many minutes after the timeout report by the
> driver, and I have not yet tested for how long this lasts. The probe
> responses are in reply to new probe requests. It makes me think the
> card is working fine, apart from not communicating with the host.
>
> HOST_INSTATUS_REG, RD_BITMAP_{U,L} are all zero when read at the
> timeout.
>
> I am reliably reproducing this particular problem.
>
>> > > I guess you are using SDIO chip. If your host controller
>> > > supports MMC_POWER_OFF/UP, you can reset the chip with this
>> > > approach:
>> > >
>> > >         mmc_remove_host(host);
>> > >         /* some delay */
>> > >         mmc_add_host(host);
>> >
>> > Thanks, adding that to my list of things to try, as I am using
>> > SDIO too.
>>
>> This code (with 20ms delay) is already in latest driver. Your
>> platform and controller may require a longer delay.
>
> Thanks. This is the patch I found:
>
>         mwifiex: add support for SDIO card reset
>
> and it isn't in our tree yet.
>
> Yes, we may need to test the delay required. We have a host GPIO
> that drives power to the card. We have discharge clamps on that path
> as well. mmc_* is configured through device-tree to use the GPIO,
> which we use for suspend and resume. We have power-delay-ms
> properties but they aren't used.
>
> I've been testing the patch with 3000ms delay, and additional output:
>
>         pr_err("Resetting card (3000ms) ...\n");
>         mmc_remove_host(reset_host);
>         pr_err("removed host\n");
>         mdelay(3000);
>         pr_err("delayed\n");
>         mmc_add_host(reset_host);
>         pr_err("added host\n");
>
> If the host joins an IBSS with 10 peers, and three more peers are
> added, the wireless LED stays on, and:
>
> [ 105.023274] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id (1397865681.433582) = 0xa4, act = 0x0
> [ 105.033735] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
> [ 105.039533] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
> [ 105.045235] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
> [ 105.045245] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
> [ 105.055866] mwifiex_sdio mmc0:0001:1: last_cmd_index = 3
> [ 105.061148] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 2
> [ 105.066868] mwifiex_sdio mmc0:0001:1: last_event_index = 3
> [ 105.072320] mwifiex_sdio mmc0:0001:1: data_sent=0 cmd_sent=1
> [ 105.077944] mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
> [ 105.083408] mwifiex_sdio: Resetting card (3000ms) ...
> [ 105.083408] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
> [ 105.098195] mwifiex_sdio mmc0:0001:1: cmd timeout
>
> This is mmc_remove_host not returning. I've no idea why yet. +CC cjb.
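
To compare runs, the command id and action can be pulled out of the
mwifiex_cmd_timeout_func line mechanically. Below is a minimal userspace
sketch in C, assuming the line format matches the log quoted above.
parse_timeout_line and struct timeout_info are hypothetical names used
only for illustration; they are not part of the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields taken from one mwifiex_cmd_timeout_func log line. */
struct timeout_info {
    unsigned int cmd_id;  /* command id as logged, e.g. 0xa4 */
    unsigned int action;  /* command action as logged, e.g. 0x0 */
};

/*
 * Hypothetical helper (not a driver function): look for
 * "Timeout cmd id (<time>) = 0xNN, act = 0xNN" anywhere in the line.
 * Returns 0 and fills *out on success, -1 if the pattern is absent.
 */
int parse_timeout_line(const char *line, struct timeout_info *out)
{
    const char *p;

    assert(line != NULL && out != NULL);

    p = strstr(line, "Timeout cmd id");
    if (p == NULL)
        return -1;

    /* Skip the parenthesised wall-clock time that follows the label. */
    p = strchr(p, ')');
    if (p == NULL)
        return -1;

    /* %x accepts an optional 0x prefix, so "0xa4" parses as 0xa4. */
    if (sscanf(p, ") = %x , act = %x", &out->cmd_id, &out->action) != 2)
        return -1;

    return 0;
}
```

Fed the first line of the log above, this should yield cmd_id 0xa4 and
action 0x0. Lines without the "Timeout cmd id" label are rejected.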
>
> If the host joins an IBSS with 13 peers, the wireless LED goes
> off, and:
>
> [ 83.603038] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id (1397865805.48239) = 0x10, act = 0x1
> [ 83.613425] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
> [ 83.613425] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
> [ 83.624911] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
> [ 83.624918] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
> [ 83.635542] mwifiex_sdio mmc0:0001:1: last_cmd_index = 2
> [ 83.640833] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 1
> [ 83.646542] mwifiex_sdio mmc0:0001:1: last_event_index = 2
> [ 83.652002] mwifiex_sdio mmc0:0001:1: data_sent=1 cmd_sent=1
> [ 83.657612] mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
> [ 83.663071] mwifiex_sdio: Resetting card (3000ms) ...
> [ 83.668157] mwifiex_sdio mmc0:0001:1: curr_cmd is still in processing
> [ 83.677902] mwifiex_sdio mmc0:0001:1: failed to get signal information
> [ 83.684925] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
> [ 83.713537] mmc0: card 0001 removed
> [ 83.713537] mwifiex_sdio: removed host
> [ 87.660599] mwifiex_sdio: delayed
> [ 87.703045] mwifiex_sdio: added host
> [ 87.740247] mmc0: new high speed SDIO card at address 0001
> [ 97.911584] mwifiex_sdio mmc0:0001:1: FW failed to be active in time
>
> But bringing the card back to life has failed. It seems to depend on
> what command was outstanding; get RSSI vs MAC multicast address.
>
> Is there another patch needed? I looked through all the patches but
> none seemed to relate to this.
>
> What about forcing a reset instead of using power? We have a host
> GPIO tied to the reset input on the card.
>
> --
> James Cameron
> http://quozl.linux.org.au/
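
The time spent in each step of the reset can be read off the bracketed
timestamps in the second log. Below is a minimal userspace sketch in C,
assuming every line of interest carries a "[ seconds.micros]" prefix as
in the quoted output. log_timestamp and log_elapsed are hypothetical
helper names, not kernel or driver functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the "[ seconds.micros]" prefix of a kernel
 * log line. Anything before the first '[' (such as "> " quote markers)
 * is skipped. Returns 0 and stores the value in *ts, or -1 if no
 * timestamp is found.
 */
int log_timestamp(const char *line, double *ts)
{
    const char *p;

    assert(line != NULL && ts != NULL);

    p = strchr(line, '[');
    if (p == NULL)
        return -1;

    /* The space in the format skips any padding after the bracket. */
    return sscanf(p, "[ %lf", ts) == 1 ? 0 : -1;
}

/*
 * Seconds elapsed from the first log line to the second.
 * Returns 0 on success, -1 if either line lacks a timestamp.
 */
int log_elapsed(const char *first, const char *second, double *delta)
{
    double a, b;

    assert(delta != NULL);

    if (log_timestamp(first, &a) != 0 || log_timestamp(second, &b) != 0)
        return -1;

    *delta = b - a;
    return 0;
}
```

For the log above, the gap between the "removed host" and "delayed"
lines works out to about 3.95 seconds, and "FW failed to be active in
time" follows "new high speed SDIO card" by about 10.2 seconds.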