2022-01-12 22:37:01

by Johannes Berg

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Tue, 2022-01-11 at 20:47 -0500, Len Brown wrote:
>
> I recently installed Fedora 35, and the card worked fine.
> But when I built and booted a 5.15 or 5.16 upstream kernel from src,
> the card fails at initialization and I've not found a way to revive it.
> (dmesg below)

Ouch.

> [ 11.411533] iwlwifi 0000:3a:00.0: Loaded firmware version: 17.3216344376.0 3160-17.ucode
> [ 11.412815] iwlwifi 0000:3a:00.0: 0x00000038 | BAD_COMMAND

...

> [ 11.688317] iwlwifi 0000:3a:00.0: FW error in SYNC CMD PER_CHAIN_LIMIT_OFFSET_CMD

Looks like somehow this slipped through and we're sending a new command
with an old firmware.


Something like this might help?

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 7ee4802a5ef1..56b9363a9111 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -1026,7 +1026,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
/* the ops field is at the same spot for all versions, so set in v1 */
cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);

- if (cmd_ver == 5) {
+ if (cmd_ver == IWL_FW_CMD_VER_UNKNOWN) {
+ return 0;
+ } else if (cmd_ver == 5) {
len = sizeof(cmd.v5);
n_bands = ARRAY_SIZE(cmd.v5.table[0]);
n_profiles = ACPI_NUM_GEO_PROFILES_REV3;


Luca, are you aware of anything in this area?

johannes


2022-01-12 22:39:45

by Luciano Coelho

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

Hi,

On Wed, 2022-01-12 at 09:34 +0100, Johannes Berg wrote:
> On Tue, 2022-01-11 at 20:47 -0500, Len Brown wrote:
> >
> > I recently installed Fedora 35, and the card worked fine.
> > But when I built and booted a 5.15 or 5.16 upstream kernel from src,
> > the card fails at initialization and I've not found a way to revive it.
> > (dmesg below)
>
> Ouch.
>
> > [ 11.411533] iwlwifi 0000:3a:00.0: Loaded firmware version: 17.3216344376.0 3160-17.ucode
> > [ 11.412815] iwlwifi 0000:3a:00.0: 0x00000038 | BAD_COMMAND
>
> ...
>
> > [ 11.688317] iwlwifi 0000:3a:00.0: FW error in SYNC CMD PER_CHAIN_LIMIT_OFFSET_CMD
>
> Looks like somehow this slipped through and we're sending a new command
> with an old firmware.
>
>
> Something like this might help?
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> index 7ee4802a5ef1..56b9363a9111 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> @@ -1026,7 +1026,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
> /* the ops field is at the same spot for all versions, so set in v1 */
> cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
>
> - if (cmd_ver == 5) {
> + if (cmd_ver == IWL_FW_CMD_VER_UNKNOWN) {
> + return 0;
> + } else if (cmd_ver == 5) {
> len = sizeof(cmd.v5);
> n_bands = ARRAY_SIZE(cmd.v5.table[0]);
> n_profiles = ACPI_NUM_GEO_PROFILES_REV3;
>
>
> Luca, are you aware of anything in this area?

I was not aware, not. I looked into this briefly this morning, and I
noticed that we should be doing this:

if (!iwl_sar_geo_support(&mvm->fwrt))
return -EOPNOTSUPP;

...also in the iwl_mvm_sar_geo_init() function.

There was some refactoring in this area, to move the initialization to
the init phase, and that must be causing this. I suspect this patch:

commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
Author: Miri Korenblit <[email protected]>
AuthorDate: Thu Aug 5 14:21:56 2021 +0300
Commit: Luca Coelho <[email protected]>
CommitDate: Thu Aug 26 23:36:10 2021 +0300

iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage

...which was introduced in v5.15.


TL;DR, Johannes patch should help with the problem, but maybe this
would be better?

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 863fec150e53..f13825185094 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
offsetof(struct iwl_geo_tx_power_profiles_cmd_v4, ops) !=
offsetof(struct iwl_geo_tx_power_profiles_cmd_v5, ops));

+ if (!iwl_sar_geo_support(&mvm->fwrt))
+ return -EOPNOTSUPP;
+
/* the ops field is at the same spot for all versions, so set in v1 */
cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);


We had some issues with the versioning and support flags for this
feature in the past, so the iwl_sar_geo_supported() function is not
that straight-forward...

--
Cheers,
Luca.

2022-01-12 22:41:17

by Johannes Berg

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> index 863fec150e53..f13825185094 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
> offsetof(struct iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> offsetof(struct iwl_geo_tx_power_profiles_cmd_v5, ops));
>
> + if (!iwl_sar_geo_support(&mvm->fwrt))
> + return -EOPNOTSUPP;
> +
> /* the ops field is at the same spot for all versions, so set in v1 */
> cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);

I was going to say it should probably return 0, but the caller looks a
bit fishy too?

ret = iwl_mvm_sar_init(mvm);
if (ret == 0)
ret = iwl_mvm_sar_geo_init(mvm);
else if (ret < 0)
goto error;

ret = iwl_mvm_sgom_init(mvm);

should that "else" be removed?

johannes

2022-01-12 22:41:53

by Luciano Coelho

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 10:27 +0100, Johannes Berg wrote:
> >
> > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > index 863fec150e53..f13825185094 100644
> > --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm
> > *mvm)
> >                      offsetof(struct
> > iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> >                      offsetof(struct
> > iwl_geo_tx_power_profiles_cmd_v5, ops));
> >  
> > + if (!iwl_sar_geo_support(&mvm->fwrt))
> > + return -EOPNOTSUPP;
> > +
> >         /* the ops field is at the same spot for all versions, so
> > set in v1 */
> >         cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
>
> I was going to say it should probably return 0, but the caller looks
> a
> bit fishy too?
>
>         ret = iwl_mvm_sar_init(mvm);
>         if (ret == 0)
>                 ret = iwl_mvm_sar_geo_init(mvm);
>         else if (ret < 0)
>                 goto error;
>
>         ret = iwl_mvm_sgom_init(mvm);
>
> should that "else" be removed?

Yeah, I noticed the same thing when I checked the return value... I
don't think we want to abort everything if SAR GEO init failed, so
maybe we should just remove the return value from the function?

--
Cheers,
Luca.

2022-01-12 22:42:45

by Johannes Berg

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 09:33 +0000, Coelho, Luciano wrote:
> On Wed, 2022-01-12 at 10:27 +0100, Johannes Berg wrote:
> > >
> > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > index 863fec150e53..f13825185094 100644
> > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm
> > > *mvm)
> > >                      offsetof(struct
> > > iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> > >                      offsetof(struct
> > > iwl_geo_tx_power_profiles_cmd_v5, ops));
> > >  
> > > + if (!iwl_sar_geo_support(&mvm->fwrt))
> > > + return -EOPNOTSUPP;
> > > +
> > >         /* the ops field is at the same spot for all versions, so
> > > set in v1 */
> > >         cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
> >
> > I was going to say it should probably return 0, but the caller looks
> > a
> > bit fishy too?
> >
> >         ret = iwl_mvm_sar_init(mvm);
> >         if (ret == 0)
> >                 ret = iwl_mvm_sar_geo_init(mvm);
> >         else if (ret < 0)
> >                 goto error;
> >
> >         ret = iwl_mvm_sgom_init(mvm);
> >
> > should that "else" be removed?
>
> Yeah, I noticed the same thing when I checked the return value... I
> don't think we want to abort everything if SAR GEO init failed, so
> maybe we should just remove the return value from the function?
>

Well the only real failure path there is "we cannot send the command",
in which case we might as well abort?

IOW, we already return 0 in the cases where we don't have the data or
something else happened (also for !CONFIG_ACPI).

johannes

2022-01-12 22:43:08

by Luciano Coelho

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 10:35 +0100, Johannes Berg wrote:
> On Wed, 2022-01-12 at 09:33 +0000, Coelho, Luciano wrote:
> > On Wed, 2022-01-12 at 10:27 +0100, Johannes Berg wrote:
> > > >
> > > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > index 863fec150e53..f13825185094 100644
> > > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm
> > > > *mvm)
> > > >                      offsetof(struct
> > > > iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> > > >                      offsetof(struct
> > > > iwl_geo_tx_power_profiles_cmd_v5, ops));
> > > >  
> > > > + if (!iwl_sar_geo_support(&mvm->fwrt))
> > > > + return -EOPNOTSUPP;
> > > > +
> > > >         /* the ops field is at the same spot for all versions, so
> > > > set in v1 */
> > > >         cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
> > >
> > > I was going to say it should probably return 0, but the caller looks
> > > a
> > > bit fishy too?
> > >
> > >         ret = iwl_mvm_sar_init(mvm);
> > >         if (ret == 0)
> > >                 ret = iwl_mvm_sar_geo_init(mvm);
> > >         else if (ret < 0)
> > >                 goto error;
> > >
> > >         ret = iwl_mvm_sgom_init(mvm);
> > >
> > > should that "else" be removed?
> >
> > Yeah, I noticed the same thing when I checked the return value... I
> > don't think we want to abort everything if SAR GEO init failed, so
> > maybe we should just remove the return value from the function?
> >
>
> Well the only real failure path there is "we cannot send the command",
> in which case we might as well abort?
>
> IOW, we already return 0 in the cases where we don't have the data or
> something else happened (also for !CONFIG_ACPI).

Yeah, okay, not being able to send the command is a bit bad. My point
was that SAR GEO is not that critical. It will improve connection in
certain cases in certain locations (by increasing the TX power), but
would still work and abide to regulatory rules...

--
Luca.

2022-01-12 23:22:29

by Len Brown

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, Jan 12, 2022 at 4:05 AM Coelho, Luciano
<[email protected]> wrote:

> There was some refactoring in this area, to move the initialization to
> the init phase, and that must be causing this. I suspect this patch:
>
> commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
> Author: Miri Korenblit <[email protected]>
> AuthorDate: Thu Aug 5 14:21:56 2021 +0300
> Commit: Luca Coelho <[email protected]>
> CommitDate: Thu Aug 26 23:36:10 2021 +0300
>
> iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage
>
> ...which was introduced in v5.15.

good guess!
I have confirmed that upstream 5.14 works, and that 5.15 with the
commit above reverted works.

2022-01-12 23:26:00

by Luciano Coelho

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 09:58 -0500, Len Brown wrote:
> On Wed, Jan 12, 2022 at 4:05 AM Coelho, Luciano
> <[email protected]> wrote:
>
> > There was some refactoring in this area, to move the initialization to
> > the init phase, and that must be causing this. I suspect this patch:
> >
> > commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
> > Author: Miri Korenblit <[email protected]>
> > AuthorDate: Thu Aug 5 14:21:56 2021 +0300
> > Commit: Luca Coelho <[email protected]>
> > CommitDate: Thu Aug 26 23:36:10 2021 +0300
> >
> > iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage
> >
> > ...which was introduced in v5.15.
>
> good guess!
> I have confirmed that upstream 5.14 works, and that 5.15 with the
> commit above reverted works.

Did you have a chance to test the patch I sent? I'm almost sure it
will, but it would be nice to get a confirmation. :)

--
Cheers,
Luca.

2022-01-12 23:51:27

by Len Brown

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

for 5.16, the no clean revert, so I'll grab your patch from e-mail shortly.

On Wed, Jan 12, 2022 at 10:42 AM Coelho, Luciano
<[email protected]> wrote:
>
> On Wed, 2022-01-12 at 09:58 -0500, Len Brown wrote:
> > On Wed, Jan 12, 2022 at 4:05 AM Coelho, Luciano
> > <[email protected]> wrote:
> >
> > > There was some refactoring in this area, to move the initialization to
> > > the init phase, and that must be causing this. I suspect this patch:
> > >
> > > commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
> > > Author: Miri Korenblit <[email protected]>
> > > AuthorDate: Thu Aug 5 14:21:56 2021 +0300
> > > Commit: Luca Coelho <[email protected]>
> > > CommitDate: Thu Aug 26 23:36:10 2021 +0300
> > >
> > > iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage
> > >
> > > ...which was introduced in v5.15.
> >
> > good guess!
> > I have confirmed that upstream 5.14 works, and that 5.15 with the
> > commit above reverted works.
>
> Did you have a chance to test the patch I sent? I'm almost sure it
> will, but it would be nice to get a confirmation. :)
>
> --
> Cheers,
> Luca.



--
Len Brown, Intel Open Source Technology Center

2022-01-12 23:52:33

by Len Brown

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, Jan 12, 2022 at 3:34 AM Johannes Berg <[email protected]> wrote:

> Something like this might help?

Yes!
I have confirmed that this patch allows wifi to work, when applied to
Linux-5.16.

Let me know when you'd like me to test a production patch.

thanks!
Len Brown, Intel Open Source Technology Center

> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> index 7ee4802a5ef1..56b9363a9111 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> @@ -1026,7 +1026,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
> /* the ops field is at the same spot for all versions, so set in v1 */
> cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
>
> - if (cmd_ver == 5) {
> + if (cmd_ver == IWL_FW_CMD_VER_UNKNOWN) {
> + return 0;
> + } else if (cmd_ver == 5) {
> len = sizeof(cmd.v5);
> n_bands = ARRAY_SIZE(cmd.v5.table[0]);
> n_profiles = ACPI_NUM_GEO_PROFILES_REV3;

2022-01-13 12:25:30

by Luciano Coelho

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 13:49 -0500, Len Brown wrote:
> On Wed, Jan 12, 2022 at 3:34 AM Johannes Berg <[email protected]> wrote:
>
> > Something like this might help?
>
> Yes!
> I have confirmed that this patch allows wifi to work, when applied to
> Linux-5.16.
>
> Let me know when you'd like me to test a production patch.

Thanks a lot for testing!

I just sent an official version of the fix (you are CCed). It would be
great if you could test that one as well!

We'll get it into the v5.17-rc series and I CCed stable, so it should
land in v5.15 and v5.16 soon too.

Thanks for reporting and testing!

--
Cheers,
Luca.

2022-01-28 15:33:31

by Luciano Coelho

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

Hi Len,

On Wed, 2022-01-12 at 13:49 -0500, Len Brown wrote:
> On Wed, Jan 12, 2022 at 3:34 AM Johannes Berg <[email protected]> wrote:
>
> > Something like this might help?
>
> Yes!
> I have confirmed that this patch allows wifi to work, when applied to
> Linux-5.16.
>
> Let me know when you'd like me to test a production patch.

Unfortunately neither Johannes' nor my fix was actually fully solvint
the problem. But I think I have the right fix now, but I don't have a
3160 where I could test it.

Can you please test if it works for you with this change?

Thanks!

--
Cheers,
Luca.


diff --git a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
index c5c80e3b95b3..6631b25aaa83 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
@@ -900,10 +900,11 @@ bool iwl_sar_geo_support(struct iwl_fw_runtime *fwrt)
* only one using version 36, so skip this version entirely.
*/
return IWL_UCODE_SERIAL(fwrt->fw->ucode_ver) >= 38 ||
- IWL_UCODE_SERIAL(fwrt->fw->ucode_ver) == 17 ||
- (IWL_UCODE_SERIAL(fwrt->fw->ucode_ver) == 29 &&
- ((fwrt->trans->hw_rev & CSR_HW_REV_TYPE_MSK) ==
- CSR_HW_REV_TYPE_7265D));
+ (IWL_UCODE_SERIAL(fwrt->fw->ucode_ver) == 17 &&
+ fwrt->trans->hw_rev != CSR_HW_REV_TYPE_3160) ||
+ (IWL_UCODE_SERIAL(fwrt->fw->ucode_ver) == 29 &&
+ ((fwrt->trans->hw_rev & CSR_HW_REV_TYPE_MSK) ==
+ CSR_HW_REV_TYPE_7265D));
}
IWL_EXPORT_SYMBOL(iwl_sar_geo_support);

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-csr.h b/drivers/net/wireless/intel/iwlwifi/iwl-csr.h
index a7806a1f2e93..e29593625ea0 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-csr.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-csr.h
@@ -329,6 +329,7 @@ enum {
#define CSR_HW_REV_TYPE_2x00 (0x0000100)
#define CSR_HW_REV_TYPE_105 (0x0000110)
#define CSR_HW_REV_TYPE_135 (0x0000120)
+#define CSR_HW_REV_TYPE_3160 (0x0000164)
#define CSR_HW_REV_TYPE_7265D (0x0000210)
#define CSR_HW_REV_TYPE_NONE (0x00001F0)
#define CSR_HW_REV_TYPE_QNJ (0x0000360)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index a34bad0c2645..84b03ecbb266 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -2016,7 +2016,7 @@ int iwl_mvm_up(struct iwl_mvm *mvm)
ret = iwl_mvm_sar_init(mvm);
if (ret == 0)
ret = iwl_mvm_sar_geo_init(mvm);
- else if (ret < 0)
+ if (ret < 0)
goto error;

ret = iwl_mvm_sgom_init(mvm);