2022-01-12 22:37:01

by Johannes Berg

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Tue, 2022-01-11 at 20:47 -0500, Len Brown wrote:
>
> I recently installed Fedora 35, and the card worked fine.
> But when I built and booted a 5.15 or 5.16 upstream kernel from src,
> the card fails at initialization and I've not found a way to revive it.
> (dmesg below)

Ouch.

> [ 11.411533] iwlwifi 0000:3a:00.0: Loaded firmware version: 17.3216344376.0 3160-17.ucode
> [ 11.412815] iwlwifi 0000:3a:00.0: 0x00000038 | BAD_COMMAND

...

> [ 11.688317] iwlwifi 0000:3a:00.0: FW error in SYNC CMD PER_CHAIN_LIMIT_OFFSET_CMD

Looks like somehow this slipped through and we're sending a new command
with an old firmware.


Something like this might help?

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 7ee4802a5ef1..56b9363a9111 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -1026,7 +1026,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
/* the ops field is at the same spot for all versions, so set in v1 */
cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);

- if (cmd_ver == 5) {
+ if (cmd_ver == IWL_FW_CMD_VER_UNKNOWN) {
+ return 0;
+ } else if (cmd_ver == 5) {
len = sizeof(cmd.v5);
n_bands = ARRAY_SIZE(cmd.v5.table[0]);
n_profiles = ACPI_NUM_GEO_PROFILES_REV3;


Luca, are you aware of anything in this area?

johannes


2022-01-12 22:39:45

by Coelho, Luciano

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

Hi,

On Wed, 2022-01-12 at 09:34 +0100, Johannes Berg wrote:
> On Tue, 2022-01-11 at 20:47 -0500, Len Brown wrote:
> >
> > I recently installed Fedora 35, and the card worked fine.
> > But when I built and booted a 5.15 or 5.16 upstream kernel from src,
> > the card fails at initialization and I've not found a way to revive it.
> > (dmesg below)
>
> Ouch.
>
> > [ 11.411533] iwlwifi 0000:3a:00.0: Loaded firmware version: 17.3216344376.0 3160-17.ucode
> > [ 11.412815] iwlwifi 0000:3a:00.0: 0x00000038 | BAD_COMMAND
>
> ...
>
> > [ 11.688317] iwlwifi 0000:3a:00.0: FW error in SYNC CMD PER_CHAIN_LIMIT_OFFSET_CMD
>
> Looks like somehow this slipped through and we're sending a new command
> with an old firmware.
>
>
> Something like this might help?
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> index 7ee4802a5ef1..56b9363a9111 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> @@ -1026,7 +1026,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
> /* the ops field is at the same spot for all versions, so set in v1 */
> cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
>
> - if (cmd_ver == 5) {
> + if (cmd_ver == IWL_FW_CMD_VER_UNKNOWN) {
> + return 0;
> + } else if (cmd_ver == 5) {
> len = sizeof(cmd.v5);
> n_bands = ARRAY_SIZE(cmd.v5.table[0]);
> n_profiles = ACPI_NUM_GEO_PROFILES_REV3;
>
>
> Luca, are you aware of anything in this area?

I was not aware, not. I looked into this briefly this morning, and I
noticed that we should be doing this:

if (!iwl_sar_geo_support(&mvm->fwrt))
return -EOPNOTSUPP;

...also in the iwl_mvm_sar_geo_init() function.

There was some refactoring in this area, to move the initialization to
the init phase, and that must be causing this. I suspect this patch:

commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
Author: Miri Korenblit <[email protected]>
AuthorDate: Thu Aug 5 14:21:56 2021 +0300
Commit: Luca Coelho <[email protected]>
CommitDate: Thu Aug 26 23:36:10 2021 +0300

iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage

...which was introduced in v5.15.


TL;DR, Johannes patch should help with the problem, but maybe this
would be better?

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 863fec150e53..f13825185094 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
offsetof(struct iwl_geo_tx_power_profiles_cmd_v4, ops) !=
offsetof(struct iwl_geo_tx_power_profiles_cmd_v5, ops));

+ if (!iwl_sar_geo_support(&mvm->fwrt))
+ return -EOPNOTSUPP;
+
/* the ops field is at the same spot for all versions, so set in v1 */
cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);


We had some issues with the versioning and support flags for this
feature in the past, so the iwl_sar_geo_supported() function is not
that straight-forward...

--
Cheers,
Luca.

2022-01-12 22:41:17

by Johannes Berg

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> index 863fec150e53..f13825185094 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
> offsetof(struct iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> offsetof(struct iwl_geo_tx_power_profiles_cmd_v5, ops));
>
> + if (!iwl_sar_geo_support(&mvm->fwrt))
> + return -EOPNOTSUPP;
> +
> /* the ops field is at the same spot for all versions, so set in v1 */
> cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);

I was going to say it should probably return 0, but the caller looks a
bit fishy too?

ret = iwl_mvm_sar_init(mvm);
if (ret == 0)
ret = iwl_mvm_sar_geo_init(mvm);
else if (ret < 0)
goto error;

ret = iwl_mvm_sgom_init(mvm);

should that "else" be removed?

johannes

2022-01-12 22:41:53

by Coelho, Luciano

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 10:27 +0100, Johannes Berg wrote:
> >
> > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > index 863fec150e53..f13825185094 100644
> > --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm
> > *mvm)
> >                      offsetof(struct
> > iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> >                      offsetof(struct
> > iwl_geo_tx_power_profiles_cmd_v5, ops));
> >  
> > + if (!iwl_sar_geo_support(&mvm->fwrt))
> > + return -EOPNOTSUPP;
> > +
> >         /* the ops field is at the same spot for all versions, so
> > set in v1 */
> >         cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
>
> I was going to say it should probably return 0, but the caller looks
> a
> bit fishy too?
>
>         ret = iwl_mvm_sar_init(mvm);
>         if (ret == 0)
>                 ret = iwl_mvm_sar_geo_init(mvm);
>         else if (ret < 0)
>                 goto error;
>
>         ret = iwl_mvm_sgom_init(mvm);
>
> should that "else" be removed?

Yeah, I noticed the same thing when I checked the return value... I
don't think we want to abort everything if SAR GEO init failed, so
maybe we should just remove the return value from the function?

--
Cheers,
Luca.

2022-01-12 22:42:45

by Johannes Berg

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 09:33 +0000, Coelho, Luciano wrote:
> On Wed, 2022-01-12 at 10:27 +0100, Johannes Berg wrote:
> > >
> > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > index 863fec150e53..f13825185094 100644
> > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm
> > > *mvm)
> > >                      offsetof(struct
> > > iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> > >                      offsetof(struct
> > > iwl_geo_tx_power_profiles_cmd_v5, ops));
> > >  
> > > + if (!iwl_sar_geo_support(&mvm->fwrt))
> > > + return -EOPNOTSUPP;
> > > +
> > >         /* the ops field is at the same spot for all versions, so
> > > set in v1 */
> > >         cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
> >
> > I was going to say it should probably return 0, but the caller looks
> > a
> > bit fishy too?
> >
> >         ret = iwl_mvm_sar_init(mvm);
> >         if (ret == 0)
> >                 ret = iwl_mvm_sar_geo_init(mvm);
> >         else if (ret < 0)
> >                 goto error;
> >
> >         ret = iwl_mvm_sgom_init(mvm);
> >
> > should that "else" be removed?
>
> Yeah, I noticed the same thing when I checked the return value... I
> don't think we want to abort everything if SAR GEO init failed, so
> maybe we should just remove the return value from the function?
>

Well the only real failure path there is "we cannot send the command",
in which case we might as well abort?

IOW, we already return 0 in the cases where we don't have the data or
something else happened (also for !CONFIG_ACPI).

johannes

2022-01-12 22:43:08

by Coelho, Luciano

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 10:35 +0100, Johannes Berg wrote:
> On Wed, 2022-01-12 at 09:33 +0000, Coelho, Luciano wrote:
> > On Wed, 2022-01-12 at 10:27 +0100, Johannes Berg wrote:
> > > >
> > > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > index 863fec150e53..f13825185094 100644
> > > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> > > > @@ -834,6 +834,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm
> > > > *mvm)
> > > >                      offsetof(struct
> > > > iwl_geo_tx_power_profiles_cmd_v4, ops) !=
> > > >                      offsetof(struct
> > > > iwl_geo_tx_power_profiles_cmd_v5, ops));
> > > >  
> > > > + if (!iwl_sar_geo_support(&mvm->fwrt))
> > > > + return -EOPNOTSUPP;
> > > > +
> > > >         /* the ops field is at the same spot for all versions, so
> > > > set in v1 */
> > > >         cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
> > >
> > > I was going to say it should probably return 0, but the caller looks
> > > a
> > > bit fishy too?
> > >
> > >         ret = iwl_mvm_sar_init(mvm);
> > >         if (ret == 0)
> > >                 ret = iwl_mvm_sar_geo_init(mvm);
> > >         else if (ret < 0)
> > >                 goto error;
> > >
> > >         ret = iwl_mvm_sgom_init(mvm);
> > >
> > > should that "else" be removed?
> >
> > Yeah, I noticed the same thing when I checked the return value... I
> > don't think we want to abort everything if SAR GEO init failed, so
> > maybe we should just remove the return value from the function?
> >
>
> Well the only real failure path there is "we cannot send the command",
> in which case we might as well abort?
>
> IOW, we already return 0 in the cases where we don't have the data or
> something else happened (also for !CONFIG_ACPI).

Yeah, okay, not being able to send the command is a bit bad. My point
was that SAR GEO is not that critical. It will improve connection in
certain cases in certain locations (by increasing the TX power), but
would still work and abide to regulatory rules...

--
Luca.

2022-01-12 23:22:29

by Len Brown

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, Jan 12, 2022 at 4:05 AM Coelho, Luciano
<[email protected]> wrote:

> There was some refactoring in this area, to move the initialization to
> the init phase, and that must be causing this. I suspect this patch:
>
> commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
> Author: Miri Korenblit <[email protected]>
> AuthorDate: Thu Aug 5 14:21:56 2021 +0300
> Commit: Luca Coelho <[email protected]>
> CommitDate: Thu Aug 26 23:36:10 2021 +0300
>
> iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage
>
> ...which was introduced in v5.15.

good guess!
I have confirmed that upstream 5.14 works, and that 5.15 with the
commit above reverted works.

2022-01-12 23:26:00

by Coelho, Luciano

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 09:58 -0500, Len Brown wrote:
> On Wed, Jan 12, 2022 at 4:05 AM Coelho, Luciano
> <[email protected]> wrote:
>
> > There was some refactoring in this area, to move the initialization to
> > the init phase, and that must be causing this. I suspect this patch:
> >
> > commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
> > Author: Miri Korenblit <[email protected]>
> > AuthorDate: Thu Aug 5 14:21:56 2021 +0300
> > Commit: Luca Coelho <[email protected]>
> > CommitDate: Thu Aug 26 23:36:10 2021 +0300
> >
> > iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage
> >
> > ...which was introduced in v5.15.
>
> good guess!
> I have confirmed that upstream 5.14 works, and that 5.15 with the
> commit above reverted works.

Did you have a chance to test the patch I sent? I'm almost sure it
will, but it would be nice to get a confirmation. :)

--
Cheers,
Luca.

2022-01-12 23:51:27

by Len Brown

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

for 5.16, the no clean revert, so I'll grab your patch from e-mail shortly.

On Wed, Jan 12, 2022 at 10:42 AM Coelho, Luciano
<[email protected]> wrote:
>
> On Wed, 2022-01-12 at 09:58 -0500, Len Brown wrote:
> > On Wed, Jan 12, 2022 at 4:05 AM Coelho, Luciano
> > <[email protected]> wrote:
> >
> > > There was some refactoring in this area, to move the initialization to
> > > the init phase, and that must be causing this. I suspect this patch:
> > >
> > > commit 78a19d5285d93e281b40c3b8d5a959fbbd2fe006
> > > Author: Miri Korenblit <[email protected]>
> > > AuthorDate: Thu Aug 5 14:21:56 2021 +0300
> > > Commit: Luca Coelho <[email protected]>
> > > CommitDate: Thu Aug 26 23:36:10 2021 +0300
> > >
> > > iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage
> > >
> > > ...which was introduced in v5.15.
> >
> > good guess!
> > I have confirmed that upstream 5.14 works, and that 5.15 with the
> > commit above reverted works.
>
> Did you have a chance to test the patch I sent? I'm almost sure it
> will, but it would be nice to get a confirmation. :)
>
> --
> Cheers,
> Luca.



--
Len Brown, Intel Open Source Technology Center

2022-01-12 23:52:33

by Len Brown

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, Jan 12, 2022 at 3:34 AM Johannes Berg <[email protected]> wrote:

> Something like this might help?

Yes!
I have confirmed that this patch allows wifi to work, when applied to
Linux-5.16.

Let me know when you'd like me to test a production patch.

thanks!
Len Brown, Intel Open Source Technology Center

> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> index 7ee4802a5ef1..56b9363a9111 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
> @@ -1026,7 +1026,9 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
> /* the ops field is at the same spot for all versions, so set in v1 */
> cmd.v1.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
>
> - if (cmd_ver == 5) {
> + if (cmd_ver == IWL_FW_CMD_VER_UNKNOWN) {
> + return 0;
> + } else if (cmd_ver == 5) {
> len = sizeof(cmd.v5);
> n_bands = ARRAY_SIZE(cmd.v5.table[0]);
> n_profiles = ACPI_NUM_GEO_PROFILES_REV3;

2022-01-13 12:25:30

by Coelho, Luciano

[permalink] [raw]
Subject: Re: iwlwifi 0000:3a:00.0: Microcode SW error detected. Restarting 0x2000000.

On Wed, 2022-01-12 at 13:49 -0500, Len Brown wrote:
> On Wed, Jan 12, 2022 at 3:34 AM Johannes Berg <[email protected]> wrote:
>
> > Something like this might help?
>
> Yes!
> I have confirmed that this patch allows wifi to work, when applied to
> Linux-5.16.
>
> Let me know when you'd like me to test a production patch.

Thanks a lot for testing!

I just sent an official version of the fix (you are CCed). It would be
great if you could test that one as well!

We'll get it into the v5.17-rc series and I CCed stable, so it should
land in v5.15 and v5.16 soon too.

Thanks for reporting and testing!

--
Cheers,
Luca.