2017-11-04 23:28:12

by Nikolas Nyby

[permalink] [raw]
Subject: rtl8821ae dbi read question

In drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c, we have this
function:

static u8 _rtl8821ae_dbi_read(struct rtl_priv *rtlpriv, u16 addr)
{
u16 read_addr = addr & 0xfffc;
u8 tmp = 0, count = 0, ret = 0;

rtl_write_word(rtlpriv, REG_DBI_ADDR, read_addr);
rtl_write_byte(rtlpriv, REG_DBI_FLAG, 0x2);
tmp = rtl_read_byte(rtlpriv, REG_DBI_FLAG);
count = 0;
while (tmp && count < 20) {
udelay(10);
tmp = rtl_read_byte(rtlpriv, REG_DBI_FLAG);
count++;
}
if (0 == tmp) {
read_addr = REG_DBI_RDATA + addr % 4;
ret = rtl_read_word(rtlpriv, read_addr);
}
return ret;
}

Near the end of the function, in this line:

ret = rtl_read_word(rtlpriv, read_addr);

rtl_read_word() returns a u16, but "ret" is declared as a u8. Is that a
problem, or is this code correct?

What's prompting this question is that I'm getting frequent disconnects
from my access point with my rtl8821ae device. I've experienced this
behavior both before and after the recent change to this function in
commit b8b8b16352cd90c6083033fd4487f04fae935c18.


2017-11-06 02:09:04

by Larry Finger

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/05/2017 03:15 PM, Nik Nyby wrote:
> I also want to note that adding rtl8821ae.aspm=0 to my grub kernel boot command
> doesn't fix my problem. (I'm building this driver into the kernel, not as a
> module). My connection dropping problem is fixed only if I comment out the aspm
> init code in the driver, per this patch:
>   https://patchwork.kernel.org/patch/9951511/

Disabling all of _rtl8821ae_enable_aspm_back_door() may not be wise. We tried
that patch as part of debugging.

That routine consists of two mdio r/w sequences, and 3 dbi r/w sequences. The
third one of the latter is only used for RTL8812AE, thus it can be ignored.

What happens if you try disabling those r/w pairs one at a time? It is possible
that one, or more of them, should be disabled when aspm is zero. It should not
matter whether the driver is built in, or loaded as a module. Please note that I
never build that code in as I always want the option of refreshing the driver
without rebooting.

Larry

2017-11-05 20:53:19

by Nikolas Nyby

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/05/2017 11:50 AM, Larry Finger wrote:
>
> If you read the commit message for commit b8b8b16352cd, you will find
> that we do not understand why using a byte read causes failure, but
> reverting the change so that it is a word read made it function again.
> The "fix" was found by a user doing bisection, and verified on my
> system, where the RTL8821AE has stable connections. The transfer rates
> take wild swings, but the connection is not dropped.
>
> If you NIC behaves differently, then you will need to help us debug the
> problem. If you had a kernel that worked, then you might try bisection.

I just tested on kernels 4.12 and 4.13 - my connection gets dropped on
those kernels as well. Right now I'm using 4.14-rc7.

> If that is not possible, you might try various combinations of module
> parameters aspm, int_clear, and msi
Thanks for these suggestions. Disabling ASPM solves my connection
dropping issue! Previously, I was getting disconnected around 10 seconds
after starting "git clone" on a large git repository. Now, I haven't
seen the connection drop once.

I can help debug the issue. I'm using a Lenovo Ideapad 320-15ABR laptop.
Here's the lspci -vv info for my NIC:

01:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE
802.11ac PCIe Wireless Network Adapter
Subsystem: Lenovo RTL8821AE 802.11ac PCIe Wireless Network Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 33
Region 0: I/O ports at 2000 [size=256]
Region 2: Memory at f0b00000 (64-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: rtl8821ae

2017-11-05 16:50:08

by Larry Finger

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/04/2017 06:27 PM, Nik Nyby wrote:
> In drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c, we have this function:
>
>  static u8 _rtl8821ae_dbi_read(struct rtl_priv *rtlpriv, u16 addr)
>  {
>      u16 read_addr = addr & 0xfffc;
>      u8 tmp = 0, count = 0, ret = 0;
>
>      rtl_write_word(rtlpriv, REG_DBI_ADDR, read_addr);
>      rtl_write_byte(rtlpriv, REG_DBI_FLAG, 0x2);
>      tmp = rtl_read_byte(rtlpriv, REG_DBI_FLAG);
>      count = 0;
>      while (tmp && count < 20) {
>          udelay(10);
>          tmp = rtl_read_byte(rtlpriv, REG_DBI_FLAG);
>          count++;
>      }
>      if (0 == tmp) {
>          read_addr = REG_DBI_RDATA + addr % 4;
>          ret = rtl_read_word(rtlpriv, read_addr);
>      }
>      return ret;
>  }
>
> Near the end of the function, in this line:
>
>   ret = rtl_read_word(rtlpriv, read_addr);
>
> rtl_read_word() returns a u16, but "ret" is declared as a u8. Is that a problem,
> or is this code correct?
>
> What's prompting this question is that I'm getting frequent disconnects from my
> access point with my rtl8821ae device. I've experienced this behavior both
> before and after the recent change to this function in commit
> b8b8b16352cd90c6083033fd4487f04fae935c18.

If you read the commit message for commit b8b8b16352cd, you will find that we do
not understand why using a byte read causes failure, but reverting the change so
that it is a word read made it function again. The "fix" was found by a user
doing bisection, and verified on my system, where the RTL8821AE has stable
connections. The transfer rates take wild swings, but the connection is not dropped.

If you NIC behaves differently, then you will need to help us debug the problem.
If you had a kernel that worked, then you might try bisection. If that is not
possible, you might try various combinations of module parameters aspm,
int_clear, and msi. Those parameters default to 1, so setting one or more of
them to zero is the test. Note, however, that changing any of these might cause
the system to fail, thus I recommend that you test changes dynamically with
modprobe, and not create a file in /etc/modprobe.d/ until you have verified that
changing one or more of these parameters actually helps. That will keep from
making your system unbootable.

Larry

2017-11-05 21:29:54

by James Cameron

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On Sun, Nov 05, 2017 at 04:15:36PM -0500, Nik Nyby wrote:
> I also want to note that adding rtl8821ae.aspm=0 to my grub kernel
> boot command doesn't fix my problem. (I'm building this driver into
> the kernel, not as a module). My connection dropping problem is
> fixed only if I comment out the aspm init code in the driver, per
> this patch:
> https://patchwork.kernel.org/patch/9951511/

Interesting; that patch was for testing, and did more than just aspm=0.

Check for a BIOS update from your vendor. Device registers can be
configured and persist despite warm reboot, which implies there is no
register reset in driver start or device reset, which further implies
that the BIOS can easily affect the outcome.

Check also for different behaviour after cold reboot; that is a power
down for 30 seconds then power up.

Do you yet have a known working kernel to bisect against?

--
James Cameron
http://quozl.netrek.org/

2017-11-05 21:15:59

by Nikolas Nyby

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

I also want to note that adding rtl8821ae.aspm=0 to my grub kernel boot
command doesn't fix my problem. (I'm building this driver into the
kernel, not as a module). My connection dropping problem is fixed only
if I comment out the aspm init code in the driver, per this patch:
https://patchwork.kernel.org/patch/9951511/

2017-11-06 01:46:58

by Larry Finger

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/05/2017 02:53 PM, Nik Nyby wrote:
> On 11/05/2017 11:50 AM, Larry Finger wrote:
>>
>> If you read the commit message for commit b8b8b16352cd, you will find that we
>> do not understand why using a byte read causes failure, but reverting the
>> change so that it is a word read made it function again. The "fix" was found
>> by a user doing bisection, and verified on my system, where the RTL8821AE has
>> stable connections. The transfer rates take wild swings, but the connection is
>> not dropped.
>>
>> If you NIC behaves differently, then you will need to help us debug the
>> problem. If you had a kernel that worked, then you might try bisection.
>
> I just tested on kernels 4.12 and 4.13 - my connection gets dropped on those
> kernels as well. Right now I'm using 4.14-rc7.
>
>> If that is not possible, you might try various combinations of module
>> parameters aspm, int_clear, and msi
> Thanks for these suggestions. Disabling ASPM solves my connection dropping
> issue! Previously, I was getting disconnected around 10 seconds after starting
> "git clone" on a large git repository. Now, I haven't seen the connection drop
> once.
>
> I can help debug the issue. I'm using a Lenovo Ideapad 320-15ABR laptop. Here's
> the lspci -vv info for my NIC:
>
> 01:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac
> PCIe Wireless Network Adapter
>         Subsystem: Lenovo RTL8821AE 802.11ac PCIe Wireless Network Adapter
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 32 bytes
>         Interrupt: pin A routed to IRQ 33
>         Region 0: I/O ports at 2000 [size=256]
>         Region 2: Memory at f0b00000 (64-bit, non-prefetchable) [size=16K]
>         Capabilities: <access denied>
>         Kernel driver in use: rtl8821ae

If "aspm=0" fixes your issue, then there is nothing more to debug. That option
is there to handle cases where the PCI connection does not behave the way that
most do. This condition seems to be most common on Lenovo machines. Frequently,
ASPM problems are accompanied by PCI errors, but that seems not to be the case
for your machine. Create a suitable .conf file in /etc/modprobe.d/ and you
should be OK.

Larry

2017-11-06 13:52:21

by Larry Finger

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/05/2017 10:09 PM, Nik Nyby wrote:
> On 11/05/2017 09:09 PM, Larry Finger wrote:
>> Disabling all of _rtl8821ae_enable_aspm_back_door() may not be wise. We tried
>> that patch as part of debugging.
>>
>> That routine consists of two mdio r/w sequences, and 3 dbi r/w sequences. The
>> third one of the latter is only used for RTL8812AE, thus it can be ignored.
>>
>> What happens if you try disabling those r/w pairs one at a time? It is
>> possible that one, or more of them, should be disabled when aspm is zero.
>
> I first tried disabling only the mdio sequences, then the dbi sequences. I was
> able to reproduce my problem in both cases. But I have found that my problem is
> resolved by only removing the enable_aspm() call, and leaving the call to
> _rtl8821ae_enable_aspm_back_door() intact.
>
> If removing rtlpriv->intf_ops->enable_aspm(hw) is the same as setting aspm=0,
> then it's possible I'm not setting the option correctly. But as far as I can see
> right now, removing enable_aspm() is necessary for my stable connection.

Your results seem to indicate that aspm, which is support_aspm internally, is
not being set to zero. The first executable statement in enable_aspm is

if (!ppsc->support_aspm)
return;

That statement should disable the entire routine; however, you still need to
disable the call to this routine. It is possible that the timing is messed up
when built into the kernel. Building as a module will be instructive.

Larry

2017-11-06 04:09:29

by Nikolas Nyby

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/05/2017 09:09 PM, Larry Finger wrote:
> Disabling all of _rtl8821ae_enable_aspm_back_door() may not be wise. We
> tried that patch as part of debugging.
>
> That routine consists of two mdio r/w sequences, and 3 dbi r/w
> sequences. The third one of the latter is only used for RTL8812AE, thus
> it can be ignored.
>
> What happens if you try disabling those r/w pairs one at a time? It is
> possible that one, or more of them, should be disabled when aspm is
> zero.

I first tried disabling only the mdio sequences, then the dbi sequences.
I was able to reproduce my problem in both cases. But I have found that
my problem is resolved by only removing the enable_aspm() call, and
leaving the call to _rtl8821ae_enable_aspm_back_door() intact.

If removing rtlpriv->intf_ops->enable_aspm(hw) is the same as setting
aspm=0, then it's possible I'm not setting the option correctly. But as
far as I can see right now, removing enable_aspm() is necessary for my
stable connection.

> It should not matter whether the driver is built in, or loaded as
> a module. Please note that I never build that code in as I always want
> the option of refreshing the driver without rebooting.

Thanks for the info - I'll do the same.

2017-11-19 00:39:52

by Nikolas Nyby

[permalink] [raw]
Subject: Re: rtl8821ae dbi read question

On 11/06/2017 08:52 AM, Larry Finger wrote:
>
> Your results seem to indicate that aspm, which is support_aspm
> internally, is not being set to zero. The first executable statement in
> enable_aspm is
>
>         if (!ppsc->support_aspm)
>                 return;
>
> That statement should disable the entire routine; however, you still
> need to disable the call to this routine. It is possible that the timing
> is messed up when built into the kernel. Building as a module will be
> instructive.
>
> Larry
>

FYI, when building this driver as a module, I'm not seeing my unstable
connection problem anymore, with the aspm=0 config set in
/etc/modprobe.d/. Thanks for your help with this.