Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp5142209pxj; Wed, 9 Jun 2021 10:04:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzYAVP6VaevgOispa3BjPtThwACNG/zfNTFohrshXB/8JHp9MZXgvS9lMNBimb8yI0M2d3B X-Received: by 2002:a17:906:d85:: with SMTP id m5mr821120eji.55.1623258277787; Wed, 09 Jun 2021 10:04:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623258277; cv=none; d=google.com; s=arc-20160816; b=jCr77XYVBnQEuKGs8YhcKkLoOYawKWPu+NysyTAfgjCE6UGwSrF3CVlPHEOsUBMo3b LuoMx7Ss+++GDCgUgzbjkeZ+Ss0SDFd0kd3I+Lx2GfXOwLm9UO6vJXUX+zS5fJEIdEQb 4KAn5UF2m4JR5XfiYJ/ZCZR+ZTtJSpTGn2oUmvrYZ5akq7Nn4ZoygAFlB1gWwYrUm9Q5 RSRW36a0nzql9VKg9BBqVtbEN+FSQyX3oCRLMq+9rUFQZEChs7Y9dYHXcw2VTacl4aU7 049dv0Mzwiu1/3ZWb+AwATPJIagu6IlXp5+WTotek0ZhsunqlHxYY0KemO+rlfKlRRDW eRrg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=futngJHh7CtFBTJR94UbQKkh6INPUohMLZeIfsqI8Mo=; b=CpEsoj2QQdueEC1LV2o1f4mhrNM+0zp3l1YzwgxU4Z12weJ+lVZ3RU3fP/Oojc1OAi bQwdap/mEXX3P92SFi2WfwaYsc+q5boAVaa5OrJGbRSZ+0r12KGauP2iDzVdqWVvSA0s KjEijhOCYO5e3uLPLb1gestY4wOdjqZzJ/PYOmCkYYchzEuIUC6eAGzvFSXg4sPYyPt5 iPZ14BD7FRe+ypJgCMnp8ZUUfu14WcRVnkMGjN7Q1ADPn6qRU3G319rpZPoHTe5TL0gm ERHQzGK6jlTGGxe7TZsgJzWy6qWNLM+gEovEYFURNCRzFn/ldGguBdxHAxXXrYBTSKhP ll0g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h7si231050ejo.519.2021.06.09.10.04.06; Wed, 09 Jun 2021 10:04:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232719AbhFIBuC (ORCPT + 99 others); Tue, 8 Jun 2021 21:50:02 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:53039 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231438AbhFIBuC (ORCPT ); Tue, 8 Jun 2021 21:50:02 -0400 Received: from mail-oo1-f72.google.com ([209.85.161.72]) by youngberry.canonical.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1lqnKC-0004kX-0P for linux-kernel@vger.kernel.org; Wed, 09 Jun 2021 01:48:08 +0000 Received: by mail-oo1-f72.google.com with SMTP id j9-20020a4ad1890000b0290249480f62d9so9467877oor.0 for ; Tue, 08 Jun 2021 18:48:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=futngJHh7CtFBTJR94UbQKkh6INPUohMLZeIfsqI8Mo=; b=K+/mbMBYkg4Pr0SWpSChpJLrAhTzeaPMnqvAjSn/AtYk8NgdAUsI0R6T2/8oDeQrwd UnB8D1p78w57XGI2MMx4I/wDvw0tInMx3hjpt368dGdilNQW8t3TDUUVEVPnED6ph/UD /0nEqwtRP1ldZQmfyasg9TmrDH/CfJ1eHL+16u1bcWEEmhSop3YjNDSOcRV9WKwGoqZD /OmM2+j8KWrzfzRVQ4HdWZ5xA4aAiLQINjOVHvgLt6clmx66OEVEYCxamUyeRE1Ny+Tu ay8XZQ/tG7E5hKTAAECSdnN7G2OT9dgfLsQjCUefWk9gZaquzXe3EmJY/WSuPK2n5dYf 00Pw== X-Gm-Message-State: AOAM533W1YEYpKsscWfA2HF8tJD1/Qa3lYP63IO59QJLVpzHubQwekXp 4HspVj1jAsM0kwYxHRU2wglbrSVRhLwUQbyUm+cQqt5h+ITTUz2IlNYXnSysjPBiR7AUZq7xaXa MY4FTTiJDXoi/rCBKytGEnHbw6qtRFYbpl/09VbksYr+mkFffeGbPIIOXew== X-Received: by 2002:a9d:4115:: with SMTP id o21mr4755702ote.3.1623203286862; Tue, 08 Jun 2021 18:48:06 -0700 (PDT) X-Received: by 2002:a9d:4115:: with SMTP id o21mr4755685ote.3.1623203286481; Tue, 08 Jun 2021 18:48:06 -0700 (PDT) MIME-Version: 1.0 References: <20210608032207.2923574-1-koba.ko@canonical.com> <84eb168e-58ff-0350-74e2-c55249eb258c@gmail.com> <7a36c032-38fa-6aa6-fa0f-c3664850d8ea@gmail.com> <4508fbcb-f8ec-3805-4aa3-eeea4975de31@gmail.com> In-Reply-To: <4508fbcb-f8ec-3805-4aa3-eeea4975de31@gmail.com> From: Koba Ko Date: Wed, 9 Jun 2021 09:47:55 +0800 Message-ID: Subject: Re: [PATCH] [v2] r8169: Use PHY_POLL when RTL8106E enable ASPM To: Heiner Kallweit Cc: "David S. Miller" , Jakub Kicinski , netdev@vger.kernel.org, Linux Kernel Mailing List Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 9, 2021 at 4:58 AM Heiner Kallweit wrote: > > On 08.06.2021 16:17, Koba Ko wrote: > > On Tue, Jun 8, 2021 at 9:45 PM Heiner Kallweit wrote: > >> > >> On 08.06.2021 12:43, Koba Ko wrote: > >>> On Tue, Jun 8, 2021 at 4:00 PM Heiner Kallweit wrote: > >>>> > >>>> On 08.06.2021 05:22, Koba Ko wrote: > >>>>> For RTL8106E, it's a Fast-ethernet chip. > >>>>> If ASPM is enabled, the link chang interrupt wouldn't be triggered > >>>>> immediately and must wait a very long time to get link change interrupt. > >>>>> Even the link change interrupt isn't triggered, the phy link is already > >>>>> established. > >>>>> > >>>>> Use PHY_POLL to watch the status of phy link and disable > >>>>> the link change interrupt when ASPM is enabled on RTL8106E. > >>>>> > >>>>> v2: Instead use PHY_POLL and identify 8106E by RTL_GIGA_MAC_VER_39. > >>>>> > >>>> > >>>> Still the issue description doesn't convince me that it's a hw bug > >>>> with the respective chip version. What has been stated so far: > >>>> > >>>> 1. (and most important) Issue doesn't occur in mainline because ASPM > >>>> is disabled in mainline for r8169. Issue occurs only with a > >>>> downstream kernel with ASPM enabled for r8169. > >>> > >>> mainline kernel and enable L1, the issue is also observed. > >>> > >> Yes, but enabling L1 via sysfs is at own risk. > > > > but we could have a workaround if hw have an aspm issue. > > > >> > >>>> 2. Issue occurs only with ASPM L1.1 not disabled, even though this chip > >>>> version doesn't support L1 sub-states. Just L0s/L1 don't trigger > >>>> the issue. > >>>> The NIC doesn't announce L1.1 support, therefore PCI core won't > >>>> enable L1 sub-states on the PCIe link between NIC and upstream > >>>> PCI bridge. > >>> > >>> More precisely, when L1 is enabled, the issue would be triggered. > >>> For RTL8106E, > >>> 1. Only disable L0s, pcie_aspm_enabled return 1, issue is triggered. > >>> 2. Only disable L1_1, pcie_aspm_enabled return 1, issue is triggered. > >>> > >>> 3. Only disable L1, pcie_aspm_enabled return 0, issue is not triggered. > >>> > >>>> > >>>> 3. Issue occurs only with a GBit-capable link partner. 100MBit link > >>>> partners are fine. Not clear whether issue occurs with a specific > >>>> Gbit link partner only or with GBit-capable link partners in general. > >>>> > >>>> 4. Only link-up interrupt is affected. Not link-down and not interrupts > >>>> triggered by other interrupt sources. > >>>> > >>>> 5. Realtek couldn't confirm that there's such a hw bug on RTL8106e. > >>>> > >>>> One thing that hasn't been asked yet: > >>>> Does issue occur always if you re-plug the cable? Or only on boot? > >>>> I'm asking because in the dmesg log you attached to the bugzilla issue > >>>> the following looks totally ok. > >>>> > >>>> [ 61.651643] r8169 0000:01:00.0 enp1s0: Link is Down > >>>> [ 63.720015] r8169 0000:01:00.0 enp1s0: Link is Up - 100Mbps/Full - flow control rx/tx > >>>> [ 66.685499] r8169 0000:01:00.0 enp1s0: Link is Down > >>> > >>> Once the link is up, > >>> 1. If cable is unplug&plug immediately, you wouldn't see the issue. > >>> 2. Unplug cable and wait a long time (~1Mins), then plug the cable, > >>> the issue appears again. > >>> > >> This sounds runtime-pm-related. After 10s the NIC runtime-suspends, > >> and once the cable is re-plugged a PME is triggered that lets the > >> PCI core return the PCIe link from D3hot to D0. > >> If you re-plug the cable after such a longer time, do you see the > >> PCIe PME immediately in /proc/interrupts? > > > > I don't know which irq number is for RTL8106e PME, but check all PME > > in /proc/interrupt, > > > > There's no interrupt increase on all PME entries and rtl8106e irq entry(irq 32). > > > >> > >> And if you set /sys/class/net//power/control to "on", does the > >> issue still occur? > > > > Yes, after echo "on" to /sys/class/net//power/control and > > replugging the cable, the issue is still caught. > >> > >>>> > >>>>> Signed-off-by: Koba Ko > >>>>> --- > >>>>> drivers/net/ethernet/realtek/r8169_main.c | 21 +++++++++++++++++++-- > >>>>> 1 file changed, 19 insertions(+), 2 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > >>>>> index 2c89cde7da1e..a59cbaef2839 100644 > >>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c > >>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c > >>>>> @@ -4914,6 +4914,19 @@ static const struct dev_pm_ops rtl8169_pm_ops = { > >>>>> > >>>>> #endif /* CONFIG_PM */ > >>>>> > >>>>> +static int rtl_phy_poll_quirk(struct rtl8169_private *tp) > >>>>> +{ > >>>>> + struct pci_dev *pdev = tp->pci_dev; > >>>>> + > >>>>> + if (!pcie_aspm_enabled(pdev)) > >>>> > >>>> That's the wrong call. According to what you said earlier you want to > >>>> check for L1 sub-states, not for ASPM in general. > >>> > >>> As per described above, that's why use pcie_aspm_enabled here. > >>> > >>>> > >>>>> + return 0; > >>>>> + > >>>>> + if (tp->mac_version == RTL_GIGA_MAC_VER_39) > >>>>> + return 1; > >>>>> + > >>>>> + return 0; > >>>>> +} > >>>>> + > >>>>> static void rtl_wol_shutdown_quirk(struct rtl8169_private *tp) > >>>>> { > >>>>> /* WoL fails with 8168b when the receiver is disabled. */ > >>>>> @@ -4991,7 +5004,10 @@ static const struct net_device_ops rtl_netdev_ops = { > >>>>> > >>>>> static void rtl_set_irq_mask(struct rtl8169_private *tp) > >>>>> { > >>>>> - tp->irq_mask = RxOK | RxErr | TxOK | TxErr | LinkChg; > >>>>> + tp->irq_mask = RxOK | RxErr | TxOK | TxErr; > >>>>> + > >>>>> + if (!rtl_phy_poll_quirk(tp)) > >>>>> + tp->irq_mask |= LinkChg; > >>>>> > >>>>> if (tp->mac_version <= RTL_GIGA_MAC_VER_06) > >>>>> tp->irq_mask |= SYSErr | RxOverflow | RxFIFOOver; > >>>>> @@ -5085,7 +5101,8 @@ static int r8169_mdio_register(struct rtl8169_private *tp) > >>>>> new_bus->name = "r8169"; > >>>>> new_bus->priv = tp; > >>>>> new_bus->parent = &pdev->dev; > >>>>> - new_bus->irq[0] = PHY_MAC_INTERRUPT; > >>>>> + new_bus->irq[0] = > >>>>> + (rtl_phy_poll_quirk(tp) ? PHY_POLL : PHY_MAC_INTERRUPT); > >>>>> snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x", pci_dev_id(pdev)); > >>>>> > >>>>> new_bus->read = r8169_mdio_read_reg; > >>>>> > >>>> > >> > > The r8101 vendor driver applies a special setting for RTL8106e if ASPM is enabled. > Not sure whether it's related but it's worth a try. Could you please check whether > the following makes a difference? After applying this patch, it can't help relieve the issue. The 8101 vendor driver also use polling method to watch the link change event. Tried to disable polling method and enable LinkChg Irq, but it also can't get LinkChg interrupt. > > diff --git a/drivers/net/ethernet/realtek/r8169_phy_config.c b/drivers/net/ethernet/realtek/r8169_phy_config.c > index 50f0f621b..60014b9c4 100644 > --- a/drivers/net/ethernet/realtek/r8169_phy_config.c > +++ b/drivers/net/ethernet/realtek/r8169_phy_config.c > @@ -1153,6 +1153,7 @@ static void rtl8106e_hw_phy_config(struct rtl8169_private *tp, > r8169_apply_firmware(tp); > > rtl_writephy_batch(phydev, phy_reg_init); > + phy_write(phydev, 0x18, 0x8310); > } > > static void rtl8125_legacy_force_mode(struct phy_device *phydev) > -- > 2.32.0 > >