Date: Tue, 21 Aug 2018 07:47:04 +0200
From: Lukas Wunner
To: Bjorn Helgaas
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, mmyangfl@gmail.com
Subject: Re: Enumeration issue with QCA9005 AR9462
Message-ID: <20180821054704.jlqk5zrlbbsjsd4g@wunner.de>
In-Reply-To: <20180820230624.GB154536@bhelgaas-glaptop.roam.corp.google.com>

On Mon, Aug 20, 2018 at 06:06:24PM -0500, Bjorn Helgaas wrote:
> mmyangfl@gmail.com reported a problem [1]: on v4.17, a QCA9005 AR9462
> wifi device was present at boot, but disappeared after suspend/resume.
>
> He also tested a recent kernel (5c60a7389d79, from Thu Aug 16),
> where the suspend/resume problem doesn't seem to happen, but the wifi
> device isn't enumerated correctly at boot-time.
>
>   [    0.928714] pciehp 0000:04:00.0:pcie204: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
>   [    0.928752] pciehp 0000:04:00.0:pcie204: Slot(0-1): Card not present
>   [    0.928811] pciehp 0000:04:00.0:pcie204: Slot(0-1): Link Up
>   [    0.928815] pciehp 0000:04:00.0:pcie204: Slot(0-1): No adapter
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=200839
> [2] https://bugzilla.kernel.org/attachment.cgi?id=277923

The hardware appears to be broken in that the Presence Detect State bit
in the Slot Status register is 0 (Slot Empty) even though the slot is
occupied.

Thus, as of v4.19, pciehp will initially consider the slot to be in
ON_STATE when it probes (because there are enumerated children).
It then looks at the PDS bit, sees that it's 0, believes that there is
no longer anything in the slot and synthesizes a Presence Detect Changed
event to bring down the slot.  The IRQ thread then removes the device
in the slot, sees that the link is up, tries to bring the slot up again,
but that fails because __pciehp_enable_slot() complains that the
Presence Detect State bit isn't set ("No adapter").  The slot is then
considered to be in OFF_STATE by pciehp, even though the rescan made
the device reappear behind pciehp's back.

On resume from system sleep, pciehp sees that the Presence Detect State
bit in the Slot Status register is still 0, and because it's already in
OFF_STATE, there's nothing to do.
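
To make the sequence above easier to follow, here is a toy model of it in
plain C.  This is only a sketch of the behaviour as described in this
mail, not code from pciehp: struct slot_model and all model_*() helpers
are hypothetical names, and model_rescan() merely stands in for whatever
rescan made the device reappear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the two slot states named above. */
enum slot_state { OFF_STATE, ON_STATE };

struct slot_model {
	bool presence_detect;	/* PDS bit in Slot Status (stuck at 0 here) */
	bool link_active;	/* Data Link Layer Link Active in Link Status */
	bool has_children;	/* devices currently enumerated below the port */
	enum slot_state state;	/* what the hotplug driver believes */
};

/* Bringing the slot up is refused unless PDS is set ("No adapter"). */
static bool model_enable_slot(struct slot_model *s)
{
	if (!s->presence_detect)
		return false;
	s->has_children = true;
	s->state = ON_STATE;
	return true;
}

/* Probe as described for v4.19. */
static void model_probe(struct slot_model *s)
{
	/* Enumerated children mean the slot starts out in ON_STATE. */
	s->state = s->has_children ? ON_STATE : OFF_STATE;

	if (s->state == ON_STATE && !s->presence_detect) {
		/* Synthesized Presence Detect Changed: bring the slot down. */
		s->has_children = false;
		s->state = OFF_STATE;

		/* The link is still up, so try to bring the slot up again. */
		if (s->link_active)
			model_enable_slot(s);
	}
}

/* Stand-in for a rescan that happens outside the hotplug driver. */
static void model_rescan(struct slot_model *s)
{
	if (s->link_active)
		s->has_children = true;	/* driver's state is not updated */
}

/* Resume: PDS is 0, so the slot is only touched if believed to be on. */
static void model_resume(struct slot_model *s)
{
	if (!s->presence_detect && s->state == ON_STATE) {
		s->has_children = false;
		s->state = OFF_STATE;
	}
}
```

Starting from presence_detect = false, link_active = true and
has_children = true, model_probe() leaves the slot in OFF_STATE with no
children, model_rescan() brings the device back without changing the
state, and model_resume() then does nothing.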

Up until v4.18, an unoccupied slot was only brought down on resume:

	/* Check if slot is occupied */
	pciehp_get_adapter_status(slot, &status);
	mutex_lock(&slot->hotplug_lock);
	if (status)
		pciehp_enable_slot(slot);
	else
		pciehp_disable_slot(slot);
	mutex_unlock(&slot->hotplug_lock);

From v4.19, this is now also done on probe for consistency.

The above hypothesis is confirmed by the lspci -vv output:

    LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive+ BWMgmt+ ABWMgmt-
                                                            ^^^^^^^^^
    SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                                                     ^^^^^^^^

Possible solutions:

(a) Be lenient towards broken hardware and accept DLActive+ as a proxy
    for PresDet+.

(b) Add a blacklist to pciehp such that it doesn't bind to [1ae9:0200].

The bug reporter writes that "it's a single Half Mini PCIe card, with
two chipsets (Wil6110? + AR9462) combined by a PCIe hub".  This sounds
like it's not really hotpluggable.  (Is Mini PCIe hotplug capable at
all?)

Let me go through the driver and see if (a) is feasible and how
intrusive it would be.

Thanks,

Lukas
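
For reference, a rough sketch of what the check in option (a) might boil
down to.  This is not a patch against pciehp: the helper names are made
up for illustration and the register values are passed in as plain
integers.  The two bit masks are the ones the PCIe spec defines for
Presence Detect State and Data Link Layer Link Active (the same values
as PCI_EXP_SLTSTA_PDS and PCI_EXP_LNKSTA_DLLLA in the kernel's
pci_regs.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLTSTA_PDS	0x0040	/* Slot Status: Presence Detect State */
#define LNKSTA_DLLLA	0x2000	/* Link Status: Data Link Layer Link Active */

/* Hypothetical helper: strict check, trusts only the PDS bit. */
static bool slot_occupied_strict(uint16_t sltsta)
{
	return (sltsta & SLTSTA_PDS) != 0;
}

/*
 * Hypothetical helper: lenient check as in option (a), treating an
 * active data link as evidence that the slot is occupied even if the
 * PDS bit is clear.
 */
static bool slot_occupied_lenient(uint16_t sltsta, uint16_t lnksta)
{
	return (sltsta & SLTSTA_PDS) != 0 || (lnksta & LNKSTA_DLLLA) != 0;
}
```

With PresDet- and DLActive+ as in the lspci output above, the strict
check reports an empty slot while the lenient one reports it as
occupied.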