Received: by 2002:a25:b794:0:0:0:0:0 with SMTP id n20csp738777ybh; Sat, 3 Aug 2019 08:47:31 -0700 (PDT) X-Google-Smtp-Source: APXvYqwdYzMuDzbb73Az2Z9PH/YSL+UjyfsjkZ1das3ZoVkIkhxorqbnkM/bqq1kizNet04gdMSa X-Received: by 2002:a62:e801:: with SMTP id c1mr9702197pfi.41.1564847251455; Sat, 03 Aug 2019 08:47:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564847251; cv=none; d=google.com; s=arc-20160816; b=td3ycWJNm8AJCoTQJpta3oRqCBtNU810iEI8/xUVyhHm9uTi2WQw9ly8PNsBYmglHP fpTf8D74N0w9GbTwTLaKhuegE6W1FWRu4arbVfENFv0s9wlLoD4yNJlig7doT6AlDqPv fkTQwzjtRunmx0mVXsl69RKpEu6TXXXG8kRbb0zDzuO7phDXAHU6MkdM2WtgN80Co/8Z Tk0fu3Qs3Fs7sWL+ZW/ywXyzio3aWBqTiltGiLTGlNUG1rZ+eBo4XQh9aETXhbS5wvh7 sunH5OP7gVW+iGk5Sf6HvOEqQtNs9RRH3QNk5dVS63FeW/IbGDhQOpkZaSnGJcHiIL2/ cLOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=Dw/7xgv73J5x8SzoZegFidMg6UFr+NDgLE9c4DwZbuc=; b=k5xYtOZHPw8uqzAVowzfM+tdnJZubEDUbe7KQKafc3IujnNDafBIRpSckSQrUmD1MW RSg4v4Ubi1mR4YWMkxbOyHv9GwlO8hKw5GSjL1YkQtO9fIlqPUkcFIQCcRB93WG0+nsj d27NxbXwYvv+FDFx/EuMoXR9uBVJdwaoRgUmQlM+Ybphqd64cC+S8/LlomPA3S4Jntxo x4v2DTX5wQYe0MpS4WlB+tOzSXOe6/dvQnQxu4AlQ6T1jtqA3ACIZHv3Z3zmU3VYUl5d W4MvOa+/s4ps0+4nNVdJY2NnqO08IPwYnU74VND/G4gOQi+c36BChKEPz/wqOXYjVbJg P7Zg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 6si41314933pgt.13.2019.08.03.08.47.14; Sat, 03 Aug 2019 08:47:31 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732821AbfHBQlK (ORCPT + 99 others); Fri, 2 Aug 2019 12:41:10 -0400 Received: from mga04.intel.com ([192.55.52.120]:52979 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729353AbfHBQlI (ORCPT ); Fri, 2 Aug 2019 12:41:08 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 02 Aug 2019 09:41:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,338,1559545200"; d="scan'208";a="175637656" Received: from vivekcha-mobl1.amr.corp.intel.com (HELO [10.251.131.115]) ([10.251.131.115]) by orsmga003.jf.intel.com with ESMTP; 02 Aug 2019 09:41:06 -0700 Subject: Re: [RFC PATCH 15/40] soundwire: cadence_master: handle multiple status reports per Slave To: Vinod Koul Cc: alsa-devel@alsa-project.org, linux-kernel@vger.kernel.org, tiwai@suse.de, broonie@kernel.org, gregkh@linuxfoundation.org, jank@cadence.com, srinivas.kandagatla@linaro.org, slawomir.blauciak@intel.com, Sanyog Kale References: <20190725234032.21152-1-pierre-louis.bossart@linux.intel.com> <20190725234032.21152-16-pierre-louis.bossart@linux.intel.com> <20190802122003.GQ12733@vkoul-mobl.Dlink> <20190802160115.GS12733@vkoul-mobl.Dlink> From: Pierre-Louis Bossart Message-ID: <1326bf1d-8289-2838-e2bd-48dba78b4a6c@linux.intel.com> Date: Fri, 2 Aug 2019 11:41:05 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <20190802160115.GS12733@vkoul-mobl.Dlink> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 8/2/19 11:01 AM, Vinod Koul wrote: > On 02-08-19, 10:29, Pierre-Louis Bossart wrote: >> On 8/2/19 7:20 AM, Vinod Koul wrote: >>> On 25-07-19, 18:40, Pierre-Louis Bossart wrote: > >>>> + status[i] = SDW_SLAVE_UNATTACHED; >>>> + break; >>>> + case 1: >>>> + status[i] = SDW_SLAVE_ATTACHED; >>>> + break; >>>> + case 2: >>>> + status[i] = SDW_SLAVE_ALERT; >>>> + break; >>>> + default: >>>> + status[i] = SDW_SLAVE_RESERVED; >>>> + break; >>>> + } >>> >>> we have same logic in the code block preceding this, maybe good idea to >>> write a helper and use for both >> >> Yes, I am thinking about this. There are multiple cases where we want to >> re-check the status and clear some bits, so helpers would be good. >> >>> >>> Also IIRC we can have multiple status set right? >> >> Yes, the status bits are sticky and mirror all values reported in PING >> frames. I am still working on how to clear those bits, there are cases where >> we clear bits and end-up never hearing from that device ever again. classic >> edge/level issue I suppose. > > Then the case logic above doesn't work, it should be like the code block > preceding this.. what I was referring to already is a problem even in single status mode. Let's say for example that a device shows up as Device0, then we try to enumerate it and program a non-zero device number. If that fails, we currently clear the Attached status for Device0, so will never have an interrupt ever again. The device is there, attached as Device0, but we've lost the single opportunity to make it usable. The link is in most cases going to be extremely reliable, but if we know of state machines that lead to a terminal state then we should proactively have a recovery mechanism to avoid complicated debug down the road for cases where the hardware has transient issues. For the multiple status case, we will have to look at the details and figure out which of the flags get cleared and which ones don't. One certainty is that we absolutely have to track IO errors in interrupt context. They are recoverable in regular context but not quite in interrupt context if we clear the status bits unconditionally. Maybe a tangent here but to be transparent there are really multiple topics we are tracking at the moment: 1. error handling in interrupts. I found a leak where if a device goes in the weeds while we program its device number and resynchronizes then we allocate a new device number instead of reusing the initial one. The bit clearing is also to be checked as explained above. 2. module dependencies: there is a race condition leading to a kernel oops if the Slave probe is not complete before the .update_status is invoked. 3. jack detection. The jack detection routine is called as a result of an imp-def Slave interrupt. We never documented the assumption that if this jack detection takes time then it needs to be done offline, e.g. in a work queue. Or if we still want it to be done in a the interrupt thread then we need to re-enable interrupts earlier, otherwise one device can stop interrupt handling for a fairly long duration. 4. streaming stop on link errors. We've seen in tests that if you reset the link or a Slave device with debugfs while audio is playing then streaming continues. This condition could happen if a device loses sync, and the spec says the Slave needs to reset its channel enable bits. At the command level, we handle this situation and will recover, but there is no notification to the ALSA layers to try and recover on the PCM side of things (as if it were an underflow condition). We also try to disable a stream but get all kinds of errors since it's lost state. All of those points are corner cases but they are important to solve for actual products.