Subject: Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
From: kernel@martin.sperl.org
Date: Tue, 15 Jan 2019 18:39:27 +0100
To: Jon Hunter
Cc: Mark Brown, linux-tegra, Linux Kernel Mailing List, linux-spi@vger.kernel.org

Hi Jon!

> On 15.01.2019, at 15:26, Jon Hunter wrote:
>> Looks as if there is something missing in spi_stop_queue that
>> would wake the worker thread one last time without any delays
>> and finish the hw shutdown immediately - it runs as a delayed
>> task...
>>
>> One question: do you run any spi transfers in
>> your test case before suspend?
>
> No and before suspending I dumped some of the spi stats and I see no
> transfers/messages at all ...
>
> Stats for spi1 ...
> Bytes: 0
> Errors: 0
> Messages: 0
> Transfers: 0

This...

>> /sys/class/spi_master/spi1/statistics/messages gives some
>> counters on the number of spi messages processed which
>> would give you an indication if that is happening.
>>
>> It could be as easy as adding right after the first lock
>> in spi_stop_queue:
>> kthread_mod_delayed_work(&ctlr->kworker,
>>                          &ctlr->pump_idle_teardown, 0);
>> (plus maybe a yield or similar to allow the worker to
>> quickly/reliably run on a single core machine)
>>
>> I hope that this initial guess helps.
>
> Unfortunately, the above did not help and the issue persists.
>
> Digging a bit deeper I see that now the 'ctlr->queue' is empty but
> 'ctlr->busy' flag is set and this is causing the 'could not stop message
> queue' error.
>
> It seems that __spi_pump_messages() is getting called several times
> during boot when registering the spi-flash, then after the spi-flash has
> been registered, about a 1 sec later spi_pump_idle_teardown() is called
> (as expected), but exits because 'ctlr->running' is true. However,

And that contradicts the zero counters above! If the message pump is called, then it should process a message and the counters should increase. If those counters do not increase, then something strange is going on. If the pump is called without anything to do, it should trigger a teardown and stop.

> spi_pump_idle_teardown() is never called again and when we suspend we
> are stuck in the busy/running state. In this case should something be
> scheduling spi_pump_idle_teardown() again? Although even if it does I
> don't see where the busy flag would be cleared in this path?
>

I am wondering where busy/running would be set in the first place if all the counters are zero...
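To make the state described above concrete, here is a toy userspace model in C. It is a hypothetical sketch, not the SPI core: struct toy_ctlr and the toy_* functions are made up for illustration. It assumes, from the report above, that the idle teardown bails out while running is set and that stopping the queue fails while busy is set. Under those two assumptions, even kicking the teardown immediately (the suggested kthread_mod_delayed_work(..., 0)) cannot clear busy, which would match "the above did not help".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the controller flags discussed in this
 * thread. This is NOT struct spi_controller or the real SPI core. */
struct toy_ctlr {
	bool running; /* queue has been started */
	bool busy;    /* hardware prepared and not yet torn down */
	int queued;   /* messages still waiting in the queue */
};

/* Assumption (from the report): the delayed idle teardown exits
 * early while the queue is still marked as running. */
static void toy_idle_teardown(struct toy_ctlr *c)
{
	assert(c->queued >= 0);
	if (c->queued > 0 || c->running)
		return;      /* bail out: busy stays set */
	c->busy = false; /* hardware torn down */
}

/* Assumption: stopping only clears 'running' once the queue is empty
 * and 'busy' has dropped; otherwise it fails, which is where the
 * "could not stop message queue" error is reported.
 * 'kick_teardown' models the suggested kthread_mod_delayed_work(..., 0)
 * call, i.e. running the teardown once, right away, before checking. */
static bool toy_stop_queue(struct toy_ctlr *c, bool kick_teardown)
{
	if (kick_teardown)
		toy_idle_teardown(c);
	if (c->queued > 0 || c->busy)
		return false; /* stays running; nothing else clears busy */
	c->running = false;
	return true;
}
```

With running and busy both set and an empty queue, toy_stop_queue() fails whether or not the teardown is kicked first: the teardown waits for running to drop, and the stop waits for busy to drop.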
Is it possible that the specific flash is not using the “normal” spi_pump_messages() path, but the spi_controller_mem_ops operations instead? Maybe we are missing the teardown in that driver, or something is changing the flags there.

Grepping a bit: I see spi_mem_access_start() calling spi_flush_queue(), which calls __spi_pump_messages(), which, if there is no transfer, will trigger a delayed teardown, while it maybe should not be doing that.

Maybe spi_mem_access_end() needs a call to spi_flush_queue() as well?

Unfortunately this is theoretical work and quite a lot of guesswork without an actual device available for testing...

Thanks,
Martin
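The spi-mem ordering hypothesis can likewise be sketched as a toy userspace model. Again this is hypothetical: the toy_* names are illustrative, it assumes a mem_ops operation bypasses the message counters (consistent with the all-zero statistics) and that each pump run schedules the delayed teardown. It only tracks ordering: with the flush in spi_mem_access_start() alone, the last scheduled teardown precedes the memory operation; adding the suggested flush in spi_mem_access_end() schedules one after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the spi-mem access path; not spi-mem.c.
 * Only ordering of events and the message counter are modelled. */
struct toy_ctlr {
	int queued;              /* regular SPI messages waiting */
	unsigned long messages;  /* mirrors statistics/messages */
	int seq;                 /* monotonically increasing step counter */
	int last_teardown_sched; /* step of last teardown scheduling, -1 if never */
	int last_mem_op;         /* step of last mem op, -1 if never */
};

/* Toy pump: process any queued messages, then schedule the delayed
 * idle teardown (assumed to happen in particular when idle). */
static void toy_pump(struct toy_ctlr *c)
{
	assert(c->queued >= 0);
	c->messages += (unsigned long)c->queued;
	c->queued = 0;
	c->last_teardown_sched = c->seq++;
}

/* Assumption: a controller mem_ops call bypasses the message queue,
 * so it never touches the message counters. */
static void toy_mem_op(struct toy_ctlr *c)
{
	c->last_mem_op = c->seq++;
}

/* Models spi_mem_access_start() -> spi_flush_queue() -> pump, the
 * operation itself, and optionally the suggested flush at the end. */
static void toy_mem_access(struct toy_ctlr *c, bool flush_on_end)
{
	toy_pump(c);     /* flush before the op */
	toy_mem_op(c);
	if (flush_on_end)
		toy_pump(c); /* hypothetical extra flush after the op */
}

/* True if the most recent teardown was scheduled after the last op. */
static bool toy_teardown_follows_op(const struct toy_ctlr *c)
{
	return c->last_teardown_sched > c->last_mem_op;
}
```

In both variants the message counter stays at zero, as in the statistics above; only the variant with the extra flush leaves a teardown scheduled after the last memory operation.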