Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754981AbdDKMfV (ORCPT ); Tue, 11 Apr 2017 08:35:21 -0400 Received: from foss.arm.com ([217.140.101.70]:60278 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752711AbdDKMeu (ORCPT ); Tue, 11 Apr 2017 08:34:50 -0400 Date: Tue, 11 Apr 2017 13:34:42 +0100 From: Alexey Klimov To: Jassi Brar Cc: Sudeep Holla , Linux Kernel Mailing List , Jassi Brar Subject: Re: [PATCH RFC] mailbox: move controller timer to per-channel timers Message-ID: <20170411123441.GA4973@arm.com> References: <1491499891-30135-1-git-send-email-alexey.klimov@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3002 Lines: 59 On Fri, Apr 07, 2017 at 08:39:35PM +0530, Jassi Brar wrote: > On Thu, Apr 6, 2017 at 11:01 PM, Alexey Klimov wrote: > > When mailbox controller provides two or more channels and > > they are actively used by mailbox client(s) it's very easy > > to trigger the warning in hrtimer_forward(): > > > > [ 247.853060] WARNING: CPU: 6 PID: 0 at kernel/time/hrtimer.c:805 hrtimer_forward+0x88/0xd8 > > [ 247.853549] Modules linked in: > > [ 247.853907] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G W 4.11.0-rc2-00362-g93afaa4513bb-dirty #13 > > [ 247.854472] Hardware name: linux,dummy-virt (DT) > > [ 247.854699] task: ffff80001d89d780 task.stack: ffff80001d8c4000 > > [ 247.854999] PC is at hrtimer_forward+0x88/0xd8 > > [ 247.855280] LR is at txdone_hrtimer+0xd4/0xf8 > > [ 247.855551] pc : [] lr : [] pstate: 200001c5 > > [ 247.855857] sp : ffff80001efbdeb0 > > [ 247.856072] x29: ffff80001efbdeb0 x28: ffff80001efc3140 > > [ 247.856358] x27: ffff00000881b7a0 x26: 00000039ac93e8b6 > > [ 247.856604] x25: ffff000008e756be x24: ffff80001c4a1348 > > [ 247.856882] x23: 0000000000000001 x22: 00000000000000f8 > > [ 247.857189] x21: ffff80001c4a1318 x20: ffff80001d327110 > > [ 247.857509] x19: 00000000000f4240 x18: 0000000000000030 > > [ 247.857808] x17: 0000ffffaecdf370 x16: ffff0000081ccc80 > > [ 247.858000] x15: 0000000000000010 x14: 00000000fffffff0 > > [ 247.858186] x13: ffff000008f488e0 x12: 000000000002e3eb > > [ 247.858381] x11: ffff000008979690 x10: 0000000000000000 > > [ 247.858573] x9 : 0000000000000001 x8 : ffff80001efc66e0 > > [ 247.858758] x7 : ffff80001efc6708 x6 : 00000005be7732f2 > > [ 247.858943] x5 : 0000000000000001 x4 : ffff80001c4a1348 > > [ 247.859130] x3 : 00000039ac94952a x2 : 00000000000f4240 > > [ 247.859315] x1 : 00000039ac98243c x0 : 0000000000038f12 > > [ 247.859582] ---[ end trace d61812426ec3c30b ]--- > > > > To fix this current patch migrates hr timers to be per-channel > > instead of using only one timer per-controller. > > > I think we can do by just checking if hrtimer_active() returns false > before we do hrtimer_start() in msg_submit() ? It looks like it can be easily broken: 1) let's say first thread executes timer callback and already checked last_tx_done on channel 0; 2) second thread submits a message to the controller, say, on channel 0 and with help of hrtimer_active() observes that the timer is active (because timer callback is running) and decides not to (re-)start timer; After this first thread decides not to restart the timer and finishes callback. The thing that first thread executes tx_tick isn't helpful: for example first thread may have no messages to submit on any channel and therefore is not going to deal with timer. Finally, mailbox state machine is stalled. Second thread thinks that timer is active while it's not. One of the main questions is that there is only one timer per few channels in current code. Thanks, Alexey.