Message-ID: <555A39EA.1060608@kapsi.fi>
Date: Mon, 18 May 2015 22:13:46 +0300
From: Mikko Perttunen
To: Brian Norris, Sascha Hauer
CC: linux-pm@vger.kernel.org, Zhang Rui, Eduardo Valentin, linux-kernel@vger.kernel.org, Stephen Warren, kernel@pengutronix.de, linux-mediatek@lists.infradead.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 11/15] thermal: thermal: Add support for hardware-tracked trip points
References: <1431507163-19933-1-git-send-email-s.hauer@pengutronix.de> <1431507163-19933-12-git-send-email-s.hauer@pengutronix.de> <5559ABAA.6040001@kapsi.fi> <20150518120944.GQ6325@pengutronix.de> <20150518184433.GS11598@ld-irv-0074>
In-Reply-To: <20150518184433.GS11598@ld-irv-0074>

On 05/18/2015 09:44 PM, Brian Norris wrote:
> On Mon, May 18, 2015 at 02:09:44PM +0200, Sascha Hauer wrote:
>> On Mon, May 18, 2015 at 12:06:50PM +0300, Mikko Perttunen wrote:
>>> One interesting thing I noticed was that at least the bang-bang
>>> governor only acts if the temperature is strictly smaller than (trip
>>> temp - hysteresis). So perhaps we should specify the non-tripping
>>> range as [low, high)? Or we could change bang-bang.
>>
>> I wonder how we can protect against such off-by-one errors anyway.
>> Generally, hardware might operate on raw values rather than directly
>> on temperature values in °C. This means a driver for it must have
>> celsius_to_raw and raw_to_celsius conversion functions. Now it can
>> happen that, due to rounding errors, celsius_to_raw(Tcrit) returns a raw
>> value that, when converted back to Celsius, differs from the
>> original value in °C. This would mean the hardware triggers an interrupt
>> for a trip point and the thermal core does not react, because get_temp
>> actually returns a different temperature than the one previously
>> programmed as the interrupt trigger. This way we would lose hot (or cold)
>> events.
>
> This also highlights another fact: there's a race between interrupt
> generation and temperature reading (->get_temp()). I would expect any
> hardware interrupt thermal sensor would also have a latched temperature
> reading to correspond with it, and there would be no guarantee that this
> latched temperature will match the polled reading seen once you reach
> thermal_zone_device_update(). So a hardware driver might report a
> thermal update, but the temperature reported to the core won't
> necessarily match what the interrupt was meant for.

Does this actually matter? The thermal core will reset the trips and apply cooling using the new, most recent value. Using bang-bang as an example: if the temperature has risen since the interrupt fired, the cooling device will correctly not be switched off. If the temperature has fallen below (trip temp - hysteresis), it will again correctly be switched off.

The only issue arises if the temperature is exactly 'trip temp - trip hyst'. That will cause set_trips to load the trip points below, but will not cause bang-bang to turn off the cooling device, and its next chance to do so will only come at the next trip point below. Well, this is still safe (at least until you replace "cooling device" with "heating device"), so maybe it isn't that big of an issue.
Please point out if there's a problem with my line of reasoning.

FWIW, at least Tegra doesn't have a latched register like this. There's just a bit indicating that an interrupt was raised and a temperature register that updates according to the sensor's input clock.

>
> I have a patch that adds a thermal_zone_device_update_temp() API, so
> drivers can report the temperature along with the interrupt
> notification. (Such a patch also helps so that the driver can choose to
> round down on cold events and up on hot events, resolving your rounding
> issue too.)
>
> Brian
>

Cheers,
Mikko