Subject: Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism
To: Eduardo Valentin, Keerthy
References: <20170412040542.GA11305@localhost.localdomain>
 <1491985580.2357.39.camel@intel.com>
 <1491986744.2357.42.camel@intel.com>
 <20170412154358.GA12881@localhost.localdomain>
 <798128ac-1d0b-7eb8-2ea3-8bc0bd0b9d6f@ti.com>
 <20170412172434.GA14619@localhost.localdomain>
CC: Grygorii Strashko, Zhang Rui
From: Tero Kristo
Date: Wed, 12 Apr 2017 21:43:00 +0300
In-Reply-To: <20170412172434.GA14619@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/04/17 20:24, Eduardo Valentin wrote:
> On Wed, Apr 12, 2017 at 10:41:00PM +0530, Keerthy wrote:
>>
>>
>> On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote:
>>>
>>>
>>> On 04/12/2017 11:44 AM, Keerthy wrote:
>>>>
>>>>
>>>> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote:
>>>>>
>>>>>
>>>>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote:
>>>>>> Hello,
>>>>>>
>>>>> ...
>>>>>
>>>>>>
>>>>>> I agree. But there it nothing that says it is not reenterable. If you
>>>>>> saw something in this line, can you please share?
>>>>>>
>>>>>>>>> will you generate a patch to do this?
>>>>>>>> Sure. I will generate a patch to take care of 1) To make sure that
>>>>>>>> orderly_poweroff is called only once right away. I have already
>>>>>>>> tested.
>>>>>>>>
>>>>>>>> for 2) Cancel all the scheduled work queues to monitor the
>>>>>>>> temperature.
>>>>>>>> I will take some more time to make it and test.
>>>>>>>>
>>>>>>>> Is that okay? Or you want me to send both together?
>>>>>>>>
>>>>>>> I think you can send patch for step 1 first.
>>>>>>
>>>>>> I am happy to see that Keerthy found the problem with his setup and a
>>>>>> possible solution. But I have a few concerns here.
>>>>>>
>>>>>> 1. If regular shutdown process takes 10seconds, that is a ballpark that
>>>>>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait
>>>>>> flag set. That means, if regular userland shutdown takes 10s, we are
>>>>>> waiting for it. Obviously this not acceptable. Specially if you setup
>>>>>> critical trip to be 125C. Now, if you properly size the critical trip to
>>>>>> fire before hotspot really reach 125C, for 10s (or the time it takes to
>>>>>> shutdown), then fine. But based on what was described in this thread,
>>>>>> his system is waiting 10s on regular shutdown, and his silicon is on
>>>>>> out-of-spec temperature for 10s, which is wrong.
>>>>>>
>>>>>> 2. The above scenario is not acceptable in a long run, specially from a
>>>>>> reliability perspective. If orderly_poweroff() has a possibility to
>>>>>> simply never return (or take too long), I would say the thermal
>>>>>> subsystem is using the wrong API.
>>>
>>> ^ this question just repeat everything which was already discussed in
>>> previous versions of this patch - orderly_poweroff() is not good for critical shutdown/poweroff,
>>> but what to use instead?
>
> It is still useful on a properly sized system. The point is the thermal
> subsystem still wants to give one opportunity to gracefully shutdown the
> running system on a thermal scenario, as I explained in the other email.
> But, you have to do this accounting the down time, and your reliability
> concerns.
>
>>>
>>>
>>>>>>
>>>>>
>>>>>
>>>>> Hh, I do not see that orderly_poweroff() will wait for anything now:
>>>>> void orderly_poweroff(bool force)
>>>>> {
>>>>> 	if (force) /* do not override the pending "true" */
>>>>> 		poweroff_force = true;
>>>>> 	schedule_work(&poweroff_work);
>>>>> 	^^^^^^^ async call. even here can be pretty big delay if system is under pressure
>>>>> }
>>>>>
>>>>>
>>>>> static int __orderly_poweroff(bool force)
>>>>> {
>>>>> 	int ret;
>>>>>
>>>>> 	ret = run_cmd(poweroff_cmd);
>>>>
>>>> When i tried with multiple orderly_poweroff calls ret was always 0.
>>>> So every 250mS i see this ret = 0.
>>>>
>>>>> 	^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC
>>>>>
>>>>> 	if (ret && force) {
>>>>
>>>> So it never entered this path. ret = 0 so if is not executed.
>>>
>>> correct, because exec can find poweroff tool and start it, so you,
>>> most probably, have bunch of this tool instance running in parallel (some of them can fail or block)
>>> Issue 1 - you've sent fix for is actual :).
>>
>> Precisely yes!
>>
>
> As I mentioned, the fix is a two fold, a. avoid spam of
> orderly_poweroff(), but make sure eventually we shutdown.

Just chiming in here a bit myself also: the long latencies in executing
the poweroff are basically because, in our case, it will do all of the
following:

- stop all running daemons
- kill all remaining processes
- unload all modules
- sync / unmount all filesystems
- etc.
- poweroff the system when everything else has been gracefully done

The order of these things is not necessarily what I listed above, but
overall it takes quite a bit of time. It doesn't matter whether you
execute all of this over NFS, SD card or ramdisk; it is a long procedure.

-Tero

>
>>>
>>> Again, thermal has no control of power off process once run_cmd() is returned,
>>> and it do not know what US poweroff binary is doing and how much time can it take
>>> (which include disks maintenance - loooong delay).
>>>
>>>>
>>>>> 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");
>>>>>
>>>>> 		/*
>>>>> 		 * I guess this should try to kick off some daemon to sync and
>>>>> 		 * poweroff asap. Or not even bother syncing if we're doing an
>>>>> 		 * emergency shutdown?
>>>>> 		 */
>>>>> 		emergency_sync();
>>>>> 		kernel_power_off();
>>>>> 		^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist
>>>>> 	}
>>>>>
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> static bool poweroff_force;
>>>>>
>>>>> static void poweroff_work_func(struct work_struct *work)
>>>>> {
>>>>> 	__orderly_poweroff(poweroff_force);
>>>>> }
>>>>>
>>>>> As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result
>>>>> of US poweroff binary execution.
>>>>>
>>>>>>
>>>>>> If you are going to implement the above two patches, keep in mind:
>>>>>> i. At least within the thermal subsystem, you need to take care of all
>>>>>> zones that could trigger a shutdown.
>>>>>> ii. serializing the calls to orderly_poweroff() seams to be more
>>>>>> concerning than cancelling all monitoring.
>>>>>>
>>>>>>
>>>>>
>>>
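
For illustration only, here is a rough user-space sketch of the two-part
fix quoted above: request the orderly poweroff once, however often the
critical trip is polled, and still force a shutdown if the orderly path
has not finished by a deadline. This is not the actual patch. The names
shutdown_state and on_critical_trip, and the backup_delay_ms parameter,
are made up for this sketch; the two counters merely stand in for
orderly_poweroff() and for emergency_sync() + kernel_power_off(). A real
kernel version would also need locking across thermal zones.

```c
/* Hypothetical model, not kernel code: all names here are assumptions. */
enum shutdown_phase { PHASE_IDLE, PHASE_ORDERLY, PHASE_FORCED };

struct shutdown_state {
	enum shutdown_phase phase;	/* zero-initialised state is PHASE_IDLE */
	long deadline_ms;		/* when the backup shutdown may fire */
	int orderly_calls;		/* stand-in for orderly_poweroff() */
	int forced_calls;		/* stand-in for emergency_sync() + kernel_power_off() */
};

/*
 * Called every time a zone is found above its critical trip, e.g. once
 * per polling interval.
 */
static void on_critical_trip(struct shutdown_state *s, long now_ms,
			     long backup_delay_ms)
{
	switch (s->phase) {
	case PHASE_IDLE:
		/* First report: start the orderly path and arm the backup. */
		s->phase = PHASE_ORDERLY;
		s->deadline_ms = now_ms + backup_delay_ms;
		s->orderly_calls++;
		break;
	case PHASE_ORDERLY:
		/* Later reports: no second orderly request, only the backup. */
		if (now_ms >= s->deadline_ms) {
			s->phase = PHASE_FORCED;
			s->forced_calls++;
		}
		break;
	case PHASE_FORCED:
		/* Forced power off already issued; nothing left to do. */
		break;
	}
}
```

With a 250 ms polling interval and an assumed 2000 ms backup delay, the
polls at 0..1750 ms produce a single orderly request, and the poll at
2000 ms produces the single forced power off.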