Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp1016769imm; Tue, 5 Jun 2018 07:56:37 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJVIFZH1F48e409nypg03c/Lrh+2wEcPVMzntCngCRA4hHO28PfieV8uF6gzaO6eEJadNJs X-Received: by 2002:a17:902:822:: with SMTP id 31-v6mr27143329plk.172.1528210597231; Tue, 05 Jun 2018 07:56:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1528210597; cv=none; d=google.com; s=arc-20160816; b=dD2bNFBtv/T2rFKnXCRbopnOTQwU2FDp33wn7hNJULxXIpbuLdJzBDC//5ULnARsnN oe8qZbSGVjC4LExdceWck8R6o0wfPNfZzzlJykr77lrJlo4GCfc+B00iOaXtosUvS3QT dj5fAfEy80sM0yQqnPU6kChS5CIoESo5qPbyBTKkaMwGhElA8wwv9mdUqZB7y4Bx9FBA IkBkiIHU9ZQe0mFsHlkXAkcRn0YQQqPjeOJe6cWBe/+3fGQuFvY0MlKnWudsqxw6Wsr1 0bij6zIEUVwcjUyoOGyBm+7JaMZfe0K91Vibcb5YA0/bwsoiF1qVJU1I2YQ/YxAtVKQV FoIA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=iuhagLd3iXo3mBhYBoCe3ZYL5P+Z/dDXc54PS7AcKaA=; b=ZZZksyjdv5QuCsqrPgnsjVT1WDNYq58gnkLM8WMUw2hdy9BIGRD1GRZRs/DtA+3L8h bvG9FpqMFGYWxGvkJ1YtLJ4m+X6rIoNidfTTlM6bGLY124bXtLCk5/OUZIfzyupLAQMm 9vZMnE/N+XdiBiZ+UTElatO+nySi9eXCN/CopIUiRtVev3lnGIZhpbsG9g6l/picJ+dt gjYvV1r9d3LYSslnMekTbyggYtYeJXPdO4/7uuneEsKQKPyXozdgDeOSVaFqrpSjoNo0 jYSeqpNQqgE3ByPJoeHD5KqAgun6f+lSsF8+II/0hHETvbqFM18/4C9xXv2RHC6PJuI8 hX+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XpCjjbrv; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m22-v6si49213556pfg.323.2018.06.05.07.56.22; Tue, 05 Jun 2018 07:56:37 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XpCjjbrv; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752637AbeFEOyT (ORCPT + 99 others); Tue, 5 Jun 2018 10:54:19 -0400 Received: from mail-wr0-f195.google.com ([209.85.128.195]:44374 "EHLO mail-wr0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752137AbeFEOyQ (ORCPT ); Tue, 5 Jun 2018 10:54:16 -0400 Received: by mail-wr0-f195.google.com with SMTP id y15-v6so2754211wrg.11 for ; Tue, 05 Jun 2018 07:54:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=iuhagLd3iXo3mBhYBoCe3ZYL5P+Z/dDXc54PS7AcKaA=; b=XpCjjbrv1wGEVcTigCTm61rkRtlvi6qKL66Dos/1j6mXBpsp1LS/V+VsGDLXjTcfVK nhP0yjK/NHTgVmwqCkW+mchHLwAcU/+OPfbviJLY86C3oOaziH9K4ENvu3BJBGXTEAT8 PoXVRkzXsqYR5x+4LzTiksUwQ5nmCpbKFpD4A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=iuhagLd3iXo3mBhYBoCe3ZYL5P+Z/dDXc54PS7AcKaA=; b=mUkPg2PYRaHmaMe8fBbVmOELrTSe+tN1kUYZMPcEZ12TU4C+gqi2RRcwvmnLy3jbm5 BF9wj+7mXF0UomL7NZPavdYp33uOMF/56HyCXdYG+HuFNEjlzN8PFdA8wiSrVEJPdrE/ lkTDGAZz9n9+RNHViCbzawHph6feIEcrpy6Yj2Xn5j09i+JS1/RCXTo+IV/dQeM4fVuG KpprzGrbAeqmjs9jYZVXljiGa7SdYVN/qpTsRsY2qNoKEeuO/91rZ74f5foFca5aYw9S ZokneFULIQgIEJVTZ5f1S8FHOSuuyhgoqDyfbPxFCnyBrjMFK3hAeh8LMaKCkVtjzrqZ nHag== X-Gm-Message-State: ALKqPwfrfEf1UwpTSfMbnYPN8reC8NakB+5wUbuzOMM8BEPKNbb1fUmp TkNGD9xrCu/CRiM0AFIoY9x1aA== X-Received: by 2002:adf:e644:: with SMTP id b4-v6mr21681966wrn.254.1528210455418; Tue, 05 Jun 2018 07:54:15 -0700 (PDT) Received: from ?IPv6:2a01:e35:879a:6cd0:3e97:eff:fe5b:1402? ([2a01:e35:879a:6cd0:3e97:eff:fe5b:1402]) by smtp.googlemail.com with ESMTPSA id k126-v6sm2475602wmd.45.2018.06.05.07.54.13 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 05 Jun 2018 07:54:14 -0700 (PDT) Subject: Re: [PATCH V5] powercap/drivers/idle_injection: Add an idle injection framework To: Viresh Kumar Cc: rjw@rjwysocki.net, linux-kernel@vger.kernel.org, Eduardo Valentin , Javi Merino , Leo Yan , Kevin Wangtao , Vincent Guittot , Rui Zhang , Daniel Thompson , "open list:POWER MANAGEMENT CORE" References: <1528190208-22915-1-git-send-email-daniel.lezcano@linaro.org> <20180605103917.pyhhcobdvaivqv6g@vireshk-i7> From: Daniel Lezcano Message-ID: <57d769f8-46ea-512e-8f89-a0439c9d053f@linaro.org> Date: Tue, 5 Jun 2018 16:54:14 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <20180605103917.pyhhcobdvaivqv6g@vireshk-i7> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/06/2018 12:39, Viresh Kumar wrote: > On 05-06-18, 11:16, Daniel Lezcano wrote: >> diff --git a/drivers/powercap/idle_injection.c b/drivers/powercap/idle_injection.c >> +/** >> + * idle_injection_wakeup - Wake up all idle injection threads >> + * @ii_dev: the idle injection device >> + * >> + * Every idle injection task belonging to the idle injection device >> + * and running on an online CPU will be wake up by this call. >> + */ >> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev) >> +{ >> + struct idle_injection_thread *iit; >> + int cpu; >> + >> + for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) { >> + atomic_inc(&ii_dev->count); > > This may not be sufficient enough in some extremely corner cases, as it is > possible that the idle_injection_fn() starts running for the first cpu in the > cpumask right after first iteration of this loop completes. And so > atomic_dec_and_test() may return true there before idle_injection_fn() task gets > a chance to run on all cpus in the cpumask. Normally this wouldn't be a big > problem as the hrtimer can get reprogrammed in that case, but there is a case I > am worried about. More on this later.. > >> + iit = per_cpu_ptr(&idle_injection_thread, cpu); >> + iit->should_run = 1; >> + wake_up_process(iit->tsk); >> + } >> +} >> + >> +/** >> + * idle_injection_wakeup_fn - idle injection timer callback >> + * @timer: a hrtimer structure >> + * >> + * This function is called when the idle injection timer expires which >> + * will wake up the idle injection tasks and these ones, in turn, play >> + * idle a specified amount of time. >> + * >> + * Return: HRTIMER_NORESTART. >> + */ >> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer) >> +{ >> + struct idle_injection_device *ii_dev = >> + container_of(timer, struct idle_injection_device, timer); >> + >> + idle_injection_wakeup(ii_dev); >> + >> + return HRTIMER_NORESTART; >> +} >> + >> +/** >> + * idle_injection_fn - idle injection routine >> + * @cpu: the CPU number the tasks belongs to >> + * >> + * The idle injection routine will stay idle the specified amount of >> + * time >> + */ >> +static void idle_injection_fn(unsigned int cpu) >> +{ >> + struct idle_injection_device *ii_dev; >> + struct idle_injection_thread *iit; >> + int run_duration_ms, idle_duration_ms; >> + >> + ii_dev = per_cpu(idle_injection_device, cpu); >> + iit = per_cpu_ptr(&idle_injection_thread, cpu); >> + >> + /* >> + * Boolean used by the smpboot main loop and used as a >> + * flip-flop in this function >> + */ >> + iit->should_run = 0; >> + >> + idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms); >> + if (idle_duration_ms) >> + play_idle(idle_duration_ms); >> + >> + /* >> + * The last CPU waking up is in charge of setting the timer. If >> + * the CPU is hotplugged, the timer will move to another CPU >> + * (which may not belong to the same cluster) but that is not a >> + * problem as the timer will be set again by another CPU >> + * belonging to the cluster. This mechanism is self adaptive. >> + */ >> + if (!atomic_dec_and_test(&ii_dev->count)) >> + return; >> + >> + run_duration_ms = atomic_read(&ii_dev->run_duration_ms); >> + if (run_duration_ms) { >> + hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms), >> + HRTIMER_MODE_REL_PINNED); >> + return; >> + } >> + >> + complete(&ii_dev->stop_complete); >> +} > > Here is the corner case I was talking about. Consider the cpumask contains > CPU0/1/2/3. > > PATH A PATH B > > unregister() > -> idle_injection_stop() > > idle_injection_wakeup() > -> Running loop for CPU0 > -> atomic inc (count == 1) > -> wake_up_process(tsk) > -> At this point the current task stops > running and idle_injection_fn() starts running > (maybe on a different CPU) > > > > > idle_injection_fn() > -> atomic_dec_and_test(), returns true > as count == 0 > > > -> set-duration to 0 > -> wait for completion > > > -> atomic_read(run_duration), returns 0 > -> complete() > > -> kfree(ii_dev); > > > But the initial loop idle_injection_wakeup() has only run for CPU0 until now and > will continue running for other CPUs and will crash as ii_dev is already freed. > > Am I still making a mistake here ? I don't think you are doing a mistake. Even if this can happen theoretically, I don't think practically that is the case. The play_idle() has 1ms minimum sleep time. The scenario you are describing means: 1. the loop in idle_injection_wakeup() takes more than 1ms to achieve 2. at the same time, the user of the idle injection unregisters while the idle injection is acting precisely at CPU0 and exits before another task was wakeup by the loop in 1. more than 1ms after. From my POV, this scenario can't happen. Anyway, we must write rock solid code so may be we can use a refcount to protect against that, so instead of freeing in unregister, we refput the ii_dev pointer. >> +static struct idle_injection_device *ii_dev_alloc(void) >> +{ >> + struct idle_injection_device *ii_dev; >> + >> + ii_dev = kzalloc(sizeof(*ii_dev), GFP_KERNEL); >> + if (!ii_dev) >> + return NULL; >> + >> + if (!alloc_cpumask_var(&ii_dev->cpumask, GFP_KERNEL)) { >> + kfree(ii_dev); >> + return NULL; >> + } >> + >> + return ii_dev; >> +} >> + >> +static void ii_dev_free(struct idle_injection_device *ii_dev) >> +{ >> + free_cpumask_var(ii_dev->cpumask); >> + kfree(ii_dev); >> +} >> + >> +/** >> + * idle_injection_register - idle injection init routine >> + * @cpumask: the list of CPUs managed by the idle injection device >> + * >> + * This is the initialization function in charge of creating the >> + * initializing of the timer and allocate the structures. It does not >> + * starts the idle injection cycles. >> + * >> + * Return: NULL if an allocation fails. >> + */ >> +struct idle_injection_device *idle_injection_register(struct cpumask *cpumask) >> +{ >> + struct idle_injection_device *ii_dev; >> + int cpu; >> + >> + ii_dev = ii_dev_alloc(); >> + if (!ii_dev) >> + return NULL; >> + >> + cpumask_copy(ii_dev->cpumask, cpumask); >> + hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); >> + ii_dev->timer.function = idle_injection_wakeup_fn; >> + init_completion(&ii_dev->stop_complete); >> + >> + for_each_cpu(cpu, ii_dev->cpumask) { >> + >> + if (per_cpu(idle_injection_device, cpu)) { >> + pr_err("cpu%d is already registered\n", cpu); >> + goto out_rollback_per_cpu; >> + } >> + >> + per_cpu(idle_injection_device, cpu) = ii_dev; >> + } >> + >> + return ii_dev; >> + >> +out_rollback_per_cpu: >> + for_each_cpu(cpu, ii_dev->cpumask) >> + per_cpu(idle_injection_device, cpu) = NULL; > > Maybe move above into ii_dev_free(), as I suggested earlier, as that will remove > same code from unregister routine as well. I don't like to mix operations when the function name define clearly what is doing. The ii_dev_free() frees a structure, idle_injection_device refers to a global variable. Anyway if we are talking about using a refcount, we can add a release which frees the structure and sets the per cpu idle_injection_device global variable to NULL. -- Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog