Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp2543350pxa; Mon, 3 Aug 2020 19:59:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzoTx37c9MC3eeoQFp4wOEf6oTLylU6AI1DlelQjLDRzySbsZx5MRrMn8fxTft9U938aZ4a X-Received: by 2002:a17:906:e10e:: with SMTP id gj14mr20457980ejb.218.1596509994221; Mon, 03 Aug 2020 19:59:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1596509994; cv=none; d=google.com; s=arc-20160816; b=1G9Ea+Ms4wDDseODC9GIYGiA0dLF1vZPMe/BSu8nLQX5e6R+MriZ2smqLuaRlfzzOW YkpatzTKnQ6TTz/vnrG30grOc6vv8wYP7oPSJzvL+LG0P3zUBZFkivZRE5qtoxq8+lNu kxRPxQVGdg8fygCK/urK5O8owtE+t745BxjhV71k461vhFnfhHULxYlx6Rwvcf++ya50 7L4kriXSNQ59PcfGmRPrfQP/QcQSpic0DoSIgu086LQZaydO3nXNIouMerJwPCVU5Gp9 HrPdCQ2EwWOqXDtzF3d+mBVcWprhe3pTFEti5ncK2jmtMbbHNe1NdfEKtcjprBU4ZNJz tBvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=KtVZ0ysseBvAwqONeypNwDI46hzhJZgsnw1vwQJPAEw=; b=nDIBGIQOiEIGy70gNjdSTWGOghwqf/JLKHxHNiQibQ91deka8VGeGKJe++WVFMtbNI G4a0U47VcUCnKLn3MZOizj3tm3kKHE6bulsAGn5tX6y7faFjTU+kmpJXs9sMTpqKMjlv 3Pk2ifQ7WM5DVwLplZdVOhTPvZuGly6RFIa9S7uBbyWp7vYXIMQFgKzWvdLak9r9NZ4a J8V6b7wzH12t9GjQ7bc15fmheJu5+NTmxJMH/az35gveQkhQsVlvbUBCF24ch3lY7+TR OCg3RGSXS9OYOS4SSSSnsK1F47Me+YfOo9YZATrlWGJVwqAAb5a/XygBpVES34WVHFyq K2hA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y16si11494383edu.65.2020.08.03.19.59.30; Mon, 03 Aug 2020 19:59:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726615AbgHDC6T (ORCPT + 99 others); Mon, 3 Aug 2020 22:58:19 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:8756 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725917AbgHDC6T (ORCPT ); Mon, 3 Aug 2020 22:58:19 -0400 Received: from DGGEMS412-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 56039FC0FAA3C1FC15C3; Tue, 4 Aug 2020 10:58:11 +0800 (CST) Received: from [127.0.0.1] (10.174.179.214) by DGGEMS412-HUB.china.huawei.com (10.3.19.212) with Microsoft SMTP Server id 14.3.487.0; Tue, 4 Aug 2020 10:58:01 +0800 Subject: Re: [PATCH] ubi: check kthread_should_stop() after the setting of task state To: Richard Weinberger CC: , LKML , Richard Weinberger , "zhangyi (F)" References: <20200601091231.3794350-1-chengzhihao1@huawei.com> <211afcd0-d5b3-5ac0-1fd1-dc789634a858@huawei.com> From: Zhihao Cheng Message-ID: <9caa4860-975c-70bb-c8b9-737d1db9ead4@huawei.com> Date: Tue, 4 Aug 2020 10:58:01 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.179.214] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2020/8/4 6:11, Richard Weinberger 写道: > On Mon, Aug 3, 2020 at 4:01 AM Zhihao Cheng wrote: >>> Hmm, I see the problem but I fear this patch does not cure the race completely. >>> It just lowers the chance to hit it. >>> What if KTHREAD_SHOULD_STOP is set right after you checked for it? >> The patch can handle this case. ubi_thread will exit at >> kthread_should_stop() in next iteration. > How can it reach the next iteration? > Maybe I didn't fully get your explanation. > > As far as I understand the problem correctly, the following happens: > 1. ubi_thread is running and the program counter is somewhere between > "if (kthread_should_stop())" > and schedule() > 2. While detaching kthread_stop() is called > 3. Since the program counter in the thread is right before schedule(), > it does not check KTHREAD_SHOULD_STOP > and blindly calls into schedule() > 4. The thread goes to sleep and nothing wakes it anymore -> endless wait. > > Is this correct so far? Oh, you're thinking about influence by schedule(), I get it. But I think it still works. Because the ubi_thread is still on runqueue, it will be scheduled to execute later anyway. op                                                    state of ubi_thread           on runqueue set_current_state(TASK_INTERRUPTIBLE) TASK_INTERRUPTIBLE              Yes if (kthread_should_stop()) // not satisfy TASK_INTERRUPTIBLE              Yes kthread_stop:   wake_up_process     ttwu_queue       ttwu_do_activate         ttwu_do_wakeup TASK_RUNNING                       Yes schedule   __schedule(false)  // prev->state is TASK_RUNNING, so we cannot move it from runqueue by deactivate_task(). So just pick next task to execute, ubi_thread is still on runqueue and will be scheduled to execute later. The test patch added mdelay(5000) before schedule(), which can make sure kthread_stop()->wake_up_process() executed before schedule(). Previous analysis can be proved through test. @@ -1638,6 +1641,15 @@ int ubi_thread(void *u)                     !ubi->thread_enabled || ubi_dbg_is_bgt_disabled(ubi)) {                         set_current_state(TASK_INTERRUPTIBLE);                         spin_unlock(&ubi->wl_lock); +                       if (kthread_should_stop()) { +                               set_current_state(TASK_RUNNING); +                               break; +                       } + +                       pr_err("Check should stop B\n"); +                       mdelay(5000); +                       pr_err("delay 5000ms \n"); +                         schedule();                         continue;                 } > > Your solution is putting another check for KTHREAD_SHOULD_STOP before > schedule(). > I argue that this will just reduce the chance to hit the race window > because it can still happen > that kthread_stop() is being called right after the second check and > again before schedule(). > Then we end up with the same situation. >