Date: Thu, 13 Feb 2020 11:50:16 +0000
From: Qais Yousef
To: Tejun Heo
Cc: Li Zefan, Johannes Weiner, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
Message-ID: <20200213115015.hkd6uqwfjosxjfpm@e107158-lin.cambridge.arm.com>
References: <20200211141554.24181-1-qais.yousef@arm.com> <20200212221543.GL80993@mtj.thefacebook.com>
In-Reply-To: <20200212221543.GL80993@mtj.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Tejun

On 02/12/20 17:15, Tejun Heo wrote:
> On Tue, Feb 11, 2020 at 02:15:54PM +0000, Qais Yousef wrote:
> > LTP cpuset_hotplug_test.sh was failing with the following error message
> >
> > cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).
> >
> > Which is due to a race condition between cpu hotplug operation and
> > reading cpuset.cpus file.
> >
> > When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
> > internal data structures with the new values. If a read happens during
> > this window, the user will read a stale value, hence triggering the
> > failure above.
> >
> > To fix the issue make sure cpuset_wait_for_hotplug() is called before
> > allowing any value to be read, hence forcing the synchronization to
> > happen before the read.
> >
> > I ran 500 iterations with this fix applied with no failure triggered.
> >
> > Signed-off-by: Qais Yousef
>
> Hello, Qais. I just applied a patch which makes the operation
> synchronous. Can you see whether the problem is gone on the
> cgroup/for-next branch?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next

I ran 500 iterations of cpuset_hotplug_test.sh on the branch and it passed.

I also cherry-picked commit 6426bfb1d5f0 ("cpuset: Make cpuset hotplug
synchronous") onto v5.6-rc1 and ran 100 iterations; that passed too.

While investigating the problem, I could reproduce it all the way back to
v5.0. I stopped there, so earlier versions could still have the problem.

Do you think it's worth porting the change to stable trees? Admittedly the
problem should be benign, but it did trigger an LTP failure.

I can check the 4.19 and 4.14 stable trees (which, at least in the Android
world, are still relevant) if you agree it makes sense to put a fix in stable.

Thanks

--
Qais Yousef
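
For illustration only, here is a toy userspace model of the race described in the quoted patch. It is a sketch under stated assumptions, not kernel code: the class and method names (CpusetModel, offline, flush, read_racy, read_synced) are hypothetical, and a plain list stands in for the workqueue. It only shows why a read that lands before the deferred sync sees the old CPU list, and why flushing the pending work first (the idea behind calling cpuset_wait_for_hotplug() before the read, or making the hotplug update synchronous) avoids that.

```python
class CpusetModel:
    """Toy model of the stale-read window; all names are hypothetical."""

    def __init__(self, cpus):
        self.online = set(cpus)      # updated immediately on hotplug
        self.effective = set(cpus)   # what a read of cpuset.cpus would report
        self.pending = []            # deferred sync work (stand-in for a workqueue)

    def _sync(self):
        # Bring the cpuset view in line with the current online CPUs.
        self.effective = set(self.online)

    def offline(self, cpu):
        self.online.discard(cpu)
        # The sync is only queued here, which opens the window for stale reads.
        self.pending.append(self._sync)

    def flush(self):
        # Run all queued work, oldest first.
        while self.pending:
            self.pending.pop(0)()

    def read_racy(self):
        # Reads without waiting for queued work.
        return sorted(self.effective)

    def read_synced(self):
        # Waits for queued work before reading.
        self.flush()
        return sorted(self.effective)


m = CpusetModel(range(6))
m.offline(1)
stale = m.read_racy()    # read lands inside the window
fresh = m.read_synced()  # pending work is flushed before the read
print(stale, fresh)
```

In this model the racy read still reports CPUs 0-5 after CPU 1 goes offline, while the synced read reports 0,2-5, which mirrors the Result/Expect pair in the LTP failure message.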