Date: Mon, 20 Nov 2017 11:30:40 -0500
From: joe.korty@concurrent-rt.com
To: Steven Rostedt
Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List
Subject: Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
Message-ID: <20171120163040.GA25993@zipoli.concurrent-rt.com>
Reply-To: "Joe Korty"
References: <20171115192529.GA14158@zipoli.concurrent-rt.com> <20171117174851.2a253785@gandalf.local.home>
In-Reply-To: <20171117174851.2a253785@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steve,

A quick perusal of 4.11.12-rt16 shows that it has an entirely new
version of migrate_disable, which to me appears correct.

In that new implementation, migrate_enable() recalculates
p->nr_cpus_allowed when it switches the task back to using
p->cpus_mask. This brings the two back into sync if anything
had happened to get them out of sync while migration was disabled
(as would happen on an affinity change during that disable period).

4.9.47-rt37 has the old implementation, and it appears to have the
same bug as 4.4-rt, though I have yet to test 4.9-rt.

The fix in these older versions could take one of two forms: either
we recalculate p->nr_cpus_allowed when migrate_enable() goes back to
using p->cpus_allowed, as the 4.11-rt version does, or we fix the one
place where we allow p->nr_cpus_allowed to diverge from
p->cpus_allowed. The patch I submitted earlier takes this second
approach.

Regards,
Joe

On Fri, Nov 17, 2017 at 05:48:51PM -0500, Steven Rostedt wrote:
> On Wed, 15 Nov 2017 14:25:29 -0500
> joe.korty@concurrent-rt.com wrote:
>
> > 4.4.86-rt99's patch
> >
> >   0037-Intrduce-migrate_disable-cpu_light.patch
> >
> > introduces a place where a task's cpus_allowed mask is
> > updated without a corresponding update to nr_cpus_allowed.
> >
> > This path is executed when task affinity is changed while
> > migrate_disabled() is true. As there is no code present
> > to set nr_cpus_allowed when the migrate_disable state is
> > dropped, the scheduler at that point on may make incorrect
> > scheduling decisions for this task.
> >
> > My testing consists of temporarily adding a
> >
> >   if (tsk_nr_cpus_allowed(p) == cpumask_weight(tsk_cpus_allowed(p))
> >     printk_ratelimited(...)
>
> Have you tested v4.9-rt or 4.13-rt if it has the same bug? If it is a
> bug in 4.13-rt then it needs to go there first, and then backported to
> the stable releases (which I'm actually working on now).
>
> -- Steve
>
> >
> > stmt to schedule() and running a simple affinity rotation
> > program I wrote, one that rotates the threads of stress(1).
> > While rotating, I got the expected kernel error messages.
> > With this patch applied the messages disappeared.
> >
> > Signed-off-by: Joe Korty
> >
> > Index: b/kernel/sched/core.c
> > ===================================================================
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
> >  	lockdep_assert_held(&p->pi_lock);
> >
> >  	if (__migrate_disabled(p)) {
> > +		p->nr_cpus_allowed = cpumask_weight(new_mask);
> >  		cpumask_copy(&p->cpus_allowed, new_mask);
> >  		return;
> >  	}
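
For readers following the thread, the two fix options discussed in this
message can be illustrated with a small userspace model. This is only a
sketch, not kernel code: struct toy_task and every toy_* function are
hypothetical stand-ins invented for illustration, a 64-bit integer
stands in for the cpumask, and locking is ignored entirely. The only
thing it tries to show is how the cached CPU count can drift from the
mask when affinity changes while migration is disabled, and how either
approach restores the invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the relevant task fields (hypothetical, not task_struct). */
struct toy_task {
	uint64_t cpus_allowed;     /* one bit per permitted CPU */
	int      nr_cpus_allowed;  /* cached number of bits set in cpus_allowed */
	bool     migrate_disabled; /* plays the role of __migrate_disabled(p) */
};

/* Count set bits; plays the role of cpumask_weight(). */
static int toy_mask_weight(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* The invariant under discussion: cached count matches the mask. */
static bool toy_in_sync(const struct toy_task *p)
{
	return p->nr_cpus_allowed == toy_mask_weight(p->cpus_allowed);
}

/* Old shape: the migrate-disabled path updates only the mask. */
static void toy_set_allowed_buggy(struct toy_task *p, uint64_t new_mask)
{
	if (p->migrate_disabled) {
		p->cpus_allowed = new_mask;
		return;
	}
	p->cpus_allowed = new_mask;
	p->nr_cpus_allowed = toy_mask_weight(new_mask);
}

/* Second approach: update the count on the migrate-disabled path too. */
static void toy_set_allowed_fixed(struct toy_task *p, uint64_t new_mask)
{
	if (p->migrate_disabled) {
		p->nr_cpus_allowed = toy_mask_weight(new_mask);
		p->cpus_allowed = new_mask;
		return;
	}
	p->cpus_allowed = new_mask;
	p->nr_cpus_allowed = toy_mask_weight(new_mask);
}

/* First approach: recompute the count when migration is re-enabled. */
static void toy_migrate_enable_recalc(struct toy_task *p)
{
	p->migrate_disabled = false;
	p->nr_cpus_allowed = toy_mask_weight(p->cpus_allowed);
}

/* Runs the affinity-change-while-disabled scenario; returns 0 on success. */
int toy_demo(void)
{
	struct toy_task a = { 0xF, 4, true };
	struct toy_task b = a;

	toy_set_allowed_buggy(&a, 0x3);
	assert(!toy_in_sync(&a));      /* mask has 2 bits, count still says 4 */
	toy_migrate_enable_recalc(&a);
	assert(toy_in_sync(&a));       /* first approach repairs it on enable */

	toy_set_allowed_fixed(&b, 0x3);
	assert(toy_in_sync(&b));       /* second approach never diverges */
	return 0;
}
```

Calling toy_demo() from a main() exercises both paths. In this model the
second approach keeps the count correct throughout the disabled window,
while the first leaves it stale until migration is re-enabled; whether
that difference matters in the real scheduler depends on code not shown
in this thread.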