Subject: Re: [PATCH v6] workqueue: Fix edge cases for calc of pool's cpumask
From: Michael Bringmann
Organization: IBM Linux Technology Center
To: Tejun Heo
Cc: Lai Jiangshan, linux-kernel@vger.kernel.org, nfont@linux.vnet.ibm.com
Date: Thu, 27 Jul 2017 14:07:53 -0500
In-Reply-To: <20170727183148.GH742618@devbig577.frc2.facebook.com>
References: <5c421712-fb38-3020-d3fc-b9bdea792cd3@linux.vnet.ibm.com>
 <20170727183148.GH742618@devbig577.frc2.facebook.com>

On 07/27/2017 01:31 PM, Tejun Heo wrote:
> On Thu, Jul 27, 2017 at 01:15:48PM -0500, Michael Bringmann wrote:
>>
>> On NUMA systems with dynamic processors, the content of the cpumask
>> may change over time. As new processors are added via DLPAR operations,
>> workqueues are created for them. Depending upon the order in which CPUs
>> are added/removed, we may run into problems with the content of the
>> cpumask used by the workqueues. This patch deals with situations where
>> the online cpumask for a node is a proper superset of possible cpumask
>> for the node. It also deals with edge cases where the order in which
>> CPUs are removed/added from the online cpumask may leave the set for a
>> node empty, and require execution by CPUs on another node.
>>
>> In these and other cases, the patch attempts to ensure that a valid,
>> usable cpumask is used to set up newly created pools for workqueues.
>> This patch provides a fix for NUMA systems which can add/subtract
>> processors dynamically. The patch is expected to be an intermediate
>> one while developers search for any underlying issues.
>
> Please start with describing what the underlying problem is - CPU <->
> NUMA node mapping change on powerpc. The mapping shouldn't change,
> not just for workqueue, but because we don't have any kind of
> synchronization around the mapping throughout allocation paths. And
> then, please describe how this patch can at least prevent immediate
> crashes in a lot of cases.

How about this:

The problem lies with the ordering of events with respect to the order
in which we add (or remove) CPUs to NUMA systems, and make use of that
knowledge. The CPUs present are assigned to nodes, and workqueues and
their infrastructure are created to use the CPUs in a node. Workqueues
are created at boot time and updated or created as CPUs are added or
removed.
However, there is little or no synchronization or ordering of these
events, and the data structures mapping CPUs to nodes may not be updated
before the workqueue infrastructure is built for a node. Thus, an
invalid CPU mask attribute may be attached to a newly created workqueue
before the CPUs have been properly registered and published to a node.

This patch attempts to provide a partial ordering of events within
workqueue by delaying the use of newly calculated CPU masks as the value
for a workqueue attribute until they have valid content. Instead, the
workqueue code must delay creating new workqueues until this function
succeeds, or it can use a previously calculated cpumask attribute that
is known to be valid.

This patch attempts to ensure that a valid, usable cpumask is used to
set up newly created pools for workqueues. This patch provides a fix for
NUMA systems which can add/subtract processors dynamically. The patch is
expected to be an intermediate one while developers find and correct any
underlying issues.

>
> Thanks.
>

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb@linux.vnet.ibm.com
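
The fallback idea in the proposed description (use a newly calculated
cpumask only once it has valid content, otherwise keep a previously
calculated mask that is known to be valid) can be sketched in plain C as
below. This is only an illustration under stated assumptions, not the
code from the patch: cpumask_sketch_t and pick_pool_cpumask() are
hypothetical names, a 64-bit integer stands in for the kernel's cpumask,
and "valid" is assumed to mean "non-empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is in the set. */
typedef uint64_t cpumask_sketch_t;

/*
 * Hypothetical helper: choose the mask for a newly created pool on one node.
 *
 * node_cpus: CPUs currently mapped to the node (may be stale or empty)
 * online:    CPUs currently online
 * fallback:  a previously calculated mask known to be valid (non-empty)
 *
 * Returns true and stores (node_cpus & online) in *out when that
 * intersection is non-empty. Otherwise returns false and stores the
 * fallback, so the caller never sets up a pool with an empty mask.
 */
static bool pick_pool_cpumask(cpumask_sketch_t node_cpus,
                              cpumask_sketch_t online,
                              cpumask_sketch_t fallback,
                              cpumask_sketch_t *out)
{
    cpumask_sketch_t candidate = node_cpus & online;

    /* Assumption of this sketch: the fallback itself must be usable. */
    assert(fallback != 0);

    if (candidate == 0) {
        /* Node mapping not populated yet: keep the known-good mask. */
        *out = fallback;
        return false;
    }

    *out = candidate;
    return true;
}
```

A caller could treat a false return as "the node's CPUs are not
registered yet" and either retry later or proceed with the fallback
mask, which mirrors the two options named in the description above.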