References: <20220707090501.55483-1-schspa@gmail.com> <0320c5f9-cbda-1652-1f97-24d1a22fb298@gmail.com>
User-agent: mu4e 1.7.5; emacs 28.1
From: Schspa Shi
To: Peter Zijlstra
Cc: Lai Jiangshan, tj@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] workqueue: Use active mask for new worker when pool is DISASSOCIATED
Date: Sat, 30 Jul 2022 11:49:41 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Peter Zijlstra writes:

> On Wed, Jul 13, 2022 at 05:52:58PM +0800, Lai Jiangshan wrote:
>>
>>
>> CC Peter.
>> Peter has changed the CPU binding code in workqueue.c.
>
> [ 1622.829091] WARNING: CPU: 3 PID: 31 at kernel/sched/core.c:7756 sched_cpu_dying+0x74/0x204
> [ 1622.829374] CPU: 3 PID: 31 Comm: migration/3 Tainted: P O 5.10.59-rt52 #2
>                                                        ^^^^^^^^^^^^^^^^^^^^^
>
> I think we can ignore this as being some ancient kernel. Please try
> something recent.

Hi Peter,

I spent a few days writing a test case and reproduced the problem on
kernel 5.19. I think it's time for us to review the V3 patch for a fix.

The V3 patch is at
https://lore.kernel.org/all/20220714031645.28004-1-schspa@gmail.com/
Please help review it.

Test branch:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tag/?h=v5.19-rc8-rt8

I think this code is new enough to demonstrate that the problem persists.

The log is as follows:

[ 3103.198684] ------------[ cut here ]------------
[ 3103.198684] Dying CPU not properly vacated!
[ 3103.198684] WARNING: CPU: 1 PID: 23 at kernel/sched/core.c:9575 sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684] Modules linked in: work_test(O)
[ 3103.198684] CPU: 1 PID: 23 Comm: migration/1 Tainted: G O 5.19.0-rc8-rt8 #1
[ 3103.198684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3103.198684] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0x132/0x170
[ 3103.198684] RIP: 0010:sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684] Code: 00 e9 a1 c1 40 ff 48 c7 c7 48 91 94 82 e8 99 29 00 00 48 c7 c7 00 5e 53 83 e9 e3 10 50 ff 48 c7 c7 98 91 94 82 e8 4f ec ff ff <0f> 0b 44 8b ab 10 0a 00 00 8b 4b 04 48 c7 c6 cd 37 93 82 48 c7 c7
[ 3103.198684] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010086
[ 3103.198684] RAX: 0000000000000000 RBX: ffff88813bcaa280 RCX: 0000000000000000
[ 3103.198684] RDX: 0000000000000003 RSI: ffffffff82953971 RDI: 00000000ffffffff
[ 3103.198684] RBP: 0000000000000082 R08: 00000000000021ed R09: ffffc900000dbd38
[ 3103.198684] R10: 0000000000000001 R11: ffffffffffffffff R12: 0000000000000060
[ 3103.198684] R13: ffffffff810a9040 R14: ffffffff82c555c0 R15: 0000000000000000
[ 3103.198684] FS: 0000000000000000(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
[ 3103.198684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3103.198684] CR2: 00007f85acd18010 CR3: 0000000102578000 CR4: 0000000000350ee0
[ 3103.198684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3103.198684] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3103.198684] Call Trace:
[ 3103.198684]
[ 3103.198684] ? sched_cpu_wait_empty+0x60/0x60
[ 3103.198684] cpuhp_invoke_callback+0x3a4/0x5f0
[ 3103.198684] take_cpu_down+0x71/0xd0
[ 3103.198684] multi_cpu_stop+0x5c/0xf0
[ 3103.198684] ? stop_machine_yield+0x10/0x10
[ 3103.198684] cpu_stopper_thread+0x82/0x130
[ 3103.198684] smpboot_thread_fn+0x1bb/0x2b0
[ 3103.198684] ? sort_range+0x20/0x20
[ 3103.198684] kthread+0xfe/0x120
[ 3103.198684] ? kthread_complete_and_exit+0x20/0x20
[ 3103.198684] ret_from_fork+0x1f/0x30
[ 3103.198684]
[ 3103.198684] Kernel panic - not syncing: panic_on_warn set ...
[ 3103.198684] CPU: 1 PID: 23 Comm: migration/1 Tainted: G O 5.19.0-rc8-rt8 #1
[ 3103.198684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3103.198684] Stopper: multi_cpu_stop+0x0/0xf0 <- stop_machine_cpuslocked+0x132/0x170
[ 3103.198684] Call Trace:
[ 3103.198684]
[ 3103.198684] dump_stack_lvl+0x34/0x48
[ 3103.198684] panic+0xf8/0x299
[ 3103.198684] ? sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684] __warn.cold+0x43/0xba
[ 3103.198684] ? sched_cpu_dying.cold+0xc/0xc3
[ 3103.198684] report_bug+0x9d/0xc0
[ 3103.198684] handle_bug+0x3c/0x70
[ 3103.198684] exc_invalid_op+0x14/0x70
[ 3103.198684] asm_exc_invalid_op+0x16/0x20
[ 3103.198684] RIP: 0010:sched_cpu_dying.cold+0xc/0xc3

--
BRs
Schspa Shi