Subject: BUG: list_add corruption while doing migrate_swap -> balance_push
From: Kuyo Chang
Date: Tue, 6 Sep 2022 20:54:58 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

[Syndrome]
We hit a list_add corruption error on kernel 5.15. The log shows:

  list_add corruption. prev->next should be next (ffffff81a6f08ba0), but was 0000000000000000. (prev=ffffff81a6f05930).
The call trace is as follows:

  ipanic_die
  notify_die
  die
  bug_handler
  brk_handler
  do_debug_exception
  el1_dbg
  el1h_64_sync_handler
  el1h_64_sync
  __list_add_valid
  cpu_stop_queue_work
  stop_one_cpu_nowait
  balance_push
  __schedule
  schedule
  do_sched_yield
  __arm64_sys_sched_yield
  invoke_syscall
  el0_svc_common
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

[Analysis]
We examined the memory dump and the contents of the stopper->works list. The failing code flow is as follows:

migrate_swap
  ->stop_two_cpus
  ->cpu_stop_queue_two_works
  ->__cpu_stop_queue_work (adds each work->list to its respective stopper->works)
  ->list_add_tail(&work->list, &stopper->works);
  ->wake_up_q(&wakeq);
  ->wait_for_completion(&done.completion);
  ->wait_for_common
  ->schedule_timeout
  ->schedule

At this point, CPU hotplug is triggered. It registers the balance callback through the following flow:

cpu_down(cpuid)
  ->_cpu_down
  ->cpuhp_set_state()
  ->set_cpu_dying(cpuid, true)
  ->sched_cpu_deactivate
  ->balance_push_set(cpuid, true)
  ->rq->balance_callback = &balance_push_callback;

Finally:

__schedule
  ->__balance_callbacks
  ->do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
  ->balance_push
  ->stop_one_cpu_nowait

      *work_buf = (struct cpu_stop_work){
              .fn = fn,
              .arg = arg,
              .caller = _RET_IP_,
      };

At this point, the list_head pointers work_buf->list.next and work_buf->list.prev are reset to NULL. Members not named in the compound literal are zero-initialized.

  ->cpu_stop_queue_work
  ->__list_add_valid

As a result, it hits this check in __list_add_valid():

      CHECK_DATA_CORRUPTION(prev->next != next,
              "list_add corruption. prev->next should be next (%px), but was %px. (prev=%px).\n",
              next, prev->next, prev)

Do you have any suggestions for this issue?

Thank you.