Date: Fri, 17 Jun 2022 14:43:26 +0100
From: Mel Gorman
To: Waiman Long
Cc: Zhenhua Ma, Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
 LKML, Michal Hocko
Subject: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"
Message-ID: <20220617134325.GC30825@techsingularity.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="lrZ03NoBR/3+SXJZ"
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

--lrZ03NoBR/3+SXJZ
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline

Hi Waiman,

I've received reports of lockups happening in kernels that include commit
d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent").
The exact symptoms vary, but usually it is one of: a soft lockup (on an
older kernel with a backport), the task hanging and never exiting, or the
machine becoming generally unresponsive with ssh broken.

The problem started in 5.16 and was reliably bisected to commit
d257cc8cb8d5. With the patch reverted, 5.16, 5.17 and 5.18 all finish the
test successfully, but I didn't test a revert on 5.19-rc2 because of other
changes layered on top.

The reproducer is simple: start pairs of CPU hogs, each pair pinned to the
same CPU with different SCHED_RR priorities, that run for a few seconds. It
does not hit every time but usually happens within 10 attempts.

On 5.16 at least, the tasks failed to exit and kept retrying the exit
through the following path:

[<0>] rwsem_down_write_slowpath+0x2ad/0x580
[<0>] unlink_file_vma+0x2c/0x50
[<0>] free_pgtables+0xbe/0x110
[<0>] exit_mmap+0xc1/0x220
[<0>] mmput+0x52/0x110
[<0>] do_exit+0x2ec/0xb00
[<0>] do_group_exit+0x2d/0x90
[<0>] get_signal+0xb6/0x920
[<0>] arch_do_signal_or_restart+0xba/0x700
[<0>] exit_to_user_mode_prepare+0xb7/0x230
[<0>] irqentry_exit_to_user_mode+0x5/0x20
[<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[<0>] preempt_schedule_thunk+0x16/0x18
[<0>] rwsem_down_write_slowpath+0x2ad/0x580
[<0>] unlink_file_vma+0x2c/0x50
[<0>] free_pgtables+0xbe/0x110
[<0>] exit_mmap+0xc1/0x220
[<0>] mmput+0x52/0x110
[<0>] do_exit+0x2ec/0xb00
[<0>] do_group_exit+0x2d/0x90
[<0>] get_signal+0xb6/0x920
[<0>] arch_do_signal_or_restart+0xba/0x700
[<0>] exit_to_user_mode_prepare+0xb7/0x230
[<0>] irqentry_exit_to_user_mode+0x5/0x20
[<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20

The C file and shell script to run it are attached.

-- 
Mel Gorman
SUSE Labs

--lrZ03NoBR/3+SXJZ
Content-Type: text/x-c; charset=iso-8859-15
Content-Disposition: attachment; filename="fsim.c"

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* SIGALRM handler: end the CPU hog once the alarm fires. */
void sig_handle(int sig)
{
	exit(0);
}

int main(void)
{
	unsigned long c = 0;

	/* Spin for 10 seconds, then exit from the SIGALRM handler. */
	signal(SIGALRM, sig_handle);
	alarm(10);
	while (1)
		c++;
}

--lrZ03NoBR/3+SXJZ
Content-Type: application/x-sh
Content-Disposition: attachment; filename="run-fsim.sh"

#!/bin/bash

# Build the CPU hog on first use.
if [ ! -e fsim ]; then
	gcc -o fsim fsim.c
	if [ $? -ne 0 ]; then
		echo Failed to compile fsim
		exit 1
	fi
fi

MAX_ITERATIONS=100
# Range of CPUs to run a pair of hogs on.
CPU_MIN=10
CPU_MAX=30
for i in `seq 1 $MAX_ITERATIONS`; do
	echo "Start $i/$MAX_ITERATIONS: `date`"
	for CPU in `seq $CPU_MIN $CPU_MAX`; do
		# Two SCHED_RR hogs on the same CPU at different priorities.
		taskset -c $CPU chrt -r 10 ./fsim &>/dev/null &
		taskset -c $CPU chrt -r 40 ./fsim &>/dev/null &
	done
	echo "Wait $i/$MAX_ITERATIONS: `date`"
	wait
done

--lrZ03NoBR/3+SXJZ--