From: Michael Ellerman
To: "Gautham R. Shenoy", Benjamin Herrenschmidt, Paul Mackerras, Breno Leitao, Vaidyanathan Srinivasan
Cc: "Gautham R. Shenoy", linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
Date: Mon, 22 Jul 2019 12:48:14 +1000 (AEST)
Message-Id: <45sQyk612Wz9sBF@ozlabs.org>
In-Reply-To: <1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com>
X-powerpc-patch-notification: thanks
X-powerpc-patch-commit: 4d202c8c8ed3822327285747db1765967110b274

On Wed, 2019-07-17 at 10:35:24 UTC, "Gautham R. Shenoy" wrote:
> From: "Gautham R. Shenoy"
>
> xive_find_target_in_mask() has the following for(;;) loop, which has a
> bug when @first == cpumask_first(@mask) and condition 1 fails to hold
> for every CPU in @mask. In this case we loop forever in the for-loop.
>
> 	first = cpu;
> 	for (;;) {
> 		if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
> 			return cpu;
> 		cpu = cpumask_next(cpu, mask);
> 		if (cpu == first) // condition 2
> 			break;
>
> 		if (cpu >= nr_cpu_ids) // condition 3
> 			cpu = cpumask_first(mask);
> 	}
>
> This is because, when @first == cpumask_first(@mask), we never hit
> condition 2 (cpu == first): prior to this check, we would have
> executed "cpu = cpumask_next(cpu, mask)", which sets @cpu to a value
> greater than @first or to nr_cpu_ids. When this is coupled with the
> fact that condition 1 is never met, we will never exit this loop.
>
> This was discovered by the hard-lockup detector while running an LTP
> test concurrently with SMT switch tests.
>
>  watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
>  watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
>  watchdog: CPU 68 Hard LOCKUP
>  watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
>  CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
>  NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
>  REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
>  MSR:  9000000002883033 CR: 24028424 XER: 00000000
>  CFAR: c0000000006f558c IRQMASK: 1
>  GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
>  GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
>  GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
>  GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
>  GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
>  GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
>  GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
>  GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
>  NIP [c0000000006f5578] find_next_bit+0x38/0x90
>  LR [c000000000cba9ec] cpumask_next+0x2c/0x50
>  Call Trace:
>  [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
>  [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
>  [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
>  [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
>  [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
>  [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
>  [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
>  [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
>  [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
>  [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
>  [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
>  [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
>  [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
>  [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
>  [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
>  [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
>  [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
>  Instruction dump:
>  78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
>  4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
>
> To fix this, move the check for condition 2 after the check for
> condition 3, so that we break out of the loop soon after iterating
> through all the CPUs in @mask in the problem case. Use a do..while()
> loop to achieve this.
>
> Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
> Cc: stable@vger.kernel.org # 4.12+
> Reported-by: Indira P. Joga
> Signed-off-by: Gautham R. Shenoy

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/4d202c8c8ed3822327285747db1765967110b274

cheers