Date: Mon, 11 Oct 2021 16:42:29 +0200
From: Andrea Righi
To: Marco Elver
Cc: Dmitry Vyukov, Alexander Potapenko, kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: BUG: soft lockup in __kmalloc_node() with KFENCE enabled
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 11, 2021 at 12:03:52PM +0200, Marco Elver wrote:
> On Mon, 11 Oct 2021 at 11:53, Andrea Righi wrote:
> > On Mon, Oct 11, 2021 at 11:23:32AM +0200, Andrea Righi wrote:
> > ...
> > > > You seem to use the default 20s stall timeout. FWIW syzbot uses 160
> > > > secs timeout for TCG emulation to avoid false positive warnings:
> > > > https://github.com/google/syzkaller/blob/838e7e2cd9228583ca33c49a39aea4d863d3e36d/dashboard/config/linux/upstream-arm64-kasan.config#L509
> > > > There are a number of other timeouts raised as well, some as high as
> > > > 420 seconds.
> > >
> > > I see, I'll try with these settings and see if I can still hit the soft
> > > lockup messages.
> >
> > Still getting soft lockup messages even with the new timeout settings:
> >
> > [ 462.663766] watchdog: BUG: soft lockup - CPU#2 stuck for 430s! [systemd-udevd:168]
> > [ 462.755758] watchdog: BUG: soft lockup - CPU#3 stuck for 430s! [systemd-udevd:171]
> > [ 924.663765] watchdog: BUG: soft lockup - CPU#2 stuck for 861s! [systemd-udevd:168]
> > [ 924.755767] watchdog: BUG: soft lockup - CPU#3 stuck for 861s! [systemd-udevd:171]
>
> The lockups are expected if you're hitting the TCG bug I linked. Try
> to pass '-enable-kvm' to the inner qemu instance (my bad if you
> already have), assuming that's somehow easy to do.

If I add '-enable-kvm' I can trigger other random panics (almost
immediately), like this one for example:

[21383.189976] BUG: kernel NULL pointer dereference, address: 0000000000000098
[21383.190633] #PF: supervisor read access in kernel mode
[21383.191072] #PF: error_code(0x0000) - not-present page
[21383.191529] PGD 0 P4D 0
[21383.191771] Oops: 0000 [#1] SMP NOPTI
[21383.192113] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15-rc4
[21383.192757] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[21383.193414] RIP: 0010:wb_timer_fn+0x44/0x3c0
[21383.193855] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
[21383.195366] RSP: 0018:ffffbcd140003e68 EFLAGS: 00010246
[21383.195842] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[21383.196425] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9a3521f4fd80
[21383.197010] RBP: ffffbcd140003e90 R08: 0000000000000000 R09: 0000000000000000
[21383.197594] R10: 0000000000000004 R11: 000000000000000f R12: ffff9a34c75c4900
[21383.198178] R13: ffff9a34c3906de0 R14: 0000000000000000 R15: ffff9a353dc18c00
[21383.198763] FS: 0000000000000000(0000) GS:ffff9a353dc00000(0000) knlGS:0000000000000000
[21383.199558] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21383.200212] CR2: 0000000000000098 CR3: 0000000005f54000 CR4: 00000000000006f0
[21383.200930] Call Trace:
[21383.201210]
[21383.201461] ? blk_stat_free_callback_rcu+0x30/0x30
[21383.202692] blk_stat_timer_fn+0x138/0x140
[21383.203180] call_timer_fn+0x2b/0x100
[21383.203666] __run_timers.part.0+0x1d1/0x240
[21383.204227] ? kvm_clock_get_cycles+0x11/0x20
[21383.204815] ? ktime_get+0x3e/0xa0
[21383.205309] ? native_apic_msr_write+0x2c/0x30
[21383.205914] ? lapic_next_event+0x20/0x30
[21383.206412] ? clockevents_program_event+0x94/0xf0
[21383.206873] run_timer_softirq+0x2a/0x50
[21383.207260] __do_softirq+0xcb/0x26f
[21383.207647] irq_exit_rcu+0x8c/0xb0
[21383.208010] sysvec_apic_timer_interrupt+0x7c/0x90
[21383.208464]
[21383.208713] asm_sysvec_apic_timer_interrupt+0x12/0x20

I think that systemd autotest used to use -enable-kvm, but then they
removed it because it was introducing too many problems in the nested
KVM context. I'm not sure about the nature of those problems, though. I
can investigate a bit and see if I can understand what they were exactly.

-Andrea
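
For anyone reproducing the '-enable-kvm' case, a quick sanity check of the KVM setup may help before starting the inner qemu. The following is only a rough sketch, not something taken from this thread: it assumes the usual /dev/kvm device node and the standard `nested` module parameters of kvm_intel and kvm_amd under sysfs, and the helper names (`nested_enabled`, `kvm_status`) are made up for illustration. Where it is run matters: /dev/kvm is interesting in the environment that launches the inner qemu, while the `nested` parameter is normally read on the machine one level below.

```python
from pathlib import Path

# Standard sysfs locations of the "nested" module parameter.
# kvm_intel reports Y/N, kvm_amd reports 1/0.
NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def nested_enabled(raw: str) -> bool:
    """Interpret a sysfs 'nested' value: 'Y' or '1' means enabled."""
    return raw.strip() in ("Y", "y", "1")


def kvm_status(dev=Path("/dev/kvm"), params=NESTED_PARAMS) -> dict:
    """Report whether /dev/kvm exists and whether nesting is enabled.

    'nested' stays None when neither KVM vendor module is loaded.
    """
    status = {"dev_kvm": dev.exists(), "nested": None}
    for param in params:
        if param.exists():
            status["nested"] = nested_enabled(param.read_text())
            break
    return status


if __name__ == "__main__":
    print(kvm_status())
```

If `dev_kvm` is False where the inner qemu is started, '-enable-kvm' cannot work there at all, which is a different failure from the panics above.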