Date: Tue, 21 Nov 2023 11:14:37 +0000
From: Mark Rutland
To: Peter Zijlstra
Cc: Kent Overstreet, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng, linux-kernel@vger.kernel.org
Subject: Re: lockdep + kasan bug?
References: <20231120233659.e36txv3fedbjn4sx@moria.home.lan> <20231121103614.GG8262@noisy.programming.kicks-ass.net>
In-Reply-To: <20231121103614.GG8262@noisy.programming.kicks-ass.net>

On Tue, Nov 21, 2023 at 11:36:14AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 20, 2023 at 06:36:59PM -0500, Kent Overstreet wrote:
> > I've been seeing a lot of reports like the following in a lot of my
> > lockdep + kasan tests.
>
> I'm not aware of any such issues, then again, I rarely run with KASAN
> enabled myself, I mostly leave that to the robots, who are far more
> patient than me with slow kernels.
>
> > Some lockdep patches are in my tree: they don't touch this code path
> > (except I do have to increase MAX_LOCK_DEPTH from 48 to 63, perhaps that
> > has unintended side effects?)
> >
> > https://evilpiepirate.org/git/bcachefs.git/log/?id=2f42f415f7573001b4f4887b785d8a8747b3757f
>
> yeah, don't see anything weird there. I mean, sad about the no-recursion
> thing, esp. after you did those custom order bits.
>
> > bcachefs does take a _large_ number of locks for lockdep to track, also
> > possibly relevant
> >
> > Have not dug into the lockdep hash table of outstanding locks code yet
> > but happy to test patches...
> >
> > 04752 ========= TEST tiering_variable_buckets_replicas
> > 04752
> > 04752 WATCHDOG 3600
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,promote_target=ssd,fsck
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): initializing new filesystem
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): going read-write
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): marking superblocks
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): initializing freespace
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): done initializing freespace
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): reading snapshots table
> > 04753 bcachefs (ea667958-8bbd-451b-9043-9132a2fd2fa4): reading snapshots done
> > 04753 WATCHDOG 3600
> > 04753 randrw: (g=0): rw=randrw, bs=(R) 4096B-1024KiB, (W) 4096B-1024KiB, (T) 4096B-1024KiB, ioengine=libaio, iodepth=64
> > 04753 fio-3.33
> > 04753 Starting 1 process
> > 04753 randrw: Laying out IO file (1 file / 3500MiB)
> > 05117 Jobs: 1 (f=1)
> > 05117 BUG: KASAN: global-out-of-bounds in add_chain_block+0x44/0x288
> > 05117 Read of size 4 at addr ffffffc081b7a8bc by task fio/120528
> > 05117
> > 05117 CPU: 11 PID: 120528 Comm: fio Tainted: G L 6.6.0-ktest-gc18b7260ddd3 #8209
> > 05117 Hardware name: linux,dummy-virt (DT)
> > 05117 Call trace:
> > 05117 dump_backtrace+0xa8/0xe8
> > 05117 show_stack+0x1c/0x30
> > 05117 dump_stack_lvl+0x5c/0xa0
> > 05117 print_report+0x1e4/0x5a0
> > 05117 kasan_report+0x80/0xc0
> > 05117 __asan_load4+0x90/0xb0
> > 05117 add_chain_block+0x44/0x288
> > 05117 __lock_acquire+0x1104/0x24f8
> > 05117 lock_acquire+0x1e0/0x470
> > 05117 _raw_spin_lock_nested+0x54/0x78
> > 05117 raw_spin_rq_lock_nested+0x30/0x50
> > 05117 try_to_wake_up+0x3b4/0x1050
> > 05117 wake_up_process+0x1c/0x30
> > 05117 kick_pool+0x104/0x1b0
> > 05117 __queue_work+0x350/0xa58
> > 05117 queue_work_on+0x98/0xd0
> > 05117 __bch2_btree_node_write+0xec0/0x10a0
> > 05117 bch2_btree_node_write+0x88/0x138
> > 05117 btree_split+0x744/0x14a0
> > 05117 bch2_btree_split_leaf+0x94/0x258
> > 05117 bch2_trans_commit_error.isra.0+0x234/0x7d0
> > 05117 __bch2_trans_commit+0x1128/0x3010
> > 05117 bch2_extent_update+0x410/0x570
> > 05117 bch2_write_index_default+0x404/0x598
> > 05117 __bch2_write_index+0xb0/0x3b0
> > 05117 __bch2_write+0x6f0/0x928
> > 05117 bch2_write+0x368/0x8e0
> > 05117 bch2_direct_write+0xaa8/0x12c0
> > 05117 bch2_write_iter+0x2e4/0x1050
> > 05117 aio_write.constprop.0+0x19c/0x420
> > 05117 io_submit_one.constprop.0+0xf30/0x17a0
> > 05117 __arm64_sys_io_submit+0x244/0x388
> > 05117 invoke_syscall.constprop.0+0x64/0x138
> > 05117 do_el0_svc+0x7c/0x120
> > 05117 el0_svc+0x34/0x80
> > 05117 el0t_64_sync_handler+0xb8/0xc0
> > 05117 el0t_64_sync+0x14c/0x150
> > 05117
> > 05117 The buggy address belongs to the variable:
> > 05117 nr_large_chain_blocks+0x3c/0x40
>
> This is weird, nr_large_chain_blocks is a single variable, if the
> compiler keeps layout according to the source file, this would be
> chain_block_buckets[14] or something weird like that.

I think the size here is bogus; IIUC that's determined from the start of the
next symbol, which happens to be 64 bytes away from the start of
nr_large_chain_blocks.

From the memory state dump, there's padding/redzone between two global objects,
and I think we're accessing a negative offset from the next object. More on
that below.

> Perhaps figure out what it thinks the @size argument to
> add_chain_block() would be?
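The symbol+offset arithmetic above can be checked with a few lines of Python. This is only a sketch: the addresses and the `+0x3c/0x40` pair are copied from the quoted report, while the idea that the next symbol is an array of 4-byte ints is an assumption made for illustration, and all variable names are hypothetical.

```python
# Values copied from the quoted KASAN report.
FAULT_ADDR = 0xffffffc081b7a8bc   # "Read of size 4 at addr ..."
SYM_OFFSET = 0x3c                 # from "nr_large_chain_blocks+0x3c/0x40"
SYM_EXTENT = 0x40                 # distance to the next symbol, not sizeof()
ACCESS_SIZE = 4                   # size of the faulting read, in bytes

# Start of the symbol the report attributes the access to.
sym_start = FAULT_ADDR - SYM_OFFSET

# Start of whatever symbol comes next in the image.
next_sym_start = sym_start + SYM_EXTENT

# Assumption: the next symbol is an array of 4-byte elements. Express the
# faulting address as an element index relative to that array.
index_from_next = (FAULT_ADDR - next_sym_start) // ACCESS_SIZE

print(hex(sym_start))
print(hex(next_sym_start))
print(index_from_next)
```

Under that assumption the read lands exactly one element before the start of the next object, which fits the "negative offset from the next object" reading.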
>
> > 05117
> > 05117 The buggy address belongs to the virtual mapping at
> > 05117 [ffffffc081710000, ffffffc088861000) created by:
> > 05117 paging_init+0x260/0x820
> > 05117
> > 05117 The buggy address belongs to the physical page:
> > 05117 page:00000000ce625900 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x41d7a
> > 05117 flags: 0x4000(reserved|zone=0)
> > 05117 page_type: 0xffffffff()
> > 05117 raw: 0000000000004000 fffffffe00075e88 fffffffe00075e88 0000000000000000
> > 05117 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > 05117 page dumped because: kasan: bad access detected
> > 05117
> > 05117 Memory state around the buggy address:
> > 05117  ffffffc081b7a780: 00 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9
> > 05117  ffffffc081b7a800: 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
> > 05117 >ffffffc081b7a880: 04 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
> > 05117                                         ^

In this dump:

* '00' means all 8 bytes of an 8-byte region are accessible
* '04' means the first 4 bytes of an 8-byte region are accessible
* 'f9' means KASAN_GLOBAL_REDZONE / padding between objects

So at 0xffffffc081b7a880 we have a 4-byte object, 60 bytes of padding, then a
64-byte object.

I think the 4-byte object at 0xffffffc081b7a880 is nr_large_chain_blocks, and
the later 64-byte object is chain_block_buckets[].

I suspect the dodgy access is to chain_block_buckets[-1], which hits the last 4
bytes of the redzone and gets (incorrectly/misleadingly) attributed to
nr_large_chain_blocks.

Mark.

> > 05117  ffffffc081b7a900: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9
> > 05117  ffffffc081b7a980: f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 00 f9 f9 f9
> > 05117 ==================================================================
> > 05117 Kernel panic - not syncing: kasan.fault=panic set ...
> > 05117 CPU: 11 PID: 120528 Comm: fio Tainted: G L 6.6.0-ktest-gc18b7260ddd3 #8209
> > 05117 Hardware name: linux,dummy-virt (DT)
> > 05117 Call trace:
> > 05117 dump_backtrace+0xa8/0xe8
> > 05117 show_stack+0x1c/0x30
> > 05117 dump_stack_lvl+0x5c/0xa0
> > 05117 dump_stack+0x18/0x20
> > 05117 panic+0x3ac/0x408
> > 05117 kasan_report_invalid_free+0x0/0x90
> > 05117 kasan_report+0x90/0xc0
> > 05117 __asan_load4+0x90/0xb0
> > 05117 add_chain_block+0x44/0x288
> > 05117 __lock_acquire+0x1104/0x24f8
> > 05117 lock_acquire+0x1e0/0x470
> > 05117 _raw_spin_lock_nested+0x54/0x78
> > 05117 raw_spin_rq_lock_nested+0x30/0x50
> > 05117 try_to_wake_up+0x3b4/0x1050
> > 05117 wake_up_process+0x1c/0x30
> > 05117 kick_pool+0x104/0x1b0
> > 05117 __queue_work+0x350/0xa58
> > 05117 queue_work_on+0x98/0xd0
> > 05117 __bch2_btree_node_write+0xec0/0x10a0
> > 05117 bch2_btree_node_write+0x88/0x138
> > 05117 btree_split+0x744/0x14a0
> > 05117 bch2_btree_split_leaf+0x94/0x258
> > 05117 bch2_trans_commit_error.isra.0+0x234/0x7d0
> > 05117 __bch2_trans_commit+0x1128/0x3010
> > 05117 bch2_extent_update+0x410/0x570
> > 05117 bch2_write_index_default+0x404/0x598
> > 05117 __bch2_write_index+0xb0/0x3b0
> > 05117 __bch2_write+0x6f0/0x928
> > 05117 bch2_write+0x368/0x8e0
> > 05117 bch2_direct_write+0xaa8/0x12c0
> > 05117 bch2_write_iter+0x2e4/0x1050
> > 05117 aio_write.constprop.0+0x19c/0x420
> > 05117 io_submit_one.constprop.0+0xf30/0x17a0
> > 05117 __arm64_sys_io_submit+0x244/0x388
> > 05117 invoke_syscall.constprop.0+0x64/0x138
> > 05117 do_el0_svc+0x7c/0x120
> > 05117 el0_svc+0x34/0x80
> > 05117 el0t_64_sync_handler+0xb8/0xc0
> > 05117 el0t_64_sync+0x14c/0x150
> > 05117 SMP: stopping secondary CPUs
> > 05117 Kernel Offset: disabled
> > 05117 CPU features: 0x0,00000000,70000001,1040500b
> > 05117 Memory Limit: none
> > 05117 ---[ end Kernel panic - not syncing: kasan.fault=panic set ... ]---
> > 05122 ========= FAILED TIMEOUT tiering_variable_buckets_replicas in 3600s
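
The reading of the memory state dump given earlier in this reply (4-byte object, 60 bytes of padding, 64-byte object) can be reproduced mechanically. The sketch below is an illustration, not kernel code: it assumes generic KASAN with one shadow byte per 8 bytes of memory, takes the shadow row and addresses from the quoted report, and simplifies by treating every shadow value of 8 or more (such as f9) as fully inaccessible. The function and variable names are hypothetical.

```python
GRANULE = 8  # bytes of memory covered by one shadow byte (generic KASAN)


def decode_row(row_base, shadow):
    """Turn one row of shadow bytes into merged (start, length, kind) regions."""
    regions = []
    for i, s in enumerate(shadow):
        start = row_base + i * GRANULE
        if s == 0x00:
            # Whole granule is accessible.
            parts = [(start, GRANULE, "accessible")]
        elif s < GRANULE:
            # Only the first s bytes of the granule are accessible.
            parts = [(start, s, "accessible"),
                     (start + s, GRANULE - s, "inaccessible")]
        else:
            # Simplification: any other value (e.g. 0xf9) is a redzone.
            parts = [(start, GRANULE, "inaccessible")]
        for part in parts:
            # Parts are emitted in address order, so equal kinds are adjacent.
            if regions and regions[-1][2] == part[2]:
                prev_start, prev_len, kind = regions.pop()
                part = (prev_start, prev_len + part[1], kind)
            regions.append(part)
    return regions


# The row marked with '>' in the quoted dump, and the faulting address.
ROW_BASE = 0xffffffc081b7a880
ROW = [0x04] + [0xf9] * 7 + [0x00] * 8
FAULT_ADDR = 0xffffffc081b7a8bc

regions = decode_row(ROW_BASE, ROW)
for start, length, kind in regions:
    print(f"{start:#x} {length:3d} {kind}")

# Which shadow byte in the row describes the faulting address?
fault_index = (FAULT_ADDR - ROW_BASE) // GRANULE
print(fault_index, hex(ROW[fault_index]))
```

With these inputs the faulting address falls in the last granule of the inaccessible run, immediately before the 64-byte object begins.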