Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp5320586rwp; Mon, 17 Jul 2023 01:47:15 -0700 (PDT) X-Google-Smtp-Source: APBJJlGIPttHDrU79bYupgEcShFReGXCxYsJ7cTPorTQgi+l20mQdYSIix9/Q90MaNK7G97YNha1 X-Received: by 2002:a17:907:9143:b0:992:47d7:35d7 with SMTP id l3-20020a170907914300b0099247d735d7mr8899933ejs.14.1689583635326; Mon, 17 Jul 2023 01:47:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689583635; cv=none; d=google.com; s=arc-20160816; b=NpFi6/nXuYB9UknhHiW5kNo77K6Ed3XaEKDSfYlykK5VlES9efOxLsKlfsHokHDDfN v1JBYaawMjfd0tvmDTnzHWqBaCvg6Bs5QfTd2mfl8xp4qNjlLkLslE0IzO3KW/6oK3+H sKthPw9jzquDq/p0F7Og3jYqLWCR9YqTkZVARDFnxk+uGorLURuDaYwAfPp60lbS2ttA oyhnWMI+tpGx95Rx8ANrq5zLqbLpEmUWhPmooMtTkCPWsoigU70V38eWJtA4n+HqOfth YCFOvvWmkPUdS8t4NOsN6eoaHKw7qYOgsZqQ3iRljFSJWLOX58SQk2hiGjaCkokPEkZ6 9EYg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-disposition:mime-version:message-id :subject:cc:to:from:date; bh=Xr4ecN3F4joY6GJjUDIRxbVhlQpJwhdfMJ0KGno52lc=; fh=17D6jbjtaxkEfb2R31p7DxYTlw69nLAALFTXrCdQVw0=; b=wHt+GNE9XgSB6Fkfoq46WPmdE4Efg+tGgEkUVOIDfkDrN1Z1P6LJqOdzD7oE7Mr3j4 Jn6YOO7IgCUPwhxrd3xl7LaSrd1wjJDbRD4Xpa7V9GLBWFgaAXC1WlacXcCUH046EmVJ pfudJK0fKsl6u0clDWxfjJkyL3OShW8ByFR45p8DLMk9wGIfG+ouhm0ZTNdJorx3pWWa oOVOzkAYvLbDMD6zEJxeoBUREqsEmEvpahvoI0jf5VH2yAeyAltBRJGQOz7RGa9f0frv m2Ul1Ja0ZOQuaO2qwjKSEeSHwm97W1oNQjbCI8ZCwFOrm6SNw5oXHwfl8aqWkEeG/oQm 0H0w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s20-20020a170906061400b0098d2f718ac3si12107112ejb.609.2023.07.17.01.46.45; Mon, 17 Jul 2023 01:47:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230352AbjGQIPM (ORCPT + 99 others); Mon, 17 Jul 2023 04:15:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231292AbjGQIPJ (ORCPT ); Mon, 17 Jul 2023 04:15:09 -0400 X-Greylist: delayed 459 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Mon, 17 Jul 2023 01:15:05 PDT Received: from mail.a-eberle.de (mail.a-eberle.de [213.95.140.213]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 783811AA for ; Mon, 17 Jul 2023 01:15:05 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.a-eberle.de (Postfix) with ESMTP id B8A8C38049D for ; Mon, 17 Jul 2023 10:07:10 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at aeberle-mx.softwerk.noris.de Received: from mail.a-eberle.de ([127.0.0.1]) by localhost (ebl-mx-02.a-eberle.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dnwL5IOpbizT for ; Mon, 17 Jul 2023 10:07:08 +0200 (CEST) Received: from gateway.a-eberle.de (unknown [178.15.155.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "sg310.eberle.local", Issuer "A. Eberle GmbH & Co. KG WebAdmin CA" (not verified)) (Authenticated sender: postmaster@a-eberle.de) by mail.a-eberle.de (Postfix) with ESMTPSA for ; Mon, 17 Jul 2023 10:07:08 +0200 (CEST) Received: from svr-exch2019.eberle.local ([192.168.1.17]:9494 helo=webmail.a-eberle.de) by gateway.a-eberle.de with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1qLJG5-0007xJ-2e; Mon, 17 Jul 2023 10:07:05 +0200 Received: from PC-LX-Wurm (10.10.30.11) by Svr-Exch2019.eberle.local (192.168.1.17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.42; Mon, 17 Jul 2023 10:07:05 +0200 Date: Mon, 17 Jul 2023 10:07:04 +0200 From: Stephan Wurm To: Richard Weinberger , Miquel Raynal , Vignesh Raghavendra , , CC: Johannes Eigner Subject: ubiblock: null pointer dereference using scatterlist in work_queue Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline X-Originating-IP: [10.10.30.11] X-ClientProxiedBy: Svr-Exch2019.eberle.local (192.168.1.17) To Svr-Exch2019.eberle.local (192.168.1.17) X-EsetResult: clean, is OK X-EsetId: 37303A298E357D5B677565 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello! We struggle with a weird problem accessing an ubiblock device for quite some time now. As we are running out of ideas, I reach out for your help. The device we develop has an i.MX6 Solo SoC equipped with gpmi-nand flash (Micron MT29F8G08ABACA, 1GB). The flash is partitioned into five UBI volumes, forming an A/B boot setup, consisting of two pairs with each a UBIFS volume holding a FIT image and a ubiblock volume containing a SQUASHFS root filesystem with dm-verity hashes attached, and the fifth UBIFS volume for common data. The bootloader (barebox 2022.06.0) is stored on an additional spi-nor flash. In order to mount the root filesystem, the FIT image contains an initramfs, calling cryptsetup (tried v2.4.3 and v2.6.1) on the ubiblock device, followed by mount of the root filesystem. MTD attach and creation of the ubiblock device is triggered by kernel arguments: commandline: console=ttymxc1,115200n8 ubi.block=0,root0 root=/dev/ubiblock0_2 rootfstype=squashfs ubi.mtd=nand This fails to boot with a null pointer dereference when calling cryptsetup: [ 5.637207] ubi0: default fastmap pool size: 200 [ 5.642020] ubi0: default fastmap WL pool size: 100 [ 5.647000] ubi0: attaching mtd0 [ 6.450307] ubi0: scanning is finished [ 6.472816] ubi0: attached mtd0 (name "nand", size 1024 MiB) [ 6.478561] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes [ 6.485508] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096 [ 6.492395] ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192 [ 6.499378] ubi0: good PEBs: 4092, bad PEBs: 4, corrupted PEBs: 0 [ 6.505518] ubi0: user volume: 5, internal volumes: 1, max. volumes count: 128 [ 6.512810] ubi0: max/mean erase counter: 14/8, WL threshold: 4096, image sequence number: 228025450 [ 6.521965] ubi0: available PEBs: 177, total reserved PEBs: 3915, PEBs reserved for bad PEB handling: 76 [ 6.531750] ubi0: background thread "ubi_bgt0d" started, PID 163 [ 12.919477] block ubiblock0_2: created from ubi0:2(root0) [ 12.929687] ALSA device list: [ 12.932750] #0: sgtl5000 [ 12.940024] Freeing unused kernel image (initmem) memory: 1024K [ 12.946926] Run /init as init process Starting version 250.5+ [ 15.601749] mtdblock: MTD device 'nand' is NAND, please consider using UBI block devices instead. realpath: /dev/disk/by-partuuid//dev/ubiblock0_2: No such file or directory [ 26.127460] 8<--- cut here --- [ 26.130689] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 26.138886] [00000000] *pgd=00000000 [ 26.142523] Internal error: Oops: 5 [#1] ARM [ 26.146804] Modules linked in: [ 26.149868] CPU: 0 PID: 18 Comm: kworker/0:3 Not tainted 6.1.38 #1 [ 26.156060] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 26.162592] Workqueue: ubiblock0_2 ubiblock_do_work [ 26.167498] PC is at ubi_io_read+0x78/0x2fc [ 26.171693] LR is at ubi_eba_read_leb+0xe8/0x4a4 [ 26.176320] pc : [] lr : [] psr: 60070013 [ 26.182591] sp : f0961dc8 ip : 00000000 fp : 00002000 [ 26.187819] r10: c81c6000 r9 : 00000000 r8 : c81c7000 [ 26.193046] r7 : 00000200 r6 : 000008e0 r5 : 00002000 r4 : 00000000 [ 26.199578] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : c81c6000 [ 26.206108] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 26.213250] Control: 10c53c7d Table: 1824c059 DAC: 00000051 [ 26.218998] Register r0 information: slab kmalloc-8k start c81c6000 pointer offset 0 size 8192 [ 26.227638] Register r1 information: NULL pointer [ 26.232352] Register r2 information: NULL pointer [ 26.237062] Register r3 information: NULL pointer [ 26.241772] Register r4 information: NULL pointer [ 26.246481] Register r5 information: non-paged memory [ 26.251537] Register r6 information: non-paged memory [ 26.256594] Register r7 information: non-paged memory [ 26.261651] Register r8 information: slab kmalloc-8k start c81c6000 pointer offset 4096 size 8192 [ 26.270545] Register r9 information: NULL pointer [ 26.275254] Register r10 information: slab kmalloc-8k start c81c6000 pointer offset 0 size 8192 [ 26.283977] Register r11 information: non-paged memory [ 26.289120] Register r12 information: NULL pointer [ 26.293916] Process kworker/0:3 (pid: 18, stack limit = 0x1a047662) [ 26.300194] Stack: (0xf0961dc8 to 0xf0962000) [ 26.304559] 1dc0: c1d50ec0 c075cc64 00000001 00000000 00000000 c1d50ec0 [ 26.312744] 1de0: 00000001 24c676fe c075cc64 00000000 000008e0 c81c6000 c80c4000 00000000 [ 26.320928] 1e00: c1d50ec0 00000002 c81c7000 c075cd0c 00000200 c01d3930 60070013 00000000 [ 26.329111] 1e20: 60070013 c1b43208 000008e0 24c676fe 00000000 c830f5cc c830f5cc 00000200 [ 26.337294] 1e40: 00000000 00000001 00000200 c80c4000 00000000 c075d124 00000000 00000200 [ 26.345477] 1e60: 00000000 c13e0ec0 c81c6000 c1d50ec0 00000000 00000000 00000000 c80c4000 [ 26.353661] 1e80: c830f5cc c1e52c00 00000200 c81c6000 00000002 c075b748 00000000 00000200 [ 26.361844] 1ea0: 00000000 c1312e58 ef7d3c00 0003e000 00000000 00000200 ef7d3c00 c1e52c00 [ 26.370027] 1ec0: c830f5a8 c830f5cc 00000000 c076b228 00000200 00000000 c830f500 c1d50ec0 [ 26.378211] 1ee0: c830f5d4 24c676fe c0c0e608 c830f5a8 c1ed3800 c1312e58 ef7d3c00 c1d50ec0 [ 26.386394] 1f00: 00000000 c14e59c0 ef7d3c05 c013c06c 00000001 00000000 c013bff4 00000000 [ 26.394577] 1f20: c1312e68 24c676fe c1b5f698 c19642b8 00000000 c101b390 00000000 24c676fe [ 26.402760] 1f40: c1d50ec0 c1ed3800 c1312e58 c1ed3818 c1312e94 c13dfc30 c1d50ec0 00000008 [ 26.410944] 1f60: c1312e58 c013c3f0 00000000 c1f16880 c1d50ec0 c013c3b0 c1ed3800 c1ed3880 [ 26.419126] 1f80: f095de9c 00000000 00000000 c0143654 c1f16880 c0143588 00000000 00000000 [ 26.427309] 1fa0: 00000000 00000000 00000000 c0100128 00000000 00000000 00000000 00000000 [ 26.435491] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 26.443673] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [ 26.451855] ubi_io_read from ubi_eba_read_leb+0xe8/0x4a4 [ 26.457277] ubi_eba_read_leb from ubi_eba_read_leb_sg+0x5c/0x154 [ 26.463390] ubi_eba_read_leb_sg from ubi_leb_read_sg+0x70/0xb0 [ 26.469325] ubi_leb_read_sg from ubiblock_do_work+0x104/0x238 [ 26.475180] ubiblock_do_work from process_one_work+0x238/0x57c [ 26.481130] process_one_work from worker_thread+0x40/0x4f8 [ 26.486724] worker_thread from kthread+0xcc/0xf0 [ 26.491449] kthread from ret_from_fork+0x14/0x2c [ 26.496168] Exception stack(0xf0961fb0 to 0xf0961ff8) [ 26.501225] 1fa0: 00000000 00000000 00000000 00000000 [ 26.509410] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 26.517592] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 26.524215] Code: 1a000041 e59d2010 e1a09fc5 e1a0b005 (e5d23000) [ 26.530367] ---[ end trace 0000000000000000 ]--- This kernel Oops happens both on patched or vanilla kernels, of which we tried several releases in 5.15.y and 6.1.y branches. We also tried with several of our devices. Upgrading to the latest mainline kernel did not work out-of-the-box, hence we did not follow that track (yet). We tried with full debugging output for the whole ubi driver and used several additional printks. That way we were able to see, that the first scatterlist entry already pointed to virtual address zero when the first read request caused by cryptsetup was added to ubiblock's work_queue. sg_virt(pdu->usgl.sg[0]) => page_address(sg) => 0x0000000 We also tried to use hw_breakpoints to gather more information on (maybe) another module interfering, but did not succeed (yet). But we were not able to narrow down the root cause until now. As an additional twist, the system is able to boot when we mount the ubiblock root filesystem without calling cryptsetup, hence skipping the dm-verity hash verification. And we can verify the root filesystem with cryptsetup once the system boot is finished. It is also possible to boot the same system image, including dm-verity, when using a sdcard instead of the nand flash. Loading the FIT from sdcard but calling cryptsetup on the ubiblock device again leads to the described oops. Is there something we have overlooked? Do you have further ideas to get behind this issue? Thanks! -- Stephan Wurm