Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EA747C433F5
	for ; Wed, 17 Nov 2021 12:38:44 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CDEAE61B95
	for ; Wed, 17 Nov 2021 12:38:44 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S235559AbhKQMll (ORCPT ); Wed, 17 Nov 2021 07:41:41 -0500
Received: from mga06.intel.com ([134.134.136.31]:48388 "EHLO mga06.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S233191AbhKQMlj (ORCPT ); Wed, 17 Nov 2021 07:41:39 -0500
X-IronPort-AV: E=McAfee;i="6200,9189,10170"; a="294754472"
X-IronPort-AV: E=Sophos;i="5.87,241,1631602800"; d="scan'208";a="294754472"
Received: from orsmga004.jf.intel.com ([10.7.209.38])
	by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	17 Nov 2021 04:38:40 -0800
X-IronPort-AV: E=Sophos;i="5.87,241,1631602800"; d="scan'208";a="604717454"
Received: from smile.fi.intel.com ([10.237.72.184])
	by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	17 Nov 2021 04:38:38 -0800
Received: from andy by smile.fi.intel.com with local (Exim 4.95)
	(envelope-from ) id 1mnKCs-007n5y-F1; Wed, 17 Nov 2021 14:38:30 +0200
Date: Wed, 17 Nov 2021 14:38:30 +0200
From: Andy Shevchenko
To: Hans de Goede, sakari.ailus@linux.intel.com
Cc: Daniel Scally, kernel test robot, lkp@lists.01.org, lkp@intel.com,
	linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org,
	rafael@kernel.org
Subject: Re: [device property] 995fe757ec:
	BUG:kernel_NULL_pointer_dereference,address
Message-ID:
References: <20211116074104.GC32102@xsang-OptiPlex-9020>
	<606a6bf2-e971-ddfe-74b0-cbc2b76935ba@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Just realized we are discussing this w/o Sakari involved.

On Wed, Nov 17, 2021 at 12:54:51PM +0100, Hans de Goede wrote:
> On 11/17/21 01:10, Daniel Scally wrote:
> > On 16/11/2021 16:59, Andy Shevchenko wrote:
> >> On Tue, Nov 16, 2021 at 03:55:00PM +0100, Hans de Goede wrote:
> >>> On 11/16/21 08:41, kernel test robot wrote:
> >>>> FYI, we noticed the following commit (built with gcc-9):
> >>>>
> >>>> commit: 995fe757ecaeac44e023458af64d27655f9dbf73 ("[PATCH] device property: Check fwnode->secondary when finding properties")
> >>>> url: https://github.com/0day-ci/linux/commits/Daniel-Scally/device-property-Check-fwnode-secondary-when-finding-properties/20211114-044259
> >>>> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/driver-core.git b5013d084e03e82ceeab4db8ae8ceeaebe76b0eb
> >>>> patch link: https://lore.kernel.org/lkml/20211113204141.520924-1-djrscally@gmail.com
> >>>>
> >>>> in testcase: boot
> >>>>
> >>>> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
> >>>>
> >>>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >>>>
> >>>>
> >>>> +---------------------------------------------+------------+------------+
> >>>> |                                             | b5013d084e | 995fe757ec |
> >>>> +---------------------------------------------+------------+------------+
> >>>> | boot_successes                              | 23         | 0          |
> >>>> | boot_failures                               | 0          | 22         |
> >>>> | BUG:kernel_NULL_pointer_dereference,address | 0          | 22         |
> >>>> | Oops:#[##]                                  | 0          | 22         |
> >>>> | EIP:fwnode_property_get_reference_args      | 0          | 22         |
> >>>> | Kernel_panic-not_syncing:Fatal_exception    | 0          | 22         |
> >>>> +---------------------------------------------+------------+------------+
> >>>>
> >>>>
> >>>> If you fix the issue, kindly add following tag
> >>>> Reported-by: kernel test robot
> >>> Ok, so this patch likely needs a v2 which changes the if to this:
> >>>
> >>> 	if (ret == -EINVAL && !IS_ERR_OR_NULL(fwnode) &&
> >>> 	    !IS_ERR_OR_NULL(fwnode->secondary))
> >>> 		ret = fwnode_call_int_op(fwnode->secondary, get_reference_args,
> >>> 					 prop, nargs_prop, nargs, index, args);
> >>>
> >>>
> >>> So that we check fwnode before dereferencing it, note this also changes the
> >>> (ret < 0) check to (ret == -EINVAL), this makes the secondary node handling
> >>> identical to fwnode_property_read_int_array() and
> >>> fwnode_property_read_string_array()
> >>>
> >>> Danny, can you send a v2 with this change please?
> >> Hmm... So, you are suggesting that we need to check it only for EINVAL and
> >> ENOENT in this case the one that brings us to the NULL pointer dereference.
> >> But I don't understand what's the difference here.
> >
> >
> > Sticking point; the ACPI version of .get_reference_args() returns
> > -ENOENT (converted from -EINVAL [1]) if the property you ask for doesn't
> > exist against that fwnode, which unless I'm missing something means this
> > won't work in our use case. This confused me for a while because we
> > definitely call fwnode_property_read_int_array() in sensor driver probes
> > through v4l2_fwnode_endpoint_alloc_parse(), but it turns out the ACPI
> > version of _that_ operation has no matching conversion of the error
> > code, so when that fails to find the property it sends back -EINVAL and
> > so the form that exists in fwnode_property_read_int_array() currently
> > works fine.
> >
> >
> > We could align them all to if (ret < 0 && !IS_ERR_OR_NULL(fwnode) &&
> > !IS_ERR_OR_NULL(fwnode->secondary)). This is probably my preferred
> > option, because I can't really see why we'd only want to do the
> > secondary check on -EINVAL anyway - but maybe I miss something here.
> > Alternatively we can take Hans' suggestion so they all match the existing
> > code, but this means we have to handle that conversion first - I
> > couldn't see from a cursory look that any of the direct callers check
> > the value of the return beyond "is it 0?", but of course it could be
> > done somewhere in calls to the fwnode->ops->get_reference_args()
> > callback instead.
> >
> >
> > Thoughts?
>
> I missed that just checking for -EINVAL will not work for the ipu3 case
> (I did not test); in that case I think using "ret < 0" as check instead
> is probably fine for this patch.
>
> As for modifying the existing 2 code paths, IMHO it does make sense
> to try and preserve the error code (and not try the secondary fwnode)
> when the error is an error other than the one indicating the property
> is not there.
>
> So keeping those as -EINVAL is probably best and maybe for the
> fwnode_find_reference instead of (ret < 0) use:
> (ret == -EINVAL || ret == -ENOENT) ?

Last time Sakari did a great job of error code alignments between DT, ACPI,
and SW nodes. Not sure why the above slipped through the fingers.
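
To make the variants discussed above easier to compare, here is a minimal
user-space sketch of the secondary-node fallback. It is not kernel code and
not a proposed patch: struct mock_fwnode, lookup_ret, is_err_or_null(),
is_not_found() and mock_get_reference_args() are all hypothetical names made
up for illustration, and the pointer check only imitates what
IS_ERR_OR_NULL() does. It assumes a NULL node reports -EINVAL, which seems
consistent with ESI: ffffffea in the register dump quoted below.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Hypothetical stand-in for a firmware node; only what the sketch needs. */
struct mock_fwnode {
	struct mock_fwnode *secondary;
	int lookup_ret;	/* what this node's lookup would return */
};

/* User-space imitation of IS_ERR_OR_NULL(): NULL or an errno-encoded pointer. */
static int is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* "Property is not there", in either the -EINVAL or the -ENOENT flavour. */
static int is_not_found(int ret)
{
	return ret == -EINVAL || ret == -ENOENT;
}

static int mock_get_reference_args(const struct mock_fwnode *fwnode)
{
	int ret;

	/* Assumption: an invalid node yields -EINVAL without being touched. */
	ret = is_err_or_null(fwnode) ? -EINVAL : fwnode->lookup_ret;

	/*
	 * Check fwnode itself before reading fwnode->secondary, and fall
	 * back only when the primary lookup said "not found", so that any
	 * other error code is passed through unchanged.
	 */
	if (is_not_found(ret) && !is_err_or_null(fwnode) &&
	    !is_err_or_null(fwnode->secondary))
		ret = fwnode->secondary->lookup_ret;

	return ret;
}
```

With this shape a NULL node returns -EINVAL instead of faulting, a primary
-ENOENT is retried on the secondary node, and an unrelated error such as -EIO
is preserved. Replacing is_not_found(ret) with (ret < 0) gives the simpler
variant, at the cost of masking the primary node's error code.
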
> >>>> [ 17.327851][ T7] BUG: kernel NULL pointer dereference, address: 00000000
> >>>> [ 17.329758][ T7] #PF: supervisor read access in kernel mode
> >>>> [ 17.331371][ T7] #PF: error_code(0x0000) - not-present page
> >>>> [ 17.332992][ T7] *pde = 00000000
> >>>> [ 17.334107][ T7] Oops: 0000 [#1] PREEMPT
> >>>> [ 17.335310][ T7] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G S 5.15.0-11191-g995fe757ecae #1
> >>>> [ 17.338036][ T7] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> >>>> [ 17.340544][ T7] Workqueue: events_unbound deferred_probe_work_func
> >>>> [ 17.342291][ T7] EIP: fwnode_property_get_reference_args (drivers/base/property.c:486 (discriminator 1))
> >>>> [ 17.344051][ T7] Code: 8b 45 0c 50 8b 45 08 50 89 d8 89 55 f4 ff d6 83 c4 0c 89 c6 85 c0 78 55 8d 65 f8 89 f0 5b 5e 5d c3 8d 74 26 00 be fa ff ff ff <8b> 03 85 c0 74 e8 3d 00 f0 ff ff 77 e1 8b 58 04 85 db 74 37 8b 5b
> >>>> All code
> >>>> ========
> >>>>    0:   8b 45 0c                mov    0xc(%rbp),%eax
> >>>>    3:   50                      push   %rax
> >>>>    4:   8b 45 08                mov    0x8(%rbp),%eax
> >>>>    7:   50                      push   %rax
> >>>>    8:   89 d8                   mov    %ebx,%eax
> >>>>    a:   89 55 f4                mov    %edx,-0xc(%rbp)
> >>>>    d:   ff d6                   callq  *%rsi
> >>>>    f:   83 c4 0c                add    $0xc,%esp
> >>>>   12:   89 c6                   mov    %eax,%esi
> >>>>   14:   85 c0                   test   %eax,%eax
> >>>>   16:   78 55                   js     0x6d
> >>>>   18:   8d 65 f8                lea    -0x8(%rbp),%esp
> >>>>   1b:   89 f0                   mov    %esi,%eax
> >>>>   1d:   5b                      pop    %rbx
> >>>>   1e:   5e                      pop    %rsi
> >>>>   1f:   5d                      pop    %rbp
> >>>>   20:   c3                      retq
> >>>>   21:   8d 74 26 00             lea    0x0(%rsi,%riz,1),%esi
> >>>>   25:   be fa ff ff ff          mov    $0xfffffffa,%esi
> >>>>   2a:*  8b 03                   mov    (%rbx),%eax      <-- trapping instruction
> >>>>   2c:   85 c0                   test   %eax,%eax
> >>>>   2e:   74 e8                   je     0x18
> >>>>   30:   3d 00 f0 ff ff          cmp    $0xfffff000,%eax
> >>>>   35:   77 e1                   ja     0x18
> >>>>   37:   8b 58 04                mov    0x4(%rax),%ebx
> >>>>   3a:   85 db                   test   %ebx,%ebx
> >>>>   3c:   74 37                   je     0x75
> >>>>   3e:   8b                      .byte 0x8b
> >>>>   3f:   5b                      pop    %rbx
> >>>>
> >>>> Code starting with the faulting instruction
> >>>> ===========================================
> >>>>    0:   8b 03                   mov    (%rbx),%eax
> >>>>    2:   85 c0                   test   %eax,%eax
> >>>>    4:   74 e8                   je     0xffffffffffffffee
> >>>>    6:   3d 00 f0 ff ff          cmp    $0xfffff000,%eax
> >>>>    b:   77 e1                   ja     0xffffffffffffffee
> >>>>    d:   8b 58 04                mov    0x4(%rax),%ebx
> >>>>   10:   85 db                   test   %ebx,%ebx
> >>>>   12:   74 37                   je     0x4b
> >>>>   14:   8b                      .byte 0x8b
> >>>>   15:   5b                      pop    %rbx
> >>>> [ 17.350847][ T7] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: c37cd6d8
> >>>> [ 17.352783][ T7] ESI: ffffffea EDI: f5b5a400 EBP: c4cffd24 ESP: c4cffd14
> >>>> [ 17.354673][ T7] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010246
> >>>> [ 17.362075][ T7] CR0: 80050033 CR2: 00000000 CR3: 04206000 CR4: 00000690
> >>>> [ 17.363993][ T7] Call Trace:
> >>>> [ 17.365018][ T7] fwnode_find_reference (drivers/base/property.c:514)
> >>>> [ 17.366430][ T7] ? __this_cpu_preempt_check (lib/smp_processor_id.c:67)
> >>>> [ 17.367825][ T7] ? lockdep_init_map_type (kernel/locking/lockdep.c:4813)
> >>>> [ 17.369325][ T7] ? phylink_run_resolve+0x20/0x20
> >>>> [ 17.370897][ T7] ? init_timer_key (kernel/time/timer.c:818)
> >>>> [ 17.372228][ T7] fwnode_get_phy_node (drivers/net/phy/phy_device.c:2986)
> >>>> [ 17.373574][ T7] phylink_fwnode_phy_connect (drivers/net/phy/phylink.c:1180 drivers/net/phy/phylink.c:1166)
> >>>> [ 17.375014][ T7] phylink_of_phy_connect (drivers/net/phy/phylink.c:1152)
> >>>> [ 17.376373][ T7] dsa_slave_create (net/dsa/slave.c:1889 net/dsa/slave.c:2036)
> >>>> [ 17.377765][ T7] dsa_tree_setup_switches (net/dsa/dsa2.c:477 net/dsa/dsa2.c:977)
> >>>> [ 17.379282][ T7] dsa_register_switch (net/dsa/dsa2.c:1065 net/dsa/dsa2.c:1565 net/dsa/dsa2.c:1579)
> >>>> [ 17.380762][ T7] dsa_loop_drv_probe (drivers/net/dsa/dsa_loop.c:333)
> >>>> [ 17.382137][ T7] mdio_probe (drivers/net/phy/mdio_device.c:157)

-- 
With Best Regards,
Andy Shevchenko