Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp2921261ybl; Mon, 19 Aug 2019 09:21:13 -0700 (PDT) X-Google-Smtp-Source: APXvYqzdyOyLBxXLq873k97RTz2oDY+vRMjgpbFk+TqsfIx/pbXoKI3lXlvHuuW879en15uw+46v X-Received: by 2002:a17:902:9686:: with SMTP id n6mr22780704plp.113.1566231673095; Mon, 19 Aug 2019 09:21:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566231673; cv=none; d=google.com; s=arc-20160816; b=tY9vlNMWGwRk7D0jpnAs2PtBYcj84ifqYjKG7yW0G7qlDmkqaBLEWBee4Bdhm9cvIu zJbh6ez4DknXbL9dZu3yZeRKlCKJieMi5wZpvk5VGq1HmzeJ9q9UhxFauvX8AqhAxYO8 Hd8hOEeJLHEoYjNbQ9b9m47yFIy6SRh3eBdX1XqB63965i06SapUyJTUwA4Vl2z32EqZ dQN+rt6dhKoccDKxfLWVuA+P+qsvlANquBlch2MlSOs8Saxn/veK76Da1qGb5qGlnO3s RysfDwi8M515eh3mAoMPZfKAS95VktZZNcIztyvd4lejwMk/luUUABunBLKrkJRSiJpW nqpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=IKQPktxfBXL0jaMh5fXrtktV3oFNbQQrLd0UoqhuRhc=; b=GCl0XhIBKpMlADg84HQq+uxYmrM5gt1MwepOwHIKMtITyj7Gp3bgstPmdFWDf44xxN +4rwPC0N9+zpW+9OZRGBU8iidxCp7665hh8shI4NvlPAQTTcV7ebNW6G9sQKsxckdyAP u+0340xSryzFH/a2TdLzMlc5vsM4Zr/wAjf3QJFiQhIkE8XVPcRLfLKHxkyADG/vQ/4c BES5fSHP4iyt/u7Uu5ZCnUgOLpUmC5nbYv3CIU3YBQtv/OzE1V4mR/kQOrI0mFBWWp4z qZg47Zu7kuoBf+iXWFTat7s4TY/bMAjI4X/AWXqxlvQ0WVB3d4G0/pdOQPQXgY1KcTom vV5Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@broadcom.com header.s=google header.b=DMMQAXvJ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=broadcom.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g34si8682544pld.86.2019.08.19.09.20.58; Mon, 19 Aug 2019 09:21:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@broadcom.com header.s=google header.b=DMMQAXvJ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=broadcom.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727740AbfHSQT4 (ORCPT + 99 others); Mon, 19 Aug 2019 12:19:56 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:37644 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726905AbfHSQTz (ORCPT ); Mon, 19 Aug 2019 12:19:55 -0400 Received: by mail-pl1-f194.google.com with SMTP id bj8so1194970plb.4 for ; Mon, 19 Aug 2019 09:19:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=IKQPktxfBXL0jaMh5fXrtktV3oFNbQQrLd0UoqhuRhc=; b=DMMQAXvJwkyqlEk1Z2h6vxJfOKFQi1ug7u+ZCzbyrKi+g/PxG2u7ifgSTSxzvSFtTP NJJMZIEShk0FdmE/MIxJo6xftNZtiYT8lUOEb6UB5ZW8t/hX6r7XTUGfwL2XWBBCgW5C KhxIeYon6r88GtPZSxPpSVLKZ5sjyT76+UC2M= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=IKQPktxfBXL0jaMh5fXrtktV3oFNbQQrLd0UoqhuRhc=; b=CQ4J0kRy9w/1bcGwHJK/AncMbIeXidRlarcIpZqU+9JQ7+OlTefX/N75TiwS8ZEno6 JHipMsrKaBhr56eAZvg/mRWfMIRp8g7jFlB7zBizDjsDxG9dh828POn+6y+j2WSGxQYH FBtYaFtGU5viYckeoBA2UmiU7X2ZGAwMdcTp9dBQ8N0rzBp3t+o8GeBEj7U3zJ4M+gas 9GnBzCnnroMP3jr0YrANOsdcaNqLQdKeqnY4KYCqPaRiUZPZN+JA+ssp9YdLMU5peQ4F F5gAknAjFlebGtq9MfzfzNg/Bk0Y5MC2YeVv0HehFylepe0PGFzcF422dDnURYiEBUIP jJ3w== X-Gm-Message-State: APjAAAV/DyrY9HzY7qRxUdKFLWkmnlFhBLExScF4H5YiX+nMhzFp3LcR VhIlLqyGSrCH2QkkvXEnenCHZA== X-Received: by 2002:a17:902:b591:: with SMTP id a17mr23927025pls.189.1566231594942; Mon, 19 Aug 2019 09:19:54 -0700 (PDT) Received: from [10.136.13.65] ([192.19.228.250]) by smtp.gmail.com with ESMTPSA id c22sm8016983pfi.82.2019.08.19.09.19.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 19 Aug 2019 09:19:54 -0700 (PDT) Subject: Re: [PATCH 3/3] firmware: add mutex fw_lock_fallback for race condition To: Luis Chamberlain Cc: Greg Kroah-Hartman , Andy Gross , David Brown , Alexander Viro , Shuah Khan , bjorn.andersson@linaro.org, "Rafael J . Wysocki" , linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-fsdevel@vger.kernel.org, BCM Kernel Feedback , Olof Johansson , Andrew Morton , Dan Carpenter , Colin Ian King , Kees Cook , Takashi Iwai , linux-kselftest@vger.kernel.org References: <20190816000945.29810-1-scott.branden@broadcom.com> <20190816000945.29810-4-scott.branden@broadcom.com> <20190819053937.GR16384@42.do-not-panic.com> From: Scott Branden Message-ID: <16823ee6-c52a-b3b5-caed-79c00772fa68@broadcom.com> Date: Mon, 19 Aug 2019 09:19:51 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <20190819053937.GR16384@42.do-not-panic.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Luis, Thanks for the review. I did not think this patch would be the final solution either as indicated in the original cover letter and code comment. Some comments inline. On 2019-08-18 10:39 p.m., Luis Chamberlain wrote: > On Thu, Aug 15, 2019 at 05:09:45PM -0700, Scott Branden wrote: >> A race condition exists between _request_firmware_prepare checking >> if firmware is assigned and firmware_fallback_sysfs creating a sysfs >> entry (kernel trace below). To avoid such condition add a mutex >> fw_lock_fallback to protect against such condition. > I am not buying this fix, and it seems sloppy. More below. > >> misc test_firmware: Falling back to sysfs fallback for: nope-test-firmware.bin > So the fallback kicks in with the file that is not there. > >> sysfs: cannot create duplicate filename '/devices/virtual/misc/test_firmware/nope-test-firmware.bin' > And we have a duplicate entry, for the *device* created to allow us to > create a file entry to allow us to copy the file. Your tests had a loop, > so there is actually a race between two entries being created while > one one failed. > >> CPU: 4 PID: 2059 Comm: test_firmware-3 Not tainted 5.3.0-rc4 #1 >> Hardware name: Dell Inc. OptiPlex 7010/0KRC95, BIOS A13 03/25/2013 >> Call Trace: >> dump_stack+0x67/0x90 >> sysfs_warn_dup.cold+0x17/0x24 >> sysfs_create_dir_ns+0xb3/0xd0 >> kobject_add_internal+0xa6/0x2a0 >> kobject_add+0x7e/0xb0 > Note: kobject_add(). > >> ? _cond_resched+0x15/0x30 >> device_add+0x121/0x670 >> firmware_fallback_sysfs+0x15c/0x3c9 >> _request_firmware+0x432/0x5a0 >> ? devres_find+0x63/0xc0 >> request_firmware_into_buf+0x63/0x80 >> test_fw_run_batch_request+0x96/0xe0 >> kthread+0xfb/0x130 >> ? reset_store+0x30/0x30 >> ? kthread_park+0x80/0x80 >> ret_from_fork+0x3a/0x50 >> kobject_add_internal failed for nope-test-firmware.bin with -EEXIST, don't try to register things with the same name in the same directory. > So above it makes it even clearer, two kobjets with the same name. > >> Signed-off-by: Scott Branden >> --- >> drivers/base/firmware_loader/main.c | 15 +++++++++++++++ >> 1 file changed, 15 insertions(+) >> >> diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c >> index bf44c79beae9..ce9896e3b782 100644 >> --- a/drivers/base/firmware_loader/main.c >> +++ b/drivers/base/firmware_loader/main.c >> @@ -88,6 +88,7 @@ static inline struct fw_priv *to_fw_priv(struct kref *ref) >> /* fw_lock could be moved to 'struct fw_sysfs' but since it is just >> * guarding for corner cases a global lock should be OK */ >> DEFINE_MUTEX(fw_lock); >> +DEFINE_MUTEX(fw_lock_fallback); > The reason I don't like this fix is that this mutex is named after ther > fallback interface... but... > >> >> static struct firmware_cache fw_cache; >> >> @@ -758,6 +759,17 @@ _request_firmware(const struct firmware **firmware_p, const char *name, >> if (!firmware_p) >> return -EINVAL; >> >> + /* >> + * There is a race condition between _request_firmware_prepare checking >> + * if firmware is assigned and firmware_fallback_sysfs creating sysfs >> + * entries with duplicate names. >> + * Yet, with this lock the firmware_test locks up with cache enabled >> + * and no event used during firmware test. >> + * This points to some very racy code I don't know how to entirely fix. >> + */ >> + if (opt_flags & FW_OPT_NOCACHE) >> + mutex_lock(&fw_lock_fallback); > Whoa.. What does no-cache have anything to do with the fallback interface > other than the fact we enable this feature for the fallback interface? > We don't need to penalize non-fallback users who *also* may want to > enable the no-cache feature. > > So, the fix should be within the boundaries of the creation / deletion > of the kobject, not this nocache feature. Can you please re-evaluate > this code and look for a more compartamentalized solution to the > fallback code only? To be honest, I find the entire firmware code sloppy.  I don't think the cache/no-cache feature is implemented or tested properly nor fallback to begin with.  I'm not claiming this patch is the final solution and indicated such in the cover letter and the comment above. I hope there is someone more familiar with this code to comment further and come up with a proper solution. I have found numerous issues and race conditions with the firmware code (I simply added a test). 1) Try loading the same valid firmware using no-cache once it has already been loaded with cache. It won't work, which is why I had to use a different filename in the test for request_firmware_into_buf. 2) Try removing the "if (opt_flags & FW_OPT_NOCACHE)" in my patch and always call the mutex. The firmware test will lock up during a "no uevent" test.  I am not familiar with the code to know why such is true and what issue this exposes in the code. 3) I have a driver that uses request_firmware_into_buf and have multiple instances of the driver loading the same firmware in parallel.  Some of the data is not read correctly in each instance. I haven't yet to reproduce this issue with the firmware test but currently have a mutex around the entire call to request_firmware_into_buf in our driver. Perhaps it is better at this point to add a mutex in request_firmware_into_buf to make is entirely safe? (Perhaps even with every request_firmware functions as none seems to be tested properly.) Or, add a new function called safe_request_firmware_into_buf with such mutex to protect the function. The current racey request_firmware functions could then be left alone and those who want reliable firmware loading can use the safe calls? > > Luis