Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751735AbdFGR70 (ORCPT ); Wed, 7 Jun 2017 13:59:26 -0400 Received: from mx2.suse.de ([195.135.220.15]:47653 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751533AbdFGR7Z (ORCPT ); Wed, 7 Jun 2017 13:59:25 -0400 Date: Wed, 7 Jun 2017 19:59:22 +0200 From: "Luis R. Rodriguez" To: "Li, Yi" Cc: "Luis R. Rodriguez" , atull@kernel.org, gregkh@linuxfoundation.org, wagi@monom.org, dwmw2@infradead.org, rafal@milecki.pl, arend.vanspriel@broadcom.com, rjw@rjwysocki.net, moritz.fischer@ettus.com, pmladek@suse.com, johannes.berg@intel.com, emmanuel.grumbach@intel.com, luciano.coelho@intel.com, kvalo@codeaurora.org, luto@kernel.org, takahiro.akashi@linaro.org, dhowells@redhat.com, pjones@redhat.com, linux-kernel@vger.kernel.org, linux-fpga@vger.kernel.org Subject: Re: [PATCHv2 0/3] Enable no_cache flag to driver_data Message-ID: <20170607175922.GM27288@wotan.suse.de> References: <1495262819-981-1-git-send-email-yi1.li@linux.intel.com> <20170524190357.GB8951@wotan.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5664 Lines: 138 On Tue, Jun 06, 2017 at 02:31:54PM -0500, Li, Yi wrote: > > We use the cache upon suspend to cache the firmware so that upon resume a > > request will use that cache, to avoid the file lookup on disk. Doing a test > > with qemu suspend + resume is possible but that requires having access to > > the qemu monitor interface and doing something like this to trigger a wakeup: > > > > echo system_wakeup | socat - /dev/pts/7,raw,echo=0,crnl BTW if it helps for testing: https://github.com/mcgrof/kvm-boot > I am studying at the firmware caching codes and have to say it's very > complicated. :-( Here are some questions: It is! For this reason I tried to document it, have you read this: https://www.kernel.org/doc/html/latest/driver-api/firmware/firmware_cache.html This is certainly missing the notes about how some of this is loosely re-used for possible duplicate requests though, but I think I've mentioned this on my review of your patches so far. > 1. Since device_cache_fw_images invokes dev_cache_fw_image through > dpm_for_each_dev, adding a debugfs driver to kick it can only cache firmware > for those associated with devices which has PM enabled, which do not include > the driver_data_test_device. Any suggestions? Ah, consider adding PM to driver_data_test_device ? May be worth updating the documentation bout this requirement too! > 2. Look into dev_cache_fw_image function, devres_for_each_res will walk > through the firmware have been loaded before (through assign_firmware_buf -> > fw_add_devm_name) and add to the todo list, eventually it will create the > fw_names list. So in the test driver, we need to load the firmware once > before calling the kick? Yes, makes sense, given its a firmware cache, it would only make sense to cache firmware for requests already made. An important note I made in the documentation which you did not mention but should be important for your own sanity: "The firmware devres entry is maintained throughout the lifetime of the device. This means that even if you release_firmware() the firmware cache will still be used on resume from suspend." This means the cache will be tried even if the driver did use release_firmware() after request_firmware(). > > static void device_cache_fw_images(void) > > { > > ... > > fwc->state = FW_LOADER_START_CACHE; > > dpm_for_each_dev(NULL, dev_cache_fw_image); > > ... > > } > This only applies to the devices have PM enabled. Makes sense! > > device_uncache_fw_images_delay() will be called for PM_POST_RESTORE, which > means the hibernation restore got error. How about the successful restore > case, calling driver will free its firmware_buf after loading? As it is on my latest 20170605-driver-data [0] (but should be just as yours if you used a recent branch of mine), fw_pm_notify() uses: static int fw_pm_notify(struct notifier_block *notify_block, unsigned long mode, void *unused) { switch (mode) { ... case PM_POST_SUSPEND: case PM_POST_HIBERNATION: case PM_POST_RESTORE: /* * In case that system sleep failed and syscore_suspend is * not called. */ mutex_lock(&fw_lock); fw_cache.state = FW_LOADER_NO_CACHE; mutex_unlock(&fw_lock); enable_firmware(); device_uncache_fw_images_delay(10 * MSEC_PER_SEC); break; } } So it uses also PM_POST_RESTORE. I think that covers it all ? As it is today we handle the firmware cache for all drivers, it should be transparent to drivers. [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data > > To me this last part smells like a possible source of issue (not sure) if we might > > suspend/resume a lot in short period of time, this theory could be tested by toggling > > on/of this debugfs interface I suggested while having requests of different types > > blast in. > Agree, it might create an issue if the system is get into restore_prepare > again before the device_uncache_fw_images_delay clear the cache, why we need > the 10 * MSEC_PER_SEC delay? In theory, fwc->name_lock should protect the > case though. One possible solution may be to cancel_delayed_work(&fw_cache.work) or cancel_delayed_work_sync(&fw_cache.work) on suspend, and then force it to run *right away* so we clear any pending old cache. One performance issue with this is we are clearing some possible cache we are about to re-create, so mod_delayed_work() seems more efficient -- provided we have accounting notes to know no new firmware requests have trickled in since the last cache build. This seems rather complicated so a first initial approach to just kill all known cache before suspend seems easy and fair. BTW such a fix might be a stable candidate if you find a real behavioural issue with the kernel ! > > This is the sort of testing which would really help here. > > > > Likewise, if you are extending functionality please consider ways to break it :) > Understand the need to test the firmware caching part. For non-caching test > case, will it be enough if we can test that the noncache setting will ban > the firmware name be added to the fwc->fw_names list? Good question. Let's see... in the future a device might have two requests with the same firmware, one with a cache enabled and one without it -- I suppose for now we should perhaps disable a non-cache request if one already exists with a cache request ? The code for that probably is not there... But other than this corner case... Yeah I think that's a reasonable test case. Such a code seems may need a debug export symbol to allow drivers to query for this info, if so feel free to a respective debug kconfig entry for it. Luis