Date: Tue, 27 Jun 2017 14:25:53 -0700
From: Jakub Kicinski
To: "Luis R. Rodriguez"
Cc: Bjorn Andersson, Arend Van Spriel, Tom Gundersen, Daniel Wagner,
    Ming Lei, yi1.li@linux.intel.com, takahiro.akashi@linaro.org,
    nbroeking@me.com, Greg Kroah-Hartman, mfuzzey@parkeon.com,
    ebiederm@xmission.com, dmitry.torokhov@gmail.com, dwmw2@infradead.org,
    jewalt@lgsinnovations.com, rafal@milecki.pl, rjw@rjwysocki.net,
    atull@kernel.org, moritz.fischer@ettus.com, pmladek@suse.com,
    johannes.berg@intel.com, emmanuel.grumbach@intel.com,
    luciano.coelho@intel.com, luto@kernel.org,
    torvalds@linux-foundation.org, keescook@chromium.org,
    dhowells@redhat.com, pjones@redhat.com, hdegoede@redhat.com,
    alan@linux.intel.com, tytso@mit.edu, paul.gortmaker@windriver.com,
    mtosatti@redhat.com, mawilcox@microsoft.com, stephen.boyd@linaro.org,
    markivx@codeaurora.org, linux-kernel@vger.kernel.org,
    oss-drivers@netronome.com, systemd-devel@lists.freedesktop.org
Subject: Re: [PATCH] firmware: wake all waiters
Message-ID: <20170627142553.0fe417b3@cakuba.netronome.com>
In-Reply-To: <20170627163942.GQ21846@wotan.suse.de>
References: <20170623233702.20564-1-jakub.kicinski@netronome.com>
    <20170626212036.GE21846@wotan.suse.de>
    <20170626191009.0c11eed0@cakuba.netronome.com>
    <20170627163942.GQ21846@wotan.suse.de>
Organization: Netronome Systems, Ltd.
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 27 Jun 2017 18:39:42 +0200, Luis R.
Rodriguez wrote:
> On Mon, Jun 26, 2017 at 07:10:09PM -0700, Jakub Kicinski wrote:
> > On Mon, 26 Jun 2017 23:20:36 +0200, Luis R. Rodriguez wrote:
> > > > In that case we will make them all use the same struct firmware_buf.
> > > > When wake up happens make sure it's propagated to all of them.
> > > >
> > > > Signed-off-by: Jakub Kicinski
> > >
> > > There's a slew of bugs lurking here though!
> > >
> > > As noted the reported Intel driver issues still need other fixes, one was the
> > > fw_state_done() on the direct filesystem lookup mechanism [1], and that may be
> > > a regression since direct filesystem loading was added, and even secondary
> > > requests would seem to just wait forever (MAX_SCHEDULE_TIMEOUT); the combination
> > > of both fixes should fix your reported issue.
> > >
> > > Do you intend on submitting those changes as well? There's still *other* bugs
> > > with this feature though... Knowing if you will follow up with further fixes
> > > will be appreciated.
> >
> > No, I don't have any more fixes in my tree right now :)
>
> Ok I can take on the other bits.
>
> > What I'm
> > looking towards implementing is actually an ability for NICs to load
> > default FW but then enable users to load different FW on their request.
>
> request_firmware_direct() loads optional firmware but this is a sync call. We
> don't currently have a similar API for async, we would have gotten this with
> the driver data API I wrote, but am now looking forward to Greg advising how to
> implement this. But it seems you need more actually, comments below.
>
> > The problem is that advanced NICs are quite programmable [1] and
> > depending on use case one may want to load different firmware files.
>
> Right, so in the 802.11 world some devices might use different firmware for
> different modes of operation, STA, AP, Mesh, but this is all very protocol
> specific, so userspace could tickle the kernel about a mode.
>
> Do your use cases have protocol definitions which can be exposed in userspace?
> Or are these just fw variants with different bells and whistles? How many
> different use cases are we talking about?

Right now we have three modes that come from Netronome itself: a "basic
NIC" one, and two advanced ones, for TC flower/Open vSwitch acceleration
and for eBPF offload. I was hoping some enumeration scheme could work
here, but I really can't come up with one. People can download the SDK
and write a FW with their own offloads, bells and whistles. I feel like
they should be able to load that with the upstream kernel and minimal
effort :(

> > It's slightly close to the FPGA use case, only with FPGA people don't
> > expect much plug and play, and with NICs the default mode after boot
> > must be "look as much like a standard NIC as possible". Then loading
> > "advanced"/hand crafted firmware can turn more interesting features on.
>
> Makes sense.
>
> > The FW loading we have now in drivers/net/ethernet/netronome/nfp is
> > requesting default FW and returning -EPROBE_DEFER if not found.
>
> Oh I see -- right now nfp_nsp_init() is the path that will call the firmware
> load via request_firmware() on nfp_net_fw_find(), and if this fails to
> find firmware it still returns 0, and the nfp_net_pci_probe() does the
> -EPROBE_DEFER handling.
>
> Ugh. This is super hacky, and I realize -EPROBE_DEFER is used for these hacks;
> folks should stop doing this, especially for this use case given we thought
> about it and I believe we have a solution now.
>
> Tom Gundersen and Daniel Wagner worked on a userspace solution to help with
> this, it works with two simple modes: best-effort and final-mode. The idea is
> the firmwared daemon will be kicked into final-mode once userspace knows the
> real rootfs is ready, and this in turn can be used to signal a final
> notification that the optional or required firmware is *definitely* not there.
>
> Arend was going to start toying with it, so it would be good to wait for his
> feedback.

Haha, yes the -EPROBE_DEFER is definitely a hack. I was trying to do
the right thing, but then I found out that systemd dropped the support
for FW uevents and major distributions don't even build the fallback in.
To be honest waiting for rootfs to be available is lower on my list of
priorities, but it's definitely nice to have. I also don't care about
supporting more complex rootfs setups; simply trying whatever comes
after initramfs covers 99.9% of use cases. The remaining 0.1% can load
the FW manually or rebind the driver IMHO.

> > Now I
> > need to find a way to allow users to "push" whatever advanced FW they
> > have into the NIC after/during boot.
>
> Be careful how you do this as you'll have to support it in the driver forever
> if you use something like sysfs I think, otherwise you will break some
> userspace. However if you use debugfs I think it's understood that's loose API.

Unfortunately the netdev community does not like debugfs. I would
prefer to extend the firmware subsystem if possible and use the
existing sysfs interface, just in a new "mode".

> I'd recommend instead to first see if you can get a mapping of the modes as
> specific knobs / tunables through the networking stack, if so then those can
> be used as triggers. If not, consider the *features* that are exposed by
> the different firmwares and consider their need as triggers for a reload.
> How many other devices do the same you do? In what modes?

There are a number of NPU-like products. Also the ability to parse
packets in arbitrary ways and look things up in TCAMs is expanding in
simpler hardware. It's hard to express a parsing graph as a set of
features, especially since nowadays crafting custom protocols seems to
be in vogue.

> > Current firmware subsystem doesn't seem to cater to this use case too
> > well.
>
> It's a matter of asking and talking.
> I've provided references of things to
> try to address the hacky -EPROBE_DEFER. It does however require a userspace
> daemon to be used, so it does require use of the uevent fallback mechanism.

Do you know how systemd developers feel about the issue (CCed)? Given
that it seems to dominate in data center OSes now I'm slightly worried
about having to push Big Linux Vendors to package some seemingly
embedded-centric software just to make advanced NICs run :(

> > I have to look at the FPGA-related code.
>
> Not sure how that would help. Is it huge firmware?

Up to a few megabytes. You are right, the major feature of FPGA API
doesn't interest us...

> > The three main
> > problems to solve are:
> > - how to stay bound and retry the direct default FW load until rootfs
> >   is mounted (equivalent to when -EPROBE_DEFER would give up);
>
> I've thrown a bone for that.

Thanks.

> > - how to expose permanent FW loading sysfs interface which won't
> >   disappear after the first -1/1 is written to .../loading;
>
> The lib/test_firmware.c driver has an example sysfs knob a driver could use
> on its own to load firmware. This is not as dynamic as you'd want, so I had
> implemented an alternative interface which lets you customize hooks in userspace
> first and then you just have a sync or async trigger for the test driver
> data. It would seem this will not go upstream but you can look at it as an
> example of what could be done:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=20170605-driver-data
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/commit/?h=20170605-driver-data&id=3696afe8d4aba5606dc8f3c562aeae1687f3b53e
>
> But take the warning above about using sysfs seriously, you don't want to break
> userspace for users, and you want to see if you can first work towards something
> more generic with the networking folks.

Thanks for the pointers!
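
For readers unfamiliar with the sysfs fallback handshake referred to
above, here is a minimal userspace sketch of the sequence: write 1 to
"loading", stream the image into "data", then write 0 to finish, or -1
to abort. The names push_firmware() and write_loading(), and the idea of
passing the per-request directory in as `dir`, are assumptions made for
illustration only; nothing here is part of the nfp driver or of any
proposed new interface.

```c
#include <stdio.h>
#include <string.h>

/* Write a control value (1 = start, 0 = done, -1 = abort) to <dir>/loading. */
static int write_loading(const char *dir, int value)
{
    char path[4096];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/loading", dir);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Push one firmware image through the loading/data handshake.
 * `dir` is supplied by the caller (hypothetical parameter): it is the
 * directory that holds the "loading" and "data" files for one request.
 * Returns 0 on success, -1 on failure (after attempting an abort).
 */
int push_firmware(const char *dir, const void *image, size_t len)
{
    char path[4096];
    FILE *f;
    size_t written;
    int n;

    if (write_loading(dir, 1) != 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/data", dir);
    if (n < 0 || n >= (int)sizeof(path))
        goto abort;
    f = fopen(path, "wb");
    if (!f)
        goto abort;
    written = fwrite(image, 1, len, f);
    if (fclose(f) != 0 || written != len)
        goto abort;

    return write_loading(dir, 0);

abort:
    /* Best effort: tell the other side to give up on this request. */
    write_loading(dir, -1);
    return -1;
}
```

The sketch only shows the userspace half of the handshake; whether the
files are still there for a second load is exactly the open problem
listed above.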
> > - how to make sure different cards, which request the same file name
> >   can be served different default firmwares...
>
> I believe your patch + the error path fix will handle this now, no?

I'm not sure. I think it would work if I set FW_OPT_NOCACHE, though. I
need to test that.

Thanks a lot for the comments!
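
As background for the patch this thread is about (several requests for
the same file sharing one struct firmware_buf, with the wakeup
propagated to all of them), the difference between waking one waiter and
waking all of them can be shown with a small userspace analogy. This is
a sketch using POSIX threads, not the firmware_class code: struct
shared_buf, waiter(), complete_load() and run_waiters() are hypothetical
names invented for the example.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_WAITERS 16

/* Hypothetical stand-in for one buffer shared by all requests for a file. */
struct shared_buf {
    pthread_mutex_t lock;
    pthread_cond_t  done_cond;
    bool            done;   /* set once the load has completed */
    int             woken;  /* number of waiters that observed completion */
};

/* Each waiter blocks until the shared buffer is marked done. */
static void *waiter(void *arg)
{
    struct shared_buf *buf = arg;

    pthread_mutex_lock(&buf->lock);
    while (!buf->done)
        pthread_cond_wait(&buf->done_cond, &buf->lock);
    buf->woken++;
    pthread_mutex_unlock(&buf->lock);
    return NULL;
}

/*
 * Mark the load complete and wake every waiter. Using
 * pthread_cond_signal() here instead would wake only one blocked
 * thread and could leave the others waiting forever.
 */
static void complete_load(struct shared_buf *buf)
{
    pthread_mutex_lock(&buf->lock);
    buf->done = true;
    pthread_cond_broadcast(&buf->done_cond);
    pthread_mutex_unlock(&buf->lock);
}

/* Start n waiters on one shared buffer; return how many were woken. */
int run_waiters(int n)
{
    struct shared_buf buf = { .done = false, .woken = 0 };
    pthread_t tids[MAX_WAITERS];
    int started = 0;

    if (n > MAX_WAITERS)
        n = MAX_WAITERS;
    pthread_mutex_init(&buf.lock, NULL);
    pthread_cond_init(&buf.done_cond, NULL);

    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[started], NULL, waiter, &buf) == 0)
            started++;

    complete_load(&buf);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&buf.done_cond);
    pthread_mutex_destroy(&buf.lock);
    return buf.woken;
}
```

Calling run_waiters(4) starts four threads waiting on the same buffer
and returns once all of them have been released by the single
broadcast. It says nothing about how FW_OPT_NOCACHE behaves, which
still needs testing as noted above.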