Subject: Re: [RFC PATCH] driver core: make deferring probe forever optional
From: Robin Murphy
To: Rob Herring
Cc: "linux-kernel@vger.kernel.org", devicetree@vger.kernel.org, boot-architecture@lists.linaro.org, Stephen Boyd, Greg Kroah-Hartman, Linus Walleij, Alexander Graf, Grant Likely, Mark Brown, "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE"
Date: Wed, 2 May 2018 19:49:57 +0100
Message-ID: <74d495d8-04e2-fb7d-7d07-0905fbc8a6cf@arm.com>
References: <20180501213114.20183-1-robh@kernel.org>

On 02/05/18 15:48, Rob Herring wrote:
> On Wed, May 2, 2018 at 6:40 AM, Robin Murphy wrote:
>> On 01/05/18 22:31, Rob Herring wrote:
>>>
>>> Deferred probe will currently wait forever on dependent devices to probe,
>>> but sometimes a driver will never exist. It's also not always critical for
>>> a driver to exist. Platforms can rely on default configuration from the
>>> bootloader or reset defaults for things such as pinctrl and power domains.
>>> This is often the case with initial platform support until various drivers
>>> get enabled. There are at least 2 scenarios where deferred probe can render
>>> a platform broken. Both involve using a DT which has more devices and
>>> dependencies than the kernel supports. The 1st case is that a driver may be
>>> disabled in the kernel config. The 2nd case is that the kernel version may
>>> simply not have the dependent driver. This can happen if using a newer DT
>>> (provided by firmware perhaps) with a stable kernel version.
>>>
>>> Unfortunately, this change breaks with modules as we have no way of
>>> knowing when modules are done loading. One possibility is to make this
>>> opt in or out based on compatible strings rather than at a subsystem
>>> level.
>>> Ideally this information could be extracted automatically somehow. OTOH,
>>> maybe the lists are pretty small. There's only a handful of subsystems
>>> that can be optional, and then only so many drivers in those that can be
>>> modules (at least for pinctrl, many drivers are built-in only).
>>
>>
>> Ooh, this is exactly what I wanted for of_iommu_xlate(), and would be much
>> nicer than the current bodge using system_state that I ended up with.
>> The context there is very similar - if the device has a parent IOMMU then we
>> want to wait for that to probe first if possible, but with a deadline such
>> that if it doesn't show up then we can go ahead and make progress without it
>> (on the assumption that DMA ops can fall back to CMA). The modules problem
>> doesn't currently apply to IOMMU drivers either, although we do use a
>> special of_device_id table for detecting built-in drivers via
>> OF_IOMMU_DECLARE() to avoid deferring at all when we know it would be
>> pointless - a more generic solution for that could certainly be useful too.
>
> Ah, so you kept the IOMMU_OF_DECLARE() but it does nothing but define
> a table. We already have the driver match table which should pretty
> much be the same data, so it would be better if we could use that if
> possible. If we used MODULE_DEVICE_TABLE somehow, we could avoid
> modifying lots of drivers. Though many built-in only drivers omit
> that. The other problem is it would become a large set of tables to
> search thru because it is global. That would probably end up slower
> than just deferring. So we need something like
> _DEVICE_TABLE() to have per subsystem tables. Then this
> function in this patch would need to be told which table to use.
> However, this is all really just an optimization to avoid deferring at
> all and could be addressed later. Is there any data on how much time
> you save avoiding deferring? This has come up in the past and I don't
> think it is much.

In the of_iommu case it's not actually an optimisation; it dodges a big
problem with the self-contained implementation - if we blindly defer on a
not-yet-present IOMMU until all built-in drivers have had a chance to
register themselves naturally, then by the time we could safely assume the
relevant IOMMU driver *isn't* built-in, all the clients (which may include
the boot device) can already be stuck on the deferred probe list with
nothing left to kick it and make progress again.

It's quite possible I could have done better; I just wasn't very familiar
with the driver core at the time, and repurposing the magic table instead of
removing it entirely was by far the easiest way forward. With this patch we
would have a guarantee that the deferred list gets at least one kick after
the deadline for waiting has passed, so in theory we could then just use the
regular driver matching mechanisms to see whether a given IOMMU node can
possibly probe or not. I'm not sure we could get rid of the driver
introspection aspect entirely, since it might be the case that the IOMMU
itself has dependencies and winds up on the deferred list behind one or more
of its clients, in which case we'd still want them to keep waiting even
after the deadline is nominally up.

> I've also been thinking about whether we could use MODULE_DEVICE_TABLE to
> provide a list of compatible strings from modules as a whitelist of
> devices to keep deferring probe on. That would require building
> modules to build the kernel which I don't think would work. I think my
> conclusion is that the cases we care about may be short enough to just
> manually maintain such a list.

FWIW we did get rid of the equivalent table completely for the arm64 ACPI
code, but that only has to support 2 types of IOMMU, so it just evaluates the
respective driver config symbols directly with IS_BUILTIN(). Sadly that
method really can't scale to DT with multiple compatible strings per
driver...

I guess there's also the possibility that a single driver may want multiple
behaviours, e.g. if SoC variants A and B have some identical peripherals but
slightly different pinctrl/IOMMU/etc. hardware such that A has workable
default behaviour and can be treated as optional, whereas B absolutely must
be controlled by the kernel for the consumers to function properly, and they
*should* defer forever otherwise. I think that would pretty much demand some
sort of explicitly-curated white/blacklist setup at the subsystem or driver
level.
Robin.