Message-ID: <4f06d727-d424-44f8-bd80-53c452b289d3@kernel.org>
Date: Fri, 3 Nov 2023 13:48:30 +0100
Subject: Re: Race between of_iommu_configure() and iommu_probe_device()
From: Konrad Dybcio
To: Hector Martin, iommu@lists.linux.dev, Asahi Linux, LKML
Cc: Janne Grunau, Rob Herring, Joerg Roedel, Will Deacon, Robin Murphy, Dmitry Baryshkov

On 3.11.2023 12:55, Hector Martin wrote:
> I just hit a crash in of_iommu_xlate() -> apple_dart_of_xlate() because
> dev->iommu was NULL. of_iommu_xlate() first calls iommu_fwspec_init
> which calls dev_iommu_get(), which allocates that member if NULL. That
> means it got freed in between, but the only thing that can do that is
> dev_iommu_free(), which is called from __iommu_probe_device() in the
> error path. That is serialized via a static lock, but not against the
> xlate stuff.
>
> I think the specific sequence of events was as follows:
>
> - IOMMU driver has not probed yet
> - Device driver tries to probe, and gets deferred via of_iommu_xlate()
> -> driver_deferred_probe_check_state() because there are no IOMMU ops yet
> - IOMMU driver probes
> - IOMMU driver registration triggers device probes
> - IOMMU device probe fails, because there is no fwnode/OF data yet (e.g.
> apple_dart_probe_device returns ENODEV if dev_iommu_priv_get() returns
> NULL, and that is set in apple_dart_of_xlate())
> - __iommu_probe_device is in the error exit path, and at this exact
> point a parallel device probe is running of_iommu_xlate()
> - of_iommu_xlate() calls iommu_fwspec_init(), which ensures dev->iommu
> is non-NULL, which at this point it is
> - immediately after that, __iommu_probe_device() calls dev_iommu_free()
> since it is in the process of erroring out. This frees and sets
> dev->iommu to NULL.
> - of_iommu_xlate() calls ops->of_xlate()
> - apple_dart_of_xlate() calls dev_iommu_priv_set(), which crashes
> because dev->iommu is now NULL.
>
> As far as I can tell it's not just the specific driver xlate call
> setting priv that's the problem here, but there is one big race between
> the entire fwspec codepath (accessing dev->iommu->fwspec) and
> __iommu_probe_device() (allocating and freeing dev->iommu).
>
> Thinking about this whole thing is making my brain hurt. Thoughts? How
> do we fix this?

FWIW I've been getting inexplicable boot-time crashes that sometimes spew
out a fraction of a log line like:

[x.yyyyyyyy] addr.iommu

on some Qualcomm devices every now and then for quite some time.. Not very
common though.

Might be this, might be something else..

Konrad
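
For illustration only, the interleaving described in the quoted list can be replayed deterministically in userspace. This is a rough sketch, not kernel code: struct device and struct dev_iommu are cut-down stand-ins, the three helpers only mimic what the thread says dev_iommu_get(), dev_iommu_free() and dev_iommu_priv_set() do, and replay_xlate() is a hypothetical name invented here. The model's dev_iommu_priv_set() reports failure where the thread describes a NULL dereference, so the outcome can be checked instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Cut-down stand-ins for the kernel structures named in the thread. */
struct dev_iommu { void *priv; };
struct device { struct dev_iommu *iommu; };

/* Model of dev_iommu_get(): allocate dev->iommu only if it is NULL. */
static struct dev_iommu *dev_iommu_get(struct device *dev)
{
    if (!dev->iommu)
        dev->iommu = calloc(1, sizeof(*dev->iommu));
    return dev->iommu;
}

/* Model of dev_iommu_free(): free dev->iommu and reset it to NULL. */
static void dev_iommu_free(struct device *dev)
{
    free(dev->iommu);
    dev->iommu = NULL;
}

/*
 * Model of dev_iommu_priv_set(). Per the thread, the crash happens here
 * because dev->iommu is NULL; this model returns false at that point
 * instead of dereferencing the pointer.
 */
static bool dev_iommu_priv_set(struct device *dev, void *priv)
{
    if (!dev->iommu)
        return false;
    dev->iommu->priv = priv;
    return true;
}

/*
 * Hypothetical helper: replays the two code paths in a fixed order.
 * "A" is the device probe running of_iommu_xlate(), "B" is
 * __iommu_probe_device() on its error path.
 * Returns 0 if priv was set, -1 if dev->iommu was NULL at that point,
 * -2 if the model's allocation failed.
 */
int replay_xlate(bool free_after_fwspec_init)
{
    struct device dev = { .iommu = NULL };
    static int dart_cfg;        /* placeholder for the driver's priv data */
    int ret;

    /* B: __iommu_probe_device() allocates dev->iommu, then fails. */
    if (!dev_iommu_get(&dev))
        return -2;
    if (!free_after_fwspec_init)
        dev_iommu_free(&dev);   /* B finishes erroring out before A runs */

    /* A: iommu_fwspec_init() -> dev_iommu_get() sees or creates dev->iommu. */
    if (!dev_iommu_get(&dev))
        return -2;

    /* B: the racy case, dev_iommu_free() lands between A's two steps. */
    if (free_after_fwspec_init)
        dev_iommu_free(&dev);

    /* A: ops->of_xlate() -> dev_iommu_priv_set(). */
    ret = dev_iommu_priv_set(&dev, &dart_cfg) ? 0 : -1;

    dev_iommu_free(&dev);
    assert(dev.iommu == NULL);
    return ret;
}
```

Calling replay_xlate(true) follows the order from the quoted list and hits the NULL case; replay_xlate(false) lets the error path finish first, so the xlate side allocates a fresh dev->iommu and succeeds. The sketch only shows why the ordering matters; it says nothing about how the kernel should serialize the two paths.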