Date: Fri, 3 Nov 2023 16:43:58 +0000
Subject: Re: Race between of_iommu_configure() and iommu_probe_device()
To: Konrad Dybcio, Hector Martin, iommu@lists.linux.dev, Asahi Linux, LKML
Cc: Janne Grunau, Rob Herring, Joerg Roedel, Will Deacon, Dmitry Baryshkov
From: Robin Murphy
In-Reply-To: <4f06d727-d424-44f8-bd80-53c452b289d3@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023-11-03 12:48 pm, Konrad Dybcio wrote:
>
> On 3.11.2023 12:55, Hector Martin wrote:
>> I just hit a crash in of_iommu_xlate() -> apple_dart_of_xlate() because
>> dev->iommu was NULL. of_iommu_xlate() first calls iommu_fwspec_init
>> which calls dev_iommu_get(), which allocates that member if NULL. That
>> means it got freed in between, but the only thing that can do that is
>> dev_iommu_free(), which is called from __iommu_probe_device() in the
>> error path. That is serialized via a static lock, but not against the
>> xlate stuff.
>>
>> I think the specific sequence of events was as follows:
>>
>> - IOMMU driver has not probed yet
>> - Device driver tries to probe, and gets deferred via of_iommu_xlate()
>> -> driver_deferred_probe_check_state() because there are no IOMMU ops yet
>> - IOMMU driver probes
>> - IOMMU driver registration triggers device probes
>> - IOMMU device probe fails, because there is no fwnode/OF data yet (e.g.
>> apple_dart_probe_device returns ENODEV if dev_iommu_priv_get() returns
>> NULL, and that is set in apple_dart_of_xlate())
>> - __iommu_probe_device is in the error exit path, and at this exact
>> point a parallel device probe is running of_iommu_xlate()
>> - of_iommu_xlate() calls iommu_fwspec_init(), which ensures dev->iommu
>> is non-NULL, which at this point it is
>> - immediately after that, __iommu_probe_device() calls dev_iommu_free()
>> since it is in the process of erroring out. This frees and sets
>> dev->iommu to NULL.
>> - of_iommu_xlate() calls ops->of_xlate()
>> - apple_dart_of_xlate() calls dev_iommu_priv_set(), which crashes
>> because dev->iommu is now NULL.
>>
>> As far as I can tell it's not just the specific driver xlate call
>> setting priv that's the problem here, but there is one big race between
>> the entire fwspec codepath (accessing dev->iommu->fwspec) and
>> __iommu_probe_device() (allocating and freeing dev->iommu).
>>
>> Thinking about this whole thing is making my brain hurt. Thoughts? How
>> do we fix this?
> FWIW I've been getting inexplicable boot-time crashes that sometimes
> spew out a fraction of a log line like:
>
> [x.yyyyyyyy] addr.iommu
>
> on some Qualcomm devices every now and then for quite some time..
> Not very common though. Might be this, might be something else..

Sounds likely to all be the same thing as here:

https://lore.kernel.org/linux-iommu/1698825902-10685-1-git-send-email-quic_zhenhuah@quicinc.com/

The true solution is to pull the of_xlate step into iommu_probe_device()
itself, which I'm working towards, and finally get rid of the horrible
"replay" logic which causes no end of problems.

Thanks,
Robin.