Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp182979iog; Fri, 17 Jun 2022 01:22:17 -0700 (PDT) X-Google-Smtp-Source: AGRyM1s33SZKGDJ2x/djVKctdR7qBhzSssbaSrtMtf34Fz9aWKc2qd+/8gb0mA/RM4OiAAU1/q5t X-Received: by 2002:a17:906:ced1:b0:710:f654:87ef with SMTP id si17-20020a170906ced100b00710f65487efmr8355330ejb.194.1655454137217; Fri, 17 Jun 2022 01:22:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1655454137; cv=none; d=google.com; s=arc-20160816; b=Oo97rxZnSknAPMdaN13EWz9QIP8Wtcdg9mv2UnStnrGwZ1qDI4T2Izr3AHgozKSx1g qYODSJMhek5euSCY1DrsPfvEOccc8JwThAoYI6DImSfVpSo5EZ1JJHja7e7pC9SGTrVr g31yA4/+DOT041bRT3gA+Lc/ZRmrLKRzkQKS0AFAHrNTFfmBllbUdrO0NZvttPfnRod7 O06RoOLuA6G+znHQdTRR6A6nUrr2JKhGuJmOelVu5gdpxijdjqegftXNV6H+PEGM5J7P D+2rY2hzVMOFOVN3HW2TTD0Ae/oRIIz506UuE8ajlvTLkY46ZvpwiXd6cSO9DkN2krs0 NsyQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:references:cc :to:from:subject:dkim-signature; bh=OdNhNfoqdC0FVLit2cDZSs2L6y6qr7vA7s4OIwUXg9k=; b=VQr9SD5ke4wIo97Zf9GILeaBw7WEdkMkPWuCOACRofNeSYrEjNEVxmCEpuvdwCu9nH nzrDTe16qryllPn0dJcc16lwF7Wujn9xTrHeHkNUeDifyTvZv3dIQilPDlnzQ1zH+Iax 7BAzcR9McO2KsCTHfRbwIhWANlvRWf0q9tsS043kMuq4svOVT+ax7EaYHux/b8k0SMVx Yb73ZBzsLVWwuKr8h6DbEh9RYeF/UYzahYrvG0My79rIAtXAXdV+CSRrAsMf2RxKZJ0y iZKPnG+l6GyFZyTd8nrULP+uS4/jaMKAt+vTmtvjHfvt7OfI44tiN9LXAE8y6obj5Mxv jGKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=PS7hc574; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mp35-20020a1709071b2300b007174e91f206si5051096ejc.217.2022.06.17.01.21.20; Fri, 17 Jun 2022 01:22:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=PS7hc574; spf=pass (google.com: domain of linux-crypto-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1380851AbiFQIUl (ORCPT + 99 others); Fri, 17 Jun 2022 04:20:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58214 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1380839AbiFQIUk (ORCPT ); Fri, 17 Jun 2022 04:20:40 -0400 Received: from mail-pl1-x62a.google.com (mail-pl1-x62a.google.com [IPv6:2607:f8b0:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F12D68337 for ; Fri, 17 Jun 2022 01:20:39 -0700 (PDT) Received: by mail-pl1-x62a.google.com with SMTP id k7so3292043plg.7 for ; Fri, 17 Jun 2022 01:20:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=subject:from:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=OdNhNfoqdC0FVLit2cDZSs2L6y6qr7vA7s4OIwUXg9k=; b=PS7hc574OxdHYQVtA0Z6It9mhv0nnSGL7KhN1FWmFGS4vEhL+Aoql8zsGAu4Vm8gtU K67X59H+flFBydVsHR5Myn+4g/p4BgEAE5I3ISsM1Gp0xV8UEsvSeMd0lVT1od526AFX AduJhFxXFdSeSYUIkGLiw2HUKABu5y/JUmc0HS8PkC9CsvECgz5h6DR31f4r+n9MC28I Ogoh0YMfwvuh4Nbs1S8uaAaaNObhgA8GHoG/eWDxAVbGPdSUIarUqBVacCAwOkA5VIrm qgeDRm5ew8uCK0FR+4Vv3xkVLrSy4vGaZh3OI+D+YYfxRIVEIdb0YlUALc7eOJD5ApX+ 4bew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=OdNhNfoqdC0FVLit2cDZSs2L6y6qr7vA7s4OIwUXg9k=; b=vj6Mqu3orHHaFkTdErwZ90gHsDBLXXP7HnN6Ox6IyegOz5dmHjrdAo0dj4S8TwOSbl I1uOPf2Ts09WEpsL65KpBj8To7kS/js565O7TbJMm8FZwyVA5llfzZscfbiYWrYiYxEg 2yNicUHmhDfS2Q10Os8jxEmLWDO5XBrK2KZb1ytqz//Q8hX1QbodGGXv6HGWsJLht8K4 dvZqTPiTfQPyHQFOurpSZbfRixE7YGqyB3fYgt+5RjjATN9hHkJ/QpFmHHJzdA96rH5J PUJBmMITY4rqr2SwyTNQEFde3c2gyhDRWqQacJMbWTwYJS8W6o0Nh9OF7vHKTIV7IKBY 971g== X-Gm-Message-State: AJIora8OkR77atM3Z9m/Q3Tocb8isAFipw+j/xpPjM64TJ366Y+pXUYf Xiqvgj64QJOPaBF66ZBb0MEykZEok2I41g== X-Received: by 2002:a17:902:b598:b0:168:b0b2:f05a with SMTP id a24-20020a170902b59800b00168b0b2f05amr8377071pls.0.1655454038736; Fri, 17 Jun 2022 01:20:38 -0700 (PDT) Received: from [10.83.0.6] ([199.101.192.187]) by smtp.gmail.com with ESMTPSA id x19-20020a637c13000000b00408c56d3379sm3121430pgc.74.2022.06.17.01.20.34 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 17 Jun 2022 01:20:38 -0700 (PDT) Subject: Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove From: Zhangfei Gao To: Jean-Philippe Brucker Cc: Greg Kroah-Hartman , Arnd Bergmann , Herbert Xu , Wangzhou , Jonathan Cameron , linux-accelerators@lists.ozlabs.org, linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org, iommu@lists.linux-foundation.org, Yang Shen References: <20220610123423.27496-1-zhangfei.gao@linaro.org> Message-ID: <53b9acef-ad32-d0aa-fa1b-a7cb77a0d088@linaro.org> Date: Fri, 17 Jun 2022 16:20:30 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org On 2022/6/17 下午2:05, Zhangfei Gao wrote: > > > On 2022/6/16 下午4:14, Jean-Philippe Brucker wrote: >> On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote: >>>>> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c >>>>> index 281c54003edc..b6219c6bfb48 100644 >>>>> --- a/drivers/misc/uacce/uacce.c >>>>> +++ b/drivers/misc/uacce/uacce.c >>>>> @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode >>>>> *inode, struct file *filep) >>>>>        if (!q) >>>>>            return -ENOMEM; >>>>> +    mutex_lock(&uacce->queues_lock); >>>>> + >>>>> +    if (!uacce->parent->driver) { >>>> I don't think this is useful, because the core clears >>>> parent->driver after >>>> having run uacce_remove(): >>>> >>>>     rmmod hisi_zip        open() >>>>      ...                 uacce_fops_open() >>>>      __device_release_driver()      ... >>>>       pci_device_remove() >>>>        hisi_zip_remove() >>>>         hisi_qm_uninit() >>>>          uacce_remove() >>>>           ...              ... >>>>                        mutex_lock(uacce->queues_lock) >>>>       ...                  if (!uacce->parent->driver) >>>>       device_unbind_cleanup()      /* driver still valid, proceed */ >>>>        dev->driver = NULL >>> The check  if (!uacce->parent->driver) is required, otherwise NULL >>> pointer >>> may happen. >> I agree we need something, what I mean is that this check is not >> sufficient. >> >>> iommu_sva_bind_device >>> const struct iommu_ops *ops = dev_iommu_ops(dev);  -> >>> dev->iommu->iommu_dev->ops >>> >>> rmmod has no issue, but remove parent pci device has the issue. >> Ah right, relying on the return value of bind() wouldn't be enough >> even if >> we mandated SVA. >> >> [...] >>>> I think we need the global uacce_mutex to serialize uacce_remove() and >>>> uacce_fops_open(). uacce_remove() would do everything, including >>>> xa_erase(), while holding that mutex. And uacce_fops_open() would >>>> try to >>>> obtain the uacce object from the xarray while holding the mutex, which >>>> fails if the uacce object is being removed. >>> Since fops_open get char device refcount, uacce_release will not happen >>> until open returns. >> The refcount only ensures that the uacce_device object is not freed as >> long as there are open fds. But uacce_remove() can run while there are >> open fds, or fds in the process of being opened. And atfer >> uacce_remove() >> runs, the uacce_device object still exists but is mostly unusable. For >> example once the module is freed, uacce->ops is not valid anymore. But >> currently uacce_fops_open() may dereference the ops in this case: >> >>     uacce_fops_open() >>      if (!uacce->parent->driver) >>      /* Still valid, keep going */ >>      ...                    rmmod >>                          uacce_remove() >>      ...                     free_module() >>      uacce->ops->get_queue() /* BUG */ > > uacce_remove should wait for uacce->queues_lock, until fops_open > release the lock. > If open happen just after the uacce_remove: unlock, uacce_bind_queue > in open should fail. > >> Accessing uacce->ops after free_module() is a use-after-free. We need >> all > you men parent release the resources. >> the fops to synchronize with uacce_remove() to ensure they don't use any >> resource of the parent after it's been freed. > After fops_open, currently we are counting on parent driver stop all > dma first, then call uacce_remove, which is assumption. > Like drivers/crypto/hisilicon/zip/zip_main.c: > hisi_qm_wait_task_finish, which will wait uacce_release. > If comments this , there may other issue, > Unable to handle kernel paging request at virtual address > ffff80000b700204 > pc : hisi_qm_cache_wb.part.0+0x2c/0xa0 > >> I see uacce_fops_poll() may have the same problem, and should be inside >> uacce_mutex. > Do we need consider this, uacce_remove can happen anytime but not > waiting dma stop? > > Not sure uacce_mutex can do this. > Currently the sequence is > mutex_lock(&uacce->queues_lock); > mutex_lock(&uacce_mutex); > > Or we set all the callbacks of uacce_ops to NULL? How about in uacce_remove mutex_lock(&uacce_mutex); uacce->ops = NULL; mutex_unlock(&uacce_mutex); And check uacce->ops  first when using. Or set all ops of uacce->ops to NULL. > Module_get/put only works for module, but not for removing device. > > Thanks > >> >> Thanks, >> Jean >