Subject: Re: [PATCH v4 0/2] vfio/mdev: Device namespace protection
From: Halil Pasic
To: Cornelia Huck, Alex Williamson
Cc: kwankhede@nvidia.com, Dong Jia, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Wed, 23 May 2018 14:29:28 +0200
In-Reply-To: <20180523105641.0d89701b.cohuck@redhat.com>
References: <20180518190145.3187.7620.stgit@gimli.home> <20180522123829.4e758646@w520.home> <20180523105641.0d89701b.cohuck@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/23/2018 10:56 AM, Cornelia Huck wrote:
> On Tue, 22 May 2018 12:38:29 -0600
> Alex Williamson wrote:
>
>> On Tue, 22 May 2018 19:17:07 +0200
>> Halil Pasic wrote:
>>
>>> From vfio-ccw perspective I join Connie's assessment: vfio-ccw should
>>> be fine with these changes. I'm however not too deeply involved with
>>> the mdev framework, thus I don't feel comfortable r-b-ing. That results
>>> in
>>> Acked-by: Halil Pasic
>>> for both patches.
>>>
>>> While at it I would like to ask about the semantics and intended
>>> use of the mdev interfaces.
>>>
>>> static int vfio_ccw_sch_probe(struct subchannel *sch)
>>> {
>>>
>>>         /* HALIL: 8< Not so interesting stuff happens here. >8 */
>>
>> This was interesting:
>>
>>         private->state = VFIO_CCW_STATE_NOT_OPER;
>>
>>>         ret = vfio_ccw_mdev_reg(sch);
>>>         if (ret)
>>>                 goto out_disable;
>>>         /*
>>>          * HALIL:
>>>          * This might be racy. Somewhere in vfio_ccw_mdev_reg() the create attribute
>>>          * is made available (it calls mdev_register_device()). For instance create will
>>>          * attempt to decrement private->avail which is initialized below. I fail to
>>>          * understand how this is well synchronized.
>>>          */
>>>         INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
>>>         atomic_set(&private->avail, 1);
>>>         private->state = VFIO_CCW_STATE_STANDBY;
>>>
>>>         return 0;
>>>
>>> out_disable:
>>>         cio_disable_subchannel(sch);
>>> out_free:
>>>         dev_set_drvdata(&sch->dev, NULL);
>>>         kfree(private);
>>>         return ret;
>>> }
>>>
>>> Should not the initialization go before mdev_register_device(), and then be
>>> rolled back if necessary if mdev_register_device() fails?
>>>
>>> In practice it does not seem very likely that userspace can trigger
>>> mdev_device_create() before vfio_ccw_sch_probe() finishes so it should
>>> not be a practical problem. But I would like to understand how synchronization
>>> is supposed to work.
>>>
>>> [Added Dong Jia, maybe he is also able to answer my question.]
>>
>> vfio_ccw_mdev_create() requires that private->state is not
>> VFIO_CCW_STATE_NOT_OPER but vfio_ccw_sch_probe() explicitly sets state
>> to this value before calling vfio_ccw_mdev_reg(), so a create should
>> return -ENODEV if racing with parent registration. Is there something
>> else that I'm missing? Thanks,
>>

Disclaimer: I did not do much kernel work up until now. I still have much to
learn.

I mostly agree with your analysis but I'm not sure if the conclusion should be
'and thus everything is good' or 'and thus indeed we do have a race, a poorly
handled one'.

One thing I'm not sure about is: can atomic_set(&private->avail, 1) and
private->state = VFIO_CCW_STATE_STANDBY be perceived as reordered by, e.g.,
some other CPU, and thus by vfio_ccw_mdev_create(), or not? I tried to figure
it out based on Documentation/atomic_t.txt but was not very successful. If
these can be reordered we could observe -EPERM instead of -ENODEV, I think.

Furthermore, from your analysis I deduce that the client code (I think mdev
calls it vendor code) may rely on mdev_register_device() containing a (RELEASE)
barrier. We use a mutex in there so the barrier is there. And the client code
may rely on an (ACQUIRE) barrier before the create callback is called. That
should also be true, and was true in the past too, again because of mutex
usage.

>> Alex
>
> No, I think your understanding is correct. We move the state from
> NOT_OPER to STANDBY only after we're set up completely, so our create
> callback will simply fail early with -ENODEV. This looks fine to me.
>

This -ENODEV looks strange to me. Which device does not exist? Is userspace
supposed to retry on this? It's not even -EAGAIN. Is it documented somewhere?
If it's unavoidable (and I don't see why it would be) I would prefer -EAGAIN.
I think throwing an -ENODEV at our userspace once in a blue moon (if ever)
because that is the way we 'handle' races in our code instead of avoiding them
is not very friendly. And I'm not sure -EPERM is not possible (see my statement
about reordering of the writes above).

Regards,
Halil
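
For illustration, the ordering question about the two writes can be modelled in
userspace with C11 atomics. This is only a sketch under stated assumptions, not
kernel code: struct model_private and the model_*() helpers are hypothetical
names, the error codes follow the -ENODEV/-EPERM behaviour discussed in this
thread, and C11 release/acquire ordering stands in for whatever barriers the
kernel side actually provides. The idea is that if the state write is a release
store and the state read in create is an acquire load, then a create that sees
STANDBY must also see avail == 1, so it cannot return -EPERM because of
reordering alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the vfio-ccw states; not the kernel definitions. */
enum model_state { MODEL_NOT_OPER, MODEL_STANDBY };

/* Hypothetical model of the two fields discussed in this thread. */
struct model_private {
	atomic_int avail;	/* models private->avail */
	atomic_int state;	/* models private->state */
};

/* State before registration: not operational, nothing available. */
static void model_init(struct model_private *p)
{
	assert(p);
	atomic_init(&p->avail, 0);
	atomic_init(&p->state, MODEL_NOT_OPER);
}

/*
 * Tail of the probe path: set avail first, then publish the state with
 * release ordering so the avail write cannot be observed after it.
 */
static void model_probe_finish(struct model_private *p)
{
	atomic_store_explicit(&p->avail, 1, memory_order_relaxed);
	atomic_store_explicit(&p->state, MODEL_STANDBY, memory_order_release);
}

/*
 * Create path: the acquire load pairs with the release store above.
 * Returns -ENODEV while not operational, -EPERM if nothing is available.
 */
static int model_create(struct model_private *p)
{
	int old;

	if (atomic_load_explicit(&p->state, memory_order_acquire) ==
	    MODEL_NOT_OPER)
		return -ENODEV;

	/* Decrement avail only if it is positive. */
	old = atomic_load_explicit(&p->avail, memory_order_relaxed);
	while (old > 0) {
		if (atomic_compare_exchange_weak_explicit(&p->avail, &old,
							   old - 1,
							   memory_order_relaxed,
							   memory_order_relaxed))
			return 0;
	}
	return -EPERM;
}
```

Called from a single thread, model_create() fails with -ENODEV before
model_probe_finish(), succeeds once afterwards, and then fails with -EPERM
because avail has been used up. With plain (unordered) stores instead of the
release/acquire pair, the model would no longer guarantee that a reader seeing
STANDBY also sees avail == 1.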