Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp14685306rwb; Mon, 28 Nov 2022 04:03:13 -0800 (PST) X-Google-Smtp-Source: AA0mqf67TYsLuZa937O9KGEsaVgMVJRfWyLfUkagUBwBwaAxd695uzax2al3xjy+Oy/1pKj9ZLzM X-Received: by 2002:a17:90a:7896:b0:218:6fd6:b693 with SMTP id x22-20020a17090a789600b002186fd6b693mr53782368pjk.29.1669636992901; Mon, 28 Nov 2022 04:03:12 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669636992; cv=none; d=google.com; s=arc-20160816; b=KtXojFF7lWlwYAli2mU1opU2IwEaS5YMslM2tCsI2xK0iKeOLDskQW0m1wcPRNK9zS E5yQIFxaOC9OWR1bGPf89BBqXlEqgj0TEQqDiwxCh12h7J9SUrI9Fvm6pts9RAKwiZQR fA2dOI1CYVYs+YstxVswv7UaaFXnFJPPuUHBLRc2tIiUGoq8sUGuXe6CB8Np/MontBzP Kmavew/srdP3tRLKERzm49QTMpoZorrxtlwrQ0dFFJKQxHypppvQ6hF4WOiPf6WOYWtx TNaCcm2CfY6obNyhxSi3DZSYSekwG1MwYRItlE9zQyMGudcGjJu1SCidicunBui2NRpO Z1zA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=R+GL6RZBaUfwQLKAw1mCuqTI7YOmfRFttZCcajdt4Qg=; b=Br/mad76X6Xc7ppvOy+nzMTF+qF7ZPzc8ERcQWQAASs2pSqysTn9XQD0trc770/AAW LmEXjoeP2KTstyRrtW2Hpi3rrtedDic0HK9PYnOVGMfFSIrL/z0Zq2JqEljNnMjs9gJM Cc4k9HmjFX4lnEW1UYjT8y3AYr9FMGzp5MfQgfHZ+5V3wt+ZhzfLalTaORvxsvds2ntq SmX8fBeKaH9/8l7Y6v10/UImqFr3wWzGhmw3mEofRtg+N1DcbpDbKKk3J/k73VPzGuHv Ray5KJ98KhgJbKIDUGx/ITdlXQrrFqfuqSQjmohEszZ+wNGl+d2FlJZQAzqAj5MoBq0d IV8Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id nu16-20020a17090b1b1000b002135e1f5a7bsi16953135pjb.155.2022.11.28.04.03.00; Mon, 28 Nov 2022 04:03:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231252AbiK1L7m (ORCPT + 85 others); Mon, 28 Nov 2022 06:59:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35354 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230469AbiK1L7j (ORCPT ); Mon, 28 Nov 2022 06:59:39 -0500 Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85344186CA for ; Mon, 28 Nov 2022 03:59:37 -0800 (PST) Received: from kwepemi100025.china.huawei.com (unknown [172.30.72.54]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4NLPBp0Q1czRpdR; Mon, 28 Nov 2022 19:58:58 +0800 (CST) Received: from [10.174.148.223] (10.174.148.223) by kwepemi100025.china.huawei.com (7.221.188.158) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 28 Nov 2022 19:59:34 +0800 Message-ID: <28c1ae52-38ff-42ce-4331-11f7aa040296@huawei.com> Date: Mon, 28 Nov 2022 19:59:33 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH] vdpasim: support doorbell mapping To: "Michael S. Tsirkin" CC: Jason Wang , , , , , , , , , References: <20221128025558.2152-1-longpeng2@huawei.com> <53edc14a-74bb-f9c7-06bd-7ea1047fe613@huawei.com> <20221128051555-mutt-send-email-mst@kernel.org> From: "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" In-Reply-To: <20221128051555-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.148.223] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To kwepemi100025.china.huawei.com (7.221.188.158) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2022/11/28 18:19, Michael S. Tsirkin 写道: > On Mon, Nov 28, 2022 at 04:19:30PM +0800, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote: >> >> >> 在 2022/11/28 12:05, Jason Wang 写道: >>> On Mon, Nov 28, 2022 at 10:56 AM Longpeng(Mike) wrote: >>>> >>>> From: Longpeng >>>> >>>> Support doorbell mapping for vdpasim devices, then we can test the notify >>>> passthrough feature even if there's no real hardware on hand. >>> >>> You can use vp_vdpa in L1 plus page_ver_vq in L0 to test it in L2. >>> That is how I test it. >>> >> Yes, using nested virtualization can work, but it's hard to deploy in my >> working environment for some reasons, so I decided to emulate this >> capability in vdpasim, it's much easier. >> >>>> >>>> Allocates a dummy page which used to emulate the notify page of the device. >>>> All values written to this page would be ignored, a periodic work will >>>> check whether there're requests that need to process. >>> >>> This seems tricky, it means the device is working even if there's no >> >> Right. It just try to make the vdpasim device work properly, but the vdpasim >> device is only used for testing, so maybe the tricky emulation is >> acceptable? > > Maybe. You can try enabling VIRTIO_F_NOTIFICATION_DATA and then > looking at the data written to figure out whether > you need to poll the vq. > We can try after the kernel supports the VIRTIO_F_NOTIFICATION_DATA feature, while there is still a long way to go. > >>> kick. If we really want to do, we should try to use page fault handler >>> (probably by extending the config ops), but I'm not sure it's worth to >>> bother (or if we can find a use case for no simulator devices). >>> >> This need to modify the framework, it seems unworthy. >> >>>> >>>> This cap is disabled as default, users can enable it as follow: >>>> modprobe vdpa_sim notify_passthrough=true >>>> >>>> Signed-off-by: Longpeng >>>> --- >>>> drivers/vdpa/vdpa_sim/vdpa_sim.c | 71 ++++++++++++++++++++++++++-- >>>> drivers/vdpa/vdpa_sim/vdpa_sim.h | 5 +- >>>> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 5 +- >>>> drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 +- >>>> 4 files changed, 76 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c >>>> index 7438a89ce939..5c215b56b78b 100644 >>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c >>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c >>>> @@ -14,6 +14,7 @@ >>>> #include >>>> #include >>>> #include >>>> +#include >>>> #include >>>> #include >>>> #include >>>> @@ -36,9 +37,15 @@ module_param(max_iotlb_entries, int, 0444); >>>> MODULE_PARM_DESC(max_iotlb_entries, >>>> "Maximum number of iotlb entries for each address space. 0 means unlimited. (default: 2048)"); >>>> >>>> +static bool notify_passthrough; >>>> +module_param(notify_passthrough, bool, 0444); >>>> +MODULE_PARM_DESC(notify_passthrough, >>>> + "Enable vq notify(doorbell) area mapping. (default: false)"); >>>> + >>>> #define VDPASIM_QUEUE_ALIGN PAGE_SIZE >>>> #define VDPASIM_QUEUE_MAX 256 >>>> #define VDPASIM_VENDOR_ID 0 >>>> +#define VDPASIM_VRING_POLL_PERIOD 100 /* ms */ >>>> >>>> static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa) >>>> { >>>> @@ -276,7 +283,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr, >>>> } >>>> >>>> vdpasim->dev_attr = *dev_attr; >>>> - INIT_WORK(&vdpasim->work, dev_attr->work_fn); >>>> + INIT_DELAYED_WORK(&vdpasim->vring_work, dev_attr->work_fn); >>>> spin_lock_init(&vdpasim->lock); >>>> spin_lock_init(&vdpasim->iommu_lock); >>>> >>>> @@ -287,6 +294,15 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr, >>>> set_dma_ops(dev, &vdpasim_dma_ops); >>>> vdpasim->vdpa.mdev = dev_attr->mgmt_dev; >>>> >>>> + if (notify_passthrough) { >>>> + vdpasim->notify = __get_free_page(GFP_KERNEL | __GFP_ZERO); >>>> + if (!vdpasim->notify) >>>> + goto err_iommu; >>>> +#ifdef CONFIG_X86 >>>> + set_memory_uc(vdpasim->notify, 1); >>>> +#endif >>> >>> What's the reason for using uc memory? >>> >> The vma->vm_page_prot of notify mapping is pgprot_noncached (see >> vhost_vdpa_fault) but the vdpasim->notify is WB, so we should set its >> memtype to UC here and set it back to WB when releasing the device. > > You never look at this memory though. Why does it matter whether > it's UC or WB? > The warning in trace_pfn_remap() would be triggered. For example: x86/PAT: CPU 16/KVM:17819 map pfn RAM range req uncached-minus for [mem 0x5151f3000-0x5151f3fff], got write-back >>>> + } >>>> + >>>> vdpasim->config = kzalloc(dev_attr->config_size, GFP_KERNEL); >>>> if (!vdpasim->config) >>>> goto err_iommu; >>>> @@ -357,8 +373,11 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx) >>>> struct vdpasim *vdpasim = vdpa_to_sim(vdpa); >>>> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx]; >>>> >>>> + if (notify_passthrough) >>>> + return; >>> >>> So we should keep the two paths to be used at the same time. Userspace >>> can choose to not map doorbells? >>> >> It can work even if the userspace does not to map doorbells (e.g start >> without page-per-vq=on), because the device will periodic check its vqs. >> >>> Thanks >>> >>>> + >>>> if (vq->ready) >>>> - schedule_work(&vdpasim->work); >>>> + schedule_work(&vdpasim->vring_work.work); >>>> } >>>> >>>> static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx, >>>> @@ -495,6 +514,18 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa) >>>> return status; >>>> } >>>> >>>> +static void vdpasim_set_vring_work(struct vdpasim *vdpasim, bool start) >>>> +{ >>>> + if (!notify_passthrough) >>>> + return; >>>> + >>>> + if (start) >>>> + schedule_delayed_work(&vdpasim->vring_work, >>>> + msecs_to_jiffies(VDPASIM_VRING_POLL_PERIOD)); >>>> + else >>>> + cancel_delayed_work_sync(&vdpasim->vring_work); >>>> +} >>>> + >>>> static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status) >>>> { >>>> struct vdpasim *vdpasim = vdpa_to_sim(vdpa); >>>> @@ -502,12 +533,16 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status) >>>> spin_lock(&vdpasim->lock); >>>> vdpasim->status = status; >>>> spin_unlock(&vdpasim->lock); >>>> + >>>> + vdpasim_set_vring_work(vdpasim, status & VIRTIO_CONFIG_S_DRIVER_OK); >>>> } >>>> >>>> static int vdpasim_reset(struct vdpa_device *vdpa, bool clear) >>>> { >>>> struct vdpasim *vdpasim = vdpa_to_sim(vdpa); >>>> >>>> + vdpasim_set_vring_work(vdpasim, false); >>>> + >>>> spin_lock(&vdpasim->lock); >>>> vdpasim->status = 0; >>>> vdpasim_do_reset(vdpasim); >>>> @@ -672,12 +707,24 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, unsigned int asid, >>>> return 0; >>>> } >>>> >>>> +static struct vdpa_notification_area >>>> +vdpasim_get_vq_notification(struct vdpa_device *vdpa, u16 qid) >>>> +{ >>>> + struct vdpasim *vdpasim = vdpa_to_sim(vdpa); >>>> + struct vdpa_notification_area notify; >>>> + >>>> + notify.addr = virt_to_phys((void *)vdpasim->notify); >>>> + notify.size = PAGE_SIZE; >>>> + >>>> + return notify; >>>> +} >>>> + >>>> static void vdpasim_free(struct vdpa_device *vdpa) >>>> { >>>> struct vdpasim *vdpasim = vdpa_to_sim(vdpa); >>>> int i; >>>> >>>> - cancel_work_sync(&vdpasim->work); >>>> + cancel_delayed_work_sync(&vdpasim->vring_work); >>>> >>>> for (i = 0; i < vdpasim->dev_attr.nvqs; i++) { >>>> vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov); >>>> @@ -693,7 +740,23 @@ static void vdpasim_free(struct vdpa_device *vdpa) >>>> vhost_iotlb_free(vdpasim->iommu); >>>> kfree(vdpasim->vqs); >>>> kfree(vdpasim->config); >>>> + if (vdpasim->notify) { >>>> +#ifdef CONFIG_X86 >>>> + set_memory_wb(vdpasim->notify, 1); >>>> +#endif >>>> + free_page(vdpasim->notify); >>>> + } >>>> +} >>>> + >>>> +void vdpasim_schedule_work(struct vdpasim *vdpasim, bool sched_now) >>>> +{ >>>> + if (sched_now) >>>> + schedule_work(&vdpasim->vring_work.work); >>>> + else if (notify_passthrough) >>>> + schedule_delayed_work(&vdpasim->vring_work, >>>> + msecs_to_jiffies(VDPASIM_VRING_POLL_PERIOD)); >>>> } >>>> +EXPORT_SYMBOL_GPL(vdpasim_schedule_work); >>>> >>>> static const struct vdpa_config_ops vdpasim_config_ops = { >>>> .set_vq_address = vdpasim_set_vq_address, >>>> @@ -704,6 +767,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = { >>>> .get_vq_ready = vdpasim_get_vq_ready, >>>> .set_vq_state = vdpasim_set_vq_state, >>>> .get_vq_state = vdpasim_get_vq_state, >>>> + .get_vq_notification = vdpasim_get_vq_notification, >>>> .get_vq_align = vdpasim_get_vq_align, >>>> .get_vq_group = vdpasim_get_vq_group, >>>> .get_device_features = vdpasim_get_device_features, >>>> @@ -737,6 +801,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = { >>>> .get_vq_ready = vdpasim_get_vq_ready, >>>> .set_vq_state = vdpasim_set_vq_state, >>>> .get_vq_state = vdpasim_get_vq_state, >>>> + .get_vq_notification = vdpasim_get_vq_notification, >>>> .get_vq_align = vdpasim_get_vq_align, >>>> .get_vq_group = vdpasim_get_vq_group, >>>> .get_device_features = vdpasim_get_device_features, >>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h >>>> index 0e78737dcc16..da0866834918 100644 >>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h >>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h >>>> @@ -53,7 +53,7 @@ struct vdpasim_dev_attr { >>>> struct vdpasim { >>>> struct vdpa_device vdpa; >>>> struct vdpasim_virtqueue *vqs; >>>> - struct work_struct work; >>>> + struct delayed_work vring_work; >>>> struct vdpasim_dev_attr dev_attr; >>>> /* spinlock to synchronize virtqueue state */ >>>> spinlock_t lock; >>>> @@ -69,10 +69,13 @@ struct vdpasim { >>>> bool running; >>>> /* spinlock to synchronize iommu table */ >>>> spinlock_t iommu_lock; >>>> + /* dummy notify page */ >>>> + unsigned long notify; >>>> }; >>>> >>>> struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr, >>>> const struct vdpa_dev_set_config *config); >>>> +void vdpasim_schedule_work(struct vdpasim *vdpasim, bool sched_now); >>>> >>>> /* TODO: cross-endian support */ >>>> static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim) >>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c >>>> index c6db1a1baf76..8a640ea82284 100644 >>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c >>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c >>>> @@ -288,7 +288,7 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim, >>>> >>>> static void vdpasim_blk_work(struct work_struct *work) >>>> { >>>> - struct vdpasim *vdpasim = container_of(work, struct vdpasim, work); >>>> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, vring_work.work); >>>> bool reschedule = false; >>>> int i; >>>> >>>> @@ -325,8 +325,7 @@ static void vdpasim_blk_work(struct work_struct *work) >>>> out: >>>> spin_unlock(&vdpasim->lock); >>>> >>>> - if (reschedule) >>>> - schedule_work(&vdpasim->work); >>>> + vdpasim_schedule_work(vdpasim, reschedule); >>>> } >>>> >>>> static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config) >>>> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c >>>> index c3cb225ea469..8b998952384b 100644 >>>> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c >>>> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c >>>> @@ -145,7 +145,7 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim) >>>> >>>> static void vdpasim_net_work(struct work_struct *work) >>>> { >>>> - struct vdpasim *vdpasim = container_of(work, struct vdpasim, work); >>>> + struct vdpasim *vdpasim = container_of(work, struct vdpasim, vring_work.work); >>>> struct vdpasim_virtqueue *txq = &vdpasim->vqs[1]; >>>> struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0]; >>>> ssize_t read, write; >>>> @@ -196,7 +196,7 @@ static void vdpasim_net_work(struct work_struct *work) >>>> vdpasim_net_complete(rxq, write); >>>> >>>> if (++pkts > 4) { >>>> - schedule_work(&vdpasim->work); >>>> + vdpasim_schedule_work(vdpasim, true); >>>> goto out; >>>> } >>>> } >>>> -- >>>> 2.23.0 >>>> >>> >>> . > > .