References: <20220509071426.155941-1-lulu@redhat.com>
In-Reply-To: <20220509071426.155941-1-lulu@redhat.com>
From: Jason Wang
Date: Mon, 9 May 2022 16:02:06 +0800
Subject: Re: [PATCH v1] vdpa: Do not count the pages that were already pinned in the vhost-vDPA
To: Cindy Lu
Cc: mst, kvm, linux-kernel, virtualization, netdev
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 9, 2022 at 3:15 PM Cindy Lu wrote:
>
> We count pinned_vm as follow in vhost-vDPA
>
> lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>          ret = -ENOMEM;
>          goto unlock;
> }
> This means if we have two vDPA devices for the same VM the pages would be counted twice
> So we add a tree to save the page that counted and we will not count it again
>
> Signed-off-by: Cindy Lu
> ---
>  drivers/vhost/vdpa.c     | 79 ++++++++++++++++++++++++++++++++++++++--
>  include/linux/mm_types.h |  2 +
>  2 files changed, 78 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 05f5fd2af58f..48cb5c8264b5 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -24,6 +24,9 @@
>  #include
>
>  #include "vhost.h"
> +#include
> +#include
> +#include
>
>  enum {
>  	VHOST_VDPA_BACKEND_FEATURES =
> @@ -505,6 +508,50 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>  	mutex_unlock(&d->mutex);
>  	return r;
>  }
> +int vhost_vdpa_add_range_ctx(struct rb_root_cached *root, u64 start, u64 last)
> +{
> +	struct interval_tree_node *new_node;
> +
> +	if (last < start)
> +		return -EFAULT;
> +
> +	/* If the range being mapped is [0, ULONG_MAX], split it into two entries
> +	 * otherwise its size would overflow u64.
> +	 */
> +	if (start == 0 && last == ULONG_MAX) {
> +		u64 mid = last / 2;
> +
> +		vhost_vdpa_add_range_ctx(root, start, mid);
> +		start = mid + 1;
> +	}
> +
> +	new_node = kmalloc(sizeof(struct interval_tree_node), GFP_ATOMIC);
> +	if (!new_node)
> +		return -ENOMEM;
> +
> +	new_node->start = start;
> +	new_node->last = last;
> +
> +	interval_tree_insert(new_node, root);
> +
> +	return 0;
> +}
> +
> +void vhost_vdpa_del_range(struct rb_root_cached *root, u64 start, u64 last)
> +{
> +	struct interval_tree_node *new_node;
> +
> +	while ((new_node = interval_tree_iter_first(root, start, last))) {
> +		interval_tree_remove(new_node, root);
> +		kfree(new_node);
> +	}
> +}
> +
> +struct interval_tree_node *vhost_vdpa_search_range(struct rb_root_cached *root,
> +						    u64 start, u64 last)
> +{
> +	return interval_tree_iter_first(root, start, last);
> +}
>
>  static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last)
>  {
> @@ -513,6 +560,7 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last)
>  	struct vhost_iotlb_map *map;
>  	struct page *page;
>  	unsigned long pfn, pinned;
> +	struct interval_tree_node *new_node = NULL;
>
>  	while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) {
>  		pinned = PFN_DOWN(map->size);
> @@ -523,7 +571,18 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last)
>  				set_page_dirty_lock(page);
>  			unpin_user_page(page);
>  		}
> -		atomic64_sub(PFN_DOWN(map->size), &dev->mm->pinned_vm);
> +
> +		new_node = vhost_vdpa_search_range(&dev->mm->root_for_vdpa,
> +						   map->start,
> +						   map->start + map->size - 1);
> +
> +		if (new_node) {
> +			vhost_vdpa_del_range(&dev->mm->root_for_vdpa,
> +					     map->start,
> +					     map->start + map->size - 1);
> +			atomic64_sub(PFN_DOWN(map->size), &dev->mm->pinned_vm);
> +		}
> +
>  		vhost_iotlb_map_free(iotlb, map);
>  	}
>  }
> @@ -591,6 +650,7 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, u64 iova,
>  	struct vdpa_device *vdpa = v->vdpa;
>  	const struct vdpa_config_ops *ops = vdpa->config;
>  	int r = 0;
> +	struct interval_tree_node *new_node = NULL;
>
>  	r = vhost_iotlb_add_range_ctx(dev->iotlb, iova, iova + size - 1,
>  				      pa, perm, opaque);
> @@ -611,9 +671,22 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, u64 iova,
>  		return r;
>  	}
>
> -	if (!vdpa->use_va)
> -		atomic64_add(PFN_DOWN(size), &dev->mm->pinned_vm);
> +	if (!vdpa->use_va) {
> +		new_node = vhost_vdpa_search_range(&dev->mm->root_for_vdpa,
> +						   iova, iova + size - 1);
> +
> +		if (new_node == 0) {
> +			r = vhost_vdpa_add_range_ctx(&dev->mm->root_for_vdpa,
> +						     iova, iova + size - 1);
> +			if (r) {
> +				vhost_iotlb_del_range(dev->iotlb, iova,
> +						      iova + size - 1);
> +				return r;
> +			}
>
> +			atomic64_add(PFN_DOWN(size), &dev->mm->pinned_vm);
> +		}

This doesn't seem sufficient. Consider:

vhost-vDPA-A: add [A, B)

vhost-vDPA-B: add [A, C) (C > B)

Don't we lose the accounting for [B, C)?

> +	}
>  	return 0;
>  }
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5140e5feb486..46eaa6d0560b 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -634,6 +634,8 @@ struct mm_struct {
>  #ifdef CONFIG_IOMMU_SUPPORT
>  	u32 pasid;
>  #endif
> +	struct rb_root_cached root_for_vdpa;
> +

Let's avoid touching mm_struct unless it's a must. We can allocate
something like vhost_mm if needed during SET_OWNER.

Thanks

>  } __randomize_layout;
>
>  /*
> --
> 2.34.1
>
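
To make the [A, B) / [A, C) case above concrete, here is a minimal userspace sketch (not kernel code, and not taken from the patch) of accounting only the part of a new range that is not already covered. The names acct_range and uncovered_units() are hypothetical, and a sorted, non-overlapping array stands in for the interval tree. Ranges are inclusive [start, last], as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one already-accounted range, inclusive on both ends. */
struct acct_range {
	uint64_t start;
	uint64_t last;
};

/*
 * Return how many units of [start, last] are not covered by any entry
 * in ranges[]. Assumes ranges[] is sorted by start and non-overlapping,
 * and that last - start + 1 does not overflow 64 bits.
 */
static uint64_t uncovered_units(const struct acct_range *ranges, size_t n,
				uint64_t start, uint64_t last)
{
	uint64_t uncovered = 0;
	uint64_t cur = start;	/* first unit not yet classified */
	size_t i;

	assert(start <= last);

	for (i = 0; i < n; i++) {
		if (ranges[i].last < cur)
			continue;	/* entry lies entirely before the cursor */
		if (ranges[i].start > last)
			break;		/* entry lies entirely after the new range */
		if (ranges[i].start > cur)
			uncovered += ranges[i].start - cur;	/* gap before entry */
		if (ranges[i].last >= last)
			return uncovered;	/* rest of the new range is covered */
		cur = ranges[i].last + 1;
	}

	/* Whatever remains from the cursor to the end is uncovered. */
	return uncovered + (last - cur + 1);
}
```

With a single accounted range [0x1000, 0x1fff], a request for [0x1000, 0x2fff] reports 0x1000 uncovered units. That is the [B, C) part, which a plain "does anything overlap?" check would skip.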