Received: by 2002:ac0:a5b6:0:0:0:0:0 with SMTP id m51-v6csp5180666imm; Tue, 12 Jun 2018 04:02:16 -0700 (PDT) X-Google-Smtp-Source: ADUXVKJJf1x/Hm4h1eUcNK19CFyVkLi+Z7vkpB2lsOgggRf/tQSst83w2UbthkiYCgdbhvVdF+Ti X-Received: by 2002:a65:4dc3:: with SMTP id q3-v6mr2771451pgt.331.1528801335982; Tue, 12 Jun 2018 04:02:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1528801335; cv=none; d=google.com; s=arc-20160816; b=JWaC8VH4ZwJ2suYjFf3bsRolWS0ecKAQlGunKZBk0DlATYcGYRmKUiN5ANfsPXGr/W qfypj+H9KThE9fNngvDf8SPi3wZ8aMrv3gtCsdnVl4u6u5rzyeLLLIgLggp5HF/bxLTY Pybhdioah4ifLQBd64+nB0b1usilPmwWe1oAz3ZeoeuYbElqqY/JQSsLC2V0iOB/HujS 4OmhJp11W0NnCicDINvV8vlV4/CiM4r2/kSfj7kz0X4W8HkJEJhR16gt1uFSyEK398ON M1LPidb6QeWglvaVSoh2Ic9BXP3twvMLMiHvFc/scbxGBHgW6qorCAe/hq1epehj6qIB doZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :references:subject:cc:to:mime-version:user-agent:from:date :message-id:arc-authentication-results; bh=oxS6eVGPk24jniyj7BsWIyuUKSJpdLIUxPdHjiCMtVM=; b=mFuG4tfYqgAP6FIaHjS/U8r8RSc8+it0KOAY3Pm6tU5ZQRJkERN9ep/AviCOQLMjNO eLyukCYgFOHZyGW0+J1vizq2tgnn3ZNm0S9nv/DfpdlENKRAVdJJXfHxpV65U72Fcpxq PuvxNWvnvKlb11uChFaXBkhxdJgq8ASDgbV1m0XqaSXvWNDcvJG5TJ6AGOZ/czlvzhpj k43PbuNHGudQ4UBcbyjnVHCEGM1bZLIwob9UeSpIZ8ld3f09GGA4Q8gFABsEvlQ2Om5s CPL++9AMmXqAeOOExRdoHuT/3pgpODhSJeh4aY+UEj4XaIyPgTo6y6ayWM0pujSobN3A nt6w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t25-v6si593770pfh.101.2018.06.12.04.02.01; Tue, 12 Jun 2018 04:02:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933166AbeFLLB3 (ORCPT + 99 others); Tue, 12 Jun 2018 07:01:29 -0400 Received: from mga05.intel.com ([192.55.52.43]:26823 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932286AbeFLLB0 (ORCPT ); Tue, 12 Jun 2018 07:01:26 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Jun 2018 04:01:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,214,1526367600"; d="scan'208";a="236732334" Received: from unknown (HELO [10.239.13.97]) ([10.239.13.97]) by fmsmga006.fm.intel.com with ESMTP; 12 Jun 2018 04:01:24 -0700 Message-ID: <5B1FA8EF.4030409@intel.com> Date: Tue, 12 Jun 2018 19:05:19 +0800 From: Wei Wang User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Linus Torvalds , "Michael S. Tsirkin" CC: KVM list , virtualization , Network Development , Linux Kernel Mailing List , Andrew Morton , Bjorn Andersson Subject: Re: [PULL] vhost: cleanups and fixes References: <20180611192353-mutt-send-email-mst@kernel.org> <20180612042600-mutt-send-email-mst@kernel.org> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/12/2018 09:59 AM, Linus Torvalds wrote: > On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin wrote: >> Maybe it will help to have GFP_NONE which will make any allocation >> fail if attempted. Linus, would this address your comment? > It would definitely have helped me initially overlook that call chain. > > But then when I started looking at the whole dma_map_page() thing, it > just raised my hackles again. > > I would seriously suggest having a much simpler version for the "no > allocation, no dma mapping" case, so that it's *obvious* that that > never happens. > > So instead of having virtio_balloon_send_free_pages() call a really > generic complex chain of functions that in _some_ cases can do memory > allocation, why isn't there a short-circuited "vitruque_add_datum()" > that is guaranteed to never do anything like that? > > Honestly, I look at "add_one_sg()" and it really doesn't make me > happy. It looks hacky as hell. If I read the code right, you're really > trying to just queue up a simple tuple of , except you encode > it as a page pointer in order to play games with the SG logic, and > then you hmap that to the ring, except in this case it's all a fake > ring that just adds the cpu-physical address instead. > > And to figuer that out, it's like five layers of indirection through > different helper functions that *can* do more generic things but in > this case don't. > > And you do all of this from a core VM callback function with some > _really_ core VM locks held. > > That makes no sense to me. > > How about this: > > - get rid of all that code > > - make the core VM callback save the "these are the free memory > regions" in a fixed and limited array. One that DOES JUST THAT. No > crazy "SG IO dma-mapping function crap". Just a plain array of a fixed > size, pre-allocated for that virtio instance. > > - make it obvious that what you do in that sequence is ten > instructions and no allocations ("Look ma, I wrote a value to an array > and incremented the array idex, and I'M DONE") > > - then in that workqueue entry that you start *anyway*, you empty the > array and do all the crazy virtio stuff. > > In fact, while at it, just simplify the VM interface too. Instead of > traversing a random number of buddy lists, just trraverse *one* - the > top-level one. Are you seriously ever going to shrink or mark > read-only anythin *but* something big enough to be in the maximum > order? > > MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really* > want the balloon code to work on smaller things, particularly since > the whole interface is fundamentally racy and opportunistic to begin > with? OK, I will implement a new version based on the suggestions. Thanks. Best, Wei