Received: by 2002:a05:6a10:6744:0:0:0:0 with SMTP id w4csp3845098pxu; Tue, 20 Oct 2020 01:49:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzNnx/X5BSqPUQD09hBxMlWJjRL61KvqcqCq9+VD2V4KabR4tshfW4HyBBx5owm43K2EsU6 X-Received: by 2002:a17:906:a119:: with SMTP id t25mr2177992ejy.67.1603183794118; Tue, 20 Oct 2020 01:49:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1603183794; cv=none; d=google.com; s=arc-20160816; b=zVT26cmIYboMtooId0KesOGXoV+TFZFbAmydv9QeMXVVBLwgbaX24tRXpW+eAxz4B4 ty9H16Fxl5q9vHNZVAS0666JpVasieKvp8rV4epXFD6k43LdByePdXaO0BnVKq0232iz CDMalYT9rPWO0q8EeUbdxrjlcNU2HskQmHpqSf2L9lc+usdEzngtdNfbchd90QsT/COZ fOVG7ZolehAH4vFU17ECjdis/YBttIRIknP5L9rjZgL/VcyXfHzFEO4D+pBKuumRqNxH y8HDfSBESB54385TCmP9cau+vA7bQxgBXGb1NM3g6KSf3cVTpXCl/mV2RcM6RV8u6lHQ TzHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:reply-to:message-id:subject:cc:to:from:date; bh=SRNLzrhf/1AXp2rxjqHAyvK8Pf4zFTaDw17kB03gV+w=; b=AUxSC9eup4tXVN+tjwUxoPi9s0Qd/AXbj73uLxNrFO5WfaxsGc1byawWjsxmnRf081 /Ssxcjq3Na3MZD8mWFcrf/nMhKzspwdz+Q4XftpdGDVspEkSAYnCGDKcjoFg7SiCXYUW dbdvRfimwB713zqpwUNujN+3uc1kuugWRJn0s7qUAagbPGF+5lyIAR7AXXgcH9bhTNEH t/7Bsl6bZfi0ldpbHmfmxZfOq7583x2nDtReVnvLNcm2ImYrOZxZhw4S/clsnrihGZcW 8eApnxcBBCSpKXJoiZ9sEkLnQYOxXfkWlqOSN068MUF7+8EbibIdkEuEYG1GRzxslk1d losQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id h25si464397ejc.209.2020.10.20.01.49.31; Tue, 20 Oct 2020 01:49:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389489AbgJTAle (ORCPT + 99 others); Mon, 19 Oct 2020 20:41:34 -0400 Received: from out4436.biz.mail.alibaba.com ([47.88.44.36]:20925 "EHLO out4436.biz.mail.alibaba.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2389484AbgJTAld (ORCPT ); Mon, 19 Oct 2020 20:41:33 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R261e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04420;MF=richard.weiyang@linux.alibaba.com;NM=1;PH=DS;RN=9;SR=0;TI=SMTPD_---0UCax3nK_1603154490; Received: from localhost(mailfrom:richard.weiyang@linux.alibaba.com fp:SMTPD_---0UCax3nK_1603154490) by smtp.aliyun-inc.com(127.0.0.1); Tue, 20 Oct 2020 08:41:31 +0800 Date: Tue, 20 Oct 2020 08:41:30 +0800 From: Wei Yang To: David Hildenbrand Cc: Wei Yang , linux-kernel@vger.kernel.org, linux-mm@kvack.org, virtualization@lists.linux-foundation.org, Andrew Morton , "Michael S . Tsirkin" , Jason Wang , Pankaj Gupta Subject: Re: [PATCH v1 09/29] virtio-mem: don't always trigger the workqueue when offlining memory Message-ID: <20201020004130.GB61232@L-31X9LVDL-1304.local> Reply-To: Wei Yang References: <20201012125323.17509-1-david@redhat.com> <20201012125323.17509-10-david@redhat.com> <20201016040301.GJ86495@L-31X9LVDL-1304.local> <82afba4e-66e2-ce05-c092-267301b66de9@redhat.com> <20201018035725.GA50506@L-31X9LVDL-1304> <5103e899-0ca2-0804-dee8-772b5737d34d@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5103e899-0ca2-0804-dee8-772b5737d34d@redhat.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 19, 2020 at 11:04:40AM +0200, David Hildenbrand wrote: >On 18.10.20 05:57, Wei Yang wrote: >> On Fri, Oct 16, 2020 at 11:18:39AM +0200, David Hildenbrand wrote: >>> On 16.10.20 06:03, Wei Yang wrote: >>>> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote: >>>>> Let's trigger from offlining code when we're not allowed to touch online >> >> Here "touch" means "unplug"? If so, maybe s/touch/unplug/ would be more easy >> to understand. > >Yes, much better. > >[...] > >> I am trying to get more understanding about the logic of virtio_mem_retry(). >> >> Current logic seems clear to me. There are four places to trigger it: >> >> * notify_offline >> * notify_online >> * timer_expired >> * config_changed >> >> In this patch, we try to optimize the first case, notify_offline. > >Yes. > >> >> Now, we would always trigger retry when one of our memory block get offlined. >> Per my understanding, this logic is correct while missed one case (or be more >> precise, not handle one case timely). The case this patch wants to improve is >> virtio_mem_mb_remove(). If my understanding is correct. >> > >Yes, that's one part of it. Read below. > >> virtio_mem_run_wq() >> virtio_mem_unplug_request() >> virtio_mem_mb_unplug_any_sb_offline() >> virtio_mem_mb_remove() --- 1 >> virtio_mem_mb_unplug_any_sb_online() >> virtio_mem_mb_offline_and_remove() --- 2 >> >> The above is two functions this patch adjusts. For 2), it will offline the >> memory block, thus will trigger virtio_mem_retry() originally. But for 1), the >> memory block is already offlined, so virtio_mem_retry() will not be triggered >> originally. This is the case we want to improve in this patch. Instead of wait >> for timer expire, we trigger retry immediately after unplug/remove an offlined >> memory block. >> >> And after this change, this patch still adjust the original >> virtio_mem_notify_offline() path to just trigger virtio_mem_retry() when >> unplug_online is false. (This means the offline event is notified from user >> space instead of from unplug event). >> >> If my above analysis is correct, I got one small suggestion for this patch. >> Instead of adjust current notify_offline handling, how about just trigger >> retry during virtio_mem_mb_remove()? Since per my understanding, we just want >> to do immediate trigger retry when unplug an offlined memory block. > >I probably should have added the following to the patch description: > >"This is a preparation for Big Block Mode (BBM), whereby we can see some >temporary offlining of memory blocks without actually making progress" > >Imagine you have a Big Block that spans to Linux memory blocks. Assume >the first Linux memory blocks has no unmovable data on it. > >Assume you call offline_and_remove_memory() > >1. Try to offline the first block. Works, notifiers triggered. >virtio_mem_retry(). After this patch, the virtio_mem_retry() is remove here. >2. Try to offline the second block. Does not work. >3. Re-online first block. >4. Exit to main loop, exit workqueue. Since offline_and_remove_memory() doesn't succeed, virtio_mem_retry() is not triggered. >5. Retry immediately (due to virtio_mem_retry()), go to 1. So we won't have endless loop. > >So, you'll keep retrying forever. Found while debugging that exact issue :) > If this is the case, my suggestion is to record it in the changelog. Otherwise, we may lose this corner case which is important to this change. > >-- >Thanks, > >David / dhildenb -- Wei Yang Help you, Help me