From: Wei Wang
Date: Fri, 04 Aug 2017 16:15:24 +0800
To: Michal Hocko, "Michael S. Tsirkin"
CC: "linux-kernel@vger.kernel.org", "virtualization@lists.linux-foundation.org",
    "kvm@vger.kernel.org", "linux-mm@kvack.org", "mawilcox@microsoft.com",
    "akpm@linux-foundation.org", "virtio-dev@lists.oasis-open.org",
    "david@redhat.com", "cornelia.huck@de.ibm.com", "mgorman@techsingularity.net",
    "aarcange@redhat.com", "amit.shah@redhat.com", "pbonzini@redhat.com",
    "liliang.opensource@gmail.com", "yang.zhang.wz@gmail.com", "quan.xu@aliyun.com"
Subject: Re: [PATCH v13 4/5] mm: support reporting free page blocks

On 08/04/2017 03:53 PM, Michal Hocko wrote:
> On Fri 04-08-17 00:02:01, Michael S. Tsirkin wrote:
>> On Thu, Aug 03, 2017 at 03:20:09PM +0000, Wang, Wei W wrote:
>>> On Thursday, August 3, 2017 9:51 PM, Michal Hocko:
>>>> As I've said earlier. Start simple optimize incrementally with some numbers to
>>>> justify a more subtle code.
>>>> --
>>> OK. Let's start with the simple implementation as you suggested.
>>>
>>> Best,
>>> Wei
>> The tricky part is when you need to drop the lock and
>> then restart because the device is busy. Would it maybe
>> make sense to rotate the list so that new head
>> will consist of pages not yet sent to device?
> No, I this should be strictly non-modifying API.

To give the context for discussion:

    spin_lock_irqsave(&zone->lock, flags);
    ...
    visit(opaque2, pfn, 1<<order);
    spin_unlock_irqrestore(&zone->lock, flags);

The concern is that the callback may cause the lock to be held for too
long. I think we have two options here:

- Option 1: Document a rule for the callback: the callback function
  must not block and should finish as soon as possible. (We keep
  similar rules in mind when implementing an interrupt handler, right?)

  For our use case, the callback just puts the reported page block on
  the ring, then returns. If the ring is full because the host is busy,
  then I think it should skip this block and just return, because:
    A. This is an optimization feature; failing to report a couple of
       free pages isn't that important.
    B. In practice, I think it's uncommon for the ring to fill up (I
       didn't observe a full ring in the tests), since the host
       (consumer) is notified to take out the page block right after
       it is added.

- Option 2: Call the callback function outside the lock.

  The only input to the callback is a pfn, and the callback won't
  access the corresponding pages. So I still think it won't be an issue
  whatever the status of the pages is after they are reported (even if
  they no longer exist due to hot-remove).

What do you think?

Best,
Wei
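
P.S. To make Option 1 more concrete, here is a rough userspace sketch of a callback that never blocks and simply skips a block when the ring is full. Everything in it is an assumption for illustration only: `report_ring`, `ring_try_add()`, `report_block()`, `walk_blocks()` and `RING_SIZE` are hypothetical names, the fixed-size array merely stands in for the virtqueue, and the walker does no locking. It is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring capacity; a stand-in for the real virtqueue size. */
#define RING_SIZE 4

struct block {
	unsigned long pfn;
	unsigned long nr_pages;
};

/* Hypothetical ring shared with the host (the consumer). */
struct report_ring {
	struct block slots[RING_SIZE];
	size_t used;     /* blocks queued and not yet taken by the host */
	size_t skipped;  /* blocks dropped because the ring was full */
};

/* Non-blocking add: returns false right away instead of waiting. */
static bool ring_try_add(struct report_ring *r, unsigned long pfn,
			 unsigned long nr_pages)
{
	if (r->used == RING_SIZE)
		return false;
	r->slots[r->used].pfn = pfn;
	r->slots[r->used].nr_pages = nr_pages;
	r->used++;
	return true;
}

/*
 * Callback with the same shape as visit(opaque2, pfn, 1<<order).
 * It only records the pfn range and never touches the pages, so it
 * stays short enough to run while the caller holds a lock.
 */
static void report_block(void *opaque, unsigned long pfn,
			 unsigned long nr_pages)
{
	struct report_ring *r = opaque;

	assert(r != NULL);
	if (!ring_try_add(r, pfn, nr_pages))
		r->skipped++;  /* ring full: skip this block, do not wait */
}

/*
 * Stand-in for the free-list walker. In the kernel the loop would run
 * with zone->lock held; this sketch takes no lock at all.
 */
static void walk_blocks(void *opaque, const unsigned long *pfns, size_t n,
			unsigned int order,
			void (*visit)(void *, unsigned long, unsigned long))
{
	for (size_t i = 0; i < n; i++)
		visit(opaque, pfns[i], 1UL << order);
}
```

Walking six blocks of order 10 through a four-slot ring queues the first four and counts the remaining two as skipped, which is the "lose a couple of free pages" behaviour described in point A.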