Date: Tue, 17 Jan 2023 16:54:38 -0500
From: Peter Xu
To: James Houghton
Cc: Mike Kravetz, Muchun Song, David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 35/46] hugetlb: add MADV_COLLAPSE for hugetlb
References: <20230105101844.1893104-1-jthoughton@google.com> <20230105101844.1893104-36-jthoughton@google.com>

On Tue, Jan 17, 2023 at 01:38:24PM -0800, James Houghton wrote:
> > > +	if (curr < end) {
> > > +		/* Don't hold the VMA lock for too long. */
> > > +		hugetlb_vma_unlock_write(vma);
> > > +		cond_resched();
> > > +		hugetlb_vma_lock_write(vma);
> >
> > The intention is good here but IIUC this will cause the vma lock to be taken
> > after the i_mmap_rwsem, which can cause circular deadlocks.  To do this
> > properly we'll also need to release the i_mmap_rwsem.
>
> Sorry if you spent a long time debugging this! I sent a reply a week
> ago about this too.

Oops, yes, I somehow missed that one.  No worry - it's reported by lockdep. :)

>
> >
> > However it may make the resched() logic overcomplicated; meanwhile for 2M
> > huge pages I think this will be called for each 2M range, which can be too
> > fine-grained, so it looks like the "curr < end" check is a bit too aggressive.
> >
> > The other thing is I noticed that the long period of mmu notifier
> > invalidation between start -> end will (in a real-life VM context) cause
> > vcpu threads to spin.
> >
> > I _think_ it's because is_page_fault_stale() (during a vmexit
> > following a kvm page fault) always reports true during the long procedure
> > of MADV_COLLAPSE if it is called upon a large range, so even if we release
> > both locks here it may not help tremendously on the VM migration use case
> > because of the long-standing mmu notifier invalidation procedure.
>
> Oh... indeed. Thanks for pointing that out.
>
> >
> > To summarize.. I think a simpler first version of hugetlb MADV_COLLAPSE can
> > drop this "if" block, and let the userapp decide the step size of COLLAPSE?
>
> I'll drop this resched logic. Thanks Peter.

Sounds good, thanks.

-- 
Peter Xu
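
For illustration only, here is a rough userspace sketch of what "let the userapp decide the step size" could look like. It is not part of the patch. It assumes the kernel accepts MADV_COLLAPSE on the mapping in question (for hugetlb that is what this series proposes), and the names collapse_in_steps(), advise_fn and advise_dry_run() are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* asm-generic uapi value, for older libc headers */
#endif

/* Same shape as madvise(2), so madvise itself can be passed as the callback. */
typedef int (*advise_fn)(void *addr, size_t len, int advice);

/*
 * Hypothetical helper: collapse [addr, addr + len) in chunks of at most
 * `step` bytes, so that each call keeps the kernel busy for a bounded range.
 * Stops at the first failing call and returns the number of bytes covered
 * by the calls that succeeded.
 */
size_t collapse_in_steps(void *addr, size_t len, size_t step, advise_fn advise)
{
	size_t done = 0;

	assert(step > 0);
	while (done < len) {
		size_t chunk = (len - done < step) ? len - done : step;

		if (advise((char *)addr + done, chunk, MADV_COLLAPSE) != 0)
			break;
		done += chunk;
	}
	return done;
}

/* Dry-run callback (illustrative): counts calls without touching memory. */
size_t dry_run_calls;

int advise_dry_run(void *addr, size_t len, int advice)
{
	(void)addr;
	(void)len;
	(void)advice;
	dry_run_calls++;
	return 0;
}
```

A real caller would pass madvise as the last argument, e.g. collapse_in_steps(buf, len, step, madvise), with step chosen as a multiple of the huge page size. A smaller step means each mmu notifier invalidation window is shorter, at the cost of more syscalls.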