From: Joel Fernandes
Date: Fri, 19 May 2023 23:17:37 -0400
Subject: Re: [PATCH v2 1/4] mm/mremap: Optimize the start addresses in move_page_tables()
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    linux-mm@kvack.org, Shuah Khan, Vlastimil Babka, Michal Hocko,
    Lorenzo Stoakes, Kirill A Shutemov, "Liam R. Howlett",
    "Paul E. McKenney", Suren Baghdasaryan
References: <20230519190934.339332-1-joel@joelfernandes.org>
    <20230519190934.339332-2-joel@joelfernandes.org>

Hi Linus,

On Fri, May 19, 2023 at 10:34 PM Linus Torvalds wrote:
>
> On Fri, May 19, 2023 at 3:52 PM Joel Fernandes wrote:
> > >
> > > I *suspect* that the test is literally just for the stack movement
> > > case by execve, where it catches the case where we're doing the
> > > movement entirely within the one vma we set up.
> >
> > Yes that's right, the test is only for the stack movement case. For
> > the regular mremap case, I don't think there is a way for it to
> > trigger.
>
> So I feel the test is simply redundant.
>
> For the regular mremap case, it never triggers.

Unfortunately, I just found that mremap-ing a range purely within a
VMA can actually cause the old and new VMA passed to
move_page_tables() to be the same.

I added a printk to the beginning of move_page_tables that prints all
the args:

        printk("move_page_tables(vma=(%lx,%lx), old_addr=%lx, new_vma=(%lx,%lx), new_addr=%lx, len=%lx)\n",
               vma->vm_start, vma->vm_end, old_addr,
               new_vma->vm_start, new_vma->vm_end, new_addr, len);

Then I wrote a simple test to move 1MB purely within a 10MB range and
I found on running the test that the old and new vma passed to
move_page_tables() are exactly the same.

[   19.697596] move_page_tables(vma=(7f1f985f7000,7f1f98ff7000), old_addr=7f1f987f7000, new_vma=(7f1f985f7000,7f1f98ff7000), new_addr=7f1f98af7000, len=100000)

That is a bit counter intuitive as I really thought we'd be splitting
the VMAs with such a move.
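
For illustration, a within-VMA move like the one in the log can be
sketched in userspace roughly as follows. This is only a sketch under
stated assumptions, not the actual test: the helper name
move_within_mapping() is hypothetical, the mapping is assumed to be
anonymous and private, and the pages are not touched before the move.
The 2MB source offset, 5MB destination offset and 1MB length are read
off the addresses printed above.

```c
#define _GNU_SOURCE             /* for mremap() and the MREMAP_* flags */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define MB       (1024UL * 1024UL)
#define MAP_LEN  (10 * MB)      /* size of the whole mapping */
#define SRC_OFF  (2 * MB)       /* old_addr - vma->vm_start in the log */
#define DST_OFF  (5 * MB)       /* new_addr - vma->vm_start in the log */
#define MOVE_LEN (1 * MB)       /* len in the log */

/* MREMAP_FIXED rejects overlapping source and destination ranges. */
static_assert(SRC_OFF + MOVE_LEN <= DST_OFF, "ranges must not overlap");

/*
 * Hypothetical helper: map MAP_LEN bytes, then move MOVE_LEN bytes from
 * SRC_OFF to DST_OFF inside that same mapping.
 *
 * Returns 0 if the chunk ended up at the requested destination, -1
 * otherwise. On success *base_out holds the start of the mapping; the
 * source range is left as a hole.
 */
static int move_within_mapping(char **base_out)
{
    char *base = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    char *src = base + SRC_OFF;
    char *dst = base + DST_OFF;

    /* MREMAP_FIXED is only valid together with MREMAP_MAYMOVE. */
    void *ret = mremap(src, MOVE_LEN, MOVE_LEN,
                       MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    if (ret == MAP_FAILED || ret != (void *)dst) {
        munmap(base, MAP_LEN);
        return -1;
    }

    /* The destination range must still be mapped and writable. */
    dst[0] = 1;
    dst[MOVE_LEN - 1] = 1;

    *base_out = base;
    return 0;
}
```

Calling move_within_mapping() on a kernel carrying the printk above
would be one way to see which vma and new_vma get passed down; the
sketch itself only checks that the move lands at the requested address.
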
Any idea what I am missing? Also, such a use case will break with my
patch, as we may accidentally overwrite parts of a range that were not
part of the mremap request. Maybe I should just turn off the
optimization if vma == new_vma; however, that will also turn it off
for the stack move, so maybe another way is to special-case stack
moves in move_page_tables().

So this means I have to go back to the drawing board a bit on this
patch, and also add more tests in mremap_test.c to test such
within-VMA moving. I believe there are no such existing tests... More
work to do for me. :-)

> And for the stack movement case by execve, I don't think it matters if
> you just were to change the logic of the subsequent checks a bit.
>
> In particular, you do this:
>
>         /* If the masked address is within vma, there is no prev mapping of concern. */
>         if (vma->vm_start <= addr_masked)
>                 return false;
>
>         /*
>          * Attempt to find vma before prev that contains the address.
>          * On any issue, assume the address is within a previous mapping.
>          * @mmap write lock is held here, so the lookup is safe.
>          */
>         cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
>         if (!cur || cur != vma || !prev)
>                 return true;
>
>         /* The masked address fell within a previous mapping. */
>         if (prev->vm_end > addr_masked)
>                 return true;
>
>         return false;
>
> And I think that
>
>         if (!cur || cur != vma || !prev)
>                 return true;
>
> is actively wrong, because if there is no 'prev', then you should return false.

During my tests, I observed that there was always an existing,
unrelated memory mapping present before the new memory region
allocated by mmap. Based on this observation, I concluded that if
there is no previous mapping (i.e., if prev is NULL), it indicates a
potential issue with find_vma_prev(). Therefore, I designed this
function to return true here, indicating that the masked address is
not suitable for the optimization, whenever prev is NULL.
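
To make the disagreement over the !prev case concrete, here is a small
userspace model of the quoted check. It is a sketch, not kernel code:
struct range and masked_addr_unusable() are hypothetical names, the
find_vma_prev() lookup is replaced by passing prev in directly (so the
cur != vma condition is not modelled), and the missing_prev_blocks flag
exists only to switch between the two readings discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: just its bounds, end exclusive. */
struct range {
    unsigned long start;
    unsigned long end;
};

/*
 * Model of the quoted check. 'vma' is the mapping being moved and
 * 'prev' the mapping just below it, or NULL if there is none.
 * Returns true when the masked (aligned-down) address must not be used.
 *
 * missing_prev_blocks = true  models the patch as posted (!prev -> true).
 * missing_prev_blocks = false models the suggestion in the review
 *                             (!prev -> false).
 */
static bool masked_addr_unusable(const struct range *vma,
                                 const struct range *prev,
                                 unsigned long addr_masked,
                                 bool missing_prev_blocks)
{
    assert(vma != NULL);

    /* Masked address still inside vma: no previous mapping involved. */
    if (vma->start <= addr_masked)
        return false;

    /* No mapping below vma: this is the case under discussion. */
    if (!prev)
        return missing_prev_blocks;

    /* Unusable only if the previous mapping reaches past the masked address. */
    return prev->end > addr_masked;
}
```

With missing_prev_blocks set to false, the last two steps collapse into
the single expression prev && prev->end > addr_masked.
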
The check as written is obviously confusing, so I'll try to rewrite
this part of the patch a bit better with appropriate comments.

> So I *think* all of the above could just be replaced with this instead:
>
>         find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
>         return prev && prev->vm_end > addr_masked;
>
> because only if we have a 'prev', and the prev is into that masked
> address, do we need to avoid doing the masking.
>
> With that simplified test, do you even care about that whole "the
> masked address was already in the vma"? Not that I can see.
>
> And we don't even care about the return value of 'find_vma_prev()',
> because it had better be 'vma'. We're giving it 'vma->vm_start' as an
> address, for chrissake!
>
> So if you *really* wanted to, you could do something like
>
>         cur = find_vma_prev(..);
>         if (WARN_ON_ONCE(cur != vma))
>                 return true;
>
> but even that WARN_ON_ONCE() seems pretty bogus. If it triggers, we
> have some serious corruption going on.
>
> So I still find that whole "vma->vm_start <= addr_masked" test a bit
> confusing, since it seems entirely redundant.
>
> Is it just because you wanted to avoid calling "find_vma_prev()" at
> all? Maybe just say that in the comment.

Yes exactly, I did not want to run find_vma_prev() unnecessarily. I
will add such clarifications in the comments.

Thanks for all the comments so far, I will continue to work on this.

 - Joel