Date: Sun, 26 Mar 2023 02:26:58 +0000
From: Joel Fernandes
To: Linus Torvalds
Cc: "Kirill A.
Shutemov", Michal Hocko, Naresh Kamboju, Andrew Morton, linux-mm@kvack.org, LKML
Subject: Re: WARN_ON in move_normal_pmd
Message-ID: <20230326022658.GB3142556@google.com>
References: <20230324130530.xsmqcxapy4j2aaik@box.shutemov.name> <20230325163323.GA3088525@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Mar 25, 2023 at 10:06:59AM -0700, Linus Torvalds wrote:
> On Sat, Mar 25, 2023 at 9:33 AM Joel Fernandes wrote:
> >
> > I actually didn't follow what you meant by "mutually PMD-aligned". Could you
> > provide some example address numbers to explain?
>
> Sure, let me make this more clear with a couple of concrete examples.
>
> Let's say that we have a range '[old, old+len]' and we want to remap
> it to '[new, new+len]'.
>
> Furthermore, we'll say that overlapping is fine, but because we're
> always moving pages from "low address to high", we only allow
> overlapping when we're moving things down (ie 'new < old').
>
> And yes, I know that the overlapping case cannot actually happen with
> mremap() itself. So in practice the overlapping case only happens for
> the special "move the stack pages" around at execve() startup, but
> let's ignore that for now.
>
> So we'll talk about the generic "move pages around" case, not the more
> limited 'mremap()' case.
>
> I'll also simplify the thing to just assume that we have that
> CONFIG_HAVE_MOVE_PMD enabled, so I'll ignore some of the full grotty
> details.
>
> Ok?

[...]
>
> we could easily decode "let's just move the whole PMD", and expand the
> move to be
>
> old = 0x1e00000
> new = 0x1c00000
> len = 0x400000
> instead. And then instead of moving PTE's around at first, we'd move
> PMD's around *all* the time, and turn this into that "simple case
> (a)".

Right, I totally get what you mean. You want to move more than just the
4k pages at the beginning of the mapping. In fact you want the whole PMD,
which extends further below the destination, to capture the full PMD that
the first 4k pages are located in. With that, you get to move purely PMDs
all the way. I think that is a great idea.

> NOTE! For this to work, there must be no mapping right below 'old' or
> 'new', of course. But during the execve() startup, that should be
> trivially true.

Exactly, it won't work if there is something below old or new. So for
that very reason, we still have to handle the bad case where the source
PMD was not deleted, right? Because if there is something below new,
you'll need to copy 1 PTE at a time till you hit the 2MB boundary: you
can't mess with that source PMD, since it is in use to satisfy mappings
below new. Then you'll eventually hit the warning we are discussing.

I guess even if one can ensure that there is no mapping below new for
the execve() case, that still cannot be guaranteed for the mremap() case,
I think.

But I agree, if there is no mapping below old/new, then we can just do
this as an optimization. I think all that is needed is to check whether
there are any VMAs at those locations, but correct me if I'm wrong as I'm
not an mm expert.

> See what I'm saying?

Yep. And as you pointed out in the mremap example, this issue can also
show up with non-overlapping ranges, if I'm not mistaken.

I get your idea. Allow me to digest all this a bit more, and since it is
not urgent and this stuff is going to take some careful work with proper
test cases etc, let me take this up and work on it. But your idea is loud
and clear.
I am also working on sending you that RCU PR and working hard to not
screw that up, so things are a bit busy :-P. And thank you again for the
great idea and discussion! Looking forward to working on this.

thanks,

 - Joel