From: Linus Torvalds
Date: Fri, 22 Apr 2022 10:08:00 -0700
Subject: Re: [PATCH 2/2] Revert "vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"
To: Nicholas Piggin
Cc: Paul Menzel, "the arch/x86 maintainers", Song Liu, "Edgecombe, Rick P", Andrew Morton, Linux Kernel Mailing List, Linux-MM
References: <20220422060107.781512-1-npiggin@gmail.com> <20220422060107.781512-3-npiggin@gmail.com>
In-Reply-To: <20220422060107.781512-3-npiggin@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 21, 2022 at 11:01 PM Nicholas Piggin wrote:
>
> This reverts commit 559089e0a93d44280ec3ab478830af319c56dbe3
>
> The previous commit fixes huge vmalloc for drivers that use the
> vmalloc_to_page() struct pages.

Yeah, no.

The very revert shows the problem:

> --- a/arch/powerpc/kernel/module.c
> +++ b/arch/powerpc/kernel/module.c
> @@ -101,7 +101,7 @@ __module_alloc(unsigned long size, unsigned long start, unsigned long end, bool
>          * too.
>          */
>         return __vmalloc_node_range(size, 1, start, end, gfp, prot,
> -                                   VM_FLUSH_RESET_PERMS,
> +                                   VM_FLUSH_RESET_PERMS | VM_NO_HUGE_VMAP,
>                                     NUMA_NO_NODE, __builtin_return_address(0));

This VM_NO_HUGE_VMAP is a sign of the fact that using hugepages for
mapping still isn't a transparent operation.

Now, in some cases that would be perfectly fine, ie the s390 case has
a nice clear comment about how it's a very special case:

> +       /*
> +        * The Create Secure Configuration Ultravisor Call does not support
> +        * using large pages for the virtual memory area.
> +        * This is a hardware limitation.
> +        */
> +       kvm->arch.pv.stor_var = vmalloc_no_huge(vlen);

but as long as it is "anything that plays permission games with the
mapping is broken" we are not reverting that opt-in thing.

And no, it's not just that powerpc module code that is somehow
magical. This is the exact same issue that the bpf people hit.

It's also elsewhere, although it might well be hidden by "small
allocations will never trigger this" (eg the arm64 kprobes case only
does a single page).

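To make the opt-in versus opt-out distinction concrete, here is a
minimal userspace sketch. It is a toy model, not kernel code: the
TOY_* flag values, the assumed 2 MiB huge-page size, and both helper
functions are hypothetical and only mirror the names of
VM_NO_HUGE_VMAP and VM_ALLOW_HUGE_VMAP from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this toy model; NOT the kernel's values. */
#define TOY_VM_NO_HUGE_VMAP    (1u << 0)  /* opt-out: never use huge pages here */
#define TOY_VM_ALLOW_HUGE_VMAP (1u << 1)  /* opt-in: huge pages are known safe here */

#define TOY_PAGE_SIZE 4096ul
#define TOY_HUGE_SIZE (2ul * 1024 * 1024)  /* assumed huge-page size */

/*
 * Opt-out policy: any large enough allocation gets a huge mapping
 * unless the caller remembered to object.
 */
bool toy_huge_opt_out(size_t size, unsigned int flags)
{
    return size >= TOY_HUGE_SIZE && !(flags & TOY_VM_NO_HUGE_VMAP);
}

/*
 * Opt-in policy: a large allocation gets a huge mapping only if the
 * caller explicitly asked for one.
 */
bool toy_huge_opt_in(size_t size, unsigned int flags)
{
    return size >= TOY_HUGE_SIZE && (flags & TOY_VM_ALLOW_HUGE_VMAP);
}
```

In this model a caller that passes no flags at all (say, code that
later plays permission games with the mapping and was never audited)
gets a huge mapping under the opt-out policy and small pages under
the opt-in policy. That asymmetry is the argument for keeping the
opt-in flag.
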
I also wonder how this affects any use of 'set_memory_xyz()' with
partial mappings (I can point to "frob_text()" and friends for
modules, but I can easily imagine drivers doing odd things).

In particular, x86 does support pmd splitting for pmd's in
set_memory_xyz(), but I *really* couldn't tell you that it's ok with a
largepage that has already had its page counts split. It only used to
hit the big IO mappings traditionally.

Now I *think* it JustWorks(tm) - I don't actually see any obvious
problems there - and I also really hope that nobody actually even does
that "partial set_memory" on some vmalloc allocation in the first
place, but no, that kind of "let's hope" is not ok.

And we already know it happens at least for modules.

And no, don't even start about that "it's x86". It *still* isn't about
x86 as shown by this very patch. The issue is generic, and x86 just
tends to hit more odd cases and drivers.

In fact, I think x86 probably does *better* than powerpc. Because it
looks like 'set_memory_xyz()' just returns an error for vmalloc
addresses on powerpc.

Sounds strange. Doesn't powerpc do STRICT_MODULE_RWX? Does it work
only because 'frob_text()' doesn't actually check the return value?

Or maybe set_memory_xyz() is ok and it is *only* VM_FLUSH_RESET_PERMS
that doesn't work?

I don't know. But I do know bpf was affected, and I'm looking at that
module thing, and so I suspect it's elsewhere too.

Just opt-in with the mappings that matter.

             Linus
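
The question above about an unchecked return value can be illustrated
with a small userspace sketch. It only models the scenario the message
speculates about and makes no claim about what powerpc or frob_text()
actually do: struct toy_region and all toy_* functions are
hypothetical stand-ins.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy permission state; hypothetical, not a kernel structure. */
struct toy_region {
    bool is_vmalloc;  /* stands in for "address lies in the vmalloc range" */
    bool writable;
};

/*
 * Toy stand-in for a set_memory_ro()-style call that refuses vmalloc
 * addresses and leaves their permissions unchanged.
 */
int toy_set_memory_ro(struct toy_region *r)
{
    if (r->is_vmalloc)
        return -EINVAL;
    r->writable = false;
    return 0;
}

/* Caller that drops the result: a failure stays silent. */
void toy_protect_unchecked(struct toy_region *r)
{
    (void)toy_set_memory_ro(r);
}

/* Caller that propagates the result so the failure is visible. */
int toy_protect_checked(struct toy_region *r)
{
    return toy_set_memory_ro(r);
}
```

With a region marked is_vmalloc, toy_protect_unchecked() returns
normally while the region stays writable, whereas
toy_protect_checked() reports -EINVAL. In this model, "appears to
work" and "actually changed the permissions" only coincide when the
caller checks the result.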