From: Alexander Potapenko
Date: Tue, 31 Jan 2023 17:03:48 +0100
Subject: Re: [PATCH] mm: extend max struct page size for kmsan
To: Michal Hocko
Cc: Arnd Bergmann, Andrew Morton, Alexander Duyck, "Matthew Wilcox (Oracle)",
 David Hildenbrand, "Liam R. Howlett", John Hubbard, Naoya Horiguchi,
 Hugh Dickins, Suren Baghdasaryan, Alex Sierra, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, pasha.tatashin@soleen.com
References: <20230130130739.563628-1-arnd@kernel.org>

> > Right now KMSAN allocates its metadata at boot time, when tearing down memblock.
> > At that point only a handful of memory ranges exist, and it is pretty
> > easy to carve out some unused pages for the metadata for those ranges,
> > then divide the rest evenly and return 1/3 to the system, spending 2/3
> > to keep the metadata for the returned pages.
> > I tried allocating the memory lazily (at page_alloc(), for example),
> > and it turned out to be very tricky because of fragmentation: for an
> > allocation of a given order, one needs shadow and origin allocations
> > of the same order [1], and alloc_pages() simply started with ripping
> > apart the biggest chunk of memory available.
>
> page_ext allocation happens quite early as well. There shouldn't be any
> real fragmentation that early during the boot.

Assuming we are talking about the early_page_ext_enabled() case, here are
the init functions that are executed between kmsan_init_shadow() and
page_ext_init():

	stack_depot_early_init();
	mem_init();
	mem_init_print_info();
	kmem_cache_init();
	/*
	 * page_owner must be initialized after buddy is ready, and also after
	 * slab is ready so that stack_depot_init() works properly
	 */
	page_ext_init_flatmem_late();
	kmemleak_init();
	pgtable_init();
	debug_objects_mem_init();
	vmalloc_init();

There's yet another problem besides fragmentation: we need to allocate
shadow for every page that was allocated by these functions. Right now
this is done by kmsan_init_shadow(), which walks all the existing memblock
ranges, plus the _data segment and the node data for each node, and grabs
memory from the buddy allocator. If we delay the metadata allocation to
the point where memory caches exist, we'll have to somehow walk every
allocated struct page and allocate the metadata for each of those. Is
there an easy way to do so?

I am unsure if vmalloc_init() creates any virtual mappings (probably
not?), but if it does, we'd also need to call
kmsan_vmap_pages_range_noflush() for them once we set up the metadata.
With the current metadata allocation scheme this is not needed, because
memblock is torn down before the virtual mappings are created.

Ideally we would place the KMSAN shadow/origin pages at fixed addresses,
as is done for KASAN - that would not require storing pointers in struct
page. But reserving big chunks of the address space is even harder than
what is currently being done.
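To make the fixed-address idea concrete: KASAN resolves its shadow with
plain arithmetic, (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET,
and a KMSAN equivalent would look roughly like the sketch below. The
offsets are completely made up, and because KMSAN metadata is byte-for-byte
there is no scaling shift, which is exactly why the address space cost is
so high:

	/* Hypothetical fixed offsets - not actual KMSAN code. */
	#define KMSAN_FIXED_SHADOW_OFFSET	0xffffe00000000000UL
	#define KMSAN_FIXED_ORIGIN_OFFSET	0xffffd00000000000UL

	static inline void *kmsan_fixed_shadow(const void *addr)
	{
		return (void *)((unsigned long)addr + KMSAN_FIXED_SHADOW_OFFSET);
	}

	static inline void *kmsan_fixed_origin(const void *addr)
	{
		return (void *)((unsigned long)addr + KMSAN_FIXED_ORIGIN_OFFSET);
	}

With such a layout the per-page metadata pointers in struct page would go
away entirely, but every byte of kernel memory needs two bytes of reserved
virtual space (shadow plus origin).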
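Regarding the question above about walking every allocated struct page: a
naive version would be a plain PFN loop like the sketch below, where
kmsan_alloc_meta_for_page() is a made-up helper and it is not clear that
skipping pages still sitting in the buddy free lists is actually safe at
that point:

	#include <linux/memblock.h>
	#include <linux/mm.h>
	#include <linux/page-flags.h>

	/* Naive sketch only - not actual KMSAN code. */
	static void __init kmsan_late_alloc_metadata(void)
	{
		unsigned long pfn;

		for (pfn = min_low_pfn; pfn < max_pfn; pfn++) {
			struct page *page;

			if (!pfn_valid(pfn))
				continue;
			page = pfn_to_page(pfn);
			/* Free pages can get their metadata when they are allocated. */
			if (PageBuddy(page))
				continue;
			/* Made-up helper: allocate shadow + origin for this page. */
			kmsan_alloc_meta_for_page(page);
		}
	}

If such a walk runs after other CPUs can allocate or free pages, the
PageBuddy() check becomes racy, which is part of what makes deferring the
metadata allocation non-trivial.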