2022-05-28 18:31:40

by FanJun Kong

[permalink] [raw]
Subject: [PATCH] mm/slub: replace alloc_pages with folio_alloc

From: Fanjun Kong <[email protected]>

This patch will use folio allocation functions for allocating pages.

Signed-off-by: Fanjun Kong <[email protected]>
---
mm/slub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index e5535020e0fd..00c4049a17d6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1794,9 +1794,9 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
unsigned int order = oo_order(oo);

if (node == NUMA_NO_NODE)
- folio = (struct folio *)alloc_pages(flags, order);
+ folio = (struct folio *)folio_alloc(flags, order);
else
- folio = (struct folio *)__alloc_pages_node(node, flags, order);
+ folio = (struct folio *)__folio_alloc_node(node, flags, order);

if (!folio)
return NULL;
--
2.36.0



2022-05-28 20:38:08

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: replace alloc_pages with folio_alloc

On Sun, May 29, 2022 at 12:11:58AM +0800, [email protected] wrote:
> From: Fanjun Kong <[email protected]>
>
> This patch will use folio allocation functions for allocating pages.

That's not actually a good idea. folio_alloc() will do the
prep_transhuge_page() step which isn't needed for slab.

> Signed-off-by: Fanjun Kong <[email protected]>
> ---
> mm/slub.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index e5535020e0fd..00c4049a17d6 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1794,9 +1794,9 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
> unsigned int order = oo_order(oo);
>
> if (node == NUMA_NO_NODE)
> - folio = (struct folio *)alloc_pages(flags, order);
> + folio = (struct folio *)folio_alloc(flags, order);
> else
> - folio = (struct folio *)__alloc_pages_node(node, flags, order);
> + folio = (struct folio *)__folio_alloc_node(node, flags, order);
>
> if (!folio)
> return NULL;
> --
> 2.36.0
>
>

2022-05-29 03:00:21

by Muchun Song

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: replace alloc_pages with folio_alloc

On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> On Sun, May 29, 2022 at 12:11:58AM +0800, [email protected] wrote:
> > From: Fanjun Kong <[email protected]>
> >
> > This patch will use folio allocation functions for allocating pages.
>
> That's not actually a good idea. folio_alloc() will do the
> prep_transhuge_page() step which isn't needed for slab.
>

You mean folio_alloc() is dedicated for THP allocation? It is a little
surprise to me. I thought folio_alloc() is just a variant of alloc_page(),
which returns a folio struct instead of a page. Seems like I was wrong.
May I ask what made us decide to do this?

Thanks.


2022-05-29 03:33:24

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: replace alloc_pages with folio_alloc

On Sun, May 29, 2022 at 10:58:18AM +0800, Muchun Song wrote:
> On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> > On Sun, May 29, 2022 at 12:11:58AM +0800, [email protected] wrote:
> > > From: Fanjun Kong <[email protected]>
> > >
> > > This patch will use folio allocation functions for allocating pages.
> >
> > That's not actually a good idea. folio_alloc() will do the
> > prep_transhuge_page() step which isn't needed for slab.
>
> You mean folio_alloc() is dedicated for THP allocation? It is a little
> surprise to me. I thought folio_alloc() is just a variant of alloc_page(),
> which returns a folio struct instead of a page. Seems like I was wrong.
> May I ask what made us decide to do this?

Yeah, the naming isn't great here. The problem didn't really occur
to me until I saw this patch, and I don't have a good solution yet.
We're in the middle of a transition, but the transition is likely to
take years and I don't think we necessarily have the final form of the
transition fully agreed to or understood, so we should come up with
something better for the transition.

Ignoring the naming here, memory allocated to filesystems can be split,
but the split can fail, so they need the page-deferred-list and the
DTOR. Memory allocated to slab cannot be split, so initialising the
page-deferred-list is a waste of time.

The end-goal is to split apart allocating the memory from allocating
its memory descriptor (which I like to call memdesc). So for filesystem
folios, we'd call slab to allocate a struct folio and then tell the
buddy allocator "here is the memdesc of type folio, allocate
me 2^n pages and make pfn_to_memdesc return this memdesc for each of
the 2^n pages in it".

In this end-goal, slab would also allocate a struct slab (... there's
a recursion problem here which has a solution ...), and then allocate
2^n pages. But until we're ready to shrink struct page down to one
or two words, doing this is just a waste of memory and time.

So I still don't have a good solution to receiving patches like this
other than maybe adding a comment like

/* Do not change this to allocate a folio */

which will be ignored.

2022-05-30 02:28:02

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/slub: replace alloc_pages with folio_alloc

On Sun, May 29, 2022 at 04:31:07AM +0100, Matthew Wilcox wrote:
> On Sun, May 29, 2022 at 10:58:18AM +0800, Muchun Song wrote:
> > On Sat, May 28, 2022 at 05:27:11PM +0100, Matthew Wilcox wrote:
> > > On Sun, May 29, 2022 at 12:11:58AM +0800, [email protected] wrote:
> > > > From: Fanjun Kong <[email protected]>
> > > >
> > > > This patch will use folio allocation functions for allocating pages.
> > >
> > > That's not actually a good idea. folio_alloc() will do the
> > > prep_transhuge_page() step which isn't needed for slab.
> >
> > You mean folio_alloc() is dedicated for THP allocation? It is a little
> > surprise to me. I thought folio_alloc() is just a variant of alloc_page(),
> > which returns a folio struct instead of a page. Seems like I was wrong.
> > May I ask what made us decide to do this?
>
> Yeah, the naming isn't great here. The problem didn't really occur
> to me until I saw this patch, and I don't have a good solution yet.

OK, I have an idea.

None of the sl*b allocators use the page refcount. So the
atomic operations on it are just a waste of time. If we add an
alloc_unref_page() to match our free_unref_page(), that'll be enough
difference to stop pepole sending "helpful" patches. Also, it'll be a
(small?) performance improvement for slab.

2022-05-30 13:43:21

by kernel test robot

[permalink] [raw]
Subject: [mm/slub] 4c863a2fd7: WARNING:at_mm/page_alloc.c:#__alloc_pages



Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 4c863a2fd728c505831f4d200c4d689b8b389a70 ("[PATCH] mm/slub: replace alloc_pages with folio_alloc")
url: https://github.com/intel-lab-lkp/linux/commits/bh1scw-gmail-com/mm-slub-replace-alloc_pages-with-folio_alloc/20220529-001434
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/linux-mm/[email protected]

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+---------------------------------------------+------------+------------+
| | dee992745d | 4c863a2fd7 |
+---------------------------------------------+------------+------------+
| boot_successes | 22 | 0 |
| boot_failures | 0 | 18 |
| WARNING:at_mm/page_alloc.c:#__alloc_pages | 0 | 18 |
| RIP:__alloc_pages | 0 | 18 |
| BUG:kernel_NULL_pointer_dereference,address | 0 | 18 |
| Oops:#[##] | 0 | 18 |
| RIP:__irq_domain_alloc_irqs | 0 | 18 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 18 |
+---------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 0.335509][ T0] ------------[ cut here ]------------
[ 0.336091][ T0] WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:5483 __alloc_pages+0x5e/0x1e3
[ 0.336971][ T0] Modules linked in:
[ 0.337362][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-10160-g4c863a2fd728 #1
[ 0.338213][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 0.339282][ T0] RIP: 0010:__alloc_pages+0x5e/0x1e3
[ 0.339843][ T0] Code: 41 0f ba e0 0d c7 44 24 08 01 00 00 00 f3 ab 72 20 83 fe 0a 76 24 80 3d c6 b0 93 01 00 0f 85 64 01 00 00 c6 05 b9 b0 93 01 01 <
0f> 0b e9 56 01 00 00 83 fe 0a 0f 87 4d 01 00 00 8b 3d e2 af 95 01
[ 0.341881][ T0] RSP: 0000:ffffffff82603da0 EFLAGS: 00010046
[ 0.342495][ T0] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 0.343324][ T0] RDX: 0000000000000000 RSI: 0000000000012000 RDI: ffffffff82603dd8
[ 0.344164][ T0] RBP: 0000000000000000 R08: 0000000000040000 R09: 0000000000000003
[ 0.344979][ T0] R10: 0000000000001000 R11: 0000000000000100 R12: 0000000000000040
[ 0.345793][ T0] R13: 0000000000012000 R14: 0000000000000000 R15: 0000000000000000
[ 0.346605][ T0] FS: 0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[ 0.347516][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.348204][ T0] CR2: ffff88843ffff000 CR3: 0000000002612000 CR4: 00000000000406b0
[ 0.349023][ T0] Call Trace:
[ 0.349357][ T0] <TASK>
[ 0.349651][ T0] __folio_alloc+0x15/0x32
[ 0.350105][ T0] alloc_slab_page+0x48/0x61
[ 0.350572][ T0] allocate_slab+0x5b/0x1b2
[ 0.351031][ T0] init_kmem_cache_nodes+0x4f/0x1cc
[ 0.351679][ T0] kmem_cache_open+0x128/0x18b
[ 0.352161][ T0] __kmem_cache_create+0x11/0x52
[ 0.352664][ T0] create_boot_cache+0x6c/0x96
[ 0.353150][ T0] kmem_cache_init+0x89/0x150
[ 0.353628][ T0] start_kernel+0x210/0x4ae
[ 0.354085][ T0] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.354695][ T0] </TASK>
[ 0.354999][ T0] ---[ end trace 0000000000000000 ]---



To reproduce:

# build kernel
cd linux
cp config-5.18.0-10160-g4c863a2fd728 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (4.52 kB)
config-5.18.0-10160-g4c863a2fd728 (125.50 kB)
job-script (5.03 kB)
dmesg.xz (5.97 kB)
Download all attachments