2022-02-19 18:12:42

by Baoquan He

[permalink] [raw]
Subject: [PATCH 00/22] Don't use kmalloc() with GFP_DMA

Let's replace it with other ways. This is the first step towards
removing dma-kmalloc support in kernel (Means that if everyting
is going well, we can't use kmalloc(GFP_DMA) to allocate buffer in the
future).

This series includes below changes which are easier to recognise and
make.

1) Remove GFP_DMA from dma_alloc_wc/noncoherent(), dma_pool_alloc(),
and dmam_alloc_coherent() which are redundant to specify GFP_DMA when
calling.
2) Replace kmalloc(GFP_DMA)/dma_map_xxxx() pair with dma_alloc_noncoherent().

Next, plan to investigate how we should handle places as below. We
firstly need figure out whether they really need buffer from ZONE_DMA.
If yes, how to change them with other ways. This need help from
maintainers, experts from sub-components and code contributors or anyone
knowing them well. E.g s390 and crypyto, we need guidance and help.

1) Kmalloc(GFP_DMA) in s390 platform, under arch/s390 and drivers/s390;
2) Kmalloc(GFP_DMA) in drivers/crypto;
3) Kmalloc(GFP_DMA) in network drivers under drivers/net, e.g skb
buffer requested from DMA zone.
4) Kmalloc(GFP_DMA) in device register control, e.g using regmap, devres
to read/write register, while memory from ZONE_DMA is required, e.g
i2c, spi.

For this first patch series, thanks to Hyeonggon for helping
reviewing and great suggestions on patch improving. We will work
together to continue the next steps of work.

Any comment, thought, or suggestoin is welcome and appreciated,
including but not limited to:
1) whether we should remove dma-kmalloc support in kernel();
3) why kmalloc(GFP_DMA) is needed in a certain place. why memory from
ZONE_DMA has to be requested in the case.
2) how to replace it with other ways in any place which you are familiar
with;

===========================Background information=======================
Prelusion:
Earlier, allocation failure was observed when calling kmalloc() with
GFP_DMA. It requests to allocate slab page from DMA zone while no managed
pages at all in there. Because in the current kernel, dma-kmalloc will
be created as long as CONFIG_ZONE_DMA is enabled. However, kdump kernel
of x86_64 doesn't have managed pages on DMA zone since below commit. The
details of this kdump issue can be found in reference link (a).

commit 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")

To make clear the root cause and fix, many reviewers contributed their
thoughts and suggestions in the thread of the patchset v3 (a). Finally
Hyeonggon concluded what we can do to fix the kdump issue for now as a
workaround, and further action to get rid of dma-kmalloc which is not
a reasonable existence. (Please see Hyeonggon's reply in refernce (b)).
Quote Hyeonggon's words here:
~~~~
What about one of those?:

1) Do not call warn_alloc in page allocator if will always fail
to allocate ZONE_DMA pages.

2) let's check all callers of kmalloc with GFP_DMA
if they really need GFP_DMA flag and replace those by DMA API or
just remove GFP_DMA from kmalloc()

3) Drop support for allocating DMA memory from slab allocator
(as Christoph Hellwig said) and convert them to use DMA32
and see what happens
~~~~

Then Christoph acked Hyeonggon's conclusion, and said "This is the right
thing to do, but it will take a while." (See reference link (c))


==========Reference links=======
(a) v4 post including the details of kdump issue:
https://lore.kernel.org/all/[email protected]/T/#u

(b) v3 post including many reviewers' comments:
https://lore.kernel.org/all/[email protected]/T/#u

(c) Hyeonggon's mail concluding the solution:
https://lore.kernel.org/all/20211215044818.GB1097530@odroid/T/#u

(d) Christoph acked the plan in this mail:
https://lore.kernel.org/all/[email protected]/T/#u

Baoquan He (21):
parisc: pci-dma: remove stale code and comment
gpu: ipu-v3: Don't use GFP_DMA when calling dma_alloc_coherent()
drm/sti: Don't use GFP_DMA when calling dma_alloc_wc()
sound: n64: Don't use GFP_DMA when calling dma_alloc_coherent()
fbdev: da8xx: Don't use GFP_DMA when calling dma_alloc_coherent()
fbdev: mx3fb: Don't use GFP_DMA when calling dma_alloc_wc()
usb: gadget: lpc32xx_udc: Don't use GFP_DMA when calling
dma_alloc_coherent()
usb: cdns3: Don't use GFP_DMA when calling dma_alloc_coherent()
uio: pruss: Don't use GFP_DMA when calling dma_alloc_coherent()
staging: emxx_udc: Don't use GFP_DMA when calling dma_alloc_coherent()
staging: emxx_udc: Don't use GFP_DMA when calling dma_alloc_coherent()
spi: atmel: Don't use GFP_DMA when calling dma_alloc_coherent()
spi: spi-ti-qspi: Don't use GFP_DMA when calling dma_alloc_coherent()
usb: cdns3: Don't use GFP_DMA when calling dma_pool_alloc()
usb: udc: lpc32xx: Don't use GFP_DMA when calling dma_pool_alloc()
net: marvell: prestera: Don't use GFP_DMA when calling
dma_pool_alloc()
net: ethernet: mtk-star-emac: Don't use GFP_DMA when calling
dmam_alloc_coherent()
ethernet: rocker: Use dma_alloc_noncoherent() for dma buffer
HID: intel-ish-hid: Use dma_alloc_noncoherent() for dma buffer
mmc: wbsd: Use dma_alloc_noncoherent() for dma buffer
mtd: rawnand: Use dma_alloc_noncoherent() for dma buffer

Hyeonggon Yoo (1):
net: moxa: Don't use GFP_DMA when calling dma_alloc_coherent()

arch/parisc/kernel/pci-dma.c | 8 ---
drivers/gpu/drm/sti/sti_cursor.c | 4 +-
drivers/gpu/drm/sti/sti_hqvdp.c | 2 +-
drivers/gpu/ipu-v3/ipu-image-convert.c | 2 +-
drivers/hid/intel-ish-hid/ishtp-fw-loader.c | 23 +++-----
drivers/mmc/host/wbsd.c | 45 +++-----------
drivers/mtd/nand/raw/marvell_nand.c | 55 ++++++++++-------
.../ethernet/marvell/prestera/prestera_rxtx.c | 2 +-
drivers/net/ethernet/mediatek/mtk_star_emac.c | 2 +-
drivers/net/ethernet/moxa/moxart_ether.c | 4 +-
drivers/net/ethernet/rocker/rocker_main.c | 59 ++++++++-----------
drivers/spi/spi-atmel.c | 4 +-
drivers/spi/spi-ti-qspi.c | 2 +-
drivers/staging/emxx_udc/emxx_udc.c | 2 +-
drivers/staging/media/imx/imx-media-utils.c | 2 +-
drivers/uio/uio_pruss.c | 2 +-
drivers/usb/cdns3/cdns3-gadget.c | 4 +-
drivers/usb/gadget/udc/lpc32xx_udc.c | 4 +-
drivers/video/fbdev/da8xx-fb.c | 4 +-
drivers/video/fbdev/fsl-diu-fb.c | 2 +-
drivers/video/fbdev/mx3fb.c | 2 +-
sound/mips/snd-n64.c | 2 +-
22 files changed, 97 insertions(+), 139 deletions(-)

--
2.17.2


2022-02-19 18:29:24

by Baoquan He

[permalink] [raw]
Subject: [PATCH 13/22] spi: atmel: Don't use GFP_DMA when calling dma_alloc_coherent()

dma_alloc_coherent() allocates dma buffer with device's addressing
limitation in mind. It's redundent to specify GFP_DMA when calling
dma_alloc_coherent().

[ [email protected]: Update changelog ]

Signed-off-by: Baoquan He <[email protected]>
Acked-by: Hyeonggon Yoo <[email protected]>
---
drivers/spi/spi-atmel.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-atmel.c b/drivers/spi/spi-atmel.c
index 9e300a932699..271dacf3b7d2 100644
--- a/drivers/spi/spi-atmel.c
+++ b/drivers/spi/spi-atmel.c
@@ -1516,14 +1516,14 @@ static int atmel_spi_probe(struct platform_device *pdev)
as->addr_rx_bbuf = dma_alloc_coherent(&pdev->dev,
SPI_MAX_DMA_XFER,
&as->dma_addr_rx_bbuf,
- GFP_KERNEL | GFP_DMA);
+ GFP_KERNEL);
if (!as->addr_rx_bbuf) {
as->use_dma = false;
} else {
as->addr_tx_bbuf = dma_alloc_coherent(&pdev->dev,
SPI_MAX_DMA_XFER,
&as->dma_addr_tx_bbuf,
- GFP_KERNEL | GFP_DMA);
+ GFP_KERNEL);
if (!as->addr_tx_bbuf) {
as->use_dma = false;
dma_free_coherent(&pdev->dev, SPI_MAX_DMA_XFER,
--
2.17.2

2022-02-21 14:19:21

by Heiko Carstens

[permalink] [raw]
Subject: Re: [PATCH 00/22] Don't use kmalloc() with GFP_DMA

On Sat, Feb 19, 2022 at 08:51:59AM +0800, Baoquan He wrote:
> Let's replace it with other ways. This is the first step towards
> removing dma-kmalloc support in kernel (Means that if everyting
> is going well, we can't use kmalloc(GFP_DMA) to allocate buffer in the
> future).
...
>
> Next, plan to investigate how we should handle places as below. We
> firstly need figure out whether they really need buffer from ZONE_DMA.
> If yes, how to change them with other ways. This need help from
> maintainers, experts from sub-components and code contributors or anyone
> knowing them well. E.g s390 and crypyto, we need guidance and help.
>
> 1) Kmalloc(GFP_DMA) in s390 platform, under arch/s390 and drivers/s390;

So, s390 partially requires GFP_DMA allocations for memory areas which
are required by the hardware to be below 2GB. There is not necessarily
a device associated when this is required. E.g. some legacy "diagnose"
calls require buffers to be below 2GB.

How should something like this be handled? I'd guess that the
dma_alloc API is not the right thing to use in such cases. Of course
we could say, let's waste memory and use full pages instead, however
I'm not sure this is a good idea.

s390 drivers could probably converted to dma_alloc API, even though
that would cause quite some code churn.

> For this first patch series, thanks to Hyeonggon for helping
> reviewing and great suggestions on patch improving. We will work
> together to continue the next steps of work.
>
> Any comment, thought, or suggestoin is welcome and appreciated,
> including but not limited to:
> 1) whether we should remove dma-kmalloc support in kernel();

The question is: what would this buy us? As stated above I'd assume
this comes with quite some code churn, so there should be a good
reason to do this.

From this cover letter I only get that there was a problem with kdump
on x86, and this has been fixed. So why this extra effort?

> 3) Drop support for allocating DMA memory from slab allocator
> (as Christoph Hellwig said) and convert them to use DMA32
> and see what happens

Can you please clarify what "convert to DMA32" means? I would assume
this does _not_ mean that passing GFP_DMA32 to slab allocator would
work then?

btw. there are actually two kmalloc allocations which pass GFP_DMA32;
I guess this is broken(?):

drivers/hid/intel-ish-hid/ishtp-fw-loader.c: dma_buf = kmalloc(payload_max_size, GFP_KERNEL | GFP_DMA32);
drivers/media/test-drivers/vivid/vivid-osd.c: dev->video_vbase = kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);

2022-02-22 08:51:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 00/22] Don't use kmalloc() with GFP_DMA

On Mon, Feb 21, 2022 at 02:57:34PM +0100, Heiko Carstens wrote:
> > 1) Kmalloc(GFP_DMA) in s390 platform, under arch/s390 and drivers/s390;
>
> So, s390 partially requires GFP_DMA allocations for memory areas which
> are required by the hardware to be below 2GB. There is not necessarily
> a device associated when this is required. E.g. some legacy "diagnose"
> calls require buffers to be below 2GB.
>
> How should something like this be handled? I'd guess that the
> dma_alloc API is not the right thing to use in such cases. Of course
> we could say, let's waste memory and use full pages instead, however
> I'm not sure this is a good idea.

Yeah, I don't think the DMA API is the right thing for that. This
is one of the very rare cases where a raw allocation makes sense.

That being said being able to drop kmalloc support for GFP_DMA would
be really useful. How much memory would we waste if switching to the
page allocator?

> s390 drivers could probably converted to dma_alloc API, even though
> that would cause quite some code churn.

I think that would be a very good thing to have.

> > For this first patch series, thanks to Hyeonggon for helping
> > reviewing and great suggestions on patch improving. We will work
> > together to continue the next steps of work.
> >
> > Any comment, thought, or suggestoin is welcome and appreciated,
> > including but not limited to:
> > 1) whether we should remove dma-kmalloc support in kernel();
>
> The question is: what would this buy us? As stated above I'd assume
> this comes with quite some code churn, so there should be a good
> reason to do this.

There is two steps here. One is to remove GFP_DMA support from
kmalloc, which would help to cleanup the slab allocator(s) very nicely,
as at that point it can stop to be zone aware entirely.

The long term goal is to remove ZONE_DMA entirely at least for
architectures that only use the small 16MB ISA-style one. It can
then be replaced with for example a CMA area and fall into a movable
zone. I'd have to prototype this first and see how it applies to the
s390 case. It might not be worth it and maybe we should replace
ZONE_DMA and ZONE_DMA32 with a ZONE_LIMITED for those use cases as
the amount covered tends to not be totally out of line for what we
built the zone infrastructure.

> >From this cover letter I only get that there was a problem with kdump
> on x86, and this has been fixed. So why this extra effort?
>
> > 3) Drop support for allocating DMA memory from slab allocator
> > (as Christoph Hellwig said) and convert them to use DMA32
> > and see what happens
>
> Can you please clarify what "convert to DMA32" means? I would assume
> this does _not_ mean that passing GFP_DMA32 to slab allocator would
> work then?

I'm really not sure what this means.

>
> btw. there are actually two kmalloc allocations which pass GFP_DMA32;
> I guess this is broken(?):
>
> drivers/hid/intel-ish-hid/ishtp-fw-loader.c: dma_buf = kmalloc(payload_max_size, GFP_KERNEL | GFP_DMA32);
> drivers/media/test-drivers/vivid/vivid-osd.c: dev->video_vbase = kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);

Yes, this is completely broken.

2022-02-22 13:41:47

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH 00/22] Don't use kmalloc() with GFP_DMA

On 02/22/22 at 09:12pm, Baoquan He wrote:
> On 02/22/22 at 09:44am, Christoph Hellwig wrote:
> > On Mon, Feb 21, 2022 at 02:57:34PM +0100, Heiko Carstens wrote:
> > > > 1) Kmalloc(GFP_DMA) in s390 platform, under arch/s390 and drivers/s390;
> > >
> > > So, s390 partially requires GFP_DMA allocations for memory areas which
> > > are required by the hardware to be below 2GB. There is not necessarily
> > > a device associated when this is required. E.g. some legacy "diagnose"
> > > calls require buffers to be below 2GB.
> > >
> > > How should something like this be handled? I'd guess that the
> > > dma_alloc API is not the right thing to use in such cases. Of course
> > > we could say, let's waste memory and use full pages instead, however
> > > I'm not sure this is a good idea.
> >
> > Yeah, I don't think the DMA API is the right thing for that. This
> > is one of the very rare cases where a raw allocation makes sense.
> >
> > That being said being able to drop kmalloc support for GFP_DMA would
> > be really useful. How much memory would we waste if switching to the
> > page allocator?
> >
> > > s390 drivers could probably converted to dma_alloc API, even though
> > > that would cause quite some code churn.
> >
> > I think that would be a very good thing to have.
> >
> > > > For this first patch series, thanks to Hyeonggon for helping
> > > > reviewing and great suggestions on patch improving. We will work
> > > > together to continue the next steps of work.
> > > >
> > > > Any comment, thought, or suggestoin is welcome and appreciated,
> > > > including but not limited to:
> > > > 1) whether we should remove dma-kmalloc support in kernel();
> > >
> > > The question is: what would this buy us? As stated above I'd assume
> > > this comes with quite some code churn, so there should be a good
> > > reason to do this.
> >
> > There is two steps here. One is to remove GFP_DMA support from
> > kmalloc, which would help to cleanup the slab allocator(s) very nicely,
> > as at that point it can stop to be zone aware entirely.
> >
> > The long term goal is to remove ZONE_DMA entirely at least for
> > architectures that only use the small 16MB ISA-style one. It can
> > then be replaced with for example a CMA area and fall into a movable
> > zone. I'd have to prototype this first and see how it applies to the
> > s390 case. It might not be worth it and maybe we should replace
> > ZONE_DMA and ZONE_DMA32 with a ZONE_LIMITED for those use cases as
> > the amount covered tends to not be totally out of line for what we
> > built the zone infrastructure.
> >
> > > >From this cover letter I only get that there was a problem with kdump
> > > on x86, and this has been fixed. So why this extra effort?
> > >
> > > > 3) Drop support for allocating DMA memory from slab allocator
> > > > (as Christoph Hellwig said) and convert them to use DMA32
> > > > and see what happens
> > >
> > > Can you please clarify what "convert to DMA32" means? I would assume
> > > this does _not_ mean that passing GFP_DMA32 to slab allocator would
> > > work then?
> >
> > I'm really not sure what this means.
>
> Thanks a lot to Heiko for valuable input, it's very helpful. And thanks
> a lot to Christoph for explaining.
>
> I guess this "convert to DMA32" is similar to "replace ZONE_DMA and
> ZONE_DMA32 with a ZONE_LIMITED".

And by the way, when I searched SLAB_CACHE_DMA32 which is another zone
aware slab flag, I got that not all people likes to abuse
kmalloc(GFP_DMA). There are two places where
kmem_cache_create(SLAB_CACHE_DMA32) are called to create slab grabbing
memory from zone DMA32. Obviously the code author really knows slab
allocator. They use dma32 slab to get cache memory under 4G.

drivers/firmware/google/gsmi.c : gsmi_init()
drivers/iommu/io-pgtable-arm-v7s.c: arm_v7s_alloc_pgtable()

>
> When I use 'git grep "GFP_DMA/>"' to search all places specifying GFP_DMA,
> I noticed the main usage of kmalloc(GFP_DMA) is to get memory under a
> memory limitation, but not for DMA buffer allocation. Below is what I got
> for earlier kdump issue explanation. It can help explain why kmalloc(GFP_DMA)
> is useful on ARCHes w/o ZONE_DMA32, but doesn't make sense on x86_64 which
> has both zone DMA and DMA32. The 16M ZONE_DMA is only for very rarely used
> legacy ISA device, but most pci devices driver supporting 32bit addressing
> likes to abuse kmalloc(GFP_DMA) to get DMA buffer from the zone DMA.
> That obviously is unsafe and unreasonable.
>
> Like risc-V which doesn't have the burden of legacy ISA devices, it can
> take only containing DMA32 zone way. ARM64 also adjusts to have only
> arm64 if not on Raspberry Pi. Using kmalloc(GFP_DMA) makes them no
> inconvenience. If finally having dma32-kmalloc, the name may need be
> carefully considerred, it seems to be acceptable. We just need to pick
> up those ISA device driver and handle their 24bit addressing DMA well.
>
> For this patchset, I only find out places in which GPF_DMA is
> redundant and can be removed directly, and places where
> kmalloc(GFP_DMA)|dma_map_ pair can be replaced with dma_alloc_xxxx() API
> and the memory wasting is not so big. I have patches converting
> kmalloc(GFP_DMA) to alloc_pages(GFP_DMA), but not easy to replace with
> dma_alloc_xxx(), Hyeonggon suggested not adding them to this series.
> I will continue investigating the left places, see whether or how we can
> convert them.
>
> =============================
> ARCH which has DMA32
> ZONE_DMA ZONE_DMA32
> arm64 0~X X~4G (X is got from ACPI or DT. Otherwise it's 4G by default, DMA32 is empty)
> ia64 None 0~4G
> mips 0 or 0~16M X~4G (zone DMA is empty on SGI_IP22 or SGI_IP28, otherwise 16M by default like i386)
> riscv None 0~4G
> x86_64 16M 16M~4G
>
>
> =============================
> ARCH which has no DMA32
> ZONE_DMA
> alpha 0~16M or empty if IOMMU enabled
> arm 0~X (X is reported by fdt, 4G by default)
> m68k 0~total memory
> microblaze 0~total low memory
> powerpc 0~2G
> s390 0~2G
> sparc 0~ total low memory
> i386 0~16M
>
> >
> > >
> > > btw. there are actually two kmalloc allocations which pass GFP_DMA32;
> > > I guess this is broken(?):
> > >
> > > drivers/hid/intel-ish-hid/ishtp-fw-loader.c: dma_buf = kmalloc(payload_max_size, GFP_KERNEL | GFP_DMA32);
> > > drivers/media/test-drivers/vivid/vivid-osd.c: dev->video_vbase = kzalloc(dev->video_buffer_size, GFP_KERNEL | GFP_DMA32);
> >
> > Yes, this is completely broken.
> >
>