2023-10-16 12:53:19

by Justin He

Subject: [PATCH v3 0/2] fix dma_addressing_limited() if dma_range_map

This series fixes hangs at boot. The root cause is that the NVMe device's
DMA mapping fails in the phys_to_dma() checking path, because
dma_max_mapping_size() gave the wrong answer to start with.

---
Changelog:
v1: https://lore.kernel.org/all/[email protected]/
v2: https://lore.kernel.org/lkml/[email protected]/
- refine the subject and commit msg (By Robin Murphy)
- refactor the checking loop in check_ram_in_range_map() in the pages
unit to avoid wrap to 0 on 32bits platforms (Robin)
v3:
- move and export dma_addressing_limited() to avoid adding a new
exported helper (by Christoph Hellwig)

Jia He (2):
dma-mapping: export dma_addressing_limited()
dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover
all system RAM

include/linux/dma-mapping.h | 19 ++++--------
kernel/dma/mapping.c | 60 +++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+), 14 deletions(-)

--
2.25.1


2023-10-16 12:53:29

by Justin He

Subject: [PATCH v3 2/2] dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM

There is an unusual case where the range map covers right up to the top
of system RAM but leaves a hole somewhere lower down. This prevents the
NVMe device's DMA mapping in the phys_to_dma() checking path and causes
hangs at boot.

For example, on an Armv8 Ampere server, the DSDT ACPI table contains:
Method (_DMA, 0, Serialized)  // _DMA: Direct Memory Access
{
    Name (RBUF, ResourceTemplate ()
    {
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000000000000000, // Range Minimum
            0x00000000FFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000000100000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000006010200000, // Range Minimum
            0x000000602FFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x000000001FE00000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x00000060F0000000, // Range Minimum
            0x00000060FFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000000010000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
        QWordMemory (ResourceConsumer, PosDecode, MinFixed,
            MaxFixed, Cacheable, ReadWrite,
            0x0000000000000000, // Granularity
            0x0000007000000000, // Range Minimum
            0x000003FFFFFFFFFF, // Range Maximum
            0x0000000000000000, // Translation Offset
            0x0000039000000000, // Length
            ,, , AddressRangeMemory, TypeStatic)
    })
}

But the System RAM ranges are:
cat /proc/iomem | grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some System RAM ranges fall outside the dma_range_map.

Fix it by checking whether each of the system RAM resources can be
properly encompassed within the dma_range_map.

Signed-off-by: Jia He <[email protected]>
---
kernel/dma/mapping.c | 49 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 5bfe782f9a7f..9889d1d25a7f 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -7,6 +7,7 @@
  */
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/acpi.h>
+#include <linux/dma-direct.h> /* for bus_dma_region */
 #include <linux/dma-map-ops.h>
 #include <linux/export.h>
 #include <linux/gfp.h>
@@ -793,6 +794,47 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_coherent_mask);

+/*
+ * Check whether all RAM resource ranges are covered by the dma_range_map.
+ * Returns 0 when checking should continue, 1 if some RAM range can't be
+ * covered by dma_range_map.
+ */
+static int check_ram_in_range_map(unsigned long start_pfn,
+                                  unsigned long nr_pages, void *data)
+{
+        unsigned long end_pfn = start_pfn + nr_pages;
+        struct device *dev = (struct device *)data;
+        struct bus_dma_region *bdr = NULL;
+        const struct bus_dma_region *m;
+
+        while (start_pfn < end_pfn) {
+                bdr = NULL;
+                for (m = dev->dma_range_map; PFN_DOWN(m->size); m++) {
+                        unsigned long cpu_start_pfn = PFN_DOWN(m->cpu_start);
+
+                        if (start_pfn >= cpu_start_pfn
+                            && start_pfn - cpu_start_pfn < PFN_DOWN(m->size)) {
+                                bdr = (struct bus_dma_region *)m;
+                                break;
+                        }
+                }
+                if (!bdr)
+                        return 1;
+
+                start_pfn = PFN_DOWN(bdr->cpu_start) + PFN_DOWN(bdr->size);
+        }
+
+        return 0;
+}
+
+static bool all_ram_in_dma_range_map(struct device *dev)
+{
+        if (!dev->dma_range_map)
+                return true;
+
+        return !walk_system_ram_range(0, PFN_DOWN(ULONG_MAX) + 1, dev,
+                                      check_ram_in_range_map);
+}
+
 /**
  * dma_addressing_limited - return if the device is addressing limited
  * @dev: device to check
@@ -803,8 +845,11 @@ EXPORT_SYMBOL(dma_set_coherent_mask);
  */
 bool dma_addressing_limited(struct device *dev)
 {
-        return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
-                        dma_get_required_mask(dev);
+        if (min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
+            dma_get_required_mask(dev))
+                return true;
+
+        return !all_ram_in_dma_range_map(dev);
 }
 EXPORT_SYMBOL(dma_addressing_limited);

--
2.25.1

2023-10-16 12:53:41

by Justin He

Subject: [PATCH v3 1/2] dma-mapping: export dma_addressing_limited()

This is a preparatory patch to move dma_addressing_limited so that it is
exported instead of a new low-level helper.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jia He <[email protected]>
---
include/linux/dma-mapping.h | 19 +++++--------------
kernel/dma/mapping.c | 15 +++++++++++++++
2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f0ccca16a0ac..4a658de44ee9 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -144,6 +144,7 @@ bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
+bool dma_addressing_limited(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
 size_t dma_opt_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
@@ -264,6 +265,10 @@ static inline u64 dma_get_required_mask(struct device *dev)
 {
         return 0;
 }
+static inline bool dma_addressing_limited(struct device *dev)
+{
+        return false;
+}
 static inline size_t dma_max_mapping_size(struct device *dev)
 {
         return 0;
@@ -465,20 +470,6 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
         return dma_set_mask_and_coherent(dev, mask);
 }
 
-/**
- * dma_addressing_limited - return if the device is addressing limited
- * @dev: device to check
- *
- * Return %true if the devices DMA mask is too small to address all memory in
- * the system, else %false. Lack of addressing bits is the prime reason for
- * bounce buffering, but might not be the only one.
- */
-static inline bool dma_addressing_limited(struct device *dev)
-{
-        return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
-                dma_get_required_mask(dev);
-}
-
 static inline unsigned int dma_get_max_seg_size(struct device *dev)
 {
         if (dev->dma_parms && dev->dma_parms->max_segment_size)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e323ca48f7f2..5bfe782f9a7f 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -793,6 +793,21 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_coherent_mask);
 
+/**
+ * dma_addressing_limited - return if the device is addressing limited
+ * @dev: device to check
+ *
+ * Return %true if the devices DMA mask is too small to address all memory in
+ * the system, else %false. Lack of addressing bits is the prime reason for
+ * bounce buffering, but might not be the only one.
+ */
+bool dma_addressing_limited(struct device *dev)
+{
+        return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
+                dma_get_required_mask(dev);
+}
+EXPORT_SYMBOL(dma_addressing_limited);
+
 size_t dma_max_mapping_size(struct device *dev)
 {
         const struct dma_map_ops *ops = get_dma_ops(dev);
--
2.25.1

2023-10-23 06:09:03

by Christoph Hellwig

Subject: Re: [PATCH v3 2/2] dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM

> + */
> +static int check_ram_in_range_map(unsigned long start_pfn,
> +                                  unsigned long nr_pages, void *data)
> +{
> +        unsigned long end_pfn = start_pfn + nr_pages;
> +        struct device *dev = (struct device *)data;

No need for the cast here.

> +        struct bus_dma_region *bdr = NULL;
> +        const struct bus_dma_region *m;
> +
> +        while (start_pfn < end_pfn) {
> +                for (m = dev->dma_range_map; PFN_DOWN(m->size); m++) {
> +                        unsigned long cpu_start_pfn = PFN_DOWN(m->cpu_start);
> +
> +                        if (start_pfn >= cpu_start_pfn
> +                            && start_pfn - cpu_start_pfn < PFN_DOWN(m->size)) {

Linux coding style keeps the && on the previous line.

> +                                bdr = (struct bus_dma_region *)m;

If you also declared bdr as const this should be able to do away with
the cast.

> bool dma_addressing_limited(struct device *dev)
> {
> -        return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
> -                dma_get_required_mask(dev);
> +        if (min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
> +            dma_get_required_mask(dev))
> +                return true;
> +
> +        return !all_ram_in_dma_range_map(dev);

So, all the dma range map thing is really a dma-direct concept. So I
think here in dma_addressing_limited we should just do a:

        if (likely(!ops))
                return !dma_direct_all_ram_mapped(dev);
        return false;

with dma_direct_all_ram_mapped move to dma-direct.c.

2023-10-23 06:09:40

by Christoph Hellwig

Subject: Re: [PATCH v3 1/2] dma-mapping: export dma_addressing_limited()

On Mon, Oct 16, 2023 at 12:52:53PM +0000, Jia He wrote:
> This is a preparatory patch to move dma_addressing_limited so that it is
> exported instead of a new low-level helper.

So while this exports dma_addressing_limited that's really a side effect.
The primary thing is that it moves the function out of line.

2023-10-25 15:12:03

by Justin He

Subject: RE: [PATCH v3 1/2] dma-mapping: export dma_addressing_limited()



> -----Original Message-----
> From: Christoph Hellwig <[email protected]>
> Sent: Monday, October 23, 2023 2:09 PM
> To: Justin He <[email protected]>
> Cc: Christoph Hellwig <[email protected]>; Marek Szyprowski
> <[email protected]>; Robin Murphy <[email protected]>;
> [email protected]; [email protected]; nd <[email protected]>
> Subject: Re: [PATCH v3 1/2] dma-mapping: export dma_addressing_limited()
>
> On Mon, Oct 16, 2023 at 12:52:53PM +0000, Jia He wrote:
> > This is a preparatory patch to move dma_addressing_limited so that it
> > is exported instead of a new low-level helper.
>
> So while this exports dma_addressing_limited that's really a side effect.
> The primary thing is that it moves the function out of line.
Ok, I will update the commit msg.
Thanks for the comments!


--
Cheers,
Justin (Jia He)