2021-03-11 07:50:30

by Chen Jun

[permalink] [raw]
Subject: [question] Panic in dax_writeback_one

Hi

I write a driver to simulate memory as a block device (like a ramdisk).
and I hope the memory used would not be observed by kernel, becasue of
the struct page will take many memory.

When I mount ext2 with dax on my ramdisk. Panic will happen when fsync.
Call trace$B!'(B
dax_writeback_one+0x330/0x3e4
dax_writeback_mapping_range+0x15c/0x318
ext2_dax_writepages+0x38/0x44
.....

static int dax_writeback_one(struct xa_state *xas, struct dax_device
*dax_dev, struct address_space *mapping, void *entry)
----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
The pfn is returned by the driver. In my case, the pfn does not have
struct page. so pfn_to_page(pfn) return a wrong address.

I noticed the following changes may related to my problem:
1. Before 3fe0791c295cfd3cd735de7a32cc0780949c009f (dax: store pfns in
the radix), the radix tree stroes sectors. address which would be
flushed is got from driver by passing sector.
And the commit assume that all pfn have struct page.

2. CONFIG_FS_DAX_LIMITED $B!J(BSelected by DCSSBLK [=n] && BLK_DEV [=y] &&
S390 && BLOCK [=y]) is added to avoid access struct page of pfn.

Does anyone have any idea about my problem.

--
Regards
Chen Jun


2021-03-11 12:21:53

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [question] Panic in dax_writeback_one

On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
> static int dax_writeback_one(struct xa_state *xas, struct dax_device
> *dax_dev, struct address_space *mapping, void *entry)
> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> The pfn is returned by the driver. In my case, the pfn does not have
> struct page. so pfn_to_page(pfn) return a wrong address.

I wasn't involved, but I think the right solution here is simply to
replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn). I don't
know why Dan decided to do this in the more complicated way.

2021-03-11 17:29:38

by Dan Williams

[permalink] [raw]
Subject: Re: [question] Panic in dax_writeback_one

On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <[email protected]> wrote:
>
> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
> > static int dax_writeback_one(struct xa_state *xas, struct dax_device
> > *dax_dev, struct address_space *mapping, void *entry)
> > ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> > The pfn is returned by the driver. In my case, the pfn does not have
> > struct page. so pfn_to_page(pfn) return a wrong address.
>
> I wasn't involved, but I think the right solution here is simply to
> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn). I don't
> know why Dan decided to do this in the more complicated way.

pfn_to_virt() only works for the direct-map. If pages are not mapped I
don't see how pfn_to_virt() is expected to work.

The real question Chenjun is why are you writing a new simulator of
memory as a block-device vs reusing the pmem driver or brd?

2021-03-12 02:43:10

by Chen Jun

[permalink] [raw]
Subject: Re: [question] Panic in dax_writeback_one

$B:_(B 2021/3/11 20:20, Matthew Wilcox $B<LF;(B:
> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>> *dax_dev, struct address_space *mapping, void *entry)
>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>> The pfn is returned by the driver. In my case, the pfn does not have
>> struct page. so pfn_to_page(pfn) return a wrong address.
>
> I wasn't involved, but I think the right solution here is simply to
> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn). I don't
> know why Dan decided to do this in the more complicated way.
>

Thanks Mattherw

I think pfn_to_virt could not work in my case. Because of the pfn is not
in memory block.

--
Regards
Chen Jun

2021-03-17 03:01:40

by Chen Jun

[permalink] [raw]
Subject: Re: [question] Panic in dax_writeback_one

$B:_(B 2021/3/12 1:25, Dan Williams $B<LF;(B:
> On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <[email protected]> wrote:
>>
>> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
>>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>>> *dax_dev, struct address_space *mapping, void *entry)
>>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>>> The pfn is returned by the driver. In my case, the pfn does not have
>>> struct page. so pfn_to_page(pfn) return a wrong address.
>>
>> I wasn't involved, but I think the right solution here is simply to
>> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn). I don't
>> know why Dan decided to do this in the more complicated way.
>
> pfn_to_virt() only works for the direct-map. If pages are not mapped I
> don't see how pfn_to_virt() is expected to work.
>
> The real question Chenjun is why are you writing a new simulator of
> memory as a block-device vs reusing the pmem driver or brd?
>

Hi Dan

In my case, I do not want to take memory to create the struct page of
the memory my driver used.

And, I think this is also a problem for DCSSBLK.

So I want to go back the older way if CONFIG_FS_DAX_LIMITED

diff --git a/fs/dax.c b/fs/dax.c
index b3d27fd..6395e84 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas,
struct dax_device *dax_dev,
{
unsigned long pfn, index, count;
long ret = 0;
+ void *kaddr;
+ pfn_t new_pfn_t;
+ pgoff_t pgoff;

/*
* A page got tagged dirty in DAX mapping? Something is seriously
@@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas,
struct dax_device *dax_dev,
index = xas->xa_index & ~(count - 1);

dax_entry_mkclean(mapping, index, pfn);
- dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
+
+ if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
+ kaddr = page_address(pfn_to_page(pfn));
+ else {
+ ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn <<
PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);
+ if (ret)
+ goto put_unlocked;
+
+ ret = dax_direct_access(dax_dev, pgoff, count, &kaddr, &new_pfn_t);
+ if (ret < 0)
+ goto put_unlocked;
+
+ if (WARN_ON_ONCE(ret < count) || WARN_ON_ONCE(pfn_t_to_pfn(new_pfn_t)
!= pfn)) {
+ ret = -EIO;
+ goto put_unlocked;
+ }
+ }
+
+ dax_flush(dax_dev, kaddr, count * PAGE_SIZE);
/*
* After we have flushed the cache, we can clear the dirty tag. There
* cannot be new dirty data in the pfn after the flush has completed as
--

--
Regards
Chen Jun

2021-03-17 04:58:43

by Dan Williams

[permalink] [raw]
Subject: Re: [question] Panic in dax_writeback_one

On Tue, Mar 16, 2021 at 8:00 PM chenjun (AM) <[email protected]> wrote:
>
> 在 2021/3/12 1:25, Dan Williams 写道:
> > On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <[email protected]> wrote:
> >>
> >> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
> >>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
> >>> *dax_dev, struct address_space *mapping, void *entry)
> >>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> >>> The pfn is returned by the driver. In my case, the pfn does not have
> >>> struct page. so pfn_to_page(pfn) return a wrong address.
> >>
> >> I wasn't involved, but I think the right solution here is simply to
> >> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn). I don't
> >> know why Dan decided to do this in the more complicated way.
> >
> > pfn_to_virt() only works for the direct-map. If pages are not mapped I
> > don't see how pfn_to_virt() is expected to work.
> >
> > The real question Chenjun is why are you writing a new simulator of
> > memory as a block-device vs reusing the pmem driver or brd?
> >
>
> Hi Dan
>
> In my case, I do not want to take memory to create the struct page of
> the memory my driver used.

There are efforts happening to drastically reduce that overhead. You
might want to check out Joao's work [1]. I think that direction holds
more promise than trying to extend FS_DAX_LIMITED.

[1]: http://lore.kernel.org/r/[email protected]

> And, I think this is also a problem for DCSSBLK.

If I understand correctly DAX replaced XIP for S390. There have not
been reports about this problem, and I can only guess because XIP
(eXecute-In-Place) is a read-only use case where dax_writeback_one()
is never triggered, or S390 just isn't using DCSSBLK anymore. The last
time I touched FS_DAX_LIMITED the DCSSBLK maintainers offered to just
delete this driver to get it out of the way.

>
> So I want to go back the older way if CONFIG_FS_DAX_LIMITED
>
> diff --git a/fs/dax.c b/fs/dax.c
> index b3d27fd..6395e84 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas,
> struct dax_device *dax_dev,
> {
> unsigned long pfn, index, count;
> long ret = 0;
> + void *kaddr;
> + pfn_t new_pfn_t;
> + pgoff_t pgoff;
>
> /*
> * A page got tagged dirty in DAX mapping? Something is seriously
> @@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas,
> struct dax_device *dax_dev,
> index = xas->xa_index & ~(count - 1);
>
> dax_entry_mkclean(mapping, index, pfn);
> - dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
> +
> + if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
> + kaddr = page_address(pfn_to_page(pfn));
> + else {
> + ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn <<
> PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);

This is broken:

mapping->host->i_sb->s_bdev

...there is no guarantee that the superblock associated with the
mapping is hosted on the same block device associated with the passed
in dax_device. See dax_rtdev in xfs_open_devices().

2021-03-17 06:49:50

by Chen Jun

[permalink] [raw]
Subject: Re: [question] Panic in dax_writeback_one

Thanks for the advices. I will check out that.

$B:_(B 2021/3/17 12:55, Dan Williams $B<LF;(B:
> On Tue, Mar 16, 2021 at 8:00 PM chenjun (AM) <[email protected]> wrote:
>>
>> $B:_(B 2021/3/12 1:25, Dan Williams $B<LF;(B:
>>> On Thu, Mar 11, 2021 at 4:20 AM Matthew Wilcox <[email protected]> wrote:
>>>>
>>>> On Thu, Mar 11, 2021 at 07:48:25AM +0000, chenjun (AM) wrote:
>>>>> static int dax_writeback_one(struct xa_state *xas, struct dax_device
>>>>> *dax_dev, struct address_space *mapping, void *entry)
>>>>> ----dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>>>>> The pfn is returned by the driver. In my case, the pfn does not have
>>>>> struct page. so pfn_to_page(pfn) return a wrong address.
>>>>
>>>> I wasn't involved, but I think the right solution here is simply to
>>>> replace page_address(pfn_to_page(pfn)) with pfn_to_virt(pfn). I don't
>>>> know why Dan decided to do this in the more complicated way.
>>>
>>> pfn_to_virt() only works for the direct-map. If pages are not mapped I
>>> don't see how pfn_to_virt() is expected to work.
>>>
>>> The real question Chenjun is why are you writing a new simulator of
>>> memory as a block-device vs reusing the pmem driver or brd?
>>>
>>
>> Hi Dan
>>
>> In my case, I do not want to take memory to create the struct page of
>> the memory my driver used.
>
> There are efforts happening to drastically reduce that overhead. You
> might want to check out Joao's work [1]. I think that direction holds
> more promise than trying to extend FS_DAX_LIMITED.
>
> [1]: http://lore.kernel.org/r/[email protected]
>
>> And, I think this is also a problem for DCSSBLK.
>
> If I understand correctly DAX replaced XIP for S390. There have not
> been reports about this problem, and I can only guess because XIP
> (eXecute-In-Place) is a read-only use case where dax_writeback_one()
> is never triggered, or S390 just isn't using DCSSBLK anymore. The last
> time I touched FS_DAX_LIMITED the DCSSBLK maintainers offered to just
> delete this driver to get it out of the way.
>
>>
>> So I want to go back the older way if CONFIG_FS_DAX_LIMITED
>>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index b3d27fd..6395e84 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -867,6 +867,9 @@ static int dax_writeback_one(struct xa_state *xas,
>> struct dax_device *dax_dev,
>> {
>> unsigned long pfn, index, count;
>> long ret = 0;
>> + void *kaddr;
>> + pfn_t new_pfn_t;
>> + pgoff_t pgoff;
>>
>> /*
>> * A page got tagged dirty in DAX mapping? Something is seriously
>> @@ -926,7 +929,25 @@ static int dax_writeback_one(struct xa_state *xas,
>> struct dax_device *dax_dev,
>> index = xas->xa_index & ~(count - 1);
>>
>> dax_entry_mkclean(mapping, index, pfn);
>> - dax_flush(dax_dev, page_address(pfn_to_page(pfn)), count * PAGE_SIZE);
>> +
>> + if (!IS_ENABLED(CONFIG_FS_DAX_LIMITED) || pfn_valid(pfn))
>> + kaddr = page_address(pfn_to_page(pfn));
>> + else {
>> + ret = bdev_dax_pgoff(mapping->host->i_sb->s_bdev, pfn <<
>> PFN_SECTION_SHIFT, count << PAGE_SHIFT, &pgoff);
>
> This is broken:
>
> mapping->host->i_sb->s_bdev
>
> ...there is no guarantee that the superblock associated with the
> mapping is hosted on the same block device associated with the passed
> in dax_device. See dax_rtdev in xfs_open_devices().
>


--
Regards
Chen Jun