LinuxLists.cc - Re: still nfs problems [Was: Linux 2.6.37-rc8]

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

Hi Trond,

On Thu, Dec 30, 2010 at 12:59:52PM -0500, Trond Myklebust wrote:
> On Thu, 2010-12-30 at 18:14 +0100, Uwe Kleine-König wrote:
> > I wonder if the nfs-stuff is considered to be solved, because I still
> > see strange things.
> >
> > During boot my machine sometimes (approx one out of two times) hangs with
> > the output pasted below on SysRq-l. The irq
> >
> > I'm not 100% sure it's related, but at least it seems to hang in
> > nfs_readdir. (When the serial irq happened that triggered the sysrq the
> > program counter was at 0xc014601c, which is fs/nfs/dir.c:647 for me.)
> >
> > This is on 2.6.37-rc8 plus some patches for machine support on an ARM
> > machine.
>
> Ccing [email protected]
Yeah, good idea. I thought of that ~2 min after sending my report, during
dinner, sorry :-\

> What filesystem are you exporting on the server? What is the NFS
> version? Is this nfsroot, autofs or an ordinary nfs mount?
This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a
directory on a different partition. I don't know the filesystem of my
homedir as it resides on a server I have no access to, but I asked the
admin, so I can follow up with this info later (I'd suspect ext3, too).
The real root directory is on ext3 (rw,noatime).

The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.
nfs-related kernel parameters are

ip=dhcp root=/dev/nfs nfsroot=192.168.23.2:/home/ukl/nfsroot/tx28,v3,tcp

I hope this answers your questions. If not, please ask.

I tried without the symlink and saw some different errors, e.g.

starting splashutils daemon.../etc/rc.d/S00splashutils: line 50: //sbin/fbsplashd.static: Unknown error 521

(this is the init script that hung before) and

[ 6.160000] NFS: server 192.168.23.2 error: fileid changed
[ 6.160000] fsid 0:c: expected fileid 0x33590a4, got 0x4d11bedc

but no hang as before. So maybe it's related to the symlink? I don't
know if testing that further would help or just be a waste of my time, so
please let me know if I can help you and how.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-01-05 15:14:19

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, 2011-01-05 at 16:01 +0100, Marc Kleine-Budde wrote:
> On 01/05/2011 03:53 PM, Trond Myklebust wrote:
> > On Wed, 2011-01-05 at 14:40 +0100, Uwe Kleine-König wrote:
> >> Hi Russell,
> >>
> >> On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
> >>> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-König wrote:
> >>>> Hello Trond,
> >>>>
> >>>> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
> >>>>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> >>>>>> The question is whether this is something happening on the server or the
> >>>>>> client. Does an older client kernel boot without any trouble?
> >>>>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> >>>>> compare with. If you don't consider .36 to be old enough let me know.
> >>>>> Once the setup is done it should be easy to test .35 (say), too.
> >>>>>
> >>>> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> >>>> problems didn't occur. This was more reliable to trigger and he was so
> >>>> kind to bisect the problem.
> >>>>
> >>>> When testing v2.6.36-rc3-51-gafa8ccc init hung.
> >>>> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> >>>> this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> >>>> "init hangs" problem the "fileid changed on tab" problem.
> >>>>
> >>>> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> >>>> but not on armv6 (imx35).
> >>>
> >>> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> >>> truncation of the fileid. It occurred several times on successive
> >>> reboots, so I tried to capture a tcpdump trace off the server (Linux
> >>> 2.6.23-rc8-ga64314e6 - it's ancient because I've had issues with buggy
> >>> IDE drivers trying to move it forward.) However, for the last couple
> >>> of weeks I've been unable to reproduce it.
> >>>
> >>> The client was based on 2.6.37-rc6.
> >>>
> >>> The "fileid changed" messages popped up after mounting an export with
> >>> 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> >>> and 'ls' in a few subdirectories - and entries were missing from the
> >>> directory lists without 'ls' reporting any errors (which I think is bad
> >>> behaviour in itself.)
> >> There was a bug in at least -rc5[1] that was considered already fixed in
> >> -rc4[2]. The later announcements didn't mention it anymore.
> >>
> >>> I don't know why it's stopped producing the errors, although once it
> >>> went I never investigated it any further (was far too busy trying to
> >>> get AMBA DMA support working.)
> >> It seems it was fixed for most users though. Trond?
> >
> > As I said, I can't reproduce it.
> >
> > I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> > x86, or does it appear to be architecture-specific?
>
> It _seems_ to be ARMv5 specific[1]. Uwe did some tests and figured out
> that disabling dcache on ARMv5 "fixes" the problem, but
> CONFIG_CPU_DCACHE_WRITETHROUGH isn't enough.
>
> [1] Uwe fails to reproduce it on ARMv6. The ARMv6 has an L2 cache and
> uses IIRC different instructions to flush the L1 caches. (please correct
> me, if I'm wrong, ARM guys :)
>
> cheers, Marc

OK. So, the new behaviour in 2.6.37 is that we're writing to a series of
pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
interfaces, but we can end up reading them via a virtual address range
that gets set up via vm_map_ram() (that range gets set up before the
write occurs).

Do we perhaps need an invalidate_kernel_vmap_range() before we can read
the data on ARM in this kind of scenario?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-05 19:07:48

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 01:55:05PM -0500, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 18:27 +0000, Russell King - ARM Linux wrote:
> > I do still think you need _something_ there, otherwise data can remain
> > in the direct map alias and not be visible via the vmap alias. I don't
> > see that we have anything in place to handle this at present though.
>
> Is that perhaps what flush_kernel_dcache_page() is supposed to do?

Well, given how we have things currently set up on ARM, this ends up
being a no-op - as new page cache pages are marked dirty and their
flushing done at the point when they're mapped into userspace.

I guess we could do the flushing there and mark the page clean, but
it'd need some careful examination of various code paths to confirm
that it's safe - we may be avoiding this because some ARM arch
versions need to manually IPI cache flushes to other cores (which
can only be done with IRQs enabled.)

So, I don't think it'll do at the present time.

2011-01-05 14:42:36

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On 01/05/2011 03:29 PM, Jim Rees wrote:
> Uwe Kleine-König wrote:
>
> > The "fileid changed" messages popped up after mounting an export with
> > 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> > and 'ls' in a few subdirectories - and entries were missing from the
> > directory lists without 'ls' reporting any errors (which I think is bad
> > behaviour in itself.)
> There was a bug in at least -rc5[1] that was considered already fixed in
> -rc4[2]. The later announcements didn't mention it anymore.
>
> > I don't know why it's stopped producing the errors, although once it
> > went I never investigated it any further (was far too busy trying to
> > get AMBA DMA support working.)
> It seems it was fixed for most users though. Trond?
>
> Trond sent a fix to the nfs list on 27 Nov for "fileid changed" but I don't
> know if this is the same bug you're seeing. The patch was to
> nfs_same_file() and I can send it if you want. As far as I know the patch
> made it upstream.

Are you sure it's in .37?

The pick-axe just found one commit so far
(although it's still searching):

$ git log -Snfs_same_file
commit d39ab9de3b80da5835049b1c3b49da4e84e01c07
Author: Bryan Schumaker <[email protected]>
Date: Fri Sep 24 18:50:01 2010 -0400

NFS: re-add readdir plus

This patch adds readdir plus support to the cache array.

Signed-off-by: Bryan Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

Would you please be so kind and send the patch to this thread?

cheers, Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |

2011-01-05 14:53:15

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, 2011-01-05 at 14:40 +0100, Uwe Kleine-König wrote:
> Hi Russell,
>
> On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-König wrote:
> > > Hello Trond,
> > >
> > > On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
> > > > On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > > > > The question is whether this is something happening on the server or the
> > > > > client. Does an older client kernel boot without any trouble?
> > > > I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> > > > compare with. If you don't consider .36 to be old enough let me know.
> > > > Once the setup is done it should be easy to test .35 (say), too.
> > > >
> > > Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> > > problems didn't occur. This was more reliable to trigger and he was so
> > > kind to bisect the problem.
> > >
> > > When testing v2.6.36-rc3-51-gafa8ccc init hung.
> > > (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> > > this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> > > "init hangs" problem the "fileid changed on tab" problem.
> > >
> > > I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> > > but not on armv6 (imx35).
> >
> > FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> > truncation of the fileid. It occurred several times on successive
> > reboots, so I tried to capture a tcpdump trace off the server (Linux
> > 2.6.23-rc8-ga64314e6 - it's ancient because I've had issues with buggy
> > IDE drivers trying to move it forward.) However, for the last couple
> > of weeks I've been unable to reproduce it.
> >
> > The client was based on 2.6.37-rc6.
> >
> > The "fileid changed" messages popped up after mounting an export with
> > 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> > and 'ls' in a few subdirectories - and entries were missing from the
> > directory lists without 'ls' reporting any errors (which I think is bad
> > behaviour in itself.)
> There was a bug in at least -rc5[1] that was considered already fixed in
> -rc4[2]. The later announcements didn't mention it anymore.
>
> > I don't know why it's stopped producing the errors, although once it
> > went I never investigated it any further (was far too busy trying to
> > get AMBA DMA support working.)
> It seems it was fixed for most users though. Trond?

As I said, I can't reproduce it.

I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
x86, or does it appear to be architecture-specific?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-03 21:38:59

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Thu, Dec 30, 2010 at 08:18:46PM +0100, Uwe Kleine-König wrote:
> On Thu, Dec 30, 2010 at 12:59:52PM -0500, Trond Myklebust wrote:
> > What filesystem are you exporting on the server? What is the NFS
> > version? Is this nfsroot, autofs or an ordinary nfs mount?
> This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a
> directory on a different partition. I don't know the filesystem of my
> homedir as it resides on a server I have no access to, but I asked the
> admin, so I can follow up with this info later (I'd suspect ext3, too).
Yes, it is ext3.

> The real root directory is on ext3 (rw,noatime).
>
> The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.
If that matters, kernel is linux-image-2.6.32-5-amd64 (2.6.32-29)
provided by Debian.

> I don't
> know if testing that further would help or just be a waste of my time, so
> please let me know if I can help you and how.
This still applies

Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-01-05 08:40:22

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

Hello Trond,

On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> The question is whether this is something happening on the server or the
> client. Does an older client kernel boot without any trouble?
I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
compare with. If you don't consider .36 to be old enough let me know.
Once the setup is done it should be easy to test .35 (say), too.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-01-05 14:29:07

by Jim Rees

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

Uwe Kleine-König wrote:

> The "fileid changed" messages popped up after mounting an export with
> 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
> and 'ls' in a few subdirectories - and entries were missing from the
> directory lists without 'ls' reporting any errors (which I think is bad
> behaviour in itself.)
There was a bug in at least -rc5[1] that was considered already fixed in
-rc4[2]. The later announcements didn't mention it anymore.

> I don't know why it's stopped producing the errors, although once it
> went I never investigated it any further (was far too busy trying to
> get AMBA DMA support working.)
It seems it was fixed for most users though. Trond?

Trond sent a fix to the nfs list on 27 Nov for "fileid changed" but I don't
know if this is the same bug you're seeing. The patch was to
nfs_same_file() and I can send it if you want. As far as I know the patch
made it upstream.

2011-01-05 17:17:30

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, 2011-01-05 at 15:52 +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 10:14:17AM -0500, Trond Myklebust wrote:
> > OK. So, the new behaviour in 2.6.37 is that we're writing to a series of
> > pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> > interfaces, but we can end up reading them via a virtual address range
> > that gets set up via vm_map_ram() (that range gets set up before the
> > write occurs).
>
> kmap of lowmem pages will always reuse the existing kernel direct
> mapping, so there won't be a problem there.
>
> > Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> > the data on ARM in this kind of scenario?
>
> Firstly, vm_map_ram() does no cache maintenance of any sort, nor does
> it take care of page colouring - so any architecture where cache aliasing
> can occur will see this problem. It should not be limited to ARM.
>
> Secondly, no, invalidate_kernel_vmap_range() probably isn't sufficient.
> There are two problems here:
>
>     addr = kmap(lowmem_page);
>     *addr = stuff;
>     kunmap(lowmem_page);
>
> Such lowmem pages are accessed through their kernel direct mapping.
>
>     ptr = vm_map_ram(lowmem_page);
>     read = *ptr;
>
> This creates a new mapping which can alias with the kernel direct mapping.
> Now, as this is a new mapping, there should be no cache lines associated
> with it. (Looking at vm_unmap_ram(), it calls free_unmap_vmap_area_addr(),
> free_unmap_vmap_area(), which then calls flush_cache_vunmap() on the
> region. vb_free() also calls flush_cache_vunmap().)
>
> If the write after kmap() hits an already present cache line, the cache
> line will be updated, but it won't be written back to memory. So, on
> a subsequent vm_map_ram(), with any kind of aliasing cache, there's
> no guarantee that you'll hit that cache line and read the data just
> written there.
>
> The kernel direct mapping would need to be flushed.

We should already be flushing the kernel direct mapping after writing by
means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
and all the helpers in net/sunrpc/xdr.c.

The only new thing is the read access through the virtual address
mapping. That mapping is created outside the loop in
nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
invalidate_kernel_vmap_range(): we're essentially doing a series of
writes through the kernel direct mapping (i.e. readdir RPC calls), then
reading the results through the virtual mapping.

i.e. we're doing

    ptr = vm_map_ram(lowmem_pages);
    while (need_more_data) {

        for (i = 0; i < npages; i++) {
            addr = kmap_atomic(lowmem_page[i]);
            *addr = rpc_stuff;
            flush_dcache_page(lowmem_page[i]);
            kunmap_atomic(lowmem_page[i]);
        }

        invalidate_kernel_vmap_range(ptr); // Needed here?

        read = *ptr;
    }
    vm_unmap_ram(lowmem_pages)

> I'm really getting to the point of hating the proliferation of RAM
> remapping interfaces - it's going to (and is) causing nothing but lots
> of pain on virtual cache architectures, needing more and more cache
> flushing interfaces to be created.
>
> Is there any other solution to this?

Arbitrary sized pages. :-)

The problem here is that we want to read variable sized records (i.e.
readdir() records) from a multi-page buffer. We could do that by copying
those particular records that overlap with page boundaries, but that
would make for a fairly intrusive rewrite too.
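
The copying alternative mentioned above can be sketched in userspace C. This is only an illustration of the idea, not the NFS client code: the helper name copy_from_pages() and the tiny TOY_PAGE_SIZE are made up for the example, and the "pages" are plain byte arrays rather than struct page.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8  /* deliberately tiny so the example is easy to follow */

/*
 * Hypothetical helper: copy 'len' bytes starting at logical offset 'off'
 * out of a buffer made of separately allocated pages into the contiguous
 * buffer 'dst'. A record that straddles a page boundary is reassembled
 * chunk by chunk, so no contiguous virtual mapping of the pages is needed.
 */
void copy_from_pages(void *dst, unsigned char *const pages[],
                     size_t off, size_t len)
{
    unsigned char *out = dst;

    while (len > 0) {
        size_t page = off / TOY_PAGE_SIZE;        /* which page we are in */
        size_t in_page = off % TOY_PAGE_SIZE;     /* offset inside that page */
        size_t chunk = TOY_PAGE_SIZE - in_page;   /* bytes left in this page */

        if (chunk > len)
            chunk = len;
        memcpy(out, pages[page] + in_page, chunk);
        out += chunk;
        off += chunk;
        len -= chunk;
    }
}
```

The cost is an extra copy for every record that crosses a boundary, which is the trade-off against keeping a vm_map_ram() alias coherent.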

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-05 11:05:26

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

Hello Trond,

On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > The question is whether this is something happening on the server or the
> > client. Does an older client kernel boot without any trouble?
> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> compare with. If you don't consider .36 to be old enough let me know.
> Once the setup is done it should be easy to test .35 (say), too.
>
Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
problems didn't occur. This was more reliable to trigger and he was so
kind to bisect the problem.

When testing v2.6.36-rc3-51-gafa8ccc init hung.
(babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
"init hangs" problem the "fileid changed on tab" problem.

I could only reproduce that on armv5 machines (imx27, imx28 and at91)
but not on armv6 (imx35).

Best regards
Uwe

[1] similar means: not during boot, but when hitting tab to get command
completion in the shell.

--
Pengutronix e.K. | Uwe Kleine-König |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2011-01-05 15:01:39

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On 01/05/2011 03:53 PM, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 14:40 +0100, Uwe Kleine-König wrote:
>> Hi Russell,
>>
>> On Wed, Jan 05, 2011 at 11:27:01AM +0000, Russell King - ARM Linux wrote:
>>> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-König wrote:
>>>> Hello Trond,
>>>>
>>>> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
>>>>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
>>>>>> The question is whether this is something happening on the server or the
>>>>>> client. Does an older client kernel boot without any trouble?
>>>>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
>>>>> compare with. If you don't consider .36 to be old enough let me know.
>>>>> Once the setup is done it should be easy to test .35 (say), too.
>>>>>
>>>> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
>>>> problems didn't occur. This was more reliable to trigger and he was so
>>>> kind to bisect the problem.
>>>>
>>>> When testing v2.6.36-rc3-51-gafa8ccc init hung.
>>>> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
>>>> this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
>>>> "init hangs" problem the "fileid changed on tab" problem.
>>>>
>>>> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
>>>> but not on armv6 (imx35).
>>>
>>> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
>>> truncation of the fileid. It occurred several times on successive
>>> reboots, so I tried to capture a tcpdump trace off the server (Linux
>>> 2.6.23-rc8-ga64314e6 - it's ancient because I've had issues with buggy
>>> IDE drivers trying to move it forward.) However, for the last couple
>>> of weeks I've been unable to reproduce it.
>>>
>>> The client was based on 2.6.37-rc6.
>>>
>>> The "fileid changed" messages popped up after mounting an export with
>>> 'nolock,intr,rsize=4096,soft', and then trying to use bash completion
>>> and 'ls' in a few subdirectories - and entries were missing from the
>>> directory lists without 'ls' reporting any errors (which I think is bad
>>> behaviour in itself.)
>> There was a bug in at least -rc5[1] that was considered already fixed in
>> -rc4[2]. The later announcements didn't mention it anymore.
>>
>>> I don't know why it's stopped producing the errors, although once it
>>> went I never investigated it any further (was far too busy trying to
>>> get AMBA DMA support working.)
>> It seems it was fixed for most users though. Trond?
>
> As I said, I can't reproduce it.
>
> I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> x86, or does it appear to be architecture-specific?

It _seems_ to be ARMv5 specific[1]. Uwe did some tests and figured out
that disabling dcache on ARMv5 "fixes" the problem, but
CONFIG_CPU_DCACHE_WRITETHROUGH isn't enough.

[1] Uwe fails to reproduce it on ARMv6. The ARMv6 has an L2 cache and
uses IIRC different instructions to flush the L1 caches. (please correct
me, if I'm wrong, ARM guys :)

cheers, Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |

2011-01-05 11:27:23

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-König wrote:
> Hello Trond,
>
> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
> > On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> > > The question is whether this is something happening on the server or the
> > > client. Does an older client kernel boot without any trouble?
> > I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> > compare with. If you don't consider .36 to be old enough let me know.
> > Once the setup is done it should be easy to test .35 (say), too.
> >
> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> problems didn't occur. This was more reliable to trigger and he was so
> kind to bisect the problem.
>
> When testing v2.6.36-rc3-51-gafa8ccc init hung.
> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> "init hangs" problem the "fileid changed on tab" problem.
>
> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> but not on armv6 (imx35).

FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
truncation of the fileid. It occurred several times on successive
reboots, so I tried to capture a tcpdump trace off the server (Linux
2.6.23-rc8-ga64314e6 - it's ancient because I've had issues with buggy
IDE drivers trying to move it forward.) However, for the last couple
of weeks I've been unable to reproduce it.
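
One way to check the truncation theory against a logged "expected/got" pair is to compare the low 32 bits of the two fileids. A minimal sketch, assuming the pair is available as two 64-bit values; fileid_looks_truncated() is a made-up name for this example, not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if 'got' is exactly what 'expected' becomes
 * after passing through a 32-bit variable, and the two values differ.
 */
bool fileid_looks_truncated(uint64_t expected, uint64_t got)
{
    return expected != got && (uint32_t)expected == got;
}
```

A pair whose values both fit in 32 bits and differ there, such as the one logged at the start of this thread (expected 0x33590a4, got 0x4d11bedc), does not match this pattern, so that particular message would need a different explanation.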

The client was based on 2.6.37-rc6.

The "fileid changed" messages popped up after mounting an export with
'nolock,intr,rsize=4096,soft', and then trying to use bash completion
and 'ls' in a few subdirectories - and entries were missing from the
directory lists without 'ls' reporting any errors (which I think is bad
behaviour in itself.)

I don't know why it's stopped producing the errors, although once it
went I never investigated it any further (was far too busy trying to
get AMBA DMA support working.)

2011-01-05 18:28:32

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 01:12:25PM -0500, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 17:26 +0000, Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> > > We should already be flushing the kernel direct mapping after writing by
> > > means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> > > and all the helpers in net/sunrpc/xdr.c.
> >
> > Hmm, we're getting into the realms of what flush_dcache_page() is supposed
> > to do and what it's not supposed to do.
> >
> > Is this page associated with a mapping (iow, page_mapping(page) is
> > non-NULL)? If not, flush_dcache_page() won't do anything, and from my
> > understanding, it's flush_anon_page() which you want to be using there
> > instead.
>
> Actually, none of these pages are ever mapped into userspace, nor are
> they mapped into the page cache.
>
> They are allocated directly using alloc_page() by the thread that called
> the readdir() syscall, so afaics there should be no incoherent mappings
> other than the kernel direct mapping and the one created by
> vm_map_ram().
>
> So, yes, you are right that we don't need the flush_dcache_page() here.

I do still think you need _something_ there, otherwise data can remain
in the direct map alias and not be visible via the vmap alias. I don't
see that we have anything in place to handle this at present though.

jejb mentioned something about making kunmap_atomic() always flush the
cache, even for lowmem pages, but I think that's going to be exceedingly
painful, to the extent that I believe it will prevent our PIO-only MMC
drivers working - or we need a scatterlist API that will let drivers
iterate over the scatterlist without needing to continually kmap_atomic
and kunmap_atomic each page.

2011-01-05 18:55:07

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, 2011-01-05 at 18:27 +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 01:12:25PM -0500, Trond Myklebust wrote:
> > On Wed, 2011-01-05 at 17:26 +0000, Russell King - ARM Linux wrote:
> > > On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> > > > We should already be flushing the kernel direct mapping after writing by
> > > > means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> > > > and all the helpers in net/sunrpc/xdr.c.
> > >
> > > Hmm, we're getting into the realms of what flush_dcache_page() is supposed
> > > to do and what it's not supposed to do.
> > >
> > > > Is this page associated with a mapping (iow, page_mapping(page) is
> > > > non-NULL)? If not, flush_dcache_page() won't do anything, and from my
> > > > understanding, it's flush_anon_page() which you want to be using there
> > > > instead.
> >
> > Actually, none of these pages are ever mapped into userspace, nor are
> > they mapped into the page cache.
> >
> > They are allocated directly using alloc_page() by the thread that called
> > the readdir() syscall, so afaics there should be no incoherent mappings
> > other than the kernel direct mapping and the one created by
> > vm_map_ram().
> >
> > So, yes, you are right that we don't need the flush_dcache_page() here.
>
> I do still think you need _something_ there, otherwise data can remain
> in the direct map alias and not be visible via the vmap alias. I don't
> see that we have anything in place to handle this at present though.

Is that perhaps what flush_kernel_dcache_page() is supposed to do?

> jejb mentioned something about making kunmap_atomic() always flush the
> cache, even for lowmem pages, but I think that's going to be exceedingly
> painful, to the extent that I believe it will prevent our PIO-only MMC
> drivers working - or we need a scatterlist API that will let drivers
> iterate over the scatterlist without needing to continually kmap_atomic
> and kunmap_atomic each page.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-04 00:22:44

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Mon, 2011-01-03 at 22:38 +0100, Uwe Kleine-König wrote:
> On Thu, Dec 30, 2010 at 08:18:46PM +0100, Uwe Kleine-König wrote:
> > On Thu, Dec 30, 2010 at 12:59:52PM -0500, Trond Myklebust wrote:
> > > What filesystem are you exporting on the server? What is the NFS
> > > version? Is this nfsroot, autofs or an ordinary nfs mount?
> > This is an nfsroot of /home/ukl/nfsroot/tx28 which is a symlink to a
> > directory on a different partition. I don't know the filesystem of my
> > homedir as it resides on a server I have no access to, but I asked the
> > admin, so I can follow up with this info later (I'd suspect ext3, too).
> Yes, it is ext3.
>
> > The real root directory is on ext3 (rw,noatime).
> >
> > The serving nfs-server is Debian's nfs-kernel-server 1:1.2.2-1.
> If that matters, kernel is linux-image-2.6.32-5-amd64 (2.6.32-29)
> provided by Debian.
>
> > I don't
> > know if testing that further would help or just be a waste of my time, so
> > please let me know if I can help you and how.
> This still applies

I'm having trouble reproducing this with my own nfsroot setup (which is
just a 'fedora 13 live' disk with NetworkManager turned firmly off).

However looking back at your report, you said that when you remove the
symlink, you get an error message of the form:

"starting splashutils daemon.../etc/rc.d/S00splashutils: line
50: //sbin/fbsplashd.static: Unknown error 521"

Error 521 is EBADHANDLE, which basically means your client got a
corrupted filehandle. The 'fileid changed' thing also indicates some
form of corruption.
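
For context, 521 is one of the error codes the kernel defines for internal NFSv3 use in include/linux/errno.h. They are not part of the userspace errno set, which is why the shell prints "Unknown error 521" instead of a message. A small lookup sketch follows; the table values mirror that header as far as I recall it, and nfs_kernel_errno_name() is a hypothetical helper written for this example:

```c
#include <stddef.h>

/* Kernel-internal NFSv3 error codes; libc has no strerror() text for them. */
struct kernel_errno {
    int code;
    const char *name;
    const char *meaning;
};

static const struct kernel_errno nfs_kernel_errnos[] = {
    { 521, "EBADHANDLE",   "Illegal NFS file handle" },
    { 522, "ENOTSYNC",     "Update synchronization mismatch" },
    { 523, "EBADCOOKIE",   "Cookie is stale" },
    { 524, "ENOTSUPP",     "Operation is not supported" },
    { 525, "ETOOSMALL",    "Buffer or request is too small" },
    { 526, "ESERVERFAULT", "An untranslatable error occurred" },
    { 527, "EBADTYPE",     "Type not supported by server" },
    { 528, "EJUKEBOX",     "Request initiated, but will not complete before timeout" },
};

/* Hypothetical helper: symbolic name for a code, or NULL if it is not listed. */
const char *nfs_kernel_errno_name(int code)
{
    size_t i;

    for (i = 0; i < sizeof(nfs_kernel_errnos) / sizeof(nfs_kernel_errnos[0]); i++)
        if (nfs_kernel_errnos[i].code == code)
            return nfs_kernel_errnos[i].name;
    return NULL;
}
```

Seeing any of these values in userspace usually means an internal error leaked out of the kernel unmapped, which fits the corruption theory.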

The question is whether this is something happening on the server or the
client. Does an older client kernel boot without any trouble?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-05 15:53:25

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 10:14:17AM -0500, Trond Myklebust wrote:
> OK. So, the new behaviour in 2.6.37 is that we're writing to a series of
> pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> interfaces, but we can end up reading them via a virtual address range
> that gets set up via vm_map_ram() (that range gets set up before the
> write occurs).

kmap of lowmem pages will always reuse the existing kernel direct
mapping, so there won't be a problem there.

> Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> the data on ARM in this kind of scenario?

Firstly, vm_map_ram() does no cache maintenance of any sort, nor does
it take care of page colouring - so any architecture where cache aliasing
can occur will see this problem. It should not be limited to ARM.

Secondly, no, invalidate_kernel_vmap_range() probably isn't sufficient.
There's two problems here:

    addr = kmap(lowmem_page);
    *addr = stuff;
    kunmap(lowmem_page);

Such lowmem pages are accessed through their kernel direct mapping.

    ptr = vm_map_ram(lowmem_page);
    read = *ptr;

This creates a new mapping which can alias with the kernel direct mapping.
Now, as this is a new mapping, there should be no cache lines associated
with it. (Looking at vm_unmap_ram(), it calls free_unmap_vmap_area_addr(),
free_unmap_vmap_area(), which then calls flush_cache_vunmap() on the
region. vb_free() also calls flush_cache_vunmap().)

If the write after kmap() hits an already present cache line, the cache
line will be updated, but it won't be written back to memory. So, on
a subsequent vm_map_ram(), with any kind of aliasing cache, there's
no guarantee that you'll hit that cache line and read the data just
written there.

The kernel direct mapping would need to be flushed.
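
To make the failure mode concrete, here is a minimal sketch in C of a toy model, not kernel code: one word of RAM reached through two virtual aliases, each with its own write-back cache line, roughly as on an aliasing virtually indexed cache. Every name in it (toy_machine, toy_read, toy_write, toy_flush, write_then_read_alias) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: two virtual aliases of the same physical word, each
 * backed by its own write-back cache line. All names are hypothetical. */
enum alias { DIRECT_MAP = 0, VMAP = 1, NALIASES = 2 };

struct cache_line {
    bool valid;
    bool dirty;
    int data;
};

struct toy_machine {
    int ram;                           /* backing physical memory */
    struct cache_line line[NALIASES];  /* one cache line per alias */
};

/* Read through an alias: a hit returns cached data, a miss fills from RAM. */
int toy_read(struct toy_machine *m, enum alias a)
{
    assert(a < NALIASES);
    struct cache_line *l = &m->line[a];

    if (!l->valid) {
        l->data = m->ram;
        l->valid = true;
        l->dirty = false;
    }
    return l->data;
}

/* Write through an alias: the line is updated and marked dirty,
 * but RAM is not touched (write-back behaviour). */
void toy_write(struct toy_machine *m, enum alias a, int value)
{
    assert(a < NALIASES);
    struct cache_line *l = &m->line[a];

    l->data = value;
    l->valid = true;
    l->dirty = true;
}

/* Clean and invalidate: write a dirty line back to RAM, then drop it. */
void toy_flush(struct toy_machine *m, enum alias a)
{
    assert(a < NALIASES);
    struct cache_line *l = &m->line[a];

    if (l->valid && l->dirty)
        m->ram = l->data;
    l->valid = false;
    l->dirty = false;
}

/* Write via the direct mapping, optionally flush it, then read via a
 * fresh vmap alias. Returns the value seen through the vmap alias. */
int write_then_read_alias(bool flush_direct_map)
{
    struct toy_machine m = { .ram = 1 };

    toy_read(&m, DIRECT_MAP);      /* cache line already present */
    toy_write(&m, DIRECT_MAP, 2);  /* hits the line; RAM still holds 1 */
    if (flush_direct_map)
        toy_flush(&m, DIRECT_MAP);
    return toy_read(&m, VMAP);     /* new alias misses and fills from RAM */
}
```

In this model write_then_read_alias(false) sees the old value 1 through the new alias, while write_then_read_alias(true) sees the newly written 2, which is the reason the direct mapping has to be flushed.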

I'm really getting to the point of hating the proliferation of RAM
remapping interfaces - it's going to (and is) causing nothing but lots
of pain on virtual cache architectures, needing more and more cache
flushing interfaces to be created.

Is there any other solution to this?

2011-01-05 12:14:29

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On 01/05/2011 12:27 PM, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-König wrote:
>> Hello Trond,
>>
>> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
>>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
>>>> The question is whether this is something happening on the server or the
>>>> client. Does an older client kernel boot without any trouble?
>>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
>>> compare with. If you don't consider .36 to be old enough let me know.
>>> Once the setup is done it should be easy to test .35 (say), too.
>>>
>> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
>> problems didn't occur. This was more reliable to trigger and he was so
>> kind to bisect the problem.
>>
>> When testing v2.6.36-rc3-51-gafa8ccc init hung.
>> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
>> this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
>> "init hangs" problem the "fileid changed on tab" problem.
>>
>> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
>> but not on armv6 (imx35).
>
> FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> truncation of the fileid. It occurred several times on successive
> reboots, so I tried to capture a tcpdump trace off the server (Linux
> 2.6.23-rc8-ga64314e6 - it's ancient because I've had issues with buggy
> IDE drivers trying to move it forward.) However, for the last couple
> of weeks I've been unable to reproduce it.

We have the problem with nfs-root. From the kernel command line:

root=/dev/nfs
nfsroot=192.168.23.2:/home/mkl/pengutronix/xxx/bsp/OSELAS.BSP-xxx-Grabowski-trunk/platform-Ronetix-PM9263/root,v3,tcp

/home/mkl/pengutronix is a link which points to a link
/ptx/work/octopus/mkl (which is ext3-based) which points to
WORK_1/mkl which is also ext3-based.

The server is 2.6.32 and was rebooted yesterday :), nfs-utils are
1.2.2. I can make a tcpdump if needed.

Cheers, Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |

2011-01-05 15:29:47

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, 2011-01-05 at 10:14 -0500, Trond Myklebust wrote:
> OK. So, the new behaviour in 2.6.37 is that we're writing to a series of
> pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
> interfaces, but we can end up reading them via a virtual address range
> that gets set up via vm_map_ram() (that range gets set up before the
> write occurs).
>
> Do we perhaps need an invalidate_kernel_vmap_range() before we can read
> the data on ARM in this kind of scenario?

IOW: Does something like the following patch fix the problem?

-------------------------------------------------------------------------------
From: Trond Myklebust <[email protected]>
NFS: Ensure we clean the TLB cache in nfs_readdir_xdr_to_array

After calling nfs_readdir_xdr_filler(), we need a call to
invalidate_kernel_vmap_range() before we can proceed to read
the data back through the virtual address range.

Signed-off-by: Trond Myklebust <[email protected]>
---
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 996dd89..4640470 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -587,6 +587,9 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
 		if (status < 0)
 			break;
 		pglen = status;
+
+		invalidate_kernel_vmap_range(pages_ptr, pglen);
+
 		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
 		if (status < 0) {
 			if (status == -ENOSPC)

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-05 17:26:56

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> We should already be flushing the kernel direct mapping after writing by
> means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> and all the helpers in net/sunrpc/xdr.c.

Hmm, we're getting into the realms of what flush_dcache_page() is supposed
to do and what it's not supposed to do.

Is this page associated with a mapping (iow, page_mapping(page) is
non-NULL)? If not, flush_dcache_page() won't do anything, and from my
understanding, it's flush_anon_page() which you want to be using there
instead.

> The only new thing is the read access through the virtual address
> mapping. That mapping is created outside the loop in
> nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
> invalidate_kernel_vmap_range(): we're essentially doing a series of
> writes through the kernel direct mapping (i.e. readdir RPC calls), then
> reading the results through the virtual mapping.
>
> i.e. we're doing
>
> ptr = vm_map_ram(lowmem_pages);
> while (need_more_data) {
>
>     for (i = 0; i < npages; i++) {
>         addr = kmap_atomic(lowmem_page[i]);
>         *addr = rpc_stuff;
>         flush_dcache_page(lowmem_page[i]);
>         kunmap_atomic(lowmem_page[i]);
>     }
>
>     invalidate_kernel_vmap_range(ptr); // Needed here?

Yes, you're going to need some cache maintenance in there to make it work,
because accessing 'ptr' will load that data into the cache, and that won't
be updated by the writes via kmap_atomic().

Provided you don't write to ptr, then using invalidate_kernel_vmap_range()
will be safe.
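
The ordering matters because the vm_map_ram() alias is created once, before the loop, and may already hold cached data from an earlier pass. A minimal sketch in C of that situation follows; it is a toy model rather than kernel code, and all names (toy_page, rpc_write_and_flush, read_via_vmap, invalidate_vmap, second_pass_value) are hypothetical. It assumes the write through the direct mapping has already been flushed to RAM, and that nothing is ever written through the vmap alias, so discarding its cache line loses no data.

```c
#include <assert.h>
#include <stdbool.h>

/* Cache line sitting behind the long-lived vm_map_ram() alias (toy model). */
struct alias_line {
    bool valid;
    int data;
};

struct toy_page {
    int ram;                 /* backing physical memory */
    struct alias_line vmap;  /* read-only alias, never dirty */
};

/* Models a kmap_atomic() write whose direct-mapping cache line has
 * already been flushed: the new data is in RAM. */
void rpc_write_and_flush(struct toy_page *p, int value)
{
    p->ram = value;
}

/* Read through the vmap alias: a hit returns whatever was cached earlier. */
int read_via_vmap(struct toy_page *p)
{
    if (!p->vmap.valid) {
        p->vmap.data = p->ram;
        p->vmap.valid = true;
    }
    return p->vmap.data;
}

/* Stand-in for invalidate_kernel_vmap_range(): drop the alias's line
 * without writing it back. Safe here because the alias is never written. */
void invalidate_vmap(struct toy_page *p)
{
    p->vmap.valid = false;
}

/* One earlier pass reads through the alias, then new RPC data arrives.
 * Returns what the second pass sees through the alias. */
int second_pass_value(bool invalidate)
{
    struct toy_page p = { .ram = 1 };
    int first = read_via_vmap(&p);  /* pulls the old data into the cache */

    assert(first == 1);
    rpc_write_and_flush(&p, 2);     /* new readdir data lands in RAM */
    if (invalidate)
        invalidate_vmap(&p);
    return read_via_vmap(&p);
}
```

Without the invalidate the second pass still sees the stale 1; with it, the alias refills from RAM and sees 2.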

2011-01-14 02:40:40

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Thu, 2011-01-13 at 18:25 -0800, Andy Isaacson wrote:
> On Wed, Jan 05, 2011 at 09:53:13AM -0500, Trond Myklebust wrote:
> > > There was a bug in at least -rc5[1] that was considered already fixed in
> > > -rc4[2]. The later announcements didn't mention it anymore.
> > >
> > > > I don't know why it's stopped producing the errors, although once it
> > > > went I never investigated it any further (was far too busy trying to
> > > > get AMBA DMA support working.)
> > > It seems it was fixed for most users though. Trond?
> >
> > As I said, I can't reproduce it.
> >
> > I'm seeing a lot of mention of ARM above. Is anyone seeing this bug on
> > x86, or does it appear to be architecture-specific?
>
> I'm seeing processes stuck in D with "fileid changed" in dmesg, on
> x86_64 (both server and client). The repro testcase is to run an
> executable off of NFS, recompile it on the server, and then try to tab
> complete the executable name. The client prints
>
> NFS: server <hostname> error: fileid changed
> fsid 0:18: expected fileid 0x107aa4a, got 0x107ad3e
>
> and /bin/zsh hangs in D.
>
> My server is running 2.6.36.1, filesystem is ext3 on sda3 on AHCI,
> client is currently running 2.6.37-rc1. I'm assuming that 37a09f will
> fix it.

Why are you sticking to 2.6.37-rc1 when the final 2.6.37 is out? There
have been several readdir bugfixes merged in the months since -rc1 came
out.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-05 15:39:58

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On 01/05/2011 04:29 PM, Trond Myklebust wrote:
> On Wed, 2011-01-05 at 10:14 -0500, Trond Myklebust wrote:
>> OK. So, the new behaviour in 2.6.37 is that we're writing to a series of
>> pages via the usual kmap_atomic()/kunmap_atomic() and kmap()/kunmap()
>> interfaces, but we can end up reading them via a virtual address range
>> that gets set up via vm_map_ram() (that range gets set up before the
>> write occurs).
>>
>> Do we perhaps need an invalidate_kernel_vmap_range() before we can read
>> the data on ARM in this kind of scenario?
>
> IOW: Does something like the following patch fix the problem?
>
> -------------------------------------------------------------------------------
> From: Trond Myklebust <[email protected]>
> NFS: Ensure we clean the TLB cache in nfs_readdir_xdr_to_array
>
> After calling nfs_readdir_xdr_filler(), we need a call to
> invalidate_kernel_vmap_range() before we can proceed to read
> the data back through the virtual address range.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 996dd89..4640470 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -587,6 +587,9 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
>  		if (status < 0)
>  			break;
>  		pglen = status;
> +
> +		invalidate_kernel_vmap_range(pages_ptr, pglen);
> +
>  		status = nfs_readdir_page_filler(desc, &entry, pages_ptr, page, pglen);
>  		if (status < 0) {
>  			if (status == -ENOSPC)

\o/ - Works for me (at91, armv5)

Tested-by: Marc Kleine-Budde <[email protected]>

This is a candidate for stable (Cc'd).

Regards,
Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |

2011-01-05 15:35:13

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 06:32:58PM +0530, Nori, Sekhar wrote:
> Here are some logs:
>
> fileid changed fsid 0:c: expected fileid 0x2db61d, got 0x2dad20
>
> fileid changed fsid 0:c: expected fileid 0x100000000000, got 0x7070000000000000

Just to be clear, what I was seeing was things like:

expected fileid <32-bit number> got <64-bit number with same 32-bit LS bits>

so it looked like something was truncating a 64-bit fileid down
to 32 bits.
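
The pattern described here can be written as a small check. This is a sketch in C, and the function name fileid_looks_truncated() is hypothetical; it simply encodes "expected fits in 32 bits, got does not, and the low 32 bits agree".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when 'expected' looks like 'got' with its
 * upper 32 bits cut off, i.e. a 64-bit fileid truncated to 32 bits. */
bool fileid_looks_truncated(uint64_t expected, uint64_t got)
{
    return expected <= UINT32_MAX &&
           got > UINT32_MAX &&
           (uint32_t)got == (uint32_t)expected;
}
```

Applied to the two log lines quoted above (0x2db61d vs 0x2dad20, and 0x100000000000 vs 0x7070000000000000), the check returns false for both, so those reports do not fit the truncation pattern.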

2011-01-05 13:03:50

by Sekhar Nori

Subject: RE: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, Jan 05, 2011 at 17:44:16, Marc Kleine-Budde wrote:
> On 01/05/2011 12:27 PM, Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 12:05:17PM +0100, Uwe Kleine-König wrote:
> >> Hello Trond,
> >>
> >> On Wed, Jan 05, 2011 at 09:40:14AM +0100, Uwe Kleine-König wrote:
> >>> On Mon, Jan 03, 2011 at 07:22:38PM -0500, Trond Myklebust wrote:
> >>>> The question is whether this is something happening on the server or the
> >>>> client. Does an older client kernel boot without any trouble?
> >>> I will set up a boot test with 2.6.37 (for statistics) and 2.6.36 to
> >>> compare with. If you don't consider .36 to be old enough let me know.
> >>> Once the setup is done it should be easy to test .35 (say), too.
> >>>
> >> Marc (cc'd) saw similar[1] problems with .37, when using .36.2 the
> >> problems didn't occur. This was more reliable to trigger and he was so
> >> kind to bisect the problem.
> >>
> >> When testing v2.6.36-rc3-51-gafa8ccc init hung.
> >> (babddc72a9468884ce1a23db3c3d54b0afa299f0 is the first bad commit with
> >> this hang.) Commit 56e4ebf877b6043c289bda32a5a7385b80c17dee makes the
> >> "init hangs" problem the "fileid changed on tab" problem.
> >>
> >> I could only reproduce that on armv5 machines (imx27, imx28 and at91)
> >> but not on armv6 (imx35).
> >
> > FYI, I've seen the "fileid changed" problem, and it looked like a 32-bit
> > truncation of the fileid. It occurred several times on successive
> > reboots, so I tried to capture a tcpdump trace off the server (Linux
> > 2.6.23-rc8-ga64314e6 - it's ancient because I've had issues with buggy
> > IDE drivers trying to move it forward.) However, for the last couple
> > of weeks I've been unable to reproduce it.
>
> We have the problem with nfs-root. From the kernel command line:
>
> root=/dev/nfs
> nfsroot=192.168.23.2:/home/mkl/pengutronix/xxx/bsp/OSELAS.BSP-xxx-Grabowski-trunk/platform-Ronetix-PM9263/root,v3,tcp
>
> /home/mkl/pengutronix is a link which points to a link
> /ptx/work/octopus/mkl (which is ext3-based) which points to
> WORK_1/mkl which is also ext3-based.
>
> The server is 2.6.32 and was rebooted yesterday :), nfs-utils are
> 1.2.2. I can make a tcpdump if needed.

I see the issue too with an ARMv5 based DM355 board with the just released
v2.6.37 tag (nfs client). I too see the issue when using bash tab completion.

Here are some logs:

fileid changed fsid 0:c: expected fileid 0x2db61d, got 0x2dad20

fileid changed fsid 0:c: expected fileid 0x100000000000, got 0x7070000000000000

I am using Fedora 8 (2.6.25 kernel) on the server side. I will try the latest
Ubuntu release on the server side and test.

Thanks,
Sekhar

2011-01-05 18:12:43

Subject: Re: still nfs problems [Was: Linux 2.6.37-rc8]

On Wed, 2011-01-05 at 17:26 +0000, Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 12:17:27PM -0500, Trond Myklebust wrote:
> > We should already be flushing the kernel direct mapping after writing by
> > means of the calls to flush_dcache_page() in xdr_partial_copy_from_skb()
> > and all the helpers in net/sunrpc/xdr.c.
>
> Hmm, we're getting into the realms of what flush_dcache_page() is supposed
> to do and what it's not supposed to do.
>
> Is this page associated with a mapping (iow, page_mapping(page) is
> non-NULL)? If not, flush_dcache_page() won't do anything, and from my
> understanding, it's flush_anon_page() which you want to be using there
> instead.

Actually, none of these pages are ever mapped into userspace, nor are
they mapped into the page cache.

They are allocated directly using alloc_page() by the thread that called
the readdir() syscall, so afaics there should be no incoherent mappings
other than the kernel direct mapping and the one created by
vm_map_ram().

So, yes, you are right that we don't need the flush_dcache_page() here.

> > The only new thing is the read access through the virtual address
> > mapping. That mapping is created outside the loop in
> > nfs_readdir_xdr_to_array(), which is why I'm thinking we do need the
> > invalidate_kernel_vmap_range(): we're essentially doing a series of
> > writes through the kernel direct mapping (i.e. readdir RPC calls), then
> > reading the results through the virtual mapping.
> >
> > i.e. we're doing
> >
> > ptr = vm_map_ram(lowmem_pages);
> > while (need_more_data) {
> >
> >     for (i = 0; i < npages; i++) {
> >         addr = kmap_atomic(lowmem_page[i]);
> >         *addr = rpc_stuff;
> >         flush_dcache_page(lowmem_page[i]);
> >         kunmap_atomic(lowmem_page[i]);
> >     }
> >
> >     invalidate_kernel_vmap_range(ptr); // Needed here?
>
> Yes, you're going to need some cache maintenance in there to make it work,
> because accessing 'ptr' will load that data into the cache, and that won't
> be updated by the writes via kmap_atomic().
>
> Provided you don't write to ptr, then using invalidate_kernel_vmap_range()
> will be safe.

Thanks! That is what Marc's testing appears to confirm.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-01-05 13:40:56