2018-10-19 22:55:06

by Cristian Marussi

Subject: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hi all,

I've recently been chasing a bug that frequently appears during our internal
LTP test-runs when performed on aarch64 HW (Juno) systems with an
NFS-mounted root.

The failure is NOT related to any specific LTP testcase, and this issue
has been observed only when the Kernel is configured to use 64KB pages
(on the latest LTP Sept18 TAG test suite a Kernel crash happened in 4
out of 5 test runs... always on a different random test case).

I'm testing on Linus' branch at 4.19-rc6 (but I can see it up to 4.19-rc8
and also on -next), and it has been reported since 4.17 at least (not
completely sure about this... anyway earlier it was NOT happening).

It all starts with a stacktrace as follows:

[ 214.513856] refcount_t: increment on 0; use-after-free.
[ 214.519701] WARNING: CPU: 0 PID: 1397 at /home//ARM/dev/src/pdsw/linux/lib/refcount.c:153 refcount_inc_checked+0x40/0x48
[ 214.520010] Modules linked in:
[ 214.520606] CPU: 0 PID: 1397 Comm: kworker/u2:5 Not tainted 4.19.0-rc4-00135-g4b92e7fd76e9 #17
[ 214.520732] Hardware name: Foundation-v8A (DT)
[ 214.521258] Workqueue: writeback wb_workfn (flush-0:17)
[ 214.521629] pstate: 40000005 (nZcv daif -PAN -UAO)
[ 214.521739] pc : refcount_inc_checked+0x40/0x48
[ 214.521826] lr : refcount_inc_checked+0x40/0x48
[ 214.521910] sp : ffff00000c3ef850
[ 214.521981] x29: ffff00000c3ef850 x28: ffff800841a8bf00
[ 214.522172] x27: 000000007fffffff x26: ffff000009173000
[ 214.522301] x25: 0000000000000002 x24: ffff00000c3ef960
[ 214.522419] x23: ffff8008202484d8 x22: ffff800841a8c580
[ 214.522523] x21: ffff00000c3ef950 x20: ffff8008202484d8
[ 214.522628] x19: ffff800841a8c580 x18: 0000000000000400
[ 214.522744] x17: 0000000000000002 x16: 0000000000000000
[ 214.522849] x15: 0000000000000400 x14: 0000000000000400
[ 214.522963] x13: 0000000000000003 x12: 0000000000000001
[ 214.523063] x11: 0000000000000000 x10: 0000000000000950
[ 214.523181] x9 : ffff00000c3ef560 x8 : ffff800840d2f3b0
[ 214.523284] x7 : ffff800840d2eac0 x6 : 0000000000004c0e
[ 214.523390] x5 : 0000000000004c0e x4 : 0000000000000040
[ 214.523512] x3 : 0000800876a30000 x2 : ffff00000918b0b8
[ 214.523619] x1 : 114d6487b94add00 x0 : 0000000000000000
[ 214.523865] Call trace:
[ 214.524034] refcount_inc_checked+0x40/0x48
[ 214.524194] nfs_scan_commit_list+0x64/0x158
[ 214.524300] nfs_scan_commit.part.12+0x70/0xc8
[ 214.524367] __nfs_commit_inode+0xa8/0x190
[ 214.524429] nfs_io_completion_commit+0x14/0x20
[ 214.524495] nfs_io_completion_put.part.0+0x28/0x40
[ 214.524566] nfs_writepages+0x114/0x1f8
[ 214.524656] do_writepages+0x54/0x100
[ 214.524722] __writeback_single_inode+0x38/0x1b0
[ 214.524795] writeback_sb_inodes+0x1bc/0x3b0
[ 214.524860] wb_writeback+0xe8/0x1e0
[ 214.524920] wb_workfn+0xac/0x2f0
[ 214.525011] process_one_work+0x1e0/0x318
[ 214.525079] worker_thread+0x40/0x450
[ 214.525136] kthread+0x128/0x130
[ 214.525200] ret_from_fork+0x10/0x18
[ 214.525378] ---[ end trace 085bda95b972076a ]---
[ 214.527357] WARNING: CPU: 0 PID: 2247 at /home//ARM/dev/src/pdsw/linux/fs/nfs/pagelist.c:426 nfs_free_request+0x180/0x188
[ 214.527473] Modules linked in:
[ 214.527603] CPU: 0 PID: 2247 Comm: kworker/0:10 Tainted: G W 4.19.0-rc4-00135-g4b92e7fd76e9 #17
[ 214.527694] Hardware name: Foundation-v8A (DT)
[ 214.527836] Workqueue: nfsiod rpc_async_release
[ 214.527941] pstate: 00000005 (nzcv daif -PAN -UAO)
[ 214.528012] pc : nfs_free_request+0x180/0x188
[ 214.528084] lr : nfs_release_request+0x7c/0xa8
[ 214.528150] sp : ffff00000a3afc90
[ 214.528215] x29: ffff00000a3afc90 x28: 0000000000000000
[ 214.528319] x27: 0000000000000000 x26: ffff000008e7ac08
[ 214.528439] x25: 0000000000000000 x24: ffff800841169d20
[ 214.528543] x23: ffff800841169c50 x22: ffff000009173000
[ 214.528663] x21: ffff800841a86ac0 x20: ffff800841a88980
[ 214.528765] x19: ffff800841a86ac0 x18: 0000000000000400
[ 214.528878] x17: 0000000000000c02 x16: 0000000000000000
[ 214.528985] x15: 0000000000000400 x14: 0000000000000400
[ 214.529089] x13: 0000000000000001 x12: 0000000000000001
[ 214.529210] x11: ffff7fe0021f2080 x10: 0000000000000001
[ 214.529336] x9 : 0000000000000230 x8 : 0000000000000001
[ 214.529445] x7 : 0000000000000006 x6 : 0000000000000000
[ 214.529554] x5 : ffff0000091736c8 x4 : 61c8864680b583eb
[ 214.529655] x3 : ffff000009170180 x2 : 114d6487b94add00
[ 214.529768] x1 : 0000000000000000 x0 : 0000000000000c02
[ 214.529875] Call trace:
[ 214.529936] nfs_free_request+0x180/0x188
[ 214.530004] nfs_release_request+0x7c/0xa8
[ 214.530081] nfs_release_request+0x58/0xa8
[ 214.530148] nfs_write_completion+0x108/0x1d0
[ 214.530226] nfs_pgio_release+0x14/0x20
[ 214.530303] rpc_free_task+0x28/0x48
[ 214.530365] rpc_async_release+0x10/0x18
[ 214.530435] process_one_work+0x1e0/0x318
[ 214.530511] worker_thread+0x40/0x450
[ 214.530576] kthread+0x128/0x130
[ 214.530637] ret_from_fork+0x10/0x18
[ 214.530703] ---[ end trace 085bda95b972076b ]---
[ 214.561905]
=============================================================================
[ 214.562272] BUG nfs_page (Tainted: G W ): Poison overwritten
[ 214.562380]
-----------------------------------------------------------------------------
[ 214.562380]
[ 214.562556] Disabling lock debugging due to kernel taint
[ 214.562890] INFO: 0x00000000b97c0237-0x000000008961d49d. First byte 0x6d instead of 0x6b
[ 214.563217] INFO: Slab 0x00000000904824d1 objects=341 used=341 fp=0x (null) flags=0x1fffc00000000100
[ 214.563342] INFO: Object 0x0000000027f0bc03 @offset=47872 fp=0x00000000d28ef0cf
[ 214.563342]
[ 214.563896] Redzone 00000000bf6d5f5c: bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb ................
[ 214.564049] Redzone 00000000176bf877: bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb ................
[ 214.564160] Redzone 0000000021a23f7d: bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb ................
[ 214.564250] Redzone 00000000867bf55b: bb bb bb bb bb bb bb bb bb bb
bb bb bb bb bb bb ................
[ 214.564368] Object 0000000027f0bc03: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 214.564470] Object 00000000e1d63013: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 214.564559] Object 0000000048d2aff2: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 214.564650] Object 000000009988735f: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6d 6b 6b 6b kkkkkkkkkkkkmkkk
[ 214.564749] Object 00000000f2578c10: 6b 7b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b k{kkkkkkkkkkkkkk
[ 214.564837] Object 00000000f5216ea3: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
[ 214.564992] Redzone 00000000cfb0e6ed: bb bb bb bb bb bb bb bb
........
[ 214.565105] Padding 00000000ef117243: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[ 214.565404] CPU: 0 PID: 2344 Comm: dd Tainted: G B W 4.19.0-rc4-00135-g4b92e7fd76e9 #17
[ 214.565507] Hardware name: Foundation-v8A (DT)
[ 214.565676] Call trace:

So after the refcount warning there are further warnings about wb_flags
consistency, and finally (if enabled) a Poison Overwritten is detected on
the nfs_page. In other runs, with slub_debug disabled, it can eventually
hit various invalid dereferences or the BUG() inside
nfs_unlock_request().

Observing that this deadly sequence usually derives from a file sync
event or an exiting process (a terminating test case) closing and
flushing files to NFS, I tried to reproduce it quicker using a script
which spawns a number of dd processes, randomly kills some of them,
and periodically syncs in the background.
Such a script, when run on an ARM aarch64 FastModel (which is by itself
slower than real HW and single-threaded), is able to reproduce the same
issue 100% of the time within 30 secs when using properly sized dd
writes (~160KB)... but ONLY on a 64KB-pages Kernel.

Ftracing and annotating the NFS code, I tracked down the usage of such
nfs_pages and made the following observations:

- a number of nfs_page objects which are apparently still actively used
by NFS (and so still linked to some NFS list via wb_list) get freed
because the wb_kref refcount drops to zero (such rogue nfs_pages are
also linked among themselves in a single flock)

- earlier during the same test run these same nfs_page objects can be
observed to have been previously properly linked/unlinked and freed (so
I think we can exclude that the issue is tied to a specific nfs_page
object instance or to a specific underlying physical location)

- the NFS wrapper functions handling wb_list, nfs_list_add_request() and
nfs_list_remove_request(), seem to handle item addition and removal
properly: I could NOT find a call to nfs_list_add_request() issued on an
element not previously cleanly initialized or unlinked with
nfs_list_remove_request() (a sketch of both wrappers follows right after
these observations)

- when the crash happens nicely enough, you can see these rogue pages
still being properly manipulated (unlinked!) by NFS code after the
freeing (and the warnings)... it's just too late.
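
For reference, the two wb_list wrappers mentioned above are plain
list_add_tail()/list_del_init() operations with no refcounting of their
own; paraphrasing the 4.19-era include/linux/nfs_page.h (approximate,
not verbatim):

static inline void
nfs_list_add_request(struct nfs_page *req, struct list_head *head)
{
	list_add_tail(&req->wb_list, head);
}

static inline void
nfs_list_remove_request(struct nfs_page *req)
{
	if (list_empty(&req->wb_list))
		return;
	list_del_init(&req->wb_list);
}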

For these reasons I now think it is the refcount which is wrongly
accounted in this particular situation, so that it drops to zero
prematurely; I no longer think this crash is due to some stale nfs_page
not having been properly unlinked or being reachable from a stale
dangling pointer.

As a proof of concept, if I track down when the evil is about to happen
(freeing a still wb_list-linked nfs_page...) and artificially pump up
the wb_kref refcount by one just before (so delaying the free)... the
issue disappears... without any memory leak (according to what kmemleak
and slub_debug say).

So, applying the patch below (which is NOT meant to be a fix but just a
workaround to validate all of the above), the issue goes completely away
and I could run my 30secs-failing test case for ~12 hours. On real HW I
could now also run 10/10 full LTP test runs in a row without any issue.

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index bb5476a6d264..171813f9a291 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -432,6 +432,15 @@ void nfs_free_request(struct nfs_page *req)
 
 void nfs_release_request(struct nfs_page *req)
 {
+	/* WORKAROUND */
+	if ((kref_read(&req->wb_kref) == 1) &&
+	    (req->wb_list.prev != &req->wb_list ||
+	     req->wb_list.next != &req->wb_list)) {
+		pr_warn("%s -- Forcing REFCOUNT++ on dirty req[%u]:%px ->prev:%px ->next:%px\n",
+			__func__, kref_read(&req->wb_kref), req,
+			req->wb_list.prev, req->wb_list.next);
+		kref_get(&req->wb_kref);
+	}
 	kref_put(&req->wb_kref, nfs_page_group_destroy);
 }
 EXPORT_SYMBOL_GPL(nfs_release_request);

I still have to figure out WHY this is happening when the system is
loaded AND only with 64kb pages. (so basically the root cause...:<)

What I could see is that the refcount bad-accounting seems to originate
during the early phase of nfs_page allocation:

- OK: nfs_create_request straight away creates an nfs_page with wb_kref
at +2

- OK: nfs_create_request creates an nfs_page with wb_kref at +1 AND then
wb_kref is immediately afterwards incremented to +2 by an
nfs_inode_add_request() before the request is moved across wb_lists

- FAIL: nfs_create_request creates an nfs_page with wb_kref at +1 and it
remains at +1 until it starts being moved across lists.
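
To make the above explicit, the lifecycle I'm tracking looks roughly
like this (a heavily trimmed paraphrase of the 4.19-era
fs/nfs/pagelist.c and fs/nfs/write.c from my notes, so approximate, not
verbatim):

	/* nfs_create_request() */
	kref_init(&req->wb_kref);		/* wb_kref == 1 */
	nfs_page_group_init(req, last);		/* may take extra refs for subrequests */

	/* nfs_inode_add_request() */
	set_bit(PG_INODE_REF, &req->wb_flags);
	kref_get(&req->wb_kref);		/* wb_kref == 2: the inode's reference */

so the FAIL case above is a request which somehow never gets that
second, inode-side reference before it starts travelling across the
wb_lists.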

Any ideas or suggestions on how to triage why this condition is
happening? (I'm not really an NFS guy... :D)

Thanks

Cristian


2018-10-19 23:24:58

by Punit Agrawal

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hi Cristian,

Thanks for digging into this.

Cristian Marussi <[email protected]> writes:

> Hi all,
>
> I'm recently chasing a bug that frequently appears during our internal
> LTP test-runs when performed on aarch64 HW (Juno) systems with an
> NFS-mounted root.
>
> The failure is NOT related to any specific LTP testcase and this issue
> has been observed only when Kernel is configured to use 64KB pages.
> (on the latest LTP Sept18 TAG test suite a Kernel crash happened in 4
> out of 5 test runs...always on a different random test case)
>
> I'm testing on Linus branch on 4.19-rc6 (but I can see it up to
> 4.19-rc8 and also on next) and it is reported since 4.17 at least (not
> sure about this...anyway it was NOT happening)

The stacktrace suggests it's the same issue that I'd reported earlier -

https://lkml.org/lkml/2018/6/29/209

though without the analysis below.

[...]

> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index bb5476a6d264..171813f9a291 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -432,6 +432,15 @@ void nfs_free_request(struct nfs_page *req)
>
> void nfs_release_request(struct nfs_page *req)
> {
> + /* WORKAROUND */
> + if ((kref_read(&req->wb_kref) == 1) &&
> + (req->wb_list.prev != &req->wb_list ||
> + req->wb_list.next != &req->wb_list)) {

Are the last two conditions just checking that wb_list is not empty?

Thanks for looking at this.

Punit

> + pr_warn("%s -- Forcing REFCOUNT++ on dirty req[%u]:%px
> ->prev:%px ->next:%px\n",
> + __func__, kref_read(&req->wb_kref), req,
> + req->wb_list.prev, req->wb_list.next);
> + kref_get(&req->wb_kref);
> + }
> kref_put(&req->wb_kref, nfs_page_group_destroy);
> }
> EXPORT_SYMBOL_GPL(nfs_release_request);
>
> I still have to figure out WHY this is happening when the system is
> loaded AND only with 64kb pages. (so basically the root cause...:<)
>
> What I could see is that the refcount bad-accounting seems to
> originate during the early phase of nfs_page allocation:
>
> - OK: nfs_create_request creates straight away an nfs_page wb_kref +2
>
> - OK: nfs_create_request creates a nfs_page with wb_kref +1 AND then
> wb_kref is immediately after incremented to +2 by an
> nfs_inode_add_request() before being moved across wb_list
>
> - FAIL: nfs_create_request creates a nfs_page with wb_kref +1 and it
> remains to +1 till when it starts being moved across lists.
>
> Any ideas or suggestions to triage why this condition is happening ?
> (I'm not really an NFS guy...:D)
>
> Thanks
>
> Cristian

2018-10-19 23:41:43

by Cristian Marussi

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hi Punit

On 19/10/18 16:18, Punit Agrawal wrote:
> Hi Cristian,
>
[...]
>
> The stacktrace suggests it's the same issue that I'd reported earlier -
>
> https://lkml.org/lkml/2018/6/29/209
>
> though without the analysis below.
>
> [...]
>
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index bb5476a6d264..171813f9a291 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -432,6 +432,15 @@ void nfs_free_request(struct nfs_page *req)
>>
>> void nfs_release_request(struct nfs_page *req)
>> {
>> + /* WORKAROUND */
>> + if ((kref_read(&req->wb_kref) == 1) &&
>> + (req->wb_list.prev != &req->wb_list ||
>> + req->wb_list.next != &req->wb_list)) {
>
> Are the last two conditions just checking that wb_list is not empty?

yes... I basically check whether the nfs_page element is still linked
somewhere just before putting the kref to zero (which would result in a
final nfs_page_free() on an nfs_page which is in fact still in use).

I check both pointers here instead of using list_empty() because
list_empty just checks the next pointer (READ_ONCE(head->next) == head;)
and I was overly paranoid while investigating this initially, since I
didn't really know whether it was some race on the list handling itself...

Since the nfs_list_remove_request() wrapper issues a list_del_init(),
once the item is unlinked the wb_list list_head should point to
itself... if it does not, the item could still be accessed after being
freed... and that is in fact what happens... while it does NOT happen in
any of the cases in which the refcount is properly accounted (I have a
ton of Ftracing :D)
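
For completeness, the list helpers involved look like this in the
4.19-era include/linux/list.h (quoting from memory, but these should be
accurate):

static inline int list_empty(const struct list_head *head)
{
	return READ_ONCE(head->next) == head;
}

static inline void list_del_init(struct list_head *entry)
{
	__list_del_entry(entry);
	INIT_LIST_HEAD(entry);	/* ->next and ->prev both point back to entry */
}

so checking both ->prev and ->next is just a stronger (paranoid) version
of !list_empty() on wb_list.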


Thanks

Cristian

2019-02-05 11:54:01

by Benjamin Coddington

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hello Cristian and Punit,

Did you ever get to the bottom of this one? We just saw this on one run
of our 4.18.0-era ppc64le, and I'm wondering if we ever found the root
cause.

Ben

2019-02-05 12:10:37

by Cristian Marussi

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hi Ben

On 05/02/2019 11:53, Benjamin Coddington wrote:
> Hello Cristian and Punit,
>
> Did you ever get to the bottom of this one? We just saw this on one run
> of our 4.18.0-era ppc64le, and I'm wondering if we ever found the root
> cause.

unfortunately I stopped actively working on finding the root cause, since I
found a viable workaround that lets us unblock our broken LTP runs.

Setting wsize=65536 in the NFS bootparams completely solves the issue with 64k
pages (and does NOT break 4k either :D): this confirmed my hypothesis that
there is some sort of race when accounting refcounts during the lifetime of
nfs_page structs which leads to a miscounted refcount... but as I said I never
looked into it again (but never say never...)
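
For reference, in an NFS-root setup like ours that just means adding the wsize
option to the nfsroot= parameter on the kernel command line, something along
these lines (server address and export path below are obviously just
placeholders; any other nfsroot options you already pass stay as they are):

    root=/dev/nfs ip=dhcp nfsroot=<server-ip>:<exported-rootfs>,wsize=65536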

Hope this helps...

Regards

Cristian

>
> Ben
>


2019-02-05 12:14:24

by Benjamin Coddington

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

On 5 Feb 2019, at 7:10, Cristian Marussi wrote:

> Hi Ben
>
> On 05/02/2019 11:53, Benjamin Coddington wrote:
>> Hello Cristian and Punit,
>>
>> Did you ever get to the bottom of this one? We just saw this on one
>> run
>> of our 4.18.0-era ppc64le, and I'm wondering if we ever found the
>> root
>> cause.
>
> unfortunately I stopped working actively on finding the root cause,
> since I've
> found a viable workaround that let us unblock our broken LTP runs.
>
> Setting wsize=65536 in NFS bootparams completely solves the issue with
> 64k pages
> (and does NOT break 4k either :D): this confirmed my hyp that there is
> some sort
> of race when accounting refcounts during the lifetime of nfs_page
> structs which
> leads to a misscounted refcount...but as I said I never looked back
> into that
> again (but never say never...)
>
> Hope this helps...

Hmm, interesting..

Will you share your reproducer with me? That will save me some time.

Ben

2019-02-05 12:37:27

by Cristian Marussi

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hi

On 05/02/2019 12:14, Benjamin Coddington wrote:
> On 5 Feb 2019, at 7:10, Cristian Marussi wrote:
>
>> Hi Ben
>>
>> On 05/02/2019 11:53, Benjamin Coddington wrote:
>>> Hello Cristian and Punit,
>>>
>>> Did you ever get to the bottom of this one? We just saw this on one
>>> run
>>> of our 4.18.0-era ppc64le, and I'm wondering if we ever found the
>>> root
>>> cause.
>>
>> unfortunately I stopped working actively on finding the root cause,
>> since I've
>> found a viable workaround that let us unblock our broken LTP runs.
>>
>> Setting wsize=65536 in NFS bootparams completely solves the issue with
>> 64k pages
>> (and does NOT break 4k either :D): this confirmed my hyp that there is
>> some sort
>> of race when accounting refcounts during the lifetime of nfs_page
>> structs which
>> leads to a misscounted refcount...but as I said I never looked back
>> into that
>> again (but never say never...)
>>
>> Hope this helps...
>
> Hmm, interesting..
>
> Will you share your reproducer with me? That will save me some time.

Sure.

My reproducer is the attached nfs_stress.sh script; when invoked with the
following params:

./nfs_stress.sh -w 10 -s 160000 -t 10

it leads to a crash within 10 secs BUT ONLY with a 64KB page Kconfig AND ONLY
if the above wsize workaround is NOT applied (or the cleanup-code trick
mentioned in the emails). (the choice of the -s size parameter seemed
significant in determining how quickly it will die...)

BUT UNFORTUNATELY this was true ONLY when running on an AEMv8 FastModel (1-cpu
A53), whose timings are much different from a real board; I've never been able
to reproduce it reliably on real ARM64 silicon (or on x86).
So all my debugging and triage was done on the model, once I was able to
quickly reproduce the same crash there (and in fact the workaround then also
worked fine on silicon...)

On real silicon, instead, the only reproducer was a full LTP run: we had
consistent failures every night with the same exact refcount stacktrace (but
every time with a different LTP test as the trigger... being related to NFS
activity I suppose that's normal); since we applied the wsize workaround we
have seen no more crashes.


Thanks

Regards

Cristian

>
> Ben
>


Attachments:
nfs_stress.sh (2.17 kB)

2019-02-05 12:50:59

by Cristian Marussi

Subject: Re: [aarch64] refcount_t: use-after-free in NFS with 64k pages

Hi

On 05/02/2019 12:37, Cristian Marussi wrote:
> Hi
>
> On 05/02/2019 12:14, Benjamin Coddington wrote:
>> On 5 Feb 2019, at 7:10, Cristian Marussi wrote:
>>
>>> Hi Ben
>>>
>>> On 05/02/2019 11:53, Benjamin Coddington wrote:
>>>> Hello Cristian and Punit,
>>>>
>>>> Did you ever get to the bottom of this one? We just saw this on one
>>>> run
>>>> of our 4.18.0-era ppc64le, and I'm wondering if we ever found the
>>>> root
>>>> cause.
>>>
>>> unfortunately I stopped working actively on finding the root cause,
>>> since I've
>>> found a viable workaround that let us unblock our broken LTP runs.
>>>
>>> Setting wsize=65536 in NFS bootparams completely solves the issue with
>>> 64k pages
>>> (and does NOT break 4k either :D): this confirmed my hyp that there is
>>> some sort
>>> of race when accounting refcounts during the lifetime of nfs_page
>>> structs which
>>> leads to a misscounted refcount...but as I said I never looked back
>>> into that
>>> again (but never say never...)
>>>
>>> Hope this helps...
>>
>> Hmm, interesting..
>>
>> Will you share your reproducer with me? That will save me some time.
>
> Sure.
>
> My reproducer is the attached nfs_stress.sh script; when invoked with the
> following params:
>
> ./nfs_stress.sh -w 10 -s 160000 -t 10
>
> it leads to a crash within 10secs BUT ONLY with 64KB page Kconfig AND ONLY if
> the above wsize workaround is NOT applied. (or the cleanup-code trick mentioned
> in the emails) (the choice of the -s size parameter seemed sensible in determine
> how quick it will die...)
>
> BUT UNFORTUNATELY this was true ONLY when running on an AEMv8 FastModel (1-cpu
> A53) (whose timings are much different from a real board); I've never been able
> to reproduce reliably on real ARM64 silicon instead. (or on x86)
> So all my debug and triage was made on the model once I was able to quickly
> reproduce the same crash (and in fact the workaround worked then fine also on
> silicon...)
>
> On real silicon instead the only reproducer was a full LTP run: we had
> consistent failures every night with the same exact refcount stacktrace (but
> every time on a different LTP test as a trigger...being related to NFS activity
> I suppose it's normal); since we applied the wsize workaround we saw no more
> crashes.
>

I'll try to have a look at my old notes, but afaicr my suspicion at the time
was that on some NFS code-path in the early life of the nfs_page a refcount +1
is taken conditionally on some other nfs_page flags, so that a race could have
happened between the flags check and the +1 itself (and this could have been
influenced by the fact that a 64kb page took longer to be written out in 4kb
wsize chunks...).
One thing which I remember from the ftracing is that this happened after
~10 secs on a flock of pages... it was not only one... you ended up with 3-5 or
more nfs_pages linked among themselves, all with a dirty wrong refcount... and
those nfs_page addresses were perfectly fine until a while before (so the same
nfs_page which I could see going through a clean free with a proper refcount,
when re-used later in the test, ended up being freed ahead of time due to a
miscounted refcount).
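
If it helps when you look at it: a code-path of that shape (as far as I can
reconstruct from memory, so take this as an approximate paraphrase of the
4.19-era fs/nfs/pagelist.c, not verbatim) is the subrequest branch of
nfs_page_group_init(), where an extra reference on the new subrequest is taken
only if the head request already carries the PG_INODE_REF flag:

	/* nfs_page_group_init(req, prev) -- subrequest case, trimmed */
	kref_get(&req->wb_head->wb_kref);	/* subrequests pin the head request */

	/* take the extra inode reference only if the head request
	 * already got one from the write/commit path */
	if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags)) {
		set_bit(PG_INODE_REF, &req->wb_flags);
		kref_get(&req->wb_kref);
		/* ...and the inode's request count is bumped too */
	}

and with 64kb pages written out in smaller wsize chunks this path is exercised
much more (one page -> several subrequests) than with 4kb pages, which would
fit both the 64kb-only symptom and the wsize=65536 workaround.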

Regards

Cristian
>
> Thanks
>
> Regards
>
> Cristian
>
>>
>> Ben
>>
>