2015-06-10 13:40:35

by Dave Jones

Subject: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

Found this on serial console this morning. The machine had rebooted itself shortly
afterwards (surprising, given I don't have panic-on-oops or similar set).

Dave

page:ffffea0002b0a040 count:4 mapcount:0 mapping:ffff8800abf76ad0 index:0x0
flags: 0x4000000000000806(error|referenced|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:745!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 32187 Comm: trinity-c3 Not tainted 4.1.0-rc7-gelk-debug+ #4
task: ffff8800b6bd0a50 ti: ffff8800abf50000 task.ti: ffff8800abf50000
RIP: 0010:[<ffffffffb017292c>] [<ffffffffb017292c>] unlock_page+0x7c/0x80
RSP: 0000:ffff8800abf53a58 EFLAGS: 00010292
RAX: 0000000000000036 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffffb00c9e29 RDI: ffffffffb00c9a44
RBP: ffff8800abf53a58 R08: 0000000000000001 R09: 00000000000007f9
R10: 0000000000000478 R11: ffff8800bb20e848 R12: ffffea0002b0a040
R13: 0000000000000000 R14: 0000000000000fff R15: 0000000000000000
FS: 00007f9c1ea80700(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3ca086f850 CR3: 00000000a8ceb000 CR4: 00000000000007e0
DR0: 00007f1c6d7b0000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
ffff8800abf53b68 ffffffffc015c5ec 0000000000000fff 0000100800000008
ffff8800abf76778 0000000000000000 ffff8800abf53ab8 0000000000000fff
0000000000000000 ffff8800ac281000 ffff8800abf53c08 ffff8800abf76958
Call Trace:
[<ffffffffc015c5ec>] __do_readpage+0x61c/0x7c0 [btrfs]
[<ffffffffc0159873>] ? lock_extent_bits+0x83/0x2e0 [btrfs]
[<ffffffffb00a6581>] ? get_parent_ip+0x11/0x50
[<ffffffffc013fc20>] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[<ffffffffc015631a>] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
[<ffffffffc015c855>] __extent_read_full_page+0xc5/0xe0 [btrfs]
[<ffffffffc013fc20>] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[<ffffffffc015d7b7>] extent_read_full_page+0x37/0x60 [btrfs]
[<ffffffffc013cba5>] btrfs_readpage+0x25/0x30 [btrfs]
[<ffffffffc014cdba>] prepare_uptodate_page+0x4a/0x90 [btrfs]
[<ffffffffc014cf01>] prepare_pages+0x101/0x190 [btrfs]
[<ffffffffc014da43>] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
[<ffffffffc0151653>] btrfs_file_write_iter+0x463/0x570 [btrfs]
[<ffffffffb01d43d1>] __vfs_write+0xb1/0xf0
[<ffffffffb01d4a59>] vfs_write+0xa9/0x1b0
[<ffffffffb06d7c0c>] ? mutex_lock+0x2c/0x40
[<ffffffffb01d5849>] SyS_write+0x49/0xb0
[<ffffffffb01718f3>] ? context_tracking_user_enter+0x13/0x20
[<ffffffffb0012865>] ? syscall_trace_leave+0x95/0x140
[<ffffffffb06d9f17>] system_call_fastpath+0x12/0x6a
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 59 ab f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 10 f4 a2 b0 e8 14 87 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f


2015-06-10 17:43:50

by Chris Mason

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On 06/10/2015 09:40 AM, Dave Jones wrote:
> Found this on serial console this morning. The machine had rebooted itself shortly
> afterwards (surprising, given I don't have panic-on-oops or similar set).
>

We had one other report of this a few months ago. Josef and I read
through all of this and decided it was impossible, so someone else must
be holding on to that page and unlocking it.

(that someone else could easily be btrfs, just not in this code path)

so...what horrible things have you been up to?

-chris

2015-06-10 18:42:34

by Dave Jones

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> On 06/10/2015 09:40 AM, Dave Jones wrote:
> > Found this on serial console this morning. The machine had rebooted itself shortly
> > afterwards (surprising, given I don't have panic-on-oops or similar set).
> >
>
> We had one other report of this a few months ago. Josef and I read
> through all of this and decided it was impossible, so someone else must
> be holding on to that page and unlocking it.
>
> (that someone else could easily be btrfs, just not in this code path)
>
> so...what horrible things have you been up to?

Not sure exactly. I'll try and dig in some when I get home tonight.
I do seem to be able to reproduce it fairly easily at least.
(Twice this morning).

Dave

2015-06-16 17:14:52

by David Sterba

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> On 06/10/2015 09:40 AM, Dave Jones wrote:
> > Found this on serial console this morning. The machine had rebooted itself shortly
> > afterwards (surprising, given I don't have panic-on-oops or similar set).
> >
>
> We had one other report of this a few months ago. Josef and I read
> through all of this and decided it was impossible, so someone else must
> be holding on to that page and unlocking it.
>
> (that someone else could easily be btrfs, just not in this code path)

https://patchwork.kernel.org/patch/6478941/ looks like the fix: the bug
symptoms match the "keywords", though I haven't inspected it closely.

2015-06-16 17:19:38

by Chris Mason

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On 06/16/2015 01:14 PM, David Sterba wrote:
> On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
>> On 06/10/2015 09:40 AM, Dave Jones wrote:
>>> Found this on serial console this morning. The machine had rebooted itself shortly
>>> afterwards (surprising, given I don't have panic-on-oops or similar set).
>>>
>>
>> We had one other report of this a few months ago. Josef and I read
>> through all of this and decided it was impossible, so someone else must
>> be holding on to that page and unlocking it.
>>
>> (that someone else could easily be btrfs, just not in this code path)
>
> https://patchwork.kernel.org/patch/6478941/ looks like the fix: the bug
> symptoms match the "keywords", though I haven't inspected it closely.
>

That one is in my integration-4.2 branch if you want to give it a shot.

-chris

2015-06-17 13:35:59

by Dave Jones

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On Tue, Jun 16, 2015 at 01:19:20PM -0400, Chris Mason wrote:
> On 06/16/2015 01:14 PM, David Sterba wrote:
> > On Wed, Jun 10, 2015 at 01:43:31PM -0400, Chris Mason wrote:
> >> On 06/10/2015 09:40 AM, Dave Jones wrote:
> >>> Found this on serial console this morning. The machine had rebooted itself shortly
> >>> afterwards (surprising, given I don't have panic-on-oops or similar set).
> >>
> >> We had one other report of this a few months ago. Josef and I read
> >> through all of this and decided it was impossible, so someone else must
> >> be holding on to that page and unlocking it.
> >>
> >> (that someone else could easily be btrfs, just not in this code path)
> >
> > https://patchwork.kernel.org/patch/6478941/ looks like the fix: the bug
> > symptoms match the "keywords", though I haven't inspected it closely.
> >
>
> That one is in my integration-4.2 branch if you want to give it a shot.

I was sceptical about this being the same bug, and it looks like I was right.

page:ffffea00027cc640 count:4 mapcount:0 mapping:ffff8800af11d8a0 index:0x0
flags: 0x4000000000000846(error|referenced|active|private)
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:745!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 5931 Comm: trinity-c5 Not tainted 4.1.0-rc8-gelk-debug+ #2
task: ffff8800b9ec0000 ti: ffff8800843ec000 task.ti: ffff8800843ec000
RIP: 0010:[<ffffffffb216ee5c>] [<ffffffffb216ee5c>] unlock_page+0x7c/0x80
RSP: 0018:ffff8800843efa58 EFLAGS: 00010292
RAX: 0000000000000036 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000080000000 RSI: ffffffffb20c80c9 RDI: ffffffffb20c7ce4
RBP: ffff8800843efa58 R08: 0000000000000001 R09: 0000000000000d1d
R10: 000000000000037c R11: 0000000000000001 R12: ffffea00027cc640
R13: 0000000000000000 R14: 0000000000000fff R15: 0000000000000000
FS: 00007fc9c42b5700(0000) GS:ffff8800bf700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000050978000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
ffff8800843efb68 ffffffffc02d06ec 0000000000000fff 0000100800000008
ffff8800af11d548 0000000000000000 ffff8800843efab8 0000000000000fff
0000000000000000 ffff88009f319000 ffff8800843efc08 ffff8800af11d728
Call Trace:
[<ffffffffc02d06ec>] __do_readpage+0x61c/0x7c0 [btrfs]
[<ffffffffc02cd973>] ? lock_extent_bits+0x83/0x2e0 [btrfs]
[<ffffffffb20a5001>] ? get_parent_ip+0x11/0x50
[<ffffffffc02b3ca0>] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[<ffffffffc02ca41a>] ? btrfs_lookup_ordered_extent+0x9a/0xd0 [btrfs]
[<ffffffffc02d0955>] __extent_read_full_page+0xc5/0xe0 [btrfs]
[<ffffffffc02b3ca0>] ? btrfs_real_readdir+0x5e0/0x5e0 [btrfs]
[<ffffffffc02d18b7>] extent_read_full_page+0x37/0x60 [btrfs]
[<ffffffffc02b0c25>] btrfs_readpage+0x25/0x30 [btrfs]
[<ffffffffc02c0e7a>] prepare_uptodate_page+0x4a/0x90 [btrfs]
[<ffffffffc02c0fc1>] prepare_pages+0x101/0x190 [btrfs]
[<ffffffffc02c1b03>] __btrfs_buffered_write+0x1d3/0x650 [btrfs]
[<ffffffffc02c5713>] btrfs_file_write_iter+0x463/0x570 [btrfs]
[<ffffffffb2045eea>] ? bad_area+0x4a/0x60
[<ffffffffb21d05d1>] __vfs_write+0xb1/0xf0
[<ffffffffb21d0c59>] vfs_write+0xa9/0x1b0
[<ffffffffb21d1bd2>] SyS_pwrite64+0x72/0xb0
[<ffffffffb20125d0>] ? syscall_trace_enter_phase2+0x220/0x260
[<ffffffffb2012715>] ? syscall_trace_leave+0x95/0x140
[<ffffffffb26d5b77>] tracesys_phase2+0x84/0x89
Code: 10 48 d3 ee 48 8d 0c b6 48 89 c6 48 8d 3c ca 31 d2 e8 29 ca f4 ff 5d c3 0f 1f 80 00 00 00 00 48 c7 c6 c0 ed a2 b2 e8 f4 84 02 00 <0f> 0b 66 90 66 66 66 66 90 55 85 f6 48 89 e5 75 13 85 d2 74 3f
RIP [<ffffffffb216ee5c>] unlock_page+0x7c/0x80



Still haven't managed to narrow down a reproducer, but it shows up
consistently within 6 hrs or so of fuzzing.

Dave
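
For anyone comparing the two oopses by hand, here is a minimal sketch of a script that pulls out the frames not marked with '?' and diffs them. In x86 oops output a '?' marks an address that was found on the stack but that the unwinder could not confirm as part of the call chain. The function name `reliable_frames` and the abbreviated trace strings are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches one x86 oops call-trace line, e.g.
#   [<ffffffffc015c5ec>] __do_readpage+0x61c/0x7c0 [btrfs]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<guess>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def reliable_frames(trace):
    """Return symbol names of frames not marked '?', innermost call first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("guess"):
            frames.append(match.group("symbol"))
    return frames


# Abbreviated copies of the two traces in this thread (some lines omitted).
TRACE_RC7 = """\
[<ffffffffc015c5ec>] __do_readpage+0x61c/0x7c0 [btrfs]
[<ffffffffc0159873>] ? lock_extent_bits+0x83/0x2e0 [btrfs]
[<ffffffffc015c855>] __extent_read_full_page+0xc5/0xe0 [btrfs]
[<ffffffffc013cba5>] btrfs_readpage+0x25/0x30 [btrfs]
[<ffffffffc014cdba>] prepare_uptodate_page+0x4a/0x90 [btrfs]
[<ffffffffb01d4a59>] vfs_write+0xa9/0x1b0
[<ffffffffb01d5849>] SyS_write+0x49/0xb0
"""

TRACE_RC8 = """\
[<ffffffffc02d06ec>] __do_readpage+0x61c/0x7c0 [btrfs]
[<ffffffffc02cd973>] ? lock_extent_bits+0x83/0x2e0 [btrfs]
[<ffffffffc02d0955>] __extent_read_full_page+0xc5/0xe0 [btrfs]
[<ffffffffc02b0c25>] btrfs_readpage+0x25/0x30 [btrfs]
[<ffffffffc02c0e7a>] prepare_uptodate_page+0x4a/0x90 [btrfs]
[<ffffffffb21d0c59>] vfs_write+0xa9/0x1b0
[<ffffffffb21d1bd2>] SyS_pwrite64+0x72/0xb0
"""

rc7 = reliable_frames(TRACE_RC7)
rc8 = reliable_frames(TRACE_RC8)

shared = [sym for sym in rc7 if sym in rc8]
only_rc7 = [sym for sym in rc7 if sym not in rc8]
only_rc8 = [sym for sym in rc8 if sym not in rc7]

print("shared:  ", shared)
print("rc7 only:", only_rc7)
print("rc8 only:", only_rc8)
```

On these excerpts the btrfs read path is identical in both reports, and only the syscall entry point differs (SyS_write versus SyS_pwrite64).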

2015-06-30 19:20:26

by Dave Jones

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:

> page:ffffea00027cc640 count:4 mapcount:0 mapping:ffff8800af11d8a0 index:0x0
> flags: 0x4000000000000846(error|referenced|active|private)
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> ------------[ cut here ]------------
> kernel BUG at mm/filemap.c:745!

Still occasionally bumping into this.
The 'count:4 mapcount:0' is constant in every instance I've seen
so far. Could that be a clue?

I've seen various page flags, but it's always !locked.

Ideas on additional debugging I could add?

Dave
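
One rough way to check which page-dump fields really stay constant between oopses is to parse the "page:" and "flags:" lines and compare them. The sketch below does that for the two dumps posted in this thread. The helper names (`parse_page_dump`, `constant_fields`, `common_flags`) are made up for illustration, and the regexes assume the dump format shown here rather than every kernel version's format.

```python
import re

PAGE_RE = re.compile(
    r"page:(?P<page>[0-9a-f]+) count:(?P<count>-?\d+) "
    r"mapcount:(?P<mapcount>-?\d+) mapping:(?P<mapping>\S+) "
    r"index:(?P<index>0x[0-9a-f]+)"
)
FLAGS_RE = re.compile(r"flags: (?P<raw>0x[0-9a-f]+)\((?P<names>[^)]*)\)")


def parse_page_dump(text):
    """Extract the page fields and the set of flag names from one dump."""
    page = PAGE_RE.search(text)
    flags = FLAGS_RE.search(text)
    if page is None or flags is None:
        return None
    info = page.groupdict()
    info["flags"] = frozenset(flags.group("names").split("|"))
    return info


def constant_fields(dumps):
    """Return the scalar fields whose value is identical in every dump."""
    first = dumps[0]
    return {
        key: value
        for key, value in first.items()
        if key != "flags" and all(d[key] == value for d in dumps)
    }


def common_flags(dumps):
    """Return the flag names present in every dump."""
    return frozenset.intersection(*(d["flags"] for d in dumps))


# The two page dumps posted in this thread.
RC7_DUMP = """\
page:ffffea0002b0a040 count:4 mapcount:0 mapping:ffff8800abf76ad0 index:0x0
flags: 0x4000000000000806(error|referenced|private)
"""
RC8_DUMP = """\
page:ffffea00027cc640 count:4 mapcount:0 mapping:ffff8800af11d8a0 index:0x0
flags: 0x4000000000000846(error|referenced|active|private)
"""

DUMPS = [parse_page_dump(text) for text in (RC7_DUMP, RC8_DUMP)]

print("constant fields:", constant_fields(DUMPS))
print("flags in every dump:", sorted(common_flags(DUMPS)))
```

For these two dumps, count, mapcount and index are the same, while the flags differ only in "active".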

2015-06-30 19:23:24

by Josef Bacik

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On 06/30/2015 03:20 PM, Dave Jones wrote:
> On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
>
> > page:ffffea00027cc640 count:4 mapcount:0 mapping:ffff8800af11d8a0 index:0x0
> > flags: 0x4000000000000846(error|referenced|active|private)
> > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > ------------[ cut here ]------------
> > kernel BUG at mm/filemap.c:745!
>
> Still occasionally bumping into this.
> The 'count:4 mapcount:0' is constant in every instance I've seen
> so far. Could that be a clue?
>
> I've seen various page flags, but it's always !locked.
>
> Ideas on additional debugging I could add?
>

Huh, I just noticed that PG_error seems to be set. Is that the same
every time? I wonder where that's getting set and why. I'll dig into
the areas where we set it and see if I can spot anything. Thanks,

Josef
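
For reference, the raw flags word in the dumps can be decoded by hand. The sketch below assumes the low-bit ordering of `enum pageflags` from a 4.1-era include/linux/page-flags.h (locked is bit 0, error is bit 1, and so on). That ordering depends on kernel version and config, so treat the table as an assumption. It does agree with the names the kernel printed in both dumps in this thread. The upper bits of the word hold things like zone and node identifiers and are ignored here.

```python
# Assumed bit order for the first page flags in a 4.1-era kernel.
# Index in the list == bit position in page->flags.
PAGE_FLAG_BITS = [
    "locked",        # bit 0
    "error",         # bit 1
    "referenced",    # bit 2
    "uptodate",      # bit 3
    "dirty",         # bit 4
    "lru",           # bit 5
    "active",        # bit 6
    "slab",          # bit 7
    "owner_priv_1",  # bit 8
    "arch_1",        # bit 9
    "reserved",      # bit 10
    "private",       # bit 11
    "private_2",     # bit 12
    "writeback",     # bit 13
]


def decode_page_flags(raw):
    """Return the names of the low page-flag bits set in raw."""
    return [name for bit, name in enumerate(PAGE_FLAG_BITS) if raw & (1 << bit)]


for raw in (0x4000000000000806, 0x4000000000000846):
    names = decode_page_flags(raw)
    print(hex(raw), "|".join(names), "locked" in names)
```

Bit 0 is clear in both values, which is what the VM_BUG_ON_PAGE(!PageLocked(page)) check tripped on, and bit 1 (error) is set in both.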

2015-06-30 19:29:00

by Dave Jones

Subject: Re: [4.1-rc7] btrfs related VM_BUG_ON in filemap.c

On Tue, Jun 30, 2015 at 03:23:00PM -0400, Josef Bacik wrote:
> On 06/30/2015 03:20 PM, Dave Jones wrote:
> > On Wed, Jun 17, 2015 at 09:35:41AM -0400, Dave Jones wrote:
> >
> > > page:ffffea00027cc640 count:4 mapcount:0 mapping:ffff8800af11d8a0 index:0x0
> > > flags: 0x4000000000000846(error|referenced|active|private)
> > > page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> > > ------------[ cut here ]------------
> > > kernel BUG at mm/filemap.c:745!
> >
> > Still occasionally bumping into this.
> > The 'count:4 mapcount:0' is constant in every instance I've seen
> > so far. Could that be a clue?
> >
> > I've seen various page flags, but it's always !locked.
> >
> > Ideas on additional debugging I could add?
>
> Huh, I just noticed that PG_error seems to be set. Is that the same
> every time? I wonder where that's getting set and why. I'll dig into
> the areas where we set it and see if I can spot anything. Thanks,

Seems to be set every time, yeah.
I can try annotating the places that set it to see which one is triggering
when I get home.

Dave