2006-11-16 08:22:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi,

The following two patches introduce a mechanism that should allow us to
avoid suspend-related corruptions of XFS without the freezing of bdevs which
Pavel considers as too invasive (apart from this, the freezing of bdevs may
lead to some undesirable interactions with dm and for now it seems to be
supported for real by XFS only).

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller


2006-11-16 08:22:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH -mm 1/2] Support for freezeable workqueues

Make it possible to create a workqueue the worker thread of which will be
frozen during suspend, along with other kernel threads.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
include/linux/workqueue.h | 8 +++++---
kernel/workqueue.c | 20 ++++++++++++++------
2 files changed, 19 insertions(+), 9 deletions(-)

Index: linux-2.6.19-rc5-mm2/include/linux/workqueue.h
===================================================================
--- linux-2.6.19-rc5-mm2.orig/include/linux/workqueue.h
+++ linux-2.6.19-rc5-mm2/include/linux/workqueue.h
@@ -55,9 +55,11 @@ struct execute_work {
} while (0)

extern struct workqueue_struct *__create_workqueue(const char *name,
- int singlethread);
-#define create_workqueue(name) __create_workqueue((name), 0)
-#define create_singlethread_workqueue(name) __create_workqueue((name), 1)
+ int singlethread,
+ int freezeable);
+#define create_workqueue(name) __create_workqueue((name), 0, 0)
+#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
+#define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0)

extern void destroy_workqueue(struct workqueue_struct *wq);

Index: linux-2.6.19-rc5-mm2/kernel/workqueue.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/kernel/workqueue.c
+++ linux-2.6.19-rc5-mm2/kernel/workqueue.c
@@ -31,6 +31,7 @@
#include <linux/mempolicy.h>
#include <linux/kallsyms.h>
#include <linux/debug_locks.h>
+#include <linux/freezer.h>

/*
* The per-CPU workqueue (if single thread, we always use the first
@@ -57,6 +58,8 @@ struct cpu_workqueue_struct {
struct task_struct *thread;

int run_depth; /* Detect run_workqueue() recursion depth */
+
+ int freezeable; /* Freeze the thread during suspend */
} ____cacheline_aligned;

/*
@@ -251,7 +254,8 @@ static int worker_thread(void *__cwq)
struct k_sigaction sa;
sigset_t blocked;

- current->flags |= PF_NOFREEZE;
+ if (!cwq->freezeable)
+ current->flags |= PF_NOFREEZE;

set_user_nice(current, -5);

@@ -274,6 +278,9 @@ static int worker_thread(void *__cwq)

set_current_state(TASK_INTERRUPTIBLE);
while (!kthread_should_stop()) {
+ if (cwq->freezeable)
+ try_to_freeze();
+
add_wait_queue(&cwq->more_work, &wait);
if (list_empty(&cwq->worklist))
schedule();
@@ -350,7 +357,7 @@ void fastcall flush_workqueue(struct wor
EXPORT_SYMBOL_GPL(flush_workqueue);

static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq,
- int cpu)
+ int cpu, int freezeable)
{
struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
struct task_struct *p;
@@ -360,6 +367,7 @@ static struct task_struct *create_workqu
cwq->thread = NULL;
cwq->insert_sequence = 0;
cwq->remove_sequence = 0;
+ cwq->freezeable = freezeable;
INIT_LIST_HEAD(&cwq->worklist);
init_waitqueue_head(&cwq->more_work);
init_waitqueue_head(&cwq->work_done);
@@ -375,7 +383,7 @@ static struct task_struct *create_workqu
}

struct workqueue_struct *__create_workqueue(const char *name,
- int singlethread)
+ int singlethread, int freezeable)
{
int cpu, destroy = 0;
struct workqueue_struct *wq;
@@ -395,7 +403,7 @@ struct workqueue_struct *__create_workqu
mutex_lock(&workqueue_mutex);
if (singlethread) {
INIT_LIST_HEAD(&wq->list);
- p = create_workqueue_thread(wq, singlethread_cpu);
+ p = create_workqueue_thread(wq, singlethread_cpu, freezeable);
if (!p)
destroy = 1;
else
@@ -403,7 +411,7 @@ struct workqueue_struct *__create_workqu
} else {
list_add(&wq->list, &workqueues);
for_each_online_cpu(cpu) {
- p = create_workqueue_thread(wq, cpu);
+ p = create_workqueue_thread(wq, cpu, freezeable);
if (p) {
kthread_bind(p, cpu);
wake_up_process(p);
@@ -657,7 +665,7 @@ static int __devinit workqueue_cpu_callb
mutex_lock(&workqueue_mutex);
/* Create a new workqueue thread for it. */
list_for_each_entry(wq, &workqueues, list) {
- if (!create_workqueue_thread(wq, hotcpu)) {
+ if (!create_workqueue_thread(wq, hotcpu, 0)) {
printk("workqueue for %i failed\n", hotcpu);
return NOTIFY_BAD;
}

2006-11-16 08:22:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH -mm 2/2] Use freezeable workqueues in XFS

Make the workqueues used by XFS freezeable, so their worker threads don't
submit any I/O after the suspend image has been created.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
fs/xfs/linux-2.6/xfs_buf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.19-rc5-mm2/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.19-rc5-mm2.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6.19-rc5-mm2/fs/xfs/linux-2.6/xfs_buf.c
@@ -1826,11 +1826,11 @@ xfs_buf_init(void)
if (!xfs_buf_zone)
goto out_free_trace_buf;

- xfslogd_workqueue = create_workqueue("xfslogd");
+ xfslogd_workqueue = create_freezeable_workqueue("xfslogd");
if (!xfslogd_workqueue)
goto out_free_buf_zone;

- xfsdatad_workqueue = create_workqueue("xfsdatad");
+ xfsdatad_workqueue = create_freezeable_workqueue("xfsdatad");
if (!xfsdatad_workqueue)
goto out_destroy_xfslogd_workqueue;

2006-11-16 15:04:06

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH -mm 1/2] Support for freezeable workqueues

Hi!

> Make it possible to create a workqueue the worker thread of which will be
> frozen during suspend, along with other kernel threads.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-16 15:05:43

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH -mm 2/2] Use freezeable workqueues in XFS

Hi!

> Make the workqueues used by XFS freezeable, so their worker threads don't
> submit any I/O after the suspend image has been created.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

Looks okay to me.
Pavel


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-17 00:51:31

by David Chinner

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> Hi,
>
> The following two patches introduce a mechanism that should allow us to
> avoid suspend-related corruptions of XFS without the freezing of bdevs which
> Pavel considers as too invasive (apart from this, the freezing of bdevs may
> lead to some undesirable interactions with dm and for now it seems to be
> supported for real by XFS only).

Has this been tested and proven to fix the problem with XFS? It's
been asserted that this will fix XFS and suspend, but it's
not yet been proven that this is even the problem.

I think the problem is a race between sys_sync, the kernel thread
freeze and the xfsbufd flushing async, delayed write metadata
buffers resulting in a inconsistent suspend image being created.
If this is the case, then freezing the workqueues does not
fix the problem. i.e:

suspend xfs
------- ---
sys_sync completes
xfsbufd flushes delwri metadata
kernel thread freeze
workqueue freeze
suspend image start
async I/O starts to complete
suspend image finishes
async I/O all complete

The problem here is the memory image has an empty delayed write
metadata buffer queue, but the I/O completion queue will be missing
some (or all) of the I/O that was issued, and so on resume we have
a memory image that still thinks the I/Os are progress but they
are not queued anywhere for completion processing.

Hence after a successful resume after the above occurred on suspend,
we can have a filesystem that is potentially inconsistent, and it
will almost certainly hang soon after activity starts again on it
because we cannot push the tail of the log forwards due to the lost
buffers.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-17 15:15:04

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Fri 2006-11-17 11:50:52, David Chinner wrote:
> On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> > Hi,
> >
> > The following two patches introduce a mechanism that should allow us to
> > avoid suspend-related corruptions of XFS without the freezing of bdevs which
> > Pavel considers as too invasive (apart from this, the freezing of bdevs may
> > lead to some undesirable interactions with dm and for now it seems to be
> > supported for real by XFS only).
>
> Has this been tested and proven to fix the problem with XFS? It's
> been asserted that this will fix XFS and suspend, but it's
> not yet been proven that this is even the problem.
>
> I think the problem is a race between sys_sync, the kernel thread
> freeze and the xfsbufd flushing async, delayed write metadata
> buffers resulting in a inconsistent suspend image being created.
> If this is the case, then freezing the workqueues does not
> fix the problem. i.e:
>
> suspend xfs
> ------- ---
> sys_sync completes
> xfsbufd flushes delwri metadata
> kernel thread freeze
> workqueue freeze
> suspend image start
> async I/O starts to complete
> suspend image finishes
> async I/O all complete

This can't happen, because creating suspend image is atomic. (No
interrupts, no DMAs, no drivers running).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-11-17 16:22:49

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Friday, 17 November 2006 01:50, David Chinner wrote:
> On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> > Hi,
> >
> > The following two patches introduce a mechanism that should allow us to
> > avoid suspend-related corruptions of XFS without the freezing of bdevs which
> > Pavel considers as too invasive (apart from this, the freezing of bdevs may
> > lead to some undesirable interactions with dm and for now it seems to be
> > supported for real by XFS only).
>
> Has this been tested and proven to fix the problem with XFS? It's
> been asserted that this will fix XFS and suspend, but it's
> not yet been proven that this is even the problem.

The worker threads of the XFS work queues need not run after all of the other
tasks have been frozen. In particular, they need not run after the suspend
image has been created, so making them freeze is reasonsble anyway.
[BTW, I'm going to do the same with all of the worker threads in the system
that need not run during the suspend.]

The question remains whether it's _sufficient_ to freeze these worker threads
and my opinion is that yes, it should be sufficient.

> I think the problem is a race between sys_sync, the kernel thread
> freeze and the xfsbufd flushing async, delayed write metadata
> buffers resulting in a inconsistent suspend image being created.
> If this is the case, then freezing the workqueues does not
> fix the problem. i.e:
>
> suspend xfs
> ------- ---
> sys_sync completes
> xfsbufd flushes delwri metadata
> kernel thread freeze
> workqueue freeze

freeze device
disable IRQs
freeze system devices (including APICs etc.)

> suspend image start
> async I/O starts to complete

Now please tell me how this is possible?

> suspend image finishes
> async I/O all complete
>
> The problem here is the memory image has an empty delayed write
> metadata buffer queue, but the I/O completion queue will be missing
> some (or all) of the I/O that was issued, and so on resume we have
> a memory image that still thinks the I/Os are progress but they
> are not queued anywhere for completion processing.

Everything that was in memory before the suspend image has been created
will also be there right after the resume, because the only thing that can
access memory while the image is being created is the suspend thread.

I think the only problem that _may_ happen (without the patch) is this:

suspend xfs
------- ----
suspend image start
suspend image finishes
thaw system devices
enable IRQs
thaw devices
xfsbufd flushes delwri metadata
async I/O starts to complete
save image

Now, after a successful resume xfsbufd will attempt to flush the metadata
_once_ _again_, because it doesn't know of what has happend after
creating the image.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-11-20 00:16:43

by David Chinner

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Fri, Nov 17, 2006 at 05:19:30PM +0100, Rafael J. Wysocki wrote:
> On Friday, 17 November 2006 01:50, David Chinner wrote:
> > On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> > Hi,
> > >
> > > The following two patches introduce a mechanism that should allow us to
> > > avoid suspend-related corruptions of XFS without the freezing of bdevs which
> > > Pavel considers as too invasive (apart from this, the freezing of bdevs may
> > > lead to some undesirable interactions with dm and for now it seems to be
> > > supported for real by XFS only).
> >
> > Has this been tested and proven to fix the problem with XFS? It's
> > been asserted that this will fix XFS and suspend, but it's
> > not yet been proven that this is even the problem.
>
> The worker threads of the XFS work queues need not run after all of the other
> tasks have been frozen. In particular, they need not run after the suspend
> image has been created, so making them freeze is reasonsble anyway.

Right, but that doesn't mean that this was the problem being seen.

> [BTW, I'm going to do the same with all of the worker threads in the system
> that need not run during the suspend.]

So you're planning on changing the current default workqueue
implementation rather than needing special freezable workqueues?

> The question remains whether it's _sufficient_ to freeze these worker threads
> and my opinion is that yes, it should be sufficient.

I still don't see how doing this for XFS changes anything at all....

> > I think the problem is a race between sys_sync, the kernel thread
> > freeze and the xfsbufd flushing async, delayed write metadata
> > buffers resulting in a inconsistent suspend image being created.
> > If this is the case, then freezing the workqueues does not
> > fix the problem. i.e:
> >
> > suspend xfs
> > ------- ---
> > sys_sync completes
> > xfsbufd flushes delwri metadata
> > kernel thread freeze
> > workqueue freeze
>
> freeze device
> disable IRQs
> freeze system devices (including APICs etc.)
>
> > suspend image start
> > async I/O starts to complete
>
> Now please tell me how this is possible?

Right, that is not possible, but....

> > suspend image finishes
> > async I/O all complete

... it seems this is.

> > The problem here is the memory image has an empty delayed write
> > metadata buffer queue, but the I/O completion queue will be missing
> > some (or all) of the I/O that was issued, and so on resume we have
> > a memory image that still thinks the I/Os are progress but they
> > are not queued anywhere for completion processing.
>
> Everything that was in memory before the suspend image has been created
> will also be there right after the resume, because the only thing that can
> access memory while the image is being created is the suspend thread.
>
> I think the only problem that _may_ happen (without the patch) is this:
>
> suspend xfs
> ------- ----
> suspend image start
> suspend image finishes
> thaw system devices
> enable IRQs
> thaw devices
> xfsbufd flushes delwri metadata
> async I/O starts to complete
> save image
>
> Now, after a successful resume xfsbufd will attempt to flush the metadata
> _once_ _again_, because it doesn't know of what has happend after
> creating the image.

The question about the scenario you propose above is how does
xfsbufd run at that point in time if kernel threads were frozen
before the system image was taken? However, that is irrelevant
because the XFS image, as you state, would be consistent in memory
in the suspend image because you didn't trip over the sys_sync/
xfsbufd/kernel thread freeze race that takes the filesystem
out of idle state....

I think we are at a stalemate here - you say it's sufficient, I
think it doesn't change anything - and we don't have a repeatable
test case that we can use to find the real problem with and break
the stalemate....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-20 11:02:46

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi all.

I've did some testing this afternoon with bdev freezing disabled and
without any non-vanilla code to freeze kthreads (Rafael's or my older
version).

If I put a BUG_ON() in submit_bio for non suspend2 I/O, it catches this
trace:

submit_bio
xfs_buf_iorequest
xlog_bdstrat_cb
xlog_state_release_iclog
xlog_state_sync_all
xfs_log_force
xfs_syncsub
xfs_sync
vfs_sync
vfs_sync_worker
xfssyncd
keventd_create_kthread

I haven't yet reproduced anything on another code path (eg pdflush).

So, it would appear that freezing kthreads without freezing bdevs should
be a possible solution. It may however leave some I/O unsynced
pre-resume and therefore result in possible dataloss if no resume
occurs. I therefore wonder whether it's better to stick with bdev
freezing or create some variant wherein XFS is taught to fully flush
pending writes and not create new I/O.

Regards,

Nigel

2006-11-20 20:44:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi,

On Monday, 20 November 2006 12:02, Nigel Cunningham wrote:
> Hi all.
>
> I've did some testing this afternoon with bdev freezing disabled and
> without any non-vanilla code to freeze kthreads (Rafael's or my older
> version).

Thanks for testing this.

> If I put a BUG_ON() in submit_bio for non suspend2 I/O, it catches this
> trace:
>
> submit_bio
> xfs_buf_iorequest
> xlog_bdstrat_cb
> xlog_state_release_iclog
> xlog_state_sync_all
> xfs_log_force
> xfs_syncsub
> xfs_sync
> vfs_sync
> vfs_sync_worker
> xfssyncd
> keventd_create_kthread

Well, this trace apparently comes from xfssyncd wich explicitly calls
try_to_freeze(). When does this happen?

> I haven't yet reproduced anything on another code path (eg pdflush).
>
> So, it would appear that freezing kthreads without freezing bdevs should
> be a possible solution. It may however leave some I/O unsynced
> pre-resume and therefore result in possible dataloss if no resume
> occurs. I therefore wonder whether it's better to stick with bdev
> freezing

It looks like xfs is the only filesystem that really implements bdev freezing,
so for other filesystems it's irrelevant. However, it may affect the dm, and
I'm not sure what happens if someone tries to use a swap file on a dm
device for the suspend _after_ we have frozen bdevs.

> or create some variant wherein XFS is taught to fully flush
> pending writes and not create new I/O.

I think we should prevent filesystems from submitting any new I/O after
processes have been frozen, this way or another.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-11-20 20:44:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Monday, 20 November 2006 01:15, David Chinner wrote:
> On Fri, Nov 17, 2006 at 05:19:30PM +0100, Rafael J. Wysocki wrote:
> > On Friday, 17 November 2006 01:50, David Chinner wrote:
> > > On Thu, Nov 16, 2006 at 09:12:49AM +0100, Rafael J. Wysocki wrote:
> > > Hi,
> > > >
> > > > The following two patches introduce a mechanism that should allow us to
> > > > avoid suspend-related corruptions of XFS without the freezing of bdevs which
> > > > Pavel considers as too invasive (apart from this, the freezing of bdevs may
> > > > lead to some undesirable interactions with dm and for now it seems to be
> > > > supported for real by XFS only).
> > >
> > > Has this been tested and proven to fix the problem with XFS? It's
> > > been asserted that this will fix XFS and suspend, but it's
> > > not yet been proven that this is even the problem.
> >
> > The worker threads of the XFS work queues need not run after all of the other
> > tasks have been frozen. In particular, they need not run after the suspend
> > image has been created, so making them freeze is reasonsble anyway.
>
> Right, but that doesn't mean that this was the problem being seen.
>
> > [BTW, I'm going to do the same with all of the worker threads in the system
> > that need not run during the suspend.]
>
> So you're planning on changing the current default workqueue
> implementation rather than needing special freezable workqueues?
>
> > The question remains whether it's _sufficient_ to freeze these worker threads
> > and my opinion is that yes, it should be sufficient.
>
> I still don't see how doing this for XFS changes anything at all....
>
> > > I think the problem is a race between sys_sync, the kernel thread
> > > freeze and the xfsbufd flushing async, delayed write metadata
> > > buffers resulting in a inconsistent suspend image being created.
> > > If this is the case, then freezing the workqueues does not
> > > fix the problem. i.e:
> > >
> > > suspend xfs
> > > ------- ---
> > > sys_sync completes
> > > xfsbufd flushes delwri metadata
> > > kernel thread freeze
> > > workqueue freeze
> >
> > freeze device
> > disable IRQs
> > freeze system devices (including APICs etc.)
> >
> > > suspend image start
> > > async I/O starts to complete
> >
> > Now please tell me how this is possible?
>
> Right, that is not possible, but....
>
> > > suspend image finishes
> > > async I/O all complete
>
> ... it seems this is.

Well, I think the async I/O will be completed before the image is created.
Still, if it's not, then it will be completed again after the resume, and the
same data will be written to the same locations.

> > > The problem here is the memory image has an empty delayed write
> > > metadata buffer queue, but the I/O completion queue will be missing
> > > some (or all) of the I/O that was issued, and so on resume we have
> > > a memory image that still thinks the I/Os are progress but they
> > > are not queued anywhere for completion processing.
> >
> > Everything that was in memory before the suspend image has been created
> > will also be there right after the resume, because the only thing that can
> > access memory while the image is being created is the suspend thread.
> >
> > I think the only problem that _may_ happen (without the patch) is this:
> >
> > suspend xfs
> > ------- ----
> > suspend image start
> > suspend image finishes
> > thaw system devices
> > enable IRQs
> > thaw devices
> > xfsbufd flushes delwri metadata
> > async I/O starts to complete
> > save image
> >
> > Now, after a successful resume xfsbufd will attempt to flush the metadata
> > _once_ _again_, because it doesn't know of what has happend after
> > creating the image.
>
> The question about the scenario you propose above is how does
> xfsbufd run at that point in time if kernel threads were frozen
> before the system image was taken?

Of course xfsbufd won't be running, sorry. I was thinking about xfslogd or
xfsdatad.

> However, that is irrelevant
> because the XFS image, as you state, would be consistent in memory
> in the suspend image because you didn't trip over the sys_sync/
> xfsbufd/kernel thread freeze race that takes the filesystem
> out of idle state....
>
> I think we are at a stalemate here - you say it's sufficient, I
> think it doesn't change anything - and we don't have a repeatable
> test case that we can use to find the real problem with and break
> the stalemate....

I think Nigel has just provided us with a test case. :-)

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-11-20 22:03:36

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi.

On Mon, 2006-11-20 at 21:40 +0100, Rafael J. Wysocki wrote:
> On Monday, 20 November 2006 12:02, Nigel Cunningham wrote:
> > If I put a BUG_ON() in submit_bio for non suspend2 I/O, it catches this
> > trace:
> >
> > submit_bio
> > xfs_buf_iorequest
> > xlog_bdstrat_cb
> > xlog_state_release_iclog
> > xlog_state_sync_all
> > xfs_log_force
> > xfs_syncsub
> > xfs_sync
> > vfs_sync
> > vfs_sync_worker
> > xfssyncd
> > keventd_create_kthread
>
> Well, this trace apparently comes from xfssyncd wich explicitly calls
> try_to_freeze(). When does this happen?

I'm sorry. Since sending the email I looked at this more, deciding in
the end that I had a false positive. I was resetting the flag that
enabled the check after thawing kernel threads instead of before.

I've looked back in logs to this time last year, when we were doing the
modifications that lead to using bdev freezing. This is one instance of
the BUG_ON() triggering then, from a different path. I don't believe
it's bogus :)

Nov 12 11:43:06 home-nb kernel: CPU: 0
Nov 12 11:43:06 home-nb kernel: EIP: 0060:[submit_bio+244/256] Not tainted VLI
Nov 12 11:43:06 home-nb kernel: EIP: 0060:[<c025ff54>] Not tainted VLI
Nov 12 11:43:06 home-nb kernel: EFLAGS: 00010246 (2.6.14d)
Nov 12 11:43:06 home-nb kernel: EIP is at submit_bio+0xf4/0x100
Nov 12 11:43:06 home-nb kernel: eax: 00000003 ebx: 00001000 ecx: 00000202 edx: 00000000
Nov 12 11:43:06 home-nb kernel: esi: c63700c0 edi: 00000001 ebp: df86dec8 esp: df86de80
Nov 12 11:43:07 home-nb kernel: ds: 007b es: 007b ss: 0068
Nov 12 11:43:07 home-nb kernel: Process xfsbufd (pid: 780, threadinfo=df86c000 task=c17cca50)
Nov 12 11:43:07 home-nb kernel: Stack: 0000001c 00000008 00000000 00000000 df86deb4 c016b942 dff45440 00000010
Nov 12 11:43:08 home-nb kernel: 00001000 00001000 00000001 df86dec8 c016bee3 dfcb74f8 c63700c0 00001000
Nov 12 11:43:08 home-nb kernel: 00000000 00000000 df86df14 e0b4dd52 00000001 c63700c0 00001000 00000000
Nov 12 11:43:08 home-nb kernel: Call Trace:
Nov 12 11:43:08 home-nb kernel: [show_stack+127/160] show_stack+0x7f/0xa0
Nov 12 11:43:08 home-nb kernel: [<c010357f>] show_stack+0x7f/0xa0
Nov 12 11:43:08 home-nb kernel: [show_registers+344/464] show_registers+0x158/0x1d0
Nov 12 11:43:09 home-nb kernel: [<c0103718>] show_registers+0x158/0x1d0
Nov 12 11:43:09 home-nb kernel: [die+204/368] die+0xcc/0x170
Nov 12 11:43:09 home-nb kernel: [<c01038fc>] die+0xcc/0x170
Nov 12 11:43:09 home-nb kernel: [do_trap+137/208] do_trap+0x89/0xd0
Nov 12 11:43:09 home-nb kernel: [<c03138c9>] do_trap+0x89/0xd0
Nov 12 11:43:09 home-nb kernel: [do_invalid_op+184/208] do_invalid_op+0xb8/0xd0
Nov 12 11:43:10 home-nb kernel: [<c0103c48>] do_invalid_op+0xb8/0xd0
Nov 12 11:43:10 home-nb kernel: [error_code+79/84] error_code+0x4f/0x54
Nov 12 11:43:10 home-nb kernel: [<c010323b>] error_code+0x4f/0x54
Nov 12 11:43:10 home-nb kernel: [pg0+544066898/1069077504] _pagebuf_ioapply+0x1a2/0x2f0 [xfs]
Nov 12 11:43:10 home-nb kernel: [<e0b4dd52>] _pagebuf_ioapply+0x1a2/0x2f0 [xfs]
Nov 12 11:43:10 home-nb kernel: [pg0+544067295/1069077504] pagebuf_iorequest+0x3f/0x180 [xfs]
Nov 12 11:43:10 home-nb kernel: [<e0b4dedf>] pagebuf_iorequest+0x3f/0x180 [xfs]
Nov 12 11:43:11 home-nb kernel: [pg0+544091130/1069077504] xfs_bdstrat_cb+0x3a/0x50 [xfs]
Nov 12 11:43:11 home-nb kernel: [<e0b53bfa>] xfs_bdstrat_cb+0x3a/0x50 [xfs]
Nov 12 11:43:11 home-nb kernel: [pg0+544069743/1069077504] xfsbufd+0x12f/0x1c0 [xfs]
Nov 12 11:43:11 home-nb kernel: [<e0b4e86f>] xfsbufd+0x12f/0x1c0 [xfs]
Nov 12 11:43:11 home-nb kernel: [kthread+182/256] kthread+0xb6/0x100
Nov 12 11:43:11 home-nb kernel: [<c012c5c6>] kthread+0xb6/0x100
Nov 12 11:43:11 home-nb kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
Nov 12 11:43:12 home-nb kernel: [<c01013a9>] kernel_thread_helper+0x5/0xc
Nov 12 11:43:12 home-nb kernel: Code: 24 0c 8b 81 9c 00 00 00 89 5c 24 04 c7 04 24 70 c7 34 c0 89 44 24 08 e8 1b 99 eb ff e9 63 ff ff ff f6 46 11 01 0f 85 66 ff ff ff <0f> 0b e9 5f ff ff ff 90 8d 74 26 00 55 89 e5 57 56 53 83 ec 2c

Looking at the start of the xfsbufd function, this now supports
freezing, so it shouldn't be an issue anymore, right?

> > So, it would appear that freezing kthreads without freezing bdevs should
> > be a possible solution. It may however leave some I/O unsynced
> > pre-resume and therefore result in possible dataloss if no resume
> > occurs. I therefore wonder whether it's better to stick with bdev
> > freezing
>
> It looks like xfs is the only filesystem that really implements bdev freezing,
> so for other filesystems it's irrelevant. However, it may affect the dm, and
> I'm not sure what happens if someone tries to use a swap file on a dm
> device for the suspend _after_ we have frozen bdevs.
>
> > or create some variant wherein XFS is taught to fully flush
> > pending writes and not create new I/O.
>
> I think we should prevent filesystems from submitting any new I/O after
> processes have been frozen, this way or another.

Yes. We should have some means of properly telling them to put things in
a sane state (however you define sane). It seems to belong between
freezing userspace and freezing kernelspace because we otherwise might
have problems with processes blocking, waiting for I/O and never
entering the refrigerator. But doing it there also creates problems with
(*cough*) fuse. Gah!

I would suggest that other filesystems be encouraged to implement bdev
freezing.

Nigel

2006-11-20 22:22:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi,

On Monday, 20 November 2006 23:03, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2006-11-20 at 21:40 +0100, Rafael J. Wysocki wrote:
> > On Monday, 20 November 2006 12:02, Nigel Cunningham wrote:
> > > If I put a BUG_ON() in submit_bio for non suspend2 I/O, it catches this
> > > trace:
> > >
> > > submit_bio
> > > xfs_buf_iorequest
> > > xlog_bdstrat_cb
> > > xlog_state_release_iclog
> > > xlog_state_sync_all
> > > xfs_log_force
> > > xfs_syncsub
> > > xfs_sync
> > > vfs_sync
> > > vfs_sync_worker
> > > xfssyncd
> > > keventd_create_kthread
> >
> > Well, this trace apparently comes from xfssyncd wich explicitly calls
> > try_to_freeze(). When does this happen?
>
> I'm sorry. Since sending the email I looked at this more, deciding in
> the end that I had a false positive. I was resetting the flag that
> enabled the check after thawing kernel threads instead of before.
>
> I've looked back in logs to this time last year, when we were doing the
> modifications that lead to using bdev freezing. This is one instance of
> the BUG_ON() triggering then, from a different path. I don't believe
> it's bogus :)
>
> Nov 12 11:43:06 home-nb kernel: CPU: 0
> Nov 12 11:43:06 home-nb kernel: EIP: 0060:[submit_bio+244/256] Not tainted VLI
> Nov 12 11:43:06 home-nb kernel: EIP: 0060:[<c025ff54>] Not tainted VLI
> Nov 12 11:43:06 home-nb kernel: EFLAGS: 00010246 (2.6.14d)
> Nov 12 11:43:06 home-nb kernel: EIP is at submit_bio+0xf4/0x100
> Nov 12 11:43:06 home-nb kernel: eax: 00000003 ebx: 00001000 ecx: 00000202 edx: 00000000
> Nov 12 11:43:06 home-nb kernel: esi: c63700c0 edi: 00000001 ebp: df86dec8 esp: df86de80
> Nov 12 11:43:07 home-nb kernel: ds: 007b es: 007b ss: 0068
> Nov 12 11:43:07 home-nb kernel: Process xfsbufd (pid: 780, threadinfo=df86c000 task=c17cca50)
> Nov 12 11:43:07 home-nb kernel: Stack: 0000001c 00000008 00000000 00000000 df86deb4 c016b942 dff45440 00000010
> Nov 12 11:43:08 home-nb kernel: 00001000 00001000 00000001 df86dec8 c016bee3 dfcb74f8 c63700c0 00001000
> Nov 12 11:43:08 home-nb kernel: 00000000 00000000 df86df14 e0b4dd52 00000001 c63700c0 00001000 00000000
> Nov 12 11:43:08 home-nb kernel: Call Trace:
> Nov 12 11:43:08 home-nb kernel: [show_stack+127/160] show_stack+0x7f/0xa0
> Nov 12 11:43:08 home-nb kernel: [<c010357f>] show_stack+0x7f/0xa0
> Nov 12 11:43:08 home-nb kernel: [show_registers+344/464] show_registers+0x158/0x1d0
> Nov 12 11:43:09 home-nb kernel: [<c0103718>] show_registers+0x158/0x1d0
> Nov 12 11:43:09 home-nb kernel: [die+204/368] die+0xcc/0x170
> Nov 12 11:43:09 home-nb kernel: [<c01038fc>] die+0xcc/0x170
> Nov 12 11:43:09 home-nb kernel: [do_trap+137/208] do_trap+0x89/0xd0
> Nov 12 11:43:09 home-nb kernel: [<c03138c9>] do_trap+0x89/0xd0
> Nov 12 11:43:09 home-nb kernel: [do_invalid_op+184/208] do_invalid_op+0xb8/0xd0
> Nov 12 11:43:10 home-nb kernel: [<c0103c48>] do_invalid_op+0xb8/0xd0
> Nov 12 11:43:10 home-nb kernel: [error_code+79/84] error_code+0x4f/0x54
> Nov 12 11:43:10 home-nb kernel: [<c010323b>] error_code+0x4f/0x54
> Nov 12 11:43:10 home-nb kernel: [pg0+544066898/1069077504] _pagebuf_ioapply+0x1a2/0x2f0 [xfs]
> Nov 12 11:43:10 home-nb kernel: [<e0b4dd52>] _pagebuf_ioapply+0x1a2/0x2f0 [xfs]
> Nov 12 11:43:10 home-nb kernel: [pg0+544067295/1069077504] pagebuf_iorequest+0x3f/0x180 [xfs]
> Nov 12 11:43:10 home-nb kernel: [<e0b4dedf>] pagebuf_iorequest+0x3f/0x180 [xfs]
> Nov 12 11:43:11 home-nb kernel: [pg0+544091130/1069077504] xfs_bdstrat_cb+0x3a/0x50 [xfs]
> Nov 12 11:43:11 home-nb kernel: [<e0b53bfa>] xfs_bdstrat_cb+0x3a/0x50 [xfs]
> Nov 12 11:43:11 home-nb kernel: [pg0+544069743/1069077504] xfsbufd+0x12f/0x1c0 [xfs]
> Nov 12 11:43:11 home-nb kernel: [<e0b4e86f>] xfsbufd+0x12f/0x1c0 [xfs]
> Nov 12 11:43:11 home-nb kernel: [kthread+182/256] kthread+0xb6/0x100
> Nov 12 11:43:11 home-nb kernel: [<c012c5c6>] kthread+0xb6/0x100
> Nov 12 11:43:11 home-nb kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
> Nov 12 11:43:12 home-nb kernel: [<c01013a9>] kernel_thread_helper+0x5/0xc
> Nov 12 11:43:12 home-nb kernel: Code: 24 0c 8b 81 9c 00 00 00 89 5c 24 04 c7 04 24 70 c7 34 c0 89 44 24 08 e8 1b 99 eb ff e9 63 ff ff ff f6 46 11 01 0f 85 66 ff ff ff <0f> 0b e9 5f ff ff ff 90 8d 74 26 00 55 89 e5 57 56 53 83 ec 2c
>
> Looking at the start of the xfsbufd function, this now supports
> freezing, so it shouldn't be an issue anymore, right?

Yes, it properly freezes itself so this kind of failure should not happen.

> > > So, it would appear that freezing kthreads without freezing bdevs should
> > > be a possible solution. It may however leave some I/O unsynced
> > > pre-resume and therefore result in possible dataloss if no resume
> > > occurs. I therefore wonder whether it's better to stick with bdev
> > > freezing
> >
> > It looks like xfs is the only filesystem that really implements bdev freezing,
> > so for other filesystems it's irrelevant. However, it may affect the dm, and
> > I'm not sure what happens if someone tries to use a swap file on a dm
> > device for the suspend _after_ we have frozen bdevs.
> >
> > > or create some variant wherein XFS is taught to fully flush
> > > pending writes and not create new I/O.
> >
> > I think we should prevent filesystems from submitting any new I/O after
> > processes have been frozen, this way or another.
>
> Yes. We should have some means of properly telling them to put things in
> a sane state (however you define sane). It seems to belong between
> freezing userspace and freezing kernelspace because we otherwise might
> have problems with processes blocking, waiting for I/O and never
> entering the refrigerator. But doing it there also creates problems with
> (*cough*) fuse. Gah!
>
> I would suggest that other filesystems be encouraged to implement bdev
> freezing.

I think I/O can only be submitted from the process context. Thus if we freeze
all (and I mean _all_) threads that are used by filesystems, including worker
threads, we should effectively prevent fs-related I/O from being submitted
after tasks have been frozen.

This can be done with the help of create_freezeable_workqueue() introduced in
this patch and I'd like to implement it (and there are only a few filesystems
that use work queues).

The freezing of bdevs might be a good solution if:
(1) we were sure it wouldn't interact with dm in a wrong way,
(2) _all_ of the filesystems implemented it.
For now, neither (1) nor (2) are satisfied and we need to know we're safe
_now_.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-11-20 22:26:33

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi.

On Mon, 2006-11-20 at 23:18 +0100, Rafael J. Wysocki wrote:
> I think I/O can only be submitted from the process context. Thus if we freeze
> all (and I mean _all_) threads that are used by filesystems, including worker
> threads, we should effectively prevent fs-related I/O from being submitted
> after tasks have been frozen.

I know that will work. It's what I used to do before the switch to bdev
freezing. I guess I need to look again at why I made the switch. Perhaps
it was just because you guys gave freezing kthreads a bad wrap as too
invasive or something. Bdev freezing is certainly fewer lines of code.

> This can be done with the help of create_freezeable_workqueue() introduced in
> this patch and I'd like to implement it (and there are only a few filesystems
> that use work queues).
>
> The freezing of bdevs might be a good solution if:
> (1) we were sure it wouldn't interact with dm in a wrong way,
> (2) _all_ of the filesystems implemented it.
> For now, neither (1) nor (2) are satisfied and we need to know we're safe
> _now_.

Yeah.

Regards,

Nigel

2006-11-20 22:39:55

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

(Sorry to reply again)

On Tue, 2006-11-21 at 09:26 +1100, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2006-11-20 at 23:18 +0100, Rafael J. Wysocki wrote:
> > I think I/O can only be submitted from the process context. Thus if we freeze
> > all (and I mean _all_) threads that are used by filesystems, including worker
> > threads, we should effectively prevent fs-related I/O from being submitted
> > after tasks have been frozen.
>
> I know that will work. It's what I used to do before the switch to bdev
> freezing. I guess I need to look again at why I made the switch. Perhaps
> it was just because you guys gave freezing kthreads a bad wrap as too
> invasive or something. Bdev freezing is certainly fewer lines of code.

No, it looks like I wrongly believed that XFS was submitting I/O off a
timer, so that freezing kthreads wasn't enough. In that case, it looks
like freezing kthreads should be a good solution.

Regards,

Nigel

2006-11-20 22:59:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Monday, 20 November 2006 23:39, Nigel Cunningham wrote:
> (Sorry to reply again)

(No big deal)

> On Tue, 2006-11-21 at 09:26 +1100, Nigel Cunningham wrote:
> > Hi.
> >
> > On Mon, 2006-11-20 at 23:18 +0100, Rafael J. Wysocki wrote:
> > > I think I/O can only be submitted from the process context. Thus if we freeze
> > > all (and I mean _all_) threads that are used by filesystems, including worker
> > > threads, we should effectively prevent fs-related I/O from being submitted
> > > after tasks have been frozen.
> >
> > I know that will work. It's what I used to do before the switch to bdev
> > freezing. I guess I need to look again at why I made the switch. Perhaps
> > it was just because you guys gave freezing kthreads a bad wrap as too
> > invasive or something. Bdev freezing is certainly fewer lines of code.
>
> No, it looks like I wrongly believed that XFS was submitting I/O off a
> timer, so that freezing kthreads wasn't enough. In that case, it looks
> like freezing kthreads should be a good solution.

Okay, so let's implement it. :-)

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller

2006-11-21 00:51:56

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi.

On Mon, 2006-11-20 at 23:55 +0100, Rafael J. Wysocki wrote:
> On Monday, 20 November 2006 23:39, Nigel Cunningham wrote:
> > (Sorry to reply again)
>
> (No big deal)
>
> > On Tue, 2006-11-21 at 09:26 +1100, Nigel Cunningham wrote:
> > > Hi.
> > >
> > > On Mon, 2006-11-20 at 23:18 +0100, Rafael J. Wysocki wrote:
> > > > I think I/O can only be submitted from the process context. Thus if we freeze
> > > > all (and I mean _all_) threads that are used by filesystems, including worker
> > > > threads, we should effectively prevent fs-related I/O from being submitted
> > > > after tasks have been frozen.
> > >
> > > I know that will work. It's what I used to do before the switch to bdev
> > > freezing. I guess I need to look again at why I made the switch. Perhaps
> > > it was just because you guys gave freezing kthreads a bad wrap as too
> > > invasive or something. Bdev freezing is certainly fewer lines of code.
> >
> > No, it looks like I wrongly believed that XFS was submitting I/O off a
> > timer, so that freezing kthreads wasn't enough. In that case, it looks
> > like freezing kthreads should be a good solution.
>
> Okay, so let's implement it. :-)

Agreed. I'm a bit confused now about what the latest version of your
patches is, but I'll be happy to switch back to kthread freezing in the
next Suspend2 release if it will help with getting them wider testing.

Nigel

2006-11-21 04:21:04

by David Chinner

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

On Tue, Nov 21, 2006 at 09:03:26AM +1100, Nigel Cunningham wrote:
> I've looked back in logs to this time last year, when we were doing the
> modifications that lead to using bdev freezing. This is one instance of
> the BUG_ON() triggering then, from a different path. I don't believe
> it's bogus :)
>
> Nov 12 11:43:06 home-nb kernel: CPU: 0
> Nov 12 11:43:06 home-nb kernel: EIP: 0060:[submit_bio+244/256] Not tainted VLI
> Nov 12 11:43:06 home-nb kernel: EIP: 0060:[<c025ff54>] Not tainted VLI
> Nov 12 11:43:06 home-nb kernel: EFLAGS: 00010246 (2.6.14d)
> Nov 12 11:43:06 home-nb kernel: EIP is at submit_bio+0xf4/0x100
> Nov 12 11:43:06 home-nb kernel: eax: 00000003 ebx: 00001000 ecx: 00000202 edx: 00000000
> Nov 12 11:43:06 home-nb kernel: esi: c63700c0 edi: 00000001 ebp: df86dec8 esp: df86de80
> Nov 12 11:43:07 home-nb kernel: ds: 007b es: 007b ss: 0068
> Nov 12 11:43:07 home-nb kernel: Process xfsbufd (pid: 780, threadinfo=df86c000 task=c17cca50)
> Nov 12 11:43:07 home-nb kernel: Stack: 0000001c 00000008 00000000 00000000 df86deb4 c016b942 dff45440 00000010
> Nov 12 11:43:08 home-nb kernel: 00001000 00001000 00000001 df86dec8 c016bee3 dfcb74f8 c63700c0 00001000
> Nov 12 11:43:08 home-nb kernel: 00000000 00000000 df86df14 e0b4dd52 00000001 c63700c0 00001000 00000000
> Nov 12 11:43:08 home-nb kernel: Call Trace:
> Nov 12 11:43:08 home-nb kernel: [show_stack+127/160] show_stack+0x7f/0xa0
> Nov 12 11:43:08 home-nb kernel: [<c010357f>] show_stack+0x7f/0xa0
> Nov 12 11:43:08 home-nb kernel: [show_registers+344/464] show_registers+0x158/0x1d0
> Nov 12 11:43:09 home-nb kernel: [<c0103718>] show_registers+0x158/0x1d0
> Nov 12 11:43:09 home-nb kernel: [die+204/368] die+0xcc/0x170
> Nov 12 11:43:09 home-nb kernel: [<c01038fc>] die+0xcc/0x170
> Nov 12 11:43:09 home-nb kernel: [do_trap+137/208] do_trap+0x89/0xd0
> Nov 12 11:43:09 home-nb kernel: [<c03138c9>] do_trap+0x89/0xd0
> Nov 12 11:43:09 home-nb kernel: [do_invalid_op+184/208] do_invalid_op+0xb8/0xd0
> Nov 12 11:43:10 home-nb kernel: [<c0103c48>] do_invalid_op+0xb8/0xd0
> Nov 12 11:43:10 home-nb kernel: [error_code+79/84] error_code+0x4f/0x54
> Nov 12 11:43:10 home-nb kernel: [<c010323b>] error_code+0x4f/0x54
> Nov 12 11:43:10 home-nb kernel: [pg0+544066898/1069077504] _pagebuf_ioapply+0x1a2/0x2f0 [xfs]
> Nov 12 11:43:10 home-nb kernel: [<e0b4dd52>] _pagebuf_ioapply+0x1a2/0x2f0 [xfs]
> Nov 12 11:43:10 home-nb kernel: [pg0+544067295/1069077504] pagebuf_iorequest+0x3f/0x180 [xfs]
> Nov 12 11:43:10 home-nb kernel: [<e0b4dedf>] pagebuf_iorequest+0x3f/0x180 [xfs]
> Nov 12 11:43:11 home-nb kernel: [pg0+544091130/1069077504] xfs_bdstrat_cb+0x3a/0x50 [xfs]
> Nov 12 11:43:11 home-nb kernel: [<e0b53bfa>] xfs_bdstrat_cb+0x3a/0x50 [xfs]
> Nov 12 11:43:11 home-nb kernel: [pg0+544069743/1069077504] xfsbufd+0x12f/0x1c0 [xfs]
> Nov 12 11:43:11 home-nb kernel: [<e0b4e86f>] xfsbufd+0x12f/0x1c0 [xfs]
> Nov 12 11:43:11 home-nb kernel: [kthread+182/256] kthread+0xb6/0x100
> Nov 12 11:43:11 home-nb kernel: [<c012c5c6>] kthread+0xb6/0x100
> Nov 12 11:43:11 home-nb kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
> Nov 12 11:43:12 home-nb kernel: [<c01013a9>] kernel_thread_helper+0x5/0xc
> Nov 12 11:43:12 home-nb kernel: Code: 24 0c 8b 81 9c 00 00 00 89 5c 24 04 c7 04 24 70 c7 34 c0 89 44 24 08 e8 1b 99 eb ff e9 63 ff ff ff f6 46 11 01 0f 85 66 ff ff ff <0f> 0b e9 5f ff ff ff 90 8d 74 26 00 55 89 e5 57 56 53 83 ec 2c
>
> Looking at the start of the xfsbufd function, this now supports
> freezing, so it shouldn't be an issue anymore, right?

xfsbufd supported freezing long before this (first supported
in Feb '03), but API changes along the way have broken it at
times. e.g:

----------------------------
revision 1.193
date: 2005/04/22 07:25:00; author: nathans; state: Exp; lines: +0 -0
modid: xfs-linux-melb:xfs-kern:22342a
Resolve an issue with xfsbufd not getting along with swsusp.
----------------------------

This one, reported on 2.6.12-rc2-mm1 indicated the xfsbufd was
getting woken when it was already in the refrigerator. That was
to do with memory reclaim, but that was fixed well before November
last year....

So if there is a problem with xfsbufd running when it shouldn't
be, the problem has not been found nor has it been fixed as the
current code is funtionally identical w.r.t freezing xfsbufds
to November last year..

> > I think we should prevent filesystems from submitting any new I/O after
> > processes have been frozen, this way or another.
>
> Yes. We should have some means of properly telling them to put things in
> a sane state (however you define sane).

IMO, sane state == frozen filesystem.

> I would suggest that other filesystems be encouraged to implement bdev
> freezing.

Get's my vote. ;)

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-21 11:23:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH -mm 0/2] Use freezeable workqueues to avoid suspend-related XFS corruptions

Hi,

On Tuesday, 21 November 2006 01:51, Nigel Cunningham wrote:
> Hi.
>
> On Mon, 2006-11-20 at 23:55 +0100, Rafael J. Wysocki wrote:
> > On Monday, 20 November 2006 23:39, Nigel Cunningham wrote:
> > > (Sorry to reply again)
> >
> > (No big deal)
> >
> > > On Tue, 2006-11-21 at 09:26 +1100, Nigel Cunningham wrote:
> > > > Hi.
> > > >
> > > > On Mon, 2006-11-20 at 23:18 +0100, Rafael J. Wysocki wrote:
> > > > > I think I/O can only be submitted from the process context. Thus if we freeze
> > > > > all (and I mean _all_) threads that are used by filesystems, including worker
> > > > > threads, we should effectively prevent fs-related I/O from being submitted
> > > > > after tasks have been frozen.
> > > >
> > > > I know that will work. It's what I used to do before the switch to bdev
> > > > freezing. I guess I need to look again at why I made the switch. Perhaps
> > > > it was just because you guys gave freezing kthreads a bad wrap as too
> > > > invasive or something. Bdev freezing is certainly fewer lines of code.
> > >
> > > No, it looks like I wrongly believed that XFS was submitting I/O off a
> > > timer, so that freezing kthreads wasn't enough. In that case, it looks
> > > like freezing kthreads should be a good solution.
> >
> > Okay, so let's implement it. :-)
>
> Agreed. I'm a bit confused now about what the latest version of your
> patches is, but I'll be happy to switch back to kthread freezing in the
> next Suspend2 release if it will help with getting them wider testing.

The latest are:

support-for-freezeable-workqueues.patch
use-freezeable-workqueues-in-xfs.patch

(both attached for convenience) and the freezing of bdevs patch has been
dropped.

Greetings,
Rafael


--
You never change things by fighting the existing reality.
R. Buckminster Fuller


Attachments:
(No filename) (1.76 kB)
support-for-freezeable-workqueues.patch (3.92 kB)
use-freezeable-workqueues-in-xfs.patch (773.00 B)
Download all attachments