2004-01-09 01:01:08

by Nigel Cunningham

[permalink] [raw]
Subject: Is this too ugly to merge?

Hi all.

I'm wanting to the opinion, if I may, of more experienced people
regarding changes I have implemented in my version of Software Suspend,
which I want to merge with Patrick and Pavel. Since I'm don't expect
that you're all familiar with how my version works, I'll give a fair bit
of background before I come to the question.

One of the problems I ran into in developing the code was the issue of
getting activity stopped so that (1) locks which are required in writing
the image aren't still held, (2) we can ensure that dirty buffers are
synced to disk and (3) freezing doesn't fail because of races between
processes entering the freezer. Point three is especially important.
With the implementation in the kernel at the moment, deadlocks can
easily happen under load. (I could provide examples but I'm sure you can
imagine).

To get around these problems, I tried a number of different approaches.
In the end, the one that has worked best has been to add hooks to the
entrance and exit for critical paths in the kernel, and maintain a count
of the number processes in those sections. There are also hooks to
temporarily decrement the counter at points where a thread can block in
kernel code, in cases where it can safely sit there until resume time.

These hooks also provide a means whereby processes that want to begin
work on a critical path can be held until post-resume.

Finally, processes can be marked as needed for syncing ('syncthreads'),
and allowed to continue through these hooks where 'normal' threads would
be held.

When we want to freeze activity, then, it is simply a matter of toggling
a flag and waiting for the number of active processes to reduce to zero.
During this time, user space threads that want to start new activity are
frozen (via the hook at the start of the critical path they try to
enter) until post suspend. Threads already in critical sections run
until they exit the critical path or pause at one of the 'safe points',
and syncthreads such as kjournald run normally.

Once the count reaches zero, we call sys sync to write dirty buffers
out. This is important because in my version, the image is written out
in two parts. First I write all pages on the active and inactive lists
to disk, then I make an atomic copy of the remaining pages, using both
the pages on those lists and (where necessary) free RAM. The number of
pages on the active & inactive lists is normally around 80% of RAM used,
so this allows a full image of memory to be written. Of course there are
other measures taken to ensure consistency of the image, but I digress.
Syncing, then, is important because it helps minimise damage if
something goes wrong during the suspend.

Now to the question. As you'll no doubt have guess, adding hooks for
freezing processes requires changes in a lot of places. I've tried to
make these as simple and clear as I can by using macros:

#define DECLARE_SWSUSP_LOCAL_VAR \
unsigned long swsusp_orig_value = (current->flags & PF_FRIDGE_WAIT); \
int swsusp_local_paused = 0;

#define SWSUSP_THREAD_FLAGS_RESET \
current->flags &= ~PF_FRIDGE_WAIT; \
swsusp_orig_value = 0;

#define SWSUSP_ACTIVITY_START(flags) do { \
(void)(swsusp_orig_value); \
(void)(swsusp_local_paused); \
if (!swsusp_orig_value) \
__swsusp_activity_start(flags, __FUNCTION__, __FILE__); \
} while(0)

#define SWSUSP_ACTIVITY_RESTARTING(freezer_flags) do { \
(void)(swsusp_orig_value); \
(void)(swsusp_local_paused); \
if (swsusp_local_paused) { \
__swsusp_activity_start(freezer_flags, __FUNCTION__, __FILE__); \
swsusp_local_paused = 0; \
} \
if (unlikely(current->flags & PF_FREEZE)) \
refrigerator(freezer_flags); \
} while(0)

#define SWSUSP_ACTIVITY_END do { \
if (!swsusp_orig_value) { \
if (current->flags & PF_FRIDGE_WAIT) { \
current->flags &= ~PF_FRIDGE_WAIT; \
atomic_dec(&swsusp_num_active); \
if ((u32)atomic_read(&swsusp_num_active) > 0xffff) { \
printk("Error: decremented swsusp_num_active below 0 in %s (%s).\n",
\
__FUNCTION__, \
__FILE__); \
} \
if (idletimeout && !atomic_read(&swsusp_num_active)) { \
printk("swsusp_num_active finally zero after timeout in %s (%s).\n",
\
__FUNCTION__, \
__FILE__); \
} \
} else \
if (likely((!suspend_task) || (current->pid != suspend_task))) { \
printk("Error Ending: %s (%d) set but doesn't have FRIDGE_WAIT now
in %s (%s).\n", \
current->comm, \
current->pid, \
__FUNCTION__, \
__FILE__); \
}; \
}; \
} while(0)

#define SWSUSP_ACTIVITY_PAUSING do { \
if ((current->flags & PF_FRIDGE_WAIT) && \
((swsusp_state & FREEZE_UNREFRIGERATED) || \
(!(current->flags & PF_SYNCTHREAD)))) { \
current->flags &= ~PF_FRIDGE_WAIT; \
atomic_dec(&swsusp_num_active); \
if ((u32)atomic_read(&swsusp_num_active) > 0xffff) { \
printk("Error: decremented swsusp_num_active below 0 in %s (%s).\n",
\
__FUNCTION__, \
__FILE__); \
} \
swsusp_local_paused = 1; \
} \
} while(0)

/* Like the above, but we decrement active count for SYNCTHREADs too.
* Note that the thread can start new activity. Indeed, it might need
* to when we begin our sync.
*/
#define SWSUSP_ACTIVITY_SYNCTHREAD_PAUSING do { \
if (current->flags & PF_FRIDGE_WAIT) { \
current->flags &= ~PF_FRIDGE_WAIT; \
atomic_dec(&swsusp_num_active); \
if ((u32)atomic_read(&swsusp_num_active) > 0xffff) { \
printk("Error: decremented swsusp_num_active below 0 in %s (%s).\n",
\
__FUNCTION__, \
__FILE__); \
} \
swsusp_local_paused = 1; \
} \
} while(0)

My question, then (at last!) is, are these 'too ugly', so that they'd
never get merged? If you do consider them ugly, is it because you'd like
to see better names (lower case? replace SWSUSP?) and/or because you
think the whole idea is ugly and have a better solution?

Thanks in advance for comments and suggestions.

Nigel

--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-01-09 04:43:40

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Nigel Cunningham wrote:

> Hi all.
>
> I'm wanting to the opinion, if I may, of more experienced people
> regarding changes I have implemented in my version of Software Suspend,
> which I want to merge with Patrick and Pavel. Since I'm don't expect
> that you're all familiar with how my version works, I'll give a fair bit
> of background before I come to the question.

Do they all have to be big ugly macros?



2004-01-09 04:49:41

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Well, they manipulate a variable local to the procedure, so I believe
so. I have to record which routine the activity flag was set it,
possibly pass through other routines that could set it and not clear it
until we get back to the routine where it was first set. I'm sure
there'd be another way, but I can only think of more complicated ones,
not simpler and prettier :>

Nigel

On Fri, 2004-01-09 at 17:44, Stephen Hemminger wrote:
> Nigel Cunningham wrote:
>
> > Hi all.
> >
> > I'm wanting to the opinion, if I may, of more experienced people
> > regarding changes I have implemented in my version of Software Suspend,
> > which I want to merge with Patrick and Pavel. Since I'm don't expect
> > that you're all familiar with how my version works, I'll give a fair bit
> > of background before I come to the question.
>
> Do they all have to be big ugly macros?
>
--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-01-09 05:38:23

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Oh, I did forget - there is a bit of debugging code in there that can be
taken out.

Regards,

Nigel

On Fri, 2004-01-09 at 17:44, Stephen Hemminger wrote:
> Nigel Cunningham wrote:
>
> > Hi all.
> >
> > I'm wanting to the opinion, if I may, of more experienced people
> > regarding changes I have implemented in my version of Software Suspend,
> > which I want to merge with Patrick and Pavel. Since I'm don't expect
> > that you're all familiar with how my version works, I'll give a fair bit
> > of background before I come to the question.
>
> Do they all have to be big ugly macros?
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-01-13 10:44:16

by Pavel Machek

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Hi!

> My question, then (at last!) is, are these 'too ugly', so that they'd
> never get merged? If you do consider them ugly, is it because you'd like
> to see better names (lower case? replace SWSUSP?) and/or because you
> think the whole idea is ugly and have a better solution?

Well, problem is not as much with uglyness of those macros, but with
need to patch many files to include those macros. If you can get
number of uses of those macros down to, say, 10, it will get
better. Can't you just do something at each syscall/pagefault
entry/exit?

Ordinarily, process get stopped by sending them SIGSTOP. That suggests
to me that we might be able to reuse existing mechanism... SIGSTOP-ed
task should not hold any locks since it can be SIGSTOPed for very long
time. And if SIGSTOP does not work properly ... well that would be
simple bugfix.

Pavel

--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-13 11:47:47

by Pavel Machek

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Hi!

> I'm wanting to the opinion, if I may, of more experienced people
> regarding changes I have implemented in my version of Software Suspend,
> which I want to merge with Patrick and Pavel. Since I'm don't expect
> that you're all familiar with how my version works, I'll give a fair bit
> of background before I come to the question.
>
> One of the problems I ran into in developing the code was the issue of
> getting activity stopped so that (1) locks which are required in writing
> the image aren't still held, (2) we can ensure that dirty buffers are
> synced to disk and (3) freezing doesn't fail because of races between
> processes entering the freezer. Point three is especially important.
> With the implementation in the kernel at the moment, deadlocks can
> easily happen under load. (I could provide examples but I'm sure you can
> imagine).
>
> To get around these problems, I tried a number of different approaches.
> In the end, the one that has worked best has been to add hooks to the
> entrance and exit for critical paths in the kernel, and maintain a count
> of the number processes in those sections. There are also hooks to
> temporarily decrement the counter at points where a thread can block in
> kernel code, in cases where it can safely sit there until resume time.
>
> These hooks also provide a means whereby processes that want to begin
> work on a critical path can be held until post-resume.
>
> Finally, processes can be marked as needed for syncing ('syncthreads'),
> and allowed to continue through these hooks where 'normal' threads would
> be held.
>
> When we want to freeze activity, then, it is simply a matter of toggling
> a flag and waiting for the number of active processes to reduce to zero.
> During this time, user space threads that want to start new activity are
> frozen (via the hook at the start of the critical path they try to
> enter) until post suspend. Threads already in critical sections run
> until they exit the critical path or pause at one of the 'safe points',
> and syncthreads such as kjournald run normally.

Okay, I can now remember (and agree to) that we need to suspend
userspace first, and only then suspend kernelspace. Bug I don't see
why we can't suspend userspace using old, SIGSTOP-like, method.

Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

2004-01-19 00:39:34

by Nigel Cunningham

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Hi.

I'll take the following approach: I'm pretty sure that I've tried
suspending userspace before kernel threads before, and still ran into
possible deadlocks. Nevertheless, I'll set that up and then bang really
hard on it using Michael's test scripts, and let you know the results.
If and when we see that this approach won't cut the mustard, we can come
back to considering this approach. Sound okay?

By the way, the macros are not an alternative to the SIGSTOP-like
method. They're used with it, to ensure it works without deadlocking.

Regards,

Nigel

On Wed, 2004-01-14 at 00:49, Pavel Machek wrote:
> Okay, I can now remember (and agree to) that we need to suspend
> userspace first, and only then suspend kernelspace. Bug I don't see
> why we can't suspend userspace using old, SIGSTOP-like, method.

--
My work on Software Suspend is graciously brought to you by
LinuxFund.org.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-01-19 09:52:38

by Pavel Machek

[permalink] [raw]
Subject: Re: Is this too ugly to merge?

Hi!

> I'll take the following approach: I'm pretty sure that I've tried
> suspending userspace before kernel threads before, and still ran into
> possible deadlocks. Nevertheless, I'll set that up and then bang really
> hard on it using Michael's test scripts, and let you know the results.
> If and when we see that this approach won't cut the mustard, we can come
> back to considering this approach. Sound okay?

Well, I'd say "if this approach won't cut the mustard, and we
understand *why* it does not work". If we can find a way to deadlock
plain SIGSTOP, that's good, too, because someone will likely fix
it. (And if SIGSTOP is broken, intrusive patch is okay for fixing
that).
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]