Hello.

On Thursday 11 May 2006 23:20, Rafael J. Wysocki wrote:
> Hi,
>
> On Thursday 11 May 2006 02:11, Nigel Cunningham wrote:
> > Hi Andrew et al.
> >
> > On Thursday 11 May 2006 09:38, Andrew Morton wrote:
> > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > On Wednesday 10 May 2006 00:27, Andrew Morton wrote:
> > > > > "Rafael J. Wysocki" <[email protected]> wrote:
> > > > > > Now if the mapped pages that are not mapped by the
> > > > > > current task are considered, it turns out that they would change
> > > > > > only if they were reclaimed by try_to_free_pages(). Thus if we
> > > > > > take them out of reach of try_to_free_pages(), for example by
> > > > > > (temporarily) moving them out of their respective LRU lists after
> > > > > > creating the image, we will be able to include them in the image
> > > > > > without copying.
> > > > >
> > > > > I'm a bit curious about how this is true. There are all sorts of
> > > > > > ways in which there could be activity against these pages -
> > > > > interrupt-time asynchronous network Tx completion, async
> > > > > interrupt-time direct-io completion, tasklets, schedule_work(),
> > > > > etc, etc.
> > > >
> > > > AFAIK, many of these things are waited for uninterruptibly, and
> > > > uninterruptible tasks cannot be frozen.
> > >
> > > There can be situations where we won't be waiting on this IO at all.
> > > Network zero-copy transmit, for example.
> > >
> > > Or maybe there's some async writeback going on against pagecache -
> > > we'll end up looking at the page's LRU state within interrupt context
> > > at IO completion. (A sync would prevent this from happening).
> >
> > I believe more than a sync is needed in at least some cases. I've seen
> > XFS continue to submit I/O (presumably on the sb or such like) after
> > everything else has been frozen and data has been synced. Freezing bdevs
> > addressed this.
> >
> > > One possibly problematic scenario is where task A is doing a direct-IO
> > > read and task B truncates the same file - here, the page will be
> > > actually removed from the LRU and freed in interrupt context. The
> > > direct-IO read process will be waiting on the IO in D state though. If
> > > it was a synchronous read - if it was an AIO read then it won't be
> > > waiting on the IO. Something else might save us here, but it's
> > > fragile.
> >
> > Bdev freezing helps here too, right?
>
> Well, I'm not sure. How exactly?

I believe it will ensure that both operations will be completed and the waiter
woken.

> > > > Theoretically we may have a problem if there's an
> > > > interruptible task that waits for the completion of an operation that
> > > > gets finished after snapshotting the system. However that would have
> > > > to survive the syncing of filesystems, freezing of kernel threads,
> > > > freeing of memory as well as suspending and resuming all devices.
> > > > [In which case it would be starving to death. :-)]
> >
> > (For Rafael/Pavel): The swsusp version of the refrigerator signals these
> > processes to enter the freezer too, just in case the uninterruptible task
> > does continue, right?
>
> Uninterruptible tasks are not freezable with the swsusp's freezer at all.
> The other tasks are signaled to enter the refrigerator - first user space,
> then we sync filesystems and finally we freeze kernel threads.

Oooh. It would probably be good if you signalled them to enter the freezer,
just in case, and had the freezer code handle a process entering the path
when we no longer want to freeze.

Regards,

Nigel

2006-05-11 23:50:44

On Tuesday 16 May 2006 05:52, Rafael J. Wysocki wrote:
> Hi,
>
> On Monday 15 May 2006 11:48, Con Kolivas wrote:
> > On Sunday 14 May 2006 08:33, Rafael J. Wysocki wrote:
> > > On Friday 12 May 2006 12:30, Pavel Machek wrote:
> > > > > Please also remember that you are introducing complexity in other
> > > > > ways, with that swap prefetching code and so on. Any comparison in
> > > > > speed should include the time to fault back in pages that have been
> > > > > discarded.
> > > >
> > > > Well, swap prefetching is useful for other workloads, too; so it gets
> > > > developed/tested outside swsusp.
> > >
> > > Still my experience indicates that it doesn't play very nice with
> > > swsusp and unfortunately it hogs the I/O.
> >
> > There is no swap prefetching code linked in any way to swsusp suspend or
> > resume on mainline or -mm. It was a preliminary experiment and Rafael
> > lost interest in it so I never bothered pursuing it.
>
> I'm referring to the code currently in -mm, where kprefetchd sometimes
> starts prefetching like mad after resume which hurts the disk I/O really
> badly (unless I set /proc/sys/vm/swap_prefetch to 0, that is).

Not my experience. It does lots of disk I/O after resume because as you say
yourself there is heaps in swap, but I haven't seen that I/O hurt as it
simply stops the instant I do anything with the machine even as trivial as
moving the mouse.

> I think the problem is related to the fact that swsusp tends to leave quite
> a lot of pages in the swap, if they had to be swapped out before suspend,
> and that makes kprefetchd believe it should get these pages back into RAM,
> which usually is not the greatest idea.
>
> The above is only a speculation, however, and I'd have to investigate it a
> bit more to say something more certain. Anyway, my experience indicates
> that it usually is better to set /proc/sys/vm/swap_prefetch to 0 after
> resume, but YMMV.

My mileage definitely varies here. I don't concentrate on whether the machine
is doing any I/O, but on how usable the machine is, and it's my experience
that it is much better, as about 1/3 of my applications are not floundering
entirely on swap. Fortunately there's a tunable there and it allows you to set
and unset it how you see fit.

Anyway the original point of my response was to point out to Nigel that there
is no added complexity on behalf of swsusp in the swap prefetch code.

--
-ck