Hi,
The patch below is for two races in sysV shared memory.
The first (minor) one is in shmem_free_swp:
swap_free (entry);
*ptr = (swp_entry_t){0};
freed++;
if (!(page = lookup_swap_cache(entry)))
continue;
delete_from_swap_cache(page);
page_cache_release(page);
has a window between the first swap_free() and the
lookup_swap_cache(). If the swap_free() frees the last ref to the
swap entry and another cpu allocates and caches the same entry before
the lookup, we'll end up destroying another task's swap cache.
The second is nastier. shmem_nopage() uses the inode semaphore to
serialise access to shmem_getpage_locked() for paging in shared memory
segments. Lookups in the page cache and in the shmem swap vector are
done to locate the entry. _getpage_ can move entries from swap to
page cache under protection of the shmem's info->lock spinlock.
shmem_writepage() is locked via the page lock, and moves shmem pages
from the page cache to the swap cache under protection of the same
info->lock spinlock.
However, shmem_nopage() does not hold this spinlock while doing its
lookups in the page cache and swap vectors, so it can race with
writepage, with once cpu in the middle of moving the page out of the
page cache in writepage and the other cpu then failing to find the
entry either in the page cache or in the shm swap entry vector.
Feedback welcome.
Cheers,
Stephen
On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
>
> The patch below is for two races in sysV shared memory.
+ spin_lock (&info->lock);
+
+ /* The shmem_swp_entry() call may have blocked, and
+ * shmem_writepage may have been moving a page between the page
+ * cache and swap cache. We need to recheck the page cache
+ * under the protection of the info->lock spinlock. */
+
+ page = find_lock_page(mapping, idx);
Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
Linus
> On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
> >
> > The patch below is for two races in sysV shared memory.
>
> + spin_lock (&info->lock);
> +
> + /* The shmem_swp_entry() call may have blocked, and
> + * shmem_writepage may have been moving a page between the page
> + * cache and swap cache. We need to recheck the page cache
> + * under the protection of the info->lock spinlock. */
> +
> + page = find_lock_page(mapping, idx);
>
> Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
Umm find_lock_page doesnt sleep does it ?
Alan
On Fri, 23 Mar 2001, Alan Cox wrote:
> > On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
> > >
> > > The patch below is for two races in sysV shared memory.
> >
> > + spin_lock (&info->lock);
> > +
> > + /* The shmem_swp_entry() call may have blocked, and
> > + * shmem_writepage may have been moving a page between the page
> > + * cache and swap cache. We need to recheck the page cache
> > + * under the protection of the info->lock spinlock. */
> > +
> > + page = find_lock_page(mapping, idx);
> >
> > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
>
> Umm find_lock_page doesnt sleep does it ?
It certainly does. find_lock_page() -> __find_lock_page() -> lock_page() ->
-> __lock_page() -> schedule().
On Fri, 23 Mar 2001, Alan Cox wrote:
> >
> > + spin_lock (&info->lock);
> > +
> > + /* The shmem_swp_entry() call may have blocked, and
> > + * shmem_writepage may have been moving a page between the page
> > + * cache and swap cache. We need to recheck the page cache
> > + * under the protection of the info->lock spinlock. */
> > +
> > + page = find_lock_page(mapping, idx);
> >
> > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
>
> Umm find_lock_page doesnt sleep does it ?
Sure it does. Note the "lock" in find_lock_page(). It will lock the page,
which implies sleeping if somebody is accessing it at the same time.
If you don't want to sleep, you need to use one of the wrappers for
"__find_page_nolock()". Something like "find_get_page()", which only
"gets" the page.
The naming actually does make sense in this area.
Linus
> > > + page = find_lock_page(mapping, idx);
> > >
> > > Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
> >
> > Umm find_lock_page doesnt sleep does it ?
>
> It certainly does. find_lock_page() -> __find_lock_page() -> lock_page() ->
> -> __lock_page() -> schedule().
Ok I missed the lock page one. Yep.
Alan
> If you don't want to sleep, you need to use one of the wrappers for
> "__find_page_nolock()". Something like "find_get_page()", which only
> "gets" the page.
* a rather lightweight function, finding and getting a reference to a
* hashed page atomically, waiting for it if it's locked.
__find_get_page has I think a misleading comment ?
> The naming actually does make sense in this area.
Yep
Alan Cox writes:
> Umm find_lock_page doesnt sleep does it ?
It does lock_page, which sleeps to get the lock if necessary.
Later,
David S. Miller
[email protected]
On Fri, 23 Mar 2001, Alan Cox wrote:
>
> __find_get_page has I think a misleading comment ?
Ehh..
I only said the _naming_ makes sense. [ Wild hand-waving ]
I suspect that what happened was that we split off the functions (one to
just get the page, one to lock it), and the comment that was associated
with the original "find_page()" never got removed, and just happens to sit
above one of the helper functions now - the one that didn't lock.
I'll fix the comment.
Linus
Hi,
On Fri, Mar 23, 2001 at 11:58:50AM -0800, Linus Torvalds wrote:
> Ehh.. Sleeping with the spin-lock held? Sounds like a truly bad idea.
Uggh --- the shmem code already does, see:
shmem_truncate->shmem_truncate_part->shmem_free_swp->
lookup_swap_cache->find_lock_page
It looks messy: lookup_swap_cache seems to be abusing the page lock
gratuitously, but there are probably callers of it which rely on the
assumption that it performs an implicit wait_on_page().
Rik, do you think it is really necessary to take the page lock and
release it inside lookup_swap_cache? I may be overlooking something,
but I can't see the benefit of it --- we can still race against
page_launder, so the page may still get locked behind our backs after
we get the reference from lookup_swap_cache (page_launder explicitly
avoids taking the pagecache hash spinlock which might avoid this
particular race).
--Stephen
On Sun, 25 Mar 2001, Stephen C. Tweedie wrote:
> Rik, do you think it is really necessary to take the page lock and
> release it inside lookup_swap_cache? I may be overlooking something,
> but I can't see the benefit of it ---
I don't think we need to do this, except to protect us from
using a page which isn't up-to-date yet and locked because
of disk IO.
Reclaim_page() takes the pagecache_lock before trying to
free anything, so there's no reason to lock against that.
regards,
Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/
Hi,
On Sat, Mar 24, 2001 at 10:05:18PM -0300, Rik van Riel wrote:
> On Sun, 25 Mar 2001, Stephen C. Tweedie wrote:
>
> > Rik, do you think it is really necessary to take the page lock and
> > release it inside lookup_swap_cache? I may be overlooking something,
> > but I can't see the benefit of it ---
>
> I don't think we need to do this, except to protect us from
> using a page which isn't up-to-date yet and locked because
> of disk IO.
But it doesn't --- page_launder can try to lock the page after it
checks the refcount, without taking any locks which protect us against
running lookup_swap_cache in parallel. If we get our reference after
page_launder checks the count, we can find the page getting locked out
from underneath our feet.
> Reclaim_page() takes the pagecache_lock before trying to
> free anything, so there's no reason to lock against that.
Exactly. We're not in danger of _losing_ the page, because
reclaim_page is locked more aggressively than page_launder. We still
risk having the page locked against us after lookup_swap_cache does
its own UnlockPage.
So, if lookup_swap_cache doesn't actually ensure that the page is
unlocked, are there any callers which implicitly rely on
lookup_swap_cache() doing a wait_on_page?
--Stephen
Hi Stephen,
On Fri, 23 Mar 2001, Stephen C. Tweedie wrote:
> @@ -234,11 +243,11 @@
> return -ENOMEM;
> }
>
> - spin_lock(&info->lock);
> - shmem_recalc_inode(page->mapping->host);
> entry = shmem_swp_entry(info, page->index);
> if (IS_ERR(entry)) /* this had been allocted on page allocation */
> BUG();
> + spin_lock(&info->lock);
> + shmem_recalc_inode(page->mapping->host);
> error = -EAGAIN;
> if (entry->val) {
> __swap_free(swap, 2);
I think this is wrong. The spinlock protects us against
shmem_truncate. shmem_swp_entry cannot sleep in this case since the
entry is allocated in nopage.
Greetings
Christoph