swapoff clear swap_info's SWP_USED flag prematurely and free its resources
after that. A concurrent swapon will reuse this swap_info while its previous
resources are not cleared completely.
These late freed resources are:
- p->percpu_cluster
- swap_cgroup_ctrl[type]
- block_device setting
- inode->i_flags &= ~S_SWAPFILE
This patch clear SWP_USED flag after all its resources freed, so that swapon
can reuse this swap_info by alloc_swap_info() safely.
Signed-off-by: Weijie Yang <[email protected]>
---
mm/swapfile.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 612a7c9..89071c3
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
p->swap_map = NULL;
cluster_info = p->cluster_info;
p->cluster_info = NULL;
- p->flags = 0;
frontswap_map = frontswap_map_get(p);
spin_unlock(&p->lock);
spin_unlock(&swap_lock);
@@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
mutex_unlock(&inode->i_mutex);
}
filp_close(swap_file, NULL);
+
+ /*
+ * clear SWP_USED flag after all resources freed
+ * so that swapon can reuse this swap_info in alloc_swap_info() safely
+ * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
+ */
+ spin_lock(&swap_lock);
+ p->flags = 0;
+ spin_unlock(&swap_lock);
+
err = 0;
atomic_inc(&proc_poll_event);
wake_up_interruptible(&proc_poll_wait);
--
1.7.10.4
On Thu, 09 Jan 2014 13:39:55 +0800 Weijie Yang <[email protected]> wrote:
> swapoff clear swap_info's SWP_USED flag prematurely and free its resources
> after that. A concurrent swapon will reuse this swap_info while its previous
> resources are not cleared completely.
>
> These late freed resources are:
> - p->percpu_cluster
> - swap_cgroup_ctrl[type]
> - block_device setting
> - inode->i_flags &= ~S_SWAPFILE
>
> This patch clear SWP_USED flag after all its resources freed, so that swapon
> can reuse this swap_info by alloc_swap_info() safely.
>
> ...
>
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> p->swap_map = NULL;
> cluster_info = p->cluster_info;
> p->cluster_info = NULL;
> - p->flags = 0;
> frontswap_map = frontswap_map_get(p);
> spin_unlock(&p->lock);
> spin_unlock(&swap_lock);
> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> mutex_unlock(&inode->i_mutex);
> }
> filp_close(swap_file, NULL);
> +
> + /*
> + * clear SWP_USED flag after all resources freed
> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
> + */
> + spin_lock(&swap_lock);
> + p->flags = 0;
> + spin_unlock(&swap_lock);
> +
> err = 0;
> atomic_inc(&proc_poll_event);
> wake_up_interruptible(&proc_poll_wait);
I'm scratching my head over the swap_lock use here. Is it being used
appropriately, is it the correct lock, etc.
swap_start() and friends are playing with SWP_USED, but they're using
swapon_mutex. I wonder if a well-timed read of /proc/swaps could cause
problems.
The swapfile.c code does not make for pleasant reading :(
On Thu, 09 Jan 2014 13:39:55 +0800 Weijie Yang <[email protected]> wrote:
> swapoff clear swap_info's SWP_USED flag prematurely and free its resources
> after that. A concurrent swapon will reuse this swap_info while its previous
> resources are not cleared completely.
>
> These late freed resources are:
> - p->percpu_cluster
> - swap_cgroup_ctrl[type]
> - block_device setting
> - inode->i_flags &= ~S_SWAPFILE
>
> This patch clear SWP_USED flag after all its resources freed, so that swapon
> can reuse this swap_info by alloc_swap_info() safely.
>
> ...
>
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> p->swap_map = NULL;
> cluster_info = p->cluster_info;
> p->cluster_info = NULL;
> - p->flags = 0;
> frontswap_map = frontswap_map_get(p);
> spin_unlock(&p->lock);
> spin_unlock(&swap_lock);
> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> mutex_unlock(&inode->i_mutex);
> }
> filp_close(swap_file, NULL);
> +
> + /*
> + * clear SWP_USED flag after all resources freed
> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
> + */
> + spin_lock(&swap_lock);
> + p->flags = 0;
> + spin_unlock(&swap_lock);
> +
> err = 0;
> atomic_inc(&proc_poll_event);
> wake_up_interruptible(&proc_poll_wait);
I didn't look too closely, but this patch might also address the race
which Krzysztof addressed with
http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch.
Can we please check that out?
I do prefer fixing all these swapon-vs-swapoff races with some large,
simple, wide-scope exclusion scheme. Perhaps SWP_USED is that scheme.
An alternative would be to add another mutex and just make sys_swapon()
and sys_swapoff() 100% exclusive. But that is plastering yet another
lock over this mess to hide the horrors which lurk within :(
On Sat, Jan 11, 2014 at 9:11 AM, Andrew Morton
<[email protected]> wrote:
> On Thu, 09 Jan 2014 13:39:55 +0800 Weijie Yang <[email protected]> wrote:
>
>> swapoff clear swap_info's SWP_USED flag prematurely and free its resources
>> after that. A concurrent swapon will reuse this swap_info while its previous
>> resources are not cleared completely.
>>
>> These late freed resources are:
>> - p->percpu_cluster
>> - swap_cgroup_ctrl[type]
>> - block_device setting
>> - inode->i_flags &= ~S_SWAPFILE
>>
>> This patch clear SWP_USED flag after all its resources freed, so that swapon
>> can reuse this swap_info by alloc_swap_info() safely.
>>
>> ...
>>
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> p->swap_map = NULL;
>> cluster_info = p->cluster_info;
>> p->cluster_info = NULL;
>> - p->flags = 0;
>> frontswap_map = frontswap_map_get(p);
>> spin_unlock(&p->lock);
>> spin_unlock(&swap_lock);
>> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> mutex_unlock(&inode->i_mutex);
>> }
>> filp_close(swap_file, NULL);
>> +
>> + /*
>> + * clear SWP_USED flag after all resources freed
>> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
>> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
>> + */
>> + spin_lock(&swap_lock);
>> + p->flags = 0;
>> + spin_unlock(&swap_lock);
>> +
>> err = 0;
>> atomic_inc(&proc_poll_event);
>> wake_up_interruptible(&proc_poll_wait);
>
> I didn't look too closely, but this patch might also address the race
> which Krzysztof addressed with
> http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch.
> Can we please check that out?
>
> I do prefer fixing all these swapon-vs-swapoff races with some large,
> simple, wide-scope exclusion scheme. Perhaps SWP_USED is that scheme.
>
> An alternative would be to add another mutex and just make sys_swapon()
> and sys_swapoff() 100% exclusive. But that is plastering yet another
> lock over this mess to hide the horrors which lurk within :(
>
Hi, Andrew. Thanks for your suggestion.
I checked Krzysztof's patch, it use the global swapon_mutex to protect
race condition among
swapon, swapoff and swap_start(). It is a kind of correct method, but
a heavy method.
I will try to resend a patchset to make lock usage in swapfile.c clear
and fine grit
On Mon, 13 Jan 2014 11:08:58 +0800 Weijie Yang <[email protected]> wrote:
> >> --- a/mm/swapfile.c
> >> +++ b/mm/swapfile.c
> >> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >> p->swap_map = NULL;
> >> cluster_info = p->cluster_info;
> >> p->cluster_info = NULL;
> >> - p->flags = 0;
> >> frontswap_map = frontswap_map_get(p);
> >> spin_unlock(&p->lock);
> >> spin_unlock(&swap_lock);
> >> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >> mutex_unlock(&inode->i_mutex);
> >> }
> >> filp_close(swap_file, NULL);
> >> +
> >> + /*
> >> + * clear SWP_USED flag after all resources freed
> >> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
> >> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
> >> + */
> >> + spin_lock(&swap_lock);
> >> + p->flags = 0;
> >> + spin_unlock(&swap_lock);
> >> +
> >> err = 0;
> >> atomic_inc(&proc_poll_event);
> >> wake_up_interruptible(&proc_poll_wait);
> >
> > I didn't look too closely, but this patch might also address the race
> > which Krzysztof addressed with
> > http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch.
> > Can we please check that out?
> >
> > I do prefer fixing all these swapon-vs-swapoff races with some large,
> > simple, wide-scope exclusion scheme. Perhaps SWP_USED is that scheme.
> >
> > An alternative would be to add another mutex and just make sys_swapon()
> > and sys_swapoff() 100% exclusive. But that is plastering yet another
> > lock over this mess to hide the horrors which lurk within :(
> >
>
> Hi, Andrew. Thanks for your suggestion.
>
> I checked Krzysztof's patch, it use the global swapon_mutex to protect
> race condition among
> swapon, swapoff and swap_start(). It is a kind of correct method, but
> a heavy method.
But do you agree that your
http://ozlabs.org/~akpm/mmots/broken-out/mm-swap-fix-race-on-swap_info-reuse-between-swapoff-and-swapon.patch
makes Krzysztof's
http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
obsolete?
I've been sitting on Krzysztof's
swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
for several months - Hugh had issues with it so I put it on hold and
nothing further happened.
> I will try to resend a patchset to make lock usage in swapfile.c clear
> and fine grit
OK, thanks. In the meanwhile I'm planning on dropping Krzysztof's
patch and merging your patch into 3.14-rc1, which is why I'd like
confirmation that your patch addresses the issues which Krzysztof
identified?
On Mon, Jan 13, 2014 at 11:27 AM, Andrew Morton
<[email protected]> wrote:
> On Mon, 13 Jan 2014 11:08:58 +0800 Weijie Yang <[email protected]> wrote:
>
>> >> --- a/mm/swapfile.c
>> >> +++ b/mm/swapfile.c
>> >> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> >> p->swap_map = NULL;
>> >> cluster_info = p->cluster_info;
>> >> p->cluster_info = NULL;
>> >> - p->flags = 0;
>> >> frontswap_map = frontswap_map_get(p);
>> >> spin_unlock(&p->lock);
>> >> spin_unlock(&swap_lock);
>> >> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> >> mutex_unlock(&inode->i_mutex);
>> >> }
>> >> filp_close(swap_file, NULL);
>> >> +
>> >> + /*
>> >> + * clear SWP_USED flag after all resources freed
>> >> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
>> >> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
>> >> + */
>> >> + spin_lock(&swap_lock);
>> >> + p->flags = 0;
>> >> + spin_unlock(&swap_lock);
>> >> +
>> >> err = 0;
>> >> atomic_inc(&proc_poll_event);
>> >> wake_up_interruptible(&proc_poll_wait);
>> >
>> > I didn't look too closely, but this patch might also address the race
>> > which Krzysztof addressed with
>> > http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch.
>> > Can we please check that out?
>> >
>> > I do prefer fixing all these swapon-vs-swapoff races with some large,
>> > simple, wide-scope exclusion scheme. Perhaps SWP_USED is that scheme.
>> >
>> > An alternative would be to add another mutex and just make sys_swapon()
>> > and sys_swapoff() 100% exclusive. But that is plastering yet another
>> > lock over this mess to hide the horrors which lurk within :(
>> >
>>
>> Hi, Andrew. Thanks for your suggestion.
>>
>> I checked Krzysztof's patch, it use the global swapon_mutex to protect
>> race condition among
>> swapon, swapoff and swap_start(). It is a kind of correct method, but
>> a heavy method.
>
> But do you agree that your
> http://ozlabs.org/~akpm/mmots/broken-out/mm-swap-fix-race-on-swap_info-reuse-between-swapoff-and-swapon.patch
> makes Krzysztof's
> http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
> obsolete?
Yes, I agree.
> I've been sitting on Krzysztof's
> swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
> for several months - Hugh had issues with it so I put it on hold and
> nothing further happened.
>
>> I will try to resend a patchset to make lock usage in swapfile.c clear
>> and fine grit
>
> OK, thanks. In the meanwhile I'm planning on dropping Krzysztof's
> patch and merging your patch into 3.14-rc1, which is why I'd like
> confirmation that your patch addresses the issues which Krzysztof
> identified?
>
I think so, Krzysztof and I both try to fix the same issue(reuse
swap_info while its
previous resources are not cleared completely). The different is
Krzysztof's patch
uses a global swapon_mutex and its commit log only focuses on set_blocksize(),
while my patch try to maintain the fine grit lock usage.
On Mon, Jan 13, 2014 at 11:51:42AM +0800, Weijie Yang wrote:
> On Mon, Jan 13, 2014 at 11:27 AM, Andrew Morton
> <[email protected]> wrote:
> > On Mon, 13 Jan 2014 11:08:58 +0800 Weijie Yang <[email protected]> wrote:
> >
> >> >> --- a/mm/swapfile.c
> >> >> +++ b/mm/swapfile.c
> >> >> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >> >> p->swap_map = NULL;
> >> >> cluster_info = p->cluster_info;
> >> >> p->cluster_info = NULL;
> >> >> - p->flags = 0;
> >> >> frontswap_map = frontswap_map_get(p);
> >> >> spin_unlock(&p->lock);
> >> >> spin_unlock(&swap_lock);
> >> >> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >> >> mutex_unlock(&inode->i_mutex);
> >> >> }
> >> >> filp_close(swap_file, NULL);
> >> >> +
> >> >> + /*
> >> >> + * clear SWP_USED flag after all resources freed
> >> >> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
> >> >> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
> >> >> + */
> >> >> + spin_lock(&swap_lock);
> >> >> + p->flags = 0;
> >> >> + spin_unlock(&swap_lock);
> >> >> +
> >> >> err = 0;
> >> >> atomic_inc(&proc_poll_event);
> >> >> wake_up_interruptible(&proc_poll_wait);
> > But do you agree that your
> > http://ozlabs.org/~akpm/mmots/broken-out/mm-swap-fix-race-on-swap_info-reuse-between-swapoff-and-swapon.patch
> > makes Krzysztof's
> > http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
> > obsolete?
>
> Yes, I agree.
>
> > I've been sitting on Krzysztof's
> > swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
> > for several months - Hugh had issues with it so I put it on hold and
> > nothing further happened.
> >
> >> I will try to resend a patchset to make lock usage in swapfile.c clear
> >> and fine grit
> >
> > OK, thanks. In the meanwhile I'm planning on dropping Krzysztof's
> > patch and merging your patch into 3.14-rc1, which is why I'd like
> > confirmation that your patch addresses the issues which Krzysztof
> > identified?
> >
>
> I think so, Krzysztof and I both try to fix the same issue(reuse
> swap_info while its
> previous resources are not cleared completely). The different is
> Krzysztof's patch
> uses a global swapon_mutex and its commit log only focuses on set_blocksize(),
> while my patch try to maintain the fine grit lock usage.
>
Maybe I should get some sleep first, but I found some minor nits.
Newly introduced window:
p->swap_map == NULL && (p->flags & SWP_USED)
breaks swap_info_get:
if (!(p->flags & SWP_USED))
goto bad_device;
offset = swp_offset(entry);
if (offset >= p->max)
goto bad_offset;
if (!p->swap_map[offset])
goto bad_free;
so that would need a trivial adjustment.
Another nit is that swap_start and swap_next do the following:
if (!(si->flags & SWP_USED) || !si->swap_map)
continue;
Testing for swap_map does not look very nice and regardless of your
patch the latter cannot be true if the former is not, thus the check
can be simplified to mere !si->swap_map.
I'm wondering if it would make sense to dedicate a flag (SWP_ALLOCATED?)
to control whether swapon can use give swap_info. That is, it would be
tested and set in alloc_swap_info & cleared like you clear SWP_USED now.
SWP_USED would be cleared as it is and would be set in _enable_swap_info
Then swap_info_get would be left unchanged and swap_* would test for
SWP_USED only.
--
Mateusz Guzik
On Mon, Jan 13, 2014 at 2:27 PM, Mateusz Guzik <[email protected]> wrote:
> On Mon, Jan 13, 2014 at 11:51:42AM +0800, Weijie Yang wrote:
>> On Mon, Jan 13, 2014 at 11:27 AM, Andrew Morton
>> <[email protected]> wrote:
>> > On Mon, 13 Jan 2014 11:08:58 +0800 Weijie Yang <[email protected]> wrote:
>> >
>> >> >> --- a/mm/swapfile.c
>> >> >> +++ b/mm/swapfile.c
>> >> >> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> >> >> p->swap_map = NULL;
>> >> >> cluster_info = p->cluster_info;
>> >> >> p->cluster_info = NULL;
>> >> >> - p->flags = 0;
>> >> >> frontswap_map = frontswap_map_get(p);
>> >> >> spin_unlock(&p->lock);
>> >> >> spin_unlock(&swap_lock);
>> >> >> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>> >> >> mutex_unlock(&inode->i_mutex);
>> >> >> }
>> >> >> filp_close(swap_file, NULL);
>> >> >> +
>> >> >> + /*
>> >> >> + * clear SWP_USED flag after all resources freed
>> >> >> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
>> >> >> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
>> >> >> + */
>> >> >> + spin_lock(&swap_lock);
>> >> >> + p->flags = 0;
>> >> >> + spin_unlock(&swap_lock);
>> >> >> +
>> >> >> err = 0;
>> >> >> atomic_inc(&proc_poll_event);
>> >> >> wake_up_interruptible(&proc_poll_wait);
>> > But do you agree that your
>> > http://ozlabs.org/~akpm/mmots/broken-out/mm-swap-fix-race-on-swap_info-reuse-between-swapoff-and-swapon.patch
>> > makes Krzysztof's
>> > http://ozlabs.org/~akpm/mmots/broken-out/swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
>> > obsolete?
>>
>> Yes, I agree.
>>
>> > I've been sitting on Krzysztof's
>> > swap-fix-setting-page_size-blocksize-during-swapoff-swapon-race.patch
>> > for several months - Hugh had issues with it so I put it on hold and
>> > nothing further happened.
>> >
>> >> I will try to resend a patchset to make lock usage in swapfile.c clear
>> >> and fine grit
>> >
>> > OK, thanks. In the meanwhile I'm planning on dropping Krzysztof's
>> > patch and merging your patch into 3.14-rc1, which is why I'd like
>> > confirmation that your patch addresses the issues which Krzysztof
>> > identified?
>> >
>>
>> I think so, Krzysztof and I both try to fix the same issue(reuse
>> swap_info while its
>> previous resources are not cleared completely). The different is
>> Krzysztof's patch
>> uses a global swapon_mutex and its commit log only focuses on set_blocksize(),
>> while my patch try to maintain the fine grit lock usage.
>>
>
> Maybe I should get some sleep first, but I found some minor nits.
>
> Newly introduced window:
>
> p->swap_map == NULL && (p->flags & SWP_USED)
>
> breaks swap_info_get:
> if (!(p->flags & SWP_USED))
> goto bad_device;
> offset = swp_offset(entry);
> if (offset >= p->max)
> goto bad_offset;
> if (!p->swap_map[offset])
> goto bad_free;
>
> so that would need a trivial adjustment.
>
Hi, Mateusz. Thanks for review.
It could not happen. swapoff call try_to_unuse() to force all
swp_entries unused before
set p->swap_map NULL. So if somebody still hold a swp_entry by this
time, there must
be some error elsewhere.
Say more about it, I don't think it is a newly introduced window, the
current code set
p->swap_map NULL and then clear p->flags in swapoff, swap_info_get
access these fields
without lock, so this impossible window "exist" for many years.
It is really confusing, that is why I plan to resend a patchset to
make it clear, by comments
at least.
> Another nit is that swap_start and swap_next do the following:
> if (!(si->flags & SWP_USED) || !si->swap_map)
> continue;
>
> Testing for swap_map does not look very nice and regardless of your
> patch the latter cannot be true if the former is not, thus the check
> can be simplified to mere !si->swap_map.
Yes, mere !si->swap_map is enough. But how about use SWP_WRITEOK, I
think it is more
clear and hurt nobody.
> I'm wondering if it would make sense to dedicate a flag (SWP_ALLOCATED?)
> to control whether swapon can use give swap_info. That is, it would be
> tested and set in alloc_swap_info & cleared like you clear SWP_USED now.
> SWP_USED would be cleared as it is and would be set in _enable_swap_info
>
> Then swap_info_get would be left unchanged and swap_* would test for
> SWP_USED only.
I think SWP_USED and SWP_WRITEOK are enough, introduce another flag
would make things
more complex.
The first thing in my opition is make the lock and flag usage more
clear and readable in swapfile.c
If I miss something, plead let me know. Thanks!
> --
> Mateusz Guzik
On Thu, 9 Jan 2014, Weijie Yang wrote:
> swapoff clear swap_info's SWP_USED flag prematurely and free its resources
> after that. A concurrent swapon will reuse this swap_info while its previous
> resources are not cleared completely.
>
> These late freed resources are:
> - p->percpu_cluster
> - swap_cgroup_ctrl[type]
> - block_device setting
> - inode->i_flags &= ~S_SWAPFILE
>
> This patch clear SWP_USED flag after all its resources freed, so that swapon
> can reuse this swap_info by alloc_swap_info() safely.
>
> Signed-off-by: Weijie Yang <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: [email protected]
I've now read through the thread at last, and think this (or akpm's
mm-swap-fix-race-on-swap_info-reuse-between-swapoff-and-swapon.patch
more clearly commented version) is the best of the patches on offer.
I agree that it fixes Krzysztof's set_blocksize issue among others,
and I prefer this one to his. Largely because I dislike swapon_mutex:
it has always felt like one lock too many, so, contrary to akpm, I'm
usually (perhaps irrationally) resistant to extending its use.
swapon_mutex came into existence (as swapon_sem in 2.6.6) to handle a
very specific might_sleep issue where /proc/swaps was using swap_lock.
I may have abused it myself since in swapoff, not sure offhand: but
think of it as proc_swaps_mutex, that's what it's really about.
I'm sorry for derailing the previous discussion with my set_blocksize
doubts: I still don't understand what that's all about, but we didn't
get any clarification, and I now accept that it's safer to go on
doing what we've always done there - plus these fixes.
I think the use of swap_lock below is actually unnecessary, isn't it?
This is the only piece of code that might be writing to p->flags at
this point, and if another piece of code catches the before state
or the after state, so what?
But let's go ahead with
mm-swap-fix-race-on-swap_info-reuse-between-swapoff-and-swapon.patch
as is: no need to remove every redundancy (there is more near here!),
and I may be playing too trickily.
Thanks for the patch: I'll explain in a separate response
why I prefer this to your later 2/8 version.
Hugh
> ---
> mm/swapfile.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 612a7c9..89071c3
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1922,7 +1922,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> p->swap_map = NULL;
> cluster_info = p->cluster_info;
> p->cluster_info = NULL;
> - p->flags = 0;
> frontswap_map = frontswap_map_get(p);
> spin_unlock(&p->lock);
> spin_unlock(&swap_lock);
> @@ -1948,6 +1947,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> mutex_unlock(&inode->i_mutex);
> }
> filp_close(swap_file, NULL);
> +
> + /*
> + * clear SWP_USED flag after all resources freed
> + * so that swapon can reuse this swap_info in alloc_swap_info() safely
> + * it is ok to not hold p->lock after we cleared its SWP_WRITEOK
> + */
> + spin_lock(&swap_lock);
> + p->flags = 0;
> + spin_unlock(&swap_lock);
> +
> err = 0;
> atomic_inc(&proc_poll_event);
> wake_up_interruptible(&proc_poll_wait);
> --
> 1.7.10.4
On Mon, 13 Jan 2014, Weijie Yang wrote:
> On Mon, Jan 13, 2014 at 2:27 PM, Mateusz Guzik <[email protected]> wrote:
> >
> > Newly introduced window:
> >
> > p->swap_map == NULL && (p->flags & SWP_USED)
> >
> > breaks swap_info_get:
> > if (!(p->flags & SWP_USED))
> > goto bad_device;
> > offset = swp_offset(entry);
> > if (offset >= p->max)
> > goto bad_offset;
> > if (!p->swap_map[offset])
> > goto bad_free;
> >
> > so that would need a trivial adjustment.
> >
>
> Hi, Mateusz. Thanks for review.
>
> It could not happen. swapoff call try_to_unuse() to force all
> swp_entries unused before
> set p->swap_map NULL. So if somebody still hold a swp_entry by this
> time, there must be some error elsewhere.
That's not quite the right answer: we would still prefer to issue a
warning than oops on the NULL pointer; the key is that p->max is reset
to 0 before p->swap_map is set to NULL.
But those lines were written in the days before we became so aware of
memory barriers. If I'm to insist on that p->max argument, I should
be adding smp_wmb()s and smp_rmb()s to enforce it.
But y'know, I'm going to leave it as is, and fall back on your
"there must be some error elsewhere" argument to justify not adding
barriers, that have not yet proved to be needed in practice here.
>
> Say more about it, I don't think it is a newly introduced window, the
> current code set
> p->swap_map NULL and then clear p->flags in swapoff, swap_info_get
> access these fields
> without lock, so this impossible window "exist" for many years.
>
> It is really confusing, that is why I plan to resend a patchset to
> make it clear, by comments
> at least.
>
> > Another nit is that swap_start and swap_next do the following:
> > if (!(si->flags & SWP_USED) || !si->swap_map)
> > continue;
> >
> > Testing for swap_map does not look very nice and regardless of your
> > patch the latter cannot be true if the former is not, thus the check
> > can be simplified to mere !si->swap_map.
>
> Yes, mere !si->swap_map is enough. But how about use SWP_WRITEOK, I
> think it is more clear and hurt nobody.
No, I don't like your use of SWP_WRITEOK there in 2/8, for this reason:
it would exclude an area in try_to_unuse() from being shown, and that
function can take such a very long time, that it's rather helpful to
see if it's making slow progress through /proc/swaps or "swapon -s".
Using si->swap_map alone, yes, I guess that would do; or si->max.
With a comment if you wish.
>
> > I'm wondering if it would make sense to dedicate a flag (SWP_ALLOCATED?)
> > to control whether swapon can use give swap_info. That is, it would be
> > tested and set in alloc_swap_info & cleared like you clear SWP_USED now.
> > SWP_USED would be cleared as it is and would be set in _enable_swap_info
> >
> > Then swap_info_get would be left unchanged and swap_* would test for
> > SWP_USED only.
>
> I think SWP_USED and SWP_WRITEOK are enough, introduce another flag
> would make things more complex.
I share your instinct on that.
Hugh
> The first thing in my opition is make the lock and flag usage more
> clear and readable in swapfile.c
>
> If I miss something, plead let me know. Thanks!
>
> > --
> > Mateusz Guzik