Message-ID: <051052af-3b56-4290-98d3-fd5a1eb11ce1@redhat.com>
Date: Fri, 5 Apr 2024 12:13:05 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Subject: Re: [PATCH v6 2/6] mm: swap: free_swap_and_cache_nr() as batched free_swap_and_cache()
To: Ryan Roberts, Andrew Morton, Matthew Wilcox, Huang Ying, Gao Xiang,
 Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang, Barry Song <21cnbao@gmail.com>,
 Chris Li, Lance Yang
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240403114032.1162100-1-ryan.roberts@arm.com>
 <20240403114032.1162100-3-ryan.roberts@arm.com>
In-Reply-To: <20240403114032.1162100-3-ryan.roberts@arm.com>
On 03.04.24 13:40, Ryan Roberts wrote:
> Now that we no longer have a convenient flag in the cluster to determine
> if a folio is large, free_swap_and_cache() will take a reference and
> lock a large folio much more often, which could lead to contention and
> (e.g.) failure to split large folios, etc.
>
> Let's solve that problem by batch freeing swap and cache with a new
> function, free_swap_and_cache_nr(), to free a contiguous range of swap
> entries together. This allows us to first drop a reference to each swap
> slot before we try to release the cache folio. This means we only try to
> release the folio once, only taking the reference and lock once - much
> better than the previous 512 times for the 2M THP case.
>
> Contiguous swap entries are gathered in zap_pte_range() and
> madvise_free_pte_range() in a similar way to how present ptes are
> already gathered in zap_pte_range().
>
> While we are at it, let's simplify by converting the return type of both
> functions to void. The return value was used only by zap_pte_range() to
> print a bad pte, and was ignored by everyone else, so the extra
> reporting wasn't exactly guaranteed. We will still get the warning with
> most of the information from get_swap_device(). With the batch version,
> we wouldn't know which pte was bad anyway so could print the wrong one.
>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/pgtable.h | 28 ++++++++++++++
>  include/linux/swap.h    | 12 ++++--
>  mm/internal.h           | 48 +++++++++++++++++++++++
>  mm/madvise.c            | 12 ++++--
>  mm/memory.c             | 13 ++++---
>  mm/swapfile.c           | 86 ++++++++++++++++++++++++++++++++---------
>  6 files changed, 167 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index a3fc8150b047..0278259f7078 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -708,6 +708,34 @@ static inline void pte_clear_not_present_full(struct mm_struct *mm,
>  }
>  #endif
>
> +#ifndef clear_not_present_full_ptes
> +/**
> + * clear_not_present_full_ptes - Clear consecutive not present PTEs.

Consecutive only in the page table or also in some other sense? I suspect:
just unrelated non-present entries of any kind (swp, nonswp) and any
offset/pfn. Consider documenting that.

> + * @mm: Address space the ptes represent.
> + * @addr: Address of the first pte.
> + * @ptep: Page table pointer for the first entry.
> + * @nr: Number of entries to clear.
> + * @full: Whether we are clearing a full mm.
> + *
> + * May be overridden by the architecture; otherwise, implemented as a simple
> + * loop over pte_clear_not_present_full().
> + *
> + * Context: The caller holds the page table lock. The PTEs are all not present.
> + * The PTEs are all in the same PMD.
> + */
> +static inline void clear_not_present_full_ptes(struct mm_struct *mm,
> +		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
> +{
> +	for (;;) {
> +		pte_clear_not_present_full(mm, addr, ptep, full);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +	}
> +}
> +#endif
> +
>  #ifndef __HAVE_ARCH_PTEP_CLEAR_FLUSH
>  extern pte_t ptep_clear_flush(struct vm_area_struct *vma,
>  			      unsigned long address,
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index f6f78198f000..5737236dc3ce 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -471,7 +471,7 @@ extern int swap_duplicate(swp_entry_t);
>  extern int swapcache_prepare(swp_entry_t);
>  extern void swap_free(swp_entry_t);
>  extern void swapcache_free_entries(swp_entry_t *entries, int n);
> -extern int free_swap_and_cache(swp_entry_t);
> +extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
>  int swap_type_of(dev_t device, sector_t offset);
>  int find_first_swap(dev_t *device);
>  extern unsigned int count_swap_pages(int, int);
> @@ -520,8 +520,9 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  #define free_pages_and_swap_cache(pages, nr) \
>  	release_pages((pages), (nr));
> [...]
> +
> +/**
> + * swap_pte_batch - detect a PTE batch for a set of contiguous swap entries
> + * @start_ptep: Page table pointer for the first entry.
> + * @max_nr: The maximum number of table entries to consider.
> + * @entry: Swap entry recovered from the first table entry.
> + *
> + * Detect a batch of contiguous swap entries: consecutive (non-present) PTEs
> + * containing swap entries all with consecutive offsets and targeting the same
> + * swap type.
> + *

Likely you should document that any swp pte bits are ignored? ()

> + * max_nr must be at least one and must be limited by the caller so scanning
> + * cannot exceed a single page table.
> + *
> + * Return: the number of table entries in the batch.
> + */
> +static inline int swap_pte_batch(pte_t *start_ptep, int max_nr,
> +		swp_entry_t entry)
> +{
> +	const pte_t *end_ptep = start_ptep + max_nr;
> +	unsigned long expected_offset = swp_offset(entry) + 1;
> +	unsigned int expected_type = swp_type(entry);
> +	pte_t *ptep = start_ptep + 1;
> +
> +	VM_WARN_ON(max_nr < 1);
> +	VM_WARN_ON(non_swap_entry(entry));
> +
> +	while (ptep < end_ptep) {
> +		pte_t pte = ptep_get(ptep);
> +
> +		if (pte_none(pte) || pte_present(pte))
> +			break;
> +
> +		entry = pte_to_swp_entry(pte);
> +
> +		if (non_swap_entry(entry) ||
> +		    swp_type(entry) != expected_type ||
> +		    swp_offset(entry) != expected_offset)
> +			break;
> +
> +		expected_offset++;
> +		ptep++;
> +	}
> +
> +	return ptep - start_ptep;
> +}

Looks very clean :)

I was wondering whether we could similarly construct the expected swp PTE and
only check pte_same():

	expected_pte = __swp_entry_to_pte(__swp_entry(expected_type, expected_offset));

... or have a variant that increases only the swp offset of an existing pte.
Non-trivial, though, due to the arch-dependent format.

But then, we'd fail on a mismatch of the other swp pte bits. On swapin, when
reusing this function (likely!), we might need to make sure that those PTE
bits match as well.

See below regarding uffd-wp.
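Just to sketch the idea (completely untested): assuming a hypothetical
pte_next_swp_offset() arch helper that re-encodes a swap pte with offset + 1
while leaving the remaining swp pte bits (soft-dirty, uffd-wp, ...) untouched,
and taking the first pte instead of the swp entry, the loop could reduce to
one pte_same() check per entry:

	static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
	{
		/* Expected next pte: same type and swp pte bits, offset + 1. */
		pte_t expected_pte = pte_next_swp_offset(pte);
		const pte_t *end_ptep = start_ptep + max_nr;
		pte_t *ptep = start_ptep + 1;

		VM_WARN_ON(max_nr < 1);
		VM_WARN_ON(!is_swap_pte(pte));
		VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));

		while (ptep < end_ptep) {
			pte = ptep_get(ptep);

			/* Whole-pte compare also catches diverging swp pte bits. */
			if (!pte_same(pte, expected_pte))
				break;

			expected_pte = pte_next_swp_offset(expected_pte);
			ptep++;
		}

		return ptep - start_ptep;
	}

That would naturally stop the batch on any swp pte bit mismatch, which is what
we'd want when reusing this on the swapin path anyway.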
>  #endif /* CONFIG_MMU */
>
>  void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 1f77a51baaac..070bedb4996e 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -628,6 +628,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  	struct folio *folio;
>  	int nr_swap = 0;
>  	unsigned long next;
> +	int nr, max_nr;
>
>  	next = pmd_addr_end(addr, end);
>  	if (pmd_trans_huge(*pmd))
> @@ -640,7 +641,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  		return 0;
>  	flush_tlb_batched_pending(mm);
>  	arch_enter_lazy_mmu_mode();
> -	for (; addr != end; pte++, addr += PAGE_SIZE) {
> +	for (; addr != end; pte += nr, addr += PAGE_SIZE * nr) {
> +		nr = 1;
>  		ptent = ptep_get(pte);
>
>  		if (pte_none(ptent))
> @@ -655,9 +657,11 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>
>  			entry = pte_to_swp_entry(ptent);
>  			if (!non_swap_entry(entry)) {
> -				nr_swap--;
> -				free_swap_and_cache(entry);
> -				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +				max_nr = (end - addr) / PAGE_SIZE;
> +				nr = swap_pte_batch(pte, max_nr, entry);
> +				nr_swap -= nr;
> +				free_swap_and_cache_nr(entry, nr);
> +				clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm);
>  			} else if (is_hwpoison_entry(entry) ||
>  				   is_poisoned_swp_entry(entry)) {
>  				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> diff --git a/mm/memory.c b/mm/memory.c
> index 7dc6c3d9fa83..ef2968894718 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1637,12 +1637,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			folio_remove_rmap_pte(folio, page, vma);
>  			folio_put(folio);
>  		} else if (!non_swap_entry(entry)) {
> -			/* Genuine swap entry, hence a private anon page */
> +			max_nr = (end - addr) / PAGE_SIZE;
> +			nr = swap_pte_batch(pte, max_nr, entry);
> +			/* Genuine swap entries, hence a private anon pages */
>  			if (!should_zap_cows(details))
>  				continue;
> -			rss[MM_SWAPENTS]--;
> -			if (unlikely(!free_swap_and_cache(entry)))
> -				print_bad_pte(vma, addr, ptent, NULL);
> +			rss[MM_SWAPENTS] -= nr;
> +			free_swap_and_cache_nr(entry, nr);
>  		} else if (is_migration_entry(entry)) {
>  			folio = pfn_swap_entry_folio(entry);
>  			if (!should_zap_folio(details, folio))
> @@ -1665,8 +1666,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			pr_alert("unrecognized swap entry 0x%lx\n", entry.val);
>  			WARN_ON_ONCE(1);
>  		}
> -		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> -		zap_install_uffd_wp_if_needed(vma, addr, pte, 1, details, ptent);
> +		clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm);

For zap_install_uffd_wp_if_needed(), the uffd-wp bit has to match:
zap_install_uffd_wp_if_needed() uses the uffd-wp information in ptent to
decide whether to place PTE_MARKER_UFFD_WP markers. With a mixture in the
batch, you either lose some markers or place too many.

A simple workaround would be to disable any such batching if uffd-wp is
enabled on the VMA.
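In zap_pte_range() that gate could look roughly like this (completely
untested; madvise_free_pte_range() would want the same treatment), using
userfaultfd_wp() to check whether uffd-wp is armed on the VMA:

	} else if (!non_swap_entry(entry)) {
		nr = 1;
		max_nr = (end - addr) / PAGE_SIZE;
		/*
		 * With uffd-wp armed on the VMA, keep zapping one pte at a
		 * time so zap_install_uffd_wp_if_needed() makes its marker
		 * decision against the matching ptent.
		 */
		if (!userfaultfd_wp(vma))
			nr = swap_pte_batch(pte, max_nr, entry);
		if (!should_zap_cows(details))
			continue;
		rss[MM_SWAPENTS] -= nr;
		free_swap_and_cache_nr(entry, nr);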
> +		zap_install_uffd_wp_if_needed(vma, addr, pte, nr, details, ptent);
>  	} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
>
>  	add_mm_rss_vec(mm, rss);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0d44ee2b4f9c..d059de6896c1 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -130,7 +130,11 @@ static inline unsigned char swap_count(unsigned char ent)
>  /* Reclaim the swap entry if swap is getting full*/
>  #define TTRS_FULL		0x4
>
> -/* returns 1 if swap entry is freed */
> +/*
> + * returns number of pages in the folio that backs the swap entry. If positive,
> + * the folio was reclaimed. If negative, the folio was not reclaimed. If 0, no
> + * folio was associated with the swap entry.
> + */
>  static int __try_to_reclaim_swap(struct swap_info_struct *si,
>  				 unsigned long offset, unsigned long flags)
>  {
> @@ -155,6 +159,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>  		ret = folio_free_swap(folio);
>  		folio_unlock(folio);
>  	}
> +	ret = ret ? folio_nr_pages(folio) : -folio_nr_pages(folio);
>  	folio_put(folio);
>  	return ret;
>  }
> @@ -895,7 +900,7 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>  		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
>  		spin_lock(&si->lock);
>  		/* entry was freed successfully, try to use this again */
> -		if (swap_was_freed)
> +		if (swap_was_freed > 0)
>  			goto checks;
>  		goto scan; /* check next one */
>  	}
> @@ -1572,32 +1577,75 @@ bool folio_free_swap(struct folio *folio)
>  	return true;
>  }
>
> -/*
> - * Free the swap entry like above, but also try to
> - * free the page cache entry if it is the last user.
> - */
> -int free_swap_and_cache(swp_entry_t entry)

Can we have some documentation of what this function expects? How does nr
relate to entry? I.e., the offset range defined by [entry.offset, entry.offset + nr).

> +void free_swap_and_cache_nr(swp_entry_t entry, int nr)
>  {
> -	struct swap_info_struct *p;

It might be easier to follow if you do

	const unsigned long start_offset = swp_offset(entry);
	const unsigned long end_offset = start_offset + nr;

> +	unsigned long end = swp_offset(entry) + nr;
> +	unsigned int type = swp_type(entry);
> +	struct swap_info_struct *si;
> +	bool any_only_cache = false;
> +	unsigned long offset;
>  	unsigned char count;
>
>  	if (non_swap_entry(entry))
> -		return 1;
> +		return;
>
> -	p = get_swap_device(entry);
> -	if (p) {
> -		if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
> -			put_swap_device(p);
> -			return 0;
> +	si = get_swap_device(entry);
> +	if (!si)
> +		return;
> +
> +	if (WARN_ON(end > si->max))
> +		goto out;
> +
> +	/*
> +	 * First free all entries in the range.
> +	 */
> +	for (offset = swp_offset(entry); offset < end; offset++) {
> +		if (!WARN_ON(data_race(!si->swap_map[offset]))) {

Ouch, that "!WARN_ON(!...)" double negation is confusing. I'm sure there is a
better way to write that, maybe using more lines:

	if (data_race(si->swap_map[offset])) {
		...
	} else {
		WARN_ON_ONCE(1);
	}

> +			count = __swap_entry_free(si, swp_entry(type, offset));
> +			if (count == SWAP_HAS_CACHE)
> +				any_only_cache = true;
> +		}
> +	}
> +
> +	/*
> +	 * Short-circuit the below loop if none of the entries had their
> +	 * reference drop to zero.
> +	 */
> +	if (!any_only_cache)
> +		goto out;
>
> -	count = __swap_entry_free(p, entry);
> -	if (count == SWAP_HAS_CACHE)
> -		__try_to_reclaim_swap(p, swp_offset(entry),
> +	/*
> +	 * Now go back over the range trying to reclaim the swap cache. This is
> +	 * more efficient for large folios because we will only try to reclaim
> +	 * the swap once per folio in the common case. If we do
> +	 * __swap_entry_free() and __try_to_reclaim_swap() in the same loop, the
> +	 * latter will get a reference and lock the folio for every individual
> +	 * page but will only succeed once the swap slot for every subpage is
> +	 * zero.
> +	 */
> +	for (offset = swp_offset(entry); offset < end; offset += nr) {
> +		nr = 1;
> +		if (READ_ONCE(si->swap_map[offset]) == SWAP_HAS_CACHE) {

Here we use READ_ONCE() only; above, data_race(). Hmmm.

> +			/*
> +			 * Folios are always naturally aligned in swap so
> +			 * advance forward to the next boundary. Zero means no
> +			 * folio was found for the swap entry, so advance by 1
> +			 * in this case. Negative value means folio was found
> +			 * but could not be reclaimed. Here we can still advance
> +			 * to the next boundary.
> +			 */
> +			nr = __try_to_reclaim_swap(si, offset,
>  					TTRS_UNMAPPED | TTRS_FULL);
> -		put_swap_device(p);
> +			if (nr == 0)
> +				nr = 1;
> +			else if (nr < 0)
> +				nr = -nr;
> +			nr = ALIGN(offset + 1, nr) - offset;
> +		}

Apart from that, nothing jumped out at me.

-- 
Cheers,

David / dhildenb