Date: Sun, 21 Jun 2015 21:37:40 -0400
From: Rik van Riel
To: "Kirill A. Shutemov", Ebru Akagunduz
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
    kirill.shutemov@linux.intel.com, n-horiguchi@ah.jp.nec.com,
    aarcange@redhat.com, iamjoonsoo.kim@lge.com, xiexiuqi@huawei.com,
    gorcunov@openvz.org, linux-kernel@vger.kernel.org, mgorman@suse.de,
    rientjes@google.com, vbabka@suse.cz, aneesh.kumar@linux.vnet.ibm.com,
    hughd@google.com, hannes@cmpxchg.org, mhocko@suse.cz,
    boaz@plexistor.com, raindel@mellanox.com
Subject: Re: [RFC v2 3/3] mm: make swapin readahead to improve thp collapse rate
Message-ID: <558766E4.5020801@redhat.com>
In-Reply-To: <20150621181131.GA6710@node.dhcp.inet.fi>

On 06/21/2015 02:11 PM, Kirill A. Shutemov wrote:
> On Sat, Jun 20, 2015 at 02:28:06PM +0300, Ebru Akagunduz wrote:
>> +static void __collapse_huge_page_swapin(struct mm_struct *mm,
>> +					 struct vm_area_struct *vma,
>> +					 unsigned long address, pmd_t *pmd,
>> +					 pte_t *pte)
>> +{
>> +	unsigned long _address;
>> +	pte_t pteval = *pte;
>> +	int swap_pte = 0;
>> +
>> +	pte = pte_offset_map(pmd, address);
>> +	for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
>> +	     pte++, _address += PAGE_SIZE) {
>> +		pteval = *pte;
>> +		if (is_swap_pte(pteval)) {
>> +			swap_pte++;
>> +			do_swap_page(mm, vma, _address, pte, pmd,
>> +				     FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
>> +				     pteval);
>
> Hm. I guess this is lacking error handling. We really should abort
> early, at least for VM_FAULT_HWPOISON and VM_FAULT_OOM.

Good catch.

>> +			/* pte is unmapped now, we need to map it */
>> +			pte = pte_offset_map(pmd, _address);
>
> No, it's within the same pte page table. It should already be mapped
> by the pte_offset_map() above.

It would be, except do_swap_page() unmaps the pte page table.

>> @@ -2551,6 +2586,8 @@ static void collapse_huge_page(struct mm_struct *mm,
>>  	if (!pmd)
>>  		goto out;
>>  
>> +	__collapse_huge_page_swapin(mm, vma, address, pmd, pte);
>> +
>
> And now the pages we swapped in are not isolated, right? What prevents
> them from being swapped out again, or whatever?

Nothing, but __collapse_huge_page_isolate() runs with the appropriate
locks held, to ensure that by the time we actually collapse the THP,
the pages are present.

The way do_swap_page() is called, khugepaged does not even wait for
pages to be brought in from swap. It only maps in pages that are
already in the swap cache and can be locked immediately (without
waiting). It will also start IO on pages that are not in memory yet,
and will hopefully get those on the next round.

-- 
All rights reversed
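
For reference, the early abort could look something like the untested
sketch below. This is not the actual follow-up patch; it assumes the
do_swap_page() prototype used in the quoted code, and relies on the
existing VM_FAULT_ERROR mask, which covers both VM_FAULT_OOM and
VM_FAULT_HWPOISON:

static void __collapse_huge_page_swapin(struct mm_struct *mm,
					struct vm_area_struct *vma,
					unsigned long address, pmd_t *pmd,
					pte_t *pte)
{
	unsigned long _address;
	pte_t pteval;
	int ret;

	pte = pte_offset_map(pmd, address);
	for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
	     pte++, _address += PAGE_SIZE) {
		pteval = *pte;
		if (!is_swap_pte(pteval))
			continue;
		ret = do_swap_page(mm, vma, _address, pte, pmd,
				   FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
				   pteval);
		/*
		 * Bail out on hard failures (OOM, hwpoison, SIGBUS).
		 * do_swap_page() has already unmapped the pte page
		 * table on its exit paths, so just return.
		 */
		if (ret & VM_FAULT_ERROR)
			return;
		/* do_swap_page() unmapped the pte page table; remap it. */
		pte = pte_offset_map(pmd, _address);
	}
	/* pte points one past the last entry; unmap the mapped page. */
	pte_unmap(pte - 1);
}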