Hi Kirill & Matthew,
In the func call chain, from split_huge_page() to lru_add_page_tail(),
Seems tail pages are added to lru list at line 963, but in this scenario
the head page has no lru bit and isn't set the bit later. Why we do this?
or do I miss sth?
Many Thanks
Alex
938 void lru_add_page_tail(struct page *page, struct page *page_tail,
939 struct lruvec *lruvec, struct list_head *list)
940 {
941 VM_BUG_ON_PAGE(!PageHead(page), page);
942 VM_BUG_ON_PAGE(PageCompound(page_tail), page);
943 VM_BUG_ON_PAGE(PageLRU(page_tail), page);
944 lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
945
946 if (!list)
947 SetPageLRU(page_tail);
948
949 if (likely(PageLRU(page)))
950 list_add_tail(&page_tail->lru, &page->lru);
951 else if (list) {
952 /* page reclaim is reclaiming a huge page */
953 get_page(page_tail);
954 list_add_tail(&page_tail->lru, list);
955 } else {
956 /*
957 * Head page has not yet been counted, as an hpage,
958 * so we must account for each subpage individually.
959 *
960 * Put page_tail on the list at the correct position
961 * so they all end up in order.
962 */
963 add_page_to_lru_list_tail(page_tail, lruvec,
964 page_lru(page_tail));
965 }
966 }
On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
> Hi Kirill & Matthew,
>
> In the func call chain, from split_huge_page() to lru_add_page_tail(),
> Seems tail pages are added to lru list at line 963, but in this scenario
> the head page has no lru bit and isn't set the bit later. Why we do this?
> or do I miss sth?
I don't understand how we get to split_huge_page() with a page that's
not on an LRU list. Both anonymous and page cache pages should be on
an LRU list. What am I missing?
On Thu, Jul 09, 2020 at 04:50:02PM +0100, Matthew Wilcox wrote:
> On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
> > Hi Kirill & Matthew,
> >
> > In the func call chain, from split_huge_page() to lru_add_page_tail(),
> > Seems tail pages are added to lru list at line 963, but in this scenario
> > the head page has no lru bit and isn't set the bit later. Why we do this?
> > or do I miss sth?
>
> I don't understand how we get to split_huge_page() with a page that's
> not on an LRU list. Both anonymous and page cache pages should be on
> an LRU list. What am I missing?
Right, and it's never got removed from LRU during the split. The tail
pages have to be added to LRU because they now separate from the tail
page.
--
Kirill A. Shutemov
?? 2020/7/10 ????12:07, Kirill A. Shutemov д??:
> On Thu, Jul 09, 2020 at 04:50:02PM +0100, Matthew Wilcox wrote:
>> On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
>>> Hi Kirill & Matthew,
>>>
>>> In the func call chain, from split_huge_page() to lru_add_page_tail(),
>>> Seems tail pages are added to lru list at line 963, but in this scenario
>>> the head page has no lru bit and isn't set the bit later. Why we do this?
>>> or do I miss sth?
>>
>> I don't understand how we get to split_huge_page() with a page that's
>> not on an LRU list. Both anonymous and page cache pages should be on
>> an LRU list. What am I missing?>
Thanks a lot for quick reply!
What I am confusing is the call chain: __iommu_dma_alloc_pages()
to split_huge_page(), in the func, splited page,
page = alloc_pages_node(nid, alloc_flags, order);
And if the pages were added into lru, they maybe reclaimed and lost,
that would be a panic bug. But in fact, this never happened for long time.
Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
in kselftest.
> Right, and it's never got removed from LRU during the split. The tail
> pages have to be added to LRU because they now separate from the tail
> page.
>
According to the explaination, looks like we could remove the code path,
since it's never got into. (base on my v15 patchset). Any comments?
Thanks
Alex
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7c52c5228aab..c28409509ad3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2357,17 +2357,6 @@ static void lru_add_page_tail(struct page *head, struct page *page_tail,
if (!list)
SetPageLRU(page_tail);
if (likely(PageLRU(head)))
list_add_tail(&page_tail->lru, &head->lru);
else if (list) {
/* page reclaim is reclaiming a huge page */
get_page(page_tail);
list_add_tail(&page_tail->lru, list);
- } else {
- /*
- * Head page has not yet been counted, as an hpage,
- * so we must account for each subpage individually.
- *
- * Put page_tail on the list at the correct position
- * so they all end up in order.
- */
- VM_BUG_ON_PAGE(1, head);
- add_page_to_lru_list_tail(page_tail, lruvec,
- page_lru(page_tail));
}
}
On 10.7.2020 7.51, Alex Shi wrote:
>
> 在 2020/7/10 上午12:07, Kirill A. Shutemov 写道:
>> On Thu, Jul 09, 2020 at 04:50:02PM +0100, Matthew Wilcox wrote:
>>> On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
>>>> Hi Kirill & Matthew,
>>>>
>>>> In the func call chain, from split_huge_page() to lru_add_page_tail(),
>>>> Seems tail pages are added to lru list at line 963, but in this scenario
>>>> the head page has no lru bit and isn't set the bit later. Why we do this?
>>>> or do I miss sth?
>>> I don't understand how we get to split_huge_page() with a page that's
>>> not on an LRU list. Both anonymous and page cache pages should be on
>>> an LRU list. What am I missing?>
>
> Thanks a lot for quick reply!
> What I am confusing is the call chain: __iommu_dma_alloc_pages()
> to split_huge_page(), in the func, splited page,
> page = alloc_pages_node(nid, alloc_flags, order);
> And if the pages were added into lru, they maybe reclaimed and lost,
> that would be a panic bug. But in fact, this never happened for long time.
> Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
In __iommu_dma_alloc_pages, after split_huge_page(), who is taking a
reference on tail pages? Seems tail pages are freed and the function
errornously returns them in pages[] array for use?
> in kselftest.
>
>> Right, and it's never got removed from LRU during the split. The tail
>> pages have to be added to LRU because they now separate from the tail
>> page.
>>
> According to the explaination, looks like we could remove the code path,
> since it's never got into. (base on my v15 patchset). Any comments?
>
> Thanks
> Alex
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 7c52c5228aab..c28409509ad3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2357,17 +2357,6 @@ static void lru_add_page_tail(struct page *head, struct page *page_tail,
> if (!list)
> SetPageLRU(page_tail);
>
> if (likely(PageLRU(head)))
> list_add_tail(&page_tail->lru, &head->lru);
> else if (list) {
> /* page reclaim is reclaiming a huge page */
> get_page(page_tail);
> list_add_tail(&page_tail->lru, list);
> - } else {
> - /*
> - * Head page has not yet been counted, as an hpage,
> - * so we must account for each subpage individually.
> - *
> - * Put page_tail on the list at the correct position
> - * so they all end up in order.
> - */
> - VM_BUG_ON_PAGE(1, head);
> - add_page_to_lru_list_tail(page_tail, lruvec,
> - page_lru(page_tail));
> }
> }
在 2020/7/10 下午1:28, Mika Penttilä 写道:
>> Thanks a lot for quick reply!
>> What I am confusing is the call chain: __iommu_dma_alloc_pages()
>> to split_huge_page(), in the func, splited page,
>> page = alloc_pages_node(nid, alloc_flags, order);
>> And if the pages were added into lru, they maybe reclaimed and lost,
>> that would be a panic bug. But in fact, this never happened for long time.
>> Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
>
> In __iommu_dma_alloc_pages, after split_huge_page(), who is taking a
> reference on tail pages? Seems tail pages are freed and the function
> errornously returns them in pages[] array for use?
>
Why you say so? It looks like the tail page returned and be used
pages = __iommu_dma_alloc_pages() in iommu_dma_alloc_remap()
and still on node's lru. Is this right?
thanks!
On 10.7.2020 10.00, Alex Shi wrote:
>
> 在 2020/7/10 下午1:28, Mika Penttilä 写道:
>>> Thanks a lot for quick reply!
>>> What I am confusing is the call chain: __iommu_dma_alloc_pages()
>>> to split_huge_page(), in the func, splited page,
>>> page = alloc_pages_node(nid, alloc_flags, order);
>>> And if the pages were added into lru, they maybe reclaimed and lost,
>>> that would be a panic bug. But in fact, this never happened for long time.
>>> Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
>> In __iommu_dma_alloc_pages, after split_huge_page(), who is taking a
>> reference on tail pages? Seems tail pages are freed and the function
>> errornously returns them in pages[] array for use?
>>
> Why you say so? It looks like the tail page returned and be used
> pages = __iommu_dma_alloc_pages() in iommu_dma_alloc_remap()
> and still on node's lru. Is this right?
>
> thanks!
IMHO they are new pages coming from alloc_pages_node() so they are not
on lru. And split_huge_page() frees not pinned tail pages again to page
allocator.
Thanks,
Mika
在 2020/7/10 下午1:28, Mika Penttilä 写道:
>
>
> On 10.7.2020 7.51, Alex Shi wrote:
>>
>> 在 2020/7/10 上午12:07, Kirill A. Shutemov 写道:
>>> On Thu, Jul 09, 2020 at 04:50:02PM +0100, Matthew Wilcox wrote:
>>>> On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
>>>>> Hi Kirill & Matthew,
>>>>>
>>>>> In the func call chain, from split_huge_page() to lru_add_page_tail(),
>>>>> Seems tail pages are added to lru list at line 963, but in this scenario
>>>>> the head page has no lru bit and isn't set the bit later. Why we do this?
>>>>> or do I miss sth?
>>>> I don't understand how we get to split_huge_page() with a page that's
>>>> not on an LRU list. Both anonymous and page cache pages should be on
>>>> an LRU list. What am I missing?>
>>
>> Thanks a lot for quick reply!
>> What I am confusing is the call chain: __iommu_dma_alloc_pages()
>> to split_huge_page(), in the func, splited page,
>> page = alloc_pages_node(nid, alloc_flags, order);
>> And if the pages were added into lru, they maybe reclaimed and lost,
>> that would be a panic bug. But in fact, this never happened for long time.
>> Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
>
>
> In __iommu_dma_alloc_pages, after split_huge_page(), who is taking a
> reference on tail pages? Seems tail pages are freed and the function
> errornously returns them in pages[] array for use?
>
CC Joerg and iommu list,
That's a good question. seems the split_huge_page was never triggered here,
since the func would check the PageLock first. and have page->mapping and PageAnon
check, any of them couldn't be matched for the alloced page.
Hi Joerg,
would you like look into this? do we still need the split_huge_page() here?
Thanks
Alex
int split_huge_page_to_list(struct page *page, struct list_head *list)
{
struct page *head = compound_head(page);
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
VM_BUG_ON_PAGE(!PageLocked(head), head); <==
>
On Fri, Jul 10, 2020 at 12:51:58PM +0800, Alex Shi wrote:
>
>
> 在 2020/7/10 上午12:07, Kirill A. Shutemov 写道:
> > Right, and it's never got removed from LRU during the split. The tail
> > pages have to be added to LRU because they now separate from the tail
> > page.
> >
> According to the explaination, looks like we could remove the code path,
> since it's never got into. (base on my v15 patchset). Any comments?
Yes. But why? It's reasonable failsafe that gives chance to recover if
something goes wrong.
--
Kirill A. Shutemov
Adding Robin.
On Fri, Jul 10, 2020 at 05:34:52PM +0800, Alex Shi wrote:
> 在 2020/7/10 下午1:28, Mika Penttilä 写道:
> >
> >
> > On 10.7.2020 7.51, Alex Shi wrote:
> >>
> >> 在 2020/7/10 上午12:07, Kirill A. Shutemov 写道:
> >>> On Thu, Jul 09, 2020 at 04:50:02PM +0100, Matthew Wilcox wrote:
> >>>> On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
> >>>>> Hi Kirill & Matthew,
> >>>>>
> >>>>> In the func call chain, from split_huge_page() to lru_add_page_tail(),
> >>>>> Seems tail pages are added to lru list at line 963, but in this scenario
> >>>>> the head page has no lru bit and isn't set the bit later. Why we do this?
> >>>>> or do I miss sth?
> >>>> I don't understand how we get to split_huge_page() with a page that's
> >>>> not on an LRU list. Both anonymous and page cache pages should be on
> >>>> an LRU list. What am I missing?>
> >>
> >> Thanks a lot for quick reply!
> >> What I am confusing is the call chain: __iommu_dma_alloc_pages()
> >> to split_huge_page(), in the func, splited page,
> >> page = alloc_pages_node(nid, alloc_flags, order);
> >> And if the pages were added into lru, they maybe reclaimed and lost,
> >> that would be a panic bug. But in fact, this never happened for long time.
> >> Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
> >
> >
> > In __iommu_dma_alloc_pages, after split_huge_page(), who is taking a
> > reference on tail pages? Seems tail pages are freed and the function
> > errornously returns them in pages[] array for use?
> >
>
> CC Joerg and iommu list,
>
> That's a good question. seems the split_huge_page was never triggered here,
> since the func would check the PageLock first. and have page->mapping and PageAnon
> check, any of them couldn't be matched for the alloced page.
>
> Hi Joerg,
> would you like look into this? do we still need the split_huge_page() here?
>
> Thanks
> Alex
>
> int split_huge_page_to_list(struct page *page, struct list_head *list)
> {
> struct page *head = compound_head(page);
> struct deferred_split *ds_queue = get_deferred_split_queue(head);
> struct anon_vma *anon_vma = NULL;
> struct address_space *mapping = NULL;
> int count, mapcount, extra_pins, ret;
> pgoff_t end;
>
> VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
> VM_BUG_ON_PAGE(!PageLocked(head), head); <==
> >
在 2020/7/10 下午6:33, Kirill A. Shutemov 写道:
> On Fri, Jul 10, 2020 at 12:51:58PM +0800, Alex Shi wrote:
>>
>>
>> 在 2020/7/10 上午12:07, Kirill A. Shutemov 写道:
>>> Right, and it's never got removed from LRU during the split. The tail
>>> pages have to be added to LRU because they now separate from the tail
>>> page.
>>>
>> According to the explaination, looks like we could remove the code path,
>> since it's never got into. (base on my v15 patchset). Any comments?
>
> Yes. But why? It's reasonable failsafe that gives chance to recover if
> something goes wrong.
>
Hi Kirill,
Sorry, I didn't get your points. IMHO, this fallback cann't work well,
since the head page isn't and shouldn't be added to lru. like the iommu issue
if a dma page added into lru list, it may be reclaim and lost. So, sorry, I
still don't know how this path could fix some wrong.
Thanks
Alex
On Fri, Jul 10, 2020 at 2:35 AM Alex Shi <[email protected]> wrote:
>
> 在 2020/7/10 下午1:28, Mika Penttilä 写道:
> >
> >
> > On 10.7.2020 7.51, Alex Shi wrote:
> >>
> >> 在 2020/7/10 上午12:07, Kirill A. Shutemov 写道:
> >>> On Thu, Jul 09, 2020 at 04:50:02PM +0100, Matthew Wilcox wrote:
> >>>> On Thu, Jul 09, 2020 at 11:11:11PM +0800, Alex Shi wrote:
> >>>>> Hi Kirill & Matthew,
> >>>>>
> >>>>> In the func call chain, from split_huge_page() to lru_add_page_tail(),
> >>>>> Seems tail pages are added to lru list at line 963, but in this scenario
> >>>>> the head page has no lru bit and isn't set the bit later. Why we do this?
> >>>>> or do I miss sth?
> >>>> I don't understand how we get to split_huge_page() with a page that's
> >>>> not on an LRU list. Both anonymous and page cache pages should be on
> >>>> an LRU list. What am I missing?>
> >>
> >> Thanks a lot for quick reply!
> >> What I am confusing is the call chain: __iommu_dma_alloc_pages()
> >> to split_huge_page(), in the func, splited page,
> >> page = alloc_pages_node(nid, alloc_flags, order);
> >> And if the pages were added into lru, they maybe reclaimed and lost,
> >> that would be a panic bug. But in fact, this never happened for long time.
> >> Also I put a BUG() at the line, it's nevre triggered in ltp, and run_vmtests
> >
> >
> > In __iommu_dma_alloc_pages, after split_huge_page(), who is taking a
> > reference on tail pages? Seems tail pages are freed and the function
> > errornously returns them in pages[] array for use?
> >
>
> CC Joerg and iommu list,
>
> That's a good question. seems the split_huge_page was never triggered here,
> since the func would check the PageLock first. and have page->mapping and PageAnon
> check, any of them couldn't be matched for the alloced page.
>
> Hi Joerg,
> would you like look into this? do we still need the split_huge_page() here?
I think this is the same problem which has been discussed a couple of
weeks ago. Please refer to:
https://lore.kernel.org/linux-mm/[email protected]/
I think the conclusion is split_huge_page() can't be used in this path
at all. But we didn't reach a fix yet.
>
> Thanks
> Alex
>
> int split_huge_page_to_list(struct page *page, struct list_head *list)
> {
> struct page *head = compound_head(page);
> struct deferred_split *ds_queue = get_deferred_split_queue(head);
> struct anon_vma *anon_vma = NULL;
> struct address_space *mapping = NULL;
> int count, mapcount, extra_pins, ret;
> pgoff_t end;
>
> VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
> VM_BUG_ON_PAGE(!PageLocked(head), head); <==
> >
>