Hello Yongmei,
On Thu, Oct 07, 2021 at 02:35:44PM +0000, 解 咏梅 wrote:
> You're right. I checked with the commit 264e90cc07f177adec17ee7cc154ddaa132f0b2d
>
> I said that because, 1 or 2 years ago, the VM used reclaim_stat's rotation/scan ratio as the factor to balance between fs page cache and anonymous pages.
> It used the side effect of working set activation (it raised rotation count) to challenge the other side memory: file vs anon
> And in shrink_active_list deactivation also contributes to rotation count.
>
> So I got the conclusion that active list rotation refers to deactivation.
> I checked with commit #264e90c, only executable code section contributes to active list rotation. Thank you for pointing out my misunderstanding.
> But deactivation still contributes to the PGROTATED event, so I'm still a bit confused :(
Yeah PGROTATED is a little strange! I'm not sure people use it much.
> 1 more question:
> Why do activation/deactivation/deactive_fn no longer contribute to LRU cost? Because those are CPU intensive, not I/O intensive, right?
>
> So for now, if we'd like to balance the ratio between fs page cache and anonymous pages, we only take I/O (in allocation path and reclaim path) into consideration.
Yes, correct. The idea is to have the algorithm serve the workingset
with the least amount of paging IO.
Actually, the first version of the patch accounted for CPU overhead,
since anon and file do have different aging rules with different CPU
requirements. However, it didn't seem to matter in my testing, and
it's a bit awkward to compare IO cost and CPU cost since it depends
heavily on the underlying hardware, so I deleted that code. It's
possible we may need to add it back if a workload shows up that cares.
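
To make the IO-only balancing concrete, here is a rough userspace sketch of how two cost counters could be turned into relative scan pressure. It is loosely modeled on my reading of get_scan_count() around that commit; the struct, the function name balance_scan() and the exact arithmetic are illustrative assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
struct scan_fractions {
	uint64_t anon;	/* relative scan pressure on the anon LRU */
	uint64_t file;	/* relative scan pressure on the file LRU */
};

/*
 * Turn the recent paging-IO cost of each LRU type into scan pressure.
 * The type that recently caused more IO gets scanned less. Adding the
 * total to each side before dividing keeps the skew bounded at
 * roughly 2:1, so neither list is ever left entirely alone.
 */
static struct scan_fractions balance_scan(uint64_t anon_cost,
					  uint64_t file_cost,
					  unsigned int swappiness)
{
	struct scan_fractions f;
	uint64_t total;

	assert(swappiness <= 200);	/* assumed range: 0..200 */

	total = anon_cost + file_cost;
	anon_cost += total;
	file_cost += total;
	total = anon_cost + file_cost;

	/* The +1 avoids dividing by zero when no cost was recorded. */
	f.anon = swappiness * (total + 1) / (anon_cost + 1);
	f.file = (200 - swappiness) * (total + 1) / (file_cost + 1);
	return f;
}
```

With swappiness 100 and no recorded cost, both sides come out equal. If only anon has incurred cost (swap-ins, say), the file side ends up with about twice the pressure of the anon side.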
> From my observation, the VM doesn't count the fs periodic dirty flush as I/O cost.
Correct.
The thinking is: writeback IO needs to happen with or without reclaim,
because of data integrity. Whereas swapping only happens under memory
pressure - without anon reclaim we would not do any swap writes.
Of course, reclaim can trigger accelerated dirty flushing, which
*could* result in increased IO and thus higher LRU cost: two buffered
writes to the same page within the dirty expiration window would
result in one disk write normally, but could result in two under
pressure. It's a pain to track this properly, though, so the compromise
is to count writes as LRU cost only when kswapd gets in enough trouble
that it needs to flush pages one by one (NR_VMSCAN_WRITE). This seems
to work reasonably well in practice.
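
As a sketch of that accounting rule: pages that reclaim itself had to write (and refaults) are charged to the owning LRU type, while periodic flusher writeback never calls into this path at all. The code below is a userspace model loosely based on my understanding of lru_note_cost(); the struct, the note_cost() name and the lru_size parameter are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
struct lru_costs {
	uint64_t anon_cost;	/* recent paging IO caused by anon reclaim */
	uint64_t file_cost;	/* recent paging IO caused by file reclaim */
};

/*
 * Charge nr_pages of paging IO to one LRU type. Once the combined cost
 * exceeds a quarter of the LRU size, both counters are halved so that
 * older events fade out and recent ones dominate the balance.
 */
static void note_cost(struct lru_costs *c, bool file,
		      uint64_t nr_pages, uint64_t lru_size)
{
	if (file)
		c->file_cost += nr_pages;
	else
		c->anon_cost += nr_pages;

	if (c->file_cost + c->anon_cost > lru_size / 4) {
		c->file_cost /= 2;
		c->anon_cost /= 2;
	}
}
```

A caller in this model would invoke note_cost(&c, true, nr_written, lru_size) only for pages reclaim wrote out itself, and note_cost(&c, false, ...) for swap IO, which is why a workload with heavy but purely periodic writeback leaves file_cost untouched.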
> Looking forward to your reply!
>
> Thanks again. I have a clearer view of the VM now :)
>
>
> It is Chinese national holiday, sorry for my late response.
Happy Golden Week!
Johannes