2024-04-05 06:11:59

by Honggyu Kim

Subject: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory

There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
posted at [1].

It mentioned that no implementation of the demote/promote DAMOS actions
had been made. This RFC provides that implementation for the physical
address space.


Changes from RFC v2:
1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
2. Create 'target_nid' to set the migration target node instead of
depending on node-distance-based information.
3. Instead of having a page-level access check in this patch series,
delegate the job to a new DAMOS filter type, YOUNG[2].
4. Introduce vmstat counters "damon_migrate_{hot,cold}".
5. Rebase from v6.7 to v6.8.

Changes from RFC:
1. Move most of the implementation from mm/vmscan.c to mm/damon/paddr.c.
2. Simplify some functions of vmscan.c for reuse in paddr.c; these need
more in-depth review.
3. Refactor most functions for common use by both promote and demote
actions and introduce an enum migration_mode to control them.
4. Add a "target_nid" sysfs knob for the migration destination node for
both promote and demote actions.
5. Move DAMOS_PROMOTE before DAMOS_DEMOTE and move them both above
DAMOS_STAT.


Introduction
============

With the advent of CXL/PCIe-attached DRAM, which will simply be called
CXL memory in this cover letter, some systems are becoming more
heterogeneous, with memory nodes of different latency and bandwidth
characteristics. These are usually handled as different NUMA nodes in
separate memory tiers, and CXL memory is used as a slow tier because of
its protocol overhead compared to local DRAM.

On this kind of system, we need to carefully place memory pages on the
proper NUMA nodes based on their access frequency. Otherwise, some
frequently accessed pages might reside on slow tiers, causing unexpected
performance degradation. Moreover, memory access patterns can change at
runtime.

To handle this problem, we need a way to monitor memory access patterns
and migrate pages based on their access temperature. The DAMON (Data
Access MONitor) framework and its DAMOS (DAMON-based Operation Schemes)
are useful for monitoring and migrating pages. DAMOS provides multiple
actions based on DAMON monitoring results; for example, it can be used
for proactive reclaim, swapping cold pages out with the DAMOS_PAGEOUT
action. However, it doesn't support migration actions such as demotion
and promotion between tiered memory nodes.

This series adds two new DAMOS actions: DAMOS_MIGRATE_HOT for promotion
from slow tiers and DAMOS_MIGRATE_COLD for demotion from fast tiers.
This prevents hot pages from being stuck on slow tiers, which would
degrade performance, and lets cold pages be proactively demoted to slow
tiers so that the system has a better chance of allocating hot pages on
fast tiers.

DAMON provides various tuning knobs, but we found that the proactive
demotion of cold pages is especially useful when the system is running
out of memory on its fast tier nodes.

Our evaluation result shows that it reduces the performance slowdown
compared to the default memory policy from 17~18% to 4~5% when the
system runs under high memory pressure on its fast tier DRAM nodes.


DAMON configuration
===================

The specific DAMON configuration doesn't have to be in the scope of
this patch series, but a rough idea is worth sharing to explain the
evaluation result.

DAMON provides many knobs for fine tuning, and our configuration file
is generated by HMSDK[3]. Its gen_config.py script generates a json
file with the full set of DAMON knobs and creates separate kdamonds for
each NUMA node when DAMON is enabled, so that hot/cold based migration
can run for tiered memory. Conceptually, the setup boils down to
something like the sketch below.
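
This is only an illustrative sketch, assuming two already-created
kdamonds (0 monitoring the DRAM node, 1 monitoring the CXL node) with
one scheme each; the real generated config sets many more knobs:

$ cd /sys/kernel/mm/damon/admin/kdamonds
$ echo migrate_cold > 0/contexts/0/schemes/0/action  # demote from DRAM
$ echo 2 > 0/contexts/0/schemes/0/target_nid         # to CXL node2
$ echo migrate_hot > 1/contexts/0/schemes/0/action   # promote from CXL
$ echo 0 > 1/contexts/0/schemes/0/target_nid         # to DRAM node0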


Evaluation Workload
===================

The performance evaluation is done with redis[4], a widely used
in-memory database, and the memory access patterns are generated via
YCSB[5]. We measured two different workloads, with zipfian and latest
distributions, whose configs are slightly modified to make memory usage
higher and execution time longer for better evaluation.

The evaluation with these migrate_{hot,cold} actions covers system-wide
memory management rather than partitioning hot/cold pages of a single
workload. The default memory allocation policy places pages on the
fast tier DRAM node first, then allocates newly created pages on the
slow tier CXL node once the DRAM node has insufficient free space.
After a page is allocated, it never moves between NUMA nodes. That is
not true when numa balancing is used, but numa balancing is out of the
scope of this DAMON based tiered memory management support.

If the working set of redis fits fully into the DRAM node, then redis
accesses only the fast DRAM. Since DRAM-only execution is faster than
partially accessing CXL memory in slow tiers, that environment is not
useful for evaluating this patch series.

To make the pages of redis be distributed across the fast DRAM node and
the slow CXL node so that our migrate_{hot,cold} actions can be
evaluated, we pre-allocate some cold memory externally using mmap and
memset before launching redis-server, roughly as in the sketch below.
We assume there is a sufficient amount of cold memory in datacenters,
as the TMO[6] and TPP[7] papers mention.
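
The pre-allocation tool is not part of this series; a minimal sketch of
what it might look like (the file name and size argument are
assumptions, not the actual tool) is:

/* cold_alloc.c: map a given number of GBs, fault the pages in once
 * with memset, then keep them resident without touching them again. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	/* e.g. "./cold_alloc 440" for 440GB of cold memory */
	size_t sz = (argc > 1 ? strtoull(argv[1], NULL, 0) : 1) << 30;
	void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0, sz);	/* fault the pages in once */
	pause();		/* sleep; the pages stay cold */
	return 0;
}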

The evaluation sequence is as follows.

1. Turn on DAMON with the DAMOS_MIGRATE_COLD action for the DRAM node
and the DAMOS_MIGRATE_HOT action for the CXL node. It demotes cold
pages on the DRAM node and promotes hot pages on the CXL node at
regular intervals.
2. Allocate a huge block of cold memory by calling mmap and memset at
the fast tier DRAM node, then make the process sleep so that the fast
tier has insufficient space for redis-server.
3. Launch redis-server and load a prebaked snapshot image, dump.rdb.
The redis-server consumes 52GB of anon pages and 33GB of file pages,
but due to the cold memory allocated in step 2, it fails to allocate
the entire memory of redis-server on the fast tier DRAM node, so the
remainder is allocated on the slow tier CXL node. The DRAM:CXL ratio
depends on the size of the pre-allocated cold memory.
4. Run YCSB to generate a zipfian or latest distribution of memory
accesses to redis-server, then measure its execution time upon
completion.
5. Repeat step 4 50 times to measure the average execution time for
each run.
6. Increase the cold memory size, then go back to step 2 and repeat.

Each run of step 4 took about a minute, so repeating it 50 times took
about an hour per cold memory size, which ranges from 440GB to 500GB in
10GB increments. In total, it took more than 10 hours to get the entire
evaluation results for both the zipfian and latest workloads. Repeating
the same test set multiple times doesn't show much variation, so I
think the results are reliable enough.


Evaluation Results
==================

All the result values are normalized to the DRAM-only execution time
because the workload cannot be faster than DRAM-only unless it hits the
peak bandwidth, and our redis test doesn't reach the bandwidth limit.

So the DRAM-only execution time is the ideal result, unaffected by the
performance gap between DRAM and CXL. The NUMA node environment is as
follows.

node0 - local DRAM, 512GB with a CPU socket (fast tier)
node1 - disabled
node2 - CXL DRAM, 96GB, no CPU attached (slow tier)

The following is the result of generating a zipfian distribution to
redis-server; the numbers are averaged over 50 executions.

1. YCSB zipfian distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.22 - - - - - - - | 1.22
default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
=============+================================================+=========

Each test result is based on the execution environment as follows.

DRAM-only : redis-server uses only local DRAM memory.
CXL-only : redis-server uses only CXL memory.
default : default memory policy (MPOL_DEFAULT).
numa balancing disabled.
DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
DAMOS_MIGRATE_HOT for CXL nodes.

The above result shows that the "default" execution time goes up as the
size of cold memory increases from 440G to 500G: the more cold memory
is used, the more CXL memory the target redis workload uses, and this
makes the execution time increase.

However, the "DAMON tiered" result shows less slowdown because the
DAMOS_MIGRATE_COLD action on the DRAM node proactively demotes the
pre-allocated cold memory to the CXL node, and the freed DRAM space
gives a better chance of allocating hot or warm pages of redis-server
on the fast DRAM node. Moreover, the DAMOS_MIGRATE_HOT action on the
CXL node actively promotes hot pages of redis-server to the DRAM node.

As a result, more memory of redis-server stays on the DRAM node
compared to the "default" memory policy, and this brings the
performance improvement.

The following result of the latest distribution workload shows similar data.

2. YCSB latest distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.18 - - - - - - - | 1.18
default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
=============+================================================+=========

In summary, both results show that "DAMON tiered" memory management
reduces the performance slowdown compared to the "default" memory
policy from 17~18% to 4~5% when the system runs under high memory
pressure on its fast tier DRAM nodes.

Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
tiered memory systems run more efficiently under high memory pressure.

Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Rakie Kim <[email protected]>

[1] https://lore.kernel.org/damon/[email protected]
[2] https://lore.kernel.org/damon/[email protected]
[3] https://github.com/skhynix/hmsdk
[4] https://github.com/redis/redis/tree/7.0.0
[5] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
[6] https://dl.acm.org/doi/10.1145/3503222.3507731
[7] https://dl.acm.org/doi/10.1145/3582016.3582063


Honggyu Kim (5):
mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
mm: make alloc_demote_folio externally invokable for migration
mm/migrate: add MR_DAMON to migrate_reason
mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
mm/damon: Add "damon_migrate_{hot,cold}" vmstat

Hyeongtak Ji (2):
mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion

include/linux/damon.h | 15 ++-
include/linux/migrate_mode.h | 1 +
include/linux/mmzone.h | 4 +
include/trace/events/migrate.h | 3 +-
mm/damon/core.c | 5 +-
mm/damon/dbgfs.c | 2 +-
mm/damon/lru_sort.c | 3 +-
mm/damon/paddr.c | 191 +++++++++++++++++++++++++++++++--
mm/damon/reclaim.c | 3 +-
mm/damon/sysfs-schemes.c | 39 ++++++-
mm/internal.h | 1 +
mm/vmscan.c | 10 +-
mm/vmstat.c | 4 +
13 files changed, 265 insertions(+), 16 deletions(-)


base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.34.1



2024-04-05 06:12:48

by Honggyu Kim

Subject: [RFC PATCH v3 3/7] mm/damon/sysfs-schemes: add target_nid on sysfs-schemes

From: Hyeongtak Ji <[email protected]>

This patch adds target_nid under
/sys/kernel/mm/damon/admin/kdamonds/<N>/contexts/<N>/schemes/<N>/

The 'target_nid' can be used as the destination node for DAMOS actions
such as DAMOS_MIGRATE_{HOT,COLD} in the follow-up patches.
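
A minimal usage sketch (the kdamond/scheme indices and the node ID
below are illustrative, and a later patch in this series restricts
writes to schemes with the migrate_{hot,cold} actions):

$ cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0
$ echo 2 > target_nid
$ cat target_nid
2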

Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
---
include/linux/damon.h | 11 ++++++++++-
mm/damon/core.c | 5 ++++-
mm/damon/dbgfs.c | 2 +-
mm/damon/lru_sort.c | 3 ++-
mm/damon/reclaim.c | 3 ++-
mm/damon/sysfs-schemes.c | 33 ++++++++++++++++++++++++++++++++-
6 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 5881e4ac30be..24ea33a03d5d 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -337,6 +337,7 @@ struct damos_access_pattern {
* @apply_interval_us: The time between applying the @action.
* @quota: Control the aggressiveness of this scheme.
* @wmarks: Watermarks for automated (in)activation of this scheme.
+ * @target_nid: Destination node if @action is "migrate_{hot,cold}".
* @filters: Additional set of &struct damos_filter for &action.
* @stat: Statistics of this scheme.
* @list: List head for siblings.
@@ -352,6 +353,10 @@ struct damos_access_pattern {
* monitoring context are inactive, DAMON stops monitoring either, and just
* repeatedly checks the watermarks.
*
+ * @target_nid is used to set the migration target node for migrate_hot or
+ * migrate_cold actions, which means it's only meaningful when @action is either
+ * "migrate_hot" or "migrate_cold".
+ *
* Before applying the &action to a memory region, &struct damon_operations
* implementation could check pages of the region and skip &action to respect
* &filters
@@ -373,6 +378,9 @@ struct damos {
/* public: */
struct damos_quota quota;
struct damos_watermarks wmarks;
+ union {
+ int target_nid;
+ };
struct list_head filters;
struct damos_stat stat;
struct list_head list;
@@ -677,7 +685,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
enum damos_action action,
unsigned long apply_interval_us,
struct damos_quota *quota,
- struct damos_watermarks *wmarks);
+ struct damos_watermarks *wmarks,
+ int target_nid);
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 5b325749fc12..7ff0259d9fa6 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -316,7 +316,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
enum damos_action action,
unsigned long apply_interval_us,
struct damos_quota *quota,
- struct damos_watermarks *wmarks)
+ struct damos_watermarks *wmarks,
+ int target_nid)
{
struct damos *scheme;

@@ -341,6 +342,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
scheme->wmarks = *wmarks;
scheme->wmarks.activated = true;

+ scheme->target_nid = target_nid;
+
return scheme;
}

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 7dac24e69e3b..d04fdccfa65b 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -279,7 +279,7 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,

pos += parsed;
scheme = damon_new_scheme(&pattern, action, 0, &quota,
- &wmarks);
+ &wmarks, NUMA_NO_NODE);
if (!scheme)
goto fail;

diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 3de2916a65c3..3775f0f2743d 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -163,7 +163,8 @@ static struct damos *damon_lru_sort_new_scheme(
/* under the quota. */
&quota,
/* (De)activate this according to the watermarks. */
- &damon_lru_sort_wmarks);
+ &damon_lru_sort_wmarks,
+ NUMA_NO_NODE);
}

/* Create a DAMON-based operation scheme for hot memory regions */
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 66e190f0374a..84e6e96b5dcc 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -147,7 +147,8 @@ static struct damos *damon_reclaim_new_scheme(void)
/* under the quota. */
&damon_reclaim_quota,
/* (De)activate this according to the watermarks. */
- &damon_reclaim_wmarks);
+ &damon_reclaim_wmarks,
+ NUMA_NO_NODE);
}

static void damon_reclaim_copy_quota_status(struct damos_quota *dst,
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index ae0f0b314f3a..1a30ea82c890 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -6,6 +6,7 @@
*/

#include <linux/slab.h>
+#include <linux/numa.h>

#include "sysfs-common.h"

@@ -1393,6 +1394,7 @@ struct damon_sysfs_scheme {
struct damon_sysfs_scheme_filters *filters;
struct damon_sysfs_stats *stats;
struct damon_sysfs_scheme_regions *tried_regions;
+ int target_nid;
};

/* This should match with enum damos_action */
@@ -1418,6 +1420,7 @@ static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
scheme->kobj = (struct kobject){};
scheme->action = action;
scheme->apply_interval_us = apply_interval_us;
+ scheme->target_nid = NUMA_NO_NODE;
return scheme;
}

@@ -1640,6 +1643,28 @@ static ssize_t apply_interval_us_store(struct kobject *kobj,
return err ? err : count;
}

+static ssize_t target_nid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+
+ return sysfs_emit(buf, "%d\n", scheme->target_nid);
+}
+
+static ssize_t target_nid_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+ int err = 0;
+
+ /* TODO: error handling for target_nid range. */
+ err = kstrtoint(buf, 0, &scheme->target_nid);
+
+ return err ? err : count;
+}
+
static void damon_sysfs_scheme_release(struct kobject *kobj)
{
kfree(container_of(kobj, struct damon_sysfs_scheme, kobj));
@@ -1651,9 +1676,13 @@ static struct kobj_attribute damon_sysfs_scheme_action_attr =
static struct kobj_attribute damon_sysfs_scheme_apply_interval_us_attr =
__ATTR_RW_MODE(apply_interval_us, 0600);

+static struct kobj_attribute damon_sysfs_scheme_target_nid_attr =
+ __ATTR_RW_MODE(target_nid, 0600);
+
static struct attribute *damon_sysfs_scheme_attrs[] = {
&damon_sysfs_scheme_action_attr.attr,
&damon_sysfs_scheme_apply_interval_us_attr.attr,
+ &damon_sysfs_scheme_target_nid_attr.attr,
NULL,
};
ATTRIBUTE_GROUPS(damon_sysfs_scheme);
@@ -1956,7 +1985,8 @@ static struct damos *damon_sysfs_mk_scheme(
damos_sysfs_set_quota_score(sysfs_quotas->goals, &quota);

scheme = damon_new_scheme(&pattern, sysfs_scheme->action,
- sysfs_scheme->apply_interval_us, &quota, &wmarks);
+ sysfs_scheme->apply_interval_us, &quota, &wmarks,
+ sysfs_scheme->target_nid);
if (!scheme)
return NULL;

@@ -1987,6 +2017,7 @@ static void damon_sysfs_update_scheme(struct damos *scheme,

scheme->action = sysfs_scheme->action;
scheme->apply_interval_us = sysfs_scheme->apply_interval_us;
+ scheme->target_nid = sysfs_scheme->target_nid;

scheme->quota.ms = sysfs_quotas->ms;
scheme->quota.sz = sysfs_quotas->sz;
--
2.34.1


2024-04-05 06:13:01

by Honggyu Kim

Subject: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

This patch introduces the DAMOS_MIGRATE_COLD action, which is similar
to DAMOS_PAGEOUT, but migrates folios to the 'target_nid' given via the
sysfs instead of swapping them out.

The 'target_nid' sysfs knob is created by this patch to inform the
migration target node ID.

Here is an example usage of this 'migrate_cold' action.

$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
migrate_cold
$ echo 2 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 0 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold

Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 501 0 601 1101
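
The hot_cold test program is not included in this series; a
hypothetical sketch of what it might do (the name, defaults and layout
are assumptions, not the actual tool) is:

/* hot_cold.c: map a hot and a cold region (sizes in MB from argv),
 * fault both in once, then keep accessing only the hot one. */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

static char *alloc_mb(size_t mb)
{
	char *p = mmap(NULL, mb << 20, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		exit(1);
	memset(p, 0, mb << 20);	/* fault every page in once */
	return p;
}

int main(int argc, char *argv[])
{
	size_t hot_mb = argc > 1 ? strtoull(argv[1], NULL, 0) : 500;
	size_t cold_mb = argc > 2 ? strtoull(argv[2], NULL, 0) : 600;
	volatile char *hot = alloc_mb(hot_mb);

	alloc_mb(cold_mb);	/* the cold region is never touched again */
	for (;;)		/* keep the hot region hot */
		for (size_t i = 0; i < hot_mb << 20; i += 4096)
			hot[i]++;
}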

Since there are some common routines with pageout, many functions have
similar logic between pageout and migrate cold.

damon_pa_migrate_folio_list() is a minimized version of
shrink_folio_list(), but it's minified only for demotion.

Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: Hyeongtak Ji <[email protected]>
---
include/linux/damon.h | 2 +
mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
mm/damon/sysfs-schemes.c | 4 ++
3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 24ea33a03d5d..df8671e69a70 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -105,6 +105,7 @@ struct damon_target {
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
+ * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
* @DAMOS_STAT: Do nothing but count the stat.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
*
@@ -122,6 +123,7 @@ enum damos_action {
DAMOS_NOHUGEPAGE,
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
+ DAMOS_MIGRATE_COLD,
DAMOS_STAT, /* Do nothing but only record the stat */
NR_DAMOS_ACTIONS,
};
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 277a1c4d833c..fe217a26f788 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -12,6 +12,9 @@
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/swap.h>
+#include <linux/memory-tiers.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>

#include "../internal.h"
#include "ops-common.h"
@@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)

enum migration_mode {
MIG_PAGEOUT,
+ MIG_MIGRATE_COLD,
};

+static unsigned int migrate_folio_list(struct list_head *migrate_folios,
+ struct pglist_data *pgdat,
+ int target_nid)
+{
+ unsigned int nr_succeeded;
+ nodemask_t allowed_mask = NODE_MASK_NONE;
+
+ struct migration_target_control mtc = {
+ /*
+ * Allocate from 'node', or fail quickly and quietly.
+ * When this happens, 'page' will likely just be discarded
+ * instead of migrated.
+ */
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | GFP_NOWAIT,
+ .nid = target_nid,
+ .nmask = &allowed_mask
+ };
+
+ if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
+ return 0;
+
+ if (list_empty(migrate_folios))
+ return 0;
+
+ /* Migration ignores all cpuset and mempolicy settings */
+ migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
+ (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
+ &nr_succeeded);
+
+ return nr_succeeded;
+}
+
+static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat,
+ enum migration_mode mm,
+ int target_nid)
+{
+ unsigned int nr_migrated = 0;
+ struct folio *folio;
+ LIST_HEAD(ret_folios);
+ LIST_HEAD(migrate_folios);
+
+ cond_resched();
+
+ while (!list_empty(folio_list)) {
+ struct folio *folio;
+
+ cond_resched();
+
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+
+ if (!folio_trylock(folio))
+ goto keep;
+
+ VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
+
+ /* Relocate its contents to another node. */
+ list_add(&folio->lru, &migrate_folios);
+ folio_unlock(folio);
+ continue;
+keep:
+ list_add(&folio->lru, &ret_folios);
+ VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+ }
+ /* 'folio_list' is always empty here */
+
+ /* Migrate folios selected for migration */
+ nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
+ /* Folios that could not be migrated are still in @migrate_folios */
+ if (!list_empty(&migrate_folios)) {
+ /* Folios which weren't migrated go back on @folio_list */
+ list_splice_init(&migrate_folios, folio_list);
+ }
+
+ try_to_unmap_flush();
+
+ list_splice(&ret_folios, folio_list);
+
+ while (!list_empty(folio_list)) {
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+ folio_putback_lru(folio);
+ }
+
+ return nr_migrated;
+}
+
+static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
+ enum migration_mode mm,
+ int target_nid)
+{
+ int nid;
+ unsigned int nr_migrated = 0;
+ LIST_HEAD(node_folio_list);
+ unsigned int noreclaim_flag;
+
+ if (list_empty(folio_list))
+ return nr_migrated;
+
+ noreclaim_flag = memalloc_noreclaim_save();
+
+ nid = folio_nid(lru_to_folio(folio_list));
+ do {
+ struct folio *folio = lru_to_folio(folio_list);
+
+ if (nid == folio_nid(folio)) {
+ folio_clear_active(folio);
+ list_move(&folio->lru, &node_folio_list);
+ continue;
+ }
+
+ nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
+ NODE_DATA(nid), mm,
+ target_nid);
+ nid = folio_nid(lru_to_folio(folio_list));
+ } while (!list_empty(folio_list));
+
+ nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
+ NODE_DATA(nid), mm,
+ target_nid);
+
+ memalloc_noreclaim_restore(noreclaim_flag);
+
+ return nr_migrated;
+}
+
static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
enum migration_mode mm)
{
@@ -247,7 +379,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
folio_test_clear_young(folio);
if (!folio_isolate_lru(folio))
goto put_folio;
- if (folio_test_unevictable(folio))
+ /*
+ * Since unevictable folios can be demoted or promoted,
+ * unevictable test is needed only for pageout.
+ */
+ if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
folio_putback_lru(folio);
else
list_add(&folio->lru, &folio_list);
@@ -258,6 +394,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
case MIG_PAGEOUT:
applied = reclaim_pages(&folio_list);
break;
+ case MIG_MIGRATE_COLD:
+ applied = damon_pa_migrate_pages(&folio_list, mm,
+ s->target_nid);
+ break;
default:
/* Unexpected migration mode. */
return 0;
@@ -314,6 +454,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
+ case DAMOS_MIGRATE_COLD:
+ return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
case DAMOS_STAT:
break;
default:
@@ -334,6 +476,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
return damon_hot_score(context, r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_cold_score(context, r, scheme);
+ case DAMOS_MIGRATE_COLD:
+ return damon_cold_score(context, r, scheme);
default:
break;
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 1a30ea82c890..18b7d054c748 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"nohugepage",
"lru_prio",
"lru_deprio",
+ "migrate_cold",
"stat",
};

@@ -1659,6 +1660,9 @@ static ssize_t target_nid_store(struct kobject *kobj,
struct damon_sysfs_scheme, kobj);
int err = 0;

+ if (scheme->action != DAMOS_MIGRATE_COLD)
+ return -EINVAL;
+
/* TODO: error handling for target_nid range. */
err = kstrtoint(buf, 0, &scheme->target_nid);

--
2.34.1


2024-04-05 06:13:45

by Honggyu Kim

Subject: [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion

From: Hyeongtak Ji <[email protected]>

This patch introduces the DAMOS_MIGRATE_HOT action, which is similar to
DAMOS_MIGRATE_COLD, but targets hot pages for migration.

It migrates pages inside the given region to the 'target_nid' NUMA node
given via the sysfs.

Here is an example usage of this 'migrate_hot' action.

$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
migrate_hot
$ echo 0 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 2 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold

Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 501 0 601 1101

Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
---
include/linux/damon.h | 2 ++
mm/damon/paddr.c | 12 ++++++++++--
mm/damon/sysfs-schemes.c | 4 +++-
3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index df8671e69a70..934c95a7c042 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -105,6 +105,7 @@ struct damon_target {
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
+ * @DAMOS_MIGRATE_HOT: Migrate for the given hot region.
* @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
* @DAMOS_STAT: Do nothing but count the stat.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
@@ -123,6 +124,7 @@ enum damos_action {
DAMOS_NOHUGEPAGE,
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
+ DAMOS_MIGRATE_HOT,
DAMOS_MIGRATE_COLD,
DAMOS_STAT, /* Do nothing but only record the stat */
NR_DAMOS_ACTIONS,
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index fe217a26f788..fd9d35b5cc83 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)

enum migration_mode {
MIG_PAGEOUT,
+ MIG_MIGRATE_HOT,
MIG_MIGRATE_COLD,
};

@@ -375,8 +376,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
if (damos_pa_filter_out(s, folio))
goto put_folio;

- folio_clear_referenced(folio);
- folio_test_clear_young(folio);
+ if (mm != MIG_MIGRATE_HOT) {
+ folio_clear_referenced(folio);
+ folio_test_clear_young(folio);
+ }
if (!folio_isolate_lru(folio))
goto put_folio;
/*
@@ -394,6 +397,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
case MIG_PAGEOUT:
applied = reclaim_pages(&folio_list);
break;
+ case MIG_MIGRATE_HOT:
case MIG_MIGRATE_COLD:
applied = damon_pa_migrate_pages(&folio_list, mm,
s->target_nid);
@@ -454,6 +458,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
+ case DAMOS_MIGRATE_HOT:
+ return damon_pa_migrate(r, scheme, MIG_MIGRATE_HOT);
case DAMOS_MIGRATE_COLD:
return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
case DAMOS_STAT:
@@ -476,6 +482,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
return damon_hot_score(context, r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_cold_score(context, r, scheme);
+ case DAMOS_MIGRATE_HOT:
+ return damon_hot_score(context, r, scheme);
case DAMOS_MIGRATE_COLD:
return damon_cold_score(context, r, scheme);
default:
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 18b7d054c748..1d2f62aa79ca 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"nohugepage",
"lru_prio",
"lru_deprio",
+ "migrate_hot",
"migrate_cold",
"stat",
};
@@ -1660,7 +1661,8 @@ static ssize_t target_nid_store(struct kobject *kobj,
struct damon_sysfs_scheme, kobj);
int err = 0;

- if (scheme->action != DAMOS_MIGRATE_COLD)
+ if (scheme->action != DAMOS_MIGRATE_HOT &&
+ scheme->action != DAMOS_MIGRATE_COLD)
return -EINVAL;

/* TODO: error handling for target_nid range. */
--
2.34.1


2024-04-05 06:13:47

by Honggyu Kim

Subject: [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat

This patch adds "damon_migrate_{hot,cold}" to the node-specific vmstat
counters at the following location.

/sys/devices/system/node/node*/vmstat

The counted values are accumulated into the global vmstat, so it also
introduces the same counters in /proc/vmstat.
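
For example, once this patch is applied, the new counters could be
checked with something like:

$ grep damon_migrate /proc/vmstat
$ grep damon_migrate /sys/devices/system/node/node0/vmstat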

Signed-off-by: Honggyu Kim <[email protected]>
---
include/linux/mmzone.h | 4 ++++
mm/damon/paddr.c | 17 ++++++++++++++++-
mm/vmstat.c | 4 ++++
3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a497f189d988..0005372c5503 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -214,6 +214,10 @@ enum node_stat_item {
PGDEMOTE_KSWAPD,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_DAMON_PADDR
+ DAMON_MIGRATE_HOT,
+ DAMON_MIGRATE_COLD,
+#endif
NR_VM_NODE_STAT_ITEMS
};

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index fd9d35b5cc83..d559c242d151 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -235,10 +235,23 @@ enum migration_mode {

static unsigned int migrate_folio_list(struct list_head *migrate_folios,
struct pglist_data *pgdat,
+ enum migration_mode mm,
int target_nid)
{
unsigned int nr_succeeded;
nodemask_t allowed_mask = NODE_MASK_NONE;
+ enum node_stat_item node_stat;
+
+ switch (mm) {
+ case MIG_MIGRATE_HOT:
+ node_stat = DAMON_MIGRATE_HOT;
+ break;
+ case MIG_MIGRATE_COLD:
+ node_stat = DAMON_MIGRATE_COLD;
+ break;
+ default:
+ return 0;
+ }

struct migration_target_control mtc = {
/*
@@ -263,6 +276,8 @@ static unsigned int migrate_folio_list(struct list_head *migrate_folios,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
&nr_succeeded);

+ mod_node_page_state(pgdat, node_stat, nr_succeeded);
+
return nr_succeeded;
}

@@ -302,7 +317,7 @@ static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
/* 'folio_list' is always empty here */

/* Migrate folios selected for migration */
- nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
+ nr_migrated += migrate_folio_list(&migrate_folios, pgdat, mm, target_nid);
/* Folios that could not be migrated are still in @migrate_folios */
if (!list_empty(&migrate_folios)) {
/* Folios which weren't migrated go back on @folio_list */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index db79935e4a54..be9ba89fede1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1252,6 +1252,10 @@ const char * const vmstat_text[] = {
"pgdemote_kswapd",
"pgdemote_direct",
"pgdemote_khugepaged",
+#ifdef CONFIG_DAMON_PADDR
+ "damon_migrate_hot",
+ "damon_migrate_cold",
+#endif

/* enum writeback_stat_item counters */
"nr_dirty_threshold",
--
2.34.1


2024-04-05 06:15:52

by Honggyu Kim

Subject: [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration

alloc_demote_folio() can be used outside of vmscan.c, so it'd be better
to remove the static keyword from it.

This function can also be used for both demotion and promotion, so it'd
be better to rename it from alloc_demote_folio to alloc_migrate_folio.

Signed-off-by: Honggyu Kim <[email protected]>
---
mm/internal.h | 1 +
mm/vmscan.c | 10 +++++++---
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..c96ff9bc82d0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
unsigned long, unsigned long);

extern void set_pageblock_order(void);
+struct folio *alloc_migrate_folio(struct folio *src, unsigned long private);
unsigned long reclaim_pages(struct list_head *folio_list);
unsigned int reclaim_clean_pages_from_list(struct zone *zone,
struct list_head *folio_list);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..9e456cac03b4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
}

-static struct folio *alloc_demote_folio(struct folio *src,
- unsigned long private)
+struct folio *alloc_migrate_folio(struct folio *src, unsigned long private)
{
struct folio *dst;
nodemask_t *allowed_mask;
@@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio *src,
if (dst)
return dst;

+ /*
+ * Allocation failed from the target node so try to allocate from
+ * fallback nodes based on allowed_mask.
+ * See fallback_alloc() at mm/slab.c.
+ */
mtc->gfp_mask &= ~__GFP_THISNODE;
mtc->nmask = allowed_mask;

@@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
node_get_allowed_targets(pgdat, &allowed_mask);

/* Demotion ignores all cpuset and mempolicy settings */
- migrate_pages(demote_folios, alloc_demote_folio, NULL,
+ migrate_pages(demote_folios, alloc_migrate_folio, NULL,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
&nr_succeeded);

--
2.34.1


2024-04-05 06:16:11

by Honggyu Kim

Subject: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode

This is a preparation patch that introduces migration modes.

damon_pa_pageout() is renamed to damon_pa_migrate(), and it receives an
extra argument for the migration_mode.

No functional changes applied.

Signed-off-by: Honggyu Kim <[email protected]>
---
mm/damon/paddr.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 081e2a325778..277a1c4d833c 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
return false;
}

-static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
+enum migration_mode {
+ MIG_PAGEOUT,
+};
+
+static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
+ enum migration_mode mm)
{
unsigned long addr, applied;
LIST_HEAD(folio_list);
@@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
put_folio:
folio_put(folio);
}
- applied = reclaim_pages(&folio_list);
+ switch (mm) {
+ case MIG_PAGEOUT:
+ applied = reclaim_pages(&folio_list);
+ break;
+ default:
+ /* Unexpected migration mode. */
+ return 0;
+ }
cond_resched();
return applied * PAGE_SIZE;
}
@@ -297,7 +309,7 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
{
switch (scheme->action) {
case DAMOS_PAGEOUT:
- return damon_pa_pageout(r, scheme);
+ return damon_pa_migrate(r, scheme, MIG_PAGEOUT);
case DAMOS_LRU_PRIO:
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
--
2.34.1


2024-04-05 06:17:30

by Honggyu Kim

Subject: [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason

This patch series introduces DAMON based migration across NUMA nodes,
so it'd be better to have a new migrate_reason in trace events.

Signed-off-by: Honggyu Kim <[email protected]>
---
include/linux/migrate_mode.h | 1 +
include/trace/events/migrate.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index f37cc03f9369..cec36b7e7ced 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -29,6 +29,7 @@ enum migrate_reason {
MR_CONTIG_RANGE,
MR_LONGTERM_PIN,
MR_DEMOTION,
+ MR_DAMON,
MR_TYPES
};

diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 0190ef725b43..cd01dd7b3640 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -22,7 +22,8 @@
EM( MR_NUMA_MISPLACED, "numa_misplaced") \
EM( MR_CONTIG_RANGE, "contig_range") \
EM( MR_LONGTERM_PIN, "longterm_pin") \
- EMe(MR_DEMOTION, "demotion")
+ EM( MR_DEMOTION, "demotion") \
+ EMe(MR_DAMON, "damon")

/*
* First define the enums in the above macros to be exported to userspace
--
2.34.1


2024-04-05 07:56:52

by Hyeongtak Ji

Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:

..snip...

> +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> + enum migration_mode mm,
> + int target_nid)
> +{
> + int nid;
> + unsigned int nr_migrated = 0;
> + LIST_HEAD(node_folio_list);
> + unsigned int noreclaim_flag;
> +
> + if (list_empty(folio_list))
> + return nr_migrated;

How about checking if `target_nid` is `NUMA_NO_NODE` or not earlier,

> +
> + noreclaim_flag = memalloc_noreclaim_save();
> +
> + nid = folio_nid(lru_to_folio(folio_list));
> + do {
> + struct folio *folio = lru_to_folio(folio_list);
> +
> + if (nid == folio_nid(folio)) {
> + folio_clear_active(folio);
> + list_move(&folio->lru, &node_folio_list);
> + continue;
> + }
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> + nid = folio_nid(lru_to_folio(folio_list));
> + } while (!list_empty(folio_list));
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> +
> + memalloc_noreclaim_restore(noreclaim_flag);
> +
> + return nr_migrated;
> +}
> +

..snip...

> +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> + struct pglist_data *pgdat,
> + int target_nid)
> +{
> + unsigned int nr_succeeded;
> + nodemask_t allowed_mask = NODE_MASK_NONE;
> +
> + struct migration_target_control mtc = {
> + /*
> + * Allocate from 'node', or fail quickly and quietly.
> + * When this happens, 'page' will likely just be discarded
> + * instead of migrated.
> + */
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> + __GFP_NOMEMALLOC | GFP_NOWAIT,
> + .nid = target_nid,
> + .nmask = &allowed_mask
> + };
> +
> + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> + return 0;

instead of here.

> +
> + if (list_empty(migrate_folios))
> + return 0;
> +
> + /* Migration ignores all cpuset and mempolicy settings */
> + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> + &nr_succeeded);
> +
> + return nr_succeeded;
> +}
> +

..snip...

Kind regards,
Hyeongtak

2024-04-05 16:56:37

by Gregory Price

Subject: Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory

On Fri, Apr 05, 2024 at 03:08:49PM +0900, Honggyu Kim wrote:
> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> 1. YCSB zipfian distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
> | cold memory occupied by mmap and memset |
> | 0G 440G 450G 460G 470G 480G 490G 500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values | GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only | 1.00 - - - - - - - | 1.00
> CXL-only | 1.22 - - - - - - - | 1.22
> default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
> DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
> =============+================================================+=========
> CXL usage of redis-server in GB | AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only | 0.0 - - - - - - - | 0.0
> CXL-only | 52.6 - - - - - - - | 52.6
> default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
> DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
> =============+================================================+=========
>
> Each test result is based on the execution environment as follows.
>
> DRAM-only : redis-server uses only local DRAM memory.
> CXL-only : redis-server uses only CXL memory.
> default : default memory policy (MPOL_DEFAULT).
> numa balancing disabled.
> DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
> DAMOS_MIGRATE_HOT for CXL nodes.
>
> The above result shows that the "default" execution time goes up as the
> size of cold memory increases from 440G to 500G: the more cold memory
> is used, the more CXL memory the target redis workload uses, and this
> makes the execution time increase.
>
> However, the "DAMON tiered" result shows less slowdown because the
> DAMOS_MIGRATE_COLD action on the DRAM node proactively demotes the
> pre-allocated cold memory to the CXL node, and the freed DRAM space
> gives a better chance of allocating hot or warm pages of redis-server
> on the fast DRAM node. Moreover, the DAMOS_MIGRATE_HOT action on the
> CXL node actively promotes hot pages of redis-server to the DRAM node.
>
> As a result, more memory of redis-server stays on the DRAM node
> compared to the "default" memory policy, and this brings the
> performance improvement.
>
> The following result of the latest distribution workload shows similar data.
>
> 2. YCSB latest distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
> | cold memory occupied by mmap and memset |
> | 0G 440G 450G 460G 470G 480G 490G 500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values | GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only | 1.00 - - - - - - - | 1.00
> CXL-only | 1.18 - - - - - - - | 1.18
> default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
> DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
> =============+================================================+=========
> CXL usage of redis-server in GB | AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only | 0.0 - - - - - - - | 0.0
> CXL-only | 52.6 - - - - - - - | 52.6
> default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
> DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
> =============+================================================+=========
>
> In summary, both results show that "DAMON tiered" memory management
> reduces the performance slowdown compared to the "default" memory
> policy from 17~18% to 4~5% when the system runs under high memory
> pressure on its fast tier DRAM nodes.
>
> Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> tiered memory systems run more efficiently under high memory pressure.
>

Hi,

It's hard to determine from your results whether the performance
mitigation is being caused primarily by MIGRATE_COLD freeing up space
for new allocations, or from some combination of HOT/COLD actions
occurring during execution but after the database has already been
warmed up.

Do you have test results which enable only DAMOS_MIGRATE_COLD actions
but not DAMOS_MIGRATE_HOT actions? (and vice versa)

The question I have is exactly how often MIGRATE_HOT is actually being
utilized, and how much data is being moved. Testing MIGRATE_COLD only
would at least give a rough approximation of that.


Additionally, do you have any data on workloads that exceed the capacity
of the DRAM tier? Here you say you have 512GB of local DRAM, but only
test a workload that caps out at 500G. Have you run a test of, say,
550GB to see the effect of DAMON HOT/COLD migration actions when DRAM
capacity is exceeded?

Can you also provide the DRAM-only results for each test? Presumably,
as workload size increases from 440G to 500G, the system probably starts
using some amount of swap/zswap/whatever. It would be good to know how
this system compares to swapping small amounts of overflow.

~Gregory

2024-04-05 19:19:26

by SeongJae Park

Subject: Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode

On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim <[email protected]> wrote:

> This is a preparation patch that introduces migration modes.
>
> damon_pa_pageout() is renamed to damon_pa_migrate(), and it receives an
> extra argument for the migration_mode.

I personally think keeping damon_pa_pageout() as is and adding a new
function (damon_pa_migrate()) with some duplicated code is also ok, but
this approach also looks fine to me. So I have no strong opinion here;
just letting you know I would have no objection to either approach.

>
> No functional changes applied.
>
> Signed-off-by: Honggyu Kim <[email protected]>
> ---
> mm/damon/paddr.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 081e2a325778..277a1c4d833c 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> return false;
> }
>
> -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> +enum migration_mode {
> + MIG_PAGEOUT,
> +};

To avoid name conflicts, what about renaming to 'damos_migration_mode' and
'DAMOS_MIG_PAGEOUT'?

> +
> +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> + enum migration_mode mm)

My poor brain got a bit confused with the name. What about calling it 'mode'?

> {
> unsigned long addr, applied;
> LIST_HEAD(folio_list);
> @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> put_folio:
> folio_put(folio);
> }
> - applied = reclaim_pages(&folio_list);
> + switch (mm) {
> + case MIG_PAGEOUT:
> + applied = reclaim_pages(&folio_list);
> + break;
> + default:
> + /* Unexpected migration mode. */
> + return 0;
> + }
> cond_resched();
> return applied * PAGE_SIZE;
> }
> @@ -297,7 +309,7 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> {
> switch (scheme->action) {
> case DAMOS_PAGEOUT:
> - return damon_pa_pageout(r, scheme);
> + return damon_pa_migrate(r, scheme, MIG_PAGEOUT);
> case DAMOS_LRU_PRIO:
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> --
> 2.34.1


Thanks,
SJ

2024-04-05 19:21:01

by SeongJae Park

Subject: Re: [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason

On Fri, 5 Apr 2024 15:08:53 +0900 Honggyu Kim <[email protected]> wrote:

> This patch series introduces DAMON based migration across NUMA nodes,
> so it'd be better to have a new migrate_reason in trace events.
>
> Signed-off-by: Honggyu Kim <[email protected]>

Reviewed-by: SeongJae Park <[email protected]>


Thanks,
SJ

> ---
> include/linux/migrate_mode.h | 1 +
> include/trace/events/migrate.h | 3 ++-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index f37cc03f9369..cec36b7e7ced 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -29,6 +29,7 @@ enum migrate_reason {
> MR_CONTIG_RANGE,
> MR_LONGTERM_PIN,
> MR_DEMOTION,
> + MR_DAMON,
> MR_TYPES
> };
>
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 0190ef725b43..cd01dd7b3640 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -22,7 +22,8 @@
> EM( MR_NUMA_MISPLACED, "numa_misplaced") \
> EM( MR_CONTIG_RANGE, "contig_range") \
> EM( MR_LONGTERM_PIN, "longterm_pin") \
> - EMe(MR_DEMOTION, "demotion")
> + EM( MR_DEMOTION, "demotion") \
> + EMe(MR_DAMON, "damon")
>
> /*
> * First define the enums in the above macros to be exported to userspace
> --
> 2.34.1
>
>

2024-04-05 19:23:01

by SeongJae Park

Subject: Re: [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration

On Fri, 5 Apr 2024 15:08:51 +0900 Honggyu Kim <[email protected]> wrote:

> alloc_demote_folio() can be used outside of vmscan.c, so it'd be better
> to remove the static keyword from it.
>
> This function can also be used for both demotion and promotion, so it'd
> be better to rename it from alloc_demote_folio to alloc_migrate_folio.

I'm not sure if renaming is really needed, but I have no strong opinion.

>
> Signed-off-by: Honggyu Kim <[email protected]>

I have one more trivial comment below, but it's no blocker for me.

Reviewed-by: SeongJae Park <[email protected]>

> ---
> mm/internal.h | 1 +
> mm/vmscan.c | 10 +++++++---
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index f309a010d50f..c96ff9bc82d0 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
> unsigned long, unsigned long);
>
> extern void set_pageblock_order(void);
> +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private);
> unsigned long reclaim_pages(struct list_head *folio_list);
> unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> struct list_head *folio_list);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4255619a1a31..9e456cac03b4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
> mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
> }
>
> -static struct folio *alloc_demote_folio(struct folio *src,
> - unsigned long private)
> +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private)
> {
> struct folio *dst;
> nodemask_t *allowed_mask;
> @@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio *src,
> if (dst)
> return dst;
>
> + /*
> + * Allocation failed from the target node so try to allocate from
> + * fallback nodes based on allowed_mask.
> + * See fallback_alloc() at mm/slab.c.
> + */

I think this might be better as a separate cleanup patch, but given its
small size, I have no strong opinion.

> mtc->gfp_mask &= ~__GFP_THISNODE;
> mtc->nmask = allowed_mask;
>
> @@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> node_get_allowed_targets(pgdat, &allowed_mask);
>
> /* Demotion ignores all cpuset and mempolicy settings */
> - migrate_pages(demote_folios, alloc_demote_folio, NULL,
> + migrate_pages(demote_folios, alloc_migrate_folio, NULL,
> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> &nr_succeeded);
>
> --
> 2.34.1


Thanks,
SJ

2024-04-05 19:25:00

by SeongJae Park

Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:

> This patch introduces the DAMOS_MIGRATE_COLD action, which is similar
> to DAMOS_PAGEOUT, but migrates folios to the 'target_nid' given via the
> sysfs instead of swapping them out.
>
> The 'target_nid' sysfs knob is created by this patch to inform the
> migration target node ID.

Isn't it created by the previous patch?

>
> Here is an example usage of this 'migrate_cold' action.
>
> $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> $ cat contexts/<N>/schemes/<N>/action
> migrate_cold
> $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> $ echo commit > state
> $ numactl -p 0 ./hot_cold 500M 600M &
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 701 (hot_cold) 501 0 601 1101
>
> Since there are some common routines with pageout, many functions have
> similar logic between pageout and migrate cold.
>
> damon_pa_migrate_folio_list() is a minimized version of
> shrink_folio_list(), but it's minified only for demotion.

MIGRATE_COLD is not only for demotion, right? I think the last two words
are better removed to avoid unnecessary confusion.

>
> Signed-off-by: Honggyu Kim <[email protected]>
> Signed-off-by: Hyeongtak Ji <[email protected]>
> ---
> include/linux/damon.h | 2 +
> mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> mm/damon/sysfs-schemes.c | 4 ++
> 3 files changed, 151 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 24ea33a03d5d..df8671e69a70 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
> * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.

Whether it will be for cold region or not is depending on the target access
pattern. What about 'Migrate the regions in coldest regions first manner.'?
Or, simply 'Migrate the regions (prioritize cold)' here, and explain about the
prioritization under quota on the detailed comments part?

Also, let's use tab consistently.

> * @DAMOS_STAT: Do nothing but count the stat.
> * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> *
> @@ -122,6 +123,7 @@ enum damos_action {
> DAMOS_NOHUGEPAGE,
> DAMOS_LRU_PRIO,
> DAMOS_LRU_DEPRIO,
> + DAMOS_MIGRATE_COLD,
> DAMOS_STAT, /* Do nothing but only record the stat */
> NR_DAMOS_ACTIONS,
> };
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 277a1c4d833c..fe217a26f788 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -12,6 +12,9 @@
> #include <linux/pagemap.h>
> #include <linux/rmap.h>
> #include <linux/swap.h>
> +#include <linux/memory-tiers.h>
> +#include <linux/migrate.h>
> +#include <linux/mm_inline.h>
>
> #include "../internal.h"
> #include "ops-common.h"
> @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
>
> enum migration_mode {
> MIG_PAGEOUT,
> + MIG_MIGRATE_COLD,
> };
>
> +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> + struct pglist_data *pgdat,
> + int target_nid)

To avoid name collisions, I'd prefer having a damon_pa_ prefix. I see this
patch is defining damon_pa_migrate_folio_list() below, though. What about
__damon_pa_migrate_folio_list()?

> +{
> + unsigned int nr_succeeded;
> + nodemask_t allowed_mask = NODE_MASK_NONE;
> +

I personally prefer not having empty lines in the middle of variable
declarations/definitions. Could we remove this empty line?

> + struct migration_target_control mtc = {
> + /*
> + * Allocate from 'node', or fail quickly and quietly.
> + * When this happens, 'page' will likely just be discarded
> + * instead of migrated.
> + */
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> + __GFP_NOMEMALLOC | GFP_NOWAIT,
> + .nid = target_nid,
> + .nmask = &allowed_mask
> + };
> +
> + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> + return 0;
> +
> + if (list_empty(migrate_folios))
> + return 0;

Can't these checks be done by the caller?
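
For example, a rough (untested) sketch of what I mean; the caller would
only call in when there is real work to do:

	/* hypothetical guard at the call site */
	if (target_nid != NUMA_NO_NODE && pgdat->node_id != target_nid &&
	    !list_empty(&migrate_folios))
		nr_migrated += migrate_folio_list(&migrate_folios, pgdat,
						  target_nid);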

> +
> + /* Migration ignores all cpuset and mempolicy settings */
> + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> + &nr_succeeded);
> +
> + return nr_succeeded;
> +}
> +
> +static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
> + struct pglist_data *pgdat,
> + enum migration_mode mm,

Again, 'mm' makes my poor brain a bit confused. What about 'mode'?
And, seems this is not used at all in this function? Can we just drop this?

> + int target_nid)
> +{
> + unsigned int nr_migrated = 0;
> + struct folio *folio;
> + LIST_HEAD(ret_folios);
> + LIST_HEAD(migrate_folios);
> +
> + cond_resched();

We will do this again at the beginning of the loop. Do we need this here?

> +
> + while (!list_empty(folio_list)) {
> + struct folio *folio;
> +
> + cond_resched();
> +
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> +
> + if (!folio_trylock(folio))
> + goto keep;
> +
> + VM_BUG_ON_FOLIO(folio_test_active(folio), folio);

Why? I think we could want to migrate active pages in some use case, e.g., to
reduce memory bandwidth?

> +
> + /* Relocate its contents to another node. */
> + list_add(&folio->lru, &migrate_folios);
> + folio_unlock(folio);
> + continue;
> +keep:
> + list_add(&folio->lru, &ret_folios);
> + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);

Can this happen? I think this could be too much of a test? checkpatch.pl
also warns.

> + }
> + /* 'folio_list' is always empty here */
> +
> + /* Migrate folios selected for migration */
> + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> + /* Folios that could not be migrated are still in @migrate_folios */
> + if (!list_empty(&migrate_folios)) {
> + /* Folios which weren't migrated go back on @folio_list */
> + list_splice_init(&migrate_folios, folio_list);
> + }

Let's not use braces for single statement
(https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).

> +
> + try_to_unmap_flush();
> +
> + list_splice(&ret_folios, folio_list);

Can't we move remaining folios in migrate_folios to ret_folios at once?
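
Something like the following might work (untested sketch). Since
list_splice_init() on an empty list is a no-op, the list_empty() check
could then go away as well:

	/* Folios which weren't migrated go back on @folio_list */
	list_splice_init(&migrate_folios, &ret_folios);

	try_to_unmap_flush();

	list_splice(&ret_folios, folio_list);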

> +
> + while (!list_empty(folio_list)) {
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> + folio_putback_lru(folio);
> + }
> +
> + return nr_migrated;
> +}
> +
> +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> + enum migration_mode mm,

Again, I'd prefer calling this 'mode' or something other than 'mm'.
And, it seems 'mm' is not really used in this function. It is passed to
'damon_pa_migrate_folio_list()' but that doesn't really use it. Can we drop
this?

> + int target_nid)
> +{
> + int nid;
> + unsigned int nr_migrated = 0;

Let's make this match the return type of this function.

> + LIST_HEAD(node_folio_list);
> + unsigned int noreclaim_flag;
> +
> + if (list_empty(folio_list))
> + return nr_migrated;
> +
> + noreclaim_flag = memalloc_noreclaim_save();
> +
> + nid = folio_nid(lru_to_folio(folio_list));
> + do {
> + struct folio *folio = lru_to_folio(folio_list);
> +
> + if (nid == folio_nid(folio)) {
> + folio_clear_active(folio);

I think this was necessary for demotion, but now this should be removed since
this function is no more for demotion but for migrating random pages, right?
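
That is, presumably just (untested):

	if (nid == folio_nid(folio)) {
		/* keep the folio's LRU state; we migrate, not demote */
		list_move(&folio->lru, &node_folio_list);
		continue;
	}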

> + list_move(&folio->lru, &node_folio_list);
> + continue;
> + }
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> + nid = folio_nid(lru_to_folio(folio_list));
> + } while (!list_empty(folio_list));
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> +
> + memalloc_noreclaim_restore(noreclaim_flag);
> +
> + return nr_migrated;
> +}
> +
> static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> enum migration_mode mm)
> {
> @@ -247,7 +379,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> folio_test_clear_young(folio);
> if (!folio_isolate_lru(folio))
> goto put_folio;
> - if (folio_test_unevictable(folio))
> + /*
> + * Since unevictable folios can be demoted or promoted,

Let's use the term 'migrated' instead of 'demoted' or 'promoted'.

> + * unevictable test is needed only for pageout.
> + */
> + if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
> folio_putback_lru(folio);
> else
> list_add(&folio->lru, &folio_list);
> @@ -258,6 +394,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> case MIG_PAGEOUT:
> applied = reclaim_pages(&folio_list);
> break;
> + case MIG_MIGRATE_COLD:
> + applied = damon_pa_migrate_pages(&folio_list, mm,
> + s->target_nid);
> + break;
> default:
> /* Unexpected migration mode. */
> return 0;
> @@ -314,6 +454,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_pa_deactivate_pages(r, scheme);
> + case DAMOS_MIGRATE_COLD:
> + return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
> case DAMOS_STAT:
> break;
> default:
> @@ -334,6 +476,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> return damon_hot_score(context, r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_cold_score(context, r, scheme);
> + case DAMOS_MIGRATE_COLD:
> + return damon_cold_score(context, r, scheme);
> default:
> break;
> }
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index 1a30ea82c890..18b7d054c748 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
> @@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> "nohugepage",
> "lru_prio",
> "lru_deprio",
> + "migrate_cold",
> "stat",
> };
>
> @@ -1659,6 +1660,9 @@ static ssize_t target_nid_store(struct kobject *kobj,
> struct damon_sysfs_scheme, kobj);
> int err = 0;
>
> + if (scheme->action != DAMOS_MIGRATE_COLD)
> + return -EINVAL;
> +

I think a user could set target_nid first, and then the action. So I think
this should not return an error?
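
A minimal sketch of what I mean (untested): accept the value here
unconditionally, and validate the action/target_nid combination when the
scheme is committed or applied instead:

	/* accept target_nid regardless of the currently set action */
	err = kstrtoint(buf, 0, &scheme->target_nid);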

> /* TODO: error handling for target_nid range. */
> err = kstrtoint(buf, 0, &scheme->target_nid);
>
> --
> 2.34.1
>
>


Thanks,
SJ

2024-04-05 19:25:14

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

On Fri, 5 Apr 2024 16:55:57 +0900 Hyeongtak Ji <[email protected]> wrote:

> On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:
>
> ...snip...
>
> > +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> > + enum migration_mode mm,
> > + int target_nid)
> > +{
> > + int nid;
> > + unsigned int nr_migrated = 0;
> > + LIST_HEAD(node_folio_list);
> > + unsigned int noreclaim_flag;
> > +
> > + if (list_empty(folio_list))
> > + return nr_migrated;
>
> How about checking if `target_nid` is `NUMA_NO_NODE` or not earlier,
>
> > +
> > + noreclaim_flag = memalloc_noreclaim_save();
> > +
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + do {
> > + struct folio *folio = lru_to_folio(folio_list);
> > +
> > + if (nid == folio_nid(folio)) {
> > + folio_clear_active(folio);
> > + list_move(&folio->lru, &node_folio_list);
> > + continue;
> > + }
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + } while (!list_empty(folio_list));
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > +
> > + memalloc_noreclaim_restore(noreclaim_flag);
> > +
> > + return nr_migrated;
> > +}
> > +
>
> ...snip...
>
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > + struct pglist_data *pgdat,
> > + int target_nid)
> > +{
> > + unsigned int nr_succeeded;
> > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
> > + struct migration_target_control mtc = {
> > + /*
> > + * Allocate from 'node', or fail quickly and quietly.
> > + * When this happens, 'page' will likely just be discarded
> > + * instead of migrated.
> > + */
> > + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > + __GFP_NOMEMALLOC | GFP_NOWAIT,
> > + .nid = target_nid,
> > + .nmask = &allowed_mask
> > + };
> > +
> > + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> > + return 0;
>
> instead of here.

Agreed. As I said in the previous reply, I think this check can be done
by the caller (or the caller of the caller) of this function.

>
> > +
> > + if (list_empty(migrate_folios))
> > + return 0;

Same for this.

> > +
> > + /* Migration ignores all cpuset and mempolicy settings */
> > + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> > + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> > + &nr_succeeded);
> > +
> > + return nr_succeeded;
> > +}
> > +
>
> ...snip...
>
> Kind regards,
> Hyeongtak
>

Thanks,
SJ

2024-04-05 19:27:44

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat

On Fri, 5 Apr 2024 15:08:56 +0900 Honggyu Kim <[email protected]> wrote:

> This patch adds "damon_migrate_{hot,cold}" under node specific vmstat
> counters at the following location.
>
> /sys/devices/system/node/node*/vmstat
>
The counted values are accumulated into the global vmstat, so it also
introduces the same counters at /proc/vmstat.

DAMON provides its own DAMOS stats via DAMON sysfs interface. Do we really
need this change?


Thanks,
SJ

[...]

2024-04-05 19:28:32

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion

On Fri, 5 Apr 2024 15:08:55 +0900 Honggyu Kim <[email protected]> wrote:

> From: Hyeongtak Ji <[email protected]>
>
> This patch introduces DAMOS_MIGRATE_HOT action, which is similar to
> DAMOS_MIGRATE_COLD, but it is targeted to migrate hot pages.

My understanding of our last discussion was that 'HOT/COLD' here is only for
the prioritization score function. If I'm not wrong, this is not for
targeting, but just prioritizes migrating hot pages first under the quota.

>
> It migrates pages inside the given region to the 'target_nid' NUMA node
> in the sysfs.
>
> Here is one of the example usage of this 'migrate_hot' action.
>
> $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> $ cat contexts/<N>/schemes/<N>/action
> migrate_hot
> $ echo 0 > contexts/<N>/schemes/<N>/target_nid
> $ echo commit > state
> $ numactl -p 2 ./hot_cold 500M 600M &
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 701 (hot_cold) 501 0 601 1101
>
> Signed-off-by: Hyeongtak Ji <[email protected]>
> Signed-off-by: Honggyu Kim <[email protected]>
> ---
> include/linux/damon.h | 2 ++
> mm/damon/paddr.c | 12 ++++++++++--
> mm/damon/sysfs-schemes.c | 4 +++-
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index df8671e69a70..934c95a7c042 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
> * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_HOT: Migrate for the given hot region.

As commented on the previous patch, this could be bit re-phrased.
Also, let's use tabs consistently.

> * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
> * @DAMOS_STAT: Do nothing but count the stat.
> * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> @@ -123,6 +124,7 @@ enum damos_action {
> DAMOS_NOHUGEPAGE,
> DAMOS_LRU_PRIO,
> DAMOS_LRU_DEPRIO,
> + DAMOS_MIGRATE_HOT,
> DAMOS_MIGRATE_COLD,
> DAMOS_STAT, /* Do nothing but only record the stat */
> NR_DAMOS_ACTIONS,
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index fe217a26f788..fd9d35b5cc83 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
>
> enum migration_mode {
> MIG_PAGEOUT,
> + MIG_MIGRATE_HOT,
> MIG_MIGRATE_COLD,
> };

It looks like we don't really need both MIG_MIGRATE_HOT and MIG_MIGRATE_COLD,
but just one, say, MIG_MIGRATE, since the code can know which prioritization
score function to use from the DAMOS action?

Also, as I commented on the previous one, I'd prefer having a DAMOS_ prefix.
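
A rough, untested sketch of that direction (names hypothetical):

	enum damos_migration_mode {
		DAMOS_MIG_PAGEOUT,
		DAMOS_MIG_MIGRATE,	/* HOT and COLD differ only in score */
	};

	/* in damon_pa_apply_scheme() */
	case DAMOS_MIGRATE_HOT:
	case DAMOS_MIGRATE_COLD:
		return damon_pa_migrate(r, scheme, DAMOS_MIG_MIGRATE);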

>
> @@ -375,8 +376,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> if (damos_pa_filter_out(s, folio))
> goto put_folio;
>
> - folio_clear_referenced(folio);
> - folio_test_clear_young(folio);
> + if (mm != MIG_MIGRATE_HOT) {
> + folio_clear_referenced(folio);
> + folio_test_clear_young(folio);
> + }

We agreed to do this check via the 'young' page type DAMOS filter, and to
let this code not care about it, right? If I'm not wrong, I think this
should be removed?
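
That is, presumably reverting to the unconditional form and leaving the
per-page access check to the 'young' type DAMOS filter (untested):

	folio_clear_referenced(folio);
	folio_test_clear_young(folio);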

> if (!folio_isolate_lru(folio))
> goto put_folio;
> /*
> @@ -394,6 +397,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> case MIG_PAGEOUT:
> applied = reclaim_pages(&folio_list);
> break;
> + case MIG_MIGRATE_HOT:
> case MIG_MIGRATE_COLD:
> applied = damon_pa_migrate_pages(&folio_list, mm,
> s->target_nid);
> @@ -454,6 +458,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_pa_deactivate_pages(r, scheme);
> + case DAMOS_MIGRATE_HOT:
> + return damon_pa_migrate(r, scheme, MIG_MIGRATE_HOT);
> case DAMOS_MIGRATE_COLD:
> return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
> case DAMOS_STAT:
> @@ -476,6 +482,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> return damon_hot_score(context, r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_cold_score(context, r, scheme);
> + case DAMOS_MIGRATE_HOT:
> + return damon_hot_score(context, r, scheme);
> case DAMOS_MIGRATE_COLD:
> return damon_cold_score(context, r, scheme);
> default:
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index 18b7d054c748..1d2f62aa79ca 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
> @@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> "nohugepage",
> "lru_prio",
> "lru_deprio",
> + "migrate_hot",
> "migrate_cold",
> "stat",
> };
> @@ -1660,7 +1661,8 @@ static ssize_t target_nid_store(struct kobject *kobj,
> struct damon_sysfs_scheme, kobj);
> int err = 0;
>
> - if (scheme->action != DAMOS_MIGRATE_COLD)
> + if (scheme->action != DAMOS_MIGRATE_HOT &&
> + scheme->action != DAMOS_MIGRATE_COLD)
> return -EINVAL;
>
> /* TODO: error handling for target_nid range. */
> --
> 2.34.1
>
>


Thanks,
SJ

2024-04-05 19:29:51

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory

Hello Honggyu,

On Fri, 5 Apr 2024 15:08:49 +0900 Honggyu Kim <[email protected]> wrote:

> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> It says there is no implementation of the demote/promote DAMOS action
> are made. This RFC is about its implementation for physical address
> space.
>
>
> Changes from RFC v2:
> 1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
> 2. Create 'target_nid' to set the migration target node instead of
> depending on node distance based information.
> 3. Instead of having page level access check in this patch series,
> delegate the job to a new DAMOS filter type YOUNG[2].
> 4. Introduce vmstat counters "damon_migrate_{hot,cold}".
> 5. Rebase from v6.7 to v6.8.

Thank you for patiently keeping the discussion going and making this great
version! I left comments on each patch, but found no special concerns. The
per-page access recheck for MIGRATE_HOT and the vmstat change are catching
my eye, though. I doubt whether those are really needed. It would be nice
if you could answer the comments.

Once my comments on this version are addressed, I would have no reason to
object to dropping the RFC tag from this patchset.

Nonetheless, I see some warnings and errors from checkpatch.pl. I don't
really care about those for RFC patches, so no problem at all. But if you
agree with my opinion about dropping the RFC tag, and therefore will send
the next version without the RFC tag, please make sure you also run
checkpatch.pl before posting.


Thanks,
SJ

[...]

2024-04-08 10:59:08

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory

Hi SeongJae,

On Fri, 5 Apr 2024 12:28:00 -0700 SeongJae Park <[email protected]> wrote:
> Hello Honggyu,
>
> On Fri, 5 Apr 2024 15:08:49 +0900 Honggyu Kim <[email protected]> wrote:
>
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > It says there is no implementation of the demote/promote DAMOS action
> > are made. This RFC is about its implementation for physical address
> > space.
> >
> >
> > Changes from RFC v2:
> > 1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
> > 2. Create 'target_nid' to set the migration target node instead of
> > depending on node distance based information.
> > 3. Instead of having page level access check in this patch series,
> > delegate the job to a new DAMOS filter type YOUNG[2].
> > 4. Introduce vmstat counters "damon_migrate_{hot,cold}".
> > 5. Rebase from v6.7 to v6.8.
>
> Thank you for patiently keeping the discussion going and making this great
> version! I left comments on each patch, but found no special concerns. The
> per-page access recheck for MIGRATE_HOT and the vmstat change are catching
> my eye, though. I doubt whether those are really needed. It would be nice
> if you could answer the comments.

I will answer them where you made the comments.

> Once my comments on this version are addressed, I would have no reason to
> object to dropping the RFC tag from this patchset.

Thanks. I will drop the RFC after addressing your comments.

> Nonetheless, I see some warnings and errors from checkpatch.pl. I don't
> really care about those for RFC patches, so no problem at all. But if you
> agree with my opinion about dropping the RFC tag, and therefore will send
> the next version without the RFC tag, please make sure you also run
> checkpatch.pl before posting.

Sure. I will run checkpatch.pl from the next revision.

Thanks,
Honggyu

>
> Thanks,
> SJ
>
> [...]

2024-04-08 12:07:14

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <[email protected]> wrote:
> On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:
>
> > This patch introduces DAMOS_MIGRATE_COLD action, which is similar to
> > DAMOS_PAGEOUT, but migrate folios to the given 'target_nid' in the sysfs
> > instead of swapping them out.
> >
> > The 'target_nid' sysfs knob is created by this patch to inform the
> > migration target node ID.
>
> Isn't it created by the previous patch?

Right. I didn't fix the commit message after splitting this patch. I will
fix it.

> >
> > Here is one of the example usage of this 'migrate_cold' action.
> >
> > $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> > $ cat contexts/<N>/schemes/<N>/action
> > migrate_cold
> > $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> > $ echo commit > state
> > $ numactl -p 0 ./hot_cold 500M 600M &
> > $ numastat -c -p hot_cold
> >
> > Per-node process memory usage (in MBs)
> > PID Node 0 Node 1 Node 2 Total
> > -------------- ------ ------ ------ -----
> > 701 (hot_cold) 501 0 601 1101
> >
> > Since there are some common routines with pageout, many functions have
> > similar logics between pageout and migrate cold.
> >
> > damon_pa_migrate_folio_list() is a minimized version of
> > shrink_folio_list(), but it's minified only for demotion.
>
> MIGRATE_COLD is not only for demotion, right? I think the last two words are
> better removed to reduce unnecessary confusion.

You mean the last two sentences? I will remove them if you feel it's
confusing.

> >
> > Signed-off-by: Honggyu Kim <[email protected]>
> > Signed-off-by: Hyeongtak Ji <[email protected]>
> > ---
> > include/linux/damon.h | 2 +
> > mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> > mm/damon/sysfs-schemes.c | 4 ++
> > 3 files changed, 151 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/damon.h b/include/linux/damon.h
> > index 24ea33a03d5d..df8671e69a70 100644
> > --- a/include/linux/damon.h
> > +++ b/include/linux/damon.h
> > @@ -105,6 +105,7 @@ struct damon_target {
> > * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> > * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> > * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> > + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
>
> Whether it will be for a cold region or not depends on the target access
> pattern. What about 'Migrate the regions in coldest regions first manner.'?
> Or, simply 'Migrate the regions (prioritize cold)' here, and explain the
> prioritization under quota in the detailed comments part?

"Migrate the regions in coldest regions first manner under quota" sounds
better. I will change it.

> Also, let's use tab consistently.

Yeah, it's a mistake. will fix it.

> > * @DAMOS_STAT: Do nothing but count the stat.
> > * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> > *
> > @@ -122,6 +123,7 @@ enum damos_action {
> > DAMOS_NOHUGEPAGE,
> > DAMOS_LRU_PRIO,
> > DAMOS_LRU_DEPRIO,
> > + DAMOS_MIGRATE_COLD,
> > DAMOS_STAT, /* Do nothing but only record the stat */
> > NR_DAMOS_ACTIONS,
> > };
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 277a1c4d833c..fe217a26f788 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -12,6 +12,9 @@
> > #include <linux/pagemap.h>
> > #include <linux/rmap.h>
> > #include <linux/swap.h>
> > +#include <linux/memory-tiers.h>
> > +#include <linux/migrate.h>
> > +#include <linux/mm_inline.h>
> >
> > #include "../internal.h"
> > #include "ops-common.h"
> > @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> >
> > enum migration_mode {
> > MIG_PAGEOUT,
> > + MIG_MIGRATE_COLD,
> > };
> >
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > + struct pglist_data *pgdat,
> > + int target_nid)
>
> To avoid name collisions, I'd prefer having a damon_pa_ prefix. I see this
> patch is defining damon_pa_migrate_folio_list() below, though. What about
> __damon_pa_migrate_folio_list()?

Ack. I will change it to __damon_pa_migrate_folio_list().

> > +{
> > + unsigned int nr_succeeded;
> > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
>
> I personally prefer not having empty lines in the middle of variable
> declarations/definitions. Could we remove this empty line?

I can remove it, but I would like to have more discussion about this
issue. The current implementation allows only a single migration
target with "target_nid", but users might want to provide fallback
migration target nids.

For example, if more than two CXL nodes exist in the system, users might
want to migrate cold pages to any of the CXL nodes. In such cases, we
might have to make "target_nid" accept comma-separated node IDs. A
nodemask could be better, but we should provide a way to change the
scanning order.

I would like to hear what you think about this.
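
For reference, a rough and untested sketch of how such a parser could look
(the helper name and the caller-provided array are hypothetical):

	/* parse e.g. "2,3" into an ordered list of fallback target nids */
	static int damos_parse_target_nids(const char *buf, int *nids,
					   int max_nids)
	{
		int n = 0;

		while (*buf && *buf != '\n' && n < max_nids) {
			char *end;
			int nid = simple_strtol(buf, &end, 0);

			if (end == buf || nid < 0 || nid >= MAX_NUMNODES)
				return -EINVAL;
			nids[n++] = nid;
			if (*end == ',')
				end++;
			buf = end;
		}
		return n;	/* number of nids, in priority order */
	}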

> > + struct migration_target_control mtc = {
> > + /*
> > + * Allocate from 'node', or fail quickly and quietly.
> > + * When this happens, 'page' will likely just be discarded
> > + * instead of migrated.
> > + */
> > + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > + __GFP_NOMEMALLOC | GFP_NOWAIT,
> > + .nid = target_nid,
> > + .nmask = &allowed_mask
> > + };
> > +
> > + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> > + return 0;
> > +
> > + if (list_empty(migrate_folios))
> > + return 0;
>
> Can't these checks be done by the caller?

Sure. I will move them to the caller.

> > +
> > + /* Migration ignores all cpuset and mempolicy settings */
> > + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> > + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> > + &nr_succeeded);
> > +
> > + return nr_succeeded;
> > +}
> > +
> > +static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
> > + struct pglist_data *pgdat,
> > + enum migration_mode mm,
>
> Again, 'mm' makes my poor brain a bit confused. What about 'mode'?
> And, seems this is not used at all in this function? Can we just drop this?

Ack. I will remove it in this patch and introduce it in the patch where
it's used.

> > + int target_nid)
> > +{
> > + unsigned int nr_migrated = 0;
> > + struct folio *folio;
> > + LIST_HEAD(ret_folios);
> > + LIST_HEAD(migrate_folios);
> > +
> > + cond_resched();
>
> We will do this again at the beginning of the loop. Do we need this here?

This comes from shrink_folio_list() but this function is way simpler so
it can be removed.

> > +
> > + while (!list_empty(folio_list)) {
> > + struct folio *folio;
> > +
> > + cond_resched();
> > +
> > + folio = lru_to_folio(folio_list);
> > + list_del(&folio->lru);
> > +
> > + if (!folio_trylock(folio))
> > + goto keep;
> > +
> > + VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>
> Why? I think we could want to migrate active pages in some use case, e.g., to
> reduce memory bandwidth?

Yeah, I will remove it.

> > +
> > + /* Relocate its contents to another node. */
> > + list_add(&folio->lru, &migrate_folios);
> > + folio_unlock(folio);
> > + continue;
> > +keep:
> > + list_add(&folio->lru, &ret_folios);
> > + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>
> Can this happen? I think this could be too much of a test? checkpatch.pl
> also warns.

Likewise, the current shrink_folio_list() does so, and I brought it into
this patch as well, but I think we can remove it here.

> > + }
> > + /* 'folio_list' is always empty here */
> > +
> > + /* Migrate folios selected for migration */
> > + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > + /* Folios that could not be migrated are still in @migrate_folios */
> > + if (!list_empty(&migrate_folios)) {
> > + /* Folios which weren't migrated go back on @folio_list */
> > + list_splice_init(&migrate_folios, folio_list);
> > + }
>
> Let's not use braces for single statement
> (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).

Hmm.. I know the convention but left it as is because of the comment.
If I remove the braces, it would have a weird alignment between the
comment line and the statement line.

> > +
> > + try_to_unmap_flush();
> > +
> > + list_splice(&ret_folios, folio_list);
>
> Can't we move remaining folios in migrate_folios to ret_folios at once?

I will see if it's possible.

> > +
> > + while (!list_empty(folio_list)) {
> > + folio = lru_to_folio(folio_list);
> > + list_del(&folio->lru);
> > + folio_putback_lru(folio);
> > + }
> > +
> > + return nr_migrated;
> > +}
> > +
> > +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> > + enum migration_mode mm,
>
> Again, I'd prefer calling this 'mode' or something other than 'mm'.
> And, it seems 'mm' is not really used in this function. It is passed to
> 'damon_pa_migrate_folio_list()' but that doesn't really use it. Can we drop
> this?

Sure. I will drop it here and rename it to "mode" where it's used.

> > + int target_nid)
> > +{
> > + int nid;
> > + unsigned int nr_migrated = 0;
>
> Let's make this match the return type of this function.

Ack. will change it to unsigned long.

> > + LIST_HEAD(node_folio_list);
> > + unsigned int noreclaim_flag;
> > +
> > + if (list_empty(folio_list))
> > + return nr_migrated;
> > +
> > + noreclaim_flag = memalloc_noreclaim_save();
> > +
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + do {
> > + struct folio *folio = lru_to_folio(folio_list);
> > +
> > + if (nid == folio_nid(folio)) {
> > + folio_clear_active(folio);
>
> I think this was necessary for demotion, but now this should be removed since
> this function is no more for demotion but for migrating random pages, right?

Yeah, it can be removed because we do migration instead of demotion,
but I need to make sure if it doesn't change the performance evaluation
results.

> > + list_move(&folio->lru, &node_folio_list);
> > + continue;
> > + }
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + } while (!list_empty(folio_list));
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > +
> > + memalloc_noreclaim_restore(noreclaim_flag);
> > +
> > + return nr_migrated;
> > +}
> > +
> > static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > enum migration_mode mm)
> > {
> > @@ -247,7 +379,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > folio_test_clear_young(folio);
> > if (!folio_isolate_lru(folio))
> > goto put_folio;
> > - if (folio_test_unevictable(folio))
> > + /*
> > + * Since unevictable folios can be demoted or promoted,
>
> Let's use the term 'migrated' instead of 'demoted' or 'promoted'.

Ack.

> > + * unevictable test is needed only for pageout.
> > + */
> > + if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
> > folio_putback_lru(folio);
> > else
> > list_add(&folio->lru, &folio_list);
> > @@ -258,6 +394,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > case MIG_PAGEOUT:
> > applied = reclaim_pages(&folio_list);
> > break;
> > + case MIG_MIGRATE_COLD:
> > + applied = damon_pa_migrate_pages(&folio_list, mm,
> > + s->target_nid);
> > + break;
> > default:
> > /* Unexpected migration mode. */
> > return 0;
> > @@ -314,6 +454,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> > return damon_pa_mark_accessed(r, scheme);
> > case DAMOS_LRU_DEPRIO:
> > return damon_pa_deactivate_pages(r, scheme);
> > + case DAMOS_MIGRATE_COLD:
> > + return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
> > case DAMOS_STAT:
> > break;
> > default:
> > @@ -334,6 +476,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> > return damon_hot_score(context, r, scheme);
> > case DAMOS_LRU_DEPRIO:
> > return damon_cold_score(context, r, scheme);
> > + case DAMOS_MIGRATE_COLD:
> > + return damon_cold_score(context, r, scheme);
> > default:
> > break;
> > }
> > diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> > index 1a30ea82c890..18b7d054c748 100644
> > --- a/mm/damon/sysfs-schemes.c
> > +++ b/mm/damon/sysfs-schemes.c
> > @@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> > "nohugepage",
> > "lru_prio",
> > "lru_deprio",
> > + "migrate_cold",
> > "stat",
> > };
> >
> > @@ -1659,6 +1660,9 @@ static ssize_t target_nid_store(struct kobject *kobj,
> > struct damon_sysfs_scheme, kobj);
> > int err = 0;
> >
> > + if (scheme->action != DAMOS_MIGRATE_COLD)
> > + return -EINVAL;
> > +
>
> I think a user could set target_nid first, and then the action. So I think
> this should not return an error?

Makes sense. I will drop this check.

Thanks,
Honggyu

> > /* TODO: error handling for target_nid range. */
> > err = kstrtoint(buf, 0, &scheme->target_nid);
> >
> > --
> > 2.34.1
> >
> >
>
>
> Thanks,
> SJ

2024-04-08 13:46:18

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL

Hi Gregory,

On Fri, 5 Apr 2024 12:56:14 -0400 Gregory Price <[email protected]> wrote:
> On Fri, Apr 05, 2024 at 03:08:49PM +0900, Honggyu Kim wrote:
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > 1. YCSB zipfian distribution read only workload
> > memory pressure with cold memory on node0 with 512GB of local DRAM.
> > =============+================================================+=========
> > | cold memory occupied by mmap and memset |
> > | 0G 440G 450G 460G 470G 480G 490G 500G |
> > =============+================================================+=========
> > Execution time normalized to DRAM-only values | GEOMEAN
> > -------------+------------------------------------------------+---------
> > DRAM-only | 1.00 - - - - - - - | 1.00
> > CXL-only | 1.22 - - - - - - - | 1.22
> > default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
> > DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
> > =============+================================================+=========
> > CXL usage of redis-server in GB | AVERAGE
> > -------------+------------------------------------------------+---------
> > DRAM-only | 0.0 - - - - - - - | 0.0
> > CXL-only | 52.6 - - - - - - - | 52.6
> > default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
> > DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
> > =============+================================================+=========
> >
> > Each test result is based on the execution environment as follows.
> >
> > DRAM-only : redis-server uses only local DRAM memory.
> > CXL-only : redis-server uses only CXL memory.
> > default : default memory policy(MPOL_DEFAULT).
> > numa balancing disabled.
> > DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
> > DAMOS_MIGRATE_HOT for CXL nodes.
> >
> > The above result shows the "default" execution time goes up as the size
> > of cold memory is increased from 440G to 500G because the more cold
> > memory is used, the more CXL memory is used for the target redis
> > workload, which increases the execution time.
> >
> > However, the "DAMON tiered" result shows less slowdown because the
> > DAMOS_MIGRATE_COLD action on the DRAM node proactively demotes
> > pre-allocated cold memory to the CXL node, and this freed space in DRAM
> > increases the chance to allocate hot or warm pages of redis-server on
> > the fast DRAM node.
> > Moreover, DAMOS_MIGRATE_HOT action at CXL node also promotes hot pages
> > of redis-server to DRAM node actively.
> >
> > As a result, more of redis-server's memory stays in the DRAM node
> > compared to the "default" memory policy, and this brings the performance
> > improvement.
> >
> > The following result of latest distribution workload shows similar data.
> >
> > 2. YCSB latest distribution read only workload
> > memory pressure with cold memory on node0 with 512GB of local DRAM.
> > =============+================================================+=========
> > | cold memory occupied by mmap and memset |
> > | 0G 440G 450G 460G 470G 480G 490G 500G |
> > =============+================================================+=========
> > Execution time normalized to DRAM-only values | GEOMEAN
> > -------------+------------------------------------------------+---------
> > DRAM-only | 1.00 - - - - - - - | 1.00
> > CXL-only | 1.18 - - - - - - - | 1.18
> > default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
> > DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
> > =============+================================================+=========
> > CXL usage of redis-server in GB | AVERAGE
> > -------------+------------------------------------------------+---------
> > DRAM-only | 0.0 - - - - - - - | 0.0
> > CXL-only | 52.6 - - - - - - - | 52.6
> > default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
> > DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
> > =============+================================================+=========
> >
> > In summary of both results, our evaluation shows that "DAMON tiered"
> > memory management reduces the performance slowdown compared to the
> > "default" memory policy from 17~18% to 4~5% when the system runs with
> > high memory pressure on its fast tier DRAM nodes.
> >
> > Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> > tiered memory systems run more efficiently under high memory pressures.
> >
>
> Hi,
>
> It's hard to determine from your results whether the performance
> mitigation is being caused primarily by MIGRATE_COLD freeing up space
> for new allocations, or from some combination of HOT/COLD actions
> occurring during execution but after the database has already been
> warmed up.

Thanks for the question. I didn't include all the details of the
evaluation results, but this is a chance to share more details.

I would say the mitigation comes from both. DAMOS_MIGRATE_COLD demotes
some cold data to CXL so redis can allocate more data on the fast DRAM
during launch, as the mmap+memset and redis launch take several minutes.
But it also promotes some redis data while running.

> Do you have test results which enable only DAMOS_MIGRATE_COLD actions
> but not DAMOS_MIGRATE_HOT actions? (and vice versa)
>
> The question I have is exactly how often is MIGRATE_HOT actually being
> utilized, and how much data is being moved. Testing MIGRATE_COLD only
> would at least give a rough approximation of that.

To explain this, I'd better share more test results. In the "Evaluation
Workload" section, the test sequence can be summarized as follows.

*. "Turn on DAMON."
1. Allocate cold memory(mmap+memset) at DRAM node, then make the
process sleep.
2. Launch redis-server and load prebaked snapshot image, dump.rdb.
(85GB consumed: 52GB for anon and 33GB for file cache)
3. Run YCSB to make zipfian distribution of memory accesses to
redis-server, then measure execution time.
4. Repeat 4 over 50 times to measure the average execution time for
each run.
5. Increase the cold memory size, then repeat from step 2.

I didn't want to make the evaluation too long in the cover letter, but
I have also evaluated another scenario, which lazily enabled DAMON just
before the YCSB run at step 4. I will call this test "DAMON lazy". This
is the missing part from the cover letter.

1. Allocate cold memory(mmap+memset) at DRAM node, then make the
process sleep.
2. Launch redis-server and load prebaked snapshot image, dump.rdb.
(85GB consumed: 52GB for anon and 33GB for file cache)
*. "Turn on DAMON."
4. Run YCSB to make zipfian distribution of memory accesses to
redis-server, then measure execution time.
5. Repeat 4 over 50 times to measure the average execution time for
each run.
6. Increase the cold memory size, then repeat from step 2.

In the "DAMON lazy" senario, DAMON started monitoring late so the
initial redis-server placement is same as "default", but started to
demote cold data and promote redis data just before YCSB run.

The full test result is as follows.

1. YCSB zipfian distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.22 - - - - - - - | 1.22
default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
DAMON lazy | - 1.04 1.05 1.05 1.06 1.06 1.07 1.07 | 1.06
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
DAMON lazy | - 2.9 3.1 3.7 4.7 6.6 8.2 9.7 | 5.6
=============+================================================+=========
Migration size in GB by DAMOS_MIGRATE_COLD(demotion) and |
DAMOS_MIGRATE_HOT(promotion) | AVERAGE
-------------+------------------------------------------------+---------
DAMON tiered | |
- demotion | - 522 510 523 520 513 558 558 | 529
- promotion | - 0.1 1.3 6.2 8.1 7.2 22 17 | 8.8
DAMON lazy | |
- demotion | - 288 277 322 343 315 312 320 | 311
 - promotion | - 33 44 41 55 73 89 101 | 62
=============+================================================+=========

I have included the "DAMON lazy" result and also the migration sizes from
the new DAMOS migrate actions. Please note that the demotion size is way
higher than the promotion size because the promotion target is only redis
data, but the demotion target includes the huge cold memory allocated by
mmap + memset (there could be some ping-pong issue, though).

As you mentioned, the "DAMON tiered" case gets more benefit because new
redis allocations go to DRAM more than in "default", but it also benefits
from promotion when it is under higher memory pressure, as shown in the
490G and 500G cases. It promotes 22GB and 17GB of redis data to DRAM
from CXL.

In the case of "DAMON lazy", it shows a larger promotion size as expected,
and it increases as memory pressure goes higher from left to right.

I will share the "latest" workload result as well; it shows a similar
tendency.

2. YCSB latest distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.18 - - - - - - - | 1.18
default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
DAMON lazy | - 1.05 1.05 1.06 1.06 1.07 1.06 1.07 | 1.06
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
DAMON lazy | - 5.3 4.1 3.9 6.4 8.8 10.1 11.3 | 7.1
=============+================================================+=========
Migration size in GB by DAMOS_MIGRATE_COLD(demotion) and |
DAMOS_MIGRATE_HOT(promotion) | AVERAGE
-------------+------------------------------------------------+---------
DAMON tiered | |
- demotion | - 493 478 487 516 510 540 512 | 505
- promotion | - 0.1 0.2 8.2 5.6 4.0 5.9 29 | 7.5
DAMON lazy | |
- demotion | - 315 318 293 290 308 322 286 | 305
- promotion | - 36 45 38 56 74 91 99 | 63
=============+================================================+=========

> Additionally, do you have any data on workloads that exceed the capacity
> of the DRAM tier? Here you say you have 512GB of local DRAM, but only
> test a workload that caps out at 500G. Have you run a test of, say,
> 550GB to see the effect of DAMON HOT/COLD migration actions when DRAM
> capacity is exceeded?

I didn't want to remove DRAM from my server, so I kept using 512GB of
DRAM, but I couldn't make a single workload that consumes more than the
DRAM size.

I wanted to use a more realistic workload rather than micro benchmarks.
And the core concept of this test is to cover realistic scenarios with a
system-wide view. I think if the system has 512GB of local DRAM, then it
wouldn't be possible to make the entire 512GB of DRAM hot, and it'd have
some amount of cold memory, which can be the target of demotion. Then we
can find some workload that is actively used and promote it as much as
possible. That's why I made the promotion policy aggressive.

> Can you also provide the DRAM-only results for each test? Presumably,
> as workload size increases from 440G to 500G, the system probably starts
> using some amount of swap/zswap/whatever. It would be good to know how
> this system compares to swap small amounts of overflow.

It looks like my explanation didn't come across correctly. The sizes from
440GB to 500GB are for pre-allocated cold data that puts memory pressure
on the system, so that redis-server cannot be fully allocated on fast
DRAM and is then partially allocated on CXL memory as well.

And my evaluation environment doesn't have swap space, to keep the focus
on migration rather than swap.

>
> ~Gregory

I hope my explanation is helpful for you to understand. Please let me
know if you have more questions.

Thanks,
Honggyu


2024-04-08 18:37:33

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim <[email protected]> wrote:

> On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <[email protected]> wrote:
> > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:
[...]
> > > Here is one of the example usage of this 'migrate_cold' action.
> > >
> > > $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> > > $ cat contexts/<N>/schemes/<N>/action
> > > migrate_cold
> > > $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> > > $ echo commit > state
> > > $ numactl -p 0 ./hot_cold 500M 600M &
> > > $ numastat -c -p hot_cold
> > >
> > > Per-node process memory usage (in MBs)
> > > PID Node 0 Node 1 Node 2 Total
> > > -------------- ------ ------ ------ -----
> > > 701 (hot_cold) 501 0 601 1101
> > >
> > > Since there are some common routines with pageout, many functions have
> > > similar logics between pageout and migrate cold.
> > >
> > > damon_pa_migrate_folio_list() is a minimized version of
> > > shrink_folio_list(), but it's minified only for demotion.
> >
> > MIGRATE_COLD is not only for demotion, right? I think the last two words are
> > better removed to reduce unnecessary confusion.
>
> You mean the last two sentences? I will remove them if you feel it's
> confusing.

Yes. My real intended suggestion was 's/only for demotion/only for
migration/', but entirely removing the sentences is also ok for me.

>
> > >
> > > Signed-off-by: Honggyu Kim <[email protected]>
> > > Signed-off-by: Hyeongtak Ji <[email protected]>
> > > ---
> > > include/linux/damon.h | 2 +
> > > mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> > > mm/damon/sysfs-schemes.c | 4 ++
> > > 3 files changed, 151 insertions(+), 1 deletion(-)
[...]
> > > --- a/mm/damon/paddr.c
> > > +++ b/mm/damon/paddr.c
[...]
> > > +{
> > > + unsigned int nr_succeeded;
> > > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > > +
> >
> > I personally prefer not having empty lines in the middle of variable
> > declarations/definitions. Could we remove this empty line?
>
> I can remove it, but I would like to have more discussion about this
> issue. The current implementation allows only a single migration
> > target with "target_nid", but users might want to provide fallback
> migration target nids.
>
> > For example, if more than two CXL nodes exist in the system, users might
> > want to migrate cold pages to any of the CXL nodes. In such cases, we
> > might have to make "target_nid" accept comma-separated node IDs. A
> > nodemask could be better, but we should provide a way to change the
> > scanning order.
>
> > I would like to hear what you think about this.

Good point. I think we could later extend the sysfs file to receive
comma-separated numbers, or even a mask. For simplicity, adding sysfs
files dedicated to the different input formats could also be an option
(e.g., target_nids_list, target_nids_mask). But starting from this single
node as is now looks ok to me.

[...]
> > > + /* 'folio_list' is always empty here */
> > > +
> > > + /* Migrate folios selected for migration */
> > > + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > > + /* Folios that could not be migrated are still in @migrate_folios */
> > > + if (!list_empty(&migrate_folios)) {
> > > + /* Folios which weren't migrated go back on @folio_list */
> > > + list_splice_init(&migrate_folios, folio_list);
> > > + }
> >
> > Let's not use braces for single statement
> > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
>
> Hmm.. I know the convention but left it as is because of the comment.
> If I remove the braces, it would have a weird alignment between the
> comment line and the statement line.

I don't really hate such alignment. But if you don't like it, how about
moving the comment out of the if statement? Having one comment above a
one-line if statement looks not bad to me.
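
That is, something like:

	/* Folios which weren't migrated go back on @folio_list */
	if (!list_empty(&migrate_folios))
		list_splice_init(&migrate_folios, folio_list);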

>
> > > +
> > > + try_to_unmap_flush();
> > > +
> > > + list_splice(&ret_folios, folio_list);
> >
> > Can't we move remaining folios in migrate_folios to ret_folios at once?
>
> I will see if it's possible.

Thank you. Not a strict request, though.

[...]
> > > + nid = folio_nid(lru_to_folio(folio_list));
> > > + do {
> > > + struct folio *folio = lru_to_folio(folio_list);
> > > +
> > > + if (nid == folio_nid(folio)) {
> > > + folio_clear_active(folio);
> >
> > I think this was necessary for demotion, but now this should be removed since
> > this function is no more for demotion but for migrating random pages, right?
>
> Yeah, it can be removed because we do migration instead of demotion,
> but I need to make sure if it doesn't change the performance evaluation
> results.

Yes, please ensure the test results are valid :)


Thanks,
SJ

[...]

2024-04-09 09:57:20

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

Hi SeongJae,

On Mon, 8 Apr 2024 10:52:28 -0700 SeongJae Park <[email protected]> wrote:
> On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim <[email protected]> wrote:
>
> > On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <[email protected]> wrote:
> > > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:
> [...]
> > > > Here is one of the example usage of this 'migrate_cold' action.
> > > >
> > > > $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> > > > $ cat contexts/<N>/schemes/<N>/action
> > > > migrate_cold
> > > > $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> > > > $ echo commit > state
> > > > $ numactl -p 0 ./hot_cold 500M 600M &
> > > > $ numastat -c -p hot_cold
> > > >
> > > > Per-node process memory usage (in MBs)
> > > > PID Node 0 Node 1 Node 2 Total
> > > > -------------- ------ ------ ------ -----
> > > > 701 (hot_cold) 501 0 601 1101
> > > >
> > > > Since there are some common routines with pageout, many functions have
> > > > similar logics between pageout and migrate cold.
> > > >
> > > > damon_pa_migrate_folio_list() is a minimized version of
> > > > shrink_folio_list(), but it's minified only for demotion.
> > >
> > > MIGRATE_COLD is not only for demotion, right? I think the last two words are
> > > better removed to reduce unnecessary confusion.
> >
> > You mean the last two sentences? I will remove them if you feel it's
> > confusing.
>
> Yes. My real intended suggestion was 's/only for demotion/only for
> migration/', but entirely removing the sentences is also ok for me.

Ack.

> >
> > > >
> > > > Signed-off-by: Honggyu Kim <[email protected]>
> > > > Signed-off-by: Hyeongtak Ji <[email protected]>
> > > > ---
> > > > include/linux/damon.h | 2 +
> > > > mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> > > > mm/damon/sysfs-schemes.c | 4 ++
> > > > 3 files changed, 151 insertions(+), 1 deletion(-)
> [...]
> > > > --- a/mm/damon/paddr.c
> > > > +++ b/mm/damon/paddr.c
> [...]
> > > > +{
> > > > + unsigned int nr_succeeded;
> > > > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > > > +
> > >
> > > I personally prefer not having empty lines in the middle of variable
> > > declarations/definitions. Could we remove this empty line?
> >
> > I can remove it, but I would like to have more discussion about this
> > issue. The current implementation allows only a single migration
> > target with "target_nid", but users might want to provide fall back
> > migration target nids.
> >
> > For example, if more than two CXL nodes exist in the system, users might
> > want to migrate cold pages to any CXL nodes. In such cases, we might
> > have to make "target_nid" accept comma separated node IDs. nodemask can
> > be better but we should provide a way to change the scanning order.
> >
> > I would like to hear how you think about this.
>
> Good point. I think we could later extend the sysfs file to receive
> comma-separated numbers, or even a mask. For simplicity, adding sysfs
> files dedicated to the different input formats could also be an option
> (e.g., target_nids_list, target_nids_mask). But starting from this single
> node as is now looks ok to me.

If you think we can start from a single node, then I will keep it as is.
But are you okay if I change the same 'target_nid' to accept
comma-separated numbers later? Or do you want to introduce another knob
such as 'target_nids_list'? What about renaming 'target_nid' to
'target_nids' in the first place?

> [...]
> > > > + /* 'folio_list' is always empty here */
> > > > +
> > > > + /* Migrate folios selected for migration */
> > > > + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > > > + /* Folios that could not be migrated are still in @migrate_folios */
> > > > + if (!list_empty(&migrate_folios)) {
> > > > + /* Folios which weren't migrated go back on @folio_list */
> > > > + list_splice_init(&migrate_folios, folio_list);
> > > > + }
> > >
> > > Let's not use braces for single statement
> > > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
> >
> > Hmm.. I know the convention but left it as is because of the comment.
> > If I remove the braces, it would have a weird alignment between the
> > comment line and the statement line.
>
> I don't really hate such alignment. But if you don't like it, how about
> moving the comment out of the if statement? Having one comment above a
> one-line if statement looks not bad to me.

Ack. I will manage this in the next revision.

> >
> > > > +
> > > > + try_to_unmap_flush();
> > > > +
> > > > + list_splice(&ret_folios, folio_list);
> > >
> > > Can't we move remaining folios in migrate_folios to ret_folios at once?
> >
> > I will see if it's possible.
>
> Thank you. Not a strict request, though.
>
> [...]
> > > > + nid = folio_nid(lru_to_folio(folio_list));
> > > > + do {
> > > > + struct folio *folio = lru_to_folio(folio_list);
> > > > +
> > > > + if (nid == folio_nid(folio)) {
> > > > + folio_clear_active(folio);
> > >
> > > I think this was necessary for demotion, but now this should be removed since
> > > this function is no more for demotion but for migrating random pages, right?
> >
> > Yeah, it can be removed because we do migration instead of demotion,
> > but I need to make sure that removing it doesn't change the
> > performance evaluation results.
>
> Yes, please ensure the test results are valid :)

Sure. Thanks for your detailed review!

Please note that I will be out of office this week, so I won't be able
to answer quickly.

Thanks,
Honggyu

>
> Thanks,
> SJ
>
> [...]
>

2024-04-09 10:00:14

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL

On Mon, 8 Apr 2024 22:41:04 +0900 Honggyu Kim <[email protected]> wrote:
[...]
> To explain this, I'd better share more test results. In the
> "Evaluation Workload" section, the test sequence can be summarized as
> follows.
>
> *. "Turn on DAMON."
> 1. Allocate cold memory (mmap+memset) on the DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load the prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
> 3. Run YCSB to generate a zipfian distribution of memory accesses to
> redis-server, then measure the execution time.
> 4. Repeat 4 over 50 times to measure the average execution time for
> each run.

Sorry, "Repeat 4 over 50 times" is incorrect. This should be "Repeat 3
over 50 times".

> 5. Increase the cold memory size, then repeat from step 2.
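
For concreteness, step 1's cold-memory allocator can be as simple as
the sketch below (a sketch only; the DRAM node ID, the default size,
and the MPOL_BIND-based placement are assumptions; build with -lnuma):

	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>
	#include <numaif.h>

	int main(int argc, char *argv[])
	{
		size_t gib = argc > 1 ? (size_t)atol(argv[1]) : 440;
		size_t len = gib << 30;
		unsigned long nodemask = 1UL << 0;	/* DRAM node 0, assumed */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		/* pin the allocation to the fast (DRAM) tier */
		if (mbind(p, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0))
			return 1;
		memset(p, 1, len);	/* fault every page in once */
		pause();		/* then idle: the data is now cold */
		return 0;
	}
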
>
> I didn't want to make the evaluation too long in the cover letter, but
> I have also evaluated another scenario, which lazily enables DAMON just
> before the YCSB run at step 4. I will call this test "DAMON lazy".
> This part is missing from the cover letter.
>
> 1. Allocate cold memory (mmap+memset) on the DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load the prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
> *. "Turn on DAMON."
> 4. Run YCSB to generate a zipfian distribution of memory accesses to
> redis-server, then measure the execution time.
> 5. Repeat 4 over 50 times to measure the average execution time for
> each run.
> 6. Increase the cold memory size, then repeat from step 2.
>
> In the "DAMON lazy" scenario, DAMON starts monitoring late, so the
> initial redis-server placement is the same as "default", but it starts
> to demote cold data and promote redis data just before the YCSB run.
[...]

Thanks,
Honggyu

2024-04-09 16:44:24

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

Hi Honggyu,

On Tue, 9 Apr 2024 18:54:14 +0900 Honggyu Kim <[email protected]> wrote:
> On Mon, 8 Apr 2024 10:52:28 -0700 SeongJae Park <[email protected]> wrote:
> > On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim <[email protected]> wrote:
> > > On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <[email protected]> wrote:
> > > > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <[email protected]> wrote:
[...]
> > > I can remove it, but I would like to have more discussion about this
> > > issue. The current implementation allows only a single migration
> > > target with "target_nid", but users might want to provide fallback
> > > migration target nids.
> > >
> > > For example, if more than two CXL nodes exist in the system, users might
> > > want to migrate cold pages to any of the CXL nodes. In such cases, we
> > > might have to make "target_nid" accept comma-separated node IDs. A
> > > nodemask could be better, but we should provide a way to change the
> > > scanning order.
> > >
> > > I would like to hear how you think about this.
> >
> > Good point. I think we could later extend the sysfs file to receive the
> > comma-separated numbers, or even mask. For simplicity, adding sysfs files
> > dedicated for the different format of inputs could also be an option (e.g.,
> > target_nids_list, target_nids_mask). But starting from this single node as is
> > now looks ok to me.
>
> If you think we can start from a single node, then I will keep it as is.
> But are you okay if I later change the same 'target_nid' file to accept
> comma-separated numbers? Or do you want to introduce another knob such
> as 'target_nids_list'? What about renaming 'target_nid' to
> 'target_nids' in the first place?

I have no strong concern or opinion about this at the moment. Please feel
free to rename it to 'target_nids' if you think that's better.

[...]
> Please note that I will be out of office this week, so I won't be able
> to answer quickly.

No problem, I hope you take your time off and enjoy it :)


Thanks,
SJ

[...]

2024-04-10 00:00:37

by Gregory Price

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL

On Mon, Apr 08, 2024 at 10:41:04PM +0900, Honggyu Kim wrote:
> Hi Gregory,
>
> On Fri, 5 Apr 2024 12:56:14 -0400 Gregory Price <[email protected]> wrote:
> > Do you have test results which enable only DAMOS_MIGRATE_COLD actions
> > but not DAMOS_MIGRATE_HOT actions? (and vice versa)
> >
> > The question I have is exactly how often MIGRATE_HOT is actually being
> > utilized, and how much data is being moved. Testing MIGRATE_COLD only
> > would at least give a rough approximation of that.
>
> To explain this, I'd better share more test results. In the
> "Evaluation Workload" section, the test sequence can be summarized as
> follows.
>
> *. "Turn on DAMON."
> 1. Allocate cold memory (mmap+memset) on the DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load the prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)

Aha! I see now: you are allocating memory to ensure the real workload
(redis-server) pressures the DRAM tier and causes "spillage" to the CXL
tier, and then measuring the overhead in different scenarios.

I would still love to know what results a demote-only system would
produce, mostly because it would very clearly demonstrate the value of
the demote+promote system when the system is memory-pressured.

Given the additional results below, it seems a demote-only system would
likely trend toward CXL-only behavior, and so this provides affirmative
support for the promotion logic.

Just another datum that is useful and paints a more complete picture.

> I didn't want to make the evaluation too long in the cover letter, but
> I have also evaluated another scenario, which lazily enables DAMON just
> before the YCSB run at step 4. I will call this test "DAMON lazy".
> This part is missing from the cover letter.
>
> 1. Allocate cold memory (mmap+memset) on the DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load the prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
> *. "Turn on DAMON."
>
> In the "DAMON lazy" scenario, DAMON starts monitoring late, so the
> initial redis-server placement is the same as "default", but it starts
> to demote cold data and promote redis data just before the YCSB run.
>

This is excellent and definitely demonstrates part of the picture I was
alluding to, thank you for the additional data!

>
> I have included the "DAMON lazy" result and also the migration sizes
> from the new DAMOS migrate actions. Please note that the demotion size
> is much higher than the promotion size because the promotion target is
> only the redis data, while the demotion target includes the huge cold
> memory allocated by mmap + memset. (There could be some ping-pong
> issue, though.)
>
> As you mentioned, the "DAMON tiered" case benefits more because new
> redis allocations go to DRAM more often than in "default", but it also
> benefits from promotion under higher memory pressure, as shown in the
> 490G and 500G cases: it promotes 22GB and 17GB of redis data from CXL
> to DRAM.

I think a better way of saying this is that "DAMON tiered" more
effectively mitigates the effect of memory pressure on the faster tier
before spillage occurs, while "DAMON lazy" demonstrates the expected
performance of the system after memory pressure outruns the demotion
logic, so you wind up with hot data stuck in the slow tier.

There are some out there who would simply say "just demote more
aggressively", so this is useful information for the discussion.

A swing of only +/- ~2% despite greater memory migration is an
excellent result.

> > Can you also provide the DRAM-only results for each test? Presumably,
> > as workload size increases from 440G to 500G, the system probably starts
> > using some amount of swap/zswap/whatever. It would be good to know how
> > this system compares to swapping out small amounts of overflow.
>
> It looks like my explanation wasn't clear. The sizes from 440GB to
> 500GB are for the pre-allocated cold data, which puts memory pressure
> on the system so that redis-server cannot be fully allocated on fast
> DRAM and is then partially allocated on CXL memory as well.
>

Yes, sorry for the misunderstanding. This makes it much clearer.

>
> I hope my explanation helps clarify things. Please let me
> know if you have more questions.
>

Excellent work, exciting results! Thank you for the additional answers
:]

~Gregory