On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> At times there is a need to regularly monitor vm counters while we
> reproduce some issue, or it could be as simple as gathering some system
> statistics when we run some scenario, and every time we would like to
> start from the beginning.
> The current steps are:
> Dump /proc/vmstat
> Run some scenario
> Dump /proc/vmstat again
> Generate some data or graph
> reboot and repeat again
You can subtract the first vmstat dump from the second to get the
event delta for the scenario run. That's what I do, and I'd assume
most people are doing. Am I missing something?
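(For illustration, a rough userspace sketch of that subtract-the-dumps
approach; a hypothetical Python script, not anything from the kernel tree
or the proposed patch:)

#!/usr/bin/env python3
# Hypothetical sketch: snapshot /proc/vmstat before and after a scenario
# and print the per-counter delta, so no reboot or in-kernel reset is needed.

def read_vmstat():
    # /proc/vmstat is one "name value" pair per line
    with open("/proc/vmstat") as f:
        return {name: int(value) for name, value in
                (line.split() for line in f)}

before = read_vmstat()
input("Run the scenario, then press Enter... ")
after = read_vmstat()

for name, end in after.items():
    delta = end - before.get(name, 0)
    if delta:
        print(f"{delta:>12}  {name}")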
On 2021-03-01 20:41, Johannes Weiner wrote:
> On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> At times there is a need to regularly monitor vm counters while we
>> reproduce some issue, or it could be as simple as gathering some system
>> statistics when we run some scenario, and every time we would like to
>> start from the beginning.
>> The current steps are:
>> Dump /proc/vmstat
>> Run some scenario
>> Dump /proc/vmstat again
>> Generate some data or graph
>> reboot and repeat again
>
> You can subtract the first vmstat dump from the second to get the
> event delta for the scenario run. That's what I do, and I'd assume
> most people are doing. Am I missing something?
Thanks so much for your comments.
Yes, in most cases that works.
But I guess there are times when we need to compare against fresh data
(just like after a reboot), at least for some of the counters.
Suppose we wanted to monitor pgalloc_normal and pgfree.
Or, suppose we want to monitor until a field becomes non-zero.
Or, check how certain values change compared to a fresh reboot.
Or, suppose we want to reset all counters after boot and start capturing
fresh stats.
Some of the counters could be growing too large and too fast. Will there
be a chance of overflow?
Then resetting them using this could help without rebooting.
Suppose the system came back from hibernation, and we want to reset all
counters again (to capture fresh data)?
Here I am sharing one output (from qemu-arm32 with 256MB RAM) just to
indicate what could be changed:
Scenario: Generate OOM kill case and check oom_kill counter
BEFORE AFTER /proc/vmstat
------ ------ -----------------------
49991 49916 nr_free_pages
4467 4481 nr_zone_inactive_anon
68 68 nr_zone_active_anon
3189 3067 nr_zone_inactive_file
223 444 nr_zone_active_file
0 0 nr_zone_unevictable
122 136 nr_zone_write_pending
0 0 nr_mlock
139 139 nr_page_table_pages
0 0 nr_bounce
0 0 nr_zspages
4032 4032 nr_free_cma
4467 4481 nr_inactive_anon
68 68 nr_active_anon
3189 3067 nr_inactive_file
223 444 nr_active_file
0 0 nr_unevictable
1177 1178 nr_slab_reclaimable
1889 1889 nr_slab_unreclaimable
0 0 nr_isolated_anon
0 0 nr_isolated_file
176 163 workingset_nodes
0 0 workingset_refault_anon
3295 3369 workingset_refault_file
0 0 workingset_activate_anon
4 4 workingset_activate_file
0 0 workingset_restore_anon
4 4 workingset_restore_file
0 0 workingset_nodereclaim
4436 4436 nr_anon_pages
2636 2678 nr_mapped
3559 3645 nr_file_pages
122 136 nr_dirty
0 0 nr_writeback
0 0 nr_writeback_temp
126 126 nr_shmem
0 0 nr_shmem_hugepages
0 0 nr_shmem_pmdmapped
0 0 nr_file_hugepages
0 0 nr_file_pmdmapped
0 0 nr_anon_transparent_hugepages
1 1 nr_vmscan_write
1 1 nr_vmscan_immediate_reclaim
1024 1038 nr_dirtied
902 902 nr_written
0 0 nr_kernel_misc_reclaimable
0 0 nr_foll_pin_acquired
0 0 nr_foll_pin_released
616 616 nr_kernel_stack
10529 10533 nr_dirty_threshold
5258 5260 nr_dirty_background_threshold
50714 256 pgpgin
3932 16 pgpgout
0 0 pswpin
0 0 pswpout
86828 122 pgalloc_normal
0 0 pgalloc_movable
0 0 allocstall_normal
22 0 allocstall_movable
0 0 pgskip_normal
0 0 pgskip_movable
139594 34 pgfree
4998 155 pgactivate
5738 0 pgdeactivate
0 0 pglazyfree
82113 122 pgfault
374 2 pgmajfault
0 0 pglazyfreed
7695 0 pgrefill
2718 20 pgreuse
9261 0 pgsteal_kswapd
173 0 pgsteal_direct
12627 0 pgscan_kswapd
283 0 pgscan_direct
2 0 pgscan_direct_throttle
0 0 pgscan_anon
12910 0 pgscan_file
0 0 pgsteal_anon
9434 0 pgsteal_file
0 0 pginodesteal
7008 0 slabs_scanned
109 0 kswapd_inodesteal
16 0 kswapd_low_wmark_hit_quickly
24 0 kswapd_high_wmark_hit_quickly
43 0 pageoutrun
1 0 pgrotated
0 0 drop_pagecache
0 0 drop_slab
1 0 oom_kill
1210 0 pgmigrate_success
0 0 pgmigrate_fail
0 0 thp_migration_success
0 0 thp_migration_fail
0 0 thp_migration_split
1509 0 compact_migrate_scanned
9015 0 compact_free_scanned
3911 0 compact_isolated
0 0 compact_stall
0 0 compact_fail
0 0 compact_success
3 0 compact_daemon_wake
1509 0 compact_daemon_migrate_scanned
9015 0 compact_daemon_free_scanned
0 0 unevictable_pgs_culled
0 0 unevictable_pgs_scanned
0 0 unevictable_pgs_rescued
0 0 unevictable_pgs_mlocked
0 0 unevictable_pgs_munlocked
0 0 unevictable_pgs_cleared
0 0 unevictable_pgs_stranded
0 0 balloon_inflate
0 0 balloon_deflate
0 0 balloon_migrate
0 0 swap_ra
0 0 swap_ra_hit
0 0 nr_unstable
Thanks,
Pintu
On Tue, Mar 02, 2021 at 04:00:34PM +0530, [email protected] wrote:
> On 2021-03-01 20:41, Johannes Weiner wrote:
> > On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
> > > At times there is a need to regularly monitor vm counters while we
> > > reproduce some issue, or it could be as simple as gathering some system
> > > statistics when we run some scenario, and every time we would like to
> > > start from the beginning.
> > > The current steps are:
> > > Dump /proc/vmstat
> > > Run some scenario
> > > Dump /proc/vmstat again
> > > Generate some data or graph
> > > reboot and repeat again
> >
> > You can subtract the first vmstat dump from the second to get the
> > event delta for the scenario run. That's what I do, and I'd assume
> > most people are doing. Am I missing something?
>
> Thanks so much for your comments.
> Yes, in most cases that works.
>
> But I guess there are times when we need to compare against fresh data
> (just like after a reboot), at least for some of the counters.
> Suppose we wanted to monitor pgalloc_normal and pgfree.
Hopefully these would already be balanced out pretty well before you
run a test, or there is a risk that whatever outstanding allocations
there are can cause a large number of frees during your test that
don't match up to your recorded allocation events. Resetting to zero
doesn't eliminate the risk of such background noise.
> Or, suppose we want to monitor until a field becomes non-zero.
> Or, check how certain values change compared to a fresh reboot.
> Or, suppose we want to reset all counters after boot and start capturing
> fresh stats.
Again, there simply is no mathematical difference between
reset events to 0
run test
look at events - 0
and
read events baseline
run test
look at events - baseline
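(For example, with made-up numbers: if pgfault reads 1000 at the baseline
and 1500 after the test, both procedures report the same 500 faults; the
subtraction just happens in userspace instead of in the kernel.)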
> Some of the counters could be growing too large and too fast. Will there
> be a chance of overflow?
> Then resetting them using this could help without rebooting.
Overflows are just a fact of life on 32 bit systems. However, they can
also be trivially handled - you can always subtract a ulong start
state from a ulong end state and get a reliable delta of up to 2^32
events, whether the end state has overflowed or not.
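(To make that concrete, a toy Python sketch with made-up values, reducing
modulo 2^32 the way a 32-bit unsigned long does on its own:)

# Toy illustration: on 32-bit, counters wrap at 2^32, but (end - start)
# taken modulo 2^32 still yields the true delta across the wrap.
MOD32 = 2 ** 32

def delta32(start, end):
    return (end - start) % MOD32

start = 0xFFFFFF00                 # counter read shortly before wrapping
end = (start + 500) % MOD32        # 500 events later, counter has wrapped to 244
assert delta32(start, end) == 500  # the delta is still correct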
The bottom line is that this patch adds a minor convenience for
something that can already be done in userspace. But the downside is
that there would be one more possible source of noise
for kernel developers to consider when looking at a bug report. Plus
the extra code and user interface that need to be maintained.
I don't think we should merge this patch.
On 2021-03-02 21:26, Johannes Weiner wrote:
> On Tue, Mar 02, 2021 at 04:00:34PM +0530, [email protected] wrote:
>> On 2021-03-01 20:41, Johannes Weiner wrote:
>> > On Mon, Mar 01, 2021 at 04:19:26PM +0530, Pintu Kumar wrote:
>> > > At times there is a need to regularly monitor vm counters while we
>> > > reproduce some issue, or it could be as simple as gathering some system
>> > > statistics when we run some scenario, and every time we would like to
>> > > start from the beginning.
>> > > The current steps are:
>> > > Dump /proc/vmstat
>> > > Run some scenario
>> > > Dump /proc/vmstat again
>> > > Generate some data or graph
>> > > reboot and repeat again
>> >
>> > You can subtract the first vmstat dump from the second to get the
>> > event delta for the scenario run. That's what I do, and I'd assume
>> > most people are doing. Am I missing something?
>>
>> Thanks so much for your comments.
>> Yes, in most cases that works.
>>
>> But I guess there are times when we need to compare against fresh data
>> (just like after a reboot), at least for some of the counters.
>> Suppose we wanted to monitor pgalloc_normal and pgfree.
>
> Hopefully these would already be balanced out pretty well before you
> run a test, or there is a risk that whatever outstanding allocations
> there are can cause a large number of frees during your test that
> don't match up to your recorded allocation events. Resetting to zero
> doesn't eliminate the risk of such background noise.
>
>> Or, suppose we want to monitor until a field becomes non-zero.
>> Or, check how certain values change compared to a fresh reboot.
>> Or, suppose we want to reset all counters after boot and start capturing
>> fresh stats.
>
> Again, there simply is no mathematical difference between
>
> reset events to 0
> run test
> look at events - 0
>
> and
>
> read events baseline
> run test
> look at events - baseline
>
>> Some of the counters could be growing too large and too fast. Will there
>> be a chance of overflow?
>> Then resetting them using this could help without rebooting.
>
> Overflows are just a fact of life on 32 bit systems. However, they can
> also be trivially handled - you can always subtract a ulong start
> state from a ulong end state and get a reliable delta of up to 2^32
> events, whether the end state has overflowed or not.
>
> The bottom line is that this patch adds a minor convenience for
> something that can already be done in userspace. But the downside is
> that there would be one more possible source of noise
> for kernel developers to consider when looking at a bug report. Plus
> the extra code and user interface that need to be maintained.
>
> I don't think we should merge this patch.
Okay, no problem. Thank you so much for your review and feedback.
Yes, I agree the benefits are minor, but I thought it might be useful for
someone somewhere.
I worked on it and found it easy and convenient, and thus proposed it.
If others feel it is not important, I am okay to drop it.
Thanks once again to all who helped to review it.