2021-08-24 15:00:20

by Oliver Sang

Subject: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression



Greetings,

FYI, we noticed a -99.3% regression of stress-ng.sigio.ops_per_sec due to commit:


commit: 3b844826b6c6affa80755254da322b017358a2f4 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with following parameters:

nr_threads: 100%
disk: 1HDD
testtime: 60s
class: interrupt
test: sigio
cpufreq_governor: performance
ucode: 0x5003006




If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
interrupt/gcc-9/performance/1HDD/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/sigio/stress-ng/60s/0x5003006

commit:
614cb2751d ("Merge tag 'trace-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
3b844826b6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")

614cb2751d315085 3b844826b6c6affa80755254da3
---------------- ---------------------------
%stddev %change %stddev
\ | \
5.253e+08 -99.3% 3459947 ? 14% stress-ng.sigio.ops
8754883 -99.3% 57665 ? 14% stress-ng.sigio.ops_per_sec
3500 ? 22% +1377.4% 51720 stress-ng.time.file_system_inputs
1.035e+09 -100.0% 35652 ? 80% stress-ng.time.involuntary_context_switches
21237 -3.6% 20482 stress-ng.time.minor_page_faults
9210 -99.4% 51.17 ? 16% stress-ng.time.percent_of_cpu_this_job_got
5222 -99.4% 29.88 ? 16% stress-ng.time.system_time
508.33 -99.5% 2.36 ? 14% stress-ng.time.user_time
4493592 ? 18% -86.5% 608367 ? 19% stress-ng.time.voluntary_context_switches
3.504e+08 +1637.4% 6.088e+09 cpuidle..time
460.83 -68.6% 144.72 ? 8% pmeter.Average_Active_Power
3491 +164.3% 9227 uptime.idle
7.57 +1211.5% 99.27 iostat.cpu.idle
84.16 -99.3% 0.59 ? 12% iostat.cpu.system
8.26 -98.3% 0.14 ? 5% iostat.cpu.user
3446 ? 11% +467.8% 19571 numa-meminfo.node0.Active
1772 ? 11% -28.2% 1272 ? 19% numa-meminfo.node0.Active(anon)
1673 ? 24% +993.3% 18298 numa-meminfo.node0.Active(file)
4372 ? 9% +369.6% 20531 meminfo.Active
2026 ? 20% +802.9% 18298 meminfo.Active(file)
1782 ? 22% -99.2% 15.00 meminfo.Buffers
275141 +20.7% 331981 meminfo.VmallocUsed
376983 ? 27% -69.4% 115385 ? 11% numa-numastat.node0.local_node
377025 ? 27% -69.4% 115420 ? 11% numa-numastat.node0.numa_hit
476092 ? 42% -68.4% 150588 ? 10% numa-numastat.node1.local_node
476118 ? 42% -68.4% 150603 ? 10% numa-numastat.node1.numa_hit
5.03 +94.3 99.32 mpstat.cpu.all.idle%
0.34 ? 6% -0.3 0.01 ? 14% mpstat.cpu.all.irq%
0.01 ? 16% -0.0 0.01 ? 47% mpstat.cpu.all.soft%
86.12 -85.6 0.53 ? 14% mpstat.cpu.all.sys%
8.48 -8.4 0.13 ? 5% mpstat.cpu.all.usr%
443.00 ? 11% -28.1% 318.33 ? 20% numa-vmstat.node0.nr_active_anon
418.00 ? 24% +994.2% 4573 numa-vmstat.node0.nr_active_file
443.00 ? 11% -28.1% 318.33 ? 20% numa-vmstat.node0.nr_zone_active_anon
418.00 ? 24% +994.2% 4573 numa-vmstat.node0.nr_zone_active_file
1420363 ? 20% -18.1% 1163862 ? 20% numa-vmstat.node0.numa_hit
1413276 ? 20% -18.1% 1156993 ? 20% numa-vmstat.node0.numa_local
7.00 +1314.3% 99.00 vmstat.cpu.id
83.83 -100.0% 0.00 vmstat.cpu.sy
42.00 ? 13% +842.9% 396.00 vmstat.io.bi
668.83 ? 4% -100.0% 0.00 vmstat.io.bo
1785 ? 22% -99.2% 15.00 vmstat.memory.buff
176.50 -100.0% 0.00 vmstat.procs.r
16009506 -99.9% 20670 ? 16% vmstat.system.cs
736835 ? 17% -96.2% 27683 ? 11% vmstat.system.in
8456 ? 12% -53.8% 3910 ? 23% softirqs.CPU0.RCU
7835 ? 17% -63.2% 2883 ? 22% softirqs.CPU1.RCU
6941 ? 20% -73.6% 1831 ? 20% softirqs.CPU2.RCU
6510 ? 35% -89.7% 672.17 ? 18% softirqs.CPU45.RCU
6786 ? 23% -80.4% 1331 ? 35% softirqs.CPU57.RCU
7103 ? 35% -84.6% 1095 ? 41% softirqs.CPU58.RCU
6646 ? 31% -90.7% 620.83 ? 30% softirqs.CPU87.RCU
603089 ? 17% -81.1% 113890 ? 9% softirqs.RCU
244389 -19.2% 197587 ? 4% softirqs.SCHED
506.50 ? 20% +803.1% 4574 proc-vmstat.nr_active_file
506.50 ? 20% +803.1% 4574 proc-vmstat.nr_zone_active_file
854050 ? 15% -68.7% 267735 proc-vmstat.numa_hit
853991 ? 15% -68.7% 267685 proc-vmstat.numa_local
683.00 ? 12% +885.7% 6732 proc-vmstat.pgactivate
854034 ? 15% -68.6% 267802 proc-vmstat.pgalloc_normal
327631 +1.9% 333769 proc-vmstat.pgfault
871818 ? 15% -69.4% 266460 proc-vmstat.pgfree
2760 ? 13% +836.8% 25860 proc-vmstat.pgpgin
42476 ? 4% -100.0% 0.00 proc-vmstat.pgpgout
2589 -99.2% 21.00 ? 11% turbostat.Avg_MHz
94.24 -93.5 0.72 ? 11% turbostat.Busy%
2753 +5.9% 2916 turbostat.Bzy_MHz
273602 ? 17% -95.1% 13315 ? 54% turbostat.C1
0.03 ? 11% -0.0 0.01 turbostat.C1%
5730 ? 21% +507.5% 34814 ?114% turbostat.C1E
0.03 ? 27% +0.1 0.18 ?128% turbostat.C1E%
17066 ? 6% +567.6% 113932 ? 12% turbostat.C6
5.56 +93.6 99.12 turbostat.C6%
0.62 ? 8% +284.8% 2.40 ? 23% turbostat.CPU%c1
5.14 +1785.4% 96.88 turbostat.CPU%c6
67.17 -37.0% 42.33 ? 2% turbostat.CoreTmp
47874023 ? 17% -96.3% 1782191 ? 11% turbostat.IRQ
2.19 ? 6% +886.7% 21.59 ? 30% turbostat.Pkg%pc2
0.34 ? 21% +15100.5% 52.19 ? 32% turbostat.Pkg%pc6
66.83 -36.9% 42.17 turbostat.PkgTmp
286.61 -83.7% 46.85 ? 12% turbostat.PkgWatt
145.97 -39.9% 87.72 ? 22% turbostat.RAMWatt
720.33 ? 13% -74.1% 186.50 ? 7% slabinfo.buffer_head.active_objs
720.33 ? 13% -74.1% 186.50 ? 7% slabinfo.buffer_head.num_objs
197.00 -100.0% 0.00 slabinfo.ext4_extent_status.active_objs
197.00 -100.0% 0.00 slabinfo.ext4_extent_status.num_objs
3904 -100.0% 0.00 slabinfo.ext4_groupinfo_4k.active_objs
3904 -100.0% 0.00 slabinfo.ext4_groupinfo_4k.num_objs
20.50 ?223% +504.9% 124.00 slabinfo.ext4_pending_reservation.active_objs
20.50 ?223% +504.9% 124.00 slabinfo.ext4_pending_reservation.num_objs
296.00 -100.0% 0.00 slabinfo.ext4_system_zone.active_objs
296.00 -100.0% 0.00 slabinfo.ext4_system_zone.num_objs
247.17 -100.0% 0.00 slabinfo.jbd2_revoke_table_s.active_objs
247.17 -100.0% 0.00 slabinfo.jbd2_revoke_table_s.num_objs
49505 +50.6% 74578 slabinfo.kmalloc-64.active_objs
777.17 +50.4% 1169 slabinfo.kmalloc-64.active_slabs
49774 +50.4% 74856 slabinfo.kmalloc-64.num_objs
777.17 +50.4% 1169 slabinfo.kmalloc-64.num_slabs
3248 ? 13% -19.4% 2618 ? 5% slabinfo.kmalloc-cg-64.active_objs
3248 ? 13% -19.4% 2618 ? 5% slabinfo.kmalloc-cg-64.num_objs
120.50 ? 35% -100.0% 0.00 slabinfo.mbcache.active_objs
120.50 ? 35% -100.0% 0.00 slabinfo.mbcache.num_objs
552.33 ? 16% -82.6% 96.00 interrupts.315:PCI-MSI.376832-edge.ahci[0000:00:17.0]
816631 ? 31% -49.8% 410137 ? 14% interrupts.CAL:Function_call_interrupts
81896 ? 52% -94.6% 4446 ? 59% interrupts.CPU0.CAL:Function_call_interrupts
60654 -95.4% 2775 ? 19% interrupts.CPU0.LOC:Local_timer_interrupts
1679755 ? 39% -99.1% 14302 ?174% interrupts.CPU0.RES:Rescheduling_interrupts
40195 ? 60% -98.3% 679.00 ? 26% interrupts.CPU1.CAL:Function_call_interrupts
61525 -94.5% 3389 ? 36% interrupts.CPU1.LOC:Local_timer_interrupts
692538 ? 48% -99.9% 1023 ?223% interrupts.CPU1.RES:Rescheduling_interrupts
60388 -97.7% 1391 ? 51% interrupts.CPU10.LOC:Local_timer_interrupts
60412 -98.5% 898.17 ? 35% interrupts.CPU11.LOC:Local_timer_interrupts
1035474 ?108% -97.0% 31216 ?167% interrupts.CPU11.RES:Rescheduling_interrupts
60272 -98.4% 954.17 ? 30% interrupts.CPU12.LOC:Local_timer_interrupts
455627 ?107% -95.5% 20374 ? 84% interrupts.CPU12.RES:Rescheduling_interrupts
60286 -98.9% 667.33 ? 45% interrupts.CPU13.LOC:Local_timer_interrupts
60263 -98.0% 1185 ? 55% interrupts.CPU14.LOC:Local_timer_interrupts
238973 ? 84% -93.2% 16299 ?117% interrupts.CPU14.RES:Rescheduling_interrupts
60287 -98.3% 1040 ? 40% interrupts.CPU15.LOC:Local_timer_interrupts
456818 ? 98% -94.0% 27605 ?168% interrupts.CPU15.RES:Rescheduling_interrupts
60339 -98.7% 796.50 ? 39% interrupts.CPU16.LOC:Local_timer_interrupts
60240 -98.6% 817.00 ? 44% interrupts.CPU17.LOC:Local_timer_interrupts
60211 -98.7% 801.00 ? 29% interrupts.CPU18.LOC:Local_timer_interrupts
389931 ? 90% -97.7% 8846 ?133% interrupts.CPU18.RES:Rescheduling_interrupts
60283 -98.1% 1164 ? 75% interrupts.CPU19.LOC:Local_timer_interrupts
315114 ? 71% -86.2% 43617 ?192% interrupts.CPU19.RES:Rescheduling_interrupts
60512 -97.1% 1745 ? 66% interrupts.CPU2.LOC:Local_timer_interrupts
473515 ? 61% -99.3% 3508 ?142% interrupts.CPU2.RES:Rescheduling_interrupts
60323 -97.7% 1397 ? 59% interrupts.CPU20.LOC:Local_timer_interrupts
291369 ?117% -98.8% 3589 ?221% interrupts.CPU20.RES:Rescheduling_interrupts
60239 -97.6% 1461 ? 51% interrupts.CPU21.LOC:Local_timer_interrupts
172556 ? 87% -96.5% 6014 ?120% interrupts.CPU21.RES:Rescheduling_interrupts
60199 -95.4% 2756 ?151% interrupts.CPU22.LOC:Local_timer_interrupts
188482 ?116% -95.5% 8570 ?141% interrupts.CPU22.RES:Rescheduling_interrupts
60308 -97.6% 1464 ? 64% interrupts.CPU23.LOC:Local_timer_interrupts
12156 ? 62% -79.9% 2445 ?151% interrupts.CPU24.CAL:Function_call_interrupts
60178 -98.5% 897.00 ? 49% interrupts.CPU24.LOC:Local_timer_interrupts
482812 ? 45% -99.7% 1387 ?150% interrupts.CPU24.RES:Rescheduling_interrupts
60179 -98.3% 1011 ? 76% interrupts.CPU25.LOC:Local_timer_interrupts
292815 ? 81% -98.3% 5028 ?159% interrupts.CPU25.RES:Rescheduling_interrupts
60335 -98.9% 648.50 ? 25% interrupts.CPU26.LOC:Local_timer_interrupts
60219 -98.1% 1140 ?139% interrupts.CPU27.LOC:Local_timer_interrupts
493009 ? 92% -99.7% 1286 ?134% interrupts.CPU27.RES:Rescheduling_interrupts
60184 -98.5% 892.83 ? 84% interrupts.CPU28.LOC:Local_timer_interrupts
469018 ?105% -97.5% 11551 ?139% interrupts.CPU28.RES:Rescheduling_interrupts
60142 -98.9% 668.83 ? 36% interrupts.CPU29.LOC:Local_timer_interrupts
281133 ? 62% -97.4% 7412 ?138% interrupts.CPU29.RES:Rescheduling_interrupts
60410 -98.0% 1205 ? 61% interrupts.CPU3.LOC:Local_timer_interrupts
285041 ? 86% -91.4% 24381 ?136% interrupts.CPU3.RES:Rescheduling_interrupts
60180 -98.8% 716.00 ? 41% interrupts.CPU30.LOC:Local_timer_interrupts
537090 ? 70% -99.4% 3478 ?162% interrupts.CPU30.RES:Rescheduling_interrupts
60152 -95.5% 2725 ?154% interrupts.CPU31.LOC:Local_timer_interrupts
513819 ? 70% -98.8% 6234 ?156% interrupts.CPU31.RES:Rescheduling_interrupts
60133 -98.3% 1016 ? 58% interrupts.CPU32.LOC:Local_timer_interrupts
385091 ? 91% -97.6% 9348 ?134% interrupts.CPU32.RES:Rescheduling_interrupts
60238 -98.4% 967.33 ? 98% interrupts.CPU33.LOC:Local_timer_interrupts
60126 -98.6% 830.50 ? 29% interrupts.CPU34.LOC:Local_timer_interrupts
429677 ?154% -98.4% 6925 ?141% interrupts.CPU34.RES:Rescheduling_interrupts
60234 -99.1% 518.17 ? 42% interrupts.CPU35.LOC:Local_timer_interrupts
371287 ? 96% -98.5% 5682 ?149% interrupts.CPU35.RES:Rescheduling_interrupts
60188 -99.0% 602.17 ? 60% interrupts.CPU36.LOC:Local_timer_interrupts
388845 ?119% -97.9% 8081 ?165% interrupts.CPU36.RES:Rescheduling_interrupts
60201 -98.2% 1077 ? 62% interrupts.CPU37.LOC:Local_timer_interrupts
60182 -98.6% 851.17 ? 83% interrupts.CPU38.LOC:Local_timer_interrupts
451487 ?108% -99.9% 399.17 ?177% interrupts.CPU38.RES:Rescheduling_interrupts
60283 -98.5% 930.50 ? 59% interrupts.CPU39.LOC:Local_timer_interrupts
617036 ?139% -99.7% 1594 ?216% interrupts.CPU39.RES:Rescheduling_interrupts
60682 -97.4% 1562 ? 62% interrupts.CPU4.LOC:Local_timer_interrupts
1208419 ? 78% -98.0% 23662 ? 79% interrupts.CPU4.RES:Rescheduling_interrupts
60247 -98.6% 873.00 ? 38% interrupts.CPU40.LOC:Local_timer_interrupts
60189 -98.9% 679.83 ? 77% interrupts.CPU41.LOC:Local_timer_interrupts
221441 ?118% -96.2% 8401 ?220% interrupts.CPU41.RES:Rescheduling_interrupts
60120 -98.9% 689.83 ? 32% interrupts.CPU42.LOC:Local_timer_interrupts
60176 -98.7% 777.83 ? 78% interrupts.CPU43.LOC:Local_timer_interrupts
384869 ?102% -99.1% 3568 ?116% interrupts.CPU43.RES:Rescheduling_interrupts
60121 -97.7% 1371 ?103% interrupts.CPU44.LOC:Local_timer_interrupts
348448 ? 69% -96.4% 12641 ?145% interrupts.CPU44.RES:Rescheduling_interrupts
60203 -98.5% 905.17 ? 76% interrupts.CPU45.LOC:Local_timer_interrupts
241934 ?112% -99.3% 1802 ?139% interrupts.CPU45.RES:Rescheduling_interrupts
60150 -98.6% 819.17 ?103% interrupts.CPU46.LOC:Local_timer_interrupts
60201 -98.9% 671.00 ? 16% interrupts.CPU47.LOC:Local_timer_interrupts
362342 ? 70% -97.8% 8136 ? 91% interrupts.CPU47.RES:Rescheduling_interrupts
60657 -98.1% 1172 ? 22% interrupts.CPU48.LOC:Local_timer_interrupts
1480231 ? 26% -99.2% 11850 ? 99% interrupts.CPU48.RES:Rescheduling_interrupts
22343 ? 64% -81.3% 4181 ?113% interrupts.CPU49.CAL:Function_call_interrupts
60283 -98.1% 1139 ? 48% interrupts.CPU49.LOC:Local_timer_interrupts
972258 ? 71% -96.7% 32533 ?136% interrupts.CPU49.RES:Rescheduling_interrupts
60301 -89.0% 6626 ?173% interrupts.CPU5.LOC:Local_timer_interrupts
499333 ?105% -95.9% 20683 ?132% interrupts.CPU5.RES:Rescheduling_interrupts
60301 -97.9% 1247 ? 42% interrupts.CPU50.LOC:Local_timer_interrupts
594153 ? 76% -95.2% 28647 ?110% interrupts.CPU50.RES:Rescheduling_interrupts
60363 -98.2% 1103 ? 49% interrupts.CPU51.LOC:Local_timer_interrupts
60334 -97.4% 1567 ? 66% interrupts.CPU52.LOC:Local_timer_interrupts
907891 ? 77% -98.4% 14749 ?109% interrupts.CPU52.RES:Rescheduling_interrupts
60300 -98.5% 881.83 ? 54% interrupts.CPU53.LOC:Local_timer_interrupts
60308 -97.9% 1262 ? 46% interrupts.CPU54.LOC:Local_timer_interrupts
711419 ? 66% -94.0% 42360 ?203% interrupts.CPU54.RES:Rescheduling_interrupts
60320 -97.8% 1312 ? 37% interrupts.CPU55.LOC:Local_timer_interrupts
417556 ? 89% -94.1% 24502 ?172% interrupts.CPU55.RES:Rescheduling_interrupts
60296 -98.1% 1166 ? 55% interrupts.CPU56.LOC:Local_timer_interrupts
60249 -96.1% 2322 ? 65% interrupts.CPU57.LOC:Local_timer_interrupts
60303 -97.5% 1494 ? 48% interrupts.CPU58.LOC:Local_timer_interrupts
409161 ?106% -99.1% 3873 ?138% interrupts.CPU58.RES:Rescheduling_interrupts
60351 -83.9% 9746 ?180% interrupts.CPU59.LOC:Local_timer_interrupts
60471 -97.7% 1402 ? 61% interrupts.CPU6.LOC:Local_timer_interrupts
728478 ? 39% -99.3% 4794 ?188% interrupts.CPU6.RES:Rescheduling_interrupts
60352 -96.7% 1988 ? 84% interrupts.CPU60.LOC:Local_timer_interrupts
472502 ? 89% -97.9% 9851 ?112% interrupts.CPU60.RES:Rescheduling_interrupts
60386 -98.2% 1089 ? 27% interrupts.CPU61.LOC:Local_timer_interrupts
589897 ? 85% -94.3% 33616 ?201% interrupts.CPU61.RES:Rescheduling_interrupts
60323 -98.6% 817.67 ? 45% interrupts.CPU62.LOC:Local_timer_interrupts
60348 -97.9% 1240 ? 42% interrupts.CPU63.LOC:Local_timer_interrupts
60447 -98.2% 1069 ? 32% interrupts.CPU64.LOC:Local_timer_interrupts
60375 -97.3% 1611 ? 50% interrupts.CPU65.LOC:Local_timer_interrupts
269531 ?116% -95.1% 13162 ?207% interrupts.CPU65.RES:Rescheduling_interrupts
60396 -98.1% 1141 ? 47% interrupts.CPU66.LOC:Local_timer_interrupts
372632 ?111% -97.4% 9601 ?214% interrupts.CPU66.RES:Rescheduling_interrupts
60729 -98.3% 1019 ? 42% interrupts.CPU67.LOC:Local_timer_interrupts
588614 ?128% -93.1% 40600 ?184% interrupts.CPU67.RES:Rescheduling_interrupts
60344 -98.7% 807.00 ? 31% interrupts.CPU68.LOC:Local_timer_interrupts
155749 ? 95% -83.3% 25947 ?145% interrupts.CPU68.RES:Rescheduling_interrupts
60373 -98.4% 980.33 ? 34% interrupts.CPU69.LOC:Local_timer_interrupts
335179 ?146% -99.9% 386.17 ?156% interrupts.CPU69.RES:Rescheduling_interrupts
60329 -97.8% 1354 ? 54% interrupts.CPU7.LOC:Local_timer_interrupts
60385 -97.0% 1821 ? 52% interrupts.CPU70.LOC:Local_timer_interrupts
60434 -98.1% 1123 ? 33% interrupts.CPU71.LOC:Local_timer_interrupts
507707 ?109% -98.4% 7873 ?135% interrupts.CPU71.RES:Rescheduling_interrupts
11035 ? 60% -83.6% 1806 ?135% interrupts.CPU72.CAL:Function_call_interrupts
60333 -98.9% 660.50 ? 37% interrupts.CPU72.LOC:Local_timer_interrupts
258639 ? 88% -98.7% 3233 ?210% interrupts.CPU72.RES:Rescheduling_interrupts
60284 -98.4% 991.67 ? 32% interrupts.CPU73.LOC:Local_timer_interrupts
288831 ? 76% -95.7% 12327 ?162% interrupts.CPU73.RES:Rescheduling_interrupts
60239 -98.1% 1148 ? 68% interrupts.CPU74.LOC:Local_timer_interrupts
433997 ?103% -99.5% 2368 ?216% interrupts.CPU74.RES:Rescheduling_interrupts
60310 -98.1% 1164 ? 44% interrupts.CPU75.LOC:Local_timer_interrupts
482166 ? 65% -98.5% 7259 ? 97% interrupts.CPU75.RES:Rescheduling_interrupts
60295 -98.5% 884.83 ? 33% interrupts.CPU76.LOC:Local_timer_interrupts
600012 ? 99% -99.5% 3234 ?147% interrupts.CPU76.RES:Rescheduling_interrupts
60324 -96.9% 1892 ? 86% interrupts.CPU77.LOC:Local_timer_interrupts
241911 ? 80% -99.9% 189.83 ?132% interrupts.CPU77.RES:Rescheduling_interrupts
60343 -97.5% 1503 ? 47% interrupts.CPU78.LOC:Local_timer_interrupts
406309 ? 49% -98.7% 5159 ?147% interrupts.CPU78.RES:Rescheduling_interrupts
60312 -89.3% 6482 ?154% interrupts.CPU79.LOC:Local_timer_interrupts
411599 ? 52% -97.7% 9532 ?188% interrupts.CPU79.RES:Rescheduling_interrupts
60343 -97.6% 1428 ? 69% interrupts.CPU8.LOC:Local_timer_interrupts
330226 ? 71% -93.6% 21080 ?105% interrupts.CPU8.RES:Rescheduling_interrupts
60257 -98.1% 1125 ? 49% interrupts.CPU80.LOC:Local_timer_interrupts
416160 ?101% -99.9% 216.50 ?140% interrupts.CPU80.RES:Rescheduling_interrupts
60298 -97.2% 1702 ?115% interrupts.CPU81.LOC:Local_timer_interrupts
59743 ?115% -98.4% 946.83 ?159% interrupts.CPU81.RES:Rescheduling_interrupts
60273 -97.4% 1576 ? 62% interrupts.CPU82.LOC:Local_timer_interrupts
379387 ? 97% -99.4% 2458 ?142% interrupts.CPU82.RES:Rescheduling_interrupts
60251 -97.2% 1660 ? 67% interrupts.CPU83.LOC:Local_timer_interrupts
196406 ? 92% -92.4% 14917 ?166% interrupts.CPU83.RES:Rescheduling_interrupts
60295 -96.8% 1915 ? 84% interrupts.CPU84.LOC:Local_timer_interrupts
298141 ?127% -96.9% 9333 ?218% interrupts.CPU84.RES:Rescheduling_interrupts
60282 -96.8% 1953 ? 79% interrupts.CPU85.LOC:Local_timer_interrupts
60269 -98.3% 1034 ? 46% interrupts.CPU86.LOC:Local_timer_interrupts
60290 -98.1% 1171 ? 52% interrupts.CPU87.LOC:Local_timer_interrupts
517377 ? 97% -97.8% 11400 ?185% interrupts.CPU87.RES:Rescheduling_interrupts
60317 -97.6% 1451 ? 67% interrupts.CPU88.LOC:Local_timer_interrupts
60246 -98.6% 828.83 ? 50% interrupts.CPU89.LOC:Local_timer_interrupts
410372 ?126% -98.0% 8249 ?142% interrupts.CPU89.RES:Rescheduling_interrupts
60641 -97.8% 1319 ? 62% interrupts.CPU9.LOC:Local_timer_interrupts
317983 ? 65% -94.2% 18512 ? 58% interrupts.CPU9.RES:Rescheduling_interrupts
60292 -98.4% 973.83 ? 44% interrupts.CPU90.LOC:Local_timer_interrupts
245411 ?125% -99.3% 1657 ?223% interrupts.CPU90.RES:Rescheduling_interrupts
60290 -98.9% 656.83 ? 47% interrupts.CPU91.LOC:Local_timer_interrupts
436645 ? 95% -99.3% 3273 ?218% interrupts.CPU91.RES:Rescheduling_interrupts
60267 -98.1% 1156 ? 76% interrupts.CPU92.LOC:Local_timer_interrupts
185271 ?124% -97.9% 3825 ?207% interrupts.CPU92.RES:Rescheduling_interrupts
60276 -97.7% 1386 ? 58% interrupts.CPU93.LOC:Local_timer_interrupts
273505 ?107% -99.3% 1934 ?136% interrupts.CPU93.RES:Rescheduling_interrupts
60226 -98.3% 1046 ? 40% interrupts.CPU94.LOC:Local_timer_interrupts
97035 ?107% -89.1% 10581 ?154% interrupts.CPU94.RES:Rescheduling_interrupts
60273 -98.9% 663.67 ? 56% interrupts.CPU95.LOC:Local_timer_interrupts
659427 ? 51% -97.4% 17060 ?223% interrupts.CPU95.RES:Rescheduling_interrupts
5790318 -97.7% 136046 ? 14% interrupts.LOC:Local_timer_interrupts
41275590 ? 20% -97.0% 1229430 ? 15% interrupts.RES:Rescheduling_interrupts



pmeter.Average_Active_Power

500 +---------------------------------------------------------------------+
|...+...+...+...+....+...+...+...+...+...+...+...+....+ |
450 |-+ |
400 |-+ |
| |
350 |-+ |
| |
300 |-+ |
| |
250 |-+ |
200 |-+ |
| |
150 |-+ O O O |
| O O O O O O O O O O O O O |
100 +---------------------------------------------------------------------+


stress-ng.time.user_time

600 +---------------------------------------------------------------------+
| |
500 |...+...+...+...+....+...+...+...+...+...+...+...+....+ |
| |
| |
400 |-+ |
| |
300 |-+ |
| |
200 |-+ |
| |
| |
100 |-+ |
| |
0 +---------------------------------------------------------------------+


stress-ng.time.system_time

6000 +--------------------------------------------------------------------+
| |
5000 |...+...+...+...+...+...+...+...+....+...+...+...+...+ |
| |
| |
4000 |-+ |
| |
3000 |-+ |
| |
2000 |-+ |
| |
| |
1000 |-+ |
| |
0 +--------------------------------------------------------------------+


stress-ng.time.percent_of_cpu_this_job_got

10000 +-------------------------------------------------------------------+
9000 |... ..+...+...+...+...+...+...+...+...+...+...+...+ |
| +. |
8000 |-+ |
7000 |-+ |
| |
6000 |-+ |
5000 |-+ |
4000 |-+ |
| |
3000 |-+ |
2000 |-+ |
| |
1000 |-+ |
0 +-------------------------------------------------------------------+


stress-ng.time.involuntary_context_switches

1.2e+09 +-----------------------------------------------------------------+
| |
1e+09 |...+...+... ..+..+...+...+...+...+...+...+...+..+ |
| +. |
| |
8e+08 |-+ |
| |
6e+08 |-+ |
| |
4e+08 |-+ |
| |
| |
2e+08 |-+ |
| |
0 +-----------------------------------------------------------------+


stress-ng.sigio.ops

6e+08 +-------------------------------------------------------------------+
| |
5e+08 |...+...+...+...+...+...+...+...+...+...+...+...+...+ |
| |
| |
4e+08 |-+ |
| |
3e+08 |-+ |
| |
2e+08 |-+ |
| |
| |
1e+08 |-+ |
| |
0 +-------------------------------------------------------------------+


stress-ng.sigio.ops_per_sec

9e+06 +-------------------------------------------------------------------+
|...+...+...+. +. +. |
8e+06 |-+ |
7e+06 |-+ |
| |
6e+06 |-+ |
5e+06 |-+ |
| |
4e+06 |-+ |
3e+06 |-+ |
| |
2e+06 |-+ |
1e+06 |-+ |
| |
0 +-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
config-5.14.0-rc6-00044-g3b844826b6c6 (178.20 kB)
job-script (8.26 kB)
job.yaml (5.57 kB)
reproduce (419.00 B)

2021-08-24 16:24:42

by Linus Torvalds

Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

On Tue, Aug 24, 2021 at 7:56 AM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a -99.3% regression of stress-ng.sigio.ops_per_sec due to commit:

Well, that's bad.

> commit: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")

You fix one benchmark, you break another..

What's a bit odd is that this commit basically reverts commit
3a34b13a88ca ("pipe: make pipe writes always wake up readers") which
did *not* result in any kernel test robot report.

It's not a pure revert, because it adds that 'poll_usage' case (for
EPOLLET), but the stress-ng.sigio test doesn't even use select or poll
(ok, there's a select() call with an empty file descriptor set, which
seems to be just an odd way to spell "usleep()").
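
For illustration, that empty-set select() pattern boils down to something
like this (a minimal sketch of the idiom, not the stress-ng source; the
function name is made up):

#include <stddef.h>
#include <sys/select.h>

/* select() with no file descriptors just waits out the timeout, which is
 * why an empty-set select() is effectively a roundabout usleep(). */
static void usleep_via_select(long usec)
{
        struct timeval tv = {
                .tv_sec  = usec / 1000000,
                .tv_usec = usec % 1000000,
        };

        (void)select(0, NULL, NULL, NULL, &tv);
}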

So it _looks_ to me like it's a 100% revert in practice for that test.
I strace'd the "stress-ng --sigio" case just to make sure I didn't
miss anything.

But I'm clearly missing something. Can anybody see what I'm missing,
and hit me over the head with the clue-bat?

Linus

2021-08-24 17:58:20

by Linus Torvalds

Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

[ Added Colin King, as this looks to be related to stress-sigio.c ]

On Tue, Aug 24, 2021 at 9:22 AM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Aug 24, 2021 at 7:56 AM kernel test robot <[email protected]> wrote:
> >
> > FYI, we noticed a -99.3% regression of stress-ng.sigio.ops_per_sec due to commit:
>
> Well, that's bad.
>
> > commit: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
>
> You fix one benchmark, you break another..
>
> What's a bit odd is that this commit basically reverts commit
> 3a34b13a88ca ("pipe: make pipe writes always wake up readers") which
> did *not* result in any kernel test robot report.

Hmm. I have a possible clue.

I suspect this may be timing-related, and I wonder if the kernel test
robot just didn't happen to notice - or report - the other edge of
this performance regression because it ends up being random enough.

In particular, I think stress-sigio.c is subtly buggy, and the "send
SIGIO on every write" may have effectively hidden the bug.

Here's my theory on what's going on, and why it has that odd -99.3% regression:

- the child in stress-sigio.c does multiple writes in quick
succession, while the parent is scheduled away

- the pipe was empty at the beginning, but we send just *one* SIGIO
for all these writes

- the parent gets the SIGIO, and goes to stress_sigio_handler(), and
reads the buffer

but the parent doesn't necessarily *empty* the buffer, so it won't get
any more SIGIO signals from subsequent writes.

The reason this is timing-dependent and about luck is because the
*read* size is BUFFER_SIZE (4kB), but the write sizes are BUFFER_SIZE
>> 4 (256 bytes), so 99% of the time if the parent is at all timely,
it will do that big read and empty the buffer.

But if the child ends up able to do more than 16 writes in a row
before the parent wakes up and reads data, the buffer will have more
than 4kB of data in it, and when the parent SIGIO handler happens it
won't empty the data, and so it will never see another SIGIO.
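
That sequence can be shown deterministically in a few lines (a
self-contained sketch of the failure mode described above, not the actual
stress-sigio.c; the names are made up, the 4 kB / 256-byte sizes follow
the description, and blocking SIGIO stands in for "the parent is
scheduled away"):

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 4096

static int rd_fd;
static volatile sig_atomic_t sigio_count;

static void sigio_handler(int sig)
{
        char buf[BUFFER_SIZE];

        (void)sig;
        sigio_count++;
        (void)read(rd_fd, buf, sizeof(buf));    /* one 4 kB read, may not drain */
}

int main(void)
{
        int fds[2], i;
        char chunk[BUFFER_SIZE >> 4] = { 0 };   /* 256-byte writes */
        sigset_t set;
        struct sigaction sa;

        if (pipe(fds) < 0)
                return 1;
        rd_fd = fds[0];

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sigio_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGIO, &sa, NULL);
        fcntl(rd_fd, F_SETOWN, getpid());
        fcntl(rd_fd, F_SETFL, fcntl(rd_fd, F_GETFL) | O_ASYNC | O_NONBLOCK);

        /* "Parent scheduled away": block SIGIO while far more than 4 kB
         * of 256-byte chunks pile up in the pipe. */
        sigemptyset(&set);
        sigaddset(&set, SIGIO);
        sigprocmask(SIG_BLOCK, &set, NULL);
        for (i = 0; i < 64; i++)
                (void)write(fds[1], chunk, sizeof(chunk));      /* 16 kB total */
        sigprocmask(SIG_UNBLOCK, &set, NULL);   /* the one pending SIGIO fires */

        /* ~12 kB is still queued, so a kernel that raises SIGIO only on the
         * empty->non-empty transition stays silent for this write. */
        (void)write(fds[1], chunk, sizeof(chunk));
        sleep(1);
        printf("SIGIO delivered %d time(s)\n", (int)sigio_count);
        return 0;
}

With "SIGIO on every write" behaviour (3a34b13a88ca, or the FASYNC change
discussed below) the count should come out as 2; with the transition-only
behaviour it stays at 1, and the leftover data silences every later write.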

And again - I think this all happened before too (ie before commit
3a34b13a88ca) and so 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET
wakeups under normal loads") isn't a _new_ regression, it just exposed
the bug in the test-case again, and this time the kernel test robot
caught it.

In other words, I think this is very similar to the EPOLLET problem:
user space expected (perhaps not _intentionally_) that SIGIO would
happen over and over again when you add new data, even if the pipe
already had data.

So then commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup
logic") came around back in 2019, and only sent SIGIO when the pipe
buffer status _changed_ (ie went from empty to having data, rather
than "there's new data") and at that point stress-sigio.c became very
fragile.

We could do the same ugly thing for FASYNC that we do for EPOLLET -
make it always fasync on new data, exactly because the previous SIGIO
might not have emptied the buffer completely.

I'm a _bit_ surprised that the kernel test robot didn't catch this
back in 2019, and also didn't report this when commit 3a34b13a88ca
("pipe: make pipe writes always wake up readers") should have sped
things up again, but I guess the "it's very timing-dependent" might
explain it.

Colin, comments?

Anyway, the above blathering is not based on any actual real data. I'm
just looking at stress-sigio.c to see what could be going wrong, and
what would explain the test robot report. It's the only thing I see,
but maybe somebody else sees anything else..

Linus

2021-08-24 18:00:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

On Tue, Aug 24, 2021 at 10:32 AM Linus Torvalds
<[email protected]> wrote:
>
> We could do the same ugly thing for FASYNC that we do for EPOLLET -
> make it always fasync on new data, exactly because the previous SIGIO
> might not have emptied the buffer completely.

The patch would be something like the attached (UNTESTED!)

Linus


Attachments:
patch.diff (2.25 kB)

2021-08-24 18:43:04

by Colin King

Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

On 24/08/2021 18:32, Linus Torvalds wrote:
> [ Added Colin King, as this looks to be related to stress-sigio.c ]
>
> On Tue, Aug 24, 2021 at 9:22 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Tue, Aug 24, 2021 at 7:56 AM kernel test robot <[email protected]> wrote:
>>>
>>> FYI, we noticed a -99.3% regression of stress-ng.sigio.ops_per_sec due to commit:
>>
>> Well, that's bad.
>>
>>> commit: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
>>
>> You fix one benchmark, you break another..
>>
>> What's a bit odd is that this commit basically reverts commit
>> 3a34b13a88ca ("pipe: make pipe writes always wake up readers") which
>> did *not* result in any kernel test robot report.
>
> Hmm. I have a possible clue.
>
> I suspect this may be timing-related, and I wonder if the kernel test
> robot just didn't happen to notice - or report - the other edge of
> this performance regression because it ends up being random enough.
>
> In particular, I think stress-sigio.c is subtly buggy, and the "send
> SIGIO on every write" may have effectively hidden the bug.
>
> Here's my theory on what's going on, and why it has that odd -99.3% regression:
>
> - the child in stress-sigio.c does multiple writes in quick
> succession, while the parent is scheduled away
>
> - the pipe was empty at the beginning, but we send just *one* SIGIO
> for all these writes
>
> - the parent gets the SIGIO, and goes to stress_sigio_handler(), and
> reads the buffer
>
> but the parent doesn't necessarily *empty* the buffer, so it won't get
> any more SIGIO signals from subsequent writes.
>
> The reason this is timing-dependent and about luck is because the
> *read* size is BUFFER_SIZE (4kB), but the write sizes are BUFFER_SIZE
> >> 4 (256 bytes), so 99% of the time if the parent is at all timely,
> it will do that big read and empty the buffer.
>
> But if the child ends up able to do more than 16 writes in a row
> before the parent wakes up and reads data, the buffer will have more
> than 4kB of data in it, and when the parent SIGIO handler happens it
> won't empty the data, and so it will never see another SIGIO.
>
> And again - I think this all happened before too (ie before commit
> 3a34b13a88ca) and so 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET
> wakeups under normal loads") isn't a _new_ regression, it just exposed
> the bug in the test-case again, and this time the kernel test robot
> caught it.
>
> In other words, I think this is very similar to the EPOLLET problem:
> user space expected (perhaps not _intentionally_) that SIGIO would
> happen over and over again when you add new data, even if the pipe
> already had data.

yep, that's basically the underlying assumption I made with this rather
naive test. I'll re-work it sometime this week once I can formally
verify this is what is happening.

Thanks Linus for the insight.

Colin

>
> So then commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup
> logic") came around back in 2019, and only sent SIGIO when the pipe
> buffer status _changed_ (ie went from empty to having data, rather
> than "there's new data") and at that point stress-sigio.c became very
> fragile.
>
> We could do the same ugly thing for FASYNC that we do for EPOLLET -
> make it always fasync on new data, exactly because the previous SIGIO
> might not have emptied the buffer completely.
>
> I'm a _bit_ surprised that the kernel test robot didn't catch this
> back in 2019, and also didn't report this when commit 3a34b13a88ca
> ("pipe: make pipe writes always wake up readers") should have sped
> things up again, but I guess the "it's very timing-dependent" might
> explain it.
>
> Colin, comments?
>
> Anyway, the above blathering is not based on any actual real data. I'm
> just looking at stress-sigio.c to see what could be going wrong, and
> what would explain the test robot report. It's the only thing I see,
> but maybe somebody else sees anything else..
>
> Linus
>

2021-08-24 20:59:55

by Colin King

Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

On 24/08/2021 18:32, Linus Torvalds wrote:
> [ Added Colin King, as this looks to be related to stress-sigio.c ]

FYI, I've pushed a fix that removes my assumption that SIGIO occurs on
each write down the pipe. The fix now drains the input, and I've set the
read/write buffers to the same size.
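
The shape of that change is roughly the following (a sketch of the drain
idea only, assuming a non-blocking read end; rd_fd and BUFFER_SIZE are
placeholders, not the actual stress-ng code):

#include <unistd.h>

#define BUFFER_SIZE 4096

extern int rd_fd;               /* non-blocking read end of the pipe */

/* SIGIO handler that keeps reading until the pipe is empty instead of
 * assuming one read per signal is enough. */
static void sigio_drain_handler(int sig)
{
        char buf[BUFFER_SIZE];
        ssize_t n;

        (void)sig;
        do {
                n = read(rd_fd, buf, sizeof(buf));
        } while (n > 0);        /* stops at EAGAIN once the pipe is empty */
}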

This will affect the bogo-ops rates for this stressor, so it will mess
up any on-going metrics folks may be using.

Colin

>
> On Tue, Aug 24, 2021 at 9:22 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> On Tue, Aug 24, 2021 at 7:56 AM kernel test robot <[email protected]> wrote:
>>>
>>> FYI, we noticed a -99.3% regression of stress-ng.sigio.ops_per_sec due to commit:
>>
>> Well, that's bad.
>>
>>> commit: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
>>
>> You fix one benchmark, you break another..
>>
>> What's a bit odd is that this commit basically reverts commit
>> 3a34b13a88ca ("pipe: make pipe writes always wake up readers") which
>> did *not* result in any kernel test robot report.
>
> Hmm. I have a possible clue.
>
> I suspect this may be timing-related, and I wonder if the kernel test
> robot just didn't happen to notice - or report - the other edge of
> this performance regression because it ends up being random enough.
>
> In particular, I think stress-sigio.c is subtly buggy, and the "send
> SIGIO on every write" may have effectively hidden the bug.
>
> Here's my theory on what's going on, and why it has that odd -99.3% regression:
>
> - the child in stress-sigio.c does multiple writes in quick
> succession, while the parent is scheduled away
>
> - the pipe was empty at the beginning, but we send just *one* SIGIO
> for all these writes
>
> - the parent gets the SIGIO, and goes to stress_sigio_handler(), and
> reads the buffer
>
> but the parent doesn't necessarily *empty* the buffer, so it won't get
> any more SIGIO signals from subsequent writes.
>
> The reason this is timing-dependent and about luck is because the
> *read* size is BUFFER_SIZE (4kB), but the write sizes are BUFFER_SIZE
> >> 4 (256 bytes), so 99% of the time if the parent is at all timely,
> it will do that big read and empty the buffer.
>
> But if the child ends up able to do more than 16 writes in a row
> before the parent wakes up and reads data, the buffer will have more
> than 4kB of data in it, and when the parent SIGIO handler happens it
> won't empty the data, and so it will never see another SIGIO.
>
> And again - I think this all happened before too (ie before commit
> 3a34b13a88ca) and so 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET
> wakeups under normal loads") isn't a _new_ regression, it just exposed
> the bug in the test-case again, and this time the kernel test robot
> caught it.
>
> In other words, I think this is very similar to the EPOLLET problem:
> user space expected (perhaps not _intentionally_) that SIGIO would
> happen over and over again when you add new data, even if the pipe
> already had data.
>
> So then commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup
> logic") came around back in 2019, and only sent SIGIO when the pipe
> buffer status _changed_ (ie went from empty to having data, rather
> than "there's new data") and at that point stress-sigio.c became very
> fragile.
>
> We could do the same ugly thing for FASYNC that we do for EPOLLET -
> make it always fasync on new data, exactly because the previous SIGIO
> might not have emptied the buffer completely.
>
> I'm a _bit_ surprised that the kernel test robot didn't catch this
> back in 2019, and also didn't report this when commit 3a34b13a88ca
> ("pipe: make pipe writes always wake up readers") should have sped
> things up again, but I guess the "it's very timing-dependent" might
> explain it.
>
> Colin, comments?
>
> Anyway, the above blathering is not based on any actual real data. I'm
> just looking at stress-sigio.c to see what could be going wrong, and
> what would explain the test robot report. It's the only thing I see,
> but maybe somebody else sees anything else..
>
> Linus
>

2021-08-25 14:16:42

by Eric W. Biederman

Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

Linus Torvalds <[email protected]> writes:

> On Tue, Aug 24, 2021 at 10:32 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> We could do the same ugly thing for FASYNC that we do for EPOLLET -
>> make it always fasync on new data, exactly because the previous SIGIO
>> might not have emptied the buffer completely.
>
> The patch would be something like the attached (UNTESTED!)

We have two things going on, a pipe wake up and signal wake up.

Does their order matter? It feels weird that it is possible that
the data can be read from the pipe and the reader woken up to write
more when the signal that notifies the reader of that state has
not even been queued for delivery.


Eric



> Linus
>
> fs/pipe.c | 20 ++++++++------------
> 1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 678dee2a8228..6d4342bad9f1 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -363,10 +363,9 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> * _very_ unlikely case that the pipe was full, but we got
> * no data.
> */
> - if (unlikely(was_full)) {
> + if (unlikely(was_full))
> wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
> - kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
> - }
> + kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
>
> /*
> * But because we didn't read anything, at this point we can
> @@ -385,12 +384,11 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> wake_next_reader = false;
> __pipe_unlock(pipe);
>
> - if (was_full) {
> + if (was_full)
> wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
> - kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
> - }
> if (wake_next_reader)
> wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
> + kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
> if (ret > 0)
> file_accessed(filp);
> return ret;
> @@ -565,10 +563,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
> * become empty while we dropped the lock.
> */
> __pipe_unlock(pipe);
> - if (was_empty) {
> + if (was_empty)
> wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
> - kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> - }
> + kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
> __pipe_lock(pipe);
> was_empty = pipe_empty(pipe->head, pipe->tail);
> @@ -591,10 +588,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
> * Epoll nonsensically wants a wakeup whether the pipe
> * was already empty or not.
> */
> - if (was_empty || pipe->poll_usage) {
> + if (was_empty || pipe->poll_usage)
> wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
> - kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> - }
> + kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> if (wake_next_writer)
> wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
> if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) {

2021-08-25 17:28:03

by Linus Torvalds

Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression

On Wed, Aug 25, 2021 at 7:11 AM Eric W. Biederman <[email protected]> wrote:
>
> We have two things going on, a pipe wake up and signal wake up.
>
> Does their order matter? It feels weird that it is possible that
> the data can be read from the pipe and the reader woken up to write
> more when the signal that notifies the reader of that state has
> not even been queued for delivery.

I don't think the order matters.

The only thing that matters is that the signal (and the regular
wakeup, for that matter) is done *after* the operation that triggers
them is complete. It would be problematic if we sent the signal for
"you can read more now" before we had actually done the write, so that
the recipient would then try to read and get a "nothing there".

But if you have both a pending reader, and something that asked for
SIGIO, they'll both get notified. Not in any particular order between
those two, but both will be notified after the write (or read) that
triggered it has been done.

Of course, the pending reader/writer that gets notified might be the
_same_ one that also gets the SIGIO, so you could then have the SIGIO
mean that the reader/writer gets EINTR and needs to read/write again.
If you asked for both, you'll get it.
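
In userspace terms that just means the reader has to tolerate the
occasional EINTR from its own SIGIO and retry, e.g. (a generic sketch,
not tied to stress-ng):

#include <errno.h>
#include <unistd.h>

/* Read that tolerates being interrupted by the very SIGIO announcing the
 * data: retry on EINTR, and treat EAGAIN as "nothing there right now". */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
        ssize_t n;

        do {
                n = read(fd, buf, len);
        } while (n < 0 && errno == EINTR);
        return n;               /* > 0: data, 0: EOF, -1/EAGAIN: empty */
}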

The way our pipe code is organized, the _likely_ case is that you'll
do the read/write and take the signal handler without ever getting
-EAGAIN. Because we'll test for "do we have more data" before we test
for "do we have signals pending" - but that's not really relevant for
correctness (it's more a greedy "let's try to read while possible" and
could be good for avoiding extra system calls).

And I think it was actually a historical mistake to tie the "send
SIGIO" together so tightly with "wake up other side". So my untested
patch (well, it's now tested by Oliver) is likely the right thing to
do regardless.

Basically, the "wake up readers/writers" thing is the one where we
*know* the other side - it's just the other end, and it's the kernel
code in the same fs/pipe.c file. So we can - and should - optimize
against doing unnecessary wakeups.

But as has now been shown several times, we shouldn't optimize the
case where the wakeups are sent to code we don't control - ie user
space. Whether that be as a result of epoll, or as a signal delivery,
that "other side" is not under our control and clearly doesn't want
the optimal minimal wakeups for just state transition cases.

Of course, I could always wish that the receiving side always did the
right thing, and worked with the minimal state transition data, but
that clearly simply isn't the case. It wasn't the case for EPOLLET,
and it wasn't the case for SIGIO.

"If wishes were horses ..."

So I'll commit that SIGIO fix, even if Colin has already changed
stress-ng. No real harm in just doing both belt and suspenders, and I
was wrong last time when I thought the EPOLLET thing was purely a
test-suite issue.

Let nobody say that I can't learn from my mistakes..

Linus