2018-01-25 06:30:46

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [sched/deadline] e0367b1267: WARNING:at_kernel/sched/sched.h:#assert_clock_updated


FYI, we noticed the following commit (built with gcc-7):

commit: e0367b12674bf4420870cd0237e3ebafb2ec9593 ("sched/deadline: Move CPU frequency selection triggering points")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: ltp
with following parameters:

test: sched

test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
test-url: http://linux-test-project.github.io/


on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------------------------------+------------+------------+
| | d4edd662ac | e0367b1267 |
+------------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 9 | 7 |
| invoked_oom-killer:gfp_mask=0x | 9 | 4 |
| Mem-Info | 9 | 4 |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 9 | 4 |
| WARNING:at_kernel/sched/sched.h:#assert_clock_updated | 0 | 3 |
| RIP:assert_clock_updated | 0 | 3 |
| RIP:cpuidle_enter_state | 0 | 1 |
| RIP:clear_page_erms | 0 | 1 |
| RIP:page_add_file_rmap | 0 | 1 |
+------------------------------------------------------------------+------------+------------+



kern :warn : [ 123.841479] WARNING: CPU: 67 PID: 3837 at kernel/sched/sched.h:889 assert_clock_updated+0x13/0x20
kern :warn : [ 123.841480] Modules linked in: loop intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 irqbypass crct10dif_pclmul ttm crc32_pclmul snd_pcm drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops snd_timer mxm_wmi pcbc ipmi_si ahci aesni_intel snd crypto_simd ipmi_devintf glue_helper soundcore drm libahci cryptd ipmi_msghandler pcspkr libata shpchp wmi acpi_pad acpi_power_meter ip_tables
kern :warn : [ 123.841500] CPU: 67 PID: 3837 Comm: ltp-pan Not tainted 4.15.0-rc6-00133-ge0367b1 #1
kern :warn : [ 123.841501] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
kern :warn : [ 123.841503] RIP: 0010:assert_clock_updated+0x13/0x20
kern :warn : [ 123.841504] RSP: 0018:ffff88102bdc3ee0 EFLAGS: 00010086
kern :warn : [ 123.841505] RAX: 0000000000000026 RBX: ffff88106f23cfe8 RCX: 0000000000000000
kern :warn : [ 123.841505] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000092
kern :warn : [ 123.841506] RBP: ffff88102bde2800 R08: 0000000000000026 R09: ffffffff82b1d4e0
user :notice: [ 123.841507] fpu_exception : yes
kern :warn : [ 123.841507] R10: ffff8810280f4498 R11: 0000000000000000 R12: ffff8810279be000

kern :warn : [ 123.841508] R13: 0000000000022800 R14: ffffffff81767d50 R15: ffff88102bddd1a8
kern :warn : [ 123.841510] FS: 00007fd41d6d9700(0000) GS:ffff88102bdc0000(0000) knlGS:0000000000000000
kern :warn : [ 123.841510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 123.841511] CR2: 0000563f9799ee30 CR3: 000000107bb78001 CR4: 00000000003606e0
kern :warn : [ 123.841512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kern :warn : [ 123.841512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kern :warn : [ 123.841513] Call Trace:
kern :warn : [ 123.841515] <IRQ>
kern :warn : [ 123.841517] inactive_task_timer+0x383/0x460
kern :warn : [ 123.841520] ? pull_dl_task+0x600/0x600
kern :warn : [ 123.841525] __hrtimer_run_queues+0xd3/0x230
kern :warn : [ 123.841528] hrtimer_interrupt+0xa6/0x200
kern :warn : [ 123.841532] smp_apic_timer_interrupt+0x56/0x130
kern :warn : [ 123.841534] apic_timer_interrupt+0xa2/0xb0
kern :warn : [ 123.841535] </IRQ>
kern :warn : [ 123.841539] RIP: 0010:alloc_pid+0xcf/0x2a0
kern :warn : [ 123.841539] RSP: 0018:ffffc900084b7db0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff11
kern :warn : [ 123.841540] RAX: 000000000000171e RBX: 0000000000000000 RCX: 0000000000000000
kern :warn : [ 123.841541] RDX: 000000000000171e RSI: ffff8810280f4470 RDI: ffffffff82205180
kern :warn : [ 123.841542] RBP: ffffffff8225a2e0 R08: 0000000000000044 R09: 0000000000000228
kern :warn : [ 123.841542] R10: ffff8810280f4498 R11: 000000000000001f R12: ffff881024787278
kern :warn : [ 123.841543] R13: ffffffff8225a2e0 R14: ffff881024787240 R15: ffffffff82205180
kern :warn : [ 123.841548] copy_process+0xf5a/0x1b00
kern :warn : [ 123.841551] _do_fork+0xbd/0x3d0
kern :warn : [ 123.841553] ? task_work_run+0x48/0xd0
kern :warn : [ 123.841557] do_syscall_64+0x61/0x190
kern :warn : [ 123.841559] entry_SYSCALL64_slow_path+0x25/0x25
kern :warn : [ 123.841561] RIP: 0033:0x7fd41cec676b
kern :warn : [ 123.841561] RSP: 002b:00007ffc261df870 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
kern :warn : [ 123.841562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd41cec676b
kern :warn : [ 123.841563] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
kern :warn : [ 123.841563] RBP: 00007ffc261df8b0 R08: 00007fd41d6d9700 R09: 000000000000001e
kern :warn : [ 123.841564] R10: 00007fd41d6d99d0 R11: 0000000000000246 R12: 0000000000000000
kern :warn : [ 123.841564] R13: 0000563f9799cae0 R14: 0000000000000000 R15: 0000563f9799cd10
kern :warn : [ 123.841566] Code: 48 89 83 20 09 00 00 eb c7 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c7 c8 fe 0a 82 c6 05 c3 69 35 01 01 e8 ad 01 fc ff <0f> ff c3 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c7 15 04 0b 82 c6
kern :warn : [ 123.841583] ---[ end trace 5f3bf5f199ab1f72 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Xiaolong


Attachments:
(No filename) (6.48 kB)
config-4.15.0-rc6-00133-ge0367b1 (167.43 kB)
job-script (4.31 kB)
kmsg.xz (47.42 kB)
ltp (312.95 kB)
job.yaml (3.70 kB)
reproduce (81.00 B)
Download all attachments

2018-01-25 11:21:00

by Juri Lelli

[permalink] [raw]
Subject: Re: [lkp-robot] [sched/deadline] e0367b1267: WARNING:at_kernel/sched/sched.h:#assert_clock_updated

Hi,

On 25/01/18 14:28, kernel test robot wrote:
>
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: e0367b12674bf4420870cd0237e3ebafb2ec9593 ("sched/deadline: Move CPU frequency selection triggering points")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: ltp
> with following parameters:
>
> test: sched
>
> test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features.
> test-url: http://linux-test-project.github.io/
>
>
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +------------------------------------------------------------------+------------+------------+
> | | d4edd662ac | e0367b1267 |
> +------------------------------------------------------------------+------------+------------+
> | boot_successes | 0 | 0 |
> | boot_failures | 9 | 7 |
> | invoked_oom-killer:gfp_mask=0x | 9 | 4 |
> | Mem-Info | 9 | 4 |
> | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 9 | 4 |
> | WARNING:at_kernel/sched/sched.h:#assert_clock_updated | 0 | 3 |
> | RIP:assert_clock_updated | 0 | 3 |
> | RIP:cpuidle_enter_state | 0 | 1 |
> | RIP:clear_page_erms | 0 | 1 |
> | RIP:page_add_file_rmap | 0 | 1 |
> +------------------------------------------------------------------+------------+------------+
>
>
>
> kern :warn : [ 123.841479] WARNING: CPU: 67 PID: 3837 at kernel/sched/sched.h:889 assert_clock_updated+0x13/0x20
> kern :warn : [ 123.841480] Modules linked in: loop intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mgag200 irqbypass crct10dif_pclmul ttm crc32_pclmul snd_pcm drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops snd_timer mxm_wmi pcbc ipmi_si ahci aesni_intel snd crypto_simd ipmi_devintf glue_helper soundcore drm libahci cryptd ipmi_msghandler pcspkr libata shpchp wmi acpi_pad acpi_power_meter ip_tables
> kern :warn : [ 123.841500] CPU: 67 PID: 3837 Comm: ltp-pan Not tainted 4.15.0-rc6-00133-ge0367b1 #1
> kern :warn : [ 123.841501] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
> kern :warn : [ 123.841503] RIP: 0010:assert_clock_updated+0x13/0x20
> kern :warn : [ 123.841504] RSP: 0018:ffff88102bdc3ee0 EFLAGS: 00010086
> kern :warn : [ 123.841505] RAX: 0000000000000026 RBX: ffff88106f23cfe8 RCX: 0000000000000000
> kern :warn : [ 123.841505] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000092
> kern :warn : [ 123.841506] RBP: ffff88102bde2800 R08: 0000000000000026 R09: ffffffff82b1d4e0
> user :notice: [ 123.841507] fpu_exception : yes
> kern :warn : [ 123.841507] R10: ffff8810280f4498 R11: 0000000000000000 R12: ffff8810279be000
>
> kern :warn : [ 123.841508] R13: 0000000000022800 R14: ffffffff81767d50 R15: ffff88102bddd1a8
> kern :warn : [ 123.841510] FS: 00007fd41d6d9700(0000) GS:ffff88102bdc0000(0000) knlGS:0000000000000000
> kern :warn : [ 123.841510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kern :warn : [ 123.841511] CR2: 0000563f9799ee30 CR3: 000000107bb78001 CR4: 00000000003606e0
> kern :warn : [ 123.841512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kern :warn : [ 123.841512] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kern :warn : [ 123.841513] Call Trace:
> kern :warn : [ 123.841515] <IRQ>
> kern :warn : [ 123.841517] inactive_task_timer+0x383/0x460
> kern :warn : [ 123.841520] ? pull_dl_task+0x600/0x600
> kern :warn : [ 123.841525] __hrtimer_run_queues+0xd3/0x230
> kern :warn : [ 123.841528] hrtimer_interrupt+0xa6/0x200
> kern :warn : [ 123.841532] smp_apic_timer_interrupt+0x56/0x130
> kern :warn : [ 123.841534] apic_timer_interrupt+0xa2/0xb0
> kern :warn : [ 123.841535] </IRQ>
> kern :warn : [ 123.841539] RIP: 0010:alloc_pid+0xcf/0x2a0
> kern :warn : [ 123.841539] RSP: 0018:ffffc900084b7db0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff11
> kern :warn : [ 123.841540] RAX: 000000000000171e RBX: 0000000000000000 RCX: 0000000000000000
> kern :warn : [ 123.841541] RDX: 000000000000171e RSI: ffff8810280f4470 RDI: ffffffff82205180
> kern :warn : [ 123.841542] RBP: ffffffff8225a2e0 R08: 0000000000000044 R09: 0000000000000228
> kern :warn : [ 123.841542] R10: ffff8810280f4498 R11: 000000000000001f R12: ffff881024787278
> kern :warn : [ 123.841543] R13: ffffffff8225a2e0 R14: ffff881024787240 R15: ffffffff82205180
> kern :warn : [ 123.841548] copy_process+0xf5a/0x1b00
> kern :warn : [ 123.841551] _do_fork+0xbd/0x3d0
> kern :warn : [ 123.841553] ? task_work_run+0x48/0xd0
> kern :warn : [ 123.841557] do_syscall_64+0x61/0x190
> kern :warn : [ 123.841559] entry_SYSCALL64_slow_path+0x25/0x25
> kern :warn : [ 123.841561] RIP: 0033:0x7fd41cec676b
> kern :warn : [ 123.841561] RSP: 002b:00007ffc261df870 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
> kern :warn : [ 123.841562] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd41cec676b
> kern :warn : [ 123.841563] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
> kern :warn : [ 123.841563] RBP: 00007ffc261df8b0 R08: 00007fd41d6d9700 R09: 000000000000001e
> kern :warn : [ 123.841564] R10: 00007fd41d6d99d0 R11: 0000000000000246 R12: 0000000000000000
> kern :warn : [ 123.841564] R13: 0000563f9799cae0 R14: 0000000000000000 R15: 0000563f9799cd10
> kern :warn : [ 123.841566] Code: 48 89 83 20 09 00 00 eb c7 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c7 c8 fe 0a 82 c6 05 c3 69 35 01 01 e8 ad 01 fc ff <0f> ff c3 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c7 15 04 0b 82 c6
> kern :warn : [ 123.841583] ---[ end trace 5f3bf5f199ab1f72 ]---
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml

Hummm, wondering how LTP sched tests could trigger this, since a quick
grep into ltp didn't show DEADLINE usage.

However, I'm currently thinking that we should update clock before
potentially calling sub_running_bw (which might trigger schedutil
update) in inactive_task_timer.

Would what follows cure it?

Thanks,

- Juri

--->8---
kernel/sched/deadline.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 9bb0e0c412ec..59761e635d12 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1261,6 +1261,9 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)

rq = task_rq_lock(p, &rf);

+ sched_clock_tick();
+ update_rq_clock(rq);
+
if (!dl_task(p) || p->state == TASK_DEAD) {
struct dl_bw *dl_b = dl_bw_of(task_cpu(p));

@@ -1280,9 +1283,6 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
if (dl_se->dl_non_contending == 0)
goto unlock;

- sched_clock_tick();
- update_rq_clock(rq);
-
sub_running_bw(dl_se, &rq->dl);
dl_se->dl_non_contending = 0;
unlock:

2018-01-25 12:54:23

by Cyril Hrubis

[permalink] [raw]
Subject: Re: [LTP] [lkp-robot] [sched/deadline] e0367b1267: WARNING:at_kernel/sched/sched.h:#assert_clock_updated

Hi!
> Hummm, wondering how LTP sched tests could trigger this, since a quick
> grep into ltp didn't show DEADLINE usage.

See kernel/syscalls/sched_setattr/sched_setattr01.c

--
Cyril Hrubis
[email protected]

2018-01-25 13:37:39

by Cyril Hrubis

[permalink] [raw]
Subject: Re: [LTP] [lkp-robot] [sched/deadline] e0367b1267: WARNING:at_kernel/sched/sched.h:#assert_clock_updated

Hi!
> > > Hummm, wondering how LTP sched tests could trigger this, since a quick
> > > grep into ltp didn't show DEADLINE usage.
> >
> > See kernel/syscalls/sched_setattr/sched_setattr01.c
>
> Right, saw that. I was still thinking though why the report seemed to
> point to sched, and not syscalls, tests.

These collections of tests have sometimes non empty intersections. In
this case the particular test is both part of the sched and syscall
testruns (grep sched_setattr01 runtest/*). So I suppose that the test
fails both in the sched and syscall run but the sched one was simply
picked first here.

> Thanks for pointing this out.

Glad to help :-).

--
Cyril Hrubis
[email protected]

2018-01-25 14:31:07

by Juri Lelli

[permalink] [raw]
Subject: Re: [LTP] [lkp-robot] [sched/deadline] e0367b1267: WARNING:at_kernel/sched/sched.h:#assert_clock_updated

On 25/01/18 13:24, Cyril Hrubis wrote:
> Hi!
> > Hummm, wondering how LTP sched tests could trigger this, since a quick
> > grep into ltp didn't show DEADLINE usage.
>
> See kernel/syscalls/sched_setattr/sched_setattr01.c

Right, saw that. I was still thinking though why the report seemed to
point to sched, and not syscalls, tests.

Thanks for pointing this out.

Best,

- Juri