2020-04-02 10:22:53

by Jian-Hong Pan

Subject: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

Hi,

We have several machines with NVIDIA cards: an Acer desktop with an NVIDIA
GTX 1660, an Acer Predator PH315-52 with an NVIDIA GeForce RTX 2060 Mobile,
and an ASUS UX581LV with an NVIDIA GeForce RTX 2060. They all take a long
time (more than 50 seconds) to resume after suspend, and the screen stays
blank while resuming. Checking dmesg, we found this error during resume:

[ 28.060831] PM: suspend entry (deep)
[ 28.144260] Filesystems sync: 0.083 seconds
[ 28.150219] Freezing user space processes ...
[ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 48.153447] systemd-udevd D13440 382 330 0x80004124
[ 48.153457] Call Trace:
[ 48.153504] ? __schedule+0x272/0x5a0
[ 48.153558] ? hrtimer_start_range_ns+0x18c/0x2c0
[ 48.153622] schedule+0x45/0xb0
[ 48.153668] schedule_hrtimeout_range_clock+0x8f/0x100
[ 48.153738] ? hrtimer_init_sleeper+0x80/0x80
[ 48.153798] usleep_range+0x5a/0x80
[ 48.153850] gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
[ 48.153933] gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
[ 48.154012] __i2c_transfer+0x163/0x4c0
[ 48.154067] i2c_transfer+0x6e/0xc0
[ 48.154120] ccg_read+0x11f/0x170 [ucsi_ccg]
[ 48.154182] get_fw_info+0x17/0x50 [ucsi_ccg]
[ 48.154242] ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
[ 48.154312] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
[ 48.154377] i2c_device_probe+0x113/0x210
[ 48.154435] really_probe+0xdf/0x280
[ 48.154487] driver_probe_device+0x4b/0xc0
[ 48.154545] device_driver_attach+0x4e/0x60
[ 48.154604] __driver_attach+0x44/0xb0
[ 48.154657] ? device_driver_attach+0x60/0x60
[ 48.154717] bus_for_each_dev+0x6c/0xb0
[ 48.154772] bus_add_driver+0x172/0x1c0
[ 48.154824] driver_register+0x67/0xb0
[ 48.154877] i2c_register_driver+0x39/0x70
[ 48.154932] ? 0xffffffffc00ac000
[ 48.154978] do_one_initcall+0x3e/0x1d0
[ 48.155032] ? free_vmap_area_noflush+0x8d/0xe0
[ 48.155093] ? _cond_resched+0x10/0x20
[ 48.155145] ? kmem_cache_alloc_trace+0x3a/0x1b0
[ 48.155208] do_init_module+0x56/0x200
[ 48.155260] load_module+0x21fe/0x24e0
[ 48.155322] ? __do_sys_finit_module+0xbf/0xe0
[ 48.155381] __do_sys_finit_module+0xbf/0xe0
[ 48.155441] do_syscall_64+0x3d/0x130
[ 48.156841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.158074] RIP: 0033:0x7fba3b4bc2a9
[ 48.158707] Code: Bad RIP value.
[ 48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
[ 48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
[ 48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
[ 48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
[ 48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470
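
For reference, the gap between the two freezer messages in the log above can be checked with a few lines of Python. This is only an illustrative sketch: the `LOG` excerpt is copied from the trace, while `parse_dmesg` and `freeze_duration` are hypothetical helper names and not part of any kernel tooling. The computed gap should agree with the 20.003 seconds the kernel itself reports, which is consistent with the freezer giving up at its default 20-second timeout.

```python
import re

# Two lines copied from the dmesg output above (the wrapped line is joined).
LOG = """\
[ 28.150219] Freezing user space processes ...
[ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
"""

# Matches "[ 28.150219] message" and captures the timestamp and the message.
TIMESTAMPED = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(log):
    """Return a list of (timestamp_seconds, message) for timestamped lines."""
    entries = []
    for line in log.splitlines():
        match = TIMESTAMPED.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def freeze_duration(log):
    """Seconds between the freeze attempt and its failure, or None."""
    start = end = None
    for timestamp, message in parse_dmesg(log):
        if message.startswith("Freezing user space processes"):
            start = timestamp
        elif message.startswith("Freezing of tasks failed"):
            end = timestamp
    if start is None or end is None:
        return None
    return end - start


if __name__ == "__main__":
    print(f"freezer gave up after {freeze_duration(LOG):.3f} s")
```

The same helpers can be pointed at a full saved dmesg file to see how long each suspend attempt spent in the freezer.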

I have filed this in Bugzilla, with more details:
https://bugzilla.kernel.org/show_bug.cgi?id=206653

Any comments would be appreciated.

Thanks,
Jian-Hong Pan


2020-04-02 10:35:48

by Heikki Krogerus

Subject: Re: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

Hi,

On Thu, Apr 02, 2020 at 06:22:14PM +0800, Jian-Hong Pan wrote:
> Hi,
>
> We got some machines like Acer desktop equipped with NVIDIA GTX 1660
> card, Acer Predator PH315-52 equipped with NVIDIA GeForce RTX 2060
> Mobile and ASUS UX581LV equipped with NVIDIA GeForce RTX 2060.
> We found them take long time (more than 50 seconds) to resume after
> suspend. During the resuming time, the screen is blank. And check
> the dmesg, found the error during resume:
>
> [ 28.060831] PM: suspend entry (deep)
> [ 28.144260] Filesystems sync: 0.083 seconds
> [ 28.150219] Freezing user space processes ...
> [ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
> [ 48.153447] systemd-udevd D13440 382 330 0x80004124
> [ 48.153457] Call Trace:
> [ 48.153504] ? __schedule+0x272/0x5a0
> [ 48.153558] ? hrtimer_start_range_ns+0x18c/0x2c0
> [ 48.153622] schedule+0x45/0xb0
> [ 48.153668] schedule_hrtimeout_range_clock+0x8f/0x100
> [ 48.153738] ? hrtimer_init_sleeper+0x80/0x80
> [ 48.153798] usleep_range+0x5a/0x80
> [ 48.153850] gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
> [ 48.153933] gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
> [ 48.154012] __i2c_transfer+0x163/0x4c0
> [ 48.154067] i2c_transfer+0x6e/0xc0
> [ 48.154120] ccg_read+0x11f/0x170 [ucsi_ccg]
> [ 48.154182] get_fw_info+0x17/0x50 [ucsi_ccg]
> [ 48.154242] ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
> [ 48.154312] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
> [ 48.154377] i2c_device_probe+0x113/0x210
> [ 48.154435] really_probe+0xdf/0x280
> [ 48.154487] driver_probe_device+0x4b/0xc0
> [ 48.154545] device_driver_attach+0x4e/0x60
> [ 48.154604] __driver_attach+0x44/0xb0
> [ 48.154657] ? device_driver_attach+0x60/0x60
> [ 48.154717] bus_for_each_dev+0x6c/0xb0
> [ 48.154772] bus_add_driver+0x172/0x1c0
> [ 48.154824] driver_register+0x67/0xb0
> [ 48.154877] i2c_register_driver+0x39/0x70
> [ 48.154932] ? 0xffffffffc00ac000
> [ 48.154978] do_one_initcall+0x3e/0x1d0
> [ 48.155032] ? free_vmap_area_noflush+0x8d/0xe0
> [ 48.155093] ? _cond_resched+0x10/0x20
> [ 48.155145] ? kmem_cache_alloc_trace+0x3a/0x1b0
> [ 48.155208] do_init_module+0x56/0x200
> [ 48.155260] load_module+0x21fe/0x24e0
> [ 48.155322] ? __do_sys_finit_module+0xbf/0xe0
> [ 48.155381] __do_sys_finit_module+0xbf/0xe0
> [ 48.155441] do_syscall_64+0x3d/0x130
> [ 48.156841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 48.158074] RIP: 0033:0x7fba3b4bc2a9
> [ 48.158707] Code: Bad RIP value.
> [ 48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
> [ 48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
> [ 48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
> [ 48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> [ 48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470
>
> I have filed this to bugzilla and more detail:
> https://bugzilla.kernel.org/show_bug.cgi?id=206653
>
> Any comment will be appreciated.

You are using an outdated kernel, 5.4.0. Please make sure that you can
reproduce the issue with mainline, or at least with the longterm
5.4.x.

Ajay, based on the backtrace, the issue seems to be starting from your
I2C driver. Please take a look at this.


thanks,

--
heikki

2020-04-02 10:38:45

by Heikki Krogerus

Subject: Re: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

On Thu, Apr 02, 2020 at 01:34:51PM +0300, Heikki Krogerus wrote:
> You are using an outdated kernel, 5.4.0. Please make sure that you can
> reproduce the issue with mainline, or at least with the longterm
> 5.4.x.

I meant the latest 5.4.x, which today is 5.4.29.
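
A quick way to check whether a test machine is already at or beyond that release is to compare the running kernel's version string against 5.4.29. The following is a minimal sketch, not an official tool: `kernel_version_tuple` and `is_at_least` are hypothetical helper names, and the comparison only looks at the leading numeric part of the release string. Distribution kernels often use their own numbering (for example a release string such as `5.4.0-42-generic` may carry later stable fixes), so treat the result as a hint only.

```python
import platform
import re


def kernel_version_tuple(release):
    """Extract (major, minor, patch) from a release string like '5.4.29'.

    A missing patch level (for example '5.6-rc1') is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_at_least(release, minimum=(5, 4, 29)):
    """True if the release is the same as or newer than `minimum`."""
    return kernel_version_tuple(release) >= minimum


if __name__ == "__main__":
    running = platform.release()
    verdict = "at or beyond" if is_at_least(running) else "older than"
    print(f"{running} is {verdict} 5.4.29")
```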


thanks,

--
heikki

2020-04-02 21:18:52

by Ajay Gupta

Subject: RE: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

Hi Jian

> -----Original Message-----
> From: Heikki Krogerus <[email protected]>
> Sent: Thursday, April 2, 2020 3:35 AM
> To: Jian-Hong Pan <[email protected]>; Ajay Gupta
> <[email protected]>
> Cc: [email protected]; Linux Kernel <[email protected]>;
> [email protected]; Linux Upstreaming Team <[email protected]>
> Subject: Re: [BUG] i2c_nvidia_gpu takes long time and makes system
> suspend & resume failed with NVIDIA cards
>
> Hi,
>
> On Thu, Apr 02, 2020 at 06:22:14PM +0800, Jian-Hong Pan wrote:
> > Hi,
> >
> > We got some machines like Acer desktop equipped with NVIDIA GTX 1660
> > card, Acer Predator PH315-52 equipped with NVIDIA GeForce RTX 2060
> > Mobile and ASUS UX581LV equipped with NVIDIA GeForce RTX 2060.
> > We found them take long time (more than 50 seconds) to resume after
> > suspend. During the resuming time, the screen is blank. And check
> > the dmesg, found the error during resume:
> >
> > [ 28.060831] PM: suspend entry (deep)
> > [ 28.144260] Filesystems sync: 0.083 seconds
> > [ 28.150219] Freezing user space processes ...
> > [ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
> > [ 48.153447] systemd-udevd D13440 382 330 0x80004124
> > [ 48.153457] Call Trace:
> > [ 48.153504] ? __schedule+0x272/0x5a0
> > [ 48.153558] ? hrtimer_start_range_ns+0x18c/0x2c0
> > [ 48.153622] schedule+0x45/0xb0
> > [ 48.153668] schedule_hrtimeout_range_clock+0x8f/0x100
> > [ 48.153738] ? hrtimer_init_sleeper+0x80/0x80
> > [ 48.153798] usleep_range+0x5a/0x80
> > [ 48.153850] gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
> > [ 48.153933] gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
> > [ 48.154012] __i2c_transfer+0x163/0x4c0
> > [ 48.154067] i2c_transfer+0x6e/0xc0
> > [ 48.154120] ccg_read+0x11f/0x170 [ucsi_ccg]
> > [ 48.154182] get_fw_info+0x17/0x50 [ucsi_ccg]
> > [ 48.154242] ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
> > [ 48.154312] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
> > [ 48.154377] i2c_device_probe+0x113/0x210
> > [ 48.154435] really_probe+0xdf/0x280
> > [ 48.154487] driver_probe_device+0x4b/0xc0
> > [ 48.154545] device_driver_attach+0x4e/0x60
> > [ 48.154604] __driver_attach+0x44/0xb0
> > [ 48.154657] ? device_driver_attach+0x60/0x60
> > [ 48.154717] bus_for_each_dev+0x6c/0xb0
> > [ 48.154772] bus_add_driver+0x172/0x1c0
> > [ 48.154824] driver_register+0x67/0xb0
> > [ 48.154877] i2c_register_driver+0x39/0x70
> > [ 48.154932] ? 0xffffffffc00ac000
> > [ 48.154978] do_one_initcall+0x3e/0x1d0
> > [ 48.155032] ? free_vmap_area_noflush+0x8d/0xe0
> > [ 48.155093] ? _cond_resched+0x10/0x20
> > [ 48.155145] ? kmem_cache_alloc_trace+0x3a/0x1b0
> > [ 48.155208] do_init_module+0x56/0x200
> > [ 48.155260] load_module+0x21fe/0x24e0
> > [ 48.155322] ? __do_sys_finit_module+0xbf/0xe0
> > [ 48.155381] __do_sys_finit_module+0xbf/0xe0
> > [ 48.155441] do_syscall_64+0x3d/0x130
> > [ 48.156841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 48.158074] RIP: 0033:0x7fba3b4bc2a9
> > [ 48.158707] Code: Bad RIP value.
> > [ 48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > [ 48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
> > [ 48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
> > [ 48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
> > [ 48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> > [ 48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470
> >
> > I have filed this to bugzilla and more detail:
> > https://bugzilla.kernel.org/show_bug.cgi?id=206653
> >
> > Any comment will be appreciated.
>
> You are using an outdated kernel, 5.4.0. Please make sure that you can
> reproduce the issue with mainline, or at least with the longterm 5.4.x.
>
> Ajay, based on the backtrace, the issue seems to be starting from your I2C
> driver. Please take a look at this.

I have replied on Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=206653#c5

Thanks
> nvpuclic
>
> thanks,
>
> --
> heikki

2020-04-13 10:24:55

by Jian-Hong Pan

Subject: Re: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

On Fri, Apr 3, 2020 at 4:59 AM Ajay Gupta <[email protected]> wrote:
>
> Hi Jian
>
> > -----Original Message-----
> > From: Heikki Krogerus <[email protected]>
> > Sent: Thursday, April 2, 2020 3:35 AM
> > To: Jian-Hong Pan <[email protected]>; Ajay Gupta
> > <[email protected]>
> > Cc: [email protected]; Linux Kernel <[email protected]>;
> > [email protected]; Linux Upstreaming Team <[email protected]>
> > Subject: Re: [BUG] i2c_nvidia_gpu takes long time and makes system
> > suspend & resume failed with NVIDIA cards
> >
> > Hi,
> >
> > On Thu, Apr 02, 2020 at 06:22:14PM +0800, Jian-Hong Pan wrote:
> > > Hi,
> > >
> > > We got some machines like Acer desktop equipped with NVIDIA GTX 1660
> > > card, Acer Predator PH315-52 equipped with NVIDIA GeForce RTX 2060
> > > Mobile and ASUS UX581LV equipped with NVIDIA GeForce RTX 2060.
> > > We found them take long time (more than 50 seconds) to resume after
> > > suspend. During the resuming time, the screen is blank. And check
> > > the dmesg, found the error during resume:
> > >
> > > [ 28.060831] PM: suspend entry (deep)
> > > [ 28.144260] Filesystems sync: 0.083 seconds
> > > [ 28.150219] Freezing user space processes ...
> > > [ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
> > > [ 48.153447] systemd-udevd D13440 382 330 0x80004124
> > > [ 48.153457] Call Trace:
> > > [ 48.153504] ? __schedule+0x272/0x5a0
> > > [ 48.153558] ? hrtimer_start_range_ns+0x18c/0x2c0
> > > [ 48.153622] schedule+0x45/0xb0
> > > [ 48.153668] schedule_hrtimeout_range_clock+0x8f/0x100
> > > [ 48.153738] ? hrtimer_init_sleeper+0x80/0x80
> > > [ 48.153798] usleep_range+0x5a/0x80
> > > [ 48.153850] gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
> > > [ 48.153933] gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
> > > [ 48.154012] __i2c_transfer+0x163/0x4c0
> > > [ 48.154067] i2c_transfer+0x6e/0xc0
> > > [ 48.154120] ccg_read+0x11f/0x170 [ucsi_ccg]
> > > [ 48.154182] get_fw_info+0x17/0x50 [ucsi_ccg]
> > > [ 48.154242] ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
> > > [ 48.154312] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
> > > [ 48.154377] i2c_device_probe+0x113/0x210
> > > [ 48.154435] really_probe+0xdf/0x280
> > > [ 48.154487] driver_probe_device+0x4b/0xc0
> > > [ 48.154545] device_driver_attach+0x4e/0x60
> > > [ 48.154604] __driver_attach+0x44/0xb0
> > > [ 48.154657] ? device_driver_attach+0x60/0x60
> > > [ 48.154717] bus_for_each_dev+0x6c/0xb0
> > > [ 48.154772] bus_add_driver+0x172/0x1c0
> > > [ 48.154824] driver_register+0x67/0xb0
> > > [ 48.154877] i2c_register_driver+0x39/0x70
> > > [ 48.154932] ? 0xffffffffc00ac000
> > > [ 48.154978] do_one_initcall+0x3e/0x1d0
> > > [ 48.155032] ? free_vmap_area_noflush+0x8d/0xe0
> > > [ 48.155093] ? _cond_resched+0x10/0x20
> > > [ 48.155145] ? kmem_cache_alloc_trace+0x3a/0x1b0
> > > [ 48.155208] do_init_module+0x56/0x200
> > > [ 48.155260] load_module+0x21fe/0x24e0
> > > [ 48.155322] ? __do_sys_finit_module+0xbf/0xe0
> > > [ 48.155381] __do_sys_finit_module+0xbf/0xe0
> > > [ 48.155441] do_syscall_64+0x3d/0x130
> > > [ 48.156841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > [ 48.158074] RIP: 0033:0x7fba3b4bc2a9
> > > [ 48.158707] Code: Bad RIP value.
> > > [ 48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > > [ 48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
> > > [ 48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
> > > [ 48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
> > > [ 48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> > > [ 48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470
> > >
> > > I have filed this to bugzilla and more detail:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=206653
> > >
> > > Any comment will be appreciated.
> >
> > You are using an outdated kernel, 5.4.0. Please make sure that you can
> > reproduce the issue with mainline, or at least with the longterm 5.4.x.
> >
> > Ajay, based on the backtrace, the issue seems to be starting from your I2C
> > driver. Please take a look at this.
>
> I have replied to Bugzilla
> https://bugzilla.kernel.org/show_bug.cgi?id=206653#c5

Thanks to both of you for your replies!

I have posted the test results at
https://bugzilla.kernel.org/show_bug.cgi?id=206653#c6

Jian-Hong Pan

> Thanks
> > nvpuclic
> >
> > thanks,
> >
> > --
> > heikki