2022-08-03 20:19:20

by bugzilla-daemon

Subject: [Bug 216322] New: Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

https://bugzilla.kernel.org/show_bug.cgi?id=216322

Bug ID: 216322
Summary: Freezing of tasks failed after 60.004 seconds (1 tasks
refusing to freeze... task:fstrim ext4_trim_fs - Dell
XPS 13 9310
Product: File System
Version: 2.5
Kernel Version: 5.19.0
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
Assignee: [email protected]
Reporter: [email protected]
Regression: No

The system's suspend-to-idle path occasionally times out after 60 seconds
and prints the stack trace below.

The result is that the suspend is aborted and the system continues
running.

Unfortunately, when a user invokes suspend, they expect it to work,
and they may not be around (lid closed) to try again when it aborts...

[10483.047079] PM: suspend entry (s2idle)
[10483.052777] Filesystems sync: 0.005 seconds
[10483.052782] PM: Preparing system for sleep (s2idle)
[10483.060824] Freezing user space processes ...
[10543.024088] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze, wq_busy=0):
[10543.024175] task:fstrim state:D stack: 0 pid:225775 ppid: 1 flags:0x00004006
[10543.024183] Call Trace:
[10543.024186] <TASK>
[10543.024192] __schedule+0x306/0x9f0
[10543.024202] schedule+0x5c/0xd0
[10543.024206] schedule_timeout+0x87/0x160
[10543.024211] ? timer_migration_handler+0xa0/0xa0
[10543.024217] trace_clock_x86_tsc+0x20/0x20
[10543.024224] __wait_for_common+0x8f/0x190
[10543.024228] ? firmware_map_remove+0x9c/0x9c
[10543.024233] wait_for_completion_io_timeout+0x1d/0x30
[10543.024237] submit_bio_wait+0x7f/0xc0
[10543.024244] blkdev_issue_discard+0x6e/0xc0
[10543.024250] ext4_try_to_trim_range+0x1f0/0x440
[10543.024259] ext4_trim_fs+0x327/0x4d0
[10543.024266] __ext4_ioctl+0x2d3/0x1590
[10543.024270] ? putname+0x59/0x70
[10543.024275] ? __seccomp_filter+0x3a6/0x5c0
[10543.024283] ext4_ioctl+0xe/0x20
[10543.024287] __x64_sys_ioctl+0x92/0xd0
[10543.024293] do_syscall_64+0x59/0x90
[10543.024297] ? do_syscall_64+0x69/0x90
[10543.024300] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[10543.024306] RIP: 0033:0x7f2ae6b1aaff
[10543.024312] RSP: 002b:00007ffd41e6de60 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[10543.024317] RAX: ffffffffffffffda RBX: 00007ffd41e6dfb0 RCX: 00007f2ae6b1aaff
[10543.024319] RDX: 00007ffd41e6ded0 RSI: 00000000c0185879 RDI: 0000000000000003
[10543.024322] RBP: 00005621c3642e90 R08: 00005621c3642e90 R09: 0000000000000000
[10543.024324] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[10543.024326] R13: 00005621c3642ce0 R14: 00005621c3642980 R15: 00005621c3642980
[10543.024330] </TASK>
[10543.024343] OOM killer enabled.
[10543.024344] Restarting tasks ... done.
[10543.085776] PM: suspend exit
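The register dump shows which call fstrim was stuck in. On x86-64, ORIG_RAX holds the syscall number (0x10 is ioctl), RDI the first argument (file descriptor 3) and RSI the ioctl request code. As a small sketch, assuming the asm-generic _IOC encoding that x86-64 uses, the request code can be decoded with the _IOC_* macros from <linux/ioctl.h>; the helper name describe_ioctl() is made up for illustration.

```c
#include <assert.h>       /* static_assert */
#include <linux/fs.h>     /* FITRIM, struct fstrim_range */
#include <linux/ioctl.h>  /* _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */
#include <stdio.h>

/* struct fstrim_range is three __u64 fields: start, len, minlen. */
static_assert(sizeof(struct fstrim_range) == 24, "expected a 24-byte fstrim_range");

/* Hypothetical helper: print the direction, type, number and argument
 * size packed into an ioctl request code (asm-generic encoding). */
void describe_ioctl(unsigned long req)
{
    unsigned int dir = (unsigned int)_IOC_DIR(req);

    printf("dir=%s%s type='%c' nr=%u size=%u%s\n",
           (dir & _IOC_READ) ? "R" : "",
           (dir & _IOC_WRITE) ? "W" : "",
           (int)_IOC_TYPE(req),
           (unsigned int)_IOC_NR(req),
           (unsigned int)_IOC_SIZE(req),
           req == FITRIM ? " (FITRIM)" : "");
}
```

On x86-64, describe_ioctl(0xc0185879UL) should print dir=RW type='X' nr=121 size=24 (FITRIM): a read/write ioctl whose argument is the 24-byte struct fstrim_range.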

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


2022-08-03 20:28:04

by bugzilla-daemon

Subject: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

https://bugzilla.kernel.org/show_bug.cgi?id=216322

--- Comment #1 from Len Brown ([email protected]) ---
Created attachment 301522
--> https://bugzilla.kernel.org/attachment.cgi?id=301522&action=edit
html.gz page showing the failure

The sleepgraph output shows the 'fstrim' task
continuously calling schedule_timeout(15000).

Interestingly, re-trying the suspend after this failure
is successful. So the failure is not permanent.
(perhaps the kernel is waiting for a frozen user process that is allowed to
proceed?)


2022-08-04 01:02:53

by bugzilla-daemon

Subject: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

https://bugzilla.kernel.org/show_bug.cgi?id=216322

Theodore Tso ([email protected]) changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]

--- Comment #2 from Theodore Tso ([email protected]) ---
So the problem is that the FITRIM ioctl does not check if a signal is pending,
and so if the fstrim program requests that the entire SSD be trimmed (len=ULLONG_MAX),
like the broomstick set off by Mickey Mouse in Fantasia's "Sorcerer's
Apprentice", it will mindlessly send discard requests for any blocks not in use
by the file system until it is done. Or to put it another way, "Neither rain,
nor snow, nor a request to freeze the OS, shall stop the FITRIM ioctl from its
appointed task." :-)

The question is how to fix things. The problem is that the FITRIM ioctl
interface is pretty horrible. The fstrim_range.len field is an IN/OUT
field: on input it is the number of bytes that should be trimmed (from
start to start+len), and when the ioctl returns, fstrim_range.len is the number of
bytes that were actually trimmed. So this is not really amenable to
-ERESTARTSYS.
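For reference, a minimal userspace sketch of a FITRIM caller, a simplified version of roughly what fstrim does for one mount point. The helper names trim_whole_fs() and trim_mount_point() are hypothetical, and error handling is reduced to returning -1 with errno set. The same fstrim_range.len field carries the requested length in and the trimmed byte count out. The ioctl needs CAP_SYS_ADMIN and a descriptor for a file on the mounted file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>     /* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the file system behind fd to discard all of its free space.
 * Returns 0 and stores the bytes trimmed in *trimmed, or -1 with errno set. */
int trim_whole_fs(int fd, uint64_t *trimmed)
{
    struct fstrim_range range = {
        .start = 0,
        .len = ULLONG_MAX,   /* IN: trim everything from start onwards */
        .minlen = 0,         /* no minimum extent length requested */
    };

    assert(trimmed != NULL);
    if (ioctl(fd, FITRIM, &range) < 0)
        return -1;
    *trimmed = range.len;    /* OUT: the same field now holds bytes trimmed */
    return 0;
}

/* Hypothetical wrapper: open a mount point and trim it. */
int trim_mount_point(const char *path, uint64_t *trimmed)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    int ret, saved_errno;

    if (fd < 0)
        return -1;
    ret = trim_whole_fs(fd, trimmed);
    saved_errno = errno;     /* close() must not clobber the ioctl error */
    close(fd);
    errno = saved_errno;
    return ret;
}
```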

Worse, the fstrim program in util-linux doesn't handle an EAGAIN error return
code, so if it gets EAGAIN after try_to_freeze_tasks sends the fake signal
to the process, fstrim will print "fstrim: FITRIM ioctl failed" to stderr and
the rest of the file system trim operation will be aborted.

It might be that the only way we can fix this is to have FITRIM return EAGAIN,
which will stop the fstrim in its tracks. This is... not great, but typically
fstrim is run out of crontab or a systemd timer once a month, so if the user
tries to suspend right as the fstrim is running, hopefully we'll get lucky next
month. We can then try to teach fstrim to do the right thing, and so this
lossage mode would only happen in the combination of a new kernel and an older
version of util-linux.

I'm not happy with that solution, but the alternative of creating a new FITRIM2
ioctl that has a sane interface means that you need a new kernel and a new
util-linux package, and if you don't, the user will have to deal with a hot
laptop bag and a drained battery. And not changing FITRIM's behaviour will
have the same potential end result, if the user gets unlucky and tries to
suspend the laptop when FITRIM still needs more than 60 seconds to
complete. :-/

The other thing I'll note is that every file system has its own FITRIM
implementation, and I suspect they all have this issue, because the FITRIM
interface is fundamentally flawed.


2022-08-04 01:08:11

by bugzilla-daemon

Subject: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

https://bugzilla.kernel.org/show_bug.cgi?id=216322

--- Comment #3 from Theodore Tso ([email protected]) ---
The other consideration is whether there is some userspace application other
than util-linux which is using the FITRIM ioctl --- for example, what if
systemd decided it needed to reimplement fstrim the way it has reimplemented
syslogd, ntpd, etc., etc., etc.? Suppose we change FITRIM so that, if the
task gets a signal or the system tries to suspend, it returns EAGAIN with
fstrim_range.len set to the number of bytes trimmed so far. That might
cause the systemd reimplementation (or any other hypothetical user of
FITRIM) to break if a suspend-to-ram happens at an inopportune
time.
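The proposed behaviour can be modelled in userspace to make the trade-off concrete. In this toy model (not ext4 code), the trim walks the block groups and checks a stop predicate between groups; in the kernel that would be a check such as signal_pending(current) or freezing(current). On interruption it returns -EAGAIN with len set to the bytes trimmed so far. model_trim_fs(), stop_after() and the per-group free byte counts are all invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "should this task stop now?" predicate. */
typedef bool (*stop_fn)(void *ctx);

/* Toy model of the proposed FITRIM behaviour: discard each group's free
 * space in turn. Returns 0 when done, or -EAGAIN if should_stop() fires
 * first. Either way *len is overwritten with the bytes trimmed, mirroring
 * the overloaded fstrim_range.len field. */
int model_trim_fs(const uint64_t *group_free_bytes, size_t ngroups,
                  stop_fn should_stop, void *ctx, uint64_t *len)
{
    uint64_t trimmed = 0;

    assert(should_stop != NULL && len != NULL);
    for (size_t g = 0; g < ngroups; g++) {
        if (should_stop(ctx)) {      /* e.g. signal_pending() || freezing() */
            *len = trimmed;          /* how much, but not *where* we stopped */
            return -EAGAIN;
        }
        trimmed += group_free_bytes[g];   /* "issue discards" for group g */
    }
    *len = trimmed;
    return 0;
}

/* Example predicate: allow *ctx more groups, then request a stop. */
bool stop_after(void *ctx)
{
    int *groups_left = ctx;

    return (*groups_left)-- <= 0;
}
```

With per-group free space {100, 200, 300, 400} and stop_after() allowing two groups, model_trim_fs() returns -EAGAIN with len == 300. Nothing in that result tells the caller that groups 2 and 3 are still untrimmed, which is exactly the restart problem.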

So which is worse?

1) Leaving suspend-to-ram broken if the user is unlucky enough to try to
suspend their laptop while fstrim is run automatically by systemd or out of
crontab?

2) Breaking random userspace programs that use FITRIM so they don't
complete the requested file system/SSD maintenance if the user tries to suspend
their laptop while that program happens to be running? (We can fix the
userspace programs which use FITRIM so they handle the EAGAIN error return as
we find them, of course. At the moment, it's only util-linux as far as I
know.)

In the long term, #2 seems like the best approach, IMHO. OTOH, it could be
argued that we've lived with this for years and years and years, and no one has
noticed up until now.


2022-08-04 12:01:58

by bugzilla-daemon

Subject: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

https://bugzilla.kernel.org/show_bug.cgi?id=216322

--- Comment #4 from Lukas Czerner ([email protected]) ---
On Thu, Aug 04, 2022 at 12:44:45AM +0000, [email protected] wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216322
>
> Theodore Tso ([email protected]) changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |[email protected]
>
> --- Comment #2 from Theodore Tso ([email protected]) ---
> So the problem is that the FITRIM ioctl does not check if a signal is pending,
> and so if the fstrim program requests that the entire SSD be trimmed (len=ULLONG_MAX),
> like the broomstick set off by Mickey Mouse in Fantasia's "Sorcerer's
> Apprentice", it will mindlessly send discard requests for any blocks not in use
> by the file system until it is done. Or to put it another way, "Neither rain,
> nor snow, nor a request to freeze the OS, shall stop the FITRIM ioctl from its
> appointed task." :-)
>
> The question is how to fix things. The problem is that the FITRIM ioctl
> interface is pretty horrible. The fstrim_range.len field is an IN/OUT
> field: on input it is the number of bytes that should be trimmed (from
> start to start+len), and when the ioctl returns, fstrim_range.len is the number of
> bytes that were actually trimmed. So this is not really amenable to
> -ERESTARTSYS.
>
> Worse, the fstrim program in util-linux doesn't handle an EAGAIN error return
> code, so if it gets EAGAIN after try_to_freeze_tasks sends the fake signal
> to the process, fstrim will print "fstrim: FITRIM ioctl failed" to stderr and
> the rest of the file system trim operation will be aborted.
>
> It might be that the only way we can fix this is to have FITRIM return EAGAIN,
> which will stop the fstrim in its tracks. This is... not great, but typically
> fstrim is run out of crontab or a systemd timer once a month, so if the user
> tries to suspend right as the fstrim is running, hopefully we'll get lucky next
> month. We can then try to teach fstrim to do the right thing, and so this
> lossage mode would only happen in the combination of a new kernel and an older
> version of util-linux.
>
> I'm not happy with that solution, but the alternative of creating a new FITRIM2
> ioctl that has a sane interface means that you need a new kernel and a new
> util-linux package, and if you don't, the user will have to deal with a hot
> laptop bag and a drained battery. And not changing FITRIM's behaviour will
> have the same potential end result, if the user gets unlucky and tries to
> suspend the laptop when FITRIM still needs more than 60 seconds to
> complete. :-/
>
> The other thing I'll note is that every file system has its own FITRIM
> implementation, and I suspect they all have this issue, because the FITRIM
> interface is fundamentally flawed.

I agree that the FITRIM interface is flawed in this way. But
ext4_try_to_trim_range() actually does have fatal_signal_pending() and
will return -ERESTARTSYS if that's true. Or did you have something else in
mind?

Also in that case, I see no reason why we would not be able to adjust
the fstrim_range to make it easier to re-start where we left off if
we're going to return -ERESTARTSYS. Am I missing something?

I have not had time to look deeply into the traces, but are you actually
sure that we're not stuck in blkdev_issue_discard() instead?

-Lukas


2022-08-04 12:03:52

by Lukas Czerner

Subject: Re: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

On Thu, Aug 04, 2022 at 12:44:45AM +0000, [email protected] wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216322
>
> Theodore Tso ([email protected]) changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |[email protected]
>
> --- Comment #2 from Theodore Tso ([email protected]) ---
> So the problem is that the FITRIM ioctl does not check if a signal is pending,
> and so if the fstrim program requests that the entire SSD be trimmed (len=ULLONG_MAX),
> like the broomstick set off by Mickey Mouse in Fantasia's "Sorcerer's
> Apprentice", it will mindlessly send discard requests for any blocks not in use
> by the file system until it is done. Or to put it another way, "Neither rain,
> nor snow, nor a request to freeze the OS, shall stop the FITRIM ioctl from its
> appointed task." :-)
>
> The question is how to fix things. The problem is that the FITRIM ioctl
> interface is pretty horrible. The fstrim_range.len field is an IN/OUT
> field: on input it is the number of bytes that should be trimmed (from
> start to start+len), and when the ioctl returns, fstrim_range.len is the number of
> bytes that were actually trimmed. So this is not really amenable to
> -ERESTARTSYS.
>
> Worse, the fstrim program in util-linux doesn't handle an EAGAIN error return
> code, so if it gets EAGAIN after try_to_freeze_tasks sends the fake signal
> to the process, fstrim will print "fstrim: FITRIM ioctl failed" to stderr and
> the rest of the file system trim operation will be aborted.
>
> It might be that the only way we can fix this is to have FITRIM return EAGAIN,
> which will stop the fstrim in its tracks. This is... not great, but typically
> fstrim is run out of crontab or a systemd timer once a month, so if the user
> tries to suspend right as the fstrim is running, hopefully we'll get lucky next
> month. We can then try to teach fstrim to do the right thing, and so this
> lossage mode would only happen in the combination of a new kernel and an older
> version of util-linux.
>
> I'm not happy with that solution, but the alternative of creating a new FITRIM2
> ioctl that has a sane interface means that you need a new kernel and a new
> util-linux package, and if you don't, the user will have to deal with a hot
> laptop bag and a drained battery. And not changing FITRIM's behaviour will
> have the same potential end result, if the user gets unlucky and tries to
> suspend the laptop when FITRIM still needs more than 60 seconds to
> complete. :-/
>
> The other thing I'll note is that every file system has its own FITRIM
> implementation, and I suspect they all have this issue, because the FITRIM
> interface is fundamentally flawed.

I agree that the FITRIM interface is flawed in this way. But
ext4_try_to_trim_range() actually does have fatal_signal_pending() and
will return -ERESTARTSYS if that's true. Or did you have something else in
mind?

Also in that case, I see no reason why we would not be able to adjust
the fstrim_range to make it easier to re-start where we left off if
we're going to return -ERESTARTSYS. Am I missing something?

I have not had time to look deeply into the traces, but are you actually
sure that we're not stuck in blkdev_issue_discard() instead?

-Lukas


2022-08-04 14:56:22

by Theodore Ts'o

Subject: Re: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

On Thu, Aug 04, 2022 at 11:47:47AM +0000, [email protected] wrote:
>
> I agree that the FITRIM interface is flawed in this way. But
> ext4_try_to_trim_range() actually does have fatal_signal_pending() and
> will return -ERESTARTSYS if that's true. Or did you have something else in
> mind?

fatal_signal_pending() only checks for SIGKILL. I'm not sure why
it returns ERESTARTSYS, since that's not applicable for a kill -9
signal. The fake_signal_wake_up() function in kernel/freezer.c
doesn't send a fatal signal, so the fatal_signal_pending() check isn't
going to help here.
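A toy model of the distinction drawn here. Based on kernel/freezer.c and include/linux/sched/signal.h around v5.19, fake_signal_wake_up() goes through signal_wake_up(), which sets the task's TIF_SIGPENDING flag without queuing any signal. So signal_pending() becomes true, while fatal_signal_pending(), which also requires SIGKILL in the task's pending set, stays false. The structure and function names below are hypothetical stand-ins, not kernel APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Toy model of the two pieces of task state involved; not kernel code. */
struct task_model {
    bool tif_sigpending;   /* models the TIF_SIGPENDING thread flag */
    sigset_t pending;      /* models the set of queued signals */
};

void task_init(struct task_model *t)
{
    t->tif_sigpending = false;
    sigemptyset(&t->pending);
}

/* Like fake_signal_wake_up(): raise the flag, queue nothing. */
void model_fake_signal_wake_up(struct task_model *t)
{
    t->tif_sigpending = true;
}

/* Like sending a real signal: queue it and raise the flag. */
void model_send_signal(struct task_model *t, int sig)
{
    assert(sig > 0);
    sigaddset(&t->pending, sig);
    t->tif_sigpending = true;
}

/* Models the TIF_SIGPENDING test in signal_pending(). */
bool model_signal_pending(const struct task_model *t)
{
    return t->tif_sigpending;
}

/* Models fatal_signal_pending(): flag set AND SIGKILL queued. */
bool model_fatal_signal_pending(const struct task_model *t)
{
    return model_signal_pending(t) && sigismember(&t->pending, SIGKILL) == 1;
}
```

After model_fake_signal_wake_up(), model_signal_pending() is true but model_fatal_signal_pending() is false; only model_send_signal(&t, SIGKILL) makes the fatal check fire. That matches the point that the existing fatal_signal_pending() check does not react to the freezer.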

> Also in that case, I see no reason why we would not be able to adjust
> the fstrim_range to make it easier to re-start where we left off if
> we're going to return -ERESTARTSYS. Am I missing something?

Well, we could adjust fstrim_range.start and fstrim_range.len to make
it easier to restart --- but that's only if we know for sure that
we're going to be restarting the system call. So we would need to break
some abstraction barriers, since if the signal is one for which, based on
the sigaction flags, the system call is *not* restarted, then
fstrim_range.len is supposed to contain the number of bytes trimmed.

And even if the system call is restarted, there's no place to stash
the number of bytes trimmed so far, since fstrim_range.len is
overloaded. This is why the interface is so horrible...
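For contrast, a sketch of what separate input and output fields would buy, in the spirit of the "new FITRIM2 ioctl that has a sane interface" alternative raised earlier in the thread. struct trim_args is purely hypothetical (it is not an existing kernel ABI), the file system is again a toy model of fixed-size block groups, and model_trim2() and trim_until_done() are invented names. Because each call reports both the bytes trimmed and where it stopped, an interrupted call can be resumed without re-trimming what was already covered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block with separate IN and OUT fields. */
struct trim_args {
    uint64_t start;       /* IN: first byte of the range */
    uint64_t len;         /* IN: length of the range in bytes */
    uint64_t trimmed;     /* OUT: bytes discarded by this call */
    uint64_t next_start;  /* OUT: offset to resume from if interrupted */
};

/* Toy file system: ngroups groups of group_size bytes each. */
struct model_fs {
    const uint64_t *group_free_bytes;  /* free bytes in each group */
    size_t ngroups;
    uint64_t group_size;
};

/* End of [start, start + len), clamped so the addition cannot overflow. */
static uint64_t range_end(uint64_t start, uint64_t len)
{
    return len > UINT64_MAX - start ? UINT64_MAX : start + len;
}

/* Trim groups starting inside the range; -EAGAIN once *budget runs out. */
int model_trim2(const struct model_fs *fs, struct trim_args *a, int *budget)
{
    assert(fs->group_size > 0);

    uint64_t end = range_end(a->start, a->len);
    /* First group that starts at or after a->start. */
    uint64_t g = a->start / fs->group_size + (a->start % fs->group_size != 0);

    a->trimmed = 0;
    for (; g < fs->ngroups && g * fs->group_size < end; g++) {
        if ((*budget)-- <= 0) {           /* stand-in for "the task must freeze" */
            a->next_start = g * fs->group_size;
            return -EAGAIN;
        }
        a->trimmed += fs->group_free_bytes[g];
    }
    a->next_start = end;
    return 0;
}

/* Example caller: keep resuming from next_start until the range is done. */
uint64_t trim_until_done(const struct model_fs *fs, uint64_t start,
                         uint64_t len, int budget_per_call)
{
    assert(budget_per_call > 0);          /* otherwise no call makes progress */

    uint64_t end = range_end(start, len), total = 0;
    struct trim_args a = { .start = start, .len = len };

    for (;;) {
        int budget = budget_per_call;
        int ret = model_trim2(fs, &a, &budget);

        total += a.trimmed;
        if (ret == 0)
            return total;
        /* Interrupted: shrink the request to what is left and go again. */
        a.start = a.next_start;
        a.len = end - a.next_start;
    }
}
```

With five 1000-byte groups holding {10, 20, 30, 40, 50} free bytes, trim_until_done(&fs, 0, UINT64_MAX, 2) returns 150 after three resumed calls, and each group is discarded exactly once.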

> I have not had time to look deeply into the traces, but are you actually
> sure that we're not stuck in blkdev_issue_discard() instead?

I'm not 100% certain, but unless the block device has been put to
sleep first (in which case I think we would have noticed much sooner
since lots of other suspend-to-ram use cases would be failing --- in
writeback threads, for example), I'd be really surprised if we're
getting stuck there.

Even when we need to wait for the queue to be drained so there is
space to send the next discard, that shouldn't take 60+ seconds.

- Ted

2022-08-04 15:00:49

by bugzilla-daemon

Subject: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310

https://bugzilla.kernel.org/show_bug.cgi?id=216322

--- Comment #5 from Theodore Tso ([email protected]) ---
On Thu, Aug 04, 2022 at 11:47:47AM +0000, [email protected] wrote:
>
> I agree that the FITRIM interface is flawed in this way. But
> ext4_try_to_trim_range() actually does have fatal_signal_pending() and
> will return -ERESTARTSYS if that's true. Or did you have something else in
> mind?

fatal_signal_pending() only checks for SIGKILL. I'm not sure why
it returns ERESTARTSYS, since that's not applicable for a kill -9
signal. The fake_signal_wake_up() function in kernel/freezer.c
doesn't send a fatal signal, so the fatal_signal_pending() check isn't
going to help here.

> Also in that case, I see no reason why we would not be able to adjust
> the fstrim_range to make it easier to re-start where we left off if
> we're going to return -ERESTARTSYS. Am I missing something?

Well, we could adjust fstrim_range.start and fstrim_range.len to make
it easier to restart --- but that's only if we know for sure that
we're going to be restarting the system call. So we would need to break
some abstraction barriers, since if the signal is one for which, based on
the sigaction flags, the system call is *not* restarted, then
fstrim_range.len is supposed to contain the number of bytes trimmed.

And even if the system call is restarted, there's no place to stash
the number of bytes trimmed so far, since fstrim_range.len is
overloaded. This is why the interface is so horrible...

> I have not had time to look deeply into the traces, but are you actually
> sure that we're not stuck in blkdev_issue_discard() instead?

I'm not 100% certain, but unless the block device has been put to
sleep first (in which case I think we would have noticed much sooner
since lots of other suspend-to-ram use cases would be failing --- in
writeback threads, for example), I'd be really surprised if we're
getting stuck there.

Even when we need to wait for the queue to be drained so there is
space to send the next discard, that shouldn't take 60+ seconds.

- Ted
