Hi,
I have encountered the following situation several times, but I've been
unable to come up with a way to reproduce this until now:
- some process is keeping the disk busy (some cron job for example:
updatedb, chkrootkit, ...)
- other processes that want to do I/O have to wait (this is normal)
- I have a (I/O bound) process running in my terminal, and I want to
interrupt it with Ctrl+C
- I type Ctrl+C several times, and the process is not interrupted for
several seconds (10-30 secs)
- if I type Ctrl+Z, and use kill %1 the process dies faster than
waiting for it to react to Ctrl+C
This issue occurs both on my x86-64 machine that uses reiserfs, and on
my x86 machine that uses XFS, so it doesn't seem related to the
underlying FS.
I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this behaviour
with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
Is this intended behaviour, or should I report a bug?
If it should be considered a bug, I will try several kernels to see if
there is a particular kernel version that introduced this behaviour.
To reproduce this issue here is a testcase:
Step 1: Run this shell script in a terminal in X (gnome-terminal,
konsole, ...):
#!/bin/sh
# choose a size that will keep the disks busy for about half a minute or more
dd if=/dev/zero of=xt bs=100M count=4&
dd if=/dev/zero of=yt bs=100M count=4&
rm xt
rm yt
wait %1 %2
Step 2: Run latencytop
Step 3: In another terminal run an I/O bound process and try to
interrupt it with Ctrl+C, see how fast it responds, for example:
edwin@thunder:~/tst$ find / >/dev/null
find: `/boot/lost+found': Permission denied
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
edwin@thunder:~/tst$
Step 4: Repeat step 3 until you see behaviour like the above. It doesn't
always happen, so you may need 20-30 attempts; when it does happen, you
have to press Ctrl+C several times before the process reacts.
Here is the output of /proc/latency_stats when this situation occurred;
notice that even select has a high latency:
Latency Top version : v0.1
555 202326 4994 do_sys_poll sys_poll sysenter_past_esp
430 515897 4994 do_select core_sys_select sys_select sysenter_past_esp
12 1140 1013 tty_wait_until_sent set_termios tty_mode_ioctl n_tty_ioctl
tty_ioctl vfs_ioctl do_vfs_ioctl sys_ioctl sysenter_past_esp
434 60405 4522 do_select core_sys_select sys_select sysenter_past_esp
21 244388 54486 blk_execute_rq scsi_execute scsi_execute_req
sr_test_unit_ready sr_media_change media_changed cdrom_media_changed
sr_block_media_changed check_disk_change cdrom_open sr_block_open do_open
21 44988 7252 blk_execute_rq scsi_execute scsi_execute_req
sr_test_unit_ready sr_drive_status cdrom_ioctl sr_block_ioctl
blkdev_driver_ioctl blkdev_ioctl block_ioctl vfs_ioctl do_vfs_ioctl
7 7052 1185 blk_execute_rq scsi_execute sr_do_ioctl sr_packet
cdrom_get_media_event sr_drive_status cdrom_ioctl sr_block_ioctl
blkdev_driver_ioctl blkdev_ioctl block_ioctl vfs_ioctl
7 4399 735 blk_execute_rq scsi_execute scsi_execute_req
ioctl_internal_command scsi_set_medium_removal sr_lock_door
cdrom_release sr_block_release __blkdev_put blkdev_put blkdev_close __fput
32 5646 3269 sys_epoll_wait sysenter_past_esp
14 14108 3260 futex_wait do_futex sys_futex sysenter_past_esp
4 5257 4095 futex_wait do_futex sys_futex sysenter_past_esp
7 2712 1396 pipe_wait pipe_read do_sync_read vfs_read sys_read
sysenter_past_esp
2 0 0 pipe_wait pipe_read do_sync_read vfs_read sys_read sysenter_past_esp
208 1996603 953067 xfs_ilock xfs_iomap __xfs_get_blocks xfs_get_blocks
__block_prepare_write block_write_begin xfs_vm_write_begin
generic_file_buffered_write xfs_write xfs_file_aio_write do_sync_write
vfs_write
21 10264428 915514 get_request_wait __make_request generic_make_request
submit_bio xfs_submit_ioend_bio xfs_submit_ioend xfs_page_state_convert
xfs_vm_writepage __writepage write_cache_pages generic_writepages
xfs_vm_writepages
26 3369263 2260529 down xfs_buf_iowait xfs_buf_iostart
xfs_buf_read_flags xfs_trans_read_buf xfs_imap_to_bp xfs_itobp xfs_iread
xfs_iget_core xfs_iget xfs_lookup xfs_vn_lookup
1 17888 17888 down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags
xfs_trans_read_buf xfs_da_do_buf xfs_da_read_buf xfs_dir2_block_getdents
xfs_readdir xfs_file_readdir vfs_readdir sys_getdents64
2 8226 7641 down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags
xfs_trans_read_buf xfs_da_do_buf xfs_da_read_buf xfs_dir2_leaf_getdents
xfs_readdir xfs_file_readdir vfs_readdir sys_getdents64
2 16090 10971 down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags
xfs_trans_read_buf xfs_da_do_buf xfs_da_read_buf
xfs_dir2_leaf_lookup_int xfs_dir2_leaf_lookup xfs_dir_lookup xfs_lookup
xfs_vn_lookup
1 2149 2149 down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags
xfs_buf_read_flags xfs_trans_read_buf xfs_da_do_buf xfs_da_read_buf
xfs_dir2_leaf_getdents xfs_readdir xfs_file_readdir vfs_readdir
2 0 0 unix_stream_recvmsg sock_recvmsg sys_recvfrom sys_recv
sys_socketcall sysenter_past_esp
43 1340361 1336178 xfs_ilock xfs_iomap xfs_map_blocks
xfs_page_state_convert xfs_vm_writepage __writepage write_cache_pages
generic_writepages xfs_vm_writepages do_writepages
__writeback_single_inode sync_sb_inodes
1 6 6 xfs_ilock xfs_iomap_write_allocate xfs_iomap xfs_map_blocks
xfs_page_state_convert xfs_vm_writepage __writepage write_cache_pages
generic_writepages xfs_vm_writepages do_writepages __writeback_single_inode
64 1052948 21773 congestion_wait __alloc_pages_internal __alloc_pages
__grab_cache_page block_write_begin xfs_vm_write_begin
generic_file_buffered_write xfs_write xfs_file_aio_write do_sync_write
vfs_write sys_write
3 5264928 3610252 sync_page wait_on_page_bit
wait_on_page_writeback_range filemap_fdatawait xfs_fsync xfs_file_fsync
do_fsync __do_fsync sys_fsync sysenter_past_esp
1 284000 284000 get_request_wait __make_request generic_make_request
submit_bio _xfs_buf_ioapply xfs_buf_iorequest xlog_bdstrat_cb xlog_sync
xlog_state_release_iclog xlog_state_sync _xfs_log_force _xfs_trans_commit
3 1724786 1319135 xlog_state_sync _xfs_log_force _xfs_trans_commit
xfs_fsync xfs_file_fsync do_fsync __do_fsync sys_fsync sysenter_past_esp
3 2851878 1642865 down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags
xfs_buf_read_flags xfs_trans_read_buf xfs_alloc_read_agf
xfs_alloc_fix_freelist xfs_alloc_vextent xfs_bmap_btalloc xfs_bmap_alloc
xfs_bmapi
17 43 23 xfs_ilock xfs_iomap_write_allocate xfs_iomap xfs_map_blocks
xfs_page_state_convert xfs_vm_writepage shrink_page_list
shrink_inactive_list shrink_zone try_to_free_pages
__alloc_pages_internal __alloc_pages
1 45 45 down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags
xfs_buf_read_flags xfs_trans_read_buf xfs_btree_read_bufs
xfs_alloc_lookup xfs_alloc_lookup_eq xfs_alloc_fixup_trees
xfs_alloc_ag_vextent_size xfs_alloc_ag_vextent
1 539 539 congestion_wait __alloc_pages_internal __alloc_pages
handle_mm_fault do_page_fault error_code
2 26759 21339 congestion_wait __alloc_pages_internal __alloc_pages
__get_free_pages proc_file_read proc_reg_read vfs_read sys_read
sysenter_past_esp
6 96540 19159 congestion_wait __alloc_pages_internal __alloc_pages
__get_free_pages __pollwait unix_poll sock_poll do_select
core_sys_select sys_select sysenter_past_esp
2 53244 47209 mempool_alloc bio_alloc_bioset bio_alloc
xfs_alloc_ioend_bio xfs_submit_ioend xfs_page_state_convert
xfs_vm_writepage __writepage write_cache_pages generic_writepages
xfs_vm_writepages do_writepages
1 425375 425375 sync_page __lock_page find_lock_page filemap_fault
__do_fault handle_mm_fault do_page_fault error_code
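To compare entries in a dump like this one, the leading numbers of each record can be pulled out programmatically. The following is only a minimal sketch in C, assuming the three leading fields are the hit count, the accumulated latency and the maximum latency (both in microseconds), followed by the backtrace. `struct latency_rec` and `parse_latency_line` are names made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type: one entry of /proc/latency_stats. */
struct latency_rec {
    unsigned long count;     /* number of times this backtrace was hit */
    unsigned long total_us;  /* assumed: accumulated latency, microseconds */
    unsigned long max_us;    /* assumed: worst single latency, microseconds */
    char backtrace[512];     /* space-separated function names */
};

/* Parse one record line. Returns 0 on success, -1 if the line does not
 * start with three numbers (for example the "Latency Top version" header). */
static int parse_latency_line(const char *line, struct latency_rec *rec)
{
    int consumed = 0;

    if (sscanf(line, "%lu %lu %lu %n", &rec->count, &rec->total_us,
               &rec->max_us, &consumed) < 3)
        return -1;

    /* Copy the rest of the line and drop a trailing newline, if any. */
    strncpy(rec->backtrace, line + consumed, sizeof(rec->backtrace) - 1);
    rec->backtrace[sizeof(rec->backtrace) - 1] = '\0';
    rec->backtrace[strcspn(rec->backtrace, "\n")] = '\0';
    return 0;
}
```

Under the same assumption, `rec.total_us / rec.count` gives the average wait per hit. Continuation lines that the mail client wrapped start with a function name, so the parser rejects them; it is meant for the unwrapped file.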
Best regards,
--Edwin
Török Edwin wrote:
> Hi,
>
> I have encountered the following situation several times, but I've been
> unable to come up with a way to reproduce this until now:
> - some process is keeping the disk busy (some cron job for example:
> updatedb, chkrootkit, ...)
> - other processes that want to do I/O have to wait (this is normal)
> - I have a (I/O bound) process running in my terminal, and I want to
> interrupt it with Ctrl+C
> - I type Ctrl+C several times, and the process is not interrupted for
> several seconds (10-30 secs)
> - if I type Ctrl+Z, and use kill %1 the process dies faster than
> waiting for it to react to Ctrl+C
>
> This issue occurs both on my x86-64 machine that uses reiserfs, and on
> my x86 machine that uses XFS, so it doesn't seem related to the
> underlying FS.
> I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this behaviour
> with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
>
> Is this intended behaviour, or should I report a bug?
>
Yes, it's intended behaviour. Filesystem IO syscalls are considered
"fast" and are interruptible. Usermode code can reasonably expect that
file IO will never return EINTR.
That said, if a program is blocking for tens of seconds in block IO,
then that could be a problem in itself.
J
Jeremy Fitzhardinge wrote:
> Yes, it's intended behaviour. Filesystem IO syscalls are considered
> "fast" and are interruptible.
Er, *un*interruptible.
J
Jeremy Fitzhardinge wrote:
> Török Edwin wrote:
>> Hi,
>>
>> I have encountered the following situation several times, but I've been
>> unable to come up with a way to reproduce this until now:
>> - some process is keeping the disk busy (some cron job for example:
>> updatedb, chkrootkit, ...)
>> - other processes that want to do I/O have to wait (this is normal)
>> - I have a (I/O bound) process running in my terminal, and I want to
>> interrupt it with Ctrl+C
>> - I type Ctrl+C several times, and the process is not interrupted for
>> several seconds (10-30 secs)
>> - if I type Ctrl+Z, and use kill %1 the process dies faster than
>> waiting for it to react to Ctrl+C
>>
>> This issue occurs both on my x86-64 machine that uses reiserfs, and on
>> my x86 machine that uses XFS, so it doesn't seem related to the
>> underlying FS.
>> I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this behaviour
>> with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
>>
>> Is this intended behaviour, or should I report a bug?
>>
>
> Yes, it's intended behaviour. Filesystem IO syscalls are considered
> "fast" and are interruptible. Usermode code can reasonably expect
> that file IO will never return EINTR.
That's filesystem dependent; if you mount an nfs filesystem with the
'intr' mount option, it will be interruptible (which makes sense, as it
is impossible to guarantee the server's responsiveness).
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Avi Kivity wrote:
>>
>> Yes, it's intended behaviour. Filesystem IO syscalls are considered
>> "fast" and are interruptible. Usermode code can reasonably expect
>> that file IO will never return EINTR.
>
> That's filesystem dependent; if you mount an nfs filesystem with the
> 'intr' mount option, it will be interruptible (which makes sense, as
> it is impossible to guarantee the server's responsiveness).
'intr' is a pretty bad idea, and I would never recommend it ('soft' is
better). It's an excellent way to destroy data when a stray signal
causes a syscall to fail with EINTR in an unexpected way (write being
the obvious one, but link, unlink, truncate or even close can fail in
odd ways and cause havoc).
I don't know of any other filesystem with a similarly bad option.
J
Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
>>>
>>> Yes, it's intended behaviour. Filesystem IO syscalls are considered
>>> "fast" and are interruptible. Usermode code can reasonably expect
>>> that file IO will never return EINTR.
>>
>> That's filesystem dependent; if you mount an nfs filesystem with the
>> 'intr' mount option, it will be interruptible (which makes sense, as
>> it is impossible to guarantee the server's responsiveness).
>
> 'intr' is a pretty bad idea, and I would never recommend it ('soft' is
> better). It's an excellent way to destroy data when a stray signal
> causes a syscall to fail with EINTR in an unexpected way (write being
> the obvious one, but link, unlink, truncate or even close can fail in
> odd ways and cause havoc).
>
Applications should not assume that write() (or other syscalls) can't
return EINTR. Not all filesystems have a bounded-time backing store.
'soft' has its own problems; namely false positives when someone steps
on the network cable, temporarily blocking packet flow, or when using a
clustered server which may take some time to recover from a fault.
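A defensive write loop along those lines might look like the sketch below. It is an illustration only: `write_all` is a hypothetical helper name, not something from this thread or from libc, and it assumes the caller wants the whole buffer written or an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying when a signal
 * interrupts the call or when the kernel accepts only part of the data.
 * Returns 0 on success, -1 with errno set on a real error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            return -1;      /* genuine failure: EIO, ENOSPC, ... */
        }
        p += n;             /* short write: advance and try the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that uses this never sees EINTR or a short count from write(), whichever filesystem sits underneath.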
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Avi Kivity wrote:
> Applications should not assume that write() (or other syscalls) can't
> return EINTR. Not all filesystems have a bounded-time backing store.
The distinction between 'fast' (filesystem) and 'slow' (terminals and
pipes) blocking syscalls goes back to the earliest days of Unix, and is
part of the ABI. Most filesystem syscalls are not documented to ever
return EINTR.
> 'soft' has its own problems; namely false positives when someone steps
> on the network cable, temporarily blocking packet flow, or when using
> a clustered server which may take some time to recover from a fault.
Sure. It's the basic problem of trying to make network access
transparent by hiding the failure modes. You either need to put up with
spurious timeouts caused by transient failures, or unbounded blocking on
real failures.
Regardless, NFS is the exception here, and making normal block-backed
filesystems start throwing EINTRs around would be a huge behavioural change.
J
Jeremy Fitzhardinge wrote:
> Török Edwin wrote:
>> Hi,
>>
>> I have encountered the following situation several times, but I've been
>> unable to come up with a way to reproduce this until now:
>> - some process is keeping the disk busy (some cron job for example:
>> updatedb, chkrootkit, ...)
>> - other processes that want to do I/O have to wait (this is normal)
>> - I have a (I/O bound) process running in my terminal, and I want to
>> interrupt it with Ctrl+C
>> - I type Ctrl+C several times, and the process is not interrupted for
>> several seconds (10-30 secs)
>> - if I type Ctrl+Z, and use kill %1 the process dies faster than
>> waiting for it to react to Ctrl+C
>>
>> This issue occurs both on my x86-64 machine that uses reiserfs, and on
>> my x86 machine that uses XFS, so it doesn't seem related to the
>> underlying FS.
>> I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this behaviour
>> with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
>>
>> Is this intended behaviour, or should I report a bug?
>>
>
> Yes, it's intended behaviour. Filesystem IO syscalls are considered
> "fast" and are interruptible. Usermode code can reasonably expect
> that file IO will never return EINTR.
Ok.
>
> That said, if a program is blocking for tens of seconds in block IO,
> then that could be a problem in itself.
In that case I don't think that a program doing heavy I/O (writeout of
100Mb+) should be able to block other processes waiting for I/O on the
same device for tens of seconds.
I am using CFQ as I/O scheduler now, I will try the other I/O schedulers
(especially deadline) and see if I get better behaviour.
Is there any documentation on the tunable values for CFQ? (in
Documentation/block there is only about anticipatory and deadline).
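For reference while comparing schedulers, the active one for a device can be read from sysfs, where the current choice is shown in square brackets (for example "noop anticipatory deadline [cfq]"). The sketch below is a minimal C illustration; the function names are made up for this example, and "sda" in any call is only a placeholder device name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the bracketed (active) scheduler name from the contents of a
 * queue/scheduler file, e.g. "noop anticipatory deadline [cfq]".
 * Returns 0 on success, -1 if no bracketed name is found or it does not fit. */
static int active_scheduler(const char *text, char *out, size_t outlen)
{
    const char *lb = strchr(text, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t len;

    if (!lb || !rb)
        return -1;
    len = (size_t)(rb - lb - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the active scheduler for a block device such as "sda". */
static int read_active_scheduler(const char *dev, char *out, size_t outlen)
{
    char path[128], line[256];
    FILE *f;
    int ret = -1;

    snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(line, sizeof(line), f))
        ret = active_scheduler(line, out, outlen);
    fclose(f);
    return ret;
}
```

Writing a scheduler name into the same file (as root) switches the active scheduler for that device.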
Best regards,
--Edwin
Jeremy Fitzhardinge wrote:
> Török Edwin wrote:
>> ...
>> - I have a (I/O bound) process running in my terminal, and I want to
>> interrupt it with Ctrl+C
>> - I type Ctrl+C several times, and the process is not interrupted for
>> several seconds (10-30 secs)
>> - if I type Ctrl+Z, and use kill %1 the process dies faster than
>> waiting for it to react to Ctrl+C
>
> Yes, it's intended behaviour. Filesystem IO syscalls are considered
> "fast" and are interruptible. Usermode code can reasonably expect
> that file IO will never return EINTR.
This does not address the symptom that the process can be killed quicker
by sending a SIGTERM. I've noticed the problem, too (2.6.25.) I wonder
if it isn't some strangeness in the tty layer (hence the interrupt key
is slower than an explicitly sent signal.)
Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
>> Applications should not assume that write() (or other syscalls) can't
>> return EINTR. Not all filesystems have a bounded-time backing store.
>
> The distinction between 'fast' (filesystem) and 'slow' (terminals and
> pipes) blocking syscalls goes back to the earliest days of Unix, and
> is part of the ABI. Most filesystem syscalls are not documented to
> ever return EINTR.
POSIX documents EINTR for write(), and the manpage on my Linux distro
says the same.
However I don't think introducing EINTR would be beneficial (it will
likely cause applications that don't expect it to break).
>
>> 'soft' has its own problems; namely false positives when someone
>> steps on the network cable, temporarily blocking packet flow, or when
>> using a clustered server which may take some time to recover from a
>> fault.
>
> Sure. It's the basic problem of trying to make network access
> transparent by hiding the failure modes. You either need to put up
> with spurious timeouts caused by transient failures, or unbounded
> blocking on real failures.
>
> Regardless, NFS is the exception here, and making normal block-backed
> filesystems start throwing EINTRs around would be a huge behavioural
> change.
Agreed.
Best regards,
--Edwin
Jeremy Fitzhardinge <[email protected]> writes:
>> I have encountered the following situation several times, but I've been
>> unable to come up with a way to reproduce this until now:
>> - some process is keeping the disk busy (some cron job for example:
>> updatedb, chkrootkit, ...)
>> - other processes that want to do I/O have to wait (this is normal)
>> - I have a (I/O bound) process running in my terminal, and I want to
>> interrupt it with Ctrl+C
>> - I type Ctrl+C several times, and the process is not interrupted for
>> several seconds (10-30 secs)
>> - if I type Ctrl+Z, and use kill %1 the process dies faster than
>> waiting for it to react to Ctrl+C
>>
>> This issue occurs both on my x86-64 machine that uses reiserfs, and on
>> my x86 machine that uses XFS, so it doesn't seem related to the
>> underlying FS.
>> I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this behaviour
>> with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
>>
>> Is this intended behaviour, or should I report a bug?
>>
>
> Yes, it's intended behaviour. Filesystem IO syscalls are considered
> "fast" and are interruptible. Usermode code can reasonably expect
> that file IO will never return EINTR.
>
> That said, if a program is blocking for tens of seconds in block IO,
> then that could be a problem in itself.
Still there's the effect that Ctrl-Z+kill works faster than Ctrl-C
that is not explained by this. This has often annoyed me too.
I'm not sure why it is. In theory they should be the same unless
someone blocks SIGINT.
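The "someone blocks SIGINT" possibility can at least be checked from inside a program. The following is a minimal sketch, assuming a POSIX system; `sigint_blocked` and `sigint_ignored` are hypothetical helper names for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if SIGINT is blocked in the calling thread's signal mask,
 * 0 if not, -1 on error. */
static int sigint_blocked(void)
{
    sigset_t cur;

    /* With a NULL new set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, SIGINT);
}

/* Returns 1 if the disposition of SIGINT is SIG_IGN, 0 if not, -1 on error. */
static int sigint_ignored(void)
{
    struct sigaction sa;

    if (sigaction(SIGINT, NULL, &sa) != 0)
        return -1;
    return sa.sa_handler == SIG_IGN;
}
```

For a process that is already running, the SigBlk and SigIgn lines in /proc/<pid>/status carry the same information as bit masks.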
-Andi
> Applications should not assume that write() (or other syscalls) can't
> return EINTR. Not all filesystems have a bounded-time backing store.
Unix tradition (and thus almost all applications) believes file store
writes to be non signal interruptible. It would not be safe or practical
to change that guarantee.
Alan
Andi Kleen wrote:
> Still there's the effect that Ctrl-Z+kill works faster than Ctrl-C
> that is not explained by this. This has often annoyed me too.
> I'm not sure why it is. In theory they should be the same unless
> someone blocks SIGINT.
>
I'd never noticed that. That's just weird.
J
Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
>> Applications should not assume that write() (or other syscalls) can't
>> return EINTR. Not all filesystems have a bounded-time backing store.
>
> The distinction between 'fast' (filesystem) and 'slow' (terminals and
> pipes) blocking syscalls goes back to the earliest days of Unix, and is
> part of the ABI. Most filesystem syscalls are not documented to ever
> return EINTR.
>
>> 'soft' has its own problems; namely false positives when someone steps
>> on the network cable, temporarily blocking packet flow, or when using
>> a clustered server which may take some time to recover from a fault.
>
> Sure. It's the basic problem of trying to make network access
> transparent by hiding the failure modes. You either need to put up with
> spurious timeouts caused by transient failures, or unbounded blocking on
> real failures.
>
The basic problem is that you can get a process which you can't interrupt
(and in most cases can't kill) which has resources tied up. Given the
choice between surprising a process with an EINTR or killing it during a
reboot to get the system usable again, I would rather surprise.
The current situation is infrequent but not unheard of. And the causes
are not all rooted in NFS: I used to see this 4-5 times a year when I
was running nntp clusters with heavily threaded applications. Every once
in a while some thread would hang in a waiting-for-i/o state and could
not be killed or fixed. I can't see that an application error would result
in a thread being left waiting for i/o and uninterruptible; that's a kernel
state.
--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
>> Still there's the effect that Ctrl-Z+kill works faster than Ctrl-C
>> that is not explained by this. This has often annoyed me too.
>> I'm not sure why it is. In theory they should be the same unless
>> someone blocks SIGINT.
>>
>
> I'd never noticed that. That's just weird.
>
I occasionally see this - although I rarely run loads so heavy that it
is a real problem. Ctrl-C - nothing happens except maybe a ^C printed -
kill it from another rxvt.
Could it be some sort of tty locking issue, holding up Ctrl-C processing
while the heavily loaded machine suffers lock contention?
The last time I saw this was with an erroneous script that called itself
without exec. With 2G of memory and 3G of swap in use, the system was slow.
The mouse cursor moved only now and then. Very little happened
with Ctrl-C. Closing the rxvt running this script then caused a lot of
disk activity and the system slowly came back to normal.
Helge Hafting
On Sat, Jun 28, 2008 at 10:13:54PM -0700, Jeremy Fitzhardinge wrote:
> Avi Kivity wrote:
>>>
>>> Yes, it's intended behaviour. Filesystem IO syscalls are considered
>>> "fast" and are interruptible. Usermode code can reasonably expect
>>> that file IO will never return EINTR.
>>
>> That's filesystem dependent; if you mount an nfs filesystem with the
>> 'intr' mount option, it will be interruptible (which makes sense, as
>> it is impossible to guarantee the server's responsiveness).
>
> 'intr' is a pretty bad idea, and I would never recommend it ('soft' is
> better).
Yipes.
> It's an excellent way to destroy data when a stray signal
> causes a syscall to fail with EINTR in an unexpected way (write being
> the obvious one, but link, unlink, truncate or even close can fail in
> odd ways and cause havoc).
And with "soft" all that can happen without the need for the stray
signal....
I suppose the relative likelihoods of hitting the problem under "soft"
and "intr" may vary depending on the details of your setup. But in
general I'd've thought it'd be easier to control stray signals than,
say, stray network problems.
--b.
> I don't know of any other filesystem with a similarly bad option.
Török Edwin <[email protected]> wrote:
> Hi,
>
> I have encountered the following situation several times, but I've been
> unable to come up with a way to reproduce this until now:
> - some process is keeping the disk busy (some cron job for example:
> updatedb, chkrootkit, ...)
> - other processes that want to do I/O have to wait (this is normal)
> - I have a (I/O bound) process running in my terminal, and I want to
> interrupt it with Ctrl+C
> - I type Ctrl+C several times, and the process is not interrupted for
> several seconds (10-30 secs)
> - if I type Ctrl+Z, and use kill %1 the process dies faster than
> waiting for it to react to Ctrl+C
The following patch to 2.6.26-rc8 fixes the issue for me. Perhaps we
really want to do something else, but since I'm not all that familiar
with the standard behaviour on other Unices and since the comment
describing the changed order of function calls in the original commit
didn't give the reason for that change, I leave that to more
knowledgeable people.
Regards,
Elias
--------
From: Elias Oltmanns <[email protected]>
Subject: Make sure that interrupt characters get through reliably
Since commit ec5b1157f8e819c72fc93aa6d2d5117c08cdc961, users have been
unable to interrupt interactive processes reliably by pressing CTRL+C.
This patch reverts the original commit except for the most important
part: actually echoing ^C is preserved.
Signed-off-by: Elias Oltmanns <[email protected]>
---
drivers/char/n_tty.c | 13 +------------
1 files changed, 1 insertions(+), 12 deletions(-)
diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 8096389..74018ef 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -759,20 +759,9 @@ static inline void n_tty_receive_char(struct tty_struct *tty, unsigned char c)
signal = SIGTSTP;
if (c == SUSP_CHAR(tty)) {
send_signal:
- /*
- * Echo character, and then send the signal.
- * Note that we do not use isig() here because we want
- * the order to be:
- * 1) flush, 2) echo, 3) signal
- */
- if (!L_NOFLSH(tty)) {
- n_tty_flush_buffer(tty);
- tty_driver_flush_buffer(tty);
- }
if (L_ECHO(tty))
echo_char(c, tty);
- if (tty->pgrp)
- kill_pgrp(tty->pgrp, signal, 1);
+ isig(signal, tty, 0);
return;
}
}
Elias Oltmanns <[email protected]> wrote:
[...]
> The following patch to 2.6.26-rc8 fixes the issue for me.
Sorry, resending without MIME encoding the message.
Regards,
Elias
--------
From: Elias Oltmanns <[email protected]>
Subject: Make sure that interrupt characters get through reliably
Since commit ec5b1157f8e819c72fc93aa6d2d5117c08cdc961, users have been
unable to interrupt interactive processes reliably by pressing CTRL+C.
This patch reverts the original commit except for the most important
part: actually echoing ^C is preserved.
Signed-off-by: Elias Oltmanns <[email protected]>
---
drivers/char/n_tty.c | 13 +------------
1 files changed, 1 insertions(+), 12 deletions(-)
diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 8096389..74018ef 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -759,20 +759,9 @@ static inline void n_tty_receive_char(struct tty_struct *tty, unsigned char c)
signal = SIGTSTP;
if (c == SUSP_CHAR(tty)) {
send_signal:
- /*
- * Echo character, and then send the signal.
- * Note that we do not use isig() here because we want
- * the order to be:
- * 1) flush, 2) echo, 3) signal
- */
- if (!L_NOFLSH(tty)) {
- n_tty_flush_buffer(tty);
- tty_driver_flush_buffer(tty);
- }
if (L_ECHO(tty))
echo_char(c, tty);
- if (tty->pgrp)
- kill_pgrp(tty->pgrp, signal, 1);
+ isig(signal, tty, 0);
return;
}
}
Elias Oltmanns wrote:
> Elias Oltmanns <[email protected]> wrote:
> [...]
>
>> The following patch to 2.6.26-rc8 fixes the issue for me.
>>
>
> Sorry, resending without MIME encoding the message.
>
> Regards,
>
> Elias
>
>
> --------
> From: Elias Oltmanns <[email protected]>
> Subject: Make sure that interrupt characters get through reliably
>
> Since commit ec5b1157f8e819c72fc93aa6d2d5117c08cdc961, users have been
> unable to interrupt interactive processes reliably by pressing CTRL+C.
> This patch reverts the original commit except for the most important
> part: actually echoing ^C is preserved.
>
Thanks for the patch, the process seems to respond faster to Ctrl-C,
but I'll have to find a way to measure that reliably.
However ^C is not echoed anymore for me.
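One possible way to get numbers for the explicit-signal side of the comparison is sketched below: start the workload as a child, send it a signal with kill(), and time how long waitpid() takes to report it gone. This is a rough sketch under stated assumptions, not a finished benchmark. `spawn` and `time_to_die` are names invented for this illustration, and the measurement covers only the kill() path, so it says nothing about what happens inside the tty layer when Ctrl+C is typed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Send sig to an already running child and return the seconds that pass
 * until waitpid() reports it gone; -1.0 on error. */
static double time_to_die(pid_t child, int sig, int *status)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (kill(child, sig) != 0)
        return -1.0;
    if (waitpid(child, status, 0) != child)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Start the workload (a NULL-terminated argument vector) as a child
 * process; returns its pid, or -1 if fork() failed. */
static pid_t spawn(char *const cmd[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execvp(cmd[0], cmd);
        _exit(127);     /* exec failed */
    }
    return pid;
}
```

Running this repeatedly against something like `find /` while the dd load from the test case is active would give a baseline to compare against the time it takes Ctrl+C to act.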
Best regards,
--Edwin
Elias Oltmanns wrote:
> - if (!L_NOFLSH(tty)) {
> - n_tty_flush_buffer(tty);
> - tty_driver_flush_buffer(tty);
> - }
> if (L_ECHO(tty))
> echo_char(c, tty);
> - if (tty->pgrp)
> - kill_pgrp(tty->pgrp, signal, 1);
> + isig(signal, tty, 0);
My first reaction is that tty->pgrp must be null. Perhaps the patch
could be simplified...
if (tty->pgrp)
kill_pgrp(tty->pgrp, signal, 1);
+ else
+ isig(signal, tty, 0);
Thoughts?
David Newall wrote:
> Elias Oltmanns wrote:
>
>> - if (!L_NOFLSH(tty)) {
>> - n_tty_flush_buffer(tty);
>> - tty_driver_flush_buffer(tty);
>> - }
>> if (L_ECHO(tty))
>> echo_char(c, tty);
>> - if (tty->pgrp)
>> - kill_pgrp(tty->pgrp, signal, 1);
>> + isig(signal, tty, 0);
>>
>
> My first reaction is that tty->pgrp must be null. Perhaps the patch
> could be simplified...
>
> if (tty->pgrp)
> kill_pgrp(tty->pgrp, signal, 1);
> + else
> + isig(signal, tty, 0);
>
>
> Thoughts?
>
isig has the same check, if it is NULL, isig won't deliver the signal
either:
if (tty->pgrp)
kill_pgrp(tty->pgrp, sig, 1);
--Edwin
Török Edwin wrote:
> David Newall wrote:
>
>> Elias Oltmanns wrote:
>>
>>
>>> - if (!L_NOFLSH(tty)) {
>>> - n_tty_flush_buffer(tty);
>>> - tty_driver_flush_buffer(tty);
>>> - }
>>> if (L_ECHO(tty))
>>> echo_char(c, tty);
>>> - if (tty->pgrp)
>>> - kill_pgrp(tty->pgrp, signal, 1);
>>> + isig(signal, tty, 0);
>>>
>>>
>> My first reaction is that tty->pgrp must be null. Perhaps the patch
>> could be simplified...
>>
>> if (tty->pgrp)
>> kill_pgrp(tty->pgrp, signal, 1);
>> + else
>> + isig(signal, tty, 0);
>>
>>
>> Thoughts?
>>
>>
>
> isig has the same check, if it is NULL, isig won't deliver the signal
> either
>
That is odd. We did see the control-key echoed, so, other than not
flushing output, what's functionally different?
Török Edwin <[email protected]> wrote:
> Elias Oltmanns wrote:
[...]
>> Since commit ec5b1157f8e819c72fc93aa6d2d5117c08cdc961, users have been
>> unable to interrupt interactive processes reliably by pressing CTRL+C.
>> This patch reverts the original commit except for the most important
>> part: actually echoing ^C is preserved.
>>
>
> Thanks for the patch, the process seems to respond faster to Ctrl-C,
> but I'll have to find a way to measure that reliably.
> However ^C is not echoed anymore for me.
Very odd, it most definitely is echoed here. Are you quite sure you
haven't inadvertently changed anything else in the meantime?
Regards,
Elias
Török Edwin wrote:
> Thanks for the patch, the process seems to respond faster to Ctrl-C,
> but I'll have to find a way to measure that reliably.
> However ^C is not echoed anymore for me.
I found the same thing when I originally did the ^C echo patch. If isig() was
used instead of the order specified (flush, echo, signal), the ^C did not echo
reliably (i.e., it echoed on a tty console, but not in an xterm). isig() does
the kill, then the flush.
Note that ^Z uses the same logic, so the fact that you are seeing this take
effect more quickly is interesting.
I will try a few things today, but please experiment with various orderings of
the calls and let me know what you find (and test the ^C echo in both tty
console and xterm).
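When comparing the tty console and xterm cases, it may help to confirm that both terminals have the same local-mode settings, since L_NOFLSH(tty) and L_ECHO(tty) in n_tty.c test the termios NOFLSH and ECHO flags. The userspace sketch below prints them for a given file descriptor; `show_signal_flags` is a hypothetical name used only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Print the local-mode flags that matter for interrupt characters.
 * Returns 0 on success, -1 if the settings cannot be read (for example
 * because fd is not a terminal). */
static int show_signal_flags(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    printf("ISIG   %s\n", (t.c_lflag & ISIG)   ? "on" : "off");
    printf("ECHO   %s\n", (t.c_lflag & ECHO)   ? "on" : "off");
    printf("NOFLSH %s\n", (t.c_lflag & NOFLSH) ? "on" : "off");
    printf("VINTR  0x%02x\n", (unsigned)t.c_cc[VINTR]);
    return 0;
}
```

Calling `show_signal_flags(STDIN_FILENO)` in each terminal shows whether signal generation, echo and flush-on-signal are configured the same way in both.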
-Joe
Elias Oltmanns wrote:
> The following patch to 2.6.26-rc8 fixes the issue for me. Perhaps we
> really want to do something else, but since I'm not all that familiar
> with the standard behaviour on other Unices and since the comment
> describing the changed order of function calls in the original commit
> didn't give the reason for that change, I leave that to more
> knowledgeable people.
>
> drivers/char/n_tty.c | 13 +------------
> 1 files changed, 1 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
> index 8096389..74018ef 100644
> --- a/drivers/char/n_tty.c
> +++ b/drivers/char/n_tty.c
> @@ -759,20 +759,9 @@ static inline void n_tty_receive_char(struct tty_struct *tty, unsigned char c)
> signal = SIGTSTP;
> if (c == SUSP_CHAR(tty)) {
> send_signal:
> - /*
> - * Echo character, and then send the signal.
> - * Note that we do not use isig() here because we want
> - * the order to be:
> - * 1) flush, 2) echo, 3) signal
> - */
> - if (!L_NOFLSH(tty)) {
> - n_tty_flush_buffer(tty);
> - tty_driver_flush_buffer(tty);
> - }
> if (L_ECHO(tty))
> echo_char(c, tty);
> - if (tty->pgrp)
> - kill_pgrp(tty->pgrp, signal, 1);
> + isig(signal, tty, 0);
> return;
> }
> }
I noticed the original post in this thread mentioned that the problem
has been seen since 2.6.21 or 2.6.23:
> I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this
> behaviour with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
>
> Is this intended behaviour, or should I report a bug?
The echo patch that is altered in the patch above only appeared recently
(in 2.6.25). Is there a way for you to try the test case on a
pre-2.6.25 kernel and see if the issue exists there? If so, it is
strange that the above fixes it.
-Joe
Joe Peterson <[email protected]> wrote:
> Elias Oltmanns wrote:
>> The following patch to 2.6.26-rc8 fixes the issue for me. Perhaps we
>> really want to do something else, but since I'm not all that familiar
>> with the standard behaviour on other Unices and since the comment
>> describing the changed order of function calls in the original commit
>> didn't give the reason for that change, I leave that to more
>> knowledgeable people.
>>
>> drivers/char/n_tty.c | 13 +------------
>> 1 files changed, 1 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
>> index 8096389..74018ef 100644
>> --- a/drivers/char/n_tty.c
>> +++ b/drivers/char/n_tty.c
>> @@ -759,20 +759,9 @@ static inline void n_tty_receive_char(struct tty_struct *tty, unsigned char c)
>> signal = SIGTSTP;
>> if (c == SUSP_CHAR(tty)) {
>> send_signal:
>> - /*
>> - * Echo character, and then send the signal.
>> - * Note that we do not use isig() here because we want
>> - * the order to be:
>> - * 1) flush, 2) echo, 3) signal
>> - */
>> - if (!L_NOFLSH(tty)) {
>> - n_tty_flush_buffer(tty);
>> - tty_driver_flush_buffer(tty);
>> - }
>> if (L_ECHO(tty))
>> echo_char(c, tty);
>> - if (tty->pgrp)
>> - kill_pgrp(tty->pgrp, signal, 1);
>> + isig(signal, tty, 0);
>> return;
>> }
>> }
>
> I noticed the original post in this thread mentioned that the problem
> has been seen since 2.6.21 or 2.6.23:
>
>> I use 2.6.25-2 and 2.6.26-rc8 now; I don't recall seeing this
>> behaviour with old kernels (IIRC I see this since 2.6.21 or 2.6.23).
>>
>> Is this intended behaviour, or should I report a bug?
>
> The echo patch that is altered in the patch above only appeared recently
> (in 2.6.25). Is there a way for you to try the test case on a
> pre-2.6.25 kernel and see if the issue exists there? If so, it is
> strange that the above fixes it.
According to my tests, 2.6.24 responds much faster to Ctrl+C than 2.6.25 does.
The patch above makes them *feel* alike again (no hard numbers, mind).
However, I haven't checked anything as early as 2.6.21 or before so I
don't know whether there may have been another regression since then.
Regards,
Elias
Elias Oltmanns wrote:
> According to my tests, 2.6.24 responds much faster to Ctrl+C than 2.6.25 does.
> The patch above makes them *feel* alike again (no hard numbers, mind).
> However, I haven't checked anything as early as 2.6.21 or before so I
> don't know whether there may have been another regression since then.
OK, thanks for checking. Can you try the patch below? It is almost the
same as your patch, except it reverses the order of the isig and the echo.
This causes ^C echo to work for me in both console and xterm. Back when
I did the original patch, I was concerned this ordering could result in
the ^C echoing late, but this may not be an issue.
Let me know if the patch below fixes the issue with interrupting
processes waiting for I/O.
Thanks, Joe
Elias Oltmanns wrote:
> - /*
> - * Echo character, and then send the signal.
> - * Note that we do not use isig() here because we want
> - * the order to be:
> - * 1) flush, 2) echo, 3) signal
> - */
> - if (!L_NOFLSH(tty)) {
> - n_tty_flush_buffer(tty);
> - tty_driver_flush_buffer(tty);
> - }
> if (L_ECHO(tty))
> echo_char(c, tty);
> - if (tty->pgrp)
> - kill_pgrp(tty->pgrp, signal, 1);
> + isig(signal, tty, 0);
> return;
I've been doing some experimenting with the order of the three
operations (flush, echo, and signal), and the behavior is slightly
different with each.
The way I have it in the code now matches the order used by FreeBSD, so
there may actually be a good reason to flush the tty buffers *before*
issuing the signal. Here is their snippet of code:
    if (ISSET(lflag, ISIG)) {
        if (CCEQ(cc[VINTR], c) || CCEQ(cc[VQUIT], c)) {
            if (!ISSET(lflag, NOFLSH))
                ttyflush(tp, FREAD | FWRITE);
            ttyecho(c, tp);
            if (tp->t_pgrp != NULL) {
                PGRP_LOCK(tp->t_pgrp);
                pgsignal(tp->t_pgrp,
                    CCEQ(cc[VINTR], c) ? SIGINT : SIGQUIT, 1);
                PGRP_UNLOCK(tp->t_pgrp);
            }
            goto endcase;
        }
        if (CCEQ(cc[VSUSP], c)) {
            if (!ISSET(lflag, NOFLSH))
                ttyflush(tp, FREAD);
            ttyecho(c, tp);
            if (tp->t_pgrp != NULL) {
                PGRP_LOCK(tp->t_pgrp);
                pgsignal(tp->t_pgrp, SIGTSTP, 1);
                PGRP_UNLOCK(tp->t_pgrp);
            }
            goto endcase;
        }
    }
The first section handles ^C and ^\ (and flushes read and write), and
the second handles ^Z (only flushes read).
In any case, we should consider if the flush in Linux should precede the
signal. Perhaps interrupting before the flush can happen is bad?
Perhaps this has something to do with anomalies observed (below) with
other ordering, or maybe I'm seeing other latent bugs not involved with
this at all.
Now to the results of the ordering...
flush, echo, signal (the way it is now)
-------------------
* Follows FreeBSD's ordering
* works on both console and xterm
* seems to delay interrupt when process is IO bound
echo, signal, flush (proposed in Elias' patch)
-------------------
* seems to fix IO bound issue
* echo works in console but not xterm
signal, flush, echo
-------------------
* works in both console and xterm
* may cause late echo (and does not match BSD order)
* I tested inserting an artificial delay between flush and echo:
strange result: in console, echo does not appear; in xterm,
^C appears right before next prompt, but sometimes
echo does not appear, along with final program output
(something eats the output)
signal, echo, flush
-------------------
* same as above
So changing the order seems to always introduce some bugs or issues.
I'm still experimenting; feedback welcome!
-Joe
I'd like to correct a few misapprehensions in this thread. To be fair,
what they were saying used to be true, but it isn't any more, and I
would like people to be aware of it.
Avi Kivity wrote:
> That's filesystem dependent; if you mount an nfs filesystem with the
> 'intr' mount option, it will be interruptible (which makes sense, as it
> is impossible to guarantee the server's responsiveness).
As of v2.6.25, 'intr' is a no-op. The option is still recognised, but
it does not change NFS's behaviour.
Bill Davidson wrote:
> Basic problem is that you can get a process which you can't interrupt
> (and in most cases can't kill) which has resources tied up. Given the
> choice between surprising a process with an EINTR or killing it during a
> reboot to get the system usable again, I would rather surprise.
I implemented option C (none of the above). NFS now sleeps in state
TASK_KILLABLE so tasks can be killed, but they will never see the -EINTR
return (since they're dead). Yes, that means you can end up with a
partial write to a file on the server. But if you tripped over the
(power/network) cable, that could happen anyway.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Elias Oltmanns wrote:
>> I have encountered the following situation several times, but I've been
>> unable to come up with a way to reproduce this until now:
>> - some process is keeping the disk busy (some cron job for example:
>> updatedb, chkrootkit, ...)
>> - other processes that want to do I/O have to wait (this is normal)
>> - I have a (I/O bound) process running in my terminal, and I want to
>> interrupt it with Ctrl+C
>> - I type Ctrl+C several times, and the process is not interrupted for
>> several seconds (10-30 secs)
>> - if I type Ctrl+Z, and use kill %1 the process dies faster than
>> waiting for it to react to Ctrl+C
>
> The following patch to 2.6.26-rc8 fixes the issue for me. Perhaps we
> really want to do something else, but since I'm not all that familiar
> with the standard behaviour on other Unices and since the comment
> describing the changed order of function calls in the original commit
> didn't give the reason for that change, I leave that to more
> knowledgeable people.
I have tried to reproduce the original poster's issue on
2.6.26-rc8-git3 without success. In around 100 attempts (restarting the
disk activity process over each time it completed), it always broke out
after one ^C - one time took an extra second or two. Note that I did
not run latencytop (did not have it compiled in my kernel) - if that is
required for the test, let me know, but I assume it is just for
gathering info when the issue occurs.
Can you please try something for me? For one, apply the attached patch,
which removes what seems to be a redundant flush (since both calls end
up calling the same n_tty routine). This made no difference for me, but
I am curious if it might help you.
If you still see the problem, please try typing "stty noflsh" and try
again. This disables the flush step, which may be affecting you.
Again, this did not make a difference for me.
It will really help me to know the results of these steps for you.
As far as moving the flush after the signal, I have tried this (in the
patch I posted earlier), and it ends up causing various anomalies in
output, so I do not think that is the right solution.
-Joe
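For anyone who wants to toggle the flush step from a test program rather than with "stty noflsh", the following is a minimal user-space sketch. It assumes only the standard termios interface; the helper names with_noflsh() and set_noflsh() are made up for illustration and do not come from any patch in this thread.

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Return lflag with NOFLSH set (enable != 0) or cleared (enable == 0). */
tcflag_t with_noflsh(tcflag_t lflag, int enable)
{
    return enable ? (lflag | NOFLSH) : (lflag & ~(tcflag_t)NOFLSH);
}

/*
 * Apply the setting to the terminal open on fd, like "stty noflsh"
 * (enable != 0) or "stty -noflsh" (enable == 0).
 * Returns 0 on success, -1 on failure; errno is ENOTTY if fd is not
 * a terminal.
 */
int set_noflsh(int fd, int enable)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    tio.c_lflag = with_noflsh(tio.c_lflag, enable);
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Calling set_noflsh(STDIN_FILENO, 1) before starting the I/O-bound test should have the same effect as the stty command: with NOFLSH set, the line discipline skips the queue flush when it sees INTR, QUIT or SUSP.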
> disk activity process over each time it completed), it always broke out
> after one ^C - one time took an extra second or two. Note that I did
> not run latencytop (did not have it compiled in my kernel) - if that is
> required for the test, let me know, but I assume it is just for
> gathering info when the issue occurs.
I really don't think this is tty related looking at the code involved and
also the lack of actual measurements presented. More likely scheduler and
VM related changes.
Alan
Alan Cox wrote:
>> disk activity process over each time it completed), it always broke out
>> after one ^C - one time took an extra second or two. Note that I did
>> not run latencytop (did not have it compiled in my kernel) - if that is
>> required for the test, let me know, but I assume it is just for
>> gathering info when the issue occurs.
>
> I really don't think this is tty related looking at the code involved and
> also the lack of actual measurements presented. More likely scheduler and
> VM related changes.
Alan, many thanks for your assessment - it's greatly appreciated. Now
that I've looked into it, the only peculiar thing I see is the redundant
flush_buffer call. Do you think that should be removed anyway? It
seems that the following two calls do the same thing in n_tty.c:
n_tty_flush_buffer(tty);
tty_driver_flush_buffer(tty);
If this looks redundant, let me know, and I can submit a patch to just
call n_tty_flush_buffer(tty).
-Thanks, Joe
Alan Cox <[email protected]> writes:
>> disk activity process over each time it completed), it always broke out
>> after one ^C - one time took an extra second or two. Note that I did
>> not run latencytop (did not have it compiled in my kernel) - if that is
>> required for the test, let me know, but I assume it is just for
>> gathering info when the issue occurs.
>
> I really don't think this is tty related looking at the code involved and
> also the lack of actual measurements presented. More likely scheduler and
> VM related changes.
Why should the scheduler or VM behave differently for Ctrl-Z+kill
versus Ctrl-C?
Doesn't make sense to me. And yes I see this here regularly and it started
at some point.
-Andi
> Doesn't make sense to me. And yes I see this here regularly and it started
> at some point.
Let's have some profiles and TSC numbers then. I'm happy to believe some
specific combination of hardware/timing shows up a real problem but
looking at the tiny amount of code involved I can see no sane explanation
as to why at this point.
I'm not going to look further into it without real serious data from
people seeing it.
Alan Cox wrote:
>> Doesn't make sense to me. And yes I see this here regularly and it started
>> at some point.
>
> Let's have some profiles and TSC numbers then. I'm happy to believe some
> specific combination of hardware/timing shows up a real problem but
> looking at the tiny amount of code involved I can see no sane explanation
> as to why at this point.
Well Elias showed that changing the order made a visible difference.
> I'm not going to look further into it without real serious data from
> people seeing it.
So you're saying multiple independent people suffer from some kind of
hallucination?
-Andi
> > I'm not going to look further into it without real serious data from
> > people seeing it.
>
> So you're saying multiple independent people suffer from some kind of
> hallucination?
I'm saying provide some data.
Alan Cox wrote:
>>> I'm not going to look further into it without real serious data from
>>> people seeing it.
>> So you're saying multiple independent people suffer from some kind of
>> hallucination?
>
> I'm saying provide some data.
It sounds more like the ostrich approach to bug handling to me.
So what kind of data do you want? Someone watching a wallclock while
comparing Ctrl-Z+kill versus Ctrl-C on an IO-intensive process?
-Andi
> So what kind of data do you want? Someone watching a wallclock while
> comparing Ctrl-Z+kill versus Ctrl-C on an IO-intensive process?
Latency traces with timestamps might be quite useful, they'd probably
also tell you why it happened. I can't reproduce it, nobody has provided
numbers so even if I wanted to work on it I couldn't do much.
Instead I have lots of real tty, ATA and other work that needs doing
which has quantified data, is reproducible and needs doing, so that will
get done.
Alan
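One way to get at least rough numbers from user space is sketched below. It is only a baseline under stated assumptions, not a replacement for kernel latency traces: it sends SIGINT with kill(), so it bypasses the tty layer entirely and measures only the time from signal to process exit (roughly the "Ctrl+Z, then kill" path). The function name sigint_exit_latency(), the 100 ms warm-up, the 30 s safety alarm and the fsync-in-a-loop workload are all choices made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork a child that keeps rewriting and syncing one block of `path`,
 * send it SIGINT with kill(), and return the seconds that pass until
 * waitpid() reaps it. The wait status is stored in *status.
 * Returns -1.0 if fork, kill or waitpid fails.
 */
double sigint_exit_latency(const char *path, int *status)
{
    struct timespec warmup = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    struct timespec t0, t1;
    pid_t pid = fork();

    if (pid < 0)
        return -1.0;
    if (pid == 0) {
        static const char buf[4096];
        int fd;

        signal(SIGINT, SIG_DFL);  /* make sure SIGINT is fatal here */
        alarm(30);                /* safety net if the parent goes away */
        fd = open(path, O_WRONLY | O_CREAT, 0600);
        if (fd < 0)
            _exit(2);
        for (;;) {
            /* Rewrite the same block and ask for it to be synced. */
            if (pwrite(fd, buf, sizeof buf, 0) < 0)
                _exit(3);
            fsync(fd);
        }
    }
    nanosleep(&warmup, NULL);     /* let the child get into its loop */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (kill(pid, SIGINT) < 0 || waitpid(pid, status, 0) < 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Pointing path at a scratch file on the disk that is being kept busy, and calling the function repeatedly while the dd jobs from the original test case run, would give a distribution of exit latencies to compare across kernel versions. If these stay small while Ctrl+C at the terminal is slow, that would point at the path between the keypress and the signal rather than at signal delivery itself.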
Alan Cox wrote:
>> So what kind of data do you want? Someone watching a wallclock while
>> comparing Ctrl-Z+kill versus Ctrl-C on an IO-intensive process?
>
> Latency traces with timestamps might be quite useful, they'd probably
> also tell you why it happened.
Ok so you're asking someone else to debug it.
> I can't reproduce it, nobody has provided
> numbers so even if I wanted to work on it I couldn't do much.
Well we had a patch (although I haven't tried it yet)
http://marc.info/?l=linux-kernel&m=121489861508496&w=2
Is that not concrete enough?
> Instead I have lots of real tty, ATA and other work that needs doing
> which has quantified data,
All the reporters provided time stamp traces? @)
-Andi
> Well we had a patch (although I haven't tried it yet)
>
> http://marc.info/?l=linux-kernel&m=121489861508496&w=2
>
> Is that not concrete enough?
No. Apply a little engineering to this instead of running around acting
on random unexplained proposals people don't agree works. Right now you
look like a politician - mindlessly squawking about things you've not
tried and proposing anything and everything which might improve matters
without working out if they would and why.
The tty layer is getting improved and fixed by applying proper
engineering methods not by random flapping.
So:
observe, and if need be experiment to get further data
produce a model of the behaviour which explains the data
make the changes the explanation requires
test
repeat
> > Instead I have lots of real tty, ATA and other work that needs doing
> > which has quantified data,
>
> All the reporters provided time stamp traces?
No they provided relevant data or enough info I can reproduce it here.
Alan
The discussion seems to have become a little heated, so allow me to step
in here ...
Andi Kleen <[email protected]> wrote:
> Alan Cox wrote:
[...]
>> I can't reproduce it, nobody has provided
>> numbers so even if I wanted to work on it I couldn't do much.
>
> Well we had a patch (although I haven't tried it yet)
>
> http://marc.info/?l=linux-kernel&m=121489861508496&w=2
>
> Is that not concrete enough?
Actually, I'm not too sure whether this really fixes the root cause of
the problem -- I never have been and I meant to indicate as much in my
email. It's been the first time I looked at the tty code and the patch
was mainly guess work; all it does is revert parts of a previous
patch. My hope was to direct other people's (read: those who know the tty
code) attention to a change that seemed to cause the problem. Perhaps I
didn't make it clear enough at the time that I didn't really know *why*
this change should cause any problem in the first place.
Now, the situation has become even more delicate. Joe has reported that
my patch breaks echoing in the xterm and, rather to my embarrassment, I
have to report that it doesn't even fix the issue I claimed it would.
All it apparently does is make the problem slightly harder to
reproduce which is why it didn't occur in my tests at the time.
Since I have been concentrating on other things over the last days, it's
been only today that I discovered this. Moreover, some more testing led
me to believe that the root issue has been present in mainline at least
since 2.6.19 and Joe's change in 2.6.25 only made it visible because you
now occasionally get something like
^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z
on your screen when you keep pressing Ctrl+Z until the prompt appears;
in 2.6.24, for instance, there would just be a short delay but no
irritating output on the screen that makes you wonder.
Quite frankly, I'm a bit at a loss as to how I should go about debugging
this and what kind of data might be useful to others to do so. In
another email Alan talked about latency traces which is something new to
me. Since the OP talked about latencytop, I hope that this tool provides
the data Alan requires and will install and make use of it accordingly
(expect some results later today or tomorrow). Of course, I'm always
open to other / additional suggestions.
Regards,
Elias
>
> Actually, I'm not too sure whether this really fixes the root cause of
> the problem
Ok.
>
> on your screen when you keep pressing Ctrl+Z until the prompt appears;
> in 2.6.24, for instance, there would just be a short delay but no
> irritating output on the screen that makes you wonder.
>
> Quite frankly, I'm a bit at a loss as to how I should go about debugging
> this and what kind of data might be useful to others to do so. In
> another email Alan talked about latency traces which is something new to
> me. Since the OP talked about latencytop,
I don't think latencytop would help, to be frank.
> I hope that this tool provides
> the data Alan requires and will install and make use of it accordingly
> (expect some results later today or tomorrow). Of course, I'm always
> open to other / additional suggestions.
The way to do latency traces is to install the -rt patchkit, don't
actually enable any of the RT features there, but enable CONFIG_FUNCTION_TRACE.
The interface is unfortunately quite user-unfriendly and, in my
experience, it takes significant effort to set it up so that it
triggers at the right point and on the right CPU and you actually
get usable traces.
The advantage is that once you have the trace for the right
place (in this case from Ctrl-C to process exit) it is usually
clear what the problem is. You'll have done all the work
for Alan then.
Also the work to do this is likely similar in effort to just bisecting
it.
-Andi
Elias Oltmanns wrote:
> Now, the situation has become even more delicate. Joe has reported that
> my patch breaks echoing in the xterm and, rather to my embarrassment, I
> have to report that it doesn't even fix the issue I claimed it would.
> All it apparently does is make the problem slightly harder to
> reproduce which is why it didn't occur in my tests at the time.
Elias, thanks for your report. I could not reproduce the originally
posted test case, but I wrote a small program that continuously produces
output as a test. One thing I noticed was that the ease of breaking out
of this program is affected quite a bit by other system/CPU activity.
For example, if I was compiling the kernel, it became *easier* to break
out (presumably because the I/O from the test program was getting less
continuous CPU, so the interrupt could "get in"). Similarly,
if I moved the xterm window around on the screen (causing other activity
by doing that) while waiting for the I/O program to terminate after
hitting ^C, it would often break out at that point.
So I do believe that one's subjective impression of how easy it is to
break out of such an I/O-bound program can be affected by the general
state of the system, and therefore it becomes fairly hard to draw a
certain conclusion.
> Since I have been concentrating on other things over the last days, it's
> been only today that I discovered this. Moreover, some more testing led
> me to believe that the root issue has been present in mainline at least
> since 2.6.19 and Joe's change in 2.6.25 only made it visible because you
> now occasionally get something like
> ^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z^Z
Ah! Yes, this makes a lot of sense, actually, and is a good example of
a problem masquerading as something else. Thanks much for this insight.
Knowing that the problem could have been around pre-2.6.25 is very
useful info as well, and does indeed agree with Alan's thoughts that the
issue is likely caused by something else (VM, scheduler, etc.).
> Quite frankly, I'm a bit at a loss as to how I should go about debugging
> this and what kind of data might be useful to others to do so.
If you can, please try the new patch I attached to the post:
http://marc.info/?l=linux-kernel&m=121520229900676&w=2
It removes the call to tty_driver_flush_buffer(), which comes right
after n_tty_flush_buffer() in n_tty.c. It will probably make no
difference, but it would be good to hear either way. I am not sure if
both calls are needed (if anyone reading this knows why both are called
from n_tty.c, let me know), but I do know that the latter
(tty_driver_flush_buffer) call ends up calling n_tty_flush_buffer as
well, causing two flushes in a row.
And also, if you can, try doing 'stty noflsh' and then the test case
again to see if this alters behavior. This may be good to know as well,
even if the flush is not centrally involved in the issue.
Thanks, Joe
Joe Peterson <[email protected]> wrote:
> Elias Oltmanns wrote:
[...]
>> Quite frankly, I'm a bit at a loss as to how I should go about debugging
>> this and what kind of data might be useful to others to do so.
>
> If you can, please try the new patch I attached to the post:
>
> http://marc.info/?l=linux-kernel&m=121520229900676&w=2
Sorry, I just forgot to mention that I had already done that. It doesn't
make any difference as far as I can see (not that I'm surprised about
it).
[...]
> And also, if you can, try doing 'stty noflsh' and then the test case
> again to see if this alters behavior. This may be good to know as well,
> even if the flush is not centrally involved in the issue.
No, that didn't make any difference either.
Regards,
Elias
> seems that the following two calls do the same thing in n_tty.c:
>
> n_tty_flush_buffer(tty);
> tty_driver_flush_buffer(tty);
Sorry missed this originally - they don't do the same thing. The first
clears out anything in the ldisc internally the second clears out
anything in the tty driver itself.
Alan Cox wrote:
>> seems that the following two calls do the same thing in n_tty.c:
>> n_tty_flush_buffer(tty);
>> tty_driver_flush_buffer(tty);
>
> Sorry missed this originally - they don't do the same thing. The first
> clears out anything in the ldisc internally the second clears out
> anything in the tty driver itself.
Alan, before I wrote this, I had put a printk() in n_tty_flush_buffer()
and noticed it was called twice when ^C was hit in an xterm. Then I did
some investigating into this a few days ago, putting a dump_stack() in
n_tty_flush_buffer() so I could see how it is being called.
I realized the calls indeed have different purposes at that point. I
still wonder, though, why when processing a ^C in an xterm/pty,
n_tty_flush_buffer() does get called again from the driver call. See
the two traces below from the ldisc and driver flushes:
*********** CTRL-C received
Pid: 4669, comm: xterm Not tainted 2.6.26-rc8-git3 #1
[<c0283126>] n_tty_flush_buffer+0xd/0x67
[<c028385c>] n_tty_receive_buf+0x398/0xd87
[<c031824b>] ? sock_aio_read+0xed/0xfb
[<c017a824>] ? do_sync_read+0xab/0xe9
[<c0136257>] ? hrtimer_forward+0xd6/0xec
[<c0285569>] pty_write+0x2d/0x3b
[<c0283450>] write_chan+0x21b/0x28f
[<c011bfa4>] ? default_wake_function+0x0/0xd
[<c028103f>] tty_write+0x14e/0x1be
[<c0283235>] ? write_chan+0x0/0x28f
[<c017a8ec>] ? rw_verify_area+0x8a/0xad
[<c0280ef1>] ? tty_write+0x0/0x1be
[<c017ae88>] vfs_write+0x8c/0x133
[<c017b48c>] sys_write+0x3b/0x60
[<c0103aa3>] sysenter_past_esp+0x78/0xb1
=======================
Pid: 4669, comm: xterm Not tainted 2.6.26-rc8-git3 #1
[<c02857c5>] ? pty_unthrottle+0x15/0x21
[<c0283126>] n_tty_flush_buffer+0xd/0x67
[<c0285663>] pty_flush_buffer+0x20/0x67
[<c038ae61>] ? _spin_unlock_irqrestore+0x1b/0x2f
[<c0284934>] tty_driver_flush_buffer+0x13/0x15
[<c0283863>] n_tty_receive_buf+0x39f/0xd87
[<c031824b>] ? sock_aio_read+0xed/0xfb
[<c017a824>] ? do_sync_read+0xab/0xe9
[<c0136257>] ? hrtimer_forward+0xd6/0xec
[<c0285569>] pty_write+0x2d/0x3b
[<c0283450>] write_chan+0x21b/0x28f
[<c011bfa4>] ? default_wake_function+0x0/0xd
[<c028103f>] tty_write+0x14e/0x1be
[<c0283235>] ? write_chan+0x0/0x28f
[<c017a8ec>] ? rw_verify_area+0x8a/0xad
[<c0280ef1>] ? tty_write+0x0/0x1be
[<c017ae88>] vfs_write+0x8c/0x133
[<c017b48c>] sys_write+0x3b/0x60
[<c0103aa3>] sysenter_past_esp+0x78/0xb1
=======================
In a Linux virtual console/tty, however, the tty driver flush doesn't
call the ldisc flush again in my tests:
*********** CTRL-C received
Pid: 6, comm: events/0 Not tainted 2.6.26-rc8-git3 #1
[<c0283126>] n_tty_flush_buffer+0xd/0x67
[<c028385c>] n_tty_receive_buf+0x398/0xd87
[<c038ab32>] ? _spin_lock_irqsave+0x27/0x41
[<c038ab32>] ? _spin_lock_irqsave+0x27/0x41
[<c038ae61>] ? _spin_unlock_irqrestore+0x1b/0x2f
[<c027f3ee>] ? tty_ldisc_try+0x2f/0x35
[<c027f9e2>] flush_to_ldisc+0xde/0x14d
[<c013129d>] run_workqueue+0x78/0x102
[<c027f904>] ? flush_to_ldisc+0x0/0x14d
[<c0131a0b>] ? worker_thread+0x0/0xbf
[<c0131abf>] worker_thread+0xb4/0xbf
[<c0133f4d>] ? autoremove_wake_function+0x0/0x33
[<c0133e77>] kthread+0x3b/0x64
[<c0133e3c>] ? kthread+0x0/0x64
[<c0104753>] kernel_thread_helper+0x7/0x10
=======================
-Joe
> In a Linux virtual console/tty, however, the tty driver flush doesn't
> call the ldisc flush again in my tests:
PTY/TTY is a pair - the driver flush of one side causes an ldisc flush of
the other.
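The pairing can also be exercised from user space, which gives a way to drive the ^C path through the line discipline without a person at the keyboard. The sketch below is an illustration only and assumes Linux with devpts mounted at /dev/pts: it uses the Linux-specific TIOCSPTLCK and TIOCGPTN ioctls (the portable route would be posix_openpt(), grantpt(), unlockpt() and ptsname()), and the function name interrupt_via_pty() is invented here. It writes the default VINTR character (0x03) to the master side and reports which signal ended the child that owns the slave side.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the number of the signal that terminated the child (SIGINT
 * is the expected result), 0 if the child exited on its own, or -1 if
 * the pty could not be set up.
 */
int interrupt_via_pty(void)
{
    char slave_name[32], token;
    unsigned int ptn;
    int unlock = 0, sync_pipe[2], status, master;
    pid_t pid;

    master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    /* Linux-specific: unlock the slave and look up its /dev/pts number. */
    if (ioctl(master, TIOCSPTLCK, &unlock) < 0 ||
        ioctl(master, TIOCGPTN, &ptn) < 0 ||
        pipe(sync_pipe) < 0) {
        close(master);
        return -1;
    }
    snprintf(slave_name, sizeof slave_name, "/dev/pts/%u", ptn);

    pid = fork();
    if (pid < 0) {
        close(master);
        close(sync_pipe[0]);
        close(sync_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        close(master);
        close(sync_pipe[0]);
        signal(SIGINT, SIG_DFL);
        alarm(5);                 /* safety net: never hang forever */
        /* A new session leader acquires the first tty it opens as its
         * controlling terminal and becomes its foreground group. */
        if (setsid() < 0 || open(slave_name, O_RDWR) < 0)
            _exit(2);
        if (write(sync_pipe[1], "r", 1) != 1)
            _exit(2);
        for (;;)
            pause();
    }
    close(sync_pipe[1]);
    if (read(sync_pipe[0], &token, 1) != 1) {  /* child setup failed */
        waitpid(pid, &status, 0);
        close(master);
        close(sync_pipe[0]);
        return -1;
    }
    /* "Type" ^C on the master side; n_tty on the slave handles it. */
    if (write(master, "\x03", 1) != 1)
        kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    close(master);
    close(sync_pipe[0]);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Timing the interval between the write() of 0x03 and the return of waitpid(), with the child doing real I/O instead of pause(), would measure the whole path from received character to process exit, which is the part the kill()-based comparison in this thread cannot see.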