Message-Id: <584815F8020000A100023940@gwsmtp1.uni-regensburg.de>
Date: Wed, 07 Dec 2016 14:00:24 +0100
From: "Ulrich Windl"
Subject: Antw: 3.0.101: "blk_rq_check_limits: over max size limit."
X-Mailing-List: linux-kernel@vger.kernel.org

Hi once more!

I managed to get the call traces of the involved processes:

1) The process doing read():

Dec  7 13:51:16 h10 kernel: [183809.594434] SysRq : Show Blocked State
Dec  7 13:51:16 h10 kernel: [183809.594447]   task                        PC stack   pid father
Dec  7 13:51:16 h10 kernel: [183809.594750] randio          D ffff8801703a9d68     0  2762  53250 0x00000004
Dec  7 13:51:16 h10 kernel: [183809.594758]  ffff880100887ad8 0000000000000046 ffff880100886010 0000000000010900
Dec  7 13:51:16 h10 kernel: [183809.594765]  0000000000010900 0000000000010900 0000000000010900 ffff880100887fd8
Dec  7 13:51:16 h10 kernel: [183809.594772]  ffff880100887fd8 0000000000010900 ffff88016bb6a280 ffff88017670c300
Dec  7 13:51:16 h10 kernel: [183809.594778] Call Trace:
Dec  7 13:51:16 h10 kernel: [183809.594806]  [] io_schedule+0x9c/0xf0
Dec  7 13:51:16 h10 kernel: [183809.594820]  [] __lock_page+0x93/0xc0
Dec  7 13:51:16 h10 kernel: [183809.594834]  [] truncate_inode_pages_range+0x294/0x460
Dec  7 13:51:16 h10 kernel: [183809.594845]  [] __blkdev_put+0x1d7/0x210
Dec  7 13:51:16 h10 kernel: [183809.594856]  [] __fput+0xb3/0x200
Dec  7 13:51:16 h10 kernel: [183809.594868]  [] filp_close+0x5c/0x90
Dec  7 13:51:16 h10 kernel: [183809.594880]  [] put_files_struct+0x7a/0xd0
Dec  7 13:51:16 h10 kernel: [183809.594889]  [] do_exit+0x1d0/0x470
Dec  7 13:51:16 h10 kernel: [183809.594897]  [] do_group_exit+0x3d/0xb0
Dec  7 13:51:16 h10 kernel: [183809.594907]  [] get_signal_to_deliver+0x247/0x480
Dec  7 13:51:16 h10 kernel: [183809.594919]  [] do_signal+0x71/0x1b0
Dec  7 13:51:16 h10 kernel: [183809.594928]  [] do_notify_resume+0x98/0xb0
Dec  7 13:51:16 h10 kernel: [183809.594940]  [] int_signal+0x12/0x17
Dec  7 13:51:16 h10 kernel: [183809.594988]  [<00007f64f28cbba0>] 0x7f64f28cbb9f

2) The process trying to modify the queue scheduler:

Dec  7 13:51:16 h10 kernel: [183809.594995] blocktune       D ffff88014e048000     0 58867      1 0x00000000
Dec  7 13:51:16 h10 kernel: [183809.595000]  ffff880128defd18 0000000000000086 ffff880128dee010 0000000000010900
Dec  7 13:51:16 h10 kernel: [183809.595007]  0000000000010900 0000000000010900 0000000000010900 ffff880128deffd8
Dec  7 13:51:16 h10 kernel: [183809.595013]  ffff880128deffd8 0000000000010900 ffff88012889a3c0 ffff8801767bc1c0
Dec  7 13:51:16 h10 kernel: [183809.595019] Call Trace:
Dec  7 13:51:16 h10 kernel: [183809.595026]  [] schedule_timeout+0x1b0/0x2a0
Dec  7 13:51:16 h10 kernel: [183809.595040]  [] msleep+0x1d/0x30
Dec  7 13:51:16 h10 kernel: [183809.595052]  [] __blk_drain_queue+0xbc/0x140
Dec  7 13:51:16 h10 kernel: [183809.595063]  [] elv_quiesce_start+0x51/0x90
Dec  7 13:51:16 h10 kernel: [183809.595071]  [] elevator_switch+0x4a/0x150
Dec  7 13:51:16 h10 kernel: [183809.595079]  [] elevator_change+0x6d/0xb0
Dec  7 13:51:16 h10 kernel: [183809.595086]  [] elv_iosched_store+0x27/0x60
Dec  7 13:51:16 h10 kernel: [183809.595096]  [] queue_attr_store+0x67/0xc0
Dec  7 13:51:16 h10 kernel: [183809.595106]  [] sysfs_write_file+0xcb/0x160
Dec  7 13:51:16 h10 kernel: [183809.595115]  [] vfs_write+0xce/0x140
Dec  7 13:51:16 h10 kernel: [183809.595123]  [] sys_write+0x53/0xa0
Dec  7 13:51:16 h10 kernel: [183809.595131]  [] system_call_fastpath+0x16/0x1b
Dec  7 13:51:16 h10 kernel: [183809.595140]  [<00007f12b7f70c00>] 0x7f12b7f70bff

3) The process trying to read the queue scheduler:

Dec  7 13:51:16 h10 kernel: [183809.595149] cat             D ffff880147873718     0 45053   5957 0x00000004
Dec  7 13:51:16 h10 kernel: [183809.595155]  ffff880122f7be00 0000000000000082 ffff880122f7a010 0000000000010900
Dec  7 13:51:16 h10 kernel: [183809.595161]  0000000000010900 0000000000010900 0000000000010900 ffff880122f7bfd8
Dec  7 13:51:16 h10 kernel: [183809.595167]  ffff880122f7bfd8 0000000000010900 ffff8801154ea1c0 ffffffff81a11020
Dec  7 13:51:16 h10 kernel: [183809.595174] Call Trace:
Dec  7 13:51:16 h10 kernel: [183809.595181]  [] __mutex_lock_slowpath+0x160/0x1f0
Dec  7 13:51:16 h10 kernel: [183809.595189]  [] mutex_lock+0x1a/0x40
Dec  7 13:51:16 h10 kernel: [183809.595196]  [] queue_attr_show+0x49/0xb0
Dec  7 13:51:16 h10 kernel: [183809.595203]  [] sysfs_read_file+0xfe/0x1c0
Dec  7 13:51:16 h10 kernel: [183809.595212]  [] vfs_read+0xc7/0x130
Dec  7 13:51:16 h10 kernel: [183809.595219]  [] sys_read+0x53/0xa0
Dec  7 13:51:16 h10 kernel: [183809.595226]  [] system_call_fastpath+0x16/0x1b
Dec  7 13:51:16 h10 kernel: [183809.595235]  [<00007fb04560dba0>] 0x7fb04560db9f

>>> Ulrich Windl wrote on 07.12.2016 at 13:23 in message <5847FF5E.7E4:161:60728>:
> Hi again!
>
> An addition: Processes doing such I/O seem to be unkillable, and I also
> cannot change the queue parameters while this problem occurs (the process
> trying to write, e.g. to queue/scheduler, is also blocked).
> The process status of the process doing I/O looks like this:
> # cat /proc/2762/status
> Name:   randio
> State:  D (disk sleep)
> Tgid:   2762
> Pid:    2762
> PPid:   53250
> TracerPid:      0
> Uid:    0       0       0       0
> Gid:    0       0       0       0
> FDSize: 0
> Groups: 0 105
> Threads:        1
> SigQ:   5/38340
> SigPnd: 0000000000000000
> ShdPnd: 0000000000004102
> SigBlk: 0000000000000000
> SigIgn: 0000000000000000
> SigCgt: 0000000180000000
> CapInh: 0000000000000000
> CapPrm: ffffffffffffffff
> CapEff: ffffffffffffffff
> CapBnd: ffffffffffffffff
> Cpus_allowed:   ffffffff,ffffffff
> Cpus_allowed_list:      0-63
> Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
> Mems_allowed_list:      0
> voluntary_ctxt_switches:        5
> nonvoluntary_ctxt_switches:     1
>
> Best regards,
> Ulrich
>
> >>> Ulrich Windl wrote on 07.12.2016 at 13:19 in message <5847FE66.7E4:161:60728>:
> > Hi again!
> >
> > Maybe someone can confirm this:
> > If you have a device (e.g. a multipath map) that limits max_sectors_kb to
> > maybe 64, and you then define an LVM LV using that multipath map as PV,
> > the LV still has a larger max_sectors_kb. If you send a big request (a
> > read in my case), the kernel will complain:
> >
> > kernel: [173116.098798] blk_rq_check_limits: over max size limit.
> >
> > Note that this message gives no clue to the device involved, the size of
> > the I/O attempted, or the limit on the I/O size.
> >
> > My expectation would be that the higher layer reports back an I/O error
> > and the user process receives an I/O error, or, alternatively, that the
> > big request is split into acceptable chunks before being passed to the
> > lower layers.
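Just to illustrate the splitting I had expected (made-up numbers; `split_chunks` is a hypothetical helper for illustration, not kernel code): a 1024 KiB read against a 64 KiB max_sectors_kb limit would have to become 16 smaller requests.

```shell
# Illustration only (not kernel code): how many chunks a request of
# req_kb KiB needs under a limit of limit_kb KiB (ceiling division).
split_chunks() {
    req_kb=$1
    limit_kb=$2
    echo $(( (req_kb + limit_kb - 1) / limit_kb ))
}

split_chunks 1024 64    # prints 16
```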
> > However, none of the above happens; instead the request seems to block
> > the request queue, because later TUR checks also fail:
> > kernel: [173116.105701] device-mapper: multipath: Failing path 66:384.
> > kernel: [173116.105714] device-mapper: multipath: Failing path 66:352.
> > multipathd: 66:384: mark as failed
> > multipathd: NAP_S11: remaining active paths: 1
> > multipathd: 66:352: mark as failed
> > multipathd: NAP_S11: Entering recovery mode: max_retries=6
> > multipathd: NAP_S11: remaining active paths: 0
> >
> > (somewhat later)
> > multipathd: NAP_S11: sdkh - tur checker reports path is up
> > multipathd: 66:336: reinstated
> > multipathd: NAP_S11: Recovered to normal mode
> > kernel: [173117.286712] device-mapper: multipath: Could not failover
> > device 66:368: Handler scsi_dh_alua error 8.
> > (I don't know the implications of this.)
> >
> > Of course this error does not appear as long as all devices use the same
> > maximum request size, but tests have shown that different SAN disk
> > systems prefer different request sizes (as they split large requests
> > internally to handle them in chunks anyway).
> >
> > Last seen with this kernel (SLES11 SP4 on x86_64): Linux version
> > 3.0.101-88-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch
> > revision 152973] (SUSE Linux)) #1 SMP Fri Nov 4 22:07:35 UTC 2016 (b45f205)
> >
> > Regards,
> > Ulrich
> >
> > >>> Ulrich Windl wrote on 23.08.2016 at 17:03 in message <57BC65CD.D1A:161:60728>:
> > > Hello!
> > >
> > > While performance-testing a 3PARdata StoreServ 8400 with SLES11 SP4, I
> > > noticed that I/O dropped until everything more or less stood still.
> > > Looking into the syslog I found that multipath's TUR checker considered
> > > the paths (FC, BTW) as dead. Amazingly, I did not have this problem
> > > when I did read-only tests.
> > > The start looks like this:
> > > Aug 23 14:44:58 h10 multipathd: 8:32: mark as failed
> > > Aug 23 14:44:58 h10 multipathd: FirstTest-32: remaining active paths: 3
> > > Aug 23 14:44:58 h10 kernel: [  880.159425] blk_rq_check_limits: over max size limit.
> > > Aug 23 14:44:58 h10 kernel: [  880.159611] blk_rq_check_limits: over max size limit.
> > > Aug 23 14:44:58 h10 kernel: [  880.159615] blk_rq_check_limits: over max size limit.
> > > Aug 23 14:44:58 h10 kernel: [  880.159623] device-mapper: multipath: Failing path 8:32.
> > > Aug 23 14:44:58 h10 kernel: [  880.186609] blk_rq_check_limits: over max size limit.
> > > Aug 23 14:44:58 h10 kernel: [  880.186626] blk_rq_check_limits: over max size limit.
> > > Aug 23 14:44:58 h10 kernel: [  880.186628] blk_rq_check_limits: over max size limit.
> > > Aug 23 14:44:58 h10 kernel: [  880.186631] device-mapper: multipath: Failing path 129:112.
> > > [...]
> > > It seems the TUR checker plays some ping-pong-like game: paths go up
> > > and down.
> > >
> > > Now for the Linux part: I found the relevant message in blk-core.c
> > > (blk_rq_check_limits()). First, s/agaist/against/ in " * Such request
> > > stacking drivers should check those requests agaist"; then there's the
> > > problem that the message outputs neither blk_rq_sectors() nor
> > > blk_queue_get_max_sectors() nor the underlying device.
> > > That makes debugging somewhat difficult if you customize the block
> > > queue settings per device as I did:
> > >
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/rotational for FirstTest-31 (0)
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/add_random for FirstTest-31 (0)
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/scheduler for FirstTest-31 (noop)
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/max_sectors_kb for FirstTest-31 (128)
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/rotational for FirstTest-32 (0)
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/add_random for FirstTest-32 (0)
> > > Aug 23 14:32:33 h10 blocktune: (notice) start: activated tuning of queue/scheduler for FirstTest-32 (noop)
> > > Aug 23 14:32:34 h10 blocktune: (notice) start: activated tuning of queue/max_sectors_kb for FirstTest-32 (128)
> > >
> > > I suspect "queue/max_sectors_kb=128" is the culprit:
> > > # multipath -ll FirstTest-32
> > > FirstTest-32 (360002ac000000000000000040001b383) dm-7 3PARdata,VV
> > > size=10G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> > > `-+- policy='service-time 0' prio=50 status=active
> > >   |- 2:0:0:1 sdet 129:80  failed ready running
> > >   |- 2:0:2:1 sdev 129:112 failed ready running
> > >   |- 1:0:0:1 sdb  8:16    failed ready running
> > >   `- 1:0:1:1 sdc  8:32    failed ready running
> > > # cat /sys/block/{dm-7,sd{b,c},sde{t,v}}/queue/max_sectors_kb
> > > 128
> > > 128
> > > 128
> > > 128
> > > 128
> > >
> > > While writing this message, I noticed that I had created a primary
> > > partition on dm-7:
> > > # dmsetup ls | grep Fi
> > > FirstTest-32_part1      (253:8)
> > > FirstTest-32    (253:7)
> > > # cat /sys/block/dm-8/queue/max_sectors_kb
> > > 1024
> > >
> > > After "# echo 128 >/sys/block/dm-8/queue/max_sectors_kb" things still
> > > did not get better.
> > >
> > > Can't blk_rq_check_limits() do anything more clever than returning -EIO?
> > >
> > > Regards,
> > > Ulrich
> > > P.S.: Keep me in CC:, please!
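P.P.S.: For anyone reproducing this, a small sketch for spotting such limit mismatches from sysfs: it prints max_sectors_kb for a device and for each of its slaves (the slaves/ links exist for stacked devices such as dm-*). The device names below are the ones from my test, and the guards make it a no-op where a device does not exist.

```shell
#!/bin/sh
# Sketch: show max_sectors_kb for a device and each of its slave devices,
# so mismatches like dm-8=1024 vs. dm-7=128 stand out at a glance.
show_limits() {
    dev=$1
    q=/sys/block/$dev/queue/max_sectors_kb
    [ -r "$q" ] && echo "$dev: $(cat "$q")"
    for s in /sys/block/$dev/slaves/*; do
        [ -e "$s" ] || continue           # no slaves: glob stayed literal
        sq=$s/queue/max_sectors_kb
        [ -r "$sq" ] && echo "  ${s##*/}: $(cat "$sq")"
    done
}

# Device names from the test above; adjust for your system.
show_limits dm-8
show_limits dm-7
```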