From: Michal Suchanek
Date: Wed, 18 Sep 2013 16:56:08 +0200
Subject: Re: doing lots of disk writes causes oom killer to kill processes
In-Reply-To: <20130917211317.GB6537@quack.suse.cz>
To: Jan Kara
Cc: Hillf Danton, LKML, Linux-MM
X-Mailing-List: linux-kernel@vger.kernel.org

On 17 September 2013 23:13, Jan Kara wrote:
> Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek wrote:
>> > On 26 August 2013 15:51, Michal Suchanek wrote:
>> >> On 12 March 2013 03:15, Hillf Danton wrote:
>> >>>> On 11 March 2013 13:15, Michal Suchanek wrote:
>> >>>>> On 8 February 2013 17:31, Michal Suchanek wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images, and performing something like wiping
>> >>>>> free space to prepare an image for compressing and storing on a server, or
>> >>>>> copying it to an external USB disk, causes
>> >>>>>
>> >>>>> 1) system lockups on the order of a few tens of seconds, when all CPU cores
>> >>>>> are 100% used by the system and the machine is basically unusable
>> >>>>>
>> >>>>> 2) the oom killer killing processes
>> >>>>>
>> >>>>> This is all on a system with 8G of RAM, so there should be plenty of space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1.
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem, even with less RAM.
>> >>>>>
>> >>>>> I have had vm.swappiness = 0 set for a long time already.
>> >>>>>
>> >>>>
>> >>>> I did some testing with 3.7.1, and with swappiness as high as 75 the
>> >>>> kernel still causes all cores to loop somewhere in the system when writing
>> >>>> lots of data to disk.
>> >>>>
>> >>>> With swappiness as high as 90, processes still get killed on large disk writes.
>> >>>>
>> >>>> Given that the max is 100, the interval in which mm works at all is
>> >>>> going to be very narrow, less than 10% of the parameter range. This is
>> >>>> a severe regression, as is the CPU time consumed by the kernel.
>> >>>>
>> >>>> The io scheduler is the default cfq.
>> >>>>
>> >>>> If you have any idea what to try other than downgrading to an earlier,
>> >>>> unaffected kernel, I would like to hear it.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7 (mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with the deadline IO scheduler I experience this issue less often, but it
>> >> still happens.
>> >>
>> >> I am on the 3.9.6 Debian kernel, so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in the vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core, but nowhere near the minute-plus lockups with all cores at
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel, and while it works OK with
>> tar, you can get severe lockups with mc or kvm. The difference is
>> probably that sane tools do fsync() on the files they close,
>> forcing the file to be written out and the kernel to return possible
>> write errors before they move on to the next file.
> Sorry for chiming in a bit late. But is this really writing to a normal
> disk? A SATA drive or something else?
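The fsync-on-close behaviour described above (write a file, fsync() it, and only then move on, so the kernel reports any I/O error and the amount of dirty data stays bounded per file) can be sketched roughly like this. This is a minimal Python illustration of the pattern, not actual code from tar or mc, and the function name is made up:

```python
import os

def copy_with_fsync(src_path, dst_path, chunk_size=1 << 20):
    """Copy src to dst and fsync() before close, so write errors
    surface here instead of piling up as dirty page cache."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            dst.write(data)
        dst.flush()              # push Python's userspace buffer into the kernel
        os.fsync(dst.fileno())   # block until the kernel has written it out
```

With this pattern at most one file's worth of data is dirty at a time, which is presumably why tar behaves while writers that never sync can accumulate gigabytes of dirty pages.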
>
>> With kvm writing to a file used as a virtual disk, the system would stall
>> indefinitely until the disk driver in the emulated system timed
>> out, returned a disk IO error, and the emulated system stopped writing.
>> In top I see all CPU cores 90%+ in wait. The system is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and elevators, but neither solves the basic
>> problem: the dirty buffers fill up memory and the system stalls
>> trying to resolve the situation.
> This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> the amount of dirty memory. By default it is set to 20% of memory, which tends
> to be too much for an 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming a standard SATA drive). Does it change anything?

The default for dirty_ratio/dirty_background_ratio is 60/40. Setting these to 5/2 gives about the same result as running the script that syncs every 5s. Setting them to 30/10 gives larger data chunks and an intermittent lockup before every chunk is written.

It is quite possible to set kernel parameters that kill the kernel, but

1) this is the default

2) the parameter is set in units that do not prevent the issue in general (% of RAM vs. number of blocks)

3) WTH is the system doing? It's a 4-core 3 GHz CPU, so it should be able to handle traversing a structure holding 800M of data in the background.

Something is seriously rotten somewhere.

Thanks

Michal
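For a sense of scale, the percentage limits being discussed translate into bytes roughly as follows. This is a back-of-the-envelope sketch: the 8 GB figure is the machine described above, and the real kernel computes the threshold against "dirtyable" memory, which is somewhat less than total RAM:

```python
GIB = 1 << 30

def dirty_limit_bytes(ram_bytes, dirty_ratio_percent):
    """Approximate dirty page cache allowed before writers are
    throttled (ignoring dirty_bytes overrides and the fact that
    the kernel uses 'dirtyable' rather than total memory)."""
    return ram_bytes * dirty_ratio_percent // 100

ram = 8 * GIB
# Jan's suggested 5% versus a 20% dirty_ratio:
print(dirty_limit_bytes(ram, 20) // (1 << 20), "MiB at 20%")  # 1638 MiB at 20%
print(dirty_limit_bytes(ram, 5) // (1 << 20), "MiB at 5%")    # 409 MiB at 5%
```

At 20% that is roughly 1.6 GiB of dirty data; against a spinning SATA disk writing on the order of 100 MB/s (an assumed figure), flushing that backlog takes well over ten seconds, which is in the same ballpark as the stalls reported earlier in the thread.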