From: Elana Hashman <ehashman@twosigma.com>
To: tytso@mit.edu
CC: linux-ext4@vger.kernel.org
Subject: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels
Date: Thu, 8 Nov 2018 17:59:18 +0000

Hi Ted,

We've run into a mysterious "phantom" full filesystem issue on our
Kubernetes fleet. We first encountered the issue on kernel 4.1.35 and
are still experiencing it after upgrading to 4.14.67. Essentially, `df`
reports our root filesystems as full, and they behave as though they
are full, but the "used" space cannot be accounted for. Rebooting the
system, remounting the root filesystem read-only and then remounting it
read-write, or booting into single-user mode all free up the "used"
space. The disk slowly fills up over time, suggesting some kind of
leak: we previously saw this affect hosts with ~200 days of uptime on
the 4.1 kernel, but are now seeing it affect a 4.14 host with only ~70
days of uptime.

Here is some data from an example host running the 4.14.67 kernel. The
root disk is ext4.

$ uname -a
Linux 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux

$ grep ' / ' /proc/mounts
/dev/disk/by-uuid/ / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0

`df` reports 0 bytes free:

$ df -h /
Filesystem          Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/   50G   48G     0 100% /

Deleted but still-open files account for almost no disk capacity:

$ sudo lsof -a +L1 /
COMMAND    PID  USER   FD  TYPE DEVICE SIZE/OFF NLINK    NODE NAME
java      5313  user    3r  REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java      5313  user   11u  REG    8,3    55185     0 2494654 /tmp/classpath.1668Gp (deleted)
system_ar 5333  user    3r  REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java      5421  user    3r  REG    8,3  6806312     0 1315847 /var/lib/sss/mc/passwd (deleted)
java      5421  user   11u  REG    8,3   149313     0 2494486 /tmp/java.fzTwWp (deleted)
java      5421  tsdist 12u  REG    8,3    55185     0 2500513 /tmp/classpath.7AmxHO (deleted)

`du` can only account for 16GB of file usage:

$ sudo du -hxs /
16G     /

But most puzzling are the numbers reported by e2freefrag, which don't
add up:

$ sudo e2freefrag /dev/disk/by-uuid/
Device: /dev/disk/by-uuid/
Blocksize: 4096 bytes
Total blocks: 13107200
Free blocks: 7778076 (59.3%)

Min. free extent: 4 KB
Max. free extent: 8876 KB
Avg. free extent: 224 KB
Num. free extent: 6098

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    4K...    8K-  :          1205          1205    0.02%
    8K...   16K-  :           980          2265    0.03%
   16K...   32K-  :           653          3419    0.04%
   32K...   64K-  :          1337         15374    0.20%
   64K...  128K-  :           631         14151    0.18%
  128K...  256K-  :           224         10205    0.13%
  256K...  512K-  :           261         23818    0.31%
  512K... 1024K-  :           303         56801    0.73%
    1M...    2M-  :           387        135907    1.75%
    2M...    4M-  :           103         64740    0.83%
    4M...    8M-  :            12         15005    0.19%
    8M...   16M-  :             2          4267    0.05%

This looks like a bug to me; the histogram in the manpage example has
percentages that add up to 100%, but these don't even add up to 5%.

After a reboot, `df` reflects the real utilization:

$ df -h /
Filesystem          Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/   50G   16G   31G  34% /

We are using overlay2 for Docker, as well as rbd mounts; I'm not sure
how they might interact with this.

Thanks for your help,

--
Elana Hashman
ehashman@twosigma.com
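P.S. A quick arithmetic check of the e2freefrag output above (all
numbers copied from the quoted histogram; nothing else assumed) shows
where the accounting breaks down: the histogram buckets cover only
~4.5% of the reported free blocks, and the free space missing from the
histogram is roughly the same size as the phantom usage.

```python
# "Free Blocks" column of the e2freefrag histogram, top to bottom.
hist_free_blocks = [1205, 2265, 3419, 15374, 14151, 10205,
                    23818, 56801, 135907, 64740, 15005, 4267]

free_blocks = 7778076  # "Free blocks" reported by e2freefrag
block_size = 4096      # "Blocksize" reported by e2freefrag, in bytes

accounted = sum(hist_free_blocks)
pct_of_free = 100.0 * accounted / free_blocks
missing = free_blocks - accounted
missing_gib = missing * block_size / 2**30

print(f"histogram accounts for {accounted} of {free_blocks} "
      f"free blocks ({pct_of_free:.2f}%)")
print(f"unaccounted free blocks: {missing} (~{missing_gib:.1f} GiB)")
```

The ~28 GiB of free blocks absent from the histogram is on the same
order as the gap between df's 48G "used" and du's 16G, so it may be the
same unaccounted space.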