2021-02-13 15:56:35

by bugzilla-daemon

Subject: [Bug 211733] New: ext4 file system unrecoverable corruption

https://bugzilla.kernel.org/show_bug.cgi?id=211733

Bug ID: 211733
Summary: ext4 file system unrecoverable corruption
Product: File System
Version: 2.5
Kernel Version: 5.4.0-65-generic
Hardware: i386
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: ext4
Assignee: [email protected]
Reporter: [email protected]
Regression: No

Kubuntu 20.04, two-week-old installation
500GB SATA HDD, 50GiB / partition, 180GiB /home partition
Dual boot with Win7 on 50GiB partition

Observation:
I was switching between Win7 and Linux, with multiple reboots in short spans
of time (<5 min). From Linux, using Dolphin, I moved ~5MB of document files
from the Win7 partition to the Linux /home/xxx/Documents directory and rebooted
the system to return to Win7. I made changes in Win7 as needed and booted back
into Linux. I noticed the entire Documents directory was missing, about 50GiB
of files. I immediately shut down the system and booted Linux from a duplicate
drive containing an image from about two weeks prior. I made a read-only image
of the /home directory from the corrupted drive and placed it on an external
1 GiB backup drive.

Using R-Linux, extundelete, and debugfs, I can find no trace of the Documents
directory on either the image or the original /home directory. I can, however,
see files I intentionally deleted during normal use over the week or more
before the event.

fsck and smartctl indicate no disk issues.

I have not tried to reproduce this issue.

This event seems very similar to the one discussed in this link, but I have not
been able to locate that particular bug.

https://www.itnews.com.au/news/stable-linux-kernels-hit-by-serious-file-system-bug-320709

I entered a bug report on the bugs.kde.org bug tracker (432762) but was told
that the issue lies at a lower level than the Dolphin GUI I was using.

Apologies if this is a duplicate, but I could not find a similar issue on this
tracker.

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


2021-02-16 18:32:47

by bugzilla-daemon

Subject: [Bug 211733] ext4 file system unrecoverable corruption

Theodore Tso ([email protected]) changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]

--- Comment #1 from Theodore Tso ([email protected]) ---
The symptoms may be the same as a news article from 8 or 9 years ago, but that
particular bug was solved a *long* time ago.

Unfortunately, there are many different potential causes of data loss. It
could be caused by bad partition tables, such that (for example) the Windows 7
partition overlaps with the Linux partition (or Windows 7 thinks that it does).
It could be caused by hardware problems. It could be caused by the user
incorrectly using the GUI. There's no way to tell given the complete lack of
data in the bug report.

It's much like sending a doctor an e-mail complaining of tightness in the
chest and trouble breathing, but not giving the doctor any medical history or
any way to take an ECG, etc.

You're going to have to reproduce it, and do this with a large number of small
checks. Try copying data from Windows 7 to Linux. Check to see if the data
is there in Linux. Try rebooting from Linux into Linux, and see if the data is
there. Then try rebooting into Windows and do some things, recording exactly
what you are doing, and then try rebooting back into Linux and check the
Documents folder.
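The check-after-each-step procedure above is easier to repeat reliably if it is scripted. A minimal Python 3 sketch, assuming only the standard library: record a SHA-256 manifest of the Documents folder right after copying, then compare the folder against it after every reboot. The function names (`snapshot`, `verify`, etc.) and the manifest location are hypothetical illustrations, not part of any existing tool. Keep the manifest outside the folder being checked (ideally on another drive), or it will itself show up as an added file.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each regular file's path, relative to root, to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def snapshot(root: Path, manifest_path: Path) -> int:
    """Write the manifest for root as JSON; return the number of files recorded."""
    manifest = build_manifest(root)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return len(manifest)


def verify(root: Path, manifest_path: Path) -> dict:
    """Compare root against a saved manifest and list what differs."""
    old = json.loads(manifest_path.read_text())
    new = build_manifest(root)
    return {
        "missing": sorted(old.keys() - new.keys()),  # recorded earlier, now gone
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "added": sorted(new.keys() - old.keys()),  # present now, not recorded
    }
```

For example, call `snapshot(Path.home() / "Documents", manifest)` right after the copy and `verify(...)` with the same arguments after each reboot (the `manifest` path is up to you). A non-empty `missing` or `changed` list pinpoints the step at which data went away, which is exactly what a bug report needs.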

Then (using a command line interface, so it's easier to capture the output and
report it to a bug tracker), you need to get a printout of the partition table,
and/or the Logical Volume and Physical Volume layout if you are using LVM, and
also grab the kernel logs to see if there are any errors reported by the file
system or device drivers, etc.
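Gathering that information can be scripted so the output is captured verbatim for the tracker. A minimal sketch, assuming a systemd-based distribution such as Kubuntu 20.04 (so `journalctl` is available) and that it is run as root so `fdisk`, `blkid` and the LVM tools can read the disks. The command list is an illustrative choice, not one given in the comment. Tools that are not installed are recorded as such rather than aborting the run, and `journalctl -b -1` only returns the previous boot's kernel messages if the journal is stored persistently.

```python
import subprocess
from pathlib import Path

# Illustrative command list; run as root so fdisk, blkid and LVM tools can read devices.
COMMANDS = [
    ["lsblk", "-f"],                   # block devices with filesystem type, label, UUID
    ["fdisk", "-l"],                   # partition tables (start/end sectors show overlaps)
    ["blkid"],                         # filesystem signatures per partition
    ["pvs"],                           # LVM physical volumes, if LVM is in use
    ["lvs"],                           # LVM logical volumes, if LVM is in use
    ["journalctl", "-k", "-b", "0"],   # kernel messages from the current boot
    ["journalctl", "-k", "-b", "-1"],  # kernel messages from the previous boot
]


def run(cmd: list) -> str:
    """Run one command and return its output, or a note explaining why it failed."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except FileNotFoundError:
        return "(command not installed)\n"
    except subprocess.TimeoutExpired:
        return "(timed out)\n"
    return proc.stdout + proc.stderr + f"(exit status {proc.returncode})\n"


def collect(out_file: Path, commands=COMMANDS) -> str:
    """Write a plain-text report with a header line per command and return it."""
    sections = [f"$ {' '.join(cmd)}\n{run(cmd)}" for cmd in commands]
    report = "\n".join(sections)
    out_file.write_text(report)
    return report
```

Running `collect(Path("diagnostics.txt"))` as root as soon as a problem is noticed, before rebooting again, produces a single file that can be attached to the bug as-is.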

If you don't know how to do this, it's much more likely that the problem is
user error, and my best suggestion is to find a local Linux user's group and
ask for help. Those folks might ask lots of potentially insulting questions,
such as making sure that you were cleanly shutting down the system before
rebooting back from Linux to Windows, or before powering down the computer; but
those sorts of questions tend to be less insulting when someone asks you in
person as opposed to via phone or e-mail tech support when people are obligated
to ask the "are you sure the computer is plugged in" kind of basic questions.

Good luck!


2021-02-17 13:40:21

by bugzilla-daemon

Subject: [Bug 211733] ext4 file system unrecoverable corruption

--- Comment #2 from [email protected] ---
Thank you very much for such a detailed response. I acknowledge the lack of
actionable data in the initial report. The event was initially anticipated to
be a recoverable crisis and so no log data was captured to report. In
hindsight, this was a mistake.

I do not think intentional reproduction of the event will occur. Recovery from
this event was difficult and I am still not whole. I would have to set up a
separate machine with sacrificial data before I could attempt it without
feeling at extreme risk.
However, should such a repetition occur, I will be much more detailed with my
report.

I greatly appreciate your patience, insight and attention to detail in your
response.


2021-02-17 16:39:47

by bugzilla-daemon

Subject: [Bug 211733] ext4 file system unrecoverable corruption

--- Comment #3 from Theodore Tso ([email protected]) ---
Free advice? Before you do anything else, back up *everything* before you
even breathe on the system. You may think it's not going to reproduce again,
but if it does, you may end up losing more data.

I tend to keep things very simple. Which is to say, I try not to dual-boot
Windows and Linux, and if I do, I use separate HDDs for the Windows and Linux
systems. So if I were doing anything like this at all, I'd boot into a Linux
system, and then copy everything from the Windows partition to the Linux
partition in a single go, and then be done with it. The KISS (Keep it simple,
stupid) principle is always a good one to follow, especially with valuable
data.

And we're only talking about a 500GB HDD. Getting a second 500GB disk, or for
that matter, an external 1TB HDD or even SSD, is cheap, compared to the value
of your time.

Backups. Backups. Backups. I've worked at MIT, and seen a graduate student
lose ten years' worth of their research data due to lack of backups. One could
perhaps claim that someone who was dumb enough not to make backups doesn't
deserve to have a Ph.D., but regardless, it's still a tragedy; and totally
avoidable.
