To: Jan Kara, pali.rohar@gmail.com, reinoud@netbsd.org
Cc: Colin King, "linux-kernel@vger.kernel.org", linux-fsdevel@vger.kernel.org
From: Steve Magnani
Subject: [RFC] udftools: steps towards fsck
Message-ID: <17e5fea5-8d76-c96d-8902-9050acba4288@digidescorp.com>
Date: Wed, 6 Mar 2019 20:44:54 -0600

(Please remove at least LKML when responding. Mailing lists are a scattershot attempt to reach others who might be interested in this topic since I'm not aware of any linux-udf mailing list.)

A few months ago I stumbled across an interesting bit of abandonware in the Sourceforge CVS repo that hosted UDF development through about 2004. Code that originated here eventually became the modern-day udftools:

    https://sourceforge.net/p/linux-udf/code/

The 'udf' module in that repo contains a program from 1999 named 'chkudf', which appears to have been written by Rob Simms.
Being from the Y2K era, the program has no awareness of anything beyond UDF2.01; in particular, its comprehension of VAT reflects UDF1.50 and not the revamped design introduced in UDF2.00. But it does have an ability to analyze the major UDF data structures and to walk the filesystem.

I've spent quite a bit of time enhancing and fixing bugs in this code, with a short-term goal of being able to report damage to UDF2.01 filesystems on "hard disk" (magnetic and SSD) media. It's not quite release-ready, but I think the code is on the cusp of becoming useful to others, so I wanted to get some feedback on the approach. I posted a Git port (via SVN) of the CVS repo here, including all the changes I've made so far:

    https://github.com/smagnani/chkudf.git

If you're interested in building the code you should be able to just run 'make' within the chkudf folder. On Debian-derived systems you'll need libblkid-dev installed in order to build.

Some questions for consideration:

* Would a udffsck limited to checking of UDF2.01 and earlier on "hard disk" media be a sufficiently useful starting point to justify inclusion in udftools? Obviously a tool with such limitations would have to be particularly vigilant about ensuring that media-under-test doesn't exceed its capabilities.
* If so, do you think the chkudf implementation could qualify? It's not ready yet, but with an investment of some time and energy it could be made more functionally complete and (maybe more importantly) more user-friendly. In part this is a question of whether the chkudf design can support enhancements to get (eventually) to UDF2.60 and optical media support, balanced against the many years without an open-source udffsck and not "letting the perfect become the enemy of the good."
* For any standards-based parser it's important to have examples of as many variations as possible (both normal and pathological) in order to ensure that corner cases and less common features are tested properly. Can anyone point me to any good sources of UDF data for testing? There are always commercial DVDs and Blu-ray discs, of course, and I've cobbled together a few special cases by hand (e.g., a filesystem with directory cycles), but I have no examples with extended attributes or stream data. If I could find a DVD of Mac software in a resale shop, would that help?
  [Side note: I've thought of enhancing chkudf to support a tool that would store all the UDF structures of a filesystem in a tarball that could be used to reconstitute that filesystem within a sparse file. Since none of the file contents would be stored, the tarballs would be relatively small even if they represent terabyte-scale filesystems.]
* Are there versions (or features) of UDF that are less important to support than others (1.50? Strategy 4096? Named streams? etc.)? I know 1.02, 2.01, and 2.50 are in wide use.
* What kinds of repairs are most important to implement? I was thinking that regeneration of the Logical Volume Integrity Descriptor and the unallocated space bitmap are both important and hopefully relatively straightforward. Beyond that...recovering ICBs to "lost+found"?

My 2 cents: I didn't write this program. There are things I would have done differently, but to this point I have tried to work within the existing design and code style. After becoming more aware of differences between the various UDF standards (in particular, the increase in complexity since 2.01) and the many errata involved, I have a gut feeling that an implementation in a language that supports inheritance might be a lot more manageable over the long term - but it's not something I've spent a lot of time thinking about. I've only recently become aware of UDFclient, and haven't had time to look over its design yet.
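
To make the bitmap regeneration idea from the repair question above a little more concrete, here is a minimal Python sketch. It is not chkudf code and does not read or write any on-disk structure. The function names and the (start, count) extent tuples are hypothetical, and it assumes the usual UDF convention that a set bit marks an unallocated block, with block n stored in bit (n mod 8) of byte (n div 8). A real checker would gather the extents by walking every ICB and would treat overlapping extents as damage.

```python
def build_unallocated_bitmap(total_blocks, allocated_extents):
    """Rebuild a free-space bitmap from a list of allocated extents.

    total_blocks      -- number of blocks in the partition (hypothetical input)
    allocated_extents -- iterable of (start_block, block_count) tuples
    Returns a bytearray in which a set bit means "block is unallocated".
    """
    bitmap = bytearray((total_blocks + 7) // 8)

    # Mark every real block as free; padding bits in the last byte stay 0.
    for block in range(total_blocks):
        bitmap[block // 8] |= 1 << (block % 8)

    # Clear the bit of every block covered by an allocated extent.
    for start, count in allocated_extents:
        if start < 0 or count < 0 or start + count > total_blocks:
            raise ValueError(f"extent ({start}, {count}) lies outside the partition")
        for block in range(start, start + count):
            bitmap[block // 8] &= ~(1 << (block % 8)) & 0xFF

    return bitmap


def count_free(bitmap):
    """Count the set bits, i.e. the number of unallocated blocks."""
    return sum(bin(byte).count("1") for byte in bitmap)


# Toy example: 20 blocks, with blocks 0-2 and 10-11 in use.
example = build_unallocated_bitmap(20, [(0, 3), (10, 2)])
print(example.hex(), count_free(example))  # prints "f8f30f 15"
```

The free-block count that falls out of the rebuilt bitmap is the kind of value a regenerated Logical Volume Integrity Descriptor would presumably also need, which is one reason the two repairs seem to belong together.
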
I can also see the potential for follow-on utilities such as a filesystem resizer - which might argue for making more of the code library-based and not so heavy on printed output.

Bottom line...udffsck has to start somewhere; could it start with chkudf?

Thanks for reading.

------------------------------------------------------------------------
 Steven J. Magnani               "I claim this network for MARS!
 www.digidescorp.com              Earthling, return my space modulator!"

 #include