Sometimes when you try to extract a .tar.gz, .tar.xz, .tar.bz2 or even a regular zip file, you get an error such as xz: (stdin): File format not recognized, which indicates that the archive couldn’t be decompressed. Ubuntu and other Debian-derived distributions of Linux provide a utility called file that will let you know if what you’re trying to extract is properly named. Occasionally a compressed archive gets misnamed, and sometimes a mistake by a Web browser leaves something that isn’t a compressed archive named as though it were one. While Linux and many other Unix-like operating systems don’t rely on file extensions to the degree that DOS and Windows do, many tools and users still use them to identify compressed archives.
If the file utility tells you that the archive is of a different type than its name suggests, you simply need to change the name to the correct extension and then attempt to extract it again. If it isn’t an archive at all, the file utility will still identify the correct type. More than likely you’ll find that the supposed archive is an HTML file, but caution should always be used when extracting archives you download from the Internet anyway. Digital criminals sometimes disguise files as archives in order to cause problems for users, so the file utility’s advice should be taken to heart.
Identifying File Types Irrespective of Extensions
It’s always a good idea to run a malware scan on archives before extracting them, but assuming nothing turned up, you might still see several types of error messages. On top of those from xz or gunzip, you might also see error messages from the tar program. If you’re getting errors that read tar: Child returned status 1 or tar: Error is not recoverable: exiting now, then you might be extracting something that shouldn’t be extracted, or at least not in the way you’ve told tar to do so. You may already have tried unxz or other programs, only to get the same errors each time.
From the CLI prompt that you’ve been working with, try file theFileName.tar.xz, replacing theFileName.tar.xz with the name of the file that you’re actually working with. The extension may currently be .tar.gz, .tar.bz2, .txz, .tgz or one of several other permutations. The file command doesn’t trust the extension. Instead, it reads bytes near the start of the file and looks for a known signature, which is sometimes called a magic number. This so-called magic test compares those bytes against a table of signatures that covers many different types of files. If file finds that it’s actually a text file of some sort, then it will report what encoding the text is in.
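To make the idea of a magic test concrete, here is a minimal sketch in shell that compares the leading bytes of a file against four well-known signatures. The function name magic_type is hypothetical and is not part of any package, and the real file utility consults a far larger table with much more elaborate rules, so treat this only as an illustration of the principle.

```shell
#!/bin/sh
# magic_type is a hypothetical helper (not part of the file package).
# It prints a format name based on the first bytes of the file given as $1.
magic_type() {
    # Dump the first six bytes as one hex string, e.g. "fd377a585a00".
    sig=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        fd377a585a00) echo "xz" ;;       # FD 37 7A 58 5A 00
        1f8b*)        echo "gzip" ;;     # 1F 8B
        425a68*)      echo "bzip2" ;;    # the ASCII letters "BZh"
        504b0304*)    echo "zip" ;;      # "PK" followed by 03 04
        *)            echo "unknown" ;;  # anything else, such as HTML text
    esac
}

# Demo: write the six-byte xz signature to a scratch file and identify it.
demo=$(mktemp)
printf '\375\067\172\130\132\000' > "$demo"
magic_type "$demo"
rm -f "$demo"
```

Because the check never looks at the file name, a Web page saved as something.tar.xz would come back as unknown here, which is the same reason file can spot a misnamed download.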
For instance, you might see file.tar.xz: HTML document, UTF-8 Unicode text, with very long lines, which indicates your browser actually downloaded a Web page instead of an archive. A faulty wget command could produce the same result. There’s no way any extraction will get any files out of a file like that. If file does report that it is indeed a correctly formatted .xz compressed file, then you might want to try apt list xz-utils to make sure the xz packages are installed, though both Ubuntu and Debian generally require their installation anyway for package management purposes. The same goes for all of the various derivatives of Ubuntu, like Lubuntu and Kubuntu.
The file utility will sometimes return nothing more than the word data. While this might be accurate for some files created by online games or binary editors, it’s not something that you should see from an archive, and it might indicate file corruption. A result of data could also theoretically correspond to some of the proprietary formats that Classic Macintosh and later OS X used, which shouldn’t usually be extracted under Linux anyway. If file tells you that a compressed archive is actually a Windows or MS-DOS executable, it might very well be a piece of malware designed to attack Windows PCs.
You might see something like theFileName.zip: Zip archive data, at least v2.0 to extract as a return type. If the file currently ends in .tar.xz, you can rename it to .zip to properly extract it in that case. You may instead need to rename it .tar.bz2 or .tar.gz depending on what output the file utility gave you. Once you’ve done this, you can extract the archive like normal, even if you weren’t able to before. If you have a compressed file, then you could also use file -z followed by its name to make file look inside the compressed data and report the type of what it contains, rather than only the type of compression. This describes the compressed stream as a whole and does not list each member of a ZIP or tar archive.
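The renaming step can be sketched as a small lookup from a MIME type to a conventional extension. The function name suggest_ext is hypothetical, and the sketch assumes the MIME strings below are what your version of file prints with the --brief and --mime-type options. Note that a compressed tarball is reported by its compression alone, so whether the right name is .gz or .tar.gz still depends on what is inside.

```shell
#!/bin/sh
# suggest_ext is a hypothetical helper. It maps a MIME type string, such as
# the one printed by `file --brief --mime-type`, to a conventional extension.
# It returns a non-zero status for types it does not know.
suggest_ext() {
    case "$1" in
        application/x-xz)                    echo "xz" ;;
        application/gzip|application/x-gzip) echo "gz" ;;   # newer and older spellings
        application/x-bzip2)                 echo "bz2" ;;
        application/zip)                     echo "zip" ;;
        application/x-tar)                   echo "tar" ;;
        *)                                   return 1 ;;    # not a known archive type
    esac
}

# Demo with a literal MIME string, so it runs without any sample file.
suggest_ext application/zip

# Typical use, assuming the file utility is installed:
#   mime=$(file --brief --mime-type theFileName.tar.xz)
#   ext=$(suggest_ext "$mime") && echo "contents look like .$ext"
```

Keeping the lookup separate from the call to file means it only ever suggests a name. Renaming is left to you, which seems safer when the download might not be an archive at all.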
A result that mentions a PE32 executable for Intel 80386 when running the file utility on a compressed file with the -z option could indicate that there is legitimately a Windows program inside. If this is the case, and you’ve made sure to run multiple malware scans on it, then you might be able to run it with the Wine compatibility layer after extracting it. Some of the lines that file returns can be quite long, so you may wish to push F11 inside your terminal window. This makes it large enough to cover the entire desktop, without having to resort to a Linux virtual console.
You may also wish to try using the --apple switch, which makes file print the type and creator codes used by older versions of Mac OS. You may need these if you’re trying to share files with users of other operating systems.
Keep in mind that file will identify some types of files as either ASCII or Unicode text even when a user might not think they would be classified as such. A .csv file is a plain text file that stores spreadsheet data as comma-separated values. The file command will typically describe a .csv made on a Windows machine as ASCII text, with CRLF line terminators, and if you make one on your own Ubuntu machine, it might call it Unicode text. Some newer versions of file may report CSV text instead. This isn’t an error that indicates a file extension is wrong, but just a peculiarity of the way it classifies files.
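The CRLF remark refers to the carriage return that Windows programs place before each newline. As a rough sketch of what file is noticing, the hypothetical helper has_crlf below checks whether any line in a text file ends with a carriage return. It does not reproduce the wording of file’s output.

```shell
#!/bin/sh
# has_crlf is a hypothetical helper. It succeeds when at least one line of
# the file given as $1 ends in a carriage return (Windows-style endings).
has_crlf() {
    cr=$(printf '\r')        # a literal carriage return character
    grep -q "${cr}\$" "$1"   # match a CR immediately before the end of a line
}

# Demo: the same two-row CSV written with each style of line ending.
win=$(mktemp)
nix=$(mktemp)
printf 'name,size\r\nfoo,1\r\n' > "$win"
printf 'name,size\nfoo,1\n' > "$nix"

if has_crlf "$win"; then echo "CRLF line endings"; else echo "LF line endings"; fi
if has_crlf "$nix"; then echo "CRLF line endings"; else echo "LF line endings"; fi

rm -f "$win" "$nix"
```

Both sample files hold identical spreadsheet data, so a difference in how file describes them comes down to line endings and character encoding rather than to anything being wrong with the .csv extension.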