Windows $MFT and NTFS Metadata Extractor Tool (ntfswalk)

ntfswalk is a prototype tool that traverses a specified NTFS volume, reading every MFT entry and pulling predefined statistics.

Originally, the engine was designed as a widget for other prototypes, to help pull data from targeted categories of files on NTFS partitions. Later, it was determined that a stand-alone utility would be helpful for debugging and for understanding the internals of any NTFS volume. The new tool, ntfswalk, is named after its ability to walk an entire NTFS volume and output each MFT entry it encounters. It is released here to generate discussion and feedback.

The architecture pulls all NTFS metadata, but for this prototype version, only a handful of artifacts were selected to be extracted for demo purposes.

Although designed to work with live NTFS partitions, ntfswalk can also traverse NTFS images created with the dd utility (as well as some versions of VMware vmdk files). There are options to filter on file extension, timestamp range and partial filename. For the files found, one can list the summary metadata, extract the header bytes, or extract the entire file contents into a designated directory. Since the engine is Windows API agnostic, there are compiled versions for Windows, Linux and Mac OS X.

Running this tool on a live Windows system requires administrator privileges.


Enhancements starting with version 0.37

Starting with version 0.37, the NTFS parsing engine has been enhanced to traverse all the parent directories for a specific MFT entry. Since NTFS supports 'hard links', a single MFT entry can have multiple parents (directories) that it belongs to. In addition to multiple parents, an MFT entry can also have different 'long' filenames. The bottom line is that there can be more than one filename attribute per MFT entry, and all of them need to be considered when reporting timestamps. To accommodate this additional data, the output of this version was redone in the 'macb' [Modify, Access, Changed MFT, Birth] format, which allows easy display of the timestamps associated with every parent directory and/or differing filename. Thus, if an MFT entry has more than one parent directory or multiple names, there will be separate 'macb' entries to accommodate these attributes.
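To illustrate why the 'macb' notation is compact, here is a minimal Python sketch that collapses the four timestamps of one filename attribute into rows of (timestamp, flags). The function name, the sample timestamps and the dot-for-unset flag string (as used by mactime-style tools) are assumptions for illustration; this is not ntfswalk's own code or its exact column layout.

```python
def macb_rows(modified, accessed, changed, born):
    """Group four timestamps into sorted (timestamp, flags) rows.

    Timestamps that are equal share one row, so four identical
    values collapse into a single 'macb' row.
    """
    slots = [("m", modified), ("a", accessed), ("c", changed), ("b", born)]
    rows = {}
    for position, (letter, stamp) in enumerate(slots):
        # Start each new timestamp with all four flags unset ('.').
        flags = rows.setdefault(stamp, ["."] * 4)
        flags[position] = letter
    return sorted((stamp, "".join(flags)) for stamp, flags in rows.items())


# Hypothetical timestamps for one filename attribute of an MFT entry.
example = macb_rows(
    modified="2012-01-05 10:00:00",
    accessed="2012-03-01 08:30:00",
    changed="2012-01-05 10:00:00",
    born="2011-11-20 09:15:00",
)
for stamp, flags in example:
    print(flags, stamp)
```

An MFT entry with several filename attributes (a long name, an 8.3 name, or a second parent directory) would simply go through this grouping once per attribute, which is how duplicated timestamps end up taking little room.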

Also added was the capability for ntfswalk to parse an extracted $MFT file, rather than only an image of a drive or a live system. One caution when using this option: for MFT entries that rely on 'non-resident' attribute lists, the list data is stored in a cluster outside the normal $MFT set of clusters. This means that the attribute data associated with the list will not be reflected in the output.

More 'output format' options were implemented in this version. In addition to the default 'text' output, where each field is delimited by the pipe '|' character, there are now options for: (a) normal csv format, (b) The Sleuth Kit's v3 body-file format, and (c) a log2timeline csv format, to allow for easier data import into the super timeline tool. Of the output formats available, the default and normal csv formats present the most data to the user.
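For readers who want to post-process the body-file output, the following Python sketch parses one line in The Sleuth Kit's v3 body-file layout (MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime, with times in Unix epoch seconds). The sample line is invented for illustration and is not actual ntfswalk output; the sketch also assumes the filename itself contains no pipe character.

```python
from datetime import datetime, timezone

BODY_FIELDS = ("md5", "name", "inode", "mode", "uid", "gid",
               "size", "atime", "mtime", "ctime", "crtime")
TIME_FIELDS = ("atime", "mtime", "ctime", "crtime")


def parse_body_line(line):
    """Split one v3 body-file line into a dict with typed values."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) != len(BODY_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(BODY_FIELDS), len(parts)))
    record = dict(zip(BODY_FIELDS, parts))
    record["size"] = int(record["size"])
    for key in TIME_FIELDS:
        # Body-file times are seconds since the Unix epoch (UTC).
        record[key] = datetime.fromtimestamp(int(record[key]), tz=timezone.utc)
    return record


# Hypothetical line, made up for this example.
sample = ("0|/Windows/notepad.exe|1234-128-1|r/rrwxrwxrwx|0|0|193536|"
          "1325757600|1325757600|1325757600|1325721600")
entry = parse_body_line(sample)
print(entry["name"], entry["size"], entry["mtime"].isoformat())
```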


Summary of the options for ntfswalk

ntfswalk has a number of command line switches, and for the occasional user it can be confusing which options can be used together and which cannot. The figure below divides ntfswalk's processing flow into 4 main areas. The first area identifies which data sources ntfswalk can handle for input. The second area defines any desired filtering of the results; one can filter on deleted files/folders, filter by file extension, etc. The third area allows one to extract certain data: either (i) file data, which is copied to a separate folder, or (ii) metadata, which gets appended to the results file. For this version we've only allowed 'unnamed data' types to be copied to a separate directory. The fourth area allows one to select how the data should be formatted. The default is plain text, which has reasonable formatting when viewed in Notepad with word wrap turned off. The other formats are geared for spreadsheet views or other post-processing tools. Most of the numeric data in the output defaults to hexadecimal. Based on popular request, there is a [-base10] option to transform the output into base10 notation.

ntfswalk flow

The Command Line options for the above

This figure shows the syntax of each of the options, grouped by area discussed above. It also identifies which options can be used in combination with others. Therefore, one can select: (a) one source of input, (b) none or any combination of filters, (c) none or one extraction option and (d) one type of format for the output results.
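The combination rules above can be sketched as a small pre-flight check in Python. This is only an illustration: the helper name is hypothetical, and the flag sets contain only the switches that appear in the examples on this page, so they are incomplete. It also assumes that every extraction switch starts with "-action_", which matches the two switches shown here but is not confirmed for the full option list.

```python
# Only the switches named in this page's examples; the real tool has more.
SOURCE_FLAGS = {"-partition"}
FORMAT_FLAGS = {"-csv"}  # no format flag means the default text output


def check_combination(flags):
    """Return a list of rule violations; an empty list means it looks valid."""
    sources = [f for f in flags if f in SOURCE_FLAGS]
    actions = [f for f in flags if f.startswith("-action_")]  # assumption
    formats = [f for f in flags if f in FORMAT_FLAGS]

    problems = []
    if len(sources) != 1:
        problems.append("exactly one input source is required")
    if len(actions) > 1:
        problems.append("at most one extraction option is allowed")
    if len(formats) > 1:
        problems.append("at most one output format is allowed")
    # Filters are unrestricted: none, one, or any combination.
    return problems


print(check_combination(
    ["-partition", "-filter_deleted_files", "-filter_ext",
     "-action_copy_files", "-csv"]))
```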

ntfswalk cmdline options

Here are some examples...

Let's say you wanted to search all the names in a live volume that contain the string "wordpad.exe" and store the output in csv format. That way you could double-click the resulting csv file and open it directly in Excel. The syntax for scanning the 'c' partition and redirecting the output to a results file would be the following:

       ntfswalk -partition c -filter_name "wordpad.exe" -csv > results.csv
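If the search needs to be scripted, the same invocation can be wrapped in Python. This is a sketch under assumptions: the two helper names are hypothetical, and the executable is assumed to be reachable as "ntfswalk" on the PATH. Only the switches from the command above are used.

```python
import subprocess


def build_command(partition, name_filter, executable="ntfswalk"):
    """Build the argument list for a filename search with csv output."""
    return [executable, "-partition", partition,
            "-filter_name", name_filter, "-csv"]


def run_to_file(command, output_path):
    """Run the command and write its standard output to output_path."""
    with open(output_path, "wb") as handle:
        # check=True raises CalledProcessError on a non-zero exit code.
        return subprocess.run(command, stdout=handle, check=True)


command = build_command("c", "wordpad.exe")
print(" ".join(command))
# On a live Windows system, from an administrator prompt:
# run_to_file(command, "results.csv")
```

Passing the arguments as a list avoids shell quoting problems with filters that contain spaces.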

When examining the results.csv file, one would see prefetch, 'mui' and 'exe' entries all containing the string 'wordpad.exe'. Since the prefetch entry has a name longer than the DOS 8.3 length, a normal Windows system would have one set of timestamps for the long filename and another set for the 8.3 version of the filename. Many of these timestamps are duplicates, so by using the compressed 'macb' timestamp notation, one can show all the pertinent data without taking up too much room, as is highlighted below. Also highlighted are entries where there are multiple parent directories for one MFT entry (in this case, there are 2 parents for wordpad.exe). This means that wordpad.exe, as a single MFT entry, has a secondary 'hard link' to another directory.

wordpad.exe search

Other data that can be extracted from ntfswalk includes cluster information. By using the option [-action_include_clusterinfo], one can view all the cluster information available for each attribute that contains data. Below is an example:

       ntfswalk -partition c -action_include_clusterinfo -csv > results.csv

The figure shows a snapshot of a sample dump. After trimming out some of the rows/cols, one can see the data type, filename and the location where the data resides. For those datasets that are easily parsed, such as the volume information or object identifier, ntfswalk shows the interpreted data. For other entries, the cluster information is shown, if applicable.

cluster info

As a final example, if one wishes to extract the cluster data associated with an MFT entry, one can use the [-action_copy_files <directory to store extracted files>] option. In this case, we enumerate only those 'deleted' files that have an extension of 'lnk', and then tell ntfswalk to copy each of the clusters associated with the resulting files to a 'dump' directory. The syntax of the command is:

       ntfswalk -partition c -filter_deleted_files -filter_ext "lnk" \
              -action_copy_files c:\dump\deleted.lnk -csv > results.csv

The first figure shows each MFT entry and the associated path/name of the extracted file. The second figure shows the extracted files themselves. Each extracted file is named using the pattern <last modify date>_<md5 hash>_<filename w/ extension>_<data type>.bin
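The naming pattern above can be taken apart again with a short Python sketch. It relies on assumptions: the md5 hash is taken to be 32 hexadecimal characters, the data type is assumed to contain no underscore, and the date rendering is left as an opaque string because its exact format is not specified here. The function name and the sample name are hypothetical.

```python
import re

# <date>_<md5>_<filename w/ extension>_<data type>.bin
EXTRACTED_NAME = re.compile(
    r"^(?P<date>.+?)_(?P<md5>[0-9a-fA-F]{32})_(?P<rest>.+)\.bin$")


def split_extracted_name(name):
    """Return the four parts of an extracted file name, or None."""
    match = EXTRACTED_NAME.match(name)
    if match is None:
        return None
    # The original filename may contain underscores, so split from the right.
    filename, separator, data_type = match.group("rest").rpartition("_")
    if not separator:
        return None
    return {
        "date": match.group("date"),
        "md5": match.group("md5").lower(),
        "filename": filename,
        "data_type": data_type,
    }


# Made-up name that follows the pattern; not taken from real output.
parts = split_extracted_name(
    "2012-01-05_d41d8cd98f00b204e9800998ecf8427e_my_shortcut.lnk_data.bin")
print(parts)
```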

cluster info
cluster info

Downloads

            32-bit Version                 64-bit Version
Windows:    ntfswalk32.v.0.44.win.zip      ntfswalk64.v.0.44.win.zip      md5/sha1
Linux:      ntfswalk32.v.0.44.lin.tar.gz   ntfswalk64.v.0.44.lin.tar.gz   md5/sha1
Mac OS X:   ntfswalk.v.0.44.osx.tar.gz     ntfswalk.v.0.44.osx.tar.gz     md5/sha1
*32-bit apps can run on a 64-bit Linux distribution if "ia32-libs" (and its dependencies) are present.