Windows LNK Parsing Utility (lp)
lp is a prototype version of a Windows SHLLINK parser that was designed to operate on shortcut files, but can parse SHLLINK artifacts from files that generate Jump Lists as well. Originally inspired by the forensic class taken from the SANS Institute back in Jan 2010, lp is a useful tool for any computer forensic toolkit.
While shortcut files can reside in just about any directory, the primary location for many shortcut files is: %APPDATA%\ Microsoft\ Windows\ Recent\ <shortcut files>, where the %APPDATA% is resolved to C:\Users\<user account>\AppData\Roaming. This is where the operating system automatically creates a shortcut based on a user double clicking on an application to launch it.
Of interest to the forensic investigator is the metadata associated with this type of file, since they offer many useful artifacts when determining activity on a computer. Some of these artifacts include:
- a. The path to the target file/directory it references along with modify/access/create timestamps
- b. The size of the target when it was last accessed.
- c. Serial number of the volume where the target was stored.
- d. Network volume share name (if applicable).
- e. Target attributes, such as whether it was 'read only', 'hidden', 'system', etc.
- f. One of the MAC addresses associated with the host computer (available when an Object ID is present).
When trying to parse out the above artifacts, one can turn to the Microsoft open specification agreement, where there is a published version of the Windows SHLLINK format. From this specification, one can see many of the details needed to understand the structures of the format.
The parsing engine of lp makes use of the Microsoft specification to extract much of the shortcut internals. Where the specification lacked details, we ended up using empirical data to help understand some of the opaque data structure types allowing us to parse the SHLLINK format more fully.
SHLLINK Metadata and What lp Extracts
When creating tools that parse artifacts that still have unknowns associated with them, there is a balance on what data should be presented to the user and which should not. On one hand, we at TZWorks LLC personally like to see all the data artifacts, complete with file offsets, so we can trace each artifact in a hex editor. This allows one to hand carve the data and is very important to the reverser. However, this type of data is most likely to be too noisy for the normal user. Internally at TZWorks we create a do everything version and a limited version. Up till now we have not been labeling the limited versions as such. Therefore, starting with version 0.49, lp is labeled as limited.
The limited version has all the same functionality of the previous versions plus more. The limited version binary, however, is not compiled with some extra options used for pulling out additional information from SHLLINK metadata, primarily useful to reversing and testing the integrity of the shortcut file. Aside from these extra options, the limited version includes much more capabilities than the previous versions. Some of these additional options include: (a) carving SHLLINK metadata from images and live volumes, (b) handling the nuances of the Destinations files used in the Jump Lists and (c) additional output format options.
To see the lp's default output, and hence some of the SHLLINK metadata, below is a snapshot from one of the shortcut files in the Windows\Recent subdirectory.
When analyzing lp's output, a number of timestamp data is shown, including the shortcut file timestamps as well as the target (what the shortcut file points to) timestamps. The output should contain the size of the target, if it is a file vice a directory, and will contain a path to the target. As part of the SHLLINK specification, there is also what Microsoft calls a TrackerDataBlock. This is what we refer to as the object identifier (ID), since it is really the object ID of the NTFS MFT record associated with the target file or directory.
The object ID is another way to reference the target file/directory and ultimately allows the operating system a straight forward way to 'track' the target file or directory at the lower level NTFS object ID/MFT entry level. This object ID is part of the target file and moves where the target moves. In the SHLLINK metadata there are two object IDs: (a) one that is recorded when the shortcut is created, and (b) one that is the current one. For the most part these two object IDs are the same and will only differ in certain conditions. Internally, lp makes note of both object IDs, however, it will only display both if they are different.
Associated with the object ID is a creation timestamp and media access control (MAC) network interface identifier that was present during the object ID creation. The format of the object ID follows the Type 1 specification outlined in RFC 4122 (Universally Unique Identifier URN Namespace). Using this specification, one can extract the object time and MAC address from the object ID itself. What this means is there is NO timestamp or MAC network interface artifact explicitly present in the SHLLINK metadata. Any data shown in lp's output for these fields is from implicitly deriving it from the object ID itself. When analyzing the below MAC network interface extracted, it identified one of our VMWare network interfaces and not the primary computer's network address.
While there are a number of paths to the target, the key one is the one labeled 'ID List' shown above. It is generated from a series of SHITEMID structures embedded in the SHLLINK metadata that is used to construct the final path of the target. It is important for any SHLLINK parser to pull this out, since Windows defaults to the path described in this structure (if it exists) when resolving where the target file or directory is located.
How to use lp
For starters, lp is a console application. Therefore, to be able to access, and thus parse, shortcut files across all computer accounts, one will need to open the command prompt with administrator privileges first. Without administrator privileges, one will be restricted to only accessing your account's shortcut files or those common to the operating system.
One can display the menu options by typing in the executable name with no parameters. A screen shot of the menu is shown below. By using the options in various ways, one can process SHLLINK metadata with six general 'use-cases': (1) processing an individual shortcut file, (2) carving from a captured image, (3) extracting from Jump List files, (4) processing a collection of files, (5) carving from a live-mounted volume, and (6) carving from a VMWare volume.
These 'use-cases' are annotated in the screen shot below.
For output options, there are four possible formats to choose from: (1) default output, which is unstructured output. The screenshot in the previous section above is an example of what this output looks like. This information is useful if not trying to parse the artifacts into a database. (2) –csv (comma separated value) option will render the output so that all the metadata is rendered with one record per line with each field separated with a comma. The last two are: (3) –csvl2t and (4) –bodyfile. Each will attempt to conform to either the log2timeline format or the SleuthKit's body-file format, as appropriate. As shown in the menu above, these latter two options need to be validated by their respective sponsors.
Case 1: Parsing an Individual Shortcut File
The most basic option is to parse an individual shortcut file. To do so, just pass the name as the parameter to lp, as shown below, and the output will default to the long form shown in Section above.
lp <shortcut filename>
Case 2: Parsing a Capture Image for SHLLINK metadata
To parse an entire image of a drive that is contained in a file (eg. a 'dd' type image), use the –rawscan switch. This option tells lp to perform a full scan of any SHLLINK signatures found in the image. For each signature found, lp will attempt to carve out any SHLLINK metadata. This type of scan will carve out signatures from allocated, unallocated or slack space.
lp is able to scan very large files by reading a manageable chunk from the file at a time and output the results as they are generated. So if your image is many gigabytes in size, lp should be able to process the entire image without using too much memory or system resources. Of note, lp is agnostic as to the file system type, as it treats all formats the same. While this is good news in that it can work on any file system, it is also bad, in that it does not try to reconstruct files from a fragmented system. Empirical results show, however, that since the SHLLINK metadata is relatively small, the fragmentation of these files is close to nil. Thus, this type of scanning/carving/parsing shows a high success rate in gathering the artifacts.
When parsing a large image, there will presumably be many SHLLINK entries carved out, thus it is recommended to: (a) use the –csv option to place one record per line, and (b) redirect the output into a separate file. Below is an example:
lp c:\temp\dd_imagefile.bin -rawscan -csv > results.csv
Case 3: Parsing Automatic and Custom Destinations files used for Jump Lists
Jump Lists are a new feature, starting with Windows 7. They are similar to shortcuts files in that they take one directly to the files that are used on a regular basis. They are different than the normal shortcut files in that they are more extensible in what information they display. For example, in Internet Explorer, the Jump Lists will display websites frequently visited; for Microsoft Office products like Excel, PowerPoint and Word, they will show most recently opened documents.
From a user's standpoint, Jump Lists increase one's productivity by providing quick access to the files and tasks associated with one's applications. From a forensics standpoint, Jump Lists are a good indicator of which files were recently opened or which websites were visited frequently.
Windows derives the Jump List content from two sets of Destination files:
- a. %APPDATA%\Microsoft\Windows\Recent\AutomaticDestinations\[AppID].automaticDestinations-ms
- b. %APPDATA%\Microsoft\Windows\Recent\CustomDestinations\[AppID].customDestinations-ms
%APPDATA% is resolved to C:\Users\<user account>\AppData\Roaming. One can see that each user account (or profile) has its own set of Destination files.
For most automaticDestinations type files, lp can find and parse the SHLLINK metadata with no special command line options (eg. the default settings). This is because the automaticDestinations type files have a compound file signature, which is built into the lp scanning engine. lp will recognize this signature, reconstruct the allocated/unallocated sectors within the compound file and scan the chunks appropriately. On the other hand, the customDestinations type files only have SHLLINK signatures which do not necessarily occur on sector boundaries. Therefore, to assist lp, to parse this type of file, one invokes the –deepscan switch. This tells lp to scan in a mode that is in-between a normal LNK file scan and a captured image type scan. This switch has no effect on normal shortcut files, so it can be used to handle both shortcut files as well as automatic/custom Destinations files.
While lp does a good job at pulling out the SHLLINK metadata from both automatic and custom Destinations type files, it does not attempt to parse the MRU/MFU data from the automatic Destinations files. To parse these files in a complete fashion, one can use the jmp tool from TZWorks. The jmp tool understands how to parse both Destinations type files in a manner to extract all pertinent metadata for the investigator.
Below is a comparison of the outputs of running the lp tool and the jmp tool against the same automaticDestinations file. This output is representative of the differences between the two tools. For more information about the jmp tool, one can download and review the readme file for the tool at http://tzworks.net/prototype_page.php?proto_id=20.
Case 4: Parsing a Collection of Files
Sometimes one just wants to parse a bunch of shortcut files that are in a directory or a collection of subdirectories. Compared to the partition scan discussed above, this option is much faster. The disadvantage with is approach over the partition scan, is you don't get artifacts that have been deleted and are still in unallocated or slack space.
To use this option, one will make use of the operating systems ability to pipe data from one application's output to another application's input. In this case, the source of the data will be the Windows shell command dir. By adding some special options to the dir command, one can output only the path/filename without any extra data. This result will be consumed by lp, and each path/filename passed in will be analyzed. To invoke this behavior in lp, one will use the –pipe switch. The annotated figure below explains how the syntax of the command is composed.
Case 5: Parsing an Active Volume
A variant of parsing a captured image is to parse an active Windows partition or a mounted volume on Linux. The Windows version is invoked by using the –partition <drive letter> option. On Linux this is handled by passing in the device name of the disk and/or volume as the filename without the use of the –partition keyword. The other option that needs to be set is the –rawscan switch. This option tells lp to read the specified volume at the sector level. While the –partition <drive letter> option implicitly invokes the –rawscan option, for the Linux mode, one needs to explicitly call out this option.
If using Linux to run lp, one can located the mounted devices via the built-in df tool. (see snapshot below):
For lp to read at the cluster level, it needs to be invoked with administrative privileges. Like the image file parsing, this option also is agnostic to file system type, as it treats all cluster data (from FAT16/32, exFat or NTFS) the same. Below is an example of lp carving out SHLLINK signatures from a USB drive mounted as drive H for Windows and /dev/sbd1 on Linux.
lp -partition H -csv > results.csv [Windows version] lp /dev/sdb1 -rawscan -csv > results.csv [Linux version]
Case 6: Parsing a VMWare Volume
Occasionally it is useful to analyze a VMWare image, both from a forensics standpoint as well as from a testing standpoint. When analyzing different operating systems, and different configurations, a virtual machine is extremely useful in testing out different boundary conditions. This option is still considered experimental since it has only been tested on a handful of configurations. Furthermore, this option is limited to monolithic type VMWare images versus split images. In VMWare, the term split image means the volume is separated into multiple files, while the term monolithic virtual disk is defined to be a virtual disk where everything is kept in one file. There may be more than one VMDK file in a monolithic architecture, where each monolithic VMDK file would represent a separate snapshot. More information about the monolithic virtual disk architecture can be obtained from the VMWare website.
When working with virtual machines, the capability to handle snapshot images is important. Thus for processing a VMWare snapshot, one needs to include the parent snapshot/image as well as any descendants.
lp can handle multiple VMDK files to accommodate a snapshot, and its descendants, by using multiple –vmdk <file> switches, where each one represents a segment in the inheritance chain of VMDK files (eg. –vmdk <VMWare NTFS virtual disk-1> ... –vmdk <VMWare NTFS virtual disk-x>). Implicitly this option invokes the –rawscan option as well, so that lp will scan each sector for SHLLINK signatures.
To aid the user in defining the chain of descendant images, lp can take any VMDK file in a chain of snapshot VMDK files and determine what the descendant chain is. lp will analyze the chain, and if incomplete, will suggest a chain to use. This assumes, however, all the VMDK files are in the same directory. Below is an example of selecting the VMDK snapshot image file of winxp.64-000002.vmdk (shown in the yellow box). For this example, the chain is incomplete, so lp responds with what the dependencies are (shown in the red box), and then gives the user a suggested syntax to use for the command to process this snapshot.
Repeating the command, using the suggested chain of VMDK files, will allow lp to analyze and parse the volume represented by the snapshot.
|32-bit Version||64-bit Version|
|Mac OS X:||lp.v.0.55.osx.tar.gz||lp.v.0.55.osx.tar.gz||md5/sha1|
|*32bit apps can run in a 64bit linux distribution if "ia32-libs" (and dependencies) are present.|