I am happy to announce that tcpflow 1.0.1 is now available. Improvements in tcpflow 1.0.1 over the version widely in use today (version 0.21) include:
- Support for VLANs
- Support for IPv6 (thanks to contributions from Jan Görig).
- Regression testing (note: IPv6 support is not currently regression tested because of implementation differences in inet_ntop between MacOS and Linux).
The new version is available for download at http://afflib.org/downloads/tcpflow-1.0.1.tar.gz
Background: With the original author’s approval, I have taken over maintenance of the tcpflow open source TCP reconstructor. I brought the software up to date with the current release of GNU autotools, applied various patches that were floating around, and added the VLAN support. I am now trying to get the tcpflow packages in various Linux distributions updated.
Future Direction: I would like to rewrite parts of tcpflow in C++ so that I can take advantage of the STL map class, which is significantly more efficient than the current data structure used by tcpflow to maintain state. I also want to make a linkable tcp flow reconstruction library. I am looking for input from tcpflow users as to 1) whether rewriting in C++ is okay, and 2) what form the library should take.
Once again, you can download the new version from http://afflib.org/downloads/tcpflow-1.0.1.tar.gz
September 26, 2011
Although I previously stated aimage was withdrawn from support, I have continued to receive requests for support. As the final version of aimage did not compile with the current version of AFFLIB, I have updated aimage so that now it does.
aimage is still not supported, but version 3.2.5 has been released.
August 17, 2011
Version 0.5.4 of bulk_extractor has been released. This version includes “crash protection” (you can have it catch a signal if you want), full support for BASE64 decoding, ZIP, GZIP, and even CCN Track 2 data! We also found a memory allocation bug in the processing of raw images. So if you were having problems before, you should upgrade now!
October 27, 2010
AFFLIB 3.6.3 has been released. This is a bug-fix release that fixes a bug in the handling of split-raw files that was introduced with AFFLIB 3.6.0. All users are encouraged to upgrade.
October 9, 2010
bulk_extractor 0.4.2 is released.
Significant features include:
- Support for context-based stop lists
- Automatic carving of PKZIP files
- Improved support for EXIF carving
Context-based stop list
Many users of bulk_extractor report surprise at the large number of email addresses, URLs, JPEGs, and other information contained within the standard Microsoft Windows and Linux distributions. For example, Microsoft Windows XPSP3 contains 306 distinct email addresses, including not just addresses like piracy@microsoft.com and info@valicert.com, but also email addresses that look like they belong to individuals, such as mojemeno@msn.com and mittnavn@msn.com.
The initial way that we attempted to resolve this issue was by creating a “stop list” of the distribution email addresses and building that stoplist into the bulk_extractor binary. The problem with this approach, we quickly learned, is that these problematic email addresses might appear in a variety of contexts, but we only want them suppressed when they are harvested as part of the operating system files. For example, we don’t want to be alerted to the mojemeno@msn.com email address when it appears as part of Microsoft Windows, but we do want this email address reported if it is found elsewhere.
To resolve this problem, bulk_extractor now supports a context-based stop list. Instead of simply a list of email addresses that should be suppressed, the context-based stop list contains both the email address and the context in which that email address occurs. Here we define “context” to mean the 8 characters before the email address and the 8 characters following the email address in the disk image.
The context-based stop list is distributed as a specially formatted text file that contains the element to be suppressed, a tab, and the element in context. Unprintable characters are reported as underbars. For example, these two entries suppress the two occurrences of the mojemeno@msn.com email address in Windows XPSP3:
mojemeno@msn.com ail.com_mojemeno@msn.com_priklad
mojemeno@msn.com il.com__mojemeno@msn.com__prikla
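To make the matching rule concrete, here is a minimal Python sketch of how a context-based lookup could work. It is not the bulk_extractor implementation: the function names are hypothetical, and the choice of which bytes count as "printable" (ASCII 32 through 126) is an assumption made for illustration.

```python
def printable(data: bytes) -> str:
    """Render bytes as text, replacing unprintable bytes with underbars.
    Assumption: printable means ASCII 32..126."""
    return "".join(chr(b) if 32 <= b < 127 else "_" for b in data)


def context_of(image: bytes, start: int, length: int, window: int = 8) -> str:
    """Return the feature plus `window` bytes on each side, as printable text."""
    lo = max(0, start - window)
    hi = min(len(image), start + length + window)
    return printable(image[lo:hi])


def load_stop_list(lines):
    """Parse 'feature<TAB>feature-in-context' lines into a set of pairs."""
    stop = set()
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        feature, context = line.split("\t", 1)
        stop.add((feature, context))
    return stop


def is_stopped(image: bytes, start: int, feature: str, stop) -> bool:
    """A feature is stopped only if it appears in a listed context."""
    return (feature, context_of(image, start, len(feature))) in stop
```

With this rule, an address is stopped only when both the address and its surrounding 16 characters match an entry, so the same address found in any other context is still reported.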
All items suppressed by the traditional regular-expression stop list or the context-based stop list are now presented in separate feature files (for example, email_stop.txt). In no case is information actually discarded. Presenting the suppressed results is important for tool validation, both in testing and when the tool is actually run. Stopped terms may also be useful for profiling the hard drive.
bulk_extractor now comes with a Python program called make_context_stop_list.py. This program will process the output of bulk_extractor from multiple runs and create a single context-based stop list. We are also distributing a sample context-based stop list which is derived from the following operating systems:
- fedora12-64
- redhat54-ent-64
- w2k3-32bit
- w2k3-64bit
- win2008-r2-64
- win7-ent-32
- win7-utl-64
- winXP-32bit-sp3
- winXP-64bit
You can download version 1.0 of the stop list from: http://afflib.org/downloads/feature_context.1.0.zip Be sure to decompress the list first! We are distributing it in ZIP form because it is 70 megabytes in length. A future version of bulk_extractor may read the compressed list directly.
Context-based stop lists correct the stop-list problem that surfaced with bulk_extractor 0.4.0. That version simply suppressed terms that were already present in the Windows and Linux distributions. Unfortunately this created an attack vector in which attackers could register and use these email addresses and in so doing escape detection.
PKZIP Carving
Version 0.4.2 introduces carving of PKZIP components. Whenever bulk_extractor finds a component of a ZIP file that includes a valid header, it attempts to decompress the fragment and then recursively reprocesses the decompressed data with all of the extractors. Currently the results of ZIP carving are reported with standard offsets. In the future the offsets will be reported as NNNNNN-ZIP, where NNNNNN is the byte offset of the ZIP component.
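The general idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the bulk_extractor code: the function name carve_zip_components and the max_out limit are hypothetical, and the sketch handles only the "stored" and "deflate" compression methods. It scans a buffer for the ZIP local file header signature, parses the fixed 30-byte header, and decompresses whatever payload follows.

```python
import struct
import zlib

# Fixed-size part of a ZIP local file header (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
SIGNATURE = b"PK\x03\x04"


def carve_zip_components(buf: bytes, max_out: int = 1 << 20):
    """Yield (offset, name, data) for each decodable ZIP component in buf.

    max_out caps the decompressed size (a hypothetical safety limit).
    """
    pos = buf.find(SIGNATURE)
    while pos != -1:
        if pos + LOCAL_HEADER.size <= len(buf):
            (_sig, _ver, _flags, method, _mtime, _mdate, _crc,
             csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, pos)
            name_start = pos + LOCAL_HEADER.size
            data_start = name_start + name_len + extra_len
            name = buf[name_start:name_start + name_len].decode("utf-8", "replace")
            # A compressed size of 0 can mean "unknown"; then try the rest.
            payload = buf[data_start:data_start + csize] if csize else buf[data_start:]
            data = None
            if method == 8:  # deflate: raw stream, no zlib header
                try:
                    data = zlib.decompressobj(-zlib.MAX_WBITS).decompress(payload, max_out)
                except zlib.error:
                    data = None  # not a valid fragment; ignore it
            elif method == 0:  # stored: payload is the data itself
                data = payload[:max_out]
            if data:
                yield pos, name, data
        pos = buf.find(SIGNATURE, pos + 1)
```

Each yielded data buffer could then be handed back to the same set of extractors that processed the original image, which is what makes the reprocessing recursive.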
Improved support for EXIF Carving
Version 0.4.2 finds and carves EXIF headers of JPEG files. All of the results are stored in a feature file that consists of the MD5 hash of the first 4K of the JPEG and an XML structure. bulk_extractor now also comes with a program called post_process_exif.py, which reads this file and creates a tab-delimited file in which each EXIF field has its own column. The resulting file can be imported into Microsoft Excel.
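As a rough illustration of this kind of post-processing, the Python sketch below computes the MD5 of the first 4K of a JPEG and flattens per-image XML into a tab-delimited table. It is not post_process_exif.py: the XML layout (one child element per EXIF field under a single root), the column name md5_first_4k, and both function names are assumptions made for the example.

```python
import csv
import hashlib
import xml.etree.ElementTree as ET


def jpeg_key(jpeg: bytes) -> str:
    """MD5 of the first 4K (4096 bytes) of the JPEG, as hex."""
    return hashlib.md5(jpeg[:4096]).hexdigest()


def exif_to_tsv(records, out):
    """Write (key, xml_text) records as tab-delimited rows, one column per field.

    Assumed XML layout: <exif><Make>...</Make><Model>...</Model></exif>
    """
    rows, columns = [], []
    for key, xml_text in records:
        fields = {child.tag: (child.text or "") for child in ET.fromstring(xml_text)}
        for tag in fields:
            if tag not in columns:
                columns.append(tag)  # keep first-seen column order
        rows.append((key, fields))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["md5_first_4k"] + columns)
    for key, fields in rows:
        # Images lacking a field get an empty cell in that column.
        writer.writerow([key] + [fields.get(col, "") for col in columns])
```

Collecting the column names in a first pass means that images with different sets of EXIF fields still line up under the same headers.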
September 27, 2010
AFFLIB 3.6.0 is released. Key features include:
- Name change: All commands now begin with “aff” instead of “af”. This is a result of a name conflict under MacOS 10.6. (Apple created a new command, “afconvert”, which was causing lots of confusion.)
- Bug fix: encrypted AFD files work again.
- Bug fix: files >4GB work on Win32.
July 25, 2010
We are pleased to announce the release of fiwalk version 0.6.0.
fiwalk version 0.6.0 marks the first official release of the Digital Forensics XML Toolkit, a set of Python modules and programs for working with the Digital Forensics XML that fiwalk produces.
More information about Digital Forensics XML and the Digital Forensics XML Toolkit can be found at:
May 29, 2010
bulk_extractor 0.3.1 is released. This version has several new features based on user feedback, and a few bug fixes based on a thorough code review.
New Features:
- url_services.txt – a histogram of all URLs by domain.
- url_searches.txt – a histogram of all search terms, including Google, Yahoo, Bing, and any other search service with “search” in the domain and “q=” or “p=” in the URL.
- ccn.txt – this file now reports Federal Express account numbers, SSNs (if properly formatted or prefixed), DOBs, and other info.
- tcp.txt – This experimental feature looks for IP and TCP packets in PAGEFILE.SYS, memory dumps and hibernation files, and stores the results.
- the whitelist and redlist files may now contain globbed terms. For example, put *@company.com in the redlist and any mention of anyone@company.com will be flagged and also put into a special file called redlist_found.txt.
- CONTEXT: ccn.txt now shows the context from which the matched information was taken. hosts.txt shows context for numeric IP addresses.
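Two of the features above, the search-term histogram and the globbed redlist, can be sketched in a few lines of Python. This is only an illustration of the rules as described in the list: the function names are hypothetical, the substring test on the host name is an assumption about how a "search service" is recognized, and case-sensitive glob matching is also an assumption.

```python
from collections import Counter
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, urlsplit

# Assumed hints: the named services plus any domain containing "search".
SEARCH_HINTS = ("google", "yahoo", "bing", "search")


def search_term_histogram(urls):
    """Count search terms taken from the q= or p= parameter of search URLs."""
    hist = Counter()
    for url in urls:
        try:
            parts = urlsplit(url)
            host = (parts.hostname or "").lower()
        except ValueError:
            continue  # malformed URL; skip it
        if not any(hint in host for hint in SEARCH_HINTS):
            continue
        query = parse_qs(parts.query)
        for key in ("q", "p"):
            hist.update(query.get(key, []))
    return hist


def redlist_hits(features, patterns):
    """Return the features that match any globbed redlist pattern."""
    return [f for f in features if any(fnmatchcase(f, p) for p in patterns)]
```

For example, a pattern of *@company.com flags anyone@company.com but leaves addresses at other domains alone.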
Bug Fixes:
- Improved handling of raw devices and files.
- bulk_extractor is now less likely to error on some input data sets.
- A crashing bug that impacted bulk_extractor 0.3.1 has been addressed.
May 25, 2010
I am pleased to announce the release of bulk_extractor 0.2.1. This version corrects a few minor bugs in version 0.1.0 and is available immediately. We have also increased the version number from 0.1.x to 0.2.x to reflect a total rewrite in the way that the underlying flex architecture is implemented.
April 25, 2010