Datasets

If you use this website to find a reference set for your research, please cite our publication:

Cinthya Grajeda, Frank Breitinger, and Ibrahim Baggili. “Availability of Datasets for digital forensics – and what is missing”. In: Digital Investigation (2017). (Presented at DFRWS 2017, Austin, TX)


Note:

The “Date” field gives the date the repository or dataset was created or last modified.
“N/A” (Not Available) means the information was unavailable at the time of writing or could not be verified.

Legend for “Origin” Field:
U = User Generated
E = Experiment Generated
C = Computer Generated


Dataset Type | Available Datasets | Total Size | Origin | Source | Date
------------ | ------------------ | ---------- | ------ | ------ | ----
Android Application Package (APK) | 119 APK files | N/A | U | Secure-Software-Engineering/DroidBench | 2015
Android Application Package (APK) | N/A | 1.6 GB | U | Digital Corpora | 2017
Apple iPod Disk Images | 10 iPod images | 55 GB | E | Digital Corpora | 2012 - 2015
Chat Logs | 1,100 chat logs | 715 MB | U | Article - Tarique Anwar & Muhammad Abulaish | 2010 - 2012
Different Types of Files | JPEG, ZIP, HTML, text, and Microsoft Office files | 41 MB | E | DFRWS 2006 Challenge | 2006
Different Types of Files | JPEG, ZIP, HTML, text, Microsoft Office, MP3, MPG, WMV, PDF, and EXE files | 330 MB | E | DFRWS 2007 Challenge | 2007
Different Types of Files | 22,000 MS Office 2007 files | ~24 GB | U | The MSX-13 Corpus | 2013
Different Types of Files | 4,457 files of various types | 1.9 GB | U | The t5 Corpus | 2011
Different Types of Files | 1 million files | N/A | U | Govdocs1 - Digital Corpora | 2009
Computer Malware | 11,960 malware samples | N/A | U | Contagio Malware Dump | 2008 - 2017
Computer Malware | 29,139,403 malware samples | N/A | U | VirusShare | 2017
Computer Malware | 271,092 malware samples | N/A | U | VX Heaven | 2006 - 2017
Computer Malware | N/A | N/A | U | KernelMode.info Forum | 2016
Email Datasets | 619,446 messages from 158 users | > 423 MB | U | Enron Email Dataset | 2015
Email Datasets | 12 emails | 34.8 KB | E | Digital Corpora | 2012
Email Datasets | N/A | N/A | U | Apache Mail Archives | 2006 - 2016
Email Datasets | Outlook PST file | N/A | E | DFRWS 2009 Rodeo | 2009
Email Datasets | Subset of ~1,700 labeled email messages | 4.5 MB | U | UC Berkeley Enron Email | 2015
Hard Disk Images | 169 disk images | 1.106 TB | U & E | Digital Corpora | 2008 - 2015
Hard Disk Images | 11 disk images | 150 MB | E | Computer Forensic Tool Testing (CFTT) - NIST | 2003
Hard Disk Images | 53 disk images | 12.2 GB | E | The CFReDS Project - NIST | 2016
Leaked Passwords | ~30 sets | N/A | U | Skull Security Wiki | 2009 - 2010
Media (Pictures) | 10,074 images | N/A | E | BOSS - Break Our Steganographic System | 2010
Media (Pictures) | 10,000 images | 1.6 GB | E | BOWS2 - Break Our Watermarking System | 2007 - 2008
Media (Pictures) | 3,600 images | N/A | U, E & C | Columbia University DVMM Laboratory | 2005
Media (Pictures) | 2,988 images | N/A | E | University of Florence Image Communication Laboratory | 2011
Media (Pictures) | > 10 images | N/A | E | King Saud University - Image Forensics | 2010
Media (Pictures) | 13,483 images | N/A | U | NRCS Photo Gallery - USDA Natural Resources Conservation Service | 2016
Media (Pictures) | > 300 images | ~50 MB | U, E & C | The Berkeley Segmentation Dataset and Benchmark | 2003 - 2013
Media (Pictures) | ~400 images | 4.5 MB | E | AT&T Laboratories Cambridge - The Database of Faces | 1992 - 1994
Media (Pictures) | 2,218 images | N/A | E | Columbia University - TrustFoto | 2004 - 2006
Media (Pictures) | > 25,137 images | N/A | E | Technical University Dresden - The Dresden Image Database | 2010
Media (Pictures) | N/A | N/A | E | USF Human ID 3-D Database | 1999
Media (Pictures) | 14,126 images | 8.5 GB | E | Color FERET Database - NIST | 1993 - 1996
Media (Pictures) | 306 images | 132.8 MB | E | The USC-SIPI Image Database | 1977 onward
Media (Videos) | 18 video sequences | 48 MB | E | Article - Cheng-Shain Lin and Jyh-Jong Tsay | 2013
Media (Videos) | 26 video test sequences | N/A | E | Arizona State University - YUV Video Sequences | N/A
Media (Videos) | 11 videos | N/A | C | NRCS Photo Gallery - USDA Natural Resources Conservation Service | 2014 - 2016
Media (Videos) | 9,317 YouTube videos | N/A | U | Columbia University - Consumer Video (CCV) Database | 2011
Mobile Malware for Android | > 237 malware samples | N/A | U | Contagio Mobile | 2011 - 2016
Mobile Malware for Android | 9,990 malware samples | N/A | U | University of Korea Hacking and Countermeasure Research Lab - Andro-AutoPsy | 2013 - 2014
Mobile Malware for Android | 5,560 malware samples | N/A | U | University of Göttingen, Germany - The Drebin Dataset | 2010 - 2012
Network Traffic | 50 pcap files | N/A | E | Digital Corpora | 2008 - 2009
Network Traffic | 3 pcap files | N/A | E | DFRWS 2009 Challenge | 2009
Network Traffic | 1 pcap file | 876 KB | E | University of New Haven cFREG | 2015
Network Traffic | 3 trace logs | 3.8 MB | E | The CFReDS Project - NIST | 2005
Network Traffic | 68 network-related datasets | N/A | U | CAIDA - Center for Applied Internet Data Analysis | 1998 - 2017
Network Traffic | Cisco and Zebra BGP RIBs | N/A | U | University of Oregon Route Views Project | 1997 - 2017
Network Traffic | 8 IP geolocation databases | N/A | U | MaxMind, Inc. - GeoLite Legacy | N/A
Network Traffic | Raw network-related datasets | N/A | U | RIPE Network Coordination Centre | 1999 - 2016
Network Traffic | Various pcap files | ZIP archive (72.0 KB) | E | Article - Libor Polčák | 2013
RAM Dumps | N/A | > 1 GB | U | Article - Wicher Minnaard | 2014
RAM Dumps | Laptop memory image | N/A | E | DFRWS 2008 Rodeo | 2008
RAM Dumps | 5 memory images | > 2 GB | E | The CFReDS Project - NIST | 2005 - 2007
RAM Dumps | N/A | 4 GB | E | The Art of Memory Forensics | 2014
RAM Dumps | 67 memory images | 44.1 GB | E | Digital Corpora | 2009
RAM Dumps | 1 PS3 Linux physical memory dump | N/A | E | DFRWS 2009 Challenge | 2009
Secure Digital Card - SD Images | 7 SD images | 174 MB | E | Digital Corpora | 2009 - 2013
Smartphone Disk Images | 12 sets | N/A | E | The CFReDS Project - NIST | 2012
Smartphone Disk Images | 14 sets | N/A | E | Digital Corpora | 2011
Smartphone Disk Images | 1 image | 59 MB | E | DFRWS 2009 Rodeo | 2009
Smartphone Disk Images | 2 Android images | 495 MB | E | DFRWS 2011 Challenge | 2011
Subscriber Identity Module - SIM Card Images | 3 SIM images | 130 KB | E | The CFReDS Project - NIST | 2016
Tablet Images | 25 images | 16.7 GB | E | Digital Corpora | 2012 - 2016
Universal Serial Bus - USB Flash Drive Images | 20 USB images | 10.9 GB | E | Digital Corpora | 2009 - 2015
Universal Serial Bus - USB Flash Drive Images | 1 USB image | 124 MB | E | Computer Forensic Tool Testing (CFTT) - NIST | 2005
Universal Serial Bus - USB Flash Drive Images | 3 USB images | 462 MB | E | The CFReDS Project - NIST | 2016
Universal Serial Bus - USB Flash Drive Images | 1 USB image | N/A | E | DFRWS 2008 Rodeo | 2008
Universal Serial Bus - USB Flash Drive Images | 1 USB image | N/A | E | DFRWS 2009 Challenge | 2009
Video Game Console Disk Images | 5 Xbox One partitions | 476 GB | E | University of New Haven cFREG | 2014
Video Game Console Disk Images | 4 disk images | 11.9 GB | E | Digital Corpora | 2013 - 2014
Video Game Console Disk Images | 1 PS3 Linux partition | N/A | E | DFRWS 2009 Challenge | 2009
Wireless Network Traces | 133 datasets | N/A | U | Crawdad - Resource for Archiving Wireless Data At Dartmouth | 2002 - 2016
World Languages/Text | 1,298 English & Arabic words | N/A | U | BiSAL - Bilingual Sentiment Analysis Lexicon | 2015
World Languages/Text | ~4 million words, with wordlists for 20+ languages | N/A | U | Openwall Wordlists Collection | 2003
World Languages/Text | 3,097,370 Reuters news stories | N/A | U | Reuters Corpora (RCV1, RCV2, TRC2) - Reuters Ltd - NIST | 2004 - 2015
World Languages/Text | 250,000 English words | 2.4 MB | U | SCOWL (Spell Checker Oriented Word Lists) | 2016
World Languages/Text | 60 million words per language, for 21 European languages | > 2 GB | U | European Parliament Proceedings Parallel Corpus | 1996 - 2011
World Languages/Text | 9 ZIP files with language training and testing data | 768.9 MB | U | Releases of the LTI LangID Corpus | 2014
World Languages/Text | Text files with ~352,500 words | N/A | U | Drexel University - Privacy, Security and Automation Lab | 2009 - 2012