Datasets
If you use this website to find a reference set for your research, please cite our publication:
Cinthya Grajeda, Frank Breitinger, and Ibrahim Baggili. “Availability of Datasets for digital forensics – and what is missing”. In: Digital Investigation (2017). (Presented at DFRWS 2017, Austin, TX)
Note:
The “Date” field gives the date the repository or dataset was either created or last modified.
The abbreviation N/A (“Not Available”) means that the information was not available at the time of writing or could not be verified.
Legend for “Origin” Field:
U = User Generated
E = Experiment Generated
C = Computer Generated
Dataset Type | Available Datasets | Total Size | Origin | Source | Date | More info. |
---|---|---|---|---|---|---|
Android Application Packages (APK) | 119 APK files | N/A | U | Secure-Software-Engineering/DroidBench | 2015 | |
Android Application Packages (APK) | N/A | 1.6 GB | U | Digital Corpora | 2017 | |
Apple iPod Disk Images | 10 iPod images | 55 GB | E | Digital Corpora | 2012 - 2015 | |
Chat Logs | 1,100 chat logs | 715 MB | U | Article - Tarique Anwar & Muhammad Abulaish | 2010 - 2012 | |
Database | 77 databases | 1.4 MB (ZIP) | E | Article - Sebastian Nemetz, Sven Schmitt, & Felix Freiling | 2018 | |
Different Types of Files | JPEG, ZIP, HTML, Text, and Microsoft Office files | 41 MB | E | DFRWS 2006 Challenge | 2006 | |
Different Types of Files | JPEG, ZIP, HTML, Text, Microsoft Office, MP3, MPG, WMV, PDF, and EXE | 330 MB | E | DFRWS 2007 Challenge | 2007 | |
Different Types of Files | 22,000 MS Office 2007 files | ~24 GB | U | The MSX-13 Corpus | 2013 | |
Different Types of Files | 4,457 different types of files | 1.9 GB | U | The t5 Corpus | 2011 | |
Different Types of Files | Nearly 1 million files | N/A | U | Govdocs1 - Digital Corpora | 2009 | |
Computer Malware | 11,960 malware samples | N/A | U | Contagio Malware Dump | 2008 - 2017 | |
Computer Malware | 29,139,403 malware samples | N/A | U | Virus Share | 2017 | |
Computer Malware | 271,092 malware samples | N/A | U | VX Heaven (no longer available) | 2006 - 2017 | |
Computer Malware | N/A | N/A | U | KernelMode.info Forum | 2016 | |
Email Datasets | 619,446 messages from 158 users | > 423 MB | U | Enron Email Dataset | 2015 | |
Email Datasets | 12 emails | 34.8 KB | E | Digital Corpora | 2012 | |
Email Datasets | N/A | N/A | U | Apache Mail Archives | 2006 - 2016 | |
Email Datasets | Outlook PST file | N/A | E | DFRWS 2009 Rodeo | 2009 | |
Email Datasets | A subset of about 1,700 labeled email messages | 4.5 MB | U | UC Berkeley Enron Email | 2015 | |
Hard Disk Images | 169 disk images | 1.106 TB | U & E | Digital Corpora | 2008 - 2015 | |
Hard Disk Images | 11 disk images | 150 MB | E | Computer Forensic Tool Testing (CFTT) - NIST | 2003 | |
Hard Disk Images | 53 disk images | 12.2 GB | E | The CFReDS Project - NIST | 2016 | |
Leaked Passwords | ~30 sets | N/A | U | Skull Security Wiki | 2009 - 2010 | |
Media (Pictures) | 10,074 images | N/A | E | BOSS - Break Our Steganographic System | 2010 | |
Media (Pictures) | 10,000 images | 1.6 GB | E | BOWS2 - Break Our Watermarking System | 2007 - 2008 | |
Media (Pictures) | 3,600 images | N/A | U, E & C | Columbia University DVMM Laboratory | 2005 | |
Media (Pictures) | 2,988 images | N/A | E | University of Florence Image Communication Laboratory | 2011 | |
Media (Pictures) | > 10 images | N/A | E | King Saud University - Image Forensics (no longer available) | 2010 | |
Media (Pictures) | 13,483 images | N/A | U | NRCS Photo Gallery - USDA Natural Resources Conservation Service | 2016 | |
Media (Pictures) | > 300 images | ~50 MB | U, E & C | The Berkeley Segmentation Dataset and Benchmark | 2003 - 2013 | |
Media (Pictures) | ~400 images | 4.5 MB | E | AT&T Laboratories Cambridge - The Database of Faces | 1992 - 1994 | |
Media (Pictures) | 2,218 images | N/A | E | Columbia University - TrustFoto | 2004 - 2006 | |
Media (Pictures) | > 25,137 images | N/A | E | Technical University Dresden - The Dresden Image Database | 2010 | |
Media (Pictures) | N/A | N/A | E | USF Human ID 3-D Database | 1999 | |
Media (Pictures) | 14,126 images | 8.5 GB | E | Color FERET Database - NIST | 1993 - 1996 | |
Media (Pictures) | 306 images | 132.8 MB | E | The USC-SIPI Image Database | 1977 onward | |
Media (Videos) | 18 video sequences | 48 MB | E | Article - Cheng-Shain Lin and Jyh-Jong Tsay | 2013 | |
Media (Videos) | 26 video test sequences | N/A | E | Arizona State University - YUV Video Sequences | N/A | |
Media (Videos) | 11 videos | N/A | C | NRCS Photo Gallery - USDA Natural Resources Conservation Service | 2014 - 2016 | |
Media (Videos) | 9,317 YouTube videos | N/A | U | Columbia University - Consumer Video (CCV) Database | 2011 | |
Mobile Malware for Android | > 237 malware samples | N/A | U | Contagio Mobile | 2011 - 2016 | |
Mobile Malware for Android | 9,990 malware samples | N/A | U | Korea University Hacking and Countermeasure Research Lab - Andro-AutoPsy | 2013 - 2014 | |
Mobile Malware for Android | 5,560 malware samples | N/A | U | University of Göttingen, Germany - The Drebin Dataset | 2010 - 2012 | |
Network Traffic | 50 pcap files | N/A | E | Digital Corpora | 2008 - 2009 | |
Network Traffic | 3 pcap files | N/A | E | DFRWS 2009 Challenge | 2009 | |
Network Traffic | 1 pcap file | 876 KB | E | University of New Haven cFREG | 2015 | |
Network Traffic | 3 trace logs | 3.8 MB | E | The CFReDS Project - NIST | 2005 | |
Network Traffic | 68 network related datasets | N/A | U | CAIDA - Center for Applied Internet Data Analysis | 1998 - 2017 | |
Network Traffic | Cisco, Zebra BGP RIBs | N/A | U | University of Oregon Route Views Project | 1997 - 2017 | |
Network Traffic | 8 IP geolocation databases | N/A | U | MaxMind, Inc. - GeoLite Legacy | N/A | |
Network Traffic | Raw network related datasets | N/A | U | RIPE Network Coordination Centre | 1999 - 2016 | |
Network Traffic | Various pcap files | 72.0 KB (ZIP) | E | Article - Libor Polčák | 2013 | |
RAM Dumps | N/A | > 1 GB | U | Article - Wicher Minnaard | 2014 | |
RAM Dumps | Laptop memory image | N/A | E | DFRWS 2008 Rodeo | 2008 | |
RAM Dumps | 5 memory images | > 2 GB | E | The CFReDS Project - NIST | 2005 - 2007 | |
RAM Dumps | N/A | 4 GB | E | The Art of Memory Forensics | 2014 | |
RAM Dumps | 67 memory images | 44.1 GB | E | Digital Corpora | 2009 | |
RAM Dumps | 1 PS3 Linux physical memory dump | N/A | E | DFRWS 2009 Challenge | 2009 | |
Secure Digital Card - SD Images | 7 SD images | 174 MB | E | Digital Corpora | 2009 - 2013 | |
Smartphone Disk Images | 12 sets | N/A | E | The CFReDS Project - NIST | 2012 | |
Smartphone Disk Images | 14 sets | N/A | E | Digital Corpora | 2011 | |
Smartphone Disk Images | 1 image | 59 MB | E | DFRWS 2009 Rodeo | 2009 | |
Smartphone Disk Images | 2 Android images | 495 MB | E | DFRWS 2011 Challenge | 2011 | |
Subscriber Identity Module - SIM Card Images | 3 SIM images | 130 KB | E | The CFReDS Project - NIST | 2016 | |
Tablet Images | 25 images | 16.7 GB | E | Digital Corpora | 2012 - 2016 | |
Universal Serial Bus - USB Flash Drive Images | 20 USB images | 10.9 GB | E | Digital Corpora | 2009 - 2015 | |
Universal Serial Bus - USB Flash Drive Images | 1 USB image | 124 MB | E | Computer Forensic Tool Testing (CFTT) - NIST | 2005 | |
Universal Serial Bus - USB Flash Drive Images | 3 USB images | 462 MB | E | The CFReDS Project - NIST | 2016 | |
Universal Serial Bus - USB Flash Drive Images | 1 USB image | N/A | E | DFRWS 2008 Rodeo | 2008 | |
Universal Serial Bus - USB Flash Drive Images | 1 USB image | N/A | E | DFRWS 2009 Challenge | 2009 | |
Video Game Console Disk Images | 5 Xbox One partitions | 476 GB | E | University of New Haven cFREG | 2014 | |
Video Game Console Disk Images | 4 disk images | 11.9 GB | E | Digital Corpora | 2013 - 2014 | |
Video Game Console Disk Images | 1 PS3 Linux partition | N/A | E | DFRWS 2009 Challenge | 2009 | |
Wireless Network Traces | 133 datasets | N/A | U | CRAWDAD - A Community Resource for Archiving Wireless Data At Dartmouth | 2002 - 2016 | |
World Languages/Text | 1,298 English & Arabic words | N/A | U | BiSAL - Bilingual Sentiment Analysis Lexicon | 2015 | |
World Languages/Text | ~4 million words with wordlists for 20+ languages | N/A | U | Openwall Wordlists Collection | 2003 | |
World Languages/Text | 3,097,370 Reuters news stories | N/A | U | Reuters Corpora (RCV1, RCV2, TRC2) - Reuters Ltd - NIST | 2004 - 2015 | |
World Languages/Text | 250,000 English words | 2.4 MB | U | SCOWL (Spell Checker Oriented Word Lists) | 2016 | |
World Languages/Text | 60 million words per language for 21 European languages | > 2 GB | U | European Parliament Proceedings Parallel Corpus | 1996 - 2011 | |
World Languages/Text | 9 ZIP files with language training and testing data | 768.9 MB | U | Releases of the LTI LangID Corpus | 2014 | |
World Languages/Text | Text files with ~352,500 words | N/A | U | Drexel University - Privacy, Security and Automation Lab | 2009 - 2012 | |
Network Traffic | VoIP SIP pcap files for IDS | 362.8 MB | E | Article - M. Nassar (Springer) | 2010 | |