Dedupe written by Roy/SAC

DeDupe Files 4.3.0


Introduction

I collect a lot of stuff, from scene graphics to old 8-bit computer games. When I come across someone's bulk unsorted collection for download somewhere, I download first and sort things out later. The first processing step is usually answering the question: "Do I have it already?".

Especially when it comes to images that were kept in their original format (typically .IFF/.LBM if created on the Commodore Amiga, or .LBM/.GIF/.PCX if created on old MS-DOS PCs), deduping really works wonders, provided you find duplicates based on actual file content and not just on file names.

I checked some other free tools that were floating around, but most either had no command line option or did not provide the options I wanted. So I set out to write my own. The first version was a combination of batch files, VBScript scripts, and some command line tools for things like calculating file hashes. This worked, but required some pre-configuration (in the batch files) and was not very fast. It got the job done, but something easier to use on the fly, preferably executed via right-click from the Windows Explorer context menu, was more desirable.

The second installment was a single VBScript file with an interface built from HTML and Internet Explorer. It brought along an MD5 hash tool, encoded in the script itself and written out to the temp folder on the hard disk every time the script was executed.

This was much better and already allowed integration into the Windows Explorer context menu. It was still not as fast as I would have liked, in part because I always hashed every file in the directory (primarily because the MD5 hash tool could only hash either a single file or a single directory with all the files in it, with no filters).

This led to the third and current installment of the tool, re-written in VB.NET with built-in hash generation and the option to choose which algorithm to use (for speed or for accuracy), multi-threading, and several other improvements: smart logic to determine which of two identical files should be flagged as the dupe and which kept as the original, include and exclude filter options, processing of sub-directories, a built-in mechanism to revert the results of a de-dupe run afterwards if they turn out to be undesirable, and much more.


Tool Summary

(Summary table columns: Tool Name, Current Version, Platform, Programming Language Used, Short Description, Latest Version Download)

Please Note! ... all of the tools posted on my web site use the Free Art License (FAL) 1.3 (Copyleft Attitude), which means they are free to use, share, and even modify and redistribute, as long as your modified version remains free and is not distributed commercially. If you want to exploit the software commercially, you have to contact me and negotiate terms.

Needless to say, but better safe than sorry: using my tools does not make me liable for any direct or indirect damages or losses caused by their use. You use them at your own risk. If you are paranoid, don't use them; if you don't understand what I am saying here, don't use them either.

 

Screenshots

(Screenshots: Main Window, Settings Window, Results Window, Log Window)

 

What the Hell is This?

A simple tool to find duplicate files on your hard drives and deal with them (various options). It uses each file's MD5 hash to determine duplicates, so it does not matter if identical files have different file names or creation/modification dates.
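The content-hashing idea can be sketched in a few lines of Python. This is a hypothetical illustration of the approach, not the tool's actual VB.NET code: hash every file's content and group files whose digests match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def md5_of(path, chunk_size=65536):
    """Hash file content in chunks so large files are not loaded into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_dupes(folder, include_sub=True):
    """Group files by content hash; any group with more than one entry contains dupes."""
    pattern = "**/*" if include_sub else "*"
    groups = defaultdict(list)
    for p in Path(folder).glob(pattern):
        if p.is_file():
            groups[md5_of(p)].append(p)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Because the grouping is by digest, two byte-identical files land in the same group no matter what they are named or when they were created, which is exactly why hash-based deduping beats name-based comparison.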

The original idea was realized as a VBScript, because I didn't know much about .NET at the time. You can still download the old VBScript version, which has fewer features than the current VB.NET version, if you'd like to. It's available here: Roy-DedupeScript20b.zip (42k).

 

The Command Line Version Syntax

The command line version can do pretty much everything the interface version can.

Dedupe.exe <FOLDER>*

*<FOLDER> = the folder to process, for example c:\temp

Parameters

Parameter | Options/Description
/? | /h - This help screen
/inclsub: yes|no - Include Sub Directories (Default: yes)
/result: <filename> - Report with Dedupe Results (Default: <FOLDER>\!dedupelog.txt)
/returnresults: yes|no - Return Results (Default: no)
/threads: 1-200 - Max. # of Threads for Hash Calculation (Default: 50)
/action: 1|2|3|4|5|6 - Dedupe Action (Default: 5). Specifies the action to take for the dupes that are found.
  1. Rename the dupe to aFile1_EXT_bFile2[DEDUPED].EXT, where aFile1 is the base name of the original file followed by '_' and its EXT(ension), then '_', bFile2 (the original base file name of the dupe), '[DEDUPED]', and the dupe file's original .EXT(ension)
  2. Rename dupes as in 1, but MOVE to a new sub folder '[Deduped]' of the path being processed
  3. Don't rename dupes, just MOVE to a new sub folder '[Deduped]'
  4. Delete dupes (gone for good, unless you have the Recycle Bin enabled so deleted files can be recovered)
  5. Move to Central Dupe Folder Location*, mirroring the original folder structure and keeping the original file names
  6. Report Only, don't rename or move anything
*/centraldupefolder: <FOLDER> - central location for saving dupes when action = 5
/exclext: <EXTENSIONS> - Exclude files with specified extensions from processing
/inclext: <EXTENSIONS> - Only process files with the specified extensions. <EXTENSIONS> is a list of extensions separated by '|', e.g. 'JPG' or 'JPG|GIF'
/folderpriolist: <LISTFILE> - Folder Priority List File. Each sub folder has a default priority of -1; you can change this with the Folder Priority List File. Of two dupes, the one in the folder with the higher priority is kept over the one in the folder with the lower priority.

The Priority List File has the following format (each folder on a separate line):

<PRIORITY1>|<FOLDER1>
<PRIORITY2>|<FOLDER2>

For example:

1|C:\Temp\More Important
0|C:\Temp\Less Important
-2|C:\Temp\Least Important

/pause: yes|no - Pause after Results (wait for a key press) to prevent the command window from closing (Default: yes)
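The folder priority rules above can be sketched in Python. This is a hypothetical illustration of the described behavior (the function names are mine, not the tool's): folders default to priority -1, entries from the list file override that, and of two duplicate files the one in the higher-priority folder is kept as the original.

```python
def parse_priority_list(lines):
    """Parse '<PRIORITY>|<FOLDER>' lines into a folder-to-priority map."""
    priorities = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        prio, folder = line.split("|", 1)
        priorities[folder] = int(prio)
    return priorities

def pick_original(file_a, folder_a, file_b, folder_b, priorities):
    """Return (original, dupe): the file in the higher-priority folder wins."""
    prio_a = priorities.get(folder_a, -1)  # every folder defaults to -1
    prio_b = priorities.get(folder_b, -1)
    if prio_a >= prio_b:
        return file_a, file_b
    return file_b, file_a
```

Note that because unlisted folders default to -1, you can also demote a folder below the default (as with the `-2` entry in the example list above) so its files always lose to files in unlisted folders.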

 

Previous Versions