The background story and general file structure remain the same, but I made a lot of other changes, especially “under the hood”. The most important improvement is the PERFORMANCE. Where V1.1 took several minutes to process a folder, V2.0 takes only seconds. De-duping folders with thousands, and not just hundreds, of files is no longer a problem for this script. Here are the details:
Name of the Software: DeDupe Files Script Tool
Author: Carsten Cumbrowski
Version 2.0 beta
Date: April 2009
Visit http://www.cumbrowski.com/ for resources to web and database development and internet marketing. There you can also find the contact page with various means to get in touch with the author of this tool.
The script detects duplicate files within a directory.
Duplicate files are files that have the same MD5 Check Sum value.
Two DIFFERENT/NON IDENTICAL files having the same MD5 Check Sum is not impossible, but highly unlikely.
This allows the script to detect duplicate files regardless of their file name or other characteristics, such as “date created” or “date modified”.
The tool scans all files within a directory. It does not include files in sub directories of the processed folder.
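The detection idea described above can be sketched in a few lines of Python. This is only an illustration, not the code of the VBScript tool: it uses Python's hashlib instead of md5sum.exe, the function names are made up for this example, and the case-insensitive sort by name is an assumption about how "sorted by name" behaves.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Map each "original" file name to the names of its duplicates.

    Only files directly inside `folder` are scanned, not sub directories.
    Files are visited in name order (case-insensitive, an assumption), so
    the first file seen for a checksum counts as the original.
    """
    first_seen = {}   # checksum -> name of the original file
    duplicates = {}   # original name -> list of duplicate names
    files = sorted(
        (p for p in Path(folder).iterdir() if p.is_file()),
        key=lambda p: p.name.lower(),
    )
    for path in files:
        checksum = md5_of_file(path)
        if checksum in first_seen:
            duplicates.setdefault(first_seen[checksum], []).append(path.name)
        else:
            first_seen[checksum] = path.name
    return duplicates
```

Because only the checksum is compared, two files with different names and different timestamps still end up in the same group.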
The script now supports multi-threading for the MD5 checksum determination, which was the bottleneck of the previous version. The default is 50 threads, but it can be changed in the settings or on the fly via the command line switch /threads:NN, where NN should be a number greater than or equal to 1. I don’t know the maximum value, because it depends on the machine the script runs on.
Be careful and increase it only in small steps to improve performance further. The thread count is actually zero-based, which means that if you set /threads:49 (the default), you actually get 50 threads.
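To illustrate the idea of computing checksums in parallel, here is a minimal Python sketch. It is an assumption-laden stand-in: the real script is VBScript and relies on md5sum.exe, while this example uses a thread pool and hashlib. The helper names `worker_count` and `checksums_in_parallel` are hypothetical; only the zero-based meaning of the /threads value is taken from the description above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def worker_count(threads_switch):
    """Translate the zero-based /threads:NN value into a worker count."""
    return int(threads_switch) + 1


def md5_of_file(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_in_parallel(paths, threads_switch=49):
    """Return {path: md5} for all paths, hashed by a pool of workers."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=worker_count(threads_switch)) as pool:
        # pool.map keeps the input order, so zip pairs each path
        # with its own checksum.
        return dict(zip(paths, pool.map(md5_of_file, paths)))
```

More workers only help up to the point where disk or CPU becomes the limit, which is why raising the value in small steps is the safer approach.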
There are 5 different Actions you can choose from to tell the script what to do if it finds a duplicate file:
- (1) Rename the dupe to aFile1_EXT_bFile2[DEDUPED].EXT, where aFile1 is the base name of the original file, followed by “_” and its EXT(ension), then “_” and bFile2, the original base file name of the dupe, followed by [DEDUPED] and the dupe file’s original .EXT(ension)
- (2) Rename dupes as in 1, but MOVE them to a new sub folder “[Deduped]” of the path being processed
- (3) (DEFAULT ACTION) Don’t rename dupes, just MOVE them to a new sub folder “[Deduped]”
- (4) Delete dupes (gone for good, unless you enabled the “Recycle Bin” to be able to recover deleted files)
- (5) Create a sub folder at the specified location (/cdb:BACKUPPATH) with the name “yyyy-mm-dd_hh-mm-ss_FolderName” and create an index file !Index.txt with the archive location and name and the original locations of the files, separated by “|”
The default DeDupe Action can be overridden via the command line option
/action:[1…5], e.g. /action:3 for the default Action
Dupe Action (1)
If a duplicate file is found, it is renamed by prepending the original file name as a prefix, with “_” as the separator. The “_” also replaces the “.” that indicates the file extension of the original file name (other “.” in the file name itself remain). The string [DEDUPED] is added at the end of the file name.
For example, aFile1.EXT and bFile2.EXT are identical. After the script has run, one of the two files remains as it is and the other one is renamed. Which file is considered the “original” is determined by which file was found first. The script sorts the files by name before it dedupes them.
In this example aFile1.EXT would be considered the original and bFile2.EXT would be renamed to aFile1_EXT_bFile2[DEDUPED].EXT. This makes dupes appear right after the original if you sort the directory by file name. To filter the dupes in order to copy/move them away or to delete them, use the copy, move or del command in MS DOS. For example, “DEL *[DEDUPED].*” would delete all duplicate files found and renamed by the script.
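The naming scheme of Action 1 is easy to get wrong by hand, so here is a small Python sketch of it. The function name `deduped_name` is made up for this example, and the script's behavior for files without an extension is not documented here, so that case is left untested.

```python
import os


def deduped_name(original_name, dupe_name):
    """Build the Action 1 name for a dupe: ORIGBASE_ORIGEXT_DUPEBASE[DEDUPED].DUPEEXT.

    Only the last "." of each name is treated as the extension separator,
    so other dots inside the file names are kept as they are.
    """
    orig_base, orig_ext = os.path.splitext(original_name)
    dupe_base, dupe_ext = os.path.splitext(dupe_name)
    # The original's extension dot is replaced by "_".
    prefix = orig_base + "_" + orig_ext[1:]
    return prefix + "_" + dupe_base + "[DEDUPED]" + dupe_ext
```

With the example names from above, `deduped_name("aFile1.EXT", "bFile2.EXT")` returns `aFile1_EXT_bFile2[DEDUPED].EXT`, which sorts directly after aFile1.EXT in a listing by name.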
Dupe Action (2)
If you want the dupes renamed as in Action 1, but would like to have them moved away from the source directory, choose Dupe Action 2. Dupes are still being renamed as in (1), but the script moves the dupes to a sub directory called “[DeDuped]” within the processed folder.
Dupe Action (3)
If you just want the dupes moved away from the source folder, but keep the original file names, use this Dupe Action (which is the default action, btw). You will find all the duplicate files in the sub folder “[DeDuped]” in the processed folder.
Dupe Action (4)
If you simply want to get rid of the dupes and delete them, use this Action.
Dupe Action (5)
A variation of Action 3. The difference is that the dupes are not moved to a sub folder below the processed folder, but to a central dupe archive folder, where a new sub directory is created with the date and time of the DeDupe processing and the name of the folder that was deduped.
The default location for that centralized backup folder is “C:\[DEDUPE_BACKUP]”, but that can be changed, either via the registry settings or on the fly via the command line option /cdb:PATH.
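As a rough sketch of how the archive sub folder name for Action 5 is put together, consider the following Python example. The function names are hypothetical, and the field order of the index line (archived file first, original location second) is an assumption; the description only says that the entries are separated by “|”.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def archive_folder(backup_root, processed_folder, when):
    """Return BACKUPROOT\\yyyy-mm-dd_hh-mm-ss_FolderName as a Windows path."""
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    folder_name = PureWindowsPath(processed_folder).name
    return PureWindowsPath(backup_root) / (stamp + "_" + folder_name)


def index_line(archived_file, original_file):
    """One "|"-separated index entry (field order is an assumption)."""
    return str(archived_file) + "|" + str(original_file)


# Example with a fixed timestamp so the result is reproducible.
target = archive_folder(
    r"C:\[DEDUPE_BACKUP]", r"D:\Pictures\Photos", datetime(2009, 4, 15, 10, 30, 0)
)
```

Putting the timestamp first means the archive folders sort chronologically in the central backup location.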
The script creates up to two files in the processed directory, one by default and one optionally:
“!DeDupe-FileList.txt” – (optional feature) a list of all files in the directory and their MD5 Check Sum Values (tab separated)
“!DeDupeLog.txt” – (enabled by default) a processing logfile where you can find the list of dupes that were detected, their old & new file name and the corresponding original file
If you do not want either of the files to be created, change the options for “WriteFileList” and “WriteDeDupeLog” to “0” at the beginning of the code of “DedupeFilesInFolder2.vbs”; alternatively, use the command line options /log:[0/1] and /list:[0/1] to turn the creation of the list and/or log on or off. You can also specify a different file name for the file list and the DeDupe log, but you cannot change the path.
/list:0 or /list:1
/log:0 or /log:1
You can also suppress all dialogs via the command line option /quite:[0/1]. /quite:1 would disable the progress dialog, results message and all error messages.
/quite:0 or /quite:1
Note, the script returns error levels for batch processing regardless of the “quiet” settings.
The ErrorLevel codes are:
0 = Script ran successfully
1 = Script ran, but there were no files to process
2 = The script was aborted (only relevant if the progress dialog is on)
4 = Script error (md5sum.exe or the processing path was not found)
Important: Version 2 of the script enforces execution with CSCRIPT.EXE; it re-launches itself if it is executed with WSCRIPT.EXE. The one exception is the Shell Extension, which runs the script with WSCRIPT.EXE on purpose. If you are using the Quiet option and want to get the correct ErrorLevel back, you must execute the script with CSCRIPT.EXE from your application!
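If you call the script from your own application, the ErrorLevel handling could look roughly like this Python sketch. Two things here are assumptions and not documented behavior: that the folder to process is passed as the first argument after the script name, and that the script can be found under the name given in `script`. The `//NoLogo` switch is a standard CSCRIPT.EXE option.

```python
import subprocess

# ErrorLevel codes as listed above.
ERRORLEVELS = {
    0: "Script ran successfully",
    1: "Script ran, but there were no files to process",
    2: "The script was aborted",
    4: "Script error (md5sum.exe or processing path not found)",
}


def describe_errorlevel(code):
    """Return a readable text for an ErrorLevel returned by the script."""
    return ERRORLEVELS.get(code, "Unknown ErrorLevel " + str(code))


def run_dedupe(folder, script="DedupeFilesInFolder2.vbs"):
    """Run the script quietly via CSCRIPT.EXE and return its ErrorLevel.

    Assumption: the folder is passed as a plain argument before the switches.
    """
    command = ["cscript.exe", "//NoLogo", script, folder, "/quite:1"]
    return subprocess.run(command).returncode
```

Calling CSCRIPT.EXE directly avoids the re-launch step, so the return code your application sees is the one set by the script itself.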
Use the provided Batch Scripts “DeDupeInstall.bat” and “DeDupeUnInstall.bat” to install or un-install the De-Dupe Shell Extension.
Double click on the Batch Script File “DeDupeInstall.bat”
The install batch file copies md5sum.exe and DedupeFilesInFolder2.vbs into the System32 directory under your Windows installation directory and imports the registry file “DedupeInstall.reg” into your system’s registry database. It creates entries under the registry key: HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Directory\shell\
None of the files in the installation directory are needed anymore to run the script itself. You will need them only to uninstall the tool or to re-install it, if necessary.
Double click on the Batch Script File “DeDupeUnInstall.bat”
The uninstall batch file deletes the two files from your System32 directory and uses the registry file DedupeUnInstall.reg to remove the entries for the script from your system’s registry database. If you want to continue to use the tool md5sum.exe and only want to disable the shell extension, either simply double click on the file DedupeUnInstall.reg without executing the uninstall batch file (the script DedupeFilesInFolder2.vbs will remain in your System32 folder though), or copy the tool back into your system folder manually after you have run the uninstall batch file.
Script Settings in the System Registry
The install script automatically creates the default settings for the script execution in the Windows Registry.
If you did not use the install script and run the DeDupe script for the first time, the script will create the missing registry entries based on the default values specified in the script code itself. Those settings are specifically of importance for the use of the DeDupe shell extension in Windows Explorer.
The settings in the registry are set for each Windows User separately.
To find and modify the settings, open the Registry Editor that comes with Windows. (Start / Run, enter: Regedit, press “enter”)
Navigate to HKEY_CURRENT_USER \ Software \ DeDupe2 \ Parameters
| Value Name | Type | Default Value | Command Line Param Equivalent |
|---|---|---|---|
| CentralDupeBackupFolder | REG_SZ | C:\[DEDUPE_BACKUP] | /cdb:PATH |
| DeDupeLogFName | REG_SZ | !DeDupeLog.txt | /logfile:FILENAME |
| DupeAction | REG_DWORD | 3 | /action:N |
| FileListFName | REG_SZ | !DeDupe-FileList.txt | /listfile:FILENAME |
| MaxForks | REG_DWORD | 31 (HEX) or 49 (Decimal) | /threads:NN |
| Quiet | REG_DWORD | 0 | /quite:N |
| WriteDeDupeLog | REG_DWORD | 1 | /log:N |
| WriteFileList | REG_DWORD | 0 | /list:N |
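The mapping between registry values and command line switches can be captured in a small lookup. The Python sketch below builds the switches needed to override individual settings for a single run; the names `SWITCH_FOR_SETTING`, `DEFAULTS` and `switches_for` are made up for this example, while the value names, defaults and switches come from the list above.

```python
# Registry value name -> command line switch prefix.
SWITCH_FOR_SETTING = {
    "CentralDupeBackupFolder": "/cdb:",
    "DeDupeLogFName": "/logfile:",
    "DupeAction": "/action:",
    "FileListFName": "/listfile:",
    "MaxForks": "/threads:",
    "Quiet": "/quite:",
    "WriteDeDupeLog": "/log:",
    "WriteFileList": "/list:",
}

# Default values as created by the install script.
DEFAULTS = {
    "CentralDupeBackupFolder": r"C:\[DEDUPE_BACKUP]",
    "DeDupeLogFName": "!DeDupeLog.txt",
    "DupeAction": 3,
    "FileListFName": "!DeDupe-FileList.txt",
    "MaxForks": 49,  # 0x31, zero-based, i.e. 50 threads
    "Quiet": 0,
    "WriteDeDupeLog": 1,
    "WriteFileList": 0,
}


def switches_for(overrides):
    """Return the switches for every override that differs from its default."""
    unknown = set(overrides) - set(SWITCH_FOR_SETTING)
    if unknown:
        raise KeyError("Unknown setting(s): " + ", ".join(sorted(unknown)))
    return [
        SWITCH_FOR_SETTING[name] + str(value)
        for name, value in overrides.items()
        if DEFAULTS[name] != value
    ]
```

For example, `switches_for({"DupeAction": 5, "MaxForks": 49})` yields only `["/action:5"]`, because 49 is already the default for MaxForks.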
Upgrade from Previous DeDupe Versions
You might have noticed that the main script has the name “DedupeFilesInFolder2.vbs”. The previous version of the script has the name “DedupeFilesInFolder.vbs”.
The install script also creates a different shell extension with the name “DeDupe2”. If you installed version 1.x of the script and then use the install script for version 2, both scripts will be installed on your machine. You could continue to use them in parallel if you want to, but I would not recommend it to the average user. I suggest running the uninstall script of the previous version first and then the install script of the new version.
Note: If you run the uninstall script of the previous version AFTER you installed version 2 of the script, version 2 will no longer function properly because the uninstall script of the previous version also removes the 3rd party tool “md5sum.exe” from the System Directory. You either have to copy that tool back to the windows system directory manually or run the installation script for Version 2 once more. Doing that will overwrite any settings in the registry, which you might have changed already.
About the Software
The DeDupe Windows Explorer Shell Extension Script Tool is written in VBScript and is executed by the system tool WScript.exe. The DeDupe script (DedupeFilesInFolder2.vbs) uses a small support tool that it requires to work properly.
“md5sum.exe” is a small command line tool that returns the MD5 Check Sum value for a file.
It can also validate MD5 check sums, which is a feature that is not used by the DeDupe script.
You can find out more information about it at http://etree.org/md5com.html
Md5Sum was written by [email protected]
Legal Stuff/Copyright and Disclaimer
The 3rd party tool that comes with the DeDupe script is freeware and can be used and copied by anybody without the need for a license or to pay a fee. Since I did not write that tool, I cannot take any responsibility for any issues it might cause, via my script or without it.
This DeDupe script is also freeware and can be used, copied and modified for free.
The author of this software accepts no responsibility for damages resulting from the use of this product and makes no warranty or representation, either express or implied, including but not limited to any implied warranty of merchantability or fitness for a particular purpose.
This software is provided “AS IS”, and you, its user, assume all risks when using it.
- MD5Sum Determination Issue Resolved for file names with spaces in it
- Sorting by File Name Issue Resolved, now the “original” is really the first one sorted by name
- Progress dialog implemented to show status
- Quiet option implemented to suppress all dialogs
- Return of ErrorLevels implemented for batch scripts that call the script
- Rename logic changed, [DEDUPED] added to the renamed file in addition to existing logic
- touch.exe tool removed. It did not work reliably, period
- File List output with file names and their MD5 checksums implemented
- Log File output implemented
- command line parameters introduced to suppress file list and log file creation as well as to enable/disable “quiet” mode
- general code clean up
- MD5Sum Determination now separate Step using Multi-Threading for increased Performance
- New DupeActions 2, 3, 4 and 5 implemented:
- 2 = Rename dupes as in 1, but MOVE to a new sub folder “[Deduped]” of the path being processed
- 3 = Don’t rename dupes, just MOVE to a new sub folder “[Deduped]”
- 4 = Delete dupes
- 5 = Create sub folder at specified location with name “yyyy-mm-dd_hh-mm-ss_FolderName”, create index file “!Index.txt” with archive location and name and original locations of files, separated by “|”
- Script now Enforces CSCRIPT.EXE (call from Shell Extension still uses WScript, because if I run it with Cscript from there a stupid DOS Shell Window is visible and open all the time)
- Message Output changed to use IE because of CSCRIPT execution in batch mode (which suppresses Wscript.echo)
- Settings now Saved in Registry, Manual overwrite via command line parameters is still possible. Use of Defaults in Code vs. Registry is also an option.
I hope that you will find this script useful. Please let me know your opinion, suggestions, feedback and recommendations for improvements via the comments section down below.
Carsten aka Roy/SAC