ASCII to Unicode Converter Suite 2.0B Released

Published on: June 11, 2010

Welcome to a new installment of my ASCII to Unicode/HTML converter tool. I renamed it to "Suite" because the package now consists of three individual tools. The main focus is still the conversion of "High-ASCII" or "Block ASCII" art for display on your web site, but it offers a few more things as well.

Release Notes

A little toolset that converts MS-DOS ASCII files, such as NFOs or FILE_ID.DIZ text files (Code Page 437, USA, and others), to HTML-encoded Unicode (the output is still an ASCII file) that can be used to display ASCII art, specifically "High ASCII" or block ASCII art, on a web site. You can also convert the ASCII files to real Unicode text files for use in Windows apps.

There are 3 different programs.

  1. "batchconvert2.exe" is the ASCII to Unicode converter with a simple step-by-step user interface (for an illustration, see the image "Batch-ASCII2WebOrUniCode-Steps20.jpg" in the sub-folder "screenshots").
  2. "batchconvert2cli.exe" provides the same features as "batchconvert2.exe", but as a command-line tool for easy batch processing from a .BAT script etc.
  3. "ASCIIConverterExtendedGUI1.exe" is a new version of my tool with a more user-friendly single-screen user interface. This version also provides some new features not available in the other two programs: you can convert Unicode text files or HTML-encoded Unicode ASCII back to MS-DOS ASCII text files. See some screenshots of the interface in the "screenshots" sub-folder.

[Screenshot: ASCIIConverterExtendedGUI]

For this release I decided to include only the .VBS and .HTA scripts converted to Win32 executables, not the script sources themselves. The amount of code grew significantly and might be confusing. It's not that I don't want to share it. If you want the source files, contact me and I will give them to you. Null Problemo!

Examples

The example files can be found in the sub-folder "examples".

File: roy.asc

[CP437 block-character logo reading "R E L ." and "2 0 0 9"; see roy.asc in the examples folder for the original bytes]

For example, the small logo above (which is in DOS ASCII format) would be converted to the following:

File: roy.web

───────── ???????????? ▄ ▄▄▄▄ ▄▄▄▄▄ ???????????? ▄▄▄▄???? ▄▄▄▄▄▄ ???????????? ▄▄▄?? ▄ ▄▄▄ ▄ ▄▄ ???????????? ▄ ▄▄▄▄▄ ▄ ─── ???????????? ──────???????? ·R·E·L ???????????? ·.·?? ▄ ▄▄▄▄ ▀▀▀ ???????????? ▀▀▀███ ▀▀▀▀ ???????????? ▀▀▀▀███ █ ██ ???????????? █ ▀ ███ ▄▄▄▄ ▄??? ???????????? ·2·0·1·0· ???

Note: I manually added line breaks and spaces to the example for NFO layout reasons, and it is not the entire logo. See the example files for the real results.

Unicode Text file Output.

File: roy.txt

This example file illustrates the results of a conversion of the roy.asc MS-DOS ASCII file to a Unicode Text file.
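The conversion to a Unicode text file can be sketched in a few lines of Python. This is only an illustration, not the code of my tools (those are VBS/HTA scripts); the function names are made up for the example, and writing a UTF-16 little-endian byte order mark at the start of the output is an assumption about what Windows applications expect.

```python
from pathlib import Path

# Assumption: Windows "Unicode" text files are UTF-16 little-endian with a BOM.
UTF16_LE_BOM = b"\xff\xfe"


def dos_bytes_to_utf16(raw: bytes, codepage: str = "cp437") -> bytes:
    """Decode DOS code page bytes and re-encode them as UTF-16 LE with a BOM."""
    text = raw.decode(codepage)  # every byte has a mapping in cp437
    return UTF16_LE_BOM + text.encode("utf-16-le")


def convert_file(src: str, dst: str, codepage: str = "cp437") -> None:
    """Hypothetical helper: convert one file, e.g. roy.asc to roy.txt."""
    # Work on raw bytes so that CR/LF line endings pass through untouched.
    Path(dst).write_bytes(dos_bytes_to_utf16(Path(src).read_bytes(), codepage))
```

A call like convert_file("roy.asc", "roy.txt") would then produce a file that Notepad and other Windows apps should open as Unicode.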

Notes on the HTML Encoding:

The Web Encoded ASCII does not include <BR> Tags for the line-breaks. You have to add those manually or do what I do on my web site and enclose the code in <PRE> </PRE> tags, which preserves the line-breaks within the enclosed text.

Since the result is Unicode and not DOS ASCII anymore, you can use any monospace font to display the ASCII somewhat correctly. You won't get a 100% accurate result anyway, because the old MS-DOS font set is not part of Windows anymore. What you can get is only a close approximation. On my web site I use the font "Lucida Console", which seems to be installed on many machines. If a visitor to your page does not have that font installed, the browser falls back to the next font in the list, here the generic monospace font. I use the following CSS formatting for the PRE tags where I show ASCIIs.

pre {
background-color:#000;
color:#FFF;
display:block;
font-family:"Lucida Console", monospace;
font-size:9pt;
line-height:12px;
padding:10px;
text-align:left;
}

If the PRE tag is also used for something else on your web site, you can instead define the formatting for a specific class selector like

pre.asciiart { ... }

You would then also have to extend the PRE HTML tag like this:
<PRE class="asciiart"> ... </PRE>

Batchconvert2.exe

The script is designed to convert all files with a specified extension (.ASC by default) to web ready files with a new extension (.WEB by default, but you could also make it .HTML or whatever).

There are two additional options, for which I recommend using the defaults (which are "yes").

The first option is "HTML Encode?", which means that all characters outside of 7-bit US-ASCII will be converted to HTML codes like &#XXX;. Standard ASCII characters that could be misinterpreted by HTML, DHTML or XML are encoded as well: " becomes &quot;, & becomes &amp;, < becomes &lt; and > becomes &gt;, etc. If you select "No", the ASCII will be converted to Windows Unicode (UTF-16) instead.

The second one is "Sanitize?". It only comes up if you decide to convert files to Unicode text files, not if you select HTML encoding, because it is applied automatically for HTML encoding.

If you select "Yes", it removes ASCII characters with a code smaller than 32, which are special control characters that cannot be printed anyway, with three exceptions: chr(10) = line feed, chr(13) = carriage return and chr(9) = tab. LF and CR remain unchanged. Tab characters are converted to 8 spaces, which is the default MS-DOS tab stop.
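To make the two options more concrete, here is a minimal Python sketch of what they do. It is not the source of my tools; the function names are invented for the example, and it assumes that only the four special characters listed above get named entities while everything outside 7-bit ASCII becomes a decimal &#NNN; code.

```python
# Named entities for the ASCII characters that HTML could misinterpret.
HTML_SPECIAL = {'"': "&quot;", "&": "&amp;", "<": "&lt;", ">": "&gt;"}


def sanitize(text: str) -> str:
    """Replace each tab with 8 spaces and drop control chars < 32, keeping LF and CR."""
    out = []
    for ch in text:
        if ch == "\t":
            out.append(" " * 8)
        elif ord(ch) < 32 and ch not in "\r\n":
            continue  # unprintable control character, removed
        else:
            out.append(ch)
    return "".join(out)


def html_encode(text: str) -> str:
    """Encode special ASCII characters and everything outside 7-bit ASCII."""
    out = []
    for ch in text:
        if ch in HTML_SPECIAL:
            out.append(HTML_SPECIAL[ch])
        elif ord(ch) > 127:
            out.append(f"&#{ord(ch)};")  # decimal numeric character reference
        else:
            out.append(ch)
    return "".join(out)


def convert(raw: bytes, codepage: str = "cp437") -> str:
    """Decode DOS bytes, sanitize, then HTML encode (sanitize always runs here)."""
    return html_encode(sanitize(raw.decode(codepage)))
```

For instance, the CP437 byte 0xDB (the full block) decodes to U+2588 and ends up as &#9608; in the output.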

[Screenshot: Batch-ASCII2WebOrUniCode-Steps20]

Batchconvert2Cli.exe

The command-line version of my converter tool. The only required parameter is the path to the directory to process. If nothing else is specified, all files with the extension .ASC will be converted to HTML-encoded Unicode. Output files are written to the same directory with the same file name, but with the file extension .WEB. If the tool is called without any parameters, the HELP text is shown.

Usage:

BATCHCONVERT2CLI.EXE PATH [/fext:|/text:|/cp:|/san:|/html:]

Optional Parameters (Named Arguments):


/fext: select files with specified extension to be converted (default is "asc")
/text: extension of converted files (default is "web")
/cp: Codepage (CP437/CP850...) (default is "CP437")
/san: Sanitize (y/n), replace tabs with spaces + remove control chrs* (default is "y")
* ASCII Code < 32, except for line breaks
/html: HTML Encode (y/n), encode Unicode characters for HTML (&#CHR;)* (default is "y")
* Note: Output will still be a text file; if "n" then output encoding is UTF-16

Example:


BATCHCONVERT2CLI.EXE "C:\NFOS\" /fext:NFO /text:HTML /cp:CP850 /san:y /html:y

  • Processes all files in folder "C:\NFOS\" with extension .NFO.
  • Generates output files in the same folder with the same file base name.
  • Uses file extension .html for output files.
  • Assumes Code Page 850 (Western Europe) for all processed files.
  • Sanitizes the data and then converts the ASCII text to HTML-encoded Unicode entities.
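If you want to build something similar yourself, the named arguments and the output file naming from the example above could be handled roughly like this in Python. This is a sketch under assumptions, not how BATCHCONVERT2CLI.EXE is implemented: the function names are hypothetical, the output extension is lowercased because the example turns /text:HTML into .html, and a missing path raises an error instead of showing the help text.

```python
import os

# Defaults as listed under "Optional Parameters (Named Arguments)".
DEFAULTS = {"fext": "asc", "text": "web", "cp": "CP437", "san": "y", "html": "y"}


def parse_named_args(args):
    """Split a list like [PATH, '/fext:NFO', ...] into (path, options)."""
    options = dict(DEFAULTS)
    path = None
    for arg in args:
        if arg.startswith("/") and ":" in arg:
            name, value = arg[1:].split(":", 1)
            name = name.lower()
            if name not in options:
                raise ValueError(f"unknown option: /{name}:")
            options[name] = value
        elif path is None:
            path = arg  # first plain argument is the directory to process
        else:
            raise ValueError(f"unexpected argument: {arg}")
    if path is None:
        raise ValueError("PATH is required")
    return path, options


def output_name(filename, options):
    """Return the output file name, or None if the file is not selected."""
    base, ext = os.path.splitext(filename)
    if ext.lower() != "." + options["fext"].lower():
        return None  # extension does not match /fext:
    return base + "." + options["text"].lower()
```

With the options from the example, a file named RELEASE.NFO would map to RELEASE.html in the same folder, while files with other extensions are skipped.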

[Screenshot: BatchConvert2cli]

MS-DOS Code Pages


CP437 - Latin US/United States/Canada
CP737 - Greek
CP775 - Baltic Rim
CP850 - Latin 1 (Western Europe: DE, FR, ES)
CP852 - Latin 2 (Central Europe: PL, BA, HR, HU, CZ, SK)
CP855 - Cyrillic (RU, BG, UA)
CP857 - Turkish, TR
CP858 - Latin 1 Alt (= 850, 0xD5 = U+20AC EURO SYM)
CP860 - Portuguese, PT
CP861 - Icelandic, IS
CP862 - Hebrew, IL
CP863 - Canada, CA (French)
CP864 - Arabic
CP865 - Nordic (except IS) (DK, SE, NO, FI)
CP866 - Cyrillic Russian (based on GOST 19768-87)
CP869 - Greek 2 (IBM Modern GR)
CP874 - MS-DOS Thai
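Why the code page matters is easy to show: the same byte means different characters in different code pages. The short Python sketch below uses the codecs that ship with Python's standard library (their names happen to match the CP numbers above); the helper name decode_dos is made up for the example.

```python
import codecs


def decode_dos(raw: bytes, codepage: str = "CP437") -> str:
    """Decode bytes with the named code page; unknown names raise LookupError."""
    codec = codecs.lookup(codepage)  # lookup is case-insensitive ("CP850" works)
    return raw.decode(codec.name)


# The same byte, decoded with different code pages:
cent = decode_dos(b"\x9b", "CP437")      # cent sign in CP437
o_slash = decode_dos(b"\x9b", "CP850")   # o with stroke in CP850
dotless_i = decode_dos(b"\xd5", "CP850")  # dotless i in CP850
euro = decode_dos(b"\xd5", "CP858")      # euro sign in CP858
```

So if the accented letters in a converted NFO look wrong while the block graphics look fine, the file was probably written with a different code page than the one selected.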

Windows Code Pages


CP1250 - Windows Latin-2
CP1251 - Windows Cyrillic
CP1252 - Windows Latin-1
CP1253 - Windows Greek
CP1254 - Windows Turkish
CP1255 - Windows Hebrew
CP1256 - Windows Arabic
CP1257 - Windows Baltic (1)
CP1258 - Windows Vietnamese
CP874 - Windows Thai
CP932 - Windows Japanese
CP936 - Windows Chinese (PRC)
CP949 - Windows Korean
CP950 - Windows Chinese (HK)

Disclaimer and Waiver

These tools are freeware. Do with them whatever you like, except sell them. You can use them for free, copy them, share them, whatever. You are using them at your own risk. You cannot hold me liable for any damage or loss of data that might result directly or indirectly from the use of my tools.

Downloads

Download the Release ZIP file: RoySAC-ASC2UNI20B.ZIP (1.1 MB)

Enjoy the Tool! Cheers!

Carsten aka Roy/SAC
