Disclaimer:
Before I get started with the history of Mime files as I see it, please be aware that there is a difference between UUE encoded files and Base64 encoded files, although for simplicity I lump them both together and call them Mime. Technically, this is not correct. Also, some of what I proclaim is based on supposition and is not based on formal research.
Definitions
Uuencode was designed to allow UNIX binary files to be easily transferred through text-only interfaces, such as e-mail. Every uuencoded file contains a line similar to: begin 644 usa-map.gif followed by a series of lines of ASCII text characters (a full line is normally 61 characters long: the letter 'M' followed by 60 encoded characters).
The file ends with a line containing the word 'end'. There may be other special keywords included. Externally, uuencoded files are usually denoted with the suffix ".uu" or ".uue". Usually, one won't find Macintosh files in uuencode format; however, most non-Macintosh specific binary data posted to Usenet is uuencoded. The programs 'uuencode' and 'uudecode' exist on most UNIX systems.
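For readers who want to see where the leading 'M' comes from, here is a minimal sketch in Python. It assumes the standard library function binascii.b2a_uu, which encodes one line at a time (at most 45 input bytes); the 45 sample bytes are made up purely for illustration.

```python
import binascii

# 45 arbitrary bytes: the most that fits on one uuencoded line.
data = bytes(range(45))

# b2a_uu encodes a single line of uuencoded text, newline included.
line = binascii.b2a_uu(data)

# The first character carries the byte count: chr(32 + 45) == 'M'.
length_char = chr(line[0])
body = line[1:].rstrip(b"\n")

print(length_char, len(body))          # M 60
assert binascii.a2b_uu(line) == data   # decoding restores the original bytes
```

A shorter final line starts with a different character, because that first character encodes how many bytes the line holds.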
Base64 is the encoding format used by Multipurpose Internet Mail Extensions (Mime) files. The reason Mime uses Base64 rather than the more popular uuencode format is that uuencode is not really a standard but rather a collection of related but different formats. This rendered uuencode impractical as a cross-platform encoding format.
A Little History
In the earlier days of the Internet, as people were enjoying the capability of writing each other email notes and messages, some of the propeller heads (like some of you) thought it would really be nice if they could send a data file or executable program to each other. Without getting too technical (yet), all the words and notes sent over the Internet comprise fewer than 65 unique letters, numbers, and special printable symbols. I will refer to each one of these letters, numbers, or symbols as a "byte" or "character". Every byte to a computer is made up of 8 bits, which allows for 256 unique combinations of bits, including the 64 unique combinations of "displayable" letters, numbers, etc., previously mentioned.
So, there are 256 combinations available, but only 64 used by email. What are the rest of the combinations used for? Well, to make a long story a little less long, at least some of those combinations are used to tell computers, modems, and other communication devices what to do (control characters). If you tried to send a file across the Internet that contained some of those communication control characters, the only predictable result would be that the communication fails, or who knows what else gets corrupted. Don't fear, none of the software or hardware would let you do such a thing, but you would lose the integrity of the file.
So, some of the "techno-weenies" came up with a grand scheme to translate a file that could contain such data into the 64 displayable characters (or groups of them) and then have software on the receiving end "decode" it back. OK, we're done with the techno-weenie stuff temporarily. We'll get into the detail of this much later, for only those who have an understanding of binary numbers. As is the case with most unique solutions, several different "techno-weenie" groups figured out several different algorithms around the same time.
There were several "encoding" and "decoding" techniques, but today, Mime (AKA UUE) is the de facto standard. Note: this is more or less the truth.
File Attachments
When you attach a file that is not pure text (meaning it contains combinations outside the range of 64 combinations mentioned earlier), either your Internet Service Provider (ISP) or your email software will normally encode that file for you. It is possible that some software on the receiving end will not be able to determine that the file is a "mime" file, and will leave the file encoded. That is when you see all the lines of what appears to be garbage text. There are various reasons for this happening, but, luckily, it no longer happens frequently.
One of the biggest problems I have is that my email software (it is a little old) cannot handle the new Windows 95/98 "long" file names. When that happens, it retrieves my attachments and shows the files as "0000001.doc", "0000002.doc", etc., even though they are not always Word documents. At least the body of the email message often has the long file name in it. Thus, I can see it is "BigDogJumpingInWater.jpg", for example. When I click on the attachment, my email software lets me pick the name I want to save it as, plus it gives me the option to "open" the file. I can then give it a short file name and the correct extension. Giving it the correct extension allows my computer to use the correct software to launch/open the file. Using the correct software is a result of setting up the right "Association" to file types.
If the file name does not appear in the message, I now have to play a guessing game. Even though the file attachment shows up as "0000001.doc", it could be a JPG, DOC, MVI, EXE, or something else. I will normally try to save the file with one of those extensions. If I am wrong, the software will usually not be able to open the file.
Decoding Tips
If your email shows a lot of garbage text and you think it is a Mime file, the simplest thing is to try to save the file with an extension of "UUE". Example: My attachment shows up as "0000001.DOC" and appears to be a mime file when Word opens it up. I then close Word, open the attachment again (usually by double-clicking the file attachment icon), and do a "save as" to a file name such as temp1.uue.
My software has an option when I double-click as to whether I want to "open" the file. By selecting that option, it then prompts me for the file name I want to save it as. I just change the dummy file name (0000001.doc) to temp1.uue. This causes the internal software to automatically launch its decoder. You probably already have this software on your system, but if you don't, you can get a "mime" decoder/encoder shareware package off the Internet. One I have had for years that works fine is a shareware package named Wincode. Use your search engine to find this if you think you need it. I still occasionally cannot trick my system into decoding what I recognize as a "mime" file, and resort to using this software directly against the saved "mime" file. If I save it with a ".UUE" extension, Wincode will decode it more easily.
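If you are comfortable with a little programming, a saved uuencoded file can also be decoded by hand. What follows is only a rough sketch in Python, not what Wincode or any email package actually does: the function name decode_uue and the sample file name photo.jpg are made up for illustration, and it assumes the classic layout described under Definitions (a "begin" line, data lines, then "end").

```python
import binascii

def decode_uue(text):
    """Return (filename, data) from the text of one uuencoded attachment."""
    filename = None
    chunks = []
    for line in text.splitlines():
        if filename is None:
            # Skip mail headers and message text until the "begin" line.
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "begin":
                filename = parts[2]
            continue
        if line == "end":
            break
        if line in ("", "`", " "):
            # Zero-length line that marks the end of the data.
            continue
        chunks.append(binascii.a2b_uu(line.encode("ascii")))
    if filename is None:
        raise ValueError("no 'begin' line found")
    return filename, b"".join(chunks)

# Build a small sample message (hypothetical content) and decode it again.
payload = b"Hello, attachment!"
sample = (
    "Some message text\n"
    "begin 644 photo.jpg\n"
    + binascii.b2a_uu(payload).decode("ascii")
    + "`\nend\n"
)
name, data = decode_uue(sample)
print(name, data)
```

The file name recovered from the "begin" line is exactly the piece of information that gets lost when an attachment shows up as "0000001.doc".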
Bits and Bytes
Now for the fun stuff. If you don't understand binary numbers and bytes, you can still get some interesting facts from what follows. What I find interesting is how someone came up with the idea to convert files containing 8 bits (256 combinations, 0 to 255) to 64 combinations (6 bits, 0 to 63). The way they did this was pretty ingenious because they did it in such a way that the "encoded" file is only 33 percent larger than the original size. I've been involved with projects where we wanted to do something similar, and we came up with the idea of taking every half byte (AKA a nibble) and expanding it to a full byte. However, this made the "encoded" file twice the size of the original file, which makes transmission time longer than for a "mime" file. As an example, we would convert the following 3 bytes (X'03F6CD') to the EBCDIC display value for each half byte (characters 03F6CD, which is X'F0F3C6F6C3C4'). Although the encoded output consists of valid EBCDIC (mainframe) display characters, it is twice the size of the original input. You can use the same technique for ASCII; only the character representation differs.
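The half-byte expansion just described can be tried out in a few lines of Python. This is only an illustrative sketch, not the routine from those old projects; it assumes Python's built-in "cp037" codec as a stand-in for the mainframe EBCDIC code page.

```python
original = bytes.fromhex("03F6CD")       # the three input bytes X'03F6CD'

# Expand each half byte (nibble) into one display character.
display = original.hex().upper()         # "03F6CD"

# The same six characters as EBCDIC bytes and as ASCII bytes.
ebcdic = display.encode("cp037")
ascii_bytes = display.encode("ascii")

print(ebcdic.hex().upper())              # F0F3C6F6C3C4
print(ascii_bytes.hex().upper())         # 303346364344
print(len(original), "->", len(ebcdic))  # 3 -> 6
```

Either way, 3 input bytes become 6 output bytes, which is the doubling mentioned above.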
Mime, however, treats the input as one continuous stream of bits that runs across byte boundaries, and turns each 6 bits it encounters into a new output byte. If we look at the previous example of the three bytes in binary, we would get the following bit pattern. I'll use a period between each half byte and a dash between each byte:
0000.0011-1111.0110-1100.1101
As you can see, there are 24 bits (8 bits per byte, and 3 bytes). Now, a "mime" encoded file would go through these bits from left to right making groups of 6 bits just as it encounters them. As an example, the 6-bit groupings with a dash between each group would look like this:
000000-111111-011011-001101
Note that the exact same bits are still there; I just broke them down into smaller groups. There are now four groups of 6 bits instead of three groups of 8 bits. If you then put two high-order zero bits on each group, it would look like the following in terms of 8-bit bytes:
0000.0000-0011.1111-0001.1011-0000.1101
We now have 4 distinct bytes, and each byte ALWAYS has two high-order zero bits. This means that each byte can only have a binary number range of 0 to 63. Using each one of these bytes as a numeric value ranging from zero to 63, we could make up a table of valid "displayable" letters, numbers, and symbols for all 64 possibilities. And that is the way it is done. The Base64/Mime Encoding Table is as follows:
Table entries 0 through 31 (32 entries):
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdef
Table entries 32 through 63 (32 entries):
ghijklmnopqrstuvwxyz0123456789+/
If we take the new 4 groups of bits we made up, they would represent the following values:
000000 = 0, 111111 = 63, 011011 = 27, 001101 = 13
Using our table to translate each of these values into one of our characters listed above, they would then be represented as:
0 = A, 63 = /, 27 = b, 13 = N
And our new encoded file would be "A/bN". The input file we wanted to send would go from 3 bytes to 4 bytes. It would be one third larger. If you see a file attachment on the Internet that is 133,333 bytes, when it is decoded on your machine, it will only be about 100,000 bytes, and now you know why. That same 100,000 byte file using my old routine mentioned earlier would have occupied 200,000 bytes. You can see the benefit of using a technique such as "mime". Although it would be faster to transmit the file "as is", at least "mime" encoded files are only one-third again their original file size. Note that the ratio is not exact because there are other standards involved that specify file name, dates, and number of attachments, all wrapped up in that "mime" file. But in general, a mime file is 1.333 times larger than its original file.
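For those who want to check the arithmetic, the regrouping and table lookup can be sketched in Python. The names TABLE, sextets, encode_group, and encoded_size are my own, chosen for illustration, and the sketch only handles complete 3-byte groups (real Base64 also pads a short final group with "=", which is not covered here). The result is compared against Python's standard base64 module.

```python
import base64

# The 64-entry table: A-Z, a-z, 0-9, then + and /.
TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def sextets(three_bytes):
    """Split exactly 3 bytes (24 bits) into four 6-bit values."""
    n = int.from_bytes(three_bytes, "big")
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]

def encode_group(three_bytes):
    """Translate one 3-byte group into 4 table characters."""
    return "".join(TABLE[v] for v in sextets(three_bytes))

def encoded_size(n_bytes):
    """Characters needed for n_bytes of input, ignoring line breaks and headers."""
    return 4 * ((n_bytes + 2) // 3)

group = bytes.fromhex("03F6CD")
print(sextets(group))          # [0, 63, 27, 13]
print(encode_group(group))     # A/bN
print(encoded_size(100_000))   # 133336

# Cross-check against the standard library.
assert encode_group(group) == base64.b64encode(group).decode("ascii")
assert encoded_size(100_000) == len(base64.b64encode(bytes(100_000)))
```

The 133,336 figure is the bare encoded data; line breaks and mail headers in a real message add a little more on top.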
Note that if we have a "standard" table assignment we can easily use the reverse process to decode that text file on the receiving end. Although it is fairly CPU intensive to reorganize the bit patterns and translate them to text, that time is negligible compared to the slow speed of modems.
One final side note: If you attach a file while on your specific ISP and it is to go to someone using the same ISP, the ISP will probably not convert it to "mime" because it does not have to be sent over Internet connections. You are on their computer and sending a file to someone else on their computer.
I hope this at least helps your general understanding of Mime and Email transmissions. If anyone reading this has any suggestions to help clarify this topic, feel free to Email us.
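As a footnote to the reverse process mentioned above, here is a minimal decoding sketch in Python. The name decode_group is made up for illustration, and like any sketch it cuts corners: it assumes a complete 4-character group with no "=" padding and no line breaks.

```python
import base64

# The same 64-entry table used for encoding.
TABLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_group(four_chars):
    """Turn 4 table characters back into the original 3 bytes."""
    n = 0
    for ch in four_chars:
        # Each character contributes its 6-bit table position.
        n = (n << 6) | TABLE.index(ch)
    return n.to_bytes(3, "big")

decoded = decode_group("A/bN")
print(decoded.hex().upper())   # 03F6CD

# Cross-check against the standard library.
assert decoded == base64.b64decode("A/bN")
```

Looking up each character with TABLE.index is slow but easy to follow; a real decoder would more likely use a precomputed reverse table.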
An official Web site that covers Mime/Base64 encoding can be found at "The Mime Information Page":
Copyright © 2000 Softech Solutions, Inc. All rights reserved.