Romacode and Terracode

1) Introduction

It seems to me that ASCII and Unicode can be improved and should be replaced. I don't have a detailed solution, just semi-organized snippets of suggestions, which I call "Romacode" and "Terracode".

2) Why not just Unicode?

Why not just Unicode? I know this is very politically incorrect, but some cultures are just inferior to others, at least in some respects. For example, my culture, the Roman Catholic culture that comes from Rome, the Eternal City, has a profoundly bad numbering system: Roman numerals. Counting to ten with abominations like I, II, III, IV, etc., is idiotic, impractical, painful, and stupid. Counting with 1, 2, 3, 4, etc., is far superior. OBJECTIVELY superior, unquestionably superior. With 10 little symbols, I get to represent huge numbers easily, and they even help me calculate! So I just throw the Roman numerals out the window!

Now let's look at writing systems: Chinese characters, also used in Japanese (kanji) and historically in Korean (hanja), often have one complicated symbol per word! These writing systems must have thousands upon thousands of symbols! Learning to read and write them is an ordeal. Writing one word is an art form. And if you try to represent those scripts in a computer, you'll get huge memory requirements, lots of computing cycles, and linear inches of written specifications! And if you invent a new word, you have to invent a new character! Whereas when I write in English (or Latin), I get by with 26 simple symbols, and I can write everything I need to write...

Unicode was invented, not to facilitate communication, but to compensate for human stupidity, awful writing systems, and the Tower of Babel. Yes, it works, but it shoves a lot of unnecessary complexity down the throats of all computers.

3) Unicode or Terracode?

Given that human communication doesn't intrinsically require a complicated monster like Unicode, does that mean we should shove English (or Latin) down everybody's throat? First, that would be nasty. Second, that is humanly impossible. So the engineer answers: we must allow imperfect writing systems and different human languages (without punishing users who don't need such flexibility). Unicode exists and it roughly works. It could probably be improved a bit, perhaps by starting with lots of room for growth, i.e. 32-bit characters. Terracode might be a better name for it ("Terra" means Earth, and we want to represent all writing systems invented on Earth). I would need to add much more here, but that's all I currently have for Terracode.
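For scale, a quick check with Python's standard library: Unicode code points currently run from U+0000 to U+10FFFF (17 planes of 65,536 code points), which needs only 21 bits, while a plain 32-bit code would leave far more headroom. This sketch only does the arithmetic; how Terracode would actually allocate its space is not specified here.

```python
import sys

# Highest code point Python supports; since Python 3.3 this is always U+10FFFF.
max_code_point = sys.maxunicode

# Unicode is organized as 17 planes of 65,536 code points each.
unicode_code_points = 17 * 65_536

# Number of bits needed to store the largest Unicode code point.
bits_needed = max_code_point.bit_length()

# Room available in a plain 32-bit code, as suggested for Terracode.
terracode_code_points = 2 ** 32

print(f"Unicode: {unicode_code_points:,} code points, {bits_needed} bits needed")
print(f"32-bit:  {terracode_code_points:,} code points")
print(f"Headroom factor: about {terracode_code_points // unicode_code_points}x")
```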

4) Romacode

Romacode is an 8-bit code, a much better thought-out ASCII. It's an attempt to "compress" every advancement of our civilization into only 256 symbols (8 bits give 256 choices). Science, technology, music, etc.: everything must be included, so difficult choices will have to be made. But the result would allow computers that use less memory and fewer CPU cycles, while keeping the same features as current computers.

I don't know the details of each symbol, but I guess we would have the following categories:

4.1) Hindu–Arabic numerals (10 symbols). The first Romacode characters are 0, 1, 2, 3, etc., with corresponding bit patterns of "00000000", "00000001", "00000010", "00000011", etc. This seems so blindingly obvious, yet ASCII doesn't do that. Imagine being able to explain to students the binary numbering system, just by looking at the bit patterns of 1, 2, 3, etc.! It might also slightly accelerate conversion between displayed numbers and actual numeric values internal to a computer.
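A minimal sketch of the difference, assuming the digit layout described above (Romacode byte 0 means "0", byte 9 means "9"); those byte values are this proposal's, not an existing standard. In ASCII the digit "0" is byte 0x30 (00110000), so every digit needs an offset subtracted before it becomes a number.

```python
def ascii_to_int(data: bytes) -> int:
    """Parse a decimal number stored as ASCII digits ('0' is 0x30)."""
    value = 0
    for byte in data:
        value = value * 10 + (byte - 0x30)  # subtract the ASCII offset of '0'
    return value


def romacode_to_int(data: bytes) -> int:
    """Parse a decimal number stored as hypothetical Romacode digits (byte n is digit n)."""
    value = 0
    for byte in data:
        value = value * 10 + byte  # the byte already is the digit's value
    return value


# The number 2024 stored as ASCII text and in the proposed Romacode layout.
ascii_digits = b"2024"
romacode_digits = bytes([2, 0, 2, 4])

print(ascii_to_int(ascii_digits), romacode_to_int(romacode_digits))

# The bit patterns students would see for the Romacode digits 0 to 9.
for digit in range(10):
    print(digit, format(digit, "08b"))
```

The saving is one subtraction per digit (a real parser would also validate its input), which fits the "slightly" above.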

4.2) Latin and Greek characters (Latin: 52 symbols; Greek: 48 symbols). Our civilization would not exist without Athens and Rome (i.e. "Roma"). We need to remember that. In addition, Greek characters are very useful in mathematics and the applied sciences. I would keep distinct uppercase and lowercase characters, but maybe they should be ordered so that each uppercase letter is immediately followed by its lowercase form (A, a, B, b, etc.), so alphabetical ordering would be very easy to program.
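To see why the interleaved order helps, here is a sketch under an assumed layout (digits at bytes 0 to 9, then A=10, a=11, B=12, b=13, and so on); these byte values are hypothetical, chosen only for illustration. A plain byte-by-byte comparison then already sorts words alphabetically, whereas in ASCII every uppercase letter sorts before every lowercase one.

```python
import string

# Hypothetical Romacode layout, assumed for illustration only:
# bytes 0-9 are the digits, then each Latin letter uppercase first: A=10, a=11, B=12, b=13, ...
LATIN_START = 10
ENCODE = {str(digit): digit for digit in range(10)}
for index, (upper, lower) in enumerate(zip(string.ascii_uppercase, string.ascii_lowercase)):
    ENCODE[upper] = LATIN_START + 2 * index
    ENCODE[lower] = LATIN_START + 2 * index + 1


def to_romacode(text: str) -> bytes:
    """Encode a string of digits and Latin letters with the hypothetical layout."""
    return bytes(ENCODE[ch] for ch in text)


def letter_index(code: int) -> int:
    """Alphabet position of a Latin letter code, ignoring case (A/a -> 0, B/b -> 1, ...)."""
    return (code - LATIN_START) // 2


def toggle_case(code: int) -> int:
    """Swap case of a Latin letter code; works because each pair starts on an even code."""
    return code ^ 1


words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Byte-wise sort of ASCII: every uppercase letter sorts before every lowercase one.
print(sorted(words, key=lambda w: w.encode("ascii")))
# Byte-wise sort of the interleaved layout: alphabetical, capitalized word just before its twin.
print(sorted(words, key=to_romacode))
```

Case-insensitive comparison and case conversion also reduce to a shift and an XOR, because the two cases of a letter differ only in the lowest bit.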

4.3) Whitespace (4 symbols). Currently, newlines are a mess: the newline character (or character sequence) is not the same on Windows (CR LF), classic Mac OS (CR), and Unix (LF). So there would be one unique newline character. Also a plain space and a distinct non-breaking space. And of course a tabulation character.
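Today, a program that reads text coming from several systems typically has to normalize line endings before doing anything else. A minimal Python sketch of that chore, which one unique newline character would make unnecessary:

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (CR LF) and classic Mac OS (CR) line endings to Unix LF."""
    # Replace CR LF first, so that its CR is not turned into a second newline.
    return text.replace("\r\n", "\n").replace("\r", "\n")


windows = "line one\r\nline two\r\n"
classic_mac = "line one\rline two\r"
unix = "line one\nline two\n"

# All three spellings of the same two lines become identical.
print(normalize_newlines(windows) == normalize_newlines(classic_mac) == unix)
```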

4.4) Punctuation and delimiters (about 30 symbols). Just the usual things like the period, comma, colon, etc. Also plenty of "oriented" delimiters like parentheses, brackets, curly brackets, etc. (I say "oriented" because each one tells an opening from a closing, and it's more robust to program with «oriented» delimiters than "non-oriented" ones.) Human-readable data files would be much easier to parse with more oriented delimiters.
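The robustness argument can be illustrated with the classic stack-based check: because an oriented delimiter says whether it opens or closes, a parser can track nesting and catch mismatches in a single pass. A minimal Python sketch; the set of pairs, including « », is an assumption for illustration.

```python
# Oriented delimiters: each closing symbol names its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{", "»": "«"}
OPENERS = set(PAIRS.values())


def max_nesting_depth(text: str) -> int:
    """Return the deepest nesting level, or raise ValueError if delimiters are unbalanced."""
    stack = []
    deepest = 0
    for position, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(ch)
            deepest = max(deepest, len(stack))
        elif ch in PAIRS:
            if not stack or stack[-1] != PAIRS[ch]:
                raise ValueError(f"unmatched {ch!r} at position {position}")
            stack.pop()
    if stack:
        raise ValueError(f"unclosed {stack[-1]!r}")
    return deepest


print(max_nesting_depth("«outer (inner [deepest]) text»"))  # 3
```

With a non-oriented symbol such as the straight quote, `"a "b" c"` cannot be read as a quote inside a quote: a scanner that toggles on every `"` sees two separate strings with `b` between them, which is why such formats fall back on escape sequences.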

4.5) Basic mathematical symbols (about 20 symbols). Basic arithmetic symbols (including a unary minus distinct from the subtraction operator, a real multiplication sign (×), a real division sign (÷), etc.). A proper negation operator. Logical comparison operators. Superscript and subscript operators (giving us exponents like 3^e and variable names for engineers like W_payload, given a bit of help from the computer display).
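With the single ASCII "-", a tokenizer has to guess from context whether it means subtraction or negation. A minimal sketch of the common rule (a "-" is unary at the start, or right after an operator or an opening parenthesis); the labels SUB and NEG are hypothetical names standing in for the two distinct characters a Romacode-style set would provide, which would make this guessing unnecessary.

```python
import re

# Tokens after which a '-' must be a unary minus.
OPERATORS = set("+-*/(")


def classify_minus(expression: str) -> list[str]:
    """Tokenize an ASCII arithmetic expression, labelling each '-' as SUB or NEG."""
    tokens = re.findall(r"\d+|[-+*/()]", expression)
    labelled = []
    previous = None
    for token in tokens:
        if token == "-":
            # Unary if nothing precedes it, or it follows an operator or '('.
            labelled.append("NEG" if previous is None or previous in OPERATORS else "SUB")
        else:
            labelled.append(token)
        previous = token
    return labelled


print(classify_minus("-3-(-2)"))
```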

4.6) Old-fashioned Gregorian chant neumes (about 10 symbols). No, it won't give us a full-fledged music score system, but it would give one-symbol representations of "do, re, mi, fa, so, la, si", with sharp and flat, etc. The nice thing about Gregorian neumes is that you can represent them without too many pixels, like "do":

[Image: the neume for "do"]

4.7) All the other languages in the history of mankind (256 minus about 175 symbols). No, we cannot represent Terracode directly with only 8-bit characters, but with some modernized and compressed version of Visible Speech (Alexander Melville Bell's 19th-century phonetic notation), it might be doable. The idea is that the human voice can only make a limited number of sounds (fifty-ish?), and to produce those sounds, we need to specify the position and movement of the throat, tongue, teeth, lips, etc. If we could visually represent what you need to do with your "voice system", you could look at a character, see where to put your tongue, how to open or close your lips, what air movement to make, and bingo, you'd have the correct sound. So human speech could be encoded with a small set of symbols.

The idea of a character set encoding the positions of the human voice apparatus is eerily similar to machine code bit patterns, which directly instruct the CPU what to do: reading one such character would tell you exactly how to produce the sound, just as decoding one machine instruction tells the CPU exactly which operation to perform.
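To make the machine-code analogy concrete, here is a hypothetical sketch in which a code is split into bit fields describing the vocal apparatus, the way an instruction word is split into fields. The field names, widths and values are invented for illustration; they are not taken from Visible Speech or any phonetic standard. Six bits give 64 combinations, which would fit in the symbols left over after the other categories.

```python
# Hypothetical field layout inside one code (6 bits, so 64 combinations):
#   bits 5-3: place of articulation, bits 2-1: manner, bit 0: voicing.
PLACES = ["bilabial", "labiodental", "dental", "alveolar",
          "postalveolar", "palatal", "velar", "glottal"]
MANNERS = ["stop", "fricative", "nasal", "approximant"]


def encode_sound(place: str, manner: str, voiced: bool) -> int:
    """Pack articulatory features into one code, like fields of an instruction word."""
    return (PLACES.index(place) << 3) | (MANNERS.index(manner) << 1) | int(voiced)


def decode_sound(code: int) -> tuple[str, str, bool]:
    """Unpack a code back into its features, like a CPU decoding an instruction."""
    place = PLACES[(code >> 3) & 0b111]
    manner = MANNERS[(code >> 1) & 0b11]
    voiced = bool(code & 0b1)
    return place, manner, voiced


b_sound = encode_sound("bilabial", "stop", voiced=True)   # roughly the sound of "b"
p_sound = encode_sound("bilabial", "stop", voiced=False)  # roughly the sound of "p"

print(format(b_sound, "06b"), decode_sound(b_sound))
print(format(p_sound, "06b"), decode_sound(p_sound))
```

In a real table these 64 codes would sit at some offset after the other categories; the offset is left out here to keep the fields visible.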

4.8) Christ! The last symbol is a good old Christian cross, because everything must be recapitulated in Christ (and Christianity is the foundation of the civilization that invented computers).
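The "about 175" in 4.7 can be checked by adding up the approximate category sizes from 4.1 to 4.8; what is left of the 256 codes goes to the speech-based symbols. The sizes are the estimates given above ("about 30", "about 20"), so the result is approximate too.

```python
# Approximate category sizes taken from sections 4.1 to 4.8 of this proposal.
categories = {
    "Hindu-Arabic numerals": 10,
    "Latin letters": 52,
    "Greek letters": 48,
    "Whitespace": 4,
    "Punctuation and delimiters": 30,
    "Basic mathematical symbols": 20,
    "Gregorian chant neumes": 10,
    "Christian cross": 1,
}

used = sum(categories.values())
remaining = 2 ** 8 - used

print(f"{used} symbols assigned, {remaining} left for Visible Speech-style symbols")
```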