ASCII
[originally an acronym (American Standard Code for Information Interchange) but now merely conventional] The predominant character set encoding of present-day computers. The standard version uses 7 bits for each character, whereas most earlier codes (including early drafts of ASCII prior to June 1961) used fewer. This change allowed the inclusion of lowercase letters (a major win), but it did not provide for accented letters or any other letterforms not used in English (such as the German sharp-S ß, or the ae-ligature æ, which is a letter in, for example, Norwegian). It could be worse, though. It could be much worse. See EBCDIC to understand how. A history of ASCII and its ancestors is at http://www.wps.com/texts/codes/index.html.
Computers are much pickier and less flexible about spelling than humans; thus, hackers need to be very precise when talking about characters, and have developed a considerable amount of verbal shorthand for them. Every character has one or more names — some formal, some concise, some silly. Common jargon names for ASCII characters are collected here. See also individual entries for bang, excl, open, ques, semi, shriek, splat, twiddle, and Yu-Shiang Whole Fish.
This list derives from revision 2.3 of the Usenet ASCII pronunciation guide. Single characters are listed in ASCII order; character pairs are sorted by first member. For each character, common names are given in rough order of popularity, followed by names that are reported but rarely seen; official ANSI/CCITT names are surrounded by brokets: <>. Square brackets mark the particularly silly names introduced by INTERCAL. The abbreviations “l/r” and “o/c” stand for “left/right” and “open/close” respectively. Ordinary parentheticals provide some usage information.
! | Common: bang; pling; excl; not; shriek; ball-bat; |
" | Common: double quote; quote. Rare: literal mark; double-glitch; snakebite; |
# | Common: number sign; pound; pound sign; hash; sharp; crunch; hex; [mesh]. Rare: grid; crosshatch; octothorpe; flash; |
$ | Common: dollar; |
% | Common: percent; |
& | Common: |
' | Common: single quote; quote; |
( ) | Common: l/r paren; l/r parenthesis; left/right; open/close; paren/thesis; o/c paren; o/c parenthesis; l/r banana. Rare: so/already; lparen/rparen; |
* | Common: star; [splat]; |
+ | Common: |
, | Common: |
- | Common: dash; |
. | Common: dot; point; |
/ | Common: slash; stroke; |
: | Common: |
; | Common: |
< > | Common: |
= | Common: |
? | Common: query; |
@ | Common: at sign; at; strudel. Rare: each; vortex; whorl; [whirlpool]; cyclone; snail; ape; cat; rose; cabbage; |
V | Rare: [book]. |
[ ] | Common: l/r square bracket; l/r bracket; |
\ | Common: backslash, hack, whack; escape (from C/UNIX); reverse slash; slosh; backslant; backwhack. Rare: bash; |
^ | Common: hat; control; uparrow; caret; |
_ | Common: |
` | Common: backquote; left quote; left single quote; open quote; |
{ } | Common: o/c brace; l/r brace; l/r squiggly; l/r squiggly bracket/brace; l/r curly bracket/brace; |
| | Common: bar; or; or-bar; v-bar; pipe; vertical bar. Rare: |
~ | Common: |
The pronunciation of # as ‘pound’ is common in the U.S. but a bad idea; Commonwealth Hackish has its own, rather more apposite use of ‘pound sign’ (confusingly, on British keyboards the £ happens to replace #; thus Britishers sometimes call # on a U.S.-ASCII keyboard ‘pound’, compounding the American error). The U.S. usage derives from an old-fashioned commercial practice of using a # suffix to tag pound weights on bills of lading. The character is usually pronounced ‘hash’ outside the U.S. There are more culture wars over the correct pronunciation of this character than any other, which has led to the ha ha only serious suggestion that it be pronounced “shibboleth” (see Judges 12:6 in an Old Testament or Tanakh).
The ‘uparrow’ name for circumflex and ‘leftarrow’ name for underline are historical relics from archaic ASCII (the 1963 version), which had these graphics in those character positions rather than the modern punctuation characters.
The ‘swung dash’ or ‘approximation’ sign (∼) is not quite the same as tilde ~ in typeset material, but the ASCII tilde serves for both (compare angle brackets).
Some other common usages cause odd overlaps. The #, $, >, and & characters, for example, are all pronounced “hex” in different communities because various assemblers use them as a prefix tag for hexadecimal constants (in particular, # in many assembler-programming cultures, $ in the 6502 world, > at Texas Instruments, and & on the BBC Micro, Sinclair, and some Z80 machines). See also splat.
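As a rough illustration of the overlap, the sketch below reads a hexadecimal constant written with any of the four prefixes named above. The names `HEX_PREFIXES` and `parse_hex_constant` are hypothetical, invented for this example; real assemblers differ in many details (case rules, suffix forms, other uses of the same characters), so this is a sketch under those assumptions rather than a description of any particular tool.

```python
# Hypothetical table: prefix character -> community that calls it "hex",
# taken from the paragraph above.
HEX_PREFIXES = {
    "#": "many assembler-programming cultures",
    "$": "the 6502 world",
    ">": "Texas Instruments",
    "&": "BBC Micro, Sinclair, and some Z80 machines",
}


def parse_hex_constant(token: str) -> int:
    """Parse a prefixed hexadecimal constant such as '$FF' or '&1F'.

    Assumption: the constant is exactly one prefix character followed
    by hexadecimal digits.
    """
    if len(token) < 2 or token[0] not in HEX_PREFIXES:
        raise ValueError(f"not a prefixed hex constant: {token!r}")
    # int(..., 16) accepts both upper- and lowercase hex digits.
    return int(token[1:], 16)


if __name__ == "__main__":
    for tok in ("#10", "$FF", ">a0", "&1F"):
        print(tok, "=", parse_hex_constant(tok), "as used in", HEX_PREFIXES[tok[0]])
```

All four spellings reach the same parser, which is why each community can reasonably claim its prefix character "means hex".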
The inability of ASCII text to correctly represent any of the world's other major languages makes the designers' choice of 7 bits look more and more like a serious misfeature as the use of international networks continues to increase (see software rot). Hardware and software from the U.S. still tends to embody the assumption that ASCII is the universal character set and that characters have 7 bits; this is a major irritant to people who want to use a character set suited to their own languages. Perversely, though, efforts to solve this problem by proliferating ‘national’ character sets produce an evolutionary pressure to use a smaller subset common to all those in use.
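The 7-bit limit is easy to see in practice. The following minimal sketch uses Python's built-in `ascii` codec; the helper names `is_ascii_text` and `try_ascii` are made up for this example. It checks whether a string fits in code points 0 through 127 and shows that letters such as ß and æ do not.

```python
from typing import Optional


def is_ascii_text(text: str) -> bool:
    """Return True if every character fits in 7 bits (code points 0-127)."""
    return all(ord(ch) < 128 for ch in text)


def try_ascii(text: str) -> Optional[bytes]:
    """Encode text as ASCII, or return None if some character has no ASCII code."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for sample in ("hacker", "Straße", "æ"):
        print(repr(sample), is_ascii_text(sample), try_ascii(sample))
```

Software that silently assumes `is_ascii_text` holds for all of its input is the kind of irritant described above.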