ISO/IEC 8859-2

ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central[1] or "Eastern European" languages that are written in the Latin script. ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also referred to as "Latin-2" in Czech and Slovak regions.[2]

ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. As of June 2016, 0.2% of all web pages used ISO 8859-2.[3] Microsoft has assigned code page 28592, a.k.a. Windows-28592, to ISO-8859-2 in Windows. IBM assigned code page 1111 to ISO 8859-2.

Code page 1250, a.k.a. Windows-1250, has many of the same characters but in a different arrangement.
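
The difference in arrangement between these code pages can be seen by encoding the same text with each of them. The following is a minimal sketch, assuming Python's standard codec names `iso8859_2`, `cp1250` and `cp852`; the sample word and the helper names `encode_all` and `shared_positions` are illustrative only.

```python
# Encode the same text with three code pages that are all loosely called "Latin-2".
SAMPLE = "Łódź"  # illustrative sample word, not taken from the standard

def encode_all(text: str) -> dict[str, bytes]:
    """Return the byte sequence for `text` under each code page."""
    codecs_to_try = ("iso8859_2", "cp1250", "cp852")
    return {name: text.encode(name) for name in codecs_to_try}

def shared_positions(codec_a: str, codec_b: str) -> list[int]:
    """Byte values in 0xA0-0xFF that decode to the same character in both codecs."""
    same = []
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        if raw.decode(codec_a, errors="replace") == raw.decode(codec_b, errors="replace"):
            same.append(value)
    return same

if __name__ == "__main__":
    for name, raw in encode_all(SAMPLE).items():
        print(f"{name:10} {raw.hex(' ')}")
    common = shared_positions("iso8859_2", "cp1250")
    print(f"{len(common)} of 96 upper-half positions agree")
```

Text written under one of these code pages and read under another will therefore show some letters correctly and others as the wrong character, which is why the declared charset label matters.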

These code values can be used for the following languages:

It can also be used for Romanian, but it is not well suited to that language because it lacks the letters s and t with comma below (ș, ț) and contains s and t with cedilla (ş, ţ) instead. These letters were unified in the first versions of the Unicode standard, meaning that the appearance with cedilla or with comma was treated as a glyph choice rather than as separate characters; fonts intended for use with Romanian should therefore have characters with comma below at those code points. Microsoft did not provide such fonts for computers sold in Romania. Still, ISO/IEC 8859-2 and Windows-1250 (which has the same problem) have been heavily used for Romanian. Unicode (which supports both variants) has taken the lead for web pages, which nevertheless often contain s and t with cedilla anyway. Unicode notes as of 2014 that separately encoding the letters with comma below was a mistake, causing corruptions of Romanian data.
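
The practical effect for Romanian text can be sketched in Python, assuming the standard `iso8859_2` codec. The helpers `can_encode` and `encode_romanian_latin2` are hypothetical names introduced for this illustration, and the cedilla fallback shown is a lossy workaround, not a recommended practice.

```python
# Comma-below letters (Unicode) and the cedilla letters ISO/IEC 8859-2 provides instead.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015E",  # Ș -> Ş
    "\u0219": "\u015F",  # ș -> ş
    "\u021A": "\u0162",  # Ț -> Ţ
    "\u021B": "\u0163",  # ț -> ţ
})

def can_encode(text: str, codec: str = "iso8859_2") -> bool:
    """Report whether every character of `text` exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def encode_romanian_latin2(text: str) -> bytes:
    """Hypothetical helper: substitute cedilla forms, then encode as ISO-8859-2."""
    return text.translate(COMMA_TO_CEDILLA).encode("iso8859_2")

if __name__ == "__main__":
    print(can_encode("ș"), can_encode("ş"))
    print(encode_romanian_latin2("șț").hex(" "))
```

Decoding the resulting bytes yields the cedilla letters, not the original comma-below letters, so a round trip through ISO/IEC 8859-2 changes the text.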

Code page layout

In the following table, characters are shown together with their corresponding Unicode code points and decimal code values. Code values 00–1F, 7F, and 80–9F are not assigned to characters by ISO/IEC 8859-2. Code 20 is the regular SPACE character, and A0 is the NO-BREAK SPACE. Code AD is a SOFT HYPHEN, which may not be displayed at all in compliant web browsers, even in isolation.

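ISO/IEC 8859-2 (Latin-2)
Each cell gives the character, its Unicode code point, and its decimal code value.

   | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F
0_ | (unassigned)
1_ | (unassigned)
2_ | SP U+0020 32 | ! U+0021 33 | " U+0022 34 | # U+0023 35 | $ U+0024 36 | % U+0025 37 | & U+0026 38 | ' U+0027 39 | ( U+0028 40 | ) U+0029 41 | * U+002A 42 | + U+002B 43 | , U+002C 44 | - U+002D 45 | . U+002E 46 | / U+002F 47
3_ | 0 U+0030 48 | 1 U+0031 49 | 2 U+0032 50 | 3 U+0033 51 | 4 U+0034 52 | 5 U+0035 53 | 6 U+0036 54 | 7 U+0037 55 | 8 U+0038 56 | 9 U+0039 57 | : U+003A 58 | ; U+003B 59 | < U+003C 60 | = U+003D 61 | > U+003E 62 | ? U+003F 63
4_ | @ U+0040 64 | A U+0041 65 | B U+0042 66 | C U+0043 67 | D U+0044 68 | E U+0045 69 | F U+0046 70 | G U+0047 71 | H U+0048 72 | I U+0049 73 | J U+004A 74 | K U+004B 75 | L U+004C 76 | M U+004D 77 | N U+004E 78 | O U+004F 79
5_ | P U+0050 80 | Q U+0051 81 | R U+0052 82 | S U+0053 83 | T U+0054 84 | U U+0055 85 | V U+0056 86 | W U+0057 87 | X U+0058 88 | Y U+0059 89 | Z U+005A 90 | [ U+005B 91 | \ U+005C 92 | ] U+005D 93 | ^ U+005E 94 | _ U+005F 95
6_ | ` U+0060 96 | a U+0061 97 | b U+0062 98 | c U+0063 99 | d U+0064 100 | e U+0065 101 | f U+0066 102 | g U+0067 103 | h U+0068 104 | i U+0069 105 | j U+006A 106 | k U+006B 107 | l U+006C 108 | m U+006D 109 | n U+006E 110 | o U+006F 111
7_ | p U+0070 112 | q U+0071 113 | r U+0072 114 | s U+0073 115 | t U+0074 116 | u U+0075 117 | v U+0076 118 | w U+0077 119 | x U+0078 120 | y U+0079 121 | z U+007A 122 | { U+007B 123 | | U+007C 124 | } U+007D 125 | ~ U+007E 126 | (unassigned)
8_ | (unassigned)
9_ | (unassigned)
A_ | NBSP U+00A0 160 | Ą U+0104 161 | ˘ U+02D8 162 | Ł U+0141 163 | ¤ U+00A4 164 | Ľ U+013D 165 | Ś U+015A 166 | § U+00A7 167 | ¨ U+00A8 168 | Š U+0160 169 | Ş U+015E 170 | Ť U+0164 171 | Ź U+0179 172 | SHY U+00AD 173 | Ž U+017D 174 | Ż U+017B 175
B_ | ° U+00B0 176 | ą U+0105 177 | ˛ U+02DB 178 | ł U+0142 179 | ´ U+00B4 180 | ľ U+013E 181 | ś U+015B 182 | ˇ U+02C7 183 | ¸ U+00B8 184 | š U+0161 185 | ş U+015F 186 | ť U+0165 187 | ź U+017A 188 | ˝ U+02DD 189 | ž U+017E 190 | ż U+017C 191
C_ | Ŕ U+0154 192 | Á U+00C1 193 | Â U+00C2 194 | Ă U+0102 195 | Ä U+00C4 196 | Ĺ U+0139 197 | Ć U+0106 198 | Ç U+00C7 199 | Č U+010C 200 | É U+00C9 201 | Ę U+0118 202 | Ë U+00CB 203 | Ě U+011A 204 | Í U+00CD 205 | Î U+00CE 206 | Ď U+010E 207
D_ | Đ U+0110 208 | Ń U+0143 209 | Ň U+0147 210 | Ó U+00D3 211 | Ô U+00D4 212 | Ő U+0150 213 | Ö U+00D6 214 | × U+00D7 215 | Ř U+0158 216 | Ů U+016E 217 | Ú U+00DA 218 | Ű U+0170 219 | Ü U+00DC 220 | Ý U+00DD 221 | Ţ U+0162 222 | ß U+00DF 223
E_ | ŕ U+0155 224 | á U+00E1 225 | â U+00E2 226 | ă U+0103 227 | ä U+00E4 228 | ĺ U+013A 229 | ć U+0107 230 | ç U+00E7 231 | č U+010D 232 | é U+00E9 233 | ę U+0119 234 | ë U+00EB 235 | ě U+011B 236 | í U+00ED 237 | î U+00EE 238 | ď U+010F 239
F_ | đ U+0111 240 | ń U+0144 241 | ň U+0148 242 | ó U+00F3 243 | ô U+00F4 244 | ő U+0151 245 | ö U+00F6 246 | ÷ U+00F7 247 | ř U+0159 248 | ů U+016F 249 | ú U+00FA 250 | ű U+0171 251 | ü U+00FC 252 | ý U+00FD 253 | ţ U+0163 254 | ˙ U+02D9 255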
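
The chart can be regenerated from a codec, which is a convenient way to check individual cells. This is a minimal sketch, assuming Python's `iso8859_2` codec; the names `LABELS`, `is_graphic`, `cell` and `table_row` are hypothetical. Python's codec also decodes the control ranges (as C0 and C1 controls), so the sketch filters them out to match what ISO/IEC 8859-2 itself assigns.

```python
import unicodedata

# Invisible characters are shown by an abbreviation instead of the character itself.
LABELS = {0x20: "SP", 0xA0: "NBSP", 0xAD: "SHY"}

def is_graphic(value: int) -> bool:
    """True if ISO/IEC 8859-2 assigns a graphic character to this code value."""
    return 0x20 <= value <= 0x7E or 0xA0 <= value <= 0xFF

def cell(value: int) -> str:
    """One chart cell: character, Unicode code point, decimal code value."""
    if not is_graphic(value):
        return "(unassigned)"
    ch = bytes([value]).decode("iso8859_2")
    return f"{LABELS.get(value, ch)} U+{ord(ch):04X} {value}"

def table_row(high_nibble: int) -> list[str]:
    """The sixteen cells of one chart row, e.g. table_row(0xA) for row A_."""
    return [cell(high_nibble * 16 + low) for low in range(16)]

if __name__ == "__main__":
    print(" | ".join(table_row(0xA)))
    # Look up the Unicode name of code AD.
    print(unicodedata.name(bytes([0xAD]).decode("iso8859_2")))
```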
Content from Wikipedia Licensed under CC-BY-SA.

ISO/IEC 8859-4

ISO/IEC 8859-4:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 4: Latin alphabet No. 4, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin-4 or North European. It was designed to cover Estonian, Latvian, Lithuanian, Greenlandic, and Sami. It has been largely superseded by ISO/IEC 8859-10 and Unicode. Microsoft has assigned code page 28594, a.k.a. Windows-28594, to ISO-8859-4 in Windows. IBM has assigned code page 914 to ISO 8859-4. ISO-8859-4 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429.



ISO/IEC 8859-9

ISO/IEC 8859-9:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 9: Latin alphabet No. 5, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1989. It is informally referred to as Latin-5 or Turkish. It was designed to cover the Turkish language and to be of more use for it than the ISO/IEC 8859-3 encoding. It is identical to ISO/IEC 8859-1 except for these six replacements of Icelandic characters with characters unique to the Turkish alphabet:

Position  0xD0  0xDD  0xDE  0xF0  0xFD  0xFE
8859-9    Ğ     İ     Ş     ğ     ı     ş
8859-1    Ð     Ý     Þ     ð     ý     þ

ISO-8859-9 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. In modern applications Unicode and UTF-8 are preferred. As of February 2016, 0.1% of all web pages used ISO-8859-9. Microsoft has assigned code page 28599, a.k.a. Windows-28599, to ISO-8859-9 in Windows. IBM has assigned code page 920 to ISO-8859-9.
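
The six replacements can be checked mechanically by decoding every byte value under both encodings. This is a minimal sketch, assuming Python's `iso8859_9` and `latin_1` codecs; `differing_positions` is a hypothetical helper name.

```python
def differing_positions(codec_a: str, codec_b: str) -> dict[int, tuple[str, str]]:
    """Map each byte value where two single-byte codecs disagree to both characters."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            diffs[value] = (a, b)
    return diffs

if __name__ == "__main__":
    for value, (turkish, western) in differing_positions("iso8859_9", "latin_1").items():
        print(f"0x{value:02X}: 8859-9 {turkish}  8859-1 {western}")
```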



ISO/IEC 8859-7

ISO/IEC 8859-7:2003, Information technology — 8-bit single-byte coded graphic character sets — Part 7: Latin/Greek alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Greek. It was designed to cover the modern Greek language. The original 1987 version of the standard had the same character assignments as the Greek national standard ELOT 928, published in 1986. The table in this article shows the updated 2003 version, which adds three characters (0xA4: euro sign U+20AC, 0xA5: drachma sign U+20AF, 0xAA: Greek Ypogegrammeni U+037A). Microsoft has assigned code page 28597, a.k.a. Windows-28597, to ISO-8859-7 in Windows. IBM has assigned code page 813 to ISO 8859-7. ISO-8859-7 is the IANA preferred charset name for this standard (formally the 1987 version, but in practice there is no problem using it for the current version, as the changes are pure additions to previously unassigned codes) when supplemented ...



ISO/IEC 8859-3

ISO/IEC 8859-3:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 3: Latin alphabet No. 3, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin-3 or South European. It was designed to cover Turkish, Maltese and Esperanto, though the introduction of ISO/IEC 8859-9 superseded it for Turkish. The encoding remains popular with users of Esperanto, though use is waning as application support for Unicode becomes more common. ISO-8859-3 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Microsoft has assigned code page 28593, a.k.a. Windows-28593, to ISO-8859-3 in Windows. IBM has assigned code page 913 to ISO 8859-3.



ISO/IEC 8859-5

ISO/IEC 8859-5:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 5: Latin/Cyrillic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1988. It is informally referred to as Latin/Cyrillic. It was designed to cover languages using a Cyrillic alphabet, such as Bulgarian, Belarusian, Russian, Serbian and Macedonian, but was never widely used. It would also have been usable for Ukrainian as written in the Soviet Union from 1933 to 1990, but it is missing the Ukrainian letter ge (ґ), which was required in Ukrainian orthography before and after that period, and outside Soviet Ukraine during it. As a result, IBM created code page 1124. ISO-8859-5 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The 8-bit encodings KOI8-R, KOI8-U, CP866 and Windows-1251 are far more commonly used. Another possible way to represent Cyrillic is Unicode.



ISO/IEC 8859-6

ISO/IEC 8859-6:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 6: Latin/Arabic alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin/Arabic. It was designed to cover Arabic. Only nominal letters are encoded, with no preshaped forms of the letters, so shaping processing is required for display. It does not include the extra letters needed to write most Arabic-script languages other than Arabic itself (such as Persian and Urdu). ISO-8859-6 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The text is in logical order, so bidi processing is required for display. Nominally ISO-8859-6 (code page 28596) is for "visual order", and ISO-8859-6-I (code page 38596) is for logical order. In practice, however, and as required for HTML and XML documents, ISO-8859-6 also stands for logical-order text.



ISO/IEC 8859-11

ISO/IEC 8859-11:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. It is informally referred to as Latin/Thai. It is nearly identical to the national Thai standard TIS-620 (1990). The sole difference is that ISO/IEC 8859-11 allocates non-breaking space to code 0xA0, while TIS-620 leaves it undefined. (In practice, this small distinction is usually ignored.) ISO-8859-11 is not a registered IANA charset name despite following the normal pattern for IANA charsets based on the ISO 8859 series. However, the close equivalent TIS-620 (which lacks the non-breaking space) is registered with IANA, and can be used without problems for ISO/IEC 8859-11, since the no-break space has a code which was unallocated in TIS-620. Microsoft has assigned code page 28601, a.k.a. Windows-28601, to ISO-8859-11 in Windows. A draft had the Thai letters in different ...



ISO/IEC 8859-8

ISO/IEC 8859-8, Information technology — 8-bit single-byte coded graphic character sets — Part 8: Latin/Hebrew alphabet, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings. ISO/IEC 8859-8:1999 represents its second and current revision, preceded by the first edition, ISO/IEC 8859-8:1988. It is informally referred to as Latin/Hebrew. ISO/IEC 8859-8 covers all the Hebrew letters, but no Hebrew vowel signs. IBM assigned code page 916 to it. ISO-8859-8 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The text is (usually) in logical order, so bidi processing is required for display. Nominally ISO-8859-8 (code page 28598) is for "visual order", and ISO-8859-8-I (code page 38598) is for logical order. Usually in practice, however, and as required for HTML and XML documents, ISO-8859-8 also stands for logical-order text. There is also ISO-8859-8-E, which supposedly requires directionality to be ...



ISO/IEC 8859-14

ISO/IEC 8859-14:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 14: Latin alphabet No. 8 (Celtic), is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1998. It is informally referred to as Latin-8 or Celtic. It was designed to cover the Celtic languages, such as Irish, Manx, Scottish Gaelic, Welsh, Cornish, and Breton. ISO-8859-14 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. CeltScript made an extension for Windows called Extended Latin-8.

History

ISO-8859-14 was originally proposed for the Sami languages. ISO 8859-12 was proposed for Celtic. Later, ISO 8859-12 was proposed for Devanagari, so the Celtic proposal was changed to ISO 8859-14. The Sami proposal was changed to ISO 8859-15, but it was rejected. Some of the code points were originally different.



ISO/IEC 8859-13

ISO/IEC 8859-13:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 13: Latin alphabet No. 7, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1998. It is informally referred to as Latin-7 or Baltic Rim. It was designed to cover the Baltic languages, and added characters used in the Polish language that were missing from the earlier encodings ISO 8859-4 and ISO 8859-10. Unlike these two, it does not cover the Nordic languages. ISO-8859-13 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Microsoft has assigned code page 28603, a.k.a. Windows-28603, to ISO-8859-13. IBM has assigned code page 921 to ISO-8859-13.



ISO/IEC 8859-10

ISO/IEC 8859-10:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1992. It is informally referred to as Latin-6. It was designed to cover the Nordic languages and was deemed to be of more use for them than ISO 8859-4. ISO-8859-10 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Microsoft has assigned code page 28600, a.k.a. Windows-28600, to ISO-8859-10 in Windows. IBM has assigned code page 919 to ISO-8859-10.



ISO/IEC 8859-16

ISO/IEC 8859-16:2001, Information technology — 8-bit single-byte coded graphic character sets — Part 16: Latin alphabet No. 10, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. It is informally referred to as Latin-10 or South-Eastern European. It was designed to cover Albanian, Croatian, Hungarian, Polish, Romanian, Serbian and Slovenian, but also French, German, Italian and Irish Gaelic (new orthography). ISO-8859-16 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. It was proposed as a different encoding similar to ISO 8859-1 with the missing French, Dutch and Turkish characters (note that the euro sign did not exist at the time), but that proposal was rejected.



ISO/IEC 8859-12

ISO/IEC 8859-12 would have been part 12 of the ISO/IEC 8859 character encoding standard series. ISO 8859-12 was originally proposed to support the Celtic languages. ISO 8859-12 was later slated for Latin/Devanagari, but this was abandoned in 1997, during the 12th meeting of ISO/IEC JTC 1/SC 2/WG 3 in Iraklion-Crete, Greece, 4 to 7 July 1997. The Celtic proposal was changed to ISO 8859-14.



ISO/IEC 8859-15

ISO/IEC 8859-15:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 15: Latin alphabet No. 9, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1999. It is informally referred to as Latin-9 (and was for a while called Latin-0). It is similar to ISO 8859-1, and thus generally intended for "Western European" languages, but replaces some less common symbols with the euro sign and some letters that were deemed missing in part 1 for the target use:

Position  0xA4  0xA6  0xA8  0xB4  0xB8  0xBC  0xBD  0xBE
8859-1    ¤     ¦     ¨     ´     ¸     ¼     ½     ¾
8859-15   €     Š     š     Ž     ž     Œ     œ     Ÿ

ISO-8859-15 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. Microsoft has assigned code page 28605, a.k.a. Windows-28605, to ISO-8859-15. IBM has assigned code page 923 to ISO 8859-15. All the printable characters from both ISO/IEC 8859-1 and ISO/IEC 8859-15 are also found in Windows-1252.
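
Which of these encodings can represent a given character is easy to test. The following is a minimal sketch, assuming Python's `latin_1`, `iso8859_15` and `cp1252` codecs; `codecs_supporting` and `CANDIDATES` are hypothetical names.

```python
def codecs_supporting(char: str, candidates: list[str]) -> list[str]:
    """Return the codecs from `candidates` that can encode `char`."""
    supported = []
    for name in candidates:
        try:
            char.encode(name)
        except UnicodeEncodeError:
            continue
        supported.append(name)
    return supported

CANDIDATES = ["latin_1", "iso8859_15", "cp1252"]

if __name__ == "__main__":
    # Characters from both rows of the replacement table.
    for ch in "€Œ¤¼":
        print(ch, codecs_supporting(ch, CANDIDATES))
```

Windows-1252 appears in every result because it contains the printable characters of both parts, although at different code values for the eight replaced positions.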



ISO/IEC 8859-1

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages. It is the basis for most popular 8-bit character sets, including Windows-1252 and the first block of characters in Unicode. It is very common (on the Internet) to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers ...
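
The mislabelling problem can be reproduced in a few lines. This is a minimal sketch, assuming Python's `cp1252` and `iso-8859-1` codec names; `decode_labelled` is a hypothetical helper, and its lenient mode (treating the ISO-8859-1 label as Windows-1252) is a policy assumed here for illustration.

```python
def decode_labelled(raw: bytes, label: str, lenient: bool = True) -> str:
    """Decode `raw` according to a declared charset label.

    With lenient=True, a label of ISO-8859-1 is treated as Windows-1252
    (an assumed policy for this sketch).
    """
    normalized = label.strip().lower()
    if lenient and normalized in {"iso-8859-1", "latin1", "latin-1"}:
        return raw.decode("cp1252", errors="replace")
    return raw.decode(normalized)

if __name__ == "__main__":
    raw = "“quoted”".encode("cp1252")  # smart quotes become bytes 0x93 and 0x94
    print(repr(decode_labelled(raw, "ISO-8859-1", lenient=False)))
    print(repr(decode_labelled(raw, "ISO-8859-1")))
```

Under a strict reading of the label, bytes 0x93 and 0x94 decode to C1 control codes, which have no visible glyph; how they are then displayed depends on the software.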



ISO/IEC 8859

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. The ISO working group maintaining this series of standards has been disbanded. ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.

Introduction

While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use Latin alphabets need additional symbols not covered by ASCII. ISO/IEC 8859 sought to remedy this problem by utilizing the eighth bit in an 8-bit byte to allow positions for another 96 printable characters. Early encodings were limited to 7 bits because of restrictions of some data transmission protocols, and partially for historical reasons. However, more characters were needed than could fit in a single 8-bit character encoding, so several ...



ISO/IEC 2022

ISO/IEC 2022, Information technology — Character code structure and extension techniques, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying a technique for including multiple character sets in a single character encoding system, and a technique for representing these character sets in both 7-bit and 8-bit systems using the same encoding. Many of the character sets included as ISO/IEC 2022 encodings are "double byte" encodings, where two bytes correspond to a single character. This makes ISO-2022 a variable-width encoding. A specific implementation does not have to implement all of the standard, however; the conformance level and the supported character sets are defined by the implementation.

Introduction

Many languages or language families not based on the Latin alphabet, such as Greek, Cyrillic, Arabic, or Hebrew, have historically been represented on computers with different 8-bit extended ASCII encodings. Written East Asian languages, specifically Chinese, Japanese, and Korean, use far more ...



Universal Coded Character Set

The Universal Coded Character Set (UCS) is a standard set of characters defined by the International Standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard), which is the basis of many character encodings. The latest version contains over 136,000 abstract characters, each identified by an unambiguous name and an integer number called its code point. This ISO/IEC 10646 standard is maintained in conjunction with The Unicode Standard ("Unicode"), and both are code-for-code identical. Characters (letters, numbers, symbols, ideograms, logograms, etc.) from the many languages, scripts, and traditions of the world are represented in the UCS with unique code points. The inclusiveness of the UCS is continually improving as characters from previously unrepresented writing systems are added. The UCS has over 1.1 million possible code points available for use or allocation, but only the first 65,536 (the Basic Multilingual Plane, or BMP) had entered ...



ISO/IEC 646

ISO/IEC 646 is the name of a set of ISO standards, described as Information technology — ISO 7-bit coded character set for information interchange and developed in cooperation with ASCII at least since 1964. Since its first edition in 1967 it has specified a 7-bit character code from which several national standards are derived. ISO/IEC 646 was also ratified by ECMA as ECMA-6. The first version of ECMA-6 had been published in 1965, based on work that ECMA's Technical Committee TC1 had carried out since December 1960. Characters in the ISO/IEC 646 Basic Character Set are invariant characters. Because the invariant character set, the portion of ISO/IEC 646 shared by all countries, specified only those letters used in the ISO basic Latin alphabet, countries using additional letters needed to create national variants of ISO 646 to be able to use their native scripts. Since universal acceptance of the 8-bit byte did not exist at that time, the national characters had to be made to fit within the ...



ISO basic Latin alphabet

The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:

ISO basic Latin alphabet
Uppercase Latin alphabet: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Lowercase Latin alphabet: a b c d e f g h i j k l m n o p q r s t u v w x y z

History

By the 1960s it became apparent to the computer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their 7-bit character-encoding standard (ISO/IEC 646). To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the ...



List of International Organization for Standardization standards

This is a list of published International Organization for Standardization (ISO) standards and other deliverables. For a complete and up-to-date list of all the ISO standards, see the ISO catalogue. The standards are protected by copyright and most of them must be purchased. However, about 300 of the standards produced by ISO and IEC's Joint Technical Committee 1 (JTC1) have been made freely and publicly available.

ISO 1 – ISO 99

ISO 1:2016 Geometrical product specifications (GPS) – Standard reference temperature for the specification of geometrical and dimensional properties
ISO 2:1973 Textiles – Designation of the direction of twist in yarns and related products
ISO 3:1973 Preferred numbers – Series of preferred numbers
ISO 4:1997 Information and documentation – Rules for the abbreviation of title words and titles of publications
ISO 5 Photography and graphic technology – Density measurements
ISO 6:1993 Photography – Black-and-white pictorial still camera negative film/process systems – Determination ...



ISO/IEC JTC 1/SC 2

ISO/IEC JTC 1/SC 2 Coded character sets is a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) that develops and facilitates standards within the field of coded character sets. The international secretariat of ISO/IEC JTC 1/SC 2 is the Japanese Industrial Standards Committee (JISC), located in Japan.

History

ISO/IEC JTC 1/SC 2 was established in 1987, originally with the title "Character Sets and Information Coding," with the area of work being "the standardization of bit and byte coded representation of information for interchange including among others, sets of graphic characters, of control functions, of picture elements and audio information coding of text for processing and interchange; code extension techniques; implementation of these coded representations on interchange media and transmission systems." The standardization activities of the subcommittee were ...



ISO 5428

ISO 5428:1984, Greek alphabet coded character set for bibliographic information interchange, is an ISO standard for an 8-bit character encoding for the modern Greek language. It contains a set of 73 graphic characters and is available through UNIMARC. In practice it is now superseded by Unicode.

Character set

ISO 5428 (each cell: character, Unicode code point, decimal code value; positions not listed are empty in the chart)
A_ | ` U+0060 161 | ´ U+00B4 162 | ¨ U+00A8 163 | ~ U+007E 164 | ᾿ U+1FBD 165 | ῾ U+1FFE 166 | ͺ U+037A 167
B_ | « U+00AB 176 | » U+00BB 177 | ” U+201D 178 | “ U+201C 179 | ʹ U+0374 180 | ͵ U+0375 181 | · U+00B7 187 | ; U+003B 191
C_ | Α U+0391 193 | Β U+0392 194 | Γ U+0393 196 | Δ U+0394 197 | Ε U+0395 198 | Ϛ U+03DA 199 | Ϝ U+03DC 200 | Ζ U+0396 201 | Η U+0397 202 | Θ U+0398 203 | Ι U+0399 204 | Κ U+039A 205 | Λ U+039B 206 | Μ U+039C 207
D_ | Ν U+039D 208 | Ξ U+039E 209 | Ο U+039F 210 | Π U+03A0 211 | Ϙ U+03D8 212 | Ρ U+03A1 213 | Σ U+03A3 214 | ...



Character encoding

In computing, character encoding is used to represent a repertoire of characters by some kind of encoding system. Depending on the abstraction level and context, corresponding code points and the resulting code space may be regarded as bit patterns, octets, natural numbers, electrical pulses, etc. A character encoding is used in computation, data storage, and transmission of textual data. "Character set", "character map", "codeset" and "code page" are related, but not identical, terms. Early character codes associated with the optical or electrical telegraph could only represent a subset of the characters used in written languages, sometimes restricted to upper case letters, numerals and some punctuation only. The low cost of digital representation of data in modern computer systems allows more elaborate character codes (such as Unicode) which represent most of the characters used in many written languages. Character encoding using internationally accepted standards permits worldwide interchange of ...



ISO 14651

ISO/IEC 14651:2011, Information technology -- International string ordering and comparison -- Method for comparing character strings and description of the common template tailorable ordering, is an ISO Standard specifying an algorithm that can be used when comparing two strings. This comparison can be used when collating a set of strings. The standard also specifies a datafile specifying the comparison order, the Common Tailorable Template (CTT). The comparison order is supposed to be tailored for different languages (hence the CTT is regarded as a template and not a default, though the empty tailoring, not changing any weighting, is appropriate in many cases), since different languages have incompatible ordering requirements. One such tailoring is European ordering rules (EOR), which in turn is supposed to be tailored for different European languages. The Common Tailorable Template (CTT) datafile of this ISO Standard is aligned with the Default Unicode Collation Entity Table (DUCET) datafile of the Unicode ...



Graphic character

topic

In ISO/IEC 646 (commonly known as ASCII) and related standards including ISO 8859 and Unicode, a graphic character is any character intended to be written, printed, or otherwise displayed in a form that can be read by humans. In other words, it is any encoded character that is associated with one or more glyphs. ISO/IEC 646 In ISO 646, graphic characters are contained in rows 2 through 7 of the code table. However, two of the characters in these rows, namely the space character SP at row 2 column 0 and the delete character DEL (also called the rubout character) at row 7 column 15, require special mention. The space is considered to be both a graphic character and a control character in ISO 646. It can have a visible form, and also a control function (moving the print head). The delete character is strictly a control character, not a graphic character. This is true not only in ISO 646, but also in all related standards including Unicode. However, many modern character sets deviate from ISO 646, and as a re ...more...



C0 and C1 control codes

topic

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use the ISO/IEC 2022 system of specifying control and graphic characters. Most character encodings, in addition to representing printable characters, also have characters such as these that represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received. The C0 set defines codes in the hexadecimal range 00–1F and the C1 set defines codes in the hexadecimal range 80–9F. The default C0 set was originally defined in ISO 646 (ASCII), while the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). While other C0 and C1 sets are available for specialized applications, they are rarely used. Encoding interoperability While the C1 control characters are used in conjunction with the ISO/IEC 8859 series of graphical character sets among others, they are rarely used directly, except on specific platfor ...more...
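The two ranges quoted above can be checked with a few lines of Python. This is only a minimal sketch: the function name `classify_control` is made up for illustration, and it follows the ranges given in the text (00–1F and 80–9F), so DEL (7F) is reported as belonging to neither set.

```python
def classify_control(byte: int) -> str:
    """Return 'C0', 'C1', or 'other' for a single 8-bit code value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0x00-0xFF")
    if byte <= 0x1F:
        return "C0"      # range 00-1F
    if 0x80 <= byte <= 0x9F:
        return "C1"      # range 80-9F
    return "other"       # everything else, including DEL (0x7F)


if __name__ == "__main__":
    for value in (0x0A, 0x85, 0x41, 0x7F):
        print(f"0x{value:02X}: {classify_control(value)}")
```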



Extended ASCII

topic

The term extended ASCII (EASCII or high ASCII) refers to eight-bit or larger character encodings that include the standard seven-bit ASCII characters, plus additional characters. The use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the ASCII standard has been updated to include more than 128 characters or that the term unambiguously identifies a single encoding, neither of which is the case. There are many extended ASCII encodings (more than 220 DOS and Windows codepages). EBCDIC ("the other" major 8-bit character code) likewise developed many extended variants (more than 186 EBCDIC codepages) over the decades. Motive ASCII was designed in the 1960s for teleprinters and telegraphy, and some computing. Early teleprinters were electromechanical, having no microprocessor and just enough electromechanical memory to function. They fully processed one character at a time, returning to an idle state immediately afterward. They were typewriter-derived impact printe ...more...



ISO 15924

topic

ISO 15924, Codes for the representation of names of scripts, defines two sets of codes for a number of writing systems (scripts). Each script is given both a four-letter code and a numeric one. Script is defined as "set of graphic characters used for the written form of one or more languages". Where possible the codes are derived from ISO 639-2 where the name of a script and the name of a language using the script are identical (example: Gujarātī ISO 639 guj, ISO 15924 Gujr). Preference is given to the 639-2 Bibliographical codes, which is different from the otherwise often preferred use of the Terminological codes. 4-letter ISO 15924 codes are incorporated into the Language Subtag Registry for IETF language tags and so can be used in file formats that make use of such language tags. For example, they can be used in HTML and XML to help Web browsers determine which typeface to use for foreign text. This way one could differentiate, for example, between Serbian written in the Cyrillic (sr-Cyrl) or Latin ( ...more...



Thai Industrial Standard 620-2533

topic

Thai Industrial Standard 620-2533, commonly referred to as TIS-620, is the most common character set and character encoding for the Thai language. The standard is published by the Thai Industrial Standards Institute (TISI), an organ of the Ministry of Industry under the Royal Thai Government, and is the sole official standard for encoding Thai in Thailand. The descriptive name of the standard is "Standard for Thai Character Codes for Computers" (Thai: รหัสสำหรับอักขระไทยที่ใช้กับคอมพิวเตอร์). "2533" refers to year 2533 of the Buddhist Era (1990), the year the present version of the standard was published; a previous revision, TIS 620-2529 (1986), is now obsolete. TIS-620 is the IANA preferred charset name for TIS-620, and that charset name is used also for ISO/IEC 8859-11 (which adds a no-break space character at 0xA0, which is unassigned in TIS-620). When the IANA name is used the codes are supplemented with the C0 and C1 control codes from ISO/IEC 6429. Structure TIS-620 is a conventionally structured ...more...



Western Latin character sets (computing)

topic

Several binary representations of character sets for common Western European languages are compared in this article. These encodings were designed for representation of Italian, Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish, Norwegian, and Icelandic, which use the Latin alphabet, a few additional letters and ones with precomposed diacritics, some punctuation, and various symbols (including some Greek letters). Although they're called "Western European", many of these languages are spoken all over the world. Also, these character sets happen to support many other languages such as Malay, Swahili, and Classical Latin. Summary The ISO-8859 series of 8-bit character sets encodes all Latin character sets used in Europe, although the same code points have multiple uses across the series, which caused some difficulty. The arrival of Unicode, with a unique code point for every glyph, resolved these issues. ISO/IEC 8859-1 or Latin-1 is the most used and also defines the first 256 codes in Unico ...more...



Semantic service-oriented architecture

topic

A Semantic Service Oriented Architecture ( SSOA ) is an architecture that allows for scalable and controlled Enterprise Application Integration solutions. SSOA describes a sophisticated approach to enterprise-scale IT infrastructure. It leverages rich, machine-interpretable descriptions of data, services, and processes to enable software agents to autonomously interact to perform critical mission functions. SSOA is technically founded on three notions: The principles of Service-oriented architecture (SOA); Standards Based Design (SBD); and Semantics -based computing. SSOA combines and implements these computer science concepts into a robust, extensible architecture capable of enabling complex, powerful functions. Applications In the health care industry, SSOA of HL7 has long been implemented. Other protocols include LOINC , PHIN , and HIPAA related standards. There is a series of SSOA-related ISO standards published for financial services , which can be found at the ISO's website . Some financial sectors ...more...



List of Ecma standards

topic

This is a list of standards published by Ecma International , formerly the European Computer Manufacturers Association. ECMA-1 – ECMA-99 ECMA-1 – Standard for a 6-bit Input/Output character code (withdrawn) ECMA-6 – 7-bit coded character set (same as ISO/IEC 646 /ITU-T T.50) (successive editions in 1965, 1967, 1970, 1973, and 1984) ECMA-10 – Data Interchange on punched tape (Nov 1965) (withdrawn) ECMA-13 – File Structure and Labelling of Magnetic Tapes (later ISO 1001 ) ECMA-17 – Graphic Representation of Control Characters of the ECMA 7-bit Coded Character Set for Information Interchange (Nov 1968) (withdrawn) ECMA-35 – Character Code Structure and Extension Techniques ( ISO/IEC 2022 ) ECMA-43 – 8-bit coded character set (same as ISO/IEC 4873 ) ECMA-48 – ANSI escape codes (same as ISO/IEC 6429) ECMA-55 – Minimal BASIC (January 1978) (withdrawn) ECMA-58 – 8-inch floppy disk (withdrawn) ECMA-59 – 8-inch floppy disk (withdrawn) ECMA-66 – 5¼-inch floppy disk (withdrawn) ECMA-69 – 8-inch floppy disk (withdrawn) E ...more...



2N

topic

2N or 2-N may refer to: 2N or 2°N, the 2nd parallel north latitude; MI 2N, a type of electric multiple unit running on the French RER rail network; 2N, a prefix labelling certain JEDEC transistors, notably the 2N2222; 2N, an indicator of a redundancy level in (for example) an uninterruptible power supply configuration; Powers of 2 (2^N); In genetics, 2n = x refers to a diploid chromosome number of x; NJ 2-N, see New Jersey Route 17; MI 2N series double-decker train, see RER A; HP 2N, ISO/IEC 8859-2 character set on printers by Hewlett-Packard. See also N2 (disambiguation) ...more...



RISC OS character set

topic

The Acorn RISC OS character set was used in the Acorn Archimedes series and subsequent computers from 1987 onwards. It is an extension of ISO/IEC 8859-1. Character set Characters at 0x83, 0x84, 0x87, 0x88, 0x89, 0x8A, and 0x8B are specific to the Acorn RISC OS and, therefore, are not in Unicode. At 0x83 is a box with another box inside it on the top left-hand corner, meaning "resize window". At 0x84 is a 'bubble-writing' X, meaning "close window". At 0x87 is a very strange character that is a 7-segment-styled 8 with a 7-segment-styled 7 to the top right of it. At 0x88, 0x89, 0x8A, and 0x8B are left, right, up, and down bubble arrows for window scrollbars. The Homerton font does not have these characters. EFF, a third-party supplier of RISC OS outline fonts, has a different, but similar character set. Differences from ISO/IEC 8859-1 RISC OS character ...more...



S-comma

topic

S-comma (majuscule: Ș, minuscule: ș) is a letter which is part of the Romanian alphabet, used to represent the sound /ʃ/, the voiceless postalveolar fricative (like sh in shoe). History S “half moon” proposed as a letter in the Buda Lexicon. S cedilla, T cedilla and a cedilla illustrated with a comma in Ortografia limbei române published by the Romanian Academy in 1895. The letter was proposed in the Buda Lexicon, a book published in 1825, which included two texts by Petru Maior, Orthographia romana sive latino-valachica una cum clavi and Dialogu pentru inceputul linbei române, introducing ș for /ʃ/ and ț for /ts/. Unicode support This letter, however, was not part of the early Unicode versions, nor of predecessors such as ISO/IEC 8859-2 and Windows-1250, which is why Ş (S-cedilla) is often used in digital texts in Romanian. S-cedilla was introduced in ISO/IEC 8859-2 for computers for the sake of the Romanian language. It was less of a problem then, since the screens and printouts had less resolution and ...more...
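The practical effect of this gap can be seen with Python's built-in `iso8859_2` codec, which has no position for the comma-below letters. The sketch below is illustrative only: the helper name `encode_romanian_latin2` is hypothetical, and falling back to the cedilla letters (including the T pair, which the snippet above does not discuss) is an assumption about what a legacy workflow might do, not a recommendation.

```python
# Comma-below letters mapped to the cedilla letters that ISO/IEC 8859-2 contains.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015E",  # Ș -> Ş
    "\u0219": "\u015F",  # ș -> ş
    "\u021A": "\u0162",  # Ț -> Ţ
    "\u021B": "\u0163",  # ț -> ţ
})


def encode_romanian_latin2(text: str) -> bytes:
    """Encode text as ISO/IEC 8859-2, substituting cedilla for comma-below."""
    return text.translate(COMMA_TO_CEDILLA).encode("iso8859_2")


if __name__ == "__main__":
    try:
        "\u0219".encode("iso8859_2")  # ș has no code in ISO/IEC 8859-2
    except UnicodeEncodeError as exc:
        print("direct encoding fails:", exc.reason)
    print(encode_romanian_latin2("\u0219i").hex())
```

Decoding the resulting bytes yields the cedilla letters, not the original comma-below letters, so the substitution is lossy.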



Code 128

topic

Code 128 is a high-density linear barcode symbology. It is used for alphanumeric or numeric-only barcodes. It can encode all 128 characters of ASCII and, by use of an extension symbol (FNC4), the Latin-1 characters defined in ISO/IEC 8859-1. GS1-128 (formerly known as UCC/EAN-128) is a subset of Code 128 and is used extensively worldwide in shipping and packaging industries as a product identification code for the container and pallet levels in the supply chain. The symbology was formerly defined as ISO/IEC 15417:2007. Specification Code 128 sections. 1. Quiet zone, 2. Start/stop symbols, 3. Encoded data, 4. Check symbol A Code 128 barcode has seven sections: quiet zone, start symbol, encoded data, check symbol, stop symbol, final bar (often considered part of the stop symbol), and quiet zone. The check symbol is calculated from a weighted sum (modulo 103) of all the symbols. Subtypes Code 128 includes 108 symbols: 103 data symbols, 3 start symbols, and 2 stop symbols. Each symbol consists of three black bars and three ...more...
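The weighted sum mentioned above can be sketched as follows. The details are assumptions drawn from the commonly described Code 128 scheme and are not stated in the snippet: the start symbol is counted with weight 1, each data symbol is weighted by its 1-based position, the start symbols for code sets A and B have values 103 and 104, and in code set B a printable ASCII character has the symbol value `ord(ch) - 32`. The function names are made up for illustration.

```python
from typing import List, Sequence

START_A = 103  # assumed symbol value of "Start Code A"
START_B = 104  # assumed symbol value of "Start Code B"


def code128_check_symbol(start_value: int, data_values: Sequence[int]) -> int:
    """Weighted sum of all symbol values, reduced modulo 103."""
    total = start_value  # the start symbol carries weight 1
    for position, value in enumerate(data_values, start=1):
        total += position * value
    return total % 103


def code_set_b_values(text: str) -> List[int]:
    """Symbol values for ASCII 32..127 in code set B (value = code - 32)."""
    values = []
    for ch in text:
        code = ord(ch)
        if not 32 <= code <= 127:
            raise ValueError(f"{ch!r} is not encodable in code set B")
        values.append(code - 32)
    return values


if __name__ == "__main__":
    print(code128_check_symbol(START_B, code_set_b_values("PJJ123C")))
```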



ISO/IEC 6937

topic

ISO/IEC 6937:2001, Information technology — Coded graphic character set for text communication — Latin alphabet, is a multibyte extension of ASCII, or rather of ISO/IEC 646-IRV. It was developed in common with ITU-T (then CCITT) for telematic services under the name of T.51, and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics (accents). The value of the lead byte often indicates which diacritic the letter has, and the follow byte then has the ASCII value for the letter that the diacritic is on. Only certain combinations of lead byte and follow byte are allowed, and there are some exceptions to the lead byte interpretation for some follow bytes. However, no combining characters at all are encoded in ISO/IEC 6937, although one can represent some free-standing diacritics, often by letting the follow byte have the code for ASCII space. ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf. ISO6937/ ...more...
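The lead-byte scheme described above can be illustrated with a deliberately tiny decoder. Everything specific here is an assumption for illustration: the two lead-byte values (0xC2 for acute, 0xC8 for diaeresis) are recalled from the commonly published code table and not taken from the snippet, the function name `decode_6937_sketch` is hypothetical, and the exceptions and restricted combinations mentioned in the text are not modelled.

```python
import unicodedata

# Assumed, partial lead-byte table (lead byte -> Unicode combining mark).
LEAD_TO_COMBINING = {
    0xC2: "\u0301",  # acute accent
    0xC8: "\u0308",  # diaeresis
}


def decode_6937_sketch(data: bytes) -> str:
    """Decode ASCII plus the two assumed lead bytes; reject anything else."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in LEAD_TO_COMBINING:
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("lead byte must be followed by an ASCII byte")
            base = chr(data[i + 1])
            # The diacritic comes first in the byte stream, but the Unicode
            # combining mark follows its base letter; NFC then composes them.
            out.append(unicodedata.normalize("NFC", base + LEAD_TO_COMBINING[byte]))
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch")
    return "".join(out)


if __name__ == "__main__":
    print(decode_6937_sketch(b"caf\xc2e"))
```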



Double acute accent

topic

The double acute accent ( ˝ ) is a diacritic mark of the Latin script. It is used primarily in written Hungarian , and consequently is sometimes referred to by typographers as Hungarumlaut . The signs formed with the umlaut are letters in their own right in the Hungarian alphabet—for instance, they are separate letters for the purpose of collation —but letters with the double acute are considered variants of their equivalents with the plain umlaut. Uses Vowel length History Length marks first appeared in Hungarian orthography in the 15th-century Hussite Bible . Initially, only á and é were marked, since they are different in quality as well as length . Later í, ó, ú were marked as well. In the 18th century, before Hungarian orthography became fixed, u and o with umlaut + acute (ǘ, ö́) were used in some printed documents. 19th century typographers introduced the double acute as a more aesthetic solution. Hungarian In Hungarian, the double acute is thought of as the letter having both an umlaut and an acute a ...more...



Code page 1124

topic

Code page 1124, also known as CP1124, is a modified version of ISO/IEC 8859-5 that was designed to cover the Ukrainian language. It is identical to ISO 8859-5 except for two replacements of Macedonian characters:

Position  0xA3  0xF3
CP1124    Ґ     ґ
8859-5    Ѓ     ѓ

Codepage layout In the following table characters are shown together with their corresponding Unicode code points. Code A0 is the NON-BREAKING SPACE. Code AD is a SOFT HYPHEN, which even in isolation may not appear at all in compliant web browsers.

Code page 1124 (columns _0 to _F in order; each cell gives character, Unicode code point, decimal code)
A_: NBSP 00A0 160, Ё 0401 161, Ђ 0402 162, Ґ 0490 163, Є 0404 164, Ѕ 0405 165, І 0406 166, Ї 0407 167, Ј 0408 168, Љ 0409 169, Њ 040A 170, Ћ 040B 171, Ќ 040C 172, SHY 00AD 173, Ў 040E 174, Џ 040F 175
B_: А 0410 176, Б 0411 177, В 0412 178, Г 0413 179, Д 0414 180, Е 0415 181, Ж 0416 182, З 0417 183, И 0418 184, Й 0419 185 ...more...



Latin-1 Supplement (Unicode block)

topic

The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) to FF (U+00FF). The C1 controls (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators. The C1 controls and Latin-1 Supplement block has been included in its present form, with the same character repertoire, since version 1.0 of the Unicode Standard, where it was known as Latin 1.

Character table (Code, Description, Acronym)
C1 Controls
U+0080 Padding Character PAD
U+0081 High Octet Preset HOP
U+0082 Break Permitted Here BPH
U+0083 No Break Here NBH
U+0084 Index IND
U+0085 Next Line NEL
U+0086 Start of Selected Area SSA
U+0087 End of Selected Area ESA
U+0088 Character (Horizontal) Tabulation Set HTS
U+0089 Character (Horizontal) Tabulation wi ...more...
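Because the block mirrors the upper half of ISO 8859-1, a Latin-1 byte value and the resulting Unicode code point are numerically identical. The short sketch below checks this with Python's built-in `latin-1` codec; the function name `latin1_byte_to_code_point` is made up for illustration.

```python
def latin1_byte_to_code_point(byte: int) -> int:
    """Decode one ISO 8859-1 byte and return the Unicode code point."""
    return ord(bytes([byte]).decode("latin-1"))


if __name__ == "__main__":
    # Every byte 0x80..0xFF lands on the code point with the same number.
    mismatches = [b for b in range(0x80, 0x100)
                  if latin1_byte_to_code_point(b) != b]
    print("mismatches:", mismatches)
    print(f"0xE9 -> U+{latin1_byte_to_code_point(0xE9):04X}")
```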



Micro-

topic

Micro- (symbol µ) is a unit prefix in the metric system denoting a factor of 10⁻⁶ (one millionth). Confirmed in 1960, the prefix comes from the Greek μικρός (mikrós), meaning "small". The symbol for the prefix comes from the Greek letter μ (mu). It is the only SI prefix which uses a character not from the Latin alphabet. "mc" is commonly used as a prefix when the character "µ" is not available; for example, "mcg" commonly denotes a microgram. Also the letter u instead of µ is allowed by one of the ISO documents. Examples: Typical bacteria are 1 to 10 micrometres in diameter. Eukaryotic cells are typically 10 to 100 micrometres in diameter.

SI prefixes (Name, Symbol, Base 1000, Base 10, Decimal, Short scale, Long scale, Adoption)
yotta, Y, 1000⁸, 10²⁴, 1 000 000 000 000 000 000 000 000, septillion, quadrillion, 1991
zetta, Z, 1000⁷, 10²¹, 1 000 000 000 000 000 000 000, sextillion, trilliard, 1991
exa, E, 1000⁶, 10¹⁸, 1 000 000 000 000 000 000, quintillion ...more...



L2

topic

L2 , L , L02 , L II , L.2 or L-2 may refer to: Astronomy L2 point , the second Lagrangian point in an astronomical Solar System L2 Puppis , a star which is also known as HD 56096 Advanced Telescope for High Energy Astrophysics , a proposed X-ray telescope Aircraft LZ 18 (L 2) , a German Zeppelin sometimes referred to by its Navy designation of L 2 Arado L II , a 1929 German two-seat, high-wing sporting monoplane ASJA L2 , a 1932 Swedish biplane trainer aircraft Junkers L2 , an aircraft engine whose development lead to the Junkers L5 Lawson L-2 , a 1920 American biplane airliner Macchi L.2 , an Italian biplane flying boat L-2 Grasshopper , a Taylorcraft aircraft used by the United States Army Air Forces in World War II PZL Ł.2 , a 1929 Polish liaison aircraft Biology Haplogroup L2 (mtDNA) in human genetics ATC code L02 Endocrine therapy, a subgroup of the Anatomical Therapeutic Chemical Classification System the second lumbar vertebrae of the vertebral column in human anatomy the second larval stage in the Cae ...more...



Code page 852

topic

Code page 852 (also known as CP 852, IBM 00852, OEM 852 (Latin II), MS-DOS Latin 2) is a code page used under DOS to write Central European languages that use Latin script (such as Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak or Slovene). Note that code page 852 (DOS Latin 2) is very different from ISO/IEC 8859-2 (ISO Latin-2), although both are informally referred to as "Latin-2" in different language regions. Some of the box drawing characters of the original DOS code page 437 were sacrificed in order to put in more accented letters (all printable characters from ISO 8859-2 are included). These changes caused display glitches in DOS applications that made use of the box drawing characters to display a GUI-like surface in text mode (e.g. Norton Commander). Several local encodings were invented to avoid the problem, for example the Kamenický encoding for Czech and Slovak. Character set The following table shows code page 852. Each character is shown with its equivalen ...more...
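How different the two "Latin-2" encodings are can be seen by writing text with one and reading it with the other. The sketch below uses Python's built-in `cp852` and `iso8859_2` codecs; the helper name `reinterpret` and the sample word are illustrative choices, not part of the text above.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode with one codec, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    sample = "Łódź"
    print(sample.encode("iso8859_2").hex(), "(ISO/IEC 8859-2 bytes)")
    print(sample.encode("cp852").hex(), "(code page 852 bytes)")
    # Same repertoire, different byte values: the round trip garbles the text.
    print(reinterpret(sample, "iso8859_2", "cp852"))
```

Both codecs can encode the sample, which matches the statement that code page 852 includes the printable characters of ISO 8859-2; only the byte assignments differ.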



Specials (Unicode block)

topic

Specials is a short Unicode block allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five are assigned as of Unicode 10.0:
U+FFF9 INTERLINEAR ANNOTATION ANCHOR, marks start of annotated text
U+FFFA INTERLINEAR ANNOTATION SEPARATOR, marks start of annotating character(s)
U+FFFB INTERLINEAR ANNOTATION TERMINATOR, marks end of annotation block
U+FFFC OBJECT REPLACEMENT CHARACTER, placeholder in the text for another unspecified object, for example in a compound document
U+FFFD � REPLACEMENT CHARACTER, used to replace an unknown, unrecognized or unrepresentable character
U+FFFE not a character
U+FFFF not a character
FFFE and FFFF are not unassigned in the usual sense, but guaranteed not to be a Unicode character at all. They can be used to guess a text's encoding scheme, since any text containing these is by definition not a correctly encoded Unicode text. Unicode's U+FEFF Byte order mark character can be inserted at the beginning of a Unicode text to sig ...more...
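A common place to meet U+FFFD in practice is lossy decoding. The sketch below shows Python's `errors="replace"` handler substituting U+FFFD for an undecodable byte, plus a simple check for the two noncharacters named above; the function names are hypothetical.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, replacing undecodable input with U+FFFD."""
    return data.decode("utf-8", errors="replace")


def is_specials_noncharacter(ch: str) -> bool:
    """True for U+FFFE and U+FFFF, which are never Unicode characters."""
    return ord(ch) in (0xFFFE, 0xFFFF)


if __name__ == "__main__":
    text = decode_lossy(b"ab\xff")  # 0xFF can never appear in valid UTF-8
    print([f"U+{ord(c):04X}" for c in text])
    print(is_specials_noncharacter("\uffff"))
```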



UTF-1

topic

UTF-1 is one way of transforming ISO 10646 / Unicode into a stream of bytes. Due to the design, it is not possible to resynchronise if decoding starts in the middle of a character (this makes error recovery hard, among other things) and simple byte-oriented search routines cannot be reliably used with it. UTF-1 is also fairly slow due to its use of division by a number which is not a power of 2. Due to these issues, UTF-1 never gained wide acceptance and has been replaced by UTF-8. Design UTF-1 is a multi-byte encoding like UTF-8; a single Unicode code point can be encoded in one, two, three, or five octets. While the ASCII range is encoded as one octet, as in UTF-8, the ASCII octets 0x21–0x7E (decimal 33–126) are also used in UTF-1 multi-byte encodings; therefore UTF-1 is unsuited for many Internet protocols, including MIME. UTF-1 does not use the C0 and C1 control codes or the space character in the multi-byte encodings – any 0x00–0x20 or 0x7F–0x9F octet stands for the corresponding code points i ...more...



American National Standards Institute

topic

The American National Standards Institute ( ANSI , AN -see ) is a private non-profit organization that oversees the development of voluntary consensus standards for products, services, processes, systems, and personnel in the United States. The organization also coordinates U.S. standards with international standards so that American products can be used worldwide. ANSI accredits standards that are developed by representatives of other standards organizations , government agencies , consumer groups , companies, and others. These standards ensure that the characteristics and performance of products are consistent, that people use the same definitions and terms, and that products are tested the same way. ANSI also accredits organizations that carry out product or personnel certification in accordance with requirements defined in international standards. The organization's headquarters are in Washington, D.C. ANSI's operations office is located in New York City . The ANSI annual operating budget is funded by th ...more...



DEC Hebrew

topic

The DEC Hebrew character set is an 8-bit character set developed by Digital Equipment Corporation (DEC) to support the Hebrew alphabet . It was derived from DEC's Multinational Character Set (MCS) by removing the existing definitions from code points 192 to 223 and 224 to 250 and replacing code points 251 to 256 by the Hebrew letters. Since MCS is a predecessor of ISO/IEC 8859-1 , DEC Hebrew is similar to ISO/IEC 8859-8 and the Windows code page 1255 , that is, many characters in the range 160 to 191 are the same, and the Hebrew letters are at 192 to 250 in all three character sets. Code page layout In the following table, code points that differ from DEC MCS are shown boxed. Legend:    Alphabetic    Control character    Numeric digit    Punctuation    Extended punctuation    Graphic character    International    Undefined DEC Hebrew (8-bit) _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F   0_   NUL 0000 0 SOH 0001 1 STX 0002 2 ETX 0003 3 EOT 0004 4 ENQ 0005 5 ACK 0006 6 BEL 0007 7 BS 0008 8 HT 0009 9 LF 00 ...more...



ArmSCII

topic

ArmSCII or ARMSCII is a set of obsolete single-byte character encodings for the Armenian alphabet defined by Armenian national standard 166-9. ArmSCII is an acronym for Armenian Standard Code for Information Interchange, similar to ASCII for the American standard. It has been superseded by the Unicode standard. However, these encodings are not widely used because the standard was published one year after the publication of international standard ISO 10585 that defined another 7-bit encoding, from which the encoding and mapping to the UCS (Universal Coded Character Set ( ISO/IEC 10646 ) and Unicode standards) were also derived a few years after, and there was a lack of support in the computer industry for adding ArmSCII. Encodings defined in the ArmSCII standard Very few systems support these encodings. Microsoft Windows does not support them, for example. It is usually better to use Unicode for proper interchange of Armenian text for web browsers and email , since most modern computers do not support ArmSCII ...more...



