ISO/IEC 8859-2

ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central[1] or "Eastern European" languages that are written in the Latin script. ISO/IEC 8859-2 is very different from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also referred to as "Latin-2" in Czech and Slovak regions.[2]

ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. As of June 2016, 0.2% of all web pages used ISO 8859-2.[3] Microsoft has assigned code page 28592, a.k.a. Windows-28592, to ISO-8859-2 in Windows. IBM assigned code page 1111 to ISO 8859-2.
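As a minimal sketch of how the charset name is used in practice, the following Python snippet decodes and encodes text with the standard library codec registered under the name "iso-8859-2". The helper name `decode_latin2` is hypothetical and exists only for this illustration; the sample bytes were chosen from the code page layout for the Polish place name Łódź.

```python
import codecs


def decode_latin2(data: bytes) -> str:
    """Decode bytes that are assumed to be ISO-8859-2 (Latin-2) text."""
    return data.decode("iso-8859-2")


# Bytes A3 F3 64 BC are Ł, ó, d, ź in ISO-8859-2.
raw = bytes([0xA3, 0xF3, 0x64, 0xBC])
text = decode_latin2(raw)
print(text)

# Python registers several aliases for the same codec; "latin2" is one of them.
same_codec = codecs.lookup("latin2").name == codecs.lookup("iso-8859-2").name
print(same_codec)
```

Passing the charset name explicitly matters because the same bytes decode to different text under other 8-bit encodings.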

Code page 1250, a.k.a. Windows-1250, has many of the same characters but in a different arrangement.
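The different arrangement can be observed with a short Python sketch that compares the byte each encoding uses for the same character. It relies on the standard library codecs "iso-8859-2" and "cp1250"; the function names `byte_for` and `moved_characters` are hypothetical helpers introduced here.

```python
def byte_for(char: str, encoding: str) -> int:
    """Return the single byte value that `encoding` uses for `char`."""
    return char.encode(encoding)[0]


def moved_characters() -> dict:
    """Map each ISO-8859-2 character in A1-FF to (ISO byte, CP1250 byte)
    when the two encodings place it at different positions."""
    moved = {}
    for iso_byte in range(0xA1, 0x100):
        char = bytes([iso_byte]).decode("iso-8859-2")
        try:
            win_byte = byte_for(char, "cp1250")
        except UnicodeEncodeError:
            continue  # character is not available in Windows-1250
        if win_byte != iso_byte:
            moved[char] = (iso_byte, win_byte)
    return moved


for char, (iso_byte, win_byte) in moved_characters().items():
    print(f"{char}: ISO-8859-2 0x{iso_byte:02X}, Windows-1250 0x{win_byte:02X}")
```

Text written in one of the two encodings and read as the other therefore shows wrong letters only for the characters in this mapping, which is a common symptom of a mislabelled Central European document.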

These code values can be used for the following languages:

It can also be used for Romanian, but it is poorly suited to that language because it lacks the letters s and t with comma below (Ș, Ț) and contains s and t with cedilla (Ş, Ţ) instead. These letters were unified in the first versions of the Unicode standard, meaning that the appearance with cedilla or with comma was treated as a glyph choice rather than as separate characters; fonts intended for use with Romanian should, therefore, have glyphs with comma below at those code points. Microsoft did not really provide such fonts for computers sold in Romania. Still, ISO/IEC 8859-2 and Windows-1250 (which has the same problem) have been heavily used for Romanian. Unicode (which supports both variants) has taken the lead for web pages, which nevertheless often use s and t with cedilla. Unicode noted as of 2014 that encoding the letters with comma below as separate characters was a mistake, causing corruption of Romanian data.
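The practical consequence can be sketched in Python: the comma-below letters (U+0218 to U+021B in Unicode) cannot be encoded in ISO-8859-2, while the cedilla letters can. The fallback shown here, which substitutes the cedilla forms before encoding, is only one possible workaround and is lossy; the names `COMMA_TO_CEDILLA`, `is_encodable` and `encode_romanian_latin2` are hypothetical.

```python
# Comma-below letters mapped to the cedilla letters that ISO-8859-2 contains.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015E",  # Ș -> Ş
    "\u0219": "\u015F",  # ș -> ş
    "\u021A": "\u0162",  # Ț -> Ţ
    "\u021B": "\u0163",  # ț -> ţ
})


def is_encodable(text: str, encoding: str = "iso-8859-2") -> bool:
    """Report whether every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def encode_romanian_latin2(text: str) -> bytes:
    """Encode Romanian text as ISO-8859-2, replacing comma-below letters
    with their cedilla look-alikes (the distinction is lost)."""
    return text.translate(COMMA_TO_CEDILLA).encode("iso-8859-2")


print(is_encodable("\u0219"))   # comma-below s
print(is_encodable("\u015F"))   # cedilla s
print(encode_romanian_latin2("Științe").hex(" "))
```

Because the substitution cannot be undone reliably, data converted this way and later decoded contains cedilla letters, which matches the situation the paragraph above describes for many web pages.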

Code page layout

In the following table characters are shown together with their corresponding Unicode code points. Note that code values 00-1F, 7F, and 80-9F are not assigned to characters by ISO/IEC 8859-2. Code 20 is the regular SPACE character, and A0 is the NON-BREAKING SPACE. Code AD is a SOFT HYPHEN, which even in isolation may not appear at all in compliant web browsers.
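The ranges described above can be expressed as a small classifier. This is a sketch based only on the paragraph's description; the function name `classify` and the category labels are hypothetical. Note that Python's "iso-8859-2" codec follows the IANA charset, so it decodes the unassigned ranges as C0 and C1 control codes rather than rejecting them.

```python
def classify(byte: int) -> str:
    """Classify a code value according to the ISO/IEC 8859-2 layout."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("code value must fit in one byte")
    if byte <= 0x1F or byte == 0x7F or 0x80 <= byte <= 0x9F:
        return "unassigned"       # left to control-code standards
    if byte == 0x20:
        return "space"
    if byte == 0xA0:
        return "no-break space"
    if byte == 0xAD:
        return "soft hyphen"
    return "graphic"


for value in (0x09, 0x20, 0x41, 0x7F, 0x85, 0xA0, 0xA1, 0xAD):
    print(f"0x{value:02X}: {classify(value)}")
```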

ISO/IEC 8859-2 (Latin-2)
Each cell gives the character, its Unicode code point (hexadecimal), and its decimal code value.
      _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
  0_   (not assigned)
  1_   (not assigned)
  2_   SP 0020 32   ! 0021 33   " 0022 34   # 0023 35   $ 0024 36   % 0025 37   & 0026 38   ' 0027 39   ( 0028 40   ) 0029 41   * 002A 42   + 002B 43   , 002C 44   - 002D 45   . 002E 46   / 002F 47
  3_   0 0030 48   1 0031 49   2 0032 50   3 0033 51   4 0034 52   5 0035 53   6 0036 54   7 0037 55   8 0038 56   9 0039 57   : 003A 58   ; 003B 59   < 003C 60   = 003D 61   > 003E 62   ? 003F 63
  4_   @ 0040 64   A 0041 65   B 0042 66   C 0043 67   D 0044 68   E 0045 69   F 0046 70   G 0047 71   H 0048 72   I 0049 73   J 004A 74   K 004B 75   L 004C 76   M 004D 77   N 004E 78   O 004F 79
  5_   P 0050 80   Q 0051 81   R 0052 82   S 0053 83   T 0054 84   U 0055 85   V 0056 86   W 0057 87   X 0058 88   Y 0059 89   Z 005A 90   [ 005B 91   \ 005C 92   ] 005D 93   ^ 005E 94   _ 005F 95
  6_   ` 0060 96   a 0061 97   b 0062 98   c 0063 99   d 0064 100   e 0065 101   f 0066 102   g 0067 103   h 0068 104   i 0069 105   j 006A 106   k 006B 107   l 006C 108   m 006D 109   n 006E 110   o 006F 111
  7_   p 0070 112   q 0071 113   r 0072 114   s 0073 115   t 0074 116   u 0075 117   v 0076 118   w 0077 119   x 0078 120   y 0079 121   z 007A 122   { 007B 123   | 007C 124   } 007D 125   ~ 007E 126   (7F not assigned)
  8_   (not assigned)
  9_   (not assigned)
  A_   NBSP 00A0 160   Ą 0104 161   ˘ 02D8 162   Ł 0141 163   ¤ 00A4 164   Ľ 013D 165   Ś 015A 166   § 00A7 167   ¨ 00A8 168   Š 0160 169   Ş 015E 170   Ť 0164 171   Ź 0179 172   SHY 00AD 173   Ž 017D 174   Ż 017B 175
  B_   ° 00B0 176   ą 0105 177   ˛ 02DB 178   ł 0142 179   ´ 00B4 180   ľ 013E 181   ś 015B 182   ˇ 02C7 183   ¸ 00B8 184   š 0161 185   ş 015F 186   ť 0165 187   ź 017A 188   ˝ 02DD 189   ž 017E 190   ż 017C 191
  C_   Ŕ 0154 192   Á 00C1 193   Â 00C2 194   Ă 0102 195   Ä 00C4 196   Ĺ 0139 197   Ć 0106 198   Ç 00C7 199   Č 010C 200   É 00C9 201   Ę 0118 202   Ë 00CB 203   Ě 011A 204   Í 00CD 205   Î 00CE 206   Ď 010E 207
  D_   Đ 0110 208   Ń 0143 209   Ň 0147 210   Ó 00D3 211   Ô 00D4 212   Ő 0150 213   Ö 00D6 214   × 00D7 215   Ř 0158 216   Ů 016E 217   Ú 00DA 218   Ű 0170 219   Ü 00DC 220   Ý 00DD 221   Ţ 0162 222   ß 00DF 223
  E_   ŕ 0155 224   á 00E1 225   â 00E2 226   ă 0103 227   ä 00E4 228   ĺ 013A 229   ć 0107 230   ç 00E7 231   č 010D 232   é 00E9 233   ę 0119 234   ë 00EB 235   ě 011B 236   í 00ED 237   î 00EE 238   ď 010F 239
  F_   đ 0111 240   ń 0144 241   ň 0148 242   ó 00F3 243   ô 00F4 244   ő 0151 245   ö 00F6 246   ÷ 00F7 247   ř 0159 248   ů 016F 249   ú 00FA 250   ű 0171 251   ü 00FC 252   ý 00FD 253   ţ 0163 254   ˙ 02D9 255
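As a cross-check, the cells of the layout table can be regenerated from Python's built-in "iso-8859-2" codec. This is a sketch, not part of the standard; the function name `table_row` is hypothetical, and it is meant for rows 2 to 7 and A to F, since the codec fills the unassigned rows with control codes.

```python
def table_row(high_nibble: int) -> list:
    """Build the 16 cells of one row as 'character hex-code-point decimal'."""
    cells = []
    for low_nibble in range(16):
        byte = (high_nibble << 4) | low_nibble
        char = bytes([byte]).decode("iso-8859-2")
        cells.append(f"{char} {ord(char):04X} {byte}")
    return cells


# Print row A_ of the table (cell A0 contains the no-break space itself).
for cell in table_row(0xA):
    print(cell)
```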
Content from Wikipedia Licensed under CC-BY-SA.

SI 960

topic

The Israeli Standards Institute 's Standard SI 960 defines a 7-bit Hebrew code page derived from but not related to ISO/IEC 646 . It is also known as DEC Hebrew (7-bit) , because DEC standardized this character set before it became an international standard. Kermit named it hebrew-7 and HEBREW-7 . The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew is stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is also reflected in ISO 8859-8 . Code page layout SI 960 _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F   0_   NUL 0000 0 SOH 0001 1 STX 0002 2 ETX 0003 3 EOT 0004 4 ENQ 0005 5 ACK 0006 6 BEL 0007 7 BS 0008 8 HT 0009 9 LF 000A 10 VT 000B 11 FF 000C 12 CR 000D 13 SO 000E 14 SI 000F 15   1_   DLE 0010 16 DC1 0011 17 DC2 0012 18 DC3 0013 19 DC4 0014 20 NAK 0015 21 SYN 0016 22 ETB 0017 23 CAN 0018 24 EM 0019 25 SUB 001A 26 ESC 001B 27 FS 001C 28 GS 001D 29 RS 001E 30 US 001F 31   2 ...more...
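The relationship between the two mappings can be sketched in Python: setting the high bit of a 7-bit Hebrew code in 0x60–0x7A gives the corresponding ISO 8859-8 byte in 0xE0–0xFA, which the standard library codec "iso-8859-8" can decode. The function name `decode_si960` is hypothetical, and treating every other position as plain ASCII is an assumption made for this illustration. No reordering is attempted, so visually ordered input stays in visual order.

```python
def decode_si960(data: bytes) -> str:
    """Decode 7-bit Hebrew by shifting 0x60-0x7A into the ISO 8859-8 range."""
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("SI 960 is a 7-bit code")
        if 0x60 <= byte <= 0x7A:
            # Same letter, high bit set: 0x60 -> 0xE0 (alef) ... 0x7A -> 0xFA (tav).
            chars.append(bytes([byte | 0x80]).decode("iso-8859-8"))
        else:
            chars.append(chr(byte))  # assumption: remaining positions as in ASCII
    return "".join(chars)


print(decode_si960(b"\x60\x61\x62 123"))
```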



Code page

topic

In computing , a code page is a table of values that describes the character set used for encoding a particular set of characters , usually combined with a number of control characters . The term "code page" originated from IBM 's EBCDIC -based mainframe systems, but Microsoft , SAP , and Oracle Corporation are among the few vendors which use this term. The majority of vendors identify their own character sets by a name. In the case when there is a plethora of character sets (like in IBM), identifying character sets through a number is a convenient way to distinguish them. Originally, the code page numbers referred to the page numbers in the IBM standard character set manual, a condition which has not held for a long time. Vendors that use a code page system allocate their own code page number to a character encoding , even if it is better known by another name; for example, UTF-8 has been assigned page numbers 1208 at IBM, 65001 at Microsoft, and 4110 at SAP. Hewlett-Packard uses a similar concept in it ...more...



ASCII

topic

ASCII (ASS-kee), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII, although they support many additional characters. ASCII is the traditional name for the encoding system; the Internet Assigned Numbers Authority (IANA) prefers the updated name US-ASCII, which clarifies that this system was developed in the US and based on the typographical symbols predominantly in use there. ASCII is one of the IEEE milestones.

Overview

ASCII was developed from telegraph code. Its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began on October 6, 1960, with the first meeting of the American Standards Association's (ASA) (now ...more...



Macintosh Latin encoding

topic

Mac Latin encoding is used in Apple Macintosh computers on fonts for the Mac Kermit to represent text (but not on standard Mac OS fonts). It is a modification of Mac OS Roman to include all characters in ISO/IEC 8859-1 , DEC-MCS , the PostScript Standard Encoding , and Dutch ISO 646 (with ÿ or ij being a substitute for ij). Each character is shown with its equivalent Unicode code point and its decimal code point. Only the second half of the table (code points 128–255) is shown, the first half (code points 0–127) being the same as ASCII . –0 –1 –2 –3 –4 –5 –6 –7 –8 –9 –A –B –C –D –E –F   8_   Ä 00C4 128 Å 00C5 129 Ç 00C7 130 É 00C9 131 Ñ 00D1 132 Ö 00D6 133 Ü 00DC 134 á 00E1 135 à 00E0 136 â 00E2 137 ä 00E4 138 ã 00E3 139 å 00E5 140 ç 00E7 141 é 00E9 142 è 00E8 143   9_   ê 00EA 144 ë 00EB 145 í 00ED 146 ì 00EC 147 î 00EE 148 ï 00EF 149 ñ 00F1 150 ó 00F3 151 ò 00F2 152 ô 00F4 153 ö 00F6 154 õ 00F5 155 ú 00FA 156 ù 00F9 157 û 00FB 158 ü 00FC 159   A_   Ý 00DD 160 ° 00B0 161 ¢ 00A2 162 £ 00A3 163 § 00A7 164 × 00D ...more...



Multinational Character Set

topic

The Multinational Character Set ( DMCS or MCS ) is a character encoding created in 1983 by Digital Equipment Corporation (DEC) for use in the popular VT220 terminal . It was an 8-bit extension of ASCII that added accented characters, currency symbols , and other character glyphs missing from 7-bit ASCII. It is only one of the code pages implemented for the VT220 National Replacement Character Set (NRCS). MCS is registered as IBM code page 1100 ( Multinational Emulation ) since 1992. Depending on associated sorting Oracle calls it WE8DEC , N8DEC , DK8DEC , S8DEC , or SF8DEC . Such " extended ASCII " sets were common (the National Replacement Character Set provided sets for more than a dozen European languages), but MCS has the distinction of being the ancestor of ECMA-94 in 1985 and ISO 8859-1 in 1987. The code chart of MCS with ECMA-94, ISO 8859-1 and the first 256 code points of Unicode have many more similarities than differences. In addition to unused code points, differences from ISO 8859-1 are: MCS c ...more...



Byte order mark

topic

The byte order mark ( BOM ) is a Unicode character, U+FEFF byte order mark (BOM), whose appearance as a magic number at the start of a text stream can signal several things to a program consuming the text: What byte order, or endianness , the text stream is stored in; The fact that the text stream is Unicode, to a high level of confidence; Which of several Unicode encodings that text stream is encoded as. BOM use is optional, and, if used, should appear at the start of the text stream. Unicode can be encoded in units of 8-bit, 16-bit, or 32-bit integers. For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. Because the BOM itself is encoded in the same scheme as the rest of the document, but has a known value, the consumer of the text can examine these first few bytes to determine the encoding. The BOM thus gives the producer of the text a way to describe the text endianness to the consumer of the text without requi ...more...
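A minimal sketch of how a consumer might examine the first few bytes is shown below. It uses the BOM constants from Python's `codecs` module; the function name `sniff_bom` and the returned encoding labels are choices made for this illustration. The UTF-32 marks are checked first because the little-endian UTF-32 BOM begins with the same two bytes as the little-endian UTF-16 BOM.

```python
import codecs

# Longer marks first, so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


sample = codecs.BOM_UTF16_BE + "Łódź".encode("utf-16-be")
encoding, skip = sniff_bom(sample)
print(encoding, sample[skip:].decode(encoding))
```

Since BOM use is optional, a result of `(None, 0)` says nothing about the encoding; the caller still needs a default or an external label.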



Everson Mono

topic

Everson Mono is a monospaced humanist sans serif Unicode font whose development by Michael Everson began in 1995. At first, Everson Mono was a collection of 8-bit fonts containing glyphs for tables in ISO/IEC 10646 ; at that time, it was not easy to edit cmaps to have true Unicode indices, and there were very few applications which could do anything with a font so encoded in any case. The original "Everson Mono" had a MacRoman character set, and other versions were named with suffixes: "Everson Mono Latin B", "Everson Mono Currency", "Everson Mono Armenian" and so on. A range of fonts with the character set of the ISO/IEC 8859 series were also made. A large font distributed in 2003 was named "Everson Mono Unicode", but since 2008 the font has been named simply "Everson Mono". At present, there are regular, italic, bold, and bold-italic styles. Range, Characters, Version Everson Mono version 7.0.0, dated 2014-12-04, contains 9,632 characters (9,659 glyphs ). Previous major releases contained fewer characters: ...more...



Code page 912

topic

Code page 912 (also known as CP 912, IBM 00912) is a code page used under IBM AIX and DOS to write the Albanian , Bosnian , Croatian , Czech , Hungarian , Polish , Romanian , Slovak , Slovene , and Sorbian languages. It is an extension of ISO/IEC 8859-2 . Code page layout In the following table characters are shown together with their corresponding Unicode code points. Code A0 is the NON-BREAKING SPACE . Code AD is a SOFT HYPHEN, which even in isolation may not appear at all in compliant web browsers. Code Page 912 _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F   8_   ░ 2591 128 ▒ 2592 129 ▓ 2593 130 │ 2502 131 ┤ 2524 132 ┘ 2518 133 ┌ 250C 134 █ 2588 135 © 00A9 136 ╣ 2563 137 ║ 2551 138 ╗ 2557 139 ╝ 255D 140 ¢ 00A2 141 ¥ 00A5 142 ┐ 2510 143   9_   └ 2514 144 ┴ 2534 145 ┬ 252C 146 ├ 251C 147 ─ 2500 148 ┼ 253C 149 ▄ 2584 150 ▀ 2580 151 ╚ 255A 152 ╔ 2554 153 ╩ 2569 154 ╦ 2566 155 ╠ 2560 156 ═ 2550 157 ╬ 256C 158 ® 00AE 159   A_   NBSP 00A0 160 Ą 0104 161 ˘ 02D8 162 Ł 0141 163 ¤ 00A4 164 Ľ 013D 165 Ś 015A 166 § ...more...



Basic Latin (Unicode block)

topic

The Basic Latin or C0 Controls and Basic Latin Unicode block is the first block of the Unicode standard, and the only block which is encoded in one byte in UTF-8. The block contains all the letters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters, and includes the C0 controls, ASCII punctuation and symbols, ASCII digits, both the uppercase and lowercase letters of the Latin alphabet, and a control character. The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.

Table of characters

Code   Description   Acronym
C0 controls
U+0000   Null character   NUL
U+0001   Start of Heading   SOH
U+0002   Start of Text   STX
U+0003   End-of-text character   ETX
U+0004   End-of-transmission character   EOT
U+0005   Enquiry character   ENQ
U+0006   Acknowledge character   ACK
U+0007   Bell character   BEL
U+0008   Backspace   BS
U+0009   Horizontal tab   HT
U+000A   Line feed   LF
U+000B   Vertical tab   VT
U+000C   Form feed   FF
U ...more...



Numeric character reference

topic

A numeric character reference ( NCR ) is a common markup construct used in SGML and SGML-derived markup languages such as HTML and XML . It consists of a short sequence of characters that, in turn, represents a single character. Since WebSgml , XML and HTML 4 , the code points of the Universal Character Set (UCS) of Unicode are used. NCRs are typically used in order to represent characters that are not directly encodable in a particular document (for example, because they are international characters that don't fit in the 8-bit character set being used, or because they have special syntactic meaning in the language). When the document is interpreted by a markup-aware reader, each NCR is treated as if it were the character it represents. Examples In SGML, HTML, and XML, the following are all valid numeric character references for the Greek capital letter Sigma Numerical character reference of U+03A3 Σ GREEK CAPITAL LETTER SIGMA (3A3 = 931) Unicode character Numerical base Numerical reference in markup Effect U ...more...
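The typical use described above, representing characters that the document's 8-bit character set cannot encode, can be sketched in Python. The standard library error handler `xmlcharrefreplace` substitutes decimal NCRs during encoding, and `html.unescape` resolves them again; the helper names `to_ncr` and `escape_unencodable` are hypothetical.

```python
import html


def to_ncr(char: str, hexadecimal: bool = False) -> str:
    """Build the decimal or hexadecimal numeric character reference for `char`."""
    code_point = ord(char)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"


def escape_unencodable(text: str, encoding: str) -> str:
    """Replace characters that `encoding` cannot represent with decimal NCRs."""
    return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)


print(to_ncr("Σ"), to_ncr("Σ", hexadecimal=True))
print(escape_unencodable("Σ = Łódź", "iso-8859-2"))
print(html.unescape("&#931; &#x3A3;"))
```

In the second call only the Greek capital sigma is replaced, because the Polish letters exist in ISO-8859-2 and can stay as ordinary bytes.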



Euro sign

topic

The euro sign (€) is the currency sign used for the euro, the official currency of the Eurozone in the European Union (EU). The design was presented to the public by the European Commission on 12 December 1996. The international three-letter code (according to ISO standard ISO 4217) for the euro is EUR. In Unicode it is encoded at U+20AC € EURO SIGN (HTML &#8364; · &euro;). In English, the sign precedes the value (for instance, €10, not 10 €). In some style guides, but not others, the euro sign is unspaced.

Design

Official graphic construction of the euro logo. The euro design featured in the Windows font Comic Sans originally had a cartoon eye inside a serif. This was later removed after fears of legal action from the EU. The euro currency sign was designed to be similar in structure to the old sign for the European Currency Unit (encoded as U+20A0 ₠). There were originally thirty-two proposals; these were reduced to ten candidates. These ten were put to a public survey. After the survey had narrowe ...more...



Ideographic Rapporteur Group

topic

The Ideographic Rapporteur Group ( IRG ) advises the Unicode Consortium and the ISO /IEC JTC1/SC2/WG2 on Han character additions to the repertoire of the Unicode and ISO/IEC 10646-1 ( Universal Multiple-Octet Coded Character Set ) character set standards, and on Han unification . The working members of the IRG are either appointed by member governments, or are invited experts from other countries. IRG members include Mainland China , Hong Kong , Macau , Taipei Computer Association , Singapore , Japan , South Korea , North Korea , Vietnam and the United States . As of Unicode 10.0, the IRG has contributed several blocks of characters to Unicode/UCS, including the CJK Unified Ideographs , CJK Compatibility Ideographs (and Supplement), and CJK Unified Ideographs Extensions A, B, C, D, E, and F. References "ISO/IEC JTC1/SC2/WG2/IRG: Ideographic Rapporteur Group" . "Unicode Standard Annex #45: U-source Ideographs" . The Unicode Standard. Unicode Consortium. "Appendix E: Han Unification History" (PDF) . The Unic ...more...



QR code

topic

QR code for the URL of the English Wikipedia Mobile main page QR code (abbreviated from Quick Response Code ) is the trademark for a type of matrix barcode (or two-dimensional barcode ) first designed for the automotive industry in Japan . A barcode is a machine-readable optical label that contains information about the item to which it is attached. A QR code uses four standardized encoding modes (numeric, alphanumeric, byte/binary, and kanji ) to efficiently store data; extensions may also be used. The QR code system became popular outside the automotive industry due to its fast readability and greater storage capacity compared to standard UPC barcodes . Applications include product tracking, item identification, time tracking, document management, and general marketing. A QR code consists of black squares arranged in a square grid on a white background, which can be read by an imaging device such as a camera, and processed using Reed–Solomon error correction until the image can be appropriately interpreted. ...more...



ConScript Unicode Registry

topic

The ConScript Unicode Registry is a volunteer project to coordinate the assignment of code points in the Unicode Private Use Area for the encoding of artificial scripts including those for constructed languages . It was founded by John Cowan and is maintained by him and Michael Everson but has not been updated in several years. It has no formal connection with the Unicode Consortium . The Under-ConScript Unicode Registry (UCSUR) is a clone of the CSUR that is acting as a holding area for new scripts until they can be added to the dormant CSUR. It is run by Rebecca Bettencourt. Scripts The CSUR and UCSUR include the following scripts: Scripts in the ConScript Unicode Registry and Under-ConScript Unicode Registry Writing System Creator(s) Code range Remark Aiha Ursula K. Le Guin F8A0-F8CF Alzetjan Herman Miller E550-E57F Amlin Thomas Thurman E6D0-E6EF Only in UCSUR Amman-Iar David Bell E2A0-E2CF aUI John W. Weilgart E280-E29F Aurebesh Stephen Crane E890-E8DF Only in UCSUR Cirth J. R. R. Tolkien E080-E0FF Unico ...more...



Unicode

topic

Logo of the Unicode Consortium Unicode is a computing industry standard for the consistent encoding , representation, and handling of text expressed in most of the world's writing systems . The latest version contains a repertoire of 136,755 characters covering 139 modern and historic scripts , as well as multiple symbol sets. The Unicode Standard is maintained in conjunction with ISO/IEC 10646 , and both are code-for-code identical. The Unicode Standard consists of a set of code charts for visual reference, an encoding method and set of standard character encodings , a set of reference data files , and a number of related items, such as character properties, rules for normalization , decomposition, collation , rendering, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew , and left-to-right scripts). As of June 2017, the most recent version is Unicode 10.0. The standard is maintained by the Unicode Consortium . Unicode's success ...more...



List of information system character sets

topic

This list provides an inventory of character coding standards mainly before modern standards like ISO/IEC 646 etc. Some of these standards have been deeply involved in historic events that still have consequences. One notable example of this is the ITA2 coding used during the World War II (1939-1945). The nature of these standards is not as common knowledge like it is for ASCII or EBCDIC or their slang names. While 8-bit is the de facto standard as of 2016, in the past 5-bit and 6-bit were more prevalent or their multiple. Code Introduction Width Usage Morse code ca. 1837-1840 varies Electrical telegraphs Baudot code aka ITA1 1870 5 bits Piano like telegraph operation, SIGCUM cipher operation Chinese telegraph code 1881 4 digits Chinese telegraph communications Murray code 1901 5 bits Machine run telegraph operation using punched paper, moved optimization from minimal operator fatigue to minimal machinery wear ITA2 1924 5 bits Teletypewrite, Telecommunications devices for the deaf (TDD), Telex , Amateur radio ...more...



Text file

topic

A text file (sometimes spelled "textfile"; an old alternative name is "flatfile") is a kind of computer file that is structured as a sequence of lines of electronic text . A text file exists stored as data within a computer file system . The end of a text file is often denoted by placing one or more special characters, known as an end-of-file marker, after the last line in a text file. Such markers were required under the CP/M and MS-DOS operating systems. On modern operating systems such as Windows and Unix-like systems, text files do not contain any special EOF character. "Text file" refers to a type of container, while plain text refers to a type of content. Text files can contain plain text, but they are not limited to such. At a generic level of description, there are two kinds of computer files: text files and binary files . Data storage A stylized iconic depiction of a CSV -formatted text file . Because of their simplicity, text files are commonly used for storage of information. They avoid some of the ...more...


XML

topic

In computing , Extensible Markup Language ( XML ) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable . The W3C 's XML 1.0 Specification and several other related specifications —all of them free open standards —define XML. The design goals of XML emphasize simplicity, generality, and usability across the Internet . It is a textual data format with strong support via Unicode for different human languages . Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services . Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data. Applications of XML The essence of why extensible markup languages are necessary is explained at Markup language (for example, see Markup language § XML ) and at Standard G ...more...



BraSCII

topic

BraSCII is an encoded repertoire of characters that was used in Brazil. It was used in the 1980s on several printers, in applications like Carta Certa , in video boards and it was the standard character set in the Brazilian line of MSX computers. History This character set was devised in 1986 by the Brazilian National Standards Organization (Associação Brasileira de Normas Técnicas (ABNT)) through the standard NBR-9614:1986 and later revised in 1991 in the standard NBR-9611:1991. The code is based on the ISO/IEC 4873 standards, and it was nicknamed “BraSCII” ( Bra zilian S tandard C ode for I nformation I nterchange) in analogy to “ American Standard Code for Information Interchange ” (ASCII). While ASCII is a 7-bit code, BraSCII is an 8-bit code, where the characters from 160 to 255 were configured to support extended characters. It is nearly identical to ECMA-94 (1985) and ISO 8859-1 (1987) except that the characters × and ÷ are replaced by Œ and œ , as they still were in the Multinational Character Se ...more...



Code page 915

topic

Code page 915 (also known as CP 915, IBM 00915) is a code page used under IBM AIX and DOS to write the Bulgarian , Belarusian , Russian , Serbian and Macedonian but was never widely used. It would also have been usable for Ukrainian in the Soviet Union from 1933–1990, but it is missing the Ukrainian letter ge , ґ, which is required in Ukrainian orthography before and since, and during that period outside Soviet Ukraine . As a result, IBM created Code page 1124 . It is an extension of ISO/IEC 8859-5 . Code page layout In the following table characters are shown together with their corresponding Unicode code points. Code A0 is the NON-BREAKING SPACE . Code AD is a SOFT HYPHEN, which even in isolation may not appear at all in compliant web browsers. Code Page 915 _0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F   8_   ░ 2591 128 ▒ 2592 129 ▓ 2593 130 │ 2502 131 ┤ 2524 132 ┘ 2518 133 ┌ 250C 134 █ 2588 135 © 00A9 136 ╣ 2563 137 ║ 2551 138 ╗ 2557 139 ╝ 255D 140 ¢ 00A2 141 ¥ 00A5 142 ┐ 2510 143   9_   └ 2514 144 ┴ 25 ...more...



Ogonek

topic

The ogonek ( Polish : , "little tail", the diminutive of ogon; Lithuanian : nosinė , "nasal") is a diacritic hook placed under the lower right corner of a vowel in the Latin alphabet used in several European languages, and directly under a vowel in several Native American languages . An ogonek can also be attached to the top of a vowel in Old Norse-Icelandic to show length or vowel affection . For example, o᷎ represents i-mutated ø. Ogonek Use Polish (letters ą , ę ) Kashubian ( ą ) scholarly transcriptions of Old Church Slavonic and Proto-Slavic (ę, ǫ) scholarly transcriptions of Vulgar Latin and Proto-Romance (ę, ǫ) Lithuanian (ą, ę, į, ų) Cayuga (letters ę , ǫ) Creek (ą, ąą, ę, ęę, į, įį, ǫ, ǫǫ) Navajo and Western Apache language (ą, ąą, ę, ęę, į, įį, ǫ, ǫǫ, ą́ ,ę́, į́, ǫ́) Mescalero-Chiricahua (ą, ąą, ę, ęę, į, įį, ų, ųų), Tutchone (ą, ę, į, ų, y̨) Gwich’in (ą, ąą, ę, ęę, į, įį, ǫ, ǫǫ, ų, ųų) Dogrib (ą, ąą, ę, ęę, į, įį, ǫ, ǫǫ) Ho-Chunk (ą, ąą, į, įį, ų, ųų) Elfdalian (ą, ę, į, ų, y̨ and ą̊) Rheinische D ...more...



Non-breaking space

topic

In word processing and digital typesetting, a non-breaking space (" ") (also called no-break space, non-breakable space (NBSP), hard space, or fixed space) is a space character that prevents an automatic line break at its position. In some formats, including HTML, it also prevents consecutive whitespace characters from collapsing into a single space. In HTML, the common non-breaking space, which is the same width as the ordinary space character, is encoded as &nbsp; or &#160;. In Unicode, it is encoded as U+00A0. Non-breaking space characters with other widths also exist.

Uses and variations

Despite having layout and uses similar to those of whitespace, it differs in contextual behavior.

Non-breaking behavior

Text-processing software typically assumes that an automatic line break may be inserted anywhere a space character occurs; a non-breaking space prevents this from happening (provided the software recognizes the character). For example, if the text "100 km" will not quite fit at the end of a li ...more...



Cyrillic (Unicode block)

topic

Cyrillic is a Unicode block containing the characters used to write the most widely used languages with a Cyrillic orthography. The core of the block is based on the ISO 8859-5 standard, with additions for minority languages and historic orthographies. Block Cyrillic Official Unicode Consortium code chart (PDF)   0 1 2 3 4 5 6 7 8 9 A B C D E F U+040x Ѐ Ё Ђ Ѓ Є Ѕ І Ї Ј Љ Њ Ћ Ќ Ѝ Ў Џ U+041x А Б В Г Д Е Ж З И Й К Л М Н О П U+042x Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я U+043x а б в г д е ж з и й к л м н о п U+044x р с т у ф х ц ч ш щ ъ ы ь э ю я U+045x ѐ ё ђ ѓ є ѕ і ї ј љ њ ћ ќ ѝ ў џ U+046x Ѡ ѡ Ѣ ѣ Ѥ ѥ Ѧ ѧ Ѩ ѩ Ѫ ѫ Ѭ ѭ Ѯ ѯ U+047x Ѱ ѱ Ѳ ѳ Ѵ ѵ Ѷ ѷ Ѹ ѹ Ѻ ѻ Ѽ ѽ Ѿ ѿ U+048x Ҁ ҁ ҂  ҃  ҄  ҅  ҆  ҇  ҈  ҉ Ҋ ҋ Ҍ ҍ Ҏ ҏ U+049x Ґ ґ Ғ ғ Ҕ ҕ Җ җ Ҙ ҙ Қ қ Ҝ ҝ Ҟ ҟ U+04Ax Ҡ ҡ Ң ң Ҥ ҥ Ҧ ҧ Ҩ ҩ Ҫ ҫ Ҭ ҭ Ү ү U+04Bx Ұ ұ Ҳ ҳ Ҵ ҵ Ҷ ҷ Ҹ ҹ Һ һ Ҽ ҽ Ҿ ҿ U+04Cx Ӏ Ӂ ӂ Ӄ ӄ Ӆ ӆ Ӈ ӈ Ӊ ӊ Ӌ ӌ Ӎ ӎ ӏ U+04Dx Ӑ ӑ Ӓ ӓ Ӕ ӕ Ӗ ӗ Ә ә Ӛ ӛ Ӝ ӝ Ӟ ӟ U+04Ex Ӡ ӡ Ӣ ӣ Ӥ ӥ Ӧ ӧ Ө ө Ӫ ӫ Ӭ ӭ Ӯ ӯ U+04Fx Ӱ ӱ Ӳ ӳ Ӵ ӵ Ӷ ӷ Ӹ ӹ Ӻ ӻ Ӽ ӽ Ӿ ӿ Notes History The following Unicode-related do ...more...



Unicode input

topic

Unicode input is the insertion of a specific Unicode character on a computer by a user ; it is a common way to input characters not directly supported by a physical keyboard. Unicode characters can be inserted in three ways: from the screen by means of an applet from which one can select the character, by pasting from the operating system's clipboard , or by typing a certain sequence of keys on a physical keyboard . Unicode is similar to ASCII , but provides many more options and can store more signs. A Unicode input system needs to provide a large repertoire of characters, ideally all valid Unicode code points. This is different from a keyboard layout which defines keys and their combinations only for a limited number of characters appropriate for a certain locale . KCharSelect picks some of Unicode Mathematical Operators Unicode numbers Unicode characters are distinguished by code points , which are conventionally represented by "U+" followed by four or five hexadecimal digits , for example U+00AE or U+1D31 ...more...



Slovene alphabet

topic

The Slovene alphabet ( Slovene : slovenska abeceda , pronounced  or slovenska gajica ) is an extension of the Latin script and is used in the Slovene language . The standard language uses a Latin alphabet which is a slight modification of Serbo-Croatian Gaj's Latin alphabet , consisting of 25 lower- and upper-case letters: Letter Name IPA English Approx. A, a a a rm B, b be b at C, c ce ca ts Č, č če ch arge D, d de d ay E, e e , , b e d, sl e igh, a ttack F, f ef f at G, g ge g one H, h ha (Scottish English) lo ch I, i i m e J, j je y es K, k ka c at L, l el , l id, w ine M, m em m onth N, n en n ose O, o o , v o id, s o w P, p pe p oke R, r er (trilled) r isk S, s es s at Š, š eš sh in T, t te t ook U, u u s oo th V, v, ve , v ex, w est Z, z ze z oo Ž, ž že vi s ion Source: Omniglot The following Latin letters are also found in names of non-Slovene origin: Ć (mehki č), Đ (mehki dž), Q (ku), W (dvojni ve), X (iks), and Y (ipsilon), Ä , Ë , Ö , Ü . Diacritics The Slovene alphabet in various fonts ( Times New ...more...



Vertical bar

topic

The vertical bar ( | ) is a computer character and glyph with various uses in mathematics, computing, and typography. It has many names, often related to particular meanings: Sheffer stroke (in logic), verti-bar, vbar, stick, vertical line, vertical slash, bar, glidus, obelisk, or pipe, and several variants on these names. It is occasionally considered an allograph of broken bar (see below).

Usage

Mathematics

The vertical bar is used as a mathematical symbol in numerous ways:

absolute value: $|x|$, read "the absolute value of x"
set-builder notation: $\{x \mid x < 2\}$, read "the set of x such that x is less than two". Often a colon ':' is used instead of a vertical bar
cardinality: $|S|$, read "the cardinality of the set S"
conditional probability: $P(X|Y)$, read "the probability of X given Y"
divisibility: $a|b$, read "a divides b" or "a is a factor of b", though Unicode also provides ...more...



UTF-32

UTF-32 stands for Unicode Transformation Format in 32 bits. It is a protocol to encode Unicode code points that uses exactly 32 bits per Unicode code point (but a number of leading bits must be zero as there are fewer than 2^21 Unicode code points). UTF-32 is a fixed-length encoding, in contrast to all other Unicode transformation formats, which are variable-length encodings. Each 32-bit value in UTF-32 represents one Unicode code point and is exactly equal to that code point's numerical value. The main advantage of UTF-32 is that the Unicode code points are directly indexable. Finding the Nth code point in a sequence of code points is a constant-time operation. In contrast, a variable-length code requires sequential access to find the Nth code point in a sequence. This makes UTF-32 a simple replacement in code that uses integers that are incremented by one to identify a character in a string, as was commonly done for ASCII. The main disadvantage of UTF-32 is that it is space-inefficient, using four bytes per ...more...
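
The indexing difference can be illustrated with a minimal Python sketch. It assumes big-endian UTF-32 without a byte order mark; the helper names and the sample string are illustrative only.

```python
def nth_code_point_utf32(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32BE data by direct offset arithmetic."""
    unit = data[4 * n:4 * n + 4]
    if n < 0 or len(unit) != 4:
        raise IndexError("code point index out of range")
    # The 32-bit value is exactly the code point's numerical value.
    return chr(int.from_bytes(unit, "big"))


def nth_code_point_utf8(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start."""
    count = -1
    for i, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a new code point starts
            count += 1
            if count == n:
                j = i + 1
                while j < len(data) and data[j] & 0xC0 == 0x80:
                    j += 1
                return data[i:j].decode("utf-8")
    raise IndexError("code point index out of range")


text = "Z\u0105b \U0001F600!"          # 6 code points, one outside the BMP
utf32 = text.encode("utf-32-be")       # always 4 bytes per code point
utf8 = text.encode("utf-8")            # 1 to 4 bytes per code point

print(len(utf32), len(utf8))
print(nth_code_point_utf32(utf32, 4) == nth_code_point_utf8(utf8, 4))
```

The UTF-32 lookup touches only four bytes regardless of the index, while the UTF-8 lookup has to walk every preceding byte; the price is the larger encoded size.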



Short Message Peer-to-Peer

Short Message Peer-to-Peer (SMPP) in the telecommunications industry is an open, industry standard protocol designed to provide a flexible data communication interface for the transfer of short message data between External Short Messaging Entities (ESMEs), Routing Entities (REs) and Message Centres. SMPP is often used to allow third parties (e.g. value-added service providers like news organizations) to submit messages, often in bulk, but it may be used for SMS peering as well. SMPP is able to carry short messages including EMS, voicemail notifications, Cell Broadcasts, WAP messages including WAP Push messages (used to deliver MMS notifications), USSD messages and others. Because of its versatility and support for non-GSM SMS protocols, like UMTS, IS-95 (CDMA), CDMA2000, ANSI-136 (TDMA) and iDEN, SMPP is the most commonly used protocol for short message exchange outside SS7 networks.

Operation

Contrary to its name, SMPP uses the client-server model of operation. The Short Message Service ...more...



Zero-width non-joiner

The zero-width non-joiner (ZWNJ) is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively. This is also an effect of a space character, but a ZWNJ is used when it is desirable to keep the words closer together or to connect a word with its morpheme. The ZWNJ is encoded in Unicode as U+200C ZERO WIDTH NON-JOINER (HTML &#8204; · &zwnj;).

Use of ZWNJ and unit separator for correct typography

In certain languages, the ZWNJ is necessary for unambiguously specifying the correct typographic form of a character sequence. The ASCII control code unit separator was formerly used.

Correct (with ZWNJ) Incorrect Meaning Display* Picture Code Display* Picture Code می‌خواهم ‎ می‌خواهم (rendered from right to left): می ‌ خواهم میخواهم ‎ میخواهم Persian 'I want to' עֲו‌ֹנֹת ‎ ‎ עֲו‌ ...more...
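
As a small illustration, the Python sketch below builds the Persian example word with and without U+200C. The code points are written as escapes so that the invisible character cannot be lost in copying; the variable names are illustrative.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Persian word from the example above, spelled out code point by code point.
prefix = "\u0645\u06cc"                  # meem, Farsi yeh
stem = "\u062e\u0648\u0627\u0647\u0645"  # khah, waw, alef, heh, meem

joined = prefix + stem            # letters connect across the boundary
separated = prefix + ZWNJ + stem  # ZWNJ keeps the two parts unjoined

print(unicodedata.name(ZWNJ))       # official character name
print(unicodedata.category(ZWNJ))   # "Cf": a format character, not a space
print(len(joined), len(separated))  # the ZWNJ counts as one code point
print(joined == separated)          # the two strings are different data
```

Because the two spellings are distinct strings, searching or comparing text may require stripping or normalizing the ZWNJ first, depending on the application.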



Gaelic type

Gaelic type (sometimes called Irish character, Irish type, or Gaelic script) is a family of insular typefaces devised for printing Classical Gaelic. It was widely used from the 16th until the mid-18th century (Scotland) or the mid-20th century (Ireland) but is today rarely used. Sometimes all Gaelic typefaces are called Celtic or uncial, though most Gaelic types are not uncials. The "Anglo-Saxon" types of the 17th century are included in this category because both the Anglo-Saxon types and the Gaelic/Irish types derive from the Insular manuscript hand. The terms Gaelic type, Gaelic script, and Irish character translate the Irish phrase cló Gaelach. In Ireland the term cló Gaelach is used in opposition to the term cló Rómhánach 'Roman type'. The Scottish Gaelic term is corra-litir. Alasdair mac Mhaighstir Alasdair was one of the last Scottish writers with the ability to write in this script, although his main work, Ais-Eiridh na Sean Chánoin Albannaich, was published using the Roman alp ...more...



UTF-8

UTF-8 is a variable-width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The encoding is defined by the Unicode standard, and was originally designed by Ken Thompson and Rob Pike. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit. It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as "/" in filenames, "\" in escape sequences, and "%" in printf. Shows the usag ...more...
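
The two properties described here, variable width and ASCII safety, can be checked with a short Python sketch. The sample characters and the helper name are chosen for illustration only.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")


def contains_ascii_byte(ch: str) -> bool:
    """True if any byte of the UTF-8 encoding of ch is in the ASCII range."""
    return any(byte < 0x80 for byte in ch.encode("utf-8"))


# Every byte of a multi-byte sequence has its high bit set, so a path
# separator can only come from a real "/" character in the text.
path = "\u017e/\u0161".encode("utf-8")  # z-caron, slash, s-caron
print(path.count(b"/"))
```

This is why byte-oriented code that only looks for ASCII delimiters can usually process UTF-8 without decoding it first.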



Bi-directional text

Bi-directional text is text that contains both text directionalities: right-to-left (RTL or dextrosinistral) and left-to-right (LTR or sinistrodextral). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, which is changing text directionality in each row. Some writing systems of the world, including the Arabic and Hebrew scripts or derived systems such as the Persian, Urdu, and Yiddish scripts, are written in a form known as right-to-left (RTL), in which writing begins at the right-hand side of a page and concludes at the left-hand side. This is different from the left-to-right (LTR) direction used by most writing systems in the world. When LTR text is mixed with RTL in the same paragraph, each type of text is written in its own direction, which is known as bi-directional text. This can get rather complex when multiple levels of quotation are used. Many computer programs fail to display bi-directional text correctly. For example, the Hebrew n ...more...
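
A program can at least detect mixed-direction text from the directional class Unicode assigns to each character. The Python sketch below does only that; it is not an implementation of the Unicode Bidirectional Algorithm, and the function name is illustrative.

```python
import unicodedata


def is_bidirectional(text: str) -> bool:
    """True if text mixes strong left-to-right and strong right-to-left characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_ltr = "L" in classes                  # strong left-to-right (e.g. Latin)
    has_rtl = bool(classes & {"R", "AL"})     # Hebrew-type and Arabic-type letters
    return has_ltr and has_rtl


latin_only = "abc"
hebrew_only = "\u05d0\u05d1\u05d2"            # alef, bet, gimel
mixed = latin_only + " " + hebrew_only

print(unicodedata.bidirectional("a"))         # class of a Latin letter
print(unicodedata.bidirectional("\u05d0"))    # class of a Hebrew letter
print(unicodedata.bidirectional("\u0627"))    # class of an Arabic letter
print(is_bidirectional(mixed))
```

Deciding the actual display order of such text requires the full algorithm, which also handles neutral characters, digits, and nested embeddings.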



Pound sign

The pound sign (£) is the symbol for the pound sterling, the currency of the United Kingdom (UK). The same symbol is used for similarly named currencies such as the Gibraltar pound or occasionally the Syrian pound. It is also sometimes used for currencies named lira, for example the now withdrawn Italian lira. The symbol derives from a capital "L", representing libra, the basic unit of weight in the Roman Empire, which in turn is derived from the Latin name of the same spelling for scales or a balance. The pound became an English unit of weight and was so named because it originally had the value of one tower pound (~350 grams) of fine (pure) silver. According to the Royal Mint Museum:

The pound sign is placed before the number (e.g. "£12,000") and separated from the following digits by no space or only a thin space. The symbol ₤ (note the double dash at its middle) was called the lira sign in Italy, before the adoption of the euro. It was used (in free variation with £) as an alternative to the ...more...



National Replacement Character Set

The National Replacement Character Set, or NRCS for short, was a feature supported by later models of Digital's (DEC) computer terminal systems, starting with the VT200 series in 1983. NRCS allowed individual characters from one character set to be replaced by one from another set, allowing the construction of different character sets on the fly. It was used to customize the character set to different local languages, without having to change the terminal's ROM for different countries, or alternatively, include many different sets in a larger ROM. Many third-party terminals and terminal emulators supporting VT200 codes also supported NRCS.

Description

ASCII is a 7-bit standard, allowing a total of 128 characters in the character set. Some of these are reserved as control characters, leaving 96 printable characters. This set of 96 printable characters includes upper and lower case letters, numbers, and basic math and punctuation. ASCII does not have enough room to include other common characters such as multi-na ...more...
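
The replacement idea can be sketched in Python as a lookup table applied on top of ASCII. The function name and the table are hypothetical and contain a single illustrative entry (a pound sign shown in place of the number sign, as in a United Kingdom set); this is not a reproduction of any terminal's actual tables.

```python
from typing import Mapping

# Hypothetical replacement table: 7-bit code -> character shown instead of ASCII.
UK_REPLACEMENTS: Mapping[int, str] = {0x23: "\u00a3"}  # '#' position shows a pound sign


def render_7bit(data: bytes, replacements: Mapping[int, str]) -> str:
    """Render 7-bit data as ASCII, except for codes listed in the table."""
    out = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("not 7-bit data")
        out.append(replacements.get(byte, chr(byte)))
    return "".join(out)


payload = b"#12"
print(render_7bit(payload, {}))               # plain ASCII interpretation
print(render_7bit(payload, UK_REPLACEMENTS))  # same bytes, national variant
```

The same bytes display differently depending on the active table, which is why text exchanged between differently configured terminals could appear with the wrong symbols.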



Chinese National Standards

The national standards of the Republic of China administering Taiwan, Penghu, Quemoy and Matsu are titled National Standards of the Republic of China (CNS) (中華民國國家標準). They are administered by the Bureau of Standards, Metrology and Inspection of the Ministry of Economic Affairs. These standards are divided into 26 numbered categories. Applying the National Standards is voluntary unless authorities in charge cite any parts of the standards as laws and regulations. By the end of 2003, more than 15,000 national standards had been issued. Although the Republic of China was removed in 1950 from the International Organization for Standardization (ISO) for failure to pay its membership dues, there are still many National Standards translated from ISO standards into Chinese. A few standards also have English versions, but in case of any divergence of interpretation, the Chinese text shall prevail. Each standard has a general number and may be prefixed "CNS", such as CNS 11296. The general numbers, English n ...more...



CD-Text

CD-Text (Compact Disc Text) is an extension of the Red Book Compact Disc specifications standard for audio CDs. It allows for storage of additional information (e.g. album name, song name, and artist name) on a standards-compliant audio CD. The specification for CD-Text was included in the Multi-Media Commands Set 3 R01 (MMC-3) standard, released in September 1996 and backed by Sony. It was also added to new revisions of the Red Book. The actual text is stored in a format compatible with Interactive Text Transmission System (ITTS), defined in the IEC 61866 standard. The ITTS standard is also applied in the MiniDisc format, as well as in Digital Audio Broadcasting technology.

Storage

The CD-Text information is stored in the subchannels R to W on the disc. This information is usually stored in the subchannels in the lead-in area of the disc, where there is roughly five kilobytes of space available. It can also be stored on the main program area of the disc (where the audio tracks are), which can store about 3 ...more...



Greek alphabet

The Greek alphabet has been used to write the Greek language since the late 9th century BC or early 8th century BC. It was derived from the earlier Phoenician alphabet, and was the first alphabetic script to have distinct letters for vowels as well as consonants. It is the ancestor of the Latin and Cyrillic scripts. Apart from its use in writing the Greek language, in both its ancient and its modern forms, the Greek alphabet today also serves as a source of technical symbols and labels in many domains of mathematics, science and other fields. In its classical and modern forms, the alphabet has 24 letters, ordered from alpha to omega. Like Latin and Cyrillic, Greek originally had only a single form of each letter; it developed the letter case distinction between upper-case and lower-case forms in parallel with Latin during the modern era. Sound values and conventional transcriptions for some of the letters differ between Ancient Greek and Modern Greek usage, because the pronunciation of Greek has chang ...more...



Unicode control characters

Many Unicode control characters are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation. For example, the null character (U+0000) is used in C-programming application environments to indicate the end of a string of characters. In this way, these programs only require a single starting memory address for a string (as opposed to a starting address and a length), since the string ends once the program reads the null character.

ISO 6429 control characters (C0 and C1)

The control characters U+0000–U+001F and U+007F come from ASCII. Additionally, U+0080–U+009F were used in conjunction with ISO 8859 character sets (among others). They are specified in ISO 6429 and often referred to as C0 and C1 control codes respectively. Most of these characters play no explicit role in Unicode text handling. The characters U+0000 (NUL), U+0009 (HT), U+000A (LF), U+000D (CR), and U+0085 (NEL) are commonly used in text processing as formatting characters. ...more...
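
The ranges above can be checked against the Unicode character database with a short Python sketch. The variable names are illustrative, and the byte string merely imitates how a C-style string ends at the first null character.

```python
import unicodedata

ascii_controls = [chr(cp) for cp in range(0x00, 0x20)] + ["\x7f"]  # U+0000-U+001F, U+007F
c1_controls = [chr(cp) for cp in range(0x80, 0xA0)]                # U+0080-U+009F

# All of them carry the general category "Cc" (control).
print(all(unicodedata.category(ch) == "Cc" for ch in ascii_controls + c1_controls))
print(len(ascii_controls), len(c1_controls))

# A null-terminated string: everything after the first NUL is ignored.
buffer = b"Latin-2\x00leftover bytes"
c_string = buffer.split(b"\x00", 1)[0]
print(c_string)

# U+0085 (NEL) is treated as a line boundary by str.splitlines().
print("first\x85second".splitlines())
```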



Unicode Consortium

The Unicode Consortium (Unicode Inc.) is a 501(c)(3) non-profit organization that coordinates the development of the Unicode standard. Its stated goal is to eventually replace existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, contending that many of the alternative schemes are limited in size and scope, and are incompatible with multilingual environments. Unicode's success at unifying character sets has led to its widespread use in the internationalization and localization of software. The standard has been implemented in many recent technologies, including XML, the Java programming language, and modern operating systems. There are various levels of membership, and any company or individual willing to pay the membership dues may join this organization. Full members include most of the main computer software and hardware companies with any interest in text-processing standards, including Adobe Systems, Apple, Facebook, Google, Huawei, IBM ...more...



Latin Extended-A

Latin Extended-A is a Unicode block and is the third block of the Unicode standard. It encodes Latin letters from the Latin ISO character sets other than Latin-1 (which is already encoded in the Latin-1 Supplement block) and also legacy characters from the ISO 6937 standard. The Latin Extended-A block has been in the Unicode Standard since version 1.0, with its entire character repertoire, except for the Latin Small Letter Long S, which was added during unification with ISO 10646 in version 1.1.

Character table

Code (hex)  Grapheme  Name
U+0100      Ā         Latin Capital Letter A with macron
U+0101      ā         Latin Small Letter A with macron
U+0102      Ă         Latin Capital Letter A with breve
U+0103      ă         Latin Small Letter A with breve
U+0104      Ą         Latin Capital Letter A with ogonek
U+0105      ą         Latin Small Letter A with ogonek
U+0106      Ć         Latin Capital Letter C with acute
U+0107      ć         Latin Small Letter C with acute
U+0108      Ĉ         Latin Capital Letter C with circumflex
U+0109      ĉ         Latin Small Letter C with circumflex
U+010A      Ċ         Latin Capital Letter C with dot above ...more...
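
Since the block collects letters from several ISO 8859 parts, only some of its characters exist in ISO/IEC 8859-2. The Python sketch below, using the standard `iso8859_2` codec and an illustrative helper name, checks the first few code points of the block against Latin-2.

```python
import unicodedata
from typing import Optional


def latin2_byte(ch: str) -> Optional[bytes]:
    """Return the ISO/IEC 8859-2 byte for ch, or None if Latin-2 lacks it."""
    try:
        return ch.encode("iso8859_2")
    except UnicodeEncodeError:
        return None


for cp in range(0x0100, 0x010B):
    ch = chr(cp)
    encoded = latin2_byte(ch)
    where = f"0x{encoded[0]:02X}" if encoded else "not in Latin-2"
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)}: {where}")
```

Letters such as A with ogonek map to a single Latin-2 byte, while letters contributed by other parts of ISO 8859 (for example A with macron) cannot be encoded at all.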



L1

L1, L01, L.1, L 1 or L-1 may refer to:

L1 distance in mathematics, used in taxicab geometry
L^1, the space of Lebesgue integrable functions in mathematics
L1, in linguistics, a subject's first language or mother tongue
L1 family, a protein family of cell adhesion molecules
L-1 Identity Solutions, Inc, a US face-recognition corporation
L1 (protein), a cell adhesion molecule
L-1 visa, a document used to enter the United States for the purpose of work
L1, Lagrangian point 1, the most intuitive position for an object to be gravitationally stationary relative to two larger objects (such as a satellite with respect to the Earth and Moon)
L1, one of the frequencies used by GPS systems (see GPS frequencies)
L1, the common name for the Soviet space effort known formally as Soyuz 7K-L1, designed to launch men from the Earth to circle the Moon without going into lunar orbit
L1, an abbreviation denoting someone is a Level 1 Judge, in reference to Magic: The Gathering
Bose L1 Portable Systems
L=1, a lunar eclipse c ...more...



UTF-16

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode. The encoding is variable-length, as code points are encoded with one or two 16-bit code units. (See also Comparison of Unicode encodings for a comparison of UTF-8, -16 and -32.) UTF-16 developed from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set) once it became clear that 16 bits were not sufficient for Unicode's user community.

History

In the late 1980s, work began on developing a uniform encoding for a "Universal Character Set" (UCS) that would replace earlier language-specific encodings with one coordinated system. The goal was to include all required characters from most of the world's languages, as well as symbols from technical domains such as science, mathematics, and music. The original idea was to replace the typical 256-character encodings requiring 1 byte per character with an encoding using 2^16 = 65,536 values requi ...more...
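
How a code point becomes one or two 16-bit code units can be sketched in Python. The excerpt does not spell out the mechanism; the sketch uses the standard surrogate-pair scheme, and the function name is illustrative.

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]                 # fits in a single 16-bit unit
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)         # low 10 bits -> low surrogate
    return [high, low]


for cp in (0x0041, 0x0105, 0x1F600):
    units = to_utf16_units(cp)
    print(f"U+{cp:04X} -> " + " ".join(f"{u:04X}" for u in units))

# 17 planes of 65,536 code points, minus the 2,048 surrogates, gives the
# count of valid code points quoted above.
print(0x110000 - 0x800)
```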



