UCE files


odidio
09-05-2006, 10:25 PM
I asked this question a long time ago at another forum and never really got a solid answer. It's not a problem; I'm just wondering what the 8 *.UCE files are used for. From what I know, most XP installations have them, but their purpose is unclear, to me anyway. These are the files, and they are all in C:\WINDOWS\SYSTEM32.

BOPOMOFO.UCE
GB2312.UCE
IDEOGRAF.UCE
KANJI_1.UCE
KANJI_2.UCE
KOREAN.UCE
SHIFTJIS.UCE
SUBRANGE.UCE

Isn't .UCE an extension for unsolicited commercial email?

Micron
10-05-2006, 01:48 AM
It's a UniCode Extension.

Unicode is an industry standard designed to allow text and symbols from all languages to be consistently represented and manipulated by computers.

Unicode characters can be encoded using any of several schemes termed Unicode Transformation Formats (UTF).

The Unicode Consortium has as its ambitious goal the eventual replacement of existing character encoding schemes with Unicode, as many of the existing schemes are limited in size and scope, and are incompatible with multilingual environments. Its success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language, and modern operating systems.

Unicode in use - Operating systems

Unicode has become the dominant scheme for internal processing of text, and sometimes for storage too (though a lot of text is still stored in legacy encodings). Early adopters tended to use UCS-2 and later moved to UTF-16, as this was the least disruptive way to add support for non-BMP characters. The best-known such system is Windows NT (and its descendants, Windows 2000 and Windows XP). The Java and .NET bytecode environments also use it.

UTF-8 (originally developed for Plan 9) has become the main encoding on most Unix-like operating systems (though others are also used by some libraries) because it is a relatively easy replacement for traditional extended ASCII character sets.
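
To make the UTF-8 versus UTF-16 difference concrete, here is a small Python sketch that counts the bytes each format needs per character. The sample characters and the helper name encoded_sizes are my own choices for illustration, not anything from Windows or the Unicode standard.

```python
# Sketch: compare how many bytes each character needs in UTF-8 and UTF-16.
# The sample characters are illustrative choices, not taken from the thread.
SAMPLE = "A\u00e9\u8449\U0001d11e"  # 'A', 'é', '葉', and U+1D11E (outside the BMP)


def encoded_sizes(text):
    """Return (code point label, UTF-8 byte count, UTF-16 byte count) per character."""
    rows = []
    for ch in text:
        label = f"U+{ord(ch):04X}"
        utf8_len = len(ch.encode("utf-8"))
        # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" prepends.
        utf16_len = len(ch.encode("utf-16-le"))
        rows.append((label, utf8_len, utf16_len))
    return rows


sizes = encoded_sizes(SAMPLE)
for label, utf8_len, utf16_len in sizes:
    print(f"{label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

Plain ASCII stays at one byte in UTF-8, which is why it slots in easily where extended ASCII was used, while the non-BMP character needs four bytes in UTF-16 (a surrogate pair).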

E-mail

MIME defines two different mechanisms for encoding non-ASCII characters in e-mail, depending on whether the characters are in e-mail headers such as the "Subject:" or in the text body of the message. In both cases, the original character set is identified as well as a transfer encoding. For e-mail transmission of Unicode the UTF-8 character set and the Base64 transfer encoding are recommended. The details of the two different mechanisms are specified in the MIME standards and are generally hidden from users of e-mail software.
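
As a rough illustration of those two mechanisms, the sketch below uses Python's standard email package to build a message with a UTF-8 subject header and a UTF-8 body. The subject and body text are made-up samples, and this is only one way a mail program might do it.

```python
# Sketch: MIME encoding of non-ASCII text in a header and in a body.
from email.header import Header, decode_header
from email.mime.text import MIMEText

SUBJECT_TEXT = "Delta \u0394"        # sample subject, chosen for illustration
BODY_TEXT = "Greek delta: \u0394"    # sample body, chosen for illustration

# Header mechanism: the subject becomes an RFC 2047 "encoded word".
subject = Header(SUBJECT_TEXT, "utf-8")
encoded_subject = subject.encode()

# Body mechanism: charset plus a transfer encoding (Base64 for UTF-8 here).
msg = MIMEText(BODY_TEXT, "plain", "utf-8")
msg["Subject"] = subject

# Decode both again to confirm that nothing was lost.
decoded_subject = "".join(
    part.decode(charset or "ascii") if isinstance(part, bytes) else part
    for part, charset in decode_header(encoded_subject)
)
decoded_body = msg.get_payload(decode=True).decode("utf-8")

print(encoded_subject)
print(msg["Content-Transfer-Encoding"])
print(decoded_subject, "|", decoded_body)
```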

The adoption of Unicode in e-mail has been very slow. Most East-Asian text is still encoded in a local encoding such as Shift-JIS, and many commonly used e-mail programs still cannot handle Unicode data correctly, if they have any support at all. This situation is not expected to change in the foreseeable future.

Web

Web browsers have supported several UTFs, especially UTF-8, for many years now. Display problems result primarily from font-related issues. In particular, Internet Explorer doesn't render many code points unless it is explicitly told to use a font that contains them.

Since HTML 4.0, all W3C recommendations have used Unicode as their document character set, with the encoding left variable. Unicode replaced ISO-8859-1, the 8-bit ASCII superset that had been the standard character set and encoding before.

Although syntax rules may affect the order in which characters are allowed to appear, both HTML 4 and XML (including XHTML) documents, by definition, comprise characters from most of the Unicode code points, with the exception of:

* most of the C0 and C1 control codes.
* the permanently-unassigned code points D800–DFFF
* any code point ending in FFFE or FFFF
* any code point above 10FFFF.
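
The exclusion list above can be sketched as a simple check in Python. The function name is_excluded is hypothetical, and since the list only says "most of" the control codes, the sketch assumes that tab, line feed and carriage return are the permitted ones; the real HTML 4 and XML rules differ in their details.

```python
# Sketch of the exclusion list above. ALLOWED_CONTROLS is an assumption:
# the text says "most of" the C0 and C1 controls are excluded without naming
# the exceptions, so tab, LF and CR are treated as permitted here.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def is_excluded(cp):
    """Return True if code point cp falls in one of the excluded groups."""
    if cp < 0 or cp > 0x10FFFF:
        return True  # above the top of the Unicode code space
    if 0xD800 <= cp <= 0xDFFF:
        return True  # D800-DFFF
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
        return True  # ends in FFFE or FFFF
    is_c0 = cp <= 0x1F
    is_c1 = 0x80 <= cp <= 0x9F
    if (is_c0 or is_c1) and cp not in ALLOWED_CONTROLS:
        return True  # C0 or C1 control code
    return False


for cp in (0x00, 0x0A, 0x0394, 0xD800, 0xFFFE, 0x1FFFF, 0x110000):
    print(f"U+{cp:04X}", "excluded" if is_excluded(cp) else "allowed")
```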

These characters manifest either directly as bytes according to the document's encoding, if the encoding supports them, or users may write them as numeric character references based on the character's Unicode code point.

For example, the references &#916;, &#1049;, &#1511;, &#1605;, &#3671;, &#12354;, &#21494;, &#33865;, and &#45307; (or the same numeric values expressed in hexadecimal, with &#x as the prefix) display on browsers as Δ, Й, ק, م, ๗, あ, 叶, 葉 and 냻. If the proper fonts exist, these symbols look like the Greek capital letter "Delta", Cyrillic capital letter "Short I", Hebrew letter "Qof", Arabic letter "Meem", Thai numeral 7, Japanese Hiragana "A", simplified Chinese "Leaf", traditional Chinese "Leaf", and Korean Hangul syllable "Nyaelh", respectively.
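
Numeric character references are just the code point written in decimal (or hex) between &# and a semicolon. A minimal Python sketch, using the nine characters from the example above and the standard html module to turn the references back into characters:

```python
# Sketch: build numeric character references from code points and decode them.
import html

# The nine example characters, written as escapes so the source stays ASCII.
CHARS = "\u0394\u0419\u05e7\u0645\u0e57\u3042\u53f6\u8449\ub0fb"

decimal_refs = [f"&#{ord(ch)};" for ch in CHARS]   # e.g. &#916;
hex_refs = [f"&#x{ord(ch):X};" for ch in CHARS]    # e.g. &#x394;

# html.unescape resolves both forms back to the same characters.
from_decimal = html.unescape("".join(decimal_refs))
from_hex = html.unescape("".join(hex_refs))

print(", ".join(decimal_refs))
print(", ".join(hex_refs))
print(from_decimal == CHARS and from_hex == CHARS)
```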

Fonts

Free and retail fonts based on Unicode occur commonly, since first TrueType and now OpenType support Unicode. These font formats map Unicode code points to glyphs.

Thousands of fonts exist on the market, but fewer than a dozen fonts — sometimes described as "pan-Unicode" fonts — attempt to support the majority of Unicode's character repertoire. Instead, Unicode-based fonts typically focus on supporting only basic ASCII and particular scripts or sets of characters or symbols. Several reasons justify this approach: applications and documents rarely need to render characters from more than one or two writing systems; fonts tend to demand resources in computing environments; and operating systems and applications show increasing intelligence in regard to obtaining glyph information from separate font files as needed. Furthermore, designing a consistent set of rendering instructions for tens of thousands of glyphs constitutes a monumental task; such a venture passes the point of diminishing returns for most typefaces.

Several subsets of Unicode are standardized: Microsoft Windows since Windows NT 4.0 supports WGL-4 with 652 characters, which is considered to support all contemporary European languages using the Latin, Greek or Cyrillic script. Other standardized subsets of Unicode include MES-1 (335 characters) and MES-2 (1062 characters) (CWA 13873:2000, Multilingual European Subsets in ISO/IEC 10646-1).

odidio
10-05-2006, 08:49 PM
A bit over my head but I think I get the idea.

Using Unicode/UTF-8, you can write in emails and source code things such as

Mathematics and sciences:

∮ E⋅da = Q, n → ∞, ∑ f(i) = ∏ g(i),       ⎧⎡⎛┌─────┐⎞⎤⎫
                                          ⎪⎢⎜│a²+b³ ⎟⎥⎪
∀x∈ℝ: ⌈x⌉ = −⌊−x⌋, α ∧ ¬β = ¬(¬α ∨ β),    ⎪⎢⎜│───── ⎟⎥⎪
                                          ⎪⎢⎜⎷ c₈   ⎟⎥⎪
ℕ ⊆ ℕ₀ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ,                   ⎨⎢⎜       ⎟⎥⎬
                                          ⎪⎢⎜ ∞     ⎟⎥⎪
⊥ < a ≠ b ≡ c ≤ d ≪ ⊤ ⇒ (⟦A⟧ ⇔ ⟪B⟫),      ⎪⎢⎜ ⎲     ⎟⎥⎪
                                          ⎪⎢⎜ ⎳aⁱ-bⁱ⎟⎥⎪
2H₂ + O₂ ⇌ 2H₂O, R = 4.7 kΩ, ⌀ 200 mm     ⎩⎣⎝i=1    ⎠⎦⎭

Linguistics and dictionaries:

ði ıntəˈnæʃənəl fəˈnɛtık əsoʊsiˈeıʃn
Y [ˈʏpsilɔn], Yen [jɛn], Yoga [ˈjoːgɑ]

etc.....etc........etc

Am I on the right track? When I copy/pasted that quote, it looked pretty interesting before I submitted it for viewing.

http://img140.imageshack.us/img140/1304/unicode6oc.th.jpg (http://img140.imageshack.us/my.php?image=unicode6oc.jpg)

Micron
10-05-2006, 09:03 PM
It looks like an attachment. If you save it to your desktop and open it in winzip, you may be able to extract the file inside.

Also, someone has sent you something with the wrong text encoding, so play about with the encoding settings in the options or View > Encoding if in a browser.
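
For what it's worth, this is roughly what a wrong encoding setting does to text. The Python sketch below takes a sample string (my own pick), stores it as UTF-8, and then reads the bytes back as Latin-1, which is one common mismatch; other wrong settings give different garbage.

```python
# Sketch: what text looks like when UTF-8 bytes are read with the wrong encoding.
original = "R = 4.7 k\u03a9"             # sample text ending in the ohm sign (Greek omega)

raw_bytes = original.encode("utf-8")     # how the text is actually stored or sent
garbled = raw_bytes.decode("latin-1")    # viewer wrongly assumes Latin-1

# Picking the right encoding again recovers the text, because Latin-1 maps
# every byte to a character and so loses nothing along the way.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
```

Switching the encoding in the viewer amounts to redoing the last step with the correct setting.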

odidio
10-05-2006, 10:27 PM
Sorry, I should have said the quote shows examples of some Unicode.
There's no attachment or anything, what I copy/pasted were examples from this site, http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

The thumbnail is what it looked like when I pasted it here before posting.