Unicode in RichView
Introduction

Unicode is a worldwide character-encoding standard. It simplifies localization of software and improves multilingual text processing. By implementing it, a developer gives an application universal data exchange capabilities for global marketing, using a single binary file for every possible character code. In UTF-16 encoding, each code unit is 16 bits wide, so up to 65,536 characters can be represented with a single 16-bit value; characters outside this range are encoded as pairs of code units (surrogate pairs). Unicode-enabled functions are often referred to as "wide-character" functions. For Delphi 2009 or newer, Unicode is the default encoding for strings.

Unicode strings are referred to here as 'Unicode'.
Single-byte strings are referred to here as 'ANSI' (for simplicity).

Unicode and ANSI Text in TRichView

Not all strings in TRichView are Unicode strings. Text in text items can be Unicode or ANSI, depending on the Unicode property of the text style. Text (item name) of non-text items is always ANSI.

The following text depends on the version of Delphi/C++Builder (Unicode for Delphi/C++Builder 2009 or newer, ANSI for older versions):
▪names of checkpoints;
▪visible text in labels, numbered sequences, endnotes, footnotes;
▪live spelling interface;
▪text in list markers;
▪hypertext targets (OnReadHyperlink, OnWriteHyperlink);
▪and others.

Main Limitations of the Current Implementation

You must prevent conversion of Unicode to double-byte character set (DBCS) strings, which are used to represent characters in Asian languages, because DBCS is not supported by RichView. The only exception (where conversion is acceptable) is exporting and saving, because in these cases the DBCS text is not used inside RichView.

How to Enable Unicode. Using Both ANSI and Unicode

Set the Unicode property of the text style to True. Important: the document must be empty when changing this property. TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing this property.
The default value of this property is True for Delphi/C++Builder 2009 or newer.

A document can contain both Unicode and ANSI text (in different styles), so you can mix them. Of course, you can use only ANSI or only Unicode styles; this is even recommended.

How to Make Unicode Editor (Without ANSI Text)

1. Set the Unicode property to True for all TextStyles in TRVStyle. Important: the document must be empty when changing this property. TRichViewEdit initially contains one empty string, so it is not completely empty; call Clear before changing this property. The default value of this property is True for Delphi/C++Builder 2009 or newer.

2. Set RichViewEdit1.RTFReadProperties.UnicodeMode = rvruOnlyUnicode (this is the default value of this property for Delphi/C++Builder 2009 or newer).

3. Many methods working with text have 3 versions:
▪with TRVUnicodeString parameters (names end in -W, for example SearchTextW);
▪with TRVAnsiString parameters (names end in -A, for example SearchTextA);
▪with String parameters (for example, SearchText).
For Delphi/C++Builder versions prior to 2009, use TRVUnicodeString-methods. For Delphi/C++Builder 2009 or newer, you can use either TRVUnicodeString-methods or String-methods. Avoid using TRVAnsiString-methods to prevent conversion between Unicode and ANSI text.
These methods include the following methods of TRichView (method names without -A and -W are listed):
▪AddNLTag and its versions;
These methods include the following methods of TRichViewEdit (method names without -A and -W are listed):

4. Existing non-Unicode RVF documents must be converted to Unicode by calling ConvertToUnicode after loading them (see below). This step is not necessary for Delphi/C++Builder 2009 or newer: all text styles in RVF documents saved by applications compiled with older versions of Delphi/C++Builder are converted to Unicode automatically. It is safe to call this procedure for Unicode documents: it will do nothing.
// This code uses some undocumented methods.
// RVStyle, RVTable and RichView are listed because the code refers to
// text styles, TRVTableItemInfo and TCustomRichView.
uses CRVData, RVItem, RVUni, RVStyle, RVTable, RichView;

// Converts all ANSI text items in RVData (including table cells) to Unicode
procedure ConvertRVToUnicode(RVData: TCustomRVData);
var
  i, r, c, StyleNo: Integer;
  table: TRVTableItemInfo;
begin
  for i := 0 to RVData.ItemCount-1 do
  begin
    StyleNo := RVData.GetItemStyle(i);
    if StyleNo>=0 then
    begin
      // text item: convert only if its style is not Unicode yet
      if not RVData.GetRVStyle.TextStyles[StyleNo].Unicode then
      begin
        RVData.SetItemTextR(i, RVU_GetRawUnicode(RVData.GetItemTextW(i)));
        Include(RVData.GetItem(i).ItemOptions, rvioUnicode);
      end;
    end
    else if StyleNo=rvsTable then
    begin
      // table: process each existing cell recursively
      table := TRVTableItemInfo(RVData.GetItem(i));
      for r := 0 to table.Rows.Count-1 do
        for c := 0 to table.Rows[r].Count-1 do
          if table.Cells[r,c]<>nil then
            ConvertRVToUnicode(table.Cells[r,c].GetRVData);
    end;
  end;
end;
// Converts the whole document, then marks all text styles as Unicode
procedure ConvertToUnicode(rv: TCustomRichView);
var
  i: Integer;
begin
  ConvertRVToUnicode(rv.RVData);
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Unicode := True;
end;

Unicode in Delphi/C++Builder 2009 or newer

In the new versions of Delphi/C++Builder, the String type is Unicode by default. Many properties and parameters in TRichView become Unicode, see "Unicode and ANSI Text in TRichView" above.

Default (initial) values of some properties are changed:
▪Unicode property of text style (from False to True);
▪TRichView.RTFReadProperties.UnicodeMode (from rvruNoUnicode to rvruOnlyUnicode);
▪TRichView.Options (rvoAutoCopyUnicodeText is included, rvoAutoCopyText is excluded).

When text styles are saved (in RVF files or Delphi forms) in older versions of Delphi/C++Builder, the Unicode property of a text style is saved only if it has the non-default value (True). When text styles are saved in Delphi/C++Builder 2009+, the value of the Unicode property is always saved, default or not.

The main consequence is the following: when forms or RVF files with styles saved by older versions of Delphi/C++Builder are loaded in Delphi/C++Builder 2009+, the Unicode property of all text styles becomes True. For RVF files, all text in text items is converted to Unicode automatically.

Because of this change in the default value of the Unicode property, older projects that use deprecated methods (see "Deprecated Methods (If You Use Unicode)" above) may not work properly when compiled in Delphi/C++Builder 2009 or newer. This is because these methods must not be given ANSI strings for Unicode text items. Replace them with the proper methods, as described in "How to Make Unicode Editor (Without ANSI Text)" above.

ANSI text may appear in a document when reading RTF files if TRichView.RTFReadProperties.UnicodeMode<>rvruOnlyUnicode. If you use projects converted from an older version of Delphi/C++Builder, check the value of this property.
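For a project converted from an older version of Delphi/C++Builder, the checks described above could be applied once at startup. The following is only a minimal sketch, not the library's prescribed procedure: the names TForm1, RichViewEdit1 and RVStyle1 are hypothetical, and the form's unit is assumed to already list the TRichView units that declare these components and constants in its uses clause.

```pascal
// Hypothetical form containing RichViewEdit1: TRichViewEdit
// and RVStyle1: TRVStyle (names are illustrative only).
procedure TForm1.FormCreate(Sender: TObject);
var
  i: Integer;
begin
  // The document must be empty before the Unicode property is changed
  RichViewEdit1.Clear;
  for i := 0 to RVStyle1.TextStyles.Count - 1 do
    RVStyle1.TextStyles[i].Unicode := True;
  // Do not create ANSI text items when reading RTF
  RichViewEdit1.RTFReadProperties.UnicodeMode := rvruOnlyUnicode;
  // Copy Unicode text to the Clipboard instead of ANSI text
  RichViewEdit1.Options := RichViewEdit1.Options
    + [rvoAutoCopyUnicodeText] - [rvoAutoCopyText];
  // Reformat the (now empty) document
  RichViewEdit1.Format;
end;
```

Setting these values in code rather than only in the Object Inspector makes the result independent of the values stored in an old form file.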
Import and Export

Text Files

LoadText, LoadTextFromStream load ANSI text files. When loading to a Unicode style, they convert from ANSI to Unicode.
LoadTextW, LoadTextFromStreamW load Unicode text files. When loading to a non-Unicode style, they convert from Unicode to ANSI.
The code page used for conversion is based on the Charset property of the corresponding style (Charsets of Unicode styles are used only for conversion to/from ANSI).

Note: you can test a file with the following function, defined in RVUni.pas:
function RV_TestFileUnicode(const FileName: String): TRVUnicodeTestResult
Return values:
▪rvutNo: the file is not Unicode (odd size);
▪rvutYes: the file is most likely Unicode (even size, Unicode byte-order characters at the start or #0 in text (first 500 bytes checked));
▪rvutProbably: the file can contain Unicode (even size);
▪rvutEmpty: the file is empty;
▪rvutError: error opening the file.
You can also use the WinAPI function IsTextUnicode, which performs more advanced tests.

SaveText saves an ANSI text file. Unicode strings are converted based on the Style.DefCodePage property.
SaveTextW saves a Unicode text file. ANSI strings are converted based on the corresponding Charsets.

RTF (Rich Text Format)

Methods for RTF saving are able to store Unicode. Methods for RTF loading and inserting work depending on TRichView.RTFReadProperties.UnicodeMode.

HTML

SaveHTML*** methods can save ANSI or Unicode (UTF-8) HTML files. In ANSI HTML files, Unicode characters are written as codes (&#NNNN;), so all Unicode characters are preserved, but the file size is increased.

Selection, Search and the Clipboard

GetSelTextA returns the selection as an ANSI string. Unicode text is converted based on the Style.DefCodePage property.
GetSelTextW returns the selection as a Unicode string. ANSI strings are converted based on the corresponding Charsets.

Text searching methods have versions for searching for ANSI and for Unicode strings: TRichView.SearchTextA/SearchTextW, TRichViewEdit.SearchTextA/SearchTextW.
All methods can search in both ANSI and Unicode text items. When comparing ANSI text with Unicode text, SearchTextA methods use the Style.DefCodePage property, and SearchTextW methods use text Charsets.

CopyTextA copies the selection as ANSI text. Unicode strings are converted based on the Style.DefCodePage property.
CopyTextW copies the selection as Unicode. ANSI strings are converted based on the corresponding Charsets.
Note: on NT-based systems (such as Windows XP), the Clipboard is able to convert Unicode text to ANSI text and vice versa. So, if you copy in one of these formats, both formats are available for pasting.
Copy and CopyDef can copy Unicode text (if rvoAutoCopyUnicodeText is included in Options).

Editing Operations

When pasting text using the Paste method, if both ANSI and Unicode text are available in the Clipboard, the choice is made depending on the current text style (Unicode or not).
PasteTextA pastes ANSI text, PasteTextW pastes Unicode text.
InsertTextFromFile: the file must be ANSI (converted, if needed).
InsertOEMTextFromFile: the file must be OEM (converted, if needed).
InsertTextFromFileW: the file must be Unicode (converted, if needed).
InsertText, InsertStringTag add a Unicode string in Delphi/C++Builder 2009+ and an ANSI string in older versions of Delphi/C++Builder.
InsertTextA, InsertStringATag add an ANSI string (converted, if needed).
InsertTextW, InsertStringWTag add a Unicode string (converted, if needed).

RVF (RichView Format)

Applications compiled with older versions of RichView (earlier than version 1.2) will not be able to load RVF files with Unicode.
RVF files will be loaded correctly even if Unicode flags in text styles are mismatched (saved with a different RVStyle than the one used for loading); conversions will be performed if required (for example, this conversion occurs when loading old RVF files in applications compiled in Delphi/C++Builder 2009+). There are two RVF warnings, rvfwConvToUnicode and rvfwConvFromUnicode, which indicate whether any conversion took place.
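The two warnings mentioned above could be inspected right after loading a file. The sketch below is an illustration under assumptions, not a documented recipe: TForm1, RichView1 and LoadDocument are hypothetical names, the warnings are assumed to be exposed through the component's RVFWarnings set property, and ShowMessage comes from the standard Dialogs unit.

```pascal
// Hypothetical helper method of a form containing RichView1: TRichView.
procedure TForm1.LoadDocument(const FileName: String);
begin
  RichView1.Clear;
  if not RichView1.LoadRVF(FileName) then
    ShowMessage('Error loading ' + FileName);
  RichView1.Format;
  // Assumption: RVFWarnings holds the warnings of the last RVF load
  if rvfwConvToUnicode in RichView1.RVFWarnings then
    ShowMessage('ANSI text was converted to Unicode while loading');
  if rvfwConvFromUnicode in RichView1.RVFWarnings then
    ShowMessage('Unicode text was converted to ANSI while loading');
end;
```

In a real application the messages would more likely go to a log, since a conversion on load is usually expected rather than an error.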
TRichView v11 introduces a change in RVF files that allows String properties to be stored as Unicode. RVF files saved in Delphi/C++Builder 2009+ are saved as RVF version 1.3.1; RVF files saved in older versions of Delphi/C++Builder are saved as RVF version 1.3.

See also...
▪TRVStyle.DefUnicodeStyle.