/*
 * Trade secret of Scansoft, Inc.
 * Copyright (c) 1991-2000 Scansoft, Inc. All rights reserved.
 * Copyright protection claimed includes all forms and matters of
 * copyrightable material and information now allowed by statutory or
 * judicial law or hereinafter granted, including without limitation,
 * material generated from the software programs which are displayed on
 * the screen such as icons,
 * screen display looks, etc.
 */

/* $Id: kdoctext.h,v 1.1 2004/07/13 19:37:41 kath Exp $ */

/* Scansoft, Inc. hereby releases this file into the public domain with
   no claim of applicability whatsoever and no warrantee of servicability
   or accuracty. */

/*!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

This include file establishes a set of markups that appear in documents
of the XDOC format. This is the format of the text when the text format
parameter has the value TEXTFORMAT_XDOC or the subtype
TEXTFORMAT_XDOC_LITE. There is also a version called TEXTFORMAT_XDOC_PLUS
that is used internally.

TEXTFORMAT_XDOC_LITE is a form of TEXTFORMAT_XDOC that does not contain
any formatting. It can be used as an intermediate format or as a
diagnostic tool. Also, if no formatting is desired, it can be used to
retrieve the results of character recognition.

The XDOC_LITE modifiers (markup codes) have fewer operands than do the
XDOC modifiers. XDOC appends its extra operands at the end of those
already used in the XDOC_LITE format.

Any reader for XDOC Text should be programmed so as to be capable of
reading KDC_MAXOPS operands for any modifier. By treating any absent
operand as the value zero, the program will be capable of accepting the
XDOC_LITE format as well.

In the description below, the operands which DO NOT appear in the
XDOC_LITE format are denoted by an asterisk '*' before their operand
sequence number. For instance, in the following (bogus) list of
operands, operand 4 is NOT present in XDOC_LITE output.
      1) the date
      2) the time
      3) the place
     *4) the meaning of life.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! */

/*----DESCRIPTION OF XDOC TEXT

This file defines the mark-up codes for the XDOC Text format. The XDOC
Text format contains text and text modifiers. Often the text modifiers
have operands giving text formatting codes and positional information.
These markups are required by many document manipulation
post-processors.

Each text character takes eight bits. By default, a specially extended
version of the ISO Western European code set for text character data is
used for recognized characters in the Roman alphabet. These appear in the
recognized text and in any operands which may have string values. All
other modifier codes and modifier operands are encoded in 7-bit ASCII.

Any non-printing code page byte code which appears in an XDOC Text file
is there for readability and should be ignored, with the following
exceptions, these being the decimal values of the byte codes defined by
ScanSoft, Inc. for characters with no ISO equivalent:

      L_SINGLE_QUOTE   129   left single quote
      R_SINGLE_QUOTE   130   right single quote
      L_DOUBLE_QUOTE   131   left double quote
      R_DOUBLE_QUOTE   132   right double quote
      EM_DASH          133   a wide dash
      BULLET           134   small, solid circle

All text modifiers are encoded as a single, alphabetic ASCII character.
There are two types of modifiers: those with operands, by convention a
lower case letter, and those without operands, by convention an upper
case letter. Both types of text modifiers are preceded in the text
stream by a modifier escape character, here called KDC_STESCAPE.
Modifiers which take no operands consist of only these two characters,
the KDC_STESCAPE followed by the modifier code. However, modifiers which
take operands are followed in the text stream by some number, perhaps
zero, of coded operands followed by a single modifier-end escape
character, here called KDC_NDESCAPE.

The character KDC_STESCAPE, in order to be interpreted as a character of
recognized text, must be doubled. Consequently, KDC_STESCAPE cannot be
used as a modifier code.

Each operand following a modifier in the text stream is preceded by a
separator character, here called KDC_OPSEP. An operand is either a single
character operand, or a numeric operand, or a string operand.

A single character operand must be a printable ASCII alphabetic
character, but not KDC_OPSEP.

Numeric operands are expressed as integer numerals in the range -16384 to
16384 using numeric ASCII characters. All characters in the numeric
operand, except the first character, must be digits. The first character
may be '-' or a digit.

String operands are sequences of arbitrary ASCII characters enclosed in a
quotation character, here called KDC_QUOTE. Two KDC_QUOTE characters in a
row signify a single KDC_QUOTE as part of the string. However, if the
opening KDC_QUOTE is followed by another KDC_QUOTE (but not two more
KDC_QUOTE characters), this signifies an empty string. (If the opening
KDC_QUOTE is followed by two more KDC_QUOTE characters, this then means
that the first character of the string is the KDC_QUOTE character, and a
closing KDC_QUOTE must follow at the end of the string.)

The number of operands and the meaning of each operand are given below
under the individual modifier codes. Multiple operands are contiguous and
separated by semicolons. These are an external representation only. No
particular internal data structure or tuple is implied by the number or
coding of these operands.

All ordinate operands are in "points" which are units of 0.1 mm
(KDC_PTSIZE units per inch). Beware! These units do not correspond to any
of the units of typographers' points (pica, Didot, Mediaan, etc.). The
physical unit may vary from one release of ICR/K to the next. Caveat
Lector!

Coordinates are referenced to (0,0) at the upper left corner of the
image, which need not be strictly a "page" of text.
Usually this is the same point as the upper left corner of the scanner
bed where the page was input, but not always since the page may be
presented to the scanner at rotations of 0, 90, 180 or 270 degrees,
+/-5 degrees of tilt (after detilting). The ICR/K processing locates the
image corner which lies to the left and above the upper left corner of
the text (in reading orientation) and renders all coordinates with
respect to this origin and the reading orientation.

The upper left corner of the image is left unaffected by any corrections
for the +/-5 degrees of tilt that may be present in the image. All other
coordinates on the image (except the origin) may require correction to
remove the effects of text tilt. Although text orientation is always
resolved in XDOC Text (so that left-to-right, top-to-bottom reading order
is observed when reading the file), the tilt correction may or may not
have been resolved in any particular KDOC Text file.

In the source image, each line of recognized text is seen as tilted. The
average of all the tilts in the input image is the text tilt for the
image. In XDOC Text, however, the lines of text are never seen as tilted.
At worst, lines of text are seen as sheared, with their entire baseline
set at the intersection of the baseline of the image text line with the
left margin of its galley.

If "tilt" is not resolved in XDOC Text, X-ordinates will be seen as
sheared. On sheared images left-justified text will have a non-vertical
left margin. All XDOC X-ordinates are subject to modification in order to
de-shear the text layout, thus correcting, for all practical purposes,
the +/-5 degrees of permitted tilt error. The XDOC Text format permits
the X-ordinates to be either the original co-ordinates or fully
de-sheared. The amount of de-shearing already applied to correct the
X-ordinates is referred to as the text "untilt". This is either 0 (no
tilt correction) or equal to the text tilt of the image (full tilt
correction).
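
The ordinates discussed above are all expressed in XDOC points of 0.1 mm,
that is, 254 to the inch (the value this header gives KDC_PTSIZE). As a
rough sketch of how a reader might map such an ordinate onto image
pixels, the following rounds to the nearest pixel. It assumes the image
resolution is known in dots per inch, which this description does not
state; ex_points_to_pixels and EX_PTSIZE are hypothetical names, and no
tilt or shear correction is attempted.

```c
// Mirrors KDC_PTSIZE: XDOC points (0.1 mm) per inch.
enum { EX_PTSIZE = 254 };

// Hypothetical helper: converts an ordinate in XDOC points to pixels at
// the given resolution (assumed to be dots per inch), rounding to the
// nearest pixel with halves rounded away from zero.
static long ex_points_to_pixels(long pts, long dpi)
{
    long num = pts * dpi;
    long half = EX_PTSIZE / 2;

    if (num >= 0)
        return (num + half) / EX_PTSIZE;
    return -((-num + half) / EX_PTSIZE);
}
```

At 300 dots per inch, 254 points (one inch) map to 300 pixels and 127
points to 150 pixels.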

Due to text tilt and the manner in which it is converted to shear (before
de-shearing), baselines are not necessarily consistent from one galley to
the next within an image. This means that side-by-side lines sharing a
common baseline on a highly tilted source image may have significantly
different baselines in XDOC Text.

Note that, in general, absolute image coordinates are not calculable from
KDOC text coordinates with great accuracy. But the relative placement of
text grouped in a galley is quite accurate.

All characters outside the bounds of the modifiers are text, except that
any ASCII space, carriage return or newline characters in text are to be
ignored. They are present only for legibility and their typographic
equivalent for the text is represented by an explicit modifier.

If a word of recognized text is deleted from an XDOC_LITE file, a space
may be substituted for the deleted text. It is acceptable to leave the
associated markups in the XDOC_LITE file. When the XDOC_LITE file is
later read back in, the extra markups will be filtered out.

The underscore character may occur in text even though there is an
underline modifier. By convention, underlined horizontal spaces are
represented as leader-dot with underscore as the leadering character.
Actual underlined leader-dot cannot be distinguished from non-underlined
leader-dot.
------*/

#ifndef INC_KDC_TEXT
#define INC_KDC_TEXT 1

/* Number of XDOC modifier codes: */
#define KDC_NMODS ((int) 33)

/* Maximum # of operands for any one XDOC modifier:
   This number is unusually large. It is extremely unlikely that there
   will be KDC_MAXOPS operands. However, since it is theoretically
   possible it is defined as such. What kind of markup could have so many
   operands? There could theoretically be a cell table with 250 cells.
   This would produce a huge number of operands, especially where the
   border codes are described. Also, there could theoretically be 250
   columns in a section.
   Although it is extremely unlikely, it is possible. */
#define KDC_MAXOPS ((int) 2000)

/* Maximum # of operands (counting consecutively from the first), for
   modifiers that could have more than sixteen operands, that may not be
   an int. In other words operand KDC_MAX_NON_INT_OPS - x will be of type
   int */
#define KDC_MAX_NON_INT_OPS ((int) 10)

#define KDC_STESCAPE '['
#define KDC_NDESCAPE ']'
#define KDC_OPSEP ';'
#define KDC_QUOTE '"'

/* Max length for a string operand in XDOC Text. */
#define KDC_MXSLEN ((int) 60)

/* Number of points per inch.
 * All XDOC Text distance operands are in XDOC points.
 */
#define KDC_PTSIZE ((int) 254) /* 0.1 mm. */

/* Word box coordinates are in KDC_WBOX_PTSIZE units */
#define KDC_WBOX_PTSIZE KDC_PTSIZE

/* Define maximum recognized text tilt. */
#define INF_TILT (FWX_INT16)(15 * KDC_PTSIZE) /* dimensionless: dX/dY */

/* Versions of XDOC Text. Used in the KDC_STDOC markup. */
#define XDOC_VERSION "XDOC.12.0" /* (Enhanced) xdoc Version of text */

/*-----Structure of XDOC Text files-----*/
/* XDOC files contain ICR output shown as page renditions. XDOC is a more
   "intelligent" format which uses text layout analysis to enhance the
   presentation of the text. For instance, "edge noise" characters are
   eliminated, skew is corrected, pages are re-arranged into reading
   order, continued text lines (soft returns) are distinguished from
   terminal text lines (hard returns), tab-advance white-space is
   distinguished from running white-space, centered lines are
   distinguished from paragraphs, paragraphs are distinguished from
   tables, etc. Sections are found and the number of columns in each
   section is found. Also, cell tables are analyzed.

   One file may contain more than one document, each having its own name.
   In that case, an XDOC file is the concatenation of one or more
   documents. When there is more than one document within a single XDOC
   Text file, each document is regarded as a "sub-document" of a single
   document job.
   The page collation sequence should be consistent over all
   sub-documents. (Sub-documents are like chapters in a book. They have
   separate titles but the page numbering is consistent across the book.)

   The structure of a (sub-)document is given below:

   ----------- Document Structure --------------
    modifier (start-of-document)            --optional--
    modifier (document name)                --optional--
      )
      )
    . > zero, one or more pages (see below)
    . )
    . )
      )
    modifier (end-of-document)              --optional--
   ----------------------------------------------

   In the above Document Structure, a page consists of the following:

    modifier (start-of-page)
      )
      )
    . > one modifier for each horizontal or
    . > vertical text ruling on the page.
    . )
      )
      )
      )
    . > one modifier for each font ID used in
    . > the page.
    . )
      )
      >  occurs before the text in each section on the page
      )> output before the first text in table
      )> output before the first text in row
      )> output before the first text in col
      )
      )
    . > one or more text lines (see below)
    . )
    . )
      )  output after the last text in table
      )
      )
    . > one or more zone frames (see below)
    . )
    . )
      )
      )  occurs at col end within a section (at end of a text zone)
      )
    . )
    . )
      )
    modifier (end-of-page)

   where each zone frame is one of the following:

    1) a modifier (a picture or line-art zone frame) or
    3) a modifier (a text zone frame)

   where a text line consists of the following:

    modifier (start-of-line, giving the id of a text zone)
    modifier (end-of-line)

   where the material between these two modifiers consists of recognized
   text. Recognized text may have the following embedded modifiers:

    - - - - - - - - - - - -
    (This may occur only at the end of a line.)
    - -
----------*/

/*----MODIFIERS----*/

#define KDC_LEADER ((char) 'l')
/*-This modifier represents a repetitive sequence of characters which was
   interpreted as leader - i.e., the amount of horizontal space the
   sequence occupies is more important than the number of characters in
   the sequence.

   Operands:
   -----------------------------------------------------------------------
   1) A one- or two-character string, identifying the repeating character
      sequence. (Two-character strings are not yet implemented.)
   2) The X-ordinate of the left edge of the sequence.
      (The Y-ordinate associated with this X is the line's baseline.)
   3) The distance from the left edge of the sequence to the right edge
      of the sequence. Thus, the X-ordinate of the right edge of the
      sequence is the sum of the 2nd operand and this operand.
      (The Y-ordinate associated with this X is the line's baseline.)
   4) Unique id of the next word on this line.
  *5) The same as 3) but non-zero only if the horizontal space is a tab
      advance. In this case the distance is expressed as the number of
      blank spaces (of "n" width in the most recent font) skipped over by
      this tab advance.
  *6) A tab advance count. This value is non-zero whenever the horizontal
      space is a tab advance. This count is the number of tab advance
      characters needed to advance over this white space relative to tab
      stop settings implied by the current context. These tab settings
      are implicitly changed for each paragraph, table or centered line
      but are not otherwise specified. It is assumed, regardless of the
      layout of the source document, that the tab stop settings are such
      that tab stop over-runs do not occur and that a tab stop cannot be
      co-incident with another tab stop or with either the left or right
      text margin.
-------*/

#define KDC_REGION ((char) 'e')
/*-This modifier indicates a change of region, or the initial region of a
   text zone. A region is a convex area on the source image. Its
   dimensions are arbitrary and may intersect many text zones. However,
   these dimensions are not conveyed in XDOC Text. The regions are merely
   labels attached to words of text.

   This modifier may appear only within a text line. The region change
   applies only to the succeeding text in that line's text zone.

   Operands:
   -----------------------------------------------------------------------
   1) Region ID number. Note that region numbers are assigned by the ICR
      process in the order in which regions are recognized.
----------*/

#define KDC_RULE ((char) 'r')
/*-This modifier gives a ruling descriptor. Only horizontal or vertical
   rulings will be indicated. Interval should be 0 if style is
   KDOCR_SINGLE. Mid-points and thickness are given for a single stroke
   of the ruling: leftmost for vertical rulings and topmost for
   horizontal rulings. (Currently, only KDOCR_SINGLE is supported.)

   Operands:
   -----------------------------------------------------------------------
   1) Mid-point X.
   2) Mid-point Y.
   3) Orientation: KDOCR_HORIZON or KDOCR_VERTICAL.
   4) Length.
   5) Style: KDOCR_SINGLE, KDOCR_DOUBLE, KDOCR_TRIPLE.
   6) Thickness. 0 if unknown.
   7) Interval. 0 if unknown.
   8) id
   9) type
   10) Celltable ruling (FWX_TRUE or FWX_FALSE).
---------*/
#define KDOCR_HORIZON ((char) 'H')
#define KDOCR_VERTICAL ((char) 'V')
#define KDOCR_SINGLE ((char) 's')
#define KDOCR_DOUBLE ((char) 'd')
#define KDOCR_TRIPLE ((char) 't')
#define KDOCR_IP_RULE ((char) 'I')
#define KDOCR_RECOG_RULE ((char) 'R')

#define KDC_CHGFONT ((char) 'c')
/*-This modifier indicates a change of font, or the initial font of a
   text zone. The recognized text which follows it should be considered
   to belong to the font it specifies.

   This modifier may appear only within a text line. The font change
   applies only to the succeeding text in that line's text zone.

   Operands:
   -----------------------------------------------------------------------
   1) Font ID number. Note that font numbers are assigned by the ICR
      process in the order in which fonts are recognized. As a
      consequence, they do not apply across multiple documents unless the
      documents are processed using the same training file.
----------*/

#define KDC_FONTINFO ((char) 'f')
/*-This modifier gives the style portion of a font descriptor.

   Operands:
   -----------------------------------------------------------------------
   1) Font ID number.
   2) The name of the font family; a string. ("H", "T", "C", "TC", "HC")
   3) The face style: KDOCF_ROMAN, KDOCF_ITALIC, KDOCF_BOLD or
      KDOCF_BOLDIT.
   4) The serif style: KDOCF_SANS, KDOCF_THINSERIF, KDOCF_SQSERIF,
      KDOCF_RNDSERIF or KDOCF_UNKNOWN.
   5) Average distance from the beginning of one character to the
      beginning of the next, when there's no intervening space or tab.
      0 if unknown. This is known as the average column (or character)
      width. This distance is measured in units KDC_ACW times more
      precise than KDC_PTSIZE.
   6) KDOCF_FIXED or KDOCF_VAR, depending on whether the font is
      fixed-width or variable-width. KDOCF_UNKNOWN if unknown.
   7) Height of big chars without a descender (baseline-to-top).
      0 if unknown.
   8) Height of small characters with a descender. 0 if unknown.
   9) Height of small characters without a descender. 0 if unknown.
   10) Typographer's font size in pica points (72 per inch).
   11) Font width (e.g.: 100% = normal, 80% = condensed, 120% = expanded)
---------*/
#define KDC_ACW ((int) 100) /* widths and heights now in units of 0.001 mm. */
#define KDOCF_FIXED ((char) 'F')
#define KDOCF_VAR ((char) 'V')
#define KDOCF_UNKNOWN ((char) 'U')
#define KDOCF_ROMAN ((char) 'R')
#define KDOCF_ITALIC ((char) 'I')
#define KDOCF_BOLD ((char) 'B')
#define KDOCF_BOLDIT ((char) 'T')
#define KDOCF_SANS ((char) 's')
#define KDOCF_THINSERIF ((char) 't')
#define KDOCF_SQSERIF ((char) 'q')
#define KDOCF_RNDSERIF ((char) 'r')

#define KDC_HSPACE ((char) 'h')
/*-This modifier indicates a horizontal advance. It indicates the end of
   one word, and the beginning of the next. This code is NOT used to
   start or end a line.

   Operands:
   -----------------------------------------------------------------------
   1) The X-ordinate of the right edge of the last output text character.
      (The Y-ordinate associated with this X is the line's baseline.)
   2) The distance from the right edge of the last output text character
      to the left edge of the next text character, the one which follows
      this markup. Thus, the X-ordinate of the left edge of the next text
      character is the sum of the 1st operand plus this operand.
      (The Y-ordinate associated with this X is the line's baseline.)
   3) Unique id of the next word on this line.
  *4) The same as 2) but non-zero only if the horizontal space is a tab
      advance. In this case the distance is expressed as the number of
      blank spaces (of "n" width in the most recent font) skipped over by
      this tab advance. Note that this count may be zero even though the
      horizontal space is a tab advance if operand 2) is virtually 0.
      Also, this count may include "pad" spaces to the left of the next
      word if these spaces intervene between the tab advance and this
      word.
  *5) A tab advance count. This value is non-zero only when the
      horizontal space is a tab advance. This count is the number of tab
      advance characters needed to advance over this white space relative
      to tab stop settings implied by the current context. These tab
      settings are implicitly changed for each paragraph, table or
      centered line but are not otherwise specified. It is assumed,
      regardless of the layout of the source document, that the tab stop
      settings are such that tab stop over-runs do not occur and that a
      tab stop cannot be co-incident with another tab stop or with either
      the left or right text margin. Thus, the tab advance count will be
      NON-zero if the horizontal space is a tab advance even if
      operand 2) is virtually 0.
--------*/

#define KDC_STDOC ((char) 'a')
/*-This modifier indicates the start of a document. One and only one such
   modifier must occur per document, and it must be the very first thing
   in the document (no other modifiers and no text may precede it).

   Operands:
   -----------------------------------------------------------------------
   1) A string. Revision code of KDOC Text format which generated this
      KDOC Text file.
      XDOC_VERSION defines the current revision code string.
   2) Flavor of xdoc. Either XDOC_ENHANCED, XDOC_LITE or XDOC_PLUS.
   3) A string. Revision code of the core technology version that
      produced these OCR results. This is the value found in directiv.h
      (FWX_VERSION).
-------*/
#define XDOC_ENHANCED ((char)'E') /* with format analysis */
#define XDOC_LITE ((char)'L') /* no FA, intermediate format */
#define XDOC_PLUS ((char)'P') /* with format analysis and style info */

#define KDC_NDDOC ((char) 'Z')
/*-This modifier indicates the end of a document. Actually, it ends, by
   necessity, only the set of pages having a consistent page collation
   numbering scheme and a common document name.
-------*/

#define KDC_DOCNAME ((char) 'd')
/*-This is the document name. It may be either user-defined or
   system-defined. Un-named documents are permitted, though not currently
   generated.

   Operands:
   -----------------------------------------------------------------------
   1) The name of the document; a string.
-------*/

#define KDC_STPAGE ((char) 'p')
/*-This modifier indicates the beginning of a new page.

   Operands:
   -----------------------------------------------------------------------
   1) The page number. This is user defined. It must be non-negative.
   2) Orientation: either KDC_PORTRAIT or KDC_LANDSCAPE.
   3) Boolean. Recomp mode on or off. FWX_TRUE if on, FWX_FALSE if off.
   4) Recognition mode (KDC_STANDARD or KDC_FAX). Indicates whether the
      fax switch was set during recognition.
   5) Skew. This number represents the skew angle between the original
      image and the image that was processed. In other words, if any
      deskewing was done there will be a skew angle between these two
      images.
   6) Text untilt, the correction already applied to de-shear X-ordinates
      as seen on the KDOC text page. (See text tilt under KDC_NDPAGE
      below.) The smallest value of the text untilt is 0, meaning no
      de-shearing has been applied for this page.
      Any (absolute) value greater than or equal to the (absolute) value
      of text tilt given in KDC_NDPAGE, especially the value INF_TILT or
      -INF_TILT, is taken to mean full de-shearing. For values of text
      untilt between 0 and the text tilt as given in KDC_NDPAGE, the
      ratio of the actual de-shear already applied (to X-ordinates) to
      the full de-shear needed to remove all X errors due to text tilt at
      each point on the text is the same as the ratio of untilt to tilt
      for the image.
   7) X resolution of image (the one that the word bounding boxes match).
   8) Y resolution of image (the one that the word bounding boxes match).
   9) The X-ordinate of the top-left corner of the page frame in the
      source image with respect to the origin of the image layout,
      expressed in units of KDC_PTSIZE, not in units of image resolution.
   10) The Y-ordinate of the top-left corner of the page frame in the
      source image with respect to the origin of the image layout,
      expressed in units of KDC_PTSIZE, not in units of image resolution.
   11) The page width, in units of KDC_PTSIZE. (0 if unknown.)
   12) The page height, in units of KDC_PTSIZE. (0 if unknown.)
   13) User set or OK'd zones (M_USER_SPECIFIED_REGIONS was set).
      FWX_TRUE or FWX_FALSE.
   14) User set or OK'd zone order (M_USER_SPECIFIED_ORDER was set).
      FWX_TRUE or FWX_FALSE.
   15) Word box units (KDC_PIXELS or KDC_METRIC).
---------*/
#define KDC_METRIC 1
#define KDC_PIXELS 2
#define KDC_STANDARD ((char) 'S')
#define KDC_FAX ((char) 'F')
#define KDC_PORTRAIT ((char) 'P')
#define KDC_LANDSCAPE ((char) 'L')

#define KDC_NDPAGE ((char) 'g')
/*-This modifier indicates the end of a page.

   Operands:
   -----------------------------------------------------------------------
   1) Text tilt, represented as the change in X necessary to produce a
      change of +1 in Y as you move your finger along the (tilted)
      horizontal. A number of very large absolute value is the same as no
      tilt. (dimensionless: dX/dY). This is the amount of tilt between
      the sheared text and the image.
      Do not confuse it with skew.
   2) X-ordinate of left edge of the physical image.
   3) Y-ordinate of top edge of the physical image.
   4) X-ordinate of right edge of the physical image.
   5) Y-ordinate of bottom edge of the physical image.
   6) One of the KDC_* symbols. KDC_REQD means that this is a required
      page break. KDC_OPT means that this is an optional page break.
---------*/
/* Codes used by KDC_NDPAGE: */
#define KDC_REQD ((char) 'R')
#define KDC_OPT ((char) 'O')

#define KDC_SECTION ((char) 'k')
/*-This modifier is a "section" change markup. It indicates changes in
   the # of columns and marks headers, footers and captions. The captions
   are marked with the picture that they belong to. Never output in
   XDOC_LITE.

   Operands:
   ---------------------------------------------------------------
  *1) Type (KDC_COLUMN, KDC_HEADER, KDC_FOOTER, KDC_CAPTION,
      KDC_TIMESTAMP).
  *2) # of columns (relevant if the type of region is column).
  *3) Id of picture that the caption goes with (relevant only if type is
      KDC_CAPTION). If the section is KDC_HEADER or KDC_FOOTER then this
      operand is the position (KDC_LHEADER, KDC_CHEADER, KDC_RHEADER,
      KDC_LFOOTER, KDC_CFOOTER, KDC_RFOOTER).
  *4) If type is KDC_COLUMN then value is KDC_HAS_VRULINGS or
      KDC_NO_VRULINGS. If type is KDC_CAPTION then value is
      KDC_EXPANDS_UP or KDC_EXPANDS_DOWN. (valid only if type is
      KDC_COLUMN)
  *5) Number of vertical half lines to output (in the current font)
      before the text in this section. This information is irrelevant if
      the type is KDC_CAPTION. If no half lines should be output then the
      value is 0.
  *6) If FWX_TRUE use balanced cols for word processors that can handle
      balanced cols and hard col breaks (such as MS Word).
  *7) If FWX_TRUE use balanced cols for word processors that cannot
      handle balanced cols and hard col breaks (such as WordPerfect).
  *8-x) Followed by pairs of left, right coordinates for each column.
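
   As an illustration of the operand layout above, the following sketch
   pulls the per-column left/right pairs out of an already parsed
   KDC_SECTION modifier. It is a sketch under stated assumptions, not a
   definitive reader: ex_column and ex_section_columns are hypothetical
   names, the operands are assumed to have been stored as integers
   (single-character operands as their character code, absent operands
   as 0), and the number of pairs is assumed to equal operand 2.

```c
// Hypothetical record: horizontal extent of one column, in XDOC points.
struct ex_column { int left; int right; };

// ops[i] holds operand i+1 of a KDC_SECTION modifier; nops is the number
// of operands read.  Fills cols and returns the number of columns
// written, or 0 if the section is not a column section.
static int ex_section_columns(const int *ops, int nops,
                              struct ex_column *cols, int maxcols)
{
    int available, n, i;

    if (nops < 2 || ops[0] != 'T')      // 'T' mirrors KDC_COLUMN
        return 0;

    // Pairs start at operand 8, that is, at index 7.
    available = (nops > 7) ? (nops - 7) / 2 : 0;

    n = ops[1];                         // operand 2: number of columns
    if (n > available)
        n = available;                  // never read past the operands
    if (n > maxcols)
        n = maxcols;                    // never overrun the output array
    if (n < 0)
        n = 0;

    for (i = 0; i < n; i++) {
        cols[i].left = ops[7 + 2 * i];
        cols[i].right = ops[8 + 2 * i];
    }
    return n;
}
```

   Clamping the declared column count against the operands actually read
   keeps the sketch safe on XDOC_LITE input, where these operands are
   absent and therefore read as zero.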
*/
#define KDC_COLUMN ((char) 'T')
#define KDC_HEADER ((char) 'H')
#define KDC_TIMESTAMP ((char) 'S')
#define KDC_FOOTER ((char) 'F')
#define KDC_CAPTION ((char) 'C')
/*#define KDC_HEADLINE ((char) 'L')*/
#define KDC_LHEADER 1
#define KDC_CHEADER 2
#define KDC_RHEADER 3
#define KDC_CFOOTER KDC_CHEADER
#define KDC_RFOOTER KDC_RHEADER
#define KDC_LFOOTER KDC_LHEADER
#define KDC_RTSTAMP KDC_RHEADER
#define KDC_CTSTAMP KDC_CHEADER
#define KDC_LFSTAMP KDC_LHEADER
#define KDC_COLS_UNKNOWN 0
#define KDC_PICID_UNKNOWN -2
#define KDC_HAS_VRULINGS 1
#define KDC_NO_VRULINGS 0
#define KDC_EXPANDS_DOWN 0
#define KDC_EXPANDS_UP 1

#define KDC_COL_BK ((char) 'X')
/* This markup indicates the type of col break (never output in
   XDOC_LITE).

   Operands:
   1) KDC_HARD_COL_BK or KDC_SOFT_COL_BK
*/
#define KDC_HARD_COL_BK 1
#define KDC_SOFT_COL_BK 2

#define KDC_TZONE ((char) 't')
/*-This modifier gives a text zone descriptor.

   Operands:
   -----------------------------------------------------------------------
   1) A number which is a unique identifier for this zone with respect to
      all the zones, text and pictures, on the current page. The order of
      the KDC_TZONE markups in the KDOC Text file, rather than the
      ordinal sequence of their id numbers, gives the lexical ordering of
      the text, top-to-bottom, left-to-right. However, since text zones
      need be only partially ordered with respect to one another, the
      layout of spanning, superior and inferior zones should also be
      noted; see modifiers 5 and 6 below. (Not all zone ids need be
      represented.)
   2) Output order. These numbers do not need to be consecutive. However,
      they are sorted. The lowest numbered zone is the first zone output
      and the zone with the highest number is output last.
   3) The Y-ordinate of the top edge of the zone. The top edge of the
      zone must enclose the top line of text in the zone. There may or
      may not be additional white space between the top of the zone and
      the first line of text.
   4) The height of the zone (bottom edge - top edge).
   5) KDOCZ_TEXT or KDOCZ_TABLE (content of region). For information
      about the lexical content turn on the lexclass option.
   6) Prefix (a string). Can be used for labeling fields.
   7) Suffix (a string). Can be used for labeling fields.
   8) The zone name, a string made up by the user and having no
      particular significance for ICR. By default, when the user has not
      explicitly given a zone name, this name is just the text zone
      identifier expressed as numerals.
   9-12 are the original frame of the text region. (This can be bigger
      than the snug boundary around the text.)
   9) top
   10) left
   11) right
   12) bottom
   13) top border (FWX_BOOL), true if visible
   14) left border
   15) bottom border
   16) right border
   17) inverse video? 1 = FWX_TRUE, 0 = FWX_FALSE
------*/

/* list of language ids */
#define KDOC_UNKNOWN_LANGUAGE 0

#define KDC_LANGUAGE ((char) 'O')
/* This modifier lists the code page and language to switch to. It can
   occur anywhere.

   Operands:
   1) Windows code page (uses the standard numeric identifiers)
   2-n) A list of languages (see the defined language identifiers in
      langids.h). If the language is unknown it will be identified as
      KDC_LANG_UNKNOWN.
*/
/* list of language identifiers */
#define KDC_LANG_UNKNOWN 0

#define KDOCZ_TABLE ((char) 'T') /* manual table */
#define KDOCZ_TEXT ((char) 'C') /* regular text */
#define KDOCZ_CELL ((char) 'c') /* table cell */

#define KDC_PZONE ((char) 'x')
/*-This modifier gives a picture zone descriptor.

   A picture zone contains graphics, either bitmap or vector. The
   contents of this zone, if they exist, are stored elsewhere as digital
   data in the indicated format. Locating this data, perhaps by
   establishing its file name, is managed outside of the framework of
   XDOC Text. This descriptor merely gives an identifying tag to be used
   to map this picture zone to its data.

   Operands:
   -----------------------------------------------------------------------
   1) A number which is a unique identifier for this zone with respect to
      all the zones, text and picture, on the current page. (Picture
      zones are not currently sequenced on the page by XDOC Text. Neither
      the sequence of the KDC_PZONE markup in XDOC Text nor the ordinal
      sequence of this identifying number is significant to the ordering
      of pictures.)
   2) Zone order (REGION_OUT_ORDER passed on in output).
   3) A code number giving the image (graphics) compression format used
      to encode the picture zone. This informs the user as to the method
      to be used to uncompress the image data in order to display it as a
      binary bit map. The compression format is distinct from the file
      format.
   4) Left ordinate of the picture.
   5) Top ordinate of the picture.
   6) Right ordinate of the picture.
   7) Bottom ordinate of the picture.
   8) The zone name, a string made up by the user and having no
      particular significance for ICR.
  *9) Left max frame -- expanded frame beyond minimum image frame, used
      to block text in a word processor.
  *10) Top max frame.
  *11) Right max frame.
  *12) Bottom max frame.
------*/
/* Values for operand 3, the image compression format. */
#define KDOCPICT_NONE 0 /* no associated image data */
#define KDOCPICT_BINARY 1 /* binary bitmap, tightly packed */
#define KDOCPICT_UNCOMP 13 /* binary bitmap, word aligned */
#define KDOCPICT_PACKBITS 12 /* Packbits for TIFF or PICT */
#define KDOCPICT_FAX3 2 /* CCITT level 3 */
#define KDOCPICT_FAX3PAD 15 /* CCITT level 3 with end-of-line pad */
#define KDOCPICT_FAX4 3 /* CCITT level 4 */
#define KDOCPICT_TIFFG3 8 /* TIFF Group 3 */
#define KDOCPICT_TIFFG4 11 /* TIFF Group 4 */
#define KDOCPICT_ICRSPAN 5 /* ScanSoft ICRSPAN */

#define KDC_DROP ((char) 'u')
/*-This modifier marks the beginning of one or more dropped capital
   letters. A dropped capital is a large letter (or letters) at the
   beginning of a paragraph. It usually occurs at the beginning of an
   article.
   This markup could be used to position the large letter(s) absolutely
   on the page, like a picture. The user could use this information in at
   least three ways.
     1. Conversions could absolutely position the object (the dropped
        cap) as it was positioned in the original image. The rest of the
        text could then be placed so that it flows around the drop cap.
        This is similar to the way that pictures could be handled.
     2. The "dropped cap" markup could be ignored. The result would be a
        large raised cap (because the font change markup for the dropped
        cap would still be used). Some word processors cannot do dropped
        caps.
     3. The last possibility is to render the dropped character in the
        same font as the text following it. The output would then have
        no dropped cap and would look "normal".
   Operands:
   -----------------------------------------------------------------------
     1) top coordinate of dropped cap frame (could be more than one
        character)
     2) left coordinate of dropped cap frame
     3) right coordinate of dropped cap frame
     4) bottom coordinate of dropped cap frame
     5) partial word or complete word (KDC_DROP_PARTIAL or KDC_DROP_FULL)
     6) # of lines the drop cap has to the right of it (future use)
------*/
#define KDC_DROP_PARTIAL 1
#define KDC_DROP_FULL 2

#define KDC_NDROP ((char) 'J') /* end drop cap */

#define KDC_REVERSE ((char) 'R')
/* -This modifier indicates the beginning or end of reverse video text.
   Operands:
     1) START_REVERSE_VIDEO or END_REVERSE_VIDEO
*/
#define START_REVERSE_VIDEO 1
#define END_REVERSE_VIDEO 0

#define KDC_STLINE ((char) 's')
/*-This modifier indicates the beginning of a line of recognized text.
   Operands:
   -----------------------------------------------------------------------
     1) The id number of the zone this line of text belongs to.
     2) X-ordinate of left edge of zone at this line.
     3) Distance from left edge of zone to left edge of 1st text
        character on the line. A value of 0 means that there is no zone
        margin width to the left of this line.
     4) Unique id of the first word on this line.
     5) The Y, or baseline, ordinate for this line. However, if this
        value is 0, then the line's actual baseline is given in the
        KDC_NDLINE markup for this line.
    *6) The style type for the line. See list below.
    *7) The ID of the primary font for the line. See KDC_FONTINFO.
    *8) The same as 3) but non-zero only if the line is indented from the
        left text margin by a horizontal tab advance or by a block
        indent. In this case the distance is expressed as the number of
        blank spaces (of "n" width in the most recent font) skipped over
        by this tab advance or block indent. Note that this count may be
        zero where the first tab stop (or the block indent) and the left
        margin are virtually co-incident. Also, this count may include
        "pad" spaces to the left of the next word if these spaces
        intervene between the tab advance and this word.
    *9) A tab advance count. This value is non-zero only when the
        horizontal space is a tab advance. This count is the number of
        tab advance characters needed to advance from the left text
        margin relative to tab stop settings implied by the current
        context. These tab settings are implicitly changed for each
        paragraph, table or centered line but are not otherwise
        specified. It is assumed, regardless of the layout of the source
        document, that the tab stop settings are such that tab stop
        over-runs do not occur and that a tab stop cannot be co-incident
        with another tab stop or with either the left or right text
        margin. Thus, the tab advance count will be NON-zero if the line
        begins with a tabulated entry even if the distance from the left
        text margin to the first word is virtually 0. Note, however,
        that the tab advance count does not include any count for block
        indent and that 9) may be 0 while 8) is non-zero, but only for
        block-indented text.
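
   Example (an illustrative sketch only, not part of the XDOC definition).
   A reader might hold the KDC_STLINE operands in a fixed record, set the
   operands that XDOC_LITE omits to zero, and fall back to the KDC_NDLINE
   baseline when operand 5 is 0. The names kdc_stline_t, kdc_stline_fill
   and kdc_line_baseline are hypothetical, and the sketch assumes the
   caller has already split the operands into integers (operand 6 being
   passed as its character code).

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical record for the nine KDC_STLINE operands.
typedef struct {
    int zone_id;        // 1) zone this line belongs to
    int zone_left;      // 2) X-ordinate of left edge of zone
    int left_gap;       // 3) zone edge to first text character
    int first_word_id;  // 4) id of the first word on the line
    int baseline;       // 5) baseline, or 0 if given by KDC_NDLINE
    int style;          // 6) line style, 0 when absent (XDOC_LITE)
    int font_id;        // 7) primary font id, 0 when absent
    int indent_spaces;  // 8) indent in blank spaces, 0 when absent
    int tab_count;      // 9) tab advance count, 0 when absent
} kdc_stline_t;

// Copy up to nine operands into the record; absent operands become 0.
void kdc_stline_fill(kdc_stline_t *out, const int *ops, size_t n)
{
    int v[9] = {0};
    size_t i;

    assert(out != NULL);
    for (i = 0; i < n && i < 9; i++)
        v[i] = ops[i];

    out->zone_id = v[0];
    out->zone_left = v[1];
    out->left_gap = v[2];
    out->first_word_id = v[3];
    out->baseline = v[4];
    out->style = v[5];
    out->font_id = v[6];
    out->indent_spaces = v[7];
    out->tab_count = v[8];
}

// Baseline of a line: operand 5 of KDC_STLINE unless it is 0, in which
// case the baseline reported by KDC_NDLINE is used instead.
int kdc_line_baseline(const kdc_stline_t *st, int ndline_baseline)
{
    return st->baseline != 0 ? st->baseline : ndline_baseline;
}
```

   With only five operands supplied, as in XDOC_LITE, operands 6 to 9 of
   the record read as zero.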
-------------*/

/* Line styles: paragraph, centered block or table */
#define KDOCSTYLE_PARAGRAPH 'p'
#define KDOCSTYLE_CENTBLOCK 'c'
#define KDOCSTYLE_TABLE 't'

#define KDC_NDLINE ((char) 'y')
/*-This modifier indicates the end of a line of recognized text.
   Operands:
   -----------------------------------------------------------------------
     1) X-ordinate of right edge of zone at this line.
     2) Distance from the right edge of last text character on the line
        to the right edge of the zone. A value of 0 means that there is
        no zone margin width to the right of this line. (The Y associated
        with these X's is given in the next operand.)
     3) Y-ordinate of baseline of line. This value should be the same as
        the baseline given in the KDC_STLINE markup, unless that value is
        0. The entire line always has the same baseline value. (That is,
        the text line is not seen as sloping even though the text may be
        skewed! Thus, skew is always rendered as shear in KDOC. However,
        except for Raw KDOC, this shear will have been removed, meaning
        that KDOC Text pages appear de-skewed, with text lines horizontal
        and the left text margin vertical.)
    *4) The whole number of line advances to the next line of text. This
        number is always rounded up; line-and-a-half will be seen as
        double-spaced text. This number is 0 for the last line of a page
        where the page break is required. Otherwise, the expected number
        of line advances to the next line on the current page, if there
        were one, is given.
    *5) The text continuation: KDC_LINE_STOP or KDC_LINE_CONT.
---------*/
#define KDC_LINE_STOP ((char) 'H')
#define KDC_LINE_CONT ((char) 'S')

#define KDC_LITERAL ((char) 'i')
/*-This modifier is used to convey literal output text as a markup. The
   operand of this modifier is a text string which is a byte sequence,
   pure and simple. The user has placed in this byte sequence data that
   is not part of the input document. He does not intend that this byte
   sequence be included in post-processing such as Format Analysis.
   He merely wants it inserted, unmodified, in the output text.
   Operands:
   -----------------------------------------------------------------------
     1) A string. Follows the double-quote convention described elsewhere
        in this document.
---------*/

#define KDC_OHYPHEN ((char) 'H')
/*-This modifier indicates a soft (optional) hyphen found at the end of a
   line which is believed to be an unnecessary word break between
   syllables. The hyphen itself is not included in the text stream.
-----*/

#define KDC_SUB ((char) 'B')
/*-This modifier indicates the start or end of subscripted text. It's a
   toggle. There must be an even number of these markups in a line.*/

#define KDC_SUPER ((char) 'S')
/*-This modifier indicates the start or end of superscripted text. It's a
   toggle. There must be an even number of these markups in a line.*/

#define KDC_UNDERLINE ((char) 'U')
/*-This modifier indicates the start or end of underlined text. It's a
   toggle. There must be an even number of these markups in a line.*/

#define KDC_UNREC ((char) 'E')
/*-This modifier occurs whenever there was an unrecognized character. */

#define KDC_QABLE ((char) 'Q')
/*-This modifier indicates that the recognition of the first text
   character following is suspect.-----*/

#define KDC_CCONF ((char) 'q')
/*-This modifier marks a character with its confidence. Its value ranges
   from 0-999.
   --------------------------------------------------
     1) A positive integer, the confidence of the character
---------------*/

#define KDC_URL 1
#define KDC_EMAIL 2
#define KDC_LINK ((char) 'v')
/*-This modifier is followed by one of two internet classes.
   ------------------------------------------------------
     1) KDC_URL or KDC_EMAIL
----------------*/

#define KDC_WCONF ((char) 'w')
/*-This modifier indicates the word confidence. Its value ranges from
   0-999.
   -----------------------------------------------------------------------
     1) A positive integer, the confidence of the following word.
---------*/

#define KDC_WBOX ((char) 'b')
/* This markup is the bounding box for the next word.
     1) left
     2) top
     3) right
     4) bottom
     5) baseline - baseline of word (for a drop cap this is just the
        bottom of the wbox)
     6) leader - true or false
*/

#define KDC_CBOX ((char) 'Y')
/* This markup is the bounding box for the next character.
     1) left
     2) top
     3) right
     4) bottom
*/

#define KDC_UNRECOGNIZED KDC_UNREC
#define KDC_QUESTIONABLE KDC_QABLE

#define KDC_STABLE ((char) 'j')
/*- This modifier indicates that the following text is part of a cell
    table. This table must have column and row markups and will be
    terminated by the KDC_NDTABLE markup.
    ----------------------------------------------------------------
      1) unique id of cell table
      2) left coord of cell table
      3) top coord of cell table
      4) right coord of cell table
      5) bottom coord of cell table
      6) # of columns in the table
      7) # of rows in the table
      8) position of the table on the page. See the defines below.
         (Currently this value will always be KDC_LTABLE.)
      9-n) pairs of left, right coord for each column (this is like the
           KDC_SECTION markup) relative to the left coord of the cell
           table.
      n-n) pairs of top, bottom coord for each row.
-------------------------------*/
#define KDC_LTABLE 1
#define KDC_CTABLE 2
#define KDC_RTABLE 3

#define KDC_SROW ((char) 'o')
/*- This modifier indicates that a new row in the current cell table
    begins here.
    ---------------------------------------------------------------
      1) row height (always zero for now)
      2-n) top border code|left border code|bottom border code|right
           border code
           Currently only KDC_CELL_BORDER_SINGLE line will be detected.
           A border of 0 can mean that it is a "fake" border between two
           sub cells (joined cells).
           There is one such border code for each cell in the row (as
           many cells as were reported in the KDC_STABLE markup).
    So this markup might look like this for a table with four columns:

        [o0;1111,1111,1111,1110]

    This would describe a table row that looks like this:

        -----------------------------------------------
        |           |           |           |
        -----------------------------------------------
------------------------------------*/

/* these defines were modeled after capabilities of MasterSoft
   conversions */
#define KDC_CELL_BORDER_NONE 0
#define KDC_CELL_BORDER_SINGLE 1
#define KDC_CELL_BORDER_DOUBLE 2
#define KDC_CELL_BORDER_DASHED 3
#define KDC_CELL_BORDER_DOTTED 4
#define KDC_CELL_BORDER_THICK 5
#define KDC_CELL_BORDER_EXTRATHICK 6
#define KDC_CELL_BORDER_HAIRLINE 7
#define KDC_CELL_BORDER_SHADOWED 8

#define KDC_SCOL ((char) 'n')
/*- This modifier indicates that a new cell in the current cell table row
    begins here.
    ---------------------------------------------------------------------
      1) unique id of table that cell belongs to
      2) current column number
      3) number of columns the cell spans
      4) number of rows the cell spans
      5) does this cell exist (or does it continue from the row above or
         the column to the left). The value of this is KDC_CELL_EXIST or
         KDC_CELL_NOT_EXIST.
      6) cell horizontal alignment (see alignment codes below)
         (currently this value is always KDC_CELL_ALIGN_LEFT)
      7) number of decimal places for decimal alignment
         (currently this is always 2)
      8) cell vertical alignment (see the defines below; currently this
         is always KDC_CELL_VALIGN_TOP)
    For these various alignments use the normal data that is presented
    about vertical space and tabs.
-----------------------------------------------------------------*/
#define KDC_CELL_EXIST 1
#define KDC_CELL_NOT_EXIST 0

#define KDC_CELL_ALIGN_LEFT 0
#define KDC_CELL_ALIGN_FULL 1
#define KDC_CELL_ALIGN_CENTER 2
#define KDC_CELL_ALIGN_RIGHT 3
#define KDC_CELL_ALIGN_DECIMAL 4

#define KDC_CELL_VALIGN_TOP 0
#define KDC_CELL_VALIGN_BOTTOM 1
#define KDC_CELL_VALIGN_CENTER 2

#define KDC_NDTABLE ((char) 'A')
/*- The table that began with the KDC_STABLE markup is now being
    terminated*/

/* new markups for XDOC PLUS */

#define KDC_CONV_TZONE ((char) 'D') /* text zone for conversions */
/*-This markup indicates the start of a text zone. A text zone is also
   known by the name "galley". A "zone" is a rectangular area which
   defines the location of a picture or a block of text. The end of the
   zone is marked by the next FA_TZONE, FA_PZONE, or by the next page
   (FA_PAGE), or by the end of the document (FA_NDDOC), whichever comes
   first. The boundaries of the zone are defined by the operands. The
   text for a text zone follows this markup. The textual zone must at
   least entirely enclose the text characters; it may also have a border
   of white space.
   Operands:
   -----------------------------------------------------------------------
     1) output order (REGION_OUT_ORDER passed out).
     2) The distance, in points, from the left edge of the page to the
        left edge of the zone.
     3) The distance, in points, from the top margin of the page to the
        top of the zone.
     4) The width, in points, of the zone. (R edge - L edge + 1)
     5) The height, in points, of the zone. (Bottom edge - Top edge + 1)
     6) left coordinate of frame (absolute)
     7) top coordinate of frame
     8) right coordinate of frame
     9) bottom coordinate of frame
    10) borders (defined the same way as borders for cell tables)
----*/

#define KDC_PGMARGIN ((char) 'm')
/*-This markup indicates the margins of the next page. It appears prior
   to every page.
   Operands:
   -----------------------------------------------------------------------
     1) The top page margin.
        This is the distance, in points, from the top edge of the page
        to the top of the topmost picture.
     2) The bottom page margin.
        This is the distance, in points, from the top edge of the page
        to the bottom of the bottommost picture.
     3) The left page margin.
        This is the distance, in points, from the left edge of the page
        to the left edge of the leftmost picture.
     4) The right page margin.
        This is the distance, in points, from the left edge of the page
        to the right edge of the rightmost picture.
     5) The distance, in points, from the top edge of the page to the
        top of the topmost text zone. (top)
     6) The distance, in points, from the top edge of the page to the
        bottom of the bottommost text zone. (bot)
     7) The distance, in points, from the left edge of the page to the
        left edge of the leftmost text zone. (left)
     8) The distance, in points, from the left edge of the page to the
        right edge of the rightmost text zone. (right)
     9) top most body text
    10) bottom most body text (not incl header + footer)
----*/

#define KDC_STPARA ((char) 'P')
/*-This markup indicates the start of a paragraph. The paragraph ends
   with the next KDC_NDPARA markup. A paragraph is defined as a block of
   filled text (not centered and not a table) terminated by a hard
   carriage return.
   Note: If the preceding page ended with a normal paragraph whose last
   line ended with a soft carriage return, and if this KDC_STPARA is the
   first p/cb/t of a new page, then it's not really the start of a
   paragraph. Rather, it's the continuation of the last paragraph of the
   previous page, and the PPD ID in this markup should be ignored.
   Operands:
   -----------------------------------------------------------------------
     1) PPD ID number. Details about PPD's are contained in the FACP
        output file.
-----*/

#define KDC_CENTER ((char) 'C')
/*-This markup indicates that the text which follows, up until the next
   KDC_NDPARA markup, is to be centered.
   In the current implementation, no soft carriage returns (KDC_CR's) are
   permitted between KDC_CENTER and the next KDC_NDPARA. In other words,
   each centered line has its own KDC_CENTER/KDC_NDPARA pair. This should
   be viewed as a deficiency.
   Operands:
   -----------------------------------------------------------------------
     1) PPD ID number. Details about PPD's are contained in the FACP
        output file.
     2) A count of the number of recognized text (non-markup) characters
        on this line.
-----*/

#define KDC_TABLE ((char) 'F')
/* NOTE: Within the context of a table, the term "column" is ambiguous.
   On the one hand, it refers to a font-specific unit of horizontal
   measure, as elsewhere in this document. On the other hand, it has a
   table-specific meaning as well, namely, a vertical strip of text, as
   in the columns of a table. I have attempted to resolve this ambiguity,
   while maintaining compatibility with the usage in the rest of this
   document, by using "column" for the former usage and "tabular column"
   for the latter.
*/
/*-This markup indicates that the text which follows is tabular.
   Information about the tabular layout is contained in the FACP output
   file.
   Operands:
   -----------------------------------------------------------------------
     1) PPD ID number. Details about PPD's are contained in the FACP
        output file.
----*/

#define KDC_STBKNDT ((char) 'K')
/*-(STart BlocK iNDenT) The following paragraph (must be a normal
   paragraph) begins a block indent. This markup must be immediately
   followed by a KDC_STPARA markup. A block indent is defined here as a
   group of one or more consecutive normal paragraphs adhering to the
   following constraints:
     - They all have the same right margin and body left margin.
     - Their body left margin is to the right of the body left margin of
       the nearest normal paragraph above the block indent.
     - Their right margin is the same as the nearest normal paragraph
       above the block indent.
   Nested block indents are not possible. This is a deficiency.
   Note: A block indent may span zone and page boundaries.
   Note: This is one mechanism by which block indented pages can be
   detected.
----*/

#define KDC_NDBKNDT ((char) 'L')
/*-(eND BlocK iNDenT) The preceding paragraph (must be a normal
   paragraph) ends a block indent. This markup must come immediately
   after a KDC_NDPARA markup.
----*/

#define KDC_BPVSP ((char) 'V')
/*-(Between Paragraph Vertical SPace) The vertical space from the
   baseline of the last line of one p/cb/t (paragraph/centered-block/
   table) to the baseline of the first line of the next is indicated by
   this markup. The values specified remain in effect until the next
   KDC_BPVSP or the end-of-page, whichever comes first. The first
   KDC_BPVSP markup in the page occurs just ahead of the 1st p/cb/t to
   appear in the same zone as its predecessor. Subsequent KDC_BPVSP
   markups appear between subsequent pairs of p/cb/t's, but only when the
   value of one of the operands would change significantly. (It is the
   responsibility of format analysis to make a judgment regarding how
   much change is a "significant" change.)
   Operands:
   -----------------------------------------------------------------------
     1) The spacing, in points, between the next and previous p/cb/t.
     2) The same spacing, in half-lines, relative to the primary font of
        the previous p/cb/t.
----*/

#define KDC_IVSP ((char) 'I')
/* Immediate vertical space. This can occur anywhere and has no relation
   to subsequent vertical spaces.
   Operands:
     1) The spacing in points.
     2) The same spacing in half-lines.
----*/

#define KDC_HI1LTAB ((char) 'T')
/*-This markup may occur only on the first line of a normal paragraph,
   and only on normal paragraphs with a "hanging indent", and only once
   within this first line. (A paragraph is said to have a "hanging
   indent" when the left edge of its first line is to the left of the
   left edge of the remaining lines.) The markup implies a tab advance.
   The location of the implied tabstop being advanced to is given in the
   FACP file as the first (and only) TABSTOP entry for the PPDID of the
   paragraph.
-----*/

#define KDC_TBADV ((char) 'z')
/*-This markup indicates an advance to a particular tabstop. This markup
   may occur ONLY within a table. Note that there is one type of tabstop,
   the "field separator", which may not be tab advanced to. Every column
   of a table's line (even an initial left-justified column) will be
   preceded by this markup.
   Operands:
   -----------------------------------------------------------------------
     1) The ordinal number of the tabstop being advanced to. This number
        is a reference to the PPD description block in the corresponding
        FACP output file. The PPD description block for this table will
        have a number of "TABSTOP=..." lines. If this operand is 3, for
        example, then we're advancing to the tabstop described by the 3rd
        "TABSTOP=..." line in the relevant PPD description block. (Which
        PPD description block is relevant? The one which has the same
        PPDID # as is given in the KDC_TABLE markup for this table.)
     2) A zero-, one-, or two-character string, indicating whether the
        tab is advanced to with whitespace (indicated by a zero-character
        string, ie, two double-quotes), with a one-character repeating
        leader dot, or with a two-character repeating leader dot.
     3) A count whose meaning depends on the type of the tabstop being
        advanced to. For decimal tabs, the count is the number of
        characters of recognized text (non-markup) between this tab
        advance and the decimal point (not counting the decimal point
        itself). For left- and right-justified tabstops, the count is a
        count of the recognized text (non-markup) characters between this
        tab advance and the next tab advance (or end-of-line markup,
        whichever comes first).
        In all cases, recognized characters of questionable recognition
        count, but unrecognized characters do not (the former consists of
        a markup followed by one character, while the latter consists of
        only a markup).
     4) Same count as in operand 3), but counting unrecognized-character
        markups as one character each.
-----*/

#define KDC_CR ((char) 'N')
/*-This markup indicates an optional end-of-line, aka soft carriage
   return, within the context of a paragraph. However, within the context
   of a block of centered text or a table, it indicates a hard carriage
   return (centered text and tables do not have soft carriage returns) at
   the end of any-but-the-last line of the centered-block or table.
   Note: If all of the following conditions pertain, then the KDC_CR
   should be interpreted as KDC_NDPARA.
     1. This KDC_CR is on the last line of the entire page. (This is
        possible only if the last line of the entire page is part of a
        normal paragraph.)
     2. The next page begins with a table or a centered block, or there
        is no next page.
   Operands:
     1) KDC_LINE_STOP ('H') or KDC_LINE_CONT ('S')
---*/

#define KDC_HEADLINE ((char) 'M')
/* this markup indicates start or stop headline (toggle) */

/* private markups for internal use */

#define KDC_GTYPE ((char) 'G')
/*- This markup is used to store page layout analysis results that were
    stored in the region.
    They will get copied into the new gtype data struct in the galley.
      1) parentId
      2) type
      3) ncols
      4) id (of caption picture)
      5) AboveSpanningPictureId
      6) nColBeg
      7) nColEnd
      8) nRowBeg
      9) nRowEnd
     10) ColNo
     11) flags
     12-n) ColDimsPairs - ncols pairs of coords
*/
#define KDC_GTYPE_NEWSECTION GTYPE_NEWSECTION /* this region starts new section */
#define KDC_GTYPE_VRULING GTYPE_VRULING /* region has v rulings between cols */
#define KDC_GTYPE_CELL GTYPE_CELL /* region is table cell */
#define KDC_GTYPE_MAN_TABLE GTYPE_MAN_TABLE /* region is manual table */
#define KDC_GTYPE_AUTO_TABLE GTYPE_AUTO_TABLE /* region is auto table */
#define KDC_GTYPE_CENTERED GTYPE_CENTERED /* region is somewhat centered on page */
#define KDC_GTYPE_LEFT GTYPE_LEFT /* region is somewhat left on page */
#define KDC_GTYPE_RIGHT GTYPE_RIGHT /* region is somewhat right on page */
#define KDC_GTYPE_ENDSCOL GTYPE_ENDSCOL /* region ends with hard end of col */
#define KDC_GTYPE_HEADLINE GTYPE_REGION_HEADLINE /* autoseg called it a headline */
#define KDC_GTYPE_EXPANDS_UP GTYPE_EXPANDS_UP /* if FWX_TRUE, text expands up */
#define KDC_GTYPE_BAL_COL_FOR_FINE_MS_PROD GTYPE_BAL_COL_FOR_FINE_MS_PROD
#define KDC_GTYPE_BAL_COL_FOR_BAD_UTAH_PROD GTYPE_BAL_COL_FOR_BAD_UTAH_PROD
#define KDC_GTYPE_COL_MAY_BE_NOISE GTYPE_COL_MAY_BE_NOISE /* if FWX_TRUE then para wrap */
#define KDC_GTYPE_SECOND_PAGE_NOISE GTYPE_SECOND_PAGE_NOISE /* edge noise from second page */

#endif
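
/* Example (an illustrative sketch only, not part of the XDOC definition).
   The KDC_SROW example [o0;1111,1111,1111,1110] carries one border code
   per cell, ordered top, left, bottom, right. The sketch below splits
   such a code into its four border values. It assumes, from that example
   alone, that each code is written as four decimal digits, one
   KDC_CELL_BORDER_* value per digit. The names kdc_cell_borders_t and
   kdc_split_border_code are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical holder for the four border values of one table cell.
typedef struct {
    int top;
    int left;
    int bottom;
    int right;
} kdc_cell_borders_t;

// Split a four-digit border code such as 1110 into its digits.
// Returns 0 on success, or -1 if the code is outside 0..9999.
int kdc_split_border_code(int code, kdc_cell_borders_t *out)
{
    assert(out != NULL);
    if (code < 0 || code > 9999)
        return -1;

    out->top = (code / 1000) % 10;
    out->left = (code / 100) % 10;
    out->bottom = (code / 10) % 10;
    out->right = code % 10;
    return 0;
}
```

   For the last cell of the example row, the code 1110 splits into
   top 1, left 1, bottom 1 and right 0, that is, single borders on three
   sides and no right border.
*/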