The HTML filter indexes all text in an HTML file except HTML tags and comments. In addition, the following information is available by default through reserved fields in FTR collections:
- The HTML filter derives the title from the head of the document and assigns it to the FT_DNAME reserved field.
- Formatted text is assigned to the FT_KEYWORDS reserved field. This includes text between the following HTML tags:
  - bold tags: <B> and </B>
  - italics tags: <I> and </I>
  - strong emphasis tags: <STRONG> and </STRONG>
  - emphasis tags: <EM> and </EM>
  - teletype tags: <TT> and </TT>
- The contents of META tags with the NAME=SUBJECT attribute are assigned to the FT_SUBJECT reserved field. If no such tag exists, the <TITLE> text is written to FT_SUBJECT instead.
- The document format ID 3,023 is written to the FT_FORMAT reserved field.
- The length of the document, in bytes, is written to the FT_ORIGINAL_SIZE reserved field.
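
The mapping described above can be illustrated with a small Python sketch. This is not the filter's actual implementation, only an approximation of the rules as stated in this section, built on the standard library's `html.parser`. The names `FieldSketch` and `sketch_fields` are hypothetical, the sketch assumes the subject text lives in the META tag's CONTENT attribute, and it assumes UTF-8 input. The constant FT_FORMAT value is left out.

```python
from html.parser import HTMLParser

# Tags whose enclosed text this section says is assigned to FT_KEYWORDS.
KEYWORD_TAGS = {"b", "i", "strong", "em", "tt"}


class FieldSketch(HTMLParser):
    """Hypothetical illustration of the field mapping; not the real filter."""

    def __init__(self):
        super().__init__()
        self.title = []
        self.keywords = []
        self.subject = None
        self._in_title = False
        self._keyword_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "title":
            self._in_title = True
        elif tag in KEYWORD_TAGS:
            self._keyword_depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            # Assumption: the subject text is carried in the CONTENT attribute.
            if (attr.get("name") or "").lower() == "subject":
                self.subject = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in KEYWORD_TAGS and self._keyword_depth > 0:
            self._keyword_depth -= 1

    def handle_data(self, data):
        # Tags and comments never reach this method, only text does.
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title.append(text)
        if self._keyword_depth > 0:
            self.keywords.append(text)


def sketch_fields(raw: bytes) -> dict:
    """Return a dict keyed by the reserved field names from this section."""
    parser = FieldSketch()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    title = " ".join(parser.title)
    return {
        "FT_DNAME": title,
        "FT_KEYWORDS": " ".join(parser.keywords),
        # Fall back to the title when no NAME=SUBJECT META tag was seen.
        "FT_SUBJECT": parser.subject if parser.subject is not None else title,
        # Size is measured on the original bytes, not the decoded text.
        "FT_ORIGINAL_SIZE": len(raw),
    }
```

Calling `sketch_fields` on the raw bytes of an HTML file returns one entry per reserved field, which makes it easy to see the FT_SUBJECT fallback in action by adding or removing the META tag from a test document.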