Better control over canister session indexing is needed, to reduce processing and storage cost

Tealeaf session indexing can be very costly: index storage is commonly twice the size of session (LSSN) storage, and sometimes far more. More control is required to skip indexing of content that is not meaningful for session search.

Tealeaf currently provides a mechanism for adding content types to index (i.e., "Additional Content Types to Index"), but the opposite type of control is also needed.

The following options are requested for identifying hits to skip:

* "Content Types to Skip"
* "File Extensions to Skip"
* "URLs to Skip" (where the URL contains any of the specified strings)
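
To make the request concrete, the three options could combine roughly as sketched below. This is only an illustration of the requested behavior, not Tealeaf's actual configuration or API: the `SkipRules` class, its field names, and the `should_skip_indexing` function are hypothetical, and the sketch assumes a hit is skipped when any one of the three lists matches.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit
import posixpath


@dataclass
class SkipRules:
    """Hypothetical container for the three requested skip lists."""
    content_types: set = field(default_factory=set)     # "Content Types to Skip"
    file_extensions: set = field(default_factory=set)   # "File Extensions to Skip"
    url_substrings: list = field(default_factory=list)  # "URLs to Skip"


def should_skip_indexing(url: str, content_type: str, rules: SkipRules) -> bool:
    """Return True if any of the three skip lists matches the hit."""
    # Compare only the media type, ignoring parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in rules.content_types:
        return True

    # Take the extension from the URL path only, so query strings are ignored.
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    if extension and extension in rules.file_extensions:
        return True

    # A URL is skipped when it contains any of the specified strings.
    return any(fragment in url for fragment in rules.url_substrings)
```

In this sketch the lists are expected to hold lowercase media types and extensions without the leading dot, while the URL strings are matched case-sensitively; a real implementation might choose differently.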

A workaround has been deployed from time to time: "munging" the content type so that the indexer skips the hit. However, this sometimes interferes with replay.

For reference, these content types are indexed by default:

text/html
text/plain
text/xml
application/xhtml+xml
application/rdf+xml
application/vnd.mozilla.xul+xml
application/xml

A hit designated to be "skipped" for indexing would disregard these defaults, and not be indexed.

When a hit is "skipped" for indexing, standard request indexing would still be done, but the response and any JSON or XML payload in the request would not be indexed. This would help materially in AJAX-heavy applications, where those data elements are not expected to be searched for directly.
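
The intended effect on a single hit can be sketched as follows. This is an illustration under assumptions rather than a description of the indexer's internals: the `parts_to_index` function, its parameters, and the part names are hypothetical, and the sketch assumes that a non-skipped hit has its request payload indexed and its response indexed only when the content type is a default or an additional type.

```python
# Content types indexed by default, as listed in this idea.
DEFAULT_INDEXED_TYPES = frozenset({
    "text/html", "text/plain", "text/xml", "application/xhtml+xml",
    "application/rdf+xml", "application/vnd.mozilla.xul+xml", "application/xml",
})


def parts_to_index(response_type: str, has_request_payload: bool, skipped: bool,
                   additional_types: frozenset = frozenset()) -> list:
    """Return the (hypothetical) parts of a hit that would be indexed."""
    parts = ["request"]  # standard request indexing is always done
    if skipped:
        # A skipped hit disregards the defaults: no response, no request payload.
        return parts
    if has_request_payload:
        parts.append("request_payload")  # JSON or XML body in the request
    # Compare only the media type, ignoring parameters such as "; charset=utf-8".
    media_type = response_type.split(";", 1)[0].strip().lower()
    if media_type in DEFAULT_INDEXED_TYPES or media_type in additional_types:
        parts.append("response")
    return parts
```

For example, a skipped `text/html` hit with a JSON request payload would yield only `["request"]`, whereas the same hit without the skip would also include the payload and the response.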

Additionally, if the DTSearch engine supports compression and it is not already in use, enabling compression might reduce storage size. However, uncompressed storage has a storage advantage when the media supports de-duplication, so it should also be possible to turn compression off.

  • Guest
  • Feb 28 2020
  • Shipped
What is your industry? Telecommunications
What is the idea priority? Medium