
More control over Request and UI Capture indexing

Most words in the session indexes are not useful for searching.

The added load from indexing unwanted values can lead to index backlogs, high CPU utilization, and indexes that are 2 to 9 times the size of their session data. For Tealeaf business users, this means that some sessions cannot be found until long after they are complete.

The request is for an option to index only the values in name-value pairs and in UI Capture fields, not the names.


Here are some indexed words and their number of occurrences, from one canister early in the day:

prevState, 43,120

isParentLink, 33,927

value, 53,077

accept_encoding, 193,129

accountEndDate, 838,023

_RCRTX03_samesite, 1,000,366

startTime, 20,406

StatusCode, 5,347

REMOTE_ADDR, 129,909

Ct3tDl7UAnQXD55DLE4Pl9kKO9FO0Ey99fcvNe0xCoYVQ, 5,924

�������Aٵm�Uj��Ӧ�Ԧ�U��N�WX��1HBx���A�̙�I�U���L�g_���K�2cZf, 78,878


To see the full list, use this search server query:

https://localhost:19000/wordlist?fields=1


"Maximum Word Size" is set to 32 characters, yet many indexed words are much longer. Additionally, many items that appear to be binary or multi-byte are indexed. There should be an option to disable this, as these items are unlikely to be used in a search.
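
As an illustration only, the kind of filter being asked for could look like the sketch below. It is not Tealeaf code: the function name is_indexable and the allow_non_ascii switch are hypothetical, and it assumes that rejecting every non-ASCII word is acceptable for the site.

```python
MAX_WORD_SIZE = 32  # mirrors the "Maximum Word Size" value quoted above


def is_indexable(word, max_len=MAX_WORD_SIZE, allow_non_ascii=False):
    """Hypothetical filter: return True if a word is worth indexing."""
    # Enforce the length limit strictly.
    if not word or len(word) > max_len:
        return False
    # Reject control characters and other unprintable content.
    if not word.isprintable():
        return False
    # Optionally reject binary-looking or multi-byte words.
    if not allow_non_ascii and not word.isascii():
        return False
    return True


words = [
    "StatusCode",
    "REMOTE_ADDR",
    "Ct3tDl7UAnQXD55DLE4Pl9kKO9FO0Ey99fcvNe0xCoYVQ",  # longer than 32 characters
    "\ufffd\ufffdA\ufffdm",  # stand-in for the binary-looking word in the list
]
kept = [w for w in words if is_indexable(w)]
print(kept)
```

With a filter like this, the long token and the binary-looking word from the list above would never reach the index.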


Most words in the index exist in all related hits, so they are not meaningful for searching. For example, every hit has "StatusCode" and "REMOTE_ADDR" in the request, so searching by these words does not narrow anything down. Typically only the value of a name-value pair is used in a search.
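
A rough sketch of the requested behavior for request data follows. It assumes, purely for illustration, that the request is available as plain name=value lines. The function name request_values and the sample values are made up and are not taken from a real session.

```python
def request_values(request_text):
    """Hypothetical helper: return only the values from name=value lines."""
    values = []
    for line in request_text.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines that are not name=value pairs, and skip empty values.
        if sep and value.strip():
            values.append(value.strip())
    return values


# Made-up sample; the names are the ones mentioned above.
sample = "[env]\nStatusCode=200\nREMOTE_ADDR=192.0.2.10\n"
print(request_values(sample))
```

Here the names StatusCode and REMOTE_ADDR are dropped and only their values would be passed to the indexer.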


For a UI Capture field change event:

"currState": {
    "value": "260475106"
},
"subType": "text",
"isParentLink": false,
"prevState": {
    "value": ""
},


All of these labels are indexed but cannot be used in a meaningful way in a search. Only the value "260475106" might be useful in the index.
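
To make the idea concrete, here is a minimal sketch that walks a UI Capture event like the one above and collects only string values, never the labels. The function name iter_values is hypothetical, and the fragment is wrapped in braces here so that it parses as JSON.

```python
import json


def iter_values(node):
    """Yield non-empty string values; dictionary keys (labels) are never yielded."""
    if isinstance(node, dict):
        for child in node.values():
            yield from iter_values(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_values(child)
    elif isinstance(node, str) and node:
        yield node


event = json.loads("""
{
  "currState": {"value": "260475106"},
  "subType": "text",
  "isParentLink": false,
  "prevState": {"value": ""}
}
""")
print(list(iter_values(event)))  # ['260475106', 'text']
```

Note that this simple version still keeps "text", the value of subType. A stricter option could limit indexing to values under selected keys such as "value".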

Also note: with Enable-IndexingUiSdkJson disabled, UI Capture content is skipped and indexes are much smaller. However, it takes time for the existing items to leave the master index, because they are trimmed only along with their session data.


In summary, an option is needed to have the indexer more intelligently determine which words are useful in the index. In particular, only the values in name-value pairs or UI Capture data should be indexed.

  • Eric Stamper
  • May 12 2022
How will this idea be used?

We sometimes struggle with index backlogs and storage size. Having more efficient indexes will help significantly for both of these problems.

What is your industry? Insurance
What is the idea priority? Medium