Most words in the session indexes are not useful for searching.
The added load from indexing unwanted values can lead to index backlogs, high CPU utilization and indexes that are 2 to 9 times the size of their session data. For Tealeaf business users this means that some sessions are not findable until long after they are complete.
We request an option to index only the values in name-value pairs or UI Capture fields.
Here are some indexed words and their number of occurrences, from one canister early in the day:
prevState, 43,120
isParentLink, 33,927
value, 53,077
accept_encoding, 193,129
accountEndDate, 838,023
_RCRTX03_samesite, 1,000,366
startTime, 20,406
StatusCode, 5,347
REMOTE_ADDR, 129,909
Ct3tDl7UAnQXD55DLE4Pl9kKO9FO0Ey99fcvNe0xCoYVQ, 5,924
�������Aٵm�Uj��Ӧ�Ԧ�U��N�WX��1HBx���A�̙�I�U���L�g_���K�2cZf, 78,878
To see the full list, use this search server query:
https://localhost:19000/wordlist?fields=1
"Maximum Word Size" is set to 32 characters, yet many indexed words are much longer. Additionally, many items that appear to be binary or multi-byte are indexed. It should be possible to disable this, as these items are unlikely to be used in a search.
Most words in the index exist in all related hits, so they are not meaningful for searching. For example, every hit has "StatusCode" and "REMOTE_ADDR" in the request, so searching by these words does not narrow the results. Typically only the value of a name-value pair is used in a search.
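The two complaints above (oversized or binary-looking words, and field names that appear in every hit) can be sketched as a small filter. This is only an illustration of the requested behavior, not how the Tealeaf indexer works: the `name=value` line format, the function names, the `ascii_only` flag, and the sample names `token` and `blob` are all assumptions made for the example.

```python
MAX_WORD_SIZE = 32  # mirrors the "Maximum Word Size" setting quoted above


def is_indexable(word, max_len=MAX_WORD_SIZE, ascii_only=False):
    """Return True if a word is short enough and looks like plain text."""
    if not word or len(word) > max_len:
        return False
    # The Unicode replacement character usually marks mis-decoded binary data.
    if "\ufffd" in word:
        return False
    # Optional stricter mode (hypothetical flag): drop all multi-byte text.
    if ascii_only and not word.isascii():
        return False
    # Reject control characters and other non-printable content.
    return word.isprintable()


def values_to_index(lines, **filter_options):
    """Yield only the value side of name=value lines (assumed format)."""
    for line in lines:
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a name-value pair, e.g. a section header
        for word in value.split():
            if is_indexable(word, **filter_options):
                yield word


# Illustrative input only; the values are made up for this sketch.
sample = [
    "[env]",
    "StatusCode=200",
    "REMOTE_ADDR=192.0.2.10",
    "token=Ct3tDl7UAnQXD55DLE4Pl9kKO9FO0Ey99fcvNe0xCoYVQ",
    "blob=\ufffd\ufffdA",
]
print(list(values_to_index(sample)))  # ['200', '192.0.2.10']
```

With a filter like this, the names "StatusCode" and "REMOTE_ADDR" never reach the index, and the 45-character token and the binary-looking value are dropped by the size and printability checks.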
For a UI Capture field change event:
"currState":{
"value": "260475106"
},
"subType": "text",
"isParentLink": false,
"prevState":{
"value": ""
},
All of these labels are indexed but cannot be used in a meaningful way in a search. Only the value "260475106" might be useful in the index.
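A minimal sketch of the idea, assuming the event is available as parsed JSON: walk the structure and collect only the leaf values, never the keys. The function name `leaf_values` is hypothetical, and the fragment above is wrapped in braces here so that it parses as a complete object.

```python
import json
from typing import Any, Iterator


def leaf_values(node: Any) -> Iterator[str]:
    """Yield non-empty string and number leaves; keys are never yielded."""
    if isinstance(node, dict):
        for child in node.values():
            yield from leaf_values(child)
    elif isinstance(node, list):
        for child in node:
            yield from leaf_values(child)
    elif isinstance(node, bool) or node is None:
        return  # flags such as isParentLink carry no searchable text
    elif isinstance(node, (str, int, float)):
        text = str(node)
        if text:  # skip empty values such as prevState.value
            yield text


event = json.loads("""
{
  "currState": {"value": "260475106"},
  "subType": "text",
  "isParentLink": false,
  "prevState": {"value": ""}
}
""")
print(list(leaf_values(event)))  # ['260475106', 'text']
```

Note that "text" still passes because it is a value, not a label. Restricting the walk to keys named "value" would leave only "260475106", at the cost of hard-coding knowledge of the event layout.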
Also note: with Enable-IndexingUiSdkJson disabled, UI Capture content is skipped and indexes are much smaller. However, it takes time for the existing items to leave the master index, since they are trimmed off together with their session data.
In summary, an option is needed to have the indexer determine more intelligently which words are useful in the index. In particular, only the values in name-value pairs or UI Capture data should be indexed.
How will this idea be used?
We sometimes struggle with index backlogs and storage size. Having more efficient indexes will help significantly for both of these problems.
What is your industry?
Insurance
What is the idea priority?
Medium