
In tokenized string edit distance processing, diacritics are not treated as separate symbols but as modifiers of the vowel or consonant they follow. In a preprocessing step, each vowel or consonant is combined with the diacritics that follow it into a single token. String edit distance is then applied to the resulting strings of tokens.
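
The preprocessing step described above can be illustrated with a minimal Python sketch. It is not the implementation used by this software. It assumes that diacritics are encoded as Unicode combining marks and that every edit operation costs 1; the function names `tokenize` and `edit_distance` are hypothetical and chosen for illustration only.

```python
import unicodedata


def tokenize(word):
    """Group each base symbol with the combining marks that follow it.

    Assumption: diacritics are Unicode combining marks. A diacritic with
    no preceding base symbol cannot be tokenized and raises ValueError.
    """
    # Decompose precomposed characters so diacritics become separate marks.
    decomposed = unicodedata.normalize("NFD", word)
    tokens = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            if not tokens:
                raise ValueError("diacritic without a preceding symbol")
            tokens[-1] += ch  # attach the diacritic to the previous token
        else:
            tokens.append(ch)  # a new base symbol starts a new token
    return tokens


def edit_distance(a, b):
    """Plain Levenshtein distance over two sequences, unit costs assumed."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Compare character-level and token-level distances for one word pair.
nasal = "pa\u0303"  # "p", "a", combining tilde
plain = "po"
char_level = edit_distance(nasal, plain)
token_level = edit_distance(tokenize(nasal), tokenize(plain))
```

In this sketch the character-level comparison counts two edits (one substitution plus the removal of the tilde), while the token-level comparison counts a single substitution, because the vowel and its tilde form one token.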

Instances that could not be tokenized correctly are displayed in the list of errors (Data inspection > data overview > errors) and are not used in the further analysis of the data set. Under Index > items you can see how many instances you supplied for each variable and how many of those instances are actually used.
