Processing of categorical data

Binary comparison

In binary comparison, two strings are counted as either the same (distance zero) or different (distance one). This processing type is unsuitable for detailed data such as phonetic transcriptions, because few variants of an item are likely to be exactly identical, so nearly every comparison would yield distance one. Binary comparison can be useful, for example, for syntactic data, where the number of variants of each linguistic variable is limited.

In binary comparison, the aggregate linguistic distance between two places is simply a count of how many of the linguistic variables differ in form between them. Variables that have exactly the same form in both places contribute zero.
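The zero/one definition above can be sketched in a few lines of Python. This is only an illustration: the function names, the variable names and the example data are hypothetical, and skipping variables that are missing for one of the places is an assumption, not something this page specifies.

```python
def binary_distance(form_a, form_b):
    """Return 0 if the two forms are identical, 1 otherwise."""
    return 0 if form_a == form_b else 1


def aggregate_binary_distance(place_a, place_b):
    """Sum the binary distances over all variables recorded for both places.

    Each place is a dict mapping a linguistic variable to its form.
    Assumption: variables missing for either place are skipped.
    """
    shared = [variable for variable in place_a if variable in place_b]
    return sum(binary_distance(place_a[v], place_b[v]) for v in shared)


# Hypothetical syntactic data for two places.
place_a = {"neg_concord": "yes", "verb_cluster": "1-2", "do_support": "no"}
place_b = {"neg_concord": "yes", "verb_cluster": "2-1", "do_support": "no"}

distance = aggregate_binary_distance(place_a, place_b)
print(distance)  # 1: only verb_cluster differs
```

Because every variable contributes either zero or one, the result depends only on whether forms match exactly, which is why the method works best when each variable has a small, fixed set of variants.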

Gewichteter Identitätswert (GIW)

This is a frequency-weighted similarity value which was introduced in dialectometry by Goebl (1984). The GIW method counts infrequent words more heavily than frequent ones. This runs counter to the tendency in several areas of quantitative linguistics to treat very infrequent words as noise, that is, as unreliable evidence of linguistic structure (Nerbonne and Kleiweg 2007).
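The idea of weighting shared forms by their rarity can be sketched as follows. This is an illustration of frequency weighting in general, not a reproduction of the GIW formula: the weight 1 - (k - 1) / n (where k places out of n use the shared form), the normalization by the number of compared variables, and all names and data are assumptions made for this example and may differ from the definition in Goebl (1984).

```python
from collections import Counter


def identity_weight(forms_by_place, form):
    """Weight for a shared form: rarer forms get weights closer to 1.

    Assumed weighting: 1 - (k - 1) / n, where k places out of n use the form.
    """
    counts = Counter(forms_by_place.values())
    n = len(forms_by_place)
    return 1 - (counts[form] - 1) / n


def weighted_similarity(data, place_a, place_b):
    """Average the identity weights over all variables recorded for both places.

    data maps each variable to a dict of place -> form.
    Variables on which the two places differ contribute zero.
    """
    total = 0.0
    compared = 0
    for forms_by_place in data.values():
        if place_a not in forms_by_place or place_b not in forms_by_place:
            continue  # assumption: skip variables with missing data
        compared += 1
        form_a = forms_by_place[place_a]
        if form_a == forms_by_place[place_b]:
            total += identity_weight(forms_by_place, form_a)
    return total / compared if compared else 0.0


# Hypothetical data: three variables recorded in four places.
data = {
    "var1": {"A": "x", "B": "x", "C": "y", "D": "y"},  # A and B share a rarer form
    "var2": {"A": "p", "B": "p", "C": "p", "D": "p"},  # everyone shares this form
    "var3": {"A": "m", "B": "n", "C": "n", "D": "n"},  # A and B differ
}

similarity_ab = weighted_similarity(data, "A", "B")
print(similarity_ab)
```

In this sketch the agreement of A and B on "var1" (a form used by only two of four places) receives a weight of 0.75, while their agreement on "var2" (used everywhere) receives only 0.25, so the infrequent form counts more heavily.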
