Similarity functions for sequences

Definitions
Given a symbol set (alphabet) $$\mathcal{L}$$, a sequence over $$\mathcal{L}$$ is a finite ordered collection of symbols of $$\mathcal{L}$$, $$s=s_1s_2\cdots s_n$$ with $$s_i\in\mathcal{L}$$ for all $$i=1,\ldots,n$$.

We call $$\mathcal{L}^*$$ the set of all sequences over $$\mathcal{L}$$, including the empty sequence.
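If $$\mathcal{L}$$ is non-empty, $$\mathcal{L}^*$$ is infinite, so it cannot be listed. It can still be enumerated by length, starting from the empty sequence. The following is a minimal Python sketch. It assumes that symbols are one-character strings and that a sequence is the string of its symbols. The function name `all_sequences` is hypothetical.

```python
from itertools import count, islice, product
from typing import Iterator

def all_sequences(alphabet: str) -> Iterator[str]:
    """Enumerate L* for an alphabet of one-character symbols, shortest first.

    The empty string plays the role of the empty sequence epsilon.
    For a non-empty alphabet the generator never stops, because L* is infinite.
    """
    if not alphabet:
        # L* over the empty alphabet contains only epsilon
        yield ""
        return
    for n in count(0):
        # product(..., repeat=n) produces all |L|**n sequences of length n
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

# Take only a finite prefix of the infinite enumeration
print(list(islice(all_sequences("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Sequences of the same length come out in alphabet order. Every sequence of $$\mathcal{L}^*$$ therefore appears after finitely many steps.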

Given $$s,t\in \mathcal{L}^*$$, we use the following notation:
 * $$\epsilon$$ is the empty sequence.
 * $$s+t$$ is the concatenation of $$s$$ and $$t$$.
 * $$|s|$$ is the length (number of symbols) of $$s$$.
 * $$s[i]$$ is the $$i$$-th symbol of $$s$$, for $$1\le i\le |s|$$.
 * $$s\sqsubseteq t$$ means that $$s$$ is a (contiguous) subsequence of $$t$$, i.e., there exist sequences $$u,v\in \mathcal{L}^*$$ such that $$t=u+s+v$$.
 * $$s=t$$ denotes equality of sequences, i.e., $$s\sqsubseteq t$$ and $$t\sqsubseteq s$$.
 * $$s\#t$$ is the number of times that the sequence $$s$$ appears in the sequence $$t$$.
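The notation maps directly onto Python strings when symbols are single characters. This is an assumption made for illustration; tuples would handle arbitrary symbols. The helper names `symbol`, `is_subsequence` and `occurrences` are hypothetical.

The text does not say whether overlapping occurrences count in $$s\#t$$. This sketch counts every start position, so overlaps are included. For that reason it avoids `str.count`, which counts only non-overlapping occurrences.

```python
def symbol(s: str, i: int) -> str:
    """s[i] with the 1-based indexing used in the text (Python indexing is 0-based)."""
    if not 1 <= i <= len(s):
        raise IndexError(f"position {i} is outside 1..{len(s)}")
    return s[i - 1]

def is_subsequence(s: str, t: str) -> bool:
    """s ⊑ t: some split t = u + s + v exists, i.e. s occurs contiguously in t."""
    return any(t[k:k + len(s)] == s for k in range(len(t) - len(s) + 1))

def occurrences(s: str, t: str) -> int:
    """s # t, counting every start position of s in t (overlapping occurrences included)."""
    return sum(1 for k in range(len(t) - len(s) + 1) if t[k:k + len(s)] == s)

eps = ""                         # the empty sequence epsilon
s, t = "ab", "abab"
print(s + t)                     # concatenation: ababab
print(len(t))                    # |t|: 4
print(symbol(t, 2))              # t[2]: b
print(is_subsequence(s, t))      # True, with u = "" and v = "ab"
print(occurrences("aa", "aaa"))  # 2, because the occurrences overlap
print(is_subsequence(s, t) and is_subsequence(t, s))  # False: s = t does not hold
```

For strings, `is_subsequence(s, t)` agrees with Python's built-in `s in t`. Under this counting convention, the empty sequence occurs $$|t|+1$$ times in $$t$$.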

Similarity and distance functions

 * Hamming similarity for sequences
 * Subsequence similarity
 * n-gram similarity
 * Edit distance
 * Monge-Elkan distance
 * Jaro distance
 * Jaro-Winkler distance
 * Jaccard similarity
 * Term frequency / Inverse Document Frequency (TFIDF)
 * Jensen-Shannon distance
 * Fellegi-Sunter distance
 * Recursive Monge-Elkan similarity
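The list above names the measures without defining them. The following Python sketch illustrates one of them, the edit distance. It assumes the common Levenshtein variant, in which insertion, deletion and substitution of a single symbol each cost 1. Other variants, such as weighted costs or transpositions, give different values. The function name `edit_distance` is hypothetical, and symbols are again assumed to be single characters.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of single-symbol insertions,
    deletions and substitutions (each of cost 1) turning s into t."""
    # prev[j] is the distance between the prefix of s processed so far and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, b in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            )
        prev = curr
    return prev[len(t)]

print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the dynamic-programming table are kept in memory. The running time is proportional to $$|s|\cdot|t|$$, and the extra space is proportional to $$|t|$$.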