Sogou's cell dictionaries use the scel format, which encodes both the Chinese characters and the pinyin in Unicode. An scel file consists of four parts: header information, a description of the dictionary, a table of pinyin syllables, and the word list. The word-entry data structure is reasonably well designed: each entry points into the pinyin table instead of repeating the pinyin text inside the entry, and homophones are merged under a single pinyin sequence to save space.
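The pinyin-pointer idea can be illustrated with a minimal in-memory sketch. It does not reproduce the real scel byte layout; the function names `build_index` and `expand` and the sample entries are hypothetical and exist only to show how a shared pinyin table plus homophone groups avoids repeating pinyin text.

```python
def build_index(entries):
    """Build a shared pinyin table and group homophones.

    entries: list of (word, [syllable, ...]) pairs.
    Returns (pinyin_table, groups), where groups maps a tuple of
    indexes into pinyin_table to the list of words with that reading.
    This is an illustrative structure, not the on-disk scel layout.
    """
    pinyin_table = []   # each distinct syllable is stored once
    syllable_pos = {}   # syllable -> its index in pinyin_table
    groups = {}         # (index, ...) -> [word, ...]
    for word, syllables in entries:
        key = []
        for syllable in syllables:
            if syllable not in syllable_pos:
                syllable_pos[syllable] = len(pinyin_table)
                pinyin_table.append(syllable)
            key.append(syllable_pos[syllable])
        # Homophones share one key, so their pinyin is stored only once.
        groups.setdefault(tuple(key), []).append(word)
    return pinyin_table, groups


def expand(pinyin_table, groups):
    """Resolve the pointers back into (word, [syllable, ...]) pairs."""
    result = []
    for key, words in groups.items():
        syllables = [pinyin_table[i] for i in key]
        for word in words:
            result.append((word, syllables))
    return result


sample = [
    ("事实", ["shi", "shi"]),
    ("实施", ["shi", "shi"]),
    ("时间", ["shi", "jian"]),
]
table, groups = build_index(sample)
```

In this sample the two homophones end up in one group keyed by the same pair of table indexes, so the syllable text "shi" appears once in the table no matter how many entries use it.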
The QQ Pinyin categorized dictionaries use the qpyd format, in which the list of entries is zip-compressed. A qpyd file consists of three parts: header information, a description of the dictionary, and the compressed entry list. Because of the compression, a qpyd file is smaller than a dictionary in the other formats when the number of entries is the same. Unlike Sogou's scel format, however, qpyd stores the pinyin alongside every entry, and the two fields use different encodings: the words are encoded in UTF-8, while the pinyin is encoded in Unicode.
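A compressed entry list of this kind can be sketched as follows. Several things here are assumptions rather than facts about qpyd: the record layout (two little-endian 16-bit lengths followed by the pinyin and the word) is hypothetical, Python's `zlib` module stands in for the compression the format uses, and "Unicode" is taken to mean UTF-16LE. The helper names `pack_entries` and `unpack_entries` are likewise made up for illustration.

```python
import struct
import zlib


def pack_entries(entries):
    """Serialize (word, pinyin) pairs and compress the result.

    Hypothetical record layout, NOT the real qpyd layout:
      uint16 pinyin byte length, uint16 word byte length,
      pinyin bytes (UTF-16LE, assumed), word bytes (UTF-8).
    """
    raw = bytearray()
    for word, pinyin in entries:
        pinyin_bytes = pinyin.encode("utf-16-le")
        word_bytes = word.encode("utf-8")
        raw += struct.pack("<HH", len(pinyin_bytes), len(word_bytes))
        raw += pinyin_bytes + word_bytes
    return zlib.compress(bytes(raw))


def unpack_entries(blob):
    """Decompress the blob and read back the (word, pinyin) pairs."""
    raw = zlib.decompress(blob)
    entries = []
    pos = 0
    while pos < len(raw):
        pinyin_len, word_len = struct.unpack_from("<HH", raw, pos)
        pos += 4
        pinyin = raw[pos:pos + pinyin_len].decode("utf-16-le")
        pos += pinyin_len
        word = raw[pos:pos + word_len].decode("utf-8")
        pos += word_len
        entries.append((word, pinyin))
    return entries


sample = [("事实", "shi shi"), ("实施", "shi shi"), ("时间", "shi jian")]
blob = pack_entries(sample)
restored = unpack_entries(blob)
```

Every record carries its own pinyin, so homophones repeat the same pinyin bytes in the uncompressed stream; the compression step is what recovers the space that scel saves through its shared pinyin table.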