In practice, the 20K and 60K most frequent words were used independently to produce two word-based trigram models.
Building a Test Collection for Speech-Driven Web Retrieval
Trigrams whose frequency was above the threshold were used for language modeling.
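The vocabulary-selection and cutoff steps above can be illustrated with a minimal sketch. It assumes pre-segmented sentences (lists of words), maps out-of-vocabulary words to a hypothetical `<unk>` token, and keeps only trigrams whose count exceeds a threshold. The function names and the threshold parameter are illustrative, not the authors' tooling. A real model would also apply back-off smoothing, which is omitted here.

```python
from collections import Counter

UNK, BOS, EOS = "<unk>", "<s>", "</s>"  # hypothetical special tokens


def build_vocabulary(sentences, vocab_size):
    """Keep the vocab_size most frequent words (e.g. 20,000 or 60,000)."""
    counts = Counter(w for s in sentences for w in s)
    return {w for w, _ in counts.most_common(vocab_size)}


def count_trigrams(sentences, vocab, threshold):
    """Count trigrams over vocabulary-mapped sentences and keep those
    whose frequency is above the threshold (no smoothing applied)."""
    trigrams = Counter()
    for s in sentences:
        # pad with two start symbols so the first word has a full context
        tokens = [BOS, BOS] + [w if w in vocab else UNK for w in s] + [EOS]
        for i in range(len(tokens) - 2):
            trigrams[tuple(tokens[i:i + 3])] += 1
    return {t: c for t, c in trigrams.items() if c > threshold}
```

Building the 20K and 60K models then amounts to calling `build_vocabulary` with two different sizes on the same corpus and counting trigrams separately for each vocabulary.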
Julius performs a two-pass (forward-backward) search using word-based forward bigrams and backward trigrams.
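To make the division of labor between the two language models concrete, the following is a heavily simplified sketch, not Julius's implementation. Julius's second pass is a stack-decoding search guided by first-pass scores. Here it is replaced by plain N-best rescoring over a given list of hypotheses. The hypothesis format, the `FLOOR` penalty for unseen n-grams, and `lm_weight` are all hypothetical.

```python
import math

# Hypothetical log-probability for unseen n-grams (stands in for real back-off).
FLOOR = math.log(1e-6)


def forward_bigram_score(words, bigram):
    """Log-probability under a forward bigram P(w_i | w_{i-1})."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(bigram.get((tokens[i - 1], tokens[i]), FLOOR)
               for i in range(1, len(tokens)))


def backward_trigram_score(words, trigram):
    """Log-probability under a backward trigram P(w_i | w_{i+1}, w_{i+2}).
    Keys are (w_i, w_{i+1}, w_{i+2}); each word is conditioned on the two
    words that follow it."""
    tokens = ["<s>"] + words + ["</s>", "</s>"]
    return sum(trigram.get(tuple(tokens[i:i + 3]), FLOOR)
               for i in range(len(tokens) - 2))


def two_pass_decode(hypotheses, bigram, trigram, n_best=3, lm_weight=1.0):
    """hypotheses: list of (words, acoustic_log_likelihood) pairs."""
    # Pass 1: rank by acoustic score plus forward bigram; keep the top n_best.
    first = sorted(
        hypotheses,
        key=lambda h: h[1] + lm_weight * forward_bigram_score(h[0], bigram),
        reverse=True,
    )[:n_best]
    # Pass 2: rescore the survivors with the more precise backward trigram.
    return max(
        first,
        key=lambda h: h[1] + lm_weight * backward_trigram_score(h[0], trigram),
    )
```

The design mirrors the general motivation for a two-pass search: the cheaper bigram prunes the search space, and the trigram is applied only to the surviving candidates.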
The language model is a word-based trigram model produced from the 60,000 most frequent words in 10 years of Mainichi Shimbun newspaper articles.
We also produced word-based trigram language models using approximately 10 million documents from the 100 GB collection used for the main task.
***