The CSLM toolkit is open-source software that implements the so-called continuous space language model (CSLM).
Holger Schwenk, LIUM, University of Le Mans, France
The basic idea of this approach is to project the word indices onto a continuous space and to use a probability estimator operating on this space. Since the resulting probability functions are smooth functions of the word representation, better generalization to unknown events can be expected. A neural network can be used to simultaneously learn the projection of the words onto the continuous space and to estimate the n-gram probabilities. This is still an n-gram approach, but the LM probabilities are interpolated for any possible context of length n-1 instead of backing off to shorter contexts. This approach has been used successfully in large-vocabulary continuous speech recognition and in phrase-based SMT systems.
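The forward pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the toolkit's code or API: the class name `TinyCSLM`, the toy vocabulary, the layer sizes, the tanh hidden layer, the omission of bias terms and the random (untrained) weights are all hypothetical choices made for the example.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyCSLM:
    """Hypothetical toy model: projection layer, tanh hidden layer, softmax output."""

    def __init__(self, vocab_size, order=3, proj_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.order = order
        # One continuous vector per word; shared by all context positions.
        self.projection = make_matrix(vocab_size, proj_dim, rng)
        self.hidden = make_matrix(hidden_dim, (order - 1) * proj_dim, rng)
        self.output = make_matrix(vocab_size, hidden_dim, rng)

    def predict(self, context):
        """Return P(w | context) for every word w in the vocabulary."""
        if len(context) != self.order - 1:
            raise ValueError("context must contain exactly n-1 word indices")
        # Project each context word index and concatenate the vectors.
        features = [x for index in context for x in self.projection[index]]
        hidden = [math.tanh(a) for a in matvec(self.hidden, features)]
        return softmax(matvec(self.output, hidden))


vocab = ["<s>", "the", "cat", "sat", "</s>"]
model = TinyCSLM(len(vocab))  # order=3, so the context has 2 words
context = [vocab.index("the"), vocab.index("cat")]
probs = model.predict(context)
for word, p in zip(vocab, probs):
    print(f"P({word} | the cat) = {p:.4f}")
```

Because the output layer is a softmax over the whole vocabulary, every word receives a non-zero probability for any context of length n-1, which is why no back-off to shorter contexts is needed. In a real system the weights, including the projection matrix, would be learned jointly by training on text.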
Detailed information is available in the following publications:
The development of the CSLM toolkit was partially financed by the European projects EuroMatrix and Matecat, the ANR project COSMAT and the DARPA project BOLT.
| Date | Description | File |
|---|---|---|
| June 28 2015 | Major update: bug fixes, better GPU code, more flexible network architectures and training, improved data handling | [cslm_v4.0.tgz] |
| Mar 26 2014 | Major update: bug fixes, GPU speed-up, support for rescoring HTK lattices, continuous space translation model, etc | [cslm_v3.0.tgz] |
| May 5 2014 | tutorial updated for V3 | [tutorial_r3.00.tgz] |
| Sep 11 2012 | part of the WMT'12 data for the tutorial (nc7, eparl7, newstest2010 and 2011) | |
| Jun 03 2012 | Major update: full support of short-lists, support of GPU cards | [cslm_v2.0.tgz] |
| Sep 11 2012 | small tutorial how to use the toolkit | [tutorial_v2.tgz] |
| Jan 27 2010 | Initial version of the CSLM toolkit. | [cslm_v1.0.tgz] |
The toolkit will be updated frequently. You can join the CSLM Google group to be informed of updates and bug fixes, or to discuss best usage.