The CSLM toolkit is open-source software that implements the continuous space language model (CSLM).

Holger Schwenk, LIUM, University of Le Mans, France

The basic idea of this approach is to project the word indices onto a
continuous space and to use a probability estimator operating on this space.
Since the resulting probability functions are smooth functions of the word
representation, better generalization to unknown events can be expected. A
neural network can be used to simultaneously learn the projection of the words
onto the continuous space and to estimate the n-gram probabilities. This is
still an n-gram approach, but the LM probabilities are *interpolated* for
any possible context of length n-1 instead of backing off to shorter contexts.
This approach has been used successfully in large-vocabulary continuous speech
recognition and in phrase-based statistical machine translation (SMT) systems.
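
The idea described above can be illustrated with a minimal sketch of the forward pass of a feed-forward n-gram language model. This is not the toolkit's code or API: the class name `TinyCSLM`, the layer sizes, the tanh hidden layer and the toy vocabulary are all assumptions chosen for illustration, and the weights are random and untrained, so the output is close to uniform. It only shows how context word indices are projected onto a continuous space and turned into a probability distribution over the next word.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def affine(weights, bias, x):
    """Compute weights * x + bias for a dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyCSLM:
    """Hypothetical toy model: projection layer, tanh hidden layer, softmax output."""

    def __init__(self, vocab_size, order=3, proj_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.order = order
        # One continuous vector per word; shared by all context positions.
        self.projection = make_matrix(vocab_size, proj_dim, rng)
        context_dim = (order - 1) * proj_dim
        self.w_hidden = make_matrix(hidden_dim, context_dim, rng)
        self.b_hidden = [0.0] * hidden_dim
        self.w_out = make_matrix(vocab_size, hidden_dim, rng)
        self.b_out = [0.0] * vocab_size

    def predict(self, context):
        """Return P(w | context) for every word w, given n-1 word indices."""
        if len(context) != self.order - 1:
            raise ValueError("context must contain exactly n-1 word indices")
        # Project each context word and concatenate the vectors.
        x = [value for index in context for value in self.projection[index]]
        hidden = [math.tanh(a) for a in affine(self.w_hidden, self.b_hidden, x)]
        return softmax(affine(self.w_out, self.b_out, hidden))


vocab = ["<s>", "the", "cat", "sat", "</s>"]
model = TinyCSLM(len(vocab))  # trigram model: two context words
probs = model.predict([vocab.index("the"), vocab.index("cat")])
for word, p in zip(vocab, probs):
    print(f"P({word} | the cat) = {p:.4f}")
```

Because the output layer always produces a full distribution, every context of length n-1 receives a probability for every word, with no back-off to shorter contexts. Training the projection and the other layers jointly is omitted here.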

Detailed information is available in the following publications:

- Holger Schwenk, *Continuous Space Language Models*, in Computer Speech and Language, volume 21, pages 492-518, 2007.
- Holger Schwenk, *Continuous Space Language Models For Statistical Machine Translation*, The Prague Bulletin of Mathematical Linguistics, number 83, pages 137-146, 2010.
- Holger Schwenk, Anthony Rousseau and Mohammed Attik, *Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation*, in NAACL workshop on the Future of Language Modeling, June 2012.
- Holger Schwenk, *Continuous Space Translation Models for Phrase-Based Statistical Machine Translation*, in Coling, Dec 2012.
- Holger Schwenk, *CSLM - A modular Open-Source Continuous Space Language Modeling Toolkit*, in Interspeech, August 2013.

The development of the CSLM toolkit was partially financed by the European projects EuroMatrix and Matecat, the ANR project COSMAT and the DARPA project BOLT.

| Version | Date | Description | Download |
|---|---|---|---|
| v4 (current version) | June 28 2015 | Major update: bug fixes, better GPU code, more flexible network architectures and training, improved data handling | [cslm_v4.0.tgz] |
| v3 | Mar 26 2014 | Major update: bug fixes, GPU speed-up, support for rescoring HTK lattices, continuous space translation model, etc. | [cslm_v3.0.tgz] |
| v3 | May 5 2014 | Tutorial updated for v3 | [tutorial_r3.00.tgz] |
| v2 | Sep 11 2012 | Part of the WMT'12 data for the tutorial (nc7, eparl7, newstest2010 and 2011) | [tutorial_v2_wmt12data.tgz] |
| v2 | Jun 03 2012 | Major update: full support of short-lists, support of GPU cards | [cslm_v2.0.tgz] |
| v2 | Sep 11 2012 | Small tutorial on how to use the toolkit | [tutorial_v2.tgz] |
| v1 | Jan 27 2010 | Initial version of the CSLM toolkit | [cslm_v1.0.tgz] |

The toolkit will be updated frequently. You can join the CSLM Google group to be informed of updates and bug fixes, or to discuss best usage.