Parallel corpus for Dutch and German
We provide a version of the Europarl corpus. From the 'source release' we generated a German-Dutch parallel corpus and built a web interface for easy access.
Europarl Corpus
Search the Europarl corpus (release v6, 02/2011)
The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It was prepared by Cameron Shaw Fordyce (CELCT), Josh Schroeder, and Philipp Koehn (both University of Edinburgh).
The corpus contains approximately 50 million words per language.
For a detailed description of this corpus, please read:
Europarl: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005 [PDF].
More information about the Europarl corpus can be found on the Europarl Website. This site also offers a download of the entire corpus.
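For readers who download the corpus instead of using the web interface, the following is a minimal sketch of how aligned sentence pairs might be read. It assumes the German and Dutch sides are stored as two plain-text UTF-8 files with one sentence per line and matching line numbers; that layout, the file names, and the function names `read_sentence_pairs` and `count_words` are illustrative assumptions, not part of any release described here.

```python
from itertools import zip_longest


def read_sentence_pairs(de_path, nl_path, encoding="utf-8"):
    """Yield (german, dutch) sentence pairs from two line-aligned files.

    Assumption: line i of de_path is the translation of line i of nl_path.
    Pairs in which either side is empty are skipped.
    """
    with open(de_path, encoding=encoding) as de_file, \
         open(nl_path, encoding=encoding) as nl_file:
        pairs = zip_longest(de_file, nl_file)
        for line_no, (de_line, nl_line) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None, which means
            # the two files are no longer aligned from this line on.
            if de_line is None or nl_line is None:
                raise ValueError(f"files differ in length at line {line_no}")
            de_sent, nl_sent = de_line.strip(), nl_line.strip()
            if de_sent and nl_sent:
                yield de_sent, nl_sent


def count_words(sentences):
    """Count whitespace-separated tokens over an iterable of sentences."""
    return sum(len(sentence.split()) for sentence in sentences)
```

With two such files, `for de, nl in read_sentence_pairs("sample.de", "sample.nl"): ...` would walk the corpus pair by pair (both file names are hypothetical), and `count_words` gives a rough per-language token count comparable to the figure quoted above.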