TextSTAT - Simple Text Analysis Tool / Concordance software

TextSTAT 3 - Simple Text Analysis Tool

Concordance software for Windows, GNU/Linux and MacOS

Screenshot: Concordances in TextSTAT 3

TextSTAT is a simple program for text analysis. It reads text files (in various encodings) and HTML files (also directly from the internet), and it creates word frequency lists and concordances from these files. TextSTAT has its own web crawler, with which you can compile any number of pages from a specific website into a TextSTAT corpus.
TextSTAT also reads PDF files, MS Word files and LibreOffice files. You can simply add the files to a corpus without any further conversion.
To search within the texts, you can use regular expressions, which offer a wide variety of powerful search capabilities. TextSTAT is designed to process texts in various languages: because the program uses Unicode internally, files in different languages and file encodings can be processed. The TextSTAT user interface can also be switched between multiple languages.
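To illustrate what a regular-expression search over Unicode text looks like, here is a minimal sketch in Python (the language TextSTAT is written in). It is not TextSTAT's own code: the function name `concordance`, the `width` parameter, and the sample sentence are assumptions made for this example.

```python
import re

def concordance(text, pattern, width=30):
    """Return (left context, keyword, right context) for every regex match."""
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

# Hypothetical German sample; in Python 3, \w also matches letters such as "ä".
sample = "Der Läufer läuft schnell. Die Läuferin lief gestern, heute läuft sie wieder."
hits = concordance(sample, r"\bl(?:äuf|ief)\w*", width=15)

for left, keyword, right in hits:
    print(f"{left:>15} [{keyword}] {right}")
```

A single pattern here finds several inflected forms ("Läufer", "läuft", "Läuferin", "lief") and shows each with a fixed window of context on both sides.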
The generated frequency lists and concordances can be exported as CSV files for further analysis and visualization.
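As a rough idea of what a word frequency list exported as CSV might look like, the following sketch counts word forms and serializes them with Python's standard `csv` module. The tokenization rule (lowercased `\w+` runs), the column names, and the function names are assumptions for illustration; TextSTAT's actual export format may differ.

```python
import csv
import io
import re
from collections import Counter

def frequency_list(text):
    """Count lowercased word forms (Unicode-aware in Python 3)."""
    return Counter(re.findall(r"\w+", text.lower()))

def to_csv(freq):
    """Serialize (word, count) rows as CSV text, most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "frequency"])  # hypothetical column names
    for word, count in sorted(freq.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([word, count])
    return buffer.getvalue()

freq = frequency_list("The cat saw the other cat. The end.")
csv_text = to_csv(freq)
print(csv_text)
```

The resulting text can be written to a `.csv` file and opened in a spreadsheet or statistics package.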




Download TextSTAT 3 (June 2024) now!

Documentation:

There is a quick manual in English that guides you through the installation process and the graphical user interface.

Disclaimer:

TextSTAT is free software. It may be used free of charge, and it may be freely distributed provided the copyright and the contents of all files, including TextSTAT.zip itself, are unmodified. Commercial distribution of the programme is only allowed with permission of the author. Use TextSTAT at your own risk; the author accepts no responsibility whatsoever. The source code version comes with its own license.

Found a bug or have other feedback?

Feedback regarding TextSTAT is always welcome!


(Comparative) reviews of TextSTAT:

  • Bennett, Gena R. (2010), Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers. Ann Arbor, Michigan: University of Michigan. pp. 144. ISBN 978-0-472-03385-0.
  • Aldo Benini (2010), Text Analysis under Time Pressure: Tools for humanitarian and development workers. Washington, DC.
  • Krajka, Jarosław (2007), Corpora and Language Teachers: From Ready-Made to Teacher-Made Collections. CORELL: Computer Resources for Language Learning 1, 36-55.
  • Daniel Wiechmann & Stefan Fuhs (2006), in: Corpus Linguistics and Linguistic Theory 2-1, 107-127.
  • Luciana Diniz (2005), in: Language Learning & Technology Vol. 9, No. 3, pp. 22-27.


Version history

TextSTAT 3.0.0

A lot has happened with the new major version of TextSTAT.

  • After years of stagnation, this is a big update and bugfix release, made possible by the collaboration with Max Kindler-Mathot.
  • Python 2 is no longer supported.
  • The GUI has been modernized (sv-ttk theme).
  • Database queries have been updated to allow more effective use of regular expressions (with sqlean).
  • Various conversion functions have been updated to use better packages (such as html-text, pypdf, docx).
  • The built-in web crawler has been repaired and modernized (with the help of requests, beautifulsoup, and html-text).
  • There are precompiled binary distributions for MacOS, Windows, and Linux (Debian).
  • Various smaller and larger bug fixes and other corrections have been implemented.
  • The underlying database format has been changed (TextSTAT now uses an SQLite database). This should have a positive impact, especially for larger corpora. Existing text corpora created with a TextSTAT 2 version can no longer be opened directly. However, they can be easily imported into a new corpus. The new standard extension for corpus names is .crp3. This allows you to easily distinguish between the different versions.
  • Files in the corpus can now be edited, which is especially helpful when loading files from the internet (for example, to remove advertisements and navigation elements). Simply click on the file name in the corpus tab.
  • PDF files can be added to a corpus.
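
The combination of an SQLite database and regular-expression queries mentioned above can be sketched as follows. This is only an illustration under stated assumptions: TextSTAT itself uses sqlean, whereas this sketch registers a user-defined `REGEXP` function with Python's standard `sqlite3` module, and the table `tokens` and its contents are hypothetical and do not reflect the .crp3 schema.

```python
import re
import sqlite3

def regexp(pattern, value):
    """Backs SQLite's REGEXP operator: `X REGEXP Y` calls regexp(Y, X)."""
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table; not TextSTAT's actual corpus layout.
conn.execute("CREATE TABLE tokens (word TEXT)")
conn.executemany(
    "INSERT INTO tokens VALUES (?)",
    [("corpus",), ("corpora",), ("concordance",), ("text",)],
)

rows = conn.execute(
    "SELECT word FROM tokens WHERE word REGEXP ? ORDER BY word",
    (r"^corp(us|ora)$",),
).fetchall()
matches = [row[0] for row in rows]
print(matches)
```

Running the pattern inside the query means only matching rows leave the database, which is one reason a database-backed corpus can help with larger collections.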

TextSTAT 2.9 (deprecated)

Version 2.9 of the programme is mostly a bug fix release. Don't expect new features. The TextSTAT interface is now available in the following languages: English, German, Dutch, Portuguese, Spanish, Catalan, French, Italian, Galician, Finnish (Suomi), Polish, Czech.

Download (binary version for MS Windows XP/Win7):
TextSTAT 2.9c for Windows (ZIP file, approx. 8 MB, Feb 20, 2014)

This version includes everything you need to use TextSTAT with Windows. It comes as a single ZIP file. To install the programme, just unpack the file to a directory of your choice. To run TextSTAT, change to that directory and double-click on 'TextSTAT.exe'. That's it. If you want a shortcut on the desktop or in the start menu, you'll have to create it yourself.
Uninstall: Since TextSTAT doesn't change your registry or other system components, you can safely delete the directory in which you installed the programme. After that, TextSTAT will be completely removed from your system.

Download (Python source code):
TextSTAT 2.9c source code (ZIP file, 150 KB, Feb 20, 2014)

TextSTAT is written in Python and should run wherever Python runs. It has been tested with Windows XP and Linux, and it also runs on MacOS X.
You will need to install Python in order to use the programme (TextSTAT.pyw). It runs with Python versions later than 2.5; the most recent of these is 2.7 (TextSTAT 2.9 will NOT work with Python 3). On Windows you could use the ActivePython distribution, which contains everything you need (Windows extensions, Tkinter). On Linux and Mac there are of course no Windows extensions, so export to MS Word and Excel is not available. Apart from that, TextSTAT should run just fine on GNU/Linux and MacOS X. All you need is an up-to-date Python distribution (preferably 2.7) with Tkinter installed (which is not always the case, especially on Mac systems).
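Whether Tkinter is present can be checked before starting the programme. The snippet below is a small sketch, not part of TextSTAT; the function name `tkinter_available` is an assumption. It tries the Python 2 module name first and falls back to the Python 3 name, so it runs on either interpreter.

```python
import sys

def tkinter_available():
    """Return True if the Tk bindings can be imported on this interpreter."""
    try:
        import Tkinter  # module name on Python 2
    except ImportError:
        try:
            import tkinter  # module name on Python 3
        except ImportError:
            return False
    return True

print("Python %d.%d, Tkinter available: %s"
      % (sys.version_info[0], sys.version_info[1], tkinter_available()))
```

If the check reports that Tkinter is missing, it usually has to be installed through the operating system's package manager or by choosing a Python distribution that bundles it.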


TextSTAT 1.5 (deprecated)

The old 1.52 version is still available: TextSTAT 1.52 for Windows (ZIP file, 2.3 MB). You can read the documentation for TextSTAT (outdated) online.


Questions, problems, suggestions? Please contact
Matthias Hüning, <textstat@niederlandistik.fu-berlin.de>