SpellCheckServlet

Mark D. Anderson (mda@discerning.com)
November 2004


File Summary
============

The files in this source directory are:

*.css, *.html, *.js
    The static client-side files, slightly modified from the Speller Pages
    distribution.

sources/
    Not required at build or run time (as long as you have aspell
    installed). Its main purpose is to keep an archive, for posterity, of
    the sources of the third-party components.

WEB-INF/src/SpellCheckServlet.java
    The server-side part that responds to the JavaScript client: a Java
    implementation of the Speller Pages server side. It invokes aspell.

WEB-INF/web.xml
    The J2EE servlet config. The servlet URL mapping here must match the
    action URL (spellCheckScript) configured in spellChecker.js. This is
    also where the path to aspell is configured.

jetty-spellcheck.xml
    A configuration file for use with the Jetty web server.


Component Summary
=================

This implementation integrates several third-party components:

1. Speller Pages ( http://spellerpages.sourceforge.net/ )
   Open source spell-checking project, which relies on aspell.

2. GNU ASpell ( http://aspell.sourceforge.net )
   Open source command-line spell checker. Comes with an American English
   word list.

3. WinterTree Medical Word List
   ( http://www.wintertree-software.com/dev/dict/wordlists.html )
   Commercial medical word list, which we use to complement the aspell
   dictionary.

The English common word list that comes with ASpell has about 132k words.
Coincidentally, the medical word list has about 133k words, but neither is
a superset of the other; each has many words that the other lacks.


Installation
============

To install:

1. Deploy the servlet web app, editing web.xml if you want to change the
   ASpell location.

2. Install ASpell, as described below. We have a Solaris binary that
   installs to /usr/local, which is where web.xml assumes it will be.

3. Install the medical dictionary into the data directory of the ASpell
   installation, as described below.

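As noted in the file summary, SpellCheckServlet invokes aspell, and the aspell path comes from web.xml. The servlet source is not reproduced here, so the following is only a hypothetical sketch of how a servlet could launch aspell from a configured path. It assumes the servlet uses aspell's ispell-compatible pipe mode (`aspell -a`) and falls back to /usr/local/bin/aspell when no path is configured; the class and method names are invented for illustration. In pipe mode, prefixing an input line with `^` tells aspell to check the rest of the line as text, so a line that happens to begin with a pipe-mode command character is not misread as a command.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the aspell command line and pipe-mode input. */
final class AspellCommand {

    // Assumed default, matching the /usr/local install this README describes.
    static final String DEFAULT_ASPELL = "/usr/local/bin/aspell";

    /** Returns the argv for aspell in ispell-compatible pipe mode ("-a"). */
    static List<String> command(String configuredPath) {
        String exe = (configuredPath == null || configuredPath.trim().isEmpty())
                ? DEFAULT_ASPELL
                : configuredPath.trim();
        List<String> cmd = new ArrayList<>();
        cmd.add(exe);
        cmd.add("-a");
        return cmd;
    }

    /** Prefixes every line with '^' so aspell checks it as text, not as a command. */
    static String pipeInput(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\r?\n")) {
            sb.append('^').append(line).append('\n');
        }
        return sb.toString();
    }

    /** Starts aspell; the caller writes pipeInput(...) to stdin and reads results from stdout. */
    static Process start(String configuredPath) throws IOException {
        return new ProcessBuilder(command(configuredPath)).start();
    }
}
```

A caller would write `pipeInput(text)` to the process's output stream, close it, and then read the result lines. Aspell prints a one-line version banner (beginning with `@(#)`) before any results, so that first line should be skipped.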
Speller Pages
=============

Speller Pages is an LGPL project hosted at http://spellerpages.sourceforge.net/

On the client side, it is HTML and JavaScript. On the server side, it
offers alternative CGI programs in several programming languages
(including Perl, CFM, and PHP) that call the command-line aspell. We have
written one in Java, which is the one in use.

We are using a slightly modified copy of Speller Pages version 0.5.1. The
original that we started with is located in:

    sources/spellerpages-0.5.1.tar.gz

Here is a description of the purpose of the files included with Speller
Pages:

index.html - sample application/demo page. Loads spellChecker.js.
    Contains a named and a button that calls openSpellChecker(), which is
    defined inline to create a new spellChecker object (with varargs
    strings to check) and call its openChecker() method. openChecker is
    defined in spellChecker.js.

spellChecker.js - loaded by index.html. This file is where the form
    handler URL is configured. It sets window.speller.

spellchecker.html - the popup, a frameset. It has inline JavaScript but
    loads no script files itself. It gets the form data through the
    opener.speller object.

blank.html - empty initial top frame in spellchecker.html. JavaScript in
    spellchecker.html overwrites it with content that loads
    spellerStyle.css (for the "in progress" message) and submits a form on
    load. The result of that submission is then shown; that result content
    loads spellerStyle.css and wordWindow.js.

wordWindow.js - used by the form submission result that replaces
    blank.html.

controls.html - bottom frame in spellchecker.html. Loads controlWindow.js
    and spellerStyle.css.

controlWindow.js - loaded by controls.html.

images/ - not needed. (index.html loads netjs-devel.gif.)

server-scripts/ - collection of server-side form handlers. We use our
    Java one instead.


GNU ASpell
==========

GNU ASpell is an LGPL project hosted at http://aspell.sourceforge.net/
(same as http://aspell.net/).

We are using an unchanged version 0.60.1 of ASpell, and version 6.0-0 of
the ASpell English word list. The original sources are located in:

    sources/aspell-0.60.1.tar.gz
    sources/aspell6-en-6.0-0.tar.gz

I have created a pre-built tar of all aspell files in:

    sources/aspell-0.60.1-solaris.tar.gz

It unpacks into subdirectories of /usr/local and must be untarred as root:

    root# gzcat sources/aspell-0.60.1-solaris.tar.gz | (cd /; tar xvf -)

Unless you already have gcc installed, you also need to install some gcc
shared libraries into /usr/local:

    root# gzcat sources/gccso.tar.gz | (cd /; tar xvf -)

Solaris binaries of aspell are also available from sunfreeware:

    root# wget http://ftp.sunfreeware.com/ftp/pub/freeware/sparc/8/aspell-0.50.5-sol8-sparc-local.gz
    root# gunzip aspell-0.50.5-sol8-sparc-local.gz
    root# pkgadd -v -d aspell-0.50.5-sol8-sparc-local

However, these are generally older than the version we have (at least at
the time of this writing).

Solaris machines may already have an ancient version (such as version 3)
installed, but it is best ignored.

If you want to build from source yourself, read the install instructions at:

    http://aspell.sourceforge.net/man-html/Generic-Install-Instructions.html

Briefly:

  build aspell:

    # ./configure --enable-compile-in-filters
      (avoids trying to use dynamically loaded input filters)
    # make
    # make install

  install dictionary:

    download the dictionary from
    ftp://ftp.gnu.org/gnu/aspell/dict/en/aspell6-en-6.0-0.tar.bz2
    (see also http://wordlist.sourceforge.net/)

    # bzcat aspell6-en-6.0-0.tar.bz2 | tar xvf -
    # cd aspell6-en-6.0-0/
    # ./configure
    # make
    # make install

Note that aspell requires many files; it isn't just a matter of having the
aspell executable and a few word list files.

Also, it is certainly possible to install aspell into a directory other
than /usr/local, but, like a lot of GNU packages, its final installation
directory is effectively fixed at compile time. So if someone wants it
installed somewhere else, it will have to be compiled from source.

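This README does not say how SpellCheckServlet reads aspell's output. If it uses ispell-compatible pipe mode (`aspell -a`, an assumption), each checked input line produces one result line per word followed by a blank line: `*` for a correct word, `& word count offset: sugg1, sugg2, ...` for a misspelling with suggestions, and `# word offset` for a misspelling with none. The hypothetical parser below turns one such block into a list of misses; the class names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for the result block aspell -a prints for one input line. */
final class AspellPipeParser {

    /** A misspelled word, its offset as reported by aspell, and any suggestions. */
    static final class Miss {
        final String word;
        final int offset;
        final List<String> suggestions;

        Miss(String word, int offset, List<String> suggestions) {
            this.word = word;
            this.offset = offset;
            this.suggestions = suggestions;
        }
    }

    /**
     * Parses the result lines for one input line (everything before the blank
     * terminator). Only '&' and '#' lines report misses; '*' and any other
     * result lines are ignored.
     */
    static List<Miss> parse(List<String> resultLines) {
        List<Miss> misses = new ArrayList<>();
        for (String line : resultLines) {
            if (line.startsWith("& ")) {
                // "& <word> <count> <offset>: <sugg1>, <sugg2>, ..."
                int colon = line.indexOf(": ");
                String[] head = line.substring(2, colon).split(" ");
                List<String> suggs = Arrays.asList(line.substring(colon + 2).split(", "));
                misses.add(new Miss(head[0], Integer.parseInt(head[2]), suggs));
            } else if (line.startsWith("# ")) {
                // "# <word> <offset>": misspelled, no suggestions
                String[] head = line.substring(2).split(" ");
                misses.add(new Miss(head[0], Integer.parseInt(head[1]), Collections.emptyList()));
            }
        }
        return misses;
    }
}
```

The sketch assumes well-formed aspell output; a production version would also guard against a missing `": "` separator.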
WinterTree Medical Word List
============================

We purchased a medical word list from WinterTree, for $499. See:

    http://www.wintertree-software.com/dev/dict/wordlists.html

The sources are:

sources/mewordlist.zip - the downloaded product
sources/qbemail90.pdf - the invoice
sources/cleanwords.pl - a utility I wrote to convert the word list for use by ASpell
sources/en-wtmedical.rws - the ASpell word list (binary) created from the medical word list files
sources/ma.txt - a single file containing the American English medical words (ASCII)

To install the medical dictionary on your host:

    # cp sources/en-wtmedical.rws /usr/local/lib/aspell-0.60/

(This assumes that aspell's files are installed in that directory, as they
are with the /usr/local build described above.)
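The README does not describe what sources/cleanwords.pl actually does, so the following is a hypothetical Java word-list cleaner, not a port of that script. It assumes the goal is a one-word-per-line list that aspell's English dictionary will accept: it trims whitespace, drops blank lines, keeps only ASCII letters with optional internal apostrophes (a conservative assumption about which characters aspell allows in English words), and removes exact duplicates while preserving order. Per the aspell manual, such a list can be compiled into a binary word list with `aspell --lang=en create master ./en-wtmedical.rws < ma.txt`; whether en-wtmedical.rws was built exactly that way is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical word-list cleaner; NOT a port of cleanwords.pl, whose logic is undocumented. */
final class WordListCleaner {

    // Assumption: letters, optionally joined by single internal apostrophes (e.g. "Addison's").
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+('[A-Za-z]+)*");

    static List<String> clean(List<String> rawLines) {
        Set<String> seen = new LinkedHashSet<>();   // drops exact duplicates, keeps first-seen order
        for (String raw : rawLines) {
            String w = raw.trim();
            if (!w.isEmpty() && WORD.matcher(w).matches()) {
                seen.add(w);                         // case-sensitive: "Abdomen" and "abdomen" both kept
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Dropping hyphenated or digit-bearing terms (such as "x-ray") is a deliberate simplification in this sketch; a real cleaner for a medical list might split or otherwise preserve them.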