I write enough text in Emacs that it's worth using a spellchecker. As I started comparing options I ended up spellchecking all my existing articles on this website. While I didn't get everything, I fixed 475 typos across 222 articles in a few hours, and built up a custom dictionary of my most common words.
The Emacs manual on spelling says it supports Hunspell, Aspell, Ispell and Enchant. For English the choice is between Hunspell and Aspell; Ispell is a predecessor or Aspell, and Enchant wraps other spelling libraries (most notably specific ones for Turkish and Finnish). Running some examples I found Aspell gave much better suggestions so I decided to go with that.
For spell checking existing posts I found a guide from ThorneLabs to output all the words Aspell marks as misspelled, which I adapted a little.
for POST in content/post/* do cat $POST | aspell list --ignore 2 --mode markdown --camel-case -l en_AU done | sort | uniq -c | sort -nr | sed -E 's/^ *[0-9]+ *//' > words.txt
The flags are:
--ignore 2ignore any words of 2 or less characters
--camel-caseaccept CamelCased words as valid spellings (occurs a lot in some articles like job posting schemata)
-l en_AUset the language to Australian English
--mode markdowntreat the files as markdown
The pipes at the end are a shell transformation to sort the words Aspell by frequency, and output it to a file called
I could then go through this file and sort out valid words not in Aspell's dictionary from actually misspelled words.
Sometimes I would need to check the context using
grep -r '\bmodularity\b' content/post, or check Aspell's suggestions with
echo modularity | aspell -a.
I ended up with a list of words that I use frequently that needed to be added to my personal dictionary at
Unfortunately I had to remove any phrases (such as ad hoc), words with hyphens (such as Gell-Mann), or unicode characters (such as Dieudonné or Möbius) from the dictionary or Aspell would complain.
I ended up with a second list of words that I'd misspelled frequently, and could run back through aspell to correct.
Apparently I commonly misspelled words like hierarchy, manageable, metres, occurred, focusing, ridiculous, and inheritance.
Another thing I get wrong is capitalisation of name like GitHub, TensorFlow, or Emacs.
I fixed the most common ones and ones that were obviously wrong with sed, then viewed the diff in Magit (using
magit-diff-refine-hunk to highlight the changed words as in the cover image).
There were a couple cases I needed to remove, like changes in URLs or quotes, but the majority changes were correct.
I then could add the dictionary to my dotfiles, and use it in Emacs through the
ispell commands or
flyspell commands (which I still need to configure).
I'm not the only person to have this idea and it's interesting to see other people's personal dictionaries.