storylen is a command-line utility that counts the words in files and classifies them into story types (short story, novella, novel, and so on). Its output is very similar to that of the *nix program wc.
I read a lot of fiction and I've been actively trying to read as much of it as possible on PDAs or similar portable devices for almost ten years now. Over time, I've come to reject book formats that are encrypted or otherwise consumer-unfriendly. This has gotten to the stage where I'm most comfortable with books as plain text files.
At this point I have an impressive library of literary works, and I needed a way to quickly assess the size of a book when deciding what to grab. Knowing that there are semi-accepted word count ranges corresponding to story sizes, I decided to write a small program for the job in what's become my favorite programming language.
storylen is written in Haskell and uses the Data.ByteString library. It's known to work with GHC 6.6.
Example of program output:
    174372  novel          ArnasonEleanor-AWomanOfTheIronPeople_1991.txt
     76626  novel          BearGreg-Psychlone_1979.txt
      6073  short story    EllisonHarlan-IHaveNoMouth_1967.txt
     60596  novel          FantasyScienceFiction2007-02.txt
     72324  novel          HorneMarc-TokyoZero_2003.txt
      9156  novellette     KellyJamesPatrick-StandingInLineWithMisterJimmy_1991.txt
      1062  flash fiction  PoeEdgarAllen-Raven.txt
     11616  novellette     RickertM-JourneyIntoTheKingdom_2006.txt
     14334  novellette     RuschKristineKathryn-Echea_1998.txt
     28990  novella        StrossCharles-TheConcreteJungle_2004.txt
From my brief research into the issue, I gathered that there are arguments over how exactly to word-count literature. The typing criteria from the Wikipedia article on word count seemed sufficient for my needs. This program does a literal word count of each file, à la `wc -w`.
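The literal count described above can be sketched in a few lines of Haskell with Data.ByteString. This is only an illustration of the idea, not storylen's actual source; the names `countWords` and `countFile` are made up for the example, and it assumes plain ASCII input where words are separated by whitespace.

```haskell
import qualified Data.ByteString.Char8 as B

-- Count whitespace-separated words, much as `wc -w` does for ASCII text.
-- (Hypothetical helper, not taken from storylen itself.)
countWords :: B.ByteString -> Int
countWords = length . B.words

-- Read a file strictly and pair its word count with its path.
-- (Hypothetical helper, not taken from storylen itself.)
countFile :: FilePath -> IO (Int, FilePath)
countFile path = do
  contents <- B.readFile path
  return (countWords contents, path)
```

`B.words` drops empty tokens, so runs of spaces and blank lines do not inflate the count.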
For those who are curious about acquiring non-DRM-encumbered reading material, allow me to direct you to these places:
storylen's --help usage information:
    Usage: storylen [OPTIONS] [FILES]
    Show story word count and categorization. This is intended to be run on plain ascii text files.
    With no FILES, read from standard input.

    Options:
      -n NUM  --number=NUM  Categorize the explicit number of words given
      -h      --help        This help text

    Story categories are determined using the following table, found on Wikipedia:
    http://en.wikipedia.org/wiki/Word_count

      <= 2000      - flash fiction
      <= 7500      - short story
      <= 17500     - novellette
      <= 60000     - novella
      <= 199999    - novel
      above 199999 - epic

    Version 006  2007-Apr-02  Dino Morelli <dino@ui3.info>
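The category table in the help text maps naturally onto a guarded function. The following is a minimal sketch under that assumption, not the program's real implementation; `categorize` is a name invented for the example, and the category strings are spelled exactly as the program prints them.

```haskell
-- Map a word count to a story category using the upper bounds
-- listed in the help text. (Hypothetical function, not storylen's own code.)
categorize :: Int -> String
categorize n
  | n <= 2000   = "flash fiction"
  | n <= 7500   = "short story"
  | n <= 17500  = "novellette"
  | n <= 60000  = "novella"
  | n <= 199999 = "novel"
  | otherwise   = "epic"
```

Guards are tried top to bottom, so each bound only needs to state its upper limit. With this sketch, `categorize 28990` evaluates to `"novella"`, which agrees with the sample output for the 28990-word file.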
Debian binary package for i386 architecture.
last modified 2007-05-14 23:31