
storylen

about

storylen is a command-line utility that counts the words in files and classifies each file by story type (short story, novella, novel, and so on). Its output is very similar to that of the *nix program wc.

I read a lot of fiction and I've been actively trying to read as much of it as possible on PDAs or similar portable devices for almost ten years now. Over time, I've come to reject book formats that are encrypted or otherwise consumer-unfriendly. This has gotten to the stage where I'm most comfortable with books as plain text files.

At this point I have an impressive library of literary works, and I needed a way to quickly assess the size of a book when deciding what to grab. Knowing that there are semi-accepted word count ranges corresponding to story sizes, I decided to write a small program for the job in what's become my favorite programming language.

storylen is written in Haskell and uses the Data.ByteString library. It's known to work with GHC 6.6.

Example of program output:

174372 novel         ArnasonEleanor-AWomanOfTheIronPeople_1991.txt
 76626 novel         BearGreg-Psychlone_1979.txt
  6073 short story   EllisonHarlan-IHaveNoMouth_1967.txt
 60596 novel         FantasyScienceFiction2007-02.txt
 72324 novel         HorneMarc-TokyoZero_2003.txt
  9156 novellette    KellyJamesPatrick-StandingInLineWithMisterJimmy_1991.txt
  1062 flash fiction PoeEdgarAllen-Raven.txt
 11616 novellette    RickertM-JourneyIntoTheKingdom_2006.txt
 14334 novellette    RuschKristineKathryn-Echea_1998.txt
 28990 novella       StrossCharles-TheConcreteJungle_2004.txt
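The column layout above can be reproduced with a single printf format string. This is an illustrative sketch, not storylen's source: `formatLine` is a hypothetical helper, and it assumes a fixed six-character right-aligned count column (wide enough for the counts shown) and a category column left-justified to 13 characters, the length of "flash fiction".

```haskell
import Text.Printf (printf)

-- Hypothetical helper: render one output line in the layout shown above.
-- Assumption: the count is right-aligned in 6 columns and the category is
-- left-justified in 13 columns, followed by a space and the file name.
formatLine :: Int -> String -> FilePath -> String
formatLine count category path = printf "%6d %-13s %s" count category path

main :: IO ()
main = mapM_ putStrLn
  [ formatLine 174372 "novel" "ArnasonEleanor-AWomanOfTheIronPeople_1991.txt"
  , formatLine 6073 "short story" "EllisonHarlan-IHaveNoMouth_1967.txt"
  , formatLine 1062 "flash fiction" "PoeEdgarAllen-Raven.txt"
  ]
```

A real wc-style tool might size the count column to the largest count instead of fixing it at six characters; the fixed width just keeps the sketch short.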

From my brief research into the issue, I gathered that there are arguments over exactly how to word-count literature. The typing criteria from the Wikipedia Word_count article (http://en.wikipedia.org/wiki/Word_count) seemed sufficient for my needs. This program does a literal word count of each file, à la `wc -w`.
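A literal word count in the style of `wc -w` amounts to splitting the input on whitespace and counting the pieces. Here is a minimal sketch using Data.ByteString.Char8, assuming plain ASCII input; `countWords` is a hypothetical name and this is not storylen's actual code. Char8's `words` treats bytes such as space, tab and newline as separators, so the counts should match `wc -w` on plain ASCII text, though they may differ on files in other encodings.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Environment (getArgs)

-- Count maximal runs of non-whitespace bytes, roughly what `wc -w` reports.
countWords :: B.ByteString -> Int
countWords = length . B.words

main :: IO ()
main = do
  paths <- getArgs
  -- With no FILES, read standard input, as the usage text describes.
  inputs <- if null paths
              then fmap (: []) B.getContents
              else mapM B.readFile paths
  mapM_ (print . countWords) inputs
```

Saved as, say, WordCount.hs, it can be run with `runghc WordCount.hs book.txt` or fed text on standard input.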

For those who are curious about acquiring non-DRM-encumbered reading material, allow me to direct you to these places:

news

2007-05-14
Redesigned the QuickCheck testing code.
2007-04-30 (v009)
Refactored code substantially to make it more pure.
Added some unit testing with QuickCheck.
2007-04-09
Added Debian binary package (see binaries section below).
2007-04-04 (v008)
Initial release.

documentation

storylen's --help usage information:

Usage: storylen [OPTIONS] [FILES]
Show story word count and categorization. This is intended to be run on
plain ascii text files. With no FILES, read from standard input.

Options:
  -n NUM  --number=NUM  Categorize the explicit number of words given
  -h      --help        This help text

Story categories are determined using the following table, found on
Wikipedia: http://en.wikipedia.org/wiki/Word_count

   <= 2000 - flash fiction
   <= 7500 - short story
   <= 17500 - novellette
   <= 60000 - novella
   <= 199999 - novel
   above 199999 - epic

Version 006  2007-Apr-02  Dino Morelli <dino@ui3.info>
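The category table maps directly onto a chain of guards, each bound inclusive. This is a sketch of one way to express it, not necessarily how storylen does it; `StoryType`, `classify` and `label` are hypothetical names. The labels use the spelling storylen prints ("novellette").

```haskell
-- Hypothetical type for the categories in the table above.
data StoryType = FlashFiction | ShortStory | Novelette | Novella | Novel | Epic
  deriving (Eq, Show)

-- Each bound is inclusive, matching the "<=" rows; anything above 199999 is an epic.
classify :: Int -> StoryType
classify n
  | n <= 2000   = FlashFiction
  | n <= 7500   = ShortStory
  | n <= 17500  = Novelette
  | n <= 60000  = Novella
  | n <= 199999 = Novel
  | otherwise   = Epic

-- Category text as it appears in storylen's output.
label :: StoryType -> String
label FlashFiction = "flash fiction"
label ShortStory   = "short story"
label Novelette    = "novellette"
label Novella      = "novella"
label Novel        = "novel"
label Epic         = "epic"

main :: IO ()
main = mapM_ (putStrLn . label . classify) [1062, 6073, 9156, 28990, 174372, 250000]
```

This is also all that `-n NUM` needs: the explicit number skips the counting step and goes straight to `classify`.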

source

binaries

Debian binary package for i386 architecture.



last modified 2007-05-14 23:31