Project Puritas Corpus "Readability Scale"

Not open for further replies.


Hello all. I have been really busy this past year, so worried about perfection that a few days ago I wondered whether I had been blind to what has already been provided. This led me to find the word counts for all the works in the EEBO-TCP/Project Puritas Corpus and compare them to the number of non-spelling errors in each document. That gave me a kind of readability rating for each work: the percentage of the text that contains errors. I then made a list ordering the works from the lowest error percentage to the highest. I was very surprised. The main reason is that Logan has made these works (plus the ones he has been adding) downloadable in ePub and easy to load onto eReading devices. That corpus can be downloaded here. The thing is, many, if not most, of these texts will probably not be professionally republished in our lifetimes, if ever. And I know that personally I can spend so much time striving for perfection when many of these texts are already 10X more readable than the facsimiles. But this is what I found.

There were 2,389 works with 0.10% errors or less.
There were 4,008 works with 0.26% errors or less.
There were 5,088 works with 0.50% errors or less.
There were 5,912 works out of 6,654 with 1% errors or less.
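
For anyone who wants to reproduce this kind of tally, here is a rough sketch of the arithmetic as I understand it: error percent is the non-spelling error count divided by the word count, times 100, and the list is sorted from lowest to highest. The `Work` record, the function names, and the sample numbers are all made up for illustration; they are not taken from the actual spreadsheet or from any tool used to build it.

```python
from typing import NamedTuple


class Work(NamedTuple):
    """Hypothetical record for one text in the corpus."""
    title: str
    word_count: int
    error_count: int  # non-spelling errors only


def error_percent(work: Work) -> float:
    """Percentage of the text with errors: errors / words * 100."""
    if work.word_count <= 0:
        raise ValueError(f"{work.title!r} has no words to measure")
    return 100.0 * work.error_count / work.word_count


def rank_by_readability(works: list[Work]) -> list[Work]:
    """Order works from the lowest error percentage to the highest."""
    return sorted(works, key=error_percent)


def count_at_or_below(works: list[Work], threshold_percent: float) -> int:
    """Count the works whose error percentage is at or under the threshold."""
    return sum(1 for w in works if error_percent(w) <= threshold_percent)


# Invented sample data, NOT real corpus figures.
sample = [
    Work("Work D", 8_000, 72),    # 0.90%
    Work("Work A", 50_000, 25),   # 0.05%
    Work("Work E", 5_000, 150),   # 3.00%
    Work("Work B", 20_000, 40),   # 0.20%
    Work("Work C", 10_000, 45),   # 0.45%
]

if __name__ == "__main__":
    for work in rank_by_readability(sample):
        print(f"{work.title}: {error_percent(work):.2f}% errors")
    for threshold in (0.10, 0.26, 0.50, 1.0):
        n = count_at_or_below(sample, threshold)
        print(f"{n} of {len(sample)} works with {threshold:.2f}% errors or less")
```

Note that the counts are cumulative, just like the figures above: a work with 0.05% errors is counted under every threshold.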

My personal plan is to focus on reading the ones with a quarter percent of errors or less. But whoa. They say that only about 800-1000 Puritan books have been republished. 4,000 is at least 4 times that number! 4 times the selection and possible edification. Below is a link to a spreadsheet that lists all the non-spelling errors and the error percentages for the Puritan & Non-Conformist EEBO-TCP/Project Puritas Corpus. It is a great tool and overview of each text: what to expect error-wise and what to expect readability-wise.