6,000 Puritan Works Available for Free

Status
Not open for further replies.

Logan

Puritan Board Graduate
@davejonescue has been hinting toward pieces of this for a while but the three of us (Dave, Logan, and Alex) have been working somewhat in secret for a while and now it's finally time to share the fruits of our labor.

Many people are aware of the Early English Books Online website, which transcribed, character by character, early English books, among which Dave was able to find about 860 Puritans with around 6,000 works.

One downside (for many purposes) is the non-standard spelling of the era:

WINWORD_F7MSspdeuv.png

What if we could automate the correction of these? Alex had already started doing this on his own projects and had compiled a list of about 6,000 words and their corrections. With this basis, I wrote a script to identify an additional 17,000 of the most commonly occurring non-standard spellings, and then Alex and I painstakingly assigned them corrections.
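For anyone curious what that identification step might look like, here is a minimal sketch in Python. It is not the actual script used for the project: the function name `count_unknown_words`, the tiny word set, and the sample sentences are all made up for illustration. The idea is simply to count every word that is missing from a modern word list, so the most frequent ones can be reviewed first.

```python
import re
from collections import Counter

# Matches runs of letters, allowing internal apostrophes (e.g. "o'er").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def count_unknown_words(texts, known_words):
    """Count how often each word missing from known_words occurs across texts."""
    known = {w.lower() for w in known_words}
    counts = Counter()
    for text in texts:
        for word in WORD_RE.findall(text):
            lowered = word.lower()
            if lowered not in known:
                counts[lowered] += 1
    return counts

# Hypothetical sample data; a real run would load a full modern word list
# and every transcribed work.
known_words = {"the", "of", "god", "is", "and", "grace", "our", "only", "hope"}
texts = [
    "The grace of God is our onely hope.",
    "Onely the mercie of God, and the grace of God.",
]

unknown = count_unknown_words(texts, known_words)
# Most frequent candidates first, ready for a person to assign corrections.
for word, count in unknown.most_common(10):
    print(word, count)
```

Sorting by frequency means the manual effort goes to the spellings that clean up the most text.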

Then I ran all 860 authors, 6,000 works through another script I'd written and the result compared to above looks like this:

WINWORD_9LPjRDtSdZ.png
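As a rough illustration of the correction pass (again a sketch under assumptions, not the project's real script), the snippet below replaces whole words from a correction table and keeps the capitalisation of the original. The names `apply_corrections` and `match_case`, and the three sample entries, are hypothetical.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+")

def match_case(original, replacement):
    """Carry the capitalisation of the original word over to its replacement."""
    if original.isupper() and len(original) > 1:
        return replacement.upper()
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement

def apply_corrections(text, corrections):
    """Replace every whole word found in corrections, leaving other words alone."""
    def fix(match):
        word = match.group(0)
        replacement = corrections.get(word.lower())
        if replacement is None:
            return word
        return match_case(word, replacement)
    return WORD_RE.sub(fix, text)

# Hypothetical entries; the project's real list holds roughly 23,000 pairs.
corrections = {"onely": "only", "mercie": "mercy", "vnto": "unto"}

print(apply_corrections("Onely the mercie of God leadeth vs vnto life.", corrections))
```

Words that are not in the table (such as "vs" here) pass through unchanged, which is why some non-standard spellings remain in the cleaned texts.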

So all of these Puritan works are now significantly "cleaner" than they were. We packaged these up into a customized application and would now like to reveal "Puritan Search", a free application for searching through and utilizing all of these documents. Not all are equally useful, but there are some real gems now available to the general public, and the application is available right now, for free, on Windows, Mac, and Linux:
www.puritansearch.org








....But wait, there's more! In addition to the free searching application we have created above, and in the interest of making these available as a cleaner base text to readers or publishers, I have also converted each of these to PDF, EPUB, and Word. I hope that despite the lack of perfection, these will save a lot of effort by current or aspiring publishers, while the layman has immediate access to something that is serviceable for immediate reading.

chrome_f95cOYN6tR.png

These are available here:
https://sites.google.com/view/project-puritas/home

This was a team effort that took months and months to make happen, and we pray it will be a blessing to the church worldwide for years to come. By all means share and spread the word.
 
There really is so much you can do with the Puritan Search software. Several things differentiate it from previous software and currently available indexes.

First, compared with the Puritan Hard Drive, what sets Puritan Search apart is its cost: it is absolutely free. Pastors are under no obligation to pay for a copy for each congregant if they choose to distribute it among their congregations. Missionaries do not have to pay for multiple copies to give to the pastors and people they are training in underdeveloped nations. Professors and students do not need to pay to use the software in their research. Laymen and laywomen do not need to pay to use it. Also, since this software is powered by corrected, hand-transcribed texts instead of OCRed facsimiles, the search results are far more numerous, and many times more accurate, than what the Puritan Hard Drive could provide. Nor is this a combined collection of several media types from several Reformed segments of church history; it deals only with Puritan and Non-Conformist texts from 1550-1700. Since facsimiles are almost impossible to OCR (which is what prompted the TCP to begin with), this software can't help but give better results.

Second, the only other database on the Puritans that I know of is PRTS's Perkins Library Puritan Studies Index. Compared with it, Puritan Search is powered by primary source material instead of secondary source material. In Puritan Search, no document is off limits, while many of the texts in PRTS's Index require institutional approval. Basically, the PRTS Index is others writing about the Puritans, while Puritan Search is the Puritans' writings themselves. This isn't to throw shade at PRTS's Index, for both are needed, but to clarify that this is different in that it uses only primary source material.

www.puritansearch.com
 
That’s great! If you wanted to update to modern English, it would be interesting to see how ChatGPT would handle the task: https://community.openai.com/t/is-there-an-api-for-chatgpt3/23871/5
The problem with chat auto-correcting to modern language is that it has a limit on how much it will convert. It would be a very tedious task to go 10 pages at a time. We are talking about over 500,000 pages of text. In my opinion, it would be much easier to have someone like Logan, who is a tech whiz, create a script based on a contemporary dictionary, run the works through it, produce a list of words not included, and then create a correction list like the one used in correcting the EEBO-TCP docs initially. Then re-run a script, replacing those words with their contemporary counterparts over the entire corpus. It's like one of Logan's favorite sayings: "don't do manually what you can automate."
 
You’re right: the chatbot would be very tedious, hence I mentioned the API, which could be automated. The API for the chatbot’s model (I think chatgpt-3?) is just now being released. I think previous versions are available.
 

So I actually do sort of have a good chunk of that already. As I identified words, I also created a "supplemental dictionary" which was a list of correctly spelled, but archaic words. "Sufficeth", "Betrayeth", etc. So you'd just need to go through and add a suitable replacement to each of these that are already identified and then run the script again.
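To show what that remaining step could look like, here is a small Python sketch. It assumes the supplemental dictionary is a plain list of archaic words and that replacements are kept in a separate table; both layouts, the function name `missing_replacements`, and the sample entries are hypothetical and not the project's actual files.

```python
def missing_replacements(supplemental_words, replacements):
    """Return archaic words that do not yet have a modern replacement, sorted."""
    archaic = {word.lower() for word in supplemental_words}
    return sorted(word for word in archaic if not replacements.get(word))

# Hypothetical data: correctly spelled but archaic words already identified,
# and the replacements someone has filled in so far.
supplemental_words = ["Sufficeth", "Betrayeth", "Hath"]
replacements = {"sufficeth": "suffices", "hath": "has"}

# The remaining worklist for a person to go through.
print(missing_replacements(supplemental_words, replacements))
```

Once the worklist is empty, the replacement table can be fed to the same kind of whole-word substitution pass used for the spelling corrections.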
 
This project is simply amazing. I thank the LORD for this work. What a gift you have given to the church.
 
I attempted to download the works and received the following error message - is it because the file size is too large?

1675442928227.png
 
This is a slightly more detailed video of the program's capabilities, as we wanted to keep the general introduction short and sweet.

 
Is the Puritan Hard Drive out of business with this?
I wouldn't think so. The original PDFs are still very valuable as source and while there is some overlap between works, I definitely know of some on the PHD that are not part of this.

But if you've ever tried to use PHD, it's extremely clunky and not really anything like this. So the perpetual "only 24 hours left on sale" and website from 1995 will still likely be around for a while :D
 
Is the Puritan Hard Drive out of business with this?
I do not know. From what I could research, their 10,000+ resources comprised only about 2,500 texts, with the rest being audio and video. For some, a neatly indexed collection of Reformed sermons, conferences, lectures, and videos may be worth the $200 they ask after their weekly sales. This resource is somewhat different, in that it is only text, about 6,000 works in total, and it deals only with Puritan and Reformed Non-Conformists from the period 1550-1700. Also, there are no facsimiles included, which makes all 500,000+ pages or so of text entirely searchable.

What we really aimed to do with this software is offer an index that could be distributed globally, at no charge. Even the Encyclopedia Purtanica, a software package with about 250 fully searchable works, is $99 a copy. For a small church that wants to bless its congregation with copies, for missionaries, for pastors in undeveloped nations, for students, and so on, that can add up quickly. Our goal is to offer Puritan literature to all, in a way that none would be hindered by cost. For a body of work that is almost entirely in the public domain, it has become quite the market. We in the West hardly notice, since much of it is fairly low cost relative to our median wage, but for others in the world a single Puritan Paperback can seem like a treasure. We wanted not only to create something that could benefit extended research into Puritan thought, by making it possible to glean from their collective mindset on any given topic, but also to provide a "complete" library for those in other countries who would otherwise miss out on these works because of 1. reprints being too expensive for them, 2. the lack of institutional access to the original facsimiles, and 3. the extreme expense, even for a generous Westerner, of shipping enough books to somewhere like Africa or India for someone to build a considerable library, let alone on a mass scale.

With this tool, not only can the searchable index fit on a standard 16 GB flash drive, but so can the full body of works in EPUB, PDF, HTML, and DOCX format. Both can also travel digitally and forgo shipping costs, since they are downloadable over the web. So we are extending the work EEBO-TCP started, and again offering it to the world free of charge, with no other hope but the glorification of God and the edification of his church.

This is not in any way to tarnish or degrade those who put in the long hours and hard work to bring us cleanly published Puritan works; we are greatly indebted to them. Rather, it is to advance the study of Puritanism in a way that has not yet been available, and to make Puritan works available to those who, because of economic strain, could not access reprints and contemporary published editions anyway.
 
I wouldn't think so. The original PDFs are still very valuable as source and while there is some overlap between works, I definitely know of some on the PHD that are not part of this.

But if you've ever tried to use PHD, it's extremely clunky and not really anything like this. So the perpetual "only 24 hours left on sale" and website from 1995 will still likely be around for a while :D
Haha I know the sales gimmick you are talking about.
 
That week-long sale is 20 years old and still going strong!
As a child, I used to wonder: "How do these companies know when their TV commercials are playing in order to make sure I didn't call more than five minutes after it ended?"
 
Thank you all. It really is quite astounding what has been achieved.

I am wondering whether, for the EPUB format say, the individual files could be grouped into folders by author or alphabetically? Does anyone know how to do this?
 

I'm not sure what you mean. Something different from what is already there? They are named first by the author, then the title. Do you mean that you want all the Abbot_Robert files to be in an Abbot_Robert folder?

1675473536258.png
 
I'll note that the epubs may need a little light tweaking before they are ready to read. I think the converter generates them all as one "page" which could make large files not load well. They might need to be internally split (e.g., by chapters).
 
I shared it with my pastors. We are at a PCA with probably 500 or so members, so the potential influence is great. It would be awesome to see them use it in their studies and bless the congregation with it.
 
I'm not sure what you mean. Something different from what is already there? They are named first by the author, then the title. Do you mean that you want all the Abbot_Robert files to be in an Abbot_Robert folder?
Exactly this, yes, and/or perhaps under alphabetical folders "A", etc., (if possible please). But I wouldn't want to take much of your/anyone's time - I had a look online and saw it can be done using a script but I am not savvy enough to work it out myself.
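In case it helps anyone wanting to try this themselves, here is a minimal Python sketch of such a script. It rests on an assumption that should be checked against the real files: that each name looks like `Abbot_Robert_Title.epub`, with the first two underscore-separated parts being the author. The function names and the sample filenames below are made up, and the demonstration only touches throwaway files in a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path

def author_folder_name(filename):
    """Guess the author folder from a name like 'Abbot_Robert_Title.epub'.

    Assumption: the first two underscore-separated parts are surname and
    forename. Check this against the real filenames before relying on it.
    """
    parts = Path(filename).stem.split("_")
    return "_".join(parts[:2])

def group_by_author(folder, alphabetical=False):
    """Move each .epub in folder into a subfolder named after its author."""
    folder = Path(folder)
    # sorted() collects the file list up front, before anything is moved.
    for epub in sorted(folder.glob("*.epub")):
        author = author_folder_name(epub.name)
        if alphabetical:
            # Extra top-level folder per initial letter, e.g. A/Abbot_Robert.
            target = folder / author[0].upper() / author
        else:
            target = folder / author
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(epub), str(target / epub.name))

# Demonstration on empty placeholder files with hypothetical titles.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "Abbot_Robert_First_Title.epub",
        "Abbot_Robert_Second_Title.epub",
        "Baxter_Richard_Some_Title.epub",
    ]:
        (Path(tmp) / name).touch()
    group_by_author(tmp, alphabetical=True)
    for path in sorted(Path(tmp).rglob("*.epub")):
        print(path.relative_to(tmp))
```

To use it on a real download, call `group_by_author` with the path to the folder holding the EPUB files, ideally on a copy first, since the files are moved rather than copied.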

I'll note that the epubs may need a little light tweaking before they are ready to read. I think the converter generates them all as one "page" which could make large files not load well. They might need to be internally split (e.g., by chapters).
I read parts of a couple last night and thought they worked really well! The vast majority of the individual files are small, I might test out the larger ones later today. I'd previously loaded Dave's 'PuritanInn' collection onto my Kindle and these seem to be better formatted.
 