O Say Can You See

Rails, Solr, pregenerated HTML

Data: https://github.com/CDRH/data_oscys
Code: https://github.com/CDRH/earlywashingtondc
Dev: https://cdrhdev1.unl.edu/earlywashingtondc/
Prod: https://earlywashingtondc.org

OSCYS (or Early Washington DC) is an unusual site: in addition to a map component, it stores a large number of relationships as RDF data. To update this site, you will need to run a specific script to regenerate these relationships.

Behind the scenes, it is good to know that OSCYS was an early Rails site and does NOT rely on the CDRH’s API; it relies on a proto-API in Solr instead. Additionally, the interaction between documents, cases, and people in OSCYS is complex, so we use Ruby rather than XSLT to generate the Solr files.

Unlike other projects, the maps will be automatically generated if you run any HTML generation. However, if you would like to update ONLY the maps, you may simply specify -f csv.

Update Development

Log in and pull to development

Step 1: Log in

SSH into the development server and navigate to the data repository. [More info: if this is your first time logging in, you will need to set up server access.]

ssh username@cdrhdev1.unl.edu


cd /var/local/www/data/collections/oscys

Step 2: Check for changes

Check to see which branch is currently checked out and see if there are any unexpected changes.

git status

Typically, you should be on the dev or main branch. [More info: What to do if it is on a different branch.]

If you're on your desired branch, but there are outstanding changes, you will need to deal with them before you pull. [More info: There are files changed on the server.]

Step 3: Pull!

Once you are on the correct branch, there are no outstanding changes, and everything is copacetic, then you can pull updates.

git pull origin [current branch name]

You should either get a message saying "Already up to date" or you should see a bunch of new files show up with no errors. [More info: What if there is a problem when I git pull?]
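The checks in Steps 2 and 3 can be sketched as a small shell helper. This is only an illustration: `check_before_pull` is a hypothetical function, not part of the CDRH tooling, and it assumes that `dev` and `main` are the only branches you expect to see.

```shell
# Hypothetical helper (not part of the CDRH tooling): decide whether it is safe to pull.
#   $1 = current branch name
#   $2 = output of `git status --porcelain` (empty when the working tree is clean)
check_before_pull() {
  branch=$1
  porcelain=$2
  case "$branch" in
    dev|main) ;;                      # assumed set of expected branches
    *)
      echo "unexpected branch: $branch"
      return 1
      ;;
  esac
  if [ -n "$porcelain" ]; then
    echo "outstanding changes on $branch"
    return 1
  fi
  echo "ok to pull: $branch"
}
```

From inside the data repository you could call it as `check_before_pull "$(git symbolic-ref --short HEAD)" "$(git status --porcelain)"` and only run `git pull` when it succeeds.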

Step 4: Upload images and other media

If your project uses images, audio, or video, you will need to upload any files that are connected with your new documents! TODO: add upload instructions, keeping in mind that most people use SFTP clients and that the media is not always on the same server.

You will need to put your files in the following location on cors1601.unl.edu:

/var/local/www/media/oscys/[relevant directory]

If you REPLACE any images and do not see the new version show up on your site, this is because the IIIF image server caches images. You will need to either wait (a few days) or ask a CDRH dev to purge the cache.

Now you should be ready to begin updating the development website's contents!

Update the development site

Step 0: Generate HTML

Yes, Step 0! Most sites with Solr don't use pre-generated HTML, so consider your site special enough to have its own unique step.

Let's generate the new pages for your site before we update the search, so that they are already in place and ready to be discovered.

post -e development -x html

If you are only adding a few new files, you may want to regenerate only those which were updated when you did the git pull earlier:

post -e development -x html -u today

This script may take some time to run. [More info: troubleshoot if there are errors with the HTML generation.]

Step 1: Clear Solr (optional)

If you have removed any files or changed identifiers, you will need to clear the old, no-longer-used files from the Solr index. If not, skip to Step 2.

You may clear either one specific file or ALL the files. Keep in mind that if you clear all the files, the site will have no content until you repopulate the search.

solr_clear_index -e development

You can also clear a specific file if you only need to drop one item from the index, but know that if you use an id like `10` you may be clearing more items than you intend, so be specific if possible!

# clear one file from the index with -r [id]
#   (do not include extension)
#   (be specific with id to avoid accidentally removing more files)
solr_clear_index -e development -r wwa\.0001
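The warning about short ids can be illustrated without touching Solr. The sketch below assumes that the value passed to `-r` behaves like a pattern (the escaped dot in `wwa\.0001` suggests this, but it is an assumption). It uses `grep -E` as a stand-in for whatever matching `solr_clear_index` actually performs, and the ids are made up for the example.

```shell
# Hypothetical ids, for illustration only; not taken from the real index.
sample_ids() {
  printf '%s\n' 'wwa.0001' 'wwa.0010' 'wwa.0100' 'abc.1000'
}

# Print the sample ids that a given pattern would match.
# grep -E is a stand-in here, not the matching used by solr_clear_index.
ids_matching() {
  sample_ids | grep -E -- "$1"
}
```

With these sample ids, `ids_matching '10'` lists three of the four entries, while `ids_matching 'wwa\.0001'` lists only `wwa.0001`. Trying a pattern against a list of ids first is a cheap way to see how much it would sweep up.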

[More info: other options for clearing Solr by subcategory, etc]

Step 2: Populate the search!

This is the part where you get to add your new and updated files into your site's search!

post -e development -x solr

This may take some time to run, so if you are impatient, you may want to consider posting only specific files, unless you cleared the entire index in the last step. [More info: learn about posting by file type, date, and file name.]

Step 3: Update the relationships

If you have not updated `rdf/oscys.relationships.csv`, you may skip this step. However, if you have downloaded a new copy of the Google Sheets document with person relationships, save it as `rdf/oscys.relationships.csv`. If you do this locally and push, you can use `git pull` to move the new copy to the server.

From the root directory of the data repo, run the following command:

ruby scripts/csv_to_rdf.rb

This may take some time to run. Once it is done, it will have updated the file `rdf/oscys.relationships.ttl`. In a few steps when you are adding things to Git, you will want to make sure that you also add this file if you are happy with how everything looks.
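One quick way to confirm that the script really rewrote the Turtle file is to compare modification times. The helper below is a minimal sketch: `ttl_is_fresh` is a hypothetical name, and it only checks timestamps, not the contents of the file.

```shell
# Hypothetical helper: succeed only when the generated file exists and is
# newer than the CSV it was generated from.
#   $1 = path to the CSV, $2 = path to the generated .ttl file
ttl_is_fresh() {
  csv=$1
  ttl=$2
  # find prints the .ttl path only if its modification time is later than the CSV's
  [ -f "$ttl" ] && [ -n "$(find "$ttl" -newer "$csv" 2>/dev/null)" ]
}
```

From the root directory of the data repo, `ttl_is_fresh rdf/oscys.relationships.csv rdf/oscys.relationships.ttl && echo "relationships regenerated"` prints the message only when the `.ttl` file was written after the CSV last changed.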

Check the site

Go to https://cdrhdev1.unl.edu/earlywashingtondc/. Your changes should already be showing up there, although you may need to hard refresh to see them. [More info: what is a hard refresh?]

Check not only the search results but also an individual item page to make sure everything is shipshape.

Push changes (if needed)

Part 1: Check changes

After you've run your script, check whether any changed files need to be committed.

git status

In general, files generated for the development environment will not need to be committed, so do not be surprised if there isn't anything here!

Part 2: Commit and push

If there are expected changes, such as to output/development/html, or for project-specific features, you will need to commit them.

# you may run git add repeatedly to add different directories and files
git add [file path] [or directory path]

# check it over to make sure you're adding everything you need
git status

git commit -m "message about your changes"

git push origin [current branch name]
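Before committing, it can help to separate the changes you expect from anything surprising. The sketch below filters the output of `git status --porcelain` and prints only the paths outside the locations this page mentions (the generated HTML directory and the relationships files). `unexpected_changes` is a hypothetical helper, and the list of expected paths is an assumption you should adjust for your own work.

```shell
# Hypothetical helper: read `git status --porcelain` lines on stdin and print
# the paths that are NOT in the expected locations for the given environment.
#   $1 = environment name (development or production)
unexpected_changes() {
  env_name=$1
  # Drop the two status columns and the following space, then filter out
  # expected paths. "|| true" keeps the exit status 0 when nothing is left.
  cut -c4- |
    grep -v -E "^(output/$env_name/html(/|$)|rdf/oscys\.relationships\.(csv|ttl)$)" || true
}
```

For example, `git status --porcelain | unexpected_changes development` prints nothing when every change is where you would expect it. Renamed files and paths with unusual characters are shown differently by Git, so treat the output as a hint rather than a guarantee.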

Update Production

Log in and pull to production

Step 1: Log in

SSH into the production server and navigate to the data repository. [More info: if this is your first time logging in, you will need to set up server access.]

ssh username@cors1601.unl.edu


cd /var/local/www/data/collections/oscys

Step 2: Check for changes

Check to see which branch is currently checked out and see if there are any unexpected changes.

git status

Typically, you should be on the dev or main branch. [More info: What to do if it is on a different branch.]

If you're on your desired branch, but there are outstanding changes, you will need to deal with them before you pull. [More info: There are files changed on the server.]

Step 3: Pull!

Once you are on the correct branch, there are no outstanding changes, and everything is copacetic, then you can pull updates.

git pull origin [current branch name]

You should either get a message saying "Already up to date" or you should see a bunch of new files show up with no errors. [More info: What if there is a problem when I git pull?]

Step 4: Upload images and other media

This section is repeated for the sake of being thorough, but if you have already added your media files, either during the development step or previously, you may skip this!

You will need to put your files in the following location on cors1601.unl.edu:

/var/local/www/media/oscys/[relevant directory]

Now you should be ready to begin updating the production website's contents!

Update the production site

Step 0: Generate HTML

Yes, Step 0! Most sites with Solr don't use pre-generated HTML, so consider your site special enough to have its own unique step.

Let's generate the new pages for your site before we update the search, so that they are already in place and ready to be discovered.

post -e production -x html

If you are only adding a few new files, you may want to regenerate only those which were updated when you did the git pull earlier:

post -e production -x html -u today

This script may take some time to run. [More info: troubleshoot if there are errors with the HTML generation.]

Step 1: Clear Solr (optional)

If you have removed any files or changed identifiers, you will need to clear the old, no-longer-used files from the Solr index. If not, skip to Step 2.

You may clear either one specific file or ALL the files. Keep in mind that if you clear all the files, the site will have no content until you repopulate the search.

solr_clear_index -e production

# clear one file from the index with -r [id]
#   (do not include extension)
#   (be specific with id to avoid accidentally removing more files)
solr_clear_index -e production -r wwa\.0001

[More info: other options for clearing Solr by subcategory, etc]

Step 2: Populate the search!

This is the part where you get to add your new and updated files into your site's search!

post -e production -x solr

Step 3: Update the relationships

From the root directory of the data repo, run the following command:

ruby scripts/csv_to_rdf.rb
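The order of Steps 0, 2, and 3 can be sketched as one function with a dry-run mode, so you can read the exact commands before anything is executed. `run` and `update_site` are hypothetical helpers, and the optional Solr clearing in Step 1 is deliberately left out because it depends on what you changed.

```shell
# run: print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# update_site: HTML first, then Solr, then the relationships script.
# Stops at the first command that fails. Run it from the root of the data repo.
#   $1 = environment name (development or production)
update_site() {
  env_name=$1
  run post -e "$env_name" -x html &&
    run post -e "$env_name" -x solr &&
    run ruby scripts/csv_to_rdf.rb
}
```

Calling `update_site production` prints the three commands prefixed with `+`. Setting `DRY_RUN=0` first would execute them instead.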

Check the site

Go to https://earlywashingtondc.org. Your changes should already be showing up there, although you may need to hard refresh to see them. [More info: what is a hard refresh?]

Check not only the search results but also an individual item page to make sure everything is shipshape.

Don't close this page yet! You're not quite done!

Push changes (if needed)

Part 1: Check changes

After you've run your script, check whether any changed files need to be committed.

git status

Part 2: Commit and push

If there are expected changes, such as to output/production/html, or for project-specific features, you will need to commit them.

# you may run git add repeatedly to add different directories and files
git add [file path] [or directory path]

# check it over to make sure you're adding everything you need
git status

git commit -m "message about your changes"

git push origin [current branch name]

More Information

Login and Pull

How to set up SSH

TODO

What to do if the git branch is not dev or main

TODO

There are files changed on the server

TODO

What if there is a problem when I git pull?

TODO cover permissions, merge conflict

Update the site

Clearing Solr by subcategory and more

TODO

Clearing Elasticsearch by subcategory and more

TODO possibly combine with above

Posting to Solr by specific file type, name, and updated date

TODO

HTML generation errors

TODO

Elasticsearch population errors

TODO

Check the site

Cocoon is broken

TODO

How to hard refresh

TODO