Friday, 23 February 2018

Site Improvements (Index)

This is not really a hobby-related post.  It's also going to get very "techie" very quickly.

I've spent the last two or three days trying to make sense of the growing number of posts here.  As a result I have had to dive deep into the bowels of Google's API for Blogger.  I thought it would be simple!  My efforts are presented here to help any bloggers interested in doing likewise.

All avowedly "non-techie" people should look away now.

Displaying a list of all posts
Simple, thought I.  A nice page for the blog with all my posts in chronological order: so much better than the standard widget.  No chance!

The first attempt involved exporting the blog contents using the tools in the blog settings.  I soon gave up using this as a data source as it's a complex file and even if I could produce an index I'd have to refresh it every so often: not an ideal solution.

My next attempt was a more general internet search. Apparently I am not alone in wanting to do this.  I found two nice examples of JavaScript that would do what I wanted courtesy of stackoverflow.  Both used the main site's RSS feed and worked (sort of). 

Fix 1: When pasting code be sure the Blogger page editor is in HTML mode, not Compose mode, or Blogger will change the code for you and it won't work.

When pasted into a new blog page the first code didn't work at all.  The second was fine but the list was very plain: just line after line of links.  I really wanted something a bit more structured.

Fix 2: I checked the code in the first example and found it was missing a closing </div> tag at the very end of the code.  Adding this tag produced a nicely structured list of posts sorted by month and year.  After a few minor edits to make the code specific to this blog I thought I'd cracked it.  No chance!
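For anyone who wants to tinker, the heart of both scripts is a callback that walks the feed's JSON and writes out HTML.  Here is a minimal sketch of the pattern (the element id "toc" and the exact markup are my choices; the $t fields and the link array are how Blogger's json-in-script feed is actually structured):

<script>
function loadtoc(json) {
  var entries = json.feed.entry || [];
  var html = "";
  var lastMonth = "";
  for (var i = 0; i < entries.length; i++) {
    var entry = entries[i];
    // Dates arrive as e.g. "2018-02-23T09:00:00.000Z"; the first
    // seven characters give the year and month
    var month = entry.published.$t.substring(0, 7);
    if (month !== lastMonth) {
      if (lastMonth !== "") html += "</ul>";
      html += "<h4>" + month + "</h4><ul>";
      lastMonth = month;
    }
    // The post's permalink is the link with rel="alternate"
    var url = "";
    for (var j = 0; j < entry.link.length; j++) {
      if (entry.link[j].rel === "alternate") { url = entry.link[j].href; break; }
    }
    html += '<li><a href="' + url + '">' + entry.title.$t + '</a></li>';
  }
  if (entries.length > 0) html += "</ul>";
  document.getElementById("toc").innerHTML = html;
}
</script>
<div id="toc"></div>

Note the final closing tag: leaving out a closer is exactly the sort of thing that broke the first example.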

Sadly the list was truncated to the last 150 posts.  There are over 340 on this site.  It turns out that Google have been reducing the number of items in Blogger feeds by splitting them into pages.  In 2011 they set the limit at 500 per page and in 2016 they reduced it even further to 150.  This isn't a problem as most users don't need the whole feed.  I found two solutions to this problem.

Firstly I tried changing the feed URL at the end of the code from:
<script src="https://vexillia.blogspot.com/feeds/posts/default?alt=json-in-script&callback=loadtoc"></script>
to
<script src="https://vexillia.blogspot.com/feeds/posts/default?max-results=150&start-index=1&alt=json-in-script&callback=loadtoc"></script>
<script src="https://vexillia.blogspot.com/feeds/posts/default?max-results=150&start-index=151&alt=json-in-script&callback=loadtoc"></script>
This was supposed to deliver two 150-item chunks from the feed that would fit together seamlessly.  No chance!

They did deliver two chunks of 150 but they didn't fit together.  For some reason I couldn't understand, the second chunk had recent posts in it that shouldn't have been there.

Fix 3: Next I tried changing the feed URL (where BlogID stands for the blog's numeric ID) to:
<script src="http://www.blogger.com/feeds/BlogID/posts/default?max-results=500&alt=json-in-script&callback=loadtoc"></script>
This worked!  I got all my posts.  Hooray.  What you see on the new Index page are the results of this feed API call.

This issue looks to be resolved; at least until the page limit is reduced or until I write my 501st post.
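If either ever happens, the obvious fallback is to fetch the feed in consecutive chunks and only build the list once the last chunk arrives.  A sketch of the idea (untested here; the collect function, the 150-per-page assumption and the reuse of the loadtoc callback are all mine):

<script>
var allEntries = [];
function collect(json) {
  var entries = (json.feed && json.feed.entry) || [];
  allEntries = allEntries.concat(entries);
  if (entries.length < 150) {
    // A short page means the end of the feed: hand everything to loadtoc
    loadtoc({ feed: { entry: allEntries } });
  } else {
    // Otherwise request the next 150 posts (start-index is 1-based)
    var next = document.createElement("script");
    next.src = "https://www.blogger.com/feeds/BlogID/posts/default" +
      "?max-results=150&start-index=" + (allEntries.length + 1) +
      "&alt=json-in-script&callback=collect";
    document.body.appendChild(next);
  }
}
</script>
<script src="https://www.blogger.com/feeds/BlogID/posts/default?max-results=150&start-index=1&alt=json-in-script&callback=collect"></script>

Given my experience with the blogspot.com feed above, any chunked approach would need checking for overlaps before trusting it.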

Downloading an Index
I no longer thought anything to do with Google's Blogger API was going to be simple but at least I knew the information was in there. I just needed to get it from Google in a format that would allow me to create a simple spreadsheet.

As an aside I wanted everything in a spreadsheet so I could use IFTTT to automate maintenance of the spreadsheet.  If I kept the sheet on Google Docs then IFTTT would add a line every time I published a new post: neat.  I don't really need a full index for this but I found an old list covering the first three years of the blog and thought it would be nice to have a complete index.

This time searching the internet was less than helpful.  Eventually, after much reading, I realised that I could get the data from Google in JSON format via their API and that all I needed was an application or online app that would convert that data to CSV.  From there it would be easy to edit the CSV file in Excel to produce the final spreadsheet.

Google provide an online tool to interrogate v3 of their Blogger API.  Further searches turned up SQLify which offers a free online service to convert JSON files to CSV (as of Mon, 12 September, 2022 this service is no longer available; try www.convertcsv.com/json-to-csv.htm instead).  I downloaded some data from Google, easily converted it to CSV, and opened the data in Excel.  Editing it would be a matter of minutes' work.  Job done?  No chance!
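Should the converters ever disappear altogether, the conversion is only a few lines to script.  A minimal Node.js sketch (the file names and the choice of columns are mine; the items array with published, title, url and labels fields is what v3 of the API returns):

// json2csv.js - flatten a Blogger API v3 response into CSV
var fs = require("fs");

var data = JSON.parse(fs.readFileSync("posts.json", "utf8"));
var rows = (data.items || []).map(function (post) {
  // Double up quotes and wrap fields so commas in titles don't break the CSV
  return [
    post.published,
    '"' + post.title.replace(/"/g, '""') + '"',
    post.url,
    '"' + (post.labels || []).join("; ") + '"'
  ].join(",");
});

fs.writeFileSync("posts.csv", "published,title,url,labels\n" + rows.join("\n"));

Joining the labels into one quoted field also sidesteps the poor label handling I mention below.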

The Google online service has a similar restriction to the one I encountered above.  Asking for more than 100 items results in an error message.  The simple solution was to run the tool four times (3 x 100 + 44), convert the four data sets, and stitch them together.  Unfortunately, this was far from straightforward as using the form like this was confusing at first.

Fix 4: After much experimentation I found that the best way to get consecutive data sets was to first run the Google API with no date restrictions. Then I looked for the full UTC date of the last post in the set (something like 2007-11-05T13:15:30). Next I subtracted an hour or two from the time and used this as the endDate on the form to create the next data set.  The date needs to be in full UTC format or the form won't work.  I then repeated the process until I had all my data.
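If you'd rather not do the date arithmetic by hand, it's two lines of JavaScript (the date is just the example above, with a trailing Z to mark it as UTC):

<script>
// Take the published date of the last post in a chunk and produce
// the endDate for the next request: one hour earlier, in full UTC format
var last = new Date("2007-11-05T13:15:30Z");
last.setTime(last.getTime() - 60 * 60 * 1000);  // subtract one hour
console.log(last.toISOString());                // "2007-11-05T12:15:30.000Z"
</script>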

Having to set an end date confused me at first because the first (top) post in the data set is that closest to the end date! This must have something to do with the fact that the feed list is in reverse chronological order. 

I found that copying the output from the Google API and pasting it into the SQLify converter speeded things up quite a bit.  I wasn't that interested in the post labels but I did notice that SQLify didn't do a brilliant job with them.

Finally, I have a full index in a spreadsheet and, after a bit of spreadsheet magic, it's clear that searching this will be far quicker than any online alternative.  I've set the sheet up so that I can search by month, year, label and combinations of labels.  This should save me a lot of time.

2 comments:

  1. After more than six years the index has grown automatically thanks to Google Docs and IFTTT. However, I spent this afternoon experimenting with the IMPORTFEED function in Google Docs which works well. I now have my blog feed listed in the spreadsheet that holds my index. It auto updates when you open the sheet and every hour if the sheet stays open. With this set up it's the work of seconds to copy a row, or rows, from the feed tab and paste it into the index tab. Even so, I've written a simple script to automate the cut & paste (a sketch of the idea follows below). All I now have to do is push a button.

    This may seem an overall retrograde step but I have always had to open the index spreadsheet to update the labels manually. I have lost the "unattended" update facility of IFTTT but gained because I no longer have to wait for IFTTT to process the task, which can take hours. I can now open the index spreadsheet immediately after posting an article, push a button, and update the labels: job done.

    It's a minor addition to the workflow associated with writing a blog post but it will further reduce my dependence on assorted web services (except Google!) by reducing the need to use IFTTT.
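    The script itself is nothing special. A minimal Google Apps Script sketch of the idea (the tab names "Feed" and "Index", and the assumption that the post URL sits in the second column, are illustrative):

    // Copy any rows from the "Feed" tab (populated by IMPORTFEED)
    // that aren't already on the "Index" tab.
    function appendNewPosts() {
      var ss = SpreadsheetApp.getActiveSpreadsheet();
      var feed = ss.getSheetByName("Feed");
      var index = ss.getSheetByName("Index");
      // Build a lookup of post URLs already in the index
      var known = {};
      index.getDataRange().getValues().forEach(function (r) { known[r[1]] = true; });
      feed.getDataRange().getValues().forEach(function (r) {
        if (r[1] && !known[r[1]]) index.appendRow(r);
      });
    }

    Assign the function to a drawing or a custom menu item and it becomes the "button" mentioned above.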

