Background
Whilst checking my Bluesky feed I noticed someone posting links to, and images from, their blog. Not new posts, but posts from years ago. It struck me that this was a good use of a blog archive. However, I didn't want to do this manually: even daily posts would soon pall.
In January I tidied all the posts here and removed all the rubbish. I currently have 464 posts. That's more than enough for a lengthy series of re-posts. I just needed a suitable script I could automate.
Thanks to Perplexity.ai, I now have working Python scripts, for both Bluesky and the Fediverse, that produce posts like this:
Read on if you are running Linux (Xubuntu 24.04) and you'd like to try them.
Instructions
[1] Create a venv folder
As Ubuntu 24.04 (and all its variants) is quite fussy about running Python scripts, it's best to set up a dedicated virtual environment (venv) for them. Both scripts use the same venv. This isolates the scripts as well as holding all the required libraries.
The first step is to install python3-venv, if it isn't already installed:
sudo apt install python3-venv
Next create the venv in a suitable folder and install the required libraries with these commands in a terminal:
python3 -m venv ~/.local/share/pipx/venvs/archive_bots
source ~/.local/share/pipx/venvs/archive_bots/bin/activate
pip install requests beautifulsoup4 Pillow
Please note: I created the archive_bots folder in a hidden directory with all the others I've created, but you can use any folder in your home folder.
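Before moving on you can confirm the venv has everything it needs. This small check (save it as, say, check_libs.py, a name I've made up for this example) reports whether each required library is importable; run it with the venv's Python, i.e. ~/.local/share/pipx/venvs/archive_bots/bin/python check_libs.py

```python
import importlib.util

# Report whether each library the scripts need can be imported.
# Run with the venv's python, not the system one.
REQUIRED = ["requests", "bs4", "PIL"]
status = {m: importlib.util.find_spec(m) is not None for m in REQUIRED}
for module, found in status.items():
    print(f"{module}: {'ok' if found else 'MISSING'}")
```

If any line says MISSING, re-run the pip install command above inside the activated venv.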
[2] Bluesky script
First copy this script and save it somewhere in your home folder. I used ~/System/scripts/random_bsky_post.py
import io
import os
import random
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from PIL import Image

# Edit these three values (better still, load them from environment variables)
HANDLE = "bsky_handle"
APP_PASSWORD = "app_password"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"

def get_urls(sitemap_url):
    xml = requests.get(sitemap_url, timeout=30)
    xml.raise_for_status()
    # Parse the raw bytes so the XML encoding declaration is honoured
    root = ET.fromstring(xml.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

def get_og_metadata_with_image(session_jwt, url):
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.find("meta", property="og:title")
        title = title["content"].strip() if title else "Blog post"
        desc = soup.find("meta", property="og:description")
        desc = desc["content"].strip()[:300] if desc else "15mm historical wargaming"
        img_tag = soup.find("meta", property="og:image")
        thumb_blob = None
        if img_tag:
            # Resolve relative image URLs against the post URL
            img_url = urljoin(url, img_tag["content"])
            img_resp = requests.get(img_url, timeout=15)
            img_resp.raise_for_status()
            # Resize image to a reasonable size (1000x1000 max);
            # convert to RGB so PNG images can be saved as JPEG
            img = Image.open(io.BytesIO(img_resp.content)).convert("RGB")
            img.thumbnail((1000, 1000))
            img_byte_arr = io.BytesIO()
            img.save(img_byte_arr, format="JPEG", quality=85)
            img_bytes = img_byte_arr.getvalue()
            # Upload as blob
            blob_resp = requests.post(
                "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
                headers={
                    "Authorization": f"Bearer {session_jwt}",
                    "Content-Type": "image/jpeg",
                },
                data=img_bytes,
                timeout=30,
            )
            blob_resp.raise_for_status()
            thumb_blob = blob_resp.json()["blob"]
        return title, desc, thumb_blob
    except Exception:
        # Fall back to generic metadata if anything goes wrong
        return "Blog post", "15mm historical wargaming", None

urls = get_urls(SITEMAP_URL)
pick = random.choice(urls)

session = requests.post(
    "https://bsky.social/xrpc/com.atproto.server.createSession",
    json={"identifier": HANDLE, "password": APP_PASSWORD},
    timeout=30,
)
session.raise_for_status()
session = session.json()
session_jwt = session["accessJwt"]

title, description, thumb_blob = get_og_metadata_with_image(session_jwt, pick)

text = f"""A post from my blog's archive:
{title}
#tabletop #wargames #miniatures"""

# Hashtag facets use byte offsets into the UTF-8 encoded text
facets = []
for tag in ["#tabletop", "#wargames", "#miniatures"]:
    tag_pos = text.find(tag)
    if tag_pos != -1:
        tag_start = len(text[:tag_pos].encode("utf-8"))
        tag_end = tag_start + len(tag.encode("utf-8"))
        facets.append({
            "index": {"byteStart": tag_start, "byteEnd": tag_end},
            "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": tag[1:]}],
        })

record = {
    "$type": "app.bsky.feed.post",
    "text": text,
    "createdAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    "facets": facets,
    "embed": {
        "$type": "app.bsky.embed.external",
        "external": {
            "uri": pick,
            "title": title,
            "description": description,
        },
    },
}
if thumb_blob:
    record["embed"]["external"]["thumb"] = thumb_blob

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.repo.createRecord",
    headers={"Authorization": f"Bearer {session_jwt}"},
    json={
        "repo": session["did"],
        "collection": "app.bsky.feed.post",
        "record": record,
    },
    timeout=30,
)
resp.raise_for_status()
# print("Posted successfully!")
# print(f"Post URI: {resp.json()['uri']}")
The script requires three pieces of information to work: your Bluesky handle (or username), an app password and the URL of your blog's sitemap.
Your Bluesky handle is on your profile page (don't add the @). You can create an app password via Settings|Privacy and Security|App passwords (don't use your regular password). The final piece is your blog's name (if on Blogger) or its sitemap URL if not.
Once you have this information, edit this section accordingly:
# Edit these three values (better still, load them from environment variables)
HANDLE = "bsky_handle"
APP_PASSWORD = "app_password"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"
Then edit these sections to reflect your blog's content:
return "Blog post", "15mm historical wargaming", None
#tabletop #wargames #miniatures
The script is now ready for testing.
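One part of the script worth understanding before you test it is the facets loop: Bluesky's AT Protocol expects hashtag positions as byte offsets into the UTF-8 encoded text, not character positions, which only matters once the text contains a non-ASCII character. Here is a standalone sketch of the same calculation, using a made-up post text:

```python
# Bluesky facets use byte offsets into the UTF-8 text, not character offsets.
# "é" is two bytes in UTF-8, so the two counts differ after it.
text = "Café update: #tabletop #wargames"

facets = []
for tag in ["#tabletop", "#wargames"]:
    tag_pos = text.find(tag)  # character position
    if tag_pos != -1:
        # Encode the prefix to count bytes rather than characters
        tag_start = len(text[:tag_pos].encode("utf-8"))
        tag_end = tag_start + len(tag.encode("utf-8"))
        facets.append({
            "index": {"byteStart": tag_start, "byteEnd": tag_end},
            "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": tag[1:]}],
        })

# "#tabletop" starts at character 13 but byte 14
print(facets[0]["index"])
```

If the offsets are wrong, Bluesky renders the hashtags as plain text or links the wrong span, so it's worth getting this right.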
[3] Fediverse script
First copy this script and save it somewhere in your home folder. I used ~/System/scripts/random_fedi_post.py
import os
import random
import xml.etree.ElementTree as ET

import requests
from bs4 import BeautifulSoup

# Edit these three values (better still, load them from environment variables)
MASTODON_BASE_URL = "https://your_instance.social"
MASTODON_TOKEN = "your_access_token"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"

def get_urls(sitemap_url):
    xml = requests.get(sitemap_url, timeout=30)
    xml.raise_for_status()
    # Parse the raw bytes so the XML encoding declaration is honoured
    root = ET.fromstring(xml.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

def get_og_metadata(url):
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.find("meta", property="og:title")
        title = title["content"].strip() if title else "Blog post"
        desc = soup.find("meta", property="og:description")
        desc = desc["content"].strip() if desc else "15mm historical wargaming"
        img = soup.find("meta", property="og:image")
        img_url = img["content"].strip() if img else None
        return title, desc, img_url
    except Exception:
        # Fall back to generic metadata if anything goes wrong
        return "Blog post", "15mm historical wargaming", None

def upload_media(image_url):
    if not image_url:
        return None
    img_resp = requests.get(image_url, timeout=30)
    img_resp.raise_for_status()
    filename = image_url.split("?")[0].rsplit("/", 1)[-1] or "image.jpg"
    files = {"file": (filename, img_resp.content)}
    resp = requests.post(
        f"{MASTODON_BASE_URL}/api/v2/media",
        headers={"Authorization": f"Bearer {MASTODON_TOKEN}"},
        files=files,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

urls = get_urls(SITEMAP_URL)
pick = random.choice(urls)
title, description, image_url = get_og_metadata(pick)

status = f"""A post from the Waving Flag archive:
{title}
{pick}
#tabletop #wargames #miniatures"""

data = {
    "status": status,
    "visibility": "public",
}
# Attach the post's image, if one was found
media_id = upload_media(image_url)
if media_id:
    data["media_ids[]"] = [media_id]

resp = requests.post(
    f"{MASTODON_BASE_URL}/api/v1/statuses",
    headers={"Authorization": f"Bearer {MASTODON_TOKEN}"},
    data=data,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["url"])
The script requires three pieces of information to work: the name of your Fediverse (Mastodon) instance, an access token and the URL of your blog's sitemap.
You can create an access token via Preferences|Development|Your applications page.
Once you have this information, edit this section accordingly:
# Edit these three values (better still, load them from environment variables)
MASTODON_BASE_URL = "https://your_instance.social"
MASTODON_TOKEN = "your_access_token"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"
Then edit these sections to reflect your blog's content:
return "Blog post", "15mm historical wargaming", None
#tabletop #wargames #miniatures
The script is now ready for testing.
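Both scripts depend on the og: meta tags that Blogger (and most blog platforms) include in each post's head; if your blog lacks them, every re-post will fall back to the generic defaults. If you want to see what that extraction does without installing anything, here is a stdlib-only sketch of the same idea; the scripts themselves use BeautifulSoup, and the sample HTML below is made up:

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph <meta property="og:..."> tags from a page."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            prop = attrs.get("property", "")
            if prop.startswith("og:") and "content" in attrs:
                self.og[prop] = attrs["content"]

# Hypothetical page head, standing in for a real blog post
sample = """
<html><head>
<meta property="og:title" content="Painting 15mm Gauls"/>
<meta property="og:description" content="Another batch off the table."/>
<meta property="og:image" content="https://example.com/gauls.jpg"/>
</head><body></body></html>
"""

parser = OGParser()
parser.feed(sample)
print(parser.og["og:title"])
```

You can check what your own blog exposes by viewing the page source of any post and searching for "og:".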
[4] Testing
To test both scripts run these commands in a terminal:
/home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_bsky_post.py
/home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_fedi_post.py
Replace user with your username, and edit the script and venv locations if you used anything different.
If they don't work I suggest checking the three lines that contain your credentials and blog information.
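Another useful isolation check is the sitemap step: if get_urls returns an empty list, random.choice fails with an IndexError before anything is posted. The parsing can be exercised offline on a snippet of sitemap XML; the URLs below are placeholders:

```python
import xml.etree.ElementTree as ET

# A two-entry sitemap standing in for the real file at SITEMAP_URL.
# Bytes input lets ElementTree honour the XML encoding declaration.
sample_sitemap = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.blogspot.com/2019/05/first-post.html</loc></url>
  <url><loc>https://example.blogspot.com/2021/11/second-post.html</loc></url>
</urlset>"""

# The same extraction the scripts perform after downloading the sitemap
root = ET.fromstring(sample_sitemap)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]
print(len(urls), "urls found")
```

If this works but the live script finds nothing, fetch your SITEMAP_URL in a browser and check it really is a sitemap and not an error page.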
[5] Automation with cron
I chose to run each script every three hours with these lines in my list of cron jobs:
0 */3 * * * /home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_bsky_post.py
5 */3 * * * /home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_fedi_post.py
This usually means I post three or four times a day as my computer is not always on. You can adjust the frequency of the cron jobs or set it to post once a day, week or month: whatever works best for your content.