Background
Whilst checking my Bluesky feed I noticed someone posting links to, and images from, their blog. Not new posts, but posts from years ago. It struck me that this was a good use of a blog archive. However, I didn't want to do this manually: even daily posts would soon pall.
In January I tidied all the posts here and removed all the rubbish. I currently have 464 posts. That's more than enough for a lengthy series of re-posts. I just needed a suitable script I could automate.
Thanks to Perplexity.ai, I now have working Python scripts, for both Bluesky and the Fediverse, that produce posts like this:
Read on if you are running Linux (Xubuntu 24.04) and you'd like to try them.
Instructions
[1] Create a venv folder
As Ubuntu 24.04 (and all its variants) is quite fussy about running Python scripts, it's best to set up a dedicated virtual environment (venv) for them. Both scripts use the same venv. This isolates the scripts as well as holding all the required libraries.
The first step is to install python3-venv, if it isn't already installed:
sudo apt install python3-venv
Next create the venv in a suitable folder and install the required libraries with these commands in a terminal:
python3 -m venv ~/.local/share/pipx/venvs/archive_bots
source ~/.local/share/pipx/venvs/archive_bots/bin/activate
pip install requests beautifulsoup4 Pillow
Please note: I created the archive_bots folder in a hidden directory with all the others I've created, but you can use any folder in your home folder.
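Before moving on you can confirm the venv has everything it needs. This small check (save it as, say, check_libs.py, a name I've made up for this example) reports whether each required library is importable; run it with the venv's Python, i.e. ~/.local/share/pipx/venvs/archive_bots/bin/python check_libs.py

```python
import importlib.util

# Report whether each library the scripts need can be imported.
# Run with the venv's python, not the system one.
REQUIRED = ["requests", "bs4", "PIL"]
status = {m: importlib.util.find_spec(m) is not None for m in REQUIRED}
for module, found in status.items():
    print(f"{module}: {'ok' if found else 'MISSING'}")
```

If any line says MISSING, re-run the pip install command above inside the activated venv.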
[2] Bluesky script
First copy this script and save it somewhere in your home folder. I used ~/System/scripts/random_bsky_post.py
import io
import os
import random
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from PIL import Image

# Edit these three values (better still, load them from environment variables)
HANDLE = "bsky_handle"
APP_PASSWORD = "app_password"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"

def get_urls(sitemap_url):
    xml = requests.get(sitemap_url, timeout=30)
    xml.raise_for_status()
    # Parse the raw bytes so the XML encoding declaration is honoured
    root = ET.fromstring(xml.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

def get_og_metadata_with_image(session_jwt, url):
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.find("meta", property="og:title")
        title = title["content"].strip() if title else "Blog post"
        desc = soup.find("meta", property="og:description")
        desc = desc["content"].strip()[:300] if desc else "15mm historical wargaming"
        img_tag = soup.find("meta", property="og:image")
        thumb_blob = None
        if img_tag:
            # Resolve relative image URLs against the post URL
            img_url = urljoin(url, img_tag["content"])
            img_resp = requests.get(img_url, timeout=15)
            img_resp.raise_for_status()
            # Resize image to a reasonable size (1000x1000 max);
            # convert to RGB so PNG images can be saved as JPEG
            img = Image.open(io.BytesIO(img_resp.content)).convert("RGB")
            img.thumbnail((1000, 1000))
            img_byte_arr = io.BytesIO()
            img.save(img_byte_arr, format="JPEG", quality=85)
            img_bytes = img_byte_arr.getvalue()
            # Upload as blob
            blob_resp = requests.post(
                "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
                headers={
                    "Authorization": f"Bearer {session_jwt}",
                    "Content-Type": "image/jpeg",
                },
                data=img_bytes,
                timeout=30,
            )
            blob_resp.raise_for_status()
            thumb_blob = blob_resp.json()["blob"]
        return title, desc, thumb_blob
    except Exception:
        # Fall back to generic metadata if anything goes wrong
        return "Blog post", "15mm historical wargaming", None

urls = get_urls(SITEMAP_URL)
pick = random.choice(urls)

session = requests.post(
    "https://bsky.social/xrpc/com.atproto.server.createSession",
    json={"identifier": HANDLE, "password": APP_PASSWORD},
    timeout=30,
)
session.raise_for_status()
session = session.json()
session_jwt = session["accessJwt"]

title, description, thumb_blob = get_og_metadata_with_image(session_jwt, pick)

text = f"""A post from my blog's archive:
{title}
#tabletop #wargames #miniatures"""

# Hashtag facets use byte offsets into the UTF-8 encoded text
facets = []
for tag in ["#tabletop", "#wargames", "#miniatures"]:
    tag_pos = text.find(tag)
    if tag_pos != -1:
        tag_start = len(text[:tag_pos].encode("utf-8"))
        tag_end = tag_start + len(tag.encode("utf-8"))
        facets.append({
            "index": {"byteStart": tag_start, "byteEnd": tag_end},
            "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": tag[1:]}],
        })

record = {
    "$type": "app.bsky.feed.post",
    "text": text,
    "createdAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    "facets": facets,
    "embed": {
        "$type": "app.bsky.embed.external",
        "external": {
            "uri": pick,
            "title": title,
            "description": description,
        },
    },
}
if thumb_blob:
    record["embed"]["external"]["thumb"] = thumb_blob

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.repo.createRecord",
    headers={"Authorization": f"Bearer {session_jwt}"},
    json={
        "repo": session["did"],
        "collection": "app.bsky.feed.post",
        "record": record,
    },
    timeout=30,
)
resp.raise_for_status()
# print("Posted successfully!")
# print(f"Post URI: {resp.json()['uri']}")
The script requires three pieces of information to work: your Bluesky handle (or username), an app password and the URL of your blog's sitemap.
Your Bluesky handle is on your profile page (don't add the @). You can create an app password via Settings|Privacy and Security|App passwords (don't use your regular password). The final piece is your blog's name (if on Blogger) or its sitemap URL if not.
Once you have this information, edit this section accordingly:
# Edit these three values (better still, load them from environment variables)
HANDLE = "bsky_handle"
APP_PASSWORD = "app_password"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"
Then edit these sections to reflect your blog's content:
return "Blog post", "15mm historical wargaming", None
#tabletop #wargames #miniatures
The script is now ready for testing.
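One part of the script worth understanding before you test it is the facets loop: Bluesky's AT Protocol expects hashtag positions as byte offsets into the UTF-8 encoded text, not character positions, which only matters once the text contains a non-ASCII character. Here is a standalone sketch of the same calculation, using a made-up post text:

```python
# Bluesky facets use byte offsets into the UTF-8 text, not character offsets.
# "é" is two bytes in UTF-8, so the two counts differ after it.
text = "Café update: #tabletop #wargames"

facets = []
for tag in ["#tabletop", "#wargames"]:
    tag_pos = text.find(tag)  # character position
    if tag_pos != -1:
        # Encode the prefix to count bytes rather than characters
        tag_start = len(text[:tag_pos].encode("utf-8"))
        tag_end = tag_start + len(tag.encode("utf-8"))
        facets.append({
            "index": {"byteStart": tag_start, "byteEnd": tag_end},
            "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": tag[1:]}],
        })

# "#tabletop" starts at character 13 but byte 14
print(facets[0]["index"])
```

If the offsets are wrong, Bluesky renders the hashtags as plain text or links the wrong span, so it's worth getting this right.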
[3] Fediverse script
First copy this script and save it somewhere in your home folder. I used ~/System/scripts/random_fedi_post.py
import os
import random
import xml.etree.ElementTree as ET

import requests
from bs4 import BeautifulSoup

# Edit these three values (better still, load them from environment variables)
MASTODON_BASE_URL = "https://your_instance.social"
MASTODON_TOKEN = "your_access_token"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"

def get_urls(sitemap_url):
    xml = requests.get(sitemap_url, timeout=30)
    xml.raise_for_status()
    # Parse the raw bytes so the XML encoding declaration is honoured
    root = ET.fromstring(xml.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

def get_og_metadata(url):
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        title = soup.find("meta", property="og:title")
        title = title["content"].strip() if title else "Blog post"
        desc = soup.find("meta", property="og:description")
        desc = desc["content"].strip() if desc else "15mm historical wargaming"
        img = soup.find("meta", property="og:image")
        img_url = img["content"].strip() if img else None
        return title, desc, img_url
    except Exception:
        # Fall back to generic metadata if anything goes wrong
        return "Blog post", "15mm historical wargaming", None

def upload_media(image_url):
    if not image_url:
        return None
    img_resp = requests.get(image_url, timeout=30)
    img_resp.raise_for_status()
    filename = image_url.split("?")[0].rsplit("/", 1)[-1] or "image.jpg"
    files = {"file": (filename, img_resp.content)}
    resp = requests.post(
        f"{MASTODON_BASE_URL}/api/v2/media",
        headers={"Authorization": f"Bearer {MASTODON_TOKEN}"},
        files=files,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

urls = get_urls(SITEMAP_URL)
pick = random.choice(urls)
title, description, image_url = get_og_metadata(pick)

status = f"""A post from the Waving Flag archive:
{title}
{pick}
#tabletop #wargames #miniatures"""

data = {
    "status": status,
    "visibility": "public",
}
# Attach the post's image, if one was found
media_id = upload_media(image_url)
if media_id:
    data["media_ids[]"] = [media_id]

resp = requests.post(
    f"{MASTODON_BASE_URL}/api/v1/statuses",
    headers={"Authorization": f"Bearer {MASTODON_TOKEN}"},
    data=data,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["url"])
The script requires three pieces of information to work: the name of your Fediverse (Mastodon) instance, an access token and the URL of your blog's sitemap.
You can create an access token via Preferences|Development|Your applications page.
Once you have this information, edit this section accordingly:
# Edit these three values (better still, load them from environment variables)
MASTODON_BASE_URL = "https://your_instance.social"
MASTODON_TOKEN = "your_access_token"
SITEMAP_URL = "https://yourblog.blogspot.com/sitemap.xml"
Then edit these sections to reflect your blog's content:
return "Blog post", "15mm historical wargaming", None
#tabletop #wargames #miniatures
The script is now ready for testing.
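Both scripts depend on the og: meta tags that Blogger (and most blog platforms) include in each post's head; if your blog lacks them, every re-post will fall back to the generic defaults. If you want to see what that extraction does without installing anything, here is a stdlib-only sketch of the same idea; the scripts themselves use BeautifulSoup, and the sample HTML below is made up:

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph <meta property="og:..."> tags from a page."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            prop = attrs.get("property", "")
            if prop.startswith("og:") and "content" in attrs:
                self.og[prop] = attrs["content"]

# Hypothetical page head, standing in for a real blog post
sample = """
<html><head>
<meta property="og:title" content="Painting 15mm Gauls"/>
<meta property="og:description" content="Another batch off the table."/>
<meta property="og:image" content="https://example.com/gauls.jpg"/>
</head><body></body></html>
"""

parser = OGParser()
parser.feed(sample)
print(parser.og["og:title"])
```

You can check what your own blog exposes by viewing the page source of any post and searching for "og:".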
[4] Testing
To test both scripts run these commands in a terminal:
/home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_bsky_post.py
/home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_fedi_post.py
Replace user with your username, and edit the script and venv locations if you used anything different.
If they don't work I suggest checking the three lines that contain your credentials and blog information.
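Another useful isolation check is the sitemap step: if get_urls returns an empty list, random.choice fails with an IndexError before anything is posted. The parsing can be exercised offline on a snippet of sitemap XML; the URLs below are placeholders:

```python
import xml.etree.ElementTree as ET

# A two-entry sitemap standing in for the real file at SITEMAP_URL.
# Bytes input lets ElementTree honour the XML encoding declaration.
sample_sitemap = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.blogspot.com/2019/05/first-post.html</loc></url>
  <url><loc>https://example.blogspot.com/2021/11/second-post.html</loc></url>
</urlset>"""

# The same extraction the scripts perform after downloading the sitemap
root = ET.fromstring(sample_sitemap)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]
print(len(urls), "urls found")
```

If this works but the live script finds nothing, fetch your SITEMAP_URL in a browser and check it really is a sitemap and not an error page.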
[5] Automation with cron
I chose to run each script every three hours with these lines in my list of cron jobs:
0 */3 * * * /home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_bsky_post.py
5 */3 * * * /home/user/.local/share/pipx/venvs/archive_bots/bin/python /home/user/System/scripts/random_fedi_post.py
This usually means I post three or four times a day as my computer is not always on. You can adjust the frequency of the cron jobs or set it to post once a day, week or month: whatever works best for your content.