-->
Reflections on a Poem Operating System and the relationship between art and technology. I consider the Poem vibe through a short study in the Studio I did last week with OpenAI, Python, Warwalking, Psychogeography, and a Past Project
Contributed By: Julian Bleecker
Published On: Saturday, March 2, 2024 at 09:25:58 PST
Updated On: Monday, March 4, 2024 at 08:48:53 PST
This post is a few things, including an exercise in having an idea and rapidly prototyping it, doing so with an LLM, and then remembering a similar project from 20 years ago in which I drew together the idioms of psychogeography, location-based sensing, and poetry.
It was made possible by the facility of having some LLM or another help translate little daydreams into materialized representations.
This is what the Poem Operating System robot I made with OpenAI and ElevenLabs has been creating 👇🏽
Someone somewhere in the NFL Discord, during a recent Office Hours perhaps it was, mentioned a world in which an Operating System was somehow a Poem-based OS.
"What is that?", I wondered.
And that was that, until I wondered again while listening to BeeBot, the latest study and exploration to investigate how to bring the soul back into the internet from Hopscotch Research Lab.
So here I am, walking Chewy the Dog, and I see a flow of interconnected APIs that are incanting a Poem out loud.
And I'm stuck on Matt's Poem/1 project which I eagerly backed because, well, I couldn't think of a reason not to. I mean, I could, but I ignored that because not everything has to have utility, although the first thing I'll do is open it up and figure out how to get that USB cable from sticking out the side and making my eyeballs hurt.¹
So now I'm back in NFL Global HQ and I'm sitting at Workstation 1A, and I'm seeing something: It's OpenAI -> ElevenLabs -> Spoken Poems. That's it. Nothing crazy. It's an exercise. But I had this spark of an idea and I wanted to see if it would ignite a little fire.
I started tapping.
[[Pssst! PoemOS and projects like this are inspired by the awesome conversations and spirit of shared collaborative and coordinated creativity that comes directly out of the Near Future Laboratory Discord. If you're interested in being a part of this community, then join us 👇🏽]]
Some of this musing and wondering out loud came out of these awesomely meandering biweekly calls with dens, where there's been a thread in which we wander around his continuing experiment with place-based audio interactions. It was maybe.. well, let's see, it says 2020 when I was alpha testing a little bot he called MarsBot that was being worked on at Foursquare Labs. It was the simplest little thing that would duck into your audio stream if you had AirPods on and it would tell you something related to where you were.
[[Parenthetically, it never ceases to crack me up that I went to watch the whole Crowley Clan on Family Feud win the something-or-another round. We were prepped by some PA to come bolting out of the stands to go crazy. I was genuinely amped. I remember Dennis' dad saying to me, "who the hell are you?!" while we were all whooping it up. Fun dinner that evening, I tell you what.]]
Anyway.
The concept of audio design for location-based things is super playful and super cool and gets lots of good discussion going on our little biweekly standing call. I've been trying out a new version called BeeBot. It just kind of sits there. You don't really pay attention to it and can easily forget it's even there until, while listening to, like.. The AI Breakdown podcast, BeeBot ducks in and tells me something about where I am. (I happened to be sitting on the beach at, you know, Venice Beach, and it said something about some feature of the beach behind me. Early days and super evocative as you wonder what one would do with such a baseline interaction paradigm.)
So now I'm sitting on the beach, wondering about lightweight forms of audio design that aren't overburdened podcasts. Little things that would just appear in your ear. And perhaps with spatial audio integrated into them. (I've got some VST plugins I'll use for the Near Future Laboratory Podcast that can move audio around in 3-space, so that's kind of interesting..)
And then I'm thinking back to my WiFiKu project from, like.. 2004 back in New York City. I proposed a commission to Christina Ray over at Glowlab for the annual PsyGeoConflux event she ran. (Check out those links, and thank you to Rhizome for maintaining these important waypoints.)
I wanted to do a project where I'd walk around neighborhoods in Manhattan with a backpack on that had some hardware that basically harvested WiFi SSIDs (just the names of WiFi access points it found) and cobbled them together into Haiku.
Right? You with me? Makes absolutely no sense! But to the psychogeographer, it's baseline psygeocartographic mapping.
This was effectively wardriving, only I was walking around with a funny backpack or portable computer rig. (Remember: 2004. No smartphones, and PDAs probably could be rigged to wardrive/walk, but like... it's not really about the instrument.)
I'd gather these lists in these data files and then I'd have to manually make a flimsy file of where I found what, like this file 👇🏽 from in and around lower Manhattan / LES. (ps I'd just get rid of the tons of Linksys and Netgear APs that people didn't bother renaming.)
Houston between Ludlow and Essex
* Punk_Fish Net
* Stay The Fuck Off

Avenue A between 1st and 2nd
* Neighbornode
* McYellin

Avenue A between 2nd 3rd
* cafe.com
* BlackHole
* angellagoddard

13th between 1st Ave and 3rd Ave 245 E. 13th
* radioyan
* Carol's network
* monstrosity
* rinse-repeat

13th St. between 3rd and 4th 134 E. 13th
* jonathan hayess computer
* surfhere
* smugmonkey

13th St. between University and 5th Ave 27 E. 13th St
* Threescompany
* SurfandSip
* Waycool

5th Ave between 13th and 14th
* KillBill
* Cyber Other
* Cyber Wireless

5th Ave between 16th and 19th
* Anil
* Human
* heartsclub

5th Ave between 19th and 23rd
* godzilla
* The Lounge
* mookie
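As a rough illustration of the cobbling step, here is a minimal Python sketch that turns location-keyed SSID lists like the ones above into three-line verses. This is not the original WiFiKu code. The helper name `ssids_to_verse`, the `DEFAULT_NAMES` set, and the choice to simply take the first three names found on a block (with no syllable counting) are all assumptions for illustration.

```python
# Default access point names to drop, per the note above about
# un-renamed Linksys and Netgear APs. (Hypothetical constant.)
DEFAULT_NAMES = {"linksys", "netgear"}


def ssids_to_verse(location, ssids, lines=3):
    """Cobble the first few SSIDs found at a location into a short verse.

    Hypothetical helper: there is no syllable counting, so the result is
    haiku-shaped rather than a strict 5-7-5 haiku.
    """
    picked = [
        s for s in ssids
        if s.strip() and s.strip().lower() not in DEFAULT_NAMES
    ][:lines]
    return location + "\n" + "\n".join("  " + s for s in picked)


# A few entries copied from the walk log above.
walk_log = {
    "Avenue A between 2nd 3rd": ["cafe.com", "BlackHole", "angellagoddard"],
    "5th Ave between 19th and 23rd": ["godzilla", "The Lounge", "mookie"],
}

for location, ssids in walk_log.items():
    print(ssids_to_verse(location, ssids))
    print()
```

Keeping the street segment as the title line keeps the "where" attached to the "what", which is most of the psychogeographic point.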
So I guess, thinking about Poem/1 and such, somehow the idiom of Poetry has been with me for, well, that's 20 years, eh? I think that's partially because you can get away with nonsense or things that are not immediately valuable for their utility. Like, people don't wardrive for the sake of making poetry; they usually do it to find open access points and then exploit them for nefarious who-knows-what-but-you-might-guess. If someone wardrove and said they were a War Driving Poet, you might at one level be, like.. oh, okay. But... why?
So now here I am wandering into wondering about a Poem OS, so I asked ChatGPT to assume I had an OpenAI API key (which I do) and an ElevenLabs API key (which I do) and gave it some specifications.
How about a Python program that interfaces with the Open AI API and with the ElevenLabs API that generates a short pithy poem and then generates audio of that poem read by an ElevenLabs voice. Assume I have API keys for both of these services.
I'd like to be able to provide the specific prompt to generate the poem in a configuration file as an array of possible prompts for each day of the week, and assume one poem is generated per day.
I got this back, amongst some other stuff. It didn't quite work, but it was enough of a simple scaffolding that I could then start to fix and refine it.
import json
import openai
import requests
from datetime import datetime

# Load your API keys from an environment variable or a secure file
openai_api_key = "your_openai_api_key"
elevenlabs_api_key = "your_elevenlabs_api_key"

# Configure your OpenAI and ElevenLabs API keys
openai.api_key = openai_api_key

# Load prompts from the configuration file
with open("config.json", "r") as file:
    prompts = json.load(file)

# Determine today's day and select a prompt
today = datetime.now().strftime("%A")
prompt = prompts[today][0]  # Choosing the first prompt for simplicity

# Generate a poem using OpenAI's API
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=f"Write a short, pithy poem about {prompt}.",
    max_tokens=100,
    temperature=0.7
)

poem = response.choices[0].text.strip()
print("Generated Poem:\n", poem)

# Generate audio of the poem using ElevenLabs API
elevenlabs_url = "https://api.elevenlabs.io/synthesize"
elevenlabs_headers = {"Authorization": f"Bearer {elevenlabs_api_key}"}
elevenlabs_data = {
    "text": poem,
    "voice": "selected_voice_id"  # You need to select a voice ID from ElevenLabs' list
}

audio_response = requests.post(elevenlabs_url, json=elevenlabs_data, headers=elevenlabs_headers)

if audio_response.ok:
    # Save the audio file. The exact method depends on ElevenLabs' response format
    with open("poem_audio.mp3", "wb") as audio_file:
        audio_file.write(audio_response.content)
    print("Audio generated successfully.")
else:
    print("Failed to generate audio.")
After less than an hour or so of tweaking the initial response from ChatGPT (which was a great start but not quite there, and had some API interfacing entirely wrong) I started effectively evolving, reconsidering, and refining what I wanted this PoemOS thing to do. I'm thinking: okay, a different kind of Poem for each day of the week, just focusing on the day of the week.
And now I'm thinking.. what about a version that listens to action in the Near Future Laboratory Discord community and makes a poem about that, or the kind of low-hanging-fruit edition that does a News Poem based on what happened that day or the day before, or a weekly summary as a Poem. I bet that'd be kinda weird?
So now I'm imagining what I would do with an mp3 file of something/some voice reading a poem that something else made.
But, sitting there in the studio and listening to the reading voice reading a machine-generated Poem didn't feel done or complete.
So I drop that audio file into Descript, which is one hammer in the toolbox I use to produce the Near Future Laboratory Podcast. Now I have a transcript and can use Descript's very awkward side tool to create one of those audiograms that basically strums out the libretto/text/transcript, or whatever you'd call it, for the generated audio. So now I have a poem, being read, with the words playing out in a video.
And now I'm taking this video and I drop it into DaVinci Resolve Studio Edition, and stir in some graphics and some background video I shot the other day, and now I have a little video poem.
So, this was just a couple-hour Thursday evening experiment, the day after a pretty focussed and intense day at Chapman University, when I wanted to do something that just felt like no one was necessarily expecting anything.
A study of an artifact from a future in which you get a Poem popped up into your audioscape.
Next steps? Well, there are so many adjacent possibilities that I'll reserve new ideas for another one of those kinds of wandering days looking for different kinds of possible futures.
Here's the Github repo, if you're curious: https://github.com/NearFutureLaboratory/poem_os
And this is the evolution, with me banging away on the keyboard to imagine in code and wandering around half-baked ideas about what I was doing, about 45 minutes after what ChatGPT initially suggested 👇🏽
import json
# import openai
import requests
from datetime import datetime
from openai import OpenAI
from pydub import AudioSegment
from pydub.playback import play
import os


def create_poem(prompt):
    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-4-turbo-preview",
    )

    result = response.choices[0].message.content.strip()
    return result

def get_all_voices():
    # An API key is defined here. You'd normally get this from the service you're accessing. It's a form of authentication.
    XI_API_KEY = elevenlabs_api_key

    # This is the URL for the API endpoint we'll be making a GET request to.
    url = "https://api.elevenlabs.io/v1/voices"

    # Here, headers for the HTTP request are being set up.
    # Headers provide metadata about the request. In this case, we're specifying the content type and including our API key for authentication.
    headers = {
        "Accept": "application/json",
        "xi-api-key": elevenlabs_api_key,
        "Content-Type": "application/json"
    }

    # A GET request is sent to the API endpoint. The URL and the headers are passed into the request.
    response = requests.get(url, headers=headers)

    # The JSON response from the API is parsed using the built-in .json() method from the 'requests' library.
    # This transforms the JSON data into a Python dictionary for further processing.
    # I want to keep a list of the current voices I have available.
    data = response.json()
    with open('./voices.json', 'w') as json_file:
        json.dump(data, json_file, indent=4)

    # A loop is created to iterate over each 'voice' in the 'voices' list from the parsed data.
    # The 'voices' list consists of dictionaries, each representing a unique voice provided by the API.
    for voice in data['voices']:
        # For each 'voice', the 'name' and 'voice_id' are printed out.
        # These keys in the voice dictionary contain values that provide information about the specific voice.
        print(f"{voice['name']}; {voice['voice_id']}")

    return data['voices']


# Load configuration, API keys, and prompts from the configuration file
with open("config.json", "r") as file:
    config = json.load(file)

openai_api_key = config["api_keys"]["openai"]
elevenlabs_api_key = config["api_keys"]["elevenlabs"]

with open("prompts.json", "r") as file:
    p = json.load(file)

prompts = p["prompts"]

# Configure your OpenAI API key
OpenAI.api_key = openai_api_key
client = OpenAI(
    api_key=OpenAI.api_key,
)

voices = get_all_voices()

# Determine today's day and select a prompt
today = datetime.now().strftime("%A")
prompt = prompts[today]["prompts"][0]
# Choosing the first prompt for simplicity

poem = create_poem(prompt)

print("Generated Poem:\n", poem)

CHUNK_SIZE = 1024
voice_id = prompts[today]["voice_id"]

# Generate audio of the poem using ElevenLabs API
elevenlabs_url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
print(elevenlabs_url)

data = {
    "text": f"{poem}",
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.5
    }
}

headers = {
    "Accept": "audio/mpeg",
    "Content-Type": "application/json",
    "xi-api-key": elevenlabs_api_key
}

response = requests.post(elevenlabs_url, json=data, headers=headers)
if response.status_code != 200:
    # Report the status of the request made just above
    print(f"Failed to generate audio. Status code: {response.status_code}")
else:
    # Assuming `prompt` is the selected prompt for today
    today_date = datetime.now().strftime("%d%m%Y")
    today_day = datetime.now().strftime("%A").lower()
    right_now = datetime.now().strftime("%H%M%S")
    # Assuming `prompts[today]["prompts"]` is the list of prompts for today
    prompt_index = prompts[today]["prompts"].index(prompt)

    directory = today_day
    if not os.path.exists(directory):
        os.makedirs(directory)

    # Format the filename
    filename_root = f"{today_day}_{today_date}_{right_now}_{prompt_index}"
    filename_audio = f"{today_day}_{today_date}_{right_now}_{prompt_index}.mp3"
    filename_path = f"{directory}/{filename_audio}"
    with open(filename_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            if chunk:
                f.write(chunk)

    target_voice_id = voice_id
    voice_name = ''
    # Find the voice with the specified voice_id
    for voice in voices:
        try:
            print(voice)
            voice_id = voice['voice_id']
            print(voice_id)
            if voice_id == target_voice_id:
                voice_name = voice['name']
                print(voice_name)
                break
        except TypeError as e:
            print(f"Error: {e}, with element: {voice}")

    audio = AudioSegment.from_mp3(filename_path)
    duration_seconds = len(audio) / 1000.0  # Pydub uses milliseconds

    # try to make a compact version if it goes long? could be cartoon-y, but..
    target_duration = 59  # Target duration in seconds
    timed_filename = filename_root
    if duration_seconds > target_duration:
        speed_up_factor = duration_seconds / target_duration
        # Speed up the audio
        audio = audio.speedup(playback_speed=speed_up_factor)
        # To save the modified audio
        timed_filename_path = f"{directory}/{filename_root}_59.mp3"
        timed_filename_actual = f"{filename_root}_59.mp3"
        audio.export(timed_filename_path, format="mp3")

    # Collect the data to save in a JSON file as a kind of record of what happened
    poem_data = {
        "day": today_day,
        "prompt": prompt,
        "poem": poem,
        "voice_name": voice_name,
        "voice_id": voice_id,
        "date": today_date,
        "audio": {"filename": filename_audio,
                  "duration": duration_seconds,
                  "timed_filename": timed_filename}
    }

    # Determine the JSON filename (replace .mp3 with .json in the audio filename)
    json_filename = filename_audio.replace('.mp3', '.json')

    # Save to JSON file
    with open(json_filename, 'w') as json_file:
        json.dump(poem_data, json_file, indent=4)

    print(f"Data saved to {json_filename}.")
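The "make a compact version" step in that script comes down to one ratio: a reading that runs longer than the target gets played back faster by duration divided by target. Here is a small sketch of just that arithmetic. The helper name `speedup_factor` is hypothetical and not in the repo; the 59 second default is the target used in the script.

```python
def speedup_factor(duration_seconds, target_seconds=59):
    """Return the playback-speed multiplier needed to fit the target length.

    Readings already at or under the target are left alone (factor 1.0).
    """
    if duration_seconds <= target_seconds:
        return 1.0
    return duration_seconds / target_seconds


# A 70.8 second reading has to play about 1.2x faster to land at 59 seconds.
print(round(speedup_factor(70.8), 2))
```

Pulling the ratio into its own function makes it easy to cap the factor later if sped-up readings start to sound too cartoon-y.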
And this is an example of the JSON file containing weekday prompts. I go in here and adjust these routinely. I don't want this to be some 100% automated robot. That's somehow less interesting than it being more of a wind-up toy that you have to, you know, wind up, clean, adjust, set on a path rather than some daemon that just runs in the background.
{
  "prompts": {
    "Sunday": {
      "prompts": ["Write a short pithy humorous poem about Sunday", "Please write a lymric about Sunday"],
      "voice_id": "xxx"
    },
    "Monday": {
      "prompts": ["Please write a knock-knock joke about Monday", "Please write a satirical poem about Monday"],
      "voice_id": "xxx"
    },
    "Tuesday": {
      "prompts": ["Write a short pithy humorous poem about Tuesday", "Please write a lymric about Tuesday"],
      "voice_id": "xxx"
    },
    "Wednesday": {
      "prompts": ["Please write a knock-knock joke about Wednesday", "Please write a satirical poem about Wednesday"],
      "voice_id": "xxx"
    },
    "Thursday": {
      "prompts": ["Write a humorous poem for Thursday", "Please write a lymric about Thursday"],
      "voice_id": "xxx"
    },
    "Friday": {
      "prompts": ["Please write a humorous poem for Friday anticipating a weekend of fun that can be read in 60 seconds!", "Please write a satirical poem about Friday"],
      "voice_id": "xxx"
    },
    "Saturday": {
      "prompts": ["Please write a knock-knock joke about Saturday", "Please write a satirical poem about Saturday"],
      "voice_id": "xxx"
    }
  }
}
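The script always takes the first prompt in the day's list. One possible variation, sketched here on the assumption that the prompts data has the shape shown above, picks one of the day's prompts at random instead. The helper name `pick_prompt` is hypothetical and not part of the repo.

```python
import random
from datetime import datetime


def pick_prompt(prompts, day=None, rng=random):
    """Pick one prompt, plus the voice id, for a weekday from a prompts dict.

    `prompts` is assumed to be the inner "prompts" object of prompts.json,
    keyed by weekday name ("Monday", "Tuesday", ...).
    """
    day = day or datetime.now().strftime("%A")
    entry = prompts[day]
    return rng.choice(entry["prompts"]), entry["voice_id"]


# Trimmed-down stand-in for the "prompts" object in prompts.json.
prompts = {
    "Friday": {
        "prompts": [
            "Please write a humorous poem for Friday anticipating a weekend of fun that can be read in 60 seconds!",
            "Please write a satirical poem about Friday",
        ],
        "voice_id": "xxx",
    }
}

prompt, voice_id = pick_prompt(prompts, day="Friday")
print(prompt, "->", voice_id)
```

Passing `day` explicitly makes it easy to wind the toy up for a different weekday by hand, which fits the adjust-it-yourself spirit better than a fully automatic schedule.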
1. That is not a quibble or a snarky remark. I've built commercial hardware before, and I'm not talking about with a huge team but rather on a bench in my backyard studio, which is somewhat how I imagine Matt's doing this Poem/1 thing, so I'm entirely empathetic to the challenges, trade-offs, anxiety, and all of that. Truly. If I put on my dilettante's spectacles, the cable completely throws off the lines and means this would not fit anywhere on my shelf without trailing a cable off alongside plants and photo frames, so I'm a bit baffled. Maybe it's just for charging and you can remove it? Maybe you can hoard a bunch of poems for the week and don't have to leave it connected? I haven't dug into it but we'll see when mine gets delivered.