Using an RNN to generate Bill Wurtz notes

Textgenrnn is fun

2019-10-05 project

Bill Wurtz is an American musician who became reasonably famous through short musical videos posted to Vine and YouTube. I was searching through his website the other day, and stumbled upon a page labeled notebook, and thought I should check it out.

Bill’s notebook is a large collection (about 580 posts) of random thoughts, ideas, and sometimes just strings of words. It is a prime source of entertainment, and of neural network input.

“If you are looking to burn something, fire may be just the ticket” - Bill Wurtz

Choosing the right tool for the job

If you haven’t noticed yet, I’m building a neural net to generate notes based on his writing style and content. Anyone who has read my first post will know that I have already done a similar project in the past. This means it is time to reuse some code!

For this project, I decided to use an amazing library by @minimaxir called textgenrnn. This Python library will handle all of the heavy (and light) work of training an RNN on a text dataset, then generating new text.

Building a dataset

This project was a joke, so I didn’t bother with properly grabbing each post, categorizing them, and parsing them. Instead, I built a little script to pull every HTML file from Bill’s website, and regex out the body. This ended up leaving some artifacts in the output, but I don’t really mind.

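import re
import requests

def loadAllUrls():
    # Fetch the notebook index page (the URL is left blank here)
    page = requests.get("").text

    # Pull the target of every link out of the index
    links = re.findall(r"HREF=\"(.*)\"style", page)

    return links

def dumpEach(urls):
    for url in urls:
        # Download the post, drop line-break tags, and flatten it to one line
        page = requests.get(f"{url}").text.strip().replace(
            "</br>", "").replace("<br>", "").replace("\n", " ")

        # Treat everything after </head> as the body of the note
        data = re.findall(r"</head>(.*)", page, re.MULTILINE)

        # ensure data
        if len(data) == 0:
            continue

        # One note per line on stdout
        print(data[0])

urls = loadAllUrls()
print(f"Loaded {len(urls)} pages")
dumpEach(urls)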

This script will print each of Bill’s notes to the console (on its own line). I used a simple redirect to write this to a file.

python3 > posts.txt
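If the leftover artifacts ever become a problem, a small cleanup pass over posts.txt could remove most of them. This is only a rough sketch, not something I ran for this project: clean_note and clean_file are hypothetical helpers, and the tag-stripping regex is a blunt assumption about what the artifacts look like (stray tags and HTML entities).

```python
import html
import re

def clean_note(raw: str) -> str:
    """Strip leftover markup from one scraped note (hypothetical helper)."""
    # Replace anything that looks like an HTML tag with a space
    text = re.sub(r"<[^>]+>", " ", raw)
    # Decode entities such as &amp; into plain characters
    text = html.unescape(text)
    # Collapse runs of whitespace into single spaces
    return " ".join(text.split())

def clean_file(src: str, dst: str) -> int:
    """Clean every line of src into dst, dropping lines that end up empty."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            note = clean_note(line)
            if note:
                fout.write(note + "\n")
                kept += 1
    return kept
```

Calling clean_file("posts.txt", "posts_clean.txt") would write the cleaned notes to a second file (the output name is made up) and return how many notes survived.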


To train the RNN, I just used some of textgenrnn’s example code to read the posts file and train on it. The trained weights are saved to an HDF5 file.

from textgenrnn import textgenrnn

generator = textgenrnn()
generator.train_from_file("/path/to/posts.txt", num_epochs=100)

This takes quite a while to run, so I offloaded it to a Droplet, and left it running overnight.
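Once training finishes, generating notes could look roughly like the sketch below. It assumes the weights ended up in textgenrnn_weights.hdf5 (which I believe is the default file name textgenrnn writes, but check your working directory), and the temperature value is an arbitrary pick. generate_notes and novel_notes are hypothetical helpers; the second one simply filters out anything the model copied word for word from the training data.

```python
def generate_notes(weights_path="textgenrnn_weights.hdf5", n=10, temperature=0.5):
    """Load trained weights and return n generated notes as a list of strings."""
    # Imported lazily because textgenrnn pulls in TensorFlow
    from textgenrnn import textgenrnn

    # weights_path is an assumption; point it at wherever training saved the file
    generator = textgenrnn(weights_path=weights_path)
    return generator.generate(n=n, temperature=temperature, return_as_list=True)

def novel_notes(generated, training_lines):
    """Keep only generated notes that do not appear verbatim in the training data."""
    seen = {line.strip().lower() for line in training_lines}
    return [note for note in generated if note.strip().lower() not in seen]

if __name__ == "__main__":
    with open("/path/to/posts.txt", encoding="utf-8") as f:
        training_lines = f.readlines()

    for note in novel_notes(generate_notes(), training_lines):
        print(note)
```

Lower temperatures tend to give safer, more repetitive text, while higher ones get weirder, which is probably what you want for this dataset.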

The results

Here are some of my favorite generated notes:

“note: do not feel better”

“hi I am a car.”

“i am stuff and think about this before . this is it, the pond. how do they make me feel better?”

“i am still about the floor”

Not perfect, but it is readable English, so I call it a win!

Play with the code

I have uploaded the basic code, the scraped posts, and a partial HDF5 file to GitHub for anyone to play with. Maybe make a Twitter bot out of this?


Made with ♥ by Evan Pratten