Written by Roy van Rijn (royvanrijn.com) on Jan 30, 2017 13:28:12

Vikipedia: A Java YouTube bot

YouTube, the future of television. I’ve got a lot of subscriptions to YouTube channels that deliver quality content, and those shows are ‘cast’ (using my Chromecast) to my TV. Another thing I often do is look up information, for example I watch talks from programming conferences like Devoxx using YouTube.

This gave me an idea, what if I can take some free information (like Wikipedia, all creative commons) and use that to create content for YouTube? Maybe I’ll even get some views :-)

So this is what I’ve come up with, the following video is generated completely automatically:

Step 1: Wikipedia API

The first step in creating these videos was to get some actual freely available content. For this I chose to use Wikipedia. This can be accessed using JSON/REST, and to quickly hack this into place I decided to use a framework called ‘Unirest’. This makes the while prototype easy:

// API URI for the English Wikipedia
private final String ENGLISH_WIKIPEDIA = "https://en.wikipedia.org/w/api.php";

// Using a 'search query', name of the page, retrieve the content:
private Optional<String> getPageContent(final String query) throws UnirestException {
    //Example URI created for query 'Stack Overflow': https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&titles=Stack%20Overflow&redirects=true
    HttpResponse<JsonNode> response = Unirest.get(ENGLISH_WIKIPEDIA)
            .queryString("format", "json")
            .queryString("action", "query")
            .queryString("prop", "extracts")
            .queryString("titles", query)
            .queryString("redirects", "true")
            .asJson();

    JSONObject pagesObject = response
            .getBody()
            .getObject()
            .getJSONObject("query")
            .getJSONObject("pages");
    if(pagesObject.keys().hasNext()) {
        String pageKey = pagesObject.keys().next();
        String htmlContent = pagesObject.getJSONObject(pageKey).getString("extract");
        return Optional.of(htmlContent);
    }
    return Optional.empty();
}

The resulting JSON (see this example) contains the Wikipedia page in simple processed HTML. From this HTML I used a library called ‘Jsoup’ to extract the plain text. The first alinea of text contains the introduction of the topic, this is what I’ve decided to use as content for the video. Initially I had all the content of the page in the videos, but they quickly turned out to be 30+ minutes long and very boring.

This is the code I used to go from HTML to the first alinea of text:

/**
 * Clean the HTML, get inner text of first alinea and break into sentences.
 * @param htmlContent
 * @return
 */
private List<String> cleanAndBreakIntoSentences(final String htmlContent) {
    Document document = Jsoup.parse(htmlContent);

    List<String> fullText = new ArrayList<>();
    Elements elements = document.body().children();

    Iterator<Element> iterator = elements.iterator();
    while(iterator.hasNext()) {
        Element elem = iterator.next();

        // All the text is contained in paragraph elements:
        if(elem.is("p")) {

            String source = elem.text();

            // Break the text into sentences using the smart JDK 'BreakIterator':
            BreakIterator breakIterator = BreakIterator.getSentenceInstance(Locale.US);
            breakIterator.setText(source);
            int start = breakIterator.first();
            for (int end = breakIterator.next(); end != BreakIterator.DONE; start = end, end = breakIterator.next()) {
                fullText.add(source.substring(start,end));
            }
        }
        // If we reach a header, stop processing, we only need the first section/alinea:
        if(elem.is("h2")) {
            // We break on the first (sub-) title:
            break;
        }
    }
    return fullText;
}

Step 2: Stunning visuals with Java 2D

We have some text, now how do we turn this into a video? We need visuals! I decided to cut the content into seperate lines, and for each line I have a single ‘slide’. This is an (buffered) image created using Java’s 2D API and saved to file.

/**
 * For each sentence, create a buffered image and paint some text on this.
 * We need the text to be 'wrapped', for this I repurpose a JTextArea, on a JPanel.
 * 
 * Also increase the font size so the text fits on the screen.
 */
public void paintText(Graphics2D g2, String text) {
    JPanel panel = new JPanel();
    panel.setOpaque(true);
    panel.setSize(new Dimension(1920, 1080));
    panel.setVisible(true);

    JTextArea area = new JTextArea();
    area.setSize(new Dimension(1920,1080));
    area.setFocusable(false);
    area.setBorder(BorderFactory.createEmptyBorder(BORDER_SIZE, BORDER_SIZE, BORDER_SIZE, BORDER_SIZE));
    area.setWrapStyleWord(true);
    area.setLineWrap(true);
    area.setText(text);
    area.setOpaque(true);

    int fontSize = 10;
    // Increase the font size as long as it fits on screen (and don't go bigger than 100, looks ugly):
    while(area.getPreferredScrollableViewportSize().getHeight() < 1080 && fontSize <= 100) {
        fontSize += 10;
        area.setFont(new Font("Futura", Font.PLAIN, fontSize));
    }
    area.setFont(new Font("Futura", Font.PLAIN, fontSize - 10));
    panel.add(area);

    // Print this panel on the Graphics2D object (which is a BufferedImage)
    panel.printAll(g2);
}

Step 3: Text-To-Speech

Instead of using a Java Text-To-Speech (TTS) library, which aren’t very clear or good, I decided to use the build-in OS/X command ‘say’. On the command line you can execute: say Hello world!

The ‘say’ command supports different voices and also has the ability to store the output in a file. Which is perfect for us!

    // Execute the say command with a given filename, voice and sentence to speak:
    final Process p = Runtime.getRuntime().exec("say -o " + filename + " -v " + voice + " -r 210 " + line + "");

    new Thread(() -> {
        BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line1 = null;

        try {
            while ((line1 = input.readLine()) != null)
                System.out.println(line1);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }).start();

    // Wait for the say-command to finish
    p.waitFor();

Step 4: Putting it all together

What do we have now? A list of sentences from Wikipedia, for each sentence we have an image and we have an audio snippet created with TTS. Now it is time to put everything together and create a video. The easiest way to do this is to use ffmpeg, a powerful command line video tool. To make this easier I created a small method ‘executeCommandLine’ which does the same thing as the snippet with ‘Runtime.getRuntime().exec’ shown above.

Combine the image (png file) and the TTS audio (mp4 file) into a series of MPG files:

    for(VikipediaSentence sentence : sentences) {
        String filename = sentence.getFilename();
        //Combine audio and image into a short video:
        executeCommandLine("ffmpeg -loop 1 -y -i " + filename + ".png -i " + filename + ".mp4 -tune stillimage -shortest " + filename + ".mpg");
    }

We now have a series of small MPG files. They need to be combined into a single video. For this we need a text file which contains a list of all the MPG files. This is stored on disc and next we execute the following command:

//merge all:
executeCommandLine("ffmpeg -f concat -safe 0 -i output/snippet_list.txt -c copy output/output.mpg");

Next we convert the video from one large MPG to MP4 file (which is a nicer, compressed format):

//convert to mp4:
executeCommandLine("ffmpeg -i output/output.mpg -c:v libx264 -c:a aac -strict experimental -b:a 192k -pix_fmt yuv420p -shortest output/output.mp4");

And to add a little more ‘production value’ I’ve gone online and found some soothing (free, creative commons) background music. This music is looped into a large audio file of 50 minutes (enough for all videos). Again using ffmpeg the audio is mixed in:

// add background music:
executeCommandLine("ffmpeg -i output/output.mp4 -i input/background_loop.mp3 -filter_complex amerge -c:v copy -c:a aac -strict experimental output/finaloutput.mp4");

There it is, we have a complete video with text audio, text visuals and background music! Only one thing left to do, uploading to YouTube!

Step 5: Uploading everything to YouTube

To automate the process of uploading, I first needed a place to put the videos, so I created a new YouTube channel named: Vikipedia

Next we can use the Google YouTube Data API. It turns out this is very easy and well documented. I don’t even need to share the code I wrote here because I used the example from Google’s website itself! Check out this amazing documentation. Very detailed and does exactly what I needed it to do.

The example code takes a video file and uploads it. The only things I changed were setting the correct description, I was ready to generate all the content!

Not implemented (yet?)

Things I wanted to implement (but didn’t and probably never will):

Add a little color to the videos (black and white is boring)
Add waveform from ffmpeg (just to keep it visually interesing)
Download (did this) and use (didn’t do this) the images from the Wikipedia page and add those to the video

Final thoughts

Yes, it is pretty easy to hack together a bot to create YouTube videos.
No, the quality isn’t that good… yet?
Yes, it can be made in a single afternoon!
No, I’m not going to automate the spidering of Wikipedia and upload a million videos… (but I could easily do that!)

Up to now all the videos created are based on a small list of ‘hot topics’ I found on Wikipedia and Twitter. I don’t want to automate the entire process and flood YouTube, the content just isn’t good enough for that. But it was a fun project for a lazy afternoon! I’ve learned a lot of new tricks, maybe you can benefit from that as well.

The next thing I want to do with the 400+ YouTube videos I’ve generated is to see how (and if) people are going to watch them. Do they show up in the search results? Which topics are searched and watched? Maybe more on that in a future post!

Do you have any good ideas on how to generate videos, let me know in the comments! I’ve created a bot that generates music some years, maybe I should improve that bot to upload its own YouTube music videos? ;-)