Technocat Games

Fast ArrayBuffer to Base64 in JavaScript

There's a surprising amount of things we can learn from something that seems so simple.

December 19, 2024

If you want the code

Here it is:

function base64ToArrayBuffer(base64String) {
    const binaryString = atob(base64String);
    const len = binaryString.length;
    const array = new Uint8Array(len);
    for (let i = 0; i < len; i++) {
        array[i] = binaryString.charCodeAt(i);
    }
    return array.buffer;
}

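function arrayBufferToBase64(buffer) {
    // Widen each byte to a 16-bit code unit (0x00 to 0xFF).
    const array = Uint16Array.from(new Uint8Array(buffer));
    // "UTF-16" means UTF-16LE here, matching typed arrays on little-endian machines.
    const binaryString = new TextDecoder("UTF-16").decode(array);
    return btoa(binaryString);
}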

If you want a story

I wanted to convert an ArrayBuffer to and from Base64 for a project. It’s crazy that this is not a standard feature in browsers to this day. In Node.js, you can simply do Buffer.from(arrayBuffer).toString("base64"), and it’s extremely fast.

Searching on Google, I came across this Stack Overflow question. Trying out each answer, I found that some of them don’t work with large buffers (think 50MB), or even worse, don’t work with raw binary data at all. In this blog post I will break down each solution and its pros and cons. Then I will make a fundamental analysis of the problem and propose a new solution. If you like to get nerdy, this will be right up your alley.

The Stack Overflow solutions

The naïve approach

Why not just convert the buffer to a String and chuck it into btoa()?

// https://stackoverflow.com/a/73588368
function naive(buffer) {
    const decoder = new TextDecoder()
    const decodedText = decoder.decode(buffer)
    return btoa(decodedText)
}

If we make this simple test, we’ll quickly find the problem.

let notAscii = new Uint8Array([255]);
naive(notAscii.buffer);
// Uncaught DOMException: String contains an invalid character

Here’s why: TextDecoder defaults to interpreting the buffer as a UTF-8 string. In UTF-8, the first bit of each byte has a special meaning. If the first bit is 0, the byte is a complete 1-byte character. If the first bit is 1, the byte is part of a character that is longer than 1 byte.

0b0XXXXXXX ✅ This represents a 1 byte long character.

0b1XXXXXXX ❌ The decoder expects another byte to complete the character.

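So with the byte value 255 (0b11111111), the decoder sees a byte that cannot form a valid character on its own. By default it does not throw; it substitutes the replacement character U+FFFD, and btoa() then rejects that character because its code is above 255. We can’t just treat any buffer as a String.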
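To see where the failure comes from, here is a minimal sketch that splits the naive function into its two steps. It assumes an environment with global TextDecoder and btoa() (browsers and recent Node.js versions); the variable names are just for illustration.

```javascript
// Decode a single byte that can never appear in valid UTF-8.
const decodedText = new TextDecoder().decode(new Uint8Array([255]));

// The decoder did not throw: it substituted U+FFFD (the replacement character).
const replacement = decodedText.charCodeAt(0).toString(16);
console.log(decodedText.length, replacement); // 1 "fffd"

// btoa() only accepts code units up to 0xFF, so this is the call that throws.
let btoaThrew = false;
try {
    btoa(decodedText);
} catch (e) {
    btoaThrew = true;
}
console.log(btoaThrew); // true
```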

The String building approach

Here’s an answer that will actually work:

// https://stackoverflow.com/a/9458996
function arrayBufferToBase64(buffer) {
    var binaryString = '';
    var array = new Uint8Array(buffer);
    var len = array.byteLength;
    for (var i = 0; i < len; i++) {
        binaryString += String.fromCharCode(array[i]);
    }
    return window.btoa(binaryString);
}

Here we build a valid String, byte by byte, by converting each byte into a valid Unicode character using String.fromCharCode(). The resulting String is then passed to btoa().

Let’s test it in Firefox and Chrome (the only two relevant browsers).

const randomBytes = new Uint8Array(50000000); // 50MB
for (let i = 0; i < randomBytes.length; i++) {
    randomBytes[i] = Math.floor(Math.random() * 256);
}
const buffer = randomBytes.buffer;

function measureExecutionTime(fn) {
    const start = performance.now();
    const result = fn();
    const end = performance.now();
    console.log(`${(end - start).toFixed(2)} ms`);
    return result;
}

measureExecutionTime(() => arrayBufferToBase64(buffer));
// Firefox: 860.00 ms
// Chrome: 4575.80 ms

Contrary to what you might expect, Firefox did it a whopping 5.3 times faster. Even then, almost a second is way too much for simply converting to Base64. It turns out that concatenating a string 50 million times is not efficient.
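One common way to cut down on the number of concatenations is to convert many bytes per call with String.fromCharCode.apply(). The sketch below is not taken from the answer above and was not part of the benchmarks in this post; the function name and the chunk size of 0x8000 are assumptions, chosen conservatively because engines limit how many arguments a single call can take.

```javascript
// Hypothetical variation: build the binary string in chunks instead of byte by byte.
function arrayBufferToBase64Chunked(buffer, chunkSize = 0x8000) {
    const bytes = new Uint8Array(buffer);
    const parts = [];
    for (let i = 0; i < bytes.length; i += chunkSize) {
        // subarray() creates a view, not a copy; apply() spreads it as arguments.
        const chunk = bytes.subarray(i, i + chunkSize);
        parts.push(String.fromCharCode.apply(null, chunk));
    }
    // One join at the end instead of millions of += operations.
    return btoa(parts.join(""));
}
```

It is called the same way as the original: arrayBufferToBase64Chunked(buffer).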

The FileReader approach

This answer brings a creative use of Blob and FileReader:

// https://stackoverflow.com/a/58339391
async function encode(buffer) {
    return new Promise((resolve, reject) => {
        const blob = new Blob([buffer]);
        const reader = new FileReader();
        reader.onload = e => resolve(e.target.result.split(',')[1]);
        reader.onerror = reject;
        reader.readAsDataURL(blob);
    });
}

They take advantage of the fact that readAsDataURL() returns the blob’s contents as a Base64-encoded data URL; everything after the comma is the Base64 payload. This is the fastest solution I’ve found, clocking in at 94.00 ms on Firefox and 102.60 ms on Chrome.

The only issue with this approach is that it is asynchronous: the result is only available through await (inside another async function) or a .then() callback.

Back to the basics

To make a good, fast and non-async function we need to first understand what btoa() really does.

Before ArrayBuffers and Uint8Arrays were invented, people used Strings to store binary data. Strings in JavaScript are encoded in UTF-16, which means that each character uses 2 bytes at a minimum. So they simply used the lower byte of each character to store data, and that makes it a valid String.

For example, FF FF FF can be encoded as FF 00 FF 00 FF 00, which is the valid UTF-16 string "ÿÿÿ". A String with binary data encoded such as this is called a binary string.

btoa() will interpret the lower byte of each character as an input. If the upper byte is not 00, you will get an error.
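A small sketch of that behavior, assuming a global btoa() (browsers and recent Node.js versions). The variable names are only for illustration.

```javascript
// Three characters whose code units are all 0x00FF: the bytes FF FF FF.
const binaryString = "\u00ff\u00ff\u00ff";
const encoded = btoa(binaryString);
console.log(encoded); // "////"

// A character whose upper byte is not 00 (U+0100) is rejected.
let rejected = false;
try {
    btoa("\u0100");
} catch (e) {
    rejected = true;
}
console.log(rejected); // true
```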

Building a new solution

If we look at this from a low-level perspective, it’s pretty simple:

  1. Create an array 2 times larger than the input

  2. Copy the data from the input to that array, skipping every other byte on the destination

    (Figure: copying a Uint8Array into a Uint16Array)

  3. Interpret the array as a UTF-16 string

  4. Pass that string to btoa()
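Written out literally, the four steps might look like the sketch below. The function name is hypothetical, and it uses the "utf-16le" label explicitly so that the byte order of the decoder matches the way the bytes are laid out by hand.

```javascript
// Hypothetical literal version of the four steps above.
function arrayBufferToBase64Loop(buffer) {
    const input = new Uint8Array(buffer);

    // Step 1: an array twice as large as the input (zero-filled by default).
    const widened = new Uint8Array(input.length * 2);

    // Step 2: copy each byte into the low byte of a 2-byte slot,
    // leaving every other destination byte as 00.
    for (let i = 0; i < input.length; i++) {
        widened[i * 2] = input[i];
    }

    // Step 3: interpret the bytes as little-endian UTF-16.
    const binaryString = new TextDecoder("utf-16le").decode(widened);

    // Step 4: hand the binary string to btoa().
    return btoa(binaryString);
}
```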

For step 2, I did write a for loop at first, but it turns out that there’s a slightly faster way. Since UTF-16 deals with 16 bits, why not use a Uint16Array instead?

const array = Uint16Array.from(new Uint8Array(buffer));

Uint16Array.from() will copy each value from the input array into the new array. It will implicitly convert each 8-bit value to 16 bits, which is exactly what we want.
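As a quick illustration of the widening, here is a small sketch with made-up sample values. One caveat worth stating as an assumption: typed arrays store multi-byte values in the platform's byte order, while the "UTF-16" label of TextDecoder means UTF-16LE, so this trick relies on running on a little-endian machine (which covers practically every device running a browser today).

```javascript
const bytes = new Uint8Array([0, 127, 255]);
const widened = Uint16Array.from(bytes);

// Same number of elements, same values, but each one now occupies 2 bytes.
console.log(widened.length);     // 3
console.log(widened.byteLength); // 6
console.log(Array.from(widened)); // [0, 127, 255]
```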

With that piece of the puzzle, we can get to our final answer:

function arrayBufferToBase64(buffer) {
    const array = Uint16Array.from(new Uint8Array(buffer));
    const binaryString = new TextDecoder("UTF-16").decode(array);
    return btoa(binaryString);
}

This takes 269.00 ms on Firefox and 241.20 ms on Chrome. It does lose to the FileReader approach, especially in Firefox. Well, I’m sorry to disappoint you. I do believe, though, that there’s value in it being synchronous, not forcing your code to use await or passing a callback.

Converting a Base64 string to an ArrayBuffer

I won’t spend too much time on this one, because it is less interesting than encoding to Base64. My first go at it was to make pretty much the same thing as encoding, but in reverse. Unfortunately, TextEncoder only supports UTF-8 in browsers, not UTF-16, which limits my options. I had to use charCodeAt():

function base64ToArrayBuffer(base64String) {
    const binaryString = atob(base64String);
    const array = Uint8Array.from(binaryString, c => c.charCodeAt(0));
    return array.buffer;
}

This takes 1157.00 ms on Firefox and 4244.20 ms on Chrome, Firefox once again handily beating the market leader. I believe it has something to do with allocating a new String for each character in binaryString. In the end, a simple loop is much faster.

function base64ToArrayBuffer(base64String) {
    const binaryString = atob(base64String);
    const len = binaryString.length;
    const array = new Uint8Array(len);
    for (let i = 0; i < len; i++) {
        array[i] = binaryString.charCodeAt(i);
    }
    return array.buffer;
}

The loop version comes at 200.00 ms on Firefox and Chrome redeems itself at 155.80 ms. All our effort will become irrelevant soon though, because…

Firefox saves the day

There is a proposal for a native method for all of this:

const base64 = new Uint8Array(buffer).toBase64();
const decodedBuffer = Uint8Array.fromBase64(base64).buffer;

Only Firefox has implemented this so far, encoding in 81.00 ms and decoding in 287.00 ms. Funnily enough, our decoding function beats the native solution.
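Until the native methods are available everywhere, one option is to feature-detect them and fall back to the functions from this post. This is a minimal sketch: the names encodeBase64 and decodeBase64 are hypothetical, and the fallback encoder assumes a little-endian platform.

```javascript
// Hypothetical wrapper: prefer the native method, otherwise use the fallback.
function encodeBase64(buffer) {
    const bytes = new Uint8Array(buffer);
    if (typeof bytes.toBase64 === "function") {
        return bytes.toBase64();
    }
    // Fallback: widen to 16 bits, decode as UTF-16LE, then btoa().
    const binaryString = new TextDecoder("utf-16le").decode(Uint16Array.from(bytes));
    return btoa(binaryString);
}

function decodeBase64(base64String) {
    if (typeof Uint8Array.fromBase64 === "function") {
        return Uint8Array.fromBase64(base64String).buffer;
    }
    // Fallback: atob() plus a plain loop over the binary string.
    const binaryString = atob(base64String);
    const array = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        array[i] = binaryString.charCodeAt(i);
    }
    return array.buffer;
}

// Round trip with a few sample bytes.
const sample = new Uint8Array([0, 1, 2, 253, 254, 255]);
const roundTripped = new Uint8Array(decodeBase64(encodeBase64(sample.buffer)));
console.log(Array.from(roundTripped)); // [0, 1, 2, 253, 254, 255]
```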

Conclusion

I conclude that if you got to this point of the text, you’re a massive nerd. I also conclude that it’s insane that it is taking so long for browsers to implement a native way of doing this simple task. Also Firefox has much better performance in string manipulation than I expected.

With this research we learned more about text encoding, TypedArrays, btoa() and a little bit of web history.

I couldn’t beat the FileReader version, but I believe I have found the fastest synchronous method. Maybe I could push this further and make a WebAssembly version… I think I will work on other projects instead.