Oct 27, 2023 · 15 mins read

Buffered vs Streaming Data Transfer

Comparing buffered and streaming data transfer in a Node.js server

Umakant Vashishtha




Introduction

In this article, we will look at different methods of data transfer and how to efficiently transfer data to and from a server over the HTTP protocol.

With the HTTP protocol, the client can either transfer data to the server (upload) or request data from the server (download).

The client or server can choose to transfer the data in multiple chunks, especially when the payload is larger than a few kilobytes, so that the impact of any packet loss is minimal.

For download operations, the server can send the data in chunks, while for upload operations, the client can send the data in chunks.
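
To make the download direction concrete, here is a minimal sketch (my own illustration, not code from the article) of a handler that sends a file to the client in chunks; the file path is a placeholder:

import fs from "fs";

// Hypothetical download handler: streams ./data/report.txt to the client
// in chunks instead of loading the whole file into memory first.
function chunkedDownloadHandler(req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  const fileStream = fs.createReadStream("./data/report.txt");
  // each chunk read from disk is written to the response as it arrives
  fileStream.pipe(res);
}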

When the request data is accumulated in memory and then processed, we call it buffered processing.
In contrast, when each chunk is processed as it is received, we call it streaming processing.

Processing can mean anything depending on the request, from storing the data in the file system to forwarding it to another server. For simplicity, we will only consider storing the data in the file system.

We will compare the two approaches and see which one is better in terms of performance and memory usage. We will also see how to implement both approaches in Node.js.

Buffered Data Transfer

Let’s take a look at the following code snippet, which implements a buffered upload handler. An explanation follows the snippet.

Buffered Request Handler
import fs from "fs";
import crypto from "crypto";

async function bufferedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let bytesRead = 0;
  let chunks = [];
  req.on("data", (chunk) => {
    bytesRead += chunk.length;
    // add chunks into an array; file content is guaranteed to be in order
    chunks.push(chunk);
  });
  req.on("close", () => {
    // Put all the chunks together into a single buffer
    let data = Buffer.concat(chunks).toString();
    // Save the content to a file on disk
    fs.writeFileSync(`./data/file-buffered-${uuid}.txt`, data, {
      encoding: "utf-8",
    });
    res.end(`Upload Complete, ${bytesRead} bytes uploaded.`);
  });
}

As you can see, the chunks of request data are pushed into an array in the order they are received and kept in memory for the entire duration of the request.


Fig: Buffered Data Transfer Flow Diagram

When the request has finished sending all chunks, a close event is emitted; the chunks are then concatenated into a single buffer and saved to the file system.

The chunks are held in memory, and once the request has finished, the garbage collector frees that memory on its next run.

Streaming Data Transfer

Let’s take a look at the following code snippet, which implements a streaming upload handler. An explanation follows the snippet.

Streaming Request Handler
import fs from "fs";
import crypto from "crypto";

async function streamedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let bytesRead = 0;
  let fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);
  req.pipe(fsStream); // req.read() -> fsStream.write()
  req.on("data", (chunk) => {
    bytesRead += chunk.length;
  });
  req.on("close", () => {
    fsStream.close();
    res.end(`Upload Complete, ${bytesRead} bytes uploaded.`);
  });
}

As can be seen, the chunks of the request body are written directly to the file system and are not held in memory for the entire duration of the request.


Fig: Streaming Data Transfer Flow Diagram

This is possible because the request implements the Readable Stream interface, so its content can be piped directly into a Writable Stream from fs.
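
Note that pipe() on its own does not propagate errors between the two streams. Node's pipeline() from stream/promises forwards errors and cleans up both sides; here is a minimal sketch of the same handler using it (the function name is my own):

import fs from "fs";
import crypto from "crypto";
import { pipeline } from "stream/promises";

// Sketch only: same behaviour as streamedUploadHandler, but pipeline()
// destroys both streams and rejects if either side errors out.
async function streamedUploadWithPipeline(req, res) {
  const uuid = crypto.randomUUID();
  try {
    await pipeline(req, fs.createWriteStream(`./data/file-streamed-${uuid}.txt`));
    res.end("Upload Complete");
  } catch (err) {
    res.statusCode = 500;
    res.end("Upload Failed");
  }
}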

Let’s use the handlers above in a basic HTTP server.

index.js
import { createServer } from "http";

const server = createServer(async (req, res) => {
  const { method, url } = req;
  console.log(req.method, req.url);
  if (method === "POST" && url === "/upload-buffer") {
    await bufferedUploadHandler(req, res);
    return;
  } else if (method === "POST" && url === "/upload-stream") {
    await streamedUploadHandler(req, res);
    return;
  }
});

server.listen(3000);
console.log("Server listening on http://localhost:3000");
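
To try the server out by hand, a small client script can upload a local file and print the response. This is a sketch of my own, assuming Node 18+ (for the global fetch) and a local file named ./data.csv, which is just a placeholder:

import fs from "fs";

// Quick manual test against the server above; adjust the file path as needed.
const body = fs.readFileSync("./data.csv");
const res = await fetch("http://localhost:3000/upload-stream", {
  method: "POST",
  body,
});
console.log(await res.text()); // e.g. "Upload Complete, ... bytes uploaded."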

Benchmarking

We will now compare the two approaches in terms of performance and memory usage by firing a number of concurrent requests. For the comparison, I also wanted to analyze the memory usage of the server in each case, so I changed the handlers to return the server's peak memory usage in the response, as below:

Polling Memory Usage Every 2ms
let peakMemory = 0;
setInterval(() => {
  let memoryUsage = process.memoryUsage();
  if (memoryUsage.rss > peakMemory) {
    peakMemory = memoryUsage.rss; // Resident Set Size
  }
}, 2);

async function streamedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);
  req.pipe(fsStream); // req.read() -> fsStream.write()
  req.on("close", () => {
    fsStream.close();
    res.end(peakMemory.toString());
  });
}
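
The article only shows the streaming handler after this change; presumably the buffered handler was updated the same way so that both endpoints report the peak RSS. A sketch of what that could look like, assuming it lives in the same module as the peakMemory counter above:

// Sketch: buffered handler returning the same peakMemory value in the response.
// Assumes fs, crypto, and peakMemory are defined as in the snippets above.
async function bufferedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let chunks = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("close", () => {
    fs.writeFileSync(`./data/file-buffered-${uuid}.txt`, Buffer.concat(chunks));
    res.end(peakMemory.toString());
  });
}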

For firing parallel requests to the server, I used the following script, which fires 200 requests in parallel, each uploading a file of roughly 15 MB.

Once all requests are complete, we calculate the average, minimum and maximum time taken for the requests to complete, and also the maximum memory usage of the server.

The script is run twice, once for the buffered upload and once for the streaming upload, to compare the results.

benchmark.js
import http from "http";
import fs from "fs";

async function fileUploadRequest(fileName, requestPath = "/upload-stream") {
  return new Promise((resolve, reject) => {
    let start = Date.now();
    const req = http.request(
      {
        method: "POST",
        hostname: "localhost",
        port: "3000",
        path: requestPath,
        headers: {
          Accept: "*/*",
        },
      },
      function (res) {
        const chunks = [];
        res.on("data", function (chunk) {
          chunks.push(chunk);
        });
        res.on("end", function () {
          const body = Buffer.concat(chunks).toString();
          let end = Date.now();
          resolve({
            time: end - start,
            memoryUsage: Number(body),
          });
        });
      }
    );
    const a = fs.createReadStream(fileName);
    a.pipe(req, { end: false });
    a.on("end", () => {
      req.end();
    });
    req.on("error", function (error) {
      reject(error);
    });
  });
}

async function fireParallelRequests(count, path = "/upload-stream") {
  const promises = [];
  for (let i = 0; i < count; i++) {
    promises.push(fileUploadRequest("./data.csv", path));
  }
  let metrics = await Promise.all(promises);
  let latencies = metrics.map((m) => m.time);
  let min = Math.min(...latencies);
  let max = Math.max(...latencies);
  let avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
  // spread the array; Math.max(array) without spreading would return NaN
  let maxMemoryUsage = Math.max(...metrics.map((m) => m.memoryUsage));
  console.log("Total Requests:", count);
  console.log("URL:", path);
  console.log("Min Time:", min);
  console.log("Max Time:", max);
  console.log("Avg Time:", avg);
  console.log(
    "Max Memory Usage:",
    `${Math.round((maxMemoryUsage / 1024 / 1024) * 100) / 100} MB`
  );
}

async function main() {
  await fireParallelRequests(200, "/upload-stream");
  // await fireParallelRequests(200, "/upload-buffer");
}

main();

Here are the results from 200 concurrent requests for both buffered and streaming upload:

Parameter                 Buffered Data Transfer    Streaming Data Transfer
Total Requests            200                       200
Min Time (ms)             1975                      1297
Max Time (ms)             34609                     31680
Avg Time (ms)             13061                     3995
Max Memory Usage (MB)     2889.66                   276.39

As can be seen, the difference in memory usage is stark: the buffered upload uses more than 10 times as much memory as the streaming upload. This is expected, since buffering keeps every in-flight payload in memory at once, and 200 requests of roughly 15 MB each come close to 3 GB. In fact, the memory usage for buffered upload is high enough that it can crash the server if the number of concurrent requests grows large enough.

The difference in average latency is also noticeable: the streaming upload is more than 3 times faster than the buffered upload.

The following charts depict the time and memory usage for both buffered and streaming upload at different numbers of concurrent requests.


Fig: Buffered vs Streaming Memory Usage


Fig: Buffered vs Streaming Average Latency

The memory usage for buffered upload increases linearly with the number of requests, while for streaming upload, it remains almost constant.

Conclusion

In this article, we saw how to implement buffered and streaming data transfer in Node.js and compared the two approaches in terms of performance and memory usage.

We saw that streaming upload is much faster than buffered upload and also uses much less memory.

We usually have limited memory available on the server and want to handle as many requests as possible, so streaming upload is generally the way to go.
Sometimes, though, buffered upload is necessary, for example when the data has to be processed as a whole before it is saved to the file system, as sketched below.
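
For instance, if the whole body has to be parsed and validated before anything is written to disk, say a JSON payload, the complete data is needed in memory first. A minimal sketch, with the validation and file naming entirely hypothetical:

import fs from "fs";
import crypto from "crypto";

// Hypothetical example: the body must parse as JSON before it is saved,
// so the chunks have to be buffered until the request is complete.
async function validatedUploadHandler(req, res) {
  const chunks = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("close", () => {
    try {
      const payload = JSON.parse(Buffer.concat(chunks).toString());
      fs.writeFileSync(
        `./data/payload-${crypto.randomUUID()}.json`,
        JSON.stringify(payload)
      );
      res.end("Stored");
    } catch (err) {
      res.statusCode = 400;
      res.end("Invalid JSON");
    }
  });
}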
