Oct 27, 2023 · 15 mins read
Buffered vs Streaming Data Transfer
Comparing buffered and streaming data transfer in a Node.js server
Umakant Vashishtha
Table of Contents
- Introduction
- Buffered Data Transfer
- Streaming Data Transfer
- Benchmarking
- Conclusion
Introduction
In this article, we will look at different methods of data transfer and see how to transfer data to and from a server efficiently over HTTP.
With HTTP, the client can either send data to the server (upload) or request data from the server (download).
The client or server can choose to transfer the data in multiple chunks, especially when the payload is larger than a few kilobytes, so that any transmission problem affects only a small part of the data.
For downloads, the server can send the data in chunks; for uploads, the client can.
When request data is accumulated in memory and then processed, we call it buffered processing.
In contrast, when the data is processed at the time each chunk is received, we call it streaming processing.
The processing could be anything depending on the request, from storing the data in the file system to forwarding it to another server. For simplicity, we will only consider storing the data in the file system.
We will compare the two approaches and see which one is better in terms of performance and memory usage. We will also see how to implement both approaches in Node.js.
Buffered Data Transfer
Let’s take a look at the following code snippet that implements the buffered upload. Explanation is given directly below.
Buffered Request Handler

import fs from "fs";
import crypto from "crypto";

async function bufferedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let bytesRead = 0;
  let chunks = [];
  req.on("data", (chunk) => {
    bytesRead += chunk.length;
    // add chunks into an array, file content is guaranteed to be in order
    chunks.push(chunk);
  });
  req.on("close", () => {
    // Put all the chunks into a buffer together
    let data = Buffer.concat(chunks).toString();
    // Save the content to the file on disk
    fs.writeFileSync(`./data/file-buffered-${uuid}.txt`, data, {
      encoding: "utf-8",
    });
    res.end(`Upload Complete, ${bytesRead} bytes uploaded.`);
  });
}
As you can see, the chunks from the request data are collected into an array in the order they are received and held in memory for the entire duration of the request.
Fig: Buffered Data Transfer Flow Diagram
When the request has finished sending all chunks, a close event is emitted, the chunks are concatenated into a single buffer, and the result is saved to the file system.
The chunks stay in memory until the request finishes; the garbage collector frees that memory on a later run.
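For reference, the same buffered behavior can be written more compactly with the buffer helper from Node's stream/consumers module. This is only a sketch assuming Node.js 16.7 or newer, not the handler used in the benchmarks below:

import fs from "fs";
import crypto from "crypto";
import { buffer } from "stream/consumers";

async function bufferedUploadHandlerCompact(req, res) {
  let uuid = crypto.randomUUID();
  // buffer() accumulates the whole request body in memory, just like the chunks array above
  let data = await buffer(req);
  fs.writeFileSync(`./data/file-buffered-${uuid}.txt`, data);
  res.end(`Upload Complete, ${data.length} bytes uploaded.`);
}

The memory profile is the same either way: the entire body sits in memory before the write happens.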
Streaming Data Transfer
Let’s take a look at the following code snippet that implements the streaming upload. Explanation is given directly below.
Streaming Request Handler

import fs from "fs";
import crypto from "crypto";

async function streamedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let bytesRead = 0;
  let fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);
  req.pipe(fsStream); // req.read() -> fsStream.write()
  req.on("data", (chunk) => {
    bytesRead += chunk.length;
  });
  req.on("close", () => {
    fsStream.close();
    res.end(`Upload Complete, ${bytesRead} bytes uploaded.`);
  });
}
As can be seen, the chunks from the request body are written directly to the file system and are not held in memory for the duration of the request.
Fig: Streaming Data Transfer Flow Diagram
This is possible because the request implements the Readable Stream interface, so its content can be piped directly into a Writable Stream created with fs.
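The same streaming approach can also be expressed with pipeline from stream/promises, which wires up the pipe, respects backpressure, and surfaces errors as a rejected promise. A sketch, not the handler used in the benchmarks:

import fs from "fs";
import crypto from "crypto";
import { pipeline } from "stream/promises";

async function streamedUploadHandlerPipeline(req, res) {
  let uuid = crypto.randomUUID();
  // Streams the request straight into the file; resolves when the file stream finishes,
  // rejects if either stream errors or the client aborts.
  await pipeline(req, fs.createWriteStream(`./data/file-streamed-${uuid}.txt`));
  res.end("Upload Complete.");
}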
Let’s use both handlers in a basic http server.
index.js

import { createServer } from "http";

// bufferedUploadHandler and streamedUploadHandler are the handlers defined above
const server = createServer(async (req, res) => {
  const { method, url } = req;
  console.log(method, url);
  if (method === "POST" && url === "/upload-buffer") {
    await bufferedUploadHandler(req, res);
    return;
  } else if (method === "POST" && url === "/upload-stream") {
    await streamedUploadHandler(req, res);
    return;
  }
  // Respond to unmatched routes so those requests do not hang
  res.statusCode = 404;
  res.end();
});

server.listen(3000);
console.log("Server listening on http://localhost:3000");
Benchmarking
We will now compare the two approaches in terms of performance and memory usage by firing a number of concurrent requests in parallel. For the comparison, I also wanted to analyze the memory usage of the server in each case, so I changed the response to return the peak memory usage of the server, as shown below:
Polling Memory Usage Every 2ms

let peakMemory = 0;
setInterval(() => {
  let memoryUsage = process.memoryUsage();
  if (memoryUsage.rss > peakMemory) {
    peakMemory = memoryUsage.rss; // Resident Set Size
  }
}, 2);

async function streamedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);
  req.pipe(fsStream); // req.read() -> fsStream.write()
  req.on("close", () => {
    fsStream.close();
    res.end(peakMemory.toString());
  });
}
To fire parallel requests at the server, I used the following script, which sends 200 requests in parallel, each uploading a file of roughly 15 MB.
Once all requests are complete, we calculate the average, minimum and maximum time taken for the requests to complete, and also the maximum memory usage of the server.
The script is run twice, once for buffered upload and once for streaming upload to compare the results.
benchmark.js

import http from "http";
import fs from "fs";

async function fileUploadRequest(fileName, requestPath = "/upload-stream") {
  return new Promise((resolve, reject) => {
    let start = Date.now();
    const req = http.request(
      {
        method: "POST",
        hostname: "localhost",
        port: "3000",
        path: requestPath,
        headers: {
          Accept: "*/*",
        },
      },
      function (res) {
        const chunks = [];
        res.on("data", function (chunk) {
          chunks.push(chunk);
        });
        res.on("end", function () {
          const body = Buffer.concat(chunks).toString();
          let end = Date.now();
          resolve({
            time: end - start,
            memoryUsage: Number(body),
          });
        });
      }
    );
    const fileStream = fs.createReadStream(fileName);
    fileStream.pipe(req, { end: false });
    fileStream.on("end", () => {
      req.end();
    });
    req.on("error", function (error) {
      reject(error);
    });
  });
}
async function fireParallelRequests(count, path = "/upload-stream") {
  const promises = [];
  for (let i = 0; i < count; i++) {
    promises.push(fileUploadRequest("./data.csv", path));
  }
  let metrics = await Promise.all(promises);
  let latencies = metrics.map((m) => m.time);
  let min = Math.min(...latencies);
  let max = Math.max(...latencies);
  let avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
  let maxMemoryUsage = Math.max(...metrics.map((m) => m.memoryUsage));
  console.log("Total Requests:", count);
  console.log("URL:", path);
  console.log("Min Time:", min);
  console.log("Max Time:", max);
  console.log("Avg Time:", avg);
  console.log(
    "Max Memory Usage:",
    `${Math.round((maxMemoryUsage / 1024 / 1024) * 100) / 100} MB`
  );
}

async function main() {
  await fireParallelRequests(200, "/upload-stream");
  // await fireParallelRequests(200, "/upload-buffer");
}

main();
Here are the results from 200 concurrent requests for both buffered and streaming upload:
| Parameter | Buffered Data Transfer | Streaming Data Transfer |
|---|---|---|
| Total Requests | 200 | 200 |
| Min Time (ms) | 1975 | 1297 |
| Max Time (ms) | 34609 | 31680 |
| Avg Time (ms) | 13061 | 3995 |
| Max Memory Usage (MB) | 2889.66 | 276.39 |
As can be seen, the difference in memory usage is quite apparent, as the buffered upload uses almost 10 times more memory than streaming upload. In fact, the memory usage for buffered upload is so high that it can cause the server to crash if the number of concurrent requests is high enough.
The difference in average latency is also significant: streaming upload is roughly 3 times faster than buffered upload.
The following charts show the time and memory usage for buffered and streaming upload across different numbers of concurrent requests.
Fig: Buffered vs Streaming Memory Usage
Fig: Buffered vs Streaming Average Latency
The memory usage for buffered upload increases linearly with the number of requests, while for streaming upload, it remains almost constant.
Conclusion
In this article, we saw how to implement buffered and streaming data transfer in Node.js and compared the two approaches in terms of performance and memory usage.
We saw that streaming upload is much faster than buffered upload and also uses much less memory.
We usually have limited memory available on the server and we want to process as many requests as we can, so streaming upload is the way to go.
Sometimes, though, we may still want a buffered upload, for example when the data has to be processed as a whole before it is saved to the file system.
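For example, a handler that must parse the whole body as JSON before persisting it has to buffer. A hypothetical sketch reusing the buffered pattern from above (assuming Node.js 16.7+ for stream/consumers):

import fs from "fs";
import crypto from "crypto";
import { buffer } from "stream/consumers";

async function jsonUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  // JSON.parse needs the complete payload, so writing it to disk chunk by chunk is not an option here.
  let body = await buffer(req);
  try {
    let parsed = JSON.parse(body.toString("utf-8"));
    fs.writeFileSync(`./data/file-json-${uuid}.json`, JSON.stringify(parsed, null, 2));
    res.end(`Upload Complete, ${body.length} bytes uploaded.`);
  } catch {
    res.statusCode = 400;
    res.end("Invalid JSON");
  }
}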