[NodeJS] Gzip, CSV in-memory file handling using Streams
In this blog, I will be focusing on a particular but common use case when it comes to backend services.
Use case
Our backend service needs to load a CSV file from a source (another service) and populate the data inside the CSV into a database. You might wonder, “I don’t need to read a blog for this, there are code examples all over the internet.”
True! But this becomes interesting when the file you receive is a gzip file containing the CSV. (This is a pretty common case if your CSV is very big.)
Once again, you can find examples of this particular use case, but most of them use the fs
module. What I’m going to show you is done entirely in memory. No fs
is used. Interested?
Streams
Streams are a powerful concept in NodeJS, but the interesting thing is that you can do a lot in NodeJS without streams; you can work with NodeJS for many years without ever touching them. Huh!
I will show you how to handle the use case I introduced using streams, and then you will understand their importance.
Our friends
For this task, I will be using the following Node libraries/modules:
- axios — a commonly used HTTP client in NodeJS (yes, I know I could have used the built-in http module, but axios is common)
- zlib — used to handle gzip compression and decompression (gzip is commonly used over the internet)
- csv-parser — handles CSV parsing
- and finally, stream — you don't say!
Let’s begin
Loading the gzip from axios
As we are loading the gzip file over the internet, we will use the axios library and request the response as a stream.
const axios = require("axios");

const getStream = async (URL, headers = {}) => {
  try {
    const res = await axios.get(URL, {
      headers,
      responseType: "stream", // we specify the response type
    });
    return res.data;
  } catch (err) {
    console.error(err);
    throw new Error("Error in get request"); // Exception is not a JS built-in
  }
};

// and we will call this
const httpStream = await getStream(URL, {
  "accept-encoding": "gzip",
});
Unzipping gzip response
There are many ways to unzip a gzip stream. The reason we took the axios response as a stream is to use stream piping. Let’s see how it is done.
inputStream
.pipe(zlibTransformStream)
.pipe(ourTransformStream)
.pipe(csvTransformStream)
We are using pipe
a lot! The axios input stream is piped through several transform streams, and then we read data off the CSV stream.
Let’s see how we can create a transform stream of our own.
const { Transform } = require("stream");

const getTransformStream = () => {
  const transform = new Transform({
    transform: (chunk, encoding, next) => {
      next(null, chunk);
    },
  });
  return transform;
};

// we will use this as
const transform = getTransformStream();
It is easy: we are just passing (piping) the data through unchanged to the CSV transform stream.
If you think about it, we do not need a separate transform stream of our own here; it was added to show how we can develop our own streams.
When we put it all together
const zlib = require("zlib");
const csv = require("csv-parser");

const csvInputStream = httpStream
  .pipe(zlib.createGunzip())
  .pipe(transform)
  .pipe(csv());
Now you can read data from the csvInputStream as usual:
const results = [];

csvInputStream.on("data", (chunk) => {
  results.push(chunk);
});

csvInputStream.on("end", () => {
  console.log(results);
});
Now in the console, you should see your CSV data.
Pretty cool right?
Find the full code here.
Also, you can install it from npm and use it in your application.
npm i gzip-csv-reader