Build A Basic Cloud Data Pipeline: Part 1

Components of the Data Pipeline:

*Virtual Servers from DigitalOcean
*NiFi (data ingestion, transformation, and routing)
*MiNiFi (remote data collection agent)
*Kafka (data dispensing)

Diagram of the Data Pipeline:

How We’re Going to Set This Up:

We’re going to install NiFi and Kafka (the system) on the same virtual server, and we’re going to install MiNiFi on another virtual server to emulate a remote data source feeding into NiFi. You could have multiple data sources, each with MiNiFi installed, feeding into NiFi; however, at this point, we’re just doing one for demonstration purposes.

Also, at this point, we’re not setting up any applications that will consume the processed / transformed data waiting at Kafka. As with the remote data sources coming into NiFi, you could have multiple remote applications pulling the data waiting at the Kafka outputs.

Note: For the purposes of this tutorial there are five assumptions:

Assumption 1: You have a DigitalOcean account and have set up two virtual servers (droplets). The MiNiFi droplet can be the minimal Basic 1 GB RAM / 25 GB Disk, while the system droplet should be Basic 2 GB RAM / 25 GB Disk. Both should use Ubuntu 18.04.
https://www.digitalocean.com/docs/droplets/quickstart/

Assumption 2: You’ve registered a domain name with a registrar and updated your domain’s NS records in your registrar account to point to DigitalOcean’s name servers.

Assumption 3: You have PuTTY and PuTTYgen installed locally for secure remote access to your droplets, and you have PSFTP installed locally for secure remote file transfer to your droplets.
https://www.ssh.com/ssh/putty
https://www.ssh.com/ssh/putty/windows/puttygen
https://www.ssh.com/ssh/sftp

Assumption 4: You’ve created a new user on your droplets with sudo privileges:

adduser username
usermod -aG sudo username
su - username
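
Before moving on, it can help to confirm that the new user really ended up in the sudo group. The snippet below is a minimal sketch, not part of the original setup: has_group is a hypothetical helper name, and "username" is the same placeholder used in the commands above.

```shell
# has_group is a hypothetical helper (not a standard command).
# $1: space-separated list of group names, $2: the group to look for.
# Returns 0 if the group is in the list, 1 otherwise.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Example usage on a droplet ("username" is a placeholder):
#   has_group "$(id -nG username)" sudo && echo "username can use sudo"
```

The helper compares whole group names, so a group such as "sudoers" would not be mistaken for "sudo".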

Assumption 5: Java is installed, and the JAVA_HOME and PATH environment variables are set on both droplets:

sudo apt update
sudo apt install openjdk-8-jdk
java -version

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
echo $JAVA_HOME
export PATH=$PATH:$JAVA_HOME/bin
echo $PATH
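
Note that the export commands above only affect the current shell session, so the variables are gone after you log out. One way to keep them is to append the same two lines to a profile file. The function below is a minimal sketch under that assumption: persist_java_env is a hypothetical helper name, and the default path is the OpenJDK 8 location used above.

```shell
# persist_java_env is a hypothetical helper (not a standard command).
# $1: profile file to append to, for example "$HOME/.profile"
# $2: optional JAVA_HOME value; defaults to the OpenJDK 8 path used above
persist_java_env() {
    profile_file="$1"
    java_home="${2:-/usr/lib/jvm/java-8-openjdk-amd64}"

    # Do nothing if the file already sets JAVA_HOME, so repeated runs
    # never add duplicate lines.
    if grep -q '^export JAVA_HOME=' "$profile_file" 2>/dev/null; then
        return 0
    fi

    # Single quotes keep $PATH and $JAVA_HOME literal in the file,
    # so they are expanded at login time rather than now.
    {
        printf 'export JAVA_HOME=%s\n' "$java_home"
        printf 'export PATH=$PATH:$JAVA_HOME/bin\n'
    } >> "$profile_file"
}

# Example usage on each droplet:
#   persist_java_env "$HOME/.profile"
#   . "$HOME/.profile"
```

Which profile file your login shell actually reads depends on the shell and how you connect, so treat "$HOME/.profile" as an assumption to verify on your own droplets.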

Get Installation Files:

Step 1: Download the NiFi, NiFi-Toolkit, Kafka, and MiNiFi installation files (tar.gz) locally:
https://nifi.apache.org/download.html
https://kafka.apache.org/downloads
https://nifi.apache.org/minifi/download.html

Step 2: SFTP the NiFi, NiFi-Toolkit, and Kafka files to your system droplet and the MiNiFi file to the other droplet:
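
If you would rather not type each transfer by hand, PSFTP (like OpenSSH's sftp) can read its commands from a batch file passed with -b. The function below is a rough sketch that generates such a file. make_sftp_batch is a hypothetical helper name, and the archive names in the usage comment are placeholders, since the real names depend on the versions you downloaded.

```shell
# make_sftp_batch is a hypothetical helper (not a standard command).
# It prints one "put" command per file given as an argument, followed
# by "bye" to close the session when the transfers are done.
make_sftp_batch() {
    for f in "$@"; do
        printf 'put "%s"\n' "$f"
    done
    printf 'bye\n'
}

# Example usage (file names, user, and host are placeholders):
#   make_sftp_batch nifi.tar.gz nifi-toolkit.tar.gz kafka.tar.gz > system.batch
#   psftp -b system.batch username@system-droplet
#   make_sftp_batch minifi.tar.gz > minifi.batch
#   psftp -b minifi.batch username@minifi-droplet
```

Each file is uploaded to the remote user's default directory under its original name. A plain interactive session with one put per file works just as well for this small number of files.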

…And that’s it for Part 1. Please see Part 2 where we will make it all come together:

 
