Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. A web interface helps manage the state of your workflows.
Install Apache Airflow
Prerequisite steps:
- An Ubuntu VM should be up and running.
- The psycopg2 driver should be installed (it is needed later for the PostgreSQL metadata database):
pip3 install psycopg2
- Python 3.8 or higher should be installed. If it is not installed, please follow the below steps:
sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.8
python3 --version
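If you prefer a check that fails loudly instead of eyeballing the version string, a small sketch like the following works (it only assumes `python3` is on the PATH):

```shell
# Sanity check: exit with an error if the default python3 is older than 3.8,
# the minimum version this guide assumes.
python3 -c 'import sys; assert sys.version_info >= (3, 8), sys.version'
echo "Python version OK: $(python3 --version)"
```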
- Pip3 should be installed. If it is not installed, please follow the below steps:
sudo apt update
sudo apt install python3-pip
Installation steps
- a) Installation using the pip library (recommended)
pip3 install apache-airflow
pip3 install apache-airflow-providers-ssh
OR
- b) Installation using pip & constraint files
export AIRFLOW_HOME=~/airflow
AIRFLOW_VERSION=2.4.2
PYTHON_VERSION="$(python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip3 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
- It is recommended to change the Airflow metadata database from SQLite to PostgreSQL for better performance. To do this, execute the below statements (locally or on your cloud database):
CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
ALTER USER airflow_user SET search_path = public;
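One way to run the statements above is to save them to a file and feed it to `psql`. This is a sketch for a local PostgreSQL install; it assumes a superuser role named `postgres`, so adjust the invocation for a managed or cloud database:

```shell
# Write the setup statements from this guide to a temporary SQL file.
cat > /tmp/airflow_db_setup.sql <<'SQL'
CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
ALTER USER airflow_user SET search_path = public;
SQL
# Then run it on a machine with PostgreSQL installed (uncomment):
# sudo -u postgres psql -f /tmp/airflow_db_setup.sql
echo "SQL written to /tmp/airflow_db_setup.sql"
```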
- Change the sql_alchemy_conn setting in the Airflow config file.
vi airflow.cfg
Change the line
sql_alchemy_conn = sqlite:////home/zetaris/airflow/airflow.db
to
sql_alchemy_conn = postgresql+psycopg2://user:pass@host_address:port/database
(modify the connection string with your details)
- Create the Airflow admin user (copy the below as one single command):
airflow users create \
--username admin \
--firstname <First_name> \
--lastname <Last_Name> \
--role Admin \
--email <email> \
--password <password>
Set up Apache Airflow
- Create a folder named ‘dags’.
cd /home/zetaris/airflow
mkdir dags
chmod 775 dags
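To have something to see in the web interface later, you can drop a minimal "hello world" DAG into the new folder. This is a hypothetical example (the file name and `dag_id` are made up, and `AIRFLOW_HOME` defaults to `~/airflow` if unset):

```shell
# Create a one-task example DAG in the dags folder.
DAGS_DIR="${AIRFLOW_HOME:-$HOME/airflow}/dags"
mkdir -p "$DAGS_DIR"
cat > "$DAGS_DIR/example_hello.py" <<'PYEOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A single-task DAG that echoes a greeting once a day.
with DAG(
    dag_id="example_hello",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = BashOperator(task_id="hello", bash_command="echo hello from airflow")
PYEOF
echo "Created $DAGS_DIR/example_hello.py"
```

The scheduler picks up new files in this folder automatically; the DAG should appear in the UI shortly after the services below are started.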
- Initialise the Airflow database.
airflow db init
- Start the Airflow webserver.
airflow webserver
- On another terminal window, run the command to start the Airflow scheduler.
airflow scheduler
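If you would rather not keep two terminals open, both services can be daemonised with the airflow CLI's `-D` flag. A sketch, guarded so it is a no-op on machines where airflow is not installed yet:

```shell
start_airflow_services() {
    # -D detaches each process and writes pid/log files under $AIRFLOW_HOME.
    airflow webserver --port 8080 -D
    airflow scheduler -D
}

if command -v airflow >/dev/null 2>&1; then
    start_airflow_services
else
    echo "airflow CLI not found; complete the installation steps first"
fi
```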
- Load the Airflow web interface and log in using the credentials you provided in the user-creation step above.
http://<your_ubuntu_public_ip>:8080