Version Control in Zetaris Using GitHub and Airflow

This page lists the steps required to implement the version control mechanism for the Zetaris Lightning Metastore. The mechanism makes it possible to restore data objects to their previous working state after a crash.


Prerequisites:

  1. A valid GitHub repository is required at the customer site.

Setting up Git Repository and Gathering Credentials

1. Create an empty GitHub repository from the GitHub home page, under your organisation's account, with the default branch named "main".


Screenshot 2023-07-26 at 21.20.36

2. Generate a new token: sign in to your GitHub account, visit https://github.com/settings/tokens, and select "Generate new token" → "Generate new token (classic)". Assign an appropriate name to the token and configure the necessary permissions.

Screenshot 2023-07-26 at 21.34.57

3. Store the token together with the following details: the branch name ('main'), the owner (in this case 'Zetaris'), the repository name (in this case 'Test-Version-Control'), and the GitHub API/file URL in the form https://api.github.com/repos/<owner>/<repo-name>/contents/.
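The details gathered in step 3 can be sanity-checked before they are entered into Airflow. The following is a minimal sketch, not part of the Zetaris tooling: the helper names `build_contents_url`, `build_headers` and `list_contents` are hypothetical, and it assumes the token is accepted as a bearer token by the GitHub REST API. Note that the contents endpoint may return an error for a repository that has no commits yet.

```python
import json
import urllib.request


def build_contents_url(owner: str, repo: str, path: str = "") -> str:
    """Return the GitHub contents API URL for a path inside a repository."""
    base = f"https://api.github.com/repos/{owner}/{repo}/contents/"
    return base + path.lstrip("/")


def build_headers(token: str) -> dict:
    """HTTP headers for an authenticated GitHub REST API request."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


def list_contents(owner: str, repo: str, token: str, path: str = "", branch: str = "main"):
    """List the entries at `path` on `branch` (performs a network call).

    Raises urllib.error.HTTPError if the token, owner, repository
    or branch is wrong.
    """
    url = build_contents_url(owner, repo, path) + f"?ref={branch}"
    request = urllib.request.Request(url, headers=build_headers(token))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Owner and repository name are the example values from step 3.
    print(build_contents_url("Zetaris", "Test-Version-Control"))
```

Calling `list_contents("Zetaris", "Test-Version-Control", "<github-token>")` with real values is one way to confirm that the token has access to the repository.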

Setting up Airflow for Full Backup & Full/Partial Restore

1. Create variables in the Airflow UI to supply the parameters for the backup and restore processes:

  • On the Airflow UI, navigate to the Admin section and select "Variables" from the dropdown list.

  • Upload the Airflow variables as a JSON file by selecting "Choose File", then import the file by clicking "Import Variables".

Please note:

  1. By default, the installation of Airflow includes predefined variables related to Airflow configuration and the Zetaris Environment.

  2. The table below contains either example values or parameter names, which should be replaced with appropriate, specific values.

  3. For a backup, leave the variables starting with 'Restore_' at their default value of a single space (' ').

  4. For a restore, edit the variable values below on the Airflow UI, in LOWER CASE, before triggering the DAG.

 

| Variable Name | Value |
|---|---|
| Email_Address | <email_address_for_notifications> |
| VC_GitHubToken | <github-token> |
| VC_Github_Repo | <github-repo-name> |
| VC_Github_Owner | <github-organisation-name> |
| VC_Github_Folder | backup/ |
| VC_Github_Branch | main |
| VC_GitHub_File_URL | <github-file-url> |
| Restore_Type | full or partial or individual |
| Restore_Object_Type | pipelines or data_marts or permanent_views |
| Restore_Object_Name | <pipeline_container name or data_mart name or permanent_view name> |
| Restore_Object_Individual_Name | <individual_pipeline_name> |


Screenshot 2023-07-27 at 12.35.14
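The JSON file imported through "Import Variables" is a flat object that maps each variable name to its value. As a minimal sketch, the file could be generated as follows; the function names and the output file name `airflow_variables.json` are assumptions for illustration, and the placeholder values must be replaced with site-specific ones. The 'Restore_' variables are set to a single space, as required for a backup.

```python
import json


def build_airflow_variables(email, token, repo, owner, file_url,
                            folder="backup/", branch="main"):
    """Return the version control variables as a flat name -> value mapping."""
    return {
        "Email_Address": email,
        "VC_GitHubToken": token,
        "VC_Github_Repo": repo,
        "VC_Github_Owner": owner,
        "VC_Github_Folder": folder,
        "VC_Github_Branch": branch,
        "VC_GitHub_File_URL": file_url,
        # Restore variables default to a single space for a backup run.
        "Restore_Type": " ",
        "Restore_Object_Type": " ",
        "Restore_Object_Name": " ",
        "Restore_Object_Individual_Name": " ",
    }


def write_variables_file(variables, path="airflow_variables.json"):
    """Write the variables to a JSON file that can be imported in the Airflow UI."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(variables, handle, indent=2)


if __name__ == "__main__":
    variables = build_airflow_variables(
        email="<email_address_for_notifications>",
        token="<github-token>",
        repo="<github-repo-name>",
        owner="<github-organisation-name>",
        file_url="<github-file-url>",
    )
    write_variables_file(variables)
```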

  • The table below lists the different types of restore and the values each one requires for the variables defined in the previous step on the Airflow UI.

| Type of Restore (Level of Restore) | Description | Variable 1: Restore_Type | Variable 2: Restore_Object_Type | Variable 3: Restore_Object_Name | Variable 4: Restore_Object_Individual_Name |
|---|---|---|---|---|---|
| Full Restore | Restore all the pipelines, data marts and views | full | <single_space> | <single_space> | <single_space> |
| Object Type | Restore at object-type level for either pipelines, data marts or views (only one option may be entered at a time) | partial | pipelines OR data_marts OR permanent_views | <single_space> | <single_space> |
| Container | Restore a pipeline container, a data mart container or an individual permanent view (only one option may be entered at a time) | individual | pipelines OR data_marts OR permanent_views | <pipeline_container_name> OR <data_mart_container_name> OR <permanent_view_name> | <single_space> |
| Individual Pipeline | Restore any individual pipeline | individual | pipelines | <pipeline_container_name> | <pipeline_name> |
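The combinations in the table above can be checked before the restore DAG is triggered. The sketch below is illustrative only and is not taken from the Zetaris DAG code: `restore_level` is a hypothetical helper that maps the four variable values to the restore level they describe and rejects combinations the table does not list.

```python
ALLOWED_OBJECT_TYPES = {"pipelines", "data_marts", "permanent_views"}
BLANK = " "  # the single-space placeholder used for unused variables


def _is_blank(value: str) -> bool:
    """True for the single-space placeholder (or any whitespace-only value)."""
    return value.strip() == ""


def restore_level(restore_type, object_type=BLANK, object_name=BLANK,
                  individual_name=BLANK):
    """Return the restore level for a set of variable values.

    Raises ValueError for combinations that do not appear in the table.
    Restore_Type and Restore_Object_Type must be lower case.
    """
    if restore_type == "full":
        if not all(_is_blank(v) for v in (object_type, object_name, individual_name)):
            raise ValueError("a full restore expects the other variables to be blank")
        return "Full Restore"
    if restore_type not in ("partial", "individual"):
        raise ValueError(f"unknown Restore_Type: {restore_type!r}")
    if object_type not in ALLOWED_OBJECT_TYPES:
        raise ValueError(f"unknown Restore_Object_Type: {object_type!r}")
    if restore_type == "partial":
        if not (_is_blank(object_name) and _is_blank(individual_name)):
            raise ValueError("a partial restore expects blank object names")
        return "Object Type"
    # restore_type == "individual" from here on
    if _is_blank(object_name):
        raise ValueError("an individual restore needs Restore_Object_Name")
    if _is_blank(individual_name):
        return "Container"
    if object_type != "pipelines":
        raise ValueError("Restore_Object_Individual_Name applies to pipelines only")
    return "Individual Pipeline"
```

For example, `restore_level("individual", "pipelines", "<pipeline_container_name>", "<pipeline_name>")` corresponds to the last row of the table.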

 

2. To set up Microsoft Teams notifications, follow these steps:

  • Configure a webhook in Microsoft Teams:

  • Open Teams and browse to the channel to which messages should be sent.

  • Click the More options (...) icon beside the channel name, then select Connectors.

  • Search for Incoming Webhook.

  • Click Configure, provide a webhook name and select Create. A Webhook URL will be generated.

    1t

  • On the Airflow UI, click Admin and select "Connections" from the dropdown list.

  • Click Add, enter the parameters as shown in the image below, and set the conn_id to 'msteams_webhook_url'.

    2t
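Before the webhook URL is stored in Airflow, it can be tested with a plain HTTP POST. This is a minimal sketch under the assumption that the incoming webhook accepts a simple JSON body with a `text` field; the function names are hypothetical and the code is independent of however the Zetaris DAGs send their notifications.

```python
import json
import urllib.request


def build_teams_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a plain text message."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_teams_message(webhook_url: str, message: str) -> int:
    """Send the message and return the HTTP status code (network call)."""
    with urllib.request.urlopen(build_teams_request(webhook_url, message)) as response:
        return response.status
```

Calling `send_teams_message("<webhook-url>", "Test message from Airflow setup")` should make the message appear in the configured channel if the webhook is valid.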

3. To set up Slack notifications, follow these steps:

  • Create a Slack app if you don't already have one.
    1s

  • Enable Incoming Webhooks on the next page

    2s
    3s
  • Create an Incoming Webhook by clicking on Add New Webhook to Workspace on the same page

    4s
  • Pick a channel for the app to post to, then click to authorize your app. You are returned to your app settings, where a new entry appears under the "Webhook URLs for Your Workspace" section, with a webhook URL that looks something like this:
    https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
    5s
  • Create an Airflow connection for Slack named 'slack_webhook_url' with connection type HTTP. The part of the webhook URL after https://hooks.slack.com/services goes in the Password field:
    Slack Conn Id: slack_webhook_url

    Host: https://hooks.slack.com/services

    Conn Type: HTTP

    Password: /T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX

    Schema: https

    Screenshot 2023-07-27 at 10.59.01
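The split of the webhook URL into Host and Password described above can be illustrated with a short sketch. The helper names are hypothetical; the field values follow the connection settings listed in this step.

```python
SLACK_HOST = "https://hooks.slack.com/services"


def split_slack_webhook(webhook_url: str) -> dict:
    """Split a Slack webhook URL into the Airflow connection fields."""
    if not webhook_url.startswith(SLACK_HOST + "/"):
        raise ValueError("not a Slack incoming webhook URL")
    return {
        "conn_id": "slack_webhook_url",
        "conn_type": "HTTP",
        "host": SLACK_HOST,
        "schema": "https",
        # Everything after the host, including the leading slash.
        "password": webhook_url[len(SLACK_HOST):],
    }


def join_slack_webhook(fields: dict) -> str:
    """Rebuild the full webhook URL from the connection fields."""
    return fields["host"] + fields["password"]


if __name__ == "__main__":
    url = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
    fields = split_slack_webhook(url)
    print(fields["password"])
    print(join_slack_webhook(fields) == url)
```

Keeping the secret part of the URL in the Password field means it is masked in the Airflow UI rather than shown as part of the host.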

4. After setting up Airflow and defining the necessary variables, you can run the 'Backup_Zetaris_Data_Objects' and 'Restore_Zetaris_Data_Objects' DAGs, which perform the backup and the restore respectively.

Screenshot 2023-07-28 at 10.58.28

Screenshot 2023-07-28 at 10.58.45
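Besides the UI, a DAG run can also be started programmatically. The sketch below assumes an Airflow 2.x deployment with the stable REST API enabled and basic authentication configured, which may not match your installation; the base URL, user name and password are placeholders, and `build_trigger_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, username, password, conf=None):
    """Build (but do not send) a request that creates a new DAG run.

    Targets the Airflow 2.x stable REST API endpoint
    POST /api/v1/dags/{dag_id}/dagRuns.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_dag(base_url, dag_id, username, password):
    """Send the request and return the parsed JSON response (network call)."""
    request = build_trigger_request(base_url, dag_id, username, password)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `trigger_dag("http://localhost:8080", "Backup_Zetaris_Data_Objects", "<user>", "<password>")` would request a backup run, provided the API is reachable at that address.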


Example of the different objects in Zetaris Lightning:

img1


Example of the backups of the different objects on GitHub:

Screenshot 2023-07-27 at 12.30.59


Walkthrough of Version Control Process (Video)