1.1 Register master data source
A user needs to provide the JDBC driver class, URL and connectivity credentials including required extra database parameters.
CREATE DATASOURCE ORCL [DESCRIBE BY "Oracle for Product Master"] OPTIONS (
jdbcdriver "oracle.jdbc.OracleDriver",
jdbcurl "jdbc:oracle:thin:@oracle-master:1521:orcl",
username "scott",
password "tiger",
[schema "system",]
[schema_prepended_table "true",]
[key "value"]*)
if schema_prepended_table is set to true, the ingested table is named "schema name"_"actual table name" as there may be same tables using the same name across different schemas.
1.1.1 Add slave nodes for the registered data source (Option for cluster based database)
If the registered database supports cluster base computing such as MPP, then a user can register slave nodes so that the Zetaris Cloud Data Fabric can directly query slave nodes rather than running through a master node.
CREATE DATASOURCE FUSIONDB DESCRIBE BY "Zetaris MPP " OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://coordinator:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode1:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgressql.Driver",
jdbcurl "jdbc:postgresql://datanode2:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode3:5432/pgrs",
username "admin",
password "password")
The above example shows the registration of 1 master node (oracle-master) and 3 slave nodes (oracle-slave1, oracle-slave2, and oracle-slave3).
*ORCL is an alias for the target data source in this guide, a user can decide on another name when creating any data source.
*If schema is provided, Zetaris Cloud Data Fabric will only ingest metadata from that schema
*If schema_prepended_table is set to true, schema will be prepended to the table name as there may be same tables using the same name across different schemas.
For example, role table in zetaris_bi schema will be named zetaris_bi__role
1.1.1.1 RDBMS examples
(MS SQL Server)
CREATE DATASOURCE MSSQL DESCRIBE BY "MSSQL-2017-linux " OPTIONS (
jdbcdriver "com.microsoft.sqlserver.jdbc.SQLServerDriver",
jdbcurl "jdbc:sqlserver://localhost:1433 ",
databaseName "DemoData",
username "scott" ,
password "tiger",
schema “dbo”
)
(My SQL)
CREATE DATASOURCE MY_SQL DESCRIBE BY "MySQL " OPTIONS (
jdbcdriver "com.mysql.jdbc.Driver",
jdbcurl "jdbc:mysql://127.0.0.1/test_db?",
username "scott" ,
password "tiger
)
(IBM DB2)
CREATE DATASOURCE DB2_DB2INST1 DESCRIBE BY "DB2 Sample DB Schema " OPTIONS (
jdbcdriver "com.ibm.db2.jcc.DB2Driver",
jdbcurl "jdbc:db2://127.0.0.1:50000/db_name",
username "db2inst1" ,
password "db2inst1-pwd",
schema "DB2INST1",
schema_prepended_table "true"
)
(Green Plum)
CREATE DATASOURCE GREEN_PLUM DESCRIBE BY "GREEN_PLUM " OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://localhost:5432/postgres",
username "gpadmin" ,
password "pivotal",
schema "public"
)
(Teradata)
CREATE DATASOURCE TERA_DATA DESCRIBE BY "TERA_DATA " OPTIONS (
jdbcdriver "com.teradata.jdbc.TeraDriver",
jdbcurl "jdbc:teradata://10.128.87.16/DBS_PORT=1025",
username "dbc" ,
password "dbc",
schema "dbcmngr"
)
(Amazon Aurora)
CREATE DATASOURCE AWS_AURORA DESCRIBE BY "AWS_AURORA " OPTIONS (
jdbcdriver "com.mysql.jdbc.Driver",
jdbcurl "jdbc:mysql://zet-aurora-cluster.cluster-ckh4ncwbhsty.ap-southeast-2.rds.amazonaws.com/your_db?",
username "your_db_account_name" ,
password "your_db_account_password""
)
(Amazon Redshift)
CREATE DATASOURCE REDSHIFT DESCRIBE BY "AWS RedShift" OPTIONS (
jdbcdriver "com.amazon.redshift.jdbc.Driver",
jdbcurl "jdbc:redshift://zetaris.cyzoanxzdpje.ap-southeast-2.redshift.amazonaws.com:5439/your_db_name",
username "your_db_account_name",
password "your_db_account_password"
)
1.1.2 Register NoSQL data sources
Zetaris Cloud Data Fabric supports all known NoSQLs, please contact support@zetaris.com, if other data sources are required
1.1.2.1 Mongo DB
For MongoDB, the followings(host, port, db name, user name and password) must be provided.
CREATE DATASOURCE MONGO DESCRIBE BY "MongoDB" OPTIONS (
lightning.datasource.mongodb.host "localhost",
lightning.datasource.mongodb.port "27017",
lightning.datasource.mongodb.database "lightning-demo",
lightning.datasource.mongodb.username "",
lightning.datasource.mongodb.password ""
)
1.1.2.2 Cassandra
For Cassandra, there is only one parameter for Zetaris Cloud Data Fabric, which is the key space for the connection. Other parameters starting with "spark.cassandra" prefix are actually provided for the Spark Cassandra connector (https://github.com/datastax/spark-cassandra-connector).
CREATE DATASOURCE CSNDR DESCRIBE BY "Cassandra" OPTIONS (
spark.cassandra.connection.host "localhost",
spark.cassandra.connection.port "9042",
spark.cassandra.auth.username "cassandra",
spark.cassandra.auth.password "cassandra",
lightning.datasource.cassandra.keyspace "lightning_demo"
)
1.1.2.3 Register amazon DynamoDB
Zetaris Cloud Data Fabric needs access to the key and security key to use AWS services.
CREATE DATASOURCE AWS_DYNAMODB DESCRIBE BY "AWS DynamoDB" OPTIONS (
accessKeyId "Your_aws_accessKeyId",
secretKey "Your_aws_SecretAccessKey" ,
region "ap-southeast-2"
)
1.1.3 Create Zetaris Cloud Data Fabric Data Base for flat files in file store(AWS S3, Azure Blob, local file system)
Files in file store or RESTful API source can be registered under this name space, which will be described later(2.4, 2.5).
CREATE LIGHTNING DATABASE AWS_S3 DESCRIBE BY "AWS S3 bucket" OPTIONS (
[key "value"]
)