DLT is an open-source, Python-native, scalable data loading framework that requires no DevOps effort to run.

Quickstart Guide: Data Load Tool (DLT)

TL;DR: This guide shows you how to load a JSON document into Google BigQuery using DLT.

Please open a pull request against the example repository below if there is something you can improve about this quickstart.

Grab the demo

Clone the example repository:

git clone https://github.com/scale-vector/dlt-quickstart-example.git

Enter the directory:

cd dlt-quickstart-example

Open the files in your favorite IDE / text editor:

  • data.json (the JSON document you will load; an illustrative sample follows this list)
  • credentials.json (the credentials for the demo Google BigQuery warehouse)
  • quickstart.py (the script that uses DLT)
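
For reference, the query results later in this guide imply the shape of data.json: two people, each with an age, an id, and a list of children. A minimal, illustrative sample consistent with those results (the file in the repository is authoritative):

[
    {
        "name": "Ana",
        "age": 30,
        "id": 456,
        "children": [
            {"name": "Bill", "id": 625},
            {"name": "Elli", "id": 591}
        ]
    },
    {
        "name": "Bob",
        "age": 30,
        "id": 455,
        "children": [
            {"name": "Bill", "id": 625},
            {"name": "Dave", "id": 621}
        ]
    }
]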

Set up a virtual environment

Ensure you are using either Python 3.8 or 3.9:

python3 --version

Create a new virtual environment:

python3 -m venv ./env

Activate the virtual environment:

source ./env/bin/activate
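
On Windows, activate with env\Scripts\activate.bat (cmd.exe) or env\Scripts\Activate.ps1 (PowerShell) instead.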

Install DLT and support for the target data warehouse

Install DLT using pip:

pip3 install -U python-dlt

Install support for Google BigQuery:

pip3 install -U "python-dlt[gcp]"
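
Optionally, verify the installation:

pip3 show python-dlt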

Understanding the code

  1. Configure DLT

  2. Create a DLT pipeline

  3. Load the data from the JSON document

  4. Pass the data to the DLT pipeline

  5. Use DLT to load the data
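
quickstart.py implements these five steps. The sketch below paraphrases it using the early python-dlt Pipeline API; names such as GCPPipelineCredentials, create_pipeline, extract, unpack, and load are recalled from that API and may differ in detail, so treat the quickstart.py you cloned as authoritative (the demo repo's credential-decoding step is omitted here):

import json

from dlt.common.utils import uniq_id
from dlt.pipeline import Pipeline, GCPPipelineCredentials

# 1. Configure DLT: pick a unique schema prefix and load the demo credentials
schema_prefix = "demo_" + uniq_id()[:4]
with open("credentials.json") as f:
    gcp_credentials = json.load(f)
credentials = GCPPipelineCredentials.from_services_dict(gcp_credentials, schema_prefix)

# 2. Create a DLT pipeline
pipeline = Pipeline("example")
pipeline.create_pipeline(credentials)

# 3. Load the data from the JSON document
with open("data.json") as f:
    data = json.load(f)

# 4. Pass the data to the DLT pipeline (each top-level item becomes a row in json_doc)
pipeline.extract(iter(data), table_name="json_doc")

# 5. Use DLT to load the data: normalize nested lists into child tables, then load to BigQuery
pipeline.unpack()
pipeline.load()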

Running the code

Run the quickstart script:

python3 quickstart.py

Inspect the schema that the script prints, or open the generated schema.yml file:

vim schema.yml
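
The exact layout of schema.yml depends on your python-dlt version, but the idea is that every inferred table and column type is listed so you can review or correct it. A hypothetical excerpt (not verbatim output) for the parent table might look like:

json_doc:
  columns:
    name:
      data_type: text
      nullable: true
    age:
      data_type: bigint
      nullable: true
    id:
      data_type: bigint
      nullable: true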

See results of querying the Google BigQuery table:

json_doc table

SELECT * FROM `{schema_prefix}_example.json_doc`
{  "name": "Ana",  "age": "30",  "id": "456",  "_dlt_load_id": "1654787700.406905",  "_dlt_id": "5b018c1ba3364279a0ca1a231fbd8d90"}
{  "name": "Bob",  "age": "30",  "id": "455",  "_dlt_load_id": "1654787700.406905",  "_dlt_id": "afc8506472a14a529bf3e6ebba3e0a9e"}

json_doc__children table

SELECT * FROM `{schema_prefix}_example.json_doc__children` LIMIT 1000
    # {"name": "Bill", "id": "625", "_dlt_parent_id": "5b018c1ba3364279a0ca1a231fbd8d90", "_dlt_list_idx": "0", "_dlt_root_id": "5b018c1ba3364279a0ca1a231fbd8d90",
    #   "_dlt_id": "7993452627a98814cc7091f2c51faf5c"}
    # {"name": "Bill", "id": "625", "_dlt_parent_id": "afc8506472a14a529bf3e6ebba3e0a9e", "_dlt_list_idx": "0", "_dlt_root_id": "afc8506472a14a529bf3e6ebba3e0a9e",
    #   "_dlt_id": "9a2fd144227e70e3aa09467e2358f934"}
    # {"name": "Dave", "id": "621", "_dlt_parent_id": "afc8506472a14a529bf3e6ebba3e0a9e", "_dlt_list_idx": "1", "_dlt_root_id": "afc8506472a14a529bf3e6ebba3e0a9e",
    #   "_dlt_id": "28002ed6792470ea8caf2d6b6393b4f9"}
    # {"name": "Elli", "id": "591", "_dlt_parent_id": "5b018c1ba3364279a0ca1a231fbd8d90", "_dlt_list_idx": "1", "_dlt_root_id": "5b018c1ba3364279a0ca1a231fbd8d90",
    #   "_dlt_id": "d18172353fba1a492c739a7789a786cf"}

Joining the two tables above on the autogenerated keys (i.e. p._dlt_id = c._dlt_parent_id):

select p.name, p.age, p.id as parent_id,
    c.name as child_name, c.id as child_id, c._dlt_list_idx as child_order_in_list
from `{schema_prefix}_example.json_doc` as p
left join `{schema_prefix}_example.json_doc__children` as c
    on p._dlt_id = c._dlt_parent_id

{"name": "Ana", "age": "30", "parent_id": "456", "child_name": "Bill", "child_id": "625", "child_order_in_list": "0"}
{"name": "Ana", "age": "30", "parent_id": "456", "child_name": "Elli", "child_id": "591", "child_order_in_list": "1"}
{"name": "Bob", "age": "30", "parent_id": "455", "child_name": "Bill", "child_id": "625", "child_order_in_list": "0"}
{"name": "Bob", "age": "30", "parent_id": "455", "child_name": "Dave", "child_id": "621", "child_order_in_list": "1"}

Next steps

  1. Replace data.json with data you want to explore

  2. Check that the inferred types are correct in schema.yml

  3. Set up your own Google BigQuery warehouse (and replace the credentials)

  4. Use this new clean staging layer as the starting point for a semantic layer / analytical model (e.g. using dbt)
