Manage and automate datasets for data science projects.

Project description

Dataset Manager

Manage and automate the datasets for your project with YAML files.

Current Support:

How it Works

This project creates a file called identifier.yaml in your dataset directory with these fields:

source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv

description: this dataset is a test dataset

format: csv

identifier: the name used to reference the dataset; the file is named after it, with the .yaml extension.

source: the location of the dataset.

description: a short description of the dataset, so you can remember what it is later.

Each dataset is a YAML file inside the dataset directory.
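
To make the file layout concrete, here is a minimal sketch that reads the fields shown above from the text of an identifier.yaml file. It is not the package's own loader: `parse_flat_fields` is a hypothetical helper that handles only flat `key: value` lines using the standard library, and a real YAML parser such as PyYAML's `yaml.safe_load` would be the more robust choice for anything beyond that.

```python
# Sketch only: parse_flat_fields is a hypothetical helper, not part of dataset_manager.
EXAMPLE = """\
source: https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv
description: this dataset is a test dataset
format: csv
"""


def parse_flat_fields(text):
    """Parse flat 'key: value' lines into a dict (no nesting, lists, or quoting)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ': ' only, so the colon inside 'https://' is kept.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


fields = parse_flat_fields(EXAMPLE)
print(fields["format"])
```

Splitting on the first `": "` rather than on every colon is what keeps the URL in `source` intact.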

Installing

With pip:

pip install dataset_manager

With conda:

conda install dataset_manager

Using

You can manage your datasets with a small set of commands and integrate them with pandas or other data analysis tools.

List all Datasets

Returns a list of all datasets in the dataset path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.list_datasets()
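
Because each dataset is one YAML file, listing datasets amounts to scanning the dataset directory. The sketch below illustrates that idea with the standard library only. `list_identifiers` is a hypothetical helper, and the actual return value of `list_datasets()` may contain more than identifiers.

```python
import tempfile
from pathlib import Path


def list_identifiers(dataset_path):
    """Hypothetical helper: return the sorted identifiers (file names without .yaml)."""
    return sorted(path.stem for path in Path(dataset_path).glob("*.yaml"))


# Demo against a throwaway directory with two dataset files and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("titanic.yaml", "iris.yaml", "notes.txt"):
        Path(tmp, name).write_text("format: csv\n")
    identifiers = list_identifiers(tmp)

print(identifiers)
```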

Get one Dataset

Returns a single dataset's entry as a dict.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.get_dataset(identifier)
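
Since the dataset comes back as a dict, its fields can be handed to whichever tool reads the data. The following is a minimal sketch, assuming the dict carries the `source` and `format` keys from identifier.yaml (the exact shape of the returned dict is an assumption). It uses the standard library `csv` module instead of pandas, and `load_rows` is a hypothetical helper.

```python
import csv
import io
import tempfile
import urllib.request
from pathlib import Path


def load_rows(dataset):
    """Hypothetical helper: fetch dataset["source"] and parse it as CSV rows."""
    if dataset.get("format") != "csv":
        raise ValueError("this sketch only handles csv datasets")
    # urlopen accepts http(s):// as well as file:// URLs.
    with urllib.request.urlopen(dataset["source"]) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Demo with a local file:// source so the example runs offline.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp, "sample.csv")
    csv_file.write_text("name,age\nAda,36\n")
    dataset = {"source": csv_file.as_uri(), "format": "csv"}
    rows = load_rows(dataset)

print(rows)
```

With pandas installed, the same `source` value could be passed to `pandas.read_csv` instead.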

Create a Dataset

Creates a dataset, with any additional information you want, inside the defined dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.create_dataset(identifier, source, description, **kwargs)
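
The `**kwargs` in the signature suggest that extra fields, such as `format` in the example file, are stored next to `source` and `description`. Here is a minimal sketch of that idea, assuming flat string values. `write_dataset_file` is a hypothetical stand-in, not the package's implementation.

```python
import tempfile
from pathlib import Path


def write_dataset_file(dataset_path, identifier, source, description, **extra):
    """Hypothetical stand-in: write <identifier>.yaml with flat 'key: value' lines."""
    fields = {"source": source, "description": description, **extra}
    target = Path(dataset_path) / f"{identifier}.yaml"
    target.write_text("".join(f"{key}: {value}\n" for key, value in fields.items()))
    return target


# Demo in a throwaway directory; format="csv" travels through **extra.
with tempfile.TemporaryDirectory() as tmp:
    created = write_dataset_file(
        tmp,
        "titanic",
        "https://raw.githubusercontent.com/pcsanwald/kaggle-titanic/master/train.csv",
        "this dataset is a test dataset",
        format="csv",
    )
    file_name = created.name
    lines = created.read_text().splitlines()

print(file_name, lines[-1])
```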

Remove a Dataset

Removes a dataset from dataset_path.

from dataset_manager import DatasetManager

manager = DatasetManager(dataset_path)

manager.remove_dataset(identifier)

Contributing

Just make a pull request and be happy!

Let's grow together ;)

dataset-manager 0.0.10
