跳转到主要内容

将CSV文件转换为SQLite数据库

项目描述

csvs-to-sqlite

PyPI Changelog Tests License

将CSV文件转换为SQLite数据库。使用Datasette浏览和发布SQLite数据库。

基本用法

csvs-to-sqlite myfile.csv mydatabase.db

这将创建一个名为mydatabase.db的新SQLite数据库,其中包含一个名为myfile的单个表,包含CSV内容。

您可以提供多个CSV文件

csvs-to-sqlite one.csv two.csv bundle.db

bundle.db数据库将包含两个表,onetwo

这意味着您可以使用通配符

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db

如果您提供一个或多个目录的路径,脚本将递归地搜索这些目录中的CSV文件并为每个文件创建表。

csvs-to-sqlite ~/path/to/directory all-my-csvs.db

处理TSV(制表符分隔值)

您可以使用-s选项指定不同的分隔符。如果您想使用制表符,您需要像这样应用shell转义

csvs-to-sqlite my-file.tsv my-file.db -s $'\t'

重构列到单独的查找表

假设您有一个看起来像这样的CSV文件

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(从Open Elections项目取的真实示例)

现在,您可以使用新的--extract-column选项(简称:-c)将选定的列转换为单独的查找表 - 例如

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

格式如下

column_name:optional_table_name:optional_table_value_column_name

如果您仅指定列名,例如-c office,将创建以下表

CREATE TABLE "office" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

如果您指定所有三个选项,例如-c precinct:Precinct:name,则表将如下所示

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

原始表将按如下方式创建

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

它们将用引用新导出表的ID填充。

安装

$ pip install csvs-to-sqlite

csvs-to-sqlite现在需要Python 3。如果您正在运行Python 2,您可以安装支持Python 2的最后一个版本

$ pip install csvs-to-sqlite==0.9.2

csvs-to-sqlite --help

Usage: csvs-to-sqlite [OPTIONS] PATHS... DBNAME

  PATHS: paths to individual .csv files or to directories containing .csvs

  DBNAME: name of the SQLite database file to create

Options:
  -s, --separator TEXT            Field separator in input .csv
  -q, --quoting INTEGER           Control field quoting behavior per csv.QUOTE_*
                                  constants. Use one of QUOTE_MINIMAL (0),
                                  QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or
                                  QUOTE_NONE (3).

  --skip-errors                   Skip lines with too many fields instead of
                                  stopping the import

  --replace-tables                Replace tables if they already exist
  -t, --table TEXT                Table to use (instead of using CSV filename)
  -c, --extract-column TEXT       One or more columns to 'extract' into a
                                  separate lookup table. If you pass a simple
                                  column name that column will be replaced with
                                  integer foreign key references to a new table
                                  of that name. You can customize the name of
                                  the table like so:     state:States:state_name
                                  
                                  This will pull unique values from the 'state'
                                  column and use them to populate a new 'States'
                                  table, with an id column primary key and a
                                  state_name column containing the strings from
                                  the original column.

  -d, --date TEXT                 One or more columns to parse into ISO
                                  formatted dates

  -dt, --datetime TEXT            One or more columns to parse into ISO
                                  formatted datetimes

  -df, --datetime-format TEXT     One or more custom date format strings to try
                                  when parsing dates/datetimes

  -pk, --primary-key TEXT         One or more columns to use as the primary key
  -f, --fts TEXT                  One or more columns to use to populate a full-
                                  text index

  -i, --index TEXT                Add index on this column (or a compound index
                                  with -i col1,col2)

  --shape TEXT                    Custom shape for the DB table - format is
                                  csvcol:dbcol(TYPE),...

  --filename-column TEXT          Add a column with this name and populate with
                                  CSV file name

  --fixed-column <TEXT TEXT>...   Populate column with a fixed string
  --fixed-column-int <TEXT INTEGER>...
                                  Populate column with a fixed integer
  --fixed-column-float <TEXT FLOAT>...
                                  Populate column with a fixed float
  --no-index-fks                  Skip adding index to foreign key columns
                                  created using --extract-column (default is to
                                  add them)

  --no-fulltext-fks               Skip adding full-text index on values
                                  extracted using --extract-column (default is
                                  to add them)

  --just-strings                  Import all columns as text strings by default
                                  (and, if specified, still obey --shape,
                                  --date/datetime, and --datetime-format)

  --version                       Show the version and exit.
  --help                          Show this message and exit.

项目详情


下载文件

下载适合您平台的文件。如果您不确定选择哪个,请了解有关安装包的更多信息。

源分布

csvs_to_sqlite-1.3.tar.gz (17.8 kB 查看哈希值)

上传时间

构建分布

csvs_to_sqlite-1.3-py2.py3-none-any.whl (16.7 kB 查看哈希值)

上传时间 Python 2 Python 3

支持