tablib库使用详解

tablib简介

tablib为requests作者kennethreitz维护,支持python2到python3.简单的说就是一个通用的数据集,操作类似数据库,但又不是一个数据库的代替,因为缺少查操作,但是可以通过tablib将数据集轻松转为xls、csv、yaml等格式。简单的来说就是用来处理tabular dataset,为这些不同格式的数据集提供一个统一的格式。

目前支持下面这些输出格式:

  • Excel (Sets + Books)
  • JSON (Sets + Books)
  • YAML (Sets + Books)
  • HTML (Sets)
  • TSV (Sets)
  • OSD (Sets)
  • CSV (Sets)
  • DBF (Sets)

基本使用

创建Dataset对象

1
2
3
4
5
6
7
8
9
import tablib
headers = ('first_name', 'last_name')
data = [
('John', 'Adams'),
('George', 'Washington')
]
data = tablib.Dataset(*data, headers=headers)

这样相当于构造了一张表:

first_name last_name
John Adams
George Washington

其中最重要的就是Dataset对象,当然该对象的创建也可以不输入参数,直接data = tablib.Dataset()创建出一个Dataset对象,然后通过data.headers = ['first_name', 'last_name']设置表头,当然也可以使用data.headers = ('first_name', 'last_name'),因为不管是用列表还是元组,tablib都会自动帮我们处理好,我们可以通过data.append(['Henry', 'Ford'])或者data.append(('Henry', 'Ford'))来向表中添加一条记录。

我们可以通过data.dict来查看目前表中的所有数据:

1
2
>>> data.dict
[OrderedDict([('First Name', 'John'), ('Last Name', 'Adams')]), OrderedDict([('First Name', 'George'), ('Last Name', 'Washington')]), OrderedDict([('First Name', 'Henry'), ('Last Name', 'Ford')])]

也可以通过print(data)显示更人性化的输出:

1
2
3
4
5
6
>>> print(data)
First Name|Last Name
----------|----------
John |Adams
George |Washington
Henry |Ford

Dataset属性

1
2
3
4
5
6
7
8
9
>>> print(data)
First Name|Last Name|age
----------|---------|---
John |Adams |90
Henry |Ford |83
>>> data.height
2
>>> data.width
3

data.height输出当前记录(行)总数
data.width输出当前属性(列)总数

常用方法

lpop(),lpush(row, tags=[]),lpush_col(col, header=None)是对列的相关操作

pop(),rpop(),rpush(row, tags=[]),rpush_col(col, header=None)是对行的相关操作

remove_duplicates()去除重复的记录

sort(col, reverse=False)根据列进行排序

subset(rows=None, cols=None)返回子Dataset

wipe()清空Dataset,包括表头和内容

新增列

1
>>> data.append_col((90, 67, 83), header='age')

这样表就变成了:

first_name last_name age
John Adams 90
George Washington 67
Henry Ford 83
1
2
3
4
5
6
>>> print(data)
First Name|Last Name |age
----------|----------|---
John |Adams |90
George |Washington|67
Henry |Ford |83

对记录操作

1
2
3
4
>>> print(data[:2])
[('John', 'Adams', 90), ('George', 'Washington', 67)]
>>> print(data[2:])
[('Henry', 'Ford', 83)]

对属性操作

1
2
3
4
5
6
7
8
9
10
11
>>> print(data['first_name'])
['John', 'George', 'Henry']
>>> print(data)
First Name|Last Name |age
----------|----------|---
John |Adams |90
George |Washington|67
Henry |Ford |83
>>> data.get_col(1)
['Adams', 'Washington', 'Ford']

删除记录

1
2
3
4
5
6
>>> del data[1]
>>> print(data)
First Name|Last Name|age
----------|---------|---
John |Adams |90
Henry |Ford |83

可见记录也是从0开始索引的

删除记录操作也支持切片

删除属性

1
del data['Col Name']

导入数据

1
imported_data = Dataset().load(open('data.csv').read())

导出数据

csv

1
2
3
4
5
6
>>> data.csv
'First Name,Last Name,age\r\nJohn,Adams,90\r\nHenry,Ford,83\r\n'
>>> print(data.csv)
First Name,Last Name,age
John,Adams,90
Henry,Ford,83

json

1
2
3
4
>>> data.json
'[{"First Name": "John", "Last Name": "Adams", "age": 90}, {"First Name": "Henry", "Last Name": "Ford", "age": 83}]'
>>> print(data.json)
[{"First Name": "John", "Last Name": "Adams", "age": 90}, {"First Name": "Henry", "Last Name": "Ford", "age": 83}]

yaml

1
2
3
4
5
>>> data.yaml
'- {First Name: John, Last Name: Adams, age: 90}\n- {First Name: Henry, Last Name: Ford, age: 83}\n'
>>> print(data.yaml)
- {First Name: John, Last Name: Adams, age: 90}
- {First Name: Henry, Last Name: Ford, age: 83}

excel

1
2
>>> with open('people.xls', 'wb') as f:
... f.write(data.xls)

注意要以二进制形式打开文件

dbf

1
2
>>> with open('people.dbf', 'wb') as f:
... f.write(data.dbf)

高级使用

动态列

可以将一个函数指定给Dataset对象

1
2
3
4
5
6
7
8
9
10
11
import random
def random_grade(row):
"""Returns a random integer for entry."""
return (random.randint(60,100)/100.0)
data.append_col(random_grade, header='Grade')
>>> data.yaml
- {Age: 22, First Name: Kenneth, Grade: 0.6, Last Name: Reitz}
- {Age: 20, First Name: Bessie, Grade: 0.75, Last Name: Monke}

函数的参数row传入的是每一行记录,所以可以根据传入的记录进行更一步的计算:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def guess_gender(row):
"""Calculates gender of given student data row."""
m_names = ('Kenneth', 'Mike', 'Yuri')
f_names = ('Bessie', 'Samantha', 'Heather')
name = row[0]
if name in m_names:
return 'Male'
elif name in f_names:
return 'Female'
else:
return 'Unknown'
>>> data.yaml
- {Age: 22, First Name: Kenneth, Gender: Male, Last Name: Reitz}
- {Age: 20, First Name: Bessie, Gender: Female, Last Name: Monke}

tag

可以给记录添加tag,之后通过tag来过滤记录:

1
2
3
4
5
6
7
8
9
students = tablib.Dataset()
students.headers = ['first', 'last']
students.rpush(['Kenneth', 'Reitz'], tags=['male', 'technical'])
students.rpush(['Bessie', 'Monke'], tags=['female', 'creative'])
>>> students.filter(['male']).yaml
- {first: Kenneth, Last: Reitz}

Excel Workbook With Multiple Sheets

it’s quite common to group multiple spreadsheets into a single Excel file, known as a Workbook.

1
2
3
4
book = tablib.Databook((data1, data2, data3))
with open('students.xls', 'wb') as f:
f.write(book.xls)

参考文档

坚持原创技术分享,您的支持将鼓励我继续创作!