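For most of this book, we rely on datasets to fit machine learning algorithms. This section shows how to obtain these data sources with TensorFlow and Python.
The datasets come from three kinds of places. Some ship with Python libraries, some must be downloaded with a Python script, and some must be downloaded manually from the web. Apart from datasets bundled inside a library (such as Iris in scikit-learn), getting them requires an internet connection.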
First, we initialize a TensorFlow graph and session. We use the TF1 compatibility API because TensorFlow 2 executes eagerly by default:
>>> import tensorflow.compat.v1 as tf
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> tf.reset_default_graph()  # start from a fresh default graph
>>> tf.disable_eager_execution()
>>> sess = tf.Session()
Iris Dataset
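The Iris dataset is arguably the most classic dataset in machine learning, and perhaps in all of statistics. It records four measurements for three species of iris: sepal length, sepal width, petal length and petal width. The species are Iris Setosa, Iris Versicolour and Iris Virginica. There are 150 measurements in total, 50 per species. To use the dataset in Python, we load it with scikit-learn's `load_iris` function.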
>>> from sklearn.datasets import load_iris
>>> import pandas as pd
>>> iris = load_iris()
>>> print(len(iris.data))
150
>>> print(len(iris.target))
150
>>> print(iris.data[0])
[5.1 3.5 1.4 0.2]
>>> print(set(iris.target))
{0, 1, 2}
>>> pd.DataFrame(data=iris.data, columns=iris.feature_names)
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 |
1 | 4.9 | 3.0 | 1.4 | 0.2 |
2 | 4.7 | 3.2 | 1.3 | 0.2 |
3 | 4.6 | 3.1 | 1.5 | 0.2 |
4 | 5.0 | 3.6 | 1.4 | 0.2 |
5 | 5.4 | 3.9 | 1.7 | 0.4 |
6 | 4.6 | 3.4 | 1.4 | 0.3 |
7 | 5.0 | 3.4 | 1.5 | 0.2 |
8 | 4.4 | 2.9 | 1.4 | 0.2 |
9 | 4.9 | 3.1 | 1.5 | 0.1 |
10 | 5.4 | 3.7 | 1.5 | 0.2 |
11 | 4.8 | 3.4 | 1.6 | 0.2 |
12 | 4.8 | 3.0 | 1.4 | 0.1 |
13 | 4.3 | 3.0 | 1.1 | 0.1 |
14 | 5.8 | 4.0 | 1.2 | 0.2 |
15 | 5.7 | 4.4 | 1.5 | 0.4 |
16 | 5.4 | 3.9 | 1.3 | 0.4 |
17 | 5.1 | 3.5 | 1.4 | 0.3 |
18 | 5.7 | 3.8 | 1.7 | 0.3 |
19 | 5.1 | 3.8 | 1.5 | 0.3 |
20 | 5.4 | 3.4 | 1.7 | 0.2 |
21 | 5.1 | 3.7 | 1.5 | 0.4 |
22 | 4.6 | 3.6 | 1.0 | 0.2 |
23 | 5.1 | 3.3 | 1.7 | 0.5 |
24 | 4.8 | 3.4 | 1.9 | 0.2 |
25 | 5.0 | 3.0 | 1.6 | 0.2 |
26 | 5.0 | 3.4 | 1.6 | 0.4 |
27 | 5.2 | 3.5 | 1.5 | 0.2 |
28 | 5.2 | 3.4 | 1.4 | 0.2 |
29 | 4.7 | 3.2 | 1.6 | 0.2 |
... | ... | ... | ... | ... |
120 | 6.9 | 3.2 | 5.7 | 2.3 |
121 | 5.6 | 2.8 | 4.9 | 2.0 |
122 | 7.7 | 2.8 | 6.7 | 2.0 |
123 | 6.3 | 2.7 | 4.9 | 1.8 |
124 | 6.7 | 3.3 | 5.7 | 2.1 |
125 | 7.2 | 3.2 | 6.0 | 1.8 |
126 | 6.2 | 2.8 | 4.8 | 1.8 |
127 | 6.1 | 3.0 | 4.9 | 1.8 |
128 | 6.4 | 2.8 | 5.6 | 2.1 |
129 | 7.2 | 3.0 | 5.8 | 1.6 |
130 | 7.4 | 2.8 | 6.1 | 1.9 |
131 | 7.9 | 3.8 | 6.4 | 2.0 |
132 | 6.4 | 2.8 | 5.6 | 2.2 |
133 | 6.3 | 2.8 | 5.1 | 1.5 |
134 | 6.1 | 2.6 | 5.6 | 1.4 |
135 | 7.7 | 3.0 | 6.1 | 2.3 |
136 | 6.3 | 3.4 | 5.6 | 2.4 |
137 | 6.4 | 3.1 | 5.5 | 1.8 |
138 | 6.0 | 3.0 | 4.8 | 1.8 |
139 | 6.9 | 3.1 | 5.4 | 2.1 |
140 | 6.7 | 3.1 | 5.6 | 2.4 |
141 | 6.9 | 3.1 | 5.1 | 2.3 |
142 | 5.8 | 2.7 | 5.1 | 1.9 |
143 | 6.8 | 3.2 | 5.9 | 2.3 |
144 | 6.7 | 3.3 | 5.7 | 2.5 |
145 | 6.7 | 3.0 | 5.2 | 2.3 |
146 | 6.3 | 2.5 | 5.0 | 1.9 |
147 | 6.5 | 3.0 | 5.2 | 2.0 |
148 | 6.2 | 3.4 | 5.4 | 2.3 |
149 | 5.9 | 3.0 | 5.1 | 1.8 |
150 rows x 4 columns
>>> X = iris.data  # features only, shape 150x4
>>> y = iris.target  # class labels 0, 1, 2
>>> features = iris.feature_names  # names of the 4 features
>>> targets = iris.target_names  # names of the 3 iris species, indexed by the labels in y
>>> plt.figure(figsize=(10, 4))
>>> plt.plot(X[:, 2][y==0], X[:, 3][y==0], 'bs', label=targets[0])
>>> plt.plot(X[:, 2][y==1], X[:, 3][y==1], 'kx', label=targets[1])
>>> plt.plot(X[:, 2][y==2], X[:, 3][y==2], 'ro', label=targets[2])
>>> plt.xlabel(features[2])
>>> plt.ylabel(features[3])
>>> plt.title('Iris Data Set')
>>> plt.legend()
>>> plt.savefig('Iris Data Set.png', dpi=200)
>>> plt.show()
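The plot selects each species with a boolean mask such as `X[:, 2][y==0]`. The same masking idea can also summarise each class numerically.

Below is a minimal sketch under a few assumptions:
- The helper name `class_means` is hypothetical.
- The four sample rows are copied from rows 0, 1, 120 and 121 of the table above.
- Rows 100 to 149 carry label 2 (virginica), following scikit-learn's ordering of the data.

```python
import numpy as np


def class_means(X, y, feature_idx):
    """Return {label: mean of column feature_idx} for every class label in y."""
    return {int(c): float(X[y == c, feature_idx].mean()) for c in np.unique(y)}


# Rows 0, 1, 120 and 121 of the Iris table
# (sepal length, sepal width, petal length, petal width, all in cm)
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [6.9, 3.2, 5.7, 2.3],
              [5.6, 2.8, 4.9, 2.0]])
y = np.array([0, 0, 2, 2])  # assumed labels: 0 = setosa, 2 = virginica

# Column 2 is petal length (cm)
print(class_means(X, y, 2))
```

With the full arrays, `class_means(iris.data, iris.target, 2)` returns one mean petal length per species.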
Low Birth Weight Dataset (Hosted on GitHub)
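The University of Massachusetts Amherst has compiled many interesting statistical datasets. One of them is the low birth weight dataset ("Low Infant Birth Weight Risk Factor Study", 1989, Hosmer and Lemeshow). It records infant birth weight together with demographic data, medical measurements of the mother and her family history. The original study contains 189 observations of 11 variables; the copy hosted on GitHub and loaded below has 9 columns. Here is how to fetch it in Python: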
>>> import requests
>>> birthdata_url='https://github.com/nfmcclure/tensorflow_cookbook/raw/master/01_Introduction/07_Working_with_Data_Sources/birthweight_data/birthweight.dat'
>>> birth_file = requests.get(birthdata_url)
>>> birth_data = birth_file.text.split('\r\n')
>>> birth_header = birth_data[0].split('\t')
>>> birth_data = [[float(x) for x in y.split('\t') if len(x)>=1] for y in birth_data[1:] if len(y)>=1]
>>> print(len(birth_data))
189
>>> print(len(birth_data[0]))
9
>>> print(birth_header)
['LOW', 'AGE', 'LWT', 'RACE', 'SMOKE', 'PTL', 'HT', 'UI', 'BWT']
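Splitting on `'\r\n'` assumes the file uses Windows line endings. If the hosted copy ever switched to `'\n'`, the header and rows would no longer be separated.

A slightly more tolerant sketch uses `str.splitlines()`, which accepts both endings. The function name `parse_tsv` is hypothetical. The short sample string only mimics the first three columns of rows 0 and 1.

```python
def parse_tsv(text):
    """Parse tab-separated text with a header row into (header, rows of floats).

    str.splitlines() accepts '\\n', '\\r\\n' and '\\r' line endings; blank lines are skipped.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split('\t')
    # Drop empty cells (e.g. from a trailing tab), as the original comprehension does
    rows = [[float(x) for x in line.split('\t') if x] for line in lines[1:]]
    return header, rows


# Hypothetical sample holding the first three columns of rows 0 and 1
sample = "LOW\tAGE\tLWT\r\n1\t28\t113\r\n1\t29\t130\r\n"
header, rows = parse_tsv(sample)
print(header)  # ['LOW', 'AGE', 'LWT']
print(rows)    # [[1.0, 28.0, 113.0], [1.0, 29.0, 130.0]]
```

Applied to the download, `birth_header, birth_data = parse_tsv(birth_file.text)` replaces the two parsing lines above.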
>>> import pandas as pd
>>> pd.DataFrame(data=birth_data, columns=birth_header)
| | LOW | AGE | LWT | RACE | SMOKE | PTL | HT | UI | BWT |
---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 28.0 | 113.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 709.0 |
1 | 1.0 | 29.0 | 130.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1021.0 |
2 | 1.0 | 34.0 | 187.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1135.0 |
3 | 1.0 | 25.0 | 105.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1330.0 |
4 | 1.0 | 25.0 | 85.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1474.0 |
5 | 1.0 | 27.0 | 150.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1588.0 |
6 | 1.0 | 23.0 | 97.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1588.0 |
7 | 1.0 | 24.0 | 128.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1701.0 |
8 | 1.0 | 24.0 | 132.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1729.0 |
9 | 1.0 | 21.0 | 165.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1790.0 |
10 | 1.0 | 32.0 | 105.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1818.0 |
11 | 1.0 | 19.0 | 91.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1885.0 |
12 | 1.0 | 25.0 | 115.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1893.0 |
13 | 1.0 | 16.0 | 130.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1899.0 |
14 | 1.0 | 25.0 | 92.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1928.0 |
15 | 1.0 | 20.0 | 150.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1928.0 |
16 | 1.0 | 21.0 | 190.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1928.0 |
17 | 1.0 | 24.0 | 155.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1936.0 |
18 | 1.0 | 21.0 | 103.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1970.0 |
19 | 1.0 | 20.0 | 125.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2055.0 |
20 | 1.0 | 25.0 | 89.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2055.0 |
21 | 1.0 | 19.0 | 102.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2082.0 |
22 | 1.0 | 19.0 | 112.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 2084.0 |
23 | 1.0 | 26.0 | 117.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 2084.0 |
24 | 1.0 | 24.0 | 138.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2100.0 |
25 | 1.0 | 17.0 | 130.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 2125.0 |
26 | 1.0 | 20.0 | 120.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2126.0 |
27 | 1.0 | 22.0 | 130.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 2187.0 |
28 | 1.0 | 27.0 | 130.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2187.0 |
29 | 1.0 | 20.0 | 80.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 2211.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
159 | 0.0 | 24.0 | 110.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3728.0 |
160 | 0.0 | 19.0 | 184.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3756.0 |
161 | 0.0 | 24.0 | 110.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3770.0 |
162 | 0.0 | 23.0 | 110.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3770.0 |
163 | 0.0 | 20.0 | 120.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3770.0 |
164 | 0.0 | 25.0 | 141.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3790.0 |
165 | 0.0 | 30.0 | 112.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3799.0 |
166 | 0.0 | 22.0 | 169.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3827.0 |
167 | 0.0 | 18.0 | 120.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3856.0 |
168 | 0.0 | 16.0 | 170.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3860.0 |
169 | 0.0 | 32.0 | 186.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3860.0 |
170 | 0.0 | 18.0 | 120.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3884.0 |
171 | 0.0 | 29.0 | 130.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3884.0 |
172 | 0.0 | 33.0 | 117.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3912.0 |
173 | 0.0 | 20.0 | 170.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3940.0 |
174 | 0.0 | 28.0 | 134.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3941.0 |
175 | 0.0 | 14.0 | 135.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3941.0 |
176 | 0.0 | 28.0 | 130.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3969.0 |
177 | 0.0 | 25.0 | 120.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3983.0 |
178 | 0.0 | 16.0 | 135.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3997.0 |
179 | 0.0 | 20.0 | 158.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3997.0 |
180 | 0.0 | 26.0 | 160.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4054.0 |
181 | 0.0 | 21.0 | 115.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4054.0 |
182 | 0.0 | 22.0 | 129.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4111.0 |
183 | 0.0 | 25.0 | 130.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4153.0 |
184 | 0.0 | 31.0 | 120.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4167.0 |
185 | 0.0 | 35.0 | 170.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 4174.0 |
186 | 0.0 | 19.0 | 120.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 4238.0 |
187 | 0.0 | 24.0 | 216.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4593.0 |
188 | 0.0 | 45.0 | 123.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 4990.0 |
189 rows x 9 columns
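In the Hosmer and Lemeshow study, LOW is the indicator for a birth weight (BWT, in grams) below 2500 g. The rows shown above are consistent with that definition.

A small sketch can check this on the parsed rows. It assumes LOW is column 0 and BWT is column 8, as in `birth_header`. The helper name `low_flag_mismatches` is hypothetical.

```python
def low_flag_mismatches(rows, low_idx=0, bwt_idx=8, threshold=2500.0):
    """Return indices of rows whose LOW flag disagrees with BWT < threshold (grams)."""
    bad = []
    for i, row in enumerate(rows):
        expected = 1.0 if row[bwt_idx] < threshold else 0.0
        if row[low_idx] != expected:
            bad.append(i)
    return bad


# Rows 0 and 188 of the table above
sample = [
    [1.0, 28.0, 113.0, 1.0, 1.0, 1.0, 0.0, 1.0, 709.0],
    [0.0, 45.0, 123.0, 0.0, 0.0, 1.0, 0.0, 0.0, 4990.0],
]
print(low_flag_mismatches(sample))  # []
```

Running `low_flag_mismatches(birth_data)` on the full data returns an empty list if every row follows the 2500 g rule.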
Boston Housing Dataset (University of California at Irvine)
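Carnegie Mellon University keeps many datasets in its statistics library (StatLib). One of them, the Boston Housing data, is available from the University of California, Irvine Machine Learning Repository. It contains 506 observations of house prices together with demographic data and housing characteristics, 14 variables in total. Here is how to fetch it in Python: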
>>> import requests
>>> housing_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
>>> housing_header = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
>>> housing_file = requests.get(housing_url)
>>> housing_data = [[float(x) for x in y.split(' ') if len(x)>=1] for y in housing_file.text.split('\n') if len(y)>=1]
>>> print(len(housing_data))
506
>>> print(len(housing_data[0]))
14
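`housing.data` separates its columns with runs of spaces. That is why the comprehension above must discard the empty strings produced by `split(' ')`.

Calling `str.split()` with no argument splits on any run of whitespace and never yields empty strings, so the same parse can be written more compactly. The function name is hypothetical. The sample line reuses the values of row 0, but its exact spacing is made up.

```python
def parse_whitespace_table(text):
    """Parse whitespace-separated numeric rows into lists of floats, skipping blank lines."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


# Values of row 0 of the housing table; the spacing is illustrative only
sample = " 0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30 396.90   4.98  24.00\n"
rows = parse_whitespace_table(sample)
print(len(rows), len(rows[0]))  # 1 14
```

On the real download, `housing_data = parse_whitespace_table(housing_file.text)` should give the same 506 rows of 14 values.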
>>> pd.DataFrame(data=housing_data,columns=housing_header)
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
5 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.12 | 5.21 | 28.7 |
6 | 0.08829 | 12.5 | 7.87 | 0.0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5.0 | 311.0 | 15.2 | 395.60 | 12.43 | 22.9 |
7 | 0.14455 | 12.5 | 7.87 | 0.0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5.0 | 311.0 | 15.2 | 396.90 | 19.15 | 27.1 |
8 | 0.21124 | 12.5 | 7.87 | 0.0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5.0 | 311.0 | 15.2 | 386.63 | 29.93 | 16.5 |
9 | 0.17004 | 12.5 | 7.87 | 0.0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5.0 | 311.0 | 15.2 | 386.71 | 17.10 | 18.9 |
10 | 0.22489 | 12.5 | 7.87 | 0.0 | 0.524 | 6.377 | 94.3 | 6.3467 | 5.0 | 311.0 | 15.2 | 392.52 | 20.45 | 15.0 |
11 | 0.11747 | 12.5 | 7.87 | 0.0 | 0.524 | 6.009 | 82.9 | 6.2267 | 5.0 | 311.0 | 15.2 | 396.90 | 13.27 | 18.9 |
12 | 0.09378 | 12.5 | 7.87 | 0.0 | 0.524 | 5.889 | 39.0 | 5.4509 | 5.0 | 311.0 | 15.2 | 390.50 | 15.71 | 21.7 |
13 | 0.62976 | 0.0 | 8.14 | 0.0 | 0.538 | 5.949 | 61.8 | 4.7075 | 4.0 | 307.0 | 21.0 | 396.90 | 8.26 | 20.4 |
14 | 0.63796 | 0.0 | 8.14 | 0.0 | 0.538 | 6.096 | 84.5 | 4.4619 | 4.0 | 307.0 | 21.0 | 380.02 | 10.26 | 18.2 |
15 | 0.62739 | 0.0 | 8.14 | 0.0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4.0 | 307.0 | 21.0 | 395.62 | 8.47 | 19.9 |
16 | 1.05393 | 0.0 | 8.14 | 0.0 | 0.538 | 5.935 | 29.3 | 4.4986 | 4.0 | 307.0 | 21.0 | 386.85 | 6.58 | 23.1 |
17 | 0.78420 | 0.0 | 8.14 | 0.0 | 0.538 | 5.990 | 81.7 | 4.2579 | 4.0 | 307.0 | 21.0 | 386.75 | 14.67 | 17.5 |
18 | 0.80271 | 0.0 | 8.14 | 0.0 | 0.538 | 5.456 | 36.6 | 3.7965 | 4.0 | 307.0 | 21.0 | 288.99 | 11.69 | 20.2 |
19 | 0.72580 | 0.0 | 8.14 | 0.0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4.0 | 307.0 | 21.0 | 390.95 | 11.28 | 18.2 |
20 | 1.25179 | 0.0 | 8.14 | 0.0 | 0.538 | 5.570 | 98.1 | 3.7979 | 4.0 | 307.0 | 21.0 | 376.57 | 21.02 | 13.6 |
21 | 0.85204 | 0.0 | 8.14 | 0.0 | 0.538 | 5.965 | 89.2 | 4.0123 | 4.0 | 307.0 | 21.0 | 392.53 | 13.83 | 19.6 |
22 | 1.23247 | 0.0 | 8.14 | 0.0 | 0.538 | 6.142 | 91.7 | 3.9769 | 4.0 | 307.0 | 21.0 | 396.90 | 18.72 | 15.2 |
23 | 0.98843 | 0.0 | 8.14 | 0.0 | 0.538 | 5.813 | 100.0 | 4.0952 | 4.0 | 307.0 | 21.0 | 394.54 | 19.88 | 14.5 |
24 | 0.75026 | 0.0 | 8.14 | 0.0 | 0.538 | 5.924 | 94.1 | 4.3996 | 4.0 | 307.0 | 21.0 | 394.33 | 16.30 | 15.6 |
25 | 0.84054 | 0.0 | 8.14 | 0.0 | 0.538 | 5.599 | 85.7 | 4.4546 | 4.0 | 307.0 | 21.0 | 303.42 | 16.51 | 13.9 |
26 | 0.67191 | 0.0 | 8.14 | 0.0 | 0.538 | 5.813 | 90.3 | 4.6820 | 4.0 | 307.0 | 21.0 | 376.88 | 14.81 | 16.6 |
27 | 0.95577 | 0.0 | 8.14 | 0.0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4.0 | 307.0 | 21.0 | 306.38 | 17.28 | 14.8 |
28 | 0.77299 | 0.0 | 8.14 | 0.0 | 0.538 | 6.495 | 94.4 | 4.4547 | 4.0 | 307.0 | 21.0 | 387.94 | 12.80 | 18.4 |
29 | 1.00245 | 0.0 | 8.14 | 0.0 | 0.538 | 6.674 | 87.3 | 4.2390 | 4.0 | 307.0 | 21.0 | 380.23 | 11.98 | 21.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
476 | 4.87141 | 0.0 | 18.10 | 0.0 | 0.614 | 6.484 | 93.6 | 2.3053 | 24.0 | 666.0 | 20.2 | 396.21 | 18.68 | 16.7 |
477 | 15.02340 | 0.0 | 18.10 | 0.0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24.0 | 666.0 | 20.2 | 349.48 | 24.91 | 12.0 |
478 | 10.23300 | 0.0 | 18.10 | 0.0 | 0.614 | 6.185 | 96.7 | 2.1705 | 24.0 | 666.0 | 20.2 | 379.70 | 18.03 | 14.6 |
479 | 14.33370 | 0.0 | 18.10 | 0.0 | 0.614 | 6.229 | 88.0 | 1.9512 | 24.0 | 666.0 | 20.2 | 383.32 | 13.11 | 21.4 |
480 | 5.82401 | 0.0 | 18.10 | 0.0 | 0.532 | 6.242 | 64.7 | 3.4242 | 24.0 | 666.0 | 20.2 | 396.90 | 10.74 | 23.0 |
481 | 5.70818 | 0.0 | 18.10 | 0.0 | 0.532 | 6.750 | 74.9 | 3.3317 | 24.0 | 666.0 | 20.2 | 393.07 | 7.74 | 23.7 |
482 | 5.73116 | 0.0 | 18.10 | 0.0 | 0.532 | 7.061 | 77.0 | 3.4106 | 24.0 | 666.0 | 20.2 | 395.28 | 7.01 | 25.0 |
483 | 2.81838 | 0.0 | 18.10 | 0.0 | 0.532 | 5.762 | 40.3 | 4.0983 | 24.0 | 666.0 | 20.2 | 392.92 | 10.42 | 21.8 |
484 | 2.37857 | 0.0 | 18.10 | 0.0 | 0.583 | 5.871 | 41.9 | 3.7240 | 24.0 | 666.0 | 20.2 | 370.73 | 13.34 | 20.6 |
485 | 3.67367 | 0.0 | 18.10 | 0.0 | 0.583 | 6.312 | 51.9 | 3.9917 | 24.0 | 666.0 | 20.2 | 388.62 | 10.58 | 21.2 |
486 | 5.69175 | 0.0 | 18.10 | 0.0 | 0.583 | 6.114 | 79.8 | 3.5459 | 24.0 | 666.0 | 20.2 | 392.68 | 14.98 | 19.1 |
487 | 4.83567 | 0.0 | 18.10 | 0.0 | 0.583 | 5.905 | 53.2 | 3.1523 | 24.0 | 666.0 | 20.2 | 388.22 | 11.45 | 20.6 |
488 | 0.15086 | 0.0 | 27.74 | 0.0 | 0.609 | 5.454 | 92.7 | 1.8209 | 4.0 | 711.0 | 20.1 | 395.09 | 18.06 | 15.2 |
489 | 0.18337 | 0.0 | 27.74 | 0.0 | 0.609 | 5.414 | 98.3 | 1.7554 | 4.0 | 711.0 | 20.1 | 344.05 | 23.97 | 7.0 |
490 | 0.20746 | 0.0 | 27.74 | 0.0 | 0.609 | 5.093 | 98.0 | 1.8226 | 4.0 | 711.0 | 20.1 | 318.43 | 29.68 | 8.1 |
491 | 0.10574 | 0.0 | 27.74 | 0.0 | 0.609 | 5.983 | 98.8 | 1.8681 | 4.0 | 711.0 | 20.1 | 390.11 | 18.07 | 13.6 |
492 | 0.11132 | 0.0 | 27.74 | 0.0 | 0.609 | 5.983 | 83.5 | 2.1099 | 4.0 | 711.0 | 20.1 | 396.90 | 13.35 | 20.1 |
493 | 0.17331 | 0.0 | 9.69 | 0.0 | 0.585 | 5.707 | 54.0 | 2.3817 | 6.0 | 391.0 | 19.2 | 396.90 | 12.01 | 21.8 |
494 | 0.27957 | 0.0 | 9.69 | 0.0 | 0.585 | 5.926 | 42.6 | 2.3817 | 6.0 | 391.0 | 19.2 | 396.90 | 13.59 | 24.5 |
495 | 0.17899 | 0.0 | 9.69 | 0.0 | 0.585 | 5.670 | 28.8 | 2.7986 | 6.0 | 391.0 | 19.2 | 393.29 | 17.60 | 23.1 |
496 | 0.28960 | 0.0 | 9.69 | 0.0 | 0.585 | 5.390 | 72.9 | 2.7986 | 6.0 | 391.0 | 19.2 | 396.90 | 21.14 | 19.7 |
497 | 0.26838 | 0.0 | 9.69 | 0.0 | 0.585 | 5.794 | 70.6 | 2.8927 | 6.0 | 391.0 | 19.2 | 396.90 | 14.10 | 18.3 |
498 | 0.23912 | 0.0 | 9.69 | 0.0 | 0.585 | 6.019 | 65.3 | 2.4091 | 6.0 | 391.0 | 19.2 | 396.90 | 12.92 | 21.2 |
499 | 0.17783 | 0.0 | 9.69 | 0.0 | 0.585 | 5.569 | 73.5 | 2.3999 | 6.0 | 391.0 | 19.2 | 395.77 | 15.10 | 17.5 |
500 | 0.22438 | 0.0 | 9.69 | 0.0 | 0.585 | 6.027 | 79.7 | 2.4982 | 6.0 | 391.0 | 19.2 | 396.90 | 14.33 | 16.8 |
501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |
505 | 0.04741 | 0.0 | 11.93 | 0.0 | 0.573 | 6.030 | 80.8 | 2.5050 | 1.0 | 273.0 | 21.0 | 396.90 | 7.88 | 11.9 |
506 rows x 14 columns
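Alternatively, load the data through scikit-learn. Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so this route needs an older scikit-learn. The 13 features are in `boston.data`, and the price (MEDV) is returned separately as `boston.target`: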
>>> from sklearn.datasets import load_boston
>>> import pandas as pd
>>> boston = load_boston()
>>> print(len(boston.data))
506
>>> print(len(boston.target))
506
>>> print(boston.data[0])
[6.320e-03 1.800e+01 2.310e+00 0.000e+00 5.380e-01 6.575e+00 6.520e+01
4.090e+00 1.000e+00 2.960e+02 1.530e+01 3.969e+02 4.980e+00]
>>> print(set(boston.target))
{5.0, 6.3, 7.2, 8.8, 7.4, 10.2, 11.8, 12.7, 13.6, 14.5, 15.2, 15.0, 16.5, 17.5, 19.6, 18.9, 18.2, 20.4, 21.6, 22.9, 21.7, 26.6, 26.5, 27.5, 24.0, 23.1, 27.1, 28.7, 24.7, 30.8, 33.4, 34.7, 34.9, 36.2, 35.4, 31.6, 33.0, 38.7, 43.8, 41.3, 37.2, 39.8, 42.3, 48.5, 44.8, 50.0, 46.7, 48.3, 44.0, 48.8, 46.0, 10.5, 11.5, 11.0, 12.5, 12.0, 13.5, 13.0, 14.0, 16.6, 16.0, 16.1, 16.4, 17.4, 17.1, 17.0, 17.6, 17.9, 18.4, 18.6, 18.5, 18.0, 18.1, 19.9, 19.4, 19.5, 19.1, 19.0, 20.1, 20.0, 20.5, 20.9, 20.6, 21.0, 21.4, 21.5, 21.9, 21.1, 22.0, 22.5, 22.6, 22.4, 22.1, 23.4, 23.5, 23.9, 23.6, 23.0, 24.1, 24.6, 24.4, 24.5, 25.0, 25.1, 26.4, 27.0, 27.9, 28.0, 28.4, 28.1, 28.5, 28.6, 29.4, 29.9, 29.6, 29.1, 29.0, 30.5, 30.1, 31.1, 31.5, 31.0, 32.5, 32.0, 32.9, 32.4, 32.2, 33.2, 33.3, 33.8, 33.1, 32.7, 34.6, 8.4, 35.2, 35.1, 10.4, 10.9, 7.0, 36.4, 36.0, 36.5, 36.1, 11.9, 37.9, 37.0, 37.6, 37.3, 13.9, 13.4, 14.4, 14.9, 15.4, 8.5, 41.7, 42.8, 43.1, 43.5, 45.4, 9.5, 8.3, 8.7, 9.7, 10.8, 11.3, 11.7, 12.3, 12.8, 13.2, 13.3, 13.8, 14.8, 14.3, 14.2, 15.7, 15.3, 16.2, 16.8, 16.3, 16.7, 17.3, 17.8, 17.2, 17.7, 18.3, 18.7, 18.8, 19.2, 19.3, 19.7, 19.8, 20.2, 20.8, 20.3, 20.7, 21.2, 21.8, 22.2, 22.8, 22.7, 22.3, 23.3, 23.8, 23.2, 23.7, 24.8, 24.2, 24.3, 25.3, 25.2, 26.7, 26.2, 7.5, 28.2, 29.8, 30.3, 30.7, 5.6, 31.7, 31.2, 8.1, 9.6, 12.1, 12.6, 13.1, 14.6, 14.1, 15.6, 15.1}
>>> pd.DataFrame(data=boston.data, columns=boston.feature_names)
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 |
1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 |
2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 |
5 | 0.02985 | 0.0 | 2.18 | 0.0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.12 | 5.21 |
6 | 0.08829 | 12.5 | 7.87 | 0.0 | 0.524 | 6.012 | 66.6 | 5.5605 | 5.0 | 311.0 | 15.2 | 395.60 | 12.43 |
7 | 0.14455 | 12.5 | 7.87 | 0.0 | 0.524 | 6.172 | 96.1 | 5.9505 | 5.0 | 311.0 | 15.2 | 396.90 | 19.15 |
8 | 0.21124 | 12.5 | 7.87 | 0.0 | 0.524 | 5.631 | 100.0 | 6.0821 | 5.0 | 311.0 | 15.2 | 386.63 | 29.93 |
9 | 0.17004 | 12.5 | 7.87 | 0.0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5.0 | 311.0 | 15.2 | 386.71 | 17.10 |
10 | 0.22489 | 12.5 | 7.87 | 0.0 | 0.524 | 6.377 | 94.3 | 6.3467 | 5.0 | 311.0 | 15.2 | 392.52 | 20.45 |
11 | 0.11747 | 12.5 | 7.87 | 0.0 | 0.524 | 6.009 | 82.9 | 6.2267 | 5.0 | 311.0 | 15.2 | 396.90 | 13.27 |
12 | 0.09378 | 12.5 | 7.87 | 0.0 | 0.524 | 5.889 | 39.0 | 5.4509 | 5.0 | 311.0 | 15.2 | 390.50 | 15.71 |
13 | 0.62976 | 0.0 | 8.14 | 0.0 | 0.538 | 5.949 | 61.8 | 4.7075 | 4.0 | 307.0 | 21.0 | 396.90 | 8.26 |
14 | 0.63796 | 0.0 | 8.14 | 0.0 | 0.538 | 6.096 | 84.5 | 4.4619 | 4.0 | 307.0 | 21.0 | 380.02 | 10.26 |
15 | 0.62739 | 0.0 | 8.14 | 0.0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4.0 | 307.0 | 21.0 | 395.62 | 8.47 |
16 | 1.05393 | 0.0 | 8.14 | 0.0 | 0.538 | 5.935 | 29.3 | 4.4986 | 4.0 | 307.0 | 21.0 | 386.85 | 6.58 |
17 | 0.78420 | 0.0 | 8.14 | 0.0 | 0.538 | 5.990 | 81.7 | 4.2579 | 4.0 | 307.0 | 21.0 | 386.75 | 14.67 |
18 | 0.80271 | 0.0 | 8.14 | 0.0 | 0.538 | 5.456 | 36.6 | 3.7965 | 4.0 | 307.0 | 21.0 | 288.99 | 11.69 |
19 | 0.72580 | 0.0 | 8.14 | 0.0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4.0 | 307.0 | 21.0 | 390.95 | 11.28 |
20 | 1.25179 | 0.0 | 8.14 | 0.0 | 0.538 | 5.570 | 98.1 | 3.7979 | 4.0 | 307.0 | 21.0 | 376.57 | 21.02 |
21 | 0.85204 | 0.0 | 8.14 | 0.0 | 0.538 | 5.965 | 89.2 | 4.0123 | 4.0 | 307.0 | 21.0 | 392.53 | 13.83 |
22 | 1.23247 | 0.0 | 8.14 | 0.0 | 0.538 | 6.142 | 91.7 | 3.9769 | 4.0 | 307.0 | 21.0 | 396.90 | 18.72 |
23 | 0.98843 | 0.0 | 8.14 | 0.0 | 0.538 | 5.813 | 100.0 | 4.0952 | 4.0 | 307.0 | 21.0 | 394.54 | 19.88 |
24 | 0.75026 | 0.0 | 8.14 | 0.0 | 0.538 | 5.924 | 94.1 | 4.3996 | 4.0 | 307.0 | 21.0 | 394.33 | 16.30 |
25 | 0.84054 | 0.0 | 8.14 | 0.0 | 0.538 | 5.599 | 85.7 | 4.4546 | 4.0 | 307.0 | 21.0 | 303.42 | 16.51 |
26 | 0.67191 | 0.0 | 8.14 | 0.0 | 0.538 | 5.813 | 90.3 | 4.6820 | 4.0 | 307.0 | 21.0 | 376.88 | 14.81 |
27 | 0.95577 | 0.0 | 8.14 | 0.0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4.0 | 307.0 | 21.0 | 306.38 | 17.28 |
28 | 0.77299 | 0.0 | 8.14 | 0.0 | 0.538 | 6.495 | 94.4 | 4.4547 | 4.0 | 307.0 | 21.0 | 387.94 | 12.80 |
29 | 1.00245 | 0.0 | 8.14 | 0.0 | 0.538 | 6.674 | 87.3 | 4.2390 | 4.0 | 307.0 | 21.0 | 380.23 | 11.98 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
476 | 4.87141 | 0.0 | 18.10 | 0.0 | 0.614 | 6.484 | 93.6 | 2.3053 | 24.0 | 666.0 | 20.2 | 396.21 | 18.68 |
477 | 15.02340 | 0.0 | 18.10 | 0.0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24.0 | 666.0 | 20.2 | 349.48 | 24.91 |
478 | 10.23300 | 0.0 | 18.10 | 0.0 | 0.614 | 6.185 | 96.7 | 2.1705 | 24.0 | 666.0 | 20.2 | 379.70 | 18.03 |
479 | 14.33370 | 0.0 | 18.10 | 0.0 | 0.614 | 6.229 | 88.0 | 1.9512 | 24.0 | 666.0 | 20.2 | 383.32 | 13.11 |
480 | 5.82401 | 0.0 | 18.10 | 0.0 | 0.532 | 6.242 | 64.7 | 3.4242 | 24.0 | 666.0 | 20.2 | 396.90 | 10.74 |
481 | 5.70818 | 0.0 | 18.10 | 0.0 | 0.532 | 6.750 | 74.9 | 3.3317 | 24.0 | 666.0 | 20.2 | 393.07 | 7.74 |
482 | 5.73116 | 0.0 | 18.10 | 0.0 | 0.532 | 7.061 | 77.0 | 3.4106 | 24.0 | 666.0 | 20.2 | 395.28 | 7.01 |
483 | 2.81838 | 0.0 | 18.10 | 0.0 | 0.532 | 5.762 | 40.3 | 4.0983 | 24.0 | 666.0 | 20.2 | 392.92 | 10.42 |
484 | 2.37857 | 0.0 | 18.10 | 0.0 | 0.583 | 5.871 | 41.9 | 3.7240 | 24.0 | 666.0 | 20.2 | 370.73 | 13.34 |
485 | 3.67367 | 0.0 | 18.10 | 0.0 | 0.583 | 6.312 | 51.9 | 3.9917 | 24.0 | 666.0 | 20.2 | 388.62 | 10.58 |
486 | 5.69175 | 0.0 | 18.10 | 0.0 | 0.583 | 6.114 | 79.8 | 3.5459 | 24.0 | 666.0 | 20.2 | 392.68 | 14.98 |
487 | 4.83567 | 0.0 | 18.10 | 0.0 | 0.583 | 5.905 | 53.2 | 3.1523 | 24.0 | 666.0 | 20.2 | 388.22 | 11.45 |
488 | 0.15086 | 0.0 | 27.74 | 0.0 | 0.609 | 5.454 | 92.7 | 1.8209 | 4.0 | 711.0 | 20.1 | 395.09 | 18.06 |
489 | 0.18337 | 0.0 | 27.74 | 0.0 | 0.609 | 5.414 | 98.3 | 1.7554 | 4.0 | 711.0 | 20.1 | 344.05 | 23.97 |
490 | 0.20746 | 0.0 | 27.74 | 0.0 | 0.609 | 5.093 | 98.0 | 1.8226 | 4.0 | 711.0 | 20.1 | 318.43 | 29.68 |
491 | 0.10574 | 0.0 | 27.74 | 0.0 | 0.609 | 5.983 | 98.8 | 1.8681 | 4.0 | 711.0 | 20.1 | 390.11 | 18.07 |
492 | 0.11132 | 0.0 | 27.74 | 0.0 | 0.609 | 5.983 | 83.5 | 2.1099 | 4.0 | 711.0 | 20.1 | 396.90 | 13.35 |
493 | 0.17331 | 0.0 | 9.69 | 0.0 | 0.585 | 5.707 | 54.0 | 2.3817 | 6.0 | 391.0 | 19.2 | 396.90 | 12.01 |
494 | 0.27957 | 0.0 | 9.69 | 0.0 | 0.585 | 5.926 | 42.6 | 2.3817 | 6.0 | 391.0 | 19.2 | 396.90 | 13.59 |
495 | 0.17899 | 0.0 | 9.69 | 0.0 | 0.585 | 5.670 | 28.8 | 2.7986 | 6.0 | 391.0 | 19.2 | 393.29 | 17.60 |
496 | 0.28960 | 0.0 | 9.69 | 0.0 | 0.585 | 5.390 | 72.9 | 2.7986 | 6.0 | 391.0 | 19.2 | 396.90 | 21.14 |
497 | 0.26838 | 0.0 | 9.69 | 0.0 | 0.585 | 5.794 | 70.6 | 2.8927 | 6.0 | 391.0 | 19.2 | 396.90 | 14.10 |
498 | 0.23912 | 0.0 | 9.69 | 0.0 | 0.585 | 6.019 | 65.3 | 2.4091 | 6.0 | 391.0 | 19.2 | 396.90 | 12.92 |
499 | 0.17783 | 0.0 | 9.69 | 0.0 | 0.585 | 5.569 | 73.5 | 2.3999 | 6.0 | 391.0 | 19.2 | 395.77 | 15.10 |
500 | 0.22438 | 0.0 | 9.69 | 0.0 | 0.585 | 6.027 | 79.7 | 2.4982 | 6.0 | 391.0 | 19.2 | 396.90 | 14.33 |
501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 |
502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 |
503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 |
504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 |
505 | 0.04741 | 0.0 | 11.93 | 0.0 | 0.573 | 6.030 | 80.8 | 2.5050 | 1.0 | 273.0 | 21.0 | 396.90 | 7.88 |
506 rows x 13 columns
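Recent scikit-learn releases no longer include `load_boston`. The same `X` and `y` can instead be rebuilt from `housing_data`, the list of 14-value rows parsed from the UCI file earlier. The first 13 columns are the features and the last one (MEDV) is the target.

A minimal NumPy sketch follows; the function name is hypothetical.

```python
import numpy as np


def split_features_target(rows):
    """Split rows of 14 floats into a (n, 13) feature matrix and the MEDV target vector."""
    data = np.asarray(rows, dtype=float)
    if data.ndim != 2 or data.shape[1] != 14:
        raise ValueError("expected rows of 14 values, got shape %s" % (data.shape,))
    return data[:, :13], data[:, 13]


# Rows 0 and 1 of the housing table above
rows = [
    [0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.09, 1.0, 296.0, 15.3, 396.9, 4.98, 24.0],
    [0.02731, 0.0, 7.07, 0.0, 0.469, 6.421, 78.9, 4.9671, 2.0, 242.0, 17.8, 396.9, 9.14, 21.6],
]
X, y = split_features_target(rows)
print(X.shape, y.tolist())  # (2, 13) [24.0, 21.6]
```

With the downloaded data, `X, y = split_features_target(housing_data)` yields a 506x13 matrix and 506 prices. In that case `housing_header[:13]` supplies the feature names.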
# Color names (CSS named colors) used for the subplots below
>>> cnames = {'aliceblue': '#F0F8FF', 'antiquewhite': '#FAEBD7', 'aqua': '#00FFFF', 'aquamarine': '#7FFFD4', 'azure': '#F0FFFF', 'beige': '#F5F5DC', 'bisque': '#FFE4C4', 'black': '#000000', 'blanchedalmond': '#FFEBCD', 'blue': '#0000FF', 'blueviolet': '#8A2BE2', 'brown': '#A52A2A', 'burlywood': '#DEB887', 'cadetblue': '#5F9EA0', 'chartreuse': '#7FFF00', 'chocolate': '#D2691E', 'coral': '#FF7F50', 'cornflowerblue': '#6495ED', 'cornsilk': '#FFF8DC', 'crimson': '#DC143C', 'cyan': '#00FFFF', 'darkblue': '#00008B', 'darkcyan': '#008B8B', 'darkgoldenrod': '#B8860B', 'darkgray': '#A9A9A9', 'darkgreen': '#006400', 'darkkhaki': '#BDB76B', 'darkmagenta': '#8B008B', 'darkolivegreen': '#556B2F', 'darkorange': '#FF8C00', 'darkorchid': '#9932CC', 'darkred': '#8B0000', 'darksalmon': '#E9967A', 'darkseagreen': '#8FBC8F', 'darkslateblue': '#483D8B', 'darkslategray': '#2F4F4F', 'darkturquoise': '#00CED1', 'darkviolet': '#9400D3', 'deeppink': '#FF1493', 'deepskyblue': '#00BFFF', 'dimgray': '#696969', 'dodgerblue': '#1E90FF', 'firebrick': '#B22222', 'floralwhite': '#FFFAF0', 'forestgreen': '#228B22', 'fuchsia': '#FF00FF', 'gainsboro': '#DCDCDC', 'ghostwhite': '#F8F8FF', 'gold': '#FFD700', 'goldenrod': '#DAA520', 'gray': '#808080', 'green': '#008000', 'greenyellow': '#ADFF2F', 'honeydew': '#F0FFF0', 'hotpink': '#FF69B4', 'indianred': '#CD5C5C', 'indigo': '#4B0082', 'ivory': '#FFFFF0', 'khaki': '#F0E68C', 'lavender': '#E6E6FA', 'lavenderblush': '#FFF0F5', 'lawngreen': '#7CFC00', 'lemonchiffon': '#FFFACD', 'lightblue': '#ADD8E6', 'lightcoral': '#F08080', 'lightcyan': '#E0FFFF', 'lightgoldenrodyellow': '#FAFAD2', 'lightgreen': '#90EE90', 'lightgray': '#D3D3D3', 'lightpink': '#FFB6C1', 'lightsalmon': '#FFA07A', 'lightseagreen': '#20B2AA', 'lightskyblue': '#87CEFA', 'lightslategray': '#778899', 'lightsteelblue': '#B0C4DE', 'lightyellow': '#FFFFE0', 'lime': '#00FF00', 'limegreen': '#32CD32', 'linen': '#FAF0E6', 'magenta': '#FF00FF', 'maroon': '#800000', 'mediumaquamarine': '#66CDAA', 
'mediumblue': '#0000CD', 'mediumorchid': '#BA55D3', 'mediumpurple': '#9370DB', 'mediumseagreen': '#3CB371', 'mediumslateblue': '#7B68EE', 'mediumspringgreen': '#00FA9A', 'mediumturquoise': '#48D1CC', 'mediumvioletred': '#C71585', 'midnightblue': '#191970', 'mintcream': '#F5FFFA', 'mistyrose': '#FFE4E1', 'moccasin': '#FFE4B5', 'navajowhite': '#FFDEAD', 'navy': '#000080', 'oldlace': '#FDF5E6', 'olive': '#808000', 'olivedrab': '#6B8E23', 'orange': '#FFA500', 'orangered': '#FF4500', 'orchid': '#DA70D6', 'palegoldenrod': '#EEE8AA', 'palegreen': '#98FB98', 'paleturquoise': '#AFEEEE', 'palevioletred': '#DB7093', 'papayawhip': '#FFEFD5', 'peachpuff': '#FFDAB9', 'peru': '#CD853F', 'pink': '#FFC0CB', 'plum': '#DDA0DD', 'powderblue': '#B0E0E6', 'purple': '#800080', 'red': '#FF0000', 'rosybrown': '#BC8F8F', 'royalblue': '#4169E1', 'saddlebrown': '#8B4513', 'salmon': '#FA8072', 'sandybrown': '#FAA460', 'seagreen': '#2E8B57', 'seashell': '#FFF5EE', 'sienna': '#A0522D', 'silver': '#C0C0C0', 'skyblue': '#87CEEB', 'slateblue': '#6A5ACD', 'slategray': '#708090', 'snow': '#FFFAFA', 'springgreen': '#00FF7F', 'steelblue': '#4682B4', 'tan': '#D2B48C', 'teal': '#008080', 'thistle': '#D8BFD8', 'tomato': '#FF6347', 'turquoise': '#40E0D0', 'violet': '#EE82EE', 'wheat': '#F5DEB3', 'white': '#FFFFFF', 'whitesmoke': '#F5F5F5', 'yellow': '#FFFF00', 'yellowgreen': '#9ACD32'}
>>> colorname = list(cnames.keys())
>>> print(colorname)
['aliceblue', 'antiquewhite', 'aqua', 'aquamarine', 'azure', 'beige', 'bisque', 'black', 'blanchedalmond', 'blue', 'blueviolet', 'brown', 'burlywood', 'cadetblue', 'chartreuse', 'chocolate', 'coral', 'cornflowerblue', 'cornsilk', 'crimson', 'cyan', 'darkblue', 'darkcyan', 'darkgoldenrod', 'darkgray', 'darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred', 'darksalmon', 'darkseagreen', 'darkslateblue', 'darkslategray', 'darkturquoise', 'darkviolet', 'deeppink', 'deepskyblue', 'dimgray', 'dodgerblue', 'firebrick', 'floralwhite', 'forestgreen', 'fuchsia', 'gainsboro', 'ghostwhite', 'gold', 'goldenrod', 'gray', 'green', 'greenyellow', 'honeydew', 'hotpink', 'indianred', 'indigo', 'ivory', 'khaki', 'lavender', 'lavenderblush', 'lawngreen', 'lemonchiffon', 'lightblue', 'lightcoral', 'lightcyan', 'lightgoldenrodyellow', 'lightgreen', 'lightgray', 'lightpink', 'lightsalmon', 'lightseagreen', 'lightskyblue', 'lightslategray', 'lightsteelblue', 'lightyellow', 'lime', 'limegreen', 'linen', 'magenta', 'maroon', 'mediumaquamarine', 'mediumblue', 'mediumorchid', 'mediumpurple', 'mediumseagreen', 'mediumslateblue', 'mediumspringgreen', 'mediumturquoise', 'mediumvioletred', 'midnightblue', 'mintcream', 'mistyrose', 'moccasin', 'navajowhite', 'navy', 'oldlace', 'olive', 'olivedrab', 'orange', 'orangered', 'orchid', 'palegoldenrod', 'palegreen', 'paleturquoise', 'palevioletred', 'papayawhip', 'peachpuff', 'peru', 'pink', 'plum', 'powderblue', 'purple', 'red', 'rosybrown', 'royalblue', 'saddlebrown', 'salmon', 'sandybrown', 'seagreen', 'seashell', 'sienna', 'silver', 'skyblue', 'slateblue', 'slategray', 'snow', 'springgreen', 'steelblue', 'tan', 'teal', 'thistle', 'tomato', 'turquoise', 'violet', 'wheat', 'white', 'whitesmoke', 'yellow', 'yellowgreen']
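Matplotlib already ships a table of the CSS named colors as `matplotlib.colors.CSS4_COLORS`, so the dictionary does not have to be typed by hand.

Its key set is not identical to `cnames` above: it contains additional names, such as the "grey" spellings. Slicing `colorname[13:26]` may therefore select different colors. Sorting the keys alphabetically, as below, is a choice of this sketch and not the original ordering.

```python
from matplotlib import colors as mcolors

# Name -> '#RRGGBB' hex string for every CSS named color known to matplotlib
cnames = dict(mcolors.CSS4_COLORS)
colorname = sorted(cnames)

print(len(colorname), cnames['aliceblue'])
```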
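>>> X = boston.data
>>> y = boston.target
>>> features = boston.feature_names
>>> def Boston():
...     # One scatter panel per feature (13 in total) against the target price y
...     plt.figure(43, figsize=(12, 14))
...     for i, colorn in enumerate(colorname[13:26]):
...         if i < 12:
...             # first 12 features on a 5x3 grid (rows 1-4)
...             plt.subplot(5, 3, i + 1)
...         else:
...             # 13th feature (LSTAT) spans the full bottom row
...             plt.subplot(5, 1, 5)
...         plt.plot(X[:, i], y, '.', color=colorn)
...         plt.xlabel(features[i])
...     plt.tight_layout()
...     plt.savefig('Boston_Housing_Data.png', dpi=700)
...     plt.show()
...
>>> Boston()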
Plot the relationship between LSTAT (percentage of lower-status population) and the Boston house price on its own:
>>> X = boston.data
>>> y = boston.target
>>> features = boston.feature_names[12]  # 'LSTAT'
>>> targets = 'Boston Housing Price versus %s' % (features)  # legend label
>>> plt.figure(figsize=(8, 5))
>>> plt.plot(X[:, 12], y, 'bx', label=targets)
>>> plt.xlabel(features)
>>> plt.ylabel('MEDV')
>>> plt.title('Boston Housing Data')
>>> plt.legend()
>>> plt.savefig('Boston Housing Data.png', dpi=500)
>>> plt.show()
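The scatter suggests that price falls as LSTAT rises. The Pearson correlation coefficient, computed by `np.corrcoef`, is one way to put a number on that.

This sketch uses only rows 0 to 4 of the housing table. Its value therefore illustrates the calculation and is not the correlation over all 506 houses. With the full data, use `np.corrcoef(X[:, 12], y)[0, 1]`.

```python
import numpy as np

# LSTAT and MEDV for rows 0-4 of the housing table above
lstat = np.array([4.98, 9.14, 4.03, 2.94, 5.33])
medv = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

# corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r(LSTAT, MEDV)
r = np.corrcoef(lstat, medv)[0, 1]
print(round(float(r), 3))
```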
MNIST Handwriting Dataset (handwritten digits, Yann LeCun)
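MNIST (Modified National Institute of Standards and Technology) is a subset of the larger NIST handwriting database, and it is the "Hello World" of image recognition. The well-known scientist Yann LeCun hosts the dataset at mnist. Because it is used so often, many libraries, including TensorFlow, provide loaders for it. The dataset is split into a training set of 60,000 images and a test set of 10,000 images, and every image is labeled. The dataset (MNIST) consists of four files: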
train-images-idx3-ubyte.gz (training images)
train-labels-idx1-ubyte.gz (training labels)
t10k-images-idx3-ubyte.gz (test images)
t10k-labels-idx1-ubyte.gz (test labels)
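Each file uses the IDX format described on the MNIST page. A file starts with a big-endian header:
- a magic number, 2051 for image files and 2049 for label files;
- the item count;
- for image files only, the row and column counts.

The header is followed by one unsigned byte per pixel or label.

Below is a standard-library sketch of a reader. The function names are hypothetical. The byte strings in the usage are a tiny synthetic fixture, not real MNIST data.

```python
import gzip
import struct


def read_idx_images(raw):
    """Parse uncompressed IDX3 bytes into (rows, cols, list of per-image byte strings)."""
    magic, count, rows, cols = struct.unpack('>IIII', raw[:16])
    if magic != 2051:
        raise ValueError('not an IDX3 image file (magic=%d)' % magic)
    size = rows * cols
    body = raw[16:16 + count * size]
    return rows, cols, [body[i * size:(i + 1) * size] for i in range(count)]


def read_idx_labels(raw):
    """Parse uncompressed IDX1 bytes into a list of integer labels."""
    magic, count = struct.unpack('>II', raw[:8])
    if magic != 2049:
        raise ValueError('not an IDX1 label file (magic=%d)' % magic)
    return list(raw[8:8 + count])


def read_gz(path):
    """Read and decompress one of the .gz files listed above."""
    with gzip.open(path, 'rb') as f:
        return f.read()


# Synthetic fixture: two 2x2 "images" (gzip round-tripped in memory) and three labels
fake_images = gzip.decompress(gzip.compress(struct.pack('>IIII', 2051, 2, 2, 2) + bytes(range(8))))
fake_labels = struct.pack('>II', 2049, 3) + bytes([7, 2, 1])
rows, cols, images = read_idx_images(fake_images)
labels = read_idx_labels(fake_labels)
print(rows, cols, [list(img) for img in images], labels)  # 2 2 [[0, 1, 2, 3], [4, 5, 6, 7]] [7, 2, 1]
```

On the real training file, `read_idx_images(read_gz(dataset_local_path + 'train-images-idx3-ubyte.gz'))` should report 28x28 images and 60,000 of them.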
>>> import os, sys
>>> if os.getenv("JUPYTER_ENABLE_OBS") == "false":
...     project_path = os.getcwd() + "/"
... else:
...     # OBS-backed notebook instance: use the default example directory
...     directory = "mxnet_mnist_digit_recognition_train"
...     project_path = os.environ['HOME'] + '/work/' + directory + "/"
...
>>> sys.path.append(project_path)
>>> dataset_url = "https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Mnist-Data-Set/archiver/Mnist-Data-Set.zip"
>>> dataset_file_names = ["train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz", "t10k-images-idx3-ubyte.gz", "t10k-labels-idx1-ubyte.gz"]
>>> dataset_local_path = project_path + 'dataset/'
>>> dataset_local_name = dataset_local_path + 'Mnist-Data-Set.zip'
!wget {dataset_url} -P {dataset_local_path}
!unzip -d {dataset_local_path} -o {dataset_local_name}
--2020-08-12 10:15:31-- https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Mnist-Data-Set/archiver/Mnist-Data-Set.zip
Resolving proxy-notebook.modelarts-dev-proxy.com (proxy-notebook.modelarts-dev-proxy.com)... xxx.xxx.x.xxx
Connecting to proxy-notebook.modelarts-dev-proxy.com (proxy-notebook.modelarts-dev-proxy.com)|xxx.xxx.x.xxx|:8083... connected.
Proxy request sent, awaiting response... 200 OK
Length: 23192478 (22M) [application/octet-stream]
Saving to: ‘/home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/Mnist-Data-Set.zip’
Mnist-Data-Set.zip 100%[===================>] 22.12M 124MB/s in 0.2s
2020-08-12 10:15:33 (124 MB/s) - ‘/home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/Mnist-Data-Set.zip’ saved [23192478/23192478]
Archive: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/Mnist-Data-Set.zip
inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-images-idx3-ubyte
inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-images-idx3-ubyte.gz
inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-labels-idx1-ubyte
extracting: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/t10k-labels-idx1-ubyte.gz
inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-images-idx3-ubyte
inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-images-idx3-ubyte.gz
inflating: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-labels-idx1-ubyte
extracting: /home/ma-user/work/mxnet_mnist_digit_recognition_train/dataset/train-labels-idx1-ubyte.gz
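The `wget` and `unzip` commands above rely on a shell. Outside a notebook, the same two steps can be done from Python with the standard library: `urllib.request.urlretrieve` for the download and `zipfile` for extraction.

The sketch below makes these assumptions:
- It reuses `dataset_url`, `dataset_local_name`, `dataset_local_path` and `dataset_file_names` as defined above.
- The helper names are hypothetical.
- The download is skipped when the archive already exists.

```python
import os
import urllib.request
import zipfile


def fetch_archive(url, zip_path):
    """Download url to zip_path unless the file already exists; return zip_path."""
    if not os.path.exists(zip_path):
        os.makedirs(os.path.dirname(zip_path) or '.', exist_ok=True)
        urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_archive(zip_path, dest_dir, expected_names):
    """Extract zip_path into dest_dir; return the expected file names that are still missing."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return [name for name in expected_names
            if not os.path.exists(os.path.join(dest_dir, name))]


# Usage with the variables defined above (the download needs network access):
# fetch_archive(dataset_url, dataset_local_name)
# missing = extract_archive(dataset_local_name, dataset_local_path, dataset_file_names)
# print(missing)  # an empty list means all four .gz files are in place
```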