Data type verification
4:30 min | Last modified: October 14, 2021
This section covers correcting the data types of table columns.
[1]:
import pandas as pd
String to integer or float
[2]:
%%writefile /tmp/data.csv
orderId,price,percentage
1,100$,15.3%
2,120$,22.1%
3,128$,54.2%
4,155$,10.0%
5,234$,6%
Overwriting /tmp/data.csv
[3]:
df = pd.read_csv("/tmp/data.csv")
#
# Note that the price and percentage columns have
# dtype object. This is caused by the $ and %
# characters in the file.
#
display(
df,
df.dtypes
)
| | orderId | price | percentage |
|---|---|---|---|
| 0 | 1 | 100$ | 15.3% |
| 1 | 2 | 120$ | 22.1% |
| 2 | 3 | 128$ | 54.2% |
| 3 | 4 | 155$ | 10.0% |
| 4 | 5 | 234$ | 6% |
orderId int64
price object
percentage object
dtype: object
[4]:
#
# Correction: strip the symbol, then cast to a numeric type
#
df.price = df.price.str.strip('$')
df.price = df.price.astype(int)
df.percentage = df.percentage.str.strip('%')
df.percentage = df.percentage.astype(float)
display(
df,
df.dtypes
)
| | orderId | price | percentage |
|---|---|---|---|
| 0 | 1 | 100 | 15.3 |
| 1 | 2 | 120 | 22.1 |
| 2 | 3 | 128 | 54.2 |
| 3 | 4 | 155 | 10.0 |
| 4 | 5 | 234 | 6.0 |
orderId int64
price int64
percentage float64
dtype: object
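The `astype` calls above raise an exception if any value still fails to parse after stripping the symbol. As a minimal sketch of a more tolerant variant, the block below uses `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into `NaN`. The `orders` name and the malformed `unknown` entry are assumptions added for illustration; they are not part of the original data.

```python
import io

import pandas as pd

# Hypothetical data: same layout as the example, plus one malformed price.
raw = io.StringIO(
    "orderId,price,percentage\n"
    "1,100$,15.3%\n"
    "2,unknown,22.1%\n"
    "3,128$,6%\n"
)
orders = pd.read_csv(raw)

# Remove the trailing symbols, then convert. Values that cannot be
# parsed become NaN instead of raising an exception.
orders["price"] = pd.to_numeric(orders["price"].str.rstrip("$"), errors="coerce")
orders["percentage"] = pd.to_numeric(
    orders["percentage"].str.rstrip("%"), errors="coerce"
)

print(orders)
print(orders.dtypes)
```

Because of the missing value, `price` ends up as `float64` rather than `int64`. Whether to coerce or to fail loudly depends on how much the input file can be trusted.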
Numeric to category
Codebook:
0 single
1 married
2 divorced
[5]:
%%writefile /tmp/data.csv
personId,status
1,0
2,0
3,1
3,2
4,2
Overwriting /tmp/data.csv
[6]:
df = pd.read_csv("/tmp/data.csv")
#
# The status column is read as int64
#
display(
df,
df.dtypes
)
| | personId | status |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 3 | 1 |
| 3 | 3 | 2 |
| 4 | 4 | 2 |
personId int64
status int64
dtype: object
[7]:
df.status = df.status.astype('category')
display(
df,
df.dtypes
)
| | personId | status |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 3 | 1 |
| 3 | 3 | 2 |
| 4 | 4 | 2 |
personId int64
status category
dtype: object
[8]:
df.describe()
[8]:
| | personId |
|---|---|
| count | 5.000000 |
| mean | 2.600000 |
| std | 1.140175 |
| min | 1.000000 |
| 25% | 2.000000 |
| 50% | 3.000000 |
| 75% | 3.000000 |
| max | 4.000000 |
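The summary above contains only `personId`: once `status` is a category, `describe()` no longer treats it as numeric and leaves it out by default. As a rough sketch, the categorical column can be requested explicitly through the `include` parameter. The `people` DataFrame below is rebuilt from the table values so the block runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Same values as the example table.
people = pd.DataFrame(
    {"personId": [1, 2, 3, 3, 4], "status": [0, 0, 1, 2, 2]}
)
people["status"] = people["status"].astype("category")

# Default behavior: only numeric columns are summarized.
numeric_summary = people.describe()

# Ask for categorical columns explicitly to get count/unique/top/freq.
category_summary = people.describe(include=["category"])

print(numeric_summary.columns.tolist())
print(category_summary)
```

For a categorical column the summary reports the number of observations, the number of distinct categories, the most frequent category, and its frequency, rather than mean and quartiles.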
[9]:
df.status
[9]:
0 0
1 0
2 1
3 2
4 2
Name: status, dtype: category
Categories (3, int64): [0, 1, 2]
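The categories are still the raw codes 0, 1, and 2. One possible next step, sketched here under the assumption that the codebook listed earlier is complete, is to replace the codes with their labels using `cat.rename_categories`. The `codebook` dictionary and the `people` name are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Same values as the example table.
people = pd.DataFrame(
    {"personId": [1, 2, 3, 3, 4], "status": [0, 0, 1, 2, 2]}
)

# Codebook from the text: numeric code -> label.
codebook = {0: "single", 1: "married", 2: "divorced"}

# Convert to category, then rename each category to its label.
people["status"] = (
    people["status"].astype("category").cat.rename_categories(codebook)
)

print(people["status"])
# The underlying integer codes are still available if needed.
print(people["status"].cat.codes.tolist())
```

Renaming keeps the column categorical, so the labels are stored once and each row still holds only a small integer code.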