Checking numeric and date ranges (10:06 min)
Last modified: October 14, 2021 | YouTube
Numeric ranges
Options for handling a value outside the allowed range:
Delete the record.
Convert the value to null.
Clamp the value to the maximum or minimum.
Impute the value as if it were null.
Replace the value with a preset value.
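The cells in this section demonstrate clamping and deletion. The other three options can be sketched as follows. This is a minimal sketch, not the author's code: the data frame is rebuilt inline with the same values used in this section, the median is an assumed imputation strategy, and `DEFAULT_VALUE` is a hypothetical placeholder.

```python
import pandas as pd

# Same values as the rangecol column used in this section.
df = pd.DataFrame(
    {
        "personId": [1, 2, 3, 4, 5, 6, 7, 8],
        "rangecol": [1, 3, 2, 10, 0, 1, 10, 9],
    }
)

# True where the value respects the restriction 1 <= rangecol <= 3.
in_range = (df["rangecol"] >= 1) & (df["rangecol"] <= 3)

# Conversion to null: out-of-range values become NaN (the result is float).
as_null = df["rangecol"].where(in_range)

# Imputation: fill the nulls, here with the median of the valid values.
# The median is an illustrative assumption, not a choice made by the source.
imputed = as_null.fillna(as_null.median())

# Update to a preset value; DEFAULT_VALUE is a hypothetical placeholder.
DEFAULT_VALUE = 2
preset = df["rangecol"].where(in_range, DEFAULT_VALUE)
```

`Series.where` keeps the values where the condition holds and replaces the rest, which makes it a natural fit for "keep valid, rewrite invalid" rules.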
[1]:
%%writefile /tmp/data.csv
personId,rangecol
1,1
2,3
3,2
4,10
5,0
6,1
7,10
8,9
Overwriting /tmp/data.csv
The values of the rangecol column
are restricted to 1, 2, and 3 (the closed range from 1 to 3).
[2]:
import pandas as pd
df = pd.read_csv('/tmp/data.csv')
#
# Records that violate the restriction.
#
df[(df.rangecol < 1) | (df.rangecol > 3)].rangecol
[2]:
3 10
4 0
6 10
7 9
Name: rangecol, dtype: int64
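The same check can also be written with `Series.between`, which is inclusive on both ends by default. The sketch below rebuilds the data inline so it runs on its own; the variable name `out_of_range` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "personId": [1, 2, 3, 4, 5, 6, 7, 8],
        "rangecol": [1, 3, 2, 10, 0, 1, 10, 9],
    }
)

# ~between(1, 3) selects the same rows as (rangecol < 1) | (rangecol > 3).
out_of_range = df.loc[~df["rangecol"].between(1, 3), "rangecol"]
```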
[3]:
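#
# Values > 3 are set to 3 and values < 1 are set to 1.
# .loc is used so that the assignment modifies df itself
# instead of relying on chained indexing.
#
df.loc[df.rangecol > 3, "rangecol"] = 3
df.loc[df.rangecol < 1, "rangecol"] = 1
df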
[3]:
| | personId | rangecol |
---|---|---|
0 | 1 | 1 |
1 | 2 | 3 |
2 | 3 | 2 |
3 | 4 | 3 |
4 | 5 | 1 |
5 | 6 | 1 |
6 | 7 | 3 |
7 | 8 | 3 |
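Clamping to both bounds can also be done in a single step with `Series.clip`. This is a sketch of an alternative, not the approach used in the cell above; the data is rebuilt inline so the example is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "personId": [1, 2, 3, 4, 5, 6, 7, 8],
        "rangecol": [1, 3, 2, 10, 0, 1, 10, 9],
    }
)

# Values below 1 become 1 and values above 3 become 3; the rest are unchanged.
df["rangecol"] = df["rangecol"].clip(lower=1, upper=3)
```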
[4]:
df = pd.read_csv('/tmp/data.csv')
#
# Delete the records that are out of range
# by selecting only the valid rows.
#
df = df[(df.rangecol >= 1) & (df.rangecol <= 3)]
df
[4]:
| | personId | rangecol |
---|---|---|
0 | 1 | 1 |
1 | 2 | 3 |
2 | 3 | 2 |
5 | 6 | 1 |
[5]:
df = pd.read_csv("/tmp/data.csv")
#
# Delete the records that are out of range
# using the drop() method.
#
df.drop(
    df[(df.rangecol < 1) | (df.rangecol > 3)].index,
    inplace=True,
)
df
[5]:
| | personId | rangecol |
---|---|---|
0 | 1 | 1 |
1 | 2 | 3 |
2 | 3 | 2 |
5 | 6 | 1 |
Date ranges
[6]:
%%writefile /tmp/data.csv
eventId,eventDate
1,2012-01-10
2,1900-12-23
3,2018-09-17
4,2019-11-15
5,2020-04-23
6,2025-07-03
7,2020-02-17
8,2017-08-12
9,2015-06-24
Overwriting /tmp/data.csv
[7]:
df = pd.read_csv('/tmp/data.csv')
display(
    df,
    df.dtypes,
)
| | eventId | eventDate |
---|---|---|
0 | 1 | 2012-01-10 |
1 | 2 | 1900-12-23 |
2 | 3 | 2018-09-17 |
3 | 4 | 2019-11-15 |
4 | 5 | 2020-04-23 |
5 | 6 | 2025-07-03 |
6 | 7 | 2020-02-17 |
7 | 8 | 2017-08-12 |
8 | 9 | 2015-06-24 |
eventId int64
eventDate object
dtype: object
[8]:
#
# Change the data type of 'eventDate' to datetime.
#
df.eventDate = pd.to_datetime(df.eventDate)
display(
    df,
    df.dtypes,
)
| | eventId | eventDate |
---|---|---|
0 | 1 | 2012-01-10 |
1 | 2 | 1900-12-23 |
2 | 3 | 2018-09-17 |
3 | 4 | 2019-11-15 |
4 | 5 | 2020-04-23 |
5 | 6 | 2025-07-03 |
6 | 7 | 2020-02-17 |
7 | 8 | 2017-08-12 |
8 | 9 | 2015-06-24 |
eventId int64
eventDate datetime64[ns]
dtype: object
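The conversion can also happen while the file is read, through the `parse_dates` argument of `pd.read_csv`. The sketch below is an alternative to the two-step approach above; it reads from an in-memory string (an assumption made to keep the example self-contained) with a shortened copy of the data.

```python
import io

import pandas as pd

# Shortened copy of the events file, kept in memory for the example.
csv_text = """eventId,eventDate
1,2012-01-10
2,1900-12-23
3,2018-09-17
"""

# parse_dates asks read_csv to convert the listed columns to datetime.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["eventDate"])
```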
Allowed date range:
From 1950-01-01
To the current date
[9]:
import datetime as dt
today = pd.to_datetime(dt.date.today())
today
[9]:
Timestamp('2021-09-15 00:00:00')
[10]:
#
# Check the restrictions.
#
df[(df.eventDate < pd.to_datetime('1950-01-01')) | (df.eventDate > today)]
[10]:
| | eventId | eventDate |
---|---|---|
1 | 2 | 1900-12-23 |
5 | 6 | 2025-07-03 |
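When the same check is needed for several columns, it can be wrapped in a small helper. The function `find_out_of_range` below is hypothetical (it does not appear in the source), and the upper bound is fixed at 2021-09-15, the value of `today` shown earlier, so that the result is reproducible.

```python
import pandas as pd


def find_out_of_range(df, column, lower, upper):
    """Return the rows whose `column` falls outside [lower, upper]."""
    return df[~df[column].between(lower, upper)]


df = pd.DataFrame(
    {
        "eventId": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "eventDate": pd.to_datetime(
            [
                "2012-01-10", "1900-12-23", "2018-09-17",
                "2019-11-15", "2020-04-23", "2025-07-03",
                "2020-02-17", "2017-08-12", "2015-06-24",
            ]
        ),
    }
)

violations = find_out_of_range(
    df,
    "eventDate",
    pd.Timestamp("1950-01-01"),
    pd.Timestamp("2021-09-15"),  # fixed stand-in for the current date
)
```

In a live check, the last argument would be the `today` value computed above rather than a fixed date.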
[11]:
#
# Dates outside the range are clamped to the nearest bound.
#
df.loc[
    df.eventDate < "1950-01-01",
    "eventDate",
] = pd.to_datetime("1950-01-01")
df.loc[
    df.eventDate > today,
    "eventDate",
] = today
display(
    df,
    df.dtypes,
)
| | eventId | eventDate |
---|---|---|
0 | 1 | 2012-01-10 |
1 | 2 | 1950-01-01 |
2 | 3 | 2018-09-17 |
3 | 4 | 2019-11-15 |
4 | 5 | 2020-04-23 |
5 | 6 | 2021-09-15 |
6 | 7 | 2020-02-17 |
7 | 8 | 2017-08-12 |
8 | 9 | 2015-06-24 |
eventId int64
eventDate datetime64[ns]
dtype: object
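As with numeric columns, both bounds can be applied at once with `Series.clip`, which also accepts timestamps. This is a sketch of an alternative to the two `.loc` assignments; the upper bound is fixed at 2021-09-15 (the value of `today` in this run) as an assumption to keep the output reproducible.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "eventId": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "eventDate": pd.to_datetime(
            [
                "2012-01-10", "1900-12-23", "2018-09-17",
                "2019-11-15", "2020-04-23", "2025-07-03",
                "2020-02-17", "2017-08-12", "2015-06-24",
            ]
        ),
    }
)

lower = pd.Timestamp("1950-01-01")
upper = pd.Timestamp("2021-09-15")  # fixed stand-in for the current date

# Dates before `lower` become `lower`; dates after `upper` become `upper`.
df["eventDate"] = df["eventDate"].clip(lower=lower, upper=upper)
```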