Operaciones básicas en SQLite3 — 20:37 min
20:37 min | Última modificación: Octubre 14, 2021 | YouTube
Este tutorial esta basado en https://es.hortonworks.com/tutorial/beginners-guide-to-apache-pig/
En este modulo se usará como base de datos sqlite3
(www.sqlite.org) la cual es ampliamente utilizada para programación de bases de datos portables. El sitio oficial es https://www.sqlite.org/index.html. SQLite provee una interfaz de línea de comandos preinstalada en muchos sistemas Linux llamada sqlite3
. También existe una versión para Microsoft Windows.
En este tutorial se usará la interfaz pysqlite3
de Python. Los comandos SQL presentados en los ejemplos pueden ser usados directamente en la interfaz de línea de comandos o enviados desde Python.
En este tutorial se ejemplifica:
La carga de datos.
El uso básico de consultas.
La exportación de resultados.
Fuentes de datos
Los archivos usados en este tutorial se encuentran almacenados en la carpeta actual
[1]:
filenames = [
"drivers.csv",
"timesheet.csv",
"truck_event_text_partition.csv",
]
url = "https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/drivers/"
for filename in filenames:
!wget --quiet {url + filename} -P /tmp/
!ls -1 /tmp/*.csv
/tmp/bonus.csv
/tmp/clients.csv
/tmp/clients_a.csv
/tmp/clients_b.csv
/tmp/data.csv
/tmp/data_1.csv
/tmp/data_2.csv
/tmp/dataset_1.csv
/tmp/dataset_2.csv
/tmp/drivers.csv
/tmp/lines.csv
/tmp/sales.csv
/tmp/specific-columns.csv
/tmp/timesheet.csv
/tmp/truck_event_text_partition.csv
Preparación
[2]:
##
## Apertura de la conexión
##
import sqlite3
conn = sqlite3.connect(":memory:") ## aca se indica el nombre de la db.
cur = conn.cursor()
Carga de los datos de los eventos de los conductores
La cabecera del archivo que se desea importar es la siguiente:
[25]:
!head /tmp/truck_event_text_partition.csv
driverId,truckId,eventTime,eventType,longitude,latitude,eventKey,CorrelationId,driverName,routeId,routeName,eventDate
14,25,59:21.4,Normal,-94.58,37.03,14|25|9223370572464814373,3.66E+18,Adis Cesir,160405074,Joplin to Kansas City Route 2,2016-05-27-22
18,16,59:21.7,Normal,-89.66,39.78,18|16|9223370572464814089,3.66E+18,Grant Liu,1565885487,Springfield to KC Via Hanibal,2016-05-27-22
27,105,59:21.7,Normal,-90.21,38.65,27|105|9223370572464814070,3.66E+18,Mark Lochbihler,1325562373,Springfield to KC Via Columbia Route 2,2016-05-27-22
11,74,59:21.7,Normal,-90.2,38.65,11|74|9223370572464814123,3.66E+18,Jamie Engesser,1567254452,Saint Louis to Memphis Route2,2016-05-27-22
22,87,59:21.7,Normal,-90.04,35.19,22|87|9223370572464814101,3.66E+18,Nadeem Asghar,1198242881, Saint Louis to Chicago Route2,2016-05-27-22
22,87,59:22.3,Normal,-90.37,35.21,22|87|9223370572464813486,3.66E+18,Nadeem Asghar,1198242881, Saint Louis to Chicago Route2,2016-05-27-22
23,68,59:22.4,Normal,-89.91,40.86,23|68|9223370572464813450,3.66E+18,Adam Diaz,160405074,Joplin to Kansas City Route 2,2016-05-27-22
11,74,59:22.5,Normal,-89.74,39.1,11|74|9223370572464813355,3.66E+18,Jamie Engesser,1567254452,Saint Louis to Memphis Route2,2016-05-27-22
20,41,59:22.5,Normal,-93.36,41.69,20|41|9223370572464813344,3.66E+18,Chris Harris,160779139,Des Moines to Chicago Route 2,2016-05-27-22
[26]:
#
# Nombres de las columnas
#
!head -n 1 /tmp/truck_event_text_partition.csv | tr ',' '\n'
driverId
truckId
eventTime
eventType
longitude
latitude
eventKey
CorrelationId
driverName
routeId
routeName
eventDate
En el siguiente código se crea crea la tabla de eventos de los conductores en el sistema; la primera instrucción borra la tabla si ya existe.
[5]:
#
# executescript() permite enviar varios comandos de SQL
# en la misma cadena de texto.
#
conn.executescript(
"""
DROP TABLE IF EXISTS truck_events;
CREATE TABLE truck_events (driverId INT,
truckId INT,
eventTime STRING,
eventType STRING,
longitude DOUBLE,
latitude DOUBLE,
eventKey STRING,
correlationId STRING,
driverName STRING,
routeId STRING,
routeName STRING,
eventDate STRING);
"""
)
conn.commit()
Se verifican las tablas existentes en la base de datos.
[6]:
#
# La función execute() permite enviar únicamente un comando SQL
# al motor de la base de datos. El siguiente comando es equivalente
# a `SHOW TABLES;` en SQL
#
cur.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
[6]:
[('truck_events',)]
A continuación se muestra la información detallada de creación de la tabla truck_events
.
[7]:
#
# La siguiente sentencia devuelve un string que
# representa los comandos sql para crear la tabla
#
cur.execute(
"SELECT sql FROM sqlite_master WHERE type='table' and name='truck_events';"
).fetchall()
[7]:
[('CREATE TABLE truck_events (driverId INT, \n truckId INT,\n eventTime STRING,\n eventType STRING, \n longitude DOUBLE, \n latitude DOUBLE,\n eventKey STRING, \n correlationId STRING, \n driverName STRING,\n routeId STRING,\n routeName STRING,\n eventDate STRING)',)]
[8]:
#
# Se imprime el comando sql equivalente para cada
# tabla existente en la base de datos
#
for a in cur.execute("SELECT sql FROM sqlite_master WHERE type='table';").fetchall():
print(a[0])
CREATE TABLE truck_events (driverId INT,
truckId INT,
eventTime STRING,
eventType STRING,
longitude DOUBLE,
latitude DOUBLE,
eventKey STRING,
correlationId STRING,
driverName STRING,
routeId STRING,
routeName STRING,
eventDate STRING)
Carga de datos usando INSERT INTO
Este método de carga consiste en crear una lista de tupas, donde cada elemento de la tupla es un campo de la tabla.
[9]:
# Lectura de todo el archivo
with open("/tmp/truck_event_text_partition.csv", "rt") as f:
data = f.readlines()
# Elimina el '\n' al final de la línea
data = [line.replace('\n', '') line for line in data]
# Separa los campos por comas
data = [line.split(",") for line in data]
# Convierte la fila en una tupla
data = [tuple(line) for line in data]
# Descarta la cabecera
data = data[1:]
# Imprime los primeros 3 registros para verificar
data[0:3]
[9]:
[('14',
'25',
'59:21.4',
'Normal',
'-94.58',
'37.03',
'14|25|9223370572464814373',
'3.66E+18',
'Adis Cesir',
'160405074',
'Joplin to Kansas City Route 2',
'2016-05-27-22'),
('18',
'16',
'59:21.7',
'Normal',
'-89.66',
'39.78',
'18|16|9223370572464814089',
'3.66E+18',
'Grant Liu',
'1565885487',
'Springfield to KC Via Hanibal',
'2016-05-27-22'),
('27',
'105',
'59:21.7',
'Normal',
'-90.21',
'38.65',
'27|105|9223370572464814070',
'3.66E+18',
'Mark Lochbihler',
'1325562373',
'Springfield to KC Via Columbia Route 2',
'2016-05-27-22')]
[10]:
#
# Carga a partir de la lista de tuplas
# contenidas en data
#
cur.executemany("INSERT INTO truck_events VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", data)
#
# Verificación
#
cur.execute("SELECT * FROM truck_events LIMIT 1;").fetchall()
[10]:
[(14,
25,
'59:21.4',
'Normal',
-94.58,
37.03,
'14|25|9223370572464814373',
3660000000000000000,
'Adis Cesir',
160405074,
'Joplin to Kansas City Route 2',
'2016-05-27-22')]
Consulta de datos
La consulta se realiza con SELECT
, usando la sintaxis convencional:
SELECT
what
FROM
tbl_name
[WHERE where-conditions]
[ORDER BY column_name];
[11]:
cur.execute("SELECT * FROM truck_events LIMIT 3;").fetchall()
[11]:
[(14,
25,
'59:21.4',
'Normal',
-94.58,
37.03,
'14|25|9223370572464814373',
3660000000000000000,
'Adis Cesir',
160405074,
'Joplin to Kansas City Route 2',
'2016-05-27-22'),
(18,
16,
'59:21.7',
'Normal',
-89.66,
39.78,
'18|16|9223370572464814089',
3660000000000000000,
'Grant Liu',
1565885487,
'Springfield to KC Via Hanibal',
'2016-05-27-22'),
(27,
105,
'59:21.7',
'Normal',
-90.21,
38.65,
'27|105|9223370572464814070',
3660000000000000000,
'Mark Lochbihler',
1325562373,
'Springfield to KC Via Columbia Route 2',
'2016-05-27-22')]
Obtención de un subconjunto de registros
En SQLite3 es posible un subconjunto de datos y almacenarlo en una nueva tabla a partir de una consulta que permita obtener los datos deseados. En el siguiente código, se crea la tabla truck_events_subset
con los primeros 100 registros de la tabla truck_events
.
[12]:
conn.executescript(
"""
DROP TABLE IF EXISTS truck_events_subset;
CREATE TABLE truck_events_subset
AS
SELECT *
FROM truck_events
LIMIT 100;
"""
)
conn.commit()
[13]:
cur.execute("SELECT * FROM truck_events_subset LIMIT 1;").fetchall()
[13]:
[(14,
25,
'59:21.4',
'Normal',
-94.58,
37.03,
'14|25|9223370572464814373',
3660000000000000000,
'Adis Cesir',
160405074,
'Joplin to Kansas City Route 2',
'2016-05-27-22')]
Obtención de un subconjunto de campos
En el siguiente código se obtienen algunas columnas de la tabla truck_events_subset
para ser almacenadas en una tabla diferente.
[14]:
conn.executescript(
"""
DROP TABLE IF EXISTS specific_columns;
CREATE TABLE specific_columns
AS
SELECT
driverId,
eventTime,
eventType
FROM
truck_events_subset;
SELECT * FROM specific_columns LIMIT 1;
"""
)
conn.commit()
La siguiente consulta obtiene los registros 2 al 5.
[15]:
cur.execute("SELECT * FROM specific_columns LIMIT 2,3;").fetchall()
[15]:
[(27, '59:21.7', 'Normal'),
(11, '59:21.7', 'Normal'),
(22, '59:21.7', 'Normal')]
[16]:
#
# Registros para el driverId == 11
#
cur.execute("SELECT * FROM specific_columns WHERE driverId = 11;").fetchall()
[16]:
[(11, '59:21.7', 'Normal'),
(11, '59:22.5', 'Normal'),
(11, '59:28.3', 'Normal'),
(11, '59:30.0', 'Normal'),
(11, '59:31.8', 'Normal')]
[17]:
#
# Tipos de eventos
#
cur.execute("SELECT DISTINCT eventType FROM specific_columns;").fetchall()
[17]:
[('Normal',)]
Escritura de la tabla en el disco
Seguidamente, se procede a escribir el contenido de la tabla en el directorio actual.
[18]:
#
# Obtención de los datos como una lista de tuplas
#
data = cur.execute("SELECT * FROM truck_events_subset;").fetchall()
#
# Conversión de los elementos de la tupla a strings
#
text = [[str(e) for e in row] for row in data]
#
# Concatenación de strings
#
text = [",".join(row) for row in text]
text = "\n".join(text)
#
# Escritura al disco
#
open("/tmp/data.csv", "wt").write(text)
[18]:
14688
[19]:
#
# Visualización de la cabecera dek archivo
#
!head /tmp/data.csv
14,25,59:21.4,Normal,-94.58,37.03,14|25|9223370572464814373,3660000000000000000,Adis Cesir,160405074,Joplin to Kansas City Route 2,2016-05-27-22
18,16,59:21.7,Normal,-89.66,39.78,18|16|9223370572464814089,3660000000000000000,Grant Liu,1565885487,Springfield to KC Via Hanibal,2016-05-27-22
27,105,59:21.7,Normal,-90.21,38.65,27|105|9223370572464814070,3660000000000000000,Mark Lochbihler,1325562373,Springfield to KC Via Columbia Route 2,2016-05-27-22
11,74,59:21.7,Normal,-90.2,38.65,11|74|9223370572464814123,3660000000000000000,Jamie Engesser,1567254452,Saint Louis to Memphis Route2,2016-05-27-22
22,87,59:21.7,Normal,-90.04,35.19,22|87|9223370572464814101,3660000000000000000,Nadeem Asghar,1198242881, Saint Louis to Chicago Route2,2016-05-27-22
22,87,59:22.3,Normal,-90.37,35.21,22|87|9223370572464813486,3660000000000000000,Nadeem Asghar,1198242881, Saint Louis to Chicago Route2,2016-05-27-22
23,68,59:22.4,Normal,-89.91,40.86,23|68|9223370572464813450,3660000000000000000,Adam Diaz,160405074,Joplin to Kansas City Route 2,2016-05-27-22
11,74,59:22.5,Normal,-89.74,39.1,11|74|9223370572464813355,3660000000000000000,Jamie Engesser,1567254452,Saint Louis to Memphis Route2,2016-05-27-22
20,41,59:22.5,Normal,-93.36,41.69,20|41|9223370572464813344,3660000000000000000,Chris Harris,160779139,Des Moines to Chicago Route 2,2016-05-27-22
32,42,59:22.5,Normal,-90.37,35.21,32|42|9223370572464813296,3660000000000000000,Ryan Templeton,1090292248,Peoria to Ceder Rapids Route 2,2016-05-27-22
[20]:
#
# Visualización de la parte final del archivo
#
!tail /tmp/data.csv
29,66,59:35.1,Normal,-95.42,35.87,29|66|9223370572464800713,3660000000000000000,Teddy Choi,803014426,Wichita to Little Rock Route 2,2016-05-27-22
15,51,59:35.1,Normal,-91.14,34.96,15|51|9223370572464800666,3660000000000000000,Rohit Bakshi,1384345811,Joplin to Kansas City,2016-05-27-22
12,104,59:35.3,Normal,-89.65,36.37,12|104|9223370572464800526,3660000000000000000,Paul Codding,24929475,Peoria to Ceder Rapids,2016-05-27-22
20,41,59:35.5,Normal,-88.96,42.25,20|41|9223370572464800335,3660000000000000000,Chris Harris,160779139,Des Moines to Chicago Route 2,2016-05-27-22
31,18,59:35.6,Normal,-94.46,37.16,31|18|9223370572464800225,3660000000000000000,Rommel Garcia,1594289134,Memphis to Little Rock Route 2,2016-05-27-22
27,105,59:35.6,Normal,-92.85,38.93,27|105|9223370572464800175,3660000000000000000,Mark Lochbihler,1325562373,Springfield to KC Via Columbia Route 2,2016-05-27-22
14,25,59:35.8,Normal,-94.46,37.16,14|25|9223370572464800006,3660000000000000000,Adis Cesir,160405074,Joplin to Kansas City Route 2,2016-05-27-22
26,57,59:35.9,Normal,-92.74,37.6,26|57|9223370572464799895,3660000000000000000,Michael Aube,1325712174,Saint Louis to Tulsa Route2,2016-05-27-22
18,16,59:36.3,Normal,-92.42,39.76,18|16|9223370572464799486,3660000000000000000,Grant Liu,1565885487,Springfield to KC Via Hanibal,2016-05-27-22
31,18,59:36.3,Normal,-94.58,37.03,31|18|9223370572464799462,3660000000000000000,Rommel Garcia,1594289134,Memphis to Little Rock Route 2,2016-05-27-22
Apéndice — Manejo de nulos
[21]:
conn.executescript(
"""
DROP TABLE IF EXISTS nulltable;
CREATE TABLE nulltable (
col1 INT,
col2 VARCHAR(1)
);
INSERT INTO nulltable VALUES
(1,"A"),
(2,NULL),
(NULL,"C"),
(NULL,NULL),
(5,"E");"""
)
conn.commit()
cur.execute("SELECT * FROM nulltable;").fetchall()
[21]:
[(1, 'A'), (2, None), (None, 'C'), (None, None), (5, 'E')]
[22]:
cur.execute("SELECT * FROM nulltable WHERE col1 IS NULL;").fetchall()
[22]:
[(None, 'C'), (None, None)]
[23]:
cur.execute("SELECT * FROM nulltable WHERE col2 IS NULL;").fetchall()
[23]:
[(2, None), (None, None)]
[24]:
#
# condicionales
#
cur.execute(
"SELECT IFNULL(col1,'Unknown'), IFNULL(col2,'Unknown') FROM nulltable;"
).fetchall()
[24]:
[(1, 'A'), (2, 'Unknown'), ('Unknown', 'C'), ('Unknown', 'Unknown'), (5, 'E')]