{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Consultas en bases de datos\n", "\n", "* *30 min* | Última modificación: Diciembre 10, 2020" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://www.nltk.org/book/\n", "\n", "Text Analytics with Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Considere la siguiente tabla en una base de datos:\n", "\n", " City Country Population\n", " -------------------------------\n", " athens greece 1368\n", " bangkok thailand 1178\n", " barcelona spain 1280\n", " berlin germany 3481\n", " birmingham united_kindom 1112\n", " \n", "¿Cómo puede procesarse la siguiente pregunta en SQL?\n", "\n", " Which country is athens in?\n", " \n", "RTA/\n", "\n", " SELECT Country FROM city_table WHERE City = 'athens'\n", "\n", "\n", "Explique la siguiente gramática:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting sql0.fcfg\n" ] } ], "source": [ "%%writefile sql0.fcfg\n", "% start S\n", "\n", "S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]\n", "\n", "VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]\n", "\n", "VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]\n", "\n", "NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]\n", "\n", "PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]\n", "\n", "AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]\n", "\n", "NP[SEM='Country=\"greece\"'] -> 'Greece'\n", "\n", "NP[SEM='Country=\"china\"'] -> 'China'\n", "\n", "Det[SEM='SELECT'] -> 'Which' | 'What'\n", "\n", "N[SEM='City FROM city_table'] -> 'cities'\n", "\n", "IV[SEM=''] -> 'are'\n", "\n", "A[SEM=''] -> 'located'\n", "\n", "P[SEM=''] -> 'in'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SELECT City FROM city_table WHERE Country=\"china\"\n" ] } ], "source": [ "from nltk import load_parser\n", "\n", "cp = load_parser('sql0.fcfg')\n", "query = 'What cities are located in China'\n", "trees = list(cp.parse(query.split()))\n", "answer = trees[0].label()['SEM']\n", "answer = [s for s in answer if s]\n", "q = ' '.join(answer)\n", "print(q)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Ejercicio.---** Extienda la gramatica para que el sistema pueda interpretar la siguiente pregunta:\n", "\n", "```\n", "What cities are in China and have populations above 1.000.000?\n", "\n", "```\n", "\n", "y genere como respuesta la siguiente clausula de SQL:\n", "\n", "```\n", "SELECT City FROM city_table WHERE Country = 'china' AND Population > 1000\n", "```\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", " \n", "from nltk import load_parser\n", "cp = load_parser('grammars/book_grammars/sql0.fcfg')\n", "query = 'What cities are located in China'\n", "trees = list(cp.parse(query.split()))\n", "answer = trees[0].label()['SEM']\n", "answer = [s for s in answer if s]\n", "q = ' '.join(answer)\n", "print(q)\n", "\n" ] } ], "source": [ "text = '''\n", "\n", " \t\n", ">>> from nltk import load_parser\n", ">>> cp = load_parser('grammars/book_grammars/sql0.fcfg')\n", ">>> query = 'What cities are located in China'\n", ">>> trees = list(cp.parse(query.split()))\n", ">>> answer = trees[0].label()['SEM']\n", ">>> answer = [s for s in answer if s]\n", ">>> q = ' '.join(answer)\n", ">>> print(q)\n", "'''\n", "\n", "\n", "text = text.replace(\">>> \", \"\").replace(\"... \", \"\").replace(\"...\", \"\").replace(\"\\t\", \"\")\n", "print(text)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }