{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Consultas en bases de datos\n",
    "\n",
    "* *30 min* | Última modificación: Diciembre 10, 2020"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "http://www.nltk.org/book/\n",
    "\n",
    "Text Analytics with Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Considere la siguiente tabla en una base de datos:\n",
    "\n",
    "    City        Country  Population\n",
    "    -------------------------------\n",
    "    athens      greece         1368\n",
    "    bangkok     thailand       1178\n",
    "    barcelona   spain          1280\n",
    "    berlin      germany        3481\n",
    "    birmingham  united_kindom  1112\n",
    "    \n",
    "¿Cómo puede procesarse la siguiente pregunta en SQL?\n",
    "\n",
    "    Which country is athens in?\n",
    "    \n",
    "RTA/\n",
    "\n",
    "    SELECT Country FROM city_table WHERE City = 'athens'\n",
    "\n",
    "\n",
    "Explique la siguiente gramática:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting sql0.fcfg\n"
     ]
    }
   ],
   "source": [
    "%%writefile sql0.fcfg\n",
    "% start S\n",
    "\n",
    "S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]\n",
    "\n",
    "VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]\n",
    "\n",
    "VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]\n",
    "\n",
    "NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]\n",
    "\n",
    "PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]\n",
    "\n",
    "AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]\n",
    "\n",
    "NP[SEM='Country=\"greece\"'] -> 'Greece'\n",
    "\n",
    "NP[SEM='Country=\"china\"'] -> 'China'\n",
    "\n",
    "Det[SEM='SELECT'] -> 'Which' | 'What'\n",
    "\n",
    "N[SEM='City FROM city_table'] -> 'cities'\n",
    "\n",
    "IV[SEM=''] -> 'are'\n",
    "\n",
    "A[SEM=''] -> 'located'\n",
    "\n",
    "P[SEM=''] -> 'in'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT City FROM city_table WHERE Country=\"china\"\n"
     ]
    }
   ],
   "source": [
    "from nltk import load_parser\n",
    "\n",
    "cp = load_parser('sql0.fcfg')\n",
    "query = 'What cities are located in China'\n",
    "trees = list(cp.parse(query.split()))\n",
    "answer = trees[0].label()['SEM']\n",
    "answer = [s for s in answer if s]\n",
    "q = ' '.join(answer)\n",
    "print(q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Ejercicio.---** Extienda la gramatica para que el sistema pueda interpretar la siguiente pregunta:\n",
    "\n",
    "```\n",
    "What cities are in China and have populations above 1.000.000?\n",
    "\n",
    "```\n",
    "\n",
    "y genere como respuesta la siguiente clausula de SQL:\n",
    "\n",
    "```\n",
    "SELECT City FROM city_table WHERE Country = 'china' AND Population > 1000\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      " \n",
      "from nltk import load_parser\n",
      "cp = load_parser('grammars/book_grammars/sql0.fcfg')\n",
      "query = 'What cities are located in China'\n",
      "trees = list(cp.parse(query.split()))\n",
      "answer = trees[0].label()['SEM']\n",
      "answer = [s for s in answer if s]\n",
      "q = ' '.join(answer)\n",
      "print(q)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "text = '''\n",
    "\n",
    " \t\n",
    ">>> from nltk import load_parser\n",
    ">>> cp = load_parser('grammars/book_grammars/sql0.fcfg')\n",
    ">>> query = 'What cities are located in China'\n",
    ">>> trees = list(cp.parse(query.split()))\n",
    ">>> answer = trees[0].label()['SEM']\n",
    ">>> answer = [s for s in answer if s]\n",
    ">>> q = ' '.join(answer)\n",
    ">>> print(q)\n",
    "'''\n",
    "\n",
    "\n",
    "text = text.replace(\">>> \", \"\").replace(\"... \", \"\").replace(\"...\", \"\").replace(\"\\t\", \"\")\n",
    "print(text)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}