{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feature-based grammars\n",
    "\n",
    "* *30 min* | Last modified: December 10, 2020"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "http://www.nltk.org/book/\n",
    "\n",
    "Text Analytics with Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "##\n",
    "## Using dictionaries to define the features\n",
    "## of grammatical entities\n",
    "##\n",
    "##    CAT: grammatical category\n",
    "##    ORTH: orthography (spelling)\n",
    "##    REF: referent\n",
    "##    REL: relation\n",
    "##\n",
    "\n",
    "kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}\n",
    "\n",
    "chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Grammatical agreement"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Conjugation forms of a verb**\n",
    "\n",
    "```\n",
    "          singular         plural\n",
    "-------------------------------------\n",
    "1st per   I run            we run\n",
    "2nd per   you run          you run\n",
    "3rd per   he/she/it runs   they run\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Direct approach.** Write separate rules for singular and plural. This does not scale to grammars with many rules, because every production must be duplicated for each feature value.\n",
    "\n",
    "```\n",
    "S -> NP_SG VP_SG\n",
    "S -> NP_PL VP_PL\n",
    "\n",
    "NP_SG -> Det_SG N_SG\n",
    "NP_PL -> Det_PL N_PL\n",
    "\n",
    "VP_SG -> V_SG\n",
    "VP_PL -> V_PL\n",
    "\n",
    "Det_SG -> 'this'\n",
    "Det_PL -> 'these'\n",
    "\n",
    "N_SG -> 'dog'\n",
    "N_PL -> 'dogs'\n",
    "\n",
    "V_SG -> 'runs'\n",
    "V_PL -> 'run'\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Adding properties (features) to grammatical categories.**\n",
    "\n",
    "Here `sg` stands for singular and `pl` for plural.\n",
    "\n",
    "```\n",
    "Det[NUM=sg] -> 'this'\n",
    "Det[NUM=pl] -> 'these'\n",
    "\n",
    "N[NUM=sg] -> 'dog'\n",
    "N[NUM=pl] -> 'dogs'\n",
    "\n",
    "V[NUM=sg] -> 'runs'\n",
    "V[NUM=pl] -> 'run'\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is more convenient to allow variables as feature values. `?n` stands for any value of `NUM`, and every occurrence of `?n` within a production must take the same value.\n",
    "\n",
    "```\n",
    "S -> NP[NUM=?n] VP[NUM=?n]\n",
    "\n",
    "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n",
    "\n",
    "VP[NUM=?n] -> V[NUM=?n]\n",
    "\n",
    "```"
   ]
  },
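  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of the shared variable `?n` can be illustrated with a small plain-Python sketch. This is not how NLTK implements feature grammars: the `LEXICON` table and the `accepts` function are hypothetical names introduced only for this illustration, and the check is hard-wired to sentences of the form `Det N V`.\n",
    "\n",
    "```python\n",
    "# Hypothetical toy lexicon: word -> (category, NUM value).\n",
    "LEXICON = {\n",
    "    'this': ('Det', 'sg'), 'these': ('Det', 'pl'),\n",
    "    'dog': ('N', 'sg'), 'dogs': ('N', 'pl'),\n",
    "    'runs': ('V', 'sg'), 'run': ('V', 'pl'),\n",
    "}\n",
    "\n",
    "\n",
    "def accepts(sentence):\n",
    "    # True if a 'Det N V' sentence satisfies S -> NP[NUM=?n] VP[NUM=?n].\n",
    "    words = sentence.split()\n",
    "    if len(words) != 3 or any(w not in LEXICON for w in words):\n",
    "        return False\n",
    "    (c1, n1), (c2, n2), (c3, n3) = (LEXICON[w] for w in words)\n",
    "    # The categories must be Det, N, V, and the single variable ?n\n",
    "    # must take the same value in all three positions.\n",
    "    return (c1, c2, c3) == ('Det', 'N', 'V') and n1 == n2 == n3\n",
    "\n",
    "\n",
    "print(accepts('this dog runs'))   # True\n",
    "print(accepts('these dogs run'))  # True\n",
    "print(accepts('this dogs run'))   # False\n",
    "```\n",
    "\n",
    "Because one variable links the three positions, a single production covers what the grammar with separate `_SG` and `_PL` categories needed two productions for."
   ]
  },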
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting feat0.fcfg\n"
     ]
    }
   ],
   "source": [
    "%%writefile feat0.fcfg\n",
    "% start S\n",
    "# ###################\n",
    "# Grammar Productions\n",
    "# ###################\n",
    "# S expansion productions\n",
    "S -> NP[NUM=?n] VP[NUM=?n]\n",
    "# NP expansion productions\n",
    "NP[NUM=?n] -> N[NUM=?n]\n",
    "NP[NUM=?n] -> PropN[NUM=?n]\n",
    "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n",
    "NP[NUM=pl] -> N[NUM=pl]\n",
    "# VP expansion productions\n",
    "VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]\n",
    "VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP\n",
    "\n",
    "# ###################\n",
    "# Lexical Productions\n",
    "# ###################\n",
    "Det[NUM=sg] -> 'this' | 'every'\n",
    "Det[NUM=pl] -> 'these' | 'all'\n",
    "Det -> 'the' | 'some' | 'several'\n",
    "PropN[NUM=sg]-> 'Kim' | 'Jody'\n",
    "N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'\n",
    "N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'\n",
    "\n",
    "IV[TENSE=pres,  NUM=sg] -> 'disappears' | 'walks'\n",
    "IV[TENSE=pres,  NUM=pl] -> 'disappear' | 'walk'\n",
    "IV[TENSE=past] -> 'disappeared' | 'walked'\n",
    "\n",
    "TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'\n",
    "TV[TENSE=pres, NUM=pl] -> 'see' | 'like'\n",
    "TV[TENSE=past] -> 'saw' | 'liked'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S[]\n",
      "  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))\n",
      "  (VP[NUM='sg', TENSE='pres']\n",
      "    (TV[NUM='sg', TENSE='pres'] likes)\n",
      "    (NP[NUM='pl'] (N[NUM='pl'] children))))\n"
     ]
    }
   ],
   "source": [
    "from nltk import load_parser\n",
    "\n",
    "## create the parser\n",
    "parser = load_parser('feat0.fcfg', trace=0)\n",
    "\n",
    "## sentence to parse\n",
    "tokens = 'Kim likes children'.split()\n",
    "\n",
    "## print every parse tree\n",
    "for tree in parser.parse(tokens):\n",
    "    print(tree)\n",
    "\n",
    "# TV: transitive verbs\n",
    "# IV: intransitive verbs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Terminology"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `+/-` notation represents `true/false`. In the following example, `AUX` indicates whether the verb is used as an auxiliary.\n",
    "\n",
    "```\n",
    "V[TENSE=pres, AUX=+] -> 'can'\n",
    "V[TENSE=pres, AUX=+] -> 'may'\n",
    "\n",
    "V[TENSE=pres, AUX=-] -> 'walks'\n",
    "V[TENSE=pres, AUX=-] -> 'likes'\n",
    "```\n",
    "\n",
    "By convention, `AUX=+` is usually abbreviated to `+AUX`, and `AUX=-` to `-AUX`:\n",
    "\n",
    "```\n",
    "V[TENSE=pres, +AUX] -> 'can'\n",
    "V[TENSE=pres, +AUX] -> 'may'\n",
    "\n",
    "V[TENSE=pres, -AUX] -> 'walks'\n",
    "V[TENSE=pres, -AUX] -> 'likes'\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Feature agreement. Here the value of `AGR` is itself a feature structure that groups `NUM` and `PER`:\n",
    "\n",
    "```\n",
    "S                                     -> NP[AGR=?n]  VP[AGR=?n]\n",
    "\n",
    "NP[AGR=?n]                            -> PropN[AGR=?n]\n",
    "\n",
    "VP[TENSE=?t, AGR=?n]                  -> Cop[TENSE=?t, AGR=?n] Adj\n",
    "\n",
    "Cop[TENSE=pres,  AGR=[NUM=sg, PER=3]] -> 'is'\n",
    "\n",
    "PropN[AGR=[NUM=sg, PER=3]]            -> 'Kim'\n",
    "Adj                                   -> 'happy'\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ NUM   = 'sg'   ]\n",
      "[ TENSE = 'past' ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Manipulating features in NLTK\n",
    "##\n",
    "import nltk\n",
    "\n",
    "## atomic features (strings or integers)\n",
    "fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')\n",
    "print(fs1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fem\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Feature structures behave like dictionaries\n",
    "##\n",
    "fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')\n",
    "\n",
    "## look up the value associated with a key\n",
    "print(fs1['GND'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ CASE = 'acc' ]\n",
      "[ GND  = 'fem' ]\n",
      "[ NUM  = 'pl'  ]\n",
      "[ PER  = 3     ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Assigning a new feature\n",
    "##\n",
    "fs1['CASE'] = 'acc'\n",
    "print(fs1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[       [ CASE = 'acc' ] ]\n",
      "[ AGR = [ GND  = 'fem' ] ]\n",
      "[       [ NUM  = 'pl'  ] ]\n",
      "[       [ PER  = 3     ] ]\n",
      "[                        ]\n",
      "[ POS = 'N'              ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Structures with complex values\n",
    "##\n",
    "fs2 = nltk.FeatStruct(POS='N', AGR=fs1)\n",
    "print(fs2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ CASE = 'acc' ]\n",
      "[ GND  = 'fem' ]\n",
      "[ NUM  = 'pl'  ]\n",
      "[ PER  = 3     ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Retrieving a complex value by key\n",
    "##\n",
    "print(fs2['AGR'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Retrieving a value from a nested structure\n",
    "##\n",
    "print(fs2['AGR']['PER'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[       [ GND = 'fem' ] ]\n",
      "[ AGR = [ NUM = 'pl'  ] ]\n",
      "[       [ PER = 3     ] ]\n",
      "[                       ]\n",
      "[ POS = 'N'             ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Creating a complex structure with bracket notation\n",
    "##\n",
    "print(nltk.FeatStruct(\"[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ AGE   = 33               ]\n",
      "[ NAME  = 'Lee'            ]\n",
      "[ TELNO = '01 27 86 42 96' ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Feature structures are general-purpose: they can store any value\n",
    "##\n",
    "print(nltk.FeatStruct(NAME='Lee', TELNO='01 27 86 42 96', AGE=33))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ ADDRESS = (1) [ NUMBER = 74           ] ]\n",
      "[               [ STREET = 'rue Pascal' ] ]\n",
      "[                                         ]\n",
      "[ NAME    = 'Lee'                         ]\n",
      "[                                         ]\n",
      "[ SPOUSE  = [ ADDRESS -> (1)  ]           ]\n",
      "[           [ NAME    = 'Kim' ]           ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Using references to values that were already assigned.\n",
    "## Note the use of ->(1)\n",
    "##\n",
    "print(\n",
    "    nltk.FeatStruct(\n",
    "        \"\"\"\n",
    "        [\n",
    "            NAME='Lee', \n",
    "            ADDRESS=(1)[\n",
    "                 NUMBER=74, \n",
    "                 STREET='rue Pascal'\n",
    "            ],\n",
    "            SPOUSE=[\n",
    "                 NAME='Kim', \n",
    "                 ADDRESS->(1)\n",
    "            ]\n",
    "        ]\n",
    "        \"\"\"\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ A = 'a'             ]\n",
      "[                     ]\n",
      "[ B = (1) [ C = 'c' ] ]\n",
      "[                     ]\n",
      "[ D -> (1)            ]\n",
      "[ E -> (1)            ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Several features can refer to the same tagged value\n",
    "##\n",
    "print(\n",
    "    nltk.FeatStruct(\n",
    "        \"\"\"\n",
    "        [\n",
    "            A='a', \n",
    "            B=(1)[C='c'], \n",
    "            D->(1), \n",
    "            E->(1)\n",
    "        ]\n",
    "        \"\"\"\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ CITY   = 'Paris'      ]\n",
      "[ NUMBER = 74           ]\n",
      "[ STREET = 'rue Pascal' ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Feature unification (behaves like a set union when nothing conflicts)\n",
    "##\n",
    "fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')\n",
    "fs2 = nltk.FeatStruct(CITY='Paris')\n",
    "print(fs1.unify(fs2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Unifying features with the same name but conflicting values fails (None)\n",
    "##\n",
    "fs0 = nltk.FeatStruct(A='a')\n",
    "fs1 = nltk.FeatStruct(A='b')\n",
    "fs2 = fs0.unify(fs1)\n",
    "print(fs2)"
   ]
  },
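  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough mental model, unification of structures without variables or shared references can be sketched as a recursive merge of nested dictionaries. The `unify` function below is a hypothetical plain-Python helper, not NLTK's `FeatStruct.unify`; it only mirrors the two behaviors shown above, merging compatible information and returning `None` on a conflict.\n",
    "\n",
    "```python\n",
    "def unify(a, b):\n",
    "    # Unify two feature structures stored as nested dicts.\n",
    "    # Returns a new dict, or None when the structures conflict.\n",
    "    # Simplified sketch: no variables and no reentrancy.\n",
    "    result = dict(a)\n",
    "    for key, b_val in b.items():\n",
    "        if key not in result:\n",
    "            # Information present only in b is simply added.\n",
    "            result[key] = b_val\n",
    "            continue\n",
    "        a_val = result[key]\n",
    "        if isinstance(a_val, dict) and isinstance(b_val, dict):\n",
    "            # Complex values are unified recursively.\n",
    "            merged = unify(a_val, b_val)\n",
    "            if merged is None:\n",
    "                return None\n",
    "            result[key] = merged\n",
    "        elif a_val != b_val:\n",
    "            # Two different atomic values cannot be unified.\n",
    "            return None\n",
    "    return result\n",
    "\n",
    "\n",
    "print(unify({'NUMBER': 74, 'STREET': 'rue Pascal'}, {'CITY': 'Paris'}))\n",
    "print(unify({'A': 'a'}, {'A': 'b'}))  # None\n",
    "```\n",
    "\n",
    "The sketch never modifies its arguments, so a failed unification leaves both inputs untouched."
   ]
  },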
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ ADDRESS = [ NUMBER = 74           ]               ]\n",
      "[           [ STREET = 'rue Pascal' ]               ]\n",
      "[                                                   ]\n",
      "[ NAME    = 'Lee'                                   ]\n",
      "[                                                   ]\n",
      "[           [ ADDRESS = [ NUMBER = 74           ] ] ]\n",
      "[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]\n",
      "[           [                                     ] ]\n",
      "[           [ NAME    = 'Kim'                     ] ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Unification and structure sharing.\n",
    "## Here the two addresses are equal but stored as separate copies.\n",
    "##\n",
    "fs0 = nltk.FeatStruct(\n",
    "    \"\"\"\n",
    "        [\n",
    "            NAME=Lee,\n",
    "            ADDRESS=[\n",
    "                NUMBER=74,\n",
    "                STREET='rue Pascal'\n",
    "            ],\n",
    "            SPOUSE=[\n",
    "                NAME=Kim,\n",
    "                ADDRESS=[\n",
    "                    NUMBER=74,\n",
    "                    STREET='rue Pascal'\n",
    "                ]\n",
    "            ]\n",
    "        ]\n",
    "    \"\"\"\n",
    ")\n",
    "print(fs0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ ADDRESS = [ NUMBER = 74           ]               ]\n",
      "[           [ STREET = 'rue Pascal' ]               ]\n",
      "[                                                   ]\n",
      "[ NAME    = 'Lee'                                   ]\n",
      "[                                                   ]\n",
      "[           [           [ CITY   = 'Paris'      ] ] ]\n",
      "[           [ ADDRESS = [ NUMBER = 74           ] ] ]\n",
      "[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]\n",
      "[           [                                     ] ]\n",
      "[           [ NAME    = 'Kim'                     ] ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Extending the structure with new data\n",
    "##\n",
    "fs1 = nltk.FeatStruct(\n",
    "    \"\"\"\n",
    "        [\n",
    "            SPOUSE = [\n",
    "                ADDRESS = [CITY = Paris]\n",
    "            ]\n",
    "        ]\n",
    "    \"\"\"\n",
    ")\n",
    "\n",
    "##\n",
    "## Note that unification adds CITY only to the spouse's ADDRESS\n",
    "##\n",
    "print(fs1.unify(fs0))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[               [ CITY   = 'Paris'      ] ]\n",
      "[ ADDRESS = (1) [ NUMBER = 74           ] ]\n",
      "[               [ STREET = 'rue Pascal' ] ]\n",
      "[                                         ]\n",
      "[ NAME    = 'Lee'                         ]\n",
      "[                                         ]\n",
      "[ SPOUSE  = [ ADDRESS -> (1)  ]           ]\n",
      "[           [ NAME    = 'Kim' ]           ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Different behavior when the address is shared\n",
    "##\n",
    "fs2 = nltk.FeatStruct(\n",
    "    \"\"\"\n",
    "        [\n",
    "            NAME = Lee,\n",
    "            ADDRESS = (1)[NUMBER=74, STREET='rue Pascal'],\n",
    "            SPOUSE=[NAME=Kim, ADDRESS->(1)]\n",
    "        ]\n",
    "    \"\"\"\n",
    ")\n",
    "\n",
    "##\n",
    "## Note that CITY = 'Paris' is also added to the main address,\n",
    "## because both paths point to the same structure\n",
    "##\n",
    "print(fs1.unify(fs2))"
   ]
  },
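  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference between the last two results comes down to identity versus equality. A plain-Python analogy, using ordinary dictionaries rather than `FeatStruct` objects (the variable names `shared` and `copied` are made up for this sketch):\n",
    "\n",
    "```python\n",
    "address = {'NUMBER': 74, 'STREET': 'rue Pascal'}\n",
    "\n",
    "# Reentrant: both paths point to the same dict object, like ->(1).\n",
    "shared = {'ADDRESS': address, 'SPOUSE': {'ADDRESS': address}}\n",
    "\n",
    "# Non-reentrant: equal values, but two independent copies.\n",
    "copied = {'ADDRESS': dict(address), 'SPOUSE': {'ADDRESS': dict(address)}}\n",
    "\n",
    "# Add CITY through the spouse's path in both structures.\n",
    "for person in (shared, copied):\n",
    "    person['SPOUSE']['ADDRESS']['CITY'] = 'Paris'\n",
    "\n",
    "print('CITY' in shared['ADDRESS'])  # True\n",
    "print('CITY' in copied['ADDRESS'])  # False\n",
    "```\n",
    "\n",
    "With a shared value, information added through one path is visible through every other path that reaches the same object."
   ]
  },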
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ ADDRESS1 = ?x ]\n",
      "[ ADDRESS2 = ?x ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Using variables\n",
    "##\n",
    "fs1 = nltk.FeatStruct(\"[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]\")\n",
    "fs2 = nltk.FeatStruct(\"[ADDRESS1=?x, ADDRESS2=?x]\")\n",
    "print(fs2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ ADDRESS1 = (1) [ NUMBER = 74           ] ]\n",
      "[                [ STREET = 'rue Pascal' ] ]\n",
      "[                                          ]\n",
      "[ ADDRESS2 -> (1)                          ]\n"
     ]
    }
   ],
   "source": [
    "##\n",
    "## Binding the variables from\n",
    "## the data in fs1\n",
    "##\n",
    "print(fs2.unify(fs1))"
   ]
  },
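  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `fs2.unify(fs1)` call above binds `?x` once and reuses that binding everywhere the variable occurs. A minimal plain-Python sketch of that idea follows; `resolve` and `bindings` are hypothetical names for this illustration and do not correspond to NLTK internals.\n",
    "\n",
    "```python\n",
    "def resolve(structure, bindings):\n",
    "    # Replace variables such as '?x' with their bound values.\n",
    "    resolved = {}\n",
    "    for key, value in structure.items():\n",
    "        if isinstance(value, str) and value.startswith('?'):\n",
    "            # Unbound variables are left in place.\n",
    "            resolved[key] = bindings.get(value, value)\n",
    "        else:\n",
    "            resolved[key] = value\n",
    "    return resolved\n",
    "\n",
    "\n",
    "pattern = {'ADDRESS1': '?x', 'ADDRESS2': '?x'}\n",
    "data = {'ADDRESS1': {'NUMBER': 74, 'STREET': 'rue Pascal'}}\n",
    "\n",
    "# Matching ADDRESS1 against the data binds ?x to the address.\n",
    "bindings = {pattern['ADDRESS1']: data['ADDRESS1']}\n",
    "result = resolve(pattern, bindings)\n",
    "\n",
    "# Both features now refer to the very same object.\n",
    "print(result['ADDRESS2'] is result['ADDRESS1'])  # True\n",
    "```\n",
    "\n",
    "This is why the printed result shows `ADDRESS2 -> (1)`: the second feature is a reference to the first value, not a copy of it."
   ]
  },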
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Subcategorization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Original grammar.**\n",
    "\n",
    "```\n",
    "S -> NP[NUM=?n] VP[NUM=?n]\n",
    "\n",
    "NP[NUM=?n] -> N[NUM=?n]\n",
    "NP[NUM=?n] -> PropN[NUM=?n]\n",
    "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n",
    "NP[NUM=pl] -> N[NUM=pl]\n",
    "\n",
    "##\n",
    "## This definition changes: the verb classes can be\n",
    "## handled through features\n",
    "##\n",
    "VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]\n",
    "VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP\n",
    "\n",
    "\n",
    "# unchanged from here on\n",
    "Det[NUM=sg] -> 'this' | 'every'\n",
    "Det[NUM=pl] -> 'these' | 'all'\n",
    "Det -> 'the' | 'some' | 'several'\n",
    "\n",
    "PropN[NUM=sg]-> 'Kim' | 'Jody'\n",
    "\n",
    "N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'\n",
    "N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'\n",
    "\n",
    "##\n",
    "## This part becomes unnecessary\n",
    "##\n",
    "IV[TENSE=pres,  NUM=sg] -> 'disappears' | 'walks'\n",
    "IV[TENSE=pres,  NUM=pl] -> 'disappear' | 'walk'\n",
    "IV[TENSE=past] -> 'disappeared' | 'walked'\n",
    "\n",
    "TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'\n",
    "TV[TENSE=pres, NUM=pl] -> 'see' | 'like'\n",
    "TV[TENSE=past] -> 'saw' | 'liked'\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Modified grammar.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting feat1_exam.fcfg\n"
     ]
    }
   ],
   "source": [
    "%%writefile feat1_exam.fcfg\n",
    "%start S\n",
    "\n",
    "S -> NP[NUM=?n] VP[NUM=?n]\n",
    "\n",
    "NP[NUM=?n] -> N[NUM=?n]\n",
    "NP[NUM=?n] -> PropN[NUM=?n]\n",
    "NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]\n",
    "NP[NUM=pl] -> N[NUM=pl]\n",
    "\n",
    "##\n",
    "## New definition:\n",
    "##  adds SUBCAT = {intrans, trans, clause}\n",
    "##\n",
    "VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]\n",
    "VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP\n",
    "VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar\n",
    "\n",
    "V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks' | 'puts'\n",
    "V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'\n",
    "V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says' | 'claims'\n",
    "\n",
    "V[SUBCAT=intrans, TENSE=pres, NUM=pl] -> 'disappear' | 'walk' | 'put'\n",
    "V[SUBCAT=trans, TENSE=pres, NUM=pl] -> 'see' | 'like'\n",
    "V[SUBCAT=clause, TENSE=pres, NUM=pl] -> 'say' | 'claim'\n",
    "\n",
    "V[SUBCAT=intrans, TENSE=past, NUM=?n] -> 'disappeared' | 'walked' | 'put'\n",
    "V[SUBCAT=trans, TENSE=past, NUM=?n] -> 'saw' | 'liked' | 'put'\n",
    "V[SUBCAT=clause, TENSE=past, NUM=?n] -> 'said' | 'claimed' | 'put'\n",
    "\n",
    "# unchanged from here on\n",
    "Det[NUM=sg] -> 'this' | 'every'\n",
    "Det[NUM=pl] -> 'these' | 'all'\n",
    "Det -> 'the' | 'some' | 'several'\n",
    "\n",
    "PropN[NUM=sg]-> 'Kim' | 'Jody'\n",
    "\n",
    "N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child' | 'table' | 'book'\n",
    "N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children' | 'tables' | 'books'\n",
    "\n",
    "\n",
    "##\n",
    "## Adds two productions\n",
    "##\n",
    "SBar -> Comp S\n",
    "Comp -> 'that'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S[]\n",
      "  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))\n",
      "  (VP[NUM='sg', TENSE='pres']\n",
      "    (V[NUM='sg', SUBCAT='clause', TENSE='pres'] claims)\n",
      "    (SBar[]\n",
      "      (Comp[] that)\n",
      "      (S[]\n",
      "        (NP[NUM='sg'] (PropN[NUM='sg'] Jody))\n",
      "        (VP[NUM='sg', TENSE='pres']\n",
      "          (V[NUM='sg', SUBCAT='trans', TENSE='pres'] likes)\n",
      "          (NP[NUM='pl'] (N[NUM='pl'] children)))))))\n"
     ]
    }
   ],
   "source": [
    "from nltk import load_parser\n",
    "\n",
    "## create the parser\n",
    "parser = load_parser('feat1_exam.fcfg', trace=0)\n",
    "\n",
    "## sentence to parse\n",
    "tokens = 'Kim claims that Jody likes children'.split()\n",
    "\n",
    "## print every parse tree\n",
    "for tree in parser.parse(tokens):\n",
    "    print(tree)"
   ]
  },
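  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The role of `SUBCAT` can be summarized as a lookup from the verb's subcategory to the complements it requires. The following plain-Python sketch mirrors the three `VP` productions of the modified grammar; the `COMPLEMENTS` and `VERB_SUBCAT` tables and the `vp_is_valid` function are hypothetical names used only here, and the lexicon covers just three of the verbs.\n",
    "\n",
    "```python\n",
    "# Complements required by each SUBCAT value, as in the VP productions.\n",
    "COMPLEMENTS = {'intrans': [], 'trans': ['NP'], 'clause': ['SBar']}\n",
    "\n",
    "# Hypothetical mini-lexicon: verb -> SUBCAT value.\n",
    "VERB_SUBCAT = {'walks': 'intrans', 'likes': 'trans', 'claims': 'clause'}\n",
    "\n",
    "\n",
    "def vp_is_valid(verb, complements):\n",
    "    # A VP is well formed when the verb is followed by exactly\n",
    "    # the complements that its SUBCAT value calls for.\n",
    "    return COMPLEMENTS[VERB_SUBCAT[verb]] == list(complements)\n",
    "\n",
    "\n",
    "print(vp_is_valid('likes', ['NP']))     # True\n",
    "print(vp_is_valid('walks', ['NP']))     # False\n",
    "print(vp_is_valid('claims', ['SBar']))  # True\n",
    "```\n",
    "\n",
    "Keeping the verb class in a feature means that a new class only needs one more `VP` production and the matching lexical entries, instead of a new category such as `IV` or `TV`."
   ]
  },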
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "##\n",
    "## ------------------------------- Up to here -------------------------------\n",
    "##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "##\n",
    "## Generalized Phrase Structure Grammar (GPSG)\n",
    "##"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting feat1.fcfg\n"
     ]
    }
   ],
   "source": [
    "%%writefile feat1.fcfg\n",
    "% start S\n",
    "# ###################\n",
    "# Grammar Productions\n",
    "# ###################\n",
    "S[-INV] -> NP VP\n",
    "S[-INV]/?x -> NP VP/?x\n",
    "S[-INV] -> NP S/NP\n",
    "S[-INV] -> Adv[+NEG] S[+INV]\n",
    "S[+INV] -> V[+AUX] NP VP\n",
    "S[+INV]/?x -> V[+AUX] NP VP/?x\n",
    "SBar -> Comp S[-INV]\n",
    "SBar/?x -> Comp S[-INV]/?x\n",
    "VP -> V[SUBCAT=intrans, -AUX]\n",
    "VP -> V[SUBCAT=trans, -AUX] NP\n",
    "VP/?x -> V[SUBCAT=trans, -AUX] NP/?x\n",
    "VP -> V[SUBCAT=clause, -AUX] SBar\n",
    "VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x\n",
    "VP -> V[+AUX] VP\n",
    "VP/?x -> V[+AUX] VP/?x\n",
    "\n",
    "# ###################\n",
    "# Lexical Productions\n",
    "# ###################\n",
    "V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'\n",
    "V[SUBCAT=trans, -AUX] -> 'see' | 'like'\n",
    "V[SUBCAT=clause, -AUX] -> 'say' | 'claim'\n",
    "V[+AUX] -> 'do' | 'can'\n",
    "NP[-WH] -> 'you' | 'cats'\n",
    "NP[+WH] -> 'who'\n",
    "Adv[+NEG] -> 'rarely' | 'never'\n",
    "NP/NP ->\n",
    "Comp -> 'that'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S[-INV]\n",
      "  (NP[+WH] who)\n",
      "  (S[+INV]/NP[]\n",
      "    (V[+AUX] do)\n",
      "    (NP[-WH] you)\n",
      "    (VP[]/NP[]\n",
      "      (V[-AUX, SUBCAT='clause'] claim)\n",
      "      (SBar[]/NP[]\n",
      "        (Comp[] that)\n",
      "        (S[-INV]/NP[]\n",
      "          (NP[-WH] you)\n",
      "          (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))\n"
     ]
    }
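   ],
   "source": [
    "from nltk import load_parser\n",
    "\n",
    "## parser for the grammar with slash categories (gaps)\n",
    "cp = load_parser('feat1.fcfg')\n",
    "\n",
    "## wh-question: the object of 'like' is a gap filled by 'who'\n",
    "tokens = 'who do you claim that you like'.split()\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"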
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S[-INV]\n",
      "  (NP[-WH] you)\n",
      "  (VP[]\n",
      "    (V[-AUX, SUBCAT='clause'] claim)\n",
      "    (SBar[]\n",
      "      (Comp[] that)\n",
      "      (S[-INV]\n",
      "        (NP[-WH] you)\n",
      "        (VP[] (V[-AUX, SUBCAT='trans'] like) (NP[-WH] cats))))))\n"
     ]
    }
   ],
   "source": [
    "tokens = 'you claim that you like cats'.split()\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(S[-INV]\n",
      "  (Adv[+NEG] rarely)\n",
      "  (S[+INV]\n",
      "    (V[+AUX] do)\n",
      "    (NP[-WH] you)\n",
      "    (VP[] (V[-AUX, SUBCAT='intrans'] sing))))\n"
     ]
    }
   ],
   "source": [
    "tokens = 'rarely do you sing'.split()\n",
    "for tree in cp.parse(tokens):\n",
    "    print(tree)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}