Created November 10, 2018 14:50
Gist: euphoris/5f56fd52fdd6cdccfa8ad0fa8b57be74
| { | |
| "nbformat": 4, | |
| "nbformat_minor": 0, | |
| "metadata": { | |
| "colab": { | |
| "name": "practice11.ipynb", | |
| "version": "0.3.2", | |
| "provenance": [], | |
| "collapsed_sections": [] | |
| }, | |
| "kernelspec": { | |
| "name": "python3", | |
| "display_name": "Python 3" | |
| }, | |
| "accelerator": "GPU" | |
| }, | |
| "cells": [ | |
| { | |
| "metadata": { | |
| "id": "osMMgqsAEZZd", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## Scraping a Bulletin Board\n", | |
| "\n", | |
| "### Changing Pages" | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "dqn58OXebXSb", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import requests" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "HuSGinocEdry", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Notice board URL of the Kookmin University website. The `pn=` query parameter selects the page, and `{}` marks where the page number will be filled in." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "WgmAkyEhcMbO", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "url = 'https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn={}'" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "I2b-7inoEnxS", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Print the URL for each page, from page 0 through page 9." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "RISFXGvIcZx8", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 194 | |
| }, | |
| "outputId": "80332c06-6102-4812-f26a-28caadee4049" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "for page in range(10):\n", | |
| "    print(url.format(page))  # fill the page number into the URL template" | |
| ], | |
| "execution_count": 5, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=0\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=1\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=2\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=3\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=4\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=5\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=6\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=7\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=8\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn=9\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
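| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The page number does not have to go through the `{}` placeholder and `str.format`. It can also be passed with the `params` argument of `requests`. The sketch below only builds and prints the URLs and sends nothing. `base` is a made-up name, and the sketch assumes the site treats `?pn=0` the same way as `?&pn=0`. Inside the scraping loop, the equivalent call would be `requests.get(base, params={'pn': page})`." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import requests\n", | |
| "\n", | |
| "# base: made-up name for the board URL without its query string\n", | |
| "base = 'https://www.kookmin.ac.kr/site/ecampus/notice/all/'\n", | |
| "for page in range(3):\n", | |
| "    # Build (but do not send) a GET request; requests encodes params into the query string\n", | |
| "    prepared = requests.Request('GET', base, params={'pn': page}).prepare()\n", | |
| "    print(prepared.url)" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |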
| { | |
| "metadata": { | |
| "id": "4hQ7cuV5dLi1", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!pip install lxml" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "agESh8vFdWeA", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!pip install cssselect" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "IVvfRgSSdILR", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import lxml.html" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "ZI3nCW8iEsiQ", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### Getting Post URLs\n", | |
| "\n", | |
| "Fetch page 0." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "r8Xj2EO5cbBv", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "res = requests.get(url.format(0))" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "K5IbK9DydKAJ", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "root = lxml.html.fromstring(res.text)" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "QfGSE5ZjeYWi", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "from urllib.parse import urljoin" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "p4_KymRudZKI", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 283 | |
| }, | |
| "outputId": "8a0dfa05-d752-4ad3-aaae-4619d00483f4" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "for link in root.cssselect('.boardlist a'):  # collect every <a> link under class=\"boardlist\"\n", | |
| "    print(urljoin(url, link.attrib['href']))  # read its href attribute and resolve it to an absolute URL" | |
| ], | |
| "execution_count": 16, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122457\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122428\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122425\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122408\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122403\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122396\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122382\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122356\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122338\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122319\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122256\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122255\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122177\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122215\n", | |
| "https://www.kookmin.ac.kr/site/ecampus/notice/all/122212\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
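| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The `href` values on the board are presumably relative paths, so `urljoin` is needed to resolve them against the page URL. The offline sketch below shows how `urljoin` handles a few kinds of `href`. The `href` strings are made-up examples and were not taken from the site." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "from urllib.parse import urljoin\n", | |
| "\n", | |
| "url = 'https://www.kookmin.ac.kr/site/ecampus/notice/all/?&pn={}'\n", | |
| "\n", | |
| "# Made-up href values for illustration\n", | |
| "for href in ['122212', '/site/ecampus/notice/all/122212', 'https://example.com/post/1']:\n", | |
| "    # Relative paths replace the last path segment and drop the base query string;\n", | |
| "    # absolute URLs are returned unchanged\n", | |
| "    print(urljoin(url, href))" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |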
| { | |
| "metadata": { | |
| "id": "r3htR6xjE6Cr", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### Getting Post Content" | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "QKYWAX7wfWDw", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "res = requests.get('https://www.kookmin.ac.kr/site/ecampus/notice/all/122212')" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "_oa2DADzfzYf", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "res.encoding = 'utf8'" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "wDKA1JH1fao0", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "root = lxml.html.fromstring(res.text)" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "88WkDGYOdo3K", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "content = root.cssselect('#view-detail-data')" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "-eSfsJvPfmv-", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 55 | |
| }, | |
| "outputId": "831d3e55-346b-4e3b-9283-8b686ca8d22a" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "content[0].text_content()" | |
| ], | |
| "execution_count": 32, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "'\\n\\t\\xa0\\r\\n\\r\\n국민대학교 창업보육센터 계약직원 모집\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n1. 모집분야 및 응시자격 \\r\\n\\r\\n\\r\\n\\t\\r\\n\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t모집분야\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t인원\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t우대사항\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t공통사항\\r\\n\\t\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t창업보육센터\\r\\n\\r\\n\\t\\t\\t전담인력\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t1명\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t- 창업보육전문매니저 자격증, 경영지도사, 기술경영사, 기술평가사 자격증 소지자 우대\\r\\n\\r\\n\\t\\t\\t- 창업지원 및 창업교육 관련 업무 경력자 우대\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t\\r\\n\\t\\t\\t- 4년제 대학 이상 졸업자 \\r\\n\\r\\n\\t\\t\\t- 아래한글 또는 MS워드, 엑셀, 파워포인트, 포토샵, 일러스트) 활용에 능숙한 자\\r\\n\\r\\n\\t\\t\\t- 해외여행에 결격 사유가 없는 자로 남자는 병역필 또는 면제자\\r\\n\\t\\t\\t\\r\\n\\t\\t\\r\\n\\t\\r\\n\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n2. 제출서류\\r\\n\\r\\n◦입사지원서\\r\\n\\r\\n(필히 본교 홈페이지 www.kookmin.ac.kr에서 다운받아 사용하시기 바랍니다.)\\r\\n\\r\\n◦자기소개서 1부\\r\\n\\r\\n◦대학 졸업 및 성적증명서 원본 각 1부 (반드시 성적증명서는100점 만점 환산 점수 기재된 것)\\r\\n\\r\\n\\xa0\\xa0 가. 편입자는 전적대학 졸업‧성적증명서 포함\\r\\n\\r\\n\\xa0\\xa0 나. 대학원 졸업(수료)자는 학위수여증명서(수료증명서)‧성적증명서 포함\\r\\n\\r\\n◦자격증(외국어성적표 포함) 사본(해당자에 한함) 1부\\r\\n\\r\\n◦경력증명서(해당자에 한함) 1부\\r\\n\\r\\n◦취업보호대상자 증명원(보훈대상자에 한함) 1부\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n3. 제출기간 및 제출처\\r\\n\\r\\n◦제출기간 : 2018. 11. 01.(목) ~ 11. 16.(금)\\r\\n\\r\\n◦제 출 처 : 우편접수 - 국민대학교 산학협력관 214호 창업지원단 사무실\\r\\n\\r\\n\\xa0\\xa0 (마감일 기준 도착분에 한함)\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n4. 전형방법\\r\\n\\r\\n◦1차 전형 : 서류심사\\r\\n\\r\\n◦2차 전형 : 면접\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n5. 전형일정 \\r\\n\\r\\n◦1차 서류심사 : 2018. 11. 19.(월)\\r\\n\\r\\n◦1차 서류심사 결과 통보 : 2018. 11. 20 (화) 예정\\r\\n\\r\\n\\xa0\\xa0 - 1차 서류심사 합격자에 한하여 개별 통지\\r\\n\\r\\n◦2차 면접 : 2018. 11. 22.(목) 11:00 예정\\r\\n\\r\\n◦최종 합격 통보 : 2018. 11. 23.(금) 예정\\r\\n\\r\\n◦임용일자 : 2018.12.03.(월) 예정\\r\\n\\r\\n◦전형일정은 본교 사정에 따라 변동될 수 있습니다.\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n6. 
채용조건\\r\\n\\r\\n- 계약직원으로 1년간 고용 후 평가결과에 따라 1년 연장 가능\\r\\n\\r\\n(본 채용은 창업보육센터 사업 전담인력 채용임)\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n7. 기타\\r\\n\\r\\n서류(우편포함)는 마감일 16:00까지 도착된 것에 한하여 접수(e-mail 접수 불가)\\r\\n\\r\\n주 소 : 20707 서울 성북구 정릉로 77 국민대학교 산학협력관 214호 창업지원단 사무실\\r\\n\\r\\n전 화 : (02) 910 - 5911\\r\\n'" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 32 | |
| } | |
| ] | |
| }, | |
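| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "`text_content()` keeps the raw layout whitespace, such as `\\r\\n`, `\\t` and non-breaking spaces (`\\xa0`). The sketch below shows one possible cleanup on a short made-up string in the same style. It normalizes the non-breaking spaces, strips every line and drops the empty lines. `clean_text` is a hypothetical helper that is not part of the original notebook. On the real post it would be called as `clean_text(content[0].text_content())`." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "def clean_text(text):  # hypothetical helper, not from the original notebook\n", | |
| "    # Turn non-breaking spaces into ordinary spaces\n", | |
| "    text = text.replace('\\xa0', ' ')\n", | |
| "    # splitlines() also splits on Windows-style line endings; strip each line, keep non-empty ones\n", | |
| "    lines = [line.strip() for line in text.splitlines()]\n", | |
| "    return '\\n'.join(line for line in lines if line)\n", | |
| "\n", | |
| "sample = '\\n\\t\\xa0\\r\\n\\r\\nTitle\\r\\n\\r\\n\\xa0 \\r\\n\\r\\n1. Item \\r\\n'  # made-up sample\n", | |
| "print(clean_text(sample))" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |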
| { | |
| "metadata": { | |
| "id": "1v1otR8jE-RO", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "### Putting It Together\n", | |
| "\n", | |
| "Collect the post URLs while stepping through the pages." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "s0wvT6Uwfo9-", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "article_urls = []\n", | |
| "\n", | |
| "for page in range(10):\n", | |
| " res = requests.get(url.format(page))\n", | |
| " root = lxml.html.fromstring(res.text) \n", | |
| " for link in root.cssselect('.boardlist a'):\n", | |
| " article_urls.append(urljoin(url, link.attrib['href']))" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "GWIEWki_ggdG", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 35 | |
| }, | |
| "outputId": "6ff34ef1-d419-4ec6-b887-f6622c3af1fc" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "len(article_urls)" | |
| ], | |
| "execution_count": 34, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "70" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 34 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "1wI6yHfcFBny", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Fetch the body text of each collected post URL." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "mDud-Qo0gl5o", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "contents = []\n", | |
| "for article_url in article_urls:  # for each post URL\n", | |
| "    res = requests.get(article_url)  # request the page\n", | |
| "    res.encoding = 'utf8'  # decode the response as UTF-8\n", | |
| "    root = lxml.html.fromstring(res.text)  # parse the HTML\n", | |
| "    content = root.cssselect('#view-detail-data')  # select the body area\n", | |
| "    contents.append(content[0].text_content())  # and collect its text" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
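| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The loop above raises `IndexError` whenever a post has no `#view-detail-data` element, which could happen with a deleted post or a page that uses a different layout. The sketch below is a helper that returns `None` in that case instead. It is tested on made-up HTML snippets, and the name `extract_body` is hypothetical. Inside the loop, `contents.append(...)` would then run only when the result is not `None`." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import lxml.html\n", | |
| "\n", | |
| "def extract_body(html):  # hypothetical helper, not from the original notebook\n", | |
| "    # Parse the page and look for the body container used above\n", | |
| "    root = lxml.html.fromstring(html)\n", | |
| "    found = root.cssselect('#view-detail-data')\n", | |
| "    # Return None instead of raising IndexError when the container is missing\n", | |
| "    return found[0].text_content() if found else None\n", | |
| "\n", | |
| "# Made-up HTML snippets for illustration\n", | |
| "print(extract_body('<html><body><div id=\"view-detail-data\"><p>Hello</p></div></body></html>'))  # Hello\n", | |
| "print(extract_body('<html><body><div id=\"other\">No body here</div></body></html>'))  # None" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |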
| { | |
| "metadata": { | |
| "id": "UUgIiNbHFJH7", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## 11.2 Word Embedding\n", | |
| "\n", | |
| "(Same as the textbook.)" | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "iR7hg-1ignwY", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import requests\n", | |
| "import re\n", | |
| "res = requests.get('https://www.gutenberg.org/files/2591/2591-0.txt')\n", | |
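| "grimm = res.text[2801:530661]  # presumably keeps only the stories, dropping the Project Gutenberg header and license text (offsets depend on the file version)\n", | |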
| "grimm = re.sub(r'[^a-zA-Z\\. ]', ' ', grimm)\n", | |
| "sentences = grimm.split('. ')  # split into sentences\n", | |
| "data = [s.split() for s in sentences]\n" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "hyPaPTbmtL-2", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 389 | |
| }, | |
| "outputId": "fea13696-9d2b-4b9c-fa29-c6b4fcf5351c" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "data[0]" | |
| ], | |
| "execution_count": 37, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "['THE',\n", | |
| " 'GOLDEN',\n", | |
| " 'BIRD',\n", | |
| " 'A',\n", | |
| " 'certain',\n", | |
| " 'king',\n", | |
| " 'had',\n", | |
| " 'a',\n", | |
| " 'beautiful',\n", | |
| " 'garden',\n", | |
| " 'and',\n", | |
| " 'in',\n", | |
| " 'the',\n", | |
| " 'garden',\n", | |
| " 'stood',\n", | |
| " 'a',\n", | |
| " 'tree',\n", | |
| " 'which',\n", | |
| " 'bore',\n", | |
| " 'golden',\n", | |
| " 'apples']" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 37 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "qnYsur95tN1G", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!pip install gensim" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "Hy9xg7BLt2PO", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "from gensim.models.word2vec import Word2Vec" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "vv7W_IGouGnH", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
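| "model = Word2Vec(data,  # training data: a list of tokenized sentences\n", | |
| "                 sg=1,  # 0: CBOW, 1: Skip-gram\n", | |
| "                 size=100,  # vector dimensionality (renamed vector_size in gensim 4.0+)\n", | |
| "                 window=3,  # context window (3 words on each side)\n", | |
| "                 min_count=3,  # minimum word frequency (words seen fewer than 3 times are ignored)\n", | |
| "                 workers=4)  # number of worker threads (roughly the number of CPU cores)" | |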
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
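| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "`min_count=3` keeps only the words whose total frequency in the corpus is at least 3. The sketch below uses a made-up toy corpus, counts tokens the same way and shows which words would survive the cutoff." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "from collections import Counter\n", | |
| "\n", | |
| "# Made-up corpus with the same shape as `data`: a list of token lists\n", | |
| "toy = [['the', 'king', 'had', 'a', 'garden'],\n", | |
| "       ['the', 'queen', 'had', 'a', 'tree'],\n", | |
| "       ['the', 'king', 'saw', 'a', 'bird']]\n", | |
| "counts = Counter(word for sentence in toy for word in sentence)\n", | |
| "# With min_count=3, words seen fewer than 3 times are dropped from the vocabulary\n", | |
| "kept = sorted(word for word, c in counts.items() if c >= 3)\n", | |
| "print(kept)  # ['a', 'the']" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |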
| { | |
| "metadata": { | |
| "id": "VlBbWfqdvWD5", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.save('word2vec.model')" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "51cuAPEkv62a", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 90 | |
| }, | |
| "outputId": "f85d7c95-ad35-4fc4-b61a-ef8011f09c36" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.wv.similarity('princess', 'queen')" | |
| ], | |
| "execution_count": 44, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "/usr/local/lib/python3.6/dist-packages/gensim/matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.\n", | |
| " if np.issubdtype(vec.dtype, np.int):\n" | |
| ], | |
| "name": "stderr" | |
| }, | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "0.9875084" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 44 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "dd4-kpQYwQdx", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 250 | |
| }, | |
| "outputId": "a4b1a71b-1ca7-46f8-d00b-7b7ce8e9b69f" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.wv.most_similar('princess')" | |
| ], | |
| "execution_count": 49, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "/usr/local/lib/python3.6/dist-packages/gensim/matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.\n", | |
| " if np.issubdtype(vec.dtype, np.int):\n" | |
| ], | |
| "name": "stderr" | |
| }, | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "[('fox', 0.9914872646331787),\n", | |
| " ('dwarf', 0.9899657964706421),\n", | |
| " ('prince', 0.9898759126663208),\n", | |
| " ('second', 0.9888558387756348),\n", | |
| " ('wedding', 0.9885976314544678),\n", | |
| " ('boy', 0.9884428977966309),\n", | |
| " ('queen', 0.9875084757804871),\n", | |
| " ('youth', 0.9870286583900452),\n", | |
| " ('witch', 0.9852925539016724),\n", | |
| " ('palace', 0.9848740100860596)]" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 49 | |
| } | |
| ] | |
| }, | |
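| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "`similarity` returns the cosine similarity of the two word vectors, and `most_similar` ranks words by the same measure. The sketch below applies that formula in NumPy to made-up vectors. `cosine_similarity` is a hypothetical helper. Applied to `model.wv['princess']` and `model.wv['queen']`, it should give the same value as `model.wv.similarity` up to floating-point error." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import numpy as np\n", | |
| "\n", | |
| "def cosine_similarity(u, v):  # hypothetical helper\n", | |
| "    # Dot product divided by the product of the vector norms\n", | |
| "    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))\n", | |
| "\n", | |
| "a = np.array([1.0, 2.0, 2.0])  # made-up vectors\n", | |
| "b = np.array([2.0, 1.0, 2.0])\n", | |
| "print(cosine_similarity(a, b))  # 8 / 9, about 0.889" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |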
| { | |
| "metadata": { | |
| "id": "JABHP02Dv_qj", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 250 | |
| }, | |
| "outputId": "dde4e19e-5259-4344-b23b-94153e32c415" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.wv.most_similar(positive=['man', 'princess'], negative=['woman'])\n" | |
| ], | |
| "execution_count": 50, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "/usr/local/lib/python3.6/dist-packages/gensim/matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.\n", | |
| " if np.issubdtype(vec.dtype, np.int):\n" | |
| ], | |
| "name": "stderr" | |
| }, | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "[('cat', 0.9745951890945435),\n", | |
| " ('miller', 0.9728690981864929),\n", | |
| " ('lady', 0.9711911678314209),\n", | |
| " ('bird', 0.9709718823432922),\n", | |
| " ('bride', 0.9689940214157104),\n", | |
| " ('wolf', 0.9689082503318787),\n", | |
| " ('child', 0.9684101343154907),\n", | |
| " ('huntsman', 0.9650394320487976),\n", | |
| " ('soldier', 0.9645828604698181),\n", | |
| " ('peasant', 0.9642676115036011)]" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 50 | |
| } | |
| ] | |
| }, | |
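| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "Roughly speaking, `most_similar(positive=[...], negative=[...])` works as follows. It adds the normalized positive vectors, subtracts the normalized negative ones, and ranks the remaining words by cosine similarity to the result. The sketch below is a simplified version on hand-picked 2-D vectors, all of them made up. gensim's actual implementation differs in details such as averaging and weighting, and `unit` and `analogy` are hypothetical helpers." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import numpy as np\n", | |
| "\n", | |
| "# Hand-picked 2-D vectors, made up for illustration only\n", | |
| "vectors = {\n", | |
| "    'man': np.array([1.0, 0.0]),\n", | |
| "    'woman': np.array([0.0, 1.0]),\n", | |
| "    'princess': np.array([0.2, 1.0]),\n", | |
| "    'prince': np.array([1.0, 0.2]),\n", | |
| "    'bird': np.array([-1.0, 0.1]),\n", | |
| "}\n", | |
| "\n", | |
| "def unit(v):  # hypothetical helper: scale a vector to length 1\n", | |
| "    return v / np.linalg.norm(v)\n", | |
| "\n", | |
| "def analogy(positive, negative):  # hypothetical, simplified stand-in for most_similar\n", | |
| "    # Add normalized positive vectors, subtract normalized negative ones, renormalize\n", | |
| "    query = unit(sum(unit(vectors[w]) for w in positive) - sum(unit(vectors[w]) for w in negative))\n", | |
| "    # Rank the remaining words by cosine similarity to the query\n", | |
| "    candidates = [w for w in vectors if w not in positive + negative]\n", | |
| "    return max(candidates, key=lambda w: float(np.dot(unit(vectors[w]), query)))\n", | |
| "\n", | |
| "print(analogy(['man', 'princess'], ['woman']))  # prince" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |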
| { | |
| "metadata": { | |
| "id": "uxi3ybXIxMDP", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 35 | |
| }, | |
| "outputId": "c9eb50d0-2284-4feb-ed48-032f19ef6c42" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "from keras.models import Sequential\n", | |
| "from keras.layers import Embedding" | |
| ], | |
| "execution_count": 51, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "Using TensorFlow backend.\n" | |
| ], | |
| "name": "stderr" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "7N5AGLjhyIwu", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "NUM_WORDS, EMB_DIM = model.wv.vectors.shape" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "5yhNQBfYyPst", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 35 | |
| }, | |
| "outputId": "a9f4b8d6-fc76-4434-8021-aad0a30fe160" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "NUM_WORDS" | |
| ], | |
| "execution_count": 54, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "2481" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 54 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "qJgukL1hx9Qz", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "nn = Sequential()" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "Fe5SAerRyWVd", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "emb = Embedding(input_dim=NUM_WORDS, output_dim=EMB_DIM,\n", | |
| " trainable=False, weights=[model.wv.vectors])" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
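| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "An `Embedding` layer created with `weights=[model.wv.vectors]` and `trainable=False` behaves like a fixed lookup table: integer index *i* is mapped to row *i* of the weight matrix. The sketch below imitates that lookup with NumPy indexing on a made-up matrix. The integer inputs must follow gensim's own word-to-row order, which is `model.wv.key_to_index[word]` in gensim 4 and `model.wv.vocab[word].index` in gensim 3." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import numpy as np\n", | |
| "\n", | |
| "# Made-up stand-in for model.wv.vectors: 4 words, 3 dimensions\n", | |
| "weights = np.arange(12, dtype=np.float32).reshape(4, 3)\n", | |
| "# A made-up batch of 2 sequences, each holding 3 word indices\n", | |
| "token_ids = np.array([[0, 2, 3],\n", | |
| "                      [1, 1, 0]])\n", | |
| "# Row lookup: each index is replaced by the matching row of the weight matrix\n", | |
| "looked_up = weights[token_ids]\n", | |
| "print(looked_up.shape)  # (2, 3, 3)\n", | |
| "print(looked_up[0, 1])  # row 2 of weights: [6. 7. 8.]" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |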
| { | |
| "metadata": { | |
| "id": "6LdARVmNyBLz", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "nn.add(emb)" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "iSpo5mdQy6Qi", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 232 | |
| }, | |
| "outputId": "d36e74c0-1fef-4fa7-93d4-668548631af4" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!wget https://drive.google.com/file/d/0B0ZXk88koS2KbDhXdWg1Q2RydlU/view" | |
| ], | |
| "execution_count": 57, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "--2018-11-09 04:50:09-- https://drive.google.com/file/d/0B0ZXk88koS2KbDhXdWg1Q2RydlU/view\n", | |
| "Resolving drive.google.com (drive.google.com)... 74.125.195.101, 74.125.195.100, 74.125.195.139, ...\n", | |
| "Connecting to drive.google.com (drive.google.com)|74.125.195.101|:443... connected.\n", | |
| "HTTP request sent, awaiting response... 200 OK\n", | |
| "Length: unspecified [text/html]\n", | |
| "Saving to: ‘view’\n", | |
| "\n", | |
| "\rview [<=> ] 0 --.-KB/s \rview [ <=> ] 131.77K --.-KB/s in 0.05s \n", | |
| "\n", | |
| "2018-11-09 04:50:10 (2.50 MB/s) - ‘view’ saved [134932]\n", | |
| "\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "YMkM6R4NzlKe", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 70 | |
| }, | |
| "outputId": "9cb951b5-9764-42c7-d3c0-bb4b306a35c3" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!unzip ko.zip" | |
| ], | |
| "execution_count": 59, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "Archive: ko.zip\n", | |
| " inflating: ko.bin \n", | |
| " inflating: ko.tsv \n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "TX2XUK4G1-9u", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "kovec = Word2Vec.load('ko.bin')" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "SYVR0jG62OTQ", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 250 | |
| }, | |
| "outputId": "7f0430c6-5ac2-4912-dbfc-31df9c9c51ef" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "kovec.wv.most_similar('여왕')" | |
| ], | |
| "execution_count": 68, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "/usr/local/lib/python3.6/dist-packages/gensim/matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.\n", | |
| " if np.issubdtype(vec.dtype, np.int):\n" | |
| ], | |
| "name": "stderr" | |
| }, | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "[('국왕', 0.6174007654190063),\n", | |
| " ('왕', 0.6089673638343811),\n", | |
| " ('왕녀', 0.5904853343963623),\n", | |
| " ('왕비', 0.5857207179069519),\n", | |
| " ('왕자', 0.5760841965675354),\n", | |
| " ('왕세자', 0.544166624546051),\n", | |
| " ('왕인', 0.5402752161026001),\n", | |
| " ('미실', 0.5337860584259033),\n", | |
| " ('부왕', 0.5335291624069214),\n", | |
| " ('모후', 0.5328422784805298)]" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 68 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "_pVEPgzaFjyb", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "## 11.5 ELMo Practice\n", | |
| "\n", | |
| "Everything else follows the textbook. The only difference is that preprocessing is simplified with `np.expand_dims`. For how `expand_dims` works, see the explanation in the Q&A." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "AdhLlUk42DSx", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 126 | |
| }, | |
| "outputId": "368b28bd-5278-4de1-d9c2-88324926cc94" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!pip install tensorflow-hub" | |
| ], | |
| "execution_count": 1, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "Requirement already satisfied: tensorflow-hub in /usr/local/lib/python3.6/dist-packages (0.1.1)\n", | |
| "Requirement already satisfied: numpy>=1.12.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-hub) (1.14.6)\n", | |
| "Requirement already satisfied: protobuf>=3.4.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-hub) (3.6.1)\n", | |
| "Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-hub) (1.11.0)\n", | |
| "Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from protobuf>=3.4.0->tensorflow-hub) (40.5.0)\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "GD7X2VP_2-F1", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import tensorflow_hub as hub" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "WiJewaX83RMZ", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 70 | |
| }, | |
| "outputId": "a592e4f9-6989-4f58-d551-6387f7213fbc" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "elmo = hub.Module(\"https://tfhub.dev/google/elmo/1\", trainable=True)" | |
| ], | |
| "execution_count": 3, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "INFO:tensorflow:Using /tmp/tfhub_modules to cache modules.\n", | |
| "INFO:tensorflow:Downloading TF-Hub Module 'https://tfhub.dev/google/elmo/1'.\n", | |
| "INFO:tensorflow:Downloaded TF-Hub Module 'https://tfhub.dev/google/elmo/1'.\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "L42D-rbM3U9k", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 35 | |
| }, | |
| "outputId": "4434d778-7d5a-4728-bbf8-15bab512f2c1" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import tensorflow as tf\n", | |
| "from keras.layers import Lambda" | |
| ], | |
| "execution_count": 4, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "Using TensorFlow backend.\n" | |
| ], | |
| "name": "stderr" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "xMwI8Cdz4Hjf", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "def elmo_embedding(x):\n", | |
| " return elmo(tf.squeeze(tf.cast(x, tf.string)), signature=\"default\", as_dict=True)[\"default\"]" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
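| { | |
| "metadata": { | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "The Lambda layer receives a batch of shape `(batch_size, 1)`. `tf.squeeze` flattens it into the 1-D batch of sentences that the ELMo `default` signature expects. There is one caveat, shown here with `np.squeeze`, which follows the same shape rules. With a batch of a single sentence, squeezing every size-1 axis produces a scalar. Passing `axis=1` keeps the batch axis. Whether this matters depends on how the model is fed, so treat it as an assumption rather than a required fix." | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import numpy as np\n", | |
| "\n", | |
| "batch = np.array([['good product'], ['bad battery']])  # made-up batch, shape (2, 1)\n", | |
| "print(np.squeeze(batch).shape)  # (2,)\n", | |
| "single = np.array([['only one sentence']])  # made-up batch, shape (1, 1)\n", | |
| "print(np.squeeze(single).shape)  # (): every size-1 axis is removed\n", | |
| "print(np.squeeze(single, axis=1).shape)  # (1,): only axis 1 is removed" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |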
| { | |
| "metadata": { | |
| "id": "wTCpQ1sa4exO", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "elmo_layer = Lambda(elmo_embedding, input_shape=(1,), output_shape=(1024,))\n" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "7stCWquf4pzS", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 232 | |
| }, | |
| "outputId": "9e33b3e6-f08f-4b2e-87ce-693d30be0729" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip" | |
| ], | |
| "execution_count": 7, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "--2018-11-09 05:40:08-- https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip\n", | |
| "Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249\n", | |
| "Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.\n", | |
| "HTTP request sent, awaiting response... 200 OK\n", | |
| "Length: 84188 (82K) [application/zip]\n", | |
| "Saving to: ‘sentiment labelled sentences.zip’\n", | |
| "\n", | |
| "sentiment labelled 100%[===================>] 82.21K 125KB/s in 0.7s \n", | |
| "\n", | |
| "2018-11-09 05:40:11 (125 KB/s) - ‘sentiment labelled sentences.zip’ saved [84188/84188]\n", | |
| "\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "muRINwBl47Zc", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 247 | |
| }, | |
| "outputId": "1c5fc42b-6bca-4e49-bf3a-768ba739c6c0" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "!unzip sentiment\\ labelled\\ sentences.zip" | |
| ], | |
| "execution_count": 8, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "Archive: sentiment labelled sentences.zip\n", | |
| " creating: sentiment labelled sentences/\n", | |
| " inflating: sentiment labelled sentences/.DS_Store \n", | |
| " creating: __MACOSX/\n", | |
| " creating: __MACOSX/sentiment labelled sentences/\n", | |
| " inflating: __MACOSX/sentiment labelled sentences/._.DS_Store \n", | |
| " inflating: sentiment labelled sentences/amazon_cells_labelled.txt \n", | |
| " inflating: sentiment labelled sentences/imdb_labelled.txt \n", | |
| " inflating: __MACOSX/sentiment labelled sentences/._imdb_labelled.txt \n", | |
| " inflating: sentiment labelled sentences/readme.txt \n", | |
| " inflating: __MACOSX/sentiment labelled sentences/._readme.txt \n", | |
| " inflating: sentiment labelled sentences/yelp_labelled.txt \n", | |
| " inflating: __MACOSX/._sentiment labelled sentences \n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "gIRldvmn4_Bh", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import pandas as pd\n", | |
| "from keras.preprocessing.sequence import pad_sequences\n", | |
| "from sklearn.model_selection import train_test_split" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "GquT15ak5Gd_", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "df = pd.read_csv('sentiment labelled sentences/amazon_cells_labelled.txt', sep='\\t', header=None)\n" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "FmaJVvGE53ej", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "import numpy as np" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "oKB_-j4b7pQN", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 35 | |
| }, | |
| "outputId": "135a8dca-a41e-44bb-dd47-d2418a83bc7d" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "df[0].values.shape" | |
| ], | |
| "execution_count": 12, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "(1000,)" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 12 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "YOISBGTZFrkK", | |
| "colab_type": "text" | |
| }, | |
| "cell_type": "markdown", | |
| "source": [ | |
| "" | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "hts603305LQb", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "X = np.expand_dims(df[0].values, 1)" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "DoEB0LQL7rzs", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 35 | |
| }, | |
| "outputId": "447e5de7-b88d-44c9-a0c2-01d0c953f690" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "X.shape" | |
| ], | |
| "execution_count": 14, | |
| "outputs": [ | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "(1000, 1)" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 14 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "yW4bR-BV59Tq", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "X_train, X_test, y_train, y_test = train_test_split(\n", | |
| " X, df[1], test_size=.2, random_state=1234)" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "cL6-0H2v6Ab4", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "from keras.models import Model\n", | |
| "from keras.layers import Dense, Input" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "yIC1K6166I65", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 55 | |
| }, | |
| "outputId": "1ce1f62b-0561-4ddb-9459-3ade500f45ad" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "input_layer = Input(shape=(1,), dtype=tf.string)\n", | |
| "emb_layer = elmo_layer(input_layer)\n", | |
| "#hidden = Dense(256, activation='relu')(emb_layer)\n", | |
| "out = Dense(1, activation='sigmoid')(emb_layer)" | |
| ], | |
| "execution_count": 17, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "INFO:tensorflow:Saver not created because there are no variables in the graph to restore\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "YY_NcYtf6Mht", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model = Model(inputs=[input_layer], outputs=out)\n" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "eY76JEvW9Gby", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 247 | |
| }, | |
| "outputId": "c4115b5a-f800-4043-8baa-1f75a434846d" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.summary()\n" | |
| ], | |
| "execution_count": 19, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "_________________________________________________________________\n", | |
| "Layer (type) Output Shape Param # \n", | |
| "=================================================================\n", | |
| "input_1 (InputLayer) (None, 1) 0 \n", | |
| "_________________________________________________________________\n", | |
| "lambda_1 (Lambda) (None, 1024) 0 \n", | |
| "_________________________________________________________________\n", | |
| "dense_1 (Dense) (None, 1) 1025 \n", | |
| "=================================================================\n", | |
| "Total params: 1,025\n", | |
| "Trainable params: 1,025\n", | |
| "Non-trainable params: 0\n", | |
| "_________________________________________________________________\n" | |
| ], | |
| "name": "stdout" | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "UNtbje-E6UpQ", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "BjwqifXN6pOo", | |
| "colab_type": "code", | |
| "colab": { | |
| "base_uri": "https://localhost:8080/", | |
| "height": 250 | |
| }, | |
| "outputId": "c4363428-0a2c-41e8-d1c6-fc5e95f15603" | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "model.fit(X_train,\n", | |
| " y_train,\n", | |
| " validation_data=(X_test, y_test),\n", | |
| " epochs=5,\n", | |
| " batch_size=32)" | |
| ], | |
| "execution_count": 22, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "text": [ | |
| "Train on 800 samples, validate on 200 samples\n", | |
| "Epoch 1/5\n", | |
| "800/800 [==============================] - 13s 16ms/step - loss: 0.4748 - acc: 0.8175 - val_loss: 0.4663 - val_acc: 0.8550\n", | |
| "Epoch 2/5\n", | |
| "800/800 [==============================] - 13s 16ms/step - loss: 0.4540 - acc: 0.8400 - val_loss: 0.4524 - val_acc: 0.8450\n", | |
| "Epoch 3/5\n", | |
| "800/800 [==============================] - 12s 16ms/step - loss: 0.4409 - acc: 0.8325 - val_loss: 0.4464 - val_acc: 0.8250\n", | |
| "Epoch 4/5\n", | |
| "800/800 [==============================] - 12s 16ms/step - loss: 0.4252 - acc: 0.8450 - val_loss: 0.4267 - val_acc: 0.8450\n", | |
| "Epoch 5/5\n", | |
| "800/800 [==============================] - 12s 16ms/step - loss: 0.4132 - acc: 0.8538 - val_loss: 0.4106 - val_acc: 0.8750\n" | |
| ], | |
| "name": "stdout" | |
| }, | |
| { | |
| "output_type": "execute_result", | |
| "data": { | |
| "text/plain": [ | |
| "<keras.callbacks.History at 0x7f05003b60b8>" | |
| ] | |
| }, | |
| "metadata": { | |
| "tags": [] | |
| }, | |
| "execution_count": 22 | |
| } | |
| ] | |
| }, | |
| { | |
| "metadata": { | |
| "id": "CnTzjvNN6rPi", | |
| "colab_type": "code", | |
| "colab": {} | |
| }, | |
| "cell_type": "code", | |
| "source": [ | |
| "" | |
| ], | |
| "execution_count": 0, | |
| "outputs": [] | |
| } | |
| ] | |
| } |
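The data-preparation cells above load the tab-separated file, add a column axis with `np.expand_dims`, and hold out 20% of the rows. The sketch below checks those steps in isolation. It assumes a synthetic 1,000-row DataFrame standing in for `amazon_cells_labelled.txt` (column 0 is the sentence, column 1 is the 0/1 label), so it reproduces the printed shapes without downloading the data or loading ELMo.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for amazon_cells_labelled.txt (an assumption, not the real data):
# column 0 holds the sentence, column 1 holds the 0/1 sentiment label.
df = pd.DataFrame({
    0: [f"sentence {i}" for i in range(1000)],
    1: [i % 2 for i in range(1000)],
})

# One string per row: a 1-D object array
print(df[0].values.shape)  # (1000,)

# Add a trailing axis so each sample matches Input(shape=(1,), dtype=tf.string)
X = np.expand_dims(df[0].values, 1)
print(X.shape)  # (1000, 1)

# Same split parameters as the notebook: 20% held out, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, df[1], test_size=.2, random_state=1234)
print(X_train.shape, X_test.shape)  # (800, 1) (200, 1)
```

The 800/200 split matches the "Train on 800 samples, validate on 200 samples" line in the training log.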
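The `model.summary()` output reports 1,025 trainable parameters, all in `dense_1`. The ELMo `Lambda` layer contributes none, and a `Dense(1)` layer on a 1,024-dimensional embedding has 1,024 weights plus one bias. The NumPy sketch below mirrors that sigmoid head's forward pass. The random embeddings are placeholders rather than real ELMo outputs, and the weight initialization is an assumption, not the one Keras uses.

```python
import numpy as np

rng = np.random.default_rng(0)

emb_dim = 1024  # width of the Lambda (ELMo) output in the summary
batch = 32      # batch_size used in model.fit

# Placeholder embeddings standing in for the ELMo layer output (assumption)
emb = rng.normal(size=(batch, emb_dim))

# Parameters of Dense(1, activation='sigmoid'): weight matrix plus bias
W = rng.normal(scale=0.01, size=(emb_dim, 1))
b = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: one probability of positive sentiment per sentence
probs = sigmoid(emb @ W + b)

n_params = W.size + b.size
print(probs.shape, n_params)  # (32, 1) 1025
```

Uncommenting the `Dense(256, activation='relu')` line in the notebook would add 1024 × 256 + 256 = 262,400 parameters before the output layer. That output layer would then have 257 parameters instead of 1,025.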