Python [Python Web Crawler] 04-WebCrawler: Installing beautifulsoup4 (Beautiful Soup)

Appia 2020. 10. 22. 07:27

Python [Python Web Crawler] 04-WebCrawler: Installing beautifulsoup4 (Beautiful Soup) and Why You Need It

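A great many people build web crawlers (WebCrawler). Among them, most Python users rely on the requests module covered in an earlier post. I am also very fond of requests and think it is an excellent module. It does have one limitation, however: the data it returns still has to be organized into the objects you actually want to work with.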

Related post: Python [Python Web Crawler] 00-WebCrawler: Installing modules for a crawler - installing the requests module


When you read HTML data with the requests module, as described in that post, the result can look like the following.

<!doctype html>
<html lang="ko">
<head>
    <title>TISTORY</title>
    <meta charset="utf-8">
    <meta name="viewport" content="user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, width=device-width">
    <link rel="stylesheet" type="text/css" href="//t1.daumcdn.net/tistory_admin/www/style/top/font.css">
    <link rel="stylesheet" type="text/css" href="//t1.daumcdn.net/tistory_admin/www/style/top/error_20190814.css">
</head>
<body>
<div id="kakaoIndex">
    <a href="#kakaoBody">본문 바로가기</a>
    <a href="#kakaoGnb">메뉴 바로가기</a>
</div>
<div id="kakaoWrap" class="tistory_type3">
    <div id="kakaoContent" role="main">
        <div id="cMain">
            <div id="mArticle">
                <div class="content_error">
                    <div class="inner_error">
                        <div class="error_tistory">
                            <h2 id="kakaoBody" class="screen_out">에러 메세지</h2>
                            <strong class="tit_error  tit_error_type2">접근 권한이 없는 <span class="br_line"><br></span>페이지입니다.</strong>
                            <p class="desc_error">궁금하신 사항은 <a href="https://cs.daum.net/faq/173.html" class="link_txt">고객센터</a>로 문의해 주시기 바랍니다.</p>
                            <div class="wrap_btn"><a class="btn_basic" href="javascript:window.history.back();">이전화면</a></div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <hr class="hide">
    <div id="kakaoFoot" class="footer_comm">
        <div class="inner_foot">
            <p class="desc_footer2">
                <strong class="txt_flogo">TISTORY</strong>
            </p>
        </div>
    </div>
</div>
</body>
</html>

You could, of course, structure this data yourself, but doing so is quite tedious. So in this post I will cover how to install beautifulsoup (Beautiful Soup), which handles this kind of content more efficiently and organizes it for you automatically.
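As a rough illustration of the difference, the sketch below pulls the page title and a link out of a trimmed copy of the HTML above, first with plain string searching and then with Beautiful Soup. The `html` sample (with its text shortened to English) and the helper name `title_by_hand` are made up for this example, and the `bs4` import is guarded so the snippet still runs where the package has not been installed yet.

```python
# A trimmed, illustrative copy of the HTML shown above (sample data only).
html = """<!doctype html>
<html lang="ko">
<head>
    <title>TISTORY</title>
    <meta charset="utf-8">
</head>
<body>
<div class="error_tistory">
    <p class="desc_error">See the <a href="https://cs.daum.net/faq/173.html" class="link_txt">help center</a>.</p>
</div>
</body>
</html>"""


def title_by_hand(text):
    """Hypothetical helper: locate <title>...</title> with plain string searching."""
    start = text.find("<title>")
    end = text.find("</title>")
    if start == -1 or end == -1:
        return None
    return text[start + len("<title>"):end]


print(title_by_hand(html))  # TISTORY

try:
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None  # beautifulsoup4 is not installed yet

if BeautifulSoup is not None:
    # "html.parser" is the parser bundled with Python, so nothing extra is required.
    soup = BeautifulSoup(html, "html.parser")
    print(soup.title.string)                  # text inside <title>
    link = soup.find("a", class_="link_txt")  # first <a class="link_txt">
    print(link["href"], link.get_text())      # the link target and its text
```

The hand-written helper only handles one exact tag spelling, whereas the parsed `soup` object lets you reach any tag or attribute by name.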

Installing beautifulsoup (Beautiful Soup)

- Installing on Linux (distributions that use apt-get, such as Debian or Ubuntu)

$ apt-get install python-bs4 (for Python 2)

$ apt-get install python3-bs4 (for Python 3)

- Installing on Windows

pip install beautifulsoup4

I work on Windows, so let's briefly walk through the Windows case.

(venv) D:\BlogProject\Pillow>pip install beautifulsoup4

Collecting beautifulsoup4

Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB) |████████████████████████████████| 115 kB 819 kB/s

Requirement already satisfied: soupsieve>1.2; python_version >= "3.0" in d:\blogproject\venv\lib\site-packages (from beautifulsoup4) (2.0.1)

Installing collected packages: beautifulsoup4

Successfully installed beautifulsoup4-4.9.3

WARNING: You are using pip version 20.0.2; however, version 20.2.4 is available.

You should consider upgrading via the 'd:\blogproject\venv\scripts\python.exe -m pip install --upgrade pip' command.

You should see a log like the one above. You can then confirm that the package was installed correctly by running the following.

>>> import bs4
>>>

If you type import bs4 in the Python console and the '>>>' prompt simply comes back as shown above, with no error such as ModuleNotFoundError, the installation succeeded.
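The same check can be scripted instead of typed at the prompt. The following is a minimal sketch and not part of the original walkthrough: the function name `bs4_status` is hypothetical, and it relies only on the standard-library `importlib` (`importlib.metadata` assumes Python 3.8 or later).

```python
import importlib.util
from importlib import metadata


def bs4_status():
    """Return the installed beautifulsoup4 version, or None if bs4 cannot be imported."""
    # find_spec returns None when the top-level module is not importable.
    if importlib.util.find_spec("bs4") is None:
        return None
    try:
        # The package on PyPI is named "beautifulsoup4"; the module is "bs4".
        return metadata.version("beautifulsoup4")
    except metadata.PackageNotFoundError:
        # Importable, but no pip metadata was found for that package name.
        return "unknown"


version = bs4_status()
if version is None:
    print("bs4 is not installed; run: pip install beautifulsoup4")
else:
    print("bs4 is installed, version:", version)
```

Run inside the same virtual environment you installed into; otherwise the script may report a different interpreter's packages.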

In this post we took a brief look at Python [Python Web Crawler] 04-WebCrawler: installing beautifulsoup (Beautiful Soup). If you have any questions, feel free to leave a comment or a guestbook entry at any time. Thank you.

License: Attribution, NonCommercial, NoDerivatives