Python [Web Crawler] 04-WebCrawler: Installing beautifulsoup4 (Beautiful Soup)

Appia의 IT세상 (Appia's IT World)

Appia 2020. 10. 22. 07:27

Installing beautifulsoup4 (Beautiful Soup) and Why You Need It

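A great many people build web crawlers. Among them, many Python users rely on the requests module covered in the earlier posts of this series. I am also very fond of requests and consider it an excellent module. It does have one limitation, though: it hands you the page as raw text, so you still have to organize that data into the objects you want yourself.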

Related post: Python [Web Crawler] 00-WebCrawler: Installing modules for a crawler, installing the requests module (appia.tistory.com)

When you read HTML with the requests module as described earlier, the result can look like the following.

<!doctype html>
<html lang="ko">
<head>
    <title>TISTORY</title>
    <meta charset="utf-8">
    <meta name="viewport" content="user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, width=device-width">
    <link rel="stylesheet" type="text/css" href="//t1.daumcdn.net/tistory_admin/www/style/top/font.css">
    <link rel="stylesheet" type="text/css" href="//t1.daumcdn.net/tistory_admin/www/style/top/error_20190814.css">
</head>
<body>
<div id="kakaoIndex">
    <a href="#kakaoBody">본문 바로가기</a>
    <a href="#kakaoGnb">메뉴 바로가기</a>
</div>
<div id="kakaoWrap" class="tistory_type3">
    <div id="kakaoContent" role="main">
        <div id="cMain">
            <div id="mArticle">
                <div class="content_error">
                    <div class="inner_error">
                        <div class="error_tistory">
                            <h2 id="kakaoBody" class="screen_out">에러 메세지</h2>
                            <strong class="tit_error  tit_error_type2">접근 권한이 없는 <span class="br_line"><br></span>페이지입니다.</strong>
                            <p class="desc_error">궁금하신 사항은 <a href="https://cs.daum.net/faq/173.html" class="link_txt">고객센터</a>로 문의해 주시기 바랍니다.</p>
                            <div class="wrap_btn"><a class="btn_basic" href="javascript:window.history.back();">이전화면</a></div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <hr class="hide">
    <div id="kakaoFoot" class="footer_comm">
        <div class="inner_foot">
            <p class="desc_footer2">
                <strong class="txt_flogo">TISTORY</strong>
            </p>
        </div>
    </div>
</div>
</body>
</html>

Of course, you can structure this data yourself, but doing so is very tedious. This post therefore covers how to install beautifulsoup (Beautiful Soup), which parses and organizes such markup automatically and lets you handle it more efficiently.
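To make the tedium concrete, here is a minimal sketch of structuring such a page by hand, using only Python's standard-library html.parser. The sample string is a trimmed excerpt shaped like the response above, with the link text replaced by placeholders. The class name TitleAndLinkCollector is hypothetical and chosen only for illustration. It is not how Beautiful Soup works internally, just an idea of the bookkeeping you would otherwise write yourself.

```python
from html.parser import HTMLParser

# Trimmed excerpt shaped like the response shown earlier (placeholder link text).
SAMPLE_HTML = """<!doctype html>
<html lang="ko">
<head>
    <title>TISTORY</title>
    <meta charset="utf-8">
</head>
<body>
<div id="kakaoIndex">
    <a href="#kakaoBody">body</a>
    <a href="#kakaoGnb">menu</a>
</div>
</body>
</html>"""


class TitleAndLinkCollector(HTMLParser):
    """Hypothetical helper: collects the <title> text and every <a href>."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while the parser is inside <title>
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text nodes arrive here; keep only the ones inside <title>
        if self.in_title:
            self.title += data


collector = TitleAndLinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.title)
print(collector.links)
```

Even for just a title and a list of links, you have to track parser state by hand. Every additional kind of element needs more of the same, which is the work a dedicated parsing library takes over.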

Installing beautifulsoup (Beautiful Soup)

- Installing on Linux (Debian/Ubuntu, using the apt-get package manager)

$ apt-get install python-bs4 (for Python 2)

$ apt-get install python3-bs4 (for Python 3)

- Installing on Windows (using pip; the same command works in any Python environment that has pip)

pip install beautifulsoup4
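If several Python installations or virtual environments exist on one machine, a bare pip can point at a different interpreter than the one you run your crawler with. One common way around this is to invoke pip as a module of the interpreter itself, the same `python -m pip` form that pip's own upgrade hint uses. The sketch below only builds and prints that command by default. The function names pip_install_command and install, and the dry_run flag, are hypothetical and exist only for this illustration.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    command = pip_install_command(package)
    print(" ".join(command))
    if not dry_run:
        # Raises CalledProcessError if pip exits with a non-zero status
        subprocess.check_call(command)
    return command


# Dry run: shows the command without installing anything
install("beautifulsoup4")
```

Calling install("beautifulsoup4", dry_run=False) would actually run the installation inside the current environment.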

I work on Windows, so let's take a quick look at the installation there.

(venv) D:\BlogProject\Pillow>pip install beautifulsoup4

Collecting beautifulsoup4

Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB) |████████████████████████████████| 115 kB 819 kB/s

Requirement already satisfied: soupsieve>1.2; python_version >= "3.0" in d:\blogproject\venv\lib\site-packages (from beautifulsoup4) (2.0.1)

Installing collected packages: beautifulsoup4

Successfully installed beautifulsoup4-4.9.3

WARNING: You are using pip version 20.0.2; however, version 20.2.4 is available.

You should consider upgrading via the 'd:\blogproject\venv\scripts\python.exe -m pip install --upgrade pip' command.

You should see a log like the one above. You can then confirm that the package was installed correctly as follows.

>>> import bs4
>>>

If you enter import bs4 in the Python console and the '>>>' prompt comes back as above with no error message, the installation succeeded.
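The same check can be scripted, which is handy when you want a crawler to fail early with a clear message. This is a minimal sketch using only the standard library (importlib.metadata requires Python 3.8 or later). The helper names bs4_is_installed and bs4_version are hypothetical, and the version lookup assumes the package was installed under the distribution name beautifulsoup4, as in the pip command above.

```python
import importlib.util
from importlib import metadata


def bs4_is_installed():
    """Return True if this interpreter can locate the bs4 package."""
    return importlib.util.find_spec("bs4") is not None


def bs4_version():
    """Return the installed beautifulsoup4 version string, or None."""
    try:
        return metadata.version("beautifulsoup4")
    except metadata.PackageNotFoundError:
        return None


if bs4_is_installed():
    print("bs4 is importable, version:", bs4_version())
else:
    print("bs4 not found; try: pip install beautifulsoup4")
```

Unlike a plain import, find_spec does not load the package, so the check stays cheap and does not raise when bs4 is missing.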

This post briefly covered Python [Web Crawler] 04-WebCrawler: installing beautifulsoup (Beautiful Soup). If you have any questions, feel free to leave a comment or a guestbook entry at any time. Thank you.

License: Attribution, NonCommercial, NoDerivatives
