Python web scraping libraries

Author: 倾尽尘光暖流年 | Published: 2026-01-15

# Example code: simple web scraping with Python's requests and BeautifulSoup libraries

import requests
from bs4 import BeautifulSoup

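# Define the URL to scrape
url = 'https://example.com'

# Send an HTTP GET request; the timeout keeps the script from hanging indefinitely
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Print the page title (soup.title is None when the page has no <title> tag)
    title = soup.title.string if soup.title else None
    print("Page title:", title)

    # Find all paragraph tags and print their text content
    paragraphs = soup.find_all('p')
    for p in paragraphs:
        print(p.get_text())
else:
    print("Request failed, status code:", response.status_code)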

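# Explanation:
# 1. The requests library sends the HTTP request and fetches the page content.
# 2. The BeautifulSoup library parses the HTML document.
# 3. BeautifulSoup's search methods locate and extract the data you need.
# 4. The script prints the page title and the text of every paragraph.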
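
The parsing steps above can also be sketched as a small reusable function. This is only a minimal sketch: the function name `extract_title_and_paragraphs` and the inline `sample_html` string are assumptions made for illustration, not part of the original example. It parses a hard-coded HTML string instead of a live response, so it can be run without network access.

```python
from bs4 import BeautifulSoup


def extract_title_and_paragraphs(html):
    """Return (title, paragraphs) parsed from an HTML string.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # soup.title is None when the document has no <title> tag
    title = soup.title.string if soup.title else None
    # Collect the text of every <p> tag, trimming surrounding whitespace
    paragraphs = [p.get_text().strip() for p in soup.find_all('p')]
    return title, paragraphs


# Hypothetical sample page, used here in place of a live HTTP response
sample_html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph.</p>
    <p>Second <b>paragraph</b>.</p>
  </body>
</html>
"""

title, paragraphs = extract_title_and_paragraphs(sample_html)
print("Page title:", title)
for text in paragraphs:
    print(text)
```

To use it with a real page, pass `response.content` from a successful `requests.get` call in place of `sample_html`. Keeping the parsing separate from the download makes it easier to test on saved HTML.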

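If you need more advanced crawling features, consider a full framework such as Scrapy. For pages whose content is rendered by JavaScript, a browser automation tool such as Selenium is the usual choice, since requests only retrieves the raw HTML and does not execute scripts.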
