Web crawler code in Python

Author: 拽一个给爷看 Published: 2025-10-28

import requests
from bs4 import BeautifulSoup

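# A simple web crawler: fetch one page and print the text of every <h1> tag
def simple_web_crawler(url):
    try:
        # Send an HTTP GET request; the timeout (in seconds) stops the call from hanging indefinitely
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise HTTPError for 4xx/5xx status codes

        # Parse the HTML with BeautifulSoup using Python's built-in parser
        soup = BeautifulSoup(response.text, 'html.parser')

        # Example: extract every <h1> heading tag and print its text
        titles = soup.find_all('h1')
        for title in titles:
            print(title.get_text())

    except requests.RequestException as e:
        # Covers connection errors, timeouts, and the HTTPError raised above
        print(f"Request failed: {e}")

# Example URL; run only when the file is executed directly, not when imported
if __name__ == "__main__":
    url = "https://example.com"
    simple_web_crawler(url)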
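Because simple_web_crawler prints its results and needs network access, the parsing step is hard to check on its own. The sketch below is a minimal illustration, assuming the same bs4 package: the helper extract_h1_texts and the sample_html string are hypothetical, and the function returns the headings as a list instead of printing them. Passing a separator to get_text() keeps words in nested tags such as <em> from running together.

```python
from bs4 import BeautifulSoup


def extract_h1_texts(html):
    """Return the text of every <h1> in an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # " " joins text from nested tags; strip=True trims each text fragment
    return [h1.get_text(" ", strip=True) for h1 in soup.find_all("h1")]


# Hypothetical sample page, parsed locally so no HTTP request is made
sample_html = """
<html>
  <body>
    <h1>  First heading </h1>
    <p>Not a heading</p>
    <h1>Second <em>heading</em></h1>
  </body>
</html>
"""

print(extract_h1_texts(sample_html))  # ['First heading', 'Second heading']
```

Separating fetching from parsing this way lets the same parsing function be reused on a saved HTML file or on the response.text of a real request.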

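Explanation:

Importing the libraries:
- requests: sends HTTP requests.
- BeautifulSoup: parses HTML documents.
Defining the function simple_web_crawler:
- Takes a URL as its parameter.
- Sends an HTTP GET request with requests.get() to retrieve the page content.
- Calls response.raise_for_status() so that a 4xx or 5xx status code raises an exception.
- Parses the HTML document with BeautifulSoup.
- Extracts every <h1> tag and prints its text content.
Exception handling:
- A try-except block catches requests.RequestException, which covers connection errors as well as the HTTPError raised by raise_for_status(), and prints the error message.
Example call:
- Calls the crawler function with the example URL "https://example.com".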
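The exception handling can be checked without a live server. The following is a minimal sketch, not part of the original article: the helper outcome and the hand-built requests.Response objects are hypothetical illustrations, used only to show that both a malformed URL and a 4xx status end up in an except requests.RequestException branch.

```python
import requests


def outcome(action):
    """Run a zero-argument callable (hypothetical helper).

    Returns "ok" if no exception was raised, otherwise the class name of the
    caught RequestException subclass.
    """
    try:
        action()
        return "ok"
    except requests.RequestException as e:
        return type(e).__name__


# A URL without a scheme is rejected while the request is being prepared,
# so this raises MissingSchema without opening any network connection.
print(outcome(lambda: requests.get("example.com", timeout=10)))  # MissingSchema

# A hand-built Response simulates a 404 reply without contacting a server;
# raise_for_status() turns the 4xx status into an HTTPError.
fake_404 = requests.Response()
fake_404.status_code = 404
print(outcome(fake_404.raise_for_status))  # HTTPError

# A 200 status passes raise_for_status() silently.
fake_200 = requests.Response()
fake_200.status_code = 200
print(outcome(fake_200.raise_for_status))  # ok
```

Building a Response by hand is only a testing convenience here; in real use the Response comes from requests.get().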
