import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url_list = ['https://www.sdtbu.edu.cn/info/1043/35641.htm','https://www.sdtbu.edu.cn/info/1043/35612.htm']

def fetch_and_parse(url):
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'html.parser')

    # Report the page title (guard against pages without a <title> tag)
    title = soup.find('title')
    print("Page title:", title.text if title else "N/A")

    # The article body sits in a div with class "content-box"
    content_box = soup.find('div', class_='content-box')
    if not content_box:
        print("No div element with class 'content-box' was found")
        return

    img_tags = content_box.find_all('img')
    if not img_tags:
        print("No images were found inside content-box")
        return

    print(f"Found {len(img_tags)} image(s) in total.")
    # Inspect each image referenced in the article body
    for i, img in enumerate(img_tags, 1):
        img_url = img.get('src')
        if not img_url:
            print(f"Image {i}: no src attribute, skipping")
            continue
        # src may be relative, so resolve it against the page URL
        full_img_url = urljoin(url, img_url)

        try:
            img_response = requests.get(full_img_url, stream=True, timeout=5)
            img_response.raise_for_status()

            # Prefer the Content-Length header; fall back to downloading the body
            content_length = img_response.headers.get('content-length')
            if content_length:
                size_bytes = int(content_length)
            else:
                size_bytes = len(img_response.content)

            if size_bytes >= 1024 * 1024:
                size_str = f"{size_bytes / (1024 * 1024):.2f} MB"
            else:
                size_str = f"{size_bytes / 1024:.2f} KB"
            print(f"Image {i}: {size_str} ({full_img_url})")
        except requests.exceptions.RequestException as e:
            print(f"Failed to download image: {e}")
        except Exception as e:
            print(f"Error while processing image: {e}")
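

# Possible refinement (not part of the original script): the size check could
# avoid downloading image bytes entirely by issuing a HEAD request first and
# only falling back to GET when the server omits Content-Length. A minimal
# sketch, assuming the target servers answer HEAD requests; the helper name
# get_image_size is hypothetical.
def get_image_size(img_url, timeout=5):
    head = requests.head(img_url, allow_redirects=True, timeout=timeout)
    head.raise_for_status()
    content_length = head.headers.get('content-length')
    if content_length:
        return int(content_length)
    # Fallback: download the body and measure it
    body = requests.get(img_url, timeout=timeout)
    body.raise_for_status()
    return len(body.content)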


if __name__ == "__main__":
    for url in url_list:
        fetch_and_parse(url)