Python - How to use BeautifulSoup - Stack Overflow

顺晟科技

2022-10-18 13:20:37

Extracting href links from a table th

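I am trying to create a list of all football teams/links from any one of the multiple tables at the base URL: https://fbref.com/en/comps/10/stats/Championship-Stats

I would then use the links from the href to scrape each team's data. The href is embedded in the th tag, as shown below: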

<th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>

<a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a>

The code below gives me a list of the th elements that contain the 'a' tags:

page = "https://fbref.com/en/comps/10/Championship-Stats"
pageTree = requests.get(page)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
Teams = pageSoup.find_all("th", {"class": "left"})

Output (for each th of class "left"):

<th class="left" data-stat="squad" scope="row"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th>,

I have tried the guidance from a previous Stack Overflow question (extracting links in BeautifulSoup), but the following code, based on that thread, produces an error:

AttributeError: 'NoneType' object has no attribute 'find_parent'

def import_TeamList():
    BASE_URL = "https://fbref.com/en/comps/10/Championship-Stats"
    r = requests.get(BASE_URL)
    soup = BeautifulSoup(r.text, 'lxml')
    team_list = []
    # soup.find() returns None when no <a> carries data-stat="squad"
    # (in the markup above that attribute is on the <th>), so calling
    # .find_parent() on the result raises the AttributeError
    team_tr = soup.find('a', {'data-stat': 'squad'}).find_parent('tr')
    for tr in team_tr.find_next_siblings('tr'):
        if tr.find('a').text != 'squad':
            break
        team_list.append(BASE_URL + tr.find('a')['href'])
    return team_list

顺晟科技:

Here is a version using CSS selectors, which I find simpler than most other methods.
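As an illustration of the selector-based idea, here is a minimal sketch that narrows the match to the squad column. It is not the answerer's original code: the function name `squad_links` and the extra header row in the sample markup are hypothetical, and the Barnsley row is copied from the question so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# The Barnsley row is copied from the question. The first row is a
# hypothetical extra cell, included only to show that it is filtered out.
SAMPLE_HTML = """
<table>
  <tr><th data-stat="header_for"><a href="/en/hypothetical">Header link</a></th></tr>
  <tr><th scope="row" class="left " data-stat="squad"><a href="/en/squads/293cb36b/Barnsley-Stats">Barnsley</a></th></tr>
</table>
"""


def squad_links(html):
    """Return (team name, href) pairs for links inside <th data-stat="squad"> cells."""
    soup = BeautifulSoup(html, "html.parser")
    # Attribute selector: only <a href=...> tags nested in a <th> whose
    # data-stat attribute equals "squad"
    anchors = soup.select('th[data-stat="squad"] a[href]')
    return [(a.get_text(strip=True), a["href"]) for a in anchors]


print(squad_links(SAMPLE_HTML))
```

On a live page the same function could be fed `requests.get(url).text`. Whether the real tables contain other `th` links that need excluding is an assumption here, not something the thread confirms.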

Is this what you are looking for?

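import requests
from bs4 import BeautifulSoup

url = 'https://fbref.com/en/comps/10/stats/Championship-Stats'
data = requests.get(url).text
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')

# CSS selector: every <a> nested inside a <th>
links = soup.select('th a')
urls = [link['href'] for link in links]
print(urls)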
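The hrefs collected this way are site-relative paths such as /en/squads/293cb36b/Barnsley-Stats, so appending them to the page URL would not give a valid address. A minimal sketch of turning them into absolute URLs for the follow-up scraping step is below. The helper name `absolute_team_urls` is hypothetical, and the de-duplication assumes the same team link may appear in more than one table on the page.

```python
from urllib.parse import urljoin

BASE_URL = "https://fbref.com/en/comps/10/stats/Championship-Stats"


def absolute_team_urls(hrefs, base=BASE_URL):
    """Resolve site-relative hrefs against base, dropping repeats but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        # An href starting with "/" replaces the whole path of the base URL
        full = urljoin(base, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


hrefs = [
    "/en/squads/293cb36b/Barnsley-Stats",
    "/en/squads/293cb36b/Barnsley-Stats",  # repeated, as it might be across tables
]
print(absolute_team_urls(hrefs))
# → ['https://fbref.com/en/squads/293cb36b/Barnsley-Stats']
```

Using `urljoin` rather than string concatenation keeps the scheme and host from the base URL and discards its path, which is what a leading-slash href calls for.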
