
HTML - Python, Jupyter Notebook: downloading an Excel file from a URL - Stack Overflow

顺晟科技

2022-10-18 12:36:27


I am currently trying to access some data on the ABS website.

https://www.abs.gov.au/statistics/labour/earnings-and-work-hours/weekly-payroll-jobs-and-wages-australia/latest-release#data-download.

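Specifically, Table 5.

The name of the Excel file changes with every release. I would like to keep my data up to date by downloading the file automatically and loading it into a data frame.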

Progress so far:

Thanks to Beautiful Soup, I can get the list of URLs on the page with the following code.

#####Step 1: start by importing all of the necessary packages#####
import requests #requesting URLs
import urllib.request #requesting URLs
import pandas as pd #for simplifying data operations (e.g. creating dataframe objects)
from bs4 import BeautifulSoup #for web-scraping operations

#####Step 2: connect to the URL in question for scraping#####
url = 'https://www.abs.gov.au/statistics/labour/earnings-and-work-hours/weekly-payroll-jobs-and-wages-australia/latest-release' 
response = requests.get(url) #Connect to the URL using the "requests" package
response #if successful then it will return 200

#####Step 3: read in the URL via the "BeautifulSoup" package#####
soup = BeautifulSoup(response.text, 'html.parser') 

#####Step 4: html print#####
for link in soup('a'):
    print(link.get('href'))

##how to get the link to table 5?##
# url = ?

##last step to save into data frame##
ws = pd.read_excel(url, sheet_name='Payroll jobs index-SA4', skiprows=5)

顺晟科技:

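You can look up the DIV class that wraps the XLSX links on the page, use the find_all method to return the list of matching elements, and use index 1 to get the href: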

import requests
import pandas as pd  # needed for read_excel below
from bs4 import BeautifulSoup

url = 'https://www.abs.gov.au/statistics/labour/earnings-and-work-hours/weekly-payroll-jobs-and-wages-australia/latest-release'
response = requests.get(url)
response  # in a notebook this displays <Response [200]> on success
soup = BeautifulSoup(response.text, 'html.parser')

# Each download link sits in a div with this class; index 1 is the second file listed
url = soup.find_all("div", class_="abs-data-download-right")[1].find("a")['href']
pd.read_excel(url, sheet_name='Payroll jobs index-SA4', skiprows=5, engine='openpyxl')
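Picking the link by position (index 1) relies on the page always listing the files in the same order. A possible alternative is to match on the file name instead. The sketch below assumes that Table 5 is always published as a file ending in `_do005.xlsx`, which is only inferred from the sample output in this answer and is not confirmed by the ABS. The helper name `pick_table_link` and its parameters are hypothetical.

```python
import re
from urllib.parse import urljoin, urlparse


def pick_table_link(hrefs, table_number, base_url=""):
    """Return the first link whose file name ends in _do<NNN>.xlsx.

    Assumption: the numeric suffix in the file name matches the table number.
    """
    pattern = re.compile(rf"_do{table_number:03d}\.xlsx$", re.IGNORECASE)
    for href in hrefs:
        if not href:
            continue  # skip anchors without an href
        absolute = urljoin(base_url, href)  # resolves relative links, keeps absolute ones
        if pattern.search(urlparse(absolute).path):
            return absolute
    raise LookupError(f"no link found for table {table_number}")
```

With the soup object from the answer, it might be called as `pick_table_link([a.get("href") for a in soup.find_all("a")], 5, base_url=url)` before `url` is reassigned. If the naming scheme changes, the function raises `LookupError` rather than silently returning the wrong table.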

To list all the URLs:

urls=soup.find_all("div",class_="abs-data-download-right")
for i in urls:
    print(i.find("a")['href'])

Output:

https://www.abs.gov.au/statistics/labour/earnings-and-work-hours/weekly-payroll-jobs-and-wages-australia/week-ending-31-july-2021/6160055001_do004.xlsx
https://www.abs.gov.au/statistics/labour/earnings-and-work-hours/weekly-payroll-jobs-and-wages-australia/week-ending-31-july-2021/6160055001_do005.xlsx
...
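Once one of these URLs has been chosen, pd.read_excel can usually read it directly. If that fails, for example because the server rejects the request or a timeout is needed, one option is to fetch the bytes with requests first and hand pandas an in-memory buffer. This is a minimal sketch rather than part of the original answer; `fetch_workbook` and its `get` parameter are hypothetical names, and `get` exists only so that the download function can be swapped out.

```python
import io


def fetch_workbook(url, get=None, timeout=30):
    """Download a file and return its content as an in-memory binary buffer."""
    if get is None:
        import requests  # imported lazily so a replacement getter can be injected
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return io.BytesIO(response.content)
```

The buffer can then be passed in place of the URL, as in `pd.read_excel(fetch_workbook(url), sheet_name='Payroll jobs index-SA4', skiprows=5, engine='openpyxl')`.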