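Today I'm sharing some notes related to the question "Is id an HTML attribute?". The content is detailed and clearly organized, and since many readers may not know this area well yet, I hope you get something useful out of it. Let's take a look.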
顺晟科技
2022-09-15 20:07:07
Common XPath matching rules (lxml):
Getting attributes:

```python
from lxml import etree

html = '<div><a class="du" href="http://www.baidu.com">Baidu</a></div>'
parser = etree.HTML(html)
# /@href returns the href value of every <a> whose class is exactly "du"
result = parser.xpath('//a[@class="du"]/@href')
print(result)
```
Getting text:

```python
from lxml import etree

html = '<div><a class="du" href="http://www.baidu.com">Baidu</a></div>'
parser = etree.HTML(html)
# text() returns the text nodes directly inside the matched <a>
result = parser.xpath('//a[@class="du"]/text()')
print(result)
```
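One detail the example above does not show: `text()` only returns text nodes that are direct children of the matched element. Here is a small sketch, assuming lxml is installed and using a made-up snippet where part of the link text sits inside a nested `<b>`, comparing `/text()`, `//text()` and the XPath `string()` function:

```python
from lxml import etree

# Hypothetical snippet: part of the link text is wrapped in <b>
html = '<div><a class="du" href="http://www.baidu.com">Bai<b>du</b> search</a></div>'
parser = etree.HTML(html)

# /text(): only the <a>'s own text nodes ("Bai" and the tail " search" after </b>)
own = parser.xpath('//a[@class="du"]/text()')

# //text(): every text node anywhere below the <a>, in document order
all_nodes = parser.xpath('//a[@class="du"]//text()')

# string(): the concatenated text of the first matched node as one string
joined = parser.xpath('string(//a[@class="du"])')

print(own)        # ['Bai', ' search']
print(all_nodes)  # ['Bai', 'du', ' search']
print(joined)     # Baidu search
```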
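Matching multi-valued attributes:

```python
from lxml import etree

html = '<div><a class="du baidu" href="http://www.baidu.com">Baidu</a></div>'
parser = etree.HTML(html)
# @class="du" would not match here because the attribute value is "du baidu";
# contains() matches any class attribute that includes the substring "du"
result = parser.xpath('//a[contains(@class, "du")]/text()')
print(result)
```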
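Note that `contains()` is a plain substring test, so `contains(@class,"du")` also matches an element whose only class is `baidu`. A common workaround, sketched below on a made-up snippet (assuming lxml), pads the normalized class string with spaces and looks for `" du "`, so only the whole class token matches:

```python
from lxml import etree

# Hypothetical snippet: the third link only has "du" inside "baidu"
html = """
<div>
  <a class="du" href="#1">one</a>
  <a class="du baidu" href="#2">two</a>
  <a class="baidu" href="#3">three</a>
</div>
"""
parser = etree.HTML(html)

# Substring test: "baidu" contains "du", so all three links match
loose = parser.xpath('//a[contains(@class, "du")]/text()')

# Token test: " du " only occurs when "du" is a complete space-separated class
strict = parser.xpath(
    '//a[contains(concat(" ", normalize-space(@class), " "), " du ")]/text()'
)

print(loose)   # ['one', 'two', 'three']
print(strict)  # ['one', 'two']
```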
Matching multiple attributes:

```python
from lxml import etree

html = '<div><a name="item" class="du baidu" href="http://www.baidu.com">Baidu</a></div>'
parser = etree.HTML(html)
# Combine several conditions with "and" (or "or") inside a single predicate
result = parser.xpath('//a[contains(@class, "du") and @name="item"]/text()')
print(result)
```
Selecting by position:

```python
from lxml import etree

html = """
<li>item1</li>
<li>item2</li>
<li>item3</li>
<li>item4</li>
<li>item5</li>
"""
parser = etree.HTML(html)
result = parser.xpath('//li[1]/text()')  # the first one (XPath positions start at 1)
print(result)
result = parser.xpath('//li[last()]/text()')  # the last one
print(result)
result = parser.xpath('//li[position()<3]/text()')  # the first and second
print(result)
result = parser.xpath('//li[last()-2]/text()')  # the third from last
print(result)
```
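Positional predicates bind to the step they follow, so `//li[1]` means "the first `<li>` child of each parent", not "the first `<li>` in the document". In the example above all items share one parent, so the difference does not show. A minimal sketch with two made-up lists, assuming lxml:

```python
from lxml import etree

# Hypothetical snippet: two separate lists, each with its own first <li>
html = """
<ul><li>a1</li><li>a2</li></ul>
<ul><li>b1</li><li>b2</li></ul>
"""
parser = etree.HTML(html)

# //li[1]: every <li> that is the first <li> child of its parent
per_parent = parser.xpath('//li[1]/text()')

# (//li)[1]: collect all <li> in document order, then take the first
overall = parser.xpath('(//li)[1]/text()')

print(per_parent)  # ['a1', 'b1']
print(overall)     # ['a1']
```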
More XPath functions: http://www.w3school.com.cn/xpath/xpath_functions.asp
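BeautifulSoup node selectors:

```python
from bs4 import BeautifulSoup

html = """
<div>
    <li class="d1">item1</li>
    <li class="d2">item2</li>
    <li class="d3">item3</li>
    <li class="d4">item4</li>
    <li class="d5">item5</li>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
# soup.div is the first <div>; .children iterates over its direct children,
# including the whitespace text nodes between the <li> tags
result = soup.div.children
print(result)  # an iterator object, not a list
for value in result:
    print(value.string)
```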
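`.children` only goes one level down and includes the whitespace strings between tags, which is why the loop above also prints whitespace-only lines. A sketch, assuming bs4 is installed and using the built-in `html.parser` backend so lxml is not required, that keeps only `Tag` children and compares them with `.descendants`:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical snippet: the second <li> contains a nested <b>
html = '<div> <li class="d1">item1</li> <li class="d2"><b>item2</b></li> </div>'
soup = BeautifulSoup(html, "html.parser")

# .children: direct children only, whitespace strings included
children = list(soup.div.children)

# Keep element children (Tag objects) and drop the text nodes
tags = [child for child in soup.div.children if isinstance(child, Tag)]

# .descendants walks the whole subtree, so the nested <b> shows up too
descendant_names = [d.name for d in soup.div.descendants if isinstance(d, Tag)]

print(len(children), len(tags))
print([t["class"] for t in tags])  # class is multi-valued, so each is a list
print(descendant_names)
```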
Method selectors:

```python
# find_all(name, attrs, recursive, string, limit, **kwargs)
# (string is called text in older bs4 releases)
from bs4 import BeautifulSoup

html = """
<div>
    <li class="d1">item1</li>
    <li class="d2">item2</li>
    <li class="d3">item3</li>
    <li class="d4">item4</li>
    <li class="d5">item5</li>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
for div in soup.find_all(name="div"):
    # class_ has a trailing underscore because class is a Python keyword
    text = div.find_all(name="li", class_="d3")[0].get_text()  # same as .string here
    print(text)
```
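The signature in the comment above lists more parameters than the example uses. Here is a sketch of each on a made-up snippet, assuming a bs4 4.x release recent enough to accept `string=` (4.4 or later, to my knowledge); `html.parser` is used only to avoid depending on lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: item3 is nested one level deeper inside a <ul>
html = """
<div>
  <li class="d1">item1</li>
  <li class="d2">item2</li>
  <ul><li class="d3">item3</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# attrs: a dict works for any attribute name
by_attrs = div.find_all("li", attrs={"class": "d2"})

# recursive=False: only direct children of <div>, so item3 is skipped
direct = div.find_all("li", recursive=False)

# limit: stop after the first N matches
first_two = div.find_all("li", limit=2)

# string: match text nodes instead of tags
by_string = div.find_all(string="item3")

# find() returns the first match (or None) instead of a list
first = div.find("li", class_="d3")

print([li.get_text() for li in by_attrs])   # ['item2']
print([li.get_text() for li in direct])     # ['item1', 'item2']
print([li.get_text() for li in first_two])  # ['item1', 'item2']
print([str(s) for s in by_string])          # ['item3']
print(first.get_text())                     # item3
```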
CSS selectors:

```python
from bs4 import BeautifulSoup

html = """
<div>
    <li class="d1">item1</li>
    <li class="d2">item2</li>
    <li class="d3">item3</li>
    <li class="d4" name="d">item4</li>
    <li class="d5">item5</li>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
# select() takes a CSS selector and always returns a list of Tag objects
result = soup.select('div li[name="d"]')
for value in result:
    print(type(value))
    print(value.get_text())
```
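`select()` accepts most CSS3 selectors. A short sketch, assuming bs4 4.7 or later (where CSS support comes from the bundled soupsieve package), of class, attribute, id/child and positional selectors, plus `select_one()`; the snippet is made up:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet
html = """
<div id="box">
  <li class="d1">item1</li>
  <li class="d2 hot">item2</li>
  <li class="d3" name="d">item3</li>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

hot = soup.select("li.hot")                       # class selector
named = soup.select_one('li[name="d"]')           # attribute selector, first match only
second = soup.select("#box > li:nth-of-type(2)")  # id + child combinator + position
missing = soup.select_one("li.missing")           # no match gives None

print([li.get_text() for li in hot])  # ['item2']
print(named.get_text())               # item3
print(second[0]["class"])             # ['d2', 'hot']
print(missing)                        # None
```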
PyQuery initialization
From a string:

```python
from pyquery import PyQuery as pq

html = "<a href='http://www.baidu.com'>Baidu Search</a>"
parser = pq(html)
print(parser)
```
From a URL:

```python
from pyquery import PyQuery as pq

# Downloads the page over the network, then parses it
parser = pq(url="http://www.baidu.com")
print(parser)
```
From a file:

```python
from pyquery import PyQuery as pq

# Parses a local HTML file (demo.html must exist in the working directory)
parser = pq(filename="demo.html")
print(parser)
```
Finding nodes
CSS selectors:

```python
from pyquery import PyQuery as pq

html = """
<div class="qrcode-text" id="1">
    Text of the div
    <p class="title">Title<a href="http://www.baidu.com">Baidu Search</a></p>
    <p class="content">Content</p>
</div>
"""
parser = pq(html)
# Descendant selector: an <a> inside .title inside .qrcode-text
result = parser(".qrcode-text .title a")
print(result)
```
children(): find child nodes
find(): find descendant nodes
parent(): find the parent node
parents(): find ancestor nodes
siblings(): find sibling nodes
```python
from pyquery import PyQuery as pq

html = """
<body>
<div class="qrcode-text" id="1">
    Text of the div
    <p class="title">Title<a class="du" href="http://www.baidu.com">Baidu Search</a></p>
    <p class="content">Content
        <span class="first">First line</span>
    </p>
</div>
</body>
"""
parser = pq(html)
result = parser(".content").children()  # direct children of .content
print(result)
result = parser.find("span")  # any <span> below the root
print(result)
result = parser("span").parent()  # the element directly above <span>
print(result)
result = parser("span").parents("#1")  # ancestors, filtered by a selector
print(result)
result = parser(".title").siblings()  # other children of the same parent
print(result)
```
Getting information
Get an attribute: attr()
Inner text: text()
Inner HTML: html()
```python
from pyquery import PyQuery as pq

html = """
<body>
<div class="item_1"><span>1.</span>Line one</div>
<div class="item_2"><span>2.</span>Line two</div>
<div class="item_3"><span>3.</span>Line three</div>
</body>
"""
parser = pq(html)
result = parser("div")
for value in result.items():  # items() yields each <div> as a PyQuery object
    print(value.attr("class"))  # the class attribute value
    print(value.text())  # text content, including the <span>
    print(value.html())  # inner HTML markup
```
Node manipulation
PyQuery can modify nodes in place.
removeClass()
addClass()
```python
from pyquery import PyQuery as pq

html = """
<body>
<div class="item_1"><span>1.</span>Line one</div>
<div class="item_2"><span>2.</span>Line two</div>
<div class="item_3"><span>3.</span>Line three</div>
</body>
"""
parser = pq(html)
result = parser("div")
for n, value in enumerate(result.items(), 1):
    value.removeClass(value.attr("class"))  # drop the current class
    value.addClass(str(n))  # replace it with "1", "2", "3"
    print(value)
```
attr()
text()
```python
from pyquery import PyQuery as pq

html = """
<body>
<div class="item_1"><span>1.</span>Line one</div>
<div class="item_2"><span>2.</span>Line two</div>
<div class="item_3"><span>3.</span>Line three</div>
</body>
"""
parser = pq(html)
result = parser("div")
for n, value in enumerate(result.items(), 1):
    value.attr(id=str(n))  # set (or overwrite) the id attribute
    value.text('Hello World')  # replace all content with plain text
    print(value)
```
remove()
```python
from pyquery import PyQuery as pq

html = """
<body>
Hello World!
<div class="item_1"><span>1.</span>Line one</div>
<div class="item_2"><span>2.</span>Line two</div>
<div class="item_3"><span>3.</span>Line three</div>
</body>
"""
parser = pq(html)
result = parser("body")
value = result.remove("div")  # removes the <div>s inside <body> and returns <body>
print(value.text())
```
More usage: http://pyquery.readthedocs.io/en/latest/api.html