r - Splitting one large line into multiple lines to create a dataframe from HTML? - Stack Overflow

顺晟科技

2022-10-19 13:19:36

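I am currently reading the HTML of this web page into R and processing it to extract the useful data into a new dataframe.

A visual inspection of the page source shows that the lines containing the data values all start with "<td>". Here is my code so far: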

thepage<-readLines('https://www.worldometers.info/world-population/population-by-country/')

dataline <- grep('<td>', thepage)
dataline

This tells me that all of the data is on line 11. So I did this:

thepage<-readLines('https://www.worldometers.info/world-population/population-by-country/')

dataline <- grep('<td>', thepage)
dataline

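This doesn't help at all, because "data" is still one huge line. How can I split that huge line into multiple rows? My preferred dataframe would look like this: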

TIA。

顺晟科技：

How about the following?

thepage<-readLines('https://www.worldometers.info/world-population/population-by-country/')

dataline <- grep('<td>', thepage)
dataline
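
One possible way to break the single long line into rows, using only base R, is to split on the closing "</tr>" tag and then pull the "<td>" cells out of each piece. This is a rough sketch rather than a tested solution for the live page: `bigline` is a made-up stand-in for `thepage[dataline]`, the values in it are placeholders, and the column names are hypothetical.

```r
# Made-up stand-in for the one huge line; on the real page this would be
# thepage[dataline]. The values are placeholders, not real data.
bigline <- paste0(
  "<tr><td>1</td><td>Country A</td><td>1,000</td></tr>",
  "<tr><td>2</td><td>Country B</td><td>2,500</td></tr>"
)

# Split the line into one string per table row.
rows <- strsplit(bigline, "</tr>", fixed = TRUE)[[1]]

# Keep only the pieces that actually contain <td> cells.
rows <- rows[grepl("<td", rows, fixed = TRUE)]

# Extract the text of every cell in one row.
get_cells <- function(row) {
  cells <- regmatches(row, gregexpr("<td[^>]*>.*?</td>", row, perl = TRUE))[[1]]
  # Strip the remaining tags (including nested links) and surrounding spaces.
  trimws(gsub("<[^>]+>", "", cells))
}

cells <- lapply(rows, get_cells)

# Assumes every row has the same number of cells.
df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(df) <- c("rank", "country", "population")  # hypothetical column names

# Convert "1,000" style strings to numbers.
df$rank <- as.integer(df$rank)
df$population <- as.numeric(gsub(",", "", df$population, fixed = TRUE))

print(df)
```

Regular expressions are fragile on real HTML, so if installing a package is an option, rvest's `read_html()` followed by `html_table()` is probably the more robust route.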
