PROSAGA - matlab - Reading data from Wikipedia using the API [closed]

<div class="post-text" itemprop="text">
  <p>
    One possible solution is to read the web page with
    <a href="https://www.mathworks.com/help/matlab/ref/webread.html" rel="nofollow noreferrer">
      webread
    </a>
    and then process the data with functions from the
    <a href="https://www.mathworks.com/help/textanalytics/index.html" rel="nofollow noreferrer">
      Text Analytics Toolbox
    </a>
    :
  </p>
   <pre>
    <code>
% Query the MediaWiki API; webread decodes the JSON response into a struct.
raw = webread('https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=text&page=91st_Academy_Awards');

% Specify sections of interest.
SectionsOfInterest = ["Date","Site","Preshow hosts","Produced by","Directed by"];

% Parse the HTML. The JSON key "*" is exposed as the field x_.
myTree = htmlTree(raw.parse.text.x_);

% Find the table elements and keep the first one.
tableElements = findElement(myTree,"table");
tableOfInterest = tableElements(1);

% Find header cell elements.
thElements = findElement(tableOfInterest,"th");
% Find data cell elements.
tdElements = findElement(tableOfInterest,"td");

% Extract text.
thHTML = thElements.extractHTMLText;
tdHTML = tdElements.extractHTMLText;

for section = 1:numel(SectionsOfInterest)

    % Locate the header cell whose text matches the section name.
    sectionName = SectionsOfInterest(section);
    sectIndex = strcmp(sectionName,thHTML);

    % Remove spaces from the section name so it is a valid field name.
    sectionName = strrep(sectionName,' ','');

    % Clean up data: replace runs of newlines with a period.
    sectData = regexprep(tdHTML(sectIndex),'\n+','.');

    % Store the result in the output structure.
    s.(sectionName) = sectData;
end

</code>
  </pre>
  <p>
    Display the output structure:
  </p>
   <pre>
    <code>
>> s
s =

  struct with fields:

            Date: "February 24, 2019"
            Site: "Dolby Theatre.Hollywood, Los Angeles, California, U.S."
    Preshowhosts: "Ashley Graham.Maria Menounos.Elaine Welteroth.Billy Porter.Ryan Seacrest. "
      Producedby: "Donna Gigliotti.Glenn Weiss"
      Directedby: "Glenn Weiss"

</code>
  </pre>
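  <p>
    The loop above matches header text against data cells by position, so it depends on the header and data cells of the table lining up one-to-one. As a hedged alternative, the sketch below pairs cells row by row instead. The function name infoboxToStruct is hypothetical (it is not part of any toolbox), and the sketch assumes each row of interest holds exactly one header cell and one data cell:
  </p>
   <pre>
    <code>
function info = infoboxToStruct(tableTree)
% infoboxToStruct  Hypothetical helper, not part of any toolbox.
% Pairs each header cell with the data cell in the same table row.
% tableTree is an htmlTree for a single table, e.g. tableOfInterest.
info = struct();
rows = findElement(tableTree,"tr");
for k = 1:numel(rows)
    th = findElement(rows(k),"th");
    td = findElement(rows(k),"td");

    % Skip rows that do not hold exactly one header and one data cell.
    if numel(th) ~= 1 || numel(td) ~= 1
        continue
    end

    % Build a valid field name, e.g. "Directed by" becomes "Directedby".
    key = matlab.lang.makeValidName(strrep(extractHTMLText(th)," ",""));

    % Same clean-up as in the loop above: newline runs become a period.
    info.(key) = regexprep(extractHTMLText(td),'\n+','.');
end
end
</code>
  </pre>
  <p>
    Calling infoboxToStruct(tableOfInterest) would return every header and value pair in the table rather than a fixed list of sections. Rows with the same header text would overwrite each other, so this is only a starting point.
  </p>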
</div>
