Problem parsing XML content into the correct data types in MATLAB

Question (votes: 1, answers: 1)

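I am having trouble parsing XML data into a MATLAB struct.

Using the off-the-shelf xml2struct function, I can do this. However, the content of the leaf nodes comes back as char, and I would like it to be double, uint8, or whatever type is declared in the XML element's attributes.

Take this XML data as an example: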

<?xml version="1.0"?>
<catalog>
   <book id="bk101" type="struct" size="1 1">
      <genre>Computer</genre>
      <price type="double">44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102" type="struct" size="1 1">
      <genre>Fantasy</genre>
      <price type="number" size="1 1">112</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

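I want to write MATLAB code that parses the data above into a struct, but stores the price of the first book directly as double and the price of the second book as uint8, according to their attributes.

When I parse the XML data with the function as it is, the value of catalog.book{1,1}.price.Text comes back as the char '44.95'.

Using MATLAB functions directly, I can retrieve the XML attributes, but I am confused about how to extend the functions in xml2struct so that the content (such as the price) is converted according to the element's attributes.

Can anyone give me a hint as to where in the function I can manipulate the XML content to change its data type? I have stepped through it in the debugger to understand exactly how the function works, but I still cannot find a solution...

Below is the complete xml2struct function code: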
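For reference, here is a minimal sketch of what reading an attribute straight from the DOM can look like. It assumes the Java DOM object returned by xmlread; the temporary file and the variable names are only for illustration and are not part of xml2struct.

```matlab
% Write a tiny XML document to a temporary file so the sketch is self-contained.
xmlText = '<catalog><book id="bk101"><price type="double">44.95</price></book></catalog>';
xmlFile = [tempname '.xml'];
fid = fopen(xmlFile, 'w');
fprintf(fid, '%s', xmlText);
fclose(fid);

% xmlread returns a Java DOM document; its node lists are zero-based.
xDoc = xmlread(xmlFile);
priceNodes = xDoc.getElementsByTagName('price');
priceNode = priceNodes.item(0);

% Both calls return Java strings, so convert them to MATLAB char.
typeName = char(priceNode.getAttribute('type'));   % declared type
rawText  = char(priceNode.getTextContent());       % element text, still char

delete(xmlFile);
```

The attribute and the text are both plain char at this point, which is why the conversion has to happen somewhere after the node has been read.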

function [ s ] = xml2struct( file )
%Convert xml file into a MATLAB structure
% [ s ] = xml2struct( file )
%
% A file containing:
% <XMLname attrib1="Some value">
%   <Element>Some text</Element>
%   <DifferentElement attrib2="2">Some more text</DifferentElement>
%   <DifferentElement attrib3="2" attrib4="1">Even more text</DifferentElement>
% </XMLname>
%
% Will produce:
% s.XMLname.Attributes.attrib1 = "Some value";
% s.XMLname.Element.Text = "Some text";
% s.XMLname.DifferentElement{1}.Attributes.attrib2 = "2";
% s.XMLname.DifferentElement{1}.Text = "Some more text";
% s.XMLname.DifferentElement{2}.Attributes.attrib3 = "2";
% s.XMLname.DifferentElement{2}.Attributes.attrib4 = "1";
% s.XMLname.DifferentElement{2}.Text = "Even more text";
%
% Please note that the following characters are substituted
% '-' by '_dash_', ':' by '_colon_' and '.' by '_dot_'
%
% Written by W. Falkena, ASTI, TUDelft, 21-08-2010
% Attribute parsing speed increased by 40% by A. Wanner, 14-6-2011
% Added CDATA support by I. Smirnov, 20-3-2012
%
% Modified by X. Mo, University of Wisconsin, 12-5-2012

    if (nargin < 1)
        clc;
        help xml2struct
        return
    end

    if isa(file, 'org.apache.xerces.dom.DeferredDocumentImpl') || isa(file, 'org.apache.xerces.dom.DeferredElementImpl')
        % input is a java xml object
        xDoc = file;
    else
        %check for existence
        if (exist(file,'file') == 0)
            %Perhaps the xml extension was omitted from the file name. Add the
            %extension and try again.
            if (~contains(file,'.xml'))
                file = [file '.xml'];
            end

            if (exist(file,'file') == 0)
                error(['The file ' file ' could not be found']);
            end
        end
        %read the xml file
        xDoc = xmlread(file);
    end

    %parse xDoc into a MATLAB structure
    s = parseChildNodes(xDoc);

end

% ----- Subfunction parseChildNodes -----
function [children,ptext,textflag] = parseChildNodes(theNode)
    % Recurse over node children.
    children = struct;
    ptext = struct; textflag = 'Text';
    if hasChildNodes(theNode)
        childNodes = getChildNodes(theNode);
        numChildNodes = getLength(childNodes);

        for count = 1:numChildNodes
            theChild = item(childNodes,count-1);
            [text,name,attr,childs,textflag] = getNodeData(theChild);
            %[text,name,childs,textflag] = getNodeData(theChild);

            if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata-section'))
                %XML allows the same elements to be defined multiple times,
                %put each in a different cell
                if (isfield(children,name))
                    if (~iscell(children.(name)))
                        %put existing element into cell format
                        children.(name) = {children.(name)};
                    end
                    index = length(children.(name))+1;
                    %add new element
                    children.(name){index} = childs;
                    if(~isempty(fieldnames(text)))
                        children.(name){index} = text; 
                    end
                    %if(~isempty(attr)) 
                    %    children.(name){index}.('Attributes') = attr; 
                    %end
                else
                    %add previously unknown (new) element to the structure
                    children.(name) = childs;
                    if(~isempty(text) && ~isempty(fieldnames(text)))
                        children.(name) = text; 
                    end
                    if(~isempty(attr)) 
                        children.(name).('Attributes') = attr; 
                    end
                end
            else
                ptextflag = 'Text_Me';
                if (strcmp(name, '#cdata-section'))
                    ptextflag = 'CDATA';
                elseif (strcmp(name, '#comment'))
                    ptextflag = 'Comment';
                end

                %this is the text in an element (i.e., the parentNode) 
                if (~isempty(regexprep(text.(textflag),'[\s]*','')))
                    if (~isfield(ptext,ptextflag) || isempty(ptext.(ptextflag)))
                        ptext.(ptextflag) = text.(textflag);
                    else
                        %what to do when element data is as follows:
                        %<element>Text <!--Comment--> More text</element>

                        %put the text in different cells:
                        % if (~iscell(ptext)) ptext = {ptext}; end
                        % ptext{length(ptext)+1} = text;

                        %just append the text
                        ptext.(ptextflag) = [ptext.(ptextflag) text.(textflag)];
                    end
                end
            end

        end
    end
end

 % ----- Subfunction getNodeData -----
function [text,name,attr,childs,textflag] = getNodeData(theNode)
    % Create structure of node info.

    %make sure name is allowed as structure name
    name = toCharArray(getNodeName(theNode))';
%     name = strrep(name, '-', '_dash_');
%     name = strrep(name, ':', '_colon_');
%     name = strrep(name, '.', '_dot_');

    attr = parseAttributes(theNode);
    if (isempty(fieldnames(attr))) 
       attr = []; 
    end

    %parse child nodes
    [childs,text,textflag] = parseChildNodes(theNode);

    if (isempty(fieldnames(childs)) && isempty(fieldnames(text)))
        %get the data of any childless nodes
        % faster than if any(strcmp(methods(theNode), 'getData'))
        % no need to try-catch (?)
        % faster than text = char(getData(theNode));
        text.(textflag) = toCharArray(getTextContent(theNode))';
    end

end

% ----- Subfunction parseAttributes -----
function attributes = parseAttributes(theNode)
    % Create attributes structure.

    attributes = struct;
    if hasAttributes(theNode)
       theAttributes = getAttributes(theNode);
       numAttributes = getLength(theAttributes);

       for count = 1:numAttributes
%             attrib = item(theAttributes,count-1);
%             attr_name = regexprep(char(getName(attrib)),'[-:.]','_');
%             attributes.(attr_name) = char(getValue(attrib));

            %Suggestion of Adrian Wanner
            str = toCharArray(toString(item(theAttributes,count-1)))';
            k = strfind(str,'='); 
            attr_name = str(1:(k(1)-1));
%             attr_name = strrep(attr_name, '-', '_dash_');
%             attr_name = strrep(attr_name, ':', '_colon_');
%             attr_name = strrep(attr_name, '.', '_dot_');
            attributes.(attr_name) = str((k(1)+2):(end-1));
       end
    end
end

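Edit: I found the answer. To convert the content according to its attributes, the change has to go into the getNodeData subfunction. Specifically, I added this conditional block right after the call to parseChildNodes.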

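%parse child nodes
[childs,text,textflag] = parseChildNodes(theNode);

%convert the element text according to its "type" attribute, if any
if isfield(attr, 'type')
    switch attr.type
        case 'double'
            text.(textflag) = str2double(strsplit(text.(textflag)));
        case 'int'
            %str2number is not a MATLAB function; int32 is an assumed target class
            text.(textflag) = int32(str2double(strsplit(text.(textflag))));
    end
end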
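
The same idea can be tried outside xml2struct as a small standalone sketch. convertByType is a hypothetical helper name, and mapping type="number" to uint8 is an assumption taken from the goal stated in the question, not something the XML itself defines.

```matlab
% Try the helper on the two price values from the example XML.
p1 = convertByType('44.95', 'double');     % stays double
p2 = convertByType('112', 'number');       % becomes uint8 (assumed mapping)
p3 = convertByType('Computer', '');        % no known type: left as char

function value = convertByType(txt, typeName)
% convertByType  Hypothetical helper, not part of xml2struct.
%   Converts element text to the class named by the "type" attribute.
    tokens = strsplit(strtrim(txt));       % split on whitespace, e.g. '1 2 3'
    switch typeName
        case 'double'
            value = str2double(tokens);
        case 'number'
            value = uint8(str2double(tokens));
        otherwise
            value = txt;                   % unknown or missing type: keep the text
    end
end
```

Keeping the conversion in one function like this would make it easy to add further cases later without touching the parsing code again.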
xml matlab xml-parsing
1 answer

0 votes

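I think you need to convert the data type depending on the value of the price's Attributes.type.

Below is sample code that uses xml2struct. It checks the Attributes.type field of each price element and converts the data type according to the type ("number" or "double"). By default MATLAB treats numeric values as double, so converting with str2double yields the price 112 as a double. I think double is fine, but if you want to treat it as uint8, use uint8(str2double(price.Text)).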

xmlhandler = xml2struct('yourXmlFile.xml');
books = xmlhandler.catalog.book;

for ii=1:size(books, 2)
    eachBook = books{1, ii};
    price = eachBook.price;
    if isfield(price.Attributes, 'type')
        dataType = price.Attributes.type;

        % Add Value property depending on the data type
        switch dataType
            case 'number'
                books{1, ii}.price.Value = uint8(str2double(price.Text));
                %books{1, ii}.price.Value = str2double(price.Text); % If double is fine
            case 'double'
                books{1, ii}.price.Value = str2double(price.Text);
            otherwise
                % Handle any other (unrecognized) type here
        end
    end
end

The sample code above adds a Value field to price. If you would rather replace the Text field instead, do something like books{1, ii}.price.Text = str2double(price.Text).
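
As a quick check of what these conversions actually produce, here is a short sketch using only built-in functions. The input strings are made up for illustration, apart from the two prices taken from the example XML.

```matlab
% Plain conversion: str2double always returns class double.
asDouble = str2double('112');
asUint8  = uint8(asDouble);              % explicit cast to uint8

% Integer casts round to the nearest integer and saturate at the class limits,
% so uint8 only suits whole prices between 0 and 255.
rounded = uint8(str2double('44.95'));    % 45
clipped = uint8(str2double('300'));      % 255

% Text that is not a number becomes NaN instead of raising an error.
notANumber = str2double('n/a');

fprintf('%s %s %d %d %d\n', class(asDouble), class(asUint8), ...
    rounded, clipped, isnan(notANumber));
```

This is the main reason double is usually the safer default for prices: the uint8 cast silently changes 44.95 to 45.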
