Sunday, October 14, 2012

Optimized Lightweight XPath Parser Using AXIOM and the ANTLR Parser Generator

To evaluate an XPath expression we can use an XPath engine such as Jaxen, or AXIOM XPath, which is built on top of AXIOM. These XPath engines support the full XPath specification.
Most of the time, however, our use case does not need full XPath support. It needs a quick, optimized way to evaluate simple XPath expressions such as "/data/book/name", which point to a single element of an XML document.
So if we know that we want only the first match of an XPath expression, or that the expression has only one match, we don't need to build the full XML tree.
For example, to evaluate the XPath expression "/data/book/author/name" against the following XML, we don't have to look at the lecturers element at all.

  
       
   Andun
   Sri Lanka
       
  
  
       
           Mr_Thilak_Fernando
           
               108
               328
               568
           
      
      
           Ms_Pivithuru
           
               88
               328
               568
               808
               1048
           
      
      
           Mr_Nisansa
           
               88
               328
               568
               808
               1048
           
      
  
So in this post I will describe an implementation of a high-performance custom XPath parser that works under the following conditions:
  • It returns the first match of the XPath expression.
For an XPath such as "/data/book/author/name", this parser returns only the <name>Andun</name> element.

    
        
        Andun
        Sri Lanka
    
    
    
        
        Sameera
        Sri Lanka
        
    
  • It returns only a single element as the result. No node sets are returned.
For an XPath such as "/data/book/author/name", this parser returns only the <name>Andun</name> element, whereas the XPath specification defines the result of this expression as <name>Andun</name><name>Sameera</name>.

    
        
        Andun
        Sri Lanka
    
    
    
        
            
            Sameera
            Sri Lanka
            
        
    
  • It evaluates only absolute XPath expressions.
Expressions such as "/data/book/author/@age" are accepted, while expressions such as "//author/name" are rejected.
  • Complex XPath expressions that have predicates, logical combinations of results, etc. are not eligible.
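The conditions above amount to an eligibility test on the expression. As a rough illustration only (the parser described in this post does the check with an ANTLR grammar, not a regular expression), the following sketch accepts plain absolute paths with an optional trailing attribute step and rejects everything else. The class name `SupportedXPathCheck` and the method `isSupported` are hypothetical, and the name pattern is simplified to ASCII.

```java
import java.util.regex.Pattern;

final class SupportedXPathCheck {

    // An XML-like name with an optional namespace prefix (simplified: ASCII only).
    private static final String NAME = "[A-Za-z_][\\w.-]*(?::[A-Za-z_][\\w.-]*)?";

    // One or more "/name" steps, optionally followed by a final "/@attribute" step.
    private static final Pattern SIMPLE_ABSOLUTE =
            Pattern.compile("(?:/" + NAME + ")+(?:/@" + NAME + ")?");

    /** Returns true if the expression is a plain absolute path (hypothetical helper). */
    static boolean isSupported(String xpath) {
        return xpath != null && SIMPLE_ABSOLUTE.matcher(xpath).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSupported("/data/book/author/@age")); // plain absolute path
        System.out.println(isSupported("//author/name"));          // descendant axis
        System.out.println(isSupported("/data/book[1]/name"));     // predicate
    }
}
```

The first expression is accepted and the other two are rejected, which matches the conditions listed above.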
To implement this we built the following architecture on top of the AXIOM data model for XML. We created small components, each responsible for one simple operation on the AXIOM data model. For example:
  • A component that returns an XML node if it matches some conditions.
  • A component that returns the set of children of an XML node if it matches some conditions.
  • A component that returns an attribute of an XML node.
Each of these components takes an OMElement as input and outputs its result. We can also specify conditions for producing the result. By joining these components into a chain, we can get the result of an XPath expression.


In the chain, the XML is passed to the first component, the result of the first component is passed to the second, and so on. The result of the final component is the result of the XPath expression.
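The chaining idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual implementation: it uses the JDK's `org.w3c.dom.Element` as a stand-in for AXIOM's `OMElement` (so, unlike AXIOM, it loads the whole document), the names `Component`, `self`, `firstChild` and `run` are hypothetical, and each step takes only the first matching child without backtracking.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ComponentChainDemo {

    /** One link of the chain: takes the current element, returns the next one or null. */
    interface Component {
        Element apply(Element input);
    }

    /** Passes the element through only if its name matches. */
    static Component self(String name) {
        return input -> name.equals(input.getTagName()) ? input : null;
    }

    /** Returns the first child element with the given name, or null. */
    static Component firstChild(String name) {
        return input -> {
            for (Node n = input.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && name.equals(((Element) n).getTagName())) {
                    return (Element) n;
                }
            }
            return null;
        };
    }

    /** Feeds the output of each component into the next; stops as soon as a link yields null. */
    static Element run(List<Component> chain, Element root) {
        Element current = root;
        for (Component component : chain) {
            if (current == null) {
                return null;
            }
            current = component.apply(current);
        }
        return current;
    }

    /** Parses an XML string with the JDK DOM parser (stand-in for AXIOM). */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<data><book><author><name>Andun</name></author></book>"
                + "<book><author><name>Sameera</name></author></book></data>");
        // Hand-built chain for "/data/book/author/name".
        List<Component> chain = Arrays.asList(
                self("data"), firstChild("book"), firstChild("author"), firstChild("name"));
        Element result = run(chain, root);
        System.out.println(result == null ? "no match" : result.getTextContent());
    }
}
```

With the two-book sample used in `main`, the chain stops at the first `name` element, so only the first author's name is returned.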
In this way we can create a simple XPath parser. The advantages are:
  • It does not have to check against the entire XPath specification, most of which is unnecessary for most use cases.
  • It doesn't have to traverse the whole XML tree to get the result.
  • AXIOM will not load XML elements that are not traversed into memory.
So this new custom XPath parser does the following:
  • When we give an XPath expression to the parser, it creates the component chain that will do the processing to get the result of the XPath.
  • When the evaluation happens, the input XML is passed through the component chain, which outputs the result.
To create the component chain we have to analyze the given XPath expression. For that we use the ANTLR parser generator.
  • We have to check whether the given XPath expression can be evaluated using the custom parser.
  • If the expression can be evaluated, we have to create the component chain described above.
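The two steps just listed, checking the expression and then building the chain, can be illustrated without ANTLR for the simple subset. The sketch below is an assumption-laden stand-in: it splits the expression on "/" instead of walking a syntax tree, uses JDK DOM elements instead of `OMElement`, takes only the first matching child at each step (no backtracking), and all names (`ChainCompiler`, `compile`, `evaluate`) and the attribute value in the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ChainCompiler {

    /** Translates an expression such as "/a/b/@c" into one function per step. */
    static List<Function<Element, Object>> compile(String xpath) {
        if (xpath == null || !xpath.startsWith("/")) {
            throw new IllegalArgumentException("Only absolute paths are supported: " + xpath);
        }
        String[] steps = xpath.substring(1).split("/", -1);
        List<Function<Element, Object>> chain = new ArrayList<>();
        for (int i = 0; i < steps.length; i++) {
            final String step = steps[i];
            if (!step.matches("@?[A-Za-z_][\\w.-]*")) {
                // Covers "//", predicates, functions, unions and so on.
                throw new IllegalArgumentException("Unsupported step '" + step + "' in " + xpath);
            }
            if (step.startsWith("@")) {
                if (i == 0 || i != steps.length - 1) {
                    throw new IllegalArgumentException("Attribute step must come last: " + xpath);
                }
                final String attribute = step.substring(1);
                chain.add(e -> e.hasAttribute(attribute) ? e.getAttribute(attribute) : null);
            } else if (i == 0) {
                chain.add(e -> step.equals(e.getTagName()) ? e : null); // root must match
            } else {
                chain.add(e -> firstChild(e, step));
            }
        }
        return chain;
    }

    /** Runs the root through the chain; returns element text, attribute value, or null. */
    static String evaluate(List<Function<Element, Object>> chain, Element root) {
        Object current = root;
        for (Function<Element, Object> link : chain) {
            if (!(current instanceof Element)) {
                return null; // an earlier link found nothing
            }
            current = link.apply((Element) current);
        }
        if (current instanceof Element) {
            return ((Element) current).getTextContent();
        }
        return (String) current;
    }

    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(((Element) n).getTagName())) {
                return (Element) n;
            }
        }
        return null;
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<data><book><author age=\"30\"><name>Andun</name></author></book></data>");
        // The chain is built once and can be reused for many documents.
        List<Function<Element, Object>> chain = compile("/data/book/author/@age");
        System.out.println(evaluate(chain, root));
    }
}
```

Building the chain once and reusing it means the expression is analyzed a single time, however many documents are processed afterwards.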
We have used an ANTLR grammar for XPath 1.0. Using that grammar we generated a lexer, a parser and a tree walker for XPath 1.0. When an XPath expression is given, ANTLR analyzes the expression using that grammar and generates an abstract syntax tree (AST) for it.


Using this generated tree we build the component chain. Each node of the tree is responsible for adding some Java code to the XPath parser. When the tree is traversed, a Java class is generated, and using that class we can evaluate the XPath expression.
When the tree walker finds an XPath expression that can't be evaluated using our custom parser, it throws an exception. So we can add logic to catch that exception and call a full XPath parser such as Jaxen to do the processing.
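The catch-and-fall-back logic might look roughly like the following. This is a sketch under assumptions: the JDK's built-in `javax.xml.xpath` engine stands in for Jaxen, the fast path is a simplified DOM walk rather than the AXIOM component chain, and `UnsupportedXPathException`, `fastEvaluate`, `fullEvaluate` and `evaluate` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FallbackDemo {

    /** Thrown by the fast path for expressions outside its subset (hypothetical type). */
    static final class UnsupportedXPathException extends RuntimeException {
        UnsupportedXPathException(String message) {
            super(message);
        }
    }

    /** Fast path: plain absolute element paths only, first matching child at each step. */
    static String fastEvaluate(String xpath, String xml) {
        if (!xpath.matches("(?:/[A-Za-z_][\\w.-]*)+")) {
            throw new UnsupportedXPathException("Not a plain absolute path: " + xpath);
        }
        String[] steps = xpath.substring(1).split("/");
        Element current = parse(xml);
        if (!steps[0].equals(current.getTagName())) {
            return "";
        }
        for (int i = 1; i < steps.length && current != null; i++) {
            current = firstChild(current, steps[i]);
        }
        return current == null ? "" : current.getTextContent();
    }

    /** Full engine: the JDK XPath implementation, used here in place of Jaxen. */
    static String fullEvaluate(String xpath, String xml) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + xpath, e);
        }
    }

    /** Tries the fast path first and falls back to the full engine when it refuses. */
    static String evaluate(String xpath, String xml) {
        try {
            return fastEvaluate(xpath, xml);
        } catch (UnsupportedXPathException e) {
            return fullEvaluate(xpath, xml);
        }
    }

    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(((Element) n).getTagName())) {
                return (Element) n;
            }
        }
        return null;
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data><book><author><name>Andun</name></author></book>"
                + "<book><author><name>Sameera</name></author></book></data>";
        System.out.println(evaluate("/data/book/author/name", xml));    // handled by the fast path
        System.out.println(evaluate("/data/book[2]/author/name", xml)); // falls back to the full engine
    }
}
```

Callers see a single `evaluate` method and do not need to know which engine handled the expression.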
To further optimize the processing, we can supply the XML as an input stream, so that AXIOM reads only the part of the stream needed to evaluate the XPath expression.
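AXIOM builds its object model lazily on top of a StAX pull parser, so the effect of stopping early can be shown with the JDK's StAX API alone. The sketch below is only an illustration of that idea, not the AXIOM-based implementation: it pulls events until the first element at the given absolute path is found and then stops asking for more. The class and method names are hypothetical, the `country` and `lecturer` tags in the sample are made up, and the target element is assumed to contain text only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingFirstMatch {

    /** Returns the text of the first element at the given absolute path, or null if none. */
    static String firstMatch(InputStream xml, String absolutePath) throws XMLStreamException {
        String[] steps = absolutePath.substring(1).split("/");
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            int depth = 0;   // depth of the current element (root is 1)
            int matched = 0; // how many leading steps the open ancestors match
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (matched == depth - 1 && steps[matched].equals(reader.getLocalName())) {
                        matched++;
                        if (matched == steps.length) {
                            // Found the first match: return without pulling later events.
                            return reader.getElementText();
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (matched == depth) {
                        matched--; // leaving an element that was part of the matched prefix
                    }
                    depth--;
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    /** Convenience overload for XML held in a string; wraps the checked exception. */
    static String firstMatch(String xml, String absolutePath) {
        try {
            return firstMatch(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), absolutePath);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data><book><author><name>Andun</name><country>Sri Lanka</country></author></book>"
                + "<lecturers><lecturer><name>Mr_Thilak_Fernando</name></lecturer></lecturers></data>";
        System.out.println(firstMatch(xml, "/data/book/author/name"));
    }
}
```

Because the loop returns as soon as the path is matched, the events for the `lecturers` part of the sample are never requested from the reader.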
The following Java class shows the implementation of the XPath parser. Please contact me for further clarification.

package org.wso2.carbon.custom_xpath;

import org.antlr.runtime.RecognitionException;
import org.apache.axiom.om.OMAbstractFactory;
import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.impl.builder.StAXOMBuilder;
import org.apache.axiom.om.impl.llom.util.AXIOMUtil;
import org.apache.axiom.soap.SOAPEnvelope;
import org.apache.axiom.soap.SOAPFactory;
import org.wso2.carbon.custom_xpath.compiler.CustomXPATHCompiler;
import org.wso2.carbon.custom_xpath.compiler.exception.CustomXPATHCompilerException;
import org.wso2.carbon.custom_xpath.exception.CustomXPATHException;
import org.wso2.carbon.custom_xpath.parser.custom.CustomParser;
import org.wso2.carbon.custom_xpath.parser.custom.components.ParserComponent;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLStreamException;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class CustomXPATH {

    private String xPath;
    private CustomParser customParser;
    private HashMap<String,String> prefixNameSpaceMap;

    /**
     * Creates a custom XPath parser object.
     * @param xPath the XPath expression string
     * @param prefixNameSpaceMap prefix-to-namespace map
     * @throws CustomXPATHException if the expression cannot be recognized
     */
    public CustomXPATH(String xPath, HashMap<String, String> prefixNameSpaceMap) throws CustomXPATHException {
        setPrefixNameSpaceMap(prefixNameSpaceMap);
        setxPath(xPath);
        try {
            setCustomParser(CustomXPATHCompiler.parse(getxPath()));
        } catch (RecognitionException e) {
            throw new CustomXPATHException(e);
        } catch (CustomXPATHCompilerException exception) {
            // The custom parser can't process this XPath expression, so customParser
            // stays null. A full XPath parser such as Jaxen can be called here instead.
        }
    }

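    /**
     * Returns the result of the XPath expression for XML supplied as an input stream.
     * @param inputStream input stream of the XML
     * @return result of the XPath expression
     * @throws XMLStreamException
     * @throws CustomXPATHException
     */
    public String getStringValue(InputStream inputStream) throws XMLStreamException, CustomXPATHException {
        if (customParser != null) {
            return getCustomParser().process(inputStream);
        } else {
            // The custom parser could not handle this expression. A full XPath engine
            // such as Jaxen should do the processing here. Until that is wired in, fail
            // explicitly so that the method compiles and does not silently return nothing.
            throw new UnsupportedOperationException("No fallback XPath engine configured for: " + xPath);
        }
    }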

    /**
     * Returns the result of the XPath expression for XML supplied as a string.
     * @param envelope the XML as a string
     * @return result of the XPath expression
     * @throws XMLStreamException
     * @throws CustomXPATHException
     */
    public String getStringValue(String envelope) throws XMLStreamException, CustomXPATHException {
        // Note: getBytes() uses the platform default charset.
        return getStringValue(new ByteArrayInputStream(envelope.getBytes()));
    }

    public String getxPath() {
        return xPath;
    }

    public void setxPath(String xPath) {
        this.xPath = xPath;
    }

    public CustomParser getCustomParser() {
        return customParser;
    }

    public void setCustomParser(CustomParser customParser) {
        this.customParser = customParser;
    }

    public HashMap<String, String> getPrefixNameSpaceMap() {
        return prefixNameSpaceMap;
    }

    public void setPrefixNameSpaceMap(HashMap<String, String> prefixNameSpaceMap) {
        this.prefixNameSpaceMap = prefixNameSpaceMap;
        ParserComponent.setPrefixNameSpaceMap(prefixNameSpaceMap);
    }

}
