Sunday, October 14, 2012

Optimized Lightweight XPath Parser using AXIOM and ANTLR Parser Generator

If we want to parse an XPath expression, we can use XPath engines such as Jaxen or AXIOM XPath, which is built on top of AXIOM. These engines support the full XPath specification.
But most of the time our use case does not need full XPath support; it needs a quick, optimized way of evaluating XPath. Most of the time we evaluate expressions like "/data/book/name", which point to a single element of an XML document.
So if we only want the first match of an XPath expression, or we know there is only one match, we don't need to build the full XML tree.
For example, to evaluate an XPath expression like "/data/book/author/name" against the following XML, we don't have to look at the lecturers element at all.

  
       
   Andun
   Sri Lanka
       
  
  
       
           Mr_Thilak_Fernando
           
               108
               328
               568
           
      
      
           Ms_Pivithuru
           
               88
               328
               568
               808
               1048
           
      
      
           Mr_Nisansa
           
               88
               328
               568
               808
               1048
           
      
  
So in this post I will describe an implementation of a high-performance custom XPath parser with the following constraints:
  • It returns the first match of the XPath expression.
For an XPath like "/data/book/author/name", this parser returns only the <name>Andun</name> element.

    
        
        Andun
        Sri Lanka
    
    
    
        
        Sameera
        Sri Lanka
        
    
  • It returns only a single element as the result. No node-sets are returned.
For an XPath like "/data/book/author/name", this parser returns only the <name>Andun</name> element, whereas according to the XPath specification the result of this expression is <name>Andun</name><name>Sameera</name>.

    
        
        Andun
        Sri Lanka
    
    
    
        
            
            Sameera
            Sri Lanka
            
        
    
  • It only parses absolute XPath expressions.
Expressions like "/data/book/author/@age" are accepted, while expressions like "//author/name" are rejected.
  • Complex XPath expressions with predicates, logical combinations of results, etc. are not eligible.
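The eligibility rules above can be approximated with a single check. This is a minimal sketch, assuming a hypothetical `SimpleXPathCheck` class; it uses a regular expression in place of the ANTLR grammar described later in this post, so it only illustrates which expressions fall inside the supported subset (absolute child steps, optional namespace prefixes, and an optional trailing attribute step).

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a regex stands in for the ANTLR grammar used in the post.
final class SimpleXPathCheck {

    // One step name: an optional "prefix:" followed by a local name.
    private static final String NAME = "[A-Za-z_][\\w.-]*(?::[A-Za-z_][\\w.-]*)?";

    // "/step/step/.../step", optionally ending in "/@attribute".
    // No "//", predicates "[...]", functions "(...)", unions "|" or axes "::".
    private static final Pattern SIMPLE_ABSOLUTE =
            Pattern.compile("(?:/" + NAME + ")+(?:/@" + NAME + ")?");

    private SimpleXPathCheck() {
    }

    /** True if the restricted parser could handle this expression. */
    static boolean isEligible(String xpath) {
        return xpath != null && SIMPLE_ABSOLUTE.matcher(xpath).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEligible("/data/book/author/name"));    // true
        System.out.println(isEligible("/data/book/author/@age"));    // true
        System.out.println(isEligible("//author/name"));             // false
        System.out.println(isEligible("/data/book[1]/author/name")); // false
    }
}
```

Anything rejected here would be handed to a full engine such as Jaxen instead.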
To implement this, we built the following architecture on top of the AXIOM data model for XML: small components, each responsible for a simple operation on the AXIOM data model. For example,
  • A component that returns an XML node if it matches some conditions.
  • A component that returns the set of children of an XML node if it matches some conditions.
  • A component that returns an attribute of an XML node.
Each of these components takes an OMElement as input and outputs a result, and we can specify conditions for getting that result. By joining these components into a chain, we get the result of an XPath expression.


Here the XML is passed to the first component, and the result of the first component is passed to the second. The chain continues like this, so the result of the final component is the result of the XPath expression.
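A minimal sketch of such a chain, assuming hypothetical names (`ComponentChain`, `Component`, `self`, `firstChild`). To keep it runnable with only the JDK, the standard DOM API (`org.w3c.dom.Element`) stands in for AXIOM's `OMElement`; unlike AXIOM, DOM builds the whole tree up front, so this shows the chaining idea only, not the memory savings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch: DOM's Element stands in for AXIOM's OMElement.
final class ComponentChain {

    /** One link in the chain: takes an element, returns one element or null. */
    interface Component {
        Element apply(Element input);
    }

    /** Returns the input itself if its name matches, otherwise null. */
    static Component self(String name) {
        return input -> name.equals(input.getNodeName()) ? input : null;
    }

    /** Returns the first child element with the given name, otherwise null. */
    static Component firstChild(String name) {
        return input -> {
            for (Node n = input.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                    return (Element) n;
                }
            }
            return null;
        };
    }

    /** Turns "/data/book/author/name" into self(data) -> firstChild(book) -> ... */
    static List<Component> build(String absolutePath) {
        String[] steps = absolutePath.substring(1).split("/");
        List<Component> chain = new ArrayList<>();
        chain.add(self(steps[0]));
        for (int i = 1; i < steps.length; i++) {
            chain.add(firstChild(steps[i]));
        }
        return chain;
    }

    /** Feeds each component's output into the next; stops early when one misses. */
    static Element evaluate(List<Component> chain, Element root) {
        Element current = root;
        for (Component component : chain) {
            if (current == null) {
                return null;
            }
            current = component.apply(current);
        }
        return current;
    }

    /** Parses an XML string and returns its document element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<data><book><author><name>Andun</name></author></book>"
                + "<book><author><name>Sameera</name></author></book></data>");
        Element result = evaluate(build("/data/book/author/name"), root);
        System.out.println(result.getTextContent()); // Andun
    }
}
```

Because each `firstChild` stops at the first matching child, the second `<book>` is never visited, which mirrors the "first match only" rule. `build` assumes the path has already passed the eligibility rules.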
This way we can create a simple XPath parser. The advantages are:
  • It does not have to handle the whole XPath specification, most of which is unnecessary for most use cases.
  • It does not have to traverse the whole XML tree to get the result.
  • AXIOM does not load XML elements that are never traversed into memory.
So this new custom XPath parser does the following:
  • When we give an XPath expression to the parser, it creates the component chain that does the processing to get the result of the XPath.
  • When parsing happens, the input XML is passed through the component chain, which outputs the result.
To create the component chain, we have to analyze the given XPath expression. For that we use the ANTLR parser generator.
  • We have to check whether the given XPath expression can be evaluated using the custom parser.
  • If it can, we have to create the component chain described above.
We used an ANTLR grammar for XPath 1.0. From that grammar we generated a lexer, a parser and a tree walker for XPath 1.0. When an XPath expression is given, ANTLR analyzes it using this grammar and generates an abstract syntax tree like the following,


Then, using this generated tree, we build the component chain. Each node of the tree is responsible for adding some Java code to the XPath parser. When the tree is traversed, a Java class is generated, which we then use to evaluate the XPath expression.
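A sketch of the "each node emits some Java code" idea, under stated assumptions: a flat list of hypothetical `Step` objects stands in for the ANTLR-generated tree, and `ChainCodeGenerator` and the generated class names are illustrative. The emitted source calls real AXIOM `OMElement` methods (`getLocalName`, `getFirstChildWithName`, `getAttributeValue`, `getText`), but this is not the post's actual walker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a flat list of Step nodes stands in for the ANTLR AST.
final class ChainCodeGenerator {

    /** Stand-in for an AST node: an element step or a trailing attribute step. */
    static final class Step {
        final String name;
        final boolean attribute;

        Step(String name, boolean attribute) {
            this.name = name;
            this.attribute = attribute;
        }

        /** Each node contributes its own fragment of Java source. */
        String emit(boolean first) {
            if (attribute) {
                return "        return e.getAttributeValue(new QName(\"" + name + "\"));\n";
            }
            if (first) {
                return "        if (e == null || !\"" + name + "\".equals(e.getLocalName())) return null;\n";
            }
            return "        e = e.getFirstChildWithName(new QName(\"" + name + "\"));\n"
                    + "        if (e == null) return null;\n";
        }
    }

    /** Splits "/a/b/@c" into nodes; assumes the path already passed validation. */
    static List<Step> toSteps(String path) {
        List<Step> steps = new ArrayList<>();
        for (String s : path.substring(1).split("/")) {
            steps.add(s.startsWith("@") ? new Step(s.substring(1), true) : new Step(s, false));
        }
        return steps;
    }

    /** Walks the nodes in order and assembles a class that evaluates the path on AXIOM. */
    static String generate(String className, String path) {
        StringBuilder src = new StringBuilder()
                .append("import javax.xml.namespace.QName;\n")
                .append("import org.apache.axiom.om.OMElement;\n\n")
                .append("public final class ").append(className).append(" {\n")
                .append("    public static String evaluate(OMElement root) {\n")
                .append("        OMElement e = root;\n");
        List<Step> steps = toSteps(path);
        for (int i = 0; i < steps.size(); i++) {
            src.append(steps.get(i).emit(i == 0));
        }
        if (!steps.get(steps.size() - 1).attribute) {
            src.append("        return e.getText();\n");
        }
        return src.append("    }\n}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("AuthorNamePath", "/data/book/author/name"));
    }
}
```

The generated source would still need to be compiled (for example with `javax.tools.JavaCompiler`) and loaded before use; that step is left out here.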
Also, when the tree walker finds an XPath expression that can't be evaluated by our custom parser, it throws an exception. So we can catch that exception and call an XPath engine like Jaxen to do the processing.
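The catch-and-fall-back pattern can be sketched as follows. This is a minimal sketch with hypothetical names (`XPathWithFallback`, `FastEvaluator`, `UnsupportedXPathException`); the JDK's built-in XPath 1.0 engine (`javax.xml.xpath`) stands in for Jaxen so the example runs without extra dependencies, and the fast path is passed in so the custom parser can be plugged in.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: javax.xml.xpath stands in for Jaxen as the full engine.
final class XPathWithFallback {

    /** Thrown by the fast path for expressions it cannot evaluate. */
    static final class UnsupportedXPathException extends Exception {
        UnsupportedXPathException(String xpath) {
            super("Not supported by the fast path: " + xpath);
        }
    }

    /** The custom (restricted) evaluator, plugged in from outside. */
    interface FastEvaluator {
        String evaluate(String xpath, String xml) throws UnsupportedXPathException;
    }

    private final FastEvaluator fast;

    XPathWithFallback(FastEvaluator fast) {
        this.fast = fast;
    }

    /** Tries the fast path first; on rejection, uses the full XPath 1.0 engine. */
    String evaluate(String xpath, String xml) {
        try {
            return fast.evaluate(xpath, xml);
        } catch (UnsupportedXPathException unsupported) {
            try {
                // Returns the string-value of the first selected node in document order.
                return XPathFactory.newInstance().newXPath()
                        .evaluate(xpath, new InputSource(new StringReader(xml)));
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Invalid XPath: " + xpath, e);
            }
        }
    }

    public static void main(String[] args) {
        // Toy fast path that refuses "//" expressions; a real one would be the component chain.
        XPathWithFallback xpath = new XPathWithFallback((expr, doc) -> {
            if (expr.startsWith("//")) {
                throw new UnsupportedXPathException(expr);
            }
            return "fast path: " + expr;
        });
        String xml = "<data><book><author><name>Andun</name></author></book></data>";
        System.out.println(xpath.evaluate("/data/book/author/name", xml));
        System.out.println(xpath.evaluate("//author/name", xml)); // Andun
    }
}
```

Unlike the `CustomXPATH` class below, which decides once in its constructor, this sketch decides on every call; caching the decision per expression would be the natural refinement.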
To optimize the processing further, we can use an input stream of the XML, so that AXIOM reads only the part of the stream needed to evaluate the XPath expression.
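To illustrate reading only the necessary part of a stream, here is a sketch using the JDK's StAX API (`javax.xml.stream`) rather than AXIOM; the class name `StreamingFirstMatch` is hypothetical. It follows a simple absolute child path and returns the text of the first matching element as soon as it is read, without pulling the rest of the document. It assumes the matched element contains text only, and it compares local names, ignoring namespaces.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: StAX stands in for AXIOM's deferred building.
final class StreamingFirstMatch {

    /**
     * Returns the text of the first element matching a simple absolute path such as
     * "/data/book/author/name", or null. Stops reading as soon as the match is found.
     */
    static String firstMatch(InputStream in, String absolutePath) throws XMLStreamException {
        String[] steps = absolutePath.substring(1).split("/");
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            int depth = 0;   // depth of the current element (root element = 1)
            int matched = 0; // open elements at depths 1..matched match steps[0..matched-1]
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    // Only a direct child of the deepest matched element can extend the match.
                    if (depth == matched + 1 && steps[matched].equals(reader.getLocalName())) {
                        matched++;
                        if (matched == steps.length) {
                            return reader.getElementText(); // reads up to this element's end tag
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == matched) {
                        matched--; // leaving a matched element: back off one step
                    }
                    depth--;
                }
            }
            return null;
        } finally {
            reader.close(); // does not close the underlying stream
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<data><book><author><name>Andun</name></author></book>"
                + "<lecturers><lecturer><name>Mr_Thilak_Fernando</name></lecturer></lecturers></data>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstMatch(in, "/data/book/author/name")); // Andun
    }
}
```

Here the `<lecturers>` element is never pulled from the reader, because the method returns as soon as `<name>Andun</name>` has been read.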
The following Java class shows the implementation of the XPath parser. Please contact me for further clarification.

package org.wso2.carbon.custom_xpath;

import org.antlr.runtime.RecognitionException;
import org.wso2.carbon.custom_xpath.compiler.CustomXPATHCompiler;
import org.wso2.carbon.custom_xpath.compiler.exception.CustomXPATHCompilerException;
import org.wso2.carbon.custom_xpath.exception.CustomXPATHException;
import org.wso2.carbon.custom_xpath.parser.custom.CustomParser;
import org.wso2.carbon.custom_xpath.parser.custom.components.ParserComponent;
import javax.xml.stream.XMLStreamException;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;

public class CustomXPATH {

    private String xPath;
    private CustomParser customParser;
    private HashMap<String,String> prefixNameSpaceMap;

    /**
     * Creates a custom XPath parser object.
     * @param xPath the XPath string
     * @param prefixNameSpaceMap prefix-to-namespace map
     * @throws CustomXPATHException if the grammar cannot recognize the expression
     */
    public CustomXPATH(String xPath, HashMap<String, String> prefixNameSpaceMap) throws CustomXPATHException {
        setPrefixNameSpaceMap(prefixNameSpaceMap);
        setxPath(xPath);
        try {
            setCustomParser(CustomXPATHCompiler.parse(getxPath()));
        } catch (RecognitionException e) {
            throw new CustomXPATHException(e);
        } catch (CustomXPATHCompilerException exception) {
            // The custom parser can't process this expression, so customParser stays null
            // and a full XPath engine such as Jaxen can be used instead.
        }
    }

    /**
     * Returns the result of the XPath expression for the XML read from an input stream.
     * @param inputStream input stream of the XML
     * @return result of the XPath expression
     * @throws XMLStreamException
     * @throws CustomXPATHException
     */
    public String getStringValue(InputStream inputStream) throws XMLStreamException, CustomXPATHException {
        if (customParser != null) {
            return getCustomParser().process(inputStream);
        }
        // The custom parser rejected this expression at construction time, so a full
        // XPath engine such as Jaxen should do the processing here. Until that is wired
        // in, fail explicitly so that every path through the method returns or throws.
        throw new UnsupportedOperationException("Fallback XPath engine (e.g. Jaxen) not configured for: " + xPath);
    }

    /**
     * Returns the result of the XPath expression for the given XML string.
     * @param envelope the XML as a string
     * @return result of the XPath expression
     * @throws XMLStreamException
     * @throws CustomXPATHException
     */
    public String getStringValue(String envelope) throws XMLStreamException, CustomXPATHException {
        // Encode explicitly instead of relying on the platform default charset.
        return getStringValue(new ByteArrayInputStream(envelope.getBytes(java.nio.charset.StandardCharsets.UTF_8)));
    }

    public String getxPath() {
        return xPath;
    }

    public void setxPath(String xPath) {
        this.xPath = xPath;
    }

    public CustomParser getCustomParser() {
        return customParser;
    }

    public void setCustomParser(CustomParser customParser) {
        this.customParser = customParser;
    }

    public HashMap<String, String> getPrefixNameSpaceMap() {
        return prefixNameSpaceMap;
    }

    public void setPrefixNameSpaceMap(HashMap<String, String> prefixNameSpaceMap) {
        this.prefixNameSpaceMap = prefixNameSpaceMap;
        ParserComponent.setPrefixNameSpaceMap(prefixNameSpaceMap);
    }

}

Tuesday, October 9, 2012

Securing an Existing WebApp using Entitlement Servlet Filter

The Entitlement Servlet Filter checks the authorization of requests coming to a webapp. This guide shows how to add it to an existing webapp of yours. You can read more about the Entitlement Servlet Filter here.

The steps to add the Entitlement Servlet Filter to your webapp:

  • Add a J2EE authentication mechanism to the webapp (currently the Entitlement Filter supports Basic Auth only). To do this, add the following to the web.xml of your webapp.
    <security-constraint>
        <display-name>Example Security Constraint</display-name>
        <web-resource-collection>
            <web-resource-name>Protected Area</web-resource-name>
            <!-- Protected URL -->
            <url-pattern>/protected.jsp</url-pattern>
            <!-- If you list http methods, only those methods are protected -->
            <http-method>DELETE</http-method>
            <http-method>GET</http-method>
            <http-method>POST</http-method>
            <http-method>PUT</http-method>
        </web-resource-collection>
        <auth-constraint>
            <!-- Anyone with one of the listed roles may access this area -->
            <role-name>admin</role-name>
        </auth-constraint>
    </security-constraint>

    <!-- Login configuration: HTTP Basic authentication (the FORM alternative is commented out) -->
    <login-config>
        <auth-method>BASIC</auth-method>
        <!--<auth-method>FORM</auth-method>-->
        <realm-name>Example Form-Based Authentication Area</realm-name>
        <form-login-config>
            <form-login-page>/protected.jsp</form-login-page>
        </form-login-config>
    </login-config>

    <!-- Security roles referenced by this web application -->
    <security-role>
        <role-name>everyone</role-name>
    </security-role>
    <security-role>
        <role-name>admin</role-name>
    </security-role>

  • Engage the Entitlement Servlet Filter. To do this, add the following to the web.xml of your webapp.
    <!-- Filter mappings used to configure URLs that need to be authorized  -->
    <filter-mapping>
        <filter-name>EntitlementFilter</filter-name>
        <url-pattern>/protected.jsp</url-pattern>
    </filter-mapping> 

  • Provide the necessary parameters to the Entitlement Servlet Filter. To do this, add the following to the web.xml of your webapp.
    <!-- The scope in which the subject would be available.  Legal values are basicAuth, request-param, request-attribute, session -->
    <context-param>
        <param-name>subjectScope</param-name>
        <param-value>basicAuth</param-value>
    </context-param>

    <!-- The name of the identifier by which to identify the subject -->
    <context-param>
        <param-name>subjectAttributeName</param-name>
        <param-value>username</param-value>
    </context-param>

    <!-- The username to perform EntitlementService query-->
    <context-param>
        <param-name>userName</param-name>
        <param-value>admin</param-value>
    </context-param>

    <!-- The password to perform EntitlementService query -->
    <context-param>
        <param-name>password</param-name>
        <param-value>admin</param-value>
    </context-param>

    <!-- The URL to perform EntitlementService query-->
    <context-param>
        <param-name>remoteServiceUrl</param-name>
        <param-value>https://localhost:9443/services/</param-value>
    </context-param>
    
    <!-- EntitlementFilter Settings -->
    <filter>
        <filter-name>EntitlementFilter</filter-name>
        <filter-class>org.wso2.carbon.identity.entitlement.filter.EntitlementFilter</filter-class>

        <!--Client Class that extends AbstractEntitlementServiceClient. Legal values are basicAuth, soap and thrift.
        Default is 'thrift'.-->
        <init-param>
            <param-name>client</param-name>
            <param-value>basicAuth</param-value>
        </init-param>

        <!--Decision caching at PEPProxy. Legal values are simple and carbon.-->
        <init-param>
            <param-name>cacheType</param-name>
            <param-value>simple</param-value>
        </init-param>

        <!--Maximum number of cached entries. Legal values are between 0 and 10000 -->
        <init-param>
            <param-name>maxCacheEntries</param-name>
            <param-value>1000</param-value>
        </init-param>

        <!-- Time interval for which cached entry is valid.-->
        <init-param>
            <param-name>invalidationInterval</param-name>
            <param-value>100000</param-value>
        </init-param>

        <!-- URL to redirect to if authorization fails -->
        <init-param>
            <param-name>authRedirectUrl</param-name>
            <param-value>/index.jsp</param-value>
        </init-param>

        <!-- This will be used if the transport type is thrift. -->
        <init-param>
            <param-name>thriftHost</param-name>
            <param-value>localhost</param-value>
        </init-param>

        <!-- This will be used if the transport type is thrift.-->
        <init-param>
            <param-name>thriftPort</param-name>
            <param-value>10500</param-value>
        </init-param>

    </filter> 
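With `subjectScope` set to `basicAuth`, the subject is expected to come from HTTP Basic authentication, where the `Authorization` header value is `Basic` followed by base64(`username:password`) (RFC 7617). The hypothetical helper below only illustrates that decoding; it is not the EntitlementFilter's own code, and decoding the credentials as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: not part of the WSO2 Entitlement Filter.
final class BasicAuthSubject {

    /** Returns the user name from an Authorization header value, or null if absent or invalid. */
    static String subjectFrom(String authorizationHeader) {
        // The auth scheme name is case-insensitive.
        if (authorizationHeader == null
                || !authorizationHeader.regionMatches(true, 0, "Basic ", 0, 6)) {
            return null;
        }
        String encoded = authorizationHeader.substring(6).trim();
        String decoded;
        try {
            // Assumption: credentials are UTF-8 encoded.
            decoded = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        } catch (IllegalArgumentException notBase64) {
            return null;
        }
        int colon = decoded.indexOf(':');
        return colon < 0 ? null : decoded.substring(0, colon);
    }

    public static void main(String[] args) {
        String header = "Basic " + Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
        System.out.println(subjectFrom(header)); // admin
    }
}
```

Inside a servlet filter, the header value would come from `HttpServletRequest.getHeader("Authorization")`.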


After following these steps, your webapp is secured with the Entitlement Filter. You can find a sample project here.
Also make sure that you put org.wso2.carbon.identity.entitlement.filter_4.0.2.jar, org.wso2.carbon.identity.entitlement.proxy_4.0.2.jar and org.wso2.carbon.identity.entitlement.stub_4.0.0.jar on your Java classpath. The links for those jars are here. You can also build those jars from these links:

https://svn.wso2.org/repos/wso2/carbon/platform/trunk/service-stubs/org.wso2.carbon.identity.entitlement.stub/
https://svn.wso2.org/repos/wso2/carbon/platform/trunk/components/identity/org.wso2.carbon.identity.entitlement.proxy/
https://svn.wso2.org/repos/wso2/carbon/platform/trunk/components/identity/org.wso2.carbon.identity.entitlement.filter/

Saturday, October 6, 2012

Binary Relay - WSO2 ESB : One Major Feature Behind the Success

WSO2 Enterprise Service Bus has a large set of features and components that help achieve good enterprise-level service integration. The Architecture and the Components of WSO2 ESB explains this in more detail. Users of WSO2 ESB have a lot of flexibility in message routing, security, QoS, monitoring, etc.
When it comes to the performance of WSO2 ESB, I have identified one special feature behind it: the Binary Relay. In the common scenario, routing and manipulation of messages inside WSO2 ESB are based on details in the payload. That adds processing overhead, which limits performance in exchange for flexibility in processing messages.
If we can live without this high level of flexibility in message processing, we can increase performance. That is the main idea behind the Binary Relay feature of the ESB. In this mode, the message payload is not touched, and messages are routed based on transport-level headers. The incoming message is added to a dummy SOAP message as a data stream, and the dummy message then passes through the processing pipeline of the ESB. The following diagram shows this functionality,



We can also recreate some of the functionality provided in the normal mode of the ESB, and that recreated functionality performs well.
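The header-only routing idea can be sketched in plain Java. This is an illustrative sketch, not WSO2 ESB's API: the class name `HeaderOnlyRelay`, the choice of the `SOAPAction` header as the routing key, and the route names are assumptions. The destination is chosen from headers alone, and the payload is copied as opaque bytes, so even a malformed body passes through untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of header-only routing with an untouched payload (not the WSO2 ESB API).
final class HeaderOnlyRelay {

    /** Chooses a route from transport headers alone; header names are matched exactly here. */
    static String route(Map<String, String> headers, Map<String, String> routesByAction, String defaultRoute) {
        String action = headers.get("SOAPAction");
        if (action == null) {
            return defaultRoute;
        }
        return routesByAction.getOrDefault(action, defaultRoute);
    }

    /** Copies the payload as opaque bytes without parsing it; returns the byte count. */
    static long relay(InputStream payload, OutputStream destination) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = payload.read(buffer)) != -1) {
            destination.write(buffer, 0, n);
            total += n;
        }
        destination.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> headers = new HashMap<>();
        headers.put("SOAPAction", "urn:getQuote");
        Map<String, String> routes = new HashMap<>();
        routes.put("urn:getQuote", "quoteService");

        // The body is not even well-formed XML: a relay that never parses it does not care.
        byte[] body = "<getQuote><symbol>IBM".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = relay(new ByteArrayInputStream(body), out);

        System.out.println(route(headers, routes, "defaultService") + " <- " + copied + " bytes");
    }
}
```

Real HTTP header names are case-insensitive, so a production version would normalize them (for example with a `TreeMap` using `String.CASE_INSENSITIVE_ORDER`).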
These days I am working on R&D of this kind at WSO2, so I find this feature really interesting. You can read more about Binary Relay at these links: