Wednesday, December 14, 2011

escaping xpath query

A node name in an XPath query cannot start with a digit.

So, escape the path with org.apache.jackrabbit.util.ISO9075, as in the following example.

Example xpath servlet (/apps/sandbox/xpath/xpath.jsp):

<%--
runs xpath
--%><%@page import="java.util.regex.Matcher,
        java.util.regex.Pattern,
        javax.jcr.Node,
        javax.jcr.NodeIterator,
        javax.jcr.Session,
        javax.jcr.query.Query,
        javax.jcr.query.QueryManager,
        javax.jcr.query.QueryResult,
        org.apache.jackrabbit.util.ISO9075"
    contentType="text/plain; charset=UTF-8"
%><%@include file="/libs/foundation/global.jsp"%><%!
// Splits "/jcr:root<path>//<rest>" into the path part and the rest of the query.
private static final Pattern XPATH = Pattern.compile("^/jcr:root(/.+)//(.+)$");

// Escapes the path part so that node names starting with a digit are valid in XPath.
public static String escapeXpath(String xpath) {
    final Matcher matcher = XPATH.matcher(xpath);
    if (matcher.find()) {
        final String path = ISO9075.encodePath(matcher.group(1));
        final String props = matcher.group(2);
        return String.format("/jcr:root%s//%s", path, props);
    }
    return xpath;
}
%><%
final String xpathOriginal = slingRequest.getParameter("xpath");
final String xpath = escapeXpath(xpathOriginal);
final boolean isXpathEscaped = !xpath.equals(xpathOriginal);

final String displayParam = slingRequest.getParameter("display");
final boolean display = !"false".equals(displayParam); // by default, iterates query result.

final QueryManager queryManager = resourceResolver.adaptTo(Session.class).getWorkspace().getQueryManager();
final Query query = queryManager.createQuery(xpath, Query.XPATH);

final String limitParam = slingRequest.getParameter("limit");
final long limit = (null != limitParam && !"".equals(limitParam)) ? Long.parseLong(limitParam, 10) : 100;
query.setLimit(limit);

final String offsetParam = slingRequest.getParameter("offset");
final long offset = (null != offsetParam && !"".equals(offsetParam)) ? Long.parseLong(offsetParam, 10) : -1;
if (offset > 0) {
    query.setOffset(offset);
}

final long t = System.currentTimeMillis();
final QueryResult result = query.execute();
final NodeIterator iter = result.getNodes();
long count = iter.getSize();

out.append("original xpath:\n");
out.append(xpathOriginal + "\n");
out.append("escaped xpath:\n");
out.append(xpath + "\n");
if (isXpathEscaped) {
    out.append("(xpath is escaped)\n");
}

if (display) {
    // count the nodes actually iterated instead of trusting getSize()
    count = 0;
    while (iter.hasNext()) {
        final Node node = iter.nextNode();
        out.append(node.getPath() + "\n");
        count++;
    }
}

final long took = System.currentTimeMillis() - t;
%>
queried <%= count %> nodes (offset: <%= offset %>, limit: <%= limit %>) in <%= (took / 1000.0) %> secs.
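
To see what the escaping does to a path, here is a minimal, self-contained sketch. It is not the Jackrabbit implementation: it only handles the leading ASCII digit case, and the names LeadingDigitEscapeSketch, escapeName and escapePath are made up for illustration. The _xHHHH_ form is the one ISO9075 uses for a character that is not allowed at that position in a name.

```java
final class LeadingDigitEscapeSketch {

    // Escapes a single node name if (and only if) it starts with an ASCII digit.
    static String escapeName(String name) {
        if (name.isEmpty()) {
            return name;
        }
        final char first = name.charAt(0);
        if (first >= '0' && first <= '9') {
            // e.g. '2' (0x32) becomes "_x0032_"
            return String.format("_x%04x_", (int) first) + name.substring(1);
        }
        return name;
    }

    // Applies escapeName to every segment of a slash-separated path.
    static String escapePath(String path) {
        final String[] names = path.split("/", -1);
        final StringBuilder sb = new StringBuilder(path.length() + 16);
        for (int i = 0; i < names.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(escapeName(names[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // hypothetical path with date-like node names
        System.out.println(escapePath("/content/2011/12/14"));
    }
}
```

On a real instance, call ISO9075.encodePath() instead; it also covers the other characters that are invalid in a name.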

Friday, December 2, 2011

log4j.xml for crx (logging QueryImpl)

I wanted to see the impact of repository size on query performance.
You can configure CRX logging to log query times.

    <appender name="query" class="org.apache.log4j.RollingFileAppender">
        <param name="File" value="crx-quickstart/logs/crx/query.log"/>
        <param name="maxFileSize" value="10MB"/>
        <param name="maxBackupIndex" value="100"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{dd.MM.yyyy HH:mm:ss} *%-5p* %c{1}: %m (%F, line %L)%n"/>
        </layout>
    </appender>
    <logger name="org.apache.jackrabbit.core.query.QueryImpl" additivity="false">
        <level value="debug"/>
        <appender-ref ref="query" />
    </logger>

It will be logged to cq/crx-quickstart/logs/crx/query.log.
additivity="false" skips console logging (if the root logger has a console appender).

Example log:
02.12.2011 14:46:37 *DEBUG* QueryImpl: executed in 0.01 s. (/jcr:root/content/dam//*[(@jcr:primaryType='dam:AssetContent' and jcr:contains(metadata/@dc:format, 'image'))] order by @jcr:lastModified descending) (, line 143)

Wednesday, November 30, 2011

/etc/workflow/launcher/config is flat

I wanted to create various launchers for this project.

So, I started to create launcher nodes:
/etc/workflow/launcher/proj1 is sling:Folder

This does not work. The launcher expects /etc/workflow/launcher/config to be flat: every child node of that node must have an eventType property. Obviously, the sling:Folder proj1 did not have one.

That broke the ENTIRE WORKFLOW LAUNCHER. No workflow was launched.

Make it flat.

Tuesday, October 18, 2011

event handling and dispatcher flush

So, I want to configure CQ like the diagram.
  • a: author instance
  • p1, p2, .... : publish instances
  • d1, d2, ... : dispatchers
  • lb: load balancer
black arrow is replication and flush direction.
red arrow is request handler delegation.

Author will only replicate to publish instances. Publish instances may be clustered.
Dispatcher is not used for load balancing, but as a simple cache (reverse proxy).

And I want to disable auto invalidation of the dispatcher cache. Instead, I want to programmatically select the handful of affected dispatcher cache entries whenever there's an update to a publish instance.

The main reason is that auto-invalidation of the dispatcher cache is based on URL path level (I can only invalidate cache entries matching /glob under a certain path level).

If your application and repository are laid out with the dispatcher in mind, you do not have to programmatically calculate which dispatcher cache entries to flush. But my repository is not structured with HTTP in mind (probably wrong usage of Sling and JCR). So, I need to hand pick the dispatcher paths to flush whenever there's an update to a publish instance's repository.

Also, my author instance calls Replicator.replicate() in various places (for example, when DAM rendition is modified, author instance automatically replicates the rendition to publish instance).

So, publish instances need to listen to events and, when things change, flush the relevant dispatcher cache entries.

The events to listen for can be determined by looking at:
where localhost:4503 is one of the publish instances.

I looked at the event logs after activating a page and activating a rendition with On Modification trigger on the flush agent set and unset.

Relevant events are:
(various constants are documented in javadoc: )


@Component(immediate = true,
        enabled = true)
@Service(value = {EventHandler.class, JobProcessor.class})
@Property(name = EventConstants.EVENT_TOPIC, propertyPrivate = true, value = {
        SlingConstants.TOPIC_RESOURCE_CHANGED, // this is for rendition edit
        PageEvent.EVENT_TOPIC // this is for page edit
})
public class DispatcherFlushOnPublish implements EventHandler, JobProcessor {

Saturday, September 24, 2011

vlt prints password

so, you want to use vlt

read -p "password: " -s p
vlt -q rcp -q -b 1000 -t 1 -u "http://user:$p@foo/crx/-/jcr:root/content" "http://user:$p@bar/crx/-/jcr:root/content"

and it prints stuff like
Connecting via JCR remoting to http://user:r34lp455w0rd!!@localhost:4502/crx/server

I could not find a way to disable that. So, I filter those lines out:
vlt -q rcp -q -b 1000 -t 1 -u "http://user:$p@foo/crx/-/jcr:root/content" "http://user:$p@bar/crx/-/jcr:root/content" | grep -v "http://" >> your.log
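
If you would rather keep those lines in the log, an alternative to grep -v is to mask only the password. The following is a minimal sketch of that idea, not something vlt provides; the class name CredentialMaskSketch and the regular expression are my own assumptions about what the printed URLs look like (scheme://user:password@host).

```java
import java.util.regex.Pattern;

final class CredentialMaskSketch {

    // Group 1 is "scheme://user:"; the password is everything up to the next '@'
    // that comes before any '/' or whitespace.
    private static final Pattern USERINFO =
            Pattern.compile("(\\w+://[^/\\s:@]+:)[^/\\s@]*@");

    // Replaces the password part of every credential-bearing URL in the line.
    static String mask(String line) {
        return USERINFO.matcher(line).replaceAll("$1****@");
    }

    public static void main(String[] args) {
        // the log line quoted in the post
        final String line =
                "Connecting via JCR remoting to http://user:r34lp455w0rd!!@localhost:4502/crx/server";
        System.out.println(mask(line));
    }
}
```

A password that itself contains '@' or '/' would need URL encoding before it goes into the URL, and this pattern does not try to handle that case.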

Saturday, September 17, 2011

CQ related google queries

This query returns many CQ sites:
Many of the sites accept .json rendering (-1.json, 34324232.json ...etc). Some also accept .query.json?statement=//*
Easy for content grabbing and DOS attack.

Also, this:
Usually,  /content/dam.xml is large. Easy for DOS attack.

Many sites block .json on /content. But they still let .json on /etc.

This query shows a few author instances (try default CQ logins other than admin:admin, such as author:author):

Once you locate a CQ site, you can try various paths:

Also, try json servlets:

format is:

If things are blocked, try with some of characters replaced with url encoding.

For example, this is 404:*,cq:Page%29

But this returns:*,cq:Page)

e in query is replaced with %65 and o in json is replaced with %6F
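
The reason the encoded variant gets through is that the blocking rule compares the raw request path, while the server decodes it before resolving. Here is a minimal sketch of the difference, assuming a filter that blocks on the substring ".query.json"; the class and method names are hypothetical, and the sample path is made up to mirror the %65 and %6F substitution described above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class DecodeBeforeMatchSketch {

    // Naive rule: looks at the path exactly as the client sent it.
    static boolean naiveBlock(String rawPath) {
        return rawPath.contains(".query.json");
    }

    // Safer rule: percent-decode first, then apply the same check.
    // Note: URLDecoder also turns '+' into a space, which is acceptable for this check.
    static boolean normalizedBlock(String rawPath) {
        final String decoded = URLDecoder.decode(rawPath, StandardCharsets.UTF_8);
        return decoded.contains(".query.json");
    }

    public static void main(String[] args) {
        final String encoded = "/content.qu%65ry.js%6Fn"; // hypothetical request path
        System.out.println("naive blocks it:      " + naiveBlock(encoded));
        System.out.println("normalized blocks it: " + normalizedBlock(encoded));
    }
}
```

So whatever sits in front of the instance should normalize the URL before matching deny rules.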

Most of these are possible because of Sling:
Usually, databases use a different port (and a protocol other than HTTP) to communicate with HTTP applications. Even databases that use HTTP (such as CouchDB) can be configured to use a different port from the HTML rendering server. But Sling exposes the entire database (JCR) content on the same port for HTTP clients to access.

Sling does have an access control mechanism. But the common development paradigm for Sling is to expose all resources to everyone.

You could expose only a few resources and have their resourceType query/access the actual content resources.
For example, instead of exposing the following:

You can expose only one resource:

And, have resourceType of /content/pages handle GET requests to:
by reading actual content from:

Or, you can have a proxy server blocking various paths that could be used maliciously. For example, CQ has dispatcher module for Apache httpd. You can configure dispatcher.any to deny access to various globs.

Wednesday, September 14, 2011

datastore garbage collection

My instance had a 12GB datastore:
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             169G   73G   88G  46% /
none                  5.9G  672K  5.9G   1% /dev
none                  5.9G  196K  5.9G   1% /dev/shm
none                  5.9G  104K  5.9G   1% /var/run
none                  5.9G     0  5.9G   0% /var/lock
/dev/sda2              98G   12G   81G  13% /mnt/datastore

I ran datastore garbage collection, which ran for 5 hours, and it is now 2GB:
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda5             169G   73G   88G  46% /
none                  5.9G  672K  5.9G   1% /dev
none                  5.9G  296K  5.9G   1% /dev/shm
none                  5.9G  108K  5.9G   1% /var/run
none                  5.9G     0  5.9G   0% /var/lock
/dev/sda2              98G  2.0G   91G   3% /mnt/datastore

I keep the datastore outside crx-quickstart, on a different drive, for easy backup, etc.
    <DataStore class="">
        <param name="path" value="/mnt/datastore"/>
    </DataStore>

The script for running datastore garbage collection is here:


there are curl command examples:

For example,

curl -c login.txt "http://localhost:7402/crx/login.jsp?UserId=admin&Password=xyz&Workspace=crx.default"

Of course they don't work.

You need this instead:
curl -c login.txt -F"_charset_=UTF-8"  -F"UserId=admin" -F"Password=admin" -F"Workspace=crx.default" "http://localhost:7402/crx/login.jsp"

You need a POST request, and you must also include the _charset_ param.

Monday, August 1, 2011


double clicking a page in WCM siteadmin  opens content finder view:

If you don't want it, add jcr:content/@cq:defaultView = "html"

Then, double clicking a page in WCM siteadmin will open

Tuesday, July 26, 2011

jcr:content and

these are commonly used in jsp after <cq:defineObjects/> and <sling:defineObjects/>:

  • resource
  • currentNode
  • currentPage
resource.getPath()  is the node that has sling:resourceType  (usually /foo/bar/jcr:content)

currentNode.getPath() is the node that has sling:resourceType

currentPage.getPath() is actual Page (usually, /foo/bar).

so, if you have something like:
iter = queryResult.getNodes(); //returns  cq:Page nodes
while (iter.hasNext()) {
    sling.include(iter.nextNode().getPath());// resource of the script included will be jcr:content
}

However, if you specify a resourceType for the include (for example, SlingScriptHelper.include() with RequestDispatcherOptions.setForceResourceType(), or <sling:include resourceType="..."/>), the included script (handler)'s resource is NOT jcr:content but the resource itself (not /foo/bar/jcr:content but /foo/bar, where the included resource was /foo/bar).

Okay this is hairy.

Let's work an example. 

Client request was:

GET /x/y.html

/x/y's resourceType (handler) is /apps/a/handler.
In /apps/a/handler/handler.jsp,  you include /foo/bar (a cq:Page).  /foo/bar's resourceType is /apps/b/handler.

You include:
  1. <sling:include path="/foo/bar.html"/>
    1. then /apps/b/handler will be used.
    2. resource is /foo/bar/jcr:content
    3. currentNode is /foo/bar/jcr:content
    4. currentPage is /foo/bar
  2. <sling:include path="/foo/bar" resourceType="c/handler"/>
    1. /apps/c/handler is used
    2. resource is /foo/bar
    3. currentNode is /foo/bar
    4. currentPage is /x/y

Friday, July 15, 2011

crxde classpath

You must include /etc  because there are jars under /etc/crxde/profiles/default/libs/*

So, /etc/crxde/profiles/default/@crxde:paths = /apps, /libs, /etc/crxde

I explicitly did not include /etc because /etc/tags was huge and CRXDE could not handle it.

Here is script to install custom jars to the libs folder:


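#!/bin/bash

function usage() {
    echo "installs jar files for CRXDE (/etc/crxde/profiles/default/libs) to given hosts"
    echo "usage: $0 project-home-directory hostname1 hostname2 ..."
}

# prints the given message and aborts
function err() {
    echo "$1"
    exit 1
}

if (( $# < 2 ))
then
    usage
    exit 1
fi

# $bundles (array of jar paths) and $cred (user:password) must be set before this point
shift 1

for x in "${bundles[@]}"
do
    f=$(basename "$x") # file name to use under the libs folder
    for host in "$@"
    do
        echo "install $x to $host"
        curl -f -u "$cred" -T "$x" "http://$host/etc/crxde/profiles/default/libs/$f" || err "fail on crxde libs install $x -> $host"
    done
done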

Monday, July 11, 2011

disabling link checker

Of course the above isn't complete.

You also need the following property:

service.special_link_patterns = .*


I wanted to clean up workflow archives.

So, I moved /etc/workflow/instances to /tmp/foo, created a sling:Folder at /etc/workflow/instances, and recursively deleted /tmp/foo (through the /crx Content Explorer).

Problems are:
  1. There could be RUNNING or STALE workflows. You really need to remove COMPLETED workflows only.
  2. When you create new /etc/workflow/instances sling:Folder, you need jcr:mixinTypes = ["rep:AccessControllable"]
  3. and, set sling:resourceType = cq/workflow/components/instances

There is

But use it with caution. It can kill the instance.

Thursday, April 28, 2011

Including resources in OSGi bundle

you can have:

                            {maven-resources}, {maven-dependencies},

inside  maven-bundle-plugin

                            {maven-resources}, {maven-dependencies},

You need {maven-resources} and {maven-dependencies}, or mvn package fails.

And the format is dst-path-in-the-jar=src (templates=src/resources/templates)

Wednesday, April 27, 2011

sling mapping /etc/map

You can test jcr resolving stuff under http://localhost:4502/system/console/jcrresolver

  1. make /etc/map nt:unstructured so that you can order  stuff there in crxde lite (sling:match regexes are applied from top to bottom)
  2. I use only one level deep (/etc/map/foo, /etc/map/bar)
  3. The string matched against sling:match regexes has the following format: <protocol>/<host>.<port><path>   (for example,  http/
  4. you can use it to redirect /foo/bar/ to /content/site/en/foo/bar.html
  5. not sure how short paths like /foo/bar/  will work with replication :P
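
To make the format in item 3 concrete, here is a small sketch that builds such a string and tests a regex against it. It only illustrates the string format; it does not call Sling's resolver, and the host, port, path and regex below are made-up examples, as are the class and method names.

```java
import java.util.regex.Pattern;

final class SlingMatchSketch {

    // Builds the <protocol>/<host>.<port><path> string that sling:match regexes see.
    static String matchInput(String protocol, String host, int port, String path) {
        return protocol + "/" + host + "." + port + path;
    }

    public static void main(String[] args) {
        final String input = matchInput("http", "localhost", 4502, "/foo/bar/");
        // hypothetical sling:match value: any host and port, short path /foo/bar/
        final Pattern slingMatch = Pattern.compile("http/[^/]+\\.\\d+/foo/bar/");
        System.out.println(input + " matches: " + slingMatch.matcher(input).matches());
    }
}
```

Note that the dot between host and port has to be escaped in the regex, otherwise it matches any character.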

sling:OsgiConfig for multiple run modes

You can have multiple run modes and have CQ (or Sling.. I don't know) select proper sling:OsgiConfigs.

For example, if your run modes are a,b then the configs under config.a.b are applied:

  • author,production
    • /apps/<foo>/*
  • author,staging
    • /apps/<foo>/*
  • publish,production
    • /apps/<foo>/config.publish.production/*

Probably the order of run modes is important. And I'm not sure if more than two run modes will work.


  • author,production,foo,bar
    • /apps/<foo>/*
    • /apps/<bar>/*
    • /apps/<foo>/config/*
    • /apps/<foo>/*

test what configuration is selected...
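
As a way to reason about which folders should apply, here is a small sketch. It assumes one particular rule, which I have not verified against the notes above: a folder named config.<mode1>.<mode2> applies when every run mode in its name is active, regardless of order, and a plain config folder always applies. The class and method names are hypothetical; check the result on a real instance.

```java
import java.util.Set;

final class RunModeConfigSketch {

    // Returns true if the folder name is "config" or "config.<modes...>" and
    // every mode listed in the name is among the active run modes (assumed rule).
    static boolean applies(String folderName, Set<String> activeRunModes) {
        final String[] parts = folderName.split("\\.");
        if (!parts[0].equals("config")) {
            return false;
        }
        for (int i = 1; i < parts.length; i++) {
            if (!activeRunModes.contains(parts[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final Set<String> active = Set.of("publish", "production");
        for (String folder : new String[] {"config", "config.publish.production", "config.author.production"}) {
            System.out.println(folder + " applies: " + applies(folder, active));
        }
    }
}
```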

in place upgrade to CQ 5.4 GA

  1. turn off replication agents
  2. back up (snapshot or online backup.. in case something goes wrong)
  3. remove /etc/workflow/instances
  4. create /etc/workflow/instances :: sling:Folder
  5. remove /var/eventing/jobs
  6. create /var/eventing/jobs :: sling:OrderedFolder
  7. change admin password to "admin" (for CRX.. probably for other things too)
  8. shutdown CQ
  9. cp crx-quickstart/server/serverctl ~/backup/  (back up other stuff too if you want to)
  10. rm -rf crx-quickstart/launchpad
  11. rm -rf crx-quickstart/server
  12. rm -rf crx-quickstart/logs
  13. mv ~/Downloads/cq-quickstart-5.4.0.jar /opt/cq/
  14. sudo su day (MAKE SURE you are the same user CQ runs as)
  15. java -XX:MaxPermSize=256m -Xmx1024m,production -jar cq-quickstart-5.4.0.jar -v -p 4502
  16. <build-number> (launchpad is wiped out.. so need to redeploy stuff)
  17. enable replication agents
  18. test
  19. restart CQ using serverctl

CQ upgrade gotchas:

  1. admin password should be changed to "admin" or bad things happen.
  2. emptying out /etc/workflow/instances saves time for the upgrade.
  3. removing crx-quickstart/launchpad does you a lot of good. otherwise, OSGi bundles don't get fully upgraded.
  4. make sure you restart CQ for the upgrade as the proper user. Otherwise, crx-quickstart/repository  will be messed up.

Thursday, February 3, 2011

TagInputField.js tags xtype

CQ includes this:


So you can configure it like this:

new CQ.tagging.TagInputField({
    namespaces: ["foo", "default", "bar"] //search only these namespaces
    , tagsBasePath: "/content/mytags" //do NOT do this. TagManager always searches from /etc/tags anyways
    , suggestMinChars: 1 //auto complete starts from one character input (default 3)
});

cq:ClientLibraryFolder debug

To have the individual JavaScript files included instead of one combined file:


Check Debug