Regex Fu: Not Followed By

Everyone has their own regular expression implementation, especially text editors. One is good, while the rest range from absurd to almost usable.

Sometimes, despite your best efforts, you’ll find yourself in a strange environment needing to do some funky regex fu.

So for the sake of argument, let’s imagine we have a standard no-frills regex engine:

  • ( ) Capture groups
  • ?, *, + Repetition
  • [a-z] [^a-z] Character classes including negation
  • ^ $ Anchors
  • a|b Alternation

For the sake of clarity, we’ll assume it’s set to ignore whitespace too.

Simple Case

We’re looking for all instances of “fish”, so long as they aren’t followed immediately by “box”. With Perl-compliant regexes, we’d be able to say something as simple as:

fish(?!box)

We won’t be able to replicate that exactly with our limited engine, but we can get fairly close. (Remember that negative lookahead should be zero-width and non-capturing; that’s something we’ll have to fudge). Let’s start by figuring out what we want to accept, and what we want to fail.

Accept:

  • fish
  • fishsticks
  • fishbig
  • fishborn

Fail:

  • fishbox

Sounds simple enough. So we can get 90% there with just “fish” followed by either not “b” or the end of the stream:

Replace fish( [^b] | $ ) with salmon$1

Simple. That makes “fishbig” and “fishborn” fail, but we have successfully converted “fish(?!b)”, which is at least a start. Let’s try to convert fish(?!bo) next. So in addition to what we already have, we also want to say “if we do find a ‘b’, just make sure it’s not followed by an ‘o’”.

fish (b([^o] | $)
      | [^b] | $ )

And after that, adding the “x” is just the same thing all over again.

fish (b
        (o([^x] | $)
         | [^o] | $)
      | [^b] | $ )

Condensed, that’s:

fish(b(o([^x]|$)|[^o]|$)|[^b]|$)

Can you believe it, it’s only twenty-one characters longer than the Perl version! (Or 9n - 6, where n is the number of characters in the negative lookahead)
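The nesting is mechanical enough to generate. Below is a minimal JavaScript sketch of that expansion; the helper name notFollowedBy is made up for this example, and it assumes both the prefix and the literal are plain letters with no regex metacharacters. JavaScript does support lookahead natively, so this is purely to check the hand-built pattern.

```javascript
// Build the lookahead-free pattern for "prefix not followed by literal".
// Hypothetical helper; assumes plain letters only (nothing needing escaping).
function notFollowedBy(prefix, literal) {
    let inner = "";
    // Work from the last character of the literal outwards.
    for (let i = literal.length - 1; i >= 0; i--) {
        const c = literal[i];
        // "c, then whatever the deeper levels allow" (absent for the last char)
        const deeper = inner === "" ? "" : c + inner + "|";
        inner = "(" + deeper + "[^" + c + "]|$)";
    }
    return prefix + inner;
}

const pattern = notFollowedBy("fish", "box");
const re = new RegExp(pattern);

// The accept and fail lists from above.
const accepted = ["fish", "fishsticks", "fishbig", "fishborn"].every(s => re.test(s));
const rejected = !re.test("fishbox");

// The consumed character(s) come back via $1, as in the "salmon$1" replace.
const replaced = "fishsticks and fishbox".replace(new RegExp(pattern, "g"), "salmon$1");
```

One side effect of the fudge: because the character after “fish” is consumed rather than merely peeked at, back-to-back matches can be missed. Replacing on “fishfish” only converts the first one, since its match swallows the “f” that starts the second.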

It’s very limited

Initially I thought that it would be possible to translate something more interesting, such as:

fish(?!.*box)

Into something like:

fish(b(o(x^$FAIL$^|[^x]|$)|[^o]|$)|[^b]|$)*$

And this works for the most part, except in cases like “fishbbox” or “fishbobox” - it lets them straight through. So, only the absolute simplest case of “something not followed by some literal” is possible, or I think at most the not-followed-by could have character classes or alternations.


AJAX OnBlur Commits: Cool Vs Usability

I am not anti “Web 2.0”. I have enjoyed seeing the evolution of web applications under its banner, though I find the concept - as intangible as it can be - amusing at times. I have to say, though, that there are instances when I know it has gone too far. I am finding more and more overly AJAX-ified sites which seem to be increasingly focused on how many items on the Web 2.0 checklist they can cross off and less on the usability principles which will give their visitors the smooth and comfortable experience they deserve.

This article really only touches on one example that I have experienced recently: onBlur commits (AJAXified, of course).

Now I can already hear you cry “But Carly, this saves me so much time! I don’t need to press OK, or even hit enter. Do you know how many microseconds that saves me?!” or whatever. Before you start preparing your flaming comments, let me explain.

I am travelling in South America and am currently in Bolivia. Needless to say, the internet here is generally pretty shite, especially when you get out of La Paz and into some of the smaller cities. At times I need to use their PCs rather than my beloved Mac which I am hauling around the continent. This is fine (I’m not that much of a snob, really) but when the machine you are using is munching over a 56K connection, only has 64 MB of RAM and is running some ungodly version of Windows, you might start to appreciate how AJAX (the onBlur commit especially) starts to fall apart.

I use Facebook to keep in touch with family, friends and people I meet on the road - so I use it quite a bit (let’s not go into the “Facebook is for pussies” discussion, now is not the time). Last night I wanted to update the “About Me” section, a little spiel that sits underneath your Facebook profile photo on the left. I clicked “Edit”, and started typing in my note. Part way through I wanted to check the quote that I was supplying - I flicked over to another browser window to check it, and when I came back found that not only had this triggered an onBlur and was committing my half-entered text, but that due to the slow internet connection I now had to wait for 10 minutes before I could correct it, if it didn’t time out altogether. Because I rarely prepare all of my text in one hit and tend to flick back and forth between browser windows and other documents, this happened quite a bit. It was incredibly frustrating.

The thing is, I am not aware of any way to turn this off in Facebook. Can I return to a less AJAXy version of Facebook while I am travelling, or even if I just prefer to use one? What about at least providing me with a little AJAXed OK button rather than submitting my input when I move the focus away from the field? If you know, please tell me :-)

Now let’s look at this in terms of usability. For those that may not be familiar with the common usability principles (there is no one gospel for usability), this relates most closely to the item usually listed under “affordance”. The idea is to make it clear - intuitively clear if possible - what will happen when you do something, how controls work etc. From this point of view, how am I to know that the form will submit when I shift my focus to another window? I haven’t indicated to the application that I am ready for my text to commit. For less technically-minded people (I’m thinking of my parents here - my mother is now using Facebook) I imagine this must confuse the hell out of them just as much as it annoys me when it catches me out on a slow connection.

The side-issue of not being able to change from AJAX to traditional button-controlled submission may also fall under another common principle of allowing the user to have control.

I understand that it is convenient for a commit to happen via a method other than ENTER, especially when it is multi-line input and ENTER should not submit the form. However, even clicking elsewhere on the page to trigger a commit is ambiguous (or for those that are keyboard control ninjas, tabbing out of the field). What if I want to copy some text from somewhere else on the page? Why shouldn’t I be able to do this while I am in the middle of typing text in a field? Why are you forcing me to prepare my text outside your application altogether, in order to avoid this insane event?
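To make the alternative concrete, here is a rough JavaScript sketch of a field that keeps a local draft and only commits when the user explicitly asks. Everything in it is hypothetical: createDraftField and its handler names are invented for illustration, and send stands in for whatever AJAX call the application would really make.

```javascript
// Hypothetical sketch: keep a draft locally, commit only on an explicit save.
// `send` is a stand-in for the application's real AJAX call.
function createDraftField(send) {
    let draft = "";
    let committed = "";
    return {
        // Called as the user types; never touches the network.
        onInput(text) { draft = text; },
        // Losing focus deliberately does nothing: the draft just stays put.
        onBlur() {},
        // Only an explicit OK/Save click sends anything.
        onSave() {
            if (draft !== committed) {
                send(draft);
                committed = draft;
            }
        },
        // Cancel throws the draft away and returns the last committed text.
        onCancel() {
            draft = committed;
            return draft;
        }
    };
}

// Usage: switching windows mid-sentence sends nothing.
const sent = [];
const field = createDraftField(text => sent.push(text));
field.onInput("Half a quo");
field.onBlur();                                   // off to check the quote
field.onInput("Half a quote, now finished.");
field.onSave();                                   // sent once, on request
```

In a real page the handlers would be wired to the field’s input and blur events and to an OK button’s click event; the point is only that blur is not treated as consent.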

Let us just think about this a little, folks. User experience is important, yes. Shiny AJAX is “oh so cool” and can be nice to use, yes, and it will probably draw people to your product. But usability is paramount. If they can’t use your application when they arrive - or if it annoys them - they will leave (or yell at you). Make your application work for the user, remember?

Some Resources on Usability

These resources are not exhaustive, but hopefully will point you in the right direction.


Valid and Accessible Collapsible Panels with Mootools

The other day my friend Matt Stow mentioned Mootools as a lightweight alternative to prototype and scriptaculous, so I decided to check it out.

Yet another collapsible panel!

On the whole, I think I’m impressed with Mootools/Moofx. Firstly, check out the download page. It lets you pick and choose exactly what components you want, it will find the dependencies and then bundle and compress them for you. How easy! I ended up getting just Fx.Slide, and the entire JavaScript is only 22 KB: prototype + scriptaculous (effects only) is 166 KB.

It was also a lot easier (almost implicit, in fact) to get it working decently with multiple clicks. It’s still not perfect (hold down enter..) but it’s definitely good enough, whereas scriptaculous required a .style.height = 0 hack.

I did still need one hack though: if you try to run slide = new Fx.Slide(contents); slide.hide(); from within the contents div, IE doesn’t seem impressed. Try it here: IE test case. But I didn’t want to just move the script out because if the content is large or the connection is slow the content might flash and make for jerky loading. So I ended up with something like:


function collapsibleOnClick(div, heading, contents)
{
    contents.style.display = 'none';

    var slide_effect;
    return function() {
        if (!$defined(slide_effect)) {
            contents.style.display = '';
            slide_effect = new Fx.Slide(contents);
            slide_effect.hide();
        }

        // ...
    };
}

So that it’s hidden with display: none initially, and on the first click it will re-display and create our funky Fx.Slide object. The world is saved.

I’d like to look into further compression / culling of scriptaculous, but right now I’m inclined to think Mootools feels nicer regardless of size.


Valid and Accessible Collapsible Panels with Scriptaculous

Believe it or not, creating collapsible panels — and “rich web elements” in general — requires some careful thought and planning. Some things need to be taken into account, such as:

  • Not all users have mice - some only have keyboards (or equivalent)
  • Some people don’t have JavaScript - whether by choice or necessity
  • Some people don’t have CSS support

The only thing you can really bank on is that your bare markup will be handled OK, so it’s important to start with that and make sure everything on top of that only acts as candy; eye or otherwise.

View the collapsible panels example.

So with our example, we’ll start by designing the markup we need as simply as possible. We’ll want a container for each panel, and each panel will have a heading and another container for content. Something like this:

<div class="collapsible">
    <h2>Collapsible Panel</h2>
    <div>
       <p>A paragraph of content for the panel</p>
    </div>
</div>

Now some JavaScript magic needs to happen. We’ll need to pollute our markup slightly, but it’s worth it. Give each collapsible panel an id, and just inside the panel content div we want to call a JavaScript function with the panel that will:

  1. Hide the content and add a ‘closed’ class to the panel.
    (We hide it in JavaScript so non-JavaScript users can still see it)
  2. Inject a link into the h2 like so: h2.innerHTML = '<a href="#">' + h2.innerHTML + '</a>';
    (This way non-JavaScript users won’t have useless links sitting around)
  3. Get the new link and set its onclick

function attachCollapsible(collapsible)
{
    var div = $(collapsible);
    var heading = div.select('h2')[0];
    var contents = div.select('div')[0];

    // initially hide it
    div.addClassName('closed');
    contents.hide();

    // add an anchor at run time,
    // that way non-js users won't see anchor, but js users will
    // still be able to use their keyboard for it.
    heading.innerHTML = '<a href="#">' + heading.innerHTML + '</a>';
    heading.firstDescendant().onclick = collapsibleOnClick(div, heading, contents);
}

The onclick should:

  1. Clear the contents’ height to make sure the effects aren’t in a half-baked state: contents.style.height = '';
    (There might be a better way to do this, but it’s simple and works well enough)
  2. Check whether the panel has the ‘closed’ class
  3. Fire up an appropriate effect
  4. Toggle the ‘closed’ class

function collapsibleOnClick(div, heading, contents)
{
    return function() {
        // clear the height; gets messy otherwise if
        // a blind is already in progress
        // not perfect, but simple and huffy enough,
        // and doesn't seem to be able to get into a bad state
        contents.style.height = '';

        // do we need to go up or down?
        if (div.hasClassName('closed')) {
            new Effect.BlindDown(contents, {duration: 0.3, fps: 100});
            Element.removeClassName(div, 'closed');
        } else {
            new Effect.BlindUp(contents, {duration: 0.3, fps: 100});
            Element.addClassName(div, 'closed');
        }

        // event has been dealt with.
        return false;
    };
}

And hey, you’ve now got a collapsible panel that works with and without JavaScript, works with a keyboard, has no useless hash links and has nice clean markup.


Dynamic Robots.txt with ASP.NET 2.0

The Problem

Scenario: You have an ASP.NET 2.0 website in IIS 6.0 that receives both http and https requests. Some people might link to the https versions of the page, but you really want Google (and other search engines) to only crawl http versions. There are relative links in the site, so you know that as soon as Google crawls one https link it is going to pick up the whole site. So as well as https links in the Google search results, you may also now have to contend with lower rankings because Google thinks there is duplicate content on the site (it sees the page in both http and https format).

You can see that Google recommends that you have a different robots.txt file for http and https, so that Google knows not to crawl the https version of your site. But how can you get IIS to serve up different versions of robots.txt?

The Solution

Let’s make the robots.txt file dynamic. We’ll turn it into an ASP.NET handler so we can have .NET code in it. So you don’t need to recompile your whole web project just for this, we’ll take advantage of ASP.NET 2.0’s on-demand compilation ability, and to boot we’ll ensure the site still serves up other text files as it would normally.

Awesome, wot? :-)

The ASP.NET Pipeline

I’m assuming you know what ASP.NET HTTP Pipeline is and vaguely how it works.

If you don’t, here is a crash course in a few sentences. Based on file extension, when IIS receives a request it can either a) handle it by itself, or b) pass the request on to some other engine (e.g. ASP.NET) and then just hand back to the client whatever that engine responds with. By default, IIS will serve text files on its own, but we are going to tell IIS to let ASP.NET process text files - then we are going to tell ASP.NET to treat the robots.txt file as if it were executable code (in this case an HTTPHandler).

Create your Text File

Create a robots.txt file in the root of your web project, containing the text below. This robots.txt is provided in C#. There is no reason that a similar solution will not work in VB.NET but I prefer C#.

Robots.txt


<%@ WebHandler Language="C#" Class="MyNamespace.robotshandler" %>

using System;
using System.Web;

namespace MyNamespace {

    public class robotshandler: IHttpHandler {

        public void ProcessRequest (HttpContext context) {
 
            context.Response.ContentType = "text/plain";
            context.Response.Write("User-agent: *\n");
                
            if (context.Request.ServerVariables["Https"]=="off"){
                // HTTP
                context.Response.Write("Allow: /\n");
                context.Response.Write("Disallow: /MyDisallowedDirectory/\n");
            } else {
                // HTTPS
                context.Response.Write("Disallow: /");
            }
        }
    
        public bool IsReusable {
            get {return false;}
        }
    }
}
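If C# isn’t your thing, the branching in the handler boils down to very little. The JavaScript below is only an illustration of that logic, not part of the ASP.NET setup; robotsTxtFor is a made-up name, and the directory is the same placeholder used in the handler.

```javascript
// Illustrative only: roughly mirrors the branching in the C# handler above.
// robotsTxtFor is a hypothetical name, not an ASP.NET or IIS API.
function robotsTxtFor(isHttps) {
    const lines = ["User-agent: *"];
    if (isHttps) {
        // HTTPS: keep crawlers out of the whole site
        lines.push("Disallow: /");
    } else {
        // HTTP: crawl everything except the placeholder directory
        lines.push("Allow: /");
        lines.push("Disallow: /MyDisallowedDirectory/");
    }
    return lines.join("\n") + "\n";
}

const httpVersion = robotsTxtFor(false);
const httpsVersion = robotsTxtFor(true);
```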

Check that you can browse to your robots.txt file. You should see the code come through exactly as you typed it, WebHandler directive and all.

Getting IIS to Pass .txt Requests to ASP.NET

At the moment, IIS is still serving the txt files instead of letting .NET know about them - the request for a .txt file doesn’t even reach .NET. Let’s tell IIS to pass that request through to .NET.

Get the path to the ASPX engine

  1. Open IIS and right click on your website and bring up the properties screen
  2. Go to Home Directory > Configuration. You will be on the Mappings Tab.
  3. Locate the ASPX item and click Edit - Copy the path in the Executable Field and cancel out of that window. We don’t want to change anything there - we just want the path

Create the ISAPI Entry for .txt

You are still on the Mappings Tab for your site in IIS.

  1. Click “Add”
  2. Populate the Executable path with the value you copied in the last section
  3. Enter “.txt” in the Extension field
  4. Enter “GET” in the “Limit To” field
  5. Press OK a million times to save all your changes. You’re done with IIS.

Check the result

When you now try to browse to your text file you should get a blank page. This means IIS is now passing the request to ASP.NET, but ASP.NET doesn’t know what to do with it.

Getting ASP.NET to Handle a .txt Request as a Handler

OK. Web.config time!

Add the HTTP Handler

Add the following under system.web

<httpHandlers>
    <add path="/robots.txt" verb="GET" type="System.Web.UI.SimpleHandlerFactory" />
</httpHandlers>

What does this mean? Basically this is saying “When a GET request comes in for /robots.txt, process it the same way as an HTTP Handler”. Funnily enough, it just so happened to be a Handler that we defined in our robots.txt file!

Browse to your robots.txt file now, and you will get a .NET error relating to “No build provider registered for extension ‘.txt’”. Uh-oh! What’s happening here is .NET knows that it needs to process the code in our file, and it needs to do some compiling, but it doesn’t know what to build it with.

I think that if you had deployed a compiled HTTP Handler solution instead, the step below would not be required (I haven’t tested this - if you try it, please let me know how you go).

Add the Build Provider

Add the following to the web.config under system.web, compilation:

<buildProviders>
    <add extension=".txt" type="System.Web.Compilation.WebHandlerBuildProvider" />
</buildProviders>

Check your robots.txt file! It should be working, yay!

BUT - check another text file in your site. Oh noes! It is blank! .NET still doesn’t know what to do with text files other than our robots.txt. Read on for the fix0r.

Getting ASP.NET to Continue to Serve all Other .txt Files Normally

Let’s add one more HTTPHandler entry to our web.config, under system.web, httpHandlers.

<add path="*.txt" verb="GET" type="System.Web.StaticFileHandler" />

This line says that for everything ending in .txt, don’t try to process it - just return the static file. Note that our robots.txt matches the more specific rule we supplied earlier, so it will still be processed by the .NET engine.

Conclusion

This is one technique for a quick and easy dynamic robots.txt file taking advantage of ASP.NET 2.0’s on-demand compilation support. You could apply this principle to almost any file extension if you needed to have other dynamic files in your site, but if you need to do really funky or heavy stuff I would recommend writing proper compiled HTTP Handlers rather than using this quick fix technique.

Resources


OpenSocial - Google’s Social Networking Platform

They have blogging, they have a chat client (and chat functionality built into web-based Gmail), they bought Blogspot, YouTube… it was bound to happen that Google looked in the direction of social networking.

They have released OpenSocial - no, it’s not yet another social networking site. Google has rightly realised that particular market is well saturated (whew!). OpenSocial is an open API for social networking applications - an interface for accessing and providing information in existing social networking services. It is aimed at forming a common base between the already existing thousands of social networking sites and applications.

A developer who creates a cool new piece of functionality using OpenSocial will be able to sell it (or provide it to be used for free) to any service that uses the OpenSocial API; we may even be able to have one master friends list used across multiple services in the future. Google’s API documentation says “Usually your SPI (Service Provider Interface) will connect to your own social network, so that an OpenSocial app added to your website automatically uses your site’s data. However, it is possible to use data from another social network as well, should you prefer.” I wonder if some of the bigger names will either choose not to provide external access to their data, or will provide it on paid subscription only - and what will this mean for OpenSocial?

While I think there is a possibility of cross-communication between some of the larger services, I think the initial interest from the development community (until we get bored) will probably be focussed on the development of gadgets that can be used in several social networking services. They sound sorta cool, and developers like cool :-) Business minds won’t take long to latch onto the idea of integration between different services and common friends lists etc. It will be interesting to see whether this works out, and if so how long it will take for it to happen.

Given that Yahoo has Messenger, Flickr, Yahoo Mail (and associated services) I’m guessing they won’t be using or providing OpenSocial access to their info ;-) I wonder if they will provide some similar technology for Yahoo services…

Businesses and Social Networking

One thing that Smoothspan mentions in this Ubiquitous Social Networks for Business article is the idea that larger businesses are looking to social networking to solve flow-of-information problems internally and smaller to medium businesses may look at social networking services to maintain contact with clients and find common connections for new work.

I went to a presentation on Microsoft Office SharePoint Server 2007 (MOSS) a few weeks ago, and the presenter made a point of how useful social networking can be internally. An example he used is the “common connections” function, where, if person A needs to deal with person C for the first time, the system identifies person B as being a common connection - person A can talk to person B about how to best interact with person C. Similarly you can see projects that are common between you and another - or the management chain and how you and other people fit together under the management layers and team structures. Another example was the presence awareness built into MOSS - when viewing a document in the document library, you can see who authored it and whether that person is online - and you can contact them immediately if you have any questions or concerns.

Nothing revolutionary in terms of functionality - in fact, to a developer this is not amazing or even terribly interesting stuff. But the fact that businesses - particularly large ones - are beginning to use social networking (either at the direction of Microsoft or other vendors, or of their own accord) is pretty cool. I think it’s a sign of the future of business and how much web technologies and information applications will become commonplace across the board - even in many sectors that traditionally don’t use web internally day-to-day.

But back to OpenSocial!

Technologies

As usual, GData has been selected - HTTP requests providing XML responses. O’Reilly’s Radar article notes that Google is considering implementing OAuth too which looks interesting. This is the first time I’ve heard of OAuth, so I’ll have to have a bit more of a look before I can say too much about it.

Who is Using it?

You can see a full list of companies already using OpenSocial, and even watch a video of early adopters demoing the OpenSocial integration, but below are some of the bigger names that jump out at me:

Perhaps more interesting are those not using it (yet):

Is it Worth Playing With?

I know all you developers out there are asking whether it’s worth getting in and having a play with this. My guess is that if you’re not particularly interested in social networking (or the glory/profit of providing one of the first applications) then you probably won’t find the time investment valuable.

If you want to know more I’ve included some links of key pages I’ve found while I’ve been nosing around :-)

Resources


Synergy - Not just for Buzzword Bingo

I decided to get Synergy working this morning - it’s a little piece of Open Source software that allows you to share a mouse and keyboard across multiple machines, even if they have different operating systems.

I have my Macbook Pro which can run Windows on Parallels as needed, but for heavier Windows applications that I use once in a while, I install them on my PC as I don’t want to fill up my nice shiny laptop. What normally happens then is I end up at my desk with my PC and my laptop, switching between the two keyboards a fair bit.

Well, that annoyance is behind me now as Synergy has allowed me to use the same mouse and keyboard across both machines without any special hardware.

It was surprisingly easy to set up. Install on both machines, set up one as the server (in my case the Windows machine), configure the client and away you go.

The only thing that caught me out was that while the Windows machine could resolve the Mac’s name, the Mac can only contact the Windows machine by IP (there may be a way to make it resolve, but I am still learning the ins and outs of the Mac world - comment if you can point me in the right direction). All that means is that when I start the client on the Mac, I supply the server’s IP address rather than its name. I also added that IP as an alias in the client’s config but I suspect that was not necessary.

You can’t move windows or files between the two machines, but you do have a shared simple clipboard (text and images I believe - I have not yet tested the latter). You can hook up multiple machines (providing they are network contactable) and you have complete control over the relationships between the screens (eg Screen1 is left of Screen2, which is above Screen3). Awethome, wot?

Oh, linky to Buzzword Bingo if any of you were wondering what it was.
