HTML Purifier

Posted February 14th, 2012 in Coding by Mark Leong

The problem

I want to allow users to input HTML-formatted text, but I only want them to use certain tags and never any JavaScript. Sometimes users will copy and paste WYSIWYG-formatted HTML, with its associated CSS classes and inline style rules – but I don’t want that to mess up my site design.

A simplistic approach is to attempt to use regular expressions to filter out unwanted HTML tags, but this becomes tedious and is always fraught with risk because it is notoriously difficult to anticipate and catch all the possible permutations of HTML tags and their attributes.
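To make the risk concrete, here is a small sketch of the kind of regex filter I mean and two inputs that slip past it. The function name naive_strip_script() and the sample inputs are made up for illustration; this is not how any particular library does it.

```php
<?php
// Hypothetical naive filter: delete <script>...</script> blocks with one regex.
function naive_strip_script(string $html): string
{
    return preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html);
}

// Bypass 1: nest a script tag inside a broken one. Removing the inner,
// well-formed pair stitches the outer fragments into a working tag.
$nested = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';
echo naive_strip_script($nested), "\n";

// Bypass 2: no script tag at all, just an event-handler attribute.
$handler = '<img src="x" onerror="alert(1)">';
echo naive_strip_script($handler), "\n";
```

The first input comes out as a complete script element and the second passes through untouched, which is why a single pass of pattern matching is hard to trust.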

A more successful approach is to use a pseudo-markup language like BBCode or wikitext, but both require users to learn another markup language, which is likely to deter them from posting.

Is there a better alternative?

Yes, there is! HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C’s specifications.

HTML Purifier works by decomposing the whole document into tokens and removing non-whitelisted elements, checking the well-formedness and nesting of tags, and validating all attributes according to their RFCs.

Why HTML Purifier

I’ve used HTML Purifier because it

  • uses a whitelist (e.g. allow only b, p, br, ul, ol and li tags)
  • outputs valid XHTML
  • protects against XSS
  • can remove attributes and classes from tags without removing the tags
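The last point is worth a quick illustration. As a contrast, this sketch uses PHP's built-in strip_tags() (not HTML Purifier) with an allowed-tags list; the input string is made up for the example. The built-in keeps the allowed tags but leaves every attribute on them in place, and it keeps the text that was inside the removed tags.

```php
<?php
// Illustrative input: inline style, an event handler and a script block.
$input = '<p style="color:red" onclick="evil()">Hi <b>there</b></p><script>alert(1)</script>';

// Allow only <p> and <b>. Attributes on allowed tags are NOT touched.
$output = strip_tags($input, '<p><b>');

echo $output, "\n";
```

The style and onclick attributes survive on the paragraph, so a tag whitelist alone is not enough; the attributes have to be filtered as well.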

Before and after

An example of HTML that a user may enter:

<P style="MARGIN: 0cm 0cm 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto" class=MsoNormal><st1:Lorem w:st="on"><st1:place w:st="on"><B>LOREM IPSUM</B></st1:place></st1:Lorem><B>LOREM IPSUM</B></P>
<P style="MARGIN: 0cm 0cm 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto" class=MsoNormal><st1:place w:st="on"><st1:PlaceName w:st="on">Lorem</st1:PlaceName> <st1:PlaceType w:st="on">Ipsum</st1:PlaceType></st1:place>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque at augue vitae nisl sodales interdum. <st1:City w:st="on"><st1:place w:st="on">Lorem </st1:place></st1:City> Pellentesque erat enim, ullamcorper eget vehicula feugiat, auctor non nunc. Quisque vel molestie eros. Cras erat nulla, faucibus eget pretium at, cursus eu enim. <st1:place w:st="on"><st1:PlaceType w:st="on">lorem</st1:PlaceType> <st1:PlaceType w:st="on">Ipsum</st1:PlaceType></st1:place> Integer et eros lorem, eget pharetra justo. Maecenas accumsan eleifend leo, a ullamcorper justo venenatis ut. Vestibulum bibendum diam vel turpis lobortis bibendum.</P>
<P style="MARGIN: 0cm 0cm 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto" class=MsoNormal><B>Lorem / Ipsum</B> </P>

After it has been passed through the filter:

<p><b>LOREM IPSUM</b><b>LOREM IPSUM</b></p>
<p>Lorem Ipsum Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque at augue vitae nisl sodales interdum. Lorem Pellentesque erat enim, ullamcorper eget vehicula feugiat, auctor non nunc. Quisque vel molestie eros. Cras erat nulla, faucibus eget pretium at, cursus eu enim. lorem Ipsum Integer et eros lorem, eget pharetra justo. Maecenas accumsan eleifend leo, a ullamcorper justo venenatis ut. Vestibulum bibendum diam vel turpis lobortis bibendum.</p>
<p><b>Lorem / Ipsum</b></p>

This is the code used to achieve the before/after example:

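// Load HTML Purifier's autoloader (adjust the path to your installation)
require_once '/path_to/HTMLPurifier/HTMLPurifier.auto.php';

// Start from the defaults, then tighten the whitelist
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.AllowedElements', 'b,i,p,br,ul,ol,li'); // only these tags survive
$config->set('Attr.AllowedClasses', '');                   // no CSS classes
$config->set('HTML.AllowedAttributes', '');                // no attributes at all
$config->set('AutoFormat.RemoveEmpty', true);              // drop elements left empty
$purifier = new HTMLPurifier($config);

$remarks = 'the text to be filtered';
// Strip MS Word's XML namespace declarations before purifying
$remarks = preg_replace('/<\?xml[^>]+\/>/im', '', $remarks);
$remarks_cleaned = $purifier->purify($remarks);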

The only other line of code here doing work apart from HTML Purifier is the regex that removes the <?xml ... /> namespace tags pasted in from MS Word.
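Here is that regex on its own, as a minimal sketch. The pasted string is an illustrative guess at what a Word namespace declaration looks like, not something captured from a real document. Note that the pattern only matches tags that end in a slash before the closing bracket, so an ordinary XML declaration is left alone.

```php
<?php
// Same pattern as in the snippet above.
$pattern = '/<\?xml[^>]+\/>/im';

// Illustrative Word-style namespace declaration followed by real content.
$pasted = '<?xml:namespace prefix = st1 ns = "urn:schemas-microsoft-com:office:smarttags" /><p>Lorem</p>';
$cleaned = preg_replace($pattern, '', $pasted);
echo $cleaned, "\n";

// A standard XML declaration does not end in a slash, so it is not matched.
$declaration = '<?xml version="1.0"?><p>Lorem</p>';
$untouched = preg_replace($pattern, '', $declaration);
echo $untouched, "\n";
```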

Yes 4G mobile broadband

Posted January 18th, 2011 in Internet by Mark Leong

Yes, it is fast!

This evening I had the privilege of trying out Yes, YTL Communications’ new mobile broadband service. A speed test I ran at about midnight clocked a download speed of 10.5 Mbps which, as anyone who is used to Malaysian broadband can attest, is pretty good.

Yes has a simple pay-as-you-use price plan: 9 sen for a 1-minute call, 1 SMS, or 3 MB of data. Coverage of Peninsular Malaysia is said to be at about 65%.
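As a rough worked example of the data pricing, here is a sketch that estimates the cost of a given amount of data. It assumes usage is billed in whole 3 MB units, rounded up; that rounding rule is my assumption and not something Yes has stated. The function and constant names are made up.

```php
<?php
const SEN_PER_UNIT = 9; // price of one unit, in sen
const MB_PER_UNIT = 3;  // data covered by one unit, in MB

// Cost in sen for $megabytes of data, assuming partial units round up.
function data_cost_sen(int $megabytes): int
{
    $units = intdiv($megabytes + MB_PER_UNIT - 1, MB_PER_UNIT);
    return $units * SEN_PER_UNIT;
}

// 1000 MB needs 334 units, so about RM30.06 under this assumption.
printf("RM%.2f\n", data_cost_sen(1000) / 100);
```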

For more info, price plans, videos and coverage maps, see http://www.yes.my/.

Gravatars: Globally recognized avatars

Posted December 6th, 2010 in Social media by Mark Leong

A what…?

Most people have never heard of an online avatar, let alone a Gravatar, so let’s begin with some definitions.

An online avatar is an image or icon used to represent you on the Internet, usually associated with some contribution you have made, such as a comment or a post on a forum.

A Gravatar is an avatar that is hosted by a third party (Gravatar.com) and associated with your email address. When you contribute a comment or post on a website that supports Gravatars, such as a WordPress blog, the site will check with Gravatar.com and display your avatar if you have an account there.

From the Gravatar.com website:

Your Gravatar is an image that follows you from site to site appearing beside your name when you do things like comment or post on a blog. Avatars help identify your posts on blogs and web forums, so why not on any site?
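For the curious, this is roughly how a website looks up your avatar: it hashes your trimmed, lowercased email address and asks Gravatar.com for the image with that hash. The sketch below uses the MD5 form of the hash and the s (size) parameter; the function name gravatar_url() and the sample address are made up.

```php
<?php
// Build a Gravatar image URL for an email address (hypothetical helper).
function gravatar_url(string $email, int $size = 80): string
{
    // Normalise first so " User@Example.com " and "user@example.com" match.
    $hash = md5(strtolower(trim($email)));
    return 'https://www.gravatar.com/avatar/' . $hash . '?s=' . $size;
}

echo gravatar_url(' Someone@Example.com '), "\n";
```

Because only the hash appears in the URL, the website never has to publish your email address to show your picture.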

Why get a Gravatar?

You should sign up for a Gravatar so that, on websites that support Gravatars, you will have a profile picture next to your comment or entry instead of a generic icon. On such websites, having a Gravatar helps distinguish your comments or posts from the others around them.

Setting up a Gravatar

  1. Prepare your image:
    • It must be square
    • It can be up to 512 pixels wide
    • It will be displayed at 80 pixels by 80 pixels by default
  2. Go to http://www.gravatar.com/ and click on the sign up button.
  3. Check your email Inbox and follow the instructions in the email from Gravatar.com.
  4. Upload your image.
  5. Try out your new Gravatar by leaving a comment on this post!

Speaking on “How to set up an online blogshop”

Posted November 27th, 2010 in News by Mark Leong

Workshop on "How to set up an online blogshop"

A few weeks ago I had the privilege of joining the Emmagem.com team in conducting a workshop at an event co-organised by the New Straits Times (NST) and Gorgeous Geeks on how to start an online business. The event attracted just under 100 participants, one from as far away as Perlis! Thankfully my journey from home didn’t have to start as early as his, as the workshop was held at the NST office in Bangsar.

My session was on “How to set up an online blogshop”. I began by introducing online shops and blogshops, and then moved on to the main part of the session, walking the participants through how to set up a blogshop at zero cost. Then I gave an overview of upgrades and expansion options, before concluding with a discussion on longer-term strategies for running a blogshop. (In case you’re wondering, a blogshop is an online shop that uses a blog engine as its Content Management System.)

The two-day event was part of the WOMEN NETPRENEUR 2010 (WNET2010) programme organised by Gorgeous Geeks, MDeC, the US Embassy, Warisan Global, Emmagem.com and NST. Other workshops that weekend covered topics including business strategy, product sourcing, marketing, eBay, PayPal and product photography.

I enjoyed facilitating my part of the workshop, especially in being able to help the participants get to grips with some new tools and technologies to help them develop their businesses. Many thanks to Emmagem.com, Gorgeous Geeks and NST for the opportunity to get involved!

Photo credits: Women Netpreneur on Facebook.

How to register a Malaysian business online

Posted November 12th, 2010 in Business by Mark Leong

Types of registration

Suruhanjaya Syarikat Malaysia (The Companies Commission of Malaysia) offers two types of registration: business and company.

  • Business registrations may be either sole proprietorships or partnerships. They are relatively easy to do yourself and cost less than RM100.
  • Company registration involves the formation of a new legal entity, either a Private Limited Company (Sdn Bhd) or a Limited Company (Bhd). Registering a company is a lot more expensive and complicated.

This write-up is on how to register a business online.

Why should you register a business?

Pursuant to section 5A(1) of the Registration of Businesses Act 1956, the person responsible for a business has to, not later than 30 days from the date of the commencement of the business, apply to the Registrar to register that business.

Registering a business online

Government portal registration

  1. Register at http://www.malaysia.gov.my/ (registration link at top right of page)

SSM Subscriber Registration

  1. Log in at http://www.malaysia.gov.my/
  2. Go to http://www.ssm.com.my/en/eLodgement-services/
  3. Click on the link entitled SSM Subscriber Registration
  4. Follow the instructions to register
  5. Complete payment of RM5

Name enquiry

  1. Go to http://www.ssm.com.my/en/eLodgement-services/
  2. Click on the link entitled Application for Business Name Approval (ROB)
  3. Follow the instructions and submit the form
  4. Wait for the confirmation email – repeat this section if your name is rejected

Business name registration

  1. Copy the ROB approval number e.g. ROB12112010-xxxxxxxxxSB
  2. Go to http://www.ssm.com.my/en/eLodgement-services/
  3. Click on the link entitled Online Registration of Business (ROB)
  4. Follow the instructions and submit the form
  5. Complete the payment of RM30 (owner’s name) or RM60 (trade name)
  6. Wait for the confirmation email

What to do if something doesn’t work

More info
