Capturing web pages

One nice feature of WebKit that has been integrated into Qt 4.5 is support for full-page zoom. This means you can scale the whole page, not just make the text smaller or larger. You can already try this with the demo web browser in the Qt 4.5 beta: open a web site and use the View menu to zoom everything in and out, including embedded images. This is achieved with a new property, QWebView's zoomFactor. Whether zooming applies to both text and images or only to text is controlled through QWebSettings, via the ZoomTextOnly attribute.

Along with this, here is a new example that captures a web page. It was actually requested as an alternative to, and an improvement on, the web thumbnail example shown earlier. Using it is as easy as:

webcapture www.trolltech.com 50 trolltech.png

The code runs with both Qt 4.4 and 4.5. Under Qt 4.4, it grabs the contents of the page (www.trolltech.com), renders it to an image, scales the image according to the zoom factor you specify (50, in percent), and finally saves it to a file (trolltech.png). With Qt 4.5, however, it sets the zoom factor before rendering the page to the image, so the final scaling step is no longer needed. Both approaches have advantages and disadvantages. For zoom factors below 100%, the scaling approach usually gives a much better result. However, it is more expensive, since we always need a buffer image as large as the whole page. Think of a really long page like Google News or Digg. On the other hand, image scaling does not work well for zoom factors above 100%, because enlarging a bitmap only magnifies its pixels and blurs the result.
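To put the memory cost of the two approaches in numbers, here is a rough estimate in plain standard C++. It assumes 4 bytes per pixel (as for a 32-bit ARGB image) and assumes the zoomed output is simply the page size multiplied by the zoom factor, which ignores any relayout WebKit may do; the function names are hypothetical and not part of the webcapture source.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of an off-screen image, assuming 4 bytes per pixel.
std::uint64_t bufferBytes(std::uint64_t width, std::uint64_t height)
{
    return width * height * 4;
}

// Scaling approach: render the whole page at 100% first, then shrink it.
// The temporary buffer must cover the full contents size.
std::uint64_t scalingApproachBytes(int pageWidth, int pageHeight)
{
    assert(pageWidth > 0 && pageHeight > 0);
    return bufferBytes(static_cast<std::uint64_t>(pageWidth),
                       static_cast<std::uint64_t>(pageHeight));
}

// Zoom approach: the page is rendered at the zoom factor directly, so
// (under the linear-scaling assumption) the buffer only needs the zoomed size.
std::uint64_t zoomApproachBytes(int pageWidth, int pageHeight, int percent)
{
    assert(pageWidth > 0 && pageHeight > 0 && percent > 0);
    std::uint64_t w = static_cast<std::uint64_t>(pageWidth) * percent / 100;
    std::uint64_t h = static_cast<std::uint64_t>(pageHeight) * percent / 100;
    return bufferBytes(w, h);
}
```

For a page of 1024x3161 pixels captured at 50%, scalingApproachBytes returns 12947456 (about 13 MB), while zoomApproachBytes returns 3235840 (about 3.2 MB), and the gap grows with the length of the page.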

As an extra exercise, modify the capture tool so that it uses image scaling for zoom factors below 100% and full-page zoom for zoom factors above 100%. That will likely give the best result.
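One possible shape for the exercise, reduced to the decision logic in plain standard C++ so it compiles without Qt. The enum and function names are hypothetical; in a real tool the result would select between downscaling the rendered image and calling QWebView::setZoomFactor() before rendering. Exactly 100% is treated as full-page zoom here, since a factor of 1.0 means no scaling at all.

```cpp
#include <cassert>

// Which rendering strategy to use for a given zoom percentage.
enum class CaptureMethod {
    ScaleImage,   // render at 100%, then downscale the resulting image
    FullPageZoom  // set the zoom factor first, then render
};

CaptureMethod chooseCaptureMethod(int percent)
{
    assert(percent > 0 && "zoom percentage must be positive");
    // Shrinking a full-size rendering tends to look better, while enlarging
    // a bitmap blurs it, so zoom the page itself for 100% and above.
    return percent < 100 ? CaptureMethod::ScaleImage
                         : CaptureMethod::FullPageZoom;
}

// Converts a percentage to a zoom factor where 1.0 means 100%.
double zoomFactorFor(int percent)
{
    return percent / 100.0;
}
```

With this split, a capture at 50% renders the full page and shrinks the image, while a capture at 150% passes a zoom factor of 1.5 and lets WebKit lay out and paint the page at that size.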

Another potential use case for such a tool is a quick visual check of web pages at different viewport sizes, for example to see how they look on mobile devices. While most desktops and laptops have screens at least 1024 pixels wide these days, many mobile phones still use QVGA (320x240) and HVGA (320x480) screens. You can pass the viewport width as the fourth argument to webcapture. Shown below is the mobile version of the BBC Technology section rendered at three different widths: 240, 320, and 480 pixels, respectively. As you can see, the page still looks nice even when the screen is not that wide.
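The command line takes a URL, a zoom percentage, an output file name and, optionally, a viewport width. The following standard C++ sketch shows one way such arguments might be validated; CaptureArgs and parseCaptureArgs are hypothetical names, not taken from the actual webcapture source (which is written with Qt).

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation of:
//   webcapture <url> <zoom-percent> <output-file> [viewport-width]
struct CaptureArgs {
    std::string url;
    int zoomPercent = 100;
    std::string outputFile;
    std::optional<int> viewportWidth;  // set only when the 4th argument is given
};

// args[0] is the program name, as with argv. Returns std::nullopt when the
// argument count is wrong or a number is missing, malformed or non-positive.
std::optional<CaptureArgs> parseCaptureArgs(const std::vector<std::string>& args)
{
    if (args.size() != 4 && args.size() != 5)
        return std::nullopt;

    CaptureArgs result;
    result.url = args[1];
    result.outputFile = args[3];
    try {
        result.zoomPercent = std::stoi(args[2]);
        if (args.size() == 5)
            result.viewportWidth = std::stoi(args[4]);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
    if (result.zoomPercent <= 0)
        return std::nullopt;
    if (result.viewportWidth && *result.viewportWidth <= 0)
        return std::nullopt;
    return result;
}
```

For example, the arguments `webcapture www.trolltech.com 50 trolltech.png 320` yield a zoom of 50% and a viewport width of 320 pixels, while leaving out the last argument leaves viewportWidth empty so the tool can fall back to its own default.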

The code is available from our brand new git repository for Graphics Dojo, under the subdirectory webcapture. If you cloned it before, you just need to pull the latest changes. Otherwise, run:

git clone git://labs.trolltech.com/GraphicsDojo

This example also answers a frequently asked question: how do I find the height of a web page for a given width? You might want to know this if you need to display the page without scroll bars, which requires a viewport as large as the contents. The steps are as follows: load the page, disable the scroll bars, set the viewport to a sensible size (only the width matters here), and then read the contents size. If you try this trick on Google News, for example, the viewport height is 768 but the contents height is 3161. In effect, this gives you a heightForWidth() function.

Happy capturing!
