Research/Web-to-print

From Publication Station

This page is dedicated to research on web-to-print approaches and workflows.

Starting points

  • Goal: transform reflowable (markup-based) digital publications into fixed-layout PDFs.
  • Case study: Beyond Social. How can web-to-print (or wiki-to-print) be applied to Beyond Social?
  • Requirements:
    • an online workflow that can run on a server
    • support for page numbers
    • gathering all articles into one document
  • Advanced features:
    • imposition: how can imposed sheets, instead of a simple stack of pages, be integrated into this workflow?
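
One way to approach imposition is as a reordering step applied to the finished PDF. The sketch below is only an illustration, assuming a saddle-stitched booklet with two pages on each side of a sheet. The function name `booklet_order` is hypothetical, and a PDF library would still be needed to actually place the pages on sheets.

```python
def booklet_order(page_count):
    """Return page numbers in printing order for a saddle-stitched booklet.

    Each sheet carries four pages: two on the front and two on the back.
    The result is a list of (left, right) pairs, one pair per sheet side.
    Blank pages (None) pad the document up to a multiple of four.
    """
    padded = -(-page_count // 4) * 4  # round up to a multiple of 4

    def page(number):
        # Numbers beyond the real page count stand for blank padding pages.
        return number if number <= page_count else None

    order = []
    for sheet in range(padded // 4):
        first = 2 * sheet + 1      # lowest page number on this sheet
        last = padded - 2 * sheet  # highest page number on this sheet
        order.append((page(last), page(first)))          # front side
        order.append((page(first + 1), page(last - 1)))  # back side
    return order


print(booklet_order(8))  # [(8, 1), (2, 7), (6, 3), (4, 5)]
```

For an 8-page document this gives two sheets: pages 8 and 1 on the front of the outer sheet, 2 and 7 on its back, and so on inwards.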

Possible strategies - software

  • LaTeX - document preparation system focused on the creation of PDFs; uses its own markup, which Pandoc supports.
  • wkhtmltopdf - HTML to PDF converter based on WebKit.
  • WeasyPrint - visual rendering engine for HTML and CSS, written in Python, that can export to PDF.
  • MediaWiki Collection extension - the same system used by Wikipedia to create books in PDF and EPUB formats.

assessment

For each strategy try to point out:

  • summary of the workflow
  • example prototype
  • advantages
  • disadvantages
  • how can it be integrated into teaching

Strategies

List your researched strategy below, with a bit of documentation that points others in the right direction if they want to try it.

LaTeX

LaTeX is a typesetting/document preparation language, focused on producing typographically correct, page-based documents as PDF.

positive aspects

  • LaTeX is a markup language, in many ways similar to HTML or Markdown, and Pandoc offers good support for it, converting well from other markups.
  • Can produce quality PDFs, with support for page numbers, hyphenation, bibliography, references and hyperlinks.

LaTeX sample:

\section{Tools}                      
We organized the work in two spaces: a {\bf wiki} and a {\bf website}. The \href{http://beyond-social.org/wiki/index.php/Main_Page}{wiki} was established as the editorial space, while the \href{http://beyond-social.org/}{website}
  • Can be set to produce more experimental and generative outputs. (See works by Lafkon studio for an idea)
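
The Pandoc conversion mentioned above can be sketched as follows. This is a minimal illustration, not the workflow actually used for Beyond Social: the file names `article.html` and `article.tex` and the helper `pandoc_to_latex_command` are assumptions, and the conversion only runs when Pandoc is installed and the input file exists.

```python
import os
import shutil
import subprocess


def pandoc_to_latex_command(source, output):
    """Build a Pandoc call that converts an HTML file to a standalone LaTeX file."""
    return [
        "pandoc",
        "--from", "html",
        "--to", "latex",
        "--standalone",      # emit a complete document, preamble included
        "--output", output,
        source,
    ]


# Hypothetical file names, for illustration only.
command = pandoc_to_latex_command("article.html", "article.tex")
print(" ".join(command))

# Run the conversion only when Pandoc is installed and the input file exists.
if shutil.which("pandoc") and os.path.exists("article.html"):
    subprocess.run(command, check=True)
```

The resulting `.tex` file can then be styled with a document class and packages before being compiled to PDF.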

negative aspects

  • The PDFs produced look academic by default, although this can be changed.
  • Its use is outmoded and mostly restricted to academia.
  • Styling is defined by packages imported into the document, an approach that is very different from, and incompatible with, CSS. Styling a LaTeX document:
\documentclass[10pt, a4paper]{book} % document class: book, paper size: A4, font size: 10pt
\usepackage[hmargin=3.0cm, vmargin=2.0cm]{geometry} % document margins

sample output

[Image missing: sample PDF produced with LaTeX]

final remarks

Although LaTeX can be set up to produce very interesting results and can easily be integrated into the current workflow, centered around wikis, Pandoc, HTML and CSS, it is a difficult tool to work with, let alone to teach. It might confuse students and contradict our approach to setting up hybrid publishing workflows, which has been based on essential web languages (HTML and CSS) and simple tools (wikis and Pandoc). The advice is to leave LaTeX alone, although it might be an interesting avenue to explore for more experimental projects.


WeasyPrint


WeasyPrint is a visual rendering engine for HTML and CSS that can export to PDF. It aims to support web standards for printing. WeasyPrint is free software made available under a BSD license.
It is based on various libraries but not on a full rendering engine like WebKit or Gecko. The CSS layout engine is written in Python, designed for pagination, and meant to be easy to hack on. [1]

WeasyPrint can be used as a Python library or as a standalone program. So far, the remarks below refer to its use as a standalone program.
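
For comparison, use as a Python library might look like the sketch below. It relies on WeasyPrint's `HTML` and `CSS` classes and the `write_pdf` method. The wrapper function `render_pdf` and its parameter names are assumptions made for illustration, and this route has not been tested as part of this research.

```python
def render_pdf(source_url, output_path, stylesheet_path):
    """Render a web page to PDF with WeasyPrint, applying an extra stylesheet."""
    # Imported inside the function so that this file still loads on
    # machines where WeasyPrint is not installed.
    from weasyprint import HTML, CSS

    document = HTML(url=source_url)
    document.write_pdf(
        output_path,
        stylesheets=[CSS(filename=stylesheet_path)],
    )


# Example call (needs WeasyPrint, network access and a local style.css):
# render_pdf(
#     "http://beyond-social.org/wiki/index.php/Hybrid_Publishing",
#     "beyondsocial.pdf",
#     "style.css",
# )
```

Using the library rather than the standalone program would make it easier to call WeasyPrint from a server-side script, which is one of the requirements listed under the starting points.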

positive aspects

  • Uses HTML and CSS to lay out the PDF, which means a smoother learning curve from web to print.
  • Supports features such as page size, page numbers, hyphenation in several languages (with the Pyphen library) and custom typography, allowing a high level of design control over the PDF.
  • Very simple and easy-to-understand syntax; does not require proficiency with the command line.
Example:
weasyprint http://beyond-social.org/wiki/index.php/Hybrid_Publishing beyondsocial.pdf -s style.css
Example explained:
weasyprint source-html-document pdf-output -s css-file
  • -s is the flag that adds a CSS file, whose rules can override the CSS rules used in the web version

negative aspects

  • Can be difficult to install because of its dependencies. On Debian no issues were experienced; on Mac OS X the installation is still unresolved.
  • It is more than difficult to install: it is very hard. Its dependencies seem to belong to another era, with small communities and scarce documentation.

sample output

[Image missing: sample PDF produced with WeasyPrint]

style.css

html, body{
	background-color: #e0e0e0 !important;
	font-family:  "AmericanTypewriter", serif !important; /* the font must be installed on your computer; this is a placeholder, replace it with a font of your choice */
	color: #000 !important;
}
 
div#footer ul {
    list-style-type: none !important;
 }
 
@page{
	size: 8.5in 8.5in;
	background-color: white !important;
        counter-increment: page;
        font-family:  "AmericanTypewriter", monospace !important; 
	color: #000 !important;
  	margin: 1cm;
        font-size: 8pt;
}
 
 
h1{
	string-set: doctitle content();  /* not tested - not sure it is working */
	/* stores the content of h1 as "doctitle" - will be used later, in the page bottom */
}
 
img{
	width: 100%;
	page-break-inside: avoid; /* keep an image from being split across pages */
}
 
#catlinks{display: none;}
 
div#footer{
	background-color: black !important;
	color: #fff !important;	
	/*border-radius: 2cm;*/
	font-family: sans-serif !important;
	font-size: .75em !important;
	text-align: center;
	padding-bottom: .2cm;
	position: absolute;
	bottom: 0;
	width: 100%;
}
 
@page :left {
  @bottom-right{
    margin: 0;
    /* font-family: inherit; */ /* does not work */
    content: string(doctitle);      
  }
  @bottom-left{
  	margin: 0;
  	content: counter(page); 
  }  
}
 
@page :right {
  @bottom-right{
    margin: 0;
    /* font-family: inherit; */ /* does not work */
    content: counter(page);      
  }
  @bottom-left{
  	margin: 0;
  	content: string(doctitle); 
  }  
}

wkhtmltopdf

wkhtmltopdf is an open-source project very similar to WeasyPrint, with an identical workflow. Because the two are so similar, we mostly discuss the differences between them.

positive aspects

  • wkhtmltopdf is based on the WebKit rendering engine, which eases bug tracking and improves support.
  • wkhtmltopdf is very easy to install.
  • wkhtmltopdf can run JavaScript.

negative aspects

  • WebKit, and thus wkhtmltopdf, has not yet implemented as many advanced CSS printing features as WeasyPrint has.

sample output

"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" --print-media-type container.html wkhtmltopdf_book.pdf
[Image missing: sample PDF produced with wkhtmltopdf]
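
The `container.html` file in the command above suggests a single page that gathers all articles. How that file was built is not documented here, so the following is only a sketch under assumptions: the helpers `build_container` and `wkhtmltopdf_command` are hypothetical, and the article contents are placeholders.

```python
import html


def build_container(title, articles):
    """Join article HTML fragments into a single HTML document.

    `articles` is a list of (heading, body_html) pairs. The body is
    inserted as-is, so it must already be trusted, well-formed HTML.
    """
    sections = [
        '<section class="article">\n<h1>{}</h1>\n{}\n</section>'.format(
            html.escape(heading), body
        )
        for heading, body in articles
    ]
    template = (
        '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n'
        "<title>{}</title>\n</head>\n<body>\n{}\n</body>\n</html>\n"
    )
    return template.format(html.escape(title), "\n".join(sections))


def wkhtmltopdf_command(source, output):
    """Build the wkhtmltopdf call shown above, with print-media CSS enabled."""
    return ["wkhtmltopdf", "--print-media-type", source, output]


# Placeholder articles, for illustration only.
container = build_container("Beyond Social", [
    ("Hybrid Publishing", "<p>First article.</p>"),
    ("Tools", "<p>Second article.</p>"),
])
print(container.count("<section"), "articles gathered")
print(" ".join(wkhtmltopdf_command("container.html", "wkhtmltopdf_book.pdf")))
```

Saving the returned string as `container.html` and running the printed command would then produce one PDF containing every article.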

Usage for teaching

Because it is free and open source, this tool appears very suitable for use in the hybrid publishing workflow that is currently being taught in several courses. Because it adheres to most HTML and CSS rules and can be used simply from the command line, students are not forced to learn yet another language.


built-in browser PDF prints

Many browsers nowadays have built-in PDF rendering engines. Once an HTML page has been created on the server, users and/or printers can simply press print in their browser and choose to export a PDF.

positive aspects

  • Less work is needed on the server.
  • Users can easily customize the look of their PDF.

negative aspects

  • Publishers have no guarantee that users see the correct layout (Chrome, for example, does this very poorly).
  • It requires more technical know-how from the user.

sample output

Below is an example print from Chrome:

[Image missing: example print from Chrome]

Usage for teaching

This works quite poorly, so we see no place for it in education. Recent versions of Chrome (post-WebKit) actually perform worse than before, so there is little hope for improvement in the future.

references

  1. “WeasyPrint Documentation — WeasyPrint 0.22 Documentation.” http://weasyprint.org/docs/.