November 2011

Typesetting Math for the Web

Typography matters. Bad typography can be as much of a barrier to the reader as bad writing. Conversely, good typography can simplify the presentation of complex content. This is especially true of mathematical formulae.

Unfortunately, there doesn’t seem to be a standard way to render formulae decently for the web, and basic HTML simply isn’t expressive enough. MathML seems to be a fair attempt at a standard, but like any XML language, it is overly verbose and hard to read. Also (not an entirely fair measure of quality, but nevertheless of practical concern), there doesn’t seem to be MathML support in WordPress, nor any easy way to enable it.

Instead, developers seem to have rallied around the idea of rendering LaTeX to an image that can then be included in the page. This at least makes the formulae viewable on practically any device, but if you care even the slightest bit about typography, you will lie awake at night because of the alignment, spacing, typeface, and scaling issues that come from trying to make an image not look like an image.

These issues are not inherent to the rendering, but rather a consequence of the way images and text interact in HTML. That said, the rendering has its own alignment and spacing issues, which vary between rendering services. To my eye, the best service is the Google Visualization API, which has LaTeX rendering as an undocumented feature. That’s not to say that Google’s rendering is without issues: it has a tendency to place symbols too close together, and it offers no apparent way to influence even basic formatting.

An alternative is MathJax, which uses a combination of HTML, CSS and web fonts to render formulae as text. In theory, this should alleviate many of the issues of the image-based approach. In practice, the typography is horrendous; most notably, nearly every character, number and symbol is italicized. Bad!

As a compromise, I have settled for using basic HTML formatting when possible (e.g. for subscripts), which makes inline formulae appear fairly coherent with their surroundings, and a LaTeX plugin for WordPress for everything else. This seems a fair trade-off between convenience and quality, but a compromise nonetheless.

The TV Show Rerun Paradox

We all know the feeling: TV stations seem to be showing the same episodes of your favorite show over and over again. While the conspiracy theorist in me would love to believe this is deliberate, there’s actually a very good reason for it.

Remember when you were in school, and despite the apparent unlikelihood (after all, there are quite a few days in a year), two of your classmates had their birthdays on the very same date? Actually, this isn’t unlikely at all; in a group of 23 or more people there is a better than 50% chance that at least two of their birthdays will coincide. For 57 or more people the chance is over 99%! This is commonly known as “the birthday paradox”, although it’s not really a paradox at all.

The same principle applies to TV show episodes, and since most series have far fewer than 365 episodes, the probabilities are actually even higher. We’ll do the calculations for a few well-known shows, but first let’s see how it works.

Let E be the set of episodes for a given show. We denote the number of episodes by |E| (read: “the size of E”). For the episodes you have seen to all be different, they must be pairwise distinct. Let’s calculate the probability p_E(n) that n episodes of E, each chosen independently and uniformly at random, are pairwise distinct.

p_E(n)=1\ \cdot\ \left(1-\frac{1}{|E|}\right)\ \cdot\ \left(1-\frac{2}{|E|}\right)\ \cdots\ \left(1-\frac{n-1}{|E|}\right)\ =\ \frac{|E|!}{|E|^n(|E|-n)!}

This calculation can be understood as the probability of choosing an unseen episode n consecutive times: the first pick is always new, the second avoids the one episode already seen with probability 1 - 1/|E|, and so on. The probability of seeing at least one episode twice when watching n episodes of E is then given by 1 - p_E(n). Let’s study this for a few well-known TV shows.
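
As a rough sketch of the formula above, the product and the closed form can both be written in a few lines of Python. The function names `p_distinct` and `p_distinct_closed_form` are made up for this illustration, and the code assumes every episode is equally likely to be picked each time, independently of earlier picks.

```python
from math import factorial, prod


def p_distinct(num_episodes: int, n: int) -> float:
    """p_E(n): probability that n random picks out of num_episodes are pairwise distinct."""
    if n > num_episodes:
        return 0.0  # more picks than episodes, so a repeat is certain
    # The k-th pick (counting from 0) must avoid the k episodes already seen.
    return prod((1 - k / num_episodes for k in range(n)), start=1.0)


def p_distinct_closed_form(num_episodes: int, n: int) -> float:
    """The same quantity via |E|! / (|E|^n (|E| - n)!), using exact integers."""
    if n > num_episodes:
        return 0.0
    falling = factorial(num_episodes) // factorial(num_episodes - n)
    return falling / num_episodes**n


# Sanity check against the classic birthday case: 365 "episodes", 23 picks.
print(f"{1 - p_distinct(365, 23):.3f}")  # about 0.507
```

The two functions should agree up to floating-point rounding; the product form is the one to prefer for large |E|, since it never builds huge factorials.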

For Friends, which has 236 episodes, the number of episodes required for a 50% chance of a repeat is 19, and watching 46 episodes gives a 99% chance. For House, currently at 162 episodes, these numbers are 16 and 38 episodes respectively. For The Simpsons, America’s longest-running sitcom, currently at 492 episodes, watching 27 episodes gives a 50% chance of a repeat sneaking in, and watching 67 episodes brings this up to 99%!
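
The figures above can be reproduced with a small search: keep multiplying in the next factor of the product until the chance of a repeat reaches the target. This is only a sketch; the helper `episodes_for_repeat_chance` and the `SHOWS` table are names invented here, the episode counts are the ones quoted in this post, and airings are assumed to be independent and uniformly random.

```python
def episodes_for_repeat_chance(num_episodes: int, target: float) -> int:
    """Smallest n for which the chance of at least one repeat is >= target."""
    p_all_distinct = 1.0  # invariant: equals p_E(n) at the top of each loop
    n = 0
    while 1 - p_all_distinct < target:
        # The next pick must avoid the n episodes already seen.
        p_all_distinct *= 1 - n / num_episodes
        n += 1
    return n


# Episode counts as quoted in the text (late 2011).
SHOWS = {"Friends": 236, "House": 162, "The Simpsons": 492}

for title, count in SHOWS.items():
    half = episodes_for_repeat_chance(count, 0.50)
    near_certain = episodes_for_repeat_chance(count, 0.99)
    print(f"{title}: {half} episodes for 50%, {near_certain} for 99%")
```

Calling `episodes_for_repeat_chance(365, 0.50)` gives the familiar 23 from the birthday version of the problem.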

So the next time you tune in to your favorite syndicated TV show and are disappointed that you’ve already seen the episode, you can take comfort in the fact that it’s not the network’s fault; it’s just math.

Extra credit: The birthday paradox also explains why your iPod shuffle apparently keeps choosing the same songs to play. That choice also seems to be influenced, though, by the law that any random choice within a playlist will pick the worst song in the list.