How to make Debian font rendering DPI-perfect?

It has been quite a long time since I installed Debian Jessie as the primary OS on my laptop, with the option to boot Windows 10 when I wish, especially for those magical Adobe Creative Cloud applications, which I am slowly replacing with Inkscape and GIMP. One thing I constantly noticed on Debian was terrible font rendering. When I say terrible, I mean it. Worse, many new Linux users move to another “neighbourhood” once they enter the “uglyish” UI world of a fresh Debian install. As some philosophers state, art and beauty stimulate the better parts of ourselves, so a nice UI helps us achieve better things with the operating system as a tool. Here I want to share how I put an end to this and achieved great font rendering in the Debian Jessie Xfce environment: pixel perfect, native Linux style, not some ugly tuning, tinkering, or OS X imitation. Of course, thanks go to the community who have provided the great and easy-to-install tool “infinality”.

First things first: we need to know our screen’s native DPI. To do that we can use the following site:

We need to enter the screen’s native resolution and its diagonal size. All of this information should be available on the manufacturer’s website or in the product manual.
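The same number can be worked out by hand: DPI is the length of the screen diagonal in pixels divided by the diagonal in inches. Below is a minimal Python sketch of that arithmetic. The function name and the 15.6-inch, 1366x768 panel are only illustrative assumptions, not my actual hardware.

```python
import math


def screen_dpi(width_px, height_px, diagonal_in):
    """Return pixels per inch from the native resolution and the diagonal in inches."""
    # Pythagoras gives the diagonal length in pixels.
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in


# Hypothetical example panel: 1366x768 pixels on a 15.6-inch diagonal.
print(round(screen_dpi(1366, 768, 15.6)))  # prints 100
```

Rounding to the nearest whole number is usually enough, since the font settings take an integer DPI value.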

Next, once we know the correct DPI, we need to set it in Settings / Appearance / Fonts (Xfce desktop environment). Once we do that, fonts and visual elements will increase in size. If the fonts look too big, go back to the same settings and decrease the font size.

Now we are ready to use the external tool “infinality”, available at:

Please follow the instructions on the website to install the tool. They are copied from the website below:


(debian repos)
echo "deb trusty main" | sudo tee /etc/apt/sources.list.d/infinality.list
echo "deb-src trusty main" | sudo tee -a /etc/apt/sources.list.d/infinality.list
sudo apt-key adv --keyserver --recv-keys E985B27B
sudo add-apt-repository ppa:no1wantdthisname/ppa
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install fontconfig-infinality

Run the setup (preferably choose the Linux style when prompted):

sudo bash /etc/fonts/infinality/ setstyle

Once all these instructions are completed, we need to set the DPI manually in the infinality settings file:

sudo -H gedit /etc/profile.d/

In the file, find “DPI=” and set it to your screen’s true DPI (in my case it was 100, not 96).

Once that is done, save the file and reboot your system (or log out and log back in). Finally, make sure the hinting option under Settings / Appearance / Fonts is set to Slight. This aligns font pixels nicely to screen pixels without breaking the natural font style. Full hinting, on the other hand, gives the best possible alignment to screen pixels and makes fonts very sharp, but it breaks the intended look of the font and pushes characters out of shape. That is why OS X has quite blurry fonts: it is the only way to fully preserve the look and feel of the font. Windows, on the other hand, has very sharp fonts, yet fonts that still keep their true shape. This makes them good for long hours of reading as well as for aesthetics.


Hopefully this helps someone unleash the Debian beast and use it as a desktop OS for productive work.

What can word count analysis tell us?

So I have been playing with Python lately, writing stuff without any preset purpose, or at least not an explicitly stated one. I decided to experiment with word count algorithms and build a tool that could analyse the word counts on the online front page of a particular news agency, where all the gossip flourishes. I thought this could be interesting, not just because experimentation and playfulness are the holy grail of all discoveries, but because I could learn something from the experience.

So I started with a couple of libraries, lxml and BeautifulSoup, which unfortunately did not deliver as expected. lxml is not a perfect HTML parsing tool, and BeautifulSoup did not give me what I was looking for either, i.e. the visible text of the web page, with no tags and no HTML gibberish. Later I switched to html5lib, which was a bit better but far from perfect, and a bit slower. Since none of these tools gave me the web page as simple text, I had to write some manual string manipulation to do it for me. When working with data that is supposed to tell a story, the priority should always be quality, otherwise the story might be quite misleading.
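To illustrate what “visible text only” means, here is a minimal sketch that uses Python’s standard html.parser instead of the libraries named above. It is not the code I actually wrote; the class and function names are made up for the example, and it assumes that skipping script, style, noscript and title elements is enough for the page at hand.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping elements whose content is not shown on the page."""

    SKIPPED = {"script", "style", "noscript", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def visible_text(html):
    """Return the text a reader would see, with tags and scripts removed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


SAMPLE = (
    "<html><head><title>T</title><style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><script>var x = 1;</script>"
    "<p>big &amp; <b>bold</b> world</p></body></html>"
)
print(visible_text(SAMPLE))
```

Joining the chunks with spaces keeps words from neighbouring tags apart, at the cost of splitting a word that is only partly wrapped in a tag. Badly broken HTML may still need the kind of manual clean-up described above.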

After solving some of the technical issues, there was another challenge: language-specific aspects that are relevant to a particular website. For example, there are words such as “the”, “is”, “at”, or verbs that do not really carry any meaning without knowing the nouns that surround them. So for this first code iteration I kept a possible noun and verb analysis in mind, but focused on nouns by eliminating verbs that do not tell the story alone. However, since I was analysing a couple of websites, one in English and one in Lithuanian, I realised that different languages pose different challenges. For example, in Lithuanian the words “namas”, “namui” and “name” all refer to the same thing, i.e. a house (“namas”). The endings change with the role the noun plays in the sentence. For example, if you say what house, where the house is, or who has the house, in Lithuanian the same noun takes a different ending each time because of its relation to the verb. Anyway, I am not going to analyse language aspects much here, except to highlight what the code needs to be aware of; otherwise “namas” and “namui” would be counted as separate words, which could potentially ruin the analysis.
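The two ideas above, dropping words that say nothing on their own and merging different endings of the same noun, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables are deliberately tiny, all names are invented for the example, and a real analysis would need a separate and much larger table for each language.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative tables (assumptions, not the real dictionaries).
EN_STOP_WORDS = {"the", "is", "at"}
LT_BASE_FORMS = {"namui": "namas", "name": "namas"}  # inflected form -> base form


def count_words(text, stop_words=frozenset(), base_forms=None):
    """Count words after dropping stop words and merging inflected forms."""
    base_forms = base_forms or {}
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    merged = (base_forms.get(word, word) for word in words)
    return Counter(word for word in merged if word not in stop_words)


print(count_words("Namas, namui, name.", base_forms=LT_BASE_FORMS))
print(count_words("The star is at the party.", stop_words=EN_STOP_WORDS))
```

With the Lithuanian table, all three forms are counted as “namas”. Note that the mapping has to be chosen per language: applying the Lithuanian table to English text would wrongly turn the English word “name” into “namas”.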

Once I had set up the minimum required dictionaries and mappings for English and Lithuanian, I developed an algorithm that lets you enter a website and see specific statistics about its word count. For data visualisation I used the Bokeh library, which is a really great ongoing development, probably opening some new horizons for open source data visualisation applications.

OK, so below are the results for


And for

In my next blog post I will try to continue answering the question of what this data can tell us. Although we can already see a trend in the focus of these two distinct media (lrytas focuses on Lithuania and Vilnius, generally concepts of community structures; dailymail focuses on celebrity, star, dress and so on, i.e. individuals with status), there are still technical questions to be resolved, primarily around verbs and nouns. In this example numerous verbs were removed based on my subjective judgement. I will probably aim to separate all nouns from verbs and then try to analyse the data in terms of what follows or precedes each noun. This should be done with knowledge of the language, so there is still research to be done.

So far lessons learned:

Distinct languages have different features that need to be taken into account (e.g. the issue with Lithuanian word endings).

Not all words alone tell a story (e.g. verbs).

There are still concerns with Python HTML-to-text libraries. They are not perfect, and a bit of coding is still needed to get close to a perfect state, i.e. one where the library could deal with everything you throw at it, regardless of whether the HTML was written properly.

Also, perhaps plugging language databases into the algorithm would help with mapping words that have the same meaning.

Data sampling issues. One sample might not capture the full story, since news is updated throughout the day. Perhaps sampling stories at specific times of day throughout the week and aggregating the results would yield something else.
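The aggregation idea in the last point could look roughly like the sketch below. It assumes each sample has already been reduced to a word count (a Counter) and simply sums them; the function name and the sample values are hypothetical and not taken from my results.

```python
from collections import Counter


def aggregate(samples):
    """Sum the word counts of several samples into one Counter."""
    total = Counter()
    for counts in samples:
        total.update(counts)  # update() adds counts instead of replacing them
    return total


# Hypothetical counts from two visits to the same front page.
morning = Counter({"vilnius": 4, "lietuva": 2})
evening = Counter({"vilnius": 1, "seimas": 3})
print(aggregate([morning, evening]).most_common(2))
```

Summing treats every sample equally. Weighting by time of day, or counting in how many samples a word appears at all, would be alternative design choices.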

Cleaning “your” private data: Gmail inbox

So you have decided it is time to clean “your” Gmail inbox because it is becoming cluttered with long-forgotten or marketing emails that just obscure the real messages from real people. Or maybe you are concerned about your privacy and would like to reclaim some of your data. Whatever the reason, Google has a function that lets you download your full inbox as a zip archive. Later this file can be opened quite easily using, for example, the open source Mozilla Thunderbird email client.
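The downloaded archive can also be inspected with a few lines of Python. The sketch below assumes the zip has been unpacked and contains the mail in mbox format, which Python’s standard mailbox module can read. The function name is invented for the example, and the path you pass in depends on where you unpacked the archive.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def senders(mbox_path):
    """Count messages per sender address in an mbox file."""
    counts = Counter()
    for message in mailbox.mbox(mbox_path):
        # str() guards against headers that are not returned as plain strings.
        _, address = parseaddr(str(message.get("From", "")))
        counts[address.lower()] += 1
    return counts
```

Calling senders() with the path of the unpacked mbox file and then most_common(10) on the result shows who fills the inbox the most, which is a quick check that the archive is readable before anything is deleted.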

Below is the Google link to raise the request:

Make sure you select Gmail in the list and deselect all the other services.


Once you have downloaded all of your data (it might take a while for Google to prepare your archive) and moved it to a safe place, you can go ahead and delete all your emails from the Gmail inbox. To do that, find the folder called “All Mail”, tick the checkbox in the left corner, click the pop-up link to select all emails in this folder, and click delete. After that, do the same for the bin, and it is all clean.

Now, you might be wondering why I put quotes around the word “your”. Well, the promise of the Gmail service is very straightforward: you get a free, reliable service, i.e. email storage and an email address, in exchange for your privacy, i.e. you have to share your data for marketing and advertising purposes.

If you are concerned about privacy and would like a more robust solution, try Protonmail, which uses public key/private key encryption and is based in Switzerland. Public key/private key encryption makes sure that only you and the receiver at the other end (using the same encryption method) are able to read the email being exchanged. The service uses a well-known business model in which premium accounts with extra storage pay for all the free accounts, and it also accepts donations. I must warn you though: if you decide to switch from Gmail to Protonmail, the trade-off will be convenience and possibly availability of the service, due to possible attacks from governments.

Python HTML data scraping (example)

I’m surprised how great Python is and how much you can do with this programming language. It is not just useful for data analytics, data science or statistics, but also for various other data-related activities, such as pulling data from a website for a variety of analyses. In this example I have targeted the Indeed job board website, a very nicely written job board application. The purpose of the code is to demonstrate how Python can be used to automatically get specific data from this website, in this case a list of job titles and company names. The data is first fetched with an HTML request using a specific library and stored in an object. It is then converted to one long string and analysed to filter out what is needed from what is not. This all happens over a couple of iterations. The library used to get the HTML page (lxml) has some functionality for targeting specific XPath tags, which unfortunately didn’t work as expected, so to save time I simply used string manipulation functions and a bit of HTML string analysis to achieve the desired result. The final product of this code is a CSV file with a long list of rows (depending on the parameters), each with a job and a company column. An empty database.csv file has to be created locally for this to work. The code is below, with the IPython/Jupyter notebook attached:
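For readers without the notebook, the string-manipulation approach described above can be sketched as follows. This is not the original code: it skips the fetching step and works on a page that is already a string, the HTML markers are hypothetical placeholders rather than Indeed’s real markup, and the helper names are invented for the example.

```python
import csv
from html import unescape


def between(text, start, end):
    """Yield every substring found between the markers start and end."""
    position = 0
    while True:
        begin = text.find(start, position)
        if begin == -1:
            return
        begin += len(start)
        finish = text.find(end, begin)
        if finish == -1:
            return
        yield text[begin:finish].strip()
        position = finish + len(end)


def extract_jobs(page):
    """Return (job, company) pairs found in the page string."""
    rows = []
    # Hypothetical markers: placeholders, not the real markup of any job board.
    for card in between(page, '<div class="job">', "</div>"):
        title = next(between(card, '<h2 class="title">', "</h2>"), "")
        company = next(between(card, '<span class="company">', "</span>"), "")
        rows.append((unescape(title), unescape(company)))
    return rows


def save_jobs(rows, path="database.csv"):
    """Write the rows to a CSV file with job and company columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["job", "company"])
        writer.writerows(rows)


SAMPLE_PAGE = (
    '<div class="job"><h2 class="title">Data Analyst</h2>'
    '<span class="company">Acme Ltd</span></div>'
    '<div class="job"><h2 class="title">Python Developer</h2>'
    '<span class="company">Example &amp; Co</span></div>'
)
print(extract_jobs(SAMPLE_PAGE))
```

Unlike the original code, this sketch creates the CSV file itself when save_jobs() is called. Plain marker matching like this breaks as soon as a job card contains a nested div, which is exactly why the markers have to be tailored to each site.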


Below is all the data in the CSV file (LibreOffice Calc/MS Excel):

The next step could be getting additional data parameters such as salary, posting date etc. This could potentially produce interesting data discovery insights. Linking to other sites such as Glassdoor could also add value. Although it was only an experiment, the code could potentially help to build a job board aggregation system that fetches data from various job boards and presents it in one place. The challenge would be to analyse the code of each specific website and tailor the scraper to it, so that the data comes out as clean and accurate as possible.