My evolved news crawler :) v1.8

Dec 17, 2015 java english

Well, I needed to kill some time during this strange intermission period between jobs. My original 1-hour hack (less than 100 lines of code) evolved into something more flexible and, I hope, more useful. In the end my father is very happy: instead of 1 newspaper summary he now receives

He was also kind enough to email me (from the Pacific) about some early bugs, like duplicate entries and formatting issues, which I tried to resolve. It is always fun to have someone use your code, isn't it?

Of course, in order to honor my Java development heritage, I had to create my own mini framework / crawling logic for this small tool - all Java devs do it! It's not that complex actually, and now I can easily add more crawlers for similar sites.
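To give a rough idea of what such a mini framework can look like, here is a minimal sketch. All names in it (Article, SiteCrawler, crawlAll) are made up for illustration and are not taken from the actual code base; the duplicate filtering by URL is likewise just one assumed way to handle the duplicate entries mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names come from the real project.
class CrawlerSketch {

    /** A crawled article; equality is by URL so duplicates can be dropped. */
    static final class Article {
        final String title;
        final String url;

        Article(String title, String url) {
            this.title = title;
            this.url = url;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Article && ((Article) o).url.equals(url);
        }

        @Override
        public int hashCode() {
            return url.hashCode();
        }
    }

    /** Contract that each site-specific crawler would implement. */
    interface SiteCrawler {
        String siteName();

        List<Article> crawl(int maxArticles);
    }

    /** Runs every crawler and keeps the first occurrence of each URL. */
    static List<Article> crawlAll(List<SiteCrawler> crawlers, int maxArticles) {
        Set<Article> unique = new LinkedHashSet<>();
        for (SiteCrawler crawler : crawlers) {
            unique.addAll(crawler.crawl(maxArticles));
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // A stub crawler standing in for a real site implementation.
        SiteCrawler stub = new SiteCrawler() {
            public String siteName() {
                return "example";
            }

            public List<Article> crawl(int maxArticles) {
                List<Article> all = Arrays.asList(
                        new Article("First", "http://example.com/1"),
                        new Article("First again", "http://example.com/1"),
                        new Article("Second", "http://example.com/2"));
                return all.subList(0, Math.min(maxArticles, all.size()));
            }
        };
        for (Article a : crawlAll(Arrays.asList(stub), 10)) {
            System.out.println(a.title + " -> " + a.url);
        }
    }
}
```

With a layout like this, supporting a new site would mostly mean writing one more SiteCrawler implementation and adding it to the list.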

Currently I support the following sites (Greek for the time being), but I will keep adding more:

I have also added 2 optional command line arguments.

A flag to control the maximum number of articles to be crawled and included in the final report.
A flag to control the creation of zip files, one per HTML report. That way I manage to reduce the size even more, so when I email them the payload is far smaller :).
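The zip step can be sketched with nothing more than the JDK's java.util.zip classes. This is only an illustration under my own assumptions: the class name ReportZipSketch, the method zipReport and the entry name report.html are hypothetical, and the real tool may well write its archives differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: names are illustrative, not the project's own.
class ReportZipSketch {

    /** Wraps a single HTML report in an in-memory zip archive. */
    static byte[] zipReport(String entryName, String html) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(html.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            // Not expected for an in-memory stream, but rethrow just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive markup, as in a typical HTML report, compresses well.
        StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 0; i < 200; i++) {
            html.append("<p><a href=\"http://example.com/").append(i)
                .append("\">Article</a></p>");
        }
        html.append("</body></html>");

        byte[] zipped = zipReport("report.html", html.toString());
        System.out.println("html chars: " + html.length()
                + ", zip bytes: " + zipped.length);
    }
}
```

HTML reports are full of repeated tags, which is why zipping them tends to shrink the email payload noticeably.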

You can find more on the official GitHub page. By the way, I try to keep my documentation up to date.

There you will find all the material required to run or compile this small utility, plus any requirements.

I will soon add a small section for those (if anyone is interested) who would like to plug in extra crawling implementations for other RSS-based sites.
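Until that section exists, here is a minimal sketch of the kind of work an RSS-based crawler has to do: pull the title and link out of every item in a feed. It uses the DOM parser that ships with the JDK and assumes a plain RSS 2.0 layout; the class RssSketch and the method readItems are hypothetical names, and the actual project may parse feeds in a different way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: names are illustrative, not the project's own.
class RssSketch {

    /** Returns "title | link" for each <item> element in the feed. */
    static List<String> readItems(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(text(item, "title") + " | " + text(item, "link"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }

    /** Text of the first descendant element with this tag, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String feed = "<rss version=\"2.0\"><channel><title>Demo feed</title>"
                + "<item><title>First story</title>"
                + "<link>http://example.com/1</link></item>"
                + "<item><title>Second story</title>"
                + "<link>http://example.com/2</link></item>"
                + "</channel></rss>";
        for (String line : readItems(feed)) {
            System.out.println(line);
        }
    }
}
```

A real crawler would fetch the feed over HTTP first; the sketch takes the XML as a string to keep the example self-contained.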

Of course there are a lot of things I could do to improve the utility, and most probably I will continue to add crawlers for more sites and make the design more 'modular'.

Happy crawling!