MongoDB is the fastest-growing NoSQL database. Beyond the general advantages of NoSQL technology, its JSON-style querying and easy installation make it especially attractive. In this article I'm going to show you how to save tweets into MongoDB using Java.
Here I'm going to use the Twitter Search API to get tweets in the form of JSON and save them to the database. Previously I tried this with MySQL; I think MongoDB offers some very real advantages.
We don't have to create a schema for the database where we are going to store the tweets. If you look closely at the structure of the data returned by the Search API, it would take at least five tables to model it in an RDBMS.
Each tweet has a different set of attributes associated with it. Some contain mention information, some contain retweet information, geolocation, URLs and a lot more. All of this can be stored without designing a complex database schema.
MongoDB drivers have built-in support for JSON. Since JSON is the primary choice of many web services, this is a great advantage.
Twitter may add or remove properties from the result returned by the Search API. This will not affect our program.
Querying the database becomes easier and more efficient: we don't need joins any more.
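To make the no-joins point concrete, here is a minimal sketch of how one tweet might look as a single nested document. The field names are illustrative only, not the full Search API schema, and plain Java collections stand in for a BSON document:

```java
import java.util.List;
import java.util.Map;

public class TweetDocument {
    // One tweet as a single nested document (illustrative fields only).
    static Map<String, Object> sampleTweet() {
        return Map.of(
            "text", "Trying out MongoDB",
            "from_user", "alice",
            "entities", Map.of(
                "hashtags", List.of("mongodb", "nosql"),
                "user_mentions", List.of("bob")
            )
        );
    }

    public static void main(String[] args) {
        Map<String, Object> tweet = sampleTweet();
        // In an RDBMS the hashtags would live in a separate table and
        // require a join; here they are read straight off the document.
        Map<String, Object> entities = (Map<String, Object>) tweet.get("entities");
        System.out.println(entities.get("hashtags"));
    }
}
```

Because mentions, hashtags and URLs nest inside the tweet itself, a single lookup returns the whole record.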
You can download the Java driver for MongoDB here. The following simple code is enough to do our task.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.util.JSON;

public class Main {
    public static void main(String args[]) throws Exception {
        System.setProperty("java.net.useSystemProxies", "true");

        // Connecting to MongoDB
        Mongo m = new Mongo();
        DB db = m.getDB("twitter");
        DBCollection coll = db.getCollection("tweets");

        // Fetching tweets from Twitter
        String urlstr = "http://search.twitter.com/" + "search.json?q=mongodb";
        URL url = new URL(urlstr);
        URLConnection con = url.openConnection();
        BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream()));
        int c;
        StringBuffer content = new StringBuffer();
        while ((c = br.read()) != -1) {
            content.append((char) c);
        }

        // Inserting tweets into the database
        BasicDBObject res = (BasicDBObject) JSON.parse(content.toString());
        BasicDBList list = (BasicDBList) res.get("results");
        for (Object obj : list) {
            coll.insert((DBObject) obj);
        }
        m.close();
    }
}
To verify, log into the mongo shell and type the following commands; you will see all the saved tweets in the form of JSON:

    use twitter
    db.tweets.find()
In April 2012, several stories were published about a mysterious malware attack shutting down computer systems at businesses throughout Iran.
Several articles mentioned that a virus named Wiper was responsible. Yet, no samples were available from these attacks, causing many to doubt the accuracy of these reports.
Following these incidents, the International Telecommunication Union (ITU) asked Kaspersky Lab to investigate the incidents and determine the potentially destructive impact of this new malware.
After several weeks of research, we failed to find any malware that shared any known properties with Wiper. However, we did discover the nation-state cyber-espionage campaign now known as Flame and later Gauss.
It is our firm opinion that Wiper was a separate strain of malware that was not Flame. Although Flame was a highly flexible attack platform, we did not see any evidence of very destructive behavior. Given the complexity of Flame, one would expect it to be used for long-term surveillance of targets instead of direct sabotage attacks on computer systems. Of course, it is possible that one of the last stages of the surveillance was the delivery of a Wiper-related payload, but so far we haven't seen this anywhere.
So, months later, we are left wondering: Just what was Wiper?
Enter Wiper
During the investigation of the mysterious malware attack in April, we were able to obtain and analyze several hard drive images that were attacked by Wiper. We can now say with certainty that the incidents took place and that the malware responsible for these attacks existed in April 2012. Also, we are aware of some very similar incidents that have taken place since December of 2011.
The attacks mostly took place in the last 10 days of the month (between the 21st and 30th), although we cannot confirm that this was due to a special function being activated on certain dates.
The creators of Wiper were extremely careful to destroy absolutely every single piece of data which could be used to trace the incidents. So, in every single case we've analyzed, almost nothing was left after the activation of Wiper. It's important to stress "almost nothing" here, because some traces did remain that allowed us to get a better understanding of the attacks.
From some of the destroyed systems we were lucky enough to recover a copy of the registry hive. The registry hive did not contain any malicious drivers or startup entries. However, we came up with the idea to look into the hive slack space for deleted entries. This is what we found:
Interestingly, on 22 April, just before this system went down, a specific registry key was created and then deleted. The key was a service named "RAHDAUD64". It pointed to a file on disk named "~DF78.tmp", in the "C:\WINDOWS\TEMP" folder.
The moment we saw this, we immediately recalled Duqu, which used filenames of this format. In fact, the name Duqu was coined by the Hungarian researcher Boldizsár Bencsáth from the CrySyS lab because it created files named "~dqXX.tmp".
We tried to recover the file "~DF78.tmp" from the disk, but found that the physical space where it resided was filled with garbage data.
We found the same wiping pattern in several of the other systems we analyzed: a service named "RAHDAUD64" which was deleted just before its file was wiped and filled with garbage data. In these other systems, the RAHDAUD64 service pointed to different filenames, such as "~DF11.tmp" and "~DF3C.tmp". So it's possible the names were random.
Another peculiarity of the wiping process was a specific pattern which was used to trash the files on disk:
Most of the wiped files contain this specific pattern repeated over and over. Interestingly, the malware did not overwrite the entire file: in some cases portions of a file remained intact, but the file headers were always destroyed first. This was probably a function of file size, as the wiping algorithm was designed to destroy as many files as possible, as quickly as possible.
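The header-first, repeating-pattern behavior can be sketched on an in-memory buffer. This is a simulation only; the pattern bytes below are placeholders, not the actual Wiper pattern, which is known only from the recovered disks:

```java
import java.util.Arrays;

public class WipeSketch {
    // Overwrite the first `headerLen` bytes of a buffer with a repeating
    // pattern, mimicking a header-first wipe of a file's contents.
    static void wipeHeader(byte[] file, byte[] pattern, int headerLen) {
        int n = Math.min(headerLen, file.length);
        for (int i = 0; i < n; i++) {
            file[i] = pattern[i % pattern.length];
        }
    }

    public static void main(String[] args) {
        byte[] file = new byte[16];
        Arrays.fill(file, (byte) 0x41);                       // pretend file content
        wipeHeader(file, new byte[]{ (byte) 0xDE, (byte) 0xAD }, 8);
        System.out.println(Arrays.toString(file));            // header trashed, tail intact
    }
}
```

Destroying only each file's header is far cheaper than overwriting whole files, yet still renders most formats unreadable, which matches the speed-over-thoroughness trade-off described above.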
Based on the pattern that we know had been used when wiping files, we collected Kaspersky Security Network (KSN) statistics on which files had been destroyed.
In an attempt to reconstruct the Wiper algorithm we came up with the following sequence:
Searching for and wiping files based on their extensions. List of file extensions:
accdb, acl, acm, amr, apln, asp, avi, ax, bak, bin, bmp, cdx, cfg, chk, com, cpl, cpx, dat, db, dbf, dbx, dll, dmp, doc, docx, dot, drv, dwg, eml, exe, ext, fdb, gif, h, hlp, hpi, htm, html, hxx, ico, inc, ini, jar, jpg, js, json, lnk, log, lst, m4a, mid, nls, one, pdf, pip, pnf, png, pps, ppt, pptx, pro, psd, rar, rdf, resources, rom, rpt, rsp, sam, scp, scr, sdb, sig, sql, sqlite, theme, tif, tiff, tlb, tmp, tsp, txt, vbs, wab, wab~, wav, wma, wmdb, wmv, xdr, xls, xlsx, xml, xsd, zip
Searching for and wiping all files in certain folders (e.g. in Documents and Settings, Windows, Program Files) and on all available USB drives connected to the computer.
Wiping disk sectors (possibly using a bootkit module).
Wiping a disk that is several hundred gigabytes in size might take hours. So the creators of the malware were careful to select wiping algorithms that could achieve maximum efficiency. For example, let's take a look at the following disk which was wiped by Wiper. We've used a statistical representation (Shannon entropy in blocks of 256K) to represent entropy on disk. The lighter areas have higher entropy, the darker areas lower. The areas in red have very high entropy (highly random data).
As you can see, Wiper managed to do a pretty good job of destroying most of the disk. One can observe a red-filled stripe at the top which indicates an area that has been cleaned well. Although no clear pattern is visible, a large amount of the disk has been filled with unusable data. The wiping operation obviously focused on the beginning of the disk, then it affected the middle of the disk, before the system finally crashed.
Another view can be obtained by looking for sectors which have been filled with the known "%PNG / iHDR" pattern. Red marks the sector blocks which have been overwritten with this pattern:
As you can see, more than three-quarters of the disk was affected by the wiper, with the vast majority of the data being lost forever.
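The entropy measure used for these disk maps is straightforward to compute; a minimal version over a single block of data:

```java
public class BlockEntropy {
    // Shannon entropy of a block, in bits per byte (0 to 8) --
    // the same measure used for the disk maps described above.
    static double entropy(byte[] block) {
        int[] counts = new int[256];
        for (byte b : block) counts[b & 0xFF]++;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / block.length;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[256 * 1024];                // constant data: entropy near 0
        byte[] mixed = new byte[256];
        for (int i = 0; i < 256; i++) mixed[i] = (byte) i;  // every byte value once: near 8
        System.out.println(entropy(zeros));
        System.out.println(entropy(mixed));
    }
}
```

Wiped-then-overwritten regions score near the maximum of 8 bits/byte (the red areas), while untouched sparse or zeroed regions score near 0 (the dark areas).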
In some cases, Wiper misfired - for instance, we saw one 64-bit system where Wiper failed to activate. In this case, we discovered two files in %TEMP% which were wiped with the known PNG/iHDR pattern, but not the whole disk:
We presume these two files, out of the thousands in %TEMP%, must have been destroyed because they contained data important to the Wiper attack. In another system we analyzed, in addition to these 20K-ish files, we saw two 512-byte files named "~DF820A.tmp" and "~DF9FAF.tmp", which have also been wiped beyond recovery.
Interestingly enough, on some systems we noticed that all PNF files in the Windows INF folder were wiped with a higher priority than other files. Once again, this is a connection to Duqu and Stuxnet, which kept their main body in encrypted .PNF files.
If the purpose of the attackers was to make sure the Wiper malware could never be discovered, it makes sense to first wipe the malware components, and only then to wipe other files in the system which could make it crash.
Links with Flame
While searching for this elusive malware, we came across something else. We suspected Wiper used filenames such as "~DF*.tmp" or "~DE*.tmp" in the TEMP folder, so we started looking for them via KSN. During this process we noticed that an abnormally large number of machines contained the same file name: ~DEB93D.tmp:
The name seemed like a good indicator that the file was part of the Tilded platform, and related to Duqu and Stuxnet. The file appeared to be encrypted, but we quickly noticed something:
Duqu (Nov 3, 2010): 00: ED 6F C8 DA 30 EE D5 01
~DEB93D: 00: 6F C8 FA AA 40 C5 03 B8
By complete chance, we noticed that this file started with the bytes "6F C8", which were also present at the beginning of the Duqu PNF main body, in encrypted format, loaded by the driver compiled on Nov 3, 2010. If it wasn't for this, maybe we'd have never paid attention to ~DEB93D.tmp, since the content looked like trash.
The encryption algorithm was weak, and a pattern appeared to repeat every 4096 bytes. Since the algorithm was weak, we managed to decrypt it using statistical cryptanalysis, a common technique for decrypting files during malware analysis. After decrypting the file, we noticed what appeared to be sniffer logs. This made us check further, and we found other files, modified on the same date, with names such as "mssecmgr.ocx", "EF_trace.log" or "to961.tmp". The rest, as they say, is history. This is exactly how we discovered Flame.
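The article doesn't name the exact cipher, but a keystream that repeats with a fixed period can often be peeled off statistically: for each key position, the most frequent ciphertext byte usually lines up with the most frequent plaintext byte (0x00 in a typical binary). A toy reconstruction of that idea, with a short period for readability:

```java
import java.util.Arrays;

public class RepeatingXor {
    // Recover a repeating XOR key of known period, assuming the most
    // common plaintext byte at each key position is 0x00.
    static byte[] recoverKey(byte[] ct, int period) {
        byte[] key = new byte[period];
        for (int k = 0; k < period; k++) {
            int[] freq = new int[256];
            for (int i = k; i < ct.length; i += period) freq[ct[i] & 0xFF]++;
            int best = 0;
            for (int v = 1; v < 256; v++) if (freq[v] > freq[best]) best = v;
            key[k] = (byte) best;       // best XOR 0x00 == best
        }
        return key;
    }

    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++)
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        return out;
    }

    public static void main(String[] args) {
        byte[] pt = new byte[4096];                  // mostly-zero "binary"
        pt[100] = 0x4D; pt[101] = 0x5A;              // a few nonzero bytes
        byte[] key = { (byte) 0x6F, (byte) 0xC8, 0x12, 0x34 };
        byte[] ct = xor(pt, key);
        byte[] recovered = recoverKey(ct, key.length);
        System.out.println(Arrays.equals(xor(ct, recovered), pt)); // true
    }
}
```

With a 4096-byte period the same approach works; each key position just has fewer samples, so more ciphertext is needed for the statistics to settle.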
So, what was Wiper?
There is no doubt that there was a piece of malware known as Wiper that attacked computer systems in Iran (and maybe in other parts of the world) until late April 2012.
The malware was so well written that once it was activated, no data survived. So, although we've seen traces of the infection, the malware is still unknown because we have not seen any additional wiping incidents that followed the same pattern as Wiper, and no detections of the malware have appeared in the proactive detection components of our security solutions.
Conclusions:
It may be possible that we will never find out what Wiper was but based on our experience, we are reasonably sure that it existed, and that it was not related to Flame.
It's possible that some machines exist somewhere where the malware has somehow escaped being wiped, but if there is such a case, we haven't seen it yet.
Wiper may have been related to Duqu and Stuxnet, given the common filenames, but we cannot be sure of this.
What is certain is that Wiper was extremely effective and has sparked potential copycats such as Shamoon.
The fact that the use of Wiper led to the discovery of the 4- or 5-year-old Flame cyber-espionage campaign raises a major question. If the same people who created Duqu/Stuxnet/Flame also created Wiper, was it worth blowing the cover of a complex cyber-espionage campaign such as Flame just to destroy a few computer systems?
US firm may now go after Motorola and HTC, while Android considered likely to continue to dominate smartphone market
Despite Apple's victory over Samsung in the patent trial, Android is expected to keep control of the global smartphone market. Photograph: Ahn Young-Joon/AP
For Tim Cook, Apple's chief executive, who took over from Steve Jobs a year ago, the court victory over Samsung will have been sweet. Normally, patent disputes rarely produce clean victories. But the decision by the nine-member jury in San Jose, just a few miles from Apple's headquarters, is dramatic.
Apple had been suing Samsung for $2.5bn in damages, claiming Samsung's phones and tablets copied its devices' behaviour and appearance; Samsung counter-claimed about $200m, saying the iPhone and iPad used its wireless 3G standard technologies, and methods for tasks such as sending a photo by email from a phone.
Apple won on almost every count it claimed; Samsung, on absolutely none. It was a dramatic demonstration of the home court advantage. Samsung Electronics can bear the $1.05bn in damages – in the second quarter of this year alone its operating profit was $5.86bn – but the hit to its reputation is substantial. Apple can portray it as a looter of intellectual property, a copyist, an unimaginative follower.
Apple can also now go after HTC and Motorola, its two principal smartphone rivals in the US, with renewed vigour. They may have to consider whether to sue for peace, for Samsung lost despite being the biggest of the mobile makers, the biggest smartphone maker, the biggest implementer of Google's Android mobile software, and extremely rich — it spent $9.5bn on marketing in the past 12 months, and is a major sponsor of the Olympics.
More than that, though, Apple will hope that this decision will put second thoughts and self-doubt into the minds of every industrial designer and software engineer competing in the smartphone and tablet business around the world.
Apple's key rival here is Google, whose Android software Samsung used to build its phones. But Apple can only go after the handset makers that implement Android, not the creators themselves; even so, by winning on Friday, it will have nervous engineers at the handset makers, who are using Android in ever-growing numbers, pausing as they compare their latest products with the next iPhone. Is it too similar? Will this trigger a lawsuit? Should I change it?
The patent wars have dismayed many, who find the legal fighting tedious, and wish the companies would just work on innovation. Apple, for its part, insists that it is happy with competition — but that rivals should do their own innovation. The rivals shoot back that Apple is trying to patent ideas that have been obvious and implemented before.
Despite this result, Android is likely to keep winning the battle to run the world's smartphones; it ran on 64% of the smartphones shipped worldwide in spring, and 80% of the smartphones shipped in China in that period; the iPhone had 12%. Android is running on phones that are getting cheaper and cheaper all the time. It wouldn't be worth Apple's time to sue every company using it.
But in the US, where its most valuable customers are, Apple definitely sees the effort as worth going to. The decision in San Jose may be the first of many. The question now is whether Google, whose Motorola subsidiary a week ago filed a fresh series of patent infringement claims against Apple — claims which could halt sales of the iPhone and iPad, if upheld — can manage to drive the war to a settlement. So far, there's no sign of that happening. And Apple is yet another billion dollars richer.
Will FaceTime on coming iPhone 5 crash LTE networks?
Carriers prepare with new technologies, data plans, but analysts say nobody knows what will happen when iOS 6-based devices arrive
Are the nation's LTE wireless carriers prepared for the video chat data crunch expected to come with the next-generation iPhone and other devices that are expected to launch this fall?
The answer: It depends on whom you ask.
Both AT&T and Verizon Wireless decline to say whether they are ready for the data crunch.
Over the summer, both carriers introduced data sharing plans that analysts believe were timed to help limit a surge in heavy data use expected especially with the use of Apple's FaceTime real-time video chat software on the iPhone.
"If I were a carrier, I'd be rather frightened by FaceTime," said Jack Gold, an analyst at J. GoldAssociates. "If everybody used FaceTime, bandwidth would go up dramatically, and the user experience would go down."
Imposing data sharing plans with set fees for specific numbers of gigabytes a month -- and penalties for exceeding the set amounts -- could help top carriers AT&T and Verizon avoid data capacity overload problems on their 4G LTE, and even 3G, networks, Gold said.
"Requiring the data sharing plans is really just another way for carriers to say they are limiting your access," he added.
AT&T has come under fire in recent days for announcing plans to require users to sign up for a Mobile Share data plan in order to conduct FaceTime video chats over its current 3G and future cellular networks.
FaceTime will be available for cellular network use, instead of just over Wi-Fi, in mobile devices running the forthcoming iOS 6, which Apple announcedearlier this year.
Sprint, the nation's number three carrier, has stood solidly behind its unlimited data plans, and is just starting to roll out 4G LTE technology.
At an event to mark the activation of its 16th LTE location in Baltimore earlier this week, Sprint 4G engineering manager Viet Chu told reporters that "if some abuse the system [with heavy data use] we would address it." He didn't specify what steps the carrier may take.
Concerns about FaceTime's impact on cellular networks are particularly acute, partly because the iPhone is the top selling smartphone model worldwide and because the next model will reportedly support LTE, which would help make the device even more popular.
The upcoming iPhone is also expected to have a larger screen -- more than 4 inches, compared to the current model's 3.5-in. screen -- which would make video chats easier.
That creates problems for carriers because like most two-way chat apps, FaceTime is an enormous bandwidth hog.
Video chat often uses about 3 megabytes of data per minute, though the exact rate depends on the encoding software, noted Wendy Cartee, vice president of product and technical marketing for Juniper Networks.
Juniper develops software that carriers can use to improve bandwidth at cell tower locations. It also sells a Universal Access router that can be installed at individual cellular tower locations to help streamline the data traffic at the point where it joins the backhaul link. Backhaul is the wired (or fiber optic) segment of a network between the wireless portion received at a cell tower and the network core.
Such products from Juniper and other vendors like Cisco and Alcatel Lucent should help U.S. carriers handle FaceTime, or other rich video applications on their LTE networks, Cartee said. "Carriers do plan for these types of changes in apps," she noted. "I'm looking forward to using FaceTime over cellular."
LTE is also inherently faster than 3G (generally LTE networks provide up to 8 Mbps on downlinks and up to 3 Mbps on uplinks) and can generally handle more capacity than earlier-generation networks, analysts noted.
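Putting the article's figures together gives a rough sense of scale: about 3 MB/min per FaceTime stream against an LTE uplink of up to 3 Mbps. The arithmetic is simple (real-world rates vary widely):

```java
public class FaceTimeLoad {
    public static void main(String[] args) {
        double mbPerMin = 3.0;             // FaceTime data use, per the article
        double mbps = mbPerMin * 8 / 60;   // megabits per second per stream
        double uplinkMbps = 3.0;           // typical LTE uplink, per the article

        System.out.printf("Per-stream rate: %.1f Mbps%n", mbps);
        System.out.printf("Streams per cell uplink: %d%n", (int) (uplinkMbps / mbps));
    }
}
```

Roughly 0.4 Mbps per stream means only a handful of simultaneous callers can saturate one uplink's worth of capacity, which is why carriers worry about a popular LTE iPhone.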
Verizon offers LTE service in most of the geographic U.S. AT&T trails Verizon in coverage but has touted its GSM 3G HSPA speeds where its LTE networks aren't ready.
"Before these data sharing limits, there was no reason for end users to do any kind of self-regulation," he explained. "Now if they use a lot of data, it will cost them."Gold said data sharing pricing plans will help AT&T and Verizon deal with the data crunch as much as the new routers and other technology.
As a result, Gold said AT&T won't get the heated criticism it got for not being able to support the original iPhone five years ago over GSM. "Data capacity will be much less of an issue with iPhone 5 than the first time around, which kicked AT&T's butt," Gold said.
Still, AT&T has expressed concerns about FaceTime data usage, noting its worries in a blog post this week that defended forcing FaceTime users onto its Mobile Share data plans instead of using individual plans.
"We are broadening our customers' ability to use the preloaded version of FaceTime, but limiting it in this manner to our newly developer AT&T Mobile Share data plans out of an overriding concern for the impact this expansion may have on our network and the overall customer experience," said Bob Quinn, senior vice president for federal regulatory matters, who penned the blog post.
He added: "We will be monitoring the impact the upgrade to this popular preloaded app has on our mobile broadband network, and customers, too, will be in a learning mode as to exactly how much data FaceTime consumes on those usage-based plans."
Nav Chander, an analyst at IDC, said that AT&T's blog post shows that the carrier is "being very, very careful" with FaceTime. "The blog tries to lower expectations, anticipating the worst case," Chander said.
Chander has also tracked what he called a massive improvement in backhaul capabilities by U.S. carriers in the past two years. Generally, he said carriers have expanded by 50 times the backhaul capability in their networks since the first iPhone was introduced.
In many cases in dense urban areas, a dual copper T1 connection from a cell tower to the wider network (with a 3 Mbps capability) has been replaced by a fiber optic connection with 1 Gbps capacity, he said.
Chander also noted a "huge increase" in the number of cell towers nationally in the past three years. Often, one carrier will own a tower and lease it to several other carriers to allow them to attach antennas.
Michael Howard, an analyst at Infonetics, said carriers are prepared for FaceTime on LTE -- "for the most part. There is a small chance that some areas in some city might get hit with some slowdowns, but I doubt the traffic upsurge due to FaceTime will add any major factor like the unexpected surges of the initial iPhone rollouts."
While some experts feel U.S. carriers will be ready for heavy data usage over LTE, others say there's really no way to know what will happen despite all the technical and restrictive pricing preparations.
"There's really no way of telling if carriers are ready for an LTE iPhone," said Seamus Hourihan senior vice president of strategy at Acme Packet. "There are many different constraints in networks and one is the area of bandwidth. But there's no such thing as unlimited bandwidth."
Acme provides software to carriers to improve network efficiency.
"Bandwidth in 4G LTE networks initially is not going to be a problem, but over time, if you use it for interactive video or watching moves, yeah, it's going to be a problem," Hourihan said.
A common complaint with FaceTime even over Wi-Fi has been video freeze-ups and dropped calls that make the experience difficult. If LTE isn't available at the tower near a FaceTime user, 3G networks might offer a worse experience than Wi-Fi.
FaceTime connections will also depend on what spectrum band carriers use for LTE. Lower frequencies carry signals farther, in the same way that a bass guitar can be heard further from a rock concert than a high-pitched singer.
Both AT&T and Verizon worked to get 700 MHz spectrum at the lower end for LTE for that reason, Gold said.
A rock-zapping laser and telescope combination called ChemCam is getting a lot of attention with NASA's rover Curiosity landing on Mars. (Image: NASA)
But what is it?
Here's an explainer, as well as more details about the mission.
ChemCam can look at rocks and soils from a distance, fire a laser to vaporize the materials and analyze them with an on-board spectrograph that measures the composition of the resulting plasma. NASA says ChemCam can also use the laser to do less destructive things, such as clear away dust from Martian rocks as well as use a remote camera to acquire extremely detailed images.
Roger Wiens, ChemCam principal investigator at the Los Alamos National Laboratory, gave a tutorial on how the instrument works at a recent news conference.
"Curiosity's remote sensing instrument [is] designed to make a large number of rapid measurements in some sense to help guide the rover to the most interesting samples," he said.
He also talked about ChemCam's imaging capability and said in routine operation the team plans to take images either before or after the laser operation or both, but not during the laser operation.
"The camera is very high resolution. It's sensitive enough to image a human hair quite easily by seven feet away," he said.
After nearly two weeks on the dusty red planet, Curiosity is doing warm-up exercises and getting ready to set off for its first rock-sampling drill site -- a spot 1,300 feet away that scientists have named Glenelg, where three kinds of terrain intersect.
In the next few days, the one-ton, six-wheeled mobile Mars laboratory will exercise each of its four steerable wheels, turning each of them side-to-side before ending up with each wheel pointing straight ahead. Curiosity will continue warming up by driving forward about 10 feet, turning 90 degrees and then reversing about seven feet.
Tonight the rover will zap its first rock -- one which scientists have dubbed "Rock N165," a three-inch wide Mars rock that sits about 10 feet away from Curiosity. (Artist's concepts of Curiosity and its landing: NASA/JPL-Caltech)
"It is not only going to be an excellent test of our system, it should be pretty cool too," Wiens said.
Want to hear the rest of the news conference for yourself? Visit NASA's USTREAM site, where it's all yours.
NASA has dedicated an entire section of its website to its Mars mission and really everything you'd want to know about it is right there.
Facebook to backup its servers with low-power storage devices at 'Sub-Zero' data center
By Alexis Santos
Data backups come in all shapes and sizes. For some, they take the form of external hard drives or a slice of the amorphous cloud. As for Facebook, its upcoming solution is low-power deep-storage hardware contained within a 62,000-square-foot building in Prineville, Oregon near its existing Beaver State data center. Unofficially referred to as "Sub-Zero," the facility will store a copy of the social network's data in case its primary servers need to be restored in an emergency. Rather than continuously power HDDs that are only occasionally used, the new setup can conserve energy by lighting up drives only when they're needed. One of the company's existing server racks eats up around 4.5 kilowatts, while those at Sub-Zero are each expected to consume approximately 1.5 kilowatts once they're up and running. Tom Furlong, Facebook's vice president of site operations, told Wired that there are hopes to create a similar structure alongside the firm's North Carolina data center. Since the Prineville project is still being planned, Zuckerberg & Co. have roughly six to nine months to suss out all the details before your photos are backed up at the new digs.
So much for the stock hiccup after last month's earnings call -- Apple's valuation now tops $600 billion.
Less than one month ago, shares of Apple plunged after the company's third-quarter earnings "disappointed" the bean counters on Wall Street. Less than a month later, shares of Apple are up more than 12 percent and are again trading at an all-time high. In fact, earlier today, Jefferies & Co. raised its price target from $800 to $900, as Apple's valuation now tops $600 billion.
If you want to understand why, cue up the 1960s hit by The Happenings, "See You in September."
That's when the iPhone 5, the biggest, worst-kept secret in techdom, is expected to be announced. Peter Misek, the Jefferies analyst responsible for Friday's price upgrade, described the upcoming product launch as "the biggest handset launch in history." Earlier this week, longtime Apple analyst Gene Munster suggested that Apple will sell 26 million to 28 million devices in its September quarter should its latest iPhone land 10 days before the end of the month.
Apple may also be getting a lift from a Wall Street Journal report this week that Apple and certain cable operators were talking about how to "use an Apple device as a set-top box for live television and other content."
How those who live in repressive regimes break into the free world
By Jon Thompson
In a speech at Stanford University in February 2008 Bill Gates said he wasn't worried about online censorship: "I don't see any risk in the world at large that someone will restrict free content flow on the internet," he said. He was wrong.
While the world debates the need for legislation to stop the downloading of illicit copies of commercial digital products, governments are increasingly using censorship as a reason to "protect" us from what they consider undesirable. In some countries, that protection extends to the suppression of basic human rights and news about atrocities.
Even in the UK, the chances of accidentally stumbling upon paedophilic images online are very remote. They're so illegal that they're kept hidden away behind paywalls. Politicians, however, still insist there's a real chance of bumping into such material, and use it as a reason to censor the internet.
We're sensible, law-abiding citizens; given that we will not seek out illicit material, we deserve an uncensored internet. So, how do people go about bypassing online state censorship?
An imprecise art
The main problem with online censorship is that it's a very imprecise art and is usually done on the back of moral panics, or even on the whim of unaccountable individuals. In some cases, censorship is done to look good in front of voters rather than to solve real problems.
The issues are also technological. Unless you know where all the holes are in your censorship scheme, and have the resources to plug them, you can never hope to close them all. Sites may be blocked by the state for a number of reasons. Sometimes, however, those reasons seem arbitrary, or have more to do with the moral compasses of the people making the censorship decisions than with any real threat to society.
In some countries, internet censorship is ordered by special state agencies and carried out by individual ISPs. In others, the police simply decide what is to be blocked. In China, for example, the list of banned websites is circulated to ISPs, who are expected to implement it without question. This list changes almost weekly, as the political climate changes.
China also employs a large army of internet enforcement officers whose job it is to monitor forums, blogs and websites and report on what they find. Without question, if the state doesn't like it, no one in China will see it.
Search for the Tiananmen Square massacre of 1989 in China, for example, and you'll find only tourist information, or maybe a warning not to search for such things. In Finland, unaccountable members of the police decide what is to be banned. In the fight against "child pornography" the Finns have banned a disproportionate number of websites, including some in favour of same-sex marriage and even those critical of the bans.
In Saudi Arabia, censorship even extends to online clothing catalogues showing swimsuits. Such actions tell us more about the attitudes and proclivities of the people doing the censoring than the people they're apparently trying to protect.
Ad hoc circumvention
Some internet censorship systems can be bypassed in an ad hoc fashion. This can be done when such systems simply check the URL you want to access against a banned list. When this is the case, if the URL can be made to seem in any way different, the system can be defeated.
The first thing to try is shortening the URL. You can easily shorten a URL using a service such as Bit.ly. If the filtering product keeping you away from a domain knows this trick, however, it expands the shortened URL, checks the result against the banned list and blocks it accordingly.
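The filter's counter-move described above can be sketched as a small simulation. The redirect map and blocklist here are hypothetical stand-ins for a real shortener lookup and a real banned list:

```python
from urllib.parse import urlparse

# Hypothetical data: a known shortener redirect and a banned-hosts list.
REDIRECTS = {"http://bit.ly/3xAmPle": "http://blocked.example.com/page"}
BLOCKLIST = {"blocked.example.com"}

def is_blocked(url: str) -> bool:
    # A filter that knows the trick first expands the shortened URL,
    # then checks the final host against the banned list.
    target = REDIRECTS.get(url, url)
    return urlparse(target).hostname in BLOCKLIST

print(is_blocked("http://bit.ly/3xAmPle"))      # True: expanded, then blocked
print(is_blocked("http://allowed.example.org/"))  # False: host not on the list
```

A real filter would resolve the redirect over HTTP rather than from a table, but the decision logic is the same.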
It's time to up the ante by using the raw IP address in the browser's URL bar. The ping command (from the command line: 'ping <domain name>') will request and display the IP address of a domain from the local DNS server. However, some filtering systems weed out ping traffic for precisely this reason. If we can't even use the ping command, how do we get at the IP address?
The solution is to use one of the many free online domain IP address lookup services. Unless all these services are also blocked, this should work, thereby also hinting at the problems of trying to censor something as complex and interconnected as the internet.
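If you can run code locally, a DNS lookup doesn't need ping or a web service at all. A minimal sketch using Python's standard library ('localhost' stands in for the domain you actually want to reach):

```python
import socket

# Ask the resolver for a domain's IPv4 address directly, no ping required.
ip = socket.gethostbyname("localhost")
print(ip)  # 127.0.0.1
```

Of course, if the censor controls the local DNS server too, this returns nothing useful, which is why the web-based lookup services above remain handy.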
One lookup service is the aptly named IP-Lookup. Simply enter the name of the domain you want to reach without the 'http://' preamble, and press 'enter'. The IP address appears, but the site also attempts to contact the domain itself and produces a thumbnail of the web page to show that the IP address is good. Copy the IP address into your web browser and press 'enter' to attempt to bypass censorship.
But what if the IP address of the target domain is also blocked? This is where we need to think like hackers and become a little indirect.
Indirect bypassing
When you use Google Search, many of the results are based on cached versions of the web pages, rather than the live pages themselves. This is useful for getting around web censorship measures, because along with the title of the pages in most Google Search result lines there is a 'cached' link. Click this and you read a version of the page stored and accessed from Google's cache server farm.
However, as new features are added and old ones tweaked, the user interface to Google changes frequently. In some browsers the cached links are not available due to cookie issues.
There is a way around this, however. Search for the website you want, then copy the URL from the search results back into the search input box. Add 'cache:' at the start and press 'Enter' to read the Google cache version of the page.
This technique is great for individual pages, but if you click on any content in the cached version, Google will attempt to load the real thing. You have to load each page by hand.
If you want more freedom to surf, you'll need to use a public online proxy. A proxy is like a fulcrum on a lever. You move one end of the lever so that the other end points to wherever you want it, and all the action pivots around the fulcrum.
Similarly, a proxy server acts as a focal point, relaying your requests for web pages to the sites you want to surf and collecting the results to pass back to your browser. Any web censorship software in place only sees your web connection to the proxy server, not to the sites you request.
Plenty of free public proxy servers exist that will act as such a fulcrum to bypass censorship. A searchable list is maintained here. This list refreshes itself in real time and lists the country, IP address and relevant port, and the speed of proxy servers. If you sort the list by response time and click 'Update Results' you can find a good fast server with plenty of throughput. Note down the IP address and port number.
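Programs can use such a proxy too, not just browsers. A minimal Python sketch, with a placeholder address from the documentation IP range standing in for a proxy picked from the list:

```python
import urllib.request

# Placeholder proxy details; substitute the IP and port you noted down.
PROXY = "203.0.113.10:8080"

# Route both HTTP and HTTPS requests through the proxy.
proxies = {"http": f"http://{PROXY}", "https": f"http://{PROXY}"}
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
urllib.request.install_opener(opener)

# From here on, urllib.request.urlopen(...) travels via the proxy,
# so the censor sees only your connection to the proxy server.
```

The key point, as with the browser setup below, is that the filter only ever sees traffic to the proxy's address.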
Configuring your browser to channel your surfing activity through a public proxy is easy. In Internet Explorer 9, click 'Tools | Internet Options'. In the window that pops up, click the 'Connections' tab and click the 'LAN Settings' button at the bottom of the window. A sub-window appears.
Tick 'Use a proxy server for your LAN'. This enables the input boxes for the IP address and port details you noted down earlier. Enter these, tick the 'Bypass proxy for local addresses' box, then click 'OK'. Click 'OK' in the parent window and try surfing to a site.
In Firefox 12, click the orange Firefox button at the top left of the browser window and click 'Options'. In the resultant window, click the 'Advanced' tab and click the 'Network' sub-tab. Finally, click 'Settings'. A sub-window appears. Select 'Manual proxy configuration' and the input boxes become enabled. Enter the IP address and port of the proxy server.
In the box marked 'No proxy for' enter the network number of your local network in the form '192.168.1.0/24'. That fourth number is always zero, but substitute the local subnet for the first three numbers if they're different. Click 'OK' to finish and also click 'OK' on the parent window.
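That '/24' notation means the first three numbers identify your local network and the final number can be anything, which you can check with Python's ipaddress module:

```python
import ipaddress

# '192.168.1.0/24': the first 24 bits (three numbers) are the network,
# the last number identifies individual machines on it.
local_net = ipaddress.ip_network("192.168.1.0/24")

print(ipaddress.ip_address("192.168.1.42") in local_net)   # True: same subnet
print(ipaddress.ip_address("192.168.2.42") in local_net)   # False: different subnet
```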
The response from any websites you surf to via a proxy will be slower, and the connections can be flaky. But using a proxy is the go-to method of bypassing censorship for millions of people living under regimes that control information.
Getting esoteric
Activists have developed esoteric uses for common web services in the search for information about the world. One example is machine translation, as offered by Google and by Babel Fish (babelfish.yahoo.com).
The idea is to translate a target web page from English into another language and back again. As this is done, the translation engine fetches the page itself; you never have to surf to it directly. The quality of statistical translation is now so good that in many circumstances the nuances of language survive this process.
Be prepared for some hilarious mistranslations, however. If you just want information rather than direct access to web pages, RSS may be the solution. Not all sites carry an RSS feed, but if the filtering system blocking access only deals with web traffic (HTTP and possibly HTTPS), installing an RSS reader might do the trick.
One such RSS reader is FeedDemon. You can download and install it by accepting the default settings. When it runs the first time, FeedDemon will set up some default feeds and begin populating them. If you see the number of feeds to be read increasing in the left hand pane, then RSS traffic is not being blocked and you can happily read away.
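Under the hood an RSS feed is just a small XML document, which is why it can slip past filters that only inspect web-page traffic. A minimal sketch of what a reader does, using a made-up feed in place of one fetched over the network:

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document, standing in for a feed fetched over HTTP.
FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First headline</title><link>http://example.com/1</link></item>
  <item><title>Second headline</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(FEED)
headlines = [item.findtext("title") for item in root.iter("item")]
print(headlines)  # ['First headline', 'Second headline']
```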
Another way of dodging censorship is provided by Web2Mail. If a URL is banned, send an email to www@web2mail.com with the URL of the web page you want to access. The service should email you back the web page so that you can read it in your email client.
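Composing that request can itself be scripted. This sketch builds the message with Python's standard email module; the address follows the article's description and the URL is a placeholder, so check both against the live service before relying on it:

```python
from email.message import EmailMessage

# Build the request email: the URL you want goes in the body.
msg = EmailMessage()
msg["To"] = "www@web2mail.com"
msg["Subject"] = "page request"
msg.set_content("http://example.com/blocked-page")

# Actually sending it would use smtplib via your own mail server, e.g.:
#   import smtplib
#   with smtplib.SMTP("smtp.example.net") as s:
#       s.send_message(msg)
```

Because the only traffic leaving your machine is ordinary email, a web-only filter never sees the banned URL being fetched.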
So, there are ways around even the most repressive regime's online censorship efforts. The problem for governments is that the web developed organically, without any central plan, which makes it incredibly difficult to censor without the whole world agreeing to it. Information will always get out; how long that continues is up to us.