21st September 2009

Google Apps sics crawlers on public docs and sheets

Google will soon allow search engines to crawl and index documents, spreadsheets, and presentations published to the web via its online office suite, Google Apps.

On Friday, in a letter to Google Apps users, the web giant said the change would arrive “in a few weeks.” A Google spokeswoman confirmed this in an email to The Reg and pointed out that on the Google Apps “help center” site, the company says the change is no more than a fortnight away.

“We will be launching a change for published docs. The change will allow published docs that are linked to from a public website to be crawled and indexed, which means they can appear in search results you see on Google.com and other search engines,” Google says.

This only applies to files explicitly published using the suite’s “publish as web page” or “publish/embed” options and linked to from a public webpage. This does not apply to files shared via the “Allow anyone with the link to view (no sign-in required)” option, which provides for document sharing without links to the public web.

Google warns that if you don’t want your publicly-published documents crawled, you can de-publish them. Instructions for de-publishing are here.

At the help center, one Google Apps user has asked if, in light of the change, the company could provide a clear indication of which docs are public and which are not. “I think this makes it very important that you bring back the indication on the docs listing of those files that are published,” the user says. “Maybe a separate label/folder of published docs/spreadsheets?”

Indeed, as it stands, the Google Apps master view does not tell you which docs are publicly published and which aren’t.


posted in Uncategorized | 0 Comments

17th September 2009

Google Acquires reCAPTCHA

Google has acquired reCAPTCHA, which provides CAPTCHA technology for more than 100,000 Web sites. reCAPTCHA is a spinoff of Carnegie Mellon University’s computer science department.

A CAPTCHA is a bit of text that Web sites use to verify that it’s indeed a human being on the other end of the line — not a spam robot or computer. (Though they’re not 100% secure, CAPTCHAs tend to block out most spam.)

The twist in the deal that could help Google is that many of reCAPTCHA’s words come from scanned newspapers and old books. By having humans type the scanned words, reCAPTCHA gets help deciphering the scanned text. This could be helpful for Google’s book scanning project.


posted in Uncategorized | 0 Comments

8th September 2009

Make Your Blog Real-Time With RSSCloud

All blogs on the WordPress.com platform and any WordPress.org blogs that opt in (using this plug-in) will now make instant updates available to any RSS reader that supports a feature called RSSCloud. There is currently only one RSS aggregator that supports RSSCloud: Dave Winer’s brand-new reader River2. That will probably change very soon. Update: Within hours, another RSS reader called LazyFeed announced that it will support RSSCloud as well.

RSSCloud is an element that’s always been present in the RSS 2.0 spec but has drawn new attention with the rise of interest in the Real-Time Web. The element was just added to the WordPress code this afternoon. The implications of this big vote of support go beyond reading WordPress blogs; this is the kind of traction that new technologies can leverage to gain support in many different applications.
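For readers who want to check whether a feed advertises RSSCloud, here is a minimal sketch in Python. It looks for the cloud element inside the channel of an RSS 2.0 feed and returns its attributes. The sample feed, its host name, and the helper name find_cloud are assumptions made up for illustration; they are not taken from the WordPress code.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; every attribute value below is invented
# for illustration and does not describe a real WordPress feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <cloud domain="cloud.example.com" port="80" path="/notify"
           registerProcedure="" protocol="http-post" />
  </channel>
</rss>"""


def find_cloud(feed_xml):
    """Return the attributes of the channel's <cloud> element, or None."""
    root = ET.fromstring(feed_xml)
    cloud = root.find("./channel/cloud")
    # Compare against None explicitly: an element without children is falsy.
    return dict(cloud.attrib) if cloud is not None else None


cloud_info = find_cloud(SAMPLE_FEED)
```

A reader that finds no cloud element (find_cloud returns None) would simply keep polling the feed as before.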

Supporting feed readers will now be able to request updates from WordPress blog feeds as soon as they become available, instead of polling a server periodically to check for updates. (Your blog posts typically get picked up by RSS aggregators 15 to 60 minutes after you post them; this will change that.) The feature is already being rolled out: several WordPress users report seeing the cloud element in the source code of their RSS feeds. Update: Here’s the official announcement from WordPress HQ.
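The difference between polling and being notified can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: there is no network, the class names Feed, PollingReader, and NotifiedReader are hypothetical, and the notification carries no content, so the notified reader still fetches the feed itself once it hears about a change.

```python
class Feed:
    """Toy stand-in for a blog's RSS feed plus its notification service."""

    def __init__(self):
        self.posts = []
        self.subscribers = []  # callbacks registered by interested readers

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, title):
        self.posts.append(title)
        # Tell every registered reader that the feed changed.
        for notify in self.subscribers:
            notify()


class PollingReader:
    """Only learns about new posts when poll() is called."""

    def __init__(self, feed):
        self.feed = feed
        self.seen = 0
        self.inbox = []

    def poll(self):
        new_posts = self.feed.posts[self.seen:]
        self.inbox.extend(new_posts)
        self.seen = len(self.feed.posts)
        return new_posts


class NotifiedReader(PollingReader):
    """Fetches the feed as soon as it is told the feed changed."""

    def __init__(self, feed):
        super().__init__(feed)
        feed.subscribe(self.poll)


feed = Feed()
poller = PollingReader(feed)
notified = NotifiedReader(feed)
feed.publish("First post")
```

After the publish call, notified.inbox already holds the new post, while poller.inbox stays empty until the next poller.poll(), which is the delay the parenthetical above describes.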

Google Reader, the dominant RSS aggregator on the market, began a limited implementation of a related protocol called PubSubHubbub last month. Facebook-acquired FriendFeed worked with Google on that system.

Now RSSCloud has a posse. Half a million blogs are created each month on WordPress, and if Google Reader keeps taking its sweet time checking those blogs for updates instead of turning on support for RSSCloud, it’s going to look slow as molasses.

Real-time updates could enable several things. Faster distribution of blog posts, more compelling conversations in real time, and a renewed timeliness for blogging vs. services like Twitter are all likely consequences. The list of possible technical developments on top of RSSCloud could be as open-ended as the developments enabled by the core of RSS.

RSS has made blogging viable by freeing readers of the requirement of visiting each site they are interested in. It has made podcasts subscribable. It has made wiki change notifications trackable outside the mess of the email inbox. It has made search a persistent action, instead of a one-off occasional delayed reaction. RSS is mixable, mashable, parsable, filterable.

Now RSSCloud could add a real-time dimension to all of that. The paradigm just got a very big vote of support.


posted in Blogging | 0 Comments
