On Privacy of Private RSS Feeds

19 Nov 2008

I have been using Google Reader as my main RSS aggregator for several years now. Unlike some others, however, I still use a desktop RSS client for private feeds. This was an intuitive decision; I didn’t spend much time thinking about it.

Earlier this week, I was in for a big surprise. I ran a search on a public search engine, and the results included a link that I recognized as coming from a private feed. In other words, a public index included information that should only be available to registered and authorized users of a particular site.

So I started digging. First of all, there are three general ways to implement a private feed. This post summarizes them nicely: a unique secret token in the URI, a cookie, or HTTP authentication. The unique secret token seems to be the most popular these days, possibly because the other two methods make it more difficult to read such a feed in online readers.

With the unique secret token method, however, a feed publisher must somehow tell search bots that this content is not to be indexed. Otherwise, we rely on the URI never being discovered, which becomes problematic now that so many people are switching to online readers. I found an old story on TechCrunch that brought up the same issue and discussed efforts by Bloglines to set up a standard for this, but I could not confirm whether those efforts led anywhere. This leaves one well-known method: robots.txt.

Dear publishers of private feeds! Please make sure your robots.txt disallows access to the private feed URIs on your sites. In the last couple of days I checked two major publishers of private feeds that use the unique secret token method, and neither of them has a proper Disallow rule in its robots.txt. Ouch!
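Here is a minimal sketch of how a publisher could check such a rule, using Python's standard `urllib.robotparser`. The `/feeds/private/` path, the `example.com` URLs and the `is_crawlable` helper are all made up for illustration; a real site would use whatever prefix its private feed URIs share. Keep in mind that robots.txt is only a request that well-behaved crawlers honor, not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every private feed URI lives under /feeds/private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feeds/private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A secret-token feed URI (made up) and an ordinary public page.
    print(is_crawlable(ROBOTS_TXT, "https://example.com/feeds/private/abc123.xml"))
    print(is_crawlable(ROBOTS_TXT, "https://example.com/blog/"))
```

Putting all private feeds under one path prefix keeps the rule to a single line and avoids listing individual secret URIs in a publicly readable file.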

Am I missing anything? If you think my theory is wrong or there is a better method, please let me know in the comments below.

Categories: blogging |

Using RabbitMQ Beyond Queueing

11 Nov 2008

UPDATED 2008-11-12: Adjusted Failover section below (additions in italic) based on a thread on rabbit-discuss.

I am a big fan of RabbitMQ, an implementation of the Advanced Message Queuing Protocol (AMQP). In this post I am going to give an overview of how RabbitMQ can be used beyond simple queueing and pubsub. For more background on this topic, please see a list of messaging scenarios that RabbitMQ supports.

Queueing

The broker can take a message from a producer and keep it until a consumer shows up. To survive broker restarts, queues in this case should be durable, with auto-delete set to false, and messages should be published with a delivery mode of 2 (which means persistent). This pattern can also help if a consumer is temporarily unable to keep up with the incoming message flow: queueing lets producer and consumer each work at their own pace, and makes sure all messages are consumed eventually.

Multiplex

This is when multiple producers publish messages to be routed to a single consumer. You can have all your producers publish to the same fanout exchange, or to the same direct exchange with the same routing key, or to a topic exchange with routing keys that match the consumer's binding pattern. The last case allows producers to publish “flights.aa.ord” and “flights.ua.sfo”, while the consumer reads all of these with “flights.#” (* matches a single word, # matches zero or more words).
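To make the wildcard rules concrete, here is a small sketch of topic matching in plain Python. It is not RabbitMQ's code, and `topic_matches` is a name I made up; it only follows the two rules stated above (* is exactly one word, # is zero or more words, with words separated by dots).

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key.

    Words are separated by dots; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    p_words = pattern.split(".")
    k_words = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p_words):
            # Pattern used up: succeed only if the key is used up too.
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(k_words) + 1))
        if j == len(k_words):
            return False
        if p_words[i] == "*" or p_words[i] == k_words[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    for key in ("flights.aa.ord", "flights.ua.sfo", "hotels.ord"):
        print(key, topic_matches("flights.#", key))
```

With this matcher, “flights.#” accepts both example keys, while “flights.*” would reject them because each has two words after “flights”.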

Demultiplex

This is when a single producer publishes messages that are routed to multiple different consumers. With a topic exchange, for example, the producer could publish “order.book” and “order.cd”, while orders for books and CDs are handled by different systems.

Instant Feedback (Queueing Bypass)

A producer can instruct the broker not to queue a message at all and to return it to the sender if no consumer is currently available to read it (this is achieved by setting the immediate flag to true in the basic.publish method). This can be helpful when message content is time sensitive: it needs to be processed now or an error must be returned.

Duplicating

In AMQP, a message is delivered to every queue bound to a given exchange whose binding meets the routing criteria (these criteria differ between exchange types). For example, a message published with the “prod.server01.disk.full” key can be routed simultaneously to a queue bound with “prod.#” (for a production logger that keeps track of all events in the production environment) and to a queue bound with “#.disk.full” (for an archiver process that removes old logs). This is a very powerful feature, and it works with direct and fanout exchanges as well.

Load balancing

If multiple consumers read from the same queue, the RabbitMQ broker will automatically load balance messages between all available consumers. Each message is sent to only one consumer at a time.
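As a toy illustration of the idea (not the broker's actual code), the sketch below hands messages out to consumers in turn. Round-robin order is my assumption here; real dispatch also depends on things like which consumers are ready. The `dispatch_round_robin` name and the message and consumer labels are made up.

```python
from itertools import cycle


def dispatch_round_robin(messages, consumers):
    """Assign each message to exactly one consumer, taking consumers in turn."""
    assignments = {name: [] for name in consumers}
    # zip stops when messages run out; cycle repeats the consumer list forever.
    for message, name in zip(messages, cycle(consumers)):
        assignments[name].append(message)
    return assignments


if __name__ == "__main__":
    result = dispatch_round_robin(["m1", "m2", "m3", "m4", "m5"], ["a", "b"])
    for name, received in result.items():
        print(name, received)
```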

Failover

In no_ack=false mode, a consumer must eventually explicitly acknowledge receipt of each message, individually or as a group (this does not mean that a message must be ack'ed before next one can be received). If a consumer disconnects without acknowledging, unack'ed messages are automatically re-queued for another consumer. This helps achieve consumer failover in response to crashes or loss of network connectivity.
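The re-queue behavior can be sketched with a toy in-memory queue. This is only a model of the rule described above, not a RabbitMQ client: `ToyQueue` and its methods are hypothetical, and putting re-queued messages back at the front in their original order is my assumption.

```python
from collections import deque


class ToyQueue:
    """In-memory model of a queue whose consumers must ack explicitly."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = {}  # consumer name -> messages delivered but not acked

    def deliver(self, consumer):
        """Hand the next ready message to a consumer and track it as unacked."""
        if not self.ready:
            return None
        message = self.ready.popleft()
        self.unacked.setdefault(consumer, []).append(message)
        return message

    def ack(self, consumer, message):
        """The consumer confirms the message; the queue can forget it."""
        self.unacked[consumer].remove(message)

    def disconnect(self, consumer):
        """The consumer is gone: put its unacked messages back on the queue."""
        for message in reversed(self.unacked.pop(consumer, [])):
            self.ready.appendleft(message)


if __name__ == "__main__":
    queue = ToyQueue(["m1", "m2", "m3"])
    queue.deliver("a")      # consumer a receives m1
    queue.deliver("a")      # and m2, before acking m1
    queue.ack("a", "m1")
    queue.disconnect("a")   # a crashes with m2 still unacked
    print(queue.deliver("b"))  # consumer b takes over
```

Consumer “a” receives two messages before acknowledging either, which mirrors the point that an ack is not required before the next delivery.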

Relaying

If producers and consumers do not have a direct line of sight network-wise (for example, they are behind NAT or are located on private subnets), RabbitMQ can provide the connectivity by serving as a message relay. Both producers and consumers must be able to establish client connections to the broker (the official AMQP port is TCP 5672), and then they can exchange messages.

Conclusion

Some of these patterns can be mixed and matched, which further expands the set of problems where RabbitMQ can help you achieve distributed messaging nirvana.

Categories: rabbitmq |

On Private IPv4 Address Spaces

31 Oct 2008

Most people who work with the Internet know about RFC 1918 “Address Allocation for Private Internets.” But did you know that RFC 3330 “Special-Use IPv4 Addresses” has even more address spaces allocated for non-public use?
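For a quick feel of what this means in practice, here is a small sketch using Python's standard `ipaddress` module. The table holds the three RFC 1918 ranges plus just a few of the blocks listed in RFC 3330, so it is deliberately incomplete; the labels and the `classify` helper are mine.

```python
import ipaddress

# RFC 1918 ranges plus a partial selection of RFC 3330 special-use blocks.
SPECIAL_USE_BLOCKS = {
    "10.0.0.0/8": "private (RFC 1918)",
    "172.16.0.0/12": "private (RFC 1918)",
    "192.168.0.0/16": "private (RFC 1918)",
    "127.0.0.0/8": "loopback",
    "169.254.0.0/16": "link local",
    "192.0.2.0/24": "TEST-NET (documentation)",
    "198.18.0.0/15": "benchmarking",
}


def classify(address: str):
    """Return the label of the block containing address, or None if not listed."""
    ip = ipaddress.ip_address(address)
    for block, label in SPECIAL_USE_BLOCKS.items():
        if ip in ipaddress.ip_network(block):
            return label
    return None


if __name__ == "__main__":
    for addr in ("172.31.255.1", "192.0.2.10", "198.19.0.1", "8.8.8.8"):
        print(addr, classify(addr))
```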

I didn’t know about it till today.

Categories: linux |

Gotta Love Open Source

30 Oct 2008

… And it’s not only because it’s often cheaper to own or use, but also because it raises the bar for every single piece of proprietary software: it can no longer get away with a poor user interface or limited features like it used to. Proprietary software now has to match and exceed open source to win a customer, which results in better products. A win-win-win situation for users, open source and proprietary software.

In economics, this is called a non-zero-sum situation.

Categories: linux |

More Cloud Magic from CohesiveFT

28 Oct 2008

Someone once asked me to explain cloud computing. I jokingly replied that it’s like running your servers somewhere where there is no shortage of CPU power, storage capacity or bandwidth, and you get charged only for what you actually use. And if you need more, you just ask (via API) - and it’s there. “Wow! There’s gotta be some magic involved in that,” my buddy said.

Today we at CohesiveFT announced a new solution called VPN-Cubed, which can add even more magic to your cloud-based deployment. It offers “customer-controlled security in a cloud, across multiple clouds, and between the physical data center and cloud(s).” It is not only a security solution but also a network infrastructure component that complements our flagship Elastic Server On Demand platform. It has high availability built in and no single point of failure. It supports many different topologies and is available on many different operating systems (including Windows). It was developed in part to support our own internal infrastructure (read: we needed something like this to run our own business), and has been in use internally for some time.

I was involved in this project from the engineering side, and I am extremely excited about the end result. You should definitely check it out!

Categories: cohesiveft |