On rabbitmqctl and badrpc,nodedown

In the spirit of the open source community that has formed around RabbitMQ over the past several years and keeps growing every week, we recently tackled an issue on the mailing list: running “rabbitmqctl status” returns “badrpc,nodedown” even though the broker is running, as ps output confirms. See the mailing list thread “broker runs; cant’ get status” for details. The issue centers on Erlang’s security and communication mechanism in distributed mode. Here is a tentative step-by-step guide that can help you resolve it.

Are you running the broker as user rabbitmq, and rabbitmqctl as root or rabbitmq? If not, please stop and fix this. It is not a requirement per se, but it is the canonical installation of the RabbitMQ broker. You can certainly hack your scripts to work around it, but you are on your own if you do.

Double-check that the broker is in fact running. Use ps and netstat -lptn (look for port 5672, unless you overrode it in /etc/default/rabbitmq). Telnet to localhost on port 5672, type AMQP and press ENTER several times. You should get a response that shows at least AMQ. Check the logs in /var/log/rabbitmq to verify that the broker saw your connection attempt.
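The telnet check can also be scripted. The following is a minimal Python sketch, not part of any RabbitMQ tooling: the function name probe_amqp and the default timeout are made up for illustration. It relies on the AMQP rule that a server receiving a malformed protocol header replies with its own header (starting with the bytes AMQP) and closes the connection.

```python
import socket


def probe_amqp(host="localhost", port=5672, timeout=3.0):
    """Send a deliberately malformed protocol header and return the reply.

    Returns the raw bytes the server sent back, or None if the connection
    could not be made or nothing came back within `timeout` seconds.
    (Hypothetical helper; name and defaults are illustrative only.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Roughly what telnet sends for "AMQP" plus a few ENTER presses.
            sock.sendall(b"AMQP\r\n\r\n\r\n")
            # An empty read means the peer closed without answering.
            return sock.recv(64) or None
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

If probe_amqp() returns None, nothing is listening on the port or it is blocked. If the returned bytes start with b"AMQP", the broker is up and accepting connections.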

The next step is to start an Erlang shell with "erl -sname foo -cookie coo" and run this command: "net_adm:names()."

If this command returns ok followed by a list of nodes within 1 or 2 seconds, check whether rabbit is in the list. If it is, you very likely have the users mixed up as described above; please double-check. If the rabbit node is not listed, double-check that the broker is still running.
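For readers who want to see what net_adm:names() asks for, here is a rough Python sketch that sends the same kind of query to EPMD directly. It assumes the EPMD names request as I understand it: a 2-byte length, the request byte 110, and a reply made of a 4-byte port number followed by lines of the form "name rabbit at port 35217". The function names are illustrative and not part of Erlang or RabbitMQ.

```python
import re
import socket
import struct

NAMES_REQ = 110  # ASCII "n": the EPMD "list registered names" request


def parse_names_reply(payload):
    """Parse an EPMD names reply into a list of (node_name, port) pairs."""
    # The first 4 bytes are EPMD's own port; the rest is text, one
    # "name <node> at port <port>" line per registered node.
    text = payload[4:].decode("utf-8", errors="replace")
    return [(m.group(1), int(m.group(2)))
            for m in re.finditer(r"name (\S+) at port (\d+)", text)]


def epmd_names(host="localhost", port=4369, timeout=3.0):
    """Ask EPMD for its registered nodes; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Request frame: 2-byte big-endian length, then the request byte.
        sock.sendall(struct.pack(">HB", 1, NAMES_REQ))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # EPMD closes the connection when it is done
                break
            chunks.append(chunk)
    return parse_names_reply(b"".join(chunks))
```

A quick answer that includes a ("rabbit", port) pair corresponds to the "ok followed by a list of nodes" case above. A timeout or a refused connection points at EPMD or host addressing instead.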

If this command returns {error,address}, there is a problem with your instance of EPMD, the Erlang port mapper daemon, which acts as Erlang's naming daemon (man net_adm). First, check that it is running (it most likely is). Then, in Erlang, run "net_adm:localhost()." Exit Erlang and try to connect to exactly the name you got from net_adm:localhost() on port 4369 (the EPMD port). The connection is expected to fail and time out. If it does not time out, you should not have gotten {error,address}.

Most likely, the name returned by net_adm:localhost() is associated, in /etc/hosts or in DNS, with an IP address that is either not reachable from this host or firewalled off. An entry in /etc/hosts that maps this name to 127.0.0.1 or to one of the other IPs on this server should fix the problem.
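To see which addresses a name resolves to and whether EPMD is reachable on each of them, something like the Python sketch below may help. The helper names are hypothetical. The resolution goes through the system resolver, so it should reflect both /etc/hosts and DNS, though not necessarily in the same way Erlang's own lookup does.

```python
import socket


def resolve_all(name):
    """Return the distinct IP addresses that `name` resolves to (may be empty)."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is the 5th field; its first item is the IP
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def can_connect(ip, port=4369, timeout=3.0):
    """True if a TCP connection to ip:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(name, port=4369, timeout=3.0):
    """Map each address `name` resolves to onto whether `port` is reachable."""
    return {ip: can_connect(ip, port, timeout) for ip in resolve_all(name)}
```

Call check_host with exactly the name printed by net_adm:localhost(). An address that maps to False is a candidate for the unreachable or firewalled IP described above, and an empty result means the name did not resolve at all.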

Alternatively, net_adm:names() may time out with {error, timeout}. We have seen this caused by snoopy in the past. Remove snoopy, or at least do not load it system-wide via /etc/ld.so.preload, and you should be fine.
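A quick way to check whether snoopy is preloaded system-wide is to look through /etc/ld.so.preload. The Python sketch below is a hypothetical helper. It assumes that entries are separated by whitespace or colons and that "#" starts a comment, and it matches on the substring "snoopy" in the library path, which is a heuristic rather than a reliable detection.

```python
import re


def preload_entries(text):
    """Return the library paths listed in ld.so.preload-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        entries.extend(p for p in re.split(r"[\s:]+", line) if p)
    return entries


def find_snoopy(path="/etc/ld.so.preload"):
    """Return preload entries whose path mentions snoopy (empty if none)."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            entries = preload_entries(handle.read())
    except OSError:  # the file is usually absent when nothing is preloaded
        return []
    return [entry for entry in entries if "snoopy" in entry.lower()]
```

An empty list from find_snoopy() suggests snoopy is not preloaded this way, so the timeout probably has a different cause.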

If these steps did not help, please leave a comment below or join rabbitmq-discuss and ask your question there, and we’ll help!

UPDATE 2009-04-17: The host name shown by net_adm:localhost() does not technically need to be defined in /etc/hosts. But if it is not there, it should not be defined anywhere else either: when you run "ping name", you should get "ping: unknown host name". I have seen at least one case where it worked this way, though this is somewhat unverified.

Categories: rabbitmq | erlang

Comments (8)

riteshrathi // 04 Mar 2009

"Are you running broker as user rabbitmq and rabbitmqctl as root or rabbitmq? If not, please stop and fix this. This is not a requirement per se, but this represents canonical installation of rabbitmq broker. You can certainly hack your scripts to work around this requirement, but you are on your own if you do."

I do not understand this point of yours. I have installed the broker as root and am also trying to get the status as root. I am getting this error.

net_adm:names(). returns {ok,[{"rabbit",35217},{"foo",57135}]}

I am not able to identify what the problem is. Can you please help?

Dmitriy // 04 Mar 2009

The issue is that when you run "rabbitmqctl status" as root, it tries to su to user rabbitmq (take a look inside rabbitmqctl). And since your broker is running as root, user rabbitmq can't get its cookie (stored in ~root/.erlang.cookie) and so can't talk to the broker node.

Timothy Perrett // 16 May 2009

Hi Dmitriy, great article. However, one thing that is not clear to me is the significance of net_adm:names(). I am running the broker as the "rabbitmq" user, and when I then run rabbitmqctl I still get the {badrpc,nodedown} error.

If both are running as the same user, how on earth am I still seeing this?

Cheers, Tim

Dmitriy // 16 May 2009

Tim,

If net_adm:names() does not hang and rabbit appears in the list, it means the problem is not related to epmd and host addressing.

- Dmitriy

Alexandre // 18 Jun 2009

Hi Dmitriy,

I installed RabbitMQ on Mac OS and had no problems. But after installing RabbitMQ on FreeBSD 7 (64-bit), I can't start rabbitmqctl.

net_adm:localhost() returns the DNS server name, and net_adm:names() returns {ok,[{"rabbit",1088}]}.

The file mode of ~root/.erlang.cookie is 400 (20 bytes).

How can I start rabbitmqctl?

Dmitriy // 18 Jun 2009

~root/.erlang.cookie should not be relevant to this. When you start rabbitmqctl as root, it does "su" to user rabbitmq and hence uses ~rabbitmq/.erlang.cookie.

Because the net_adm commands do not hang, your epmd setup is OK. That leaves the possibilities of a user mix-up or a cookie mix-up.

Are you sure the broker process is running as user rabbitmq?

Alexandre // 19 Jun 2009

>Are you sure broker process is running as user rabbitmq?

"ps axu | grep rabbit" shows that the process user is rabbitmq.

Vincent // 22 Jun 2009

I found a solution to a related issue which may be helpful to mention.

I was trying to run rabbit after migrating an existing user account. Status kept listing my old username "rabbit@V ..."

$ /opt/local/sbin/rabbitmqctl status
Status of node rabbit@V ...
{badrpc,nodedown}
...done.

My install of RabbitMQ is through MacPorts. Deleting the rabbit directory and its contents within /opt/local/var/lib/rabbitmq/mnesia/rabbit/ did the trick for me.