
[0.3] Error training SA from Quarantine after hostname chg

Posted: 21 Jan 2013 05:15
by DavidRa
My EFA still has its training wheels attached; from time to time I get the odd spam through. I've just updated to 0.3, which seemed to go fine, but I cannot train SA any more:

Image


Looks like it could be a change from 0.2 to 0.3? Suggestions for logs to examine greatly appreciated.

Re: [0.3] Error training SA from Quarantine

Posted: 21 Jan 2013 13:58
by darky83
Hi David,

I just checked and it seems to be working fine on my systems.
Can you look at the logfile:

Code:

/var/log/baruwa/celeryd.log ?
The best thing to do is log in to your machine and tail the log, then try to SA-learn a message and see what happens in the log file.

Code:

cd /var/log/baruwa
tail -f celeryd.log
It should look like:

Code:

[2013-01-21 15:54:58,152: INFO/MainProcess] Got task from broker: process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45]
[2013-01-21 15:54:58,187: INFO/PoolWorker-2] process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45]: Bulk Processing 1 quarantined messages
[2013-01-21 15:55:03,794: INFO/PoolWorker-2] process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45]: Message: E1B55C00A4.A01DE learnt as spam
[2013-01-21 15:55:03,901: INFO/MainProcess] Task process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45] succeeded in 5.72766685486s: [{'release': None, 'errors': [], 'learn':...
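
To compare a log against the sample above without reading it line by line, a rough sketch like the following counts successful learn operations and error records. It assumes the record format shown above (level and process name written as `INFO/MainProcess`, success lines containing `learnt as`); `summarize_celery_log` is a name made up for this example.

```shell
# Hypothetical helper: count learn successes and ERROR records in a celeryd log.
# Assumes the record format "[timestamp: LEVEL/Process] message" shown above.
summarize_celery_log() {
    log="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    learnt=$(grep -c 'learnt as' "$log" || true)
    errors=$(grep -c ': ERROR/' "$log" || true)
    echo "learnt=$learnt errors=$errors"
}

# Usage (path taken from the post above):
#   summarize_celery_log /var/log/baruwa/celeryd.log
```

A non-zero `errors` count with `learnt=0` would suggest the worker never got as far as processing the quarantine task.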

Re: [0.3] Error training SA from Quarantine

Posted: 22 Jan 2013 02:08
by DavidRa
Nope, my log looks nothing like that. From startup through to retries:

Code:

---- **** -----
--- * ***  * -- [Configuration]
-- * - **** ---   . broker:      amqplib://baruwa@localhost:5672/baruwa
- ** ----------   . loader:      djcelery.loaders.DjangoLoader
- ** ----------   . logfile:     /var/log/baruwa/celeryd.log@INFO
- ** ----------   . concurrency: 2
- ** ----------   . events:      ON
- *** --- * ---   . beat:        ON
-- ******* ----
--- ***** ----- [Queues]
 --------------   . default:     exchange:default (direct) binding:default
                  .hostname:    exchange:default (direct) binding:hostname

[Tasks]
  . delete-domain-signature-files
  . delete-user-signature-files
  . generate-domain-signature-files
  . generate-user-signature-files
  . preview-message
  . process-quarantine
  . process-quarantined-msg
  . release-message
  . test-smtp-server
[2013-01-22 03:06:12,603: INFO/PoolWorker-2] child process calling self.run()
[2013-01-22 03:06:12,606: WARNING/MainProcess] celery@hostname has started.
[2013-01-22 03:06:12,608: INFO/Beat] child process calling self.run()
[2013-01-22 03:06:12,609: INFO/Beat] Celerybeat: Starting...
[2013-01-22 03:06:12,608: INFO/PoolWorker-3] child process calling self.run()
[2013-01-22 03:06:15,647: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 2 seconds...
[2013-01-22 03:06:16,203: INFO/Beat] process shutting down
[2013-01-22 03:06:16,208: WARNING/Beat] Process Beat:
[2013-01-22 03:06:16,208: WARNING/Beat] Traceback (most recent call last):
[2013-01-22 03:06:16,208: WARNING/Beat] File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
[2013-01-22 03:06:16,211: WARNING/Beat] self.run()
[2013-01-22 03:06:16,211: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 437, in run
[2013-01-22 03:06:16,228: WARNING/Beat] self.service.start(embedded_process=True)
[2013-01-22 03:06:16,229: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 377, in start
[2013-01-22 03:06:16,229: WARNING/Beat] interval = self.scheduler.tick()
[2013-01-22 03:06:16,229: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 184, in tick
[2013-01-22 03:06:16,229: WARNING/Beat] next_time_to_run = self.maybe_due(entry, self.publisher)
[2013-01-22 03:06:16,229: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/utils/__init__.py", line 221, in __get__
[2013-01-22 03:06:16,241: WARNING/Beat] value = obj.__dict__[self.__name__] = self.__get(obj)
[2013-01-22 03:06:16,241: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 275, in publisher
[2013-01-22 03:06:16,242: WARNING/Beat] return self.Publisher(connection=self.connection)
[2013-01-22 03:06:16,242: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/app/amqp.py", line 328, in TaskPublisher
[2013-01-22 03:06:16,256: WARNING/Beat] return TaskPublisher(*args, **self.app.merge(defaults, kwargs))
[2013-01-22 03:06:16,257: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/app/amqp.py", line 156, in __init__
[2013-01-22 03:06:16,257: WARNING/Beat] super(TaskPublisher, self).__init__(*args, **kwargs)
[2013-01-22 03:06:16,257: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/compat.py", line 80, in __init__
[2013-01-22 03:06:16,274: WARNING/Beat] self.backend = connection.channel()
[2013-01-22 03:06:16,274: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 124, in channel
[2013-01-22 03:06:16,286: WARNING/Beat] chan = self.transport.create_channel(self.connection)
[2013-01-22 03:06:16,286: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 444, in connection
[2013-01-22 03:06:16,286: WARNING/Beat] self._connection = self._establish_connection()
[2013-01-22 03:06:16,286: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 410, in _establish_connection
[2013-01-22 03:06:16,286: WARNING/Beat] conn = self.transport.establish_connection()
[2013-01-22 03:06:16,287: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/transport/pyamqplib.py", line 252, in establish_connection
[2013-01-22 03:06:16,298: WARNING/Beat] connect_timeout=conninfo.connect_timeout)
[2013-01-22 03:06:16,298: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/transport/pyamqplib.py", line 51, in __init__
[2013-01-22 03:06:16,298: WARNING/Beat] super(Connection, self).__init__(*args, **kwargs)
[2013-01-22 03:06:16,298: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/connection.py", line 140, in __init__
[2013-01-22 03:06:16,304: WARNING/Beat] (10, 30), # tune
[2013-01-22 03:06:16,304: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/abstract_channel.py", line 89, in wait
[2013-01-22 03:06:16,315: WARNING/Beat] self.channel_id, allowed_methods)
[2013-01-22 03:06:16,316: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/connection.py", line 198, in _wait_method
[2013-01-22 03:06:16,316: WARNING/Beat] self.method_reader.read_method()
[2013-01-22 03:06:16,316: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/method_framing.py", line 215, in read_method
[2013-01-22 03:06:16,317: WARNING/Beat] raise m
[2013-01-22 03:06:16,317: WARNING/Beat] IOError: Socket closed
[2013-01-22 03:06:16,317: INFO/Beat] process exiting with exitcode 1
[2013-01-22 03:06:20,683: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 4 seconds...
[2013-01-22 03:06:27,725: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 6 seconds...
From there it increases to a 32 second delay between failed retries.

On a hunch, I undid the server name change I had done as part of the upgrade to 0.3, then restarted - and it works. So it would appear perhaps that the name change needs to update something in the celeryd configuration, in the HOSTS file if not already done, or something similar.

Re: [0.3] Error training SA from Quarantine

Posted: 22 Jan 2013 06:49
by darky83
Ah, I see. I just got the same thing after changing the hostname.
The weird thing is that Baruwa does not seem to have any setting that uses the hostname (it uses the Python function socket.gethostname(), and that seems to work fine).

I'm guessing that RabbitMQ is blocking access after the hostname change, but I will look into it later on.
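
One way to test that guess is to ask RabbitMQ what it currently knows about the baruwa account. A minimal sketch, assuming the user and the vhost are both called `baruwa` (as the broker URL in the celeryd banner suggests) and that `rabbitmqctl` is run as root while the broker is up; `check_baruwa_broker` is a made-up helper name, not part of EFA or Baruwa:

```shell
# Hypothetical helper: report whether the RabbitMQ node still has the
# baruwa user and vhost. Assumes both are named "baruwa".
check_baruwa_broker() {
    cbb_status=0
    # list_users prints one user per line, name first (tags may follow)
    if rabbitmqctl list_users | grep -Eq '^baruwa([[:space:]]|$)'; then
        echo "user baruwa: present"
    else
        echo "user baruwa: MISSING"
        cbb_status=1
    fi
    # list_vhosts prints one vhost name per line
    if rabbitmqctl list_vhosts | grep -Eq '^baruwa([[:space:]]|$)'; then
        echo "vhost baruwa: present"
    else
        echo "vhost baruwa: MISSING"
        cbb_status=1
    fi
    return "$cbb_status"
}

# Usage:
#   check_baruwa_broker || echo "broker has lost the baruwa settings"
```

If both are present, `rabbitmqctl list_permissions -p baruwa` would be the next thing to look at.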

Re: [0.3] Error training SA from Quarantine

Posted: 22 Jan 2013 11:27
by DavidRa
Yeah, that seems to be borne out by the log file (/var/log/rabbitmq@hostname):

Code:

=INFO REPORT==== 22-Jan-2013::12:06:13 ===
accepted TCP connection on 0.0.0.0:5672 from 127.0.0.1:38131

=INFO REPORT==== 22-Jan-2013::12:06:13 ===
starting TCP connection <0.262.0> from 127.0.0.1:38131

=ERROR REPORT==== 22-Jan-2013::12:06:16 ===
exception on TCP connection <0.262.0> from 127.0.0.1:38131
{channel0_error,starting,
                {amqp_error,access_refused,"login refused for user 'baruwa'",
                            'connection.start_ok'}}

=INFO REPORT==== 22-Jan-2013::12:06:16 ===
closing TCP connection <0.262.0> from 127.0.0.1:38131
This is fairly old but seems to suggest that Mnesia may have problems with hostname changes due to its architecture. This thread suggests changing the NODENAME may help. Unfortunately NODENAME isn't present in any of the config files I found.

Renaming the database directory doesn't work either. More digging required...

Re: [0.3] Error training SA from Quarantine

Posted: 23 Jan 2013 20:38
by darky83
Ah I found the solution.

RabbitMQ uses the hostname as some sort of 'domain', so if you change the hostname all the settings are gone (the baruwa user, password and permissions are reset to the defaults).

The following will add the settings to RabbitMQ again:

Code:

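# Read the broker password from the Baruwa settings file
PASSWD="$(grep "BROKER_PASSWORD =" /etc/baruwa/settings.py | sed 's/.*BROKER_PASSWORD = //' | tr -d '"')"
# Re-create the baruwa user and vhost, then grant full permissions
rabbitmqctl add_user baruwa "$PASSWD"
rabbitmqctl add_vhost baruwa
rabbitmqctl set_permissions -p baruwa baruwa ".*" ".*" ".*"
# Remove the default guest account
rabbitmqctl delete_user guest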
I will add this to the EFA-Configure script in 0.4.
I also tried playing with the DB directory and moving it to the new hostname, but none of that seems to work; only adding the settings again works, as far as I can see.
(Searching on Google, I see lots of problems when changing the hostname with RabbitMQ; it seems that adding the settings again manually is the only way that 'works'.)
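
Since an empty password would silently create a broken user, it may be worth checking the extracted value before handing it to rabbitmqctl. A small sketch, assuming settings.py holds a single line of the form `BROKER_PASSWORD = "secret"`; `read_broker_password` is a hypothetical name used only here:

```shell
# Hypothetical helper: print the value of BROKER_PASSWORD from a
# Django-style settings file. Assumes one uncommented line such as
#   BROKER_PASSWORD = "secret"
# with the value in double or single quotes.
read_broker_password() {
    sed -n "s/^[[:space:]]*BROKER_PASSWORD[[:space:]]*=[[:space:]]*[\"']\(.*\)[\"'][[:space:]]*\$/\1/p" "$1" | head -n 1
}

# Usage:
#   PASSWD="$(read_broker_password /etc/baruwa/settings.py)"
#   [ -n "$PASSWD" ] || { echo "BROKER_PASSWORD not found" >&2; exit 1; }
#   rabbitmqctl add_user baruwa "$PASSWD"
```

Anchoring the pattern at the start of the line skips commented-out entries, and quoting `"$PASSWD"` keeps a password containing spaces in one piece.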

Re: [0.3] Error training SA from Quarantine after hostname c

Posted: 25 Jan 2013 00:43
by DavidRa
Tried this - and I couldn't get it to work. I assume it should be done after the name change (in EFA-Configure)? Or does it need to be done "in the middle" of the name change?
root@oldname:/home/efaadmin# PASSWD="`cat /etc/baruwa/settings.py | grep "BROKER_PASSWORD =" | sed 's/.*BROKER_PASSWORD = //' | tr -d '"'`"
root@oldname:/home/efaadmin# rabbitmqctl add_user baruwa $PASSWD
rabbitmqctl add_vhost baruwa
Creating user "baruwa" ...
rabbitmqctl set_permissions -p baruwa baruwa ".*" ".*" ".*"
rabbitmqctl delete_user guest
Error: unable to connect to node 'rabbit@newname': nodedown
diagnostics:
- nodes and their ports on newname: [{rabbitmqctl15004,46614}]
- current node: 'rabbitmqctl15004@newname'
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: 5gPAOcX1i5KUSLJnkdCRsA==

Re: [0.3] Error training SA from Quarantine after hostname c

Posted: 26 Jan 2013 17:02
by darky83
Is your RabbitMQ running?

I get the same errors when I stop RabbitMQ, so my guess is that, after all the testing, your RabbitMQ is no longer running.
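
A quick way to check is `rabbitmqctl status`, which exits non-zero when it cannot reach the node (the "nodedown" case). A minimal sketch; `rabbit_is_up` is a made-up name, and the service name in the comment is an assumption about a Debian/Ubuntu style install:

```shell
# Hypothetical helper: return 0 if the local RabbitMQ node answers,
# non-zero otherwise (for example when the node is down).
rabbit_is_up() {
    rabbitmqctl status >/dev/null 2>&1
}

# Usage:
#   if rabbit_is_up; then
#       echo "RabbitMQ is running, safe to re-add the baruwa user"
#   else
#       # assumed init script name; adjust for your system
#       echo "RabbitMQ is down, start it first (e.g. service rabbitmq-server start)"
#   fi
```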

d.