[0.3] Error training SA from Quarantine after hostname chg

DavidRa
Posts: 30
Joined: 24 Dec 2012 08:29
Location: Sydney, AU


Post by DavidRa »

My EFA still has its training wheels attached; from time to time I get the odd spam through. I've just updated to 0.3, which seemed to go fine, but I cannot train SA any more:

[Screenshot of the error (image not preserved)]


Looks like it could be a change from 0.2 to 0.3? Suggestions for logs to examine greatly appreciated.
Last edited by DavidRa on 25 Jan 2013 00:35, edited 1 time in total.
darky83
Site Admin
Posts: 540
Joined: 30 Sep 2012 11:03
Location: eFa

Re: [0.3] Error training SA from Quarantine

Post by darky83 »

Hi David,

Just did a check and seems to be working fine on my systems.
Can you look at the logfile:

Code: Select all

/var/log/baruwa/celeryd.log
The best thing to do is log in to your machine and tail the log, then try to SA-learn a message and see what happens in the log file:

Code: Select all

cd /var/log/baruwa
tail -f celeryd.log
It should look like:

Code: Select all

[2013-01-21 15:54:58,152: INFO/MainProcess] Got task from broker: process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45]
[2013-01-21 15:54:58,187: INFO/PoolWorker-2] process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45]: Bulk Processing 1 quarantined messages
[2013-01-21 15:55:03,794: INFO/PoolWorker-2] process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45]: Message: E1B55C00A4.A01DE learnt as spam
[2013-01-21 15:55:03,901: INFO/MainProcess] Task process-quarantine[2952d39f-a270-4df9-ae68-c8022740ad45] succeeded in 5.72766685486s: [{'release': None, 'errors': [], 'learn':...
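To pull just the failures out of a busy log instead of watching `tail -f`, something like the following might help. It is only a sketch: `celery_errors` is a made-up helper name, and it assumes the `[timestamp: LEVEL/Process]` line format shown above.

```shell
#!/bin/sh
# celery_errors is a hypothetical helper, not part of Baruwa or Celery.
# It prints only the ERROR-level lines from a celeryd log, assuming
# lines start with "[<timestamp>: <LEVEL>/<process>]".
celery_errors() {
    # $1: log file (defaults to the path used in this thread)
    grep -E '^\[[^]]+: ERROR/' "${1:-/var/log/baruwa/celeryd.log}"
}

# Usage:
#   celery_errors                  # scan the default log
#   celery_errors /path/to/a.log   # scan another file
```

No output means no ERROR-level lines were logged, which is what a healthy run like the one above should give.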
Version eFa 4.x now available!

Re: [0.3] Error training SA from Quarantine

Post by DavidRa »

Nope, my log looks nothing like that. From startup through to retries:

Code: Select all

---- **** -----
--- * ***  * -- [Configuration]
-- * - **** ---   . broker:      amqplib://baruwa@localhost:5672/baruwa
- ** ----------   . loader:      djcelery.loaders.DjangoLoader
- ** ----------   . logfile:     /var/log/baruwa/celeryd.log@INFO
- ** ----------   . concurrency: 2
- ** ----------   . events:      ON
- *** --- * ---   . beat:        ON
-- ******* ----
--- ***** ----- [Queues]
 --------------   . default:     exchange:default (direct) binding:default
                  .hostname:    exchange:default (direct) binding:hostname

[Tasks]
  . delete-domain-signature-files
  . delete-user-signature-files
  . generate-domain-signature-files
  . generate-user-signature-files
  . preview-message
  . process-quarantine
  . process-quarantined-msg
  . release-message
  . test-smtp-server
[2013-01-22 03:06:12,603: INFO/PoolWorker-2] child process calling self.run()
[2013-01-22 03:06:12,606: WARNING/MainProcess] celery@hostname has started.
[2013-01-22 03:06:12,608: INFO/Beat] child process calling self.run()
[2013-01-22 03:06:12,609: INFO/Beat] Celerybeat: Starting...
[2013-01-22 03:06:12,608: INFO/PoolWorker-3] child process calling self.run()
[2013-01-22 03:06:15,647: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 2 seconds...
[2013-01-22 03:06:16,203: INFO/Beat] process shutting down
[2013-01-22 03:06:16,208: WARNING/Beat] Process Beat:
[2013-01-22 03:06:16,208: WARNING/Beat] Traceback (most recent call last):
[2013-01-22 03:06:16,208: WARNING/Beat] File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
[2013-01-22 03:06:16,211: WARNING/Beat] self.run()
[2013-01-22 03:06:16,211: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 437, in run
[2013-01-22 03:06:16,228: WARNING/Beat] self.service.start(embedded_process=True)
[2013-01-22 03:06:16,229: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 377, in start
[2013-01-22 03:06:16,229: WARNING/Beat] interval = self.scheduler.tick()
[2013-01-22 03:06:16,229: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 184, in tick
[2013-01-22 03:06:16,229: WARNING/Beat] next_time_to_run = self.maybe_due(entry, self.publisher)
[2013-01-22 03:06:16,229: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/utils/__init__.py", line 221, in __get__
[2013-01-22 03:06:16,241: WARNING/Beat] value = obj.__dict__[self.__name__] = self.__get(obj)
[2013-01-22 03:06:16,241: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/beat.py", line 275, in publisher
[2013-01-22 03:06:16,242: WARNING/Beat] return self.Publisher(connection=self.connection)
[2013-01-22 03:06:16,242: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/app/amqp.py", line 328, in TaskPublisher
[2013-01-22 03:06:16,256: WARNING/Beat] return TaskPublisher(*args, **self.app.merge(defaults, kwargs))
[2013-01-22 03:06:16,257: WARNING/Beat] File "/usr/lib/pymodules/python2.6/celery/app/amqp.py", line 156, in __init__
[2013-01-22 03:06:16,257: WARNING/Beat] super(TaskPublisher, self).__init__(*args, **kwargs)
[2013-01-22 03:06:16,257: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/compat.py", line 80, in __init__
[2013-01-22 03:06:16,274: WARNING/Beat] self.backend = connection.channel()
[2013-01-22 03:06:16,274: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 124, in channel
[2013-01-22 03:06:16,286: WARNING/Beat] chan = self.transport.create_channel(self.connection)
[2013-01-22 03:06:16,286: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 444, in connection
[2013-01-22 03:06:16,286: WARNING/Beat] self._connection = self._establish_connection()
[2013-01-22 03:06:16,286: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/connection.py", line 410, in _establish_connection
[2013-01-22 03:06:16,286: WARNING/Beat] conn = self.transport.establish_connection()
[2013-01-22 03:06:16,287: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/transport/pyamqplib.py", line 252, in establish_connection
[2013-01-22 03:06:16,298: WARNING/Beat] connect_timeout=conninfo.connect_timeout)
[2013-01-22 03:06:16,298: WARNING/Beat] File "/usr/lib/pymodules/python2.6/kombu/transport/pyamqplib.py", line 51, in __init__
[2013-01-22 03:06:16,298: WARNING/Beat] super(Connection, self).__init__(*args, **kwargs)
[2013-01-22 03:06:16,298: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/connection.py", line 140, in __init__
[2013-01-22 03:06:16,304: WARNING/Beat] (10, 30), # tune
[2013-01-22 03:06:16,304: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/abstract_channel.py", line 89, in wait
[2013-01-22 03:06:16,315: WARNING/Beat] self.channel_id, allowed_methods)
[2013-01-22 03:06:16,316: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/connection.py", line 198, in _wait_method
[2013-01-22 03:06:16,316: WARNING/Beat] self.method_reader.read_method()
[2013-01-22 03:06:16,316: WARNING/Beat] File "/usr/lib/pymodules/python2.6/amqplib/client_0_8/method_framing.py", line 215, in read_method
[2013-01-22 03:06:16,317: WARNING/Beat] raise m
[2013-01-22 03:06:16,317: WARNING/Beat] IOError: Socket closed
[2013-01-22 03:06:16,317: INFO/Beat] process exiting with exitcode 1
[2013-01-22 03:06:20,683: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 4 seconds...
[2013-01-22 03:06:27,725: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 6 seconds...
From there it increases to a 32 second delay between failed retries.

On a hunch, I undid the server name change I had done as part of the upgrade to 0.3, then restarted - and it works. So it would appear perhaps that the name change needs to update something in the celeryd configuration, in the HOSTS file if not already done, or something similar.
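One quick way to test the hosts-file part of that hunch is to check whether the new name is listed at all. This is a rough sketch: `hosts_has_name` is a hypothetical helper, and it only looks at a hosts-format file, not at DNS or the celeryd configuration.

```shell
#!/bin/sh
# hosts_has_name FILE NAME: succeed if NAME appears as a hostname or alias
# in a hosts-format file. Comments are ignored. Hypothetical helper.
hosts_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }   # drop comments before splitting into fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage:
#   hosts_has_name /etc/hosts "$(hostname)" || echo "hostname missing from /etc/hosts"
```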

Re: [0.3] Error training SA from Quarantine

Post by darky83 »

Ah, I see; I just got the same thing after changing the hostname.
The weird thing is that Baruwa does not seem to have any setting that uses the hostname (it uses the Python function socket.gethostname(), and that seems to work fine).

I'm guessing that RabbitMQ is blocking access after the hostname change, but I will look into it later on.
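If RabbitMQ is refusing the connection, a first check is whether the broker still knows the `baruwa` user. A minimal sketch, assuming `rabbitmqctl list_users` prints one user per line with the name in the first column; `rabbit_has_user` is a made-up helper.

```shell
#!/bin/sh
# rabbit_has_user NAME: read `rabbitmqctl list_users` output on stdin and
# succeed if NAME is the first field of any line. Hypothetical helper.
rabbit_has_user() {
    awk -v user="$1" '$1 == user { found = 1 } END { exit !found }'
}

# Usage (as root, with RabbitMQ running):
#   rabbitmqctl list_users | rabbit_has_user baruwa || echo "baruwa user is missing"
#   rabbitmqctl list_vhosts
#   rabbitmqctl list_permissions -p baruwa
```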

Re: [0.3] Error training SA from Quarantine

Post by DavidRa »

Yeah, that seems to be borne out by the log file (/var/log/rabbitmq@hostname):

Code: Select all

=INFO REPORT==== 22-Jan-2013::12:06:13 ===
accepted TCP connection on 0.0.0.0:5672 from 127.0.0.1:38131

=INFO REPORT==== 22-Jan-2013::12:06:13 ===
starting TCP connection <0.262.0> from 127.0.0.1:38131

=ERROR REPORT==== 22-Jan-2013::12:06:16 ===
exception on TCP connection <0.262.0> from 127.0.0.1:38131
{channel0_error,starting,
                {amqp_error,access_refused,"login refused for user 'baruwa'",
                            'connection.start_ok'}}

=INFO REPORT==== 22-Jan-2013::12:06:16 ===
closing TCP connection <0.262.0> from 127.0.0.1:38131
This is fairly old but seems to suggest that Mnesia may have problems with hostname changes due to its architecture. This thread suggests changing the NODENAME may help. Unfortunately NODENAME isn't present in any of the config files I found.

Renaming the database directory doesn't work either. More digging required...
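The node name in play can be compared with what is on disk. This is a sketch under two assumptions: that RabbitMQ uses its default node name of `rabbit@` plus the short hostname, and that the Mnesia data lives under `/var/lib/rabbitmq/mnesia` (the Debian/Ubuntu package default). `expected_node` is a hypothetical helper.

```shell
#!/bin/sh
# expected_node HOSTNAME: print the default RabbitMQ node name for a host,
# i.e. "rabbit@" followed by the hostname up to the first dot.
expected_node() {
    printf 'rabbit@%s\n' "${1%%.*}"
}

# Usage (paths are assumptions, adjust for your install):
#   expected_node "$(hostname)"     # node name RabbitMQ will look for
#   ls /var/lib/rabbitmq/mnesia     # node directories that actually exist
```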

Re: [0.3] Error training SA from Quarantine

Post by darky83 »

Ah, I found the solution.

RabbitMQ uses the hostname as some sort of 'domain', so if you change the hostname all the settings are gone (the baruwa user, password and permissions are reset to the defaults).

The following will add the settings to RabbitMQ again:

Code: Select all

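# Read the broker password from the Baruwa settings (strips the double quotes)
PASSWD="$(grep 'BROKER_PASSWORD =' /etc/baruwa/settings.py | sed 's/.*BROKER_PASSWORD = //' | tr -d '"')"
# Recreate the user, vhost and permissions that RabbitMQ lost
rabbitmqctl add_user baruwa "$PASSWD"
rabbitmqctl add_vhost baruwa
rabbitmqctl set_permissions -p baruwa baruwa ".*" ".*" ".*"
# Remove the default guest account
rabbitmqctl delete_user guest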
I will add this to the EFA-Configure script in 0.4 :mrgreen:
I also tried moving the DB directory to the new hostname, but that does not seem to work; as far as I can see, only adding the settings again works.
(Searching on Google, I see lots of reports of problems after changing the hostname with RabbitMQ; adding the settings again manually seems to be the only approach that 'works'.)
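Before handing the password to `rabbitmqctl`, it may be worth confirming that a value was actually extracted. The sketch below is one way to do that; `broker_password` is a hypothetical helper, and it assumes the setting is a simple one-line assignment with a single- or double-quoted value.

```shell
#!/bin/sh
# broker_password FILE: print the value of BROKER_PASSWORD from a
# Django-style settings file. Hypothetical helper; assumes one simple
# assignment per line with a single- or double-quoted value.
broker_password() {
    sed -n "s/^[[:space:]]*BROKER_PASSWORD[[:space:]]*=[[:space:]]*[\"']\(.*\)[\"'].*/\1/p" "$1" | head -n 1
}

# Usage:
#   PASSWD="$(broker_password /etc/baruwa/settings.py)"
#   [ -n "$PASSWD" ] || echo "BROKER_PASSWORD not found" >&2
```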

Re: [0.3] Error training SA from Quarantine after hostname c

Post by DavidRa »

Tried this - and I couldn't get it to work. I assume it should be done after the name change (in EFA-Configure)? Or does it need to be done "in the middle" of the name change?
root@oldname:/home/efaadmin# PASSWD="`cat /etc/baruwa/settings.py | grep "BROKER_PASSWORD =" | sed 's/.*BROKER_PASSWORD = //' | tr -d '"'`"
root@oldname:/home/efaadmin# rabbitmqctl add_user baruwa $PASSWD
rabbitmqctl add_vhost baruwa
Creating user "baruwa" ...
rabbitmqctl set_permissions -p baruwa baruwa ".*" ".*" ".*"
rabbitmqctl delete_user guest
Error: unable to connect to node 'rabbit@newname': nodedown
diagnostics:
- nodes and their ports on newname: [{rabbitmqctl15004,46614}]
- current node: 'rabbitmqctl15004@newname'
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: 5gPAOcX1i5KUSLJnkdCRsA==

Re: [0.3] Error training SA from Quarantine after hostname c

Post by darky83 »

Is your RabbitMQ running?

I get the same errors when I stop RabbitMQ, so my guess is that, because of the testing, your RabbitMQ is no longer running.
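A quick way to check is to look for `nodedown` in the `rabbitmqctl` output before re-adding the settings. This is a sketch only: `node_is_down` is a hypothetical helper, and the service name `rabbitmq-server` is an assumption based on the Debian/Ubuntu package.

```shell
#!/bin/sh
# node_is_down: read rabbitmqctl output on stdin and succeed if it reports
# "nodedown", as in the error above. Hypothetical helper.
node_is_down() {
    grep -q 'nodedown'
}

# Usage (run as root; the service name is an assumption):
#   if rabbitmqctl status 2>&1 | node_is_down; then
#       service rabbitmq-server start
#   fi
```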

d.