Thursday, June 24, 2010

SB: TaskLifeManager restarted on submit-2

It was restarted earlier by Sanjay. Posting this for the record.

restarted TaskLifeManager on submit-2

No update to the log for more than 20 minutes. Last lines in the log:

crab@submit-2 ~/work/TaskLifeManager$ tail -5 ComponentLog
2010-06-23 12:30:34,142:[/tmp/del_proxies/241fb22338757d8c32122053135f8cecd3d645a6]
2010-06-23 14:49:14,720:Exception raised: 'Error deleting [/data/CSstoragePath/slehti_crab_0_100618_161934_qe306c/out_files_1149.tgz]'
2010-06-23 14:49:14,721:problems deleting osb for job 1149
2010-06-23 14:51:41,590:Exception raised: 'Error deleting [/data/CSstoragePath/slehti_crab_0_100618_161934_qe306c/crab_fjr_1173.xml]'
2010-06-23 14:51:41,590:problems deleting osb for job 1173
crab@submit-2 ~/work/TaskLifeManager$
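
The "no update for more than 20 minutes" check above can be sketched in a few lines of Python. This is only an illustration, not a tool running on the servers: the function names are made up for this post, and the only things taken from the log are the timestamp format at the start of each ComponentLog line and the 20-minute threshold.

```python
from datetime import datetime, timedelta

# Length of the leading timestamp, e.g. "2010-06-23 14:51:41,590"
TIMESTAMP_LEN = len("2010-06-23 14:51:41,590")


def last_log_time(lines):
    """Return the timestamp of the last line that starts with one, or None.

    Lines without a leading timestamp (e.g. traceback lines) are skipped.
    """
    for line in reversed(lines):
        try:
            return datetime.strptime(line[:TIMESTAMP_LEN], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue
    return None


def is_stale(lines, now, max_silence=timedelta(minutes=20)):
    """True if the newest timestamped line is older than max_silence (hypothetical helper)."""
    last = last_log_time(lines)
    return last is None or now - last > max_silence
```

Feeding it the output of `tail` on ComponentLog together with the current time gives a yes/no answer on whether the component looks silent; whether a restart is then the right action is still a human decision.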

Wednesday, June 23, 2010

GetOutput restarted again on vocms21

because it had not been updating since 20:00. Also, http://vocms21.cern.ch:8888/compstatus/ was showing ~1K jobs in the JC queue; now there are 8.

Tuesday, June 22, 2010

GetOutput restarted on vocms21

because of
https://hypernews.cern.ch/HyperNews/CMS/get/crabDevelopment/1404/1/1/1.html
There was no useful message in ComponentLog.

vocms21 updated

Earlier this morning vocms21 was updated live to crab server 112. So far so good:
components are running, but we are keeping the server out of the available pool
https://hypernews.cern.ch/HyperNews/CMS/get/crabDevelopment/1403/1.html

plan is to drain it and then scratch the 3GB MySqlDB and start a new one
with autoextend set to 1GB.

stefano (upon info from Sanjay)

taking slc5cern = vocms21.cern.ch off the available servers list

because TaskTracking is not restarting

TaskTracking on vocms21 stuck, fails to restart

Restarted TT because of
https://savannah.cern.ch/bugs/index.php?69074

But it fails to restart:
Component: TaskTracking NOT Running

2010-06-22 10:48:14,365:Problem registering component
[(1205, 'Lock wait timeout exceeded; try restarting transaction')]
2010-06-22 10:48:14,368:Traceback (most recent call last):
  File "/home/crab/sw_area/slc5_amd64_gcc434/cms/crab-server/CRABSERVER_1_1_1_pre12/lib/TaskTracking/TaskTrackingComponent.py", line 1185, in startComponent
    self.ms.registerAs("TaskTracking")
  File "/home/crab/sw_area/slc5_amd64_gcc434/cms/prodagent/PRODAGENT_0_12_17_CRAB_1-cmp7/lib/MessageService/MessageService.py", line 154, in registerAs
    cursor.execute(sqlCommand)
  File "/build/diego/crabserver/slc5_amd64_gcc434/external/py2-mysqldb/1.2.2-cmp4/lib/python2.4/site-packages/MySQLdb/cursors.py", line 166, in execute
  File "/build/diego/crabserver/slc5_amd64_gcc434/external/py2-mysqldb/1.2.2-cmp4/lib/python2.4/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')



The component keeps retrying automatically, but keeps failing.