Thursday, June 24, 2010
SB: TaskLifeManager restarted on submit-2
It was restarted earlier by Sanjay. Posting this for record-keeping.
Restarted TaskLifeManager on submit-2: no update to the log for more than 20 min. Last lines in the log:
crab@submit-2 ~/work/TaskLifeManager$ tail -5 ComponentLog
2010-06-23 12:30:34,142:[/tmp/del_proxies/241fb22338757d8c32122053135f8cecd3d645a6]
2010-06-23 14:49:14,720:Exception raised: 'Error deleting [/data/CSstoragePath/slehti_crab_0_100618_161934_qe306c/out_files_1149.tgz]'
2010-06-23 14:49:14,721:problems deleting osb for job 1149
2010-06-23 14:51:41,590:Exception raised: 'Error deleting [/data/CSstoragePath/slehti_crab_0_100618_161934_qe306c/crab_fjr_1173.xml]'
2010-06-23 14:51:41,590:problems deleting osb for job 1173
crab@submit-2 ~/work/TaskLifeManager$
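The restart criterion used here is just that ComponentLog has not been written to for more than 20 minutes. Below is a minimal Python sketch of that check. It assumes the log file's modification time is a reasonable proxy for component activity. The helper name `log_is_stale` and its default threshold are illustrative and not part of the CRAB server tools.

```python
import os
import time


def log_is_stale(path, max_age_minutes=20, now=None):
    """Return True if the file at `path` has not been modified for more than
    `max_age_minutes` minutes.

    Hypothetical helper, not a CRAB server tool. It assumes the component
    appends to its log regularly while it is healthy.
    """
    if now is None:
        now = time.time()
    # st_mtime: last modification time, in seconds since the epoch
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > max_age_minutes * 60
```

For example, run `log_is_stale("ComponentLog")` from `~/work/TaskLifeManager`. A quiet log is only a hint, though. Some components may legitimately stay idle for a while, so the tail of the log should still be inspected before restarting.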
Wednesday, June 23, 2010
GetOutput restarted again on vocms21
It had not been updating since 20:00. Also, http://vocms21.cern.ch:8888/compstatus/ was showing ~1K jobs in the JC queue; now there are 8.
Tuesday, June 22, 2010
GetOutput restarted on vocms21
because of
https://hypernews.cern.ch/HyperNews/CMS/get/crabDevelopment/1404/1/1/1.html
No useful message was in ComponentLog.
vocms21 updated
Earlier this morning vocms21 was updated live to CRAB server 112. So far so good:
components are running, but we keep the server out of the available pool, see
https://hypernews.cern.ch/HyperNews/CMS/get/crabDevelopment/1403/1.html
The plan is to drain it, then scratch the 3 GB MySQL DB and start a new one
with autoextend set to 1 GB.
Stefano (upon info from Sanjay)
taking slc5cern = vocms21.cern.ch off the available servers list
because TaskTracking is not restarting.
TaskTracking on vocms21 stuck, fails to restart
Restarted TT (TaskTracking) because of
https://savannah.cern.ch/bugs/index.php?69074
but it fails to restart:
Component: TaskTracking NOT Running
2010-06-22 10:48:14,365:Problem registering component
[(1205, 'Lock wait timeout exceeded; try restarting transaction')]
2010-06-22 10:48:14,368:Traceback (most recent call last):
File "/home/crab/sw_area/slc5_amd64_gcc434/cms/crab-server/CRABSERVER_1_1_1_pre12/lib/TaskTracking/TaskTrackingComponent.py", line 1185, in startComponent
self.ms.registerAs("TaskTracking")
File "/home/crab/sw_area/slc5_amd64_gcc434/cms/prodagent/PRODAGENT_0_12_17_CRAB_1-cmp7/lib/MessageService/MessageService.py", line 154, in registerAs
cursor.execute(sqlCommand)
File "/build/diego/crabserver/slc5_amd64_gcc434/external/py2-mysqldb/1.2.2-cmp4/lib/python2.4/site-packages/MySQLdb/cursors.py", line 166, in execute
File "/build/diego/crabserver/slc5_amd64_gcc434/external/py2-mysqldb/1.2.2-cmp4/lib/python2.4/site-packages/MySQLdb/connections.py", line 35, in defaulterrorhandler
OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
The component keeps retrying automatically, but keeps failing.
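MySQL error 1205 means a statement waited longer than `innodb_lock_wait_timeout` for a row lock that another transaction held. Retrying only helps once that other transaction commits or is killed, which would be consistent with every automatic retry failing here. Below is a minimal sketch of a bounded retry with exponential backoff. It assumes a DB-API style exception whose `args[0]` is the MySQL error code, which is what MySQLdb's `OperationalError` carries in the traceback above. `register` stands in for a call such as `self.ms.registerAs("TaskTracking")`, and `retry_on_lock_timeout` is a hypothetical helper, not part of CRAB server or ProdAgent.

```python
import time

# MySQL error code for 'Lock wait timeout exceeded; try restarting transaction'
LOCK_WAIT_TIMEOUT = 1205


def retry_on_lock_timeout(register, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `register()`. On MySQL error 1205, wait and retry with exponential
    backoff (base_delay, 2*base_delay, 4*base_delay, ...).

    Any other error, or a 1205 on the final attempt, is re-raised.
    Hypothetical helper, not part of CRAB server / ProdAgent.
    """
    for attempt in range(attempts):
        try:
            return register()
        except Exception as exc:
            # MySQLdb errors carry (code, message) in exc.args
            code = exc.args[0] if exc.args else None
            if code != LOCK_WAIT_TIMEOUT or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Backoff only buys time. If another connection holds the lock indefinitely, the blocking transaction has to be found and ended on the MySQL side. For that reason, a bounded retry that eventually re-raises is arguably preferable to retrying forever.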