January 23, 2012

voms-proxy-info gave a lot of errors. voms-proxy-init (run as Xenia) gave:

    WARNING: Unable to verify signature! Server certificate possibly not installed.
    Error: VOMS extension not found!
    subject   : /DC=org/DC=doegrids/OU=People/CN=Xenia Fave 604774/CN=proxy
    issuer    : /DC=org/DC=doegrids/OU=People/CN=Xenia Fave 604774
    identity  : /DC=org/DC=doegrids/OU=People/CN=Xenia Fave 604774
    type      : proxy
    strength  : 1024 bits
    path      : /tmp/x509up_u537
    timeleft  : 11:59:37

Maybe I'm not CA and that is why it's not liking me? I should preface this: I had added my userkey.pem and usercert.pem to phedex and changed the passphrase in proxyrenew.sh.forpatrick to my passphrase instead of Xenia's. I'm not CA though, so it may not like me. idk

Then on my account I tried voms-proxy-info and it said "Couldn't find a valid proxy", and then I tried voms-proxy-init and it said:

    Unable to find user certificate.
    unsupported method Function: BIO_read
    not enough data Function: ASN1_D2I_READ_BIO
    processing cert File=/home/jfischer/.globus/usercred.p12 Function: proxy_init_cred

To get my certificate stuff copied over (from http://www.doegrids.org/pages/cert-request.htm):

Extract your certificate (which contains the public key) and the private key.

Certificate:

    openssl pkcs12 -in YourCert.p12 -clcerts -nokeys -out $HOME/.globus/usercert.pem

To get the encrypted private key:

    openssl pkcs12 -in YourCert.p12 -nocerts -out $HOME/.globus/userkey.pem

You must set the mode on your userkey.pem file to read/write only by the owner, otherwise grid-proxy-init will not use it (chmod go-rw $HOME/.globus/userkey.pem).

January 25, 2012

Emailed Himali again about the SRM copy problem. He is going to send me a list of the commands he uses to get to the problem, and I am going to try and replicate it. Hopefully I can't, so I can just make him a new account.

Cert/Proxy problem: see above problem.
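Side note on the proxy listing from January 23: the "timeleft" field is an H:MM:SS string, and a renew script would want it as a number. A minimal sketch of that conversion, assuming the value has already been cut out of the listing. The function names and the 6-hour threshold are made up for illustration and are not part of the proxy renew script.

```shell
# Convert an H:MM:SS "timeleft" value (as shown in the proxy listing) to seconds.
timeleft_to_seconds() {
    # awk reads each field as a decimal number, so "08" and "09" are safe.
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the proxy has less time left than the threshold.
# $1: timeleft string, $2: threshold in seconds (hypothetical default: 6 hours).
needs_renewal() {
    [ "$(timeleft_to_seconds "$1")" -lt "${2:-21600}" ]
}

# Value taken from the January 23 listing:
timeleft_to_seconds "11:59:37"   # prints 43177
```

With that, something like "if needs_renewal 01:00:00; then ...; fi" could decide whether to run the renew script, though how the renew script gets called is site-specific.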
We got an email from OSG Security to update openssl, which I did with yum:

    yum update openssl

I need to figure out how to restart whatever uses the openssl libraries. Important for the security fix.

Also tried the vdt update of the CA CRLs and everything failed, and the proxies need to be renewed. proxy.cert expired on December 15; related to the above? I think everything is broken?

Used $VDT_LOCATION/fetch-crl/share/doc/fetch-crl-2.6.6/fetch-crl.cron to try and update the CA CRLs and that gives me the error:

    fetch-crl[13885]: 20120125T163922-0500 verify failed for CRL issued by 'SWITCHslcs CA (5e15f3bd)' (Error getting CRL issuer certificate)

Did vdt-update-certs, then tried the $VDT_LOCATION... command again and it worked!

Tried to do voms-proxy-init and it gave me the error:

    bad file system permissions on private key
    key must only be readable by the user
    File=/etc/grid-security/hostkey.pem

So, now checking permissions.

January 27, 2012

Fixed voms-proxy-init and was able to run the proxy renew script. I had copied my old certificate over, not my new one. Then there were a lot of libraries missing. THEN there was an error where it couldn't find the file vomses, so I made it so it would be happy. I also had to change some permissions. To run voms-proxy-init and the proxy renew script you need to be phedex, and you run the proxy renew script first and then voms-proxy-init.

The SAM tests that continually fail are still failing and the tests on the CE haven't been run for over 24 hours. Maybe this will fix this.

January 28, 2012

NOPE. But the SAM test on the SE that fails (put) is failing for what looks to be a different reason. Instead of unknown it says:

    ERROR: file:/var/lib/gridprobes/cms.Role=production/org.cms/SRM/uscms1-se.fltech-grid3.fit.edu/testFile.txt: zero number of replicas

I have a feeling that it is because the CRLs need to be updated. Trying to do that with the above $VDT_LOCATION... stuff.
MERRRRR gives me this:

    fetch-crl[10622]: 20120128T104917-0500 RetrieveFileByURL: download no data from http://ca.grid.arn.dz/pki/pub/crl/cacrl.crl

But based on what the SAM tests say, this could also be related to Himali's problem. Below is the output of the Put test:

    CRITICAL:
    Testing from: samnag016.cern.ch
    DN: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=asciaba/CN=430796/CN=Andrea Sciaba/CN=proxy/CN=proxy/CN=proxy/CN=proxy
    VOMS FQANs: /cms/Role=production/Capability=NULL, /cms/Role=NULL/Capability=NULL, /cms/TEAM/Role=NULL/Capability=NULL, /cms/dbs/Role=NULL/Capability=NULL
    lcg_util-1.11.16-2
    GFAL-client-1.11.16-2
    VOPut: Copy file using lcg_cp3().
    Parameters:
      defaulttype: 2
      srctype: 0
      dsttype: 2
      nobdi: 1
      vo: cms
      nbstreams: 1
      conf_file: -
      insecure: 0
      verbose: 1
      timeout: 120
      src_spacetokendesc: -
      dest_spacetokendesc: -
    StartTime of the transfer: 2012-01-28 15:27:01.588432
    Destination: srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/BeStMan/cms/store/unmerged/SAM/testSRM/SAM-uscms1-se.fltech-grid3.fit.edu/lcg-util/testfile-put-nospacetoken-1327760814-762787f07f55.txt
    ERROR: file:/var/lib/gridprobes/cms.Role=production/org.cms/SRM/uscms1-se.fltech-grid3.fit.edu/testFile.txt: zero number of replicas
    lcg_cp3 detailed output is:
    -----
    Using grid catalog type: UNKNOWN
    Using grid catalog : (null)
    VO name: cms
    Checksum type: None
    Destination SE type: SRMv2
    Destination SRM Request Token: uscms01:47_PUT_4208137889
    Source URL: file:/var/lib/gridprobes/cms.Role=production/org.cms/SRM/uscms1-se.fltech-grid3.fit.edu/testFile.txt
    File size: 20
    Source URL for copy: file:/var/lib/gridprobes/cms.Role=production/org.cms/SRM/uscms1-se.fltech-grid3.fit.edu/testFile.txt
    Destination URL: gsiftp://uscms1-se.fltech-grid3.fit.edu//bestman/BeStMan/cms/store/unmerged/SAM/testSRM/SAM-uscms1-se.fltech-grid3.fit.edu/lcg-util/testfile-put-nospacetoken-1327760814-762787f07f55.txt
    # streams: 1
    0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
    0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
    -----
    VO specific Detailed Output: None
    critical= 1
    File was NOT copied to SRM.
    file= testfile-put-nospacetoken-1327760814-762787f07f55.txt

Highlighted text makes me think this is Himali's problem. Crap.

15:27:20

February 2, 2012

I need to become the CA for the site... so, to do this, I need to become the OSG RA Authority Agent for the site and then become the CA for the DOEGrids... I think. So I sent them an email based on a site I found in Xenia's documentation. Now we wait on that...

In the meantime, I need to train Vallary, Kim, Burcu, Sam, and Joao... and whoosh, there goes all my time.

Problems that need to be fixed:
* Himali's SRM Copy/Publish Problem
* need to renew grid proxy and there is a problem with that
* I am trying to replicate his problem with my account to see if it is localized to his account or if it is just a cluster malfunction
* SAM Tests on CE haven't been run in a week... and the same SAM Tests on the SE are broken or have an error, as was the case with the old proxy problem with PhEDEx that I fixed last week... and I think it has to do with Himali's problem
* I think I'll have Kim restart NFS and Squid for funsies to see if that could fix anything... you never know
* Along the training lines: I am now meeting with Burcu and Sam on Tuesdays from 12:30 until 2:00, which kinda sucks for me. But I think I am going to try and kill two birds with one stone here: I can have KIM meet with them and fix the Doug Cramer Condor problem. Delegation is awesome
* Vallary I think needs more of the one-on-one time with me, and I can't think of a problem to have her fix at the moment :( as in, the problems are all pretty complicated... and I don't know how I can even fix them... :(:(:(:(:(:( <--haha look at it the other way!

So as to try and get stuff done I will not be IN the lab so no one can bug me. Initialize seclusion mode!

February 15, 2012

Trying to replicate Himali's problem:

    $ export SCRAM_ARCH=slc5_amd64_gcc434
    $ source /mnt/nas0/OSG/APP/cmssoft/cms/cmsset_default.sh
    $ scramv1 project CMSSW CMSSW_4_2_8
    $ cd CMSSW_4_2_8/src
    $ cmsenv
    $ cd ~
    $ source /mnt/nas0/OSG/APP/crab/CRAB_2_7_9/crab.sh
    $ crab -create crab.cfg

February 20, 2012

Error running the crab job and now trying to fix it. It says I need to register in SiteDB. Following instructions from https://twiki.cern.ch/twiki/bin/viewauth/CMS/SiteDBForCRAB#Adding_your_DN_to_your_profile

Received the following errors:

    crab: Version 2.7.9 running on Wed Feb 15 12:07:06 2012 EST (17:07:06 UTC)
    crab. Working options:
      scheduler           condor_g
      job type            CMSSW
      server              OFF
      working directory   /home/jfischer/crab_0_120215_120706/
    crab: error detecting glite version
    crab: error detecting glite version
    crab: Command: condor_config_val ENABLE_GRID_MONITOR
     failed with exit code 1
    Not defined: ENABLE_GRID_MONITOR
    crab: Error extracting user name from SiteDB: Problem parsing data. Cachefile cleared. Retrying may work
     Check that you are registered in SiteDB, see https://twiki.cern.ch/twiki/bin/view/CMS/SiteDBForCRAB
     or
     there is no user name associated to DN /DC=org/DC=doegrids/OU=People/CN=Johanna-Laina Fischer ****** in SiteDB. You need to register in SiteDB with the instructions at https://twiki.cern.ch/twiki/bin/view/CMS/SiteDBForCRAB
    Log file is /home/jfischer/crab_0_120215_120706/log/crab.log

yeah... I forgot my Hypernews login stuff apparently :( Emailed the Hypernews Admin. Hopefully this can be resolved tonight!

February 23, 2012

I am now registered with SiteDB and I changed my Hypernews password. The new error is:

    crab: User defined PSet file step2_DIGI_L1_DIGI2RAW_HLT.py does not exist

So I'll just have to copy this over from Himali's directory, but the question now is, which directory to put it in?

ALSO!
I got an email from Bockjoo:

    "FIT is not accessible via grid with this error:
    GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
    As a result, CMSSW software installation is failing at FIT. Is FIT on a downtime?"

Basically FIT is not accessible to the grid and I restarted the gatekeeper:

    # vdt-control --off globus-gatekeeper
    # vdt-control --off gsiftp
    # vdt-control --on globus-gatekeeper
    # vdt-control --on gsiftp

Back to crab and another error:

    crab: ERROR ***: failed Data Discovery in DBS : (u'DBS Server Raised An Error: \n API Invoked listFiles\n Unavailable data, No such processed dataset /TTbar/hkalakhe-TTbar_977dd24716a5240e32bdf2a0eb738b22/USER\n\n\n',)

So... the vdt thing didn't work. Bockjoo emailed me back and said that our host certificate expired. I used:

    # cert-request -ou s -dir . -label host.opensciencegrid.org

Output:

    checking CertLib version, V2-7, This is the latest version, released 18 May 2009.
    Processing OU=Services request.
    Give reason (1 line) you qualify for certificate, such as
    member of CMS experiment or collaborating with Condor team, etc.
    reason: Certificate expired
    input server administrator's name: Johanna-Laina Fischer
    input full hostname: uscms1.fltech-grid3.fit.edu.opensciencegrid.org
    Generating a 2048 bit RSA private key
    .........................................................................................................................................................+++
    ......................................................................+++
    writing new private key to './uscms1.fltech-grid3.fit.edu.opensciencgrid.orgkey.pem'
    -----
    input your email address: jfischer2009@my.fit.edu
    input your complete phone number: 3132045810
    Choose a registration authority to which you are affiliated.
    If nothing else applies, pick OSG.
    Enter this   for this registration authority
    ANL          Argonne National Lab
    ESG          Earth System Grid
    ESnet        DOE Science network
    FNAL         Fermilab host and service certificates
    FusionGRID   National Fusion Collaboratory Project
    LBNL         Berkeley Lab
    LCG          LHC Computing Grid Catchall
    NERSC        computer center
    ORNL         Oak Ridge National Lab
    OSG          Open Science Grid (choose this if nothing else applies)
    PNNL         Pacific Northwest National Lab
    (choose from left column): OSG
    osg OSG
    Choose a virtual organization under your OSG affiliation:
    ATLAS: United States ATLAS Collaboration
    BNL: Brookhaven lab researchers
    CDF: Collider Detector at Fermilab
    CIGI: CyberInfrastructure and Geospatial Information Laboratory
    CMS: Compact Muon Solenoid
    CompBioGrid: CompBioGrid
    DES: Dark Energy Survey
    DOSAR: Distributed Organization for Scientific and Academic Research
    DZero: D0 Experiment at Fermilab
    Engage: Engagement
    Fermilab: Fermi National Accelerator Center
    FermilabAccelerator: Fermilab/Accelerator
    FermilabAstro: Fermilab/Astro
    FermilabCdms: Fermilab/Cdms
    FermilabGrid: fermilab VO grid group
    FermilabHypercp: Fermilab/Hypercp
    FermilabKTeV: Fermilab/KTeV
    FermilabMinerva: Fermilab/Minerva
    FermilabMiniboone: Fermilab/Miniboone
    FermilabMinos: Fermilab/Minos
    FermilabMipp: Fermilab/Mipp
    FermilabNova: Fermilab/Nova
    FermilabNumi: Fermilab/Numi
    FermilabPatriot: Fermilab/Patriot
    FermilabTest: Fermilab/Test
    FermilabTheory: Fermilab/Theory
    Geant4: Geant4 Software Toolkit
    GLOW: Grid Laboratory of Wisconsin
    GPN: Great Plains Network
    GRASE: Group Researching Advances in Software Engineering at University of New York at Buffalo
    GROW: Grid Research and Education Group at Iowa
    GUGrid: Georgetown University Grid
    I2u2: Interactions in Understanding the Universe Initiative
    IceCube: IceCube Neutrino Telescope
    ILC: International Linear Collider
    JLab: Jefferson Lab researchers
    LIGO: Laser Interferometer Gravitational-Wave Observatory
    Mariachi: Mixed Apparatus for Radar Investigation of Cosmic-rays of High Ionization Experiment
    MIS: OSG Monitoring Information System
    NanoHUB: nanoHUB Network for Computational Nanotechnology (NCN)
    NWICG: Northwest Indiana Computational Grid
    NYSGRID: NYSGRID
    Ops: WLCG Operations Group
    OSG: Open Science Grid
    OSGEDU: OSG Education Activity
    SBGrid: Structural Biology Grid
    SDSS: Sloan Digital Sky Survey
    SLAC: SLAC National Accelerator Laboratory researchers
    STAR: Solenoidal Tracker at RHIC
    (Choose from left column; pick osg if nothing else applies): CMS
    OSG:CMS
    You must agree to abide by the DOEGrids policies,
    at http://www.doegrids.org/Docs/CP-CPS.pdf
    and you assert that you are authorized to request and install this certificate on the specified host.
    Do you agree (y,N): y
    Your Certificate Request has been successfully submitted
    Your Certificate Request id: 83838
    You will receive a notification email from the CA when your certificate
    has been issued. Please disregard the instructions to download your
    certificate though a web browser and use the cert-retrieve script instead.

March 21, 2012

/dev/sda1 full!!!
Go to a directory and type "df ." to see what hard drive it's mounted on.
Moved the linux tarball to /mnt/nas1/backups-files from /usr/src/kernels.
Basically, full drive = bad everything.

March 23, 2012

Kim is awesome! She moved stuff and deleted other stuff and now we are no longer full!
I restarted vdt, squid, nfs, apache, condor, httpd, ganglia receptor, gatekeeper, and the rsv probes.
We have jobs again!

March 30, 2012

/dev/sda2 is at 93%, which is scary. Basically, everything on sda2 is what is mounted under /var, from what I found. I asked Kim to look into some of it, but I am going to see if there are old tarballs or something of the like that I can delete. Rahhh problems.

I have a feeling it's a log file thing again. :(
Nope. I think it's mail. Root gets mail. And I can read it. The cluster sends "emails" to its users. WOW. Uh, root has a crap ton of mail. Crap. Not sure how to deal with that.

It's in: /var/mail/
To read: mail -f /var/mail/[username]
Use "+" and "-" to maneuver.
You can "email" stuff/people back. Crazy!!

April 5, 2012

Switch decided to die so we had to call IT to reset it. Broken cluster. Tried to bring it back up, can't ssh to it. I think that the home directories aren't properly mounted. Or the internet connection is super crappy in high bay, which it is. So, idk.

But, I can cd to my home directory so maybe that means that it is actually mounted? idk. So I think they are mounted. I just sshed to my home from the CE. But maybe it can do that since I'm at the cluster already? Not sure. But I need to try it out OUTSIDE of the high bay. Rahhhhh slow internet here.

Restart ganglia receptors, condor, apache, tomcat, gatekeeper, bestman, httpd, phedex, globus, rsvprobes.

April 11, 2012

Cannot ping the SE. Can ssh to it from the CE. Some sort of network connection problem. The ethernet link is going up and down. Also sshing to the SE is slow. IT is helping us right now. Looks like the network closet switch is broken. IT moved around some connections; we are temporarily on 100 Mbit (12 MBytes/sec). Within a week or two the new switch will be in, 10 Gb.

April 26, 2012

Switch in, waiting until next semester to set it up. Goody.

Certs to expire TOMORROW. Taking care of it TODAY and looking like a total PROCRASTINATOR. PS I'm NOT. PPS, when renewing certs, you need to get gridadmin privileges, which you need to ask for. Also digitally sign your email. This is important, otherwise they complain.

Notes:

http cert in /etc/grid-security/http/
    certificates owned by daemon
    -r--r--r-- for cert
    -r-------- for key
    ...and DONE!

rsv cert in /etc/grid-security/
    certificates owned by rsvuser
    -r--r--r-- for cert
    -r-------- for key
    ...and DONE!

host cert in /etc/grid-security
    certificates owned by root
    -r--r--r-- for cert
    -rw------- for key
    ...and DONE!

THEN make a copy of the hostcert and key and rename them containercert.pem and containerkey.pem
    these are owned by daemon
    -r--r--r-- for cert
    -r-------- for key
    ...and DONE!

Also moved copies to a folder in my directory. And also moved uscms1.fltech-grid3.fit.edu.opensciencegrid.org.pem and uscms1.fltech-grid3.fit.edu.opensciencegrid.org.req to said folder.
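
For next time the certs get renewed: the owner/permission notes from April 26 could be checked with a small script instead of by eye. A rough sketch, assuming GNU stat (stat -c) as found on Linux. The function names are made up, hostcert.pem and the directory for the container copies are assumptions, and only hostkey.pem, containercert.pem and containerkey.pem are names that appear in these notes.

```shell
# Print the octal mode and owner name of a file (GNU stat).
mode_and_owner() {
    stat -c '%a %U' "$1"
}

# check_cred FILE MODE OWNER
# Report whether FILE has the expected octal mode and owner.
check_cred() {
    actual=$(mode_and_owner "$1") || return 2
    if [ "$actual" = "$2 $3" ]; then
        echo "OK   $1 ($actual)"
    else
        echo "BAD  $1 (have $actual, want $2 $3)"
        return 1
    fi
}

# Expected values from the notes:
#   444 = -r--r--r--   400 = -r--------   600 = -rw-------
# hostcert.pem and the container file locations are assumed, not from the log.
# check_cred /etc/grid-security/hostcert.pem      444 root
# check_cred /etc/grid-security/hostkey.pem       600 root
# check_cred /etc/grid-security/containercert.pem 444 daemon
# check_cred /etc/grid-security/containerkey.pem  400 daemon
```

The calls are left commented out because the exact file names differ per service; uncomment and adjust them for the http and rsv certs as well.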