May 25: One of the nodes was down (compute-2-0). Boot failed with "fsck.ext3: Unable to resolve LABEL=/". First did fdisk -l, then e2label /dev/sda1, which showed the label was /1. Changed /1 to / with e2label /dev/sda1 /, then checked /etc/fstab to confirm the label expected for /dev/sda1, and changed it in the grub menu as well. Fixed !!!!

May 27: Backup was down. Followed the instructions posted on the cluster: unmounted nas1, general, and backup, powered down nas1, rebooted it, and then mounted the three again. Backup is back :D

May 31: Error mapping grid identity: "Cannot map user for host '' - it is not defined in any host to group mapping". Sent an email to Jeffrey Duton (the guy had the same issue, but was nice enough not to post the solution :P).

July 1: Drive 0.6 in nas1 died. Replaced it with the last spare from warranty. Used Ctrl-H to access the RAID BIOS and then followed Jordan's documentation on reconfiguring the RAID array.

July 11: FPL scheduled power outage. Everything went down and came back up smoothly, except 2-0 had the same error it has been having. Looked into it more and found the other nodes are set to /1, not / like 2-0 is. Changed this, so we should not have the issue with 2-0 any more!! :)

July 16: Drive 0.5 died. Replaced it with a new drive and rebuilt the RAID. No more beeps!

July 19: Came in to look into getting PhEDEx up and running. Copied files from the CE in the FIGURE THIS OUT directory to the SE. This allowed
  eval `PHEDEX/Utilities/Master -config ~/SITECONF/T3_US_FIT/PhEDEx/Config environ`
to work. Then we tried to start it:
  /home/phedex/PHEDEX/Utilities/Master -config ~/SITECONF/T3_US_FIT/PhEDEx/Config start all
Unfortunately this didn't work :/ We believe it has to do with Jordan still being listed as the Site Admin and PhEDEx Contact in SiteDB. Emailed Rob to see if he can change it.
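The May 25 / July 11 label fixes above boil down to reading and rewriting the ext3 filesystem label so the disk, /etc/fstab, and the grub kernel line all agree. A minimal sketch, assuming the root partition really is /dev/sda1 (check the fdisk -l output first; these commands need root):

```shell
# Needs root; shown for illustration against the partition fdisk -l reports.
fdisk -l                # identify the root partition (e.g. /dev/sda1)
e2label /dev/sda1       # print the current label (showed /1 on compute-2-0)
e2label /dev/sda1 /     # rewrite the label to /
grep LABEL /etc/fstab   # confirm fstab mounts the partition by the same label
```

All three places (the on-disk label, the LABEL= entry in /etc/fstab, and root=LABEL=... on the grub kernel line) have to match, which is why the grub menu needed editing too.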
July 24: CVMFS upgrade from 2.1.17 on the CE and 2.1.15 on the worker nodes to 2.1.19. Ran the following commands:
  $ yum install osg-oasis
  $ emacs -nw /etc/fuse.conf    (check for user_allow_other)
  $ emacs -nw /etc/auto.master  (check for /cvmfs /etc/auto.cvmfs)
  $ service autofs restart
  $ cvmfs_config reload
Also changed the quota limit in /etc/cvmfs/default.local to 5000 on the worker nodes; kept it at 10000 on the CE. Did all of the above on all worker nodes and the CE. compute-1-3 was full; cleared out some CVMFS cache files and it worked out.

Also updated GUMS, following directions from Doug. Did yum update osg-gums-config. Made a backup of /etc/gums/gums.config, then changed the new gums.config by replacing @USER@, @SERVER@, @PASSWORD@, and @DOMAINNAME@ with their respective values. After that did service tomcat5 restart and service mysqld restart. Ran yum update vo-client on all the nodes, then checked /etc/vomses.

July 27: Working on PhEDEx and trying to find things in Condor. Made a backup of all files in /home/phedex/SITECONF/T3_US_FIT, called backup_files. When updating PhEDEx, make sure to change the values in /home/phedex/SITECONF/T3_US_FIT/PhEDEx/Config (PHEDEX_VERSION and others in ENVIRON common) - changing those variables caused small headaches.

August 10: Two drives dead in nas1, 14 and 16. Waiting on another drive so we can replace both at the same time.
GUMS - Doug assisted us and changed a few things in the cluster and it seems to be working fine now. Also had to change the number of grid accounts mapped to 5000. Doug's changes: GUMS was mapping accounts only to rsv; this was because the user group rsv was set to voms when it should have been set to manual.
PhEDEx - Fixed variables in ~/SITECONF/T3_US_FIT/PhEDEx/Config - changed PHEDEX_GLITE_ENV to /mnt/nas0/OSG/APP/glite/etc/profile.d/grid_env.sh. Attempted to use PhEDEx to grab a dataset; progress from last time - actually got to the approval screen, pending approval atm.
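The two "check for ..." steps in the July 24 upgrade can be done with grep instead of opening emacs on each node. A sketch run against demo copies under /tmp so it is self-contained; on a real node you would point fuse_conf and auto_master at /etc/fuse.conf and /etc/auto.master:

```shell
# Demo stand-ins for the real config files (assumption: paths under /tmp
# are writable; on a node, use /etc/fuse.conf and /etc/auto.master).
fuse_conf=/tmp/demo_fuse.conf
auto_master=/tmp/demo_auto.master
printf 'user_allow_other\n' > "$fuse_conf"
printf '/cvmfs /etc/auto.cvmfs\n' > "$auto_master"

# The two checks the upgrade depends on:
grep -q '^user_allow_other' "$fuse_conf"         && echo "fuse.conf OK"
grep -q '^/cvmfs /etc/auto.cvmfs' "$auto_master" && echo "auto.master OK"
```

If either grep fails on a real node, add the missing line before restarting autofs.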
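The placeholder substitution in the new gums.config can be scripted rather than hand-edited. A sketch against a stand-in file (the real file is /etc/gums/gums.config; the user, server, password, and domain values here are made up for illustration):

```shell
# Stand-in gums.config fragment carrying the upstream placeholders.
cfg=/tmp/demo_gums.config
cat > "$cfg" <<'EOF'
persistenceFactory user='@USER@' server='@SERVER@' password='@PASSWORD@' domain='@DOMAINNAME@'
EOF

cp "$cfg" "$cfg.bak"   # keep a backup first, as on July 24

# Hypothetical values; substitute your site's real ones.
sed -i -e "s/@USER@/gums/" \
       -e "s/@SERVER@/gums.example.edu/" \
       -e "s/@PASSWORD@/secret/" \
       -e "s/@DOMAINNAME@/example.edu/" "$cfg"

cat "$cfg"
# → persistenceFactory user='gums' server='gums.example.edu' password='secret' domain='example.edu'
```

After the real file is edited, service tomcat5 restart and service mysqld restart pick up the change.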
Also, the CE disk was up to 92% full, so we ran cvmfs_config wipecache; this brought it down to 84%.
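The 92% figure above is just a disk-usage read; a sketch of checking it before deciding to wipe (the mount point / is an assumption - use whichever filesystem holds the CVMFS cache; wipecache itself needs root, so it is left commented):

```shell
# Read the usage percentage of the filesystem holding / .
usage=$(df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
echo "usage=${usage}%"

# If it creeps toward full, clear the CVMFS cache (root only):
#   cvmfs_config wipecache
```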