sysadmin-tips-and-tricks: January 2013

Wednesday 30 January 2013

Solaris - recover a root password in a local zone

Someone managed to reset the root password of a local zone incorrectly, which left us with a machine we couldn't log in to as root...

To fix it, log in to the global zone as root.

Edit the shadow file of the offending local zone. In this example the local zone is called LZ01 and its zonepath is /zones/LZ01:

# vi /zones/LZ01/root/etc/shadow

Change the root entry so that the password field (the second colon-separated field) is empty, like so:

root::15435::::::

Then save the file (Esc, then :wq!). The ! is needed because the shadow file is read-only.
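If you'd rather not hand-edit the shadow file in vi, the same change can be scripted from the global zone. Below is a minimal sketch. The function name blank_root_pw is hypothetical. Unlike the example entry above, it only empties the password field and leaves the other fields as they were. It keeps a copy of the original as shadow.orig.

```shell
# Hypothetical helper: blank_root_pw SHADOW_FILE
# Empties the password field (field 2) of the root entry and leaves every
# other line and field untouched. The original is kept as SHADOW_FILE.orig.
# Assumption: set AWK=nawk on Solaris if the default awk misbehaves.
blank_root_pw() {
    shadow=$1
    [ -f "$shadow" ] || { echo "no such file: $shadow" >&2; return 1; }
    # Keep a copy (with its permissions) in case we need to back out.
    cp -p "$shadow" "$shadow.orig" || return 1
    tmp="$shadow.tmp.$$"
    # Assigning to $2 makes awk rebuild the line using OFS (":").
    ${AWK:-awk} -F: 'BEGIN { OFS = ":" } $1 == "root" { $2 = "" } { print }' \
        "$shadow.orig" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the existing file so owner and mode stay as they were.
    cat "$tmp" > "$shadow"
    rm -f "$tmp"
}
```

For example, run blank_root_pw /zones/LZ01/root/etc/shadow as root in the global zone, then carry on with the zlogin -C step.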

Log in to the console of the local zone from the global zone:

# zlogin -C LZ01

Log in as root (which now just logs you in without prompting for a password).

Reset the root password.

# passwd root

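Follow the prompts to set the new password. Then disconnect from the zone console by typing ~. (a tilde followed by a full stop).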

Tuesday 22 January 2013

vSphere - Monitoring vSMP VMs (vMA & resxtop)

A server was built in the VI, and the next morning I received the healthcheck email stating there was a 4 vCPU VM. This is unusual, as we normally have a maximum of 2 vCPUs per VM! The supplier was quite insistent that the server needed 4 CPUs.

VMware's warning with regard to vSMP VMs is as follows (quoted from the VMworld 2011 VSP3866 session):

"vSMP VMs may not always use those vCPUs. Test your applications and verify that the threads for that application are being split among the processors equitably. An idle vCPU incurs a scheduling penalty.

Pay attention to the %CSTP counter on vSMP VMs. The more you see this, the more your processing is unbalanced. (The ESX 4.x relaxed co-scheduler is needing to catch all vCPUs up to a vCPU that is much further advanced.)".

In short, if a VM has too many vCPUs you could get the opposite effect to what you want: worse performance rather than better.

How to check:

1. Log in to the vMA using PuTTY (or similar).
2. Set a target, in this case the vCenter Server:

# vifptarget -s <vc-server>

3. Run resxtop.

# resxtop --server <esxi-host>

4. Log in using the ESXi account when prompted. The host needs to be out of Lockdown Mode, because our hosts are connected using AdAuth.

5. Look for any increase from 0.00 in %CSTP.

6. Exit resxtop with q (or CTRL-C).
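You may want more than a quick look at the live screen. In that case, resxtop can also run in batch mode and write its counters to a CSV file. The options are -b for batch mode, -d for the delay between samples in seconds and -n for the number of samples.

Below is a rough sketch that scans such a file and prints the peak value of each co-stop column. The function name cstp_report is hypothetical. Check two assumptions against your own output:

- %CSTP is labelled "% CoStop" in the batch header row. This is passed in as a parameter so you can change it.
- The default threshold is 3. This is a commonly quoted rule of thumb, not a hard VMware limit.

The script needs nawk on Solaris. On the vMA the default awk should be fine.

```shell
# Hypothetical helper: cstp_report CSV [THRESHOLD] [PATTERN]
# Scans resxtop/esxtop batch-mode CSV output, e.g. captured with:
#   resxtop --server <esxi-host> -b -d 5 -n 60 > stats.csv
# and prints the peak value of each column whose header contains PATTERN.
# Assumptions: %CSTP appears as "% CoStop" in the header, and 3 is a
# rule-of-thumb threshold.
cstp_report() {
    csv=$1
    threshold=${2:-3}
    pattern=${3:-"% CoStop"}
    awk -F'","' -v thr="$threshold" -v pat="$pattern" '
        # Every field is double-quoted, so splitting on "," leaves a stray
        # quote at the start of the first field and the end of the last.
        { sub(/^"/, "", $1); sub(/"$/, "", $NF) }
        # Header row: remember which columns match the pattern.
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if (index($i, pat) > 0) { name[i] = $i; peak[i] = 0 }
            next
        }
        # Data rows: track the highest value seen in each matching column.
        {
            for (i in name)
                if ($i + 0 > peak[i]) peak[i] = $i + 0
        }
        # One tab-separated line per matching column: status, peak, column name.
        END {
            for (i in name)
                printf "%s\t%.2f\t%s\n", (peak[i] > thr + 0 ? "HIGH" : "ok"), peak[i], name[i]
        }' "$csv"
}
```

For example, cstp_report stats.csv uses the defaults, and cstp_report stats.csv 5 raises the threshold. Output order is not guaranteed, so pipe it through sort if you want it tidy.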

You can then decide what to do: add or remove vCPUs.

Description of %CSTP (extract from DOC-11812)
The percentage of time the world spent in ready, co-deschedule state. This co-deschedule state is only meaningful for SMP VMs. Roughly speaking, ESX CPU scheduler deliberately puts a VCPU in this state, if this VCPU advances much farther than other VCPUs. VCPU with high %CSTP is "stopped" from executing so that another VCPU in the same virtual machine could be run to "catch-up".

References:
VSP3866, VMworld 2011 sessions (USA and EMEA)

VMware Communities: Interpreting esxtop 4.1 Statistics
http://communities.vmware.com/docs/DOC-11812

VMware Communities: Interpreting esxtop Statistics
http://communities.vmware.com/docs/DOC-9279

Friday 18 January 2013

Solaris - Third party management tools

So a New Year is upon us and I thought I'd better document the tools I use for managing the Solaris environment. If anyone reads my ramblings, I would be interested to hear what you use.

Free Tools:

SSH connectivity - PuTTY
SFTP connections to move files around - Core FTP
GUI connectivity - Xming

I've used PuTTY on and off for a number of years and it does the job admirably. I like that it can be set up to log input and output, which has helped me out a couple of times.

For file transfer I've started using Core FTP. I read somewhere that the FileZilla client stored the SFTP password in clear text, and I hadn't read anything bad about Core FTP.

We did have some older Exceed licenses for GUI connectivity, but when I was given a Windows 7 build they weren't valid for a supported install. So I went looking for a freebie and stumbled across Xming. It connects fine, but it doesn't like being logged out, so I just click the X in the top right-hand corner to kill the window. Nothing is flagged in the logs as a problem afterwards, but I'd rather not exit it that way, and I will investigate a little more when I have time.

Monday 14 January 2013

HP - Service Pack for ProLiant

After the HP NC522SFP saga, one of the nuggets of information I gratefully received was the HP Service Pack for ProLiant (SPP). Which was great! Apart from working out where it can be found on the HP website!

I decided to hunt the page down so I can find it quickly when I need it, and also check regularly whether there are new releases.

Home page: http://h18004.www1.hp.com/products/servers/management/spp/index.html

Download page: http://h18004.www1.hp.com/products/servers/service_packs/en/index.html

Extract from the site on benefits of using SPP:

Leverages the Power of HP Smart Update Manager (HP SUM).

Broad portfolio support for HP ProLiant servers, BladeSystem enclosures, and HP CloudSystem

Radically simplified updates with a single step installation process containing both firmware and systems software with drivers packaged together

Consolidation -- single solution for all supported ProLiant servers

Interdependency testing of drivers and firmware

Offered from the web to provide convenient access

PXE bootable ISO images to reduce customer:
- Qualification cycles
- Resource usage
- Maintenance windows
- Downtime

Integral Part of the HP Server Experience

Essential management tool designed to simplify IT management in a dynamic and demanding 7x24x365 environment

Increases Ease of Server Management
- More efficient enclosure updates because of enhancements to HP SUM
- Consolidated sets of tested firmware and HP System Software (drivers, agents, utilities)
- Enhances IT staff productivity while reducing downtime
- Provides Reliable Configuration by:
  - Simplified delivery of ProLiant software and firmware.

Thursday 3 January 2013

Solaris - Patching Zones with NFS mount points

Further to previous posts (and tests I've done), I came across issues after patching a couple of servers running sparse zones that have NFS mount points. The issues are:

1. Mount points disappear and have to be recreated upon reboot. Symptoms: the mount point directories have disappeared, but the entries in vfstab are still there.

2. The worse of the two: upon reboot (after BE activation), the zones failed to start. When started manually, the following 3 errors appeared:
libpool(3LIB) error: System error
dedicated-cpu setting cannot be instantiated
call to zoneadmd failed
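While waiting on a fix, you can spot (and recreate) the missing mount point directories from the first issue by comparing the zone's vfstab with what actually exists on disk. Below is a minimal sketch. The function name fix_nfs_mountpoints and the DRYRUN switch are hypothetical. It only deals with the missing directories, not the zone boot failure in the second issue. It is written for ksh or bash, not Solaris 10's Bourne /bin/sh. Run it inside the affected zone (e.g. after zlogin LZ01) against /etc/vfstab, then mount the NFS shares as you normally would.

```shell
# Hypothetical helper: fix_nfs_mountpoints VFSTAB [ROOT]
# Written for ksh or bash: Solaris 10's /bin/sh is the old Bourne shell,
# which lacks read -r and ${var%/}.
# Creates any missing mount point directories for nfs entries in VFSTAB,
# relative to ROOT (default /). Set DRYRUN=1 to only report them.
fix_nfs_mountpoints() {
    vfstab=$1
    root=${2:-/}
    # vfstab columns: device-to-mount, device-to-fsck, mount-point,
    # FS-type, fsck-pass, mount-at-boot, mount-options.
    awk '$1 !~ /^#/ && $4 == "nfs" { print $3 }' "$vfstab" |
    while read -r mp; do
        dir="${root%/}$mp"
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            [ "${DRYRUN:-0}" = 1 ] || mkdir -p "$dir"
        fi
    done
}
```

For example, DRYRUN=1 fix_nfs_mountpoints /etc/vfstab shows what would be created without touching anything.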

After I logged a call on MOS, the support engineer linked the problem to a bug:
ID# 15774198 lucreate(1M) and lumake(1M) cannot copy NFS mountpoint non-global zone.

As of today (3rd January 2013) I have had no further updates...