Tuesday 27 March 2012

Solaris - Configuring link-based IPMP

When the new Solaris servers turned up at my place of work we didn't have anyone trained on Solaris to install the OS and configure the machines, so as part of the deal we paid for a number of contractors to come in and complete this work. One of the tasks I wanted them to do was to configure resilient network connections (2 x 1GbE LAN and 2 x 10GbE Storage LAN) on most servers.

The lead Engineer then started talking about IPMP.
According to the spiel from the manufacturer it does the following:
  1. Eliminates a single network adapter as a single point of failure in the following cases
    1. Network adapter failure
    2. Network link failure
  2. It enables interfaces to fail over within approximately 10 seconds (default configuration) - value can be adjusted in the /etc/default/mpathd file.
  3. It can be configured for use with both IPv4 and IPv6.
  4. It enables interfaces to be configured as standby interfaces.
There are two different ways of configuring IPMP:
  • probe-based - utilises test addresses to monitor the health of the interfaces.
  • link-based - the interface kernel driver utilises "Link Up"/"Link Down" status of the interface to monitor interface state of health.
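
The failover time in point 2 above is controlled by the FAILURE_DETECTION_TIME parameter in /etc/default/mpathd, which is given in milliseconds (so 10000 for the default 10 seconds). As a small sketch for checking what a server is currently set to, the function below prints one parameter from that file. The function name mpathd_setting is my own invention, and the optional second argument (an alternative file path) is only there so it can be tried against a copy of the file.

```shell
# Print the value of one parameter from an mpathd-style defaults file.
# $1 = parameter name
# $2 = file to read (optional; assumed default is /etc/default/mpathd)
mpathd_setting() {
    sed -n "s/^$1=//p" "${2:-/etc/default/mpathd}"
}

# Example on a Solaris host:
#   mpathd_setting FAILURE_DETECTION_TIME
```

Comment lines in the file start with # and so are skipped, as they never match the "NAME=" pattern at the start of a line.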

Prep Work

1. View the /etc/hosts file - edit if necessary
2. View the /etc/netmasks file - edit if necessary

NOTE: I've put this step in as you may have two sets of IPMP to set up (which I have), which will require adding an additional IP address and subnet mask.

By following these steps you can set up link-based IPMP with 2 interfaces.

1. Discover network interfaces

Run dladm show-link



(A type of non-vlan or vlan indicates that the hardware is GLDv3 compliant).

For this example I'll be using bge0 and bge2 as the IPMP pairing.

2. Hostname files

Create, or amend, the appropriately named hostname files, add the text listed below (ipmp0 is the IPMP unique group name).
  • Check that files exist
   ls -al /etc | grep host
  • If they don't, create them
   touch /etc/hostname.bge2
  • vi /etc/hostname.bge0
   <hostname> netmask + broadcast + group ipmp0 up
  • vi /etc/hostname.bge2
   group ipmp0 up
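
If you have several servers to do, the two files from step 2 can be written by a small function rather than by hand. This is only a sketch: the function name write_ipmp_hostname_files and its argument order are my own, and "myserver" in the example is a placeholder hostname. It writes into whatever directory you give it, so you can point it at a scratch directory, check the results, and then copy the files into /etc.

```shell
# Write the two hostname.<interface> files for a link-based IPMP pair.
# $1 = target directory (try a scratch directory before using /etc)
# $2 = hostname as listed in /etc/hosts
# $3 = first interface, $4 = second interface
# $5 = IPMP group name
write_ipmp_hostname_files() {
    dir=$1
    host=$2
    nic1=$3
    nic2=$4
    group=$5
    # The first interface carries the data address and joins the group
    echo "$host netmask + broadcast + group $group up" > "$dir/hostname.$nic1"
    # The second interface only joins the group
    echo "group $group up" > "$dir/hostname.$nic2"
}

# Example (hypothetical hostname, files land in /tmp for review):
#   write_ipmp_hostname_files /tmp myserver bge0 bge2 ipmp0
```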

3. Bringing the cards up
  • ifconfig bge0 plumb
  • ifconfig bge2 plumb
  • ifconfig bge0 `cat /etc/hostname.bge0` up
  • ifconfig bge2 `cat /etc/hostname.bge2` up
4. Test the configuration

Bring the primary NIC down (bge0) to check the IPMP configuration.

From a console session run the following and leave it running:
tail -f /var/adm/messages

From a second console run the command below which will disable bge0:
if_mpadm -d bge0

In the console session running the message log output you should see an entry stating that a successful failover has occurred.

Run the following from the 2nd console to enable bge0:
if_mpadm -r bge0

Looking back at the message log console, another entry should appear stating that the IPMP group has failed back to bge0 - press CTRL+C to exit the tail.
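
If the messages file is busy, the IPMP entries can be hard to spot in the tail output. The IPMP daemon logs under the name in.mpathd, so as a rough sketch you can filter on that afterwards. The function name mpathd_log and its two optional arguments are my own; the default path is the same /var/adm/messages file used above.

```shell
# Show the most recent in.mpathd entries from a messages file.
# $1 = log file (optional; assumed default is /var/adm/messages)
# $2 = number of lines to show (optional; defaults to 10)
mpathd_log() {
    grep 'in.mpathd' "${1:-/var/adm/messages}" | tail -"${2:-10}"
}

# Example on a Solaris host, after running the if_mpadm tests:
#   mpathd_log
```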

5. Complete

Configuration is now complete

Monday 26 March 2012

Solaris 10 Admin PDFs site

A useful docs site, which Oracle have moved over from Sun, covering a lot of Solaris 10 Admin PDFs:

http://docs.oracle.com/cd/E19253-01/

Thursday 15 March 2012

XSCF

Useful commands for setting up the XSCF

Create new user:
adduser <user name>

Set password:
password <username>

Set the privileges for the account you've just created (in this case an Admin account):
setprivileges <user account> useradm mode platadm fieldeng auditadm

Show user accounts:
showuser -l

Set NTP and check time:
Show current timezone:
showtimezone -c tz

Configure timezone:
settimezone -c settz -s Europe/London

Show current NTP setting:
showntp -l

Configure NTP:
setntp -c add <ip address>

Check time settings:
showdate

Domain sessions:
To check the domain status:
showdomainstatus -a

To start a Domain console session on Domain 0
console -d 0

To reset the Domain OS (which will reboot the OS!!!) - por is just one of the available reset options:
reset -d 0 por

To start up Domain 0:
poweron -d 0

To shutdown Domain 0:
poweroff -d 0

I found a number of useful sites while working on the XSCF, one of which is:
http://saifulaziz.com/2010/07/21/m-series-extended-system-control-facility-xscf-command-line-guide/

ILOM

Some useful commands I've gleaned while setting up a number of new T series SPARC boxes.

Create User account:
To create a new Administrator account: 
create /SP/users/<account name> role=administrator

To set the password:
set /SP/users/<account name> password

Show users:
show /SP/users

Set NTP settings:
Check settings:
show /SP/clock
(Check that the Timezone is correct and that usentpserver=enabled)

To check to see if an NTP server has been assigned:
show /SP/clients/ntp/server/1

Set NTP server:
set /SP/clients/ntp/server/1 address=<ip address>

Set the clock so it uses the NTP server:
set /SP/clock usentpserver=enabled

Set timezone:
set /SP/clock timezone=Europe/London

Console - Solaris connection:
To boot the OS:
start /SYS

To make a console connection:
start /SP/console

To exit back into the ILOM from the console:
#.

If the console hangs - check to see if there are any existing console sessions
show /SP/sessions
Then disconnect the running console session:
stop /SP/console

Network settings:
Check configuration:
show /SP/network

I found a number of sites useful while going through the configuration of the ILOM, one of them was this site:
http://skullboxx.net/kb/node/482

A useful link to an Oracle cheat sheet

Wednesday 14 March 2012

Solaris - Stopping/Starting services

At some point you may want to stop and start some services.

For example, some kind soul started the Telnet service on one of our Production servers and left it running - a big security no-no!!

To list all services:
svcs

To list the Telnet service:
svcs -l telnet

fmri         svc:/network/telnet:default
name         Telnet server
enabled      true
state        online
next_state   none
state_time   28 September 2011 14:08:15 BST
restarter    svc:/network/inetd:default
contract_id



To disable the telnet service:
svcadm disable telnet

To enable the telnet service:
svcadm -v enable -r telnet
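
If you want to check the result from a script rather than by eye, the "name value" layout of the svcs -l output shown above is easy to pick apart. A minimal sketch follows; the function name svc_field is my own, and it simply reads svcs -l style text on standard input.

```shell
# Print the value of one field from `svcs -l` style output on stdin.
# $1 = field name as it appears in the first column (e.g. state, enabled)
svc_field() {
    # Match the field name followed by at least one space, so that
    # "state" does not also pick up "state_time"
    sed -n "s/^$1  *//p"
}

# Example on a Solaris host:
#   svcs -l telnet | svc_field state
```

After a svcadm disable, the state field would be expected to read disabled rather than online.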

FTP only user

We get a number of requests for users to have FTP only access to servers so data can be moved about.

This is how we go about setting the accounts up......

1. Check the path the account needs access to - specifically the Group ownership.
2. Create account:

useradd -u <userid> -g <group> -d <homedir> -s <shell> -c "<account description>" -m <username>

userid - choose a free user number (check which user IDs are already taken with cat /etc/passwd).
group - the group id that has access to the files.
homedir - path to the required FTP root (/mount_point/folder1/folder2)
shell - set to /bin/true
account description - some information about the account
username - friendly name for the user to type in
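
Rather than reading through /etc/passwd by eye for a free user number, something like the sketch below can do the looking for you. The function name next_free_uid and the idea of passing a starting number are my own. It only looks at the file you give it, so it won't know about accounts held in NIS or LDAP.

```shell
# Print the lowest unused UID in a passwd-format file.
# $1 = passwd-format file (e.g. /etc/passwd)
# $2 = lowest UID you are prepared to use
next_free_uid() {
    uid=$2
    # Field 3 of each line is the UID; keep counting up until no line uses it
    while cut -d: -f3 "$1" | grep "^${uid}\$" > /dev/null; do
        uid=$((uid + 1))
    done
    echo "$uid"
}

# Example (the starting number 1000 is just an illustration):
#   next_free_uid /etc/passwd 1000
```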

The following held true for our Solaris 9 servers:
If it is the first FTP account created on the machine then the following files will need to be added - /etc/shells & /bin/true

NOTE: On our Solaris 10 servers the true file was in the /usr/bin path.....

3. Edit the file /etc/ftpd/ftpaccess - look for the "# guestuser" line and add accordingly underneath, the format is "guestuser <tab> <userid>"

4. Run ftpconfig -d <ftp-root-path>
e.g. ftpconfig -d /mount_point/folder1/folder2
This will add extra system directories to the given path and prevent traversal.

Tuesday 13 March 2012

Solaris - scheduled prstat & vmstat jobs

We needed to schedule an hourly run of prstat & vmstat for one of our suppliers to assist in troubleshooting.

1) The output would be sent to a log file, which would be appended to.
2) The programs would run for approx 1 minute.

Two scripts were written:

prstat script:
#!/usr/bin/ksh
echo "`date` : Start ------------------------"
/usr/bin/prstat -ca

echo "`date` : End ------------------------"
NOTE: run like this, with no interval or count given, prstat keeps running until it is killed.

vmstat script:
#!/usr/bin/ksh
echo "`date` : Start ------------------------"
/usr/bin/vmstat 5 12

echo "`date` : End ------------------------"
NOTE: vmstat has been set to run 12 times at 5 second intervals (about one minute in total).

crontab entries:
# prstat scheduled task
00 09 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 10 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 11 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 12 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 13 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 14 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 15 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
00 16 * * 2-5 /export/home/<account>/<script> >>/tmp/prstat.out
# vmstat scheduled task
00 09 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 10 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 11 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 12 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 13 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 14 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 15 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
00 16 * * 2-5 /export/home/<account>/<script> >>/tmp/vmstat.out
# automatically kill prstat scheduled
01 09 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 10 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 11 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 12 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 13 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 14 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 15 * * 2-5 /usr/bin/pkill -x -u <account> prstat
01 16 * * 2-5 /usr/bin/pkill -x -u <account> prstat

NOTE: pkill had to be used to stop prstat.
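
An alternative to the separate pkill cron entries would be to have the script itself stop prstat after a minute. This is only a sketch and not what we ran: the function name run_for is my own. It starts the command in the background, sleeps, then kills that one process by its PID, so it can't catch any other prstat the same account happens to be running.

```shell
# Run a command for at most N seconds, then stop it.
# $1 = number of seconds, remaining arguments = the command to run
run_for() {
    secs=$1
    shift
    "$@" &                          # start the command in the background
    pid=$!                          # remember its process ID
    sleep "$secs"
    kill "$pid" 2>/dev/null || :    # it may already have finished
    wait "$pid" 2>/dev/null || :    # reap it so no stray process is left
    return 0
}

# Example line for the prstat script, in place of the bare prstat call:
#   run_for 60 /usr/bin/prstat -ca
```

It is also worth checking the prstat man page on your release: as far as I can tell it accepts optional interval and count operands in the same way vmstat does, which would be simpler still.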

Monday 12 March 2012

Solaris - ZFS ARC cache settings

During the build of our new servers which were built with Solaris 10 using ZFS boot disks we came across an issue with some of our DB and App servers.

My understanding of the ARC cache is that ZFS, by default, uses all available memory and will give it back to other services when they need it, just grudgingly, which manifests itself as a slight delay... This delay can cause problems for some DBs/Apps - to get around this you can change the ARC cache settings to restrict the amount of memory that it can use (Hard Limit Size).

The file to be edited is the /etc/system file.

Type the following (I seem to remember that there is no prior entry, so this will be the first ZFS ARC cache entry).

* ZFS ARC cache entry (comment)
set zfs:zfs_arc_max=<hard limit size>

Save the file and exit.

Then the system will need rebooting for the changes to take effect.

NOTE: Hard Limit Size should be set to the amount of memory remaining AFTER you've taken into consideration your Applications' requirements.

Example: The system has 16GB RAM, the Application(s) require 12GB - which leaves a balance of 4GB RAM, this will be the Hard Limit Size.

The Hard Limit Size needs to be written in byte format - using the example above that would make it (1024 x 1024 x 1024 x 4 = 4294967296).
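
To save reaching for a calculator, the sum above can be done in the shell. A minimal sketch follows; the function names gb_to_bytes and arc_max_line are my own. It assumes a shell with 64-bit arithmetic (bash, for example); an older shell limited to 32-bit numbers may overflow on values this large, so check the output against the worked example before trusting it.

```shell
# Convert a whole number of GB into bytes.
gb_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the /etc/system line for a given hard limit in GB.
arc_max_line() {
    echo "set zfs:zfs_arc_max=$(gb_to_bytes "$1")"
}

# Example: arc_max_line 4
```

For the 4GB example this should print set zfs:zfs_arc_max=4294967296, matching the figure worked out by hand.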

Checking ZFS ARC cache settings

There are a couple of "tools" out there which you can use to check the cache settings. They are freely available somewhere on the Internet - I say somewhere, because if I put a link in it's bound to get broken at some point!

Search for the following: arc_summary.pl and arcstat.pl.

I've tended to use the arc_summary.pl file more than the other as it gave me the information I needed...

System Memory:
         Physical RAM:  15579 MB
         Free Memory :  417 MB
         LotsFree:      241 MB

ZFS Tunables (/etc/system):
ARC Size:
         Current Size:             4517 MB (arcsize)
         Target Size (Adaptive):   4518 MB (c)
         Min Size (Hard Limit):    1819 MB (zfs_arc_min)
         Max Size (Hard Limit):    14555 MB (zfs_arc_max)

ARC Size Breakdown:
         Most Recently Used Cache Size:          100%   4518 MB (p)
         Most Frequently Used Cache Size:         0%    0 MB (c-p)

ARC Efficency:
         Cache Access Total:             2289867603
         Cache Hit Ratio:      97%       2225242330     [Defined State for buffer]
         Cache Miss Ratio:      2%       64625273       [Undefined State for Buffer]
         REAL Hit Ratio:       96%       2211505671     [MRU/MFU Hits Only]

         Data Demand   Efficiency:    97%
         Data Prefetch Efficiency:    60%

        CACHE HITS BY CACHE LIST:
          Anon:                       --%        Counter Rolled.
          Most Recently Used:         10%        222683342 (mru)        [ Return Customer ]
          Most Frequently Used:       89%        1988822329 (mfu)       [ Frequent Customer ]
          Most Recently Used Ghost:    0%        692367 (mru_ghost)     [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:  1%        26184577 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]
        CACHE HITS BY DATA TYPE:
          Demand Data:                86%        1925807410
          Prefetch Data:               0%        9177425
          Demand Metadata:            12%        282516967
          Prefetch Metadata:           0%        7740528
        CACHE MISSES BY DATA TYPE:
          Demand Data:                89%        57756619
          Prefetch Data:               9%        5918790
          Demand Metadata:             1%        842028
          Prefetch Metadata:           0%        107836


arcstat.pl outputs every second (or two)

     Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
15:33:13    2G   64M      2   58M    2    6M   26  949K    0     4G    4G
15:33:14   132     0      0     0    0     0    0     0    0     4G    4G
15:33:15   100     0      0     0    0     0    0     0    0     4G    4G
15:33:16    74     0      0     0    0     0    0     0    0     4G    4G
15:33:17    64     0      0     0    0     0    0     0    0     4G    4G
15:33:18    36     0      0     0    0     0    0     0    0     4G    4G
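
If neither Perl script is to hand, the headline figures can also be read straight from the zfs:0:arcstats kstat, which reports them in bytes. The sketch below converts kstat -p style output (a name, a tab, then a value) into MB. The function name kstat_bytes_to_mb is my own, and as with any shell arithmetic on numbers this size it assumes a shell that can handle 64-bit values.

```shell
# Convert `kstat -p` output lines (name<TAB>value in bytes) to whole MB.
kstat_bytes_to_mb() {
    while read name value; do
        echo "$name $(( $value / 1024 / 1024 )) MB"
    done
}

# Example on a Solaris host (size is the current ARC size):
#   kstat -p zfs:0:arcstats:size | kstat_bytes_to_mb
```

The same approach works for the c (target size) and c_max (hard limit) statistics, so you can confirm after the reboot that the zfs_arc_max value from /etc/system has been picked up.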