Accessing the Data in Core Dumps

Check out this great article by Mark Ray!

http://www.ibmsystemsmag.com/aix/administrator/performance/core_dumps/


Getting AIX RoCE to show up as ent’s in AIX and use as a regular network card

In order to fully use these cards and have them show up as ent devices, perform the following steps:

After the existing AIX RoCE file sets are updated with the new file sets, both the roce and the ent devices might appear to be configured. If both devices appear to be configured when you run the lsdev command on the adapters, complete the following steps:

1. Delete the roceX instances that are related to the PCIe2 10 GbE RoCE Adapter by entering the following command:

# rmdev -dl roce0[, roce1][, roce2,…]

2. Change the attribute of the hba stack_type setting from aix_ib (AIX RoCE) to ofed (AIX NIC + OFED RoCE) by entering the following command:

# chdev -l hba0 -a stack_type=ofed

3. Run the configuration manager tool so that the host bus adapter can configure the PCIe2 10 GbE RoCE Adapter as a NIC adapter by entering the following command:

# cfgmgr

4. Verify that the adapter is now running in NIC configuration by entering the following command:

# lsdev -Cc adapter

The following example shows the results when you run the lsdev command on the adapter when it is configured in the AIX NIC + OFED RoCE mode:

Figure 1. Example output of lsdev command on an adapter with the AIX NIC + OFED RoCE configuration

ent1 Available 00-00-01 PCIe2 10GbE RoCE Converged Network Adapter
ent2 Available 00-00-02 PCIe2 10GbE RoCE Converged Network Adapter
hba0 Available 00-00 PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)

You should no longer see roce0, even after running cfgmgr. You can now treat the card like a regular network card (ent).
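As a quick sanity check from the command line (hba0 and the adapter names below come from the example output above and may differ on your system): lsattr should now report stack_type ofed, the grep for roce should return nothing, and the last command should list the new ent adapters.

# lsattr -El hba0 -a stack_type
# lsdev -C | grep roce
# lsdev -Cc adapter | grep -i "converged network adapter"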

Step-by-step guide to IBM Power Systems firmware update

IBM® Power Systems™ firmware update, which is often referred to as Change Licensed Internal Code (LIC) procedure, is usually performed on the managed systems from the Hardware Management Console (HMC). Firmware update includes the latest fixes and new features. We can use the Change Licensed Internal Code wizard from the HMC graphical user interface (GUI) to apply updates to the Licensed Internal Code (LIC) on the selected managed system.

We can select multiple managed systems to be updated simultaneously. The wizard also allows us to view the current system information or perform advanced operations. This tutorial provides the step-by-step procedure for the IBM Power Systems firmware update from the HMC command line, and the HMC GUI and is targeted for system administrators.

These step-by-step instructions prepare a newcomer for what needs to be done, and how, in order to stay on the latest firmware level at all times. When you purchase new hardware, the best practice is to upgrade all of the firmware to the latest level.

Nitin Thorve (nithorve@in.ibm.com), Senior Associate IT Specialist – UNIX, IBM India

Priyanka Jade (priyanka.jade@in.ibm.com), Staff Software Engineer, IBM India

Thirukumaran Vasantha Thananjayan(thirukumaran@in.ibm.com), Senior Staff Software Engineer, IBM India

Overview of IBM Power Systems servers

  • Hardware Management Console can be a desktop or a rack-mounted appliance that manages the servers, and is used for partitioning and as a service tool.
  • A managed system is a single physical server. It can have I/O expansion units, towers, drawers, and storage area network (SAN) resources.
  • HMC communicates to the managed system through the service processor.
  • The service processor is an embedded controller that monitors and controls the entire system and runs an embedded (bare-metal) Linux image.
  • The IBM POWER Hypervisor™ is a layer of system firmware that supports virtualization technologies, logical partitioning (LPAR), and dynamic resource movement across multiple operating system environments.

Figure 1.

Introduction

The flexible service processor (FSP) firmware provides diagnostics, initialization, configuration, and run-time error detection and correction. You need to update the firmware on the Power Systems server periodically. Keeping the firmware up to date helps you attain the maximum reliability and functionality from your systems.

Firmware releases enable new function and might also contain fixes or enhancements.

Firmware service packs provide fixes and enhancements within a specific release.

This tutorial provides the following information:

  1. Current firmware details
  2. Different kinds of code download and update methods
  3. Steps to obtain the relevant firmware code updates or releases from the IBM FixCentral website
  4. Steps to update the firmware concurrently using DVD media, that is, the fixes that can be deployed on a running system without rebooting partitions or performing an initial program load (IPL) within a specific release
  5. Steps to update the firmware disruptively, that is, update requiring the system IPL within a specific release
  6. Advanced code update options from the Change Licensed Internal Code wizard
  7. Steps to upgrade to recent firmware releases disruptively using the File Transfer Protocol (FTP) method
  8. Steps to upgrade the firmware disruptively through the IBM Service website to a required level

The following sections cover each of these topics in detail.

Section 1. View system information

We will use the View system information option to get the current system firmware information.

We will be using this information in IBM Fix Central to obtain information on the latest firmware updates or upgrades available for the system and proceed with the firmware update or upgrade to newer release using the instructions described in the following sections.

Select the system under test, click Updates, and then click View system information to check the currently installed, activated, and accepted levels.

Figure 1.1

The following figure shows the currently installed firmware levels on the system.

Figure 1.2

Fields in figure 1.2 are described below:

EC Number

This displays the numerical identifier of the engineering change (EC) that shows the system and GA level. It has the format of PPNNSSS, where:

  • PP is the two-character package identifier.
  • NN is the two-character name that identifies a set of platforms. This is the model-unique code for the type of system.
  • SSS is the three-character service pack code stream identifier.

LIC Type

This displays the LIC types associated with the selected target.

Machine Type/Model/Serial Number

This displays the corresponding machine type, model number, and serial number.

Installed Level

This displays the LIC level that will be activated and loaded into memory at the next system restart.

Activated Level

This displays the LIC level that is activated and loaded into memory (for example, from a level 5 to level 7).

Accepted Level

This displays the LIC level that was committed. This refers to the updates selected on the system.

This is the backup level of code that you can return to, if necessary. Generally, this is the level of code on the permanent side (p-side).

Unactivated Deferred Level

This displays the latest or highest LIC level that contains unactivated deferred updates. This refers to the updates selected on the system.

A deferred update requires a system restart to activate.

Platform IPL Level

This displays the LIC level on which the hypervisor and partition firmware were last restarted. When concurrent LIC updates are performed, the activated level will change, but the platform IPL level will remain unchanged.

Update Control

This displays the current owner of LIC update control. It can be either HMC or operating system.
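The same information is also available from the HMC command line. The sketch below is illustrative only: the managed system name is a placeholder, and you should confirm the available options with the lslic man page on your HMC release.

lslic -t sys -m Server-8203-E4A-SN0123456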

Section 2. Different kinds of “Code download and update” methods

Now that we know the current firmware levels on the system (as described in Section 1), we can use one of the following update and upgrade methods to move up to the latest available level. Select the one that is appropriate for your requirement.

  • DVD method
    We can get the latest firmware code information from Fix Central and download the code as an ISO image (as described in Section 3). Then burn the image to DVD media and perform the update/upgrade using that DVD (as described in Section 4).

    Section 4 describes the concurrent firmware update procedure. We can also use the DVD method to perform code upgrades (to a new release). This can be used when the HMC cannot access Internet due to firewall.

  • FTP method
    Download the required code levels using bulk FTP from Fix Central to a remote FTP-enabled system (also described in Section 3). Then, perform the code update/upgrade procedure using the FTP method, providing the login credentials and the location of the update/upgrade code on your remote FTP-enabled repository system (as described for the firmware upgrade in Section 7).

    Section 7 describes the disruptive upgrade procedure using the FTP method. Similarly, the FTP procedure can also be used for concurrent code updates (within the same release).

  • IBM service website method: We can perform the Power Systems firmware update/upgrade through the IBM service website to a required level (as described in Section 8).

    Section 8 describes the code upgrade procedure disruptively using the IBM website. A similar procedure can be used for performing concurrent code updates as well.

    After selecting the required system from the HMC, be sure to select Change Licensed Internal Code in order to perform code updates (any updates within the same release) and select Upgrade Licensed Internal Code in order to perform code upgrades (by installing a different release).

Section 3. Power Systems firmware code location and download from Fix Central

Power Systems firmware fix packs or firmware releases can be obtained from the IBM Fix Central website.

Select the following categories for Power Systems firmware update and choose the appropriate machine type and model of your system to be updated.

As per the example shown in Figure 3.0, the machine type and model used is: 8203-E4A. Select the appropriate machine type of your choice and continue.

Figure 3.0

The example in Figure 3.1 below is for the system firmware only. You can explore other options in a similar way.

Figure 3.1

If you know the specific firmware level that you need, you can select the corresponding option directly. If not, the website can recommend the latest and best-suited firmware levels. If you need help, select the I need guidance. I am not sure what level of firmware is recommended option, as shown in Figure 3.2.

Choose the specific level or get the recommended level as shown below:

Figure 3.2

Figure 3.3

Decide whether your system needs a firmware update to the latest fix pack or an upgrade to a new release, based on the currently installed levels obtained from View system information in Section 1.

Figure 3.4

Figure 3.5

As an example, let us continue to get the firmware service pack within the current release, as shown in Figure 3.6.

Figure 3.6

Figure 3.7

Similarly, you can get the upgrade code, that is, a newer release, using the second option. Note that this will be a disruptive code installation, that is, the system power is recycled.

Note:

Download the update code, if you are planning an update within the current release.

Download the upgrade code if you are planning for an upgrade to a newer release itself.

Figure 3.7 lists the latest, recommended, and available updates to your current release. Select the appropriate option and proceed further.

Continue with downloading the ISO file if you want to burn it to a DVD and proceed with the firmware update using the DVD media, or download the code to a remote FTP-enabled system to perform the update using the FTP method. The firmware update procedure is explained in detail in the following sections.

Figure 3.8

Figure 3.9

Figure 3.10

Section 4. Power Systems firmware concurrent update procedure using the DVD method

You can update the firmware concurrently (that is, the fixes that can be deployed on a running system without rebooting partitions or performing an IPL) within a specific release. Select the Change Licensed Internal Code option for the current release.

Figure 4.0

Figure 4.1

In the Specify LIC Repository section (as shown in Figure 4.2), select the location of the LIC update repository.

Figure 4.2

Figure 4.3

Select the DVD-RAM drive option, where you have placed the DVD, and proceed with the concurrent code update, as shown in Figure 4.3.

Note: Place the DVD in the HMC’s DVD drive (and not in the system’s DVD drive).

Figure 4.4

Click OK to proceed to the subsequent steps to perform the code update. The wizard verifies whether the system is ready for a code update by performing a health check, and if everything is fine, we can proceed further.

The following screen captures show the step-by-step procedure to perform concurrent code update.

Figure 4.5

Figure 4.6

Figure 4.7

Figure 4.8

Figure 4.9

Figure 4.10

Section 5. Steps to update the firmware disruptively (that is, update requiring the system IPL within a specific release)

Firmware updates are usually concurrent; disruptive update service packs are very rare. The procedure to perform a disruptive update is quite similar to a concurrent update (explained in Section 4), but this process prompts for a system power cycle during the operation.

Section 6. Advanced code update options from Change Licensed Internal Code wizard

We use the Select advanced features option to perform advanced operations, such as Remove and activate and Reject fix.

Remove and activate option

The Remove and activate option brings the system back to the update level that is on the permanent side. You can use this option to back off an update level.

Figure 6.0

Figure 6.1

Figure 6.2

Figure 6.3

Click OK and then Close to remove and activate the permanent side update level.

Figure 6.4

Reject Fix operation:

Boot the system in the Permanent Side mode (from ASMI -> Power/Restart Control -> Power On/Off System, and make sure that the Current firmware boot side option is displayed as Permanent); only then is the Reject Fix option enabled and the operation can be performed. This operation copies the currently running level (permanent side) to the temporary side. It can be used to reject a fix that has been applied.

Figure 6.5

Figure 6.6

Figure 6.7

Click OK to start this operation.

Section 7. Upgrade to newer firmware releases disruptively using the FTP method

Installing a release or a disruptive fix pack causes system IPL. All release upgrades are disruptive.

We can obtain the upgrade code, that is, the disruptive fix pack from Fix Central and burn it to a media drive and proceed with the upgrade process, which is quite similar to the concurrent update process explained in the earlier sections (except that this operation is disruptive).

In this section, let us learn how to use the FTP method to upgrade the system using the firmware code stored in a remote repository.

The following screen captures show the steps to upgrade to newer firmware releases disruptively using the FTP method.

Figure 7.0

Figure 7.1

Figure 7.2

Figure 7.3

Figure 7.4

Figure 7.5

Figure 7.6

Figure 7.8

Figure 7.9

Clicking OK starts the disruptive upgrade. The system will be at the applied release level after the upgrade operation completes.
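For administrators who prefer the HMC command line, the equivalent FTP-based operation is driven with the updlic command. The line below is only a sketch: the system name, FTP host, user, password, and directory are placeholders, and the exact flags (for example -o, -l, and -r) vary between HMC releases, so verify them against the updlic man page on your HMC before running anything.

updlic -m Server-8203-E4A-SN0123456 -o a -t sys -l latest -r ftp -h ftpserver.example.com -u ftpuser --passwd ftppassword -d /fw/repo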

Section 8. Steps to upgrade the firmware disruptively using the IBM service website to the required level

After logging in to the HMC, click System Management > Servers > Target Server in the left pane. Alternatively, you can click the Updates icon in the same pane. All the available servers will be displayed in the right pane. In the following figure, the red highlight in the right pane shows the currently installed level.

Figure 8.0

Make sure that your target server is shut down; if it is not, switch off the server.

Now, click the Upgrade Licensed Internal Code to a new release link at the bottom of the page as shown in the following figure.

Figure 8.1

After clicking the link, you will be directed to a web page that shows information about the readiness check. If no errors are found, you can click OK and proceed further, as shown in the following figure.

Figure 8.2

After clicking OK, you will be directed to the Specify LIC Repository page. Here, you need to select the location of the code. The options shown in the following figure are available.

Figure 8.3

If you are setting up a new server configuration, the best practice at this prompt is to always select the IBM service web site option, and you need not worry about the need to power the managed systems off and on in this method.

After selecting the IBM service web site option, a new web page opens showing the available LIC level details. Here, the best practice is to select the latest available code (that is, the latest available version), because it contains the most fixes from IBM and your Power Systems server will be upgraded to the latest level. Otherwise, select the level that best suits your requirement, or the latest supported one.

Be patient here and follow the prompts to complete the upgrade. The firmware upgrade takes time depending on your Internet bandwidth. Do not forget to switch the server back on, so that the latest firmware is activated and reflected in the navigation pane, as shown in the following figure.

Figure 8.4

Now you are done with the upgrade. Remember that if you select multiple systems, you can upgrade them all in the same way.

Visualize the Physical Layout of an AIX Volume Group

From: Brian Smith’s AIX / UNIX / Linux / Open Source blog

9/23/13 Update – See this updated version of the script as well.

Here is a script I've written to visualize the physical layout of an AIX volume group. The script visually shows the location of every Physical Partition (PP) on each hdisk (also known as a Physical Volume). The output shows which Logical Volume (LV) is on each of the PPs (or whether it is free space). The output is color coded so that each LV has its own color, making it very easy to see where each LV physically sits across the entire Volume Group. You can specify the number of columns of output depending on the size of your screen.

The intended use of the script is to show a visual representation of the Volume Group, to make commands that move LPs/PPs around (such as migratelp) easier to use, to make LVM/disk maintenance easier, and also to serve as a learning tool.

Here are a few screenshots:

[Screenshots of the color-coded output are not reproduced here.]

When running the script, you specify two parameters: the volume group name and the number of columns you would like displayed (it defaults to 3 columns if not specified).

Here is the script:

#!/bin/ksh

#vvg - visualize physical layout of AIX volume group
#Copyright Brian Smith, 2013

index=0
set -A colors 41m 42m 43m 44m 45m 46m 47m 100m 101m 102m 103m 104m 105m 106m
tempfile="/tmp/`basename $0`_$$"
tempfile2="/tmp/`basename $0`_2$$"
> $tempfile
> $tempfile2

if [ -n "$1" ]; then
   vg=$1
else
   echo "Specify VG name as first parameter"
   exit 1
fi

if ! lsvg $vg >/dev/null 2>&1; then
   echo "Error: VG name not correct or VG not varied on"
   exit 2
fi

[ -n "$2" ] && col=$2 || col=3
if ! echo $col | grep "^[0-9]*$" >/dev/null || [ "$col" -eq 0 ]; then
   echo "Error: second parameter should be number of columns"
   exit 3
fi

count=0
columns=""
while [ "$count" -lt "$col" ]; do
   columns="$columns -"
   count=`expr $count + 1`
done

showdisk()
{
   . $tempfile
   . $tempfile2
   [ "$index" -gt 0 ] && index=`expr $index + 1`
   pv=$1
   # Expand the lspv -M output so every PP gets its own line, marking free PPs
   lspv -M $pv | while read line; do
      if echo $line | awk 'NF==1 {print}' | grep '-' >/dev/null; then
         beg=`echo $line | awk -F: '{print $2}' | awk -F '-' '{print $1}'`
         end=`echo $line | awk -F: '{print $2}' | awk -F '-' '{print $2}'`
         while [ "$beg" -le "$end" ]; do
            echo "${pv}:$beg Free"
            beg=`expr $beg + 1`
         done
      elif echo $line | awk -F: '{print $2}' | grep "^[0-9]*$" >/dev/null ; then
         echo "$line Free"
      else
         echo "$line"
      fi
   done | while read line2; do
      pp=`echo "$line2" | awk '{print $1}' | awk -F: '{print $2}'`
      lv=`echo "$line2" | awk '{print $2}' | awk -F: '{print $1}'`
      lp=`echo "$line2" | awk '{print $2}' | awk -F: '{print $2}'`

      # Assign each LV a color index the first time it is seen (state kept in the temp files)
      eval if ! [ -n \"\$${lv}\" ]\; then \
         ${lv}=$index\; \
         echo ${lv}=$index \>\> $tempfile \; \
         index=`expr $index + 1`\; \
         [ \"\$index\" -gt \"13\" ] \&\& index=0 \; \
         echo index=$index \> $tempfile2 \; \
      fi
      eval printf \\\\033[\${colors[\$${lv}]}
      if [ -n "$lp" ]; then
         printf "%-7s %-15s %+7s\033[0m " "PP$pp" "$lv" "LP$lp"
      else
         printf "%-7s %-23s\033[0m " "PP$pp" "$lv"
      fi
      echo
   done | paste -d " " $columns
}

for pv in `lspv | grep " $vg " | awk '{print $1}'`; do
   ppsize=`lspv $pv | grep "^PP SIZE" | awk '{print $3 " " $4}'`
   echo "\033[1;36m******************************* \033[0m"
   printf "\033[1;36m* %-8s                    * \n\033[0m" $pv
   printf "\033[1;36m* Size   : %-10s         * \n\033[0m" "`getconf DISK_SIZE /dev/$pv` MB"
   printf "\033[1;36m* PP Size: %-19s* \n\033[0m" "$ppsize"
   echo "\033[1;36m******************************* \033[0m"
   showdisk $pv
done
rm $tempfile
rm $tempfile2
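Example usage, assuming the script has been saved as vvg and made executable (the volume group names are just examples):

chmod +x vvg
./vvg rootvg        # rootvg with the default of 3 columns
./vvg datavg01 4    # datavg01 shown in 4 columns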

AIX LPAR missing hdisk after vios reboot

AIX LPAR missing hdisk after vios reboot SOLVED –

Link shared by a user of this blog who used this successfully!

Link to original article:  http://sysadmin-tricks.blogspot.com/2014/12/aix-lpar-missing-hdisk-after-vios.html
———————————————————————————————————
If you are doing routine checkups of your LPARs on IBM pSeries, you probably check the status of your LPAR OS disks or volume groups from time to time.
To check the status of your volume group hdisks, use this:

root@aix-server> [/] lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0                   missing                    546               4                 00..00..00..00..04
hdisk1                   active                       546               0                 00..00..00..00..00

As you can see, one of the hdisks is missing! And you start to panic: "OMG, an hdisk is missing! Where, how, when?!"

There is no need to panic. You will typically see that one of your disks is missing after you have restarted one of your VIOS. In our case there are two VIOS: hdisk0 comes from the first VIOS and hdisk1 from the second VIOS. These two hdisks make up the volume group called rootvg.

How to fix this missing hdisk state?
All you need to do is activate the volume group again.

root@aix-server> [/] varyonvg rootvg

This will activate your volume group rootvg. After this, you will see both of your hdisks as active!
Why is this important? Because of this:

When a volume group is activated, physical partitions are synchronized if they are not current.
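If you want to watch the synchronization yourself, the following commands (a small sketch reusing the rootvg example from above) show the stale partition counters and let you trigger a manual sync if anything stays stale:

root@aix-server> [/] lsvg rootvg | grep -i stale
root@aix-server> [/] syncvg -v rootvg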

But there is one case where you can't make your hdisk active without making additional changes. In that case, after you execute the varyonvg command, an error is displayed and you won't be able to make your hdisk active:

root@aix-server> [/] varyonvg rootvg
varyonvg: Cannot varyon volume group with an active dump device on a missing physical volume. Use sysdumpdev to temporarily replace the dump device with /dev/sysdumpnull and try again.

So, as the error says, an active dump device is on the missing physical volume hdisk0. (I will not explain here what a system dump device is.) How do we change this? First, we list the status of the sysdump devices.

root@aix-server> [/]  sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON

From here we can see that the primary device is /dev/lg_dumplv and the secondary device is /dev/sysdumpnull. The active dump device from the error message is actually the primary dump device shown by sysdumpdev -l, so we need to change that.

root@aix-server> [/] sysdumpdev -p /dev/sysdumpnull

List the sysdump devices again.

root@aix-server> [/]  sysdumpdev -l
primary              /dev/sysdumpnull
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     TRUE
always allow dump    FALSE
dump compression     ON

Now execute the volume group activation again.

root@aix-server> [/] varyonvg rootvg

root@aix-server> [/] 
root@aix-server> [/]  lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0                         active            546                    4             00..00..00..00..04
hdisk1                         active            546                    0             00..00..00..00..00

As you can see, both hdisks are now active.
Now, change your primary dump device back:

root@aix-server> [/] sysdumpdev -p /dev/lg_dumplv
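Putting it all together, the recovery for the dump-device case is just this short sequence (a sketch using the device names from this example; lg_dumplv and rootvg may differ on your system):

root@aix-server> [/] sysdumpdev -p /dev/sysdumpnull
root@aix-server> [/] varyonvg rootvg
root@aix-server> [/] lsvg -p rootvg
root@aix-server> [/] sysdumpdev -p /dev/lg_dumplv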

VIOS shared storage pool and thin provisioning

https://www.ibm.com/developerworks/aix/library/au-aix-vios-clustering/


By:  Karthikeyan Kannan (virtualkarthik@hotmail.com), Senior Consultant, Capgemini

Introduction to VIOS shared storage pool

I love Power Systems and always wondered why Power Systems did not have snapshot and thin-provisioning features. Finally, I found that these are now enabled in IBM Power Systems too, with the introduction of the shared storage pool concept.

A shared storage pool, as the name states, basically shares storage resources (SAN disks) across a group of IBM VIOS instances. It does this not by exporting the physical disks directly, but by slicing them, like logical volumes inside the shared storage pool, into what are called logical units (LUs). An LU is basically file-backed storage present in the clustered storage pool.

The VIOS needs to be at a minimum of version 2.2.0.11. I tested the functionality in VIOS 2.2.1.4. With the present release of VIOS 2.2.2.1, you can have 16 VIOS nodes in a cluster and can support up to 200 clients per VIOS node.

The shared storage pool concept takes advantage of the Cluster Aware AIX (CAA) feature in the IBM AIX® operating system to form a cluster of VIOS. Using the CAA feature, the cluster can monitor the peers in the cluster. Refer to Chris Gibson’s blog for more information about CAA.

In this article, I am using two VIOS instances hosted on two different physical systems. We will see details about the following tasks as you navigate through this article.

  • Creating a cluster and a shared storage pool
  • Verifying the status of the cluster
  • Listing the shared storage pool attributes
  • Creating a logical unit
  • Assigning a logical unit to a client
  • Modifying a cluster

Features include:

  • Thick provisioning
  • Thin provisioning
  • Snapshot feature
    • Create
    • Rollback
    • Delete

Requirements

  • IBM PowerVM® Standard Edition
  • VIOS version 2.2.0.11, Fix Pack 24, Service Pack 1 and later
  • Minimum two disks: One for the CAA repository and the other for the storage pool.

Lab setup

Figure 1 shows the lab setup that I have used to illustrate this feature throughout the article.

Figure 1. Lab environment setup

Lab environment setup

We will also log in to both the VIOS and verify the configuration.

Listing 1. On VIOS A
$ hostname
VIOSA
$ ioslevel
2.2.1.4
$ lspv
NAME             PVID                                 VG               STATUS
hdisk0           00c858a2bde1979e                     rootvg           active
hdisk1           00c858a2cbd45f6b                     None              
hdisk2           00c858a2cca2a81d                     None              
hdisk3           00c858a210d30593                     None              
hdisk4           00c858a210d32cfd                     None              
$ lsvg
rootvg
$ lssp
Pool              Size(mb)   Free(mb)  Alloc Size(mb)    BDs Type       
rootvg              102272      77824             128      0 LVPOOL     
$
$ cluster -list
$
Listing 2. On VIOS B
$ hostname
VIOSB
$ ioslevel
2.2.1.4
$ lspv
NAME             PVID                                 VG               STATUS
hdisk0           00c858a2bde1979e                     rootvg           active
hdisk1           00c9095f0f795c20                     None              
hdisk2           00c858a2cca2a81d                     None              
hdisk3           00c858a210d30593                     None              
hdisk4           00c858a210d32cfd                     None              
$ lsvg
rootvg
$ lssp
Pool              Size(mb)   Free(mb)  Alloc Size(mb)    BDs Type       
rootvg              102272      77824             128      0 LVPOOL     
$
$ cluster -list
$

I have five disks in my VIOS systems. hdisk0 and hdisk1 are used by the VIOS and by a client logical partition (LPAR) rootvg, respectively, on both VIOS. The disks that we are going to use are hdisk2, hdisk3, and hdisk4. Look at the physical volume IDs (PVIDs) for hdisk2, hdisk3, and hdisk4; they are the same on both VIOS, which confirms that the same set of physical disks is shared between both VIOS instances. The naming order does not need to be the same, as it is the PVID that matters.

You also need to ensure that the VIOS nodes in a cluster are reachable in the IP network. You should be able to resolve their hostnames either by using /etc/hosts or by DNS.
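As a minimal illustration (the IP addresses are placeholders, and ping options may differ slightly in the VIOS restricted shell), the /etc/hosts file on each VIOS could carry entries like the ones below, and a quick ping from each node confirms reachability:

10.1.1.11   viosa
10.1.1.12   viosb

$ ping -c 2 viosb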

 

Creating a shared storage pool

Now that our playground is ready, let’s start the game by creating a VIOS cluster and a shared storage pool. This should be performed using the cluster command that initializes the cluster process and creates a shared storage pool.

For our demo cluster, I am using hdisk2 for the CAA repository disk that holds all the vital data about the cluster and hdisk3 and hdisk4 for shared storage pool.

Listing 3. On VIOS A
$ cluster -create -clustername demo1 -repopvs hdisk2 -spname demosp -sppvs 
hdisk3 hdisk4 -hostname viosa 
Cluster demo1 has been created successfully.

$

As soon as the command completes successfully, we can verify the status of the cluster and its attributes using the -list and -status flags of the cluster command.

Listing 4. On VIOS A
$ cluster -list
CLUSTER_NAME:    demo1
CLUSTER_ID:      36618f14582411e2b6ea5cf3fceba66d
$

$ cluster -status -clustername demo1
Cluster Name         State
demo1                    OK

    Node Name        MTM                        Partition Num  State  Pool State
    VIOSA            9117-MMC0206858A2            39              OK       OK  

$

The above commands, run on VIOS A, tell us that a cluster named demo1 has been created with the cluster ID 36618f14582411e2b6ea5cf3fceba66d. This cluster ID is a unique identifier for each cluster that is created. The cluster -status command indicates whether the cluster is in the operating state or has any problems. It also gives useful information about the physical system, such as the model type, serial number, and the partition ID of the hosting VIOS.

We can also use CAA commands, such as lscluster, to view the status of the cluster and ensure that it is operational.

Listing 5. On VIOS A
$ lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 1
        Node name: VIOSA
        Cluster shorthand id for node: 1
        uuid for node: 365731ea-5824-11e2-b6ea-5cf3fceba66d
        State of node:  UP  NODE_LOCAL
        Smoothed rtt to node: 0
        Mean Deviation in network rtt to node: 0
        Number of clusters node is a member in: 1
        CLUSTER NAME       TYPE  SHID   UUID

        demo1              local        36618f14-5824-11e2-b6ea-5cf3fceba66d

        Number of points_of_contact for node: 0
        Point-of-contact interface & contact state
         n/a

$

So far, we have been verifying only the cluster. Where did the storage pool go? Neither the cluster command nor the plain lssp command will show you the shared storage pool that was created by the cluster command.

To view the shared storage pool, we need to use the legacy lssp command of VIOS, which lists storage pools, but with the special -clustername flag.

The command format to list the shared storage pool available within the cluster is lssp -clustername <NAME>.

Listing 6. On VIOS A
$ lspv                                       
NAME        PVID                  VG         STATUS   
hdisk0       00c858a2bde1979e            rootvg       active   
hdisk1       00c858a2cbd45f6b            None             
hdisk2       00c858a2cca2a81d            caavg_private   active   
hdisk3       00c858a210d30593            None             
hdisk4       00c858a210d32cfd            None             

$ lssp                                       
Pool        Size(mb)   Free(mb)  Alloc Size(mb)   BDs Type         
rootvg        102272    77824        128    0 LVPOOL        

$ lsvg                                       
rootvg                                       
caavg_private                                  

$ lssp -clustername demo1
POOL_NAME:       demosp
POOL_SIZE:       102144
FREE_SPACE:      100391
TOTAL_LU_SIZE:   0
TOTAL_LUS:       0
POOL_TYPE:       CLPOOL
POOL_ID:         00000000097938230000000050E9B08C

$

In the above output, you can see that the name of the shared storage pool is demosp, the total size of the shared storage pool is 100 GB, and the free space is 100391 MB. You can also see the fields pointing to the number of LUs and the total size of the LUs is 0, as we do not have any LUs created so far. Along with a unique cluster ID, the shared storage pool also gets a unique identifier.

You may also note that a new volume group (VG) named caavg_private was created along with the shared storage pool. This VG is CAA-specific, and the disks that are part of this VG store the vital data that keeps the cluster alive and running. You should not use this VG for any other purpose.

 

Logical unit

As stated in the start of the article, a logical unit (LU) is a file-backed storage device that can be presented to a VIOS client as a virtual SCSI (VSCSI) disk-backing device.

Now, we need to create a LU on top of the shared storage pool. In VIOS A, we already have a vhost0 connection created, through which the lparA gets a physical hard disk for its rootvg.

Listing 7. On VIOS A
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9117.MMC.06858A2-V39-C3                     0x0000000f

VTD                   LPARA_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U78C0.001.DBJ0379-P2-C3-T1-W500507680120D9ED-L1000000000000
Mirrored              false

$

Now everything is set up for us to create the LU for the client lparA from the shared storage pool. The LU behaves just like an LV-backed or file-backed backing device on top of a storage pool. The command we use to create it is the same VIOS command, mkbdsp, with some additional flags.

Listing 8. On VIOS A
$ mkbdsp -clustername demo1 -sp demosp 20G -bd lparA_lu1
Lu Name:lparA_lu1
Lu Udid:2f4adc720f570eddac5dce00a142de89

$

In the above output, I used the mkbdsp command to create the LU first. I have created a LU of size 20 GB on demosp (which is not mapped to any client yet). To map it, you need to again use the mkbdsp command as shown in Listing 9.

Listing 9. On VIOS A
$ mkbdsp -clustername demo1 -sp demosp -bd lparA_lu1 -vadapter vhost0 -tn lparA_datavg
Assigning file "lparA_lu1" as a backing device.
VTD:lparA_datavg

$

Note that I have not mentioned the size here because the LU already exists. This command will map the LU lparA_lu1 to vhost0 with the VTD name, lparA_datavg.

Instead of going with two steps, one for creating and one for assigning to a client, we can perform both of these operations in a single command as depicted in the following output. Before that, I will have to delete the VTD lparA_datavg backed by the LU lparA_lu1 which we just mapped. We can use the usual rmvdev for the VTD and rmbdsp for the LU.

Listing 10. On VIOS A
$ rmvdev -vtd lparA_datavg
lparA_datavg deleted
$ rmbdsp -clustername demo1 -sp demosp -bd lparA_lu1
Logical unit lparA_lu1 with udid "a053cd56ca85e1e8c2d98d00f0ab0a0b" is removed.
$

Now, I will create and map the LU in a single command as shown in the following output.

Listing 11. On VIOS A
$ mkbdsp -clustername demo1 -sp demosp 20G -bd lparA_lu1 -vadapter vhost0 -tn lparA_datavg
Lu Name:lparA_lu1
Lu Udid:c0dfb007a9afe5f432b365fa9744ab0b

Assigning file "lparA_lu1" as a backing device.
VTD:lparA_datavg

$

Now that the LU is created and mapped, the client should be able to see it as a disk. We will verify the LU and the mapping on VIOS A once more and then move over to lparA, which is a client of VIOS A.

The lssp command can be used to list the backing devices in the shared storage pool.

Listing 12. on VIOS A
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             c0dfb007a9afe5f432b365fa9744ab0b
$

The lsmap -all output on VIOS A (Listing 13, not reproduced here) shows that the lparA_lu1 LU is mapped to the lparA client on vhost0 as the VTD lparA_datavg.

On the client machine lparA, we already have one physical volume that is used by rootvg. Now, the new disk should be available for the client to use. We will attempt to configure the LU provided to the client.

Listing 14. On lparA
lparA#hostname
lparA
lparA#lspv
hdisk0          00c858a2cbd45f6b                    rootvg          active      
lparA#cfgmgr
lparA#lspv
hdisk0          00c858a2cbd45f6b                    rootvg          active      
hdisk1          none                                None                   
# lscfg -vpl hdisk1
  hdisk1           U9117.MMC.06858A2-V15-C2-T1-L8200000000000000  Virtual SCSI Disk Drive

  PLATFORM SPECIFIC

  Name:  disk
    Node:  disk
    Device Type:  block
#

We made it through! The LU is now available to the client as a virtual SCSI disk drive.

Snapshot and restore

So far, we have set up a shared storage pool with a single VIOS, created a logical unit, and assigned the LU to the lparA client.

Now let’s concentrate on some VG operations on the client side to explore the snapshot feature of shared storage pool. Using the new LU provided to lparA, I am going to create a volume group (datavgA) and a file system, named /datafsA, on top of it.

Listing 15. On lparA
lparA#mkvg -y datavgA hdisk1
0516-1254 mkvg: Changing the PVID in the ODM.
datavgA
lparA#crfs -v jfs2 -m /datafsA -g datavgA -a size=2G
File system created successfully.
2096884 kilobytes total disk space.
New File System size is 4194304
lparA#
lparA#mount /datafsA
lparA#cd /datafsA
lparA#touch file_lparA
lparA#ls
file_lparA  lost+found
lparA#

Now, we will create two files: one named before_snap, after which I will take a snapshot, and then one more named after_snap. We will also restore the snapshot for demonstration.

Listing 16. On lparA
lparA#touch before_snap
lparA#ls  
before_snap  file_lparA   lost+found
lparA#pwd
/datafsA
lparA#

On VIOS A, we will take a snapshot now.

The command to capture a snapshot is:

snapshot -clustername <Clustername> -spname <Shared_Pool_Name> -luudid <ID> -create SNAP_NAME

Listing 17. On VIOS A
$ snapshot -clustername demo1 -create lparA_lu1_SNAP1 -spname demosp -lu lparA_lu1
lparA_lu1_SNAP1
$
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66
Snapshot                                                                       
lparA_lu1_SNAP1

$

The lssp command in the above code listing indicates that there is a snapshot named lparA_lu1_SNAP1 associated with the lparA_lu1 LU.

Now, we will create one more file named after_snap in the lparA client.

Listing 18. On lparA
lparA#pwd
/datafsA
lparA#touch after_snap
lparA#ls
after_snap   before_snap  file_lparA   lost+found
lparA#cd
lparA#umount /datafsA
lparA#varyoffvg datavgA
lparA#

I have now varied off the volume group with data on it. It is always recommended to take the resources offline when you want to restore data; you should be familiar with this from your own experience.

Let’s try to restore the lparA_lu1_SNAP1 snapshot and see what data is present in the volume group on the client side.

Listing 19. On VIOS A
$ snapshot -clustername demo1 -rollback lparA_lu1_SNAP1 -spname demosp -lu lparA_lu1
$
Listing 20. On lparA
lparA#varyonvg datavgA
lparA#mount /datafsA
Replaying log for /dev/fslv00.
lparA#ls -l /datafsA
total 0
-rw-r--r--    1 root     system            0 Jan  6 23:27 before_snap
-rw-r--r--    1 root     system            0 Jan  6 23:24 file_lparA
drwxr-xr-x    2 root     system          256 Jan  6 23:23 lost+found
lparA#

After the snapshot was restored and the volume group brought back online, there is no file named after_snap. This is because the file was created after the snapshot. Now that we rolled back the snapshot, it does not exist.

If you want to delete the snapshot, you can use the snapshot command, as shown in the following listing.

Listing 21. On VIOS A
$ snapshot -clustername demo1 -delete lparA_lu1_SNAP1 -spname demosp -lu lparA_lu1
$

Modifying the cluster

So far, everything we have done has been on VIOS A and lparA only. The cluster we have created is also a single-node cluster. You might ask: what is a single-node cluster, and how can that be? Well, that is what the CAA feature of AIX allows; a cluster can be created with just a single node.

Let’s expand our cluster, demo1, by adding the second VIOS instance VIOS B on the other CEC.

Listing 22. On VIOS A
$ cluster -addnode -clustername demo1 -hostname viosb
Partition VIOSB has been added to the demo1 cluster.

$

The above command has added VIOS B to the cluster. Let’s verify it with the cluster -status command.

Listing 23. On VIOS A
$ cluster -status -clustername demo1
Cluster Name         State
demo1                OK

    Node Name        MTM           Partition Num  State  Pool State
    VIOSA            9117-MMC0206858A2        39  OK     OK  
    VIOSB            9119-59502839095F         3  OK     OK  
$

The above output clearly shows that VIOS A and VIOS B are hosted on two different physical systems, and that both are now part of the VIOS cluster demo1.

Now, we can move to VIOS B and check whether the entire configuration that we did on VIOS A is really reflected on VIOS B.

Listing 24. On VIOS B
$ hostname
VIOSB
$ cluster -list
CLUSTER_NAME:    demo1
CLUSTER_ID:      36618f14582411e2b6ea5cf3fceba66d

$ lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2


        Node name: VIOSA
        Cluster shorthand id for node: 1
        uuid for node: 365731ea-5824-11e2-b6ea-5cf3fceba66d
        State of node:  UP
        Smoothed rtt to node: 7
        Mean Deviation in network rtt to node: 3
        Number of clusters node is a member in: 1
        CLUSTER NAME       TYPE  SHID   UUID                                
        demo1              local        36618f14-5824-11e2-b6ea-5cf3fceba66d

        Number of points_of_contact for node: 2
        Point-of-contact interface & contact state
         dpcom  UP  RESTRICTED
         en3  UP

------------------------------

        Node name: VIOSB
        Cluster shorthand id for node: 2
        uuid for node: a9d1aeee-582d-11e2-bda1-5cf3fceba66d
        State of node:  UP  NODE_LOCAL
        Smoothed rtt to node: 0
        Mean Deviation in network rtt to node: 0
        Number of clusters node is a member in: 1
        CLUSTER NAME       TYPE  SHID   UUID                                
        demo1              local        36618f14-5824-11e2-b6ea-5cf3fceba66d

        Number of points_of_contact for node: 0
        Point-of-contact interface & contact state
         n/a
$

$ lssp -clustername demo1
POOL_NAME:       demosp
POOL_SIZE:       102144
FREE_SPACE:      100353
TOTAL_LU_SIZE:   20480
TOTAL_LUS:       1
POOL_TYPE:       CLPOOL
POOL_ID:         00000000097938230000000050E9B08C

$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66
Snapshot                                                                       
lparA_lu1_SNAP1
$

We have verified from the above command output that VIOS B is also connected to the cluster and that the shared storage pool demosp is available on VIOS B. Next, I tried to map the LU lparA_lu1 to the client lparB, which is connected to VIOS B.

Listing 25. On VIOS B
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9119.595.839095F-V3-C2                      0x00000004

VTD                   lparB_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U5791.001.99B0PA1-P2-C02-T1-W500507680110D9E3-L1000000000000
Mirrored              false

$


$ mkbdsp -clustername demo1 -sp demosp -bd lparA_lu1 -vadapter vhost0 -tn lparB_datavgA
Assigning file "lparA_lu1" as a backing device.
VTD:datavgA
$

$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9119.595.839095F-V3-C2                      0x00000004

VTD                   lparB_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U5791.001.99B0PA1-P2-C02-T1-W500507680110D9E3-L1000000000000
Mirrored              false

VTD                   lparB_datavgA
Status                Available
LUN                   0x8200000000000000
Backing device        lparA_lu1.687f8420bbeee7a5264ce2c6e83d3e66
Physloc                
Mirrored              N/A

$

Yes, I am able to successfully map the same LU to lparA and lparB at the same time. I can now log in to lparB and check whether the disk is visible to the client OS.

Listing 26. On LPAR B
lparB#lspv
hdisk0          00c9095f0f795c20                    rootvg          active      
hdisk1          00c858a210fdef5e                    None                        
lparB#
Listing 27. On LPAR A
lparA#lspv
hdisk0          00c858a2cbd45f6b                    rootvg          active      
hdisk1          00c858a210fdef5e                    datavgA         active      
lparA#

Looking at the above output, I can confirm that we are able to share the same LU between two clients at the same time. Notice that the PVID is the same in both LPARs. You can now use this functionality to access the disk on both clients. Beware of data corruption, though, and use the right technology to access the disk, such as Logical Volume Manager (LVM) with a concurrent or enhanced concurrent VG; a sketch follows below.
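For example, if both clients genuinely need simultaneous access, one hedged approach is to create the volume group as enhanced concurrent capable on the client side. This is only a sketch: shareddatavg is a hypothetical name, and concurrent varyon additionally requires the bos.clvm.enh fileset plus clustering software such as PowerHA to coordinate access.

lparA#mkvg -C -y shareddatavg hdisk1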

We have seen how to expand the cluster. We will also see how to shrink the cluster, that is, remove a VIOS node from the cluster.

Before removing a VIOS from the cluster, ensure that there is no LU provided to any clients from the specific VIOS that you intend to remove. In our case, we will remove VIOSB from the cluster.

Listing 28. On VIOS A
$ cluster -rmnode -clustername demo1 -hostname viosb
PARTITION HAS MAPPINGS
VIOSB

Command did not complete.

$

Oops!!! The command failed.

This is because we have not removed the mapping of the lparA_lu1 LU that was provided to lparB through VIOSB. We can delete the VTD mapping and rerun the command, or use the -f flag. I'm using the -f flag because I know that there is only one LU mapped. Using the -f flag removes all the VTD devices created from LUs of that specific cluster; if you have multiple mappings, verify them first and then proceed.

Listing 29. On VIOS A
$ cluster -rmnode -f -clustername demo1 -hostname viosb
Partition VIOSB has been removed from the demo1 cluster

$

In case you need to add additional disks to the shared storage pool, you can use the following command format. I have not run it as I do not have an additional disk.

chsp -add -clustername <cluster_name> -sp <ssp_name> hdiskn
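For example, to grow demosp from this lab with one more disk (hdisk5 is hypothetical here; it must be a free SAN disk that is visible to every VIOS node in the cluster):

$ chsp -add -clustername demo1 -sp demosp hdisk5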

Thin and thick provisioning

We have not touched on one thing yet: thin provisioning. You do not need to set up anything special on a shared storage pool to use thin provisioning. If you look at the output of the lssp commands, you can see the column named "ProvisionType"; throughout our demonstration, all the LUs were thin provisioned, because thin provisioning is the default behavior in a shared storage pool.

If you want to thick provision a LU, you need to specifically mention it using a -thick flag with the mkbdsp command.

Listing 30. On VIOS A
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66
$

We will try creating a thick provisioned LU for demonstration.

Listing 31. On VIOS A
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66

$
$ mkbdsp -clustername demo1 -sp demosp 50G -bd lparA_lu2 -vadapter 
vhost0 -tn lparA_datavg_D2 -thick
Lu Name:lparA_lu2
Lu Udid:0ceaf03105d97f45ef4c595968f61cf7

Assigning file "lparA_lu2" as a backing device.
VTD:lparA_datavg_D2

$
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66
lparA_lu2        51200       THICK            0ceaf03105d97f45ef4c595968f61cf7
$
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9117.MMC.06858A2-V39-C3                     0x0000000f

VTD                   LPARA_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U78C0.001.DBJ0379-P2-C3-T1-W500507680120D9ED-L1000000000000
Mirrored              false

VTD                   lparA_datavg
Status                Available
LUN                   0x8200000000000000
Backing device        lparA_lu1.687f8420bbeee7a5264ce2c6e83d3e66
Physloc                
Mirrored              N/A

VTD                   lparA_datavg_D2
Status                Available
LUN                   0x8300000000000000
Backing device        lparA_lu2.0ceaf03105d97f45ef4c595968f61cf7
Physloc                
Mirrored              N/A

$

Now, take a look at the above output. The new LU that we created is a thick-provisioned LU, and it has also been mapped to the client lparA.

Thin provisioning helps you overcommit the available storage resources. For example, we now have a 20 GB LU and a 50 GB LU in our roughly 100 GB shared storage pool. Let's say another client requires 50 GB of space. We could not fulfill this request in a normal VG scenario, or if we had thick provisioned all the LUs in the shared storage pool. But because we used thin provisioning for lparA_lu1, the space not yet consumed by the client remains available. You can also see in the output of the lssp command in the following listing that there is about 49 GB (49150 MB) of free space in the shared storage pool.

Listing 32. On VIOS A
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66
lparA_lu2        51200       THICK            0ceaf03105d97f45ef4c595968f61cf7
$
$ lssp -clustername demo1
POOL_NAME:       demosp
POOL_SIZE:       102144
FREE_SPACE:      49150
TOTAL_LU_SIZE:   71680
TOTAL_LUS:       2
POOL_TYPE:       CLPOOL
POOL_ID:         00000000097938230000000050E9B08C
$ mkbdsp -clustername demo1 -sp demosp 50G -bd testlu1 -thick
Storage Pool subsystem operation, unable to create LU.
Storage Pool subsystem operation, not enough space in the pool.

$ mkbdsp -clustername demo1 -sp demosp 50G -bd testlu1
Lu Name:testlu1
Lu Udid:9e75b355e376eb81914df20bfb6c07f1

$

I tried to create a thick-provisioned LU of 50 GB, but it failed due to insufficient space, whereas the same command without the -thick flag succeeded because the LU is thin provisioned.

Using thin provisioning also carries the risk of overcommitting your storage resources. Though overcommitment is an advantage of virtualization, it becomes a risk if you do not keep the usage under control. Assume a scenario where all your clients start filling whatever is allocated to them: if the LUs are thin provisioned and the shared storage pool is overcommitted, the clients will end up with LVM write errors because there are no real blocks left to back their writes.

To mitigate this, you can use the alert functionality of the shared storage pool to notify the system administrator when the actual usage of the shared storage pool crosses a threshold limit.

Listing 33. On VIOS A
$ alert -set -clustername demo1 -spname demosp -type threshold -value 75      
$ Pool freespace is 47 percent.
$ alert -list -clustername demo1 -spname demosp -type threshold          
PoolName:      demosp                              
PoolID:       00000000097938230000000050E9B08C                 
ThresholdPercent: 75                                
$ alert -unset -clustername demo1 -spname demosp -type threshold          
$

Looking at Listing 31, with two LUs mapped to a client, you might wonder how to take a snapshot of multiple LUs provided to a client at the same time. In storage terms, we refer to this as a consistency group, where snapshots are created for a group of volumes at the same time to maintain consistency. This is also possible with shared storage pools.

To explain this, I am creating a single snap of the two LUs allocated to lparA on VIOS A.

Listing 34. On VIOS A
$ snapshot -clustername demo1 -create datavgA_snap -spname demosp -lu lparA_lu1 lparA_lu2
datavgA_snap
$

$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1        20480       THIN             687f8420bbeee7a5264ce2c6e83d3e66
Snapshot                                                                       
datavgA_snap

lparA_lu2        51200       THICK            0ceaf03105d97f45ef4c595968f61cf7
Snapshot                                                                       
datavgA_snap

$

This way, we can ensure consistency across multiple disks by creating their snapshots at the same time.

Hints

  • An LU, either thick or thin provisioned, cannot be resized; you can only add new LUs to the client. Maybe future releases will have an option for this, but IBM support confirmed that it is not currently supported.
  • Disks that are part of the shared storage pools can still be listed as not part of any VG (none) in the lspv output. (Refer to Listing 6.)
  • You can also use the VIOS Shared Storage Pool with a dual-VIOS setup to have redundancy for your clients with default AIX MPIO. It is an advantage over the traditional file-backed or LV-backed storage offering redundancy.
  • Only VSCSI can be used with shared storage pools; NPIV cannot be used as of now, and I have no idea whether it will be supported in the future.
  • You can use the LU ID instead of the LU name as duplication of LU names is allowed in a shared storage pool.
  • I have not fully tested the alert functionality, but it will be a good feature if it works properly.

Summary

In this article, I have detailed the features of the shared storage pool and how best you can use it in your infrastructure. I feel that a shared storage pool makes life easier in environments that do not have heavy data workloads, although I have not performed any benchmarks. Apply it wherever it best suits your environment. This article is not intended to replace any official documentation, but it can act as a quick-start guide for system administrators who would like to explore or test the shared storage pool concept.

SVC to PowerVM NPIV SAN Zoning Best Practices

From Ernie O. and Chuck L of IBM…

Please do not contact the author on this but raise any concerns on this paper through your IBM Storage Support channel.


I have always considered 8 NPIV paths/zones, each with a single initiator and single target, to be the best configuration for performance and availability.

Most vendors' Path Control Modules recommend eight paths for performance and, when cabled properly, eight paths also allow for a concurrent failure of one of the SAN fabrics (assuming two) as well as one of the VIO servers (assuming two).

However, the SVC with SDDPCM treats four paths as its optimal number for performance. If you use eight paths and then perform Live Partition Mobility (LPM) on a partition, the SVC has to manage 16 paths during the move; this has resulted in failed LPM operations with data loss, requiring a reload of the partition.
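Before and after any zoning change, it is worth confirming how many paths each client LPAR actually sees. A quick, hedged check from the AIX client (standard MPIO commands; the SDDPCM query applies only where SDDPCM is installed):

Count the paths per hdisk as AIX MPIO sees them:
# lspath | awk '{ print $2 }' | sort | uniq -c

With SDDPCM installed, show the per-device path details:
# pcmpath query device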

 

Click for detailed diagrams (PDF): SVC zoning for PowerVM NPIV

 


Security Bulletin: Tivoli Storage Manager client encryption key password vulnerability (CVE-2014-4818)

URL: http://www.ibm.com/support/docview.wss?uid=swg21697022&myns=swgtiv&mynp=OCSSAT9S&mynp=OCSSSQWC&mynp=OCSSGSG7&mync=E&cm_sp=swgtiv-_-OCSSAT9S-OCSSSQWC-OCSSGSG7-_-E

Security Bulletin

Summary

A vulnerability in the IBM Tivoli Storage Manager (TSM) client would allow a local user to obtain the encryption key password.

Vulnerability Details

CVEID: CVE-2014-4818
DESCRIPTION: IBM Tivoli Storage Manager client contains a vulnerability that would allow a local user to obtain the encryption key password used for backups and restores.

CVSS Base Score: 2.10
CVSS Temporal Score: See http://xforce.iss.net/xforce/xfdb/95451 for the current score
CVSS Environmental Score*: Undefined
CVSS Vector: (AV:L/AC:L/Au:N/C:P/I:N/A:N)

Affected Products and Versions

  • TSM 7.1.0.0 through 7.1.1.x
  • TSM 6.4.0.0 through 6.4.2.x
  • TSM 6.3 all versions
  • TSM 6.2 all versions
  • TSM 6.1 all versions
  • TSM 5.5 all versions
  • TSM 5.4 all versions

Remediation/Fixes

TSM Release                  | First Fixing VRMF Level | APAR    | Remediation/First Fix
7.1                          | 7.1.2                   | IT06016 | A fix will be provided for 7.1.2 on 4/17/2015, or apply the workaround.
6.4                          | 6.4.3                   | IT06016 | A fix will be provided for 6.4.3 on 7/14/2015, or apply the workaround.
6.3, 6.2, 6.1, 5.5, and 5.4  |                         |         | Upgrade to a fixing release or apply the workaround.

Workarounds and Mitigations

Step 1

      Create a user group (e.g., tsmusers) that includes all users that need to use the TSM client.
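      On AIX, for example, the group can be created and populated as follows (tsmusers and user1 are placeholder names; other platforms would use their own groupadd/usermod equivalents):
        mkgroup tsmusers
        chgrpmem -m + user1 tsmusers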

    Step 2

        Restrict access to the stored encryption key password by limiting who can run the client Trusted Communications Agent (TCA), using the user group created in Step 1.
        1. Use chgrp to change the group ownership of dsmtca to the tsmusers group.
          chgrp tsmusers dsmtca
        2. Use chmod to set the execute bit for the group so that anyone in the tsmusers group can run dsmtca.
          chmod 750 dsmtca
        3. Use chmod to set the SUID bit for dsmtca so that users in the group can run it with elevated privileges.
          chmod u+s dsmtca
        4. Verify that the SUID bit and the group execute bit are set on the dsmtca file:
          ls -l dsmtca

          The output from this command shows that the SUID bit (s) is set for dsmtca in the user field and that the execute bit (x) is set in the group field.
          -rwsr-x— 1 root tsmusers 13327961 2011-05-19 08:34 dsmtca

      Acknowledgement

      The vulnerability was reported to IBM by Bartlomiej Balcerek from WCSS CSIRT

      *The CVSS Environment Score is customer environment specific and will ultimately impact the Overall CVSS Score. Customers can evaluate the impact of this vulnerability in their environments by accessing the links in the Reference section of this Security Bulletin.

      Disclaimer

      According to the Forum of Incident Response and Security Teams (FIRST), the Common Vulnerability Scoring System (CVSS) is an “industry open standard designed to convey vulnerability severity and help to determine urgency and priority of response.” IBM PROVIDES THE CVSS SCORES “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. CUSTOMERS ARE RESPONSIBLE FOR ASSESSING THE IMPACT OF ANY ACTUAL OR POTENTIAL SECURITY VULNERABILITY.

      Cross reference information
      Segment | Product | Component | Platform | Version | Edition
      Storage Management | Tivoli Storage Manager Extended Edition | | AIX, HP-UX, Linux, Solaris, Mac OS | 5.4, 5.5, 6.1, 6.2, 6.3, 6.4, 7.1 | All Editions
      Storage Management | IBM System Storage Archive Manager | | AIX, HP-UX, Linux, Solaris, Mac OS | 6.2, 6.3, 6.4, 7.1 | All Editions

      AIX Tips & tricks

      SOME HANDY TIPS FROM  http://unixadminguide.blogspot.com/2013/12/aix-tips-tricks.html

      AIX Tips & tricks 

      Below are a few AIX commands that are useful for AIX admins.

      1. To list machines configured in a NIM Server,
      # lsnim -c machines

      2. To list networks configured in a NIM Server,
      # lsnim -c networks

      3. To reset a machine (return to ready state)
      # nim -Fo reset MachineName

      4. To list core file settings for a user,
      # lscore user1

      The output will look like:
      compression: on
      path specification: default
      corefile location: default
      naming specification: off

      5. To list the default settings for the system,

      # lscore -d

      The output will look like:
      compression: off
      path specification: on
      corefile location: /corefiles
      naming specification: off

      6. To make any process run by root dump compressed core files and restore the location of the core files to the system default,

      # chcore -c on -p default root
      Note: If no default is specified, cores will dump in the current directory.

      7. To enable a default core path for the system, type:

      # chcore -p on -l /corefiles -d

      8. To scan logical volume lv01, report the status of each partition, and have every block of each partition read to determine whether it is capable of performing I/O operations, type:

      # mirscan -l lv01

      9. To perform the same scan on a physical volume (PV),

      # mirscan -p hdisk1

      10. To perform the same scan on a volume group (VG),

      # mirscan -v vg01

      11. To determine if the 64-bit kernel extension is loaded,

      # genkex | grep 64

      12. To list all JFS file systems,

      # lsjfs

      13. To list all JFS2 file systems

      # lsjfs2

      14. To mirror terminal 1 on terminal 2
      a. Open terminal 1 and find its pts value (ps -ef | grep pts)
      b. Open terminal 2 and enter ‘portmir -t pts/1’
      c. You will now see the commands and output from terminal 1 in terminal 2.
      This basically lets you monitor a terminal.
      d. Run ‘portmir -o’ to end the mirroring when you are finished.
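      A quick sketch of such a session (pts/1 is only an example; the tty command is another way to find the device of the terminal being monitored):
      On terminal 1, identify its device:
      $ tty
      /dev/pts/1
      On terminal 2, start mirroring, and stop it with -o when done:
      # portmir -t pts/1
      # portmir -o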

      15. To identify the current run level,

      # cat /etc/.init.state

      16. To list the available CD ROM drives,

      # lsdev -Cc cdrom

      17. To find out the speed of your network adapter,

      # entstat -d ent0 | grep "Media Speed"

      18. To find out when your system was last installed/updated

      # lslpp -h bos.rte

      19. To list the status of your tape drive,

      # tctl -f /dev/rmt0 status

      20. How to setup anonymous ftp in AIX

      Run the below script to setup anon ftp,
      # /usr/lpp/tcpip/samples/anon.ftp

      21. If telnet takes a long time to produce a prompt, perform the following checks:

      a. Do an nslookup of the client IP address from the AIX server.
      b. Check the name servers in /etc/resolv.conf.
      c. Check the ‘hosts’ entry in /etc/netsvc.conf or the NSORDER variable.

      This is usually caused by a DNS configuration problem; pointing to a working name server should solve it. A quick run through these checks is sketched below.
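      For example (the client IP address shown is only a placeholder):
      # nslookup 192.0.2.10
      # cat /etc/resolv.conf
      # grep hosts /etc/netsvc.conf
      # echo $NSORDER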

      22. How to shutdown the system to maintenance mode ?

      # shutdown -Fm

      23. How to log ftp accesses to a file

      a. Add the following line to /etc/syslog.conf:  daemon.debug /tmp/daemon.log
      b. # touch /tmp/daemon.log
      c. # refresh -s syslogd
      d. Modify your inetd.conf so that ftpd is called with the “-l” flag, then refresh inetd (see the example below).
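      A sketch of the relevant entries; the exact ftpd line in /etc/inetd.conf can differ on your system, so treat the one below only as an example:
      Line added to /etc/syslog.conf:
      daemon.debug /tmp/daemon.log
      Example ftpd entry in /etc/inetd.conf with the “-l” flag appended:
      ftp     stream  tcp6    nowait  root    /usr/sbin/ftpd    ftpd -l
      Refresh inetd after editing inetd.conf:
      # refresh -s inetd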

      24. How to find a file name from inode number ?

      # ncheck -i xxxx /mountpoint
      where xxxx -> inode number of the file

      25. How to redirect the system console to a file or tty temporarily

      # swcons /tmp/console.out

      or

      # swcons /dev/tty5

      26. How to recreate a deleted /dev/null file ?

      # /bin/mknod /dev/null c 2 2

      27. How to add commands that should get executed during every system shutdown ?

      Add them to /etc/rc.shutdown

      28. How to reduce the size or do cleanup of /var/adm/wtmp ?

      # > /var/adm/wtmp

      29. How to find out the fileset a file belongs to ?

      # which_fileset command_name

      30. In which file, the mapping of file Vs fileset stored ?

      # /usr/lpp/bos/AIX_file_list

      31. How to set maximum logins for a user in a system ?

      Change the value of “maxlogins” under “usw” stanza in /etc/security/login.cfg

      32. How to change the initial message that prints while logging in ?

      Change the value of “herald” in /etc/security/login.cfg

      33. How to set the # of seconds the user is given to enter their password ?

      Change the value of “logintimeout” under “usw” stanza in /etc/security/login.cfg
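      All three of these settings (items 31 to 33) live in /etc/security/login.cfg and can be edited directly or changed with the chsec command. A hedged example with placeholder values (note that herald typically sits in the default or per-port stanza rather than usw):
      # chsec -f /etc/security/login.cfg -s usw -a maxlogins=32
      # chsec -f /etc/security/login.cfg -s usw -a logintimeout=60
      # chsec -f /etc/security/login.cfg -s default -a herald="Authorized use only\n\nlogin: "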

      TIPS: FLRT reports now include security and HIPER data

      https://ibm.biz/BdRh6p

      You’ve asked for it, and IBM delivered!

      FLRT continues to provide update and upgrade recommendations based on your input level, usually your current level, for Power firmware, HMC, AIX, VIOS and many more products.

      Now, in addition to the recommendations, you’ll see any security or HIPER fixes that have been released ‘on top’ of those levels, including your input level.

      This gives you options. First, you can see what issues exist on each level. Based on this data and the end-of-service dates, you can decide whether to update, upgrade, or stay on your current level.

      Here’s an example of an AIX report:

      [Screen capture: example FLRT report for an AIX level]

      Notice that the information is provided for each APAR or security advisory, with direct links. You can also see it in the easy-to-use Security APARs and HIPER APARs tables, which list the service packs in which the fixes will be released so that you can plan accordingly.

      The report also shows abstract information when you hover over an APAR or CVE number with your cursor, giving you a quick view before you click through to the link. Very useful!

      Here’s a quick example of a report you can try this with:  http://www-304.ibm.com/webapp/set2/flrt/report?fcn=power&plat=power&mtm=9179-MHC&fw=AM740_100&hmc=V7+R740&p1.parnm=Partition+1&p1.os=aix&p1.aix=6100-07-07&p2.parnm=Partition+2&p2.os=vios&p2.vios=2.2.2.2&reportname=&btnGo=Submit

      Here’s an example for a VIOS partition:

      [Screen capture: example FLRT report for a VIOS partition]

      I hope you enjoy this new function. Please let us know what you think with the feedback button, or take our FLRT survey to tell us what other options you would like to see added to FLRT.

      Thanks!!!

      Julie Craft

      FLRT architect

      Austin, TX