Accessing the Data in Core Dumps
Check out this great article by Mark Ray!
http://www.ibmsystemsmag.com/aix/administrator/performance/core_dumps/
In order to fully use these cards and get them to show up as ent devices, perform the following steps:
After the existing AIX RoCE file sets are updated with the new file sets, both the roce and the ent devices might appear to be configured. If both devices appear to be configured when you run the lsdev command on the adapters, complete the following steps:
1. Delete the roceX instances that are related to the PCIe2 10 GbE RoCE Adapter by entering the following command:
# rmdev -dl roce0[, roce1][, roce2,…]
2. Change the attribute of the hba stack_type setting from aix_ib (AIX RoCE) to ofed (AIX NIC + OFED RoCE) by entering the following command:
# chdev -l hba0 -a stack_type=ofed
3. Run the configuration manager tool so that the host bus adapter can configure the PCIe2 10 GbE RoCE Adapter as a NIC adapter by entering the following command:
# cfgmgr
4. Verify that the adapter is now running in NIC configuration by entering the following command:
# lsdev -Cc adapter
The following example shows the results when you run the lsdev command on the adapter when it is configured in the AIX NIC + OFED RoCE mode:
Figure 1. Example output of lsdev command on an adapter with the AIX NIC + OFED RoCE configuration
ent1 Available 00-00-01 PCIe2 10GbE RoCE Converged Network Adapter
ent2 Available 00-00-02 PCIe2 10GbE RoCE Converged Network Adapter
hba0 Available 00-00 PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)
You should no longer see roce0, even after running cfgmgr; you can now treat the card like a regular network card (ent)…
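As a quick sanity check, you can also confirm that the host bus adapter is set to the NIC + OFED stack. This is only a sketch, assuming the adapter instance is named hba0 as in the example above:

# lsattr -El hba0 -a stack_type

The attribute should now report the value ofed; if it still shows aix_ib, repeat step 2 and run cfgmgr again.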
License Internal Code (LIC) upgrade process
IBM® Power Systems™ firmware update, which is often referred to as Change Licensed Internal Code (LIC) procedure, is usually performed on the managed systems from the Hardware Management Console (HMC). Firmware update includes the latest fixes and new features. We can use the Change Licensed Internal Code wizard from the HMC graphical user interface (GUI) to apply updates to the Licensed Internal Code (LIC) on the selected managed system.
We can select multiple managed systems to be updated simultaneously. The wizard also allows us to view the current system information or perform advanced operations. This tutorial provides the step-by-step procedure for the IBM Power Systems firmware update from the HMC command line, and the HMC GUI and is targeted for system administrators.
These step-by-step instructions prepare a newcomer for what needs to be done, and how, to stay on the latest firmware level at all times. When you purchase new hardware, the best practice is to upgrade all the firmware to the latest level.
PDF (2138 KB) <—Click for the PDF of this article…
The flexible service processor (FSP) firmware provides diagnostics, initialization, configuration, run-time error detection, and correction. It is required to periodically update the firmware on the Power Systems server. Keeping the firmware up-to-date can help in attaining the maximum reliability and functionality from your systems.
Firmware releases enable new function and might also contain fixes or enhancements.
Firmware service packs provide fixes and enhancements within a specific release.
This tutorial provides the following information:
The following sections cover each of these topics in detail.
We will use the View system information option to get the current system firmware information.
We will use this information in IBM Fix Central to find the latest firmware updates or upgrades available for the system, and then proceed with the update or upgrade to a newer release using the instructions in the following sections.
Select the system under test, click Updates, and then click View system information to check the currently installed, activated, and accepted levels.
The following figure shows the currently installed firmware levels on the system.
Fields in figure 1.2 are described below:
EC Number
This displays the numerical identifier of the engineering change (EC) that shows the system and GA level. It has the format of PPNNSSS, where:
LIC Type
This displays the LIC types associated with the selected target.
Machine Type/Model/Serial Number
This displays the corresponding machine type, model number, and serial number.
Installed Level
This displays the LIC level that will be activated and loaded into memory at the next system restart.
Activated Level
This displays the LIC level that is activated and loaded into memory (for example, from a level 5 to level 7).
Accepted Level
This displays the LIC level that was committed. This refers to the updates selected on the system.
This is the backup level of code that you can return to, if necessary. Generally, this is the level of code on the permanent side (p-side).
Unactivated Deferred Level
This displays the latest or highest LIC level that contains unactivated deferred updates. This refers to the updates selected on the system.
A deferred update requires a system restart to activate.
Platform IPL Level
This displays the LIC level on which the hypervisor and partition firmware were last restarted. When concurrent LIC updates are performed, the activated level will change, but the platform IPL level will remain unchanged.
Update Control
This displays the current owner of LIC update control. It can be either HMC or operating system.
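If you prefer the HMC command line, the same level information can usually be pulled with the lslic command. This is only a sketch; the managed system name MyServer-8203-E4A is a placeholder, and exact attribute names (along the lines of activated_level, installed_level, and accepted_level) can vary slightly by HMC release:

lslic -m MyServer-8203-E4A -t sys

The output lists the LIC levels for the managed system, matching the fields described above.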
Now that we know the current firmware levels on the system (as described in Section 1), we can move up to the latest available level using one of the firmware update and upgrade methods listed below. Select the one that is appropriate for your requirement.
Section 4 describes the concurrent firmware update procedure. We can also use the DVD method to perform code upgrades (to a new release). This is useful when the HMC cannot access the Internet due to a firewall.
Section 7 describes the disruptive upgrade procedure using the FTP method. Similarly, the FTP procedure can also be used for concurrent code updates (within the same release).
Section 8 describes the code upgrade procedure disruptively using the IBM website. A similar procedure can be used for performing concurrent code updates as well.
After selecting the required system from the HMC, be sure to select Change Licensed Internal Code to perform code updates (updates within the same release), and select Upgrade Licensed Internal Code to perform code upgrades (installing a different release).
Power Systems firmware fix packs or firmware releases can be obtained from the IBM Fix Central website.
Select the following categories for Power Systems firmware update and choose the appropriate machine type and model of your system to be updated.
As per the example shown in Figure 3.0, the machine type and model used is: 8203-E4A. Select the appropriate machine type of your choice and continue.
The example in Figure 3.1 below is for system firmware only. You can explore the other options in a similar way.
If you already know the specific firmware level you need, you can select it directly. If not, you can rely on the recommendations that the website provides about the latest and best-suited firmware levels. If you need help, select the I need guidance. I am not sure what level of firmware is recommended option as shown in Figure 3.2.
Choose the specific level or get the recommended level as shown below:
Decide whether your system needs firmware update to the latest fix pack or upgrade to a new release based on the current levels installed on the system as obtained from View system information in the above section.
As an example, let us continue to get the firmware service pack within the current release, as shown in Figure 3.6.
Similarly, you can get the upgrade code (that is, a newer release) using the second option. Note that this will be a disruptive code installation, that is, the system power is recycled.
Note:
Download the update code, if you are planning an update within the current release.
Download the upgrade code if you are planning for an upgrade to a newer release itself.
Figure 3.7 lists the latest, recommended, and available updates to your current release. Select the appropriate option and proceed further.
Continue with downloading the ISO file if you want to burn it to a DVD and proceed with the firmware update using the DVD media, or copy the code to a remote FTP-enabled system to perform the update using the FTP method. The firmware update procedure is explained in detail in the following sections.
You can update the firmware concurrently (that is, the fixes that can be deployed on a running system without rebooting partitions or performing an IPL) within a specific release. Select the Change Licensed Internal Code option for the current release.
In the Specify LIC Repository section (as shown in Figure 4.2), select the location of the LIC update repository.
Select the DVD-RAM drive option,where you have the DVD placed and proceed with code update concurrently, as shown in Figure 4.3.
Note: Place the DVD in the HMC’s DVD drive (and not in the system’s DVD drive).
Click OK to proceed to the subsequent steps of the code update. The wizard verifies whether the system is ready for a code update by performing a health check; if everything is fine, we can proceed further.
The following screen captures show the step-by-step procedure to perform concurrent code update.
Firmware updates are usually concurrent. Disruptive update service packs are very rare. The procedure to perform disruptive update is quite similar to concurrent update (explained in Section 4) but this process will prompt for system power cycle during the operation.
We use the Select advanced features option to perform advanced operations, such as Remove and activate and Reject fix.
Remove and activate option
The Remove and activate option brings the system back to the update level that is on the permanent side. You can use this option to back off an update level.
Click OK and then Close to remove and activate the permanent side update level.
Reject Fix operation:
Boot the system from the permanent side (in ASMI, go to Power/Restart Control -> Power On/Off System and make sure that the Current firmware boot side option shows Permanent); only then is the Reject Fix option enabled and the operation possible. This operation copies the currently running level (permanent side) to the temporary side. It can be used to reject a fix that has been applied.
Click OK to start this operation.
Installing a release or a disruptive fix pack causes system IPL. All release upgrades are disruptive.
We can obtain the upgrade code, that is, the disruptive fix pack from Fix Central and burn it to a media drive and proceed with the upgrade process, which is quite similar to the concurrent update process explained in the earlier sections (except that this operation is disruptive).
In this section, let us learn how to use the FTP method to upgrade the system using the firmware code stored in a remote repository.
The following screen captures shows the steps to upgrade to newer firmware releases disruptively using the FTP method.
Clicking OK starts the disruptive upgrade. System will be on the applied release level after the upgrade operation completes.
After logging in to the HMC, click System Management > Servers > Target Server in the left pane. Alternatively, you can click the Updates icon in the same pane. All the available servers are displayed in the right pane. In the following figure, the red highlight in the right pane shows the currently installed level.
Make sure that your target server is in the shutdown mode, and if not, switch off the server.
Now, click the Upgrade Licensed Internal Code to a new release link at the bottom of the page as shown in the following figure.
After clicking the link, you will be directed to a web page showing the readiness check information. If no errors are found, you can click OK and proceed further, as shown in the following figure.
After clicking OK, you will be directed to the Specify LIC Repository page. Here, you need to select the location of the code. The options shown in the following figure are available.
If you are setting up a new server, the best practice at this prompt is to select the IBM service web site option; with this method you do not need to worry about powering the managed systems off and on yourself.
After selecting the IBM Service web site option, a new web page opens showing the available LIC level details. Here, the best practice is to select the latest available code (that is, the latest available version), which contains most of the fixes added by IBM, so that your Power Systems server is upgraded to the latest level. Alternatively, select whichever level best fits your requirement, or the latest supported level.
Be patient here and follow the prompts to complete the upgrade. The firmware upgrade activity will need time depending on your Internet bandwidth speed. Do not forget to switch on the server, so that the latest firmware gets activated and reflected in the navigation pane, as shown in the following figure.
Now you are done with the upgrade. Remember that if you selected multiple systems, they can be upgraded in the same way.
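For administrators who prefer to script this, the HMC also exposes the update and upgrade operation through the updlic command. The following is only a sketch under the assumption of a recent HMC release; the managed system name MyServer-8203-E4A is a placeholder, and the flags should be verified against the updlic man page on your HMC before use:

updlic -m MyServer-8203-E4A -o a -t sys -l latest -r ibmwebsite

The intent of this line is to have the HMC retrieve the latest available level from the IBM service website and install and activate it on the managed system, which is the command-line counterpart of the GUI flow described above.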
9/23/13 Update – See this upda
Here is a script I’ve written to visualize the physical layout of an AIX volume group. The script visually shows the location of every Physical Partition (PP) on each hdisk (AKA Physical Volume). The output shows which Logical Volume (LV) is on each of the PP’s (or if it is free space). The output is color coded so each LV has its own color so that it is very easy to see where each LV physically is across the entire Volume Group. You can specify the number of columns of output depending on the size of your screen.
The intended use of the script is to show a visual representation of the volume group, to make commands that move LPs/PPs around (such as migratelp) easier to use, to make LVM/disk maintenance easier, and also to serve as a learning tool.
Here are a few screenshots:
When running the script you specify 2 parameters: The volume group name, and the number of columns you would like displayed (or it will default to 3 columns if not specified).
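For example, to visualize rootvg across four columns (assuming the script has been saved as vvg and made executable), the call would look like this:

# ./vvg rootvg 4

Running it with only the volume group name (# ./vvg rootvg) falls back to the default of 3 columns.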
Here is the script:
#!/bin/ksh
#vvg - visualize physical layout of AIX volume group
#Copyright Brian Smith, 2013
#NOTE: several lines of the original script were mangled or truncated when it
#was pasted into the blog (smart quotes, missing statement bodies). Straight
#quotes have been restored below; lines that could not be recovered are marked
#with "lost in blog formatting" comments rather than guessed at.

index=0
set -A colors 41m 42m 43m 44m 45m 46m 47m 100m 101m 102m 103m 104m 105m 106m

# Temp file names: the two original assignment lines were truncated to "temp";
# the paths below are placeholders.
tempfile=/tmp/vvg_temp1.$$
tempfile2=/tmp/vvg_temp2.$$
> $tempfile
> $tempfile2

if [ -n "$1" ]; then
   vg=$1
else
   echo "Specify VG name as first parameter"
   exit 1
fi

if ! lsvg $vg >/dev/null 2>&1; then
   echo "Error: VG name not correct or VG not varied on"
   exit 2
fi

[ -n "$2" ] && col=$2 || col=3

if ! echo $col | grep "^[0-9]*$" >/dev/null || [ "$col" -eq 0 ]; then
   echo "Error: second parameter should be number of columns"
   exit 3
fi

# Build the argument list for paste (one "-" per requested column)
count=0
columns=""
while [ "$count" -lt "$col" ]; do
   columns="$columns -"
   count=`expr $count + 1`
done

showdisk()
{
   . $tempfile
   . $tempfile2
   [ "$index" -gt 0 ] && index=`expr $index + 1`
   pv=$1
   lspv -M $pv | while read line; do
      # Normalize each line of "lspv -M" output (free PP ranges, bare PP
      # numbers, and PP-to-LV:LP mappings).  The statement bodies of this
      # if/elif/else block were lost in the blog formatting.
      if echo $line | awk 'NF==1 {print}' | grep '-' >/dev/null; then
         :    # free PP range - body lost in blog formatting
      elif echo $line | awk -F: '{print $2}' | grep "^[0-9]*$" >/dev/null ; then
         :    # bare PP number - body lost in blog formatting
      else
         :    # hdisk:PP LV:LP entry - body lost in blog formatting
      fi
   done | while read line2; do
      pp=`echo "$line2" | awk '{print $1}' | awk -F: '{print $2}'`
      lv=`echo "$line2" | awk '{print $2}' | awk -F: '{print $1}'`
      lp=`echo "$line2" | awk '{print $2}' | awk -F: '{print $2}'`
      # Assign a color from $colors to each new LV and print the PP cell in
      # that color.  The continuation lines of the eval statements below and
      # the printf bodies were lost in the blog formatting, so this part is
      # shown incomplete rather than guessed at.
      eval if ! [ -n \"\$${lv}\" ]\; then \
      fi
      eval printf \\\\
      if [ -n "$lp" ]; then
         :    # print PP with LV name and LP number - body lost
      else
         :    # print free PP - body lost
      fi
      echo
   done | paste -d " " $columns
}

for pv in `lspv | grep " $vg " | awk '{print $1}'`; do
   ppsize=`lspv $pv | grep "^PP SIZE" | awk '{print $3 " " $4}'`
   # The two echo header lines and the first printf were truncated in the
   # blog; the separator string below is a placeholder.
   echo "\033[1;36m*********************************\033[0m"
   printf "\033[1;36m* %-8s *\n\033[0m" "$pv"
   printf "\033[1;36m* Size   : %-10s * \n\033[0m" "`getconf DISK_SIZE /dev/$pv` MB"
   printf "\033[1;36m* PP Size: %-19s* \n\033[0m" "$ppsize"
   echo "\033[1;36m*********************************\033[0m"
   showdisk $pv
done

rm $tempfile
rm $tempfile2
As you can see, one of the hdisks is missing! And you start to panic: "OMG, an hdisk is missing! Where, how, when?!"
There is no reason to panic. You will typically see that one of your disks is missing only after you have restarted one of your VIOS. In our case there are two VIOS: hdisk0 comes from the first VIOS and hdisk1 from the second VIOS. These two hdisks make up the volume group called rootvg.
How to fix this missing hdisk state?
All you need to do is activate the volume group:
root@aix-server> [/] varyonvg rootvg
This activates your volume group rootvg. After this, you will see both of your hdisks as active!
Why this is important? Because of this:
When a volume group is activated, physical partitions are synchronized if they are not current.
But there is one case when you cannot make your hdisk active without making additional changes! In this case, after you run the varyonvg command, an error is displayed and you will not be able to make your hdisk active:
root@aix-server> [/] varyonvg rootvg
varyonvg: Cannot varyon volume group with an active dump device on a missing physical volume. Use sysdumpdev to temporarily replace the dump device with /dev/sysdumpnull and try again.
So, as the error says, an active dump device is on the missing physical volume hdisk0. (I will not explain here what a system dump device is.) How do we change this? First, we list the status of the system dump devices.
root@aix-server> [/] sysdumpdev -l
primary /dev/lg_dumplv
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump FALSE
dump compression ON
From here we can see that the primary device is located on /dev/lg_dumplv and the secondary device is /dev/sysdumpnull. The "active dump device" in the error message is actually the primary dump device in the sysdumpdev -l output, so that is what we need to change.
root@aix-server> [/] sysdumpdev -p /dev/sysdumpnull
List again sysdump devices.
root@aix-server> [/] sysdumpdev -l
primary              /dev/sysdumpnull
secondary            /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump FALSE
dump compression ON
Now execute activation of volume group.
root@aix-server> [/] varyonvg rootvg
root@aix-server> [/]
root@aix-server> [/] lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 546 4 00..00..00..00..04
hdisk1 active 546 0 00..00..00..00..00
As you can see, both hdisks are now active.
Now, change back your primary dump device:
root@aix-server> [/] sysdumpdev -p /dev/lg_dumplv
https://www.ibm.com/developerworks/aix/library/au-aix-vios-clustering/
PDF File <— Click here for the PDF file of this great article!
By: Karthikeyan Kannan (virtualkarthik@hotmail.com), Senior Consultant, Capgemini
I love Power Systems and always wondered why Power Systems doesn’t have snapshot and thin-provisioning features. Finally, I found that these are enabled in IBM Power Systems too with the introduction to the shared storage pool concept.
A shared storage pool, as the name states, is basically a way to share storage resources (SAN disks) across a group of IBM VIOS instances. Rather than exporting whole physical disks, you slice the pool up much like logical volumes; each slice is called a logical unit (LU). An LU is basically file-backed storage present in the clustered storage pool.
The VIOS needs to be at a minimum of version 2.2.0.11. I tested the functionality in VIOS 2.2.1.4. With the present release of VIOS 2.2.2.1, you can have 16 VIOS nodes in a cluster and can support up to 200 clients per VIOS node.
The shared storage pool concept takes advantage of the Cluster Aware AIX (CAA) feature in the IBM AIX® operating system to form a cluster of VIOS. Using the CAA feature, the cluster can monitor the peers in the cluster. Refer to Chris Gibson’s blog for more information about CAA.
In this article, I am using two VIOS instances hosted on two different physical systems. We will see details about the following tasks as you navigate through this article.
Features include:
Figure 1 shows the lab setup that I have used to illustrate this feature throughout the article.
We will also log in to both the VIOS and verify the configuration.
$ hostname
VIOSA
$ ioslevel
2.2.1.4
$ lspv
NAME          PVID                VG        STATUS
hdisk0        00c858a2bde1979e    rootvg    active
hdisk1        00c858a2cbd45f6b    None
hdisk2        00c858a2cca2a81d    None
hdisk3        00c858a210d30593    None
hdisk4        00c858a210d32cfd    None
$ lsvg
rootvg
$ lssp
Pool         Size(mb)   Free(mb)   Alloc Size(mb)   BDs   Type
rootvg         102272      77824              128     0   LVPOOL
$
$ cluster -list
$
$ hostname
VIOSB
$ ioslevel
2.2.1.4
$ lspv
NAME          PVID                VG        STATUS
hdisk0        00c858a2bde1979e    rootvg    active
hdisk1        00c9095f0f795c20    None
hdisk2        00c858a2cca2a81d    None
hdisk3        00c858a210d30593    None
hdisk4        00c858a210d32cfd    None
$ lsvg
rootvg
$ lssp
Pool         Size(mb)   Free(mb)   Alloc Size(mb)   BDs   Type
rootvg         102272      77824              128     0   LVPOOL
$
$ cluster -list
$
I have five disks in my VIOS systems; hdisk0 and hdisk1 are used by the VIOS and the client logical partition or LPAR (rootvg), respectively, on both VIOS. The disks that we are going to use are hdisk2, hdisk3, and hdisk4. Look at the physical volume ID (PVID) for all of them; they are the same on both VIOS, which confirms that the same set of physical disks is shared between both VIOS instances. The order of their naming does not need to be the same, as it is the PVID that matters.
You also need to ensure that the VIOS nodes in a cluster are reachable in the IP network. You should be able to resolve their hostnames either by using /etc/hosts or by DNS.
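A minimal sketch of what local resolution could look like: under oem_setup_env, /etc/hosts on each VIOS would contain entries along these lines (the IP addresses are placeholders for this lab; DNS entries work just as well):

10.1.1.11   viosa
10.1.1.12   viosb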
Now that our playground is ready, let’s start the game by creating a VIOS cluster and a shared storage pool. This should be performed using the cluster command that initializes the cluster process and creates a shared storage pool.
For our demo cluster, I am using hdisk2 for the CAA repository disk that holds all the vital data about the cluster and hdisk3 and hdisk4 for shared storage pool.
$ cluster -create -clustername demo1 -repopvs hdisk2 -spname demosp -sppvs hdisk3 hdisk4 -hostname viosa
Cluster demo1 has been created successfully.
$
As soon as the command completes successfully, we can verify the status of the cluster and its attributes using the -list and -status flags of the cluster command.
$ cluster -list
CLUSTER_NAME:    demo1
CLUSTER_ID:      36618f14582411e2b6ea5cf3fceba66d
$
$ cluster -status -clustername demo1
Cluster Name         State
demo1                OK

    Node Name        MTM                 Partition Num  State  Pool State
    VIOSA            9117-MMC0206858A2              39  OK     OK
$
The above output, taken on VIOS A, tells us that a cluster named demo1 was created with the cluster ID 36618f14582411e2b6ea5cf3fceba66d. This cluster ID is a unique identifier for each cluster that is created. The cluster -status command indicates whether the cluster is in an operating state or has problems. It also gives useful information about the physical system, such as the model type, serial number, and the partition ID of the hosting VIOS.
We can also use CAA commands, such as lscluster, to view the status of the cluster and ensure that it is operational.
$ lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 1

Node name: VIOSA
Cluster shorthand id for node: 1
uuid for node: 365731ea-5824-11e2-b6ea-5cf3fceba66d
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
demo1 local 36618f14-5824-11e2-b6ea-5cf3fceba66d
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
$
So far we have been verifying only the cluster; where did the storage pool go? Neither the cluster command nor the plain lssp command will show the shared storage pool that was just created. To view the shared storage pool, we need to use the legacy lssp command of VIOS, which lists storage pools, but with a special flag, -clustername. The command format to list the shared storage pool available within the cluster is lssp -clustername <NAME>.
$ lspv
NAME          PVID                VG               STATUS
hdisk0        00c858a2bde1979e    rootvg           active
hdisk1        00c858a2cbd45f6b    None
hdisk2        00c858a2cca2a81d    caavg_private    active
hdisk3        00c858a210d30593    None
hdisk4        00c858a210d32cfd    None
$ lssp
Pool         Size(mb)   Free(mb)   Alloc Size(mb)   BDs   Type
rootvg         102272      77824              128     0   LVPOOL
$ lsvg
rootvg
caavg_private
$ lssp -clustername demo1
POOL_NAME:       demosp
POOL_SIZE:       102144
FREE_SPACE:      100391
TOTAL_LU_SIZE:   0
TOTAL_LUS:       0
POOL_TYPE:       CLPOOL
POOL_ID:         00000000097938230000000050E9B08C
$
In the above output, you can see that the name of the shared storage pool is demosp, the total size of the shared storage pool is about 100 GB, and the free space is 100391 MB. You can also see that the number of LUs and the total size of the LUs are 0, as we have not created any LUs so far. Along with a unique cluster ID, the shared storage pool also gets a unique identifier.
You may also notice that a new volume group (VG) named caavg_private was created along with the shared storage pool. This VG is CAA-specific, and the disks that are part of it store the vital data that keeps the cluster alive and running. You should not use this VG for any other purpose.
As stated in the start of the article, a logical unit (LU) is a file-backed storage device that can be presented to a VIOS client as a virtual SCSI (VSCSI) disk-backing device.
Now, we need to create a LU on top of the shared storage pool. In VIOS A, we already have a vhost0 connection created, through which the lparA gets a physical hard disk for its rootvg.
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9117.MMC.06858A2-V39-C3                     0x0000000f

VTD                   LPARA_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U78C0.001.DBJ0379-P2-C3-T1-W500507680120D9ED-L1000000000000
Mirrored              false
$
Everything is now in place to create the LU for the client LPAR A from the shared storage pool. The LU behaves much like an LV or file-backed backing device on top of a storage pool. The command we use to create it is the familiar VIOS command mkbdsp, with some additional flags.
$ mkbdsp -clustername demo1 -sp demosp 20G -bd lparA_lu1
Lu Name:lparA_lu1
Lu Udid:2f4adc720f570eddac5dce00a142de89
$
In the above output, I used the mkbdsp command to create the LU first. I created an LU of 20 GB on demosp (not mapped to any client yet). To map it, you need to run the mkbdsp command again, as shown in Listing 9.
$ mkbdsp -clustername demo1 -sp demosp -bd lparA_lu1 -vadapter vhost0 -tn lparA_datavg
Assigning file "lparA_lu1" as a backing device.
VTD:lparA_datavg
$
Note that I have not mentioned the size here because the LU already exists. This command maps the LU lparA_lu1 to vhost0 with the VTD name lparA_datavg.
Instead of going with two steps, one for creating and one for assigning to a client, we can perform both of these operations in a single command as depicted in the following output. Before that, I will have to delete the VTD lparA_datavg backed by the LU lparA_lu1 which we just mapped. We can use the usual rmvdev for the VTD and rmbdsp for the LU.
$ rmvdev -vtd lparA_datavg
lparA_datavg deleted
$ rmbdsp -clustername demo1 -sp demosp -bd lparA_lu1
Logical unit lparA_lu1 with udid "a053cd56ca85e1e8c2d98d00f0ab0a0b" is removed.
$
Now, I will create and map the LU in a single command as shown in the following output.
$ mkbdsp -clustername demo1 -sp demosp 20G -bd lparA_lu1 -vadapter vhost0 -tn lparA_datavg
Lu Name:lparA_lu1
Lu Udid:c0dfb007a9afe5f432b365fa9744ab0b
Assigning file "lparA_lu1" as a backing device.
VTD:lparA_datavg
$
As the LU is created and mapped, the client should be able to see the LU as a disk. We will verify the LU and the mapping in VIOS A once and move over to the client lparA which is a client of VIOS A.
The lssp command can be used to list the backing devices in the shared storage pool.
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             c0dfb007a9afe5f432b365fa9744ab0b
$
The output in Listing 13 shows that the lparA_lu1 LU is mapped to the lparA client on vhost0.
On the client machine lparA, we already have one physical volume that is used by rootvg. Now, the new disk should be available for the client to use. We will attempt to configure the LU provided to the client.
lparA#hostname
lparA
lparA#lspv
hdisk0          00c858a2cbd45f6b                    rootvg          active
lparA#cfgmgr
lparA#lspv
hdisk0          00c858a2cbd45f6b                    rootvg          active
hdisk1          none                                None
# lscfg -vpl hdisk1
hdisk1  U9117.MMC.06858A2-V15-C2-T1-L8200000000000000  Virtual SCSI Disk Drive
PLATFORM SPECIFIC
Name:  disk
Node:  disk
Device Type:  block
#
We made it through! The LU is now available to the client as a virtual SCSI disk drive.
So far, we have set up a shared storage pool with a single VIOS, created a logical unit, and assigned the LU to the lparA client.
Now let’s concentrate on some VG operations on the client side to explore the snapshot feature of shared storage pool. Using the new LU provided to lparA, I am going to create a volume group (datavgA) and a file system, named /datafsA, on top of it.
lparA#mkvg -y datavgA hdisk1
0516-1254 mkvg: Changing the PVID in the ODM.
datavgA
lparA#crfs -v jfs2 -m /datafsA -g datavgA -a size=2G
File system created successfully.
2096884 kilobytes total disk space.
New File System size is 4194304
lparA#
lparA#mount /datafsA
lparA#cd /datafsA
lparA#touch file_lparA
lparA#ls
file_lparA    lost+found
lparA#
Now we will create two files: one named before_snap, after which I will take a snapshot, and then one more named after_snap. We will also restore the snapshot for demonstration.
lparA#touch before_snap
lparA#ls
before_snap   file_lparA    lost+found
lparA#pwd
/datafsA
lparA#
On VIOS A, we will take a snapshot now.
The command to capture a snapshot is:
snapshot -clustername <Clustername> -spname <Shared_Pool_Name> -luudid <ID> -create SNAP_NAME
$ snapshot -clustername demo1 -create lparA_lu1_SNAP1 -spname demosp -lu lparA_lu1
lparA_lu1_SNAP1
$
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
Snapshot
lparA_lu1_SNAP1
$
The lssp command in the above listing indicates that there is a snapshot named lparA_lu1_SNAP1 associated with the lparA_lu1 LU.
Now, we will create one more file named after_snap in the lparA client.
lparA#pwd
/datafsA
lparA#touch after_snap
lparA#ls
after_snap    before_snap   file_lparA    lost+found
lparA#cd
lparA#umount /datafsA
lparA#varyoffvg datavgA
lparA#
I have varied off the volume group with data on it. It is always recommended to take the resources offline when you want to restore data; you should be familiar with this from your own experience.
Let’s try to restore the lparA_lu1_SNAP1 snapshot and see what data is present in the volume group on the client side.
$ snapshot -clustername demo1 -rollback lparA_lu1_SNAP1 -spname demosp -lu lparA_lu1
$
lparA#varyonvg datavgA
lparA#mount /datafsA
Replaying log for /dev/fslv00.
lparA#ls -l /datafsA
total 0
-rw-r--r--    1 root     system            0 Jan  6 23:27 before_snap
-rw-r--r--    1 root     system            0 Jan  6 23:24 file_lparA
drwxr-xr-x    2 root     system          256 Jan  6 23:23 lost+found
lparA#
After the snapshot was restored and the volume group brought back online, there is no file named after_snap. This is because the file was created after the snapshot. Now that we rolled back the snapshot, it does not exist.
If you want to delete the snapshot, you can use the snapshot command, as shown in the following listing.
$ snapshot -clustername demo1 -delete lparA_lu1_SNAP1 -spname demosp -lu lparA_lu1
$
So far, everything we have performed has been on VIOS A and lparA only, and the cluster we created is a single-node cluster. You may ask: what is a single-node cluster, and how can that be? Well, that is what the CAA feature of AIX allows; a cluster can be created with a single node too.
Let’s expand our cluster, demo1, by adding the second VIOS instance VIOS B on the other CEC.
$ cluster -addnode -clustername demo1 -hostname viosb
Partition VIOSB has been added to the demo1 cluster.
$
The above command has added VIOS B to the cluster. Let’s verify it with the cluster -status command.
$ cluster -status -clustername demo1
Cluster Name         State
demo1                OK

    Node Name        MTM                 Partition Num  State  Pool State
    VIOSA            9117-MMC0206858A2              39  OK     OK
    VIOSB            9119-59502839095F               3  OK     OK
$
The above output clearly shows that VIOS A and VIOS B are hosted on two different physical systems, and that both are now part of the VIOS cluster demo1.
Now, we can move to VIOS B and check if the entire configuration that we did on VIOS A has really reflected in VIOS B.
$ hostname
VIOSB
$ cluster -list
CLUSTER_NAME:    demo1
CLUSTER_ID:      36618f14582411e2b6ea5cf3fceba66d
$ lscluster -m
Calling node query for all nodes
Node query number of nodes examined: 2

Node name: VIOSA
Cluster shorthand id for node: 1
uuid for node: 365731ea-5824-11e2-b6ea-5cf3fceba66d
State of node: UP
Smoothed rtt to node: 7
Mean Deviation in network rtt to node: 3
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
demo1 local 36618f14-5824-11e2-b6ea-5cf3fceba66d
Number of points_of_contact for node: 2
Point-of-contact interface & contact state
dpcom UP RESTRICTED
en3 UP
------------------------------
Node name: VIOSB
Cluster shorthand id for node: 2
uuid for node: a9d1aeee-582d-11e2-bda1-5cf3fceba66d
State of node: UP NODE_LOCAL
Smoothed rtt to node: 0
Mean Deviation in network rtt to node: 0
Number of clusters node is a member in: 1
CLUSTER NAME TYPE SHID UUID
demo1 local 36618f14-5824-11e2-b6ea-5cf3fceba66d
Number of points_of_contact for node: 0
Point-of-contact interface & contact state
n/a
$
$ lssp -clustername demo1
POOL_NAME:       demosp
POOL_SIZE:       102144
FREE_SPACE:      100353
TOTAL_LU_SIZE:   20480
TOTAL_LUS:       1
POOL_TYPE:       CLPOOL
POOL_ID:         00000000097938230000000050E9B08C
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
Snapshot
lparA_lu1_SNAP1
$
We have verified from the above command output that VIOS B is also connected to the cluster and that the shared storage pool, demosp, is available on VIOS B. Next, I will map the LU lparA_lu1 to the client lparB, which is connected to VIOS B.
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9119.595.839095F-V3-C2                      0x00000004

VTD                   lparB_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U5791.001.99B0PA1-P2-C02-T1-W500507680110D9E3-L1000000000000
Mirrored              false
$
$ mkbdsp -clustername demo1 -sp demosp -bd lparA_lu1 -vadapter vhost0 -tn lparB_datavgA
Assigning file "lparA_lu1" as a backing device.
VTD:datavgA
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9119.595.839095F-V3-C2                      0x00000004

VTD                   lparB_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U5791.001.99B0PA1-P2-C02-T1-W500507680110D9E3-L1000000000000
Mirrored              false

VTD                   lparB_datavgA
Status                Available
LUN                   0x8200000000000000
Backing device        lparA_lu1.687f8420bbeee7a5264ce2c6e83d3e66
Physloc
Mirrored              N/A
$
Yes, I am able to successfully map the same LU to lparA and lparB at the same time. I can now log in to lparB and check whether the disk is visible to the client OS.
lparB#lspv
hdisk0          00c9095f0f795c20                    rootvg          active
hdisk1          00c858a210fdef5e                    None
lparB#
lparA#lspv
hdisk0          00c858a2cbd45f6b                    rootvg          active
hdisk1          00c858a210fdef5e                    datavgA         active
lparA#
Looking at the above output, I can confirm that we are able to share the same LU between two clients at the same time. Notice that the PVID is the same in both LPARs. You can now use this functionality to access the disk on both clients. Beware of data corruption, and use the right technology to access the disk, such as Logical Volume Manager (LVM) with a concurrent or enhanced concurrent VG.
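As a sketch of what that would look like (not performed in this demo), the VG on the shared LU would normally be created as enhanced concurrent capable, which assumes the bos.clvm.enh fileset is installed; the VG name shareddatavg is a placeholder, and in real deployments the concurrent varyon is usually coordinated by PowerHA rather than run by hand:

lparA#mkvg -C -y shareddatavg hdisk1
lparB#varyonvg -c shareddatavg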
We have seen how to expand the cluster. We will also see how to shrink the cluster, that is, remove a VIOS node from the cluster.
Before removing a VIOS from the cluster, ensure that there is no LU provided to any clients from the specific VIOS that you intend to remove. In our case, we will remove VIOSB from the cluster.
$ cluster -rmnode -clustername demo1 -hostname viosb
PARTITION HAS MAPPINGS VIOSB
Command did not complete.
$
Oops!!! The command failed.
This is because we have not removed the mapping of the lparA_lu1 LU that was provided to lparB through VIOSB. We can delete the VTD mapping and rerun the command, or use the -f flag. I am using the -f flag because I know that there is only one LU mapped. Using the -f flag will remove all the VTD devices created from the LUs of that specific cluster. If you have multiple mappings, verify them first and then proceed.
$ cluster -rmnode -f -clustername demo1 -hostname viosb
Partition VIOSB has been removed from the demo1 cluster
$
In case you need to add additional disks to the shared storage pool, you can use the following command format. I have not run it as I do not have an additional disk.
chsp -add -clustername <cluster_name> -sp <ssp_name> hdiskn
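As a concrete but hypothetical example, if a new SAN disk showed up on both VIOS as hdisk5 (a placeholder name), growing demosp would look like this:

$ chsp -add -clustername demo1 -sp demosp hdisk5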
We have not touched on one thing yet: thin provisioning. You do not need to set up anything special on a shared storage pool to use thin provisioning. If you look at the output of the lssp commands, you can see the header named "ProvisionType"; throughout our demonstration, all the LUs were thin provisioned, because thin provisioning is the default behavior in a shared storage pool.
If you want to thick provision an LU, you need to specifically request it using the -thick flag with the mkbdsp command.
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
$
We will try creating a thick provisioned LU for demonstration.
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
$
$ mkbdsp -clustername demo1 -sp demosp 50G -bd lparA_lu2 -vadapter vhost0 -tn lparA_datavg_D2 -thick
Lu Name:lparA_lu2
Lu Udid:0ceaf03105d97f45ef4c595968f61cf7
Assigning file "lparA_lu2" as a backing device.
VTD:lparA_datavg_D2
$
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
lparA_lu2           51200    THICK            0ceaf03105d97f45ef4c595968f61cf7
$
$ lsmap -all
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0          U9117.MMC.06858A2-V39-C3                     0x0000000f

VTD                   LPARA_RVG
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk1
Physloc               U78C0.001.DBJ0379-P2-C3-T1-W500507680120D9ED-L1000000000000
Mirrored              false

VTD                   lparA_datavg
Status                Available
LUN                   0x8200000000000000
Backing device        lparA_lu1.687f8420bbeee7a5264ce2c6e83d3e66
Physloc
Mirrored              N/A

VTD                   lparA_datavg_D2
Status                Available
LUN                   0x8300000000000000
Backing device        lparA_lu2.0ceaf03105d97f45ef4c595968f61cf7
Physloc
Mirrored              N/A
$
Now, take a look at the above output. The new LU that we have created is thick provisioned, and it has also been mapped to the client lparA.
Thin provisioning helps you overcommit the available storage resources. For example, consider that we have a 20 GB LU and a 50 GB LU in our shared storage pool, and a client now needs another 50 GB of space. We could not fulfill this request in a normal VG scenario, or if we had used thick provisioning for the LUs in the shared storage pool. But because we used thin provisioning for lparA_lu1, the space not yet consumed by the client remains available. You can also see the output of the lssp command in the following listing, which shows that there is about 49 GB of free space in the shared storage pool.
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
lparA_lu2           51200    THICK            0ceaf03105d97f45ef4c595968f61cf7
$
$ lssp -clustername demo1
POOL_NAME:       demosp
POOL_SIZE:       102144
FREE_SPACE:      49150
TOTAL_LU_SIZE:   71680
TOTAL_LUS:       2
POOL_TYPE:       CLPOOL
POOL_ID:         00000000097938230000000050E9B08C
$ mkbdsp -clustername demo1 -sp demosp 50G -bd testlu1 -thick
Storage Pool subsystem operation, unable to create LU.
Storage Pool subsystem operation, not enough space in the pool.
$ mkbdsp -clustername demo1 -sp demosp 50G -bd testlu1
Lu Name:testlu1
Lu Udid:9e75b355e376eb81914df20bfb6c07f1
$
I tried to create a thick provisioned LU of 50 GB, but it failed due to insufficient space, whereas the command without the -thick flag succeeded because the LU is thin provisioned.
Using thin provisioning also carries the risk of overcommitting your storage resources. Though overcommitment is one of the advantages of virtualization, it becomes a risk if you do not keep the usage under control. Assume a scenario where all your clients start filling whatever has been allocated to them: with thin-provisioned LUs, you will end up with LVM write errors in the clients, because there are no real blocks left to back the writes once the shared storage pool is overcommitted.
To mitigate this, you can use the alert functionality of the shared storage pool to notify the system administrator when the actual usage of the shared storage pool crosses a threshold.
$ alert -set -clustername demo1 -spname demosp -type threshold -value 75
$
Pool freespace is 47 percent.
$ alert -list -clustername demo1 -spname demosp -type threshold
PoolName:            demosp
PoolID:              00000000097938230000000050E9B08C
ThresholdPercent:    75
$ alert -unset -clustername demo1 -spname demosp -type threshold
$
Looking at Listing 31, with two LUs mapped to a client, you might wonder how to take a snapshot of multiple LUs provided to a client at the same time. In storage terms, this is a consistency group, where snapshots are created for a group of volumes at the same time to maintain consistency. This is also possible with shared storage pools.
To explain this, I am creating a single snap of the two LUs allocated to lparA on VIOS A.
$ snapshot -clustername demo1 -create datavgA_snap -spname demosp -lu lparA_lu1 lparA_lu2
datavgA_snap
$
$ lssp -clustername demo1 -sp demosp -bd
Lu Name          Size(mb)    ProvisionType    Lu Udid
lparA_lu1           20480    THIN             687f8420bbeee7a5264ce2c6e83d3e66
Snapshot
datavgA_snap
lparA_lu2           51200    THICK            0ceaf03105d97f45ef4c595968f61cf7
Snapshot
datavgA_snap
$
This way, we can ensure consistency across multiple disks by creating their snapshots at the same time.
In this article, I have detailed the features of the shared storage pool and how best you can use it in your infrastructure. In my view, a shared storage pool makes life easier in environments that do not have heavy data workloads; I have not performed any benchmarks, so apply it wherever it best suits your environment. This article is not intended to replace any official documentation, but it can act as a quick-start guide for system administrators who would like to explore or test the shared storage pool concept.
From Ernie O. and Chuck L of IBM…
Please do not contact the author on this but raise any concerns on this paper through your IBM Storage Support channel.
I have always considered 8 NPIV paths/zones, each with a single initiator and single target, to be the best configuration for performance and availability.
Most vendors' Path Control Modules recommend 8 paths for performance and, when properly cabled, 8 paths also allow for a concurrent failure of one of the SAN fabrics (assuming 2) as well as one of the VIO servers (assuming 2).
However, the SVC using SDDPCM has 4 paths as its optimal number for performance. If you use 8 paths and LPM a partition, the SVC has to manage 16 paths during the LPM move; this has resulted in failed LPM moves with data loss, requiring a reload of the partition.
Click for detailed diagrams .pdf —–> SVC zoning for PowerVM NPIV
1. To list machines configured in a NIM Server,
# lsnim -c machines
2. To list networks configured in a NIM Server,
# lsnim -c networks
3. To reset a machine (return to ready state)
# nim -Fo reset MachineName
4. To list core file settings for a user,
# lscore user1
The output will look like:
compression: on
path specification: default
corefile location: default
naming specification: off
5. To list the default settings for the system,
# lscore -d
The output will look like:
compression: off
path specification: on
corefile location: /corefiles
naming specification: off
6. To make any process run by root dump compressed core files and restore the location of the core files to the system default,
# chcore -c on -p default root
Note: If no default is specified, cores will dump in the current directory.
7. To enable a default core path for the system, type:
# chcore -p on -l /corefiles -d
8. To scan logical volume lv01, report the status of each partition, and have every block of each partition read to determine whether it is capable of performing I/O operations, type:
# mirscan -l lv01
9. To do the above operation in a PV,
# mirscan -p hdisk1
10. To do the above operation in a VG,
# mirscan -v vg01
11. To determine if the 64-bit kernel extension is loaded,
# genkex | grep 64
12. To list all JFS file systems,
# lsjfs
13. To list all JFS2 file systems
# lsjfs2
14. To mirror a terminal1 on terminal2
a. Open terminal 1 and find the pts value (ps -ef | grep pts)
b. Open terminal 2 and enter ‘portmir -t pts/1’
c. Now you will see commands and outputs from terminal 1 in terminal 2.
This basically mirrors (monitors) a terminal.
d. Run 'portmir -o' to end the mirroring when finished.
15. To identify the current run level,
# cat /etc/.init.state
16. To list the available CD ROM drives,
# lsdev -Cc cdrom
17. To find out the speed of your network adapter,
# entstat -d ent0 | grep "Media Speed"
18. To find out when your system was last installed/updated
# lslpp -f bos.rte
19. To list the status of your tape drive,
# tctl -f /dev/rmt0 status
20. How to setup anonymous ftp in AIX
Run the below script to setup anon ftp,
# /usr/lpp/tcpip/samples/anon.ftp
21. If telnet takes more time to produce a prompt, do the below checks
a. Do an nslookup of the client IP from the AIX server.
b. Check the nameservers in /etc/resolv.conf.
c. Check the 'hosts' entry in /etc/netsvc.conf or the NSORDER variable.
This issue might be due to the DNS configuration issue. Pointing to a good nameserver should solve the problem.
22. How to shutdown the system to maintenance mode ?
# shutdown -Fm
23. How to log ftp accesses to a file
a. Add the below line in /etc/syslog.conf:
daemon.debug /tmp/daemon.log
b. # touch /tmp/daemon.log
c. # refresh -s syslogd
d. Modify your inetd.conf so that ftpd is called with the “-l” flag.
24. How to find a file name from inode number ?
# ncheck -i xxxx /mountpoint
where xxxx -> inode number of the file
25. How to redirect the system console to a file or tty temporarily
# swcons /tmp/console.out
or
# swcons /dev/tty5
26. How to recreate a deleted /dev/null file ?
# /bin/mknod /dev/null c 2 2
27. How to add commands that should get executed during every system shutdown ?
Add them to /etc/rc.shutdown
28. How to reduce the size or do cleanup of /var/adm/wtmp ?
# > /var/adm/wtmp
29. How to find out the fileset a file belongs to ?
# which_fileset command_name
30. In which file is the mapping of file vs. fileset stored?
# /usr/lpp/bos/AIX_file_list
31. How to set maximum logins for a user in a system ?
Change the value of “maxlogins” under “usw” stanza in /etc/security/login.cfg
32. How to change the initial message that prints while logging in ?
Change the value of “herald” in /etc/security/login.cfg
33. How to set the # of seconds the user is given to enter their password ?
Change the value of “logintimeout” under “usw” stanza in /etc/security/login.cfg
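For reference, the relevant pieces of /etc/security/login.cfg look roughly like the following; the values shown are examples only, with herald in the default (port) stanza and maxlogins and logintimeout under usw:

default:
        herald = "Authorized use only.\n\rlogin: "

usw:
        maxlogins = 32
        logintimeout = 60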
You’ve asked for it, and IBM delivered!
FLRT continues to provide update and upgrade recommendations based on your input level, usually your current level, for Power firmware, HMC, AIX, VIOS and many more products.
Now, in addition to the recommendations, you’ll see any security or HIPER fixes that have been released ‘on top’ of those levels, including your input level.
This provides you with options. First, you will be able to see what issues reside on each level. Based on this data, and the end of service dates, you can make decisions about updating or upgrading or staying on your current level.
Here’s an example of an AIX report:
Notice that the information is provided for each APAR or security advisory, with direct links. Or, you can see the information in the easy to use Security APARs or HIPER APARs tables. These tables also list the service packs that the fixes will be released in, so you can plan accordingly.
The report also provides abstract information if you hover over the APAR or CVE number with your cursor. This allows you to get a quick view before having to click on the link. Very useful!
Here’s a quick example of a report you can try this with: http
Here’s an example for a VIOS partition:
I hope you enjoy this new function and please let us know what you think with our feedback button or take our FLRT survey to let us know what other options you would like to see added to FLRT.
Thanks!!!
Julie Craft
FLRT architect
Austin, TX
Error description
su to a NIS user fails with error "3004-503 Cannot set process credentials."
This happens when the system is upgraded to 6.1 TL09 SP01.
Local fix
Problem summary
**************************************************************
* USERS AFFECTED:
* Systems running the 6100-09 Technology Level with
* bos.rte.security at the 6.1.9.0 or 6.1.9.1 level.
**************************************************************
*PROBLEM DESCRIPTION:
Switching to a NIS user using the 'su' command will fail with: 3004-503 Cannot set process credentials.
This only affects customers using NIS (Network Information Service).
**************************************************************
* RECOMMENDATION:
* Install APAR IV53944.
* Prior to fix availability, an interim fix is available from
* either
* ftp://aix.software.ibm.com/aix/ifixes/iv53944/
* https://aix.software.ibm.com/aix/ifixes/iv53944/
**************************************************************
Problem conclusion
In the processing of NIS user credentials, the logic to find
stale cached records has been corrected so that the record is
not assigned an invalid pointer.
Temporary fix
*********
* HIPER *
*********
Comments
APAR information
APAR number:              IV53944
Reported component name:  AIX 610 STD EDI
Reported component ID:    5765G6200
Reported release:         610
Status:                   CLOSED PER
PE:                       Yes
HIPER:                    Yes
Submitted date:           2014-01-13
Closed date:              2014-01-27
Last modified date:       2014-03-28
APAR is sysrouted FROM one or more of the following:
IV53884