Check out this great article by Mark Ray!
http://www.ibmsystemsmag.com/aix/administrator/performance/core_dumps/
Accessing the Data in Core Dumps
To fully use these cards and get them to show up as ent devices, perform the following steps:
After the existing AIX RoCE file sets are updated with the new file sets, both the roce and the ent devices might appear to be configured. If both devices appear to be configured when you run the lsdev command on the adapters, complete the following steps:
1. Delete the roceX instances that are related to the PCIe2 10 GbE RoCE Adapter by entering the following command:
# rmdev -dl roce0[, roce1][, roce2,…]
2. Change the attribute of the hba stack_type setting from aix_ib (AIX RoCE) to ofed (AIX NIC + OFED RoCE) by entering the following command:
# chdev -l hba0 -a stack_type=ofed
3. Run the configuration manager tool so that the host bus adapter can configure the PCIe2 10 GbE RoCE Adapter as a NIC adapter by entering the following command:
# cfgmgr
4. Verify that the adapter is now running in NIC configuration by entering the following command:
# lsdev -Cc adapter
The following example shows the results when you run the lsdev command on the adapter when it is configured in the AIX NIC + OFED RoCE mode:
Figure 1. Example output of lsdev command on an adapter with the AIX NIC + OFED RoCE configuration
ent1 Available 00-00-01 PCIe2 10GbE RoCE Converged Network Adapter
ent2 Available 00-00-02 PCIe2 10GbE RoCE Converged Network Adapter
hba0 Available 00-00 PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714101604)
You should no longer see roce0, even after running cfgmgr. You can now treat the card like a regular network card (ent).
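A quick way to double-check the result is to filter the lsdev output for any roce device that is still configured. This is only a sketch: the helper name roce_leftovers is made up for this example, and it assumes roce devices are listed with the same column layout as the ent lines shown above.

```shell
# Hypothetical helper (not part of AIX): reads `lsdev -Cc adapter`-style
# lines on stdin and prints every roceX device still in the Available state.
roce_leftovers() {
    awk '$1 ~ /^roce[0-9]+$/ && $2 == "Available" { print $1 }'
}
```

On the AIX host you would run it as `lsdev -Cc adapter | roce_leftovers`; no output means no roce instance is left in the Available state.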
Licensed Internal Code (LIC) upgrade process
IBM® Power Systems™ firmware update, which is often referred to as Change Licensed Internal Code (LIC) procedure, is usually performed on the managed systems from the Hardware Management Console (HMC). Firmware update includes the latest fixes and new features. We can use the Change Licensed Internal Code wizard from the HMC graphical user interface (GUI) to apply updates to the Licensed Internal Code (LIC) on the selected managed system.
We can select multiple managed systems to be updated simultaneously. The wizard also allows us to view the current system information or perform advanced operations. This tutorial provides a step-by-step procedure for updating IBM Power Systems firmware from the HMC command line and the HMC GUI, and is intended for system administrators.
These step-by-step instructions prepare a newcomer for what needs to be done, and how, to stay on the latest firmware level. When you purchase new hardware, the best practice is to upgrade all firmware to the latest level.
The flexible service processor (FSP) firmware provides diagnostics, initialization, configuration, run-time error detection, and correction. It is required to periodically update the firmware on the Power Systems server. Keeping the firmware up-to-date can help in attaining the maximum reliability and functionality from your systems.
Firmware releases enable new function and might also contain fixes or enhancements.
Firmware service packs provide fixes and enhancements within a specific release.
This tutorial provides the following information:
The following sections cover each of these topics in detail.
We will use the View system information option to get the current system firmware information.
We will be using this information in IBM Fix Central to obtain information on the latest firmware updates or upgrades available for the system and proceed with the firmware update or upgrade to newer release using the instructions described in the following sections.
Select the system under test, click Updates, and then click View system information to check the currently installed, activated, and accepted levels.
The following figure shows the currently installed firmware levels on the system.
Fields in figure 1.2 are described below:
EC Number
This displays the numerical identifier of the engineering change (EC) that shows the system and GA level. It has the format of PPNNSSS, where:
LIC Type
This displays the LIC types associated with the selected target.
Machine Type/Model/Serial Number
This displays the corresponding machine type, model number, and serial number.
Installed Level
This displays the LIC level that will be activated and loaded into memory at the next system restart.
Activated Level
This displays the LIC level that is activated and loaded into memory (for example, from a level 5 to level 7).
Accepted Level
This displays the LIC level that was committed. This refers to the updates selected on the system.
This is the backup level of code that you can return to, if necessary. Generally, this is the level of code on the permanent side (p-side).
Unactivated Deferred Level
This displays the latest or highest LIC level that contains unactivated deferred updates. This refers to the updates selected on the system.
A deferred update requires a system restart to activate.
Platform IPL Level
This displays the LIC level on which the hypervisor and partition firmware were last restarted. When concurrent LIC updates are performed, the activated level will change, but the platform IPL level will remain unchanged.
Update Control
This displays the current owner of LIC update control. It can be either HMC or operating system.
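The relationship between the installed, activated, and accepted levels described above can be sketched as a small comparison. This is an illustration only: the function name lic_status and its three result strings are invented here, and the levels are passed in by hand rather than read from the HMC.

```shell
# Hypothetical helper: compares the three LIC levels described above.
# Usage: lic_status INSTALLED ACTIVATED ACCEPTED
lic_status() {
    installed=$1
    activated=$2
    accepted=$3
    if [ "$installed" != "$activated" ]; then
        # The installed level is what gets loaded at the next restart,
        # so a restart would change the running level.
        echo "restart-pending"
    elif [ "$accepted" != "$installed" ]; then
        # The running level has not been committed; the accepted level
        # is still available as the backup level.
        echo "not-accepted"
    else
        echo "in-sync"
    fi
}
```

For example, `lic_status 7 5 5` prints restart-pending, because level 7 is installed while level 5 is still the one loaded in memory.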
Now that we know the current firmware levels on the system (Section 1), we can move up to the latest available level using one of the firmware update and upgrade methods described below. Select the one that fits your requirement.
Section 4 describes the concurrent firmware update procedure. We can also use the DVD method to perform code upgrades (to a new release). This can be used when the HMC cannot access the Internet because of a firewall.
Section 7 describes the disruptive upgrade procedure using the FTP method. Similarly, the FTP procedure can also be used for concurrent code updates (within the same release).
Section 8 describes the code upgrade procedure disruptively using the IBM website. A similar procedure can be used for performing concurrent code updates as well.
After selecting the required system from the HMC, select Change Licensed Internal Code to perform code updates (any updates within the same release), or select Upgrade Licensed Internal Code to perform code upgrades (installing a different release).
Power Systems firmware fix packs or firmware releases can be obtained from the IBM Fix Central website.
Select the following categories for Power Systems firmware update and choose the appropriate machine type and model of your system to be updated.
As per the example shown in Figure 3.0, the machine type and model used is: 8203-E4A. Select the appropriate machine type of your choice and continue.
The example in Figure 3.1 below is for the system firmware only. You can explore the other options in the same way.
If you already know the specific firmware level you need, select it directly. If not, the website can recommend the latest and best-suited firmware levels. If you need help, select the "I need guidance. I am not sure what level of firmware is recommended" option as shown in Figure 3.2.
Choose the specific level or get the recommended level as shown below:
Decide whether your system needs a firmware update to the latest fix pack or an upgrade to a new release, based on the current levels installed on the system as obtained from View system information in the above section.
As an example, let us continue to get the firmware service pack within the current release, as shown in Figure 3.6.
Similarly, users can get the upgrade code (that is, a newer release) using the second option. Note that this is a disruptive code install: the system is power cycled.
Note:
Download the update code, if you are planning an update within the current release.
Download the upgrade code, if you are planning an upgrade to a newer release.
Figure 3.7 lists the latest, recommended, and available updates to your current release. Select the appropriate option and proceed further.
Continue with downloading the ISO file if you want to burn it to a DVD and perform the firmware update using the DVD media, or copy the code to a remote FTP-enabled system to perform the update using the FTP method. The firmware update procedure is explained in detail in the following sections.
You can update the firmware concurrently (that is, the fixes that can be deployed on a running system without rebooting partitions or performing an IPL) within a specific release. Select the Change Licensed Internal Code option for the current release.
In the Specify LIC Repository section (as shown in Figure 4.2), select the location of the LIC update repository.
Select the DVD-RAM drive option, where you have the DVD placed, and proceed with the concurrent code update, as shown in Figure 4.3.
Note: Place the DVD in the HMC’s DVD drive (and not in the system’s DVD drive).
Click OK to proceed to the subsequent steps of the code update. The wizard performs a health check to verify that the system is ready for the code update. If everything is fine, we can proceed further.
The following screen captures show the step-by-step procedure to perform concurrent code update.
Firmware updates are usually concurrent. Disruptive update service packs are very rare. The procedure for a disruptive update is quite similar to the concurrent update (explained in Section 4), but it prompts for a system power cycle during the operation.
We use the Select advanced features option to perform advanced operations, such as Remove and activate and Reject fix.
Remove and activate option
The Remove and activate option brings the system back to the update level that is on the permanent side. You can use this option to back off an update level.
Click OK and then Close to remove and activate the permanent side update level.
Reject fix option
The Reject Fix option is enabled only when the system is booted in Permanent Side mode (in ASMI, go to Power/Restart Control -> Power On/Off System and make sure that the Current firmware boot side option is displayed as Permanent). This operation copies the currently running level (permanent side) to the temporary side, and can be used to reject a fix that has been applied.
Click OK to start this operation.
Installing a release or a disruptive fix pack causes system IPL. All release upgrades are disruptive.
We can obtain the upgrade code (that is, the disruptive fix pack) from Fix Central, burn it to DVD media, and proceed with the upgrade process, which is quite similar to the concurrent update process explained in the earlier sections (except that this operation is disruptive).
In this section, let us learn how to use the FTP method to upgrade the system using the firmware code stored in a remote repository.
The following screen captures show the steps to upgrade to newer firmware releases disruptively using the FTP method.
Clicking OK starts the disruptive upgrade. System will be on the applied release level after the upgrade operation completes.
After logging in to the HMC, click System Management > Servers > Target Server on the left pane. Alternatively, you can click the Updates icon on the same pane. All the available servers will be displayed in the right pane. In the following figure, the red highlight in the right pane shows the current level installed.
Make sure that your target server is in the shutdown mode, and if not, switch off the server.
Now, click the Upgrade Licensed Internal Code to a new release link at the bottom of the page as shown in the following figure.
After clicking the link, you will be directed to a page that shows the results of the readiness check. If no errors are found, you can click OK and proceed further, as shown in the following figure.
After clicking OK, you will be directed to the Specify LIC Repository page. Here, you need to select the location of the code. The options shown in the following figure are available.
If you are setting up a new server, the best practice at this prompt is to select the IBM service web site option. With this method you need not worry about powering the managed systems off and on.
After selecting the IBM Service web site option, a new page opens showing the available LIC level details. The best practice is to select the latest available code (that is, the latest available version), because it includes most of the fixes released by IBM and brings your Power Systems server to the latest level. Otherwise, select the level that best fits your requirement, or the latest supported one.
Be patient here and follow the prompts to complete the upgrade. The firmware upgrade takes time, depending on your Internet bandwidth. Do not forget to switch on the server, so that the latest firmware gets activated and reflected in the navigation pane, as shown in the following figure.
Now you are done with the upgrade. Remember that if you select multiple systems, they are upgraded as well.
9/23/13 Update – See this upda
Here is a script I've written to visualize the physical layout of an AIX volume group. The script visually shows the location of every Physical Partition (PP) on each hdisk (AKA Physical Volume). The output shows which Logical Volume (LV) is on each of the PPs (or if it is free space). The output is color coded so that each LV has its own color, which makes it very easy to see where each LV physically is across the entire Volume Group. You can specify the number of columns of output depending on the size of your screen.
The script is intended to give a visual representation of the Volume Group. This makes commands that move LPs/PPs around, such as migratelp, easier to use, makes LVM/disk maintenance easier, and also serves as a learning tool.
Here are a few screenshots:
When running the script you specify 2 parameters: The volume group name, and the number of columns you would like displayed (or it will default to 3 columns if not specified).
Here is the script:
#!/bin/ksh
#vvg - visualize physical layout of AIX volume group
#Copyright Brian Smith, 2013
index=0
set -A colors 41m 42m 43m 44m 45m 46m 47m 100m 101m 102m 103m 104m 105m 106m
temp
temp
> $tempfile
> $tempfile2
if [ -n "$1" ]; then
vg=$1
else
echo "Specify VG name as first parameter"
exit 1
fi
if ! lsvg $vg >/dev/null 2>&1; then
echo "Error: VG name not correct or VG not varied on"
exit 2
fi
[ -n "$2" ] && col=$2 || col=3
if ! echo $col | grep "^[0-9]*$" >/dev/null || [ "$col" -eq 0 ]; then
echo "Error: second parameter should be number of columns"
exit 3
fi
count=0
columns=""
while [ "$count" -lt "$col" ]; do
columns="$columns -"
count=`expr $count + 1`
done
showdisk()
{
. $tempfile
. $tempfile2
[ “$index” -gt 0 ] && index=`expr $index + 1`
pv=$1
lspv -M $pv | while read line; do
if echo $line | awk 'NF==1 {print}' | grep '-' >/dev/null; then
elif echo $line | awk -F: '{print $2}' | grep "^[0-9]*$" >/dev/null ; then
else
fi
done | while read line2; do
pp=`echo "$line2" | awk '{print $1}' | awk -F: '{print $2}'`
lv=`echo "$line2" | awk '{print $2}' | awk -F: '{print $1}'`
lp=`echo "$line2" | awk '{print $2}' | awk -F: '{print $2}'`
eval if ! [ -n \"\$${lv}\" ]\; then \
fi
eval printf \\\\
if [ -n "$lp" ]; then
else
fi
echo
done | paste -d " " $columns
}
for pv in `lspv | grep " $vg " | awk '{print $1}'`; do
ppsize=`lspv $pv | grep "^PP SIZE" | awk '{print $3 " " $4}'`
echo “\03
printf “\033[1;36m* %-8s
printf "\033[1;36m* Size : %-10s * \n\033[0m" "`getconf DISK_SIZE /dev/$pv` MB"
printf "\033[1;36m* PP Size: %-19s* \n\033[0m" "$ppsize"
echo “\03
showdisk $pv
done
rm $tempfile
rm $tempfile2
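The heart of the script is the parsing of the lspv -M output into one row per physical partition. The following is a minimal sketch of that idea only, not the author's code. It assumes each input line looks like "hdisk0:5 hd6:1" for an allocated PP and "hdisk0:2-4" or "hdisk0:2" for free space, and the function name parse_pv_map and the "free" / "-" placeholders are made up for this example.

```shell
# Hypothetical helper: reads `lspv -M`-style lines on stdin and prints
# one "PP LV LP" row per physical partition.
parse_pv_map() {
    awk '
    NF == 1 {
        # Free space: "hdiskN:PP" or "hdiskN:FIRST-LAST"; expand the range.
        split($1, pvpp, ":")
        n = split(pvpp[2], range, "-")
        first = range[1] + 0
        last = (n == 2) ? range[2] + 0 : first
        for (pp = first; pp <= last; pp++) print pp, "free", "-"
        next
    }
    NF >= 2 {
        # Allocated PP: field 1 is "hdiskN:PP", field 2 is "LV:LP[:COPY]".
        split($1, pvpp, ":")
        split($2, lvlp, ":")
        print pvpp[2], lvlp[1], lvlp[2]
    }'
}
```

On an AIX host this would be fed with `lspv -M hdisk0 | parse_pv_map`. Coloring each LV and arranging the rows into columns is what the full script adds on top.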
As you can see, one of the hdisks is missing! And you start to panic: "OMG, an hdisk is missing! Where, how, when?!"
There is no reason to panic. You will see that one of your disks is missing only after you have restarted one of your VIOS. In our case there are two VIOS: hdisk0 comes from the first VIOS and hdisk1 from the second. Together these two hdisks form the volume group rootvg.
How to fix this missing hdisk state?
All you need to do is activate the volume group.
root@aix-server> [/] varyonvg rootvg
This activates the rootvg volume group. Afterwards, both hdisks show as active.
Why is this important? Because of this:
When a volume group is activated, physical partitions are synchronized if they are not current.
But there is one case in which you can't make your hdisk active without additional changes. In this case, the varyonvg command fails with an error and the hdisk stays inactive:
root@aix-server> [/] varyonvg rootvg
varyonvg: Cannot varyon volume group with an active dump device on a missing physical volume. Use sysdumpdev to temporarily replace the dump device with /dev/sysdumpnull and try again.
So, as the error says, the active dump device is on the missing physical volume hdisk0. (I will not explain here what a system dump device is.) How do we change this? First, list the status of the sysdump devices.
root@aix-server> [/] sysdumpdev -l
primary /dev/lg_dumplv
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump FALSE
dump compression ON
From this we can see that the primary device is /dev/lg_dumplv and the secondary device is /dev/sysdumpnull. The active dump device named in the error message is the primary dump device shown by sysdumpdev -l, so that is what we need to change.
root@aix-server> [/] sysdumpdev -p /dev/sysdumpnull
List the sysdump devices again.
root@aix-server> [/] sysdumpdev -l
primary /dev/sysdumpnull
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump FALSE
dump compression ON
Now activate the volume group.
root@aix-server> [/] varyonvg rootvg
root@aix-server> [/]
root@aix-server> [/] lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 546 4 00..00..00..00..04
hdisk1 active 546 0 00..00..00..00..00
As you can see, both hdisks are now active.
Now change your primary dump device back:
root@aix-server> [/] sysdumpdev -p /dev/lg_dumplv
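The whole procedure above can be sketched as a pair of shell functions. This is only a sketch under the assumptions of this post: the function names are made up, the sysdumpdev -l output is assumed to have the layout shown above, and the real commands (sysdumpdev, varyonvg) only exist on AIX.

```shell
# Hypothetical helper: reads `sysdumpdev -l` output on stdin and prints
# the device listed on the "primary" line.
primary_dump_device() {
    awk '$1 == "primary" { print $2; exit }'
}

# Hypothetical wrapper for the steps above (AIX only, run as root).
# Usage: reactivate_vg rootvg
reactivate_vg() {
    vg=$1
    saved=$(sysdumpdev -l | primary_dump_device)
    [ -n "$saved" ] || return 1
    # Temporarily point the primary dump device at /dev/sysdumpnull.
    sysdumpdev -p /dev/sysdumpnull || return 1
    varyonvg "$vg"
    rc=$?
    # Put the original primary dump device back, whatever varyonvg returned.
    sysdumpdev -p "$saved"
    return $rc
}
```

Afterwards, check the result with lsvg -p as shown above.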
From Ernie O. and Chuck L of IBM…
Please do not contact the author on this but raise any concerns on this paper through your IBM Storage Support channel.
I have always considered 8 NPIV paths/zones, each with a single initiator and single target, to be the best configuration for performance and availability.
Most vendors' Path Control Modules recommend 8 paths for performance and, properly cabled, 8 paths also allow for a concurrent failure of one of the SAN fabrics (assuming 2) as well as one of the VIO servers (assuming 2).
However, the SVC using SDDPCM has 4 paths as its optimal number for performance. If you use 8 paths and LPM a partition, the SVC has to manage 16 paths during the LPM move, and this has resulted in a failed LPM move with data loss, requiring a reload of the partition.
Detailed diagrams (PDF): SVC zoning for PowerVM NPIV
1. To list machines configured in a NIM Server,
# lsnim -c machines
2. To list networks configured in a NIM Server,
# lsnim -c networks
3. To reset a machine (return to ready state)
# nim -Fo reset MachineName
4. To list core file settings for a user,
# lscore user1
The output will look like:
compression: on
path specification: default
corefile location: default
naming specification: off
5. To list the default settings for the system,
# lscore -d
The output will look like:
compression: off
path specification: on
corefile location: /corefiles
naming specification: off
6. To make any process run by root dump compressed core files and restore the location of the core files to the system default,
# chcore -c on -p default root
Note: If no default is specified, cores will dump in the current directory.
7. To enable a default core path for the system, type:
# chcore -p on -l /corefiles -d
8. To scan logical volume lv01, report the status of each partition, and have every block of each partition read to determine whether it is capable of performing I/O operations, type:
# mirscan -l lv01
9. To do the above operation in a PV,
# mirscan -p hdisk1
10. To do the above operation in a VG,
# mirscan -v vg01
11. To determine if the 64-bit kernel extension is loaded,
# genkex | grep 64
12. To list all JFS file systems,
# lsjfs
13. To list all JFS2 file systems
# lsjfs2
14. To mirror terminal 1 on terminal 2
a. Open terminal 1 and find the pts value (ps -ef | grep pts)
b. Open terminal 2 and enter 'portmir -t pts/1'
c. Now you will see commands and outputs from terminal 1 in terminal 2.
This is basically a way to monitor a terminal.
d. Enter "portmir -o" to end the mirroring when you are done
15. To identify the current run level,
# cat /etc/.init.state
16. To list the available CD ROM drives,
# lsdev -Cc cdrom
17. To find out the speed of your network adapter,
# entstat -d ent0 | grep "Media Speed"
18. To find out when your system was last installed/updated
# lslpp -h bos.rte
19. To list the status of your tape drive,
# tctl -f /dev/rmt0 status
20. How to setup anonymous ftp in AIX
Run the below script to setup anon ftp,
# /usr/lpp/tcpip/samples/anon.ftp
21. If telnet takes more time to produce a prompt, do the below checks
a. Do an nslookup of the client IP from the AIX server.
b. Check the nameservers in /etc/resolv.conf.
c. Check the 'hosts' entry in /etc/netsvc.conf or the NSORDER variable.
This problem is usually caused by the DNS configuration. Pointing to a good nameserver should solve it.
22. How to shutdown the system to maintenance mode ?
# shutdown -Fm
23. How to log ftp accesses to a file
a. Add the below line in /etc/syslog.conf: daemon.debug /tmp/daemon.log
b. # touch /tmp/daemon.log
c. # refresh syslogd
d. Modify your inetd.conf so that ftpd is called with the "-l" flag.
24. How to find a file name from inode number ?
# ncheck -i xxxx /mountpoint
where xxxx -> inode number of the file
25. How to redirect the system console to a file or tty temporarily
# swcons /tmp/console.out
or
# swcons /dev/tty5
26. How to recreate a deleted /dev/null file ?
# /bin/mknod /dev/null c 2 2
27. How to add commands that should get executed during every system shutdown ?
Add them to /etc/rc.shutdown
28. How to reduce the size or do cleanup of /var/adm/wtmp ?
# > /var/adm/wtmp
29. How to find out the fileset a file belongs to ?
# which_fileset command_name
30. In which file is the mapping of files to filesets stored ?
/usr/lpp/bos/AIX_file_list
31. How to set maximum logins for a user in a system ?
Change the value of “maxlogins” under “usw” stanza in /etc/security/login.cfg
32. How to change the initial message that prints while logging in ?
Change the value of “herald” in /etc/security/login.cfg
33. How to set the # of seconds the user is given to enter their password ?
Change the value of “logintimeout” under “usw” stanza in /etc/security/login.cfg
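Before changing maxlogins or logintimeout (tips 31 and 33), it helps to see the current value in the stanza. The following is a rough sketch of reading a simple "name = value" attribute from stanza-style text such as /etc/security/login.cfg. The function name stanza_attr is made up, and it only handles single-word values, so it is not suitable for an attribute like herald.

```shell
# Hypothetical helper: reads stanza-format text on stdin and prints the
# value of one attribute. Usage: stanza_attr STANZA ATTRIBUTE
stanza_attr() {
    awk -v stanza="$1" -v attr="$2" '
    /^\*/ { next }                            # skip comment lines
    NF == 1 && $1 ~ /:$/ {                    # stanza header, e.g. "usw:"
        current = substr($1, 1, length($1) - 1)
        next
    }
    current == stanza && $1 == attr && $2 == "=" { print $3; exit }'
}
```

For example, `stanza_attr usw maxlogins < /etc/security/login.cfg` would print the current maxlogins value on a system where that stanza is present.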
Error description
su to NIS user fails with error 3004-503 cannot set
process credentials. This happens when system is upgraded
to 6.1 TL09 SP01
Local fix
Problem summary
**************************************************************
* USERS AFFECTED:
* Systems running the 6100-09 Technology Level with
* bos.rte.security at the 6.1.9.0 or 6.1.9.1 level.
**************************************************************
*PROBLEM DESCRIPTION:
Switching to a NIS user using the ‘su’ command will fail with: 3004-503 cannot set process credentials.
This only affects customers using NIS (Network Information Service).
**************************************************************
* RECOMMENDATION:
* Install APAR IV53944.
* Prior to fix availability, an interim fix is available from
* either
* ftp://aix.software.ibm.com/aix/ifixes/iv53944/
* https://aix.software.ibm.com/aix/ifixes/iv53944/
**************************************************************
Problem conclusion
In the processing of NIS user credentials, the logic to find
stale cached records has been corrected so that the record is
not assigned an invalid pointer.
Temporary fix
*********
* HIPER *
*********
Comments
APAR information | |
APAR number | IV53944 |
Reported component name | AIX 610 STD EDI |
Reported component ID | 5765G6200 |
Reported release | 610 |
Status | CLOSED PER |
PE | YesPE |
HIPER | YesHIPER |
Submitted date | 2014-01-13 |
Closed date | 2014-01-27 |
Last modified date | 2014-03-28 |
APAR is sysrouted FROM one or more of the following:
IV53884
Support lifecycle notice
AIX 7.1 Technology Level 1
AIX 6.1 Technology Level 7
IBM announces the following schedules to help you plan for future upgrades to your AIX operating system. These plans are subject to change without notice.
AIX Technology Levels are supported for how to, usage, and problem identification for the entire life of the release. However, all Technology Levels have a limited support window for corrective service. If a fix is needed, you may be required to upgrade to a more current Technology Level to receive generally available fixes or interim fixes. IBM recommends you take a moment to verify your current service level. Simply run the ‘oslevel -r’ command.
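The oslevel -r output is a compact string such as 6100-09 (the 6100-09 Technology Level mentioned in the APAR above). A small sketch for turning that string into a readable form follows. The function name describe_oslevel and the output wording are made up for this example, and it assumes the value always has the form VRxx-TL with single-digit version and release.

```shell
# Hypothetical helper: turns an `oslevel -r` value such as "6100-09"
# into "AIX 6.1 TL 09".
describe_oslevel() {
    level=$1
    base=${level%%-*}                          # part before the dash, e.g. "6100"
    tl=${level#*-}                             # part after the dash, e.g. "09"
    ver=$(printf '%s' "$base" | cut -c1)       # version digit, e.g. "6"
    rel=$(printf '%s' "$base" | cut -c2)       # release digit, e.g. "1"
    printf 'AIX %s.%s TL %s\n' "$ver" "$rel" "$tl"
}
```

On an AIX host this could be called as `describe_oslevel "$(oslevel -r)"`.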
EZH – Easy HMC Command Line Interface
http://ezh.sourceforge.net/
EZH is a script for the IBM HMC that provides an alternate, easier-to-use command line interface for many common commands. The goal of the project is to make the HMC command line interface easier to use for day-to-day administration tasks.
Starting with version 0.6 it also has an interactive menu to make it even easier to use (“ezh” command)
See a video overview/demo of EZH at: http://youtu.be/E8A9s1_i9xA