Wednesday, December 16, 2009

SnapManager for Hyper-V is here!

NetApp today published version 1.0 of SnapManager for Hyper-V, providing a solution for automated protection and recovery of Microsoft Hyper-V virtual machines.

 

A new video demo is available on NetApp TV (YouTube).

 

With SnapManager for Hyper-V you can:

• Group virtual machines into different datasets (based upon protection requirements),
• Back up dedicated/clustered virtual machines on storage systems running Data ONTAP,
• Back up & restore virtual machines running on clustered shared volumes,
• Automate dataset backups using scheduling policies,
• Perform on-demand backups of datasets,
• Retain dataset backups for as long as you need them via retention policies,
• Update the SnapMirror destination location after a backup successfully finishes,
• Specify custom scripts to run before or after a backup,
• Restore virtual machines from backups,
• Monitor the status of all scheduled and running jobs, &
• Manage hosts remotely from a management console.


The SnapManager for Hyper-V parent host (not the management console) requires Windows Server 2008 R2 x64, SnapDrive 6.2 for Windows, and Data ONTAP 7.3.2.

 

Friday, December 11, 2009

Data ONTAP 7.3.2

As of today, Data ONTAP 7.3.2 is marked as a General Deployment release by NetApp.

Thursday, December 10, 2009

Calculating the size of a volume

What the volume size depends on

Before you create the volumes that contain qtrees and LUNs, calculate the size of the volume and the amount of reserve space required by determining the type and the amount of data that you want to store in the LUNs on the volume.

The size of the volume depends on the following:

* Total size of all the LUNs in the volume.
* Whether you want to maintain Snapshot copies.
* If you want to maintain Snapshot copies, the number of Snapshot copies you want to maintain and the amount of time you want to retain them (retention period).
* Rate at which data in the volume changes.
* Amount of space you need for overwrites to LUNs (fractional reserve).

The amount of fractional reserve depends on the rate at which your data changes and how quickly you can adjust your system when you know that available space in the volume is scarce.

Estimating the size of a volume

Use the decision process in the flowchart (not reproduced here) to estimate the size of the volume. For detailed information about each step in the decision process, see the following sections:

# Calculating the total LUN size
# Determining the volume size when you do not need Snapshot copies
# Calculating the amount of space for Snapshot copies
# Calculating the fractional reserve



Calculating the total LUN size

The total LUN size is the sum of the LUNs you want to store in the volume. The size of each LUN depends on the amount of data you want to store in the LUNs. For example, if you know your database needs two 20-GB disks, you must create two 20-GB space-reserved LUNs. The total LUN size in this example is 40 GB. The total LUN size does not include LUNs that do not have space reservation enabled.
Determining the volume size when you do not need Snapshot copies

If you are not using Snapshot copies, the size of your volume depends on the size of the LUNs and whether you are using traditional or FlexVol volumes.

* Traditional volumes

If you are using traditional volumes, create a volume that has enough disks to accommodate the size of your LUNs. For example, if you need two 200-GB LUNs, create a volume with enough disks to provide 400 GB of storage capacity.
* FlexVol volumes

If you are using FlexVol volumes, the size of the FlexVol volume is the total size of all the LUNs in the volume.

ONTAP data protection methods and Snapshot copies

Before you determine that you do not need Snapshot copies, verify the method for protecting data in your configuration. Most data protection methods, such as SnapRestore, SnapMirror, SnapManager® for Microsoft Exchange or Microsoft SQL Server, SyncMirror®, dump and restore, and ndmpcopy, rely on Snapshot copies. If you are using any of these methods, calculate the amount of space required for their Snapshot copies.

Note
Host-based backup methods do not require additional space.
Calculating the amount of space for Snapshot copies

The amount of space you need for Snapshot copies depends on the following:

* Estimated Rate of Change (ROC) of your data per day.

The ROC is required to determine the amount of space you need for Snapshot copies and fractional overwrite reserve. The ROC depends on how often you overwrite data.
* Number of days that you want to keep old data in Snapshot copies. For example, if you take one Snapshot copy per day and want to save old data for two weeks, you need enough space for 14 Snapshot copies.

You can use the following guideline to calculate the amount of space you need for Snapshot copies:

Space for Snapshot copies = ROC in bytes per day * number of Snapshot copies
Example

You need a 20-GB LUN, and you estimate that your data changes at a rate of about 10 percent, or 2 GB each day. You want to take one Snapshot copy each day and want to keep three weeks' worth of Snapshot copies, for a total of 21 Snapshot copies. The amount of space you need for Snapshot copies is 21 * 2 GB, or 42 GB.
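The guideline above can be checked with a small Python sketch (a hypothetical helper, not a NetApp tool):

```python
def snapshot_space_gb(roc_gb_per_day, num_snapshots):
    """Space for Snapshot copies = ROC in GB per day * number of Snapshot copies."""
    return roc_gb_per_day * num_snapshots

# 20-GB LUN, ~10% daily change rate (2 GB/day), one copy per day kept for 21 days
print(snapshot_space_gb(2, 21))  # 42
```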
Calculating the fractional reserve

The fractional reserve setting depends on the following:

* Amount of time you need to enlarge your volume by either adding disks or deleting old Snapshot copies when free space is scarce.
* ROC of your data
* Size of all LUNs that will be stored in the volume

Example

You have a 20-GB LUN and your data changes at a rate of 2 GB each day. You want to keep 21 Snapshot copies. You want to ensure that write operations to the LUNs do not fail for three days after you take the last Snapshot copy. You need 2 GB * 3, or 6 GB of space reserved for overwrites to the LUNs. Thirty percent of the total LUN size is 6 GB, so you must set your fractional reserve to 30 percent.
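The same arithmetic as a Python sketch (the function name is mine, not NetApp's):

```python
def fractional_reserve(total_lun_gb, roc_gb_per_day, safety_days):
    """Overwrite reserve needed, and the fractional reserve as a percent of total LUN size."""
    overwrite_gb = roc_gb_per_day * safety_days
    percent = 100 * overwrite_gb / total_lun_gb
    return overwrite_gb, percent

# 20-GB LUN, 2 GB/day change rate, 3 days of write headroom after the last Snapshot copy
print(fractional_reserve(20, 2, 3))  # (6, 30.0)
```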
Calculating the size of a sample volume

The following example shows how to calculate the size of a volume based on the following information:

* You need to create two 50-GB LUNs.

The total LUN size is 100 GB.
* Your data changes at a rate of 10 percent of the total LUN size each day.

Your ROC is 10 GB per day (10 percent of 100 GB).
* You take one Snapshot copy each day and you want to keep the Snapshot copies for 10 days.

You need 100 GB of space for Snapshot copies (10 GB ROC * 10 Snapshot copies).
* You want to ensure that you can continue to write to the LUNs through the weekend, even after you take the last Snapshot copy and you have no more free space.

You need 20 GB of space reserved for overwrites (10 GB per day ROC * 2 days). This means you must set fractional reserve to 20 percent (20 GB = 20 percent of 100 GB).

Calculate the size of your volume as follows:

Volume size = Total LUN size + Amount of space for Snapshot copies + Space for overwrite reserve

The size of the volume in this example is 220 GB (100 GB + 100 GB + 20 GB).
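The whole worked example above can be condensed into one Python sketch (an illustrative helper, not a NetApp tool):

```python
def volume_size_gb(total_lun_gb, roc_gb_per_day, num_snapshots, overwrite_days):
    """Volume size = total LUN size + Snapshot copy space + overwrite reserve."""
    snapshot_space = roc_gb_per_day * num_snapshots      # e.g. 10 GB * 10 = 100 GB
    overwrite_reserve = roc_gb_per_day * overwrite_days  # e.g. 10 GB * 2  = 20 GB
    return total_lun_gb + snapshot_space + overwrite_reserve

# Two 50-GB LUNs, 10 GB/day ROC, 10 daily Snapshot copies, a weekend (2 days) of headroom
print(volume_size_gb(100, 10, 10, 2))  # 220
```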
How fractional reserve settings affect the total volume size

When you set the fractional reserve to less than 100 percent, writes to LUNs are no longer unconditionally guaranteed. In this example, writes to LUNs will not fail for about two days after you take your last Snapshot copy. You must monitor available space and take corrective action, by increasing the size of your volume or aggregate or by deleting Snapshot copies, to ensure you can continue to write to the LUNs.

Caution
If you do not actively monitor available space and the volume becomes full, writes to the LUN fail, the LUN goes offline, and your application might crash.

If you leave the fractional reserve at the default setting of 100 percent in this example, Data ONTAP sets aside 100 GB of reserve space. The volume size must then be 300 GB, which breaks down as follows:

* 100 GB for 100 percent fractional reserve
* 100 GB for the total LUN size (50 GB plus 50 GB)
* 100 GB for Snapshot copies

This means you initially need an extra 80 GB for your volume.
Space requirements for LUN clones

A space-reserved LUN clone requires as much space as the space-reserved parent LUN. If the clone is not space-reserved, make sure the volume has enough space to accommodate changes to the clone.
Changing the size of a FlexVol volume

After you calculate the initial size of a FlexVol volume and create LUNs, you can monitor available disk space to confirm that you estimated your volume size correctly, or increase the volume size depending on your application requirements. You can also define a space management policy to perform the following tasks:

* Automatically increase the size of the FlexVol volume when it begins to run out of space
* Automatically delete Snapshot copies when the FlexVol volume begins to run out of space

Subnet cheat sheet
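The original cheat-sheet image is not reproduced here, but Python's standard ipaddress module can generate the same numbers on demand:

```python
import ipaddress

# Print one cheat-sheet row per prefix length: netmask, address count, usable hosts
for prefix in range(24, 31):
    net = ipaddress.ip_network(f"192.168.0.0/{prefix}")
    usable = net.num_addresses - 2  # subtract the network and broadcast addresses
    print(f"/{prefix}  {net.netmask}  {net.num_addresses:>4} addresses  {usable:>3} hosts")
```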

SQL I/O Simulator

With this tool you can simulate SQL I/O without even installing SQL on your system.

http://support.microsoft.com/kb/231619

NFS Performance Issue

There was an issue found in Data ONTAP 7.2.3 and earlier where ONTAP would sometimes not release networking buffers fast enough, resulting in very poor performance. The main symptom of this issue can be seen by watching the output of "sysstat 1" over time and noting that the number of NFS ops will periodically drop to 0 for several seconds, and then resume normal activity.

Two other tell-tale signs are the "Total discards" and "No Buffers"
counters in the output of "ifstat -a". These are TCP or UDP packets that ONTAP had to discard because it did not have enough memory resources to handle them. If they are anything other than zero, there is a problem:

RECEIVE
 Frames/second:   21600  | Bytes/second:   28641k | Errors/minute:      0
 Discards/minute:   429  | Total frames:    1939k | Total bytes:     2564m
 Total errors:        0  | Total discards:    486 | Multi/broadcast:    0
 No buffers:        486  | Non-primary u/c:     0 | Tag drop:           0

You can read more about this BURT here:

http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=226424

This issue is fixed in Data ONTAP 7.2.4 and later.

Maximum data drives per 16-TB aggregate

With the aggregate size calculation changes present in Data ONTAP 7.3, you can include more data drives in an aggregate without exceeding the aggregate size limit.

The following table shows the maximum number of data drives that can be included in a 16-TB aggregate for Data ONTAP 7.3 and for previous releases.

Metrocluster and Cable distances

Although NetApp recommends that dedicated dark fiber be used for a MetroCluster configuration, WDM devices are supported. Refer to the Brocade Compatibility Guide at www.brocade.com for supported devices.

Stretch MetroCluster can support a maximum of 500 meters between nodes at a speed of 2Gbps. Fabric MetroCluster, through the use of Fibre Channel switches, extends this distance to 100km at the same speed. At 4Gbps speeds, these distances are roughly cut in half unless using the Brocade 300, 5000 or 5100, which leaves this maximum distance at 100km.

CABLE TYPE
As shown in Table 3, the cable type affects both distance and speed. Single-mode cable is supported only for the inter-switch links. Example 1: A customer has 250 meters between sites and wants to run at 4Gbps; the OM3 cable type is required. Example 2: A customer currently has a MetroCluster configuration running at 2Gbps over 300 meters of OM2 cabling and wants to upgrade to 4Gbps speeds. Upgrading the cabling will not help, because OM3 has a maximum of 270 meters at 4Gbps. In this case the choices would be:
• Remain at 2Gbps speeds. Customers with the new ESH4 disk shelves could still use them at this distance, as long as the shelf speed is set to 2Gbps.
• Test current optical network infrastructure to make sure that attenuation and latency are acceptable.



The maximum distance shown in the picture is typically due to the standard 1310nm SFPs. Use of high-power SFPs can extend this dark fiber up to 30km. Using 1550nm high-power SFPs, a distance of 70–100km can be achieved. This topic is discussed in much greater technical detail in the following technical reports:

MetroCluster Upgrade Planning Guide (TR-3517)
Optical Network Installation Guide (TR-3552)

There are four types of Small Form-factor Pluggables (SFPs) associated with the Fabric MetroCluster configuration. They are:

Short-Wavelength Laser (SWL) Short Wavelength Laser transceivers based on 850nm lasers are designed to transmit over short distances. This is the most common type of media and is the default on the Brocade 200E.

Long Wavelength Laser (LWL) Long Wavelength Laser transceivers may be based on 1310nm lasers. They are used for long distance native FC links. Generally, these media types are used with single-mode fiber cable.

Extended Long Wavelength Laser (ELWL) Extended Long Wavelength Laser transceivers may be based on 1550nm lasers. They are used to run native Fibre Channel connections over even greater distances than LWL media can support. Generally, these media types use single-mode fiber cable.

WDM Both coarse (CWDM) and dense (DWDM) SFP transceivers are commercially available for multi-wavelength channel transmission inside single mode fibers.

The type of SFP transceiver required is a function of the distance and the interconnect technology used. Table 2.4 summarizes the types and specifications for the SFP transceivers supported by the NetApp solution.

Wednesday, December 9, 2009

Enabling UNIX-like commands on Data ONTAP


There is a completely unsupported method to actually accomplish this.

1) Get to a command prompt

2) priv set advanced

3) java netapp.cmds.jsh

4) Run ls to list the contents

5) Now you can use rm to delete a qtree

6) Ctrl-C brings you back to the Data ONTAP command prompt

This works on the Simulators as well.




Example:


netapp1*> java netapp.cmds.jsh

jsh> ls

etc

home

source

.ha

stuff

jsh> ls -la

drwx------ 14 0 61440 Apr 12 2004 08:36:52 etc

drwxrwxrwx 2 0 4096 Nov 17 2003 07:39:40 home

drwxrwxrwx 5 0 4096 Mar 24 2004 13:21:21 source

dr-------- 2 0 4096 Feb 09 2004 10:02:18 .ha

drwxrwxrwx 5 0 4096 Mar 15 2004 14:12:47 stuff

jsh> cd etc

jsh> cat hosts

#Generated by setup Mon Nov 31 11:50:17 EST 2005

#Auto-generated by setup Fri Jan 23 14:12:57 GMT 2004

127.0.0.1 localhost

# 0.0.0.0 netapp1-ns1

192.168.99.10 filer1

192.168.99.11 netapp1

Hidden commands in the special boot menu of the filer


There are some hidden commands in the special boot menu of a filer. Below is one of these commands. The WAFL_check command (WAFL_check aggrname) checks the filer's file system for inconsistencies and corrects them when necessary.



Example:

Special boot options menu will be available.

NetApp Release 7.0.4P1: Mon Feb 27 14:36:15 PST 2006

Copyright (c) 1992-2006 Network Appliance, Inc.

Starting boot on Sat Mar 24 15:36:18 GMT 2007


(1) Normal boot.

(2) Boot without /etc/rc.

(3) Change password.

(4) Initialize all disks.

(4a) Same as option 4, but create a flexible root volume.

(5) Maintenance mode boot.

Selection (1-5)? WAFL_check aggr01


Sat Mar 24 15:38:15 GMT [wafl.vol.inconsistent:ALERT]: Aggregate aggr01 is inconsistent. Please contact NetApp Customer Support.

Sat Mar 24 15:38:15 GMT [raid.vol.replay.nvram:info]: Performing raid replay on volume(s)

Sat Mar 24 15:38:15 GMT [raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.

Sat Mar 24 15:38:15 GMT [raid.stripe.replay.summary:info]: Replayed 0 stripes.

Checking aggr01...

WAFL_check NetApp Release 7.0.4P1

Starting at Sat Mar 24 15:38:17 GMT 2007

Phase 1: Verify fsinfo blocks.

Phase 2: Verify metadata indirect blocks.

Phase 3: Scan inode file.

Phase 3a: Scan inode file special files.

Phase 3a time in seconds: 9

Phase 3b: Scan inode file normal files.

(inodes 5%)

(inodes 10%)

[...]

(inodes 97%)

(inodes 99%)


So instead of making a selection from 1 to 5, type the command and the aggregate check starts.

bootfs error

After an upgrade to Data ONTAP 7.2.1.1, a bootfs error can appear.



Starting with version 7.2.1.1, Data ONTAP includes new code that checks the integrity of the CompactFlash card and looks for partially written files. When this code finds an inconsistency in the file system on the CompactFlash, it reports a bootfs chkdsk error.


Here are the steps to resolve this issue:


filer>priv set advanced

filer>bootfs help info

* Take note of the name of the boot device.


filer>bootfs fdisk 1

For example: 0i.0 for name

This causes the boot device to be reformatted on the next download.


filer>download

Reformats and reloads your kernel on the boot device.


Note: You can use 'bootfs chkdsk "name"' to verify that the problem is corrected.

Stop those annoying console messages

Preventing console messages from interfering with troubleshooting



Stop those annoying on-screen console messages from interfering when you're working on the filer console.


Complete the following steps to create a working /etc/syslog.conf file that will only update /etc/messages.


Note:

This procedure assumes the default configuration where the /etc/syslog.conf does not exist.


Copy the following line into the CLI:


wrfile /etc/syslog.conf


Press the Return key.


Copy the following line into the CLI:


*.info /etc/messages


Press the Return key and CTRL+C to close the file.


At this point, the console messages should stop. To enable console messages again, rename the syslog.conf file from a CIFS/NFS host.


Caution: Using the wrfile /etc/syslog.conf command will cause all contents of the syslog.conf file to be overwritten. If syslogging has already been customized, and these customizations must be kept, please use NFS or CIFS to edit the syslog.conf file.

Autosupport throttling

A new "feature" in Data ONTAP 7.0.5 is a throttling option for sending AutoSupport messages. By default, only "error" AutoSupports are sent. So no more weekly logs and so on.


The option is:


autosupport.notify_threshold


Description:


For Data ONTAP 7.0.5 only. Specifies the minimum severity level of AutoSupport messages that customers want to receive. The available severity levels are: critical, error, warning, notice, info, debug.

The default for Data ONTAP 7.0.5 is to send critical and error messages only to the addresses specified in the autosupport.to and autosupport.noteto options. To revert the message delivery method to that of Data ONTAP 7.0.4 and earlier, change the value to "debug".


Mounting luns in a snapshot

A cool thing about snapshots and luns is the feature to create a lun clone from a snapshot. This can be used for testing purposes or for example for restoring files.


The command to create a clone is:


"toaster>lun clone create [clone_lunpath] [-o noreserve] -b [parent_lunpath] [parent_snap]"


Where:


[clone_lunpath] = the path to the lun clone

[-o noreserve] = do not space-reserve the clone (only for restores!)

-b [parent_lunpath] = The path to the source lun

[parent_snap] = The name of the snapshot

Netapp Filers and VMware ESX

Interested in running VMware ESX on NetApp filers using iSCSI, FCP, or even NFS (when performance is not an issue)? The NetApp documents below could shed some more light on the subject.


Network Appliance and VMware ESX Server 3.0 Storage Best Practices


Best Practices for VMware ESX Server 3.0 Backup on NetApp


This last document also has a sample script that can be used to snapshot volumes that contain virtual machines. There are also some .vbs scripts out there that interact directly with the VirtualCenter server (useful when using VMotion).


Finally, I would like to point you to a great website with some cool VMware info and tools; this site is worth a visit.

Xtravirt.com

Starting a packet trace

Every now and then it can be useful to capture a packet trace on the filer to help troubleshoot connectivity issues.


The following procedure explains how to start a packet trace:


Toaster>pktt start all (or only from a specific interface, pktt start e0a)


The next step is to dump the output to a file before stopping the trace.


Toaster>pktt dump all


When the trace is dumped to a file you can stop the packet trace from collecting data.


Toaster>pktt stop all (or stop only a specific interface, pktt stop e0a)


You can find the packet trace file in the filer's root volume (\\toaster\C$); it can be analyzed using a program like Wireshark.

Filer info script

Ever wanted to run a script against a filer that collects all relevant information?


I have! So I made a quick and dirty batch file that uses rsh to issue some commands and writes the output in a text file.


Since this script is not too complex or advanced, but still gets the job done, I would like to share it here.


Windows script:

filer_info v01.bat


Unix script:

filer_info v01.sh


Use the script with the following parameters:


filer_info v01.bat toastername root password textfilename
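A rough Python equivalent of that batch file might look like the sketch below. The command list and the rsh argument order are assumptions based on typical filer rsh usage, not the original script, so adjust them to your environment:

```python
import subprocess

# Commands to collect; this list is a guess at what the original script gathers.
COMMANDS = ["version", "sysconfig -a", "df -h", "vol status", "aggr status", "options"]

def rsh_argv(filer, user, command):
    """Build the argv for one remote command (rsh access must be enabled on the filer)."""
    return ["rsh", filer, "-l", user] + command.split()

def filer_info(filer, user, outfile):
    """Run each command over rsh and write the output to a text file."""
    with open(outfile, "w") as out:
        for command in COMMANDS:
            out.write(f"===== {command} =====\n")
            result = subprocess.run(rsh_argv(filer, user, command),
                                    capture_output=True, text=True)
            out.write(result.stdout)

# Example (requires a reachable filer): filer_info("toaster", "root", "toaster.txt")
```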

Support matrix tool

On the NOW site a new support matrix tool can be found.


Support Matrix



This saves some time digging through the pdf files.

Unable to resize volumes in FilerView on 7.0.5P6

There's a bug in Data ONTAP 7.0.5P6 and 7.0.6 that makes it impossible to resize a volume using the FilerView GUI.


Bug Detail


Netapp Bug ID 234826


Description:

After upgrading from 7.0.4 to 7.0.5P6, a volume cannot be resized from FilerView. FilerView will allow the user to go all the way to the confirm/commit screen and then show success, but there is no actual effect on the volume size after the resize.


The lack of growth can be verified in FilerView or with the CLI df command. Use the CLI.


This bug is first fixed in Data ONTAP 7.0.6P1.

How do I check the version of the Microsoft iSCSI Software initiator?

Have you ever tried to find out which version of the Microsoft iSCSI initiator has been installed on a host? And did you actually find it?


Check the driver version of the iSCSI initiator in the SCSI adapters section of Device Manager, write down the driver version, and look it up in the list below to find out which version of the initiator is installed.


iSCSI initiator version Driver Build
1.0 5.2.3790.198
1.01 5.2.3790.205
1.02 5.2.3790.215
1.03 5.2.3790.218
1.04 5.2.3790.243
1.04a 5.2.3790.244
1.05 5.2.3790.277
1.05a 5.2.3790.279
1.06 5.2.3790.302
2.0 5.2.3790.1653
2.01 5.2.3790.1748
2.02 5.2.3790.1895
2.03 5.2.3790.3099
2.04 5.2.3790.3273
2.05 5.2.3790.3392
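The table above is easy to wrap in a small lookup helper (illustrative Python, not a Microsoft tool):

```python
# Driver build (from Device Manager) -> Microsoft iSCSI initiator version
DRIVER_TO_VERSION = {
    "5.2.3790.198": "1.0",   "5.2.3790.205": "1.01",  "5.2.3790.215": "1.02",
    "5.2.3790.218": "1.03",  "5.2.3790.243": "1.04",  "5.2.3790.244": "1.04a",
    "5.2.3790.277": "1.05",  "5.2.3790.279": "1.05a", "5.2.3790.302": "1.06",
    "5.2.3790.1653": "2.0",  "5.2.3790.1748": "2.01", "5.2.3790.1895": "2.02",
    "5.2.3790.3099": "2.03", "5.2.3790.3273": "2.04", "5.2.3790.3392": "2.05",
}

print(DRIVER_TO_VERSION.get("5.2.3790.1895", "unknown"))  # 2.02
```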

Bug in upgrade from 7.2.1 to 7.2.3 that causes a panic

Data ONTAP panics when the disk qualification device table size exceeds a threshold limit of 300

When the qualification file qual_devices_v2 is uploaded to the /etc directory, or when Data ONTAP is upgraded from release 7.2.1 to release 7.2.3, the filer updates the internal device qualification table with the content of the qualification file.


If there are disks with down-rev firmware in the system, the filer will attempt to update the disks to a newer firmware revision. Due to a bug in Data ONTAP, the disk firmware update logic causes the filer to panic if the qualification file contains more than 300 drive records.


No other upgrade paths are affected.


BEST SOLUTION: If you are running Data ONTAP 7.2.1, upgrade to a Data ONTAP release that has a fix for both this bug, and bug 238702.


GOOD SOLUTION: Do not upgrade directly from Data ONTAP 7.2.1 to Data ONTAP 7.2.3; upgrade to Data ONTAP 7.2.2 first, then immediately upgrade to 7.2.3.


More information can be found here:


Bug 234290 on the NOW site

#Disabled# lines in the RC file

Due to Issues with FilerView Incorrectly Setting /etc/rc Parameters, Storage Controllers may have a Misconfigured rc File


If you have used FilerView to modify the network interface parameters, your storage controllers may have a misconfigured rc file. The issue is that FilerView incorrectly prepends a "#Disabled#" tag to the "ifconfig" line for the respective network interface. This will not be a problem until the storage controller is rebooted, at which point a loss of connectivity will be experienced on the respective network interface.


Please use the following procedure to verify if your rc file is configured properly:

* Check if any "ifconfig" line has a "#Disabled#" tag in your rc file.

* #Disabled# ifconfig e0a `hostname`-e0a netmask……mtu 1500


* Check if the status of the respective network interface, e0a in this example, is UP or not:

* Filer> ifconfig e0a

* e0a: flags=4858043 mtu 1500


If the respective network interface has "#Disabled#" and "UP" status, then your storage controller may be exposed to this problem.

Any changes made to the "Modify Network Interface" page such as changing IP address, netmask, broadcast address, media type, MTU size, trusted or WINS selection, etc. will result in this problem.


Product Affected


Data ONTAP® 7.2.1 through 7.2.3



Workaround:

Remove the #Disabled# tag by manually editing the /etc/rc file.


A fix for this issue is being developed and is presently scheduled for release in December. Check bug 238020 for the latest update.

Until the fix is applied, please use the above procedure to verify if the rc file is configured appropriately whenever FilerView is used for making changes to network parameters.


More Information

Bugs Online information (from NOW)


Bug Detail ID# 238020
Thought you bought a large enough disk? Probably not. Below are the actual disk sizes you get when you buy a disk from NetApp:


Raw GB Type Rightsize GB Available blocks
72 FCAL 68 139,264,000
144 FCAL 136 278,528,000
300 FCAL 272 557,056,000
250 SATA 212 432,901,760
320 SATA 274 561,971,200
500 SATA 423 866,531,584
750 SATA 635 1,301,618,176
1000 SATA 847 1,735,794,176
144 SAS 136 278,528,000
300 SAS 272 557,056,000

Keep these numbers in mind when creating new aggregates. These numbers do not include the 10% WAFL overhead taken when you create an aggregate or volume, nor the default 20% snap reserve for volumes and 5% snap reserve for aggregates.


The 5% snap reserve on aggregates is only used when you have a MetroCluster setup, so if you don't have a MetroCluster setup, disable this reservation to free up some space.


The 16-TB maximum aggregate size is still there and includes the parity disks. You can compare the raw size in KB against 17,179,869,184 KB to make your size calculations.
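As an illustration of that calculation, the sketch below estimates how many drives of a given right-sized capacity fit under the 16-TB limit. It treats the available blocks from the table above as 512-byte blocks and does not count parity drives, so treat the result as a rough upper bound rather than NetApp's official figure:

```python
AGGR_LIMIT_KB = 17_179_869_184  # the 16-TB aggregate size limit, in KB

def max_drives(available_blocks):
    """Rough count of drives fitting in one 16-TB aggregate (parity not counted)."""
    rightsize_kb = available_blocks // 2  # 512-byte blocks -> KB
    return AGGR_LIMIT_KB // rightsize_kb

# 300-GB FCAL drive: right-sizes to 272 GB = 557,056,000 available blocks
print(max_drives(557_056_000))  # 61
```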


Thanks to Filer Jedi.com for the information about disk sizes.


More on disk sizes and configuration limits can be found in the Storage Management Guide from Netapp:

Storage Management Guide on the NOW site

Converting and Making a Secondary qtree Read/Write

The following steps need to be performed to convert an OSSV or SnapVault secondary backup destination to a usable/writable destination (typically for DR situations):


1. Secondary: Turn SnapMirror and SnapVault off.


2. Secondary: Switch to privileged mode (priv set diag).


3. Secondary: Convert SnapVault qtree to SnapMirror qtree (snapmirror convert ).


4. Secondary: Turn SnapMirror on.


5. Secondary: Quiesce the qtree.


6. Secondary: Break the mirror, making it writable.


7. Secondary: Turn SnapVault on.

Getting slow CIFS response on a filer

Doing a lot of snapshot operations on a filer can slow down the CIFS response even when sysstat -x 1 doesn't report a busy filer.


Check for high disk utilization with a perfstat; if you find it, check your snapshot/SnapMirror schedule.

When you are doing a lot of snapshot operations, for example on all volumes every hour, the Container Block Reclaim scanner on the filer could drain the filer's performance.


The Container Block Reclaim (CBR) scanners run on each volume after each snapshot deletion to return space back to the aggregate. In other words, there is a direct relation between the performance degradation and the snapshot schedules.


By default the CBR scan process runs at a priority of 2000. By setting this option to 1, you can check whether this solves any performance issues with, for example, CIFS.


How to set the CBR scan priority to 1:

toaster>priv set diag

toaster*>wafl scan speed 1


You can set it back to its default by entering "wafl scan speed 0"


Note: You should not leave this process at a priority of 1; this is only useful to check whether the CBR scan process is draining your filer's performance.

Migrate many to one

Migrating volumes from several filers to one new filer and making every old volume a qtree on the new volume.


You can do this by using the snapmirror command in the following way:


snapmirror initialize -S systemA:/vol/vol0/- systemB:/vol/vol1/systemA-vol0


Wait for it to finish! After this you can break off the SnapMirror relationship and do the next volume!

FAS6030 comes up on previous version of Data ONTAP after a NDU

After doing a Non-Disruptive Upgrade (NDU) on a clustered filer, it may come up running the previous version of Data ONTAP after the system reboot.



Workaround


1) download new kernel on node1

2) do 'cf takeover' on node2. node1 will do a shutdown and reboot.

3) Reboot node1; it will go to 'Waiting for Giveback'. Check the console to determine whether it booted the correct kernel. If it did, proceed to step 7.

4) Press Ctrl-C and go to the Maintenance Menu

5) Halt

6) At the loader, type boot_primary

7) At this point, the node should be booted on the correct release and be at 'Waiting for Giveback'

8) On node2, do 'cf giveback'

9) node1 will complete booting.


Note! This can also happen on a FAS3070

Space calculations

The link below points to the chapter about calculating the space requirements for a volume in the NetApp Block Access Management Guide.


This chapter explains in detail how to calculate the required space and why you need things like space reservation.


Calculating the size of a volume


Please be aware that this link requires a NOW site login

Undocumented options

As many of you filer admins know, there are a lot of undocumented options on a filer. Here are some I discovered recently.


lun.throttle.enable

Enable or disable the SCSI target throttling code completely. This is the mechanism that prevents any one host system from monopolizing all of the filer's SAN resources and starving out other hosts.


lun.throttle.log_interval

Number of minutes to wait between syslogs/snmp traps when a throttle has been exceeded. max is 1440 (one day)

min is 0 which disables logging/trapping completely. default is 60 (one hour).


lun.throttle.percent_hba

Percentage of a SCSI target HBA's command block pool to use for throttling. Setting this to greater than 96 will probably prevent throttling from working, because the HBA will begin sending QUEUE_FULL on its own without notifying ONTAP. Setting it to less than 96 will probably just waste resources on the HBA.


lun.throttle.percent_qfull

percentage of command block pool to respond to with QUEUE_FULL when a throttle is exceeded.


Not all these options are useful in every case, but at least now they are documented in some way :-)

SCSI disk timeout settings required for Guest OS which runs only on VMware ESX NFS Datastores

NetApp has recently published a document which explains how and why to set higher SCSI disk timeout settings on VMware virtual machines that run on VMware ESX NFS datastores.


Symptoms:


During cluster failover on NetApp FAS storage systems, the SCSI disk timeout needs to be raised in the guest OS running on NFS datastores in order to sustain the longer failover time.


Solution:


On Windows Guest OS:


* Change the Registry Parameter DiskTimeOutValue in Windows Guest Operating System:


HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue to 190 (decimal).


For the complete document and other operating systems:

Netapp Solution ID: kb37986 (Requires NOW site login)

Enable Negotiated cluster failover on NIC failure

In FAS clusters, the default is that a single network card failure will not result in a cluster failover.


You can change this by enabling NICs for Negotiated Failover in the RC file.


Here's how to do this:


First set the cf option "cf.takeover.on_network_interface_failure" to on, next set the policy type:

cf.takeover.on_network_interface_failure.policy any_nic (default is all_nics)


When these options are enabled, you can specify through the ifconfig command that negotiated failover should be enabled for certain NICs by using the "nfo" parameter:


ifconfig e0a 192.168.100.20 netmask 255.255.255.0 nfo


Be aware that enabling the "cf.takeover.on_network_interface_failure" option when the option cf.giveback.auto.enable is also set to on could result in an infinite loop of takeovers and givebacks. This will continue until either cf.giveback.auto.enable is disabled or the network is healed.