
> Date: Tue, 31 Jan 2012 08:29:57 -0500
> From: Dan Brisson
> Subject: Cisco APs losing CAPWAP session
>
> I'm curious if any Cisco users out there are experiencing or have
> experienced what we're seeing on our campus. This past summer we
> installed 3502i's in all of our residence halls - approximately 500
> total. Ever since the students have moved in, we will get messages from
> WCS stating that "AP XYZ" is down and disassociated from the
> controller. When I check out the AP, the uptime is fine, but the
> "CAPWAP join time" is for like 30 seconds, or however long it took me to
> check.
>
> We've tracked this and it is totally random as to what AP will drop,
> which makes troubleshooting this very tough. The log on the AP isn't
> helpful. I'm working with TAC who suggests that keepalives are getting
> missed. I'm not sure why that would be the case since we have another
> 500 or so APs on the admin side that very rarely drop. Adding to that,
> when the students left for break, the AP drops stopped. They came back,
> and sure enough, the drops start up again.
>
> I will say that the AP always joins back immediately, but for the time
> that it does drop A) I'm sure connectivity is affected in that area and
> B) we get an email.
>
> Anyone experiencing this?

Wow, Deja vu! I had almost exactly the same problem a few years ago and it nearly drove me nuts. It turned out to be unrelated to the wireless.

The wired network switches in the dorms were configured for dynamic vlan steering based upon a response from a radius server. The radius server would randomly glitch and return the "wrong" vlan for one or more of the ports that the wireless access points were plugged into, which would sever the connection between the AP and controller.

I pulled most of my hair out before finally figuring it out by sniffing the radius queries and responses and meticulously matching them up and "Aha!!". You really remember the problems that leave skid marks across your backside! :-)

--
Earl Barfield -- Academic & Research Tech / Information Technology
Georgia Institute of Technology, Atlanta Georgia, 30332
Internet: Earl.Barfield@oit.gatech.edu earl@gatech.edu

**********
Participation and subscription information for this EDUCAUSE Constituent Group discussion list can be found at http://www.educause.edu/groups/.
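The matching Earl describes (pairing each RADIUS response with the port it applies to and checking the returned VLAN) could be sketched roughly as follows. This is only an illustration in Python: the record layout (`port`, `vlan`), the port names, and the VLAN numbers are hypothetical, and it assumes the responses have already been decoded from a capture by some other tool.

```python
def wrong_vlan_responses(responses, expected_vlan_by_port):
    """Return the responses whose VLAN differs from the one expected for that port.

    responses: iterable of dicts with hypothetical keys "port" and "vlan".
    expected_vlan_by_port: dict mapping an AP switchport name to its intended VLAN.
    Ports that are not listed (non-AP ports) are ignored.
    """
    bad = []
    for resp in responses:
        expected = expected_vlan_by_port.get(resp["port"])
        if expected is not None and resp["vlan"] != expected:
            bad.append(resp)
    return bad


# Hypothetical data: Gi0/5 and Gi0/7 are AP ports that should land in VLAN 200.
AP_PORTS = {"Gi0/5": 200, "Gi0/7": 200}
DECODED = [
    {"port": "Gi0/5", "vlan": 200},
    {"port": "Gi0/7", "vlan": 310},  # the "wrong" VLAN for an AP port
    {"port": "Gi0/9", "vlan": 310},  # not an AP port, so not checked
]

for resp in wrong_vlan_responses(DECODED, AP_PORTS):
    print(f"{resp['port']}: got VLAN {resp['vlan']}, expected {AP_PORTS[resp['port']]}")
```

Running a check like this over a long capture makes an intermittent wrong answer stand out without matching queries and responses by hand.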

Comments

This is what we see on the AP at the time the AP disjoins:

*Feb 1 14:16:25.174: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to 10.246.207.214:5246
*Feb 1 14:16:25.227: %WIDS-5-DISABLED: IDS Signature is removed and disabled.
*Feb 1 14:16:25.227: %CAPWAP-5-CHANGED: CAPWAP changed state to DISCOVERY
*Feb 1 14:16:25.227: %CAPWAP-5-CHANGED: CAPWAP changed state to DISCOVERY
*Feb 1 14:16:25.293: %LINK-5-CHANGED: Interface Dot11Radio0, changed state to administratively down
*Feb 1 14:16:25.293: %LINK-5-CHANGED: Interface Dot11Radio1, changed state to administratively down
*Feb 1 14:16:25.299: %LINK-5-CHANGED: Interface Dot11Radio0, changed state to reset
*Feb 1 14:16:25.309: status of voice_diag_test from WLC is false
*Feb 1 14:16:25.309: %LINK-3-UPDOWN: Interface Dot11Radio1, changed state to up
*Feb 1 14:16:25.318: %LINK-3-UPDOWN: Interface Dot11Radio0, changed state to up

-dan

Dan Brisson
Network Engineer
University of Vermont
(Ph) 802.656.8111
dbrisson@uvm.edu

On 2/1/2012 10:30 AM, Mike Goebel wrote:
> Dan, have you tried logging into the AP itself and checking the logs
> by chance?
>
> Mike
>
> On 2/1/2012 10:03 AM, Dan Brisson wrote:
>> It does seem as though I've grabbed some folks attention. I sure hope it
>> turns out to not be something simple. :)
>>
>> I could certainly try moving the APs around...easy enough to do,
>> although from what we've seen, the pattern of AP drops is so totally
>> random, hard to say if I'll see anything. At this point though, it's
>> worth a shot.
>> All interfaces clean and no QoS in place.
>>
>> Not sure if this will come through for everyone, but here's an example
>> of what I see after an AP drops. This is from the controller, on the
>> General tab for an AP:
>>
>>
>>
>> Thanks,
>> -dan
>>
>> Dan Brisson
>> Network Engineer
>> University of Vermont
>> (Ph) 802.656.8111
>> dbrisson@uvm.edu
>>
>>
>> On 2/1/2012 9:26 AM, Garry Peirce wrote:
>>> I think you have some of us all getting curious! ;-)
>>>
>>> Could you put a historically stable admin AP onto the 5508 and vice-versa to
>>> see if behaviors change?
>>> Do we assume that all switchports in the path are showing they're running
>>> clean?
>>> Any QoS config in place on the switches?
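With roughly 500 APs dropping at random, pulling the disjoin time out of each AP log by hand gets tedious. A minimal Python sketch for extracting the moment an AP falls back to CAPWAP DISCOVERY from log lines shaped like the ones Dan posted above is shown here. The regular expression is written only against those sample lines, and because the log carries no year, the `year` parameter is an assumption supplied by the caller.

```python
import re
from datetime import datetime

# Matches tagged lines such as:
# *Feb 1 14:16:25.227: %CAPWAP-5-CHANGED: CAPWAP changed state to DISCOVERY
LOG_RE = re.compile(
    r"^\*(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d{3}): "
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z0-9_]+): "
    r"(?P<text>.*)$"
)


def parse_line(line, year=2012):
    """Return a dict for one tagged log line, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
    return {
        "time": when,
        "facility": m["facility"],
        "severity": int(m["severity"]),
        "mnemonic": m["mnemonic"],
        "text": m["text"],
    }


def capwap_discovery_times(lines, year=2012):
    """Distinct timestamps at which the AP changed state to CAPWAP DISCOVERY."""
    times = []
    for line in lines:
        rec = parse_line(line, year)
        if rec and rec["facility"] == "CAPWAP" and rec["text"].endswith("DISCOVERY"):
            if rec["time"] not in times:  # the sample log repeats this line
                times.append(rec["time"])
    return times


SAMPLE = [
    "*Feb 1 14:16:25.174: %DTLS-5-SEND_ALERT: Send FATAL : Close notify Alert to 10.246.207.214:5246",
    "*Feb 1 14:16:25.227: %CAPWAP-5-CHANGED: CAPWAP changed state to DISCOVERY",
    "*Feb 1 14:16:25.227: %CAPWAP-5-CHANGED: CAPWAP changed state to DISCOVERY",
    "*Feb 1 14:16:25.309: status of voice_diag_test from WLC is false",
]

for t in capwap_discovery_times(SAMPLE):
    print(t.isoformat())
```

The resulting timestamps are in whatever clock the AP uses, so they need to be converted before being compared with switch or DHCP server logs kept in local time.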
Ok, thanks for validating.  It also seemed a bit strange to me and yes, I checked a bunch of APs that haven't dropped recently and they all showed 10-12ms.

One thing that occurred to me is we are doing DHCP snooping and Dynamic ARP Inspection on the 3560Xs.  That is unique to this part of campus as we haven't yet rolled that out to the entire admin side.

Thanks,
-dan


Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu
On 2/1/2012 10:23 AM, Mike King wrote:


Message from iam@st-andrews.ac.uk

Stupid it may be as a question, but are your two (or more) DHCP servers handing out the same length leases?

You may also need to tweak the DAI settings. I know we had to change the default thresholds when we first deployed it.

 

--

Ian

 

 

From: The EDUCAUSE Wireless Issues Constituent Group Listserv [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Dan Brisson
Sent: 01 February 2012 16:35
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: Re: [WIRELESS-LAN] Cisco APs losing CAPWAP session

 

Ok, thanks for validating.  It also seemed a bit strange to me and yes, I checked a bunch of APs that haven't dropped recently and they all showed 10-12ms.

One thing that occurred to me is we are doing DHCP snooping and Dynamic ARP Inspection on the 3560Xs.  That is unique to this part of campus as we haven't yet rolled that out to the entire admin side.

Thanks,
-dan



Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu


On 2/1/2012 10:23 AM, Mike King wrote:

 

Message from ceyre@mtroyal.ca

Dan,

What time is your controller showing? From your log messages it looks like it's 2:16pm there? APs have issues when the time is off between the controller and the APs.

Craig Eyre
Network Analyst
IT Services Department
Mount Royal University
4825 Mount Royal Gate SW
Calgary AB T2P 3T5

P. 403.440.5199
E. ceyre@mtroyal.ca

"The difference between a successful person and others is not a lack of strength, not a lack of knowledge, but rather in a lack of will." Vincent T. Lombardi

Message from me@mpking.com

Ooooh...

DAI?

I have had some bad experiences with DAI and some crappy low-end printers (to the point where they go in their own VLAN without DAI), and with some very expensive video conferencing units.

My switch vendor also had a DAI bug (over 2 years ago now, and resolved pretty quickly) where DHCP renewals wouldn't update the DHCP snooping table, but Discovers would.

I'd recommend turning off DAI and see if the problem disappears.

Mike


Good question. Turns out the APs use UTC time, which appears to be correct:

AP#sh clock
*17:29:03.737 UTC Wed Feb 1 2012

-dan

Dan Brisson
Network Engineer
University of Vermont
(Ph) 802.656.8111
dbrisson@uvm.edu

On 2/1/2012 12:11 PM, Craig Eyre wrote:
> Dan,
>
> What time is your controller showing? From your log messages it looks like
> its 2:16pm there? AP's have issues when the time is off between the
> controller and the ap's.
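The apparent "2:16pm" is just the AP logging in UTC. A small Python sketch of the conversion follows, assuming Vermont is on EST (UTC-5) in February with no daylight saving in effect. The fixed offset is an assumption made to keep the example free of any time zone database dependency.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EST = timezone(timedelta(hours=-5), "EST")  # assumed: Vermont in February, no DST


def to_local(utc_naive, tz):
    """Interpret a naive timestamp as UTC and convert it to the given zone."""
    return utc_naive.replace(tzinfo=UTC).astimezone(tz)


ap_disjoin = datetime(2012, 2, 1, 14, 16, 25)  # from the AP log
ap_clock = datetime(2012, 2, 1, 17, 29, 3)     # from "sh clock"

print(to_local(ap_disjoin, EST).strftime("%H:%M %Z"))  # 09:16 EST
print(to_local(ap_clock, EST).strftime("%H:%M %Z"))    # 12:29 EST
```

So the disjoin in the log happened at about 9:16 AM local time, and the `sh clock` output corresponds to about 12:29 PM local, which fits a reply written shortly after Craig's 12:11 PM question.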

Dan,

A small but important point to verify re: QoS.

By ‘no QoS in place’, does that mean the global  ‘mls qos’ is NOT configured on the resHall switches or that no specific QoS config has been configured? 

For example, if ‘mls qos’ is enabled globally, all ports (including AP ports) are untrusted by default and all packet markings are rewritten to 0.

Also, any QoS/service policies on the relevant router’s interfaces?

 

Given the L2 functions you mention are unique to the ResHall side, I’d disable them on the ports used by the APs.

I wouldn’t expect these L2 security functions to be needed on known AP ports and removing them might provide further insight on the issue.

Could DAI be err-disabling the AP ports because the ARP pps threshold is being exceeded (which would be odd)? Is errdisable auto-recovery enabled for DAI?

Any log data from the switch of an affected AP?

 

 

 


 

Dan,

Do you have the APs in public subnets or private subnets? Occasionally we see this problem happening in our environment as well. Currently we put APs in public subnets. I worked with Cisco TAC on this and we could not find anything. Finally TAC suggested we move APs to private subnets and we are considering that.

---
Dennis Xu
Network Analyst, Computing and Communication Services
University of Guelph
5198244120 x 56217
Private management space has not helped us at all. If no reason was given for the suggestion that you move to private space, it sounds like straws are being grasped at. We have been on private space for quite a while for AP management, switch management, and a number of other uses where the hosts have no real need to reach the Internet. It has saved us thousands of public IP addresses and has other benefits, but it has zero to do with somehow exorcising CAPWAP demons.
Ah right, yes, 'mls qos' is NOT configured on any of the 3560X switches.

We used DAI and DHCP snooping b/c the majority of APs are actually in student rooms due to, well, no other place really to put them.  :)  Now that you've brought that up, though, we had to go in the ceiling in one of our newer, bigger complexes.  I'm going to turn off DAI and IP Source verify there and see if the drops stop.

Will let folks know what I find.

Thanks!
-dan


Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu
On 2/1/2012 12:41 PM, Garry Peirce wrote:

Dan,

A small but important point to verify re: QoS.

By ‘no QoS in place’, does that mean the global  ‘mls qos’ is NOT configured on the resHall switches or that no specific QoS config has been configured? 

Ex. if the global ‘mls qos’ is enabled, all ports (including APs) would be untrusted by default with all packet markings remarked as 0.

Also, any QoS/service policies on the relevant router’s interfaces?

 

Given the L2 functions you mention are unique to the ResHall side, I’d disable them on the ports used by the APs.

I wouldn’t expect these L2 security functions to be needed on known AP ports and removing them might provide further insight on the issue.

DAI disabling AP ports due to ARP pps threshold (odd)? Is errdisable auto-recovery of DAI enabled?

Any log data from the switch of an affected AP?

 

 

 


 

For those following this thread, while we still haven't determined the exact cause of this problem, we also have not had a drop from an AP where we turned off DAI and IP source verify.  Seems logical that one (or both) of those could be causing problems, although what is not entirely logical is how that would be related to load/activity, since the APs never drop when the students are gone on break.

TAC has been very responsive and at this point I've asked if we can get someone from the Switching side to look at the possibility that DAI and/or IP Source verify could be the cause.

-dan


Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu
On 2/1/2012 2:38 PM, Dan Brisson wrote:
Ah right, yes, 'mls qos' is NOT configured on any of the 3560X switches.

We used DAI and DHCP snooping b/c the majority of APs are actually in student rooms due to, well, no other place really to put them.  :)  Now that you've brought that up, though, we had to go in the ceiling in one of our newer, bigger complexes.  I'm going to turn off DAI and IP Source verify there and see if the drops stop.

Will let folks know what I find.

Thanks!
-dan


Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu

 

Dan,
 
If you extend the DHCP lease duration of the APs and re-enable DAI and IP SV, what happens to the interval between lost CAPWAP sessions? Perhaps at DHCP renewal time, DAI and IP SV introduce enough of a delay to cause a problem with CAPWAP. With the students gone and activity on an AP low, CAPWAP is OK, but under load CAPWAP is catching it.

When an AP does the CAPWAP "dance", what does the lease time look like in the snooping database? Does it look like it was just renewed?

Last but not least, are you writing your database to the local flash of the switch or to an FTP server? Do writes of the database correspond with the CAPWAP loss? What does the switch CPU look like at the time, i.e. do you have ACLs running that are pushing work out of the ASICs and onto the CPU?
 
Jeff
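One way to test the renewal-timing idea is to compute when an AP would be expected to renew and check whether observed drops land near those instants. The Python sketch below assumes the client renews at T1 = 50% of the lease (the RFC 2131 default) and that every renewal succeeds immediately and restarts the lease. The lease start, the 8-hour lease length used in the example (the figure Dan gives later in the thread), and the 30-second window are illustrative assumptions.

```python
from datetime import datetime, timedelta


def renewal_times(lease_start, lease, count):
    """First `count` expected renewal instants, assuming renewal at T1 = lease / 2
    and that each renewal succeeds at once and restarts the lease."""
    t1 = lease * 0.5
    return [lease_start + t1 * (i + 1) for i in range(count)]


def near_renewal(drop, renewals, window=timedelta(seconds=30)):
    """True if the drop falls within `window` of any expected renewal."""
    return any(abs(drop - r) <= window for r in renewals)


start = datetime(2012, 2, 6, 0, 0, 0)                     # hypothetical lease start
renewals = renewal_times(start, timedelta(hours=8), 3)    # 04:00, 08:00, 12:00

print(near_renewal(datetime(2012, 2, 6, 8, 0, 10), renewals))  # True
print(near_renewal(datetime(2012, 2, 6, 9, 30, 0), renewals))  # False
```

If extending the lease to 24 hours stretches the gap between drops proportionally, that would support the renewal theory; if the drops stay random, it points elsewhere.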

>>> On Monday, February 06, 2012 at 6:47 AM, in message <4F2FE7EC.6060502@uvm.edu>, Dan Brisson <dbrisson@UVM.EDU> wrote:
For those following this thread, while we still haven't determined the exact cause of this problem, we also have not had a drop from an AP where we turned off DAI and IP source verify.  Seems logical that one (or both) of those could be causing problems, although what is not entirely logical is how that would be related to load/activity, since the APs never drop when the students are gone on break.

TAC has been very responsive and at this point I've asked if we can get someone from the Switching side to look at the possibility that DAI and/or IP Source verify could be the cause.

-dan


Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu

If it is DAI, I’d expect your switch logs to show the AP ports being disabled (and I assume you must have auto-recovery enabled).

Perhaps somehow hitting DAI’s default inbound ARP pps limit?

If so, the question may be why the switch would see more than the default 15 pps of ARP traffic inbound from lightweight APs.

If there are no DAI entries in the switch log, then it might be source guard dropping packets more silently for some reason.

You might try configuring one or the other feature to help determine which it is.

 

While working without issue now, is DHCP snooping (alone) still configured on the AP ports?
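To check whether an AP port could plausibly exceed an ARP rate limit, one option is to capture ARP on that port and count packets per second. The Python sketch below buckets packet timestamps by whole second and reports the seconds that exceed a limit. The timestamps are made up for illustration, and the bucketing is only an approximation of how the switch itself measures the rate.

```python
from collections import Counter


def seconds_over_limit(arp_timestamps, limit_pps=15):
    """Map each whole second whose ARP packet count exceeds limit_pps to that count.

    arp_timestamps: capture timestamps in seconds (floats), e.g. exported
    from a packet capture of ARP traffic arriving on the AP's switchport.
    """
    per_second = Counter(int(ts) for ts in arp_timestamps)
    return {sec: n for sec, n in sorted(per_second.items()) if n > limit_pps}


# Hypothetical capture: 16 ARP packets in second 100, 5 in second 101.
capture = [100 + i / 16 for i in range(16)] + [101 + i / 5 for i in range(5)]

print(seconds_over_limit(capture))                 # {100: 16}
print(seconds_over_limit(capture, limit_pps=70))   # {}
```

An empty result at the configured limit suggests the rate limiter is not what is taking the port down.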

 

 


 

Garry,

We actually did bump up the ARP pps limit right from the start.  We found out pretty quickly we needed to do that when we enabled it on the wired ports before we installed wireless.  We've set that to:

ip arp inspection limit rate 70

I did find a switch that had a DAI drop in the log, but it didn't correlate with when the AP went down and even further, the port that was reported was not the port the AP was connected to that dropped.  So much for the DAI theory, but yeah, IP Source Verify is perhaps silently dropping it, for whatever reason.

DHCP snooping is still enabled on the switches where we disabled DAI and IP Source Verify.

Thanks,
-dan

Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu
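The check Dan describes, whether a DAI drop in the switch log lines up with an AP disjoin on the same port at about the same time, can be sketched in Python as below. The record layout, port names, timestamps, and the one-minute window are all hypothetical and exist only to show the matching logic.

```python
from datetime import datetime, timedelta


def correlated_events(dai_drops, ap_disjoins, window=timedelta(minutes=1)):
    """Pair each DAI drop with any AP disjoin on the same port within `window`.

    Both inputs are lists of dicts with hypothetical keys "port" and "time".
    """
    return [
        (drop, disjoin)
        for drop in dai_drops
        for disjoin in ap_disjoins
        if drop["port"] == disjoin["port"]
        and abs(drop["time"] - disjoin["time"]) <= window
    ]


# Hypothetical events: the DAI drop is on a different port and hours away.
dai_drops = [{"port": "Gi0/12", "time": datetime(2012, 2, 6, 3, 40, 0)}]
ap_disjoins = [{"port": "Gi0/5", "time": datetime(2012, 2, 6, 9, 16, 25)}]

print(correlated_events(dai_drops, ap_disjoins))  # []
```

An empty result across many disjoins, as in Dan's case, argues against DAI err-disabling the AP ports and leaves a silent drop by IP Source Verify as a candidate.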
On 2/6/2012 3:17 PM, Garry Peirce wrote:

If DAI, I’d expect your switch logs would show the AP ports being disabled (and assume you must have auto-recovery enabled)

Perhaps somehow hitting DAI’s ARP default inbound pps limit?

If so, question may be why the switch would see more than (default) 15 pps of ARP traffic inbound from lightweight APs.

 

If no DAI entries on the switch, then might be source guard more silently dropping packets for some reason.

You might try configuring one/or the other feature to help determine which it is.

 

While working without issue now, is DHCP snooping (alone) still configured on the AP ports?

 

 

From: The EDUCAUSE Wireless Issues Constituent Group Listserv [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Dan Brisson
Sent: Monday, February 06, 2012 9:47 AM
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: Re: [WIRELESS-LAN] Cisco APs losing CAPWAP session

 

For those following this thread, while we still haven't determined the exact cause of this problem, we also have not had a drop from an AP where we turned off DAI and IP source verify.  Seems logical that one (or both) of those could be causing problems, although what is not entirely logical is how that would be related to load/activity, since the APs never drop when the students are gone on break.

TAC has been very responsive and at this point I've asked if we can get someone from the Switching side to look at the possibility that DAI and/or IP Source verify could be the cause.

-dan



Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu


On 2/1/2012 2:38 PM, Dan Brisson wrote:

Ah right, yes, 'mls qos' is NOT configured on any of the 3560X switches.

We used DAI and DHCP snooping b/c the majority of APs are actually in student rooms due to, well, no other place really to put them.  :)  Now that you've brought that up, though, we had to go in the ceiling in one of our newer, bigger complexes.  I'm going to turn off DAI and IP Source verify there and see if the drops stop.

Will let folks know what I find.

Thanks!
-dan



Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu


On 2/1/2012 12:41 PM, Garry Peirce wrote:

Dan,

A small but important point to verify re: QoS.

By ‘no QoS in place’, does that mean the global  ‘mls qos’ is NOT configured on the resHall switches or that no specific QoS config has been configured? 

Ex. if the global ‘mls qos’ is enabled, all ports (including APs) would be untrusted by default with all packet markings remarked as 0.

Also, any QoS/service policies on the relevant router’s interfaces?

 

Given the L2 functions you mention are unique to the ResHall side, I’d disable them on the ports used by the APs.

I wouldn’t expect these L2 security functions to be needed on known AP ports and removing them might provide further insight on the issue.

DAI disabling AP ports due to ARP pps threshold (odd)? Is errdisable auto-recovery of DAI enabled?

Any log data from the switch of an affected AP?

 

 

 

From: The EDUCAUSE Wireless Issues Constituent Group Listserv [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Dan Brisson
Sent: Wednesday, February 01, 2012 11:35 AM
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: Re: [WIRELESS-LAN] Cisco APs losing CAPWAP session

 

Ok, thanks for validating.  It also seemed a bit strange to me and yes, I checked a bunch of APs that haven't dropped recently and they all showed 10-12ms.

One thing that occurred to me is we are doing DHCP snooping and Dynamic ARP Inspection on the 3560Xs.  That is unique to this part of campus as we haven't yet rolled that out to the entire admin side.

Thanks,
-dan




Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu


On 2/1/2012 10:23 AM, Mike King wrote:

 

Jeff,

Thanks for the email.  I have an email in to our DHCP folks to get them to extend the lease time to 24 hours (it's currently set to 8).  That being said, I looked in our DHCP server log and found that for an AP that lost its CAPWAP session, right at the time it was lost, the DHCP log shows 3 DHCP release messages from the AP, then 2 seconds later it shows a DHCP Discover message and then 1 second later the DHCP Offer, Request, and ACK.  So the span of time from the last DHCP release until DHCP Ack is 3 seconds according to the DHCP log.

The key here seems to be that the AP, for some reason, felt it had to release its IP and then go through the entire DHCP process again.  I can understand that throwing the CAPWAP session for a loop.  TAC has said that this is most likely a consequence of some sort of network disruption, rather than the actual cause.

All that being said, I would think the DHCP snooping database would show it just being renewed, but I'll check on the next one.

We do not write our database to the local flash or an FTP server.

Thanks,
-dan


Dan Brisson Network Engineer University of Vermont (Ph) 802.656.8111 dbrisson@uvm.edu
On 2/6/2012 2:46 PM, Jeffrey Sessler wrote:
Dan,
 
If you extend the DHCP lease duration of the APs and re-enable DAI and IP SV, what happens to the interval between lost CAPWAP sessions? Perhaps at DHCP renewal time, DAI and IP SV introduce enough of a delay to cause a problem with CAPWAP. When the students are gone and activity on an AP is low, CAPWAP is OK, but under load CAPWAP is catching it.

When an AP does the CAPWAP "dance", what does the lease time look like in the snooping database? Does it look like it was just renewed?

Last but not least, are you writing your database to the local flash of the switch or to an FTP server? Do writes of the database correspond with the CAPWAP loss? What does the switch CPU look like at the time, i.e., do you have ACLs running that are pushing work out of the ASICs and onto the CPU?
 
Jeff
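
The questions above map roughly onto a handful of exec-mode commands. This is only a sketch, assuming a Catalyst 3560-class switch running DHCP snooping; output formats vary by IOS release.

```
! Lists snooped bindings; the Lease(sec) column shows time remaining per entry
show ip dhcp snooping binding
! Shows whether a database agent (flash/FTP/TFTP URL) is configured, plus write statistics
show ip dhcp snooping database
! CPU load by process, to spot traffic being punted from the ASICs to the CPU
show processes cpu sorted
show processes cpu history
```

Capturing the binding entry for an AP's port just before and just after a CAPWAP loss would show whether the lease was renewed or freshly re-acquired.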


For what it is worth, I had a TAC case open around a year ago for issues with Cisco wifi phones.  Some of the issues we were having were attributed to delays at the AP caused by security features on the switch port (port security, DHCP Snooping, DAI, IP Verify Source) – Cisco 3560 switches.  We didn’t test each feature independently as we had no issues with disabling them.  We were told that each AP switch port would optimally have the following configuration:

 

interface GigabitEthernet0/0/0
 switchport access vlan XXX
 switchport mode access
 mls qos trust dscp
 spanning-tree portfast

 

I’m thinking Jeff may be on the right track.  I wonder if a delay caused by security features might be causing the AP to assume an outage occurred and a new DHCP lease is needed…
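
One caveat on the port template above, offered as a sketch rather than a recommendation: on this switch family the per-port trust setting only comes into play once QoS is enabled globally with 'mls qos', so it may be worth confirming both states. The interface name here is a placeholder.

```
! Reports whether QoS is enabled or disabled globally
show mls qos
! Reports the trust state and related QoS settings of one port
show mls qos interface GigabitEthernet0/1
```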

 

 

Jeremy Brake

Network Services Analyst, Information Technology

Angelo State University

Member, Texas Tech University System

ASU Station #11020

San Angelo, TX 76909-1020

Phone: (325) 942-2333 Fax: (325) 942-2109

jeremy.brake@angelo.edu

 

 

 

 

From: The EDUCAUSE Wireless Issues Constituent Group Listserv [mailto:WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU] On Behalf Of Dan Brisson
Sent: Monday, February 06, 2012 3:01 PM
To: WIRELESS-LAN@LISTSERV.EDUCAUSE.EDU
Subject: Re: [WIRELESS-LAN] Cisco APs losing CAPWAP session

 

