asterisk guys, need some help with trunks

goodcooper

ok, i've got 4 asterisk servers...

my main box is called green... i have iax2 trunks to the other 3 boxes, back and forth...

running untangle routers at each location... for some reason my iax2 trunks time out in one direction on 2 of the legs


so... 6 connections...

green -> morg works great
morg -> green works great
green -> south works great
south -> green UNAVAILABLE
green -> alabama works great
alabama -> green UNAVAILABLE


each PBX is on its own network... after i reboot the PBX it will work for a few days it seems, then out of nowhere somebody will call and say they get a busy message when trying to call out...

so for example i can make calls from morg through green to south all day long... but if south tries to call anybody on any other PBX but their own, they get allison telling them all circuits are busy... (until i reboot and the trunk becomes available again)

EDIT:
also, this is showing up in asterisk on south...
[2012-12-03 16:40:01] WARNING[3216]: chan_iax2.c:3592 __attempt_transmit: Max retries exceeded to host 10.0.10.5 on IAX2/greenpbx-5293 (type = 6, subclass = 11, ts=13522682, seqno=228)
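
For anyone reading along, the `type` and `subclass` numbers in that warning can be decoded with the tables in RFC 5456. The sketch below is illustrative only: the function name `decode_max_retries` is made up here, and the tables are a partial copy of the RFC's values. If the tables are right, type 6 / subclass 11 is an IAX control frame carrying LAGRQ, i.e. a lag request that never got acknowledged.

```python
import re

# Partial frame-type and IAX-subclass tables, copied from RFC 5456.
FRAME_TYPES = {
    1: "DTMF", 2: "VOICE", 3: "VIDEO", 4: "CONTROL", 5: "NULL",
    6: "IAX", 7: "TEXT", 8: "IMAGE", 9: "HTML", 10: "CNG",
}
IAX_SUBCLASSES = {
    1: "NEW", 2: "PING", 3: "PONG", 4: "ACK", 5: "HANGUP", 6: "REJECT",
    7: "ACCEPT", 8: "AUTHREQ", 9: "AUTHREP", 10: "INVAL", 11: "LAGRQ",
    12: "LAGRP", 13: "REGREQ", 14: "REGAUTH", 15: "REGACK", 16: "REGREJ",
    17: "REGREL", 18: "VNAK", 30: "POKE",
}

PATTERN = re.compile(
    r"Max retries exceeded to host (?P<host>\S+) on (?P<chan>\S+) "
    r"\(type = (?P<type>\d+), subclass = (?P<sub>\d+)"
)


def decode_max_retries(line):
    """Return (host, channel, frame type name, subclass name) or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ftype = int(m["type"])
    sub = int(m["sub"])
    type_name = FRAME_TYPES.get(ftype, f"type {ftype}")
    # The subclass table above only applies to IAX control frames (type 6).
    if ftype == 6:
        sub_name = IAX_SUBCLASSES.get(sub, f"subclass {sub}")
    else:
        sub_name = f"subclass {sub}"
    return m["host"], m["chan"], type_name, sub_name
```

Feeding it the warning line above should give the host, the channel name, and the two decoded names.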
 
What distro are you using? Have you collected logs on a longer term basis yet?
 

running freepbx distro, should be autoupdated to the newest....


i ran iax2 set debug on and i got this output from one of the affected boxes (after disabling and re-enabling the trunk in fpbx, which doesn't fix the problem) (also, 10.0.10.5 is greenpbx):

[2012-12-03 17:51:46] NOTICE[3146]: chan_iax2.c:12105 __iax2_poke_noanswer: Peer 'greenpbx' is now UNREACHABLE! Time: 0
Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00011ms SCall: 03853 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00011ms SCall: 03853 DCall: 00000 [10.0.10.5:4569]

Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00012ms SCall: 12515 DCall: 00000 [10.0.10.5:1025]

Tx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: PONG
Timestamp: 00012ms SCall: 00001 DCall: 12515 [10.0.10.5:1025]
Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK
Timestamp: 00012ms SCall: 12515 DCall: 00001 [10.0.10.5:1025]
Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00017ms SCall: 00378 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00017ms SCall: 00378 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00001ms SCall: 07061 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00001ms SCall: 07061 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00004ms SCall: 13955 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00004ms SCall: 13955 DCall: 00000 [10.0.10.5:4569]

Rx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00017ms SCall: 13787 DCall: 00000 [10.0.10.5:1025]

Tx-Frame Retry[ No] -- OSeqno: 000 ISeqno: 001 Type: IAX Subclass: PONG
Timestamp: 00017ms SCall: 00001 DCall: 13787 [10.0.10.5:1025]
Rx-Frame Retry[ No] -- OSeqno: 001 ISeqno: 001 Type: IAX Subclass: ACK
Timestamp: 00017ms SCall: 13787 DCall: 00001 [10.0.10.5:1025]
Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00007ms SCall: 09182 DCall: 00000 [10.0.10.5:4569]

Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX Subclass: POKE
Timestamp: 00007ms SCall: 09182 DCall: 00000 [10.0.10.5:4569]

localhost*CLI>
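
A trace like this is easier to read once the frames are tallied by direction, subclass, and remote port. The sketch below is a rough helper for that; `tally_frames` is a hypothetical name, and the regexes only assume the two-line layout shown in the paste above. In this paste, the POKEs sent to port 4569 are all retried, while the frames arriving from green come from port 1025 and that exchange (POKE, PONG, ACK) completes. That may point at something in the path rewriting source ports, but that is an inference from this one trace, not a confirmed diagnosis.

```python
import re
from collections import Counter

# First line of each frame: direction and subclass.
HEADER = re.compile(
    r"^(?P<dir>Tx|Rx)-Frame Retry\[[^\]]*\]\s+--.*Subclass:\s*(?P<sub>\w+)"
)
# Second line of each frame ends with [ip:port].
ENDPOINT = re.compile(r"\[(?P<ip>[\d.]+):(?P<port>\d+)\]\s*$")


def tally_frames(trace):
    """Count frames keyed by (direction, subclass, remote port)."""
    counts = Counter()
    pending = None  # header seen, waiting for its endpoint line
    for raw in trace.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            pending = (header["dir"], header["sub"])
            continue
        endpoint = ENDPOINT.search(line)
        if endpoint and pending:
            counts[(pending[0], pending[1], int(endpoint["port"]))] += 1
            pending = None
    return counts


if __name__ == "__main__":
    import sys

    # Usage: paste the "iax2 set debug on" output on stdin.
    for key, n in sorted(tally_frames(sys.stdin.read()).items()):
        print(key, n)
```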
 
So you have some sort of VPN? Is it doing NAT? Can you ping both ways?
 

yes, they're VPN'd together, i don't think nat is enabled on those trunks... the pings work fine, i can see the acks and pokes going between the boxes on iax2 debug...


also, when i reboot it'll work for a couple days...



i did notice that alabama and south (the ones i'm having problems with) were on a newer version of freepbx distro (they were installed later) and so i found the scripts and updated green to their version... rebooted them both... i guess now we wait to see if it fails again... if those 2 stay up and morg fails i will know that was the problem, because morg is the oldest of them all...
 
It looks like a network problem, except you say you can ping the remote hosts when this situation occurs, right?

My hypothesis ( aka: wild ass guess ) is that there is a counter embedded in the packet, and that it grows based on traffic counts. After it grows beyond a certain point and the packet needs to be fragmented to traverse the vpn ( openvpn, right? ), the other side can't reassemble the packet, and down she goes.

I ran into something very similar years ago. However, your firewalls may already account for this behavior with openvpn. Look at the mssfix and fragment options; they may already be set, in which case I'm just blowing smoke.
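
One way to check the fragmentation guess is to ping across the tunnel with the don't-fragment bit set and step the size down until replies come back. The sketch below only does the arithmetic and prints the commands to run. It assumes Linux iputils `ping` (where `-M do` forbids fragmentation) and plain IPv4; the helper names and the list of MTU values are arbitrary choices for illustration.

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes, echo request header


def icmp_payload_for_mtu(mtu):
    """Largest ping payload (-s value) that fits in one packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host, mtu):
    """Build a Linux ping command that sends mtu-sized packets with DF set."""
    size = icmp_payload_for_mtu(mtu)
    return ["ping", "-c", "3", "-M", "do", "-s", str(size), host]


if __name__ == "__main__":
    # 10.0.10.5 is greenpbx in this thread; the MTU steps are just examples.
    for mtu in (1500, 1400, 1300, 1200):
        print(" ".join(df_ping_command("10.0.10.5", mtu)))
```

If the large sizes fail and the smaller ones succeed, the largest size that works hints at the usable MTU through the VPN.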
 

very insightful, it is indeed openvpn, i'll have to look into this... i'm wondering why it's happening in 2 locations and not the 3rd though...

it does seem like a network problem, but i've also run nc on udp 4569 and was sending messages back and forth no problem...

but your answer seems logical, even w/ my testing... my question is, why does a reboot of the PBX fix it? if it is indeed a network problem... and also, why is it only going down in one direction...
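
A test closer to what asterisk itself does than raw nc text is to send a real IAX2 POKE and wait for the PONG. The sketch below builds the 12-byte full-frame header as laid out in RFC 5456. The function names, the source call number 1234, and the 2 second timeout are made up for illustration. It assumes the remote asterisk answers a POKE from this host, and it does not ACK the PONG, so the far end may retransmit it a few times.

```python
import socket
import struct

IAX_FRAME = 6   # frame type: IAX control
POKE = 0x1E     # IAX subclass 30
PONG = 0x03     # IAX subclass 3


def build_poke(src_call, timestamp_ms=0):
    """Build a full frame: F bit + source call, dest call 0, ts, seqnos, type, subclass."""
    return struct.pack(
        "!HHIBBBB",
        0x8000 | (src_call & 0x7FFF),  # F bit set marks a full frame
        0,                             # no destination call number yet
        timestamp_ms,
        0,                             # OSeqno
        0,                             # ISeqno
        IAX_FRAME,
        POKE,
    )


def parse_full_frame(data):
    """Decode the 12-byte full-frame header, or return None for anything else."""
    if len(data) < 12:
        return None
    src, dst, _ts, _oseq, _iseq, ftype, sub = struct.unpack("!HHIBBBB", data[:12])
    if not src & 0x8000:
        return None  # mini frame, not a full frame
    return {
        "src_call": src & 0x7FFF,
        "dst_call": dst & 0x7FFF,
        "type": ftype,
        "subclass": sub & 0x7F,
    }


def poke(host, port=4569, timeout=2.0, src_call=1234):
    """Send one POKE; return the (ip, port) the PONG came from, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_poke(src_call), (host, port))
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            return None
    frame = parse_full_frame(data)
    if frame and frame["type"] == IAX_FRAME and frame["subclass"] == PONG:
        return addr
    return None
```

Running `poke("10.0.10.5")` from south while the trunk is down would show whether green answers at all, and the returned address shows which port the reply actually came from.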
 
Do you reset the entire system, or just restart asterisk? Have you tried restarting asterisk? If you have to restart the entire system, then it's a kernel level "thing". If it's just asterisk ( which I suspect, but not enough data to say one way or another ), then it's a counter internal to the application.

It makes sense it's just one direction, that's actually what makes me think "network".

As to the overall "Why"...I got nothing. Could be a thousand little things. It'd be hard to say without tinkering with the setup for a while. If I'm even right, remember, I'm just making up a story to fit the facts :D.
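
To tell the two cases apart (full reboot vs. just restarting asterisk), it helps to log peer status over time and see exactly when each trunk drops and what brings it back. The sketch below polls `asterisk -rx "iax2 show peers"` and prints a line whenever a peer's state changes. It assumes the status is the last column of each peer row and is one of the strings in the regex; the column layout differs between asterisk versions, so treat the parsing as a starting point. The function names and the 60 second interval are arbitrary.

```python
import re
import subprocess
import time

# Status strings assumed to appear at the end of each peer row.
STATUS = re.compile(
    r"(OK \(\d+ ms\)|LAGGED \(\d+ ms\)|UNREACHABLE|UNKNOWN|UNMONITORED)\s*$"
)


def peer_statuses(show_peers_output):
    """Map peer name -> status string from 'iax2 show peers' text."""
    statuses = {}
    for line in show_peers_output.splitlines():
        m = STATUS.search(line)
        if m:
            # First column is "name" or "name/username"; keep the name.
            name = line.split()[0].split("/")[0]
            statuses[name] = m.group(1)
    return statuses


def snapshot():
    """Ask the running asterisk for its current IAX2 peer table."""
    out = subprocess.run(
        ["asterisk", "-rx", "iax2 show peers"],
        capture_output=True, text=True, check=True,
    ).stdout
    return peer_statuses(out)


def watch(interval_s=60):
    """Print a timestamped line each time a peer changes state."""
    last = {}
    while True:
        for name, status in snapshot().items():
            state = status.split(" ")[0]  # "OK (23 ms)" -> "OK"
            if last.get(name) != state:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), name, status, flush=True)
                last[name] = state
        time.sleep(interval_s)
```

Leaving `watch()` running on south and alabama, then restarting only asterisk after the next failure, should show whether the trunk recovers without a full reboot.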
 