Support #1005
BV Instance of Machine 2 dependence on Machine 1
Status: | Work Completed-End life cycle | Start date: | June 22, 2011 | |
---|---|---|---|---|
Priority: | Urgent | Due date: | July 22, 2011 | |
Assignee: | Ahmad Hazri | % Done: | 100% | |
Category: | - | Spent time: | 75.00 hours | |
Target version: | - |
Description
BV instances on Machine 2 become unstable whenever Machine 1 is powered off.
History
#1 Updated by Ahmad Hazri over 13 years ago
- Status changed from New - Begin Life Cycle to Development / Work In Progress
Workaround/Solution
1) Replace hostname KFHIB03 with KFHIB04 in the following files:
argumentValues.properties
Silent.properties
2) Enable DEBUG mode
3) Grep for the hostname/IP on the BV instance of Machine 2
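Steps 1 and 3 above could be scripted roughly as follows. This is only a sketch: the function names `replace_hostname` and `find_host_refs` are made up for illustration, and the location of the two properties files is not stated in this ticket, so the paths have to be supplied by whoever runs it.

```shell
#!/bin/sh
# Hypothetical helper: replace one hostname with another in the given files.
# Each file is copied to <file>.bak first, so the change can be reverted.
# Note: the old hostname is used as a sed pattern, so a "." in it matches any character.
replace_hostname() {
    old="$1"; new="$2"; shift 2
    for f in "$@"; do
        cp -p "$f" "$f.bak" || return 1
        # Read from the backup and rewrite the original (avoids non-portable `sed -i`).
        sed "s/$old/$new/g" "$f.bak" > "$f" || return 1
    done
}

# Hypothetical helper: list files under a directory that mention a hostname or IP.
find_host_refs() {
    grep -rlF -- "$1" "$2"
}
```

For example, `replace_hostname KFHIB03 KFHIB04 argumentValues.properties Silent.properties` run from the directory holding the two files, followed by `find_host_refs KFHIB03 <BV instance directory>` to see what still refers to the old host. The `.bak` copies will still contain the old hostname and will show up in that listing.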
2nd solution
1) Add the following parameter to run_BV.sh:
-b 10.20.208.4
Example:
run.sh -c bv_framework -b 10.20.208.4 $@
#2 Updated by Ahmad Hazri over 13 years ago
- File KFH_IB_Migration_Plan_20110623__App2_.docx added
- % Done changed from 0 to 10
KFH agreed to enable DEBUG logging.
Date: June 23, 2011
Time: 10 pm
The log will be forwarded to Micheal BV for diagnosis.
#3 Updated by Ahmad Hazri about 13 years ago
- Due date changed from June 30, 2011 to July 22, 2011
- Start date changed from June 15, 2011 to June 22, 2011
- % Done changed from 10 to 20
New Plan:
Configure another machine on site and cluster it with the existing Dev server to simulate the issue.
Machine info.
Hostname:PSDEV-1 & PSDEV-2
IP: 192.168.1.126 & x.x.x.127
Action Plan
1) Clone PSDEV-1 (current Dev server on site) to PSDEV-2 (new machine)
2) Reconfigure BV on PSDEV-2
3) Test/simulate the current issue
#4 Updated by Ahmad Hazri about 13 years ago
Currently cloning PSDEV-1.
#5 Updated by Ahmad Hazri about 13 years ago
Cloning - done
BV Clustering - done
#6 Updated by Ahmad Hazri about 13 years ago
Simulation:
I configured 2 machines for clustering with 4 instances:
Machine A:
bv_framework0
bv_framework1
Machine B:
bv_framework2
bv_framework3
- We started bv_framework1, bv_framework2 & bv_framework3; the logs showed them joining the cluster group.
- Everything was running fine.
- Successfully logged in to and out of BVMC on all instances.
We then logged in to BVMC again on Machine B and unplugged Machine A's network cable (to simulate the absence of Machine A). When we tried to log out of BVMC (on Machine B), the BVMC page did not respond. The log file shows:
10:18:25,728 WARN [JMSContainerInvoker] JMS provider failure detected for ConfigUpdateMDB
org.jboss.mq.SpyJMSException: Exiting on IOE; - nested throwable: (java.net.SocketTimeoutException: Read timed out)
    at org.jboss.mq.SpyJMSException.getAsJMSException(SpyJMSException.java:72)
    at org.jboss.mq.Connection.asynchFailure(Connection.java:421)
    at org.jboss.mq.il.uil2.UILClientILService.asynchFailure(UILClientILService.java:174)
    at org.jboss.mq.il.uil2.SocketManager$ReadTask.handleStop(SocketManager.java:439)
    at org.jboss.mq.il.uil2.SocketManager$ReadTask.run(SocketManager.java:371)
    at java.lang.Thread.run(Thread.java:595)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    at org.jboss.util.stream.NotifyingBufferedInputStream.read(NotifyingBufferedInputStream.java:79)
    at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2196)
    at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2376)
    at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2443)
    at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2515)
    at java.io.ObjectInputStream$BlockDataInputStream.readByte(ObjectInputStream.java:2664)
    at java.io.ObjectInputStream.readByte(ObjectInputStream.java:875)
    at org.jboss.mq.il.uil2.SocketManager$ReadTask.run(SocketManager.java:316)
    ... 1 more
10:18:25,745 WARN [JMSContainerInvoker] JMS provider failure detected for ConfigUpdateMDB
org.jboss.mq.SpyJMSException: No pong received; - nested throwable: (java.io.IOException: ping timeout.)
    at org.jboss.mq.Connection$PingTask.run(Connection.java:1305)
    at EDU.oswego.cs.dl.util.concurrent.ClockDaemon$RunLoop.run(ClockDaemon.java:364)
    at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.IOException: ping timeout.
    ... 3 more
We plugged Machine A's network cable back in, and the BVMC logout then completed successfully. The log file shows:
10:19:15,948 ERROR [GMS] [psdev-2:33465 (additional data: 18 bytes)] received view <= current view; discarding it (current vid: [psdev-1:1116 (additional data: 18 bytes)|7], new vid: [psdev-1:1116 (additional data: 18 bytes)|7])
10:19:18,778 INFO [JMSContainerInvoker] Trying to reconnect to JMS provider for ObserveReader
10:19:18,826 INFO [JMSContainerInvoker] Trying to reconnect to JMS provider for CacheListenerMDBean
10:19:18,846 INFO [JMSContainerInvoker] Trying to reconnect to JMS provider for WFEventMDB
10:19:18,846 INFO [JMSContainerInvoker] Trying to reconnect to JMS provider for ConfigUpdateMDB
10:19:18,871 INFO [JMSContainerInvoker] Trying to reconnect to JMS provider for DForumMDB
10:19:19,212 INFO [JMSContainerInvoker] Reconnected to JMS provider for ObserveReader
10:19:19,323 INFO [JMSContainerInvoker] Reconnected to JMS provider for DForumMDB
10:19:19,340 INFO [JMSContainerInvoker] Reconnected to JMS provider for CacheListenerMDBean
10:19:19,355 INFO [JMSContainerInvoker] Reconnected to JMS provider for ConfigUpdateMDB
10:19:19,356 INFO [JMSContainerInvoker] Reconnected to JMS provider for WFEventMDB
10:19:40,534 INFO [TreeCache] viewAccepted(): [psdev-2:33462|6] [psdev-2:33462, psdev-2:33473]
10:19:43,833 INFO [TreeCache] viewAccepted(): [psdev-2:33462|7] [psdev-2:33462, psdev-2:33473, psdev-1:1128]
10:19:43,936 INFO [TreeCache] locking the subtree at / to transfer state
10:19:43,972 INFO [StateTransferGenerator_140] returning the state for tree rooted in /(1024 bytes)
10:20:03,005 INFO [AlertJMSTopic] Bound to JNDI name: topic/bv_framework.AlertJMSTopic
10:20:03,013 INFO [AlertJMSQueue] Bound to JNDI name: queue/bv_framework.AlertJMSQueue
10:20:03,020 INFO [ConfigNoticeTopic] Bound to JNDI name: topic/bv_framework.ConfigNoticeTopic
10:20:03,029 INFO [DForumJMSQueue] Bound to JNDI name: queue/bv_framework.DForumJMSQueue
10:20:03,037 INFO [CacheNoticeTopic] Bound to JNDI name: topic/bv_framework.CacheNoticeTopic
10:20:03,044 INFO [CatEjbNoticeTopic] Bound to JNDI name: topic/bv_framework.CatEjbNoticeTopic
10:20:03,052 INFO [ObsLoggerQueue] Bound to JNDI name: queue/bv_framework.ObsLoggerQueue
10:20:03,060 INFO [WFEvQ] Bound to JNDI name: queue/bv_framework.WFEvQ
10:20:03,071 INFO [A] Bound to JNDI name: queue/A
10:20:03,080 INFO [B] Bound to JNDI name: queue/B
10:20:03,088 INFO [C] Bound to JNDI name: queue/C
10:20:03,095 INFO [D] Bound to JNDI name: queue/D
10:20:03,104 INFO [ex] Bound to JNDI name: queue/ex
10:20:03,123 INFO [testTopic] Bound to JNDI name: topic/testTopic
10:20:03,131 INFO [securedTopic] Bound to JNDI name: topic/securedTopic
10:20:03,139 INFO [testDurableTopic] Bound to JNDI name: topic/testDurableTopic
10:20:03,147 INFO [testQueue] Bound to JNDI name: queue/testQueue
10:20:03,155 INFO [UILServerILService] JBossMQ UIL service available at : /0.0.0.0:7420
10:20:03,167 INFO [DLQ] Bound to JNDI name: queue/DLQ
10:20:03,225 INFO [ProxyFactory] Bound EJB Home 'ProfileSchemaBean' to jndi 'bv/bv_framework/ejb/ProfileSchemaBean'
10:20:03,227 INFO
This proves that the issue also occurs in the KFH environment.
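To line up the JMS outage with the moment the cable was unplugged and re-plugged, the relevant lines can be pulled out of the log. The sketch below assumes a server.log with one entry per line and the exact message texts quoted in this comment; `jms_events` is a made-up name, not a BV or JBoss tool.

```shell
#!/bin/sh
# Hypothetical helper: print timestamp, event type, and MDB name for every
# "JMS provider failure detected" / "Reconnected to JMS provider" log line.
# Assumes one log entry per line, with the MDB name as the last field.
jms_events() {
    grep -E 'JMS provider failure detected|Reconnected to JMS provider' "$1" |
        awk '{ print $1, ($0 ~ /failure detected/ ? "FAILURE" : "RECONNECTED"), $NF }'
}
```

Calling `jms_events server.log` gives a compact timeline; "Trying to reconnect" lines are deliberately left out so only the start and end of each outage appear.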
#7 Updated by Ahmad Hazri about 13 years ago
- % Done changed from 20 to 40
#8 Updated by Ahmad Hazri about 13 years ago
Update from Micheal:
Hi Hazri,
I am fine, hope the same with you too.
The issue at the KFH site is that when Machine A is powered off and we start the BV instances (bv_framework2, bv_framework3) on Machine B, they take time to load (time difference seen in server.log), and once the server is up the page takes time to load and eventually is not displayed. If Machine A is powered on (but not its BV instances), the site works fine with the BV server on Machine B.
Did you test with the same scenario?
The issue below is different and is caused by the JMS server not being reachable: "JMS provider failure detected". This is because you started bv_framework0 first, which makes it the JMS server for all the remaining instances in the cluster. Since you unplugged the network cable (Machine A), the JMS server is not reachable, hence the issue:
10:18:25,728 WARN [JMSContainerInvoker] JMS provider failure detected for ConfigUpdateMDB
org.jboss.mq.SpyJMSException: Exiting on IOE; - nested throwable: (java.net.SocketTimeoutException: Read timed out)
    at org.jboss.mq.SpyJMSException.getAsJMSException(SpyJMSException.java:72)
    at org.jboss.mq.Connection.asynchFailure(Connection.java:421)
In order to simulate the KFH scenario, stop all the JBoss servers on Machine A and Machine B:
bv_framework0
bv_framework1
bv_framework2
bv_framework3
Then power off Machine A.
Then start the bv_framework2 or bv_framework3 instance and see if you face the issue seen at KFH.
#9 Updated by Ahmad Hazri about 13 years ago
Currently testing:
- Unplug Machine A network cable (to simulate the power-off)
- Start bv_framework2 or bv_framework3
#10 Updated by Ahmad Hazri about 13 years ago
Just remembered that the Oracle DB is installed on the same server (Machine A), so the scenario cannot be simulated unless the DB is located on a different server.
#11 Updated by Ahmad Hazri almost 13 years ago
Point to the Demo1 (219.95.244.227) database.
1) Back up /opt/BV1TO1/var/appConfig/bv_framework/etc/bv.properties on both servers.
2) Execute:
./bvtool set-db -user demouser2 -passwd perempuan -database ibsdemo -server demo1 -url jdbc\:oracle\:oci\:@ibsdemo
./bvtool deploy-config -config bv_framework -no-res
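The backup in step 1 can be as simple as a timestamped copy taken before the bvtool commands are run. A minimal sketch, where `backup_file` is a hypothetical helper name and the suffix format is an arbitrary choice:

```shell
#!/bin/sh
# Hypothetical helper: copy a file to <file>.<timestamp>.bak, preserving
# mode and modification time, and print the path of the copy.
backup_file() {
    src="$1"
    dest="$src.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}
```

On each server this would be run as `backup_file /opt/BV1TO1/var/appConfig/bv_framework/etc/bv.properties`; the printed path is the copy to restore from if the database needs to be pointed back.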
#12 Updated by Ahmad Hazri almost 13 years ago
Found that the XML files below must be updated with the correct username/password:
Server 1 (psdev-1)
/opt/BV1TO1/JBoss/server/bv_framework0/deploy/bv_framework.BVRuntimeDBPool-service.xml
/opt/BV1TO1/JBoss/server/bv_framework1/deploy/bv_framework.BVRuntimeDBPool-service.xml
Server 2 (psdev-2)
/opt/BV1TO1/JBoss/server/bv_framework2/deploy/bv_framework.BVRuntimeDBPool-service.xml
/opt/BV1TO1/JBoss/server/bv_framework3/deploy/bv_framework.BVRuntimeDBPool-service.xml
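A quick way to see which of these files already carry the new database user is a plain text search. The sketch below does not parse the XML (the element that holds the username is not shown in this ticket, so nothing is assumed about it); it only reports whether the expected username string appears anywhere in each file. `check_pool_user` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: for every bv_framework*/deploy/bv_framework.BVRuntimeDBPool-service.xml
# under the given JBoss server directory, report whether the expected username
# string occurs in the file. Returns non-zero if any file lacks it.
check_pool_user() {
    server_dir="$1"; user="$2"; rc=0
    for f in "$server_dir"/bv_framework*/deploy/bv_framework.BVRuntimeDBPool-service.xml; do
        [ -e "$f" ] || continue   # glob matched nothing
        if grep -qF -- "$user" "$f"; then
            echo "OK      $f"
        else
            echo "MISSING $f"
            rc=1
        fi
    done
    return "$rc"
}
```

On each server this would be called as `check_pool_user /opt/BV1TO1/JBoss/server demouser2`. A MISSING line means that instance's pool definition still needs editing by hand.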
#13 Updated by Ahmad Hazri almost 13 years ago
- % Done changed from 40 to 50
#14 Updated by Ahmad Hazri almost 13 years ago
1) Managed to point the DB to the demo1 server (at the office)
2) Tested that all instances are running fine, using BVMC
Next:
Stop all instances and try to start bv_2 or bv_3 on Machine 2
#15 Updated by Ahmad Hazri almost 13 years ago
BV instances on Machine 2 are able to come up when Machine 1 is down.
This means Machine 2 does not depend on Machine 1.
#16 Updated by Ahmad Hazri almost 13 years ago
- % Done changed from 50 to 70
#17 Updated by Vincent Devethas over 12 years ago
Hazri,
Is this task completed, or is it being kept under monitoring and maintenance? If it's completed, could you please update your task.
Thank you,
#18 Updated by Ahmad Hazri over 12 years ago
- % Done changed from 70 to 100
The simulation was done at the site office, but the same issue could not be reproduced, most probably because the environment is different.
On one occasion the KFHIB03 machine was down but this issue did not occur, so a networking issue there is suspected.
#19 Updated by Norhaidah Md Dasuki almost 11 years ago
Please help verify whether this issue can be closed. Thank you.
#20 Updated by Ahmad Hazri almost 11 years ago
- Status changed from Development / Work In Progress to Work Completed-End life cycle