Support #1005

BV Instance of Machine 2 dependence on Machine 1

Added by Ahmad Hazri over 13 years ago. Updated almost 11 years ago.

Status: Work Completed-End life cycle
Start date: June 22, 2011
Priority: Urgent
Due date: July 22, 2011
Assignee: Ahmad Hazri
% Done: 100%
Category: -
Spent time: 75.00 hours
Target version: -

Description

BV instances on Machine 2 become unstable whenever Machine 1 is powered off.

KFH_IB_Migration_Plan_20110623__App2_.docx (54.4 KB) Ahmad Hazri , June 23, 2011 14:40

History

#1 Updated by Ahmad Hazri over 13 years ago

  • Status changed from New - Begin Life Cycle to Development / Work In Progress

Workaround/Solution

1) Replace hostname KFHIB03 with KFHIB04 in the following files:

argumentValues.properties
Silent.properties

2) Enable DEBUG mode
3) Grep for the hostname/IP on the BV instance of Machine 2
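
Steps 1 and 3 could be scripted roughly as follows. This is a minimal sketch, not the procedure actually run on site: the function names are hypothetical, and the ticket does not say where the two properties files live, so the paths are left to the caller. Each file is copied to a `.bak` file before it is rewritten.

```shell
# replace_hostname OLD NEW FILE...
# Hypothetical helper: keeps FILE.bak, then rewrites FILE with OLD replaced by NEW.
# OLD is used as a sed pattern, so a dot in an IP address matches any character.
replace_hostname() {
  old=$1; new=$2; shift 2
  for f in "$@"; do
    cp -p "$f" "$f.bak" || return 1
    sed "s/$old/$new/g" "$f.bak" > "$f" || return 1
  done
}

# find_hostname NAME DIR
# Hypothetical helper: lists every line under DIR that still mentions NAME.
find_hostname() {
  grep -rn -- "$1" "$2"
}
```

Usage, from the directory that holds the files: `replace_hostname KFHIB03 KFHIB04 argumentValues.properties Silent.properties`, then `find_hostname KFHIB03 .` to confirm that only the `.bak` copies still mention the old name.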

2nd solution
1) Add the following parameter to run_BV.sh

-b 10.20.208.4

Example:

run.sh -c bv_framework -b 10.20.208.4 $@

#2 Updated by Ahmad Hazri over 13 years ago

KFH agreed to enable DEBUG logging.

Date: June 23, 2011
Time: 10 pm

The log will be forwarded to Micheal BV for diagnosis.

#3 Updated by Ahmad Hazri about 13 years ago

  • Due date changed from June 30, 2011 to July 22, 2011
  • Start date changed from June 15, 2011 to June 22, 2011
  • % Done changed from 10 to 20

New plan:
Configure another machine on site and cluster it with the existing Dev server to simulate the issue.

Machine info
Hostname: PSDEV-1 & PSDEV-2
IP: 192.168.1.126 & x.x.x.127

Action plan
1) Clone PSDEV-1 (current Dev server on site) to PSDEV-2 (new machine)
2) Reconfigure BV on PSDEV-2
3) Test/simulate the current issue

#4 Updated by Ahmad Hazri about 13 years ago

Currently cloning PSDEV-1.

#5 Updated by Ahmad Hazri about 13 years ago

Cloning - done
BV Clustering - done

#6 Updated by Ahmad Hazri about 13 years ago

Simulation:

I configured 2 machines for clustering, with 4 instances:
Machine A:
bv_framework0
bv_framework1

Machine B:
bv_framework2
bv_framework3

- We started bv_framework1, bv_framework2 & bv_framework3, and they were shown joining the cluster group.
- Everything was running fine.
- Logged in to and out of BVMC successfully on all instances.

We then logged in to BVMC again on Machine B and unplugged the Machine A network cable (to simulate the absence of Machine A). When we tried to log out of BVMC (on Machine B), the BVMC page did not respond. The log file shows:

10:18:25,728 WARN  [JMSContainerInvoker] JMS provider failure detected for ConfigUpdateMDB
org.jboss.mq.SpyJMSException: Exiting on IOE; - nested throwable: (java.net.SocketTimeoutException: Read timed out)
        at org.jboss.mq.SpyJMSException.getAsJMSException(SpyJMSException.java:72)
        at org.jboss.mq.Connection.asynchFailure(Connection.java:421)
        at org.jboss.mq.il.uil2.UILClientILService.asynchFailure(UILClientILService.java:174)
        at org.jboss.mq.il.uil2.SocketManager$ReadTask.handleStop(SocketManager.java:439)
        at org.jboss.mq.il.uil2.SocketManager$ReadTask.run(SocketManager.java:371)
        at java.lang.Thread.run(Thread.java:595)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
        at org.jboss.util.stream.NotifyingBufferedInputStream.read(NotifyingBufferedInputStream.java:79)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2196)
        at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2376)
        at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2443)
        at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2515)
        at java.io.ObjectInputStream$BlockDataInputStream.readByte(ObjectInputStream.java:2664)
        at java.io.ObjectInputStream.readByte(ObjectInputStream.java:875)
        at org.jboss.mq.il.uil2.SocketManager$ReadTask.run(SocketManager.java:316)
        ... 1 more
10:18:25,745 WARN  [JMSContainerInvoker] JMS provider failure detected for ConfigUpdateMDB
org.jboss.mq.SpyJMSException: No pong received; - nested throwable: (java.io.IOException: ping timeout.)
        at org.jboss.mq.Connection$PingTask.run(Connection.java:1305)
        at EDU.oswego.cs.dl.util.concurrent.ClockDaemon$RunLoop.run(ClockDaemon.java:364)
        at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.IOException: ping timeout.
        ... 3 more
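
To see how many of these warnings each message-driven bean raised, the log can be summarised as below. This is a small sketch that assumes the `JMS provider failure detected for <MDB>` wording shown in the excerpt above; the function name and the log path passed to it are hypothetical.

```shell
# jms_failures LOGFILE
# Hypothetical helper: counts "JMS provider failure detected" warnings per MDB.
# The MDB name is taken to be the last field of each matching line.
jms_failures() {
  grep 'JMS provider failure detected for' "$1" |
    awk '{ print $NF }' | sort | uniq -c
}
```

Run against the excerpt above, it would report two failures for ConfigUpdateMDB.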

After we plugged the Machine A network cable back in, BVMC could log out successfully. The log file shows:

10:19:15,948 ERROR [GMS] [psdev-2:33465 (additional data: 18 bytes)] received view <= current view; discarding it (current vid: [psdev-1:1116 (additional data: 18 bytes)|7], new vid: [psdev-1:1116 (additional data: 18 bytes)|7])
10:19:18,778 INFO  [JMSContainerInvoker] Trying to reconnect to JMS provider for ObserveReader
10:19:18,826 INFO  [JMSContainerInvoker] Trying to reconnect to JMS provider for CacheListenerMDBean
10:19:18,846 INFO  [JMSContainerInvoker] Trying to reconnect to JMS provider for WFEventMDB
10:19:18,846 INFO  [JMSContainerInvoker] Trying to reconnect to JMS provider for ConfigUpdateMDB
10:19:18,871 INFO  [JMSContainerInvoker] Trying to reconnect to JMS provider for DForumMDB
10:19:19,212 INFO  [JMSContainerInvoker] Reconnected to JMS provider for ObserveReader
10:19:19,323 INFO  [JMSContainerInvoker] Reconnected to JMS provider for DForumMDB
10:19:19,340 INFO  [JMSContainerInvoker] Reconnected to JMS provider for CacheListenerMDBean
10:19:19,355 INFO  [JMSContainerInvoker] Reconnected to JMS provider for ConfigUpdateMDB
10:19:19,356 INFO  [JMSContainerInvoker] Reconnected to JMS provider for WFEventMDB
10:19:40,534 INFO  [TreeCache] viewAccepted(): [psdev-2:33462|6] [psdev-2:33462, psdev-2:33473]
10:19:43,833 INFO  [TreeCache] viewAccepted(): [psdev-2:33462|7] [psdev-2:33462, psdev-2:33473, psdev-1:1128]
10:19:43,936 INFO  [TreeCache] locking the subtree at / to transfer state
10:19:43,972 INFO  [StateTransferGenerator_140] returning the state for tree rooted in /(1024 bytes)
10:20:03,005 INFO  [AlertJMSTopic] Bound to JNDI name: topic/bv_framework.AlertJMSTopic
10:20:03,013 INFO  [AlertJMSQueue] Bound to JNDI name: queue/bv_framework.AlertJMSQueue
10:20:03,020 INFO  [ConfigNoticeTopic] Bound to JNDI name: topic/bv_framework.ConfigNoticeTopic
10:20:03,029 INFO  [DForumJMSQueue] Bound to JNDI name: queue/bv_framework.DForumJMSQueue
10:20:03,037 INFO  [CacheNoticeTopic] Bound to JNDI name: topic/bv_framework.CacheNoticeTopic
10:20:03,044 INFO  [CatEjbNoticeTopic] Bound to JNDI name: topic/bv_framework.CatEjbNoticeTopic
10:20:03,052 INFO  [ObsLoggerQueue] Bound to JNDI name: queue/bv_framework.ObsLoggerQueue
10:20:03,060 INFO  [WFEvQ] Bound to JNDI name: queue/bv_framework.WFEvQ
10:20:03,071 INFO  [A] Bound to JNDI name: queue/A
10:20:03,080 INFO  [B] Bound to JNDI name: queue/B
10:20:03,088 INFO  [C] Bound to JNDI name: queue/C
10:20:03,095 INFO  [D] Bound to JNDI name: queue/D
10:20:03,104 INFO  [ex] Bound to JNDI name: queue/ex
10:20:03,123 INFO  [testTopic] Bound to JNDI name: topic/testTopic
10:20:03,131 INFO  [securedTopic] Bound to JNDI name: topic/securedTopic
10:20:03,139 INFO  [testDurableTopic] Bound to JNDI name: topic/testDurableTopic
10:20:03,147 INFO  [testQueue] Bound to JNDI name: queue/testQueue
10:20:03,155 INFO  [UILServerILService] JBossMQ UIL service available at : /0.0.0.0:7420
10:20:03,167 INFO  [DLQ] Bound to JNDI name: queue/DLQ
10:20:03,225 INFO  [ProxyFactory] Bound EJB Home 'ProfileSchemaBean' to jndi 'bv/bv_framework/ejb/ProfileSchemaBean'
10:20:03,227 INFO 

This appears to reproduce the issue seen in the KFH environment.

#7 Updated by Ahmad Hazri about 13 years ago

  • % Done changed from 20 to 40

#8 Updated by Ahmad Hazri about 13 years ago

Update from Micheal:

Hi Hazari,

I am fine, hope the same with you too..

The issue at the KFH site is that when Machine A is powered OFF and we start the BV instances (bv_framework2, bv_framework3) on Machine B, they take time to load (time diff seen in server.log), and once the server is up the page takes time to load and eventually is not displayed.
If Machine A is powered on (but its BV instances are not running), the site works fine with the BV server on Machine B.

Did you test with the same scenario?

The issue below is different: it occurs because the JMS server is not reachable (“JMS provider failure detected”).
This is because you started “bv_framework0” first, which makes it the JMS server for all the remaining instances in the cluster. Since you unplugged the network cable (Machine A), the JMS server is not reachable, hence the issue.

10:18:25,728 WARN  [JMSContainerInvoker] JMS provider failure detected for ConfigUpdateMDB
org.jboss.mq.SpyJMSException: Exiting on IOE; - nested throwable: (java.net.SocketTimeoutException: Read timed out)
        at org.jboss.mq.SpyJMSException.getAsJMSException(SpyJMSException.java:72)
        at org.jboss.mq.Connection.asynchFailure(Connection.java:421)
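
A quick way to check from Machine B whether the JMS server is reachable is to try opening a TCP connection to it. The sketch below is only an illustration: port 7420 is taken from the `UILServerILService` log line earlier in this ticket and may not be the port used by every instance, the function name is hypothetical, and it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being available.

```shell
# jms_port_reachable HOST [PORT] [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT can be opened.
# PORT defaults to 7420 (assumption, from the UIL log line); timeout defaults to 3 s.
jms_port_reachable() {
  host=$1; port=${2:-7420}; secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: `jms_port_reachable psdev-1 7420 && echo reachable || echo unreachable`. With the Machine A cable unplugged, the call would be expected to fail after the timeout.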

To simulate the KFH scenario, stop all the JBoss servers on Machine A & Machine B:
bv_framework0
bv_framework1
bv_framework2
bv_framework3

Then power OFF Machine A.
Then start the bv_framework2 or bv_framework3 instance and see whether you face the issue seen at KFH.
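
The "time diff seen in server.log" mentioned above can be spotted by listing large gaps between consecutive log timestamps. This is a sketch under assumptions: it expects the `HH:MM:SS,mmm` prefix shown in the excerpts in this ticket, the 10-second default threshold is arbitrary, the function name is hypothetical, and it does not handle logs that cross midnight.

```shell
# log_gaps LOGFILE [MIN_SECONDS]
# Hypothetical helper: prints each timestamped line that follows a pause of at
# least MIN_SECONDS (default 10) since the previous timestamped line.
log_gaps() {
  awk -v min="${2:-10}" '
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] / {
      split($1, t, "[:,]")                      # hours, minutes, seconds, millis
      now = t[1] * 3600 + t[2] * 60 + t[3] + t[4] / 1000
      if (seen && now - prev >= min)
        printf "%.3f s gap before: %s\n", now - prev, $0
      prev = now; seen = 1
    }' "$1"
}
```

Usage: `log_gaps server.log 10` after starting bv_framework2 or bv_framework3 with Machine A powered off. Lines without a timestamp, such as stack traces, are ignored.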

#9 Updated by Ahmad Hazri about 13 years ago

Currently testing.

  • Unplug the Machine A network cable (to simulate the power-off)
  • Start bv_framework2 or bv_framework3

#10 Updated by Ahmad Hazri about 13 years ago

Just remembered that the Oracle DB is installed on the same server (Machine A), so the scenario cannot be simulated unless the DB is located on a different server.

#11 Updated by Ahmad Hazri almost 13 years ago

Point BV to the Demo1 (219.95.244.227) database.

1) Back up /opt/BV1TO1/var/appConfig/bv_framework/etc/bv.properties on both servers.
2) Execute:

./bvtool set-db -user demouser2 -passwd perempuan -database ibsdemo -server demo1 -url jdbc\:oracle\:oci\:@ibsdemo
./bvtool deploy-config -config bv_framework -no-res
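
The backup in step 1 could be taken with a small helper like the one below, so that each run keeps its own timestamped copy. This is a sketch; the function name and the `.bak` naming scheme are assumptions, not something recorded in this ticket.

```shell
# backup_file FILE
# Hypothetical helper: copies FILE to FILE.YYYYmmdd-HHMMSS.bak (preserving
# permissions and timestamps) and prints the path of the backup.
backup_file() {
  src=$1
  dst="$src.$(date +%Y%m%d-%H%M%S).bak"
  cp -p "$src" "$dst" && printf '%s\n' "$dst"
}
```

Usage on each server, before running the bvtool commands: `backup_file /opt/BV1TO1/var/appConfig/bv_framework/etc/bv.properties`.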

#12 Updated by Ahmad Hazri almost 13 years ago

Found that the XML files below must also be updated with the correct username/password.

Server 1 (psdev-1)

/opt/BV1TO1/JBoss/server/bv_framework0/deploy/bv_framework.BVRuntimeDBPool-service.xml
/opt/BV1TO1/JBoss/server/bv_framework1/deploy/bv_framework.BVRuntimeDBPool-service.xml

Server 2 (psdev-2)

/opt/BV1TO1/JBoss/server/bv_framework2/deploy/bv_framework.BVRuntimeDBPool-service.xml
/opt/BV1TO1/JBoss/server/bv_framework3/deploy/bv_framework.BVRuntimeDBPool-service.xml
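
To confirm which of these files still need editing, one option is to list the ones that do not mention the new database user. This is a rough sketch: the function name is hypothetical, it only checks that the user name appears somewhere in the file (the ticket does not say which XML element holds it), and it does not verify the password.

```shell
# pools_missing_user USER FILE...
# Hypothetical helper: prints the files that do not contain USER anywhere.
pools_missing_user() {
  user=$1; shift
  grep -L -- "$user" "$@"
}
```

Usage on psdev-1: `pools_missing_user demouser2 /opt/BV1TO1/JBoss/server/bv_framework0/deploy/bv_framework.BVRuntimeDBPool-service.xml /opt/BV1TO1/JBoss/server/bv_framework1/deploy/bv_framework.BVRuntimeDBPool-service.xml`. Any file it prints has not been updated yet.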

#13 Updated by Ahmad Hazri almost 13 years ago

  • % Done changed from 40 to 50

#14 Updated by Ahmad Hazri almost 13 years ago

1) Managed to repoint the DB to the demo1 server (at the office)
2) Tested that all instances are running fine, using BVMC

Next:

Stop all instances and try to start bv_2 or bv_3 on Machine 2

#15 Updated by Ahmad Hazri almost 13 years ago

BV instances on Machine 2 are able to come up when Machine 1 is down.
This means Machine 2 is not dependent on Machine 1.

#16 Updated by Ahmad Hazri almost 13 years ago

  • % Done changed from 50 to 70

#17 Updated by Vincent Devethas over 12 years ago

Hazri,

Is this task completed, or is it being kept open for monitoring and maintenance? If it's completed, could you please update your task.

Thank you,

#18 Updated by Ahmad Hazri over 12 years ago

  • % Done changed from 70 to 100

The simulation was done at the site office, but the issue could not be reproduced, most probably because the environment is different.
On one occasion the KFHIB03 machine was down and the issue did not occur, so a networking issue there is suspected.

#19 Updated by Norhaidah Md Dasuki almost 11 years ago

Please help verify whether this issue can be closed. Thank you.

#20 Updated by Ahmad Hazri almost 11 years ago

  • Status changed from Development / Work In Progress to Work Completed-End life cycle
