<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7473479913374904057</id><updated>2011-04-21T19:43:29.242-07:00</updated><title type='text'>Oracle RAC and more...</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://artthedba.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7473479913374904057/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://artthedba.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Art_The_DBA</name><uri>http://www.blogger.com/profile/12209601209888619290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>2</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7473479913374904057.post-6073917220878272538</id><published>2009-03-27T05:24:00.000-07:00</published><updated>2009-03-27T16:47:02.743-07:00</updated><title type='text'>Oracle CRS (oprocd) TOC for clusterware integrity</title><content type='html'>&lt;span style="font-weight: bold;font-family:arial;" &gt;&lt;br /&gt;oprocd: A Journey to the Unknown&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Troubleshooting a node reboot can be frustrating and is definitely a tedious endeavor. 
On the other hand, it can also be rewarding when you nail it with supporting facts and reliable results.  One of the main components of Oracle RAC is &lt;/span&gt;&lt;span style="font-weight: bold;font-family:arial;" &gt;oprocd&lt;/span&gt;&lt;span style="font-family:arial;"&gt;.  This tiny piece of software is so powerful that it can reboot a node over and over again if it deems that necessary for I/O fencing.  It is therefore very important for us to understand &lt;/span&gt;&lt;span style="font-weight: bold;font-family:arial;" &gt;oprocd&lt;/span&gt;&lt;span style="font-family:arial;"&gt;.  However, the Oracle manuals give you very little information about this piece of software.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:arial;" &gt;oprocd (Linux)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;br /&gt;Up until Oracle Clusterware 10.2.0.3, Oracle RAC on Linux used the hangcheck-timer module to detect nodes that hang and stop responding because of hardware issues or failed devices.  
Starting with Oracle Clusterware 10.2.0.4, Oracle indicated that it will also use oprocd.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-family:arial;" &gt;oprocd (other O/S)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;On all other operating systems, including HP-UX 11.31, Oracle Clusterware 10.2.0.3 uses oprocd to implement I/O fencing.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;oprocd bits and pieces&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Julian Dyke (http://www.juliandyke.com/Presentations/RACTroubleshooting.ppt) gave a formula for how the oprocd values change when diagwait is configured:&lt;br /&gt;&lt;br /&gt;If diagwait &gt; reboottime then  OPROCD_DEFAULT_MARGIN := (diagwait - reboottime) * 1000&lt;br /&gt;&lt;br /&gt;You can see this logic for yourself in $ORA_CRS_HOME/css/admin/init.cssd, where $ORA_CRS_HOME is the directory where you installed Oracle Clusterware.  On HP-UX 11.31, the active copy of this script is /sbin/init.d/init.cssd.  I believe it is in /etc/init.d on other operating systems.&lt;br /&gt;&lt;br /&gt;Both diagwait and reboottime are stored in the Oracle Cluster Registry (OCR).  When you start the clusterware with&lt;br /&gt;&lt;br /&gt;crsctl start crs&lt;br /&gt;&lt;br /&gt;Oracle gets these values from the OCR and computes the margin, as can be seen in&lt;br /&gt;&lt;br /&gt;/opt/var/oracle/oprocd/nodename&lt;node&gt;.oprocd.log:&lt;br /&gt;&lt;/node&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;Oct 29 15:47:14.700 | INF | monitoring started with timeout(1000), margin(500), skewTimeout(125)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;node&gt;&lt;br /&gt;The values are in milliseconds.  The default margin is 500 milliseconds, which is also the lowest value; you get it by not setting diagwait in the OCR, as in this case.&lt;br /&gt;&lt;br /&gt;Oracle Support told me that oprocd wakes up every minute to get the current time.  
If that time is within 500ms of the last result, it goes back to sleep; otherwise it reboots the node.&lt;br /&gt;&lt;br /&gt;I found this statement vague and misleading; it raised more questions than it answered:&lt;br /&gt;&lt;br /&gt;How does oprocd get the time?&lt;br /&gt;&lt;/node&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;node&gt;What does "within range of the last result" mean?&lt;br /&gt;&lt;br /&gt;I have spent countless hours at night trying to decipher how oprocd works, in a quest to understand why it caused each node to reboot about six times a day. &lt;br /&gt;&lt;br /&gt;The following is my own opinion and understanding of how oprocd works.  It does not reflect the views of Oracle, or of HP, which owns the original code behind Oracle Clusterware (Oracle bought the Tru64 clustering technology some years ago).  Of course, I could be wrong, or Oracle may change the behavior in a new version.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;oprocd logic&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;oprocd gets the current time, saves this value, and sleeps for 1 second (1000ms).  When it wakes up, it gets the current time again and compares it with the saved time plus 1 second.  If the difference is more than 0.5 second (500ms), oprocd assumes that something is wrong with the node and reboots it, implementing I/O fencing as designed.  
In pseudo-code form:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;margin_time = 0.5 second;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;sleep_time = 1 second;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;save_time = get current time;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;start loop&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   sleep for sleep_time;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   current_time = get current time;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   if abs(current_time - (save_time + sleep_time)) &gt; margin_time&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   then&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;         reboot;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   else&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;         save_time = current_time;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   end if;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;end loop&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;How can this simple logic fail and cause headaches for DBAs, project managers (PMs), and everyone involved in an Oracle RAC deployment project?&lt;br /&gt;&lt;br /&gt;Watch out for the answers next time...&lt;br /&gt;&lt;/node&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7473479913374904057-6073917220878272538?l=artthedba.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://artthedba.blogspot.com/feeds/6073917220878272538/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://artthedba.blogspot.com/2009/03/oracle-crs-oprocd-toc-for-clusterware.html#comment-form' title='2 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/7473479913374904057/posts/default/6073917220878272538'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7473479913374904057/posts/default/6073917220878272538'/><link rel='alternate' type='text/html' href='http://artthedba.blogspot.com/2009/03/oracle-crs-oprocd-toc-for-clusterware.html' title='Oracle CRS (oprocd) TOC for clusterware integrity'/><author><name>Art_The_DBA</name><uri>http://www.blogger.com/profile/12209601209888619290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7473479913374904057.post-2102517177850976384</id><published>2009-03-10T15:47:00.000-07:00</published><updated>2009-03-10T15:51:11.002-07:00</updated><title type='text'>My First</title><content type='html'>I have been planning to share my professional experience through blogging, but I was always too busy to start.&lt;br /&gt;&lt;br /&gt;Finally, I'm here doing it.&lt;br /&gt;&lt;br /&gt;More to come...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7473479913374904057-2102517177850976384?l=artthedba.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://artthedba.blogspot.com/feeds/2102517177850976384/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://artthedba.blogspot.com/2009/03/my-first.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7473479913374904057/posts/default/2102517177850976384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7473479913374904057/posts/default/2102517177850976384'/><link rel='alternate' type='text/html' 
href='http://artthedba.blogspot.com/2009/03/my-first.html' title='My First'/><author><name>Art_The_DBA</name><uri>http://www.blogger.com/profile/12209601209888619290</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
