<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Sysadmin on The Infinite Unknown</title>
    <link>https://www.jaredwatkins.com/tags/sysadmin/</link>
    <description>Recent content in Sysadmin on The Infinite Unknown</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 11 Dec 2011 00:00:00 +0000</lastBuildDate><atom:link href="https://www.jaredwatkins.com/tags/sysadmin/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>What I do: Interviewing for Linux Engineers</title>
      <link>https://www.jaredwatkins.com/posts/2011/12/what-i-do-interviewing-for-linux-engineers/</link>
      <pubDate>Sun, 11 Dec 2011 00:00:00 +0000</pubDate>
      <author>Jared Watkins</author>
      <guid>https://www.jaredwatkins.com/posts/2011/12/what-i-do-interviewing-for-linux-engineers/</guid>
      <description>&lt;p&gt;Now and then I’m called on to help interview candidates for Linux admin/engineer slots, and since I’ve been doing some of that lately I thought I’d share the way I go about a technical interview. This approach seems to work equally well over the phone or in person.&lt;/p&gt;
&lt;p&gt;I’m big on understanding the fundamentals of Linux. If someone comes to me with a resume showing 10 years of experience building and managing production Unix/Linux systems, there are certain things I’d expect them to know.. and to a certain depth. If they obviously don’t.. then I have to question the validity of what’s on the resume. So what I’ll usually do is pick a few key areas, start off with some general (easy) questions, and then drill down a bit to discover the level of understanding on each topic. As an example.. I’ll share one of my favorites and lay it out the way I might during an interview.&lt;/p&gt;
&lt;p&gt;Q: If I wanted to know who was logged in, how busy a system was, and how long it had been up, what command would tell me all that?&lt;/p&gt;
&lt;p&gt;(Assuming they get that I’m looking for the ‘w’ command and mention the system load)&lt;/p&gt;
&lt;p&gt;Q: Why are there three numbers for the system load?&lt;/p&gt;
&lt;p&gt;(Assuming they know about the 3 time periods)&lt;/p&gt;
&lt;p&gt;Q: What is the system load.. what do those averages actually represent?&lt;/p&gt;
&lt;p&gt;(This usually starts to trip up the junior people, but assuming they know it’s at its core the run queue length.. with tasks in uninterruptible sleep, such as disk waits, also counted on Linux)&lt;/p&gt;
&lt;p&gt;Q: How does a multi-CPU system affect your interpretation of system load?&lt;/p&gt;
&lt;p&gt;(Assuming they say something about dividing load by CPU count)&lt;/p&gt;
&lt;p&gt;Q: Describe the relationship between the load average measurement and the percentage busy you might get from ‘top’.&lt;/p&gt;
&lt;p&gt;This is usually about as far as I’ll take something like this.. but it can lead to a discussion about things like different kernel schedulers and how they can be tweaked. If a person can answer these and have that sort of discussion, it tells me they have the depth of understanding a senior person should have… at least in this area. Someone who has been an admin (but not what I’d classify as an engineer) should be able to answer at least the first 2 questions.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>What I do: Another Day in the Bunker</title>
      <link>https://www.jaredwatkins.com/posts/2011/02/another-day-in-the-bunker/</link>
      <pubDate>Sat, 19 Feb 2011 00:00:00 +0000</pubDate>
      <author>Jared Watkins</author>
      <guid>https://www.jaredwatkins.com/posts/2011/02/another-day-in-the-bunker/</guid>
      <description>&lt;p&gt;
  &lt;img src=&#34;https://www.jaredwatkins.com/posts/2011/02/another-day-in-the-bunker/inline_mantrap_sm.jpg&#34; width=&#34;150&#34; height=&#34;200&#34; alt=&#34;&#34; /&gt;

&lt;/p&gt;
&lt;p&gt;I thought some might find this interesting. Most who work outside of tech (and many inside tech) never see data centers like this.  To get to our company servers in this hosting facility I have to pass through two man traps and a total of 5 doors with hand scanners.  In addition, each rack of equipment is locked with a unique 6-digit code.  There are hundreds of cameras inside the main data center and guards constantly watching from a secure room. Any people or equipment passing into or out of the facility must be scheduled and itemized, and will be inspected before being allowed through. The weather inside is kept cold and at 50% humidity, and the white noise from all the computer and cooling equipment is so loud it’s difficult to have even a yelling conversation more than a few feet apart.  Above you are several layers of infrastructure..  cooling returns, copper and fiber data, AC and DC power feeds of various voltages, and tons of cameras and motion sensors… miles and miles of wire.  The tiled floor is raised about 18″ off the real floor, and it’s through this space that the cooled air travels before rising through perforated floor tiles along the front of the racks (known as the cold aisle).&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&#34;https://www.jaredwatkins.com/posts/2011/02/another-day-in-the-bunker/inline_bunker_hall_sm.jpg&#34; width=&#34;150&#34; height=&#34;200&#34; alt=&#34;&#34; /&gt;

&lt;/p&gt;
&lt;p&gt;There were two of us on this trip and our main task was to remove a bunch of older equipment and make room for a new &lt;a href=&#34;http://www.wikinvest.com/stock/International_Business_Machines_(IBM)&#34;&gt;IBM&lt;/a&gt; blade cabinet.  We also had to convert all the remaining equipment over from 110v to 220v.  To do this we added two new PDUs for the new 220v 30A circuits and removed a total of 6 110v PDUs. Since most of the servers have dual power supplies, we were able to migrate their power one supply at a time without taking them down.&lt;/p&gt;
&lt;p&gt;
  &lt;img src=&#34;https://www.jaredwatkins.com/posts/2011/02/another-day-in-the-bunker/inline_APC-AP7811.jpg&#34; width=&#34;589&#34; height=&#34;200&#34; alt=&#34;&#34; /&gt;



  &lt;img src=&#34;https://www.jaredwatkins.com/posts/2011/02/another-day-in-the-bunker/inline_bldctr_h_hero.jpg&#34; width=&#34;281&#34; height=&#34;200&#34; alt=&#34;&#34; /&gt;

&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>System Rescue CD to the.. rescue!</title>
      <link>https://www.jaredwatkins.com/posts/2011/02/system-rescue-cd-to-the-rescue/</link>
      <pubDate>Fri, 18 Feb 2011 00:00:00 +0000</pubDate>
      <author>Jared Watkins</author>
      <guid>https://www.jaredwatkins.com/posts/2011/02/system-rescue-cd-to-the-rescue/</guid>
      <description>&lt;p&gt;
  &lt;img src=&#34;http://farm6.static.flickr.com/5211/5421694137_c92bd1b195.jpg&#34; alt=&#34;&#34; width=&#34;160&#34; /&gt;

&lt;/p&gt;
&lt;p&gt;Here’s the scenario..  It was 1 am and I had to shut down a critical Linux server to relocate it in a rack to make room for new equipment. It should have been a 5 minute job.. but on powering up, the server refused to boot past printing the word ‘&lt;a href=&#34;http://en.wikipedia.org/wiki/GNU_GRUB&#34;&gt;Grub&lt;/a&gt;’ on the screen.  This wasn’t good..  this server was needed by a couple hundred thousand customers and rebuilding it wasn’t planned or scheduled.  On closer examination, 3 of the 16 hard drive power lights were not on. It’s extremely unlikely that 3 drives would die like that on a server that isn’t even two years old.  Unfortunately I didn’t have a copy of the &lt;a href=&#34;http://www.sysresccd.org&#34;&gt;System Rescue CD&lt;/a&gt; so the fix attempt would have to wait until morning.&lt;/p&gt;
&lt;p&gt;I had the &lt;a href=&#34;http://www.equinix.com/&#34;&gt;CoLo&lt;/a&gt; staff burn me a copy, which I used to boot the damaged server the next morning.  It booted into a live Linux environment and correctly detected all the server hardware.. including the RAID controller.  I was able to check the status of the 3 RAID arrays and found them all in working order.. the 3 dark drive lights were unrelated.  I was then able to &lt;a href=&#34;http://en.wikipedia.org/wiki/Chroot&#34;&gt;chroot&lt;/a&gt; into the broken system and &lt;a href=&#34;http://www.sysresccd.org/Sysresccd-Partitioning-EN-Repairing-a-damaged-Grub&#34;&gt;reinstall GRUB&lt;/a&gt; onto the primary disk. The server then booted normally and all was well. I still don’t know how or when the &lt;a href=&#34;http://en.wikipedia.org/wiki/Master_boot_record&#34;&gt;MBR&lt;/a&gt; got corrupted.. but thanks to the utility of the &lt;a href=&#34;http://www.sysresccd.org&#34;&gt;RescueCD&lt;/a&gt; this was an easy fix.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
