Friday, August 27, 2010

The Java Heap

Note: Today's post is somewhat of a continuation of yesterday's. Just in case you didn't read yesterday's post, or would like to review, you can find it here.

While writing the code for yesterday's blog post, I ran into a runtime error that I hadn't personally seen before, though it is by no means uncommon.  The only reason I hadn't hit it before is that I hadn't yet written Java programs with significant memory requirements.

Let's look at a section of code from yesterday:
static int size = 2048;
static int A[][] = new int[size][size];
static int B[][] = new int[size][size];
Find the complete code here

This section declares and initializes the int variable "size", then allocates memory for two two-dimensional arrays of size² elements each.  Let's try adding another 1024 to the size variable and see what happens.

[screenshot: the run ends with an OutOfMemoryError]

Now that's interesting.  The part we're interested in is:
java.lang.OutOfMemoryError: Java heap space at ArrayExample.

Apparently, when allocating memory for the second array, we run into a problem: there is not enough memory.  At first glance this seems strange.  I have 8 gigabytes of RAM in this machine, and even though 32-bit Java can't use all of that, I didn't think the program's memory use would come anywhere near the limit.

I suppose the question now is how much memory the example attempted to use.  Looking just at the size of the arrays, and not taking overhead into account, two 3072 × 3072 arrays of four-byte ints come out to about 72 MB.  Considering 32-bit allows for ~4 GB of address space, it still seems odd that we ran into a problem.  In order to explain this, we have to focus on another phrase in the error message:

java.lang.OutOfMemoryError: Java heap space at ArrayExample.

What is the "heap space"?

According to JavaWorld:
The JVM's heap stores all objects created by an executing Java program.

What is the size of the heap?

By default, the maximum size of the heap in Java 1.6 is 64 MB.

Now we can see that we were, in fact, trying to use more memory than we had available.  I have a problem with this, though: I want to use more memory than 64 MB.  Why?  Honestly, because I can.
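As a quick sanity check (my own arithmetic, not part of the original post), the raw footprint is easy to compute: each array holds size² four-byte ints, so at size = 3072 the two arrays need 2 × 3072² × 4 bytes, exactly 72 MB before any per-object overhead, comfortably over the 64 MB default.  A small sketch (the class name HeapMath is just for illustration):

```java
public class HeapMath {
    public static void main(String[] args) {
        int size = 3072;                              // yesterday's 2048 plus another 1024
        long bytesPerArray = (long) size * size * 4;  // size^2 ints at 4 bytes each
        long totalBytes = 2 * bytesPerArray;          // arrays A and B
        System.out.println(totalBytes / (1024 * 1024) + " MB"); // prints "72 MB"
    }
}
```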

Is this even possible, or are we stuck with this limit?

Luckily for us, Java lets us raise this limit when launching the JVM. By running the java command with the -Xmx argument and choosing a new maximum heap size, we can give the program a larger heap.


     java -Xmx128m ArrayExample

The number in the example is our desired maximum heap size.  The "m" after the number means megabytes; you can also use "g" for gigabytes or "k" for kilobytes.
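To verify that the flag actually took effect, you can ask the JVM itself.  This is just a quick sketch (the class name HeapCheck is mine): Runtime.getRuntime().maxMemory() returns the maximum number of bytes the JVM will attempt to use for the heap.

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Maximum amount of memory the JVM will attempt to use, in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + maxBytes / (1024 * 1024) + " MB");
    }
}
```

Running `java -Xmx128m HeapCheck` should report a value at or just under 128; the exact figure varies by JVM, since some bookkeeping space is subtracted.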

Let's try it, using the same program as before:

[screenshot: the program now completes successfully]

Ah, now it runs correctly, and we can run our test from yesterday with even larger numbers, allowing us to see the difference between the two array copying implementations more clearly.  You can go even larger than I did here.

Just as an example, here's a run with arrays of 10240² elements (a dimension five times yesterday's 2048):

[screenshot: timing output for the larger run]

Now we're getting somewhere.

Another problem you may run into is trying to allocate more memory than your machine physically has, or more than the JVM allows.  Trying to set the maximum heap to 4g gives an error that it is an invalid maximum heap size.  Yet another error you may hit is the JVM failing to find enough contiguous memory: the heap must be one contiguous block of address space, so just because you have two gigabytes unused on your machine doesn't mean the JVM will be able to use it (trust me, I tried).

Have some fun with it, see what you can do.  I'd love to hear what the maximum size people can get to work on their machines is.

C code still to come, and I'll try to have some other examples besides just this array copying scenario.

Here are some links to more information on the heap and other options:

Thursday, August 26, 2010

Code Complexity Classes: Math vs. Reality

Had an enjoyable first day of my computer science classes yesterday.  In my Computer Organization and Assembly Language class, the teacher went over some examples of C code, then we looked at the generated assembly to see what was really going on.  One example he showed used large arrays.

In the example, the scenario he gave was that students are given a large array, and they need to write code to copy the array to another array.  In this scenario, two students have very similar implementations, but with one key difference.

Note: I redid the code in Java so I could add a few tweaks, C and other languages to come later.

Code for the first student:
static void copyij(){
  for( int i = 0; i < size; i++){
    for( int j = 0; j < size; j++){
      B[i][j] = A[i][j];
    }
  }
}
Code for the second student:
static void copyji(){
  for( int j = 0; j < size; j++){
    for( int i = 0; i < size; i++){
      B[i][j] = A[i][j];
    }
  }
}
Just in case you missed it, or didn't feel like looking at the code, the difference is in the order they copied the elements of the array. Both students' code will compile and run without error. The first student copied all elements row by row, as opposed to the second student who went column by column.

Looking just at the complexity class of each piece of code, we see that both have a complexity of O(n²). Since this is often how we measure the efficiency of code, most people wouldn't look into it any further. The real world doesn't always work out so perfectly though. Running both segments of code and logging the time of each copy shows that the second student's code runs significantly slower than the first. The reason is memory locality: a Java two-dimensional array is an array of row arrays, so copying row by row walks each row sequentially through memory, while copying column by column touches a different row on every access and gets far less benefit from the cache.

How much slower? Over 10 runs, the first student's code averaged 16ms, whereas the second student's code averaged 71ms. On my netbook, the difference was even more pronounced, with the first code being, on average, ~10 times faster than the second.

These times are in milliseconds, though, and are so small as to seem negligible. Without running benchmarks like this, you wouldn't even notice the difference in your program. But what happens when your array has 10 times the amount of data as the one used here? Well, wait until tomorrow, when we cover the same topic using C code, and you can find out.
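If you'd like to reproduce the comparison yourself, here is a minimal, self-contained sketch of the benchmark.  This is my own reconstruction, not the post's complete code; absolute times will vary by machine, but copyij should consistently beat copyji.

```java
import java.util.Arrays;

public class CopyBench {
    static int size = 2048;
    static int[][] A = new int[size][size];
    static int[][] B = new int[size][size];

    // Row by row: walks each row's backing array sequentially.
    static void copyij() {
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                B[i][j] = A[i][j];
    }

    // Column by column: touches a different row object on every access.
    static void copyji() {
        for (int j = 0; j < size; j++)
            for (int i = 0; i < size; i++)
                B[i][j] = A[i][j];
    }

    public static void main(String[] args) {
        for (int[] row : A) Arrays.fill(row, 1); // give the copy something to do

        long t0 = System.nanoTime();
        copyij();
        long t1 = System.nanoTime();
        copyji();
        long t2 = System.nanoTime();

        System.out.println("copyij: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("copyji: " + (t2 - t1) / 1000000 + " ms");
    }
}
```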

Edit: forgot to add the complete code so you can run it for yourself.  The complete code can be found here.  Happy coding!

Friday, August 13, 2010

In Memory of Matthew Shoemaker

Matthew Shoemaker
(1973 - 2010)

On Friday, July 30, Matthew Shoemaker passed away.  Matthew was a host and co-founder of one of my favorite podcasts, Infosec Daily.  Even though I never got the opportunity to meet him in person, I feel that through listening to the podcast each night, I did get to know him, and I will miss hearing his insight into the world of information security.

The ISD podcast has a page set up in memory of Matthew, which you can find here.  The page also links to a PayPal account for donations to the Matthew Shoemaker Memorial Fund, which has been set up to help provide for his two sons.  There is also a memorial episode of the podcast (episode 185), in which many friends of Matthew who had spoken on the podcast in the past gathered to reminisce and pay tribute to their friend.

A conference in his memory is also in the works. Called ShoeCon, it will be held September 18th in Atlanta.  Proceeds from the con will go to the Matthew Shoemaker Memorial Fund. More information can be found at the Infosec Daily website.

Please keep Matthew's family in your thoughts and prayers.

Google's Data Liberation Front

With all the news lately about Google's apparent deal with Verizon and what it may mean for net neutrality, it can be hard to remember Google's informal motto of "Don't Be Evil," or just hard to believe that it actually means something.  Google hasn't always followed through on this motto to everyone's satisfaction, but that is beside the point.  No company is perfect, and I'm not here to defend or criticize Google on other issues; all I want to do is praise a team within Google that I see doing something good for the internet.

In 2007, an internal engineering team at Google named itself "the Data Liberation Front."  Their goal is to make it easier for users to retain control of their data.  They do this by creating ways for users to easily back up their data from Google services and also to remove it from those services.

The argument that users should retain control over their own data has become a rather large issue over the past few years.  Groups advocating privacy and internet user rights have criticized Facebook and other companies for their stances on data ownership, and for users not being able to satisfactorily remove their data from the offending company's servers.

Even Google has been in the cross-hairs before over the amount of personal data they collect from users, store, and use for advertising purposes. The Data Liberation Front is a refreshing change from all of this.  On their homepage, you can find their mission statement easily visible in large red letters:

Users should be able to control the data they store 
in any of Google's products.  Our team's goal is to 
make it easier to move data in and out.

They admit that they haven't perfected this on all the Google services, but they are working on it.  On the site you can also find a list of products they have already worked on allowing you to liberate your data.  

I'm a big fan of what this team is doing, and though I realize they have been around for a few years and this is in no way new news, I wanted to mention them here in order to spread the word.

Also, everyone likes stickers (at least, I know I do, my laptops are covered in them).  The Data Liberation project currently has a way you can show your support for them, and that is by proudly displaying Stickers from Data Liberation Farms (which you can get for free!).  I received both of mine in the mail today:

As you can see, it also came with a surprise, a small sticker of the Data Liberation Front logo, which I've already put to work on my netbook:

You can find out more information about the Data Liberation Front on their website.  Additionally, you can follow along with their mission at the Data Liberation Blog or on Twitter at @dataliberation.

Monday, August 9, 2010

Big Development in Computer Science

Since computer science is one of the topics I claim to discuss on this blog, I'd feel remiss not to cover a very recent and interesting development in the computer science world.

There are several problems in computer science that people are always trying to solve, and many of them tie into mathematics or other areas.  Wikipedia has a good list of currently unsolved problems, with links to descriptions and more information for those interested.

The problem of interest to this blog post is the first such problem mentioned on that page: P = NP?

AOL News has an article that I feel explains the problem much better than the Wikipedia page does for people who are not as fluent in their math as I'd like to be (you can find the complete article here).  Quoted from said article:

Roughly, the mathematical problem asks if "questions exist whose answer can be quickly checked, but which require an impossibly long time to solve by any direct procedure," according to the Clay Mathematics Institute.

This may not adequately describe the problem to you, but if you're interested in more information on the problem itself, I direct you again to the article and also to the Wikipedia page on P versus NP.

My reason for not explaining the problem better here is twofold.  First, I do not believe I can explain it better than those sites do. Second, my purpose for writing this is not to explain what the problem is, but to share a bit of news on the problem.

Now that we have that out of the way, on to the news!

On August 6th, Dr. Vinay Deolalikar, Principal Research Scientist at HP Labs, sent e-mails to several fellow researchers.  In this e-mail, he explained that he had found a solution to this problem, and shared his findings with them in a paper.

This paper was leaked onto the web in PDF form (it can be found here).  The 66-page document has been shared across the internet by now, but that is not the end of the story.  Today, Dr. Deolalikar released a longer, updated version of the paper (102 pages, here).  He has also stated that the final version of the paper is still being worked on and will be released shortly.

This may not sound like such a big deal to some of you, but I assure you, this may be the most important open problem in computer science, and if this paper answers it, it is an amazing piece of research.  The P vs. NP problem is one of the Millennium Prize Problems, and solving it carries a prize of $1,000,000.  Yes, one million dollars to whoever solves this problem, or any of the other Millennium Problems; it is that big of a find.

Currently it can be assumed that hundreds, if not thousands, of mathematicians and computer scientists, professional and armchair alike, are poring over the paper looking to see if there are any issues with it and to determine if it is correct.  I for one find it way over my head, as I had to pull out a dictionary during the first paragraph of the abstract, but I still find it exciting that a solution may have been found.


HP Labs: Vinay Deolalikar
AOL News: P=NP=WTF?: A Short Guide to Understanding Vinay Deolalikar's Mathematical Breakthrough
Greg and Kat's Blog: P ≠ NP
The P Versus NP Page
Wikipedia: P and NP
The Millennium Prize Problems

The Early, Leaked Version of the Paper (66 pages)
Updated Version of Paper (102 pages)

Aquarium Update

For the past month or so I've been having a small problem in my aquarium.  All my plants have been dying.  I had about 12 small plants in the aquarium; now I'm down to one healthy plant and one that is on its way out.

I have clay mixed in with the rocks to provide essential minerals to the plants, and I've even bought some liquid plant fertilizer made for aquariums, but nothing seems to be helping.  The only thing I can think of is that the water is colder than what the temperature should be for these plants.

If that is the case, the only choice I have to save the plants would be to add a heater to the tank.  I do have a heater, but with goldfish being cold water fish, this is not ideal.  Right now I just plan on buying new plants when I have the time and money to do so, as they weren't completely necessary for my water quality anyways.  My aquarium just looks a little bare now without them.

In other aquarium related news, the fishcam has been down for quite a while now.  The reason for this is that I've been using the laptop that normally just hosts the feed for some other work, requiring it to be away from the aquarium.  I should have the camera back up and operational very soon.

Friday, August 6, 2010

Supercomputing and Some Updates

Supercomputing, Cluster Computing, and Grid Computing. These are all topics I have been interested in for quite some time, though I've never done any actual work on the topics, just some casual browsing for information and ogling pictures of different setups.

Lately I've had my interest in these topics rekindled, and now I actually have the knowledge and means to play around with all of them.  Granted, I don't have the means ($$$) to play in these areas to the extent I would like to, but I can dabble nonetheless.

Over the next few weeks and months I'll be doing some small scale experiments with different High Performance Computing (HPC) models.  Expect to see a few write-ups and updates here as I do this.

Now, onto some updates.  

I've been taking classes for most of the summer, and when not in class, I've been enjoying my time off.  Due to this, I've been extremely busy, and haven't had time to update this blog as much as I wanted to.  I don't want to be one of those bloggers who apologizes constantly for not updating, so I'm not going to apologize.  If you want me to update more often, bug me about it, otherwise, deal with it.

Another area I've been spending some time playing around in is different programming languages.  Back in the day (I love sounding old), I dabbled in PHP in addition to writing HTML and CSS by hand.  It's been a couple years, and lots of things have changed in all these areas.  These past few weeks I've been doing some basic work in more languages than just Java (what my classes have been on during the past year).  I believe I'm falling in love with programming in Scala, so expect me to post some tidbits (you'll see this word again in a minute!) on these different languages as I work my way through learning them.

In other news, my friend, Miranda, over at Tidbits for Your Wits (She changes this name way too often, so if the link breaks, sorry.  She's promised not to change it again though.) has moved to an updating everyday system of posting.  Not to be left in the dust, I'm going to be moving to a posting at least once a week system, as opposed to my current "post when I feel like it" system of blogging.

I've also added a new section to the sidebar, called "Currently Reading."  This new feature is going to show to everyone (you guessed it) what I'm currently reading.  I'll try to update it as soon as I move on to a new book (or books), but it is bound to fall behind from time to time.  At the end of each book, I'll think about posting a review, if I feel like it.

That's about it, look forward to you all reading my stuff in the near future.