The purpose of this post is to describe my first attempt at programming in Golang. I decided it would be neat to use Golang to query the Github API for a list of Magento repositories and the location associated with the owner of each repository. If you search Github for “magento language:php”, at the time of this writing you get more than 3000 repositories. See for yourself.

All Github repositories are owned by a Github user or organization. A Github user or organization can choose to display their geographic location in their Github profile. Since I work daily with Magento, I thought it would be neat to get a list of geographic locations associated with the 3000+ Magento repos. Please note: I’m almost certain there is a better way of doing this, but this is my first time writing a Golang program, so who cares?

Let’s get started, why don’t we?

If you don’t already have Golang I suggest you download it by following the instructions here: http://golang.org/doc/install

There is one caveat. If you are on OSX, you will need to make sure to add the following line to your ~/.bash_profile file:


vi ~/.bash_profile
# add line below to end of it
export PATH=$PATH:/usr/local/go/bin

Then save the file and reload your bash profile so the new PATH takes effect:


source ~/.bash_profile

Before we get started, let’s create a nice place for all of your Go projects. Create a directory in your home directory. For example, I have a “Dev” folder at “/users/tegan/Dev/”. I just created another folder called “golang” in that folder to hold my Go projects.

When you have the folder created, you next need to set up your “GOPATH”.


vi ~/.bash_profile
# add these lines to end
export GOPATH=$HOME/Dev/golang
export PATH=$GOPATH/bin:$PATH

Now let’s get started by creating a “main.go” file in: /users/tegan/Dev/golang/

File: main.go


package main

import (
    "fmt"
    "github.com/google/go-github/github"
)

func main() {
    client := github.NewClient(nil)

    fmt.Println("Repos that contain magento and PHP code.")

    query := "magento+language:php"

    opts := &github.SearchOptions{
        Sort: "stars",
        ListOptions: github.ListOptions{
            PerPage: 100,
        },
    }

    repos, _, err := client.Search.Repositories(query, opts)

    if err != nil {
        fmt.Printf("error: %v\n\n", err)
    } else {
        fmt.Printf("%v\n\n", github.Stringify(repos))
    }

    rate, _, err := client.RateLimit()
    if err != nil {
        fmt.Printf("Error fetching rate limit: %#v\n\n", err)
    } else {
        fmt.Printf("API Rate Limit: %#v\n\n", rate)
    }
}

Now run this file:


go run main.go

If it works, you should get the first 100 Github repositories (sorted by stars) that match the search term “magento” and have PHP as their language. The output is a readable dump of the result structure produced by github.Stringify rather than strict JSON. Note from the import statement (“go-github/github”) that we are including a library that Google wrote to make dealing with the Github API in Go really simple.
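
If you are curious what the library is doing for you, the search call boils down (roughly) to a GET request against the Github search endpoint. Here is a minimal sketch of how that URL could be put together with only the standard library. The searchURL helper is a name I made up for illustration and is not part of go-github. I use a space in the query because the URL encoder turns it into the + separator on its own.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchEndpoint is the public repository search endpoint of the Github REST API.
const searchEndpoint = "https://api.github.com/search/repositories"

// searchURL is a hypothetical helper (not part of go-github) that builds
// the request URL for one page of repository search results.
func searchURL(query, sort string, perPage, page int) string {
	params := url.Values{}
	params.Set("q", query)
	params.Set("sort", sort)
	params.Set("per_page", strconv.Itoa(perPage))
	params.Set("page", strconv.Itoa(page))
	// Encode sorts the keys and escapes spaces and colons in the values.
	return fmt.Sprintf("%s?%s", searchEndpoint, params.Encode())
}

func main() {
	fmt.Println(searchURL("magento language:php", "stars", 100, 1))
}
```

Pasting the printed URL into a browser is a quick way to compare the raw API response with what the library prints.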

Let’s say we want to spice it up and get a little fancier. Taking bits and pieces from some other Golang examples I found while browsing Github, I put together this version, which pages through the results and pauses when the rate limit is nearly used up:


package main

import (
    "fmt"
    "github.com/google/go-github/github"
    "log"
    "math"
    "time"
)

const (
    REMAINING_THRESHOLD = 1
)

func main() {
    client := github.NewClient(nil)

    fmt.Println("Repos that contain magento and PHP code.")

    page := 1
    maxPage := math.MaxInt32

    query := "magento+language:php"

    opts := &github.SearchOptions{
        Sort: "stars",
        ListOptions: github.ListOptions{
            PerPage: 100,
        },
    }

    for page <= maxPage {
        opts.Page = page
        result, response, err := client.Search.Repositories(query, opts)
        Wait(response)

        if err != nil {
            log.Fatal("FindRepos:", err)
        }

        maxPage = response.LastPage

        msg := fmt.Sprintf("page: %v/%v, size: %v, total: %v",
            page, maxPage, len(result.Repositories), *result.Total)
        log.Println(msg)

        for _, repo := range result.Repositories {

            fmt.Println("repo: ", *repo.FullName)
            fmt.Println("owner: ", *repo.Owner.Login)

            time.Sleep(time.Millisecond * 500)

        }

        page++

    }

}

func Wait(response *github.Response) {
    if response != nil && response.Remaining <= REMAINING_THRESHOLD {
        gap := time.Duration(response.Reset.Local().Unix() - time.Now().Unix())
        sleep := gap * time.Second
        if sleep < 0 {
            sleep = -sleep
        }

        time.Sleep(sleep)
    }
}
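
The waiting logic is easier to reason about (and to test) when it is pulled out into a function that only does the arithmetic. This is a minimal sketch, assuming the same two inputs that Wait reads from the response: the number of requests remaining and the reset time in Unix seconds. The name waitFor and its parameters are my own invention. Unlike Wait above, it returns zero when the reset time has already passed instead of flipping the sign.

```go
package main

import (
	"fmt"
	"time"
)

// waitFor returns how long to pause before sending the next request.
// remaining and resetUnix stand in for the rate limit values reported with
// each API response; threshold and the function name are assumptions.
func waitFor(remaining, threshold int, resetUnix, nowUnix int64) time.Duration {
	if remaining > threshold {
		return 0 // enough requests left in this window
	}
	if resetUnix <= nowUnix {
		return 0 // the window has already reset
	}
	return time.Duration(resetUnix-nowUnix) * time.Second
}

func main() {
	now := time.Now().Unix()
	fmt.Println(waitFor(0, 1, now+90, now))  // prints 1m30s
	fmt.Println(waitFor(25, 1, now+90, now)) // prints 0s
}
```

The result can be handed straight to time.Sleep, which returns immediately for a zero duration.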

Now that we have a list of all the Magento-related repositories on Github, we can do some interesting stuff. Let’s say we want to get a list of all the Magento repository owners and group them by their geographic location, to get a picture of where Magento repositories on Github come from. Here is a way to do that.

Let’s start by pulling in the Github user locations:


    for _, repo := range result.Repositories {

        repo_name := *repo.FullName
        username := *repo.Owner.Login

        fmt.Println("repo: ", repo_name)
        fmt.Println("owner: ", username)

        user, response, err := client.Users.Get(username)
        Wait(response)

        if err != nil {
            fmt.Println(err)
        } else {

            if user.Location != nil {
                fmt.Println("location: ", *user.Location)
            } else {
                fmt.Println("location: ", user.Location)
            }

        }

        time.Sleep(time.Millisecond * 500)

    }

    page++

That works great, but you run into Github API rate limit issues. To get around that, you can create an OAuth app in your application settings page. Note you can always check your rate limit at any time by visiting: https://api.github.com/rate_limit?client_id=CLIENT_ID_HERE&client_secret=CLIENT_SECRET_HERE

Here is my example with OAuth authentication. Note I’ve also put in a file writer so we can write everything to “/tmp/repo_locations.csv”.
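
Another way to go easier on the rate limit, which I did not use in my programs, is to remember the locations you have already fetched. One owner can own several matching repositories, so caching by username saves a request every time an owner shows up again. Here is a minimal sketch; locationCache and the fake lookup function are hypothetical stand-ins for the real client.Users.Get call.

```go
package main

import "fmt"

// locationCache remembers the location fetched for each owner so that
// every username costs at most one lookup. All names here are hypothetical.
type locationCache struct {
	lookup func(username string) (string, error) // stand-in for the API call
	seen   map[string]string
	calls  int // number of real lookups, kept only for illustration
}

func newLocationCache(lookup func(string) (string, error)) *locationCache {
	return &locationCache{lookup: lookup, seen: make(map[string]string)}
}

// Get returns the cached location and calls lookup only on a cache miss.
func (c *locationCache) Get(username string) (string, error) {
	if loc, ok := c.seen[username]; ok {
		return loc, nil
	}
	c.calls++
	loc, err := c.lookup(username)
	if err != nil {
		return "", fmt.Errorf("lookup %s: %w", username, err)
	}
	c.seen[username] = loc
	return loc, nil
}

func main() {
	// A fake lookup; a real version would call the Github API here.
	fake := func(username string) (string, error) {
		return "home of " + username, nil
	}
	cache := newLocationCache(fake)
	for _, owner := range []string{"alice", "bob", "alice"} {
		loc, _ := cache.Get(owner)
		fmt.Println(owner, "->", loc)
	}
	fmt.Println("lookups:", cache.calls) // prints lookups: 2
}
```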


package main

import (
    "fmt"
    "github.com/google/go-github/github"
    "io"
    "log"
    "math"
    "os"
    "time"
)

const (
    REMAINING_THRESHOLD = 1
)

func main() {

    t := &github.UnauthenticatedRateLimitedTransport{
        ClientID:     "YOUR_CLIENT_ID_GOES_HERE",
        ClientSecret: "YOUR_CLIENT_SECRET_GOES_HERE",
    }
    client := github.NewClient(t.Client())

    fmt.Println("Repos that contain magento and PHP code.")

    page := 1
    maxPage := math.MaxInt32

    query := "magento+language:php"

    opts := &github.SearchOptions{
        Sort: "stars",
        ListOptions: github.ListOptions{
            PerPage: 100,
        },
    }

    filename := "/tmp/repo_locations.csv"

    f, err := os.Create(filename)
    if err != nil {
        // Stop here: without the file there is nothing to write to.
        log.Fatal(err)
    }

    for page <= maxPage {
        opts.Page = page
        result, response, err := client.Search.Repositories(query, opts)
        Wait(response)

        if err != nil {
            log.Fatal("FindRepos:", err)
        }

        maxPage = response.LastPage

        msg := fmt.Sprintf("page: %v/%v, size: %v, total: %v",
            page, maxPage, len(result.Repositories), *result.Total)
        log.Println(msg)

        for _, repo := range result.Repositories {

            repo_name := *repo.FullName
            username := *repo.Owner.Login

            fmt.Println("repo: ", repo_name)
            fmt.Println("owner: ", username)

            user, response, err := client.Users.Get(username)
            Wait(response)

            if err != nil {
                fmt.Println(err)
            } else {

                if user.Location != nil {

                    user_location := *user.Location

                    fmt.Println("location: ", user_location)

                    n, err := io.WriteString(f, "\""+username+"\",\""+user_location+"\",\""+repo_name+"\"\n")
                    if err != nil {
                        fmt.Println(n, err)
                    }

                }

            }

            time.Sleep(time.Millisecond * 500)

        }

        page++

    }

    f.Close()

}

func Wait(response *github.Response) {
    if response != nil && response.Remaining <= REMAINING_THRESHOLD {
        gap := time.Duration(response.Reset.Local().Unix() - time.Now().Unix())
        sleep := gap * time.Second
        if sleep < 0 {
            sleep = -sleep
        }

        time.Sleep(sleep)
    }
}
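
One thing to watch out for: locations are free text, so they can contain commas or double quotes (think “Austin, TX”), and gluing the fields together by hand does not escape embedded quotes. A minimal sketch of the same row written with the standard encoding/csv package is below. The encodeRow helper and the sample values are made up for illustration. Note that the csv writer only quotes fields that need it, so the output looks slightly different from my hand-built lines.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// encodeRow is a hypothetical helper that turns the fields of one record
// into a single CSV line, quoting and escaping fields where required.
func encodeRow(fields ...string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(fields); err != nil {
		return "", fmt.Errorf("write row: %w", err)
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", fmt.Errorf("flush row: %w", err)
	}
	return sb.String(), nil
}

func main() {
	// Sample values only; real rows would come from the API results.
	line, err := encodeRow("alice", "Austin, TX", "alice/magento-module")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(line) // prints alice,"Austin, TX",alice/magento-module
}
```

In the real program you would more likely create one csv.Writer on top of the output file and call Write once per repository.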

If you ran the above program you would find it quitting after producing 1000 records. This is because Github imposes a limit on the results returned by a search API call. The Search API returns only the top 1000 results. You could get around that restriction by slicing your search API query into multiple calls based on the time that the repositories were created.
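
I picked my date ranges by hand, but the slicing itself is easy to automate. Here is a minimal sketch that cuts a period into consecutive windows that do not overlap. The createdRanges function is a hypothetical helper, and it uses the documented YYYY-MM-DD..YYYY-MM-DD form of the created: qualifier rather than the quoted form with spaces. A fixed window size is an assumption too: it only helps if every window stays under 1000 results.

```go
package main

import (
	"fmt"
	"time"
)

// createdRanges splits the period from start to end (both inclusive, in
// YYYY-MM-DD form) into consecutive windows of at most `days` days.
// It is a hypothetical helper written for this sketch.
func createdRanges(start, end string, days int) ([]string, error) {
	const layout = "2006-01-02"
	if days < 1 {
		return nil, fmt.Errorf("days must be positive, got %d", days)
	}
	from, err := time.Parse(layout, start)
	if err != nil {
		return nil, err
	}
	last, err := time.Parse(layout, end)
	if err != nil {
		return nil, err
	}
	var ranges []string
	for !from.After(last) {
		to := from.AddDate(0, 0, days-1)
		if to.After(last) {
			to = last // clamp the final window to the end date
		}
		ranges = append(ranges, from.Format(layout)+".."+to.Format(layout))
		from = to.AddDate(0, 0, 1) // next window starts the following day
	}
	return ranges, nil
}

func main() {
	ranges, err := createdRanges("2014-01-01", "2014-01-10", 4)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range ranges {
		fmt.Println("magento language:PHP created:" + r)
	}
}
```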

Here is the final version that gets around the 1000 limit by splitting the query into batches on the created_at times of the repositories:


package main

import (
	"fmt"
	"github.com/google/go-github/github"
	"io"
	"log"
	"math"
	"os"
	"time"
)

const (
	REMAINING_THRESHOLD = 1
)

func main() {

	t := &github.UnauthenticatedRateLimitedTransport{
		ClientID:     "YOUR_CLIENT_ID_GOES_HERE",
		ClientSecret: "YOUR_CLIENT_SECRET_GOES_HERE",
	}
	client := github.NewClient(t.Client())

	fmt.Println("Repos that contain magento and PHP code.")

	// create a file to be used for geocoder
	filename := "/tmp/locations.txt"

	f, err := os.Create(filename)
	if err != nil {
		// Stop here: without the file there is nothing to write to.
		log.Fatal(err)
	}

	// slice the queries into batches to get around the API limit of 1000

	// each range starts the day after the previous one ends,
	// so no repository is fetched twice
	queries := []string{
		"\"2008-06-01 .. 2012-09-01\"",
		"\"2012-09-02 .. 2013-04-20\"",
		"\"2013-04-21 .. 2013-10-20\"",
		"\"2013-10-21 .. 2014-03-10\"",
		"\"2014-03-11 .. 2014-07-10\"",
		"\"2014-07-11 .. 2014-09-30\"",
	}

	for _, q := range queries {

		query := "magento language:PHP created:" + q

		page := 1
		maxPage := math.MaxInt32

		opts := &github.SearchOptions{
			Sort:  "updated",
			Order: "desc",
			ListOptions: github.ListOptions{
				PerPage: 100,
			},
		}

		for page <= maxPage {
			opts.Page = page
			result, response, err := client.Search.Repositories(query, opts)
			Wait(response)

			if err != nil {
				log.Fatal("FindRepos:", err)
			}

			maxPage = response.LastPage

			msg := fmt.Sprintf("page: %v/%v, size: %v, total: %v",
				page, maxPage, len(result.Repositories), *result.Total)
			log.Println(msg)

			for _, repo := range result.Repositories {

				repo_name := *repo.FullName
				username := *repo.Owner.Login
				created_at := repo.CreatedAt.String()

				fmt.Println("repo: ", repo_name)
				fmt.Println("owner: ", username)
				fmt.Println("created_at: ", created_at)

				user, response, err := client.Users.Get(username)
				Wait(response)

				if err != nil {
					fmt.Println(err)
				} else {

					if user.Location != nil {

						user_location := *user.Location

						n, err := io.WriteString(f, "\""+username+"\",\""+user_location+"\",\""+repo_name+"\",\""+created_at+"\"\n")
						if err != nil {
							fmt.Println(n, err)
						}

					} else {

						user_location := "not found"

						n, err := io.WriteString(f, "\""+username+"\",\""+user_location+"\",\""+repo_name+"\",\""+created_at+"\"\n")
						if err != nil {
							fmt.Println(n, err)
						}

					}

				}

				time.Sleep(time.Millisecond * 500)

			}

			page++

		}

	}

	f.Close()

}

func Wait(response *github.Response) {
	if response != nil && response.Remaining <= REMAINING_THRESHOLD {
		gap := time.Duration(response.Reset.Local().Unix() - time.Now().Unix())
		sleep := gap * time.Second
		if sleep < 0 {
			sleep = -sleep
		}

		time.Sleep(sleep)
	}
}

Now we have a nice list of repositories, with each line formatted like this:
“username,location,reponame,created_at”
Here is what the full file looks like:
Locations.txt Gist
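
With the file in that shape, grouping the repositories by location takes only a few lines. This is a minimal sketch, assuming every row has the four quoted columns shown above. The function names and the sample rows are made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strings"
)

// countByLocation parses rows of username,location,reponame,created_at
// and returns how many repositories were recorded for each location.
func countByLocation(data string) (map[string]int, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, fmt.Errorf("parse rows: %w", err)
	}
	counts := make(map[string]int)
	for _, rec := range records {
		if len(rec) < 2 {
			continue // skip rows without a location column
		}
		counts[rec[1]]++
	}
	return counts, nil
}

// sortedLocations returns the locations in alphabetical order so the
// printed output is stable from run to run.
func sortedLocations(counts map[string]int) []string {
	locations := make([]string, 0, len(counts))
	for loc := range counts {
		locations = append(locations, loc)
	}
	sort.Strings(locations)
	return locations
}

func main() {
	// Made-up sample rows in the same shape as the output file.
	data := strings.Join([]string{
		`"alice","Austin, TX","alice/repo-one","2013-01-02"`,
		`"bob","not found","bob/repo-two","2013-02-03"`,
		`"alice","Austin, TX","alice/repo-three","2014-03-04"`,
	}, "\n")

	counts, err := countByLocation(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, loc := range sortedLocations(counts) {
		fmt.Printf("%s: %d\n", loc, counts[loc])
	}
}
```

For the real file you would read its contents with os.ReadFile (or hand the open file straight to csv.NewReader) instead of building the sample string.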

Map Geocoding the Results with Node.js
Wouldn’t it be nice if we put all the Magento repositories on a world map, so we can see the Github contributions to Magento around the world? Out of the 3650 repos we found, 1193 didn’t have locations listed, so we can use the remaining 2457 and see if we can plot them on a map.

For geocoding the results into a nice map I used a Node.js geocoder from Javier Arce found here: javierarce/node-batch-geocoder.

I will spare you the data massaging that I had to do to get the data into the correct format for Tilebox/Mapbox. Here is the map you’ve been waiting for:

Full source code available on Github here:
https://github.com/tegansnyder/Golang-Magento-Github-Repo-Search

Subtle Golang differences

Since this is my first Golang program, I thought I would share some of the syntax and convention differences. This is by no means an exhaustive list, but here are a few that I found:

  • Use " double quotes, not ' single quotes, around strings. In Go, single quotes are for a single character (a rune).
  • Use the plus + operator to join strings together, not full stops.
  • No semicolons at the end of lines.
  • The compiler doesn’t care about tabbing (though gofmt will tidy it up for you).
  • No parentheses around the condition in if statements.
  • Must use curly brackets on if statements, for loops, etc.
  • Every declared local variable (and every import) must be used.
  • Functions can return multiple values.
  • It’s nil, not null.
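
To make the list a bit more concrete, here is a tiny made-up program that touches most of these points. The divide and greet functions are purely illustrative and have nothing to do with the Github code above.

```go
package main

import "fmt"

// divide returns two values: the quotient and an error.
func divide(a, b int) (int, error) {
	if b == 0 {
		return 0, fmt.Errorf("cannot divide %d by zero", a)
	}
	return a / b, nil // nil, not null, means "no error"
}

// greet joins strings with + and uses double quotes for string literals.
func greet(name string) string {
	return "Hello, " + name + "!"
}

func main() {
	initial := 'M' // single quotes hold one character (a rune)
	fmt.Println(greet("Magento"), string(initial))

	// No parentheses around the condition, but the braces are required.
	if q, err := divide(10, 2); err == nil {
		fmt.Println("10 / 2 =", q)
	}

	// The blank identifier discards a value you don't need; an unused
	// local variable would be a compile error.
	_, err := divide(1, 0)
	fmt.Println("error:", err)
}
```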