HFST: Move from Sourceforge to Github

After migration

The main page contains four repositories

| Repository | Content |
| hfst | Everything from folder hfst3 at Sourceforge repository except folders new_library, old_python and NSIS. |
| hfst-ospell | Everything from folder hfst-ospell at Sourceforge repository. |
| hfst-optimized-lookup | Everything from folder hfst-optimized-lookup at Sourceforge repository. |
| hfst.github.io | The web pages that are visible at https://hfst.github.io/. Mostly copied from web pages at Sourceforge. |

The Sourceforge repository still exists and it is possible to use it. However, the three migrated folders (hfst, hfst-ospell and hfst-optimized-lookup) are locked with svn lock:

svn lock -m "Migrate to Github." https://sourceforge.net/p/hfst/code/HEAD/tree/trunk/[hfst3|hfst-ospell|hfst-optimized-lookup]

The migrated commits have the original svn revision number visible, shown by the git-svn-id, for example in this commit.
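
As a rough illustration of how the original revision can be recovered: git-svn normally records it in a trailer of the form "git-svn-id: URL@REVISION UUID" at the end of the commit message. The sketch below extracts the revision number from such a message. The function name svn_rev_from_message is made up for this example and is not part of git or of the migration itself.

```shell
# Hypothetical helper: read a commit message on stdin and print the svn
# revision recorded in its git-svn-id trailer (prints nothing if there is none).
svn_rev_from_message() {
    sed -n 's/^[[:space:]]*git-svn-id:[[:space:]]*[^[:space:]]*@\([0-9][0-9]*\)[[:space:]].*$/\1/p'
}

# Possible use inside a clone (COMMIT is a placeholder for a commit id):
#   git log -1 --format=%B COMMIT | svn_rev_from_message
```

The helper only parses text, so it can be tried on any saved commit message without access to the repository.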

Bugs reported via the Sourceforge bug tracker have been migrated as Github issues. Reporting and commenting on bugs has been disabled in the Sourceforge bug tracker. Because the original Sourceforge repository, or most of it, is split into three separate repositories at Github, the bugs are also split among those repositories, and as a result their numbering has changed. The correspondences are listed here and here.

Commits are referenced in bug reports and vice versa ("This bug is fixed in revision NNNN.", "This commit fixes bug #NNN."), and bug reports sometimes also refer to each other ("Relates to bug #NNN."). TODO: The migrated Github issues will be edited so that bug numbers are changed to the corresponding Github issue numbers and references to svn revisions are changed to the corresponding git revisions.
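
One way the renumbering in the TODO could be done mechanically is sketched below. It assumes a mapping file with two whitespace-separated columns (old Sourceforge bug number, new Github issue number). That file format and the function name rewrite_bug_refs are inventions for this example, not something the migration produced.

```shell
# Hypothetical helper: rewrite "bug #NNN" references on stdin using the
# old-to-new mapping file given as $1. Unmapped numbers are left unchanged.
rewrite_bug_refs() {
    awk -v mapfile="$1" '
        BEGIN {
            # Load the mapping: column 1 = old number, column 2 = new number.
            while ((getline line < mapfile) > 0) {
                if (split(line, cols, " ") == 2) new_id[cols[1]] = cols[2]
            }
        }
        {
            out = ""
            rest = $0
            # Walk through every "bug #NNN" occurrence on the line.
            while (match(rest, /bug #[0-9]+/)) {
                old = substr(rest, RSTART + 5, RLENGTH - 5)
                repl = (old in new_id) ? new_id[old] : old
                out = out substr(rest, 1, RSTART + 4) repl
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }
    '
}

# Possible use (both file names are placeholders):
#   rewrite_bug_refs bugmap.txt < issue-body.txt
```

Only the lowercase form "bug #NNN" quoted above is handled; references to svn revisions would need a separate mapping.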

Using Github

Very basic usage:

git clone https://github.com/hfst/hfst.git

clones the repository hfst into a directory named hfst.

To get the newest version:

git pull

To commit changes in file FILE:

git add FILE
git commit -m "What changed in file FILE."
git push origin master

The migration process

Import trunk:

mkdir hfst-import
cd hfst-import
svn2git --verbose --authors authors.txt --nobranches --notags --metadata \
  --exclude articles --exclude conversion-scripts --exclude debian \
  --exclude hfst-ataq --exclude hfst-macport --exclude hfst-ol-lib-preliminary \
  --exclude hfst-web --exclude lossy-hyper-minimization --exclude ocr-pp-old \
  --exclude ocr-python --exclude under-development --exclude util-scripts \
  --exclude web-demo-bash --exclude xfst-development \
  --exclude hfst3/NSIS --exclude hfst3/new_library --exclude hfst3/old_python \
  svn://svn.code.sf.net/p/hfst/code

For SUBDIR in hfst-ospell, hfst-optimized-lookup and hfst3:

git subtree split -P SUBDIR -b SUBDIR-new
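
The "for SUBDIR in ..." step above can be written as a loop. The sketch below is a dry run by default: it only prints the commands, and runs them when RUN=1 is set. The helper names run and split_subdirs are made up for this example.

```shell
# Hypothetical wrapper: print the command unless RUN=1, in which case run it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Split each migrated folder into its own branch named SUBDIR-new.
split_subdirs() {
    for subdir in hfst-ospell hfst-optimized-lookup hfst3; do
        run git subtree split -P "$subdir" -b "${subdir}-new"
    done
}

# Possible use from inside the imported repository:
#   split_subdirs          # show what would be run
#   RUN=1 split_subdirs    # actually run the splits
```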

For SUBDIR in hfst-ospell, hfst-optimized-lookup and hfst3: create the corresponding repository on Github and run:

cd ..
mkdir import-SUBDIR
cd import-SUBDIR
git init
git pull /FULL/PATH/TO/hfst-import-copy SUBDIR-new
git remote add origin https://github.com/hfst/SUBDIR.git
git push origin -u master

(From http://stackoverflow.com/questions/359424/detach-subdirectory-into-separate-git-repository/17864475#17864475)

The authors.txt file lists svn-to-git mappings for users. To find out the users in svn, run

svn log --quiet | grep -E "r[0-9]+ \| .+ \|" | cut -d'|' -f2 | sed 's/ //g' | sort | uniq

The format of the authors.txt file is something like:

sfname1 = gitname1 <email>
sfname2 = gitname2 <email>
no_author = no_author <email>
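
Before running the import it may help to check that every svn user has an entry in authors.txt. The following sketch assumes the "sfname = gitname <email>" format shown above; the function name missing_authors is hypothetical.

```shell
# Hypothetical helper: read svn user names (one per line) on stdin and print
# those that have no "name = ..." entry in the authors file given as $1.
missing_authors() {
    awk -v authors="$1" '
        BEGIN {
            # Collect the left-hand side of every "name = ..." line.
            while ((getline line < authors) > 0) {
                split(line, parts, "=")
                name = parts[1]
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                if (name != "") known[name] = 1
            }
        }
        {
            user = $0
            gsub(/^[ \t]+|[ \t]+$/, "", user)
            if (user != "" && !(user in known)) print user
        }
    '
}
```

The output of the svn log pipeline above can be piped straight into it, for example by appending "| missing_authors authors.txt" to that command. No output means every user is mapped.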

Then migrate the Sourceforge tickets into Github issues. First export the project from SF: check 'Bugs' and wait for the email with download instructions. The bugs are in the file bugs.json. You also need to export the collaborators from Github with

curl -H "Authorization: token TOKEN" https://api.github.com/repos/hfst/REPONAME/collaborators > collab.json

where TOKEN is an access token. You also need to specify Sourceforge-to-Github user mappings in a usermap.json file:

{ "sfname1":"githubname1",
... }
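
When filling in usermap.json it can be handy to see which Github login names are present in collab.json. The sketch below lists them with plain text tools, assuming each collaborator object carries a "login" field, as in Github's API responses. It is a quick text match rather than a real JSON parser, and the function name collab_logins is made up for this example.

```shell
# Hypothetical helper: print the value of every "login" field found in the
# JSON file given as $1, one per line (simple text matching, not a JSON parser).
collab_logins() {
    grep -o '"login"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
        sed 's/.*"\([^"]*\)"$/\1/'
}

# Possible use:
#   collab_logins collab.json
```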

Now the bug tracker can be migrated with gosf2github:

./gosf2github.pl -r hfst/hfst-import-candidate -t TOKEN -u usermap.json -a <default_assignee> -c collab.json bugs.json

(Note that gosf2github writes each ticket to a tempfile foo.json in the current directory, then exports that file and waits for three seconds.)

To filter out tickets, you can modify gosf2github.pl:

# Add variables
my @exclude_tickets = ();
my @only_tickets = ();

# and options (space-separated ticket numbers in one quoted argument)
elsif ($opt eq '-x') {
    @exclude_tickets = split / /, shift @ARGV;
}
elsif ($opt eq '-o') {
    @only_tickets = split / /, shift @ARGV;
}

# and these after SKIPPING check
if ($num ~~ @exclude_tickets) {
    print STDERR "EXCLUDING: $num\n";
    next;  # skip this ticket (assumes the check sits inside the per-ticket loop)
}
if ((@only_tickets != 0) and not ($num ~~ @only_tickets)) {
    print STDERR "EXCLUDING: $num\n";
    next;  # same assumption as above
}

| Bugs of repository | gosf2github option |
| hfst-optimized-lookup | -o "154 157 200 223 244 283 299 303" |
| hfst-ospell | -o "151 165 176 188 190 191 202 203 209 214 216 228 230 238 239 286 312 317 331 335" |
| hfst3 | -x "151 154 157 165 176 188 190 191 200 202 203 209 214 216 223 228 230 238 239 244 283 286 299 303 312 317 331 335" |
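
The -x list for hfst3 should be exactly the union of the two -o lists, so that every ticket ends up in one repository only. A small consistency check, with a hypothetical helper named union_sorted, might look like this:

```shell
# Hypothetical helper: merge two space-separated ticket lists into one
# numerically sorted, duplicate-free, space-separated list.
union_sorted() {
    # $1 and $2 are left unquoted on purpose so each number lands on its own line.
    printf '%s\n' $1 $2 | sort -n -u | paste -s -d ' ' -
}

# The three option values from the table above.
ol_only="154 157 200 223 244 283 299 303"
ospell_only="151 165 176 188 190 191 202 203 209 214 216 228 230 238 239 286 312 317 331 335"
hfst3_exclude="151 154 157 165 176 188 190 191 200 202 203 209 214 216 223 228 230 238 239 244 283 286 299 303 312 317 331 335"

if [ "$(union_sorted "$ol_only" "$ospell_only")" = "$hfst3_exclude" ]; then
    echo "lists are consistent"
else
    echo "lists differ"
fi
```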

Web pages

Note that Github does not support PHP, so you must set in Doxyfile:

The drawback of this is that you can only search symbols, not keywords in text.

Open questions

Releases can be created in Github, too. A zip and a tar.gz file are created from the whole master branch, and it is possible to include your own files as well. The release archives are named hfst-hfst-NAME_OF_THE_RELEASE.[zip|tar.gz]

-- ErikAxelson - 2015-09-10

Topic revision: r36 - 2016-04-05 - ErikAxelson