matt
Joined: 11 Feb 2002 Posts: 34 Location: Cleveland Ohayo
Posted: Sat Oct 04, 2003 6:17 pm Post subject: Possible New Animeusenet.org Service: Realtime(-ish) Logging
I am testing a new script at animeusenet and would like to make it a part of the site, but I thought I would get some feedback from you guys.
http://www.animeusenet.org/live.php
It's a script that updates every hour with the recent posts from the newsgroups; it covers ABMA, ABA, ABMAR (raws), ABAT, and ABAV.
The basic idea is that posts would accumulate in this script during the day and then be added into the searchable history archives every night. The script allows any visitor who has edit access (available to anyone upon request) to edit a post's information if they see something wrong. This would completely change our current logging process and pretty much eliminate the current manual process (yay!).
I gave edit access to a few people who are active on this forum and have animeusenet accounts, so you can see how it would work.
So the question is: would this be useful to anyone, or is it overkill and you're happy with the nightly updates?
//---- how the script works, stop reading now if you couldn't care less ----
Every hour at 10 till, the script connects to a giganews server and downloads the latest headers (it stores the headers it has already processed to avoid redownloading them). It then looks for postings of RAR archives (.rar, .part01, .001, etc.), downloads the body of each matching article, and decodes the RAR file (well, a piece of it). Out of that RAR file it pulls the video file name, file size, and video codec. (One problem I have: if there is more than one file packed into the RAR, such as an .sfv file, it can't pull the data out. I'm still working on that one.)
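The RAR-part detection step might be sketched roughly like this (a hypothetical Python reconstruction, not the actual script; the pattern and function names are illustrative):

```python
import re

# Subjects advertising RAR archive parts: ".part01.rar", ".r00", ".rar",
# ".001" and friends. Purely illustrative; the real script's rules may differ.
RAR_PART = re.compile(r'\.(part\d{1,3}\.rar|r\d{2}|rar|\d{3})\b', re.IGNORECASE)

def looks_like_rar_post(subject):
    """Return True if a subject line appears to carry a RAR archive part."""
    return bool(RAR_PART.search(subject))
```

Headers that match would then have their article bodies fetched and the first RAR block decoded for the inner file name, size, and codec.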
To find the title of the anime, I have a def file with over 600 titles in it; the script looks through the file and compares the file name against it, trying to figure out what show it is.
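The def-file lookup amounts to a substring search against known aliases; something along these lines (hypothetical names, and a three-entry stand-in for the 600+ title def file):

```python
# Stand-in for the def file: alias substring -> canonical title.
TITLE_DEFS = {
    "inuyasha": "InuYasha",
    "love_hina": "Love Hina",
    "naruto": "Naruto",
}

def match_title(filename):
    """Compare a decoded file name against the def file, longest alias first,
    so a more specific alias wins over a shorter title it happens to contain."""
    lowered = filename.lower()
    for alias in sorted(TITLE_DEFS, key=len, reverse=True):
        if alias in lowered:
            return TITLE_DEFS[alias]
    return None
```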
To find the ep number, the script has about a freaking bazillion regular expressions that try to figure it out. (Doesn't always work.)
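Two of those many patterns might look like this (illustrative only; the real script tries far more, in priority order):

```python
import re

# Try an explicit "Ep 24" / "episode_24" marker first, then fall back to any
# short number set off by separators. Both guesses can still be wrong, which
# is why visitor editing matters.
EP_PATTERNS = [
    re.compile(r'[Ee]p(?:isode)?[ ._-]*(\d{1,3})'),
    re.compile(r'[ ._-](\d{1,3})(?=[ ._\[(-])'),
]

def guess_episode(filename):
    for pattern in EP_PATTERNS:
        match = pattern.search(filename)
        if match:
            return int(match.group(1))
    return None
```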
I also have a def file for the fansubbers.
It compiles all that data together and updates the table that's reflected on live.php. Any data that is missing will hopefully be filled in by visitors.
That's it, thanks for any feedback,
matt
Gorunova
Joined: 10 Feb 2002 Posts: 318 Location: Burnaby, B.C., Canada
Posted: Sun Oct 05, 2003 6:53 pm Post subject:
I'm surprised at how well it seems to work already, Matt. Good job.
My main concern would be the rate at which people correct the data and accept it into the main database. It should be easy for them to type in corrections and maybe even select close matches from the alias list for the series title.
I wonder if it might be possible to suggest an easily parsable subject line format and include that as the suggested format in the FAQ and in inc's NAGs, as a way of increasing logging accuracy.
matt
Joined: 11 Feb 2002 Posts: 34 Location: Cleveland Ohayo
Posted: Sun Oct 05, 2003 7:15 pm Post subject:
Thanks,
Gorunova wrote: | My main concern would be the rate at which people correct the data and accept it into the main database. |
The data would sit in the "live" table until an admin (probably me) commits all the posts for that day into the searchable main database, so the timing should not be an issue.
xo Site Admin
Joined: 09 Feb 2002 Posts: 466 Location: Los Angeles [comcast]
Posted: Sun Oct 05, 2003 11:13 pm Post subject:
Gorunova wrote: |
I wonder if it might be possible to suggest an easily parsable subject line format and include that as the suggested format in the FAQ and in inc's NAGs, as a way of increasing logging accuracy. |
Oh, but that would be too easy! Stop teasing! Some of us are still doing this by hand!
matt, I publicly bow before your scripting might. That's an insane task you took on, and that you pulled it off speaks volumes. You should go work for Google or something.
-xo
oblio
Joined: 20 Feb 2002 Posts: 106 Location: Detroix, MI
Posted: Mon Oct 06, 2003 5:11 am Post subject:
I pull all my anime with a perl script I wrote that uses regexes to grab shit. What an unholy pain in the ass. Good job going the extra yards to unwind the RARs for internal filenames...
Despite what xo says about standard subject lines, if you could throw out a couple of suggestions about what makes it easiest for you, I wouldn't mind using them.
matt
Joined: 11 Feb 2002 Posts: 34 Location: Cleveland Ohayo
Posted: Mon Oct 06, 2003 9:03 am Post subject:
If you're checking the page today (the 6th): I am having problems with the server right now, so it won't be updating for the next few hours.
oblio wrote: | Despite what xo says about standard subject lines, if you could throw out a couple suggestions about what makes it easiest for you, I wouldn't mind using it. |
Really, as long as there are no requests like "REQ: PLS POST NARUTO EP 51!" in the subject, the script should be able to pull out the name, as long as that title is in the def file. Short of making the subject line "Title=blah Ep=#blah", I'm not sure I could make it much better.
(inc)
Joined: 18 Feb 2002 Posts: 356 Location: San Diego
Posted: Mon Oct 06, 2004 2:08 pm Post subject:
Quote: | as long as there are no "REQ: PLS POST NARUTO EP 51!" | Taking requests out of binary subject lines has been a nag since their formal start (1/1/03 -- easy to remember). I had the feeling -- purely subjective -- that there was actually some success. That is, until this fall. Now every noob poster in abmar seems to be including requests. Hehe... I'm even getting flamed about it -- very strange.
What I had been considering doing even before I saw this thread was changing the nag to reflect the issues with AnimeUsenet more and the inconvenience to leeches less -- stressing that it's to the poster's own benefit not to put REQs in binary subject lines. Trying to institute a strict *standard* subject with the nag may be asking too much (in fact I'm sure it is), but I'm willing to make the attempt to get folks to make their SLs as parsable as possible -- the return, if it works at all, may be worth a few *slings & arrows*.
(inc)
Keikai
Joined: 18 Feb 2002 Posts: 178 Location: Miami, FL
Posted: Tue Oct 07, 2003 12:19 am Post subject:
Aye, I took that attitude with the subject recommendations in the FAQ. I tried to reason that what helps animeusenet helps us all, and I put in some recommended tips for writing subjects along those lines. At the very least it's a good argument to add to your anti-REQ-in-subject NAG.
Unfortunately, instituting formalized subject lines is just not going to happen, much as I'd be happy to see it.
xo Site Admin
Joined: 09 Feb 2002 Posts: 466 Location: Los Angeles [comcast]
Posted: Tue Oct 07, 2003 11:59 pm Post subject:
An idea - the "FTD" tag in subject lines seems a fairly innocuous value-added bit that doesn't annoy people too much and actually gets people curious. How about a similar tag for people who want to use a standardized subject line?
Something like:
example wrote: |
Love Hina | 24 | xo | 2003-10-04 | ogm divx3 | sub | e-f | 13 | 128MB | +par2 | #AU# - yenc - [01/24] - Love_Hina_-_24[e-f].part01.rar (*/24)
|
where the "fields" are title, ep, name of poster, date of post, video format/codec, sub/dub/raw, fansub group, number of RAR parts, video size, and additional comments such as inclusion of PARs, version designations, etc.
matt and Gorunova will recognize this - it's the internal format we use(d) for logging entries. The separator isn't set in stone, but the pipe character is less frequently used than most other characters, and the presence of #AU# would help a parser identify pre-formatted entries. Note that I put it at the end - I still believe in the usefulness of sorting subjects by title, pipe dream that it is. Everything after the #AU# would be appended automatically by PP2K or whatever upload software is used.
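A parser for that layout could be as simple as splitting on #AU# and then on pipes; a hypothetical sketch (the field names here are made up for illustration, not part of the proposal):

```python
# The ten proposed fields, in order; names are invented for this sketch.
FIELDS = ["title", "ep", "poster", "date", "codec", "subtype",
          "group", "parts", "size", "extras"]

def parse_au_subject(subject):
    """Parse a pre-formatted '#AU#' subject line into a field dict."""
    if "#AU#" not in subject:
        return None                      # not a pre-formatted entry
    formatted = subject.split("#AU#", 1)[0]
    values = [field.strip() for field in formatted.split("|")]
    if len(values) < len(FIELDS):
        return None                      # malformed; fall back to heuristics
    return dict(zip(FIELDS, values))
```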
Maybe this is moot at this point - we're moving toward a more generally automated system, and matt's new setup will use broader heuristics than just the subject line. But it could help, and if enough people do it, it might spark curiosity and copycats.
So there's my gauntlet.
-xo |
xo Site Admin
Joined: 09 Feb 2002 Posts: 466 Location: Los Angeles [comcast]
Posted: Wed Oct 08, 2003 12:03 am Post subject:
Actually, the poster and date can be left out, since they can be determined unambiguously from other NNTP header lines.
-xo
Melchior
Joined: 19 Feb 2002 Posts: 190 Location: Vancouver, BC, Canada
Posted: Thu Oct 16, 2003 9:48 pm Post subject:
Wow, I'm impressed! Congrats, Matt -- that's an impressive piece of scriptage you've got.
I don't really have much else to say about it -- good luck automating Animeusenet. I can't imagine how much of a pain it is to manually input the day's posts, and anything that helps reduce that labour can only be a good thing!
(inc)
Joined: 18 Feb 2002 Posts: 356 Location: San Diego
Posted: Mon Feb 23, 2004 8:06 am Post subject:
Hi Matt,
Wondering why the script doesn't like InuYasha -- ep 138, posted right after 137, was bypassed, as was the ep 139 I posted yesterday, while ep 140 was numbered "14". Was there something about the subject line...
IY -=- 139 [xvid, sub]__[$1/$2] <yEnc> - Inuyasha_139_(Ani-Kraze).part...
...or the file name:
(Ani-Kraze)Inuyasha_139[XviD]_[9385E74E].avi
...that might give a problem?
Or might it have been a problem with incompletes on the *server(s) of record*?
I'm perfectly willing to change to anything that might yield better results.
And I really need to check the tracker every day -- I should have caught and edited that "14" myself.
It just occurred to me that maybe 138+ might be filtered as too high for an episode number, and thus labeled an error, lol.
(inc)
matt
Joined: 11 Feb 2002 Posts: 34 Location: Cleveland Ohayo
Posted: Tue Feb 24, 2004 8:49 pm Post subject:
Thanks for checking, inc. I looked at the log file for that day. Basically, what happened is that I never got around to updating the script to check for three-digit ep numbers, so it saw InuYasha 137 and 138 both as "InuYasha Ep 13". The script does not allow the same title and ep from the same poster to be added twice in the same day, so it treated the second one as a dupe post: it let 137 in but deleted 138.
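Roughly what that collision looked like (an illustrative reconstruction, not the actual code):

```python
import re

# A two-digit-only pattern truncates three-digit episode numbers, so
# "Inuyasha_137" and "Inuyasha_138" both parse as ep "13", and the
# same-day dupe check then throws the second post away.
two_digit_ep = re.compile(r'_(\d{2})')
seen_today = set()   # (title, ep, poster) tuples already logged today

def try_log(title, filename, poster):
    match = two_digit_ep.search(filename)
    ep = match.group(1) if match else None
    key = (title, ep, poster)
    if key in seen_today:
        return "deleted as dupe"
    seen_today.add(key)
    return "logged ep " + str(ep)
```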
As for ep 139, I think what happened is that you started posting it late night Feb 21 (EST). The script kicks off at 15 till the hour, so I believe it ran before you started posting and missed it, and it did not pick it up the next time it ran because it looks for the RAR file, which was by then in the previous day's headers.
So the solution is, stop posting so much!
Haha, kidding
I'll update it to allow three-digit eps and look at tweaking the run time so it does not miss anything.
Thanks for checking your posts on the site; it allows me to fix these little bugs.
matt
(inc)
Joined: 18 Feb 2002 Posts: 356 Location: San Diego
Posted: Tue Mar 02, 2004 4:59 pm Post subject:
I've been doing a few of the *fixes* on <unknown>s in the hourly updates as the opportunity presents itself -- hope that's alright.
I noticed my post of FMA 21 (A-Keep/ANBU) of 3/1-3/2 didn't get listed; another *boundary* instance? It started just before, and was posted concurrently with, the Planetes 11 that did get listed.
Hate to keep buggin' you... lol.
(inc)
(inc)
Joined: 18 Feb 2002 Posts: 356 Location: San Diego
Posted: Tue Mar 02, 2004 7:56 pm Post subject:
Here's one more for you, matt. I've been reposting some Wolf's Rain in abmar: 14 and 20-22 so far (around 2/26-2/27 for the last three). For some reason, none of the last three were caught by the script. The format was slightly different since the subber changed, but the rest was basically the same.
(inc)